As our intro picture already cryptically explained. In a lot of cases weather should be considered an important factor when looking for correlation (<> causation). In this picture both variables are dependant on the weather. A no brainer, I know.
More and more companies are looking into this subject to understand where these correlations exist. That is why more and more data regarding this topic is collected and combined with the data already available for analysis.
This post will help you to get started with collecting this data and adding it to your apps to combine it with real business performance data and insights.
In this post I will explain and attach a few things that might help you to get started on this topic.
- What the source is for the data in this post
- What comes with this data
- How you can add it to your analysis
- How the app in this example works
- A download to the app to get you started
The data source
The source for this data is https://www.wunderground.com/history/ at this address you can do a manual lookup with the wunderground console. With that capability there is also an API that can output the data in various formats of which this is an example: https://www.wunderground.com/history/airport/AMS/2016/1/1/CustomHistory.html?dayend=31&monthend=12&yearend=2016&format=1
This example will get you the weather data from the 1st of january to 31st of december for 2016 for the airport weatherstation of Amsterdam. The URL will output a CSV format with this data. And that is why these kind of APIs are so interesting to work with in combination with Qlik Sense.
What does the data hold?
With the URL in the example you will get a number of data points:
- The date of the events and measures
- Min/Max and Mean temperature in degrees Celsius
- Min/Max and Mean Humidity
- Max and Mean Windspeed in Km/h
- Cloud Coverage
- (Precipitation in MM) which I commented out in the script, because it is not measured at every weatherstation
- Events, like fog, rain, etc…
How can this data be combined with your own data?
The data which is fetched comes from a location (Weatherstation / ICAO) on a certain date. This means that whenever you have data in your own analysis with for example longitudes and latitudes and the date range of the events, you can associate it together. You can then add the measures from the weather dataset to present or calculate in combination with your business data.
An explanation of the app attached to this post
This image shows some of the visualizations that can be found in the app. The main reason for the app being in this post is the collection of the weather data itself. Therefor I will focus on the script of this app which is responsible for the collection of the data.
When you open the data load editor you will see a few tabs/sections. These each do particular operations, which I will explain next:
In the settings tab the global variables in most cases are already set. The addition to this tab I did:
// Fetch weather results // of todays year LET vYear = 2016; // Year(Today());
This creates a variable for determining the year for which you want the data to be fetched from the wunderground API. More of this will be explained later.
This tab loads data from a csv file I uploaded with a few cities and their GPS coordinates.
Cities: LOAD // ID, CityID, CityName, replace(Latitude, ',', '.') as Latitude, replace(Longitude, ',', '.') as Longitude, Country as CountryCode, // Region, Population, replace(Latitude, ',', '.') &','& replace(Longitude, ',', '.') as WeatherLocation, GeoMakePoint(replace(Latitude, ',', '.'), replace(Longitude, ',', '.')) as GeoPoint FROM [lib://Cities] (txt, utf8, embedded labels, delimiter is ';', msq) ;
The CSV file with the cities can be found @ https://www.dropbox.com/s/5ypbjursnqufi0j/Cities.skv?dl=1
I also load in a few other external sources:
Countries: LOAD "ISO-3166alpha2" as CountryCode, "ISO-3166alpha3" as CountryCode3, // "ISO-3166numeric", Country as CountryName, "Area in km²" as AreaSize, trim(Population) as CountryPopulation, Continent as ContinentCode FROM [lib://Countries] (html, utf8, embedded labels, table is @2) Where Exists(CountryCode, "ISO-3166alpha2") ;
This is the load script for adding country information to the data model. This data is loaded straight of a webpage @ http://www.geonames.org/countries/
I also loaded a KML file that holds data for the polygons of the country boundaries. This file can be found @ https://www.dropbox.com/s/7ywpabs224p5jkz/world.kml?dl=1
This is the snippet that loads this data:
WorldKML: LOAD [world.kml?dl=1.Name] as CountryCode3, [world.kml?dl=1.Area] as CountryArea FROM [lib://World KML] (kml, Table is [World.shp/Features]) ;
This tab uses an inline table to add continent names to the mix:
Continents: LOAD * INLINE [ ContinentCode, ContinentName AF, Africa AN, Antarctica AS, Asia EU, Europe NA, North America OC, Australia SA, South America ];
The link for the ‘ContinentCode’ field can be found in the table ‘Countries’.
Tab: WeatherStation Lookup
In this tab the first step of processing our request for weather data starts, looking up our weatherstations according to our location data (in this case the ‘Cities’ tab). These weatherstations need to be in our process before getting the actual weather data. We need these weatherstations to loop through them collecting the data that is relevant to the locations we need the data for. This is the script:
// ##################### Lookup Weatherstations (airport codes) based on GPS // http://api.wunderground.com/auto/wui/geo/GeoLookupXML/index.xml?query=37.76834106,-122.39418793 // Get Stations from Wunderground API // number of seconds to wait for a URL datasource SET OpenUrlTimeout = 10; FOR i = 0 to NoOfRows('Cities')-1 ; LET vGPS = peek( 'WeatherLocation', $(i), 'Cities'); LET vCityID = peek( 'CityID', $(i), 'Cities'); TRACE Loading Page: $(i); WeatherStationCodes: FIRST 1 LOAD $(vCityID) as CityID, 'http://api.wunderground.com/auto/wui/geo/GeoLookupXML/index.xml?query=$(vGPS)' as wFetchURL, $(i) as wPageNr, city as Weatherstation, state as WeatherState, country as WeatherCountry, icao, // Weatherstation (Airport) Code lat as wLat, lon as wLon, if(len(icao) > 0, 'No', 'Yes') as LookupError FROM [http://api.wunderground.com/auto/wui/geo/GeoLookupXML/index.xml?query=$(vGPS)] (XmlSimple, table is [location/nearby_weather_stations/airport/station]) ; SLEEP 100; NEXT;
As you can see this script creates a loop over the rows that are available in our cities table (FOR i = 0 to NoOfRows(‘Cities’)-1;). Then for every row in the cities table the weather stations closest to our GPS coordinates will be fetched from http://api.wunderground.com/auto/wui/geo/GeoLookupXML/index.xml?query=$(vGPS). In which $(vGPS) are our GPS coordinates from the field ‘WeatherLocation’ in our ‘Cities’ table. After all weather stations are collected, our next step is to get the actual weather data into the application with the following script:
Tab: Weather History Lookup
// ##################### Lookup Weather history based on airport codes // number of seconds to wait for a URL datasource SET OpenUrlTimeout = 10; FOR i = 0 to NoOfRows('WeatherStationCodes')-1 ; LET vICAO = peek( 'icao', $(i), 'WeatherStationCodes'); TRACE Loading Page: $(i); Weather: LOAD $(i) as whPage, '$(vICAO)' as icao, @1 as Date, @2 as MaxTemp_C, @3 as MeanTemp_C, @4 as MinTemp_C, @8 as MaxHumidity, @9 as MeanHumidity, @10 as MinHumidity, @17 as MaxWindSpeed_KmH, @18 as MeanWindSpeed_KmH, // @20 as Precipitation_MM, @21 as CloudCoverage, @22 as Events FROM [https://www.wunderground.com/history/airport/$(vICAO)/$(vYear)/1/1/CustomHistory.html?dayend=31&monthend=12&yearend=$(vYear)&format=1] (txt, codepage is 1252, no labels, delimiter is ',', msq, header is 1 lines) Where isnum(@2) = -1 and @2 < 60 and @4 > -40 ; SLEEP 100; NEXT;
Remember our example URL in our ‘Data source’ paragraph? https://www.wunderground.com/history/airport/AMS/2016/1/1/CustomHistory.html?dayend=31&monthend=12&yearend=2016&format=1
In the ‘FROM’ statement in the script this is where we use this URL. Only now we use variables (within a loop) to dynamically generate the URL we need for the weather stations we looked up in the tab ‘Weatherstation Lookup’. Also the addition of the ‘vYear’ variable is done to get the data for the data range we want. As you can see I also filter out temperatures above 60 degrees Celsius and below -40 degrees. I don’t want these outliers to be in my dataset.
The last tab is a calendar table which is generated based on the ‘vYear’ variable. Which I won’t post here. You can find it in the app.
When you reload this app (which can take around 5-10 minutes, APIs aren’t that fast in general), you will get the following data model:
I hope this post will help you get started adding weather data to your own datasets. You could even consider storing this data into QVDs as a datamart.
The app for this post can be found here.
A screenshot of the dashboard in the app