As part of its open data initiatives, the European Environment Agency (EEA) provides a selection of datasets about air quality in Europe, collectively known as the Air Quality e-Reporting database. The richest subset of this data is the air quality time series data. It comes in two subsets: E1a, which is cleaner and more thoroughly validated, and E2a, which is more up to date. This is an important dataset for climate research, as well as a great source for data scientists who are looking for some non-trivial real-world data to practice on (when combined with the measurement station metadata, you have both geodata and time series).
The one major obstacle to using this data is the download interface (a.k.a. "the portal"), which is very cumbersome. First you need to navigate to the download page and select the parameters of the dataset you want to download, such as the pollutant of interest, the location, and the time span.
The form constructs a URL with query parameters matching what you've selected in the form (this was my first hint that this should be easy to automate). Once you hit the "Download" button, you would expect it to download the dataset, but that would of course be too easy. Instead, for the parameters I've filled in, you're greeted with a list of links to individual CSV files that you have to click on and download yourself.
For this particular query this isn't too bad, but if you want to query many types of pollutant or a bigger group of countries, this is going to cost you some serious time.
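Since the form just encodes its selections as query parameters, generating one URL per country and pollutant combination is easy to script. Here is a minimal sketch of that idea. The base URL and the parameter names are hypothetical placeholders, not the portal's real ones, and build_query_url and build_query_urls are names made up for this illustration.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only;
# the real portal's URL and query keys may differ.
BASE_URL = "https://example.invalid/air-quality/download"


def build_query_url(country, pollutant, year_from, year_to):
    """Encode one set of form selections as query parameters."""
    params = {
        "country": country,
        "pollutant": pollutant,
        "year_from": year_from,
        "year_to": year_to,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def build_query_urls(countries, pollutants, year_from, year_to):
    """Build one URL for every (country, pollutant) combination."""
    return [
        build_query_url(country, pollutant, year_from, year_to)
        for country, pollutant in product(countries, pollutants)
    ]


urls = build_query_urls(["NL", "BE"], ["NO2", "O3"], 2014, 2017)
```

Two countries and two pollutants already mean four queries, each of which would return its own list of CSV links to click through by hand.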
A better way
To make this process easier, I developed airbase: an easy Python client for accessing the data (this database was formerly known as AirBase, and I thought it was a catchy name). It started off as a script to help a friend of mine who is in climate research, and I realized with a bit of cleanup it might be useful to other people as well. It's available on PyPI, so to install you can simply
$ pip install airbase
To start downloading your dataset, import the package and initialize the client:
>>> import airbase
>>> client = airbase.AirbaseClient()
The client helps you to construct your request, and does some validation for you, like checking that the pollutant you want is available in the countries you're asking for. It does this by downloading some files from the portal, so this requires an internet connection.
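To give a feel for that kind of validation, here is a minimal sketch. It is not the package's actual code: the availability table is made up for illustration (the real client builds its lookup from the files it downloads), and check_request is a hypothetical helper.

```python
# Hypothetical availability table, for illustration only; the real
# client derives this information from files fetched from the portal.
AVAILABLE = {
    "NL": {"NO2", "NO3", "O3"},
    "BE": {"NO2", "O3"},
}


def check_request(countries, pollutant, available=AVAILABLE):
    """Raise ValueError if a country is unknown or lacks the pollutant."""
    for country in countries:
        if country not in available:
            raise ValueError(f"Unknown country code: {country}")
        if pollutant not in available[country]:
            raise ValueError(f"{pollutant} is not available in {country}")
```

Failing early like this saves you from finding out after a long download that the request could never have returned anything.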
Kind of like using the portal, but more conveniently, you next construct the parameters of the dataset you're looking for:
>>> request = client.request(country="NL", pl="NO3", year_from=2014, year_to=2017)
If you don't include a parameter, the client will construct a request for all possible values (so you can just use client.request() to get the whole dataset).
With your request constructed, all that's left to do is to choose how you want to download the data. You can either call download_to_directory() to get all those CSVs individually, or download_to_file() to concatenate them into one big CSV. Either way, the request object will first contact the portal to get the links to all the CSVs you need, and then start downloading them as instructed. Of course, you can follow the progress with nice progress bars.
>>> request.download_to_directory("./data")
Generating CSV download links...
100%|██████████████████████████| 1/1 [00:03<00:00, 3.14s/it]
Generated 16 CSV links ready for downloading
Downloading CSVs to ./data...
100%|██████████████████████████| 16/16 [00:03<00:00, 5.34it/s]
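To make the "one big CSV" option concrete, here is a minimal sketch of how a set of already downloaded CSVs could be concatenated. This is not how the package does it internally; concatenate_csvs is a hypothetical helper, and it assumes every file starts with the same header row.

```python
import csv


def concatenate_csvs(csv_paths, out_path):
    """Write the rows of several CSVs into one file with a single header.

    Assumes all input files share the same header row.
    """
    header_written = False
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in csv_paths:
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                # Copy the remaining (data) rows as they are.
                writer.writerows(reader)
```

Streaming row by row keeps memory use flat, which matters once a request spans many countries and years.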
If you want to update your dataset later (e.g. getting the last week's worth of data), download_to_directory() will automatically skip downloading most of the files that are already there.
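The general idea of skipping files you already have can be sketched with a simple existence check. This is only an illustration under that assumption: links_still_needed is a hypothetical helper, and the package's own rule for deciding what to skip may well be more selective.

```python
from pathlib import Path
from urllib.parse import urlparse


def links_still_needed(csv_links, directory):
    """Return the links whose target file is not yet in `directory`.

    A file counts as present if one with the same name as the last
    path segment of the link already exists.
    """
    directory = Path(directory)
    needed = []
    for link in csv_links:
        filename = Path(urlparse(link).path).name
        if not (directory / filename).exists():
            needed.append(link)
    return needed
```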
Hunting for correlations between locations? Make sure to download the metadata file that contains the locations and other properties of the measurement stations that supply the data:
>>> client.download_metadata("./data/metadata.tsv")
Writing metadata to ./data/metadata.tsv...
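Once you have the metadata, a station-to-coordinates lookup is a natural first step for any location-based analysis. Below is a minimal sketch using only the standard library. The column names (station_id, longitude, latitude) and the sample rows are made up for illustration, so check the header of the real file and adjust accordingly.

```python
import csv
import io

# Made-up sample in a hypothetical layout; the real metadata file's
# column names and contents may differ.
SAMPLE_TSV = (
    "station_id\tlongitude\tlatitude\n"
    "ST001\t4.90\t52.37\n"
    "ST002\t5.12\t52.09\n"
)


def station_coordinates(tsv_file):
    """Map each station id to a (longitude, latitude) tuple of floats."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return {
        row["station_id"]: (float(row["longitude"]), float(row["latitude"]))
        for row in reader
    }


coords = station_coordinates(io.StringIO(SAMPLE_TSV))
```

With a real file you would pass an open file handle instead of the in-memory sample, for example open("./data/metadata.tsv", newline="").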
Full documentation is available on ReadTheDocs, and of course the whole package is open source on GitHub too. I know at least a handful of people have used the package, including one confirmed publication (by the friend I made the original script for), which is very cool!
Have you used airbase in your research or learning? Let me know! I'd love to hear about it!