The European Environment Agency (EEA) provides a selection of datasets about air quality in Europe. The data is available for download at the portal, but the interface makes bulk downloads a bit time-consuming. Hence this easy Python-based interface.
I recently discovered that Pandas has a function to propagate time series events forward (or backward) in time across a DataFrame. Here's how it works.
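As a minimal sketch of the idea, assuming the function in question is `ffill` (with `bfill` as its backward counterpart), here is a toy daily series where a value is recorded only on the day an event happens. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: a reading is recorded only on event days.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
df = pd.DataFrame({"event": [1.0, np.nan, np.nan, 2.0, np.nan, np.nan]}, index=idx)

# Carry the last known value forward in time.
df["forward"] = df["event"].ffill()

# Carry the next known value backward in time.
df["backward"] = df["event"].bfill()

# Propagate at most one step, so stale values do not travel too far.
df["forward_limited"] = df["event"].ffill(limit=1)

print(df)
```

The `limit` argument is handy when an event should only "count" for a fixed number of periods after it happens.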
Since I started using Apache Spark, one of the frequent annoyances I've come up against is having an idea that would be very easy to implement in Pandas, but that turns out to require a really verbose workaround in Spark. A recent example of this is doing a forward fill (filling null values with the last known non-
Pandas has a useful feature that I didn't appreciate enough when I first started using it: groupbys without aggregation. What do I mean by that? Let's look at an example.
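One way to read "groupbys without aggregation" is a group-wise operation that returns one value per original row instead of one row per group. The sketch below uses `transform`, `cumsum` and `shift` on a grouped column; the `store` and `sales` data are invented for illustration and are not necessarily the example from the post.

```python
import pandas as pd

# Hypothetical sales records for two stores.
df = pd.DataFrame({
    "store": ["a", "a", "a", "b", "b"],
    "sales": [10, 20, 30, 5, 15],
})

grouped = df.groupby("store")["sales"]

# Broadcast each group's mean back onto every row of that group.
df["store_mean"] = grouped.transform("mean")

# Running total that restarts for each store.
df["running_total"] = grouped.cumsum()

# Previous row's sales within the same store (NaN at each group's first row).
df["prev_sales"] = grouped.shift(1)

print(df)
```

The result has the same five rows as the input, which is the point: the grouping shapes the calculation without collapsing the data.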
This is a cute trick I discovered the other day for quickly computing the time since an event on regularly spaced time series data (like monthly reporting), without looping over the data.
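One loop-free way to do this, which may or may not be the exact trick from the post, is to number the events with a cumulative sum and then count rows within each numbered block. The monthly index and the `event` flags below are assumptions made for the sketch.

```python
import pandas as pd

# Hypothetical monthly reporting series: True marks a month with an event.
idx = pd.date_range("2024-01-01", periods=8, freq="MS")
event = pd.Series(
    [False, True, False, False, True, False, False, False], index=idx
)

# Each event starts a new block: 0 before the first event, then 1, 2, ...
event_id = event.cumsum()

# Position within each block equals the number of periods since the event.
periods_since = event.groupby(event_id).cumcount()

# Rows before the first event have no "last event", so mask them out.
periods_since = periods_since.where(event_id > 0)

print(periods_since)
```

Because the data is regularly spaced, the row count within a block is the elapsed time in reporting periods; irregular data would need a difference of timestamps instead.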