Notebook: OpenAQ data overview

File names

/notebooks/01_openaq-data-overview.ipynb
/notebooks/01_openaq-data-overview.py

Motivation

OpenAQ provides access to thousands of real-time air pollution measurement stations via a free API. For our target list of cities around the world, we want to explore the following questions:

How many cities are covered by the OpenAQ database for individual pollutants?
How homogeneous is the data across different stations for the same city?
Will spurious measurements be a problem? How can we detect outliers?

Methods

fetch all available data for the last 7 days for each city from the OpenAQ API
save all data for stations providing any of ["o3", "no2", "so2", "pm10", "pm25"] within a radius of 25 km (i.e. API max)
plot global maps of cities with available data and time series for all stations for the last 7 days grouped by city
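
The fetch step above could be parameterised roughly as follows. This is a minimal sketch, not the notebook's actual code: `build_query` is a hypothetical helper, and the parameter names (`coordinates`, `radius`, `parameter`, `date_from`, `date_to`) are assumptions modelled on the OpenAQ v2 measurements endpoint, so they should be checked against the current API documentation before use.

```python
from datetime import datetime, timedelta, timezone

POLLUTANTS = ["o3", "no2", "so2", "pm10", "pm25"]
RADIUS_M = 25_000  # 25 km, the API maximum according to the notes above


def build_query(lat, lon, now=None, days=7):
    """Return query parameters for one city (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days)
    return {
        # Parameter names are assumptions modelled on the OpenAQ v2
        # measurements endpoint; verify against the current API docs.
        "coordinates": f"{lat:.4f},{lon:.4f}",
        "radius": RADIUS_M,
        "parameter": POLLUTANTS,
        "date_from": start.isoformat(),
        "date_to": now.isoformat(),
    }
```

Keeping the query construction separate from the HTTP call makes it easy to inspect the request for each city before any data is downloaded.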

Results

Global coverage

no data for any pollutant at any point over the last week for 50 of the 153 cities (some others have only a single or unreliable station)
best coverage for Europe, North America and Eastern Asia
average of $\sim$ 9 stations per location
most measurements for $PM_{2.5}$, fewest for $SO_{2}$ and $O_{3}$

Local Variability

consistency between stations within the same city/region highly variable (e.g. see below for dominant diurnal ozone cycle in Warsaw; large spread in Athens, non-stationary):

air quality stations likely clustered around pollution hot spots, e.g. see $NO_{2}$ peaks around large streets (Hallein A10, Hallein B159) in Salzburg:

large spread across stations for all pollutants in Lima (see below)
e.g. for $PM_{10}$, choice of station/averaging makes difference between AQI level 1 (<20) or 6 (>150)
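
To illustrate how sensitive the AQI level is to the choice of station, a simple band lookup can be sketched as below. Only the outer thresholds (20 and 150) come from the notes above; the intermediate band edges, the handling of values that fall exactly on an edge, and the function name `pm10_aqi_level` are assumptions.

```python
from bisect import bisect_right

# Upper band edges for PM10 in µg/m³. Only the first (20) and last (150)
# come from the notes above; the intermediate edges are assumed.
PM10_EDGES = [20, 40, 50, 100, 150]


def pm10_aqi_level(value):
    """Map a PM10 concentration to an AQI level from 1 to 6."""
    # A value exactly on an edge is assigned to the higher level (assumption).
    return bisect_right(PM10_EDGES, value) + 1
```

With two made-up station readings of 15 and 180 µg/m³, the stations would report levels 1 and 6 respectively, while their mean (97.5 µg/m³) would fall in level 4 under these assumed bands.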

Data Quality

some unmasked outliers and missing values are easy to identify because they appear as either 0 or -1000:
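
A minimal sketch of masking these placeholder values, assuming readings arrive as a plain list of numbers. `mask_sentinels` is a hypothetical helper. Treating every exact 0 as invalid is an assumption that could discard genuine zero readings.

```python
import math

SENTINELS = {0.0, -1000.0}  # placeholder values named in the notes above


def mask_sentinels(values, sentinels=SENTINELS):
    """Replace sentinel readings with NaN so they drop out of statistics."""
    return [math.nan if v in sentinels else v for v in values]
```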

but sometimes spikes of 2-3 data points are difficult to judge/interpret:
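
One possible way to flag such short spikes for review is a Hampel-style filter, which compares each point with the median of its neighbours. This is a sketch of a common technique, not the screening method used in the notebook. The function name, window size and threshold are assumptions.

```python
from statistics import median


def flag_spikes(values, window=3, n_sigmas=3.0):
    """Flag points far from the rolling median (Hampel-style filter).

    window is the number of neighbours considered on each side.
    """
    flags = []
    for i, v in enumerate(values):
        lo, hi = max(0, i - window), min(len(values), i + window + 1)
        neighbourhood = values[lo:hi]
        med = median(neighbourhood)
        # 1.4826 scales the MAD to a standard deviation for Gaussian data.
        mad = 1.4826 * median(abs(x - med) for x in neighbourhood)
        flags.append(mad > 0 and abs(v - med) > n_sigmas * mad)
    return flags
```

The filter only marks candidates. Whether a flagged spike is a sensor fault or a real pollution event still needs a judgement call, for example by comparing with neighbouring stations.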

Conclusions

no useful data for any pollutant for $\sim$ half of all cities in OpenAQ data -> might need to consider additional data sources
spurious data/outliers for most cities -> robust quality screening will be necessary to detect and remove these data
large spread between sensors within some cities -> develop methods to generate city-level estimates
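
As a starting point for city-level estimates, one candidate is the median across stations at each timestamp, which is less sensitive to a single extreme station than the mean. This is a sketch under stated assumptions, not a method the project has settled on. The `(timestamp, station, value)` layout and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import median


def city_level_series(readings):
    """Collapse (timestamp, station, value) readings to a per-timestamp median."""
    by_time = defaultdict(list)
    for timestamp, _station, value in readings:
        by_time[timestamp].append(value)
    return {t: median(vals) for t, vals in sorted(by_time.items())}
```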
