Notebook: OpenAQ data overview
Sebastian Steinig edited this page May 16, 2024 · 6 revisions
/notebooks/01_openaq-data-overview.ipynb
/notebooks/01_openaq-data-overview.py
OpenAQ provides access to thousands of real-time air pollution measurement stations via a free API. For our target list of cities around the world, we want to explore the following questions:
- How many cities are covered by the OpenAQ database for individual pollutants?
- How homogeneous is the data across different stations for the same city?
- Will spurious measurements be a problem? How can we detect outliers?
- fetch all available data for the last 7 days for each city from the OpenAQ API
- save all data for stations providing any of ["o3", "no2", "so2", "pm10", "pm25"] within a radius of 25 km (i.e. API max)
- plot global maps of cities with available data and time series for all stations for the last 7 days grouped by city
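The fetch step above can be sketched as a helper that assembles the query for one city. This is a minimal sketch, not the notebook's code: `build_query` is a hypothetical helper, and the parameter names (`coordinates`, `radius`, `parameter`, `date_from`, `date_to`) are assumptions modelled on an OpenAQ-style measurements query that should be checked against the current API documentation. Only the pollutant list, the 25 km radius and the 7-day window come from the text.

```python
from datetime import datetime, timedelta, timezone

POLLUTANTS = ["o3", "no2", "so2", "pm10", "pm25"]  # pollutant list from the text
RADIUS_M = 25_000   # 25 km search radius (stated API maximum)
LOOKBACK_DAYS = 7   # "last 7 days"


def build_query(lat, lon, now=None):
    """Return query parameters for one city (parameter names are assumed)."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=LOOKBACK_DAYS)
    return {
        "coordinates": f"{lat:.4f},{lon:.4f}",  # assumed "lat,lon" format
        "radius": RADIUS_M,
        "parameter": POLLUTANTS,
        "date_from": start.isoformat(),
        "date_to": now.isoformat(),
    }


# Example: approximate coordinates of Warsaw, with a fixed reference time
params = build_query(52.2297, 21.0122,
                     now=datetime(2024, 5, 16, tzinfo=timezone.utc))
print(params)
```

Keeping the query construction separate from the HTTP call makes it easy to loop over the city list and to inspect the requested window before any data is downloaded.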
- no data for any pollutant at any point over the last week for 50 out of the 153 cities (and some others with single/dodgy stations)
- best coverage for Europe, North America and Eastern Asia
- average of $\sim$ 9 stations per location
- most measurements for $PM_{2.5}$, fewest for $SO_{2}$ and $O_{3}$
- consistency between stations within the same city/region highly variable (e.g. see below for dominant diurnal ozone cycle in Warsaw; large spread in Athens, non-stationary):
- air quality stations likely clustered around pollution hot spots, e.g. see $NO_{2}$ peaks around large streets (Hallein A10, Hallein B159) in Salzburg:
- large spread across stations for all pollutants in Lima (see below)
- e.g. for $PM_{10}$, the choice of station/averaging makes the difference between AQI level 1 (<20) and level 6 (>150)
- some outliers and missing values which are not masked are easy to identify, as they are either 0 or -1000:
- but sometimes spikes of 2-3 data points are difficult to judge/interpret:
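The two kinds of problem values described above call for different handling. The sketch below masks the sentinel values (0 and -1000) outright and only flags short spikes for manual review, using a rolling median with a median-absolute-deviation threshold (a Hampel-style filter). The filter, the window size, the threshold and the sample series are illustrative assumptions, not the screening method used in the notebook.

```python
from statistics import median

SENTINELS = {0, -1000}  # unmasked missing-value markers named in the text


def mask_sentinels(values):
    """Replace sentinel readings with None."""
    return [None if (v is None or v in SENTINELS) else v for v in values]


def flag_spikes(values, window=5, threshold=3.0):
    """Flag points far from the median of their neighbours (Hampel-style).

    `window` and `threshold` are assumed values that would need tuning.
    """
    flags = []
    for i, v in enumerate(values):
        lo, hi = max(0, i - window), min(len(values), i + window + 1)
        neighbours = [x for x in values[lo:hi] if x is not None]
        if v is None or len(neighbours) < 3:
            flags.append(False)
            continue
        med = median(neighbours)
        mad = median(abs(x - med) for x in neighbours)
        # 1.4826 scales the MAD to a standard deviation for normal data
        flags.append(mad > 0 and abs(v - med) > threshold * 1.4826 * mad)
    return flags


# Made-up hourly series with two sentinels and a two-point spike
raw = [30, 32, 0, 31, 29, -1000, 33, 30, 250, 240, 31, 28, 30]
masked = mask_sentinels(raw)
spike_idx = [i for i, f in enumerate(flag_spikes(masked)) if f]
print(masked)
print(spike_idx)
```

Flagged spikes are kept in the series rather than removed, since a short spike may be a real pollution event and the text notes these cases are hard to judge.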
- no useful data for any pollutant for $\sim$ half of all cities in OpenAQ data -> might need to consider additional data sources
- spurious data/outliers for most cities -> robust quality screening will be necessary to detect and remove these data
- large spread between sensors within some cities -> develop methods to generate city-level estimates
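One simple candidate for a city-level estimate is the median across stations at each time step, which is less sensitive to a single hot-spot station than the mean. The sketch below is an illustration under stated assumptions: `city_level_series` and `aqi_level` are hypothetical helpers, the station values are made up, and the intermediate $PM_{10}$ breakpoints (40, 50, 100) are assumed. Only the limits for level 1 (<20) and level 6 (>150) come from the text.

```python
from bisect import bisect_right
from statistics import median

# Assumed PM10 band edges; only 20 and 150 are taken from the text
PM10_BREAKPOINTS = [20, 40, 50, 100, 150]


def aqi_level(value, breakpoints=PM10_BREAKPOINTS):
    """Map a concentration to a level 1..6 (values on an edge go up a level)."""
    return bisect_right(breakpoints, value) + 1


def city_level_series(station_series):
    """Median across stations at each time step, ignoring missing values."""
    n_steps = len(next(iter(station_series.values())))
    estimate = []
    for t in range(n_steps):
        readings = [s[t] for s in station_series.values() if s[t] is not None]
        estimate.append(median(readings) if readings else None)
    return estimate


# Made-up PM10 readings for three stations in one city
stations = {
    "station_a": [18.0, 22.0, None, 25.0],
    "station_b": [35.0, 40.0, 38.0, None],
    "station_c": [160.0, 155.0, 170.0, None],
}
city = city_level_series(stations)
levels_first_step = {name: aqi_level(s[0]) for name, s in stations.items()}
print(city)
print(levels_first_step, aqi_level(city[0]))
```

In this made-up example the first time step spans levels 1 to 6 depending on the station, while the median lands on level 2, which shows how strongly the aggregation choice affects the reported index.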