chore: Bring up to speed with changes in dev
r-leyshon committed Jun 19, 2024
2 parents 177401f + 3fa154e commit 0754155
Showing 12 changed files with 350 additions and 56 deletions.
1 change: 1 addition & 0 deletions .github/workflows/quarto-render.yml
Original file line number Diff line number Diff line change
@@ -38,6 +38,7 @@ jobs:
run: |
sudo apt update
sudo apt install -y libgeos-dev
sudo apt-get install osmosis
shell: sh
- name: Build API reference pages
run: |
1 change: 1 addition & 0 deletions .gitignore
@@ -198,6 +198,7 @@ docs/build/
docs/_sidebar.yml
docs/reference/
!docs/_static/tp_logo_white_background.png
!docs/explanation/**/*.PNG

# PyBuilder
.pybuilder/
3 changes: 3 additions & 0 deletions _quarto.yml
@@ -138,6 +138,7 @@ quartodoc:
package: transport_performance.osm
contents:
- osm_utils
- validate_osm
- title: "`analyse_network`"
desc: >
A class wrapping r5py network routing to calculate travel times between all origin/destination cells.
@@ -158,3 +159,5 @@ quartodoc:
- defence
- io
- raster

jupyter: python3
61 changes: 61 additions & 0 deletions docs/_static/styles.css
@@ -2,3 +2,64 @@
max-height: 500px;
overflow-y: auto;
}

.jumbotron {
border: 1px solid black;
padding: 15px;
padding-bottom: 0px;
text-align: center;
align-items: center;
}

.grid-container {
display: grid;
gap: 15px;
}

.item1 {
grid-column-start: 1;
grid-column-end: 1;
grid-row-start: 1;
grid-row-end: 1;
}

.item2 {
grid-column-start: 2;
grid-column-end: 2;
grid-row-start: 1;
grid-row-end: 1;
}

.item3 {
grid-column-start: 1;
grid-column-end: 1;
grid-row-start: 2;
grid-row-end: 2;
}

.item4 {
grid-column-start: 2;
grid-column-end: 2;
grid-row-start: 2;
grid-row-end: 2;
}

.item5 {
grid-column-start: 1;
grid-column-end: 1;
grid-row-start: 3;
grid-row-end: 3;
}

.item6 {
grid-column-start: 2;
grid-column-end: 2;
grid-row-start: 3;
grid-row-end: 3;
}

.jumbotron-icon{
font-size: 50px;
margin-top: -0.5rem;
margin-bottom: -0.5rem;
}
136 changes: 132 additions & 4 deletions docs/explanation/calculate_tp/index.qmd
Original file line number Diff line number Diff line change
@@ -1,10 +1,138 @@
---
title: "2. Transport Performance: An Example"
description: An overview of how we used `transport_performance` to calculate the transport performance of urban centre public transit networks.
date-modified: 05/16/2024 # must be in MM/DD/YYYY format
title: "2. Transport Performance: An Overview"
description: |
An overview of using the `transport_performance` package to calculate the
transport performance of urban centre public transit networks.
date-modified: 06/12/2024 # must be in MM/DD/YYYY format
categories: ["Explanation"] # see https://diataxis.fr/tutorials-how-to/#tutorials-how-to, delete as appropriate
toc: true
date-format: iso
---

🚧 Page under construction 🚧
This page discusses the main methods and tools used within the package and
provides links to additional resources for further reading. In particular, it
presents a methodology for assessing the performance of urban centre public
transit networks using `transport_performance`. However, the approach can be
modified and extended to suit the requirements of most transport analyses,
including changes to the:

- Analysis area (no strict requirement on using [Eurostat's urban centre definition][urban centre])
- Date of analysis
- Time of day
- Transport modes such as walking, cycling, public transit, private car or a combination of these modes
- Maximum journey duration

::: {.callout-note}

This page does not cover retrieving input data or `transport_performance` API
usage. See the [how-to](../../how_to/index.qmd),
[tutorials](../../tutorials/index.qmd), and
[API reference](../../reference/index.qmd) pages for more information on these
aspects. Note that `transport_performance` works with any custom boundary
provided, in which case urban centre detection is not required. Similarly,
public transit schedule preprocessing is not required for modalities other
than public transit.

:::

`transport_performance` can be used to assess urban centre public transit
performance by following the overall approach shown in @fig-tp-methods.

::: {#fig-tp-methods layout-nrow=1}

```{mermaid}
flowchart LR
A[Urban centre\ndetection] --> B[Population\npreprocessing]
A --> C[Public transit schedule\npreprocessing]
A --> D[OpenStreetMap\npreprocessing]
B --> E
C --> E
D --> E
E[Transport network\nrouting] --> F[Calculate transport\nperformance]
```


An overview of a methodology for calculating the transport performance of
urban centre public transit networks using `transport_performance`.

:::

The process starts with urban centre detection. The urban centre definition was
created by Eurostat and represents high-density population clusters (see the
[Eurostat level 1 degree of urbanisation methodology document][eurostat-uc] for
more details). In short, an urban centre is a cluster of contiguous
1 km<sup>2</sup> grid cells with a density of at least 1,500
inhabitants/km<sup>2</sup> and a total population of at least 50,000. This
definition is advantageous because it can be applied consistently
internationally.
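
As a rough illustration of the thresholds above, the following sketch flags
candidate urban centres on a small population grid. It is not the
`transport_performance` implementation: the function name
`find_urban_centres` is hypothetical, contiguity is assumed to mean cells that
share an edge, and any further refinement steps in the Eurostat methodology
(such as gap filling) are omitted.

```python
from collections import deque


def find_urban_centres(grid, min_density=1500, min_population=50_000):
    """Return clusters of (row, col) cells that meet both thresholds.

    `grid` holds the population of each 1 km² cell, so a cell's population
    is also its density in inhabitants per km². Illustrative sketch only.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    seen = set()
    centres = []
    for r in range(n_rows):
        for c in range(n_cols):
            if (r, c) in seen or grid[r][c] < min_density:
                continue
            # Flood fill over edge-sharing neighbours (assumed contiguity).
            cluster, queue = [], deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                cluster.append((cr, cc))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (
                        0 <= nr < n_rows
                        and 0 <= nc < n_cols
                        and (nr, nc) not in seen
                        and grid[nr][nc] >= min_density
                    ):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            # Keep the cluster only if its total population is large enough.
            if sum(grid[i][j] for i, j in cluster) >= min_population:
                centres.append(sorted(cluster))
    return centres


example_grid = [
    [20000, 20000, 0],
    [15000, 0, 0],
    [0, 0, 30000],
]
centres = find_urban_centres(example_grid)
```

In this made-up grid, the three dense cells in the top-left corner total
55,000 inhabitants and qualify, whereas the isolated 30,000-inhabitant cell
does not reach the total population threshold.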

`transport_performance` currently works with gridded population estimates. One
such data source is the [Global Human Settlement Layer][ghsl] (GHSL). The
[GHSL-POP][ghsl-pop] layer provides high-resolution estimates with worldwide
coverage. It combines satellite imagery and national census data to produce
population estimates down to 100 metre grids (see [section 2.5 of the GHSL
technical paper][ghsl-pop-methods] for more details). Using
`transport_performance`, it is also possible to reaggregate gridded population
estimates (e.g. from 100m to 200m grids) to balance the granularity of results
against performance at the transport network routing stage.
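
To make the reaggregation idea concrete, here is a minimal sketch that sums a
fine population grid into coarser cells. The `reaggregate` helper is
hypothetical and works on plain nested lists; the package itself operates on
raster data, so treat this only as an illustration of the arithmetic.

```python
def reaggregate(grid, factor=2):
    """Sum population over non-overlapping factor x factor blocks.

    For example, factor=2 turns a grid of 100m cells into 200m cells.
    Illustrative sketch only, not the package's raster implementation.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    return [
        [
            sum(grid[r + dr][c + dc] for dr in range(factor) for dc in range(factor))
            for c in range(0, n_cols, factor)
        ]
        for r in range(0, n_rows, factor)
    ]


fine = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
coarse = reaggregate(fine, factor=2)
```

Summing (rather than averaging) preserves the total population, while a factor
of 2 reduces the number of cells, and so the number of origins and
destinations to route between, by a factor of 4.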

When considering public transit performance, schedule data is a core input (for
other modalities this step is not required). Data in the widely adopted
[General Transit Feed Specification (GTFS)][gtfs-overview] format are required
to define the public transit network within `transport_performance`. Because
these are scheduled data, the effects of delays (such as traffic) are not
accounted for in the final transport performance results.
`transport_performance` provides a range of GTFS validation, cleaning, and
filtering methods to pre-process the inputs for use during the transport
network routing stage.

The underlying route network is built using [OpenStreetMap][osm]
(OSM) data. OSM is an open, community-maintained source of map data worldwide.
OSM data provides the spatial information about the street network, such as
road and pathway locations, speed limits, transport rules and junction
locations. With `transport_performance` it is possible to optimise these data
by spatially filtering OSM files to an area of interest (using [Osmosis]). This
filtering also removes OSM features that are not required for transport routing
(such as buildings and waterways).

The transport network routing stage calculates the feasible journey travel
times over multiple departure times. `transport_performance` uses [R<sup>5</sup>py][r5py]
to undertake performant transit routing with the [Round-Based Public Transit Routing engine (RAPTOR)][raptor].
R<sup>5</sup>py is also highly configurable and caters for a range of transport modalities,
including public transit, private car, cycling, and walking. This improves upon
the ONS Data Science Campus' [previous transport modelling work][dsc-otp] by
calculating robust median travel times over many journeys. Calculated travel
duration at a single journey departure time can vary significantly, depending on
the public transport service availability within the locality of the journey.
Travel time statistics are calculated across multiple consecutive journeys
within a given time window. These statistics are a fairer representation of
average journey travel times within a given area. For more details, see
[Fink, Klumpenhouwer, Saraiva, Pereira, and Tenkanen (2022)][r5py-paper]
and [Conway, Byrd, and van der Linden (2017)][r5-paper].
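
The benefit of summarising over a departure window can be sketched with a few
made-up travel times. The helper below is hypothetical and is not how
R<sup>5</sup>py computes its statistics; it assumes unreachable journeys are
represented as infinity.

```python
import math
from statistics import median


def median_travel_times(times_by_pair):
    """Return the median travel time (minutes) for each origin-destination pair.

    `times_by_pair` maps (origin, destination) to a list of travel times, one
    per departure time in the window. Unreachable journeys are assumed to be
    math.inf. Illustrative sketch only.
    """
    return {pair: median(times) for pair, times in times_by_pair.items()}


# Made-up travel times for five consecutive departures within a time window.
times = {
    ("A", "B"): [30, 50, 34, 38, 32],
    ("C", "B"): [20, math.inf, math.inf],
}
medians = median_travel_times(times)
```

For the pair `("A", "B")`, a single departure could report anything between 30
and 50 minutes, whereas the median over the window is 34 minutes. For
`("C", "B")`, one lucky departure takes 20 minutes, but the median is infinite
because the destination is unreachable for most departures.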

The final stage uses the network routing results (travel times) to calculate
the transport performance. See the [Transport Performance: A Definition](../what_is_tp/index.qmd)
page for more details on this step.

::: {.callout-note}

For more information on the known `transport_performance` package limitations,
see the [limitations and caveats](../limitations/index.qmd) page.

:::


[eurostat-uc]: https://ec.europa.eu/eurostat/documents/3859598/15348338/KS-02-20-499-EN-N.pdf/0d412b58-046f-750b-0f48-7134f1a3a4c2?t=1669111363941#page=35
[ghsl]: https://human-settlement.emergency.copernicus.eu/dataToolsOverview.php
[ghsl-pop]: https://human-settlement.emergency.copernicus.eu/download.php?ds=pop
[ghsl-pop-methods]: https://human-settlement.emergency.copernicus.eu/documents/GHSL_Data_Package_2023.pdf?t=1698413418
[gtfs-overview]: https://gtfs.org/schedule/
[osm]: https://www.openstreetmap.org/about
[r5py]: https://r5py.readthedocs.io/en/stable/
[raptor]: https://www.microsoft.com/en-us/research/wp-content/uploads/2012/01/raptor_alenex.pdf
[r5py-paper]: https://zenodo.org/records/7060438
[r5-paper]: https://core.ac.uk/reader/223242270
[dsc-otp]: https://datasciencecampus.ons.gov.uk/using-open-data-to-understand-hyperlocal-differences-in-uk-public-transport-availability/
[Osmosis]: https://wiki.openstreetmap.org/wiki/Osmosis
[urban centre]: https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Glossary:Urban_centre
Binary file added docs/explanation/what_is_tp/accessible_pop.PNG
58 changes: 55 additions & 3 deletions docs/explanation/what_is_tp/index.qmd
@@ -1,10 +1,62 @@
---
title: 1. What is Transport Performance?
title: "1. Transport Performance: A Definition"
description: An insight into what transport performance is and what it tells us about transport networks.
date-modified: 05/16/2024 # must be in MM/DD/YYYY format
date-modified: 06/11/2024 # must be in MM/DD/YYYY format
categories: ["Explanation"] # see https://diataxis.fr/tutorials-how-to/#tutorials-how-to, delete as appropriate
toc: true
date-format: iso
---

🚧 Page under construction 🚧
Transport Performance (TP) is a metric originally developed by the European Commission in their [2020 work on low carbon urban transport accessibility][euro-commission-paper]. TP puts the population at the centre of its definition, by measuring how efficiently a transport network moves the surrounding population to a destination within a certain time frame. A TP value of 100% would mean all the nearby population can travel to a location within the time threshold.

Since TP is bound by a time frame, it is highly dependent on transport modalities; for example, public transit, private vehicle, cycling, and walking. The example discussed on this page considers the public transit network.

TP is also dependent on the surrounding population and the destination itself, making it highly variable across an area. For this reason, it is calculated on a granular scale to build up the TP picture across an area of interest. The example discussed on this page uses populated 200x200m cells.

@fig-tp-definition illustrates how TP is calculated for one cell in the centre of Newport, Wales using a 45-minute time threshold, an 11.25 km distance limit on the surrounding population, and the public transit network.

::: {.callout-tip}

`transport_performance` is highly configurable. It caters for different modalities and time/distance thresholds (and more!) beyond the configuration presented on this page. See the [tutorials](../../tutorials/index.qmd) and [API reference](../../reference/index.qmd) for more details.

:::

::: {#fig-tp-definition layout-ncol=2}

![Accessible population - the total population that can travel to a cell in central Newport, Wales within 45 minutes by public transit](accessible_pop.PNG){#fig-access}

![Proximity population - the total nearby population to a cell in central Newport, Wales within the distance limit (11.25km)](proximity_pop.PNG){#fig-proxi}

Accessible and proximity population definitions using 200x200m cells and an example destination in the middle of Newport, Wales.<br><span class="figure-source">Source: ONS Data Science Campus, April 2024.</span>
:::

@fig-tp-definition uses a green marker to denote the destination cell and a red dashed line to illustrate the boundary of the nearby population. The dark pink region in @fig-access represents the **accessible population**. This is the total population that can reach the green marker within the time threshold using the transport network. The dark blue region in @fig-proxi represents the **proximity population**. This is the total nearby population within the distance limit. To calculate the total accessible and proximity populations, we sum the population across all highlighted cells in the respective figure. The **transport performance** of the network when travelling to the destination is then the ratio of the accessible population to the proximity population (multiplied by 100 to convert to a percentage), as shown in @eq-tp:

$$
T_i(t_{max}, d_{max}) = 100 \times \frac{P_{access, i}}{P_{proxi, i}}
$$ {#eq-tp}

Where:

- $T_i$ is the transport performance of destination cell, $i$.
- $t_{max}$ is the maximum time threshold.
- $d_{max}$ is the maximum distance threshold (the limit on proximity population from the destination).
- $P_{access, i}$ is the total population that can travel to destination cell, $i$, within $t_{max}$ and $d_{max}$.
- $P_{proxi, i}$ is the total population within $d_{max}$ of destination cell, $i$.
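
As a worked illustration of the transport performance ratio, the sketch below
computes it for a single destination cell from made-up inputs. The function
name and input layout are hypothetical rather than part of the
`transport_performance` API, and treating both thresholds as inclusive is an
assumption.

```python
def transport_performance(cells, t_max=45, d_max=11.25):
    """Return the transport performance (%) of one destination cell.

    `cells` is a list of (population, travel_time_minutes, distance_km)
    tuples, one per origin cell, measured relative to the destination.
    Illustrative sketch only; thresholds are assumed to be inclusive.
    """
    # Proximity population: everyone within the distance limit.
    proximity = sum(pop for pop, _, dist in cells if dist <= d_max)
    # Accessible population: the subset that can also arrive within t_max.
    accessible = sum(
        pop for pop, time, dist in cells if dist <= d_max and time <= t_max
    )
    if proximity == 0:
        raise ValueError("no population within the distance limit")
    return 100 * accessible / proximity


# Made-up origin cells: (population, travel time in minutes, distance in km).
origins = [
    (1000, 20, 2.0),
    (500, 50, 5.0),    # nearby, but too slow to reach the destination
    (2000, 40, 10.0),
    (800, 30, 12.0),   # reachable in time, but beyond the distance limit
]
tp = transport_performance(origins)
```

Here the proximity population is 3,500 and the accessible population is 3,000,
giving a transport performance of roughly 85.7%.
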

This calculation is repeated to construct the transport performance throughout an entire area of interest (in this case, across every destination cell within the urban centre). An example of this for the Newport, Wales [urban centre] is shown in @fig-tp-newport.

::: {#fig-tp-newport layout-ncol="1"}

![](newport_tp.PNG){width=100%}

Transport performance across Newport, Wales. Public transit within 45 minutes. The red line denotes the boundary of the urban centre.<br><span class="figure-source">Source: ONS Data Science Campus, April 2024.</span>

:::

@fig-tp-newport shows how transport performance can vary across an area on a granular scale. The yellow/light green region indicates that ~50-60% of the surrounding population can reach the main city centre of Newport, Wales using public transit within 45 minutes. The transport performance also generally decreases closer to the outskirts of the urban centre. This means a smaller proportion of the surrounding population can reach the dark blue/purple areas using public transit within 45 minutes. Overall, TP provides detailed, hyperlocal insights into how the performance of the transport network varies throughout an area.

Calculating transport performance requires several stages of input data processing and transport network travel time estimation. The methods and tools used by this Python package are discussed in more detail on the [Transport Performance: An Overview](../calculate_tp/index.qmd) page. For more insights on how to use `transport_performance` itself, check out the [tutorials](../../tutorials/index.qmd) and [API reference](../../reference/index.qmd).

[euro-commission-paper]: https://ec.europa.eu/regional_policy/en/information/publications/working-papers/2022/low-carbon-urban-accessibility
[urban centre]: https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Glossary:Urban_centre
Binary file added docs/explanation/what_is_tp/newport_tp.PNG
Binary file added docs/explanation/what_is_tp/proximity_pop.PNG
31 changes: 21 additions & 10 deletions docs/tutorials/osm/index.qmd
@@ -148,9 +148,9 @@ that you have `osmosis` installed for this task.

Define a `filtered_osm_path` object to save the filtered pbf file to.

Use the `filter_osm()` function to restrict the PBF file to the extent of
`BBOX_LIST`. Inspect `help(filter_osm)` for information on all available
parameters.
Use the [`filter_osm()`](/docs/reference/osm_utils.qmd#transport_performance.osm.osm_utils.filter_osm)
function to restrict the PBF file to the extent of `BBOX_LIST`. Inspect the API
reference or use `help(filter_osm)` for information on all available parameters.

### Hint

@@ -174,6 +174,15 @@ filter_osm(

:::

::: {.callout-note}

When using `filter_osm()`, the default behaviour is to remove elements tagged
as buildings, waterways, landuse, and natural since they are not required for
transport routing and removing them significantly reduces file size.
If this is not desired, set `tag_filter=False`.

:::

Notice that `osmosis` is quite chatty and will print various exceptions
originating from the Java code. If the filter operation was performed
successfully, you should see `INFO: Pipeline complete.` and an execution time
@@ -209,9 +218,9 @@ tag IDs that are available.

### Task

Use the `validate_osm.FindIds` class to discover the full list of IDs within
the pbf file saved at `filtered_osm_path`. Assign the class instance to
`id_finder`.
Use the [`validate_osm.FindIds`](/docs/reference/validate_osm.qmd#transport_performance.osm.validate_osm.FindIds)
class to discover the full list of IDs within the pbf file saved at
`filtered_osm_path`. Assign the class instance to `id_finder`.

Use an appropriately named method to count the available IDs within the file.

@@ -285,8 +294,9 @@ forward to visualise the points on a map.

### Task

Assign `validate_osm.FindLocation` to an instance called `loc_finder`. You will
need to point this class to the same filtered PBF file as you used previously.
Assign [`validate_osm.FindLocations`](/docs/reference/validate_osm.qmd#transport_performance.osm.validate_osm.FindLocations)
to an instance called `loc_finder`. You will need to point this class to the
same filtered PBF file as you used previously.

Using the `way_ids` list from a previous task, pass the first 5 IDs to
`loc_finder.plot_ids()` in a list. Ensure that you specify that the
@@ -314,7 +324,9 @@ Visualising these features of the PBF file can help to validate features of the
local transit network, particularly in areas where changes to infrastructure
are ongoing. Examining the features present in relation to our bounding box, we
can see that the geometries may not be neatly cropped to the extent of the
bounding box.
bounding box. This is because `filter_osm()` ensures all ways and relations
are complete when cropping to a bounding box. This means roads and paths that
traverse the edge of the bounding box remain whole.

Below we display every way (and their member nodes) in the PBF relative to the
bounding box crop we applied (purple).
@@ -329,7 +341,6 @@ poly_gdf = gpd.GeoDataFrame({"geometry": poly}, crs=4326, index=[0])
poly_gdf.explore(color="purple", m=imap)
```


The `filter_osm` function has reduced the file size but has also retained
features outside of the crop that we specified. This is because removing a
feature outside of the crop, that is referenced by a feature within the crop