diff --git a/.github/workflows/quarto-render.yml b/.github/workflows/quarto-render.yml
index 187fb1a8..e2ce7dfd 100644
--- a/.github/workflows/quarto-render.yml
+++ b/.github/workflows/quarto-render.yml
@@ -38,6 +38,7 @@ jobs:
run: |
sudo apt update
sudo apt install -y libgeos-dev
+ sudo apt install -y osmosis
shell: sh
- name: Build API reference pages
run: |
diff --git a/.gitignore b/.gitignore
index 232867ba..96994f6c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -198,6 +198,7 @@ docs/build/
docs/_sidebar.yml
docs/reference/
!docs/_static/tp_logo_white_background.png
+!docs/explanation/**/*.PNG
# PyBuilder
.pybuilder/
diff --git a/_quarto.yml b/_quarto.yml
index 58d1e82d..8e7da31d 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -138,6 +138,7 @@ quartodoc:
package: transport_performance.osm
contents:
- osm_utils
+ - validate_osm
- title: "`analyse_network`"
desc: >
A class wrapping r5py network routing to calculate travel times between all origin/destination cells.
@@ -158,3 +159,5 @@ quartodoc:
- defence
- io
- raster
+
+jupyter: python3
diff --git a/docs/_static/styles.css b/docs/_static/styles.css
index 98e86887..0ee09761 100644
--- a/docs/_static/styles.css
+++ b/docs/_static/styles.css
@@ -2,3 +2,64 @@
max-height: 500px;
overflow-y: auto;
}
+
+.jumbotron {
+ border: 1px solid black;
+ padding: 15px;
+ padding-bottom: 0px;
+ text-align: center;
+ align-items: center;
+}
+
+.grid-container {
+ display: grid;
+ gap: 15px;
+}
+
+.item1 {
+ grid-column-start: 1;
+ grid-column-end: 1;
+ grid-row-start: 1;
+ grid-row-end: 1;
+}
+
+.item2 {
+ grid-column-start: 2;
+ grid-column-end: 2;
+ grid-row-start: 1;
+ grid-row-end: 1;
+}
+
+.item3 {
+ grid-column-start: 1;
+ grid-column-end: 1;
+ grid-row-start: 2;
+ grid-row-end: 2;
+}
+
+.item4 {
+ grid-column-start: 2;
+ grid-column-end: 2;
+ grid-row-start: 2;
+ grid-row-end: 2;
+}
+
+.item5 {
+ grid-column-start: 1;
+ grid-column-end: 1;
+ grid-row-start: 3;
+ grid-row-end: 3;
+}
+
+.item6 {
+ grid-column-start: 2;
+ grid-column-end: 2;
+ grid-row-start: 3;
+ grid-row-end: 3;
+}
+
+.jumbotron-icon{
+ font-size: 50px;
+ margin-top: -0.5rem;
+ margin-bottom: -0.5rem;
+}
diff --git a/docs/explanation/calculate_tp/index.qmd b/docs/explanation/calculate_tp/index.qmd
index c3957942..e18a770e 100644
--- a/docs/explanation/calculate_tp/index.qmd
+++ b/docs/explanation/calculate_tp/index.qmd
@@ -1,10 +1,138 @@
---
-title: "2. Transport Performance: An Example"
-description: An overview of how we used `transport_performance` to calculate the transport performance of urban centre public transit networks.
-date-modified: 05/16/2024 # must be in MM/DD/YYYY format
+title: "2. Transport Performance: An Overview"
+description: |
+ An overview of using the `transport_performance` package to calculate the
+ transport performance of urban centre public transit networks.
+date-modified: 06/12/2024 # must be in MM/DD/YYYY format
categories: ["Explanation"] # see https://diataxis.fr/tutorials-how-to/#tutorials-how-to, delete as appropriate
toc: true
date-format: iso
---
-🚧 Page under construction 🚧
+This page discusses the main methods and tools used within the package and
+provides links to additional resources for further reading. In particular,
+this page presents a methodology for assessing the performance of urban centre
+public transit networks using `transport_performance`. However, the approach
+presented can be modified and extended to suit the requirements of most
+transport analyses, including the choice of:
+
+
+- Analysis area (no strict requirement on using [Eurostat's urban centre definition][urban centre])
+- Date of analysis
+- Time of day
+- Transport modes such as walking, cycling, public transit, private car or a combination of these modes
+- Maximum journey duration
+
+::: {.callout-note}
+
+This page does not cover retrieving input data or `transport_performance` API
+usage. See the [how-to](../../how_to/index.qmd),
+[tutorials](../../tutorials/index.qmd), and
+[API reference](../../reference/index.qmd) pages for more information on these
+aspects. Note that `transport_performance` will work with any custom boundary,
+in which case urban centre detection is not required. Likewise, public transit
+schedule preprocessing is not required for modalities other than public
+transit.
+
+:::
+
+`transport_performance` can be used to assess urban centre public transit
+performance by following the overall approach shown in @fig-tp-methods.
+
+::: {#fig-tp-methods layout-nrow=1}
+
+```{mermaid}
+flowchart LR
+ A[Urban centre\ndetection] --> B[Population\npreprocessing]
+ A --> C[Public transit schedule\npreprocessing]
+ A --> D[OpenStreetMap\npreprocessing]
+ B --> E
+ C --> E
+ D --> E
+ E[Transport network\nrouting] --> F[Calculate transport\nperformance]
+
+```
+
+
+An overview of a methodology for calculating the transport performance of
+urban centre public transit networks using `transport_performance`.
+
+:::
+
+The process starts with urban centre detection. The urban centre definition
+was created by Eurostat and represents high-density population clusters (see
+the [Eurostat level 1 degree of urbanisation methodology document][eurostat-uc]
+for more details). In short, an urban centre is a cluster of contiguous 1 km²
+grid cells, each with a density of at least 1,500 inhabitants/km², and a total
+population of at least 50,000. This definition is advantageous since it can be
+applied consistently across countries.
+
+`transport_performance` currently works with gridded population estimates. One
+such data source is the [Global Human Settlement Layer][ghsl] (GHSL). The
+[GHSL-POP][ghsl-pop] layer provides high-resolution estimates with worldwide
+coverage. It combines satellite imagery and national census data to produce
+population estimates on grids as fine as 100 metres (see [section 2.5 of the
+GHSL technical paper][ghsl-pop-methods] for more details). Using
+`transport_performance`, it is also possible to reaggregate gridded population
+estimates (e.g. from 100 m to 200 m grids) to balance the granularity of the
+results against performance at the transport network routing stage.
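+
+As a sketch of what reaggregation involves (assuming the coarser grid is
+aligned with the finer one and that population counts are simply summed; this
+is illustrative notation, not necessarily the package's exact resampling
+method), each 200 m cell $(I, J)$ takes the total of the four 100 m cells it
+contains:
+
+$$
+P^{200}_{I,J} = \sum_{a=0}^{1} \sum_{b=0}^{1} P^{100}_{2I+a,\, 2J+b}
+$$
+
+For example, four 100 m cells containing 120, 80, 0, and 45 people would
+combine into a single 200 m cell of 245 people. The total population is
+preserved, while the number of cells to route between falls by a factor of
+four.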
+
+When considering public transit performance, schedule data are a core input
+(for other modalities this step is not required). Data in the widely adopted
+[General Transit Feed Specification (GTFS)][gtfs-overview] format are required
+to define the public transit network within `transport_performance`. Since
+these are scheduled data, the effects of delays (such as traffic) are not
+accounted for in the final transport performance results.
+`transport_performance` provides a range of GTFS validation, cleaning, and
+filtering methods to preprocess the inputs for use during the transport
+network routing stage.
+
+The underlying route network is built using [OpenStreetMap][osm]
+(OSM) data. OSM is an open, community-maintained source of map data worldwide.
+OSM data provide spatial information about the street network, such as road
+and pathway locations, speed limits, transport rules, and junction locations.
+With `transport_performance`, it is possible to reduce the size of these data
+by spatially filtering OSM files to an area of interest (using [Osmosis]). By
+default, this filtering also removes OSM features that are not required for
+transport routing (such as buildings and waterways).
+
+The transport network routing stage calculates feasible journey travel times
+over multiple departure times. `transport_performance` uses [r5py][r5py] to
+undertake performant transit routing with the
+[Round-Based Public Transit Routing (RAPTOR)][raptor] algorithm. r5py is also
+highly configurable and caters for a range of transport modalities, including
+public transit, private car, cycling, and walking. This improves upon the ONS
+Data Science Campus' [previous transport modelling work][dsc-otp] by
+calculating robust median travel times over many journeys. The travel duration
+calculated for a single departure time can vary significantly, depending on
+public transport service availability in the locality of the journey.
+Calculating travel time statistics across multiple consecutive departures
+within a given time window therefore gives a fairer representation of typical
+journey travel times within an area. For more details, see
+[Fink, Klumpenhouwer, Saraiva, Pereira, and Tenkanen (2022)][r5py-paper]
+and [Conway, Byrd, and van der Linden (2017)][r5-paper].
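+
+As an illustrative sketch (the notation is hypothetical, not the package's),
+the robust travel time from an origin cell $j$ to a destination cell $i$ can be
+written as the median over $K$ consecutive departure times $t_1, \dots, t_K$
+within the time window:
+
+$$
+\tilde{\tau}_{ji} = \operatorname{median}\left\{ \tau_{ji}(t_1), \tau_{ji}(t_2), \dots, \tau_{ji}(t_K) \right\}
+$$
+
+where $\tau_{ji}(t_k)$ is the travel time for a journey departing at $t_k$.
+For example, departures with travel times of 28, 31, 33, 47, and 52 minutes
+have a median of 33 minutes, so a single departure that just misses a
+connecting service (here, 52 minutes) does not dominate the result.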
+
+The final stage uses the network routing results (travel times) to calculate
+the transport performance. See the [Transport Performance: A Definition](../what_is_tp/index.qmd)
+page for more details on this step.
+
+::: {.callout-note}
+
+For more information on the known `transport_performance` package limitations,
+see the [limitations and caveats](../limitations/index.qmd) page.
+
+:::
+
+
+[eurostat-uc]: https://ec.europa.eu/eurostat/documents/3859598/15348338/KS-02-20-499-EN-N.pdf/0d412b58-046f-750b-0f48-7134f1a3a4c2?t=1669111363941#page=35
+[ghsl]: https://human-settlement.emergency.copernicus.eu/dataToolsOverview.php
+[ghsl-pop]: https://human-settlement.emergency.copernicus.eu/download.php?ds=pop
+[ghsl-pop-methods]: https://human-settlement.emergency.copernicus.eu/documents/GHSL_Data_Package_2023.pdf?t=1698413418
+[gtfs-overview]: https://gtfs.org/schedule/
+[osm]: https://www.openstreetmap.org/about
+[r5py]: https://r5py.readthedocs.io/en/stable/
+[raptor]: https://www.microsoft.com/en-us/research/wp-content/uploads/2012/01/raptor_alenex.pdf
+[r5py-paper]: https://zenodo.org/records/7060438
+[r5-paper]: https://core.ac.uk/reader/223242270
+[dsc-otp]: https://datasciencecampus.ons.gov.uk/using-open-data-to-understand-hyperlocal-differences-in-uk-public-transport-availability/
+[Osmosis]: https://wiki.openstreetmap.org/wiki/Osmosis
+[urban centre]: https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Glossary:Urban_centre
diff --git a/docs/explanation/what_is_tp/accessible_pop.PNG b/docs/explanation/what_is_tp/accessible_pop.PNG
new file mode 100644
index 00000000..36fdf364
Binary files /dev/null and b/docs/explanation/what_is_tp/accessible_pop.PNG differ
diff --git a/docs/explanation/what_is_tp/index.qmd b/docs/explanation/what_is_tp/index.qmd
index 6779bdd4..7d00a96d 100644
--- a/docs/explanation/what_is_tp/index.qmd
+++ b/docs/explanation/what_is_tp/index.qmd
@@ -1,10 +1,62 @@
---
-title: 1. What is Transport Performance?
+title: "1. Transport Performance: A Definition"
description: An insight into what transport performance is and what it tells us about transport networks.
-date-modified: 05/16/2024 # must be in MM/DD/YYYY format
+date-modified: 06/11/2024 # must be in MM/DD/YYYY format
categories: ["Explanation"] # see https://diataxis.fr/tutorials-how-to/#tutorials-how-to, delete as appropriate
toc: true
date-format: iso
---
-🚧 Page under construction 🚧
+Transport Performance (TP) is a metric originally developed by the European Commission in its [2020 work on low carbon urban transport accessibility][euro-commission-paper]. TP puts the population at the centre of its definition by measuring how efficiently a transport network moves the surrounding population to a destination within a certain time frame. A TP value of 100% means that all of the nearby population can travel to the destination within the time threshold.
+
+Since TP is bound by a time frame, it is highly dependent on the transport modality used, for example public transit, private vehicle, cycling, or walking. The example discussed on this page considers the public transit network.
+
+TP is also dependent on the surrounding population and the destination itself, making it highly variable across an area. For this reason, it is calculated on a granular scale to build up the TP picture across an area of interest. The example discussed on this page uses populated 200x200m cells.
+
+@fig-tp-definition illustrates how TP is calculated for one cell in the centre of Newport, Wales, using a 45-minute time threshold, an 11.25 km distance limit on the surrounding population, and the public transit network.
+
+::: {.callout-tip}
+
+`transport_performance` is highly configurable. It caters for different modalities and time/distance thresholds (and more!) beyond the configuration presented on this page. See the [tutorials](../../tutorials/index.qmd) and [API reference](../../reference/index.qmd) for more details.
+
+:::
+
+::: {#fig-tp-definition layout-ncol=2}
+
+![Accessible population - the total population that can travel to a cell in central Newport, Wales within 45 minutes by public transit](accessible_pop.PNG){#fig-access}
+
+![Proximity population - the total nearby population to a cell in central Newport, Wales within the distance limit (11.25 km)](proximity_pop.PNG){#fig-proxi}
+
+Accessible and proximity population definitions using 200x200m cells and an example destination in the middle of Newport, Wales.
+Source: ONS Data Science Campus, April 2024.
+:::
+
+@fig-tp-definition uses a green marker to denote the destination cell and a red dashed line to illustrate the boundary of the nearby population. The dark pink region in @fig-access represents the **accessible population**: the total population that can reach the green marker within the time threshold using the transport network. The dark blue region in @fig-proxi represents the **proximity population**: the total nearby population within the distance limit. The total accessible and proximity populations are calculated by summing the population across the highlighted cells in each panel. The **transport performance** of the network when travelling to the destination is then the ratio of the accessible to the proximity population (multiplied by 100 to convert to a percentage), as shown in @eq-tp:
+
+$$
+T_i(t_{max}, d_{max}) = 100 \times \frac{P_{access, i}}{P_{proxi, i}}
+$$ {#eq-tp}
+
+Where:
+
+- $T_i$ is the transport performance of destination cell, $i$.
+- $t_{max}$ is the maximum time threshold.
+- $d_{max}$ is the maximum distance threshold (the limit on proximity population from the destination).
+- $P_{access, i}$ is the total population that can travel to destination cell, $i$, within $t_{max}$ and $d_{max}$.
+- $P_{proxi, i}$ is the total population within $d_{max}$ of destination cell, $i$.
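+
+To make the two population terms concrete, they can be sketched as sums over
+origin cells $j$ with population $p_j$, travel time $\tau_{ji}$ to destination
+cell $i$, and distance $\delta_{ji}$ from $i$ (illustrative notation; distance
+is assumed here to be a straight-line distance between cell centroids):
+
+$$
+P_{access, i} = \sum_{j} p_j \,\mathbf{1}\left[\tau_{ji} \le t_{max}\right] \mathbf{1}\left[\delta_{ji} \le d_{max}\right],
+\qquad
+P_{proxi, i} = \sum_{j} p_j \,\mathbf{1}\left[\delta_{ji} \le d_{max}\right]
+$$
+
+where $\mathbf{1}[\cdot]$ is 1 if the condition holds and 0 otherwise. For
+example, if 30,000 of the 60,000 people within $d_{max}$ of cell $i$ can reach
+it within $t_{max}$, then @eq-tp gives $T_i = 100 \times 30{,}000 / 60{,}000 = 50\%$.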
+
+This calculation is repeated to construct the transport performance throughout an entire area of interest (in this case across every destination cell within the urban centre). An example of this for the Newport, Wales [urban centre] is shown in @fig-tp-newport.
+
+::: {#fig-tp-newport layout-ncol="1"}
+
+![](newport_tp.PNG){width="100%"}
+
+Transport performance across Newport, Wales. Public transit within 45 minutes. The red line denotes the boundary of the urban centre.
+Source: ONS Data Science Campus, April 2024.
+
+:::
+
+@fig-tp-newport shows how transport performance can vary across an area on a granular scale. The yellow/light green region indicates that ~50-60% of the surrounding population can reach the main city centre of Newport, Wales using public transit within 45 minutes. Transport performance also generally decreases towards the outskirts of the urban centre, meaning a smaller proportion of the surrounding population can reach the dark blue/purple areas using public transit within 45 minutes. Overall, it provides detailed, hyperlocal insights into how the performance of the transport network varies throughout an area.
+
+Calculating transport performance requires several stages of input data processing and transport network travel time estimation. The methods and tools used by this Python package are discussed in more detail on the [Transport Performance: An Overview](../calculate_tp/index.qmd) page. For more insights on how to use `transport_performance` itself, check out the [tutorials](../../tutorials/index.qmd) and [API reference](../../reference/index.qmd).
+
+[euro-commission-paper]: https://ec.europa.eu/regional_policy/en/information/publications/working-papers/2022/low-carbon-urban-accessibility
+[urban centre]: https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Glossary:Urban_centre
diff --git a/docs/explanation/what_is_tp/newport_tp.PNG b/docs/explanation/what_is_tp/newport_tp.PNG
new file mode 100644
index 00000000..292df8ba
Binary files /dev/null and b/docs/explanation/what_is_tp/newport_tp.PNG differ
diff --git a/docs/explanation/what_is_tp/proximity_pop.PNG b/docs/explanation/what_is_tp/proximity_pop.PNG
new file mode 100644
index 00000000..b18c3f25
Binary files /dev/null and b/docs/explanation/what_is_tp/proximity_pop.PNG differ
diff --git a/docs/tutorials/osm/index.qmd b/docs/tutorials/osm/index.qmd
index 230d2c61..7d34d427 100644
--- a/docs/tutorials/osm/index.qmd
+++ b/docs/tutorials/osm/index.qmd
@@ -148,9 +148,9 @@ that you have `osmosis` installed for this task.
Define a `filtered_osm_path` object to save the filtered pbf file to.
-Use the `filter_osm()` function to restrict the PBF file to the extent of
-`BBOX_LIST`. Inspect `help(filter_osm)` for information on all available
-parameters.
+Use the [`filter_osm()`](/docs/reference/osm_utils.qmd#transport_performance.osm.osm_utils.filter_osm)
+function to restrict the PBF file to the extent of `BBOX_LIST`. Inspect the API
+reference or use `help(filter_osm)` for information on all available parameters.
### Hint
@@ -174,6 +174,15 @@ filter_osm(
:::
+::: {.callout-note}
+
+When using `filter_osm()`, the default behaviour is to remove elements tagged
+as buildings, waterways, landuse, and natural since they are not required for
+transport routing and removing them significantly reduces file size.
+If this is not desired, set `tag_filter=False`.
+
+:::
+
Notice that `osmosis` is quite chatty and will print various exceptions
originating from the Java code. If the filter operation was performed
successfully, you should see `INFO: Pipeline complete.` and an execution time
@@ -209,9 +218,9 @@ tag IDs that are available.
### Task
-Use the `validate_osm.FindIds` class to discover the full list of IDs within
-the pbf file saved at `filtered_osm_path`. Assign the class instance to
-`id_finder`.
+Use the [`validate_osm.FindIds`](/docs/reference/validate_osm.qmd#transport_performance.osm.validate_osm.FindIds)
+class to discover the full list of IDs within the pbf file saved at
+`filtered_osm_path`. Assign the class instance to `id_finder`.
Use an appropriately named method to count the available IDs within the file.
@@ -285,8 +294,9 @@ forward to visualise the points on a map.
### Task
-Assign `validate_osm.FindLocation` to an instance called `loc_finder`. You will
-need to point this class to the same filtered PBF file as you used previously.
+Assign [`validate_osm.FindLocations`](/docs/reference/validate_osm.qmd#transport_performance.osm.validate_osm.FindLocations)
+to an instance called `loc_finder`. You will need to point this class to the
+same filtered PBF file as you used previously.
Using the `way_ids` list from a previous task, pass the first 5 IDs to
`loc_finder.plot_ids()` in a list. Ensure that you specify that the
@@ -314,7 +324,9 @@ Visualising these features of the PBF file can help to validate features of the
local transit network, particularly in areas where changes to infrastructure
are ongoing. Examining the features present in relation to our bounding box, we
can see that the geometries may not be neatly cropped to the extent of the
-bounding box.
+bounding box. This is because `filter_osm()` ensures all ways and relations
+are complete when cropping to a bounding box. This means roads and paths that
+traverse the edge of the bounding box remain whole.
Below we display every way (and their member nodes) in the PBF relative to the
bounding box crop we applied (purple).
@@ -329,7 +341,6 @@ poly_gdf = gpd.GeoDataFrame({"geometry": poly}, crs=4326, index=[0])
poly_gdf.explore(color="purple", m=imap)
```
-
The `filter_osm` function has reduced the file size but has also retained
features outside of the crop that we specified. This is because removing a
feature outside of the crop, that is referenced by a feature within the crop
diff --git a/index.qmd b/index.qmd
index 19e1a7cc..f8cb4d09 100644
--- a/index.qmd
+++ b/index.qmd
@@ -1,6 +1,14 @@
---
title: "`transport_performance` documentation"
-toc: false
+title-block-banner: true
+date-format: iso
+description: |
+ A Python package bringing together open-source data, tools, and research to
+ make transport network analyses simpler, more reproducible, and more
+ consistent for everyone.
+toc: true
+toc-title: "On this homepage"
sidebar: false
about:
template: marquee
@@ -10,31 +18,83 @@ about:
text: GitHub
---
-![](docs/_static/tp_logo_white_background.png){width="15em" fig-align="center"}
+## What is `transport_performance`?
+
+The performance of transport networks is highly variable within and between
+countries. There is often a lack of consistent and comparable data, which can
+make it difficult to understand these differences. This is typically because
+of computational complexity, a lack of transparency (closed-source and paid
+services), and inconsistent data (format and availability).
+
+The `transport_performance` Python package helps to reduce barriers to
+transport analysis. It allows developers to:
-## What is this and why does it exist?
+- Define an [urban centre] boundary based on population density;
+- Inspect, clean, and process [public transit timetable data][gtfs] and
+[OpenStreetMap data][osm]; and
+- Conduct [multimodal routing][r5py] and calculate a range of
+[transport metrics][eurostat-paper].
-Description
+::: {.callout-tip}
+Check out the [transport performance Docker image][tp-docker] 🐳!
+This aims to simplify the dependency installation and end-to-end use of
+`transport_performance`.
+:::
## Where do I go now?
These docs are structured in accordance with the [Diátaxis][diataxis] framework:
-- If you're looking to get started with the package quickly, head over to the [Getting Started](docs/getting_started/index.qmd) page.
-- For more information on the `transport_performance` package, refer to the [Explanation section](docs/explanation/index.qmd).
-- The [How-To](docs/how_to/index.qmd) pages provide you with instructions on things like retrieving input data.
-- If you're interested in learning how to use the package, head over to the [Tutorials](docs/tutorials/index.qmd).
-- The `transport_performance` technical reference can be found here: [API reference](docs/reference/index.qmd).
+
+- Want to get up and running with `transport_performance` quickly? See [Getting Started](docs/getting_started/index.qmd).
+- Need more details on the methods and tools used within `transport_performance`? See [Explanation](docs/explanation/index.qmd).
+- Looking for guidance on how to get something done (e.g. find input data)? See [How-To](docs/how_to/index.qmd).
+- Interested in learning how to use `transport_performance` through examples? See [Tutorials](docs/tutorials/index.qmd).
+- Need a technical reference covering the `transport_performance` API? See the [API reference](docs/reference/index.qmd).
+- Want to contribute to the development of `transport_performance`? Head over to GitHub.