V0.2 behaviour
In popgetter v0.2, the geographic query does not use a consistent CRS across countries.
Proposed v0.3 behaviour
Overall
For each country, there is likely to be one (or a small number of) commonly used CRS. Often, this will be the CRS used in the upstream source geometry data. It is reasonable to expect that users might wish to use that CRS for country-specific applications.
For global applications, users may prefer not to handle a separate CRS for each country, and instead use one of the popular global CRSs (e.g., WGS84 or Web Mercator).
We should request feedback on whether WGS84, Web Mercator, or both would be the most useful. For the rest of this description, I am assuming just WGS84 for simplicity.
“poppusher” behaviour
Poppusher should publish two copies of the geographic data, in two CRSs:
The country’s “local” CRS, as used by the upstream data source.
A copy reprojected to WGS84.
If the upstream data source is already in WGS84, then only one copy of the data should be published.
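The publishing rule above (a local copy plus a WGS84 copy, unless the source is already in WGS84) can be sketched as a small decision function. This is a minimal sketch, not poppusher code: the function names are hypothetical, CRSs are assumed to be identified by "EPSG:<code>" strings, and WGS84 is assumed to be EPSG:4326. The reprojection itself would be done with geopandas (e.g. `GeoDataFrame.to_crs`) and is not shown.

```python
# Hypothetical helpers, not part of poppusher.
# Assumption: CRSs are "EPSG:<code>" strings and WGS84 is EPSG:4326.
WGS84 = "EPSG:4326"


def normalise_crs(crs: str) -> str:
    """Normalise an authority:code string, e.g. ' epsg:2157 ' -> 'EPSG:2157'."""
    return crs.strip().upper()


def crs_copies_to_publish(source_crs: str) -> list[str]:
    """Return the CRSs in which a country's geometry should be published.

    The first entry is always the country's "local" (upstream) CRS.
    A WGS84 copy is added only when the source is not already WGS84.
    """
    local = normalise_crs(source_crs)
    if local == WGS84:
        return [local]
    return [local, WGS84]


# EPSG:27700 (British National Grid) is used here only as an example input.
print(crs_copies_to_publish("EPSG:27700"))  # ['EPSG:27700', 'EPSG:4326']
print(crs_copies_to_publish("epsg:4326"))   # ['EPSG:4326']
```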
Changes to pipelines:
The country-specific pipelines need only handle the source CRS.
The cloud_publisher pipeline should contain a new node to reproject the data to WGS84 and publish it.
Somewhere within the metadata model, we will need to represent the CRS at the country or dataset level.
As we add additional non-census datasets, we might need to consider the case where, for a single country, there are two upstream data sources published in different CRSs.
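One possible shape for this metadata, sketched with hypothetical Python dataclasses (these are not the actual popgetter metadata classes): the CRS is stored at the country level, with an optional dataset-level override to cover a country whose upstream sources are published in different CRSs. WGS84 is again assumed to be EPSG:4326.

```python
from dataclasses import dataclass
from typing import Optional

WGS84 = "EPSG:4326"  # assumption: WGS84 identified as EPSG:4326


@dataclass
class CountryMetadata:
    """Hypothetical country-level metadata."""
    country_id: str
    # The CRS most commonly used for this country, usually the upstream one.
    local_crs: str


@dataclass
class GeometryMetadata:
    """Hypothetical dataset-level geometry metadata."""
    country: CountryMetadata
    # Optional override for a dataset whose upstream source uses a
    # different CRS from the country's usual one.
    crs_override: Optional[str] = None

    @property
    def local_crs(self) -> str:
        return self.crs_override or self.country.local_crs

    @property
    def available_crs(self) -> list[str]:
        # A WGS84 copy is always published; no extra copy if already WGS84.
        crs = self.local_crs
        return [crs] if crs == WGS84 else [crs, WGS84]


ie = CountryMetadata("ie", "EPSG:2157")  # hypothetical future country entry
print(GeometryMetadata(ie).available_crs)  # ['EPSG:2157', 'EPSG:4326']
print(GeometryMetadata(ie, crs_override=WGS84).available_crs)  # ['EPSG:4326']
```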
“popgetter-client” behaviour
The overall principle is that popgetter-client should not need to transform any coordinates; it only needs to be aware of which CRS they are in.
The CLI options / API should gain the following parameters and options:
Where the --bbox parameter is used, there should be an optional additional parameter, --bbox-crs: the CRS of the bounding-box query (defaults to WGS84).
For the data command, there should be a --crs parameter: the CRS in which geographic data should be returned (default ???).
popgetter countries should print the CRSs available for each country. (If this is too much information to fit in the existing table, it could be shown via a --verbose option.)
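To make the proposed options concrete, here is a hypothetical sketch of the command-line surface using Python's argparse. It is not popgetter's implementation; it only mirrors the options proposed above. The --bbox coordinate order (min x, min y, max x, max y) is an assumption, and --crs is deliberately left without a default because the default is still undecided.

```python
import argparse

WGS84 = "EPSG:4326"  # assumption: WGS84 identified as EPSG:4326


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the proposed CRS-related options."""
    parser = argparse.ArgumentParser(prog="popgetter")
    sub = parser.add_subparsers(dest="command", required=True)

    data = sub.add_parser("data", help="Download data")
    # Assumed coordinate order: min x, min y, max x, max y.
    data.add_argument("--bbox", nargs=4, type=float,
                      metavar=("MIN_X", "MIN_Y", "MAX_X", "MAX_Y"))
    data.add_argument("--bbox-crs", default=WGS84,
                      help="CRS of the --bbox coordinates (default: WGS84)")
    # The default output CRS is an open question, so none is set here.
    data.add_argument("--crs", default=None,
                      help="CRS in which geographic data should be returned")

    countries = sub.add_parser("countries", help="List countries")
    countries.add_argument("--verbose", action="store_true",
                           help="Also list the CRSs available for each country")
    return parser


args = build_parser().parse_args(
    ["data", "--bbox", "600000", "800000", "700000", "900000",
     "--bbox-crs", "EPSG:2157"]
)
print(args.bbox, args.bbox_crs, args.crs)
# [600000.0, 800000.0, 700000.0, 900000.0] EPSG:2157 None
```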
Interaction of CRSs and BBox queries that span more than one country
From the most permissive to the most restrictive case:
To allow completely arbitrary BBox queries, the query and the returned data must be in WGS84 (as the only CRS common to all countries).
Where two countries both use the same “local” CRS, a BBox query which intersects both countries should return equivalent data from both countries (where metrics allow).
If two neighbouring countries have different CRSs, but the valid extents of those CRSs overlap, users should not expect results from the neighbouring country, even if some or all of it falls within the BBox.
A hypothetical example would be if, in future, we include the Republic of Ireland census, which is published using the Irish Transverse Mercator (ITM) [EPSG:2157]. The extent of valid values for ITM includes all of Northern Ireland. A user entering this command should not expect any results from Northern Ireland:
popgetter data --bbox 600000 800000 700000 900000 --bbox-crs "EPSG:2157" ...
(The BBox specified intersects the border between the Republic of Ireland and Northern Ireland. See https://en.wikipedia.org/wiki/Irish_Transverse_Mercator )
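The three cases above reduce to a simple rule for deciding which countries a BBox query should search: a WGS84 query can be checked against every country's WGS84 copy, while a query in any other CRS only searches countries whose local CRS matches it. A minimal sketch, assuming every country has a published WGS84 copy; the function name and country IDs are hypothetical.

```python
WGS84 = "EPSG:4326"  # assumption: WGS84 identified as EPSG:4326


def countries_for_bbox_query(bbox_crs: str, local_crs: dict[str, str]) -> list[str]:
    """Return the (hypothetical) country IDs a BBox query should search.

    local_crs maps each country ID to its "local" CRS. Every country is
    assumed to also have a WGS84 copy of its geometry.
    """
    query_crs = bbox_crs.strip().upper()
    if query_crs == WGS84:
        # Case 1: WGS84 is common to all countries, so any BBox can be
        # checked against every country's WGS84 copy.
        return sorted(local_crs)
    # Cases 2 and 3: only countries whose local CRS matches the query CRS
    # are searched, even if a neighbour lies inside the BBox.
    return sorted(
        country for country, crs in local_crs.items()
        if crs.strip().upper() == query_crs
    )


# Northern Ireland's CRS is a stand-in (EPSG:29902) chosen for illustration;
# the only thing that matters here is that it differs from ITM.
local = {"ie": "EPSG:2157", "ni": "EPSG:29902"}
print(countries_for_bbox_query("EPSG:2157", local))  # ['ie']
print(countries_for_bbox_query("EPSG:4326", local))  # ['ie', 'ni']
```

In the Ireland example, the ITM query therefore searches only the Republic of Ireland, while the same area expressed in WGS84 would search both.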
Rationale
poppusher already has all of the dependencies needed for arbitrary CRS transformations (courtesy of geopandas).
I haven't tried it, but I assume that building proj (https://blackfriars.atlassian.net/wiki/pages/resumedraft.action?draftId=17039373&draftShareId=69f2b202-3a53-4096-8876-f150f27461d9) and its dependencies for wasm, or any other popgetter target, is likely to be painful. The approach above saves us from having to try this.
Reprojecting lots of data is nicely suited to a one-off background task in a data pipeline and best avoided on the fly where possible.
andrewphilipsmith changed the title from "Add coordinate reference system to geometry metadata" to "Popgetter Coordinate Reference System strategy" on Jul 19, 2024.