Popgetter Coordinate Reference System strategy #142

Open
sgreenbury opened this issue Jul 10, 2024 · 0 comments

sgreenbury commented Jul 10, 2024

V0.2 behaviour

For popgetter v0.2, the geographic query does not use a consistent CRS across countries.

Proposed v0.3 behaviour

Overall

  • For each country, there is likely to be one commonly used CRS (or a small number of them). Often, this will be the CRS used in the upstream source geometry data. It is reasonable to expect that users might wish to use that CRS for country-specific applications.
  • For global applications, users may not wish to handle a separate CRS for each country, and may instead prefer one of the popular generic global CRSs (e.g., WGS84 or Web Mercator).
  • We should request feedback on whether WGS84, Web Mercator, or both would be the most useful. For the rest of this description, I am assuming just WGS84 for simplicity.

“poppusher” behaviour

Poppusher should publish two copies of the geographic data, one in each of two CRSs:

  • A copy in the country’s “local” CRS, as used by the upstream data source.
  • A copy reprojected to WGS84.

If the upstream data source is already in WGS84, then only one copy of the data should be published.

Changes to pipelines:

  • The country-specific pipelines need only handle the source CRS.
  • The cloud_publisher pipeline should contain a new node to convert and publish the WGS84 data.
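
The publishing rule above (local CRS plus WGS84, or a single copy when the source is already WGS84) could be sketched roughly as follows. This is only an illustration, not the actual cloud_publisher node: the function name `copies_to_publish` and the injected `reproject` callable are hypothetical, and WGS84 is assumed to be identified by the string "EPSG:4326". In poppusher the `reproject` step would presumably wrap geopandas' `GeoDataFrame.to_crs`.

```python
from typing import Callable, Dict, TypeVar

# Stands in for whatever geometry container the pipeline uses
# (presumably a geopandas GeoDataFrame in poppusher).
G = TypeVar("G")

# Assumption: WGS84 is identified by its EPSG code throughout.
WGS84 = "EPSG:4326"


def copies_to_publish(
    geometry: G,
    source_crs: str,
    reproject: Callable[[G, str], G],
) -> Dict[str, G]:
    """Return the geometry copies to publish, keyed by CRS identifier.

    The source copy is always included. A WGS84 copy is added only when
    the source is not already in WGS84, so that case yields one copy.
    """
    source = source_crs.strip().upper()
    copies = {source: geometry}
    if source != WGS84:
        copies[WGS84] = reproject(geometry, WGS84)
    return copies
```

The comparison is a plain string match on the normalised identifier, so aliases for the same CRS would not be recognised as WGS84; a real node would more likely compare parsed CRS objects.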

Somewhere within the metadata model, we will need to represent the CRS at the country or dataset level.

As we add additional non-census datasets, we might need to consider the case where, for a single country, there are two upstream data sources published in different CRSs.
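
One possible shape for this, recording the CRS per dataset so that a country with two upstream sources in different CRSs is still representable, is sketched below. The class and field names (`DatasetGeometry`, `CountryMetadata`, `published_crs`) are hypothetical and are not taken from the existing popgetter metadata model; "Exampleland" and its datasets are placeholder values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetGeometry:
    """Geometry metadata for one upstream dataset (hypothetical model)."""
    dataset_id: str
    source_crs: str  # CRS of the upstream source data
    published_crs: List[str] = field(default_factory=list)  # CRSs actually published


@dataclass
class CountryMetadata:
    """Country-level view that aggregates the per-dataset CRS information."""
    name: str
    datasets: List[DatasetGeometry] = field(default_factory=list)

    def available_crs(self) -> List[str]:
        """All CRSs published for this country, in first-seen order, without duplicates."""
        seen: List[str] = []
        for dataset in self.datasets:
            for crs in dataset.published_crs:
                if crs not in seen:
                    seen.append(crs)
        return seen


# Placeholder country with two upstream sources in different CRSs.
example = CountryMetadata(
    name="Exampleland",
    datasets=[
        DatasetGeometry("census", "EPSG:3035", ["EPSG:3035", "EPSG:4326"]),
        DatasetGeometry("non_census", "EPSG:4326", ["EPSG:4326"]),
    ],
)
```

Storing the CRS at the dataset level and deriving the country-level list from it avoids having to choose a single "local" CRS per country up front.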

“popgetter-client” behaviour

The overall principle is that popgetter-client should not need to do any transformation of coordinates; it only needs to be aware of which CRS the coordinates refer to.

Within the CLI options / API there should be additional parameters and options:

  • Where the --bbox parameter is used, there should be an optional additional parameter, --bbox-crs: the CRS of the bounding-box query (defaults to WGS84).
  • For the data command, there should be a --crs parameter: the CRS in which geographic data should be returned (default ???).
  • popgetter countries should print the CRSs available for each country. (This could be included under a --verbose option if it is too much information to fit in the existing table.)
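
To make the proposed options concrete, here is a minimal sketch using Python's argparse. It is only an illustration of the option surface: the real client may use a different argument parser and a different --bbox syntax. The four space-separated values follow the example command given later in this issue, their ordering is left unspecified here, and "EPSG:4326" is assumed as the identifier for WGS84. The default for --crs is deliberately left unset, since it is still undecided above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser for the proposed `popgetter data` options."""
    parser = argparse.ArgumentParser(prog="popgetter")
    subcommands = parser.add_subparsers(dest="command", required=True)

    data = subcommands.add_parser("data", help="Fetch data")
    # Four numbers describing the bounding box, interpreted in --bbox-crs.
    data.add_argument("--bbox", nargs=4, type=float, default=None)
    # CRS of the bounding-box query; the proposal defaults this to WGS84.
    data.add_argument("--bbox-crs", default="EPSG:4326")
    # CRS of the returned geometries; default not yet decided, so None here.
    data.add_argument("--crs", default=None)
    return parser


args = build_parser().parse_args(
    ["data", "--bbox", "600000", "800000", "700000", "900000",
     "--bbox-crs", "EPSG:2157"]
)
```

Here `args.bbox_crs` holds the CRS of the query, while `args.crs` stays `None` until a default is agreed.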

Interaction of CRSs and BBox queries that span more than one country

From the most permissive to the most restrictive case:

  • To allow completely arbitrary BBox queries, the query and the returned data must be in WGS84 (as the only CRS common to all countries).

  • Where two countries both use the same “local” CRS, a BBox query that intersects both countries should return equivalent data from both countries (where metrics allow).

  • If two neighbouring countries have different CRSs, but the valid extents of those CRSs overlap, users should not expect results from the neighbouring country, even if some or all of it is included in the BBox.

    A hypothetical example would be if, in future, we include the Republic of Ireland census, which is published using the Irish Transverse Mercator (ITM) [EPSG:2157]. The extent of valid values for ITM includes all of Northern Ireland. A user entering this command should not expect any results from Northern Ireland:
    popgetter data --bbox 600000 800000 700000 900000 --bbox-crs "EPSG:2157" ...
    (The BBox specified intersects the border between the Republic of Ireland and Northern Ireland. See https://en.wikipedia.org/wiki/Irish_Transverse_Mercator )
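
The three cases above amount to a simple rule for choosing candidate countries from the CRS of the query, before any spatial intersection is tested. A rough sketch follows. The function name and the catalogue are hypothetical: the Republic of Ireland entry follows the example above, the Northern Ireland CRS is assumed purely for illustration, and "Country A" and "Country B" are placeholders for two countries sharing a local CRS.

```python
from typing import Dict, List

# Assumption: WGS84 is identified by its EPSG code.
WGS84 = "EPSG:4326"

# Hypothetical catalogue mapping each country to its "local" CRS.
LOCAL_CRS: Dict[str, str] = {
    "Republic of Ireland": "EPSG:2157",  # ITM, as in the example above
    "Northern Ireland": "EPSG:29902",    # assumed for illustration only
    "Country A": "EPSG:3035",            # placeholder
    "Country B": "EPSG:3035",            # placeholder sharing A's CRS
}


def countries_for_bbox_crs(bbox_crs: str, local_crs: Dict[str, str]) -> List[str]:
    """Return the countries whose published data can answer a query in bbox_crs.

    Every country is published in WGS84, so a WGS84 query may match any
    country. A query in any other CRS can only match countries whose local
    CRS is that same CRS, even if the box overlaps a neighbour on the ground.
    """
    if bbox_crs == WGS84:
        return sorted(local_crs)
    return sorted(name for name, crs in local_crs.items() if crs == bbox_crs)
```

With this catalogue, a query in "EPSG:2157" yields only the Republic of Ireland as a candidate, which matches the expectation that no Northern Ireland results are returned for the command above.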

Rationale

  • poppusher already has all of the dependencies needed to do arbitrary CRS transformations (courtesy of geopandas).
  • I’ve not tried it, but I’m assuming that building [proj](https://blackfriars.atlassian.net/wiki/pages/resumedraft.action?draftId=17039373&draftShareId=69f2b202-3a53-4096-8876-f150f27461d9) and its dependencies for wasm, or some other target, for popgetter is likely to be painful. The above approach saves us from needing to try this.
  • Reprojecting lots of data is well suited to a one-off background task in a data pipeline and is best avoided on the fly where possible.

andrewphilipsmith changed the title from “Add coordinate reference system to geometry metadata” to “Popgetter Coordinate Reference System strategy” on Jul 19, 2024
andrewphilipsmith added this to the v0.3 release milestone on Jul 19, 2024