The STAC transform should allow items with no Proj information #297

Open
alexgleith opened this issue Aug 8, 2021 · 2 comments

@alexgleith
Contributor

In ODC, the shape and transform (grid) information is optional. We should be able to handle assets without shape and transform, but currently we can't.

Errors look like this:

Traceback (most recent call last):
  File "C:\Users\cesar\anaconda3\envs\cubeenv\lib\site-packages\odc\apps\dc_tools\fs_to_dc.py", line 58, in cli
    metadata = stac_transform(metadata)
  File "C:\Users\cesar\anaconda3\envs\cubeenv\lib\site-packages\odc\index\stac.py", line 272, in stac_transform
    proj_transform=proj_transform,
  File "C:\Users\cesar\anaconda3\envs\cubeenv\lib\site-packages\odc\index\stac.py", line 130, in _get_stac_bands
    grid = f"g{transform[0]:g}m"
TypeError: 'NoneType' object is not subscriptable
@Kirill888
Member

Problem Description

The EO3 metadata format expects native projection information to be defined for each band. Unlike STAC, EO3 adds a layer of indirection: each band is assigned to a named grid that can be shared across several bands, so if 10 bands share a common grid (footprint), the grid information is recorded only once. One of the grids must be called "default". Bands that belong to the "default" grid can omit the grid specification, since "default" is implied. This also means that the common case of "all bands in the dataset share the same footprint and resolution" has a minimal textual representation in EO3.
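To make the indirection concrete, here is a rough sketch of how a band's grid could be looked up. The document layout (a "grids" mapping plus a per-band optional "grid" key), all values, and the helper name grid_for_band are illustrative assumptions, not code taken from datacube or odc-tools.

```python
# Illustrative EO3-like fragment; every value here is made up for the sketch.
dataset_doc = {
    "crs": "EPSG:32655",
    "grids": {
        # Shared by every band that does not name a grid explicitly.
        "default": {"shape": [100, 100], "transform": [10, 0, 500000, 0, -10, 6000000]},
        # A coarser grid used by a single band.
        "g20m": {"shape": [50, 50], "transform": [20, 0, 500000, 0, -20, 6000000]},
    },
    "measurements": {
        "red": {"path": "red.tif"},                    # implied "default" grid
        "nir": {"path": "nir.tif"},                    # implied "default" grid
        "swir": {"path": "swir.tif", "grid": "g20m"},  # explicit grid name
    },
}


def grid_for_band(doc, band):
    """Return (grid_name, grid) for a band, falling back to the "default" grid."""
    grid_name = doc["measurements"][band].get("grid", "default")
    return grid_name, doc["grids"][grid_name]


for band in dataset_doc["measurements"]:
    name, grid = grid_for_band(dataset_doc, band)
    print(band, name, grid["shape"])
```

With this layout, "red" and "nir" carry no grid information of their own, which is the minimal representation described above.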

Each "grid" is basically a (shape, transform) tuple that, in combination with the shared CRS, fully defines the footprint shared by the bands belonging to that grid. The four image corners 0,0; W,0; W,H; 0,H are mapped through the linear transform to give the footprint. The transform also encodes the native resolution of the image.
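A minimal sketch of that corner mapping, assuming the transform is stored as the first six coefficients of a row-major affine matrix (a, b, c, d, e, f), so that x' = a*x + b*y + c and y' = d*x + e*y + f, and that shape is (height, width). The function names are hypothetical.

```python
import math


def grid_footprint(shape, transform):
    """Map the pixel-space corners 0,0; W,0; W,H; 0,H into CRS coordinates."""
    height, width = shape  # assumption: shape is (rows, columns)
    a, b, c, d, e, f = transform[:6]
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in corners]


def native_resolution(transform):
    """Pixel size along each image axis, taken from the transform columns."""
    a, b, c, d, e, f = transform[:6]
    return (math.hypot(a, d), math.hypot(b, e))


# Made-up example: a 200 x 100 pixel, north-up image with 10 m pixels.
shape = (100, 200)
transform = (10, 0, 500000, 0, -10, 6000000)
print(grid_footprint(shape, transform))
print(native_resolution(transform))
```

For an axis-aligned image the resolution is simply the magnitude of the first and fifth coefficients.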

The "essential" information needed by dc.load is the "footprint", while "native resolution" and "native projection" are "nice to haves". One can still search for and load data without knowing up front which projection the pixels are stored in, so long as the bounding box is accurate enough (it fully encloses the valid data of the Dataset while being tight).

Proposed Solution

When native projection and resolution data are not available but a bounding box or a geometry is defined, we can produce a "fake" default grid. Such a grid would have CRS: "EPSG:4326", shape: [1,1] and a transform computed so that the 0,0 -> 1,1 square maps to the bounding box of the dataset in lon,lat.
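One way such a transform could be computed, as a sketch only: it assumes the bounding box is ordered (west, south, east, north) as in a STAC bbox, that pixel 0,0 is the top-left (north-west) corner, and that the transform uses the six-coefficient row-major affine order. The function names and the returned dictionary layout are hypothetical.

```python
def fake_default_grid(bbox):
    """Build a 1x1 placeholder grid whose single pixel covers the lon/lat bbox."""
    west, south, east, north = bbox
    # Pixel (0, 0) -> (west, north) and pixel (1, 1) -> (east, south).
    transform = [east - west, 0.0, west, 0.0, south - north, north]
    return {"crs": "EPSG:4326", "shape": [1, 1], "transform": transform}


def apply_transform(transform, x, y):
    """Apply a six-coefficient row-major affine transform to a pixel coordinate."""
    a, b, c, d, e, f = transform[:6]
    return (a * x + b * y + c, d * x + e * y + f)


# Made-up bounding box in lon/lat.
grid = fake_default_grid((148.0, -36.0, 149.0, -35.0))
print(grid["transform"])
print(apply_transform(grid["transform"], 0, 0))  # top-left corner
print(apply_transform(grid["transform"], 1, 1))  # bottom-right corner
```

The flipped sign on the y scale keeps the placeholder consistent with north-up rasters; a different orientation would work equally well as long as the unit square covers the bounding box.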

The only concern is that native_geobox called on such a Dataset will report a valid geobox even though it shouldn't. But that could be a simple fix: detect shape==(1,1) and report missing data instead. I do not expect a valid case for 1x1 pixel data, so using that as a marker for "information not available" is acceptable in my view.
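The marker check could look roughly like the following. This is a sketch of the idea, not the datacube implementation of native_geobox; the helper names and the dictionary-based grid are assumptions made for illustration.

```python
PLACEHOLDER_SHAPE = (1, 1)  # marker meaning "native grid information not available"


def has_native_grid(grid):
    """True only if the grid carries a real shape rather than the 1x1 placeholder."""
    shape = grid.get("shape") if grid else None
    return shape is not None and tuple(shape) != PLACEHOLDER_SHAPE


def native_grid_or_none(grid):
    """Return the grid when it is real, or None when it is missing or a placeholder."""
    return grid if has_native_grid(grid) else None


real = {"shape": [100, 100], "transform": [10, 0, 500000, 0, -10, 6000000]}
fake = {"shape": [1, 1], "transform": [1.0, 0.0, 148.0, 0.0, -1.0, -35.0]}
print(native_grid_or_none(real) is not None)
print(native_grid_or_none(fake) is None)
```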

cc: @gadomski

@Kirill888
Member

EPSG:4326 is somewhat annoying to deal with though, and is the least tested configuration in datacube, so maybe it's worth letting the user customize the "default" CRS.
