Define layer from product subset #885

Open
valpesendorfer opened this issue Oct 17, 2022 · 9 comments

@valpesendorfer
Contributor

valpesendorfer commented Oct 17, 2022

Hello 👋,

Not sure if the title says it all, but I'd like to ask whether it would be possible to add a feature that allows an OWS layer to be defined from a subset of the datasets within a datacube product.

Currently, a layer must correspond to at least one ODC product; multiple products can be combined in a multi-product layer configuration.

But as far as I can tell, there's no way to define a layer from a subset of an ODC product without creating a new ODC product that contains only the desired datasets, which is what I want to avoid.

As an example:

We have an OWS layer based on an ODC product that contains country-specific datasets for several different countries. Each dataset has its country encoded in the metadata as the region_code attribute.

However, if you want to visualize only one country of interest (say Mozambique, in this example), any other datasets that fall within the current extent are displayed as well (here Zimbabwe and Namibia):

[Screenshot: map of Mozambique with datasets for neighboring Zimbabwe and Namibia also displayed]

Ideally, I'd be able to define a layer for a specific country while still referencing the same ODC product containing all datasets, with the ability to select only the datasets for the country in question using a query on region_code, as you can already do with datacube-core.

Maybe this could be generalized further to any other queryable attribute, or even a time range. For us, though, the focus would be strictly on region_code.

So my question: is this something that could be added (if it isn't possible already)? What steps and modifications would be required? From the datacube side it seems pretty straightforward, but surely things get tricky on the OGC side.

Thanks!!!

@SpacemanPaul
Contributor

There are a couple of possible approaches.

1. At one point, OWS used to support "sub-products" (I would call them sub-layers now). The use case was daily swath/path satellite data (like Landsat or Sentinel-2). There would be one manually configured parent layer, but one child layer created automatically for each satellite path. So if you were interested in one particular location, you could thumb through the dates that data was available for that particular satellite path, rather than having to skip through all the dates where there was no data for that path. The implementation involved supplying a user function that looked at dataset metadata and extracted a value that was used to define which sub-layer it belonged to.

It worked well for Landsat, but we never got it working properly for Sentinel-2, and in the end the "Filter by Location" function in Terria plus the available-date list in GetFeatureInfo made it redundant. Then at one point there was a major code refactor and support for sub-layers was dropped. But there are still stubs and hooks for it in various places in the code base (in particular, the wms.sub_product_ranges table still exists).

Reinstating this functionality would appear to meet your needs, and would be relatively easy.

2. Another approach would be to leverage ODC's "virtual product" functionality, as discussed in #455 (Enhancement Proposal: Native support for virtual products in OWS). This would be a lot more work, but would also be a lot more powerful - indeed it could eventually replace quite a lot of functionality in OWS that is currently implemented independently of ODC core, resulting in considerable simplification of OWS code and configuration syntax.
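To make the first approach (sub-layers) more concrete, here is a minimal sketch of what such a user function and the resulting grouping could look like. It is not the old OWS code: the function names are hypothetical, it assumes EO3 dataset documents carrying `properties["odc:region_code"]` and `properties["datetime"]`, and it assumes Landsat region codes are six-digit path/row strings such as `"089080"`, so the first three digits give the path. Grouping by the extracted key yields a per-sub-layer list of dates, which is the kind of information a range table would hold for each sub-layer.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, Hashable, List, Mapping, Set


def landsat_path_from_region_code(metadata: Mapping) -> int:
    """Hypothetical extractor: read the EO3 region code (assumed to be a
    six-digit Landsat path/row string such as "089080") and return the path."""
    region_code = metadata["properties"]["odc:region_code"]
    return int(region_code[:3])


def group_into_sub_layers(
    datasets: List[Mapping], extractor: Callable[[Mapping], Hashable]
) -> Dict[Hashable, List[date]]:
    """Map each sub-layer key to the sorted, de-duplicated dates of its datasets."""
    dates_by_key: Dict[Hashable, Set[date]] = defaultdict(set)
    for ds in datasets:
        key = extractor(ds)
        # EO3 "datetime" is an ISO 8601 string; keep only the date part.
        day = date.fromisoformat(ds["properties"]["datetime"][:10])
        dates_by_key[key].add(day)
    return {key: sorted(days) for key, days in dates_by_key.items()}


# Example: two scenes on path 89 from the same day, one scene on path 90.
datasets = [
    {"properties": {"odc:region_code": "089080", "datetime": "2022-10-01T00:12:00Z"}},
    {"properties": {"odc:region_code": "089081", "datetime": "2022-10-01T00:12:30Z"}},
    {"properties": {"odc:region_code": "090080", "datetime": "2022-10-08T00:18:00Z"}},
]
print(group_into_sub_layers(datasets, landsat_path_from_region_code))
```

Keying sub-layers on something other than the path (a country code, for instance) would only mean swapping in a different extractor function.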

@valpesendorfer
Contributor Author

Hey Paul, thanks a lot for your answer.

> 1. At one point, OWS used to support "sub-products" (I would call them sub-layers now). The use case was daily swath/path satellite data (like Landsat or Sentinel-2). There would be one manually configured parent layer, but one child layer created automatically for each satellite path. So if you were interested in one particular location, you could thumb through the dates that data was available for that particular satellite path, rather than having to skip through all the dates where there was no data for that path. The implementation involved supplying a user function that looked at dataset metadata and extracted a value that was used to define which sub-layer it belonged to.
>
> It worked well for Landsat, but we never got it working properly for Sentinel-2, and in the end the "Filter by Location" function in Terria plus the available-date list in GetFeatureInfo made it redundant. Then at one point there was a major code refactor and support for sub-layers was dropped. But there are still stubs and hooks for it in various places in the code base (in particular, the wms.sub_product_ranges table still exists).
>
> Reinstating this functionality would appear to meet your needs, and would be relatively easy.

This sounds interesting; the parent/child layer structure is particularly appealing, as is the relatively easy implementation you mention.

The problem we're having now is two-fold:
a) dates are displayed for a location where no dataset is actually available.
b) when we select a date for a country, all other countries in the same extent with data for that date are displayed as well, i.e. we can't exclude countries of "non-interest".

It sounds like a) would already be handled by GetFeatureInfo? I need to familiarize myself with it, having ignored it until now. I'd appreciate any links you could provide showing this in action.

But if I understand correctly, this would not be a fix for b).

> 2. Another approach would be to leverage ODC's "virtual product" functionality, as discussed in #455 (Enhancement Proposal: Native support for virtual products in OWS). This would be a lot more work, but would also be a lot more powerful - indeed it could eventually replace quite a lot of functionality in OWS that is currently implemented independently of ODC core, resulting in considerable simplification of OWS code and configuration syntax.

Indeed, I thought virtual products could be a good fit for this purpose. But given the amount of work required and the lack of traction since 2020, I think this would need more interest from other parties. Maybe it could eventually replace a "stop-gap" solution as part of a bigger refactor?

@SpacemanPaul
Contributor

> It sounds like a) would already be handled by GetFeatureInfo? I need to familiarize myself with it, having ignored it until now. I'd appreciate any links you could provide showing this in action.

It's really a TerriaJS feature (not sure if you're using Terria, but you could potentially implement the same thing in your front-end). There's a "filter by location" button for each layer; when you press it, you are prompted to click on a point on the map. A GetFeatureInfo request is then made in the background for that point and that layer. The resulting JSON contains the list of valid dates for that location, and Terria uses it for the date selector instead of the full date list for the layer from GetCapabilities.

E.g. here's a map with a daily satellite layer selected: https://maps.dea.ga.gov.au/#share=s-3MhNG77mZbaQdd24fwSOuIdDRic

Check the available dates in the date selector, then click on "Filter by Location" and click on a point on the map, then look at the date selector again.
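If you're not using Terria, the same "filter by location" flow could be reproduced in another front end. A minimal sketch, assuming a WMS 1.3.0 endpoint that supports `INFO_FORMAT=application/json`: the endpoint URL and layer name are placeholders, and the `data_available_for_dates` property name is an assumption about the JSON that datacube-ows returns, so check it against a real response from your server. The HTTP request itself is left out so the URL building and parsing can be tried offline.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode


def get_feature_info_url(
    endpoint: str,
    layer: str,
    bbox: Tuple[float, float, float, float],
    width: int,
    height: int,
    i: int,
    j: int,
    crs: str = "EPSG:3857",
) -> str:
    """Build a WMS 1.3.0 GetFeatureInfo URL for pixel (i, j) of a width x height map.

    bbox is (minx, miny, maxx, maxy) in the axis order of `crs`; EPSG:3857 is the
    default to sidestep the lat/lon axis order of EPSG:4326 in WMS 1.3.0.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetFeatureInfo",
        "LAYERS": layer,
        "QUERY_LAYERS": layer,
        "STYLES": "",
        "FORMAT": "image/png",
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "I": i,
        "J": j,
        "INFO_FORMAT": "application/json",
    }
    return f"{endpoint}?{urlencode(params)}"


def available_dates(feature_info_json: str) -> List[str]:
    """Collect the available dates from a GetFeatureInfo JSON response.

    ASSUMPTION: each feature lists its dates under
    properties["data_available_for_dates"].
    """
    doc = json.loads(feature_info_json)
    dates = set()
    for feature in doc.get("features", []):
        dates.update(feature.get("properties", {}).get("data_available_for_dates", []))
    return sorted(dates)
```

The front end would then replace the layer's full date list (from GetCapabilities) with the output of `available_dates` for the clicked point.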

But you are right, this doesn't address your use-case b).

@SpacemanPaul
Contributor

SpacemanPaul commented Oct 18, 2022

I remember now why we dropped support for it. The old implementation required maintaining the range table by walking through the entire database, reading every metadata document in full. When we shifted to more efficient processes for maintaining range tables, it had to go.

But I can see how to do it more efficiently now - at the cost of losing some generality. For example, you would need to restrict yourself to the "search fields" defined in the metadata type so that indexed database queries can be used. (region_code is a search field in the EO3 metadata type, so if you are using EO3, that shouldn't be an issue.)

(Similar restrictions apply to virtual products in any case.)
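To illustrate the "search fields only" restriction, here is a minimal sketch of a sub-layer filter that accepts only fields with a declared location in the metadata document and rejects anything else up front. The field table is a hypothetical, hand-written excerpt: it assumes EO3 stores region_code at `properties -> odc:region_code` and platform at `properties -> eo:platform`. In a real deployment the offsets would come from the metadata type definition, and the matching would be an indexed database query rather than a Python loop.

```python
from typing import Any, Dict, List, Mapping, Tuple

# Hypothetical excerpt of EO3 search fields: field name -> path into the dataset document.
EO3_SEARCH_FIELDS: Dict[str, Tuple[str, ...]] = {
    "region_code": ("properties", "odc:region_code"),
    "platform": ("properties", "eo:platform"),
}


def validate_filter(query: Mapping[str, str]) -> None:
    """Reject filters on anything that is not a declared search field."""
    unknown = sorted(set(query) - set(EO3_SEARCH_FIELDS))
    if unknown:
        raise ValueError(f"not search fields: {', '.join(unknown)}")


def field_value(doc: Any, path: Tuple[str, ...]) -> Any:
    """Follow a path of keys into a nested metadata document (None if absent)."""
    for key in path:
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def select_datasets(docs: List[dict], query: Mapping[str, str]) -> List[dict]:
    """Hypothetical helper: keep documents whose search-field values all match."""
    validate_filter(query)
    return [
        doc
        for doc in docs
        if all(field_value(doc, EO3_SEARCH_FIELDS[name]) == value for name, value in query.items())
    ]
```

Validating before matching is the point of the design: a filter on an arbitrary metadata key would force a full scan of every document, which is exactly what made the old range-table maintenance so expensive.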

@valpesendorfer
Copy link
Contributor Author

valpesendorfer commented Oct 18, 2022

Thanks for the example, very helpful. We're not using TerriaJS, but this could indeed be integrated into our workflow somehow.

To address b):

We do indeed use EO3, and I was hoping there would be a more straightforward solution, given that region_code is queryable in the datacube. Am I right in thinking that adding config functionality that allows a query on these fields isn't simple, because it would require changes to the SQL code or the range table implementation? My knowledge of the code base clearly ends here.

Would adding support for querying these fields be as extensive a change as adding support for virtual products, or could it be an easier solution?

@SpacemanPaul
Contributor

SpacemanPaul commented Oct 19, 2022

Hmm, I just remembered another major complication that makes both approaches discussed above problematic - OWS doesn't actually use the ODC for dataset queries; it uses its own materialised views.

The PostGIS spatial search against the materialised views is both more accurate and more efficient than native ODC searches. Work is underway to apply the learnings from OWS into datacube-core, but that's a long and bumpy road with no clear ETA.

Given your specific use case, it would probably be easier to define the sub-layers spatially, i.e. provide a list of countries in the config, each with a simplified polygon (in EPSG:4326) for the country's extent.

Then each country would be presented as a sub-layer as described above, and the query can be done spatially against the materialised views by simply taking the intersection of the requested extent and sub-layer extent.

That would be a much more OWS-friendly approach - but could only be used for "spatial" sub-layers. It wouldn't work for metadata search fields that didn't map neatly to a geospatial region.
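A rough sketch of the spatial idea: each country sub-layer carries a simplified polygon, and the effective query region is the intersection of that polygon with the requested extent. The config shape (`SUB_LAYERS` with `name`, `title` and `polygon` keys) and the rectangle used for Mozambique are hypothetical. In OWS the intersection would be computed by PostGIS against the materialised views; here Sutherland-Hodgman clipping of the country polygon against the request's bounding box stands in for it, which works because the bounding box is convex.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical config fragment: one sub-layer per country, each with a
# simplified EPSG:4326 polygon given as (lon, lat) vertices.
SUB_LAYERS = [
    {"name": "moz", "title": "Mozambique",
     "polygon": [(30.0, -27.0), (41.0, -27.0), (41.0, -10.0), (30.0, -10.0)]},
]


def clip_to_bbox(polygon: List[Point], bbox: Tuple[float, float, float, float]) -> List[Point]:
    """Sutherland-Hodgman clipping of a polygon against an axis-aligned bbox
    (minx, miny, maxx, maxy). Returns the clipped vertices ([] if disjoint)."""
    minx, miny, maxx, maxy = bbox

    def x_cross(p: Point, q: Point, x: float) -> Point:
        t = (x - p[0]) / (q[0] - p[0])
        return (x, p[1] + t * (q[1] - p[1]))

    def y_cross(p: Point, q: Point, y: float) -> Point:
        t = (y - p[1]) / (q[1] - p[1])
        return (p[0] + t * (q[0] - p[0]), y)

    # Each clip edge: (inside test, crossing point of segment p->q with that edge).
    edges = [
        (lambda p: p[0] >= minx, lambda p, q: x_cross(p, q, minx)),
        (lambda p: p[0] <= maxx, lambda p, q: x_cross(p, q, maxx)),
        (lambda p: p[1] >= miny, lambda p, q: y_cross(p, q, miny)),
        (lambda p: p[1] <= maxy, lambda p, q: y_cross(p, q, maxy)),
    ]
    output = list(polygon)
    for inside, cross in edges:
        if not output:
            break
        subject, output = output, []
        prev = subject[-1]
        for curr in subject:
            if inside(curr):
                if not inside(prev):
                    output.append(cross(prev, curr))
                output.append(curr)
            elif inside(prev):
                output.append(cross(prev, curr))
            prev = curr
    return output


def area(polygon: List[Point]) -> float:
    """Shoelace formula; absolute area in squared degrees."""
    n = len(polygon)
    s = sum(polygon[k][0] * polygon[(k + 1) % n][1] - polygon[(k + 1) % n][0] * polygon[k][1]
            for k in range(n))
    return abs(s) / 2.0


request_bbox = (35.0, -20.0, 50.0, 0.0)
print(clip_to_bbox(SUB_LAYERS[0]["polygon"], request_bbox))
```

An empty result means the request lies entirely outside the country, so no dataset query is needed; otherwise only dataset footprints intersecting the clipped region would be returned, which excludes the neighbouring countries outside the polygon.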

@valpesendorfer
Contributor Author

> Hmm, I just remembered another major complication that makes both approaches discussed above problematic - OWS doesn't actually use the ODC for dataset queries; it uses its own materialised views.

Just for my understanding: when adding a new layer, are these views calculated from the geo info in the datasets, or does it come from the ODC product definition? I was hoping this might be a place where region_code could be injected.

> Given your specific use case, it would probably be easier to define the sub-layers spatially, i.e. provide a list of countries in the config, each with a simplified polygon (in EPSG:4326) for the country's extent.
>
> Then each country would be presented as a sub-layer as described above, and the query can be done spatially against the materialised views by simply taking the intersection of the requested extent and sub-layer extent.
>
> That would be a much more OWS-friendly approach - but could only be used for "spatial" sub-layers. It wouldn't work for metadata search fields that didn't map neatly to a geospatial region.

That actually sounds like a workable solution! Instead of providing in the layer config the region_code that describes all required datasets, I could provide a simplified polygon that spatially subsets the data to the country.

Where can I define that polygon? I've been looking at the docs and config examples but didn't find any info.

Thanks, much appreciated!

@SpacemanPaul
Contributor

> Just for my understanding: when adding a new layer, are these views calculated from the geo info in the datasets, or does it come from the ODC product definition? I was hoping this might be a place where region_code could be injected.

The view definition gets the geo and time info from the dataset metadata, directly from the database using JSON queries (with multiple pathways depending on metadata type), and is implemented in raw SQL. It's a major pain to make changes to, and completely ungeneralisable.

> Where can I define that polygon? I've been looking at the docs and config examples but didn't find any info.

No, sorry, I didn't mean to give you that impression. It's not available now; it requires new code. It just requires much less refactoring of existing code than the other options discussed.

@valpesendorfer
Contributor Author

OK, got it, thanks. So if this were to be implemented, we'd go about it by adding the polygon filter.
