These are some helpful changes and additions that I found necessary while testing the anemoi-datasets module on a few different datasets. Please let me know if you have any questions! Thanks!
Bug Fix
Bug fix: the argument of the Variable class is named "variable", not "var". Using "var" caused a failure when I was working with levels.
Additions I found helpful
src/anemoi/datasets/create/__init__.py
Adds the ability to work with cftime datetime objects when needed.
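As a hedged illustration of what cftime support can involve: xarray decodes times in non-standard calendars (e.g. "noleap" or "360_day") to cftime.datetime objects rather than numpy datetime64 values, so code that expects datetime.datetime may need a conversion step. The helper below, to_python_datetime, is hypothetical and is not the code added in this PR. It relies only on the year, month, day, hour, minute, second and microsecond attributes that cftime.datetime objects expose.

```python
import datetime


def to_python_datetime(value):
    """Convert a datetime-like object to a naive datetime.datetime.

    Hypothetical helper: datetime.datetime instances are returned as-is;
    anything else (e.g. a cftime.datetime) is rebuilt from its fields.
    """
    if isinstance(value, datetime.datetime):
        return value
    # cftime.datetime objects carry these attributes for every calendar.
    # Dates that do not exist in the Gregorian calendar (e.g. 30 February
    # in a "360_day" calendar) raise ValueError here.
    return datetime.datetime(
        value.year,
        value.month,
        value.day,
        value.hour,
        value.minute,
        value.second,
        value.microsecond,
    )
```

For example, with cftime installed, to_python_datetime(cftime.DatetimeNoLeap(2000, 3, 1, 6)) returns datetime.datetime(2000, 3, 1, 6, 0). Duck typing on the attributes keeps the sketch free of a hard cftime import.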
src/anemoi/datasets/create/functions/sources/xarray/fieldlist.py
This update was helpful when working with a curvilinear grid (it will not always be necessary; it depends on the dataset). It came after I noticed that some dimensions are not always defined as coordinates or variables in the dataset. For example, a dataset on a curvilinear grid could have lat/lon as coordinates, while the corresponding x and y index dimensions are not defined as coordinates or variables. This change therefore checks that any dimension identified by a "flavour" is also set as a coordinate in the dataset.
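The check can be sketched with xarray as follows. This is a minimal sketch, not the code in fieldlist.py: ensure_dimension_coords is a hypothetical name, and it assumes the flavour has already produced the list of dimension names. Any listed dimension that exists in the dataset but has no coordinate is given a 0-based integer index coordinate.

```python
import numpy as np
import xarray as xr


def ensure_dimension_coords(ds, dims):
    """Return ds with an integer index coordinate for each dimension in dims
    that is present in ds but not yet defined as a coordinate."""
    missing = {
        dim: np.arange(ds.sizes[dim])
        for dim in dims
        if dim in ds.sizes and dim not in ds.coords
    }
    # assign_coords returns a new Dataset; the input is left unchanged.
    return ds.assign_coords(missing) if missing else ds


# Curvilinear example: 2-D lat/lon coordinates on index dimensions y and x,
# where y and x themselves are not coordinates.
shape = (3, 4)
ds = xr.Dataset(
    {"t2m": (("y", "x"), np.zeros(shape))},
    coords={
        "lat": (("y", "x"), np.zeros(shape)),
        "lon": (("y", "x"), np.zeros(shape)),
    },
)
fixed = ensure_dimension_coords(ds, ["y", "x"])
print(sorted(fixed.coords))  # ['lat', 'lon', 'x', 'y']
```

Using plain integer indices keeps the 2-D lat/lon fields as the source of geographic information while still giving every dimension a coordinate to select on.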
Ability to read from Microsoft's Planetary Computer
Planetary Computer hosts a number of very useful datasets, and these updates make it possible to read most Zarr stores from it (e.g. the CONUS404 example: https://planetarycomputer.microsoft.com/dataset/conus404#Example-Notebook). In the YAML file, a user of anemoi-datasets provides the STAC collection URL associated with the dataset. I chose this approach because the collection URL appeared to be the most consistent way to reference Zarr datasets on Planetary Computer.
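Recognising such a URL is the first step before choosing how to open it. The sketch below assumes Planetary Computer STAC collection URLs have the form https://planetarycomputer.microsoft.com/api/stac/v1/collections/<collection-id> (for CONUS404 the id is assumed to be "conus404"). planetary_computer_collection_id is a hypothetical helper, not part of anemoi-datasets.

```python
from urllib.parse import urlparse

PC_HOST = "planetarycomputer.microsoft.com"


def planetary_computer_collection_id(url):
    """Return the STAC collection id if url looks like a Planetary Computer
    collection URL, otherwise None."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc != PC_HOST:
        return None
    parts = [p for p in parsed.path.split("/") if p]
    # Assumed path layout: api/stac/v1/collections/<collection-id>
    if len(parts) == 5 and parts[:4] == ["api", "stac", "v1", "collections"]:
        return parts[4]
    return None
```

Note that the dataset page linked above (/dataset/conus404) is not a STAC collection URL and is rejected by this check.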
src/anemoi/datasets/create/functions/sources/xarray/__init__.py
Adds an elif branch specifically for Planetary Computer, since a few extra steps are needed before open_zarr can read these stores.
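The kind of extra steps involved can be sketched as below, following the general pattern of Planetary Computer's example notebooks: load the STAC collection, sign its Zarr asset with the planetary-computer package to attach a temporary access token, and pass the asset's "xarray:open_kwargs" and "xarray:storage_options" to xarray.open_zarr. This is not the branch added in this PR; the function name and the default "zarr-abfs" asset key are assumptions (asset keys can vary between collections), and reading abfs:// URLs also requires adlfs. Imports are deferred into the function because these packages are optional dependencies.

```python
def open_planetary_computer_zarr(collection_url, asset_key="zarr-abfs"):
    """Open the Zarr asset of a Planetary Computer STAC collection with xarray.

    Hypothetical sketch; needs pystac, planetary-computer, xarray and adlfs.
    """
    # Deferred imports: these are optional dependencies.
    import planetary_computer
    import pystac
    import xarray as xr

    # Load the STAC collection JSON from its URL.
    collection = pystac.Collection.from_file(collection_url)
    # sign() returns a copy of the asset with an access token attached.
    asset = planetary_computer.sign(collection.assets[asset_key])

    # Copy the open kwargs so the asset's metadata is not mutated; the
    # storage options carry the account name and credential used by adlfs.
    kwargs = dict(asset.extra_fields.get("xarray:open_kwargs", {}))
    if "xarray:storage_options" in asset.extra_fields:
        kwargs["storage_options"] = asset.extra_fields["xarray:storage_options"]
    return xr.open_zarr(asset.href, **kwargs)
```

Usage would then be along the lines of open_planetary_computer_zarr("https://planetarycomputer.microsoft.com/api/stac/v1/collections/conus404"), which needs network access. Signing at open time matters because the tokens expire.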
src/anemoi/datasets/data/stores.py
Adds a PlanetaryComputerStore class that mirrors the existing S3 and HTTPS stores.
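The existing stores in stores.py are not reproduced here, so the following is only a hedged sketch of the general idea. Zarr (version 2) can read from any mapping of keys (such as ".zgroup" or "t2m/0.0") to bytes, so a read-only store can wrap such a mapping and refuse writes simply by not implementing __setitem__. ReadOnlyMappingStore and from_signed_asset are hypothetical names, and the real PlanetaryComputerStore may be structured differently. fsspec.get_mapper is a real fsspec function; passing it a signed asset's href and storage options is an assumption about how such a store could be built.

```python
from collections.abc import Mapping


class ReadOnlyMappingStore(Mapping):
    """Hypothetical read-only key/value store that Zarr could read from.

    Wraps any mapping of Zarr keys to bytes. Because it subclasses Mapping
    and defines no __setitem__, item assignment raises TypeError.
    """

    def __init__(self, mapper):
        self._mapper = mapper

    @classmethod
    def from_signed_asset(cls, href, storage_options):
        # Deferred import; abfs:// URLs also require adlfs to be installed.
        import fsspec

        return cls(fsspec.get_mapper(href, **storage_options))

    def __getitem__(self, key):
        # A missing key must raise KeyError so Zarr treats it as absent.
        return self._mapper[key]

    def __iter__(self):
        return iter(self._mapper)

    def __len__(self):
        return len(self._mapper)
```

Keeping the store read-only matches the use case: these remote datasets are sources to read from, never targets to write to.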
Unit test additions
I added two unit tests covering additional datasets. With the updates in this PR, anemoi-datasets works with both of these datasets.