Feature/new datasets #99

Merged Oct 30, 2024 (7 commits)

mariahpope (Contributor) commented Oct 24, 2024

These are some helpful changes and/or additions that were necessary as I tested out the anemoi-datasets module on a few different datasets. Please let me know if you have any questions! Thanks!

Bug Fix

  1. src/anemoi/datasets/create/functions/sources/xarray/variable.py
    Bug fix - the arg in the class Variable is "variable" not "var". This caused a failure for me when working with levels until fixed.
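To illustrate the kind of failure described above, here is a minimal sketch. The `Variable` class below is a hypothetical stand-in, not the real class from `variable.py`; it only assumes that the constructor takes a keyword named `variable`, as stated in the description.

```python
class Variable:
    """Hypothetical stand-in for the real class; only the keyword name matters."""

    def __init__(self, *, ds, variable):
        self.ds = ds
        self.variable = variable


def make_variable(ds, data_array):
    # Correct call: the keyword is "variable".
    # Passing var=data_array instead raises TypeError (unexpected keyword argument).
    return Variable(ds=ds, variable=data_array)
```

With a signature like this, a call site that still passes `var=` fails as soon as that code path is reached, which would be consistent with the error only showing up when working with levels.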

Additions I found helpful

  1. src/anemoi/datasets/create/__init__.py
    Add the ability to work with cftime when needed.

  2. src/anemoi/datasets/create/functions/sources/xarray/fieldlist.py
    This update was helpful when working with a curvilinear grid (it will not always be necessary; it depends on the dataset). It came after I noticed that some dimensions are not always set as coordinates/variables in the dataset. For example, a dataset with a curvilinear grid could have lat/lon as coordinates, while the relevant x and y index dimensions are not set as coordinates or variables. This check therefore makes sure that if a dimension in a dataset was identified in a "flavour", it is also set as a coordinate within the dataset.
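The check described in item 2 can be sketched roughly as follows. This is not the code from `fieldlist.py`; the function name `ensure_dims_are_coords` is made up for illustration, and it assumes an xarray-style dataset exposing `sizes`, `coords` and `assign_coords`.

```python
def ensure_dims_are_coords(ds, dims):
    """Return ds with an integer index coordinate for every name in dims
    that is a dimension of ds but not yet one of its coordinates.

    ds is assumed to behave like an xarray.Dataset.
    """
    missing = {
        name: list(range(ds.sizes[name]))  # 0, 1, ..., n-1 along that dimension
        for name in dims
        if name in ds.sizes and name not in ds.coords
    }
    # Leave the dataset untouched when nothing is missing.
    return ds.assign_coords(missing) if missing else ds
```

For a curvilinear dataset with lat/lon coordinates over unlabelled `x` and `y` dimensions, `ensure_dims_are_coords(ds, ["x", "y"])` would add plain index coordinates for `x` and `y` and leave everything else as it was.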

Ability to read from Microsoft's Planetary Computer

There are a number of very useful datasets on Planetary Computer, and these updates make it possible to read most Zarr stores from there (e.g. CONUS404 example: https://planetarycomputer.microsoft.com/dataset/conus404#Example-Notebook). The URL that a user of anemoi-datasets would put in a YAML file is the STAC collection URL associated with the dataset. I chose to set it up this way because this URL appeared to be the most consistent way to reference Zarr datasets within Planetary Computer.

  1. src/anemoi/datasets/create/functions/sources/xarray/__init__.py
    Adds an elif branch specifically for Planetary Computer. There are a few extra hoops to jump through for open_zarr to work.

  2. src/anemoi/datasets/data/stores.py
    Add class PlanetaryComputerStore() that mimics the already existing S3 store and HTTPS store.
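For readers unfamiliar with the "extra hoops", here is a rough sketch of opening a Zarr store from a STAC collection URL. It follows the pattern used in the Planetary Computer example notebooks and is not the code added in this PR. The function names are made up, the `/collections/<id>` URL layout and the `"zarr-abfs"` asset key are assumptions that may not hold for every dataset, and the optional packages are imported lazily.

```python
def split_collection_url(url):
    """Split '<api root>/collections/<id>' into (api_root, collection_id)."""
    root, sep, collection_id = url.rstrip("/").rpartition("/collections/")
    if not sep or not root or not collection_id or "/" in collection_id:
        raise ValueError(f"Not a STAC collection URL: {url!r}")
    return root, collection_id


def open_planetary_computer_zarr(url, asset_key="zarr-abfs"):
    """Open the Zarr asset of a STAC collection as an xarray dataset.

    asset_key is an assumption; check the collection's assets for the real key.
    """
    # Imported here so this module loads without the optional packages.
    import planetary_computer
    import pystac_client
    import xarray as xr

    root, collection_id = split_collection_url(url)
    # sign_inplace attaches the short-lived access token the storage requires.
    catalog = pystac_client.Client.open(
        root, modifier=planetary_computer.sign_inplace
    )
    asset = catalog.get_collection(collection_id).assets[asset_key]
    return xr.open_zarr(
        asset.href,
        storage_options=asset.extra_fields.get("xarray:storage_options"),
        **asset.extra_fields.get("xarray:open_kwargs", {}),
    )
```

Under these assumptions, `open_planetary_computer_zarr("https://planetarycomputer.microsoft.com/api/stac/v1/collections/conus404")` would return the CONUS404 dataset, provided network access and the three packages are available.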

Unit test addition

  1. tests/xarray/test_zarr.py
    I added two unit tests for additional datasets. The updates within this PR were able to make anemoi-datasets work with both of these datasets.

FussyDuck commented Oct 24, 2024

CLA assistant check
All committers have signed the CLA.

b8raoult (Collaborator) left a comment

Can you add the new dependencies to the pyproject.toml file as optional dependencies? Namely pandas, pystac_client and planetary_computer.
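One common way to pair optional dependencies with the code that needs them is a lazy import that fails with an install hint. The sketch below is only an illustration of that idea, not something taken from this PR: the helper name `require_optional` and the extra name passed to it are hypothetical.

```python
import importlib


def require_optional(module_name, extra):
    """Import module_name, or raise an ImportError naming the extra to install.

    extra is a hypothetical extras group name from pyproject.toml.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature; "
            f"install it with: pip install 'anemoi-datasets[{extra}]'"
        ) from exc
```

A store class could then call, for example, `require_optional("pystac_client", "planetary-computer")` inside its constructor, so users who never touch Planetary Computer do not need the extra packages.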

mchantry (Member) commented

Thanks for the great contribution!

@b8raoult b8raoult merged commit e1303ad into ecmwf:develop Oct 30, 2024
5 of 16 checks passed
4 participants