Patch info for v2.0 Beta #76

Merged
merged 2 commits into from
Apr 24, 2024
15 changes: 13 additions & 2 deletions docs/overview.rst
@@ -1,5 +1,5 @@
Zarr Data Overview
==================

Check warning on line 2 in docs/overview.rst

View workflow job for this annotation

GitHub Actions / build_and_preview

Duplicate explicit target name: "here".

Requirements
------------
@@ -25,13 +25,24 @@
Data Locations
--------------


CMIP6 data in the cloud can be found in both Google Cloud and AWS S3 storage buckets:

- ``gs://cmip6`` (part of `Google Cloud Public Datasets <https://cloud.google.com/public-datasets>`_)
- ``s3://cmip6-pds`` (part of the `AWS Open Data Sponsorship Program <https://aws.amazon.com/opendata/public-datasets/>`_)

The data is primarily `Zarr <https://zarr.readthedocs.io/en/stable/>`_-formatted, with a predetermined and well-defined directory structure to ensure that it is properly organized and classified.
This directory structure is reflected in the master CSV files located at the root of each bucket, which enumerate all available Zarr stores using their containing directory names as columns to allow for sorting and filtering.
.. warning::
    The AWS S3 storage copy mechanism is currently broken, so the data in the two buckets might be out of sync.
    Progress on reimplementing a sync between the buckets is tracked in `issue #134 <https://github.com/leap-stc/cmip6-leap-feedstock/issues/134>`_.

The `Zarr <https://zarr.readthedocs.io/en/stable/>`_-formatted data is currently ingested using `Pangeo-Forge <https://pangeo-forge.org>`_ recipes as part of the `NSF LEAP Project <https://leap.columbia.edu>`_ (`more info <https://github.com/leap-stc/cmip6-leap-feedstock>`_).

The base organization of Zarr stores is reflected in the master CSV files located at the root of each bucket, which enumerate all available Zarr stores and their facets (components of the ``instance_id``) to allow for sorting and filtering.
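
As a rough illustration of what "facets" means here, the sketch below splits an ``instance_id`` into named components. The facet order follows the usual CMIP6 ESGF convention and is an assumption rather than something stated on this page; the example ID and the helper name ``split_instance_id`` are hypothetical.

```python
# Facet names in the order they are ASSUMED to appear in a CMIP6 instance_id.
FACETS = [
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
]


def split_instance_id(instance_id: str) -> dict:
    """Split a dot-separated instance_id into a {facet_name: value} mapping."""
    parts = instance_id.split(".")
    if len(parts) != len(FACETS):
        raise ValueError(
            f"expected {len(FACETS)} dot-separated facets, got {len(parts)}"
        )
    return dict(zip(FACETS, parts))


# Hypothetical example ID, used only to show the shape of the result.
example = split_instance_id(
    "CMIP6.CMIP.NOAA-GFDL.GFDL-CM4.historical.r1i1p1f1.Amon.tas.gr1.v20180701"
)
print(example["source_id"], example["variable_id"])
```

Each facet can then serve as a column to sort or filter on, which is the role the facets play in the master CSV files.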

.. warning::
    **Parts of the information below are superseded by the new Pangeo-ESGF CMIP6 Zarr Data 2.0 (currently in beta testing).**
    Please refer to the `repository <https://github.com/leap-stc/cmip6-leap-feedstock/>`_ for up-to-date information, particularly on how to `access new data <https://github.com/leap-stc/cmip6-leap-feedstock#how-to-access-the-newly-uploaded-data>`_ and `request new data to be ingested <https://github.com/leap-stc/cmip6-leap-feedstock#how-can-i-request-new-data>`_.
    This page will be updated once the `beta testing phase is complete <https://github.com/leap-stc/cmip6-leap-feedstock/issues/135>`_.

Zarr storage format
-------------------
@@ -112,7 +123,7 @@
There are currently over 400,000 entries, which is too many for Google Sheets, but the files can be opened in most standard spreadsheet applications, where the entries can be sorted, selected, and discovered quickly. We find that importing them as a Python ``pandas`` DataFrame is very useful.
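
A minimal sketch of the ``pandas`` workflow follows. To stay self-contained it reads a tiny inline stand-in instead of the real master CSV; the column names, the rows, the ``gs://cmip6/example/...`` paths, and the helper ``select_stores`` are all illustrative assumptions, not the actual catalog contents.

```python
import io

import pandas as pd

# Tiny stand-in for the master CSV. Columns and rows are ASSUMED for illustration.
SAMPLE_CSV = """activity_id,institution_id,source_id,experiment_id,member_id,table_id,variable_id,grid_label,zstore,version
CMIP,NOAA-GFDL,GFDL-CM4,historical,r1i1p1f1,Amon,tas,gr1,gs://cmip6/example/store-a/,20180701
CMIP,NOAA-GFDL,GFDL-CM4,historical,r1i1p1f1,Amon,pr,gr1,gs://cmip6/example/store-b/,20180701
ScenarioMIP,NCAR,CESM2,ssp585,r1i1p1f1,Amon,tas,gn,gs://cmip6/example/store-c/,20200528
"""


def select_stores(catalog: pd.DataFrame, **facets: str) -> pd.DataFrame:
    """Return the catalog rows whose facet columns equal the requested values."""
    mask = pd.Series(True, index=catalog.index)
    for column, value in facets.items():
        mask &= catalog[column] == value
    return catalog[mask]


catalog = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Select monthly near-surface air temperature across all models and experiments.
monthly_tas = select_stores(catalog, table_id="Amon", variable_id="tas")
print(monthly_tas[["source_id", "experiment_id", "zstore"]])
```

For the real catalog, ``pd.read_csv`` would be pointed at the master CSV in the bucket root instead of the inline string; the same boolean-mask filtering scales comfortably to several hundred thousand rows.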

NetCDF Data Overview
====================

Check warning on line 126 in docs/overview.rst

View workflow job for this annotation

GitHub Actions / build_and_preview

Duplicate explicit target name: "aws open data sponsorship program".

Check warning on line 126 in docs/overview.rst

View workflow job for this annotation

GitHub Actions / build_and_preview

Duplicate explicit target name: "here".


Data locations
-------------------------------