Catalog consistency MDTF and user data catalog #588

Open
aradhakrishnanGFDL opened this issue Jun 7, 2024 · 0 comments · Fixed by #587
Labels: data catalogs (Issues related to intake esm data catalogs)


What problem will this feature solve?

Achieves a level of consistency between the input data catalog (from the GFDL catalog builder) and the MDTF intermediate catalogs in PP.

This matters because users who are new to catalogs can then learn a single set of terms and a single spec/template for the data catalog as they get started.

Lets GFDL analysis scripts, with or without MDTF, use a common catalog, which improves interoperability.

Helps with training material shared across GFDL and CESM, and with model intercomparison projects.

Describe the solution you'd like
To the aggregate_columns:
https://github.com/NOAA-GFDL/MDTF-diagnostics/blob/c87746c7e19870806b025c79c90f96cc33c1d173/src/util/catalog.py#L205:L216

Add: chunk_freq
Change: variant_label to member_id (MDTF)
Consider: For recording the “convention”, evaluate reusing the CMIP controlled vocabulary (CV), with “project_id” as the column name. Example: project_id = CMIP, project_id = dev, project_id = GFDL.

If activity_id is not being used, can it be removed or moved outside of aggregate columns? It was originally used to filter by “MIP” in CMIP6. It could be an “optional” column, rather than in aggregate_columns.
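The proposed changes can be sketched as a small transformation on a column list. This is only an illustration: the starting list below is hypothetical (the actual list is the one in src/util/catalog.py linked above), and `apply_proposed_changes` is a made-up helper name, not part of MDTF.

```python
# Hypothetical starting list for illustration only; the real
# aggregate_columns live in src/util/catalog.py.
current_columns = [
    "activity_id",
    "source_id",
    "experiment_id",
    "variant_label",
    "frequency",
    "realm",
    "variable_id",
]


def apply_proposed_changes(columns, drop_activity_id=False):
    """Return a new list with the changes proposed in this issue applied."""
    # Change: variant_label -> member_id
    renames = {"variant_label": "member_id"}
    updated = [renames.get(name, name) for name in columns]

    # Optional: treat activity_id as outside the aggregate columns
    if drop_activity_id:
        updated = [name for name in updated if name != "activity_id"]

    # Add: chunk_freq (only if it is not already present)
    if "chunk_freq" not in updated:
        updated.append("chunk_freq")
    return updated


print(apply_proposed_changes(current_columns, drop_activity_id=True))
```

The original order is preserved and chunk_freq is appended at the end; where it should actually sit in the ordering is a separate decision.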

The ordering of the aggregate columns can also be maintained, so that a user who typically queries a dataset with a "key pattern" is less confused.
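To illustrate why ordering matters, here is a minimal sketch that assumes dataset keys are formed by joining column values, in order, with a "." separator (as intake-esm does for its grouping attributes). The `build_key` helper and the sample row values are hypothetical and used only for illustration.

```python
def build_key(row, ordered_columns, sep="."):
    """Join the row's values in the given column order to form a dataset key."""
    return sep.join(str(row[col]) for col in ordered_columns)


# Hypothetical catalog row
row = {
    "project_id": "CMIP",
    "source_id": "GFDL-CM4",
    "experiment_id": "historical",
    "member_id": "r1i1p1f1",
}

order_a = ["project_id", "source_id", "experiment_id", "member_id"]
order_b = ["source_id", "project_id", "member_id", "experiment_id"]

# Same row, different column order, different key
print(build_key(row, order_a))
print(build_key(row, order_b))
```

A user who has memorized one key pattern would fail to find the same dataset in a catalog that uses the other ordering.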

Here is what the GFDL catalog builder template looks like (to be merged in, more changes pending):
https://github.com/aradhakrishnanGFDL/CatalogBuilder/blob/129-cmip/cats/gfdl_template.json#L79:L88

(note that modeling_realm will be changed to realm in the above; temporal_subset will be changed to time_range)

Describe alternatives you've considered

As an alternative, the following actions would be taken on the GFDL Catalog Builder side to help synchronize the data catalog template with MDTF.
(The following are NOT suggested changes to the MDTF framework.)

Change: modeling_realm to realm (GFDL Catalog Builder)
Change: temporal_subset to time_range (GFDL Catalog Builder)
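For existing catalogs, the two renames above could be applied to a catalog CSV header roughly as follows. This is a sketch using only the Python standard library; `rename_catalog_columns`, the sample row, and the sample path are hypothetical and not part of the GFDL Catalog Builder.

```python
import csv
import io

# Renames proposed for the GFDL Catalog Builder side
RENAMES = {"modeling_realm": "realm", "temporal_subset": "time_range"}


def rename_catalog_columns(csv_text, renames=RENAMES):
    """Return csv_text with header names replaced according to renames."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    # Only the header row changes; data rows are written back untouched.
    rows[0] = [renames.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical one-row catalog
sample = (
    "modeling_realm,temporal_subset,path\n"
    "atmos,197901-198312,/hypothetical/file.nc\n"
)
print(rename_catalog_columns(sample))
```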

If there are changes that do not resonate with the framework goals or catalog usage, please raise them to discuss further and rethink solutions.
