Updated timeseries generation with GenTS #143

AgentOxygen · 2024-10-15T19:01:39Z

All Submissions:

Have you followed the guidelines in our Contributor's Guide (including the pre-commit check)?
Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

Does your submission pass tests?
Have you linted your code locally prior to submission?

Changes to Core Features:

Have you added an explanation of what your changes do and why you'd like us to include them?
Have you successfully tested your changes locally?

Commentary

GenTS is a modernized post-processing package that specializes in converting history files to timeseries files. All code changes are in run.py; timeseries.py is no longer needed because GenTS encapsulates all of its functionality. Release versions are available on PyPI, so I added GenTS to the environment dependencies list rather than using git fleximod or externals.

More testing is required to make sure GenTS post-processes history files correctly, and integrating it into CUPiD will promote further testing. If CUPiD is nearing the production stage, it might make more sense to create a separate branch.

GenTS can run in serial or in parallel using Dask. Following the other notebooks, I create a local cluster unless serial is specified in config.yml.
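For illustration, the serial/parallel switch could look roughly like the sketch below. It is not the code in this PR: the helper name `timeseries_client` and its `serial` and `n_workers` arguments are hypothetical, and only `LocalCluster` and `Client` come from `dask.distributed`.

```python
from contextlib import contextmanager


@contextmanager
def timeseries_client(serial: bool, n_workers: int = 1):
    """Yield a Dask client for parallel runs, or None for serial runs.

    Hypothetical helper: the name and arguments are assumptions,
    not part of CUPiD or GenTS.
    """
    if serial:
        # Serial mode: no cluster is created and Dask is never imported.
        yield None
        return

    # Imported lazily so that serial runs do not require dask.distributed.
    from dask.distributed import Client, LocalCluster

    # Both objects are context managers, so the cluster and client are
    # shut down cleanly when the caller's block exits.
    with LocalCluster(n_workers=n_workers, threads_per_worker=1) as cluster:
        with Client(cluster) as client:
            yield client
```

A caller would wrap the timeseries generation in `with timeseries_client(serial=...) as client:` and treat a `None` client as the serial path.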

There are some timeseries specifications in config.yml that don't yet exist in GenTS. For example, GenTS allows the user to specify a time slice, but the slice applies to all of the history files stored within a ModelOutputDatabase and cannot be set per model component. To get around this, I create a separate ModelOutputDatabase for each model component, which is likely inefficient. GenTS does not require a history string to identify history files, but the caveat is that it processes all history files within the output directory, which may not be ideal for some use cases. I am unsure whether to implement these features in CUPiD or in GenTS: I would prefer to keep GenTS as general as possible, but there may be some useful tools I could build into GenTS to enable this sort of configuration.

mnlevy1981

I haven't had a chance to test this out yet, but a few things jumped out at first glance. I think this will be a great step forward for us, but we want to make sure we don't lose any of the existing functionality :)

mnlevy1981 · 2024-10-21T15:56:36Z

environments/dev-environment.yml

@@ -18,4 +18,5 @@ dependencies:
    - pyyaml
    - xarray
    - pip:
+      - gents


Do we want to install a static version of gents, or should we add it as a submodule and then use -e ../externals/gents to install it? The latter might be helpful as we tweak the CUPiD API for timeseries generation, especially if it will result in changes to gents as well.

mnlevy1981 · 2024-10-21T16:13:40Z

cupid/run.py

+                n_workers=1,
+                processes=1,
+                threads_per_worker=1,
+                memory_limit="2GB",


Is there a reason to specify memory_limit rather than letting the LocalCluster object figure it out based on available resources? It seems like it could be problematic (if less than 2 GB / core is available) or unnecessarily restrictive (if more than 2 GB / core is available)
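One way to act on this, sketched under assumptions: build the `LocalCluster` keyword arguments from user options and pass `memory_limit` only when it is explicitly set, so that Dask's own default (documented as `"auto"`, as far as I know) applies otherwise. The helper name `cluster_kwargs` and the `user_opts` dictionary are hypothetical.

```python
def cluster_kwargs(user_opts):
    """Build keyword arguments for dask.distributed.LocalCluster.

    Hypothetical helper: `user_opts` stands in for whatever options
    the config file exposes; it is not an existing CUPiD structure.
    """
    kwargs = {"n_workers": 1, "threads_per_worker": 1}

    # Forward memory_limit only when the user asked for one. Leaving the
    # key out lets LocalCluster pick a limit from the available resources.
    if "memory_limit" in user_opts:
        kwargs["memory_limit"] = user_opts["memory_limit"]

    return kwargs
```

The result would be unpacked as `LocalCluster(**cluster_kwargs(opts))`. Key presence is used instead of a `None` sentinel because Dask treats `memory_limit=None` as "no limit".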

mnlevy1981 · 2024-10-21T16:18:05Z

cupid/run.py

-                                    f"{component}", "proc", "tseries",
-                            ),
-                        ]
+                    year_start = int(year_start[0])


The current implementation expects start_years (and end_years) to be lists of the same length as case_name. This lets us, as an example, compare 60 years of a current run against 100 years of a baseline.

mnlevy1981 · 2024-10-21T16:19:40Z

cupid/run.py

+                    year_end = int(year_end[0])
+
+            modb = gents.ModelOutputDatabase(
+                hf_head_dir=global_params["CESM_output_dir"] + "/" + timeseries_params["case_name"],


What happens if timeseries_params["case_name"] is a list? As mentioned above, this is the case in our key_metrics example, and we want to loop through each case_name, applying the appropriate start_year and end_year (ts_done and overwrite_ts are both lists of the same length as well, though I'm not 100% clear on why ts_done is in the config file)
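A minimal sketch of the per-case loop described here, assuming the config values may be either scalars or equal-length lists. The helper names `as_list` and `per_case_settings` are hypothetical, and the sketch only prepares the arguments for each case; it does not call GenTS.

```python
import os


def as_list(value):
    """Wrap a scalar in a list; copy lists and tuples unchanged."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def per_case_settings(output_dir, case_name, start_years, end_years):
    """Pair each case with its own history directory and year range.

    Hypothetical helper: the dictionary keys mirror the names used in
    the diff above but are otherwise assumptions.
    """
    cases = as_list(case_name)
    starts = as_list(start_years)
    ends = as_list(end_years)

    if not (len(cases) == len(starts) == len(ends)):
        raise ValueError(
            "case_name, start_years and end_years must have the same length",
        )

    return [
        {
            "hf_head_dir": os.path.join(output_dir, case),
            "year_start": int(start),
            "year_end": int(end),
        }
        for case, start, end in zip(cases, starts, ends)
    ]
```

Each returned dictionary could then drive one database per case (and per component), so a 60-year run and a 100-year baseline keep their own year ranges.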

AgentOxygen and others added 5 commits October 9, 2024 15:44

Added GenTS to be installed in dev environment

4e7753d

Implemented GenTS using current config.yml format

d556170

Fixed bug with yml config path handling

3bcc2cd

Fixed precommit errors

8413466

Cleaned up blank spaces and trailing commas

ec17e2e

AgentOxygen mentioned this pull request Oct 17, 2024

Simultaneous timeseries generation for multiple history files types within a model component #148

Open

mnlevy1981 requested changes Oct 21, 2024


TeaganKing added the common utility label Jan 17, 2025
