-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated timeseries generation with GenTS #143
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't had a chance to test this out yet, but a few things jumped out at first glance. I think this will be a great step forward for us, but we want to make sure we don't lose any of the existing functionality :)
@@ -18,4 +18,5 @@ dependencies: | |||
- pyyaml | |||
- xarray | |||
- pip: | |||
- gents |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to install a static version of gents
, or should we add it as a submodule and then use -e ../externals/gents
to install it? The latter might be helpful as we tweak the CUPiD API for timeseries generation, especially if it will result in changes to gents as well.
n_workers=1, | ||
processes=1, | ||
threads_per_worker=1, | ||
memory_limit="2GB", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to specify memory_limit
rather than letting the LocalCluster
object figure it out based on available resources? It seems like it could be problematic (if less than 2 GB / core is available) or unnecessarily restrictive (if more than 2 GB / core is available)
f"{component}", "proc", "tseries", | ||
), | ||
] | ||
year_start = int(year_start[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation is that start_years
(and end_years
) should be lists that are the same length as case_name
. This lets us, as an example, compare 60 years of a current run against 100 years of a baseline.
year_end = int(year_end[0]) | ||
|
||
modb = gents.ModelOutputDatabase( | ||
hf_head_dir=global_params["CESM_output_dir"] + "/" + timeseries_params["case_name"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if timeseries_params["case_name"]
is a list? As mentioned above, this is the case in our key_metrics
example, and we want to loop through each case_name
, applying the appropriate start_year
and end_year
(ts_done
and overwrite_ts
are both lists of the same length as well, though I'm not 100% clear on why ts_done
is in the config file)
All Submissions:
pre-commit
check)?New Feature Submissions:
Changes to Core Features:
Commentary
GenTS is a modernized post-processing package that specializes in converting history files to timeseries files. All code changes are made to
run.py
astimeseries.py
isn't needed (all of the functionality is encapsulated in GenTS). Release versions are made available via PyPI, so I opted to add it to the environment dependencies list rather than git fleximod or externals.More testing is required for GenTS to make sure it is post-processing history files correctly and integrating into CUPiD will promote further testing. If CUPiD is nearing the production stage, then it might make more sense to create a separate branch.
GenTS can run in serial, but can run in parallel by utilizing Dask. Following the other notebooks, I create a local cluster unless serial is specified in the
config.yml
.There are some timeseries specifications in the
config.yml
that don't yet exist in GenTS. For example, GenTS allows the user to specify a time slice, but this is generalized to all of the history files stored within aModelOutputDatabase
and cannot be broken into model components. To get around this, I create a uniqueModelOutputDatabase
for each model component, which is likely inefficient. GenTS does not require a history string to identify history files, but this comes with the caveat of processing all history files within the output directory. This may not be ideal for some use cases. I am unsure whether to implement these features into CUPiD or GenTS, as I would prefer to keep GenTS as generalized as possible but there may be some useful tools I could build into GenTS to enable this sort of configuration.