Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to specify generated time series file output path #117

Merged
merged 4 commits into from
Jul 18, 2024

Conversation

nusbaume
Copy link
Collaborator

@nusbaume nusbaume commented Jul 16, 2024

Fixes #116

All Submissions:

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you lint your code locally prior to submission?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

@nusbaume
Copy link
Collaborator Author

@mnlevy1981 @TeaganKing please let me know what tests you want to have run with this branch as specified in the checklist above (I obviously tested it in my own config file). Thanks!

@TeaganKing
Copy link
Collaborator

Hi @nusbaume , We don't have robust testing set up yet (that PR template checklist was a bit preemptive), so I think just testing with your own config file is great.

@TeaganKing
Copy link
Collaborator

It looks like the pre-commit is also passing. Whenever it's ready, feel free to request a review.

@nusbaume
Copy link
Collaborator Author

@TeaganKing thanks for letting me know! Sadly it looks like I am unable to add reviewers to this PR.

Copy link
Collaborator

@mnlevy1981 mnlevy1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Just one small request and then a quick question that will lead to a second small request regardless of your answer :)

cupid/run.py Outdated
Comment on lines 116 to 130
ts_output_dir = [
os.path.join(
global_params["CESM_output_dir"],
timeseries_params["case_name"],
f"{component}", "proc", "tseries",
),
]

if "ts_output_dir" in timeseries_params:
ts_output_dir = [
os.path.join(
timeseries_params["ts_output_dir"],
f"{component}", "proc", "tseries",
),
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion: can we keep ts_output_dir as a string, and pass [ts_output_dir] to cupid.timeseries.create_time_series()? (A separate issue ticket might be to update create_time_series to accept strings or lists as arguments, so list-ifying a handful of arguments seems silly... but that will be easier to address if ts_output_dir is a string.)

Also, a question: an alternate approach to this block of code would be

                if "ts_output_dir" in timeseries_params:
                    ts_output_dir =os.path.join(
                        timeseries_params["ts_output_dir"],
                        f"{component}", "proc", "tseries",
                    )
                else:
                    ts_output_dir = os.path.join(
                        global_params["CESM_output_dir"],
                        timeseries_params["case_name"],
                        f"{component}", "proc", "tseries",
                    )

Is there a reason to prefer one implementation over the other? If you want to keep

var=value
if condition:
    var=other_value

I see one if-else block in run.py, could you change that one?

        # Doing initial subsetting on full catalog, e.g. to only use certain cases

+        cat_path = full_cat_path
        if "subset" in control["data_sources"]:
            first_subset_kwargs = control["data_sources"]["subset"]
            cat_subset = full_cat.search(**first_subset_kwargs)
            # This pulls out the name of the catalog from the path
            cat_subset_name = full_cat_path.split("/")[-1].split(".")[0] + "_subset"
            cat_subset.serialize(
                directory=temp_data_path, name=cat_subset_name, catalog_type="file",
            )
            cat_path = temp_data_path + "/" + cat_subset_name + ".json"
-        else:
-            cat_path = full_cat_path

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy with either method of writing the if-statement. I personally prefer the if-else method, but have found that some static analysis tools like pylint will complain about it, hence I went the var=value, if condition: var=other_value way in this PR. Do you all have a preference one way or the other?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's stick with if-else, and if we eventually add pylint or some other checker that complains we'll just have one more block to update.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I've applied the requested changes.

@nusbaume nusbaume requested a review from mnlevy1981 July 17, 2024 22:40
Copy link
Collaborator

@mnlevy1981 mnlevy1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great!

@mnlevy1981 mnlevy1981 merged commit 644ca18 into NCAR:main Jul 18, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Need ability for user to select where generated time series are written
3 participants