
EAMxx variables #880

Open · wants to merge 26 commits into base: cdat-migration-fy24
Conversation

@chengzhuzhang (Contributor) commented Oct 29, 2024

Description

  • This PR replaces More EAMXX var support #849. This enhancement will be merged into the new e3sm_diags code base after the cdat-migration-fy24 branch is merged.

Reference:
The mapping of new variables is based on https://acme-climate.atlassian.net/wiki/spaces/EAMXX/pages/4535976058/Output+Standard+Names put together by @AaronDonahue
The decadal output outlined by @brhillman: https://github.com/E3SM-Project/eamxx-scripts/pull/180/files#diff-1646ba1e37781387625d2ce585aad9ef7f5b6407616300838c7aecd44c67df7e

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have added tests that prove my fix is effective or that my feature works
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@tomvothecoder tomvothecoder changed the base branch from main to cdat-migration-fy24 October 29, 2024 17:50
@tomvothecoder tomvothecoder force-pushed the cdat-migration-fy24 branch 2 times, most recently from d5a1aad to 7550b3d Compare October 29, 2024 21:18
@tomvothecoder tomvothecoder force-pushed the eamxx_1024 branch 2 times, most recently from 2c55d9e to 469cebb Compare November 4, 2024 19:47
ds_climo = climo(ds, self.var, season).to_dataset()
ds_climo = ds_climo.bounds.add_missing_bounds(axes=["X", "Y"])
Collaborator

This should close #884. Please verify the fix, thanks.

Contributor Author

Indeed, this fixed the problem! Thank you!

@chengzhuzhang (Contributor Author) commented Nov 5, 2024

@tomvothecoder I completed the 2D and 3D variable derivations, except for COSP-related output.
The only thing left is supporting the lowercase landfrac/ocnfrac variables that come from EAMxx. The relevant code block is as follows:

LAND_OCEAN_MASK_PATH = os.path.join(INSTALL_PATH, "acme_ne30_ocean_land_mask.nc")
# The keys for the land and ocean fraction variables in the
# `LAND_OCEAN_MASK_PATH` file.
LAND_FRAC_KEY = "LANDFRAC"
OCEAN_FRAC_KEY = "OCNFRAC"

I'm not sure how to provide a clean way to accommodate lowercase variable names.

@tomvothecoder (Collaborator)

The only thing left here is that we need to support lowercase landfrac/ocnfrac that from EAMxx.

I just pushed 46a5dad to add support for more land/ocean var keys. Let me know your thoughts.

@tomvothecoder (Collaborator)

Pushed ef261b8 (#880) to make the land-sea mask methods a bit cleaner. Should be good to go.

@@ -243,7 +242,7 @@ def _apply_land_sea_mask(
     ds_new = ds.copy()
     ds_new = _drop_unused_ilev_axis(ds)
     output_grid = ds_new.regridder.grid
-    mask_var_key = _get_region_mask_var_key(region)
+    mask_var_key = _get_region_mask_var_key(ds_mask, region)
Contributor Author

I ran into the error below for a special use case where the region is land_60S90N.

raise ValueError(f"Only land and ocean regions are supported, not '{region}'.")
ValueError: Only land and ocean regions are supported, not 'land_60S90N'.

It looks like we need a way to distinguish mask-only land/ocean regions from the other types of special regions.
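A hypothetical sketch of one way to make that distinction (the function name and the split convention are assumptions for illustration): treat a region as mask-only when it is exactly "land" or "ocean", and as a composite region when a latitude-band suffix such as "60S90N" follows.

```python
def split_region(region):
    """Split a region string like "land_60S90N" into its mask type and an
    optional latitude-band suffix. Returns (mask_type, band), where
    mask_type is None for regions without land/ocean masking.
    """
    head, _, band = region.partition("_")
    mask_type = head if head in ("land", "ocean") else None
    return mask_type, band or None
```

A caller could then raise the "Only land and ocean regions are supported" error only when `mask_type` is None, instead of whenever the full string is not exactly "land" or "ocean".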

Contributor Author

Commit 542b88b should address this.

Comment on lines -1174 to 1147
-        ds_sub = self._subset_vars_and_load(ds, var)
-
-        time_slice = self._get_time_slice(ds_sub)
-        ds_sub = ds_sub.sel(time=time_slice).squeeze()
+        time_slice = self._get_time_slice(ds)
+        ds_sub = ds.sel(time=time_slice).squeeze()

         if self.is_sub_monthly:
             ds_sub = self._exclude_sub_monthly_coord_spanning_year(ds_sub)

+        ds_sub = self._subset_vars_and_load(ds_sub, var)

         return ds_sub
Collaborator

This should close #892. I was able to run the U variable. Can you please confirm the fix too @chengzhuzhang?

Collaborator

My output:

2024-11-08 12:52:52,411 [INFO]: e3sm_diags_driver.py(_save_env_yml:58) >> Saved environment yml file to: /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/eamxx_decadal_1996_1031_edv3/prov/environment.yml
2024-11-08 12:52:52,413 [INFO]: e3sm_diags_driver.py(_save_parameter_files:69) >> Saved command used to: /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/eamxx_decadal_1996_1031_edv3/prov/cmd_used.txt
2024-11-08 12:52:52,414 [INFO]: e3sm_diags_driver.py(_save_python_script:133) >> Saved Python script to: /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/eamxx_decadal_1996_1031_edv3/prov/ipykernel_launcher.py
2024-11-08 12:52:52,976 [INFO]: lat_lon_driver.py(run_diag:69) >> Variable: U
2024-11-08 12:53:12,435 [INFO]: lat_lon_driver.py(_run_diags_3d:396) >> Selected pressure level(s): [850.0]
2024-11-08 12:53:13,554 [INFO]: regrid.py(subset_and_align_datasets:70) >> Selected region: global
2024-11-08 12:53:21,676 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/eamxx_decadal_1996_1031_edv3/lat_lon/ERA5/ERA5-U-850-ANN-global.json
2024-11-08 12:53:28,892 [INFO]: utils.py(_save_plot:91) >> Plot saved in: /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/eamxx_decadal_1996_1031_edv3/lat_lon/ERA5/ERA5-U-850-ANN-global.png
2024-11-08 12:53:28,895 [INFO]: main.py(create_viewer:132) >> lat_lon /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/eamxx_decadal_1996_1031_edv3/viewer
2024-11-08 12:53:32,950 [INFO]: main.py(create_viewer:135) >> [('Latitude-Longitude contour maps', 'lat_lon/index.html'), ('Table', 'table/index.html'), ('Taylor Diagram', 'taylor/index.html'), ('CMIP6 Comparison', 'cmip6/index.html')]
2024-11-08 12:53:32,956 [INFO]: e3sm_diags_driver.py(main:392) >> Viewer HTML generated at /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/eamxx_decadal_1996_1031_edv3/viewer/index.html
2024-11-08 12:53:32,976 [INFO]: logger.py(move_log_to_prov_dir:106) >> Log file saved in /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/eamxx_decadal_1996_1031_edv3/prov/e3sm_diags_run.log

@chengzhuzhang (Contributor Author) commented Dec 13, 2024

@tomvothecoder When testing after rebasing, issue #892 came back. It now takes ~30 minutes to complete this run:

(/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3) chengzhu@nid004205:~/eamxx_diags/run_script> python run_e3sm_diags_1996.py -d U_lat_lon.cfg
2024-12-12 15:23:42,166 [INFO]: e3sm_diags_driver.py(_save_env_yml:58) >> Saved environment yml file to: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1212_edv3_U/prov/environment.yml
2024-12-12 15:23:42,172 [INFO]: e3sm_diags_driver.py(_save_parameter_files:69) >> Saved command used to: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1212_edv3_U/prov/cmd_used.txt
2024-12-12 15:23:42,184 [INFO]: e3sm_diags_driver.py(_save_parameter_files:99) >> Saved cfg file to: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1212_edv3_U/prov/U_lat_lon.cfg
2024-12-12 15:23:42,191 [INFO]: e3sm_diags_driver.py(_save_python_script:133) >> Saved Python script to: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1212_edv3_U/prov/run_e3sm_diags_1996.py
2024-12-12 15:23:53,074 [INFO]: lat_lon_driver.py(run_diag:69) >> Variable: U
2024-12-12 15:53:23,937 [INFO]: lat_lon_driver.py(_run_diags_3d:396) >> Selected pressure level(s): [850.0]
2024-12-12 15:53:27,832 [INFO]: regrid.py(subset_and_align_datasets:70) >> Selected region: global
2024-12-12 15:53:40,129 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1212_edv3_U/lat_lon/ERA5/ERA5-U-850-ANN-global.json
2024-12-12 15:53:48,999 [INFO]: utils.py(_save_plot:91) >> Plot saved in: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1212_edv3_U/lat_lon/ERA5/ERA5-U-850-ANN-global.png
2024-12-12 15:53:49,007 [INFO]: main.py(create_viewer:132) >> lat_lon /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1212_edv3_U/viewer
2024-12-12 15:54:02,811 [INFO]: main.py(create_viewer:135) >> [('Latitude-Longitude contour maps', 'lat_lon/index.html'), ('Table', 'table/index.html'), ('Taylor Diagram', 'taylor/index.html'), ('CMIP6 Comparison', 'cmip6/index.html')]
2024-12-12 15:54:02,847 [INFO]: e3sm_diags_driver.py(main:392) >> Viewer HTML generated at /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1212_edv3_U/viewer/index.html

I think the code diff looks okay, though; the needed code change was carried over, but I'm not sure why the performance issue returned.

Collaborator

Thanks for re-running the script. I will debug the performance bottleneck when I'm back Mon.

Collaborator

This change should be on main anyway. I am going to port it over to #907.

@tomvothecoder (Collaborator)

I'm getting a separate error about circular imports. Do you see this too?

Not sure how this was introduced, but it should be addressed:

2024-11-08 12:27:54,820 [ERROR]: core_parameter.py(_run_diag:343) >> Error in e3sm_diags.driver.lat_lon_driver
Traceback (most recent call last):
  File "/global/u2/v/vo13/E3SM-Project/e3sm_diags/e3sm_diags/parameter/core_parameter.py", line 340, in _run_diag
    single_result = module.run_diag(self)
AttributeError: partially initialized module 'e3sm_diags.driver.lat_lon_driver' has no attribute 'run_diag' (most likely due to a circular import)
2024-11-08 12:27:54,821 [ERROR]: run.py(run_diags:91) >> Error traceback:
Traceback (most recent call last):
  File "/global/u2/v/vo13/E3SM-Project/e3sm_diags/e3sm_diags/run.py", line 89, in run_diags
    params_results = main(params)
  File "/global/u2/v/vo13/E3SM-Project/e3sm_diags/e3sm_diags/e3sm_diags_driver.py", line 373, in main
    parameters_results = _run_serially(parameters)
  File "/global/u2/v/vo13/E3SM-Project/e3sm_diags/e3sm_diags/e3sm_diags_driver.py", line 271, in _run_serially
    nested_results.append(parameter._run_diag())
  File "/global/u2/v/vo13/E3SM-Project/e3sm_diags/e3sm_diags/parameter/core_parameter.py", line 333, in _run_diag
    module = importlib.import_module(mod_str)
  File "/global/u2/v/vo13/mambaforge/envs/e3sm_diags_dev_892/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/global/u2/v/vo13/E3SM-Project/e3sm_diags/e3sm_diags/driver/zonal_mean_xy_driver.py", line 17, in <module>
    from e3sm_diags.metrics.metrics import spatial_avg
ImportError: cannot import name 'spatial_avg' from partially initialized module 'e3sm_diags.metrics.metrics' (most likely due to a circular import) (/global/u2/v/vo13/E3SM-Project/e3sm_diags/e3sm_diags/metrics/metrics.py)
2024-11-08 12:27:54,824 [INFO]: logger.py(move_log_to_prov_dir:106) >> Log file saved in /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/877-attr-err/eamxx_decadal_1996_1107_edv3/prov/e3sm_diags_run.log
2024-11-08 12:27:55,985 [INFO]: lat_lon_driver.py(run_diag:69) >> Variable: U
Value(False)
2024-11-08 12:32:27,811 [INFO]: lat_lon_driver.py(_run_diags_3d:396) >> Selected pressure level(s): [850.0]
2024-11-08 12:32:29,678 [INFO]: regrid.py(subset_and_align_datasets:70) >> Selected region: global
2024-11-08 12:32:39,801 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/877-attr-err/eamxx_decadal_1996_1107_edv3/lat_lon/ERA5/ERA5-U-850-ANN-global.json
2024-11-08 12:32:54,463 [INFO]: utils.py(_save_plot:91) >> Plot saved in: /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/877-attr-err/eamxx_decadal_1996_1107_edv3/lat_lon/ERA5/ERA5-U-850-ANN-global.png
2024-11-08 12:32:54,463 [INFO]: utils.py(_save_plot:91) >> Plot saved in: /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/877-attr-err/eamxx_decadal_1996_1107_edv3/lat_lon/ERA5/ERA5-U-850-ANN-global.png

@tomvothecoder (Collaborator)

I'm getting a separate error about circular imports. Do you see this too? [...]

No longer appearing after running make install again. Good to go here.

@chengzhuzhang (Contributor Author) commented Nov 8, 2024

The commit for #892 worked well! It took about 1 minute to finish the 3D variable U run, which is comparable to what CDAT does. Thank you for the quick fix! @tomvothecoder

@chengzhuzhang (Contributor Author) commented Nov 8, 2024

I found another issue: derived variables are not working for time-series files. Example .cfg:

[#]
sets = ["lat_lon"]
case_id = "ERA5"
variables = ["QREFHT"]
ref_name = "ERA5_ext"
reference_name = "ERA5 Reanalysis"
seasons = ["ANN", "01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12", "DJF", "MAM", "JJA", "SON"]
contour_levels = [0.2, 0.5, 1, 2.5, 5, 7.5, 10, 12.5, 15, 17.5]
diff_levels = [-5, -4, -3, -2, -1, -0.25, 0.25, 1, 2, 3, 4, 5]

The input d2m and sp files (ERA5 variables used to derive QREFHT) are available in the time-series/ERA5_ext directory. But the program is trying to look for:
/global/cfs/cdirs/e3sm/diagnostics/observations/Atm/time-series/ERA5_ext/QREFHT_.{13}.nc
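For reference, the `.{13}` in that pattern matches a 13-character date range such as `199601_199612`. A minimal sketch of how such filename matching works (the helper name is hypothetical, not e3sm_diags' actual function):

```python
import re


def match_ts_files(filenames, var):
    """Return filenames matching '<var>_' + a 13-character date range + '.nc',
    e.g. 'sp_199601_199612.nc' for var='sp'."""
    pattern = re.compile(rf"^{re.escape(var)}_.{{13}}\.nc$")
    return [f for f in filenames if pattern.match(f)]
```

When no file matches the target variable (QREFHT here), the derivation logic is expected to fall back to matching the source variables (d2m and sp) instead.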

@tomvothecoder (Collaborator)

I found another issue that derived variable is not working for time-series files. [...]

You may need to step through the loop that attempts to derive QREFHT until it hits d2m and sp. I walked through the code, and it tries to match on these two filepath patterns, which look correct.

  • /global/cfs/cdirs/e3sm/diagnostics/observations/Atm/time-series/ERA5_ext/d2m_.{13}.nc
  • /global/cfs/cdirs/e3sm/diagnostics/observations/Atm/time-series/ERA5_ext/sp_.{13}.nc

Possible issue

The ERA5_ext sub-directory doesn't look like it exists under the root directory (/global/cfs/cdirs/e3sm/diagnostics/observations/Atm/time-series).

(e3sm_diags_dev_892) vo13@login08:.../Atm/time-series$ pwd
/global/cfs/cdirs/e3sm/diagnostics/observations/Atm/time-series
(e3sm_diags_dev_892) vo13@login08:.../Atm/time-series$ ls ERA5_ext
ls: cannot access 'ERA5_ext': No such file or directory
(e3sm_diags_dev_892) vo13@login08:.../Atm/time-series$ 

@chengzhuzhang (Contributor Author)

@tomvothecoder thank you for looking into this! I'm actually stepping through the code, and it does look like the logic is correct. Yes, the files were misplaced in ERA5 but should be in a separate directory, ERA5-ext. Let me fix the data and try again.

@chengzhuzhang (Contributor Author) commented Nov 9, 2024

I can confirm that this was a data problem. The problem described in #880 (comment) is resolved with the time-series files placed in the correct directory, ERA5-ext. The data fix is ready on LCRC and Perlmutter.

@chengzhuzhang chengzhuzhang marked this pull request as ready for review November 9, 2024 00:48
@tomvothecoder (Collaborator) left a comment

Hi @chengzhuzhang, this PR looks good to me. I had some minor comments/questions. Thanks for working on this PR!

Comment on lines +112 to +115
("precip_liq_surf_mass_flux", "precip_ice_surf_mass_flux"): prect, # EAMxx
("precip_total_surf_mass_flux",): lambda pr: convert_units(
rename(pr), target_units="mm/day"
), # EAMxx
Collaborator

Should we consider separating the EAMxx variables into another derivations dictionary so we don't need the # EAMxx comments?

This can be done through #716.

Contributor Author

Let's keep the comments for now; we can consider separation in the future. We probably want to separate variables from multiple obs sources as well.

e3sm_diags/derivations/derivations.py (outdated, resolved)
e3sm_diags/derivations/derivations.py (outdated, resolved)
e3sm_diags/driver/utils/dataset_xr.py (resolved)
@tomvothecoder (Collaborator)

I think this branch needs to be rebased on the latest cdat-migration-fy24. Let me do this and then we can merge whenever you're ready.

@chengzhuzhang (Contributor Author) commented Nov 11, 2024

I think this branch needs to be rebased on the latest cdat-migration-fy24. Let me do this and then we can merge whenever you're ready.

Thank you for the review and rebasing @tomvothecoder. I will tag EAMxx developers for a review before merging.

@tomvothecoder (Collaborator) commented Nov 11, 2024

Just rebased, should be good to go for further review.

@chengzhuzhang (Contributor Author)

Hi @PeterCaldwell @brhillman @crterai @AaronDonahue:

This PR adds support for all EAMxx output variables (except those that are COSP-related). It took a little longer because the update is based on the brand-new e3sm_diags code base that was just migrated to xarray/xcdat, replacing CDAT (kudos to @tomvothecoder).

Here is an example e3sm_diags run based on the 1996ish EAMxx decadal run that Ben provided.

The workflow to generate this run is to first run ncclimo to generate the regridded climatology files, and then run the e3sm_diags run script.

An example of the NCO script is below. Thanks to @czender, the two NCO steps listed can be simplified into a single ncclimo command line with the latest NCO release. The improvement will be available through the next e3sm-unified release, scheduled for February.

#!/bin/bash                                 

source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_pm-cpu.sh

drc_in=/global/cfs/cdirs/e3sm/chengzhu/eamxx/run
drc_out=/global/cfs/cdirs/e3sm/chengzhu/eamxx/post/data
caseid=output.scream.decadal.monthlyAVG_ne30pg2.AVERAGE.nmonths_x1

# spoofed climatology files with data from 1995-09 to 1996-08

# create climatology files
cd ${drc_in};ls ${caseid}*1996-0[1-8]*.nc ${caseid}*1995-09*.nc ${caseid}*1995-1[0-2]*.nc | ncclimo -P eamxx --fml_nm=eamxx_decadal --yr_srt=1996 --yr_end=1996 --drc_out=$drc_out


map=/global/cfs/projectdirs/e3sm/zender/maps/map_ne30pg2_to_cmip6_180x360_traave.20231201.nc
# remap climo files to a regular lat-lon grid
cd $drc_out;ls *.nc | ncremap -P eamxx --prm_opt=time,lwband,swband,ilev,lev,plev,cosp_tau,cosp_cth,cosp_prs,dim2,ncol --map=${map} --drc_out=${drc_out}/rgr

exit

It would be great if you could review the code change to verify the variable derivations are correct. Next we will work on better support for arbitrary-length runs, COSP histograms, and other variability sets (given longer simulations). Any feedback on capabilities and priorities is welcome!

Thanks,
Jill

@tomvothecoder (Collaborator)

RE: comment for finding root cause of performance issue

@chengzhuzhang I pushed 6f1949f with a debug script that isolates the "ua" dataset with Xarray/xCDAT directly.

Hey @xylar, do you have any thoughts or ideas on why open_mfdataset() + .load() hangs?

Test cases

  • Test case 1 - open_mfdataset() + "ua" dataset (76 GB) + subsetting on time slice to 2GB + .load()
    • Result: .load() does not work, hangs
    • Notes: This is what happens in the current codebase.
  • Test case 2 - open_dataset() + "ua" dataset (76 GB) + subsetting on time slice to 2 GB + .load()
    • Result: .load() works
  • Test case 3 - open_mfdataset + "pr" dataset (2 GB) + .load()
    • Result: .load() works
    • Notes: pr is a 3D variable (time, lat, lon), while ua is a 4D variable (time, lat, lon, plev).

Explanation

I don't know why open_mfdataset() hangs with .load() with single, large datasets like "ua". It used to work a few months ago as mentioned in this comment, before rebasing on main. I don't see any obvious code differences that would cause this issue.

On the technical side, open_mfdataset() will open datasets using dask arrays, while open_dataset() loads datasets using numpy arrays. It seems like there is an issue with .load() on 4D dask arrays?

Possible workaround

If there are multiple files to open, use open_mfdataset(). If there is only 1 file to open (e.g., large ua dataset), use open_dataset(). I tested this workaround and it was successful.
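A minimal sketch of that workaround (the function name is hypothetical; the opener arguments exist only to make the branching testable and default to xarray's real openers):

```python
def open_ts_dataset(paths, open_one=None, open_many=None):
    """Open a single file with open_dataset() (numpy-backed) and multiple
    files with open_mfdataset() (dask-backed), sidestepping the hanging
    .load() on a single, large dask-backed 4D variable.
    """
    if open_one is None or open_many is None:
        import xarray as xr  # third-party; assumed available in the e3sm_diags env

        open_one = open_one or xr.open_dataset
        open_many = open_many or xr.open_mfdataset
    if len(paths) == 1:
        return open_one(paths[0])
    return open_many(paths)
```

The trade-off is that a single file is loaded eagerly into memory, which is acceptable here since the subset is small (~2 GB after time slicing).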

@chengzhuzhang (Contributor Author)

@tomvothecoder I'm trying to identify what might have changed that caused this resolved issue to resurface after rebasing. It's likely that many package versions differ compared to the prior environment where we originally tested this. Could the issue be reproduced using earlier versions of xarray/xcdat?

@tomvothecoder (Collaborator)

Could the issue be reproduced using earlier versions of xarray/xcdat?

I'm currently trying with different versions of xarray/xcdat. I will let you know the results.

@xylar (Contributor) commented Dec 18, 2024

Hey @xylar, do you have any thoughts or ideas on why open_mfdataset() + .load() hangs?

Could this be a regression in xarray itself? I haven't seen this myself.

Update: I see @chengzhuzhang already suggested this.

@tomvothecoder (Collaborator) commented Dec 18, 2024

@chengzhuzhang and @xylar I was able to track down the root cause of the slow performance to a regression between dask=2024.11.2 and dask=2024.12.0. This occurs with both xarray.open_mfdataset() and xcdat.open_mfdataset().

Results

I ran the debug script using five different conda environments.

Envs 1-4 with different combinations of Xarray, xCDAT, and the latest Dask loaded the dataset in ~85 secs (all slow). These results suggest that there is a regression in a sub-dependency that is causing the slowdown.

I then created Env 5 with the latest versions of Xarray and xCDAT and Dask v2024.11.2. It loaded the dataset in ~4 secs.

Root cause

Regression between dask=2024.11.2 and dask=2024.12.0. This occurs with both xarray.open_mfdataset() and xcdat.open_mfdataset().

Next steps

  • File a bug report with dask and/or xarray with a minimum reproducible example of the slow down
  • Add a workaround to e3sm_diags

Test environments

Env 1 - Xarray v2024.11.0, xCDAT v0.7.3, latest Dask -- really slow/hangs

conda create -y -n xr_2024110_xc_073 -c conda-forge xarray=2024.11.0 xcdat=0.7.3 ipykernel

Result: Time taken to load ds_xc_sub: 85.32655910775065 seconds

Env 2 - Xarray v2024.11.0, xCDAT v0.7.0, latest Dask -- really slow/hangs

conda create -y -n xr_2024110_xc_070 -c conda-forge xarray=2024.11.0 xcdat=0.7.0 ipykernel

Result: Time taken to load ds_xc_sub: 85.4243874088861 seconds

Env 3 - Xarray v2024.9.0, xCDAT v0.7.3, latest Dask -- really slow/hangs

conda create -y -n xr_202490_xc_073 -c conda-forge xarray=2024.9.0 xcdat=0.7.3 ipykernel

Results: Time taken to load ds_xc_sub: 84.86508742999285 seconds

Env 4 - Xarray v2024.9.0, xCDAT v0.7.0, latest Dask -- really slow/hangs

conda create -y -n xr_202490_xc_070 -c conda-forge xarray=2024.9.0 xcdat=0.7.0 ipykernel

Results: Time taken to load ds_xc_sub: 84.75896426010877 seconds

Env 5 - Xarray v2024.11.0, xCDAT v0.7.3, and Dask v2024.11.2 (older) -- really fast

conda create -y -n xr_2024110_xc_073_dask_2024112 -c conda-forge xarray=2024.11.0 xcdat=0.7.3 dask=2024.11.2 ipykernel

Results: Time taken to load ds_xc_sub: 3.812566387001425 seconds

Env 6 - Xarray v2024.11.0, xCDAT v0.7.3, and Dask v2024.12.0 (released 2 weeks ago) -- really slow/hangs

conda create -y -n xr_2024110_xc_073_dask_202431 -c conda-forge xarray=2024.11.0 xcdat=0.7.3 dask=2024.12.0 ipykernel

Results: Time taken to load ds_xc_sub: 84.14018559036776 seconds

@chengzhuzhang (Contributor Author)

Great job tracking down the problematic versions! It looks like we need to avoid using the faulty xarray/Dask combination for now...

@chengzhuzhang
Copy link
Contributor Author

chengzhuzhang commented Dec 20, 2024

By downgrading the dask version to 2024.11.2, the speed is back to normal:

(/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3) (edv3) chengzhu@login15:~/eamxx_diags/run_script> python run_e3sm_diags_1996.py -d U_lat_lon.cfg
2024-12-20 11:19:43,316 [INFO]: e3sm_diags_driver.py(_save_env_yml:58) >> Saved environment yml file to: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1220_edv3_U/prov/environment.yml
2024-12-20 11:19:43,317 [INFO]: e3sm_diags_driver.py(_save_parameter_files:69) >> Saved command used to: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1220_edv3_U/prov/cmd_used.txt
2024-12-20 11:19:43,318 [INFO]: e3sm_diags_driver.py(_save_parameter_files:99) >> Saved cfg file to: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1220_edv3_U/prov/U_lat_lon.cfg
2024-12-20 11:19:43,318 [INFO]: e3sm_diags_driver.py(_save_python_script:133) >> Saved Python script to: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1220_edv3_U/prov/run_e3sm_diags_1996.py
2024-12-20 11:19:48,401 [INFO]: lat_lon_driver.py(run_diag:69) >> Variable: U
2024-12-20 11:20:19,096 [INFO]: lat_lon_driver.py(_run_diags_3d:398) >> Selected pressure level(s): [850.0]
2024-12-20 11:20:20,201 [INFO]: regrid.py(subset_and_align_datasets:70) >> Selected region: global
2024-12-20 11:20:28,357 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1220_edv3_U/lat_lon/ERA5/ERA5-U-850-ANN-global.json
2024-12-20 11:20:36,030 [INFO]: utils.py(_save_plot:91) >> Plot saved in: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1220_edv3_U/lat_lon/ERA5/ERA5-U-850-ANN-global.png
2024-12-20 11:20:36,030 [INFO]: main.py(create_viewer:132) >> lat_lon /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1220_edv3_U/viewer
2024-12-20 11:20:41,987 [INFO]: main.py(create_viewer:135) >> [('Latitude-Longitude contour maps', 'lat_lon/index.html'), ('Table', 'table/index.html'), ('Taylor Diagram', 'taylor/index.html'), ('CMIP6 Comparison', 'cmip6/index.html')]
2024-12-20 11:20:41,991 [INFO]: e3sm_diags_driver.py(main:392) >> Viewer HTML generated at /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1220_edv3_U/viewer/index.html
2024-12-20 11:20:41,992 [INFO]: logger.py(move_log_to_prov_dir:106) >> Log file saved in /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1220_edv3_U/prov/e3sm_diags_run.log

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Dec 20, 2024

Great job tracking down the problematic versions! It looks like we need to avoid using the faulty xarray/Dask combination for now...

I’m trying to create a minimal reproducible script to replicate the loading issue with large E3SM datasets for an Xarray bug report. However, I’ve had little success so far. Creating a 76 GB dummy dataset, subsetting it, and loading it has been very slow with both Dask versions (2024.11.2 and 2024.12.0). I’m unsure why this is happening, and I don’t think it’s worth investing more time into this approach.

"""
This script benchmarks the time taken to load a subset of a large xarray dataset
into memory.

Test Environments:
1. xarray=2024.11.0, dask=2024.11.2
    - Command: mamba create -y -n xr_2024110_dask_20241112 -c conda-forge xarray=2024.11.0 dask=2024.11.2 netcdf4 ipykernel

2. xarray=2024.11.0, dask=2024.12.0
    - Command: mamba create -y -n xr_2024110_dask_2024120 -c conda-forge xarray=2024.11.0 dask=2024.12.0 netcdf4 ipykernel

Steps:
1. Create a dummy "ua" dataset (~76 GB) backed by a random dask array.
2. Subset the dataset to a smaller size based on a time range.
3. Load the subsetted dataset into memory and measure the time taken.
"""

# %%
import numpy as np
import pandas as pd
import xarray as xr
import timeit

import dask.array as da

# 1. Create the coordinates.
times = pd.date_range("1979-01-01", "2019-12-31", freq="MS")
plevs = np.array(
    [
        100000.0, 97500.0, 95000.0, 92500.0, 90000.0, 87500.0, 85000.0,
        82500.0, 80000.0, 77500.0, 75000.0, 70000.0, 65000.0, 60000.0,
        55000.0, 50000.0, 45000.0, 40000.0, 35000.0, 30000.0, 25000.0,
        22500.0, 20000.0, 17500.0, 15000.0, 12500.0, 10000.0, 7000.0,
        5000.0, 3000.0, 2000.0, 1000.0, 700.0, 500.0, 300.0, 200.0, 100.0,
    ]
)
lats = np.linspace(-90, 90, 721)
lons = np.linspace(0, 360, 1440, endpoint=False)

# 2. Define the dimensions
time = len(times)
plev = len(plevs)
lat = len(lats)
lon = len(lons)

# 3. Create the dataset and subset it on time.
ds = xr.DataArray(
    name="ua",
    data=da.random.random(
        size=(time, plev, lat, lon), chunks=(497, 37, 721, 1440)
    ).astype(np.float32),
    dims=["time", "plev", "lat", "lon"],
    coords={"time": times, "plev": plevs, "lat": lats, "lon": lons},
).to_dataset()


ds_sub = ds.sel(time=slice("1996-01-15", "1997-01-15"))

# %%
# 4. Load the sub-setted dataset into memory.
start_time = timeit.default_timer()
ds_sub.load()
end_time = timeit.default_timer()

print(f"Time taken to load the subset: {end_time - start_time} seconds")

Instead, I plan to provide a script that references the E3SM ua dataset (/lcrc/group/e3sm/diagnostics/observations/Atm/time-series/ERA5/ua_197901_201912.nc).

@chengzhuzhang Are the diagnostic datasets publicly available somewhere for the developers to access? Or can I move the dataset to www and link to it?

"""
This script benchmarks the time taken to load a subset of a large xarray dataset
into memory.

Test Environments:
1. xarray=2024.11.0, dask=2024.11.2
    - Command: mamba create -y -n xr_2024110_dask_20241112 -c conda-forge xarray=2024.11.0 dask=2024.11.2 netcdf4 ipykernel
    - Result: ~3-4 secs to load

2. xarray=2024.11.0, dask=2024.12.0
    - Command: mamba create -y -n xr_2024110_dask_2024120 -c conda-forge xarray=2024.11.0 dask=2024.12.0 netcdf4 ipykernel
    - Result: ~85 secs to load

Steps:
1. Open the "ua" dataset (~76 GB) from a specified file path.
2. Subset the "ua" dataset to a smaller size (~2 GB) based on a time range.
3. Load the subsetted dataset into memory and measure the time taken for this operation.
"""

# %%
import timeit

import xarray as xr

# 1. Open the "ua" dataset (~76 GB)
filepaths = [
    "/lcrc/group/e3sm/diagnostics/observations/Atm/time-series/ERA5/ua_197901_201912.nc"
]

ds = xr.open_mfdataset(filepaths)

# 2. Subset the "ua" dataset (~2 GB)
ds_sub = ds.sel(time=slice("1996-01-15", "1997-01-15", None))

# %%
# 3. Load into memory
start_time = timeit.default_timer()
ds_sub.load()
elapsed = timeit.default_timer() - start_time

print(f"Time taken to load ds_xc_sub: {elapsed} seconds")
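A further narrowing step worth trying: time the same computation under different dask schedulers to see whether the slowdown lives in the scheduler or in graph construction. This is a self-contained sketch (a small random array stands in for the ~2 GB "ua" subset above):

```python
import timeit

import dask
import dask.array as da

# Small random array standing in for the "ua" subset; the chunking mirrors
# the (time, plev, lat, lon) layout used in the scripts above.
arr = da.random.random((4, 37, 181, 360), chunks=(2, 37, 181, 360)).astype("float32")

timings = {}
for scheduler in ["threads", "synchronous"]:
    start = timeit.default_timer()
    # dask.config.set(scheduler=...) switches the scheduler for this block only.
    with dask.config.set(scheduler=scheduler):
        result = arr.mean().compute()
    timings[scheduler] = timeit.default_timer() - start

print(timings)
```

If the slowdown only appears under one scheduler, that points at the scheduling layer rather than the graph itself.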

@chengzhuzhang
Copy link
Contributor Author

I’m trying to create a minimal reproducible script to replicate the loading issue with large E3SM datasets for an Xarray bug report.

Yeah, if possible, we should report this bug (even if we can't provide a reproducer due to the large data size). I just checked and there is a new dask release, v2024.12.1. We can try it again to see if it fixes the bug we're seeing.

@chengzhuzhang
Copy link
Contributor Author

@tomvothecoder, I just tested and dask 2024.12.1 doesn't fix the problem; the script still hung. We do have a version of the ua file that is publicly accessible, https://web.lcrc.anl.gov/public/e3sm/diagnostics/observations/Atm/time-series/ERA5/ua_197901_201912.nc, so we should file an issue with dask.

@chengzhuzhang
Copy link
Contributor Author

chengzhuzhang commented Dec 23, 2024

Downgrading to dask 2024.11.2 resolved the performance issue in a single run for variable U at 850. However, when re-running the complete run script (https://github.com/E3SM-Project/e3sm_diags/pull/880/files#diff-f1b232e7ead3e2eea141a279f68dda9257278babe3286234104d882549b2d453), the process hangs at another 3-D variable, T.

The following are the last several lines of the log before it hangs:

2024-12-23 13:11:01,288 [INFO]: utils.py(_save_plot:91) >> Plot saved in: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1223_edv3_lat_lon/lat_lon/ERA5/ERA5_ext-QREFHT-ANN-global.png
2024-12-23 13:11:01,289 [INFO]: lat_lon_driver.py(run_diag:69) >> Variable: U10
2024-12-23 13:11:14,482 [INFO]: regrid.py(subset_and_align_datasets:70) >> Selected region: global
2024-12-23 13:11:36,898 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1223_edv3_lat_lon/lat_lon/ERA5/ERA5_ext-U10-ANN-global.json
2024-12-23 13:11:46,349 [INFO]: utils.py(_save_plot:91) >> Plot saved in: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1223_edv3_lat_lon/lat_lon/ERA5/ERA5_ext-U10-ANN-global.png
2024-12-23 13:11:46,350 [INFO]: lat_lon_driver.py(run_diag:69) >> Variable: U
2024-12-23 13:12:47,845 [INFO]: lat_lon_driver.py(_run_diags_3d:398) >> Selected pressure level(s): [850.0]
2024-12-23 13:12:51,033 [INFO]: regrid.py(subset_and_align_datasets:70) >> Selected region: global
2024-12-23 13:13:01,314 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1223_edv3_lat_lon/lat_lon/ERA5/ERA5-U-850-ANN-global.json
2024-12-23 13:13:08,882 [INFO]: utils.py(_save_plot:91) >> Plot saved in: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1223_edv3_lat_lon/lat_lon/ERA5/ERA5-U-850-ANN-global.png
2024-12-23 13:13:08,883 [INFO]: lat_lon_driver.py(run_diag:69) >> Variable: U
2024-12-23 13:14:01,364 [INFO]: lat_lon_driver.py(_run_diags_3d:398) >> Selected pressure level(s): [200.0]
2024-12-23 13:14:04,701 [INFO]: regrid.py(subset_and_align_datasets:70) >> Selected region: global
2024-12-23 13:14:26,871 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1223_edv3_lat_lon/lat_lon/ERA5/ERA5-U-200-ANN-global.json
2024-12-23 13:14:29,465 [INFO]: utils.py(_save_plot:91) >> Plot saved in: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1223_edv3_lat_lon/lat_lon/ERA5/ERA5-U-200-ANN-global.png
2024-12-23 13:14:29,466 [INFO]: lat_lon_driver.py(run_diag:69) >> Variable: Z3
2024-12-23 13:15:33,686 [INFO]: lat_lon_driver.py(_run_diags_3d:398) >> Selected pressure level(s): [500.0]
2024-12-23 13:15:36,954 [INFO]: regrid.py(subset_and_align_datasets:70) >> Selected region: global
2024-12-23 13:16:00,436 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1223_edv3_lat_lon/lat_lon/ERA5/ERA5-Z3-500-ANN-global.json
2024-12-23 13:16:02,665 [INFO]: utils.py(_save_plot:91) >> Plot saved in: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1223_edv3_lat_lon/lat_lon/ERA5/ERA5-Z3-500-ANN-global.png
2024-12-23 13:16:02,666 [INFO]: lat_lon_driver.py(run_diag:69) >> Variable: OMEGA
2024-12-23 13:17:09,429 [INFO]: lat_lon_driver.py(_run_diags_3d:398) >> Selected pressure level(s): [200.0]
2024-12-23 13:17:13,528 [INFO]: regrid.py(subset_and_align_datasets:70) >> Selected region: global
2024-12-23 13:17:37,203 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1223_edv3_lat_lon/lat_lon/ERA5/ERA5-OMEGA-200-ANN-global.json
2024-12-23 13:19:29,942 [INFO]: utils.py(_save_plot:91) >> Plot saved in: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1223_edv3_lat_lon/lat_lon/ERA5/ERA5-OMEGA-200-ANN-global.png
2024-12-23 13:19:29,944 [INFO]: lat_lon_driver.py(run_diag:69) >> Variable: OMEGA
2024-12-23 13:20:27,928 [INFO]: lat_lon_driver.py(_run_diags_3d:398) >> Selected pressure level(s): [500.0]
2024-12-23 13:20:32,877 [INFO]: regrid.py(subset_and_align_datasets:70) >> Selected region: global
2024-12-23 13:20:54,878 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1223_edv3_lat_lon/lat_lon/ERA5/ERA5-OMEGA-500-ANN-global.json
2024-12-23 13:22:31,206 [INFO]: utils.py(_save_plot:91) >> Plot saved in: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1223_edv3_lat_lon/lat_lon/ERA5/ERA5-OMEGA-500-ANN-global.png
2024-12-23 13:22:31,207 [INFO]: lat_lon_driver.py(run_diag:69) >> Variable: OMEGA
2024-12-23 13:23:28,493 [INFO]: lat_lon_driver.py(_run_diags_3d:398) >> Selected pressure level(s): [850.0]
2024-12-23 13:23:33,735 [INFO]: regrid.py(subset_and_align_datasets:70) >> Selected region: global
2024-12-23 13:23:44,627 [INFO]: io.py(_save_data_metrics_and_plots:77) >> Metrics saved in /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1223_edv3_lat_lon/lat_lon/ERA5/ERA5-OMEGA-850-ANN-global.json
2024-12-23 13:28:17,632 [INFO]: utils.py(_save_plot:91) >> Plot saved in: /global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx/eamxx_decadal_1996_1223_edv3_lat_lon/lat_lon/ERA5/ERA5-OMEGA-850-ANN-global.png
2024-12-23 13:28:17,635 [INFO]: lat_lon_driver.py(run_diag:69) >> Variable: T

I also tested the T variable at 850 mb standalone, and it worked okay. I did two tests and the problem is reproducible. This is a new issue that emerged after re-basing; the same complete run finished on Nov 8th as shown here. It is not clear whether it is caused by a regression in dependencies or by some newer commits being merged, e.g. #866
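For the hang itself, a stdlib-only debugging sketch may help pinpoint where execution is stuck (this is a suggestion, not something tried in the thread; the `time.sleep` below is a stand-in for the hanging `ds_sub.load()` call):

```python
import faulthandler
import time

# Arm a watchdog: if the guarded section runs longer than `timeout` seconds,
# faulthandler dumps every thread's traceback to stderr (without killing the
# process), revealing where a hung dask computation is stuck.
faulthandler.dump_traceback_later(timeout=120, exit=False)

time.sleep(0.1)  # stand-in for the hanging ds_sub.load() call

# Disarm the watchdog once the section completes normally.
faulthandler.cancel_dump_traceback_later()
status = "completed"
print(status)
```

Wrapping the suspect `load()` call this way in the driver would turn a silent hang into a stack trace showing which variable and which dask code path is blocked.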

@tomvothecoder
Copy link
Collaborator

I also tested T variable at 850mb standalone, it worked okay. I did two tests and the problem can be reproduced.

To clarify, are you saying the T variable works fine standalone, but the performance bottleneck appears with the complete run?

@tomvothecoder
Copy link
Collaborator

@tomvothecoder, I just tested and dask 2024.12.1 doesn't fix the problem, the script still hung.

I forgot to mention earlier that I tested this version and it has the same problem.

@tomvothecoder
Copy link
Collaborator

Downgrading to dask 2024.11.2 resolved the performance issue in a single run for variable U at 850. However, when re-running the complete run script (#880 (files)), the process hangs at another 3-D variable, T.

I also tested the T variable at 850 mb standalone, and it worked okay. I did two tests and the problem is reproducible. This is a new issue that emerged after re-basing; the same complete run finished on Nov 8th as shown here. It is not clear whether it is caused by a regression in dependencies or by some newer commits being merged, e.g. #866

I ran the complete run script and it successfully completed in a little over an hour. I used the latest dev env from this branch, which includes dask=2024.11.2. Can you try again and make sure to run make install?

Results directory: https://portal.nersc.gov/project/e3sm/cdat-migration-fy24/892-bottleneck/eamxx_decadal_1996_1107_edv3/viewer/lat_lon/index.html

Commands

git checkout eamxx_1024
git pull
conda env create -f conda-env/dev.yml -n eamxx_1024
conda activate eamxx_1024
make install

python auxiliary_tools/cdat_regression_testing/892-bottleneck/run_script.py 

Log

Link: https://portal.nersc.gov/project/e3sm/cdat-migration-fy24/892-bottleneck/eamxx_decadal_1996_1107_edv3/prov/complete_run.log

Environment dependencies

conda list

# packages in environment at /global/u2/v/vo13/mambaforge/envs/eamxx_1024:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
alabaster                 1.0.0              pyhd8ed1ab_1    conda-forge
asttokens                 3.0.0              pyhd8ed1ab_1    conda-forge
aws-c-auth                0.8.0               hb921021_15    conda-forge
aws-c-cal                 0.8.1                h1a47875_3    conda-forge
aws-c-common              0.10.6               hb9d3cd8_0    conda-forge
aws-c-compression         0.3.0                h4e1184b_5    conda-forge
aws-c-event-stream        0.5.0               h7959bf6_11    conda-forge
aws-c-http                0.9.2                hefd7a92_4    conda-forge
aws-c-io                  0.15.3               h831e299_5    conda-forge
aws-c-mqtt                0.11.0              h11f4f37_12    conda-forge
aws-c-s3                  0.7.7                hf454442_0    conda-forge
aws-c-sdkutils            0.2.1                h4e1184b_4    conda-forge
aws-checksums             0.2.2                h4e1184b_4    conda-forge
aws-crt-cpp               0.29.7               hd92328a_7    conda-forge
aws-sdk-cpp               1.11.458             hc430e4a_4    conda-forge
azure-core-cpp            1.14.0               h5cfcd09_0    conda-forge
azure-identity-cpp        1.10.0               h113e628_0    conda-forge
azure-storage-blobs-cpp   12.13.0              h3cf044e_1    conda-forge
azure-storage-common-cpp  12.8.0               h736e048_1    conda-forge
azure-storage-files-datalake-cpp 12.12.0              ha633028_1    conda-forge
babel                     2.16.0             pyhd8ed1ab_1    conda-forge
beautifulsoup4            4.12.3             pyha770c72_1    conda-forge
black                     23.9.1          py312h7900ff3_1    conda-forge
blosc                     1.21.6               he440d0b_1    conda-forge
bokeh                     3.6.2              pyhd8ed1ab_1    conda-forge
bottleneck                1.4.2           py312hc0a28a1_0    conda-forge
brotli                    1.1.0                hb9d3cd8_2    conda-forge
brotli-bin                1.1.0                hb9d3cd8_2    conda-forge
brotli-python             1.1.0           py312h2ec8cdc_2    conda-forge
bzip2                     1.0.8                h4bc722e_7    conda-forge
c-ares                    1.34.4               hb9d3cd8_0    conda-forge
ca-certificates           2024.12.14           hbcca054_0    conda-forge
cartopy                   0.24.0          py312hf9745cd_0    conda-forge
cartopy_offlinedata       0.24.0             pyhd8ed1ab_0    conda-forge
certifi                   2024.12.14         pyhd8ed1ab_0    conda-forge
cf-units                  3.3.0           py312hc0a28a1_0    conda-forge
cf_xarray                 0.10.0             pyhd8ed1ab_2    conda-forge
cffi                      1.17.1          py312h06ac9bb_0    conda-forge
cfgv                      3.3.1              pyhd8ed1ab_1    conda-forge
cftime                    1.6.4           py312hc0a28a1_1    conda-forge
charset-normalizer        3.4.1              pyhd8ed1ab_0    conda-forge
cli-ui                    0.17.2             pyhd8ed1ab_0    conda-forge
click                     8.1.8              pyh707e725_0    conda-forge
cloudpickle               3.1.0              pyhd8ed1ab_2    conda-forge
colorama                  0.4.6              pyhd8ed1ab_1    conda-forge
comm                      0.2.2              pyhd8ed1ab_1    conda-forge
contextlib2               21.6.0             pyhd8ed1ab_1    conda-forge
contourpy                 1.3.1           py312h68727a3_0    conda-forge
coverage                  7.6.10          py312h178313f_0    conda-forge
cycler                    0.12.1             pyhd8ed1ab_1    conda-forge
cytoolz                   1.0.1           py312h66e93f0_0    conda-forge
dask                      2024.11.2          pyhff2d567_1    conda-forge
dask-core                 2024.11.2          pyhff2d567_1    conda-forge
dask-expr                 1.1.19             pyhd8ed1ab_0    conda-forge
debugpy                   1.8.11          py312h2ec8cdc_0    conda-forge
decorator                 5.1.1              pyhd8ed1ab_1    conda-forge
distlib                   0.3.9              pyhd8ed1ab_1    conda-forge
distributed               2024.11.2          pyhff2d567_1    conda-forge
docopt                    0.6.2              pyhd8ed1ab_2    conda-forge
docutils                  0.21.2             pyhd8ed1ab_1    conda-forge
e3sm-diags                2.12.1                   pypi_0    pypi
esmf                      8.7.0           nompi_h6063b07_1    conda-forge
esmpy                     8.7.0              pyhecae5ae_1    conda-forge
exceptiongroup            1.2.2              pyhd8ed1ab_1    conda-forge
executing                 2.1.0              pyhd8ed1ab_1    conda-forge
filelock                  3.16.1             pyhd8ed1ab_1    conda-forge
flake8                    6.1.0              pyhd8ed1ab_0    conda-forge
flake8-isort              6.1.0              pyhd8ed1ab_0    conda-forge
fonttools                 4.55.3          py312h178313f_1    conda-forge
freetype                  2.12.1               h267a509_2    conda-forge
fsspec                    2024.12.0          pyhd8ed1ab_0    conda-forge
future                    1.0.0              pyhd8ed1ab_1    conda-forge
geos                      3.13.0               h5888daf_0    conda-forge
gflags                    2.2.2             h5888daf_1005    conda-forge
glog                      0.7.1                hbabe93e_0    conda-forge
h2                        4.1.0              pyhd8ed1ab_1    conda-forge
hdf4                      4.2.15               h2a13503_7    conda-forge
hdf5                      1.14.4          nompi_h2d575fe_105    conda-forge
hpack                     4.0.0              pyhd8ed1ab_1    conda-forge
hyperframe                6.0.1              pyhd8ed1ab_1    conda-forge
identify                  2.6.5              pyhd8ed1ab_0    conda-forge
idna                      3.10               pyhd8ed1ab_1    conda-forge
imagesize                 1.4.1              pyhd8ed1ab_0    conda-forge
importlib-metadata        8.5.0              pyha770c72_1    conda-forge
importlib_resources       6.4.5              pyhd8ed1ab_1    conda-forge
iniconfig                 2.0.0              pyhd8ed1ab_1    conda-forge
ipykernel                 6.29.5             pyh3099207_0    conda-forge
ipython                   8.31.0             pyh707e725_0    conda-forge
isort                     5.12.0             pyhd8ed1ab_1    conda-forge
jedi                      0.19.2             pyhd8ed1ab_1    conda-forge
jinja2                    3.1.5              pyhd8ed1ab_0    conda-forge
joblib                    1.4.2              pyhd8ed1ab_1    conda-forge
jupyter_client            8.6.3              pyhd8ed1ab_1    conda-forge
jupyter_core              5.7.2              pyh31011fe_1    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.7           py312h68727a3_0    conda-forge
krb5                      1.21.3               h659f571_0    conda-forge
lcms2                     2.16                 hb7c19ff_0    conda-forge
ld_impl_linux-64          2.43                 h712a8e2_2    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20240722.0      cxx17_hbbce691_4    conda-forge
libaec                    1.1.3                h59595ed_0    conda-forge
libarrow                  18.1.0           hd595efa_7_cpu    conda-forge
libarrow-acero            18.1.0           hcb10f89_7_cpu    conda-forge
libarrow-dataset          18.1.0           hcb10f89_7_cpu    conda-forge
libarrow-substrait        18.1.0           h08228c5_7_cpu    conda-forge
libblas                   3.9.0           26_linux64_openblas    conda-forge
libbrotlicommon           1.1.0                hb9d3cd8_2    conda-forge
libbrotlidec              1.1.0                hb9d3cd8_2    conda-forge
libbrotlienc              1.1.0                hb9d3cd8_2    conda-forge
libcblas                  3.9.0           26_linux64_openblas    conda-forge
libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
libcurl                   8.11.1               h332b0f4_0    conda-forge
libdeflate                1.23                 h4ddbbb0_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libevent                  2.1.12               hf998b51_1    conda-forge
libexpat                  2.6.4                h5888daf_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc                    14.2.0               h77fa898_1    conda-forge
libgcc-ng                 14.2.0               h69a702a_1    conda-forge
libgfortran               14.2.0               h69a702a_1    conda-forge
libgfortran5              14.2.0               hd5240d6_1    conda-forge
libgomp                   14.2.0               h77fa898_1    conda-forge
libgoogle-cloud           2.33.0               h2b5623c_1    conda-forge
libgoogle-cloud-storage   2.33.0               h0121fbd_1    conda-forge
libgrpc                   1.67.1               h25350d4_1    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
liblapack                 3.9.0           26_linux64_openblas    conda-forge
libllvm14                 14.0.6               hcd5def8_4    conda-forge
liblzma                   5.6.3                hb9d3cd8_1    conda-forge
libnetcdf                 4.9.2           nompi_h5ddbaa4_116    conda-forge
libnghttp2                1.64.0               h161d5f1_0    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenblas               0.3.28          pthreads_h94d23a6_1    conda-forge
libparquet                18.1.0           h081d1f1_7_cpu    conda-forge
libpng                    1.6.44               hadc24fc_0    conda-forge
libprotobuf               5.28.3               h6128344_1    conda-forge
libre2-11                 2024.07.02           hbbce691_2    conda-forge
libsodium                 1.0.20               h4ab18f5_0    conda-forge
libsqlite                 3.47.2               hee588c1_0    conda-forge
libssh2                   1.11.1               hf672d98_0    conda-forge
libstdcxx                 14.2.0               hc0a3c3a_1    conda-forge
libstdcxx-ng              14.2.0               h4852527_1    conda-forge
libthrift                 0.21.0               h0e7cc3e_0    conda-forge
libtiff                   4.7.0                hd9ff511_3    conda-forge
libudunits2               2.2.28               h40f5838_3    conda-forge
libutf8proc               2.9.0                hb9d3cd8_1    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libwebp-base              1.5.0                h851e524_0    conda-forge
libxcb                    1.17.0               h8a09558_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.13.5               h0d44e9d_1    conda-forge
libxslt                   1.1.39               h76b75d6_0    conda-forge
libzip                    1.11.2               h6991a6a_0    conda-forge
libzlib                   1.3.1                hb9d3cd8_2    conda-forge
llvmlite                  0.43.0          py312h374181b_1    conda-forge
locket                    1.0.0              pyhd8ed1ab_0    conda-forge
lxml                      5.3.0           py312he28fd5a_2    conda-forge
lz4                       4.3.3           py312hf0f0c11_2    conda-forge
lz4-c                     1.10.0               h5888daf_1    conda-forge
mache                     1.27.0             pyhff2d567_0    conda-forge
markupsafe                3.0.2           py312h178313f_1    conda-forge
matplotlib-base           3.10.0          py312hd3ec401_0    conda-forge
matplotlib-inline         0.1.7              pyhd8ed1ab_1    conda-forge
mccabe                    0.7.0              pyhd8ed1ab_1    conda-forge
msgpack-python            1.1.0           py312h68727a3_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
mypy                      1.5.1           py312h98912ed_1    conda-forge
mypy_extensions           1.0.0              pyha770c72_1    conda-forge
ncurses                   6.5                  he02047a_1    conda-forge
nest-asyncio              1.6.0              pyhd8ed1ab_1    conda-forge
netcdf-fortran            4.6.1           nompi_ha5d1325_108    conda-forge
netcdf4                   1.7.2           nompi_py312ha728dd9_101    conda-forge
nodeenv                   1.9.1              pyhd8ed1ab_1    conda-forge
numba                     0.60.0          py312h83e6fd3_0    conda-forge
numpy                     2.0.2           py312h58c1407_1    conda-forge
openjpeg                  2.5.3                h5fbd93e_0    conda-forge
openssl                   3.4.0                h7b32b05_1    conda-forge
orc                       2.0.3                h12ee42a_2    conda-forge
output_viewer             1.3.3              pyhd8ed1ab_2    conda-forge
packaging                 24.2               pyhd8ed1ab_2    conda-forge
pandas                    2.2.3           py312hf9745cd_1    conda-forge
parso                     0.8.4              pyhd8ed1ab_1    conda-forge
partd                     1.4.2              pyhd8ed1ab_0    conda-forge
pathspec                  0.12.1             pyhd8ed1ab_1    conda-forge
patsy                     1.0.1              pyhd8ed1ab_1    conda-forge
pexpect                   4.9.0              pyhd8ed1ab_1    conda-forge
pickleshare               0.7.5           pyhd8ed1ab_1004    conda-forge
pillow                    11.1.0          py312h80c1187_0    conda-forge
pip                       24.3.1             pyh8b19718_2    conda-forge
platformdirs              4.3.6              pyhd8ed1ab_1    conda-forge
pluggy                    1.5.0              pyhd8ed1ab_1    conda-forge
popt                      1.16              h0b475e3_2002    conda-forge
pre-commit                4.0.1              pyha770c72_1    conda-forge
progressbar2              4.5.0              pyhd8ed1ab_1    conda-forge
proj                      9.5.1                h0054346_0    conda-forge
prompt-toolkit            3.0.48             pyha770c72_1    conda-forge
properscoring             0.1                pyhd8ed1ab_1    conda-forge
psutil                    6.1.1           py312h66e93f0_0    conda-forge
pthread-stubs             0.4               hb9d3cd8_1002    conda-forge
ptyprocess                0.7.0              pyhd8ed1ab_1    conda-forge
pure_eval                 0.2.3              pyhd8ed1ab_1    conda-forge
pyarrow                   18.1.0          py312h7900ff3_0    conda-forge
pyarrow-core              18.1.0          py312h01725c0_0_cpu    conda-forge
pycodestyle               2.11.1             pyhd8ed1ab_0    conda-forge
pycparser                 2.22               pyh29332c3_1    conda-forge
pyflakes                  3.1.0              pyhd8ed1ab_0    conda-forge
pygments                  2.18.0             pyhd8ed1ab_1    conda-forge
pyparsing                 3.2.1              pyhd8ed1ab_0    conda-forge
pyproj                    3.7.0           py312he630544_0    conda-forge
pyshp                     2.3.1              pyhd8ed1ab_1    conda-forge
pysocks                   1.7.1              pyha55dd90_7    conda-forge
pytest                    8.3.4              pyhd8ed1ab_1    conda-forge
pytest-cov                6.0.0              pyhd8ed1ab_1    conda-forge
python                    3.12.8          h9e4cc4f_1_cpython    conda-forge
python-dateutil           2.9.0.post0        pyhff2d567_1    conda-forge
python-tzdata             2024.2             pyhd8ed1ab_1    conda-forge
python-utils              3.9.1              pyhff2d567_1    conda-forge
python_abi                3.12                    5_cp312    conda-forge
pytz                      2024.1             pyhd8ed1ab_0    conda-forge
pyyaml                    6.0.2           py312h66e93f0_1    conda-forge
pyzmq                     26.2.0          py312hbf22597_3    conda-forge
qhull                     2020.2               h434a139_5    conda-forge
re2                       2024.07.02           h9925aae_2    conda-forge
readline                  8.2                  h8228510_1    conda-forge
requests                  2.32.3             pyhd8ed1ab_1    conda-forge
rsync                     3.3.0                h168f954_1    conda-forge
s2n                       1.5.10               hb5b8611_0    conda-forge
schema                    0.7.7              pyhd8ed1ab_0    conda-forge
scikit-learn              1.6.0           py312h7a48858_0    conda-forge
scipy                     1.15.0          py312h180e4f1_0    conda-forge
setuptools                75.6.0             pyhff2d567_1    conda-forge
shapely                   2.0.6           py312h391bc85_2    conda-forge
six                       1.17.0             pyhd8ed1ab_0    conda-forge
snappy                    1.2.1                h8bd8927_1    conda-forge
snowballstemmer           2.2.0              pyhd8ed1ab_0    conda-forge
sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
soupsieve                 2.5                pyhd8ed1ab_1    conda-forge
sparse                    0.15.4             pyh267e887_1    conda-forge
sphinx                    8.1.3              pyhd8ed1ab_1    conda-forge
sphinx-multiversion       0.2.4              pyhd8ed1ab_0    conda-forge
sphinx_rtd_theme          3.0.1              pyha770c72_0    conda-forge
sphinxcontrib-applehelp   2.0.0              pyhd8ed1ab_1    conda-forge
sphinxcontrib-devhelp     2.0.0              pyhd8ed1ab_1    conda-forge
sphinxcontrib-htmlhelp    2.1.0              pyhd8ed1ab_1    conda-forge
sphinxcontrib-jquery      4.1                pyhd8ed1ab_1    conda-forge
sphinxcontrib-jsmath      1.0.1              pyhd8ed1ab_1    conda-forge
sphinxcontrib-qthelp      2.0.0              pyhd8ed1ab_1    conda-forge
sphinxcontrib-serializinghtml 1.1.10             pyhd8ed1ab_1    conda-forge
sqlite                    3.47.2               h9eae976_0    conda-forge
stack_data                0.6.3              pyhd8ed1ab_1    conda-forge
statsmodels               0.14.4          py312hc0a28a1_0    conda-forge
tabulate                  0.9.0              pyhd8ed1ab_2    conda-forge
tblib                     3.0.0              pyhd8ed1ab_1    conda-forge
tbump                     6.9.0              pyhd8ed1ab_0    conda-forge
threadpoolctl             3.5.0              pyhc1e730c_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
toml                      0.10.2             pyhd8ed1ab_1    conda-forge
tomli                     2.2.1              pyhd8ed1ab_1    conda-forge
tomlkit                   0.13.2             pyha770c72_1    conda-forge
toolz                     1.0.0              pyhd8ed1ab_1    conda-forge
tornado                   6.4.2           py312h66e93f0_0    conda-forge
traitlets                 5.14.3             pyhd8ed1ab_1    conda-forge
types-pyyaml              6.0.12.20241230    pyhd8ed1ab_0    conda-forge
typing_extensions         4.12.2             pyha770c72_1    conda-forge
tzdata                    2024b                hc8b5060_0    conda-forge
udunits2                  2.2.28               h40f5838_3    conda-forge
ukkonen                   1.0.1           py312h68727a3_5    conda-forge
unicodedata2              15.1.0          py312h66e93f0_1    conda-forge
unidecode                 1.3.8              pyh29332c3_1    conda-forge
urllib3                   2.3.0              pyhd8ed1ab_0    conda-forge
virtualenv                20.28.1            pyhd8ed1ab_0    conda-forge
wcwidth                   0.2.13             pyhd8ed1ab_1    conda-forge
wheel                     0.45.1             pyhd8ed1ab_1    conda-forge
xarray                    2025.1.0           pyhd8ed1ab_0    conda-forge
xcdat                     0.7.3              pyhd8ed1ab_1    conda-forge
xesmf                     0.8.8              pyhd8ed1ab_1    conda-forge
xgcm                      0.8.1              pyhd8ed1ab_1    conda-forge
xhistogram                0.3.2              pyhd8ed1ab_0    conda-forge
xorg-libxau               1.0.12               hb9d3cd8_0    conda-forge
xorg-libxdmcp             1.1.5                hb9d3cd8_0    conda-forge
xskillscore               0.0.26             pyhd8ed1ab_1    conda-forge
xxhash                    0.8.2                hd590300_0    conda-forge
xyzservices               2024.9.0           pyhd8ed1ab_1    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
zeromq                    4.3.5                h3b0a872_7    conda-forge
zict                      3.0.0              pyhd8ed1ab_1    conda-forge
zipp                      3.21.0             pyhd8ed1ab_1    conda-forge
zlib                      1.3.1                hb9d3cd8_2    conda-forge
zstandard                 0.23.0          py312hef9b889_1    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge

@chengzhuzhang
Contributor Author

@tomvothecoder thank you for filing an error report to xarray/dask, and for testing the complete run. Good to know your run completed without hanging. I'm re-running the long-run script after `make clean` and `pip install .` to see if the run still hangs.

@chengzhuzhang
Contributor Author

chengzhuzhang commented Jan 8, 2025

@tomvothecoder the full lat-lon set completed within 50 minutes, without hanging. I'm now testing the core set. I was trying to compare the performance before and after rebasing, but noticed the log file was incomplete, as you reported, so I can't derive the time. Before rebasing, the complete set finished within 2 hours; I'm not sure if that is still the case. It would be nice to test the performance updates following #907 (review), or we can do that separately after #907 is merged.

Update: the complete run finished in 1 hr 52 min, and the metrics match before and after rebasing.

@chengzhuzhang
Contributor Author

chengzhuzhang commented Jan 8, 2025

@PeterCaldwell @brhillman @crterai @AaronDonahue, this PR to support EAMxx monthly ne30pg2 output variables in e3sm_diags is now ready to be merged into the newly refactored e3sm_diags code base on main (now xarray/xcdat based). We included example scripts to set up a diags run for the decadal simulation here. Thank you @AaronDonahue and @crterai for reviewing and commenting on the variable derivations. We will open separate PRs for more capabilities that support EAMxx. Feedback welcome!

@@ -13,7 +13,7 @@ dependencies:
- cartopy >=0.17.0
- cartopy_offlinedata
- cf-units
- dask
- dask <2024.12.0
Collaborator

@chengzhuzhang @xylar FYI I was notified the performance issue will be fixed in dask=2025.1.0, dask/community#410 via dask/dask#11638 .

I think we should still constrain the dask version for now until 2025.1.0 is released and we do some more testing during the RC testing phase.

Contributor

@xylar Jan 10, 2025

Would it be reasonable to change this to dask !=2024.12.0,!=2024.12.1? I guess it's fine either way, but that seems like where we're likely to end up, right?

Contributor Author

I think either way would be fine. Until we are able to test 2025.1.0, I think dask <2024.12.0 is safer for now.

Collaborator

@tomvothecoder Jan 10, 2025

We can update this constraint to dask !=2024.12.0,!=2024.12.1 in a separate PR so we can test 2025.1.0 to confirm it is indeed fixed.
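
The difference between the two pin styles discussed above can be illustrated with a small version-comparison sketch (the `parse` helper and pin predicates are hypothetical, written for this example only, not e3sm_diags or conda code):

```python
# Sketch: compare the two candidate dask pins.
# These helpers are illustrative; real resolvers use PEP 440 / conda match specs.

def parse(version: str) -> tuple[int, ...]:
    """Turn a 'YYYY.MM.P' version string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def satisfies_upper_pin(version: str) -> bool:
    """dask <2024.12.0 -- blocks every release from 2024.12.0 onward."""
    return parse(version) < parse("2024.12.0")

def satisfies_exclusion_pin(version: str) -> bool:
    """dask !=2024.12.0,!=2024.12.1 -- blocks only the two known-bad releases."""
    return parse(version) not in (parse("2024.12.0"), parse("2024.12.1"))

for v in ("2024.11.2", "2024.12.0", "2024.12.1", "2025.1.0"):
    print(v, satisfies_upper_pin(v), satisfies_exclusion_pin(v))
```

The upper pin rejects the (presumably fixed) 2025.1.0 release along with the bad ones, while the exclusion pin admits it automatically once released, which is why the exclusion form is where the constraint is likely to end up after testing.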

Collaborator

@tomvothecoder left a comment

This PR looks good to me. Thank you for your work @chengzhuzhang!

Let's wait to merge PR #907 first, then rebase this branch and re-run the EAMxx complete run. I want to make sure the subsetting changes mentioned here work the same on this branch.

Once everything checks out, I'll merge this PR.

e3sm_diags/driver/lat_lon_driver.py (outdated review thread, resolved)
@crterai

crterai commented Jan 13, 2025

Thank you for working on this PR and bringing it to completion, @chengzhuzhang and @tomvothecoder. The documentation will be helpful for acquainting new users.
I noticed the sample e3sm_diags output that is produced mainly compares with ERA5 variables and skips the radiation fluxes, likely because the simulation period (1996) does not overlap with the CERES period. Whenever we're trying to do full global eval on the EAMxx side, we should try to run a period around 2001 to make the most of the new capabilities to compare coincident obs with EAMxx output.
@chengzhuzhang - did you notice any outputs that we were not saving from our sims that we typically also evaluate?

@chengzhuzhang
Copy link
Contributor Author

chengzhuzhang commented Jan 13, 2025

@crterai thank you for your comment; this is a great point! The CERES EBAF variables are not showing up because the period used doesn't have a CERES record. Other than this, I did notice there are variables, or groups of variables, that are often looked at in EAM but can't be evaluated in this particular decadal output:

  1. Surface stress that can be derived from e.g. TAUX, TAUY in EAM.
  2. Monthly mean daily max and min temperature: e.g. TREFMXAV and TREFMNAV in EAM
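
For item 1, the derivation is a simple vector magnitude. A minimal numpy sketch, assuming EAM-style variable names (TAUX/TAUY) and toy values rather than real model output or a confirmed EAMxx mapping:

```python
import numpy as np

# Sketch of deriving surface wind stress magnitude from EAM-style zonal and
# meridional components TAUX and TAUY (N/m^2). Sign conventions may need
# flipping depending on whether the stress is on the atmosphere or surface;
# the arrays below are toy values, not model output.
taux = np.array([[0.03, -0.04], [0.00, 0.05]])  # zonal surface stress
tauy = np.array([[0.04, 0.03], [-0.12, 0.00]])  # meridional surface stress

# Magnitude of the stress vector at each grid point.
stress = np.hypot(taux, tauy)
print(stress)  # [[0.05 0.05] [0.12 0.05]]
```

The equivalent EAMxx derivation would only need the two component variables to be present in the output stream.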

Next I will evaluate the COSP output mapping, and I also need to follow up with the aerosol and chemistry development, because we already have some basic capabilities for the aerosol budget and chemistry; it should be straightforward to update them to support future EAMxx output if the equivalent variables are saved.
