
[Bug]: Investigate differences in plots produced with zppy test using Xarray/xCDAT codebase #906

Closed
tomvothecoder opened this issue Dec 11, 2024 · 8 comments · Fixed by #907
Labels
bug Bug fix (will increment patch version)

Comments

@tomvothecoder
Collaborator

tomvothecoder commented Dec 11, 2024

What happened?

Related to E3SM-Project/zppy#651 (comment)

Image check failures

There's a large number of image check diffs in subdirectories of:

But as far as I can tell, the errors are mostly benign. Example errors:

  • Many involve actual having "RMSE" and "CORR" further to the left than in expected.
  • Slight differences in text shape/size -- e.g., actual and expected
  • Slight differences in contours like actual and expected
  • Differences in lat/lon labels -- actual and expected

Some errors seem more concerning:

What did you expect to happen? Are there any possible answers you came across?

No response

Minimal Complete Verifiable Example (MVCE)

No response

Relevant log output

No response

Anything else we need to know?

No response

Environment

Latest main with Xarray/xCDAT codebase

@tomvothecoder tomvothecoder added the bug Bug fix (will increment patch version) label Dec 11, 2024
@chengzhuzhang
Contributor

chengzhuzhang commented Dec 11, 2024

@forsyth2 and @tomvothecoder, thank you for reporting this issue. Some results are definitely off; for example, see https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-346-20241210/v3.LR.historical_0051/image_check_failures_comprehensive_v3/e3sm_diags/atm_monthly_180x360_aave/model_vs_obs_1987-1988/lat_lon/ERA5/ERA5-TREFHT-ANN-global.png_actual.png and its corresponding box plot. I will try to go through them in detail.

@forsyth2
Collaborator

@chengzhuzhang @tomvothecoder If you want to test using zppy (rather than purely e3sm_diags), you can use the following workflow:

Workflow

# Set up branch
cd <zppy directory>
git status
# Check that there are no file changes that will persist when we switch branches.
git fetch origin main
# Check out local version of my branch
# You need to do this because zppy can't run post-CDAT E3SM Diags with its code as of `main`.
git checkout -b issue-346-diags-post-refactor origin/issue-346-diags-post-refactor

# Set up E3SM Diags environment
cd <e3sm_diags directory>
git status
# Check that there are no file changes that will persist when we switch branches.
git fetch upstream
git checkout main
git reset --hard upstream/main
conda clean --all --y
conda env create -f conda-env/dev.yml -n e3sm_diags_main_<date>
conda activate e3sm_diags_main_<date>
pip install .

# Set up zppy environment
cd <zppy directory>
conda clean --all --y
conda env create -f conda/dev.yml -n zppy_dev_pr651
conda activate zppy_dev_pr651
pip install .

If you want a quick debugging method:

For a sample test cfg, you could modify the paths in tests/integration/generated/test_min_case_e3sm_diags_comprehensive_v3_chrysalis.cfg, available at https://github.com/E3SM-Project/zppy/pull/651/files

# Run zppy
zppy -c test.cfg

If you want to reproduce my test results:

Full test reproduction steps
# Set up a zppy-interfaces environment too
# We need to do this because the full test suite is expecting we have all tasks operating.
cd <zppy-interfaces>
git status
# Check that there are no file changes that will persist when we switch branches.
git fetch upstream
git checkout main
git reset --hard upstream/main
conda clean --all --y
conda env create -f conda-env/dev.yml -n zi_main_<date>
conda activate zi_main_<date>
pip install .

# Modify tests/integration/utils.py:
# UNIQUE_ID = "my-test-id"
# For get_chyrsalis_expansions: (Switch out the paths so they point to your conda install location)
#        "diags_environment_commands": "source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; conda activate e3sm_diags_main_<date>",
#        "global_time_series_environment_commands": "source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; conda activate zi_main_<date>",

python tests/integration/utils.py
zppy -c tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_comprehensive_v2_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_bundles_chrysalis.cfg # Runs 1st part of bundles cfg

# Once those all finish:
zppy -c tests/integration/generated/test_weekly_bundles_chrysalis.cfg # Runs 2nd part of bundles cfg

# Once that finishes:
# Check output, grep lines should print nothing
cd /lcrc/group/e3sm/${USER}/zppy_weekly_comprehensive_v3_output/my-test-id/v3.LR.historical_0051/post/scripts/
grep -v "OK" *status
cd /lcrc/group/e3sm/${USER}/zppy_weekly_comprehensive_v2_output/my-test-id/v2.LR.historical_0201/post/scripts
grep -v "OK" *status
cd /lcrc/group/e3sm/${USER}/zppy_weekly_bundles_output/my-test-id/v3.LR.historical_0051/post/scripts
grep -v "OK" *status

# Run integration tests
cd <zppy directory>
pytest tests/integration/test_*.py
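
The `grep -v "OK" *status` checks in the steps above can also be scripted. Here is a minimal Python sketch, assuming (inferred from the grep commands, not confirmed here) that a passing task's status file contains only lines with `OK`. The helper name `find_failed_status_files` is hypothetical and not part of zppy:

```python
from pathlib import Path


def find_failed_status_files(scripts_dir):
    """Return {file name: [lines without "OK"]} for every *status file.

    Mirrors `grep -v "OK" *status`: an empty dict means there is nothing to report.
    """
    failures = {}
    for status_file in sorted(Path(scripts_dir).glob("*status")):
        bad_lines = [
            line
            for line in status_file.read_text().splitlines()
            if "OK" not in line
        ]
        if bad_lines:
            failures[status_file.name] = bad_lines
    return failures
```

Calling it on each `post/scripts` directory listed above should return an empty dict when every task finished cleanly.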

Now look for the image_check_failures subdirectories on the web server:

https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/<username>/zppy_<test_name>_www/my-test-id/v3.LR.historical_0051/

Those have actual, expected, and diff images.
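
To get an overview of which plots failed, the images can be grouped by plot. Here is a rough sketch, assuming you have a local copy of an `image_check_failures` directory and that the files follow the `<plot>.png_actual.png` / `<plot>.png_diff.png` naming seen in the links in this thread. The `_expected.png` suffix is my assumption by analogy, and `group_image_check_failures` is a hypothetical helper:

```python
from collections import defaultdict
from pathlib import Path

# Image kinds to look for; "expected" is assumed by analogy with the other two.
KINDS = ("actual", "expected", "diff")


def group_image_check_failures(root):
    """Map each plot path (relative to root) to the set of image kinds found."""
    groups = defaultdict(set)
    for path in Path(root).rglob("*.png"):
        for kind in KINDS:
            marker = f"_{kind}.png"
            if path.name.endswith(marker) and len(path.name) > len(marker):
                # Strip the marker to recover the original plot file name.
                plot_name = path.name[: -len(marker)]
                plot = path.relative_to(root).with_name(plot_name)
                groups[plot.as_posix()].add(kind)
                break
    return dict(groups)
```

A plot whose set is missing one of the three kinds may be worth checking by hand.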

@chengzhuzhang
Contributor

Thanks, @forsyth2!
It looks like cdscan has now been successfully removed as a zppy dependency, with the updates in the new e3sm_diags codebase and zppy. Is that right?

@chengzhuzhang
Contributor

I manually went through the diffs and noted some outstanding issues we need to address:

  1. Any sets with variable TREFHT, SST and TREFMNAV
    2. GPCP v3.2 diff plot
  2. MISRCOSP-CLDLOW_TAU1.3_9.4_MISR
  3. Metrics diffs in zonal mean 2d plots.
    I think we should be able to confirm these diffs with v3 complete run regression tests.

@forsyth2
Collaborator

It looks like cdscan has now been successfully removed as a zppy dependency, with the updates in the new e3sm_diags codebase and zppy. Is that right?

Yes, E3SM-Project/zppy#651 removes cdscan from zppy code. It was never actually a zppy dependency -- it was used in e3sm_diags.bash, which runs in whatever environment environment_commands is set to.

The only other issue with CDAT was with global-time-series, which was resolved in E3SM-Project/zppy#519 + E3SM-Project/zppy#611 (same situation -- the environment that global_time_series.bash is run in is actually what used it, not the zppy environment itself.)

In summary, the zppy environment never used CDAT/cdscan, but there were two instances where it was used in templated code run in other environments. Once E3SM-Project/zppy#651 merges, not even these will remain.

@chengzhuzhang
Contributor

@tomvothecoder I can reproduce the first issue (TREFHT) with a dev environment created from main. However, when debugging in a new PR (#907) in VS Code, I ran into a problem with ESMF:

chengzhu@login12:~/e3sm_diags>  cd /global/homes/c/chengzhu/e3sm_diags ; /usr/bin/env /global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/bin/python /global/u2/c/chengzhu/.vscode-server/extensions/ms-python.debugpy-2024.12.0/bundled/libs/debugpy/adapter/../../debugpy/launcher 54913 -- /global/homes/c/chengzhu/e3sm_diags/auxiliary_tools/cdat_regression_testing/906-v3_complete_run/12_11_24_lat-lon_diffs.py 
Traceback (most recent call last):
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/global/u2/c/chengzhu/.vscode-server/extensions/ms-python.debugpy-2024.12.0/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71, in <module>
    cli.main()
  File "/global/u2/c/chengzhu/.vscode-server/extensions/ms-python.debugpy-2024.12.0/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 501, in main
    run()
  File "/global/u2/c/chengzhu/.vscode-server/extensions/ms-python.debugpy-2024.12.0/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 351, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/global/u2/c/chengzhu/.vscode-server/extensions/ms-python.debugpy-2024.12.0/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310, in run_path
    return _run_module_code(code, init_globals, run_name, pkg_name=pkg_name, script_name=fname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/u2/c/chengzhu/.vscode-server/extensions/ms-python.debugpy-2024.12.0/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127, in _run_module_code
    _run_code(code, mod_globals, init_globals, mod_name, mod_spec, pkg_name, script_name)
  File "/global/u2/c/chengzhu/.vscode-server/extensions/ms-python.debugpy-2024.12.0/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118, in _run_code
    exec(code, run_globals)
  File "/global/homes/c/chengzhu/e3sm_diags/auxiliary_tools/cdat_regression_testing/906-v3_complete_run/12_11_24_lat-lon_diffs.py", line 9, in <module>
    from e3sm_diags.parameter.core_parameter import CoreParameter
  File "/global/homes/c/chengzhu/e3sm_diags/e3sm_diags/parameter/__init__.py", line 1, in <module>
    from .annual_cycle_zonal_mean_parameter import ACzonalmeanParameter
  File "/global/homes/c/chengzhu/e3sm_diags/e3sm_diags/parameter/annual_cycle_zonal_mean_parameter.py", line 1, in <module>
    from .core_parameter import CoreParameter
  File "/global/homes/c/chengzhu/e3sm_diags/e3sm_diags/parameter/core_parameter.py", line 11, in <module>
    from e3sm_diags.driver.utils.climo_xr import ClimoFreq
  File "/global/homes/c/chengzhu/e3sm_diags/e3sm_diags/driver/utils/climo_xr.py", line 11, in <module>
    import xcdat as xc
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/xcdat/__init__.py", line 11, in <module>
    from xcdat.regridder.accessor import RegridderAccessor  # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/xcdat/regridder/__init__.py", line 1, in <module>
    from xcdat.regridder.accessor import RegridderAccessor  # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/xcdat/regridder/accessor.py", line 8, in <module>
    from xcdat.regridder import regrid2, xesmf, xgcm
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/xcdat/regridder/xesmf.py", line 4, in <module>
    import xesmf as xe
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/xesmf/__init__.py", line 3, in <module>
    from . import data, util
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/xesmf/util.py", line 8, in <module>
    import esmpy as ESMF
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/esmpy/__init__.py", line 108, in <module>
    from esmpy.api.esmpymanager import *
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/esmpy/api/esmpymanager.py", line 9, in <module>
    from esmpy.interface.cbindings import *
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/esmpy/interface/cbindings.py", line 13, in <module>
    from esmpy.interface.loadESMF import _ESMF
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/edv3/lib/python3.12/site-packages/esmpy/interface/loadESMF.py", line 89, in <module>
    raise VersionMismatch("ESMF installation version {}, ESMPy version {}".format(
esmpy.util.exceptions.VersionMismatch: ESMF installation version 8.4.2, ESMPy version 8.7.0

Have you encountered similar problems before?
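
The `VersionMismatch` above means the ESMF library that ESMPy loaded reports a different version (8.4.2) than the installed `esmpy` package (8.7.0). Here is a minimal sketch for checking the two by hand. It assumes, without confirmation in this thread, that ESMPy locates the library through an `esmf.mk` file named by the `ESMFMKFILE` environment variable and that the file has an `ESMF_VERSION_STRING=` line. Both helper names are hypothetical:

```python
import os
from importlib import metadata


def esmf_version_from_mk(mk_text):
    """Return the value of ESMF_VERSION_STRING in esmf.mk text, or None."""
    for line in mk_text.splitlines():
        # The trailing "=" avoids matching similar keys such as
        # ESMF_VERSION_STRING_GIT.
        if line.startswith("ESMF_VERSION_STRING="):
            return line.split("=", 1)[1].strip()
    return None


def report_esmf_versions():
    """Print and return (ESMF library version, esmpy package version)."""
    mk_path = os.environ.get("ESMFMKFILE")  # assumption: set when the env is activated
    lib_version = None
    if mk_path and os.path.isfile(mk_path):
        with open(mk_path) as f:
            lib_version = esmf_version_from_mk(f.read())
    try:
        py_version = metadata.version("esmpy")
    except metadata.PackageNotFoundError:
        py_version = None
    print(f"ESMF library: {lib_version}, esmpy package: {py_version}")
    return lib_version, py_version
```

Running `report_esmf_versions()` with the same interpreter and environment variables that the debugger uses should show whether the two versions disagree.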

@tomvothecoder
Collaborator Author

I manually went through the diffs and noted some outstanding issues we need to address:

1. Any sets with variable TREFHT, [SST](https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-346-20241210/v3.LR.historical_0051/image_check_failures_comprehensive_v3/e3sm_diags/atm_monthly_180x360_aave/model_vs_obs_1987-1988/lat_lon/SST_CL_HadISST/HadISST_CL-SST-ANN-global.png_diff.png) and [TREFMNAV](https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-346-20241210/v3.LR.historical_0051/image_check_failures_comprehensive_v3/e3sm_diags/atm_monthly_180x360_aave/model_vs_obs_1987-1988/lat_lon/MERRA2/MERRA2-TREFMNAV-ANN-global.png_diff.png)
   2. [GPCP v3.2 diff plot](https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-346-20241210/v3.LR.historical_0051/image_check_failures_comprehensive_v3/e3sm_diags/atm_monthly_180x360_aave/model_vs_obs_1987-1988/lat_lon/GPCP_v3.2/GPCP_v3.2-PRECT-ANN-global.png_diff.png)

2. [MISRCOSP-CLDLOW_TAU1.3_9.4_MISR](https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-346-20241210/v3.LR.historical_0051/image_check_failures_comprehensive_v3/e3sm_diags/atm_monthly_180x360_aave/model_vs_obs_1987-1988/lat_lon/Cloud%20MISR/MISRCOSP-CLDLOW_TAU1.3_9.4_MISR-ANN-global.png_diff.png)

3. Metrics diffs in zonal mean 2d plots.
   I think we should be able to confirm these diffs with v3 complete run regression tests.

Here is my comment in PR #903 with the regression testing notebook.

The variables that you pointed out here with issues are also appearing in the list of mismatching variables.

@tomvothecoder
Collaborator Author

tomvothecoder commented Dec 12, 2024

@tomvothecoder I can reproduce the first issue (TREFHT) with a dev environment created from main. However, when debugging in a new PR (#907) in VS Code, I ran into a problem with ESMF:

Traceback (most recent call last):
  ...
esmpy.util.exceptions.VersionMismatch: ESMF installation version 8.4.2, ESMPy version 8.7.0

Have you encountered similar problems before?

Hey @chengzhuzhang, I think I ran into this with e3sm_diags before and I recreated the dev env to get around it.

I just created the dev environment and was able to import xcdat without issue. All of the correct dependency versions are installed.

mamba env create -f conda-env/dev.yml -n e3sm_diags_dev_894_2
mamba activate e3sm_diags_dev_894_2

(e3sm_diags_dev_894_2) vo13@login17:.../E3SM-Project/e3sm_diags$ python
Python 3.12.8 | packaged by conda-forge | (main, Dec  5 2024, 14:24:40) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xcdat
>>> 
KeyboardInterrupt
>>> 
[1]+  Stopped                 python
(e3sm_diags_dev_894_2) vo13@login17:.../E3SM-Project/e3sm_diags$ conda list
# packages in environment at /global/homes/v/vo13/mambaforge/envs/e3sm_diags_dev_894_2:
#
# Name                    Version                   Build  Channel
...
esmf                      8.7.0           nompi_h6063b07_1    conda-forge
esmpy                     8.7.0              pyhecae5ae_0    conda-forge
...
xarray                    2024.11.0          pyhd8ed1ab_0    conda-forge
xcdat                     0.7.3              pyhd8ed1ab_1    conda-forge
xesmf                     0.8.8              pyhd8ed1ab_1    conda-forge
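
The same check can be done on saved `conda list` output. Here is a small sketch that pulls the versions out of the listing text and compares `esmf` against `esmpy`. `versions_from_conda_list` is a hypothetical helper, and the sample rows are copied from the listing above:

```python
def versions_from_conda_list(listing, names):
    """Extract {package: version} for the requested names from `conda list` text."""
    wanted = set(names)
    found = {}
    for line in listing.splitlines():
        fields = line.split()
        # Skip comment lines, "..." elisions, and rows without a version column.
        if line.startswith("#") or len(fields) < 2:
            continue
        if fields[0] in wanted:
            found[fields[0]] = fields[1]
    return found


listing = """\
# Name                    Version                   Build  Channel
esmf                      8.7.0           nompi_h6063b07_1    conda-forge
esmpy                     8.7.0              pyhecae5ae_0    conda-forge
xesmf                     0.8.8              pyhd8ed1ab_1    conda-forge
"""
versions = versions_from_conda_list(listing, ["esmf", "esmpy"])
print(versions["esmf"] == versions["esmpy"])  # True for the rows above
```

If the comparison is false in the environment the debugger actually uses, recreating the environment as described above is the workaround reported in this thread.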
