Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reporting error in area average of irregular grids in UKESM data #2643

Open
rswamina opened this issue Jan 22, 2025 · 23 comments
Open

Reporting error in area average of irregular grids in UKESM data #2643

rswamina opened this issue Jan 22, 2025 · 23 comments
Assignees
Labels
preprocessor Related to the preprocessor

Comments

@rswamina
Copy link

In testing ESMValTool with some newly CMORized data, I ran into the following error message when I tried to do area averaging. The variable is tos. Here is some sample code to test this

import iris
from esmvalcore.preprocessor import area_statistics


def main():

    filename = '/gws/nopw/j04/terrafirma/TerraFIRMA/MOHC/UKESM1-2/esm-hist/r1i1p1f1/Omon/tos/gn/v20240619/tos_Omon_UKESM1-2-LL_esm-hist_r1i1p1f1_gn_195001-201412.nc'
    cube = iris.load_cube(filename)
    cube_averaged = area_statistics(cube,'mean')


if __name__ == '__main__':
    main()

I used module load esmvaltool on JASMIN and and the error message I got was:

ERROR 1: PROJ: proj_create_from_database: Open of /apps/jasmin/community/esmvaltool/miniconda3_py311_23.11.0-2/envs/esmvaltool/share/proj failed
/apps/jasmin/community/esmvaltool/miniconda3_py311_23.11.0-2/envs/esmvaltool/lib/python3.11/site-packages/iris/fileformats/cf.py:859: UserWarning: Missing CF-netCDF measure variable 'areacello', referenced by netCDF variable 'tos'
  warnings.warn(
Traceback (most recent call last):
  File "./test_esmvalcore_area_average.py", line 13, in <module>
    main()
  File "./test_esmvalcore_area_average.py", line 9, in main
    cube_averaged = area_statistics(cube,'mean')
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/apps/jasmin/community/esmvaltool/miniconda3_py311_23.11.0-2/envs/esmvaltool/lib/python3.11/site-packages/esmvalcore/preprocessor/_area.py", line 398, in area_statistics
    agg_kwargs = update_weights_kwargs(
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/apps/jasmin/community/esmvaltool/miniconda3_py311_23.11.0-2/envs/esmvaltool/lib/python3.11/site-packages/esmvalcore/preprocessor/_shared.py", line 180, in update_weights_kwargs
    callback(cube, **callback_kwargs)
  File "/apps/jasmin/community/esmvaltool/miniconda3_py311_23.11.0-2/envs/esmvaltool/lib/python3.11/site-packages/esmvalcore/preprocessor/_area.py", line 339, in _try_adding_calculated_cell_area
    raise CoordinateMultiDimError(cube.coord('latitude'))
iris.exceptions.CoordinateMultiDimError: Multi-dimensional coordinate not supported: 'latitude'

@valeriupredoi valeriupredoi added the preprocessor Related to the preprocessor label Jan 22, 2025
@valeriupredoi
Copy link
Contributor

cheers @rswamina - I'll have a closer look at it tomorrow, it is almost certain the preprocessor fails since there are no area_cell auxiliary variables to load, but I'll have a poke see if we can find anything else 👍

@schlunma
Copy link
Contributor

Hi Ranjini! Yes, V's right. Since these data most likely use a curvilinear grid (what we call "irregular" in ESMValTool jargon), you need cell areas for averaging (in this case the areacello variable). The recommended way to do that is described here: https://github.com/ESMValGroup/ESMValCore/blob/main/notebooks/loading-and-processing-data.ipynb

If you already have the cell areas as cube, you can add them to your parent cube with add_supplementary_variables.

@rswamina
Copy link
Author

Thanks @schlunma and @valeriupredoi for this. I am tagging @jprb-walton here to provide the fx file for the cell areas. Since this is not a standard directory set up in ESMValTool yet, can I just load that file into a cube and use
add_supplementary_variables(cube, area_cell_cube)?

@schlunma
Copy link
Contributor

Thanks @schlunma and @valeriupredoi for this. I am tagging @jprb-walton here to provide the fx file for the cell areas. Since this is not a standard directory set up in ESMValTool yet, can I just load that file into a cube and use add_supplementary_variables(cube, area_cell_cube)?

Yes

@valeriupredoi
Copy link
Contributor

I looked through the available Terrafirma data, and couldn't find any areacello output for @rswamina - not even areacella (which is wrong!) not in current experiment, nor in piControl, maybe a hack would be to use UKESM1-0 for that? These variables barely do change over model iterations /badc/cmip6/data/CMIP6/CMIP/MOHC/UKESM1-0-LL/piControl/r1i1p1f2/Ofx/

@rswamina
Copy link
Author

@valeriupredoi and @schlunma - I can confirm that we are to use a previous version's areacello file. @jprb-walton suggests using /badc/cmip6/data/CMIP6/CMIP/MOHC/UKESM1-0-LL/piControl/r1i1p1f2/Ofx/areacello/gn/files/d20190705/areacello_Ofx_UKESM1-0-LL_piControl_r1i1p1f2_gn.nc.

I am still not successfully able to plot the area averaged mean though. I tried the following:

area_cells_file = '/badc/cmip6/data/CMIP6/CMIP/MOHC/UKESM1-0-LL/piControl/r1i1p1f2/Ofx/areacello/gn/files/d20190705/areacello_Ofx_UKESM1-0-LL_piControl_r1i1p1f2_gn.nc'

area_cells_cube = iris.load(area_cells_file)
add_supplementary_variables(cube, area_cells_cube)

cube.convert_units('Celsius') 
cube = area_statistics(cube,'mean')
cube = annual_statistics(cube, operator="mean")    
qplt.plot(cube)

What I get is the plot as shown

Image

The above is orders of magnitude different from what I should get. I am wondering if this is some area measure instead of area weighted temperature. For reference, here is the plot from one month of one year using the original grid coordinate system (just to get an idea of the range of values).

Image

Can you suggest what I should change to get the actual area averages of temperature? Thanks!

@valeriupredoi
Copy link
Contributor

hmm that could be one of these:

  • masking problem ie _fill_value is not identified correctly
  • unit conversion is done wrong (maybe because the original cube's data is wonky)
  • issue with the actual weights - if the UKESM1-0 areacello is on a different NEMO grid than Terrafirma data (is shouldn't be) then you'll get all manners of weighting pre/postfactors in the data

@valeriupredoi
Copy link
Contributor

at least the trend looks correct - so am guessing there's some constant fudge factor that comes in at some point

@rswamina
Copy link
Author

How do I fix this?

@valeriupredoi
Copy link
Contributor

I'll have a play with this particular test @rswamina 🍺

@rswamina
Copy link
Author

Hi @valeriupredoi - any updates on this?

@valeriupredoi
Copy link
Contributor

OK this is a problem related to iris lazy vs realized data - a quick fix is for @rswamina to pop a simple realization of the areacello data via eg a print statement print("Area cells data", np.ma.min(area_cells_cube[0].data), np.ma.max(area_cells_cube[0].data)) ie

import iris
import numpy as np
import matplotlib.pyplot as plt
import iris.quickplot as qplt

from esmvalcore.preprocessor import add_supplementary_variables
from esmvalcore.preprocessor import area_statistics


def do_avg():
    filename = '/gws/nopw/j04/terrafirma/TerraFIRMA/MOHC/UKESM1-2/esm-hist/r1i1p1f1/Omon/tos/gn/v20240619/tos_Omon_UKESM1-2-LL_esm-hist_r1i1p1f1_gn_195001-201412.nc'
    cube = iris.load_cube(filename)
    area_cells_file = '/badc/cmip6/data/CMIP6/CMIP/MOHC/UKESM1-0-LL/piControl/r1i1p1f2/Ofx/areacello/gn/files/d20190705/areacello_Ofx_UKESM1-0-LL_piControl_r1i1p1f2_gn.nc'

    area_cells_cube = iris.load(area_cells_file)
    print("Area cells data", np.ma.min(area_cells_cube[0].data), np.ma.max(area_cells_cube[0].data))
    add_supplementary_variables(cube, area_cells_cube)
    cube.convert_units('Celsius')
    cube = area_statistics(cube,'mean')
    print("Cube mean data after area stats", np.mean(cube.data))
    qplt.plot(cube)
    plt.show()


do_avg()

will give you a nice and sane plot

Image

@valeriupredoi
Copy link
Contributor

could be that the issue (I vaguely remember which one it was) got solved, but Ranjini has iris=3.7 and dask=2024.1 in the JASMIN env (via esmvaltool=2.10 central install), I'll try see if I can replicate the issue with modern iris/Dask now

@rswamina
Copy link
Author

I see. I would not have expected to do this but do I understand correctly that the print statement will allow the realization of the data into memory? If this is the case, do I need to use the print statement in my code or is this not the appropriate fix?

@valeriupredoi
Copy link
Contributor

any call to cube.data whether it be a print statement or a simple variable assignment to it, will realize cube's data into memory ie it will convert the data from a Dask object that only contains pointers to the data (lazy data) to an actual offload of the float32 data into memory. So you don't have to print per se. I am trying to replicate the issue with latest Dask and iris - JASMIN is a complete PITA (filesystem as slow as a dead horse) , but will hopefully get it done by end of day today 🍺

@rswamina
Copy link
Author

Ok. But just to clarify, is this something we should learn to do (as in realization) or is this considered a bug in a sense? To ensure this is incorporated into any programming for the future or used as a debugging step instead. Thanks much!

@valeriupredoi
Copy link
Contributor

No! Realization is bad - because that's heavy memory consumption! This is a bug in Iris and I need to get its roots, if it's not been fixed in the newer versions - that's why I'm trying to get an env on JASMIN with the latest iris (and Dask)

@valeriupredoi
Copy link
Contributor

(this was somwehat of a last resort for me to see where the issue may be coming from)

@valeriupredoi
Copy link
Contributor

OK good news! All is fine with the latest Dask, iris, and esmvalcore:

(esmvalcore) [valeriu@sci-vm-02 ~]$ conda list dask
# packages in environment at /home/users/valeriu/miniconda3/envs/esmvalcore:
#
# Name                    Version                   Build  Channel
dask                      2025.1.0           pyhd8ed1ab_0    conda-forge
dask-core                 2025.1.0           pyhd8ed1ab_0    conda-forge
dask-jobqueue             0.9.0              pyhd8ed1ab_0    conda-forge
(esmvalcore) [valeriu@sci-vm-02 ~]$ conda list iris
# packages in environment at /home/users/valeriu/miniconda3/envs/esmvalcore:
#
# Name                    Version                   Build  Channel
iris                      3.11.0             pyha770c72_0    conda-forge
iris-esmf-regrid          0.11.0             pyhd8ed1ab_1    conda-forge
iris-grib                 0.21.0             pyhd8ed1ab_0    conda-forge
(esmvalcore) [valeriu@sci-vm-02 ~]$ conda list esmvalcore
# packages in environment at /home/users/valeriu/miniconda3/envs/esmvalcore:
#
# Name                    Version                   Build  Channel
esmvalcore                2.11.0.dev240+g1e342016c          pypi_0    pypi

this is most prob the bug that Manu and I found way back and iris fixed it. I suggest you use a latest install of ESMValTool, Ranjini. I could prob update the central install, but I'd rather do it when v2.12 is out in a month or so.

@valeriupredoi
Copy link
Contributor

OK, so let's do it this way then: I'll install a central version of ESMValTool (with the latest development ESMValCore) on JASMIN and will make it available to you and @jprb-walton (please don't spread the news about it, since it's gonna be usable by others too as a normal module). Just wanted to check one thing though: I am looking at the average temp values in C before area stats and that is about 7C, then after the area stats I see 18C -> is that something we'd expect, I mean the jump after averaging over areas? 🍺

@rswamina
Copy link
Author

@valeriupredoi - Will this have implications for those using the new simulations until then? Is it best to use a latest installation of ESMValTool instead of the JASMIN module or is this something that only affects data with irregular grids or perhaps larger files? It will be good to know and let others know as well so they can avoid this problem till the update.

@rswamina
Copy link
Author

@valeriupredoi - I am happy to go woth whatever you suggest for how we should use the right version of iris/dask. I don't know if the values are right but will look into that.

@valeriupredoi
Copy link
Contributor

@rswamina there have been a number of (some less serious, some rather serious) bugs in Dask, impacting Iris, and us - via iris, or directly via Dask. As far as I remember this particular one was an issue from iris but I can't remember the exact issue - what's important is that a lot of these issues have now been solved, that's why I want to install the latest ESMValTool with the latest ESMValCore on JASMIN for you and Jeremy to use; don't worry about letting users know about stuff involving the 2.10 installation on JASMIN, that'll get superseded by 2.12 in a short while (somehow I missed installing 2.11 but that'll be this one I'll try do today with latest Tool/Core)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preprocessor Related to the preprocessor
Projects
None yet
Development

No branches or pull requests

3 participants