EAMxx: Populate long_name of output with human readable description #6940
Comments
IO already outputs the standard name. Perhaps the branch you are using is old? Although, I think we added the feature quite a long time ago... E.g., from our baselines:
I'm not sure why we have Edit: it looks like it was added not too long ago: PR E3SM-Project/scream#3105 |
Both are valid attributes; the long_name is a human description of the value (as you see above, "Downwelling longwave flux at surface") which can be a project internal description or whatever, but the standard_name follows stricter criteria: https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html |
I wish we used "description" then, rather than "long_name". |
In the case you pointed to, the "long_name" of "T_mid" is neither long nor a description. Is the default to just repeat the variable name? |
Yes. We have only stored a long_name value for a handful of fields:
std::map<std::string,std::string> name_2_longname = {
{"lev","hybrid level at midpoints (1000*(A+B))"},
{"ilev","hybrid level at interfaces (1000*(A+B))"},
{"hyai","hybrid A coefficient at layer interfaces"},
{"hybi","hybrid B coefficient at layer interfaces"},
{"hyam","hybrid A coefficient at layer midpoints"},
{"hybm","hybrid B coefficient at layer midpoints"}
};
For everything else we simply repeat the eamxx name. |
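The fallback described above can be sketched as a small lookup. This is only an illustration: `long_name_or_default` is a hypothetical helper name, not something taken from the EAMxx source, and it assumes the table is a plain `std::map` like the one quoted.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not from the EAMxx source): return the stored
// long_name for a field, or the field name itself when none is stored.
inline std::string long_name_or_default(
    const std::map<std::string, std::string>& name_2_longname,
    const std::string& field_name) {
  const auto it = name_2_longname.find(field_name);
  return it != name_2_longname.end() ? it->second : field_name;
}
```

With the six entries above, a field such as "T_mid" is not in the table, so its long_name would simply come back as "T_mid".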
Yes, but we discussed this on the eamxx call today, and some eval peeps will come up with a list and we will add it like we did for standard_names |
@rljacob + @bartgol + @crterai + @AaronDonahue, I propose the following: We save this info (see main post above #6940 (comment)) in easily viewable files (ideally yaml or csv) in the repo and then make a little function to read them and load them. That way, anyone can edit them. I volunteer to do that once we compile a list of long names we want to add. |
I see your proposal and raise another proposal: instead of long_name, we call the metadata "description" and store a very similar (if not identical) string as in "standard_name". Btw, @jeff-cohere wrote a utility a while ago that downloads the CF database, which should also include a description (IIRC). We could revive that, and modify the DB to be a map eamxx_name->metadata, and have IO read that yaml file at runtime |
"description" and "long_name" are already 2 distinct pieces of metadata in climate netcdf files. The "description" in the CF table is really long and not something you want to include. "long_name" comes from the COARDS convention, an older convention that CF generalizes and extends. COARDS doesn't prescribe any specific long_names for variables but says what it should do in general: "a long descriptive name (title). This could be used for labeling plots, for example. If a variable has no long_name attribute assigned, the variable name will be used as a default." There is one standard that prescribes long_names: CMIP. See https://github.com/PCMDI/cmip6-cmor-tables/blob/main/Tables/CMIP6_AERmon.json |
My proposal is actually simple, see patch below. All I desire is a simple way (away from cpp/f90 code) for users to simply issue a PR updating these names as they wish. That isolates the code mechanics from the naming stuff. In current github ui, the csv file is rendered and searchable nicely (see here). So, when someone (like Chris above) points out deficiencies, we can point the user to issuing a PR updating this CSV file. We can also link it in the docs with instructions as well. patch
From 6d5429bffd7cac20f0b6cdd9699b913a84cf19fc Mon Sep 17 00:00:00 2001
From: Naser Mahfouz <[email protected]>
Date: Tue, 28 Jan 2025 22:46:44 -0500
Subject: [PATCH] add csv io names to scream
---
.../src/share/util/scream_io_longnames.csv | 7 ++
.../share/util/scream_io_standardnames.csv | 70 +++++++++++
.../eamxx/src/share/util/scream_utils.hpp | 115 +++++-------------
3 files changed, 110 insertions(+), 82 deletions(-)
create mode 100644 components/eamxx/src/share/util/scream_io_longnames.csv
create mode 100644 components/eamxx/src/share/util/scream_io_standardnames.csv
diff --git a/components/eamxx/src/share/util/scream_io_longnames.csv b/components/eamxx/src/share/util/scream_io_longnames.csv
new file mode 100644
index 000000000000..db53f3a82556
--- /dev/null
+++ b/components/eamxx/src/share/util/scream_io_longnames.csv
@@ -0,0 +1,7 @@
+variable,longname
+lev,hybrid level at midpoints (1000*(A+B))
+ilev,hybrid level at interfaces (1000*(A+B))
+hyai,hybrid A coefficient at layer interfaces
+hybi,hybrid B coefficient at layer interfaces
+hyam,hybrid A coefficient at layer midpoints
+hybm,hybrid B coefficient at layer midpoints
diff --git a/components/eamxx/src/share/util/scream_io_standardnames.csv b/components/eamxx/src/share/util/scream_io_standardnames.csv
new file mode 100644
index 000000000000..33aadffee523
--- /dev/null
+++ b/components/eamxx/src/share/util/scream_io_standardnames.csv
@@ -0,0 +1,70 @@
+variable,standardname
+p_mid,air_pressure
+p_mid_at_cldtop,air_pressure_at_cloud_top
+T_2m,air_temperature
+T_mid,air_temperature
+T_mid_at_cldtop,air_temperature_at_cloud_top
+aero_g_sw,asymmetry_factor_of_ambient_aerosol_particles
+pbl_height,atmosphere_boundary_layer_thickness
+precip_liq_surf_mass,atmosphere_mass_content_of_liquid_precipitation
+cldlow,low_type_cloud_area_fraction
+cldmed,medium_type_cloud_area_fraction
+cldhgh,high_type_cloud_area_fraction
+cldtot,cloud_area_fraction
+cldfrac_tot_at_cldtop,cloud_area_fraction
+cldfrac_tot,cloud_area_fraction_in_atmosphere_layer
+cldfrac_tot_for_analysis,cloud_area_fraction_in_atmosphere_layer
+cldfrac_rad,cloud_area_fraction_in_atmosphere_layer
+qi,cloud_ice_mixing_ratio
+qc,cloud_liquid_water_mixing_ratio
+U,eastward_wind
+eff_radius_qi,effective_radius_of_cloud_ice_particles
+eff_radius_qc,effective_radius_of_cloud_liquid_water_particles
+eff_radius_qc_at_cldtop,effective_radius_of_cloud_liquid_water_particles_at_liquid_water_cloud_top
+eff_radius_qr,effective_radius_of_cloud_rain_particles
+qv,humidity_mixing_ratio
+cldfrac_ice_at_cldtop,ice_cloud_area_fraction
+cldfrac_ice,ice_cloud_area_fraction_in_atmosphere_layer
+omega,lagrangian_tendency_of_air_pressure
+landfrac,land_area_fraction
+latitude,latitude
+cldfrac_liq_at_cldtop,liquid_water_cloud_area_fraction
+cldfrac_liq,liquid_water_cloud_area_fraction_in_atmosphere_layer
+longitude,longitude
+rainfrac,mass_fraction_of_liquid_precipitation_in_air
+V,northward_wind
+nc,number_concentration_of_cloud_liquid_water_particles_in_air
+cdnc_at_cldtop,number_concentration_of_cloud_liquid_water_particles_in_air_at_liquid_water_cloud_top
+ni,number_concentration_of_ice_crystals_in_air
+aero_tau_sw,optical_thickness_of_atmosphere_layer_due_to_ambient_aerosol_particles
+aero_tau_lw,optical_thickness_of_atmosphere_layer_due_to_ambient_aerosol_particles
+aero_ssa_sw,single_scattering_albedo_in_air_due_to_ambient_aerosol_particles
+sunlit,sunlit_binary_mask
+ps,surface_air_pressure
+LW_flux_dn_at_model_bot,surface_downwelling_longwave_flux_in_air
+SW_flux_dn_at_model_bot,surface_downwelling_shortwave_flux_in_air
+SW_clrsky_flux_dn_at_model_bot,surface_downwelling_shortwave_flux_in_air_assuming_clear_sky
+phis,surface_geopotential
+surf_radiative_T,surface_temperature
+surf_sens_flux,surface_upward_sensible_heat_flux
+SW_flux_dn_at_model_top,toa_incoming_shortwave_flux
+LW_flux_up_at_model_top,toa_outgoing_longwave_flux
+LW_clrsky_flux_up_at_model_top,toa_outgoing_longwave_flux_assuming_clear_sky
+surf_evap,water_evapotranspiration_flux
+AtmosphereDensity,air_density
+PotentialTemperature,air_potential_temperature
+SeaLevelPressure,air_pressure_at_mean_sea_level
+IceWaterPath,atmosphere_mass_content_of_cloud_ice
+LiqWaterPath,atmosphere_mass_content_of_cloud_liquid_water
+VapWaterPath,atmosphere_mass_content_of_water_vapor
+AerosolOpticalDepth550nm,atmosphere_optical_thickness_due_to_ambient_aerosol_particles
+Exner,dimensionless_exner_function
+z_mid,geopotential_height
+geopotential_mid,geopotential_height
+RelativeHumidity,relative_humidity
+surface_upward_latent_heat_flux,surface_upward_latent_heat_flux
+LongwaveCloudForcing,toa_longwave_cloud_radiative_effect
+ShortwaveCloudForcing,toa_shortwave_cloud_radiative_effect
+VirtualTemperature,virtual_temperature
+VaporFlux,water_evapotranspiration_flux
+wind_speed,wind_speed
diff --git a/components/eamxx/src/share/util/scream_utils.hpp b/components/eamxx/src/share/util/scream_utils.hpp
index 9577b5597bff..66ecd151b21e 100644
--- a/components/eamxx/src/share/util/scream_utils.hpp
+++ b/components/eamxx/src/share/util/scream_utils.hpp
@@ -12,6 +12,8 @@
#include <algorithm>
#include <map>
#include <iostream>
+#include <fstream>
+#include <sstream>
namespace scream {
@@ -388,89 +390,38 @@ struct DefaultMetadata {
}
}
- // Create map of longnames, can be added to as developers see fit.
- std::map<std::string,std::string> name_2_longname = {
- {"lev","hybrid level at midpoints (1000*(A+B))"},
- {"ilev","hybrid level at interfaces (1000*(A+B))"},
- {"hyai","hybrid A coefficient at layer interfaces"},
- {"hybi","hybrid B coefficient at layer interfaces"},
- {"hyam","hybrid A coefficient at layer midpoints"},
- {"hybm","hybrid B coefficient at layer midpoints"}
- };
+ // Create map of longnames, see associated file
+ std::map<std::string, std::string> name_2_longname = readCSVToMap("scream_io_longnames.csv");
+
+ // Create map of standard names, see associated file
+ std::map<std::string, std::string> name_2_standardname = readCSVToMap("scream_io_standardnames.csv");
+
+ std::map<std::string, std::string> readCSVToMap(const std::string& filename) {
+ std::ifstream file(filename);
+ if (!file.is_open()) {
+ std::cerr << "Could not open the file!" << std::endl;
+ return {};
+ }
+
+ std::map<std::string, std::string> dataMap;
+ std::string line;
+ bool isFirstLine = true;
+ while (std::getline(file, line)) {
+ if (isFirstLine) {
+ isFirstLine = false;
+ continue;
+ }
+ std::stringstream ss(line);
+ std::string column1, column2;
+ std::getline(ss, column1, ',');
+ std::getline(ss, column2, ',');
+ dataMap[column1] = column2;
+ }
+
+ file.close();
+ return dataMap;
+ }
- // Create map of longnames, can be added to as developers see fit.
- std::map<std::string,std::string> name_2_standardname = {
- {"p_mid" , "air_pressure"},
- {"p_mid_at_cldtop" , "air_pressure_at_cloud_top"},
- {"T_2m" , "air_temperature"},
- {"T_mid" , "air_temperature"},
- {"T_mid_at_cldtop" , "air_temperature_at_cloud_top"},
- {"aero_g_sw" , "asymmetry_factor_of_ambient_aerosol_particles"},
- {"pbl_height" , "atmosphere_boundary_layer_thickness"},
- {"precip_liq_surf_mass" , "atmosphere_mass_content_of_liquid_precipitation"},
- {"cldlow" , "low_type_cloud_area_fraction"},
- {"cldmed" , "medium_type_cloud_area_fraction"},
- {"cldhgh" , "high_type_cloud_area_fraction"},
- {"cldtot" , "cloud_area_fraction"},
- {"cldfrac_tot_at_cldtop" , "cloud_area_fraction"},
- {"cldfrac_tot" , "cloud_area_fraction_in_atmosphere_layer"},
- {"cldfrac_tot_for_analysis" , "cloud_area_fraction_in_atmosphere_layer"},
- {"cldfrac_rad" , "cloud_area_fraction_in_atmosphere_layer"},
- {"qi" , "cloud_ice_mixing_ratio"},
- {"qc" , "cloud_liquid_water_mixing_ratio"},
- {"U" , "eastward_wind"},
- {"eff_radius_qi" , "effective_radius_of_cloud_ice_particles"},
- {"eff_radius_qc" , "effective_radius_of_cloud_liquid_water_particles"},
- {"eff_radius_qc_at_cldtop" , "effective_radius_of_cloud_liquid_water_particles_at_liquid_water_cloud_top"},
- {"eff_radius_qr" , "effective_radius_of_cloud_rain_particles"},
- {"qv" , "humidity_mixing_ratio"},
- {"cldfrac_ice_at_cldtop" , "ice_cloud_area_fraction"},
- {"cldfrac_ice" , "ice_cloud_area_fraction_in_atmosphere_layer"},
- {"omega" , "lagrangian_tendency_of_air_pressure"},
- {"landfrac" , "land_area_fraction"},
- {"latitude" , "latitude"},
- {"cldfrac_liq_at_cldtop" , "liquid_water_cloud_area_fraction"},
- {"cldfrac_liq" , "liquid_water_cloud_area_fraction_in_atmosphere_layer"},
- {"longitude" , "longitude"},
- {"rainfrac" , "mass_fraction_of_liquid_precipitation_in_air"},
- {"V" , "northward_wind"},
- {"nc" , "number_concentration_of_cloud_liquid_water_particles_in_air"},
- {"cdnc_at_cldtop" , "number_concentration_of_cloud_liquid_water_particles_in_air_at_liquid_water_cloud_top"},
- {"ni" , "number_concentration_of_ice_crystals_in_air"},
- {"aero_tau_sw" , "optical_thickness_of_atmosphere_layer_due_to_ambient_aerosol_particles"},
- {"aero_tau_lw" , "optical_thickness_of_atmosphere_layer_due_to_ambient_aerosol_particles"},
- {"aero_ssa_sw" , "single_scattering_albedo_in_air_due_to_ambient_aerosol_particles"},
- {"sunlit" , "sunlit_binary_mask"},
- {"ps" , "surface_air_pressure"},
- {"LW_flux_dn_at_model_bot" , "surface_downwelling_longwave_flux_in_air"},
- {"SW_flux_dn_at_model_bot" , "surface_downwelling_shortwave_flux_in_air"},
- {"SW_clrsky_flux_dn_at_model_bot" , "surface_downwelling_shortwave_flux_in_air_assuming_clear_sky"},
- {"phis" , "surface_geopotential"},
- {"surf_radiative_T" , "surface_temperature"},
- {"surf_sens_flux" , "surface_upward_sensible_heat_flux"},
- {"SW_flux_dn_at_model_top" , "toa_incoming_shortwave_flux"},
- {"LW_flux_up_at_model_top" , "toa_outgoing_longwave_flux"},
- {"LW_clrsky_flux_up_at_model_top" , "toa_outgoing_longwave_flux_assuming_clear_sky"},
- {"surf_evap" , "water_evapotranspiration_flux"},
- {"AtmosphereDensity" , "air_density"},
- {"PotentialTemperature" , "air_potential_temperature"},
- {"SeaLevelPressure" , "air_pressure_at_mean_sea_level"},
- {"IceWaterPath" , "atmosphere_mass_content_of_cloud_ice"},
- {"LiqWaterPath" , "atmosphere_mass_content_of_cloud_liquid_water"},
- {"VapWaterPath" , "atmosphere_mass_content_of_water_vapor"},
- {"AerosolOpticalDepth550nm" , "atmosphere_optical_thickness_due_to_ambient_aerosol_particles"},
- {"Exner" , "dimensionless_exner_function"},
- {"z_mid" , "geopotential_height"},
- {"geopotential_mid" , "geopotential_height"},
- {"RelativeHumidity" , "relative_humidity"},
- {"surface_upward_latent_heat_flux" , "surface_upward_latent_heat_flux"},
- {"LongwaveCloudForcing" , "toa_longwave_cloud_radiative_effect"},
- {"ShortwaveCloudForcing" , "toa_shortwave_cloud_radiative_effect"},
- {"VirtualTemperature" , "virtual_temperature"},
- {"VaporFlux" , "water_evapotranspiration_flux"},
- {"wind_speed" , "wind_speed"}
- };
-
};
|
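One detail a CSV reader for these files may need to handle: a long name is free text and could itself contain a comma, so splitting every field on ',' would truncate it. A minimal sketch, assuming each row is `variable,value` and that everything after the first comma belongs to the value; `parse_name_csv` is a hypothetical name and it reads from a generic stream rather than a file path so it can be exercised without the repo files:

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Sketch only (hypothetical helper): parse "variable,value" rows from a stream.
// The first line is treated as a header; the value is everything after the
// FIRST comma, so free-text long names may contain commas.
inline std::map<std::string, std::string> parse_name_csv(std::istream& in) {
  std::map<std::string, std::string> out;
  std::string line;
  bool is_header = true;
  while (std::getline(in, line)) {
    // Tolerate Windows line endings.
    if (!line.empty() && line.back() == '\r') line.pop_back();
    if (is_header) { is_header = false; continue; }
    const auto comma = line.find(',');
    // Skip blank or malformed rows (no comma, or empty variable name).
    if (comma == std::string::npos || comma == 0) continue;
    out[line.substr(0, comma)] = line.substr(comma + 1);
  }
  return out;
}
```

A file-based wrapper would open a `std::ifstream` and pass it in. How the path to the CSV is resolved at runtime is left open here, since the patch above does not settle it.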
I am just a bit against having two fields (long_name and standard_name) which seem to often be the same. Redundant information can be confusing too (e.g., ppl may start using one thinking it's the other). Is there a compelling reason to have both instead of a single long (and standardized) name? What would break if we just kept one? Could we fix downstream tools to use only the one we keep (if we decided to do away with one or the other)? I am all for moving the eamxx_name->standard_name into its own file outside of the source code. |
This is outside my wheelhouse personally; so I defer to @rljacob to decide. In my understanding, the standard_name is the widely recognized one, but because it is standardized (see table below), it has all these underscores and stuff, which I think makes it less convenient for humans. My understanding (from Rob's comment above) is that the long name is for humans to use in plots and such, so you could do something like:
# some logic to determine what the var name is based on standard_name
# e.g., for x in ds.variables: find one matching T_mid and save it as _var
# plots at a single column somewhere (x is lev, y is quantity of interest)
ds[_var].isel(ncol=-1).plot()
plt.xlabel(f"{ds.lev.long_name}, {ds.lev.units}")
plt.ylabel(f"{ds[_var].long_name}, {ds[_var].units}")
# etc.
cf standard names: https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html |
Sorry, late to the discussion. But to address the long name vs standard name question: my understanding is that the standard name's main function is to be readable by post-processing tools. So it has a universal convention which may, at times, not be very descriptive to a human. I think the long name is a place to put a single-sentence description that is a lot easier to understand. We included the long name originally because of a request. I guess there is one E3SM post-processing tool that specifically looks for the long name of the hybrid coordinate system. Standard name came later to bring EAMxx into CF compliance. And for lack of a better option (and for expediency) the default was to assign long_name to match standard_name unless a specific long_name existed. |
So ideally, moving forward we would invest the time (maybe Naser's time ;-) ) to establish different and useful long names for our variables. Would it be reasonable to only include the long name attribute if it is specified? Meaning, if no specific long name exists we don't add that attribute to the variable metadata. That would address Luca's concern that we have netCDF metadata that is filled with redundant information. |
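The "only write long_name when one is specified" idea could look roughly like the following. This is a sketch under assumptions: `Attributes`, `find_name`, and `build_name_attributes` are hypothetical names, attributes are modeled as a simple string map rather than EAMxx's actual IO metadata types, and C++17 is assumed for `std::optional`.

```cpp
#include <map>
#include <optional>
#include <string>

using NameTable  = std::map<std::string, std::string>;
using Attributes = std::map<std::string, std::string>;

// Hypothetical helper: look up a field in a table; empty optional if absent.
inline std::optional<std::string> find_name(const NameTable& table,
                                            const std::string& field) {
  const auto it = table.find(field);
  if (it == table.end()) return std::nullopt;
  return it->second;
}

// Hypothetical helper: emit each name attribute only when the corresponding
// table has an entry, instead of falling back to a repeated name.
inline Attributes build_name_attributes(const std::string& field,
                                        const NameTable& longnames,
                                        const NameTable& standardnames) {
  Attributes atts;
  if (const auto s = find_name(standardnames, field)) atts["standard_name"] = *s;
  if (const auto l = find_name(longnames, field))     atts["long_name"] = *l;
  return atts;
}
```

Under this scheme, "T_mid" would carry only a standard_name until someone adds a long name for it, so the output metadata never repeats the same string under two attributes.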
Re: @mahf708 's suggestion for a YAML. I'm in favor of this from a simplicity perspective. It would be way easier to get domain scientists to improve a single YAML file rather than hunt down information in code. Naser also noted that a single YAML file could be used by both F90 and C++. Didn't we have a long discussion about this already though with the OMEGA team? There was a push to adopt a library that stored python dictionaries of variables, or maybe YAML files. It seems like this is a conversation we occasionally touch upon but don't commit to a solution. Maybe we can discuss in the Dev Call and come to a consensus on what to do. |
I would go for a CSV for even more simplicity. Imagine a situation where you just point a domain scientist to a csv file and tell them to update it and send it back to you... That would be slightly better than a YAML file. I am fine with either. Yeah, let's discuss. I think a super general solution may not work well and we should start somewhere manageable first. We could begin with this experiment in EAMxx and see how it goes; if OMEGA team wants to follow suit or suggest an alternative, we could discuss. I do think they have their own sophisticated registry, even in the current f90 code, but I could be wrong. |
MPAS-O used a registry (in XML) but Omega is not taking that route due to various issues related to auto-generated code that created opaque data structures where the relevant info got lost and you couldn't grep stuff. Omega is defining fields in code and keeps an internal list of available fields, though in the case of tracers, the set of field create calls is kept in a single included (.inc) file. Our field create interface has several required args for required metadata, including standard name (for CF-compliance), name (short name used internally) and long name for a slightly longer description. The standard names can be unwieldy and obscure so we do like to keep all these name options. |
You're talking about a file that is very EAMxx specific since it maps EAMxx variable names to metadata. Whatever format you want but I wouldn't worry about F90. The constants dictionary discussion is still iterating on the YAML format and was interrupted by the holiday break. |
If the issue is just having a "short name for plotting", shouldn't the customer decide however they want to label the variables? We provide a standard name (which can be quite long), so that tools can recognize the variable. The long name seems to be the responsibility of the user, no? If they want certain names, they can set up some local maps stdname->my_name in their script, and use those... Why should it fall on the shoulders of the model to provide something that is "out of convenience of some users"? Drawing from the link pasted by Rob
It's not like the 2nd is much more amenable to be used as a plot label. Most likely, the user will have to set up their own labels... I will not say anything else, as this is probably my super-personal taste on the matter. I just find having to carry around two naming standards confusing and pointless. That said, we can accommodate it, and storing a csv file is just fine (I'd prefer a yaml, so we can already load it in via our yaml parsers, but a csv parser is just a 10-line fcn). |
I think there is more to this than just having a name for plotting. My understanding is that the long_name or description provides a more precise meaning to help users understand the variable. I don't believe a standard name that follows CF conventions can work for all cases. A simple example from EAMxx is that both T_mid and T_2m have the CF standard name
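To see how often this happens, one could invert the eamxx_name -> standard_name table and list the standard names shared by more than one field. A minimal sketch, assuming the table is available as a `std::map` like the one in the patch above; `group_by_standard_name` is a hypothetical name and C++17 is assumed for structured bindings:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: invert name -> standard_name into
// standard_name -> list of field names that share it.
inline std::map<std::string, std::vector<std::string>> group_by_standard_name(
    const std::map<std::string, std::string>& name_2_standardname) {
  std::map<std::string, std::vector<std::string>> groups;
  for (const auto& [field, std_name] : name_2_standardname) {
    groups[std_name].push_back(field);
  }
  return groups;
}
```

Any group with more than one entry marks a set of fields that a tool cannot tell apart by standard_name alone. In the table from the patch, T_mid and T_2m land in the same group, as do z_mid and geopotential_mid.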
Wow, I did not realize that. So the CF standard does not provide any guidance on how to differentiate the name for X when sliced at different elevations? It seems like a shortcoming of the standard. But I see your point. |
No, I don't think so :(. The CF standard does not provide specific guidance on how to differentiate variable names when data is sliced at different elevations. I guess this is why inter-comparison programs, such as CMIP, need to develop augmented variable names when requesting data from models. |
Longnames Mapping
Standard Names Mapping
original body of the issue below:
Currently, the output in the history files does not include a human-readable description of what the variables are, which makes it difficult for users to determine what the outputs are.
Example of what EAM history outputs produce: