EAMxx: Populate long_name of output with human readable description #6940

crterai · 2025-01-24T17:43:52Z

Longnames Mapping

Name	Long Name
lev	hybrid level at midpoints (1000*(A+B))
ilev	hybrid level at interfaces (1000*(A+B))
hyai	hybrid A coefficient at layer interfaces
hybi	hybrid B coefficient at layer interfaces
hyam	hybrid A coefficient at layer midpoints
hybm	hybrid B coefficient at layer midpoints

Standard Names Mapping

Name	Standard Name
p_mid	air_pressure
p_mid_at_cldtop	air_pressure_at_cloud_top
T_2m	air_temperature
T_mid	air_temperature
T_mid_at_cldtop	air_temperature_at_cloud_top
aero_g_sw	asymmetry_factor_of_ambient_aerosol_particles
pbl_height	atmosphere_boundary_layer_thickness
precip_liq_surf_mass	atmosphere_mass_content_of_liquid_precipitation
cldlow	low_type_cloud_area_fraction
cldmed	medium_type_cloud_area_fraction
cldhgh	high_type_cloud_area_fraction
cldtot	cloud_area_fraction
cldfrac_tot_at_cldtop	cloud_area_fraction
cldfrac_tot	cloud_area_fraction_in_atmosphere_layer
cldfrac_tot_for_analysis	cloud_area_fraction_in_atmosphere_layer
cldfrac_rad	cloud_area_fraction_in_atmosphere_layer
qi	cloud_ice_mixing_ratio
qc	cloud_liquid_water_mixing_ratio
U	eastward_wind
eff_radius_qi	effective_radius_of_cloud_ice_particles
eff_radius_qc	effective_radius_of_cloud_liquid_water_particles
eff_radius_qc_at_cldtop	effective_radius_of_cloud_liquid_water_particles_at_liquid_water_cloud_top
eff_radius_qr	effective_radius_of_cloud_rain_particles
qv	humidity_mixing_ratio
cldfrac_ice_at_cldtop	ice_cloud_area_fraction
cldfrac_ice	ice_cloud_area_fraction_in_atmosphere_layer
omega	lagrangian_tendency_of_air_pressure
landfrac	land_area_fraction
latitude	latitude
cldfrac_liq_at_cldtop	liquid_water_cloud_area_fraction
cldfrac_liq	liquid_water_cloud_area_fraction_in_atmosphere_layer
longitude	longitude
rainfrac	mass_fraction_of_liquid_precipitation_in_air
V	northward_wind
nc	number_concentration_of_cloud_liquid_water_particles_in_air
cdnc_at_cldtop	number_concentration_of_cloud_liquid_water_particles_in_air_at_liquid_water_cloud_top
ni	number_concentration_of_ice_crystals_in_air
aero_tau_sw	optical_thickness_of_atmosphere_layer_due_to_ambient_aerosol_particles
aero_tau_lw	optical_thickness_of_atmosphere_layer_due_to_ambient_aerosol_particles
aero_ssa_sw	single_scattering_albedo_in_air_due_to_ambient_aerosol_particles
sunlit	sunlit_binary_mask
ps	surface_air_pressure
LW_flux_dn_at_model_bot	surface_downwelling_longwave_flux_in_air
SW_flux_dn_at_model_bot	surface_downwelling_shortwave_flux_in_air
SW_clrsky_flux_dn_at_model_bot	surface_downwelling_shortwave_flux_in_air_assuming_clear_sky
phis	surface_geopotential
surf_radiative_T	surface_temperature
surf_sens_flux	surface_upward_sensible_heat_flux
SW_flux_dn_at_model_top	toa_incoming_shortwave_flux
LW_flux_up_at_model_top	toa_outgoing_longwave_flux
LW_clrsky_flux_up_at_model_top	toa_outgoing_longwave_flux_assuming_clear_sky
surf_evap	water_evapotranspiration_flux
AtmosphereDensity	air_density
PotentialTemperature	air_potential_temperature
SeaLevelPressure	air_pressure_at_mean_sea_level
IceWaterPath	atmosphere_mass_content_of_cloud_ice
LiqWaterPath	atmosphere_mass_content_of_cloud_liquid_water
VapWaterPath	atmosphere_mass_content_of_water_vapor
AerosolOpticalDepth550nm	atmosphere_optical_thickness_due_to_ambient_aerosol_particles
Exner	dimensionless_exner_function
z_mid	geopotential_height
geopotential_mid	geopotential_height
RelativeHumidity	relative_humidity
surface_upward_latent_heat_flux	surface_upward_latent_heat_flux
LongwaveCloudForcing	toa_longwave_cloud_radiative_effect
ShortwaveCloudForcing	toa_shortwave_cloud_radiative_effect
VirtualTemperature	virtual_temperature
VaporFlux	water_evapotranspiration_flux
wind_speed	wind_speed

original body of the issue below:

Currently, the output in the history files does not include a human-readable description of what the variables are, which makes it difficult for users to determine what the outputs are.

	float qc(time, ncol, lev) ;
		qc:units = "kg/kg" ;
		qc:_FillValue = 3.402824e+33f ;
		qc:averaging_count_tracker = "avg_count_ncol_lev" ;
		qc:long_name = "qc" ;
	float qi(time, ncol, lev) ;
		qi:units = "kg/kg" ;
		qi:_FillValue = 3.402824e+33f ;
		qi:averaging_count_tracker = "avg_count_ncol_lev" ;
		qi:long_name = "qi" ;
	float qm(time, ncol, lev) ;
		qm:units = "kg/kg" ;
		qm:_FillValue = 3.402824e+33f ;
		qm:averaging_count_tracker = "avg_count_ncol_lev" ;
		qm:long_name = "qm" ;
	float qr(time, ncol, lev) ;
		qr:units = "kg/kg" ;
		qr:_FillValue = 3.402824e+33f ;
		qr:averaging_count_tracker = "avg_count_ncol_lev" ;
		qr:long_name = "qr" ;

Example of what EAM history outputs produce:

	float FLDS(time, ncol) ;
		FLDS:Sampling_Sequence = "rad_lwsw" ;
		FLDS:_FillValue = 1.e+20f ;
		FLDS:missing_value = 1.e+20f ;
		FLDS:units = "W/m2" ;
		FLDS:long_name = "Downwelling longwave flux at surface" ;
		FLDS:standard_name = "surface_downwelling_longwave_flux_in_air" ;
		FLDS:cell_methods = "time: mean" ;

bartgol · 2025-01-27T19:10:33Z

IO already outputs the standard name. Perhaps the branch you are using is old? Although, I think we added the feature quite a long time ago...

E.g., from our baselines:

	float T_mid(time, ncol, lev) ;
		T_mid:units = "K" ;
		T_mid:_FillValue = 3.402824e+33f ;
		T_mid:averaging_count_tracker = "avg_count_ncol_lev" ;
		T_mid:long_name = "T_mid" ;
		T_mid:standard_name = "air_temperature" ;

I'm not sure why we have long_name as well as standard_name. I think the former is only used for hyai/hyam/hybi/hybm, and it's sort of a "description"...

Edit: it looks like it was added not too long ago: PR E3SM-Project/scream#3105

mahf708 · 2025-01-28T15:06:25Z

> I'm not sure why we have long_name as well as standard_name

Both are valid attributes; the long_name is a human description of the value (as you see above, "Downwelling longwave flux at surface") which can be a project internal description or whatever, but the standard_name follows stricter criteria: https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html

bartgol · 2025-01-28T16:51:45Z

I wish we used "description" then, rather than "long_name".

rljacob · 2025-01-28T17:47:12Z

In the case you pointed to, the "long_name" of "T_mid" is neither long nor a description. Is the default to just repeat the variable name?

bartgol · 2025-01-28T17:59:02Z

> In the case you pointed to, the "long_name" of "T_mid" is neither long nor a description. Is the default to just repeat the variable name?

Yes. We only have stored a long_name value for a handful of fields:

  std::map<std::string,std::string> name_2_longname = {
    {"lev","hybrid level at midpoints (1000*(A+B))"},
    {"ilev","hybrid level at interfaces (1000*(A+B))"},
    {"hyai","hybrid A coefficient at layer interfaces"},
    {"hybi","hybrid B coefficient at layer interfaces"},
    {"hyam","hybrid A coefficient at layer midpoints"},
    {"hybm","hybrid B coefficient at layer midpoints"}
  };

For everything else we simply repeat the eamxx name.
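
As a rough illustration of that fallback (a Python sketch, not the EAMxx C++ code; `long_name_for` is a hypothetical helper name), the lookup amounts to a dictionary get with the variable name itself as the default:

```python
# Entries copied from the name_2_longname map shown above.
NAME_TO_LONGNAME = {
    "lev": "hybrid level at midpoints (1000*(A+B))",
    "ilev": "hybrid level at interfaces (1000*(A+B))",
    "hyai": "hybrid A coefficient at layer interfaces",
    "hybi": "hybrid B coefficient at layer interfaces",
    "hyam": "hybrid A coefficient at layer midpoints",
    "hybm": "hybrid B coefficient at layer midpoints",
}


def long_name_for(name, table=NAME_TO_LONGNAME):
    """Return the stored long name, or the variable name if none is stored.

    Hypothetical helper mirroring the behavior described in this comment.
    """
    return table.get(name, name)


print(long_name_for("hyai"))
print(long_name_for("T_mid"))
```

This is why `T_mid:long_name = "T_mid"` shows up in the baseline output quoted earlier.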

mahf708 · 2025-01-28T17:59:29Z

> In the case you pointed to, the "long_name" of "T_mid" is neither long nor a description. Is the default to just repeat the variable name?

Yes, but we discussed this on the eamxx call today, and some eval peeps will come up with a list and we will add it like we did for standard_names

mahf708 · 2025-01-29T00:30:15Z

@rljacob + @bartgol + @crterai + @AaronDonahue, I propose the following:

We save this info (see main post above #6940 (comment)) in easily viewable files (ideally yaml or csv) in the repo and then make a little function to read them and load them. That way, anyone can edit them. I volunteer to do that once we compile a list of long names we want to add.

bartgol · 2025-01-29T01:57:22Z

I see your proposal and raise another proposal: instead of long_name, we call the metadata "description" and store a very similar (if not identical) string as in "standard_name".

Btw, @jeff-cohere wrote a utility a while ago that downloads the CF database, which should also include a description (IIRC). We could revive that, and modify the DB to be a map eamxx_name->metadata, and have IO read that yaml file at runtime.

rljacob · 2025-01-29T02:57:50Z

"description" and "long_name" are already 2 distinct pieces of metadata in climate netcdf files. The "description" in the CF table is really long and not something you want to include. "long_name" comes from the COARDS convention, an older convention that CF generalizes and extends. COARDS doesn't prescribe any specific long_names for variables but says what it should do in general: "a long descriptive name (title). This could be used for labeling plots, for example. If a variable has no long_name attribute assigned, the variable name will be used as a default."

There is one standard that prescribes long_names: CMIP. See https://github.com/PCMDI/cmip6-cmor-tables/blob/main/Tables/CMIP6_AERmon.json

mahf708 · 2025-01-29T04:05:30Z

My proposal is actually simple, see patch below. All I desire is a simple way (away from cpp/f90 code) for users to simply issue a PR updating these names as they wish. That isolates the code mechanics from the naming stuff. In current github ui, the csv file is rendered and searchable nicely (see here).

So, when someone (like Chris above) points out deficiencies, we can point the user to issuing a PR updating this CSV file. We can also link it in the docs with instructions as well.

patch

From 6d5429bffd7cac20f0b6cdd9699b913a84cf19fc Mon Sep 17 00:00:00 2001
From: Naser Mahfouz <[email protected]>
Date: Tue, 28 Jan 2025 22:46:44 -0500
Subject: [PATCH] add csv io names to scream

---
 .../src/share/util/scream_io_longnames.csv    |   7 ++
 .../share/util/scream_io_standardnames.csv    |  70 +++++++++++
 .../eamxx/src/share/util/scream_utils.hpp     | 115 +++++-------------
 3 files changed, 110 insertions(+), 82 deletions(-)
 create mode 100644 components/eamxx/src/share/util/scream_io_longnames.csv
 create mode 100644 components/eamxx/src/share/util/scream_io_standardnames.csv

diff --git a/components/eamxx/src/share/util/scream_io_longnames.csv b/components/eamxx/src/share/util/scream_io_longnames.csv
new file mode 100644
index 000000000000..db53f3a82556
--- /dev/null
+++ b/components/eamxx/src/share/util/scream_io_longnames.csv
@@ -0,0 +1,7 @@
+variable,longname
+lev,hybrid level at midpoints (1000*(A+B))
+ilev,hybrid level at interfaces (1000*(A+B))
+hyai,hybrid A coefficient at layer interfaces
+hybi,hybrid B coefficient at layer interfaces
+hyam,hybrid A coefficient at layer midpoints
+hybm,hybrid B coefficient at layer midpoints
diff --git a/components/eamxx/src/share/util/scream_io_standardnames.csv b/components/eamxx/src/share/util/scream_io_standardnames.csv
new file mode 100644
index 000000000000..33aadffee523
--- /dev/null
+++ b/components/eamxx/src/share/util/scream_io_standardnames.csv
@@ -0,0 +1,70 @@
+variable,standardname
+p_mid,air_pressure
+p_mid_at_cldtop,air_pressure_at_cloud_top
+T_2m,air_temperature
+T_mid,air_temperature
+T_mid_at_cldtop,air_temperature_at_cloud_top
+aero_g_sw,asymmetry_factor_of_ambient_aerosol_particles
+pbl_height,atmosphere_boundary_layer_thickness
+precip_liq_surf_mass,atmosphere_mass_content_of_liquid_precipitation
+cldlow,low_type_cloud_area_fraction
+cldmed,medium_type_cloud_area_fraction
+cldhgh,high_type_cloud_area_fraction
+cldtot,cloud_area_fraction
+cldfrac_tot_at_cldtop,cloud_area_fraction
+cldfrac_tot,cloud_area_fraction_in_atmosphere_layer
+cldfrac_tot_for_analysis,cloud_area_fraction_in_atmosphere_layer
+cldfrac_rad,cloud_area_fraction_in_atmosphere_layer
+qi,cloud_ice_mixing_ratio
+qc,cloud_liquid_water_mixing_ratio
+U,eastward_wind
+eff_radius_qi,effective_radius_of_cloud_ice_particles
+eff_radius_qc,effective_radius_of_cloud_liquid_water_particles
+eff_radius_qc_at_cldtop,effective_radius_of_cloud_liquid_water_particles_at_liquid_water_cloud_top
+eff_radius_qr,effective_radius_of_cloud_rain_particles
+qv,humidity_mixing_ratio
+cldfrac_ice_at_cldtop,ice_cloud_area_fraction
+cldfrac_ice,ice_cloud_area_fraction_in_atmosphere_layer
+omega,lagrangian_tendency_of_air_pressure
+landfrac,land_area_fraction
+latitude,latitude
+cldfrac_liq_at_cldtop,liquid_water_cloud_area_fraction
+cldfrac_liq,liquid_water_cloud_area_fraction_in_atmosphere_layer
+longitude,longitude
+rainfrac,mass_fraction_of_liquid_precipitation_in_air
+V,northward_wind
+nc,number_concentration_of_cloud_liquid_water_particles_in_air
+cdnc_at_cldtop,number_concentration_of_cloud_liquid_water_particles_in_air_at_liquid_water_cloud_top
+ni,number_concentration_of_ice_crystals_in_air
+aero_tau_sw,optical_thickness_of_atmosphere_layer_due_to_ambient_aerosol_particles
+aero_tau_lw,optical_thickness_of_atmosphere_layer_due_to_ambient_aerosol_particles
+aero_ssa_sw,single_scattering_albedo_in_air_due_to_ambient_aerosol_particles
+sunlit,sunlit_binary_mask
+ps,surface_air_pressure
+LW_flux_dn_at_model_bot,surface_downwelling_longwave_flux_in_air
+SW_flux_dn_at_model_bot,surface_downwelling_shortwave_flux_in_air
+SW_clrsky_flux_dn_at_model_bot,surface_downwelling_shortwave_flux_in_air_assuming_clear_sky
+phis,surface_geopotential
+surf_radiative_T,surface_temperature
+surf_sens_flux,surface_upward_sensible_heat_flux
+SW_flux_dn_at_model_top,toa_incoming_shortwave_flux
+LW_flux_up_at_model_top,toa_outgoing_longwave_flux
+LW_clrsky_flux_up_at_model_top,toa_outgoing_longwave_flux_assuming_clear_sky
+surf_evap,water_evapotranspiration_flux
+AtmosphereDensity,air_density
+PotentialTemperature,air_potential_temperature
+SeaLevelPressure,air_pressure_at_mean_sea_level
+IceWaterPath,atmosphere_mass_content_of_cloud_ice
+LiqWaterPath,atmosphere_mass_content_of_cloud_liquid_water
+VapWaterPath,atmosphere_mass_content_of_water_vapor
+AerosolOpticalDepth550nm,atmosphere_optical_thickness_due_to_ambient_aerosol_particles
+Exner,dimensionless_exner_function
+z_mid,geopotential_height
+geopotential_mid,geopotential_height
+RelativeHumidity,relative_humidity
+surface_upward_latent_heat_flux,surface_upward_latent_heat_flux
+LongwaveCloudForcing,toa_longwave_cloud_radiative_effect
+ShortwaveCloudForcing,toa_shortwave_cloud_radiative_effect
+VirtualTemperature,virtual_temperature
+VaporFlux,water_evapotranspiration_flux
+wind_speed,wind_speed
diff --git a/components/eamxx/src/share/util/scream_utils.hpp b/components/eamxx/src/share/util/scream_utils.hpp
index 9577b5597bff..66ecd151b21e 100644
--- a/components/eamxx/src/share/util/scream_utils.hpp
+++ b/components/eamxx/src/share/util/scream_utils.hpp
@@ -12,6 +12,8 @@
 #include <algorithm>
 #include <map>
 #include <iostream>
+#include <fstream>
+#include <sstream>
 
 namespace scream {
 
@@ -388,89 +390,38 @@ struct DefaultMetadata {
     }
   }
 
-  // Create map of longnames, can be added to as developers see fit.
-  std::map<std::string,std::string> name_2_longname = {
-    {"lev","hybrid level at midpoints (1000*(A+B))"},
-    {"ilev","hybrid level at interfaces (1000*(A+B))"},
-    {"hyai","hybrid A coefficient at layer interfaces"},
-    {"hybi","hybrid B coefficient at layer interfaces"},
-    {"hyam","hybrid A coefficient at layer midpoints"},
-    {"hybm","hybrid B coefficient at layer midpoints"}
-  };
+  // Create map of longnames, see associated file
+  std::map<std::string,std::string> name_2_longname = readCSVToMap("scream_io_longnames.csv");
+
+  // Create map of standardnames, see associated file
+  std::map<std::string,std::string> name_2_standardname = readCSVToMap("scream_io_standardnames.csv");
+
+  std::map<std::string, std::string> readCSVToMap(const std::string& filename) {
+      std::ifstream file(filename);
+      if (!file.is_open()) {
+          std::cerr << "Could not open the file!" << std::endl;
+          return {};
+      }
+
+      std::map<std::string, std::string> dataMap;
+      std::string line;
+      bool isFirstLine = true;
+      while (std::getline(file, line)) {
+          if (isFirstLine) {
+              isFirstLine = false;
+              continue;
+          }
+          std::stringstream ss(line);
+          std::string column1, column2;
+          std::getline(ss, column1, ',');
+          std::getline(ss, column2, ',');
+          dataMap[column1] = column2;
+      }
+
+      file.close();
+      return dataMap;
+  }
 
-  // Create map of longnames, can be added to as developers see fit.
-  std::map<std::string,std::string> name_2_standardname = {
-    {"p_mid"                                                       , "air_pressure"},
-    {"p_mid_at_cldtop"                                             , "air_pressure_at_cloud_top"},
-    {"T_2m"                                                        , "air_temperature"},
-    {"T_mid"                                                       , "air_temperature"},
-    {"T_mid_at_cldtop"                                             , "air_temperature_at_cloud_top"},
-    {"aero_g_sw"                                                   , "asymmetry_factor_of_ambient_aerosol_particles"},
-    {"pbl_height"                                                  , "atmosphere_boundary_layer_thickness"},
-    {"precip_liq_surf_mass"                                        , "atmosphere_mass_content_of_liquid_precipitation"},
-    {"cldlow"                                                      , "low_type_cloud_area_fraction"},
-    {"cldmed"                                                      , "medium_type_cloud_area_fraction"},
-    {"cldhgh"                                                      , "high_type_cloud_area_fraction"},
-    {"cldtot"                                                      , "cloud_area_fraction"},
-    {"cldfrac_tot_at_cldtop"                                       , "cloud_area_fraction"},
-    {"cldfrac_tot"                                                 , "cloud_area_fraction_in_atmosphere_layer"},
-    {"cldfrac_tot_for_analysis"                                    , "cloud_area_fraction_in_atmosphere_layer"},
-    {"cldfrac_rad"                                                 , "cloud_area_fraction_in_atmosphere_layer"},
-    {"qi"                                                          , "cloud_ice_mixing_ratio"},
-    {"qc"                                                          , "cloud_liquid_water_mixing_ratio"},
-    {"U"                                                           , "eastward_wind"},
-    {"eff_radius_qi"                                               , "effective_radius_of_cloud_ice_particles"},
-    {"eff_radius_qc"                                               , "effective_radius_of_cloud_liquid_water_particles"},
-    {"eff_radius_qc_at_cldtop"                                     , "effective_radius_of_cloud_liquid_water_particles_at_liquid_water_cloud_top"},
-    {"eff_radius_qr"                                               , "effective_radius_of_cloud_rain_particles"},
-    {"qv"                                                          , "humidity_mixing_ratio"},
-    {"cldfrac_ice_at_cldtop"                                       , "ice_cloud_area_fraction"},
-    {"cldfrac_ice"                                                 , "ice_cloud_area_fraction_in_atmosphere_layer"},
-    {"omega"                                                       , "lagrangian_tendency_of_air_pressure"},
-    {"landfrac"                                                    , "land_area_fraction"},
-    {"latitude"                                                    , "latitude"},
-    {"cldfrac_liq_at_cldtop"                                       , "liquid_water_cloud_area_fraction"},
-    {"cldfrac_liq"                                                 , "liquid_water_cloud_area_fraction_in_atmosphere_layer"},
-    {"longitude"                                                   , "longitude"},
-    {"rainfrac"                                                    , "mass_fraction_of_liquid_precipitation_in_air"},
-    {"V"                                                           , "northward_wind"},
-    {"nc"                                                          , "number_concentration_of_cloud_liquid_water_particles_in_air"},
-    {"cdnc_at_cldtop"                                              , "number_concentration_of_cloud_liquid_water_particles_in_air_at_liquid_water_cloud_top"},
-    {"ni"                                                          , "number_concentration_of_ice_crystals_in_air"},
-    {"aero_tau_sw"                                                 , "optical_thickness_of_atmosphere_layer_due_to_ambient_aerosol_particles"},
-    {"aero_tau_lw"                                                 , "optical_thickness_of_atmosphere_layer_due_to_ambient_aerosol_particles"},
-    {"aero_ssa_sw"                                                 , "single_scattering_albedo_in_air_due_to_ambient_aerosol_particles"},
-    {"sunlit"                                                      , "sunlit_binary_mask"},
-    {"ps"                                                          , "surface_air_pressure"},
-    {"LW_flux_dn_at_model_bot"                                     , "surface_downwelling_longwave_flux_in_air"},
-    {"SW_flux_dn_at_model_bot"                                     , "surface_downwelling_shortwave_flux_in_air"},
-    {"SW_clrsky_flux_dn_at_model_bot"                              , "surface_downwelling_shortwave_flux_in_air_assuming_clear_sky"},
-    {"phis"                                                        , "surface_geopotential"},
-    {"surf_radiative_T"                                            , "surface_temperature"},
-    {"surf_sens_flux"                                              , "surface_upward_sensible_heat_flux"},
-    {"SW_flux_dn_at_model_top"                                     , "toa_incoming_shortwave_flux"},
-    {"LW_flux_up_at_model_top"                                     , "toa_outgoing_longwave_flux"},
-    {"LW_clrsky_flux_up_at_model_top"                              , "toa_outgoing_longwave_flux_assuming_clear_sky"},
-    {"surf_evap"                                                   , "water_evapotranspiration_flux"},
-    {"AtmosphereDensity"                                           , "air_density"},
-    {"PotentialTemperature"                                        , "air_potential_temperature"},
-    {"SeaLevelPressure"                                            , "air_pressure_at_mean_sea_level"},
-    {"IceWaterPath"                                                , "atmosphere_mass_content_of_cloud_ice"},
-    {"LiqWaterPath"                                                , "atmosphere_mass_content_of_cloud_liquid_water"},
-    {"VapWaterPath"                                                , "atmosphere_mass_content_of_water_vapor"},
-    {"AerosolOpticalDepth550nm"                                    , "atmosphere_optical_thickness_due_to_ambient_aerosol_particles"},
-    {"Exner"                                                       , "dimensionless_exner_function"},
-    {"z_mid"                                                       , "geopotential_height"},
-    {"geopotential_mid"                                            , "geopotential_height"},
-    {"RelativeHumidity"                                            , "relative_humidity"},
-    {"surface_upward_latent_heat_flux"                             , "surface_upward_latent_heat_flux"},
-    {"LongwaveCloudForcing"                                        , "toa_longwave_cloud_radiative_effect"},
-    {"ShortwaveCloudForcing"                                       , "toa_shortwave_cloud_radiative_effect"},
-    {"VirtualTemperature"                                          , "virtual_temperature"},
-    {"VaporFlux"                                                   , "water_evapotranspiration_flux"},
-    {"wind_speed"                                                  , "wind_speed"}
-  };
-  
 };
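
One caveat with a reader that splits each line on every comma, as `readCSVToMap` in the patch does: a long name that itself contains a comma would be cut short at that comma. A minimal Python sketch of a quote-aware reader, which could for instance serve as a sanity check on the CSV files in a test script, is below. The function name `read_name_map`, the duplicate check, and the `T_2m` description are assumptions made for illustration and are not part of the patch.

```python
import csv
import io


def read_name_map(stream, key_col="variable", value_col="longname"):
    """Read a two-column CSV with a header row into a dict.

    Hypothetical helper; csv.DictReader honors quoted fields, so a
    value may contain commas when it is wrapped in double quotes.
    """
    mapping = {}
    for row in csv.DictReader(stream):
        key = row[key_col].strip()
        if key in mapping:
            raise ValueError(f"duplicate entry for {key!r}")
        mapping[key] = row[value_col].strip()
    return mapping


# In-memory stand-in for scream_io_longnames.csv. The first data row is
# from the patch; the T_2m row is a made-up example of a quoted value.
sample = io.StringIO(
    "variable,longname\n"
    "lev,hybrid level at midpoints (1000*(A+B))\n"
    'T_2m,"Air temperature, 2 m above the surface"\n'
)
longnames = read_name_map(sample)
print(longnames["T_2m"])
```

If the files stay free of commas inside values, the simpler split used in the patch behaves the same way.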

bartgol · 2025-01-29T15:51:13Z

I am just a bit against having two fields (long_name and standard_name) which seem to often be the same. Redundant information can be confusing too (e.g., ppl may start using one thinking it's the other). Is there a compelling reason to have both instead of a single long (and standardized) name? What would break if we just kept one? Could we fix downstream tools to use only the one we keep (if we decided to do away with one or the other)?

I am all for moving the eamxx_name->standard_name into its own file outside of the source code.

mahf708 · 2025-01-29T16:11:12Z

> I am just a bit against having two fields (long_name and standard_name) which seem to often be the same. Redundant information can be confusing too (e.g., ppl may start using one thinking it's the other). Is there a compelling reason to have both instead of a single long (and standardized) name? What would break if we just kept one? Could we fix downstream tools to use only the one we keep (if we decided to do away with one or the other)?
>
> I am all for moving the eamxx_name->standard_name into its own file outside of the source code.

This is outside my wheelhouse personally; so I defer to @rljacob to decide. In my understanding, the standard_name is the widely recognized one, but because it is standardized (see table below), it has all these underscores and stuff, which I think makes it less convenient for humans. My understanding (from Rob's comment above) is that the long name is for humans to use in plots and such, so you could do something like:

# some logic to determine what the var name is based on standard_name
# e.g., for x in ds.variables: find one matching T_mid and save it as _var

# plots at a single column somewhere (x is lev, y is quantity of interest)

ds[_var].isel(ncol=-1).plot()
plt.xlabel(f"{ds.lev.long_name}, {ds.lev.units}")
plt.ylabel(f"{ds[_var].long_name}, {ds[_var].units}")

# etc.

cf standard names: https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html

AaronDonahue · 2025-01-29T17:27:46Z

Sorry, late to the discussion. But to address the long name vs standard name question: my understanding is that the standard name's main function is to be readable by post-processing tools. So it has a universal convention which may, at times, not be very descriptive to a human. I think the long name is a place to put a single sentence description that is a lot easier to understand.

We included the long name originally because of a request. I guess there is one E3SM post-processing tool that specifically looks for the long name of the hybrid coordinate system. Standard name came later to bring EAMxx into CF-Compliance. And for lack of a better option (and for expediency) the default was to assign long_name to match standard_name unless a specific long_name existed.

AaronDonahue · 2025-01-29T17:29:18Z

So ideally, moving forward we would invest the time (maybe Naser's time ;-) ) to establish different and useful long names for our variables.

Would it be reasonable to only include the long name attribute if it is specified? Meaning, if no specific long name exists we don't add that attribute to the variable metadata. That would address Luca's concern that we have netCDF metadata that is filled with redundant information.
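
A small Python sketch of what "only write the attribute if it is specified" could look like (illustrative only; `build_attributes` and its arguments are hypothetical names, and the real EAMxx IO layer is C++):

```python
def build_attributes(name, standardnames, longnames, units=None):
    """Collect metadata for one output variable.

    Hypothetical helper: an attribute is added only when a value is
    actually known, so no long_name is written for variables that
    have no entry in the long-name table.
    """
    attrs = {}
    if units is not None:
        attrs["units"] = units
    if name in standardnames:
        attrs["standard_name"] = standardnames[name]
    if name in longnames:
        attrs["long_name"] = longnames[name]
    return attrs


# Entries taken from the tables in this thread.
longnames = {"hyam": "hybrid A coefficient at layer midpoints"}
standardnames = {"qc": "cloud_liquid_water_mixing_ratio"}

qc_attrs = build_attributes("qc", standardnames, longnames, units="kg/kg")
hyam_attrs = build_attributes("hyam", standardnames, longnames)
print(qc_attrs)
print(hyam_attrs)
```

Under this scheme `qc` would carry `units` and `standard_name` but no `long_name` until someone adds one to the table.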

AaronDonahue · 2025-01-29T17:32:01Z

Re: @mahf708's suggestion for a YAML. I'm in favor of this from a simplicity perspective. It would be way easier to get domain scientists to improve a single YAML file rather than hunt down information in code. Naser also noted that a single YAML file could be used by both F90 and C++.

Didn't we have a long discussion about this already though with the OMEGA team? There was a push to adopt a library that stored python dictionaries of variables, or maybe YAML files. It seems like this is a conversation we occasionally touch upon but don't commit to a solution. Maybe we can discuss in the Dev Call and come to a consensus on what to do.

mahf708 · 2025-01-29T17:56:00Z

I would go for a CSV for even more simplicity. Imagine a situation where you just point a domain scientist to a csv file and tell them to update it and send it back to you... That would be slightly better than a YAML file. I am fine with either.

Yeah, let's discuss. I think a super general solution may not work well and we should start somewhere manageable first. We could begin with this experiment in EAMxx and see how it goes; if OMEGA team wants to follow suit or suggest an alternative, we could discuss. I do think they have their own sophisticated registry, even in the current f90 code, but I could be wrong.

philipwjones · 2025-01-29T18:31:31Z

MPAS-O used a registry (in XML) but Omega is not taking that route due to various issues related to auto-generated code that created opaque data structures where the relevant info got lost and you couldn't grep stuff. Omega is defining fields in code and keeps an internal list of available fields, though in the case of tracers, the set of field create calls is kept in a single included (.inc) file. Our field create interface has several required args for required metadata, including standard name (for CF-compliance), name (short name used internally) and long name for a slightly longer description. The standard names can be unwieldy and obscure so we do like to keep all these name options.

rljacob · 2025-01-29T18:56:24Z

You're talking about a file that is very EAMxx specific since it maps EAMxx variable names to metadata. Whatever format you want but I wouldn't worry about F90.

The constants dictionary discussion is still iterating on the YAML format and was interrupted by the holiday break.

bartgol · 2025-01-29T23:06:16Z

If the issue is just having a "short name for plotting", shouldn't the customer decide however they want to label the variables? We provide a standard name (which can be quite long), so that tools can recognize the variable. The long name seems to be responsibility of the user, no? If they want certain names, they can set up some local maps stdname->my_name in their script, and use those... Why should it fall on the shoulders of the model to provide something that is "out of convenience of some users"? Drawing from the link pasted by Rob:

        "abs550aer": {
            "standard_name": "atmosphere_absorption_optical_thickness_due_to_ambient_aerosol_particles", 
            "long_name": "Ambient Aerosol Absorption Optical Thickness at 550nm", 
            ...

It's not like the 2nd is much more amenable to be used as a plot label. Most likely, the user will have to set up their own labels...

I will not say anything else, as this is probably my super-personal taste on the matter. I just find having to carry around two naming standards confusing and pointless. That said, we can accommodate it, and storing a csv file is just fine (I'd prefer a yaml, so we can already load it in via our yaml parsers, but a csv parser is just a 10-line fcn).
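
For concreteness, the user-side approach mentioned above (a local stdname->my_name map in the analysis script) might look like the following Python sketch. `MY_LABELS`, `plot_label`, and the label text are hypothetical; the attribute values are the ones shown for `T_mid` and `qm` earlier in the thread.

```python
# Hypothetical user-maintained map from CF standard name to a plot label.
MY_LABELS = {"air_temperature": "Temperature"}


def plot_label(var_name, attrs, user_labels=MY_LABELS):
    """Pick an axis label: user label first, then long_name, then the name."""
    std = attrs.get("standard_name")
    label = user_labels.get(std) or attrs.get("long_name") or var_name
    units = attrs.get("units")
    return f"{label} [{units}]" if units else label


t_mid_attrs = {"units": "K", "long_name": "T_mid", "standard_name": "air_temperature"}
qm_attrs = {"units": "kg/kg", "long_name": "qm"}

print(plot_label("T_mid", t_mid_attrs))
print(plot_label("qm", qm_attrs))
```

The fallback order is a design choice: a script-level label wins, and the file's own metadata is used only when the user has not provided one.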

chengzhuzhang · 2025-01-29T23:27:26Z

> If the issue is just having a "short name for plotting", shouldn't the customer decide however they want to label the variables?

I think there is more to this than just having a name for plotting. My understanding is that the long_name or description provides a more precise meaning to help users understand the variable. I don't believe a standard name that follows CF conventions can work for all cases. A simple example from EAMxx is that both T_mid and T_2m have the CF standard name air_temperature, which is not adequate for new users to understand the temperature to which it refers. I believe the description of the variable can be considered as part of the model/data documentation.
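
The overlap is easy to see by inverting the mapping. The Python sketch below uses a few entries from the Standard Names Mapping at the top of this issue; the helper name `group_by_standard_name` is hypothetical.

```python
from collections import defaultdict

# A subset of the EAMxx name -> standard_name table from this issue.
standardnames = {
    "p_mid": "air_pressure",
    "T_2m": "air_temperature",
    "T_mid": "air_temperature",
    "ps": "surface_air_pressure",
}


def group_by_standard_name(table):
    """Invert the table: standard_name -> sorted list of EAMxx names."""
    groups = defaultdict(list)
    for var, std in table.items():
        groups[std].append(var)
    return {std: sorted(names) for std, names in groups.items()}


groups = group_by_standard_name(standardnames)
# Standard names shared by more than one EAMxx variable.
ambiguous = {std: names for std, names in groups.items() if len(names) > 1}
print(ambiguous)
```

A tool that selects variables by `standard_name` alone cannot tell these apart, which is where a distinct `long_name` helps.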

bartgol · 2025-01-29T23:32:34Z

> A simple example from EAMxx is that both T_mid and T_2m have the CF standard name air_temperature

Wow, I did not realize that. So the CF standard does not provide any guidance on how to differentiate the name for X when sliced at different elevations? It seems like a shortcoming of the standard.

But I see your point.

chengzhuzhang · 2025-01-29T23:50:43Z

> So the CF standard does not provide any guidance on how to differentiate the name for X when sliced at different elevations?

No, I don't think so :(. The CF standard does not provide specific guidance on how to differentiate variable names when data is sliced at different elevations. I guess this is why inter-comparison programs, such as CMIP, need to develop augmented variable names when requesting data from models.

crterai added the EAMxx PRs focused on capabilities for EAMxx label Jan 24, 2025
