EAMxx: Populate long_name of output with human readable description #6940

Open
crterai opened this issue Jan 24, 2025 · 22 comments
Labels
EAMxx PRs focused on capabilities for EAMxx

Comments

@crterai
Contributor

crterai commented Jan 24, 2025

Longnames Mapping

| Name | Long Name |
| --- | --- |
| lev | hybrid level at midpoints (1000*(A+B)) |
| ilev | hybrid level at interfaces (1000*(A+B)) |
| hyai | hybrid A coefficient at layer interfaces |
| hybi | hybrid B coefficient at layer interfaces |
| hyam | hybrid A coefficient at layer midpoints |
| hybm | hybrid B coefficient at layer midpoints |

Standard Names Mapping

| Name | Standard Name |
| --- | --- |
| p_mid | air_pressure |
| p_mid_at_cldtop | air_pressure_at_cloud_top |
| T_2m | air_temperature |
| T_mid | air_temperature |
| T_mid_at_cldtop | air_temperature_at_cloud_top |
| aero_g_sw | asymmetry_factor_of_ambient_aerosol_particles |
| pbl_height | atmosphere_boundary_layer_thickness |
| precip_liq_surf_mass | atmosphere_mass_content_of_liquid_precipitation |
| cldlow | low_type_cloud_area_fraction |
| cldmed | medium_type_cloud_area_fraction |
| cldhgh | high_type_cloud_area_fraction |
| cldtot | cloud_area_fraction |
| cldfrac_tot_at_cldtop | cloud_area_fraction |
| cldfrac_tot | cloud_area_fraction_in_atmosphere_layer |
| cldfrac_tot_for_analysis | cloud_area_fraction_in_atmosphere_layer |
| cldfrac_rad | cloud_area_fraction_in_atmosphere_layer |
| qi | cloud_ice_mixing_ratio |
| qc | cloud_liquid_water_mixing_ratio |
| U | eastward_wind |
| eff_radius_qi | effective_radius_of_cloud_ice_particles |
| eff_radius_qc | effective_radius_of_cloud_liquid_water_particles |
| eff_radius_qc_at_cldtop | effective_radius_of_cloud_liquid_water_particles_at_liquid_water_cloud_top |
| eff_radius_qr | effective_radius_of_cloud_rain_particles |
| qv | humidity_mixing_ratio |
| cldfrac_ice_at_cldtop | ice_cloud_area_fraction |
| cldfrac_ice | ice_cloud_area_fraction_in_atmosphere_layer |
| omega | lagrangian_tendency_of_air_pressure |
| landfrac | land_area_fraction |
| latitude | latitude |
| cldfrac_liq_at_cldtop | liquid_water_cloud_area_fraction |
| cldfrac_liq | liquid_water_cloud_area_fraction_in_atmosphere_layer |
| longitude | longitude |
| rainfrac | mass_fraction_of_liquid_precipitation_in_air |
| V | northward_wind |
| nc | number_concentration_of_cloud_liquid_water_particles_in_air |
| cdnc_at_cldtop | number_concentration_of_cloud_liquid_water_particles_in_air_at_liquid_water_cloud_top |
| ni | number_concentration_of_ice_crystals_in_air |
| aero_tau_sw | optical_thickness_of_atmosphere_layer_due_to_ambient_aerosol_particles |
| aero_tau_lw | optical_thickness_of_atmosphere_layer_due_to_ambient_aerosol_particles |
| aero_ssa_sw | single_scattering_albedo_in_air_due_to_ambient_aerosol_particles |
| sunlit | sunlit_binary_mask |
| ps | surface_air_pressure |
| LW_flux_dn_at_model_bot | surface_downwelling_longwave_flux_in_air |
| SW_flux_dn_at_model_bot | surface_downwelling_shortwave_flux_in_air |
| SW_clrsky_flux_dn_at_model_bot | surface_downwelling_shortwave_flux_in_air_assuming_clear_sky |
| phis | surface_geopotential |
| surf_radiative_T | surface_temperature |
| surf_sens_flux | surface_upward_sensible_heat_flux |
| SW_flux_dn_at_model_top | toa_incoming_shortwave_flux |
| LW_flux_up_at_model_top | toa_outgoing_longwave_flux |
| LW_clrsky_flux_up_at_model_top | toa_outgoing_longwave_flux_assuming_clear_sky |
| surf_evap | water_evapotranspiration_flux |
| AtmosphereDensity | air_density |
| PotentialTemperature | air_potential_temperature |
| SeaLevelPressure | air_pressure_at_mean_sea_level |
| IceWaterPath | atmosphere_mass_content_of_cloud_ice |
| LiqWaterPath | atmosphere_mass_content_of_cloud_liquid_water |
| VapWaterPath | atmosphere_mass_content_of_water_vapor |
| AerosolOpticalDepth550nm | atmosphere_optical_thickness_due_to_ambient_aerosol_particles |
| Exner | dimensionless_exner_function |
| z_mid | geopotential_height |
| geopotential_mid | geopotential_height |
| RelativeHumidity | relative_humidity |
| surface_upward_latent_heat_flux | surface_upward_latent_heat_flux |
| LongwaveCloudForcing | toa_longwave_cloud_radiative_effect |
| ShortwaveCloudForcing | toa_shortwave_cloud_radiative_effect |
| VirtualTemperature | virtual_temperature |
| VaporFlux | water_evapotranspiration_flux |
| wind_speed | wind_speed |

original body of the issue below:

Currently, the output in the history files does not include a human-readable description of the variables, which makes it difficult for users to determine what each output represents.

	float qc(time, ncol, lev) ;
		qc:units = "kg/kg" ;
		qc:_FillValue = 3.402824e+33f ;
		qc:averaging_count_tracker = "avg_count_ncol_lev" ;
		qc:long_name = "qc" ;
	float qi(time, ncol, lev) ;
		qi:units = "kg/kg" ;
		qi:_FillValue = 3.402824e+33f ;
		qi:averaging_count_tracker = "avg_count_ncol_lev" ;
		qi:long_name = "qi" ;
	float qm(time, ncol, lev) ;
		qm:units = "kg/kg" ;
		qm:_FillValue = 3.402824e+33f ;
		qm:averaging_count_tracker = "avg_count_ncol_lev" ;
		qm:long_name = "qm" ;
	float qr(time, ncol, lev) ;
		qr:units = "kg/kg" ;
		qr:_FillValue = 3.402824e+33f ;
		qr:averaging_count_tracker = "avg_count_ncol_lev" ;
		qr:long_name = "qr" ;

Example of what EAM history outputs produce:

	float FLDS(time, ncol) ;
		FLDS:Sampling_Sequence = "rad_lwsw" ;
		FLDS:_FillValue = 1.e+20f ;
		FLDS:missing_value = 1.e+20f ;
		FLDS:units = "W/m2" ;
		FLDS:long_name = "Downwelling longwave flux at surface" ;
		FLDS:standard_name = "surface_downwelling_longwave_flux_in_air" ;
		FLDS:cell_methods = "time: mean" ;
@crterai crterai added the EAMxx PRs focused on capabilities for EAMxx label Jan 24, 2025
@bartgol
Contributor

bartgol commented Jan 27, 2025

IO already outputs the standard name. Perhaps the branch you are using is old? Although, I think we added the feature quite a long time ago...

E.g., from our baselines:

	float T_mid(time, ncol, lev) ;
		T_mid:units = "K" ;
		T_mid:_FillValue = 3.402824e+33f ;
		T_mid:averaging_count_tracker = "avg_count_ncol_lev" ;
		T_mid:long_name = "T_mid" ;
		T_mid:standard_name = "air_temperature" ;

I'm not sure why we have long_name as well as standard_name. I think the former is only used for hyai/hyam/hybi/hybm, and it's sort of a "description"...

Edit: it looks like it was added not too long ago: PR E3SM-Project/scream#3105

@mahf708
Contributor

mahf708 commented Jan 28, 2025

> I'm not sure why we have long_name as well as standard_name

Both are valid attributes; the long_name is a human description of the value (as you see above, "Downwelling longwave flux at surface") which can be a project internal description or whatever, but the standard_name follows stricter criteria: https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html

@bartgol
Contributor

bartgol commented Jan 28, 2025

I wish we used "description" then, rather than "long_name".

@rljacob
Member

rljacob commented Jan 28, 2025

In the case you pointed to, the "long_name" of "T_mid" is neither long nor a description. Is the default to just repeat the variable name?

@bartgol
Contributor

bartgol commented Jan 28, 2025

> In the case you pointed to, the "long_name" of "T_mid" is neither long nor a description. Is the default to just repeat the variable name?

Yes. We have only stored a long_name value for a handful of fields:

  std::map<std::string,std::string> name_2_longname = {
    {"lev","hybrid level at midpoints (1000*(A+B))"},
    {"ilev","hybrid level at interfaces (1000*(A+B))"},
    {"hyai","hybrid A coefficient at layer interfaces"},
    {"hybi","hybrid B coefficient at layer interfaces"},
    {"hyam","hybrid A coefficient at layer midpoints"},
    {"hybm","hybrid B coefficient at layer midpoints"}
  };

For everything else we simply repeat the eamxx name.
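
The default described here amounts to a dictionary lookup that falls back to the variable name. A minimal Python sketch of that behavior, for illustration only: the real logic lives in the C++ code, and the helper name `long_name_for` is hypothetical.

```python
# Long names currently stored, copied from the C++ map quoted above.
NAME_2_LONGNAME = {
    "lev": "hybrid level at midpoints (1000*(A+B))",
    "ilev": "hybrid level at interfaces (1000*(A+B))",
    "hyai": "hybrid A coefficient at layer interfaces",
    "hybi": "hybrid B coefficient at layer interfaces",
    "hyam": "hybrid A coefficient at layer midpoints",
    "hybm": "hybrid B coefficient at layer midpoints",
}


def long_name_for(name):
    """Return the stored long name, or the EAMxx name itself if none exists.

    Hypothetical helper; it only mirrors the fallback described in the thread.
    """
    return NAME_2_LONGNAME.get(name, name)


print(long_name_for("hyam"))   # a stored long name
print(long_name_for("T_mid"))  # no entry, so the variable name is repeated
```

This is why T_mid ends up with long_name = "T_mid" in the output shown earlier.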

@mahf708
Contributor

mahf708 commented Jan 28, 2025

> In the case you pointed to, the "long_name" of "T_mid" is neither long nor a description. Is the default to just repeat the variable name?

Yes, but we discussed this on the eamxx call today, and some eval peeps will come up with a list and we will add it like we did for standard_names

@mahf708
Contributor

mahf708 commented Jan 29, 2025

@rljacob + @bartgol + @crterai + @AaronDonahue, I propose the following:

We save this info (see main post above #6940 (comment)) in easily viewable files (ideally yaml or csv) in the repo and then make a little function to read them and load them. That way, anyone can edit them. I volunteer to do that once we compile a list of long names we want to add.

@bartgol
Contributor

bartgol commented Jan 29, 2025

I see your proposal and raise another proposal: instead of long_name, we call the metadata "description" and store a very similar (if not identical) string as in "standard_name".

Btw, @jeff-cohere wrote a utility a while ago that downloads the CF database, which should also include a description (IIRC). We could revive that, modify the DB to be a map eamxx_name->metadata, and have IO read that yaml file at runtime.

@rljacob
Member

rljacob commented Jan 29, 2025

"description" and "long_name" are already 2 distinct pieces of metadata in climate netcdf files. The "description" in the CF table is really long and not something you want to include. "long_name" comes from the COARDS convention, an older convention that CF generalizes and extends. COARDS doesn't prescribe any specific long_names for variables but says what it should do in general: "a long descriptive name (title). This could be used for labeling plots, for example. If a variable has no long_name attribute assigned, the variable name will be used as a default."

There is one standard that prescribes long_names: CMIP. See https://github.com/PCMDI/cmip6-cmor-tables/blob/main/Tables/CMIP6_AERmon.json

@mahf708
Contributor

mahf708 commented Jan 29, 2025

My proposal is actually simple, see patch below. All I desire is a simple way (away from cpp/f90 code) for users to simply issue a PR updating these names as they wish. That isolates the code mechanics from the naming stuff. In current github ui, the csv file is rendered and searchable nicely (see here).

So, when someone (like Chris above) points out deficiencies, we can point the user to issuing a PR updating this CSV file. We can also link it in the docs with instructions as well.

patch

From 6d5429bffd7cac20f0b6cdd9699b913a84cf19fc Mon Sep 17 00:00:00 2001
From: Naser Mahfouz <[email protected]>
Date: Tue, 28 Jan 2025 22:46:44 -0500
Subject: [PATCH] add csv io names to scream

---
 .../src/share/util/scream_io_longnames.csv    |   7 ++
 .../share/util/scream_io_standardnames.csv    |  70 +++++++++++
 .../eamxx/src/share/util/scream_utils.hpp     | 115 +++++-------------
 3 files changed, 110 insertions(+), 82 deletions(-)
 create mode 100644 components/eamxx/src/share/util/scream_io_longnames.csv
 create mode 100644 components/eamxx/src/share/util/scream_io_standardnames.csv

diff --git a/components/eamxx/src/share/util/scream_io_longnames.csv b/components/eamxx/src/share/util/scream_io_longnames.csv
new file mode 100644
index 000000000000..db53f3a82556
--- /dev/null
+++ b/components/eamxx/src/share/util/scream_io_longnames.csv
@@ -0,0 +1,7 @@
+variable,longname
+lev,hybrid level at midpoints (1000*(A+B))
+ilev,hybrid level at interfaces (1000*(A+B))
+hyai,hybrid A coefficient at layer interfaces
+hybi,hybrid B coefficient at layer interfaces
+hyam,hybrid A coefficient at layer midpoints
+hybm,hybrid B coefficient at layer midpoints
diff --git a/components/eamxx/src/share/util/scream_io_standardnames.csv b/components/eamxx/src/share/util/scream_io_standardnames.csv
new file mode 100644
index 000000000000..33aadffee523
--- /dev/null
+++ b/components/eamxx/src/share/util/scream_io_standardnames.csv
@@ -0,0 +1,70 @@
+variable,standardname
+p_mid,air_pressure
+p_mid_at_cldtop,air_pressure_at_cloud_top
+T_2m,air_temperature
+T_mid,air_temperature
+T_mid_at_cldtop,air_temperature_at_cloud_top
+aero_g_sw,asymmetry_factor_of_ambient_aerosol_particles
+pbl_height,atmosphere_boundary_layer_thickness
+precip_liq_surf_mass,atmosphere_mass_content_of_liquid_precipitation
+cldlow,low_type_cloud_area_fraction
+cldmed,medium_type_cloud_area_fraction
+cldhgh,high_type_cloud_area_fraction
+cldtot,cloud_area_fraction
+cldfrac_tot_at_cldtop,cloud_area_fraction
+cldfrac_tot,cloud_area_fraction_in_atmosphere_layer
+cldfrac_tot_for_analysis,cloud_area_fraction_in_atmosphere_layer
+cldfrac_rad,cloud_area_fraction_in_atmosphere_layer
+qi,cloud_ice_mixing_ratio
+qc,cloud_liquid_water_mixing_ratio
+U,eastward_wind
+eff_radius_qi,effective_radius_of_cloud_ice_particles
+eff_radius_qc,effective_radius_of_cloud_liquid_water_particles
+eff_radius_qc_at_cldtop,effective_radius_of_cloud_liquid_water_particles_at_liquid_water_cloud_top
+eff_radius_qr,effective_radius_of_cloud_rain_particles
+qv,humidity_mixing_ratio
+cldfrac_ice_at_cldtop,ice_cloud_area_fraction
+cldfrac_ice,ice_cloud_area_fraction_in_atmosphere_layer
+omega,lagrangian_tendency_of_air_pressure
+landfrac,land_area_fraction
+latitude,latitude
+cldfrac_liq_at_cldtop,liquid_water_cloud_area_fraction
+cldfrac_liq,liquid_water_cloud_area_fraction_in_atmosphere_layer
+longitude,longitude
+rainfrac,mass_fraction_of_liquid_precipitation_in_air
+V,northward_wind
+nc,number_concentration_of_cloud_liquid_water_particles_in_air
+cdnc_at_cldtop,number_concentration_of_cloud_liquid_water_particles_in_air_at_liquid_water_cloud_top
+ni,number_concentration_of_ice_crystals_in_air
+aero_tau_sw,optical_thickness_of_atmosphere_layer_due_to_ambient_aerosol_particles
+aero_tau_lw,optical_thickness_of_atmosphere_layer_due_to_ambient_aerosol_particles
+aero_ssa_sw,single_scattering_albedo_in_air_due_to_ambient_aerosol_particles
+sunlit,sunlit_binary_mask
+ps,surface_air_pressure
+LW_flux_dn_at_model_bot,surface_downwelling_longwave_flux_in_air
+SW_flux_dn_at_model_bot,surface_downwelling_shortwave_flux_in_air
+SW_clrsky_flux_dn_at_model_bot,surface_downwelling_shortwave_flux_in_air_assuming_clear_sky
+phis,surface_geopotential
+surf_radiative_T,surface_temperature
+surf_sens_flux,surface_upward_sensible_heat_flux
+SW_flux_dn_at_model_top,toa_incoming_shortwave_flux
+LW_flux_up_at_model_top,toa_outgoing_longwave_flux
+LW_clrsky_flux_up_at_model_top,toa_outgoing_longwave_flux_assuming_clear_sky
+surf_evap,water_evapotranspiration_flux
+AtmosphereDensity,air_density
+PotentialTemperature,air_potential_temperature
+SeaLevelPressure,air_pressure_at_mean_sea_level
+IceWaterPath,atmosphere_mass_content_of_cloud_ice
+LiqWaterPath,atmosphere_mass_content_of_cloud_liquid_water
+VapWaterPath,atmosphere_mass_content_of_water_vapor
+AerosolOpticalDepth550nm,atmosphere_optical_thickness_due_to_ambient_aerosol_particles
+Exner,dimensionless_exner_function
+z_mid,geopotential_height
+geopotential_mid,geopotential_height
+RelativeHumidity,relative_humidity
+surface_upward_latent_heat_flux,surface_upward_latent_heat_flux
+LongwaveCloudForcing,toa_longwave_cloud_radiative_effect
+ShortwaveCloudForcing,toa_shortwave_cloud_radiative_effect
+VirtualTemperature,virtual_temperature
+VaporFlux,water_evapotranspiration_flux
+wind_speed,wind_speed
diff --git a/components/eamxx/src/share/util/scream_utils.hpp b/components/eamxx/src/share/util/scream_utils.hpp
index 9577b5597bff..66ecd151b21e 100644
--- a/components/eamxx/src/share/util/scream_utils.hpp
+++ b/components/eamxx/src/share/util/scream_utils.hpp
@@ -12,6 +12,8 @@
 #include <algorithm>
 #include <map>
 #include <iostream>
+#include <fstream>
+#include <sstream>
 
 namespace scream {
 
@@ -388,89 +390,38 @@ struct DefaultMetadata {
     }
   }
 
-  // Create map of longnames, can be added to as developers see fit.
-  std::map<std::string,std::string> name_2_longname = {
-    {"lev","hybrid level at midpoints (1000*(A+B))"},
-    {"ilev","hybrid level at interfaces (1000*(A+B))"},
-    {"hyai","hybrid A coefficient at layer interfaces"},
-    {"hybi","hybrid B coefficient at layer interfaces"},
-    {"hyam","hybrid A coefficient at layer midpoints"},
-    {"hybm","hybrid B coefficient at layer midpoints"}
-  };
+  // Create map of longnames, see associated file
+  std::map<std::string,std::string> name_2_longname = readCSVToMap("scream_io_longnames.csv");
+
+  // Create map of standardnames, see associated file
+  std::map<std::string,std::string> name_2_standardname = readCSVToMap("scream_io_standardnames.csv");
+
+  std::map<std::string, std::string> readCSVToMap(const std::string& filename) {
+      std::ifstream file(filename);
+      if (!file.is_open()) {
+          std::cerr << "Could not open the file!" << std::endl;
+          return {};
+      }
+
+      std::map<std::string, std::string> dataMap;
+      std::string line;
+      bool isFirstLine = true;
+      while (std::getline(file, line)) {
+          if (isFirstLine) {
+              isFirstLine = false;
+              continue;
+          }
+          std::stringstream ss(line);
+          std::string column1, column2;
+          std::getline(ss, column1, ',');
+          std::getline(ss, column2, ',');
+          dataMap[column1] = column2;
+      }
+
+      file.close();
+      return dataMap;
+  }
 
-  // Create map of longnames, can be added to as developers see fit.
-  std::map<std::string,std::string> name_2_standardname = {
-    {"p_mid"                                                       , "air_pressure"},
-    {"p_mid_at_cldtop"                                             , "air_pressure_at_cloud_top"},
-    {"T_2m"                                                        , "air_temperature"},
-    {"T_mid"                                                       , "air_temperature"},
-    {"T_mid_at_cldtop"                                             , "air_temperature_at_cloud_top"},
-    {"aero_g_sw"                                                   , "asymmetry_factor_of_ambient_aerosol_particles"},
-    {"pbl_height"                                                  , "atmosphere_boundary_layer_thickness"},
-    {"precip_liq_surf_mass"                                        , "atmosphere_mass_content_of_liquid_precipitation"},
-    {"cldlow"                                                      , "low_type_cloud_area_fraction"},
-    {"cldmed"                                                      , "medium_type_cloud_area_fraction"},
-    {"cldhgh"                                                      , "high_type_cloud_area_fraction"},
-    {"cldtot"                                                      , "cloud_area_fraction"},
-    {"cldfrac_tot_at_cldtop"                                       , "cloud_area_fraction"},
-    {"cldfrac_tot"                                                 , "cloud_area_fraction_in_atmosphere_layer"},
-    {"cldfrac_tot_for_analysis"                                    , "cloud_area_fraction_in_atmosphere_layer"},
-    {"cldfrac_rad"                                                 , "cloud_area_fraction_in_atmosphere_layer"},
-    {"qi"                                                          , "cloud_ice_mixing_ratio"},
-    {"qc"                                                          , "cloud_liquid_water_mixing_ratio"},
-    {"U"                                                           , "eastward_wind"},
-    {"eff_radius_qi"                                               , "effective_radius_of_cloud_ice_particles"},
-    {"eff_radius_qc"                                               , "effective_radius_of_cloud_liquid_water_particles"},
-    {"eff_radius_qc_at_cldtop"                                     , "effective_radius_of_cloud_liquid_water_particles_at_liquid_water_cloud_top"},
-    {"eff_radius_qr"                                               , "effective_radius_of_cloud_rain_particles"},
-    {"qv"                                                          , "humidity_mixing_ratio"},
-    {"cldfrac_ice_at_cldtop"                                       , "ice_cloud_area_fraction"},
-    {"cldfrac_ice"                                                 , "ice_cloud_area_fraction_in_atmosphere_layer"},
-    {"omega"                                                       , "lagrangian_tendency_of_air_pressure"},
-    {"landfrac"                                                    , "land_area_fraction"},
-    {"latitude"                                                    , "latitude"},
-    {"cldfrac_liq_at_cldtop"                                       , "liquid_water_cloud_area_fraction"},
-    {"cldfrac_liq"                                                 , "liquid_water_cloud_area_fraction_in_atmosphere_layer"},
-    {"longitude"                                                   , "longitude"},
-    {"rainfrac"                                                    , "mass_fraction_of_liquid_precipitation_in_air"},
-    {"V"                                                           , "northward_wind"},
-    {"nc"                                                          , "number_concentration_of_cloud_liquid_water_particles_in_air"},
-    {"cdnc_at_cldtop"                                              , "number_concentration_of_cloud_liquid_water_particles_in_air_at_liquid_water_cloud_top"},
-    {"ni"                                                          , "number_concentration_of_ice_crystals_in_air"},
-    {"aero_tau_sw"                                                 , "optical_thickness_of_atmosphere_layer_due_to_ambient_aerosol_particles"},
-    {"aero_tau_lw"                                                 , "optical_thickness_of_atmosphere_layer_due_to_ambient_aerosol_particles"},
-    {"aero_ssa_sw"                                                 , "single_scattering_albedo_in_air_due_to_ambient_aerosol_particles"},
-    {"sunlit"                                                      , "sunlit_binary_mask"},
-    {"ps"                                                          , "surface_air_pressure"},
-    {"LW_flux_dn_at_model_bot"                                     , "surface_downwelling_longwave_flux_in_air"},
-    {"SW_flux_dn_at_model_bot"                                     , "surface_downwelling_shortwave_flux_in_air"},
-    {"SW_clrsky_flux_dn_at_model_bot"                              , "surface_downwelling_shortwave_flux_in_air_assuming_clear_sky"},
-    {"phis"                                                        , "surface_geopotential"},
-    {"surf_radiative_T"                                            , "surface_temperature"},
-    {"surf_sens_flux"                                              , "surface_upward_sensible_heat_flux"},
-    {"SW_flux_dn_at_model_top"                                     , "toa_incoming_shortwave_flux"},
-    {"LW_flux_up_at_model_top"                                     , "toa_outgoing_longwave_flux"},
-    {"LW_clrsky_flux_up_at_model_top"                              , "toa_outgoing_longwave_flux_assuming_clear_sky"},
-    {"surf_evap"                                                   , "water_evapotranspiration_flux"},
-    {"AtmosphereDensity"                                           , "air_density"},
-    {"PotentialTemperature"                                        , "air_potential_temperature"},
-    {"SeaLevelPressure"                                            , "air_pressure_at_mean_sea_level"},
-    {"IceWaterPath"                                                , "atmosphere_mass_content_of_cloud_ice"},
-    {"LiqWaterPath"                                                , "atmosphere_mass_content_of_cloud_liquid_water"},
-    {"VapWaterPath"                                                , "atmosphere_mass_content_of_water_vapor"},
-    {"AerosolOpticalDepth550nm"                                    , "atmosphere_optical_thickness_due_to_ambient_aerosol_particles"},
-    {"Exner"                                                       , "dimensionless_exner_function"},
-    {"z_mid"                                                       , "geopotential_height"},
-    {"geopotential_mid"                                            , "geopotential_height"},
-    {"RelativeHumidity"                                            , "relative_humidity"},
-    {"surface_upward_latent_heat_flux"                             , "surface_upward_latent_heat_flux"},
-    {"LongwaveCloudForcing"                                        , "toa_longwave_cloud_radiative_effect"},
-    {"ShortwaveCloudForcing"                                       , "toa_shortwave_cloud_radiative_effect"},
-    {"VirtualTemperature"                                          , "virtual_temperature"},
-    {"VaporFlux"                                                   , "water_evapotranspiration_flux"},
-    {"wind_speed"                                                  , "wind_speed"}
-  };
-  
 };
 
 

@bartgol
Contributor

bartgol commented Jan 29, 2025

I am just a bit against having two fields (long_name and standard_name) which seem to often be the same. Redundant information can be confusing too (e.g., ppl may start using one thinking it's the other). Is there a compelling reason to have both instead of a single long (and standardized) name? What would break if we just kept one? Could we fix downstream tools to use only the one we keep (if we decided to do away with one or the other)?

I am all for moving the eamxx_name->standard_name into its own file outside of the source code.
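
One thing to watch when the mapping lives in a CSV file: the loader in the patch above splits each line on ',' and so would cut the second column at the first comma, which matters once long names become free-form text. A minimal Python sketch of the same header-skipping load using the standard csv module, which honors quoted fields. The function name `read_csv_to_map` and the T_2m long name are made up for illustration and are not taken from the repo.

```python
import csv
import io


def read_csv_to_map(text):
    """Parse two-column CSV text into a dict, skipping the header row.

    Hypothetical helper; rows with fewer than two columns (e.g. blank
    lines) are ignored.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the "variable,longname" header
    return {row[0]: row[1] for row in reader if len(row) >= 2}


# The T_2m entry is an invented example of a long name containing a comma.
sample = '''variable,longname
lev,hybrid level at midpoints (1000*(A+B))
T_2m,"Air temperature, 2 m above the surface"
'''

names = read_csv_to_map(sample)
print(names["lev"])
print(names["T_2m"])
```

A YAML file would avoid the quoting question entirely, at the cost of being slightly less spreadsheet-friendly.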

@mahf708
Contributor

mahf708 commented Jan 29, 2025

> I am just a bit against having two fields (long_name and standard_name) which seem to often be the same. Redundant information can be confusing too (e.g., ppl may start using one thinking it's the other). Is there a compelling reason to have both instead of a single long (and standardized) name? What would break if we just kept one? Could we fix downstream tools to use only the one we keep (if we decided to do away with one or the other)?
>
> I am all for moving the eamxx_name->standard_name into its own file outside of the source code.

This is outside my wheelhouse personally; so I defer to @rljacob to decide. In my understanding, the standard_name is the widely recognized one, but because it is standardized (see table below), it has all these underscores and stuff, which I think makes it less convenient for humans. My understanding (from Rob's comment above) is that the long name is for humans to use in plots and such, so you could do something like:

# some logic to determine what the var name is based on standard_name
# e.g., for x in ds.variables: find one matching T_mid and save it as _var

# plots at a single column somewhere (x is lev, y is quantity of interest)

ds[_var].isel(ncol=-1).plot()
plt.xlabel(f"{ds.lev.long_name}, {ds.lev.units}")
plt.ylabel(f"{ds[_var].long_name}, {ds[_var].units}")

# etc.

cf standard names: https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html
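
The "some logic" step in the snippet above can be sketched without xarray by treating each variable's attributes as a plain dictionary. This is only an illustration: the helper names are hypothetical, and the attribute values are copied from the ncdump output earlier in this thread.

```python
# Attribute dictionaries shaped like ds[var].attrs; values copied from the
# ncdump output and the tables earlier in this thread.
variables = {
    "T_mid": {"standard_name": "air_temperature", "long_name": "T_mid", "units": "K"},
    "qc": {"standard_name": "cloud_liquid_water_mixing_ratio", "long_name": "qc", "units": "kg/kg"},
    "lev": {"long_name": "hybrid level at midpoints (1000*(A+B))"},
}


def find_by_standard_name(variables, standard_name):
    """Return the names of all variables carrying the given standard_name."""
    return [
        name
        for name, attrs in variables.items()
        if attrs.get("standard_name") == standard_name
    ]


def axis_label(name, attrs):
    """Build a plot label from long_name, falling back to the variable name."""
    label = attrs.get("long_name", name)
    units = attrs.get("units")
    return f"{label}, {units}" if units else label


matches = find_by_standard_name(variables, "air_temperature")
print(matches)
print(axis_label("lev", variables["lev"]))
print(axis_label("qc", variables["qc"]))
```

With the current default, the label for qc is just "qc, kg/kg", which is the gap this issue is about.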

@AaronDonahue
Contributor

Sorry, late to the discussion, but to address the long name vs. standard name question: my understanding is that the standard name's main function is to be readable by post-processing tools. It follows a universal convention, which may at times not be very descriptive to a human. I think the long name is a place to put a single-sentence description that is a lot easier to understand.

We included the long name originally because of a request. I guess there is one E3SM post-processing tool that specifically looks for the long name of the hybrid coordinate system. Standard name came later to bring EAMxx into CF-Compliance. And for lack of a better option (and for expediency) the default was to assign long_name to match standard_name unless a specific long_name existed.

@AaronDonahue
Contributor

So ideally, moving forward we would invest the time (maybe Naser's time ;-) ) to establish different and useful long names for our variables.

Would it be reasonable to only include the long name attribute if it is specified? Meaning, if no specific long name exists we don't add that attribute to the variable metadata. That would address Luca's concern that we have netCDF metadata that is filled with redundant information.
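
A minimal Python sketch of that "only write it if it is specified" rule, assuming two lookup tables like the ones in the main post. The function name `name_attrs` is hypothetical and the tables are small excerpts.

```python
# Small excerpts of the two mappings from the main post.
STANDARD_NAMES = {"T_mid": "air_temperature", "qc": "cloud_liquid_water_mixing_ratio"}
LONG_NAMES = {"hyam": "hybrid A coefficient at layer midpoints"}


def name_attrs(name, standard_names=STANDARD_NAMES, long_names=LONG_NAMES):
    """Return only the naming attributes that are explicitly defined.

    Hypothetical helper: no entry in a table means the attribute is
    omitted, rather than filled with the variable name.
    """
    attrs = {}
    if name in standard_names:
        attrs["standard_name"] = standard_names[name]
    if name in long_names:
        attrs["long_name"] = long_names[name]
    return attrs


print(name_attrs("T_mid"))  # standard_name only
print(name_attrs("hyam"))   # long_name only
print(name_attrs("qm"))     # neither attribute is written
```

Under COARDS, readers that find no long_name fall back to the variable name anyway, so omitting the attribute should not lose information for tools that follow that convention.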

@AaronDonahue
Contributor

Re: @mahf708's suggestion for a YAML. I'm in favor of this from a simplicity perspective. It would be way easier to get domain scientists to improve a single YAML file rather than hunt down information in code. Naser also noted that a single YAML file could be used by both F90 and C++.

Didn't we have a long discussion about this already though with the OMEGA team? There was a push to adopt a library that stored python dictionaries of variables, or maybe YAML files. It seems like this is a conversation we occasionally touch upon but don't commit to a solution. Maybe we can discuss in the Dev Call and come to a consensus on what to do.

@mahf708
Contributor

mahf708 commented Jan 29, 2025

I would go for a CSV for even more simplicity. Imagine a situation where you just point a domain scientist to a csv file and tell them to update it and send it back to you... That would be slightly better than a YAML file. I am fine with either.

Yeah, let's discuss. I think a super general solution may not work well and we should start somewhere manageable first. We could begin with this experiment in EAMxx and see how it goes; if OMEGA team wants to follow suit or suggest an alternative, we could discuss. I do think they have their own sophisticated registry, even in the current f90 code, but I could be wrong.

@philipwjones
Contributor

MPAS-O used a registry (in XML) but Omega is not taking that route due to various issues related to auto-generated code that created opaque data structures where the relevant info got lost and you couldn't grep stuff. Omega is defining fields in code and keeps an internal list of available fields, though in the case of tracers, the set of field create calls is kept in a single included (.inc) file. Our field create interface has several required args for required metadata, including standard name (for CF-compliance), name (short name used internally) and long name for a slightly longer description. The standard names can be unwieldy and obscure so we do like to keep all these name options.

@rljacob
Member

rljacob commented Jan 29, 2025

You're talking about a file that is very EAMxx specific since it maps EAMxx variable names to metadata. Whatever format you want but I wouldn't worry about F90.

The constants dictionary discussion is still iterating on the YAML format and was interrupted by the holiday break.

@bartgol
Contributor

bartgol commented Jan 29, 2025

If the issue is just having a "short name for plotting", shouldn't the customer decide however they want to label the variables? We provide a standard name (which can be quite long), so that tools can recognize the variable. The long name seems to be the responsibility of the user, no? If they want certain names, they can set up some local maps stdname->my_name in their script, and use those... Why should it fall on the shoulders of the model to provide something that is "out of convenience of some users"? Drawing from the link pasted by Rob:

        "abs550aer": {
            "standard_name": "atmosphere_absorption_optical_thickness_due_to_ambient_aerosol_particles", 
            "long_name": "Ambient Aerosol Absorption Optical Thickness at 550nm", 
            ...

It's not like the 2nd is much more amenable to be used as a plot label. Most likely, the user will have to set up their own labels...

I will not say anything else, as this is probably my super-personal taste on the matter. I just find having to carry around two naming standards confusing and pointless. That said, we can accommodate it, and storing a csv file is just fine (I'd prefer a yaml, so we can already load it in via our yaml parsers, but a csv parser is just a 10-line fcn).

@chengzhuzhang
Contributor

chengzhuzhang commented Jan 29, 2025

> If the issue is just having a "short name for plotting", shouldn't the customer decide however they want to label the variables?

I think there is more to this than just having a name for plotting. My understanding is that the long_name or description provides a more precise meaning to help users understand the variable. I don't believe a standard name that follows CF conventions can work for all cases. A simple example from EAMxx is that both T_mid and T_2m have the CF standard name air_temperature, which is not adequate for new users to understand which temperature it refers to. I believe the description of the variable can be considered part of the model/data documentation.
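
The T_mid / T_2m case is not the only one. A short Python sketch that groups EAMxx names by standard name and reports those shared by more than one variable; the mapping is an excerpt of the table in the main post, and the function name is hypothetical.

```python
from collections import defaultdict

# Excerpt of the eamxx_name -> standard_name table from the main post.
STANDARD_NAMES = {
    "T_2m": "air_temperature",
    "T_mid": "air_temperature",
    "qc": "cloud_liquid_water_mixing_ratio",
    "z_mid": "geopotential_height",
    "geopotential_mid": "geopotential_height",
    "surf_evap": "water_evapotranspiration_flux",
    "VaporFlux": "water_evapotranspiration_flux",
}


def shared_standard_names(mapping):
    """Return {standard_name: [variables]} for names used more than once."""
    groups = defaultdict(list)
    for variable, standard_name in mapping.items():
        groups[standard_name].append(variable)
    return {std: names for std, names in groups.items() if len(names) > 1}


for std, names in shared_standard_names(STANDARD_NAMES).items():
    print(f"{std}: {', '.join(names)}")
```

Any script-side map keyed on standard_name alone cannot tell these variables apart, which is where a distinct long_name helps.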

@bartgol
Contributor

bartgol commented Jan 29, 2025

> A simple example from EAMxx is that both T_mid and T_2m have the CF standard name air_temperature

Wow, I did not realize that. So the CF standard does not provide any guidance on how to differentiate the name for X when sliced at different elevations? It seems like a shortcoming of the standard.

But I see your point.

@chengzhuzhang
Contributor

> So the CF standard does not provide any guidance on how to differentiate the name for X when sliced at different elevations?

No, I don't think so :(. The CF standard does not provide specific guidance on how to differentiate variable names when data is sliced at different elevations. I guess this is why inter-comparison programs, such as CMIP, need to develop augmented variable names when requesting data from models.
