Data override ext weights #1556

Merged: 27 commits merged on Aug 2, 2024
Commits (27):
6ec2ad4
Updated read_table_yaml subroutine to support new format for reading …
abrooks1085 May 14, 2024
f6ef704
Added logic to detect usage of external weights block
abrooks1085 May 14, 2024
cbab4db
Added ext_weights component to derived type data_type
abrooks1085 May 14, 2024
3909e0d
Merge branch 'NOAA-GFDL:main' into Data_Override_Ext_Weights
abrooks1085 May 17, 2024
ee989b8
Fixed typos
abrooks1085 May 17, 2024
462f0cd
Added check to see if subregion sub-block is present in yaml before a…
abrooks1085 May 21, 2024
143ae89
Updated test data_table.yaml to use the new yaml layout
abrooks1085 May 21, 2024
08abb80
Updated documentation for using new data_table.yaml format
abrooks1085 May 21, 2024
b7aaa0a
Fixed typos in documentation
abrooks1085 May 21, 2024
5d70e72
Remove tabs and trailing whitespace
abrooks1085 May 21, 2024
7718378
Merge branch 'NOAA-GFDL:main' into Data_Override_Ext_Weights
abrooks1085 May 22, 2024
f611226
clarified descriptions of variables in examples
abrooks1085 May 23, 2024
e703793
Merge branch 'Data_Override_Ext_Weights' of github.com:abrooks1085/FM…
abrooks1085 May 23, 2024
eed531c
Merge branch 'main' of github.com:NOAA-GFDL/FMS into Data_Override_Ex…
uramirez8707 Jul 19, 2024
9499e13
implement the reading of the weight files, needs a lot of clean up
uramirez8707 Jul 19, 2024
f555b40
major refactor/clean up
uramirez8707 Jul 19, 2024
5e4b9c1
more cleanup + documentation
uramirez8707 Jul 19, 2024
a340334
minor changes
uramirez8707 Jul 19, 2024
19af3d1
move the test, fix bug when checking the size of the weight files grid
uramirez8707 Jul 22, 2024
12a421a
set data_entry%ext_weights to false when using the legacy table
uramirez8707 Jul 22, 2024
85306b6
small updates to test, needed for gcc
uramirez8707 Jul 22, 2024
6560dcb
workaround for the gnu issue
uramirez8707 Jul 23, 2024
56f546d
remove the INPUT directory after finishing the tests
uramirez8707 Jul 23, 2024
585cacb
refactor the data_table yaml parsing
uramirez8707 Jul 30, 2024
d9fec99
Update the documentation, add examples
uramirez8707 Jul 30, 2024
ecd9d6b
Fix typo in documentation
uramirez8707 Aug 1, 2024
0f953a3
more minor documentation updates
uramirez8707 Aug 1, 2024
data_override/README.MD (150 changes: 104 additions & 46 deletions)

@@ -7,29 +7,29 @@
 - [How to use it?](README.MD#2-how-to-use-it)
 - [Converting legacy data_table to data_table.yaml](README.MD#3-converting-legacy-data_table-to-data_tableyaml)
 - [Examples](README.MD#4-examples)
+- [External Weight File Structure](README.MD#5-external-weight-file-structure)

 #### 1. YAML Data Table format:
 Each entry in the data_table has the following key values:
-- **gridname:** Name of the grid to interpolate the data to. The acceptable values are "ICE", "OCN", "ATM", and "LND"
-- **fieldname_code:** Name of the field to interpolate, as it is named in the code.
-- **fieldname_file:** Name of the field as it is written in the file. **Required** only if overriding from a file
-- **file_name:** Name of the file where the variable is located, including the directory. **Required** only if overriding from a file
-- **interpol_method:** Method used to interpolate the field. The acceptable values are "bilinear", "bicubic", and "none". "none" implies that the field in the file is already on the model grid. The LIMA format is no longer supported. **Required** only if overriding from a file
+- **grid_name:** Name of the grid to interpolate the data to. The acceptable values are "ICE", "OCN", "ATM", and "LND"
+- **fieldname_in_model:** Name of the field to interpolate, as it is named in the code.
+- **override_file:** The parent key containing the nested keys related to overriding data from file. **Required** only if overriding from a file
Contributor:

I'm concerned that the use of "parent" and "nested" keys here could be confused with the idea of parent and nested grids. Is "parent" and "nested" the standard YAML terminology, or can we define it as we want?

+  - **file_name:** Name of the file where the variable is located, including the directory
+  - **fieldname_in_file:** Name of the field as it is written in the file
+  - **interp_method:** Method used to interpolate the field. The acceptable values are "bilinear", "bicubic", and "none". "none" implies that the field in the file is already on the model grid. The LIMA format is no longer supported
+  - **multi_file:** The multi-file parent key. **Required** only if it is desired to use multiple (3) input netcdf files instead of 1. Note that **file_name** must be the second file in the set when using multiple input netcdf files
+    - **prev_file_name:** The name of the first file in the set
+    - **next_file_name:** The name of the third file in the set
+  - **external_weights:** The external weights parent key. **Required** only if it is desired to use external weights from file
Contributor:

I noticed the only supported method for external weights is bilinear. Is the interp_method used in conjunction with the external_weights to determine the method?

It seems we may be opening things up to undefined behavior by using the interp_method in conjunction with the external_weights. Is there a method or some other global attribute in the file we could use to compare against the chosen interp_method?

Contributor Author:

> Is there a method or some other global attribute in the file we could use to compare against the chosen interp_method

The only thing that we currently do is check that the weight file has the expected dimensions for the interp method. It could be made more rigorous, though.

+    - **file_name:** Name of the file where the external weights are located, including the directory
+    - **source:** Name of the source that generated the external weights. The only acceptable value is "fregrid"
 - **factor:** Factor by which the data is multiplied after it is interpolated
-
-If it is desired to interpolate the data to a region of the model grid, the following **optional** arguments are available.
-- **region_type:** The region type. The acceptable values are "inside_region" and "outside_region"
-- **lon_start:** The starting longitude in the same units as the grid data in the file
-- **lon_end:** The ending longitude in the same units as the grid data in the file
-- **lat_start:** The starting latitude in the same units as the grid data in the file
-- **lat_end:** The ending latitude in the same units as the grid data in the file
-
-If it is desired to use multiple (3) input netcdf files instead of 1, the following **optional** keys are available.
-- **is_multi_file:** Set to `True` if using the multi-file feature
-- **prev_file_name:** The name of the first file in the set
-- **next_file_name:** The name of the third file in the set
-
-Note that **file_name** must be the second file in the set. **prev_file_name** and/or **next_file_name** are required if **is_multi_file** is set to `True`
+- **subregion:** The subregion parent key. **Required** only if it is desired to interpolate the data to a region of the model grid
+  - **type:** The region type. The acceptable values are "inside_region" and "outside_region"
+  - **lon_start:** The starting longitude in the same units as the grid data in the file
+  - **lon_end:** The ending longitude in the same units as the grid data in the file
+  - **lat_start:** The starting latitude in the same units as the grid data in the file
+  - **lat_end:** The ending latitude in the same units as the grid data in the file
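The `subregion` keys above describe a longitude/latitude box. The following is a rough Python sketch (not FMS code) of how such a box could be turned into a per-point mask. It assumes that `inside_region` applies the override only to model points inside the box and `outside_region` only to points outside it, that the bounds are inclusive, and that the box does not cross the longitude seam. `subregion_mask` is a hypothetical name.

```python
def subregion_mask(lons, lats, region_type, lon_start, lon_end, lat_start, lat_end):
    """Return one boolean per model point: True where the override would apply.

    Hypothetical helper; assumes inclusive bounds and a box that does not
    cross the longitude seam.
    """
    if region_type not in ("inside_region", "outside_region"):
        raise ValueError(f"unknown region type: {region_type!r}")
    mask = []
    for lon, lat in zip(lons, lats):
        inside = lon_start <= lon <= lon_end and lat_start <= lat <= lat_end
        # inside_region keeps the points in the box; outside_region keeps the rest
        mask.append(inside if region_type == "inside_region" else not inside)
    return mask


# Three model points; only the first lies in the 0-90E, 0-45N box.
print(subregion_mask([10.0, 120.0, 50.0], [20.0, 20.0, 60.0],
                     "inside_region", 0.0, 90.0, 0.0, 45.0))  # [True, False, False]
```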

 #### 2. How to use it?
 In order to use the yaml data format, [libyaml](https://github.com/yaml/libyaml) needs to be installed and linked with FMS. Additionally, FMS must be compiled with the -Duse_yaml macro. If using autotools, you can add `--with-yaml`, which will add the macro for you and check that libyaml is linked correctly.
@@ -55,21 +55,22 @@ In the **legacy format**, the data_table will look like:
 In the **yaml format**, the data_table will look like:
 ```
 data_table:
-  - gridname : ICE
-    fieldname_code : sic_obs
-    fieldname_file : sic
-    file_name : INPUT/hadisst_ice.data.nc
-    interpol_method : bilinear
-    factor : 0.01
+  - grid_name : ICE
+    fieldname_in_model : sic_obs
+    override_file:
+    - file_name : INPUT/hadisst_ice.data.nc
+      fieldname_in_file : sic
+      interp_method : bilinear
+    factor : 0.01
 ```
 Which corresponds to the following model code:
 ```F90
 call data_override('ICE', 'sic_obs', icec, Spec_Time)
 ```
 where:
-- `ICE` corresponds to the gridname in the data_table
-- `sic_obs` corresponds to the fieldname_code in the data_table
-- `icec` is the variable to write the data to
+- `ICE` is the component domain for which the variable is being interpolated and corresponds to the grid_name in the data_table
+- `sic_obs` corresponds to the fieldname_in_model in the data_table
+- `icec` is the storage array that holds the interpolated data
 - `Spec_Time` is the time to interpolate the data to.

 Additionally, it is required to call data_override_init (in this case with the ICE domain). The grid_spec.nc file must also contain the coordinate information for the domain being used.
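The old and new layouts carry the same information. The keys are renamed, and the file-related keys move under `override_file`. The following is a minimal Python sketch of that mapping for a single legacy row. It assumes the legacy columns follow the order of the old keys (gridname, fieldname_code, fieldname_file, file_name, interpol_method, factor). It also assumes that an empty fieldname_file means no file is read, as in the `sit_obs` example, where the YAML entry has no `override_file`. `legacy_row_to_entry` is a hypothetical illustration, not a supported conversion tool.

```python
import csv


def legacy_row_to_entry(line):
    """Map one legacy data_table row onto the nested YAML layout (hypothetical helper)."""
    # Legacy columns, in order: gridname, fieldname_code, fieldname_file,
    # file_name, interpol_method, factor. skipinitialspace handles ", " separators.
    grid, name_in_model, name_in_file, file_name, interp, factor = next(
        csv.reader([line], skipinitialspace=True)
    )
    entry = {"grid_name": grid, "fieldname_in_model": name_in_model}
    # An empty fieldname_file is treated as "no file to read", so
    # override_file is only emitted when a field name is given.
    if name_in_file:
        entry["override_file"] = [{
            "file_name": file_name,
            "fieldname_in_file": name_in_file,
            "interp_method": interp,
        }]
    entry["factor"] = float(factor)
    return entry


row = '"ICE", "sic_obs", "sic", "INPUT/hadisst_ice.data.nc", "bilinear", 0.01'
print(legacy_row_to_entry(row))
```

The resulting dict mirrors the nesting of the YAML entry above, so it could be dumped with any YAML writer.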
@@ -82,25 +83,25 @@ call data_override_init(Ice_domain_in=Ice_domain)

 In the **legacy format**, the data_table will look like:
 ```
-"ICE", "sit_obs", "", "INPUT/hadisst_ice.data.nc", "none", 2.0
+"ICE", "sit_obs", "", "INPUT/hadisst_ice.data.nc", "none", 2.0
 ```

 In the **yaml format**, the data_table will look like:
-```
+``` yaml
 data_table:
-  - gridname : ICE
-    fieldname_code : sit_obs
-    factor : 0.01
+  - grid_name : ICE
+    fieldname_in_model : sit_obs
+    factor : 0.01
 ```

 Which corresponds to the following model code:
 ```F90
 call data_override('ICE', 'sit_obs', icec, Spec_Time)
 ```
 where:
-- `ICE` corresponds to the gridname in the data_table
-- `sit_obs` corresponds to the fieldname_code in the data_table
-- `icec` is the variable to write the data to
+- `ICE` is the component domain for which the variable is being interpolated and corresponds to the grid_name in the data_table
+- `sit_obs` corresponds to the fieldname_in_model in the data_table
+- `icec` is the storage array that holds the interpolated data
 - `Spec_Time` is the time to interpolate the data to.

 Additionally, it is required to call data_override_init (in this case with the ICE domain). The grid_spec.nc file is still required to initialize data_override with the ICE domain.
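The `sit_obs` entry has no `override_file` key, so no data file is read. The following is a minimal Python sketch of how an entry could resolve to values. It assumes that, without `override_file`, the field is set to the constant `factor`, and that, with it, the interpolated data is multiplied by `factor` as the key list describes. `resolve_override` is a hypothetical name, and interpolation itself is left out.

```python
def resolve_override(entry, interpolated=None):
    """Return the override value(s) for one data_table entry (hypothetical helper)."""
    factor = entry["factor"]
    if "override_file" not in entry:
        # No file to read: assume the field is set to the constant `factor`.
        return factor
    if interpolated is None:
        raise ValueError("this entry reads from a file; interpolated data is required")
    # Otherwise the interpolated values are multiplied by `factor`.
    return [value * factor for value in interpolated]


constant_entry = {"grid_name": "ICE", "fieldname_in_model": "sit_obs", "factor": 0.01}
print(resolve_override(constant_entry))
```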
@@ -117,28 +118,85 @@ In the **legacy format**, the data_table will look like:
 ```

 In the **yaml format**, the data_table will look like:
-```
+``` yaml
 data_table:
-  - gridname : OCN
-    fieldname_code : runoff
-    fieldname_file : runoff
-    file_name : INPUT/runoff.daitren.clim.nc
-    interpol_method : none
-    factor : 1.0
+  - grid_name : OCN
+    fieldname_in_model : runoff
+    override_file:
+    - file_name : INPUT/runoff.daitren.clim.nc
+      fieldname_in_file : runoff
+      interp_method : none
+    factor : 1.0
 ```

 Which corresponds to the following model code:
 ```F90
 call data_override('OCN', 'runoff', runoff_data, Spec_Time)
 ```
 where:
-- `OCN` corresponds to the gridname in the data_table
-- `runoff` corresponds to the fieldname_code in the data_table
-- `runoff_data` is the variable to write the data to
+- `OCN` is the component domain for which the variable is being interpolated and corresponds to the grid_name in the data_table
+- `runoff` corresponds to the fieldname_in_model in the data_table
+- `runoff_data` is the storage array that holds the interpolated data
 - `Spec_Time` is the time to interpolate the data to.

 Additionally, it is required to call data_override_init (in this case with the ocean domain). The grid_spec.nc file is still required to initialize data_override with the ocean domain and to determine if the data in the file is on the same grid as the ocean.

 ```F90
 call data_override_init(Ocn_domain_in=Ocn_domain)
 ```

**4.4** The following example uses the multi-file capability
``` yaml
data_table:
- grid_name : ICE
fieldname_in_model : sic_obs
override_file:
- file_name : INPUT/hadisst_ice.data_yr1.nc
fieldname_in_file : sic
interp_method : bilinear
multi_file:
- next_file_name: INPUT/hadisst_ice.data_yr2.nc
prev_file_name: INPUT/hadisst_ice.data_yr0.nc
factor : 0.01
```
Data override determines which file to use depending on the model time. This is to prevent having to combine the 3 yearly files into one, since the end of the previous file and the beginning of the next file are needed for yearly simulations.
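One way to picture the multi-file lookup is to treat the three files' time axes as one increasing sequence. You then find the two records that bracket the model time, so a time early in year 1 pairs the last record of the previous file with the first record of the current one. The following is a hedged Python sketch of that idea. `bracketing_records`, the plain-number times (days), and the sample record times are assumptions for illustration; FMS uses its own time types and logic.

```python
from bisect import bisect_right


def bracketing_records(files, model_time):
    """Find the two records that surround model_time across prev/current/next files.

    Hypothetical helper. `files` is a list of (file_name, record_times) in
    prev, current, next order; times are plain numbers that increase
    across the whole sequence.
    """
    records = [(t, name, i) for name, times in files for i, t in enumerate(times)]
    all_times = [t for t, _, _ in records]
    k = bisect_right(all_times, model_time)  # first record strictly after model_time
    if k == 0 or k == len(records):
        raise ValueError("model time is outside the span of the three files")
    (_, name_lo, i_lo), (_, name_hi, i_hi) = records[k - 1], records[k]
    return (name_lo, i_lo), (name_hi, i_hi)


files = [
    ("INPUT/hadisst_ice.data_yr0.nc", [-45.5, -15.5]),       # last records of year 0
    ("INPUT/hadisst_ice.data_yr1.nc", [15.5, 45.0, 349.5]),  # year 1 (abridged)
    ("INPUT/hadisst_ice.data_yr2.nc", [380.5, 410.0]),       # first records of year 2
]
print(bracketing_records(files, 5.0))    # pairs year 0 with year 1
print(bracketing_records(files, 360.0))  # pairs year 1 with year 2
```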

+**4.5** The following example uses the external weight file capability
+``` yaml
+data_table:
+  - grid_name : ICE
+    fieldname_in_model : sic_obs
+    override_file:
+    - file_name : INPUT/hadisst_ice.data.nc
+      fieldname_in_file : sic
+      interp_method : bilinear
+      external_weights:
+      - file_name: INPUT/remamp_file.nc
+        source: fregrid
+    factor : 0.01
+```
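An external-weights entry like this one only makes sense under the constraints stated in this README and the review thread. `source` must be "fregrid", and, per the reviewer's note, bilinear is the only interpolation method supported with external weights. The following is a minimal Python sketch of a pre-flight check on the parsed entry. It assumes the YAML has already been loaded into nested dicts and lists. `check_external_weights` is hypothetical.

```python
def check_external_weights(entry):
    """Validate the external_weights block of one parsed data_table entry.

    Hypothetical helper. Returns the weight-file name if the block is usable.
    """
    override = entry["override_file"][0]
    blocks = override.get("external_weights")
    if not blocks:
        raise KeyError("entry has no external_weights block")
    block = blocks[0]
    # Only fregrid-generated weights are accepted.
    if block.get("source") != "fregrid":
        raise ValueError(f"unsupported weight source: {block.get('source')!r}")
    # Weights are tied to an interpolation scheme; only bilinear is supported here.
    if override.get("interp_method") != "bilinear":
        raise ValueError("external weights require interp_method: bilinear")
    if not block.get("file_name"):
        raise ValueError("external_weights needs a file_name")
    return block["file_name"]


entry = {
    "grid_name": "ICE",
    "fieldname_in_model": "sic_obs",
    "override_file": [{
        "file_name": "INPUT/hadisst_ice.data.nc",
        "fieldname_in_file": "sic",
        "interp_method": "bilinear",
        "external_weights": [{"file_name": "INPUT/remamp_file.nc", "source": "fregrid"}],
    }],
    "factor": 0.01,
}
print(check_external_weights(entry))
```

A check like this covers only the table entry; the weight file's own dimensions still have to be compared against the model grid, as described in the review thread.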

+#### 5. External Weight File Structure
+
+**5.1** Bilinear weight file example from fregrid
+
+```
+dimensions:
+   nlon = 5 ;
+   nlat = 6 ;
+   three = 3 ;
+   four = 4 ;
+variables:
+   int index(three, nlat, nlon) ;
+   double weight(four, nlat, nlon) ;
+```
+- `nlon` and `nlat` must be equal to the size of the global domain.
+- `index(:,:,1)` corresponds to the index (i) of the longitude point in the data file closest to each model lon, lat
+- `index(:,:,2)` corresponds to the index (j) of the latitude point in the data file closest to each model lon, lat
+- `index(:,:,3)` corresponds to the tile (it should be 1 since data_override does not support interpolation **from** cubed-sphere grids)
+- From there, the four corners are (i,j), (i,j+1), (i+1,j), and (i+1,j+1)
+- The weights for the four corners are:
+  - weight(:,:,1) -> (i,j)
+  - weight(:,:,2) -> (i,j+1)
+  - weight(:,:,3) -> (i+1,j)
+  - weight(:,:,4) -> (i+1,j+1)
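Following the structure above, each model point is a weighted sum of four input values around the (i, j) corner. The following is a minimal pure-Python sketch of applying such a file once its arrays are loaded, together with the dimension check mentioned in the review thread. The helper names, the nested-list layout, and storing the Fortran-style (x, y, k) arrays as `weight[k][y][x]` are assumptions for illustration, not how FMS holds them.

```python
# (di, dj) offsets of corners 1..4: (i,j), (i,j+1), (i+1,j), (i+1,j+1)
CORNERS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def check_weight_dims(dims, nlon_global, nlat_global):
    """Check a bilinear weight file's dimension sizes (hypothetical helper).

    `dims` maps dimension names to sizes; nlon/nlat must match the global domain.
    """
    if dims.get("three") != 3 or dims.get("four") != 4:
        raise ValueError("bilinear weight files need dimensions three=3 and four=4")
    if (dims.get("nlon"), dims.get("nlat")) != (nlon_global, nlat_global):
        raise ValueError("weight file grid does not match the global domain")


def apply_bilinear_weights(src, index_i, index_j, weight):
    """Interpolate src onto the model grid using precomputed corner weights.

    src[j][i] is the input field (0-based); index_i/index_j are [nlat][nlon]
    lists of 1-based (i, j) corner indices; weight[k][y][x] is the weight of
    corner k+1 at model point (x, y). The tile index is ignored (always 1).
    """
    nlat, nlon = len(index_i), len(index_i[0])
    out = [[0.0] * nlon for _ in range(nlat)]
    for y in range(nlat):
        for x in range(nlon):
            i = index_i[y][x] - 1  # 1-based file index -> 0-based list index
            j = index_j[y][x] - 1
            out[y][x] = sum(weight[k][y][x] * src[j + dj][i + di]
                            for k, (di, dj) in enumerate(CORNERS))
    return out


# A 1 x 2 model grid drawing on a 2 x 2 input grid.
check_weight_dims({"nlon": 2, "nlat": 1, "three": 3, "four": 4}, 2, 1)
src = [[0.0, 1.0], [2.0, 3.0]]  # indexed src[j][i]
w = [[[0.25, 1.0]], [[0.25, 0.0]], [[0.25, 0.0]], [[0.25, 0.0]]]
print(apply_bilinear_weights(src, [[1, 1]], [[1, 1]], w))  # [[1.5, 0.0]]
```

The first model point averages all four corners equally. The second point puts all its weight on corner (i,j).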