diff --git a/.gitmodules b/.gitmodules deleted file mode 100644 index e69de29..0000000 diff --git a/README.md b/README.md index 79d8928..8644d3c 100644 --- a/README.md +++ b/README.md @@ -1,87 +1,302 @@ -# Instructions to postprocess FMS history output on PP/AN or gaea + +This repository holds code for defining tasks, applications, tools, workflows, and other aspects of the FRE2 postprocessing +ecosystem. -1. Checkout postprocessing workflow template -This will clone the postprocessing repository into `/home/$USER/cylc-src/EXPNAME__PLATFORM__TARGET`. + + +# Instructions to postprocess FMS history files on GFDL's PP/AN +These instructions are targeted at workflow developers. If you are a user simply looking to run a specific workflow, see +[`fre-cli`](https://github.com/NOAA-GFDL/fre-cli). + + + + +## 1. Clone `fre-workflows` repository ``` -module load fre/canopy -fre pp checkout -e EXPNAME -p PLATFORM -t TARGET +git clone https://github.com/NOAA-GFDL/fre-workflows.git +cd fre-workflows ``` +**Do not clone to a temporary directory** - the directory must be readable by slurm from all nodes. Directories on local +`/vftmp` are not, while those on `/home`, `/work`, and `/xtmp` are. -2. Configure pp template with either XML or pp.yaml + + +## 2. Load Cylc, the backend workflow engine used by FRE2 +[`cylc`](https://cylc.github.io/cylc-doc/stable/html/) lets us parse workflow template files (`*.cylc`) and their +configurations into modular, interdependent batch jobs. Tools used by those jobs (e.g. `fre-nctools` or `xarray`) should +be loaded by those jobs as part of their requirements and do not need to be loaded at this time unless desired. + +To run this repository's code, we need `cylc` accessible and somewhere in our `PATH` environment variable. One way is to activate +a conda environment with `cylc-flow`, `cylc-rose`, and `metomi-rose` installed. 
At GFDL's PP/AN, it is sufficient to do ``` -fre pp configure-xml -e EXPNAME -p PLATFORM -t TARGET -x XML -``` -or +module load cylc ``` -fre pp configure-yaml -e EXPNAME -p PLATFORM -t TARGET -y YAML -``` -3. (OPTIONAL BUT RECCOMENDED) Create `history-manifest` for config validation + +## 3. Fill in rose-suite configuration fields +With your favorite text editor, open up `rose-suite.conf` and set variables to desired values. These values will be passed to +task definitions within `flow.cylc` and taken as configuration settings to instruct task execution. + +Key values include: +- `SITE` set to "ppan" to submit jobs to PP/AN cluster +- `HISTORY_DIR` directory path to your raw model output +- `HISTORY_SEGMENT` amount of time covered by a single history file (ISO8601 datetime) +- `PP_CHUNK_A` amount of time covered by a single postprocessed file (ISO8601 datetime) +- `PP_CHUNK_B` secondary chunk size for postprocessed files, if desired (ISO8601 datetime) +- `PP_START` start of the desired postprocessing (ISO8601 datetime) +- `PP_STOP` end of the desired postprocessing (ISO8601 datetime) +- `PP_COMPONENTS` space-separated list of user-defined components, discussed in more detail below +- `PP_GRID_SPEC` path to FMS grid definition tarfile +- `PP_DEFAULT_XYINTERP` default target resolution for regridding, if the resolution is not specified elsewhere +- `FRE_ANALYSIS_HOME` For locating shared analysis scripts (only define if `DO_ANALYSIS=True`) -Create a `history-manifest` of a single tar file archive first for use in the validation. -This list represents the available source files within the history tar archives, and enables the -validation procedure to catch a wider variety of potential errors. This can be done like so- +It is common to not know exactly what to set `PP_COMPONENTS` to when configuring a new workflow from scratch. Later +steps in this guide can help inform how to adjust these settings. 
The Rose configuration file format is described in full +[elsewhere](https://metomi.github.io/rose/doc/html/api/configuration/rose-configuration-format.html). + +If one is looking to hit the ground running at GFDL's PP/AN, copy/paste the code block here into your `rose-suite.conf`, replacing +`YOUR.USERNAME` with your own where applicable: ``` -tar -tf /archive/$USER/path/to/history/files/YYYYMMDD.nc.tar | grep -v tile[2-6] | sort > /home/$USER/cylc-src/EXPNAME__PLATFORM__TARGET/history-manifest +SITE="ppan" +EXPERIMENT='FOO' +PLATFORM='BAR' +TARGET='BAZ' + +DO_STATICS=True +DO_TIMEAVGS=True +DO_ATMOS_PLEVEL_MASKING=True +DO_MDTF=False + +DO_REFINEDIAG=False +REFINEDIAG_SCRIPTS="\$CYLC_WORKFLOW_RUN_DIR/etc/refineDiag/refineDiag_atmos_cmip6.csh" + +DO_PREANALYSIS=False +PREANALYSIS_SCRIPT="\$CYLC_WORKFLOW_RUN_DIR/etc/refineDiag/refineDiag_data_stager_globalAve.csh" + +DO_ANALYSIS=False +DO_ANALYSIS_ONLY=False +FRE_ANALYSIS_HOME="/home/fms/local/opt/fre-analysis/test" +ANALYSIS_DIR='/nbhome/YOUR.USERNAME/fre/FMS2023.04_om5_20240410/ESM4.2JpiC_om5b04r1' + +CLEAN_WORK=True + +PTMP_DIR='/xtmp/$USER/ptmp' + +HISTORY_DIR='/archive/Ian.Laflotte/fre/FMS2023.04_om5_20240410/ESM4.2JpiC_om5b04r1/gfdl.ncrc5-intel23-prod-openmp/history' +HISTORY_SEGMENT='P1Y' + +PP_DIR='/archive/YOUR.USERNAME/fre/FMS2023.04_om5_20240410/ESM4.2JpiC_om5b04r1/gfdl.ncrc5-intel23-prod-openmp/pp' +PP_CHUNK_A='P2Y' +PP_COMPONENTS='atmos atmos_scalar land land_static' +PP_START="00010101" +PP_STOP="00020101" +PP_DEFAULT_XYINTERP="360,180" +PP_GRID_SPEC='/work/Ian.Laflotte/mosaic_generation/exchange_grid_toolset/workdir/mosaic_c96om5b04v20240410.20240423.an105/mosaic_c96om5b04v20240410.20240423.an105.tar' ``` -4. Validate the configuration + + + +## 4. 
Create history file manifest (optional but highly recommended) +For more complete validation of workflow settings, we create a manifest for our history file archives with ``` -fre pp validate -e EXPNAME -p PLATFORM -t TARGET +tar -tf /path/to/history/YYYYMMDD.nc.tar | grep -v "tile[2-6]" | sort > history-manifest ``` -Warnings related to directories are probably valid and should be fixed in `rose-suite.conf`, or created as necessary via `mkdir`. +The `history-manifest` contains a list of source files contained within the targeted history files. This can be +helpful for validating settings on a component-by-component basis in the next step(s). + -If you are running postprocessing gaea, you'll need to change the `SITE` variable in `rose-suite.conf` from `ppan` to `gaea`. -5. Install the workflow + +## 5. Define your desired postprocessing components for `remap-pp-components` +Users define their own postprocessing components for their workflow, which represent a group of source files to be postprocessed +together. This grouping is typically united by a common gridding, which may be the current "native" gridding of the source files, +or a desired target gridding to achieve via regridding. If `history-manifest` was created, it will be used to check that the +source files specified in the components are actually present in the history files. +User-defined components are configured within `app/remap-pp-components/rose-app.conf`. An example set of components could be: ``` -fre pp install -e EXPNAME -p PLATFORM -t TARGET +[command] +default=remap-pp-components + +[atmos] +sources=atmos_month +grid=regrid-xy/default + +[atmos_scalar] +sources=atmos_scalar atmos_global_cmip +grid=native + +[land] +sources=land_month_cmip +grid=regrid-xy/288_180.conserve_order1 + +[land_static] +sources=land_static +grid=regrid-xy/288_180.conserve_order1 +freq=P0Y ``` -If you are attempting this on gaea, you'll need to make two one-time changes before installing. 
-- Currently, `cylc`, `rose`, and `isodatetime` must be in your PATH for new shells. One approach to do this is -to symlink the fms-user-installed fre-cli cylc/rose/isodatetime scripts into your local `~/bin` directory, -and then add that `~/bin` directory to your PATH in your `.bashrc` or `.cshrc`. (If you don't do this, Cylc tasks -will fail complaining those 3 tools are not available.) +Here we've defined the `atmos` component as being a set of one source file, `atmos_month`. The `grid` field shows we wish to +have this source file regridded to the default resolution specified in `rose-suite.conf`. By contrast, the `atmos_scalar` +component contains two source files, and specifies a `native` grid. This indicates that `atmos_scalar` and `atmos_global_cmip` +source files will not be regridded when processing them for the `atmos_scalar` component. +Note: it is not uncommon for a specific component to be named after a source file contained in its `sources` field, but it +does not imply anything special about the relationship between the source file and the component. + +The third component is `land`, and will be regridded to a resolution corresponding to a 180x288 lat/lon grid, using an +interpolation scheme which is conservative to a first-order approximation. The last is the `land_static` component, and will +be similarly handled to `land`. Since `land_static` is time-independent, it will require `freq=P0Y`, as the name is not used +to determine if a component involves static data. Statics will only be processed if `DO_STATICS=True` in `rose-suite.conf`. + +The setting for `PP_COMPONENTS` should reflect information in `app/remap-pp-components/rose-app.conf`. From our example, a +good list that passes validation would be `PP_COMPONENTS=atmos atmos_scalar land land_static`. + + + +## 6. 
Provide more specifics for `regrid-xy` +Any component specified in `app/remap-pp-components/rose-app.conf` requesting regridding requires a corresponding entry in +`app/regrid-xy/rose-app.conf`, providing further information. A full set of options one can specify in this configuration +can be found in `app/regrid-xy/README.md`. + +Following up on our example in the previous step, we would not have an entry in `app/regrid-xy/rose-app.conf` for +`atmos_scalar`, but we will for `atmos`, `land` and `land_static` components: ``` -cd ~/bin -ln -s /ncrc/home2/Flexible.Modeling.System/conda/envs/fre-cli/bin/{cylc,rose,isodatetime} . -echo 'setenv PATH ${PATH}:~/bin' >> ~/.cshrc +[command] +default=regrid-xy + +[atmos] +inputGrid=cubedsphere +inputRealm=atmos +interpMethod=conserve_order2 +outputGridLat=180 +outputGridLon=288 +outputGridType=default +sources=atmos_month + +[land] +inputGrid=cubedsphere +inputRealm=land +interpMethod=conserve_order1 +outputGridLat=180 +outputGridLon=288 +outputGridType=288_180.conserve_order1 +sources=land_month_cmip + +[land_static] +inputGrid=cubedsphere +inputRealm=land +interpMethod=conserve_order1 +outputGridLat=180 +outputGridLon=288 +outputGridType=288_180.conserve_order1 +sources=land_static ``` -- Currently, the cylc available on gaea (through `module load cylc` or the `PATH` trick above) does not -include any global configuration, so you'll need to create a file `~/.cylc/flow/global.cylc` that contains the following. -If you don't do this, Cylc will use your home directory for the scratch space and rapidly fill your quota.) +Note that the `atmos_scalar` component does not have an entry here, as we requested a `native` grid for source files in +that component. 
Full documentation on the available input configuration fields is available in the +[`app/regrid-xy` directory](https://github.com/NOAA-GFDL/fre-workflows/tree/update.README/app/regrid-xy), but some things worth +noting above: +- `inputGrid` can be `cubedsphere` or `tripolar`. +- `inputRealm` attribute is used for identifying the `land`, `atmos`, or `ocean` grid mosaic file. +- The `interpMethod` should be `conserve_order1`, `conserve_order2`, or `bilinear`. +- `outputGridType` is the grid label referenced in the `app/remap-pp-components/rose-app.conf` file. +- `outputGridLat` and `outputGridLon` identify the target grid if `outputGridType` is not specified + + + + +## 7. Validate your workflow configuration +Rose can validate the configuration by checking the field values against a list of rules defined by the developers of this +repository. It's crucial to note that while this list of rules is determined by the requirements of th + +One can wait until this step in this guide, or validate as they go along at any point in the previous instructions ``` -[install] - [[symlink dirs]] - [[[localhost]]] - run = /gpfs/f5/scratch/gfdl_f/$USER +rose macro --validate ``` +Common errors include non-existent directories and time intervals that are not ISO8601 datetimes. It is recommended to address +any/all complaints. If `history-manifest` exists, `rose macro --validate` will report on source files referenced by components +that are not present in the history tar file archives. Whether a missing file is a show-stopper or a toothless complaint is +at the discretion of the user. If a source file is missing, consider reconfiguring the component definition(s), removing the +source file from the component, or simply removing the component altogether. + + + + -6. Run the workflow + + + +## 8. Validate/Install/Run the configured workflow templates with `cylc` +Validate the workflow with `cylc` by entering: ``` -fre pp run -e EXPNAME -p PLATFORM -t TARGET +cylc validate . 
``` +If the Cylc validation fails but the Rose validation passes, please raise an issue on this repository, as it is better to +catch configuration issues at the `rose macro --validate` step, and the validation rules can be updated to match the task +definition requirements. -7. Report status of workflow progress +We install the workflow with: +``` +cylc install . +``` +This creates a workflow directory in `~/cylc-run`. +After successful installation, the workflow is launched with: ``` -fre pp status -e EXPNAME -p PLATFORM -t TARGET +cylc play fre-workflows/run1 ``` +If on PP/AN, cylc launches a scheduler daemon on a `workflow1` server, via `ssh`, triggering the login banner to be printed. +This daemon submits and runs jobs based on the task dependencies defined in `flow.cylc`. + + + + -8. Launch GUI + +## 9. Inspect workflow progress with an interface (GUI or TUI) +The workflow will run and shut down when all tasks are complete. If tasks fail, the workflow may stall, in which case +it will shut down in error after a period of time. + +`cylc` has two workflow viewing interfaces (full GUI and text UI), and a variety of CLI commands that can expose workflow +and task information. The text-based UI can be launched via: +``` +cylc tui fre-workflows/run1 ``` -TODO: fre pp gui? The full GUI can be launched on jhan or jhanbigmem (an107 or an201). - +``` cylc gui --ip=`hostname -f` --port=`jhp 1` --no-browser ``` +Then, navigate to one of the two links printed to screen in your web browser. If one just wants a quick look at the state of +their workflow, the user-interfaces can be completely avoided by using the `workflow-state` command, two examples of which are: +``` +cylc workflow-state -v fre-workflows/run1 # show all jobs +cylc workflow-state -v fre-workflows/run1 | grep failed # show only failed ones +``` + + + + +## 10. Inspect workflow progress with a terminal CLI +Various other `cylc` commands are useful for inspecting a running workflow. 
Try `cylc help` and `cylc --help` for +more information on how to use these tools to your advantage! + +- `cylc scan` Lists running workflows +- `cylc cat-log fre-workflows/run1` Shows the scheduler log +- `cylc list` Lists all tasks +- `cylc report-timings` + + + + + + diff --git a/README_using_fre-cli.md b/README_using_fre-cli.md new file mode 100644 index 0000000..0867772 --- /dev/null +++ b/README_using_fre-cli.md @@ -0,0 +1,89 @@ +note these instructions will be/are from https://github.com/NOAA-GFDL/fre-cli/tree/main/fre/pp + +# Instructions to postprocess FMS history output on PP/AN or gaea with fre-cli + +1. Checkout postprocessing workflow template +This will clone the postprocessing repository into `/home/$USER/cylc-src/EXPNAME__PLATFORM__TARGET`. +``` +module load fre/canopy +fre pp checkout -e EXPNAME -p PLATFORM -t TARGET +``` + +2. Configure pp template with either XML or pp.yaml + +``` +fre pp configure-xml -e EXPNAME -p PLATFORM -t TARGET -x XML +``` +or +``` +fre pp configure-yaml -e EXPNAME -p PLATFORM -t TARGET -y YAML + +``` + +3. (OPTIONAL BUT RECOMMENDED) Create `history-manifest` for config validation + +Create a `history-manifest` of a single tar file archive first for use in the validation. +This list represents the available source files within the history tar archives, and enables the +validation procedure to catch a wider variety of potential errors. This can be done like so: +``` +tar -tf /archive/$USER/path/to/history/files/YYYYMMDD.nc.tar | grep -v "tile[2-6]" | sort > /home/$USER/cylc-src/EXPNAME__PLATFORM__TARGET/history-manifest +``` + +4. Validate the configuration +``` +fre pp validate -e EXPNAME -p PLATFORM -t TARGET +``` + +Warnings related to directories are probably valid and should be fixed in `rose-suite.conf`, or created as necessary via `mkdir`. + +If you are running postprocessing on gaea, you'll need to change the `SITE` variable in `rose-suite.conf` from `ppan` to `gaea`. + +5. 
Install the workflow + +``` +fre pp install -e EXPNAME -p PLATFORM -t TARGET +``` + +If you are attempting this on gaea, you'll need to make two one-time changes before installing. +- Currently, `cylc`, `rose`, and `isodatetime` must be in your PATH for new shells. One approach to do this is +to symlink the fms-user-installed fre-cli cylc/rose/isodatetime scripts into your local `~/bin` directory, +and then add that `~/bin` directory to your PATH in your `.bashrc` or `.cshrc`. (If you don't do this, Cylc tasks +will fail complaining those 3 tools are not available.) + +``` +cd ~/bin +ln -s /ncrc/home2/Flexible.Modeling.System/conda/envs/fre-cli/bin/{cylc,rose,isodatetime} . +echo 'setenv PATH ${PATH}:~/bin' >> ~/.cshrc +``` +- Currently, the cylc available on gaea (through `module load cylc` or the `PATH` trick above) does not +include any global configuration, so you'll need to create a file `~/.cylc/flow/global.cylc` that contains the following. +(If you don't do this, Cylc will use your home directory for the scratch space and rapidly fill your quota.) + +``` +[install] + [[symlink dirs]] + [[[localhost]]] + run = /gpfs/f5/scratch/gfdl_f/$USER +``` + +6. Run the workflow + +``` +fre pp run -e EXPNAME -p PLATFORM -t TARGET +``` + +7. Report status of workflow progress + +``` +fre pp status -e EXPNAME -p PLATFORM -t TARGET +``` + +8. Launch GUI + +``` +TODO: fre pp gui? + +The full GUI can be launched on jhan or jhanbigmem (an107 or an201). + +cylc gui --ip=`hostname -f` --port=`jhp 1` --no-browser +```