Cylc is a general purpose workflow scheduler that is very efficient for cycling systems. For more documentation, visit https://cylc.github.io/cylc-doc/stable/html.
Using Cylc involves several configuration files: the workflow configuration, global.cylc, site configurations, the rose-suite configuration, and rose-suite-experiment configurations.
The workflow configuration (flow.cylc) defines the workflow, the scripts used throughout the workflow, and the task dependencies. The global.cylc defines default Cylc Flow settings, including the job runner and platform information (further used in user site configurations). The site configurations define the platform to be used and other site-specific user settings, such as the site-specific tools utilized. Furthermore, the rose-suite configuration references a default configuration file for all experiments. These settings can be applied to all experiments but can also be overridden by experiment configurations. Rose-suite-experiment configurations include experiment-specific information such as the history file location, the output directory location, the pp components to process, etc.
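The override order described above (experiment settings take precedence over the defaults) can be illustrated with a small shell sketch. It only handles plain KEY=VALUE lines and is not how Rose itself parses or merges configuration files; the function name `lookup_setting` and the experiment file name used in the usage comment are hypothetical placeholders.

```shell
# Illustrative only: look a key up in the experiment config first,
# then fall back to the default config. Handles plain KEY=VALUE lines.
lookup_setting() {
    key=$1
    default_conf=$2
    exp_conf=$3
    for conf in "$exp_conf" "$default_conf"; do
        [ -f "$conf" ] || continue
        if grep -q "^${key}=" "$conf"; then
            # Print the value of the last matching line in this file.
            sed -n "s/^${key}=//p" "$conf" | tail -n 1
            return 0
        fi
    done
    # Key not found in either file.
    return 1
}

# Usage (file names are placeholders):
#   lookup_setting DO_REFINEDIAG rose-suite.conf rose-suite-exp.conf
```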
To help understand the workflow configuration, the following diagram was created. It is laid out in sections.
flowchart TD
subgraph Parts of Workflow
%%C["Cylc Workflow:\nGeneral purpose workflow engine that \norchestrates cycling systems very efficiently\n\n(https://cylc.github.io/cylc-doc/stable/html/)"]
F[Cylc workflow: \nflow.cylc]
F-->S{{1. site\n-set user site\n-can be gfdl-ws, generic, or ppan}}
F--->f1{{2. meta: \n-information about the workflow}}
%%-.-> RS
F---->f2{{"3. scheduler: \n-settings for the scheduler\n-non-task-specific workflow configuration"}}
F----->f3{{4. task parameters: \n-define task parameter values and parameter templates}}
F------>f4{{5.scheduling: \n-allows cylc to determine when tasks are ready to run\n-define families\n-defines qualifiers}}
F------->f5{{"6.runtime: \n-determines how, where, and what to execute when tasks are ready \n-can specify what script to execute, what compute resources to use, \n and how to run a task"}}
end
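To see which of the sections in the diagram a given flow.cylc actually contains, its top-level section headings can be listed with a short shell helper. This is a minimal sketch: the function name `list_top_level_sections` is hypothetical, and it assumes top-level sections are written on their own line in single brackets (nested sections use double brackets and are skipped).

```shell
# Print the top-level section names of a Cylc workflow configuration.
# Top-level sections look like [scheduling]; nested ones like [[graph]].
list_top_level_sections() {
    grep -E '^[[:space:]]*\[[^][]+\][[:space:]]*$' "$1" |
        sed -E 's/^[[:space:]]*\[//; s/\][[:space:]]*$//'
}

# Usage:
#   list_top_level_sections flow.cylc
```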
The portable workflow (flow.cylc) was created by using conda environments that provide the workflow tools and by moving the gfdl site-specific tools into the ppan.cylc site configuration file. A generic.cylc site configuration was also created to be used in its place.
The generic.cylc is where the conda environments containing the tools needed for the workflow are activated, as seen below. The site configurations also define the platform used, which references the global.cylc. As mentioned, the global.cylc contains information about each platform.
flowchart TD
SS[Site] --> s1[gfdl-ws.cylc]
SS-->s2[ppan.cylc]
SS-->s3[generic.cylc]
subgraph gfdl-ws configuration
s1-->wsp[platform = localhost]
end
subgraph pp/an configuration
s2-->ppanp[platform = ppan]
s2-->info[gfdl site specific workflow and tools]
end
subgraph generic configuration
s3-->E
E[activate envs]-->e1[cylc.yaml: \n-cylc dependencies]
s3-->genp[platform = localhost]
end
G[global.cylc] --> g1["-defines default Cylc Flow settings for a user or site \n-includes info for each platform used in the site configs"]
wsp -.->G
ppanp -.->G
genp -..->G
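The site-to-platform mapping shown in the diagram can be summarized as a small lookup. This is only an illustration of the diagram; in the workflow itself the platform is set inside each site configuration file, and the function name `site_platform` is hypothetical.

```shell
# Map a site name to the platform its site configuration selects,
# following the diagram above.
site_platform() {
    case "$1" in
        gfdl-ws) echo localhost ;;
        ppan)    echo ppan ;;
        generic) echo localhost ;;
        *)       echo "unknown site: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   site_platform generic
```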
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
conda install -c conda-forge mamba
conda config --add envs_dirs [location with more space]
# EX: /collab1/data/$USER/envs for niagara
conda config --add pkgs_dirs [location with more space]
# EX: /collab1/data/$USER/pkgs for niagara
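When choosing the "location with more space" for the conda environment and package directories, it may help to compare the free space of candidate directories first. The helper below is a minimal sketch; the name `avail_kb` is hypothetical, and it relies on the POSIX output format of `df -P`, where the fourth column is the available space.

```shell
# Print the space available (in 1024-byte blocks) on the filesystem
# that holds the given directory.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Usage:
#   avail_kb "$HOME"
#   avail_kb /collab1/data/$USER
```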
- Environment yaml cylc: includes cylc-flow, cylc-rose, metomi-rose, hsm, fre-nctools, nco, cdo, and netcdf-cxx4. To create an environment:
mamba env create --file cylc.yaml
conda activate cylc-task-tools
- Globus online was used to transfer experiment data and any other necessary data (HISTORY_DIR, PP_GRID_SPEC)
- Create/edit pp.yaml
a. If not on gfdl pp/an or gaea, set site = "generic"
b. Ensure directories, switches, and other information in the pp.yaml are correct
- history directory (where data files were transferred)
- create ptmp and tmp directories
- HISTORY_DIR_REFINED left blank
- DO_REFINEDIAG=False
- pp dir
- pp grid spec (where data files were transferred)
- component info for regrid-xy and remap-pp-components
- TO-DO: Add --symlink-dirs to cylc install in configure scripts (optional)
- --symlink-dirs='run=[location with more space]' can be added to the cylc install command
- niagara-specific example: --symlink-dirs='run=/collab1/data/$USER'
- edit was done due to limited space on niagara
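Once the cylc-task-tools environment has been created and activated, a quick check that the expected command-line tools are on the PATH can catch an incomplete installation early. This is a minimal sketch: the function name `check_tools` is hypothetical, and the command names in the usage comment (`cylc`, `rose`, `ncks`, `cdo`) are only a partial list of what the environment's packages are expected to provide.

```shell
# Report any of the given commands that cannot be found on the PATH.
# Returns 0 if all are present, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (partial list of commands, adjust as needed):
#   check_tools cylc rose ncks cdo
```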
- Make sure the cylc conda environment is activated
conda activate cylc-task-tools
- Point to the global.cylc used for the generic site
- In generic-global-config folder in the postprocessing template repository (fre2/workflows/postprocessing/generic-global-config)
export CYLC_CONF_PATH=/path/to/generic-global-config
- Create TMPDIR environment variable
- This is used for the stage-history task
export TMPDIR=/path/to/TMPDIR/tmp
- Follow FRE-cli instructions on the main README.md
- To monitor status
- See debugging messages:
cylc play --no-detach --debug [exp]
- Monitor status of each task:
watch -n 5 cylc workflow-state [exp]
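The environment set up in the steps above can be sanity-checked before running the workflow. The sketch below is one possible check, not part of the workflow itself: the function name `preflight` is hypothetical, and it assumes the environment was activated by name (so that conda sets `CONDA_DEFAULT_ENV` to `cylc-task-tools`) and that the directory in `CYLC_CONF_PATH` contains a file named global.cylc.

```shell
# Check the conda environment, CYLC_CONF_PATH, and TMPDIR settings.
# Returns 0 if everything looks right, 1 otherwise.
preflight() {
    status=0
    if [ "${CONDA_DEFAULT_ENV:-}" != "cylc-task-tools" ]; then
        echo "conda environment cylc-task-tools is not active" >&2
        status=1
    fi
    if [ ! -f "${CYLC_CONF_PATH:-}/global.cylc" ]; then
        echo "no global.cylc found under CYLC_CONF_PATH" >&2
        status=1
    fi
    if [ ! -d "${TMPDIR:-}" ]; then
        echo "TMPDIR is unset or does not exist" >&2
        status=1
    fi
    return "$status"
}

# Usage:
#   preflight && echo "environment looks ready"
```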
Instructions for portable workflow (Post-processing container use) (IN DEVELOPMENT - Is not fully updated yet)
[docker or podman] pull [image/sif file]
- Create directories in ppp-setup (transfer files here):
- PPGridspec
- history
- Copy runscript.sh from the HPC-ME repo into ppp-setup (https://gitlab.gfdl.noaa.gov/fre/HPC-ME/-/tree/main/ppp?ref_type=heads)
- Globus online was used to transfer experiment data and any other necessary data (HISTORY_DIR, PP_GRID_SPEC)
i. pp.yaml
1. If not on gfdl pp/an or gaea, set site = "generic"
2. Ensure directories, switches, and other information in the pp.yaml are correct
- history directory (where data files were transferred)
- HISTORY_DIR_REFINED left blank
- DO_REFINEDIAG=False
- pp dir
- pp grid spec (where data files were transferred)
- component info for regrid-xy and remap-pp-components
[singularity or apptainer] exec --writable-tmpfs --bind [location/to/ppp-setup/]:/mnt [location/to/sif/file] /mnt/runscript.sh
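Because the commands above accept either of two container tools (docker or podman, singularity or apptainer), a wrapper script could pick whichever one is installed. This is a minimal sketch under that assumption; the function name `first_available` is hypothetical, and the bracketed placeholders in the usage comment are kept exactly as in the command above.

```shell
# Print the first of the given commands that is found on the PATH.
# Returns 1 if none of them is available.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage:
#   runtime=$(first_available apptainer singularity) || exit 1
#   "$runtime" exec --writable-tmpfs --bind [location/to/ppp-setup/]:/mnt [location/to/sif/file] /mnt/runscript.sh
```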