Fixed SLURM docs
micheles committed Oct 2, 2024
1 parent 94b797e commit 91fdaef
Showing 1 changed file with 22 additions and 75 deletions.
doc/getting-started/installation-instructions/slurm.md

Most HPC clusters support a scheduler called SLURM (
Simple Linux Utility for Resource Management). The OpenQuake engine
is able to transparently interface with SLURM, thus making it possible
to run a single calculation over multiple nodes of the cluster.

## Running OpenQuake calculations with SLURM
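
Assuming the engine has already been installed on the cluster (see the instructions for administrators below), a calculation can be launched from a login node with a command along the following lines (a sketch: `job.ini` is a placeholder for your job file and the `--nodes` option is assumed from the engine's SLURM support):

```
$ oq engine --run job.ini --nodes=4
```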

which will split the calculation over 4 nodes. Clearly, there are
limitations on the number of available nodes, so if you set a number
of nodes which is too large you can have one of the following:

1. an error "You can use at most N nodes", where N depends on the
configuration chosen by your system administrator and can be inferred from
the parameters in the openquake.cfg file as `max_cores / num_cores`;
for instance for `max_cores=1024` and `num_cores=128` you would have `N=8`
`Resources` (waiting for resources to become available) or `Priority`
(queued behind a higher priority job).

If you are stuck in situation 2 you must kill the
SLURM job with the command `scancel JOBID` (JOBID is listed by the
command `$ squeue -u $USER`). If you are stuck in situation 3 for a long
time it can be better to kill the job and
then relaunch the calculation, this time asking for fewer nodes.
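
For example (123456 is a hypothetical JOBID):

```
$ squeue -u $USER     # list your SLURM jobs, their JOBIDs and their states
$ scancel 123456      # cancel the stuck job
```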

## Running out of quota

The engine will store the calculation files in `shared_dir`
and some auxiliary files in `custom_tmp`; both directories are
mandatory and must be specified in the configuration file. The
`shared_dir` should be located in the work area of the cluster
and the `custom_tmp` in the scratch area of the cluster.
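
For instance, the relevant part of `openquake.cfg` could look like the following (the paths are hypothetical, and placing `custom_tmp` next to `shared_dir` in the `[directory]` section is an assumption):

```
[directory]
# work area of the cluster (hypothetical path)
shared_dir = /work/openquake
# scratch area of the cluster (hypothetical path)
custom_tmp = /scratch/openquake
```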

Classical calculations will generate an .hdf5 file for each
task spawned, so each calculation can spawn thousands of files.
If you run out of quota, you can free space by removing the files associated with
old calculations, which will have the form `scratch_dir/calc_XXX`.
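
For example, the space taken by an old calculation can be reclaimed with something like this (the scratch path and the calculation ID are hypothetical):

```
$ rm -rf /scratch/openquake/calc_1234
```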

## Installing on HPC

This section is for the administrators of the HPC cluster.

Here are the installation instructions to create modules for
engine 3.21, assuming you have python3.11 installed as a module.

We recommend choosing a base path for openquake and then installing
the different versions using the release number, in our example `/apps/openquake/3.21`.
This will create different modules for different releases.

```
# module load python/3.11
# mkdir /apps/openquake
# python3.11 -m venv /apps/openquake/3.21
# source /apps/openquake/3.21/bin/activate
# pip install -U pip
# pip install -r https://github.com/gem/oq-engine/raw/engine-3.21/requirements-py311-linux64.txt
# pip install openquake.engine==3.21
```
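
At this point it can be useful to check that the engine works inside the virtual environment (a quick sanity check, not part of the official procedure):

```
# source /apps/openquake/3.21/bin/activate
# oq --version
```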
Then you have to define the module file. In our cluster it is located in
`/apps/Modules/modulefiles/openquake/3.21`; please use the appropriate
location for your cluster. The content of the file should be the following:
```
#%Module1.0
proc ModulesHelp { } {
puts stderr "\n\tThis will add OpenQuake to your PATH environment variable."
}
module-whatis "loads the OpenQuake 3.21 environment"

set version 3.21
set root /apps/openquake/3.21
prepend-path LD_LIBRARY_PATH $root/lib64
prepend-path MANPATH $root/share/man
```

After installing the engine, the sysadmin has to edit the file `openquake.cfg` as follows:

```
[distribution]
oq_distribute = slurm
serialize_jobs = 2
python = /apps/openquake/3.21/bin/python
[directory]
# optionally set it to something like /mnt/large_shared_disk
shared_dir =
[dbserver]
host = local
```
With `serialize_jobs = 2` at most two jobs per user can be run concurrently. You may want to
increase or reduce this number. Each user will have their own database located in
`$HOME/oqdata/db.sqlite3`. The database will be created automatically
the first time the user runs a calculation.
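
Once the module file is in place, a user should be able to load the engine and launch a calculation over multiple nodes along these lines (a sketch; the module name follows the example above and `job.ini` is a placeholder):

```
$ module load openquake/3.21
$ oq engine --run job.ini --nodes=2
```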

## How it works internally

The support for SLURM is implemented in the module
`openquake/baselib/parallel.py`. The idea is to submit to SLURM a job
array of tasks for each parallel phase of the calculation. For instance
a classical calculation has three phases: preclassical, classical
and postclassical.

The calculation will start sequentially, then it will reach the
preclassical phase: at that moment the engine will create a
bash script called `slurm.sh`, located in the directory
`$HOME/oqdata/calc_XXX`, where XXX is the calculation ID, which is
an OpenQuake concept and has nothing to do with the SLURM job ID.
The `slurm.sh` script has the following template:
```bash
#!/bin/bash
#SBATCH --job-name={mon.operation}
#SBATCH --array=1-{mon.task_no}
#SBATCH --time=10:00:00
#SBATCH --mem-per-cpu=1G
#SBATCH --output={mon.calc_dir}/%a.out
#SBATCH --error={mon.calc_dir}/%a.err
srun {python} -m openquake.baselib.slurm {mon.calc_dir} $SLURM_ARRAY_TASK_ID
```
At runtime the `mon.` variables will be replaced with their values:

- `mon.operation` will be the string "preclassical"
- `mon.task_no` will be the total number of tasks to spawn
- `mon.calc_dir` will be the directory `$HOME/oqdata/calc_XXX`
- `python` will be the path to the python executable to use, as set in openquake.cfg
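
For example, for a hypothetical preclassical phase with 100 tasks, calculation ID 1234 and the paths used above, the generated script would look roughly like this:

```bash
#!/bin/bash
#SBATCH --job-name=preclassical
#SBATCH --array=1-100
#SBATCH --time=10:00:00
#SBATCH --mem-per-cpu=1G
#SBATCH --output=/home/user/oqdata/calc_1234/%a.out
#SBATCH --error=/home/user/oqdata/calc_1234/%a.err
srun /apps/openquake/3.21/bin/python -m openquake.baselib.slurm /home/user/oqdata/calc_1234 $SLURM_ARRAY_TASK_ID
```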

System administrators may want to adapt this template. At the moment
this requires modifying the engine codebase; in the future the template
may be moved into the configuration file.

A task in the OpenQuake engine is simply a Python function or
generator taking some arguments and a monitor object (`mon`),
sending results to the submitter process via zmq.

Internally the engine will save the input arguments for each task
in pickle files located in `$HOME/oqdata/calc_XXX/YYY.pik`, where
XXX is the calculation ID and YYY is the `$SLURM_ARRAY_TASK_ID`, ranging from 1
to the total number of tasks.

The command `srun {python} -m openquake.baselib.slurm {mon.calc_dir}
$SLURM_ARRAY_TASK_ID` in `slurm.sh` will run the tasks in parallel,
each task reading its arguments from the corresponding input file.

Using a job array has the advantage that all tasks can be killed
with a single command. This is done automatically by the engine
if the user aborts the calculation or if the calculation fails
due to an error.
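
For reference, the same effect can be obtained manually by cancelling the array job itself (123456 being a hypothetical SLURM job ID):

```
$ scancel 123456     # cancels all the tasks of the job array at once
```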
