Update running.md
sbryngelson authored Mar 2, 2024
1 parent d9a7546 commit 3d8452f
Showing 1 changed file with 27 additions and 27 deletions.
54 changes: 27 additions & 27 deletions docs/documentation/running.md
@@ -60,7 +60,7 @@ using 4 cores:

The MFC detects which scheduler your system is using and handles the creation and execution of batch scripts.
The batch engine is requested via the `-e batch` option.
The number of nodes can be specified with the `-N` (i.e `--nodes`) option.
The number of nodes can be specified with the `-N` (i.e., `--nodes`) option.

We provide a list of (baked-in) submission batch scripts in the `toolchain/templates` folder.

@@ -70,35 +70,35 @@ We provide a list of (baked-in) submission batch scripts in the `toolchain/templates` folder.

Other useful arguments include:

- `-# <job name>` to name your job. (i.e `--name`)
- `-@ <email address>` to receive emails from the scheduler. (i.e `--email`)
- `-w hh:mm:ss` to specify the job's maximum allowed walltime. (i.e `--walltime`)
- `-a <account name>` to identify the account to be charged for the job. (i.e `--account`)
- `-p <partition name>` to select the job's partition. (i.e `--partition`)
- `-# <job name>` to name your job. (i.e., `--name`)
- `-@ <email address>` to receive emails from the scheduler. (i.e., `--email`)
- `-w hh:mm:ss` to specify the job's maximum allowed walltime. (i.e., `--walltime`)
- `-a <account name>` to identify the account to be charged for the job. (i.e., `--account`)
- `-p <partition name>` to select the job's partition. (i.e., `--partition`)

As an example, one might request GPUs on a SLURM system using the following:

**Disclaimer**: IBM's JSRUN on LSF-managed computers does not use the traditional node-based approach to
allocate resources. Therefore, the MFC constructs equivalent resource-sets in task and GPU count.
allocate resources. Therefore, the MFC constructs equivalent resource sets in the task and GPU count.

### Profiling with NVIDIA Nsight

MFC provides two different argument to facilitate profiling with NVIDIA Nsight.
**Please ensure that the used argument is placed at the end so that their respective flags can be appended.**
MFC provides two different arguments to facilitate profiling with NVIDIA Nsight.
**Please ensure the used argument is placed at the end so their respective flags can be appended.**
- Nsight Systems (Nsys): `./mfc.sh run ... --nsys [nsys flags]` allows one to visualize MFC's system-wide performance with [NVIDIA Nsight Systems](https://developer.nvidia.com/nsight-systems).
NSys is best for getting a general understanding of the order and execution times of major subroutines (WENO, Riemann, etc.) in MFC.
NSys is best for understanding the order and execution times of major subroutines (WENO, Riemann, etc.) in MFC.
When used, `--nsys` will run the simulation and generate `.nsys-rep` files in the case directory for all targets.
These files can then be imported into Nsight System's GUI, which can be downloaded [here](https://developer.nvidia.com/nsight-systems/get-started#latest-Platforms). It is best to run case files with a few timesteps so that the report files remain small. Learn more about NVIDIA Nsight Systems [here](https://docs.nvidia.com/nsight-systems/UserGuide/index.html).
These files can then be imported into Nsight System's GUI, which can be downloaded [here](https://developer.nvidia.com/nsight-systems/get-started#latest-Platforms). It is best to run case files with a few timesteps to keep the report files small. Learn more about NVIDIA Nsight Systems [here](https://docs.nvidia.com/nsight-systems/UserGuide/index.html).
- Nsight Compute (NCU): `./mfc.sh run ... --ncu [ncu flags]` allows one to conduct kernel-level profiling with [NVIDIA Nsight Compute](https://developer.nvidia.com/nsight-compute).
NCU provides profiling information for every subroutine called and is more detailed than NSys.
When used, `--ncu` will output profiling information for all subroutines, including elapsed clock cycles, memory used, and more after the simulation is run.
Please note that adding this argument will significantly slow down the simulation and should only be used on case files with a few timesteps.
Adding this argument will significantly slow the simulation and should only be used on case files with a few timesteps.
Learn more about NVIDIA Nsight Compute [here](https://docs.nvidia.com/nsight-compute/NsightCompute/index.html).

### Restarting Cases

When running a simulation, MFC generates a `./restart_data` folder in the case directory that contains `lustre_*.dat` files that can be used to restart a simulation from saved timesteps.
This allows a user to run a simulation to some timestep $X$, then later continue it to run to another timestep $Y$, where $Y > X$.
This allows a user to simulate some timestep $X$, then continue it to run to another timestep $Y$, where $Y > X$.
The user can also choose to add new patches at the intermediate timestep.

If you want to restart a simulation,
@@ -108,11 +108,11 @@ If you want to restart a simulation,
- `t_step_stop` : $t_f$
- `t_step_save` : $SF$
in which $t_i$ is the starting time, $t_f$ is the final time, and $SF$ is the saving frequency time.
- Run pre-process and simulation on the case.
- Run `pre_process` and `simulation` on the case.
- `./mfc.sh run case.py -t pre_process simulation `
- As the simulation runs, it will create LUSTRE files for each saved timestep in `./restart_data`.
- When the simulation stops, choose any LUSTRE file as the restarting point (lustre_ $t_s$.dat)
- Create a new duplicate input file, (ex. `restart_case.py`), on which it should:
- As the simulation runs, it will create Lustre files for each saved timestep in `./restart_data`.
- When the simulation stops, choose any Lustre file as the restarting point (lustre_ $t_s$.dat)
- Create a new duplicate input file (e.g., `restart_case.py`), which should have:

1. For the Computational Domain Parameters
- Have the following removed __except__ `m`, `n`, and `p`:
@@ -123,14 +123,14 @@ in which $t_i$ is the starting time, $t_f$ is the final time, and $SF$ is the saving frequency time.
- `a_(xyz)`
- `(xyz)_a`
- `(xyz)_b`
- Have the following altered:
- `t_step_start` : $t_s$ # the point at which the simulation will restart
- `t_step_stop` : $t_{f2}$ # new final simulation time, which can be the same as $t_f$
- `t_step_save` : ${SF}_2$ # if interested in changing the saving frequency
- Have the following ADDED:
- `old_ic` : 'T' # to specify that we have initial conditions from previous simulations
- `old_grid` : 'T' # to specify that we have a grid from previous simulations (maybe I do not need m, n, and p, then?)
- `t_step_old` : $t_i$ # the time step used as the `t_step_start` of the original `case.py` file
- Alter the following:
- `t_step_start` : $t_s$ (the point at which the simulation will restart)
- `t_step_stop` : $t_{f2}$ (new final simulation time, which can be the same as $t_f$)
- `t_step_save` : ${SF}_2$ (if interested in changing the saving frequency)
- Add the following:
- `old_ic` : 'T' (to specify that we have initial conditions from previous simulations)
- `old_grid` : 'T' (to specify that we have a grid from previous simulations)
- `t_step_old` : $t_i$ (the time step used as the `t_step_start` of the original `case.py` file)
2. For the Simulation Algorithm Parameters
- Substitute `num_patches` to reflect the number of ADDED patches in the `restart_case.py` file. If no patches are added, set `num_patches: 0`

@@ -145,7 +145,7 @@ in which $t_i$ is the starting time, $t_f$ is the final time, and $SF$ is the saving frequency time.
4. For Fluid properties
- Keep information about the fluid properties

- Run pre-process and simulation on restart_case.py
- Run pre-process and simulation on `restart_case.py`
- `./mfc.sh run restart_case.py -t pre_process simulation `

- Run the post_process
@@ -157,7 +157,7 @@ in which $t_i$ is the starting time, $t_f$ is the final time, and $SF$ is the saving frequency time.
./mfc.sh run restart_case.py -t post_process
```
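
The parameter edits listed in the restart steps can be sketched as a small Python helper. This is an illustration only, not part of MFC: `make_restart_case` and `GRID_KEYS_TO_DROP` are hypothetical names, and the sketch assumes the case parameters are held in a plain Python dictionary. It drops only the grid keys visible in the steps (`a_(xyz)`, `(xyz)_a`, `(xyz)_b`); the rest of the removal list and the patch and fluid sections are not shown in this diff and are left untouched here.

```python
import json

# Grid keys named in the restart steps. The full removal list is partly
# elided in the source, so this tuple is incomplete by design.
GRID_KEYS_TO_DROP = tuple(
    name
    for axis in "xyz"
    for name in (f"a_{axis}", f"{axis}_a", f"{axis}_b")
)


def make_restart_case(original, t_s, t_f2, sf2=None, num_added_patches=0):
    """Return an edited copy of `original` (hypothetical helper, not an MFC API)."""
    # Remove the listed grid keys; `m`, `n`, and `p` are kept.
    case = {k: v for k, v in original.items() if k not in GRID_KEYS_TO_DROP}
    case.update({
        "t_step_start": t_s,    # restart point
        "t_step_stop": t_f2,    # new final step; may equal the old one
        # Keep the old saving frequency unless a new one is given.
        "t_step_save": sf2 if sf2 is not None else original["t_step_save"],
        "old_ic": "T",          # reuse initial conditions from the earlier run
        "old_grid": "T",        # reuse the grid from the earlier run
        "t_step_old": original["t_step_start"],
        "num_patches": num_added_patches,  # only patches ADDED at restart
    })
    return case


if __name__ == "__main__":
    # Illustrative values only; they are not taken from any MFC example.
    original = {
        "m": 199, "n": 0, "p": 0,
        "x_a": -0.5, "x_b": 0.5, "a_x": 4.0,
        "t_step_start": 0, "t_step_stop": 7000, "t_step_save": 1000,
        "num_patches": 2,
    }
    print(json.dumps(make_restart_case(original, t_s=7000, t_f2=14000), indent=4))
```

The helper returns a new dictionary rather than editing the original in place, so the first run's parameters stay available for comparison.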

We have provided an example `case.py` and `restart_case.py` in `/examples/1D_vacuum_restart/`. This simulation is a duplicate of the `1D_vacuum` case. It demonstrates stopping at timestep 7000, adding a new patch, and restarting the simulation. To test this code, run:
We have provided an example, `case.py` and `restart_case.py` in `/examples/1D_vacuum_restart/`. This simulation is a duplicate of the `1D_vacuum` case. It demonstrates stopping at timestep 7000, adding a new patch, and restarting the simulation. To test this code, run:

```console
./mfc.sh run examples/1D_vacuum_restart/case.py -t pre_process simulation