SOF-7534: QE GPU tutorial #302

Merged: 17 commits, Jan 21, 2025
35 changes: 17 additions & 18 deletions .github/workflows/build-tests.yml
@@ -15,7 +15,6 @@ jobs:
matrix:
os: ["ubuntu-24.04"]
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
@@ -45,22 +44,22 @@ jobs:
if: (github.repository != 'Exabyte-io/template-definitions-js-py') && (github.ref_name == 'master')

steps:
- name: Checkout this repository
uses: actions/checkout@v4
with:
lfs: true
- name: Checkout this repository
uses: actions/checkout@v4
with:
lfs: true

- name: Checkout actions repository
uses: actions/checkout@v4
with:
repository: Exabyte-io/actions
token: ${{ secrets.BOT_GITHUB_TOKEN }}
path: actions
- name: Checkout actions repository
uses: actions/checkout@v4
with:
repository: Exabyte-io/actions
token: ${{ secrets.BOT_GITHUB_TOKEN }}
path: actions

- name: Publish python release
uses: ./actions/py/publish
with:
python-version: 3.9.x
github-token: ${{ secrets.BOT_GITHUB_TOKEN }}
publish-tag: 'true'
publish-to-pypi: 'false'
- name: Publish python release
uses: ./actions/py/publish
with:
python-version: "3.10"
github-token: ${{ secrets.BOT_GITHUB_TOKEN }}
publish-tag: "true"
publish-to-pypi: "false"
6 changes: 3 additions & 3 deletions .github/workflows/s3-deploy.yml
@@ -3,9 +3,9 @@ name: Update S3 deploy
on:
push:
branches:
- 'master'
- "master"
schedule:
- cron: '0 0 1 1 *'
- cron: "0 0 1 1 *"
workflow_dispatch:

jobs:
@@ -26,7 +26,7 @@ jobs:
- name: Set python 3 version
uses: actions/setup-python@v5
with:
python-version: "3.8"
python-version: "3.10"

- name: Build pages
uses: Exabyte-io/action-mkdocs-build@main
2 changes: 1 addition & 1 deletion README.md
@@ -6,7 +6,7 @@

For a quick installation:

1. Install dependencies: python 3 (tested on Python `3.8`-`3.13`), `pip`, `curl`, [`virtualenv`](https://virtualenv.pypa.io/en/latest/installation/), git, [git-lfs](https://git-lfs.github.com/).
1. Install dependencies: python 3 (tested on Python `3.9`-`3.13`), `pip`, `curl`, [`virtualenv`](https://virtualenv.pypa.io/en/latest/installation/), git, [git-lfs](https://git-lfs.github.com/).

2. Clone this repository:

3 changes: 3 additions & 0 deletions images/jobs-cli/open-web-terminal.webp
Git LFS file not shown
174 changes: 174 additions & 0 deletions lang/en/docs/tutorials/jobs-cli/qe-gpu.json
@@ -0,0 +1,174 @@
{
"descriptionLinks": [
"Accelerate Quantum ESPRESSO simulation with GPUs: https://docs.mat3ra.com/tutorials/jobs-cli/qe-gpu/"
],
"description": "We walk through a step-by-step example of running a Quantum ESPRESSO job on a GPU enabled node. We see significant performance improvement by using CUDA/GPU-enabled version of Quantum ESPRESSO.",
"tags": [
{
"...": "../../metadata/general.json#/tags"
},
{
"...": "../../models-directory/dft.json#/tags"
},
{
"...": "../../software-directory/modeling/quantum-espresso.json#/tags"
},
"CUDA",
"GPU",
"NVIDIA"
],
"title": "Mat3ra Tutorial: Accelerate Quantum ESPRESSO simulation with GPUs",
"youTubeCaptions": [
{
"text": "Hello, and welcome to the matera tutorial series.",
"startTime": "00:00:00.000",
"endTime": "00:00:03.000"
},
{
"text": "In today's tutorial, we will go through a step-by-step example of running a Quantum ESPRESSO simulation on one of our GPU enabled compute nodes.",
"startTime": "00:00:04.000",
"endTime": "00:00:14.000"
},
{
"text": "We will see how we can dramatically improve the performance of our simulation using GPUs.",
"startTime": "00:00:15.000",
"endTime": "00:00:20.000"
},
{
"text": "At the moment, GPU build of Quantum ESPRESSO is only available via our command line interface, and soon it will be made available in the web interface.",
"startTime": "00:00:21.000",
"endTime": "00:00:30.000"
},
{
"text": "Let's connect to the login node using SSH.",
"startTime": "00:00:31.000",
"endTime": "00:00:34.000"
},
{
"text": "You can use your terminal application and type S S H, your username at login dot matera dot com and press enter.",
"startTime": "00:00:35.000",
"endTime": "00:00:41.000"
},
{
"text": "If you need help on how to set up S S H, please visit our documentation site at docs dot matera dot com, and search S S H.",
"startTime": "00:00:42.000",
"endTime": "00:00:51.000"
},
{
"text": "Here you will find step by step guide to setup S S H key for seamless authentication.",
"startTime": "00:00:52.000",
"endTime": "00:00:57.000"
},
{
"text": "Note that it is also possible to connect to the login node from our web platform using the web terminal.",
"startTime": "00:00:58.000",
"endTime": "00:01:04.000"
},
{
"text": "Besides, <break time='0.5'/> it is also possible to run a command line job via bash workflow in our web platform.",
"startTime": "00:01:05.000",
"endTime": "00:01:12.000"
},
{
"text": "Create a new workflow. Select shell script as application.",
"startTime": "00:01:13.000",
"endTime": "00:01:16.000"
},
{
"text": "Add an execution unit and write your job script.",
"startTime": "00:01:17.000",
"endTime": "00:01:20.000"
},
{
"text": "For now, let's focus on the command line part.",
"startTime": "00:01:22.000",
"endTime": "00:01:24.000"
},
{
"text": "The example calculation we are going to demonstrate is available in our github repository C L I job examples.",
"startTime": "00:01:25.000",
"endTime": "00:01:33.000"
},
{
"text": "Please browse under espresso, then gpu, where you will find required input and reference output files.",
"startTime": "00:01:34.000",
"endTime": "00:01:39.000"
},
{
"text": "Once connected to the login node, let's navigate to your working directory, and clone our example repository.",
"startTime": "00:01:40.000",
"endTime": "00:01:47.000"
},
{
"text": "After cloning the repository, we also need to sync the L F S objects with git L F S pull.",
"startTime": "00:01:50.000",
"endTime": "00:01:56.000"
},
{
"text": "Let's navigate to our GPU example.",
"startTime": "00:01:57.000",
"endTime": "00:02:00.000"
},
{
"text": "Let's examine the P B S job script.",
"startTime": "00:02:03.000",
"endTime": "00:02:05.000"
},
{
"text": "We will run our job in GPU enabled G O F queue, we will request one node which has eight CPUs.",
"startTime": "00:02:07.000",
"endTime": "00:02:13.000"
},
{
"text": "To run quantum espresso jobs in GPUs, we need to load the CUDA build of quantum espresso.",
"startTime": "00:02:14.000",
"endTime": "00:02:19.000"
},
{
"text": "We set eight open M P threads and 1 M P I per GPU.",
"startTime": "00:02:20.000",
"endTime": "00:02:24.000"
},
{
"text": "We can also set parallelization options for k point and matrix diagonalization.",
"startTime": "00:02:25.000",
"endTime": "00:02:30.000"
},
{
"text": "Finally, we can submit our job with Q sub command. We can find the status of job with Q stat.",
"startTime": "00:02:31.000",
"endTime": "00:02:37.000"
},
{
"text": "Once the job is completed, we can examine the output file.",
"startTime": "00:02:38.000",
"endTime": "00:02:41.000"
},
{
"text": "We will see that the GPU acceleration was enabled for the calculation.",
"startTime": "00:02:44.000",
"endTime": "00:02:49.000"
},
{
"text": "If we scroll to the bottom of the file, we will see the total time taken by the program. The wall time for this job was slightly less than a minute.",
"startTime": "00:02:50.000",
"endTime": "00:02:58.000"
},
{
"text": "For comparison, we ran the same job using eight CPUs but without GPU acceleration, <break time='0.5'/> it took about 20 times longer.",
"startTime": "00:03:02.000",
"endTime": "00:03:10.000"
},
{
"text": "Now you may test different combination of M P I and open M P threads, different parallelization option, and see what gives you the best performance.",
"startTime": "00:03:11.000",
"endTime": "00:03:20.000"
},
{
"text": "Thank you for watching this tutorial and using our platform.",
"startTime": "00:03:21.000",
"endTime": "00:03:24.000"
}
],
"youTubeId": "trLDEwWc3ho"
}
90 changes: 90 additions & 0 deletions lang/en/docs/tutorials/jobs-cli/qe-gpu.md
@@ -0,0 +1,90 @@
---
tags:
- GPU
- CUDA
hide:
- tags
---
# Accelerate Quantum ESPRESSO simulation with GPUs

We will walk through a step-by-step example of running a Quantum ESPRESSO job on
GPUs. As of the time of writing, the GPU (CUDA) build of Quantum ESPRESSO is
only available via the Command Line Interface (CLI). We will see that we can
dramatically speed up our Quantum ESPRESSO simulation by using GPUs.

1. First, connect to the login node via an [SSH client](../../remote-connection/ssh.md)
or the [web terminal](../../remote-connection/web-terminal.md); a minimal SSH command is
shown after the screenshot below. Note that it is also possible to run CLI jobs by
creating a [bash workflow](../../software-directory/scripting/shell/overview.md).

![Web Terminal](../../images/jobs-cli/open-web-terminal.webp)
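
A minimal sketch of the SSH connection, assuming the login host is `login.mat3ra.com`
as narrated in the video (the username is a placeholder, replace it with your own):
```bash
# connect to the Mat3ra login node (username and host are illustrative)
ssh your_username@login.mat3ra.com
```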

2. The example job that we are going to run is available in the Git repository
[exabyte-io/cli-job-examples](https://github.com/exabyte-io/cli-job-examples).
You may clone the repository into your working directory:
```bash
git clone https://github.com/exabyte-io/cli-job-examples
cd cli-job-examples
git lfs pull
cd espresso/gpu
```

3. You will find all the required input files and the job script under `espresso/gpu`.
Please review the input files and the PBS job script, and update the project name and
other parameters as necessary.

4. We will use the [GOF](../../infrastructure/clusters/aws.md#hardware-specifications)
queue, which provides 8 CPUs and 1 NVIDIA V100 GPU per node; a sketch of the
corresponding PBS resource request is shown below.
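
The resource-request portion of the PBS job script might look roughly like the sketch
below; the directive values (job name, walltime, project/account) are illustrative
assumptions, so treat the `job.gpu.pbs` shipped with the repository as the reference:
```bash
#!/bin/bash
#PBS -N qe-gpu-example        # job name (illustrative)
#PBS -q GOF                   # GPU-enabled queue
#PBS -l nodes=1:ppn=8         # one node with 8 CPU cores
#PBS -l walltime=00:30:00     # assumed time limit
#PBS -A my-project            # replace with your project name
# the module load / mpirun commands from the next step follow here
```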

5. Since our compute node has 8 CPUs and 1 GPU, we will run 1 MPI process with
8 OpenMP threads:
```bash
module load espresso/7.4-cuda-12.4-cc-70
export OMP_NUM_THREADS=8
mpirun -np 1 pw.x -npool 1 -ndiag 1 -in pw.cuo.scf.in > pw.cuo.gpu.scf.out
```

6. Finally, we can submit our job using:
```bash
qsub job.gpu.pbs
```
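
As mentioned in the video walkthrough, the status of the submitted job can be checked
with `qstat`, for example:
```bash
# list jobs belonging to the current user
qstat -u $USER
```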

7. Once the job is completed, we can inspect the output file `pw.cuo.gpu.scf.out`.
We will see that the GPU was used, and that the job took about 1 minute of wall time.
```
Parallel version (MPI & OpenMP), running on 8 processor cores
Number of MPI processes: 1
Threads/MPI process: 8
...

GPU acceleration is ACTIVE. 1 visible GPUs per MPI rank
GPU-aware MPI enabled
...

Parallel routines

PWSCF : 37.94s CPU 50.77s WALL
```

8. For comparison, we ran the same calculation using only CPUs, and it took
about 20 times longer; a sketch of the CPU-only launch command follows the output below.
```
Parallel version (MPI), running on 8 processors

MPI processes distributed on 1 nodes
...

Parallel routines

PWSCF : 18m 0.56s CPU 18m25.33s WALL
```
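
The CPU-only run was launched roughly as follows; the module name of the non-CUDA build
and the output file name are assumptions, so adapt them to what is available on the cluster:
```bash
# illustrative CPU-only counterpart on the same 8-core node
module load espresso                     # assumed name of the non-CUDA build
export OMP_NUM_THREADS=1
mpirun -np 8 pw.x -in pw.cuo.scf.in > pw.cuo.cpu.scf.out
```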

You may experiment with different combinations of MPI processes and OpenMP threads, and
various [parallelization options](https://www.quantum-espresso.org/Doc/user_guide/node20.html),
to find what gives you the best performance; one illustrative variation is sketched below.
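
For instance, one could try two MPI ranks sharing the single GPU with fewer OpenMP
threads each and two k-point pools; the values below are purely illustrative, and the
best settings depend on the system being simulated:
```bash
# illustrative variation: 2 MPI ranks x 4 OpenMP threads, 2 k-point pools
export OMP_NUM_THREADS=4
mpirun -np 2 pw.x -npool 2 -ndiag 1 -in pw.cuo.scf.in > pw.cuo.gpu.scf.test.out
```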

## Step-by-step screenshare video

<div class="video-wrapper">
<iframe class="gifffer" width="100%" height="100%" src="https://www.youtube.com/embed/trLDEwWc3ho" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -153,6 +153,7 @@ nav:
- Overview: tutorials/jobs-cli/overview.md
- Create + run a CLI Job: tutorials/jobs-cli/job-cli-example.md
- Import a CLI Job to Web Interface: tutorials/jobs-cli/cli-job-import.md
- QE GPU Job: tutorials/jobs-cli/qe-gpu.md
- Templating:
- Overview: tutorials/templating/overview.md
- Flags by Elemental Composition: tutorials/templating/set-flag-by-composition.md
2 changes: 1 addition & 1 deletion netlify.toml
@@ -3,5 +3,5 @@
publish = "site/"

[build.environment]
PYTHON_VERSION = "3.8"
PYTHON_VERSION = "3.10"
NODE_VERSION = "20"