Add CI for GPU tests on Ruche #29

Merged
merged 104 commits into from
Feb 6, 2024
4e1a0e4
Factorize builds
pzehner Jan 23, 2024
370f3d9
Build base image in test
pzehner Jan 23, 2024
1269343
Rename build job
pzehner Jan 23, 2024
d0d7d10
Fix syntax error
pzehner Jan 23, 2024
28b67c2
Remove old line in ci
pzehner Jan 23, 2024
a7fd955
Fix docker images name
pzehner Jan 23, 2024
1956c95
Fix incorrect backend accesses
pzehner Jan 23, 2024
d331b07
Fix Cmake flags
pzehner Jan 23, 2024
6299e60
Fix save image step and restrict to create images for native target only
pzehner Jan 23, 2024
f5dfaad
Merge branch 'main' into feature/nvidia-ci
pzehner Jan 24, 2024
312009e
Use a base nvcc Docker image from Kokkos
pzehner Jan 24, 2024
0deb0fb
Fix install step
pzehner Jan 24, 2024
2d2af45
Fix ci
pzehner Jan 24, 2024
79e3ade
Only use base image
pzehner Jan 24, 2024
2fd2228
Set singularity version
pzehner Jan 24, 2024
450b412
Fix singularity version
pzehner Jan 24, 2024
502108e
Downgrade singularity
pzehner Jan 24, 2024
6ecf5de
Install singularity manually
pzehner Jan 24, 2024
504dce3
Fix manual install
pzehner Jan 24, 2024
7719bde
Reorder singularity installation
pzehner Jan 24, 2024
b26dd9d
Fix permission
pzehner Jan 24, 2024
2406529
Fix CI
pzehner Jan 24, 2024
61fa461
Fix execution rights and singularity image name
pzehner Jan 24, 2024
2677448
Fix test build
pzehner Jan 25, 2024
8754931
Fix environment for test build
pzehner Jan 25, 2024
9f6c43a
Fix ifs
pzehner Jan 25, 2024
1172c03
Fix if quotes
pzehner Jan 25, 2024
439b5d8
Improve and fix build of test code
pzehner Jan 25, 2024
b79adcf
Set number of workers for building
pzehner Jan 25, 2024
dfa198a
Harmonize KokkosFFT name within CMake files
pzehner Jan 25, 2024
5df4988
Disable self-hosted runner label
pzehner Jan 25, 2024
412abb1
Fix calls to tar command
pzehner Jan 25, 2024
932e7c5
Do not rebuild base image if not modified
pzehner Jan 25, 2024
3ec18d7
Update checkout to v4
pzehner Jan 26, 2024
4b16ccd
Downgrade download-artifacts to v3
pzehner Jan 26, 2024
a9a5aa3
Add cuda label
pzehner Jan 26, 2024
582a75d
Downgrade upload-artifact to v3
pzehner Jan 26, 2024
cedef9e
Fix archive name
pzehner Jan 26, 2024
12627bb
Fix singularity image name when pulling
pzehner Jan 26, 2024
be0e335
Fix Singularity image name to run
pzehner Jan 26, 2024
8eae079
Add cleanup step on self-hosted runner
pzehner Jan 26, 2024
453bad6
Fix syntax
pzehner Jan 26, 2024
3f81ce6
Remove cleanup and rely on runner scripts
pzehner Jan 26, 2024
c2bd5d3
Do not exclude CMake files
pzehner Jan 26, 2024
dbf9868
Fix Singularity setup
pzehner Jan 26, 2024
9f4b2c2
Enable CUDA constexpr
pzehner Jan 26, 2024
22be7a0
Change images name
pzehner Jan 26, 2024
383faba
Enable HIP builds
pzehner Jan 26, 2024
a9da39d
Remove Docker tests files
pzehner Jan 26, 2024
a85e9f6
Fix CI
pzehner Jan 26, 2024
180b335
Change depth
pzehner Jan 26, 2024
5f81949
List changed Dockerfiles
pzehner Jan 26, 2024
8c0b346
Fix chaining jobs
pzehner Jan 26, 2024
36e312d
Fix incorrectly spelled inline cache argument
pzehner Jan 26, 2024
e193f19
Add HIPFFT in HIP image
pzehner Jan 26, 2024
ae242c4
Specify compiler command
pzehner Jan 26, 2024
c10fdf3
Disable free disk space
pzehner Jan 26, 2024
ca20d83
Improve conditions
pzehner Jan 26, 2024
d0aea65
Add ROCFFT to HIP image
pzehner Jan 26, 2024
c22a92b
Fix CI
pzehner Jan 26, 2024
4ae3ef7
Fix CI
pzehner Jan 26, 2024
0ee07bb
Merge branch 'main' into feature/nvidia-ci
pzehner Jan 30, 2024
defc4a4
Enable SYCL CI
pzehner Jan 30, 2024
1a9cf59
Fix CI
pzehner Jan 30, 2024
de1f940
Add missing SYCL Docker file
pzehner Jan 30, 2024
7da5bd3
Convert to Singularity image conditionally
pzehner Jan 30, 2024
36ebe0d
Remove CUDA constexpr flag
pzehner Jan 30, 2024
f203a0e
Fix Git settings in SYCL Docker file
pzehner Jan 30, 2024
51f488b
Fix SYCL Docker file for Git and CMake version
pzehner Jan 30, 2024
a0a1b77
Fix SYCL Docker file
pzehner Jan 30, 2024
7e63b52
Merge branch 'main' into feature/nvidia-ci
pzehner Jan 30, 2024
b4500b1
Fix missing SYCL configuration for install
pzehner Jan 30, 2024
5fc7633
Merge branch 'main' into feature/nvidia-ci
pzehner Jan 31, 2024
4cfc6b0
Enable OpenMP backend
pzehner Jan 31, 2024
c963d8f
Do not login to ghcr when building
pzehner Jan 31, 2024
22c44a4
Fix Git safe directories in all Docker files
pzehner Jan 31, 2024
7779042
Fix CI
pzehner Jan 31, 2024
5a4c118
Add Git to all Docker files
pzehner Jan 31, 2024
e24df92
Enable OpenMP tests in CI
pzehner Jan 31, 2024
1258f5a
Fix CI for tests
pzehner Jan 31, 2024
06abc81
Fix OpenMP tests
pzehner Feb 1, 2024
89150c9
Use a distinct image for PRs that modified Docker files
pzehner Feb 1, 2024
7e8fd5f
Fix CI
pzehner Feb 1, 2024
3746620
Fix lower-case commands in Docker files
pzehner Feb 1, 2024
5e912df
Rename local SIF files to simpler terms
pzehner Feb 1, 2024
d442209
Replace always by not cancelled on CI
pzehner Feb 1, 2024
e78d176
Use _main suffix for main images
pzehner Feb 1, 2024
37c5119
Remove install test scripts
pzehner Feb 1, 2024
8b91e6f
Update documentation
pzehner Feb 2, 2024
3c14f7d
Move test file
pzehner Feb 2, 2024
b3e1811
Build base image in a different workflow
pzehner Feb 2, 2024
9dd14f4
Fix CI
pzehner Feb 2, 2024
3dfc193
Fix pull error on Ruche
pzehner Feb 2, 2024
5d43004
Force generation of main images
pzehner Feb 2, 2024
3f4cee9
Revert "Force generation of main images"
pzehner Feb 2, 2024
3f083b1
Add automatic base images generation and cleanup workflows
pzehner Feb 2, 2024
7f9e4fe
Collapse pull and run on Ruche
pzehner Feb 2, 2024
413a09a
Add comment for SYCL generic code
pzehner Feb 2, 2024
f76955f
Revert "Collapse pull and run on Ruche"
pzehner Feb 2, 2024
bd3d299
Add comment for test code
pzehner Feb 2, 2024
a856232
Fix crons
pzehner Feb 2, 2024
f4cf216
Add workflow documentation and add CI badge
pzehner Feb 6, 2024
259f856
Rename main CI workflow for badge
pzehner Feb 6, 2024
06a7190
Change build type to Release
pzehner Feb 6, 2024
79 changes: 79 additions & 0 deletions .github/workflows/__build_base.yaml
@@ -0,0 +1,79 @@
# Generate base images for each backend. Images are stored in the GitHub
# registry, unconditionally in Docker format and in Singularity format only if
# requested. Docker images are used for building Kokkos FFT, while Singularity
# images are used for test execution on the appropriate hardware. This workflow
# can only be invoked from another workflow.

name: Build base images

on:
  workflow_call:
    inputs:
      # suffix of the Docker and Singularity images
      image_name_suffix:
        required: false
        default: main
        type: string

env:
  # Force the use of BuildKit for Docker
  DOCKER_BUILDKIT: 1

jobs:
  build_base:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        backend:
          - name: openmp
            use_singularity: false
          - name: cuda
            use_singularity: true
          - name: hip
            use_singularity: false
          - name: sycl
            use_singularity: false

    steps:
      - name: Free Disk Space (Ubuntu)
        uses: jlumbroso/[email protected]
        with:
          tool-cache: true
          large-packages: false

      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Get Singularity
        env:
          SINGULARITY_VERSION: 3.11.2
        run: |
          wget https://github.com/sylabs/singularity/releases/download/v${{ env.SINGULARITY_VERSION }}/singularity-ce_${{ env.SINGULARITY_VERSION }}-jammy_amd64.deb
          sudo apt-get install ./singularity-ce_${{ env.SINGULARITY_VERSION }}-jammy_amd64.deb

      - name: Log in to GitHub Container Registry with Docker
        run: echo ${{ secrets.GITHUB_TOKEN }} | docker login ghcr.io -u ${{ github.actor }} --password-stdin

      - name: Log in to GitHub Container Registry with Singularity
        run: echo ${{ secrets.GITHUB_TOKEN }} | singularity remote login -u ${{ github.actor }} --password-stdin oras://ghcr.io

      - name: Build Docker image
        run: |
          docker build \
            -t ghcr.io/cexa-project/kokkos-fft/base_${{ matrix.backend.name }}_${{ inputs.image_name_suffix }} \
            --cache-from ghcr.io/cexa-project/kokkos-fft/base_${{ matrix.backend.name }}_main \
            --build-arg BUILDKIT_INLINE_CACHE=1 \
            --progress=plain \
            docker/${{ matrix.backend.name }}

      - name: Push Docker image
        run: docker push ghcr.io/cexa-project/kokkos-fft/base_${{ matrix.backend.name }}_${{ inputs.image_name_suffix }}

      - name: Convert Docker image to Singularity
        run: singularity build base.sif docker://ghcr.io/cexa-project/kokkos-fft/base_${{ matrix.backend.name }}_${{ inputs.image_name_suffix }}
        if: ${{ matrix.backend.use_singularity }}

      - name: Push Singularity image
        run: singularity push base.sif oras://ghcr.io/cexa-project/kokkos-fft/base_${{ matrix.backend.name }}_singularity_${{ inputs.image_name_suffix }}
        if: ${{ matrix.backend.use_singularity }}
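
The image naming convention used by the workflow above can be illustrated with a small shell helper. This is only a sketch: `image_ref` is a hypothetical function that is not part of the PR, and it assumes the registry path and the `base_<backend>_<suffix>` / `base_<backend>_singularity_<suffix>` patterns exactly as they appear in the workflow.

```shell
# Hypothetical helper (not part of the PR): compose the image reference
# following the naming pattern used in __build_base.yaml.
# Usage: image_ref <backend> <suffix> [singularity]
image_ref() {
    registry="ghcr.io/cexa-project/kokkos-fft"
    if [ "${3:-}" = "singularity" ]; then
        # Singularity images are pushed through the ORAS protocol
        printf '%s\n' "oras://${registry}/base_${1}_singularity_${2}"
    else
        # Docker images use the plain registry path
        printf '%s\n' "${registry}/base_${1}_${2}"
    fi
}

image_ref cuda main            # ghcr.io/cexa-project/kokkos-fft/base_cuda_main
image_ref cuda pr singularity  # oras://ghcr.io/cexa-project/kokkos-fft/base_cuda_singularity_pr
```

Keeping the Docker and Singularity names distinct lets both formats of the same backend live side by side in the registry.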
222 changes: 222 additions & 0 deletions .github/workflows/build_test.yaml
@@ -0,0 +1,222 @@
# Build and test Kokkos FFT using Docker and Singularity images. Pre-generated
# images are pulled from the GitHub registry; they are updated only if the
# current PR or commit modified the Docker files.

name: CI

on:
  pull_request:
    branches:
      - main

env:
  # Customize the CMake build type here (Release, Debug, RelWithDebInfo, etc.)
  BUILD_TYPE: Release

  # Force the use of BuildKit for Docker
  DOCKER_BUILDKIT: 1

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: DoozyX/[email protected]
        with:
          source: 'common/ fft/ examples/'
          exclude: ''
          extensions: 'hpp,cpp'
          clangFormatVersion: 12
  check_docker_files:
    runs-on: ubuntu-latest

    outputs:
      # true if any Docker file was modified in the PR (PR mode) or since last pushed commit (push mode)
      docker_files_have_changed: ${{ steps.get_changed_docker_files.outputs.any_changed == 'true' }}
      # use "pr" as image name suffix if in PR mode and if any Docker file was modified, otherwise use "main"
      # this prevents a PR test from altering the Docker images used by other PRs or by the main branch
      image_name_suffix: ${{ steps.get_changed_docker_files.outputs.any_changed == 'true' && github.event_name == 'pull_request' && 'pr' || 'main' }}

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get changed Dockerfiles
        id: get_changed_docker_files
        uses: tj-actions/changed-files@v42
        with:
          files: docker/**/Dockerfile

      - name: List changed Dockerfiles
        # the step id is get_changed_docker_files, not docker_files_have_changed
        if: ${{ steps.get_changed_docker_files.outputs.any_changed == 'true' }}
        env:
          ALL_CHANGED_FILES: ${{ steps.get_changed_docker_files.outputs.all_changed_files }}
        run: |
          # left unquoted on purpose: the list of files is space-separated
          for file in ${ALL_CHANGED_FILES}; do
            echo "$file was changed"
          done

  build_base:
    needs: check_docker_files

    if: ${{ needs.check_docker_files.outputs.docker_files_have_changed == 'true' }}

    uses: ./.github/workflows/__build_base.yaml

    with:
      image_name_suffix: ${{ needs.check_docker_files.outputs.image_name_suffix }}

  build:
    runs-on: ubuntu-latest

    needs:
      - check_docker_files
      - build_base

    # run this job even if build_base did not run
    if: ${{ ! cancelled() && (needs.build_base.result == 'success' || needs.build_base.result == 'skipped') }}

    strategy:
      matrix:
        backend:
          - name: openmp
            c_compiler: gcc
            cxx_compiler: g++
            cmake_flags: -DKokkos_ENABLE_OPENMP=ON
          - name: cuda
            c_compiler: gcc
            cxx_compiler: g++
            cmake_flags: -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_AMPERE80=ON
          - name: hip
            c_compiler: hipcc
            cxx_compiler: hipcc
            cmake_flags: -DKokkos_ENABLE_HIP=ON -DKokkos_ARCH_VEGA90A=ON
          - name: sycl
            c_compiler: icx
            cxx_compiler: icpx
            # building for Intel PVC was unsuccessful without the proper device
            # for now, we simply use generic Intel GPU code
            cmake_flags: -DKokkos_ENABLE_SYCL=ON -DKokkos_ARCH_INTEL_GEN=ON
        target:
          - name: native
            cmake_flags: ""
          - name: host_device
            cmake_flags: -DKokkosFFT_ENABLE_HOST_AND_DEVICE=ON
        exclude:
          - backend:
              name: openmp
            target:
              name: host_device

    steps:
      - name: Free Disk Space (Ubuntu)
        uses: jlumbroso/[email protected]
        with:
          tool-cache: true
          large-packages: false

      - name: Checkout built branch
        uses: actions/checkout@v4
        with:
          submodules: recursive

      - name: Configure
        run: |
          docker run -v ${{ github.workspace }}:/work ghcr.io/cexa-project/kokkos-fft/base_${{ matrix.backend.name }}_${{ needs.check_docker_files.outputs.image_name_suffix }} \
            cmake -B build \
            -DCMAKE_INSTALL_PREFIX=/work/install \
            -DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }} \
            -DCMAKE_C_COMPILER=${{ matrix.backend.c_compiler }} \
            -DCMAKE_CXX_COMPILER=${{ matrix.backend.cxx_compiler }} \
            -DCMAKE_CXX_STANDARD=17 \
            -DBUILD_TESTING=ON \
            -DKokkosFFT_INTERNAL_Kokkos=ON \
            ${{ matrix.backend.cmake_flags }} \
            ${{ matrix.target.cmake_flags }}

      - name: Build
        run: |
          docker run -v ${{ github.workspace }}:/work ghcr.io/cexa-project/kokkos-fft/base_${{ matrix.backend.name }}_${{ needs.check_docker_files.outputs.image_name_suffix }} \
            cmake --build build -j $(( $(nproc) * 2 + 1 ))

      - name: Prepare artifacts
        # this is mandatory to preserve execution rights
        run: tar -cvf tests_${{ matrix.backend.name }}.tar build/
        if: ${{ matrix.target.name == 'native' }}

      - name: Save artifacts
        # use v3 as more recent versions cannot run on Ruche
        uses: actions/upload-artifact@v3
        with:
          name: tests_${{ matrix.backend.name }}
          path: tests_${{ matrix.backend.name }}.tar
        if: ${{ matrix.target.name == 'native' }}

      - name: Install
        run: |
          docker run -v ${{ github.workspace }}:/work ghcr.io/cexa-project/kokkos-fft/base_${{ matrix.backend.name }}_${{ needs.check_docker_files.outputs.image_name_suffix }} \
            cmake --install build

      - name: Configure and build test code
        # Use the built and installed Kokkos FFT library to build the test code
        run: |
          docker run -v ${{ github.workspace }}:/work ghcr.io/cexa-project/kokkos-fft/base_${{ matrix.backend.name }}_${{ needs.check_docker_files.outputs.image_name_suffix }} \
            cmake -B build_test \
            -DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }} \
            -DCMAKE_C_COMPILER=${{ matrix.backend.c_compiler }} \
            -DCMAKE_CXX_COMPILER=${{ matrix.backend.cxx_compiler }} \
            -DCMAKE_CXX_STANDARD=17 \
            -DCMAKE_PREFIX_PATH=/work/install \
            install_test
          docker run -v ${{ github.workspace }}:/work ghcr.io/cexa-project/kokkos-fft/base_${{ matrix.backend.name }}_${{ needs.check_docker_files.outputs.image_name_suffix }} \
            cmake --build build_test -j $(( $(nproc) * 2 + 1 ))

  test:
    runs-on: ${{ matrix.backend.runner }}

    needs:
      - check_docker_files
      - build

    # run this job even if build_base did not run
    if: ${{ ! cancelled() && needs.build.result == 'success' }}

    strategy:
      matrix:
        backend:
          # run CUDA tests on Ruche supercomputer
          - name: cuda
            runner: [self-hosted, cuda]
          # run OpenMP tests on Azure server
          - name: openmp
            runner: ubuntu-latest

    steps:
      - name: Get artifacts
        # use v3 as more recent versions cannot run on Ruche
        uses: actions/download-artifact@v3
        with:
          name: tests_${{ matrix.backend.name }}

      - name: Deploy artifacts
        run: tar -xvf tests_${{ matrix.backend.name }}.tar

      - name: Pull Singularity image
        # pulling the image in advance seems necessary as sometimes invoking `singularity run` on the image URL fails because it cannot find ghcr.io
        run: singularity pull oras://ghcr.io/cexa-project/kokkos-fft/base_${{ matrix.backend.name }}_singularity_${{ needs.check_docker_files.outputs.image_name_suffix }}:latest
        if: ${{ matrix.backend.name == 'cuda' }}

      - name: Run CUDA tests within Slurm job and Singularity image
        run: |
          srun --nodes=1 --time=01:00:00 -p gpua100 --gres=gpu:1 \
            singularity run --nv --bind $PWD/build:/work/build -H /work/build base_${{ matrix.backend.name }}_singularity_${{ needs.check_docker_files.outputs.image_name_suffix }}_latest.sif \
            ctest
        if: ${{ matrix.backend.name == 'cuda' }}

      - name: Run OpenMP tests within Docker image
        run: |
          docker run -v $PWD/build:/work/build -w /work/build ghcr.io/cexa-project/kokkos-fft/base_${{ matrix.backend.name }}_${{ needs.check_docker_files.outputs.image_name_suffix }} \
            ctest
        if: ${{ matrix.backend.name == 'openmp' }}
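
Two expressions in the CI workflow above carry most of its decision logic: the choice of the image name suffix and the condition under which the `build` job runs. The shell sketch below restates them for illustration. The function names `image_suffix` and `build_should_run` are hypothetical and not part of the PR, and the `! cancelled()` part of the job condition is deliberately left out.

```shell
# Hypothetical helpers (not part of the PR) restating two workflow expressions.

# image_suffix <docker_files_changed: true|false> <event_name>
# Mirrors: any_changed == 'true' && event_name == 'pull_request' && 'pr' || 'main'
image_suffix() {
    if [ "$1" = "true" ] && [ "$2" = "pull_request" ]; then
        printf '%s\n' pr
    else
        printf '%s\n' main
    fi
}

# build_should_run <result of build_base>
# Mirrors the result check of the build job: build_base may have succeeded
# or may have been skipped because no Docker file changed.
build_should_run() {
    case "$1" in
        success|skipped) return 0 ;;
        *) return 1 ;;
    esac
}

image_suffix true pull_request    # prints "pr"
image_suffix false pull_request   # prints "main"
if build_should_run skipped; then echo "build runs"; fi
```

With this scheme, a PR that touches a Dockerfile builds and uses `*_pr` images, while every other PR reuses the `*_main` images.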
28 changes: 28 additions & 0 deletions .github/workflows/cleanup_base.yaml
@@ -0,0 +1,28 @@
# Periodically clean up Docker and Singularity images that are too old.

name: Cleanup base images

on:
  schedule:
    - cron: "0 3 2,16 * *" # every 2nd and 16th of the month at 3am UTC

jobs:
  cleanup:
    runs-on: ubuntu-latest

    steps:
      - name: Cleanup old images
        uses: SmartsquareGmbH/[email protected]
        with:
          type: container
          names: |
            base_cuda_main
            base_cuda_pr
            base_cuda_singularity_main
            base_cuda_singularity_pr
            base_hip_main
            base_hip_pr
            base_openmp_main
            base_openmp_pr
            base_sycl_main
            base_sycl_pr
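
To make the schedule above concrete, the cron entry `0 3 2,16 * *` reads as minute 0, hour 3, day of month 2 or 16, any month, any weekday. The sketch below checks a given UTC time against those fields. `matches_cleanup_schedule` is a hypothetical helper written for illustration and is not part of the PR.

```shell
# Hypothetical helper (not part of the PR): does a UTC time match the
# cron entry "0 3 2,16 * *" used by the cleanup workflow?
# Usage: matches_cleanup_schedule <day-of-month> <hour> <minute>
matches_cleanup_schedule() {
    [ "$3" -eq 0 ] || return 1    # minute field: 0
    [ "$2" -eq 3 ] || return 1    # hour field: 3
    case "$1" in                  # day-of-month field: 2,16
        2|16) return 0 ;;
        *) return 1 ;;
    esac
}

if matches_cleanup_schedule 16 3 0; then echo "cleanup runs"; fi
if ! matches_cleanup_schedule 15 3 0; then echo "no cleanup"; fi
```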