Distributed Ensemble (MPI Support) (#1090)
Add optional support for distributing simulations within an ensemble across multiple machines via MPI

* CMake: Add FLAMEGPU_ENABLE_MPI option.
* Breaking: CUDAEnsemble::getLogs() now returns a map. The key corresponds to the index of the corresponding RunPlan within the input RunPlanVector.
* BugFix: Ensemble log already exist exception message contained bad filepath.
* BugFix: Replace occurrences of throw with THROW
* MPI tests in new target tests_mpi
* Warn about MPI link failures with CMake < 3.20.1
* Warn at CMake Configure when mpich forces -flto.
* CI: Add MPI CI
* Assigns GPUs to MPI ranks per node, allowing more flexible MPI configurations

MPI ensembles can use multiple MPI ranks per node, evenly(ish) distributing GPUs across the ranks per shared memory system.

If more MPI ranks are used on a node than GPUs, additional ranks will do nothing and a warning is reported. I.e. any number of MPI ranks can be launched, but only the sensible amount will be used.

If the user specifies device indices, they will be load balanced, otherwise all visible devices within the node will be balanced.

Only one rank per node sends the device string back for telemetry, others send back an empty string (as the assembleGPUsString method expects a message from each rank in the world).

If no valid CUDA devices are provided, an exception is raised.

Device allocation is implemented in a static method so it can be tested programmatically, without launching the test N times with different MPI configurations.

Closes #1114

---------

Co-authored-by: Peter Heywood <[email protected]>
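The per-node GPU assignment described in the commit message can be illustrated with a minimal sketch. This is not the FLAME GPU implementation: the function name `devices_for_rank`, its parameters, and the choice of contiguous blocks are assumptions made for illustration only. It takes a rank's index within its node, the number of ranks on that node, and the number of visible devices, and returns the device indices that rank would use.

```python
from typing import List, Optional, Sequence


def devices_for_rank(
    local_rank: int,
    local_size: int,
    device_count: int,
    requested: Optional[Sequence[int]] = None,
) -> List[int]:
    """Hypothetical sketch of balancing GPUs across the MPI ranks of one node."""
    # Candidate devices: the valid user-requested indices, else all visible devices.
    if requested:
        candidates = sorted({d for d in requested if 0 <= d < device_count})
    else:
        candidates = list(range(device_count))
    if not candidates:
        # Mirrors "if no valid CUDA devices are provided, an exception is raised".
        raise ValueError("no valid CUDA devices available on this node")

    if local_size >= len(candidates):
        # At least as many ranks as devices: the first ranks take one device
        # each, the remaining ranks get nothing (the caller would warn).
        return [candidates[local_rank]] if local_rank < len(candidates) else []

    # Fewer ranks than devices: contiguous blocks, with the first `extra`
    # ranks receiving one additional device ("evenly(ish)").
    base, extra = divmod(len(candidates), local_size)
    start = local_rank * base + min(local_rank, extra)
    count = base + (1 if local_rank < extra else 0)
    return candidates[start:start + count]


if __name__ == "__main__":
    # Three visible devices shared by two ranks on the same node.
    for rank in range(2):
        print(rank, devices_for_rank(rank, 2, 3))
```

Keeping the allocation in a pure function like this reflects the motivation given in the commit message: the logic can be unit tested for many rank and device combinations without launching the test suite under different MPI configurations.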
Showing 37 changed files with 2,340 additions and 392 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,261 @@ | ||
# Perform builds with supported MPI versions | ||
name: MPI | ||
|
||
on: | ||
# On pull_requests which mutate this CI workflow, using pr rather than push:branch to ensure that the merged state will be OK, as that is what is important here. | ||
pull_request: | ||
paths: | ||
- ".github/workflows/MPI.yml" | ||
- "tests/test_cases/simulation/test_mpi_ensemble.cu" | ||
- "include/flamegpu/simulation/detail/MPISimRunner.h" | ||
- "include/flamegpu/simulation/detail/AbstractSimRunner.h" | ||
- "src/flamegpu/simulation/detail/MPISimRunner.cu" | ||
- "src/flamegpu/simulation/detail/AbstractSimRunner.cu" | ||
- "src/flamegpu/simulation/CUDAEnsemble.cu" | ||
- "**mpi**" | ||
- "**MPI**" | ||
# Or trigger on manual dispatch. | ||
workflow_dispatch: | ||
|
||
defaults: | ||
run: | ||
# Default to using bash regardless of OS unless otherwise specified. | ||
shell: bash | ||
|
||
jobs: | ||
build-ubuntu: | ||
runs-on: ${{ matrix.cudacxx.os }} | ||
strategy: | ||
fail-fast: false | ||
# Multiplicative build matrix | ||
# optional exclude: can be partial, include: must be specific | ||
matrix: | ||
# CUDA_ARCH values are reduced compared to wheels due to CI memory issues while compiling the test suite. | ||
cudacxx: | ||
- cuda: "12.0" | ||
cuda_arch: "50-real;" | ||
hostcxx: gcc-11 | ||
os: ubuntu-22.04 | ||
python: | ||
- "3.8" | ||
mpi: | ||
- lib: "openmpi" | ||
version: "apt" # MPI 3.1 | ||
# - lib: "openmpi" | ||
# version: "4.1.6" # MPI 3.1 | ||
# - lib: "openmpi" | ||
# version: "4.0.0" # MPI 3.1 | ||
# - lib: "openmpi" | ||
# version: "3.0.0" # MPI 3.1 | ||
# - lib: "openmpi" | ||
# version: "2.0.0" # MPI 3.1 | ||
- lib: "openmpi" | ||
version: "1.10.7" # MPI 3.0 | ||
- lib: "mpich" | ||
version: "apt" # MPI 4.0 | ||
# - lib: "mpich" | ||
# version: "4.1.2" # MPI 4.0 | ||
# - lib: "mpich" | ||
# version: "4.0" # MPI 4.0 | ||
# - lib: "mpich" | ||
# version: "3.4.3" # MPI 3.1 | ||
- lib: "mpich" | ||
version: "3.3" # MPI 3.1 | ||
config: | ||
- name: "Release" | ||
config: "Release" | ||
SEATBELTS: "ON" | ||
VISUALISATION: | ||
- "OFF" | ||
|
||
# Name the job based on matrix/env options | ||
name: "build-ubuntu-mpi (${{ matrix.mpi.lib }}, ${{ matrix.mpi.version }}, ${{ matrix.cudacxx.cuda }}, ${{matrix.python}}, ${{ matrix.VISUALISATION }}, ${{ matrix.config.name }}, ${{ matrix.cudacxx.os }})" | ||
|
||
# Define job-wide env constants, and promote matrix elements to env constants for portable steps. | ||
env: | ||
# Define constants | ||
BUILD_DIR: "build" | ||
FLAMEGPU_BUILD_TESTS: "ON" | ||
# Conditional based on matrix via awkward almost ternary | ||
FLAMEGPU_BUILD_PYTHON: ${{ fromJSON('{true:"ON",false:"OFF"}')[matrix.python != ''] }} | ||
# Port matrix options to environment, for more portability. | ||
CUDA: ${{ matrix.cudacxx.cuda }} | ||
CUDA_ARCH: ${{ matrix.cudacxx.cuda_arch }} | ||
HOSTCXX: ${{ matrix.cudacxx.hostcxx }} | ||
OS: ${{ matrix.cudacxx.os }} | ||
CONFIG: ${{ matrix.config.config }} | ||
FLAMEGPU_SEATBELTS: ${{ matrix.config.SEATBELTS }} | ||
PYTHON: ${{ matrix.python }} | ||
MPI_LIB: ${{ matrix.mpi.lib }} | ||
MPI_VERSION: ${{ matrix.mpi.version }} | ||
VISUALISATION: ${{ matrix.VISUALISATION }} | ||
|
||
steps: | ||
- uses: actions/checkout@v3 | ||
|
||
- name: Install CUDA | ||
if: ${{ startswith(env.OS, 'ubuntu') && env.CUDA != '' }} | ||
env: | ||
cuda: ${{ env.CUDA }} | ||
run: .github/scripts/install_cuda_ubuntu.sh | ||
|
||
- name: Install/Select gcc and g++ | ||
if: ${{ startsWith(env.HOSTCXX, 'gcc-') }} | ||
run: | | ||
gcc_version=${HOSTCXX//gcc-/} | ||
sudo apt-get install -y gcc-${gcc_version} g++-${gcc_version} | ||
echo "CC=/usr/bin/gcc-${gcc_version}" >> $GITHUB_ENV | ||
echo "CXX=/usr/bin/g++-${gcc_version}" >> $GITHUB_ENV | ||
echo "CUDAHOSTCXX=/usr/bin/g++-${gcc_version}" >> $GITHUB_ENV | ||
- name: Install MPI from apt | ||
if: ${{ env.MPI_VERSION == 'apt' }} | ||
working-directory: ${{ runner.temp }} | ||
run: | | ||
sudo apt-get install lib${{ env.MPI_LIB }}-dev | ||
- name: Install OpenMPI from source | ||
if: ${{ env.MPI_VERSION != 'apt' && env.MPI_LIB == 'openmpi' }} | ||
working-directory: ${{ runner.temp }} | ||
run: | | ||
# Note: using download.open-mpi.org as gh tags aren't pre configured | ||
MPI_VERSION_MAJOR_MINOR=$(cut -d '.' -f 1,2 <<< "${{ env.MPI_VERSION}}") | ||
echo "https://download.open-mpi.org/release/open-mpi/v${MPI_VERSION_MAJOR_MINOR}/openmpi-${{ env.MPI_VERSION}}.tar.gz" | ||
wget -q https://download.open-mpi.org/release/open-mpi/v${MPI_VERSION_MAJOR_MINOR}/openmpi-${{ env.MPI_VERSION}}.tar.gz --output-document openmpi-${{ env.MPI_VERSION }}.tar.gz || (echo "An Error occurred while downloading OpenMPI '${{ env.MPI_VERSION }}'. Is it a valid version of OpenMPI?" && exit 1) | ||
tar -zxvf openmpi-${{ env.MPI_VERSION }}.tar.gz | ||
cd openmpi-${{ env.MPI_VERSION}} | ||
./configure --prefix="${{ runner.temp }}/mpi" | ||
make -j `nproc` | ||
make install -j `nproc` | ||
echo "${{ runner.temp }}/mpi/bin" >> $GITHUB_PATH | ||
echo "LD_LIBRARY_PATH=${{ runner.temp }}/mpi/lib:${LD_LIBRARY_PATH}" >> $GITHUB_ENV | ||
echo "LD_RUN_PATH=${{ runner.temp }}/mpi/lib:${LD_RUN_PATH}" >> $GITHUB_ENV | ||
# This will only work for mpich >= 3.3: | ||
# 3.0-3.2 doesn't appear compatible with default gcc in 22.04. | ||
# 1.x is named mpich2 so requires handling differently | ||
# Uses the ch3 interface, as ch4 isn't available pre 3.4, but one must be specified for some versions | ||
- name: Install MPICH from source | ||
if: ${{ env.MPI_VERSION != 'apt' && env.MPI_LIB == 'mpich' }} | ||
working-directory: ${{ runner.temp }} | ||
run: | | ||
MPI_MAJOR=$(cut -d '.' -f 1 <<< "${{ env.MPI_VERSION}}") | ||
MPI_MINOR=$(cut -d '.' -f 2 <<< "${{ env.MPI_VERSION}}") | ||
(( MPI_MAJOR < 3 )) && echo "MPICH must be >= 3.0" && exit 1 | ||
echo "https://www.mpich.org/static/downloads/${{ env.MPI_VERSION }}/mpich-${{ env.MPI_VERSION}}.tar.gz" | ||
wget -q https://www.mpich.org/static/downloads/${{ env.MPI_VERSION }}/mpich-${{ env.MPI_VERSION}}.tar.gz --output-document mpich-${{ env.MPI_VERSION }}.tar.gz || (echo "An Error occurred while downloading MPICH '${{ env.MPI_VERSION }}'. Is it a valid version of MPICH?" && exit 1) | ||
tar -zxvf mpich-${{ env.MPI_VERSION }}.tar.gz | ||
cd mpich-${{ env.MPI_VERSION}} | ||
DISABLE_FORTRAN_FLAGS="" | ||
if (( ${MPI_MAJOR} >= 4 )) || ( ((${MPI_MAJOR} >= 3)) && ((${MPI_MINOR} >= 2)) ); then | ||
# MPICH >= 3.2 has --disable-fortran | ||
DISABLE_FORTRAN_FLAGS="--disable-fortran" | ||
else | ||
DISABLE_FORTRAN_FLAGS="--disable-f77 --disable-fc" | ||
fi | ||
./configure --prefix="${{ runner.temp }}/mpi" --with-device=ch3 ${DISABLE_FORTRAN_FLAGS} | ||
make -j `nproc` | ||
make install -j `nproc` | ||
echo "${{ runner.temp }}/mpi/bin" >> $GITHUB_PATH | ||
echo "LD_LIBRARY_PATH=${{ runner.temp }}/mpi/lib:${LD_LIBRARY_PATH}" >> $GITHUB_ENV | ||
echo "LD_RUN_PATH=${{ runner.temp }}/mpi/lib:${LD_RUN_PATH}" >> $GITHUB_ENV | ||
- name: Select Python | ||
if: ${{ env.PYTHON != '' && env.FLAMEGPU_BUILD_PYTHON == 'ON' }} | ||
uses: actions/setup-python@v4 | ||
with: | ||
python-version: ${{ env.PYTHON }} | ||
|
||
- name: Install python dependencies | ||
if: ${{ env.PYTHON != '' && env.FLAMEGPU_BUILD_PYTHON == 'ON' }} | ||
run: | | ||
sudo apt-get install python3-venv | ||
python3 -m pip install --upgrade wheel build setuptools | ||
- name: Install Visualisation Dependencies | ||
if: ${{ startswith(env.OS, 'ubuntu') && env.VISUALISATION == 'ON' }} | ||
run: | | ||
# Install ubuntu-20.04 packages | ||
if [ "$OS" == 'ubuntu-20.04' ]; then | ||
sudo apt-get install -y libglew-dev libfontconfig1-dev libsdl2-dev libdevil-dev libfreetype-dev | ||
fi | ||
# Install Ubuntu 18.04 packages | ||
if [ "$OS" == 'ubuntu-18.04' ]; then | ||
sudo apt-get install -y libglew-dev libfontconfig1-dev libsdl2-dev libdevil-dev libfreetype6-dev libgl1-mesa-dev | ||
fi | ||
- name: Install Swig >= 4.0.2 | ||
run: | | ||
# Remove existing swig install, so CMake finds the correct swig | ||
if [ "$OS" == 'ubuntu-20.04' ]; then | ||
sudo apt-get remove -y swig swig4.0 | ||
fi | ||
# Install Ubuntu 18.04 packages | ||
if [ "$OS" == 'ubuntu-18.04' ]; then | ||
sudo apt-get remove -y swig | ||
fi | ||
# Install additional apt-based dependencies required to build swig 4.0.2 | ||
sudo apt-get install -y bison | ||
# Create a local directory to build swig in. | ||
mkdir -p swig-from-source && cd swig-from-source | ||
# Install SWIG building from source dependencies | ||
wget https://github.com/swig/swig/archive/refs/tags/v4.0.2.tar.gz | ||
tar -zxf v4.0.2.tar.gz | ||
cd swig-4.0.2/ | ||
./autogen.sh | ||
./configure | ||
make | ||
sudo make install | ||
# This pre-emptively patches a bug where ManyLinux didn't generate buildnumber as git dir was owned by diff user | ||
- name: Enable git safe-directory | ||
run: git config --global --add safe.directory $GITHUB_WORKSPACE | ||
|
||
- name: Configure cmake | ||
run: > | ||
cmake . -B "${{ env.BUILD_DIR }}" | ||
-DCMAKE_BUILD_TYPE="${{ env.CONFIG }}" | ||
-Werror=dev | ||
-DCMAKE_WARN_DEPRECATED="OFF" | ||
-DFLAMEGPU_WARNINGS_AS_ERRORS="ON" | ||
-DCMAKE_CUDA_ARCHITECTURES="${{ env.CUDA_ARCH }}" | ||
-DFLAMEGPU_BUILD_TESTS="${{ env.FLAMEGPU_BUILD_TESTS }}" | ||
-DFLAMEGPU_BUILD_PYTHON="${{ env.FLAMEGPU_BUILD_PYTHON }}" | ||
-DPYTHON3_EXACT_VERSION="${{ env.PYTHON }}" | ||
-DFLAMEGPU_VISUALISATION="${{ env.VISUALISATION }}" | ||
-DFLAMEGPU_ENABLE_MPI="ON" | ||
-DFLAMEGPU_ENABLE_NVTX="ON" | ||
${MPI_OVERRIDE_CXX_OPTIONS} | ||
- name: Reconfigure cmake fixing MPICH from apt | ||
if: ${{ env.MPI_VERSION == 'apt' && env.MPI_LIB == 'mpich' }} | ||
run: > | ||
cmake . -B "${{ env.BUILD_DIR }}" | ||
-DMPI_CXX_COMPILE_OPTIONS="" | ||
- name: Build static library | ||
working-directory: ${{ env.BUILD_DIR }} | ||
run: cmake --build . --target flamegpu --verbose -j `nproc` | ||
|
||
- name: Build python wheel | ||
if: ${{ env.FLAMEGPU_BUILD_PYTHON == 'ON' }} | ||
working-directory: ${{ env.BUILD_DIR }} | ||
run: cmake --build . --target pyflamegpu --verbose -j `nproc` | ||
|
||
- name: Build tests | ||
if: ${{ env.FLAMEGPU_BUILD_TESTS == 'ON' }} | ||
working-directory: ${{ env.BUILD_DIR }} | ||
run: cmake --build . --target tests --verbose -j `nproc` | ||
|
||
- name: Build tests_mpi | ||
if: ${{ env.FLAMEGPU_BUILD_TESTS == 'ON' }} | ||
working-directory: ${{ env.BUILD_DIR }} | ||
run: cmake --build . --target tests_mpi --verbose -j `nproc` | ||
|
||
- name: Build ensemble example | ||
working-directory: ${{ env.BUILD_DIR }} | ||
run: cmake --build . --target ensemble --verbose -j `nproc` | ||
|
||
- name: Build all remaining targets | ||
working-directory: ${{ env.BUILD_DIR }} | ||
run: cmake --build . --target all --verbose -j `nproc` |
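The version-dependent configure flags chosen in the "Install MPICH from source" step can be checked outside the workflow with a small sketch. The helper name `mpich_configure_flags` is hypothetical and not part of the repository; it only mirrors the branching shown in the workflow (reject versions below 3.0, use `--disable-fortran` from 3.2 onwards, otherwise `--disable-f77 --disable-fc`, always with `--with-device=ch3`), and compares version components as integers rather than as strings.

```python
from typing import List


def mpich_configure_flags(version: str) -> List[str]:
    """Hypothetical helper mirroring the workflow's MPICH flag selection."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0

    # The workflow refuses MPICH releases older than 3.0.
    if major < 3:
        raise ValueError("MPICH must be >= 3.0")

    # The ch3 device is always requested, as in the workflow's configure call.
    flags = ["--with-device=ch3"]

    # Integer tuple comparison avoids the pitfalls of comparing digits as text.
    if (major, minor) >= (3, 2):
        flags.append("--disable-fortran")
    else:
        flags += ["--disable-f77", "--disable-fc"]
    return flags


if __name__ == "__main__":
    # The two versions named in the matrix comments for source builds.
    for v in ("3.3", "4.0"):
        print(v, " ".join(mpich_configure_flags(v)))
```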