HIP and MPI+HIP updates #361

Open
wants to merge 51 commits into
base: master


ohearnk
Collaborator

@ohearnk ohearnk commented Apr 15, 2024

This MR ports the updated CUDA and MPI+CUDA codes (including the recently merged f-function optimizations) to HIP and MPI+HIP versions, respectively. It also begins to unify the CUDA and HIP versions to simplify future GPU code maintenance.

Additional work included in this PR covers the following items:

  • Integer-based atomic operation code paths (used in place of double-precision atomics, mainly for older GPUs lacking hardware support [pre-Pascal NVIDIA GPUs])
  • Wrappers around CUDA/HIP APIs (better error checking)
  • HIP diagonalization (rocsolver re-enabled, CPU code disabled)
  • Several bugs fixed (memory leaks, initialization errors)
  • ROCm version detection and disabling HIP builds against known buggy ROCm versions (>= v5.4.3)
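
To illustrate the first bullet, here is a minimal host-side C++ sketch of accumulating doubles through an integer atomic using a fixed-point encoding. It is not taken from the QUICK sources: the scale factor `kScale` and the helper names are hypothetical, and `std::atomic` stands in for the GPU integer atomics.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>

// Hypothetical fixed-point scale: one integer step represents 1e-12.
constexpr double kScale = 1.0e12;

// Add a double to an integer accumulator by converting it to fixed point.
// Anything smaller than half a step (5e-13 here) rounds away to zero.
inline void fixed_point_add(std::atomic<long long>& acc, double value) {
    assert(std::isfinite(value));
    const long long scaled = std::llround(value * kScale);
    acc.fetch_add(scaled, std::memory_order_relaxed);
}

// Convert the integer accumulator back to a double.
inline double fixed_point_value(const std::atomic<long long>& acc) {
    return static_cast<double>(acc.load(std::memory_order_relaxed)) / kScale;
}
```

With this encoding, adding 0.25 and then 1.0e-15 leaves the accumulator at exactly 0.25, because the second contribution is below the chosen resolution. That loss of small contributions is the trade-off of an integer-based path.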

Closes #344.

…MPI+HIP codes. Unify CUDA and HIP code paths (CUDA / HIP => GPU, CUDA_MPIV / HIP_MPIV => MPIV_GPU, etc.).
@ohearnk ohearnk requested review from agoetz and Madu86 April 15, 2024 03:06
@ohearnk ohearnk self-assigned this Apr 15, 2024
@ohearnk
Collaborator Author

ohearnk commented Apr 15, 2024

As noted above, all tests (full test suite) are passing for the CUDA and MPI+CUDA (1 GPU) versions. However, some tests are failing for the HIP and MPI+HIP versions. See the logs below from tests on the MI210s on the AMD AAC. Interestingly, the test failures are slightly different between the HIP and MPI+HIP versions.

This is a bit difficult to debug, at least when comparing against the working HIP / MPI+HIP versions from the 23.08b release, as there are also a number of test failures there. It may be better to pick a commit from before the f-function optimizations and run the tests there for comparison.

Test configuration on the AMD AAC:

  • RHEL9 partition (1CN128C8G2H_2IB_MI210_RHEL9)
  • ROCm v5.7.1, UCX v1.15.0, OpenMPI v4.1.6, GCC v11.3.1 (gfortran)
  • f-function support disabled
  • CMake configuration (HIP version):
cmake .. -DCOMPILER=MANUAL -DCMAKE_C_COMPILER=hipcc -DCMAKE_CXX_COMPILER=hipcc -DCMAKE_Fortran_COMPILER=gfortran -DMPI= -DHIP=TRUE -DQUICK_USER_ARCH=gfx90a -DENABLEF= -DCMAKE_INSTALL_PREFIX=${PWD}/../install_rhel9_hip_gfx90a_rocm5.7.1_ucx1.15.0_ompi4.1.6 -DHIP_TOOLKIT_ROOT_DIR=/shared/apps/rhel9/opt/rocm-5.7.1

HIP test summary and diffs:
runtest_hip.log
hip_test_diffs.log

MPI+HIP test summary and diffs:
runtest_mpi_hip_1gpu.log
mpi_hip_test_diffs.log

ohearnk added 11 commits April 17, 2024 21:47
…preprocessor definitions for performance and storage considerations. Refactor preprocessor definitions to avoid unnecessary arithmetic.
…regarding STORE_OPERATOR). Fix segfault in debug builds of GPU code when ERI f-function support is not enabled but the basis contains f functions. Remove unneeded DGEMM operation in CUDA codes in SCF/USCF methods. Other code clean-up.
…power functions (inlined device functions calling pow to preprocessor definitions using multiplication operations). Other code clean-up.
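
The last commit message above describes replacing calls to `pow` with preprocessor definitions built from multiplications. A rough sketch of that idea in plain C++ follows. The macro and function names (`SQR`, `CUBE`, `poly_with_pow`, `poly_with_mul`) are hypothetical and do not come from the QUICK sources.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical macros: small integer powers written as multiplications.
// Each argument is evaluated more than once, so it must have no side effects.
#define SQR(x)  ((x) * (x))
#define CUBE(x) ((x) * (x) * (x))

// Reference form using the general-purpose library routine.
inline double poly_with_pow(double r) {
    return std::pow(r, 2.0) + std::pow(r, 3.0);
}

// Same polynomial written with the multiplication macros.
inline double poly_with_mul(double r) {
    return SQR(r) + CUBE(r);
}

// Check that both forms agree to within a relative tolerance of 1e-12.
inline void check_forms_agree(double r) {
    const double a = poly_with_pow(r);
    const double b = poly_with_mul(r);
    assert(std::fabs(a - b) <= 1.0e-12 * std::fmax(1.0, std::fabs(a)));
}
```

The two forms can differ in the last bits because the multiplication chain rounds after each product, so comparisons between them should use a tolerance rather than exact equality.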
@ohearnk ohearnk force-pushed the hip-f-func-porting branch from fbd9602 to f937da6 Compare June 26, 2024 18:33
ohearnk added 13 commits July 1, 2024 11:09
…s. Add CMake option to enable LLVM-based address sanitizer (ASAN) for debugging with HIP builds.
… and replace with emulation at full double precision for pre-Pascal NVIDIA GPUs (previously toggled via USE_LEGACY_ATOMICS). Note that the old code was leading to slow and possibly failing SCF convergence, which was only exposed during testing with tighter density matrix convergence thresholds and integral cut-offs. This is likely due to the truncation used for energy and gradient calculations (1e-6 and 1e-12, respectively).
…t-offs (abs -> fabs). Tune exchange correlation code.
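
The commit messages above mention emulating atomic operations at full double precision. One widely used way to do this is a compare-and-swap loop on the 64-bit representation of the double. The sketch below is a host-side C++ analog using `std::atomic`. It is an assumption that the PR uses a loop of this kind, and the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "double must be 64 bits");

// Reinterpret a double as its raw 64-bit pattern (memcpy avoids aliasing issues).
inline std::uint64_t double_to_bits(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Reinterpret a raw 64-bit pattern as a double.
inline double bits_to_double(std::uint64_t u) {
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}

// Emulated atomic add on a double stored as a 64-bit word.
// Returns the value held before the addition.
inline double emulated_atomic_add(std::atomic<std::uint64_t>& cell, double value) {
    std::uint64_t observed = cell.load(std::memory_order_relaxed);
    for (;;) {
        const std::uint64_t desired =
            double_to_bits(bits_to_double(observed) + value);
        // On failure, compare_exchange_weak reloads `observed` and the loop retries.
        if (cell.compare_exchange_weak(observed, desired,
                                       std::memory_order_relaxed)) {
            return bits_to_double(observed);
        }
    }
}
```

Because the addition is done in ordinary double arithmetic, a contribution such as 1.0e-15 added to 0.25 is retained rather than rounded away, at the cost of possible retries under contention.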
@ohearnk ohearnk force-pushed the hip-f-func-porting branch from e9e44dc to 1840518 Compare October 14, 2024 13:59
@ohearnk
Collaborator Author

ohearnk commented Nov 7, 2024

For record keeping, AMD engineers confirmed in mid-2024 that the QUICK HIP codes (one- and two-electron) were triggering a bug present in several versions of ROCm/HIP (register spill/fill bug, v5.4.3 - v6.2.0). Commits in this PR detect the ROCm version being built against and refuse to build if a known afflicted version is detected. If a future workaround can be found that does not negatively impact performance, support for these versions may be restored.
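
As a sketch of the version gate described above, the following C++ fragment checks whether a version triple falls inside the reported range. It assumes the range v5.4.3 - v6.2.0 is inclusive at both ends. The `Version` struct and function names are hypothetical, and how the PR actually obtains the ROCm version at configure time is not shown here.

```cpp
#include <cassert>

// Hypothetical version triple; field names avoid clashing with system macros.
struct Version {
    int major_num;
    int minor_num;
    int patch_num;
};

// Lexicographic comparison: major first, then minor, then patch.
constexpr bool version_less(Version a, Version b) {
    return (a.major_num != b.major_num) ? (a.major_num < b.major_num)
         : (a.minor_num != b.minor_num) ? (a.minor_num < b.minor_num)
         : (a.patch_num < b.patch_num);
}

// Range taken from the comment above; inclusive bounds are an assumption.
constexpr Version kFirstAfflicted{5, 4, 3};
constexpr Version kLastAfflicted{6, 2, 0};

// True when v lies in [kFirstAfflicted, kLastAfflicted].
constexpr bool is_afflicted(Version v) {
    return !version_less(v, kFirstAfflicted) && !version_less(kLastAfflicted, v);
}

// Compile-time sanity checks on the boundaries.
static_assert(is_afflicted(Version{5, 4, 3}), "lower bound is inside the range");
static_assert(!is_afflicted(Version{5, 4, 2}), "just below the range is accepted");
```

A build could then refuse to proceed with something like `static_assert(!is_afflicted(detected), "...")`, where `detected` is supplied by the build system. Note that ROCm v5.7.1, used for the tests reported earlier in this thread, falls inside the range.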

@ohearnk ohearnk force-pushed the hip-f-func-porting branch from d1be629 to 52c8e65 Compare December 5, 2024 20:01
@ohearnk ohearnk force-pushed the hip-f-func-porting branch from 52c8e65 to 53c25af Compare December 9, 2024 18:19
@ohearnk ohearnk marked this pull request as ready for review December 18, 2024 18:16
@ohearnk ohearnk force-pushed the hip-f-func-porting branch from d2f68c5 to b43412e Compare January 9, 2025 20:24
Development

Successfully merging this pull request may close these issues.

HIP and MPI+HIP builds broken since adding f-function support (PR #312)