HIP and MPI+HIP updates #361

Open
wants to merge 51 commits into
base: master
Commits
c388fde
Port updated CUDA and MPI+CUDA codes (f function support) to HIP and …
ohearnk Apr 15, 2024
534ea10
Merge branch 'master' into hip-f-func-porting.
ohearnk Apr 18, 2024
43e5ad9
Merge branch 'master' into hip-f-func-porting.
ohearnk Apr 19, 2024
a77c08c
Fix uninitialized variable usage.
ohearnk Apr 19, 2024
5e3f244
Add missed file during CUDA source conversion via hipify-perl (*.cuh).
ohearnk Apr 19, 2024
6510281
Fix source file permissions. Remove unused code.
ohearnk Apr 19, 2024
4815e0b
Merge branch 'master' into hip-f-func-porting.
ohearnk Apr 30, 2024
779130d
Deduplicate GPU codes (CUDA/HIP). Change several static constants to …
ohearnk May 1, 2024
c0a1a64
Fix include path for HIP builds. Match preprocessor controlled code p…
ohearnk May 1, 2024
9d56048
Fix initialized data in GPU code (1e and 2e integrals, address later …
ohearnk Jun 22, 2024
0bf89e1
Conditionally compile ROCsolver code (for SCF diagonalizations) if to…
ohearnk Jun 26, 2024
f937da6
Further GPU code deduplication. Use faster math functions for simple …
ohearnk Jun 26, 2024
815b8c9
Remove unnecessary DGEMM in SCF for CUDA GPU codepaths.
ohearnk Jul 1, 2024
72782c8
Ensure QUICK GPU architectures are always set correctly for HIP build…
ohearnk Jul 1, 2024
01315f6
Remove improper legacy atomic support for double precision arithmetic…
ohearnk Aug 20, 2024
2ea2acc
Fix declaration for emulated double precision atomic addition.
ohearnk Aug 20, 2024
dc9031b
Reduce the number of atomics used during computation of operator mati…
ohearnk Aug 21, 2024
55d9ccd
Reduce the number of atomics used during computation of operator mati…
ohearnk Aug 21, 2024
e2b3f9b
Fix memory leaks.
ohearnk Aug 22, 2024
b15e410
Fix deallocation issue.
ohearnk Aug 22, 2024
0f0085c
OEI code tuning.
ohearnk Aug 22, 2024
6aa0c5d
Hand-tune ERI gradient code.
ohearnk Aug 22, 2024
5a0f95e
More ERI gradient tuning. Other code clean-up.
ohearnk Aug 23, 2024
609e55a
Fix truncation of double precision absolute value calculations for cu…
ohearnk Aug 30, 2024
d57901c
Remove superfluous arithmetic in generated one electron integral code.
ohearnk Aug 30, 2024
bcb47de
Indexing changes for intermediate data structures.
ohearnk Sep 19, 2024
6940f20
Remove stack size limits and cache config hints.
ohearnk Sep 19, 2024
1cf87a1
Revert commit 609e55a.
ohearnk Oct 2, 2024
dfe1173
Add ROCm version check (via hipcc) to fail for known versions afflict…
ohearnk Oct 2, 2024
ff9200d
Fix ROCm/HIP version check.
ohearnk Oct 3, 2024
1840518
Merge branch 'master' into hip-f-func-porting.
ohearnk Oct 14, 2024
aa23ee6
Merge branch 'master' into hip-f-func-porting.
ohearnk Oct 21, 2024
c2e6690
Merge branch 'master' into hip-f-func-porting.
ohearnk Oct 24, 2024
2d7d964
Revert commit 01315f6. Simplify legacy atomics code.
ohearnk Oct 25, 2024
337a584
Configure rocBLAS source directory by ROCm version. Re-enable legacy …
ohearnk Oct 25, 2024
6affe35
Fix non-legacy atomics in gradient calculations.
ohearnk Oct 26, 2024
0f35424
Standardize type conversions in legacy atomic codepath.
ohearnk Oct 28, 2024
e0110a8
More type conversion standization for legacy atomics (host-side code).
ohearnk Oct 28, 2024
63af5a1
Fix integer atomics conversion in HIP codes.
ohearnk Oct 30, 2024
9eb63a4
Correctly scope emulated double precision atomics with integer atomics.
ohearnk Oct 30, 2024
53c25af
Add wrappers for GPU functions.
ohearnk Dec 5, 2024
508fc52
Merge branch 'master' into hip-f-func-porting.
ohearnk Dec 10, 2024
f3bfa32
HIP: switch to non-deprecated functions. Use native atomic functions.
ohearnk Dec 11, 2024
870e0df
Remove unused code.
ohearnk Dec 18, 2024
f98e306
Further restrict HIP builds to known working ROCm versions.
ohearnk Dec 18, 2024
3724740
Disable diagonalization on the GPU with rocSOLVER for older ROCm vers…
ohearnk Dec 21, 2024
4133692
Make error messages GPU-agnostic.
ohearnk Dec 24, 2024
c04b2b8
Update log file citation to match that in README.
ohearnk Jan 6, 2025
5e4c79a
Update README to reflect HIP support being restored.
ohearnk Jan 7, 2025
0456378
Add GFX942 target to CMake flags (QUICK_USER_ARCH) for supporting bui…
ohearnk Jan 7, 2025
b43412e
Clean up ERI f-function specific code (ffff integrals). Remove unused…
ohearnk Jan 9, 2025
9 changes: 4 additions & 5 deletions README.md
@@ -24,7 +24,7 @@ Features
* Supports QM/MM calculations with Amber22 and later
* Fortran API to use QUICK as QM energy and force engine
* MPI parallelization for CPU platforms
-* Massively parallel GPU implementation via CUDA/HIP for Nvidia/AMD GPUs (HIP available in QUICK-23.08, currently disabled)
+* Massively parallel GPU implementation via CUDA/HIP for Nvidia/AMD GPUs
* Multi-GPU support via MPI + CUDA/HIP, also across multiple compute nodes

Limitations
@@ -36,7 +36,6 @@ Limitations
* Effective core potentials (ECPs) are not supported
* DFT calculations are performed exclusively using the SG1 grid system
* No meta-GGA functionals, no range-separated hybrid functionals
-* HIP (AMD GPU support) is currently disabled (available in QUICK-23.08 but not QUICK-24.03)

Installation
------------
@@ -61,9 +60,9 @@ Citation
--------
Please cite QUICK-24.03 as follows.

-Manathunga, M.; O'Hearn, K. A., Shajan, A.; Smith, J.; Miao, Y.; He, X.; Ayers, K;
-Brothers, E.; Götz, A. W.; Merz, K. M. QUICK-24.03
-University of California San Diego, CA and
+Manathunga, M.; O'Hearn, K. A.; Shajan, A.; Smith, J.; Miao, Y.; He, X.; Ayers, K;
+Brothers, E.; Götz, A. W.; Merz, K. M. QUICK-24.03.
+University of California, San Diego, CA and
Michigan State University, East Lansing, MI, 2024.

If you perform density functional theory calculations please also cite:
44 changes: 22 additions & 22 deletions configure
@@ -1306,7 +1306,7 @@ for buildtype in $buildtypes; do

if [ "$enablef" = 'yes' ]; then
echo "F functions will be compiled in the $buildtype version."
-cuda_incl_flags="$cuda_incl_flags -DCUDA_SPDF"
+cuda_incl_flags="$cuda_incl_flags -DGPU_SPDF"
fi

# set cew flag for nvcc
@@ -1331,7 +1331,7 @@ for buildtype in $buildtypes; do

if [ "$enablef" = 'yes' ]; then
echo "F functions will be compiled in the $buildtype version."
-hip_incl_flags="$hip_incl_flags -DHIP_SPDF"
+hip_incl_flags="$hip_incl_flags -DGPU_SPDF"
fi

fi
@@ -1345,31 +1345,31 @@ for buildtype in $buildtypes; do

elif [ "$buildtype" = 'cuda' ]; then

-fort_flags="$fort_flags -DCUDA"
-cc_flags="$cc_flags -DCUDA"
-cxx_flags="$cxx_flags -DCUDA"
-cuda_incl_flags="$cuda_incl_flags -DCUDA"
+fort_flags="$fort_flags -DGPU -DCUDA"
+cc_flags="$cc_flags -DGPU -DCUDA"
+cxx_flags="$cxx_flags -DGPU -DCUDA"
+cuda_incl_flags="$cuda_incl_flags -DGPU -DCUDA"

elif [ "$buildtype" = 'cudampi' ]; then

-fort_flags="$fort_flags -DMPIV -DCUDA_MPIV"
-cc_flags="$cc_flags -DMPIV -DCUDA_MPIV"
-cxx_flags="$cxx_flags -DMPIV -DCUDA_MPIV"
-cuda_incl_flags="$cuda_incl_flags -DMPIV -DCUDA_MPIV"
+fort_flags="$fort_flags -DMPIV -DMPIV_GPU -DCUDA_MPIV"
+cc_flags="$cc_flags -DMPIV -DMPIV_GPU -DCUDA_MPIV"
+cxx_flags="$cxx_flags -DMPIV -DMPIV_GPU -DCUDA_MPIV"
+cuda_incl_flags="$cuda_incl_flags -DMPIV -DMPIV_GPU -DCUDA_MPIV"

elif [ "$buildtype" = 'hip' ]; then

-fort_flags="$fort_flags -DHIP"
-cc_flags="$cc_flags -DHIP"
-cxx_flags="$cxx_flags -DHIP"
-hip_incl_flags="$hip_incl_flags -DHIP"
+fort_flags="$fort_flags -DGPU -DHIP"
+cc_flags="$cc_flags -DGPU -DHIP"
+cxx_flags="$cxx_flags -DGPU -DHIP"
+hip_incl_flags="$hip_incl_flags -DGPU -DHIP"

elif [ "$buildtype" = 'hipmpi' ]; then

-fort_flags="$fort_flags -DMPIV -DHIP_MPIV"
-cc_flags="$cc_flags -DMPIV -DHIP_MPIV"
-cxx_flags="$cxx_flags -DMPIV -DHIP_MPIV"
-hip_incl_flags="$hip_incl_flags -DMPIV -DHIP_MPIV"
+fort_flags="$fort_flags -DMPIV -DMPIV_GPU -DHIP_MPIV"
+cc_flags="$cc_flags -DMPIV -DMPIV_GPU -DHIP_MPIV"
+cxx_flags="$cxx_flags -DMPIV -DMPIV_GPU -DHIP_MPIV"
+hip_incl_flags="$hip_incl_flags -DMPIV -DMPIV_GPU -DHIP_MPIV"

fi

@@ -1438,13 +1438,13 @@ for buildtype in $buildtypes; do
if [ "$buildtype" = 'mpi' ]; then
fort_ext_lib_flags="$fort_ext_lib_flags -DMPIV"
elif [ "$buildtype" = 'cuda' ]; then
-fort_ext_lib_flags="$fort_ext_lib_flags -DCUDA"
+fort_ext_lib_flags="$fort_ext_lib_flags -DGPU -DCUDA"
elif [ "$buildtype" = 'cudampi' ]; then
-fort_ext_lib_flags="$fort_ext_lib_flags -DMPIV -DCUDA_MPIV"
+fort_ext_lib_flags="$fort_ext_lib_flags -DMPIV -DMPIV_GPU -DCUDA_MPIV"
elif [ "$buildtype" = 'hip' ]; then
-fort_ext_lib_flags="$fort_ext_lib_flags -DHIP"
+fort_ext_lib_flags="$fort_ext_lib_flags -DGPU -DHIP"
elif [ "$buildtype" = 'hipmpi' ]; then
-fort_ext_lib_flags="$fort_ext_lib_flags -DMPIV -DHIP_MPIV"
+fort_ext_lib_flags="$fort_ext_lib_flags -DMPIV -DMPIV_GPU -DHIP_MPIV"
fi

# set the installer
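
The configure changes above follow one pattern: each GPU build type keeps its backend-specific macro and gains a backend-neutral one (`-DGPU` for single-GPU builds, `-DMPIV_GPU` for the MPI variants). A minimal shell sketch of that mapping, for illustration only; the function name `gpu_defines` is hypothetical and does not exist in `configure`:

```shell
#!/bin/sh
# Hypothetical helper (not part of configure): print the preprocessor
# defines that the hunks above assign to each build type.
gpu_defines() {
    case "$1" in
        mpi)     echo "-DMPIV" ;;
        cuda)    echo "-DGPU -DCUDA" ;;
        cudampi) echo "-DMPIV -DMPIV_GPU -DCUDA_MPIV" ;;
        hip)     echo "-DGPU -DHIP" ;;
        hipmpi)  echo "-DMPIV -DMPIV_GPU -DHIP_MPIV" ;;
        *)       echo "" ;;   # other build types: no extra defines in this sketch
    esac
}

# Show the defines for every GPU build type touched by this PR.
for buildtype in cuda cudampi hip hipmpi; do
    echo "$buildtype: $(gpu_defines "$buildtype")"
done
```

With the neutral macros in place, shared source can presumably test `GPU` or `MPIV_GPU` once rather than checking the CUDA and HIP macros separately, which fits the deduplication commits listed above.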
10 changes: 5 additions & 5 deletions quick-cmake/FindHipCUDA.cmake
@@ -890,12 +890,12 @@ endif()
set(CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE CACHE BOOL "Add paths to linker search and installed rpath")

# Use target ID syntax if supported for AMDGPU_TARGETS
-if(TARGET_ID_SUPPORT)
+#if(TARGET_ID_SUPPORT)
# set(AMDGPU_TARGETS gfx803;gfx900:xnack-;gfx906:xnack-;gfx908:xnack- CACHE STRING "List of specific machine types for library to target")
-set(AMDGPU_TARGETS ${QUICK_USER_ARCH} CACHE STRING "List of specific machine types for library to target")
-else()
-set(AMDGPU_TARGETS gfx803;gfx900;gfx906;gfx908;gfx90a CACHE STRING "List of specific machine types for library to target")
-endif()
+#else()
+# set(AMDGPU_TARGETS gfx803;gfx900;gfx906;gfx908;gfx90a CACHE STRING "List of specific machine types for library to target")
+#endif()
+set(AMDGPU_TARGETS "${QUICK_USER_ARCH}" CACHE STRING "List of specific machine types for library to target")
set(AMDGPU_TEST_TARGETS "" CACHE STRING "List of specific device types to test for") # Leave empty for default system device

list(APPEND CMAKE_PREFIX_PATH /opt/rocm /opt/rocm/hip)
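
With this hunk, `AMDGPU_TARGETS` always follows `QUICK_USER_ARCH`, so the values accepted by the `QUICKCudaConfig.cmake` changes in this PR decide what gets built. A rough shell sketch of that mapping, not the actual build logic: `hip_arch_flags` is a hypothetical name, and the sketch uses exact matches where the CMake code uses `MATCHES` (a regex/substring test).

```shell
#!/bin/sh
# Hypothetical helper: approximate the QUICK_USER_ARCH handling in
# QUICKCudaConfig.cmake. Prints the extra HIP compile flags for an
# architecture, or returns 1 for an unsupported value.
hip_arch_flags() {
    case "$1" in
        "")     echo "-DUSE_LEGACY_ATOMICS" ;;  # unset: the PR defaults to gfx908
        gfx908) echo "-DUSE_LEGACY_ATOMICS" ;;
        gfx90a) echo "-DAMD_ARCH_GFX90a" ;;
        gfx942) echo "-DAMD_ARCH_GFX90a" ;;     # the PR reuses the gfx90a define
        *)
            echo "Invalid value for QUICK_USER_ARCH: $1" >&2
            return 1
            ;;
    esac
}

# List the flags for each architecture the PR accepts.
for arch in gfx908 gfx90a gfx942; do
    echo "$arch: $(hip_arch_flags "$arch")"
done
```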
79 changes: 62 additions & 17 deletions quick-cmake/QUICKCudaConfig.cmake
@@ -7,7 +7,6 @@ set(QUICK_GPU_TARGET_NAME "cuda")
set(GPU_LD_FLAGS "") # hipcc requires special flags for linking (see below)

if(CUDA)

find_package(CUDA REQUIRED)

if(NOT CUDA_FOUND)
@@ -126,7 +125,7 @@ if(CUDA)
if("${QUICK_USER_ARCH}" MATCHES "maxwell")
message(STATUS "Configuring QUICK for SM5.0")
list(APPEND CUDA_NVCC_FLAGS ${SM50FLAGS})
-list(APPEND CUDA_NVCC_FLAGS -DUSE_LEGACY_ATOMICS)
+list(APPEND CUDA_NVCC_FLAGS -DUSE_LEGACY_ATOMICS)
set(DISABLE_OPTIMIZER_CONSTANTS TRUE)
set(FOUND "TRUE")
endif()
@@ -266,15 +265,15 @@ if(CUDA)
endif()

# extra CUDA flags
-list(APPEND CUDA_NVCC_FLAGS -use_fast_math)
+list(APPEND CUDA_NVCC_FLAGS --use_fast_math)

if(TARGET_LINUX OR TARGET_OSX)
list(APPEND CUDA_NVCC_FLAGS --compiler-options -fPIC)
endif()

# SPDF
if(ENABLEF)
-list(APPEND CUDA_NVCC_FLAGS -DCUDA_SPDF)
+list(APPEND CUDA_NVCC_FLAGS -DGPU_SPDF)
endif()

if(DISABLE_OPTIMIZER_CONSTANTS)
@@ -299,9 +298,6 @@ endif()
#option(HIP_RDC "Build relocatable device code, also known as separate compilation mode." FALSE)
#option(HIP_WARP64 "Build for CDNA AMD GPUs (warp size 64) or RDNA (warp size 32)" TRUE)
if(HIP)
-# HIP builds currently unavailable (TODO: fix post release)
-message(FATAL_ERROR "Error: HIP support is currently unavailable in this QUICK release. Support will be added back in a future release.")

set(QUICK_GPU_PLATFORM "HIP")
set(QUICK_GPU_TARGET_NAME "hip")
set(GPU_LD_FLAGS -fgpu-rdc --hip-link)
@@ -325,19 +321,15 @@ if(HIP)
endif()

list(APPEND AMD_HIP_FLAGS -fPIC -std=c++14)
-set(TARGET_ID_SUPPORT ON)
+#set(TARGET_ID_SUPPORT ON)

# if(HIP_WARP64)
# add_compile_definitions(QUICK_PLATFORM_AMD_WARP64)
# endif()

-# HIP codes currently do not support f-functions with -DUSE_LEGACY_ATOMICS targets (gfx906 and gfx908)
-if(ENABLEF AND (("${QUICK_USER_ARCH}" STREQUAL "") OR ("${QUICK_USER_ARCH}" MATCHES "gfx906") OR ("${QUICK_USER_ARCH}" MATCHES "gfx908")))
-message(FATAL_ERROR "Error: Unsupported HIP options (ENABLEF with -DUSE_LEGACY_ATOMICS). ${PROJECT_NAME} support for f-functions requires newer HIP architecture targets not using LEGACY_ATOMICS. Please specify architectures with QUICK_USER_ARCH not needing LEGACY_ATOMICS (post-gfx908) or disable f-function support.")
-endif()

if( NOT "${QUICK_USER_ARCH}" STREQUAL "")
set(FOUND "FALSE")

if("${QUICK_USER_ARCH}" MATCHES "gfx908")
message(STATUS "Configuring QUICK for gfx908")
list(APPEND AMD_HIP_FLAGS -DUSE_LEGACY_ATOMICS)
@@ -346,25 +338,78 @@ if(HIP)

if("${QUICK_USER_ARCH}" MATCHES "gfx90a")
message(STATUS "Configuring QUICK for gfx90a")
-list(APPEND AMD_HIP_FLAGS -munsafe-fp-atomics -DAMD_ARCH_GFX90a)
+list(APPEND AMD_HIP_FLAGS -DAMD_ARCH_GFX90a)
set(FOUND "TRUE")
endif()

+if("${QUICK_USER_ARCH}" MATCHES "gfx942")
+message(STATUS "Configuring QUICK for gfx942")
+list(APPEND AMD_HIP_FLAGS -DAMD_ARCH_GFX90a)
+set(FOUND "TRUE")
+endif()

if (NOT ${FOUND})
-message(FATAL_ERROR "Invalid value for QUICK_USER_ARCH. Possible values are gfx908, gfx90a.")
+message(FATAL_ERROR "Invalid value for QUICK_USER_ARCH. Possible values are gfx908, gfx90a, gfx942.")
endif()
else()
-list(APPEND AMD_HIP_FLAGS -DUSE_LEGACY_ATOMICS)
set(QUICK_USER_ARCH "gfx908")
+list(APPEND AMD_HIP_FLAGS -DUSE_LEGACY_ATOMICS)
message(STATUS "AMD GPU architecture not specified. Code will be optimized for gfx908.")
endif()

find_package(HipCUDA REQUIRED)

+execute_process(
+COMMAND ${HIP_HIPCC_EXECUTABLE} --version
+OUTPUT_VARIABLE HIPCC_VERSION_OUTPUT
+RESULT_VARIABLE HIPCC_VERSION_RESULT)

+if(NOT HIPCC_VERSION_RESULT EQUAL "0")
+message(FATAL_ERROR "Failed to get ROCm/HIP version.")
+endif()

+string(REPLACE "\n" ";" HIPCC_VERSION_OUTPUT ${HIPCC_VERSION_OUTPUT})
+string(REGEX MATCH "rocm-([0-9]+).([0-9]+).([0-9]+)" _ "${HIPCC_VERSION_OUTPUT}")
+set(HIP_VERSION_MAJOR ${CMAKE_MATCH_1})
+set(HIP_VERSION_MINOR ${CMAKE_MATCH_2})
+set(HIP_VERSION_PATCH ${CMAKE_MATCH_3})
+set(HIP_VERSION "${HIP_VERSION_MAJOR}.${HIP_VERSION_MINOR}.${HIP_VERSION_PATCH}" CACHE STRING "ROCm/HIP version (reported by hipcc).")
+mark_as_advanced(HIP_VERSION)
+message(STATUS "Detected ROCm/HIP version: ${HIP_VERSION}")

+# check ROCm version (as reported by hipcc),
+# as the QUICK HIP codes trigger a known scalar register fill/spill bug
+# in several ROCm versions
+if (${HIP_VERSION} VERSION_GREATER_EQUAL 5.4.3)
+message(STATUS "")
+message("************************************************************")
+message("Error: Incompatible ROCm/HIP version: ${HIP_VERSION}")
+message(" The QUICK HIP codes trigger a known compiler scalar register ")
+message(" fill/spill bug in ROCm >= v5.4.3.")
+message(" Please build QUICK with a known working ROCm version.")
+message("************************************************************")
+message(STATUS "")
+message(FATAL_ERROR)
+endif()

list(APPEND CUDA_NVCC_FLAGS ${AMD_HIP_FLAGS})

+if(QUICK_DEBUG_HIP_ASAN)
+set(QUICK_USER_ARCH "${QUICK_USER_ARCH}:xnack+")
+list(APPEND CUDA_NVCC_FLAGS -fsanitize=address -fsanitize-recover=address -shared-libsan -g --offload-arch=${QUICK_USER_ARCH})
+endif()

+# SPDF
+if(ENABLEF)
+list(APPEND CUDA_NVCC_FLAGS -DGPU_SPDF)
+endif()

+if(USE_LEGACY_ATOMICS)
+list(APPEND CUDA_NVCC_FLAGS -DUSE_LEGACY_ATOMICS)
+endif()

set(CMAKE_CXX_COMPILER ${HIP_HIPCC_EXECUTABLE})
-set(CMAKE_CXX_LINKER ${HIP_HIPCC_EXECUTABLE})
+set(CMAKE_CXX_LINKER ${HIP_HIPCC_EXECUTABLE})

# if(HIP_RDC)
# # Only hipcc can link a library compiled using RDC mode
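
The ROCm check added in this hunk has two steps: extract `X.Y.Z` from a `rocm-X.Y.Z` string in the `hipcc --version` output, then stop configuration when that version is 5.4.3 or newer. A rough shell equivalent follows, as a sketch under assumptions rather than the PR's implementation. The sample line is illustrative (real `hipcc` output varies by release), and `rocm_version` and `version_ge` are hypothetical names. Unlike the CMake pattern, which leaves the dots unescaped so they match any character, the sketch escapes them.

```shell
#!/bin/sh
# Hypothetical helper: print the first "X.Y.Z" that follows "rocm-" in the text.
rocm_version() {
    printf '%s\n' "$1" |
        sed -n 's/.*rocm-\([0-9][0-9]*\)\.\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1.\2.\3/p' |
        head -n 1
}

# Hypothetical helper: succeed if version $1 >= version $2,
# comparing major, minor, and patch numerically in turn.
version_ge() {
    a=$1
    b=$2
    for i in 1 2 3; do
        x=$(printf '%s' "$a" | cut -d. -f"$i")
        y=$(printf '%s' "$b" | cut -d. -f"$i")
        [ "${x:-0}" -gt "${y:-0}" ] && return 0
        [ "${x:-0}" -lt "${y:-0}" ] && return 1
    done
    return 0
}

# Assumed sample line, not captured from a real hipcc run.
sample='InstalledDir: /opt/rocm-5.4.3/llvm/bin'
ver=$(rocm_version "$sample")

if version_ge "$ver" 5.4.3; then
    echo "Incompatible ROCm/HIP version: $ver"
else
    echo "ROCm/HIP version $ver accepted"
fi
```

The comparison is numeric per field, so a version such as 5.10.0 is treated as newer than 5.4.3, matching what `VERSION_GREATER_EQUAL` does in CMake.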