HIP and MPI+HIP updates #361
base: master
…MPI+HIP codes. Unify CUDA and HIP code paths (CUDA / HIP => GPU, CUDA_MPIV / HIP_MPIV => MPIV_GPU, etc.).
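One way to read the macro unification described in this commit is sketched below, assuming the build system defines exactly one backend flag and that shared code then tests only the unified names. The helper `active_code_path` is hypothetical and exists only to make the mapping observable; the real code base may define these macros differently (for example, directly from CMake).

```cpp
#include <cassert>
#include <cstring>

// Assumption: the build defines at most one of CUDA, HIP, CUDA_MPIV, HIP_MPIV.
// The mapping follows the names in the commit message; how the real code base
// sets these macros is not shown in this PR text.
#if defined(CUDA) || defined(HIP)
#  define GPU 1
#endif
#if defined(CUDA_MPIV) || defined(HIP_MPIV)
#  define MPIV_GPU 1
#endif

// Hypothetical helper: shared code branches only on the unified macros,
// so no CUDA- or HIP-specific #ifdef is needed at this level.
inline const char* active_code_path() {
#if defined(MPIV_GPU)
    return "MPIV_GPU";
#elif defined(GPU)
    return "GPU";
#else
    return "CPU";
#endif
}
```

Compiled without any backend flag, `active_code_path()` selects the CPU branch; compiling with `-DHIP` or `-DCUDA` would select the same `GPU` branch in both cases.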
As noted above, all tests (the full test suite) pass for the CUDA and MPI+CUDA (1 GPU) versions. However, some tests fail for the HIP and MPI+HIP versions; see the logs below from tests on the MI210s on the AMD AAC. Interestingly, the failures differ slightly between the HIP and MPI+HIP versions. This is difficult to debug by comparing against the working HIP / MPI+HIP versions from the 23.08b release, because that release also shows a number of test failures. It may be better to pick a commit from before the f-function optimizations and run the tests there for comparison. Test configuration on the AMD AAC:
HIP test summary and diffs:
MPI+HIP test summary and diffs:
…preprocessor definitions for performance and storage considerations. Refactor preprocessor definitions to avoid unnecessary arithmetic.
…aths for older HIP builds.
…regarding STORE_OPERATOR). Fix segfault in debug builds of GPU code when ERI f-function support is not enabled but the basis contains f functions. Remove unneeded DGEMM operation in CUDA codes in SCF/USCF methods. Other code clean-up.
…ggled on in CMake build.
…power functions (inlined device functions calling pow to preprocessor definitions using multiplication operations). Other code clean-up.
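The change from `pow` calls to multiplication-based preprocessor definitions can be illustrated with a small sketch. The macro names `POW2`, `POW3`, and `POW4` are hypothetical and not taken from the code base; the point is only that small integer powers expand to a few multiplications instead of a general-purpose `pow` call.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical macro names. Each argument is fully parenthesized, but it is
// evaluated more than once, so arguments must be free of side effects.
#define POW2(x) ((x) * (x))
#define POW3(x) ((x) * (x) * (x))
#define POW4(x) (POW2(x) * POW2(x))

// Reference version for comparison: a general-purpose pow call.
inline double pow_reference(double x, int n) {
    return std::pow(x, static_cast<double>(n));
}

// Small self-check comparing the macros against the reference.
inline void check_pow_macros() {
    assert(std::fabs(POW2(3.0) - pow_reference(3.0, 2)) < 1e-12);
    assert(std::fabs(POW3(2.0) - pow_reference(2.0, 3)) < 1e-12);
    assert(std::fabs(POW4(1.5) - pow_reference(1.5, 4)) < 1e-12);
}
```

The usual trade-off with this design is that macros carry no type checking and duplicate their argument expression, which is acceptable when the arguments are plain variables.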
fbd9602 to f937da6
…s. Add CMake option to enable LLVM-based address sanitizer (ASAN) for debugging with HIP builds.
… and replace with emulation at full double precision for pre-Pascal NVIDIA GPUs (previously toggled via USE_LEGACY_ATOMICS). Note that the old code was leading to slow and possibly failing SCF convergence which was only exposed during testing with tighter density matrix convergence thresholds and integral cut-offs. This is likely due to the truncation used for energy and gradient calculations (1e-6 and 1e-12, respectively).
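A common way to emulate a double-precision atomic add on hardware without a native one is a compare-and-swap loop on the 64-bit representation of the value. Whether this PR uses exactly that scheme is an assumption; the sketch below is a host-side C++ analogue using `std::atomic`, not the device code, and the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a double as its 64-bit pattern and back (memcpy avoids aliasing issues).
inline std::uint64_t to_bits(double v) {
    std::uint64_t b;
    std::memcpy(&b, &v, sizeof b);
    return b;
}
inline double from_bits(std::uint64_t b) {
    double v;
    std::memcpy(&v, &b, sizeof v);
    return v;
}

// Hypothetical host analogue of a CAS-based atomic add: the sum is formed at
// full double precision, then published only if no other thread changed the
// cell in the meantime. Returns the value held before the add.
inline double atomic_add_double(std::atomic<std::uint64_t>& cell, double value) {
    std::uint64_t observed = cell.load();
    for (;;) {
        const double current = from_bits(observed);
        const std::uint64_t desired = to_bits(current + value);
        // On failure, compare_exchange_weak stores the latest bits in `observed`
        // and the loop retries with the fresh value.
        if (cell.compare_exchange_weak(observed, desired)) {
            return current;
        }
    }
}
```

Because the accumulation happens on the full double value, no contribution is rounded to a fixed scale before being added, in contrast to the truncation that the commit message identifies as the likely cause of the convergence problems.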
…t-offs (abs -> fabs). Tune exchange correlation code.
e9e44dc to 1840518
…atomics and ERI f function code in HIP versions.
For record keeping, AMD engineers confirmed in mid-2024 that the QUICK HIP codes (one and two electron) were triggering a bug present in several versions of ROCm/HIP (register spill/fill bug, v5.4.3 - v6.2.0). Commits in this PR detect the ROCm version being built against and refuse to build if a known affected version is detected. If a future workaround can be found that does not negatively impact performance, support for these versions may be restored.
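The version gate can be sketched as a range check on a (major, minor, patch) triple. This is an illustration only: the PR performs the check at build time, the type and function names below are hypothetical, and treating both endpoints of v5.4.3 - v6.2.0 as affected is an assumption based on the wording above.

```cpp
#include <cassert>

// Hypothetical version triple for the ROCm release being built against.
struct RocmVersion {
    int maj;
    int min;
    int pat;
};

// Lexicographic "a is older than b" comparison.
constexpr bool older_than(RocmVersion a, RocmVersion b) {
    return a.maj != b.maj ? a.maj < b.maj
         : a.min != b.min ? a.min < b.min
                          : a.pat < b.pat;
}

// Assumption: the affected range is inclusive at both ends.
constexpr bool is_known_affected(RocmVersion v) {
    return !older_than(v, RocmVersion{5, 4, 3}) &&
           !older_than(RocmVersion{6, 2, 0}, v);
}

// Compile-time checks; a real build could turn the same predicate into a hard error.
static_assert(is_known_affected(RocmVersion{5, 4, 3}), "lower bound is affected");
static_assert(is_known_affected(RocmVersion{6, 2, 0}), "upper bound is affected");
static_assert(!is_known_affected(RocmVersion{5, 4, 2}), "older release is accepted");
static_assert(!is_known_affected(RocmVersion{6, 2, 1}), "newer release is accepted");
```

Keeping the predicate in one place means that restoring support for some versions later, as the comment above anticipates, would only require narrowing the two bounds.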
d1be629 to 52c8e65
52c8e65 to 53c25af
1a98223 to 508fc52
…ion (< v5.3.0) due to poor performance and use CPU diagonalization routines instead.
…lds on AMD MI300 series GPUs.
… function arguments to save stack space.
d2f68c5 to b43412e
This PR ports the updated CUDA and MPI+CUDA codes (including the recently merged f-function optimizations) to the HIP and MPI+HIP versions, respectively. It also begins to unify the CUDA and HIP versions to simplify future GPU code maintenance.
Additional work also included in this PR concerns the following items:
Closes #344.