The Kokkos Lectures: Module 6 Q&A
Daniel Arndt edited this page Aug 21, 2020 · 4 revisions
- Yes, that should work.
- 3.0
Why was HostPinnedSpace generally faster than the standard CudaSpace for the MPI communication? Is this specific to the Spectrum MPI implementation on Summit, or is it something to expect in general?
- The best MPI communication strategy varies widely between systems. It is advisable to write code that supports all cases and then measure which variant is fastest for a specific application/hardware combination.
- A HostPinned buffer still lives in CPU memory, so GPU-aware MPI is not necessary.
- Set OMP_NUM_THREADS appropriately if using OpenMP.
- By default, NVSHMEM runs on the Cuda execution space; SHMEM and MPI One-Sided run on the host execution space.
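
The advice above to support several communication variants and measure which one is fastest can be sketched as a small timing harness. This is only a minimal illustration in plain C++: it makes no Kokkos or MPI calls, and the names `Variant`, `time_variant`, and `pick_fastest` are hypothetical. In a real code each callable would presumably wrap one complete pack, exchange, and unpack cycle using a particular buffer memory space (for example CudaSpace or HostPinnedSpace).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one communication variant. The callable is a
// stand-in for a full pack + send/recv + unpack cycle with one buffer choice.
struct Variant {
  std::string name;
  std::function<void()> exchange;
};

// Run one variant `repeats` times and return the mean wall-clock time per
// call in seconds. One untimed warm-up call is made first.
double time_variant(const Variant& v, int repeats) {
  v.exchange();  // warm-up, not included in the measurement
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repeats; ++i) {
    v.exchange();
  }
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count() / repeats;
}

// Time every variant and return the index of the one with the lowest mean time.
std::size_t pick_fastest(const std::vector<Variant>& variants, int repeats) {
  assert(!variants.empty());
  std::size_t best = 0;
  double best_time = time_variant(variants[0], repeats);
  for (std::size_t i = 1; i < variants.size(); ++i) {
    const double t = time_variant(variants[i], repeats);
    if (t < best_time) {
      best_time = t;
      best = i;
    }
  }
  return best;
}
```

In an MPI application the timing would normally be synchronized across ranks (for instance with a barrier before the timed loop) and repeated for the message sizes that the application actually sends, since the ranking of the variants can change with message size and hardware.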