Fix HIP multi-GPU bug #822
Fixes a problem with multi-GPU parallelism on AMD GPUs.
Add files via upload
HIP multi-GPU bug fixed!
Excellent!
Nice! Was the problem that there were still ongoing calculations on the GPU from previous
Perhaps the CUDA version is just lucky. Strictly speaking, a sync is needed.
This is great! I will do some benchmarking :) Btw @brucefan1983, would it be possible to lower the system size requirement for running on multiple GPUs? Or is there some technical limitation for this (apart from efficiency)? Right now we require that the longest direction is at least 5 times the cutoff per GPU; otherwise we get "The longest direction has less than 5 times of the NEP cutoff per GPU." Since we have 8 MI250X GPUs on one node (4 physical cards), that means we need a box length of 5 x 8 x cutoff, which is a huge system, especially if one has a cubic box.
Yes, this requirement is based on efficiency. Perhaps it can be reduced to a minimum of 3 cutoffs. You can try to change the code to
But I encourage you to compare the performance with an increasing number of GPUs and then choose wisely.
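To make the arithmetic in this exchange concrete, here is a minimal sketch of the size requirement as described above: the longest box direction must span at least some number of cutoffs per GPU. This is not the project's actual check. The function names, the `min_cutoffs_per_gpu` parameter, and the example cutoff value are all hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical helper (not from the GPUMD source): the smallest length the
// longest box direction may have, given the cutoff, the number of GPUs, and
// the required number of cutoffs per GPU (5 today, possibly 3 as suggested).
double min_box_length(double cutoff, int num_gpus, double min_cutoffs_per_gpu)
{
  assert(cutoff > 0.0 && num_gpus > 0 && min_cutoffs_per_gpu > 0.0);
  return min_cutoffs_per_gpu * num_gpus * cutoff;
}

// Hypothetical helper: true when the longest of the three box lengths
// satisfies the requirement computed by min_box_length().
bool box_is_large_enough(const std::array<double, 3>& box_lengths,
                         double cutoff, int num_gpus, double min_cutoffs_per_gpu)
{
  const double longest = *std::max_element(box_lengths.begin(), box_lengths.end());
  return longest >= min_box_length(cutoff, num_gpus, min_cutoffs_per_gpu);
}
```

For example, with an assumed cutoff of 5 length units and 8 GPUs, `min_box_length(5.0, 8, 5.0)` gives 200, while lowering the factor to 3 gives 120. A cubic box with 150 units per side would then pass only under the relaxed factor. As noted above, passing this check says nothing about whether using that many GPUs is efficient, so benchmarking is still needed.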
Summary
Fixes a problem with multi-GPU parallelism on AMD GPUs.
Modification
5 additions to the code
Others