
Fix HIP multi-GPU bug #822

Merged: 3 commits into brucefan1983:master on Dec 12, 2024

Conversation

BohanZhang0908 (Contributor)

Summary
Fixes a problem with multi-GPU parallelism on AMD GPUs (HIP).
Modification
5 additions to the code.
Others

brucefan1983 changed the title from "Change" to "Fix HIP multi-GPU bug" on Dec 12, 2024
brucefan1983 (Owner)

@Dankomaister
@jesperbygg

HIP multi-GPU bug fixed!

brucefan1983 (Owner)

@elindgren

jesperbygg

Excellent!

elindgren (Collaborator)

Nice! Was the problem that there were still ongoing calculations on the GPU from previous compute calls? Or why else was gpuDeviceSynchronize needed?

brucefan1983 (Owner)

Perhaps the CUDA version is just lucky. Rigorously speaking, a sync is needed.
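
A minimal HIP sketch (not GPUMD's actual code; the kernel names, buffers, and sizes are invented for illustration) of the kind of race an explicit device synchronization prevents: kernels launched on different GPUs are not ordered with respect to each other, so a kernel on GPU 1 that reads data produced on GPU 0 can start while GPU 0 is still writing it. In GPUMD's HIP build, gpuDeviceSynchronize presumably maps to hipDeviceSynchronize.

// Race sketch: GPU 1 consumes data that GPU 0 may still be producing.
// Error checking omitted for brevity; requires peer access between the GPUs.
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void produce(double* x, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] = 1.0;  // stand-in for a per-GPU force/descriptor kernel
}

__global__ void consume(const double* x, double* y, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = x[i];  // stand-in for reading halo data owned by GPU 0
}

int main()
{
  const int n = 1 << 20;
  double *a0, *a1;

  hipSetDevice(0);
  hipMalloc(&a0, n * sizeof(double));
  hipLaunchKernelGGL(produce, dim3((n + 127) / 128), dim3(128), 0, 0, a0, n);

  // The essential fix: wait for GPU 0 to finish before GPU 1 touches its data.
  // A CUDA build may happen to work without this, but it is required in general.
  hipDeviceSynchronize();

  hipSetDevice(1);
  hipDeviceEnablePeerAccess(0, 0);  // let GPU 1 read GPU 0's memory directly
  hipMalloc(&a1, n * sizeof(double));
  hipLaunchKernelGGL(consume, dim3((n + 127) / 128), dim3(128), 0, 0, a0, a1, n);
  hipDeviceSynchronize();

  printf("done\n");
  hipFree(a1);
  hipSetDevice(0);
  hipFree(a0);
  return 0;
}

On the CUDA side, implicit ordering between the default stream and peer transfers can hide such a race, which would be consistent with the "just lucky" remark above; HIP simply exposed it.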

brucefan1983 merged commit 8d885f4 into brucefan1983:master on Dec 12, 2024
2 checks passed
Dankomaister

This is great! I will do some benchmarking :)

Btw @brucefan1983, would it be possible to lower the system-size requirement for running on multiple GPUs? Or is there some technical limitation for this (apart from efficiency)?

Currently we require that the shortest lattice constant is more than 5 times the cutoff per GPU: "The longest direction has less than 5 times of the NEP cutoff per GPU."

Since we have 8 MI250x GPUs on one node (4 physical cards), that means we need 5 × 8 × cutoff, which is a huge system, especially if one has a cubic box.

brucefan1983 (Owner)

Yes, this requirement is based on efficiency. Perhaps it can be reduced to a minimum of 3 cutoffs. You can try changing the code to

  if (num_bins_longitudinal < 6) {
    printf("The longest direction has less than 3 times of the NEP cutoff per GPU.\n");
    printf("Please reduce the number of GPUs or increase the simulation cell size.\n");
    exit(1);
  }

But I encourage you to compare the performance with an increasing number of GPUs and then choose wisely.
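
As a rough illustration of the sizes involved (assuming a NEP cutoff of about 6 Å, a value not stated in this thread): with the current 5-cutoffs-per-GPU requirement and 8 GPUs, the decomposition direction must exceed roughly 5 × 8 × 6 Å = 240 Å, whereas relaxing the limit to 3 cutoffs would lower this to about 3 × 8 × 6 Å = 144 Å.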
