Belos: BlockGmresSolMgr significantly slower on CUDA when not using UVM #12029
@trilinos/belos
Maybe related: #9979?
@ddement Do you know if the 30x slowdown is in the velocity phase, the continuity phase, or both? It's been quite a while since I ran this test case.
BlockGmres is used for the velocity solve. Continuity uses BiCGStab, which is slightly slower (by a couple of seconds) on this particular run, but nowhere near 30x. I'll look at the other PRs to see whether I think they're related; I would not be surprised if they are.
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
This issue was closed due to inactivity for 395 days.
@srajama1
We have been running a series of experiments in Nalu-Wind on removing UVM (CUDA Unified Virtual Memory) from CUDA runs. When Trilinos is built without UVM, most regression tests run slightly faster than with it. However, one case runs approximately 30x slower without UVM. The slowdown has been traced primarily to two Belos timers: "BlockGmresSolMgr total solve time" and "ICGS[2]: Ortho (Norm)". Several other regression tests exercise other Belos solvers, and none of them show a similar regression.
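For context on the second timer: ICGS is Belos's iterated classical Gram-Schmidt orthogonalization, and "Ortho (Norm)" presumably times the norm computation within that step. The pure-Python sketch below is illustrative only and is not Belos code; it shows what such a step computes for a single vector: repeated classical Gram-Schmidt passes followed by a norm (a global reduction) and a rescale. The function names and the default of two passes are assumptions chosen for the example.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def icgs_orthonormalize(basis, x, passes=2):
    """Orthogonalize x against the orthonormal vectors in `basis` using
    iterated classical Gram-Schmidt, then normalize it.

    Hypothetical helper for illustration; returns (q, norm)."""
    y = list(x)
    for _ in range(passes):
        # Classical GS: every projection coefficient is computed from the
        # same y before any subtraction happens.
        coeffs = [dot(q, y) for q in basis]
        for c, q in zip(coeffs, basis):
            y = [yi - c * qi for yi, qi in zip(y, q)]
    # Normalization: one reduction to get the norm, then a scale.
    norm = math.sqrt(dot(y, y))
    return [yi / norm for yi in y], norm


basis = [[1.0, 0.0, 0.0]]
q, n = icgs_orthonormalize(basis, [1.0, 2.0, 2.0])
print(q, n)
```

The second pass is what makes the method "iterated": it removes components that a single classical pass can leave behind due to rounding.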
Unfortunately, the only reproducer for this case is a Nalu-Wind regression test, "taylorGreenVortex_p3"; we do not have a more minimal problem. We can help with running and debugging as needed. @jhux2 may also have experience running this case.