Belos: BlockGmresSolMgr significantly slower on CUDA when not using UVM #12029

Closed
ddement opened this issue Jul 10, 2023 · 7 comments
Labels
CLOSED_DUE_TO_INACTIVITY: Issue or PR has been closed by the GitHub Actions bot due to inactivity.
MARKED_FOR_CLOSURE: Issue or PR is marked for auto-closure by the GitHub Actions bot.
pkg: Belos
type: bug: The primary issue is a bug in Trilinos code or tests

Comments


ddement commented Jul 10, 2023

@srajama1

We have been doing a series of experiments in Nalu-Wind regarding the removal of UVM from CUDA runs. When Trilinos is built without UVM, most regression tests run slightly faster than with it. However, one case runs approximately 30x slower when UVM is not used. In particular, this slowdown has been traced primarily to the "BlockGmresSolMgr total solve time" and "ICGS[2]: Ortho (Norm)" timing lines from Belos. Several other regression tests exercise other Belos solvers, and none of them show similar regressions.

Unfortunately, the reproducer for this case is a Nalu-Wind regression test - we do not have a more minimal problem. The regression test in question is the "taylorGreenVortex_p3" test. We can assist with running and debugging as necessary. @jhux2 may also have experience with running this case.
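
For readers unfamiliar with the "ICGS[2]: Ortho (Norm)" timer, here is a rough sketch of the operation it appears to cover. It assumes that ICGS stands for iterated classical Gram-Schmidt and that the bracketed 2 is the number of orthogonalization passes. It is plain Python for illustration only, not Belos code, and the names `dot` and `icgs` are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def icgs(basis, v, passes=2):
    """Orthogonalize v against an orthonormal basis, then normalize it.

    Hypothetical helper: runs `passes` classical Gram-Schmidt sweeps
    (assumed to correspond to the "[2]" in the timer label).
    Returns (unit vector, accumulated projection coefficients, norm).
    """
    w = list(v)
    coeffs = [0.0] * len(basis)
    for _ in range(passes):
        # Classical Gram-Schmidt: take every inner product against the
        # same w before applying any update.
        h = [dot(q, w) for q in basis]
        for q, hj in zip(basis, h):
            w = [wi - hj * qi for wi, qi in zip(w, q)]
        coeffs = [c + hj for c, hj in zip(coeffs, h)]
    # The norm step: presumably what "Ortho (Norm)" measures.
    norm = math.sqrt(dot(w, w))
    if norm == 0.0:
        raise ValueError("v lies in the span of the basis")
    return [wi / norm for wi in w], coeffs, norm


basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
q, coeffs, norm = icgs(basis, [1.0, 2.0, 3.0])
```

The sketch only shows which operations the timer names refer to; it does not explain why they would be slower without UVM.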

ddement added the "type: bug" and "pkg: Belos" labels Jul 10, 2023
Member

jhux2 commented Jul 10, 2023

@trilinos/belos

Member

jhux2 commented Jul 10, 2023

@ddement #11837 is in progress to address GMRES orthogonalization in Belos. I'm wondering if you are running into the case that the PR is meant to address.

Contributor

cgcgcg commented Jul 10, 2023

Maybe related: #9979 ?

Member

jhux2 commented Jul 10, 2023

@ddement Do you know if the 30x slowdown is in the velocity, continuity, or both phases? It's been quite a while since I ran this test case.

Author

ddement commented Jul 11, 2023

@jhux2

BlockGmresSolMgr is used for the velocity solve, so that is where the 30x slowdown appears. Continuity uses BiCGStab, which is a little slower (a couple of seconds) on this particular run, but nowhere near the 30x number.

I'll take a look at the other PRs to see if I think they're related - I would not be surprised if they are.


This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

github-actions bot added the MARKED_FOR_CLOSURE label Jul 13, 2024

This issue was closed due to inactivity for 395 days.

github-actions bot added the CLOSED_DUE_TO_INACTIVITY label Aug 14, 2024
github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Aug 14, 2024
3 participants