Potential fix for performance regression in #14415 #14706
/ok to test
I'm hoping some enterprising soul can run the V100 benchmark as @GregoryKimball did here, since I lack access to a machine with that card. The PTX generated from this patch appears similar to the PTX produced with the lines commented out, as was done here. Edit: I scrounged up an AWS instance with a V100.
This PR:
Ran a few more benchmarks.
Wow @etseidl, it looks like you solved the mystery!!
Nah, you solved the mystery, @GregoryKimball; I just submitted the PR 😅
/ok to test
Looks like this is the only spot where I got "creative" with the control flow. Looks good now.
Thank you for solving this!
/merge
Description
Potentially fixes #14415
Checklist