[alpaka] Refactor prefixScan implementation #220
base: master
20130b8 to fb7bd6f
Fixed conflicts and applied code formatting.
be2894f to d427564
Rebased and fixed conflicts.
d427564 to f8a75ea
Rebased and fixed conflicts.
In general looks ok.
On Cori (with CUDA 11.2) I got the following failure when running
I'm really puzzled what
Here is a stack trace of the exception
f8a75ea to 3ba0e0d
Fixed conflicts, rebased, etc.
3ba0e0d to 63cae86
The validation is good, but I now see a small, systematic loss in performance. Before:
After:
So 2-3% slower.
The `prefixScan` algorithm is implemented in Alpaka using two kernels, while the native CUDA version uses a single kernel. I refactored the `prefixScan` implementation to use a single kernel, similar to the native CUDA implementation.
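For readers unfamiliar with the structure being discussed, here is a minimal sequential C++ sketch of a multi-block inclusive prefix scan. It is not the code from this PR: the names `blockScan` and `multiBlockScan` are hypothetical, and the loops over blocks stand in for what would run in parallel on a device. In a two-kernel design, phase 1 and phase 2 would be separate launches. One common way to fuse them into a single kernel is to let the last block that finishes phase 1 (detected with an atomic counter) carry out phase 2. Whether the refactoring here does exactly that is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: inclusive scan of in[first, last) written to the same
// range of out. It stands in for the work done by one block.
void blockScan(const std::vector<int>& in, std::vector<int>& out,
               std::size_t first, std::size_t last) {
  std::partial_sum(in.begin() + first, in.begin() + last, out.begin() + first);
}

// Sequential sketch of a multi-block inclusive prefix scan.
// Precondition: blockSize > 0.
std::vector<int> multiBlockScan(const std::vector<int>& in, std::size_t blockSize) {
  assert(blockSize > 0);
  std::vector<int> out(in.size());
  if (in.empty()) {
    return out;
  }
  const std::size_t numBlocks = (in.size() + blockSize - 1) / blockSize;

  // Phase 1: every block scans its own chunk independently and records
  // the total of that chunk.
  std::vector<int> blockSums(numBlocks);
  for (std::size_t b = 0; b < numBlocks; ++b) {
    const std::size_t first = b * blockSize;
    const std::size_t last = std::min(first + blockSize, in.size());
    blockScan(in, out, first, last);
    blockSums[b] = out[last - 1];
  }

  // Phase 2: scan the per-block totals, then add to each element the
  // combined total of all preceding blocks.
  std::partial_sum(blockSums.begin(), blockSums.end(), blockSums.begin());
  for (std::size_t b = 1; b < numBlocks; ++b) {
    const std::size_t first = b * blockSize;
    const std::size_t last = std::min(first + blockSize, in.size());
    for (std::size_t i = first; i < last; ++i) {
      out[i] += blockSums[b - 1];
    }
  }
  return out;
}
```

Calling `multiBlockScan(in, 4)` on an input vector `in` should give the same result as a plain `std::partial_sum` over the whole input. Only the way the work is split across blocks differs.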