Performance of kokkos serial backend vs. plain serial code #297
Comments
In plugin-SiPixelClusterizer, there's a difference between the serial code and Kokkos code. Kokkos version (around line 97)
The serial version (around line 58)
Two reasons the Kokkos version is doing more work:
If the Kokkos loop is changed to match the serial loop, the performance becomes similar. Notes
Link to Kokkos version: Link to serial version:
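To make the kind of difference concrete: the reply in this thread names the loop bound and the initialization of additional variables as the two changes. The following is only a minimal sketch of how those two things add work, not the actual plugin code. The `Buffers` struct, its field names, and the functions `makeBuffers`, `resetUsed`, and `resetAll` are all hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical buffers; names and layout are illustrative, not taken from the plugin.
struct Buffers {
  std::vector<std::uint32_t> primary;
  std::vector<std::uint32_t> extra;
};

// Allocate both arrays with the same fixed capacity.
Buffers makeBuffers(std::size_t capacity) {
  return Buffers{std::vector<std::uint32_t>(capacity, 1u),
                 std::vector<std::uint32_t>(capacity, 1u)};
}

// Tight variant: loop only over the entries actually in use and reset
// one array. Returns the number of stores performed.
std::size_t resetUsed(Buffers& b, std::size_t nUsed) {
  std::size_t writes = 0;
  for (std::size_t i = 0; i < nUsed && i < b.primary.size(); ++i) {
    b.primary[i] = 0;
    ++writes;
  }
  return writes;
}

// Wide variant: loop up to the full capacity and reset both arrays.
// Returns the number of stores performed.
std::size_t resetAll(Buffers& b) {
  std::size_t writes = 0;
  for (std::size_t i = 0; i < b.primary.size(); ++i) {
    b.primary[i] = 0;
    b.extra[i] = 0;
    writes += 2;
  }
  return writes;
}
```

With an assumed capacity of 2000 and 100 entries in use, `resetUsed` performs 100 stores while `resetAll` performs 4000, which is the sort of gap that shows up as extra instructions retired even though both variants leave the entries in use in the same state.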
@markdewing Thanks, good catch. Digging through the history, both of these (the loop bound and the initialization of additional variables) were done in #80, and this comment #80 (comment) probably explains the reasoning. Since both
The kokkos serial backend (`kokkos --serial`) is slower on one thread than the standalone serial code (`serial`).
Looking at profiles, the function `kernel_connect` (in `plugin-PixelTriplets`) takes significantly more time in the kokkos version. From the instructions retired, it is clearly performing more operations in the kokkos version.
The loops do not perform their iterations in the same order.
The serial version has no outer loop. However, printing the values for `idx` and `j` shows the same values get accessed, just in a different order between the versions.
It looks like (based on instructions retired) the kokkos version is doing more work (by a factor of 2x or more) in routines like `areAlignedRZ`. But based on printing out how many times it's called, they should be the same.
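The check described above (same `idx` and `j` values, different order) can be done mechanically rather than by reading printouts. The sketch below is one possible way to do it and does not use the real `kernel_connect` loops. The functions `visitFlat`, `visitStrided`, and `sameWork`, as well as the `stride` parameter, are hypothetical stand-ins for a loop without an outer loop and a loop with one.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

using Pair = std::pair<std::uint32_t, std::uint32_t>;

// Hypothetical stand-in for a version with no outer loop:
// idx runs straight through, j is the inner index.
std::vector<Pair> visitFlat(std::uint32_t nOuter, std::uint32_t nInner) {
  std::vector<Pair> seen;
  for (std::uint32_t idx = 0; idx < nOuter; ++idx)
    for (std::uint32_t j = 0; j < nInner; ++j)
      seen.emplace_back(idx, j);
  return seen;
}

// Hypothetical stand-in for a version with an extra outer loop:
// idx is visited in strided passes, so the order of pairs changes.
std::vector<Pair> visitStrided(std::uint32_t nOuter, std::uint32_t nInner,
                               std::uint32_t stride) {
  std::vector<Pair> seen;
  for (std::uint32_t first = 0; first < stride; ++first)
    for (std::uint32_t idx = first; idx < nOuter; idx += stride)
      for (std::uint32_t j = 0; j < nInner; ++j)
        seen.emplace_back(idx, j);
  return seen;
}

// True if both traces contain the same pairs with the same multiplicity,
// regardless of the order in which they were recorded.
bool sameWork(std::vector<Pair> a, std::vector<Pair> b) {
  std::sort(a.begin(), a.end());
  std::sort(b.begin(), b.end());
  return a == b;
}
```

Recording the pairs in each real loop body and passing both traces to something like `sameWork` would confirm that the iteration spaces match. If they do, the extra instructions retired have to come from work done per iteration or outside the loop body, not from extra iterations.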