implement batched serial iamax #2399
Looks fine, the only small question would be: "Are we worried about views containing more than 2B elements with an int return type?"
Assuming that this is a
I am not really worried, I was just thinking about it. One thing we could do is check the type that the view is using to store indices and use that. Then the problem becomes a Kokkos Core problem :p
Sure.
Signed-off-by: Yuuichi Asahi <[email protected]>
c0ccdb9 to b0ab297
Looks pretty good to me!
This PR implements the iamax function, which is needed for the getrf PR.
The following files are added:
- KokkosBatched_Iamax_Serial_Impl.hpp: Internal interfaces with implementation details
- KokkosBatched_Iamax.hpp: APIs
- Test_Batched_SerialIamax.hpp: Unit tests for that
Detailed description
This returns the index of the first element having maximum absolute value.
- X: (batch_count, n) The length N vector.
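As a rough illustration of the semantics described above, here is a minimal plain C++ sketch for a single vector. It is not the KokkosBatched implementation from this PR: the name `iamax_sketch`, the use of `std::vector<double>`, and the `-1` return for an empty input are all assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the KokkosBatched API.
// Returns the index of the first element of x with the largest absolute
// value. Returning -1 for an empty vector is an assumption of this sketch.
inline int iamax_sketch(const std::vector<double>& x) {
  if (x.empty()) return -1;
  int imax    = 0;
  double amax = std::abs(x[0]);
  for (std::size_t i = 1; i < x.size(); ++i) {
    const double a = std::abs(x[i]);
    // Strict '>' keeps the earliest index when several entries tie.
    if (a > amax) {
      amax = a;
      imax = static_cast<int>(i);
    }
  }
  return imax;
}
```

For the input `{1.0, -3.0, 3.0, 2.0}` both `-3.0` and `3.0` have the maximum absolute value, and the sketch returns the earlier index, 1.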
Parallelization would be made in the following manner. This is efficient only when X is given in LayoutLeft for GPUs and LayoutRight for CPUs (parallelized over batch direction).
Tests
X. Compare the return value with the index of the first element having maximum absolute value.
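The batch loop and the check described above could look roughly like the following plain C++ sketch. It does not use Kokkos and is not the unit test from this PR: `batched_iamax_sketch` and `is_first_abs_max` are hypothetical names, X is assumed to be stored row-major (each batch entry contiguous, comparable to LayoutRight), and the outer loop stands in for the dimension that would be parallelized over the batch direction.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: X has shape (batch_count, n), stored row-major.
// Returns one index per batch entry (0 when n == 0, an assumption here).
inline std::vector<int> batched_iamax_sketch(const std::vector<double>& X,
                                             std::size_t batch_count,
                                             std::size_t n) {
  std::vector<int> out(batch_count, 0);
  // Each iteration is independent, so this is the loop that would be
  // distributed over the batch direction.
  for (std::size_t ib = 0; ib < batch_count; ++ib) {
    const double* x = X.data() + ib * n;
    int imax = 0;
    for (std::size_t i = 1; i < n; ++i) {
      if (std::abs(x[i]) > std::abs(x[imax])) imax = static_cast<int>(i);
    }
    out[ib] = imax;
  }
  return out;
}

// Independent check: idx is valid, no entry has a larger absolute value,
// and no earlier entry ties with x[idx].
inline bool is_first_abs_max(const double* x, std::size_t n, int idx) {
  if (idx < 0 || static_cast<std::size_t>(idx) >= n) return false;
  const double amax = std::abs(x[idx]);
  for (std::size_t i = 0; i < n; ++i) {
    const double a = std::abs(x[i]);
    if (a > amax) return false;
    if (i < static_cast<std::size_t>(idx) && a == amax) return false;
  }
  return true;
}
```

A test in this style would fill X, call `batched_iamax_sketch`, and assert `is_first_abs_max` for every batch entry, which exercises ties as well as the plain maximum.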