[DRAFT] CUDA Scan implementation #1250

Open: wants to merge 1 commit into main


ZelboK commented Feb 18, 2024

Just an initial skeleton of a scan implementation for CUDA. For brevity, I used the reduce test spec to test my changes; it would obviously need its own spec.


copy-pr-bot bot commented Feb 18, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

// Debug output: print the first 10 elements of host_out.
for (size_t i = 0; i < 10; ++i) {
  std::cout << host_out[i] << std::endl;
}
self.op_state_.propagate_completion_signal(stdexec::set_value, d_out);
Author:

If you change d_out to *d_out, you can confirm that the data is being scanned correctly. I imagine it won't compile, though, because the completion signatures are wrong.


template <class SenderId, class ReceiverId, class InitT, class Fun>
struct receiver_t
    : public __algo_range_init_fun::receiver_t<
ZelboK (Author) commented Feb 18, 2024

This reuses algorithm_base.cuh. The ExclusiveScan API I used is the overload that lets you specify an initial value, so I could easily reuse this base. Nearly all of scan.cuh is identical to reduce, except for the CUB API it calls and the final return type.
The difference from reduce is that reduce returns a single value, whereas a scan returns an array of data; otherwise the two are very similar.
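Since scan and reduce differ mainly in the shape of their result, a host-side C++17 sketch of the two semantics may help when reviewing the CUDA version. It uses the standard library's std::reduce and std::exclusive_scan rather than CUB, so it only illustrates what the GPU code is expected to compute (the initial-value form of an exclusive scan should give the same results as CUB's ExclusiveScan with an initial value). The helpers reduce_sum and exclusive_scan_sum are hypothetical names, not part of this PR.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Host-side illustration only (not the PR's CUDA code). std::reduce and
// std::exclusive_scan are the standard C++17 counterparts of the reduce and
// exclusive-scan-with-initial-value operations discussed above.

// Reduce: combines every element with `init` into a single value.
int reduce_sum(const std::vector<int>& in, int init) {
  return std::reduce(in.begin(), in.end(), init, std::plus<>{});
}

// Exclusive scan: one output per input element, where
// out[0] = init and out[i] = init + in[0] + ... + in[i - 1].
std::vector<int> exclusive_scan_sum(const std::vector<int>& in, int init) {
  std::vector<int> out(in.size());
  std::exclusive_scan(in.begin(), in.end(), out.begin(), init, std::plus<>{});
  return out;
}
```

For the input {1, 2, 3, 4, 5} with an initial value of 0, reduce_sum returns 15, while exclusive_scan_sum returns {0, 1, 3, 6, 10}: output i holds the initial value combined with every element before position i, so the result has as many elements as the input.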

// template <class Range>
// using _set_value_t = completion_signatures<set_value_t(
// std::vector<typename __algo_range_init_fun::binary_invoke_result_t<Range, InitT, Fun>>)>;

Author:
I'm not sure how to get the completion signatures right and am hoping for some guidance.

ZelboK (Author) commented Mar 2, 2024

@gevtushenko Can you take a look?
