Investigate the use of SYCL #376
Comments based on a first (ongoing) reading of the specification, version 1.2.1 revision 5:
@makortel FYI
Thanks. Below I'm mostly thinking out loud.
I don't know, but I really hope we don't need them (it sounds like a potential slowdown).
Does
I'm hoping we would not need such data structures, but I can also imagine we could easily have cases where they would be needed. To me this point is a double-edged sword: on one hand it is restrictive; on the other hand, I suppose SYCL would be the way for us to run on certain GPUs, so if we want to do that we would have to accept the restriction. Then again, if we were to use a "higher-level" abstraction than SYCL, without such a restriction for the non-SYCL backends, we could still start with the hardware that needs SYCL by simply dropping out those modules that need hierarchical structures.
According to the documentation, CUDA supports system-wide atomic operations starting from Pascal (sm 6.x GPUs) and Xavier (sm 7.2 SoC):
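As an illustration (not part of the original comment), a minimal CUDA sketch of what a system-wide atomic looks like; the kernel and the counter are made up, and the _system suffix requires compute capability 6.0 or newer:

    // The counter is assumed to live in managed memory (cudaMallocManaged),
    // so the CPU and other GPUs can observe the same atomic updates.
    __global__ void countHits(int const* quality, int n, int* counter) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n && quality[i] > 0) {
        atomicAdd_system(counter, 1);   // system-wide scope (Pascal / sm_60 and later)
        // atomicAdd(counter, 1);       // plain version: atomic only within one device
      }
    }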
That corresponds to the SYCL work-group barrier. According to the documentation, cooperative groups should allow for different granularities. Unfortunately the documentation is a bit vague, so it is not clear, for example, whether this is allowed:

    if (...) {
      auto active = coalesced_threads();
      ...
      active.sync();
    }
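For comparison, a minimal sketch (not taken from the thread) of the SYCL 1.2.1 work-group barrier referred to above; the kernel name, buffer, and sizes are made up:

    #include <CL/sycl.hpp>

    // nd_item::barrier() synchronizes the work-items of one work-group,
    // analogous to __syncthreads(); the 1.2.1 spec itself does not expose
    // sub-group or grid-wide synchronization like CUDA cooperative groups.
    void runKernel(cl::sycl::queue& queue, cl::sycl::buffer<float, 1>& buf) {
      queue.submit([&](cl::sycl::handler& cgh) {
        auto acc = buf.get_access<cl::sycl::access::mode::read_write>(cgh);
        cgh.parallel_for<class example_kernel>(
            cl::sycl::nd_range<1>{cl::sycl::range<1>{1024}, cl::sycl::range<1>{256}},
            [=](cl::sycl::nd_item<1> item) {
              auto i = item.get_global_id(0);
              acc[i] += 1.f;                                              // first phase
              item.barrier(cl::sycl::access::fence_space::local_space);  // work-group barrier
              // second phase: safe to read values written by the same work-group
            });
      });
    }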
It seems Intel is adding some extensions to SYCL for its own compiler and GPUs: https://github.com/intel/llvm/blob/sycl/sycl/ReleaseNotes.md
So our baseline may actually be a superset of SYCL 1.2.1 (or a new SYCL version).
Thanks for the clarifications.
Interesting. It makes me feel even more strongly that, for the time being, it might be better not to commit to SYCL for all platforms but to keep it specific to Intel (and adjust if/when the landscape changes).
Some more details:
I have not read them, but it looks like Intel's SYCL will have pointers and the equivalent of CUDA Unified Memory ...
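For illustration only, a rough sketch of what such code could look like with Intel's Unified Shared Memory extension; the malloc_shared/free spelling is an assumption, since the extension has been evolving:

    #include <CL/sycl.hpp>
    #include <cstddef>

    // Shared (automatically migrating) allocation reachable through a plain
    // pointer, much like cudaMallocManaged memory in CUDA.
    void scaleInPlace(cl::sycl::queue& queue, std::size_t n) {
      float* data = cl::sycl::malloc_shared<float>(n, queue);   // USM extension
      for (std::size_t i = 0; i < n; ++i)
        data[i] = float(i);
      queue.submit([&](cl::sycl::handler& cgh) {
        cgh.parallel_for<class scale_kernel>(cl::sycl::range<1>{n},
                                             [=](cl::sycl::id<1> i) {
          data[i] *= 2.f;   // raw pointer dereferenced directly in device code
        });
      }).wait();
      cl::sycl::free(data, queue);
    }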
Other useful extensions for us
In progress in the pixel track standalone code base:
From https://www.khronos.org/sycl/:
Specifications:
Implementations: