Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Add support for work group memory free function kernel parameter #15861

Open
wants to merge 118 commits into
base: sycl
Choose a base branch
from

Conversation

lbushi25
Copy link
Contributor

@lbushi25 lbushi25 commented Oct 24, 2024

This PR concludes the implementation of the work group memory extension.
It adds support for work group memory parameters when using free function kernels.

lbushi25 and others added 30 commits August 14, 2024 15:16
*Result += mem[i];
}
} else {
sycl::ext::oneapi::experimental::work_group_memory<int> ret;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line creates a "dummy" work_group_memory that does not allocate any SLM memory. Therefore, the call to sum_helper should not work because it writes the sum into an unallocated memory location.

Note that work_group_memory is not like work_group_static. You cannot use it to statically allocate local memory. It can only be used to pass SLM via a kernel argument.

Does this code actually work? It seems like it would end up dereferencing an uninitialized ptr.

@Pennycook: I wonder if this will be a common source of confusion for users. Some possible solutions:

  • We could delete the default constructor, which would prevent people from writing code like this. The only remaining constructors take a handler, so it would only be possible to construct a work_group_memory from command group scope.

  • We could require T to be an unbounded array. I think this would dissuade people from trying to statically allocate an SLM variable because it would be obvious that the array has no extent. I think the most common use case is for T to be an unbounded array. People can use work_group_static in the case when T is not unbounded.

  • We could change T to be the element type of an implied array, making work_group_memory more similar to local_accessor.

Copy link
Contributor Author

@lbushi25 lbushi25 Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point. I also agree that it seems a bit too easy to make this kind of mistake.
My choice would be either the second bullet point or simply leaving it as is. The spec does say that calling the default constructor from device code is not allowed and one does not need to do much digging to find this sentence in the spec.
I stand corrected, it is actually allowed to call the default constructor from device code, but it would be a dummy object and any operation on it would be undefined behavior except for assigning another object to it.
The frontend, on the other hand, does actually call the default constructor from device code and calls its __init method to set up the device-side objects, which is why I'd really like to keep the default constructor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could delete the default constructor, which would prevent people from writing code like this. The only remaining constructors take a handler, so it would only be possible to construct a work_group_memory from command group scope.

I'm hesitant to do this, because it seems like the default constructor here fulfils a similar purpose to accessor(). If we want developers to be able to swap out local accessors for this simpler class, don't we need to support something like this?

We could require T to be an unbounded array. I think this would dissuade people from trying to statically allocate an SLM variable because it would be obvious that the array has no extent. I think the most common use case is for T to be an unbounded array. People can use work_group_static in the case when T is not unbounded.

Again, I think we should look at what accessor does here. It's (currently) meaningful to create a placeholder 0-dimension accessor, so for completeness it seems like we should support T as well as T[].

We could change T to be the element type of an implied array, making work_group_memory more similar to local_accessor.

We could do this, but I quite like that work_group_memory supports scalars. The 0-dimension part of accessor has always felt like a hack to me, and if we have an opportunity to move away from that I'd prefer we try and find a solution that works.


As @lbushi25 pointed out, the extension specification already says "Passing a dummy object as a kernel argument or calling any of its other member functions or operators produces undefined behavior.", and the statement about UB seems to hold for both host and device. If we're worried that people will make this mistake a lot, could we just teach the compiler to throw an error if it sees some usage of an uninitialized work_group_memory variable?

Am I right in thinking that a sanitizer would already catch this as an error, since at runtime it will just look like a dereference of a null pointer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant to do this, because it seems like the default constructor here fulfils a similar purpose to accessor(). If we want developers to be able to swap out local accessors for this simpler class, don't we need to support something like this?

I have the same general concern, but I'm not sure how real it is. I think we added the accessor default constructor to SYCL 2020 only to make accessor meet the C++ requirements of "container". I'm not sure if application code really uses it, though.

If we're worried that people will make this mistake a lot, could we just teach the compiler to throw an error if it sees some usage of an uninitialized work_group_memory variable?

What do you mean by "throw an error"? Do you mean throw an exception (at runtime) or issue a diagnostic (at compile time)? We cannot throw an exception in device code because exceptions aren't supported there. Issuing a diagnostic at compile time would require tracing the usage of the variable, which is not something the front end normally does.

Am I right in thinking that a sanitizer would already catch this as an error, since at runtime it will just look like a dereference of a null pointer?

Yes, I think the sanitizer could probably catch this. I'm not sure if it would work out of the box, or if we would need to add some special handling. Note, however, that people need to actively choose to run the sanitizer, so it would be easy to have this sort of error in your application and not know about it.

Copy link
Contributor

@Pennycook Pennycook Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same general concern, but I'm not sure how real it is. I think we added the accessor default constructor to SYCL 2020 only to make accessor meet the C++ requirements of "container". I'm not sure if application code really uses it, though.

Wouldn't a default constructor also allow people to define classes that store accessor/work_group_memory members, without having to initialize them upon construction? I thought that this was the use-case for placeholder accessors: somebody defines an object that will store an accessor, but things don't get bound until that object is later used somewhere within a submit and gets associated with a command-group handler.

What do you mean by "throw an error"? Do you mean throw an exception (at runtime) or issue a diagnostic (at compile time)? We cannot throw an exception in device code because exceptions aren't supported there. Issuing a diagnostic at compile time would require tracing the usage of the variable, which is not something the front end normally does.

I meant either. I keep coming back to this new C++26 idea of "erroneous behavior", which could be extended to cases like these. I wasn't necessarily thinking about what would be easy to do, just possible solutions.

The front-end must already have some limited support for this sort of thing, though, because it gives a useful warning in the equivalent pointer case:

int* ptr;
*ptr = 5;
<source>:3:6: warning: variable 'ptr' is uninitialized when used here [-Wuninitialized]
    3 |     *ptr = 5;
      |      ^~~
<source>:2:13: note: initialize the variable 'ptr' to silence this warning
    2 |     int* ptr;
      |             ^
      |              = nullptr
1 warning generated.

I have no idea how difficult it would be to extend this support to other types, but it might be possible.

Yes, I think the sanitizer could probably catch this. I'm not sure if it would work out of the box, or if we would need to add some special handling. Note, however, that people need to actively choose to run the sanitizer, so it would be easy to have this sort of error in your application and not know about it.

Yeah, I agree it's not a perfect solution. But I think what you've said applies to pretty much every use of memory in C++. If somebody created an uninitialized local_ptr<T> or T* inside of a kernel they'd hit UB just as quickly as in this example, and they'd have to use a sanitizer to catch it. So leaving this as UB wouldn't really be making things worse.

If we really want to make the behavior more obvious, maybe we could replace the default constructor with something like work_group_memory(sycl::uninitialized)? Then the example that started this discussion would have to be written as:

sycl::ext::oneapi::experimental::work_group_memory<int> ret = sycl::uninitialized{};

The use-case where a user-defined class has an uninitialized member would also work:

struct S
{
    sycl::work_group_memory<float> m = sycl::uninitialized{};
};

It doesn't really solve the problem, because somebody can still create an uninitialized version if they want to. But it might help to stop somebody from making the mistake.

EDIT: Actually, we might even want to call it something like sycl::indeterminate or sycl::indeterminate_value (see https://en.cppreference.com/w/cpp/language/attributes/indeterminate).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants