
[RFC] First class Triton support in OpenXLA Nvgpu #54

Open
ezhulenev opened this issue May 4, 2023 · 6 comments

Comments

@ezhulenev
Contributor

ezhulenev commented May 4, 2023

[RFC] First class Triton support in OpenXLA-Nvgpu

We want to improve the state of the Triton and OpenXLA integration, and make jax-triton friendlier to both users and the compiler.
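
For context, a minimal sketch of the kind of jax-triton usage this RFC targets, assuming jax-triton's `triton_call` entry point; `add_kernel`, the wrapper `add`, and the tile size are illustrative placeholders, not part of the RFC:

```
import jax
import jax.numpy as jnp
import jax_triton as jt
import triton
import triton.language as tl

# Illustrative Triton kernel: each program instance adds one block of elements.
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, block_size: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * block_size + tl.arange(0, block_size)
    tl.store(out_ptr + offsets, tl.load(x_ptr + offsets) + tl.load(y_ptr + offsets))

def add(x, y, block_size=8):
    # triton_call stages the Triton kernel into the XLA program as a custom call.
    out_shape = jax.ShapeDtypeStruct(shape=x.shape, dtype=x.dtype)
    return jt.triton_call(
        x, y,
        kernel=add_kernel,
        out_shape=out_shape,
        grid=(x.size // block_size,),
        block_size=block_size)

x = jnp.arange(8, dtype=jnp.float32)
print(jax.jit(add)(x, x))
```

Today this path goes through an opaque custom call; the RFC is about making it a first-class citizen of the compiler.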

Please let us know what you think!

ezhulenev changed the title from "[RFC]" to "[RFC] First class Triton support in OpenXLA Nvgpu" on May 4, 2023
@stellaraccident
Contributor

Neat. My main comment is a meta one: the openxla-nvgpu project is still pretty young and is even missing CI and full/proper build support/integration. I'm open to moving fast, but we also need to prioritize some project infrastructure work to hold everything together.

@benvanik

benvanik commented May 4, 2023

There are other ways of doing this that are much better integrated and should all work today. I'll respond on the doc, but the short of it is that custom dispatches (à la samples/custom_dispatch/cuda/) are sufficient and well supported; custom modules and other machinery should not be required.

@ezhulenev
Contributor Author

👍 Good point, I think we can start with custom dispatches. Although if we want to push Triton compilation to run time and bundle it with autotuning (mostly tile selection?), we won't be able to do that as a custom dispatch, will we?

@benvanik

benvanik commented May 4, 2023

Ah, so you're intending to use the same compiled IREE program but vary the Triton kernels without recompiling the program?

@ezhulenev
Contributor Author

I think we'll have both strategies:

  1. Triton IR is fully defined at compile time and we just produce PTX from it (that's what I want to start with).
  2. Triton IR is parameterized (I don't think that's representable in Triton IR today), and at run time we tune it (that's how XLA:GPU uses Triton for matmuls today, to pick the best tiling strategy); see the sketch after this list.
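
Not from the thread, but to make strategy 2 concrete: a rough sketch of run-time tile selection expressed as user-level Python, reusing the illustrative `add` wrapper from the earlier sketch. `autotune_block_size` is a hypothetical helper, a stand-in for what XLA:GPU does internally when it tunes its Triton matmul tilings:

```
import time

import jax

# Hypothetical helper: time each candidate tile size and keep the fastest.
# Assumes the add(x, y, block_size=...) wrapper from the earlier sketch and
# that x.size is divisible by every candidate.
def autotune_block_size(x, y, candidates=(64, 128, 256)):
    best, best_time = None, float("inf")
    for block_size in candidates:
        fn = jax.jit(lambda a, b, bs=block_size: add(a, b, block_size=bs))
        fn(x, y).block_until_ready()  # first call compiles; also warms up
        start = time.perf_counter()
        fn(x, y).block_until_ready()
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = block_size, elapsed
    return best
```

The real mechanism would tune inside the compiler/runtime rather than recompiling from user code, which is why strategy 2 doesn't fit a plain custom dispatch.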

@benvanik

benvanik commented May 4, 2023

Cool. For #1 the custom dispatch way should work. For #2 there are some other ways that are potentially easier:

  - Executable specialization constants can be used to parameterize executables when they are loaded, though they may be slightly trickier to integrate with black boxes. It may still be interesting to reuse that mechanism with a custom executable type at runtime.
  - Another option would be to have your custom module return a !hal.executable and schedule work as normal, but at that point it's probably best to use streamable custom calls instead: you'd take your params as push constants, do whatever you needed, and then launch the kernel against the stream.

ezhulenev added a commit that referenced this issue May 17, 2023
Initial implementation of the first-class Triton integration: #54

Requires Triton + patches from
https://github.com/ezhulenev/triton/commits/openxla-triton

```
git submodule update --remote third_party/triton
```

Run tests:
```
ctest --test-dir build -R triton
```