When a `Triton::MakeTensorPtrOp` has to be rewritten by the `RewriteTensorPointer` pass to use "regular" memory operations, the generated code appears to perform worse than code written directly with regular operations.
The `Triton::AdvanceOp` operations are used as anchors to generate the new memory accesses, which places all of the pointer-calculation code inside the loop, even though a significant part of these instructions could be hoisted out of it.
For example:
The relevant section of the TritonGPU MLIR code for the 03 tutorial looks like:
We can fall back to gather/scatter memory accesses when lowering tt.load and tt.store from TTGIR to SIMT LLVM.
The offsets can be recomputed from the block pointer's information during lowering.
That way we can remove the RewriteTensorPointer pass, which may not be efficient.
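As a rough illustration of the suggestion above, the sketch below models in plain Python how per-element gather offsets and a bounds mask could be recomputed from the fields a 2D block pointer carries (base, shape, strides, offsets, block shape). It is not Triton or LLVM code: `BlockPtr`, `gather_indices`, `load_block`, and `advance` are hypothetical names introduced only for this example, and the flat list stands in for device memory.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BlockPtr:
    """Hypothetical stand-in for the state carried by a 2D block pointer."""
    base: int           # flat element index where the parent tensor starts
    shape: tuple        # (rows, cols) of the parent tensor
    strides: tuple      # (row_stride, col_stride), in elements
    offsets: tuple      # (row, col) of the block's top-left corner
    block_shape: tuple  # (block_rows, block_cols)


def gather_indices(p):
    """Recompute flat indices and an in-bounds mask for every block element."""
    idx, mask = [], []
    for i in range(p.block_shape[0]):
        row = p.offsets[0] + i
        idx_row, mask_row = [], []
        for j in range(p.block_shape[1]):
            col = p.offsets[1] + j
            idx_row.append(p.base + row * p.strides[0] + col * p.strides[1])
            mask_row.append(0 <= row < p.shape[0] and 0 <= col < p.shape[1])
        idx.append(idx_row)
        mask.append(mask_row)
    return idx, mask


def load_block(buffer, p, other=0):
    """Masked gather: out-of-bounds elements are never read and yield `other`."""
    idx, mask = gather_indices(p)
    return [[buffer[k] if m else other for k, m in zip(idx_row, mask_row)]
            for idx_row, mask_row in zip(idx, mask)]


def advance(p, delta):
    """Model of advancing a block pointer: only the offsets change."""
    return replace(p, offsets=(p.offsets[0] + delta[0], p.offsets[1] + delta[1]))


# A 3x4 row-major tensor stored in a flat buffer.
buf = list(range(12))
p = BlockPtr(base=0, shape=(3, 4), strides=(4, 1), offsets=(0, 0), block_shape=(2, 2))
first = load_block(buf, p)
second = load_block(buf, advance(p, (2, 2)))  # second block row is out of bounds
```

In this model the block pointer itself stays a small record that is cheap to advance, and the expansion into offsets and masks happens only at the point of the load, which is the property the comment relies on.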
chengjunlu changed the title from "Improve the code generated by the RewriteTensorPointer pass." to "[Performance] Improve the code generated by the RewriteTensorPointer pass." on Aug 30, 2024
By comparison, the code for an equivalent of the 03 tutorial using block pointers (after forcing the block pointers to be rewritten) looks like:
The `RewriteTensorPointer` pass should therefore be optimized to hoist these extra instructions out of the loop.
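To make the requested hoisting concrete, here is a small plain-Python model of the two code shapes. It is only a sketch under simplifying assumptions: addresses are integers, the loop advances along one dimension by a fixed step, and the function and parameter names (`addresses_recomputed`, `addresses_hoisted`, `block_k`, and so on) are hypothetical rather than taken from the pass.

```python
def addresses_recomputed(base, rows, cols, stride_row, stride_col, block_k, num_iters):
    """Model of the current shape: the full address expression is rebuilt
    on every iteration from the running offset k."""
    out = []
    k = 0
    for _ in range(num_iters):
        tile = [[base + r * stride_row + (k + c) * stride_col for c in cols]
                for r in rows]
        out.append(tile)
        k += block_k
    return out


def addresses_hoisted(base, rows, cols, stride_row, stride_col, block_k, num_iters):
    """Model of the hoisted shape: the loop-invariant part is computed once,
    and the loop body only adds a constant step."""
    # Loop-invariant work, done once before the loop.
    tile = [[base + r * stride_row + c * stride_col for c in cols] for r in rows]
    step = block_k * stride_col
    out = []
    for _ in range(num_iters):
        out.append([row[:] for row in tile])
        # The only per-iteration work: one add per address.
        tile = [[addr + step for addr in row] for row in tile]
    return out


args = dict(base=0, rows=[0, 1], cols=[0, 1], stride_row=8, stride_col=1,
            block_k=2, num_iters=3)
slow = addresses_recomputed(**args)
fast = addresses_hoisted(**args)
```

Both functions produce the same address tiles; the difference is that the second performs the multiplications and the row/column expansion once instead of once per iteration, which is the saving hoisting is expected to bring.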