[intel] Remove `RewriteTensorPointer` pass #2359
base: main
Conversation
Signed-off-by: Whitney Tsang <[email protected]>
In principle this looks OK, but it is a pretty big divergence from upstream. I understand we want to propagate the tensor pointer as long as possible so we can lower it to 2D block loads where possible. But if a 2D block load is not possible, do we lose the possibility of optimizing the unpacked load in the TTGIR? The other disadvantage is for debugging: now the Triton Intel GPU to LLVM pass does even more work, and it's very hard to debug individual pieces of that pass versus if we could represent TritonGEN::Matrix2BlockLoads in the ttgir.
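(For context, a minimal sketch of the kind of `tt.load` under discussion, written in Triton's Python DSL. The kernel, names, and shapes are illustrative assumptions, not code from this PR; whether the backend actually lowers such a load to a 2D block load depends on the layout and alignment it sees.)

```python
import triton
import triton.language as tl


@triton.jit
def copy_kernel(in_ptr, out_ptr, M, N,
                stride_m, stride_n,
                BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    # Becomes tt.make_tensor_ptr in TTIR: carries base, shape,
    # strides, offsets, and order together as one block pointer.
    src = tl.make_block_ptr(
        base=in_ptr, shape=(M, N), strides=(stride_m, stride_n),
        offsets=(pid_m * BLOCK_M, pid_n * BLOCK_N),
        block_shape=(BLOCK_M, BLOCK_N), order=(1, 0),
    )
    # A tt.load on a block pointer: the load that may (or may not)
    # be lowered to a 2D block load by the Intel backend.
    x = tl.load(src, boundary_check=(0, 1))
    dst = tl.make_block_ptr(
        base=out_ptr, shape=(M, N), strides=(stride_m, stride_n),
        offsets=(pid_m * BLOCK_M, pid_n * BLOCK_N),
        block_shape=(BLOCK_M, BLOCK_N), order=(1, 0),
    )
    tl.store(dst, x, boundary_check=(0, 1))
```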
I understand your worries. IMO, we should solve this problem in general, even for block pointers that can be lowered to 2D block loads. One idea is to introduce a new interface upstream and modify the optimization passes to operate on that interface, so they work for both tensors of pointers and block pointers.
I am not sure I completely understand your idea. Another motivation to remove `RewriteTensorPointer`:

Baseline: …
Remove `RewriteTensorPointer`: …

=> 06-fused-attention.py performance degraded for FP8. We need to improve …
Yup. Is this something you plan to work on in this PR? If not, we can put this PR in draft mode (until that piece of work is done).
Moved to draft. |
After #2181, `tt.load` can be lowered with an arbitrary combination of block pointer and layout, so we can simply remove the `RewriteTensorPointer` pass.
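(To make concrete what the removed pass did, here is a hedged sketch of the two load forms involved, again in Triton's Python DSL. The helper names, shapes, and bodies are illustrative assumptions, not code from this PR or from the pass itself. Roughly, `RewriteTensorPointer` rewrote loads through a block pointer, as in the first helper, into loads through a tensor of pointers with an explicit mask, as in the second, so later lowering only had to handle one shape of `tt.load`.)

```python
import triton
import triton.language as tl


@triton.jit
def load_via_block_ptr(ptr, M, N, stride_m,
                       BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    # Block-pointer form: kept intact now that the lowering
    # handles any block pointer/layout combination.
    blk = tl.make_block_ptr(
        base=ptr, shape=(M, N), strides=(stride_m, 1),
        offsets=(0, 0), block_shape=(BLOCK_M, BLOCK_N), order=(1, 0),
    )
    return tl.load(blk, boundary_check=(0, 1))


@triton.jit
def load_via_tensor_of_ptrs(ptr, M, N, stride_m,
                            BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    # Tensor-of-pointers form: roughly what the pass produced,
    # one pointer and one mask element per value (an unpacked load).
    offs_m = tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    ptrs = ptr + offs_m[:, None] * stride_m + offs_n[None, :]
    mask = (offs_m[:, None] < M) & (offs_n[None, :] < N)
    return tl.load(ptrs, mask=mask, other=0.0)
```

Keeping the first form in the TTGIR is what lets the backend choose a 2D block load when the layout permits it; the second form discards the shape/stride information a 2D block load needs.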