Fix order #4914
base: main
I don't understand what problem this PR is trying to address. Is there a specific case where you found something wrong?
It solves the case where your tensor has a contiguity of all 1s (or, more generally, where the contiguity of every dimension in the tensor is equal). In that case, using … In my use case, I had a load (which cannot be vectorized) that was consumed by a store (which can be vectorized). The transpose introduced by the coalesce pass led to an unnecessary convertLayout in the final code. The root cause of this is the fact that the … This caused some performance issues, which go away with this PR.
It seems to me that it's better to take a look at the …
The convert layout is generated because the … I understand that not all layout conversions are expensive. However, this one is indeed expensive and unnecessary.
Either [1, 0] or [0, 1] is correct. I don't yet know the performance impact of changing it to [1, 0] for other test cases, so I'll block the PR for now.
Please add a lit test in https://github.com/triton-lang/triton/blob/main/test/TritonGPU/coalesce.mlir.

I agree with Keren that this doesn't address the problem of avoiding incompatible layouts. That being said, I find it a bit odd that we pick a transposed layout when the compiler cannot tell anything about contiguity. Right now, the Triton to TritonGPU conversion defaults to [1, 0], and I think we should keep it that way when we have no information about contiguity, which is what this PR does. @Jokeren, what do you think?
I think it's fine to merge the PR if there's no perf regression. Otherwise, you may have to bisect the commits later anyway.
Yes, definitely. We can test for that or let the nightly runs catch it.
Makes the default order derived from contiguity row-major. It used to be transposed by default, which could lead to unnecessary convert layouts in the final TTGIR when the contiguity was equal in all dimensions.

Adds a new `getOrderFromContiguity` function to handle the default case differently, since the order reversal would be surprising from a call to `argsort`.
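To illustrate the tie-breaking issue, here is a minimal, hypothetical sketch (plain `std::vector`, not the PR's actual code or Triton's real helper signatures). It assumes an order lists dimensions fastest-varying first (so row-major 2D is `{1, 0}`) and that the order is obtained by a stable argsort of contiguity in descending order. A plain argsort keeps tied dimensions in ascending index order, giving the transposed `{0, 1}`; seeding the index list in reverse makes ties fall back to row-major instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Plain argsort, descending by contiguity. std::stable_sort keeps tied
// dimensions in their original ascending index order, so equal contiguities
// yield {0, 1, ...}: dimension 0 becomes the fastest-varying one (transposed).
std::vector<unsigned> argSortDescending(const std::vector<int64_t> &contiguity) {
  std::vector<unsigned> order(contiguity.size());
  std::iota(order.begin(), order.end(), 0u);
  std::stable_sort(order.begin(), order.end(), [&](unsigned a, unsigned b) {
    return contiguity[a] > contiguity[b];
  });
  return order;
}

// Hypothetical sketch in the spirit of getOrderFromContiguity: start from the
// reversed (row-major) index list {n-1, ..., 0} so that stable_sort breaks
// ties in favour of row-major order. A dimension with strictly larger
// contiguity still moves to the front.
std::vector<unsigned> orderFromContiguity(const std::vector<int64_t> &contiguity) {
  std::vector<unsigned> order(contiguity.size());
  std::iota(order.rbegin(), order.rend(), 0u);  // order = {n-1, ..., 1, 0}
  std::stable_sort(order.begin(), order.end(), [&](unsigned a, unsigned b) {
    return contiguity[a] > contiguity[b];
  });
  return order;
}
```

For a 2D tensor with contiguity `{1, 1}`, `argSortDescending` returns `{0, 1}` while `orderFromContiguity` returns `{1, 0}`. When one dimension is strictly more contiguous (for example `{4, 1}`), both functions return the same order, so only the tie case changes.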
Complete the following tasks before sending your PR, and replace `[ ]` with `[x]` to indicate you have done them.

- I am not making a trivial change, such as fixing a typo in a comment.
- I have written a PR description following these rules.
- I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.

Select one of the following.

- `/test` for `lit` tests
- `/unittest` for C++ tests
- `/python/test` for end-to-end tests
- FILL THIS IN

Select one of the following.

- `lit` tests.
- `lit` tests I have added follow these best practices, including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)