
Merge OpenAI Triton commit 3613bf4 #2574

Merged
9 commits merged into main from whitneywhtsang/merge2 on Oct 25, 2024
Conversation

@whitneywhtsang (Contributor) commented on Oct 25, 2024

This PR changes the Triton base from 1064b59 to 3613bf4 (Oct 24).
Pass rate: 98.98%

Please do not squash and merge this PR.

makslevental and others added 8 commits October 23, 2024 08:15
Note: there are no uses of `nvgpu::` in this lib. This unblocks building
the `*-opt` tools with a "custom" LLVM that was built with
`-DLLVM_TARGETS_TO_BUILD="host;AMDGPU"` (i.e., without `NVPTX`).
This PR implements general conversion of MFMA dot operand
to Linear Layout.
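For intuition only (this toy is not the compiler's actual C++ implementation), a linear layout maps each bit of a hardware index (register, lane, warp) to a fixed basis offset and combines the selected bases with XOR over GF(2):

```python
# Toy model of a linear layout: each set bit of the hardware index
# contributes a fixed basis offset; contributions combine with XOR.
def apply_layout(bases, hw_index):
    out = 0
    for bit, base in enumerate(bases):
        if (hw_index >> bit) & 1:
            out ^= base
    return out

# Two lane bits mapping to strides 1 and 4: lanes 0..3 -> offsets 0, 1, 4, 5.
print([apply_layout([1, 4], lane) for lane in range(4)])
```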
Hopper supports vectorized atomics for add, max, and min. This PR adds
support for generating these instructions.

Note: atomic add/min/max also have packed instructions for `f16x2` and
`bf16x2`. Packed instructions were used prior to this PR, but vectorized
instructions weren't. Where vectorized instructions are available, this
PR switches to them (e.g., `.v2.f16` instead of `.f16x2`, or `.v8.f16`
instead of `.v4.f16x2`); where they aren't, packed instructions are
still used.

This PR also adds a mask-alignment check, which was previously missing.
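As a user-level illustration (not code from this PR's diff; the kernel and buffer names below are made up), a masked, contiguous fp16 `tl.atomic_add` is the kind of operation that can now lower to the vectorized forms when the pointers and mask are suitably aligned:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def atomic_add_f16(dst_ptr, src_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    val = tl.load(src_ptr + offs, mask=mask)
    # Contiguous, aligned fp16 atomics like this are candidates for the
    # vectorized PTX forms (e.g. .v2.f16 / .v8.f16); when the vectorized
    # forms don't apply, the packed .f16x2 path is used instead.
    tl.atomic_add(dst_ptr + offs, val, mask=mask)

n = 4096
dst = torch.zeros(n, dtype=torch.float16, device="cuda")
src = torch.randn(n, dtype=torch.float16, device="cuda")
atomic_add_f16[(triton.cdiv(n, 1024),)](dst, src, n, BLOCK=1024)
```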
…4974)

This is a quick follow-up to the recent autotuner/testing changes in
triton-lang/triton#4496. This PR moves the empty cache creation into the
driver code to make the code more device-independent.
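A minimal sketch of the idea, assuming a hook named `get_empty_cache_for_benchmark` (the name and exact signature are assumptions for this sketch, not necessarily what the diff adds): benchmarking code asks the active driver for a cache-flushing buffer instead of allocating a CUDA-specific one inline:

```python
import torch

class DriverBase:
    def get_empty_cache_for_benchmark(self):
        # Hypothetical hook: each backend returns a buffer that timed runs
        # write to in order to flush the cache between measurements.
        raise NotImplementedError

class CudaDriver(DriverBase):
    def get_empty_cache_for_benchmark(self):
        cache_size = 256 * 1024 * 1024  # comfortably larger than L2
        return torch.empty(cache_size // 4, dtype=torch.int32, device="cuda")

# Device-independent benchmark loop:
#   cache = driver.get_empty_cache_for_benchmark()
#   cache.zero_()  # flush before each timed run
```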
…d (#4980)

The bitwidth is unimplemented in LLVM for pointer types, so evaluating the condition `tensorTy.getElementType().getIntOrFloatBitWidth()` throws an exception.
This commit refactors the AccelerateAMDMatmul patterns
in prep for mxfp support.
@whitneywhtsang whitneywhtsang self-assigned this Oct 25, 2024
@whitneywhtsang whitneywhtsang marked this pull request as ready for review October 25, 2024 11:58
@whitneywhtsang whitneywhtsang changed the title Merge OpenAI Triton commit 3c13f09 Merge OpenAI Triton commit 3613bf4 Oct 25, 2024
@whitneywhtsang whitneywhtsang merged commit be47a27 into main Oct 25, 2024
5 checks passed
@whitneywhtsang whitneywhtsang deleted the whitneywhtsang/merge2 branch October 25, 2024 14:38