-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[BACKEND] Use vectorized atomics on Hopper (#4971)
Hopper supports vectorized atomics for add, max, and min. This PR adds support for generating these instructions. Note: atomic add/min/max also have packed instructions for f16x2 and bf16x2. Packed instructions were used prior to this PR, but vectorized instructions weren't. When vectorized instructions are available, this PR switches to using vectorized instructions (like .v2.f16 instead of .f16x2, or .v8.f16 instead of .v4.f16x2). When vectorized instructions aren't available, packed instructions will be used instead. This PR also adds a check for mask alignment, which wasn't previously checked.
- Loading branch information
1 parent
a20ce64
commit a1aa58b
Showing
3 changed files
with
184 additions
and
32 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters