FP6 Speed on A100 80g #1181
Comments
This looks related to #1092 (the speedup numbers are similar). What is your torchao version? Can you try updating torchao, or installing the nightly build or building from source? It should be fixed in 0.6.1.
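For anyone else checking which build they have, here is a minimal sketch using only the Python standard library. The helper names `installed_version` and `describe_version` are made up for illustration and are not part of torchao. It only reports the release number, whether the build is a dev/nightly, and the local tag (such as `cu121` or a git hash); it does not decide whether a given build contains the fix.

```python
import re
from importlib import metadata


def installed_version(package: str = "torchao"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_version(version: str) -> dict:
    """Split a version string into release tuple, dev flag, and local tag."""
    # Everything after "+" is the local tag, e.g. "cu121" or "gitcbd90e38".
    public, _, local = version.partition("+")
    # Leading dotted digits form the release number, e.g. "0.7.0".
    match = re.match(r"(\d+(?:\.\d+)*)", public)
    release = tuple(int(p) for p in match.group(1).split(".")) if match else ()
    return {
        "release": release,
        "is_dev": ".dev" in public,
        "local": local or None,
    }


if __name__ == "__main__":
    v = installed_version()
    print(v, describe_version(v) if v else "torchao is not installed")
```

Note that a dev build can carry a higher release number than a stable one and still lack a fix, so the dev flag and local tag are worth including when reporting your environment.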
Thanks for your help! My torchao version is torchao-0.7.0.dev20241028+cu121. I tried 0.6.1 and got the correct performance.
If everything works as expected, let me know so I can close the issue.
Thanks for your help. I tried the latest torchao==0.7.0+gitcbd90e38 and it worked correctly. But when I installed torchao-0.7.0.dev20241028+cu121 again, I encountered the bug. ENV:
@shihaobai Have you tried recompiling the C++/CUDA code by running
@tobiasvanderwerff I tried building from the latest commit and it worked correctly.
ENV:
cuda: 12.1
torch: 2.5.0+cu121
python benchmark_fp6.py
Hello, have you tested the performance of the FP6 kernel on the A100? I found that it is much slower than FP16.