FP6 Speed on A100 80g #1181

Open
shihaobai opened this issue Oct 28, 2024 · 6 comments

@shihaobai

ENV:
cuda: 12.1
torch: 2.5.0+cu121
python benchmark_fp6.py
image
Hello, have you tested the performance of the FP6 kernel on the A100? I found that it is much slower than FP16.

@gau-nernst
Collaborator

This looks related to #1092 (the speedup numbers are similar). What is your torchao version? Can you try updating torchao, or installing the nightly build or from source? It should be fixed in 0.6.1.
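For anyone following along, one way to answer the version question is to read the installed package metadata from the standard library. This is a minimal sketch, not part of torchao: the helper name `installed_version` is hypothetical, and it assumes the packages are published under the distribution names `torchao` and `torch`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not present in the current environment.
        return None


if __name__ == "__main__":
    # Assumed distribution names; adjust if your environment uses different ones.
    for name in ("torchao", "torch"):
        print(f"{name}: {installed_version(name)}")
```

Running this in the same environment as the benchmark shows which build is actually being imported, which helps when a release and a nightly have both been installed at different times.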

@shihaobai
Author

Thanks for your help! My torchao version is torchao-0.7.0.dev20241028+cu121. I tried 0.6.1 and got the correct performance.

@gau-nernst
Collaborator

I think torchao-0.7.0.dev20241028+cu121 should already include the fix. Can you double-check that torchao-0.7.0.dev20241028+cu121 is also working correctly?

If everything works as expected, let me know so I can close the issue.

cc @tobiasvanderwerff

@shihaobai
Author

Thanks for your help. I tried the latest torchao==0.7.0+gitcbd90e38 and it worked correctly. But when I installed torchao-0.7.0.dev20241028+cu121 again, I encountered the bug:
image

ENV:
cuda: 12.1
torch: 2.5.0+cu121
pip install torchao==0.7.0.dev20241028 --index-url https://download.pytorch.org/whl/nightly/cu121

@tobiasvanderwerff
Contributor

@shihaobai Have you tried recompiling the C++/CUDA code by running `pip install .` in the base directory of ao? This might help resolve the error.

@shihaobai
Author

@tobiasvanderwerff I tried with the latest commit and it worked correctly.
