add test configurations to run Torch compile (#95) #155
Conversation

aliabdelkader commented Mar 12, 2024 (edited)
- add test configurations for cuda inference using torch compile
Hi @aliabdelkader,
aliabdelkader force-pushed from 49c3fad to 3963db8
it seems there's an error with
Yes, I did, but to be honest in a fairly limited environment, basically a Colab/Kaggle notebook. I have seen that error before, but I mistakenly thought it was a limitation of that environment's setup. Now I think that inductor/Triton is attempting to benchmark different kernels on the GPU to pick the fastest kernel configuration, and one of those launches is producing that illegal memory access error. I am not an expert, but it seems that this behavior from inductor is disabled by forcing PyTorch/cuDNN to be deterministic, by setting torch.use_deterministic_algorithms() to true and the environment variable. I gave that a try locally and it seems to have removed the error. Would you like me to make the change or remove the test?
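For reference, a minimal sketch of forcing determinism along the lines described above. The exact environment variable was elided in the comment; PyTorch's documentation ties `torch.use_deterministic_algorithms(True)` on CUDA to `CUBLAS_WORKSPACE_CONFIG`, so that is the assumption here:

```python
import os

# Must be exported before the first CUDA call. Assumption: this is the
# environment variable the comment above refers to; PyTorch's deterministic
# mode requires a fixed cuBLAS workspace on CUDA >= 10.2.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")


def enable_determinism():
    # torch is imported lazily so the sketch stays importable without a GPU stack
    import torch

    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False  # stop cuDNN from autotuning kernels
```

Disabling `cudnn.benchmark` is the part that stops the kernel-autotuning launches suspected of causing the illegal memory access.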
@aliabdelkader thank you for looking this up,
- remove cpu / cuda test configurations that use gpt and torch compile - add test configuration to run the inference benchmark with a timm model and torch compile on a cuda gpu
- enabling the conv_1x1_as_mm option removes the cuda illegal memory access error that occurs during the tiny stable diffusion model's compilation on the CI machine or an nvidia T4 gpu
aliabdelkader force-pushed from 3963db8 to 54a1c09
Thanks for the blog post's link. I have checked the options mentioned there. I found that setting the mode to I decided to make a commit with I hope that was somewhat helpful. Please let me know what you think.
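As a sketch of the fix the commit above describes: inductor-specific options can be forwarded through `torch.compile`'s `options` dict. The `model` here is a placeholder, not part of the PR:

```python
# conv_1x1_as_mm tells inductor to lower 1x1 convolutions to matrix
# multiplies, sidestepping the convolution kernels suspected of triggering
# the illegal memory access during compilation of the tiny SD model.
compile_options = {"conv_1x1_as_mm": True}

# Hypothetical usage, assuming `model` is an nn.Module and torch is installed:
# import torch
# compiled = torch.compile(model, backend="inductor", options=compile_options)
```

Compared with forcing full determinism, this only changes how one op is lowered, which is why it reads as the less invasive fix.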
Thanks a lot for this optimal fix!
… rocm 5.7 - starting with pytorch nightly version 2.3, inductor uses a triton-rocm compiler version that attempts to load rocm 6. Therefore, the torch_compile tests fail with pytorch 2.3+ and rocm 5.7
I checked the cli rocm pytorch test failure. The test was failing because there is a missing library. It seems that inductor (the triton compiler) in the pytorch nightly version 2.3+ wants to load rocm 6. I removed the torch_compile tests from the workflow file for the pytorch nightly and rocm5.7 image. But that feels like a workaround. Do you think there is a better way of handling it?
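One alternative to deleting the tests from the workflow file is a version guard in the test suite itself. A minimal sketch with `unittest`; the version tuples are placeholders for values that would in practice be detected from the installed ROCm runtime and the Triton build bundled with the nightly:

```python
import unittest

# Placeholder values, assumed detected at runtime rather than hard-coded.
ROCM_VERSION = (5, 7)       # ROCm runtime on the CI machine
TRITON_REQUIRES = (6, 0)    # ROCm version the bundled Triton links against

# Skip torch_compile tests when the runtime is older than Triton expects,
# instead of removing them from the CI workflow entirely.
skip_torch_compile = unittest.skipIf(
    ROCM_VERSION < TRITON_REQUIRES,
    "inductor's Triton build needs ROCm 6; found an older runtime",
)


@skip_torch_compile
class TorchCompileTest(unittest.TestCase):
    def test_compile_runs(self):
        pass  # the real benchmark invocation would go here
```

This keeps the intent visible in the suite and re-enables the tests automatically once the image moves to ROCm 6.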
@aliabdelkader thanks for the fix, could we just set
… failure - In pytorch nightly starting with version 2.3, inductor uses a triton compiler that attempts to load rocm 6, which was causing torch_compile tests to fail. - remove pytorch nightly from the cli rocm pytorch workflow. - Revert "disable torch_compile tests for pytorch nightly version 2.3+ with amd rocm 5.7" commit
@IlyasMoutawwakil yes, I think that would work. I pushed that change.