Test #47: clblast_test_xconvgemm ...........Subprocess aborted***Exception: 0.45 sec #563
Comments
Could you compile CLBlast in VERBOSE mode to get additional output? And perhaps also share the output of the |
There is temporary build log with |
Thanks, but I think either CMake is suppressing the output of the test itself, or the VERBOSE build did not work. If you run |
Maybe there is not a lot of output because it's aborted early?
In gdb:
|
Another run in gdb with debuginfo installed:
|
Thanks, that's helpful. Indeed, it does not even seem to start the first kernel. It fails before that, during one of the compilation steps of the kernel. In particular, from your first backtrace, it seems to fail in GetIR() in one of the |
Yes, and it worked on 2024-07-08, so the cause is obscure. Maybe it's related to the clang update; I will try compiling with an older clang later.
I bisected our build process and found that the test has been failing since this change in the build environment:
-rpmi: clang17.0-17.0.6-alt4.2
-rpmi: clang17.0-support-17.0.6-alt4.2
-rpmi: libclang-cpp17-17.0.6-alt4.2
+rpmi: libomp18.1-18.1.8-alt0.1
-rpmi: libpocl2-5.0-alt0.2
+rpmi: libpocl2-6.0-alt0.1
-rpmi: llvm17.0-filesystem-17.0.6-alt4.2
-rpmi: llvm17.0-gold-17.0.6-alt4.2
-rpmi: llvm17.0-libs-17.0.6-alt4.2
-rpmi: llvm17.0-polly-17.0.6-alt4.2
-rpmi: opencl-headers-1:2023.12.14-alt1
+rpmi: opencl-headers-1:2024.05.08-alt1
-rpmi: pocl-devel-5.0-alt0.2
-rpmi: pocl-opencl-icd-5.0-alt0.2
+rpmi: pocl-devel-6.0-alt0.1
+rpmi: pocl-opencl-icd-6.0-alt0.1
(Yes, llvm17.0 is removed, but llvm18.1 (18.1.8) is present in both environments.)
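To make the package diff above easier to read, here is a minimal Python sketch that groups the `rpmi` lines into upgraded, removed-only, and added-only packages. The `summarize` helper and the regular expression are illustrative and not part of CLBlast or any packaging tool. The sketch assumes each entry follows a `<name>-<version>-<release>` scheme with an optional `epoch:` prefix on the version.

```python
import re

# Lines copied verbatim from the build-environment diff in this thread.
DIFF = """\
-rpmi: clang17.0-17.0.6-alt4.2
-rpmi: clang17.0-support-17.0.6-alt4.2
-rpmi: libclang-cpp17-17.0.6-alt4.2
+rpmi: libomp18.1-18.1.8-alt0.1
-rpmi: libpocl2-5.0-alt0.2
+rpmi: libpocl2-6.0-alt0.1
-rpmi: llvm17.0-filesystem-17.0.6-alt4.2
-rpmi: llvm17.0-gold-17.0.6-alt4.2
-rpmi: llvm17.0-libs-17.0.6-alt4.2
-rpmi: llvm17.0-polly-17.0.6-alt4.2
-rpmi: opencl-headers-1:2023.12.14-alt1
+rpmi: opencl-headers-1:2024.05.08-alt1
-rpmi: pocl-devel-5.0-alt0.2
-rpmi: pocl-opencl-icd-5.0-alt0.2
+rpmi: pocl-devel-6.0-alt0.1
+rpmi: pocl-opencl-icd-6.0-alt0.1
"""

# Assumption: the last two hyphen-separated fields are version and release;
# everything before them is the package name.
LINE_RE = re.compile(r"^([+-])rpmi: (.+)-((?:\d+:)?[^-]+)-([^-]+)$")


def summarize(diff_text):
    """Return (upgraded, only_removed, only_added) dictionaries keyed by package name."""
    removed, added = {}, {}
    for line in diff_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not an rpmi diff line
        sign, name, version, release = match.groups()
        target = added if sign == "+" else removed
        target[name] = f"{version}-{release}"
    # A package present on both sides changed version; the rest came or went.
    upgraded = {n: (removed[n], added[n]) for n in removed if n in added}
    only_removed = {n: v for n, v in removed.items() if n not in added}
    only_added = {n: v for n, v in added.items() if n not in removed}
    return upgraded, only_removed, only_added


if __name__ == "__main__":
    upgraded, only_removed, only_added = summarize(DIFF)
    for name, (old, new) in sorted(upgraded.items()):
        print(f"upgraded: {name} {old} -> {new}")
    for name, version in sorted(only_removed.items()):
        print(f"removed:  {name} {version}")
    for name, version in sorted(only_added.items()):
        print(f"added:    {name} {version}")
```

Grouped this way, the candidates reduce to the POCL 5.0 to 6.0 upgrade, the opencl-headers update, and the removal of the LLVM/clang 17 packages.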
I found llvm/llvm-project#92648 while searching, and it makes me think this may be a compiler bug that is not even in the CLBlast codebase. But where exactly it is, and how to work around it, I don't know yet.
See also pocl/pocl#1608.
A bit late to the party, but I somehow missed that Debian's testsuite started failing at the same test on September 29. Sadly the data retention policy means that the logs for the last passed test (September 28) are gone, so I can't pinpoint exactly what changed for us on the 29th. We also see a failure with This is with POCL 6.0. But POCL 6.0 entered Debian in June, and we definitely have successful tests between that and the first failure. |
Thanks @gspr for joining in. I see that @vt-alt was recently active in pocl/pocl#1608 about this matter. If there is anything I can do, please let me know. If I were able to reproduce things locally (and had some time), here's what I would do:
Alternatively, you can also run the CLBlast GEMM kernel tuner. That will compile a lot of different flavours of the kernel, and perhaps not all of them cause the issue; that way we could also find out which variant(s) cause the crash. I hope this is helpful.
clblast_test_xconvgemm fails on x86-64:

Interestingly, when I first built it on 2024-07-08, all tests passed.