train.py CUDA_ERROR_NO_BINARY_FOR_GPU #26
Hi @quizz0n,
Yes, that's probably right. LE: This is how I started the Docker image:
This is probably related to WSL2 + GPU + CUDA. I am currently trying to write bullet-proof guidelines for setting up OTBTF on Windows with GPU support, but I am not very familiar with Windows. What you could try is to rebuild the Docker image on your computer.
I've just tried running this on a clean Ubuntu 20.04 install (bare metal, not WSL2), but the error is the same. I'm not very familiar with rebuilding a Docker image, but I will look into it. Do you mean creating a new Docker image based on this one, but with a different CUDA version? LE: The error message on the Ubuntu 20.04 install:
After applying this fix: https://stackoverflow.com/questions/38303974/tensorflow-running-error-with-cublas, I get:
After replacing ptxas as described in tensorflow/tensorflow#45590, I get:
You should be able to build the Docker image with a single command (see this). You may have to try different build options.
I managed to build a new Docker image and successfully trained the network. However, when running
It looks like the error is from
It's strange that you can train the network but not use it at inference time.
I tried with a SavedModel I created.
But that generates another error, which is why I wasn't sure that was the issue.
The last error reminds me of this issue in OTBTF. However, the
Indeed, that looks like it's the issue, as I'm importing TensorFlow to fix
I think we can close this issue. The initial error
Thanks. Do you know which parameter(s) you changed?
For
Hi @remicres,
So when running the training on the `docker/otbtf/gpu:2.4` image, after successfully opening the TensorFlow libraries, I receive this error:

This is while running on an RTX 3070 with CUDA 11.3.
LE: Could this be due to different CUDA versions on the host and in the Docker image, i.e. 11.3 not being compatible with 11.0?
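One way to reason about the version question: per the CUDA driver API documentation, `CUDA_ERROR_NO_BINARY_FOR_GPU` means that no kernel image suitable for the device is available. So the thing to check is whether the TensorFlow build inside the image ships kernels usable on the RTX 3070 (compute capability 8.6). The sketch below is a simplified model of NVIDIA's documented compatibility rules, assuming the build's target architectures are given as strings such as `"sm_75"` (cubin) or `"compute_80"` (PTX). The function name `has_compatible_kernel` is hypothetical and is not part of OTBTF or TensorFlow. It also ignores the driver-version requirements for PTX JIT compilation.

```python
def has_compatible_kernel(device_cc, build_archs):
    """Hypothetical helper: can any target in build_archs run on device_cc?

    device_cc:   (major, minor) compute capability, e.g. (8, 6) for an RTX 3070.
    build_archs: target strings such as "sm_75" (cubin/SASS) or "compute_80" (PTX).
    Simplified model: driver-version requirements are not checked.
    """
    major, minor = device_cc
    for arch in build_archs:
        kind, _, digits = arch.partition("_")
        if not digits.isdigit():
            continue  # skip targets this sketch does not understand
        # Last digit is the minor version, e.g. "86" -> 8.6, "100" -> 10.0
        a_major, a_minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm":
            # A cubin built for X.y runs only on devices X.z with z >= y
            if a_major == major and minor >= a_minor:
                return True
        elif kind == "compute":
            # PTX for X.Y can be JIT-compiled for any device with cc >= X.Y
            if (major, minor) >= (a_major, a_minor):
                return True
    return False


# Example: a build with only sm_70/sm_75 cubins has nothing for an 8.6 device,
# while adding compute_80 PTX allows JIT compilation for it.
print(has_compatible_kernel((8, 6), ["sm_70", "sm_75"]))
print(has_compatible_kernel((8, 6), ["sm_75", "compute_80"]))
```

In TensorFlow 2.3 and later, `tf.config.experimental.get_device_details` reports a GPU's `compute_capability`, which could serve as the `device_cc` input. Which architectures a given OTBTF image was actually built for is not stated in this thread.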