(inference_single_image) RuntimeError: Unable to find a valid cuDNN algorithm to run convolution #3
Hello, it seems to be a problem with the PyTorch/CUDA/cuDNN libraries, and not specifically with the code in this repo, AFAIK. I searched around, and it seems to be quite a common issue. It typically occurs when GPU memory overflows, but you're using an H100 with 80 GB of GPU memory, right? I've tested the code on an A100 with 40 GB, and it runs fine there. Did you install the environment from the provided environment.yaml and follow the other instructions in the README? Let me know. You may also try manually installing PyTorch with its matching CUDA version from the PyTorch website itself instead of from environment.yaml. Further, you can refer to these links:
Please let me know if you are not able to resolve the issue despite following the above recommendations. If you figure out the solution, please update it here, as it can benefit others facing the same issue.
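A minimal sketch (my own addition, not part of the repo) for checking which CUDA build of PyTorch is actually installed before reinstalling. The `cuda_wheel_tag` helper is a hypothetical name; it builds the wheel suffix (e.g. `cu113`) that the PyTorch download page uses, so you can match the reinstall command to the build you need. The import is guarded in case PyTorch is absent:

```python
# Hedged sketch: report the installed PyTorch/CUDA/cuDNN pairing.
# `cuda_wheel_tag` is a small helper introduced here for illustration.

def cuda_wheel_tag(cuda_version: str) -> str:
    """Convert a CUDA version string such as '11.3' into the wheel tag 'cu113'."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"

if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("torch:", torch.__version__)
        print("built for CUDA:", torch.version.cuda)
        print("cuDNN build:", torch.backends.cudnn.version())
        if torch.version.cuda:  # None on CPU-only builds
            print("matching wheel tag:", cuda_wheel_tag(torch.version.cuda))
```

If the reported CUDA build does not match what the driver supports, that alone can produce convolution failures like the one above.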
Thank you for the reply.
Hello, sorry to hear that. Could you please try uninstalling PyTorch and installing it again manually, perhaps with a different version? You can also try installing Detectron2 from source, just to check whether it works (https://detectron2.readthedocs.io/en/latest/tutorials/install.html). I think the cause of this problem is most likely your CUDA/PyTorch installation. If you reinstall it, or run this code on a different machine with at least 40 GB of VRAM, it should work.
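One quick way to narrow this down (a diagnostic sketch of my own, not repo code) is to disable cuDNN so PyTorch falls back to its native CUDA convolution. If a plain `Conv2d` then runs, the failure is in the cuDNN/PyTorch pairing rather than in this repository. The `conv_out_size` helper is introduced here just to check the expected output shape:

```python
# Hedged diagnostic: run one convolution with cuDNN bypassed entirely.

def conv_out_size(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Expected spatial size of a Conv2d output (dilation 1)."""
    return (size + 2 * pad - kernel) // stride + 1

if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        torch.backends.cudnn.enabled = False  # force the native CUDA path
        conv = torch.nn.Conv2d(3, 8, kernel_size=3).to("cuda")
        y = conv(torch.randn(1, 3, 64, 64, device="cuda"))
        assert y.shape[-1] == conv_out_size(64, 3)
        print("native convolution OK:", tuple(y.shape))
    else:
        print("no CUDA device available; skipping the convolution test")
```

If the native path succeeds while the default path raises the same `RuntimeError`, reinstalling a PyTorch build whose cuDNN matches your driver is the likely fix.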
@ralpyna You can also make sure the GPU memory is free when you run the code. Run `nvidia-smi` and verify that no other process is occupying the GPU.
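The `nvidia-smi` check above can be wrapped in a short script (a sketch of my own, not part of the repo) that queries per-GPU memory usage and flags GPUs that are not empty; the query flags are standard `nvidia-smi` options, and `parse_gpu_memory` is a hypothetical helper added for illustration:

```python
# Hedged sketch: report whether each GPU's memory is free before launching.
import subprocess

def parse_gpu_memory(csv_line: str) -> tuple:
    """Parse one 'used, total' line (MiB) from nvidia-smi's noheader CSV output."""
    used, total = (int(v.strip()) for v in csv_line.split(","))
    return used, total

if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi is not available on this machine")
    else:
        for i, line in enumerate(out.strip().splitlines()):
            used, total = parse_gpu_memory(line)
            status = "free" if used < 100 else f"{used} MiB in use"
            print(f"GPU {i}: {status} (total {total} MiB)")
```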
Thank you so much for updating "inference_single_image".
But the following error occurred when using the script:
```
[02/21 14:08:53 fvcore.common.checkpoint]: [Checkpointer] Loading from weights/maskrcnn_v2/model_final.pth ...
Downloading (…)ip_pytorch_model.bin: 100% 3.51G/3.51G [04:00<00:00, 14.6MB/s]
Downloading tokenizer_config.json: 100% 20.6k/20.6k [00:00<00:00, 19.3MB/s]
Downloading tokenizer.json: 100% 2.42M/2.42M [00:01<00:00, 2.18MB/s]
Downloading (…)cial_tokens_map.json: 100% 2.20k/2.20k [00:00<00:00, 2.71MB/s]
Traceback (most recent call last):
  File "inference_single_image.py", line 103, in <module>
    inference_single_image(model, image_path, text_prompt_list, param_dict)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/app/altp/models/cooperative-foundational-models/evaluation.py", line 49, in inference_single_image
    _ = inference_gdino(model, inputs, text_prompt_list, param_dict)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/app/altp/models/cooperative-foundational-models/ground_dino_utils.py", line 77, in inference_gdino
    outputs = rcnn_model(inputs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 146, in forward
    return self.inference(batched_inputs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 200, in inference
    features = self.backbone(images.tensor)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward
    bottom_up_features = self.bottom_up(x)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/modeling/backbone/resnet.py", line 445, in forward
    x = self.stem(x)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/modeling/backbone/resnet.py", line 356, in forward
    x = self.conv1(x)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/layers/wrappers.py", line 106, in forward
    x = F.conv2d(
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
```
and my environment info is here:
```
[02/21 14:08:52 detectron2]: Environment info:
sys.platform              linux
Python                    3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0]
numpy                     1.23.5
detectron2                0.6 @/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2
Compiler                  GCC 7.3
CUDA compiler             CUDA 11.3
detectron2 arch flags     /home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/_C.cpython-38-x86_64-linux-gnu.so; cannot find cuobjdump
DETECTRON2_ENV_MODULE
PyTorch                   1.10.1 @/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch
PyTorch debug build       False
GPU available             Yes
GPU 0                     NVIDIA H100 80GB HBM3 (arch=9.0)
Driver version            535.129.03
CUDA_HOME                 /usr/local/cuda
Pillow                    8.3.2
torchvision               0.11.2 @/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torchvision
torchvision arch flags    /home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torchvision/_C.so; cannot find cuobjdump
fvcore                    0.1.5.post20221221
iopath                    0.1.9
cv2                       4.7.0
PyTorch built with:
```