(inference_single_image) RuntimeError: Unable to find a valid cuDNN algorithm to run convolution #3

Open
ralpyna opened this issue Feb 21, 2024 · 4 comments

Comments

@ralpyna

ralpyna commented Feb 21, 2024

Thank you so much for updating "inference_single_image".
However, the following error occurred when running the script:

```
[02/21 14:08:53 fvcore.common.checkpoint]: [Checkpointer] Loading from weights/maskrcnn_v2/model_final.pth ...
Downloading (…)ip_pytorch_model.bin: 100%|██████████| 3.51G/3.51G [04:00<00:00, 14.6MB/s]
Downloading tokenizer_config.json: 100%|██████████| 20.6k/20.6k [00:00<00:00, 19.3MB/s]
Downloading tokenizer.json: 100%|██████████| 2.42M/2.42M [00:01<00:00, 2.18MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 2.20k/2.20k [00:00<00:00, 2.71MB/s]
Traceback (most recent call last):
  File "inference_single_image.py", line 103, in <module>
    inference_single_image(model, image_path, text_prompt_list, param_dict)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/app/altp/models/cooperative-foundational-models/evaluation.py", line 49, in inference_single_image
    _ = inference_gdino(model, inputs, text_prompt_list, param_dict)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/app/altp/models/cooperative-foundational-models/ground_dino_utils.py", line 77, in inference_gdino
    outputs = rcnn_model(inputs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 146, in forward
    return self.inference(batched_inputs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 200, in inference
    features = self.backbone(images.tensor)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward
    bottom_up_features = self.bottom_up(x)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/modeling/backbone/resnet.py", line 445, in forward
    x = self.stem(x)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/modeling/backbone/resnet.py", line 356, in forward
    x = self.conv1(x)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/layers/wrappers.py", line 106, in forward
    x = F.conv2d(
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
```

And here is my environment info:
```
[02/21 14:08:52 detectron2]: Environment info:


sys.platform linux
Python 3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0]
numpy 1.23.5
detectron2 0.6 @/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2
Compiler GCC 7.3
CUDA compiler CUDA 11.3
detectron2 arch flags /home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/_C.cpython-38-x86_64-linux-gnu.so; cannot find cuobjdump
DETECTRON2_ENV_MODULE
PyTorch 1.10.1 @/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch
PyTorch debug build False
GPU available Yes
GPU 0 NVIDIA H100 80GB HBM3 (arch=9.0)
Driver version 535.129.03
CUDA_HOME /usr/local/cuda
Pillow 8.3.2
torchvision 0.11.2 @/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torchvision
torchvision arch flags /home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torchvision/_C.so; cannot find cuobjdump
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.7.0


PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX512
  • CUDA Runtime 11.3
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  • CuDNN 8.2
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
```
@rohit901
Owner

rohit901 commented Feb 21, 2024

Hello,

It seems to be a problem with the PyTorch/CUDA/cuDNN libraries, and not specifically with the code in this repo, AFAIK.

I did search around, and it seems to be quite a common issue. It usually occurs when your GPU memory overflows, but you're using an H100 with 80GB of GPU memory, right?

I've tested the code on an A100 with 40GB of GPU memory, and it runs fine there.

Did you install the environment from the provided environment.yaml and follow the other instructions in the README?

Let me know regarding that. You may also try manually installing PyTorch with its matching CUDA version from the PyTorch website itself instead of from environment.yaml.
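If you do reinstall, a minimal standalone sanity check like the one below (just a sketch, independent of this repo) can confirm whether cuDNN convolutions work at all in the new environment:

```python
# Standalone cuDNN convolution sanity check, independent of this repo.
import torch

print("torch:", torch.__version__, "| CUDA:", torch.version.cuda,
      "| cuDNN:", torch.backends.cudnn.version())
print("device:", torch.cuda.get_device_name(0))

conv = torch.nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3).cuda()
x = torch.randn(1, 3, 800, 800, device="cuda")  # roughly the image size detectron2 feeds the backbone
with torch.no_grad():
    y = conv(x)
print("conv2d OK:", tuple(y.shape))
```

If this fails with the same RuntimeError outside the repo, the problem is in the PyTorch/CUDA/cuDNN setup rather than in the repository code.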

Further, you can refer to these links:

  1. https://stackoverflow.com/questions/61467751/unable-to-find-a-valid-cudnn-algorithm-to-run-convolution
  2. https://discuss.pytorch.org/t/unable-to-find-a-valid-cudnn-algorithm-to-run-convolution/78724/25

Please let me know if you are not able to resolve the issue despite following the above recommendations. If you do figure out the solution, please update it here, as it can benefit others facing the same issue.

@ralpyna
Author

ralpyna commented Feb 22, 2024

Thank you for the reply.
I only followed the instructions in the README you wrote.
I checked the versions of PyTorch, CUDA, and cuDNN, as well as the links you provided, but I was unable to resolve this error.
I'm still looking for the cause of the problem and how to fix it.

@rohit901
Owner

Hello, sorry to hear that.

Could you please try uninstalling PyTorch and installing it again manually, maybe using a different version? You can also try installing Detectron2 from source, just to check whether that works (https://detectron2.readthedocs.io/en/latest/tutorials/install.html).
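After reinstalling, one quick consistency check (just a sketch; `collect_env_info` is the same helper that produced the environment table above) is to print the Detectron2 build info next to the PyTorch runtime versions and look for a CUDA/cuDNN mismatch:

```python
# Print detectron2's build environment alongside the PyTorch runtime versions,
# so a CUDA/cuDNN mismatch after reinstalling is easy to spot.
import torch
from detectron2.utils.collect_env import collect_env_info

print(collect_env_info())
print("runtime torch:", torch.__version__,
      "| CUDA:", torch.version.cuda,
      "| cuDNN:", torch.backends.cudnn.version())
```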

I think the cause of this problem is most likely your CUDA/PyTorch installation. If you can install it again, or try running this code on a different machine with at least 40GB of VRAM, it may work.

@rohit901
Owner

@ralpyna you can also make sure that the GPU memory is empty when you run the code. Run `nvidia-smi` and verify that no other process is occupying the GPU.
This error is also sometimes thrown when you run out of memory.
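For reference, a rough way to see what this process itself is holding on GPU 0 from inside Python (just a sketch; memory used by other processes only shows up in nvidia-smi):

```python
# Report this process's view of GPU 0 memory; compare with nvidia-smi,
# which also shows memory held by other processes.
import torch

props = torch.cuda.get_device_properties(0)
gib = 1024 ** 3
print(f"device: {props.name}, total {props.total_memory / gib:.1f} GiB")
print(f"reserved by this process:  {torch.cuda.memory_reserved(0) / gib:.2f} GiB")
print(f"allocated by this process: {torch.cuda.memory_allocated(0) / gib:.2f} GiB")
```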
