(inference_single_image) RuntimeError: Unable to find a valid cuDNN algorithm to run convolution #3

Open
ralpyna opened this issue Feb 21, 2024 · 4 comments

Comments

@ralpyna

ralpyna commented Feb 21, 2024

Thank you so much for updating "inference_single_image".
However, the following error occurred when running the script:

```
[02/21 14:08:53 fvcore.common.checkpoint]: [Checkpointer] Loading from weights/maskrcnn_v2/model_final.pth ...
Downloading (…)ip_pytorch_model.bin: 100%|██████████| 3.51G/3.51G [04:00<00:00, 14.6MB/s]
Downloading tokenizer_config.json: 100%|██████████| 20.6k/20.6k [00:00<00:00, 19.3MB/s]
Downloading tokenizer.json: 100%|██████████| 2.42M/2.42M [00:01<00:00, 2.18MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 2.20k/2.20k [00:00<00:00, 2.71MB/s]
Traceback (most recent call last):
  File "inference_single_image.py", line 103, in <module>
    inference_single_image(model, image_path, text_prompt_list, param_dict)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/app/altp/models/cooperative-foundational-models/evaluation.py", line 49, in inference_single_image
    _ = inference_gdino(model, inputs, text_prompt_list, param_dict)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/app/altp/models/cooperative-foundational-models/ground_dino_utils.py", line 77, in inference_gdino
    outputs = rcnn_model(inputs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 146, in forward
    return self.inference(batched_inputs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 200, in inference
    features = self.backbone(images.tensor)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward
    bottom_up_features = self.bottom_up(x)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/modeling/backbone/resnet.py", line 445, in forward
    x = self.stem(x)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/modeling/backbone/resnet.py", line 356, in forward
    x = self.conv1(x)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/layers/wrappers.py", line 106, in forward
    x = F.conv2d(
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
```

And here is my environment info:
```
[02/21 14:08:52 detectron2]: Environment info:


sys.platform linux
Python 3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0]
numpy 1.23.5
detectron2 0.6 @/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2
Compiler GCC 7.3
CUDA compiler CUDA 11.3
detectron2 arch flags /home/user/miniconda/envs/cfm/lib/python3.8/site-packages/detectron2/_C.cpython-38-x86_64-linux-gnu.so; cannot find cuobjdump
DETECTRON2_ENV_MODULE
PyTorch 1.10.1 @/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torch
PyTorch debug build False
GPU available Yes
GPU 0 NVIDIA H100 80GB HBM3 (arch=9.0)
Driver version 535.129.03
CUDA_HOME /usr/local/cuda
Pillow 8.3.2
torchvision 0.11.2 @/home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torchvision
torchvision arch flags /home/user/miniconda/envs/cfm/lib/python3.8/site-packages/torchvision/_C.so; cannot find cuobjdump
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.7.0


PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX512
  • CUDA Runtime 11.3
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  • CuDNN 8.2
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
```
@rohit901
Owner

rohit901 commented Feb 21, 2024

Hello,

It seems to be a problem with the PyTorch/CUDA/cuDNN libraries, and not specifically with the code in this repo, AFAIK.

I did search around, and it seems to be quite a common issue. It usually occurs when your GPU memory overflows, but you're using an H100 with 80GB of GPU memory, right?

I've tested the code on an A100 with 40GB of GPU memory, and it runs fine there.

Did you install the environment from the provided environment.yaml and follow the other instructions in the README?

Let me know regarding that. You may also try manually installing PyTorch with its matching CUDA version from the PyTorch website itself instead of from environment.yaml.
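If you do reinstall, a minimal standalone sanity check like the one below (just a sketch, independent of this repo) can confirm whether cuDNN convolutions work at all in the new environment:

```python
# Standalone cuDNN convolution sanity check, independent of this repo.
import torch

print("torch:", torch.__version__, "| CUDA:", torch.version.cuda,
      "| cuDNN:", torch.backends.cudnn.version())
print("device:", torch.cuda.get_device_name(0))

conv = torch.nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3).cuda()
x = torch.randn(1, 3, 800, 800, device="cuda")  # roughly the image size detectron2 feeds the backbone
with torch.no_grad():
    y = conv(x)
print("conv2d OK:", tuple(y.shape))
```

If this fails with the same RuntimeError outside the repo, the problem is in the PyTorch/CUDA/cuDNN setup rather than in the repository code.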

Further, you can refer to these links:

  1. https://stackoverflow.com/questions/61467751/unable-to-find-a-valid-cudnn-algorithm-to-run-convolution
  2. https://discuss.pytorch.org/t/unable-to-find-a-valid-cudnn-algorithm-to-run-convolution/78724/25

Please let me know if you are not able to resolve the issue despite following the above recommendations. If you do figure out the solution, please update it here, as it can benefit others facing the same issue.

@ralpyna
Author

ralpyna commented Feb 22, 2024

Thank you for the reply.
I only followed the instructions in the README you wrote.
I checked the versions of PyTorch, CUDA, and cuDNN, as well as the links you provided, but I was unable to resolve this error.
I'm still looking for the cause of the problem and how to fix it.

@rohit901
Owner

Hello, sorry to hear that.

Could you please try uninstalling PyTorch and installing it again manually, maybe using a different version? You can also try installing Detectron2 from source, just to check whether that works (https://detectron2.readthedocs.io/en/latest/tutorials/install.html).
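After reinstalling, one quick consistency check (just a sketch; `collect_env_info` is the same helper that produced the environment table above) is to print the Detectron2 build info next to the PyTorch runtime versions and look for a CUDA/cuDNN mismatch:

```python
# Print detectron2's build environment alongside the PyTorch runtime versions,
# so a CUDA/cuDNN mismatch after reinstalling is easy to spot.
import torch
from detectron2.utils.collect_env import collect_env_info

print(collect_env_info())
print("runtime torch:", torch.__version__,
      "| CUDA:", torch.version.cuda,
      "| cuDNN:", torch.backends.cudnn.version())
```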

I think the cause of this problem is most likely your CUDA/PyTorch installation. If you can install it again, or try running this code on a different machine with at least 40GB of VRAM, it may work.

@rohit901
Owner

@ralpyna you can also make sure that the GPU memory is empty when you run the code. Run `nvidia-smi` and verify that no other process is occupying the GPU.
This error is also sometimes thrown when you run out of memory.
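For reference, a rough way to see what this process itself is holding on GPU 0 from inside Python (just a sketch; memory used by other processes only shows up in nvidia-smi):

```python
# Report this process's view of GPU 0 memory; compare with nvidia-smi,
# which also shows memory held by other processes.
import torch

props = torch.cuda.get_device_properties(0)
gib = 1024 ** 3
print(f"device: {props.name}, total {props.total_memory / gib:.1f} GiB")
print(f"reserved by this process:  {torch.cuda.memory_reserved(0) / gib:.2f} GiB")
print(f"allocated by this process: {torch.cuda.memory_allocated(0) / gib:.2f} GiB")
```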
