
CUDA Out of Memory Error During Inference in samapi Environment #16

Open
halqadasi opened this issue Apr 5, 2024 · 4 comments

@halqadasi
While running inference tasks in the samapi environment, I encountered a CUDA out of memory error, causing the application to fall back to CPU inference. This significantly impacts performance. I'm looking for advice on mitigating this error or any potential fixes.

Environment

  • Operating System: Ubuntu 20.04
  • Python Version: 3.10
  • Anaconda Environment: samapi
  • GPU Model: NVIDIA RTX 4080

Steps to Reproduce

  1. Restart the server to ensure no residual GPU memory usage.
  2. Activate the samapi environment: source activate samapi
  3. Run the command: uvicorn samapi.main:app --workers 2
  4. The error appears after selecting the vit-h model and starting the labeling process.

Expected Behavior

I expected the GPU to handle the inference tasks without running out of memory, allowing for faster processing times.

Actual Behavior

I received a warning indicating CUDA ran out of memory, and the system fell back to the CPU for inference, significantly slowing down the process. The error message was:

/home/.../anaconda3/envs/samapi/lib/python3.10/site-packages/samapi/main.py:152: UserWarning: cuda device found but got the error CUDA out of memory. Tried to allocate 768.00 MiB (GPU 3; 10.75 GiB total capacity; 1.95 GiB already allocated; 244.25 MiB free; 2.00 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF - using CPU for inference
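
The message itself points to one mitigation: when reserved memory is much larger than allocated memory, setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF can reduce fragmentation. A minimal sketch of how that could be applied in the process that imports torch; the value 128 MiB is an arbitrary example, not a tuned recommendation:

# Sketch: configure the PyTorch CUDA allocator before torch initializes CUDA.
# 128 MiB is an arbitrary example value, not a tuned recommendation.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # imported after setting the variable so the allocator picks it up

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))

Exporting PYTORCH_CUDA_ALLOC_CONF in the shell that launches uvicorn would have the same effect without touching any code.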

Additional Information

  • The issue occurs under both light and heavy workloads.
  • No significant processes were running on the GPU aside from the current task.
  • Attempted solutions: I ran into the same issue previously with the label-studio ML backend and fixed it there. The cause was loading the SAM vit-h model on every labeling request; loading the model only once at the start of labeling resolved the error (see the sketch after this list). For details, see https://github.com/open-mmlab/playground/issues/150
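
A minimal sketch of that load-once approach, assuming the segment-anything package (sam_model_registry, SamPredictor); the helper name, checkpoint path, and lru_cache-based caching are illustrative, not samapi's actual implementation:

# Sketch: load the SAM checkpoint once and reuse it across labeling requests,
# instead of rebuilding the model on every call. Assumes the segment-anything
# package; the checkpoint path and helper name are placeholders.
from functools import lru_cache

import torch
from segment_anything import SamPredictor, sam_model_registry


@lru_cache(maxsize=1)
def get_predictor(checkpoint: str = "sam_vit_h_4b8939.pth") -> SamPredictor:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    sam.to(device=device)
    return SamPredictor(sam)


# Every labeling request now shares the same predictor instance.
predictor = get_predictor()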
@ksugar
Owner

ksugar commented Apr 5, 2024

Hi @halqadasi, thank you for reporting the issue.
Could you run the following script in the Script Editor in QuPath to check the size of the image to be sent to the server?
If you are using a high-resolution display, the image size may become larger than expected.
If you encounter this problem, please try lowering the screen resolution to see if it fixes the issue.

import org.elephant.sam.Utils
import qupath.lib.awt.common.AwtTools

// Get the current viewer and a server that renders exactly what is displayed.
def viewer = getCurrentViewer()
def renderedServer = Utils.createRenderedServer(viewer)

// Build a region request covering the displayed area at the current downsample.
def region = AwtTools.getImageRegion(viewer.getDisplayedRegionShape(), viewer.getZPosition(),
                viewer.getTPosition());
def viewerRegion = RegionRequest.createInstance(renderedServer.getPath(), viewer.getDownsampleFactor(),
                region);

// Clip to the image bounds, read the region, and report the size sent to the server.
viewerRegion = viewerRegion.intersect2D(0, 0, renderedServer.getWidth(), renderedServer.getHeight())
def img = renderedServer.readRegion(viewerRegion)
println "Image size processed on the server: (" + img.getWidth() + ", " + img.getHeight() + ")"

@halqadasi
Author

halqadasi commented Apr 5, 2024


The size is 1133 × 731, and I got this error in the terminal:

    raise DecompressionBombError(msg)
PIL.Image.DecompressionBombError: Image size (256160025 pixels) exceeds limit of 178956970 pixels, could be decompression bomb DOS attack.
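
For context, Pillow's default Image.MAX_IMAGE_PIXELS is 89478485, and DecompressionBombError is raised when an image exceeds twice that value (178956970), which matches the limit in the traceback; 256160025 pixels is roughly 256 megapixels, far larger than the 1133 × 731 viewport. A small sketch for inspecting or, purely for debugging, relaxing the guard (raising the limit only hides the symptom of the oversized image being sent):

# Sketch: inspect (or, for debugging only, relax) Pillow's decompression-bomb guard.
from PIL import Image

print(Image.MAX_IMAGE_PIXELS)  # default: 89478485; the error triggers above 2x this value

# Debugging only: allow larger images. This works around the check, not the root cause.
Image.MAX_IMAGE_PIXELS = 300_000_000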

@ksugar
Owner

ksugar commented Apr 15, 2024

@halqadasi the reported pixel count (256160025, well above the Pillow limit of 178956970) looks larger than expected. I will investigate it further.
In the meantime, could you check whether the smaller models (vit_l, vit_b, vit_t) work without giving a CUDA OOM error?

@ksugar
Owner

ksugar commented Apr 17, 2024

@halqadasi, it seems that the OOM issue was caused by older versions of the dependencies.
I have updated the torch dependency to the latest version in samapi v0.4.1. Please update the samapi server and see whether the issue is resolved.
