
CUDA Out of Memory Error During Inference in samapi Environment #16

Open
halqadasi opened this issue Apr 5, 2024 · 4 comments

@halqadasi
While running inference tasks in the samapi environment, I encountered a CUDA out of memory error, causing the application to fall back to CPU inference. This significantly impacts performance. I'm looking for advice on mitigating this error or any potential fixes.

Environment

  • Operating System: Ubuntu 20.04
  • Python Version: 3.10
  • Anaconda Environment: samapi
  • GPU Model: NVIDIA RTX 4080

Steps to Reproduce

  1. Restart the server to ensure no residual GPU memory usage.
  2. Activate the samapi environment: source activate samapi
  3. Run the command: uvicorn samapi.main:app --workers 2
  4. The error appears after selecting the vit-h model and starting the labeling process.

Expected Behavior

I expected the GPU to handle the inference tasks without running out of memory, allowing for faster processing times.

Actual Behavior

I received a warning indicating CUDA ran out of memory, and the system fell back to the CPU for inference, significantly slowing down the process. The error message was:

/home/.../anaconda3/envs/samapi/lib/python3.10/site-packages/samapi/main.py:152: UserWarning: cuda device found but got the error CUDA out of memory. Tried to allocate 768.00 MiB (GPU 3; 10.75 GiB total capacity; 1.95 GiB already allocated; 244.25 MiB free; 2.00 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF - using CPU for inference
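
The message itself points to one mitigation: when reserved memory is much larger than allocated memory, setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF can reduce fragmentation. A minimal sketch of how that could be applied in the process that imports torch; the value 128 MiB is an arbitrary example, not a tuned recommendation:

# Sketch: configure the PyTorch CUDA allocator before torch initializes CUDA.
# 128 MiB is an arbitrary example value, not a tuned recommendation.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # imported after setting the variable so the allocator picks it up

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))

Exporting PYTORCH_CUDA_ALLOC_CONF in the shell that launches uvicorn would have the same effect without touching any code.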

Additional Information

  • The issue occurs under both light and heavy workloads.
  • No significant processes were running on the GPU aside from the current task.
  • Attempted solutions: I ran into the same issue previously with the label-studio ML backend and fixed it there. The cause was loading the SAM vit-h model on every labeling request; loading the model only once at the start of labeling resolved the error (see the sketch after this list). For details, see https://github.com/open-mmlab/playground/issues/150
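
A minimal sketch of that load-once approach, assuming the segment-anything package (sam_model_registry, SamPredictor); the helper name, checkpoint path, and lru_cache-based caching are illustrative, not samapi's actual implementation:

# Sketch: load the SAM checkpoint once and reuse it across labeling requests,
# instead of rebuilding the model on every call. Assumes the segment-anything
# package; the checkpoint path and helper name are placeholders.
from functools import lru_cache

import torch
from segment_anything import SamPredictor, sam_model_registry


@lru_cache(maxsize=1)
def get_predictor(checkpoint: str = "sam_vit_h_4b8939.pth") -> SamPredictor:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    sam.to(device=device)
    return SamPredictor(sam)


# Every labeling request now shares the same predictor instance.
predictor = get_predictor()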
@ksugar
Owner

ksugar commented Apr 5, 2024

Hi @halqadasi, thank you for reporting the issue.
Could you run the following script in the Script Editor in QuPath to check the size of the image to be sent to the server?
If you are using a high-resolution display, the image size may become larger than expected.
If you encounter this problem, please try lowering the screen resolution to see if it fixes the issue.

import org.elephant.sam.Utils
import qupath.lib.awt.common.AwtTools

// Get the current viewer and a server that renders exactly what is displayed.
def viewer = getCurrentViewer()
def renderedServer = Utils.createRenderedServer(viewer)

// Build a region request covering the displayed area at the current downsample.
def region = AwtTools.getImageRegion(viewer.getDisplayedRegionShape(), viewer.getZPosition(),
                viewer.getTPosition());
def viewerRegion = RegionRequest.createInstance(renderedServer.getPath(), viewer.getDownsampleFactor(),
                region);

// Clip to the image bounds, read the region, and report the size sent to the server.
viewerRegion = viewerRegion.intersect2D(0, 0, renderedServer.getWidth(), renderedServer.getHeight())
def img = renderedServer.readRegion(viewerRegion)
println "Image size processed on the server: (" + img.getWidth() + ", " + img.getHeight() + ")"

@halqadasi
Author

halqadasi commented Apr 5, 2024


The size is 1133 × 731, and I got this error in the terminal:

    raise DecompressionBombError(msg)
PIL.Image.DecompressionBombError: Image size (256160025 pixels) exceeds limit of 178956970 pixels, could be decompression bomb DOS attack.
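
For context, Pillow's default Image.MAX_IMAGE_PIXELS is 89478485, and DecompressionBombError is raised when an image exceeds twice that value (178956970), which matches the limit in the traceback; 256160025 pixels is roughly 256 megapixels, far larger than the 1133 × 731 viewport. A small sketch for inspecting or, purely for debugging, relaxing the guard (raising the limit only hides the symptom of the oversized image being sent):

# Sketch: inspect (or, for debugging only, relax) Pillow's decompression-bomb guard.
from PIL import Image

print(Image.MAX_IMAGE_PIXELS)  # default: 89478485; the error triggers above 2x this value

# Debugging only: allow larger images. This works around the check, not the root cause.
Image.MAX_IMAGE_PIXELS = 300_000_000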

@ksugar
Owner

ksugar commented Apr 15, 2024

@halqadasi the reported pixel count (256160025, well above the Pillow limit of 178956970) looks larger than expected. I will investigate it further.
In the meantime, could you check whether the smaller models (vit_l, vit_b, vit_t) work without giving a CUDA OOM error?

@ksugar
Owner

ksugar commented Apr 17, 2024

@halqadasi, it seems that the OOM issue was caused by older versions of the dependencies.
I have updated the torch dependency to the latest version in samapi v0.4.1. Please update the samapi server and see whether the issue is resolved.
