
[BUG] Intermittent error on specific data when running inference on an SFT-ed Qwen2-VL model via lmdeploy #3078

Open

boceng opened this issue Jan 23, 2025 · 2 comments
boceng commented Jan 23, 2025

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

../aten/src/ATen/native/cuda/IndexKernel.cu:353: masked_scatter_size_check: block: [0,0,0], thread: [0,0,0] Assertion `totalElements <= srcSize` failed.
2025-01-23 10:48:18,507 - lmdeploy - ERROR - engine.py:981 - Task <MainLoopBackground> failed
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 976, in __task_callback
    task.result()
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 929, in _async_loop_background
    await self._async_step_background(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 784, in _async_step_background
    output = await self._async_model_forward(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/utils.py", line 241, in __tmp
    return (await func(*args, **kwargs))
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 674, in _async_model_forward
    ret = await __forward(inputs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 650, in __forward
    return await self.model_agent.async_forward(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 275, in async_forward
    output = self._forward_impl(inputs,
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 257, in _forward_impl
    output = model_forward(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 156, in model_forward
    output = model(**input_dict)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/backends/cuda/graph_runner.py", line 149, in __call__
    return self.model(**kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/models/qwen2_vl.py", line 741, in forward
    inputs_embeds = inputs_embeds.masked_scatter(
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Background: when running inference over a batch of data, some samples trigger the error while the others return results normally. Moreover, once one sample fails, all subsequent samples also fail with 'internal error happened'. I tried switching between versions 0.6.5 through 0.7.0, and the error persists.
Inspecting /.../lmdeploy/pytorch/models/qwen2_vl.py, the error is triggered when inputs_embeds is filled with image_embeds according to the mask. Logging shows that image_embeds does not have enough rows to fill all the masked positions:

inputs_embeds.shape torch.Size([1, 3447, 1536])
image_mask.shape torch.Size([1, 3447])
torch.sum(image_mask), true_count_num: tensor(2281, device='cuda:0')
image_embeds.shape torch.Size([2280, 1536])

To verify this was the only problem, I commented out the inputs_embeds = inputs_embeds.masked_scatter(...) line; with it commented out, inference runs without any error. My question is: what is the correct way to fix this, so that the final inference results are correct?
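For illustration, a toy reconstruction of the failing call (hidden size shrunk from 1536 to 4, sequence length to 6; all shapes here are invented stand-ins, not the real ones from the log):

```python
import torch

hidden = 4
inputs_embeds = torch.zeros(1, 6, hidden)  # [1, seq_len, hidden]
image_mask = torch.tensor([[False, True, True, True, False, False]])
image_embeds = torch.ones(2, hidden)       # one row short of mask.sum() == 3

# masked_scatter needs at least mask.sum() * hidden source elements;
# on CUDA a shortfall fires the device-side assert seen in the traceback.
needed = int(image_mask.sum()) * hidden
assert image_embeds.numel() < needed  # 8 < 12: this is the bug condition

# With a correctly sized source the scatter succeeds: masked positions are
# filled in order with the flattened rows of the source tensor.
image_embeds_ok = torch.arange(3 * hidden, dtype=torch.float32).reshape(3, hidden)
mask3d = image_mask.unsqueeze(-1).expand_as(inputs_embeds)
out = inputs_embeds.masked_scatter(mask3d, image_embeds_ok)
print(out[0, 1])  # first masked position now holds the first embedding row
```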

Reproduction

from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig, ChatTemplateConfig
from lmdeploy.vl import load_image


backend_config = TurbomindEngineConfig(session_len=8192, cache_max_entry_count=0.8)
pipe = pipeline(model_path, backend_config=backend_config)

...

messages = []
for ... in ...:
    message = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": query}
            ]
        }
    ]
    for img in video_frames:
        message[0]["content"].append({"type": "image_url", "image_url": {"url": f"{img}"}})
    messages.append(message)

response = pipe(messages)

Environment

sys.platform: linux
Python: 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA L20
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.4.1+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.4.2 (Git Hash 1137e04ec0b5251ca2b4400a4fd3c667ce843d67)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 90.1  (built against CUDA 12.4)
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.4.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.19.1+cu121
LMDeploy: 0.7.0+
transformers: 4.48.1
gradio: Not Found
fastapi: 0.109.2
pydantic: 2.5.0
triton: 3.0.0
NVIDIA Topology: 
        GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      0-191   0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks


RunningLeon (Collaborator) commented Jan 23, 2025

@boceng hi, thanks for your feedback. Could you kindly share your query and image that could reproduce the issue?
If possible, could you change lmdeploy/lmdeploy/vl/model/qwen2.py (line 50 in 800b601) and try again?

result.update(dict(image_size=image.size, image_tokens=image_tokens, image_token_id=0))

To

result.update(dict(image_size=image.size, image_tokens=image_tokens, image_token_id=self.hf_config.image_token_id)) 

boceng (Author) commented Jan 23, 2025

> @boceng hi, thanks for your feedback. Could you kindly share your query and image that could reproduce the issue? If possible, could you change lmdeploy/lmdeploy/vl/model/qwen2.py (line 50 in 800b601)
>
> result.update(dict(image_size=image.size, image_tokens=image_tokens, image_token_id=0))
>
> to
>
> result.update(dict(image_size=image.size, image_tokens=image_tokens, image_token_id=self.hf_config.image_token_id))
>
> and try again?
Thanks!!! The issue was resolved perfectly after making that change. Unfortunately, I am unable to share our internal data.
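For anyone hitting the same assert: a minimal sketch of why the wrong image_token_id breaks the mask's True-count. The id 151655 is Qwen2-VL's image placeholder token id as published in its HF config; the toy sequence below is invented for illustration:

```python
IMAGE_TOKEN_ID = 151655  # Qwen2-VL's image placeholder id (hf_config.image_token_id)
WRONG_TOKEN_ID = 0       # what the buggy line registered instead

input_ids = [0, 151655, 151655, 42, 7]  # toy sequence; 0 here is an ordinary token

right = sum(t == IMAGE_TOKEN_ID for t in input_ids)  # rows of image_embeds produced
wrong = sum(t == WRONG_TOKEN_ID for t in input_ids)  # positions the buggy mask selects

# Whenever these counts differ, masked_scatter's source size no longer matches
# the number of masked positions — the same kind of mismatch as in the log
# above (mask.sum() == 2281 vs image_embeds.shape[0] == 2280).
print(right, wrong)
```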
