
[BUG] Intermittent error on specific data when running inference on an SFT-ed Qwen2-VL model via lmdeploy #3078

Open

boceng opened this issue Jan 23, 2025 · 2 comments
boceng commented Jan 23, 2025

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

../aten/src/ATen/native/cuda/IndexKernel.cu:353: masked_scatter_size_check: block: [0,0,0], thread: [0,0,0] Assertion `totalElements <= srcSize` failed.
2025-01-23 10:48:18,507 - lmdeploy - ERROR - engine.py:981 - Task <MainLoopBackground> failed
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 976, in __task_callback
    task.result()
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 929, in _async_loop_background
    await self._async_step_background(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 784, in _async_step_background
    output = await self._async_model_forward(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/utils.py", line 241, in __tmp
    return (await func(*args, **kwargs))
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 674, in _async_model_forward
    ret = await __forward(inputs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 650, in __forward
    return await self.model_agent.async_forward(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 275, in async_forward
    output = self._forward_impl(inputs,
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 257, in _forward_impl
    output = model_forward(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 156, in model_forward
    output = model(**input_dict)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/backends/cuda/graph_runner.py", line 149, in __call__
    return self.model(**kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/models/qwen2_vl.py", line 741, in forward
    inputs_embeds = inputs_embeds.masked_scatter(
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Background: when running inference over a batch of data, some samples trigger the error while the others return results normally. Moreover, once one sample fails, all subsequent samples also fail with 'internal error happened'. I tried switching between versions 0.6.5 through 0.7.0, and the error persists.
Inspecting /.../lmdeploy/pytorch/models/qwen2_vl.py, the error is triggered when inputs_embeds is filled with image_embeds according to the mask. Logging shows that image_embeds does not have enough rows to fill all the masked positions:

inputs_embeds.shape torch.Size([1, 3447, 1536])
image_mask.shape torch.Size([1, 3447])
torch.sum(image_mask), true_count_num: tensor(2281, device='cuda:0')
image_embeds.shape torch.Size([2280, 1536])

To verify this was the only problem, I commented out the inputs_embeds = inputs_embeds.masked_scatter(...) line; with it commented out, inference runs without any error. My question is: what is the correct way to fix this, so that the final inference results are correct?
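For illustration, a toy reconstruction of the failing call (hidden size shrunk from 1536 to 4, sequence length to 6; all shapes here are invented stand-ins, not the real ones from the log):

```python
import torch

hidden = 4
inputs_embeds = torch.zeros(1, 6, hidden)  # [1, seq_len, hidden]
image_mask = torch.tensor([[False, True, True, True, False, False]])
image_embeds = torch.ones(2, hidden)       # one row short of mask.sum() == 3

# masked_scatter needs at least mask.sum() * hidden source elements;
# on CUDA a shortfall fires the device-side assert seen in the traceback.
needed = int(image_mask.sum()) * hidden
assert image_embeds.numel() < needed  # 8 < 12: this is the bug condition

# With a correctly sized source the scatter succeeds: masked positions are
# filled in order with the flattened rows of the source tensor.
image_embeds_ok = torch.arange(3 * hidden, dtype=torch.float32).reshape(3, hidden)
mask3d = image_mask.unsqueeze(-1).expand_as(inputs_embeds)
out = inputs_embeds.masked_scatter(mask3d, image_embeds_ok)
print(out[0, 1])  # first masked position now holds the first embedding row
```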

Reproduction

from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig, ChatTemplateConfig
from lmdeploy.vl import load_image


backend_config = TurbomindEngineConfig(session_len=8192, cache_max_entry_count=0.8)
pipe = pipeline(model_path, backend_config=backend_config)

...

messages = []
for ... in ...:
    message = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": query}
            ]
        }
    ]
    for img in video_frames:
        message[0]["content"].append({"type": "image_url", "image_url": {"url": f"{img}"}})
    messages.append(message)

response = pipe(messages)

Environment

sys.platform: linux
Python: 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA L20
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.4.1+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.4.2 (Git Hash 1137e04ec0b5251ca2b4400a4fd3c667ce843d67)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 90.1  (built against CUDA 12.4)
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.4.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.19.1+cu121
LMDeploy: 0.7.0+
transformers: 4.48.1
gradio: Not Found
fastapi: 0.109.2
pydantic: 2.5.0
triton: 3.0.0
NVIDIA Topology: 
        GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      0-191   0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks


RunningLeon (Collaborator) commented Jan 23, 2025

@boceng hi, thanks for your feedback. Could you kindly share your query and image that could reproduce the issue?
If possible, could you change lmdeploy/lmdeploy/vl/model/qwen2.py (line 50 in 800b601) and try again?

result.update(dict(image_size=image.size, image_tokens=image_tokens, image_token_id=0))

To

result.update(dict(image_size=image.size, image_tokens=image_tokens, image_token_id=self.hf_config.image_token_id)) 

boceng (Author) commented Jan 23, 2025

> @boceng hi, thanks for your feedback. Could you kindly share your query and image that could reproduce the issue? If possible, could you change lmdeploy/lmdeploy/vl/model/qwen2.py (line 50 in 800b601)
>
> result.update(dict(image_size=image.size, image_tokens=image_tokens, image_token_id=0))
>
> to
>
> result.update(dict(image_size=image.size, image_tokens=image_tokens, image_token_id=self.hf_config.image_token_id))
>
> and try again?
Thanks!!! The issue was resolved perfectly after making that change. Unfortunately, I am unable to share our internal data.
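For anyone hitting the same assert: a minimal sketch of why the wrong image_token_id breaks the mask's True-count. The id 151655 is Qwen2-VL's image placeholder token id as published in its HF config; the toy sequence below is invented for illustration:

```python
IMAGE_TOKEN_ID = 151655  # Qwen2-VL's image placeholder id (hf_config.image_token_id)
WRONG_TOKEN_ID = 0       # what the buggy line registered instead

input_ids = [0, 151655, 151655, 42, 7]  # toy sequence; 0 here is an ordinary token

right = sum(t == IMAGE_TOKEN_ID for t in input_ids)  # rows of image_embeds produced
wrong = sum(t == WRONG_TOKEN_ID for t in input_ids)  # positions the buggy mask selects

# Whenever these counts differ, masked_scatter's source size no longer matches
# the number of masked positions — the same kind of mismatch as in the log
# above (mask.sum() == 2281 vs image_embeds.shape[0] == 2280).
print(right, wrong)
```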
