1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
../aten/src/ATen/native/cuda/IndexKernel.cu:353: masked_scatter_size_check: block: [0,0,0], thread: [0,0,0] Assertion `totalElements <= srcSize` failed.
2025-01-23 10:48:18,507 - lmdeploy - ERROR - engine.py:981 - Task <MainLoopBackground> failed
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 976, in __task_callback
    task.result()
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 929, in _async_loop_background
    await self._async_step_background(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 784, in _async_step_background
    output = await self._async_model_forward(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/utils.py", line 241, in __tmp
    return (await func(*args, **kwargs))
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 674, in _async_model_forward
    ret = await __forward(inputs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 650, in __forward
    return await self.model_agent.async_forward(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 275, in async_forward
    output = self._forward_impl(inputs,
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 257, in _forward_impl
    output = model_forward(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 156, in model_forward
    output = model(**input_dict)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/backends/cuda/graph_runner.py", line 149, in __call__
    return self.model(**kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/lmdeploy/pytorch/models/qwen2_vl.py", line 741, in forward
    inputs_embeds = inputs_embeds.masked_scatter(
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
@boceng hi, thanks for your feedback. Could you kindly share your query and image that could reproduce the issue?
If possible, could you change here and try again?
Checklist
Describe the bug
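Background: when running inference on a batch of data, some samples trigger an error while the rest produce results normally. In addition, once one sample fails during inference, every subsequent sample also fails, always with 'internal error happened'. I tried other versions between 0.6.5 and 0.7.0, and the error occurs there as well.
Looking at the contents of /.../lmdeploy/pytorch/models/qwen2_vl.py, I found that the error is triggered when image_embeds is filled into inputs_embeds according to the mask. From the logs I printed, the number of image_embeds is not enough to fill the number of positions selected by the mask. Details are as follows:
To check whether this is the only problem, I commented out the line inputs_embeds = inputs_embeds.masked_scatter. After that, rerunning inference produced no errors at all. So my question is: how should this problem be fixed properly, so that the final inference results are guaranteed to be correct?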
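One way to narrow this down, rather than commenting out the scatter (which would leave the image placeholder positions without any image features), is to compare the two counts on the host before the scatter runs. A device-side assert usually leaves the CUDA context unusable for the rest of the process, which would be consistent with every later sample failing, so failing early with a normal Python exception keeps the failure limited to the bad sample. The sketch below is only an illustration on plain Python lists: the function name `check_image_token_count` and the token id `9999` are hypothetical and are not part of lmdeploy.

```python
def check_image_token_count(input_ids, image_token_id, num_image_embeds):
    """Compare placeholder tokens in the prompt with available image embeddings.

    input_ids: token ids of one sample (plain list of ints in this sketch).
    image_token_id: id of the image placeholder token (hypothetical value below).
    num_image_embeds: number of embedding rows produced by the vision encoder.
    Returns the placeholder count, or raises ValueError on a mismatch.
    """
    num_placeholders = sum(1 for tok in input_ids if tok == image_token_id)
    if num_placeholders != num_image_embeds:
        raise ValueError(
            f"image placeholder tokens ({num_placeholders}) do not match "
            f"image embeddings ({num_image_embeds})"
        )
    return num_placeholders


# Hypothetical placeholder id, chosen only for this example.
IMAGE_TOKEN_ID = 9999

# Matching counts: two placeholders, two embedding rows.
matched = check_image_token_count([1, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 2], IMAGE_TOKEN_ID, 2)

# Mismatch: three placeholders but only two embedding rows, as in the report.
try:
    check_image_token_count([1] + [IMAGE_TOKEN_ID] * 3, IMAGE_TOKEN_ID, 2)
    mismatch_message = None
except ValueError as exc:
    mismatch_message = str(exc)
```

With tensors, the two counts would presumably come from the number of True entries in the image mask and the first dimension of image_embeds. If the check fails for a given sample, the prompt and the image preprocessing for that sample disagree about how many image tokens there are, and that mismatch is the thing to fix.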
Reproduction
Environment
Error traceback