GroundingDINO ONNX-to-bmodel conversion error #82

Open
jiajia1417 opened this issue Jan 6, 2025 · 9 comments
jiajia1417 commented Jan 6, 2025

I get the following error:

2025/01/06 18:27:08 - INFO : 
         _____________________________________________________ 
        | preprocess:                                           |
        |   (x - mean) * scale                                  |
        '-------------------------------------------------------'
  config Preprocess args : 
        resize_dims           : same to net input dims
        keep_aspect_ratio     : False
        keep_ratio_mode       : letterbox
        pad_value             : 0
        pad_type              : center
        --------------------------
        mean                  : [0.0, 0.0, 0.0]
        scale                 : [1.0, 1.0, 1.0]
        --------------------------
        pixel_format          : bgr
        channel_format        : nchw

2025/01/06 18:27:09 - INFO : Input_shape assigned
2025-01-06 18:27:15.955223013 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Gather node. Name:'/bert/embeddings/token_type_embeddings/Gather' Status Message: indices element out of data bounds, idx=8 must be within the inclusive range [-2,1]
2025/01/06 18:27:15 - WARNING : ConstantFolding failed.
2025/01/06 18:27:16 - INFO : ConstantFolding finished
2025/01/06 18:27:16 - INFO : skip_fuse_bn:False
2025/01/06 18:31:01 - INFO : Onnxsim opt finished
2025-01-06 18:31:05.123233501 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Gather node. Name:'/bert/embeddings/token_type_embeddings/Gather' Status Message: indices element out of data bounds, idx=3 must be within the inclusive range [-2,1]
2025/01/06 18:31:05 - WARNING : ConstantFolding failed.
2025/01/06 18:31:05 - INFO : ConstantFolding finished
Traceback (most recent call last):
  File "/workspace/tpu-mlir/python/tools/model_transform.py", line 442, in <module>
    tool.model_transform(args.mlir, args.add_postprocess, args.patterns_count)
  File "/workspace/tpu-mlir/python/tools/model_transform.py", line 63, in model_transform
    self.converter.generate_mlir(mlir_origin)
  File "/workspace/tpu-mlir/python/transform/OnnxConverter.py", line 739, in generate_mlir
    self.onnxop_factory.get(n.op_type, lambda x: NoneAndRaise(x))(n)
  File "/workspace/tpu-mlir/python/transform/OnnxConverter.py", line 250, in <lambda>
    "ReduceMax": lambda node: self.convert_reduce_op(node),
  File "/workspace/tpu-mlir/python/transform/OnnxConverter.py", line 1909, in convert_reduce_op
    op = self.getOperand(onnx_node.inputs[0])
  File "/workspace/tpu-mlir/python/transform/BaseConverter.py", line 54, in getOperand
    raise KeyError("operand {} not found".format(name))
KeyError: 'operand /Constant_2_output_0 not found'
2025/01/06 18:31:34 - INFO : TPU-MLIR v1.13.beta.0-84-g8eed36af6-20250106
Traceback (most recent call last):
  File "/workspace/tpu-mlir/python/tools/model_deploy.py", line 536, in <module>
    tool = DeployTool(args)
  File "/workspace/tpu-mlir/python/tools/model_deploy.py", line 92, in __init__
    self.module = MlirParser(args.mlir)
  File "/workspace/tpu-mlir/python/utils/mlir_parser.py", line 164, in __init__
    with open(mlir_file, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'groundingdino.mlir'
mv: cannot stat 'groundingdino_bm1684x_fp16.bmodel': No such file or directory

I'm running the official demo; when I print the ONNX graph nodes, the /Constant_2_output_0 output node does exist.
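For reference, a minimal sketch of the model_transform.py call that the demo's compile script runs to produce the MLIR (the ONNX path and the text-branch input shapes here are assumptions, not taken from this thread; the authoritative flags are in the sophon-demo GroundingDINO scripts):

```bash
# Sketch only -- model path and text-branch shapes are assumed; the image
# preprocessing matches the log above (mean 0, scale 1, bgr, 800x800).
model_transform.py \
    --model_name groundingdino \
    --model_def ./GroundingDINO.onnx \
    --input_shapes "[[1,3,800,800],[1,256],[1,256],[1,256],[1,256,256],[1,256]]" \
    --mean 0.0,0.0,0.0 \
    --scale 1.0,1.0,1.0 \
    --pixel_format bgr \
    --mlir groundingdino.mlir
```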

jiajia1417 changed the title from "When converting the GroundingDINO model from ONNX to bmodel, KeyError: 'operand /Constant_2_output_0 not found' #204" to "GroundingDINO ONNX-to-bmodel conversion error" on Jan 7, 2025
sophon-leevi (Collaborator) commented:

Try the MLIR version provided with the groundingdino sample.

jiajia1417 (Author) commented Jan 8, 2025

> Try the MLIR version provided with the groundingdino sample.

Now the ONNX-to-MLIR conversion succeeds, but converting the MLIR to a bmodel fails with the error below:

2025/01/07 16:44:18 - INFO : TPU-MLIR v1.9.beta.0-89-g009410603-20240715
2025/01/07 16:44:26 - INFO : 
  load_config Preprocess args : 
        resize_dims           : [800, 800]
        keep_aspect_ratio     : False
        keep_ratio_mode       : letterbox
        pad_value             : 0
        pad_type              : center
        input_dims            : [800, 800]
        --------------------------
        mean                  : [0.0, 0.0, 0.0]
        scale                 : [1.0, 1.0, 1.0]
        --------------------------
        pixel_format          : bgr
        channel_format        : nchw

[Running]: tpuc-opt /workspace/GroundingDINO_Torch/out/groundingDino.mlir --processor-assign="chip=bm1688 num_device=1 num_core=1 addr_mode=auto" --processor-top-optimize --convert-top-to-tpu="mode=F16  asymmetric=False doWinograd=False ignore_f16_overflow=False q_group_size=0" --canonicalize --weight-fold -o /workspace/GroundingDINO_Torch/out/groundingDino_bm1688_f16_tpu.mlir
Create Core #0/2, NPU_NUM=32
  LocalMem [0x25000000, 0x25400000), size=4194304
  StaticMem [0x25800000, 0x25810000), size=65536
The dir path of compiler_profile is "./"
bmcpu init: skip cpu_user_defined
Cannot open libusercpu.so, disable user cpu layer.
%0 = "top.None"() : () -> none loc(unknown)
v is none type
UNREACHABLE executed at /workspace/tpu-mlir/lib/Support/Module.cpp:401!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: tpuc-opt /workspace/GroundingDINO_Torch/out/groundingDino.mlir --init "--processor-assign=chip=bm1688 num_device=1 num_core=1 addr_mode=auto" --processor-top-optimize "--convert-top-to-tpu=mode=F16  asymmetric=False doWinograd=False ignore_f16_overflow=False q_group_size=0" --canonicalize --weight-fold --deinit --mlir-print-debuginfo -o /workspace/GroundingDINO_Torch/out/groundingDino_bm1688_f16_tpu.mlir
 #0 0x0000557d71e07b47 (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x7e4b47)
 #1 0x0000557d71e0586e (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x7e286e)
 #2 0x0000557d71e084ca (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x7e54ca)
 #3 0x00007f2c313cb520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007f2c3141f9fc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
 #5 0x00007f2c313cb476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)
 #6 0x00007f2c313b17f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)
 #7 0x0000557d71e05691 (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x7e2691)
 #8 0x0000557d73366679 (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x1d43679)
 #9 0x0000557d73171aed (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x1b4eaed)
#10 0x0000557d7323bcf7 (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x1c18cf7)
#11 0x0000557d7323860f (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x1c1560f)
#12 0x0000557d732016ec (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x1bde6ec)
#13 0x0000557d731fe53c (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x1bdb53c)
#14 0x0000557d7315abcb (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x1b37bcb)
#15 0x0000557d73264fe4 (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x1c41fe4)
#16 0x0000557d73265611 (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x1c42611)
#17 0x0000557d73267ab8 (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x1c44ab8)
#18 0x0000557d71df91fb (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x7d61fb)
#19 0x0000557d71df85c4 (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x7d55c4)
#20 0x0000557d7347ac48 (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x1e57c48)
#21 0x0000557d71df28ca (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x7cf8ca)
#22 0x0000557d71df2d94 (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x7cfd94)
#23 0x0000557d71df17da (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x7ce7da)
#24 0x00007f2c313b2d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
#25 0x00007f2c313b2e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)
#26 0x0000557d71df0be5 (/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/bin/tpuc-opt+0x7cdbe5)
Aborted (core dumped)
Traceback (most recent call last):
  File "/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/python/tools/model_deploy.py", line 425, in <module>
    lowering_patterns = tool.lowering()
  File "/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/python/tools/model_deploy.py", line 145, in lowering
    patterns = mlir_lowering(self.mlir_file,
  File "/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/python/utils/mlir_shell.py", line 138, in mlir_lowering
    _os_system(cmd, mute=mute)
  File "/workspace/GroundingDINO_Torch/tpu-mlir_v1.9/python/utils/mlir_shell.py", line 56, in _os_system
    raise RuntimeError("[!Error]: {}".format(cmd_str))
RuntimeError: [!Error]: tpuc-opt /workspace/GroundingDINO_Torch/out/groundingDino.mlir --processor-assign="chip=bm1688 num_device=1 num_core=1 addr_mode=auto" --processor-top-optimize --convert-top-to-tpu="mode=F16  asymmetric=False doWinograd=False ignore_f16_overflow=False q_group_size=0" --canonicalize --weight-fold -o /workspace/GroundingDINO_Torch/out/groundingDino_bm1688_f16_tpu.mlir

What could be causing this? The MLIR version I'm using is the 1.9 version from the README. My DINO weights are fine-tuned, and I converted the PyTorch model to ONNX successfully using GroundingDINO_Torch.
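For context, the failing tpuc-opt invocation in the log above is what model_deploy.py generates internally; a minimal sketch of the corresponding deploy command (file names are assumptions, and the chip flag is spelled --chip or --processor depending on the tpu-mlir release):

```bash
# Sketch of the F16 / BM1688 deploy step; adjust file names to your setup.
model_deploy.py \
    --mlir groundingDino.mlir \
    --quantize F16 \
    --chip bm1688 \
    --model groundingdino_bm1688_fp16.bmodel
```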

sophon-leevi (Collaborator) commented:

I tried converting the ONNX myself, and the ONNX-to-bmodel conversion works fine here. Did your fine-tuning change the model structure?
Also, which torch and onnx versions did you use when exporting the ONNX? Mine are:
onnx 1.17.0
torch 2.4.1+cu124
transformers 4.46.2
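To compare environments quickly, the same three versions can be printed with a plain Python one-liner (nothing project-specific):

```bash
# Prints the installed onnx, torch, and transformers versions.
python3 -c "import onnx, torch, transformers; print('onnx', onnx.__version__); print('torch', torch.__version__); print('transformers', transformers.__version__)"
```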

jiajia1417 (Author) commented:

> I tried converting the ONNX myself, and the ONNX-to-bmodel conversion works fine here. Did your fine-tuning change the model structure? Also, which torch and onnx versions did you use when exporting the ONNX? Mine are: onnx 1.17.0 torch 2.4.1+cu124 transformers 4.46.2

I checked the ONNX in Netron and the model structure is unchanged, and my versions now match yours, but the bmodel conversion still fails at the very end of the process:

Searching best group slices...
[    #                  #     #                 #] 100%
clusters idx(size): 0(1), 
process base group 403, layer_num=1, cluster_num=1
clusters idx(size): 0(1), 
process base group 404, layer_num=1, cluster_num=1
clusters idx(size): 0(1), 
process base group 405, layer_num=1, cluster_num=1
-------------------------------------------------------
Consider redundant computation and gdma cost
-------------------------------------------------------
-------------------------------------------------------
Merge cut idx to reduce gdma cost
-------------------------------------------------------
==---------------------------==
Run GroupPostTransformPass : 
    Some transform after layer groups is determined
==---------------------------==
==---------------------------==
Run TimeStepAssignmentPass : 
    Assign timestep task for each group.
==---------------------------==
==---------------------------==
Run LocalMemoryAllocationPass : 
    Allocate local memory for all layer groups
==---------------------------==
==---------------------------==
Run TimeStepCombinePass : 
    Combine time step for better parallel balance
==---------------------------==
===group idx: 95
merge timestep 3 to timestep 2
merge timestep 4 to timestep 3
===group idx: 243
merge timestep 3 to timestep 2
merge timestep 4 to timestep 3
===group idx: 391
merge timestep 3 to timestep 2
merge timestep 4 to timestep 3
===group idx: 539
merge timestep 3 to timestep 2
merge timestep 4 to timestep 3
===group idx: 687
merge timestep 3 to timestep 2
merge timestep 4 to timestep 3
===group idx: 835
merge timestep 3 to timestep 2
merge timestep 4 to timestep 3
===group idx: 884
merge timestep 5 to timestep 4
merge timestep 5 to timestep 4
==---------------------------==
Run GroupDataMoveOverlapPass : 
    Overlap data move between two layer group
==---------------------------==
GmemAllocator use OpSizeOrderAssign
%329 = "tpu.Reshape"(%10) {shape = []} : (tensor<1x1x256x256xf16, 687198306304 : i64>) -> tensor<1x256x1x256xf16, 687198306304 : i64> loc("/bert/Mul_output_0_Mul_reshape")
op name conflict
UNREACHABLE executed at /workspace/GroundingDINO_Torch/tpu-mlir-1.8/lib/Support/Module.cpp:1671!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: tpuc-opt groundingdino_bm1688_f16_tpu.mlir --init --mlir-disable-threading "--strip-io-quant=quant_input=False quant_output=False quant_input_list= quant_output_list=" --processor-tpu-optimize --dev-parallel --weight-reorder --subnet-divide=dynamic=False --op-reorder "--layer-group=opt=2 group_by_cores=auto compress_mode=none" --core-parallel --address-assign=addr_mode=auto --deinit --mlir-print-debuginfo -o groundingdino_bm1688_f16_final.mlir
 #0 0x00005617bbdff8f7 (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x7c78f7)
 #1 0x00005617bbdfd61e (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x7c561e)
 #2 0x00005617bbe0027a (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x7c827a)
 #3 0x00007fef3d5a5520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007fef3d5f99fc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
 #5 0x00007fef3d5a5476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)
 #6 0x00007fef3d58b7f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)
 #7 0x00005617bbdfd441 (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x7c5441)
 #8 0x00005617bd30163e (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x1cc963e)
 #9 0x00005617bbf5130e (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x91930e)
#10 0x00005617bd3001d6 (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x1cc81d6)
#11 0x00005617bd1f5de4 (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x1bbdde4)
#12 0x00005617bd1f6411 (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x1bbe411)
#13 0x00005617bd1f88b8 (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x1bc08b8)
#14 0x00005617bbdf0fab (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x7b8fab)
#15 0x00005617bbdf0374 (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x7b8374)
#16 0x00005617bd40b378 (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x1dd3378)
#17 0x00005617bbdea67a (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x7b267a)
#18 0x00005617bbdeab44 (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x7b2b44)
#19 0x00005617bbde958a (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x7b158a)
#20 0x00007fef3d58cd90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
#21 0x00007fef3d58ce40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)
#22 0x00005617bbde8995 (/workspace/GroundingDINO_Torch/tpu-mlir-1.8/install/bin/tpuc-opt+0x7b0995)
Aborted (core dumped)
Traceback (most recent call last):
  File "/workspace/GroundingDINO_Torch/tpu-mlir-1.8/python/tools/model_deploy.py", line 393, in <module>
    tpu_patterns = tool.build_model()
  File "/workspace/GroundingDINO_Torch/tpu-mlir-1.8/python/tools/model_deploy.py", line 260, in build_model
    patterns = mlir_to_model(self.tpu_mlir, self.model, self.final_mlir, self.dynamic,
  File "/workspace/GroundingDINO_Torch/tpu-mlir-1.8/python/utils/mlir_shell.py", line 195, in mlir_to_model
    _os_system(cmd)
  File "/workspace/GroundingDINO_Torch/tpu-mlir-1.8/python/utils/mlir_shell.py", line 55, in _os_system
    raise RuntimeError("[!Error]: {}".format(cmd_str))
RuntimeError: [!Error]: tpuc-opt groundingdino_bm1688_f16_tpu.mlir --mlir-disable-threading --strip-io-quant="quant_input=False quant_output=False quant_input_list= quant_output_list=" --processor-tpu-optimize --dev-parallel --weight-reorder  --subnet-divide="dynamic=False" --op-reorder --layer-group="opt=2 group_by_cores=auto compress_mode=none" --core-parallel --address-assign="addr_mode=auto" -o groundingdino_bm1688_f16_final.mlir 
mv: cannot stat 'groundingdino_bm1688_fp16.bmodel': No such file or directory

jiajia1417 (Author) commented Jan 8, 2025

> I tried converting the ONNX myself, and the ONNX-to-bmodel conversion works fine here. Did your fine-tuning change the model structure? Also, which torch and onnx versions did you use when exporting the ONNX? Mine are: onnx 1.17.0 torch 2.4.1+cu124 transformers 4.46.2

Which MLIR version are you using? I get this error with the official demo ONNX as well; converting for BM1684X works, but converting for BM1688 does not.

sophon-leevi (Collaborator) commented:

(screenshot of the MLIR version)
I'm using exactly the MLIR version provided with the groundingdino sample.
(screenshot of the Docker version)
And this is the Docker version.

sophon-leevi (Collaborator) commented:

It looks like you ran the commands directly inside the MLIR tools directory, which is probably not right. The correct steps are: put the MLIR release package in an empty directory and run source envsetup.sh to set the environment variables, then switch to the sophon-demo/sample/GroundingDINO/scripts/ directory and run the compilation script there, roughly as sketched below.
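A hedged outline of those steps (the archive and script names follow the usual tpu-mlir / sophon-demo layout and may differ in your release):

```bash
# Outline only -- check the exact archive name and the script under
# sophon-demo/sample/GroundingDINO/scripts/ in your checkout.
mkdir mlir_release && cd mlir_release
tar -xf tpu-mlir_*.tar.gz             # unpack the MLIR release package here
cd tpu-mlir_*/ && source envsetup.sh  # set PATH/PYTHONPATH for tpuc-opt etc.
cd /path/to/sophon-demo/sample/GroundingDINO/scripts/
./gen_fp16bmodel_mlir.sh bm1688       # run the sample's compile script
```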

jiajia1417 (Author) commented:

> (screenshots) I'm using exactly the MLIR version provided with the groundingdino sample; this is the Docker version.

With the GroundingDINO.onnx provided by download.sh I can indeed convert to a bmodel normally, but if I use the GroundingDINO_Torch tool to convert the PyTorch model to ONNX, it fails, even with the officially provided .pth. Could you please help take a look? Here are my .pth and the converted ONNX; GroundingDINO_Torch was downloaded via dfss:

python3 -m dfss [email protected]:sophon-demo/GroundingDINO/GroundingDINO_Torch.zip

sophon-leevi (Collaborator) commented:

Converting the model with your weights also fails on my side. The issue has been handed to the MLIR toolchain team to fix; please be patient.
