MiGraphx CPU/GPU Status Tracking #325

Open · zjgarvey opened this issue Aug 14, 2024 · 3 comments

zjgarvey (Contributor) commented Aug 14, 2024

This issue tracks compilation failures for migraphx models on CPU and GPU. Each model's compile failure should link to an issue with a smaller reproducer in the Notes column.

Notes:

1. migraphx_ORT__bert_base_cased_1 fails on CPU but passes on GPU, while other adjacent models fail for similar reasons on both. Very odd.
2. Not including migraphx_sdxl__unet__model and migraphx_ORT__bert_large_uncased_1, because they crash the test run (likely OOM).
3. Not including any of the TF models yet.

CPU Status Table

The following report was generated with IREE compiler version iree-org/iree@caacf6c and torch-mlir version llvm/torch-mlir@2665ed3.

Passing Summary

TOTAL TESTS = 30

| Stage | # Passing | % of Total | % of Attempted |
| --- | --- | --- | --- |
| Setup | 30 | 100.0% | 100.0% |
| IREE Compilation | 24 | 80.0% | 80.0% |
| Gold Inference | 22 | 73.3% | 91.7% |
| IREE Inference Invocation | 19 | 63.3% | 86.4% |
| Inference Comparison (PASS) | 15 | 50.0% | 78.9% |

Fail Summary

TOTAL TESTS = 30

| Stage | # Failed at Stage | % of Total |
| --- | --- | --- |
| Setup | 0 | 0.0% |
| IREE Compilation | 6 | 20.0% |
| Gold Inference | 2 | 6.7% |
| IREE Inference Invocation | 3 | 10.0% |
| Inference Comparison | 4 | 13.3% |

Test Run Detail

Test was run with the following arguments:

```
Namespace(device='local-task', backend='llvm-cpu', iree_compile_args=None, mode='cl-onnx-iree', torchtolinalg=True, stages=None, skip_stages=None, benchmark=False, load_inputs=False, groups='all', test_filter='migraphx', testsfile=None, tolerance=None, verbose=True, rundirectory='test-run', no_artifacts=False, cleanup='0', report=True, report_file='mi_10_10.md')
```
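
For context on the stage names below: the import_model stage converts the ONNX protobuf to MLIR before iree-compile runs. A minimal sketch of that step, assuming the iree-import-onnx tool from the IREE Python packages and placeholder file names:

```shell
# Convert an ONNX model into MLIR that iree-compile can consume.
# model.onnx / model.mlir are placeholder paths.
iree-import-onnx model.onnx -o model.mlir
```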

| Test | Exit Status | Mean Benchmark Time (ms) | Notes |
| --- | --- | --- | --- |
| migraphx_agentmodel__AgentModel | compilation | None | iree-18268, iree-18412, torch-mlir-3651 |
| migraphx_bert__bert-large-uncased | preprocessing | None | |
| migraphx_bert__bertsquad-12 | Numerics | None | |
| migraphx_cadene__dpn92i1 | PASS | None | |
| migraphx_cadene__inceptionv4i16 | PASS | None | |
| migraphx_cadene__resnext101_64x4di1 | PASS | None | |
| migraphx_cadene__resnext101_64x4di16 | PASS | None | |
| migraphx_huggingface-transformers__bert_mrpc8 | native_inference | None | |
| migraphx_mlperf__bert_large_mlperf | Numerics | None | |
| migraphx_mlperf__resnet50_v1 | PASS | None | |
| migraphx_models__whisper-tiny-decoder | compiled_inference | None | |
| migraphx_models__whisper-tiny-encoder | native_inference | None | |
| migraphx_onnx-misc__taau_low_res_downsample_d2s_for_infer_time_fp16_opset11 | import_model | None | |
| migraphx_onnx-model-zoo__gpt2-10 | preprocessing | None | |
| migraphx_ORT__bert_base_cased_1 | PASS | None | |
| migraphx_ORT__bert_base_uncased_1 | PASS | None | |
| migraphx_ORT__bert_large_uncased_1 | PASS | None | |
| migraphx_ORT__distilgpt2_1 | compiled_inference | None | |
| migraphx_ORT__onnx_models__bert_base_cased_1_fp16_gpu | Numerics | None | |
| migraphx_ORT__onnx_models__bert_large_uncased_1_fp16_gpu | Numerics | None | |
| migraphx_ORT__onnx_models__distilgpt2_1_fp16_gpu | compiled_inference | None | |
| migraphx_pytorch-examples__wlang_gru | PASS | None | |
| migraphx_pytorch-examples__wlang_lstm | PASS | None | |
| migraphx_sd__unet__model | import_model | None | |
| migraphx_sdxl__unet__model | import_model | None | |
| migraphx_torchvision__densenet121i32 | PASS | None | |
| migraphx_torchvision__inceptioni1 | PASS | None | |
| migraphx_torchvision__inceptioni32 | PASS | None | |
| migraphx_torchvision__resnet50i1 | PASS | None | |
| migraphx_torchvision__resnet50i64 | PASS | None | |

OLD STATUS (will be updated; issues will migrate to the current table)

| Test | Exit Status | Notes |
| --- | --- | --- |
| migraphx_agentmodel__AgentModel | compilation | |
| migraphx_bert__bert-large-uncased | compilation | iree-18269; two IRs reported under this, showing different behavior |
| migraphx_bert__bertsquad-12 | compilation | iree-18267, torch-mlir-3647 |
| migraphx_cadene__dpn92i1 | PASS | |
| migraphx_cadene__inceptionv4i16 | PASS | |
| migraphx_cadene__resnext101_64x4di1 | PASS | |
| migraphx_cadene__resnext101_64x4di16 | PASS | |
| migraphx_huggingface-transformers__bert_mrpc8 | compilation | iree-18413 |
| migraphx_mlperf__bert_large_mlperf | compilation | iree-18297 |
| migraphx_mlperf__resnet50_v1 | PASS | |
| migraphx_models__whisper-tiny-decoder | compilation | torch-mlir-3647 |
| migraphx_models__whisper-tiny-encoder | compilation | torch-mlir-3647 |
| migraphx_onnx-misc__taau_low_res_downsample_d2s_for_infer_time_fp16_opset11 | construct_inputs | ORT issue with resize with f16 inputs? |
| migraphx_onnx-model-zoo__gpt2-10 | compilation | shark-turbine-465, torch-mlir-615, torch-mlir-3293 |
| migraphx_ORT__bert_base_cased_1 | Numerics | Passed when '--iree-input-demote-i64-to-i32' is not present (see the compile sketch after this table); iree-18273 |
| migraphx_ORT__bert_base_uncased_1 | Numerics | Passed when '--iree-input-demote-i64-to-i32' is not present |
| migraphx_ORT__bert_large_uncased_1 | compilation | Crashes; "MatMul" fails to legalize stream.cmd.dispatch. iree-org/iree#18229, llvm/torch-mlir#3647 ?? |
| migraphx_ORT__distilgpt2_1 | Numerics | |
| migraphx_ORT__onnx_models__bert_base_cased_1_fp16_gpu | Numerics | |
| migraphx_ORT__onnx_models__bert_large_uncased_1_fp16_gpu | Numerics | |
| migraphx_ORT__onnx_models__distilgpt2_1_fp16_gpu | Numerics | |
| migraphx_pytorch-examples__wlang_gru | Numerics | iree-18441 |
| migraphx_pytorch-examples__wlang_lstm | Numerics | iree-18441 |
| migraphx_sd__unet__model | import_model | Killed during MLIR import; too big? |
| migraphx_sdxl__unet__model | import_model | Killed during MLIR import; too big? |
| migraphx_torchvision__densenet121i32 | PASS | |
| migraphx_torchvision__inceptioni1 | PASS | |
| migraphx_torchvision__inceptioni32 | PASS | |
| migraphx_torchvision__resnet50i1 | PASS | |
| migraphx_torchvision__resnet50i64 | PASS | |
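
For the bert_base_cased/uncased numerics notes above, a minimal sketch of the two compiles being compared, with and without the i64-to-i32 input demotion; model.mlir and the output names are placeholders:

```shell
# Baseline CPU compile (reported to pass numerics).
iree-compile model.mlir --iree-hal-target-backends=llvm-cpu -o model.vmfb

# Same compile with the demotion flag that reportedly triggers the numerics failure.
iree-compile model.mlir --iree-hal-target-backends=llvm-cpu \
  --iree-input-demote-i64-to-i32 -o model_demoted.vmfb
```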

GPU Status Table

Last generated with pip-installed IREE tools at version:

```
iree-compiler      20240903.1005
iree-runtime       20240903.1005
```
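
To reproduce that toolchain, the nightly wheels can be pinned explicitly; a sketch assuming the nightly package index documented on iree.dev:

```shell
python -m pip install \
  iree-compiler==20240903.1005 iree-runtime==20240903.1005 \
  -f https://iree.dev/pip-release-links.html
```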

Summary

| Stage | Count |
| --- | --- |
| Total | 21 (non-crashing; see table below) |
| PASS | 12 |
| Numerics | 2 |
| results-summary | 0 |
| postprocessing | 0 |
| compiled_inference | up to 5 (crash during this stage; not included in total) |
| compilation | 4 |
| preprocessing | 0 |
| import_model | 1 |
| native_inference | 2 |
| construct_inputs | 0 |
| setup | 0 |

Test Run Detail

Test was run with the following arguments:

```
Namespace(device='hip://1', backend='rocm', iree_compile_args=['iree-hip-target=gfx942'], mode='onnx-iree', torchtolinalg=False, stages=None, skip_stages=None, load_inputs=False, groups='all', test_filter='migraphx', tolerance=None, verbose=True, rundirectory='test-run', no_artifacts=False, report=True, report_file='9_3_migraphx.md')
```
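
Those arguments imply a HIP/ROCm compile targeting gfx942 (MI300-class). A minimal sketch of the corresponding iree-compile invocation, with placeholder file names:

```shell
# Compile for the ROCm/HIP backend on an MI300 (gfx942) target.
iree-compile model.mlir \
  --iree-hal-target-backends=rocm \
  --iree-hip-target=gfx942 \
  -o model_rocm.vmfb
```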

| Test | Exit Status | Notes |
| --- | --- | --- |
| migraphx_agentmodel__AgentModel | compilation | Related: llvm/torch-mlir#3630 |
| migraphx_bert__bert-large-uncased | compilation | Operand return type issue (see CPU table) |
| migraphx_bert__bertsquad-12 | compilation (without shape inference) / compiled_inference | 1. Without the shape-inference torch-mlir passes in the torch-to-iree pipeline, we get an all-dynamic squeeze-dim op. 2. Using torch-lower-to-backend-contract to recover the shape information instead, the test crashes during inference with an out-of-bounds memory access. |
| migraphx_cadene__dpn92i1 | PASS | |
| migraphx_cadene__inceptionv4i16 | PASS | |
| migraphx_cadene__resnext101_64x4di1 | PASS | |
| migraphx_cadene__resnext101_64x4di16 | PASS | |
| migraphx_huggingface-transformers__bert_mrpc8 | native_inference | |
| migraphx_mlperf__bert_large_mlperf | native_inference | |
| migraphx_mlperf__resnet50_v1 | PASS | |
| migraphx_onnx-misc__taau_low_res_downsample_d2s_for_infer_time_fp16_opset11 | import_model | |
| migraphx_onnx-model-zoo__gpt2-10 | compilation | nod-ai/SHARK-ModelDev#465, llvm/torch-mlir#615, llvm/torch-mlir#3293 |
| migraphx_ORT__bert_base_cased_1 | PASS | |
| migraphx_ORT__bert_base_uncased_1 | PASS | |
| migraphx_ORT__distilgpt2_1 | likely compiled_inference | Crashes with "Memory access fault by GPU node-3 (Agent handle: 0x5595fe450840) on address 0x7f1811a56000. Reason: Unknown." |
| migraphx_ORT__onnx_models__bert_base_cased_1_fp16_gpu | compiled_inference | Hard crash from an out-of-bounds memory access (MI300x) |
| migraphx_ORT__onnx_models__bert_large_uncased_1_fp16_gpu | compiled_inference | Same crash as above |
| migraphx_ORT__onnx_models__distilgpt2_1_fp16_gpu | likely compiled_inference | Crashes with "Memory access fault by GPU node-3 (Agent handle: 0x5595fe450840) on address 0x7f1811a56000. Reason: Unknown." |
| migraphx_pytorch-examples__wlang_gru | Numerics | |
| migraphx_pytorch-examples__wlang_lstm | Numerics | |
| migraphx_torchvision__densenet121i32 | PASS | |
| migraphx_torchvision__inceptioni1 | PASS | |
| migraphx_torchvision__inceptioni32 | PASS | |
| migraphx_torchvision__resnet50i1 | PASS | |
| migraphx_torchvision__resnet50i64 | PASS | |

Note: the GPU table is missing the sd model, which runs out of memory and kills the test run. This probably happens during native inference, so it may need some looking into.

Performance data with iree-benchmark-module on GPU

Summary

| Stage | Count |
| --- | --- |
| Total | 30 |
| PASS | 13 |
| Numerics | 3 |
| results-summary | 0 |
| postprocessing | 0 |
| benchmark | 0 |
| compiled_inference | 2 |
| native_inference | 1 |
| construct_inputs | 0 |
| compilation | 8 |
| preprocessing | 0 |
| import_model | 3 |
| setup | 0 |

Test Run Detail

Test was run with the following arguments:

```
Namespace(device='local-task', backend='llvm-cpu', iree_compile_args=None, mode='cl-onnx-iree', torchtolinalg=False, stages=None, skip_stages=None, benchmark=True, load_inputs=False, groups='all', test_filter='migraphx', testsfile=None, tolerance=None, verbose=True, rundirectory='test-run', no_artifacts=False, cleanup='0', report=True, report_file='report.md')
```
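
The mean times below come from iree-benchmark-module. A minimal sketch of a single-model invocation; the entry-point name (main_graph) and the input shape are assumptions that depend on the imported model:

```shell
# Benchmark one compiled module on the local-task (CPU) driver.
iree-benchmark-module \
  --module=model.vmfb \
  --device=local-task \
  --function=main_graph \
  --input=1x384xi64=0
```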

| Test | Exit Status | Mean Benchmark Time (ms) | Notes |
| --- | --- | --- | --- |
| migraphx_agentmodel__AgentModel | compilation | None | |
| migraphx_bert__bert-large-uncased | compilation | None | |
| migraphx_bert__bertsquad-12 | compilation | None | |
| migraphx_cadene__dpn92i1 | PASS | 457.44 | |
| migraphx_cadene__inceptionv4i16 | PASS | 26072.67 | |
| migraphx_cadene__resnext101_64x4di1 | PASS | 995.68 | |
| migraphx_cadene__resnext101_64x4di16 | PASS | 6324.31 | |
| migraphx_huggingface-transformers__bert_mrpc8 | compilation | None | |
| migraphx_mlperf__bert_large_mlperf | PASS | 8195.63 | |
| migraphx_mlperf__resnet50_v1 | PASS | 219.82 | |
| migraphx_models__whisper-tiny-decoder | compiled_inference | None | |
| migraphx_models__whisper-tiny-encoder | native_inference | None | |
| migraphx_onnx-misc__taau_low_res_downsample_d2s_for_infer_time_fp16_opset11 | import_model | None | |
| migraphx_onnx-model-zoo__gpt2-10 | compilation | None | |
| migraphx_ORT__bert_base_cased_1 | PASS | 817.48 | |
| migraphx_ORT__bert_base_uncased_1 | compilation | None | |
| migraphx_ORT__bert_large_uncased_1 | PASS | 2728.98 | |
| migraphx_ORT__distilgpt2_1 | compiled_inference | None | |
| migraphx_ORT__onnx_models__bert_base_cased_1_fp16_gpu | Numerics | 2141.36 | |
| migraphx_ORT__onnx_models__bert_large_uncased_1_fp16_gpu | Numerics | 6767.57 | |
| migraphx_ORT__onnx_models__distilgpt2_1_fp16_gpu | Numerics | 101.96 | |
| migraphx_pytorch-examples__wlang_gru | compilation | None | |
| migraphx_pytorch-examples__wlang_lstm | compilation | None | |
| migraphx_sd__unet__model | import_model | None | |
| migraphx_sdxl__unet__model | import_model | None | |
| migraphx_torchvision__densenet121i32 | PASS | 2639.90 | |
| migraphx_torchvision__inceptioni1 | PASS | 627.42 | |
| migraphx_torchvision__inceptioni32 | PASS | 22124.73 | |
| migraphx_torchvision__resnet50i1 | PASS | 284.15 | |
| migraphx_torchvision__resnet50i64 | PASS | 11100.90 | |
nirvedhmeshram commented

@zjgarvey added llvm/torch-mlir#3647 to some of the models, as we need that along with iree-org/iree#18229.

MaheshRavishankar commented

cc @lialan as well. Can you coordinate with Zach to track CPU codegen issues?

nirvedhmeshram commented

Also adding llvm/torch-mlir#3651, which is needed to support a broad range of models.
