Whisper with DirectML EP not working: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for DecoderMaskedMultiHeadAttention(1) node with name 'Attention_0' #1213
Comments
Hi, could you try the workflow again after adding the part at Olive/olive/passes/onnx/insert_beam_search.py, line 272 (commit 7fa2c41)?
Thank you for your reply. This fixed the NOT_IMPLEMENTED error; however, I still cannot get the correct behavior, even though I no longer get any error messages. After it starts evaluating the model, it fails silently:
[2024-06-27 10:24:45,683] [DEBUG] [ort_inference.py:72:get_ort_inference_session] inference_settings: {'execution_provider': ['DmlExecutionProvider'], 'provider_options': None}
Any idea why this happens? Could it be failing when trying to create the InferenceSession? (Olive/olive/common/ort_inference.py, line 118 in b59ef7d)
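One way to narrow this down (a minimal sketch, not Olive's actual helper; the function name and model path are placeholders) is to create the InferenceSession directly with ORT's verbose logging enabled and see whether session creation itself throws:

```python
# Hedged sketch: create an ORT session with the DirectML EP directly,
# with verbose logging, to check whether session creation is what fails.
# Import is guarded so the snippet degrades gracefully when
# onnxruntime(-directml) is not installed.
try:
    import onnxruntime as ort
except ImportError:
    ort = None

def try_create_session(model_path, providers=("DmlExecutionProvider",)):
    if ort is None:
        return None  # onnxruntime(-directml) not installed
    opts = ort.SessionOptions()
    opts.log_severity_level = 0  # 0 = VERBOSE; surfaces otherwise-silent failures
    return ort.InferenceSession(model_path, opts, providers=list(providers))
```

If the call raises, the verbose ORT log printed before the exception usually names the offending node or provider.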
Can you paste a dump of the full log?
Sure. Here is the full log:
[2024-06-27 14:46:32,154] [INFO] [run.py:138:run_engine] Running workflow default_workflow
That's weird that it just fails silently. Can you add
The output of the ORT log is added below. I noticed the lines "ORT optimization- Force fallback to CPU execution for node:". The ORT log output:
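Those fallback lines tell you which nodes the DML EP refused to take. A small stdlib filter (a hypothetical helper; the message format is assumed from the line quoted above) can pull the node names out of a saved log:

```python
def cpu_fallback_nodes(log_text: str) -> list[str]:
    """Extract node names that ORT forced back to CPU execution.

    Assumes log lines of the form quoted in the comment above:
    '... Force fallback to CPU execution for node: <name>'.
    """
    marker = "Force fallback to CPU execution for node:"
    return [
        line.split(marker, 1)[1].strip()
        for line in log_text.splitlines()
        if marker in line
    ]
```

Running this over the full ORT log shows exactly which parts of the graph never ran on the DML EP.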
The test_transcription.py file looks for a file called "whisper_dml_fp32_gpu-dml_model.onnx", but the generated file is called "whisper_dml_fp32_gpu-cpu_model.onnx", so I modified test_transcription.py to read the generated file instead, and it results in the following output:
C:\anaconda3\envs\olv-whisper\Lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: Traceback (most recent call last):
Hi @jambayk, any updates on this matter?
Can you uninstall both packages and re-install onnxruntime-directml? After that, please run the test transcription script using the
I have realized that having both packages installed was wrong; however, I cannot run the script using the gpu-dml model, as it is not generated due to the silent failure. After some changes in the config file, I got it to stop failing silently, but I am now facing the issue reported in #1221. Traceback (most recent call last):
Can you pull the latest changes from the main branch and try again?
I cloned the main branch again and ran it, and I still get the [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for DecoderMaskedMultiHeadAttention(1) node with name 'Attention_0' error when I run InsertBeamSearch without "use_gpu": false, and it still fails silently when I set "use_gpu": false. The problem persists.
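For reference, the pass-config variant being toggled here would look like this (a sketch based on the "insert_beam_search" pass from the config in the issue body, with the "use_gpu" option the comment describes added; not a confirmed fix):

```json
"insert_beam_search": {
    "type": "InsertBeamSearch",
    "config": {
        "use_forced_decoder_ids": false,
        "use_logits_processor": false,
        "fp16": false,
        "use_gpu": false
    }
}
```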
Hi guys, any update? |
Hi @jambayk @xiaoyu-work any updates on adding dml support to whisper? |
Describe the bug
I am trying to run Whisper on an AMD Radeon 780M Graphics using the DirectML EP, but it fails with the NOT_IMPLEMENTED error below.
To Reproduce
python -m pip install onnxruntime-directml
python -m olive.workflows.run --config whisper_dml_fp32.json --setup
python -m pip install onnxruntime_extensions
python -m olive.workflows.run --config whisper_dml_fp32.json
Olive config
whisper_dml_fp32.json:
{
"input_model": {
"type": "PyTorchModel",
"config": {
"model_script": "code/user_script.py",
"script_dir": "code",
"hf_config": {
"model_class": "WhisperForConditionalGeneration",
"model_name": "openai/whisper-tiny.en",
"components": [
{
"name": "encoder_decoder_init",
"io_config": "get_encdec_io_config",
"component_func": "get_encoder_decoder_init",
"dummy_inputs_func": "encoder_decoder_init_dummy_inputs"
},
{
"name": "decoder",
"io_config": "get_dec_io_config",
"component_func": "get_decoder",
"dummy_inputs_func": "decoder_dummy_inputs"
}
],
"from_pretrained_args": {
"attn_implementation": "eager"
}
}
}
},
"systems": {
"local_system": {
"type": "LocalSystem",
"config": {
"accelerators": [
{
"device": "gpu",
"execution_providers": [
"DmlExecutionProvider"
]
}
]
}
}
},
"evaluators": {
"common_evaluator": {
"metrics": [
{
"name": "latency",
"type": "latency",
"sub_types": [
{
"name": "avg",
"priority": 1
}
],
"user_config": {
"user_script": "code/user_script.py",
"script_dir": "code",
"data_dir": "data",
"dataloader_func": "whisper_dataloader",
"func_kwargs": {
"dataloader_func": {
"model_name": "openai/whisper-tiny.en",
"use_audio_decoder": true
}
}
}
}
]
}
},
"passes": {
"conversion": {
"type": "OnnxConversion",
"config": {
"target_opset": 17
}
},
"transformers_optimization": {
"type": "OrtTransformersOptimization",
"config": {
"optimization_options": {
"use_multi_head_attention": true
},
"use_gpu": true
}
},
"insert_beam_search": {
"type": "InsertBeamSearch",
"config": {
"use_forced_decoder_ids": false,
"use_logits_processor": false,
"fp16": false
}
},
"prepost": {
"type": "AppendPrePostProcessingOps",
"config": {
"tool_command": "whisper",
"tool_command_args": {
"model_name": "openai/whisper-tiny.en",
"use_audio_decoder": true
},
"target_opset": 17
}
}
},
"engine": {
"log_severity_level": 0,
"host": "local_system",
"target": "local_system",
"evaluator": "common_evaluator",
"evaluate_input_model": false,
"clean_cache": false,
"cache_dir": "cache",
"output_dir": "models",
"output_name": "whisper_dml_fp32"
}
}
Olive logs
After reaching the step where Olive runs on gpu-dml, it fails with this error:
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for DecoderMaskedMultiHeadAttention(1) node with name 'Attention_0'
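DecoderMaskedMultiHeadAttention is an ONNX Runtime contrib op (domain com.microsoft), and NOT_IMPLEMENTED means the selected execution provider ships no kernel for it. A quick way to check which contrib ops a produced model contains is to scan its node list (hypothetical helpers; building `nodes` from a real model additionally needs the onnx package):

```python
from collections import Counter

# To build `nodes` from a real model (requires the onnx package):
#   import onnx
#   m = onnx.load("models/whisper_dml_fp32_gpu-dml_model.onnx")  # path is a placeholder
#   nodes = [(n.op_type, n.domain) for n in m.graph.node]

def count_ops(nodes):
    """Count op types from (op_type, domain) pairs."""
    return Counter(op for op, _domain in nodes)

def contrib_ops(nodes):
    """Ops in ORT's com.microsoft contrib domain, which only some EPs implement."""
    return sorted({op for op, dom in nodes if dom == "com.microsoft"})
```

If contrib_ops reports DecoderMaskedMultiHeadAttention, the model can only run on an EP that implements that kernel.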
Other information