
Inconsistent Inference Results Between HuggingFace Python Model and TensorRT-LLM Triton Model (v0.11.0) #2713

Closed
fclearner opened this issue Jan 23, 2025 · 2 comments

Comments

fclearner commented Jan 23, 2025

I am currently using the official TensorRT-LLM v0.11.0 Docker image. However, I have encountered the following issues:

1. The inference results of the HuggingFace model in Python do not match the results of the TensorRT-LLM model converted for Triton.
2. The beam search results show roughly a 3% accuracy degradation compared to the transformers beam search implementation.

The inference code is at: https://github.com/k2-fsa/sherpa/tree/master/triton/speech_llm

Could this be an inherent issue with version 0.11.0?
I would greatly appreciate your guidance or suggestions on how to resolve this. Thank you in advance for your help!
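To make the mismatch concrete, one way to quantify the divergence between the two backends is to compare the token IDs they produce for the same prompt. The helper below is a minimal sketch; the function name and inputs are illustrative and not taken from the linked repo:

```python
def token_mismatch_rate(ref_ids, hyp_ids):
    """Fraction of positions where two decoded token sequences differ.

    ref_ids: token IDs from the HuggingFace (reference) model.
    hyp_ids: token IDs from the TensorRT-LLM Triton deployment.
    When the sequences have different lengths, the extra tail
    positions are counted as mismatches.
    """
    longest = max(len(ref_ids), len(hyp_ids))
    if longest == 0:
        return 0.0
    mismatches = sum(
        1
        for i in range(longest)
        if i >= len(ref_ids) or i >= len(hyp_ids) or ref_ids[i] != hyp_ids[i]
    )
    return mismatches / longest

# Example: the sequences agree on 3 of 4 positions.
print(token_mismatch_rate([1, 2, 3, 4], [1, 2, 3, 9]))  # 0.25
```

Running this over a held-out prompt set gives a single number to track when comparing engine builds or image versions.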

@fclearner fclearner reopened this Jan 23, 2025
@nv-guomingz
Collaborator

Hi @fclearner, thanks for reporting this issue. Would you please try our latest Docker image to see whether the issue still exists? 0.11 may be too outdated at this point.

@fclearner
Author

> Hi @fclearner, thanks for reporting this issue. Would you please try our latest Docker image to see whether the issue still exists? 0.11 may be too outdated at this point.

Thanks for the advice! I will try the new image and close this issue for now. If the problem persists, I'll reopen it. Appreciate your help!
