
Inconsistent Inference Results Between HuggingFace Python Model and TensorRT-LLM Triton Model (v0.11.0) #2713

Closed
fclearner opened this issue Jan 23, 2025 · 2 comments

Comments

fclearner commented Jan 23, 2025

I am currently using the official TensorRT-LLM v0.11.0 Docker image. However, I have encountered the following issues:

1. The inference results of the HuggingFace model in Python do not match the results of the TensorRT-LLM model converted for Triton.
2. The beam search results show roughly a 3% accuracy degradation compared to the transformers beam search implementation.

The inference code is at: https://github.com/k2-fsa/sherpa/tree/master/triton/speech_llm

Could this be an inherent issue with version 0.11.0?
I would greatly appreciate your guidance or suggestions on how to resolve this. Thank you in advance for your help!
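To make the mismatch concrete, one way to quantify the divergence between the two backends is to compare the token IDs they produce for the same prompt. The helper below is a minimal sketch; the function name and inputs are illustrative and not taken from the linked repo:

```python
def token_mismatch_rate(ref_ids, hyp_ids):
    """Fraction of positions where two decoded token sequences differ.

    ref_ids: token IDs from the HuggingFace (reference) model.
    hyp_ids: token IDs from the TensorRT-LLM Triton deployment.
    When the sequences have different lengths, the extra tail
    positions are counted as mismatches.
    """
    longest = max(len(ref_ids), len(hyp_ids))
    if longest == 0:
        return 0.0
    mismatches = sum(
        1
        for i in range(longest)
        if i >= len(ref_ids) or i >= len(hyp_ids) or ref_ids[i] != hyp_ids[i]
    )
    return mismatches / longest

# Example: the sequences agree on 3 of 4 positions.
print(token_mismatch_rate([1, 2, 3, 4], [1, 2, 3, 9]))  # 0.25
```

Running this over a held-out prompt set gives a single number to track when comparing engine builds or image versions.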

@fclearner fclearner reopened this Jan 23, 2025
@nv-guomingz
Collaborator

Hi @fclearner, thanks for reporting this issue. Would you please try our latest Docker image to see whether the issue still exists? 0.11 may be too outdated at this point.

@fclearner
Author

> Hi @fclearner, thanks for reporting this issue. Would you please try our latest Docker image to see whether the issue still exists? 0.11 may be too outdated at this point.

Thanks for the advice! I will try the new image and close this issue for now. If the problem persists, I'll reopen it. Appreciate your help!
