Add Roberta and few new tests for Bert #778
…re test cases; add real data input test cases
Thanks for the great contributions to TensorRT-LLM! Really impressed by your efforts. We will be excited to merge your contribution into our internal repo and then release it on GitHub later, so the community can also benefit from your great work. Currently, there are ongoing efforts to improve the TensorRT-LLM workflow, such as unifying the build logic as well as the runtime logic. Here are examples of reimplementing Bloom/OPT with the new workflow:
And we are actively working to reimplement other models with the new workflow. There are two mechanisms to merge your contributions:
Let's discuss which way makes the most sense. Thanks again for your great contributions to TensorRT-LLM. Happy New Year :) June
Hi @juney-nvidia, thank you very much for your timely response! A unified conversion and build workflow is nice, and the new workflow is also elegant for various decoder models👍. However, as a beginner with TensorRT-LLM, some config terms in the new unified build.py, such as … So I'd like to suggest we first keep the current workflow for Bert/Roberta and then move to a new unified workflow, which allows us to track the changes and improvements between the current and new workflows. Also, until the new Bert/Roberta workflow arrives, the community can benefit from Roberta/Bert/XLMRoberta models in TensorRT-LLM. Thank you very much!
Hi, we had a discussion with the team, and engineers will be assigned to help merge this MR into our internal repo first and then publish it to the GitHub repo. We will keep you posted on the progress. Thanks, June
Hi @juney-nvidia, thank you very much👍
Hi @juney-nvidia, previously I also opened a parallel, related PR in the TensorRT backend repo to support deploying TensorRT-LLM-based classification models. I hope this simplified Triton classification example helps the community deploy classification models built on your optimized transformers more easily and quickly! I am new to both TensorRT-LLM and Triton, so I may have misunderstood parts of your framework. Please feel free to adopt any useful code from these PRs at your convenience. Thank you very much. 😊
Thanks, @erenup, for the great end-to-end contribution! |
Hi @symphonylyh Thank you very much! |
Hi, as @symphonylyh mentioned before, we have already started integrating your nice MR into our internal repo. During the integration, we found that it may be better to add Roberta support as a variant of the existing BERT implementation to remove duplicated code. Based on this idea, what is finally merged into GitHub will not be exactly the same as this MR. Your contributions will certainly still be acknowledged, since you initiated the effort. June
Thank you very much. Yes, Roberta can be a variant of BERT. As I mentioned in the first message of this PR, "Roberta model is similar to the Bert model. The main difference is about position_id and usage of pad_token_id." However, judging from the multiple issues I have seen, this difference may not be easy for beginners to understand. So when we combine them, we should add a documentation or README section that tells users how to use them correctly. Thank you very much.
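The position_id / pad_token_id difference mentioned above can be illustrated with a small NumPy sketch. It assumes the behavior of the Hugging Face `transformers` RoBERTa reference implementation (non-pad tokens are numbered starting at `pad_token_id + 1`, pad tokens keep position `pad_token_id`, and RoBERTa's default `pad_token_id` is 1). The function names are hypothetical and this is not the TensorRT-LLM code that was merged.

```python
import numpy as np


def bert_position_ids(input_ids: np.ndarray) -> np.ndarray:
    """BERT-style positions: 0..seq_len-1 for every row, padding included."""
    batch, seq_len = input_ids.shape
    return np.tile(np.arange(seq_len, dtype=np.int64), (batch, 1))


def roberta_position_ids(input_ids: np.ndarray, pad_token_id: int = 1) -> np.ndarray:
    """RoBERTa-style positions (hypothetical helper mirroring the HF reference).

    Non-pad tokens count up from pad_token_id + 1; pad tokens get pad_token_id,
    so padding never shifts the positions of real tokens.
    """
    mask = (input_ids != pad_token_id).astype(np.int64)
    return np.cumsum(mask, axis=1) * mask + pad_token_id


# One sequence of 4 real tokens followed by 2 pad tokens (pad_token_id = 1).
ids = np.array([[0, 31414, 232, 2, 1, 1]])
print(bert_position_ids(ids).tolist())     # [[0, 1, 2, 3, 4, 5]]
print(roberta_position_ids(ids).tolist())  # [[2, 3, 4, 5, 1, 1]]
```

Feeding BERT-style position ids into a RoBERTa checkpoint therefore indexes the wrong rows of the position-embedding table, which is one plausible source of mismatched outputs when the two models are treated as interchangeable.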
Hi @erenup, thanks. The necessary documentation will certainly be prepared to tell users how to enable it properly. Thanks again for your contribution and great suggestions. June
@erenup we have merged the code internally for RoBERTa support, based on your PR and our refactor mentioned above (i.e., implement as delta against BERT, instead of a standalone model). It will be in the v0.8 release. Thanks for your contribution! |
Hi @symphonylyh, looking forward to the new release. Thank you very much.
Hi, based on https://github.com/erenup/TensorRT-LLM, I used TRT-LLM to accelerate a four-class Bert classification model. My code runs successfully, but two issues remain unresolved. First, the speedup from TRT-LLM on Bert is not significant: it is only 1.8 times as fast as HF. Second, the logits output by the TRT-LLM-accelerated Bert differ significantly from the HF logits. Sorry to bother you; I hope you can give me some guidance. Thank you and best wishes!
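To make a report like the second issue above reproducible, it can help to quantify the logits difference rather than describe it. The sketch below is a hypothetical helper, not part of TensorRT-LLM or Hugging Face: it assumes you already have the HF logits and the TRT-LLM logits for the same batch as arrays of shape `(batch, num_classes)`, and the default tolerances are assumptions. Some difference is expected when an fp16 engine is compared against an fp32 reference, so the predicted-class agreement is often the more telling number.

```python
import numpy as np


def compare_logits(hf_logits, trt_logits, atol=1e-2, rtol=1e-2):
    """Summarize how far engine logits are from a reference (hypothetical helper).

    atol/rtol are assumed tolerances; tune them for your precision (fp16 vs fp32).
    """
    hf = np.asarray(hf_logits, dtype=np.float32)
    trt = np.asarray(trt_logits, dtype=np.float32)
    if hf.shape != trt.shape:
        raise ValueError(f"shape mismatch: {hf.shape} vs {trt.shape}")
    abs_diff = np.abs(hf - trt)
    return {
        "max_abs_diff": float(abs_diff.max()),
        "mean_abs_diff": float(abs_diff.mean()),
        # Fraction of rows where both models predict the same class.
        "argmax_agreement": float(np.mean(hf.argmax(-1) == trt.argmax(-1))),
        # Element-wise |trt - hf| <= atol + rtol * |hf|.
        "allclose": bool(np.allclose(trt, hf, atol=atol, rtol=rtol)),
    }


hf = [[1.0, 2.0], [3.0, 0.0]]
trt = [[1.0, 2.5], [3.0, 0.0]]
print(compare_logits(hf, trt))
```

A large `max_abs_diff` together with low `argmax_agreement` points to a real mismatch (for example, wrong inputs or position ids) rather than ordinary precision noise.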
Thank you very much for using and testing this feature. |
Hi @erenup, thank you for your reply; your work has greatly inspired me.
Hi @zhangjiawei5911, thank you very much. I did not do many comparisons across different settings. On a single 4080 GPU with a max sequence length of 128, 80%+ GPU utilization, and fp16, a 12-layer Bert with TensorRT-LLM reaches about 1k requests/s. I think that's enough and super fast for my use. I did not try int8, since fp16 is already enough for me. Hope this is useful for you.
Update: it will be in the v0.8 official release, and it has already been released earlier in the dev main branch, with announcement and acknowledgement: #1020. Thanks for the contribution! Closing for now. @erenup, please check our modified implementation based on your PR, and open an issue if needed. And @zhangjiawei5911, can you please rebase your code on the latest main and see whether you can file a reproducible GitHub issue if needed? Thanks
Hi @symphonylyh are sequence classification tasks with T5 models not supported yet? |
Hi @ncomly-nvidia @Shixiaowei02 @kaiyux @juney-nvidia @jdemouth-nvidia and nvidia team,
First of all, thank you very much for the great work of TensorRT-LLM!
Pull Request Topic
This Pull Request is to support Roberta model and more test cases for Bert model.
As @ncomly-nvidia's TensorRT-LLM Requests mentioned, Roberta is expected to be supported. So I developed a Roberta model based on my understanding of the TensorRT-LLM framework.
There are also some issues related to Bert and Roberta:
Hope this PR could be helpful.
Features
This PR relates to 2 models: Bert improvements and Roberta support.
Bert Improvement
For Bert model:
Roberta model:
Environment and Testing
Happy New Year everyone.
Thank you very much.