
Optimizing Model Performance: Exploring ONNX Export and Engine Integration with TensorRT and OpenVINO #68

Open
AntonioConsiglio opened this issue Oct 10, 2023 · 4 comments

Comments

@AntonioConsiglio

Hi, have you explored evaluating the architecture by exporting it to ONNX format and running it with different inference engines like TensorRT or OpenVINO?

@z-x-yang
Collaborator

Thanks for your interest. Currently, we haven't explored exporting the models to other formats.

@AntonioConsiglio
Author

AntonioConsiglio commented Oct 22, 2023

> Thanks for your interest. Currently, we haven't explored exporting the models to other formats.

I did some tests, unifying all the attention blocks and building the model with TensorRT v8.5.
The only improvement I've noticed is in memory consumption: with a long-term memory of at most 5 frames, the reserved memory (input size 1280×720) drops from 20 GB to 10 GB.

Despite this memory improvement, on a Jetson platform the runtime of the built TRT engine is slower than pure PyTorch, while on an NVIDIA RTX card it stays the same (I'm running the engine through the Python API).
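An engine build and benchmark like the one described above is commonly done with the `trtexec` tool that ships with TensorRT; a sketch, where the file names and the `image` tensor name are placeholders (it requires an NVIDIA GPU and the TensorRT toolkit):

```shell
# Build a TensorRT engine from the exported ONNX model.
# --fp16 enables half precision, which often reduces memory use.
trtexec --onnx=model.onnx --saveEngine=model.plan --fp16 \
        --shapes=image:1x3x720x1280

# Benchmark the built engine and report latency statistics,
# which makes the Jetson-vs-RTX comparison above reproducible.
trtexec --loadEngine=model.plan --shapes=image:1x3x720x1280
```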

What do you think about this approach (https://github.com/hkchengrex/Cutie)? Is your object memory version similar?

@bhack

bhack commented Oct 22, 2023

There are some `torch.compile` issues with these models:
pytorch/pytorch#103716

@SuyueLiu

Could you please share a sample script to convert the model to ONNX?
