We provide code for deployment with the TensorRT Python API. In general, if your application runs on NVIDIA GPUs, TensorRT is a better choice for deployment than training frameworks such as TensorFlow, PyTorch, MXNet, or Caffe.
Refer to `inference_speed_evaluation` for details.
- use `to_onnx.py` to generate the ONNX model file (a rough export sketch follows this list)
- run `predict_tensorrt.py` to do inference based on the generated model file (see the inference sketch after this list)
- after you fully understand the code, you may adapt it and merge it into your own project
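
As a rough illustration of the first step, here is a minimal ONNX-export sketch, assuming a PyTorch model. `TinyNet`, the input shape, and the file name are placeholders rather than the repository's actual code; see `to_onnx.py` for the real export.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # stand-in for the real network
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3, padding=1)

    def forward(self, x):
        return self.conv(x)

model = TinyNet().eval()
dummy = torch.randn(1, 3, 224, 224)  # match your network's input shape

# Export a fixed-shape ONNX graph; to_onnx.py may use different settings.
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=11,
)
```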
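And a minimal sketch of the second step: building an engine from the ONNX file and running inference with the TensorRT Python API plus PyCUDA. This targets roughly TensorRT 7.x/8.x (`builder.build_engine` and `max_workspace_size`, for example, were deprecated in later releases); shapes, binding indices, and file names are again placeholder assumptions, so refer to `predict_tensorrt.py` for the actual code.

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path):
    builder = trt.Builder(TRT_LOGGER)
    # Explicit batch is required when parsing ONNX models.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28  # 256 MiB scratch space
    return builder.build_engine(network, config)

engine = build_engine("model.onnx")
context = engine.create_execution_context()

# One input binding (index 0) and one output binding (index 1) assumed.
inp = np.random.randn(1, 3, 224, 224).astype(np.float32)  # placeholder input
out = np.empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)

d_inp = cuda.mem_alloc(inp.nbytes)
d_out = cuda.mem_alloc(out.nbytes)

cuda.memcpy_htod(d_inp, inp)                   # host -> device
context.execute_v2([int(d_inp), int(d_out)])   # synchronous inference
cuda.memcpy_dtoh(out, d_out)                   # device -> host
```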
In most practical deployments, C++ is the primary choice for efficient inference, so you can rewrite the code following the structure of the Python version. We will provide a C++ version in the future.
TBD