Unable to replicate fps results on AGX Xavier #275
Hi @lpkoh, three considerations:
Finally, yolo4-csp is not Yolov4: it is Scaled-YOLOv4, which is slower but more precise. Let me know if you have further questions.
Actually, I get very similar results for Yolov4 and Yolov4-csp.
Hi, thank you for replying on this. I am confused. You said here, in #186, and in #173 that what the demo prints on screen is preprocessing + inference + postprocessing. Since I assumed that what the demo prints on screen is the demo output, I concluded that the "inference only" FPS on tkDNN was slower than the "inference only" FPS from ./trtexec. Where do I find the demo output that corresponds to inference alone, with no pre/post-processing? I don't find that information here. Also, as I understand it, tkDNN is a wrapper around TensorRT and cuDNN. Does this mean it is actually meant to be faster than just running ./trtexec on a Jetson board, at least theoretically?
Yeah, you are actually right. In the past the demo also printed pre/post-processing time, but currently it prints the inference time only, so what you see is the inference time. The same holds for ./test_rtinference and the script scripts/test_all_tests.sh. Yes, tkDNN is just a wrapper around TensorRT and cuDNN. It is simply a framework we use to optimize NNs for our projects. We did not develop it to be faster, but to easily port unsupported models.
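For reference, the inference-only timing mentioned above can be reproduced roughly like this. This is a sketch run from the tkDNN build directory; the tool name ./test_rtinference comes from the thread, while the TKDNN_MODE environment variable and the batch argument are my assumptions based on the tkDNN README:

```shell
#!/bin/sh
# Sketch: inference-only timing with tkDNN's bundled tool.
# Assumptions: run from the tkDNN build directory; TKDNN_MODE and the
# trailing batch-size argument follow the tkDNN README conventions.
if [ -x ./test_rtinference ]; then
    export TKDNN_MODE=FP16               # serialize/run the engine in FP16
    ./test_rtinference yolo4_fp16.rt 4   # time inference only, batch size 4
else
    echo "test_rtinference not found: run from the tkDNN build directory"
fi
```

On a machine without tkDNN built, the script just prints a hint instead of failing.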
Ah, gotcha. So I guess the difference between ~27 FPS on yolo4-416x416 and the ~44 in your repo is probably down to MAXN? Could the TensorRT version difference be an issue? I am using 7; your repo mentions 8. I heard 8 is faster, but for things like transformers, not YOLO.
Maybe it's due to MAXN and jetson_clocks.
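The two settings mentioned here can be applied before benchmarking. A minimal sketch, assuming an AGX Xavier where nvpmodel mode 0 is MAXN (the mode numbering differs across Jetson models, so check `nvpmodel -q` on your board):

```shell
#!/bin/sh
# Sketch: lock an AGX Xavier into its maximum-performance state before benchmarking.
# Assumption: mode 0 is MAXN on this board (verify with nvpmodel -q).
if command -v nvpmodel >/dev/null 2>&1; then
    sudo nvpmodel -m 0    # select the MAXN power mode
    sudo jetson_clocks    # pin CPU/GPU/EMC clocks to their maximums
    sudo nvpmodel -q      # print the active mode to confirm
else
    echo "nvpmodel not found: run this on the Jetson itself"
fi
```

jetson_clocks only holds until reboot, so it needs to be re-run (or scripted at boot) before each benchmarking session.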
TensorRT8 is now supported on tensorrt8 branch. |
Is TensorRT8 still slower?
Hi,
I am using an AGX Xavier. I followed the instructions to run the demo for 2D object detection. I built a yolo4_fp16.rt model, which is a 416x416 model. I then ran ./demo yolo4_fp16.rt with batch = 1 and received an FPS of ~9. This is significantly less than the ~41 FPS reported. (Screenshots were attached to the original issue.)
I have no other background processes running. I do not have CUDA_VISIBLE_DEVICES set to anything. My nvpmodel is set to 1 (settings below). I have run sudo jetson_clocks.
I am aware from looking at some of the other issues that this reported FPS corresponds to inference only, so I am unsure why it is so slow (significantly slower than just testing with TensorRT via ./trtexec).
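For completeness, the trtexec comparison being made here looks roughly like the following. This is a sketch under assumptions: the trtexec path is the usual JetPack location, and note that a tkDNN .rt engine may embed custom plugin layers (e.g. the YOLO layer), so trtexec may refuse to deserialize it without the matching plugin library loaded:

```shell
#!/bin/sh
# Sketch: raw TensorRT timing via trtexec for comparison with the tkDNN demo.
# Assumptions: JetPack's default trtexec path; the engine may need tkDNN's
# plugin library to deserialize, since .rt files can embed custom layers.
TRTEXEC=/usr/src/tensorrt/bin/trtexec
if [ -x "$TRTEXEC" ]; then
    # FP16, batch 1: the same conditions as the ./demo run in the issue
    "$TRTEXEC" --loadEngine=yolo4_fp16.rt --fp16 --batch=1
else
    echo "trtexec not found: expected at $TRTEXEC on a JetPack install"
fi
```

trtexec reports pure GPU inference latency, which is why it gives a clean baseline to compare against the demo's inference-only FPS.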