Unable to replicate fps results on AGX Xavier #275
Hi @lpkoh, three considerations:
Finally, yolo4-csp is not Yolov4: it is Scaled-YOLOv4, which is slower but more precise. Let me know if you have further questions.
Actually, I get very similar results for Yolov4 and Yolov4-csp.
Hi, thank you for replying on this. I am confused. You said here, in #186, and in #173 that what the demo prints on screen is preprocessing + inference + postprocessing. Since I assumed that what the demo prints on screen is the demo output, I concluded that the "inference only" FPS on tkDNN was slower than the "inference only" FPS from ./trtexec. Where do I find the demo output that corresponds to inference alone, with no pre/post-processing? I don't find that information here. Also, as I understand it, tkDNN is a wrapper around TensorRT and cuDNN. Does this mean it is actually meant to be faster than just running ./trtexec on a Jetson board, at least theoretically?
Yeah, you are actually right. In the past the demo also printed pre/post-processing time, but currently it prints the inference time only, so what you see is the inference time. The same holds for ./test_rtinference and the script scripts/test_all_tests.sh. Yes, tkDNN is just a wrapper around TensorRT and cuDNN. It is simply a framework we use to optimize NNs for our projects. We did not develop it to be faster, but to easily port unsupported models.
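For reference, the inference-only timing mentioned above can be reproduced roughly like this. This is a sketch run from the tkDNN build directory; the tool name ./test_rtinference comes from the thread, while the TKDNN_MODE environment variable and the batch argument are my assumptions based on the tkDNN README:

```shell
#!/bin/sh
# Sketch: inference-only timing with tkDNN's bundled tool.
# Assumptions: run from the tkDNN build directory; TKDNN_MODE and the
# trailing batch-size argument follow the tkDNN README conventions.
if [ -x ./test_rtinference ]; then
    export TKDNN_MODE=FP16               # serialize/run the engine in FP16
    ./test_rtinference yolo4_fp16.rt 4   # time inference only, batch size 4
else
    echo "test_rtinference not found: run from the tkDNN build directory"
fi
```

On a machine without tkDNN built, the script just prints a hint instead of failing.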
Ah, gotcha. So I guess the difference between ~27 FPS on yolo4-416x416 and the ~44 in your repo is probably down to MAXN? Could the TensorRT version difference be an issue? I am using 7; your repo mentions 8. I heard 8 is faster, but for things like transformers, not YOLO.
Maybe it's due to MAXN and jetson_clocks.
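The two settings mentioned here can be applied before benchmarking. A minimal sketch, assuming an AGX Xavier where nvpmodel mode 0 is MAXN (the mode numbering differs across Jetson models, so check `nvpmodel -q` on your board):

```shell
#!/bin/sh
# Sketch: lock an AGX Xavier into its maximum-performance state before benchmarking.
# Assumption: mode 0 is MAXN on this board (verify with nvpmodel -q).
if command -v nvpmodel >/dev/null 2>&1; then
    sudo nvpmodel -m 0    # select the MAXN power mode
    sudo jetson_clocks    # pin CPU/GPU/EMC clocks to their maximums
    sudo nvpmodel -q      # print the active mode to confirm
else
    echo "nvpmodel not found: run this on the Jetson itself"
fi
```

jetson_clocks only holds until reboot, so it needs to be re-run (or scripted at boot) before each benchmarking session.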
TensorRT8 is now supported on tensorrt8 branch. |
Is TensorRT8 still slower?
Hi,
I am using an AGX Xavier. I followed the instructions to run the demo for 2D object detection. I built a yolo4_fp16.rt model, which is a 416x416 model. I then ran ./demo yolo4_fp16.rt with batch = 1 and received an FPS of ~9. This is significantly less than the ~41 FPS reported. (Screenshots were attached to the original issue.)
I have no other background processes running. I do not have CUDA_VISIBLE_DEVICES set to anything. My nvpmodel is set to 1 (settings below). I have run sudo jetson_clocks.
I am aware from looking at some of the other issues that this reported FPS corresponds to inference only, so I am unsure why it is so slow (significantly slower than just testing with TensorRT via ./trtexec).
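For completeness, the trtexec comparison being made here looks roughly like the following. This is a sketch under assumptions: the trtexec path is the usual JetPack location, and note that a tkDNN .rt engine may embed custom plugin layers (e.g. the YOLO layer), so trtexec may refuse to deserialize it without the matching plugin library loaded:

```shell
#!/bin/sh
# Sketch: raw TensorRT timing via trtexec for comparison with the tkDNN demo.
# Assumptions: JetPack's default trtexec path; the engine may need tkDNN's
# plugin library to deserialize, since .rt files can embed custom layers.
TRTEXEC=/usr/src/tensorrt/bin/trtexec
if [ -x "$TRTEXEC" ]; then
    # FP16, batch 1: the same conditions as the ./demo run in the issue
    "$TRTEXEC" --loadEngine=yolo4_fp16.rt --fp16 --batch=1
else
    echo "trtexec not found: expected at $TRTEXEC on a JetPack install"
fi
```

trtexec reports pure GPU inference latency, which is why it gives a clean baseline to compare against the demo's inference-only FPS.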