Recorded FPS vs actual FPS #56
Comments
Hi sambo,
Thanks. Any pointers on how to optimise those aspects?
OpenCV is convenient but slow, especially for visualization. Look for an OpenGL viewer that fits your needs.
I tried running batch size 1 vs 4 and noticed that inference is not as fast as I expected on an RTX 2070. With batch size 1 I get ~6.6 ms inference time, while with batch size 4 I get ~17.7 ms (2.6x+ slower). Are these numbers correct? I am only using ~1.6 GB of GPU memory and about ~40% of the processing power, even when running batch size 4. Would you know of a way to optimize this by utilizing more of the GPU? On a side note, my pre- and post-processing times are awful with batch size 4, totalling ~9 ms. I will try checking OpenCV again and make sure it is compiled with CUDA to see if that improves things significantly.
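One note on the batch-1 vs batch-4 numbers above: a batch that takes ~2.7x longer can still raise throughput, because four frames are processed per forward pass. A minimal sketch of that arithmetic (the helper name is mine; the timings are the ones reported in this thread):

```cpp
#include <cassert>

// Per-frame throughput implied by a batch inference latency.
double throughput_fps(int batch_size, double latency_ms) {
    return batch_size * 1000.0 / latency_ms;
}
// batch 1 at  6.6 ms -> ~151 FPS
// batch 4 at 17.7 ms -> ~226 FPS: higher latency per batch,
//                       but ~1.5x better throughput per frame
```

So the batch-4 timing is not necessarily wrong; it trades per-batch latency for overall frames per second.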
I built opencv-4.2.0 with CUDA and cuDNN enabled and found no substantial improvement in the pre- and post-processing portion of Yolo3Detection.cpp. In fact, it was slower on my end if I enabled it (by uncommenting tkDNN/include/tkDNN/DetectionNN.h line 17 in 7c2155d).
Also, I guess the inference-speed results I am getting cannot be optimized any further on my hardware?
Hi @rod-hendricks. If you really want to improve pre- and post-processing, you could try implementing CUDA kernels for those phases and keep everything on the device: copy the frame in at the beginning and copy the bounding boxes back at the end. We have not tried this solution yet, but it's on my list of ways to improve those parts.
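To illustrate what moving preprocessing onto the device would compute: the hot loop is the interleaved-to-planar conversion plus normalization. Below is a CPU reference of that transform (the function name and the plain 1/255 scaling are my assumptions; tkDNN's actual preprocessing may also resize and normalize differently), which a CUDA kernel would parallelize with one output element per thread:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// CPU reference of the fused preprocessing a CUDA kernel could do
// on-device: interleaved HWC uint8 -> planar CHW float in [0, 1].
std::vector<float> hwc_to_chw_norm(const std::vector<uint8_t>& hwc,
                                   int h, int w, int c) {
    std::vector<float> chw(hwc.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (int k = 0; k < c; ++k)
                // Each output element depends on exactly one input byte,
                // so the loop body maps directly to one GPU thread.
                chw[k * h * w + y * w + x] =
                    hwc[(y * w + x) * c + k] / 255.0f;
    return chw;
}
```

Doing this on the device means only one host-to-device copy of the raw frame per iteration, instead of uploading an already-preprocessed float tensor built on the CPU.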
Thanks for the response and advice @mive93! I am not sure yet about the effort vs. gain of doing this, so I'll see if I can work on it later. I would like to ask, though, about what you meant by YOLO having too many useless passages between host and device. Do you mean that within the YOLO inference task there are still data transfers between host and device before the network output is produced?
@rod-hendricks, if I have updates on any improvements, I will let you know. No, I meant only in our preprocessing: the code should be cleaned up and fixed to remove a useless passage between host and device. When I have time, I'll fix that :)
Thanks @mive93! Much appreciated. Good work on this repo, it's amazing!
Hello @mive93 , any updates on removing the useless passage between host and device? |
Hi @MohammadKassemZein,
@mive93 Sorry to bother :) any updates on this? |
Not yet, but maybe soon. |
Sounds great! |
I'm struggling to achieve the FPS reported in the command line.
For example when I run inference on a 10 min 30fps video the reported inference fps is 300+.
I would expect the time taken to run inference on the entire video to be 30 fps × 60 s × 10 min = 18000 frames; 18000 frames / 300 FPS = 60 s = 1 min.
Yet the code takes at least 3 mins to run. Is there something wrong with my calculation? Why would the reported fps not be the actual?
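A likely explanation for the gap: the FPS printed on the command line typically covers only the network forward pass, while wall-clock time also pays for video decode, preprocessing, and postprocessing on every frame. A small sketch with hypothetical stage timings (the 3.0/2.5/3.3/1.2 ms figures are illustrative, not measured from tkDNN):

```cpp
#include <cassert>

// FPS counting only the forward pass, as the tool reports it.
double inference_fps(double infer_ms) { return 1000.0 / infer_ms; }

// FPS counting every per-frame stage, as the wall clock sees it.
double end_to_end_fps(double decode_ms, double pre_ms,
                      double infer_ms, double post_ms) {
    return 1000.0 / (decode_ms + pre_ms + infer_ms + post_ms);
}
// With e.g. 3.3 ms inference the reported rate is ~300 FPS, but adding
// 3.0 ms decode + 2.5 ms pre + 1.2 ms post drops the real rate to
// ~100 FPS, so the 18000-frame video takes ~180 s (~3 min), not 1 min.
```

Under those assumed numbers, a 3x gap between reported and actual runtime is exactly what you would see, so the calculation itself is fine; it just divides by the wrong rate.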