Halide slower than OpenCV GaussianBlur? #4672
Replies: 4 comments
-
A thought: for smaller kernels (roughly under 5x5), I have sometimes noticed the naive 2D iteration being faster than decomposing the computation. A quick skim of the OpenCV code base shows that they do have 3x3 and 5x5 specializations.
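To make the comparison concrete, here is a minimal sketch of the two approaches in plain C++ (no Halide or OpenCV). It assumes that "decomposing" means the usual separable form of a Gaussian, a horizontal pass followed by a vertical pass. The `Image` struct, the clamp-to-edge border handling, and all function names are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major float image with clamp-to-edge reads.
struct Image {
    int w, h;
    std::vector<float> px;
    Image(int w_, int h_) : w(w_), h(h_), px(static_cast<std::size_t>(w_) * h_, 0.0f) {}
    float at(int x, int y) const {
        x = std::max(0, std::min(x, w - 1));
        y = std::max(0, std::min(y, h - 1));
        return px[static_cast<std::size_t>(y) * w + x];
    }
};

// Normalized 1D Gaussian taps of length 2*r + 1.
std::vector<float> gaussian_taps(int r, float sigma) {
    assert(r >= 0 && sigma > 0.0f);
    std::vector<float> k(2 * r + 1);
    float sum = 0.0f;
    for (int i = -r; i <= r; ++i) {
        k[i + r] = std::exp(-static_cast<float>(i * i) / (2.0f * sigma * sigma));
        sum += k[i + r];
    }
    for (float &v : k) v /= sum;
    return k;
}

// Naive form: a single pass with (2r+1)^2 multiply-adds per pixel.
Image blur_naive_2d(const Image &in, const std::vector<float> &k) {
    const int r = static_cast<int>(k.size()) / 2;
    Image out(in.w, in.h);
    for (int y = 0; y < in.h; ++y)
        for (int x = 0; x < in.w; ++x) {
            float acc = 0.0f;
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx)
                    acc += k[dy + r] * k[dx + r] * in.at(x + dx, y + dy);
            out.px[static_cast<std::size_t>(y) * in.w + x] = acc;
        }
    return out;
}

// Separable form: 2*(2r+1) multiply-adds per pixel, but it needs an
// intermediate image and a second pass over memory.
Image blur_separable(const Image &in, const std::vector<float> &k) {
    const int r = static_cast<int>(k.size()) / 2;
    Image tmp(in.w, in.h), out(in.w, in.h);
    for (int y = 0; y < in.h; ++y)          // horizontal pass
        for (int x = 0; x < in.w; ++x) {
            float acc = 0.0f;
            for (int dx = -r; dx <= r; ++dx) acc += k[dx + r] * in.at(x + dx, y);
            tmp.px[static_cast<std::size_t>(y) * in.w + x] = acc;
        }
    for (int y = 0; y < in.h; ++y)          // vertical pass
        for (int x = 0; x < in.w; ++x) {
            float acc = 0.0f;
            for (int dy = -r; dy <= r; ++dy) acc += k[dy + r] * tmp.at(x, y + dy);
            out.px[static_cast<std::size_t>(y) * in.w + x] = acc;
        }
    return out;
}

// Largest per-pixel difference between two images of the same size.
float max_abs_diff(const Image &a, const Image &b) {
    assert(a.w == b.w && a.h == b.h);
    float m = 0.0f;
    for (std::size_t i = 0; i < a.px.size(); ++i) m = std::max(m, std::fabs(a.px[i] - b.px[i]));
    return m;
}
```

Both functions produce the same image up to floating-point rounding. For a 3x3 kernel the naive form does 9 multiply-adds per pixel against 6 for the separable form, so the extra intermediate buffer and second pass can plausibly outweigh the saving. Which one wins on a given machine would have to be measured.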
-
I think you're doing a fresh compilation every time you call GaussianBlur, so what you're measuring is Halide compile time. The code that defines the pipeline should be separate from the code that actually blurs an image, so that you can define and compile the pipeline once and use it to blur many images without recompiling.
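A rough sketch of that benchmark structure, in standard C++ only. `FakePipeline` is a hypothetical stand-in, not Halide code: its constructor plays the role of the one-time define-and-compile step, and `run()` plays the role of blurring one image. `best_time_ms` is also a hypothetical helper. In a real Halide program I would expect the setup to correspond to defining the `Func` and compiling it, and the timed call to correspond to `realize`, but that mapping is my assumption about the original code, which isn't shown here.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <limits>

// Hypothetical stand-in for a compiled pipeline. The counters only exist so
// that it is easy to check how often each step ran.
struct FakePipeline {
    int build_count = 0;
    int run_count = 0;
    FakePipeline() { ++build_count; }   // expensive one-time setup would go here
    void run() { ++run_count; }         // per-image work would go here
};

// Best wall-clock time in milliseconds over `iterations` timed calls.
// The `warmup` calls run first and are not timed, so one-time costs on the
// first call (such as lazy compilation) stay out of the measurement.
double best_time_ms(const std::function<void()> &run, int warmup, int iterations) {
    using clock = std::chrono::steady_clock;
    for (int i = 0; i < warmup; ++i) run();
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < iterations; ++i) {
        const auto t0 = clock::now();
        run();
        const auto t1 = clock::now();
        best = std::min(best, std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return best;
}
```

Usage would look like `FakePipeline p; double ms = best_time_ms([&] { p.run(); }, 1, 10);`. The pipeline object is constructed once, outside the lambda, so only the per-image call is timed.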
-
I moved the code that defines the pipeline out of the benchmark and the result changed a lot!
It went from 146 ms to 1.08 ms for the first case. I guess I need to try AOT compilation, but the OpenCV benchmark is still faster than Halide at performing the GaussianBlur. I think there must be something wrong with the source code implementation.
-
You still will get it compiled the first time you call
Otherwise, it might be hard to say why OpenCV is faster without knowing the details of their implementation (for example, they might be doing fixed-point math instead of floating-point, approximating something, or using a different algorithm).
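To illustrate what "fixed-point instead of floating-point" could mean for a blur on 8-bit data, here is a small sketch in standard C++. It is not taken from OpenCV and I am not claiming this is what OpenCV does. The function names, the choice of fractional bits, and the clamp-to-edge border are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert normalized float taps to integers with `shift` fractional bits.
// The center tap absorbs the rounding error so the weights sum to 1 << shift.
std::vector<int32_t> quantize_taps(const std::vector<float> &taps, int shift) {
    assert(taps.size() % 2 == 1 && shift >= 1 && shift <= 15);
    const int32_t one = int32_t(1) << shift;
    std::vector<int32_t> q(taps.size());
    int32_t sum = 0;
    for (std::size_t i = 0; i < taps.size(); ++i) {
        q[i] = static_cast<int32_t>(std::lround(taps[i] * static_cast<float>(one)));
        sum += q[i];
    }
    q[taps.size() / 2] += one - sum;
    return q;
}

// 1D blur of an 8-bit row using only integer multiply, add and shift.
std::vector<uint8_t> blur_row_fixed(const std::vector<uint8_t> &row,
                                    const std::vector<int32_t> &q, int shift) {
    const int r = static_cast<int>(q.size()) / 2;
    const int n = static_cast<int>(row.size());
    std::vector<uint8_t> out(row.size());
    for (int x = 0; x < n; ++x) {
        int32_t acc = int32_t(1) << (shift - 1);   // adds 0.5 so the shift rounds to nearest
        for (int d = -r; d <= r; ++d) {
            const int xi = std::max(0, std::min(x + d, n - 1));
            acc += q[d + r] * static_cast<int32_t>(row[xi]);
        }
        out[x] = static_cast<uint8_t>(acc >> shift);
    }
    return out;
}

// Floating-point reference for the same row blur.
std::vector<uint8_t> blur_row_float(const std::vector<uint8_t> &row,
                                    const std::vector<float> &taps) {
    const int r = static_cast<int>(taps.size()) / 2;
    const int n = static_cast<int>(row.size());
    std::vector<uint8_t> out(row.size());
    for (int x = 0; x < n; ++x) {
        float acc = 0.0f;
        for (int d = -r; d <= r; ++d) {
            const int xi = std::max(0, std::min(x + d, n - 1));
            acc += taps[d + r] * static_cast<float>(row[xi]);
        }
        out[x] = static_cast<uint8_t>(std::max(0L, std::min(255L, std::lround(acc))));
    }
    return out;
}
```

The integer version avoids converting every pixel to float and back, and integer lanes of this width tend to vectorize well, which is one way two implementations of "the same" blur can differ in speed while producing slightly different outputs. Whether that explains the gap here can't be determined from this thread.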
-
I ran a benchmark using OpenCV and Halide to measure the GaussianBlur filter, and I got the following results:
The code that performs the Gaussian blur is as follows:
I'm developing in the following environment:
I am also working with the following JPG image: https://github.com/opencv/opencv/blob/master/samples/data/lena.jpg
Is this an issue, a platform error, a bad implementation, or merely a benchmarking error?
Thanks in advance!