Hi, I am new to Halide and am wondering what the best schedule would be for efficient pixel conversion from any of RGB/BGR/RGBA/BGRA to any of RGB/BGR/RGBA/BGRA. Currently I have the algorithm split into two functions: one where the destination pixel type is 3 bytes long, and one where it is 4 bytes long (with an alpha channel).
swap 3:
swap 4:
The image in memory is interleaved (R1 G1 B1 R2 G2 B2 R3...), and I am loading it into a Halide buffer like this:
I am benchmarking against the OpenCV implementations of the same conversions, and my version is currently around 5-10 times slower. Do you have any suggestions for improving this?
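For reference, the conversion being discussed can be written as a plain scalar loop over the interleaved data. This is only a minimal sketch for checking results, not the Halide code from the post: the `Layout` struct, the `kRGB`/`kBGR`/`kRGBA`/`kBGRA` constants, and `convert_pixels` are hypothetical names, and filling a missing alpha channel with 255 is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of an interleaved pixel layout:
// the number of channels and the byte offset of each component.
struct Layout {
    int channels;    // 3 or 4
    int r, g, b, a;  // byte offsets within one pixel; a == -1 means no alpha
};

constexpr Layout kRGB{3, 0, 1, 2, -1};
constexpr Layout kBGR{3, 2, 1, 0, -1};
constexpr Layout kRGBA{4, 0, 1, 2, 3};
constexpr Layout kBGRA{4, 2, 1, 0, 3};

// Scalar reference conversion between any two of the layouts above.
// Assumption: when the source has no alpha, the destination alpha is 255.
std::vector<uint8_t> convert_pixels(const std::vector<uint8_t>& src,
                                    std::size_t pixel_count,
                                    Layout in, Layout out) {
    std::vector<uint8_t> dst(pixel_count * out.channels);
    for (std::size_t i = 0; i < pixel_count; ++i) {
        const uint8_t* s = &src[i * in.channels];
        uint8_t* d = &dst[i * out.channels];
        d[out.r] = s[in.r];
        d[out.g] = s[in.g];
        d[out.b] = s[in.b];
        if (out.a >= 0) {
            d[out.a] = (in.a >= 0) ? s[in.a] : 255;
        }
    }
    return dst;
}
```

A loop like this is useful mainly as a correctness baseline: the output of a scheduled pipeline can be compared against it byte for byte before any timing is done.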
Replies: 2 comments
I also tried running the Adams2019 autoscheduler and got something like this:
swap 3:
swap 4:
That is much faster than what I did initially, but still almost 2 times slower than OpenCV.
I think you want something like this (this is for RGB to BGR, but the approach generalizes to other conversions):
Notes:
Each `f(...) = ...` definition implies a separate loop nest, with a separate schedule. You've only provided a schedule for the first loop nest, so for …
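To illustrate the point about separate loop nests: a function with a pure definition followed by an update definition is realized as two loop nests, one after the other, and each is scheduled on its own. The plain C++ below is a hand-written sketch of that structure, not actual Halide output; `realize_f` and the two definitions in the comment are made up for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hand-written analogue of a Func with two definitions
// (hypothetical example, not compiler output):
//   f(x, c) = in(x, c);   // pure definition
//   f(x, 0) = in(x, 2);   // update definition
// Data is interleaved: element (x, c) lives at index x * 3 + c.
std::vector<uint8_t> realize_f(const std::vector<uint8_t>& in, int width) {
    std::vector<uint8_t> f(width * 3);

    // Loop nest 1: the pure definition. A schedule given on f itself
    // would only reshape this nest.
    for (int x = 0; x < width; ++x) {
        for (int c = 0; c < 3; ++c) {
            f[x * 3 + c] = in[x * 3 + c];
        }
    }

    // Loop nest 2: the update definition. It has its own loops and stays
    // a plain serial loop unless it is scheduled separately.
    for (int x = 0; x < width; ++x) {
        f[x * 3 + 0] = in[x * 3 + 2];
    }

    return f;
}
```

In Halide terms, directives applied directly to `f` affect the first nest, while the second is reached through `f.update(0)`; leaving it unscheduled means it runs as the default serial loop.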