runtime performance of ucorrelate is disappointing #4

Open
tritemio opened this issue Aug 23, 2018 · 1 comment

Issue moved from tritemio#7. Original author @Phillip-M-Feldman.

  • Pycorrelate version: 0.3
  • Python version: 3.6
  • Operating System: Windows 10

Description

I wrote a script to compare the runtime performance of numpy.correlate and pycorrelate.ucorrelate. The input sequences have length 1e5, and the maximum lag of interest is 10,000. Since I can't specify a maximum lag with numpy.correlate, I used the mode='same' option.
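The script was essentially along these lines (a hypothetical reconstruction matching the description above, not the exact script):

```python
import time
import numpy as np
import pycorrelate

# Hypothetical reconstruction of the benchmark: sequences of length 1e5,
# max lag of 10,000, 10 iterations per library.
rng = np.random.RandomState(0)
n, maxlag, n_iter = 100_000, 10_000, 10
a, b = rng.rand(n), rng.rand(n)

pycorrelate.ucorrelate(a, b, maxlag=maxlag)  # warm-up: keep numba JIT cost out of the timing

t0 = time.perf_counter()
for _ in range(n_iter):
    np.correlate(a, b, mode='same')          # all lags; cannot be limited
print(f"numpy:       {time.perf_counter() - t0:.2f} s")

t0 = time.perf_counter()
for _ in range(n_iter):
    pycorrelate.ucorrelate(a, b, maxlag=maxlag)  # only the first maxlag lags
print(f"pycorrelate: {time.perf_counter() - t0:.2f} s")
```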

Although numpy.correlate computes the full cross-correlation, and was thus expected to be much slower, the results show that it is actually faster (the following test results are based on 10 iterations in each case):

              elapsed (wall time) | user CPU | system CPU
      numpy:                37.03 |   141.09 |       6.92
pycorrelate:               106.91 |    55.62 |      51.78

I'm wondering if you have any thoughts as to why the elapsed time is worse for pycorrelate.ucorrelate. Also, have you considered using FFTs to speed up the calculations?


tritemio commented Aug 23, 2018

@Phillip-M-Feldman, thanks for doing the benchmark, very useful. I added ucorrelate as a side function to pycorrelate, so I didn't put much thought into optimizing the implementation. The ucorrelate function is a naive implementation of the correlation definition, using a for-loop and numba for acceleration. The numpy code is highly optimized, except that it computes all the time lags. I expect that when there is a big difference between the max time lag and the array size, ucorrelate will have an edge. As the max lag becomes a bigger fraction of the total size, numpy will win. I didn't benchmark where the tipping point is. In a few tests I ran, I used max lags of 1/1000 the size of the array and I remember ucorrelate having an advantage.
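For context, the core of ucorrelate is essentially a direct loop over lags, along these lines (a simplified sketch, not the exact library code; it assumes equal-length inputs):

```python
import numba
import numpy as np

@numba.jit
def ucorrelate_sketch(t, u, maxlag):
    # Direct evaluation of C[lag] = sum_i t[i] * u[i + lag]
    # for lag in [0, maxlag): O(n * maxlag) work overall.
    C = np.zeros(maxlag)
    for lag in range(maxlag):
        for i in range(t.size - lag):
            C[lag] += t[i] * u[i + lag]
    return C
```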

That said, I would like to see the example script you used.

A few suggestions you could try. Are you sure numba is working properly?

  • You can try modifying the ucorrelate decorator to @numba.jit(nopython=True), or specifying the input types explicitly (see the sketch after this list).
  • You can try translating the function to Cython. If numba does its job, the performance should be very similar to Cython's.
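The first suggestion, applied to the loop sketched above, would look roughly like this (a hypothetical variant; the signature string is one possible choice for float64 inputs):

```python
import numba
import numpy as np

# Eager compilation with an explicit signature and nopython=True:
# numba raises an error instead of silently falling back to the slow
# object mode if it cannot compile the loop.
@numba.jit("float64[:](float64[:], float64[:], int64)", nopython=True)
def ucorrelate_typed(t, u, maxlag):
    C = np.zeros(maxlag)
    for lag in range(maxlag):
        for i in range(t.size - lag):
            C[lag] += t[i] * u[i + lag]
    return C
```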

Regarding the FFT approach, can it take advantage of a reduced max-lag?
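For reference, an FFT-based version would look roughly like this (a minimal sketch using only numpy; note the transforms still cost O(n log n) over the full padded arrays, so a small max-lag mostly saves memory, not compute):

```python
import numpy as np

def fft_correlate(t, u, maxlag):
    # Cross-correlation via the correlation theorem:
    # C[lag] = sum_i t[i] * u[i + lag], for lag in [0, maxlag).
    n = t.size + u.size - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two, avoids circular wrap-around
    full = np.fft.irfft(np.fft.rfft(u, nfft) * np.conj(np.fft.rfft(t, nfft)), nfft)
    return full[:maxlag]              # keep only the requested positive lags
```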

In the future, ucorrelate could use some heuristics and switch between the numpy and the custom implementation based on maxlag. But we need some good benchmarks for this.
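Such a dispatcher could look roughly like this (a sketch; the threshold is a placeholder that would need benchmarking, and the slicing assumes equal-length inputs):

```python
import numpy as np
import pycorrelate

def correlate_auto(t, u, maxlag, threshold=0.01):
    # Hypothetical heuristic: use the numba loop when maxlag is a small
    # fraction of the array size, otherwise let numpy compute all lags
    # and slice out the positive ones.
    if maxlag < threshold * t.size:
        return pycorrelate.ucorrelate(t, u, maxlag=maxlag)
    full = np.correlate(u, t, mode='full')         # all lags, highly optimized
    return full[t.size - 1 : t.size - 1 + maxlag]  # positive lags only
```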
