runtime performance of ucorrelate is disappointing #4

Open
tritemio opened this issue Aug 23, 2018 · 1 comment

Issue moved from tritemio#7. Original author @Phillip-M-Feldman.

  • Pycorrelate version: 0.3
  • Python version: 3.6
  • Operating System: Windows 10

Description

I wrote a script to compare the runtime performance of numpy.correlate and pycorrelate.ucorrelate. The input sequences have length 1e5, and the maximum lag of interest is 10,000. Since I can't specify a maximum lag with numpy.correlate, I used the mode='same' option.
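The script was essentially along these lines (a hypothetical reconstruction matching the description above, not the exact script):

```python
import time
import numpy as np
import pycorrelate

# Hypothetical reconstruction of the benchmark: sequences of length 1e5,
# max lag of 10,000, 10 iterations per library.
rng = np.random.RandomState(0)
n, maxlag, n_iter = 100_000, 10_000, 10
a, b = rng.rand(n), rng.rand(n)

pycorrelate.ucorrelate(a, b, maxlag=maxlag)  # warm-up: keep numba JIT cost out of the timing

t0 = time.perf_counter()
for _ in range(n_iter):
    np.correlate(a, b, mode='same')          # all lags; cannot be limited
print(f"numpy:       {time.perf_counter() - t0:.2f} s")

t0 = time.perf_counter()
for _ in range(n_iter):
    pycorrelate.ucorrelate(a, b, maxlag=maxlag)  # only the first maxlag lags
print(f"pycorrelate: {time.perf_counter() - t0:.2f} s")
```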

Although numpy.correlate computes the full cross-correlation, and was thus expected to be much slower, the results show that it is actually faster (the following test results are based on 10 iterations in each case):

              elapsed (wall time) | user CPU | system CPU
      numpy:                37.03 |   141.09 |       6.92
pycorrelate:               106.91 |    55.62 |      51.78

I'm wondering if you have any thoughts as to why the elapsed time is worse for pycorrelate.ucorrelate. Also, have you considered using FFTs to speed up the calculations?


tritemio commented Aug 23, 2018

@Phillip-M-Feldman, thanks for doing the benchmark, very useful. I added ucorrelate as a side function to pycorrelate, so I didn't put much thought into optimizing the implementation. The ucorrelate function is a naive implementation of the correlation definition, using a for-loop and numba for acceleration. The numpy code is highly optimized, except that it computes all the time lags. I expect that when there is a big difference between the max time lag and the array size, ucorrelate will have an edge. As the max lag becomes a bigger fraction of the total size, numpy will win. I didn't benchmark where the tipping point is. In a few tests I ran, I used max lags of 1/1000 the size of the array and I remember ucorrelate having an advantage.
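For context, the core of ucorrelate is essentially a direct loop over lags, along these lines (a simplified sketch, not the exact library code; it assumes equal-length inputs):

```python
import numba
import numpy as np

@numba.jit
def ucorrelate_sketch(t, u, maxlag):
    # Direct evaluation of C[lag] = sum_i t[i] * u[i + lag]
    # for lag in [0, maxlag): O(n * maxlag) work overall.
    C = np.zeros(maxlag)
    for lag in range(maxlag):
        for i in range(t.size - lag):
            C[lag] += t[i] * u[i + lag]
    return C
```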

That said, I would like to see the example script you used.

A few suggestions you could try. Are you sure numba is working properly?

  • You can try modifying the ucorrelate decorator to @numba.jit(nopython=True), or specifying the input types explicitly (see the sketch after this list).
  • You can try translating the function to Cython. If numba does its job, the performance should be very similar to Cython's.
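The first suggestion, applied to the loop sketched above, would look roughly like this (a hypothetical variant; the signature string is one possible choice for float64 inputs):

```python
import numba
import numpy as np

# Eager compilation with an explicit signature and nopython=True:
# numba raises an error instead of silently falling back to the slow
# object mode if it cannot compile the loop.
@numba.jit("float64[:](float64[:], float64[:], int64)", nopython=True)
def ucorrelate_typed(t, u, maxlag):
    C = np.zeros(maxlag)
    for lag in range(maxlag):
        for i in range(t.size - lag):
            C[lag] += t[i] * u[i + lag]
    return C
```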

Regarding the FFT approach, can it take advantage of a reduced max-lag?
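For reference, an FFT-based version would look roughly like this (a minimal sketch using only numpy; note the transforms still cost O(n log n) over the full padded arrays, so a small max-lag mostly saves memory, not compute):

```python
import numpy as np

def fft_correlate(t, u, maxlag):
    # Cross-correlation via the correlation theorem:
    # C[lag] = sum_i t[i] * u[i + lag], for lag in [0, maxlag).
    n = t.size + u.size - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two, avoids circular wrap-around
    full = np.fft.irfft(np.fft.rfft(u, nfft) * np.conj(np.fft.rfft(t, nfft)), nfft)
    return full[:maxlag]              # keep only the requested positive lags
```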

In the future, ucorrelate could use some heuristics and switch between the numpy and the custom implementation based on maxlag. But we need some good benchmarks for this.
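Such a dispatcher could look roughly like this (a sketch; the threshold is a placeholder that would need benchmarking, and the slicing assumes equal-length inputs):

```python
import numpy as np
import pycorrelate

def correlate_auto(t, u, maxlag, threshold=0.01):
    # Hypothetical heuristic: use the numba loop when maxlag is a small
    # fraction of the array size, otherwise let numpy compute all lags
    # and slice out the positive ones.
    if maxlag < threshold * t.size:
        return pycorrelate.ucorrelate(t, u, maxlag=maxlag)
    full = np.correlate(u, t, mode='full')         # all lags, highly optimized
    return full[t.size - 1 : t.size - 1 + maxlag]  # positive lags only
```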
