-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speedup 07: 2.5x speedup with parallel parallel median / MAD calculation #531
Conversation
Fails are coming from the need for the joblib dependency. I'm not averse to adding joblib as a dependency, but if we can avoid it then that would be great. Do you know why joblib has such an advantage over direct multiprocessing? Because the mad function is just calling numpy funcs this could also be a use for multithreading given that numpy should release the GIL. This would also be fun to write a quick and simple c-func for to compare speed. |
Timings from me:
Based on this and the additional dependency I'm going to suggest a change to using a |
I felt that I was searching for alternative solutions and how to best parallelize numpy operations for quite a while - but obviously missed out on how much threadpools actually help because numpy releases the GIL. Your suggestion is of course much cleaner in not needing joblib, and being another 2.5 times faster it's a lot better. So a total speedup of 5x sounds very good, and closer to what I was expecting should be possible here :-). Thanks a lot! And while the joblib-parallelization quickly saturates with the number of cores because it needs to transfer a lot of memory, the threaded execution continues to scale with more cores - so for a cccsums of 2000000 x 500 elements I got a speedup of 8x over joblib (0.75 s vs 6.1 s), or 15x over serial with 40 cores - awesome! |
Great, thanks for merging this! I would love to take advantage of numpy releasing the GIL in more places (and this is one of my internal arguments for redesigning EQcorrscan objects so that they immediately work on numpy array rather than obspy streams, but that is something for when I have more time...). |
What does this PR do?
Implements 2.5x speedup for MAD threshold calculation in
core.match_filter.matched_filter
. With this PR, MAD thresholds can be calculated in parallel; with joblib this is already worth it with a relatively small dataset (> 15 cccsum or 2e7 values in cccsums).This has a noticeable effect for larger datasets, e.g., for 2000 templates, this reduced MAD-calculation from 50 s to 20 s with 30 cores.
Why was it initiated? Any relevant Issues?
This PR contributes to the summary issue in #522
Examples for comparing serial and parallel MAD:
PR Checklist
develop
base branch selected?- [ ] Any new features or fixed regressions are be covered via new tests.- [ ] Any new or changed features have are fully documented.CHANGES.md
.- [ ] First time contributors have added your name toCONTRIBUTORS.md
.