Multithreading problem caused by FFTW dependency #80

Open
VPetukhov opened this issue Feb 19, 2020 · 3 comments

Comments

VPetukhov commented Feb 19, 2020

tl;dr: if you run Julia with many threads, calling FFTW.set_num_threads(1) can be a good idea. Otherwise FFTW's own threading probably slows down the computation and gets in the way of outer-loop parallelism. I suggest adding this to the README.

Full explanation
I was running a lot of KDEs in a loop, but it turned out that running the code in parallel slowed the process down. This happens even if I simply set JULIA_NUM_THREADS=20 (on a 56-core server) without using @threads:

using KernelDensity
using Base.Threads

interp_kde(coords::Array{Float64, 2}, bandwidth::Float64) =
    InterpKDE(kde((coords[1,:], coords[2,:]), bandwidth=(bandwidth, bandwidth)))

td = rand(2, 100000);
@time for i in 1:500
    interp_kde(td, 1.0)
end

This spawns multiple threads at about 30% load and takes 15.9 seconds. The same code with JULIA_NUM_THREADS=1 takes 7.5 seconds and runs in a single thread, as expected. The timing doesn't really change if I use @threads:

@time @threads for i in 1:500
    interp_kde(td, 1.0)
end

After some digging, the problem turned out to be in the FFTW package, which is called somewhere during interpolation and by default uses nthreads() * 4 threads inside its C code. To disable this, you need to run FFTW.set_num_threads(1). After that, running with JULIA_NUM_THREADS=20 but without @threads takes 7.5 seconds, as it should, and with @threads it takes 0.5 seconds.
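A minimal sketch of the workaround described above, assuming FFTW is installed alongside KernelDensity and that Julia was started with JULIA_NUM_THREADS greater than 1. The helper name run_kdes and the choice to store one density value per iteration are hypothetical, added only so the loop produces something to inspect; they are not part of KernelDensity or of the original benchmark.

```julia
using FFTW
using KernelDensity
using Base.Threads

# Keep FFTW single-threaded so it does not compete with Julia's own threads.
FFTW.set_num_threads(1)

interp_kde(coords::Array{Float64, 2}, bandwidth::Float64) =
    InterpKDE(kde((coords[1, :], coords[2, :]), bandwidth=(bandwidth, bandwidth)))

# Hypothetical helper: runs n independent KDEs in an outer threaded loop.
# Each iteration writes only to its own slot, so no locking is needed here.
function run_kdes(coords::Array{Float64, 2}, bandwidth::Float64, n::Int)
    densities = Vector{Float64}(undef, n)
    @threads for i in 1:n
        ik = interp_kde(coords, bandwidth)
        # Evaluate the interpolated density at an arbitrary point (assumption: 0.5, 0.5).
        densities[i] = pdf(ik, 0.5, 0.5)
    end
    return densities
end

td = rand(2, 100000)
@time densities = run_kdes(td, 1.0, 500)
```

Because FFTW.set_num_threads changes a global setting, it probably makes sense to call it once at startup, before any KDE is computed. Timings will depend on the machine and package versions.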

I tried different run configurations, but in the end it looks like FFTW's internal parallelism only improves on the single-threaded case with large arrays (>500000) and a large number of iterations (>100). Even then, it is always much worse than parallelizing the outer loop.

Member

andreasnoack commented Feb 20, 2020

Which version of Julia and the package are you using?

Author

VPetukhov commented Feb 20, 2020

Julia 1.3.1, FFTW v1.2.0, KernelDensity v0.5.1

@andreasnoack
Member

@stevengj Any thoughts on this?
