tl;dr: if you use many threads, running FFTW.set_num_threads(1) can be a good idea. Otherwise FFTW will probably slow down the computation and prevent outer parallelism. I suggest adding this to the README.
Full explanation
I was trying to run a lot of KDEs in a loop, but it turned out that running the code in parallel slowed the process down. This happens even if I simply set JULIA_NUM_THREADS=20 (on a 56-core server) without using @threads:
using KernelDensity
using Base.Threads
interp_kde(coords::Array{Float64, 2}, bandwidth::Float64) = InterpKDE(kde((coords[1,:], coords[2,:]), bandwidth=(bandwidth, bandwidth)))
td = rand(2, 100000);
@time for i in 1:500
    interp_kde(td, 1.0)
end
It creates multiple threads, each at about 30% load, and takes 15.9 seconds. The same code with JULIA_NUM_THREADS=1 takes 7.5 seconds, running cleanly in a single thread. The timing doesn't really change if I use @threads:
@time @threads for i in 1:500
    interp_kde(td, 1.0)
end
After some digging, the problem turned out to be in the FFTW package, which is called somewhere during interpolation and by default uses nthreads() * 4 threads inside its C code. To disable this, you need to run FFTW.set_num_threads(1). After that, running with JULIA_NUM_THREADS=20 but without @threads takes 7.5 seconds, as it should, and with @threads it takes 0.5 seconds.
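For reference, the workaround described above could look roughly like the sketch below. It assumes FFTW has been added to the project directly (so that FFTW.set_num_threads is reachable) and that Julia was started with JULIA_NUM_THREADS set to more than 1. The results vector and the reduced iteration count are only for illustration and are not part of my original benchmark.

```julia
using KernelDensity
using FFTW            # assumption: FFTW is installed as a direct dependency
using Base.Threads

# Keep FFTW single-threaded so its internal threads do not compete
# with the Julia threads that run the outer loop.
FFTW.set_num_threads(1)

interp_kde(coords::Array{Float64, 2}, bandwidth::Float64) =
    InterpKDE(kde((coords[1, :], coords[2, :]), bandwidth=(bandwidth, bandwidth)))

td = rand(2, 100000)

# Illustrative only: one slot per iteration, so every iteration
# writes to its own index and no locking is needed.
results = Vector{Any}(undef, 20)

@time @threads for i in 1:length(results)
    results[i] = interp_kde(td, 1.0)
end
```

With this setup the parallelism comes only from the outer @threads loop, which is the configuration that was fastest in the timings above.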
I tried different run configurations, but in the end it looks like running FFTW in parallel improves on the single-threaded case only with large arrays (>500000) and a large number of iterations (>100). And it is always much worse than parallelizing the outer loop.