Provide a user level locking mechanism for FFTW #548

ahbarnett · 2024-09-04T20:53:26Z

This commit provides a basic interface for the user to provide a lock around fftw calls. This is because any plan manipulation functions in fftw (make, destroy) are not thread-safe. See example code here:

#include <complex>
#include <cstdint>
#include <mutex>
#include <vector>

#include <fftw3.h>
#include <finufft.h>
#include <omp.h>

using namespace std;

#define N 65384

void locker(void *lck) { reinterpret_cast<recursive_mutex *>(lck)->lock(); }
void unlocker(void *lck) { reinterpret_cast<recursive_mutex *>(lck)->unlock(); }

int main() {
    int64_t Ns[3]; // guru describes mode array by vector [N1,N2..]
    Ns[0] = N;
    recursive_mutex lck;

    finufft_opts opts;
    finufft_default_opts(&opts);
    opts.nthreads = 1;
    opts.debug = 0;
    opts.fftw_lock_fun = locker;
    opts.fftw_unlock_fun = unlocker;
    opts.fftw_lock_data = reinterpret_cast<void *>(&lck);

    // random nonuniform points (x) and complex strengths (c)...
    vector<complex<double>> c(N);

    // init FFTW threads
    fftw_init_threads();

    // FFTW and FINUFFT execution using OpenMP parallelization
    #pragma omp parallel for
    for (int j = 0; j < 100; ++j) {

        // allocate output array for FFTW...
        vector<complex<double>> F1(N);

        // FFTW plan
        lck.lock();
        fftw_plan_with_nthreads(1);
        fftw_plan plan = fftw_plan_dft_1d(N, reinterpret_cast<fftw_complex*>(c.data()), reinterpret_cast<fftw_complex*>(F1.data()), FFTW_FORWARD, FFTW_ESTIMATE);
        fftw_destroy_plan(plan);
        lck.unlock();

        // FINUFFT plan
        finufft_plan nufftplan;
        finufft_makeplan(1, 1, Ns, 1, 1, 1e-6, &nufftplan, &opts);
        finufft_destroy(nufftplan);
    }
    return 0;
}
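The callback trio can also be exercised without FFTW or FINUFFT installed. The following is only a minimal sketch of the pattern, not the library's implementation: `fake_opts`, `fake_makeplan`, `planner_state`, and `run_demo` are hypothetical names invented for illustration, and only the three field names are taken from the example above. It uses `std::thread` instead of OpenMP so that it builds with a plain C++11 compiler.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the three new option fields; only the field
// names come from the example above, the struct itself is an assumption.
struct fake_opts {
  void (*fftw_lock_fun)(void *)   = nullptr;
  void (*fftw_unlock_fun)(void *) = nullptr;
  void *fftw_lock_data            = nullptr;
};

// User-side callbacks with the same shape as locker/unlocker above.
void locker(void *lck) { static_cast<std::recursive_mutex *>(lck)->lock(); }
void unlocker(void *lck) { static_cast<std::recursive_mutex *>(lck)->unlock(); }

// Hypothetical shared state standing in for the non-thread-safe planner.
long planner_state = 0;

// Hypothetical "make plan": brackets the unsafe update with the user callbacks.
void fake_makeplan(const fake_opts &opts) {
  if (opts.fftw_lock_fun) opts.fftw_lock_fun(opts.fftw_lock_data);
  long tmp      = planner_state; // read-modify-write, racy without the lock
  planner_state = tmp + 1;
  if (opts.fftw_unlock_fun) opts.fftw_unlock_fun(opts.fftw_lock_data);
}

// Runs nthreads threads, each calling fake_makeplan ncalls times, and
// returns the final value of the shared counter.
long run_demo(int nthreads, int ncalls) {
  std::recursive_mutex lck;
  fake_opts opts;
  opts.fftw_lock_fun   = locker;
  opts.fftw_unlock_fun = unlocker;
  opts.fftw_lock_data  = &lck;

  planner_state = 0;
  std::vector<std::thread> pool;
  for (int t = 0; t < nthreads; ++t)
    pool.emplace_back([&opts, ncalls] {
      for (int i = 0; i < ncalls; ++i) fake_makeplan(opts);
    });
  for (auto &th : pool) th.join();
  return planner_state;
}
```

Calling `run_demo(4, 1000)` from a `main` should return 4000, since every update happens while the user-supplied mutex is held. Leaving the callbacks null in this sketch would make the update a data race.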

jkrimmer · 2024-09-09T07:16:38Z

Just as the CI seems to fail on MacOS with LLVM, I have been able to reproduce the apparently underlying locking issues for the double-precision basicpassfail-test locally. For whatever reason, adding a pointer to fftw_lock in the FFTWLockGuard class seems to fix the issue:

class FFTWLockGuard {
public:
  FFTWLockGuard(void (*lock_fun)(void *), void (*unlock_fun)(void *), void *lock_data)
      : unlock_fun_(unlock_fun), lock_data_(lock_data) {
    fftw_lock_p = &fftw_lock;
    if (lock_fun)
      lock_fun(lock_data_);
    else
      fftw_lock_p->lock();
  }
  ~FFTWLockGuard() {
    if (unlock_fun_)
      unlock_fun_(lock_data_);
    else
      fftw_lock_p->unlock();
  }

private:
  void (*unlock_fun_)(void *);
  void *lock_data_;
  std::mutex *fftw_lock_p;
};

blackwer · 2024-09-09T15:11:29Z

> Just as the CI seems to fail on MacOS with LLVM, I have been able to reproduce the apparently underlying locking issues for the double-precision basicpassfail-test locally. For whatever reason, adding a pointer to fftw_lock in the FFTWLockGuard class seems to fix the issue:

I just read this after my push. I found the same thing, though I used a reference instead of a pointer. My guess is that the optimizer is attempting to re-use the lock guard, rather than actually calling the destructor at the end of scope as it's supposed to. I used a reference since that makes the implementation essentially identical to that of std::lock_guard.
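For readers following along, here is a rough sketch of what a reference-holding guard could look like. It is not the code that was pushed: `RefLockGuard`, `CallCounts`, and the demo functions are hypothetical names, and the fallback mutex is passed in explicitly rather than taken from a global `fftw_lock` so that the sketch is self-contained.

```cpp
#include <mutex>

// Hypothetical guard: uses the user callbacks when given, otherwise falls
// back to a mutex held by reference (as std::lock_guard does).
class RefLockGuard {
public:
  RefLockGuard(std::mutex &fallback, void (*lock_fun)(void *),
               void (*unlock_fun)(void *), void *lock_data)
      : fallback_(fallback), unlock_fun_(unlock_fun), lock_data_(lock_data) {
    if (lock_fun)
      lock_fun(lock_data_);
    else
      fallback_.lock();
  }
  ~RefLockGuard() {
    if (unlock_fun_)
      unlock_fun_(lock_data_);
    else
      fallback_.unlock();
  }
  // Non-copyable, so each acquisition is released exactly once.
  RefLockGuard(const RefLockGuard &)            = delete;
  RefLockGuard &operator=(const RefLockGuard &) = delete;

private:
  std::mutex &fallback_;
  void (*unlock_fun_)(void *);
  void *lock_data_;
};

// Hypothetical counting callbacks, used only to observe the guard.
struct CallCounts {
  int locks   = 0;
  int unlocks = 0;
};
void count_lock(void *p) { static_cast<CallCounts *>(p)->locks++; }
void count_unlock(void *p) { static_cast<CallCounts *>(p)->unlocks++; }

// Creates and destroys three guards that use the user callbacks.
CallCounts demo_user_callbacks() {
  std::mutex fallback;
  CallCounts counts;
  for (int i = 0; i < 3; ++i) {
    RefLockGuard g(fallback, count_lock, count_unlock, &counts);
  }
  return counts;
}

// Uses the fallback mutex; the second lock() would block forever if the
// guard's destructor had failed to release it.
bool demo_fallback() {
  std::mutex fallback;
  {
    RefLockGuard g(fallback, nullptr, nullptr, nullptr);
  }
  fallback.lock();
  fallback.unlock();
  return true;
}
```

In this sketch each guard calls the lock callback once on construction and the unlock callback once on destruction, so `demo_user_callbacks()` reports three of each.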

blackwer · 2024-09-10T13:57:16Z

@ahbarnett it looks like the MATLAB code should work while ignoring the new options, so adding this functionality there is optional. MATLAB doesn't support creating C function pointers from MATLAB functions, so the user would have to write a mex wrapper to make MATLAB callbacks. I'm also not sure MATLAB supports threads; iirc it uses process pools for parallelism instead. I doubt it's worth supporting this, but was wondering if you have any thoughts.

Also, any other things we should be worrying about here?

https://www.mathworks.com/matlabcentral/answers/100602-how-do-i-create-a-function-pointer-that-points-to-an-matlab-file-using-libpointer-in-the-matlab-gene


blackwer · 2024-09-17T14:59:30Z

I investigated the lock failures further on Apple silicon and still don't have a definitive answer. I can confirm a few things:

- The optimizer is not optimizing away the unlock call in the lock guard. fftw_lock.unlock() is still being called when the class does not hold a reference to the variable.
- The reported address of fftw_lock is the same on all calls, so it should be using the same mutex variable (though maybe the state it points to is being mangled somehow?!).
- This only seems to happen when using the provided makefile. I haven't been able to reproduce it with cmake yet. There might be a magic incantation, but I haven't found it...
- I have not been able to reproduce this bug when using a reference in the class.

I don't have any good guesses for what's going on at this point. I'm not savvy enough with lldb to dig deep into interpreting the mutex state either.

ahbarnett · 2024-09-17T16:03:15Z

Thanks for investigating this. What do you think is the best way forward? Can you leave that pointer in, as a temp fix, and we bring this in? Best, Alex

blackwer · 2024-09-18T14:38:47Z

> Thanks for investigating this. What do you think is the best way forward? Can you leave that pointer in, as a temp fix, and we bring this in? Best, Alex

For the record, we discussed this offline yesterday and decided it's probably OK to merge in with the reference and monitor it. If the issue returns we'll remove it and re-investigate.

jkrimmer mentioned this pull request Sep 9, 2024

Temporarily use safe FFTW planning ludvigak/FINUFFT.jl#66

Merged

blackwer requested a review from lu1and10 September 10, 2024 13:35

blackwer added 5 commits September 10, 2024 10:32

fftw_lock: fix issue when null opts passed

904baf3

fftw_lock: add new opts to python bindings

f129880

threads: fix lock guard issue in osx arm64 clang

49675c0

fftw_lock: add info to changelog and user documentation

024bfa8

blackwer force-pushed the user_fftw_lock branch from 790a67f to 024bfa8 Compare September 10, 2024 14:34

DiamonDinoia requested changes Sep 10, 2024

blackwer added 2 commits September 10, 2024 11:28

fftw_lock: add test for new fftw_lock

1e2c047

fftw_lock: share lock and init between float/double

8594b65

lu1and10 approved these changes Sep 10, 2024

This was referenced Sep 11, 2024

Use fftw_planner_nthreads #526

Draft

De-macroize FFT #558

Merged

blackwer added 2 commits September 17, 2024 09:22

Merge remote-tracking branch 'upstream/master' into user_fftw_lock

3de4ed5

Revert "fftw_lock: share lock and init between float/double"

4dd9ad7

This reverts commit 8594b65.

blackwer added 2 commits September 17, 2024 16:55

fftw_lock_test: add comments to source describing why

0dd134c

Merge remote-tracking branch 'upstream/master' into user_fftw_lock

5101cef

blackwer merged commit b98fd1d into flatironinstitute:master Sep 18, 2024
167 checks passed

blackwer deleted the user_fftw_lock branch September 18, 2024 14:39
