Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a user level locking mechanism for FFTW #548

Merged
merged 11 commits into from
Sep 18, 2024

Conversation

ahbarnett
Copy link
Collaborator

This commit provides a basic interface for the user to provide a lock around fftw calls. This is because any plan manipulation functions in fftw (make, destroy) are not thread-safe. See example code here:

 #include <vector>
 #include <mutex>

 #include <fftw3.h>
 #include <finufft.h>
 #include <omp.h>

using namespace std;

 #define N 65384

void locker(void *lck) { reinterpret_cast<recursive_mutex *>(lck)->lock(); }
void unlocker(void *lck) { reinterpret_cast<recursive_mutex *>(lck)->unlock(); }

int main() {
    int64_t Ns[3]; // guru describes mode array by vector [N1,N2..]
    Ns[0] = N;
    recursive_mutex lck;

    finufft_opts opts;
    finufft_default_opts(&opts);
    opts.nthreads = 1;
    opts.debug = 0;
    opts.fftw_lock_fun = locker;
    opts.fftw_unlock_fun = unlocker;
    opts.fftw_lock_data = reinterpret_cast<void *>(&lck);

    // random nonuniform points (x) and complex strengths (c)...
    vector<complex<double>> c(N);

    // init FFTW threads
    fftw_init_threads();

    // FFTW and FINUFFT execution using OpenMP parallelization
    #pragma omp parallel for
    for (int j = 0; j < 100; ++j) {

        // allocate output array for FFTW...
        vector<complex<double>> F1(N);

        // FFTW plan
        lck.lock();
        fftw_plan_with_nthreads(1);
        fftw_plan plan = fftw_plan_dft_1d(N, reinterpret_cast<fftw_complex*>(c.data()), reinterpret_cast<fftw_complex*>(F1.data()), FFTW_FORWARD, FFTW_ESTIMATE);
        fftw_destroy_plan(plan);
        lck.unlock();

        // FINUFFT plan
        finufft_plan nufftplan;
        finufft_makeplan(1, 1, Ns, 1, 1, 1e-6, &nufftplan, &opts);
        finufft_destroy(nufftplan);
    }
    return 0;
}

@jkrimmer
Copy link
Contributor

jkrimmer commented Sep 9, 2024

Just as the CI seems to fail on MacOS with LLVM, I have been able to reproduce the apparently underlying locking issues for the double-precision basicpassfail-test locally. For whatever reason, adding a pointer to fftw_lock in the FFTWLockGuard class seems to fix the issue:

class FFTWLockGuard {
public:
  FFTWLockGuard(void (*lock_fun)(void *), void (*unlock_fun)(void *), void *lock_data)
      : unlock_fun_(unlock_fun), lock_data_(lock_data) {
    fftw_lock_p = &fftw_lock;
    if (lock_fun)
      lock_fun(lock_data_);
    else
      fftw_lock_p->lock();
  }
  ~FFTWLockGuard() {
    if (unlock_fun_)
      unlock_fun_(lock_data_);
    else
      fftw_lock_p->unlock();
  }

private:
  void (*unlock_fun_)(void *);
  void *lock_data_;
  std::mutex *fftw_lock_p;
};

@blackwer
Copy link
Member

blackwer commented Sep 9, 2024

Just as the CI seems to fail on MacOS with LLVM, I have been able to reproduce the apparently underlying locking issues for the double-precision basicpassfail-test locally. For whatever reason, adding a pointer to fftw_lock in the FFTWLockGuard class seems to fix the issue:

I just read this after my push -- found the same thing though used a reference instead. My guess is that the optimizer is attempting to re-use the lock guard, rather than actually calling the destructor at the end of scope as it's supposed to. I used a reference since that's essentially an identical implementation as std::lock_guard.

@blackwer
Copy link
Member

@ahbarnett it looks like the MATLAB code should work ignoring the new options, so it's optional to add this new functionality there. MATLAB doesn't support creating C-function pointers from MATLAB functions, so the user would have to write a mex wrapper to make matlab callbacks. I'm not sure MATLAB supports threads, but instead uses process pools for parallelism iirc. I doubt it's worth it to support this, but was wondering if you have any thoughts.

Also, any other things we should be worrying about here?

https://www.mathworks.com/matlabcentral/answers/100602-how-do-i-create-a-function-pointer-that-points-to-an-matlab-file-using-libpointer-in-the-matlab-gene

This commit provides a basic interface for the user to provide a lock around `fftw` calls. This
is because any plan manipulation functions in `fftw` (make, destroy) are not thread-safe. See
example code here:

```c++
 #include <vector>
 #include <mutex>

 #include <fftw3.h>
 #include <finufft.h>
 #include <omp.h>

using namespace std;

 #define N 65384

void locker(void *lck) { reinterpret_cast<recursive_mutex *>(lck)->lock(); }
void unlocker(void *lck) { reinterpret_cast<recursive_mutex *>(lck)->unlock(); }

int main() {
    int64_t Ns[3]; // guru describes mode array by vector [N1,N2..]
    Ns[0] = N;
    recursive_mutex lck;

    finufft_opts opts;
    finufft_default_opts(&opts);
    opts.nthreads = 1;
    opts.debug = 0;
    opts.fftw_lock_fun = locker;
    opts.fftw_unlock_fun = unlocker;
    opts.fftw_lock_data = reinterpret_cast<void *>(&lck);

    // random nonuniform points (x) and complex strengths (c)...
    vector<complex<double>> c(N);

    // init FFTW threads
    fftw_init_threads();

    // FFTW and FINUFFT execution using OpenMP parallelization
    #pragma omp parallel for
    for (int j = 0; j < 100; ++j) {

        // allocate output array for FFTW...
        vector<complex<double>> F1(N);

        // FFTW plan
        lck.lock();
        fftw_plan_with_nthreads(1);
        fftw_plan plan = fftw_plan_dft_1d(N, reinterpret_cast<fftw_complex*>(c.data()), reinterpret_cast<fftw_complex*>(F1.data()), FFTW_FORWARD, FFTW_ESTIMATE);
        fftw_destroy_plan(plan);
        lck.unlock();

        // FINUFFT plan
        finufft_plan nufftplan;
        finufft_makeplan(1, 1, Ns, 1, 1, 1e-6, &nufftplan, &opts);
        finufft_destroy(nufftplan);
    }
    return 0;
}
```
docs/opts.rst Show resolved Hide resolved
src/finufft.cpp Show resolved Hide resolved
src/finufft.cpp Show resolved Hide resolved
This was referenced Sep 11, 2024
@blackwer
Copy link
Member

I investigated the lock failures more on the apple silicon and still don't have a definitive answer. I can confirm a few things.

  1. The optimizer is not optimizing the unlock call away in the lock guard. fftw_lock.unlock() is still being called when not using a reference to the variable in the class
  2. The address reported of fftw_lock is the same on all calls, so it should be using the same mutex variable (though maybe the state it points to is being mangled somehow?!)
  3. This only seems to happen when using the provided makefile. I haven't been able to reproduce this with cmake yet. There might be a magic incantation, but I haven't found it...
  4. I have not be able to reproduce this bug when using a reference in the class.

I don't have any good guesses for what's going on at this point. I'm not savvy enough with lldb to dig deep into interpreting the mutex state either.

@ahbarnett
Copy link
Collaborator Author

Thanks for investigating this. What do you think is the best way forward? Can you leave that pointer in, as a temp fix, and we bring this in? Best, Alex

@blackwer
Copy link
Member

Thanks for investigating this. What do you think is the best way forward? Can you leave that pointer in, as a temp fix, and we bring this in? Best, Alex

For the record, we discussed this offline yesterday and decided it's probably OK to merge in with the reference and monitor it. If the issue returns we'll remove it and re-investigate.

@blackwer blackwer merged commit b98fd1d into flatironinstitute:master Sep 18, 2024
167 checks passed
@blackwer blackwer deleted the user_fftw_lock branch September 18, 2024 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants