add FFT=DUCC option to makefile #511

Merged on Aug 6, 2024 (19 commits)
Changes from 15 commits
32 changes: 17 additions & 15 deletions CHANGELOG
@@ -1,22 +1,25 @@
 List of features / changes made / release notes, in reverse chronological order.
 If not stated, FINUFFT is assumed (cuFINUFFT <=1.3 is listed separately).

-V 2.3.0beta (7/24/24)
+V 2.3.0-rc1 (8/2/24)

-* python build modernized to pyproject.toml (both CPU and GPU).
-PRs 507 (Anden, Lu, Barbone)
-* switchable FFT: either FFTW or DUCC0 (latter need no plan stage; also it is
+* Switched C++ standards from C++14 to C++17, allowing various templating
+improvements (Barbone).
+* python build modernized to pyproject.toml (for both CPU and GPU).
+PR 507 (Anden, Lu, Barbone)
+* switchable FFT: either FFTW or DUCC0 (latter needs no plan stage; also it is
 used to exploit sparsity pattern to achieve FFT speedups 1-3x in 2D and 3D).
-PR463, Martin Reinecke.
+PR463, Martin Reinecke. Both CMake and makefile includes this DUCC0 option
+(makefile PR511 by Barnett; CMake by Barbone).
 * ES kernel rescaled to max value 1, reduced poly degrees for upsampfac=1.25,
 cleaner Horner coefficient generation PR499 (fixes fp32 overflow issue #454).
 * Major manual acceleration of spread/interp kernels via XSIMD header-only lib,
 kernel evaluation, templating by ns with AVX-width-dependent decisions.
 Up to 80% faster, dep on compiler. (Marco Barbone with help from Libin Lu).
-PRs 459, 471, 502.
-NOTE: introduces new dependency (XSIMD), added to cMake and makefile.
+A large chunk of work: PRs 459, 471, 502.
+NOTE: introduces new dependency (XSIMD), added to CMake and makefile.
 * Exploiting even/odd symmetry for 10% faster xsimd-accel kernel poly eval
-Libin Lu based on idea of Martin Reinecke (PR477,492,493).
+(Libin Lu based on idea of Martin Reinecke; PR477,492,493).
 * new test/finufft3dkernel_test checks kerevalmeth=0 and 1 agree to tolerance
 PR 473 (M Barbone).
 * new perftest/compare_spreads.jl compares two spreadinterp libs (A Barnett).
@@ -47,13 +50,12 @@ V 2.3.0beta (7/24/24)
 any 32-bit integers to 64-bit when calling cufinufft(f)_setpts. Note that
 internally, 32-bit integers are still used, so calling cufinufft with more
 than 2e9 points will fail. This restriction may be lifted in the future.
-* cmake build system revamped completely, more modern practices.
-It auto selects compiler flags based on the supported ones on all operating systems.
-Added support for Windows (llvm, msvc), Linux (llvm, gcc) and MacOS (llvm, gcc).
-* cmake support for both ducc0 and fftw
-* cmake adding nvcc and msvc optimization flags
-* cmake supports sphinx
-* updated install docs
+* CMake build system revamped completely, using more modern practices (Barbone).
+It now auto-selects compiler flags based on those supported on all OSes, and
+has support for Windows (llvm, msvc), Linux (llvm, gcc) and MacOS (llvm, gcc).
+* CMake added nvcc and msvc optimization flags.
+* sphinx local doc build also using CMake.
+* updated install docs, including for DUCC0 FFT.

 V 2.2.0 (12/12/23)

3 changes: 2 additions & 1 deletion CMakeLists.txt
@@ -144,7 +144,8 @@ if(CMAKE_PROJECT_NAME STREQUAL PROJECT_NAME)
 if(FINUFFT_BUILD_TESTS)
 enable_testing()
 endif()
-include(cmake/setupSphinx.cmake)
+# include(cmake/setupSphinx.cmake) # to be made default off since only for
+# devs

Review comment (Collaborator): This should not be commented out. Please merge master.

 endif()

 if(FINUFFT_USE_CPU)
10 changes: 6 additions & 4 deletions docs/devnotes.rst
@@ -27,11 +27,11 @@ Developer notes

 * The kernel function in spreadinterp is evaluated via piecewise-polynomial approximation (Horner's rule). The code for this is auto-generated in MATLAB, for all upsampling factors. There are two versions supported:

-- 2018--2024 vintage: no explicit SIMD vectorization, C code is generated code for the Horner evaluation loop, by running from MATLAB `gen_all_horner_C_code.m`
+- 2018--2024 vintage: no explicit SIMD vectorization, C code is generated code for the Horner evaluation loop, by running from MATLAB ``gen_all_horner_C_code.m``

-- post-2024 vintage: explicit SIMD and many other acceleration tricks, and the generated code is a static C++ array of coefficients, and their sizes (`nc` or number of coefficients) for each width `w`. Run from MATLAB `gen_ker_horner_loop_cpp_code.m`
+- post-2024 vintage: explicit SIMD and many other acceleration tricks, and the generated code is a static C++ array of coefficients, and their sizes (``nc`` or number of coefficients) for each width ``w``. Run from MATLAB ``gen_ker_horner_loop_cpp_code.m``

-See `devel/README` for more details. The ES kernel coefficient and poly approx degree for both of the above are defined in a single location, `devel/get_degree_and_beta.m`, which must match the C++ `setup_spreader()` function.
+See ``devel/README`` for more details. The ES kernel coefficient and poly approx degree for both of the above are defined in a single location, ``devel/get_degree_and_beta.m``, which must match the C++ ``setup_spreader()`` function.

 * Continuous Integration (CI). See files for this in ``.github/workflows/``. It currently tests the default ``makefile`` settings in linux, and three other ``make.inc.*`` files covering OSX and Windows (MinGW). CI does not test build the variant OMP=OFF. The dev should test these locally. Likewise, the Julia wrapper is separate and thus not tested in CI. We have added ``JenkinsFile`` for the GPU CI via python wrappers.
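
Editorial note on the kernel bullet in this hunk: the piecewise-polynomial evaluation by Horner's rule can be illustrated with a minimal C++ sketch. This is not FINUFFT's generated code. The sizes `W` and `NC`, the `CoeffTable` layout, the function name `eval_kernel_horner`, and the assumed local coordinate range [-1, 1] are all hypothetical, chosen only to show the technique: one polynomial per kernel piece, all evaluated at the same local coordinate `z` with one multiply-add per coefficient.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sizes, for illustration only (not FINUFFT's actual tables).
constexpr std::size_t W = 4;   // kernel width w: number of polynomial pieces
constexpr std::size_t NC = 3;  // coefficients per piece (polynomial degree NC-1)

// coeffs[j][i] is the coefficient of z^(NC-1-j) on piece i (highest power first).
using CoeffTable = std::array<std::array<double, W>, NC>;

// Evaluate all W pieces at the same local coordinate z using Horner's rule.
std::array<double, W> eval_kernel_horner(const CoeffTable &coeffs, double z) {
  assert(z >= -1.0 && z <= 1.0);         // assumed local coordinate range
  std::array<double, W> ker = coeffs[0]; // start from the leading coefficients
  for (std::size_t j = 1; j < NC; ++j)   // one multiply-add per remaining coefficient
    for (std::size_t i = 0; i < W; ++i)  // inner loop over pieces has no dependencies
      ker[i] = ker[i] * z + coeffs[j][i];
  return ker;
}
```

For example, if piece 0 holds the coefficients 1, 2, 3 (highest power first), then z = 0.5 gives (1 × 0.5 + 2) × 0.5 + 3 = 4.25. Keeping the loop over pieces innermost is the kind of layout that lends itself to SIMD, whether left to the compiler or written explicitly.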

@@ -49,7 +49,9 @@ Developer notes

 * The cufinufft Python wheels are generated using Docker based on the manylinux2014 image. For instructions, see ``tools/cufinufft/distribution_helper.sh``. These are binary wheels that are built using CUDA 11 (or optionally CUDA 12, but these are not distributed on PyPI) and bundled with the necessary libraries.

-* Testing cufinufft (for FI, mostly)
+* CMake compiling on linux at Flatiron Institute (Rusty cluster): We have had a report that if you want to use LLVM, you need to ``module load llvm/16.0.3`` otherwise the default ``llvm/14.0.6`` does not find ``OpenMP_CXX``.

Review comment (Collaborator @DiamonDinoia, Aug 6, 2024): Maybe this is outdated? @blackwer, could you give a review?


+* Testing cufinufft (for FI, mostly):

 .. code-block:: sh
