flatironinstitute · ahbarnett · Aug 6, 2024 · Aug 1, 2024 · Aug 1, 2024 · Aug 2, 2024
diff --git a/CHANGELOG b/CHANGELOG
@@ -1,22 +1,25 @@
 List of features / changes made / release notes, in reverse chronological order.
 If not stated, FINUFFT is assumed (cuFINUFFT <=1.3 is listed separately).
 
-V 2.3.0beta (7/24/24)
+V 2.3.0-rc1 (8/2/24)
 
-* python build modernized to pyproject.toml (both CPU and GPU).
-  PRs 507 (Anden, Lu, Barbone)
-* switchable FFT: either FFTW or DUCC0 (latter need no plan stage; also it is
+* Switched C++ standard from C++14 to C++17, allowing various templating
+  improvements (Barbone).
+* python build modernized to pyproject.toml (for both CPU and GPU).
+  PR 507 (Anden, Lu, Barbone)
+* switchable FFT: either FFTW or DUCC0 (latter needs no plan stage; also it is
   used to exploit sparsity pattern to achieve FFT speedups 1-3x in 2D and 3D).
-  PR463, Martin Reinecke.
+  PR463, Martin Reinecke. Both CMake and makefile include this DUCC0 option
+  (makefile PR511 by Barnett; CMake by Barbone).
 * ES kernel rescaled to max value 1, reduced poly degrees for upsampfac=1.25,
   cleaner Horner coefficient generation PR499 (fixes fp32 overflow issue #454).
 * Major manual acceleration of spread/interp kernels via XSIMD header-only lib,
   kernel evaluation, templating by ns with AVX-width-dependent decisions.
   Up to 80% faster, dep on compiler. (Marco Barbone with help from Libin Lu).
-  PRs 459, 471, 502.
-  NOTE: introduces new dependency (XSIMD), added to cMake and makefile.
+  A large chunk of work: PRs 459, 471, 502.
+  NOTE: introduces new dependency (XSIMD), added to CMake and makefile.
 * Exploiting even/odd symmetry for 10% faster xsimd-accel kernel poly eval
-  Libin Lu based on idea of Martin Reinecke (PR477,492,493).
+  (Libin Lu based on idea of Martin Reinecke; PR477,492,493).
 * new test/finufft3dkernel_test checks kerevalmeth=0 and 1 agree to tolerance
   PR 473 (M Barbone).
 * new perftest/compare_spreads.jl compares two spreadinterp libs (A Barnett).
@@ -47,13 +50,12 @@ V 2.3.0beta (7/24/24)
   any 32-bit integers to 64-bit when calling cufinufft(f)_setpts. Note that
   internally, 32-bit integers are still used, so calling cufinufft with more
   than 2e9 points will fail. This restriction may be lifted in the future.
-* cmake build system revamped completely, more modern practices.
-  It auto selects compiler flags based on the supported ones on all operating systems.
-  Added support for Windows (llvm, msvc), Linux (llvm, gcc) and MacOS (llvm, gcc).
-* cmake support for both ducc0 and fftw
-* cmake adding nvcc and msvc optimization flags
-* cmake supports sphinx
-* updated install docs
+* CMake build system revamped completely, using more modern practices (Barbone).
+  It now auto-selects compiler flags based on those supported on all OSes, and
+  has support for Windows (llvm, msvc), Linux (llvm, gcc) and MacOS (llvm, gcc).
+* CMake added nvcc and msvc optimization flags.
+* sphinx local doc build also using CMake.
+* updated install docs, including for DUCC0 FFT.
 
 V 2.2.0 (12/12/23)
 

diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -144,7 +144,8 @@ if(CMAKE_PROJECT_NAME STREQUAL PROJECT_NAME)
   if(FINUFFT_BUILD_TESTS)
     enable_testing()
   endif()
-  include(cmake/setupSphinx.cmake)
+  # include(cmake/setupSphinx.cmake)    # to be made default off since only for
+  # devs
 endif()
 
 if(FINUFFT_USE_CPU)

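Note on the hunk above: the Sphinx include is commented out with the remark that it is "to be made default off since only for devs". A minimal sketch of how such a default-off switch could look is given here. The option name `FINUFFT_BUILD_DOCS` is hypothetical and does not appear in this diff; only `cmake/setupSphinx.cmake`, `FINUFFT_BUILD_TESTS`, and the surrounding `if` come from the source.

```cmake
# Hypothetical opt-in switch: the name FINUFFT_BUILD_DOCS is an assumption,
# not something defined by this diff. It defaults to OFF so that ordinary
# users never need Sphinx installed.
option(FINUFFT_BUILD_DOCS "Build the Sphinx documentation (developers only)" OFF)

if(CMAKE_PROJECT_NAME STREQUAL PROJECT_NAME)
  if(FINUFFT_BUILD_TESTS)
    enable_testing()
  endif()
  # Pull in the Sphinx setup only when a developer explicitly asks for it.
  if(FINUFFT_BUILD_DOCS)
    include(cmake/setupSphinx.cmake)
  endif()
endif()
```

Under that assumption, a developer would enable the doc build at configure time with `cmake -S . -B build -DFINUFFT_BUILD_DOCS=ON`, while a plain `cmake -S . -B build` would skip the Sphinx setup entirely.
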
diff --git a/docs/devnotes.rst b/docs/devnotes.rst
@@ -27,11 +27,11 @@ Developer notes
 
 * The kernel function in spreadinterp is evaluated via piecewise-polynomial approximation (Horner's rule). The code for this is auto-generated in MATLAB, for all upsampling factors. There are two versions supported:
 
-  - 2018--2024 vintage: no explicit SIMD vectorization, C code is generated code for the Horner evaluation loop, by running from MATLAB `gen_all_horner_C_code.m`
+  - 2018--2024 vintage: no explicit SIMD vectorization, C code is generated code for the Horner evaluation loop, by running from MATLAB ``gen_all_horner_C_code.m``
 
-  - post-2024 vintage: explicit SIMD and many other acceleration tricks, and the generated code is a static C++ array of coefficients, and their sizes (`nc` or number of coefficients) for each width `w`. Run from MATLAB `gen_ker_horner_loop_cpp_code.m`
+  - post-2024 vintage: explicit SIMD and many other acceleration tricks, and the generated code is a static C++ array of coefficients, and their sizes (``nc`` or number of coefficients) for each width ``w``. Run from MATLAB ``gen_ker_horner_loop_cpp_code.m``
 
-  See `devel/README` for more details. The ES kernel coefficient and poly approx degree for both of the above are defined in a single location, `devel/get_degree_and_beta.m`, which must match the C++ `setup_spreader()` function.
+  See ``devel/README`` for more details. The ES kernel coefficient and poly approx degree for both of the above are defined in a single location, ``devel/get_degree_and_beta.m``, which must match the C++ ``setup_spreader()`` function.
 
 * Continuous Integration (CI). See files for this in ``.github/workflows/``. It currently tests the default ``makefile`` settings in linux, and three other ``make.inc.*`` files covering OSX and Windows (MinGW). CI does not test build the variant OMP=OFF. The dev should test these locally. Likewise, the Julia wrapper is separate and thus not tested in CI. We have added ``JenkinsFile`` for the GPU CI via python wrappers.
 
@@ -49,7 +49,9 @@ Developer notes
 
 * The cufinufft Python wheels are generated using Docker based on the manylinux2014 image. For instructions, see ``tools/cufinufft/distribution_helper.sh``. These are binary wheels that are built using CUDA 11 (or optionally CUDA 12, but these are not distributed on PyPI) and bundled with the necessary libraries.
 
-* Testing cufinufft (for FI, mostly)
+* CMake compiling on linux at Flatiron Institute (Rusty cluster): We have had a report that if you want to use LLVM, you need to ``module load llvm/16.0.3``; otherwise the default ``llvm/14.0.6`` does not find ``OpenMP_CXX``.
+
+* Testing cufinufft (for FI, mostly):
 
 .. code-block:: sh