regenerate kernel Horner coeffs: scaled to max val 1, and shave off some degrees for upsampfac=1.25 #499

ahbarnett · 2024-07-22T02:15:40Z

This automates generation of the ES kernel coefficients in the new C++ header format of Barbone and Lu (constexpr static templated arrays). This means we now have upsampfac=1.25 header coefficient arrays, ready to be used by Lu's recent fast xsimd even-odd kernel evaluator.

In the process I fixed a bug in Ludvig's (or my?) devel/gen_ker_horner_loop_C_code.m, which confused the degree d with the number of coefficients nc=d+1, and so made every degree one less than intended.

It also reduces the number of coefficients for upsampfac=1.25, which allows faster kernel evaluation there.
(The exception is w=2, where the degree goes from 3 to 4. This only affects tol=1e-1. If it causes a slowdown, we should switch the degree rule from the current d=ceil(0.6w+3.2) to ceil(0.7w+2.2) or something similar.)

It also rescales the ES kernel to max value 1 throughout the CPU spreadinterp.cpp code, fixing #454.

(The GPU code still uses the old kernel scale, so it may still overflow in 3D single precision at high accuracy with large inputs, as in #454.)

Finally, I made a trivial fix to test/finufft1d_test so that it reports ier=1 as a warning for type 3, as it should, and as all the other test codes already do.


lu1and10

Overall, it looks good to me. My only comment is https://github.com/flatironinstitute/finufft/pull/499/files#r1686214108: std::ceil is not constexpr until C++23. The current code compiles because we use #include "ker_lowupsampfac_horner_allw_loop_constexpr.c"; if we later switch to ker_lowupsampfac_horner_allw_loop_constexpr.h, the nc125 function may not compile with C++ standards earlier than C++23.

ahbarnett · 2024-07-22T15:06:13Z

Ah, I didn't realise this. This, combined with the fact that I'd like to be able to tweak individual nc values for each w, suggests using a static array ncs = {4,5,7,9,10,...} instead, giving the nc for each w = 2,3,...,16. Since I don't understand the templated code, especially `template<class T, uint8_t w> constexpr std::array<std::array<T, w>, nc200<w>()> get_horner_coeffs_200() noexcept {`, could the static ncs array simply be written into the corresponding conditional block for each w? I would like to simplify this code, and avoiding the nc200 and nc125 functions would be great! Please give me suggestions. Thanks, Alex

(This reply was sent by email in response to Libin Lu's review of Mon, Jul 22, 2024, 5:07 AM.)


ahbarnett · 2024-07-22T18:51:07Z

@lu1and10 and @DiamonDinoia, see what you think of this much more maintainable scheme. A static array ncs (the nc for each w) is written out, so there is no code to maintain; instead, the MATLAB code is the single source of truth (SSOT). Much cleaner. Please check that I used C++ correctly for the ncs definition at the top of the two ker_*.h headers. Thanks!

DiamonDinoia · 2024-07-22T18:56:37Z

src/ker_horner_allw_loop_constexpr.h


```diff
 template<class T, uint8_t w>
 constexpr std::array<std::array<T, w>, nc200<w>()> get_horner_coeffs_200() noexcept {
    constexpr auto nc = nc200<w>();
-    if constexpr (w == 2) {
+    if constexpr (w==2) {
```


Spacing: the formatter adds these spaces automatically if it is run before committing. Please let me know if devnotes.rst or contributing.md are not easy to follow.


DiamonDinoia · 2024-07-22T19:03:35Z

src/ker_lowupsampfac_horner_allw_loop_constexpr.c

```diff
-    FLT c1[] = {2.5079742199350562E+01, -2.5079742199350562E+01, 0.0000000000000000E+00, 0.0000000000000000E+00};
-    FLT c2[] = {-3.5023281580177050E+00, -3.5023281580177086E+00, 0.0000000000000000E+00, 0.0000000000000000E+00};
-    FLT c3[] = {-7.3894949249195587E+00, 7.3894949249195632E+00, 0.0000000000000000E+00, 0.0000000000000000E+00};
+    FLT c0[] = {6.1209111871385724E-01, 6.1209111871385691E-01, 0.0000000000000000E+00, 0.0000000000000000E+00};
```


We do not need padding anymore. The CPU version generates it (even when xsimd is not used explicitly), and the GPU version does not use padding.


ahbarnett · 2024-07-22T22:15:07Z

Well, that was extremely easy, and the result is much neater.

Ah, except there is now a conflict in spreadinterp.cpp; I will fix it tomorrow.


DiamonDinoia · 2024-07-23T14:33:57Z

The new coefficients are a bit faster in 1D; in 2D/3D the difference is small.

ahbarnett added 7 commits July 16, 2024 11:36

gen_all_horner_cpp_header.m 1st try

4cf74ff

new ES kernel generation

497de5b

regen ES kernel with max size 1 not exp(beta)

24a68a9

fix constexpr in matlab-gen ES kernel code

946825f

done rescaling ES kernel to max value 1, regen all coeffs, fixes #454

180e8d7

src/ker_horner_allw_loop_constexpr.c

50c5a35

add 1 to degree for larger-w upsampfac=1.25 ker eval, recovers prior accuracy

b32929c

ahbarnett requested review from lu1and10 and DiamonDinoia July 22, 2024 02:16

fix 1dtest issue re allowing ier=1 (warning only), #500

79069e5

ahbarnett mentioned this pull request Jul 22, 2024

upsampfac=1.25 with tol<1e-9 reports warnings for type=1,2 but errors out for type=3 #500

Closed

lu1and10 reviewed Jul 22, 2024

devel/gen_all_horner_cpp_header.m (outdated)

lu1and10 reviewed Jul 22, 2024

src/ker_lowupsampfac_horner_allw_loop_constexpr.h (outdated)

lu1and10 reviewed Jul 22, 2024


ahbarnett added 2 commits July 22, 2024 13:47

more automated matlb kernel coeff generation

1afcf2d

automated and documented matlab kernel coefficient generation; tweaked the poly degrees for kernel, checked accuracy

20b6265

DiamonDinoia reviewed Jul 22, 2024

src/ker_horner_allw_loop_constexpr.h (outdated)

regen ker coeffs .h without nc200() etc func; use .size() instead, much neater

2e88b76

DiamonDinoia reviewed Jul 22, 2024

src/ker_lowupsampfac_horner_allw_loop_constexpr.h (outdated)

ahbarnett added 3 commits July 22, 2024 18:19

merged nc = ..size() with master

c49701a

reverted matlab before clang-format messed it up

b22ae00

clang-formatted regen with assert logic fixed in .h horner stuff

5b14baa

ahbarnett mentioned this pull request Jul 22, 2024

make spreadinterp.cpp use upsampfac=1.25 .h coeffs code, instead of old .c code. #501

Closed

DiamonDinoia merged commit 3e231e2 into master Jul 23, 2024
72 checks passed

DiamonDinoia added this to the 2.3 milestone Jul 23, 2024

ahbarnett deleted the gencoef branch October 9, 2024 00:23
