-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
regenerate kernel Horner coeffs: scaled to max val 1, and shave off some degrees for upsampfac=1.25 #499
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, it looks good to me. The only comment is https://github.com/flatironinstitute/finufft/pull/499/files#r1686214108
that std::ceil
is not constexpr
until c++23. Current code compiles as we use #include "ker_lowupsampfac_horner_allw_loop_constexpr.c"
, if later we change to use ker_lowupsampfac_horner_allw_loop_constexpr.h
, the nc125
function may not compile with C++ standard < c++23.
Ah, I didn't realise this. This, combined with the fact that I'd like to be
able to tweak individual nc values for each w,
suggests instead using a static array ncs = {4,5,7,9,10,...} giving the ns
for each w = 2,3,..16.
Since I don't understand the templated code, esp:
`template<class T, uint8_t w>\nconstexpr std::array<std::array<T, w>,
nc200<w>()> get_horner_coeffs_200() noexcept {`
could the static ncs array be simply written into the corresponding
conditional block for each w?
I would like to simplify this code, and avoiding the nc200 and nc125
functions would be great!
Please give me suggestions - thanks, Alex
…On Mon, Jul 22, 2024 at 5:07 AM Libin Lu ***@***.***> wrote:
***@***.**** commented on this pull request.
Looks good to me. The only comment is
https://github.com/flatironinstitute/finufft/pull/499/files#r1686214108
that std::ceil is not constexpr until c++23. current code compiles as we
use #include "ker_lowupsampfac_horner_allw_loop_constexpr.c", if later we
change to use ker_lowupsampfac_horner_allw_loop_constexpr.h, the nc125
function may not compile with C++ standard < c++23.
—
Reply to this email directly, view it on GitHub
<#499 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACNZRSSNNJC4VRT7UGIZWWLZNTDT5AVCNFSM6AAAAABLHMUJROVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCOJRGA2DANBXGM>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
--
*-------------------------------------------------------------------~^`^~._.~'
|\ Alex Barnett Center for Computational Mathematics, Flatiron Institute
| \ http://users.flatironinstitute.org/~ahb 646-876-5942
|
…d the poly degrees for kernel, checked accuracy
@lu1and10 and @DiamonDinoia see what you think of this much more maintainable scheme. A static array of ncs (nc for each w) is written. There is no code to maintain - rather a SSOT in the matlab codes. Much cleaner. Please check I used C++ correctly for the ncs definition, at the top of the two ker_*.h headers. Thanks! |
src/ker_horner_allw_loop_constexpr.h
Outdated
|
||
template<class T, uint8_t w> | ||
constexpr std::array<std::array<T, w>, nc200<w>()> get_horner_coeffs_200() noexcept { | ||
constexpr auto nc = nc200<w>(); | ||
if constexpr (w == 2) { | ||
if constexpr (w==2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spaces, the formatter will auto add them if run before commit. Please let me know if devnotes.rst or contributing.md are not easy to follow.
FLT c1[] = {2.5079742199350562E+01, -2.5079742199350562E+01, 0.0000000000000000E+00, 0.0000000000000000E+00}; | ||
FLT c2[] = {-3.5023281580177050E+00, -3.5023281580177086E+00, 0.0000000000000000E+00, 0.0000000000000000E+00}; | ||
FLT c3[] = {-7.3894949249195587E+00, 7.3894949249195632E+00, 0.0000000000000000E+00, 0.0000000000000000E+00}; | ||
FLT c0[] = {6.1209111871385724E-01, 6.1209111871385691E-01, 0.0000000000000000E+00, 0.0000000000000000E+00}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need padding anymore. The CPU version generates it (even when xsimd is not used explicitly. The GPU version does not use padding.
well, that was extremely easy, and result much neater. Ah, except conflict in spreadinterp.cpp now, will fix tomorrow. |
This automates the ES kernel coeff generation in the new C++ header constexpr static templated array format of Barbone and Lu. This means we now have upsampfac=1.25 header coeff arrays, ready to use by the recent fast xsimd even-odd kernel evaluator of Lu.
In the process I corrected a bug in Ludvig's (or mine?)
devel/gen_ker_horner_loop_C_code.m
which confused d with nc=d+1, so made all degrees one less than intended.It also reduces the # coeffs for upsampfac=1.25, so allows faster kernel eval here.
(exception is w=2, where degree goes from 3 to 4. This only affects tol=1e-1. If this causes slowdown, we should switch the degree rule from the current d=ceil(0.6w+3.2) to ceil(0.7w+2.2) or something.
It also rescales the ES kernel to max value 1 throughout the CPU spreadinterp.cpp code, fixing #454.
(GPU remains with the old kernel scale, thus may have overflow in 3d single-prec high-accuracy with large sized inputs, as in the above Issue.)
Finally I made a trivial fix to
test/finufft1d_test
to make it report ier=1 as a warning for type 3, as should be, and is in all other test codes.