Add RISC-V vector spec v1.0 support #279

sh-zheng · 2022-03-29T02:32:56Z

Enable rvv 1.0 with --enable-rvv.
Correctness tests for FP32 and FP64 are done with bench --verify on qemu.

rdolbeau · 2022-05-04T05:42:48Z

Maybe you could first submit to https://github.com/rdolbeau/fftw3/tree/riscv-v where the code was originally developed with the EPI intrinsics.

The RVV implementation (and the SVE implementation upon which it is based in https://github.com/rdolbeau/fftw3/tree/arm-sve-clean) adds a lot of codelets by using a 'fixed-width' model, which may not be ideal for code-size and planning time. But a "scalable" implementation (full-width and/or dynamically masked) would require an adapted infrastructure, as currently the SIMD solvers require a known width at compile-time.

sh-zheng · 2022-05-04T06:34:54Z

@rdolbeau The code is indeed a prototype and needs further development. I'd love to submit it to your repo first, and we can optimize it later.
I found that the branch https://github.com/rdolbeau/fftw3/tree/riscv-v has not been updated since 2020 and may be out of date with the official mainline. Could you create a new branch from mainline, so that I can submit against the latest code?

rdolbeau · 2022-05-05T05:44:39Z

@sh-zheng Indeed I need to bring the branch up-to-date (or create a new, clean one), I'll try to do that ASAP.

kito-cheng · 2022-05-05T06:15:05Z

simd-support/rvv.c

+#include <riscv_vector.h>
+
+#if HAVE_RVV
+/* don't know how to autodetect RVV; assume it is present */


Detect RVV at run-time:

#include <sys/auxv.h>
if (getauxval(AT_HWCAP) & HWCAP_ISA_V) { /* RVV is available */ }

It does not detect zve* correctly, but that's the best we can do for now.
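A slightly fuller sketch of the run-time check suggested above, for illustration only. It assumes RISC-V Linux, where AT_HWCAP carries one bit per single-letter ISA extension; the macro SKETCH_HWCAP_ISA_V and the function sketch_have_rvv are hypothetical names, not part of this PR. On any other target the function simply reports 0.

```c
#if defined(__linux__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#endif

/* Assumption: on RISC-V Linux, bit ('V' - 'A') of AT_HWCAP is set when the
 * kernel exposes the V extension. The macro name is made up for this sketch. */
#define SKETCH_HWCAP_ISA_V (1UL << ('V' - 'A'))

/* Returns 1 if the kernel reports the V extension, 0 otherwise.
 * As noted in the thread, this does not cover the zve* subsets. */
int sketch_have_rvv(void)
{
#if defined(__riscv) && defined(__linux__)
    return (getauxval(AT_HWCAP) & SKETCH_HWCAP_ISA_V) != 0;
#else
    return 0;  /* not RISC-V Linux: the bit has no such meaning here */
#endif
}
```

A caller could use the result to decide whether the RVV codelets are registered at all, instead of assuming the extension is present.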

Thank you so much. I will try this detection.

sh-zheng · 2022-05-05T06:59:05Z

@rdolbeau Thank you so much. A new, clean branch may be better, since the official rvv intrinsics are stable enough for development.

rdolbeau · 2022-05-31T14:43:45Z

@sh-zheng Sorry for the delay, I've been quite busy; riscv-v-clean should now be up-to-date with master, with just a couple of consolidated commits (rdcycle & the V support).

sh-zheng · 2022-05-31T15:06:35Z

@rdolbeau Thank you very much for the new branch. I will submit my code to it as soon as possible.

nick-knight · 2022-09-07T14:24:54Z

What is the plan? For @sh-zheng to PR their changes to @rdolbeau's fork, and then PR that here?

a "scalable" implementation (full-width and/or dynamically masked) would require an adapted infrastructure, as currently the SIMD solvers require a known width at compile-time.

My suggestion is to contribute the fixed-width RVV support first, and then propose optimizations in follow-up patches. A fixed SIMD width seems to be a fundamental assumption of FFTW's code-gen, and breaking this assumption may require invasive changes. Additionally, both RVV and SVE could benefit from supporting scalable vectors, and it might be preferable to consider both use-cases at the same time, to ensure the changes don't preclude one ISA or the other.

sh-zheng · 2022-09-07T15:12:53Z

@nick-knight I agree. Maybe we can submit the fixed-width version first. The scalable version is still in progress and may not be ready soon.

JustToEnjoy · 2022-12-11T16:43:19Z

@rdolbeau @sh-zheng
1. I fetched the branch https://github.com/rdolbeau/fftw3/tree/riscv-v-clean and built it with the C906 toolchain, but the build fails when I pass --enable-r5v. Have you encountered this problem?

r5v.c:28:12: warning: implicit declaration of function '__builtin_epi_vsetvl'; did you mean '__builtin_iceil'? [-Wimplicit-function-declaration]
   28 |     return __builtin_epi_vsetvl(rs/64, __epi_e64, __epi_m1) >= (rs/64);
      |            ^~~~~~~~~~~~~~~~~~~~
      |            __builtin_iceil
r5v.c:28:12: warning: nested extern declaration of '__builtin_epi_vsetvl' [-Wnested-externs]
r5v.c:28:40: error: '__epi_e64' undeclared (first use in this function)
   28 |     return __builtin_epi_vsetvl(rs/64, __epi_e64, __epi_m1) >= (rs/64);
      |                                        ^~~~~~~~~
r5v.c:28:40: note: each undeclared identifier is reported only once for each function it appears in
r5v.c:28:51: error: '__epi_m1' undeclared (first use in this function)
   28 |     return __builtin_epi_vsetvl(rs/64, __epi_e64, __epi_m1) >= (rs/64);
      |                                                   ^~~~~~~~
r5v.c:29:3: warning: control reaches end of non-void function [-Wreturn-type]
   29 |   }

2. If I remove --enable-r5v, the build passes, but the library does not work on the C906: calling any FFTW API (for example, fftw_plan_dft_1d) makes the program hang, and no error is printed.

These two problems have me confused. Could you help me?

rdolbeau · 2023-03-02T16:22:30Z

@JustToEnjoy Unfortunately on my branch the code still uses EPI syntax (the European Processor Initiative project, which developed some early support for a set of V intrinsics internally). It needs updating to 'standard' intrinsics. @sh-zheng had a prototype, but I'm not sure which set of intrinsics it targets, or which version(s) of which compilers (llvm, gcc) would support them.
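For illustration, here is a rough sketch of how the width check that fails at r5v.c:28 might be written with the standard "__riscv_"-prefixed intrinsics. It is not taken from either branch: the function name sketch_have_simd_rvv is hypothetical, and the guard assumes the compiler defines __riscv_vector and ships riscv_vector.h with the v1.0-style intrinsic names. On other targets the function just returns 0.

```c
#include <stddef.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>   /* __riscv_vsetvl_e64m1 */
#endif

/* Returns 1 if one vector register (LMUL = 1) can hold at least rs bits of
 * 64-bit elements, 0 otherwise. Mirrors the shape of the EPI-based check
 * quoted in the error log above. */
int sketch_have_simd_rvv(int rs)
{
#if defined(__riscv_vector)
    size_t want = (size_t)rs / 64;               /* requested 64-bit lanes */
    return __riscv_vsetvl_e64m1(want) >= want;   /* granted lanes */
#else
    (void)rs;
    return 0;  /* no V extension at compile time */
#endif
}
```

With this shape, a codelet set generated for a fixed width of rs bits would only be enabled when the hardware register is at least that wide.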

nick-knight · 2023-03-02T16:49:04Z

SiFive has successfully built FFTW from this PR branch: it passes tests, and we are pointing our customers at it. We used an in-house LLVM toolchain, but I don't think it differs significantly from upstream LLVM w.r.t. the necessary intrinsics.

rdolbeau · 2023-03-02T17:18:28Z

@nick-knight My suggestion to @sh-zheng was to merge on my branch (riscv-v-clean) so we keep the history, then merge to upstream (here). Unfortunately it seems our calendars didn't align :-(

nick-knight · 2023-03-02T17:35:39Z

@rdolbeau I have done my best to acknowledge your contribution in customer engagements. If @sh-zheng follows through on your suggestion, then I assume they will close this PR and you will open a new one. When that happens, I will start pointing customers at the new one. Or, even better, the maintainer will accept the PR and we can just use master :)

sh-zheng · 2023-03-13T09:02:36Z

Sorry for the delay. I have run into some difficulties and may not be able to keep moving this project forward :-( . I'd appreciate it if someone could take it over.

The project is currently in this state:
1. If we don't care much about vector-length adaptability, the PR can be submitted right now; it supports vector lengths no greater than 1024.
2. If we do care about vector-length adaptability, the PR should be merged into rdolbeau's branch and developed further, as was done for the SVE version. That will take more time.

My suggestion: since FFTW mainline does not yet support flexible vector lengths, we could submit this version as a preliminary fixed-vector-length implementation, so that it is at least usable. A flexible-vector-length version could be developed later and submitted in a separate PR. @nick-knight @rdolbeau

rdolbeau · 2023-03-13T09:39:23Z

@sh-zheng @nick-knight For unrelated reasons I recently opened a PR for SVE (#315). I'll try to take a look at @sh-zheng's modifications and see if I can merge them on my RVV branch as a follow-up, as the two are somewhat similar in terms of behavior and issues.

sh-zheng · 2023-03-13T12:45:08Z

Great!
I think the main difference between the two PRs is simd-common.h. Maybe they can be merged and submitted together. @rdolbeau

1. Extend VLEN support to 65536. The data type in struct tw_instr in kernel/ifftw.h should be extended at the same time, to hold larger integers.
2. Include vtw.h for VTW1, VTW2, and VTWS, to be compatible with the coding style of the ARM SVE version. Although vtw.h is auto-generated in the ARM SVE version, it is submitted in this version for completeness and correctness.
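To make the first point concrete, here is a small arithmetic sketch. It assumes that the per-lane index field of tw_instr is an 8-bit signed integer; that is a reading of mainline kernel/ifftw.h, not something stated in this thread, and the helper names are hypothetical. It also does not model exactly which indices the VTW macros store, only how the lane count grows with VLEN.

```c
#include <limits.h>
#include <stddef.h>

/* Number of elements of elem_bytes bytes in a register of vlen_bits bits. */
size_t sketch_lanes(size_t vlen_bits, size_t elem_bytes)
{
    return vlen_bits / (8 * elem_bytes);
}

/* 1 if every lane index 0..lanes-1 is representable in a signed char
 * (the assumed width of the index field), 0 otherwise. */
int sketch_index_fits_schar(size_t vlen_bits, size_t elem_bytes)
{
    size_t lanes = sketch_lanes(vlen_bits, elem_bytes);
    return lanes > 0 && lanes - 1 <= (size_t)SCHAR_MAX;
}
```

Under these assumptions, a 1024-bit register of 4-byte floats has 32 lanes, which fits, while a 65536-bit register has 2048 lanes, which does not; this is the kind of overflow that widening the field would avoid.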

OMaghiarIMG · 2023-12-12T09:22:27Z

Hello @sh-zheng, I think you might want to add dft/simd/rvv*/*.c and rdft/simd/rvv*/*.c to .gitignore.

simd-support/rvv.c

sh-zheng · 2023-12-12T14:17:36Z

@OMaghiarIMG Thank you for the review. A new version has been pushed.

rdolbeau · 2024-07-14T14:05:12Z

At last I've been able to move my own branch from the old EPI intrinsics to the current set of RISC-V V1.0 intrinsics. I've made a release package to ease testing (no need for ocaml and maintainer mode!).

It is able to compile and pass checks using either clang-18 (from Debian sid) or gcc-14 (recompiled from FSF source), running on a Banana Pi F3 SBC which features RISC-V V1.0 using 256-bit registers. It's not very fast and only has 4 GiB of RAM, but it's a lot faster than Qemu :-)

@sh-zheng's branch also passes the same set of tests on the same hardware. Interestingly, the performance results I get are highly dependent on the compiler: with clang-18 my branch appears to be a bit faster, while with gcc-14 @sh-zheng's branch is a bit faster. There's probably quite a bit of analysis needed to figure out the best way (or a reasonable compromise...) of doing single-vector/interleaved format using RISC-V V. The many different micro-architectures that are appearing will make trade-offs in the source a lot more complex than for x86-64 or aarch64...

Also, an alternate implementation would be to update and test the 'split' code, which uses two vectors for complex (one for real, one for imaginary). However this requires the V macro to hold both scalable vectors in a structure or array, which I'm not sure is supported in current compilers (the EPI compiler from the BSC does [based on clang], or at least did at some point). @rofirrim, do you know in which compiler(s) (upstream clang, upstream gcc, BSC's clang) there would be support for a struct or an array of scalable types?

rofirrim · 2024-07-15T15:45:38Z

Hi @rdolbeau,

Also, an alternate implementation would be to update and test the 'split' code, which uses two vectors for complex (one for real, one for imaginary). However this requires the V macro to hold both scalable vectors in a structure or array, which I'm not sure is supported in current compilers (the EPI compiler from the BSC does [based on clang], or at least did at some point). @rofirrim, do you know in which compiler(s) (upstream clang, upstream gcc, BSC's clang) there would be support for a struct or an array of scalable types?

I seem to recall you would be able to do that if it is acceptable to you to specify a minimum vector size (similar to what SVE does). Check: https://clang.llvm.org/docs/AttributeReference.html#riscv-rvv-vector-bits

I was wondering also if you could use a segmented tuple load/store and then extract the different vectors? Check https://www.godbolt.org/z/5h1Wj7dEf as an example of a naive complex multiplication in case it is useful.

Hope this helps.

rdolbeau · 2024-07-15T16:07:39Z

I seem to recall you would be able to do that if it is acceptable to you to specify a minimum vector size (similar to what SVE does). Check: https://clang.llvm.org/docs/AttributeReference.html#riscv-rvv-vector-bits

Unfortunately, that would require compiling each file with a different set of options. Not impossible, but certainly a burden to add to the build system...
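As a rough illustration of the attribute-based approach discussed here, the sketch below wraps two fixed-length vectors in a struct. It assumes clang's riscv_rvv_vector_bits attribute and the __riscv_v_fixed_vlen macro, which is only defined when the file is compiled with -mrvv-vector-bits; that per-file option is exactly the build-system burden mentioned above. The type and function names are hypothetical, and on hosts without the flag a plain 256-bit array stands in for the vector type.

```c
#include <stddef.h>

#if defined(__riscv_v_fixed_vlen)
#include <riscv_vector.h>
/* Fixed-length alias of a scalable type; only valid with -mrvv-vector-bits. */
typedef vfloat64m1_t sketch_vec
    __attribute__((riscv_rvv_vector_bits(__riscv_v_fixed_vlen)));
#define SKETCH_VEC_BYTES (__riscv_v_fixed_vlen / 8)
#else
/* Stand-in for hosts without the flag: a plain 256-bit array of doubles. */
typedef struct { double lane[4]; } sketch_vec;
#define SKETCH_VEC_BYTES 32
#endif

/* 'Split' complex vector: one register of real parts, one of imaginary. */
typedef struct {
    sketch_vec re;
    sketch_vec im;
} sketch_split_complex;

/* Size of the wrapper, to show that it is an ordinary sized type. */
size_t sketch_split_bytes(void)
{
    return sizeof(sketch_split_complex);
}
```

The point of the fixed-length alias is that sizeof and struct membership become legal, which is not the case for the plain scalable types.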

OTOH, my memory is failing me; while I did use an array or struct, for RISC-V V with the EPI compilers it was a tuple type: https://github.com/rdolbeau/fftw3/blob/46763868ee386f7a94cb5863afa70519a8123d24/simd-support/simd-r5v-split.h#L73 . It should be possible to do using the current vfloat64m2_t tuple type as V, and not require any special support or feature... I need to investigate this. Only half as many registers are available, but for smaller transforms the smaller overhead may be a benefit.

... or, as I just saw in the godbolt link you added, the tuple type vfloat32m1x2_t, which I didn't know existed. Plenty of options actually :-)

I was wondering also if you could use a segmented tuple load/store and then extract the different vectors?

Segmented load was the goal, it did work in simulation back then (https://github.com/rdolbeau/fftw3/blob/46763868ee386f7a94cb5863afa70519a8123d24/simd-support/simd-r5v-split.h#L216). IIRC they were once an extension, do I read the specifications right and they are actually in the base V extension now?

rofirrim · 2024-07-15T19:28:23Z

Segmented load was the goal, it did work in simulation back then (https://github.com/rdolbeau/fftw3/blob/46763868ee386f7a94cb5863afa70519a8123d24/simd-support/simd-r5v-split.h#L216). IIRC they were once an extension, do I read the specifications right and they are actually in the base V extension now?

Yes, they are now part of base V.
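For readers following along, here is a minimal sketch of the segmented load/store idea: a complex multiply over interleaved (re, im) single-precision data, where the segment load splits real and imaginary parts into the two halves of a vfloat32m1x2_t tuple. It is not the code from the godbolt link or from either branch. The function name sketch_cmul is hypothetical, the RVV path assumes a compiler providing the v1.0-style intrinsics named in the comments, and a scalar fallback is used on targets without the V extension.

```c
#include <stddef.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* out[k] = a[k] * b[k] for n complex values stored as interleaved (re, im). */
void sketch_cmul(float *out, const float *a, const float *b, size_t n)
{
#if defined(__riscv_vector)
    for (size_t i = 0; i < n; ) {
        size_t vl = __riscv_vsetvl_e32m1(n - i);
        /* Segment loads de-interleave: field 0 = real, field 1 = imaginary. */
        vfloat32m1x2_t va = __riscv_vlseg2e32_v_f32m1x2(a + 2 * i, vl);
        vfloat32m1x2_t vb = __riscv_vlseg2e32_v_f32m1x2(b + 2 * i, vl);
        vfloat32m1_t ar = __riscv_vget_v_f32m1x2_f32m1(va, 0);
        vfloat32m1_t ai = __riscv_vget_v_f32m1x2_f32m1(va, 1);
        vfloat32m1_t br = __riscv_vget_v_f32m1x2_f32m1(vb, 0);
        vfloat32m1_t bi = __riscv_vget_v_f32m1x2_f32m1(vb, 1);
        /* re = ar*br - ai*bi ; im = ar*bi + ai*br */
        vfloat32m1_t re = __riscv_vfmul_vv_f32m1(ar, br, vl);
        re = __riscv_vfnmsac_vv_f32m1(re, ai, bi, vl);
        vfloat32m1_t im = __riscv_vfmul_vv_f32m1(ar, bi, vl);
        im = __riscv_vfmacc_vv_f32m1(im, ai, br, vl);
        /* Segment store re-interleaves the two result vectors. */
        __riscv_vsseg2e32_v_f32m1x2(out + 2 * i,
                                    __riscv_vcreate_v_f32m1x2(re, im), vl);
        i += vl;
    }
#else
    /* Scalar fallback with the same semantics. */
    for (size_t i = 0; i < n; i++) {
        float ar = a[2 * i], ai = a[2 * i + 1];
        float br = b[2 * i], bi = b[2 * i + 1];
        out[2 * i]     = ar * br - ai * bi;
        out[2 * i + 1] = ar * bi + ai * br;
    }
#endif
}
```

Because the loop asks for a new vl on every iteration, this form does not depend on a compile-time vector width, unlike the fixed-width codelets discussed earlier in the thread.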

rdolbeau · 2024-07-20T06:49:52Z

New release package from my branch here.

Edit: updated from 003 to 004

Add RISC-V vector spec v1.0 support

8806866

sh-zheng mentioned this pull request May 4, 2022

risc-v vector extensions #234

Open

kito-cheng reviewed May 5, 2022

rdolbeau mentioned this pull request Mar 2, 2023

enable-r5v can't be using in riscv #311

Open

sh-zheng added 2 commits May 3, 2023 18:31

Merge branch 'FFTW:master' into master

cbf34e4

sh-zheng force-pushed the master branch from 4b98782 to daecbbf May 6, 2023 15:37

sh-zheng added 5 commits May 10, 2023 22:26

Update to new rvv intrinsic spec, with "__riscv_" prefix

708a35e

Merge cycle.h from rdolbeau's rvv branch

7a5397b

Merge branch 'FFTW:master' into master

ee4d6ed

Merge branch 'FFTW:master' into master

c029e77

Merge branch 'FFTW:master' into master

bdf1f9d

OMaghiarIMG reviewed Dec 12, 2023

Update .gitignore,, and fix compiling error on other platforms.

d5808c3

sh-zheng added 3 commits April 4, 2024 11:14

Merge branch 'FFTW:master' into master

36b5b7c

Merge branch 'FFTW:master' into master

8b51ffe

Merge branch 'FFTW:master' into master

3c49c94

rdolbeau added the enhancement label Jul 14, 2024
