-
Notifications
You must be signed in to change notification settings - Fork 625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add first NEON SIMD opcode implementations to fast interp #3859
Add first NEON SIMD opcode implementations to fast interp #3859
Conversation
build-scripts/config_common.cmake
Outdated
@@ -597,3 +597,6 @@ endif() | |||
if (NOT WAMR_BUILD_SANITIZER STREQUAL "") | |||
message (" Sanitizer ${WAMR_BUILD_SANITIZER} enabled") | |||
endif () | |||
if (ENABLE_NEON EQUAL 1) | |||
add_definitions(-DENABLE_NEON=1) | |||
endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the devices we target are ARMv7 rather than ARMv8 (and even the v8 devices don't usually run in AArch64 mode. Most, but not all of the v7 devices have NEON support.
So I want to add a new macro for this, but not sure if this is the right thing to do?
I was thinking I could do something like a WAMR_BUILD_ARM_NEON
style option and then #ifdef WAMR_BUILD_ARM_NEON != 0 || BUILD_TARGET_AARCH64
or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the new build flag makes sense if really NEON support is non-deterministic only based on the architecture. I'd however suggest a few changes here:
- use
WAMR_BUILD_*
prefix for consistency (e.g.WAMR_BUILD_ARM_NEON
) - consider enabling it by default for some common combinations
- for the
#ifdef WAMR_BUILD_ARM_NEON != 0 || BUILD_TARGET_AARCH64
condition, I'm not sure if the build target is needed; we should just use the NEON ifdef, otherwise it should not be defined if not desired.
case SIMD_v128_load8x8_s: | ||
#if ENABLE_NEON == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I do something similar to the invokeNative files where I define some function/symbol that is defined for each architecture in a separate file and put the opcode implementation there, or is it better inline here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my preference would be to not keep it here as it might grow a lot if we start supporting more architectures.
Also regarding testing, I think I will need to add some qemu tests for ARMv7 + NEON in this PR. Is that okay? |
build-scripts/config_common.cmake
Outdated
@@ -597,3 +597,6 @@ endif() | |||
if (NOT WAMR_BUILD_SANITIZER STREQUAL "") | |||
message (" Sanitizer ${WAMR_BUILD_SANITIZER} enabled") | |||
endif () | |||
if (ENABLE_NEON EQUAL 1) | |||
add_definitions(-DENABLE_NEON=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add_definitions(-DENABLE_NEON=1) | |
add_definitions(-DWASM_ENABLE_NEON=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, you should add this to the config.h
file and set it to 0 by default
|
||
int8x8_t v8 = vld1_s8((const int8_t *)maddr); | ||
v8 = vrev64_s8(v8); | ||
int16x8_t v16 = vmovl_s8(v8); | ||
|
||
V128 result; | ||
vst1q_s16((int16_t *)&result, v16); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess only this part is architecture specific so the other code can go outside of the ifdef
(and possibly have an inefficient, plain C implementation in the else
block?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same for all the other opcodes below.
|
||
int8x8_t v8 = vld1_s8((const int8_t *)maddr); | ||
v8 = vrev64_s8(v8); | ||
int16x8_t v16 = vmovl_s8(v8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had better not use arch/os specified API here, could we wrap them to common APIs? And are there common implementations which doesn't depend on specified arch/os?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
using arch specific things directly like this PR should be the last resort after considering other options.
have you already considered libraries like simde? or compiler intrinsics? or even pure C implementation?
cf. https://github.com/yamt/toywasm/blob/e9226fae3e610d37742aa7afc8b0b7a7cf04f941/lib/insn_impl_simd.h#L8-L12
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to use simde - I'll update this PR adding that and implementing a couple of opcodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, added the library and implemented the swizzle opcode using it. Should be quite quick to add the rest of them now.
8a4d665
to
2de1515
Compare
2de1515
to
71d129f
Compare
build-scripts/config_common.cmake
Outdated
@@ -288,6 +288,9 @@ endif () | |||
if (WAMR_BUILD_LIB_RATS EQUAL 1) | |||
message (" Lib rats enabled") | |||
endif() | |||
if (WAMR_BUILD_SIMDE EQUAL 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since now it only supports fast-interp, how about if ((WAMR_BUILD_SIMDE EQUAL 1) AND (WAMR_BUILD_FAST_INTERP EQUAL 1))
?
build-scripts/runtime_lib.cmake
Outdated
@@ -142,6 +142,10 @@ if (WAMR_BUILD_LIB_RATS EQUAL 1) | |||
include (${IWASM_DIR}/libraries/lib-rats/lib_rats.cmake) | |||
endif () | |||
|
|||
if (WAMR_BUILD_SIMDE EQUAL 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
@@ -19,6 +19,10 @@ else () | |||
set (LOADER "wasm_loader.c") | |||
endif () | |||
|
|||
if (WAMR_BUILD_SIMD) | |||
set (WAMR_BUILD_SIMDE 1) | |||
endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to add these lines since WAMR_BUILD_SIMD is already larger than 0, had better remove these lines?
wasm_set_exception(module, "unsupported SIMD opcode"); | ||
uint32 offset, addr; | ||
offset = read_uint32(frame_ip); | ||
addr = GET_OPERAND(uint32, I32, 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it should be addr = GET_OPERAND(uint32, I32, 0); frame_ip += 2;
?
|
||
V128 data; | ||
data = POP_V128(); | ||
frame_ip += 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should remove this line?
f101096
to
b00c2e6
Compare
b00c2e6
to
d92cfd9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
ae858b2
to
c669aaf
Compare
c669aaf
to
d35ed4d
Compare
HANDLE_OP(WASM_OP_SIMD_PREFIX) | ||
{ | ||
GET_OPCODE(); | ||
|
||
switch (opcode) { | ||
/* Memory */ | ||
case SIMD_v128_load: | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my view, wouldn't it be better to move all the new ~1000 lines to a new, separate file? The original core/iwasm/interpreter/wasm_interp_fast.c is already large. let's not make it any bigger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree - I'm happy to follow up with a PR that moves things around later, but I have a few other branches that I'd need to rebase etc so would be good to merge this? This is just a feature branch so I think it's okay if it's a bit messy for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jammar1 I added several comments, could you have a look? I am fine to merge this PR to the branch dev/simd_for_interp
first but we should address the issues then. Do you want to merge this PR first, or merge it after you address the new comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wenyongh I've addressed the comments and CI/ the checks have passed. Could you merge it if you're happy with the changes? Thanks
d8042b4
to
2f651f1
Compare
@wenyongh I think I've addressed your comments, would you be able to review this? Thanks |
e3e45c6
to
6b90955
Compare
6b90955
to
b20c300
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
c930c4d
into
bytecodealliance:dev/simd_for_interp
Sure, merged. A suggestion is to change |
Add some implementations of SIMD opcodes using NEON instructions.
Tested using: