Reimplement some x86 intrinsics without arch-specific LLVM intrinsics #1463

eduardosm · 2023-08-29T15:39:37Z

Reimplements some x86 intrinsics without using arch-specific LLVM intrinsics:

* Store unaligned (`_mm*_storeu_*`): Use `<*mut _>::write_unaligned` instead of `llvm.x86.*.storeu.*`.
* Shift by immediate (`_mm*_s{ll,rl,ra}i_epi*`): Use `if` (srl, sll) or `min` (sra) to simulate the behaviour when the RHS is out of range. The RHS is constant, so the `if`/`min` will be optimized away.
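As a rough illustration of the shift-by-immediate approach described above, here is a scalar sketch on plain arrays. It is not the stdarch code: the `*_model` function names and the array-based lane representation are assumptions made for this example, and `IMM8` is assumed to be in `0..=255` (the real intrinsics enforce this with `static_assert_uimm_bits!`). Logical shifts return zero once the count reaches the lane width, while the arithmetic shift clamps the count so every bit becomes a copy of the sign bit.

```rust
/// Hypothetical scalar model of a logical left shift by immediate
/// on four 32-bit lanes. Assumes 0 <= IMM8 <= 255.
fn slli_epi32_model<const IMM8: i32>(a: [u32; 4]) -> [u32; 4] {
    if IMM8 >= 32 {
        // Out-of-range count: every lane becomes zero.
        [0; 4]
    } else {
        a.map(|lane| lane << (IMM8 as u32))
    }
}

/// Hypothetical scalar model of a logical right shift by immediate.
fn srli_epi32_model<const IMM8: i32>(a: [u32; 4]) -> [u32; 4] {
    if IMM8 >= 32 {
        [0; 4]
    } else {
        a.map(|lane| lane >> (IMM8 as u32))
    }
}

/// Hypothetical scalar model of an arithmetic right shift by immediate.
/// Clamping the count to 31 fills the lane with copies of the sign bit,
/// which models the out-of-range behaviour without a branch.
fn srai_epi32_model<const IMM8: i32>(a: [i32; 4]) -> [i32; 4] {
    let count = IMM8.min(31) as u32;
    a.map(|lane| lane >> count)
}
```

Because `IMM8` is a const generic, the comparison and the `min` are resolved per instantiation, which is the reason the description expects them to be optimized away.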

The advantages are:

* Codegen backends will not have to handle those LLVM intrinsics.
* Miri will be able to emulate them without specific shims.
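To show why the unaligned store needs no arch-specific support, here is a minimal sketch that uses only `write_unaligned` from the standard library. `Vec128` is a hypothetical stand-in for a 16-byte-aligned vector type such as `__m128i`, and `storeu_model` is an illustrative name, not the stdarch function.

```rust
/// Hypothetical stand-in for a 128-bit vector type with 16-byte alignment.
#[derive(Clone, Copy)]
#[repr(C, align(16))]
struct Vec128([u32; 4]);

/// Stores `value` at `mem_addr`, which does not need to be 16-byte aligned.
///
/// # Safety
/// `mem_addr` must be valid for writes of 16 bytes.
unsafe fn storeu_model(mem_addr: *mut u8, value: Vec128) {
    // `write_unaligned` is ordinary Rust, so any backend or interpreter
    // that supports pointer writes can handle it without a special shim.
    unsafe { mem_addr.cast::<Vec128>().write_unaligned(value) }
}
```

A caller can pass a deliberately misaligned pointer, for example one byte into a `[u8; 17]` buffer, and the 16 bytes that follow will hold the vector contents in native byte order.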


rustbot · 2023-08-29T15:39:42Z

r? @Amanieu

(rustbot has picked a reviewer for you, use r? to override)

Amanieu · 2023-08-29T17:55:34Z

crates/core_arch/src/x86/sse2.rs

@@ -732,7 +752,11 @@ pub unsafe fn _mm_srl_epi32(a: __m128i, count: __m128i) -> __m128i {
 #[stable(feature = "simd_x86", since = "1.27.0")]
 pub unsafe fn _mm_srli_epi64<const IMM8: i32>(a: __m128i) -> __m128i {
    static_assert_uimm_bits!(IMM8, 8);
-    transmute(psrliq(a.as_i64x2(), IMM8))
+    if IMM8 >= 32 {


Shouldn't this be 64?
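The point of this review comment can be checked with a small scalar sketch. The two functions below are hypothetical models on plain `u64` lanes, not the stdarch implementation, and they assume `IMM8` is in `0..=255`. With a threshold of 32, counts from 32 to 63 would wrongly zero a 64-bit lane even though they are valid shift amounts.

```rust
/// Hypothetical model using the threshold from the first revision of the diff.
fn srli_epi64_threshold_32<const IMM8: i32>(a: [u64; 2]) -> [u64; 2] {
    if IMM8 >= 32 {
        [0; 2]
    } else {
        a.map(|lane| lane >> (IMM8 as u32))
    }
}

/// Hypothetical model using the lane width (64) as the threshold.
fn srli_epi64_threshold_64<const IMM8: i32>(a: [u64; 2]) -> [u64; 2] {
    if IMM8 >= 64 {
        [0; 2]
    } else {
        a.map(|lane| lane >> (IMM8 as u32))
    }
}
```

For a count of 32, the first model returns all zeros while the second moves the upper half of each lane into the lower half, which is the behaviour expected of a 64-bit logical right shift.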

…ediate intrinsics Those were removed from stdarch in rust-lang/stdarch#1463 (`simd_shl` and `simd_shr` are used instead)

Update stdarch submodule and remove special handling in cranelift codegen for some AVX and SSE2 LLVM intrinsics

rust-lang/stdarch#1463 reimplemented some x86 intrinsics to avoid using some x86-specific LLVM intrinsics:

* Store unaligned (`_mm*_storeu_*`) use `<*mut _>::write_unaligned` instead of `llvm.x86.*.storeu.*`.
* Shift by immediate (`_mm*_s{ll,rl,ra}i_epi*`) use `if` (srl, sll) or `min` (sra) to simulate the behaviour when the RHS is out of range. RHS is constant, so the `if`/`min` will be optimized away.

This PR updates the stdarch submodule to pull these changes and removes special handling for those LLVM intrinsics from cranelift codegen.

I left gcc codegen untouched because there are some autogenerated lists.

eduardosm added 12 commits August 29, 2023 17:21

Implement SSE2 and AVX unaligned stores (storeu) with `<*mut T>::write_unaligned` instead of LLVM intrinsics

56ec699

Implement SSE2 shift by immediate (slli, srli, srai) with `simd_sh{l,r}` instead of LLVM intrinsics

7055451

Implement AVX2 shift by immediate (slli, srli, srai) with `simd_sh{l,r}` instead of LLVM intrinsics

e6dae05

Implement AVX512F 32-bit shift by immediate (slli_epi32) with `simd_shl` instead of LLVM intrinsics

016e4d1

Implement AVX512F 64-bit shift by immediate (slli_epi64) with `simd_shl` instead of LLVM intrinsics

d65b426

Implement AVX512F 32-bit shift by immediate (srli_epi32) with `simd_shr` instead of LLVM intrinsics

fb7b761

Implement AVX512F 64-bit shift by immediate (srli_epi64) with `simd_shr` instead of LLVM intrinsics

2fecd0c

Implement AVX512F 32-bit shift by immediate (srai_epi32) with `simd_shr` instead of LLVM intrinsics

44be785

Implement AVX512F 64-bit shift by immediate (srai_epi64) with `simd_shr` instead of LLVM intrinsics

2d6e98f

Implement AVX512BW 16-bit shift by immediate (slli_epi16) with `simd_shl` instead of LLVM intrinsics

9be2a21

Implement AVX512BW 16-bit shift by immediate (srli_epi16) with `simd_shr` instead of LLVM intrinsics

102bbdc

Implement AVX512BW 16-bit shift by immediate (srai_epi16) with `simd_shr` instead of LLVM intrinsics

2c665fe

rustbot assigned Amanieu Aug 29, 2023

Amanieu reviewed Aug 29, 2023

Fix _mm_srli_epi64

6818ed8

Amanieu merged commit ff07f35 into rust-lang:master Aug 30, 2023
26 checks passed

eduardosm mentioned this pull request Sep 5, 2023

Update stdarch submodule and remove special handling in cranelift codegen for some AVX and SSE2 LLVM intrinsics rust-lang/rust#115580

Merged

bjorn3 pushed a commit to rust-lang/rustc_codegen_cranelift that referenced this pull request Sep 7, 2023

Remove special handling in codegen for some SSE2 "storeu" intrinsics

9f562f2

Those were removed from stdarch in rust-lang/stdarch#1463 (`<*mut _>::write_unaligned` is used instead)

eduardosm deleted the x86-intrinsics branch September 13, 2023 16:43
