Portable math functions don't get inlined #523
Comments
Looks like #449 is not limited to PowerPC. |
26edfc2 might help (extend it to the other instruction sets) |
This (https://godbolt.org/g/SQnNfN):
#![feature(stdsimd)]
use std::simd::f32x2;
pub fn sin(a: f32x2) -> f32x2 {
    a.sin()
}
produces the following LLVM IR:
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
define void @_ZN7example3sin17h9c036eb2ab1eadaaE(<2 x float>* noalias nocapture sret dereferenceable(8), <2 x float>* noalias nocapture readonly dereferenceable(8) %a) unnamed_addr #0 {
%arg.i = alloca <2 x float>, align 8
%1 = bitcast <2 x float>* %a to i64*
%2 = load i64, i64* %1, align 8
%3 = bitcast <2 x float>* %arg.i to i8*
call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %3)
%4 = bitcast <2 x float>* %arg.i to i64*
store i64 %2, i64* %4, align 8, !noalias !0
call void @"_ZN97_$LT$core..coresimd..ppsv..v64..f32x2$u20$as$u20$core..coresimd..ppsv..codegen..sin..FloatSin$GT$3sin17hae53fde8f08ed2edE"(<2 x float>* noalias nocapture nonnull sret dereferenceable(8) %0, <2 x float>* noalias nocapture nonnull dereferenceable(8) %arg.i) #2, !noalias !4
call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %3)
ret void
}
declare void @"_ZN97_$LT$core..coresimd..ppsv..v64..f32x2$u20$as$u20$core..coresimd..ppsv..codegen..sin..FloatSin$GT$3sin17hae53fde8f08ed2edE"(<2 x float>* noalias nocapture sret dereferenceable(8), <2 x float>* noalias nocapture dereferenceable(8)) unnamed_addr #0
declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
attributes #0 = { nounwind "probe-stack"="__rust_probestack" }
attributes #1 = { argmemonly nounwind }
attributes #2 = { nounwind }
!0 = !{!1, !3}
!1 = distinct !{!1, !2, !"_ZN4core8coresimd4ppsv3v645f32x23sin17h532ec7bf70ae1d2dE: argument 0"}
!2 = distinct !{!2, !"_ZN4core8coresimd4ppsv3v645f32x23sin17h532ec7bf70ae1d2dE"}
!3 = distinct !{!3, !2, !"_ZN4core8coresimd4ppsv3v645f32x23sin17h532ec7bf70ae1d2dE: %self"}
!4 = !{!3} |
cc @rkruppe @alexcrichton any idea why this might not be getting inlined? |
Probably because we are missing |
@CryZe can you try to reproduce with https://github.com/gnzlbg/ppv upstream? The problem should be fixed there. |
FWIW, there is a new clippy lint (https://rust-lang-nursery.github.io/rust-clippy/master/index.html#missing_inline_in_public_items) that detects whether a function, trait method impl, etc. is not marked with |
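For illustration, here is a minimal sketch of the pattern that lint looks for, written for stable Rust. `F32x2` and this `FloatSin` trait are hypothetical stand-ins, not the real stdsimd types, and the body is a plain per-lane `sin`, not the actual implementation. The point is only that a non-generic trait method impl in a library crate is normally not inlined into downstream crates unless it carries `#[inline]` (or LTO is enabled), which would match the out-of-line `FloatSin::sin` call visible in the IR above.

```rust
// Hypothetical stand-in for a two-lane portable vector type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x2(pub [f32; 2]);

// Hypothetical trait with the same shape as a per-type math trait.
pub trait FloatSin {
    fn sin(self) -> Self;
}

impl FloatSin for F32x2 {
    // Without this attribute, downstream crates would usually see only an
    // opaque call to this method (assuming no LTO).
    #[inline]
    fn sin(self) -> Self {
        F32x2([self.0[0].sin(), self.0[1].sin()])
    }
}

impl F32x2 {
    // The public wrapper also needs `#[inline]`, otherwise the call chain
    // is still broken at the crate boundary.
    #[inline]
    pub fn sin(self) -> Self {
        // Call the trait method explicitly to avoid recursing into this
        // inherent method.
        FloatSin::sin(self)
    }
}
```

Both the trait impl and the public wrapper are annotated here, since either one missing the attribute would be enough to leave a call in the caller's IR.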
I noticed the same thing looking into some Currently That leads to the paradoxical situation that scalar math (e.g.,
While explicitly vectorized code
|
This has been fixed upstream. |
Gets compiled to this:
Godbolt
I don't think this is intended.