Instantiating copies of all operators for all supported data types increases the binary size and compile time of the library. For example, I'm currently prototyping f16 support, and a naive implementation increased the size of the rten CLI tool by 300 KB (11%).
For operators which merely move or copy data, such as `Transpose`, the generated code only needs to differ based on the size of the elements. For example, a single instantiation could handle `i32`, `u32` and `f32`.
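A minimal sketch of the idea, assuming a hypothetical `Pod32` marker trait for 4-byte types with no invalid bit patterns (the `bytemuck` crate's `cast_slice` / `cast_slice_mut` offer a checked version of the same cast). The transpose loop is compiled once for `u32`; the generic wrapper is still instantiated per type, but each copy only contains the slice casts:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical marker trait: 4-byte types for which every bit pattern is a
/// valid value, so a `&[T]` can be viewed as a `&[u32]` and vice versa.
unsafe trait Pod32: Copy {}
unsafe impl Pod32 for u32 {}
unsafe impl Pod32 for i32 {}
unsafe impl Pod32 for f32 {}

/// The only copy of the kernel: it moves 32-bit words without looking at
/// what they represent. `src` is `rows x cols` in row-major order and `dst`
/// receives the `cols x rows` transpose.
#[inline(never)]
fn transpose_u32(src: &[u32], dst: &mut [u32], rows: usize, cols: usize) {
    assert_eq!(src.len(), rows * cols);
    assert_eq!(dst.len(), rows * cols);
    for r in 0..rows {
        for c in 0..cols {
            dst[c * rows + r] = src[r * cols + c];
        }
    }
}

/// Thin per-type wrapper: reinterprets the slices and calls the shared kernel.
fn transpose<T: Pod32>(src: &[T], dst: &mut [T], rows: usize, cols: usize) {
    assert_eq!(size_of::<T>(), size_of::<u32>());
    assert_eq!(align_of::<T>(), align_of::<u32>());
    // SAFETY: size and alignment match u32 (checked above), and `Pod32`
    // types have no invalid bit patterns, so both casts are sound.
    let src_bits = unsafe { std::slice::from_raw_parts(src.as_ptr() as *const u32, src.len()) };
    let dst_bits =
        unsafe { std::slice::from_raw_parts_mut(dst.as_mut_ptr() as *mut u32, dst.len()) };
    transpose_u32(src_bits, dst_bits, rows, cols);
}
```

Calling `transpose` on a 2x3 `f32` matrix and a 2x2 `i32` matrix both end up in the same `transpose_u32` body.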
Operators which could use this optimization include:
- Operators which rely only on elements implementing `Copy`. This includes all the layout operations.
- Bitwise operations
- `Sign`, for certain combinations of types where the sign bit is in the same place
- Maybe operations which care about bitwise equality with zero. For floats this is complicated by +0.0 vs -0.0.
- Maybe operations which care about bitwise equality of values. For floats this is complicated by +0.0 vs -0.0, NaN, etc.
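To illustrate the float complications in the last two items, here is a small sketch (the function names are hypothetical) comparing a shared 32-bit bitwise-equality kernel with IEEE 754 equality. For `i32` and `u32` the two agree; for `f32` they disagree on +0.0 vs -0.0 (equal as values, different bits) and on NaN (never equal as values, identical bits when both come from `f32::NAN`):

```rust
/// Hypothetical shared kernel: elementwise equality of raw 32-bit words.
/// For `i32` and `u32` this is exactly value equality.
fn equal_bits(a: &[u32], b: &[u32]) -> Vec<bool> {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x == y).collect()
}

/// IEEE 754 equality for `f32`, which is what an `Equal` operator on floats
/// would be expected to compute.
fn equal_f32(a: &[f32], b: &[f32]) -> Vec<bool> {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x == y).collect()
}

/// Raw bit patterns of `f32` values (copied into a new Vec for clarity).
fn f32_bits(v: &[f32]) -> Vec<u32> {
    v.iter().map(|x| x.to_bits()).collect()
}
```

With `a = [0.0, NaN, 1.5]` and `b = [-0.0, NaN, 1.5]`, `equal_f32` yields `[true, false, true]` while `equal_bits` on the raw bits yields `[false, true, true]`, so the shared kernel cannot be reused for floats without extra handling.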
> Maybe operations which care about bitwise equality with zero. For floats this is complicated by +0.0 vs -0.0.
I suppose we could implement e.g. `NonZero` for a given bit width with shared code, using a function which receives two different zero values as arguments. For types which only have a single zero value, the two arguments would be the same.
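A minimal sketch of that suggestion, assuming a hypothetical `Zeros32` trait that supplies the zero bit patterns for each 4-byte type. It is simplified to return flat indices for a 1-D input rather than the `[rank, count]` index tensor that ONNX `NonZero` produces:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical trait: 4-byte types with no invalid bit patterns, plus the
/// bit patterns that count as zero for the type.
unsafe trait Zeros32: Copy {
    const ZERO_A: u32;
    const ZERO_B: u32;
}
unsafe impl Zeros32 for u32 {
    const ZERO_A: u32 = 0;
    const ZERO_B: u32 = 0; // only one zero, so both patterns are the same
}
unsafe impl Zeros32 for i32 {
    const ZERO_A: u32 = 0;
    const ZERO_B: u32 = 0;
}
unsafe impl Zeros32 for f32 {
    const ZERO_A: u32 = 0x0000_0000; // +0.0
    const ZERO_B: u32 = 0x8000_0000; // -0.0
}

/// Shared kernel: indices of elements that match neither zero pattern.
/// NaN matches neither, so it is treated as non-zero.
#[inline(never)]
fn nonzero_u32(data: &[u32], zero_a: u32, zero_b: u32) -> Vec<usize> {
    data.iter()
        .enumerate()
        .filter(|&(_, &x)| x != zero_a && x != zero_b)
        .map(|(i, _)| i)
        .collect()
}

/// Per-type wrapper: reinterprets the slice and passes the type's zero patterns.
fn nonzero<T: Zeros32>(data: &[T]) -> Vec<usize> {
    assert_eq!(size_of::<T>(), size_of::<u32>());
    assert_eq!(align_of::<T>(), align_of::<u32>());
    // SAFETY: size and alignment checked above; `Zeros32` types have no
    // invalid bit patterns.
    let bits = unsafe { std::slice::from_raw_parts(data.as_ptr() as *const u32, data.len()) };
    nonzero_u32(bits, T::ZERO_A, T::ZERO_B)
}
```

Here `i32`, `u32` and `f32` all share one `nonzero_u32` body; only the constants passed to it differ.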