Motivation
From a lab toy, ML has found its way into day-to-day use and is now integrated into numerous web applications.
To unlock the full potential of AI-augmented applications, several initiatives have recently come to the Web platform. The WebGPU proposal lets GPU-equipped machines perform better on the AI front.
Half-precision floating point is a common choice for ML workloads because it offers better memory bandwidth and performance, and the reduced precision matters little there. The JS Float16Array proposal improves integration between JS and the WebGPU API. The Wasm Memory control proposal aims to make GPU <-> Wasm interaction more efficient by reducing memory traffic.
Modern hardware also brings more native FP16 support: ARMv8-A NEON FP16 and x86 F16C, for example.
I believe that introducing native support for half-precision floating point computation to WebAssembly will extend what can be achieved in this area, matching and complementing the trends on the hardware side.
Potential solutions
Second-class support
We can mimic the JS approach and introduce only two memory instructions for reading and writing f32 values in binary16 format.
f32.load_f16: [i32] -> [f32]
f32.store_f16: [i32, f32] -> []
It is easy for a VM to implement, but it only makes communication with memory regions shared with the GPU more efficient.
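To make the intended semantics concrete, here is a minimal C sketch (an illustration, not normative; it assumes a compiler with the _Float16 type): a 16-bit load widened exactly to f32, and an f32 narrowed to binary16 on store.

#include <stdint.h>
#include <string.h>

/* Sketch of f32.load_f16: read 16 bits from linear memory and widen
   the binary16 value to f32 (the widening is exact). */
static float load_f16(const uint8_t *mem, uint32_t addr) {
    _Float16 h;
    memcpy(&h, mem + addr, sizeof h);
    return (float)h;
}

/* Sketch of f32.store_f16: narrow the f32 value to binary16
   (round-to-nearest-even) and write the 16-bit pattern back. */
static void store_f16(uint8_t *mem, uint32_t addr, float v) {
    _Float16 h = (_Float16)v;
    memcpy(mem + addr, &h, sizeof h);
}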
First-class support
For full-scale support, I suggest referring to a dedicated explainer for more details. Briefly, it adds:
Instructions for scalar arithmetic operations over f16, on parity with the f32 instruction set.
Vector instructions for the f16x8 shape.
Despite being a more invasive change, it unblocks not only better interaction with GPU-originated memory but could also provide a fallback on devices without a GPU available for web use. It could also serve smaller ML models: text processing, context inference, etc.
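As an illustration of the kind of kernel first-class support targets, here is a hedged C sketch (the function name and shapes are made up for this example; _Float16 availability depends on compiler and target): with scalar f16 arithmetic and an f16x8 SIMD shape, a loop like this could stay in half precision end to end instead of widening every element to f32.

#include <stddef.h>

/* Hypothetical axpy kernel over half-precision data. With first-class
   f16 support an engine could lower the loop body to f16x8 lanes;
   without it, each element must be widened to f32 and narrowed back. */
void axpy_f16(size_t n, _Float16 a, const _Float16 *x, _Float16 *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}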
Conclusion
I believe the second, first-class support approach is more beneficial for the ecosystem.
Everything said above also applies to non-ML graphics applications.
People pursuing this may wish to follow along at tc39/proposal-float16array#12: x86 prior to Sapphire Rapids does not have a native way to do casts from float64 to float16, which means it would need to be done in software (though it can probably be done fairly cheaply, depending on your definition of "cheaply").
Not relevant if there's only casts from f32, but do note that f64 -> f32 -> f16 gives different results than f64 -> f16 (because of rounding), so it may make sense to have both, particularly as languages like C, C++ and Swift add native support for f16 and have casts from f64 -> f16.
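For illustration, a small C example of that double-rounding effect (the specific value is chosen for the example: it sits just above the midpoint between two adjacent f16 values, and the intermediate f32 rounding lands exactly on that midpoint):

#include <stdio.h>

int main(void) {
    /* 2^-40 above the midpoint between 1.0 and the next f16 value (1 + 2^-10). */
    double x = 1.0 + 0x1p-11 + 0x1p-40;

    _Float16 direct  = (_Float16)x;         /* rounds up: 1.0009765625        */
    _Float16 via_f32 = (_Float16)(float)x;  /* f32 rounds down to the exact
                                               midpoint, then ties-to-even
                                               gives 1.0                      */
    printf("direct=%g via_f32=%g\n", (double)direct, (double)via_f32);
    return 0;
}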