Motivation
From a lab toy, ML has found its way into day-to-day use and is now integrated into numerous web applications.
To unlock the full potential of AI-augmented applications, several initiatives have recently come to the Web platform. The WebGPU proposal lets GPU-equipped machines perform better on the AI front.
Half-precision floating point is a common choice for ML workloads because it offers better memory bandwidth and performance, and the reduced precision matters little there. The JS Float16Array proposal improves integration between JS and the WebGPU API. The Wasm Memory control proposal aims to make GPU <-> Wasm interaction more efficient by reducing memory traffic.
Modern hardware also brings more native FP16 support: ARMv8-A NEON FP16 and x86 F16C, for example.
I believe that introducing native support for half-precision floating point computation to WebAssembly will extend what can be achieved in this area, matching and complementing the trends on the hardware side.
Potential solutions
Second-class support
We can mimic the JS approach and introduce only two memory instructions for reading and writing f32 values in binary16 format.
f32.load_f16: [i32] -> [f32]
f32.store_f16: [i32, f32] -> []
It is easy for a VM to implement, but it only makes communication with memory regions shared with the GPU more efficient.
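To make the intended semantics concrete, here is a minimal C sketch (an illustration, not normative; it assumes a compiler with the _Float16 type): a 16-bit load widened exactly to f32, and an f32 narrowed to binary16 on store.

#include <stdint.h>
#include <string.h>

/* Sketch of f32.load_f16: read 16 bits from linear memory and widen
   the binary16 value to f32 (the widening is exact). */
static float load_f16(const uint8_t *mem, uint32_t addr) {
    _Float16 h;
    memcpy(&h, mem + addr, sizeof h);
    return (float)h;
}

/* Sketch of f32.store_f16: narrow the f32 value to binary16
   (round-to-nearest-even) and write the 16-bit pattern back. */
static void store_f16(uint8_t *mem, uint32_t addr, float v) {
    _Float16 h = (_Float16)v;
    memcpy(mem + addr, &h, sizeof h);
}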
First-class support
For full-scale support, I suggest referring to a dedicated explainer for more details. Briefly, it adds:
Instructions for scalar arithmetic operations over f16, on parity with the f32 instruction set.
Vector instructions for the f16x8 shape.
Despite being a more invasive change, it unblocks not only better interaction with GPU-originated memory but could also provide a fallback on devices without a GPU available for web use. It could also serve smaller ML models: text processing, context inference, etc.
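As an illustration of the kind of kernel first-class support targets, here is a hedged C sketch (the function name and shapes are made up for this example; _Float16 availability depends on compiler and target): with scalar f16 arithmetic and an f16x8 SIMD shape, a loop like this could stay in half precision end to end instead of widening every element to f32.

#include <stddef.h>

/* Hypothetical axpy kernel over half-precision data. With first-class
   f16 support an engine could lower the loop body to f16x8 lanes;
   without it, each element must be widened to f32 and narrowed back. */
void axpy_f16(size_t n, _Float16 a, const _Float16 *x, _Float16 *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}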
Conclusion
I believe the second, first-class support approach is more beneficial for the ecosystem.
Everything said above also applies to non-ML graphics applications.
People pursuing this may wish to follow along at tc39/proposal-float16array#12: x86 prior to Sapphire Rapids does not have a native way to do casts from float64 to float16, which means it would need to be done in software (though it can probably be done fairly cheaply, depending on your definition of "cheaply").
Not relevant if there's only casts from f32, but do note that f64 -> f32 -> f16 gives different results than f64 -> f16 (because of rounding), so it may make sense to have both, particularly as languages like C, C++ and Swift add native support for f16 and have casts from f64 -> f16.
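For illustration, a small C example of that double-rounding effect (the specific value is chosen for the example: it sits just above the midpoint between two adjacent f16 values, and the intermediate f32 rounding lands exactly on that midpoint):

#include <stdio.h>

int main(void) {
    /* 2^-40 above the midpoint between 1.0 and the next f16 value (1 + 2^-10). */
    double x = 1.0 + 0x1p-11 + 0x1p-40;

    _Float16 direct  = (_Float16)x;         /* rounds up: 1.0009765625        */
    _Float16 via_f32 = (_Float16)(float)x;  /* f32 rounds down to the exact
                                               midpoint, then ties-to-even
                                               gives 1.0                      */
    printf("direct=%g via_f32=%g\n", (double)direct, (double)via_f32);
    return 0;
}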