Co-authored-by: Tim Besard <[email protected]>
1 parent
4bf8645
commit af12482
Showing 1 changed file with 85 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
+++ | ||
title = "Metal.jl 1.4: Improved random numbers" | ||
author = "Christian Guinard" | ||
abstract = """ | ||
Metal.jl 1.4 adds higher-quality random number generators from the Metal Performance | ||
Shaders library. Some limitations apply, with a fallback to the current implementation | ||
in those situations.""" | ||
+++ | ||
{{abstract}} | ||
|
||
|
||
## `Metal.rand` and friends | ||
|
||
Using functionality provided by the Metal Performance Shaders (MPS) library, Metal.jl now | ||
comes with much improved GPU random number generators. Uniform distributions using | ||
`Metal.rand` (and its in-place variant `Metal.rand!`) are available for all Metal-supported | ||
integer types and `Float32`. However, due to [Metal API | ||
limitations](https://developer.apple.com/documentation/metal/mtlblitcommandencoder/1400767-copyfrombuffer?language=objc), | ||
8-bit and 16-bit integers may fall back to the lower-quality GPUArrays.jl random number | ||
generator if the array's size in bytes is not a multiple of 4. Normally distributed `Float32` | ||
values can be generated with `Metal.randn` and `Metal.randn!`, while `Float16` is not | ||
supported by the MPS library and will always fall back to the GPUArrays implementation. | ||
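
The size rule above can be sketched in plain Julia. The helper below is hypothetical and not part of Metal.jl; it only restates the multiple-of-4 condition from the text, assuming that "size in bytes" refers to the total byte size of the array being filled. It does not reproduce Metal.jl's actual dispatch logic, which may consider other conditions as well.

```julia
# Hypothetical helper (not a Metal.jl function): checks whether an array of
# element type T with the given dimensions has a byte size divisible by 4.
# Assumption: "size in bytes" means the total byte size of the array.
function size_is_multiple_of_4(::Type{T}, dims::Integer...) where {T}
    nbytes = sizeof(T) * prod(dims)
    return (nbytes % 4) == 0
end

size_is_multiple_of_4(UInt8, 6)     # false: 6 bytes
size_is_multiple_of_4(UInt8, 8)     # true: 8 bytes
size_is_multiple_of_4(Int16, 3)     # false: 6 bytes
size_is_multiple_of_4(Int16, 2, 2)  # true: 8 bytes
size_is_multiple_of_4(Int32, 5)     # true: 4-byte elements always qualify
```

Under this reading, only arrays of 8-bit and 16-bit integers can fail the check, which matches the types singled out in the paragraph above.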
|
||
The easiest way to use these generators is through the Metal convenience functions | ||
`Metal.rand[n][!]`, which are called like the usual functions from the Random.jl standard library: | ||
|
||
```julia-repl | ||
julia> a = Metal.rand(Float32, 2) | ||
2-element MtlVector{Float32, Metal.PrivateStorage}: | ||
0.95755994 | ||
0.7110207 | ||
julia> Metal.randn!(a) | ||
2-element MtlVector{Float32, Metal.PrivateStorage}: | ||
1.7230463 | ||
0.55636907 | ||
``` | ||
|
||
The Random.jl methods can also be used directly by passing an `RNG`, obtained from either | ||
`MPS.default_rng()` or `MPS.RNG()`, to the standard `Random.rand[n][!]` functions: | ||
|
||
|
||
```julia-repl | ||
julia> using Random | ||
julia> rng = MPS.RNG(); | ||
julia> Random.rand(rng, 2) | ||
2-element MtlVector{Float32, Metal.PrivateStorage}: | ||
0.8941469 | ||
0.67628527 | ||
``` | ||
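
As a further sketch, the in-place variants can be used the same way with the shared generator returned by `MPS.default_rng()`. This assumes a machine with a Metal-capable GPU and that `MtlArray{Float32}(undef, 4)` is an accepted way to allocate the destination array; no output is shown because the values are random.

```julia
using Metal, Random

rng = MPS.default_rng()            # MPS-backed generator named in the text
a = MtlArray{Float32}(undef, 4)    # uninitialized GPU vector (assumed constructor form)

Random.rand!(rng, a)     # fill `a` in place with uniformly distributed values
Random.randn!(rng, a)    # overwrite `a` in place with normally distributed values
```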
|
||
Seeding is done by calling `Metal.seed!` for the global RNG, or `Random.seed!` when working | ||
with an explicit `RNG` object. | ||
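
A minimal sketch of both seeding paths follows. It assumes that `Metal.seed!` and `Random.seed!` each accept an integer seed, and the seed value `1234` is arbitrary. Whether the two draws from the reseeded generator match is the behaviour one would expect from seeding, but it is not verified here.

```julia
using Metal, Random

# Global RNG: seed it, then draw with the convenience function.
Metal.seed!(1234)                  # assumed to accept an integer seed
x = Array(Metal.rand(Float32, 4))  # copy the result to the CPU

# Explicit RNG object: seed it through Random.seed!.
rng = MPS.RNG()
Random.seed!(rng, 1234)
y1 = Array(Random.rand(rng, 4))
Random.seed!(rng, 1234)
y2 = Array(Random.rand(rng, 4))

# If reseeding resets the stream, y1 and y2 should be equal (not verified here).
y1 == y2
```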
|
||
|
||
## Other improvements since the last blog post | ||
|
||
- Since v0.5: `MtlArray` storage mode has been parameterized, allowing one to create a | ||
shared storage `MtlArray` by calling `MtlArray{eltype, ndims, Metal.SharedStorage}(...)`. | ||
- Since v0.3: MPS-accelerated decompositions were added. | ||
- Various performance improvements. | ||
- *Many* bug fixes. | ||
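
To illustrate the parameterized storage mode from the first item, here is a short sketch. It assumes the `(undef, dims)` constructor form is accepted for the fully parameterized type and that a Metal-capable GPU is available.

```julia
using Metal

# Allocate a 4-element Float32 vector in shared storage
# (the `undef` constructor form is an assumption, not taken from the post).
a = MtlArray{Float32, 1, Metal.SharedStorage}(undef, 4)
fill!(a, 1f0)    # initialize every element to 1.0f0

typeof(a)        # the storage mode appears as the third type parameter
```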
|
||
|
||
## Future work | ||
|
||
Although Metal.jl is now in v1, there is still work to be done to make it as fast and | ||
feature-complete as possible. In particular: | ||
|
||
- Metal.jl is now using native ObjectiveC FFI for wrapping Metal APIs. However, these | ||
wrappers have to be written manually for every piece of Objective-C code. *We are looking | ||
for help with improving Clang.jl and ObjectiveC.jl* to [enable the automatic generation of | ||
these wrappers](https://github.com/JuliaInterop/ObjectiveC.jl/issues/41); | ||
- The MPS wrappers are incomplete, and automatic wrapper generation would greatly help with full | ||
MPS support; | ||
- To implement a full-featured KernelAbstractions.jl back-end, Metal atomic operations need | ||
to [be hooked up to Atomix](https://github.com/JuliaGPU/Metal.jl/issues/218); | ||
- [Full support for BFloat16 values](https://github.com/JuliaGPU/Metal.jl/issues/298), which | ||
have been supported since Metal 3.1 (macOS 14), is not yet available in Metal.jl. There is, | ||
however, a [draft PR](https://github.com/JuliaGPU/Metal.jl/pull/446) in the works. Check | ||
it out if you're interested in helping out; | ||
- Some functionality present in CUDA.jl [could be ported to Metal.jl to improve | ||
usability](https://github.com/JuliaGPU/Metal.jl/issues/443). |