1 parent 4bf8645, commit b4f8595
Showing 1 changed file with 57 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
+++ | ||
title = "Metal.jl 1.4: Metal.rand" | ||
author = "Christian Guinard" | ||
abstract = """ | ||
Metal.jl 1.4 adds higher-quality on-device random number generation based on Metal Performance | |
Shaders. Some limitations apply; in those cases, Metal.jl falls back to the previously existing | |
rand implementation.""" | |
+++ | ||
{{abstract}} | ||
|
||
## Metal.rand and friends | ||
|
||
Metal.jl now provides improved on-GPU random number generation, built on functionality from | |
Metal Performance Shaders (MPS). Uniform distributions using `Metal.rand` (and its in-place | |
variant `Metal.rand!`) are available for all Metal-supported integer types and Float32. | |
However, due to Metal [API](https://developer.apple.com/documentation/metal/mtlblitcommandencoder/1400767-copyfrombuffer?language=objc) | |
limitations, arrays of 8-bit and 16-bit integers may fall back to the lower-quality GPUArrays.jl | |
random numbers if the size of the array in bytes is not a multiple of 4. Normally distributed | |
Float32 values can be generated with `Metal.randn` and `Metal.randn!`. Float16 is not supported | |
by the Metal Performance Shaders RNG, and will always fall back to the GPUArrays.jl implementation. | |
|
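As a rough illustration of the functions named above, the following sketch generates uniform and normal values on the GPU. It assumes a machine with a working Metal.jl 1.4 installation; the array sizes and variable names are arbitrary choices for this example, and the fallback described in the comments is taken from the text above rather than verified here.

```julia
using Metal

# Uniformly distributed Float32 values, generated on the GPU.
a = Metal.rand(Float32, 1024)

# In-place variant: fill an existing device array with uniform integers.
b = MtlArray{Int32}(undef, 256)
Metal.rand!(b)

# Normally distributed Float32 values.
c = Metal.randn(Float32, 32, 32)

# 7 elements of 1 byte each is not a multiple of 4 bytes, so according to the
# limitation described above this call may use the GPUArrays.jl fallback.
d = Metal.rand(UInt8, 7)
```

All four results are `MtlArray`s that live on the device; `Array(a)` copies one back to the CPU for inspection.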
||
The easiest way to use these is through the Metal convenience functions `Metal.rand[n][!]`, | |
which work like the usual `rand` functions. Alternatively, the regular Random.jl methods can | |
be used by passing an appropriate `RNG`, obtained from either `MPS.default_rng()` or | |
`MPS.RNG()`, to the standard `Random.rand[n][!]` functions. | |
|
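A minimal sketch of the second approach, assuming the `MPS` module is reachable as `Metal.MPS` and that the in-place `Random.rand!` and `Random.randn!` methods accept the MPS generator together with a `Float32` device array (the array shape is an arbitrary choice for this example):

```julia
using Metal, Random

# Obtain the default MPS-backed generator (MPS.RNG() would create a new one).
rng = Metal.MPS.default_rng()

# A Float32 device array to be filled in place.
A = MtlArray{Float32}(undef, 16, 16)

# Standard Random.jl entry points, dispatched on the MPS generator.
Random.rand!(rng, A)   # uniform values
Random.randn!(rng, A)  # overwrite with normally distributed values
```

Passing the generator explicitly is mainly useful for generic code that already takes an `rng` argument; for direct use, the `Metal.rand[n][!]` convenience functions are shorter.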
||
|
||
|
||
## Other improvements since the last blog post | ||
|
||
- Since v0.5: `MtlArray` storage mode has been parameterized, allowing one to create a shared storage `MtlArray` | ||
by calling `MtlArray{eltype, ndims, Metal.SharedStorage}(...)`. | ||
- Since v0.3: MPS-accelerated decompositions were added. | ||
- Various performance improvements. | |
- *Many* bug fixes. | ||
|
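To make the storage-mode parameter from the first item concrete, here is a small sketch. It assumes that the `undef` constructor accepts a tuple of dimensions for the fully parameterized type; the element type, length, and variable names are illustrative only.

```julia
using Metal

# Allocate a 16-element Float32 vector in shared storage, following the
# MtlArray{eltype, ndims, Metal.SharedStorage}(...) form described above.
a = MtlArray{Float32, 1, Metal.SharedStorage}(undef, (16,))

# Initialize it on the device, then copy the contents to a CPU Array.
fill!(a, 1f0)
h = Array(a)
```

Shared storage places the buffer in memory that both the CPU and the GPU can access, which is the reason one might pick it over the default storage mode.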
||
|
||
## Future work | ||
|
||
Although Metal.jl is now in v1, there is still work to be done to make it as fast and | |
feature-complete as possible. In particular: | |
|
||
- since the last blog post, Metal.jl has started using native ObjectiveC FFI for wrapping | ||
Metal APIs. However, these wrappers have to be written manually for every piece of | ||
Objective-C code. We are looking for help improving Clang.jl and ObjectiveC.jl to enable | ||
the automatic generation of these wrappers. See tracking [issue](https://github.com/JuliaInterop/ObjectiveC.jl/issues/41); | ||
- the MPS wrappers are incomplete, and automatic wrapper generation would greatly help with | |
achieving full MPS support; | |
- support for atomic operations is missing, which is required to implement a full-featured | ||
KernelAbstractions.jl back-end. See tracking [issue](https://github.com/JuliaGPU/Metal.jl/issues/218); | ||
- full support for BFloat16 values, which Metal has supported since Metal 3.1 (macOS 14), | |
is not yet available in Metal.jl. See tracking [issue](https://github.com/JuliaGPU/Metal.jl/issues/298); | |
- some functionality present in CUDA.jl could be ported to Metal.jl to improve usability. | ||
See tracking [issue](https://github.com/JuliaGPU/Metal.jl/issues/443); | ||
- general performance improvements. In particular, improvements to the ObjectiveC.jl type model | ||
could greatly reduce the number of allocations currently necessary for every | ||
Objective-C/Metal operation. See tracking [issue](https://github.com/JuliaInterop/ObjectiveC.jl/issues/13). |