+++
title = "OpenCL.jl 0.10: Now with native Julia kernels"
author = "Tim Besard"
abstract = """
Version 0.10 of OpenCL.jl is a significant release that adds support for native Julia
kernels. This necessitated a major overhaul of the package's internals, bringing the
package in line with modern Julia GPU programming practices."""
+++

{{abstract}}


## Native Julia kernels

The highlight of this release is the addition of **a compiler that makes it possible to
write OpenCL kernels in Julia** instead of having to use OpenCL C and its accompanying
string-based APIs. Let's illustrate with the typical `vadd` vector-addition example,
which starts by generating some data and uploading it to the GPU:

```julia
using OpenCL

dims = (2,)
a = round.(rand(Float32, dims) * 100)
b = round.(rand(Float32, dims) * 100)
c = similar(a)

d_a = CLArray(a)
d_b = CLArray(b)
d_c = CLArray(c)
```

The traditional way to write a kernel is to embed OpenCL C code in a string, which is then
compiled and executed on the GPU:

```julia
const source = """
__kernel void vadd(__global const float *a,
__global const float *b,
__global float *c) {
int i = get_global_id(0);
c[i] = a[i] + b[i];
}"""

prog = cl.Program(; source) |> cl.build!
kern = cl.Kernel(prog, "vadd")

len = prod(dims)
clcall(kern, Tuple{Ptr{Float32}, Ptr{Float32}, Ptr{Float32}},
d_a, d_b, d_c; global_size=(len,))
```

With the new GPUCompiler.jl-based compiler, you can now write the kernel in Julia just like
with our other back-ends:

```julia
function vadd(a, b, c)
i = get_global_id()
@inbounds c[i] = a[i] + b[i]
return
end

len = prod(dims)
@opencl global_size=len vadd(d_a, d_b, d_c)
```

This is of course a much more natural way to write kernels, and it also allows OpenCL.jl
to be plugged into the rest of the JuliaGPU ecosystem. Concretely, OpenCL.jl now implements
the GPUArrays.jl interface, enabling lots of vendor-neutral functionality, and also provides
a KernelAbstractions.jl back-end for use with the many libraries that build on top of
KernelAbstractions.jl.
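
For example, array operations provided by GPUArrays.jl now work on `CLArray` without
writing any kernels, and KernelAbstractions.jl kernels can target OpenCL directly. A
minimal sketch, reusing the arrays from above; note that the `OpenCLBackend` constructor
name is an assumption about the KernelAbstractions.jl integration:

```julia
using OpenCL, KernelAbstractions

# vendor-neutral functionality from GPUArrays.jl: broadcasting and
# reductions compile down to native kernels
d_c .= d_a .+ d_b
@assert sum(d_c) ≈ sum(a) + sum(b)

# a KernelAbstractions.jl kernel, portable across JuliaGPU back-ends
@kernel function vadd_ka(a, b, c)
    i = @index(Global)
    @inbounds c[i] = a[i] + b[i]
end

backend = OpenCLBackend()  # assumption: the back-end object exported by OpenCL.jl
vadd_ka(backend)(d_a, d_b, d_c; ndrange=length(d_c))
```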

There is no free lunch, though, and **the native compiler functionality currently relies on
your OpenCL driver supporting SPIR-V**. This is sadly not a common feature: neither
NVIDIA's nor AMD's OpenCL drivers support it, only Intel's does. But if you are stuck with
a driver that does not support SPIR-V, there is still hope: SPIR-V can be compiled back to
OpenCL C using [Google's `clspv`](https://github.com/google/clspv). If you are interested,
check out [this issue](https://github.com/JuliaGPU/OpenCL.jl/issues/234) and feel free to
reach out.
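
To find out whether your driver qualifies, you can inspect the device's extension list. A
hedged sketch: the `extensions` property name is an assumption here, mirroring the
underlying `CL_DEVICE_EXTENSIONS` query:

```julia
# hypothetical capability check; the `extensions` property name is an
# assumption based on the CL_DEVICE_EXTENSIONS device query
dev = cl.device()
if occursin("cl_khr_il_program", string(dev.extensions))
    @info "Device accepts SPIR-V; native Julia kernels should work" dev.name
end
```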


## Breaking API changes

Existing users of OpenCL.jl will of course have noticed that even the string-based example
above uses a different API than before. In order to support the new compiler, and bring
OpenCL.jl in line with modern Julia programming practices, we have **significantly
overhauled the package's internals as well as some external APIs**.

The most significant high-level changes include:

- Memory management is now done using `CLArray`, backed by Shared Virtual Memory (SVM),
  instead of opaque buffers. Raw buffers are still supported, but they are not compatible
  with native kernel execution (because they cannot be converted to a pointer).
- Kernels are called using the new `clcall` function, which performs automatic conversion
  of arguments, much like `ccall` does; see the sketch below.
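
A small illustration of this model, reusing the `kern` kernel object and device arrays
from the string-based example above:

```julia
# SVM-backed memory management: uploads on construction,
# downloads via Array()
d = CLArray(rand(Float32, 4))
h = Array(d)

# clcall converts arguments to the declared types, much like ccall
clcall(kern, Tuple{Ptr{Float32}, Ptr{Float32}, Ptr{Float32}},
       d_a, d_b, d_c; global_size=(length(d_c),))
```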

At the lower level, in the `cl` submodule, the changes are more extensive:

- Context, device and queue arguments have been removed from most APIs, and are now stored
  in task-local storage. These values can be queried (`cl.platform()`, `cl.device()`, etc.)
  and set (`cl.platform!(platform)`, `cl.device!(device)`, etc.) as needed; see the sketch
  after this list.
- As part of the above change, questionable APIs like `cl.create_some_context()` and
  `cl.devices()` have been removed.
- The `Buffer` API has been completely reworked. It now only provides low-level
  functionality, such as `unsafe_copyto!` or `unsafe_map!`, while high-level functionality
  like `copy!` is implemented for the `CLArray` type.
- The `cl.info` method, and the `getindex` overloading to access properties of OpenCL
  objects, have been replaced by `getproperty` overloading on the objects themselves
  (e.g., `cl.info(dev, :name)` and `dev[:name]` are now simply `dev.name`).
- The blocking `cl.launch` has been replaced by the nonblocking `cl.call`, and the
  `getindex`-overloading shorthand has been removed. However, it is recommended to use the
  newly added `cl.clcall` function, which takes an additional tuple type argument and
  performs automatic conversion of arguments to those types. This makes it possible, for
  example, to pass a `CLArray` to an OpenCL C function expecting buffer-backed pointers.
- Implicit argument conversion has been removed; users should make sure that the Julia
  arguments passed to kernels match the OpenCL argument types (e.g., no empty types, and
  4-element tuples for 3-element `float3` arguments).
- The `to_host` function has been replaced by simply calling `Array` on the `CLArray`.
- Queue and execution capabilities of a device must now be queried using the dedicated
  functions `cl.queue_properties` and `cl.exec_capabilities`.
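
Put together, a short session with the reworked `cl` submodule might look as follows (a
sketch using only the calls mentioned above; exact return values depend on your platform):

```julia
using OpenCL

# task-local state: query the current platform and device
plat = cl.platform()
dev  = cl.device()

# properties are plain getproperty calls instead of cl.info/getindex
@show plat.name dev.name

# switch the current task to a (possibly different) device
cl.device!(dev)

# capabilities via the new dedicated functions
@show cl.queue_properties(dev) cl.exec_capabilities(dev)
```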

Working towards the first stable version of this package, we anticipate having to make even
more breaking changes. However, we wanted to get the current changes out there to gather
feedback from the community. If some of the removed functionality is crucial to your
workflow, feel free to reach out so we can discuss how best to support it in the future.


## JLL-based OpenCL drivers

Another significant change is the **integration with OpenCL drivers built and provided
using Julia's BinaryBuilder infrastructure**. Over time, this should simplify the
installation of OpenCL drivers by avoiding the need for system-wide driver installs. For
now, the only driver provided as a JLL is a CPU driver based on the [Portable Computing
Language (PoCL) library](https://portablecl.org/). This driver can be used by simply
installing and loading `pocl_jll` before you start using OpenCL.jl:

```julia-repl
julia> using OpenCL, pocl_jll

julia> OpenCL.versioninfo()
OpenCL.jl version 0.10.0

Toolchain:
- Julia v1.11.2
- OpenCL_jll v2024.5.8+1

Available platforms: 1
- Portable Computing Language
OpenCL 3.0, PoCL 6.0 Apple, Release, RELOC, SPIR-V, LLVM 16.0.6jl, SLEEF, DISTRO, POCL_DEBUG
· cpu (fp16, fp64, il)
```

Notice the `il` capability reported by `OpenCL.versioninfo()`, indicating that PoCL can
ingest SPIR-V and can thus be used with the new native Julia kernel compiler. In fact, this
is one of the goals of reworking OpenCL.jl: to provide a CPU fallback implementation for
use with Julia GPU libraries.
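
Concretely, with `pocl_jll` loaded, the native `vadd` kernel from the beginning of this
post runs unchanged on the CPU. A self-contained sketch, assuming PoCL is the platform in
use (as in the output above):

```julia
using OpenCL, pocl_jll

# the same native Julia kernel as before, now executing on the CPU
function vadd(a, b, c)
    i = get_global_id()
    @inbounds c[i] = a[i] + b[i]
    return
end

a, b = rand(Float32, 8), rand(Float32, 8)
d_a, d_b, d_c = CLArray(a), CLArray(b), CLArray(similar(a))

@opencl global_size=length(a) vadd(d_a, d_b, d_c)
@assert Array(d_c) ≈ a .+ b
```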


## Work towards OpenCL.jl 1.0

This release is a significant step towards a stable 1.0 release of OpenCL.jl, bringing the
package in line with our other Julia GPU back-ends. Our focus is on improving OpenCL.jl in
order to support a CPU fallback back-end for KernelAbstractions.jl based on PoCL. If you
are a user of OpenCL.jl, or are interested in using the package in the future, please test
this release with your application and/or driver, and provide feedback on the changes
we've made. Pull requests are greatly appreciated, and we are happy to help you get started
with contributing to the package.