From 770f8fab598389a99f4d052fb96846fcf8c6c182 Mon Sep 17 00:00:00 2001
From: Tim Besard
Date: Mon, 13 Jan 2025 13:35:59 +0100
Subject: [PATCH] Add blog post on OpenCL.jl 0.10.

---
 post/2025-01-13-opencl_0.10.md | 166 +++++++++++++++++++++++++++++++++
 1 file changed, 166 insertions(+)
 create mode 100644 post/2025-01-13-opencl_0.10.md

diff --git a/post/2025-01-13-opencl_0.10.md b/post/2025-01-13-opencl_0.10.md
new file mode 100644
index 0000000..824fc9b
--- /dev/null
+++ b/post/2025-01-13-opencl_0.10.md
@@ -0,0 +1,166 @@
+++
title = "OpenCL.jl 0.10: Now with native Julia kernels"
author = "Tim Besard"
abstract = """
    Version 0.10 of OpenCL.jl is a significant release that adds support for native Julia
    kernels. This necessitated a major overhaul of the package's internals, bringing the
    package in line with modern Julia GPU programming practices."""
+++

{{abstract}}


## Native Julia kernels

The highlight of this release is the addition of **a compiler that makes it possible to
write OpenCL kernels in Julia** instead of having to use OpenCL C and the accompanying
string-based APIs. Let's illustrate using the typical `vadd` vector-addition example,
which starts by generating some data and uploading it to the GPU:

```julia
using OpenCL

dims = (2,)
a = round.(rand(Float32, dims) * 100)
b = round.(rand(Float32, dims) * 100)
c = similar(a)

d_a = CLArray(a)
d_b = CLArray(b)
d_c = CLArray(c)
```

So far, the typical way to write a kernel has been as a string of OpenCL C code, which is
then compiled and executed on the GPU:

```julia
const source = """
    __kernel void vadd(__global const float *a,
                       __global const float *b,
                       __global float *c) {
      int i = get_global_id(0);
      c[i] = a[i] + b[i];
    }"""

prog = cl.Program(; source) |> cl.build!
kern = cl.Kernel(prog, "vadd")

len = prod(dims)
clcall(kern, Tuple{Ptr{Float32}, Ptr{Float32}, Ptr{Float32}},
       d_a, d_b, d_c; global_size=(len,))
```

With the new GPUCompiler.jl-based compiler, you can now write the kernel in Julia, just
like with our other back-ends:

```julia
function vadd(a, b, c)
    i = get_global_id()
    @inbounds c[i] = a[i] + b[i]
    return
end

len = prod(dims)
@opencl global_size=len vadd(d_a, d_b, d_c)
```

This is of course a much more natural way to write kernels, and it also allows OpenCL.jl
to be plugged into the rest of the JuliaGPU ecosystem. Concretely, OpenCL.jl now implements
the GPUArrays.jl interface, enabling lots of vendor-neutral functionality, and also provides
a KernelAbstractions.jl back-end for use with the many libraries that build on top of
KernelAbstractions.jl.
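To give an idea of what that integration enables, here is a minimal sketch of a
KernelAbstractions.jl kernel executing through OpenCL.jl. It only uses the generic
KernelAbstractions.jl interface (`@kernel`, `@index`, `get_backend`, `synchronize`), and
assumes that calling `get_backend` on a `CLArray` returns the OpenCL back-end; consult the
KernelAbstractions.jl documentation for details:

```julia
using OpenCL, KernelAbstractions

# A vendor-neutral kernel: the same code runs on any KernelAbstractions.jl back-end.
@kernel function vadd_ka(a, b, c)
    i = @index(Global)
    @inbounds c[i] = a[i] + b[i]
end

a = CLArray(rand(Float32, 1024))
b = CLArray(rand(Float32, 1024))
c = similar(a)

# The CLArray arguments identify the back-end to compile for and execute on.
backend = KernelAbstractions.get_backend(a)
vadd_ka(backend)(a, b, c; ndrange = length(c))
KernelAbstractions.synchronize(backend)
```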
There is no free lunch, though, and **the native compiler functionality currently relies on
your OpenCL driver supporting SPIR-V**. This is sadly not a common feature: neither
NVIDIA's nor AMD's OpenCL drivers support it, only Intel's does. But if you are stuck with
a driver that does not support SPIR-V, there is still hope: SPIR-V can be compiled back to
OpenCL C, using [Google's `clspv`](https://github.com/google/clspv). If you are interested,
check out [this issue](https://github.com/JuliaGPU/OpenCL.jl/issues/234) and feel free to
reach out.


## Breaking API changes

Existing users of OpenCL.jl will of course have noticed that even the string-based example
above uses a different API than before. In order to support the new compiler, and to bring
OpenCL.jl in line with modern Julia programming practices, we have **significantly
overhauled the package's internals as well as some external APIs**.

The most significant high-level changes include:

- Memory management is now done using `CLArray`, backed by Shared Virtual Memory (SVM),
  instead of opaque buffers. Raw buffers are still supported, but are not compatible with
  native kernel execution (because they cannot be converted to a pointer).
- Kernels are called using the new `clcall` function, which performs automatic conversion
  of arguments, much like `ccall` does.

At the lower level (the `cl` submodule), the changes are more extensive; a short example
follows the list:

- Context, device and queue arguments have been removed from most APIs, and are now stored
  in task-local storage. These values can be queried (`cl.platform()`, `cl.device()`, etc.)
  and set (`cl.platform!(platform)`, `cl.device!(device)`, etc.) as needed.
- As part of the above change, questionable APIs like `cl.create_some_context()` and
  `cl.devices()` have been removed.
- The `Buffer` API has been completely reworked. It now only provides low-level
  functionality, such as `unsafe_copyto!` or `unsafe_map!`, while high-level functionality
  like `copy!` is implemented for the `CLArray` type.
- The `cl.info` method, and the `getindex` overloading for accessing properties of OpenCL
  objects, have been replaced by `getproperty` overloading on the objects themselves
  (e.g., `cl.info(dev, :name)` and `dev[:name]` are now simply `dev.name`).
- The blocking `cl.launch` has been replaced by the nonblocking `cl.call`, and the
  `getindex`-overloading shorthand has been removed. However, it is recommended to use the
  newly-added `cl.clcall` function, which takes an additional tuple type argument and
  performs automatic conversion of arguments to those types. This makes it possible to
  pass a `CLArray` to an OpenCL C function expecting `Buffer`-backed pointers, for example.
- Automatic argument conversion has been removed; the user should make sure that Julia
  arguments passed to kernels match the OpenCL argument types (e.g., no empty types, and
  4-element tuples for 3-element `float3` arguments).
- The `to_host` function has been replaced by simply calling `Array` on the `CLArray`.
- Queue and execution capabilities of a device are now queried using the dedicated
  functions `cl.queue_properties` and `cl.exec_capabilities`.
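To make these changes concrete, the following sketch strings together several of the calls
mentioned above. It is illustrative only, and the exact signatures (in particular whether
the capability queries take the device as an argument) may differ, so consult the
documentation:

```julia
using OpenCL

# The active platform and device now live in task-local storage:
@show cl.platform()
dev = cl.device()

# Switching is done with the mutating counterparts, e.g.:
# cl.device!(other_dev)

# Properties are accessed with plain `getproperty`, replacing both
# `cl.info(dev, :name)` and `dev[:name]`:
@show dev.name

# Queue and execution capabilities, via the dedicated query functions
# (assuming here that they take the device as an argument):
@show cl.queue_properties(dev)
@show cl.exec_capabilities(dev)
```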
Working towards the first stable version of this package, we anticipate having to make even
more breaking changes. However, we want to get the current changes out there to gather
feedback from the community. If some of the removed functionality is crucial to your
workflow, feel free to reach out so we can discuss how to best support it in the future.


## JLL-based OpenCL drivers

Another significant change is the **integration with OpenCL drivers built and provided
using Julia's BinaryBuilder infrastructure**. Over time, this should simplify the
installation of OpenCL drivers by avoiding the need for globally-installed drivers. For
now, the only driver provided as a JLL is a CPU driver based on the [Portable Computing
Language (PoCL) library](https://portablecl.org/). This driver can be used by simply
installing and loading `pocl_jll` before you start using OpenCL.jl:

```julia-repl
julia> using OpenCL, pocl_jll

julia> OpenCL.versioninfo()
OpenCL.jl version 0.10.0

Toolchain:
 - Julia v1.11.2
 - OpenCL_jll v2024.5.8+1

Available platforms: 1
 - Portable Computing Language
   OpenCL 3.0, PoCL 6.0 Apple, Release, RELOC, SPIR-V, LLVM 16.0.6jl, SLEEF, DISTRO, POCL_DEBUG
   · cpu (fp16, fp64, il)
```

Notice the `il` capability reported by `OpenCL.versioninfo()`, indicating that PoCL supports
SPIR-V and can thus be used with the new native Julia kernel compiler. In fact, this is one
of the goals of reworking OpenCL.jl: to provide a CPU fallback implementation for use with
Julia GPU libraries.


## Work towards OpenCL.jl 1.0

This release is a significant step towards a stable 1.0 release of OpenCL.jl, bringing the
package in line with our other Julia GPU back-ends. Our focus is on improving OpenCL.jl in
order to support a CPU fallback back-end for KernelAbstractions.jl based on PoCL. If you
are a user of OpenCL.jl, or are interested in using the package in the future, please test
out this release with your application and/or driver, and provide feedback on the changes
we've made. Pull requests are greatly appreciated, and we are happy to help you get started
with contributing to the package. A quick way to try things out is the smoke test below.
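As a starting point, here is a small end-to-end smoke test of the new compiler, combining
the native `vadd` kernel from above with the PoCL driver. It assumes PoCL is the selected
platform (as it will be on a machine with no other OpenCL drivers installed):

```julia
using OpenCL, pocl_jll  # pocl_jll provides a SPIR-V-capable CPU driver

# A native Julia kernel, compiled through SPIR-V:
function vadd(a, b, c)
    i = get_global_id()
    @inbounds c[i] = a[i] + b[i]
    return
end

a = CLArray(rand(Float32, 1024))
b = CLArray(rand(Float32, 1024))
c = similar(a)

@opencl global_size=length(c) vadd(a, b, c)
@assert Array(c) ≈ Array(a) .+ Array(b)
```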