Cannot iterate over a Tuple of mixed Type #607
Comments
This is not easily fixable on the GPUCompiler.jl side, as it's Julia's codegen generating runtime-reliant code here. For example, a Metal-based MWE:

```julia
using Metal

function kernel(a, t)
    for j in 1:2
        @inbounds a[1] = t[j]
    end
    return
end

function main()
    a = Metal.ones(1)
    @metal kernel(a, (1, 3.0))
end
```

Generating CPU code for this requires the runtime:

```
julia> code_llvm(kernel, Tuple{Vector{Float32}, Tuple{Float32,Float64}})
; @ /Users/tim/Julia/pkg/Metal/wip.jl:3 within `kernel`
define void @julia_kernel_4986({}* noundef nonnull align 16 dereferenceable(40) %0, { float, double }* nocapture noundef nonnull readonly align 8 dereferenceable(16) %1) #0 {
top:
%gcframe19 = alloca [3 x {}*], align 16
%gcframe19.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe19, i64 0, i64 0
%2 = bitcast [3 x {}*]* %gcframe19 to i8*
call void @llvm.memset.p0i8.i64(i8* align 16 %2, i8 0, i64 24, i1 true)
%3 = call {}*** inttoptr (i64 6926966180 to {}*** (i64)*)(i64 261) #4
%4 = bitcast [3 x {}*]* %gcframe19 to i64*
store i64 4, i64* %4, align 16
%5 = load {}**, {}*** %3, align 8
%6 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe19, i64 0, i64 1
%7 = bitcast {}** %6 to {}***
store {}** %5, {}*** %7, align 8
%8 = bitcast {}*** %3 to {}***
store {}** %gcframe19.sub, {}*** %8, align 8
%9 = bitcast { float, double }* %1 to i8*
%10 = bitcast {}* %0 to float**
; @ /Users/tim/Julia/pkg/Metal/wip.jl:5 within `kernel`
; ┌ @ tuple.jl:31 within `getindex`
%ptls_field20 = getelementptr inbounds {}**, {}*** %3, i64 2
%11 = bitcast {}*** %ptls_field20 to i8**
%ptls_load2122 = load i8*, i8** %11, align 8
%box = call noalias nonnull dereferenceable(32) {}* @ijl_gc_pool_alloc(i8* %ptls_load2122, i32 800, i32 32) #10
%12 = bitcast {}* %box to i64*
%13 = getelementptr inbounds i64, i64* %12, i64 -1
store atomic i64 4859175776, i64* %13 unordered, align 8
%14 = bitcast {}* %box to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(16) %14, i8* noundef nonnull align 8 dereferenceable(16) %9, i64 16, i1 false)
%15 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe19, i64 0, i64 2
store {}* %box, {}** %15, align 16
%16 = call {}* @ijl_get_nth_field_checked({}* nonnull %box, i64 0)
; └
%17 = bitcast {}* %16 to i64*
%18 = getelementptr inbounds i64, i64* %17, i64 -1
%19 = load atomic i64, i64* %18 unordered, align 8
%20 = and i64 %19, -16
switch i64 %20, label %L16 [
i64 4904455440, label %L7
i64 4904455376, label %L12
]
```

I don't think we can easily specialize this; one solution would be to add specialized codegen support to essentially do union splitting, but I'm not sure that's worth it. As a workaround, try unrolling this loop at the Julia level (i.e., before codegen, not using LLVMLoopInfo.jl); a sketch of what that could look like is below.
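A minimal sketch of that workaround, based on the Metal MWE above (the names `kernel_unrolled` and `kernel_nexprs` and the use of `Base.Cartesian.@nexprs` are illustrative, not taken from the issue):

```julia
using InteractiveUtils        # for code_llvm
using Base.Cartesian: @nexprs

# Hand-unrolled version of the MWE kernel: every tuple index is a literal,
# so `t[1]` and `t[2]` are inferred concretely and no boxing is generated.
function kernel_unrolled(a, t)
    @inbounds a[1] = t[1]
    @inbounds a[1] = t[2]
    return
end

# For longer tuples the same thing can be generated with @nexprs, which
# expands to the two statements above at macro-expansion time.
function kernel_nexprs(a, t)
    @nexprs 2 j -> (@inbounds a[1] = t[j])
    return
end

# The CPU code for the unrolled versions should no longer call into the
# runtime (no ijl_gc_pool_alloc / ijl_get_nth_field_checked):
code_llvm(kernel_unrolled, Tuple{Vector{Float32}, Tuple{Float32,Float64}})
```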
I am posting this as a "bug", but I am not sure if it's fixable. For clarity, the actual error with JuliaGPU/CUDA.jl#2450 is due to the fact that we can't iterate through a Tuple of mixed type. I think this is a Julia-specific problem because, to be honest, no one else is crazy enough to send a container of mixed types to the GPU. Julia has to do it, though, because each function has its own inherent type, so there is no other way to pass in a collection of functions.
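To make that point about function types concrete (an illustrative REPL snippet, not part of the original report): every function is a singleton instance of its own type, so any tuple of functions is necessarily heterogeneous.

```julia
julia> typeof(sin), typeof(cos)
(typeof(sin), typeof(cos))

julia> typeof((sin, cos))      # two distinct element types
Tuple{typeof(sin), typeof(cos)}

julia> typeof((1.0f0, 2.0))    # mixed numeric tuples hit the same problem
Tuple{Float32, Float64}
```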
Anyway, here's an example of something that fails (in KA, sorry):
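The original snippet isn't reproduced here, but the sketch below shows the kind of KernelAbstractions kernel that hits this; the kernel name, the tuple contents, and the launch code are illustrative assumptions, not the code from the report:

```julia
using KernelAbstractions, CUDA  # any GPU backend shows the same behaviour

# Iterating over a heterogeneous tuple inside the kernel body forces Julia
# to emit runtime-dependent code, just like the Metal MWE above.
@kernel function apply_all!(a, @Const(fns))
    i = @index(Global, Linear)
    for f in fns
        @inbounds a[i] = f(a[i])
    end
end

a = CUDA.ones(Float32, 16)
backend = KernelAbstractions.get_backend(a)
apply_all!(backend)(a, (sin, cos); ndrange = length(a))  # fails during GPU compilation
```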
The workarounds for JuliaGPU/CUDA.jl#2450 also work here.
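For completeness, one generic pattern that avoids iterating the tuple at runtime is recursion over `Base.tail`, which the compiler unrolls for tuples; whether this matches the exact workarounds in JuliaGPU/CUDA.jl#2450 is an assumption here:

```julia
# Base case: nothing left to apply.
@inline apply_each(x, fns::Tuple{}) = x

# Recursive case: `first(fns)` has a concrete type, so the call is statically
# dispatched; each recursive call sees a shorter, differently typed tuple,
# and the whole chain unrolls at compile time.
@inline apply_each(x, fns::Tuple) = apply_each(first(fns)(x), Base.tail(fns))

# Inside a kernel body this replaces the `for f in fns` loop:
#     @inbounds a[i] = apply_each(a[i], fns)
```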
CUDA Errors:
AMD Error: