Efficient bulk transfer of an i8 array's contents into a JavaScript ArrayBuffer? #568

lax1dude · 2024-10-26T22:14:58Z

Hello! First of all, I'm sorry if I've failed to do sufficient research on this topic and am fundamentally misunderstanding some core aspect of the WASM GC extension; please let me know if that is the case. I've been browsing the spec for hours and can't come to any solid conclusion on this specific question.

WebGL primarily accepts JavaScript typed arrays backed by ArrayBuffers for any function that passes large chunks of binary data to the GPU (uploading textures, vertex buffers, glUniformMatrix, etc.). For my specific application I am compiling Java to run in a browser, and being able to efficiently stream vertex data that is generated and uploaded every frame is crucial for performance, since I am emulating the fixed-function pipeline for a great deal of stuff.

Up to this point I have always compiled Java to JavaScript and ignored WebAssembly entirely, on the assumption that any performance gains would be outweighed by the lack of efficient garbage collection. As a result, it's common for me to prepare vertex data and textures inside byte or int arrays and then pass them to WebGL efficiently by getting a reference, at runtime, to the JavaScript typed array that backs the primitive numeric Java array.
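
For context, here is a rough sketch of the zero-copy trick a Java-to-JS backend allows. It assumes, hypothetically, that a Java `byte[]` is represented as an object whose `data` field holds an `Int8Array`; the wrapper shape and the names `newJavaByteArray` and `asUint8View` are illustrative, not any particular compiler's layout. Reinterpreting the same backing `ArrayBuffer` as a `Uint8Array` copies nothing:

```javascript
// Hypothetical representation of a Java byte[] compiled to JS:
// the array object simply wraps an Int8Array.
function newJavaByteArray(length) {
  return { data: new Int8Array(length) };
}

// Reinterpret the backing storage as unsigned bytes for WebGL.
// No bytes are copied: both views share one ArrayBuffer.
function asUint8View(javaByteArray) {
  const a = javaByteArray.data;
  return new Uint8Array(a.buffer, a.byteOffset, a.byteLength);
}

const arr = newJavaByteArray(4);
arr.data[0] = -1;           // Java byte value -1
const view = asUint8View(arr);
console.log(view[0]);       // 255: the same byte read as unsigned
// In a browser: gl.bufferSubData(gl.ARRAY_BUFFER, 0, view);
```

With WASM GC arrays there is no such backing `ArrayBuffer` to reinterpret, which is the crux of the question.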

I was very excited to learn that WASM GC is now available in Chrome and Firefox. However, after spending a considerable amount of time browsing the spec, I've concluded that it will probably no longer be possible to pass the data in a primitive numeric Java array to WebGL without copying it, which in my opinion is still fine as long as the copy can be done efficiently.

However, it doesn't seem like it can be done efficiently: the only way seems to be a for loop that copies the values out of the WASM GC array into a JavaScript typed array one value at a time. Doing it this way makes me want to bang my head against the wall, but I can't find anyone else who has raised a similar concern yet. Is there any efficient way to bulk-copy data from a numeric WASM GC array into an ArrayBuffer? If not, I think this feature would be a crucial addition to the spec, since it would make it easier to port code written for environments where this kind of bulk transfer is efficient.

Thank you for your time, and I am sorry if I missed anything or there is already an open discussion about this. I did see some other open issues regarding bulk transfer of an array's contents but none seem to cover the case of actually getting the data out of WASM and into a JavaScript ArrayBuffer to pass to browser APIs.


lax1dude · 2024-10-26T22:48:28Z

Okay, this may be a duplicate of #395.

tlively · 2024-10-28T10:02:39Z

If you can wrap the ArrayBuffer in a WebAssembly.Memory and import it into your module as a memory, then yes, the ideas in #395 should cover this use case. It would probably make sense to have a version of memory.copy that copies between arrays and memories as well. If the ArrayBuffer comes into Wasm as an externref rather than as an imported memory, a better solution would be to provide builtin functions that can be imported to operate directly on the ArrayBuffer or JS typed arrays. This is similar to what we do for accessing JS strings efficiently: https://github.com/WebAssembly/js-string-builtins.
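
For the imported-memory route, once the bytes live in a `WebAssembly.Memory`, JS can move them in bulk with a single typed-array operation instead of a per-element boundary crossing. A minimal sketch: `ptr` and `len` stand in for an offset and length the module would report, the values here are made up, and `sliceView`/`copyOut` are hypothetical helper names. One caveat worth knowing is that `memory.buffer` is replaced (and the old buffer detached) whenever the memory grows, so views should be re-derived rather than cached.

```javascript
// One 64 KiB page of linear memory, as a module might import it.
const memory = new WebAssembly.Memory({ initial: 1 });

// Pretend the module wrote 8 bytes at offset 1024 (hypothetical values).
const ptr = 1024, len = 8;
new Uint8Array(memory.buffer, ptr, len).set([1, 2, 3, 4, 5, 6, 7, 8]);

// Zero-copy view of a slice, suitable for passing to a browser API.
function sliceView(mem, offset, length) {
  // Always derive the view from the current buffer (see grow() below).
  return new Uint8Array(mem.buffer, offset, length);
}

// Bulk copy into a standalone ArrayBuffer when an independent copy is needed.
function copyOut(mem, offset, length) {
  return sliceView(mem, offset, length).slice(); // one bulk copy
}

const view = sliceView(memory, ptr, len);
const copy = copyOut(memory, ptr, len);

memory.grow(1);              // detaches the old memory.buffer
console.log(view.length);    // 0: the cached view is now unusable
console.log(copy[7]);        // 8: the copy is unaffected
console.log(sliceView(memory, ptr, len)[7]); // 8: a fresh view still works
```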

osa1 · 2024-11-12T08:51:40Z

> However, it doesn't seem like it can be done efficiently, it seems like the only way to do it would be to program a for loop that copies the values out of the WASM GC array into a JavaScript typed array one value at a time.

FWIW, if you decide to do the copying manually right now (instead of waiting for the instructions, which we hope will be added at some point), the most performant approach we could find is to do the copying in JS rather than in Wasm, because V8 optimizes JS calling Wasm better than Wasm calling JS.

In dart2wasm, we generate this JS to copy one byte at a time from a Wasm array i8 to a JS Uint8Array: https://github.com/dart-lang/sdk/blob/a0c4efddb8ada994c3f8568268478a4948bc32b7/sdk/lib/_internal/wasm/lib/js_helper_patch.dart#L155-L160

      (jsArray, jsArrayOffset, wasmArray, wasmArrayOffset, length) => {
        const getValue = dartInstance.exports.$wasmI8ArrayGet;
        for (let i = 0; i < length; i++) {
          jsArray[jsArrayOffset + i] = getValue(wasmArray, wasmArrayOffset + i);
        }
      }

The Wasm function called by this JS:

(type $Array<WasmI8> (;0;) (array (field (mut i8))))

(func $_wasmI8ArrayGet (;87;) (export "$wasmI8ArrayGet") (param $var0 externref) (param $var1 i32) (result i32)
  local.get $var0
  any.convert_extern
  ref.cast $Array<WasmI8>
  local.get $var1
  array.get_u $Array<WasmI8>)
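
The same loop can be exercised outside dart2wasm by passing the getter in as a parameter. This is only a sketch: `copyWasmI8ToJs` is a hypothetical name, and `fakeGet` stands in for the exported `$wasmI8ArrayGet` (which returns the `array.get_u` result as an i32) so the snippet runs without a GC-enabled module.

```javascript
// Copies `length` bytes from a Wasm GC i8 array into a JS typed array,
// making one JS-to-Wasm call per element (the pattern shown above).
function copyWasmI8ToJs(getValue, jsArray, jsArrayOffset, wasmArray, wasmArrayOffset, length) {
  for (let i = 0; i < length; i++) {
    // getValue returns an unsigned byte (0..255) as a JS number.
    jsArray[jsArrayOffset + i] = getValue(wasmArray, wasmArrayOffset + i);
  }
}

// Stand-in for the Wasm export: treats a plain JS array as the "Wasm array".
const fakeGet = (arr, i) => arr[i] & 0xff;

const source = [10, 20, 30, 40, 250];
const target = new Uint8Array(8);
copyWasmI8ToJs(fakeGet, target, 2, source, 1, 3);
console.log(target.join(",")); // 0,0,20,30,40,0,0,0
```

The per-element call is exactly the cost a bulk array-to-memory or array-to-ArrayBuffer copy would remove.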

sjrd · 2024-11-12T09:59:30Z

> The Wasm function called by this JS:
>
> (type $Array<WasmI8> (;0;) (array (field (mut i8))))
>
> (func $_wasmI8ArrayGet (;87;) (export "$wasmI8ArrayGet") (param $var0 externref) (param $var1 i32) (result i32)
>   local.get $var0
>   any.convert_extern
>   ref.cast $Array<WasmI8>
>   local.get $var1
>   array.get_u $Array<WasmI8>)

FYI, you can simplify that function by directly taking a $Array<WasmI8>:

(func $_wasmI8ArrayGet (;87;) (export "$wasmI8ArrayGet") (param $var0 (ref $Array<WasmI8>)) (param $var1 i32) (result i32)
  local.get $var0
  local.get $var1
  array.get_u $Array<WasmI8>)

If it's just this function, it won't make a difference. But I suspect you may have a bunch of these cases if you have one. ;)

osa1 · 2024-11-12T10:28:18Z

> FYI, you can simplify that function by directly taking a $Array<WasmI8>:

We can't use Wasm GC types in exports and imports because wasm-opt --closed-world doesn't like them:

  --closed-world,-cw                            Assume code outside of the 
                                                module does not inspect or 
                                                interact with GC and function 
                                                references, even if they are 
                                                passed out. The outside may hold
                                                on to them and pass them back 
                                                in, but not inspect their 
                                                contents or call them.

It generates errors like:

[wasm-validator error in module] publicly exposed type disallowed with a closed world: $Array<_Type>, on 
(type $array.0 (array (mut (ref $struct.0))))

I don't really know how beneficial --closed-world is, given that it has some downsides as well. Maybe we should reconsider it.

jakobkummerow · 2024-11-12T11:50:18Z

Also, for the record, the V8 optimization that @osa1 mentioned currently only triggers for (nullable!) externref in the Wasm signature. We'd like to implement support for other reftypes at some point, but that's considerably harder to pull off, so we haven't gotten around to it yet.
Even aside from that particular optimization, implicit type checks at the boundary are currently implemented in a way that's quite a bit slower than using externref and then an explicit ref.cast on the Wasm side.

sjrd · 2024-11-12T12:27:07Z

Huh. That's good to know. Does the optimization trigger for nullable anyref as well, or not even that? anyref is basically the same as externref in the engines, AFAICT. Due to our language type system, our codegen generates a lot of anyrefs at the boundary with JS.

I guess that means we could speed up some things if we change our codegen a bit. But also I'm reluctant to do so because it's going to remove incentives for the engines to improve on that. 🤷‍♂️

jakobkummerow · 2024-11-12T12:57:23Z

anyref is currently unsupported by V8's Wasm-into-JS inlining.
anyref and externref are two distinct types because engines might choose different representations for them. V8 currently does that for null values. So the conversion is not a no-op.
I'm not saying that you should optimize for V8's current behavior, I'm just describing what that behavior is for now.

kripken · 2024-11-12T15:50:00Z

@osa1

> [wasm-validator error in module] ..

Note that that wasm-opt validation error has been removed in WebAssembly/binaryen#7019. After that, you can use any type on the boundary. However, types on the boundary are assumed to be public, which means that wasm-opt will not modify them, so it will inhibit some of the benefits (potentially a lot, depending on the type).

> I don't really know how beneficial --closed-world is, given that it has some downsides as well. Maybe we should reconsider it.

It is worth measuring how beneficial --closed-world is for you, but in general it can have huge benefits. By assuming the outside does not observe interior details of types, wasm-opt is able to remove fields, devirtualize, merge and remove intermediate types, etc. etc. We see very large improvements in many cases. As one data point I happened to have an unoptimized Dart file on my machine ("complex.wasm", must have been from a bug report?) and --closed-world makes it 20% smaller.

lax1dude · 2024-11-13T02:52:00Z

I’ve chosen to go the route of using a WebAssembly.Memory along with creating a special malloc/free implementation for the time being, since 90% of my code prepares data to send to the GPU in Java ByteBuffer objects that are backed by malloc/free in my “desktop runtime” and therefore my code is already set up to free them explicitly. This way I can pass them to WebGL without copying them, I can just create a typed array that views the slice of the memory that the buffer resides in.
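
A much-simplified JS sketch of the same idea: hand out aligned slices of a `WebAssembly.Memory` and pass typed-array views of them straight to WebGL without copying. The actual allocator described here is written in Java against custom TeaVM intrinsics and modeled on emmalloc; the `LinearHeap` class below (a bump pointer plus exact-size free lists, where the caller passes the size back to `free`) is a hypothetical stand-in, and `gl` in the usage comment is assumed to be a WebGL context.

```javascript
// Minimal allocator over linear memory: bump pointer plus exact-size
// free lists. Illustrative only: no headers, no coalescing.
class LinearHeap {
  constructor(memory, base = 8) {
    this.memory = memory;
    this.top = base;           // next never-used byte; offset 0 acts as "null"
    this.freeLists = new Map(); // rounded size -> stack of freed pointers
  }

  malloc(size) {
    const n = (size + 7) & ~7; // round up to 8-byte alignment
    const reuse = this.freeLists.get(n);
    if (reuse && reuse.length > 0) return reuse.pop();
    const ptr = this.top;
    const needed = ptr + n - this.memory.buffer.byteLength;
    if (needed > 0) {
      this.memory.grow(Math.ceil(needed / 65536)); // 64 KiB pages
    }
    this.top += n;
    return ptr;
  }

  free(ptr, size) {
    const n = (size + 7) & ~7;
    if (!this.freeLists.has(n)) this.freeLists.set(n, []);
    this.freeLists.get(n).push(ptr);
  }

  // Zero-copy view; re-create it after any malloc, since grow() detaches buffers.
  view(ptr, size) {
    return new Uint8Array(this.memory.buffer, ptr, size);
  }
}

const heap = new LinearHeap(new WebAssembly.Memory({ initial: 1 }));
const a = heap.malloc(100);     // 8
const b = heap.malloc(70000);   // 112, and forces memory.grow(1)
heap.view(a, 4).set([1, 2, 3, 4]);
// gl.bufferData(gl.ARRAY_BUFFER, heap.view(a, 100), gl.STREAM_DRAW);
heap.free(a, 100);
const c = heap.malloc(100);     // 8 again, reused from the free list
```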

I was able to successfully implement some intrinsic functions into the WASM GC backend of the Java to WASM compiler I’m using (TeaVM) in order to allow WASM GC programs to still create a conventional WASM memory and load/store to specific addresses. I was then able to use the intrinsic functions I added to remake the essence of emmalloc, and it ended up working out better than I ever imagined. I believe I’ve squashed all the bugs and can try to get my app running in WASM GC now with some buffer classes based on the intrinsic memory load/store functions.

The creator of TeaVM also said he plans to implement the same feature (“direct” Java NIO ByteBuffers using a WASM memory) in his compiler himself. He got pretty frustrated at me for suggesting the idea and forbade me from making any PRs related to this feature on his repository, so he's clearly planning to go the same route for TeaVM as a core feature. I also believe TeaVM is probably the most popular Java to WASM compiler right now among actual developers (not corporations trying to keep their old applets running), so it's safe to say that accessing a conventional memory object from a WASM GC program is going to be common practice in the JVM language crowd for the near future.

However, we probably shouldn't let it become the "best" practice. I personally think it's impossible to implement dedicated ArrayBuffer load/store/copy instructions that are as fast as just using a slice of a WebAssembly.Memory. But for apps that don't explicitly free buffers and instead rely on them being garbage collected, there would have to be some hack with a FinalizationRegistry to free the memory used by a buffer when it's no longer needed. This is obviously not ideal, because the finalizers won't run unless the program regularly yields with some async delay, and even then, all it takes is the thread staying busy too long in a section without a delay to potentially run out of memory if the program is wasteful with buffers.
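
The FinalizationRegistry workaround could look roughly like the sketch below, assuming an allocator object with `malloc(size)` and `free(ptr, size)` methods (hypothetical; any linear-memory allocator would do, and `ManagedBufferFactory` is an illustrative name). As noted in the comment, cleanup callbacks run only after a GC and never in the middle of a synchronous task, so an explicit `dispose()` path is still worth keeping.

```javascript
// Frees linear-memory blocks once their JS wrapper object has been collected.
class ManagedBufferFactory {
  constructor(allocator) {
    this.allocator = allocator;
    this.registry = new FinalizationRegistry(({ ptr, size }) => {
      // Runs at some point after GC, never synchronously mid-task.
      this.allocator.free(ptr, size);
    });
  }

  create(size) {
    const ptr = this.allocator.malloc(size);
    const buffer = { ptr, size, disposed: false };
    // The held value must not reference `buffer`, or it could never be collected.
    this.registry.register(buffer, { ptr, size }, buffer);
    return buffer;
  }

  // Deterministic path: free now and cancel the pending finalizer.
  dispose(buffer) {
    if (buffer.disposed) return;
    buffer.disposed = true;
    this.registry.unregister(buffer);
    this.allocator.free(buffer.ptr, buffer.size);
  }
}

// Usage with a stub allocator that records frees.
const freed = [];
const stub = {
  next: 0,
  malloc(size) { const p = this.next; this.next += size; return p; },
  free(ptr) { freed.push(ptr); },
};
const factory = new ManagedBufferFactory(stub);
const buf = factory.create(16);
factory.dispose(buf);
factory.dispose(buf);            // no double free
console.log(freed.join(","));    // 0
```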

You could also create a way to make WASM GC arrays that are backed by ArrayBuffers. The only issue I see is that it would be harder for the VM to optimize the array load/store/copy instructions if WASM GC arrays could come in multiple flavors like this, since the JIT compiler has no way of knowing ahead of time whether the code is dealing with a native WASM GC array or one backed by an ArrayBuffer, unless it's handling a constant.
