wasm-SPIRV #9491
Comments
Wasm and SPIR-V have fundamentally different memory models. Wasm models memory as a single array of bytes, while SPIR-V models it as a bunch of typed objects. Some of these may be arrays into which you can index, but SPIR-V fundamentally doesn't support arbitrary pointers like Wasm does. https://github.com/EmbarkStudios/spirt can lift some uses of untyped memory (Rust, Wasm, ...) into typed memory (SPIR-V), but can't lift all of them. Also, to make any effective use of a GPU you need to support work-group local memory and more, which Wasm doesn't support.
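To make the memory-model difference above concrete, here is a minimal Rust sketch. It is only an illustration on the CPU: `load_i32`, `load_element`, and `linear_memory_from` are hypothetical helper names, not part of Wasmtime, Cranelift, or any SPIR-V tooling. The byte-addressed load accepts any in-bounds address, including one that straddles two values, while the typed buffer can only be indexed element by element.

```rust
/// Wasm-style linear memory: one flat byte array, addressed by an integer.
/// Any in-bounds address is a valid place to load a 4-byte value from.
fn load_i32(memory: &[u8], addr: usize) -> Option<i32> {
    let bytes = memory.get(addr..addr.checked_add(4)?)?;
    Some(i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// SPIR-V-style typed storage (simplified): an array of i32 objects.
/// The only way in is an element index; there is no byte address to offset.
fn load_element(buffer: &[i32], index: usize) -> Option<i32> {
    buffer.get(index).copied()
}

/// Helper (hypothetical): lay out i32 values as little-endian bytes,
/// the way they would sit in Wasm linear memory.
fn linear_memory_from(values: &[i32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}
```

With the values `[1, 2]` stored in linear memory, `load_i32(&mem, 2)` succeeds and reads two bytes of each value. The typed buffer has no equivalent of that access, which is the kind of pattern a lifting pass cannot always recover.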
All you say is right, but it's still possible.
@SkillfulElectro thanks for the issue! I was involved in some discussions around this in 2020 or so -- and the conclusion then was essentially the same as @bjorn3's points now, that the target is quite different and this would not be an easy adaptation. The use-case in question ended up finding a different way to program GPUs portably.
That discussion was purely about a Cranelift port, but the Wasm runtime as well is an even bigger question mark: what would it mean for Wasmtime to run on a GPU where there is no operating system, (sometimes) no virtual memory, etc.? Or does the Wasm VM get split between GPU and CPU, with (expensive) calls between them? And then how does one actually take advantage of the parallelism? Do we need a new "vectorized Wasm call" API in Wasmtime? (Keep in mind that a single thread of a GPU has lower performance than a single thread on a CPU; GPUs only make sense when leveraging the SIMT model. And SIMT != SIMD, i.e., the programming model is not the same as what Wasm has exposed for data parallelism.) What do we do about branch divergence? Do we have estimates or modeling that show this would be reasonably low overhead for typical Wasms?
For all these reasons I'm pretty skeptical. That doesn't mean we should shut down discussion now, at all. What it does mean is that probably there should be a more detailed writeup: what is the use-case, how would all of these high-level design questions be resolved, etc. This should probably take the form of an RFC discussion eventually, but before that, it would help if you could write a bit more about motivation and these other questions here.
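As a rough illustration of the branch-divergence question raised above, the following Rust sketch models one group of lanes executing an `if`/`else` in lockstep. It is a toy cost model under stated assumptions, not a description of any real GPU: costs are abstract instruction slots, and `lockstep_branch_cost` is a hypothetical name. The assumption is that when the lanes disagree on the condition, the group executes both sides with the inactive lanes masked off.

```rust
/// Abstract cost for one lockstep group executing
/// `if cond { then_side } else { else_side }`.
///
/// Assumption: a side is executed by the whole group if at least one lane
/// takes it, so a divergent group pays for both sides.
fn lockstep_branch_cost(lane_conds: &[bool], then_cost: u32, else_cost: u32) -> u32 {
    let any_then = lane_conds.iter().any(|&c| c);
    let any_else = lane_conds.iter().any(|&c| !c);

    let mut cost = 0;
    if any_then {
        cost += then_cost;
    }
    if any_else {
        cost += else_cost;
    }
    cost
}
```

Under this model a uniform group pays for one side only, while a single disagreeing lane makes the whole group pay for both. This is one reason branch-heavy scalar code may not translate into a speedup.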
Well, all of his points are correct, but look: all the languages compile to Wasm, so if we can run Wasm on the GPU, we can run all of our ordinary code there without touching it. Also, what about compiling to WGSL if managing things this way is hard? We just need to add a way for the user to specify the number of blocks and the workgroup size. Compilation only needs to be done once, so it's a cheap price to pay for running simple code on the GPU, and we can reuse that module without recompiling. For this we can use wgpu and naga.
Yes, I don't think anyone doubts that having this target would be very useful. The difficult design questions are really the heart of the problem though -- the question is how to map Wasm to the GPU programming abstraction in a way that makes sense and yields speedup. I'd invite you to give your thoughts on any of the questions I wrote out above!
(I'll actually say a little more directly: the way open-source works is that interested parties come in with time and energy and drive interesting new directions or additions to projects. Leaving a comment asking for a very general high-level goal, and then arguing why you want it without driving the engineering, isn't likely to lead anywhere. What I'm trying to steer you toward is driving the design exploration here yourself, in a way that could break the problem down into actionable pieces.)
@cfallin OK, so first of all: why would we need to use the GPU? Parallel computing. So some kinds of Wasm modules can't be compiled to GPU kernel functions: those that use WASI, or others that are not related to computation.
Second, we create a struct that stores the number of blocks in each dimension and the number of threads in each block (the workgroup size).
Third, the Wasm function must take an index of type int as its first parameter and an array of a data type supported by WGSL as its second parameter. With these simple rules, most code that compiles to Wasm can be run on the GPU.
Now we compile the Wasm bytecode to WGSL and pass it to wgpu (I say wgpu because I am familiar with it). The index becomes the global invocation ID (and its calculation), the storage arrays or textures become the array input, and we pass them to the GPU using wgpu buffers. Also, Wasm functions that are compiled to run on the GPU must not return anything; their results must be written into the input arrays.
We may also have multiple GPU devices, for example in a server, so we need a way to iterate over them by index to choose the preferred device, something like https://github.com/SkillfulElectro/EMCompute/blob/main/src/gpu_device.rs.
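The rules proposed in this comment can be sketched in plain Rust, emulated on the CPU. This is a hedged illustration only: `DispatchConfig`, `Kernel`, `run_on_cpu`, and `double_in_place` are hypothetical names, no wgpu or naga API is used, and the 3D global invocation ID is simplified to a single linear index. A real dispatch would run the invocations in parallel on the device rather than in a loop.

```rust
/// Dispatch shape as described: number of blocks per dimension and
/// number of threads per block (the workgroup size) per dimension.
#[derive(Clone, Copy, Debug)]
struct DispatchConfig {
    groups: [u32; 3],
    workgroup_size: [u32; 3],
}

impl DispatchConfig {
    /// Total number of kernel invocations across all three dimensions.
    fn total_invocations(&self) -> u64 {
        self.groups
            .iter()
            .zip(self.workgroup_size.iter())
            .map(|(&g, &s)| g as u64 * s as u64)
            .product()
    }
}

/// Kernel contract from the comment: index first, data array second,
/// no return value; results are written back into the array.
type Kernel = fn(u32, &mut [f32]);

/// CPU stand-in for a GPU dispatch: call the kernel once per index.
/// Assumption: the total invocation count fits in a u32.
fn run_on_cpu(config: &DispatchConfig, kernel: Kernel, data: &mut [f32]) {
    for index in 0..config.total_invocations() {
        kernel(index as u32, data);
    }
}

/// Example kernel: each invocation doubles its own element.
/// Invocations past the end of the array do nothing.
fn double_in_place(index: u32, data: &mut [f32]) {
    if let Some(x) = data.get_mut(index as usize) {
        *x *= 2.0;
    }
}
```

For example, a config with `groups: [2, 1, 1]` and `workgroup_size: [4, 1, 1]` yields 8 invocations, so a 6-element array is fully processed and the last two invocations fall through the bounds check. The sketch leaves open the harder questions from the thread, such as how arbitrary Wasm memory accesses inside the kernel would map onto typed GPU buffers.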
@SkillfulElectro thanks for your reply. I think there needs to be a deeper exploration of the engineering tradeoffs here. I'll go through your points and my questions above to try to help guide you a bit.
Sure, again, no one is doubting how useful this would be if it were built!
This is a very high-level and vague description of a more detailed system design that I think you have in your head. A few followup questions that could help expand it:
At a higher level, I'll repeat the questions I wrote above; we need crisp answers to all of these I think:
@cfallin well, you are right, Wasmtime is just a runtime for Wasm.
Feature
SPIR-V compilation target
Benefit
Adding a SPIR-V target for Wasm would make it the best way to write code once and run it on both CPU and GPU, so it would be a powerful option.
Implementation
I think we should convert our code to naga IR, and then either run it through wgpu or emit SPIR-V. I think an abstraction over memory allocation, copies, and other CPU<->GPU transfers could improve development time.
Alternatives
Compiling directly to SPIR-V