Replies: 3 comments
-
@kripken Sorry for the delay in getting back to you! I've been busy at work on other things for the past two weeks.

The transformation that Asyncify applies is essentially equivalent to the transformation that LLVM does for coroutines, correct? In other words, it splits a function into multiple functions at the points where the function needs to be able to resume, creates a structure to hold the live stack values, unwinds the stack completely to yield, and "replays" the stack to resume (where "replay" just means that the code has to jump back through all of the functions that were on the yielded call stack in order to get back to where it yielded).

The main reasons we have avoided that kind of transformation so far are the frequency at which yields occur, and the fact that Erlang code is highly recursive, so the call stack can grow quite large in situations that are not tail-call optimized. On the first point: Erlang's runtime performs preemptive scheduling of green threads (processes) by tracking the number of reductions, a measure of the amount of work a process has done since it was scheduled. This count is checked against a limit at certain points, and if the limit is exceeded, the code yields back to the scheduler, suspending the current state. The check is currently performed on function entry and after every function call, so a transformation like the one applied for LLVM coroutines would split virtually every function in the program at multiple locations. Code size bloat aside, the runtime overhead would be prohibitive at best: the common case isn't that a callee bounces control back and forth with its caller, but rather that a function at any depth of the call stack yields all the way back to the scheduler, which then immediately resumes another process deep in its own call stack.
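To make the reduction-counting scheme concrete, here is a toy sketch of the check described above. All names are illustrative (this is not Lumen's or the BEAM's actual code); the budget of 4000 reductions matches the BEAM's default, but the structure is deliberately simplified:

```rust
// Hypothetical sketch of a reduction-count yield check. The compiler
// would emit a call to `bump_reductions` on function entry and after
// every call site; when the budget is exhausted, the process yields
// back to the scheduler.

const REDUCTION_LIMIT: u32 = 4000; // the BEAM's default budget per slice

struct Process {
    reductions: u32,
}

enum Step {
    Continue,
    Yield, // suspend and hand control back to the scheduler
}

impl Process {
    fn new() -> Self {
        Process { reductions: 0 }
    }

    // Emitted at function entry and after every function call.
    fn bump_reductions(&mut self, cost: u32) -> Step {
        self.reductions += cost;
        if self.reductions >= REDUCTION_LIMIT {
            // Budget exhausted: reset and yield to the scheduler.
            self.reductions = 0;
            Step::Yield
        } else {
            Step::Continue
        }
    }
}

fn main() {
    let mut p = Process::new();
    let mut yields = 0;
    // Simulate 10,000 unit-cost checks: with a 4,000 budget we expect
    // the scheduler to be entered twice.
    for _ in 0..10_000 {
        if let Step::Yield = p.bump_reductions(1) {
            yields += 1;
        }
    }
    assert_eq!(yields, 2);
    println!("yields = {}", yields);
}
```

Because this check fires at every function boundary, any stack-splitting transform would have to treat essentially every call site as a potential suspend point, which is the code-size concern above.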
I think it's certainly possible for us to apply a transformation like Asyncify, but I strongly doubt it would work well in practice given the constraints above. If you have data to the contrary, though, especially from people using it in production, I'm definitely interested. In particular, I'd like to get a sense of how frequently that code pauses/resumes, and how deep the call stack is between the yield point and where control is handed off.
-
Similar, but different. In particular, it does not generate separate functions. That avoids the code size bloat issue pretty well; in fact, there's a guarantee that code size does not increase beyond something like a factor of 2 even in the very worst case. It does have a slight runtime cost, though. We do have good results from people using Asyncify in production, but none of them have the demands Erlang does, I think. For example, the web port of DOSBox has almost no overhead from using Asyncify, but IIRC its context switches are fairly rare, so that mainly shows it doesn't harm general throughput. Overall, I think Lumen would be the first in this space, so the question is whether it's worth experimenting with, without knowing the result in advance. I'm not sure how much work it would be on your side, but Asyncify itself is stable, and I'd be happy to help with any issues there!
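For intuition, here is a hand-written toy model of the kind of rewrite described here: rather than splitting the function, a single body is kept, with an unwind/rewind mode threaded through it; live locals are spilled to a side buffer on pause and restored on resume. This is a sketch of the idea, not the actual pass output, and all names are made up:

```rust
// Toy model of an Asyncify-style rewrite: one function body handles
// normal execution, unwinding (pausing), and rewinding (resuming).

enum Mode {
    Normal,
    Unwinding,
    Rewinding,
}

struct Ctx {
    mode: Mode,
    saved: Vec<u32>, // the side buffer live locals are spilled into
}

// Original logic: sum 0..n, but pause when the counter reaches
// `pause_at`. Returns None while unwinding, Some(sum) when done.
fn count_to(ctx: &mut Ctx, n: u32, pause_at: u32) -> Option<u32> {
    // On rewind, restore the live locals instead of starting over.
    let (mut i, mut acc) = if let Mode::Rewinding = ctx.mode {
        ctx.mode = Mode::Normal;
        let acc = ctx.saved.pop().unwrap();
        let i = ctx.saved.pop().unwrap();
        (i, acc)
    } else {
        (0, 0)
    };
    while i < n {
        acc += i;
        i += 1;
        if i == pause_at {
            // Begin unwinding: spill live locals and return immediately.
            // (After a rewind, i resumes past pause_at, so this check
            // cannot re-trigger.)
            ctx.mode = Mode::Unwinding;
            ctx.saved.push(i);
            ctx.saved.push(acc);
            return None;
        }
    }
    Some(acc)
}

fn main() {
    let mut ctx = Ctx { mode: Mode::Normal, saved: Vec::new() };
    // First call runs until the pause point, spills state, and unwinds.
    assert_eq!(count_to(&mut ctx, 10, 3), None);
    // The caller flips to rewind mode and re-enters the same function.
    ctx.mode = Mode::Rewinding;
    assert_eq!(count_to(&mut ctx, 10, 3), Some(45)); // 0+1+...+9
    println!("resumed and finished");
}
```

Because the rewind path re-enters the same function and skips forward, every caller on the paused stack pays a branch or two on the normal path too, which is where the "slight runtime cost" comes from.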
-
Per Dec 9 standup, this issue has been converted to a Discussion.
-
Hi! This came up in the stack switching meeting just now, posting an issue so it's not forgotten.
As @tlively mentioned, Binaryen has Asyncify functionality, which can let wasm pause and resume. More details in that link, but basically you have an API to say "start unwinding the stack" and "start resuming the stack", and you run a transform on the wasm to add all the support for it. Then you can easily pause and resume multiple contexts of execution, swap between them, scan their stacks, etc.
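As a rough sketch of how that API is driven from the host side: the pass adds `asyncify_start_unwind`, `asyncify_stop_unwind`, `asyncify_start_rewind`, and `asyncify_stop_rewind` exports to the module, and the host orchestrates them around re-entering the module's entry point. The mock below only models the call sequence (a real embedding would call into an actual wasm instance and pass a stack-buffer address to the start calls), so treat it as an illustration, not working glue code:

```rust
// Mock of the host-side unwind/rewind sequence around Asyncify's API.
// The asyncify_* names mirror the exports the Binaryen pass adds;
// MockInstance stands in for a real wasm instance.

#[derive(Debug, PartialEq)]
enum Call {
    EnterWasm,
    StartUnwind,
    StopUnwind,
    StartRewind,
    StopRewind,
}

struct MockInstance {
    log: Vec<Call>,
    paused: bool,
}

impl MockInstance {
    fn asyncify_start_unwind(&mut self) { self.log.push(Call::StartUnwind); }
    fn asyncify_stop_unwind(&mut self) { self.log.push(Call::StopUnwind); }
    fn asyncify_start_rewind(&mut self) { self.log.push(Call::StartRewind); }
    fn asyncify_stop_rewind(&mut self) { self.log.push(Call::StopRewind); }

    // Stands in for calling the module's entry point. On the first
    // call, an import inside wasm decides to pause and starts the
    // unwind; on re-entry after a rewind, the instrumented code skips
    // back to the pause point and the import finishes the rewind.
    fn call_entry(&mut self) {
        self.log.push(Call::EnterWasm);
        if !self.paused {
            self.asyncify_start_unwind();
            self.paused = true;
            // ...instrumented code now returns all the way out...
        } else {
            self.asyncify_stop_rewind();
            self.paused = false;
            // ...normal execution continues from the pause point...
        }
    }
}

fn main() {
    let mut inst = MockInstance { log: Vec::new(), paused: false };

    // First run: wasm pauses itself and unwinds out to the host.
    inst.call_entry();
    inst.asyncify_stop_unwind();

    // ...the host can now run another context, await a promise, etc...

    // Resume: start the rewind, then re-enter at the top.
    inst.asyncify_start_rewind();
    inst.call_entry();

    println!("{:?}", inst.log);
}
```

Swapping between multiple paused contexts then amounts to keeping one stack buffer (and one of these unwind/rewind cycles) per context.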
As mentioned in the link, this has overhead - it's kind of like a polyfill in "userspace" for a future proper stack switching. But it works today - it's been tested very heavily, and is used in production by people. Emscripten for example has integration for it, and it uses it to implement things like fibers, JS Promise integration, and conservative stack scanning (i.e. scanning values in wasm local variables all the way up the call stack - like a polyfill for a future stack scanning proposal).
It might be interesting to get Lumen working today using Asyncify, to unblock you from waiting for proper stack switching? And the experience there could help the stack switching discussion.
cc @RReverser