Interleave benchmark iterations from different processes #143
Even when we are only using one process for all samples. This will make everything easier when the parent<-->child communication protocol changes to support interleaving iterations from different processes as part of bytecodealliance#139.
This commit makes it so that, rather than spawning a process for a Wasm benchmark and engine pair and running all iterations for that process immediately, we now spawn a bunch of processes and then run one iteration from a random one of them at a time. This interleaves benchmark iterations, not just processes.

This means that we need to have communication back and forth between the parent process and the child processes. Before, we had fork/join style subprocesses where we just spawned a child and waited for it to finish before starting the next subprocess and there was no communication between the parent and child before the child was done executing. In order to interleave benchmark iterations, we need to spawn many processes, have them wait on the parent, and have the parent tell one subprocess at a time to run one iteration and then wait again.

The way that this is done is with a very simple `stdin` and `stdout`-based protocol. All children are waiting to read a newline on their `stdin` before they run an iteration. The parent chooses a child, writes a newline and then waits for the child to finish by reading a newline from the child's `stdout`. When the child finishes its iteration, it writes a newline to its `stdout`. When all iterations are complete, then the child writes its results as JSON to `stdout`.

Fixes bytecodealliance#139
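For readers who want to see the shape of the newline protocol described above, here is a minimal sketch. It is not the code from this PR: the function names `child_loop` and `run_one_iteration`, the `run_iteration` callback, and the hand-rolled JSON array are all assumptions made for illustration. The functions are generic over `BufRead`/`Write` so the same logic could sit on top of a child's real `stdin`/`stdout` pipes or on in-memory buffers.

```rust
use std::io::{self, BufRead, Write};

/// Child side (hypothetical helper): wait for a newline from the parent, run
/// one iteration, then write a newline back. After the last iteration, write
/// the collected results as a JSON array.
pub fn child_loop<R: BufRead, W: Write>(
    mut from_parent: R,
    mut to_parent: W,
    iterations: usize,
    mut run_iteration: impl FnMut(usize) -> u64,
) -> io::Result<()> {
    let mut measurements = Vec::with_capacity(iterations);
    let mut line = String::new();
    for i in 0..iterations {
        line.clear();
        // Block until the parent says "go" by writing a newline.
        if from_parent.read_line(&mut line)? == 0 {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "parent closed the pipe",
            ));
        }
        measurements.push(run_iteration(i));
        // Hand the baton back to the parent.
        to_parent.write_all(b"\n")?;
        to_parent.flush()?;
    }
    // All iterations are done: emit the results (assumed shape: array of counts).
    let body: Vec<String> = measurements.iter().map(|m| m.to_string()).collect();
    writeln!(to_parent, "[{}]", body.join(","))?;
    to_parent.flush()
}

/// Parent side (hypothetical helper): tell one child to run one iteration and
/// wait until it reports completion.
pub fn run_one_iteration<W: Write, R: BufRead>(
    to_child: &mut W,
    from_child: &mut R,
) -> io::Result<()> {
    to_child.write_all(b"\n")?;
    to_child.flush()?;
    let mut line = String::new();
    if from_child.read_line(&mut line)? == 0 {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "child exited before finishing its iteration",
        ));
    }
    Ok(())
}
```

In a real harness the parent would hold one `(stdin, stdout)` pair per spawned child, pick a child at random that still has iterations left, and call something like `run_one_iteration` on it.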
Not sure how things were working before...
This is really nice, thanks!
Just to make sure I understand the synchronization fully: child processes' iterations are fully mutually exclusive, because the parent only unblocks one at a time, and waits for the baton back before unblocking the next, yes? But: the first iteration(s) of the main loop might run while other child processes are still starting up, if startup is slow for some reason; in other words, we have the dependency graph:
Parent
|  (spawn N children)
|\_________________________________________
|        |                  |         |  ...
|      __|____              |
|     |startup|          ___|___
| (go)|       |         |startup|
|\__  |_______|         |       |
|   |    |              |       |
|   | ___|_____         |       |
|   >| iter 0  |        |_______|
|    |_________|            |
| ___/   |                  |
| /(done)|                  |
|<       |                  |
|        |                  |
| (go)   |                  |
|\_______|_________         |
|        |         \    ____|____
|        |            >| iter 0  |
|        |             |_________|
| _______|____________/     |
|/ (done)|                  |
|<       |                  |
:        :                  :
:        :                  :
and the startup work (process initialization, etc.) of the second child is concurrent with (has no synchronizing edge with respect to) the 0th iteration of the first child.
I think the fix is pretty simple -- just write a byte from each child to say "I'm started up and ready", and wait for such a byte from each child before commencing the main loop in the parent. Thoughts?
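A minimal sketch of the suggested "ready" handshake, under the assumption that it reuses the same pipes as the iteration protocol. The names `signal_ready` and `wait_for_all_ready` are hypothetical, and the choice of a newline as the ready byte is arbitrary.

```rust
use std::io::{self, Read, Write};

/// Child side (hypothetical helper): announce that startup has finished.
pub fn signal_ready<W: Write>(to_parent: &mut W) -> io::Result<()> {
    to_parent.write_all(b"\n")?;
    to_parent.flush()
}

/// Parent side (hypothetical helper): block until every child has sent its
/// ready byte, so no child's startup overlaps with another child's iteration.
pub fn wait_for_all_ready<R: Read>(children: &mut [R]) -> io::Result<()> {
    for child_stdout in children.iter_mut() {
        let mut byte = [0u8; 1];
        // `read_exact` fails with `UnexpectedEof` if a child exits before
        // it becomes ready.
        child_stdout.read_exact(&mut byte)?;
    }
    Ok(())
}
```

The parent would call `wait_for_all_ready` once, after spawning all children and before entering the main loop.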
Good eye @cfallin! I'll fix that up in one sec.
…d process initialization
Looks good!
Looks like the tests are hanging on Windows. I wonder if …
Windows is hanging, and maybe this is the fix?
I have looked at this a few times over the last few days and there are a couple of concerns that I wasn't able to articulate, hence my lack of review. But here goes:
- this PR adds quite a bit of complexity but I'm not yet convinced that the benefits are worth it. I'm not saying that interleaving iterations isn't a good idea or that the benefits aren't there, but rather that I would like to see how much more accurate the results are with this enabled compared to without it being enabled. Is there some way to see that?
- Also, to mitigate the complexity, I wonder if this functionality shouldn't be encapsulated somehow (queues + a process pool, e.g.) so that it is easier to reason about separately from the "run a benchmark" logic. It might also make it easier to run things with and without it turned on to see if it is effective (see first point).
@fitzgen, what do you think? I don't want to block merging this, so go ahead if you believe in the change. But could you consider what I'm trying to get at above?
f2be469 to 726979e
I would expect that we would see less correlation between iteration # and the measurement's count (e.g., count of cycles or count of nanoseconds) as well as between process # and the measurement's count. We should, in theory, be able to measure this with a chi-squared test, but I haven't been able to get that working in R myself.
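As a rough illustration of what "correlation between iteration # and the measurement's count" could mean in practice, here is a sketch that computes a Pearson correlation coefficient. This is a simpler stand-in, not the chi-squared test mentioned above, and the function name `pearson` and its inputs are assumptions; it only captures linear dependence.

```rust
/// Pearson correlation between two equally long series, e.g. iteration index
/// versus measured cycle count. Returns `None` when the inputs differ in
/// length, have fewer than two points, or either series has zero variance.
pub fn pearson(xs: &[f64], ys: &[f64]) -> Option<f64> {
    if xs.len() != ys.len() || xs.len() < 2 {
        return None;
    }
    let n = xs.len() as f64;
    let mean_x = xs.iter().sum::<f64>() / n;
    let mean_y = ys.iter().sum::<f64>() / n;

    let (mut cov, mut var_x, mut var_y) = (0.0, 0.0, 0.0);
    for (x, y) in xs.iter().zip(ys) {
        let (dx, dy) = (x - mean_x, y - mean_y);
        cov += dx * dy;
        var_x += dx * dx;
        var_y += dy * dy;
    }
    if var_x == 0.0 || var_y == 0.0 {
        return None;
    }
    Some(cov / (var_x.sqrt() * var_y.sqrt()))
}
```

A coefficient near zero for interleaved runs, compared with a clearly nonzero one for back-to-back runs, would be consistent with the expectation stated above, though it would not by itself establish it.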
I've pulled the wait/notify logic of writing/reading to/from stdio out into helper methods, but I'm not sure how to simplify further without introducing a bunch more complexity/infrastructure than what we have here. Right now it is a minimal protocol over pipes, and doing something like a process pool is a whole lot more moving parts than what we have now. If you have concrete suggestions, I'm definitely open to them, because I agree it would be nice to factor this out a bit more!
@abrown did you have any specific suggestions for how to change this PR before we merge it?
What about something like the following:
Then in Here's another option:
This PR simplifies the current setup, where we are either doing single-process or multiprocess benchmarking, so that we are always doing multiprocess even if it is just one process. So there would only ever be a single implementation of this trait, which suggests the trait wouldn't be carrying its weight and would actually make things more complicated, with more pieces. Regarding factoring out the parent into something analogous to the …
Some folks noticed the lack of this behavior in their benchmarks (i.e. that the same benchmark would run 10 times in a row) and brought it up to me as a potential issue and source of measurement bias. Would be great to land this almost two years later! 😬 @abrown did you find my last comment convincing for why a …