Interleave benchmark iterations from different processes #143

fitzgen · 2021-06-04T20:41:42Z

This commit makes it so that, rather than spawning a process for a Wasm
benchmark and engine pair and running all iterations for that process
immediately, we now spawn a bunch of processes and then run one iteration from a
random one of them at a time. This interleaves benchmark iterations, not just
processes.

This means that we need to have communication back and forth between the parent
process and the child processes. Before, we had fork/join style subprocesses
where we just spawned a child and waited for it to finish before starting the
next subprocess and there was no communication between the parent and child
before the child was done executing. In order to interleave benchmark
iterations, we need to spawn many processes, have them wait on the parent, and
have the parent tell one subprocess at a time to run one iteration and then wait
again. This is done with a very simple stdin- and stdout-based protocol. All
children wait to read a newline on their stdin before they run an iteration.
The parent chooses a child, writes a newline to that child's stdin, and then
waits for the child to finish by reading a newline from the child's stdout.
When the child finishes its iteration, it writes a newline to its stdout. When
all of its iterations are complete, the child writes its results as JSON to
stdout.
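
To make the child's half of this protocol concrete, here is a minimal sketch. It is not the code from this PR: `child_loop`, its parameters, and the hand-rolled JSON array are all assumptions made for illustration, and the reader and writer are generic so the loop can be exercised without spawning a process.

```rust
use std::io::{self, BufRead, Write};

/// Child side of the newline protocol (hypothetical helper, not the PR's code):
/// wait for a newline, run one iteration, answer with a newline, and after the
/// last iteration write the collected measurements as one line of JSON.
pub fn child_loop<R: BufRead, W: Write>(
    mut from_parent: R,
    mut to_parent: W,
    iterations: usize,
    mut run_iteration: impl FnMut(usize) -> u64,
) -> io::Result<()> {
    let mut measurements = Vec::with_capacity(iterations);
    let mut line = String::new();
    for i in 0..iterations {
        // Block until the parent tells this child to go.
        line.clear();
        if from_parent.read_line(&mut line)? == 0 {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "parent closed the pipe before all iterations ran",
            ));
        }
        measurements.push(run_iteration(i));
        // Hand the baton back to the parent.
        to_parent.write_all(b"\n")?;
        to_parent.flush()?;
    }
    // Assumption: results are a plain JSON array of integers, written by hand
    // to keep the sketch free of dependencies.
    let body: Vec<String> = measurements.iter().map(|m| m.to_string()).collect();
    writeln!(to_parent, "[{}]", body.join(","))?;
    to_parent.flush()
}
```

In an actual child process the two handles would presumably be `std::io::stdin().lock()` and `std::io::stdout()`; with in-memory buffers, three newlines of input and three iterations yield three newlines followed by the JSON line.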

Fixes #139

Even when we are only using one process for all samples. This will make everything easier when the parent<-->child communication protocol changes to support interleaving iterations from different processes as part of bytecodealliance#139.

cfallin

This is really nice, thanks!

Just to make sure I understand the synchronization fully: child processes' iterations are fully mutually exclusive, because the parent only unblocks one at a time, and waits for the baton back before unblocking the next, yes? But: the first iteration(s) of the main loop might run while other child processes are still starting up, if startup is slow for some reason; in other words, we have the dependency graph:

Parent
  |            (spawn N children)
  |\_________________________________________
  |        |                |                |   ...
  |      __|____            |
  |     |startup|        ___|___
  | (go)|       |       |startup|
  |\__  |_______|       |       |
  |   |    |            |       |
  |   | ___|_____       |       |
  |   >| iter 0  |      |_______|
  |    |_________|          |
  |  ___/  |                |
  | /(done)|                |
  |<       |                |
  |        |                |
  | (go)   |                |
  |\_______|_________       |
  |        |         \  ____|____
  |        |          >| iter 0  |
  |        |           |_________|
  | _______|____________/   |
  |/ (done)|                |
  |<       |                |
  :        :                :
  :        :                :

and the startup work (process initialization, etc.) of the second child is concurrent with (has no synchronizing edge with respect to) the 0th iteration of the first child.

I think the fix is pretty simple -- just write a byte from each child to say "I'm started up and ready", and wait for such a byte from each child before commencing the main loop in the parent. Thoughts?
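
A minimal sketch of the parent's side of that readiness barrier, assuming each child writes a single newline once its startup is done. The function name and the use of generic `BufRead` handles in place of real child stdout pipes are hypothetical.

```rust
use std::io::{self, BufRead};

/// Parent side of the startup barrier: block until every child has written
/// one "ready" line. `children_stdout` stands in for each child's piped
/// stdout (an assumption made so the sketch runs without real processes).
pub fn wait_for_all_ready<R: BufRead>(children_stdout: &mut [R]) -> io::Result<()> {
    let mut line = String::new();
    for (index, stdout) in children_stdout.iter_mut().enumerate() {
        line.clear();
        // A zero-byte read means the child exited before finishing startup.
        if stdout.read_line(&mut line)? == 0 {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                format!("child {} exited before signaling readiness", index),
            ));
        }
    }
    Ok(())
}
```

Only after this returns `Ok(())` would the parent enter the main loop, so no child's startup work overlaps a measured iteration.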

fitzgen · 2021-06-04T22:09:23Z

Good eye @cfallin! I'll fix that up in one sec.

cfallin

Looks good!

fitzgen · 2021-06-04T22:59:56Z

Looks like the tests are hanging on Windows. I wonder if \n isn't flushing stdout? I'll try explicitly flushing.
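
For illustration, the general hazard being guessed at here is that a newline written through a buffered writer may sit in user-space memory until it is flushed, leaving the reader on the other end of the pipe blocked. The sketch below demonstrates that with a `BufWriter` over an in-memory sink. Both function names are hypothetical, and the thread does not establish that a missing flush was the actual cause of the Windows hang.

```rust
use std::io::{self, BufWriter, Write};

/// Write the one-byte signal and flush, so it leaves any user-space buffer
/// immediately (hypothetical helper name).
pub fn send_signal<W: Write>(pipe: &mut W) -> io::Result<()> {
    pipe.write_all(b"\n")?;
    pipe.flush()
}

/// Returns how many bytes had reached the underlying sink before and after
/// an explicit flush when a newline is written through a `BufWriter`.
pub fn bytes_delivered_before_and_after_flush() -> io::Result<(usize, usize)> {
    let mut buffered = BufWriter::new(Vec::new());
    buffered.write_all(b"\n")?;
    // The byte is still in the BufWriter's internal buffer at this point.
    let before = buffered.get_ref().len();
    buffered.flush()?;
    let after = buffered.get_ref().len();
    Ok((before, after))
}
```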

abrown

I have looked at this a few times over the last few days and there are a couple of concerns that I wasn't able to articulate, hence my lack of review. But here goes:

- This PR adds quite a bit of complexity, but I'm not yet convinced that the benefits are worth it. I'm not saying that interleaving iterations isn't a good idea or that the benefits aren't there, but rather that I would like to see how much more accurate the results are with this enabled compared to without it. Is there some way to see that?
- Also, to mitigate the complexity, I wonder if this functionality shouldn't be encapsulated somehow (queues + a process pool, e.g.) so that it is easier to reason about separately from the "run a benchmark" logic. It might also make it easier to run things with and without it turned on to see if it is effective (see first point).

@fitzgen, what do you think? I don't want to block merging this, so go ahead if you believe in the change. But could you consider what I'm trying to get at above?

fitzgen · 2021-06-14T20:32:37Z

> I would like to see how much more accurate the results are with this enabled compared to without it being enabled. Is there some way to see that?

I would expect that we would see less correlation between iteration # and the measurement's count (e.g. count of cycles or count of nanoseconds), as well as between process # and the measurement's count. We should, in theory, be able to measure this with a chi-squared test, but I haven't been able to get that working in R myself.
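
As a rough first check of that expectation, one could compute a plain Pearson correlation coefficient between the iteration index and the measured count, once with interleaving and once without. This swaps in a simpler statistic than the chi-squared test mentioned above, and it is not something that was run in this thread; the function below is a hypothetical sketch.

```rust
/// Pearson correlation coefficient between two equally long samples.
/// Returns `None` if the inputs differ in length, have fewer than two
/// points, or either one has zero variance.
pub fn pearson(xs: &[f64], ys: &[f64]) -> Option<f64> {
    if xs.len() != ys.len() || xs.len() < 2 {
        return None;
    }
    let n = xs.len() as f64;
    let mean_x = xs.iter().sum::<f64>() / n;
    let mean_y = ys.iter().sum::<f64>() / n;
    let mut cov = 0.0_f64;
    let mut var_x = 0.0_f64;
    let mut var_y = 0.0_f64;
    for (x, y) in xs.iter().zip(ys) {
        let dx = *x - mean_x;
        let dy = *y - mean_y;
        cov += dx * dy;
        var_x += dx * dx;
        var_y += dy * dy;
    }
    if var_x == 0.0 || var_y == 0.0 {
        return None;
    }
    // The 1/n factors in covariance and variances cancel out.
    Some(cov / (var_x.sqrt() * var_y.sqrt()))
}
```

Here `xs` would be the iteration (or process) numbers and `ys` the measured cycles or nanoseconds; a coefficient closer to zero with interleaving enabled would be consistent with the expectation above, though it only captures linear dependence.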

> Also, to mitigate the complexity, I wonder if this functionality shouldn't be encapsulated somehow (queues + a process pool, e.g.) so that it is easier to reason about separately from the "run a benchmark" logic. It might also make it easier to run things with and without it turned on to see if it is effective (see first point).

I've pulled the wait/notify logic of writing/reading to/from stdio out into helper methods, but I'm not sure how to simplify further without introducing a bunch more complexity/infrastructure than what we have here. Right now it is a minimal protocol over pipes, and doing something like a process pool is a whole lot more moving parts than what we have now. If you have concrete suggestions, I'm definitely open to them, because I agree it would be nice to factor this out a bit more!
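
For readers who have not looked at the diff, helper methods of this kind might look roughly like the following. This is a sketch under assumptions, not the PR's code: `ChildPipes`, `notify`, `wait`, and `run_one_iteration` are illustrative names, and the pipes are generic so the round trip can be shown with in-memory buffers.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical wrapper around one child's pipes. In a real parent, `W`
/// would be the child's piped stdin and `R` a buffered reader over its
/// piped stdout.
pub struct ChildPipes<W: Write, R: BufRead> {
    pub stdin: W,
    pub stdout: R,
}

impl<W: Write, R: BufRead> ChildPipes<W, R> {
    pub fn new(stdin: W, stdout: R) -> Self {
        Self { stdin, stdout }
    }

    /// Tell the child to run one iteration.
    pub fn notify(&mut self) -> io::Result<()> {
        self.stdin.write_all(b"\n")?;
        self.stdin.flush()
    }

    /// Block until the child reports that its iteration is done.
    pub fn wait(&mut self) -> io::Result<()> {
        let mut line = String::new();
        match self.stdout.read_line(&mut line)? {
            0 => Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "child closed stdout",
            )),
            _ => Ok(()),
        }
    }

    /// One full round trip: send "go", then wait for "done".
    pub fn run_one_iteration(&mut self) -> io::Result<()> {
        self.notify()?;
        self.wait()
    }
}
```

The parent's main loop would then pick a random child and call `run_one_iteration` on it, which keeps the protocol details out of the benchmarking logic without adding a process pool.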

fitzgen · 2021-06-21T17:13:40Z

@abrown did you have any specific suggestions for how to change this PR before we merge it?

abrown · 2021-06-21T23:49:31Z

What about something like the following:

trait BenchmarkExecutionStrategy {
  fn execute(&self) -> Result<()>;
}
struct SameProcess(...); // Maybe wrap up `BenchmarkCommand` here? Or an intermediate struct?
struct MultiProcess(...);
struct MultiProcessInterleaved(...);

Then in BenchmarkCommand::execute pick the appropriate strategy and call execute on that. Perhaps the BenchmarkCommand flags get passed down into the implementations as you have done for Child but maybe it's cleaner to create an intermediate struct that represents the "engine-wasm-iterations-etc." tuple needed to actually run the benchmark.

Here's another option:

struct BenchmarkExecution { engine, wasm, iterations, ... }
struct BenchmarkResults { /* there is probably already a type for this */ }
trait BenchmarkExecutionQueue {
  async fn submit(&mut self, work: BenchmarkExecution) -> Result<BenchmarkResults>;
}
struct MultiProcess(...);
struct MultiProcessInterleaved(..

fitzgen · 2021-06-22T18:23:40Z

> trait BenchmarkExecutionStrategy
>
> ...
>
> trait BenchmarkExecutionQueue

This PR simplifies the current setup, where we do either single-process or multi-process benchmarking, so that we always do multi-process benchmarking, even if there is just one process. So there would only ever be a single implementation of this trait, which means the trait wouldn't be carrying its weight and would actually make things more complicated, with more pieces.

Regarding factoring out the parent into something analogous to the Child struct: with child processes, we have reified state we can encapsulate: the child process object and its stdin and stdout objects. With the parent, that state isn't something we can encapsulate and control access to because it is ambient in the whole process: any bit of code can add a println!. This could be an argument for using named pipes instead of stdin/stdout but then we get into portability issues and would need some kind of solution for Windows, and we are once again talking about a lot of new infrastructure and moving parts.

fitzgen · 2023-02-13T19:31:52Z

Some folks noticed the lack of this behavior in their benchmarks (i.e. that the same benchmark would run 10 times in a row) and brought it up to me as a potential issue and source of measurement bias.

Would be great to land this almost two years later! 😬

@abrown did you find my last comment convincing for why a trait would be overkill here? If so, I can rebase this and we can finally merge it.

fitzgen added 2 commits June 4, 2021 11:28

fitzgen requested review from cfallin and abrown June 4, 2021 20:41

Give anyhow dependency an actual version

3b3dea7

Not sure how things were working before...

fitzgen mentioned this pull request Jun 4, 2021

Always use "faster" in effect size summary #144

Merged

cfallin reviewed Jun 4, 2021

Ensure that the first iteration doesn't happen concurrently with child process initialization

a780663

fitzgen requested a review from cfallin June 4, 2021 22:23

cfallin approved these changes Jun 4, 2021

fitzgen added 5 commits June 4, 2021 16:03

Flush stdout/child-stdin when we write newlines into them

42d3f5c

Windows is hanging, and maybe this is the fix?

CI: set the RUST_LOG level to debug

0ab7209

Add more logging for IPC protocol

f6ca5d4

CI: only run cargo test not cargo build

cf7cfce

CI: Don't capture stdout/stderr and run single threaded

534c79c

abrown reviewed Jun 9, 2021

fitzgen added 2 commits June 10, 2021 10:22

Log stdout/stderr for sightglass-cli test commands

4b5a70b

Read children's stdout before waiting on them to exit

726979e

fitzgen force-pushed the interleave-benchmark-iterations branch from f2be469 to 726979e Compare June 14, 2021 20:17

abrown approved these changes Mar 7, 2023
