Run tests in parallel #234

alysbrooks · 2021-07-21T18:30:04Z

Very much a work in progress! Mostly opening this PR so I have a place to take and share notes.

alysbrooks · 2021-07-21T22:53:34Z

A few notes:
—Somewhat unusually, this is currently rebased off of Arne's cleanup branch (#233) rather than main because I thought it might benefit from some of the cleanup work and the use of Portal.
—The basic strategy is as follows: for kaocha.type/ns tests (basic Clojure tests), fire off run-tests in a new thread using future, then return the futures as normal. At the top level (the test suite), recursively search for futures and deref them. That way we essentially deref at the last possible moment (after all the tests have been kicked off but before any external code that's expecting actual test results).
—One complication is that we don't want to execute in a new thread if we're already in a worker thread. I'm not sure whether nested futures are supposed to work, but it doesn't seem to work in practice and I'm not sure it would benefit us. The approach I've taken is to modify the configuration to remove :parallel.
—I've benchmarked this on contrived tests that favor a parallel approach (very simple tests with Thread/sleep calls). This is mostly to sniff out bugs (a lot of attempts early this afternoon resulted in no speedup because Kaocha was waiting for each thread to finish before running the next one); real-world tests probably won't be this parallelizable.
—I cannot run the whole Kaocha test suite with :parallel enabled. Ideally, you'd be able to enable this option and Kaocha just ignores it for tests that don't support it, but ensuring it can handle all the corner cases in our test suite may be too ambitious for the initial PR.

alysbrooks · 2021-07-21T23:55:12Z

I did a little bit more testing and realized what I have doesn't work consistently. There seems to be a nondeterministic bug with loading certain specs or vars. I don't really understand it; my intuition is that all the loading would happen before we kick off any futures so this kind of bug shouldn't really happen.

We do some fancy dynamic loading (I think jit falls into this category, too), and it's possible some of these techniques aren't thread safe.

alysbrooks · 2021-07-22T00:01:52Z

Another thing to look for is places we're touching atoms or dynamic vars in code that's called within the futures. Modifying a var or atom in a secondary thread isn't inherently problematic, but it's possible we've accidentally used them in a way that makes them reliant on being modified by a single thread. (For example, two atoms won't stay in sync; you should either use one atom or add some sort of other abstraction to keep them in sync.)

alysbrooks · 2021-08-27T04:37:58Z

Now it seems to be consistently giving me an error about not find the spec for :kaocha.type/var.

alysbrooks · 2021-08-27T05:24:02Z

I added some retry code. I think what's happening is that there's a race condition when running require. ~~I suppose we could create a thread-safe version of require~~ Turns out that serialize-require is pretty easy to replicate, so I did that.

alysbrooks · 2021-08-27T06:30:15Z

Did a slightly more realistic benchmark with 1000 simple io tests (simply slurping README.md).

with :parallel:

Executed in    9.39 secs   fish           external 
   usr time   23.17 secs    0.00 micros   23.17 secs 
   sys time    0.57 secs  418.00 micros    0.57 secs

without :parallel:

Executed in   67.13 secs   fish           external 
   usr time   81.38 secs  569.00 micros   81.38 secs 
   sys time    0.87 secs   94.00 micros    0.87 secs

And another that grabs random Wikipedia articles. (Only 25 tests, so as to not be rude.)

with :parallel:

Executed in    5.78 secs   fish           external 
   usr time   14.03 secs  494.00 micros   14.03 secs 
   sys time    0.42 secs   75.00 micros    0.42 secs

without :parallel:

Executed in   20.25 secs   fish           external 
   usr time   15.82 secs  596.00 micros   15.81 secs 
   sys time    0.66 secs   83.00 micros    0.66 secs

On the one hand, these aren't super revealing if you've seen people talk about the advantages of async and/or multithreading. On the other hand, reading files isn't too uncommon of a task for integration testing, so it's plausible you could see a significant improvement on that part of your test suite.

alysbrooks · 2021-08-28T01:48:23Z

If you have a test suite that would benefit from parallelization, I'd be interested to hear from you. What tests do you have? When it's a little more polished, would you be willing to test or benchmark parallelized test runs?

It's easy enough to document how to turn on parallelized test run (set :parallel to true in tests.edn), but I think for this feature to be useful, the documentation will also have to include information on what kinds of tests benefit from it and what kinds of common tests aren't thread-safe. I'm hoping to hear from Kaocha users to help research this part of the documentation.

Having real-world tests will also let me determine whether parallelizing at the namespace level is enough (current approach) or if we need to consider working at the test level.

nwjsmith · 2021-08-28T14:56:57Z

I'll be able to test and benchmark in a few days. The primary reason I'd like to see test parallelization is to speed up my suite of test.check tests. Parallelization at the per-test level would have the largest impact on these, but even per-namespace should provide a speed bump. Really excited to see your work on this, thank you!

alysbrooks · 2021-08-30T23:06:20Z

@nwjsmith You're welcome! I somehow blanked on test.check as a use case even though I'm a big fan of property-based testing and we use it on some of our projects, e.g., Regal. I believe test.check tests should work already (because they're gathered into namespace testables, which are parallelized), but enabling test.check tests at the test level would be another step because they're a different type.

alysbrooks · 2021-10-19T23:04:59Z

Two updates:

You can now use the `:parallel-children-exclude` key to ensure that any children will not be further parallelized.

Setting a type of test whose children shouldn't be parallelized probably isn't the configuration we ultimately want but is easier to implement because currently we decide whether to parallelize something when we're given a bunch of testables. However, it does let you choose between tests at the namespace level (add :kaocha.type/ns) and those at the test level (leave it blank).

Regal's test suite in parallel passes

Our first real-world performance test:

↪ hyperfine 'clojure -A:test -m kaocha.runner clj' 'clojure -A:test-local -m kaocha.runner clj'
Benchmark #1: clojure -A:test -m kaocha.runner clj
  Time (mean ± σ):     43.235 s ±  6.081 s    [User: 76.913 s, System: 2.508 s]
  Range (min … max):   32.442 s … 54.122 s    10 runs
 
Benchmark #2: clojure -A:test-local -m kaocha.runner clj
  Time (mean ± σ):     40.678 s ± 11.089 s    [User: 83.026 s, System: 2.449 s]
  Range (min … max):   27.970 s … 57.834 s    10 runs
 
Summary
  'clojure -A:test-local -m kaocha.runner clj' ran
    1.06 ± 0.33 times faster than 'clojure -A:test -m kaocha.runner clj'

(test-local uses a local copy of Kaocha with the parallelize branch checked out.)

alysbrooks · 2021-10-20T00:00:41Z

The previous test had high variance so I quit some resource-intensive programs and set the tests to use a fixed seed

And with variance addresed, we get a 10 percent speed up:

↪ hyperfine 'clojure -A:test -m kaocha.runner clj' 'clojure -A:test-local -m kaocha.runner clj'
Benchmark #1: clojure -A:test -m kaocha.runner clj
  Time (mean ± σ):     43.979 s ±  0.905 s    [User: 80.840 s, System: 2.393 s]
  Range (min … max):   42.434 s … 45.172 s    10 runs
 
Benchmark #2: clojure -A:test-local -m kaocha.runner clj
  Time (mean ± σ):     39.900 s ±  1.060 s    [User: 84.105 s, System: 2.453 s]
  Range (min … max):   37.989 s … 41.194 s    10 runs
 
Summary
  'clojure -A:test-local -m kaocha.runner clj' ran
    1.10 ± 0.04 times faster than 'clojure -A:test -m kaocha.runner clj'

Regal isn't the ideal example for this. Even though it has a time-consuming test suite, the majority of its time is taken up by a single test. Even though the test is a property based test that really consists of running the same simple test over a number of random inputs, from Kaocha's perspective, it's a single test.

alysbrooks · 2021-10-20T18:20:13Z

I did some more tests, and parallelization's running fine! 🎉 Tests consistently show parallelization is faster (although sometimes the confidence interval overlaps with it being slower), if only a little bit faster. This is actually really good news—it means that we are not creating so much overhead that it is likely to slow down tests. I'm still not planning on turning it on by default, but it mean that people don't necessarily have to do a ton of benchmarking to ensure it won't make things worse for their test suite.

alysbrooks · 2021-10-25T19:58:40Z

I noticed some cases where parallelization does indeed add overhead (I think tests ran 5 to 10 percent slower?), so I spent some time investigating. Last week, I did some benchmarking with Criterium and today I did some profiling/sampling with VisualVM to see whether part of our implementation is adding overhead and could be optimized. Profiling was a dead end (too slow), but sampling revealed that one of the methods the most time spent in was java.lang.Thread.<init>. A lot of time was spent in java.util.concurrent.AbstractExecutorService.submit (), too.

I think I should put this on hold because it could easily become a rabbit hole.

AndreaCrotti · 2022-01-06T15:25:56Z

I wonder if one possible alternative solution would be to support something like https://www.juxt.pro/blog/parallel-kaocha
So circleci allows you to run tests in parallel, just by running a different subset of tests in every different container.
The more tricky thing circleci does is that it merges back the test results and present the results like tests were run normally.

Maybe supporting something like this in kaocha directly would be easier and not have the same Threading performance issues?

alysbrooks · 2022-01-08T01:09:44Z

@AndreaCrotti That's a clever solution. What changes were you thinking to make? Adding the circleci-parallel namespace code to Kaocha itself?

While I am interested in that solution, I plan to go ahead with the parallel option.

AndreaCrotti · 2022-01-11T14:52:09Z

So well @alysbrooks, I guess there are three things required for this to work:

some way to group tests in a way that each group takes more or less the same time (which circleci does for you with from the past timing information)
some code to then run in parallel multiple kaochas with the various groups of namespaces
some code to get the results from all the kaochas and give a unified output.

Maybe only 3. could be done directly in kaocha, but probably the whole thing could be done with a plugin ideally.
For example, you could:

run all your tests without parallelization and record the timings
group and run all the kaochas and report everything.

One issue with the approach in this PR is that every test suite with database (or similar) writes will probably need quite a bit of adjusting to run in parallel.
This other approach I was suggesting in theory has the same issue, but if you parameterize your test resources configuration, you could also be able to just use a totally different test infrastructure for each kaocha run, and avoid the problem entirely (and it would also avoid the Threading overhead issue as well).

AndreaCrotti · 2022-01-11T14:53:08Z

So well in short I guess it would be great to have this PR merged, but maybe I can investigate the other way to parallelize tests as a separate plugin as a possible alternative strategy.

alysbrooks · 2022-01-11T23:24:18Z

@AndreaCrotti Interesting. I could see 2 being done with a Babashka script that we ship, and I think 3 could be a plugin, in theory. They'd be stretching the idea of plugins because the instance of Kaocha doing the combining wouldn't necessarily run any tests, it would just grab the test results and inserts them so the reporters could summarize them. Actually, a plugin that takes saved test results and feeds them back into Kaocha as though those tests just ran could be pretty interesting.

alysbrooks · 2022-01-12T04:51:36Z

Threw together a simple graph of the Regal results:

stig · 2022-04-14T10:43:59Z

I'm interested in trying this in our CI. Is a snapshot published for this branch?

stig

I'm excited about this PR :-)

I took the liberty to add a few stylistic nitpick suggestions.

src/kaocha/api.clj

+                  (throw+ {:kaocha/early-exit 0}))
+                (when (:parallel config)
+                  (output/warn (str "Parallelization enabled. This feature is in "
+                                    "beta If you encounter errors, try "


repl_sessions/benchmark.clj

+
+(ns benchmark
+  (:require [criterium.core :as c])
+  (:import [java.util.concurrent Executors ])
+  )


src/kaocha/runner.clj

@@ -34,6 +35,7 @@
   [nil  "--[no-]fail-fast"     "Stop testing after the first failure."]
   [nil  "--[no-]color"         "Enable/disable ANSI color codes in output. Defaults to true."]
   [nil  "--[no-]watch"         "Watch filesystem for changes and re-run tests."]
+   [nil  "--[no-]parallel"           "Run tests in parallel. Warning: This feature is beta."]


src/kaocha/test_suite.clj

@@ -4,10 +4,10 @@

 (defn run [testable test-plan]
  (t/do-report {:type :begin-test-suite})
-  (let [results (testable/run-testables (:kaocha.test-plan/tests testable) test-plan)
+  (let [results (testable/run-testables (:kaocha.test-plan/tests testable) test-plan) 


alysbrooks · 2022-12-21T23:11:22Z

I added a profiling plugin you can run by enabling :kaocha.plugin.alpha/parallel-profiling to :plugins. I don't want to add too much to this PR, but it's a plugin so the risk is relatively low.

jumarko · 2023-07-31T10:32:58Z

Are there any plans to wrap up support for parallel test runs soon or should I resort to eftest if I need such a test runner?

AndreaCrotti · 2023-07-31T14:16:06Z

Are there any plans to wrap up support for parallel test runs soon or should I resort to eftest if I need such a test runner?

In case it helps you can still get some parallelism with circleci and kaocha for example https://www.juxt.pro/blog/parallel-kaocha/

marcobiscaro2112 · 2023-11-22T01:44:28Z

src/kaocha/testable.clj

+        futures (doall
+                 (map (fn [t]
+                        (future
+                          (run-testable (assoc t ::thread-info (current-thread-info)) test-plan)))
+                      testables))]
+    (doall (map deref futures))))


Here we are creating an unbounded number of threads. If the test plan contains too many tests (hundreds or thousands) this will probably be a problem.

OS threads are expensive to create, and when we have a number of running threads that is significantly greater than the number of available processors, we'll also incur in a lot of context switching cost.

This probably explains what you observed in your previous comment, because for each future invocation, we're creating a new thread to be executed (via ExecutorService.submit):

I noticed some cases where parallelization does indeed add overhead (I think tests ran 5 to 10 percent slower?)

one of the methods the most time spent in was java.lang.Thread.. A lot of time was spent in java.util.concurrent.AbstractExecutorService.submit ()

One alternative we could explore instead is to create a fixed thread pool ExecutorService limited to the number of available cores.

We then can map each test to a function that runs it, and pass that to invokeAll (this works because a Clojure function implements Callable). This returns a list of Future, so we can do the (doall (map deref futures)) on it the same way we're already doing.

We're still going to create a Callable and a Future per task, but we'll limit the number of threads created and the parallelism of execution to the available number of processors.

This should avoid the overhead of too much parallelism and hopefully result in parallel tests always running faster than serial tests.

What do you think?

Something like

(import '(java.util.concurrent ExecutorService Executors)) (defn task [num] (fn [] (.println System/out num) (Thread/sleep 500) (* num num))) (let [num-cpus (.availableProcessors (Runtime/getRuntime)) executor (Executors/newFixedThreadPool num-cpus) data (range 100) tasks (map task data) futures (.invokeAll executor tasks) results (doall (map deref futures))] (println results))

I do something similar in Splint:

(defn pmap* "Efficient version of pmap which avoids the overhead of lazy-seq. (doall (pmap (fn [_] (Thread/sleep 100)) coll)) => 3.34 secs (pmap* (fn [_] (Thread/sleep 100)) coll) => 202 ms" [f coll] (let [executor (Executors/newCachedThreadPool) futures (mapv #(.submit executor (reify Callable (call [_] (f %)))) coll) ret (mapv #(.get ^Future %) futures)] (.shutdownNow executor) ret))

I couldn't remember the default executor for future, and it appears to be newCachedThreadPool so it will save some on overhead for short-lived threads, but probably creates too many in other situations.

For future reference, Clojure core's fixed thread pool uses the number of processors plus 2: https://github.com/clojure/clojure/blob/08a2d9bdd013143a87e50fa82e9740e2ab4ee2c2/src/jvm/clojure/lang/Agent.java#L49C1-L51C77

Add some information about the thread each test ran on. May want to move this to something not specific to parallelization, otherwise non-parallel tests won't have any thread info.

Internal refers to whether there were delays between running tests in the same thread. It doesn't measure waiting for I/O or many kinds of contention. External refers to how much of the overall time the thread spent. Why do I think this is useful? Well, if you've parallelized your test suite and you end up with a thread that runs from time 0 to time 100 and another that runs from time 1 to time 10, the burden is heavily on the first thread, suggsting the load is not well-balanced. Both may have good internal utilization, but the second has poor external globalization. Part of the reason this is WIP is that there may be better (ideally standard) terms for "internal" and "external". There are some calculation issues and hacks: * Some testables are missing the ::start and ::end for some reason. I don't think this should happen? * I don't think this version counts the number of threads accurately.

Clean up the changes around parallelization, make opting-in from test types explicit, allow config on testable and metadata level, simplify future handling.

alysbrooks force-pushed the parallelize branch from 74d1f81 to 4a5b1b6 Compare July 21, 2021 22:34

alysbrooks changed the title ~~Parallelize~~ WIP: Run tests in parallel Jul 21, 2021

alysbrooks force-pushed the parallelize branch from aa00f15 to 601f7fa Compare September 2, 2021 00:38

plexus mentioned this pull request Sep 14, 2021

Kaocha cloverage plugin doesn't create reports when used with kaocha.repl/run lambdaisland/kaocha-cloverage#16

Open

alysbrooks force-pushed the parallelize branch from 2d69775 to 1748ba6 Compare March 8, 2022 20:16

alysbrooks marked this pull request as ready for review March 8, 2022 22:20

alysbrooks changed the title ~~WIP: Run tests in parallel~~ Run tests in parallel Mar 8, 2022

plexus mentioned this pull request Mar 19, 2022

Running tests in parallel #81

Open

imrekoszo mentioned this pull request Apr 8, 2022

Running tests in parallel polyfy/polylith#153

Closed

stig reviewed Apr 14, 2022

View reviewed changes

ArielA147 force-pushed the main branch 2 times, most recently from a569099 to 4e9e7e2 Compare November 29, 2022 20:12

alysbrooks self-assigned this Feb 23, 2023

plexus force-pushed the parallelize branch from e0eb607 to dc258bb Compare May 12, 2023 20:54

marcobiscaro2112 reviewed Nov 22, 2023

View reviewed changes

alysbrooks and others added 21 commits November 22, 2023 00:26

Add data structure foundation.

32e8eed

Work from pairing session with Arne.

e0f0300

Add semi-functional version (with :parallel) as the flag.

f128717

Add some debugging code for now.

3382198

Add workarounds for now.

6a35e46

Replace workarounds by locking require.

a5d2f7f

Allow parallelization to happen at any level.

20e2812

Clean up.

cb09129

Clean up.

503bfe7

Add benchmarking.

544a3d3

Add warning.

141683e

Add commandline flag.

cc983c1

Clean up.

e6382b8

Add docs and CHANGELOG entry.

8e07ed6

Fix dash.

fd5bfe9

Tweak wording further.

dc52eb1

Add option to filter out by test-suite.

0136e65

Add thread info.

78f2312

Add some information about the thread each test ran on. May want to move this to something not specific to parallelization, otherwise non-parallel tests won't have any thread info.

Post-rebase fixups

fbfd541

Polish up parallelization: clearer config, docs, vestigial code

615ea82

Clean up the changes around parallelization, make opting-in from test types explicit, allow config on testable and metadata level, simplify future handling.

alysbrooks force-pushed the parallelize branch from dc258bb to 615ea82 Compare November 22, 2023 06:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Run tests in parallel #234

Run tests in parallel #234

alysbrooks commented Jul 21, 2021 •

edited

Loading

alysbrooks commented Jul 21, 2021

alysbrooks commented Jul 21, 2021

alysbrooks commented Jul 22, 2021

alysbrooks commented Aug 27, 2021

alysbrooks commented Aug 27, 2021

alysbrooks commented Aug 27, 2021 •

edited

Loading

alysbrooks commented Aug 28, 2021

nwjsmith commented Aug 28, 2021

alysbrooks commented Aug 30, 2021

alysbrooks commented Oct 19, 2021 •

edited

Loading

alysbrooks commented Oct 20, 2021 •

edited

Loading

alysbrooks commented Oct 20, 2021

alysbrooks commented Oct 25, 2021

AndreaCrotti commented Jan 6, 2022

alysbrooks commented Jan 8, 2022

AndreaCrotti commented Jan 11, 2022

AndreaCrotti commented Jan 11, 2022

alysbrooks commented Jan 11, 2022

alysbrooks commented Jan 12, 2022

stig commented Apr 14, 2022

stig left a comment

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

alysbrooks commented Dec 21, 2022

jumarko commented Jul 31, 2023

AndreaCrotti commented Jul 31, 2023

marcobiscaro2112 Nov 22, 2023

marcobiscaro2112 Nov 22, 2023

NoahTheDuke Nov 22, 2023

alysbrooks Nov 22, 2023

alysbrooks Nov 22, 2023

Run tests in parallel #234

Are you sure you want to change the base?

Run tests in parallel #234

Conversation

alysbrooks commented Jul 21, 2021 • edited Loading

alysbrooks commented Jul 21, 2021

alysbrooks commented Jul 21, 2021

alysbrooks commented Jul 22, 2021

alysbrooks commented Aug 27, 2021

alysbrooks commented Aug 27, 2021

alysbrooks commented Aug 27, 2021 • edited Loading

alysbrooks commented Aug 28, 2021

nwjsmith commented Aug 28, 2021

alysbrooks commented Aug 30, 2021

alysbrooks commented Oct 19, 2021 • edited Loading

You can now use the :parallel-children-exclude key to ensure that any children will not be further parallelized.

Regal's test suite in parallel passes

alysbrooks commented Oct 20, 2021 • edited Loading

alysbrooks commented Oct 20, 2021

alysbrooks commented Oct 25, 2021

AndreaCrotti commented Jan 6, 2022

alysbrooks commented Jan 8, 2022

AndreaCrotti commented Jan 11, 2022

AndreaCrotti commented Jan 11, 2022

alysbrooks commented Jan 11, 2022

alysbrooks commented Jan 12, 2022

stig commented Apr 14, 2022

stig left a comment

Choose a reason for hiding this comment

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

alysbrooks commented Dec 21, 2022

jumarko commented Jul 31, 2023

AndreaCrotti commented Jul 31, 2023

marcobiscaro2112 Nov 22, 2023

Choose a reason for hiding this comment

marcobiscaro2112 Nov 22, 2023

Choose a reason for hiding this comment

NoahTheDuke Nov 22, 2023

Choose a reason for hiding this comment

alysbrooks Nov 22, 2023

Choose a reason for hiding this comment

alysbrooks Nov 22, 2023

Choose a reason for hiding this comment

alysbrooks commented Jul 21, 2021 •

edited

Loading

alysbrooks commented Aug 27, 2021 •

edited

Loading

alysbrooks commented Oct 19, 2021 •

edited

Loading

You can now use the `:parallel-children-exclude` key to ensure that any children will not be further parallelized.

alysbrooks commented Oct 20, 2021 •

edited

Loading