-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run tests in parallel #234
base: main
Are you sure you want to change the base?
Conversation
A few notes: |
I did a little bit more testing and realized what I have doesn't work consistently. There seems to be a nondeterministic bug with loading certain specs or vars. I don't really understand it; my intuition is that all the loading would happen before we kick off any futures so this kind of bug shouldn't really happen. We do some fancy dynamic loading (I think |
Another thing to look for is places we're touching atoms or dynamic vars in code that's called within the futures. Modifying a var or atom in a secondary thread isn't inherently problematic, but it's possible we've accidentally used them in a way that makes them reliant on being modified by a single thread. (For example, two atoms won't stay in sync; you should either use one atom or add some sort of other abstraction to keep them in sync.) |
Now it seems to be consistently giving me an error about not find the spec for |
I added some retry code. I think what's happening is that there's a race condition when running |
Did a slightly more realistic benchmark with 1000 simple io tests (simply with
without
And another that grabs random Wikipedia articles. (Only 25 tests, so as to not be rude.) with
without
On the one hand, these aren't super revealing if you've seen people talk about the advantages of async and/or multithreading. On the other hand, reading files isn't too uncommon of a task for integration testing, so it's plausible you could see a significant improvement on that part of your test suite. |
If you have a test suite that would benefit from parallelization, I'd be interested to hear from you. What tests do you have? When it's a little more polished, would you be willing to test or benchmark parallelized test runs? It's easy enough to document how to turn on parallelized test run (set :parallel to true in tests.edn), but I think for this feature to be useful, the documentation will also have to include information on what kinds of tests benefit from it and what kinds of common tests aren't thread-safe. I'm hoping to hear from Kaocha users to help research this part of the documentation. Having real-world tests will also let me determine whether parallelizing at the namespace level is enough (current approach) or if we need to consider working at the test level. |
I'll be able to test and benchmark in a few days. The primary reason I'd like to see test parallelization is to speed up my suite of |
@nwjsmith You're welcome! I somehow blanked on |
aa00f15
to
601f7fa
Compare
Two updates: You can now use the
|
The previous test had high variance so I quit some resource-intensive programs and set the tests to use a fixed seed And with variance addresed, we get a 10 percent speed up:
Regal isn't the ideal example for this. Even though it has a time-consuming test suite, the majority of its time is taken up by a single test. Even though the test is a property based test that really consists of running the same simple test over a number of random inputs, from Kaocha's perspective, it's a single test. |
I did some more tests, and parallelization's running fine! 🎉 Tests consistently show parallelization is faster (although sometimes the confidence interval overlaps with it being slower), if only a little bit faster. This is actually really good news—it means that we are not creating so much overhead that it is likely to slow down tests. I'm still not planning on turning it on by default, but it mean that people don't necessarily have to do a ton of benchmarking to ensure it won't make things worse for their test suite. |
I noticed some cases where parallelization does indeed add overhead (I think tests ran 5 to 10 percent slower?), so I spent some time investigating. Last week, I did some benchmarking with Criterium and today I did some profiling/sampling with VisualVM to see whether part of our implementation is adding overhead and could be optimized. Profiling was a dead end (too slow), but sampling revealed that one of the methods the most time spent in was I think I should put this on hold because it could easily become a rabbit hole. |
I wonder if one possible alternative solution would be to support something like https://www.juxt.pro/blog/parallel-kaocha Maybe supporting something like this in kaocha directly would be easier and not have the same Threading performance issues? |
@AndreaCrotti That's a clever solution. What changes were you thinking to make? Adding the While I am interested in that solution, I plan to go ahead with the parallel option. |
So well @alysbrooks, I guess there are three things required for this to work:
Maybe only 3. could be done directly in kaocha, but probably the whole thing could be done with a plugin ideally.
One issue with the approach in this PR is that every test suite with database (or similar) writes will probably need quite a bit of adjusting to run in parallel. |
So well in short I guess it would be great to have this PR merged, but maybe I can investigate the other way to parallelize tests as a separate plugin as a possible alternative strategy. |
@AndreaCrotti Interesting. I could see 2 being done with a Babashka script that we ship, and I think 3 could be a plugin, in theory. They'd be stretching the idea of plugins because the instance of Kaocha doing the combining wouldn't necessarily run any tests, it would just grab the test results and inserts them so the reporters could summarize them. Actually, a plugin that takes saved test results and feeds them back into Kaocha as though those tests just ran could be pretty interesting. |
I'm interested in trying this in our CI. Is a snapshot published for this branch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm excited about this PR :-)
I took the liberty to add a few stylistic nitpick suggestions.
src/kaocha/api.clj
Outdated
(throw+ {:kaocha/early-exit 0})) | ||
(when (:parallel config) | ||
(output/warn (str "Parallelization enabled. This feature is in " | ||
"beta If you encounter errors, try " |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
|
||
(ns benchmark | ||
(:require [criterium.core :as c]) | ||
(:import [java.util.concurrent Executors ]) | ||
) |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
src/kaocha/runner.clj
Outdated
@@ -34,6 +35,7 @@ | |||
[nil "--[no-]fail-fast" "Stop testing after the first failure."] | |||
[nil "--[no-]color" "Enable/disable ANSI color codes in output. Defaults to true."] | |||
[nil "--[no-]watch" "Watch filesystem for changes and re-run tests."] | |||
[nil "--[no-]parallel" "Run tests in parallel. Warning: This feature is beta."] |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
src/kaocha/test_suite.clj
Outdated
@@ -4,10 +4,10 @@ | |||
|
|||
(defn run [testable test-plan] | |||
(t/do-report {:type :begin-test-suite}) | |||
(let [results (testable/run-testables (:kaocha.test-plan/tests testable) test-plan) | |||
(let [results (testable/run-testables (:kaocha.test-plan/tests testable) test-plan) |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
a569099
to
4e9e7e2
Compare
I added a profiling plugin you can run by enabling |
Are there any plans to wrap up support for parallel test runs soon or should I resort to eftest if I need such a test runner? |
In case it helps you can still get some parallelism with circleci and kaocha for example https://www.juxt.pro/blog/parallel-kaocha/ |
futures (doall | ||
(map (fn [t] | ||
(future | ||
(run-testable (assoc t ::thread-info (current-thread-info)) test-plan))) | ||
testables))] | ||
(doall (map deref futures)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are creating an unbounded number of threads. If the test plan contains too many tests (hundreds or thousands) this will probably be a problem.
OS threads are expensive to create, and when we have a number of running threads that is significantly greater than the number of available processors, we'll also incur in a lot of context switching cost.
This probably explains what you observed in your previous comment, because for each future
invocation, we're creating a new thread to be executed (via ExecutorService.submit):
I noticed some cases where parallelization does indeed add overhead (I think tests ran 5 to 10 percent slower?)
one of the methods the most time spent in was java.lang.Thread.. A lot of time was spent in java.util.concurrent.AbstractExecutorService.submit ()
One alternative we could explore instead is to create a fixed thread pool ExecutorService limited to the number of available cores.
We then can map each test to a function that runs it, and pass that to invokeAll (this works because a Clojure function implements Callable). This returns a list of Future, so we can do the (doall (map deref futures))
on it the same way we're already doing.
We're still going to create a Callable and a Future per task, but we'll limit the number of threads created and the parallelism of execution to the available number of processors.
This should avoid the overhead of too much parallelism and hopefully result in parallel tests always running faster than serial tests.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like
(import '(java.util.concurrent ExecutorService Executors))
(defn task [num]
(fn []
(.println System/out num)
(Thread/sleep 500)
(* num num)))
(let [num-cpus (.availableProcessors (Runtime/getRuntime))
executor (Executors/newFixedThreadPool num-cpus)
data (range 100)
tasks (map task data)
futures (.invokeAll executor tasks)
results (doall (map deref futures))]
(println results))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do something similar in Splint:
(defn pmap*
"Efficient version of pmap which avoids the overhead of lazy-seq.
(doall (pmap (fn [_] (Thread/sleep 100)) coll)) => 3.34 secs
(pmap* (fn [_] (Thread/sleep 100)) coll) => 202 ms"
[f coll]
(let [executor (Executors/newCachedThreadPool)
futures (mapv #(.submit executor (reify Callable (call [_] (f %)))) coll)
ret (mapv #(.get ^Future %) futures)]
(.shutdownNow executor)
ret))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't remember the default executor for future
, and it appears to be newCachedThreadPool
so it will save some on overhead for short-lived threads, but probably creates too many in other situations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future reference, Clojure core's fixed thread pool uses the number of processors plus 2: https://github.com/clojure/clojure/blob/08a2d9bdd013143a87e50fa82e9740e2ab4ee2c2/src/jvm/clojure/lang/Agent.java#L49C1-L51C77
Add some information about the thread each test ran on. May want to move this to something not specific to parallelization, otherwise non-parallel tests won't have any thread info.
Internal refers to whether there were delays between running tests in the same thread. It doesn't measure waiting for I/O or many kinds of contention. External refers to how much of the overall time the thread spent. Why do I think this is useful? Well, if you've parallelized your test suite and you end up with a thread that runs from time 0 to time 100 and another that runs from time 1 to time 10, the burden is heavily on the first thread, suggsting the load is not well-balanced. Both may have good internal utilization, but the second has poor external globalization. Part of the reason this is WIP is that there may be better (ideally standard) terms for "internal" and "external". There are some calculation issues and hacks: * Some testables are missing the ::start and ::end for some reason. I don't think this should happen? * I don't think this version counts the number of threads accurately.
Clean up the changes around parallelization, make opting-in from test types explicit, allow config on testable and metadata level, simplify future handling.
dc258bb
to
615ea82
Compare
Very much a work in progress! Mostly opening this PR so I have a place to take and share notes.
Add some way of monitoring for debug or profiling?Investigate test-level parallelizationAdd configuration for test-level parallelizationEnsure test reporters print meaningful resultsDotsDocumentationkaocha.watch
doesn't interfere with this. See Ensure unskipped testable for reporting load error #371.