Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add clock vectors #16

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

Add clock vectors #16

wants to merge 4 commits into from

Conversation

bartoszmodelski
Copy link
Contributor

@bartoszmodelski bartoszmodelski commented Mar 13, 2023

The core of DSCheck is closely following the design introduced in the DPOR paper, except for clock vectors, which are missing. This PR adds them.

Motivation

We want to be able to track happens before relationship as it is critical to an efficient search. For example, consider the following program:

Thread A 
a1: write x=1
a2: write y=2 

Thread B 
b1: write y=1 
b2: write x=2 

If we were to do a full search, we'd explore 6 traces:

a1 a2 b1 b2 
a1 b1 a2 b2 
a1 b1 b2 a2 
b1 b2 a2 a2
b1 a1 b2 a2 
b1 a1 a2 b2 

But many of these trace are redundant as far as the terminal state is concerned. For example, consider: a1 b1 a2 b2. Reordering of either the first two or the last two transitions leads to the same result (x=2, y=2). In fact, the only three possible terminal states are (x=1, y=2), (y=1, x=2), (x=2, y=2) and as long as these are covered, so are all possible realizations.

Results

There's a significant improvement on tests with multiple variables:

  • Artificial test. Number of traces goes down from 31 to 5.
  • Michael-Scott queue. The runtime on tests with 4 items goes down from 0.2 s to ~0.001 s. Eyeballing it, previously adding a single item increased the runtime 7-fold, while now it doubles it.

There's a 10-15% slowdown on test_list, which has all threads focus on a single variable, due to cost of extra accounting. If we don't find any further optimizations here in the future, we can always add a flag for fast-path here.

test_naive_counter is too small to be affected.

Design

I followed the section 4.2 and figure 5 in DPOR paper. It was not that trivial due to sheer amount of notation but it should be easier to verify than to translate for the first time.

I think the best resource is the paper itself. In part because I'm not an expert in this area, and another pair of eyes looking at this could potentially discover things I've missed (even in terms of building understanding). Having said that, I'm super happy to jump on a call or try to explain more here.

I'd recommend a quick read of sections 1-3, and spending more time on section 4. The paper explains vector clocks but I've supported myself with Martin Kleppmann's excellent short lecture on them.

@bartoszmodelski bartoszmodelski requested a review from a team March 13, 2023 21:46
Atomic.set a 2)
|> ignore;

Atomic.final (fun () -> ()));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should help to insert some debug print here:

Atomic.final (fun () ->
  Format.printf "a=%i b=%i c=%i d=%i@."
    (Atomic.get a) (Atomic.get b) (Atomic.get c) (Atomic.get d)));

and investigate why it outputs these terminal states:

a=2 b=2 c=2 d=2
a=2 b=2 c=2 d=2
a=2 b=2 c=2 d=1
a=2 b=2 c=2 d=2
a=2 b=2 c=2 d=1

rather than:

a=2 b=2 c=2 d=2
a=2 b=2 c=2 d=1
a=2 b=2 c=1 d=1
a=2 b=1 c=1 d=1
a=1 b=1 c=1 d=1

@bartoszmodelski
Copy link
Contributor Author

Ok, I think I understand what's going on. The mechanism with vector clocks is built for tracking a much stronger relationship than we have between SC atomics. That is, happens-before occurs due to program order and in synchronization points, e.g. thread spawn, thread join, locks, etc. These are the only places where are allowed to merge clocks of two threads (max them and update the clock of thread that was waiting if I understand correctly). It was not quite obvious from the DPOR paper since they only make this specialization in section 4 but CDSCheck one makes it more explicit.

We should add support for join, locks, etc. at some point to be able to verify a wider range of real programs but for now I'm going to hold off with this change.

Thanks @art-w for help with this example!

@bartoszmodelski bartoszmodelski removed the request for review from a team March 16, 2023 14:52
@bartoszmodelski bartoszmodelski marked this pull request as draft March 16, 2023 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants