Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: single solver shared by multiple paths #218

Merged
merged 27 commits into from
Nov 29, 2023
Merged

Conversation

daejunpark
Copy link
Collaborator

@daejunpark daejunpark commented Nov 10, 2023

  • share a single solver for all paths
  • full dfs path exploration across external call boundaries
  • use z3 push/pop to avoid the solver reinstantiation overhead during the dfs
  • generate counterexamples in parallel with the dfs
  • a new option for early exit upon finding the first counterexample
  • a new option to set the number of threads for parallel solvers

note: the farcaster tests are temporarily disabled in ci, due to independent issues that are to be fixed in a separate pr

@daejunpark daejunpark marked this pull request as ready for review November 26, 2023 03:27
@0xkarmacoma 0xkarmacoma self-requested a review November 27, 2023 23:58
@@ -815,7 +864,7 @@ def sha3_data(self, data: Bytes, size: int) -> Word:
sha3_expr = f_sha3(data)

# assume hash values are sufficiently smaller than the uint max
self.solver.add(ULE(sha3_expr, con(2**256 - 2**64)))
self.path.append(ULE(sha3_expr, con(2**256 - 2**64)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think it would be useful to expose an additional method like add_assertion? It feels like this is a way to add it to the solver assertions but as a side effect it is also appended to the list of human-readable path conditions, when it doesn't really correspond to a path (branching) condition

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just path.assert(cond) for short

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point! added a todo note to get back to this later.

@@ -596,6 +643,7 @@ class Exec: # an execution path

# tx
context: CallContext
callback: Any # to be called when returning back to the parent context
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use something like Optional[Callable] as the type hint here?

self.context = kwargs["context"]
self.callback = kwargs["callback"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that callback is optional (can be None), can we also make it optional in the kwargs?

something like self.callback = kwargs.get("callback", None)

Copy link
Collaborator Author

@daejunpark daejunpark Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the callback is None only at the top-level. i prefer to non-optional in the kwargs, so that i can catch earlier if callback is missing by mistake.

Comment on lines 2108 to +2114
if opcode == EVM.JUMPI:
steps[step_id] = {"parent": prev_step_id, "exec": str(ex)}
self.steps[step_id] = {"parent": prev_step_id, "exec": str(ex)}
# elif opcode == EVM.CALL:
# steps[step_id] = {'parent': prev_step_id, 'exec': str(ex) + ex.st.str_memory() + '\n'}
# self.steps[step_id] = {'parent': prev_step_id, 'exec': str(ex) + ex.st.str_memory() + '\n'}
else:
# steps[step_id] = {'parent': prev_step_id, 'exec': ex.summary()}
steps[step_id] = {"parent": prev_step_id, "exec": str(ex)}
# self.steps[step_id] = {'parent': prev_step_id, 'exec': ex.summary()}
self.steps[step_id] = {"parent": prev_step_id, "exec": str(ex)}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not directly related to this PR, but it looks like both branches are doing the same thing and could be simplified

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! it's legacy code, and i need to clean this up anyway; will do this later.

Comment on lines 2467 to 2470
if ex.callback is None:
yield ex
else:
yield from ex.callback(ex, stack, step_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we extract these 4 lines in a helper function? They seem to be duplicated a couple times

a comment might help understanding the yield from ex.callback(...), it takes some thinking to figure out 🧙‍♂️

if len(self.pending) > 0:
raise ValueError("deepcopy pending path", self)

path = Path(self.solver)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth a comment here. This is where the real magic happens since we re-use the solver instance instead of actually copying it, right? Maybe explain in a sentence or two why we do this and the trade-offs

new_path = ex.path.copy()
new_path.append(str(cond))
new_path = deepcopy(ex.path)
new_path.pending.append(cond)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: I had to hunt for this, I couldn't immediately see in Path where things were added to pending. Would be mildly cleaner/easier with an add_pending(cond) method IMO

@@ -585,6 +585,53 @@ def __next__(self) -> Instruction:
return insn


class Path:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would you say that an instance of this class does represent a single path through the program? My instinct says no, because it makes it a little weird to initialize an SEVM with a Path object (since I don't really have a path before we get started).

Semantically, would something like PathTracker or something similar be more accurate?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair point! but in my mental model, it's still a path. i added explanation in the comment for now. we can discuss more to improve naming.

Copy link
Collaborator

@0xkarmacoma 0xkarmacoma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clean! See minor comments and requests for clarifications

@daejunpark daejunpark merged commit 380314a into main Nov 29, 2023
61 checks passed
@daejunpark daejunpark deleted the perf/solver-reuse branch November 29, 2023 03:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants