-
Notifications
You must be signed in to change notification settings - Fork 20.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
trie/pathdb: state iterator (snapshot integration pt 4) #30654
trie/pathdb: state iterator (snapshot integration pt 4) #30654
Conversation
0c0db5a
to
409da23
Compare
aac2c48
to
352e348
Compare
352e348
to
6d1581a
Compare
// holdableIterator is a wrapper of underlying database iterator. It extends | ||
// the basic iterator interface by adding Hold which can hold the element | ||
// locally where the iterator is currently located and serve it up next time. | ||
type holdableIterator struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't a great name; how about cachingIterator
? the elements that are "held" are basically cached so that you don't have to iterate the 2nd time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or "replayableIterator", since the iteration is "replayable"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or "pin"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am open for renaming it, for sure. But I don't want to change it in this pull request.
It's a copy-paste from the state snapshot and it's easier to not change it for review.
} | ||
it.key = common.CopyBytes(it.it.Key()) | ||
it.val = common.CopyBytes(it.it.Value()) | ||
it.atHeld = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a bit weird that a function called Hold
will set atHeld
to false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's polish it in the following PRs, this pr is already huge enough and let's focus on the core changes.
if !bytes.Equal(it.Value(), []byte(content[order[idx]])) { | ||
t.Errorf("item %d: value mismatch: have %s, want %s", idx, string(it.Value()), content[order[idx]]) | ||
} | ||
// Should be safe to call discard multiple times |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean "hold" instead of "discard" ?
a Iterator | ||
b Iterator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have more descriptive names, e.g. left
and right
? Same thing with k
. Is that a key ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is about a primary and a secondary iterator. IMO a
and b
is slightly better than left
and right
. And yes, k
is the current key. I guess it could be renmed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to rename in a following PR.
// newBinaryAccountIterator creates a simplistic account iterator to step over | ||
// all the accounts in a slow, but easily verifiable way. | ||
// | ||
// nolint:all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need this nolint
here ? I don't see anything obviously wrong.
switch bytes.Compare(hashI[:], hashJ[:]) { | ||
case -1: | ||
return -1 | ||
case 1: | ||
return 1 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switch bytes.Compare(hashI[:], hashJ[:]) { | |
case -1: | |
return -1 | |
case 1: | |
return 1 | |
} | |
comparison := bytes.Compare(hashI[:], hashJ[:]) | |
if comparison != 0 { | |
return comparison | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to change it in a following PR.
@@ -98,6 +98,17 @@ func (s *stateSet) account(hash common.Hash) ([]byte, bool) { | |||
return nil, false // account is unknown in this set | |||
} | |||
|
|||
// mustAccount returns the account data associated with the specified address | |||
// hash. The difference is this function will return an error if the account | |||
// is not found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it the wrong name then ? For example, we used to have MustCommit
which would panic if it couldn't commit the block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah maybe tryAccount
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, let me find a more reasonable name
@@ -110,6 +121,19 @@ func (s *stateSet) storage(accountHash, storageHash common.Hash) ([]byte, bool) | |||
return nil, false // storage is unknown in this set | |||
} | |||
|
|||
// mustStorage returns the storage slot associated with the specified address | |||
// hash and storage key hash. The difference is this function will return an | |||
// error if the storage slot is not found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, most code is is copied. However
if the corresponding state is mutable (e.g., the disk layer in the path database), the callback can operate as follows:
Where is that located?
a Iterator | ||
b Iterator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is about a primary and a secondary iterator. IMO a
and b
is slightly better than left
and right
. And yes, k
is the current key. I guess it could be renmed
@@ -98,6 +98,17 @@ func (s *stateSet) account(hash common.Hash) ([]byte, bool) { | |||
return nil, false // account is unknown in this set | |||
} | |||
|
|||
// mustAccount returns the account data associated with the specified address | |||
// hash. The difference is this function will return an error if the account | |||
// is not found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah maybe tryAccount
?
priority: depth + 1, | ||
}) | ||
case *diffLayer: | ||
// The state set in diff layer is immutable and will never be stale, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we never merge into layers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For diff layers in the pathdb, yes they are immutable and we never modify the content of them.
The only mutable container is the buffer in the disk layer, which will accept the state set from the diff layers on top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only mutable container is the buffer in the disk layer, which will accept the state set from the diff layers on top.
That's nice. We could do the same in the snap layers. Instead of having n
mutable layers, where only the bottom-most difflayer is the one that is actually mutable, we could treat them differently.
@holiman This one could be an example. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went over your responses. LGTM
d5fbbb8
to
6cef4a6
Compare
6cef4a6
to
4ab8eef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
It's highly recommended to review the changes by comparing the files directly.
In this pull request, the state iterator is implemented. It's mostly a copy-paste from
the original state snapshot package, but still has some important changes to highlight
here:
(a) The iterator for the disk layer consists of a diff iterator and a disk iterator.
Originally, the disk layer in the state snapshot was a wrapper around the disk, and
its corresponding iterator was also a wrapper around the disk iterator. However,
due to structural differences, the disk layer iterator is divided into two parts:
Checkout
BinaryIterator
andFastIterator
for more details.(b) The staleness management is improved in the diffAccountIterator and diffStorageIterator
Originally, in the
diffAccountIterator
, the layer’s staleness had to be checked within the Nextfunction to ensure the iterator remained usable. Additionally, a read lock on the associated diff
layer was required to first retrieve the account blob. This read lock protection is essential to
prevent concurrent map read/write. Afterward, a staleness check was performed to ensure the
retrieved data was not outdated.
The entire logic can be simplified as follows: a loadAccount callback is provided to retrieve
account data. If the corresponding state is immutable (e.g., diff layers in the path database),
the staleness check can be skipped, and a single account data retrieval is sufficient. However,
if the corresponding state is mutable (e.g., the disk layer in the path database), the callback
can operate as follows:
The callback solution can eliminate the complexity for managing concurrency with the read
lock for atomic operation.