docs: add docs and details for the differential state archive #7049
base: feature/differential-archive
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## feature/differential-archive #7049 +/- ##
=============================================================
Coverage 49.24% 49.25%
=============================================================
Files 578 578
Lines 37443 37443
Branches 2168 2172 +4
=============================================================
+ Hits 18440 18441 +1
+ Misses 18963 18962 -1
Partials 40 40 |
@@ -0,0 +1,79 @@ | |||
--- | |||
title: Understanding Historical Sate Regeneration |
title: Understanding Historical Sate Regeneration | |
title: Understanding Historical State Regeneration |
|
||
**Approach** | ||
|
||
Assume we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, The following configuration for layers implies: |
Assume we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, The following configuration for layers implies: | |
Assume we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, the following configuration for layers implies: |
Assume we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, The following configuration for layers implies: | |
Assume we have the following chain representing the state object of every slot, with the following diff layer configurations `1,2,3,5`. With the assumption that we have 8 slots each epoch, the following configuration for layers implies: |
|
||
# Understanding Historical Sate Regeneration | ||
|
||
To run a blockchain client and establish consensus we need latest headers and forkchoice data. This operation does not require to historical data, specially after the epochs which are finalized. Storing the full state information for the finalized slots increase the storage requirement a lot and not suitable for running the node for long time. |
To run a blockchain client and establish consensus we need latest headers and forkchoice data. This operation does not require to historical data, specially after the epochs which are finalized. Storing the full state information for the finalized slots increase the storage requirement a lot and not suitable for running the node for long time. | |
To run a blockchain client and establish consensus we need the latest headers and fork choice data. This operation does not require access to historical data, especially after the epochs which are finalized. Storing the full state information for the finalized slots increases the storage requirement a lot and is not suitable for running the node for a long time. |
|
||
## Solution | ||
|
||
To overcome the storage problem for the archive nodes we implemented following algorithm to store and fetch the historical sates. |
To overcome the storage problem for the archive nodes we implemented following algorithm to store and fetch the historical sates. | |
To overcome the storage problem for the archive nodes, we implemented the following algorithm to store and fetch the historical states. |
|
||
**Approach** | ||
|
||
Assume we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, The following configuration for layers implies: |
Assume we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, The following configuration for layers implies: | |
Assume we have the following chain which represents the state object every slot, with following diff layer configurations `1,2,3,5`. With the assumption that we have 8 slots each epoch, the following configuration for layers implies: |
5. For slot `34` the path we follow `32 -> 24 -> 0`. | ||
6. For slot `41` path for the nearest snapshot slot is just one layer directly at slot `40`. | ||
|
||
As you can see with this approach we can find a shorter paths with smaller number of diffs to apply, which generate the nearest full state and reduce the number of blocks we have to replay to reach to actual slot. |
As you can see with this approach we can find a shorter paths with smaller number of diffs to apply, which generate the nearest full state and reduce the number of blocks we have to replay to reach to actual slot. | |
As you can see with this approach we can find shorter paths with smaller number of diffs to apply, which generate the nearest full state and reduce the number of blocks we have to replay to reach the actual slot. |
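The path selection described in these scenarios can be sketched as follows. This is a minimal illustration, assuming the layer values `1,2,3,5` are epoch intervals and the largest one is the snapshot layer; `diffPath` and its parameters are hypothetical names, not functions from the codebase.

```typescript
// Hypothetical sketch: the names and the layering rule are assumptions
// drawn from the walkthrough, not the project's actual API.
const SLOTS_PER_EPOCH = 8; // simplified value used in the walkthrough

/**
 * Return the stored slots to read for `slot`, ordered from the nearest
 * diff back to the snapshot, e.g. [32, 24, 0] for slot 34.
 * `layers` are epoch intervals in ascending order; the last one is the
 * snapshot layer.
 */
function diffPath(slot: number, layers: number[] = [1, 2, 3, 5]): number[] {
  const path: number[] = [];
  let nearest = -1; // closest stored slot found so far
  // Walk from the snapshot layer (largest interval) down to the finest layer.
  for (const epochs of [...layers].reverse()) {
    const interval = epochs * SLOTS_PER_EPOCH;
    const boundary = Math.floor(slot / interval) * interval;
    // Keep a layer only if it moves strictly closer to the target slot;
    // when layers collide on the same slot, the coarser layer wins.
    if (boundary > nearest) {
      path.unshift(boundary);
      nearest = boundary;
    }
  }
  return path;
}

console.log(diffPath(34).join(" -> ")); // 32 -> 24 -> 0
console.log(diffPath(41).join(" -> ")); // 40
```

Under these assumptions the function reproduces the paths listed in the scenarios for slots `0`, `12`, `18`, `34` and `41`.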
\end{align*} | ||
$$ | ||
|
||
As there are lot of parameters in the system and we don't have accurate values for these so we started few possible estimates. Also as the chain is ever growing data structure the value for `F` is not finite. We decided to do this estimation based on 30 days time period and `mainnet` parameters. |
As there are lot of parameters in the system and we don't have accurate values for these so we started few possible estimates. Also as the chain is ever growing data structure the value for `F` is not finite. We decided to do this estimation based on 30 days time period and `mainnet` parameters. | |
As there are a lot of parameters in the system and we don't have accurate values for them, we started with a few possible estimates. Also, as the chain is an ever-growing data structure, the value for `F` is not finite. We decided to do this estimation based on a 30-day time period and `mainnet` parameters. |
T_{diff} &= \text{Time to take differential backup}\\ | ||
T_{replay} &= \text{Time to replay a block}\\ | ||
R_{full} &= \text{Time to restore full backup}\\ | ||
R_{diff} &= \text{Tiem to restore differential backup}\\ |
R_{diff} &= \text{Tiem to restore differential backup}\\ | |
R_{diff} &= \text{Time to restore differential backup}\\ |
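The restore-side symbols in this hunk can be combined into a rough per-request estimate. The additive model below is an assumption made for illustration only (the document's own formula is not visible in this excerpt), and `RestoreCosts` and `estimateRegenTime` are hypothetical names. The numbers in the usage lines are arbitrary placeholders, not measurements.

```typescript
// Hypothetical cost model: assumes regeneration time is the sum of one full
// restore, each applied diff, and each replayed block.
interface RestoreCosts {
  rFull: number; // R_full: time to restore a full backup
  rDiff: number; // R_diff: time to restore one differential backup
  tReplay: number; // T_replay: time to replay one block
}

function estimateRegenTime(
  costs: RestoreCosts,
  diffsApplied: number,
  blocksReplayed: number
): number {
  return costs.rFull + diffsApplied * costs.rDiff + blocksReplayed * costs.tReplay;
}

// Placeholder units: slot 34 with path 32 -> 24 -> 0 applies two diffs
// and replays blocks 33 and 34.
const placeholderCosts: RestoreCosts = { rFull: 100, rDiff: 10, tReplay: 1 };
console.log(estimateRegenTime(placeholderCosts, 2, 2)); // 122
```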
title: Understanding Historical Sate Regeneration | ||
--- | ||
|
||
# Understanding Historical Sate Regeneration |
# Understanding Historical Sate Regeneration | |
# Understanding Historical State Regeneration |
|
||
Based on these assumptions and system we decided for the following constants. | ||
|
||
| Name | Value | Description | |
Please also run `yarn docs:lint:fix` for prettier to fix this table
|
||
# Understanding Historical Sate Regeneration | ||
|
||
To run a blockchain client and establish consensus we need latest headers and forkchoice data. This operation does not require to historical data, specially after the epochs which are finalized. Storing the full state information for the finalized slots increase the storage requirement a lot and not suitable for running the node for long time. |
To run a blockchain client and establish consensus we need latest headers and forkchoice data. This operation does not require to historical data, specially after the epochs which are finalized. Storing the full state information for the finalized slots increase the storage requirement a lot and not suitable for running the node for long time. | |
To run a blockchain client and establish consensus we need the latest headers and forkchoice data. This operation does not require historical data, especially after the epochs are finalized. Storing the full state information for the finalized slots increases the storage requirement a lot and is not suitable for running the node for a long time. |
|
||
**Approach** | ||
|
||
Assume we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, The following configuration for layers implies: |
Assume we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, The following configuration for layers implies: | |
Assuming we have following chain represents the state object every slot, with following diff layer configurations `1,2,3,5`. With assumption that we have 8 slots each epoch, The following configuration for layers implies: |
Let's take few scenarios: | ||
|
||
1. For slot `0` all layers collide, so we use the lowest layer which is the snapshot layer. So for the slot `0` we store and fetch the snapshot. | ||
2. For slots (0-7) within first epoch we there is no intermediary layer, so we read the snapshot from slot `0`. |
2. For slots (0-7) within first epoch we there is no intermediary layer, so we read the snapshot from slot `0`. | |
2. For slots (0-7) within the first epoch, there is no intermediary layer, so we read the snapshot from slot `0`. |
1. For slot `0` all layers collide, so we use the lowest layer which is the snapshot layer. So for the slot `0` we store and fetch the snapshot. | ||
2. For slots (0-7) within first epoch we there is no intermediary layer, so we read the snapshot from slot `0`. | ||
3. For slots (8-15) the path we follow is `8 -> 0`. e.g. For slot `12`, we apply diff from slot `8` on snapshot from slot `0`. Then we replay blocks from 9-12. | ||
4. For slot `18` the shortest path to nearest snapshot is `16 -> 0` and rest will follow same as above. |
4. For slot `18` the shortest path to nearest snapshot is `16 -> 0` and rest will follow same as above. | |
4. For slot `18` the shortest path to nearest snapshot is `16 -> 0` and the rest will follow same as above. |
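The replay step in scenario 3 ("replay blocks from 9-12") can be sketched like this. It is a minimal illustration that assumes the finest layer stores something at every epoch boundary and that the walkthrough's 8 slots per epoch applies; `slotsToReplay` is a hypothetical helper name.

```typescript
// Hypothetical helper: assumes the nearest stored state sits on the
// epoch boundary at or before the requested slot.
const SLOTS_PER_EPOCH = 8; // simplified value used in the walkthrough

/** Slots whose blocks are replayed on top of the nearest stored state. */
function slotsToReplay(slot: number): number[] {
  const base = Math.floor(slot / SLOTS_PER_EPOCH) * SLOTS_PER_EPOCH;
  const slots: number[] = [];
  for (let s = base + 1; s <= slot; s++) {
    slots.push(s);
  }
  return slots;
}

console.log(slotsToReplay(12)); // [ 9, 10, 11, 12 ]
console.log(slotsToReplay(18)); // [ 17, 18 ]
```

A slot that falls exactly on an epoch boundary needs no replay under this assumption, so the function returns an empty list for it.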
\end{align*} | ||
$$ | ||
|
||
As there are lot of parameters in the system and we don't have accurate values for these so we started few possible estimates. Also as the chain is ever growing data structure the value for `F` is not finite. We decided to do this estimation based on 30 days time period and `mainnet` parameters. |
As there are lot of parameters in the system and we don't have accurate values for these so we started few possible estimates. Also as the chain is ever growing data structure the value for `F` is not finite. We decided to do this estimation based on 30 days time period and `mainnet` parameters. | |
As there are a lot of parameters in the system and we don't have accurate values for them, we started with a few possible estimates. Also, as the chain is an ever-growing data structure, the value for `F` is not finite. We decided to do this estimation based on a 30-day time period and `mainnet` parameters. |
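For a sense of scale, the 30-day window can be converted into slots and epochs. This sketch assumes the standard Ethereum `mainnet` timing of 12 seconds per slot and 32 slots per epoch; `slotsInDays` is a hypothetical helper name.

```typescript
// Assumed mainnet timing parameters (not taken from the PR text itself).
const SECONDS_PER_SLOT = 12;
const SLOTS_PER_EPOCH = 32;

/** Number of slots elapsed in the given number of days. */
function slotsInDays(days: number): number {
  return (days * 24 * 60 * 60) / SECONDS_PER_SLOT;
}

const slots = slotsInDays(30);
const epochs = slots / SLOTS_PER_EPOCH;
console.log(slots, epochs); // 216000 6750
```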
Motivation
Add docs explaining differential state archive.
Description
Keep the reference docs updated.
Steps to test or reproduce