-
Notifications
You must be signed in to change notification settings - Fork 720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate ledger snapshot checksum #6047
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
geo2a
force-pushed
the
geo2a/10.1.3-ledger-snapshot-checksum
branch
from
December 4, 2024 10:19
6a2a0a9
to
6d7e573
Compare
geo2a
force-pushed
the
geo2a/10.1.3-ledger-snapshot-checksum
branch
from
December 4, 2024 12:35
6d7e573
to
3ce2379
Compare
jasagredo
reviewed
Dec 4, 2024
jasagredo
approved these changes
Dec 5, 2024
geo2a
force-pushed
the
geo2a/10.1.3-ledger-snapshot-checksum
branch
3 times, most recently
from
December 9, 2024 14:50
76334d7
to
a208e5a
Compare
- Categorise `LedgerDB.SnapshotMissingChecksum` trace as `Warning` - Expose snapshot checksum switch in config file
geo2a
force-pushed
the
geo2a/10.1.3-ledger-snapshot-checksum
branch
2 times, most recently
from
December 9, 2024 15:35
bb20145
to
00efdba
Compare
github-merge-queue bot
pushed a commit
to IntersectMBO/ouroboros-consensus
that referenced
this pull request
Dec 10, 2024
…1319) Fixes #892 Integration into `cardano-node`: IntersectMBO/cardano-node#6047. This uses the branch [geo2a/issue-892-checksum-snaphot-file-release-ouroboros-consensus-0.21.0.0-backport](https://github.com/IntersectMBO/ouroboros-consensus/tree/geo2a/issue-892-checksum-snaphot-file-release-ouroboros-consensus-0.21.0.0-backport) which is the backport of this PR onto the most resent release of the `ouroboros-consensus` package. In this PR, we change the reading and writing disk snapshots of ledger state. When a snapshot is taken and written to disk, an additional file with the `.checksum` extension is written alongside it. The checksum file contains a string that represent the CRC32 checksum of the snapshot. The checksum is calculated incrementally, alongside writing the snapshot to disk. When a snapshot is read from dist, the checksum is again calculated and compared to the tracked one. If the checksum is different, `readSnaphot` returns the `ReadSnapshotDataCorruption` error, indicating data corruption. The checksum is calculated incrementally, alongside reading a writing the data. On write, we use the [`hPutAllCRC`](https://input-output-hk.github.io/fs-sim/fs-api/src/System.FS.CRC.html#hPutAllCRC) function from `fs-sim`, and on read we modify the [readIncremental](https://github.com/IntersectMBO/ouroboros-consensus/blob/892-checksum-snaphot-file/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Util/CBOR.hs#L191) function to compute the checksum as data is read. To enable seamless integration into `cardano-node`, we make the check optional. When initialising the ledger state from a snapshot in `initLedgerDB`, we issue a warning in case the checksum file is missing for a snapshot, but do not fail as in case of invalid snapshots. The `db-analyser` tool ignores the checksum files by default when reading the snapshots. We add `--disk-snapshot-checksum` flag to enabled the check. When writing a snapshot to disk, e.g. as a result of the `--store-ledger` analysis, `db-analyser` will always write calculate the checksum and write it into the snapshot's `.checksum` file. **Tests** There state machine test in `Test.Ouroboros.Storage.LedgerDB.OnDisk` is relevant to this feature, and has caught a number of silly mistakes in the process of its implementation, for example forgetting to delete a checksum file when the snapshot is deleted. The model in the test does not track checksums, and I do not think it can (or should) be augmented to do that. Howerver, the `Snap` and `Restore` events are now parameterised by the checksum flag, and the values for the flag are randomised when generating these events. This leads to testing the following properties: - this feature is backwards-compatible, i.e. the `Restore` events will always lead to restoring from a snapshot, even if `Snap` events do not write checksum files (i.e. their flag is `NoDoDiskSnapshotChecksum` ~= `False`). - If the interpretation of the `DoDiskSnapshotChecksum` flag changes in the code base and becomes strict, i.e. hard fail if the checksum file is missing, this test will discover that. **Effects on Performance**: Running `db-analyser` to read a ledger snapshot and store the snapshot of the state at the next slot shows a difference of 2 seconds on my machine. See a comment below for the logs. To precisely evaluate the effects, we need a micro-benchmark of the reading and writing of snapshots with and without the checksum calculation.
Remember that `Text.unwords` exists
geo2a
force-pushed
the
geo2a/10.1.3-ledger-snapshot-checksum
branch
from
December 12, 2024 12:15
00efdba
to
58c5f1f
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
IntersectMBO/ouroboros-consensus#1319, when merged, will checksum the ledger state snapshots to detect data corruption. This PR integrates this change into
cardano-node
.In addition to the snapshot file itself, a
.checksum
file is written to dist, which contains theCRC32
checksum of the snapshot data:To preserve backwards compatibility, the absence of the checksum file does not block the node from starting up. Instead, a warning is issued when restoring from a snapshot that is missing the checksum file:
... Listening on http://127.0.0.1:12798 [geo2a-la:cardano.node.ChainDB:Warning:5] [2024-12-04 13:09:30.72 UTC] Checksum file is missing for snapshot DiskSnapshot {dsNumber = 2223, dsSuffix = Nothing} ...
If the checksum differs from the file, there's a number of scenarios that all ultimately lead to the deletion of the snapshot and an attempt to restore from an older one:
dd
, i.e.:.checksum
file is invalid:length str == 8 && all isHexDigit str
The feature is on by default and can be disabled by the following configuration option: