Snapshot corruption in the state machine tests #558
Comments
I've opened #566 with my concerns around the use of There are some issues that are specific to the current implementation of snapshot corruption. Specifically,
I don't think there is an I've implemented the use of
One possible solution might be to lift the tests themselves into the
#539 introduced a classic property test for detecting corruption in snapshots. #542 fixed a bug in the code that #539 tested. The next step would be to detect corruption in the state machine tests as well. First, I'll describe the current approach to fault testing in the state machine tests, and then how to add snapshot corruption to the state machine tests.
Overview of fault testing in the state machine
At the time of writing this issue (87549dd), the state machine tests can already inject noisy disk faults into the SUT using `fs-sim`. These faults are noisy because they produce IO exceptions. Even with fault injection, the SUT is not guaranteed to fail with an exception, so the outcome is non-deterministic. The model, however, is deterministic, which means that the model must decide to fail with an exception, or to succeed. We solve this discrepancy as follows:

- […] `Maybe Errors` in relevant state machine actions.
  (`lsm-tree/test/Database/LSMTree/Model/Session.hs`, lines 202 to 237 in 87549dd)
- […] `withErrors` from the `fs-sim` package.
- […] `CreateSnapshot` accidentally succeeded, then the roll back action is to delete the snapshot. This makes sure the model and SUT are in sync again, and then we throw a dummy error so that the model and SUT have matching responses to the action.
  (`lsm-tree/test/Test/Database/LSMTree/StateMachine.hs`, lines 1301 to 1339 in 87549dd)
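The synchronisation scheme described above can be sketched in a few lines. This is a toy illustration, not the code from `Session.hs` or `StateMachine.hs`: the types `Errors`, `Action`, `Response` and `Snapshots`, and the functions `runModel` and `runSUT`, are simplified stand-ins invented for this sketch. It also assumes that the model fails whenever faults are requested, and it replaces the real non-determinism of the SUT with an explicit `Bool`.

```haskell
import Data.List (delete)

type SnapshotName = String

-- | Hypothetical placeholder for a fault description; the real fs-sim
-- type is much richer. Here it only means "faults are requested".
data Errors = Errors deriving (Show, Eq)

-- | A state machine action that optionally carries faults to inject.
data Action = CreateSnapshot (Maybe Errors) SnapshotName
  deriving (Show, Eq)

data Response = Ok | DummyError deriving (Show, Eq)

-- | The state tracked in this sketch: the names of existing snapshots.
type Snapshots = [SnapshotName]

-- | Deterministic model (assumption): it fails exactly when faults are
-- requested and leaves its state untouched in that case.
runModel :: Action -> Snapshots -> (Response, Snapshots)
runModel (CreateSnapshot (Just _) _) s = (DummyError, s)
runModel (CreateSnapshot Nothing n)  s = (Ok, n : s)

-- | SUT step. The 'Bool' stands in for the non-deterministic outcome:
-- did the injected faults actually make the action fail?
runSUT :: Bool -> Action -> Snapshots -> (Response, Snapshots)
runSUT _ (CreateSnapshot Nothing n) s = (Ok, n : s)
runSUT faultFired (CreateSnapshot (Just _) n) s
  | faultFired = (DummyError, s)
    -- The action failed, so no snapshot was created.
  | otherwise  = (DummyError, delete n (n : s))
    -- Accidental success: roll back by deleting the new snapshot,
    -- then report the same dummy error as the model.
```

Whichever way the `Bool` falls, `runSUT` and `runModel` return the same response and the same set of snapshots for an action that carries `Just Errors`, which is the property the roll back is meant to restore.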
Tasks
- (feat: add corruptSnapshot to model #555) The model has to be made aware of snapshot corruption, so that it can respond appropriately when opening a snapshot that is corrupt. Moreover, we have to add a function to the model that explicitly corrupts a snapshot.
- Add support for corrupting snapshots explicitly in the SUT. There are two options. I'd recommend starting with option (i) because it is the most straightforward and requires minimal code changes. We can implement option (ii) afterwards, at which point all the necessary boilerplate will already be in place.
  - (i) […] `flipSnapshotBit` approach from Test snapshot corruption #539. The code to corrupt the snapshot can be factored out from `prop_flipSnapshotBit`, and then we'd probably need to add a `corruptSnapshot` function to the `IsTable` class and implement it using `flipSnapshotBit`.
  - (ii) […] `fs-sim` to also perform silent corrupted writes in addition to the noisy corrupted writes it does now.
- Represent the instruction to corrupt a snapshot in the `CreateSnapshot` action.
  - […] `Maybe (Double, Double)` should be added, which describes which file and bit to corrupt in a snapshot.
  - […] `Maybe Errors` with silent corruption instructions.

  There is currently already a `Maybe Errors` in the `CreateSnapshot` action. We probably want to keep this field separate, because the existing `Maybe Errors` injects noisy exceptions. That serves a different goal from corruption testing: we use it to test that we do not forget references, double release references, leak file handles, etc. We also use it to test that the internal database state stays consistent.
- Then make the appropriate changes to handle the updated `CreateSnapshot` action: modifying generators, modifying `runModel`, modifying `runIO` and `runIOSim`, and so forth. Finally, run the tests and fix whatever they uncover in the testing framework, the model, or the SUT.
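To make the `Maybe (Double, Double)` idea concrete, here is a minimal sketch of how a pair of fractions could select a file and a bit to flip. It is an assumption-laden toy, not the `flipSnapshotBit` code from #539: snapshots are modelled as in-memory lists of bytes, the fractions are assumed to lie in [0, 1], and `pick`, `flipBitInFile`, `checksum` and this `corruptSnapshot` are hypothetical names. The XOR checksum is only a stand-in for whatever integrity check the real snapshot format uses.

```haskell
import Data.Bits (complementBit, xor)
import Data.Word (Word8)

-- | Hypothetical in-memory snapshot: a list of files, each a list of bytes.
type File = [Word8]
type Snapshot = [File]

-- | Map a fraction (assumed to be in [0, 1]) to an index in [0, n - 1].
-- Assumes n > 0; out-of-range fractions are clamped.
pick :: Double -> Int -> Int
pick frac n = min (n - 1) (max 0 (floor (frac * fromIntegral n)))

-- | Flip a single bit of a file, addressed by its position among all
-- bits of the file (bit 0 is the least significant bit of byte 0).
flipBitInFile :: Int -> File -> File
flipBitInFile i bytes =
  [ if j == byteIx then complementBit b bitIx else b
  | (j, b) <- zip [0 ..] bytes
  ]
  where
    (byteIx, bitIx) = i `divMod` 8

-- | Silently corrupt a snapshot: the first fraction picks the file, the
-- second picks the bit within that file. A snapshot without files, or
-- a chosen file without bytes, is returned unchanged.
corruptSnapshot :: (Double, Double) -> Snapshot -> Snapshot
corruptSnapshot (fileFrac, bitFrac) files
  | null files = files
  | otherwise  =
      [ if k == fileIx then flipOne f else f
      | (k, f) <- zip [0 ..] files
      ]
  where
    fileIx = pick fileFrac (length files)
    flipOne f
      | null f    = f
      | otherwise = flipBitInFile (pick bitFrac (8 * length f)) f

-- | Toy integrity check (stand-in only): XOR of all bytes. Any single
-- bit flip changes it, so it detects the corruption above.
checksum :: Snapshot -> Word8
checksum = foldr xor 0 . concat
```

For example, `corruptSnapshot (0, 0) [[0]]` yields `[[1]]`, and the checksum of a snapshot changes after any single flip. Using fractions rather than absolute indices means a generator can produce the corruption instruction without knowing how many files the snapshot has or how large they are.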