khepri_machine: Restore projections when a snapshot is installed
This hooks into the new `ra_machine:snapshot_installed/2` callback introduced in the Ra version update in the parent commit. We want to restore projections when a snapshot is installed, the same way we do when the Khepri machine recovers.

A cluster member may be far behind the other members, causing the cluster leader to try to "catch up" that member by sending it a snapshot. Once the snapshot is installed, the machine state matches the leader's at the snapshot index, but this 'jump' forward doesn't trigger any changes to the projections. So we need to sync the new machine state (the tree) and the projection tables by using the existing `restore_projections` aux effect.

This fixes a bug reproducible in the server with the following steps:

* Start a 3-node cluster with `make start-cluster`.
* Enable the `khepri_db` feature flag.
* Start a definitions import with a very large data set, for example the `100-queues-with-100-bindings-each.json` case from the rabbitmq/sample-configs repo.
* Part-way through the import, perform a rolling restart of the cluster with `make restart-cluster`.
* Examine a projection table affected by the definition import. Note the discrepancy between the numbers of bindings in the `rabbit_khepri_bindings` table:

```
for i in 1 2 3; printf "rabbit-$i: "; rabbitmqctl -n rabbit-$i eval 'length(ets:tab2list(rabbit_khepri_bindings)).'; end
rabbit-1: 49003
rabbit-2: 49003
rabbit-3: 23370
```

The rolling restart stops and restarts rabbit-3 before the other nodes. The definition import continues while rabbit-3 is restarting, though, because rabbit-1 and rabbit-2 still form a majority. When those nodes restart, one of them will become leader (because they are ahead of rabbit-3 in terms of commit index) and will catch up rabbit-3 with a snapshot.

The number of Raft indices skipped ahead with that snapshot is nearly the same as the number of records missing from the projection table: the missing records are the ones the leader sent in the snapshot rather than as individual log entries. By restoring projections after the snapshot is installed, all nodes reflect the same number of bindings.
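The failure mode described above can be illustrated with a small toy model. This is only a sketch in Python, not Khepri or Ra code: the `Member` class, its methods, and the `restore_projections` flag are hypothetical names chosen for illustration. The `tree` dict stands in for the machine state and the `projection` dict for a projection table such as `rabbit_khepri_bindings`.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """Toy replica (hypothetical, for illustration only)."""
    tree: dict = field(default_factory=dict)        # stands in for the machine state
    projection: dict = field(default_factory=dict)  # stands in for a projection table
    last_applied: int = 0

    def apply(self, index, key, value):
        # Normal path: applying a command updates the tree and, as a
        # side effect, the projection table.
        self.tree[key] = value
        self.projection[key] = value
        self.last_applied = index

    def install_snapshot(self, index, tree, restore_projections):
        # Snapshot path: the state jumps forward to `index` without any
        # per-command side effects, so the projection is left untouched
        # unless it is rebuilt explicitly.
        self.tree = dict(tree)
        self.last_applied = index
        if restore_projections:
            self.restore_projections()

    def restore_projections(self):
        # Rebuild the projection table from the current tree.
        self.projection = dict(self.tree)


def run(restore_projections):
    """Return (indices skipped by the snapshot, records missing on the follower)."""
    leader, follower = Member(), Member()
    for i in range(1, 11):
        leader.apply(i, f"binding-{i}", i)
        if i <= 4:  # the follower stops after index 4, as if it were restarting
            follower.apply(i, f"binding-{i}", i)
    skipped = leader.last_applied - follower.last_applied
    follower.install_snapshot(leader.last_applied, leader.tree, restore_projections)
    missing = len(leader.projection) - len(follower.projection)
    return skipped, missing


if __name__ == "__main__":
    print(run(restore_projections=False))  # (6, 6)
    print(run(restore_projections=True))   # (6, 0)
```

In this model, without the restore step the follower is missing exactly as many projection records as the snapshot skipped indices, which mirrors the "nearly the same" relationship noted above. With the restore step, the follower's projection matches the leader's.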
1 parent c32096e, commit 24b2814
Showing 2 changed files with 147 additions and 1 deletion.