Restore projections when a snapshot is installed #259
Conversation
Codecov Report: all modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##             main     #259      +/-   ##
==========================================
+ Coverage   89.64%   89.79%   +0.15%
==========================================
  Files          21       21
  Lines        3091     3098       +7
==========================================
+ Hits         2771     2782      +11
+ Misses        320      316       -4
```
Previously the `snapshot_interval` config option only controlled when `khepri_machine` should emit the `{release_cursor, RaftIndex, State}` effect. Ra itself ignores those effects if they are emitted too often, a behaviour controlled by the `min_snapshot_interval` (recently renamed from `snapshot_interval`) log init option in Ra. To make Ra emit snapshots as often as the `snapshot_interval` option requests, we need to set the `min_snapshot_interval` log init option to a value smaller than `snapshot_interval`. Ra only handles a `release_cursor` effect when the number of commands applied is strictly greater than the interval, so `min_snapshot_interval` must be smaller than `snapshot_interval`.
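As an illustration only, here is a minimal sketch of the relationship described above; it is not Khepri's actual code. The option names `min_snapshot_interval` and `snapshot_interval` come from the paragraph above, while the module name, function name and example value are made up for the sketch.

```
%% Minimal sketch, not the actual Khepri code. Given Khepri's
%% `snapshot_interval`, build Ra log init options that let Ra act on every
%% `release_cursor` effect Khepri emits: Ra only handles the effect once
%% strictly more than `min_snapshot_interval` commands have been applied,
%% so the value must be strictly smaller than `snapshot_interval`.
-module(snapshot_interval_sketch).
-export([ra_log_init_args/1]).

ra_log_init_args(SnapshotInterval) when is_integer(SnapshotInterval),
                                        SnapshotInterval > 1 ->
    #{min_snapshot_interval => SnapshotInterval - 1}.
```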
The new version of Ra adds a `ra_machine:snapshot_installed/2` callback that we will use in the child commit.
Great patch, thank you!
Just two minor comments:
A typo in the commit message of the last commit: "the machine state maches the leader", "maches" -> "matches"
Still in the commit message, in the steps to reproduce the problem with RabbitMQ, could you please make it clear throughout the paragraph when you are talking about RabbitMQ? Because this is the Khepri repository, RabbitMQ is somewhat out of context. For example, the first line of that section says "This fixes a bug reproducible in the server with the following reproduction steps:". It should be explicit that you are talking about the "RabbitMQ server". Likewise for the rest of the section where you talk about servers, nodes and so on.
This hooks into the new `ra_machine:snapshot_installed/2` callback introduced in the Ra version update in the parent commit. We want to restore projections when a snapshot is installed, the same way as we do when the Khepri machine recovers.

A cluster member may be far behind the other members of the cluster, causing the cluster leader to try to "catch up" that member by sending it a snapshot. Once the snapshot is installed, the machine state matches the leader's at the snapshot index, but this 'jump' forward doesn't trigger any changes to the projections. So we need to sync the new machine state (the tree) and the projection tables by using the existing `restore_projections` aux effect.

This fixes a bug reproducible in RabbitMQ 3.13.3 with the following reproduction steps:

* Clone and enter the `rabbitmq/rabbitmq-server` repository.
* Start a 3-node RabbitMQ cluster with `make start-cluster`.
* Enable the `khepri_db` feature flag with `./sbin/rabbitmqctl enable_feature_flag khepri_db`.
* Start a definitions import with a very large data set, for example the `100-queues-with-1000-bindings-each.json` case from the `rabbitmq/sample-configs` repo.
* Un-gzip the case: `tar xzf topic-bindings/100-queues-with-1000-bindings-each.json.gz` in the sample-configs directory.
* Run `./sbin/rabbitmqctl import_definitions path/to/sample-configs/topic-bindings/100-queues-with-1000-bindings-each.json.gz` in the rabbitmq-server directory.
* Part-way through the import, perform a rolling restart of the cluster with `make restart-cluster` in the rabbitmq-server directory.
* Examine a projection table affected by the definitions import. Note the discrepancy between the numbers of bindings in the `rabbit_khepri_bindings` table:

```
for i in 1 2 3; printf "rabbit-$i: "; ./sbin/rabbitmqctl -n rabbit-$i eval 'length(ets:tab2list(rabbit_khepri_bindings)).'; end
rabbit-1: 49003
rabbit-2: 49003
rabbit-3: 23370
```

RabbitMQ uses Khepri with this setup and uses the projections feature to store some data - like bindings in this example - for fast access. The rolling restart done by `make restart-cluster` causes RabbitMQ node rabbit-3's Khepri store to fall behind its peers rabbit-1 and rabbit-2 until they are restarted. Then a new leader catches up rabbit-3 via a snapshot.

The definitions file in this example contains very many bindings which should end up in the `rabbit_khepri_bindings` table. We see a discrepancy in the number of bindings in the projection table between the RabbitMQ nodes because rabbit-3 is caught up by a new leader (either rabbit-1 or rabbit-2) via a snapshot installation. Before this change, this meant that none of the bindings inserted between rabbit-3's Raft index before the snapshot installation and its index afterwards were projected by Khepri into the `rabbit_khepri_bindings` table. By restoring projections after the snapshot is installed, all nodes reflect the same number of bindings. Retrying the reproduction steps above after this change in Khepri causes all `rabbit_khepri_bindings` tables to contain the same data.
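For illustration only, a minimal sketch of what such a clause in the `ra_machine` callback module could look like. It assumes that `snapshot_installed/2` receives the snapshot metadata plus the newly installed machine state and may return a list of Ra effects; the aux command atom `restore_projections` is taken from the commit message above, and this is not necessarily Khepri's exact implementation.

```
%% Minimal sketch, assuming `snapshot_installed/2` gets the snapshot
%% metadata and the newly installed machine state and may return Ra
%% effects. The aux command name comes from the commit message; the rest
%% is an assumption, not Khepri's exact code.
snapshot_installed(_SnapshotMeta, _InstalledMachineState) ->
    %% Ask the aux handler to rebuild the projection ETS tables from the
    %% tree in the newly installed machine state, the same way it does
    %% when the machine recovers.
    [{aux, restore_projections}].
```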
Thanks!
This is the companion PR to rabbitmq/ra#448
The new `ra_machine:snapshot_installed/2` optional callback can be used to sync the Khepri tree and projection tables. Without syncing these, if a member is caught up by the leader with a snapshot, any changes between where the member was in the log and the new snapshot index are not reflected in the projection tables. The second commit has a reproduction case and an explanation of a bug that is possible if we do not sync the projection tables when a snapshot is installed.
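To make "projection tables" concrete, here is a rough sketch of registering a Khepri projection, which keeps an ETS table in sync with part of the tree; the exact arities of `khepri_projection:new/2` and `khepri:register_projection/3`, as well as the store id, path and table name, are assumptions for illustration and not taken from this PR.

```
%% Rough sketch of a Khepri projection: Khepri maintains the ETS table
%% `example_table` as the tree changes, which is exactly the data this PR
%% re-synchronizes after a snapshot install. Store id, path and names are
%% made up; function arities are assumptions about Khepri's public API.
example() ->
    ProjectionFun = fun(Path, Payload) -> {Path, Payload} end,
    Projection = khepri_projection:new(example_table, ProjectionFun),
    ok = khepri:register_projection(my_store, [stock, wood, <<"oak">>],
                                    Projection),
    %% Reads are then served straight from the projection ETS table.
    ets:lookup(example_table, [stock, wood, <<"oak">>]).
```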