Restore projections when a snapshot is installed #259

Merged · 3 commits · Jun 27, 2024

Commits on Jun 19, 2024

  1. Configure Ra snapshot interval via snapshot_interval config

    Previously the `snapshot_interval` config option only controlled when
    `khepri_machine` emitted the `{release_cursor, RaftIndex, State}`
    effect. Ra itself ignores those effects if they are emitted too
    often, a threshold controlled by its `min_snapshot_interval`
    (recently renamed from `snapshot_interval`) log init option. To make
    Ra take snapshots as often as the configured `snapshot_interval`, we
    need to set the `min_snapshot_interval` log init option to a value
    smaller than `snapshot_interval`: Ra only honours a `release_cursor`
    effect when the number of commands applied since the last snapshot is
    strictly greater than `min_snapshot_interval`, so
    `min_snapshot_interval` must be strictly smaller than
    `snapshot_interval`.
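
    As a rough illustration of that relationship (a sketch with
    hypothetical helper names, not Ra's or Khepri's actual internals):

    ```erlang
    %% Khepri side (sketch): emit a release_cursor effect every
    %% SnapshotInterval applied commands.
    maybe_release_cursor(RaftIndex, State, SnapshotInterval)
      when RaftIndex rem SnapshotInterval =:= 0 ->
        [{release_cursor, RaftIndex, State}];
    maybe_release_cursor(_RaftIndex, _State, _SnapshotInterval) ->
        [].

    %% Ra side (sketch): honour the effect only when strictly more than
    %% MinSnapshotInterval commands have been applied since the last
    %% snapshot, hence MinSnapshotInterval must be smaller than
    %% SnapshotInterval.
    should_take_snapshot(Idx, LastSnapshotIdx, MinSnapshotInterval) ->
        Idx - LastSnapshotIdx > MinSnapshotInterval.
    ```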
    the-mikedavis committed Jun 19, 2024 · a808303
  2. Update Ra to 2.11.0

    The new version of Ra adds a `ra_machine:snapshot_installed/2` callback
    that we will use in the child commit.
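
    For reference, the shape of the new callback on a toy machine (a
    sketch assuming the two arguments are the snapshot metadata and the
    installed machine state, and that a list of effects is returned):

    ```erlang
    -module(toy_machine).
    -behaviour(ra_machine).
    -export([init/1, apply/3, snapshot_installed/2]).

    init(_Config) -> #{}.

    apply(_Meta, _Cmd, State) -> {State, ok, []}.

    %% Called after a snapshot sent by the leader has been installed,
    %% giving the machine a chance to react to the state 'jump'.
    snapshot_installed(_Meta, _State) ->
        [].
    ```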
    the-mikedavis committed Jun 19, 2024 · c32096e

Commits on Jun 26, 2024

  1. khepri_machine: Restore projections when a snapshot is installed

    This hooks into the new `ra_machine:snapshot_installed/2` callback
    introduced in the Ra version update in the parent commit.
    
    We want to restore projections when a snapshot is installed, the same
    way we do when the Khepri machine recovers. A cluster member may fall
    far behind the other members of the cluster, causing the cluster
    leader to "catch up" that member by sending it a snapshot. Once the
    snapshot is installed, the machine state matches the leader's at the
    snapshot index, but this 'jump' forward doesn't trigger any changes
    to the projections. So we need to sync the new machine state (the
    tree) with the projection tables using the existing
    `restore_projections` aux effect.
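
    A minimal sketch of the idea (the exact effect shape here is an
    assumption; only the aux-effect mechanism and the
    `restore_projections` name come from the description above):

    ```erlang
    %% khepri_machine (sketch): after a snapshot install, ask the aux
    %% handler to rebuild the projection tables from the new tree, the
    %% same way machine recovery does.
    snapshot_installed(_Meta, _State) ->
        [{aux, restore_projections}].
    ```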
    
    This fixes a bug reproducible in RabbitMQ 3.13.3 with the following
    reproduction steps:
    
    * Clone and enter the `rabbitmq/rabbitmq-server` repository.
    * Start a 3-node RabbitMQ cluster with `make start-cluster`.
    * Enable the `khepri_db` feature flag with
      `./sbin/rabbitmqctl enable_feature_flag khepri_db`.
    * Start a definitions import with a very large data set, for example
      the `100-queues-with-1000-bindings-each.json` case from the
      `rabbitmq/sample-configs` repo.
        * Un-gzip the case with
          `gunzip topic-bindings/100-queues-with-1000-bindings-each.json.gz`
          in the sample-configs directory.
        * Import it with
          `./sbin/rabbitmqctl import_definitions \
           path/to/sample-configs/topic-bindings/100-queues-with-1000-bindings-each.json`
          in the rabbitmq-server directory.
    * Part-way through the import, perform a rolling restart of the cluster
      with `make restart-cluster` in the rabbitmq-server directory.
    * Examine a projection table affected by the definition import. Note the
      discrepancy in the number of bindings in the `rabbit_khepri_bindings`
      table across nodes:
    
    ```
    for i in 1 2 3; printf "rabbit-$i: "; ./sbin/rabbitmqctl -n rabbit-$i eval 'length(ets:tab2list(rabbit_khepri_bindings)).'; end
    rabbit-1: 49003
    rabbit-2: 49003
    rabbit-3: 23370
    ```
    
    With this setup, RabbitMQ uses Khepri and relies on the projections
    feature to store some data - bindings, in this example - for fast
    access. The rolling restart performed by `make restart-cluster`
    causes node rabbit-3's Khepri store to fall behind its peers rabbit-1
    and rabbit-2 until they are restarted; a new leader then catches
    rabbit-3 up via a snapshot.
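
    For context, a projection maps tree entries into an ETS table;
    registering one looks roughly like this (a sketch with a hypothetical
    table name and path, not RabbitMQ's actual binding schema):

    ```erlang
    %% Copy every change under [stock, wood] into the `example_table` ETS
    %% table as {Path, Payload} tuples for fast reads.
    Projection = khepri_projection:new(
                   example_table,
                   fun(Path, Payload) -> {Path, Payload} end),
    ok = khepri:register_projection(StoreId, [stock, wood], Projection).
    ```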
    
    The definition file in this example contains a very large number of
    bindings, which should all end up in the `rabbit_khepri_bindings`
    table. The nodes disagree on the number of bindings in the projection
    table because rabbit-3 is caught up by a new leader (either rabbit-1
    or rabbit-2) via a snapshot installation. Before this change, any
    bindings inserted between rabbit-3's Raft index before the snapshot
    installation and the snapshot index were never projected by Khepri
    into the `rabbit_khepri_bindings` table. By restoring projections
    after the snapshot is installed, all nodes reflect the same number of
    bindings: retrying the reproduction steps above with this change in
    Khepri leaves all `rabbit_khepri_bindings` tables containing the same
    data.
    the-mikedavis committed Jun 26, 2024 · 28bd581