Ra log v2 #475

Merged
merged 2 commits into from
Nov 12, 2024

kjnilsson
Contributor

@kjnilsson kjnilsson commented Oct 11, 2024

Log v2

This is a substantial refactoring of the Ra log implementation. All disk formats remain the same - the changes in this PR primarily deal with:

  • Changes to memtable management and lifetimes
  • Refinement of Ra server / WAL / Segment writer messages and handling.

Motivation

Previously the WAL process had quite a lot of responsibilities:

  • batching incoming writes from all local ra servers (writers).
  • maintaining accounting information to ensure
  • writing entries to disk and fsyncing
  • notifying the ra server post fsync
  • creating and updating new memtables and the memtable lookup tables (ra_log_open_mem_tables, ra_log_closed_mem_tables).

As the WAL is the primary contention point of the Ra log system, it was clear that it would benefit from fewer responsibilities, which should increase the performance and scalability of the log implementation.

In addition the ra server wrote all entries first to a local ETS "cache" table, the same entry that later was written to a memtable by the WAL. So each entry was written to two ETS tables (at a minimum, low priority entries may also be written to yet another ETS table).

V2

In the v2 implementation the WAL no longer writes to memtables. The cache table that each ra server used to maintain becomes, in effect, the memtable. Only the ra servers write to memtables now.

The WAL's responsibilities have been reduced to:

  • batching
  • accounting
  • writing and syncing
  • notifying
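The four remaining responsibilities can be illustrated with a toy model. This is a minimal Python sketch, not Ra's Erlang implementation: the `ToyWal` class, its method names, and the shape of the notification tuples are all hypothetical and chosen only for illustration.

```python
import os
from collections import defaultdict


class ToyWal:
    """Toy model of the reduced WAL role (hypothetical, not Ra's code)."""

    def __init__(self, path):
        self._file = open(path, "ab")
        self._pending = []  # (writer_id, index, payload) awaiting the next flush
        self.notifications = []  # stand-in for messages sent back to writers

    def append(self, writer_id, index, payload):
        # Batching: entries from all writers accumulate until flush() runs.
        self._pending.append((writer_id, index, payload))

    def flush(self):
        if not self._pending:
            return
        written = defaultdict(list)  # accounting: indexes written per writer
        for writer_id, index, payload in self._pending:
            self._file.write(payload)
            written[writer_id].append(index)
        # Writing and syncing: a single fsync covers the whole batch.
        self._file.flush()
        os.fsync(self._file.fileno())
        # Notifying: each writer learns which index range is now on disk.
        for writer_id, indexes in written.items():
            self.notifications.append((writer_id, min(indexes), max(indexes)))
        self._pending.clear()

    def close(self):
        self._file.close()
```

Note that the sketch never touches a memtable: in this model the writers keep their own in-memory copy of each entry, and the WAL only reports back once the batch is durable.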

Instead of being linked to the lifetime of a WAL file and deleted after being flushed to segments, memtables are now maintained indefinitely and flushed entries are deleted from the table.
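The lifetime change can be pictured with a small model. This is a rough Python sketch under stated assumptions, not the ETS-based code in Ra: `ToyMemtable` and its methods are hypothetical, and it assumes a flush is reported as a single upper log index.

```python
class ToyMemtable:
    """Toy model of a long-lived, per-server memtable (hypothetical)."""

    def __init__(self):
        self._entries = {}  # log index -> entry

    def insert(self, index, entry):
        # Only the owning server writes entries; the WAL never does.
        self._entries[index] = entry

    def lookup(self, index):
        return self._entries.get(index)

    def delete_flushed(self, up_to_index):
        # Assumption: a segment flush is reported as an upper index.
        # The flushed entries are dropped, but the table itself lives on.
        for index in [i for i in self._entries if i <= up_to_index]:
            del self._entries[index]

    def __len__(self):
        return len(self._entries)
```

In this model the table object is created once and reused for the life of the server, which is the contrast with v1, where a table was tied to a WAL file and deleted as a whole.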

There are many additional changes and improvements and some API simplifications.

Breaking change:

The rarely used ra:register_external_reader/2 API has been deprecated and will now only read entries from segment files. So in effect it will provide the same behaviour just delayed.

The docs/internals/LOG_V2.md provides further details on the new log design.

@kjnilsson kjnilsson marked this pull request as draft October 11, 2024 09:50
acogoluegnes added a commit to rabbitmq/ra-kv-store that referenced this pull request Oct 24, 2024
@kjnilsson kjnilsson changed the title Ra log single memtbl Ra log v2 Oct 25, 2024
@kjnilsson kjnilsson force-pushed the ra-log-single-memtbl branch 15 times, most recently from 6a3a148 to 674fce1 Compare November 11, 2024 17:05
This is a substantial refactoring of the Ra log implementation. All disk formats remain the same - the changes in this PR primarily deal with:

* Changes to memtable management and lifetimes
* Refinement of Ra server / WAL / Segment writer messages and handling.

Motivation

Previously the WAL process had quite a lot of responsibilities:

* Batching incoming writes from all local ra servers (writers).
* Accounting
* Writing entries to disk and fsyncing
* Notifying ra server post fsync
* Creating and updating new memtables and the memtable lookup tables (ra_log_open_mem_tables, ra_log_closed_mem_tables).
* Notifying segment writer

As the WAL is the primary contention point of the Ra log system, it was clear that it would benefit from fewer responsibilities, which should increase the performance and scalability of the log implementation.

In addition the ra server wrote all entries first to a local ETS "cache" table, the same entry that later was written to a memtable by the WAL. So each entry was written to two ETS tables (at a minimum, low priority entries may also be written to yet another ETS table).

V2

In the v2 implementation the WAL no longer writes to memtables. The cache table that each ra server used to maintain becomes, in effect, the memtable. Only the ra servers write to memtables now.

The WAL's responsibilities have been reduced to:

* batching
* accounting
* writing and syncing
* notifying

Instead of being linked to the lifetime of a WAL file and deleted after being flushed to segments, memtables are now maintained indefinitely and flushed entries are deleted from the table.

There are many additional changes and improvements and some API simplifications.

Breaking change:

The rarely used ra:register_external_reader/2 API has been deprecated and will now only read entries from segment files. So in effect it will provide the same behaviour just delayed.

The docs/LOG_V2.md provides further details on the new log design.

Fix typos

wip
@kjnilsson kjnilsson marked this pull request as ready for review November 12, 2024 12:45
@kjnilsson kjnilsson added this to the 2.15.0 milestone Nov 12, 2024
@kjnilsson kjnilsson modified the milestones: 2.15.0, 2.16, 2.16.0 Nov 12, 2024
Member

@michaelklishin michaelklishin left a comment

We know that this PR makes quorum queue throughput go up by a non-trivial percentage.

Would be interesting to see the effects of this memtable use change on memory consumption. I assume we'd see the same zig-zag-looking pattern but potentially with a different amplitude.

@michaelklishin michaelklishin merged commit 09ee102 into main Nov 12, 2024
7 checks passed
@michaelklishin michaelklishin deleted the ra-log-single-memtbl branch November 12, 2024 23:47
@kjnilsson kjnilsson mentioned this pull request Nov 14, 2024
3 participants