Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync name.dets to disk #443

Merged
merged 3 commits into from
May 20, 2024
Merged

Sync name.dets to disk #443

merged 3 commits into from
May 20, 2024

Conversation

visciang
Copy link
Contributor

@visciang visciang commented May 18, 2024

Proposed Changes

I have seen names.dets being left not in sync with the system state (server directory name).

The operations that trigger the problem are:

  1. start the app
  2. create a raft server X (UID-123)
  3. leave_and_delete the server X
  4. re create the raft server X (UID-456)
  5. immediately stop the app (<<<=== before the DETS auto_sync the new registered name to the disk - 500ms)

So, in other words, we are "resetting" an existing ra server (3+4).

What I've seen is that names.dets refer to the ra server identifier created at step 2 - UID-123 (the first instance of the server X).
When I restart the application, ra crashes while strarting with error enoent since the ra server directory UID-123 does not exist anymore (deleted by step 3).

Here a view of what I see - names.dets refer to the server "ELIXIRXR9LYKF78Z3R" (it's the one created on step 2) while on disk I see the server "ELIXIRHP06E8L3EYLT" (created on step 4).

image

I've been able to "fix" the problem introducing a dets:sync call after each change operation in ra_directory module, following what is already in place in the ra_log_metadata module.

@kjnilsson I have not jet created an ISSUE to link to this PR, would like to get a feedback from you first.
Even if the change is small, and I think should not break existing stuff and lower down the performance, maybe there is a better architectural approach you can come to - since given the non atomicity of writing to a dets can always lead to a dets that has not been updated (ie erlang app killed after a dets:insert but before the dets:sync).
Let me know if you need more information.

Types of Changes

  • Bug fix (non-breaking change which fixes issue "NOT JET OPENED")
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation (correction or otherwise)
  • Cosmetics (whitespace, appearance)

Checklist

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

@pivotal-cla
Copy link

@visciang Please sign the Contributor License Agreement!

Click here to manually synchronize the status of this Pull Request.

See the FAQ for frequently asked questions.

@pivotal-cla
Copy link

@visciang Thank you for signing the Contributor License Agreement!

@michaelklishin
Copy link
Member

@visciang it's fine not to file an issue first if you explain the reasoning behind the change and various design considerations (if any) in the PR. I think we have enough information here.

Thank you for taking the time to contribute!

@michaelklishin
Copy link
Member

FTR, the functions affected should not be used on the hot code path, so this PR is very unlikely to introduce a throughput hit with reasonable workloads:

  • ra_directory:deinit/1
  • ra_directory:register_name/6
  • ra_directory:unregister_name/2

@michaelklishin michaelklishin modified the milestone: 2.11.0 May 19, 2024
@kjnilsson
Copy link
Contributor

kjnilsson commented May 19, 2024 via email

@kjnilsson kjnilsson self-requested a review May 20, 2024 08:46
Copy link
Contributor

@kjnilsson kjnilsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be sufficient to retain the change in ra_directory:deinit/1 and remove the other calls to dets:sync/1 as they are likely to have a negative performance impact on RabbitMQ.

src/ra_directory.erl Outdated Show resolved Hide resolved
src/ra_directory.erl Outdated Show resolved Hide resolved
@kjnilsson kjnilsson merged commit 6b48b15 into rabbitmq:main May 20, 2024
9 of 11 checks passed
@kjnilsson
Copy link
Contributor

Cheers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants