Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cluster_name property to avoid several clusters merging accidentally #36

Merged
merged 5 commits into from
Apr 12, 2022

Conversation

xvello
Copy link
Collaborator

@xvello xvello commented Apr 7, 2022

As described in quickwit-oss/quickwit#1260, we need a mechanism to ensure clusters do not accidentally merge when IPs are reused.

This PR:

  • introduces a required cluster_name argument when setting up the cluster
  • adds that cluster name to outgoing Syn messages
  • adds a BadCluster message type
  • if the cluster name in the Syn message does not match the peer's cluster name, a BadCluster message is returned instead of a SynAck

Additional notes

  • This PR changes the wire format, but I did not see any versioning mechanism in the protocol. We should merge this before v0.3 is released, to sidestep any compatibility issue.

@xvello xvello marked this pull request as ready for review April 7, 2022 14:22
Copy link
Contributor

@fulmicoton fulmicoton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you suggested in the PR description, could you add an error message to the protocol?

As you mentioned it is nicer for testing, but also,I suspect a common problem that will occur is a misconfigured typo in the conf of one of the server. The user will likely investigate the log of the server that cannot connect to the cluster first and an explicit error message there might be nice.

@fulmicoton
Copy link
Contributor

I create #37 and #38 too.

@xvello
Copy link
Collaborator Author

xvello commented Apr 8, 2022

I just pushed a commit introducing a BadCluster message, instead of silently dropping the message.

I started writing the code with a more generic Rst message, but I could not get a relevant error displayed on the misconfigured node without extending it with a RstReason enum, which added unneeded complexity.

This will make testing and operations easier, by explicitly
refusing the message instead of silently dropping it.
@xvello
Copy link
Collaborator Author

xvello commented Apr 8, 2022

ignore_oversized_payload is failing, but I think this test has lost its initial meaning and is not useful anymore. Discussing it on discord.

Introduced in #4, this test has been outdated since the reception
buffer sized has been increased to the full UDP MTU.

As is, it sends an empty Syn message before sending another valid
Syn message. The assert checks for an SynAck message, that is the
response to the first packet instead of the second.

Now that peers check the cluster name property of Syn messages,
the first packet gets a BadCluster response, which fails the assert.
@xvello xvello requested a review from fulmicoton April 11, 2022 09:05
scuttlebutt/src/lib.rs Outdated Show resolved Hide resolved
@fulmicoton
Copy link
Contributor

@xvello FYI: in tracing ?val will use Debug. %val will call Display.
For a String you want %val

Alternatively, you can also try to get a &str and avoid any modifier. key=val.as_str()

@fulmicoton fulmicoton merged commit 6943a04 into main Apr 12, 2022
@xvello xvello deleted the xvello/cluster-name branch April 12, 2022 13:22
@xvello
Copy link
Collaborator Author

xvello commented Apr 12, 2022

Thanks for the review.
I'm opening a PR to quickwit to introduce the new config option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants