Original leader gets reelected after node restart #472
-
Hi, I am wondering whether this is intentional, something inherent to the protocol, or a bug. We sometimes notice in RabbitMQ that a node which was restarted starts hosting leaders again. This is quite unexpected, since new leaders are elected when the node shuts down. I am testing this with RabbitMQ 3.13.7 but have noticed it with earlier versions as well. Reproduction:
The difference seems to be the following that because on startup
It seems weird to me that this is happening, and if it is happening, then why not to all queues? If the queue process receives even one message, no leader reversal happens. Thanks for your answers.
Here is the log for one of these queues. One other difference compared to the other queues seems to be that no
Replies: 2 comments
-
If no Raft log updates happened between the moment you stop node A and the moment node A comes back (e.g. no messages were published, or consumed and acknowledged), both replicas will have the same log state and the primary difference should be the election term.
Raft's leader election voting (candidate selection) has a random component and can fail (end up in a split vote) and retry. When an older leader shows up during that process, it can produce the behavior you are observing.
The randomness in candidate selection during a vote probably also explains why this happens to some queues but not all.
Most importantly: the data in those QQs should still be safe.
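To make the point about identical logs concrete, here is a minimal Python sketch of the vote rule from the Raft algorithm combined with randomized election timeouts. It is a toy model under stated assumptions, not how RabbitMQ's Raft implementation is written: the names `Replica`, `log_is_up_to_date`, `run_election` and `count_wins` are hypothetical, the 150 to 300 timeout range is an arbitrary illustrative choice, and each round is reduced to "the replica with the shortest timeout asks the others for a vote once", with no retries or term bookkeeping.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    last_log_index: int  # index of the last entry in this replica's log
    last_log_term: int   # term in which that last entry was written


def log_is_up_to_date(candidate: Replica, voter: Replica) -> bool:
    """Raft vote rule: grant a vote only if the candidate's log is at least
    as up to date as the voter's (compare last term first, then last index)."""
    if candidate.last_log_term != voter.last_log_term:
        return candidate.last_log_term > voter.last_log_term
    return candidate.last_log_index >= voter.last_log_index


def run_election(replicas, rng):
    """Run one simplified election round; return the winner's name or None."""
    # Every replica draws a random timeout; the first to expire becomes candidate.
    timeouts = {r.name: rng.uniform(150, 300) for r in replicas}
    candidate = min(replicas, key=lambda r: timeouts[r.name])
    votes = 1  # the candidate votes for itself
    votes += sum(
        1 for r in replicas
        if r is not candidate and log_is_up_to_date(candidate, r)
    )
    # A strict majority is required; otherwise this simplified round has no winner.
    return candidate.name if votes > len(replicas) // 2 else None


def count_wins(replicas, rounds, seed=0):
    """Repeat the simplified election and count how often each replica wins."""
    rng = random.Random(seed)
    wins = {r.name: 0 for r in replicas}
    for _ in range(rounds):
        winner = run_election(replicas, rng)
        if winner is not None:
            wins[winner] += 1
    return wins
```

Calling `count_wins` with three replicas that share the same last index and term should show every replica, including a hypothetical restarted node "A", winning some share of the rounds, because the log check cannot tell them apart and only the random timeout decides. If "A" is given a shorter log than the others, it never collects a majority in this model.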
-
Thank you for the answer, that makes sense. I thought it was OK to have this, but was wondering whether I was missing something.