Ra 1.2: leader operations fail with a timeout #374

jingxu2021 · 2023-05-14T15:26:43Z

jingxu2021
May 14, 2023

Hi,

I have 2 nodes with 3 clusters running on them. Node 1 was the leader in all 3 clusters. I stopped node 1, and node 2 then became the leader for 2 of the clusters, but in the last cluster node 2 is in the 'pre_vote' state and seems stuck there.
If I run ra:members(cluster_3), I get {timeout, cluster_3}.
I then tried everything I could think of, but it stays stuck. I tried restart_server, stop_server/restart_server, and start_or_restart_cluster. I always got the error already_started.
I will try delete_cluster/start_cluster. Update: delete_cluster returned {error,{no_more_servers_to_try,[{timeout,{cluster_3, node_2}}]}}
Could you please suggest how I can recover from this situation? I'm on v1.1.2. I know it's a pretty old version, but that's what I have.
Thank you so much.
Jing

michaelklishin · 2023-05-14T18:39:12Z

michaelklishin
May 14, 2023
Maintainer

This means the cluster does not have an elected leader or a quorum online, or never had them.
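
For reference, a minimal sketch of the majority rule that Raft-based systems such as Ra rely on. The module and function names (`quorum_sketch`, `quorum/1`, `has_quorum/2`) are hypothetical and are not part of the Ra API; the sketch only does the arithmetic.

```erlang
-module(quorum_sketch).
-export([quorum/1, has_quorum/2]).

%% Hypothetical helper, not part of Ra.
%% Smallest number of members that forms a majority.
-spec quorum(pos_integer()) -> pos_integer().
quorum(Members) when is_integer(Members), Members > 0 ->
    Members div 2 + 1.

%% True when enough members are online to elect a leader.
-spec has_quorum(non_neg_integer(), pos_integer()) -> boolean().
has_quorum(Online, Members) ->
    Online >= quorum(Members).
```

Under this rule `quorum(2)` and `quorum(3)` are both 2, so a two-member cluster needs both members online, while a three-member cluster tolerates one member being down.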

michaelklishin · 2023-05-14T18:40:41Z

michaelklishin
May 14, 2023
Maintainer

We cannot suggest much with this amount of information. Consider gathering node logs and state.

Ra 1.2.x is several minor versions and a major version behind.

michaelklishin · 2023-05-14T18:59:59Z

michaelklishin
May 14, 2023
Maintainer

Without any code or logs, #179, #251, #264 look potentially relevant.

#179 mentions how to enable logging, including debug logging.

jingxu2021 · 2023-05-15T01:55:04Z

jingxu2021
May 15, 2023
Author

Thanks so much for all this information. I will enable logging and try my best to provide more details. I appreciate your help.
Getting out of this stuck state would be enough for me, but so far I haven't found any API that helps.

jingxu2021 · 2023-05-15T03:18:51Z

jingxu2021
May 15, 2023
Author

Here is what I found:
1. If both nodes are in the 'follower' state, trigger_election() can recover the cluster.
2. If a node is in the 'pre_vote' state, stop_server() can change its state to 'follower', and then trigger_election() can recover the cluster.
Thanks.
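
The two findings above can be sketched roughly as follows. This is only an illustration of the steps as reported, not a verified recovery procedure. The module name `ra_recovery_sketch` and the functions `plan/1` and `recover/2` are hypothetical. The sketch assumes the arity-1 forms `ra:stop_server/1`, `ra:trigger_election/1` and `ra:members/1`, and it assumes the caller already knows the member's current state, since how to observe it is not covered in this thread.

```erlang
-module(ra_recovery_sketch).
-export([plan/1, recover/2]).

%% Hypothetical helper: map the observed state of the stuck member
%% to the steps reported in this thread. Pure, makes no Ra calls.
-spec plan(atom()) -> [stop_server | trigger_election].
plan(follower) -> [trigger_election];
plan(pre_vote) -> [stop_server, trigger_election];
plan(_Other)   -> [].

%% Apply the plan to ServerId, e.g. {cluster_3, Node}.
%% ObservedState is supplied by the caller (an assumption of this sketch).
recover(ServerId, ObservedState) ->
    lists:foreach(fun(Step) -> run(Step, ServerId) end,
                  plan(ObservedState)),
    %% Returns {ok, Members, Leader} once a leader is known; it may
    %% still return a timeout tuple if the election has not finished.
    ra:members(ServerId).

run(stop_server, ServerId)      -> ra:stop_server(ServerId);
run(trigger_election, ServerId) -> ra:trigger_election(ServerId).
```

For example, `ra_recovery_sketch:recover({cluster_3, Node}, pre_vote)` would stop the member, trigger an election and then query membership. Whether a restart_server call is needed between the two steps is not stated above, so the sketch does not add one.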

michaelklishin · 2024-04-29T18:44:27Z

michaelklishin
Apr 29, 2024
Maintainer

There have been many changes around member state, with more upcoming, so I don't think this is relevant any more in the 2.10.x era of Ra.