repmgrd autofailover not working if PR is down with File system hang #851

Open
nikhil-postgres opened this issue Apr 16, 2024 · 6 comments

Comments

@nikhil-postgres

nikhil-postgres commented Apr 16, 2024

Hi repmgr team,

We found a bug in the repmgrd process. Whenever the primary database host hangs (it cannot perform any DML/DDL operations), the repmgrd process on the HA (standby) node keeps running but stops updating its log files. It is stuck.

repmgrd process in Sleep state on standby:

PID    USER     PR NI VIRT  RES   SHR  S %CPU %MEM TIME+   COMMAND
249678 postgres 20 0  87312 10604 7324 S 0.0  0.0  1334:46 /usr/pgsql-15/bin/repmgrd -f /postgres/admin/pgrepmgr/5304/pgrepmgr_5304.conf --log-level DEBUG --daemonize

When we look at the connections on the primary, the repmgr process (of the standby) is stuck trying to INSERT data into the repmgr.monitoring_history table.
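The stuck sessions described here can be listed from pg_stat_activity on the primary. The following is a minimal Python sketch and not part of the original report: the query uses standard pg_stat_activity columns, while the helper name `stuck_monitoring_inserts`, the 60-second threshold, and the row layout (dicts with `pid`, `state`, `query`, `running_secs`) are assumptions for illustration. Running the query needs a PostgreSQL driver, which is left out here.

```python
# Query to run on the primary; pg_stat_activity and these columns are standard PostgreSQL.
STUCK_QUERY = """
SELECT pid, application_name, state, wait_event_type, wait_event,
       EXTRACT(EPOCH FROM now() - query_start) AS running_secs,
       query
FROM pg_stat_activity
WHERE query ILIKE '%monitoring_history%'
  AND pid <> pg_backend_pid()
"""


def stuck_monitoring_inserts(rows, min_seconds=60):
    """Return the rows that look like long-running INSERTs into monitoring_history.

    rows: iterable of dicts with the keys 'state', 'query' and 'running_secs'
    (hypothetical layout; adapt it to whatever your driver returns).
    """
    stuck = []
    for row in rows:
        query = row["query"].lstrip().lower()
        if (
            row["state"] == "active"
            and query.startswith("insert")
            and "monitoring_history" in query
            and row["running_secs"] >= min_seconds
        ):
            stuck.append(row)
    return stuck
```

A session that stays in this list across several checks matches the symptom above: the statement never completes, so the standby's repmgrd never gets control back.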

In this situation there is no autofailover. Is this a known issue? How can we make sure that repmgrd performs an automatic failover in such situations?

Thanks,
Nikhil

@nikhil-postgres
Author

nikhil-postgres commented Apr 17, 2024

Hi @ibarwick @martinmarques, do you know why repmgrd is not doing the autofailover?

@stephan-hahn

Hi, I had a similar issue some time ago: the standby logged "monitoring_history requested but primary connection not available" entries while no failover happened (and repmgrd therefore continued to sleep). Since I started restarting repmgrd regularly, this has not happened again. I have not yet tested whether this is still an issue in newer versions.
Stephan

@nikhil-postgres
Author

Hi @stephan-hahn, newer versions have the same issue, but I don’t see any update from the repmgr team. Is repmgr still being actively developed, and are the issues being looked into?

@stephan-hahn

I hope so (and think so). We use a cluster solution based on repmgr.

Could you fix the problem with a restart (e.g. daily)?
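Instead of restarting on a fixed schedule, the restart could be tied to the symptom reported at the top of this thread, namely that repmgrd stops writing to its log. A minimal Python sketch, assuming the log file is normally written more often than the chosen threshold (repmgrd's logging frequency depends on settings such as the log level and log_status_interval); the function name `log_is_stale` and the threshold are hypothetical, and the restart itself is deliberately left to however the service is managed on your system.

```python
import os
import time


def log_is_stale(log_path, max_age_secs, now=None):
    """Return True if log_path was last modified more than max_age_secs ago.

    now: optional Unix timestamp, mainly useful for testing; defaults to the
    current time.
    """
    if now is None:
        now = time.time()
    age_secs = now - os.path.getmtime(log_path)
    return age_secs > max_age_secs
```

A cron job or timer could call this every minute and restart repmgrd only when it returns True. This is a workaround for the hang, not a fix for it.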

@nikhil-postgres
Author

Yes, we can fix it with a restart, but as an autofailover solution, repmgr should be able to detect a system hang on the primary and perform a failover.

It is not performing the failover because the repmgrd connections themselves are hung on the primary.
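The underlying problem is a health check that blocks forever instead of failing. As an illustration only, and not a description of how repmgrd is implemented, here is a minimal Python sketch of wrapping a probe in a hard timeout so that a hang is reported as its own outcome. The name `probe_with_timeout` and the probe callable (for example, a small write against the primary) are hypothetical.

```python
import threading


def probe_with_timeout(probe, timeout_secs):
    """Run probe() in a helper thread and classify the outcome.

    Returns "ok" if probe() returned in time, "failed" if it raised an
    exception, and "hung" if it did not finish within timeout_secs.
    """
    outcome = {}

    def runner():
        try:
            probe()
            outcome["status"] = "ok"
        except Exception:
            outcome["status"] = "failed"

    # Daemon thread: a probe that never returns must not keep the caller alive.
    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_secs)
    if worker.is_alive():
        return "hung"
    return outcome.get("status", "failed")
```

With a check of this shape, "hung" can be treated like "failed" when counting unsuccessful attempts, so a primary that accepts connections but never completes a write would still be considered unavailable.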

@stephan-hahn

For us, it has worked perfectly so far, and it's quite a lightweight solution and completely free.
It would be interesting to know what a system hang means for you, and how you cause it.
You could try changing connection_check_type from ping to another option (connection or query).
