repmgr daemon status showing repmgrd as 'not running' #854

Open
mRx-z3d opened this issue Jun 24, 2024 · 1 comment

mRx-z3d commented Jun 24, 2024

Hi,

I'm experimenting with AlloyDB Omni, which is essentially standard PostgreSQL packaged in a container with some Google Cloud (GCP) additions. Everything works well: I was able to build a simple setup with a primary and a single standby, and I used repmgr to test switchover and switchback, which also work fine.
The problem starts when I try to use repmgr with automatic failover:

Versions:
repmgr --version
repmgr 5.4.1

postgres --version
postgres (PostgreSQL) 15.5

Configuration:
A) repmgrd defaults file (/etc/default/repmgrd):
REPMGRD_ENABLED=yes
REPMGRD_CONF="/var/alloydb/config/repmgr.conf"
REPMGRD_OPTS="--daemonize=false"
REPMGRD_USER=postgres
REPMGRD_BIN=/usr/bin/repmgrd
REPMGRD_PIDFILE=/var/run/repmgrd.pid

B) repmgr configuration (/var/alloydb/config/repmgr.conf):
failover=automatic
promote_command='/usr/bin/repmgr standby promote -f /var/alloydb/config/repmgr.conf --log-to-file'
follow_command='/usr/bin/repmgr standby follow -f /var/alloydb/config/repmgr.conf --log-to-file --upstream-node-id=%n'
repmgrd_service_start_command='sudo /usr/bin/systemctl start repmgrd'
repmgrd_service_start_command='sudo /usr/bin/systemctl stop repmgrd'
monitoring_history=yes
log_level=INFO
log_file='/var/log/postgres/repmgrd.log'
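
Note that in the listing above the key repmgrd_service_start_command appears twice, the second time with a "systemctl stop" command. A minimal sketch for spotting repeated keys in a key=value file like this one is shown below. It assumes a simple "key=value" layout with "#" comments; the helper name find_duplicate_keys is made up for illustration and is not part of repmgr.

```python
from collections import defaultdict


def find_duplicate_keys(conf_text):
    """Return {key: [line numbers]} for every key assigned more than once."""
    seen = defaultdict(list)
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        seen[key].append(lineno)
    return {key: lines for key, lines in seen.items() if len(lines) > 1}


# Sample taken from the configuration excerpt in this report.
sample = """\
failover=automatic
repmgrd_service_start_command='sudo /usr/bin/systemctl start repmgrd'
repmgrd_service_start_command='sudo /usr/bin/systemctl stop repmgrd'
monitoring_history=yes
"""

print(find_duplicate_keys(sample))
# prints {'repmgrd_service_start_command': [2, 3]}
```

Whether the duplicate is only a copy-paste slip in this report or is present in the real file is something to verify on the nodes themselves.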

Symptoms:
I'm able to start the repmgrd service on both nodes:

on prim:
repmgr -f /var/alloydb/config/repmgr.conf daemon start --verbose
NOTICE: using provided configuration file "/var/alloydb/config/repmgr.conf"
INFO: connecting to local node
NOTICE: executing: "sudo /usr/bin/systemctl start repmgrd"
NOTICE: repmgrd was successfully started

prim output:
● repmgrd.service - LSB: Start/stop repmgrd
Loaded: loaded (/etc/init.d/repmgrd; generated)
Active: active (running) since Mon 2024-06-24 04:24:39 EDT; 16min ago
Docs: man:systemd-sysv-generator(8)
Process: 10531 ExecStart=/etc/init.d/repmgrd start (code=exited, status=0/SUCCESS)
Tasks: 1 (limit: 19151)
Memory: 1.3M
CPU: 532ms
CGroup: /system.slice/repmgrd.service
└─10536 /usr/lib/postgresql/15/bin/repmgrd --config-file /var/alloydb/config/repmgr.conf --daemonize=false

Jun 24 04:24:39 omnidbv-repli-03 systemd[1]: Starting LSB: Start/stop repmgrd...
Jun 24 04:24:39 omnidbv-repli-03 repmgrd[10531]: Starting PostgreSQL replication management and monitoring daemon: repmgrd.
Jun 24 04:24:39 omnidbv-repli-03 systemd[1]: Started LSB: Start/stop repmgrd.

on stby:
repmgr -f /var/alloydb/config/repmgr.conf daemon start --verbose
NOTICE: using provided configuration file "/var/alloydb/config/repmgr.conf"
INFO: connecting to local node
NOTICE: executing: "sudo /usr/bin/systemctl start repmgrd"
NOTICE: repmgrd was successfully started

stby output:
● repmgrd.service - LSB: Start/stop repmgrd
Loaded: loaded (/etc/init.d/repmgrd; generated)
Active: active (running) since Mon 2024-06-24 04:24:39 EDT; 17min ago
Docs: man:systemd-sysv-generator(8)
Process: 10531 ExecStart=/etc/init.d/repmgrd start (code=exited, status=0/SUCCESS)
Tasks: 1 (limit: 19151)
Memory: 1.3M
CPU: 567ms
CGroup: /system.slice/repmgrd.service
└─10536 /usr/lib/postgresql/15/bin/repmgrd --config-file /var/alloydb/config/repmgr.conf --daemonize=false

Jun 24 04:24:39 omnidbv-repli-03 systemd[1]: Starting LSB: Start/stop repmgrd...
Jun 24 04:24:39 omnidbv-repli-03 repmgrd[10531]: Starting PostgreSQL replication management and monitoring daemon: repmgrd.
Jun 24 04:24:39 omnidbv-repli-03 systemd[1]: Started LSB: Start/stop repmgrd.

The repmgr extension is installed on both nodes:
repmgr=# SELECT * FROM pg_extension;

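  oid  |        extname         | extowner | extnamespace | extrelocatable | extversion |      extconfig      | extcondition
-------+------------------------+----------+--------------+----------------+------------+---------------------+--------------
 14204 | plpgsql                |       10 |           11 | f              | 1.0        |                     |
 99377 | google_columnar_engine |       10 |         2200 | t              | 1.0        |                     |
 99567 | google_db_advisor      |       10 |         2200 | t              | 1.0        |                     |
 99661 | hypopg                 |       10 |         2200 | t              | 1.3.2      |                     |
 50059 | repmgr                 |    47598 |        50058 | f              | 5.4        | {50060,50076,50083} | {"","",""}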

Both "repmgr service status" and "repmgr daemon status" show the repmgrd PIDs but report repmgrd as 'not running':

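 ID | Name          | Role    | Status    | Upstream      | repmgrd     | PID   | Paused? | Upstream last seen
----+---------------+---------+-----------+---------------+-------------+-------+---------+--------------------
 1  | omnidbv-03-n1 | primary | * running |               | not running | 52598 | no      | n/a
 2  | omnidbv-03-n2 | standby |   running | omnidbv-03-n1 | not running | 10536 | no      | 0 second(s) ago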

Any idea why this is happening? What checks does repmgr perform to determine the daemon status (besides the repmgrd_is_running function)? I'd appreciate any help with debugging.
BTW, why does the log file report "set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid" rather than the configured REPMGRD_PIDFILE=/var/run/repmgrd.pid?
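
For anyone debugging the same symptom, a quick way to see which of the two pidfile paths mentioned above actually points at a live process is sketched below. This is only a generic POSIX liveness check and an assumption about what is worth comparing; it is not confirmed to be the check repmgr itself performs. The helper names read_pid and pid_is_alive are made up for illustration.

```python
import os


def read_pid(pidfile):
    """Return the integer PID stored in pidfile, or None if it cannot be read."""
    try:
        with open(pidfile) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def pid_is_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks existence and permissions
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    # The two paths that appear in this report.
    for path in ("/var/run/repmgrd.pid", "/tmp/repmgrd.pid"):
        pid = read_pid(path)
        if pid is None:
            state = "no readable PID"
        else:
            state = "alive" if pid_is_alive(pid) else "not alive"
        print(f"{path}: pid={pid} -> {state}")
```

Running it as the postgres user on each node and comparing the result with the PID column of the status output would show whether the PID repmgr has recorded matches the process systemd started.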


mRx-z3d commented Jun 25, 2024

@ibarwick any chance you could look into this? Many thanks.
