help wanted - failed to get master from the node! #233

sauravassuritynz · 2019-06-20T21:39:32Z

Hello,
First of all, thank you for the in-depth documentation and your article on deploying Postgres using containers on Swarm and Kubernetes.

I am trying to get a Swarm cluster up and running, but it keeps failing to find the master. I have been unable to debug it; can you please help?

Error

postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Setting up STOP handlers...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> STARTING SSH (if required)...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> TUNING UP SSH CLIENT...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> STARTING SSH SERVER...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> STARTING POSTGRES...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> SETTING UP POLYMORPHIC VARIABLES (repmgr=3+postgres=9 | repmgr=4, postgres=10)...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> TUNING UP POSTGRES...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Cleaning data folder which might have some garbage...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Check all partner nodes for common upstream node...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>>>>> Checking NODE=pgmaster...
postdoc_pgmaster.1.l15shca67ook@masternode    | psql: could not connect to server: Connection refused
postdoc_pgmaster.1.l15shca67ook@masternode    | 	Is the server running on host "pgmaster" (10.0.3.3) and accepting
postdoc_pgmaster.1.l15shca67ook@masternode    | 	TCP/IP connections on port 5432?
postdoc_pgmaster.1.l15shca67ook@masternode    | >>>>>> Skipping: failed to get master from the node!
postdoc_pgmaster.1.l15shca67ook@masternode    | >>>>>> Checking NODE=pgslave1...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Auto-detected master name: ''
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Setting up repmgr...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Setting up repmgr config file '/etc/repmgr.conf'...
postdoc_pgmaster.1.l15shca67ook@masternode    | psql: could not translate host name "pgslave1" to address: No address associated with hostname
postdoc_pgmaster.1.l15shca67ook@masternode    | >>>>>> Skipping: failed to get master from the node!
postdoc_pgmaster.1.l15shca67ook@masternode    | cat: /var/lib/postgresql/data/standby.lock: No such file or directory
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Setting up upstream node...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Previously Locked standby upstream node LOCKED_STANDBY=''
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Waiting for upstream postgres server...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Wait schema replication_db.repmgr on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
postdoc_pgmaster.1.l15shca67ook@masternode    | psql: could not connect to server: Connection refused
postdoc_pgmaster.1.l15shca67ook@masternode    | 	Is the server running on host "pgmaster" (10.0.3.3) and accepting
postdoc_pgmaster.1.l15shca67ook@masternode    | 	TCP/IP connections on port 5432?

Compose file

---
version: "3.3"
services:
  pgmaster:
    hostname: pgmaster
    image: postdock/postgres:latest-postgres10-repmgr40
    environment:
      NODE_ID: 1
      NODE_NAME: node1
      PARTNER_NODES: "pgmaster,pgslave1"
      REPLICATION_PRIMARY_HOST: pgmaster
      NODE_PRIORITY: 100
      SSH_ENABLE: 1
      POSTGRES_PASSWORD: monkey_pass
      POSTGRES_USER: monkey_user
      POSTGRES_DB: monkey_db
      CLEAN_OVER_REWIND: 0
      CONFIGS_DELIMITER_SYMBOL: ;
      CONFIGS: "listen_addresses:'*';max_replication_slots:5"
      CLUSTER_NAME: pg_cluster
      REPLICATION_DB: replication_db
      REPLICATION_USER: replication_user
      REPLICATION_PASSWORD: replication_pass
    configs:
      - source: id_rsa
        target: /tmp/.ssh/keys/id_rsa
      - source: id_rsa.pub
        target: /tmp/.ssh/keys/id_rsa.pub
    ports:
    - 5432:5432
    volumes:
    - pgmaster:/var/lib/postgresql/data
    networks:
    - postdock

  pgslave1:
    image: postdock/postgres:latest-postgres10-repmgr40
    environment:
      NODE_ID: 2
      NODE_NAME: node2
      CLUSTER_NODE_NETWORK_NAME: pgslave1
      SSH_ENABLE: 1
      PARTNER_NODES: "pgmaster,pgslave1"
      REPLICATION_PRIMARY_HOST: pgmaster
      CLEAN_OVER_REWIND: 1
      CONFIGS_DELIMITER_SYMBOL: ;
      CONFIGS: "listen_addresses:'*';max_replication_slots:5"
    configs:
      - source: id_rsa
        target: /tmp/.ssh/keys/id_rsa
      - source: id_rsa.pub
        target: /tmp/.ssh/keys/id_rsa.pub
    ports:
    - 5441:5432
    volumes:
    - pgslave1:/var/lib/postgresql/data
    networks:
    - postdock
networks:
  postdock:

volumes:
  pgmaster:
  pgslave1:

configs:
  id_rsa:
    file: /home/vagrant/ssh/id_rsa
  id_rsa.pub:
    file: /home/vagrant/ssh/id_rsa.pub


webdobe · 2021-07-02T19:08:49Z

Running into the same issue...

webdobe · 2021-07-02T19:41:09Z

I found something interesting: every time I start up my stack, I get this error:

Checking NODE=pgslave1...
psql: could not connect to server: Connection refused
Is the server running on host "pgslave1" (10.0.24.12) and accepting
TCP/IP connections on port 5432?

What is weird is that pgslave1 is actually running on 10.0.24.13:
[
  {
    "Name": "app_net",
    "Id": "jut8x2wcutjqs89kiodhyfyou",
    "Created": "2021-07-02T12:36:23.180056702-07:00",
    "Scope": "swarm",
    "Driver": "overlay",
    "EnableIPv6": false,
    "IPAM": {
      "Driver": "default",
      "Options": null,
      "Config": [
        {
          "Subnet": "10.0.24.0/24",
          "Gateway": "10.0.24.1"
        }
      ]
    },
    "Internal": false,
    "Attachable": false,
    "Ingress": false,
    "ConfigFrom": {
      "Network": ""
    },
    "ConfigOnly": false,
    "Containers": {
      "03b68a587e6b72773624c2675fef53d01f6c1426911e8e3c0189c7cd6d918b81": {
        "Name": "pgpool.1.ccn6b37wx4efh490wl2v1g4x1",
        "EndpointID": "a9a5c2b48c603225c85708822ebcb5dbb030d2b0b0eea09721ae4cc1d7045638",
        "MacAddress": "02:42:0a:00:18:05",
        "IPv4Address": "10.0.24.5/24",
        "IPv6Address": ""
      },
      "d5ae3d1c8ca294486ed3f2cab1fc765c5fc8ae3ec9b1792e0a29842d0c8faa98": {
        "Name": "pgslave1.1.og8y63qkcjqyqnquptkv2qxn0",
        "EndpointID": "e68c9317bd21d3878009dbbb76e7441047d82c422644add15c73dff7ec51a583",
        "MacAddress": "02:42:0a:00:18:0d",
        "IPv4Address": "10.0.24.13/24",
        "IPv6Address": ""
      },
      "lb-app_net": {
        "Name": "app_net-endpoint",
        "EndpointID": "59c3c052f31a53c8ec7abd9dd118dc6702068d683cb536a20126e6799a6412ba",
        "MacAddress": "02:42:0a:00:18:08",
        "IPv4Address": "10.0.24.8/24",
        "IPv6Address": ""
      }
    },
    "Options": {
      "com.docker.network.driver.overlay.vxlanid_list": "4116"
    },
    "Labels": {
      "com.docker.stack.namespace": "hah"
    },
    "Peers": [
      {
        "Name": "c09db12e594c",
        "IP": "10.0.2.15"
      },
      {
        "Name": "8c59bf0cf8c6",
        "IP": "10.1.0.141"
      }
    ]
  }
]

What's even weirder is that when the slave runs on a different IP, the address being checked is always one less than the actual one: when it was running on .17, it was looking for it at .16.

webdobe · 2021-07-02T20:02:59Z

Never mind, that was due to endpoint_mode: vip vs. endpoint_mode: dnsrr.
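A minimal sketch of what switching to DNS round-robin might look like for the pgslave1 service from the original compose file, assuming Compose file format 3.3 (deploy.endpoint_mode needs format 3.2 or later). With endpoint_mode: vip (the default), the service name resolves to a virtual IP rather than to the task's own address, which would be consistent with the off-by-one addresses reported above; with dnsrr, it resolves directly to the task IP(s). Because dnsrr cannot be combined with ingress-mode published ports, the port is published in host mode instead. Whether pgmaster (or pgpool) needs the same change is an assumption, not something confirmed in this thread.

```yaml
# Sketch only: endpoint_mode dnsrr applied to the pgslave1 service from this
# issue. environment, configs and volumes are unchanged and omitted here.
version: "3.3"
services:
  pgslave1:
    image: postdock/postgres:latest-postgres10-repmgr40
    deploy:
      # Resolve "pgslave1" straight to the task IP(s) instead of a virtual IP
      endpoint_mode: dnsrr
    ports:
      # dnsrr cannot be used with ingress-mode published ports,
      # so publish in host mode (long port syntax, format 3.2+)
      - target: 5432
        published: 5441
        protocol: tcp
        mode: host
    networks:
      - postdock
networks:
  postdock:
```

Note that in host mode the port is bound only on the node that actually runs the task, not on every node in the swarm.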

paunin added the question label Jul 13, 2019
