help wanted - failed to get master from the node! #233

Open
sauravassuritynz opened this issue Jun 20, 2019 · 3 comments

@sauravassuritynz

Hello,
First of all, thank you for the in-depth documentation and your article on deploying Postgres in containers on Swarm and Kubernetes.

I am trying to get a Swarm cluster up and running, but it keeps failing to find the master. I haven't been able to debug it. Can you please help?

Error

postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Setting up STOP handlers...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> STARTING SSH (if required)...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> TUNING UP SSH CLIENT...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> STARTING SSH SERVER...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> STARTING POSTGRES...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> SETTING UP POLYMORPHIC VARIABLES (repmgr=3+postgres=9 | repmgr=4, postgres=10)...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> TUNING UP POSTGRES...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Cleaning data folder which might have some garbage...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Check all partner nodes for common upstream node...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>>>>> Checking NODE=pgmaster...
postdoc_pgmaster.1.l15shca67ook@masternode    | psql: could not connect to server: Connection refused
postdoc_pgmaster.1.l15shca67ook@masternode    | 	Is the server running on host "pgmaster" (10.0.3.3) and accepting
postdoc_pgmaster.1.l15shca67ook@masternode    | 	TCP/IP connections on port 5432?
postdoc_pgmaster.1.l15shca67ook@masternode    | >>>>>> Skipping: failed to get master from the node!
postdoc_pgmaster.1.l15shca67ook@masternode    | >>>>>> Checking NODE=pgslave1...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Auto-detected master name: ''
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Setting up repmgr...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Setting up repmgr config file '/etc/repmgr.conf'...
postdoc_pgmaster.1.l15shca67ook@masternode    | psql: could not translate host name "pgslave1" to address: No address associated with hostname
postdoc_pgmaster.1.l15shca67ook@masternode    | >>>>>> Skipping: failed to get master from the node!
postdoc_pgmaster.1.l15shca67ook@masternode    | cat: /var/lib/postgresql/data/standby.lock: No such file or directory
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Setting up upstream node...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Previously Locked standby upstream node LOCKED_STANDBY=''
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Waiting for upstream postgres server...
postdoc_pgmaster.1.l15shca67ook@masternode    | >>> Wait schema replication_db.repmgr on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
postdoc_pgmaster.1.l15shca67ook@masternode    | psql: could not connect to server: Connection refused
postdoc_pgmaster.1.l15shca67ook@masternode    | 	Is the server running on host "pgmaster" (10.0.3.3) and accepting
postdoc_pgmaster.1.l15shca67ook@masternode    | 	TCP/IP connections on port 5432?

Compose file

---
version: "3.3"
services:
  pgmaster:
    hostname: pgmaster
    image: postdock/postgres:latest-postgres10-repmgr40
    environment:
      NODE_ID: 1
      NODE_NAME: node1
      PARTNER_NODES: "pgmaster,pgslave1"
      REPLICATION_PRIMARY_HOST: pgmaster
      NODE_PRIORITY: 100
      SSH_ENABLE: 1
      POSTGRES_PASSWORD: monkey_pass
      POSTGRES_USER: monkey_user
      POSTGRES_DB: monkey_db
      CLEAN_OVER_REWIND: 0
      CONFIGS_DELIMITER_SYMBOL: ;
      CONFIGS: "listen_addresses:'*';max_replication_slots:5"
      CLUSTER_NAME: pg_cluster
      REPLICATION_DB: replication_db
      REPLICATION_USER: replication_user
      REPLICATION_PASSWORD: replication_pass
    configs:
      - source: id_rsa
        target: /tmp/.ssh/keys/id_rsa
      - source: id_rsa.pub
        target: /tmp/.ssh/keys/id_rsa.pub
    ports:
    - 5432:5432
    volumes:
    - pgmaster:/var/lib/postgresql/data
    networks:
    - postdock

  pgslave1:
    image: postdock/postgres:latest-postgres10-repmgr40
    environment:
      NODE_ID: 2
      NODE_NAME: node2
      CLUSTER_NODE_NETWORK_NAME: pgslave1
      SSH_ENABLE: 1
      PARTNER_NODES: "pgmaster,pgslave1"
      REPLICATION_PRIMARY_HOST: pgmaster
      CLEAN_OVER_REWIND: 1
      CONFIGS_DELIMITER_SYMBOL: ;
      CONFIGS: "listen_addresses:'*';max_replication_slots:5"
    configs:
      - source: id_rsa
        target: /tmp/.ssh/keys/id_rsa
      - source: id_rsa.pub
        target: /tmp/.ssh/keys/id_rsa.pub
    ports:
    - 5441:5432
    volumes:
    - pgslave1:/var/lib/postgresql/data
    networks:
    - postdock
networks:
  postdock:

volumes:
  pgmaster:
  pgslave1:

configs:
  id_rsa:
    file: /home/vagrant/ssh/id_rsa
  id_rsa.pub:
    file: /home/vagrant/ssh/id_rsa.pub
@webdobe

webdobe commented Jul 2, 2021

Running into the same issue...

@webdobe

webdobe commented Jul 2, 2021

I found something interesting: every time I start up my stack, I get this error:

Checking NODE=pgslave1...
psql: could not connect to server: Connection refused
Is the server running on host "pgslave1" (10.0.24.12) and accepting
TCP/IP connections on port 5432?

What is weird is that pgslave1 is actually running on 10.0.24.13:
[
    {
        "Name": "app_net",
        "Id": "jut8x2wcutjqs89kiodhyfyou",
        "Created": "2021-07-02T12:36:23.180056702-07:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.24.0/24",
                    "Gateway": "10.0.24.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "03b68a587e6b72773624c2675fef53d01f6c1426911e8e3c0189c7cd6d918b81": {
                "Name": "pgpool.1.ccn6b37wx4efh490wl2v1g4x1",
                "EndpointID": "a9a5c2b48c603225c85708822ebcb5dbb030d2b0b0eea09721ae4cc1d7045638",
                "MacAddress": "02:42:0a:00:18:05",
                "IPv4Address": "10.0.24.5/24",
                "IPv6Address": ""
            },
            "d5ae3d1c8ca294486ed3f2cab1fc765c5fc8ae3ec9b1792e0a29842d0c8faa98": {
                "Name": "pgslave1.1.og8y63qkcjqyqnquptkv2qxn0",
                "EndpointID": "e68c9317bd21d3878009dbbb76e7441047d82c422644add15c73dff7ec51a583",
                "MacAddress": "02:42:0a:00:18:0d",
                "IPv4Address": "10.0.24.13/24",
                "IPv6Address": ""
            },
            "lb-app_net": {
                "Name": "app_net-endpoint",
                "EndpointID": "59c3c052f31a53c8ec7abd9dd118dc6702068d683cb536a20126e6799a6412ba",
                "MacAddress": "02:42:0a:00:18:08",
                "IPv4Address": "10.0.24.8/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4116"
        },
        "Labels": {
            "com.docker.stack.namespace": "hah"
        },
        "Peers": [
            {
                "Name": "c09db12e594c",
                "IP": "10.0.2.15"
            },
            {
                "Name": "8c59bf0cf8c6",
                "IP": "10.1.0.141"
            }
        ]
    }
]

What's even weirder is that whenever the slave runs on a different IP, the address being checked is always one less than the address it is actually on: when it was running on .17, it was looking for it at .16...

@webdobe

webdobe commented Jul 2, 2021

Never mind, that was caused by endpoint_mode: vip vs. endpoint_mode: dnsrr.
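For reference, switching the services to DNS round-robin would look roughly like the fragment below. This is a sketch under stated assumptions, not a confirmed fix for the original report: it assumes the stack is deployed with docker stack deploy from the version "3.3" compose file in the first post, and it shows only the keys that change. In the default vip mode, a service name such as pgslave1 resolves to a virtual IP for the service rather than to the task's own container address, which may explain the mismatched address seen above. With dnsrr, the name resolves directly to the task IPs. Swarm refuses to combine dnsrr with ingress-mode port publishing, so the published ports switch to the long syntax with mode: host.

```yaml
version: "3.3"
services:
  pgmaster:
    # image, hostname, environment, configs and volumes unchanged from the original compose file
    deploy:
      # Resolve "pgmaster" to the task's own IP instead of a service virtual IP
      endpoint_mode: dnsrr
    ports:
      # dnsrr cannot be used with the ingress routing mesh, so publish on the host
      - target: 5432
        published: 5432
        protocol: tcp
        mode: host
    networks:
      - postdock

  pgslave1:
    # image, environment, configs and volumes unchanged from the original compose file
    deploy:
      endpoint_mode: dnsrr
    ports:
      - target: 5432
        published: 5441
        protocol: tcp
        mode: host
    networks:
      - postdock
```

With mode: host, each port is bound only on the node that runs that task, so external clients have to connect to that specific node rather than to any node in the swarm.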
