Setting Up PostgreSQL High Availability Cluster with ETCD and Patroni on Ubuntu

Regardless of the size of your business, a service outage can have serious consequences, including lost clients and penalties for failing to meet Service Level Agreements (SLAs). Making sure the deployment is highly available is essential. To minimize disruption and eliminate the need for human intervention, the system must be sufficiently automated.

High availability helps to:

  1. Minimize the likelihood of system failures
  2. Automatically mitigate the impact of failures
  3. Eliminate single points of failure

This article walks through the steps and challenges of building a highly available PostgreSQL cluster of three Patroni nodes on Ubuntu. It also covers failover steps and adding a new node to an existing configuration. This deployment example features etcd running on the same host machines as Patroni and PostgreSQL. Follow the steps below for configuration.

Prerequisites:

Three cluster nodes running PostgreSQL 16. The following nodes were used for testing purposes:

node1  192.168.156.232
node2  192.168.157.149
node3  192.168.144.58        

Step 1: Set up hostname entries in the hosts file

  • Run the below command on the first node.

sudo hostnamectl set-hostname node1        

Similarly, run the same command on the remaining two nodes, replacing the node name with node2 and node3.

  • Update the /etc/hosts file on each PostgreSQL node to include the IP addresses and hostnames of all nodes in the cluster. Append the following at the end of the file:

 127.0.0.1 localhost
 # Cluster IP and names
 192.168.156.232 node1
 192.168.157.149 node2
 192.168.144.58 node3        
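
Before moving on, it's worth confirming that every hostname resolves to the expected address from each node (a quick sanity check; getent and ping are available on Ubuntu by default):

getent hosts node1 node2 node3
ping -c 1 node2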

Software Installation:

Execute the following commands on all nodes to install Percona Distribution for PostgreSQL.

  • Install the curl utility

sudo apt update
sudo apt install curl        

  • Download the percona-release repository package:

curl -O https://repo.percona.com/apt/percona-release_latest.generic_all.deb        

  • Install the repository package

sudo apt install gnupg2 lsb-release ./percona-release_latest.generic_all.deb        

  • Update the package information

sudo apt update        

  • Upgrade packages if newer versions are available

sudo apt upgrade        

  • Enable the repository

sudo percona-release setup ppg16        
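
To confirm the repository is enabled before installing anything from it, you can check that the PostgreSQL 16 packages are now visible to apt (a quick sanity check):

apt-cache policy percona-postgresql-16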

ETCD Installation:

  • Install the etcd packages

sudo apt install etcd etcd-server etcd-client         

  • Stop and disable the etcd service

sudo systemctl stop etcd
sudo systemctl disable etcd        

  • Create or update the configuration file on all nodes. Edit the /etc/etcd/etcd.conf.yaml configuration (vi /etc/etcd/etcd.conf.yaml) with the actual node name and IP addresses.
  • Node-1:

name: 'node1'
initial-cluster-token: PostgreSQL_HA_Cluster_1
initial-cluster-state: new
initial-cluster: node1=http://192.168.156.232:2380,node2=http://192.168.157.149:2380,node3=http://192.168.144.58:2380
data-dir: /var/lib/etcd
initial-advertise-peer-urls: http://192.168.156.232:2380
listen-peer-urls: http://192.168.156.232:2380
advertise-client-urls: http://192.168.156.232:2379
listen-client-urls: http://192.168.156.232:2379        

  • Node-2:

name: 'node2'
initial-cluster-token: PostgreSQL_HA_Cluster_1
initial-cluster-state: new
initial-cluster: node1=http://192.168.156.232:2380,node2=http://192.168.157.149:2380,node3=http://192.168.144.58:2380
data-dir: /var/lib/etcd
initial-advertise-peer-urls: http://192.168.157.149:2380
listen-peer-urls: http://192.168.157.149:2380
advertise-client-urls: http://192.168.157.149:2379
listen-client-urls: http://192.168.157.149:2379        

  • Node-3:

name: 'node3'
initial-cluster-token: PostgreSQL_HA_Cluster_1
initial-cluster-state: new
initial-cluster: node1=http://192.168.156.232:2380,node2=http://192.168.157.149:2380,node3=http://192.168.144.58:2380
data-dir: /var/lib/etcd
initial-advertise-peer-urls: http://192.168.144.58:2380
listen-peer-urls: http://192.168.144.58:2380
advertise-client-urls: http://192.168.144.58:2379
listen-client-urls: http://192.168.144.58:2379        

When a node starts, etcd looks for the additional cluster nodes specified in the configuration. If the other nodes are not yet operational, a timeout could cause the start to fail; such behavior is expected. To form the etcd cluster, try starting every node simultaneously.
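
One way to start the service on all nodes at nearly the same time is to trigger it from a single machine (a sketch, assuming passwordless SSH and sudo access to all three nodes):

# Start etcd on all three nodes in parallel so peers find each other before timing out
for h in node1 node2 node3; do
  ssh "$h" 'sudo systemctl start etcd' &
done
wait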

  • Enable etcd and check status

sudo systemctl enable --now etcd
sudo systemctl status etcd        

  • Check the etcd cluster members using etcdctl. Set the endpoints by executing the commands below on any one of the nodes

export ETCDCTL_API=3
HOST_1=192.168.156.232
HOST_2=192.168.157.149
HOST_3=192.168.144.58
ENDPOINTS=$HOST_1:2379,$HOST_2:2379,$HOST_3:2379        

  • Show the cluster members and their status by running the command below

sudo etcdctl --endpoints=$ENDPOINTS -w table member list        

  • To check the current leader node

sudo etcdctl --endpoints=$ENDPOINTS -w table endpoint status        
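The health of each member can also be checked against the same endpoint list (etcdctl's built-in health probe):

sudo etcdctl --endpoints=$ENDPOINTS endpoint health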

Patroni Configuration:

Run the following commands on all nodes. This can be done concurrently.

  • Percona Distribution installation

sudo apt install percona-postgresql-16        

  • Python packages installation

sudo apt install python3-pip python3-dev binutils        

  • Patroni installation

sudo apt install percona-patroni        

  • Stop and disable patroni and postgresql

sudo systemctl stop {patroni,postgresql}
sudo systemctl disable {patroni,postgresql}        

  • Patroni may use an existing PostgreSQL installation; however, removing the data directory forces it to build a new PostgreSQL cluster instance. This is recommended only for a new cluster that has no data

sudo systemctl stop postgresql
sudo rm -rf /var/lib/postgresql/16/main        

  • Create an environment variable for the node name

export NODE_NAME=`hostname -f`         

  • Create an environment variable for the node IP and display it

export NODE_IP=`getent hosts $(hostname -f) | awk '{ print $1 }' | grep -v '127.0.1.1'`
echo $NODE_IP        

  • Create the main directory under /var/lib/postgresql/16

sudo mkdir -p /var/lib/postgresql/16/main
sudo chmod 755 -R /var/lib/postgresql/16/main
sudo chown postgres:postgres -R /var/lib/postgresql/16/main        

  • Add the following lines to the ~/.bashrc file in the home directory on all nodes (vi ~/.bashrc)

# Add the value for PostgreSQL Percona
DATA_DIR="/var/lib/postgresql/16/main"
PG_BIN_DIR="/usr/lib/postgresql/16/bin"
NODE_NAME=`hostname -f`
NODE_IP=`getent hosts $(hostname -f) | awk '{ print $1 }' | grep -v '127.0.1.1'`
NAMESPACE="percona_lab"
SCOPE="cluster_1"        

  • Finally, apply the changes by running source ~/.bashrc, as shown below
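
A minimal sanity check that the variables resolved as expected (values will differ per node):

source ~/.bashrc
echo "$NODE_NAME $NODE_IP $DATA_DIR $PG_BIN_DIR"
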
  • Create a directory for Patroni and set its ownership

sudo mkdir -p /etc/patroni/
sudo chown -R  postgres:postgres /etc/patroni/        

  • Create the /etc/patroni/patroni.yml file and add the following configuration on all nodes, adjusting name, connect_address, and the etcd3 host for each node.

Node-1:

namespace: percona_lab
scope: cluster_1
name: node1

restapi:
    listen: 0.0.0.0:8008
    connect_address: 192.168.156.232:8008

etcd3:
    host: 192.168.156.232:2379

bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  dcs:
      ttl: 30
      loop_wait: 10
      retry_timeout: 10
      maximum_lag_on_failover: 1048576

      postgresql:
          use_pg_rewind: true
          use_slots: true
          parameters:
              wal_level: replica
              hot_standby: "on"
              wal_keep_segments: 10
              max_wal_senders: 5
              max_replication_slots: 10
              wal_log_hints: "on"
              logging_collector: 'on'
              max_wal_size: '10GB'
              archive_mode: "on"
              password_encryption: "md5"
              listen_addresses: "*"
              archive_timeout: 600s
              archive_command: "cp -f %p /home/postgres/archived/%f"

      pg_hba: # Add following lines to pg_hba.conf after running 'initdb'
      - host all all 127.0.0.1/32 trust
      - host all all 0.0.0.0/0 trust
      - host all all 192.168.156.232/32 md5
      - host all all 192.168.157.149/32 md5
      - host all all 192.168.144.58/32 md5
      - host replication replicator 192.168.156.232/32 trust
      - host replication replicator 192.168.157.149/32 trust
      - host replication replicator 192.168.144.58/32 trust
      - host all all 0.0.0.0/0 md5
      recovery_conf:
            restore_command: cp /home/postgres/archived/%f %p

  # some desired options for 'initdb'
  initdb: # Note: It needs to be a list (some options need values, others are switches)
      - encoding: UTF8
      - data-checksums


postgresql:
    cluster_name: cluster_1
    listen: 0.0.0.0:5432
    connect_address: 192.168.156.232:5432
    data_dir: "/var/lib/postgresql/16/main"
    bin_dir: "/usr/lib/postgresql/16/bin"
    pgpass: /tmp/pgpass0
    authentication:
        replication:
            username: replicator
            password: replPasswd
        superuser:
            username: postgres
            password: qaz123
    parameters:
        unix_socket_directories: "/var/run/postgresql/"
    create_replica_methods:
        - basebackup
    basebackup:
        checkpoint: 'fast'

# watchdog is a top-level Patroni section; 'required' needs a working /dev/watchdog
watchdog:
    mode: required # Allowed values: off, automatic, required
    device: /dev/watchdog
    safety_margin: 5

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false        

Node-2:

namespace: percona_lab
scope: cluster_1
name: node2

restapi:
    listen: 0.0.0.0:8008
    connect_address: 192.168.157.149:8008

etcd3:
    host: 192.168.157.149:2379

bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  dcs:
      ttl: 30
      loop_wait: 10
      retry_timeout: 10
      maximum_lag_on_failover: 1048576

      postgresql:
          use_pg_rewind: true
          use_slots: true
          parameters:
              wal_level: replica
              hot_standby: "on"
              wal_keep_segments: 10
              max_wal_senders: 5
              max_replication_slots: 10
              wal_log_hints: "on"
              logging_collector: 'on'
              max_wal_size: '10GB'
              archive_mode: "on"
              password_encryption: "md5"
              listen_addresses: "*"
              archive_timeout: 600s
              archive_command: "cp -f %p /home/postgres/archived/%f"

      pg_hba: # Add following lines to pg_hba.conf after running 'initdb'
      - host all all 127.0.0.1/32 trust
      - host all all 0.0.0.0/0 trust
      - host all all 192.168.156.232/32 md5
      - host all all 192.168.157.149/32 md5
      - host all all 192.168.144.58/32 md5
      - host replication replicator 192.168.156.232/32 trust
      - host replication replicator 192.168.157.149/32 trust
      - host replication replicator 192.168.144.58/32 trust
      - host all all 0.0.0.0/0 md5
      recovery_conf:
            restore_command: cp /home/postgres/archived/%f %p

  # some desired options for 'initdb'
  initdb: # Note: It needs to be a list (some options need values, others are switches)
      - encoding: UTF8
      - data-checksums

postgresql:
    cluster_name: cluster_1
    listen: 0.0.0.0:5432
    connect_address: 192.168.157.149:5432
    data_dir: "/var/lib/postgresql/16/main"
    bin_dir: "/usr/lib/postgresql/16/bin"
    pgpass: /tmp/pgpass0
    authentication:
        replication:
            username: replicator
            password: replPasswd
        superuser:
            username: postgres
            password: qaz123
    parameters:
        unix_socket_directories: "/var/run/postgresql/"
    create_replica_methods:
        - basebackup
    basebackup:
        checkpoint: 'fast'

# watchdog is a top-level Patroni section; 'required' needs a working /dev/watchdog
watchdog:
    mode: required # Allowed values: off, automatic, required
    device: /dev/watchdog
    safety_margin: 5

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false        

Node-3:

namespace: percona_lab
scope: cluster_1
name: node3

restapi:
    listen: 0.0.0.0:8008
    connect_address: 192.168.144.58:8008

etcd3:
    host: 192.168.144.58:2379

bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  dcs:
      ttl: 30
      loop_wait: 10
      retry_timeout: 10
      maximum_lag_on_failover: 1048576

      postgresql:
          use_pg_rewind: true
          use_slots: true
          parameters:
              wal_level: replica
              hot_standby: "on"
              wal_keep_segments: 10
              max_wal_senders: 5
              max_replication_slots: 10
              wal_log_hints: "on"
              logging_collector: 'on'
              max_wal_size: '10GB'
              archive_mode: "on"
              password_encryption: "md5"
              listen_addresses: "*"
              archive_timeout: 600s
              archive_command: "cp -f %p /home/postgres/archived/%f"

      pg_hba: # Add following lines to pg_hba.conf after running 'initdb'
      - host all all 127.0.0.1/32 trust
      - host all all 0.0.0.0/0 trust
      - host all all 192.168.156.232/32 md5
      - host all all 192.168.157.149/32 md5
      - host all all 192.168.144.58/32 md5
      - host replication replicator 192.168.156.232/32 trust
      - host replication replicator 192.168.157.149/32 trust
      - host replication replicator 192.168.144.58/32 trust
      - host all all 0.0.0.0/0 md5
      recovery_conf:
            restore_command: cp /home/postgres/archived/%f %p

  # some desired options for 'initdb'
  initdb: # Note: It needs to be a list (some options need values, others are switches)
      - encoding: UTF8
      - data-checksums


postgresql:
    cluster_name: cluster_1
    listen: 0.0.0.0:5432
    connect_address: 192.168.144.58:5432
    data_dir: "/var/lib/postgresql/16/main"
    bin_dir: "/usr/lib/postgresql/16/bin"
    pgpass: /tmp/pgpass0
    authentication:
        replication:
            username: replicator
            password: replPasswd
        superuser:
            username: postgres
            password: qaz123
    parameters:
        unix_socket_directories: "/var/run/postgresql/"
    create_replica_methods:
        - basebackup
    basebackup:
        checkpoint: 'fast'

# watchdog is a top-level Patroni section; 'required' needs a working /dev/watchdog
watchdog:
    mode: required # Allowed values: off, automatic, required
    device: /dev/watchdog
    safety_margin: 5

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false        
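
Note that the archive_command and restore_command in the configuration above copy WAL files to and from /home/postgres/archived. That directory is not created by the packages, so it must exist and be writable by the postgres user on every node before archiving can succeed (a minimal sketch):

sudo mkdir -p /home/postgres/archived
sudo chown -R postgres:postgres /home/postgres/archived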

  • Verify that the /etc/systemd/system/percona-patroni.service file exists. If it is not created, manually create it with the following content on all nodes

[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=syslog.target network.target 

[Service]
Type=simple 

User=postgres
Group=postgres 

# Start the patroni process
ExecStart=/bin/patroni /etc/patroni/patroni.yml 

# Send HUP to reload from patroni.yml
ExecReload=/bin/kill -s HUP $MAINPID 

# only kill the patroni process, not its children, so it will gracefully stop postgres
KillMode=process 

# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec=30 

# Do not restart the service if it crashes, we want to manually inspect database on failure
Restart=no 

[Install]
WantedBy=multi-user.target        

  • Reload systemd to pick up the new service

sudo systemctl daemon-reload        

  • Start Patroni on the nodes one by one, not in parallel. Wait until the service is live on each node before moving to the next

sudo systemctl enable --now percona-patroni        

Following the instructions in the bootstrap section of the configuration file, Patroni initializes PostgreSQL at startup (because the data directory is empty).

  • Check for errors, if any

sudo journalctl -fu percona-patroni        
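
Once Patroni is active on all nodes, the cluster state can be verified with patronictl, as used throughout the rest of this article:

sudo patronictl -c /etc/patroni/patroni.yml list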

Challenges and Solutions:

During the setup, we encountered several challenges and identified solutions that enabled successful data streaming.

1. Error:

Jul 16 09:58:29 node2 patroni[397]:   File "/usr/lib/python3/dist-packages/patroni/postgresql/config.py", line 438, in check_directories
Jul 16 09:58:29 node2 patroni[397]:     raise PatroniException(msg.format(d, "couldn't create the directory"))
Jul 16 09:58:29 node2 patroni[397]: patroni.exceptions.PatroniException: '/var/run/postgresql/' is defined in unix_socket_directories, couldn't create the directory
Jul 16 09:58:29 node2 systemd[1]: percona-patroni.service: Main process exited, code=exited, status=1/FAILURE
Jul 16 09:58:29 node2 systemd[1]: percona-patroni.service: Failed with result 'exit-code'.

Solution:

mkdir /var/run/postgresql
chmod 755 -R /var/run/postgresql
chown postgres:postgres -R /var/run/postgresql        
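
On Ubuntu, /var/run is a tmpfs that is cleared on every reboot, so a directory created this way disappears after a restart. A sketch for making it persistent with systemd-tmpfiles (the postgresql.conf filename under /etc/tmpfiles.d is an assumption; any name works):

# Assumption: recreate /var/run/postgresql at boot, owned by postgres
echo 'd /var/run/postgresql 0755 postgres postgres -' | sudo tee /etc/tmpfiles.d/postgresql.conf
sudo systemd-tmpfiles --create /etc/tmpfiles.d/postgresql.conf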

2. Error:

All nodes are stuck in the Replica role and the cluster is unable to bootstrap again.

root@node3:/etc/systemd/system# sudo patronictl -c /etc/patroni/patroni.yml list
+ Cluster: cluster_1 (7527603260178609398) ----+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+--------+-----------------+---------+---------+----+-----------+
| node1  | 192.168.156.232 | Replica | stopped |    |   unknown |
| node2  | 192.168.157.149 | Replica | stopped |    |   unknown |
| node3  | 192.168.144.58  | Replica | stopped |    |   unknown |
+--------+-----------------+---------+---------+----+-----------+

Solution:

We solved this error by removing the cluster and reinitializing the nodes. Alternatively, removing the PostgreSQL data directory and reinitializing the node also resolves it.

root@node1:/etc/patroni# patronictl -c /etc/patroni/patroni.yml remove cluster_1
+ Cluster: cluster_1 (7527603260178609398) ----+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+--------+-----------------+---------+---------+----+-----------+
| node1  | 192.168.156.232 | Replica | stopped |    |   unknown |
| node2  | 192.168.157.149 | Replica | stopped |    |   unknown |
| node3  | 192.168.144.58  | Replica | stopped |    |   unknown |
+--------+-----------------+---------+---------+----+-----------+
Please confirm the cluster name to remove: cluster_1
You are about to remove all information in DCS for cluster_1, please type: "Yes I am aware": Yes I am aware
root@node1:/etc/patroni# 

root@node3:/etc/systemd/system# sudo patronictl -c /etc/patroni/patroni.yml list
+ Cluster: cluster_1 (7527683292055713867) ----+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+--------+-----------------+---------+---------+----+-----------+
| node1  | 192.168.156.232 | Leader  | running |  1 |           |
| node2  | 192.168.157.149 | Replica | stopped |    |   unknown |
| node3  | 192.168.144.58  | Replica | stopped |    |   unknown |
+--------+-----------------+---------+---------+----+-----------+
root@node3:/etc/systemd/system# 

root@node1:/etc/patroni# sudo journalctl -fu percona-patroni

Jul 16 14:15:49 node1 patroni[1642]: 2025-07-16 14:15:49,435 INFO: initialized a new cluster
Jul 16 14:15:49 node1 patroni[1642]: 2025-07-16 14:15:49,552 INFO: no action. I am (node1), the leader with the lock
Jul 16 14:15:59 node1 patroni[1642]: 2025-07-16 14:15:59,560 INFO: no action. I am (node1), the leader with the lock
Jul 16 14:16:09 node1 patroni[1642]: 2025-07-16 14:16:09,619 INFO: no action. I am (node1), the leader with the lock
Jul 16 14:16:19 node1 patroni[1642]: 2025-07-16 14:16:19,481 INFO: no action. I am (node1), the leader with the lock
Jul 16 14:16:29 node1 patroni[1642]: 2025-07-16 14:16:29,481 INFO: no action. I am (node1), the leader with the lock
Jul 16 14:16:39 node1 patroni[1642]: 2025-07-16 14:16:39,481 INFO: no action. I am (node1), the leader with the lock
Jul 16 14:16:49 node1 patroni[1642]: 2025-07-16 14:16:49,480 INFO: no action. I am (node1), the leader with the lock
Jul 16 14:16:59 node1 patroni[1642]: 2025-07-16 14:16:59,480 INFO: no action. I am (node1), the leader with the lock
Jul 16 14:17:09 node1 patroni[1642]: 2025-07-16 14:17:09,480 INFO: no action. I am (node1), the leader with the lock

Remove the cluster from a replica:

root@node2:/etc/systemd/system# patronictl -c /etc/patroni/patroni.yml remove cluster_1
+ Cluster: cluster_1 (7527683292055713867) ----+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+--------+-----------------+---------+---------+----+-----------+
| node1  | 192.168.156.232 | Leader  | running |  1 |           |
| node2  | 192.168.157.149 | Replica | stopped |    |   unknown |
| node3  | 192.168.144.58  | Replica | stopped |    |   unknown |
+--------+-----------------+---------+---------+----+-----------+
Please confirm the cluster name to remove: cluster_1
You are about to remove all information in DCS for cluster_1, please type: "Yes I am aware": Yes I am aware
This cluster currently is healthy. Please specify the leader name to continue: node1
root@node2:/etc/systemd/system# sudo patronictl -c /etc/patroni/patroni.yml list
+ Cluster: cluster_1 (uninitialized) +---------+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+--------+-----------------+---------+---------+----+-----------+
| node1  | 192.168.156.232 | Leader  | running |  1 |           |
| node2  | 192.168.157.149 | Replica | stopped |    |   unknown |
| node3  | 192.168.144.58  | Replica | stopped |    |   unknown |
+--------+-----------------+---------+---------+----+-----------+

Reinitialize:

root@node1:/etc/patroni# patronictl -c /etc/patroni/patroni.yml reinit cluster_1
+ Cluster: cluster_1 (7527684689691949434) ----+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+--------+-----------------+---------+---------+----+-----------+
| node2  | 192.168.157.149 | Leader  | running |  1 |           |
| node3  | 192.168.144.58  | Replica | stopped |    |   unknown |
+--------+-----------------+---------+---------+----+-----------+
Which member do you want to reinitialize [node3]? []: node3
Are you sure you want to reinitialize members node3? [y/N]: y
Success: reinitialize for member node3
root@node1:/etc/patroni# sudo patronictl -c /etc/patroni/patroni.yml list
+ Cluster: cluster_1 (7527684689691949434) -------------+----+-----------+
| Member | Host            | Role    | State            | TL | Lag in MB |
+--------+-----------------+---------+------------------+----+-----------+
| node2  | 192.168.157.149 | Leader  | running          |  1 |           |
| node3  | 192.168.144.58  | Replica | creating replica |    |   unknown |
+--------+-----------------+---------+------------------+----+-----------+

Final state:

root@node2:~# sudo patronictl -c /etc/patroni/patroni.yml list
+ Cluster: cluster_1 (7527950681698150480) ------+----+-----------+
| Member | Host            | Role    | State     | TL | Lag in MB |
+--------+-----------------+---------+-----------+----+-----------+
| node1  | 192.168.156.232 | Replica | streaming |  1 |         0 |
| node2  | 192.168.157.149 | Replica | streaming |  1 |         0 |
| node3  | 192.168.144.58  | Leader  | running   |  1 |           |
+--------+-----------------+---------+-----------+----+-----------+        

3. Error:

Jul 16 14:58:39 node3 patroni[4900]: pg_basebackup: error: connection to server at "192.168.157.149", port 5432 failed: FATAL:  no pg_hba.conf entry for replication connection from host "192.168.144.58", user "replicator", no encryption

postgres@node3:~$ psql -U replicator -W
Password:xxx
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?

Solution:

We need to check both the PostgreSQL configuration and the Patroni file. Edit /var/lib/postgresql/16/main/pg_hba.conf (vi /var/lib/postgresql/16/main/pg_hba.conf), add entries allowing the replication IPs, and reload the configuration:

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust
host    replication     replicator      192.168.156.232/32      trust
host    replication     replicator      192.168.157.149/32      trust
host    replication     replicator      192.168.144.58/32       trust


root@node3:/etc/patroni# sudo patronictl -c /etc/patroni/patroni.yml reload cluster_1
+ Cluster: cluster_1 (7527950681698150480) ----+----+-----------+
| Member | Host            | Role    | State   | TL | Lag in MB |
+--------+-----------------+---------+---------+----+-----------+
| node1  | 192.168.156.232 | Replica | stopped |    |   unknown |
| node2  | 192.168.157.149 | Replica | stopped |    |   unknown |
| node3  | 192.168.144.58  | Leader  | running |  1 |           |
+--------+-----------------+---------+---------+----+-----------+
Are you sure you want to reload members node1, node2, node3? [y/N]: y
Reload request received for member node1 and will be processed within 10 seconds
Reload request received for member node2 and will be processed within 10 seconds
Reload request received for member node3 and will be processed within 10 seconds
root@node3:/etc/patroni# sudo patronictl -c /etc/patroni/patroni.yml list
+ Cluster: cluster_1 (7527950681698150480) ------+----+-----------+
| Member | Host            | Role    | State     | TL | Lag in MB |
+--------+-----------------+---------+-----------+----+-----------+
| node1  | 192.168.156.232 | Replica | streaming |  1 |         0 |
| node2  | 192.168.157.149 | Replica | streaming |  1 |         0 |
| node3  | 192.168.144.58  | Leader  | running   |  1 |           |
+--------+-----------------+---------+-----------+----+-----------+        

Failover Steps:

Here we tried and successfully verified the failover configuration by promoting a new leader node with patronictl. The operation switches the roles of the current primary and a secondary node.

root@node2:~# sudo patronictl -c /etc/patroni/patroni.yml failover cluster_1
Current cluster topology
+ Cluster: cluster_1 (7527950681698150480) ------+----+-----------+
| Member | Host            | Role    | State     | TL | Lag in MB |
+--------+-----------------+---------+-----------+----+-----------+
| node1  | 192.168.156.232 | Replica | streaming |  1 |         0 |
| node2  | 192.168.157.149 | Replica | streaming |  1 |         0 |
| node3  | 192.168.144.58  | Leader  | running   |  1 |           |
+--------+-----------------+---------+-----------+----+-----------+
Candidate ['node1', 'node2'] []: node1
Are you sure you want to failover cluster cluster_1, demoting current leader node3? [y/N]: y
2025-07-17 08:18:12.04448 Successfully failed over to "node1"        
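
For planned role changes, such as before maintenance on the current leader, patronictl also provides a switchover command; it prompts interactively much like failover and additionally lets you schedule when the change happens:

sudo patronictl -c /etc/patroni/patroni.yml switchover cluster_1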

Addition of a New Node to the Existing Setup:

We added a new node to the existing cluster setup. node4 is configured like the other nodes and joins the existing streaming setup. Follow the steps below to bring the new node to the streaming state.

sudo rm -rf /var/lib/postgresql/16/main/*        
root@node2:/etc/patroni# sudo patronictl -c /etc/patroni/patroni.yml restart cluster_1
+ Cluster: cluster_1 (7527950681698150480) ------+----+-----------+
| Member | Host            | Role    | State     | TL | Lag in MB |
+--------+-----------------+---------+-----------+----+-----------+
| node1  | 192.168.156.232 | Leader  | running   |  2 |           |
| node2  | 192.168.157.149 | Replica | streaming |  2 |         0 |
| node3  | 192.168.144.58  | Replica | streaming |  2 |         0 |
| node4  | 192.168.147.200 | Replica | stopped   |    |   unknown |
+--------+-----------------+---------+-----------+----+-----------+
When should the restart take place (e.g. 2025-07-17T09:56)  [now]: 
Are you sure you want to restart members node1, node2, node3, node4? [y/N]: y
Restart if the PostgreSQL version is less than provided (e.g. 9.5.2)  []: 
Success: restart on member node1
Success: restart on member node2
Success: restart on member node3
Success: restart on member node4
root@node2:/etc/patroni# sudo patronictl -c /etc/patroni/patroni.yml list
+ Cluster: cluster_1 (7527950681698150480) ------+----+-----------+
| Member | Host            | Role    | State     | TL | Lag in MB |
+--------+-----------------+---------+-----------+----+-----------+
| node1  | 192.168.156.232 | Leader  | running   |  2 |           |
| node2  | 192.168.157.149 | Replica | streaming |  2 |         0 |
| node3  | 192.168.144.58  | Replica | streaming |  2 |         0 |
| node4  | 192.168.147.200 | Replica | streaming |  2 |         0 |
+--------+-----------------+---------+-----------+----+-----------+         
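
As a final check, streaming can also be confirmed from the PostgreSQL side on the leader (here node1), where each replica should appear with the state streaming:

sudo -u postgres psql -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"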
