Physical Replication in PostgreSQL

Physical Replication in PostgreSQL

Replication in computing refers to the ability to duplicate a service n times to ensure system availability. However, ensuring availability always comes at a cost—either in terms of consistency or partition tolerance. (Although you could almost challenge the CAP theorem with eventual consistency 😛, but that’s not our topic today!)

PostgreSQL, like any database, is a service that provides data storage, retrieval, and mutation in various forms. Naturally, replicating this functionality can be beneficial since, like any system, databases have thresholds and can become congested.

Fortunately, PostgreSQL developers have provided a built-in replication mechanism that ensures 100% isolation and zero coupling with integrating systems.


Types of Replication in PostgreSQL

PostgreSQL supports two main types of replication:

  1. Physical Replication (which we’ll cover today)
  2. Logical Replication


What is Physical Replication?

Physical replication in PostgreSQL works at the binary level by copying Write-Ahead Logs (WALs) from the primary database to a standby server. This ensures that the standby server is an exact byte-for-byte copy of the primary, making it ideal for disaster recovery and load balancing.

Key Concepts:

  • The Primary Server acts as the upstream.
  • The Standby Server acts as the downstream.
  • Standby servers can also act as upstream servers when configured accordingly.


What is Logical Replication?

Unlike physical replication, logical replication operates at a higher level, allowing replication of specific databases, schemas, or tables instead of the entire instance.

Key Difference:

  • Physical replication copies the entire database binaries.
  • Logical replication allows replication of selected tables, schemas, or databases.


A Brief Look at Physical Replication

Replication can be configured to run synchronously or asynchronously. This is controlled by modifying the synchronous_standby_names parameter in the postgresql.conf file.

  • If synchronous_standby_names is empty, synchronous replication is disabled, meaning the system operates in asynchronous mode.
  • This parameter is set on the sender server (which we can consider the primary for now, but note that both servers can be senders).

Additional Configuration:

Another important attribute is primary_conninfo, which must also be modified in postgresql.conf.

  • This parameter is set on the standby server and is mandatory to establish communication between the receiver and sender servers.
  • While you can store passwords directly in primary_conninfo, a better practice is to use the .pgpass file.

Important Note:

To ensure that the standby server functions correctly, you must create the standby.signal file. Without this file, the server will start as a primary server instead of a standby.


Testing on Your Local Environment

I’ll share the commands needed to test physical replication on your local machine.

💡 Note: I am using Ubuntu and Dockerized PostgreSQL containers for this setup.


1. Create Required Directories

mkdir rep        

2. Start Primary and Standby Containers

docker run --name pg_primary -p 5432:5432 -v ~/rep/primary_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres_primary -d postgres

docker run --name pg_standby -p 5433:5432 -v ~/rep/standby_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres_standby -d postgres        

3. Stop the Containers

docker container stop pg_primary pg_standby        

4. Create a Dedicated Network and Connect Containers

docker network create pg_network
docker network connect pg_network pg_primary
docker network connect pg_network pg_standby        

5. Copy Data from Primary to Standby

mv standby_data/ standby_data_old/
sudo cp -R primary_data/ standby_data/        

6. Configure Standby Server

Modify the postgresql.conf file inside standby_data/:

primary_conninfo = 'application_name=standby_rep1 host=pg_primary port=5432 user=postgres'        

7. Configure Primary Server

Modify pg_hba.conf in primary_data/ to allow replication:

host replication postgres all md5        

Modify postgresql.conf to enable synchronous replication:

synchronous_standby_names = 'first 1 (standby_rep1)'        

8. Add pgpass File for Authentication

docker exec -it pg_standby bash
cd /var/lib/postgresql/
echo "pg_primary:5432:*:postgres:postgres_primary" > .pgpass
chmod 0600 /var/lib/postgresql/.pgpass
chown postgres:postgres /var/lib/postgresql/.pgpass        

9. Mark Standby Server as Read-Only

touch standby.signal        

10. Restart the containers

docker container start pg_primary pg_standby        

From now on, everything should be working fine with you.

You can get sure of the integrity of you solution by monitoring the docker logs [Container] command.

So now you can open two terminals. one connecting to the primary and the other is connecting to the standby. Try creating table and add data to it in the Primary server. after that go check the Standby Server. Voila. Replication happened.

I'll follow up this article with another article about the logical partitioning. and I'll write some posts on discussing the important attributes you could manipulate within you replication setting.

Resources:

https://www.postgresql.org/docs/current/runtime-config-replication.html

https://www.postgresql.org/docs/current/warm-standby.html

https://www.postgresql.org/docs/17/libpq-pgpass.html


To view or add a comment, sign in

More articles by Abdurhman Annaggar

Others also viewed

Explore content categories