Physical Replication in PostgreSQL
Replication in computing refers to the ability to duplicate a service n times to ensure system availability. However, ensuring availability always comes at a cost—either in terms of consistency or partition tolerance. (Although you could almost challenge the CAP theorem with eventual consistency 😛, but that’s not our topic today!)
PostgreSQL, like any database, is a service that provides data storage, retrieval, and mutation in various forms. Naturally, replicating this functionality can be beneficial since, like any system, databases have thresholds and can become congested.
Fortunately, PostgreSQL developers have provided a built-in replication mechanism that ensures 100% isolation and zero coupling with integrating systems.
Types of Replication in PostgreSQL
PostgreSQL supports two main types of replication:
What is Physical Replication?
Physical replication in PostgreSQL works at the binary level by copying Write-Ahead Logs (WALs) from the primary database to a standby server. This ensures that the standby server is an exact byte-for-byte copy of the primary, making it ideal for disaster recovery and load balancing.
Key Concepts:
What is Logical Replication?
Unlike physical replication, logical replication operates at a higher level, allowing replication of specific databases, schemas, or tables instead of the entire instance.
Key Difference:
A Brief Look at Physical Replication
Replication can be configured to run synchronously or asynchronously. This is controlled by modifying the synchronous_standby_names parameter in the postgresql.conf file.
Additional Configuration:
Another important attribute is primary_conninfo, which must also be modified in postgresql.conf.
Important Note:
To ensure that the standby server functions correctly, you must create the standby.signal file. Without this file, the server will start as a primary server instead of a standby.
Testing on Your Local Environment
I’ll share the commands needed to test physical replication on your local machine.
💡 Note: I am using Ubuntu and Dockerized PostgreSQL containers for this setup.
Recommended by LinkedIn
1. Create Required Directories
mkdir rep
2. Start Primary and Standby Containers
docker run --name pg_primary -p 5432:5432 -v ~/rep/primary_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres_primary -d postgres
docker run --name pg_standby -p 5433:5432 -v ~/rep/standby_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres_standby -d postgres
3. Stop the Containers
docker container stop pg_primary pg_standby
4. Create a Dedicated Network and Connect Containers
docker network create pg_network
docker network connect pg_network pg_primary
docker network connect pg_network pg_standby
5. Copy Data from Primary to Standby
mv standby_data/ standby_data_old/
sudo cp -R primary_data/ standby_data/
6. Configure Standby Server
Modify the postgresql.conf file inside standby_data/:
primary_conninfo = 'application_name=standby_rep1 host=pg_primary port=5432 user=postgres'
7. Configure Primary Server
Modify pg_hba.conf in primary_data/ to allow replication:
host replication postgres all md5
Modify postgresql.conf to enable synchronous replication:
synchronous_standby_names = 'first 1 (standby_rep1)'
8. Add pgpass File for Authentication
docker exec -it pg_standby bash
cd /var/lib/postgresql/
echo "pg_primary:5432:*:postgres:postgres_primary" > .pgpass
chmod 0600 /var/lib/postgresql/.pgpass
chown postgres:postgres /var/lib/postgresql/.pgpass
9. Mark Standby Server as Read-Only
touch standby.signal
10. Restart the containers
docker container start pg_primary pg_standby
From now on, everything should be working fine with you.
You can get sure of the integrity of you solution by monitoring the docker logs [Container] command.
So now you can open two terminals. one connecting to the primary and the other is connecting to the standby. Try creating table and add data to it in the Primary server. after that go check the Standby Server. Voila. Replication happened.
I'll follow up this article with another article about the logical partitioning. and I'll write some posts on discussing the important attributes you could manipulate within you replication setting.
Resources:
Insightful