PostgreSQL High Availability Cluster - Infrastructure Design and Implementation

PostgreSQL is one of the most popular and advanced relational databases, used by a wide range of applications. It is quite common to choose PostgreSQL as the backend database when designing a nascent application, especially when definitive read and write patterns aren't yet clear.

However, once your application evolves in complexity and traffic, and when constraints like high availability, service level agreements (SLAs) and downtime can directly impact your revenue, it becomes crucial to design an architecture that ensures redundancy for every component. In this scenario the database plays an important role.

Running PostgreSQL as a single standalone instance isn't reliable: it is a single point of failure. To address this limitation, a common approach is to create a PostgreSQL cluster.

A PostgreSQL cluster consists of multiple database instances that work together to ensure high availability and redundancy. Making PostgreSQL instances redundant by creating replicas is a common, well-documented practice. We could also use a database as a service (DBaaS) like AWS Aurora, which provides a multi-region database. While this can be a good starting point, it can get out of hand in terms of budget and technical constraints, for reasons that are outside the scope of this blog.

When a DBaaS is no longer suitable, we can opt to build a self-managed infrastructure that lets us make better use of our databases and optimize costs. In this blog I will show how to design a PostgreSQL HA cluster on AWS using infrastructure as code and self-deployed configuration with Ansible.

AWS architecture

For this project I chose AWS to build the infrastructure; there isn't a special reason why. GCP, as a cloud provider, offers the same capabilities to deploy infrastructure. I chose AWS because it is widely used nowadays, and I use OpenTofu instead of AWS CloudFormation so that I can deploy to another provider, such as GCP, in the future.

I designed the architecture keeping in mind the following aspects:

  • Regional and availability-zone fault tolerance.
  • Network isolation for the PostgreSQL cluster.
  • IaC using OpenTofu. 
  • Deployment automation isolation, separating infrastructure provisioning from software deployment.

Deploying in different regions contributes to the robustness of our architecture, keeping us safe from an AWS region failure.


[Figure: Infrastructure architecture]

The previous picture represents a minimal architecture for our infrastructure. We can see two PostgreSQL instances in our primary region and one PostgreSQL instance in the secondary region. There isn't any limitation on spawning more instances in any region. Ideally, we'd want an odd number of instances across the regions, so the next step in the scaling-up process would be adding two PostgreSQL instances in the secondary region. Consequently, if the primary region is lost, the secondary region can take over with a leader and two replicas.

Components per EC2 instance

The leader election is handled by Patroni. As described in Patroni's documentation, Patroni needs etcd to store the configuration in a distributed way and to handle the leader election. For more robust medium and large environments, it is recommended to use a separate etcd cluster; this reduces the risk of split-brain and improves maintainability and reliability. For simplicity, though, this time I'll install etcd on each EC2 instance.
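To make this concrete, below is a minimal sketch of the Patroni configuration on one node, pointing at the etcd endpoints and exposing the REST API that other components rely on. The scope name, IP addresses and data directory are placeholders, not the exact values used in this deployment.

# /etc/patroni/patroni.yml - minimal sketch, all values are placeholders
scope: postgres-ha                      # cluster name shared by every node
name: db1                               # unique name of this node

etcd3:
  hosts: 10.11.1.10:2379,10.11.1.11:2379,10.11.2.10:2379

restapi:
  listen: 0.0.0.0:8008                  # REST API later used by HAProxy health checks
  connect_address: 10.11.1.10:8008

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.11.1.10:5432
  data_dir: /var/lib/pgsql/15/data
  authentication:
    superuser:
      username: postgres
      password: changeme                # use a secret store in a real setup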

Concerning the contact point of the cluster, applications that want to connect to it need to know who the leader is so that they can send writes to it. Since the leader can change at any time, I deployed one EC2 instance with HAProxy installed. HAProxy can also run as a "farm" of HAProxy instances to gain high availability and fault tolerance: using keepalived or Pacemaker, we can monitor the health of each HAProxy and automatically fail over to a standby instance. As a result of having a farm of HAProxy instances, our infrastructure design eliminates this single point of failure.
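To illustrate how HAProxy finds the leader, the usual pattern from the Patroni documentation is to health-check every node's Patroni REST API (port 8008 by default), which only returns 200 on the current leader. A minimal sketch, assuming placeholder IPs for the three PostgreSQL nodes:

# haproxy.cfg - minimal sketch, IPs are placeholders
listen postgres_write
    bind *:5432
    mode tcp
    option httpchk OPTIONS /master
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server db1 10.11.1.10:5432 maxconn 100 check port 8008
    server db2 10.11.1.11:5432 maxconn 100 check port 8008
    server db3 10.11.2.10:5432 maxconn 100 check port 8008

With this configuration, only the node whose REST API reports it is the leader receives write traffic; replicas can be exposed on a second listener that checks /replica instead.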

In this example, I've opted to have a single HAProxy instance per region; however, the automation code can deploy as many as required. The picture below shows the architecture for one region with the components running on each EC2 instance.

[Figure: Components per EC2 instance]

Implementation strategy

In order to implement this architecture, I opted to use OpenTofu to provision the infrastructure in AWS and Ansible to configure every instance. OpenTofu will create the VPC, subnets, IAM roles and EC2 instances, among other resources.

Afterward, each EC2 instance will execute initialization code to clone a repository containing the Ansible playbooks and run them locally. The playbooks only run against localhost, so every instance installs and deploys its configuration in parallel.
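A minimal sketch of that initialization script, assuming a hypothetical repository URL and playbook name, could look like this:

#!/bin/bash
# Minimal user-data sketch: keep a local log, fetch the Ansible code and run it against localhost.
exec > /var/log/user-data.log 2>&1

dnf install -y git ansible-core                     # package names depend on the chosen AMI
git clone https://github.com/example/pg-ha-ansible.git /opt/pg-ha-ansible    # placeholder repository
cd /opt/pg-ha-ansible

# Run only against localhost so every instance configures itself in parallel.
nohup ansible-playbook -i localhost, -c local site.yml &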

Finally, once the PostgreSQL instances are up and running and Patroni has chosen the leader, HAProxy will detect it and route connections to the correct leader. Clients then use the HAProxy contact point to connect to the cluster. This deployment from scratch takes only about 5 to 10 minutes.

Infrastructure as code

The OpenTofu code must be parameterized so we can adjust the number of instances when needed and, in the future, create an API that exposes this capability and dynamically creates the number of replicas required.

In the following section I will show a strategy for writing modular and maintainable code.

The code will be organized in multiple files for better structure and maintainability: main.tf will only handle EC2 instance provisioning, and every other infrastructure component, such as the VPC, subnets, Systems Manager (ssm.tf) and security groups (sg.tf), will have its own file. This approach prevents "spaghetti code", a poorly structured setup where all components are crammed into one large file and the code becomes hard to understand and maintain.

While I could create an ec2.tf file to further modularize the code and use main.tf as an orchestrator, for the current scope I'll keep main.tf as described and reserve this change for future improvements.
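Under that structure, the repository layout looks roughly like the sketch below; file names beyond the ones mentioned above are illustrative.

.
├── main.tf            # EC2 instance provisioning
├── vpc.tf             # VPCs and subnets
├── sg.tf              # security groups
├── ssm.tf             # Systems Manager access
├── iam.tf             # roles and instance profiles
├── variables.tf       # input variable declarations
├── terraform.tfvars   # values per environment
└── init.sh            # user data script run by each instance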


The terraform.tfvars file will contain all the variable values, so we can do all the magic in main.tf in a modular way. Since we need to create the same kind of PostgreSQL instances but with different names, it is a good idea to reuse the code.

Reusing code helps ensure that our infrastructure stays maintainable and allows for easy updates or modifications over time.

One way to achieve this is by declaring the variables as shown below:

variable "postgres_instances_primary" {
  description = "List of PostgreSQL instance names in the primary region."
  type        = map(object({
    name = string
    zone = string
  }))
  default = {
    "db1" = { name = "db1", zone = "a" },
    "db2" = { name = "db2", zone = "b" },
    "db3" = { name = "db3", zone = "c" }
  }
}

variable "postgres_instances_secondary" {
  description = "List of PostgreSQL instance names in the secondary region."
  type        = map(object({
    name = string
    zone = string
  }))
    default = {
    "db1" = { name = "db1", zone = "a" },
    "db2" = { name = "db2", zone = "b" },
    "db3" = { name = "db3", zone = "c" }
  }
}

locals {
    postgres_instances_primary = [
        for key, instance in var.postgres_instances_primary : {
            name   = instance.name
            zone   = "${var.region1}${instance.zone}"
        }
    ]

    postgres_instances_secondary = [
        for key, instance in var.postgres_instances_secondary : {
            name   = instance.name
            zone   = "${var.region2}${instance.zone}"
        }
    ]
}        

I designed the code this way so that, when declaring the aws_instance resource, I can reference these locals as shown below.

resource "aws_instance" "postgres_instances_secondary" {
    count         = length(local.postgres_instances_secondary)
    ami           = var.ami_id_secondary
    instance_type = var.instance_type
    user_data = file("${path.module}/init.sh")
    availability_zone = local.postgres_instances_secondary[count.index].zone
    subnet_id = aws_subnet.private_subnet_region2[count.index].id
    vpc_security_group_ids = [aws_security_group.postgres_sg_region2.id, 
    aws_security_group.ssm_sg_region2.id,
    aws_security_group.etcd_sg_region2.id,
    aws_security_group.patroni_sg_region2.id]
    iam_instance_profile = aws_iam_instance_profile.session_manager_profile.name
    key_name = aws_key_pair.ssh_key_pair2.key_name


    tags = {
        Name        = local.postgres_instances_secondary[count.index].name
        project     = "MyClusterHA"
        environment = var.env2
        region      = var.region2
        owner       = var.owner
        application = "postgres"
        etcd = "node-${var.region2}-${count.index + 1}"
        cost_center = var.cost_center
    }

    provider = aws.secondary_region
}        

Of course, this is a brief code example to demonstrate the main idea. For production-ready automation, I would rather split this code into reusable modules so it can scale in complexity easily. But just to make a proof of concept, I will group my code into logical files, like network, security groups, instances, etc.
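To give an idea of what that refactoring could look like, a hypothetical postgres_instance module could be called once per region; the module path and its inputs below are illustrative, not part of the current code:

module "postgres_secondary" {
  source    = "./modules/postgres_instance"      # hypothetical module path
  providers = { aws = aws.secondary_region }

  instances     = local.postgres_instances_secondary
  ami_id        = var.ami_id_secondary
  instance_type = var.instance_type
  subnet_ids    = aws_subnet.private_subnet_region2[*].id
  user_data     = file("${path.module}/init.sh")

  tags = {
    project     = "MyClusterHA"
    environment = var.env2
    owner       = var.owner
  }
}

The same module would then be instantiated for the primary region with its own provider alias and variables, keeping the per-region code identical.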

Ultimately, the OpenTofu code will handle the deployment of instances, VPCs, subnets, IAM roles and more. Once the instances are running correctly, I provide a single script that each instance executes; this is known as 'user data' in AWS EC2. The script's purpose is to launch the next step of our automation: the configuration and deployment of PostgreSQL.

Configuration and Deployment

The deployment automation is done with Ansible. The Ansible code contains different roles; each role is responsible for installing one of the components we need to run an HA PostgreSQL cluster: Patroni, etcd, PostgreSQL and HAProxy.
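A minimal sketch of the playbook that ties those roles together, assuming the roles are named etcd, postgresql, patroni and haproxy (the actual names in the repository may differ), is shown below:

# site.yml - minimal sketch, role names are assumptions
- hosts: localhost
  connection: local
  become: true
  roles:
    - etcd          # distributed configuration store used by Patroni
    - postgresql    # PostgreSQL packages and base configuration
    - patroni       # cluster bootstrap and leader election
    - haproxy       # applied only on the proxy instances, e.g. guarded by a variable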

As I described before, the script that runs in the user data section is in charge of preparing the instance so it can clone the repository where the Ansible code resides and run the Ansible playbook in the background.

For tracking and observability purposes, we can follow the deployment process by sending the logs of our user data script to a logging service like Splunk or Amazon CloudWatch Logs. If any problem happens during the deployment, whether an installation error or a code bug, I will be able to identify it and set up an alert for the future.
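For example, if the CloudWatch agent is installed on the instances, a small fragment of its configuration is enough to ship the user data log; the log group name and file path here are placeholders:

{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/user-data.log",
            "log_group_name": "pg-ha-cluster/user-data",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}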

In the next picture you can see the flow of the automation.

[Figure: Complete flow of the automation]

Once Ansible finishes successfully, we can connect to any of the PostgreSQL instances and check the cluster status by running patronictl list, as shown below:


[Figure: patronictl list command output]

You can see all three of our instances, with one located in another region. As a result, we can ensure that if a region goes down, we still have PostgreSQL instances in another region. 

To connect to the cluster with psql, we can use the HAProxy contact point, as the following session shows:

[root@ip-10-11-11-172 bin]# psql -h 10.11.1.100 -U postgres
Password for user postgres:
psql (15.12)
Type "help" for help.

postgres=# select inet_server_addr();
 inet_server_addr
------------------
 10.11.1.54
(1 row)        

To recap, I will show the EC2 instances in each region. The next picture presents the instances in the primary region.

[Figure: Instances in the primary region]

In the secondary region, you will see the third PostgreSQL instance.

[Figure: Instances in the secondary region]

Identifying a point of failure

Up to this point, I have shown how to install and deploy a PostgreSQL cluster in multi-region mode; however, this isn't yet the case for HAProxy. In order to make our infrastructure more resilient, we can deploy HAProxy in both regions.

The same goes for etcd: this component currently runs inside our PostgreSQL instances, but it can be moved out into its own cluster, making our infrastructure more suitable for heavy traffic. The next picture shows this configuration.

[Figure: High Availability Architecture]

In this way, if we need to upgrade or maintain any component, we don't need to temporarily stop the PostgreSQL processes.

Future Work

What I find interesting in this approach is that all the provisioning is orchestrated from the moment we type tofu apply. There is no need to check whether the provisioning finished successfully before starting the second part of the automation, the deployment with Ansible.
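Concretely, bringing up or updating the whole environment boils down to the standard OpenTofu workflow:

tofu init      # download providers and configure the state backend
tofu plan      # review the changes before applying them
tofu apply     # provision both regions; the instances then configure themselves via user data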

Nevertheless, I left out some important points. These include: pipeline, security, alerting, observability/logging and monitoring. 

As a next step, we can enhance the Ansible playbook by adding a role for every missing aspect. For monitoring, we can install Datadog agents or postgres_exporter; they are commonly used, and we can export all the metrics to create detailed dashboards in Grafana or use the basic Datadog dashboards. For security, we should enable TLS connections among the etcd instances and the PostgreSQL instances, as well as for client connections; for this scenario, every role can be modified to enable TLS. For alerting and logging, I will create a dedicated role.

On the topic of continuous deployment, a pipeline can be created in the GitHub repository that contains the OpenTofu code, so that when any change is introduced, it gets deployed to the correct environment, after passing the required checks and approvals. The state of the infrastructure should be stored externally, for example in an S3 bucket, so it can serve as the source of truth.
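A sketch of such a remote state configuration, with placeholder bucket and table names, could look like this:

terraform {
  backend "s3" {
    bucket         = "myclusterha-tofu-state"     # placeholder bucket name
    key            = "postgres-ha/terraform.tfstate"
    region         = "us-east-1"                  # placeholder region
    dynamodb_table = "tofu-state-lock"            # optional: state locking
    encrypt        = true
  }
}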

Conclusion 

On a personal note, I find it both fun and interesting to deploy highly available infrastructure for PostgreSQL following best practices in DevOps and IaC. This is exactly what database as a service (DBaaS) tries to offer, aiming to eliminate the need to handle infrastructure deployment. While DBaaS is a good starting point, the true mastery in this field lies in building and managing robust systems ourselves.

Of course, this comes with many challenges and technologies that must work together to provide a resilient infrastructure. We can swap PostgreSQL for any other database or application and the challenges remain the same: how to make our application resilient to failures, scalable and distributed.

Frequently, companies choose database as a service (DBaaS) so they can save time and monetize faster. But some companies reach a certain threshold of complexity where the pricing model is no longer affordable. Under those conditions, building a self-managed infrastructure becomes important.

Deploying the infrastructure can look scary, but it is actually not so difficult. Of course it takes time to get there; it is a process of transformation in which the automation needs to gain a high level of maturity. Once there, tasks like upgrades, patches and scaling up or down become low-risk operations in a production environment.
