
PostgreSQL Replication and High Availability: From Streaming to Patroni

Set up PostgreSQL high availability with streaming replication, automatic failover using Patroni, connection pooling with PgBouncer, and monitoring with...

Yash Pritwani
15 min read

Why PostgreSQL HA Matters

A database outage is among the most damaging failures an application can experience. Stateless services can restart in seconds, but a database holds your most critical asset: your data. High availability keeps your PostgreSQL database serving through hardware failures, network partitions, and planned maintenance.

<div style="margin:2.5rem auto;max-width:600px;width:100%;text-align:center;"><svg viewBox="0 0 600 180" xmlns="http://www.w3.org/2000/svg" style="width:100%;height:auto;"><rect width="600" height="180" rx="12" fill="#1a1a2e"/><ellipse cx="150" cy="55" rx="60" ry="18" fill="#6366f1" opacity="0.8"/><rect x="90" y="55" width="120" height="50" fill="#6366f1" opacity="0.8"/><ellipse cx="150" cy="105" rx="60" ry="18" fill="#6366f1" opacity="0.9"/><text x="150" y="85" text-anchor="middle" fill="#ffffff" font-size="12" font-family="system-ui" font-weight="bold">Primary</text><text x="150" y="140" text-anchor="middle" fill="#94a3b8" font-size="10" font-family="system-ui">Read + Write</text><ellipse cx="400" cy="30" rx="50" ry="14" fill="#a855f7" opacity="0.7"/><rect x="350" y="30" width="100" height="35" fill="#a855f7" opacity="0.7"/><ellipse cx="400" cy="65" rx="50" ry="14" fill="#a855f7" opacity="0.8"/><text x="400" y="52" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Replica 1</text><ellipse cx="400" cy="110" rx="50" ry="14" fill="#a855f7" opacity="0.7"/><rect x="350" y="110" width="100" height="35" fill="#a855f7" opacity="0.7"/><ellipse cx="400" cy="145" rx="50" ry="14" fill="#a855f7" opacity="0.8"/><text x="400" y="132" text-anchor="middle" fill="#ffffff" font-size="10" font-family="system-ui">Replica 2</text><defs><marker id="arrow8" markerWidth="8" markerHeight="6" refX="8" refY="3" orient="auto"><path d="M0,0 L8,3 L0,6" fill="#2dd4bf"/></marker></defs><path d="M212,65 Q280,30 348,48" stroke="#2dd4bf" stroke-width="1.5" fill="none" marker-end="url(#arrow8)"/><path d="M212,90 Q280,130 348,128" stroke="#2dd4bf" stroke-width="1.5" fill="none" marker-end="url(#arrow8)"/><text x="280" y="55" text-anchor="middle" fill="#2dd4bf" font-size="9" font-family="system-ui">WAL stream</text><text x="280" y="130" text-anchor="middle" fill="#2dd4bf" font-size="9" font-family="system-ui">WAL stream</text><text x="500" y="52" text-anchor="start" 
fill="#94a3b8" font-size="9" font-family="system-ui">Read-only</text><text x="500" y="132" text-anchor="start" fill="#94a3b8" font-size="9" font-family="system-ui">Read-only</text></svg><p style="margin-top:0.75rem;font-size:0.85rem;color:#94a3b8;font-style:italic;line-height:1.4;">Database replication: the primary handles writes while replicas serve read queries via WAL streaming.</p></div>

Setting Up Streaming Replication

Primary Configuration

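-- Run on the primary; ALTER SYSTEM writes these to postgresql.auto.conf
ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET max_wal_senders = 5;
ALTER SYSTEM SET wal_keep_size = '1GB';
ALTER SYSTEM SET hot_standby = 'on';  -- only takes effect on a standby; on by default
-- wal_keep_size is applied by a reload; wal_level, max_wal_senders, and
-- hot_standby only change after a server restart
SELECT pg_reload_conf();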

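Create the replication user (the primary's pg_hba.conf must also allow this role to open replication connections from the replica's address):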

CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure-replication-password';

Replica Setup

# Stop PostgreSQL on replica
sudo systemctl stop postgresql

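# Clear the data directory. With `sudo rm -rf .../main/*` the glob is expanded by
# the calling user's shell, which usually cannot read this directory, so use find.
sudo -u postgres find /var/lib/postgresql/16/main -mindepth 1 -delete

# Base backup from primary (-R creates standby.signal and writes
# primary_conninfo to postgresql.auto.conf)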
sudo -u postgres pg_basebackup \
  -h primary.example.com \
  -U replicator \
  -D /var/lib/postgresql/16/main \
  -Fp -Xs -P -R

# Start replica
sudo systemctl start postgresql

Verify Replication

-- On primary: check connected replicas
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
       (sent_lsn - replay_lsn) AS replication_lag_bytes
FROM pg_stat_replication;

-- On replica: confirm it is in recovery mode
SELECT pg_is_in_recovery();  -- Should return true
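
Subtracting `replay_lsn` from `sent_lsn` in the query above yields a byte count because an LSN is a 64-bit position in the WAL, printed as two hexadecimal halves separated by a slash. The following is a minimal Python sketch of the same arithmetic, for cases where lag has to be computed from LSN strings collected elsewhere. The function names and the sample LSNs are illustrative assumptions, not part of any PostgreSQL client library.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000148' to its 64-bit byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after is the lower 32.
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL sent by the primary but not yet replayed by the replica."""
    return lsn_to_int(sent_lsn) - lsn_to_int(replay_lsn)
```

For example, `lag_bytes("0/3000148", "0/3000000")` returns 328, meaning the replica still has 328 bytes of WAL to replay.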


PgBouncer Connection Pooling

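Each PostgreSQL connection is served by its own backend process, so production applications that open many short-lived connections should go through a pooler rather than connect directly. PgBouncer multiplexes many client connections onto a small number of server connections: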

# pgbouncer.ini
[databases]
myapp = host=pg-primary port=5432 dbname=myapp
myapp_ro = host=pg-replica port=5432 dbname=myapp

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
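auth_type = scram-sha-256
# scram-sha-256 needs stored credentials; this path is an assumption, adjust to your install
auth_file = /etc/pgbouncer/userlist.txt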
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 5
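
PgBouncer keeps one pool per database and user pair, so `default_pool_size` limits server connections per pair, not in total. The sketch below is a rough sizing check under that assumption: it parses a trimmed copy of the config above and estimates the worst-case number of server connections each backend host could receive. The helper name and the `users_per_database` parameter are hypothetical, and the estimate ignores settings such as `reserve_pool_size` and `max_db_connections`.

```python
import configparser

# Trimmed copy of the pgbouncer.ini shown above.
PGBOUNCER_INI = """
[databases]
myapp = host=pg-primary port=5432 dbname=myapp
myapp_ro = host=pg-replica port=5432 dbname=myapp

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
"""


def worst_case_server_connections(ini_text: str, users_per_database: int = 1) -> dict:
    """Estimate the maximum server connections PgBouncer may open per backend host."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(ini_text)
    pool_size = parser.getint("pgbouncer", "default_pool_size")

    per_host = {}
    for _name, conn_str in parser["databases"].items():
        # Assumes simple space-separated key=value pairs without quoting.
        fields = dict(part.split("=", 1) for part in conn_str.split())
        host = fields["host"]
        per_host[host] = per_host.get(host, 0) + pool_size * users_per_database
    return per_host
```

Each total should stay comfortably below that server's `max_connections`, leaving room for replication, monitoring, and administrative sessions.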

Patroni: Automatic Failover

Patroni manages PostgreSQL HA clusters with automatic failover using a distributed consensus store like etcd:

# patroni.yml
scope: myapp-cluster
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: node1:8008

etcd3:
  hosts: etcd1:2379,etcd2:2379,etcd3:2379

bootstrap:
  dcs:
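    ttl: 30                           # seconds before the leader lock expires
    loop_wait: 10                     # seconds between runs of the HA loop
    retry_timeout: 10                 # seconds to keep retrying DCS and PostgreSQL operations
    maximum_lag_on_failover: 1048576  # bytes (1 MiB); replicas lagging more are not promoted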
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        max_wal_senders: 5
        hot_standby: "on"

postgresql:
  listen: 0.0.0.0:5432
  connect_address: node1:5432
  data_dir: /var/lib/postgresql/data
  authentication:
    superuser:
      username: postgres
      password: postgres-password
    replication:
      username: replicator
      password: replication-password
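
The timing values under `bootstrap.dcs` are related: the Patroni documentation asks that `loop_wait + 2 * retry_timeout` not exceed `ttl`, so a healthy leader can renew its lock before it expires. The sketch below checks that rule and shows, in simplified form, how `maximum_lag_on_failover` narrows the set of promotion candidates. Both function names are hypothetical, and Patroni's real election performs additional checks that are not modelled here.

```python
def dcs_timing_ok(ttl: int, loop_wait: int, retry_timeout: int) -> bool:
    """Return True when loop_wait + 2 * retry_timeout fits within ttl."""
    return loop_wait + 2 * retry_timeout <= ttl


def promotable(replica_lag_bytes: dict, maximum_lag_on_failover: int) -> list:
    """Names of replicas whose lag in bytes is within the failover limit."""
    return sorted(
        name
        for name, lag in replica_lag_bytes.items()
        if lag <= maximum_lag_on_failover
    )
```

With the values from the config above, `dcs_timing_ok(30, 10, 10)` holds with no slack, so raising `loop_wait` or `retry_timeout` would also require a larger `ttl`.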

Monitoring Patroni

# Check cluster status
patronictl -c /etc/patroni.yml list

# Output shows each member's role, state, timeline, and lag

# Planned switchover for maintenance (older Patroni releases call the --leader flag --master)
patronictl -c /etc/patroni.yml switchover --leader node1 --candidate node2
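
Besides `patronictl`, each node exposes health endpoints on the REST API port configured under `restapi` (8008 above): `GET /primary` returns 200 only on the node holding the leader lock, and `GET /replica` returns 200 only on a running replica. Load balancers commonly use these endpoints to route traffic. The following is a minimal sketch of a role probe built on them; the function names and the `get_status` hook are assumptions added for illustration and testing.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for url, or 0 if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Patroni answers 503 when the node does not have the requested role.
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def node_role(base_url: str, get_status=http_status) -> str:
    """Classify a Patroni node as 'primary', 'replica', or 'unavailable'."""
    if get_status(f"{base_url}/primary") == 200:
        return "primary"
    if get_status(f"{base_url}/replica") == 200:
        return "replica"
    return "unavailable"
```

Calling `node_role("http://node1:8008")` against a live cluster reports the role of that node; passing a custom `get_status` makes the logic testable without a running Patroni.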

Monitoring Replication Lag

-- Replication lag in seconds on the replica
SELECT CASE
    WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
    ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS replication_lag_seconds;

Key metrics to alert on:

Replication lag > 30 seconds
Flush lag increasing steadily
Replica count drops below expected value
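
The three alert conditions above can be sketched as a single evaluation function. This is an illustration under stated assumptions rather than a ready-made monitoring rule: the function name and parameters are hypothetical, and "increasing steadily" is interpreted here as at least three samples, each larger than the one before.

```python
def replication_alerts(
    lag_seconds: float,
    flush_lag_history: list,
    replica_count: int,
    expected_replicas: int,
    max_lag_seconds: float = 30.0,
) -> list:
    """Return the names of the alert conditions that currently fire."""
    alerts = []

    if lag_seconds > max_lag_seconds:
        alerts.append("replication_lag_high")

    # Assumed definition of "steadily": three or more strictly rising samples.
    pairs = zip(flush_lag_history, flush_lag_history[1:])
    if len(flush_lag_history) >= 3 and all(later > earlier for earlier, later in pairs):
        alerts.append("flush_lag_rising")

    if replica_count < expected_replicas:
        alerts.append("replica_missing")

    return alerts
```

In practice the inputs would come from the lag query above and from `pg_stat_replication` on the primary, sampled at a fixed interval.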


Backup from Replica

Where possible, take backups from a replica to keep the extra load off the primary:

# Logical backup from replica
pg_dump -h pg-replica -U postgres myapp | gzip > backup.sql.gz

# Physical backup for point-in-time recovery
pg_basebackup -h pg-replica -U replicator -D /backups/base -Ft -z -P

At TechSaaS, we run PostgreSQL 16 as a shared database service for multiple applications. For clients requiring HA, we deploy Patroni with PgBouncer connection pooling, achieving 99.99% database availability.

Need PostgreSQL HA for your application? Contact [email protected].
