Disaster Recovery 101: Architecting Auto-Failovers for PostgreSQL and Laravel Deployments on Linode

Establishing a High-Availability PostgreSQL Cluster with Patroni

Achieving automated failover for PostgreSQL requires a robust cluster management solution. Patroni stands out as a leading choice, offering declarative configuration, REST API management, and seamless integration with distributed configuration stores like etcd or Consul. For this guide, we’ll focus on etcd due to its widespread adoption and ease of deployment on Linode.

Our setup will involve at least three PostgreSQL nodes for quorum and high availability. One node will act as the primary, with others configured as replicas. Patroni will monitor the health of the primary and orchestrate the promotion of a replica in case of failure.

Prerequisites and Initial Setup

Before diving into Patroni configuration, ensure you have the following:

Three Linode instances (or more for production) provisioned with a recent Linux distribution (e.g., Ubuntu 22.04 LTS).
SSH access to all instances.
Basic understanding of PostgreSQL installation and administration.
A distributed configuration store (etcd) deployed and accessible by all PostgreSQL nodes. For simplicity, we’ll assume etcd is running on dedicated nodes or co-located on one of the PostgreSQL nodes (not recommended for production).

First, install PostgreSQL and Patroni on each PostgreSQL node. We’ll use `apt` for Ubuntu.

Node 1 (Primary Candidate), Node 2 (Replica Candidate), Node 3 (Replica Candidate)

On each PostgreSQL node, execute the following commands:

sudo apt update
sudo apt install -y postgresql postgresql-contrib python3-pip
sudo pip3 install --upgrade pip
sudo pip3 install patroni[etcd] python-etcd

Next, configure PostgreSQL to allow replication and remote connections. Edit postgresql.conf (typically located at /etc/postgresql/[version]/main/postgresql.conf) and pg_hba.conf (typically at /etc/postgresql/[version]/main/pg_hba.conf).

`postgresql.conf` modifications

listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
hot_standby = on

Ensure these settings are applied to all PostgreSQL nodes. Restart PostgreSQL to apply these changes:

sudo systemctl restart postgresql

`pg_hba.conf` modifications

Add entries to allow replication connections from other nodes and connections from your application servers. Replace [your_app_subnet] with your application server’s subnet or specific IP addresses.

# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    replication     replicator      [node_ip_1]/32          md5
host    replication     replicator      [node_ip_2]/32          md5
host    replication     replicator      [node_ip_3]/32          md5
host    all             app_user        [your_app_subnet]       md5

You’ll need to create the replicator user and the app_user. This is typically done on the initial primary node before Patroni takes over. For now, we’ll assume these users will be managed by Patroni or created manually later.

Configuring Patroni

Patroni’s configuration is typically provided via a YAML file. Create a patroni.yml file on each PostgreSQL node. The configuration needs to point to your etcd cluster and define the PostgreSQL parameters.

`patroni.yml` (Example for Node 1)

# Global configuration
scope: my_pg_cluster
namespace: /service/ # etcd namespace for this cluster
name: patroni-node-1 # Unique name for this Patroni instance

# PostgreSQL configuration
postgresql:
  listen: 0.0.0.0:5432
  connect_address: [node_1_ip]:5432
  data_dir: /var/lib/postgresql/[version]/main/data
  bin_dir: /usr/lib/postgresql/[version]/bin
  pg_hba:
    - host    replication     replicator      0.0.0.0/0       md5
    - host    all             app_user        0.0.0.0/0       md5
  replication:
    username: replicator
    password: your_replication_password
    slots:
      - name: patroni_slot
        database: postgres
  parameters:
    wal_level: replica
    max_wal_senders: 10
    max_replication_slots: 10
    hot_standby: on

# etcd configuration
etcd:
  hosts:
    - [etcd_node_1_ip]:2379
    - [etcd_node_2_ip]:2379
    - [etcd_node_3_ip]:2379
  protocol: http # or https if using TLS

# Replication configuration
replication:
  synchronous_mode: false # Set to true for synchronous replication if needed
  synchronous_node_count: 1 # Number of synchronous replicas required

# Tags for identifying nodes
tags:
  nofailover: false
  clonefrom: false

# REST API configuration
restapi:
  listen: 0.0.0.0:8008
  connect_address: [node_1_ip]:8008

# Logging configuration
log:
  level: INFO
  dir: /var/log/patroni
  file: patroni.log

Important Notes:

Replace [node_1_ip], [etcd_node_x_ip], and [version] with your actual values.
The scope should be identical across all nodes in the cluster.
The name must be unique for each Patroni instance.
Ensure the data_dir and bin_dir paths are correct for your PostgreSQL installation.
The pg_hba entries in Patroni’s config will override or supplement the file-based ones. It’s generally recommended to manage pg_hba via Patroni for consistency.
Set a strong password for the replicator user.
The namespace in etcd is crucial for Patroni to discover other cluster members.
For production, consider using TLS for etcd communication and securing the REST API.

Create the /var/log/patroni directory and set appropriate permissions:

sudo mkdir -p /var/log/patroni
sudo chown -R postgres:postgres /var/log/patroni

Copy this patroni.yml to all PostgreSQL nodes, adjusting the name and connect_address for each node accordingly.

Initializing the PostgreSQL Cluster with Patroni

On one of the nodes (this will be the initial primary), you need to initialize the PostgreSQL cluster managed by Patroni. Ensure PostgreSQL service is stopped before proceeding.

sudo systemctl stop postgresql

Now, start Patroni. It will detect that no PostgreSQL cluster exists and will initialize it.

sudo patroni /etc/patroni/patroni.yml

Patroni will log its actions. You should see messages indicating it’s initializing the cluster, creating the replication user, and starting PostgreSQL. Once it reports that it’s running as a primary, you can start Patroni as a systemd service on the other nodes.

Starting Patroni as a Systemd Service

Create a systemd service file for Patroni (e.g., /etc/systemd/system/patroni.service):

[Unit]
Description=Patroni - Highly Available PostgreSQL solution
After=network.target

[Service]
User=postgres
Group=postgres
ExecStart=/usr/local/bin/patroni /etc/patroni/patroni.yml
Restart=on-failure
KillMode=process
TimeoutSec=30

[Install]
WantedBy=multi-user.target

Reload systemd, enable, and start the service:

sudo systemctl daemon-reload
sudo systemctl enable patroni
sudo systemctl start patroni

On the other nodes, Patroni will detect the existing primary in etcd and configure themselves as replicas, cloning data if necessary. Monitor the logs on all nodes to ensure they join the cluster correctly.

Testing Automatic Failover

To test the failover mechanism, simulate a failure of the current primary node. The simplest way is to stop the Patroni service on the primary.

Simulating Primary Failure

On the current primary node:

sudo systemctl stop patroni

Within seconds, Patroni on the other nodes will detect the primary’s absence. One of the replicas will be elected as the new primary. You can monitor this process via the Patroni logs on each node.

To verify the new primary, you can query the Patroni REST API on any node:

curl http://[any_node_ip]:8008/primary

This will return JSON output indicating the current primary’s API endpoint. You can also query /cluster for the full cluster status.

Integrating Laravel with the HA PostgreSQL Cluster

Your Laravel application needs to connect to the PostgreSQL cluster in a way that it always targets the current primary. The most straightforward approach is to use a load balancer or a DNS-based solution that points to the active primary.

Option 1: Using a Load Balancer (e.g., HAProxy)

Deploy HAProxy on a separate instance or co-locate it on one of your application servers. HAProxy can monitor the Patroni REST API to determine the current primary.

HAProxy Configuration

global
    log /dev/log    local0
    log /dev/log    local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    tcp
    option  tcplog
    option  dontlognull
    timeout connect 5000ms
    timeout client  50000ms
    timeout server  50000ms
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

listen postgres_cluster
    bind *:5432
    mode tcp
    option httpchk GET /primary
    http-check expect status 200
    balance roundrobin
    # This is where HAProxy will query Patroni for the primary
    # It will then forward traffic to the IP returned by the API
    # This requires a custom script or a more advanced HAProxy setup
    # For simplicity, we'll use a static list and rely on Patroni's health checks
    # A more robust solution involves a script that polls Patroni API and updates backend list.
    server patroni-node-1 [node_1_ip]:5432 check port 8008 inter 2s fall 3 rise 2
    server patroni-node-2 [node_2_ip]:5432 check port 8008 inter 2s fall 3 rise 2
    server patroni-node-3 [node_3_ip]:5432 check port 8008 inter 2s fall 3 rise 2

listen stats
    bind *:8404
    mode http
    stats enable
    stats uri /stats
    stats refresh 10s

Note on HAProxy Configuration: The above HAProxy configuration is a simplified example. For true dynamic failover detection, HAProxy needs to actively query the Patroni API (e.g., /primary endpoint) to determine the current primary and dynamically update its backend list. This often involves a custom script that runs periodically, polls the Patroni API, and reloads HAProxy configuration. Alternatively, tools like `patroni-vip` or `keepalived` can manage a floating IP that always points to the primary.

Option 2: DNS-Based Failover

Use a DNS provider that supports health checks and dynamic record updates. A script on your infrastructure would monitor the Patroni cluster and update a DNS A record to point to the current primary’s IP address. This is generally more complex to implement reliably.

Laravel Database Configuration

Configure your Laravel config/database.php to point to the load balancer’s IP address or the DNS name.

<?php

return [

    // ... other configurations

    'connections' => [

        'pgsql' => [
            'driver' => 'pgsql',
            'url' => env('DATABASE_URL'),
            'host' => env('DB_HOST', '127.0.0.1'), // This should be your HAProxy IP or DNS name
            'port' => env('DB_PORT', '5432'),
            'database' => env('DB_DATABASE', 'forge'),
            'username' => env('DB_USERNAME', 'forge'),
            'password' => env('DB_PASSWORD', ''),
            'charset' => 'utf8',
            'prefix' => '',
            'prefix_indexes' => true,
            'schema' => 'public',
            'sslmode' => 'prefer',
        ],

        // ... other connections
    ],

    // ... other configurations
];

In your .env file:

DB_HOST=192.168.1.100 # IP of your HAProxy instance
DB_PORT=5432
DB_DATABASE=your_app_db
DB_USERNAME=app_user
DB_PASSWORD=your_app_password

Advanced Considerations and Best Practices

Replication Slots: Ensure replication slots are properly configured and monitored. A misbehaving slot can cause the primary to retain WAL segments indefinitely, leading to disk full errors. Patroni manages slots, but it’s good practice to monitor their status.

Monitoring: Implement comprehensive monitoring for your PostgreSQL cluster, etcd, and HAProxy. Key metrics include replication lag, etcd health, Patroni API responsiveness, and HAProxy backend status.

Backups: Automated failover is not a substitute for a robust backup strategy. Use tools like pg_dump or pg_basebackup with a reliable backup solution (e.g., Linode Object Storage, S3-compatible storage).

Security: Secure etcd communication with TLS. Encrypt PostgreSQL replication traffic if nodes are on untrusted networks. Use strong passwords and consider IP whitelisting.

Testing: Regularly test your failover procedures. Simulate various failure scenarios (node crash, network partition, etcd failure) to ensure your system behaves as expected.

Read Replicas: For read-heavy workloads, you can configure additional read-only replicas. These can be managed by Patroni or as standard PostgreSQL replicas. Direct read traffic to these replicas to offload the primary.

Configuration Management: Use tools like Ansible, Chef, or Puppet to automate the deployment and configuration of PostgreSQL, Patroni, and etcd across your nodes. This ensures consistency and simplifies management.

By implementing Patroni for PostgreSQL high availability and integrating it with your Laravel application via a load balancer or intelligent DNS, you can achieve a resilient and automatically failing-over database infrastructure on Linode.