Disaster Recovery 101: Architecting Auto-Failovers for PostgreSQL and Laravel Deployments on Linode
Establishing a High-Availability PostgreSQL Cluster with Patroni
Achieving automated failover for PostgreSQL requires a robust cluster management solution. Patroni stands out as a leading choice, offering declarative configuration, REST API management, and seamless integration with distributed configuration stores like etcd or Consul. For this guide, we’ll focus on etcd due to its widespread adoption and ease of deployment on Linode.
Our setup will involve at least three PostgreSQL nodes for quorum and high availability. One node will act as the primary, with others configured as replicas. Patroni will monitor the health of the primary and orchestrate the promotion of a replica in case of failure.
Prerequisites and Initial Setup
Before diving into Patroni configuration, ensure you have the following:
- Three Linode instances (or more for production) provisioned with a recent Linux distribution (e.g., Ubuntu 22.04 LTS).
- SSH access to all instances.
- Basic understanding of PostgreSQL installation and administration.
- A distributed configuration store (etcd) deployed and accessible by all PostgreSQL nodes. For simplicity, we’ll assume etcd is running on dedicated nodes or co-located on one of the PostgreSQL nodes (not recommended for production).
First, install PostgreSQL and Patroni on each PostgreSQL node. We’ll use `apt` for Ubuntu.
Node 1 (Primary Candidate), Node 2 (Replica Candidate), Node 3 (Replica Candidate)
On each PostgreSQL node, execute the following commands:
sudo apt update sudo apt install -y postgresql postgresql-contrib python3-pip sudo pip3 install --upgrade pip sudo pip3 install patroni[etcd] python-etcd
Next, configure PostgreSQL to allow replication and remote connections. Edit postgresql.conf (typically located at /etc/postgresql/[version]/main/postgresql.conf) and pg_hba.conf (typically at /etc/postgresql/[version]/main/pg_hba.conf).
postgresql.conf modifications
listen_addresses = '*' wal_level = replica max_wal_senders = 10 max_replication_slots = 10 hot_standby = on
Ensure these settings are applied to all PostgreSQL nodes. Restart PostgreSQL to apply these changes:
sudo systemctl restart postgresql
pg_hba.conf modifications
Add entries to allow replication connections from other nodes and connections from your application servers. Replace [your_app_subnet] with your application server’s subnet or specific IP addresses.
# TYPE DATABASE USER ADDRESS METHOD host replication replicator [node_ip_1]/32 md5 host replication replicator [node_ip_2]/32 md5 host replication replicator [node_ip_3]/32 md5 host all app_user [your_app_subnet] md5
You’ll need to create the replicator user and the app_user. This is typically done on the initial primary node before Patroni takes over. For now, we’ll assume these users will be managed by Patroni or created manually later.
Configuring Patroni
Patroni’s configuration is typically provided via a YAML file. Create a patroni.yml file on each PostgreSQL node. The configuration needs to point to your etcd cluster and define the PostgreSQL parameters.
patroni.yml (Example for Node 1)
# Global configuration
scope: my_pg_cluster
namespace: /service/ # etcd namespace for this cluster
name: patroni-node-1 # Unique name for this Patroni instance
# PostgreSQL configuration
postgresql:
listen: 0.0.0.0:5432
connect_address: [node_1_ip]:5432
data_dir: /var/lib/postgresql/[version]/main/data
bin_dir: /usr/lib/postgresql/[version]/bin
pg_hba:
- host replication replicator 0.0.0.0/0 md5
- host all app_user 0.0.0.0/0 md5
replication:
username: replicator
password: your_replication_password
slots:
- name: patroni_slot
database: postgres
parameters:
wal_level: replica
max_wal_senders: 10
max_replication_slots: 10
hot_standby: on
# etcd configuration
etcd:
hosts:
- [etcd_node_1_ip]:2379
- [etcd_node_2_ip]:2379
- [etcd_node_3_ip]:2379
protocol: http # or https if using TLS
# Replication configuration
replication:
synchronous_mode: false # Set to true for synchronous replication if needed
synchronous_node_count: 1 # Number of synchronous replicas required
# Tags for identifying nodes
tags:
nofailover: false
clonefrom: false
# REST API configuration
restapi:
listen: 0.0.0.0:8008
connect_address: [node_1_ip]:8008
# Logging configuration
log:
level: INFO
dir: /var/log/patroni
file: patroni.log
Important Notes:
- Replace
[node_1_ip],[etcd_node_x_ip], and[version]with your actual values. - The
scopeshould be identical across all nodes in the cluster. - The
namemust be unique for each Patroni instance. - Ensure the
data_dirandbin_dirpaths are correct for your PostgreSQL installation. - The
pg_hbaentries in Patroni’s config will override or supplement the file-based ones. It’s generally recommended to managepg_hbavia Patroni for consistency. - Set a strong password for the
replicatoruser. - The
namespacein etcd is crucial for Patroni to discover other cluster members. - For production, consider using TLS for etcd communication and securing the REST API.
Create the /var/log/patroni directory and set appropriate permissions:
sudo mkdir -p /var/log/patroni sudo chown -R postgres:postgres /var/log/patroni
Copy this patroni.yml to all PostgreSQL nodes, adjusting the name and connect_address for each node accordingly.
Initializing the PostgreSQL Cluster with Patroni
On one of the nodes (this will be the initial primary), you need to initialize the PostgreSQL cluster managed by Patroni. Ensure PostgreSQL service is stopped before proceeding.
sudo systemctl stop postgresql
Now, start Patroni. It will detect that no PostgreSQL cluster exists and will initialize it.
sudo patroni /etc/patroni/patroni.yml
Patroni will log its actions. You should see messages indicating it’s initializing the cluster, creating the replication user, and starting PostgreSQL. Once it reports that it’s running as a primary, you can start Patroni as a systemd service on the other nodes.
Starting Patroni as a Systemd Service
Create a systemd service file for Patroni (e.g., /etc/systemd/system/patroni.service):
[Unit] Description=Patroni - Highly Available PostgreSQL solution After=network.target [Service] User=postgres Group=postgres ExecStart=/usr/local/bin/patroni /etc/patroni/patroni.yml Restart=on-failure KillMode=process TimeoutSec=30 [Install] WantedBy=multi-user.target
Reload systemd, enable, and start the service:
sudo systemctl daemon-reload sudo systemctl enable patroni sudo systemctl start patroni
On the other nodes, Patroni will detect the existing primary in etcd and configure themselves as replicas, cloning data if necessary. Monitor the logs on all nodes to ensure they join the cluster correctly.
Testing Automatic Failover
To test the failover mechanism, simulate a failure of the current primary node. The simplest way is to stop the Patroni service on the primary.
Simulating Primary Failure
On the current primary node:
sudo systemctl stop patroni
Within seconds, Patroni on the other nodes will detect the primary’s absence. One of the replicas will be elected as the new primary. You can monitor this process via the Patroni logs on each node.
To verify the new primary, you can query the Patroni REST API on any node:
curl http://[any_node_ip]:8008/primary
This will return JSON output indicating the current primary’s API endpoint. You can also query /cluster for the full cluster status.
Integrating Laravel with the HA PostgreSQL Cluster
Your Laravel application needs to connect to the PostgreSQL cluster in a way that it always targets the current primary. The most straightforward approach is to use a load balancer or a DNS-based solution that points to the active primary.
Option 1: Using a Load Balancer (e.g., HAProxy)
Deploy HAProxy on a separate instance or co-locate it on one of your application servers. HAProxy can monitor the Patroni REST API to determine the current primary.
HAProxy Configuration
global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
stats timeout 30s
user haproxy
group haproxy
daemon
defaults
log global
mode tcp
option tcplog
option dontlognull
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
errorfile 400 /etc/haproxy/errors/400.http
errorfile 403 /etc/haproxy/errors/403.http
errorfile 408 /etc/haproxy/errors/408.http
errorfile 500 /etc/haproxy/errors/500.http
errorfile 502 /etc/haproxy/errors/502.http
errorfile 503 /etc/haproxy/errors/503.http
errorfile 504 /etc/haproxy/errors/504.http
listen postgres_cluster
bind *:5432
mode tcp
option httpchk GET /primary
http-check expect status 200
balance roundrobin
# This is where HAProxy will query Patroni for the primary
# It will then forward traffic to the IP returned by the API
# This requires a custom script or a more advanced HAProxy setup
# For simplicity, we'll use a static list and rely on Patroni's health checks
# A more robust solution involves a script that polls Patroni API and updates backend list.
server patroni-node-1 [node_1_ip]:5432 check port 8008 inter 2s fall 3 rise 2
server patroni-node-2 [node_2_ip]:5432 check port 8008 inter 2s fall 3 rise 2
server patroni-node-3 [node_3_ip]:5432 check port 8008 inter 2s fall 3 rise 2
listen stats
bind *:8404
mode http
stats enable
stats uri /stats
stats refresh 10s
Note on HAProxy Configuration: The above HAProxy configuration is a simplified example. For true dynamic failover detection, HAProxy needs to actively query the Patroni API (e.g., /primary endpoint) to determine the current primary and dynamically update its backend list. This often involves a custom script that runs periodically, polls the Patroni API, and reloads HAProxy configuration. Alternatively, tools like `patroni-vip` or `keepalived` can manage a floating IP that always points to the primary.
Option 2: DNS-Based Failover
Use a DNS provider that supports health checks and dynamic record updates. A script on your infrastructure would monitor the Patroni cluster and update a DNS A record to point to the current primary’s IP address. This is generally more complex to implement reliably.
Laravel Database Configuration
Configure your Laravel config/database.php to point to the load balancer’s IP address or the DNS name.
<?php
return [
// ... other configurations
'connections' => [
'pgsql' => [
'driver' => 'pgsql',
'url' => env('DATABASE_URL'),
'host' => env('DB_HOST', '127.0.0.1'), // This should be your HAProxy IP or DNS name
'port' => env('DB_PORT', '5432'),
'database' => env('DB_DATABASE', 'forge'),
'username' => env('DB_USERNAME', 'forge'),
'password' => env('DB_PASSWORD', ''),
'charset' => 'utf8',
'prefix' => '',
'prefix_indexes' => true,
'schema' => 'public',
'sslmode' => 'prefer',
],
// ... other connections
],
// ... other configurations
];
In your .env file:
DB_HOST=192.168.1.100 # IP of your HAProxy instance DB_PORT=5432 DB_DATABASE=your_app_db DB_USERNAME=app_user DB_PASSWORD=your_app_password
Advanced Considerations and Best Practices
Replication Slots: Ensure replication slots are properly configured and monitored. A misbehaving slot can cause the primary to retain WAL segments indefinitely, leading to disk full errors. Patroni manages slots, but it’s good practice to monitor their status.
Monitoring: Implement comprehensive monitoring for your PostgreSQL cluster, etcd, and HAProxy. Key metrics include replication lag, etcd health, Patroni API responsiveness, and HAProxy backend status.
Backups: Automated failover is not a substitute for a robust backup strategy. Use tools like pg_dump or pg_basebackup with a reliable backup solution (e.g., Linode Object Storage, S3-compatible storage).
Security: Secure etcd communication with TLS. Encrypt PostgreSQL replication traffic if nodes are on untrusted networks. Use strong passwords and consider IP whitelisting.
Testing: Regularly test your failover procedures. Simulate various failure scenarios (node crash, network partition, etcd failure) to ensure your system behaves as expected.
Read Replicas: For read-heavy workloads, you can configure additional read-only replicas. These can be managed by Patroni or as standard PostgreSQL replicas. Direct read traffic to these replicas to offload the primary.
Configuration Management: Use tools like Ansible, Chef, or Puppet to automate the deployment and configuration of PostgreSQL, Patroni, and etcd across your nodes. This ensures consistency and simplifies management.
By implementing Patroni for PostgreSQL high availability and integrating it with your Laravel application via a load balancer or intelligent DNS, you can achieve a resilient and automatically failing-over database infrastructure on Linode.