Disaster Recovery 101: Architecting Auto-Failovers for PostgreSQL and WordPress Deployments on Linode
Establishing a High-Availability PostgreSQL Cluster with Patroni
Automated failover for PostgreSQL requires a cluster manager. Patroni is a widely used choice: it orchestrates PostgreSQL instances into a highly available cluster and relies on a distributed configuration store such as etcd, Consul, or ZooKeeper for leader election and shared configuration. For this guide, we’ll focus on etcd due to its widespread adoption and ease of deployment.
Our setup will involve at least three nodes for etcd to ensure quorum and at least two nodes for PostgreSQL to facilitate failover. We’ll deploy Patroni on each PostgreSQL node, configuring it to communicate with the etcd cluster.
Deploying etcd Cluster
A minimal etcd cluster for high availability requires an odd number of members, typically three. We’ll assume three Linode instances (e.g., `etcd-01`, `etcd-02`, `etcd-03`) with static IP addresses. Ensure these IPs are reachable from your PostgreSQL nodes.
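The preference for an odd member count follows from the majority rule used by etcd's Raft consensus: a cluster of n members needs floor(n/2) + 1 of them to agree. A minimal sketch of that arithmetic (the function names are illustrative and not part of etcd):

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} members: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Four members tolerate no more failures than three, which is why an even-sized cluster adds cost without adding resilience.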
On each etcd node, install etcd (version 3.x recommended). The exact installation method depends on your OS. For Debian/Ubuntu:
sudo apt update
sudo apt install etcd -y
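Next, configure etcd. Create a configuration file, e.g., `/etc/etcd/etcd.conf.yml`. The following is a simplified example for a three-node cluster; replace the placeholders with actual node names and IPs. Depending on how etcd was packaged, the service may not read this file by default, in which case it must be started with `--config-file` pointing at it.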
name: <node_name> # e.g., etcd-01
data-dir: /var/lib/etcd
listen-client-urls: http://0.0.0.0:2379
advertise-client-urls: http://<node_ip>:2379
listen-peer-urls: http://0.0.0.0:2380
initial-advertise-peer-urls: http://<node_ip>:2380
initial-cluster: etcd-01=http://<etcd-01_ip>:2380,etcd-02=http://<etcd-02_ip>:2380,etcd-03=http://<etcd-03_ip>:2380
initial-cluster-token: my-etcd-cluster
initial-cluster-state: new
proxy: false
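The `initial-cluster` value must be identical on all three nodes, so it can help to generate it from one list of members rather than typing it three times. A small sketch, assuming a hypothetical helper name and example addresses from the documentation range:

```python
def build_initial_cluster(members: dict, peer_port: int = 2380) -> str:
    """Return the value for etcd's `initial-cluster` setting.

    `members` maps each node name to its IP address.
    """
    if not members:
        raise ValueError("at least one etcd member is required")
    return ",".join(
        f"{name}=http://{ip}:{peer_port}" for name, ip in members.items()
    )


if __name__ == "__main__":
    # Hypothetical addresses; substitute the IPs of your own instances.
    nodes = {
        "etcd-01": "192.0.2.11",
        "etcd-02": "192.0.2.12",
        "etcd-03": "192.0.2.13",
    }
    print(build_initial_cluster(nodes))
```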
Start and enable the etcd service:
sudo systemctl start etcd
sudo systemctl enable etcd
Verify the cluster health. On any etcd node:
ETCDCTL_API=3 etcdctl member list
ETCDCTL_API=3 etcdctl endpoint health --cluster
Deploying PostgreSQL with Patroni
We’ll deploy PostgreSQL on two or more Linode instances (e.g., `pg-node-01`, `pg-node-02`). Install PostgreSQL (version 12+ recommended) and Patroni on each node.
For Debian/Ubuntu:
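sudo apt update
sudo apt install postgresql postgresql-contrib python3-pip -y
# Patroni does not pull in a PostgreSQL driver by itself, so install psycopg2 as well
pip3 install 'patroni[etcd]' python-etcd psycopg2-binary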
Configure Patroni. Create a configuration file, e.g., `/etc/patroni/patroni.yml`. This configuration is critical and defines how Patroni manages the PostgreSQL cluster.
scope: my-postgres-cluster # Unique name for your cluster
namespace: /service/ # etcd namespace for cluster configuration
name: <node_name> # Unique per node, e.g., pg-node-01
restapi:
  listen: 0.0.0.0:8008
  connect_address: <node_ip>:8008 # IP of this specific node
etcd:
  hosts: <etcd-01_ip>:2379,<etcd-02_ip>:2379,<etcd-03_ip>:2379
  protocol: http
bootstrap:
  # Cluster-wide settings, written to etcd when the cluster is first initialized
  dcs:
    synchronous_mode: false # Asynchronous replication; set to true for synchronous
    postgresql:
      use_slots: true # Recommended for reliable replication
      parameters:
        max_connections: 100
        wal_level: replica
        hot_standby: "on"
        max_wal_senders: 10
        max_replication_slots: 10
postgresql:
  listen: 0.0.0.0:5432
  connect_address: <node_ip>:5432 # IP of this specific node
  data_dir: /var/lib/postgresql/14/main # Adjust version and path as needed
  bin_dir: /usr/lib/postgresql/14/bin # Adjust version and path as needed
  pg_hba:
    - host replication replicator <pg-node-01_ip>/32 md5
    - host replication replicator <pg-node-02_ip>/32 md5
    # Add other necessary replication/application users and hosts
  authentication:
    replication:
      username: replicator
      password: <replication_password>
      sslmode: prefer # Use require or stricter in production
  parameters:
    shared_buffers: 128MB
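Every PostgreSQL node needs a replication entry in `pg_hba` for each of its peers, which becomes tedious as nodes are added. The following sketch generates those lines from a list of addresses; the helper name and the example IPs are assumptions for illustration, not something Patroni provides:

```python
import ipaddress


def replication_hba_entries(node_ips, user="replicator", auth_method="md5"):
    """Build one `host replication ...` pg_hba line per cluster node."""
    entries = []
    for ip in node_ips:
        address = ipaddress.ip_address(ip)  # raises ValueError on a malformed IP
        prefix = 32 if address.version == 4 else 128  # single-host mask
        entries.append(f"host replication {user} {address}/{prefix} {auth_method}")
    return entries


if __name__ == "__main__":
    # Hypothetical node addresses; replace with your own.
    for line in replication_hba_entries(["192.0.2.21", "192.0.2.22"]):
        print(f"- {line}")
```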
Ensure the PostgreSQL data directory exists and has correct permissions. Also, create the replication user and set the password in PostgreSQL *before* starting Patroni for the first time on any node. This can be done by manually starting PostgreSQL, connecting as the postgres user, and running:
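CREATE USER replicator WITH REPLICATION PASSWORD '<replication_password>';
ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET max_wal_senders = 10;
ALTER SYSTEM SET max_replication_slots = 10;
ALTER SYSTEM SET hot_standby = 'on';
-- These settings are read only at server start, so pg_reload_conf() does not apply them.
-- Stop PostgreSQL afterwards; they take effect the next time the server starts.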
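Now, start and enable the Patroni service on each PostgreSQL node. A pip installation does not ship a systemd unit, so create one that runs `patroni /etc/patroni/patroni.yml` first. The first node to start initializes the PostgreSQL cluster and becomes the primary; the remaining nodes join as replicas.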
sudo systemctl start patroni
sudo systemctl enable patroni
Monitor Patroni’s logs (`journalctl -u patroni -f`) to ensure it successfully initializes the cluster and establishes replication.
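Besides the logs, Patroni's REST API offers a `/cluster` endpoint that lists the members and their roles. The sketch below reads that endpoint and reports the leader and the state of the other members. The field names (`members`, `name`, `role`, `state`) reflect the response format as commonly documented, and the sample payload is illustrative, so verify both against your Patroni version:

```python
import json
from urllib.request import urlopen


def fetch_cluster(api_url, timeout=3.0):
    """GET <api_url>/cluster from a Patroni node and decode the JSON body."""
    with urlopen(f"{api_url.rstrip('/')}/cluster", timeout=timeout) as response:
        return json.load(response)


def summarize_cluster(cluster):
    """Split the member list into the leader and everything else."""
    members = cluster.get("members", [])
    leaders = [m["name"] for m in members if m.get("role") == "leader"]
    others = {m["name"]: m.get("state") for m in members if m.get("role") != "leader"}
    # Zero or several leaders both mean the cluster is not in a clean state.
    return {"leader": leaders[0] if len(leaders) == 1 else None, "replicas": others}


if __name__ == "__main__":
    # Illustrative payload; in practice call fetch_cluster("http://<node_ip>:8008").
    sample = {
        "members": [
            {"name": "pg-node-01", "role": "leader", "state": "running"},
            {"name": "pg-node-02", "role": "replica", "state": "running"},
        ]
    }
    print(summarize_cluster(sample))
```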
Configuring WordPress for High Availability
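The WordPress PHP tier holds little state of its own, but it depends on a database and on uploaded files, both of which need a highly available backend. Note that WordPress targets MySQL/MariaDB out of the box, so running it against PostgreSQL assumes a compatibility layer such as the PG4WP plugin. For high availability, we need to address: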
- Database Connection: Directing WordPress to the active PostgreSQL primary.
- File Storage: Ensuring consistent access to `wp-content/uploads`.
Database Connection Management
Patroni exposes a REST API on each node that reports that node’s role; for example, `GET /primary` answers 200 only on the current primary and 503 elsewhere. We could use this to rewrite WordPress’s `wp-config.php` on every failover, but it is more practical to put a load balancer or proxy in front of the cluster and let it use the API for health checks.
A common approach is to run HAProxy with a health check against Patroni’s API, so that it forwards connections only to the node that reports itself as primary. HAProxy then acts as the single database endpoint for WordPress.
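The decision such a health check makes can be sketched in a few lines: probe `/primary` on every node and route only to the one that answers 200. This mirrors the check logic for illustration and is not how HAProxy is implemented; the function names are hypothetical, and older Patroni releases expose the endpoint as `/master` instead of `/primary`:

```python
from typing import Optional
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def probe_primary(api_url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of GET <api_url>/primary, or 0 if unreachable."""
    try:
        with urlopen(f"{api_url.rstrip('/')}/primary", timeout=timeout) as response:
            return response.status
    except HTTPError as error:  # non-2xx answers, e.g. 503 from a replica
        return error.code
    except (URLError, OSError):  # connection refused, timeout, DNS failure
        return 0


def select_primary(status_by_node: dict) -> Optional[str]:
    """Return the only node that answered 200, or None if that is ambiguous."""
    up = [node for node, status in status_by_node.items() if status == 200]
    return up[0] if len(up) == 1 else None


if __name__ == "__main__":
    # Illustrative statuses; in practice build this dict with probe_primary().
    print(select_primary({"pg-node-01": 200, "pg-node-02": 503}))
```

Returning `None` when no node or more than one node claims to be primary is a deliberately conservative choice: sending writes nowhere is safer than sending them to the wrong node.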
Let’s configure HAProxy to point to the PostgreSQL primary. Install HAProxy on a dedicated node or one of your web servers.
sudo apt update
sudo apt install haproxy -y
Edit the HAProxy configuration (`/etc/haproxy/haproxy.cfg`). We’ll add a PostgreSQL backend whose health check queries Patroni’s REST API, so that only the current primary is marked UP and receives connections.
frontend http_frontend
    bind *:80
    default_backend webservers

backend webservers
    balance roundrobin
    option httpchk GET /
    # Add your WordPress web server IPs here
    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check

# PostgreSQL HAProxy Configuration
frontend pg_frontend
    bind *:5433 # Use a different port for HAProxy's PostgreSQL endpoint
    mode tcp
    default_backend pg_backend

backend pg_backend
    mode tcp
    balance roundrobin
    # List every PostgreSQL node and let the health check decide which one gets traffic.
    # `check port 8008` sends the check to Patroni's REST API instead of PostgreSQL.
    # GET /primary returns 200 only on the current primary and 503 on replicas,
    # so only the primary is marked UP.
    option httpchk GET /primary
    http-check expect status 200
    # Drop existing connections to a node as soon as it stops being the primary.
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg1 <pg-node-01_ip>:5432 check port 8008
    server pg2 <pg-node-02_ip>:5432 check port 8008
    # Alternatives: an agent that watches Patroni and rewrites this backend before
    # reloading HAProxy, or a floating virtual IP that follows the primary
    # (for example, one managed by a tool such as vip-manager).