Disaster Recovery 101: Architecting Auto-Failovers for PostgreSQL and WooCommerce Deployments on DigitalOcean
Establishing a Highly Available PostgreSQL Cluster with Patroni
For a robust WooCommerce deployment, a highly available PostgreSQL database is non-negotiable. We’ll architect an automated failover solution using Patroni, a template for PostgreSQL HA. This setup leverages etcd for distributed configuration and leader election, ensuring a single, active primary at any given time.
Our target is a three-node PostgreSQL cluster on DigitalOcean Droplets. One node will be the primary, and the other two will be replicas. Patroni will manage the state and orchestrate failovers.
Prerequisites
- Three DigitalOcean Droplets (e.g., 4 vCPU, 8GB RAM) for PostgreSQL nodes.
- One DigitalOcean Droplet for etcd cluster (minimum 2 vCPU, 4GB RAM). A 3-node etcd cluster is recommended for production HA. For simplicity in this example, we’ll use a single etcd instance, but this is NOT production-ready.
- SSH access to all Droplets.
- Basic understanding of Linux system administration and PostgreSQL.
Setting up the etcd Cluster
First, we need a reliable etcd cluster for Patroni to coordinate. Install etcd on your dedicated etcd Droplet(s).
Installation (Ubuntu/Debian)
Download the latest etcd release. Replace v3.5.9 with the current version.
ETCD_VER=v3.5.9
DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf etcd-${ETCD_VER}-linux-amd64.tar.gz
sudo mv etcd-${ETCD_VER}-linux-amd64/etcd /usr/local/bin/
sudo mv etcd-${ETCD_VER}-linux-amd64/etcdctl /usr/local/bin/
rm -rf etcd-${ETCD_VER}-linux-amd64*
Configuration
Create an etcd systemd service file. Adjust ETCD_NAME and INITIAL_CLUSTER for a multi-node setup. For a single node, INITIAL_CLUSTER can be omitted or point to itself.
sudo nano /etc/systemd/system/etcd.service
[Unit] Description=etcd key-value store Documentation=man:etcd(1) [Service] User=etcd ExecStart=/usr/local/bin/etcd \ --name etcd0 \ --data-dir /var/lib/etcd \ --listen-client-urls http://0.0.0.0:2379 \ --advertise-client-urls http://<ETCD_NODE_IP>:2379 \ --listen-peer-urls http://0.0.0.0:2380 \ --initial-advertise-peer-urls http://<ETCD_NODE_IP>:2380 \ --initial-cluster etcd0=http://<ETCD_NODE_IP>:2380 \ --initial-cluster-token etcd-cluster-1 \ --initial-cluster-state new [Install] WantedBy=multi-user.target
Replace <ETCD_NODE_IP> with the actual IP address of your etcd Droplet. Create the etcd data directory and start the service.
sudo mkdir -p /var/lib/etcd sudo chown etcd:etcd /var/lib/etcd sudo systemctl daemon-reload sudo systemctl enable etcd sudo systemctl start etcd
Installing and Configuring Patroni on PostgreSQL Nodes
Patroni needs to be installed on each PostgreSQL Droplet. We’ll use pip for installation.
Installation (Ubuntu/Debian)
sudo apt update sudo apt install -y python3-pip python3-dev build-essential libpq-dev sudo pip3 install --upgrade pip sudo pip3 install "patroni[etcd]" psycopg2-binary
PostgreSQL Setup
Install PostgreSQL on each node. Ensure the PostgreSQL service is stopped and disabled, as Patroni will manage it.
sudo apt install -y postgresql postgresql-contrib sudo systemctl stop postgresql sudo systemctl disable postgresql
Patroni requires a dedicated PostgreSQL user for replication and management. Create this user and grant necessary privileges.
-- Connect to PostgreSQL as the default postgres user sudo -u postgres psql -- Inside psql prompt: CREATE USER replicator WITH REPLICATION PASSWORD 'your_replication_password'; ALTER USER replicator CREATEDB; \q
Patroni Configuration File
Create the Patroni configuration file on each PostgreSQL node. This file defines the connection to etcd, PostgreSQL settings, and replication parameters.
sudo mkdir -p /etc/patroni sudo nano /etc/patroni/patroni.yml
# patroni.yml
scope: my_pg_cluster # Unique name for your cluster
namespace: /service/ # etcd namespace for this cluster
restapi:
listen: 0.0.0.0:8008
connect_address: <NODE_IP>:8008 # IP of this node
etcd:
host: <ETCD_NODE_IP>:2379
protocol: http
postgresql:
listen: 0.0.0.0:5432
connect_address: <NODE_IP>:5432
data_dir: /var/lib/postgresql/14/main # Adjust version if needed
bin_dir: /usr/lib/postgresql/14/bin # Adjust version if needed
pg_hba:
- host replication replicator 0.0.0.0/0 md5
- host all all 0.0.0.0/0 md5
replication:
username: replicator
password: 'your_replication_password'
parameters:
max_connections: 100
shared_buffers: 1GB
effective_cache_size: 3GB
maintenance_work_mem: 256MB
checkpoint_completion_target: 0.9
wal_buffers: 16MB
default_statistics_target: 100
random_page_cost: 1.1
effective_io_concurrency: 200
work_mem: 16MB
min_wal_size: 1GB
max_wal_size: 4GB
logging_collector: on
log_directory: /var/log/postgresql
log_filename: postgresql-%a.log
log_min_duration_statement: 250ms
log_checkpoints: on
log_connections: on
log_disconnections: on
log_lock_waits: on
log_temp_files: 0
log_autovacuum_min_duration: 0
log_error_verbosity: default
log_line_prefix: '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
# For initial bootstrap, one node needs to be configured to initialize PG
# This is typically done by starting Patroni on one node first.
# The 'initdb' section is only needed on the FIRST node that starts.
# initdb:
# - encoding: UTF8
# - locale: en_US.UTF-8
# - data-checksums
# Tags can be used to influence leader selection.
# The node with the highest 'nofailover' tag value will be preferred as leader.
# tags:
# nofailover: 1
# clonefrom: 1
Replace <NODE_IP> with the IP address of the specific PostgreSQL Droplet and <ETCD_NODE_IP> with your etcd Droplet’s IP. Ensure the data_dir and bin_dir paths match your PostgreSQL installation. For the first node that will initialize the cluster, uncomment and configure the initdb section. For subsequent nodes, ensure initdb is commented out.
Starting Patroni and PostgreSQL
Create a systemd service file for Patroni on each PostgreSQL node.
sudo nano /etc/systemd/system/patroni.service
[Unit] Description=Patroni PostgreSQL HA After=network.target [Service] User=postgres Group=postgres ExecStart=/usr/local/bin/patroni /etc/patroni/patroni.yml Restart=on-failure RestartSec=5s [Install] WantedBy=multi-user.target
Now, start Patroni and enable it to run on boot. Start it on one node first, then the others.
sudo systemctl daemon-reload sudo systemctl enable patroni sudo systemctl start patroni
After starting Patroni on the first node (with initdb uncommented), it will initialize PostgreSQL and register itself in etcd. Then, start Patroni on the other nodes. They will detect the existing cluster, connect as replicas, and Patroni will manage their states.
Verifying the Cluster and Failover
You can check the status of the cluster via Patroni’s REST API or by querying etcd.
Checking Cluster Status
Use patronictl (installed with Patroni) to inspect the cluster.
patronictl -c /etc/patroni/patroni.yml list
This command should show your cluster name, the number of nodes, the current leader, and the status of each member.
Simulating a Failover
To test automatic failover, stop the Patroni service on the current primary node.
# On the current primary node: sudo systemctl stop patroni
Within seconds, Patroni on the other nodes will detect the primary’s absence, elect a new leader from the available replicas, and promote it. You can monitor this process using patronictl list on any node.
Integrating with WooCommerce
WooCommerce applications typically connect to a single database host. To ensure seamless failover, we need a mechanism to abstract the database endpoint.
Using a Load Balancer/Proxy
The most robust approach is to use a load balancer or a database proxy that can dynamically update its backend based on Patroni’s leader information. HAProxy is a strong candidate.
HAProxy Configuration for PostgreSQL
Install HAProxy on a separate Droplet or one of the existing nodes (though a dedicated node is recommended for production). Configure HAProxy to point to the current PostgreSQL primary.
sudo apt install -y haproxy
Edit the HAProxy configuration file:
sudo nano /etc/haproxy/haproxy.cfg
frontend pgsql_frontend
bind *:5432
mode tcp
default_backend pgsql_backend
backend pgsql_backend
mode tcp
balance roundrobin
option httpchk GET /primary
http-check expect status 200
# This is where the magic happens: we'll use a script to update these servers
server pg_primary <CURRENT_PRIMARY_IP>:5432 check port 8008 inter 2s
server pg_replica1 <REPLICA1_IP>:5432
server pg_replica2 <REPLICA2_IP>:5432
The key here is the option httpchk GET /primary and check port 8008. HAProxy will query the Patroni REST API on each PostgreSQL node. The node that responds with HTTP 200 for the /primary endpoint is the current leader. HAProxy will automatically mark it as available and route traffic to it.
You’ll need to restart HAProxy after changes:
sudo systemctl restart haproxy
Your WooCommerce application should now connect to the HAProxy IP address on port 5432. When a failover occurs, HAProxy will automatically redirect traffic to the new primary.
Advanced Considerations and Best Practices
- etcd HA: For production, deploy a 3 or 5-node etcd cluster for etcd’s own high availability.
- Monitoring: Implement comprehensive monitoring for PostgreSQL, Patroni, and etcd. Prometheus and Grafana are excellent choices.
- Backups: Automate regular PostgreSQL backups (e.g., using
pg_basebackupor tools like Barman) and store them off-site. - Network Segmentation: Use DigitalOcean’s VPC and firewall rules to restrict access to PostgreSQL and etcd ports only from necessary sources.
- Connection Pooling: For high-traffic WooCommerce sites, consider a connection pooler like PgBouncer placed behind HAProxy to manage database connections efficiently.
- Read Replicas: For read-heavy workloads, you can configure additional PostgreSQL replicas specifically for read traffic, separate from the HA failover pool.
- Configuration Management: Use tools like Ansible or Terraform to automate the deployment and configuration of your PostgreSQL and etcd clusters.
- Testing: Regularly test your failover procedures. Simulate various failure scenarios (node failure, network partition, etcd failure) to ensure your system behaves as expected.
By implementing this Patroni-based auto-failover solution for PostgreSQL and integrating it with HAProxy, you significantly enhance the resilience and availability of your WooCommerce deployment on DigitalOcean, ensuring minimal downtime during database failures.