Disaster Recovery 101: Architecting Auto-Failovers for PostgreSQL and WooCommerce Deployments on OVH

Establishing PostgreSQL High Availability with Patroni and etcd

For mission-critical WooCommerce deployments, PostgreSQL’s availability is paramount. A robust disaster recovery strategy hinges on automated failover. We’ll leverage Patroni, a template for PostgreSQL HA, in conjunction with etcd for distributed configuration and leader election. This setup ensures that if the primary PostgreSQL instance fails, a replica is automatically promoted with minimal downtime.

Our architecture will consist of at least three PostgreSQL nodes, each running Patroni. etcd will also be deployed as a cluster (typically 3 or 5 nodes) for resilience. Patroni monitors the health of PostgreSQL instances and uses etcd to manage cluster state, including identifying the current primary and orchestrating failover.

Prerequisites and Setup

Ensure you have a working etcd cluster. For this example, we assume etcd is accessible at etcd.yourdomain.com:2379. Each PostgreSQL node will require Patroni installed. The installation method depends on your OS; for Debian/Ubuntu, apt install patroni postgresql-client is common. Ensure PostgreSQL is configured to allow replication and superuser access for Patroni.

Patroni Configuration (`patroni.yml`)

Each Patroni instance will share a similar configuration file. Key parameters include connection details for PostgreSQL, etcd, and replication settings.

# patroni.yml on each PostgreSQL node
scope: my_woocommerce_cluster
namespace: /service/
restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.1.10[node_ip]:5432 # Replace with actual node IP

etcd:
  host: etcd.yourdomain.com:2379
  protocol: http # or https if using TLS

# PostgreSQL configuration
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.1.10[node_ip]:5432 # Replace with actual node IP
  data_dir: /var/lib/postgresql/14/main # Adjust path as per your PostgreSQL installation
  bin_dir: /usr/lib/postgresql/14/bin # Adjust path as per your PostgreSQL installation
  pg_hba:
    - host    replication all       replicator_user/32_bit_mask(e.g., 0/0) trust
    - host    replication all       repl_user_ip_address/32_bit_mask(e.g., 192.168.1.0/24) trust
    - host    all       all       0.0.0.0/0               md5 # For application access
  replication:
    username: replicator_user
    password: your_replication_password
    ssl: false # Set to true if using SSL
  authentication:
    superuser:
      username: postgres
      password: your_postgres_password
    replication:
      username: replicator_user
      password: your_replication_password

# Tags for node identification
tags:
  nofailover: false
  clonefrom: false

Important Notes:

Replace my_woocommerce_cluster with a unique name for your cluster.
namespace in etcd should be consistent across all Patroni instances.
connect_address must be the IP address of the specific node Patroni is running on.
Ensure the replicator_user exists in PostgreSQL and has replication privileges.
The pg_hba.conf entries are crucial for allowing replication and application connections. Adjust IP ranges and user privileges as needed for your network security policy.
data_dir and bin_dir must point to your PostgreSQL installation’s directories.

Starting Patroni and PostgreSQL

Once patroni.yml is configured on each node, start Patroni. Patroni will then manage the lifecycle of the PostgreSQL instances. The first node to start will typically initialize itself as the primary.

# On each PostgreSQL node
sudo systemctl start patroni
sudo systemctl enable patroni

You can verify the cluster status using Patroni’s REST API or by inspecting etcd keys.

# Example: Check cluster status from any node
curl http://localhost:8008/cluster

The output should indicate the current primary and any replicas. If a node fails to start, check Patroni logs (e.g., journalctl -u patroni -f) and PostgreSQL logs for errors.

Configuring WooCommerce for Auto-Failover

WooCommerce, by default, connects directly to a single PostgreSQL instance. To achieve auto-failover, we need a mechanism that abstracts the database endpoint and redirects traffic to the current primary. This is typically achieved using a load balancer or a virtual IP (VIP) managed by a cluster resource manager.

Option 1: Using HAProxy as a Database Proxy

HAProxy can be configured to monitor the Patroni REST API and direct traffic to the active PostgreSQL primary. This provides a single, stable endpoint for WooCommerce.

HAProxy Configuration (`haproxy.cfg`)

# haproxy.cfg
global
    log /dev/log local0
    log /dev/log local1 notice
    maxconn 4096
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode tcp
    option tcplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000

listen patroni_api
    bind *:8009
    mode http
    # This is a simplified check. A more robust check would query /primary
    # and verify the leader's IP.
    option httpchk GET /primary
    http-check expect status 200

frontend pgsql_frontend
    bind *:5433 # The port WooCommerce will connect to
    mode tcp
    default_backend pgsql_backend

backend pgsql_backend
    mode tcp
    balance roundrobin
    # Use Patroni's API to discover the primary
    # This requires a custom script or a more advanced HAProxy configuration
    # For simplicity, we'll list potential primaries and rely on health checks.
    # A more dynamic approach is recommended for production.
    server pgsql1 192.168.1.101:5432 check port 8008 # Node 1 Patroni API port
    server pgsql2 192.168.1.102:5432 check port 8008 # Node 2 Patroni API port
    server pgsql3 192.168.1.103:5432 check port 8008 # Node 3 Patroni API port

    # Health check for PostgreSQL itself
    option pgsql-check user postgres
    # If using a dedicated health check user:
    # user health_check_user password health_check_password

# A more advanced approach using a script to query Patroni API:
# In HAProxy 2.0+, you can use 'http-request use-tag' with external scripts.
# For older versions, a dedicated service like 'patronictl' or a custom agent
# would be needed to update HAProxy's backend server list dynamically.

Dynamic Backend Configuration (Advanced):

The static backend configuration above is not ideal for true auto-failover. A more robust solution involves a script that periodically queries the Patroni API (e.g., http://localhost:8008/primary) to get the current primary’s IP and port. This script would then dynamically update HAProxy’s configuration and reload HAProxy. Alternatively, tools like patronictl discover can be integrated.

Here’s a conceptual Python script snippet to achieve this:

import requests
import subprocess
import time
import os

PATRONI_API_URL = "http://localhost:8008/primary" # Or the IP of a Patroni node
HAPROXY_CONFIG_PATH = "/etc/haproxy/haproxy.cfg"
HAPROXY_RELOAD_CMD = ["systemctl", "reload", "haproxy"]
CHECK_INTERVAL = 10 # seconds

def get_primary_info():
    try:
        response = requests.get(PATRONI_API_URL, timeout=5)
        response.raise_for_status()
        data = response.json()
        return data.get("host"), data.get("port")
    except requests.exceptions.RequestException as e:
        print(f"Error querying Patroni API: {e}")
        return None, None

def update_haproxy_config(primary_host, primary_port):
    # Read the current HAProxy config
    with open(HAPROXY_CONFIG_PATH, 'r') as f:
        lines = f.readlines()

    new_lines = []
    in_backend = False
    primary_found = False

    for line in lines:
        if line.strip().startswith("backend pgsql_backend"):
            in_backend = True
            new_lines.append(line)
            continue
        if in_backend and line.strip().startswith("server "):
            # This is a server line, skip it if it's not the primary
            if not line.strip().startswith(f"server pgsql_primary {primary_host}:{primary_port}"):
                continue
            else:
                primary_found = True
                new_lines.append(f"server pgsql_primary {primary_host}:{primary_port} check port 8008\n") # Keep health check
        elif in_backend and not line.strip().startswith("server "):
            # End of backend section or other directives
            in_backend = False
            new_lines.append(line)
        else:
            new_lines.append(line)

    # If primary wasn't found (e.g., first run or config changed), add it
    if in_backend and not primary_found:
        new_lines.append(f"server pgsql_primary {primary_host}:{primary_port} check port 8008\n")

    # Write the new config if it's different
    if "".join(new_lines) != "".join(lines):
        with open(HAPROXY_CONFIG_PATH, 'w') as f:
            f.writelines(new_lines)
        print("HAProxy config updated. Reloading...")
        try:
            subprocess.run(HAPROXY_RELOAD_CMD, check=True)
            print("HAProxy reloaded successfully.")
        except subprocess.CalledProcessError as e:
            print(f"Error reloading HAProxy: {e}")
    else:
        print("HAProxy config unchanged.")


if __name__ == "__main__":
    print("Starting HAProxy dynamic configuration updater...")
    while True:
        host, port = get_primary_info()
        if host and port:
            update_haproxy_config(host, port)
        else:
            print("Could not determine primary. Retrying...")
        time.sleep(CHECK_INTERVAL)

This script needs to be run on the HAProxy server and configured to start on boot. The PATRONI_API_URL should point to a reliable Patroni instance. The script assumes a basic HAProxy config structure and needs adaptation for more complex setups. Ensure the HAProxy user has write permissions to the config file and can execute systemctl reload haproxy.

WooCommerce Connection String

Update your wp-config.php to point to the HAProxy listener.

// wp-config.php
define( 'DB_HOST', 'localhost:5433' ); // HAProxy listener IP and port
define( 'DB_NAME', 'your_woocommerce_db' );
define( 'DB_USER', 'your_db_user' );
define( 'DB_PASSWORD', 'your_db_password' );
define( 'DB_CHARSET', 'utf8mb4' );
define( 'DB_COLLATE', '' );

Option 2: Using a Virtual IP (VIP) with Keepalived

Another common approach is to use Keepalived to manage a Virtual IP address that floats between the PostgreSQL nodes. WooCommerce applications would connect to this VIP. Keepalived uses VRRP (Virtual Router Redundancy Protocol) to achieve this.

Keepalived Configuration (`keepalived.conf`)

On each PostgreSQL node, configure Keepalived. One node will initially hold the VIP. If that node fails, Keepalived on another node will take over the VIP.

# keepalived.conf on each PostgreSQL node
vrrp_script chk_postgres {
    script "/usr/local/bin/check_postgres.sh" # Custom script to check PostgreSQL health
    interval 2
    weight 20 # Higher weight for primary
    fall 2
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP # Start as BACKUP, will transition to MASTER if primary is healthy
    interface eth0 # Network interface to bind VIP to
    virtual_router_id 51
    priority 100 # Lower priority for backup nodes
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass your_vrrp_password
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 # The floating VIP
    }

    track_script {
        chk_postgres
    }
}

/usr/local/bin/check_postgres.sh script:

#!/bin/bash
# Checks if PostgreSQL is running and accepting connections as primary.
# This script should be adapted to check Patroni's status or query PG directly.

PG_USER="postgres"
PG_HOST="127.0.0.1"
PG_PORT="5432"

# Check if Patroni reports this node as primary
# This is a more reliable check than just checking if PG is running.
# You might need to adjust the URL based on your Patroni setup.
PATRONI_PRIMARY_URL="http://localhost:8008/primary"

if curl --fail --silent "$PATRONI_PRIMARY_URL" > /dev/null; then
    # Patroni reports this node as primary, or at least it's reachable.
    # Further check if PostgreSQL is actually responding.
    pg_isready -h $PG_HOST -p $PG_PORT -U $PG_USER >/dev/null 2>&1
    if [ $? -eq 0 ]; then
        exit 0 # PostgreSQL is ready and this node is likely primary
    else
        exit 1 # PostgreSQL not ready
    fi
else
    # Patroni does not report this node as primary.
    # If you want the VIP to float even if PG is down but Patroni is up,
    # you might adjust this logic. For HA, we want the VIP on the active primary.
    exit 1
fi

Ensure the check_postgres.sh script is executable and has the correct permissions. The weight in chk_postgres should be higher for the intended primary node(s) to ensure it gets the VIP first. The priority in vrrp_instance determines which node becomes MASTER if multiple nodes are equally healthy; higher priority wins. Ensure your_vrrp_password is strong and consistent across nodes.

WooCommerce Connection String with VIP

Configure wp-config.php to use the VIP.

// wp-config.php
define( 'DB_HOST', '192.168.1.100' ); // The floating VIP
define( 'DB_NAME', 'your_woocommerce_db' );
define( 'DB_USER', 'your_db_user' );
define( 'DB_PASSWORD', 'your_db_password' );
define( 'DB_CHARSET', 'utf8mb4' );
define( 'DB_COLLATE', '' );

Testing Failover Scenarios

Thorough testing is critical. Simulate failures to ensure the automated failover works as expected.

Simulate PostgreSQL Node Failure: Stop the PostgreSQL service on the current primary node (sudo systemctl stop postgresql). Observe Patroni’s logs and etcd state to confirm a replica is promoted. Check if HAProxy or Keepalived redirects traffic to the new primary. Test WooCommerce by performing a read and write operation.
Simulate Patroni Service Failure: Stop the Patroni service on the primary node (sudo systemctl stop patroni). Patroni should detect this and initiate a failover.
Simulate Network Partition: Isolate a PostgreSQL node from the etcd cluster and the other PostgreSQL nodes. Observe how Patroni handles this.
Simulate etcd Node Failure: If using an etcd cluster, stop one or more etcd nodes to test its resilience.

During failover, expect a brief period of unavailability (seconds to a minute, depending on configuration and network latency). Monitor application logs for connection errors during this window.

Monitoring and Maintenance

Continuous monitoring is essential for a reliable HA setup.

Patroni Health: Monitor the Patroni REST API endpoints (/cluster, /primary, /replicas) for status.
etcd Health: Monitor etcd cluster health, leader election, and key presence.
HAProxy/Keepalived Health: Monitor HAProxy stats and Keepalived status.
PostgreSQL Metrics: Track replication lag, connection counts, query performance, and disk I/O.
Log Aggregation: Centralize logs from PostgreSQL, Patroni, HAProxy, and Keepalived for easier debugging.

Regularly test your failover procedures and keep your Patroni, PostgreSQL, etcd, HAProxy, and Keepalived versions up-to-date. Consider implementing automated backups and point-in-time recovery (PITR) as a complementary disaster recovery strategy.

Disaster Recovery 101: Architecting Auto-Failovers for PostgreSQL and WooCommerce Deployments on OVH

Establishing PostgreSQL High Availability with Patroni and etcd

Prerequisites and Setup

Patroni Configuration (patroni.yml)

Starting Patroni and PostgreSQL

Configuring WooCommerce for Auto-Failover

Option 1: Using HAProxy as a Database Proxy

HAProxy Configuration (haproxy.cfg)

WooCommerce Connection String

Option 2: Using a Virtual IP (VIP) with Keepalived

Keepalived Configuration (keepalived.conf)

WooCommerce Connection String with VIP

Testing Failover Scenarios

Monitoring and Maintenance

Recent Posts

Top Categories

Our Products

Our Services

Patroni Configuration (`patroni.yml`)

HAProxy Configuration (`haproxy.cfg`)

Keepalived Configuration (`keepalived.conf`)