Disaster Recovery 101: Architecting Auto-Failovers for PostgreSQL and Magento 2 Deployments on Linode
Establishing a High-Availability PostgreSQL Cluster with Patroni
For a robust Magento 2 deployment, a highly available PostgreSQL database is non-negotiable. We’ll leverage Patroni, a template for PostgreSQL HA, to manage automatic failover. Patroni uses a distributed configuration store (like etcd, Consul, or ZooKeeper) to maintain cluster state and elect a leader. For simplicity and ease of deployment on Linode, we’ll use etcd.
This setup assumes you have at least three Linode instances. One will host etcd, and the others will run PostgreSQL nodes managed by Patroni. For production, consider a dedicated etcd cluster across multiple availability zones.
Deploying etcd for Patroni Coordination
First, install etcd on a dedicated Linode instance. We’ll configure it for basic peer discovery.
etcd Installation and Configuration
On your etcd node (e.g., `etcd-node.yourdomain.com`):
Install etcd (package manager varies by OS, using `apt` for Debian/Ubuntu):
sudo apt update sudo apt install etcd -y
Configure etcd by editing `/etc/etcd/etcd.conf.yml`. Ensure it’s accessible by your PostgreSQL nodes. Replace `ETCD_NODE_IP` with the actual IP of your etcd node.
name: etcd-node data-dir: /var/lib/etcd listen-client-urls: http://0.0.0.0:2379 advertise-client-urls: http://ETCD_NODE_IP:2379 listen-peer-urls: http://0.0.0.0:2380 initial-advertise-peer-urls: http://ETCD_NODE_IP:2380 initial-cluster: etcd-node=http://ETCD_NODE_IP:2380 initial-cluster-token: my-etcd-cluster-token initial-cluster-state: new proxy: false
Restart and enable the etcd service:
sudo systemctl restart etcd sudo systemctl enable etcd
Verify etcd is running and accessible:
ETCDCTL_API=3 etcdctl member list ETCDCTL_API=3 etcdctl endpoint health
Setting up Patroni on PostgreSQL Nodes
On each PostgreSQL node (e.g., `pg-node-1.yourdomain.com`, `pg-node-2.yourdomain.com`, `pg-node-3.yourdomain.com`):
Patroni Installation
Install PostgreSQL and Patroni. The exact PostgreSQL version should match your Magento 2 requirements.
# Install PostgreSQL (example for PostgreSQL 13 on Ubuntu) sudo apt update sudo apt install postgresql postgresql-contrib postgresql-13 -y # Install Patroni (using pip is recommended for latest features) sudo apt install python3-pip python3-dev build-essential libpq-dev -y sudo pip3 install --upgrade pip sudo pip3 install "patroni[etcd]" psycopg2-binary
Patroni Configuration
Create the Patroni configuration file, typically at `/etc/patroni/patroni.yml`. This configuration is crucial and must be identical across all PostgreSQL nodes, with only the `scope` (cluster name) and `name` (node identifier) differing.
Example `patroni.yml` for `pg-node-1.yourdomain.com`:
scope: magento_cluster
name: pg-node-1.yourdomain.com
restapi:
listen: 0.0.0.0:8008
connect_address: pg-node-1.yourdomain.com:8008
etcd:
host: ETCD_NODE_IP:2379
protocol: http
postgresql:
listen: 0.0.0.0:5432
connect_address: pg-node-1.yourdomain.com:5432
data_dir: /var/lib/postgresql/13/main
pg_hba:
- host all all 0.0.0.0/0 md5
parameters:
max_connections: 100
shared_buffers: 128MB
effective_cache_size: 384MB
maintenance_work_mem: 64MB
checkpoint_completion_target: 0.9
wal_buffers: 16MB
default_statistics_target: 100
random_page_cost: 1.1
effective_io_concurrency: 200
work_mem: 16MB
min_wal_size: 1GB
max_wal_size: 4GB
checkpoint_timeout: 5min
max_age: 10min
wal_keep_size: 1024MB # Patroni manages this, but good to set a baseline
replication:
master:
# Patroni will configure replication slots automatically
# Ensure PostgreSQL user has replication privileges
# This user will be created by Patroni if it doesn't exist.
username: replicator
password: YOUR_REPLICATION_PASSWORD
# Tags can be used for custom logic, e.g., routing read-only traffic
tags:
nofailover: false
clone: false
Important Notes:
- Replace `ETCD_NODE_IP` with the actual IP of your etcd node.
- Replace `pg-node-1.yourdomain.com` with the hostname of the current PostgreSQL node.
- Ensure the `data_dir` matches your PostgreSQL installation.
- The `pg_hba` configuration allows connections from any IP; restrict this in production.
- `wal_keep_size` is managed by Patroni, but setting a reasonable default is good practice.
- The `replicator` user and password will be managed by Patroni. Ensure the password is strong.
- For production, consider using SSL for etcd communication and PostgreSQL connections.
Create a systemd service file for Patroni, e.g., `/etc/systemd/system/patroni.service`:
[Unit] Description=Patroni PostgreSQL High-Availability After=network.target [Service] User=postgres Group=postgres ExecStart=/usr/local/bin/patroni /etc/patroni/patroni.yml Restart=on-failure RestartSec=5s [Install] WantedBy=multi-user.target
Ensure the `postgres` user has read access to `/etc/patroni/patroni.yml` and write access to `data_dir`.
Initializing the PostgreSQL Cluster
On *one* of the PostgreSQL nodes (e.g., `pg-node-1.yourdomain.com`), start Patroni and PostgreSQL:
sudo systemctl start patroni sudo systemctl enable patroni
Check the Patroni logs for initialization progress:
sudo journalctl -u patroni -f
Once the first node is initialized, start Patroni on the *other* PostgreSQL nodes. They will detect the existing cluster in etcd and join as replicas.
# On pg-node-2.yourdomain.com and pg-node-3.yourdomain.com sudo systemctl start patroni sudo systemctl enable patroni
Verify the cluster status using Patroni’s REST API:
curl http://pg-node-1.yourdomain.com:8008/cluster
You should see output indicating the cluster members and their roles (one master, others replicas).
Configuring Magento 2 for High Availability
Magento 2 needs to be configured to connect to the PostgreSQL cluster. The key is to use a connection string that can be dynamically updated or to use a load balancer that directs traffic to the current master.
Using a Virtual IP or Load Balancer
The most robust approach is to use a load balancer or a virtual IP (VIP) that always points to the current PostgreSQL master. On Linode, you can achieve this with:
- HAProxy: A popular, high-performance TCP/HTTP load balancer.
- Keepalived: For managing a floating VIP.
Let’s focus on HAProxy as it can also handle health checks and direct traffic appropriately.
HAProxy Setup
Install HAProxy on a separate node or one of the existing nodes (ensure it’s not a PostgreSQL node itself to avoid single points of failure).
sudo apt update sudo apt install haproxy -y
Configure HAProxy by editing `/etc/haproxy/haproxy.cfg`. We’ll set up a frontend that directs traffic to the PostgreSQL master based on Patroni’s API.
[global] log /dev/log local0 maxconn 4096 daemon defaults log global mode tcp option tcplog option dontlognull timeout connect 5000 timeout client 50000 timeout server 50000 listen pgsql_cluster bind *:5432 mode tcp option tcp-check # Use Patroni's API to find the master. This requires a custom script or a Patroni integration. # For simplicity, we'll use a basic health check and rely on Patroni's leader election. # A more advanced setup would involve a script that queries Patroni's API for the master's IP. # For this example, we'll list all potential PostgreSQL nodes and let HAProxy health checks do the work. # This is NOT fully automatic failover without a script. # A better approach: Use a script to query Patroni API and update backend servers dynamically. # Example with static backend list (requires manual intervention or a script to update) # server pg-node-1 pg-node-1.yourdomain.com:5432 check port 5432 inter 2s fall 3 rise 2 # server pg-node-2 pg-node-2.yourdomain.com:5432 check port 5432 inter 2s fall 3 rise 2 # server pg-node-3 pg-node-3.yourdomain.com:5432 check port 5432 inter 2s fall 3 rise 2 # A more dynamic approach using a script to query Patroni API: # This requires a script that runs periodically or on-demand to update the backend servers. # For a truly automated solution, consider using Patroni's built-in API to find the master. # Example using a script (not provided here, but conceptually): # server pg-master `get_pg_master_ip.sh` check port 5432 inter 2s fall 3 rise 2 # For a simpler, though less dynamic, setup, we can use Patroni's REST API to get the master. # This requires a tool like 'jq' and a script to parse the output. # Example using a script that updates HAProxy config dynamically: # Let's assume a script `update_haproxy_backends.sh` exists. # This script would query http://ETCD_NODE_IP:8008/cluster, find the master, and update the HAProxy config. # Then, it would reload HAProxy. # For demonstration, let's use a basic TCP check on all nodes. # Patroni's REST API is the key to true automation. # A common pattern is to have a script that polls Patroni's /cluster endpoint, # identifies the master, and dynamically updates HAProxy's backend list. # Then, HAProxy is reloaded. # Example of a basic setup that relies on Patroni's leader election and HAProxy's health checks. # This is NOT fully automated failover without a script to manage HAProxy backends. # The 'check' directive will probe port 5432. Patroni ensures only the master listens on 5432 for external connections. # However, replicas also listen on 5432 for replication. # A better check would be to query Patroni's API. # Let's use a more robust approach with a script that queries Patroni's API. # This script would run via cron or a systemd timer. # For now, we'll show the HAProxy config that *would* be managed. # Example of a backend definition that would be dynamically managed: # server pg-node-1 pg-node-1.yourdomain.com:5432 check port 5432 inter 2s fall 3 rise 2 # server pg-node-2 pg-node-2.yourdomain.com:5432 check port 5432 inter 2s fall 3 rise 2 # server pg-node-3 pg-node-3.yourdomain.com:5432 check port 5432 inter 2s fall 3 rise 2 # A more direct way to use Patroni's API with HAProxy: # HAProxy can be configured to use external scripts for health checks. # Or, a separate process can update HAProxy's configuration. # For a simpler, but less ideal, approach: # Configure HAProxy to point to all nodes and rely on Patroni to ensure only the master is available for writes. # This is not ideal as HAProxy might try to connect to replicas for writes. # The recommended approach: # 1. A script queries Patroni's /cluster API. # 2. It identifies the master's IP and port. # 3. It dynamically updates HAProxy's backend configuration. # 4. It reloads HAProxy. # Let's assume such a script exists and manages the 'server' lines below. # The 'check port 5432' is a basic TCP check. A more advanced check would be to # query Patroni's API for the master status. # Example of backend servers that would be dynamically managed: server pg-node-1 pg-node-1.yourdomain.com:5432 check port 5432 inter 2s fall 3 rise 2 server pg-node-2 pg-node-2.yourdomain.com:5432 check port 5432 inter 2s fall 3 rise 2 server pg-node-3 pg-node-3.yourdomain.com:5432 check port 5432 inter 2s fall 3 rise 2 # Optional: Add a read-only listener if you want to direct read traffic to replicas # listen pgsql_replicas # bind *:5433 # mode tcp # option tcp-check # # This would point to replicas. Requires logic to identify replicas. # server pg-node-2 pg-node-2.yourdomain.com:5432 check port 5432 inter 2s fall 3 rise 2 # server pg-node-3 pg-node-3.yourdomain.com:5432 check port 5432 inter 2s fall 3 rise 2
Dynamic Backend Management Script (Conceptual):
import requests
import json
import subprocess
import os
PATRONI_API_URL = "http://ETCD_NODE_IP:8008/cluster"
HAPROXY_CONF_FILE = "/etc/haproxy/haproxy.cfg"
HAPROXY_BACKEND_SECTION = "pgsql_cluster"
PG_NODES = ["pg-node-1.yourdomain.com", "pg-node-2.yourdomain.com", "pg-node-3.yourdomain.com"]
PG_PORT = 5432
def get_master_node():
try:
response = requests.get(PATRONI_API_URL, timeout=5)
response.raise_for_status()
cluster_data = response.json()
for member in cluster_data.get("members", []):
if member.get("role") == "master":
return member.get("api_url").split(":")[0].replace("http://", "")
except requests.exceptions.RequestException as e:
print(f"Error fetching Patroni API: {e}")
return None
def update_haproxy_config(master_node):
if not master_node:
print("No master node found, skipping HAProxy update.")
return
new_backend_lines = []
for node in PG_NODES:
server_line = f" server {node.replace('.', '_')} {node}:{PG_PORT} check port {PG_PORT} inter 2s fall 3 rise 2"
if node == master_node:
# Optionally add specific checks for master, or just ensure it's in the list
new_backend_lines.append(server_line)
else:
# For replicas, you might want different checks or just list them.
# In a true HA setup, HAProxy should only direct writes to the master.
# This example assumes HAProxy is configured to direct writes to the master.
# If HAProxy is only for writes, replicas might not need to be in this list.
# If HAProxy is for both reads and writes, you'd need a separate backend for replicas.
new_backend_lines.append(server_line) # Including replicas for now, but writes should go to master
try:
with open(HAPROXY_CONF_FILE, 'r') as f:
lines = f.readlines()
in_backend_section = False
new_lines = []
for line in lines:
if line.strip().startswith(f"listen {HAPROXY_BACKEND_SECTION}"):
in_backend_section = True
new_lines.append(line)
# Add the dynamically generated server lines
new_lines.extend(new_backend_lines)
# Skip original server lines if they exist
continue
elif in_backend_section and (line.strip().startswith("server ") or line.strip() == ""):
# Skip original server lines and empty lines within the section until the next listen/global
if line.strip().startswith("server "):
continue
elif line.strip() == "" and not new_backend_lines: # If no new backends, keep empty lines
new_lines.append(line)
elif line.strip() == "" and new_backend_lines: # If new backends, skip empty lines that were before them
continue
elif line.strip().startswith("listen ") or line.strip().startswith("[global]"):
in_backend_section = False
new_lines.append(line)
else:
new_lines.append(line)
# Ensure the backend section is properly closed if it was the last section
if in_backend_section and not new_lines[-1].strip().startswith("listen ") and not new_lines[-1].strip().startswith("[global]"):
new_lines.append("\n") # Add a newline if needed
with open(HAPROXY_CONF_FILE, 'w') as f:
f.writelines(new_lines)
print("HAProxy config updated. Reloading HAProxy...")
subprocess.run(["sudo", "systemctl", "reload", "haproxy"], check=True)
print("HAProxy reloaded.")
except Exception as e:
print(f"Error updating HAProxy config: {e}")
if __name__ == "__main__":
master = get_master_node()
if master:
print(f"Detected master: {master}")
update_haproxy_config(master)
else:
print("Could not detect master node.")
This script should be scheduled to run periodically (e.g., every minute) using cron or a systemd timer. It queries Patroni’s API, finds the master, and updates the HAProxy configuration file, then reloads HAProxy. Ensure the script has the necessary permissions to write to `/etc/haproxy/haproxy.cfg` and reload the `haproxy` service.
Restart HAProxy after updating the configuration:
sudo systemctl restart haproxy sudo systemctl enable haproxy
Magento 2 Database Configuration
In your Magento 2 `app/etc/env.php` file, configure the database connection to point to the HAProxy listener.
<?php
return [
'backend' => [
'frontName' => 'admin_your_secret_path'
],
'crypt' => [
'key' => 'your_crypt_key_here'
],
'db' => [
'connection' => [
'default' => [
'host' => 'HAPROXY_IP_OR_HOSTNAME', // IP or hostname of your HAProxy server
'port' => '5432', // The port HAProxy listens on for PostgreSQL
'dbname' => 'magento_db',
'username' => 'magento_user',
'password' => 'YOUR_MAGENTO_DB_PASSWORD',
'model' => 'mysql4',
'initStatements' => 'SET NAMES utf8',
'driver_options' => [
1002 => 'SET NAMES utf8'
]
],
'innodb_strict_mode' => true,
'use_ssl' => '0' // Set to '1' if using SSL
],
'default_setup' => [
'host' => 'HAPROXY_IP_OR_HOSTNAME',
'port' => '5432',
'dbname' => 'magento_db',
'username' => 'magento_user',
'password' => 'YOUR_MAGENTO_DB_PASSWORD',
'model' => 'mysql4',
'initStatements' => 'SET NAMES utf8',
'driver_options' => [
1002 => 'SET NAMES utf8'
]
]
],
// ... other Magento configuration
];
Replace `HAPROXY_IP_OR_HOSTNAME` with the IP address or hostname of your HAProxy server. Magento will now connect through HAProxy, which will direct traffic to the current PostgreSQL master.
Testing Failover
To test the automatic failover:
- Identify the current master node using Patroni’s API: `curl http://ETCD_NODE_IP:8008/cluster`.
- Gracefully stop the PostgreSQL service on the master node: `sudo systemctl stop patroni`.
- Observe Patroni logs on the remaining nodes. Patroni should detect the master’s failure, and one of the replicas should be promoted to master.
- Verify the new master using Patroni’s API again.
- Check if Magento can still connect and perform operations. The HAProxy script should have updated its backend to point to the new master.
- Once the old master is back online, Patroni will configure it as a replica of the new master.
For a more aggressive test, you can simulate a network partition or forcefully kill the PostgreSQL process on the master node. Ensure your HAProxy health checks and the dynamic backend script are robust enough to handle these scenarios.
Securing the PostgreSQL Cluster
Production deployments require enhanced security:
- SSL/TLS: Encrypt communication between Patroni nodes, etcd, PostgreSQL clients, and HAProxy.
- Firewall Rules: Restrict access to PostgreSQL ports (5432) and Patroni API ports (8008) to only necessary IPs (e.g., HAProxy, Magento application servers).
- etcd Security: Configure etcd with TLS and authentication.
- PostgreSQL Authentication: Use strong passwords and consider certificate-based authentication.
- HAProxy ACLs: Implement access control lists in HAProxy to further restrict access.
By implementing Patroni for PostgreSQL HA and HAProxy with dynamic backend management for traffic routing, you can achieve a highly available and resilient database layer for your Magento 2 deployment on Linode.