Disaster Recovery 101: Architecting Auto-Failovers for PostgreSQL and Python Deployments on Linode

Establishing a Highly Available PostgreSQL Cluster with Patroni

Achieving automated failover for PostgreSQL necessitates a robust cluster management solution. Patroni stands out as a leading choice, orchestrating PostgreSQL instances and managing high availability through distributed consensus. We’ll focus on a three-node setup for redundancy, leveraging etcd for cluster state management.

First, ensure etcd is installed and running on separate nodes or co-located with PostgreSQL nodes. For a production environment, etcd should be configured for TLS and have its own high availability setup. For this example, we’ll assume a basic etcd cluster accessible at etcd-node-1:2379, etcd-node-2:2379, and etcd-node-3:2379.

Patroni Configuration for PostgreSQL Nodes

Each PostgreSQL node will run a Patroni agent. The configuration file (e.g., patroni.yml) defines its role, connection details to etcd, and PostgreSQL parameters. Below is a representative configuration for a PostgreSQL node.

# patroni.yml
scope: my_pg_cluster
namespace: /service/
restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.1.10:8008 # IP of this node's REST API

etcd:
  hosts:
    - etcd-node-1:2379
    - etcd-node-2:2379
    - etcd-node-3:2379
  protocol: http # Use 'https' in production with TLS
  username: # Optional etcd username
  password: # Optional etcd password

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.1.10:5432 # IP of this node's PostgreSQL
  data_dir: /var/lib/postgresql/14/main
  pg_hba:
    - host    all             all             0.0.0.0/0               md5
  parameters:
    max_connections: 100
    shared_buffers: 128MB
    wal_level: replica
    hot_standby: "on"
    max_wal_senders: 10
    max_replication_slots: 10

replication:
  type: pg berdasarkan
  master:
    host: 192.168.1.10 # IP of the current master
    port: 5432
    user: replicator
    password: your_replication_password

# Bootstrap configuration (only for the initial master)
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576 # 1MB
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        recovery_target_timeline: 'latest'

Key parameters to note:

scope: A unique identifier for this PostgreSQL cluster.
namespace: The etcd path where Patroni will store cluster state.
restapi.listen and restapi.connect_address: The address and port for Patroni’s REST API, used for health checks and management.
etcd.hosts: List of etcd endpoints.
postgresql.data_dir: The PostgreSQL data directory. Ensure this is correctly set.
postgresql.pg_hba: Crucial for allowing replication and application connections. Adjust IP ranges as per your security policy.
replication.type: Set to pgbased for standard PostgreSQL replication.
bootstrap.postgresql.use_pg_rewind: Essential for fast failovers by rewinding the old master to a consistent state.
bootstrap.postgresql.use_slots: Utilizes PostgreSQL replication slots to prevent WAL file deletion on the primary before they are consumed by replicas.

Bootstrapping the Cluster

To initiate the cluster, start Patroni on one node with the bootstrap configuration enabled. This node will attempt to acquire a lock in etcd and initialize itself as the primary PostgreSQL server.

On the designated initial primary node:

sudo systemctl enable patroni
sudo systemctl start patroni

Patroni will:

Attempt to acquire a leader lock in etcd.
If successful, initialize the PostgreSQL data directory (if empty).
Start PostgreSQL as the primary.
Register itself as the primary in etcd.

Adding Replicas and Configuring Failover

Once the primary is up and registered in etcd, start Patroni on the other nodes. These nodes will detect the existing primary via etcd and configure themselves as replicas, performing a base backup if necessary.

On replica nodes:

sudo systemctl enable patroni
sudo systemctl start patroni

Patroni on replica nodes will:

Discover the current primary from etcd.
Configure PostgreSQL to replicate from the primary.
Perform a base backup (e.g., using pg_basebackup) if the data directory is empty.
Start PostgreSQL in standby mode.
Periodically check the primary’s health via etcd.

When the primary fails (e.g., Patroni agent stops responding, PostgreSQL process dies), Patroni on the remaining nodes will detect this. A consensus mechanism (etcd’s leader election) will determine which replica is best suited to become the new primary (based on lag, etc.). The chosen replica will be promoted, and other nodes will reconfigure to replicate from the new primary.

Integrating Python Applications with Auto-Failover

Your Python applications need to be aware of the PostgreSQL cluster’s dynamic nature. Hardcoding a single database host is a recipe for disaster. The strategy involves using a load balancer or a service discovery mechanism that can query Patroni’s API.

Option 1: Using a Load Balancer (HAProxy)

HAProxy is an excellent choice for load balancing TCP connections and can be configured to health-check PostgreSQL instances via Patroni’s REST API. This allows HAProxy to direct traffic only to healthy primary instances.

First, install HAProxy on a dedicated server or co-locate it with your application servers. Configure HAProxy to monitor the Patroni REST API endpoints.

# haproxy.cfg
global
    log /dev/log    local0
    log /dev/log    local1 notice
    daemon

defaults
    log     global
    mode    tcp
    option  tcplog
    option  dontlognull
    timeout connect 5000
    timeout client  50000
    timeout server  50000

listen postgresql_cluster
    bind *:5433  # Port your applications will connect to
    mode tcp
    option tcp-check
    # Health check against Patroni's API to determine primary
    tcp-check connect port 8008
    tcp-check send GET /primary HTTP/1.0\r\n\r\n
    tcp-check expect status 200
    # Backend servers (all PostgreSQL nodes)
    server pg_node1 192.168.1.10:5432 check port 8008 inter 2s fall 3 rise 2
    server pg_node2 192.168.1.11:5432 check port 8008 inter 2s fall 3 rise 2
    server pg_node3 192.168.1.12:5432 check port 8008 inter 2s fall 3 rise 2

In this HAProxy configuration:

mode tcp: HAProxy will operate at the transport layer, forwarding raw TCP packets.
option tcp-check: Enables TCP-level health checks.
tcp-check connect port 8008: HAProxy attempts to connect to the Patroni REST API port.
tcp-check send GET /primary HTTP/1.0\r\n\r\n: Sends an HTTP GET request to the /primary endpoint of Patroni. This endpoint returns the details of the current primary if the node is healthy and aware of the primary.
tcp-check expect status 200: HAProxy expects a 200 OK HTTP status code from the /primary endpoint. This signifies that the node is either the primary or a healthy replica aware of the primary.
server ... check port 8008 inter 2s fall 3 rise 2: Configures each PostgreSQL node as a backend server. The check port 8008 combined with the tcp-check directives ensures HAProxy only sends traffic to the actual PostgreSQL port (5432) of the node that is currently the primary.

Your Python application should now connect to your_haproxy_ip:5433. HAProxy will automatically route connections to the active PostgreSQL primary.

Option 2: Service Discovery with Python

For more dynamic environments or when HAProxy is not desired, your Python application can directly query Patroni’s API to find the current primary. This requires a custom connection logic within your application or a database connection pool library that supports dynamic host resolution.

Here’s a Python snippet demonstrating how to find the primary using the requests library:

import requests
import psycopg2
import time

PATRONI_API_URLS = [
    "http://192.168.1.10:8008/primary",
    "http://192.168.1.11:8008/primary",
    "http://192.168.1.12:8008/primary",
]

def get_primary_db_connection_details():
    """
    Queries Patroni API endpoints to find the current PostgreSQL primary.
    Returns a dictionary with 'host' and 'port' or None if not found.
    """
    for url in PATRONI_API_URLS:
        try:
            response = requests.get(url, timeout=2)
            if response.status_code == 200:
                data = response.json()
                # Check if the 'role' is 'master' or if it's a replica reporting the master
                # Patroni's /primary endpoint returns the master details.
                # If a node is not the master, it might still respond with master info if it knows it.
                # A more robust check might involve querying /cluster and checking roles.
                # For simplicity, we assume /primary returns master info if available.
                return {
                    "host": data.get("host"),
                    "port": data.get("port"),
                    "database": "your_db_name", # Add your database name
                    "user": "your_db_user",     # Add your DB user
                    "password": "your_db_password" # Add your DB password
                }
        except requests.exceptions.RequestException:
            # Ignore connection errors and try the next URL
            continue
    return None

def get_db_connection():
    """
    Establishes a PostgreSQL connection, dynamically finding the primary.
    Includes retry logic for failover scenarios.
    """
    max_retries = 5
    retry_delay = 5  # seconds
    conn = None
    db_details = None

    for attempt in range(max_retries):
        if conn is None:
            db_details = get_primary_db_connection_details()
            if db_details:
                try:
                    print(f"Attempting to connect to primary at {db_details['host']}:{db_details['port']}...")
                    conn = psycopg2.connect(
                        host=db_details["host"],
                        port=db_details["port"],
                        database=db_details["database"],
                        user=db_details["user"],
                        password=db_details["password"]
                    )
                    print("Successfully connected to PostgreSQL.")
                    return conn
                except psycopg2.OperationalError as e:
                    print(f"Connection failed: {e}")
                    conn = None # Ensure conn is None if connection fails
            else:
                print("Could not find primary database. Retrying...")

        if conn is None:
            print(f"Retrying connection in {retry_delay} seconds... (Attempt {attempt + 1}/{max_retries})")
            time.sleep(retry_delay)
        else:
            # If we got here, conn is valid, so we should have returned already.
            # This is a safeguard.
            break

    if conn is None:
        raise Exception(f"Failed to connect to PostgreSQL after {max_retries} attempts.")

    return conn

# Example usage:
if __name__ == "__main__":
    try:
        connection = get_db_connection()
        cursor = connection.cursor()
        cursor.execute("SELECT version();")
        db_version = cursor.fetchone()
        print(f"PostgreSQL version: {db_version[0]}")
        cursor.close()
        connection.close()
    except Exception as e:
        print(f"An error occurred: {e}")

This Python code:

Iterates through a list of Patroni API endpoints.
Makes a GET request to the /primary endpoint.
Parses the JSON response to extract the primary’s host and port.
Uses psycopg2 to establish a connection.
Includes a retry mechanism with exponential backoff (simplified to fixed delay here) to handle transient connection issues during failover.

For production, consider integrating this logic into your database connection pool or ORM configuration. Libraries like SQLAlchemy can be configured with custom connection handlers or use connection strings that might be dynamically updated.

Monitoring and Alerting

Automated failover is only part of the solution. Comprehensive monitoring and alerting are critical to ensure the system is functioning as expected and to be notified of any issues that might require manual intervention.

Key Metrics to Monitor

Patroni API Health: Regularly check the status of the /health endpoint on each Patroni node. A 200 OK indicates the agent is running and aware of the cluster state.
etcd Health: Monitor the health and performance of your etcd cluster. Unhealthy etcd can lead to cluster instability and failed failovers.
PostgreSQL Replication Lag: Use Patroni’s API (e.g., /cluster) or direct SQL queries (pg_stat_replication) to monitor replication lag on standby servers. High lag can impact failover RPO.
PostgreSQL Instance Health: Monitor PostgreSQL process status, CPU, memory, disk I/O, and connection counts.
HAProxy Statistics: If using HAProxy, monitor its backend server status and connection rates.

Tools like Prometheus with appropriate exporters (e.g., postgres_exporter, node_exporter, and custom Patroni/HAProxy exporters) are ideal for collecting and visualizing these metrics. Alerting can be configured via Alertmanager.

Example Prometheus Alert for Patroni Health

# prometheus.yml (alerting rules)
groups:
- name: postgresql_alerts
  rules:
  - alert: PatroniNodeUnhealthy
    expr: up{job="patroni"} == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Patroni agent on {{ $labels.instance }} is down."
      description: "The Patroni agent on {{ $labels.instance }} has been down for 5 minutes. This may indicate a problem with the PostgreSQL node or the Patroni service itself."

  - alert: PostgreSQLPrimaryNotElected
    # This query checks if there's no node reporting itself as 'master' in the cluster state.
    # It assumes Patroni's /cluster endpoint provides role information.
    # Adjust the query based on your specific Patroni metrics exporter.
    # A more direct approach might be to check if the 'patroni_role' metric is absent or not 'master' for any node.
    expr: count(patroni_role{role="master"}) by (cluster) == 0
    for: 10m
    labels:
      severity: critical
    annotations:
      summary: "No PostgreSQL primary elected in cluster {{ $labels.cluster }}."
      description: "Patroni has failed to elect a primary for the PostgreSQL cluster {{ $labels.cluster }} for 10 minutes. Manual intervention may be required."

Implementing this architecture provides a resilient PostgreSQL deployment that can automatically recover from node failures, ensuring minimal downtime for your Python applications.