Automating Multi-Region Redundancy for Python Architectures on OVH

OVH Public Cloud: Multi-Region Strategy for Python Applications

Achieving robust disaster recovery for Python applications necessitates a multi-region deployment strategy. This post details a practical approach using OVH Public Cloud, focusing on automated failover and data synchronization for a typical web application stack. We’ll cover infrastructure provisioning, application deployment, database replication, and automated health checks.

Infrastructure as Code: Terraform for OVH Deployment

Managing infrastructure across multiple regions manually is error-prone and time-consuming. Terraform provides a declarative way to define and provision your cloud resources. We’ll define two distinct regions, each with its own private network, compute instances, and managed database.

First, ensure you have the OVH provider configured for Terraform. This typically involves setting up environment variables for your OVH API credentials:

export OVH_APPLICATION_KEY="YOUR_OVH_APPLICATION_KEY"
export OVH_APPLICATION_SECRET="YOUR_OVH_APPLICATION_SECRET"
export OVH_CONSUMER_KEY="YOUR_OVH_CONSUMER_KEY"

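Next, define your Terraform configuration. This example uses two regions, ‘GRA’ (Gravelines) and ‘RBX’ (Roubaix), for demonstration; Public Cloud region codes usually carry a suffix (for example GRA11), so substitute the codes enabled on your project. Each region will have a private network, a set of instances, and a managed PostgreSQL database.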

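# main.tf

# The OVH provider is not in the hashicorp namespace, so declare its source
# explicitly or `terraform init` will not find it.
terraform {
  required_providers {
    ovh = {
      source = "ovh/ovh"
    }
  }
}

provider "ovh" {
  endpoint = "ovh-eu"
}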

variable "region_primary" {
  description = "Primary OVH region"
  type        = string
  default     = "GRA"
}

variable "region_secondary" {
  description = "Secondary OVH region"
  type        = string
  default     = "RBX"
}

# --- Primary Region Resources ---

module "primary_region" {
  source = "./modules/region"
  region = var.region_primary
  name_prefix = "primary"
}

# --- Secondary Region Resources ---

module "secondary_region" {
  source = "./modules/region"
  region = var.region_secondary
  name_prefix = "secondary"
}

# --- Global Resources (e.g., DNS, if managed externally) ---
# For simplicity, we'll assume DNS is managed separately or via OVH's DNS service
# which would require additional provider configuration.

The `modules/region` directory would contain the reusable infrastructure for a single region:

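# modules/region/main.tf
#
# NOTE: resource and attribute names below are illustrative. The OVH provider
# schema varies between versions (for example, the private network resource
# takes a `regions` list and the database resource sets its region inside a
# `nodes` block), so check each resource against the provider documentation.

# Child modules must also declare the provider source.
terraform {
  required_providers {
    ovh = {
      source = "ovh/ovh"
    }
  }
}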

variable "region" {
  description = "OVH region for this module"
  type        = string
}

variable "name_prefix" {
  description = "Prefix for resources in this region"
  type        = string
}

resource "ovh_cloud_project_network_private" "app_network" {
  service_name = "YOUR_OVH_PROJECT_ID" # Replace with your actual project ID
  region       = var.region
  name         = "${var.name_prefix}-app-net"
  vlan_id      = 100 # Example VLAN ID
}

resource "ovh_cloud_project_instance" "app_server" {
  service_name = "YOUR_OVH_PROJECT_ID"
  region       = var.region
  name         = "${var.name_prefix}-app-server-${format("%02d", count.index + 1)}" # Unique name per instance: -01, -02
  flavor_name  = "b2-7" # Example flavor
  image_name   = "Debian 11"
  network_id   = ovh_cloud_project_network_private.app_network.network_id
  ssh_key_name = "your-ssh-key-name" # Ensure this SSH key exists in your OVH account
  count        = 2 # Deploy two instances for redundancy within the region
}

resource "ovh_cloud_project_database" "app_db" {
  service_name = "YOUR_OVH_PROJECT_ID"
  region       = var.region
  engine       = "postgresql"
  version      = "13"
  plan_name    = "professional-1" # Example plan
  name         = "${var.name_prefix}-app-db"
  disk_size    = 100 # GB
}

output "instance_ips" {
  value = [for instance in ovh_cloud_project_instance.app_server : instance.public_ip]
}

output "db_endpoint" {
  value = ovh_cloud_project_database.app_db.endpoint
}

After defining your Terraform files, initialize and apply the configuration:

terraform init
terraform plan
terraform apply

Automated Application Deployment with Ansible

Once the infrastructure is provisioned, we need to deploy our Python application consistently across all instances in both regions. Ansible is an excellent choice for this. We’ll create an Ansible playbook that installs dependencies, copies application code, configures environment variables, and starts the application service.

First, gather the IP addresses of your deployed instances. You can output these from Terraform or query the OVH API. For this example, let’s assume you have an Ansible inventory file:

# inventory.ini

[all:vars]
ansible_user=debian # Or your chosen SSH user

[primary_app_servers]
primary-app-server-01 ansible_host=PRIMARY_APP_SERVER_01_IP
primary-app-server-02 ansible_host=PRIMARY_APP_SERVER_02_IP

[secondary_app_servers]
secondary-app-server-01 ansible_host=SECONDARY_APP_SERVER_01_IP
secondary-app-server-02 ansible_host=SECONDARY_APP_SERVER_02_IP

[all_app_servers:children]
primary_app_servers
secondary_app_servers
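
If you would rather generate this inventory than maintain it by hand, the sketch below renders the same layout from the JSON printed by `terraform output -json`. It assumes the root module exposes two outputs named `primary_instance_ips` and `secondary_instance_ips`; those names are hypothetical, since the root `main.tf` above does not yet re-export the module outputs. The sample addresses are documentation-range placeholders.

```python
import json


def render_inventory(tf_outputs: dict, ssh_user: str = "debian") -> str:
    """Render an INI inventory from parsed `terraform output -json` data."""
    lines = ["[all:vars]", f"ansible_user={ssh_user}", ""]
    for role in ("primary", "secondary"):
        # Hypothetical root-module outputs: primary_instance_ips, secondary_instance_ips.
        ips = tf_outputs[f"{role}_instance_ips"]["value"]
        lines.append(f"[{role}_app_servers]")
        for index, ip in enumerate(ips, start=1):
            lines.append(f"{role}-app-server-{index:02d} ansible_host={ip}")
        lines.append("")
    lines += ["[all_app_servers:children]", "primary_app_servers", "secondary_app_servers"]
    return "\n".join(lines) + "\n"


# Sample data shaped like `terraform output -json`.
sample = json.loads(
    '{"primary_instance_ips": {"value": ["203.0.113.10", "203.0.113.11"]},'
    ' "secondary_instance_ips": {"value": ["198.51.100.20", "198.51.100.21"]}}'
)
print(render_inventory(sample))
```

In practice you would load the real Terraform output with `json.load` and write the returned string to `inventory.ini`.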

Now, create the Ansible playbook. This playbook assumes your Python application uses Gunicorn and systemd for process management.

# deploy_app.yml
---
- name: Deploy Python Application
  hosts: all_app_servers
  become: yes
  vars:
    app_repo: "git@YOUR_GIT_HOST:your-org/your-python-app.git" # Replace YOUR_GIT_HOST with your Git server
    app_path: "/opt/your_app"
    venv_path: "{{ app_path }}/venv"
    gunicorn_workers: 3
    gunicorn_bind: "0.0.0.0:8000"
    db_host: "{{ hostvars[groups['all'][0]]['primary_app_db_endpoint'] }}" # Example: Use primary DB endpoint by default
    db_name: "your_db_name"
    db_user: "your_db_user"
    db_password: "your_db_password"

  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes

    - name: Install system dependencies (python3, pip, venv, git, etc.)
      apt:
        name:
          - python3
          - python3-pip
          - python3-venv
          - git
          - build-essential
        state: present

    - name: Create application directory
      file:
        path: "{{ app_path }}"
        state: directory
        owner: www-data # Or your application user
        group: www-data

    - name: Clone or update application repository
      git:
        repo: "{{ app_repo }}"
        dest: "{{ app_path }}"
        version: main # Or a specific tag/branch
        force: yes

    # The pip module needs `name` or `requirements`, so the venv is created
    # and populated in a single task.
    - name: Create virtual environment and install Python dependencies
      pip:
        requirements: "{{ app_path }}/requirements.txt"
        virtualenv: "{{ venv_path }}"
        virtualenv_command: /usr/bin/python3 -m venv

    - name: Copy Gunicorn systemd service file
      template:
        src: templates/gunicorn.service.j2
        dest: /etc/systemd/system/gunicorn.service
      notify: Restart Gunicorn

    - name: Ensure Gunicorn service is enabled and started
      systemd:
        name: gunicorn
        enabled: yes
        state: started
        daemon_reload: yes

  handlers:
    - name: Restart Gunicorn
      systemd:
        name: gunicorn
        state: restarted
        daemon_reload: yes

You’ll also need a Jinja2 template for the Gunicorn systemd service:

# templates/gunicorn.service.j2
[Unit]
Description=Gunicorn instance to serve your_app
After=network.target

[Service]
User=www-data
Group=www-data
WorkingDirectory={{ app_path }}
# Adjust 'your_app.wsgi:application' to your app's entry point.
# systemd does not support trailing comments, so keep comments on their own line.
ExecStart={{ venv_path }}/bin/gunicorn \
  --workers {{ gunicorn_workers }} \
  --bind {{ gunicorn_bind }} \
  your_app.wsgi:application

[Install]
WantedBy=multi-user.target

Run the playbook:

ansible-playbook -i inventory.ini deploy_app.yml

Database Replication and Failover

For disaster recovery, database availability is paramount. OVH Managed Databases for PostgreSQL support logical replication. We’ll configure the primary database in ‘GRA’ to replicate to the secondary database in ‘RBX’.

Step 1: Enable Logical Replication on Primary DB

You can do this via the OVH Control Panel or the OVH API. Ensure the following parameters are set for your primary PostgreSQL instance:

wal_level = logical
max_replication_slots = 5 # Adjust as needed
max_wal_senders = 5       # Adjust as needed

Restart the primary database instance for these changes to take effect.

Step 2: Create a Replication User

Connect to your primary database and create a user with replication privileges:

-- Connect to your primary database
CREATE USER replicator WITH REPLICATION PASSWORD 'your_replication_password';
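-- The replication role needs SELECT on the published tables for the initial copy
-- (adjust the schema name if your tables are not in "public")
GRANT SELECT ON ALL TABLES IN SCHEMA public TO replicator;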

Step 3: Configure Logical Replication Slot on Primary DB

On the primary database, create a replication slot. This slot tracks the WAL (Write-Ahead Log) data that needs to be sent to the replica.

-- Connect to your primary database
SELECT pg_create_logical_replication_slot('app_replication_slot', 'pgoutput');

Step 4: Configure Publication on Primary DB

Define what data to replicate. For a full database replication, you can publish all tables.

-- Connect to your primary database
CREATE PUBLICATION app_publication FOR ALL TABLES;

Step 5: Configure Subscription on Secondary DB

Connect to your secondary database instance (in ‘RBX’) and create a subscription to pull data from the primary.

-- Connect to your secondary database
CREATE SUBSCRIPTION app_subscription
    CONNECTION 'host=PRIMARY_DB_ENDPOINT port=5432 user=replicator password=your_replication_password dbname=your_db_name'
    PUBLICATION app_publication
    -- create_slot = false because the slot already exists on the primary;
    -- slot_name must match it, otherwise the subscription name is used as the slot name
    WITH (copy_data = true, create_slot = false, slot_name = 'app_replication_slot');

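Replace PRIMARY_DB_ENDPOINT with the actual endpoint of your primary database. Logical replication does not copy schema, so create the tables on the secondary before subscribing (for example by restoring a schema-only dump). With copy_data = true, the subscription first copies the existing rows of each published table and then streams ongoing changes.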
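
Passwords containing spaces, quotes, or backslashes will break a hand-written connection string. The sketch below shows one way to build it safely: libpq expects such values wrapped in single quotes with `\` and `'` backslash-escaped, and the result then has to be embedded as a SQL string literal, which doubles single quotes. The helper names are hypothetical and the sample password is made up.

```python
def quote_conninfo_value(value: str) -> str:
    """Quote one libpq keyword value: backslash-escape \\ and ', then wrap in single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_conninfo(**params: object) -> str:
    """Join keyword=value pairs into a libpq connection string."""
    return " ".join(f"{key}={quote_conninfo_value(str(val))}" for key, val in params.items())


def sql_literal(text: str) -> str:
    """Wrap text as a SQL string literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


conninfo = build_conninfo(
    host="PRIMARY_DB_ENDPOINT",
    port=5432,
    user="replicator",
    password="s3cret with 'quotes'",  # made-up example value
    dbname="your_db_name",
)
# The literal to paste after CONNECTION in CREATE SUBSCRIPTION:
print(sql_literal(conninfo))
```

Quoting every value, even simple ones, keeps the helper short; libpq accepts quoted values for any keyword.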

Step 6: Monitoring Replication Lag

You can monitor replication lag on the primary database by checking how far the replication slot trails the current WAL position. (Comparing against pg_current_wal_lsn() on the secondary would measure the secondary’s own WAL, not the primary’s.)

-- Connect to your primary database
SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS replication_lag_bytes
FROM pg_replication_slots
WHERE slot_name = 'app_replication_slot';

A consistently high replication_lag_bytes indicates an issue. You might need to adjust network bandwidth, database instance sizes, or the replication slot configuration.
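
To turn that number into an alert, a monitoring job can read the slot lag on the primary and compare it with thresholds. The sketch below is one possible shape, not a definitive implementation: it assumes a DB-API cursor that uses `%s` placeholders (as psycopg2 does) and is connected to the primary, and the 16 MiB and 256 MiB thresholds are arbitrary placeholders to tune for your workload.

```python
LAG_QUERY = """
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
FROM pg_replication_slots
WHERE slot_name = %s
"""


def fetch_lag_bytes(cursor, slot_name: str):
    """Return the slot's lag in bytes, or None if the slot is missing."""
    cursor.execute(LAG_QUERY, (slot_name,))
    row = cursor.fetchone()
    return None if row is None or row[0] is None else int(row[0])


def classify_lag(lag_bytes, warn_bytes=16 * 1024**2, crit_bytes=256 * 1024**2) -> str:
    """Map a lag value to "missing", "ok", "warning", or "critical"."""
    if lag_bytes is None:
        return "missing"  # no slot: replication is not set up or was dropped
    if lag_bytes >= crit_bytes:
        return "critical"
    if lag_bytes >= warn_bytes:
        return "warning"
    return "ok"
```

A "missing" result deserves the same urgency as "critical": without a slot, the primary keeps no WAL for the replica.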

Global Load Balancing and Health Checks

To enable seamless failover, we need a mechanism to direct traffic to the healthy region. OVH’s Load Balancer service can be configured for this. For true multi-region failover, consider a global DNS solution with health checking capabilities, or leverage OVH’s Global Load Balancing if available and suitable for your needs.

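For this example, we’ll assume a simpler setup where a single OVH Load Balancer is used, and we’ll manually or script the failover. Keep in mind that a load balancer hosted in one region is itself unavailable if that region goes down, so this setup mainly covers application or database failures. A more advanced setup would involve a global DNS provider (like Cloudflare, AWS Route 53, or OVH’s DNS) that can perform health checks across regions and update DNS records.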

Step 1: Configure OVH Load Balancer

Create an OVH Load Balancer and configure backend pools pointing to the public IPs of your application servers in each region. Define health check probes (e.g., HTTP GET on a health endpoint like /healthz).

Step 2: Implement Application Health Endpoint

Your Python application should expose a /healthz endpoint that returns a 200 OK status if the application is healthy and can connect to its database. This is crucial for the load balancer’s health checks.

# Example Flask health check endpoint
from flask import Flask, jsonify
from your_app.database import get_db_connection # Assume this function exists

app = Flask(__name__)

@app.route('/healthz')
def health_check():
    try:
        # Attempt to get a database connection to verify DB health
        conn = get_db_connection()
        conn.close()
        return jsonify({"status": "ok"}), 200
    except Exception as e:
        return jsonify({"status": "error", "message": str(e)}), 503

Step 3: Automated Failover Script

A script can monitor the health of the primary region. If health checks fail consistently, it can trigger a failover. This script would typically:

Query the health status of the primary region’s load balancer or application servers.
If primary is unhealthy, update the DNS record (if using global DNS) to point to the secondary region’s load balancer or IPs.
Alternatively, if using OVH Load Balancer directly, you might need to reconfigure its backend pools to disable the primary region and enable the secondary. This can be done via the OVH API.
Send notifications (e.g., Slack, email) about the failover event.
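
For the DNS option in the list above, a minimal sketch with OVH’s DNS zone API might look like this. The routes used (`/domain/zone/{zone}/record` and `/domain/zone/{zone}/refresh`) reflect my understanding of the OVH API and should be confirmed in the API console before use. The function name is hypothetical, and `client` is assumed to be an `ovh.Client` or any object with matching `get`, `put`, and `post` methods.

```python
def point_record_at(client, zone: str, subdomain: str, target_ip: str, ttl: int = 60) -> list:
    """Repoint every A record for subdomain.zone at target_ip; return the updated record IDs."""
    # Assumed route: list the IDs of matching A records.
    record_ids = client.get(f"/domain/zone/{zone}/record", fieldType="A", subDomain=subdomain)
    for record_id in record_ids:
        # Assumed route: update one record in place.
        client.put(f"/domain/zone/{zone}/record/{record_id}", target=target_ip, ttl=ttl)
    if record_ids:
        # Zone edits are only published after a refresh.
        client.post(f"/domain/zone/{zone}/refresh")
    return record_ids
```

Calling `point_record_at(client, "example.com", "app", SECONDARY_IP)` would then move `app.example.com` to the secondary region, where `SECONDARY_IP` is a placeholder for that region’s public address.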

Here’s a conceptual Python script using the OVH API to disable a backend pool (representing the primary region):

import ovh
import time
import requests # For health checks

# --- Configuration ---
OVH_ENDPOINT = "ovh-eu" # python-ovh expects an endpoint name, not a URL
APP_KEY = "YOUR_OVH_APPLICATION_KEY"
APP_SECRET = "YOUR_OVH_APPLICATION_SECRET"
CONSUMER_KEY = "YOUR_OVH_CONSUMER_KEY"
PROJECT_ID = "YOUR_OVH_PROJECT_ID"
LOADBALANCER_ID = "YOUR_LOADBALANCER_ID" # The ID of your OVH Load Balancer
PRIMARY_POOL_ID = "YOUR_PRIMARY_POOL_ID" # The ID of the backend pool for the primary region
SECONDARY_POOL_ID = "YOUR_SECONDARY_POOL_ID" # The ID of the backend pool for the secondary region
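# Must reach the primary region directly. A URL behind the shared LB turns
# healthy again as soon as the secondary serves traffic, which would trigger
# an immediate failback.
HEALTH_CHECK_URL = "http://primary.your-app.example.com/healthz"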

# --- Initialize OVH Client ---
client = ovh.Client(
    endpoint=OVH_ENDPOINT,
    application_key=APP_KEY,
    application_secret=APP_SECRET,
    consumer_key=CONSUMER_KEY,
)

def is_primary_region_healthy():
    """Checks if the primary region is healthy via an external health check URL."""
    try:
        response = requests.get(HEALTH_CHECK_URL, timeout=5)
        return response.status_code == 200
    except requests.exceptions.RequestException:
        return False

def update_loadbalancer_pool(pool_id, enabled):
    """Enables or disables a backend pool on the OVH Load Balancer."""
    try:
        # Illustrative route and field: confirm the pool endpoint in the OVH API console.
        client.put(f"/cloud/project/{PROJECT_ID}/loadbalancer/{LOADBALANCER_ID}/pool/{pool_id}",
                   status="active" if enabled else "disabled")
        print(f"Pool {pool_id} set to {'active' if enabled else 'disabled'}")
        return True
    except Exception as e:
        print(f"Error updating pool {pool_id}: {e}")
        return False

def perform_failover():
    """Initiates failover to the secondary region."""
    print("Primary region is unhealthy. Initiating failover...")
    if update_loadbalancer_pool(PRIMARY_POOL_ID, False):
        print("Primary pool disabled. Traffic should now be directed to the secondary pool.")
        # Optionally, enable secondary pool if it was disabled
        # update_loadbalancer_pool(SECONDARY_POOL_ID, True)
        # Send notification
        send_notification("Failover initiated: Primary region disabled.")
    else:
        print("Failed to disable primary pool. Manual intervention required.")
        send_notification("CRITICAL: Failover failed. Could not disable primary pool.")

def perform_failback():
    """Initiates failback to the primary region."""
    print("Primary region is healthy. Initiating failback...")
    if update_loadbalancer_pool(PRIMARY_POOL_ID, True):
        print("Primary pool re-enabled. Traffic should now be directed back to the primary pool.")
        # Optionally, disable secondary pool if it was enabled
        # update_loadbalancer_pool(SECONDARY_POOL_ID, False)
        send_notification("Failback initiated: Primary region re-enabled.")
    else:
        print("Failed to re-enable primary pool. Manual intervention required.")
        send_notification("WARNING: Failback failed. Could not re-enable primary pool.")

def send_notification(message):
    """Placeholder for sending notifications (e.g., Slack, email)."""
    print(f"NOTIFICATION: {message}")
    # Implement your notification logic here

def monitor_region_health():
    """Monitors health and triggers failover/failback."""
    primary_was_healthy = True
    while True:
        is_healthy = is_primary_region_healthy()
        if not is_healthy and primary_was_healthy:
            perform_failover()
            primary_was_healthy = False
        elif is_healthy and not primary_was_healthy:
            perform_failback()
            primary_was_healthy = True
        else:
            status = "healthy" if is_healthy else "unhealthy"
            print(f"Primary region is currently {status}. No action needed.")

        time.sleep(60) # Check every 60 seconds

if __name__ == "__main__":
    # Ensure you have configured your OVH API credentials and have the correct IDs
    # You might want to run this script on a separate monitoring instance.
    print("Starting region health monitoring...")
    monitor_region_health()
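
The loop above switches regions on a single failed probe, so one dropped request can trigger a failover. To act only when checks fail consistently, the decision can be separated from the probing and made to require a streak of contradicting results. The class below is a sketch of that idea; its name and default thresholds are hypothetical.

```python
class FailoverDecider:
    """Turn a stream of probe results into failover/failback decisions."""

    def __init__(self, fail_threshold: int = 3, recover_threshold: int = 5):
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.on_primary = True
        self._streak = 0  # consecutive probes that contradict the current state

    def observe(self, healthy: bool):
        """Record one probe result; return "failover", "failback", or None."""
        contradicts = (not healthy) if self.on_primary else healthy
        self._streak = self._streak + 1 if contradicts else 0
        needed = self.fail_threshold if self.on_primary else self.recover_threshold
        if self._streak < needed:
            return None
        # Enough consecutive evidence: flip the state and report the action.
        self._streak = 0
        self.on_primary = not self.on_primary
        return "failback" if self.on_primary else "failover"
```

Inside `monitor_region_health`, you would call `decider.observe(is_primary_region_healthy())` on each iteration and run `perform_failover()` or `perform_failback()` when it returns the matching action. Requiring a longer streak for failback than for failover helps avoid flapping while the primary is still recovering.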

Considerations for Production

Data Consistency: While logical replication is powerful, ensure your application handles potential data inconsistencies during failover, especially if replication lag is significant.
Stateful Applications: If your application stores state locally (e.g., file uploads, cache), ensure this state is also replicated or managed in a shared, multi-region accessible storage solution.
DNS Propagation: If using DNS for failover, be mindful of DNS propagation delays. Lowering TTLs can help but increases DNS query load.
Testing: Regularly test your failover and failback procedures. Simulate region outages to validate your automation and recovery time objectives (RTO).
Monitoring and Alerting: Implement comprehensive monitoring for all components: infrastructure health, application performance, database replication lag, and load balancer status. Set up alerts for critical events.
Security: Secure your API credentials, SSH keys, and database access. Use network security groups and firewalls to restrict access.
Cost Management: Running infrastructure in multiple regions incurs higher costs. Optimize instance sizes and database plans.
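
To make the DNS and RTO points concrete, a rough upper bound on client-visible downtime for DNS-based failover is the detection time plus the record’s TTL. The function below is a back-of-the-envelope sketch under simplifying assumptions: it ignores probe timeouts and resolvers that cache beyond the TTL, and the parameter names are hypothetical.

```python
def worst_case_failover_seconds(check_interval: int, fail_threshold: int,
                                dns_ttl: int, api_delay: int = 0) -> int:
    """Upper bound on downtime: detect the outage, apply the change, wait out cached DNS."""
    detection = check_interval * fail_threshold  # consecutive failed probes needed
    return detection + api_delay + dns_ttl


# 60 s probes, 3 consecutive failures required, 300 s TTL
print(worst_case_failover_seconds(60, 3, 300))  # prints 480
```

With these numbers, lowering the TTL to 60 seconds brings the bound down to 240 seconds, which shows why the TTL often dominates the recovery time.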

By combining Infrastructure as Code, automated deployment, robust database replication, and intelligent load balancing, you can build a highly available and resilient Python architecture on OVH Public Cloud, ensuring business continuity even in the face of regional failures.
