Vengala Vinay

Having 9+ Years of Experience in Software Development

Disaster Recovery 101: Architecting Auto-Failovers for Redis and C Deployments on OVH

Automated Redis Failover with Sentinel and OVH Load Balancers

Achieving high availability for critical services like Redis requires robust disaster recovery strategies. For deployments on OVH, a common pattern involves leveraging Redis Sentinel for automatic failover and integrating with OVH’s network infrastructure for seamless client redirection. This section details the architecture and configuration for such a setup.

The core of Redis high availability lies in Redis Sentinel. Sentinel is a distributed system that monitors Redis instances, handles automatic failover, and provides configuration discovery for clients. A typical Sentinel setup involves at least three Sentinel instances to ensure quorum and avoid split-brain scenarios.

Sentinel Configuration and Deployment

We’ll deploy Sentinel instances on separate OVH instances, ideally in different availability zones or even regions for maximum resilience. Each Sentinel instance needs to be configured to monitor the primary Redis master and its replicas.

Here’s a sample sentinel.conf file:

port 26379
sentinel monitor mymaster 192.168.1.100 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
sentinel auth-pass mymaster YOUR_REDIS_PASSWORD

# Logging and working directory
logfile "/var/log/redis/sentinel.log"
dir "/tmp"

Explanation:

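  • port 26379: The default port for Sentinel.
  • sentinel monitor mymaster 192.168.1.100 6379 2: This is the most crucial directive. It tells Sentinel to monitor a Redis master named ‘mymaster’ running on 192.168.1.100:6379. The ‘2’ is the quorum: at least two Sentinels must agree that the master is unreachable before it is marked as objectively down. Starting the failover additionally requires one Sentinel to be elected leader by a majority of all Sentinels (or by the quorum, if that is larger).
  • sentinel down-after-milliseconds mymaster 5000: If a master is unreachable for 5000 milliseconds (5 seconds), it’s considered down.
  • sentinel failover-timeout mymaster 60000: A timeout (in milliseconds) that several failover phases use, such as the maximum time to wait for replicas to be reconfigured. A Sentinel also waits twice this value before retrying a failover against the same master.
  • sentinel parallel-syncs mymaster 1: After a failover, only one replica at a time is reconfigured to resynchronize with the new master, so the remaining replicas stay available for reads. Only one replica is ever promoted to master.
  • sentinel auth-pass mymaster YOUR_REDIS_PASSWORD: If your Redis instances require authentication, provide the password here.
  • logfile and dir: The log location and Sentinel’s working directory. Sentinel also rewrites sentinel.conf at runtime to persist the current master and known peers, so that file must be writable by the Sentinel process.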
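The quorum and the leader vote are separate thresholds, which is why three Sentinels is the practical minimum. The sketch below models only that arithmetic and does not talk to Redis. The function name and parameters are hypothetical. It assumes the elected leader needs the larger of "a majority of all Sentinels" and the quorum.

```python
def failover_possible(total_sentinels: int, quorum: int, agreeing: int, reachable: int) -> bool:
    """Hypothetical model of the two Sentinel thresholds (no Redis involved).

    agreeing:  Sentinels that currently see the master as unreachable
    reachable: Sentinels that can reach each other to vote for a failover leader
    """
    objectively_down = agreeing >= quorum                 # quorum from `sentinel monitor`
    votes_needed = max(total_sentinels // 2 + 1, quorum)  # majority, or quorum if larger
    return objectively_down and reachable >= votes_needed


if __name__ == "__main__":
    # Three Sentinels, quorum 2 (the sample config): losing one Sentinel is survivable.
    print(failover_possible(3, 2, agreeing=2, reachable=2))  # True
    # A single surviving Sentinel may see the master as down but cannot win a vote.
    print(failover_possible(3, 2, agreeing=1, reachable=1))  # False
    # Five Sentinels, quorum 2: two can flag the master, but three must vote.
    print(failover_possible(5, 2, agreeing=2, reachable=2))  # False
```

With three Sentinels and a quorum of 2, one Sentinel can be lost and a failover still happens. An isolated minority cannot promote a replica on its own, and that is what guards against split-brain.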

To start Sentinel, assuming you have Redis installed:

redis-sentinel /etc/redis/sentinel.conf

It’s highly recommended to run Sentinel under a process manager like systemd for automatic restarts and monitoring.

OVH Load Balancer Integration

OVH’s load balancers (e.g., Load Balancer, Network Load Balancer) are a natural way to direct client traffic to the current Redis master. Sentinel itself doesn’t manage external network traffic; it only reports which node is the master. A load balancer normally cannot query Sentinel directly, so something has to translate Sentinel’s answer into load balancer configuration.

OVH Load Balancers typically support health checks, but a plain TCP check on the Redis port (6379) cannot tell a master from a replica. Replicas accept connections on 6379 too, so they pass the check and would receive writes that they reject by default. For true automated failover redirection, the health check must be role-aware, or the backend pool must be updated whenever Sentinel promotes a new master.
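The following minimal sketch shows what a role-aware probe looks like, using only the Python standard library. It sends the inline command `INFO replication` and passes only if the reply contains `role:master`.

Two assumptions apply. First, the instance needs no password: with requirepass set, an AUTH command would have to be sent first. Second, you still need to confirm whether your OVH load balancer can run a custom check like this. If it cannot, the same test can run inside an external updater script such as the one in Scenario 2.

```python
import socket
from typing import Optional


def parse_role(info_reply: str) -> Optional[str]:
    """Return the value of the role: field in an INFO replication reply, if present."""
    for line in info_reply.splitlines():
        if line.startswith("role:"):
            return line.split(":", 1)[1].strip()
    return None


def is_redis_master(host: str, port: int = 6379, timeout: float = 1.0) -> bool:
    """True only if host:port answers INFO replication with role:master.

    Assumes no AUTH is required; an error reply such as -NOAUTH counts as a failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"INFO replication\r\n")  # inline command form
            with sock.makefile("rb") as reader:
                header = reader.readline()         # bulk reply header, e.g. b"$412\r\n"
                if not header.startswith(b"$"):
                    return False
                length = int(header[1:].strip())
                if length < 0:
                    return False
                body = reader.read(length)
        return parse_role(body.decode("utf-8", "replace")) == "master"
    except (OSError, ValueError):
        # Connection refused, timeouts and malformed replies all mean "not a usable master".
        return False
```

A replica answers with `role:slave`, so it fails this probe even though its TCP port is open.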

Scenario 1: Static Configuration with Manual Update (Less Ideal)

In a simpler setup, you might manually update the load balancer’s backend server list when a failover occurs. This is reactive and not fully automated.

Scenario 2: Dynamic Load Balancer Configuration via API

The more robust solution involves using the OVH API to dynamically update the load balancer’s backend pool. A script, triggered by Sentinel events or running periodically, can query Sentinel for the current master and update the load balancer configuration.
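Sentinel can trigger such a script itself. The `sentinel client-reconfig-script <master-name> <script-path>` directive runs a script after a failover and passes it `<master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>`, where role is `leader` or `observer`.

Below is a minimal sketch of the receiving side. The script path is hypothetical, and the actual load balancer call is left as a placeholder. Only the Sentinel whose role is `leader` acts, so a three-Sentinel cluster does not issue three updates.

```python
#!/usr/bin/env python3
"""Hypothetical client-reconfig-script for Redis Sentinel (e.g. /usr/local/bin/redis_reconfig.py).

Sentinel calls it with: <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
"""
import sys
from typing import NamedTuple, Optional, Sequence


class Reconfig(NamedTuple):
    master_name: str
    role: str       # "leader" or "observer"
    state: str      # state string passed by Sentinel (not used here)
    from_ip: str
    from_port: int
    to_ip: str
    to_port: int


def parse_args(argv: Sequence[str]) -> Optional[Reconfig]:
    """Turn Sentinel's seven positional arguments into a Reconfig, or None if malformed."""
    if len(argv) != 7:
        return None
    name, role, state, from_ip, from_port, to_ip, to_port = argv
    try:
        return Reconfig(name, role, state, from_ip, int(from_port), to_ip, int(to_port))
    except ValueError:
        return None


def main(argv: Sequence[str]) -> int:
    event = parse_args(argv)
    if event is None:
        print("usage: <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>",
              file=sys.stderr)
        return 2  # Sentinel does not retry scripts that exit with 2 or higher
    if event.role != "leader":
        return 0  # let the leader Sentinel perform the update
    print(f"{event.master_name}: master moved {event.from_ip}:{event.from_port} "
          f"-> {event.to_ip}:{event.to_port}")
    # Placeholder: update the load balancer here (the call must be idempotent,
    # since the script may run more than once) and return 1 on a transient
    # failure so that Sentinel retries it.
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

It would be registered in sentinel.conf with a line such as `sentinel client-reconfig-script mymaster /usr/local/bin/redis_reconfig.py`. Sentinel retries a script that exits with 1 and gives up on exit codes of 2 or higher, which is why a usage error returns 2.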

First, ensure your Redis instances (master and replicas) are configured with appropriate network accessibility and potentially firewall rules to allow traffic from the load balancer and Sentinel. Also, ensure Sentinel can reach all Redis instances.

A Python script using the OVH API could look something like this (simplified):

import time

import ovh                                  # pip install ovh
from redis.exceptions import ConnectionError as RedisConnectionError
from redis.sentinel import Sentinel         # pip install redis

# --- Configuration ---
# List every Sentinel so master discovery survives the loss of one of them.
REDIS_SENTINELS = [
    ('your_sentinel_ip_1', 26379),
    ('your_sentinel_ip_2', 26379),
    ('your_sentinel_ip_3', 26379),
]
REDIS_MASTER_NAME = 'mymaster'
REDIS_PASSWORD = 'YOUR_REDIS_PASSWORD' # If applicable

OVH_ENDPOINT = 'ovh-eu' # e.g., 'ovh-eu', 'ovh-ca', 'ovh-us'
OVH_APPLICATION_KEY = 'YOUR_APP_KEY'
OVH_APPLICATION_SECRET = 'YOUR_APP_SECRET'
OVH_CONSUMER_KEY = 'YOUR_CONSUMER_KEY'

LOADBALANCER_ID = 'your_loadbalancer_id' # From OVH Control Panel
FRONTEND_ID = 'your_frontend_id' # From OVH Control Panel
POLL_INTERVAL = 30 # Seconds; clients may reach the old master for up to this long

# NOTE: The /cloud/loadBalancer/... paths and payload fields below are placeholders.
# The exact API structure depends on your OVH load balancer product; inspect it in
# the OVH API console and adjust get_backend() and the update payload accordingly.

# --- OVH API Client Initialization ---
client = ovh.Client(
    endpoint=OVH_ENDPOINT,
    application_key=OVH_APPLICATION_KEY,
    application_secret=OVH_APPLICATION_SECRET,
    consumer_key=OVH_CONSUMER_KEY
)

# --- Redis Sentinel Client Initialization ---
# Extra keyword arguments (password) apply to master/replica connections;
# sentinel_kwargs apply to the connections to the Sentinels themselves.
sentinel = Sentinel(
    REDIS_SENTINELS,
    password=REDIS_PASSWORD,
    sentinel_kwargs={'socket_timeout': 0.5}
)

def get_current_redis_master():
    """Asks Sentinel for the current master; returns (host, port) or (None, None)."""
    try:
        host, port = sentinel.discover_master(REDIS_MASTER_NAME)
        return host, int(port)
    except RedisConnectionError as e: # Also covers MasterNotFoundError
        print(f"Error discovering Redis master: {e}")
        return None, None

def get_backend(lb_id, frontend_id):
    """Returns (backend_path, backend_details) for the frontend's default backend."""
    frontend = client.get(f"/cloud/loadBalancer/{lb_id}/frontend/{frontend_id}")
    backend_id = frontend.get('defaultBackend', {}).get('id')
    if backend_id is None:
        raise LookupError("Could not find default backend ID for the frontend.")
    backend_path = f"/cloud/loadBalancer/{lb_id}/backend/{backend_id}"
    return backend_path, client.get(backend_path)

def update_ovh_loadbalancer(lb_id, frontend_id, new_master_ip, new_master_port):
    """Points the load balancer backend at the new Redis master; returns True on success."""
    print(f"Attempting to update OVH Load Balancer {lb_id} frontend {frontend_id}...")
    try:
        backend_path, backend_details = get_backend(lb_id, frontend_id)
        current_servers = backend_details.get('servers', [])
        print(f"Current backend servers: {current_servers}")

        # Avoid an unnecessary API call if the backend already points at the master.
        if (current_servers
                and current_servers[0].get('address') == new_master_ip
                and current_servers[0].get('port') == new_master_port):
            print("Redis master has not changed. No update needed.")
            return True

        # We assume the pool contains only the Redis master.
        # If you have multiple backends or complex routing, this needs adjustment.
        update_payload = {
            "servers": [{
                "address": new_master_ip,
                "port": new_master_port,
                "status": "active",
                "weight": 100,
                "ssl": False # Assuming no TLS for Redis
            }],
            # Preserve the existing backend settings
            "healthCheck": backend_details.get('healthCheck', {}),
            "stickySessions": backend_details.get('stickySessions', 'none'),
            "timeout": backend_details.get('timeout', 60)
        }

        print(f"Updating {backend_path} with new server: {new_master_ip}:{new_master_port}")
        # ovh.Client.put() sends its keyword arguments as the JSON request body.
        client.put(backend_path, **update_payload)
        print("OVH Load Balancer backend updated successfully.")
        return True

    except ovh.exceptions.APIError as e:
        print(f"OVH API Error updating load balancer: {e}")
        return False
    except Exception as e: # Keep the monitoring loop alive on unexpected errors
        print(f"An unexpected error occurred during OVH LB update: {e}")
        return False

def main_loop():
    """Periodically checks Sentinel and updates the load balancer when the master changes."""
    last_master = (None, None)

    while True:
        master_ip, master_port = get_current_redis_master()

        if master_ip is None:
            print("Could not determine current Redis master. Waiting...")
        elif (master_ip, master_port) != last_master:
            # The first iteration always lands here, which performs the initial sync.
            print(f"Detected Redis master change: {master_ip}:{master_port}")
            if update_ovh_loadbalancer(LOADBALANCER_ID, FRONTEND_ID, master_ip, master_port):
                last_master = (master_ip, master_port)
        else:
            print(f"Redis master is still {master_ip}:{master_port}. No change detected.")

        time.sleep(POLL_INTERVAL)

if __name__ == "__main__":
    print("Starting periodic load balancer monitoring...")
    main_loop()

Prerequisites for the Python script:

  • Install the OVH Python SDK: pip install ovh
  • Install the Redis Python client: pip install redis
  • Obtain OVH API credentials (Application Key, Application Secret, Consumer Key) by creating an API application with OVH. Grant the consumer key only the rights the script needs (GET and PUT on the load balancer paths it calls).
  • Identify your Load Balancer ID, Frontend ID, and the specific backend configuration within your OVH setup. The API paths in the script are placeholders; confirm the real ones for your load balancer product in the OVH Control Panel or the API console.
  • Ensure the script has network access to all of your Redis Sentinel instances and to the OVH API endpoint.

This script should be run on a reliable instance within your OVH environment, potentially managed by systemd to ensure it’s always running. The script periodically queries Sentinel for the current master and, if it differs from the last known master, updates the OVH Load Balancer’s backend configuration via the API.

Automated C/C++ Application Failover with Keepalived and OVH IPs

For stateless or stateful C/C++ applications that require high availability, a common pattern is to use a virtual IP (VIP) managed by a high-availability cluster solution like Keepalived. This VIP is then routed to the active application instance. OVH’s infrastructure allows for the management of these IPs, enabling seamless failover.

Keepalived Configuration for VIP Management

Keepalived uses the Virtual Router Redundancy Protocol (VRRP) to manage a shared IP address between two or more servers. One server holds the VIP and is considered “MASTER,” while the other(s) are “BACKUP” and take over if the MASTER fails.

We’ll deploy two or more application servers on OVH, each running Keepalived. These servers will share a dedicated OVH IP address that will be assigned to the VIP.

On each application server, you’ll need to install Keepalived:

sudo apt update && sudo apt install keepalived -y # For Debian/Ubuntu
sudo yum install keepalived -y # For CentOS/RHEL (use dnf on newer releases)

The primary configuration file is /etc/keepalived/keepalived.conf. Here’s a sample configuration for two nodes:

! Configuration File for keepalived

global_defs {
   router_id app_server_1 # Unique identifier for this node
   enable_script_security
}

vrrp_script chk_app_status {
    script "/usr/local/bin/check_app_status.sh" # A script to check your application's health
    interval 2                                 # Check every 2 seconds
    weight 20                                  # Add 20 to priority if script passes
    fall 2                                     # Require 2 failures for KO
    rise 2                                     # Require 2 successes for OK
}

vrrp_instance VI_1 {
    state BACKUP                               # Set to MASTER on one node, BACKUP on others
    interface eth0                              # Network interface to bind VIP to
    virtual_router_id 51                        # Must be the same for all nodes in the group
    priority 100                                # Higher priority wins (MASTER should have higher)
    advert_int 1                                # VRRP advertisement interval
    authentication {
        auth_type PASS
        auth_pass your_vrrp_password
    }
    virtual_ipaddress {
        192.168.1.200/24 dev eth0 label eth0:vip # The Virtual IP address
    }
    track_script {
        chk_app_status
    }
}

Configuration Breakdown:

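  • global_defs: Basic settings. router_id should be unique per server. enable_script_security stops Keepalived from running scripts as root if any part of their path is writable by a non-root user.
  • vrrp_script chk_app_status: Defines a script that Keepalived runs every interval seconds to check the health of your application. With a positive weight, the node’s priority is raised by weight while the script exits with status 0. After fall consecutive failures, the node drops back to its configured priority.
  • vrrp_instance VI_1: Defines a VRRP instance.
  • state: The initial state: MASTER on one server and BACKUP on the others. After startup, the role is decided by priority.
  • interface: The network interface where the VIP will be active.
  • virtual_router_id: A number from 1 to 255 identifying this VRRP group. All nodes in the group must use the same ID, and it must not clash with other VRRP groups on the same network segment.
  • priority: Determines which node becomes MASTER: the node with the highest effective priority wins. Because track_script adds weight only to healthy nodes, weight must be larger than the priority gap between nodes. Otherwise a MASTER whose health check fails keeps the VIP.
  • authentication: Simple password authentication for VRRP packets. Only the first 8 characters of auth_pass are used, and the password is sent in cleartext. Treat it as protection against misconfiguration rather than a security control.
  • virtual_ipaddress: The IP address that will be managed by Keepalived. The dev and label are important for binding the IP correctly.
  • track_script: Links the VRRP instance to the health check script.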
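The interaction between priority and weight is easy to get wrong, so here is the arithmetic as a small Python model (no Keepalived involved). It assumes app_server_1 is configured with priority 110 and app_server_2 with 100, both tracking chk_app_status with weight 20. The node names and the value 110 are illustrative.

```python
from typing import Dict, Tuple


def effective_priority(base: int, weight: int, check_ok: bool) -> int:
    """Priority with a positive-weight vrrp_script: base + weight while the check passes."""
    return base + weight if check_ok else base


def elect(nodes: Dict[str, Tuple[int, bool]], weight: int = 20) -> str:
    """Return the node that would hold the VIP: the highest effective priority.

    nodes maps name -> (configured priority, health check passing).
    Ties go to the first node listed here; real VRRP breaks ties by the higher IP address.
    """
    return max(nodes, key=lambda name: effective_priority(nodes[name][0], weight, nodes[name][1]))


if __name__ == "__main__":
    nodes = {"app_server_1": (110, True), "app_server_2": (100, True)}
    print(elect(nodes))            # app_server_1 (130 vs 120)
    nodes["app_server_1"] = (110, False)
    print(elect(nodes))            # app_server_2 (110 vs 120)
    print(elect(nodes, weight=5))  # app_server_1 (110 vs 105): weight too small, no failover
```

The last case is the misconfiguration to avoid. If the weight does not exceed the 10-point gap, a node with a failed application still outranks the healthy backup.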

You’ll need to create the /usr/local/bin/check_app_status.sh script. This script should exit with status 0 if the application is healthy and non-zero otherwise. For example:

#!/bin/bash
# Minimal check: only verifies that the application process is running
if pgrep -x "your_app_process_name" > /dev/null
then
    exit 0 # Application is running
else
    exit 1 # Application is not running
fi

Make the script executable: sudo chmod +x /usr/local/bin/check_app_status.sh.

Integrating with OVH Dedicated IPs

The VIP configured in Keepalived (e.g., 192.168.1.200 in the example) is a private IP. To make your application accessible from the internet, you need to associate this VIP with an OVH Dedicated IP (or Floating IP). OVH provides mechanisms to manage these IPs.

Option 1: Using OVH Floating IPs

OVH Floating IPs are designed for this purpose. You can allocate a Floating IP in your OVH Control Panel and then associate it with the primary network interface of your active application server. When a failover occurs, you need a mechanism to re-route the Floating IP to the new active server.

This re-routing can be automated. A script can be triggered by Keepalived’s state changes (e.g., via the notify_master, notify_backup, notify_fault directives in keepalived.conf) to call the OVH API and re-assign the Floating IP.

# Example keepalived.conf snippet with notify scripts
vrrp_instance VI_1 {
    # ... other configurations ...
    notify_master "/usr/local/bin/ovh_ip_manage.sh assign_floating_ip YOUR_FLOATING_IP_ID"
    notify_backup "/usr/local/bin/ovh_ip_manage.sh release_floating_ip YOUR_FLOATING_IP_ID"
    notify_fault "/usr/local/bin/ovh_ip_manage.sh release_floating_ip YOUR_FLOATING_IP_ID"
}

The ovh_ip_manage.sh script would use the OVH API (similar to the Python example for Redis) to associate or disassociate the Floating IP with the current server’s network interface. You’ll need to grant the OVH API credentials appropriate permissions to manage Floating IPs.
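Here is a minimal Python sketch of such a script, taking the same `assign_floating_ip` / `release_floating_ip` arguments as the notify lines. It rests on assumptions to verify in the OVH API console:

  • The address is an OVH Additional IP (the dedicated-server case).
  • It is re-routed with `POST /ip/{ip}/move` and a body of `{"to": <service name>}`.
  • The second argument is the IP block rather than an ID.
  • `OVH_SERVICE_NAME` is a hypothetical environment variable holding this server's service name.

Public Cloud Floating IPs use different API calls, so substitute those if that is what you use.

```python
#!/usr/bin/env python3
"""Hypothetical Python stand-in for /usr/local/bin/ovh_ip_manage.sh.

Assumes POST /ip/{ip}/move with body {"to": <service name>} re-routes the IP.
"""
import os
import sys
from urllib.parse import quote

# Hypothetical: the OVH service name of the server this script runs on.
THIS_SERVICE = os.environ.get("OVH_SERVICE_NAME", "nsXXXXXX.ip-XX-XX-XX.eu")


def build_request(action, ip_block, service):
    """Map a notify action to (API path, JSON body), or None when nothing must be called."""
    if action == "assign_floating_ip":
        # The "/" in an IP block such as 203.0.113.10/32 must be percent-encoded in the path.
        return f"/ip/{quote(ip_block, safe='')}/move", {"to": service}
    if action == "release_floating_ip":
        # No-op: the node that becomes MASTER moves the IP to itself.
        return None
    raise ValueError(f"unknown action: {action}")


def main(argv):
    if len(argv) != 2:
        print("usage: ovh_ip_manage <assign_floating_ip|release_floating_ip> <ip-block>",
              file=sys.stderr)
        return 2
    try:
        request = build_request(argv[0], argv[1], THIS_SERVICE)
    except ValueError as e:
        print(e, file=sys.stderr)
        return 2
    if request is None:
        return 0
    import ovh  # pip install ovh; Client() reads credentials from ovh.conf or OVH_* variables
    path, body = request
    ovh.Client().post(path, **body)  # keyword arguments become the JSON body
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Making "release" a no-op is a deliberate choice. A move is a single operation that re-routes the IP to its new holder, so the demoted node has nothing to undo. This also avoids a late notify_backup on the old MASTER racing with the new MASTER's assignment.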

Option 2: Using OVH Dedicated IPs with IP Routing Rules

If you are using Dedicated Servers with a block of IPs, you can configure IP routing rules within OVH. When a server fails, you can use the OVH API to change the routing rule for your dedicated IP to point to the standby server. This is conceptually similar to Floating IPs but might involve different API calls and configurations within OVH.

The key is that Keepalived manages the *internal* VIP, moving it to whichever node passes its health check with the highest priority. The OVH API then handles the *external* IP routing to direct internet traffic to that active node.

Application-Level Considerations

For stateful applications, ensure that state is either replicated or accessible from both nodes (e.g., shared storage, external database). If your application stores state locally, a failover will result in data loss unless that state is synchronized or persisted externally.

The check_app_status.sh script is critical. It should be sophisticated enough to detect not just if the process is running, but if the application is actually responsive and serving requests correctly. This might involve making a simple API call, checking a health endpoint, or attempting a basic operation.
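When the application exposes an HTTP health endpoint, the check can exercise it instead of only looking for the process. Below is a minimal sketch, assuming a hypothetical `http://127.0.0.1:8080/health` endpoint on each node that returns 200 when the application is ready. The `script` line of vrrp_script chk_app_status would point at this file instead of the shell script. The timeout is kept below the 2-second check interval so that a hung application fails the check rather than stalling it.

```python
#!/usr/bin/env python3
"""Hypothetical Python alternative to check_app_status.sh for Keepalived's vrrp_script.

Exits 0 only if the application's health endpoint answers HTTP 200 in time.
"""
import sys
import urllib.request

HEALTH_URL = "http://127.0.0.1:8080/health"  # hypothetical endpoint served by your application
TIMEOUT = 1.0  # seconds; below the vrrp_script interval (2 s in the sample config)


def is_healthy(url: str = HEALTH_URL, timeout: float = TIMEOUT) -> bool:
    """True if GET url returns status 200; errors, timeouts and other statuses are unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # URLError/HTTPError (error statuses, refused connections) and socket
        # timeouts are OSError subclasses; ValueError covers a malformed URL.
        return False


if __name__ == "__main__":
    sys.exit(0 if is_healthy() else 1)
```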

Testing your failover mechanism thoroughly is paramount. Simulate node failures, network partitions, and application crashes to ensure Keepalived and your OVH IP management scripts behave as expected.

Copyright © 2026 · Vinay Vengala