Disaster Recovery 101: Architecting Auto-Failovers for Redis and C Deployments on OVH
Automated Redis Failover with Sentinel and OVH Load Balancers
Achieving high availability for critical services like Redis requires robust disaster recovery strategies. For deployments on OVH, a common pattern involves leveraging Redis Sentinel for automatic failover and integrating with OVH’s network infrastructure for seamless client redirection. This section details the architecture and configuration for such a setup.
The core of Redis high availability lies in Redis Sentinel. Sentinel is a distributed system that monitors Redis instances, handles automatic failover, and provides configuration discovery for clients. A typical Sentinel setup involves at least three Sentinel instances to ensure quorum and avoid split-brain scenarios.
Sentinel Configuration and Deployment
We’ll deploy Sentinel instances on separate OVH instances, ideally in different availability zones or even regions for maximum resilience. Each Sentinel instance needs to be configured to monitor the primary Redis master and its replicas.
Here’s a sample sentinel.conf file:
port 26379
sentinel monitor mymaster 192.168.1.100 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
# Only needed if your Redis instances require a password
sentinel auth-pass mymaster YOUR_REDIS_PASSWORD
# Logging
logfile "/var/log/redis/sentinel.log"
dir "/tmp"
Explanation:
- port 26379: The default port for Sentinel.
- sentinel monitor mymaster 192.168.1.100 6379 2: This is the most crucial directive. It tells Sentinel to monitor a Redis master named ‘mymaster’ running on 192.168.1.100:6379. The ‘2’ is the quorum: at least two Sentinels must agree that the master is unreachable before it is marked as down. The failover itself must additionally be authorized by a majority of all Sentinels.
- sentinel down-after-milliseconds mymaster 5000: If the master is unreachable for 5000 milliseconds (5 seconds), this Sentinel considers it down.
- sentinel failover-timeout mymaster 60000: The maximum time (in milliseconds) to wait for a failover to complete.
- sentinel parallel-syncs mymaster 1: The number of replicas that may be reconfigured to replicate from the new master at the same time after a failover. With a value of 1, replicas resynchronize one at a time, so the others stay available for reads.
- sentinel auth-pass mymaster YOUR_REDIS_PASSWORD: If your Redis instances require authentication, provide the password here.
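The relationship between the quorum and the majority is easy to misread, so here is a small sketch of the arithmetic. It is an illustration only, not Sentinel code: the function name `failover_possible` and its parameters are made up for this example. It assumes that marking the master as down needs `quorum` agreeing Sentinels, and that electing a failover leader needs votes from a majority of all Sentinels (or from `quorum` Sentinels, if the quorum is larger than the majority).

```python
def failover_possible(total_sentinels: int, quorum: int,
                      agreeing_down: int, reachable: int) -> bool:
    """Return True if a failover can be both triggered and authorized.

    total_sentinels: all Sentinels configured for this master
    quorum:          last argument of the `sentinel monitor` line
    agreeing_down:   Sentinels that currently see the master as down
    reachable:       Sentinels that can still talk to each other and vote
    """
    majority = total_sentinels // 2 + 1
    marked_down = agreeing_down >= quorum
    # The elected leader needs the majority, and at least `quorum` votes.
    authorized = reachable >= max(majority, quorum)
    return marked_down and authorized


if __name__ == "__main__":
    # Three Sentinels with quorum 2, as in the sample configuration.
    print(failover_possible(3, 2, agreeing_down=2, reachable=2))
    # Five Sentinels, quorum 2, but only two of them can reach each other.
    print(failover_possible(5, 2, agreeing_down=2, reachable=2))
```

The second call shows why a low quorum alone does not cause a failover in a minority partition: two Sentinels can flag the master as down, but they cannot collect the three votes needed to act on it.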
To start Sentinel, assuming you have Redis installed:
redis-sentinel /etc/redis/sentinel.conf
It’s highly recommended to run Sentinel under a process manager like systemd for automatic restarts and monitoring.
OVH Load Balancer Integration
OVH’s load balancers (e.g., Load Balancer, Network Load Balancer) are essential for directing client traffic to the current Redis master. Sentinel itself doesn’t directly manage external network traffic; it provides the information. The load balancer needs to be configured to query Sentinel for the master’s IP address and port.
OVH Load Balancers typically support health checks. A common approach is a simple TCP check on the Redis port (6379) of each *potential* master. Such a check only confirms that a Redis process is accepting connections; it passes on replicas as well as on the master, so on its own it cannot steer writes to the right node. For true automated failover redirection, the backend configuration has to be updated dynamically based on what Sentinel reports.
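One way to tell a master from a replica is the `role` field in the output of the Redis `INFO replication` command (for example from `redis-cli -p 6379 INFO replication`). The sketch below only parses that text; the helper name `parse_replication_role` is hypothetical, and how you wire the result into a health check depends on your load balancer.

```python
def parse_replication_role(info_text: str) -> str:
    """Extract the value of the `role` field from INFO replication output."""
    for line in info_text.splitlines():
        line = line.strip()
        if line.startswith("role:"):
            return line.split(":", 1)[1]
    raise ValueError("no role field found in INFO output")


if __name__ == "__main__":
    # Shortened sample of what a master returns for INFO replication.
    sample = "# Replication\r\nrole:master\r\nconnected_slaves:2\r\n"
    print(parse_replication_role(sample))
```

A replica reports `role:slave` in the same field, so a check built on this can reject nodes that accept TCP connections but are not the current master.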
Scenario 1: Static Configuration with Manual Update (Less Ideal)
In a simpler setup, you might manually update the load balancer’s backend server list when a failover occurs. This is reactive and not fully automated.
Scenario 2: Dynamic Load Balancer Configuration via API
The more robust solution involves using the OVH API to dynamically update the load balancer’s backend pool. A script, triggered by Sentinel events or running periodically, can query Sentinel for the current master and update the load balancer configuration.
First, ensure your Redis instances (master and replicas) are configured with appropriate network accessibility and potentially firewall rules to allow traffic from the load balancer and Sentinel. Also, ensure Sentinel can reach all Redis instances.
A Python script using the OVH API could look something like this (simplified):
import json
import sys
import time

import ovh
import redis

# --- Configuration ---
REDIS_SENTINEL_HOST = 'your_sentinel_ip'
REDIS_SENTINEL_PORT = 26379
REDIS_MASTER_NAME = 'mymaster'
REDIS_PASSWORD = 'YOUR_REDIS_PASSWORD'  # If applicable
OVH_ENDPOINT = 'ovh-eu'  # e.g., 'ovh-eu', 'ovh-ca', 'ovh-us'
OVH_APPLICATION_KEY = 'YOUR_APP_KEY'
OVH_APPLICATION_SECRET = 'YOUR_APP_SECRET'
OVH_CONSUMER_KEY = 'YOUR_CONSUMER_KEY'
LOADBALANCER_ID = 'your_loadbalancer_id'  # From OVH Control Panel
FRONTEND_ID = 'your_frontend_id'  # From OVH Control Panel
BACKEND_POOL_NAME = 'redis_backend'  # A logical name for your backend pool

# --- OVH API Client Initialization ---
client = ovh.Client(
    endpoint=OVH_ENDPOINT,
    application_key=OVH_APPLICATION_KEY,
    application_secret=OVH_APPLICATION_SECRET,
    consumer_key=OVH_CONSUMER_KEY
)

# --- Redis Sentinel Client Initialization ---
sentinel = redis.Sentinel(
    [(REDIS_SENTINEL_HOST, REDIS_SENTINEL_PORT)],
    password=REDIS_PASSWORD,  # Used for connections to the Redis master/replicas
    sentinel_kwargs={'socket_timeout': 0.5}
)
try:
    # Test the connection by asking Sentinel for the current master
    sentinel.discover_master(REDIS_MASTER_NAME)
    print("Successfully connected to Redis Sentinel.")
except redis.exceptions.ConnectionError as e:
    # MasterNotFoundError is a subclass of ConnectionError
    print(f"Error connecting to Redis Sentinel: {e}")
    sys.exit(1)

def get_current_redis_master():
    """Fetches the current Redis master IP and port from Sentinel."""
    try:
        # discover_master() returns the (host, port) pair that Sentinel
        # currently reports for the monitored master.
        host, port = sentinel.discover_master(REDIS_MASTER_NAME)
        return host, port
    except redis.exceptions.ConnectionError as e:
        # Also covers MasterNotFoundError, which subclasses ConnectionError
        print(f"Error discovering Redis master: {e}")
        return None, None
    except Exception as e:
        print(f"An unexpected error occurred while querying Sentinel: {e}")
        return None, None

def get_ovh_backend_servers(lb_id, frontend_id):
    """Retrieves current backend servers for a given frontend."""
    try:
        path = f"/cloud/loadBalancer/{lb_id}/frontend/{frontend_id}"
        response = client.get(path)
        # Assuming the backend servers are listed under 'defaultBackend' or similar
        # This part is highly dependent on the exact OVH Load Balancer API structure
        # You might need to inspect the API response for your specific LB type.
        # For example, if it's a simple pool:
        if 'defaultBackend' in response and 'id' in response['defaultBackend']:
            backend_id = response['defaultBackend']['id']
            backend_path = f"/cloud/loadBalancer/{lb_id}/backend/{backend_id}"
            backend_details = client.get(backend_path)
            return backend_details.get('servers', [])
        else:
            print("Could not find default backend configuration in frontend details.")
            return []
    except ovh.exceptions.APIError as e:
        print(f"OVH API Error getting backend servers: {e}")
        return []
    except Exception as e:
        print(f"An unexpected error occurred while fetching OVH backend servers: {e}")
        return []

def update_ovh_loadbalancer(lb_id, frontend_id, new_master_ip, new_master_port):
    """Updates the OVH Load Balancer backend with the new Redis master."""
    print(f"Attempting to update OVH Load Balancer {lb_id} frontend {frontend_id}...")
    try:
        # First, get the current frontend configuration to find the backend ID.
        # As above, these API paths depend on your specific LB type.
        frontend_path = f"/cloud/loadBalancer/{lb_id}/frontend/{frontend_id}"
        frontend_details = client.get(frontend_path)
        if 'defaultBackend' not in frontend_details or 'id' not in frontend_details['defaultBackend']:
            print("Error: Could not find default backend ID for the frontend.")
            return False
        backend_id = frontend_details['defaultBackend']['id']
        backend_path = f"/cloud/loadBalancer/{lb_id}/backend/{backend_id}"
        backend_details = client.get(backend_path)
        current_servers = backend_details.get('servers', [])
        print(f"Current backend servers: {current_servers}")

        # Construct the new server list.
        # We assume a single Redis master in the backend pool.
        # If you have multiple backends or complex routing, this needs adjustment.
        new_server_config = {
            "address": new_master_ip,
            "port": new_master_port,
            "status": "active",  # Or 'backup' if you want to test failover
            "weight": 100,
            "ssl": False  # Assuming no SSL for Redis
        }

        # Check if the master IP/Port has actually changed to avoid unnecessary API calls
        if current_servers and current_servers[0]['address'] == new_master_ip and current_servers[0]['port'] == new_master_port:
            print("Redis master has not changed. No update needed.")
            return True

        # Prepare the update payload for the backend
        update_payload = {
            "servers": [new_server_config],
            "healthCheck": backend_details.get('healthCheck', {}),  # Preserve existing health check config
            "stickySessions": backend_details.get('stickySessions', 'none'),
            "timeout": backend_details.get('timeout', 60)
        }

        # Update the backend. python-ovh serializes keyword arguments into
        # the JSON request body, so the payload is passed as kwargs.
        print(f"Updating backend {backend_id} with new server: {new_master_ip}:{new_master_port}")
        client.put(backend_path, **update_payload)
        print("OVH Load Balancer backend updated successfully.")
        return True
    except ovh.exceptions.APIError as e:
        print(f"OVH API Error updating load balancer: {e}")
        return False
    except Exception as e:
        print(f"An unexpected error occurred during OVH LB update: {e}")
        return False

def main_loop():
    """Main loop to periodically check and update the load balancer."""
    last_master_ip = None
    last_master_port = None
    while True:
        master_ip, master_port = get_current_redis_master()
        if master_ip and master_port:
            if master_ip != last_master_ip or master_port != last_master_port:
                print(f"Detected Redis master change: {master_ip}:{master_port}")
                success = update_ovh_loadbalancer(LOADBALANCER_ID, FRONTEND_ID, master_ip, master_port)
                if success:
                    last_master_ip = master_ip
                    last_master_port = master_port
            else:
                print(f"Redis master is still {master_ip}:{master_port}. No change detected.")
        else:
            print("Could not determine current Redis master. Waiting...")
        time.sleep(30)  # Check every 30 seconds


if __name__ == "__main__":
    # --- Initial Check and Update ---
    print("Performing initial check and update...")
    master_ip, master_port = get_current_redis_master()
    if master_ip and master_port:
        update_ovh_loadbalancer(LOADBALANCER_ID, FRONTEND_ID, master_ip, master_port)
        last_master_ip = master_ip
        last_master_port = master_port
    else:
        print("Failed to get initial Redis master. Load balancer may not be updated.")
    # --- Start Periodic Monitoring ---
    print("Starting periodic load balancer monitoring...")
    main_loop()
Prerequisites for the Python script:
- Install the OVH Python SDK: pip install ovh
- Install the Redis Python client: pip install redis
- Obtain OVH API credentials (Application Key, Secret, Consumer Key) by creating an application in the OVH Control Panel. Grant it the necessary permissions (e.g., GET and PUT on /cloud/loadBalancer).
- Identify your Load Balancer ID, Frontend ID, and the specific backend configuration within your OVH setup. This often requires inspecting the Load Balancer’s configuration in the OVH Control Panel or via the API.
- Ensure the script has network access to both your Redis Sentinel instance and the OVH API endpoints.
This script should be run on a reliable instance within your OVH environment, potentially managed by systemd to ensure it’s always running. The script periodically queries Sentinel for the current master and, if it differs from the last known master, updates the OVH Load Balancer’s backend configuration via the API.
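Polling every 30 seconds leaves a window in which the load balancer still points at the old master. An event-driven alternative is Sentinel’s `client-reconfig-script` directive, which runs a script when a failover changes the master and passes it seven arguments: master name, role, state, from-ip, from-port, to-ip, to-port. The following is a minimal sketch of the argument handling only; the `FailoverEvent` type and the `parse_reconfig_args` helper are names invented for this example.

```python
import sys
from typing import NamedTuple, Sequence


class FailoverEvent(NamedTuple):
    master_name: str
    role: str       # "leader" or "observer"
    state: str      # Sentinel currently always passes "failover"
    from_ip: str
    from_port: int
    to_ip: str
    to_port: int


def parse_reconfig_args(argv: Sequence[str]) -> FailoverEvent:
    """Turn the seven arguments passed by Sentinel into a FailoverEvent."""
    if len(argv) != 7:
        raise ValueError(f"expected 7 arguments, got {len(argv)}")
    name, role, state, from_ip, from_port, to_ip, to_port = argv
    return FailoverEvent(name, role, state, from_ip, int(from_port),
                         to_ip, int(to_port))


if __name__ == "__main__" and len(sys.argv) == 8:
    event = parse_reconfig_args(sys.argv[1:])
    print(f"{event.master_name}: master moved to {event.to_ip}:{event.to_port}")
    # A real script would now update the load balancer, for example by
    # calling update_ovh_loadbalancer() from the script above.
```

The script would be registered in sentinel.conf with a line such as `sentinel client-reconfig-script mymaster /path/to/script`, where the path is a placeholder. Keeping the periodic loop as a fallback is reasonable, since a script run can fail.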
Automated C/C++ Application Failover with Keepalived and OVH IPs
For stateless or stateful C/C++ applications that require high availability, a common pattern is to use a virtual IP (VIP) managed by a high-availability cluster solution like Keepalived. This VIP is then routed to the active application instance. OVH’s infrastructure allows for the management of these IPs, enabling seamless failover.
Keepalived Configuration for VIP Management
Keepalived uses the Virtual Router Redundancy Protocol (VRRP) to manage a shared IP address between two or more servers. One server holds the VIP and is considered “MASTER,” while the other(s) are “BACKUP” and take over if the MASTER fails.
We’ll deploy two or more application servers on OVH, each running Keepalived. These servers will share a dedicated OVH IP address that will be assigned to the VIP.
On each application server, you’ll need to install Keepalived:
sudo apt update && sudo apt install keepalived -y   # For Debian/Ubuntu
sudo yum update && sudo yum install keepalived -y   # For CentOS/RHEL
The primary configuration file is /etc/keepalived/keepalived.conf. Here’s a sample configuration for two nodes:
! Configuration File for keepalived
global_defs {
    router_id app_server_1 # Unique identifier for this node
    enable_script_security
}
vrrp_script chk_app_status {
    script "/usr/local/bin/check_app_status.sh" # A script to check your application's health
    interval 2 # Check every 2 seconds
    weight 20 # Add 20 to priority if script passes
    fall 2 # Require 2 failures for KO
    rise 2 # Require 2 successes for OK
}
vrrp_instance VI_1 {
    state BACKUP # Set to MASTER on one node, BACKUP on others
    interface eth0 # Network interface to bind VIP to
    virtual_router_id 51 # Must be the same for all nodes in the group
    priority 100 # Higher priority wins (MASTER should have higher)
    advert_int 1 # VRRP advertisement interval
    authentication {
        auth_type PASS
        auth_pass your_vrrp_password
    }
    virtual_ipaddress {
        192.168.1.200/24 dev eth0 label eth0:vip # The Virtual IP address
    }
    track_script {
        chk_app_status
    }
}
Configuration Breakdown:
- global_defs: Basic settings. router_id should be unique per server.
- vrrp_script chk_app_status: Defines a script that Keepalived will execute to check the health of your application. While the script exits with status 0 (success), the node’s priority is increased by weight.
- vrrp_instance VI_1: Defines a VRRP instance.
  - state: The initial state. Set to MASTER on one server and BACKUP on the others.
  - interface: The network interface where the VIP will be active.
  - virtual_router_id: A number (1-255) identifying this VRRP group. All nodes in the group must use the same ID, and no other VRRP group on the same network segment may use it.
  - priority: Determines which node becomes MASTER. The node with the highest priority wins. The track_script can dynamically increase this priority.
  - authentication: Simple password authentication for VRRP packets.
  - virtual_ipaddress: The IP address that will be managed by Keepalived. The dev and label are important for binding the IP correctly.
  - track_script: Links the VRRP instance to the health check script.
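How priority and weight interact decides whether a failed health check causes a failover at all, so a short worked example may help. It assumes Keepalived’s documented behavior for tracked scripts: a positive weight is added while the check succeeds, a negative weight is subtracted while it fails, and the result stays within 1 to 254. The function name and the base priorities of 110 and 100 are chosen for illustration only.

```python
def effective_priority(base: int, weight: int, check_ok: bool) -> int:
    """Priority a node advertises, given one tracked script."""
    priority = base
    if weight > 0 and check_ok:
        priority += weight      # bonus while the check passes
    elif weight < 0 and not check_ok:
        priority += weight      # penalty while the check fails
    return max(1, min(254, priority))


if __name__ == "__main__":
    weight = 20
    # Both applications healthy: the primary (base 110) stays MASTER.
    print(effective_priority(110, weight, True),
          effective_priority(100, weight, True))
    # Application on the primary fails: the standby now has the higher value.
    print(effective_priority(110, weight, False),
          effective_priority(100, weight, True))
```

The takeaway is that the gap between the base priorities has to be smaller than the weight. With bases of 130 and 100 and a weight of 20, the primary would keep the VIP even while its application is down.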
You’ll need to create the /usr/local/bin/check_app_status.sh script. This script should return 0 if the application is healthy and non-zero otherwise. For example:
#!/bin/bash
# Check if your application process is running or if a critical port is open
if pgrep -x "your_app_process_name" > /dev/null
then
    exit 0 # Application is running
else
    exit 1 # Application is not running
fi
Make the script executable: sudo chmod +x /usr/local/bin/check_app_status.sh.
Integrating with OVH Dedicated IPs
The VIP configured in Keepalived (e.g., 192.168.1.200 in the example) is a private IP. To make your application accessible from the internet, you need to associate this VIP with an OVH Dedicated IP (or Floating IP). OVH provides mechanisms to manage these IPs.
Option 1: Using OVH Floating IPs
OVH Floating IPs are designed for this purpose. You can allocate a Floating IP in your OVH Control Panel and then associate it with the primary network interface of your active application server. When a failover occurs, you need a mechanism to re-route the Floating IP to the new active server.
This re-routing can be automated. A script can be triggered by Keepalived’s state changes (e.g., via the notify_master, notify_backup, notify_fault directives in keepalived.conf) to call the OVH API and re-assign the Floating IP.
# Example keepalived.conf snippet with notify scripts
vrrp_instance VI_1 {
    # ... other configurations ...
    notify_master "/usr/local/bin/ovh_ip_manage.sh assign_floating_ip YOUR_FLOATING_IP_ID"
    notify_backup "/usr/local/bin/ovh_ip_manage.sh release_floating_ip YOUR_FLOATING_IP_ID"
    notify_fault "/usr/local/bin/ovh_ip_manage.sh release_floating_ip YOUR_FLOATING_IP_ID"
}
The ovh_ip_manage.sh script would use the OVH API (similar to the Python example for Redis) to associate or disassociate the Floating IP with the current server’s network interface. You’ll need to grant the OVH API credentials appropriate permissions to manage Floating IPs.
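If the notify script is written in Python rather than shell, its command-line handling could look like the sketch below. It mirrors the `assign_floating_ip` and `release_floating_ip` actions from the snippet above; the `dispatch` function and the placeholder handlers are hypothetical, and the actual OVH API calls are deliberately left out.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def dispatch(argv: Sequence[str],
             handlers: Dict[str, Callable[[str], None]]) -> int:
    """Run the handler named by argv[0] with the IP identifier in argv[1].

    Returns a process exit code: 0 on success, 2 on bad usage.
    """
    if len(argv) != 2 or argv[0] not in handlers:
        return 2
    action, ip_id = argv
    handlers[action](ip_id)
    return 0


if __name__ == "__main__":
    # Placeholder handlers that only record the call; a real script
    # would talk to the OVH API here.
    calls: List[Tuple[str, str]] = []
    handlers = {
        "assign_floating_ip": lambda ip_id: calls.append(("assign", ip_id)),
        "release_floating_ip": lambda ip_id: calls.append(("release", ip_id)),
    }
    code = dispatch(["assign_floating_ip", "YOUR_FLOATING_IP_ID"], handlers)
    print(code, calls)
```

Returning an exit code instead of exiting inside the function keeps the logic easy to test without Keepalived running.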
Option 2: Using OVH Dedicated IPs with IP Routing Rules
If you are using Dedicated Servers with a block of IPs, you can configure IP routing rules within OVH. When a server fails, you can use the OVH API to change the routing rule for your dedicated IP to point to the standby server. This is conceptually similar to Floating IPs but might involve different API calls and configurations within OVH.
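As a rough sketch of what such an API call might look like with the python-ovh client: the example below assumes that the IP is moved with a `POST /ip/{ip}/move` request carrying a `to` parameter that names the target service. Treat that path and parameter as an assumption to verify in the OVH API console for your product. The IP block and service name are placeholders, and the helper names are invented for this example.

```python
from typing import Any, Dict, Tuple
from urllib.parse import quote


def build_move_request(ip_block: str, target_service: str) -> Tuple[str, Dict[str, str]]:
    """Return (path, params) for moving ip_block to target_service.

    The IP block contains a slash, so it must be URL-encoded before it is
    placed in the request path.
    """
    path = f"/ip/{quote(ip_block, safe='')}/move"
    return path, {"to": target_service}


def move_ip(client: Any, ip_block: str, target_service: str) -> Any:
    """Send the request through an object with a python-ovh style post()."""
    path, params = build_move_request(ip_block, target_service)
    return client.post(path, **params)


if __name__ == "__main__":
    # Placeholder values: a documentation IP range and a made-up service name.
    print(build_move_request("203.0.113.10/32", "your-standby-server-name"))
```

Separating the request construction from the network call makes it possible to check the encoded path without credentials.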
The key is that Keepalived manages the *internal* VIP, ensuring your application process is running and ready on the active node. The OVH API then handles the *external* IP routing to direct internet traffic to that active node.
Application-Level Considerations
For stateful applications, ensure that state is either replicated or accessible from both nodes (e.g., shared storage, external database). If your application stores state locally, a failover will result in data loss unless that state is synchronized or persisted externally.
The check_app_status.sh script is critical. It should be sophisticated enough to detect not just if the process is running, but if the application is actually responsive and serving requests correctly. This might involve making a simple API call, checking a health endpoint, or attempting a basic operation.
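A step beyond checking for a process is checking that the application accepts connections. Keepalived can run any executable as a check, so the sketch below uses Python and a plain TCP connect. It assumes your application listens on a TCP port, and the function name `port_is_open` is made up for this example. A production check would go further and send an application-level request.

```python
import socket
import sys


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: check_app_port.py HOST PORT
    # Exit code 0 means healthy, 1 means unhealthy, as Keepalived expects.
    sys.exit(0 if port_is_open(sys.argv[1], int(sys.argv[2])) else 1)
```

The timeout should stay well below the `interval` of the `vrrp_script` block, so that a hanging application does not stall the check itself.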
Testing your failover mechanism thoroughly is paramount. Simulate node failures, network partitions, and application crashes to ensure Keepalived and your OVH IP management scripts behave as expected.