Disaster Recovery 101: Architecting Auto-Failovers for Redis and Magento 2 Deployments on OVH

Automated Redis Failover with Sentinel and OVH Load Balancers

In a high-availability Magento 2 deployment, Redis is a critical component: it typically backs the default cache and the full page cache, and often session storage, so losing the Redis master quickly degrades the whole storefront. Automated failover for Redis is therefore paramount to minimizing downtime. We’ll use Redis Sentinel for automatic master election and failover, coupled with OVH’s load balancing infrastructure to direct traffic to the healthy Redis master.

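This setup assumes a multi-instance Redis deployment across different availability zones, or even regions, within OVH for true disaster recovery. A typical configuration involves at least three Redis instances: one master and two or more replicas. Sentinel daemons monitor these instances. Run at least three Sentinels on hosts that fail independently (commonly one beside each Redis instance), so that a majority of Sentinels survives the loss of any single host or zone.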
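A quick way to sanity-check a proposed layout is to confirm that losing any one zone still leaves a majority of Sentinels, since a majority must vote before any failover can start. The sketch below is a minimal Python helper under that assumption; the Sentinel and zone names are hypothetical, and it ignores where the Redis data nodes themselves run.

```python
from collections import Counter
from typing import Mapping


def survives_any_zone_loss(sentinel_zones: Mapping[str, str]) -> bool:
    """Return True if a Sentinel majority remains after losing any single zone.

    sentinel_zones maps a (hypothetical) Sentinel name to the zone hosting it.
    """
    total = len(sentinel_zones)
    majority = total // 2 + 1
    per_zone = Counter(sentinel_zones.values())
    # Worst case: the zone hosting the most Sentinels goes offline.
    largest_zone = max(per_zone.values(), default=0)
    return total - largest_zone >= majority


# One Sentinel per zone tolerates the loss of any zone.
print(survives_any_zone_loss({"sentinel-1": "zone-a", "sentinel-2": "zone-b", "sentinel-3": "zone-c"}))  # True
# Two of three Sentinels in one zone do not.
print(survives_any_zone_loss({"sentinel-1": "zone-a", "sentinel-2": "zone-a", "sentinel-3": "zone-b"}))  # False
```

With only two zones no layout passes this check, which is why a third location, even a small VM that hosts only a Sentinel, is commonly added.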

Redis Sentinel Configuration

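Each Redis Sentinel instance needs to be configured to monitor the Redis master and its replicas. Start every Sentinel from the same monitoring directives in its configuration file, typically sentinel.conf, but do not copy a running Sentinel’s file to other nodes: Sentinel rewrites sentinel.conf at runtime (recording its own unique sentinel myid, discovered replicas, and peer Sentinels), so the file must be writable by the Sentinel process and will diverge between nodes.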

Here’s a sample sentinel.conf for a Sentinel node:

# sentinel.conf

port 26379
daemonize yes
pidfile /var/run/redis_sentinel.pid
logfile /var/log/redis_sentinel.log

# Monitor the Redis master. 'mymaster' is the name of the master,
# 192.168.1.100 6379 is the IP and port of the master,
# and 2 is the quorum: the number of Sentinels that must agree
# that the master is unreachable before it is flagged as objectively
# down. Starting the failover additionally requires a majority of all
# Sentinels to elect a leader. With three Sentinels, a quorum of 2 is typical.
sentinel monitor mymaster 192.168.1.100 6379 2

# The time in milliseconds the Sentinel needs to wait for a Redis
# instance to respond before considering it down.
sentinel down-after-milliseconds mymaster 5000

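# The failover timeout in milliseconds. Among other uses, it bounds how long
# an in-progress failover waits for replicas to be reconfigured, and a
# Sentinel waits twice this value before retrying a failover of the same master.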
sentinel failover-timeout mymaster 60000

# The number of replicas that can be reconfigured at the same time
# during a failover.
sentinel parallel-syncs mymaster 1

# Optional: Specify the password for Redis instances if authentication is enabled.
# sentinel auth-pass mymaster YourRedisPassword

Ensure that the IP addresses (192.168.1.100 in this example) are reachable from all Sentinel nodes and that the Redis instances accept connections from the Sentinels. The quorum value is critical: setting it too low can lead to false positives and unnecessary failovers, while setting it too high can delay failover or prevent it entirely when some Sentinels are down or partitioned, because no failover starts until quorum Sentinels agree that the master is down.
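To make the quorum trade-off concrete, the Python sketch below models the two checks Sentinel performs: the quorum for flagging the master as objectively down, then a leader election that needs votes from a majority of all Sentinels (and at least quorum votes). It is a simplified model that ignores timeouts, epochs, and replica selection, and the scenario values are illustrative.

```python
def failover_possible(total: int, quorum: int, see_down: int, reachable: int) -> bool:
    """Simplified model of whether Redis Sentinel can fail over a master.

    total:     Sentinels monitoring the master
    quorum:    the last argument of `sentinel monitor`
    see_down:  Sentinels reporting the master as unreachable
    reachable: Sentinels the would-be leader can reach, itself included
    """
    # Detection: the master is flagged objectively down (ODOWN) once
    # `quorum` Sentinels report it unreachable.
    odown = see_down >= quorum
    # Authorization: the Sentinel running the failover must be elected by a
    # majority of all Sentinels, and by at least `quorum` of them.
    majority = total // 2 + 1
    elected = reachable >= max(majority, quorum)
    return odown and elected


scenarios = {
    "master lost, 2 of 3 Sentinels up, quorum 2": (3, 2, 2, 2),
    "master lost, 2 of 3 Sentinels up, quorum 3": (3, 3, 2, 2),
    "one Sentinel with a flaky link, quorum 1": (3, 1, 1, 3),
    "one Sentinel with a flaky link, quorum 2": (3, 2, 1, 3),
}
for label, args in scenarios.items():
    print(f"{label}: {failover_possible(*args)}")
```

In this model the second scenario is the quorum set too high (no failover although the master is gone), and the third is the false positive: with quorum 1, a single Sentinel that loses its own link to a healthy master can still collect votes from its peers and start a failover.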

Redis Instance Configuration for Replication

Each Redis replica needs to be configured to replicate from the master. This is done in the redis.conf file.

# redis.conf (for replicas)

port 6379
daemonize yes
pidfile /var/run/redis_6379.pid
logfile /var/log/redis_6379.log

# If using replication, specify the master's IP and port.
# This should be set on all replica instances.
replicaof 192.168.1.100 6379

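# If Redis instances are protected by a password, set both directives on
# every instance, since any node can become master or replica after a failover:
# masterauth YourRedisPassword
# requirepass YourRedisPassword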

# Sentinel connects as an ordinary client, so make sure 'bind' and
# 'protected-mode' allow connections from the Sentinel hosts, and that the
# commands Sentinel relies on (such as REPLICAOF and CONFIG) are not disabled.
# If you rename them, declare the new names with 'sentinel rename-command'.

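The master Redis instance does not need the replicaof directive. When authentication is enabled, it should still set masterauth alongside requirepass: after a failover, Sentinel reconfigures the old master as a replica of the new one, and it needs the password to resynchronize.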
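Before relying on Sentinel, it is worth confirming that each replica is actually attached to the master. The Python sketch below parses the text of INFO replication (as returned by `redis-cli -h <replica> INFO replication`) and checks the role, master address, and link status; the sample reply is illustrative and the helper names are hypothetical.

```python
def parse_info(text: str) -> dict[str, str]:
    """Turn the key:value lines of a Redis INFO reply into a dict."""
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers like "# Replication"
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def is_attached_replica(info_text: str, expected_master: str) -> bool:
    info = parse_info(info_text)
    return (
        info.get("role") == "slave"  # INFO still reports replicas as "slave"
        and info.get("master_host") == expected_master
        and info.get("master_link_status") == "up"
    )


SAMPLE = """# Replication
role:slave
master_host:192.168.1.100
master_port:6379
master_link_status:up
master_sync_in_progress:0
"""

print(is_attached_replica(SAMPLE, "192.168.1.100"))  # True
```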

OVH Load Balancer Integration

OVH’s load balancers (e.g., HAProxy-based) can abstract the Redis master’s IP address behind a single endpoint. There are two ways to keep that endpoint pointing at the current master: Sentinel can update the load balancer’s backend pool after each failover, or, more commonly, the application bypasses the load balancer and queries Sentinel for the current master’s address directly.

For a seamless failover, the application (Magento 2 in this case) should be configured to use Sentinel to discover the current Redis master. The Redis cache backend that Magento 2 uses, built on the Cm_Cache_Backend_Redis library, supports Sentinel discovery.

Magento 2 Cache Configuration for Sentinel

In Magento 2’s app/etc/env.php, you can configure the cache to use Sentinel for discovery. This eliminates the need to manually update load balancer configurations or application settings after a failover.

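<?php
// app/etc/env.php (excerpt)
// Sentinel options are handled by the Cm_Cache_Backend_Redis library that
// Magento\Framework\Cache\Backend\Redis extends; confirm them against the
// library version installed with your Magento release.
return [
    'cache' => [
        'frontend' => [
            'default' => [
                'backend' => 'Magento\\Framework\\Cache\\Backend\\Redis',
                'backend_options' => [
                    // With sentinel_master set, 'server' lists Sentinel
                    // endpoints (comma-separated) instead of a Redis master.
                    'server' => 'tcp://redis-sentinel-1.example.com:26379,tcp://redis-sentinel-2.example.com:26379,tcp://redis-sentinel-3.example.com:26379',
                    'sentinel_master' => 'mymaster',   // The master name defined in sentinel.conf
                    'sentinel_master_verify' => '1',   // Check that the discovered node reports role master
                    'sentinel_connect_retries' => '5', // Retries if resolving the master fails
                    'password' => '',                  // Redis requirepass value, if set
                    'database' => '0',
                    'compress_data' => '1',
                    // If the Sentinels themselves require a password, check the
                    // library's README for the matching option.
                ],
            ],
            // ... page_cache and other cache frontends
        ],
    ],
    // ... other configuration
];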

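When Magento 2’s cache backend is initialized, it connects to one of the configured Sentinel endpoints, asks for the current address of mymaster, and then connects directly to that master. Because PHP typically builds the backend again for every request, requests that start after a failover look up and use the new master. Requests already in flight when the old master fails can still error until Sentinel completes the failover, which takes at least down-after-milliseconds (5 seconds in the example above).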
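The discovery step follows the usual Sentinel client pattern: try each Sentinel in turn, ask it for the master’s address (the `SENTINEL get-master-addr-by-name mymaster` command), and use the first answer. The Python sketch below shows that loop with the network call injected as a function, so `query` and the fake used here are hypothetical stand-ins rather than a real client API.

```python
from typing import Callable, Optional, Sequence, Tuple

Address = Tuple[str, int]
# query(sentinel, master_name) returns (ip, port), or None if the Sentinel does
# not know the master; it raises OSError if the Sentinel cannot be reached.
SentinelQuery = Callable[[Address, str], Optional[Address]]


def discover_master(sentinels: Sequence[Address], master_name: str,
                    query: SentinelQuery) -> Address:
    """Return the master address reported by the first Sentinel that answers."""
    for sentinel in sentinels:
        try:
            master = query(sentinel, master_name)
        except OSError:
            continue  # Sentinel unreachable: fall through to the next one
        if master is not None:
            return master
    raise LookupError(f"no Sentinel could resolve master {master_name!r}")


SENTINELS = [
    ("redis-sentinel-1.example.com", 26379),
    ("redis-sentinel-2.example.com", 26379),
    ("redis-sentinel-3.example.com", 26379),
]


def fake_query(sentinel: Address, master_name: str) -> Optional[Address]:
    # Simulates Sentinel 1 being down and the others reporting a promoted replica.
    if sentinel[0] == "redis-sentinel-1.example.com":
        raise ConnectionRefusedError("connection refused")
    return ("192.168.1.101", 6379)


print(discover_master(SENTINELS, "mymaster", fake_query))  # ('192.168.1.101', 6379)
```

A production client should also confirm, after connecting, that the node reports itself as master (the ROLE command), because a Sentinel can briefly return a stale address during a failover. redis-py’s `redis.sentinel.Sentinel.discover_master` is a complete implementation of this pattern.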

Orchestrating Failover with OVH Load Balancers (Alternative/Complementary)

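While Magento’s direct Sentinel integration is preferred for Redis, you might still want an OVH load balancer as a single endpoint for other services or as a fallback. In this scenario, the load balancer points to the Redis pool, and Sentinel runs a client reconfiguration script after each failover (the client-reconfig-script directive in sentinel.conf) to update the load balancer’s backend configuration.

#!/bin/bash
# Example Sentinel client-reconfig script.
# Sentinel invokes it with seven arguments:
#   <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
# <role> is "leader" on the Sentinel that performed the failover and
# "observer" on the others; <state> is currently always "failover".

MASTER_NAME=$1
ROLE=$2
STATE=$3
OLD_MASTER_IP=$4
OLD_MASTER_PORT=$5
NEW_MASTER_IP=$6
NEW_MASTER_PORT=$7

# Every Sentinel runs the script; let only the leader touch the load balancer.
if [ "$ROLE" != "leader" ]; then
    exit 0
fi

logger -t redis-failover "Failover ($STATE) for $MASTER_NAME: $OLD_MASTER_IP:$OLD_MASTER_PORT -> $NEW_MASTER_IP:$NEW_MASTER_PORT"

# Point the Redis backend at $NEW_MASTER_IP:$NEW_MASTER_PORT here, via the
# OVH API or a local HAProxy configuration.

# Example: OVH API (placeholder command; IP Load Balancing routes live under
# /ipLoadbalancing/{serviceName} in the OVHcloud API console)
# ovh api call /loadBalancer/{serviceName}/frontend/{frontendId}/backend/{backendId} --method PUT --body '{"address": "'$NEW_MASTER_IP':'$NEW_MASTER_PORT'"}'

# Example: Reloading local HAProxy configuration after rewriting the backend
# sudo systemctl reload haproxy

# Exit code 1 makes Sentinel retry the script later; 2 or higher is not retried.
exit 0

The script is registered per master in sentinel.conf and must be executable by the Sentinel process:

# sentinel.conf snippet
sentinel client-reconfig-script mymaster /path/to/your/client-reconfig-script.sh

This approach adds complexity and a potential point of failure if the script or API interaction fails. The Magento-native Sentinel integration is generally more robust and simpler to manage for Redis failover.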
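If the hook grows beyond a couple of API calls, it can be easier to write it in Python. The sketch below covers only the part Sentinel defines, the seven positional arguments and the leader/observer split; the `Reconfig` type and `parse_reconfig_args` are hypothetical names, and the load balancer update itself is left out.

```python
from typing import NamedTuple, Optional, Sequence


class Reconfig(NamedTuple):
    master_name: str
    state: str
    old_master: tuple[str, int]
    new_master: tuple[str, int]


def parse_reconfig_args(argv: Sequence[str]) -> Optional[Reconfig]:
    """Parse client-reconfig-script arguments; return None on observer Sentinels.

    Sentinel passes: <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
    """
    if len(argv) != 7:
        raise ValueError(f"expected 7 arguments, got {len(argv)}")
    name, role, state, from_ip, from_port, to_ip, to_port = argv
    if role != "leader":
        return None  # only the leader should update the load balancer
    return Reconfig(name, state, (from_ip, int(from_port)), (to_ip, int(to_port)))


event = parse_reconfig_args(
    ["mymaster", "leader", "failover", "192.168.1.100", "6379", "192.168.1.101", "6379"]
)
print(event.new_master if event else "observer: nothing to do")
```

In a real hook, pass `sys.argv[1:]` and exit with status 1 on a transient API error so that Sentinel retries the script.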

Automated Failover for Magento 2 Application Servers

For the Magento 2 application servers themselves, automated failover typically involves a multi-tier approach: a load balancer distributing traffic to healthy web servers, and a mechanism to detect and replace unhealthy web servers.

OVH Load Balancer Configuration for Web Servers

OVH’s load balancers can be configured with health checks to monitor the availability of your Magento web servers. These health checks should be specific enough to detect application-level issues, not just network connectivity.

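A common health check for a web server involves making an HTTP request to a specific endpoint and expecting a 200 OK status code. For Magento, this could be the pub/health_check.php script that ships with Magento 2, which checks the database and cache connections and returns an error status when they fail, or a static file. A static file only proves that the web server answers, not that PHP-FPM or Magento itself works.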

# Example OVH Load Balancer Health Check Configuration (Conceptual)

# Service: Web Application Load Balancer
# Frontend: HTTP/HTTPS traffic
# Backend Pool: Magento Web Servers (e.g., 10.0.0.1:80, 10.0.0.2:80, 10.0.0.3:80)

# Health Check Settings:
# Protocol: HTTP
# Port: 80
# URI: /health_check.php  (Magento's bundled script; a static file like /robots.txt only checks the web server)
# Method: GET
# Expected Status Codes: 200
# Timeout: 5 seconds
# Interval: 10 seconds
# Failure Threshold: 3 (consecutive failed checks before marking a server down)
# Recovery Threshold: 2 (consecutive successful checks before marking it up again)

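The load balancer automatically stops sending traffic to any backend server that fails these health checks. The redirection is automatic but not instantaneous: a failed server keeps receiving requests until it has failed the threshold number of consecutive checks, roughly 3 × 10 seconds = 30 seconds with the settings above.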
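The failure and recovery thresholds behave like HAProxy’s fall and rise counters: a server must fail several consecutive checks before it is pulled, and pass several before it returns. The Python sketch below replays a made-up sequence of check results under the example settings (10-second interval, failure threshold 3, recovery threshold 2) to show how long a dead server stays in rotation.

```python
from typing import Iterable, List

INTERVAL_S = 10


def replay_checks(results: Iterable[bool], fall: int = 3, rise: int = 2) -> List[bool]:
    """Return the backend's in-rotation state after each health check.

    An in-rotation server is removed after `fall` consecutive failures;
    a removed server is restored after `rise` consecutive successes.
    """
    in_rotation = True
    streak = 0  # consecutive results that contradict the current state
    states = []
    for ok in results:
        if ok == in_rotation:
            streak = 0
        else:
            streak += 1
            if streak >= (fall if in_rotation else rise):
                in_rotation = not in_rotation
                streak = 0
        states.append(in_rotation)
    return states


# Checks run at t=0, 10, ..., 70 s. The server dies just after the check at
# t=0 s and is healthy again by the check at t=50 s.
checks = [True, False, False, False, False, True, True, True]
states = replay_checks(checks)
removed_at = states.index(False) * INTERVAL_S
restored_at = states.index(True, states.index(False)) * INTERVAL_S
print(f"removed at t={removed_at}s, restored at t={restored_at}s")  # removed at t=30s, restored at t=60s
```

Lowering the interval or the failure threshold shortens the window, at the cost of pulling servers on brief hiccups.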

Automated Server Provisioning and Replacement

To achieve true automated failover for application servers, you need a way to automatically provision new servers when existing ones fail. This is where infrastructure-as-code (IaC) and auto-scaling groups become crucial.

OVHcloud offers services that can be integrated with IaC tools like Terraform or Ansible. You can define your Magento server infrastructure in code, including the desired number of instances and their configuration.

Terraform for Infrastructure Management

Terraform can be used to define and manage your OVHcloud resources, including virtual machines (Public Cloud Instances), load balancers, and networks. Without a managed auto-scaling service, you can approximate an auto-scaling group by defining the instance template in code and having an external trigger (for example, a monitoring alert) run terraform apply to replace or add instances when a server is detected as unhealthy.

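# Example Terraform Configuration (sketch; check argument names against the
# provider versions you pin). Public Cloud instances are OpenStack-based and are
# commonly managed with the OpenStack provider, while IP Load Balancing is
# managed with the OVH provider.

terraform {
  required_providers {
    ovh       = { source = "ovh/ovh" }
    openstack = { source = "terraform-provider-openstack/openstack" }
  }
}

provider "ovh" {
  endpoint = "ovh-eu"
  # ... application key/secret and consumer key
}

provider "openstack" {
  # ... credentials from the Public Cloud project's OpenStack RC file
  region = "GRA11"
}

# Magento web servers built from one template
resource "openstack_compute_instance_v2" "magento_web_server" {
  count       = 3 # Initial number of instances
  name        = "magento-web-${count.index}"
  image_name  = "Ubuntu 22.04" # Image names vary by region; check `openstack image list`
  flavor_name = "b2-7"         # Example flavor
  key_pair    = "my-ssh-key"
  user_data   = file("cloud-init.yaml") # Configures the instance on first boot

  network {
    name = "Ext-Net" # OVHcloud public network
  }
}

# Existing IP Load Balancing service (placeholder service name)
data "ovh_iploadbalancing" "magento_lb" {
  service_name = "loadbalancer-xxxxxxxx"
}

# Backend pool (farm); an ovh_iploadbalancing_http_frontend routes traffic to it
resource "ovh_iploadbalancing_http_farm" "magento_backend_http" {
  service_name = data.ovh_iploadbalancing.magento_lb.service_name
  display_name = "magento-web"
  zone         = "all" # or a specific zone
  port         = 80
  # ... probe settings for the health check described above
}

# One farm server per compute instance, so the pool follows `count`
resource "ovh_iploadbalancing_http_farm_server" "magento_web" {
  count        = length(openstack_compute_instance_v2.magento_web_server)
  service_name = data.ovh_iploadbalancing.magento_lb.service_name
  farm_id      = ovh_iploadbalancing_http_farm.magento_backend_http.id
  display_name = openstack_compute_instance_v2.magento_web_server[count.index].name
  address      = openstack_compute_instance_v2.magento_web_server[count.index].access_ip_v4
  port         = 80
  status       = "active"
}

# IP Load Balancing changes only go live after a refresh
resource "ovh_iploadbalancing_refresh" "magento_lb" {
  service_name = data.ovh_iploadbalancing.magento_lb.service_name
  keepers      = ovh_iploadbalancing_http_farm_server.magento_web[*].address
}

# Replacing failed servers still needs an external trigger: for example, a
# monitoring system detects an unhealthy server and runs terraform apply to
# rebuild it.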
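The external trigger mentioned above needs some decision logic. The Python sketch below is one hypothetical policy: rebuild unhealthy servers, top up missing instances, and stand down for human review when several servers fail at once, since that usually points to a shared dependency (database, Redis, a bad deploy) that new servers would not fix. The names and the `max_replace` limit are assumptions, not part of any OVH or Terraform API.

```python
from typing import Dict, List, NamedTuple


class Plan(NamedTuple):
    replace: List[str]   # unhealthy instances to rebuild
    create: int          # additional instances needed to reach the desired count
    manual_review: bool  # True when automation should stop and page a human


def plan_remediation(desired: int, health: Dict[str, bool], max_replace: int = 1) -> Plan:
    """Turn load balancer health states into a remediation plan.

    health maps instance name -> True if it currently passes health checks.
    """
    unhealthy = sorted(name for name, ok in health.items() if not ok)
    if len(unhealthy) > max_replace:
        # Several simultaneous failures: rebuilding servers is unlikely to help.
        return Plan(replace=[], create=0, manual_review=True)
    missing = max(desired - len(health), 0)
    return Plan(replace=unhealthy, create=missing, manual_review=False)


print(plan_remediation(3, {"magento-web-0": True, "magento-web-1": False, "magento-web-2": True}))
```

An automation job could turn each entry of `replace` into `terraform apply -replace=<resource address>` for the matching instance; for a count-based resource, the address ends in `[index]`.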

The user_data (cloud-init) script is crucial for bootstrapping new instances. It should install necessary software (web server, PHP, Magento dependencies), configure them, and potentially register them with the load balancer.

#cloud-config
# cloud-init.yaml ("#cloud-config" must stay on the first line, or cloud-init
# will not treat the file as cloud-config user data)
packages:
  - nginx
  - php-fpm
  - php-mysql
  - php-gd
  - php-bcmath
  - php-curl
  - php-intl
  - php-mbstring
  - php-soap
  - php-xml
  - php-zip
  # ... other Magento dependencies (check the requirements of your Magento release)

write_files:
  # write_files keeps nginx variables such as $uri intact; inside an
  # `echo "..."` the shell would expand them to empty strings.
  - path: /etc/nginx/sites-available/magento
    content: |
      upstream fastcgi_backend {
          server unix:/run/php/php8.1-fpm.sock; # Adjust to the installed PHP version
      }

      server {
          listen 80 default_server;
          server_name yourdomain.com;
          set $MAGE_ROOT /var/www/html/magento; # Adjust path (here and in the include)

          # Magento's bundled nginx.conf.sample serves $MAGE_ROOT/pub and holds
          # the rules for /static/, /media/ and PHP handling.
          include /var/www/html/magento/nginx.conf.sample;
      }

runcmd:
  - |
    # Deploy Magento (if not already baked into the image or deployed via CI/CD)
    # New nodes should receive a pre-built artifact (vendor/, generated/,
    # pub/static/). Run setup:upgrade once from the deployment pipeline, not on
    # every boot, and skip cache:flush here: it would flush the shared Redis
    # cache used by every node.
    # cd /var/www/html/magento
    # composer install --no-dev --optimize-autoloader
    # php bin/magento setup:static-content:deploy -f en_US en_GB # Adjust locales

  - rm -f /etc/nginx/sites-enabled/default
  - ln -sf /etc/nginx/sites-available/magento /etc/nginx/sites-enabled/magento
  # nginx -t fails until the Magento code (including nginx.conf.sample) is in place
  - nginx -t
  - systemctl enable nginx
  - systemctl restart nginx

  - |
    # Register with Load Balancer (if not using dynamic Terraform)
    # This would involve an API call to OVH to add this server to the backend pool.
    # Either way, the LB health check keeps the server out of rotation until it responds.

Monitoring and Alerting for Proactive Failover

A robust monitoring system is the backbone of any automated failover strategy. Tools like Prometheus, Grafana, and Alertmanager are excellent choices for this. They can monitor Redis Sentinel, application server health, and OVH load balancer metrics.

Key metrics to monitor:

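Redis Sentinel: master status as seen by each Sentinel (ok, sdown, odown), the number of Sentinels and replicas each Sentinel knows about, whether the quorum is reachable (SENTINEL CKQUORUM), TILT mode, and failover events (+switch-master).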
Redis Instances: Latency, memory usage, connected clients, replication lag.
Application Servers: CPU, memory, disk I/O, network traffic, HTTP error rates (5xx), request latency.
OVH Load Balancer: Backend server status (up/down), traffic volume, error rates.
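Several of the Sentinel signals above are visible in INFO sentinel on any Sentinel (for example, `redis-cli -p 26379 INFO sentinel`). The Python sketch below parses that reply and emits alert strings; the expected counts (three Sentinels, two replicas) match the topology described earlier and are assumptions to adjust, and in practice an exporter or a scheduled check would feed such results into Alertmanager.

```python
def parse_sentinel_info(text: str) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Split an INFO sentinel reply into global fields and per-master records."""
    fields: dict[str, str] = {}
    masters: list[dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        if key.startswith("master") and key[len("master"):].isdigit():
            # e.g. name=mymaster,status=ok,address=192.168.1.100:6379,slaves=2,sentinels=3
            masters.append(dict(pair.split("=", 1) for pair in value.split(",")))
        else:
            fields[key] = value
    return fields, masters


def sentinel_alerts(text: str, sentinels: int = 3, replicas: int = 2) -> list[str]:
    fields, masters = parse_sentinel_info(text)
    alerts = []
    if fields.get("sentinel_tilt") == "1":
        alerts.append("Sentinel is in TILT mode")
    for m in masters:
        if m["status"] != "ok":
            alerts.append(f"{m['name']}: master status is {m['status']}")
        if int(m["sentinels"]) < sentinels:
            alerts.append(f"{m['name']}: only {m['sentinels']} Sentinels visible")
        if int(m["slaves"]) < replicas:
            alerts.append(f"{m['name']}: only {m['slaves']} replicas known")
    return alerts


HEALTHY = """# Sentinel
sentinel_masters:1
sentinel_tilt:0
master0:name=mymaster,status=ok,address=192.168.1.100:6379,slaves=2,sentinels=3
"""
DEGRADED = HEALTHY.replace("status=ok", "status=odown").replace("sentinels=3", "sentinels=2")

print(sentinel_alerts(HEALTHY))  # []
print(sentinel_alerts(DEGRADED))
```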

Alerts should be configured for critical thresholds. For instance, if a Redis master is unreachable for an extended period, or if multiple application servers start returning 5xx errors, an alert should trigger an automated remediation process. This could involve:

Automatically provisioning a new application server via Terraform/Ansible.
Notifying the operations team via Slack, PagerDuty, or email.
Triggering a manual review if automated remediation is not possible.
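Requiring a condition to persist matters: alerting on a single bad sample would trigger remediation on every transient blip. Prometheus alerting rules express this with a `for` duration; the Python sketch below shows the same idea over per-minute 5xx ratios for one server. The threshold, window, and request counts are illustrative assumptions.

```python
from typing import Sequence


def error_ratio(errors_5xx: int, total: int) -> float:
    """Share of requests that returned a 5xx status (0.0 when idle)."""
    return errors_5xx / total if total else 0.0


def sustained_breach(samples: Sequence[float], threshold: float, for_samples: int) -> bool:
    """True only if each of the last `for_samples` samples exceeds `threshold`."""
    if for_samples <= 0 or len(samples) < for_samples:
        return False
    return all(s > threshold for s in samples[-for_samples:])


# Per-minute (5xx, total) request counts for one web server.
minutes = [(2, 1000), (180, 1000), (3, 1000), (150, 1000), (210, 1000), (260, 1000)]
ratios = [error_ratio(e, t) for e, t in minutes]
print(sustained_breach(ratios, threshold=0.05, for_samples=3))      # True
print(sustained_breach(ratios[:3], threshold=0.05, for_samples=3))  # False
```

The isolated spike in the second minute never fires on its own; only the three consecutive bad minutes at the end do.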

Conclusion

Architecting auto-failover for Redis and Magento 2 on OVH requires a multi-faceted approach. For Redis, leveraging Sentinel with Magento’s native integration provides a resilient and automated solution. For application servers, a combination of OVH load balancers with intelligent health checks, coupled with infrastructure-as-code for automated provisioning and replacement, ensures high availability. Continuous monitoring and well-defined alerting are the final pieces that enable a truly robust disaster recovery strategy.