Disaster Recovery 101: Architecting Auto-Failovers for Redis and WooCommerce Deployments on OVH
Redis Sentinel for High Availability
Automated failover for Redis in a production WooCommerce environment needs a dedicated high-availability layer. Redis Sentinel is the de facto standard for this, providing monitoring, notification, and automatic failover for Redis instances. We’ll focus on a typical setup with one master, two replicas, and three Sentinels, deployed across different availability zones within OVH’s infrastructure for resilience. Three is the practical minimum for Sentinels, because a majority of them must agree before a failover can proceed.
The core components are the Redis master, one or more Redis replicas (for read scaling and as failover candidates), and the Redis Sentinel processes. Sentinels monitor the master and replicas. If the master becomes unreachable, they elect a replica, promote it to master, and reconfigure the remaining replicas to follow it. Clients are not reconfigured directly: they learn the new master’s address by querying Sentinel.
Sentinel Configuration (`sentinel.conf`)
Each Sentinel process requires a configuration file. Key parameters include:
- `port`: The port Sentinel listens on (typically 26379).
- `sentinel monitor <master-name> <ip> <port> <quorum>`: The most critical directive. It tells Sentinel to monitor a specific Redis master. `<master-name>` is an arbitrary name for your master (e.g., `mymaster`), `<ip>` and `<port>` are the master's address, and `<quorum>` is the number of Sentinels that must agree the master is down before a failover is initiated. A common practice is to set this to floor(N/2) + 1, where N is the total number of Sentinels; for 3 Sentinels, that gives a quorum of 2. The quorum only governs failure detection: the failover itself must still be authorized by a majority of all Sentinels.
- `sentinel down-after-milliseconds <master-name> <milliseconds>`: How long an instance may go without a valid reply before this Sentinel considers it down.
- `sentinel failover-timeout <master-name> <milliseconds>`: The maximum time to complete a failover.
- `sentinel parallel-syncs <master-name> <num-syncs>`: The number of replicas that can be reconfigured to sync with the new master simultaneously after a failover.
- `logfile`: Path to the Sentinel log file.
- `dir`: Working directory for Sentinel.
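The majority rule for the quorum can be checked with a few lines of shell arithmetic. This is only an illustrative sketch; `majority_quorum` and `tolerated_failures` are hypothetical helper names, not part of Redis.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the floor(N/2) + 1 rule.

# Majority of N Sentinels.
majority_quorum() {
    local n="$1"
    echo $(( n / 2 + 1 ))
}

# Number of Sentinels that can be lost while a majority is still reachable.
tolerated_failures() {
    local n="$1"
    echo $(( n - (n / 2 + 1) ))
}

for n in 3 5 7; do
    echo "sentinels=$n quorum=$(majority_quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Note that going from 3 to 4 Sentinels does not raise the number of tolerated failures, which is why odd counts are the usual choice.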
Here’s an example sentinel.conf for a master named mymaster running on 192.168.1.100:6379, with a quorum of 2:
# sentinel.conf
# Sentinel rewrites this file at runtime, so it must be writable by the Sentinel user.
port 26379

# Working directory; ensure it exists and is writable
dir "/var/lib/redis/sentinel"

# Monitor Redis master 'mymaster' at 192.168.1.100:6379
# Quorum of 2 means at least 2 Sentinels must agree the master is down
sentinel monitor mymaster 192.168.1.100 6379 2

# If the Redis instances use requirepass, Sentinel needs the password as well
# sentinel auth-pass mymaster your_strong_redis_password

# Time in milliseconds without a valid reply before the master is considered down
sentinel down-after-milliseconds mymaster 5000

# Time in milliseconds to complete a failover
sentinel failover-timeout mymaster 60000

# Number of replicas that can sync with the new master at the same time
sentinel parallel-syncs mymaster 1

# Log file location
logfile "/var/log/redis/sentinel.log"

# Replicas do not need to be listed: Sentinel discovers them from the master.
# For example, a replica at 192.168.1.101:6379 is detected automatically and
# reconfigured to replicate from the new master after a failover.
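Before rolling a configuration like this out, it can help to sanity-check the monitor line against the number of Sentinels you plan to run. The following is a minimal sketch under that assumption; `check_monitor_line` is a hypothetical helper, and the sample file it writes is only for demonstration.

```shell
#!/usr/bin/env bash
# check_monitor_line <sentinel.conf path> <planned number of Sentinels>
# Hypothetical helper: reports each "sentinel monitor" line and fails if a
# quorum is larger than the number of Sentinels (or no monitor line exists).
check_monitor_line() {
    local conf="$1" total="$2"
    awk -v total="$total" '
        $1 == "sentinel" && $2 == "monitor" {
            found = 1
            if ($6 + 0 > total + 0) {
                printf "%s: quorum %d exceeds %d Sentinels\n", $3, $6, total
                bad = 1
            } else {
                printf "%s: %s:%s quorum %d of %d OK\n", $3, $4, $5, $6, total
            }
        }
        END { exit (found && !bad) ? 0 : 1 }
    ' "$conf"
}

# Demo against a throwaway sample file
sample="$(mktemp)"
printf '%s\n' 'port 26379' 'sentinel monitor mymaster 192.168.1.100 6379 2' > "$sample"
check_monitor_line "$sample" 3
rm -f "$sample"
```

On a running Sentinel, the built-in `SENTINEL CKQUORUM mymaster` command performs a live version of this check, including whether enough Sentinels are currently reachable.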
Redis Configuration (`redis.conf`) for Master and Replicas
The Redis master configuration is standard, but replicas need to be configured to replicate from the master. Sentinels will automatically update replica configurations during failover, but the initial setup is crucial.
Master Configuration (`redis.conf`):
# redis.conf (Master)
port 6379
# Stay in the foreground when supervised by systemd;
# use "yes" only when starting Redis by hand
daemonize no
pidfile /var/run/redis_6379.pid
logfile /var/log/redis/redis-server.log
dir /var/lib/redis
# Listens on all interfaces; prefer a specific private IP for security
bind 0.0.0.0

# For persistence, choose one or both:
# RDB snapshotting
save 900 1
save 300 10
save 60 10000
# AOF (Append Only File), generally preferred for higher durability
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec

# Keep protected mode enabled if not binding to a specific IP and no password is set
# protected-mode yes

# The master must not have a replicaof directive.
# With Sentinel, it is good practice to set a password. Set masterauth too,
# because after a failover the old master rejoins as a replica.
# requirepass your_strong_redis_password
# masterauth your_strong_redis_password
Replica Configuration (`redis.conf`):
# redis.conf (Replica)
# Typically the same port as the master; use a different one only when
# several instances share a host
port 6379
daemonize no
pidfile /var/run/redis_6379.pid
logfile /var/log/redis/redis-server.log
dir /var/lib/redis
# Or a specific IP
bind 0.0.0.0

# Crucial directive: point to the current master.
# Sentinel rewrites this when the master changes.
replicaof 192.168.1.100 6379

# If the master has a password, the replica needs masterauth to authenticate
# to it, and the same requirepass so that clients and Sentinel can keep using
# one password after a promotion.
# requirepass your_strong_redis_password
# masterauth your_strong_redis_password

# Under Sentinel any replica may be promoted to master, so keep the same
# persistence settings as the master instead of disabling persistence.
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec
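Once a replica is started with this configuration, its `INFO replication` output should show that it is attached to the master. As a rough sketch, the check can be scripted as below; `info_field` and `replica_link_ok` are hypothetical helper names, and the demo uses canned output rather than a live server.

```shell
#!/usr/bin/env bash
# Print the value of one "key:value" field from INFO output read on stdin.
# INFO lines end in CRLF, so carriage returns are stripped first.
info_field() {
    tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# Succeeds when the INFO replication text on stdin describes a replica
# whose link to its master is up. (INFO reports replicas as role:slave.)
replica_link_ok() {
    local info
    info="$(cat)"
    [[ "$(info_field role <<<"$info")" == "slave" ]] &&
        [[ "$(info_field master_link_status <<<"$info")" == "up" ]]
}

# Demo with canned output; on a live replica you would pipe in
# `redis-cli -h 192.168.1.101 -p 6379 info replication` instead.
replica_link_ok <<'EOF' && echo "replica is linked to its master"
role:slave
master_host:192.168.1.100
master_port:6379
master_link_status:up
EOF
```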
Deployment and Orchestration on OVH
On OVH, you’d typically deploy these components using virtual machines (e.g., Public Cloud Instances) or containers (e.g., Managed Kubernetes Service). For maximum resilience, distribute your Redis master, replicas, and Sentinels across different Availability Zones (AZs) within a single OVH region, provided the region you use is a multi-AZ one.
Example Deployment Strategy:
- AZ-1: Redis Master, Sentinel 1
- AZ-2: Redis Replica 1, Sentinel 2
- AZ-3: Redis Replica 2, Sentinel 3
This ensures that even if an entire AZ fails, the remaining Sentinels can orchestrate a failover to a replica in another AZ.
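The reasoning behind this layout can be checked mechanically: after losing any one AZ, the surviving Sentinels must still form a majority of the configured total. A minimal sketch, with `survives_any_az_loss` as a hypothetical helper that takes the number of Sentinels placed in each AZ:

```shell
#!/usr/bin/env bash
# survives_any_az_loss <sentinels in AZ 1> <sentinels in AZ 2> ...
# Succeeds if a majority of all Sentinels remains after losing any single AZ.
# The majority is computed against the configured total, because Sentinel
# keeps counting unreachable peers as part of the group.
survives_any_az_loss() {
    local total=0 count
    for count in "$@"; do
        total=$(( total + count ))
    done
    local majority=$(( total / 2 + 1 ))
    for count in "$@"; do
        # Sentinels left if this AZ disappears
        if (( total - count < majority )); then
            return 1
        fi
    done
    return 0
}

survives_any_az_loss 1 1 1 && echo "1/1/1: failover still possible after losing any one AZ"
survives_any_az_loss 2 1 || echo "2/1: losing the two-Sentinel AZ blocks failover"
```

This is why three Sentinels packed into two zones are not equivalent to one Sentinel in each of three zones.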
Starting and Managing Services
Use systemd or a similar init system to manage your Redis and Sentinel processes. Ensure they are configured to start on boot and restart if they crash.
Example systemd service file for Redis:
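# /etc/systemd/system/redis@.service
[Unit]
Description=Redis data store
After=network.target

[Service]
User=redis
Group=redis
# Run in the foreground so systemd can supervise the process,
# whatever the daemonize setting in redis.conf says
ExecStart=/usr/local/bin/redis-server /etc/redis/redis.conf --daemonize no
# %i is the instance name, used here as the port (e.g. redis@6379.service)
ExecStop=/usr/local/bin/redis-cli -p %i shutdown
Restart=always
# Consider adding:
# ProtectSystem=full
# PrivateTmp=true
# NoNewPrivileges=true

[Install]
WantedBy=multi-user.target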
Example systemd service file for Sentinel:
# /etc/systemd/system/redis-sentinel@.service
[Unit]
Description=Redis Sentinel
# Start after the local Redis instance, if one runs on this host
After=network.target redis@6379.service

[Service]
User=redis
Group=redis
ExecStart=/usr/local/bin/redis-sentinel /etc/redis/sentinel.conf
ExecStop=/usr/local/bin/redis-cli -p 26379 shutdown
Restart=always
# Consider adding the options below. Sentinel rewrites sentinel.conf at
# runtime, and ProtectSystem=full makes /etc read-only, so pair it with
# ReadWritePaths=/etc/redis.
# ProtectSystem=full
# PrivateTmp=true
# NoNewPrivileges=true

[Install]
WantedBy=multi-user.target
After creating these files, run:
sudo systemctl daemon-reload
sudo systemctl enable redis@6379.service
sudo systemctl start redis@6379.service
sudo systemctl enable redis-sentinel@26379.service # Assuming sentinel port is 26379
sudo systemctl start redis-sentinel@26379.service
WooCommerce Integration and Client Configuration
WooCommerce applications (typically PHP-based) need to know how to reach the current Redis master, which can change after a failover. This is where Sentinel-aware client code or a proxy layer comes into play.
PHP Redis Client with Sentinel Support
The most common PHP Redis client library, phpredis, ships a RedisSentinel class for talking to Sentinel. You ask a Sentinel for the current master’s address by master name, then open a regular Redis connection to that address. phpredis does not follow a failover on its own, so the application must repeat the lookup and reconnect when a connection fails.
Example PHP connection using phpredis (the RedisSentinel constructor changed in phpredis 6.0, so the example handles both signatures):
<?php
// Sentinel endpoints (example addresses; adjust to your deployment)
$sentinels = [
    ['host' => '192.168.1.1', 'port' => 26379], // Sentinel 1
    ['host' => '192.168.1.2', 'port' => 26379], // Sentinel 2
    ['host' => '192.168.1.3', 'port' => 26379], // Sentinel 3
];
$masterName = 'mymaster'; // The name defined in sentinel.conf
$redisPassword = 'your_strong_redis_password'; // Leave empty if no password is set

// Ask each Sentinel in turn for the current master address.
// Returns [ip, port], or null if no Sentinel could answer.
function discoverMaster(array $sentinels, string $masterName): ?array
{
    foreach ($sentinels as $s) {
        try {
            // phpredis 6.x takes an options array; 5.x takes positional arguments.
            $sentinel = version_compare(phpversion('redis'), '6.0.0', '>=')
                ? new RedisSentinel(['host' => $s['host'], 'port' => $s['port'], 'connectTimeout' => 1.0])
                : new RedisSentinel($s['host'], $s['port'], 1.0); // 1 second timeout
            $addr = $sentinel->getMasterAddrByName($masterName);
            if (is_array($addr) && count($addr) === 2) {
                return [$addr[0], (int) $addr[1]];
            }
        } catch (RedisException $e) {
            // This Sentinel is unreachable; log it and try the next one
            error_log("Sentinel {$s['host']}:{$s['port']} failed: " . $e->getMessage());
        }
    }
    return null;
}

try {
    $master = discoverMaster($sentinels, $masterName);
    if ($master === null) {
        throw new RedisException('No Sentinel returned a master address');
    }
    [$host, $port] = $master;

    // Open a regular connection to the master Sentinel reported
    $redis = new Redis();
    $redis->connect($host, $port, 1.0);

    // Authenticate if a password is set
    if ($redisPassword) {
        $redis->auth($redisPassword);
    }

    // Set a key to test the connection
    $redis->set('test_key', 'Hello from WooCommerce!');
    $value = $redis->get('test_key');
    echo "Successfully connected to Redis master: " . $redis->getHost() . ":" . $redis->getPort() . "\n";
    echo "Value of test_key: " . $value . "\n";

    // For read operations, you might want to connect to replicas to offload the master.
    // RedisSentinel::slaves($masterName) returns one array per replica, including
    // its 'ip' and 'port', which you can pass to a second Redis::connect() call.
} catch (RedisException $e) {
    // Handle connection errors, log them, and potentially trigger alerts
    error_log("Redis connection failed: " . $e->getMessage());
    // Implement fallback logic, e.g., use database for caching or disable caching.
    // After a failover, call discoverMaster() again and reconnect.
    echo "Error connecting to Redis: " . $e->getMessage() . "\n";
}
?>
Ensure your PHP environment has the phpredis extension installed and enabled. The IP addresses for Sentinels and the master/replicas must be reachable from your WooCommerce application servers.
Application-Level Failover Handling
Sentinel handles the Redis-level failover, but connections opened before it will start failing, so your WooCommerce application must re-query Sentinel, reconnect, and be prepared for transient errors in the meantime. Implement retry mechanisms with exponential backoff for Redis operations. Log all Redis connection errors and failover events. Consider a circuit breaker pattern if Redis becomes persistently unavailable, to prevent cascading failures in your application.
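The backoff schedule itself is language-agnostic. As a minimal sketch, here it is as a shell helper of the kind you might use in a health-check or deploy script; `retry_with_backoff` is a hypothetical name, and inside WooCommerce the same loop would be written in PHP around the Redis calls.

```shell
#!/usr/bin/env bash
# retry_with_backoff <max_attempts> <base_delay_seconds> <command...>
# Runs the command until it succeeds or max_attempts is reached.
# The delay doubles after each failed attempt: base, 2*base, 4*base, ...
# (whole seconds only, since bash arithmetic is integer-based).
retry_with_backoff() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1
    while true; do
        if "$@"; then
            return 0
        fi
        if (( attempt >= max_attempts )); then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$(( delay * 2 ))
        attempt=$(( attempt + 1 ))
    done
}

# Example (not executed here): wait for Redis to answer PING, sleeping
# 1s, 2s, 4s and 8s between the five attempts.
# retry_with_backoff 5 1 redis-cli -h 192.168.1.100 -p 6379 ping
```

Capping the number of attempts matters as much as the backoff: an unbounded retry loop during a long outage is what the circuit breaker is meant to prevent.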
For critical operations that cannot tolerate Redis downtime, even during failover, you might need to implement logic to fall back to your primary database (e.g., MySQL) or a simpler caching mechanism. This is highly application-specific.
Monitoring and Alerting
A comprehensive monitoring strategy is crucial for any HA setup. You need to monitor:
- Redis Master/Replica Health: Latency, memory usage, connected clients, replication lag.
- Sentinel Health: Ensure all Sentinels are running and can communicate. Monitor Sentinel logs for failover events.
- Application Connectivity: Monitor the success rate of Redis operations from your WooCommerce application.
Tools like Prometheus with the Redis Exporter, Grafana for visualization, and Alertmanager for notifications are excellent choices. Configure alerts for:
- Redis master down.
- Sentinel quorum not met.
- Replication lag exceeding a threshold.
- High Redis error rates from the application.
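If you do not have an exporter in place yet, the replication-lag alert can be approximated from the master's `INFO replication` output, which lists each replica's acknowledged offset next to `master_repl_offset`. The sketch below assumes IPv4 addresses and a byte-based threshold; `lagging_replicas` is a hypothetical helper, and the demo feeds it canned output.

```shell
#!/usr/bin/env bash
# lagging_replicas <max_bytes>
# Reads the master's INFO replication output on stdin, prints every replica
# whose offset is more than max_bytes behind master_repl_offset, and returns
# non-zero if any replica is lagging.
lagging_replicas() {
    local max_bytes="$1"
    tr -d '\r' | awk -F: -v max="$max_bytes" '
        $1 == "master_repl_offset" { master = $2 }
        $1 ~ /^slave[0-9]+$/ {
            n = split($2, kv, ",")
            addr = ""; off = 0
            for (i = 1; i <= n; i++) {
                split(kv[i], pair, "=")
                if (pair[1] == "ip") addr = pair[2]
                if (pair[1] == "port") addr = addr ":" pair[2]
                if (pair[1] == "offset") off = pair[2]
            }
            offsets[addr] = off
        }
        END {
            for (a in offsets) {
                gap = master - offsets[a]
                if (gap > max + 0) {
                    printf "%s behind by %d bytes\n", a, gap
                    found = 1
                }
            }
            exit found ? 1 : 0
        }
    '
}

# Demo with canned output; in practice pipe in
# `redis-cli -h 192.168.1.100 -p 6379 info replication`.
lagging_replicas 500 <<'EOF' || echo "alert: replication lag threshold exceeded"
role:master
connected_slaves:2
slave0:ip=192.168.1.101,port=6379,state=online,offset=1000,lag=0
slave1:ip=192.168.1.102,port=6379,state=online,offset=400,lag=1
master_repl_offset:1000
EOF
```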
OVH’s monitoring tools can also provide basic infrastructure-level metrics for your instances.
Testing Failover
Regularly test your failover mechanism to ensure it works as expected. This is non-negotiable for a production system.
Manual Failover Test Steps:
- Identify the current Redis master.
- Gracefully shut down the Redis master process (e.g., `redis-cli shutdown`).
- Observe the Sentinel logs. You should see Sentinels detecting the master as down, initiating a leader election, and promoting a replica.
- Verify that the application can still connect to Redis and that it’s now connected to the new master (check `redis-cli info replication` on the new master and its replicas).
- Bring the old master back online. It should automatically connect as a replica to the new master.
Alternatively, you can simulate network partitions or kill Sentinel processes to test their resilience and failover coordination.
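The manual steps above can be partly scripted. The sketch below asks a Sentinel which master it currently advertises and then polls until that address changes, which gives a rough measure of failover time. The default Sentinel address and the helper names `current_master` and `wait_for_new_master` are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Assumed defaults; override through the environment for your deployment.
SENTINEL_HOST="${SENTINEL_HOST:-192.168.1.1}"
SENTINEL_PORT="${SENTINEL_PORT:-26379}"
MASTER_NAME="${MASTER_NAME:-mymaster}"

# Print the master address Sentinel currently advertises as "ip:port".
# redis-cli prints the IP and the port on separate lines when piped.
current_master() {
    redis-cli -h "$SENTINEL_HOST" -p "$SENTINEL_PORT" \
        sentinel get-master-addr-by-name "$MASTER_NAME" | tr -d '\r' | paste -sd: -
}

# wait_for_new_master <old ip:port> <timeout seconds>
# Polls once per second until Sentinel advertises a different master.
wait_for_new_master() {
    local old="$1" timeout="$2" now elapsed=0
    while (( elapsed < timeout )); do
        now="$(current_master)"
        if [[ -n "$now" && "$now" != "$old" ]]; then
            echo "$now"
            return 0
        fi
        sleep 1
        elapsed=$(( elapsed + 1 ))
    done
    return 1
}

# Typical use during a test (not executed here):
#   before="$(current_master)"
#   ...shut down the master...
#   wait_for_new_master "$before" 60 && echo "failover completed"
```

With `down-after-milliseconds` set to 5000, expect the new address to appear only some seconds after the shutdown, so choose the timeout accordingly.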