Disaster Recovery 101: Architecting Auto-Failovers for Redis and PHP Deployments on OVH
Automated Redis Failover with Sentinel and PHP Application Logic
For mission-critical applications relying on Redis for caching, session management, or real-time data, a robust disaster recovery strategy is paramount. This document outlines an architecture for automated failover of Redis deployments, specifically targeting a setup with PHP applications hosted on OVHcloud infrastructure. We will leverage Redis Sentinel for high availability and implement application-level logic to gracefully handle failover events.
Redis Sentinel Deployment and Configuration
Redis Sentinel is the de facto standard for Redis high availability. It provides monitoring, notification, and automatic failover for Redis master-replica setups. A typical Sentinel deployment involves at least three Sentinel instances to ensure quorum and avoid split-brain scenarios.
Setting up Redis Master and Replicas
First, ensure you have a Redis master and at least one replica configured. For simplicity, we’ll assume these are running on separate OVHcloud instances or within different availability zones for better resilience.
Redis Master Configuration (redis.conf)
# redis.conf for Master
port 6379
daemonize yes
pidfile /var/run/redis_6379.pid
logfile /var/log/redis/redis-server.log
dbfilename dump.rdb
dir /var/lib/redis

# Replication
replica-serve-stale-data yes
replica-read-only yes
replica-announce-ip <master-private-ip>
replica-announce-port 6379

# Security (example, use proper security measures)
requirepass your_redis_password
# Needed once this node is demoted to a replica after a failover
masterauth your_redis_password

# If using TLS, uncomment and configure:
# tls-port 6380
# tls-cert-file /etc/redis/redis.crt
# tls-key-file /etc/redis/redis.key
# tls-ca-cert-file /etc/redis/ca.crt
Redis Replica Configuration (redis.conf)
# redis.conf for Replica
port 6379
daemonize yes
pidfile /var/run/redis_6379.pid
logfile /var/log/redis/redis-server.log
dbfilename dump.rdb
dir /var/lib/redis

# Replication
replicaof <master-private-ip> 6379
replica-serve-stale-data yes
replica-read-only yes
replica-announce-ip <replica-private-ip>
replica-announce-port 6379

# Security (example, use proper security measures)
requirepass your_redis_password
# Password the replica uses to authenticate to the master
masterauth your_redis_password

# If using TLS, uncomment and configure:
# tls-port 6380
# tls-cert-file /etc/redis/redis.crt
# tls-key-file /etc/redis/redis.key
# tls-ca-cert-file /etc/redis/ca.crt
Sentinel Configuration (sentinel.conf)
Deploy at least three Sentinel instances on separate hosts. Each Sentinel is pointed at the master and discovers the replicas and the other Sentinels from it. The quorum setting is critical: it is the number of Sentinels that must agree the master is unreachable before it is marked objectively down and a failover is attempted. The failover itself must additionally be authorized by a majority of all Sentinels. A common choice is floor(N/2) + 1, where N is the number of Sentinels; for 3 Sentinels that gives a quorum of 2.
# sentinel.conf for Sentinel Instance 1
port 26379
daemonize yes
pidfile /var/run/redis-sentinel.pid
logfile /var/log/redis/sentinel.log
dir /var/lib/redis

# Listen address (restrict access with firewall rules, or bind to the private IP instead)
bind 0.0.0.0

# Monitor the Redis master
# Format: sentinel monitor <master-name> <ip> <port> <quorum>
sentinel monitor mymaster <master-private-ip> 6379 2

# Password Sentinel uses to talk to the master and replicas (matches requirepass)
sentinel auth-pass mymaster your_redis_password

# Failover timeout (milliseconds)
sentinel failover-timeout mymaster 60000

# Down-after-milliseconds: how long the master must be unreachable to be considered down
sentinel down-after-milliseconds mymaster 10000

# Parallel synchronization: number of replicas that can sync with the new master simultaneously
sentinel parallel-syncs mymaster 1

# If using TLS for Redis, Sentinels also need TLS configuration
# tls-port 26380
# tls-cert-file /etc/redis/sentinel.crt
# tls-key-file /etc/redis/sentinel.key
# tls-ca-cert-file /etc/redis/ca.crt
# tls-replication yes
# tls-auth-clients no

# Other Sentinels and the replicas do not need to be listed here.
# Sentinels that monitor the same master discover each other, and the replicas, automatically.
Ensure that the sentinel monitor directive uses the private IP address of the Redis master within the OVHcloud network. With 3 Sentinels and a quorum of 2, one Sentinel can fail and the remaining two still meet the quorum and form the majority needed to promote a replica.
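The quorum arithmetic can be sketched in a few lines of PHP. The function names below are hypothetical helpers written for this illustration, not part of Redis or any client library.

```php
<?php
// Hypothetical helpers illustrating the quorum arithmetic; not part of Redis or Predis.

/** Smallest number of Sentinels that forms a strict majority of $total. */
function sentinelMajority(int $total): int
{
    if ($total < 1) {
        throw new InvalidArgumentException('At least one Sentinel is required.');
    }
    return intdiv($total, 2) + 1;
}

/** Number of Sentinels that can be lost while a majority is still reachable. */
function tolerableSentinelFailures(int $total): int
{
    return $total - sentinelMajority($total);
}

foreach ([3, 5, 7] as $n) {
    printf(
        "%d Sentinels: majority %d, tolerates %d failure(s)\n",
        $n,
        sentinelMajority($n),
        tolerableSentinelFailures($n)
    );
}
```

An even number of Sentinels adds cost without adding tolerance: 4 Sentinels need a majority of 3 and still tolerate only one failure, which is why odd counts are the usual choice.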
Starting Redis and Sentinel Services
Start the Redis master and replicas first, then start the Sentinel instances. Verify the Sentinel status using redis-cli -p 26379 SENTINEL master mymaster and redis-cli -p 26379 SENTINEL replicas mymaster.
# On Redis Master/Replica hosts
sudo systemctl start redis-server

# On Sentinel hosts
sudo systemctl start redis-sentinel
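The same status check can be automated from PHP. Over the classic protocol, SENTINEL master replies with a flat list of alternating field names and values, so a small amount of parsing is needed first. The sketch below is an illustration under assumptions: the two function names are hypothetical, the sample reply is hand-written and heavily shortened, and the 10.0.0.1 address is a made-up example.

```php
<?php
// Hypothetical helpers; the reply could be fetched with phpredis, for example via
// $redis->rawCommand('SENTINEL', 'master', 'mymaster') on a connection to a Sentinel.

/** Convert a flat [field, value, field, value, ...] reply into an associative array. */
function sentinelReplyToAssoc(array $flat): array
{
    $assoc = [];
    for ($i = 0; $i + 1 < count($flat); $i += 2) {
        $assoc[(string) $flat[$i]] = $flat[$i + 1];
    }
    return $assoc;
}

/** Return a list of problems; an empty list means the master looks healthy. */
function masterHealthProblems(array $master, int $expectedSentinels): array
{
    $problems = [];
    $flags = explode(',', (string) ($master['flags'] ?? ''));
    if (in_array('s_down', $flags, true) || in_array('o_down', $flags, true)) {
        $problems[] = 'master is flagged down: ' . implode(',', $flags);
    }
    if ((int) ($master['num-slaves'] ?? 0) < 1) {
        $problems[] = 'no replica is attached';
    }
    // num-other-sentinels does not count the Sentinel that answered the query.
    $seenSentinels = (int) ($master['num-other-sentinels'] ?? 0) + 1;
    if ($seenSentinels < $expectedSentinels) {
        $problems[] = "only {$seenSentinels} of {$expectedSentinels} Sentinels are known";
    }
    return $problems;
}

// Illustrative, shortened reply; a real one contains many more fields.
$sampleReply = [
    'name', 'mymaster', 'ip', '10.0.0.1', 'port', '6379', 'flags', 'master',
    'num-slaves', '2', 'num-other-sentinels', '2', 'quorum', '2',
];
$sampleMaster = sentinelReplyToAssoc($sampleReply);
$sampleProblems = masterHealthProblems($sampleMaster, 3);
echo $sampleProblems === [] ? "master looks healthy\n" : implode("\n", $sampleProblems) . "\n";
```

Running such a check after every deployment catches the common mistake of a Sentinel that started but never joined the others.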
PHP Application Integration and Failover Handling
Your PHP application needs to be aware of the Redis master’s address and be able to adapt when a failover occurs. The most robust way to achieve this is by using a Redis client library that supports Sentinel or by implementing custom logic to query Sentinel.
Using Predis with Sentinel Support
The Predis library for PHP offers excellent built-in support for Redis Sentinel. This simplifies the integration significantly.
require 'vendor/autoload.php'; // Assuming you use Composer

use Predis\Client;

// Sentinel endpoints (replace the placeholders with the private IPs of your Sentinels)
$sentinels = [
    'tcp://<sentinel-1-private-ip>:26379',
    'tcp://<sentinel-2-private-ip>:26379',
    'tcp://<sentinel-3-private-ip>:26379',
];

// Redis master name as configured in sentinel.conf
$masterName = 'mymaster';

// Connection options
$options = [
    // Resolve the master and replicas through Sentinel
    'replication' => 'sentinel',
    'service' => $masterName,
    // Applied to the Redis nodes reported by Sentinel, not to the Sentinels themselves
    'parameters' => [
        'password' => 'your_redis_password',
        // If using TLS for Redis (host and port are reported by Sentinel):
        // 'scheme' => 'tls',
    ],
];

try {
    $client = new Client($sentinels, $options);

    // Test connection and perform an operation
    $client->set('test_key', 'Hello, Redis!');
    $value = $client->get('test_key');
    echo "Successfully connected to Redis. Value: " . $value . "\n";

    // Example of using the client for sessions (if configured)
    // $handler = new \Predis\Session\Handler($client);
    // $handler->register();
    // session_start();
} catch (\Predis\CommunicationException $e) {
    // Handle connection errors, potentially log and alert
    error_log("Redis connection failed: " . $e->getMessage());
    // Implement fallback logic here, e.g., use a secondary cache or direct DB access
    echo "Error: Could not connect to Redis. Please try again later.\n";
} catch (\Exception $e) {
    error_log("An unexpected error occurred with Redis: " . $e->getMessage());
    echo "An unexpected error occurred. Please try again later.\n";
}
The Predis\Client, when created with the 'replication' => 'sentinel' option and a 'service' name, asks the Sentinels for the current master when it connects. If the master changes due to a failover, Predis queries the Sentinels again after the connection error and retries against the newly promoted master, so no application restart is required. Commands issued while the election is still in progress can still fail, which is why the catch blocks above should lead to real fallback logic.
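Because commands can fail while a failover is in progress, some applications wrap critical Redis calls in a bounded retry. The following is a minimal sketch of that idea, not a feature of Predis: withRetry is a hypothetical helper, it retries on any Exception for simplicity, and the sleep function is injectable only so the helper can be exercised without real delays.

```php
<?php
/**
 * Run $operation and retry on failure with exponential backoff.
 * Hypothetical helper: a real application would likely catch only
 * connection-related exceptions rather than every Exception.
 */
function withRetry(callable $operation, int $maxAttempts = 4, int $baseDelayMs = 200, ?callable $sleep = null)
{
    $sleep = $sleep ?? static function (int $ms): void {
        usleep($ms * 1000);
    };

    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (Exception $e) {
            if ($attempt >= $maxAttempts) {
                throw $e; // Retry budget exhausted: let the caller fall back
            }
            // Delays grow as 200 ms, 400 ms, 800 ms, ... with the defaults
            $sleep($baseDelayMs * (2 ** ($attempt - 1)));
        }
    }
}

// Demo with a simulated operation that fails twice before succeeding.
$demoCalls = 0;
$demoResult = withRetry(
    static function () use (&$demoCalls) {
        $demoCalls++;
        if ($demoCalls < 3) {
            throw new RuntimeException('simulated connection failure');
        }
        return 'ok';
    },
    4,
    200,
    static function (int $ms): void {
        // No real sleeping in the demo
    }
);
echo "Result after {$demoCalls} attempts: {$demoResult}\n";
```

With the default arguments the helper waits about 1.4 seconds in total, far less than the 10-second down-after-milliseconds used in the Sentinel configuration. It smooths over short interruptions; it does not let a web request outlast a complete failover, so a fallback path is still needed.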
Custom Sentinel Query Logic (Less Recommended)
While Predis’s built-in Sentinel support is preferred, you could manually query Sentinel instances to determine the current master. This is more complex and error-prone but might be necessary if using a client library without Sentinel support or for specific monitoring tools.
function getRedisMasterAddress(array $sentinels, string $masterName, ?string $sentinelPassword = null): ?array
{
    foreach ($sentinels as $sentinel) {
        $redis = new \Redis();
        try {
            // Connect to a Sentinel instance over its private IP, with a short timeout
            if (!$redis->connect($sentinel['ip'], $sentinel['port'], 1.0)) {
                continue; // Try next Sentinel
            }
            // Only needed if the Sentinels themselves are password-protected
            if ($sentinelPassword !== null) {
                $redis->auth($sentinelPassword);
            }
            // Query for the master's address; the reply is [ip, port]
            $address = $redis->rawCommand('SENTINEL', 'get-master-addr-by-name', $masterName);
            if (is_array($address) && count($address) === 2) {
                return ['host' => $address[0], 'port' => (int) $address[1]];
            }
        } catch (\RedisException $e) {
            // Log the error, but continue to try other Sentinels
            error_log("Sentinel query failed for {$sentinel['ip']}: {$e->getMessage()}");
        } finally {
            if ($redis->isConnected()) {
                $redis->close();
            }
        }
    }
    return null; // Could not determine master from any Sentinel
}

// Usage (replace the placeholders with the private IPs of your Sentinels):
$sentinelHosts = [
    ['ip' => '<sentinel-1-private-ip>', 'port' => 26379],
    ['ip' => '<sentinel-2-private-ip>', 'port' => 26379],
    ['ip' => '<sentinel-3-private-ip>', 'port' => 26379],
];
$masterName = 'mymaster';
$redisPassword = 'your_redis_password';

// The sentinel.conf above sets no password on the Sentinels, so none is passed here
$masterAddress = getRedisMasterAddress($sentinelHosts, $masterName);
if ($masterAddress) {
    try {
        $redisClient = new \Redis();
        // Use private IPs for Redis connections
        // If using TLS: prefix the host with 'tls://' and make sure Sentinel reports the TLS port
        if (!$redisClient->connect($masterAddress['host'], $masterAddress['port'])) {
            throw new \RedisException("Failed to connect to determined Redis master.");
        }
        $redisClient->auth($redisPassword);
        // Perform Redis operations
        $redisClient->set('app_data', 'some_value');
        echo "Redis master: {$masterAddress['host']}:{$masterAddress['port']}\n";
    } catch (\RedisException $e) {
        error_log("Redis connection error: " . $e->getMessage());
        // Implement fallback logic
    }
} else {
    error_log("Failed to determine Redis master address from Sentinels.");
    // Implement fallback logic
}
This custom approach requires careful error handling and retry mechanisms. You would typically call getRedisMasterAddress during application bootstrap or when a Redis connection error occurs. The application must then re-establish its connection to the newly elected master.
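One way to structure the "resolve at bootstrap, resolve again on error" pattern is a small cache around the lookup. The class below is a hypothetical sketch: MasterAddressCache is not part of phpredis, the resolver is any callable that returns an address array or null, and the clock is injectable purely for testing.

```php
<?php
/**
 * Hypothetical helper: remembers the master address returned by a resolver and
 * asks the resolver again after invalidate() or once the entry is too old.
 */
final class MasterAddressCache
{
    /** @var callable returns ['host' => string, 'port' => int] or null */
    private $resolver;
    /** @var callable returns the current Unix timestamp */
    private $clock;
    private int $ttlSeconds;
    private ?array $address = null;
    private int $resolvedAt = 0;

    public function __construct(callable $resolver, int $ttlSeconds = 30, ?callable $clock = null)
    {
        $this->resolver = $resolver;
        $this->ttlSeconds = $ttlSeconds;
        $this->clock = $clock ?? 'time';
    }

    /** Return the cached address, resolving it first if missing or expired. */
    public function get(): ?array
    {
        $now = ($this->clock)();
        if ($this->address === null || $now - $this->resolvedAt >= $this->ttlSeconds) {
            $this->address = ($this->resolver)();
            $this->resolvedAt = $now;
        }
        return $this->address;
    }

    /** Forget the cached address, for example after a connection error. */
    public function invalidate(): void
    {
        $this->address = null;
    }
}
```

A caller would pass a closure that wraps getRedisMasterAddress as the resolver and call invalidate() inside the RedisException handler before reconnecting. Note that under PHP-FPM every request starts with an empty object, so this mainly benefits long-running workers; sharing the address across requests would need a shared store, which is outside the scope of this sketch.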
OVHcloud Specific Considerations
Network Configuration and Security Groups
Ensure that your OVHcloud security groups (or firewall rules) allow traffic between your Redis instances, Sentinel instances, and your PHP application servers on the necessary ports (6379 for Redis, 26379 for Sentinel, and potentially TLS ports if configured). It is highly recommended to restrict access to these ports to only the private IP ranges of your internal network or specific security groups, rather than exposing them to the public internet.
Instance Placement for Resilience
To achieve true disaster recovery, deploy your Redis master, replicas, and Sentinel instances across different OVHcloud Availability Zones (AZs) within the same region, where the region offers more than one. This protects against single datacenter failures. For example:
- Redis Master: AZ-A
- Redis Replica 1: AZ-B
- Redis Replica 2: AZ-C
- Sentinel 1: AZ-A
- Sentinel 2: AZ-B
- Sentinel 3: AZ-C
Your PHP application servers should also be distributed across AZs to maintain availability during an AZ failure.
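A placement plan can be checked mechanically: losing any single AZ must still leave a majority of Sentinels, otherwise no failover can be authorized during that outage. The function below is a hypothetical sketch of that check; the zone labels are the illustrative ones from the list above.

```php
<?php
/**
 * Hypothetical check: which zones would, if lost, leave fewer than a majority
 * of Sentinels running?
 *
 * @param string[] $sentinelZones zone label of each Sentinel, e.g. ['AZ-A', 'AZ-B', 'AZ-C']
 * @return string[] zones whose loss prevents a failover from being authorized
 */
function zonesThatBreakMajority(array $sentinelZones): array
{
    $total = count($sentinelZones);
    $majority = intdiv($total, 2) + 1;

    $critical = [];
    foreach (array_count_values($sentinelZones) as $zone => $count) {
        // Sentinels remaining after this zone disappears
        if ($total - $count < $majority) {
            $critical[] = (string) $zone;
        }
    }
    return $critical;
}

$spread = zonesThatBreakMajority(['AZ-A', 'AZ-B', 'AZ-C']);
$stacked = zonesThatBreakMajority(['AZ-A', 'AZ-A', 'AZ-B']);

echo 'One Sentinel per zone, critical zones: ' . (implode(', ', $spread) ?: 'none') . "\n";
echo 'Two Sentinels in AZ-A, critical zones: ' . (implode(', ', $stacked) ?: 'none') . "\n";
```

Placing two of three Sentinels in the same zone makes that zone a single point of failure for the failover itself, even though the Redis data is replicated elsewhere.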
Monitoring and Alerting
Implement comprehensive monitoring for your Redis and Sentinel instances. Key metrics include:
- Redis: Memory usage, CPU load, connected clients, latency, replication lag.
- Sentinel: Number of masters down, number of sentinels reporting masters down, current master, number of failovers.
Utilize OVHcloud’s monitoring tools or integrate with external services like Prometheus/Grafana or Datadog. Set up alerts for critical events, such as Sentinel reporting a master down, a failed failover, or high replication lag.
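Replication lag can be derived from the output of INFO replication on the master by comparing master_repl_offset with the offset reported for each replica. The parser below is a minimal sketch under assumptions: replicaOffsetLag is a hypothetical name, and the sample text is hand-written with made-up addresses and offsets.

```php
<?php
/**
 * Hypothetical helper: compute, per replica, how many bytes of the replication
 * stream it is behind the master, from the text of INFO replication.
 *
 * @return array<string, int> map of "ip:port" to lag in bytes
 */
function replicaOffsetLag(string $info): array
{
    $masterOffset = null;
    $replicas = [];

    foreach (preg_split('/\r?\n/', $info) as $line) {
        $line = trim($line);
        if (preg_match('/^master_repl_offset:(\d+)$/', $line, $m)) {
            $masterOffset = (int) $m[1];
        } elseif (preg_match('/^slave\d+:(.+)$/', $line, $m)) {
            // Replica lines look like: ip=...,port=...,state=...,offset=...,lag=...
            $fields = [];
            foreach (explode(',', $m[1]) as $pair) {
                $parts = explode('=', $pair, 2) + [1 => ''];
                $fields[$parts[0]] = $parts[1];
            }
            $replicas[] = $fields;
        }
    }

    $lag = [];
    if ($masterOffset === null) {
        return $lag; // Not a master, or unexpected output
    }
    foreach ($replicas as $replica) {
        if (isset($replica['ip'], $replica['port'], $replica['offset'])) {
            $lag["{$replica['ip']}:{$replica['port']}"] = $masterOffset - (int) $replica['offset'];
        }
    }
    return $lag;
}

// Illustrative sample only; addresses and offsets are made up.
$sampleInfo = "# Replication\r\nrole:master\r\nconnected_slaves:2\r\n"
    . "slave0:ip=10.0.0.2,port=6379,state=online,offset=1000,lag=0\r\n"
    . "slave1:ip=10.0.0.3,port=6379,state=online,offset=400,lag=3\r\n"
    . "master_repl_offset:1000\r\n";

foreach (replicaOffsetLag($sampleInfo) as $replica => $bytesBehind) {
    echo "{$replica} is {$bytesBehind} bytes behind\n";
}
```

A replica that stays far behind is a poor failover candidate, because everything it has not yet received is lost if it is promoted. Alert thresholds depend on write volume and are left to the reader.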
Testing Your Failover Strategy
Regularly test your failover mechanism to ensure it functions as expected and to identify any weaknesses. A simple test involves manually shutting down the Redis master instance. Observe:
- Sentinel logs: Verify that Sentinels detect the master as down and initiate a failover.
- Application logs: Check for connection errors and subsequent successful reconnections to the new master.
- Application behavior: Ensure the application remains functional, possibly with a brief interruption.
You can simulate a master failure by stopping the Redis service on the master instance:
# On the Redis Master host
sudo systemctl stop redis-server
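After the failover, restart the old master. Sentinel should automatically reconfigure it as a replica of the new master. This is a crucial part of the recovery process, and it only works if the old master can authenticate to the new one, so masterauth must be set on every node when requirepass is in use.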
# On the old Redis Master host (after failover is complete)
sudo systemctl start redis-server
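To put a number on the "brief interruption" during such a test, a probe can be run in a loop from an application server while the master is stopped. The sketch below is one hypothetical way to do it: probeRepeatedly and longestOutageSeconds are illustrative names, and the probe is any callable that returns true on success, for example a closure that performs a SET through the application's Redis client.

```php
<?php
/**
 * Hypothetical helper: call $probe $count times and record when each call
 * succeeded or failed. A probe that throws is recorded as a failure.
 *
 * @return array<int, array{t: float, ok: bool}>
 */
function probeRepeatedly(callable $probe, int $count, int $intervalMs = 500): array
{
    $samples = [];
    for ($i = 0; $i < $count; $i++) {
        try {
            $ok = (bool) $probe();
        } catch (Exception $e) {
            $ok = false;
        }
        $samples[] = ['t' => microtime(true), 'ok' => $ok];
        if ($i < $count - 1) {
            usleep($intervalMs * 1000);
        }
    }
    return $samples;
}

/**
 * Longest span, in seconds, from a first failed probe to the next successful one.
 * An outage that is still ongoing at the last sample is not counted.
 */
function longestOutageSeconds(array $samples): float
{
    $longest = 0.0;
    $outageStart = null;
    foreach ($samples as $sample) {
        if (!$sample['ok'] && $outageStart === null) {
            $outageStart = $sample['t'];
        } elseif ($sample['ok'] && $outageStart !== null) {
            $longest = max($longest, $sample['t'] - $outageStart);
            $outageStart = null;
        }
    }
    return $longest;
}
```

The measured outage is only as precise as the probe interval, and it should be expected to be at least as long as down-after-milliseconds, since Sentinel does not start a failover before that time has elapsed.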
Conclusion
By combining Redis Sentinel for automated high availability with robust PHP application integration using libraries like Predis, you can architect a resilient Redis deployment on OVHcloud. Distributing components across Availability Zones and implementing thorough monitoring and testing are key to ensuring your application can withstand infrastructure failures with minimal downtime.