Disaster Recovery 101: Architecting Auto-Failovers for Redis and PHP Deployments on DigitalOcean
Establishing a Highly Available Redis Cluster on DigitalOcean
Achieving true disaster recovery for a web application hinges on resilient data stores. For applications leveraging Redis for caching, session management, or real-time data, a single-instance deployment is a critical vulnerability. This section details the architecture and implementation of a highly available Redis deployment on DigitalOcean (a master, a replica, and Redis Sentinel, as opposed to sharded Redis Cluster), focusing on automatic failover.
We will utilize Redis Sentinel, a distributed system designed to provide high availability for Redis. Sentinel monitors Redis instances, performs automatic failovers, and provides configuration discovery for clients.
Sentinel Deployment Strategy
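A robust Sentinel setup needs at least three Sentinel instances. Two thresholds matter here. The quorum (configured below) is how many Sentinels must agree that the master is unreachable before it is flagged as down, and the failover itself must then be authorized by a majority of all Sentinels. With three Sentinels and a quorum of 2, losing any single Sentinel still leaves a majority able to act, while a minority cut off from the rest cannot promote a replica on its own, which reduces the risk of split-brain scenarios. Deploy the Sentinel instances on separate DigitalOcean Droplets so that one failed Droplet takes down at most one Sentinel. DigitalOcean VPC networks are scoped to a single datacenter region, so with the private networking used in this guide, all nodes run in the same region.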
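A small arithmetic sketch makes the two thresholds concrete. It is a simplified model written for this guide, not Sentinel's source code: it assumes the master is flagged objectively down once `quorum` reachable Sentinels agree, and that the Sentinel running the failover needs votes from a majority of all Sentinels (and never fewer than the quorum). `sentinelCanFailover` is a hypothetical name.

```php
<?php

// Simplified model of Sentinel's two thresholds (a sketch, not Sentinel's code):
// 1. ODOWN: at least $quorum Sentinels must see the master as down.
// 2. Leader election: the failover leader needs votes from a majority of all
//    Sentinels, and at least $quorum votes if the quorum is larger.
function sentinelCanFailover(int $totalSentinels, int $quorum, int $reachableSentinels): bool
{
    // Only Sentinels that can talk to each other can report or vote.
    $canReachOdown = $reachableSentinels >= $quorum;

    $majority = intdiv($totalSentinels, 2) + 1;
    $votesNeeded = max($majority, $quorum);
    $canElectLeader = $reachableSentinels >= $votesNeeded;

    return $canReachOdown && $canElectLeader;
}

// Three Sentinels, quorum 2: losing one Sentinel still allows a failover.
var_dump(sentinelCanFailover(3, 2, 2)); // bool(true)
// Two Sentinels, quorum 1: the outage is detected, but 1 of 2 is not a majority.
var_dump(sentinelCanFailover(2, 1, 1)); // bool(false)
```

This is why three Sentinels is the practical minimum: with only two, the survivor of a single Sentinel failure can still detect the outage but can never win a majority, so no failover happens.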
Redis Configuration for High Availability
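First, configure your Redis master and replica instances for replication. The Sentinels must be able to reach the master on its private IP, and each replica is configured to follow the master. Because Sentinel reconfigures nodes during a failover and persists the change with CONFIG REWRITE, each redis.conf must be writable by the user Redis runs as.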
Master Redis Configuration (redis.conf)
# redis.conf on the master node
port 6379
daemonize yes
pidfile /var/run/redis_6379.pid
logfile /var/log/redis/redis-server.log
dir /var/lib/redis

# Listen on loopback (so local redis-cli works) and on the private IP.
# Replace 10.10.0.10 with your Droplet's private IP.
bind 127.0.0.1 10.10.0.10

# Disable protected mode only if you have strong firewall rules in place.
# Keep comments on their own lines: Redis does not treat a trailing # as a comment.
protected-mode no

# RDB snapshots. Without any save line Redis uses built-in save points;
# set save "" to disable snapshots for a pure cache, or tune them for your RPO:
# save 900 1
# save 300 10
# save 60 10000
dbfilename dump.rdb

# AOF is generally preferred for durability.
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec

# The name "mymaster" is defined only in sentinel.conf; redis.conf has no setting for it.
Replica Redis Configuration (redis.conf)
# redis.conf on the replica node
port 6379
daemonize yes
pidfile /var/run/redis_6379.pid
logfile /var/log/redis/redis-server.log
dir /var/lib/redis

# Crucial for replication: follow the master.
# Replace with your master's private IP and port. Sentinel rewrites this line after a failover.
replicaof 10.10.0.10 6379

# Listen on loopback and on this Droplet's private IP.
bind 127.0.0.1 10.10.0.11

# Only if you have strong firewall rules in place.
protected-mode no

# AOF is generally preferred for durability.
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
Sentinel Configuration (sentinel.conf)
On each of the Sentinel Droplets, create a sentinel.conf file. This configuration tells Sentinel how to monitor your Redis master, what quorum is required for failover, and how to reach the master.
# sentinel.conf on each Sentinel node
port 26379
daemonize yes
pidfile /var/run/redis-sentinel.pid
logfile /var/log/redis/sentinel.log
dir /var/lib/redis

# Listen on loopback and this Droplet's private IP.
# Use 10.10.0.20, 10.10.0.21, and 10.10.0.22 on the three Sentinel Droplets.
bind 127.0.0.1 10.10.0.20
# Only if you have strong firewall rules in place.
protected-mode no

# Monitor the master that Sentinel will call 'mymaster' at 10.10.0.10:6379.
# The final 2 is the quorum: how many Sentinels must agree the master is down
# before it is flagged objectively down. Starting the failover also requires a
# leader elected by a majority of Sentinels (2 of 3 here).
sentinel monitor mymaster 10.10.0.10 6379 2

# How long (ms) the master must be unresponsive before a Sentinel considers it down.
# Default is 30000 (30 seconds).
sentinel down-after-milliseconds mymaster 30000

# Failover timeout (ms). It bounds how long an in-progress failover waits for
# replicas to be reconfigured, and a Sentinel waits twice this value before
# retrying a failover of the same master. Default is 180000 (3 minutes).
sentinel failover-timeout mymaster 60000

# How many replicas may resync with the new master at the same time after a
# failover. A low value takes longer but keeps more replicas available for reads.
sentinel parallel-syncs mymaster 1
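Because only the bind address differs between the three Sentinel nodes, you may prefer to generate the files rather than edit them by hand. This is a hypothetical PHP helper (`renderSentinelConf` is not part of any library) that reproduces the directives above and writes one file per node to the system temp directory; copying the files to the Droplets is left to you.

```php
<?php

// Hypothetical helper: render the sentinel.conf above for one Sentinel Droplet.
// Only the bind address differs between the three nodes.
function renderSentinelConf(string $bindIp, string $masterIp = '10.10.0.10', int $quorum = 2): string
{
    $lines = [
        'port 26379',
        'daemonize yes',
        'pidfile /var/run/redis-sentinel.pid',
        'logfile /var/log/redis/sentinel.log',
        'dir /var/lib/redis',
        // Loopback keeps local `redis-cli -p 26379` working.
        "bind 127.0.0.1 {$bindIp}",
        'protected-mode no',
        // `sentinel monitor` must precede the other `sentinel ... mymaster` lines.
        "sentinel monitor mymaster {$masterIp} 6379 {$quorum}",
        'sentinel down-after-milliseconds mymaster 30000',
        'sentinel failover-timeout mymaster 60000',
        'sentinel parallel-syncs mymaster 1',
    ];
    return implode("\n", $lines) . "\n";
}

// Write one file per node; copy each to /etc/redis/sentinel.conf on its Droplet.
foreach (['10.10.0.20', '10.10.0.21', '10.10.0.22'] as $ip) {
    $path = sys_get_temp_dir() . "/sentinel-{$ip}.conf";
    file_put_contents($path, renderSentinelConf($ip));
    echo "wrote {$path}\n";
}
```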
Setting up the Cluster
1. Provision Droplets: Create at least five Droplets on DigitalOcean: one for the Redis master, one for a Redis replica, and three for the Redis Sentinel instances. Place them in the same region and VPC network so they can communicate over private IPs.
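2. Install Redis: Install Redis on every Droplet. The Sentinel nodes need it too, because Sentinel is a special mode of the Redis server:
sudo apt update
sudo apt install redis-server -y
# Optional, on the Sentinel Droplets only: the standalone Redis server is not needed there.
# sudo systemctl disable --now redis-server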
3. Configure Redis: Edit redis.conf on the master and replica Droplets as described above. Restart Redis on each:
sudo systemctl restart redis-server
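4. Configure Sentinel: On each of the three Sentinel Droplets, create /etc/redis/sentinel.conf with the content above, setting the bind address to that Droplet's own private IP (10.10.0.20, 10.10.0.21, or 10.10.0.22). Sentinel rewrites this file to record the replicas and other Sentinels it discovers and the outcome of each failover, so it must be writable by the user Sentinel runs as. Start Sentinel on each:
# On each Sentinel Droplet:
redis-sentinel /etc/redis/sentinel.conf
# Equivalent form using the server binary:
# redis-server /etc/redis/sentinel.conf --sentinel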
5. Verify Replication: On the master, run redis-cli INFO replication. It should report role:master and connected_slaves:1. On the replica, the same command should report role:slave, master_host and master_port pointing to your master, and master_link_status:up.
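6. Verify Sentinel: On any Sentinel node, run redis-cli -p 26379 SENTINEL masters. You should see mymaster listed with its address, its flags (master when healthy), num-slaves (1 here), and num-other-sentinels (2 here, since the Sentinel you query does not count itself). Run redis-cli -p 26379 SENTINEL master mymaster for the same details about this one master, or redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster for just its current IP and port.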
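If you script this check (for example from a cron job), the reply of SENTINEL master mymaster arrives as a flat list of alternating field names and values, which is how redis-cli prints it. The sketch below assumes that shape; `pairsToAssoc` and `sentinelMasterProblems` are hypothetical helpers, and the expected counts (one replica, three Sentinels) match this guide's layout.

```php
<?php

// Convert a flat [field, value, field, value, ...] reply into an associative array.
function pairsToAssoc(array $flat): array
{
    $assoc = [];
    for ($i = 0; $i + 1 < count($flat); $i += 2) {
        $assoc[$flat[$i]] = $flat[$i + 1];
    }
    return $assoc;
}

// Hypothetical health check for a 1 master / 1 replica / 3 Sentinel layout.
// Returns human-readable problems; an empty array means the master looks healthy.
function sentinelMasterProblems(array $info, int $expectedReplicas = 1, int $expectedSentinels = 3): array
{
    $problems = [];
    // `flags` is "master" when healthy; s_down / o_down indicate an outage.
    $flags = explode(',', $info['flags'] ?? '');
    foreach (['s_down', 'o_down'] as $bad) {
        if (in_array($bad, $flags, true)) {
            $problems[] = "master flagged {$bad}";
        }
    }
    if ((int) ($info['num-slaves'] ?? 0) < $expectedReplicas) {
        $problems[] = 'fewer replicas than expected';
    }
    // num-other-sentinels excludes the Sentinel being queried.
    if ((int) ($info['num-other-sentinels'] ?? 0) < $expectedSentinels - 1) {
        $problems[] = 'fewer Sentinels than expected';
    }
    return $problems;
}

$reply = ['name', 'mymaster', 'ip', '10.10.0.10', 'port', '6379', 'flags', 'master',
          'num-slaves', '1', 'num-other-sentinels', '2', 'quorum', '2'];
var_dump(sentinelMasterProblems(pairsToAssoc($reply)) === []); // bool(true)
```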
Architecting PHP Application for Redis Auto-Failover
Your PHP application needs to be aware of the Redis cluster and be able to connect to the current master, even after a failover. Redis Sentinel provides a mechanism for clients to discover the current master.
Using Predis for Sentinel Integration
The predis/predis library is a popular choice for PHP and offers excellent support for Redis Sentinel. If you’re not already using it, install it via Composer:
composer require predis/predis
PHP Connection Logic
The key is to configure your PHP application to connect to the Sentinel instances, not directly to the Redis master. Predis will then query Sentinel to get the address of the current master.
<?php

require 'vendor/autoload.php';

// Replace with the private IPs of your Sentinel Droplets
$sentinels = [
    'tcp://10.10.0.20:26379',
    'tcp://10.10.0.21:26379',
    'tcp://10.10.0.22:26379',
];

// 'mymaster' is the name defined in sentinel.conf
$masterName = 'mymaster';

try {
    $client = new Predis\Client($sentinels, [
        'replication' => 'sentinel',
        'service' => $masterName,
        // Optional: if your Redis instances are password-protected, pass the
        // password for the Redis nodes through 'parameters'.
        // 'parameters' => ['password' => 'your_redis_password'],
    ]);

    // Test the connection by setting and getting a key
    $client->set('test_key', 'Hello, Sentinel!');
    $value = $client->get('test_key');

    // Ask the replication connection which node it resolved as master.
    $master = $client->getConnection()->getMaster()->getParameters();
    echo "Successfully connected to Redis master: {$master->host}:{$master->port}\n";
    echo "Value of test_key: " . $value . "\n";

    // Example of using Redis for caching
    $userData = ['id' => 123, 'name' => 'John Doe', 'email' => 'john.doe@example.com'];
    $cacheKey = 'user_data_' . $userData['id'];
    $client->setex($cacheKey, 3600, json_encode($userData)); // Cache for 1 hour

    // GET returns null on a cache miss; only decode when there is a value.
    $cached = $client->get($cacheKey);
    $cachedData = $cached !== null ? json_decode($cached, true) : null;
    if ($cachedData) {
        echo "Retrieved from cache: " . $cachedData['name'] . "\n";
    }
} catch (Predis\Connection\ConnectionException $e) {
    // Handle connection errors. This might happen during a failover.
    // Implement retry logic or fallback mechanisms here.
    error_log("Redis connection failed: " . $e->getMessage());
    // For critical operations, you might want to redirect to a maintenance page
    // or use a secondary data source if available.
    die("Could not connect to Redis. Please try again later.");
} catch (Exception $e) {
    error_log("An unexpected error occurred: " . $e->getMessage());
    die("An internal error occurred.");
}
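This snippet initializes a Predis client from a list of Sentinel servers rather than a fixed Redis address. Predis asks the Sentinels for the current master and connects to it. After a failover, a newly created client (for example, on the next PHP-FPM request) is pointed at the promoted master, and an existing client that loses its connection to the old master queries the Sentinels again before retrying. Commands issued while the promotion is still in progress can fail, which is why the connection errors are caught.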
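Predis performs this discovery internally. To make the mechanism concrete, here is a minimal, dependency-free sketch of the same idea: encode a command in the Redis protocol (RESP), ask each Sentinel in turn for `SENTINEL get-master-addr-by-name`, and return the first answer. All function names are hypothetical, and the sketch skips AUTH, TLS, and most reply types.

```php
<?php

// Encode a command as a RESP array of bulk strings.
function respEncode(array $args): string
{
    $out = '*' . count($args) . "\r\n";
    foreach ($args as $arg) {
        $arg = (string) $arg;
        $out .= '$' . strlen($arg) . "\r\n" . $arg . "\r\n";
    }
    return $out;
}

// Read one RESP reply: simple string, error, integer, bulk string, or array.
function respRead($stream)
{
    $line = fgets($stream);
    if ($line === false || strlen($line) < 3) {
        throw new RuntimeException('connection closed or timed out');
    }
    $type = $line[0];
    $payload = substr($line, 1, -2); // drop the type byte and trailing \r\n
    switch ($type) {
        case '+':
            return $payload;
        case '-':
            throw new RuntimeException($payload);
        case ':':
            return (int) $payload;
        case '$':
            $len = (int) $payload;
            if ($len === -1) {
                return null;
            }
            $data = '';
            while (strlen($data) < $len + 2) { // body plus trailing \r\n
                $chunk = fread($stream, $len + 2 - strlen($data));
                if ($chunk === false || $chunk === '') {
                    throw new RuntimeException('short read');
                }
                $data .= $chunk;
            }
            return substr($data, 0, $len);
        case '*':
            $count = (int) $payload;
            if ($count === -1) {
                return null;
            }
            $items = [];
            for ($i = 0; $i < $count; $i++) {
                $items[] = respRead($stream);
            }
            return $items;
    }
    throw new RuntimeException("unexpected reply type {$type}");
}

// Ask each Sentinel in turn for the current master address of $service.
function discoverMaster(array $sentinels, string $service, float $timeoutSec = 0.5): array
{
    foreach ($sentinels as $address) {
        $conn = @stream_socket_client($address, $errno, $errstr, $timeoutSec);
        if ($conn === false) {
            continue; // unreachable: try the next Sentinel
        }
        stream_set_timeout($conn, (int) ceil($timeoutSec));
        try {
            fwrite($conn, respEncode(['SENTINEL', 'get-master-addr-by-name', $service]));
            $reply = respRead($conn);
            if (is_array($reply) && count($reply) === 2) {
                return ['host' => $reply[0], 'port' => (int) $reply[1]];
            }
        } catch (RuntimeException $e) {
            // protocol error or timeout: fall through to the next Sentinel
        } finally {
            fclose($conn);
        }
    }
    throw new RuntimeException("no Sentinel could resolve {$service}");
}
```

A production client should also confirm the answer by sending ROLE to the returned address, as the Redis Sentinel client guidelines recommend, because a Sentinel can briefly hold stale information. Predis performs its own version of these steps, which is why this guide relies on it.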
Handling Failovers in Application Logic
While Predis handles the connection switch, your application should be prepared for temporary unavailability during a failover. This typically involves:
- Implementing retry mechanisms with exponential backoff for Redis operations.
- Logging connection errors to monitor failover events.
- Potentially implementing a fallback strategy, such as serving stale data from a cache or a secondary data source, if Redis is critical for core functionality and temporary unavailability is unacceptable.
- Ensuring your application’s deployment process is idempotent and can restart connections gracefully.
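The first item, retrying with exponential backoff, can be a small wrapper around any Redis call. This is a minimal sketch, not a Predis feature: `withBackoff` is a hypothetical helper that retries only the errors your `$isRetryable` callback accepts, doubles the maximum delay on each attempt, and sleeps a random time up to that cap ("full jitter") so many PHP workers do not retry in lockstep.

```php
<?php

// Hypothetical helper: retry $operation with capped exponential backoff and
// full jitter. Non-retryable errors, or the last failed attempt, are rethrown.
function withBackoff(callable $operation, callable $isRetryable, int $maxAttempts = 5,
                     int $baseDelayMs = 100, int $maxDelayMs = 5000)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (Throwable $e) {
            if ($attempt >= $maxAttempts || !$isRetryable($e)) {
                throw $e;
            }
            // The cap doubles each attempt: base, 2*base, 4*base, ... up to max.
            $capMs = min($maxDelayMs, $baseDelayMs * (2 ** ($attempt - 1)));
            usleep(random_int(0, $capMs) * 1000);
        }
    }
}

// Usage with Predis (assumes $client from the connection example):
// $value = withBackoff(
//     fn () => $client->get('session:abc'),
//     fn (Throwable $e) => $e instanceof Predis\Connection\ConnectionException
// );
```

In a PHP-FPM request every sleep holds a worker, so keep the budget small (with the defaults above the total sleep is at most 1.5 seconds across five attempts) and fall back to a degraded response rather than waiting out a full failover.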
Simulating and Testing Failovers
Regular testing of your failover mechanism is paramount. You can simulate a failover by manually stopping the Redis master process or by instructing Sentinel to perform a failover.
Manual Master Shutdown
On the Redis master Droplet, stop the Redis server:
sudo systemctl stop redis-server
Observe the logs on your Sentinel instances (e.g., /var/log/redis/sentinel.log). After down-after-milliseconds (30 seconds in the configuration above), you should see the Sentinels mark the master as subjectively and then objectively down (+sdown, +odown), elect a leader among themselves, and promote the replica (+switch-master). Your PHP application, on its next connection or operation, should reach the new master. When you start Redis on the old master again, the Sentinels reconfigure it as a replica of the new master.
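To turn these logs into a record of failovers, you can scan for the +switch-master event, which Sentinel emits with the master name followed by the old and new address. This hypothetical helper assumes only that the event text appears somewhere on the line; the prefix Sentinel writes before it (process ID, role, timestamp) varies by version and is ignored.

```php
<?php

// Hypothetical helper: extract failover events from Sentinel log lines.
// Matches "+switch-master <name> <old-ip> <old-port> <new-ip> <new-port>".
function switchMasterEvents(iterable $lines): array
{
    $events = [];
    foreach ($lines as $line) {
        if (preg_match('/\+switch-master (\S+) (\S+) (\d+) (\S+) (\d+)/', $line, $m)) {
            $events[] = [
                'service' => $m[1],
                'old' => "{$m[2]}:{$m[3]}",
                'new' => "{$m[4]}:{$m[5]}",
            ];
        }
    }
    return $events;
}

// Usage on a Sentinel Droplet (log path from the sentinel.conf above):
// $events = switchMasterEvents(file('/var/log/redis/sentinel.log', FILE_IGNORE_NEW_LINES));
```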
Forced Failover via Sentinel CLI
You can also trigger a failover from any Sentinel instance using redis-cli:
# On any Sentinel node:
redis-cli -p 26379 SENTINEL failover mymaster
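This command makes the Sentinel you query start a failover of mymaster as if the master were unreachable, without asking the other Sentinels for agreement; it then publishes the new configuration so the other Sentinels update theirs. Monitor the Sentinel logs to confirm the process. After the failover completes, verify that your PHP application can still connect and operate correctly.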
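To put a number on how long your application actually loses Redis during a drill, you can sample a cheap operation in a loop while you trigger the failover. In this sketch, `runProbe` and `longestOutageMs` are hypothetical helpers; the probe is any callable you supply (for example, one that builds a fresh Predis client and runs SET), and the result is the longest run of consecutive failures.

```php
<?php

// Call $probe every $intervalMs and record [timestampMs, ok] samples.
function runProbe(callable $probe, int $count, int $intervalMs): array
{
    $samples = [];
    for ($i = 0; $i < $count; $i++) {
        try {
            $ok = (bool) $probe();
        } catch (Throwable $e) {
            $ok = false; // any exception counts as "Redis unavailable"
        }
        $samples[] = [microtime(true) * 1000, $ok];
        usleep($intervalMs * 1000);
    }
    return $samples;
}

// Longest span from the first failed sample to the next successful one.
function longestOutageMs(array $samples): float
{
    $longest = 0.0;
    $downSince = null;
    $lastT = null;
    foreach ($samples as [$t, $ok]) {
        $lastT = $t;
        if (!$ok && $downSince === null) {
            $downSince = $t;                           // outage starts
        } elseif ($ok && $downSince !== null) {
            $longest = max($longest, $t - $downSince); // ends at first success
            $downSince = null;
        }
    }
    if ($downSince !== null) {
        // Still failing at the last sample: count up to that sample.
        $longest = max($longest, $lastT - $downSince);
    }
    return $longest;
}

// Usage during a drill (assumes $sentinels from the connection example and a
// hypothetical $options array holding the same Predis options):
// $samples = runProbe(fn () => (new Predis\Client($sentinels, $options))
//     ->set('failover_probe', time()), 900, 100);
// printf("longest outage: %.0f ms\n", longestOutageMs($samples));
```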
Monitoring and Alerting
Beyond manual testing, integrate robust monitoring and alerting. DigitalOcean’s monitoring tools can track Droplet health. For Redis and Sentinel specifically, consider:
- Using Prometheus with the Redis exporter to collect metrics on Redis and Sentinel performance, replication lag, and Sentinel status.
- Defining Prometheus alerting rules on these metrics, routed through Alertmanager to Slack or PagerDuty, that fire when Sentinel reports the master as down, when a failover occurs, or when a failover does not complete.
- Regularly reviewing Sentinel logs for any recurring issues or warnings.
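Until an exporter is in place, you can get a rough replication-lag figure straight from INFO replication on the master. The sketch assumes the documented field layout: master_repl_offset for the master, and one slaveN line per replica carrying ip, port, state, offset, and lag (seconds since the replica last acknowledged). `replicaLag` is a hypothetical helper.

```php
<?php

// Hypothetical helper: parse `INFO replication` output from the master and
// report, per replica, how many bytes it is behind and seconds since its last ack.
function replicaLag(string $info): array
{
    $fields = [];
    foreach (preg_split('/\r?\n/', $info) as $line) {
        if ($line === '' || $line[0] === '#' || strpos($line, ':') === false) {
            continue; // skip blanks and section headers
        }
        [$key, $value] = explode(':', $line, 2);
        $fields[$key] = $value;
    }
    $masterOffset = (int) ($fields['master_repl_offset'] ?? 0);

    $replicas = [];
    foreach ($fields as $key => $value) {
        if (!preg_match('/^slave\d+$/', $key)) {
            continue;
        }
        // "ip=..,port=..,state=..,offset=..,lag=.." -> associative array
        parse_str(str_replace(',', '&', $value), $r);
        $replicas[] = [
            'addr' => ($r['ip'] ?? '?') . ':' . ($r['port'] ?? '?'),
            'state' => $r['state'] ?? 'unknown',
            'bytes_behind' => $masterOffset - (int) ($r['offset'] ?? 0),
            'seconds_since_ack' => (int) ($r['lag'] ?? 0),
        ];
    }
    return $replicas;
}
```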
Considerations for Production Deployments
While this setup provides automatic failover, several factors are critical for production readiness:
Network Configuration and Firewalls
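Ensure that your DigitalOcean firewall rules allow traffic between your Redis master, replicas, and Sentinel instances on the respective ports (6379 for Redis, 26379 for Sentinel). Your application servers need access to both: port 26379 on the Sentinels to discover the master, and port 6379 on every Redis node, because Sentinel only returns the master's address and the client then connects to that node directly.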
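A quick way to confirm the firewall rules is to attempt a TCP connection to each port from each machine that needs it. `unreachableTargets` is a hypothetical pre-flight helper; a successful connection only proves the port is open, not that authentication or replication works.

```php
<?php

// Hypothetical pre-flight check: try a TCP connection to each [host, port]
// and return the ones that fail, with the error message from fsockopen().
function unreachableTargets(array $targets, float $timeoutSec = 1.0): array
{
    $failed = [];
    foreach ($targets as [$host, $port]) {
        $conn = @fsockopen($host, $port, $errno, $errstr, $timeoutSec);
        if ($conn === false) {
            $failed[] = "{$host}:{$port} ({$errstr})";
        } else {
            fclose($conn);
        }
    }
    return $failed;
}

// Usage from an application server, with the addresses from this guide:
// $problems = unreachableTargets([
//     ['10.10.0.10', 6379], ['10.10.0.11', 6379],
//     ['10.10.0.20', 26379], ['10.10.0.21', 26379], ['10.10.0.22', 26379],
// ]);
// foreach ($problems as $p) { fwrite(STDERR, "unreachable: {$p}\n"); }
```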
Redis Persistence and RPO
# redis.conf
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
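If your Redis instance stores critical data (not just cache), configure persistence; AOF is generally preferred for durability. The appendfsync everysec setting offers a good balance between performance and durability, at the cost of losing about one second of writes if the server crashes, which sets a floor on your Recovery Point Objective (RPO). Adjust this based on your business requirements. Failover adds a second source of loss: replication is asynchronous, so Sentinel promotes the best available replica (lowest replica-priority, then the largest replication offset), and any writes the old master acknowledged but had not yet replicated are lost. Keep replication lag low, and consider min-replicas-to-write and min-replicas-max-lag if you need to bound that window.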
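Sentinel's documented replica-selection order explains why the freshest replica is preferred and yet acknowledged writes can still be lost. The sketch below models only that ordering: replicas with replica-priority 0 are never promoted, a lower priority wins, then a larger replication offset, then the lexicographically smaller run ID. The `eligible` flag is a hypothetical stand-in for Sentinel's checks that skip replicas disconnected from the master for too long, and the array shape is invented for illustration.

```php
<?php

// Sketch of Sentinel's replica ranking; returns the replica it would promote.
function pickReplicaToPromote(array $replicas): ?array
{
    $candidates = array_filter(
        $replicas,
        // replica-priority 0 means "never promote this replica".
        fn (array $r) => $r['eligible'] && $r['priority'] > 0
    );
    if ($candidates === []) {
        return null; // no replica can be promoted, so the failover cannot proceed
    }
    usort($candidates, function (array $a, array $b): int {
        // 1. Lower replica-priority wins.
        if ($a['priority'] !== $b['priority']) {
            return $a['priority'] <=> $b['priority'];
        }
        // 2. Larger processed replication offset (more data) wins.
        if ($a['offset'] !== $b['offset']) {
            return $b['offset'] <=> $a['offset'];
        }
        // 3. Lexicographically smaller run ID wins.
        return strcmp($a['runid'], $b['runid']);
    });
    return $candidates[0];
}
```

Even the winning replica only holds what reached it before the master failed; the difference between the master's last offset and the winner's offset is the data lost in the failover.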
Redis Authentication
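# redis.conf (master and replica)
requirepass your_strong_redis_password
# Any node can become a replica after a failover, so set masterauth on all of them.
masterauth your_strong_redis_password

# sentinel.conf
# If Redis requires a password, Sentinel needs it too.
# This line must come after the sentinel monitor line for mymaster.
sentinel auth-pass mymaster your_strong_redis_password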
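For production, always enable Redis authentication using requirepass in redis.conf, and set masterauth on every node so that replicas (including a former master that Sentinel has demoted) can authenticate to the current master. You'll also need sentinel auth-pass in each sentinel.conf so the Sentinels can authenticate with the Redis instances, and your PHP client must be given the same password.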
Scaling and Performance
For read-heavy workloads, you can add more Redis replicas. These replicas will be automatically discovered and managed by Sentinel. For write-heavy workloads, consider Redis Cluster for sharding, which is a more complex setup than Sentinel for high availability but offers horizontal scaling for writes.
Application Downtime During Failover
While Sentinel aims for minimal downtime, a failover is not instantaneous. It involves detection, leader election, and replica promotion. Detection alone takes at least down-after-milliseconds (30 seconds in the configuration above), so expect writes to be unavailable for at least that long with these settings; lowering the value shortens failovers at the cost of more false positives during brief network blips. Your application's resilience to this period of unavailability is key. If zero downtime is an absolute requirement, more advanced strategies like multi-master replication (which Redis doesn't natively support in a simple way) or application-level buffering might be necessary.