Disaster Recovery 101: Architecting Auto-Failovers for Redis and Perl Deployments on DigitalOcean
Automated Redis Failover with Sentinel and DigitalOcean Load Balancers
Achieving high availability for Redis, especially in a cloud environment like DigitalOcean, necessitates a robust failover strategy. Redis Sentinel is the de facto standard for this, providing monitoring, notification, and automatic failover. We’ll architect a solution leveraging Sentinel and DigitalOcean’s Load Balancers to ensure seamless transitions.
Our setup will involve at least three Redis instances: one master and two or more replicas. A minimum of three Sentinel processes will monitor these Redis instances. DigitalOcean Load Balancers will then direct traffic to the current master Redis instance.
Sentinel Configuration for High Availability
Each Sentinel process needs to be configured to monitor the Redis master and its replicas. The key parameters are:
- sentinel monitor <master-name> <ip> <port> <quorum>: Defines the master Redis instance and the quorum. The quorum is the number of Sentinels that must agree the master is unreachable before it is marked objectively down (ODOWN). To actually start the failover, one Sentinel must also be elected leader by a majority of all Sentinels.
- sentinel down-after-milliseconds <master-name> <milliseconds>: How long, in milliseconds, the master must fail to respond before a single Sentinel considers it subjectively down (SDOWN).
- sentinel failover-timeout <master-name> <milliseconds>: Bounds how long a failover may take. Sentinel also waits twice this value before retrying a failover of the same master.
- sentinel parallel-syncs <master-name> <num-replicas>: The number of replicas that can be reconfigured in parallel to point to the new master.
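Quorum and majority are separate thresholds. Because a failover needs both, the number of Sentinels that must stay reachable is the larger of the two. The small Python helper that follows (a hypothetical function, not part of Redis) computes how many Sentinels can be lost while failover remains possible:

```python
def sentinel_fault_tolerance(num_sentinels: int, quorum: int) -> dict:
    """Return the majority size, how many Sentinels must be reachable for a
    failover to proceed, and how many Sentinels may fail."""
    if num_sentinels < 1 or not 1 <= quorum <= num_sentinels:
        raise ValueError("need 1 <= quorum <= num_sentinels")
    majority = num_sentinels // 2 + 1
    # Failure detection needs `quorum` agreeing Sentinels;
    # electing the failover leader needs a majority of all Sentinels.
    needed = max(quorum, majority)
    return {
        "majority": majority,
        "needed_reachable": needed,
        "tolerated_failures": num_sentinels - needed,
    }


for n, q in [(3, 2), (5, 3), (2, 1)]:
    print(f"{n} Sentinels, quorum {q}: {sentinel_fault_tolerance(n, q)}")
```

With three Sentinels and a quorum of 2, one Sentinel can be lost. With only two Sentinels, losing either one blocks failover even though a quorum of 1 is still met, which is why three is the practical minimum.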
Here’s a sample sentinel.conf file for a Sentinel instance:
Example sentinel.conf
# sentinel.conf
# Sentinel rewrites this file at runtime, so it must be writable
port 26379
sentinel monitor mymaster 10.10.0.5 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
# If you have authentication enabled on Redis:
# sentinel auth-pass mymaster YOUR_REDIS_PASSWORD
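Deploy this configuration on at least three separate DigitalOcean Droplets, each running a Redis Sentinel process (started with `redis-sentinel /path/to/sentinel.conf` or `redis-server /path/to/sentinel.conf --sentinel`). Ensure the Sentinels can reach each other on port 26379 and can reach the Redis master and replicas on port 6379.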
Redis Master/Replica Setup
Set up your Redis instances. One will be the master, and the others will be replicas. For simplicity, we’ll assume private IP addresses within a DigitalOcean VPC. If using public IPs, ensure proper firewall rules are in place.
Example Redis Configuration (Master)
# redis.conf (Master)
port 6379
bind 0.0.0.0
# If you need authentication (set both, since this node may later be demoted to a replica):
# requirepass YOUR_REDIS_PASSWORD
# masterauth YOUR_REDIS_PASSWORD
Example Redis Configuration (Replica)
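# redis.conf (Replica)
port 6379
bind 0.0.0.0
replicaof <master-ip> 6379
# If you need authentication (masterauth lets this replica authenticate to the master):
# requirepass YOUR_REDIS_PASSWORD
# masterauth YOUR_REDIS_PASSWORD
# replica-serve-stale-data yes # Default: keep answering reads, possibly with stale data, while the link to the master is down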
Start your Redis master and then your replicas, configuring them to replicate from the master. Once replicas are up and running, start your Sentinel processes.
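Sentinel can promote any replica, so every node should carry the same authentication settings. The following Python sketch renders a node's redis.conf from one function. `render_redis_conf` is a hypothetical helper, and the password is a placeholder:

```python
from typing import Optional


def render_redis_conf(master_ip: Optional[str], password: Optional[str] = None,
                      port: int = 6379) -> str:
    """Build a minimal redis.conf; master_ip=None renders the initial master."""
    lines = [f"port {port}", "bind 0.0.0.0"]
    if password:
        # Every node gets both settings because any node may change role later.
        lines.append(f"requirepass {password}")
        lines.append(f"masterauth {password}")
    if master_ip is not None:
        lines.append(f"replicaof {master_ip} {port}")
    return "\n".join(lines) + "\n"


print(render_redis_conf(None, password="s3cret"))         # initial master
print(render_redis_conf("10.10.0.5", password="s3cret"))  # replica
```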
Integrating with DigitalOcean Load Balancers
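DigitalOcean Load Balancers support TCP forwarding rules, so they can pass Redis traffic through unchanged. This makes them a reasonable front door for the *current* Redis master. Their backends are Droplets (selected by ID or by tag), not arbitrary IP addresses. A TCP health check on port 6379 also succeeds on replicas, not just on the master. The Load Balancer must therefore have only the master's Droplet attached, or writes may land on a read-only replica. When Sentinel promotes a replica to master, you'll need to update the Load Balancer's backend Droplets.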
This is the manual step that needs automation. There are two ways to do it. Sentinel can trigger an action after a failover through its `sentinel client-reconfig-script` directive. Alternatively, an external process can subscribe to Sentinel's Pub/Sub events.
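After a failover, Sentinel runs the script configured with `sentinel client-reconfig-script <master-name> <path>`. It passes seven arguments: `<master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>`, where `<role>` is `leader` or `observer`. Below is a minimal Python sketch of such a script. Only the leader acts, so three Sentinels do not trigger three Load Balancer updates. The function names are hypothetical, and the update itself is left as a placeholder:

```python
import sys
from typing import NamedTuple, Sequence


class ReconfigEvent(NamedTuple):
    master_name: str
    role: str       # "leader" or "observer"
    state: str
    from_ip: str
    from_port: int
    to_ip: str
    to_port: int


def parse_reconfig_args(argv: Sequence[str]) -> ReconfigEvent:
    """Parse the seven arguments Sentinel passes to a client-reconfig-script."""
    if len(argv) != 7:
        raise ValueError(f"expected 7 arguments, got {len(argv)}")
    name, role, state, from_ip, from_port, to_ip, to_port = argv
    return ReconfigEvent(name, role, state, from_ip, int(from_port), to_ip, int(to_port))


def should_update_lb(event: ReconfigEvent) -> bool:
    # Several Sentinels may run the script; only the failover leader should act.
    return event.role == "leader"


def main(argv: Sequence[str]) -> int:
    try:
        event = parse_reconfig_args(argv)
    except ValueError as exc:
        print(f"bad arguments: {exc}", file=sys.stderr)
        return 2  # exit status >= 2: Sentinel does not retry
    if should_update_lb(event):
        print(f"{event.master_name}: new master {event.to_ip}:{event.to_port}")
        # Placeholder: call the Load Balancer update here. Keep it idempotent,
        # because the script may be invoked more than once for one failover.
    return 0
```

To use this as the actual script, end the file with `raise SystemExit(main(sys.argv[1:]))`. Then register it in sentinel.conf with `sentinel client-reconfig-script mymaster /path/to/script.py` (the path is hypothetical). An exit status of 1 asks Sentinel to retry later, while 2 or higher does not. Sentinel kills scripts that run longer than 60 seconds, so any API call should use short timeouts.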
Automating Load Balancer Updates
A robust automation strategy uses a process that subscribes to Sentinel's `+switch-master` Pub/Sub channel. Sentinel publishes on this channel once a failover completes. The payload has the form `<master-name> <old-ip> <old-port> <new-ip> <new-port>`, so it carries the new master's IP and port.
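Parsing that payload is a single split. The Python sketch below turns it into a typed record (`SwitchMaster` and `parse_switch_master` are hypothetical names). With redis-py, it would receive the `data` field of the messages yielded by `pubsub().subscribe("+switch-master")`:

```python
from typing import NamedTuple


class SwitchMaster(NamedTuple):
    master_name: str
    old_ip: str
    old_port: int
    new_ip: str
    new_port: int


def parse_switch_master(payload: str) -> SwitchMaster:
    """Parse '<name> <old-ip> <old-port> <new-ip> <new-port>'."""
    parts = payload.split()
    if len(parts) != 5:
        raise ValueError(f"unexpected +switch-master payload: {payload!r}")
    name, old_ip, old_port, new_ip, new_port = parts
    return SwitchMaster(name, old_ip, int(old_port), new_ip, int(new_port))


event = parse_switch_master("mymaster 10.10.0.5 6379 10.10.0.6 6379")
print(event.new_ip, event.new_port)  # 10.10.0.6 6379
```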
The Perl script below watches Sentinel for that event through `redis-cli` and calls the DigitalOcean API when a switch is announced. It needs a DigitalOcean API token with write access, because changing the Load Balancer's backends modifies the Load Balancer.
Perl Script for Sentinel Event Monitoring and API Interaction
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use JSON;

my $DO_API_TOKEN     = 'YOUR_DIGITALOCEAN_API_TOKEN'; # Replace with your token
my $LOAD_BALANCER_ID = 'YOUR_LOAD_BALANCER_ID';       # Replace with your LB ID
my $MASTER_NAME      = 'mymaster';   # Must match the "sentinel monitor" name
my $REDIS_PORT       = 6379;         # The port your Redis instances listen on
my $sentinel_host    = '127.0.0.1';  # Or the IP of your Sentinel instance
my $sentinel_port    = 26379;

# DigitalOcean Load Balancers forward to Droplets (by ID or tag), not to raw IPs.
# Map each Redis node's private IP to its Droplet ID (example values; use yours).
# This script assumes the LB targets Droplets by ID, not by tag.
my %DROPLET_ID_FOR_IP = (
    '10.10.0.5' => 111111111,
    '10.10.0.6' => 222222222,
    '10.10.0.7' => 333333333,
);

my $API_BASE = "https://api.digitalocean.com/v2/load_balancers/$LOAD_BALANCER_ID";

my $ua = LWP::UserAgent->new(timeout => 15);
$ua->agent("SentinelFailoverBot/1.0");
$| = 1;  # Flush log lines immediately (useful under nohup)

# SUBSCRIBE to Sentinel's +switch-master channel. When stdout is not a TTY,
# redis-cli prints each element of a pushed message on its own line:
#   message / +switch-master / <name> <old-ip> <old-port> <new-ip> <new-port>
open(my $pipe, '-|', 'redis-cli', '-h', $sentinel_host, '-p', $sentinel_port,
     'SUBSCRIBE', '+switch-master')
    or die "Could not open pipe to redis-cli: $!";

print "Monitoring Sentinel for +switch-master events...\n";
while (my $line = <$pipe>) {
    chomp $line;
    # Only the payload line has five space-separated fields
    next unless $line =~ /^(\S+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(\d+)$/;
    my ($master_name, $old_ip, $old_port, $new_master_ip, $new_master_port) = ($1, $2, $3, $4, $5);
    next unless $master_name eq $MASTER_NAME;
    print "Detected master switch for '$master_name': $old_ip:$old_port -> $new_master_ip:$new_master_port\n";
    if ($new_master_port == $REDIS_PORT) {
        update_load_balancer($new_master_ip);
    } else {
        print "Warning: New master port ($new_master_port) does not match expected Redis port ($REDIS_PORT). Skipping LB update.\n";
    }
}
close($pipe);
# redis-cli only exits when the Sentinel connection drops; exit non-zero so a
# supervisor (e.g. systemd) can restart the monitor
die "redis-cli exited; Sentinel connection lost\n";

# Send one DigitalOcean API request; dies on HTTP errors, returns decoded JSON or undef
sub do_api {
    my ($method, $url, $payload) = @_;
    my $req = HTTP::Request->new($method => $url);
    $req->header('Authorization' => "Bearer $DO_API_TOKEN");
    if (defined $payload) {
        $req->header('Content-Type' => 'application/json');
        $req->content(encode_json($payload));
    }
    my $res = $ua->request($req);
    die "$method $url failed: " . $res->status_line . "\n" . $res->decoded_content . "\n"
        unless $res->is_success;
    my $body = $res->decoded_content;
    return (defined $body && length $body) ? decode_json($body) : undef;
}

sub update_load_balancer {
    my ($new_master_ip) = @_;
    my $new_id = $DROPLET_ID_FOR_IP{$new_master_ip};
    unless (defined $new_id) {
        print "Error: No Droplet ID configured for $new_master_ip. Update the Load Balancer manually.\n";
        return;
    }
    my $ok = eval {
        my $lb = do_api(GET => $API_BASE)->{load_balancer};
        my @attached = @{ $lb->{droplet_ids} || [] };
        my @stale = grep { $_ != $new_id } @attached;
        # Attach the new master first so the LB is never left without a backend,
        # then detach every other Droplet so replicas never receive writes
        do_api(POST => "$API_BASE/droplets", { droplet_ids => [$new_id] })
            unless grep { $_ == $new_id } @attached;
        do_api(DELETE => "$API_BASE/droplets", { droplet_ids => \@stale })
            if @stale;
        1;
    };
    if ($ok) {
        print "Successfully pointed Load Balancer '$LOAD_BALANCER_ID' at Droplet $new_id ($new_master_ip).\n";
    } else {
        print "Error updating Load Balancer: $@";
    }
}
To run this script:
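- Install the required Perl modules (LWP::Protocol::https is needed because the DigitalOcean API is served over HTTPS):
cpan LWP::UserAgent LWP::Protocol::https HTTP::Request JSON
- Replace the placeholders for `$DO_API_TOKEN` and `$LOAD_BALANCER_ID`.
- Make the script executable:
chmod +x your_script.pl
- Run it in the background:
nohup ./your_script.pl &
A process supervisor such as systemd is a sturdier choice than nohup, since it can restart the monitor if it exits.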
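This script waits for the `+switch-master` event and then uses the DigitalOcean API to point the Load Balancer at the newly promoted Redis master. The switch is fast but not instantaneous. Writes fail for roughly `down-after-milliseconds` (5 seconds in the example configuration), plus the time Sentinel needs to elect a leader and promote a replica, plus the API calls and the Load Balancer's health checks on the new backend. Clients also lose their open connections to the old master and must reconnect through the Load Balancer. Finally, the script listens to a single Sentinel. If that Sentinel is down during a failover, the event is missed, so check the Load Balancer's backends after any failover.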
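DigitalOcean Load Balancers attach Droplets rather than IP addresses, so the update reduces to a set difference over Droplet IDs. The Python function below is a sketch that assumes a hypothetical mapping from each Redis node's private IP to its Droplet ID, and a Load Balancer that targets Droplets by ID rather than by tag. It computes which IDs to send to `POST /v2/load_balancers/{id}/droplets` and `DELETE /v2/load_balancers/{id}/droplets`, both of which take a `droplet_ids` list:

```python
from typing import Dict, Iterable, List, Tuple


def plan_lb_change(attached: Iterable[int], ip_to_droplet: Dict[str, int],
                   new_master_ip: str) -> Tuple[List[int], List[int]]:
    """Return (ids_to_attach, ids_to_detach) so that only the new master's
    Droplet remains behind the Load Balancer."""
    try:
        target = ip_to_droplet[new_master_ip]
    except KeyError:
        raise KeyError(f"no Droplet ID known for {new_master_ip}") from None
    current = set(attached)
    to_attach = [] if target in current else [target]
    to_detach = sorted(current - {target})
    return to_attach, to_detach


droplets = {"10.10.0.5": 111, "10.10.0.6": 222, "10.10.0.7": 333}
print(plan_lb_change([111], droplets, "10.10.0.6"))  # ([222], [111])
print(plan_lb_change([222], droplets, "10.10.0.6"))  # ([], [])
```

Attaching before detaching avoids a window with no backend at all. Once the Load Balancer is already correct the plan is empty, so repeated invocations are harmless.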
Automating Perl Application Failover with Redis and DigitalOcean
For applications written in Perl, integrating with a highly available Redis setup involves ensuring your application clients can dynamically discover the current Redis master. This is typically handled by configuring your Redis client library to use Sentinel for discovery.
Perl Redis Client Configuration
The CPAN `Redis` distribution covers this. Its `Redis` client accepts a list of Sentinels (`sentinels`) and a master name (`service`), asks the Sentinels for the current master, and connects to it. The same distribution also ships `Redis::Sentinel`, a lower-level client for querying Sentinels directly.
Example Perl Application Snippet
#!/usr/bin/perl
use strict;
use warnings;
use Redis;
# --- Configuration ---
my @sentinels = (
    '10.10.0.10:26379', # Sentinel 1
    '10.10.0.11:26379', # Sentinel 2
    '10.10.0.12:26379', # Sentinel 3
);
my $master_name    = 'mymaster';            # Must match sentinel monitor directive
my $redis_password = 'YOUR_REDIS_PASSWORD'; # Remove if Redis is not password protected
# --- Connect to the current master via Sentinel ---
sub connect_redis {
    return Redis->new(
        sentinels => \@sentinels,    # "host:port" strings
        service   => $master_name,
        password  => $redis_password,
        reconnect => 10,             # On connection loss, keep retrying for up to 10 seconds...
        every     => 200_000,        # ...every 200 ms (value is in microseconds)
    );
}
my $redis = eval { connect_redis() }
    or die "Failed to connect to Redis master via Sentinel: " . ($@ || 'Unknown error');
# --- Use the Redis Connection ---
eval {
    $redis->set('mykey', 'myvalue');
    my $value = $redis->get('mykey');
    print "Successfully set and got key: $value\n";
    1;
} or warn "Redis operation failed: $@\n";
# --- Explicit check/reconnect for long-running processes ---
# If automatic reconnection gives up (for example, while no Sentinel can
# name a master), build a fresh client, which asks the Sentinels again.
sub ensure_redis_connection {
    my ($current_redis) = @_;
    my $alive = defined $current_redis && eval { $current_redis->ping };
    return $current_redis if $alive;
    warn "Redis connection lost or invalid. Re-discovering the master via Sentinel...\n";
    my $fresh = eval { connect_redis() }
        or die "Failed to re-establish Redis connection: " . ($@ || 'Unknown error');
    print "Successfully re-established Redis connection.\n";
    return $fresh;
}
# In your application logic:
# $redis = ensure_redis_connection($redis);
# $redis->set('anotherkey', 'anothervalue');
exit 0;
This script builds its client from the list of Sentinels rather than from a fixed Redis address. The client asks the Sentinels for the current master's IP and port and connects there. After a failover, the connection to the old master breaks, and the application must obtain the new master's address from Sentinel again. This can happen through the client library's own reconnect logic or by explicitly re-creating the connection. The `ensure_redis_connection` subroutine shows the explicit variant: it pings the existing connection and re-discovers the master if the ping fails.
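The discovery step such a client performs follows the procedure in the Redis Sentinel client guidelines, simplified here. The client asks the Sentinels one by one for the master address, then confirms with `ROLE` that the returned node really is a master. The sketch below is written in Python with the network calls injected as functions, so the control flow can be read and tested without a live cluster. `ask_sentinel` would send `SENTINEL get-master-addr-by-name` and `get_role` would send `ROLE`; all names are hypothetical:

```python
from typing import Callable, List, Optional, Sequence, Tuple

Addr = Tuple[str, int]


def discover_master(sentinels: Sequence[Addr], master_name: str,
                    ask_sentinel: Callable[[Addr, str], Optional[Addr]],
                    get_role: Callable[[Addr], str]) -> Addr:
    """Return the master address reported by the first Sentinel whose answer
    is confirmed by ROLE; raise RuntimeError if none can be confirmed."""
    errors: List[str] = []
    for sentinel in sentinels:
        try:
            addr = ask_sentinel(sentinel, master_name)
        except OSError as exc:  # unreachable Sentinel: try the next one
            errors.append(f"{sentinel}: {exc}")
            continue
        if addr is None:  # this Sentinel does not know the master name
            errors.append(f"{sentinel}: unknown master {master_name!r}")
            continue
        try:
            role = get_role(addr)
        except OSError as exc:  # reported master is unreachable
            errors.append(f"{addr}: {exc}")
            continue
        if role == "master":  # guards against a stale Sentinel view
            return addr
        errors.append(f"{sentinel}: {addr} reports role {role!r}")
    raise RuntimeError("no confirmed master: " + "; ".join(errors))
```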
Application Deployment Considerations
Your Perl application instances should be deployed on DigitalOcean Droplets that can reach your Redis instances and Sentinel processes. If your application and Redis are in the same VPC, this is straightforward. If they are in different networks, ensure appropriate firewall rules and network routing are configured.
When deploying new application instances, configure them with the list of Sentinel hosts so they discover the Redis master at startup. If a failover happens while the application is running, the client reaches the new master only if it asks Sentinel again. That can happen through the client library's reconnect logic (for example, the `reconnect` option of the CPAN `Redis` client) or through application code that catches connection errors and re-fetches the master connection.
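When reconnection is handled in application code, the usual pattern is a bounded retry that rediscovers the master after each connection error. The Python sketch below illustrates it. `connect` stands in for any function that asks Sentinel for the current master and returns a connection (a hypothetical parameter). Only wrap idempotent operations this way, since a write that failed mid-flight may already have been applied:

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

C = TypeVar("C")
T = TypeVar("T")


def with_failover_retry(operation: Callable[[C], T], connect: Callable[[], C],
                        attempts: int = 5, delay_s: float = 0.5,
                        retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,)) -> T:
    """Run operation(conn); on a connection error, rediscover and retry."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    conn: Optional[C] = None
    last_error: Optional[BaseException] = None
    for attempt in range(attempts):
        try:
            if conn is None:
                conn = connect()  # asks Sentinel for the current master
            return operation(conn)
        except retry_on as exc:
            last_error = exc
            conn = None  # force rediscovery on the next attempt
            if attempt + 1 < attempts:
                # Linear backoff gives Sentinel time to finish the failover
                time.sleep(delay_s * (attempt + 1))
    assert last_error is not None
    raise last_error
```

With redis-py, pass `retry_on=(redis.exceptions.ConnectionError,)`, because that exception class does not inherit from Python's built-in `ConnectionError`.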
Orchestrating Failover with DigitalOcean Kubernetes (DOKS)
For containerized deployments on DigitalOcean Kubernetes, the principles remain the same, but the implementation details shift to Kubernetes primitives. We’ll use StatefulSets for Redis, a Headless Service for stable network identities, and a separate Service for application access, which will be updated during failover.
Redis Deployment on DOKS
We’ll deploy Redis using a StatefulSet to ensure stable network identifiers and persistent storage. A Headless Service will provide DNS records for each Redis pod.
Example Redis StatefulSet and Services
# redis-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-headless
  labels:
    app: redis
spec:
  ports:
    - port: 6379
      targetPort: 6379
      name: redis
  clusterIP: None # This makes it a Headless Service
  selector:
    app: redis
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: "redis-headless"
  replicas: 3 # 1 master, 2 replicas
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:6.2 # Use a specific version
          ports:
            - containerPort: 6379
              name: redis
          command: ["redis-server"]
          args: ["/usr/local/etc/redis/redis.conf"]
          volumeMounts:
            - name: redis-config-volume
              mountPath: /usr/local/etc/redis/
            - name: redis-data
              mountPath: /data
      volumes:
        # Backs the redis-config-volume mount with the redis-config ConfigMap
        - name: redis-config-volume
          configMap:
            name: redis-config
  volumeClaimTemplates:
    - metadata:
        name: redis-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
data:
  redis.conf: |
    # Base configuration shared by all pods; the init script decides
    # whether a pod starts as master or replica
    port 6379
    bind 0.0.0.0
    # For replicas, this will be set dynamically
    # replicaof <master-ip> 6379
    # If using password (set masterauth too, since any pod may become a replica):
    # requirepass YOUR_REDIS_PASSWORD
    # masterauth YOUR_REDIS_PASSWORD
---
# Service for application to connect to the current master
apiVersion: v1
kind: Service
metadata:
  name: redis-master
  labels:
    app: redis
spec:
  ports:
    - port: 6379
      targetPort: 6379
      name: redis
  selector:
    # Static selector: the failover controller moves the role=master
    # label onto whichever pod Sentinel has promoted
    app: redis
    role: master
  type: ClusterIP # Or NodePort/LoadBalancer if needed externally
An init container or a startup script within the Redis container will be responsible for configuring each instance as either master or replica based on its ordinal index from the StatefulSet (e.g., `redis-0` is master, `redis-1` and `redis-2` are replicas). Sentinel will then be deployed as a separate StatefulSet or Deployment, configured to monitor these Redis instances.
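The ordinal rule fits in a few lines. The Python sketch below is a hypothetical helper that an init script could call with the pod's hostname. It returns `None` for `redis-0`; for any other pod it returns a `replicaof` line pointing at `redis-0`'s stable DNS name, which the script would append to a writable copy of redis.conf. It only encodes the initial bootstrap. After a failover `redis-0` may no longer be the master, so a restarting pod should first ask Sentinel for the current master and fall back to this rule only when no Sentinel answers:

```python
import re
from typing import Optional


def bootstrap_replicaof(pod_name: str, statefulset: str = "redis",
                        headless_service: str = "redis-headless",
                        namespace: str = "default", port: int = 6379) -> Optional[str]:
    """Return the replicaof line for a StatefulSet pod, or None for ordinal 0."""
    match = re.fullmatch(rf"{re.escape(statefulset)}-(\d+)", pod_name)
    if match is None:
        raise ValueError(f"{pod_name!r} is not a pod of StatefulSet {statefulset!r}")
    if int(match.group(1)) == 0:
        return None  # ordinal 0 bootstraps as the master
    master_host = f"{statefulset}-0.{headless_service}.{namespace}.svc.cluster.local"
    return f"replicaof {master_host} {port}"


print(bootstrap_replicaof("redis-0"))  # None
print(bootstrap_replicaof("redis-2"))
```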
Sentinel Deployment on DOKS
Sentinel can also be deployed as a StatefulSet. Its configuration will point to the Redis pods managed by the Headless Service.
Example Sentinel StatefulSet
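# sentinel-statefulset.yaml
# Headless Service that gives each Sentinel pod a stable DNS name
# (e.g. redis-sentinel-0.redis-sentinel-headless.default.svc.cluster.local)
apiVersion: v1
kind: Service
metadata:
  name: redis-sentinel-headless
spec:
  clusterIP: None
  ports:
    - port: 26379
      targetPort: 26379
      name: sentinel
  selector:
    app: redis-sentinel
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-sentinel
spec:
  serviceName: "redis-sentinel-headless"
  replicas: 3
  selector:
    matchLabels:
      app: redis-sentinel
  template:
    metadata:
      labels:
        app: redis-sentinel
    spec:
      containers:
        - name: redis-sentinel
          image: redis:6.2 # Use the same version or a compatible one
          ports:
            - containerPort: 26379
              name: sentinel
          # Sentinel rewrites its config file at runtime, but ConfigMap volumes are
          # read-only: copy the template to the data volume on first start only, so
          # the state Sentinel persists survives restarts. --sentinel is required;
          # without it redis-server rejects the sentinel directives.
          command: ["sh", "-c"]
          args:
            - >-
              test -f /data/sentinel.conf ||
              cp /etc/sentinel-template/sentinel.conf /data/sentinel.conf;
              exec redis-server /data/sentinel.conf --sentinel
          volumeMounts:
            - name: sentinel-config-volume
              mountPath: /etc/sentinel-template/
            - name: sentinel-data
              mountPath: /data
      volumes:
        - name: sentinel-config-volume
          configMap:
            name: sentinel-config
  volumeClaimTemplates:
    - metadata:
        name: sentinel-data # Holds the writable, Sentinel-managed config
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: sentinel-config
data:
  sentinel.conf: |
    port 26379
    # Accept hostnames in "sentinel monitor" (Redis 6.2+); must precede it
    sentinel resolve-hostnames yes
    # Monitor the initial master pod by its stable per-pod DNS name
    sentinel monitor mymaster redis-0.redis-headless.default.svc.cluster.local 6379 2
    sentinel down-after-milliseconds mymaster 5000
    sentinel failover-timeout mymaster 60000
    sentinel parallel-syncs mymaster 1
    # If Redis requires auth:
    # sentinel auth-pass mymaster YOUR_REDIS_PASSWORD
Sentinel only needs the address of the initial master, so the monitor line uses `redis-0`'s per-pod name from the Headless Service. The bare Service name redis-headless.default.svc.cluster.local would resolve to every Redis pod, not specifically the master. Sentinel discovers the replicas from the master's `INFO` output and finds the other Sentinels through the master's `__sentinel__:hello` channel. The `quorum` is set to 2, meaning at least 2 of the 3 Sentinels must agree the master is down before a failover starts.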
Automating Kubernetes Service Updates
When Sentinel promotes a new master, the redis-master Service must start routing to the new master pod. That Service selects on `app: redis` and `role: master`, so the practical approach is to move the `role=master` label to the promoted pod rather than editing the selector itself. This is a prime candidate for a Kubernetes Operator or a custom controller.
A simple approach is a controller that either watches Sentinel events (similar to the Perl script, but talking to the Kubernetes API) or periodically asks Sentinel for the current master. When the reported master changes, the controller finds the pod with that IP, labels it `role=master`, and removes the label from every other Redis pod.
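The core decision of such a controller does not depend on any API client: given the Redis pods and the IP that Sentinel reports, compute which label patches to send. The Python sketch below covers that step. It assumes Sentinel reports pod IPs and uses a `None` label value, which deletes the label in a Kubernetes merge patch; `plan_label_patches` is a hypothetical name:

```python
from typing import Dict, List, Optional, Tuple

Pod = Tuple[str, Optional[str], Dict[str, str]]  # (name, pod IP, labels)


def plan_label_patches(pods: List[Pod], master_ip: str) -> List[Tuple[str, dict]]:
    """Return (pod_name, patch_body) pairs that leave role=master on exactly
    the pod whose IP Sentinel reports; the new master's patch comes first."""
    masters = [name for name, ip, _ in pods if ip == master_ip]
    if len(masters) != 1:
        return []  # unknown or ambiguous master: change nothing rather than guess
    new_master = masters[0]
    patches: List[Tuple[str, dict]] = []
    for name, _ip, labels in pods:
        if name == new_master and labels.get("role") != "master":
            # Label the new master first so the Service never loses all endpoints
            patches.insert(0, (name, {"metadata": {"labels": {"role": "master"}}}))
        elif name != new_master and labels.get("role") == "master":
            patches.append((name, {"metadata": {"labels": {"role": None}}}))
    return patches
```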
Example Controller Logic (Conceptual)
# Conceptual Python controller using the Kubernetes client library and redis-py.
# It polls Sentinel and moves the role=master label onto the pod Sentinel reports.
# Polling is simpler than subscribing to +switch-master and recovers from missed events.
import time

from kubernetes import client, config
from kubernetes.client.rest import ApiException
from redis.sentinel import MasterNotFoundError, Sentinel

# Load Kubernetes configuration: in-cluster when running as a pod, else ~/.kube/config
try:
    config.load_incluster_config()
except config.ConfigException:
    config.load_kube_config()
v1 = client.CoreV1Api()

# Per-pod DNS names provided by the redis-sentinel-headless Service
SENTINELS = [
    (f"redis-sentinel-{i}.redis-sentinel-headless.default.svc.cluster.local", 26379)
    for i in range(3)
]
MASTER_NAME = "mymaster"
NAMESPACE = "default"
REDIS_POD_SELECTOR = "app=redis"
POLL_INTERVAL_SECONDS = 5

sentinel = Sentinel(SENTINELS, socket_timeout=1)


def get_current_master_ip():
    """Ask the Sentinels for the current master and return its IP, or None."""
    try:
        host, _port = sentinel.discover_master(MASTER_NAME)
        return host
    except MasterNotFoundError as e:
        print(f"Could not get master address for {MASTER_NAME}: {e}")
        return None


def update_redis_master_label(new_master_ip):
    """Ensure that exactly the pod with new_master_ip carries role=master.

    The redis-master Service selects app=redis,role=master, so moving the
    label re-routes it. Assumes Sentinel reports pod IPs (announce-hostnames
    left at its default, no).
    """
    pods = v1.list_namespaced_pod(NAMESPACE, label_selector=REDIS_POD_SELECTOR).items
    new_master = next((p for p in pods if p.status.pod_ip == new_master_ip), None)
    if new_master is None:
        print(f"No Redis pod has IP {new_master_ip}; leaving labels unchanged")
        return

    # Label the new master first so the Service is not left without endpoints
    if (new_master.metadata.labels or {}).get("role") != "master":
        print(f"Adding 'role=master' label to new master pod: {new_master.metadata.name}")
        v1.patch_namespaced_pod(new_master.metadata.name, NAMESPACE,
                                {"metadata": {"labels": {"role": "master"}}})

    for pod in pods:
        if pod.metadata.name == new_master.metadata.name:
            continue
        if (pod.metadata.labels or {}).get("role") == "master":
            print(f"Removing 'role=master' label from old master pod: {pod.metadata.name}")
            # A null value deletes the label in a merge patch; sending the
            # whole pod object without the label would not remove it
            v1.patch_namespaced_pod(pod.metadata.name, NAMESPACE,
                                    {"metadata": {"labels": {"role": None}}})


def main():
    while True:
        master_ip = get_current_master_ip()
        if master_ip:
            try:
                update_redis_master_label(master_ip)
            except ApiException as e:
                print(f"Kubernetes API error: {e}")
        time.sleep(POLL_INTERVAL_SECONDS)


if __name__ == "__main__":
    main()