Disaster Recovery 101: Architecting Auto-Failovers for Redis and Ruby Deployments on DigitalOcean

Establishing a Redis Sentinel Cluster for High Availability

For robust Redis deployments, a single instance is a single point of failure. Implementing Redis Sentinel provides automatic failover and high availability. This section details the setup of a three-node Sentinel cluster on DigitalOcean Droplets, ensuring quorum and resilience.

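We’ll assume you have three Droplets provisioned for the Sentinels, plus one Droplet for the Redis master and at least one more for a replica, each running Ubuntu 22.04 LTS. Sentinel can only fail over to an existing replica, so a master without replicas gets monitoring but no automatic recovery. For simplicity, we’ll use private IP addresses for inter-node communication. Ensure your firewall rules allow traffic on port 26379 (Sentinel) and 6379 (Redis) between these Droplets.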

Sentinel Configuration (`sentinel.conf`)

On each Sentinel node, create or modify the sentinel.conf file. The key parameters are:

port 26379: The default Sentinel port.
sentinel monitor mymaster <master-ip> 6379 2: This is the core directive. mymaster is an arbitrary name for the monitored master. <master-ip> should be the private IP of your primary Redis instance, and 6379 is its port. 2 is the quorum: the number of Sentinels that must agree the master is unreachable before it is marked “Objectively Down” (ODOWN) and a failover is attempted. Starting the failover additionally requires a majority of all Sentinels to authorize a leader, so with three Sentinels and a quorum of 2 the cluster tolerates the loss of one Sentinel.
sentinel down-after-milliseconds mymaster 5000: The time in milliseconds a Sentinel must wait without receiving a reply from a Redis instance before marking it as “Subjectively Down” (SDOWN).
sentinel failover-timeout mymaster 10000: The maximum time in milliseconds allowed for a failover to complete.
sentinel parallel-syncs mymaster 1: The number of replicas that can be reconfigured to sync with the new master simultaneously during a failover.
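
The relationship between the quorum and the majority requirement is easy to get wrong, so here is a minimal Ruby sketch of the arithmetic. It is a simplified model rather than Sentinel's actual logic, and the method names (majority, failover_possible?) are made up for illustration.

```ruby
# Simplified model of Sentinel's two thresholds (illustrative only).
# quorum:   Sentinels that must agree the master is unreachable (ODOWN).
# majority: Sentinels that must authorize a leader before a failover starts.

def majority(total_sentinels)
  (total_sentinels / 2) + 1
end

def failover_possible?(total_sentinels:, quorum:, reachable_sentinels:)
  reachable_sentinels >= quorum &&
    reachable_sentinels >= majority(total_sentinels)
end

[3, 2, 1].each do |reachable|
  ok = failover_possible?(total_sentinels: 3, quorum: 2, reachable_sentinels: reachable)
  puts "#{reachable} of 3 Sentinels reachable -> failover possible: #{ok}"
end
```

With three Sentinels and a quorum of 2, the model reports that a failover is still possible when one Sentinel is lost, but not when two are.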

Here’s an example configuration for Sentinel Node 1, assuming your master Redis is on 10.10.0.5:

# sentinel.conf on Sentinel Node 1 (e.g., 10.10.0.6)
port 26379
sentinel monitor mymaster 10.10.0.5 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 10000
sentinel parallel-syncs mymaster 1
# Replicas and other Sentinels are discovered automatically through the
# master; they do not need to be listed here.
# sentinel auth-pass mymaster YourRedisPassword

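Use the same configuration on Sentinel Node 2 (e.g., 10.10.0.7) and Sentinel Node 3 (e.g., 10.10.0.8). The <master-ip> is identical on every node because all three Sentinels monitor the same primary Redis instance. Sentinel rewrites its configuration file at runtime (for example, to record discovered replicas and the new master after a failover), so the file must be writable by the user the Sentinel service runs as. If your Redis instances are configured with a password, uncomment and set sentinel auth-pass mymaster YourRedisPassword on all Sentinel nodes.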
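
Because every Sentinel node gets the same directives, it can be convenient to render the file from one set of values instead of editing three copies by hand. The following is a small sketch of that idea; render_sentinel_conf is a hypothetical helper, not part of Redis or any gem, and it only emits the directives discussed above.

```ruby
# Hypothetical helper: renders the sentinel.conf body shared by all Sentinel nodes.
def render_sentinel_conf(master_name:, master_ip:, master_port: 6379, quorum: 2,
                         down_after_ms: 5000, failover_timeout_ms: 10_000,
                         parallel_syncs: 1, auth_pass: nil)
  lines = [
    "port 26379",
    "sentinel monitor #{master_name} #{master_ip} #{master_port} #{quorum}",
    "sentinel down-after-milliseconds #{master_name} #{down_after_ms}",
    "sentinel failover-timeout #{master_name} #{failover_timeout_ms}",
    "sentinel parallel-syncs #{master_name} #{parallel_syncs}"
  ]
  # Only emit auth-pass when the Redis instances require a password.
  lines << "sentinel auth-pass #{master_name} #{auth_pass}" if auth_pass
  lines.join("\n") + "\n"
end

puts render_sentinel_conf(master_name: "mymaster", master_ip: "10.10.0.5")
```

The output can be written to each Sentinel Droplet by whatever provisioning tool you already use.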

Starting Redis and Sentinel Services

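First, ensure Redis is installed and configured to run as a service. The default redis.conf is usually sufficient for basic setup, but ensure it’s bound to an interface the other Droplets can reach and that persistence is enabled (e.g., appendonly yes). Binding to the Droplet’s private IP is preferable to bind 0.0.0.0, which also exposes Redis on the public interface. On each replica, add replicaof 10.10.0.5 6379 so it replicates from the primary.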

On your primary Redis Droplet (e.g., 10.10.0.5) and on each replica:

sudo systemctl start redis-server
sudo systemctl enable redis-server

On each Sentinel Droplet (e.g., 10.10.0.6, 10.10.0.7, 10.10.0.8):

sudo systemctl start redis-sentinel
sudo systemctl enable redis-sentinel

Verify the status of the Sentinel service:

sudo systemctl status redis-sentinel

You should see the service reported as active (running). Discovery events, such as +monitor, +sentinel, and +slave lines, appear in the Sentinel log rather than in the status summary. After a short period, you can check the master’s status from any Sentinel:

redis-cli -p 26379 SENTINEL master mymaster

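This command returns details about the master as a list of field/value pairs, including its ip, port, flags, num-slaves (the replica count), num-other-sentinels, and quorum. For just the current master address, run SENTINEL get-master-addr-by-name mymaster instead.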
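
If you want to inspect that reply from a script, the flat field/value list is straightforward to turn into a Hash. The sketch below assumes the reply arrives as a flat array of strings (some client libraries already convert it for you); the sample values are illustrative and the helper names are hypothetical.

```ruby
# Converts the flat [field, value, field, value, ...] reply into a Hash.
def parse_sentinel_master(reply)
  raise ArgumentError, "expected an even number of elements" if reply.size.odd?

  reply.each_slice(2).to_h
end

# True when the flags mark a master with no s_down / o_down condition.
def master_healthy?(info)
  flags = info.fetch("flags", "").split(",")
  flags.include?("master") && (flags & %w[s_down o_down]).empty?
end

# Illustrative sample, shortened to the fields discussed in the text.
sample_reply = [
  "name", "mymaster", "ip", "10.10.0.5", "port", "6379",
  "flags", "master", "num-slaves", "1",
  "num-other-sentinels", "2", "quorum", "2"
]

info = parse_sentinel_master(sample_reply)
puts "#{info['ip']}:#{info['port']} healthy=#{master_healthy?(info)} replicas=#{info['num-slaves']}"
```

A num-other-sentinels value of 2 on each node is a quick confirmation that all three Sentinels have found each other.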

Integrating Ruby Applications with Redis Sentinel

Your Ruby application needs to be aware of the Sentinel cluster to connect to the current Redis master. The redis-rb gem provides excellent support for this. Instead of connecting directly to a single Redis instance, you configure it to use Sentinel.

Gemfile Configuration

Ensure you have the redis gem in your Gemfile:

# Gemfile
gem 'redis'

Run bundle install to install it.

Redis Client Initialization

In your application’s initialization code (e.g., an initializer in Rails, or a central configuration file in Sinatra/other frameworks), configure the Redis client to use Sentinel:

# config/initializers/redis.rb (Rails example)

# Define your Sentinel nodes and master name
sentinel_hosts = [
  { host: '10.10.0.6', port: 26379 }, # Sentinel Node 1 Private IP
  { host: '10.10.0.7', port: 26379 }, # Sentinel Node 2 Private IP
  { host: '10.10.0.8', port: 26379 }  # Sentinel Node 3 Private IP
]
redis_master_name = 'mymaster' # Must match sentinel.conf

# Initialize the Redis client using Sentinel.
# redis-rb asks the Sentinels for the current master's address. The host
# part of the URL is the Sentinel master name, not a DNS name
# (redis-rb 5.x also accepts `name: redis_master_name` instead of a URL).
$redis = Redis.new(
  url: "redis://#{redis_master_name}",
  sentinels: sentinel_hosts,
  role: :master # The default, stated explicitly for clarity
  # If your Redis requires a password, add:
  # password: 'YourRedisPassword'
)

# Optional: Verify connection and role
begin
  # ROLE replies with an array whose first element is "master" or "slave"
  role = $redis.call('ROLE').first
  conn = $redis.connection # Hash with :host and :port of the resolved server
  Rails.logger.info "Connected to Redis #{role} at #{conn[:host]}:#{conn[:port]}"
rescue Redis::CannotConnectError => e
  Rails.logger.error "Failed to connect to Redis: #{e.message}"
  # Handle connection error appropriately, e.g., retry or alert
end

When configured with the Sentinel list and the master name, redis-rb queries the Sentinels to discover the current master’s address and repeats that lookup whenever it has to reconnect, so it follows the master after a failover. Commands issued while a failover is in progress can still raise connection errors, so callers should be prepared to retry. The role option defaults to master; stating it explicitly documents the intent to talk to the master rather than a replica.
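
Since a failover takes at least a few seconds, commands sent during that window may fail before the client finds the new master. One way to absorb this is a small retry wrapper with exponential backoff. The sketch below is generic Ruby and does not depend on the redis gem; with_retries and FakeConnectionError are hypothetical names used only for this example.

```ruby
# Retries the block when one of `errors` is raised, sleeping with
# exponential backoff between attempts. Re-raises after the last attempt.
def with_retries(errors:, attempts: 5, base_delay: 0.2, sleeper: ->(s) { sleep(s) })
  tries = 0
  begin
    tries += 1
    yield
  rescue *errors
    raise if tries >= attempts

    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end

# Stand-in for a connection error raised while the master is unreachable.
class FakeConnectionError < StandardError; end

calls = 0
result = with_retries(errors: [FakeConnectionError], sleeper: ->(_s) {}) do
  calls += 1
  raise FakeConnectionError, "master unreachable" if calls < 3

  "PONG"
end
puts "succeeded after #{calls} calls: #{result}"
```

In an application using redis-rb, you would pass errors: [Redis::BaseConnectionError] and wrap calls such as $redis.get in the block. Keep the total retry window longer than down-after-milliseconds, otherwise the retries are exhausted before Sentinel has even declared the master down.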

Testing Failover

To simulate a failover, you can manually stop the primary Redis instance. On the primary Redis Droplet:

sudo systemctl stop redis-server

Observe the logs on your Sentinel nodes. You should see Sentinels detecting the master as down, electing a leader Sentinel, and promoting a replica to become the new master. Sentinel can only promote an existing replica, so if none is attached to the master the failover is aborted. Your Ruby application, upon its next Redis operation that requires a connection, will query the Sentinels, discover the new master, and reconnect.

You can verify the new master by running redis-cli -p 26379 SENTINEL master mymaster on any Sentinel node. The output should reflect the new master’s IP address.
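
To measure how long the failover takes rather than checking by hand, you can poll for the master address until it changes. The following sketch takes the lookup as a callable so it can run without a live cluster; wait_for_new_master is a hypothetical helper, and the address 10.10.0.9 in the stubbed responses is a made-up replica IP.

```ruby
# Polls `fetch_master` (any callable returning [ip, port]) until the address
# differs from `old_master` or `timeout` seconds have passed.
# Returns { master:, elapsed: } on success, or nil on timeout.
def wait_for_new_master(old_master:, fetch_master:, timeout: 60, interval: 0.5,
                        clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                        sleeper: ->(s) { sleep(s) })
  started = clock.call
  loop do
    current = fetch_master.call
    elapsed = clock.call - started
    return { master: current, elapsed: elapsed } if current && current != old_master
    return nil if elapsed >= timeout

    sleeper.call(interval)
  end
end

# Stubbed lookups: two polls still see the old master, the third sees a new one.
responses = [["10.10.0.5", "6379"], ["10.10.0.5", "6379"], ["10.10.0.9", "6379"]]

outcome = wait_for_new_master(
  old_master: ["10.10.0.5", "6379"],
  fetch_master: -> { responses.shift },
  sleeper: ->(_s) {} # skip real sleeping in this demo
)
puts "new master: #{outcome[:master].join(':')}"
```

Against a real cluster, and assuming the redis gem, fetch_master could wrap a client connected to one Sentinel, for example -> { sentinel.call("SENTINEL", "get-master-addr-by-name", "mymaster") } where sentinel is Redis.new(host: "10.10.0.6", port: 26379).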

Architecting for Resilience: HAProxy for Application Load Balancing

While Redis Sentinel handles Redis failover, your application servers might also experience issues. A robust architecture often involves load balancing for the application layer itself. HAProxy is an excellent choice for this, providing high availability and load balancing for your Ruby web applications (e.g., Rails, Sinatra).

HAProxy Configuration (`haproxy.cfg`)

We’ll set up HAProxy to distribute traffic across multiple application server instances. For true high availability of HAProxy itself, you would typically run two HAProxy instances in an active/passive or active/active setup using Keepalived or similar. For this example, we focus on a single HAProxy instance acting as a load balancer for your Ruby app servers.

Assume you have at least two Ruby application servers (e.g., Puma/Unicorn) running on Droplets 10.10.0.10 and 10.10.0.11, listening on port 3000.

# /etc/haproxy/haproxy.cfg

global
    log /dev/log    local0
    log /dev/log    local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5000
    timeout client  50000
    timeout server  50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend http_frontend
    bind *:80
    # HAProxy only proxies HTTP to the app servers. Each app server
    # connects to Redis itself through Sentinel (see the previous section);
    # HAProxy never talks to Redis directly.
    default_backend http_backend

backend http_backend
    balance roundrobin
    option httpchk GET /health # Health check endpoint on your app
    http-request set-header X-Forwarded-Port %[dst_port]
    http-request add-header X-Forwarded-Proto https if { ssl_fc }
    server app1 10.10.0.10:3000 check
    server app2 10.10.0.11:3000 check
    # Add more app servers as needed
    # server app3 10.10.0.12:3000 check

In this configuration:

The global section sets up logging and daemonization.
The defaults section defines common settings for all frontends and backends, including timeouts and error files.
The frontend http_frontend listens on port 80 and directs all incoming HTTP traffic to the http_backend.
The backend http_backend uses the roundrobin algorithm to distribute requests.
option httpchk GET /health configures HAProxy to periodically send a GET request to the /health endpoint on each application server. With HAProxy’s default check settings (a check every 2 seconds, fall 3, rise 2), a server is removed from the pool after three consecutive failed checks and returns after two consecutive successful ones.
server app1 10.10.0.10:3000 check defines an application server instance. The check directive enables health checking.
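
The effect of the rise and fall thresholds on a server's state can be sketched in a few lines of Ruby. This is a simplified model for building intuition, not HAProxy's implementation; the ServerHealth class is hypothetical and uses rise 2 and fall 3 as its defaults.

```ruby
# Simplified model of rise/fall health-check counting (illustrative only).
class ServerHealth
  attr_reader :name, :up

  def initialize(name, rise: 2, fall: 3)
    @name = name
    @rise = rise
    @fall = fall
    @up = true
    @streak = 0 # consecutive check results that contradict the current state
  end

  # Records one check result (true = passed) and returns the resulting state.
  def record(check_passed)
    if check_passed == @up
      @streak = 0
    else
      @streak += 1
      if @streak >= (@up ? @fall : @rise)
        @up = !@up
        @streak = 0
      end
    end
    @up
  end
end

app1 = ServerHealth.new("app1")
states = [false, false, false, true, true].map { |passed| app1.record(passed) }
p states # => [true, true, false, false, true]
```

The server stays in rotation through the first two failures, drops out on the third, and only comes back after two passing checks in a row, which keeps a flapping server from bouncing in and out of the pool.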

Install HAProxy:

sudo apt update
sudo apt install haproxy -y

After configuring /etc/haproxy/haproxy.cfg, validate it with sudo haproxy -c -f /etc/haproxy/haproxy.cfg, then restart HAProxy:

sudo systemctl restart haproxy
sudo systemctl enable haproxy

Ensure your DigitalOcean firewall allows inbound traffic on port 80 to your HAProxy Droplet.

Application Health Check Endpoint

Your Ruby application needs a simple endpoint that HAProxy can query to determine its health. This endpoint should check critical dependencies, most importantly, the connection to Redis.

# routes/health.rb (Sinatra example)
get '/health' do
  begin
    # Check Redis connection
    $redis.ping # Or any other simple Redis command
    status 200
    body 'OK'
  rescue Redis::CannotConnectError, Redis::TimeoutError => e
    logger.error "Health check failed: Redis connection error - #{e.message}"
    status 503 # Service Unavailable
    body 'Redis connection error'
  rescue => e
    logger.error "Health check failed: Unexpected error - #{e.message}"
    status 500 # Internal Server Error
    body 'Internal server error'
  end
end

# For Rails, you might create a controller and route:
# app/controllers/health_controller.rb
# class HealthController < ApplicationController
#   def show
#     begin
#       $redis.ping
#       render json: { status: 'OK' }, status: :ok
#     rescue Redis::CannotConnectError, Redis::TimeoutError => e
#       render json: { error: 'Redis connection error' }, status: :service_unavailable
#     rescue => e
#       render json: { error: 'Internal server error' }, status: :internal_server_error
#     end
#   end
# end
#
# config/routes.rb
# get '/health', to: 'health#show'

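When an application server fails its health check (e.g., due to Redis unavailability or an internal error), HAProxy will stop sending traffic to it. Once the server recovers and passes health checks again, HAProxy will automatically re-add it to the rotation. Because every application server depends on the same Redis master, a Redis outage that outlasts the check thresholds can take all servers out of rotation at once, and HAProxy then answers with 503 until the failover completes. If parts of your application can run without Redis, consider reporting Redis trouble through monitoring instead of failing the health check.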
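
The same health-check idea can be expressed independently of Sinatra or Rails as a plain Rack-compatible callable, which also makes it easy to test without a running Redis. This is a sketch under assumptions: build_health_app is a hypothetical helper, and the lambdas below stand in for a real check such as $redis.ping.

```ruby
require "json"

# Builds a Rack-compatible app (anything that responds to call(env)).
# `checks` maps a dependency name to a callable that raises on failure.
def build_health_app(checks)
  lambda do |_env|
    results = checks.transform_values do |check|
      begin
        check.call
        "ok"
      rescue StandardError => e
        "error: #{e.class}"
      end
    end
    healthy = results.values.all? { |value| value == "ok" }
    body = JSON.generate({ status: healthy ? "ok" : "unavailable", checks: results })
    [healthy ? 200 : 503, { "content-type" => "application/json" }, [body]]
  end
end

# Stand-ins for `$redis.ping`: one succeeds, one fails like a dropped connection.
healthy_app = build_health_app({ "redis" => -> { "PONG" } })
failing_app = build_health_app({ "redis" => -> { raise IOError, "connection refused" } })

ok_status, _headers, ok_body = healthy_app.call({})
bad_status, _headers, bad_body = failing_app.call({})
puts "#{ok_status} #{ok_body.first}"
puts "#{bad_status} #{bad_body.first}"
```

HAProxy only looks at the status code, so the JSON body is there for humans and monitoring tools. Keep the checks cheap and make sure the Redis client has short timeouts configured, since a slow health endpoint can itself cause checks to fail.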

Automated Failover Strategy Summary

This architecture provides a multi-layered approach to automated failover:

Redis High Availability: Redis Sentinel monitors the Redis master and automatically promotes a replica if the master becomes unavailable. Your Ruby application, configured to use Sentinel, reconnects to the new master on its next connection attempt, although commands issued during the failover window can fail and should be retried.
Application High Availability: HAProxy load balances incoming traffic across multiple application server instances. Its health checking mechanism detects unresponsive or unhealthy application servers and temporarily removes them from rotation, ensuring traffic is only sent to healthy instances.

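By combining these technologies, you create a resilient system where failures in individual components (Redis master, application server instance) are handled automatically with only brief disruption for your users. With the settings above, a Redis failover takes at least the 5 seconds of down-after-milliseconds plus the time for election and promotion, and because Redis replication is asynchronous, writes acknowledged just before the failure may be lost. The key is the correct configuration of Sentinel for Redis and the health check integration within HAProxy for your application layer.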