Disaster Recovery 101: Architecting Auto-Failovers for Redis and Ruby Deployments on OVH
Redis Sentinel for High Availability
Achieving automated failover for Redis requires a robust high-availability solution. Redis Sentinel is the de facto standard for this purpose. It provides monitoring, notification, and automatic failover for Redis instances. We’ll deploy Sentinel in a quorum-based configuration across multiple availability zones within OVHcloud’s infrastructure to ensure resilience.
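The core idea behind Sentinel is that several Sentinels must agree that a Redis master is down before a failover starts. The configured quorum is the number of Sentinels that must report the master as unreachable for it to be marked objectively down; the failover itself must then be authorized by a majority of all Sentinels. This guards against failovers triggered by a single Sentinel's network problems and reduces the risk of split-brain scenarios.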
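The majority rule is easy to check with a little arithmetic. The sketch below is only an illustration; the helper names `sentinel_majority` and `tolerated_sentinel_failures` are made up for this example and are not part of Redis or any gem.

```ruby
# Number of Sentinels that make up a majority of the deployment.
def sentinel_majority(total_sentinels)
  (total_sentinels / 2) + 1
end

# How many Sentinels can be lost while a majority is still reachable.
def tolerated_sentinel_failures(total_sentinels)
  total_sentinels - sentinel_majority(total_sentinels)
end

[3, 4, 5].each do |n|
  puts "#{n} Sentinels: majority=#{sentinel_majority(n)}, " \
       "tolerated failures=#{tolerated_sentinel_failures(n)}"
end
```

Note that four Sentinels tolerate no more failures than three, which is why odd-sized deployments are the usual choice.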
Sentinel Configuration (`sentinel.conf`)
Each Sentinel instance needs its own configuration file. Here’s a sample `sentinel.conf` for an OVH deployment, assuming the Redis master listens on port 6379 and the Sentinels on port 26379. It defines the master to monitor and the quorum required to mark it as down.
# sentinel.conf
port 26379
daemonize yes
pidfile /var/run/redis_sentinel.pid
logfile /var/log/redis/sentinel.log

# Define the master we want to monitor.
# 'mymaster' is the name we'll use to refer to this master.
# 192.168.1.100 6379 is the IP and port of the master.
# The last argument, 2, is the quorum: the number of Sentinels that must
# agree that the master is down before it is marked objectively down.
# With N Sentinels, a quorum of (N/2) + 1 matches the majority needed to
# authorize the failover; for a 3-node Sentinel deployment that is 2.
sentinel monitor mymaster 192.168.1.100 6379 2

# Time in milliseconds the master must be unreachable before this Sentinel
# considers it down (subjectively down).
sentinel down-after-milliseconds mymaster 5000

# Failover timeout in milliseconds. Sentinel uses it in several ways,
# including to decide when a failed failover attempt may be retried.
sentinel failover-timeout mymaster 10000

# Number of replicas that may be reconfigured to sync from the new master
# at the same time after a failover.
sentinel parallel-syncs mymaster 1

# If you have multiple masters, you can define them here.
# sentinel monitor mymaster2 192.168.1.101 6379 2

# Optional: password Sentinel uses to authenticate with the Redis instances.
# sentinel auth-pass mymaster YourRedisPassword

# Optional: with Redis ACLs, the user Sentinel authenticates as.
# sentinel auth-user mymaster YourSentinelUsername
# sentinel auth-pass mymaster YourSentinelPassword
Deploy at least three Sentinel instances in separate failure domains, for example different OVHcloud locations (e.g., GRA, RBX, BHS). If one location becomes unavailable, the remaining two Sentinels still form a majority and can carry out the failover.
Starting Redis and Sentinel Instances
On your designated Redis master and replica servers, start Redis with appropriate configurations. On your Sentinel servers, start the Sentinel process using the `redis-sentinel` executable and pointing to your `sentinel.conf` file.
Example command to start a Redis master:
redis-server /etc/redis/redis.conf
Example command to start a Redis replica (assuming master is at 192.168.1.100):
redis-server /etc/redis/redis_replica.conf --replicaof 192.168.1.100 6379
Example command to start a Sentinel instance:
redis-sentinel /etc/redis/sentinel.conf
Integrating with Ruby Applications
Your Ruby application needs to be aware of the Redis Sentinel setup. Instead of connecting directly to a single Redis master IP, it should connect to the Sentinel ensemble. The `redis-rb` gem, a popular Ruby client for Redis, has excellent Sentinel support.
Configuring `redis-rb` for Sentinel
When initializing your Redis client in your Ruby application, you’ll provide a list of Sentinel hosts and the name of the master group as defined in your `sentinel.conf` (e.g., `mymaster`).
# config/initializers/redis.rb (or similar)
# Ensure you have the redis gem installed: gem install redis
require 'redis'

# List of Sentinel hosts and their ports.
# These should be the IPs/hostnames of your Sentinel instances.
SENTINEL_HOSTS = [
  { host: 'sentinel-1.your-domain.com', port: 26379 },
  { host: 'sentinel-2.your-domain.com', port: 26379 },
  { host: 'sentinel-3.your-domain.com', port: 26379 }
].freeze

# The name of the master group as defined in sentinel.conf
REDIS_MASTER_NAME = 'mymaster'

# Initialize the Redis client using Sentinel
begin
  # Passing `sentinels:` makes the client ask the Sentinels for the current
  # master instead of connecting to a fixed address.
  # This is the redis-rb 5.x form; redis-rb 4.x takes
  # url: "redis://#{REDIS_MASTER_NAME}" in place of `name:`.
  $redis = Redis.new(
    name: REDIS_MASTER_NAME,
    sentinels: SENTINEL_HOSTS,
    role: :master
    # Optional: If your Redis instances require authentication, also pass
    # password: 'YourRedisPassword'
  )

  # Ping to ensure connection is established
  $redis.ping
rescue Redis::CannotConnectError => e
  Rails.logger.error "Failed to connect to Redis Sentinel: #{e.message}"
  # Handle connection error - perhaps fall back to a read-only mode or
  # display an error to the user.
  $redis = nil # Ensure $redis is nil if connection fails
end

# Example usage in your application:
# if $redis
#   $redis.set('mykey', 'myvalue')
#   value = $redis.get('mykey')
# else
#   # Handle Redis unavailability
# end
When the application starts, `redis-rb` queries the provided Sentinels to discover the current master. If a failover occurs, the client re-queries the Sentinels when it reconnects and finds the new master. This keeps master discovery out of your application logic, although commands issued while the failover is in progress can still fail with connection errors.
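To see what this discovery step amounts to, here is a minimal sketch that asks each Sentinel in turn for the master address using the `SENTINEL get-master-addr-by-name` command. The `current_master` helper and its `connect:` factory are hypothetical names introduced for this example; in a real application the factory would typically build a `Redis` client pointed at the Sentinel.

```ruby
# Ask each Sentinel in turn for the address of the current master.
# `connect` is a hypothetical factory: it receives a host and port and returns
# an object that responds to `call(*command)`, for example
#   ->(host, port) { Redis.new(host: host, port: port) }
def current_master(sentinels, master_name, connect:)
  sentinels.each do |sentinel|
    begin
      client = connect.call(sentinel[:host], sentinel[:port])
      host, port = client.call("SENTINEL", "get-master-addr-by-name", master_name)
      return { host: host, port: Integer(port) } if host
    rescue StandardError
      next # this Sentinel is unreachable or misbehaving; try the next one
    end
  end
  nil # no Sentinel could name a master
end
```

This is roughly what the client does for you on every (re)connection, so you rarely need it in application code; it is mostly useful for diagnostics and test scripts.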
Handling Redis Unavailability Gracefully
Even with automated failover, there will be a brief period during failover where Redis is unavailable. Your Ruby application should be designed to handle this gracefully. This might involve:
- Implementing retry mechanisms with exponential backoff for Redis operations.
- Serving stale data from a cache if real-time data is not critical during the brief outage.
- Displaying a user-friendly message indicating temporary service degradation.
- Logging these events for monitoring and alerting.
The `redis-rb` gem’s Sentinel support handles reconnection automatically, but your application’s business logic still needs to account for added latency and errors while the failover is in progress.
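As an illustration of the first point in the list, here is a minimal retry helper with exponential backoff. The name `with_retries` and its parameters are assumptions made for this sketch; it is not part of `redis-rb`.

```ruby
# Run the block, retrying with exponential backoff when it raises one of the
# errors listed in `retry_on`. The delay doubles after each failed attempt and
# is capped at `max_delay`. `sleeper` is injectable so tests need not sleep.
def with_retries(max_attempts: 5, base_delay: 0.1, max_delay: 2.0,
                 retry_on: [StandardError], sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield
  rescue *retry_on
    raise if attempt >= max_attempts # give up and surface the last error

    delay = [base_delay * (2**(attempt - 1)), max_delay].min
    sleeper.call(delay)
    retry
  end
end
```

With the `redis` gem loaded, a call might look like `with_retries(retry_on: [Redis::BaseConnectionError]) { $redis.get('mykey') }`. Keep the total retry budget short for requests a user is waiting on.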
OVHcloud Specific Considerations
When deploying on OVHcloud, several factors are crucial for a successful Redis HA setup:
Network Configuration and Security Groups
Ensure that your OVHcloud Security Groups (or equivalent firewall rules) allow traffic:
- Between Redis master, replicas, and Sentinels on port 6379 (or your configured Redis port).
- Between Sentinels on port 26379 (or your configured Sentinel port).
- From your application servers to the Sentinels on port 26379 and to every Redis node (master and replicas) on port 6379, since any replica can become the master after a failover.
It’s best practice to restrict these ports to only the necessary internal IP ranges or specific security group IDs within your OVHcloud project to minimize the attack surface.
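Once the rules are in place, a quick way to confirm them is to test TCP reachability from each server. The sketch below uses Ruby's standard `Socket.tcp`; the helper names `port_open?` and `unreachable_endpoints` are assumptions for this example, and the hostnames in the usage comment are the placeholder names used earlier in this article.

```ruby
require "socket"

# Return true if a TCP connection to host:port succeeds within `timeout` seconds.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Return the subset of [host, port] pairs that could not be reached.
def unreachable_endpoints(endpoints, timeout: 2)
  endpoints.reject { |host, port| port_open?(host, port, timeout: timeout) }
end

# Example (run from an application server):
# unreachable_endpoints([
#   ["sentinel-1.your-domain.com", 26379],
#   ["192.168.1.100", 6379]
# ])
```

An empty result means every listed port accepted a connection. It does not prove that authentication or replication works, only that the firewall lets the traffic through.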
Instance Placement and Availability Zones
As mentioned, deploy your Redis master, replicas, and Sentinel instances in separate failure domains (e.g., GRA1, GRA2, GRA3). This is fundamental for achieving true high availability: if one of them experiences an outage, your Redis service can continue to operate from the others.
Monitoring and Alerting
Beyond Redis Sentinel’s built-in monitoring, integrate with OVHcloud’s monitoring tools or a third-party solution (like Prometheus/Grafana, Datadog) to track:
- Redis Sentinel health (number of masters down, number of Sentinels down).
- Redis master and replica performance metrics (latency, memory usage, CPU, network I/O).
- Application-level Redis connection errors.
Set up alerts for critical events, such as Sentinel reporting a master as down or a significant increase in connection errors from your application.
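One possible source for such alerts is the reply to `SENTINEL master mymaster`, which lists the master's flags, its replica count, and the number of other Sentinels it knows about. The sketch below turns that reply into a small summary; `master_health` is a hypothetical helper name, and the reply is assumed to be either a flat array of field/value pairs or a hash, depending on the client and protocol version.

```ruby
# Summarize the reply of `SENTINEL master <name>`.
# Accepts a flat [field, value, field, value, ...] array or a hash.
def master_health(reply)
  info = reply.is_a?(Hash) ? reply : Hash[*reply]
  flags = info.fetch("flags", "").to_s.split(",")
  {
    subjectively_down: flags.include?("s_down"),
    objectively_down: flags.include?("o_down"),
    replicas: info.fetch("num-slaves", 0).to_i,
    # Sentinels this one knows about, itself included. A known Sentinel is
    # not necessarily reachable right now.
    known_sentinels: info.fetch("num-other-sentinels", 0).to_i + 1,
    quorum: info.fetch("quorum", 0).to_i
  }
end
```

A monitoring job could alert when `objectively_down` is true, when `replicas` drops to zero, or when `known_sentinels` is lower than the number you deployed.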
Automated Deployment (IaC)
For production environments, manage your Redis and Sentinel deployments using Infrastructure as Code (IaC) tools like Terraform or Ansible. This ensures consistency, repeatability, and simplifies disaster recovery planning. Your IaC scripts should define:
- OVHcloud instance creation and configuration.
- Security group rules.
- Redis and Sentinel installation and configuration file generation.
- Service startup and management (e.g., using systemd).
This approach allows you to quickly provision a new Redis HA cluster in a different region or zone if a catastrophic failure occurs that affects an entire OVHcloud region.
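Whichever tool you use, the configuration files are usually rendered from a template. As a small illustration of that step, here is a sketch using Ruby's standard ERB library; the template, the `render_sentinel_conf` helper, and the variable names are assumptions for this example, while the values reuse the sample configuration from earlier.

```ruby
require "erb"

# Minimal sentinel.conf template; only the directives discussed in this article.
SENTINEL_TEMPLATE = <<~CONF
  port <%= sentinel_port %>
  sentinel monitor <%= master_name %> <%= master_host %> <%= master_port %> <%= quorum %>
  sentinel down-after-milliseconds <%= master_name %> <%= down_after_ms %>
  sentinel failover-timeout <%= master_name %> <%= failover_timeout_ms %>
  sentinel parallel-syncs <%= master_name %> 1
CONF

# Render the template with the given variables (keyword names match the template).
def render_sentinel_conf(**vars)
  ERB.new(SENTINEL_TEMPLATE).result_with_hash(vars)
end

puts render_sentinel_conf(
  sentinel_port: 26379,
  master_name: "mymaster",
  master_host: "192.168.1.100",
  master_port: 6379,
  quorum: 2,
  down_after_ms: 5000,
  failover_timeout_ms: 10_000
)
```

Ansible's templates and Terraform's `templatefile` function follow the same idea with their own template syntax.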
Testing Failover Scenarios
Regularly testing your failover mechanism is non-negotiable. You can simulate failures manually to verify that Sentinel correctly promotes a replica and that your application reconnects seamlessly.
Manual Failover Testing Steps
1. **Identify the current master:** Use `redis-cli` connected to any Sentinel or the current master to check its status.
redis-cli -h <sentinel-host> -p 26379 SENTINEL master mymaster
This will return details about the master, including its IP and port.
2. **Simulate master failure:**
* **Graceful shutdown:** Connect to the master using `redis-cli` and issue the `SHUTDOWN` command.
* **Hard kill:** Terminate the `redis-server` process on the master instance (e.g., `sudo kill -9 <pid>`).
Automated Testing
For more advanced setups, consider integrating automated failover tests into your CI/CD pipeline. This could involve scripts that trigger a simulated failure, wait for the failover to complete, and then run a suite of integration tests against the Redis instance.
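The "wait for the failover to complete" step can be sketched as a polling loop that returns once the reported master address changes. `wait_for_new_master` is a hypothetical helper written for this example; the block you pass is expected to return the current master address, for instance by sending `SENTINEL get-master-addr-by-name mymaster` to a Sentinel.

```ruby
# Poll the given block until it reports a master different from `old_master`,
# or until `timeout` seconds have passed. Returns the new master, or nil on
# timeout. `clock` and `sleeper` are injectable so the loop can be tested
# without real waiting.
def wait_for_new_master(old_master, timeout: 60, interval: 1,
                        clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                        sleeper: ->(seconds) { sleep(seconds) })
  deadline = clock.call + timeout
  loop do
    current = yield
    return current if current && current != old_master
    return nil if clock.call >= deadline

    sleeper.call(interval)
  end
end
```

A test script would record the master address, stop the master, call this helper, and fail the pipeline if it returns `nil`. With `down-after-milliseconds` set to 5000 as in the sample configuration, the timeout should comfortably exceed five seconds.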