
Vengala Vinay

Having 12+ Years of Experience in Software Development


Disaster Recovery 101: Architecting Auto-Failovers for Redis and Ruby Deployments on AWS

Designing for Resilience: Redis Sentinel and AWS RDS for PostgreSQL Auto-Failover

Achieving true high availability for critical services necessitates robust disaster recovery strategies, particularly for data stores. This post details the architecture and implementation of automated failover for a typical Ruby on Rails application leveraging Redis for caching and session management, and AWS RDS for PostgreSQL as the primary relational database. We will focus on leveraging AWS-native services and open-source tools to minimize manual intervention during an outage.

Redis High Availability with Sentinel

Redis Sentinel is the de facto standard for providing high availability for Redis. It monitors Redis instances, performs automatic failover if a master node becomes unavailable, and allows clients to discover the current master.

Sentinel Architecture Overview

A robust Sentinel setup requires at least three Sentinel instances, so that a majority can still be formed if one of them is lost. Deploy these instances across different Availability Zones (AZs) within your AWS region for maximum resilience. Each Sentinel monitors the Redis master and its replicas. If the master fails, the Sentinels agree that it is down, elect one Sentinel as leader, and that leader promotes one of the available replicas to become the new master.

Sentinel Configuration

The core configuration for Redis Sentinel is managed in a sentinel.conf file. Here’s a sample configuration for a setup with one master and two replicas, deployed across multiple AZs:

# sentinel.conf

port 26379
daemonize yes
pidfile /var/run/redis_sentinel.pid
logfile /var/log/redis/sentinel.log

# Monitor the Redis master named 'mymaster'
# The IP address and port of the master, and the quorum (minimum number of sentinels that must agree on a failure)
sentinel monitor mymaster 10.0.1.10 6379 2

# How long (ms) the master must be unreachable before this Sentinel marks it subjectively down (SDOWN)
sentinel down-after-milliseconds mymaster 5000

# Failover timeout (ms). It bounds several failover phases, e.g. how long a failover in progress
# waits for replicas to be reconfigured; a new failover of the same master is only retried after
# twice this value. Default is 180000 (3 minutes).
sentinel failover-timeout mymaster 10000

# How many replicas may resync with the newly promoted master at the same time
sentinel parallel-syncs mymaster 1

# Optional: if your Redis instances are password-protected
# sentinel auth-pass mymaster YOUR_REDIS_PASSWORD

# Replicas do not need to be listed: Sentinel discovers them from the master's INFO output
# and records them itself (as "sentinel known-replica" lines) when it rewrites this file,
# so sentinel.conf must be writable by the Sentinel process.

Key parameters:

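  • sentinel monitor mymaster <ip> <port> <quorum>: Defines the master to monitor, its address, and the quorum. A quorum of 2 means at least two Sentinels must agree that the master is unreachable before it is marked objectively down (ODOWN). The quorum only governs failure detection: to actually start a failover, one Sentinel must also be elected leader by a majority of all Sentinels.
  • sentinel down-after-milliseconds mymaster 5000: If a Sentinel cannot reach the master for 5 seconds, it marks the master as subjectively down (SDOWN).
  • sentinel failover-timeout mymaster 10000: Bounds several phases of the failover, such as how long to wait for replicas to be reconfigured, and sets how long (twice this value) before the same master can be failed over again. It is not a delay before failover starts; that begins as soon as the master is ODOWN and a leader has been elected. The 10-second value here is far more aggressive than the 3-minute default, so tune it with care.
  • sentinel parallel-syncs mymaster 1: Exactly one replica is promoted per failover. This setting controls how many of the remaining replicas are reconfigured to resync from the new master at the same time; a value of 1 keeps the other replicas available for reads while they resynchronize one by one.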

Deploying Sentinel on AWS

We recommend deploying Sentinel instances on EC2 instances within different Availability Zones. For example, if your Redis master and replicas are in us-east-1a, us-east-1b, and us-east-1c, deploy your Sentinel instances in each of these AZs. This ensures that a single AZ failure doesn’t prevent Sentinel from reaching a quorum.
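Sentinel needs two things before it fails over: `quorum` Sentinels agreeing that the master is down, and a majority of all Sentinels voting for a leader to run the failover. The Ruby sketch below models those two conditions. It is a simplification that assumes every reachable Sentinel can also reach a replica to promote, and the method names `majority` and `failover_possible?` are hypothetical. It shows why three Sentinels in three AZs survive the loss of one AZ, while two Sentinels cannot fail over once either side is lost.

```ruby
# Simplified model of Sentinel's failover preconditions (hypothetical helpers).
# Assumes every reachable Sentinel can also reach a replica to promote.

# Votes needed to elect a failover leader: a strict majority of all Sentinels.
def majority(total_sentinels)
  total_sentinels / 2 + 1
end

# ODOWN needs `quorum` agreeing Sentinels; starting the failover also needs
# a majority of all configured Sentinels to vote for a leader.
def failover_possible?(total_sentinels:, reachable_sentinels:, quorum:)
  reachable_sentinels >= quorum &&
    reachable_sentinels >= majority(total_sentinels)
end

# Three Sentinels (one per AZ) with quorum 2, as in the sentinel.conf above:
puts failover_possible?(total_sentinels: 3, reachable_sentinels: 2, quorum: 2) # prints true (one AZ lost)
puts failover_possible?(total_sentinels: 3, reachable_sentinels: 1, quorum: 2) # prints false (two AZs lost)

# Two Sentinels (two AZs): losing either AZ blocks failover, even with quorum 1.
puts failover_possible?(total_sentinels: 2, reachable_sentinels: 1, quorum: 1) # prints false
```

A low quorum makes failure detection more sensitive, but it never lets a minority of Sentinels start a failover on its own.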

You can automate the deployment and management of these EC2 instances using tools like Terraform or CloudFormation. Make sure your Security Group rules allow Sentinels to reach each other on port 26379, Sentinels to reach the Redis nodes on port 6379, and your application hosts to reach both (26379 to discover the master, 6379 to talk to it).

Integrating with Ruby Applications

The redis gem (redis-rb), commonly used in Ruby applications, has built-in support for Redis Sentinel, so no additional gem is required. You configure the client with the list of Sentinels and the master name, and whenever it connects it asks the Sentinels for the address of the current master.

# config/initializers/redis.rb
# For Rails applications

# Sentinel support is built into the redis gem; no extra gem is needed.
# gem 'redis'

# One Sentinel per Availability Zone
sentinels = [
  { host: 'sentinel-az1.example.com', port: 26379 },
  { host: 'sentinel-az2.example.com', port: 26379 },
  { host: 'sentinel-az3.example.com', port: 26379 }
]

# 'mymaster' must match the name used in `sentinel monitor` in sentinel.conf.
# redis-rb 5.x takes the master name via `name:`; with redis-rb 4.x,
# pass url: 'redis://mymaster' instead of name:.
redis_client = Redis.new(
  name: 'mymaster',
  sentinels: sentinels,
  role: :master,                  # always connect to the current master
  password: ENV['REDIS_PASSWORD'] # nil (no AUTH) if Redis has no password
)

# Expose a global handle (in multi-threaded servers, consider a connection pool)
$redis = redis_client

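During a failover, commands that are in flight, or issued before the new master is available, raise connection errors (subclasses of Redis::BaseConnectionError). On the next reconnect, redis-rb asks the Sentinels for the current master and connects to it. Test this failover process thoroughly in a staging environment, including how your application handles those transient errors.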
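Because commands issued during the failover window fail rather than wait, it can help to wrap idempotent Redis calls in a short retry with backoff. The sketch below is a hypothetical helper (`with_failover_retry` is not part of redis-rb); it takes the error classes to retry as a parameter, so it does not depend on the redis gem itself.

```ruby
# Hypothetical helper (not part of redis-rb): retry a block on transient
# connection errors with exponential backoff while Sentinel promotes a new master.
def with_failover_retry(errors:, attempts: 3, base_delay: 0.5)
  tries = 0
  begin
    tries += 1
    yield
  rescue *errors
    raise if tries >= attempts           # out of attempts: re-raise the last error
    sleep(base_delay * (2**(tries - 1))) # 0.5s, then 1s, ... with the defaults
    retry                                # re-run the begin block
  end
end
```

In an application you might call it as `with_failover_retry(errors: [Redis::BaseConnectionError]) { $redis.get('session:abc123') }`, where the key is only illustrative. Retry only idempotent commands such as GET; retrying INCR or LPUSH after an ambiguous failure can apply the write twice.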

AWS RDS for PostgreSQL Auto-Failover

For your primary relational data, AWS Relational Database Service (RDS) offers managed Multi-AZ deployments that provide built-in high availability and automatic failover.

RDS Multi-AZ Deployment Explained

When you provision an RDS instance with Multi-AZ enabled, AWS automatically provisions and maintains a synchronous standby replica in a different Availability Zone. All write operations are synchronously replicated to the standby, which does not serve read traffic; it exists only to take over on failover. In the event of a primary instance failure (e.g., instance hardware failure, AZ outage, network disruption), RDS automatically fails over to the standby. This typically takes between 60 and 120 seconds, during which the database is unavailable.

Configuring RDS for PostgreSQL Multi-AZ

You can enable Multi-AZ during instance creation or modify an existing instance. The configuration is straightforward via the AWS Management Console, AWS CLI, or Infrastructure as Code tools.

# Example using AWS CLI to create a Multi-AZ RDS instance
aws rds create-db-instance \
    --db-instance-identifier my-postgres-db \
    --db-instance-class db.r5.large \
    --engine postgres \
    --master-username postgres \
    --master-user-password YOUR_DB_PASSWORD \
    --allocated-storage 100 \
    --storage-type gp2 \
    --multi-az \
    --vpc-security-group-ids sg-0123456789abcdef0 \
    --db-subnet-group-name my-db-subnet-group \
    --backup-retention-period 7 \
    --region us-east-1

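The crucial parameter here is --multi-az. AWS handles the replication and failover automatically. During a failover, RDS updates the endpoint's DNS record to point to the standby, so the endpoint name stays the same and your application does not need to be reconfigured. Avoid caching the resolved IP address in your application or runtime beyond the record's TTL, or clients may keep trying to reach the old primary.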

Application Integration and Connection Pooling

Your Ruby application connects to the RDS instance using its standard endpoint. For PostgreSQL, the pg gem (or jdbc-postgresql for JRuby) is used. Connection pooling is essential for performance and managing connections during failover events.

# config/database.yml (Rails example)

default: &default
  adapter: postgresql
  encoding: unicode
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  host: <%= ENV.fetch("RDS_HOSTNAME") { 'my-postgres-db.abcdefghijk.us-east-1.rds.amazonaws.com' } %>
  port: <%= ENV.fetch("RDS_PORT") { 5432 } %>
  # Credentials come from the environment (or a secrets store); never commit them
  username: <%= ENV["RDS_USERNAME"] %>
  password: <%= ENV["RDS_PASSWORD"] %>
  # libpq connect timeout (seconds): fail fast while the endpoint is switching over
  connect_timeout: 5

development:
  <<: *default
  database: myapp_development

production:
  <<: *default
  database: myapp_production
  pool: 25 # Must be at least the number of threads per process (Puma threads, Sidekiq concurrency)

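During an RDS failover, the endpoint name stays the same, but every open connection to the old primary is dropped. Active Record checks connections as they are taken from the pool and reconnects when needed, so once DNS points at the new primary, new requests recover on their own; queries that were in flight when the primary went down still fail and must be retried or surfaced as errors. Size the pool deliberately: each process can open up to `pool` connections, so the fleet-wide total (processes × pool) must stay below the instance's max_connections. Too small a pool leaves threads waiting for connections during recovery, while too large a pool can exhaust the database's connection limit.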
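A quick way to sanity-check pool sizes is to add up the worst case across every process that talks to the database. The sketch below uses hypothetical process counts (four hosts with two Puma workers each, plus four Sidekiq processes) and a hypothetical headroom of 10 connections for admin and monitoring sessions; substitute your own numbers and the value reported by `SHOW max_connections;` on your instance.

```ruby
# Hypothetical sizing check: worst-case connections across all processes
# versus the database's max_connections, keeping some headroom in reserve.
Tier = Struct.new(:name, :processes, :pool_per_process)

def total_connections(tiers)
  tiers.sum { |tier| tier.processes * tier.pool_per_process }
end

def pool_budget_ok?(tiers, max_connections:, reserved: 10)
  total_connections(tiers) <= max_connections - reserved
end

# Hypothetical fleet: 4 hosts x 2 Puma workers with pool 5,
# plus 4 Sidekiq processes with pool 25
tiers = [
  Tier.new('puma', 8, 5),
  Tier.new('sidekiq', 4, 25)
]

puts total_connections(tiers)                     # prints 140
puts pool_budget_ok?(tiers, max_connections: 100) # prints false
puts pool_budget_ok?(tiers, max_connections: 200) # prints true
```

Remember that every process reconnects at roughly the same moment after a failover, so the worst case is also the likely case right after recovery.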

Monitoring and Alerting

While failover is automated, proactive monitoring and alerting are critical. AWS CloudWatch provides metrics for both RDS and EC2 instances running Sentinel. Key metrics to monitor include:

  • RDS: CPUUtilization, DatabaseConnections, ReadIOPS, WriteIOPS, FreeableMemory, and ReplicaLag (relevant to read replicas, not to the Multi-AZ standby). Crucially, create an RDS event subscription for the "failover" event category so that failovers are delivered to you through SNS.
  • EC2 (Sentinel Hosts): CPUUtilization, NetworkIn, NetworkOut, StatusCheckFailed_Instance, StatusCheckFailed_System.
  • Redis (if self-managed; read from the INFO command and published to CloudWatch as custom metrics): used_memory, connected_clients, instantaneous_ops_per_sec, rejected_connections.

Set up CloudWatch Alarms for critical thresholds. For example, an alarm on StatusCheckFailed_Instance for your Sentinel EC2 instances or on high CPUUtilization for your RDS instance can notify your operations team via SNS. If you manage Redis directly, also configure alarms on the Redis metrics above, and alert on Sentinel's +switch-master event, which Sentinel publishes over Pub/Sub whenever it promotes a new master.
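Each Sentinel publishes a +switch-master message on Pub/Sub whenever it promotes a new master, with the payload `<master name> <old ip> <old port> <new ip> <new port>`. The Ruby sketch below parses that payload and, only when a hypothetical `SENTINEL_HOST` environment variable is set, subscribes to the channel with redis-rb and logs the change. The `warn` call is a placeholder for whatever alerting hook (SNS publish, chat webhook) you actually use.

```ruby
# Parsed form of Sentinel's +switch-master payload:
#   "<master name> <old ip> <old port> <new ip> <new port>"
SwitchMaster = Struct.new(:name, :old_host, :old_port, :new_host, :new_port)

def parse_switch_master(payload)
  name, old_host, old_port, new_host, new_port = payload.split(' ')
  SwitchMaster.new(name, old_host, Integer(old_port), new_host, Integer(new_port))
end

# Optional live listener; runs only when the hypothetical SENTINEL_HOST is set.
if ENV['SENTINEL_HOST']
  require 'redis'
  sentinel = Redis.new(host: ENV['SENTINEL_HOST'], port: 26379)
  sentinel.subscribe('+switch-master') do |on| # blocks, handling messages
    on.message do |_channel, payload|
      event = parse_switch_master(payload)
      # Placeholder: replace with your alerting hook (SNS, webhook, etc.)
      warn "Redis master #{event.name} moved from " \
           "#{event.old_host}:#{event.old_port} to #{event.new_host}:#{event.new_port}"
    end
  end
end
```

Subscribing to one Sentinel is enough to see the event, but running the listener against more than one Sentinel avoids missing it when the AZ hosting your chosen Sentinel is the one that failed.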

Testing Your Failover Strategy

Automated failover is only as good as its last successful test. Regularly simulate failures to validate your setup:

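  • Redis Sentinel Failover: Stop the Redis master process (for example with the SHUTDOWN command) and observe the Sentinel logs and your application's behavior. Alternatively, run SENTINEL FAILOVER mymaster against one Sentinel to force a failover without a real outage. Then restart the old master and observe how Sentinel reconfigures it as a replica of the new master.
  • RDS Multi-AZ Failover: From the AWS Management Console, select your RDS instance, choose "Reboot", and enable the "Reboot with failover" option (the CLI equivalent is aws rds reboot-db-instance --db-instance-identifier my-postgres-db --force-failover). Monitor the failover process and application connectivity.
  • Simulate AZ Failure: If possible, use AWS Fault Injection Service (formerly AWS Fault Injection Simulator) or manually stop instances in one AZ to test the resilience of your multi-AZ deployments.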

Document the expected behavior, the time it takes for failover, and any manual steps required (though the goal is zero manual steps). This documentation is invaluable during a real incident.
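To record how long each failover actually takes, you can probe the dependency in a loop while you run the drill and measure the gap between the first failed probe and the first successful one. The sketch below is a hypothetical helper, assuming a probe lambda such as `-> { $redis.ping == 'PONG' }` or `-> { ActiveRecord::Base.connection.select_value('SELECT 1') }`; its precision is limited by the polling interval.

```ruby
# Hypothetical drill helper: returns the seconds between the first failed probe
# and the next successful one, or nil if no outage-and-recovery was observed.
def measure_outage(probe, interval: 1.0, timeout: 300)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  deadline = clock.call + timeout
  outage_start = nil

  while clock.call < deadline
    ok = begin
      probe.call
    rescue StandardError
      false # treat exceptions (e.g. connection errors) as a failed probe
    end

    if !ok && outage_start.nil?
      outage_start = clock.call         # outage begins
    elsif ok && outage_start
      return clock.call - outage_start  # recovered: measured downtime
    end
    sleep(interval)
  end
  nil
end
```

Start it (for example in a Rails console) just before triggering the failover, then record the returned value in your runbook next to the expected behavior. A nil result means the timeout expired before a complete outage and recovery was seen.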

Conclusion

By combining Redis Sentinel for in-memory data store high availability and AWS RDS Multi-AZ deployments for relational databases, you can architect a resilient system capable of automated failover. This approach significantly reduces downtime and operational burden, allowing your engineering teams to focus on feature development rather than reactive incident management. Remember that continuous testing and monitoring are paramount to maintaining confidence in your disaster recovery strategy.


A little about the Author

Having 12+ Years of Experience in Software Development, Vinay is a principal software architect, senior systems engineer, and elite technical consultant. He specializes in bespoke PHP/WordPress development, high-performance Magento 2 & Shopify architectures, custom plugin/theme development from scratch, and legacy code modernization (including VB6, VB.NET, PyQt, and Crystal Reports). Known for solving complex database bottlenecks, speed optimization (Core Web Vitals), and advanced security code auditing, Vinay engineers production-ready systems designed to scale under heavy concurrent load conditions.




Copyright © 2026 · Vinay Vengala