Disaster Recovery 101: Architecting Auto-Failovers for Redis and Shopify Deployments on DigitalOcean
Leveraging Redis Sentinel for High Availability
For critical applications relying on Redis, particularly those integrated with platforms like Shopify, a robust disaster recovery strategy is essential. The cornerstone of Redis high availability is Redis Sentinel, which provides monitoring, notification, and automatic failover for Redis instances. This section details the setup and configuration of a basic Redis Sentinel deployment on DigitalOcean droplets.
We’ll assume you have three DigitalOcean droplets provisioned, each running Ubuntu 22.04 LTS. These will host our Redis master, replica, and Sentinel instances. For simplicity, we’ll place one Sentinel on each droplet, alongside a Redis instance. In a production environment, you’d likely dedicate droplets to Sentinel for better isolation and resource management.
Sentinel Configuration
On each droplet, install Redis if it’s not already present:
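sudo apt update
# redis-sentinel provides the redis-sentinel systemd service used later in this section
sudo apt install redis-server redis-sentinel -y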
Next, configure the Redis instances. For the master and replica, ensure they are accessible by the Sentinel instances. We’ll bind them to the private IP addresses of the droplets. Edit the Redis configuration file (typically /etc/redis/redis.conf) on each server.
Master Redis Configuration (e.g., droplet-1)
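# /etc/redis/redis.conf on droplet-1 (Master)
bind 127.0.0.1 [DROPLET_PRIVATE_IP_1]
protected-mode no
port 6379
daemonize yes
pidfile /var/run/redis/redis-server.pid
logfile /var/log/redis/redis-server.log
dir /var/lib/redis

# Require the password that the replicas and Sentinels authenticate with
requirepass your_redis_password

# Replication settings
# masterauth lets this node resync as a replica after a failover demotes it
masterauth your_redis_password
# replicaof is left unset on the master
# replicaof [MASTER_PRIVATE_IP] 6379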
Restart Redis on the master:
sudo systemctl restart redis-server
Replica Redis Configuration (e.g., droplet-2)
# /etc/redis/redis.conf on droplet-2 (Replica)
bind 127.0.0.1 [DROPLET_PRIVATE_IP_2]
protected-mode no
port 6379
daemonize yes
pidfile /var/run/redis/redis-server.pid
logfile /var/log/redis/redis-server.log
dir /var/lib/redis

# Use the same password as the master, since this node may be promoted
requirepass your_redis_password

# Replication settings
masterauth your_redis_password
replicaof [DROPLET_PRIVATE_IP_1] 6379
Restart Redis on the replica (if droplet-3 also runs a replica, apply the same configuration there with its own private IP):
sudo systemctl restart redis-server
Now, configure Redis Sentinel. Create a Sentinel configuration file, e.g., /etc/redis/sentinel.conf, on each of the three droplets.
Sentinel Configuration File (on all 3 droplets)
# /etc/redis/sentinel.conf
# Listen on the private IP so the other Sentinels can reach this one
bind 127.0.0.1 [DROPLET_PRIVATE_IP]
protected-mode no
port 26379
daemonize yes
pidfile /var/run/redis/redis-sentinel.pid
logfile /var/log/redis/redis-sentinel.log
dir /var/lib/redis

# Monitor the master Redis instance.
# The quorum is the number of Sentinels that must agree that the master is down
# before a failover is initiated. A common choice is (N / 2) + 1 (integer division),
# where N is the total number of Sentinels. For 3 Sentinels, the quorum is 2.
sentinel monitor mymaster [DROPLET_PRIVATE_IP_1] 6379 2

# Authentication for the master (if requirepass is set in redis.conf)
sentinel auth-pass mymaster your_redis_password

# Time (in milliseconds) the master must be unreachable before this Sentinel considers it down
sentinel down-after-milliseconds mymaster 5000

# Parallel syncs (number of replicas that can be reconfigured in parallel)
sentinel parallel-syncs mymaster 1

# Notification script (optional, for external alerting)
# sentinel notification-script mymaster /path/to/your/notification_script.sh
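The quorum rule in the configuration comments can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `sentinel_thresholds` is made up for illustration, and it assumes you follow the (N / 2) + 1 guideline rather than choosing a different quorum. It also reports how many Sentinels can be lost while a majority, which Sentinel needs to authorize a failover, is still reachable.

```python
def sentinel_thresholds(num_sentinels: int) -> dict:
    """Return the suggested quorum and failure tolerance for N Sentinels."""
    if num_sentinels < 1:
        raise ValueError("need at least one Sentinel")
    # (N / 2) + 1 with integer division, as in the sentinel.conf comment
    majority = num_sentinels // 2 + 1
    return {
        "quorum": majority,
        "majority": majority,
        # Sentinels that can fail while a majority can still authorize a failover
        "tolerated_failures": num_sentinels - majority,
    }


for n in (3, 5):
    print(n, sentinel_thresholds(n))
```

For the three-Sentinel layout used here, this gives a quorum of 2 and tolerance for one failed Sentinel.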
Start the Sentinel service on each droplet:
sudo systemctl start redis-sentinel
sudo systemctl enable redis-sentinel
Verify the Sentinel status:
sudo systemctl status redis-sentinel
You can connect to a Sentinel instance to check the status of your Redis cluster:
redis-cli -p 26379 SENTINEL master mymaster
This command shows details about the master, including its current status (flags), IP address, port, and the number of replicas and other Sentinels observing it. When the master fails, the Sentinels promote one of the replicas to master and reconfigure the remaining replicas to replicate from it. Sentinel-aware clients then learn the new master’s address by querying Sentinel.
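If you want to check this output from a script instead of reading it by eye, one possible approach is sketched below. It assumes the plain, one-value-per-line form that redis-cli prints when its output is piped (alternating field names and values). The helper names and the sample reply are illustrative and heavily abbreviated; `10.0.0.1` is a placeholder IP.

```python
def parse_sentinel_master(raw: str) -> dict:
    """Turn alternating field/value lines into a dictionary."""
    lines = [line.strip() for line in raw.strip().splitlines()]
    if len(lines) % 2:
        raise ValueError("expected alternating field/value lines")
    return dict(zip(lines[0::2], lines[1::2]))


def master_is_healthy(info: dict, expected_sentinels: int = 3) -> bool:
    """True if the master is not flagged down and all Sentinels can see it."""
    flags = info.get("flags", "").split(",")
    flagged_down = "s_down" in flags or "o_down" in flags
    # num-other-sentinels excludes the Sentinel that answered the query
    visible_sentinels = int(info.get("num-other-sentinels", 0)) + 1
    return not flagged_down and visible_sentinels >= expected_sentinels


# Abbreviated, illustrative reply; a real reply contains many more fields.
SAMPLE_REPLY = """name
mymaster
ip
10.0.0.1
port
6379
flags
master
num-slaves
2
num-other-sentinels
2
quorum
2
"""

info = parse_sentinel_master(SAMPLE_REPLY)
print(info["ip"], info["flags"], master_is_healthy(info))
```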
Automating Shopify Application Failover
Integrating Redis Sentinel with your Shopify deployment on DigitalOcean requires a mechanism to inform your application when a failover occurs and to reconfigure it to point to the new master. This typically involves a combination of application-level configuration management and potentially external monitoring tools.
Application Configuration Management
Your Shopify application (whether it’s a custom app, a headless storefront backend, or a related service) needs to be aware of the Redis master’s IP address. The most common approach is to use environment variables or configuration files that can be dynamically updated.
1. Environment Variables:
If your application reads Redis connection details from environment variables (e.g., REDIS_HOST, REDIS_PORT), you’ll need a process to update these variables on your application servers when a failover happens. This can be managed via:
- Orchestration Tools (Kubernetes, Docker Swarm): Environment variables cannot be changed inside a running container, but these platforms can roll out replacement containers with updated values. A failover event can trigger a rolling update of your application pods/services with the new connection details.
- Configuration Management Tools (Ansible, Chef, Puppet): These tools can be used to push updated configuration files or environment variable definitions to your application servers.
- Custom Scripts: A dedicated script can monitor Sentinel’s output or query Sentinel directly for the current master, and then update application configuration files or restart application services.
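As a rough illustration of the custom-script option, the sketch below queries one Sentinel through redis-cli and rewrites a `redis_host:` line in a YAML-style configuration file. The function names are hypothetical, the `redis_host:` key is an assumption about your application’s config format, and redis-cli is assumed to be installed on the machine running the script. Restarting the application afterwards is left out.

```python
import re
import subprocess


def parse_master_addr(output: str) -> tuple:
    """Parse the two-line reply (IP, then port) of get-master-addr-by-name."""
    parts = output.split()
    if len(parts) != 2:
        raise ValueError(f"unexpected Sentinel reply: {output!r}")
    return parts[0], int(parts[1])


def set_redis_host(config_text: str, new_host: str) -> str:
    """Replace the redis_host line, or append one if it is missing."""
    new_line = f"redis_host: {new_host}"
    updated, count = re.subn(r"(?m)^redis_host:.*$", lambda _m: new_line, config_text)
    if count == 0:
        if config_text and not config_text.endswith("\n"):
            config_text += "\n"
        updated = config_text + new_line + "\n"
    return updated


def query_sentinel(host: str, port: int = 26379, master: str = "mymaster") -> tuple:
    """Ask one Sentinel for the current master address via redis-cli."""
    result = subprocess.run(
        ["redis-cli", "-h", host, "-p", str(port),
         "SENTINEL", "get-master-addr-by-name", master],
        capture_output=True, text=True, check=True, timeout=5,
    )
    return parse_master_addr(result.stdout)


# Offline demonstration of the rewrite step with placeholder addresses
print(set_redis_host("name: myapp\nredis_host: 10.0.0.1\n", "10.0.0.2"))
```

In practice you would call `query_sentinel()` with a Sentinel’s private IP, pass the returned host to `set_redis_host()`, and write the result back only when it differs from the current file.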
Example: Using Ansible for Configuration Updates
Let’s consider an Ansible playbook that dynamically fetches the current Redis master IP from Sentinel and updates an application’s configuration file. First, you’ll need an Ansible inventory that lists your application servers and potentially your Redis Sentinel servers.
# inventory.ini
[redis_sentinels]
droplet-1 ansible_host=[DROPLET_PUBLIC_IP_1]
droplet-2 ansible_host=[DROPLET_PUBLIC_IP_2]
droplet-3 ansible_host=[DROPLET_PUBLIC_IP_3]

[app_servers]
appserver-1 ansible_host=[APP_SERVER_PUBLIC_IP_1]
appserver-2 ansible_host=[APP_SERVER_PUBLIC_IP_2]
Now, an Ansible playbook to update the application configuration. This playbook uses a simple shell command to query Sentinel, so redis-cli must be installed on the application servers (on Ubuntu it is part of the redis-tools package).
---
- name: Update application Redis configuration
  hosts: app_servers
  gather_facts: no
  become: yes # Needed to edit files under /etc and restart services
  vars:
    redis_sentinel_host: "[DROPLET_PRIVATE_IP_OF_ONE_SENTINEL]" # Pick one Sentinel to query
    redis_sentinel_port: 26379
    redis_master_name: "mymaster"
    app_config_path: "/etc/myapp/config.yml" # Example path
  tasks:
    - name: Get current Redis master IP from Sentinel
      # The reply has two lines (IP, then port); head keeps only the IP
      shell: "redis-cli -h {{ redis_sentinel_host }} -p {{ redis_sentinel_port }} SENTINEL get-master-addr-by-name {{ redis_master_name }} | head -n 1"
      register: redis_master_info
      changed_when: false # This command is for information gathering, not state change

    - name: Extract master IP address
      set_fact:
        current_redis_master_ip: "{{ redis_master_info.stdout | trim }}"
      when: redis_master_info.stdout is defined and redis_master_info.stdout != ""

    - name: Update application configuration file
      lineinfile:
        path: "{{ app_config_path }}"
        regexp: "^redis_host:"
        line: "redis_host: {{ current_redis_master_ip }}"
      register: redis_config_update
      when: current_redis_master_ip is defined

    - name: Restart application service to apply changes
      systemd:
        name: myapp.service # Replace with your actual service name
        state: restarted
      # Restart only when the configured master actually changed
      when: current_redis_master_ip is defined and redis_config_update is changed
This playbook can be triggered manually after a failover is detected or integrated into a more sophisticated automated failover system. The key is that your application configuration must be externalized and manageable.
Client-Side Redis Libraries and Failover
Many Redis client libraries have built-in support for Sentinel. Instead of configuring a single host, you provide a list of Sentinel hosts. The library then queries these Sentinels to discover the current master and automatically reconnects if a failover occurs. This is the most seamless approach.
For example, in Python with the redis-py library:
import redis
from redis.sentinel import Sentinel

# List of Sentinel hosts
sentinels = [
    ('[DROPLET_PRIVATE_IP_1]', 26379),
    ('[DROPLET_PRIVATE_IP_2]', 26379),
    ('[DROPLET_PRIVATE_IP_3]', 26379),
]

# Master name as defined in sentinel.conf
master_name = 'mymaster'

try:
    # The Sentinel object queries the listed Sentinels to discover the current master
    sentinel = Sentinel(sentinels, socket_timeout=0.5)

    # master_for() returns a client that re-resolves the master after a failover
    r = sentinel.master_for(
        master_name,
        socket_timeout=0.5,
        password='your_redis_password',  # Only needed if requirepass is set in redis.conf
    )

    # Test connection
    r.ping()
    print("Successfully connected to Redis master.")

    # Now you can use 'r' for your Redis operations
    r.set('mykey', 'myvalue')
    print(f"Value for mykey: {r.get('mykey').decode()}")
except redis.exceptions.ConnectionError as e:
    # Also raised (as MasterNotFoundError) when no Sentinel can name a master
    print(f"Could not connect to Redis: {e}")
    # Implement fallback logic or error handling here
except Exception as e:
    print(f"An unexpected error occurred: {e}")
If your application framework or language has a similar Sentinel-aware client library, prioritize its use. This offloads the failover logic to the client, simplifying your application code and infrastructure management.
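Even with a Sentinel-aware client, commands issued while a failover is in progress can fail for a short time before the new master is discovered. One common way to smooth this over is a small retry wrapper with exponential backoff, sketched below. The helper `call_with_retry` and its defaults are illustrative assumptions, not part of any library. With redis-py you would likely pass `retry_on=(redis.exceptions.ConnectionError, redis.exceptions.TimeoutError)`; the built-in `ConnectionError` is used here only to keep the sketch self-contained.

```python
import time


def call_with_retry(operation, retries=5, base_delay=0.2,
                    retry_on=(ConnectionError,), sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on connection errors."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == retries:
                raise  # Give up after the final attempt
            sleep(base_delay * (2 ** attempt))


# Demonstration with a stand-in operation that fails twice, then succeeds
attempts = {"count": 0}


def flaky_ping():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("master unavailable")
    return "PONG"


print(call_with_retry(flaky_ping, sleep=lambda seconds: None))
```

A real call might look like `call_with_retry(lambda: r.set('mykey', 'myvalue'), retry_on=(redis.exceptions.ConnectionError,))`. Keep the total retry time in line with your `down-after-milliseconds` setting, since a failover cannot begin before that interval has passed.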
DigitalOcean Load Balancers for Application Servers
While Redis Sentinel handles the database failover, your application servers themselves might need a high-availability setup. DigitalOcean Load Balancers are an excellent way to distribute traffic across multiple application droplets, ensuring that if one droplet fails, traffic is automatically routed to healthy instances.
Setting up a Load Balancer
1. Create a Load Balancer: Navigate to the “Networking” section in your DigitalOcean control panel and select “Create Load Balancer.”
2. Configure Frontend:
- Protocol: Typically HTTP or HTTPS.
- Port: 80 for HTTP, 443 for HTTPS.
- SSL/TLS: If using HTTPS, you can either terminate SSL at the load balancer (recommended) by uploading your certificate or use a pass-through configuration.
3. Configure Backend:
- Droplets: Select the droplets that host your Shopify application instances.
- Health Check: This is crucial for automatic failover. Configure a health check that the load balancer will use to determine if a backend droplet is healthy. For a web application, this is often an HTTP GET request to a specific endpoint (e.g., /health) that returns a 2xx status code if the application is running correctly.
# Example Health Check Configuration
Protocol: HTTP
Path: /health
Port: 80
Check Interval: 10s
Response Timeout: 5s
Healthy Threshold: 3
Unhealthy Threshold: 3
4. Enable Sticky Sessions (Optional): If your application requires sessions to be maintained on a specific server, enable sticky sessions. However, for stateless applications or those using external session stores (like Redis!), this is usually not necessary.
Once configured, your application will be accessible via the Load Balancer’s public IP address. DigitalOcean’s Load Balancer automatically removes unhealthy droplets from the pool of available backends based on the health check results, effectively achieving automatic failover at the application server level.
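The health check only works if the application exposes an endpoint for it. As a minimal sketch using only the Python standard library, the server below answers `GET /health` with 200 when a supplied check passes and 503 otherwise. The function name `make_health_server` and the `is_healthy` callback are hypothetical; in a real application you would more likely add an equivalent route to your existing web framework and have the callback verify dependencies such as the Redis connection.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_health_server(port, is_healthy=lambda: True, host="0.0.0.0"):
    """Build an HTTP server that exposes GET /health for load balancer checks."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_error(404)
                return
            try:
                ok = bool(is_healthy())
            except Exception:
                ok = False  # A failing check counts as unhealthy
            body = b"ok\n" if ok else b"unhealthy\n"
            self.send_response(200 if ok else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # Silence per-request logging from frequent health checks

    return ThreadingHTTPServer((host, port), HealthHandler)
```

To run it, call `make_health_server(8080).serve_forever()`, where the port is whichever one your health check targets. With the example settings above (10s interval, unhealthy threshold of 3), a failing droplet would need about three consecutive failed checks, on the order of 30 seconds, before being removed from rotation.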
Monitoring and Alerting
A robust disaster recovery strategy is incomplete without comprehensive monitoring and alerting. You need to be notified proactively about potential issues before they impact users.
Key Metrics to Monitor
- Redis Sentinel Health: Monitor the status of Sentinel instances. Are they all running? Are they able to communicate with each other? Are they reporting the master as up/down?
- Redis Master/Replica Status: Check replication lag, memory usage, network traffic, and command latency for both master and replicas.
- Application Server Health: Monitor CPU, memory, disk I/O, and network usage on your application droplets.
- Load Balancer Health Checks: Track the number of unhealthy backend servers reported by the load balancer.
- Application Performance Metrics: Monitor response times, error rates, and throughput of your Shopify application.
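Replication lag, one of the metrics listed above, can be estimated from the master’s `INFO replication` output by comparing the master’s replication offset with the offset each replica has acknowledged. The sketch below parses that text and reports the gap in bytes. The function names and the sample output are illustrative, the IPs are placeholders, and a real reply contains more fields than shown.

```python
import re


def parse_info(raw: str) -> dict:
    """Parse 'key:value' lines from INFO output, skipping section headers."""
    info = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def replica_lag_bytes(info: dict) -> dict:
    """Map each replica (ip:port) to how many bytes it trails the master by."""
    master_offset = int(info["master_repl_offset"])
    lags = {}
    for key, value in info.items():
        if re.fullmatch(r"slave\d+", key):
            fields = dict(item.split("=", 1) for item in value.split(","))
            replica = f'{fields["ip"]}:{fields["port"]}'
            lags[replica] = master_offset - int(fields["offset"])
    return lags


# Abbreviated, illustrative output of INFO replication on the master
SAMPLE_INFO = """# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.2,port=6379,state=online,offset=1000,lag=0
slave1:ip=10.0.0.3,port=6379,state=online,offset=940,lag=1
master_repl_offset:1000
"""

print(replica_lag_bytes(parse_info(SAMPLE_INFO)))
```

A monitoring job could alert when any value stays above a threshold you choose for your workload.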
Tools and Integrations
DigitalOcean provides basic monitoring for droplets and load balancers. For more advanced alerting, consider integrating with:
- Prometheus & Grafana: Deploy Prometheus to scrape metrics from your Redis instances (using a Redis exporter), application servers (Node Exporter), and potentially Sentinel. Grafana can then be used to visualize these metrics and set up alerts based on thresholds.
- Alertmanager: Integrate Alertmanager with Prometheus to route alerts to various notification channels like Slack, PagerDuty, or email.
- DigitalOcean Alerts: Configure basic alerts directly within DigitalOcean for droplet resource utilization (CPU, RAM, Disk) and load balancer health.
- External Monitoring Services: Services like Datadog, New Relic, or UptimeRobot can provide external checks and synthetic monitoring for your application’s availability.
For Redis Sentinel, you can leverage its built-in notification script feature. Create a script that Sentinel executes when it generates a warning-level event (e.g., an instance being marked as subjectively or objectively down). Sentinel passes the event type and an event description to the script, which can then trigger custom alerts or actions.
#!/bin/bash
# Example notification_script.sh (on each Sentinel server)
# Sentinel calls notification scripts with two arguments:
# $1: Event type (e.g., "+sdown", "+odown")
# $2: Event description (e.g., "master mymaster <ip> <port>")
EVENT_TYPE=$1
EVENT_DESCRIPTION=$2
MESSAGE="Redis Sentinel Alert: event '$EVENT_TYPE' occurred: $EVENT_DESCRIPTION"
# Send notification (e.g., to Slack via webhook, PagerDuty API, or email)
echo "$MESSAGE"
# Example: curl -X POST -H 'Content-type: application/json' --data '{"text":"'"$MESSAGE"'"}' YOUR_SLACK_WEBHOOK_URL
exit 0
Ensure this script is executable (chmod +x /path/to/notification_script.sh) by the user running Sentinel and configured in your sentinel.conf file using sentinel notification-script mymaster /path/to/notification_script.sh.
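Sentinel accepts any executable as a notification script, so it could also be written in Python. A possible advantage is that the webhook payload can be built with a JSON encoder rather than shell quoting, which avoids broken payloads when the event description contains quotes. The sketch below is a hypothetical `notify.py` that builds a Slack-style `{"text": ...}` payload from the two arguments Sentinel passes (event type and event description) and prints it. Sending it to a webhook is deliberately left out.

```python
#!/usr/bin/env python3
import json
import sys


def build_alert(event_type: str, description: str) -> str:
    """Build a JSON payload describing a Sentinel event."""
    text = f"Redis Sentinel event {event_type}: {description}"
    return json.dumps({"text": text})


if __name__ == "__main__" and len(sys.argv) == 3:
    # Sentinel invokes the script as: notify.py <event-type> <event-description>
    print(build_alert(sys.argv[1], sys.argv[2]))
```

You can try it by hand with `./notify.py +sdown "master mymaster 10.0.0.1 6379"`, where the IP is a placeholder.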