Disaster Recovery 101: Architecting Auto-Failovers for Redis and C++ Deployments on DigitalOcean
Establishing a Highly Available Redis Cluster on DigitalOcean
Achieving automated failover for Redis, a critical in-memory data store, requires a robust architecture that goes beyond simple replication. We’ll focus on a Redis Sentinel-based setup, which provides high availability by monitoring Redis instances and orchestrating automatic failovers when a master node becomes unavailable. This setup will be deployed on DigitalOcean Droplets, leveraging their infrastructure for reliability.
Our strategy involves deploying at least three Redis instances: one master and two or more replicas. We'll also deploy at least three Redis Sentinel instances. Sentinels are independent processes that monitor the master and its replicas and promote a replica if the master fails. A failover starts only once a configurable quorum of Sentinels agrees that the master is unreachable, and it proceeds only if a majority of all Sentinels authorizes it. This keeps an isolated minority of Sentinels from promoting a second master and causing a split-brain scenario.
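The interaction between quorum and majority is easy to get wrong, so here is a minimal sketch of the arithmetic in C++. The function names are hypothetical and not part of Redis or any client library; the rule encoded is that a failover needs both the configured quorum and a strict majority of all Sentinels.

```cpp
#include <cstddef>

// Smallest number of Sentinels that forms a strict majority of `total`.
constexpr std::size_t majority(std::size_t total) {
    return total / 2 + 1;
}

// True when a failover can proceed with `reachable` Sentinels still able
// to talk to each other: enough to reach the configured quorum (master
// flagged as down) and enough to form a majority (failover authorized).
constexpr bool can_fail_over(std::size_t total, std::size_t quorum,
                             std::size_t reachable) {
    return reachable >= quorum && reachable >= majority(total);
}
```

With three Sentinels and a quorum of 2, can_fail_over(3, 2, 2) is true, so the setup survives the loss of one Sentinel. With a quorum of 3, can_fail_over(3, 3, 2) is false: a single unreachable Sentinel blocks every failover.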
Redis and Sentinel Configuration
Let’s outline the essential configuration for Redis and Redis Sentinel. We’ll assume a basic setup where Redis instances are listening on port 6379 and Sentinels on port 26379. For production, consider adjusting these ports and implementing robust network security.
Redis Master Configuration (redis.conf)
On the designated master Droplet, the redis.conf file should be configured as follows:
# redis.conf for Master Node
port 6379
bind 0.0.0.0
daemonize yes
pidfile /var/run/redis_6379.pid
logfile /var/log/redis/redis-server.log
dir /var/lib/redis

# Persistence (choose one or none, depending on RPO)
# appendonly yes
# appendfilename "appendonly.aof"

# Replication (unused while this node is the master, but applied once a failover demotes it to a replica)
# replica-serve-stale-data yes
# replica-read-only yes
# repl-diskless-sync no
# repl-disable-tcp-nodelay no
# repl-backlog-size 1mb
# repl-backlog-ttl 3600

# Security (essential for production)
# requirepass your_strong_redis_password
# masterauth is needed once this node rejoins as a replica after a failover
# masterauth your_strong_redis_password
# Do not disable CONFIG with rename-command: Sentinel issues CONFIG REWRITE when it reconfigures a node.
Redis Replica Configuration (redis.conf)
On each replica Droplet, the redis.conf file needs to point to the master:
# redis.conf for Replica Node
port 6379
bind 0.0.0.0
daemonize yes
pidfile /var/run/redis_6379.pid
logfile /var/log/redis/redis-server.log
dir /var/lib/redis

# Replication
replicaof <master_ip_address> 6379
# If using authentication:
# masterauth your_strong_redis_password

# Security (essential for production)
# requirepass your_strong_redis_password
# Do not disable CONFIG with rename-command: Sentinel issues CONFIG REWRITE when it reconfigures a node.
Redis Sentinel Configuration (sentinel.conf)
On each Sentinel Droplet, the sentinel.conf file is crucial for monitoring and failover orchestration. Ensure you have at least three Sentinels running on separate Droplets for quorum.
# sentinel.conf
port 26379
daemonize yes
pidfile /var/run/redis-sentinel.pid
logfile /var/log/redis/redis-sentinel.log
dir /var/lib/redis

# Monitor the master Redis instance.
# The 'mymaster' name is arbitrary but must be consistent across all Sentinels.
# The last argument (2) is the quorum: the number of Sentinels that must agree
# the master is unreachable before a failover is attempted.
sentinel monitor mymaster <master_ip_address> 6379 2

# If using authentication:
# sentinel auth-pass mymaster your_strong_redis_password

# Optional tuning (values in milliseconds where applicable):
# down-after-milliseconds: how long the master must be unreachable before this Sentinel considers it down.
# sentinel down-after-milliseconds mymaster 5000
# failover-timeout: among other things, bounds how long a failover attempt may run before it is abandoned.
# sentinel failover-timeout mymaster 15000
# parallel-syncs: number of replicas reconfigured to sync from the new master at the same time.
# sentinel parallel-syncs mymaster 5
Important Notes:
- Replace <master_ip_address> with the actual private IP address of your master Redis Droplet.
- Ensure that the private IPs of your Redis instances and Sentinels are reachable from each other. Configure DigitalOcean's VPC networking or firewall rules accordingly.
- The quorum value is critical. It defines the number of Sentinels that must agree that the master is down before a failover is attempted. Carrying out the failover additionally requires authorization from a majority of all Sentinels, that is (N/2) + 1 with integer division, where N is the total number of Sentinels. With three Sentinels, a quorum of 2 tolerates the loss of one Sentinel, whereas a quorum of 3 means a single unreachable Sentinel blocks every failover.
- down-after-milliseconds is the timeout for considering a Redis instance unreachable. Tune this based on your network latency and Redis performance.
- If your Redis instances require authentication, ensure masterauth and sentinel auth-pass are configured with the same password as requirepass. Set masterauth on the master as well, because it rejoins as a replica after a failover.
Automating Deployment and Failover Orchestration
Manual configuration is error-prone and not scalable. We’ll leverage Ansible for automating the deployment of Redis and Sentinel, and for managing their configurations. For application-level failover detection and connection switching, we’ll use a C++ client library that can query Sentinels.
Ansible Playbook for Redis and Sentinel Deployment
This Ansible playbook assumes you have a DigitalOcean inventory file and SSH access configured. It will install Redis, configure it as master or replica, and set up Sentinels.
# playbook.yml
---
- hosts: redis_masters
  become: yes
  vars:
    # Must match a version string that apt actually offers for redis-server
    redis_version: "6.2.6"
    redis_conf_dir: "/etc/redis"
    redis_data_dir: "/var/lib/redis"
    redis_log_dir: "/var/log/redis"
    redis_password: "your_strong_redis_password" # Use Ansible Vault for production
  tasks:
    - name: Install Redis Server
      apt:
        name: redis-server={{ redis_version }}
        state: present
        update_cache: yes

    - name: Ensure Redis directories exist
      file:
        path: "{{ item }}"
        state: directory
        owner: redis
        group: redis
        mode: '0755'
      loop:
        - "{{ redis_conf_dir }}"
        - "{{ redis_data_dir }}"
        - "{{ redis_log_dir }}"

    - name: Configure Redis Master
      template:
        src: templates/redis.conf.j2
        dest: "{{ redis_conf_dir }}/redis.conf"
        owner: redis
        group: redis
        mode: '0644'
      notify: Restart Redis

    - name: Ensure Redis service is running and enabled
      systemd:
        name: redis-server
        state: started
        enabled: yes

  handlers:
    - name: Restart Redis
      systemd:
        name: redis-server
        state: restarted

- hosts: redis_replicas
  become: yes
  vars:
    redis_version: "6.2.6"
    redis_conf_dir: "/etc/redis"
    redis_data_dir: "/var/lib/redis"
    redis_log_dir: "/var/log/redis"
    redis_password: "your_strong_redis_password"
    # Assumes one master. ansible_default_ipv4 is the address on the default-route
    # interface, which on a Droplet is usually the public one; switch to the VPC
    # interface's fact if the nodes should talk over private addresses.
    master_ip: "{{ hostvars[groups['redis_masters'][0]]['ansible_default_ipv4']['address'] }}"
  tasks:
    - name: Install Redis Server
      apt:
        name: redis-server={{ redis_version }}
        state: present
        update_cache: yes

    - name: Ensure Redis directories exist
      file:
        path: "{{ item }}"
        state: directory
        owner: redis
        group: redis
        mode: '0755'
      loop:
        - "{{ redis_conf_dir }}"
        - "{{ redis_data_dir }}"
        - "{{ redis_log_dir }}"

    - name: Configure Redis Replica
      template:
        src: templates/redis_replica.conf.j2
        dest: "{{ redis_conf_dir }}/redis.conf"
        owner: redis
        group: redis
        mode: '0644'
      notify: Restart Redis

    - name: Ensure Redis service is running and enabled
      systemd:
        name: redis-server
        state: started
        enabled: yes

  handlers:
    - name: Restart Redis
      systemd:
        name: redis-server
        state: restarted

- hosts: redis_sentinels
  become: yes
  vars:
    redis_version: "6.2.6"
    redis_conf_dir: "/etc/redis"
    redis_data_dir: "/var/lib/redis"
    redis_log_dir: "/var/log/redis"
    redis_password: "your_strong_redis_password"
    master_ip: "{{ hostvars[groups['redis_masters'][0]]['ansible_default_ipv4']['address'] }}"
    sentinel_quorum: 2 # With three Sentinels, 2 still allows failover when one Sentinel is down
  tasks:
    # On Debian/Ubuntu the redis-sentinel service ships in its own package
    - name: Install Redis Sentinel
      apt:
        name: redis-sentinel={{ redis_version }}
        state: present
        update_cache: yes

    - name: Ensure Redis directories exist for Sentinels
      file:
        path: "{{ item }}"
        state: directory
        owner: redis
        group: redis
        mode: '0755'
      loop:
        - "{{ redis_conf_dir }}"
        - "{{ redis_data_dir }}"
        - "{{ redis_log_dir }}"

    # Sentinel rewrites sentinel.conf at runtime, so the file must stay writable
    # by the redis user; re-running this task resets the state Sentinel saved there.
    - name: Configure Redis Sentinel
      template:
        src: templates/sentinel.conf.j2
        dest: "{{ redis_conf_dir }}/sentinel.conf"
        owner: redis
        group: redis
        mode: '0644'
      notify: Restart Redis Sentinel

    - name: Ensure Redis Sentinel service is running and enabled
      systemd:
        name: redis-sentinel
        state: started
        enabled: yes

  handlers:
    - name: Restart Redis Sentinel
      systemd:
        name: redis-sentinel
        state: restarted
Ansible Template Files
Create the following template files in a templates/ directory alongside your playbook.
templates/redis.conf.j2 (Master)
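# redis.conf for Master Node
port 6379
bind 0.0.0.0
daemonize yes
pidfile /var/run/redis_6379.pid
logfile {{ redis_log_dir }}/redis-server.log
dir {{ redis_data_dir }}
appendonly yes
appendfilename "appendonly.aof"
requirepass {{ redis_password }}
# Needed once a failover demotes this node to a replica of the new master
masterauth {{ redis_password }}
# CONFIG stays enabled: Sentinel calls CONFIG REWRITE when it reconfigures a node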
templates/redis_replica.conf.j2 (Replica)
# redis.conf for Replica Node
port 6379
bind 0.0.0.0
daemonize yes
pidfile /var/run/redis_6379.pid
logfile {{ redis_log_dir }}/redis-server.log
dir {{ redis_data_dir }}
replicaof {{ master_ip }} 6379
masterauth {{ redis_password }}
appendonly yes
appendfilename "appendonly.aof"
requirepass {{ redis_password }}
# CONFIG stays enabled: Sentinel calls CONFIG REWRITE when it reconfigures a node
templates/sentinel.conf.j2 (Sentinel)
# sentinel.conf
port 26379
daemonize yes
pidfile /var/run/redis-sentinel.pid
logfile {{ redis_log_dir }}/redis-sentinel.log
dir {{ redis_data_dir }}
sentinel monitor mymaster {{ master_ip }} 6379 {{ sentinel_quorum }}
sentinel auth-pass mymaster {{ redis_password }}
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 15000
sentinel parallel-syncs mymaster 5
To run this playbook, you would typically have an inventory file like this:
# inventory.ini
[redis_masters]
master_droplet_ip ansible_user=root

[redis_replicas]
replica1_droplet_ip ansible_user=root
replica2_droplet_ip ansible_user=root

[redis_sentinels]
sentinel1_droplet_ip ansible_user=root
sentinel2_droplet_ip ansible_user=root
sentinel3_droplet_ip ansible_user=root
And then execute:
ansible-playbook -i inventory.ini playbook.yml
C++ Client Integration for Auto-Failover
Your C++ application needs to be aware of the Redis cluster topology and be able to switch to a new master automatically when a failover occurs. This is achieved by using a Redis client library that supports Sentinel discovery.
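To make the discovery step concrete, here is a minimal, library-independent sketch of what a Sentinel-aware client does before it opens a connection: it walks its list of Sentinels and takes the first answer it gets. The Endpoint type, the AskSentinel callback, and resolve_master are hypothetical names used only for illustration; in a real client the callback would send SENTINEL get-master-addr-by-name over the network, and the client would typically also confirm the node's role after connecting.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

struct Endpoint {
    std::string host;
    int port;
};

// Hypothetical callback: asks one Sentinel for the current master address
// and returns std::nullopt if that Sentinel is unreachable or does not
// know the master.
using AskSentinel = std::function<std::optional<Endpoint>(const Endpoint&)>;

// Walk the Sentinel list in order and return the first answer received.
std::optional<Endpoint> resolve_master(const std::vector<Endpoint>& sentinels,
                                       const AskSentinel& ask) {
    for (const auto& sentinel : sentinels) {
        if (auto master = ask(sentinel)) {
            return master;
        }
    }
    return std::nullopt;  // no Sentinel produced an answer
}
```

Because every Sentinel can answer the question, the client keeps working as long as at least one of the configured Sentinels is reachable.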
We’ll use the popular redis-plus-plus library, which offers excellent Sentinel support. First, ensure you have it installed:
# Install redis-plus-plus (example using version 1.3.0)
# redis-plus-plus depends on hiredis, which must be installed first.
# You might need to build from source or use a package manager if available.
# For building from source:
git clone https://github.com/sewenew/redis-plus-plus.git
cd redis-plus-plus
# Or your desired version; run `git tag` to list the available release tags
git checkout 1.3.0
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local
make
sudo make install
C++ Code Example for Sentinel-Aware Redis Client
This C++ code demonstrates how to connect to Redis via Sentinels. The library handles the discovery of the current master and automatically reconnects or switches to a new master if a failover is detected.
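#include <chrono>
#include <iostream>
#include <memory>
#include <string>
#include <thread>
#include <sw/redis++/redis++.h>

int main() {
    // Sentinel endpoints as host/port pairs
    sw::redis::SentinelOptions sentinel_opts;
    sentinel_opts.nodes = {
        {"sentinel1_private_ip", 26379},
        {"sentinel2_private_ip", 26379},
        {"sentinel3_private_ip", 26379}
    };
    sentinel_opts.connect_timeout = std::chrono::milliseconds(200);
    sentinel_opts.socket_timeout = std::chrono::milliseconds(200);

    // Options for the connection to the Redis master itself.
    // Host and port are not set here: they come from the Sentinels.
    sw::redis::ConnectionOptions connection_opts;
    connection_opts.password = "your_strong_redis_password"; // If authentication is enabled
    connection_opts.connect_timeout = std::chrono::milliseconds(500);
    connection_opts.socket_timeout = std::chrono::milliseconds(500);

    // The name of the Redis master set in sentinel.conf
    const std::string master_name = "mymaster";

    try {
        // Create a Redis client that resolves the master through the Sentinels
        auto sentinel = std::make_shared<sw::redis::Sentinel>(sentinel_opts);
        sw::redis::Redis redis(sentinel, master_name, sw::redis::Role::MASTER, connection_opts);

        // Test the connection and perform some operations
        redis.set("mykey", "myvalue");
        auto value = redis.get("mykey"); // optional: empty if the key does not exist
        if (value) {
            std::cout << "Successfully connected to Redis. Value for 'mykey': " << *value << std::endl;
        }

        // Example of a long-running operation or periodic check
        for (int i = 0; i < 10; ++i) {
            try {
                std::string timestamp = std::to_string(std::chrono::system_clock::now().time_since_epoch().count());
                redis.set("heartbeat", timestamp);
                std::cout << "Heartbeat set: " << timestamp << std::endl;
            } catch (const sw::redis::Error& e) {
                // Expected while a failover is in progress; try again on the next tick
                std::cerr << "Heartbeat failed: " << e.what() << std::endl;
            }
            std::this_thread::sleep_for(std::chrono::seconds(5));
        }
    } catch (const sw::redis::Error& e) {
        std::cerr << "Redis error: " << e.what() << std::endl;
        return 1;
    } catch (const std::exception& e) {
        std::cerr << "General error: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}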
Compilation:
# Assuming redis-plus-plus and hiredis were installed in /usr/local
# Add -lssl -lcrypto only if the libraries were built with TLS support
g++ -std=c++17 your_app.cpp -o your_app -I/usr/local/include -L/usr/local/lib -lredis++ -lhiredis -pthread
Replace sentinel1_private_ip, sentinel2_private_ip, and sentinel3_private_ip with the actual private IP addresses of your Sentinel Droplets. When an sw::redis::Redis object is created with Sentinel endpoints, it queries them to find the current master. If the master changes due to a failover, commands on the broken connection fail with an exception, and the library asks the Sentinels for the master address again when it reconnects. Your application can continue to use the same Redis object, and subsequent operations will be directed to the new master.
Testing the Failover Mechanism
To validate the auto-failover setup, simulate a failure of the current Redis master. The most straightforward way is to stop the Redis server process on the master Droplet.
- Step 1: Identify the current master. You can do this by connecting to any Sentinel instance and running redis-cli -p 26379 SENTINEL master mymaster.
- Step 2: Stop the Redis master process. SSH into the master Droplet and run: sudo systemctl stop redis-server.
- Step 3: Observe the Sentinels. Monitor the Sentinel logs (/var/log/redis/redis-sentinel.log) on the Sentinel Droplets. You should see messages indicating that the master is down and that a failover is being initiated.
- Step 4: Verify the new master. After a short period (depending on your down-after-milliseconds and failover-timeout settings), run redis-cli -p 26379 SENTINEL master mymaster again from a Sentinel. It should now report a different IP address as the master.
- Step 5: Check your C++ application. Your application, if it was running during the failover, should automatically reconnect to the new master. Any operations that were in progress might have failed with a connection error, but subsequent operations should succeed. You can also restart your application to ensure it correctly discovers the new master on startup.
- Step 6: Restore the old master. Once the old master is back online, it will be automatically configured as a replica of the new master by the Sentinels. You can verify this by checking its configuration or by running redis-cli -p 26379 SENTINEL replicas mymaster from a Sentinel.
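To put a number on the downtime observed in Step 5, you can record the outcome of each heartbeat and compute the longest outage afterwards. The sketch below assumes your test harness collects such samples; the Probe struct and longest_outage function are hypothetical helpers, not part of any library.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

struct Probe {
    std::chrono::milliseconds at;  // time of the probe since test start
    bool ok;                       // whether the Redis command succeeded
};

// Longest span from the first failed probe of an outage to the first
// successful probe after it. Outages that never recover are not counted.
// Returns zero if no recovered outage was observed.
std::chrono::milliseconds longest_outage(const std::vector<Probe>& probes) {
    std::chrono::milliseconds longest{0};
    std::chrono::milliseconds outage_start{0};
    bool in_outage = false;
    for (const auto& probe : probes) {
        if (!probe.ok && !in_outage) {
            in_outage = true;
            outage_start = probe.at;  // first failed probe
        } else if (probe.ok && in_outage) {
            in_outage = false;
            longest = std::max(longest, probe.at - outage_start);
        }
    }
    return longest;
}
```

The resolution of the result is limited by the probe interval, so probe more often than the 5-second heartbeat if you need a tighter estimate.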
This automated failover process ensures minimal downtime for your Redis-dependent services. The combination of Redis Sentinel for high availability and Ansible for robust deployment provides a production-ready solution.