Disaster Recovery 101: Architecting Auto-Failover for Redis and C Application Deployments on Linode
Establishing a High-Availability Redis Cluster with Sentinel
For critical applications relying on Redis for caching or session management, a single instance is a single point of failure. Redis Sentinel provides automatic failover, minimizing downtime when the master fails. This section details the setup on Linode of one Redis master, two replicas, and three Sentinel processes.
We’ll assume you have six Linode instances provisioned (three for Redis, three for Sentinel), each with a static private IP address. The addresses below are examples:
- Master Node: 192.168.1.10 (redis-master)
- Replica Node 1: 192.168.1.11 (redis-replica-1)
- Replica Node 2: 192.168.1.12 (redis-replica-2)
- Sentinel Node 1: 192.168.1.20 (sentinel-1)
- Sentinel Node 2: 192.168.1.21 (sentinel-2)
- Sentinel Node 3: 192.168.1.22 (sentinel-3)
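It’s crucial to enable persistence (RDB or AOF) for data recovery and to run Redis as a managed background service. Ensure Redis is installed on the three Redis nodes. If you use your distribution’s packaged systemd unit, keep daemonize and pidfile consistent with that unit (inspect it with systemctl cat redis-server); a mismatch can cause the unit to time out or fail to start.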
Configuring Redis Instances
On each Redis node (redis-master, redis-replica-1, redis-replica-2), edit the Redis configuration file (typically /etc/redis/redis.conf).
Master Node Configuration (redis-master)
Ensure the following settings are present or modified:
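# /etc/redis/redis.conf on redis-master
port 6379
daemonize yes
pidfile /var/run/redis_6379.pid
logfile /var/log/redis/redis-server.log
# Bind to 0.0.0.0 or, preferably, the node's private IP. redis.conf does not
# support trailing comments, so keep comments on their own lines.
# Never expose port 6379 to the public internet; firewall it to your private network.
bind 0.0.0.0
appendonly yes
# The replica-* settings below only take effect if Sentinel later demotes this
# node to a replica; keeping them identical on every node avoids surprises.
# repl-backlog-* sizes the backlog this master keeps for partial resyncs.
# All values shown are the Redis defaults.
# replica-serve-stale-data yes
# replica-read-only yes
# repl-disable-tcp-nodelay no
# repl-backlog-size 1mb
# repl-backlog-ttl 3600
# replica-priority 100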
Replica Node Configuration (redis-replica-1, redis-replica-2)
On each replica node, add the following lines, pointing to the master’s IP address:
# /etc/redis/redis.conf on redis-replica-1 and redis-replica-2
port 6379
daemonize yes
pidfile /var/run/redis_6379.pid
logfile /var/log/redis/redis-server.log
# Or the node's private IP (no trailing comments on directive lines)
bind 0.0.0.0
appendonly yes
replica-serve-stale-data yes
replica-read-only yes
repl-disable-tcp-nodelay no
repl-backlog-size 1mb
repl-backlog-ttl 3600
# Default is 100. A lower non-zero value makes Sentinel prefer this replica
# when promoting; 0 means it is never promoted. (slave-priority is the legacy name.)
replica-priority 100
# Replication settings: point to your master's IP and port
replicaof 192.168.1.10 6379
After configuring, restart Redis on all three nodes:
sudo systemctl restart redis-server
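After the restart, it is worth confirming that both replicas are attached. Running redis-cli -h 192.168.1.10 INFO replication prints the master’s replication section; the minimal sketch below parses that text and reports each replica’s state and how many bytes of the replication stream it has not yet acknowledged (master_repl_offset minus the replica’s offset). The helper names and the SAMPLE text are illustrative assumptions, not output captured from a real deployment.

```python
"""Check replication health from `redis-cli INFO replication` run on the master.

Minimal sketch: parse_info() and replication_status() are hypothetical helpers,
and SAMPLE is illustrative output, not captured from a real deployment.
"""


def parse_info(text: str) -> dict:
    """Turn INFO output ("key:value" lines, "# Section" headers) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        info[key] = value
    return info


def replication_status(text: str):
    """Return (role, replicas); each replica carries its state and lag in bytes."""
    info = parse_info(text)
    master_offset = int(info.get("master_repl_offset", "0"))
    replicas = []
    for key, value in info.items():
        # Per-replica lines are named slave0, slave1, ...
        if key.startswith("slave") and key[len("slave"):].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            replicas.append({
                "addr": f"{fields['ip']}:{fields['port']}",
                "state": fields["state"],
                # Bytes of the replication stream the replica has not yet acknowledged.
                "byte_lag": master_offset - int(fields["offset"]),
            })
    return info.get("role"), replicas


SAMPLE = """# Replication
role:master
connected_slaves:2
slave0:ip=192.168.1.11,port=6379,state=online,offset=5000,lag=0
slave1:ip=192.168.1.12,port=6379,state=online,offset=4800,lag=1
master_repl_offset:5000
"""

if __name__ == "__main__":
    role, replicas = replication_status(SAMPLE)
    print(f"role={role}")
    for r in replicas:
        print(f"{r['addr']} state={r['state']} byte_lag={r['byte_lag']}")
```

The same parsing is a reasonable starting point for the replication-lag check in the monitoring list: a byte lag that keeps growing, or a state other than online, deserves an alert.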
Setting Up Redis Sentinel
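Redis Sentinel is a separate process (redis-sentinel, or equivalently redis-server --sentinel) that monitors Redis instances and performs automatic failover. Install Redis on the three Sentinel nodes (sentinel-1, sentinel-2, sentinel-3); on Debian and Ubuntu, the redis-sentinel package provides the Sentinel binary and its systemd unit (sudo apt install redis-sentinel).
Create a Sentinel configuration file, e.g., /etc/redis/sentinel.conf, on each Sentinel node. Sentinel rewrites this file at runtime to persist its state (current master, known replicas and other Sentinels), so it must be writable by the user Sentinel runs as.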
# /etc/redis/sentinel.conf on sentinel-1, sentinel-2, sentinel-3
port 26379
daemonize yes
pidfile /var/run/redis-sentinel.pid
logfile /var/log/redis/redis-sentinel.log
# Or the node's private IP (no trailing comments on directive lines)
bind 0.0.0.0

# Monitor the master Redis instance.
# Arguments: master name, IP, port, and quorum (the number of Sentinels that
# must agree the master is unreachable before it is flagged as objectively down).
# A quorum of 2 suits 3 Sentinels; the failover itself still requires a majority
# of Sentinels (2 of 3) to elect a leader.
sentinel monitor mymaster 192.168.1.10 6379 2

# Milliseconds the master may go without a valid reply before a Sentinel
# considers it subjectively down.
sentinel down-after-milliseconds mymaster 5000

# Number of replicas reconfigured to sync from the new master at the same time
# after a failover; 1 limits how many replicas are busy resyncing at once.
sentinel parallel-syncs mymaster 1

# Failover timeout in milliseconds. Among other things it bounds how long a
# failover may take before it is aborted, how long Sentinel waits for all
# replicas to be reconfigured, and (doubled) how long it waits before
# retrying a failover of the same master.
sentinel failover-timeout mymaster 10000

# Replica preference is not set here: use replica-priority in each replica's
# redis.conf (lower is preferred, 0 = never promote).
# Sentinel discovers replicas automatically from the master; any
# "sentinel known-replica" lines are written by Sentinel itself when it
# rewrites this file, so you do not need to add them.

# Optional: authentication
# In this file, if the Redis instances require a password:
# sentinel auth-pass mymaster your_redis_password
# In redis.conf on every Redis node (not in this file):
# requirepass your_redis_password
# masterauth your_redis_password
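The quorum in the sentinel monitor line only governs failure detection. To actually fail over, one Sentinel must be elected leader, which takes votes from a majority of all Sentinels, and, per the Sentinel documentation, from at least quorum Sentinels when the quorum is larger than a majority. The sketch below models those two rules so you can see what a given topology tolerates; can_fail_over is a hypothetical helper, not part of Redis, and it ignores timing details such as failover-timeout.

```python
"""Model when a Sentinel deployment can detect a master failure and fail over.

Simplified sketch of the documented rules (quorum for detection, majority of all
Sentinels for leader election); can_fail_over() is a hypothetical helper.
"""


def can_fail_over(total_sentinels: int, quorum: int, agreeing: int) -> bool:
    """agreeing = Sentinels that are up, reach each other, and see the master as down."""
    majority = total_sentinels // 2 + 1
    objectively_down = agreeing >= quorum               # failure detection (ODOWN)
    leader_elected = agreeing >= max(majority, quorum)  # votes needed to authorize failover
    return objectively_down and leader_elected


if __name__ == "__main__":
    for total, quorum in [(3, 2), (3, 1), (2, 1)]:
        for agreeing in range(total, 0, -1):
            verdict = "fails over" if can_fail_over(total, quorum, agreeing) else "no failover"
            print(f"{total} Sentinels, quorum {quorum}, {agreeing} agreeing: {verdict}")
```

The (2, 1) case shows why three Sentinels is the practical minimum: with two, losing one leaves no majority, so a failure can be detected but never acted on.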
Start the Sentinel service on each Sentinel node:
sudo systemctl start redis-sentinel
sudo systemctl enable redis-sentinel
Verify Sentinel status:
redis-cli -p 26379 INFO Sentinel
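You should see output describing the monitored master, including a line similar to master0:name=mymaster,status=ok,address=192.168.1.10:6379,slaves=2,sentinels=3. The Sentinels discover the replicas and each other automatically through the master; no leader is elected at this stage. A leader is elected only when a failover is needed, and it requires votes from a majority of the Sentinels. To test failover, stop the Redis master process: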
# On redis-master
sudo systemctl stop redis-server
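Monitor the Sentinel logs (/var/log/redis/redis-sentinel.log) on the Sentinel nodes. About down-after-milliseconds (5 seconds here) after the master stops responding, the Sentinels flag it as down (+sdown, then +odown once the quorum agrees), elect a leader, and promote one of the replicas (+switch-master). The leader then reconfigures the remaining replica to replicate from the new master, and when the old master comes back online Sentinel turns it into a replica as well. Note that Sentinel is not a proxy and does not redirect client traffic: your application must ask the Sentinels for the current master address (for example with redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster) or use a Sentinel-aware client library, as discussed below.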
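To script the check before and after the failover test, the output of redis-cli -p 26379 INFO Sentinel can be parsed per node. The sketch below extracts each masterN line and flags a status other than ok or fewer replicas or Sentinels than expected; after a successful failover, every Sentinel should report the same new address. The helper names, the expected counts (2 replicas, 3 Sentinels, taken from this setup), and the SAMPLE text are illustrative assumptions.

```python
"""Parse `redis-cli -p 26379 INFO Sentinel` and flag an unhealthy master entry.

Sketch: parse_sentinel_info() and problems() are hypothetical helpers; SAMPLE
is illustrative output from after a failover, not captured from a real system.
"""


def parse_sentinel_info(text: str) -> dict:
    """Map each monitored master name to its status, address and peer counts."""
    masters = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Master entries look like: master0:name=...,status=...,address=host:port,...
        if not sep or not key.startswith("master") or not key[len("master"):].isdigit():
            continue
        fields = dict(item.split("=", 1) for item in value.split(","))
        masters[fields["name"]] = {
            "status": fields["status"],    # ok, sdown or odown
            "address": fields["address"],  # current master as this Sentinel sees it
            "replicas": int(fields["slaves"]),
            "sentinels": int(fields["sentinels"]),
        }
    return masters


def problems(masters: dict, name="mymaster", replicas=2, sentinels=3) -> list:
    """Return human-readable issues; an empty list means the entry looks healthy."""
    entry = masters.get(name)
    if entry is None:
        return [f"{name} is not monitored by this Sentinel"]
    issues = []
    if entry["status"] != "ok":
        issues.append(f"master status is {entry['status']}")
    if entry["replicas"] < replicas:
        issues.append(f"only {entry['replicas']} replica(s) known")
    if entry["sentinels"] < sentinels:
        issues.append(f"only {entry['sentinels']} Sentinel(s) known")
    return issues


SAMPLE = """# Sentinel
sentinel_masters:1
sentinel_tilt:0
master0:name=mymaster,status=ok,address=192.168.1.11:6379,slaves=2,sentinels=3
"""

if __name__ == "__main__":
    masters = parse_sentinel_info(SAMPLE)
    print(masters["mymaster"]["address"], problems(masters) or "healthy")
```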
Automating C Application Failover with Systemd and HAProxy
For stateless C applications that need high availability, we can leverage systemd for process management and automatic restarts, combined with HAProxy as a load balancer and health checker. This setup assumes your C application is designed to be stateless or can manage its state externally (e.g., via Redis, as configured above).
We’ll deploy two instances of the C application on separate Linode instances (app-1 at 192.168.1.30 and app-2 at 192.168.1.31) and use a third Linode instance (lb-1) for HAProxy. The application will listen on a specific port (e.g., 8080).
Application Deployment and Systemd Service
On each application node (app-1, app-2), ensure your compiled C application binary is in a standard location (e.g., /usr/local/bin/my_c_app). Create a systemd service file to manage the application.
# /etc/systemd/system/my_c_app.service on app-1 and app-2
[Unit]
Description=My C Application Service
After=network.target

[Service]
ExecStart=/usr/local/bin/my_c_app --port 8080 --config /etc/my_c_app/config.conf
Restart=always
RestartSec=5
User=my_app_user
Group=my_app_user
WorkingDirectory=/opt/my_c_app
Environment="MY_APP_ENV=production"

[Install]
WantedBy=multi-user.target
Create the user and group, and set up the application directory:
sudo groupadd my_app_user
sudo useradd -r -g my_app_user -s /sbin/nologin my_app_user
sudo mkdir -p /opt/my_c_app
sudo chown -R my_app_user:my_app_user /opt/my_c_app
sudo mkdir -p /etc/my_c_app
sudo chown -R my_app_user:my_app_user /etc/my_c_app
Place your compiled C application binary and any necessary configuration files in the respective directories. Then, enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable my_c_app
sudo systemctl start my_c_app
Check the status:
sudo systemctl status my_c_app
Configuring HAProxy for Load Balancing and Health Checks
Install HAProxy on the load balancer node (lb-1).
sudo apt update && sudo apt install haproxy -y
Edit the HAProxy configuration file (/etc/haproxy/haproxy.cfg).
# /etc/haproxy/haproxy.cfg on lb-1
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend http_frontend
    bind *:80
    mode http
    default_backend http_backend

backend http_backend
    mode http
    balance roundrobin
    option httpchk GET /healthz # Assuming your C app has a /healthz endpoint
    http-check expect status 200
    server app1 192.168.1.30:8080 check # IP of app-1
    server app2 192.168.1.31:8080 check # IP of app-2

# If your C app is not HTTP-based, use TCP mode
# frontend tcp_frontend
#     bind *:8080
#     mode tcp
#     default_backend tcp_backend
# backend tcp_backend
#     mode tcp
#     balance roundrobin
#     option tcp-check # Basic TCP connection check
#     server app1 192.168.1.30:8080 check port 8080 # IP of app-1
#     server app2 192.168.1.31:8080 check port 8080 # IP of app-2
Note: If your C application doesn’t expose an HTTP health check endpoint, you can use option tcp-check for basic TCP connectivity checks. Ensure your C application is configured to listen on the specified port (e.g., 8080).
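Before the real C application exposes /healthz, you can exercise the HAProxy health checks with a throwaway stand-in. The sketch below is a hypothetical test stub built on Python’s standard http.server, not the C application: it answers GET /healthz with 200 (satisfying http-check expect status 200) and every other path with 404. Stop my_c_app, run the stub on app-1 and app-2 on port 8080, then stop it on one node to watch HAProxy mark that server DOWN.

```python
"""Hypothetical stand-in for the C application's /healthz endpoint.

Use it only to test the HAProxy configuration; it is not the real application.
"""
import sys
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 200 on /healthz satisfies "http-check expect status 200"; 404 elsewhere.
        status, body = (200, b"ok\n") if self.path == "/healthz" else (404, b"not found\n")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging; HAProxy probes every few seconds


def make_server(port: int) -> ThreadingHTTPServer:
    # Listen on all interfaces so lb-1 can reach the stub over the network.
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} PORT   (e.g. 8080, matching the HAProxy server lines)")
    else:
        make_server(int(sys.argv[1])).serve_forever()
```

For example, save it under any name such as healthz_stub.py and run python3 healthz_stub.py 8080 on each application node.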
Enable and start HAProxy:
sudo systemctl enable haproxy
sudo systemctl start haproxy
To test the failover, stop the C application service on one of the application nodes:
# On app-1
sudo systemctl stop my_c_app
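HAProxy will detect that the instance is unhealthy (through the HTTP or TCP check) and stop sending it traffic. With the default check settings the configuration above relies on (a check every 2 seconds, 3 consecutive failures to mark a server DOWN, 2 consecutive successes to mark it UP again), detection takes up to about 6 seconds, and requests already in flight to app-1 may fail. Traffic is then routed to the healthy instance (app-2). When you restart the service on app-1, HAProxy re-adds it to the pool once it passes the required number of health checks.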
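The timing above follows from three per-server check parameters in HAProxy: inter (interval between checks, default 2s), fall (consecutive failures before a server is considered DOWN, default 3) and rise (consecutive successes before it is considered UP again, default 2). The sketch below is a simplified model of those documented semantics, not HAProxy’s own code; BackendServer is a hypothetical name. It shows why a single passing check in the middle of failures keeps a server in rotation.

```python
"""Simplified model of HAProxy's documented rise/fall health-check semantics."""
from dataclasses import dataclass


@dataclass
class BackendServer:
    rise: int = 2    # consecutive passes needed to mark a DOWN server UP
    fall: int = 3    # consecutive failures needed to mark an UP server DOWN
    up: bool = True
    streak: int = 0  # consecutive results contradicting the current state

    def record(self, passed: bool) -> bool:
        """Apply one health-check result and return whether the server is UP."""
        if passed == self.up:
            self.streak = 0  # result agrees with the current state
        else:
            self.streak += 1
            if self.streak >= (self.fall if self.up else self.rise):
                self.up = not self.up
                self.streak = 0
        return self.up


if __name__ == "__main__":
    app1 = BackendServer()
    # app-1 is stopped after the first check, then restarted after the fourth.
    for i, ok in enumerate([True, False, False, False, True, True]):
        state = "UP" if app1.record(ok) else "DOWN"
        print(f"t={i * 2:>2}s check={'pass' if ok else 'fail'} -> {state}")
```

Tuning is a trade-off: a smaller inter or fall shortens detection but makes a briefly slow backend more likely to be ejected. To change them, append, for example, inter 1s fall 2 rise 2 to the server lines.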
Integrating Redis and C Application Failover Strategies
The most robust disaster recovery strategy involves combining these two approaches. Your C application instances, managed by systemd and load-balanced by HAProxy, should connect to the highly available Redis cluster managed by Sentinel.
When configuring your C application (or its connection logic), give it the addresses of all three Sentinels rather than a fixed master IP. Many Redis client libraries support Sentinel discovery. If your application’s client library doesn’t support Sentinel directly, you can implement a simple discovery mechanism:
Client-Side Redis Sentinel Discovery (Conceptual Python Example)
This Python snippet illustrates how a client might discover the current Redis master via Sentinel. Your C application would need similar logic, either through a C Redis client library that supports Sentinel or by implementing this logic in a proxy layer.
import redis
from redis.sentinel import Sentinel, MasterNotFoundError

# List of Sentinel nodes
SENTINELS = [('192.168.1.20', 26379), ('192.168.1.21', 26379), ('192.168.1.22', 26379)]
MASTER_NAME = 'mymaster'


def get_redis_master():
    try:
        # Initialize a Sentinel client
        sentinel = Sentinel(SENTINELS, socket_timeout=0.5)
        # Ask the Sentinels which node is currently the master: (host, port)
        host, port = sentinel.discover_master(MASTER_NAME)
        # Test the connection by pinging the master through a Sentinel-managed client
        master = sentinel.master_for(MASTER_NAME, socket_timeout=0.5)
        master.ping()
        # Return connection details
        return {'host': host, 'port': port}
    except MasterNotFoundError as e:
        # Raised when no Sentinel could report a master for MASTER_NAME
        print(f"No Sentinel reported a master for {MASTER_NAME}: {e}")
        return None
    except (redis.exceptions.ConnectionError, redis.exceptions.TimeoutError) as e:
        print(f"Error connecting to Redis Sentinel or Master: {e}")
        return None
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        return None


if __name__ == "__main__":
    redis_info = get_redis_master()
    if redis_info:
        print(f"Current Redis Master: {redis_info['host']}:{redis_info['port']}")
        # In a real application, prefer the Sentinel-managed client itself,
        # which re-discovers the master after a failover, over a fixed host/port:
        # r = Sentinel(SENTINELS, socket_timeout=0.5).master_for(MASTER_NAME, db=0)
        # r.set('mykey', 'myvalue')
    else:
        print("Failed to get Redis master information.")
In a C application, configure the list of Sentinel addresses and the master name (mymaster) rather than a fixed master host and port. When a connection error occurs, or a write returns a READONLY error because the node has been demoted to a replica, re-query the Sentinels for the current master and reconnect. This logic should be robust: try every Sentinel with a short timeout so that temporary network glitches and failovers in progress do not take the application down.
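Without a Sentinel-aware library, the discovery step is small enough to do at the protocol level, which maps directly onto plain sockets in C. The sketch below sends SENTINEL get-master-addr-by-name mymaster as a RESP array, parses the two-element reply (or the null reply a Sentinel returns for an unknown master name), and falls through to the next Sentinel on any connection error or timeout. The helper names are hypothetical, and the addresses reuse this article’s example IPs.

```python
"""Ask Redis Sentinels for the current master over raw RESP (no client library).

Minimal sketch of the steps a C application would port to plain sockets;
encode_command(), parse_master_addr() and discover_master() are hypothetical.
"""
import socket

SENTINELS = [("192.168.1.20", 26379), ("192.168.1.21", 26379), ("192.168.1.22", 26379)]
MASTER_NAME = "mymaster"


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)


def parse_master_addr(reply: bytes):
    """Parse a SENTINEL get-master-addr-by-name reply into (host, port), or None."""
    lines = reply.split(b"\r\n")
    if lines[0] == b"*-1":
        return None  # null array: this Sentinel does not know the master name
    if lines[0] != b"*2":
        raise ValueError(f"unexpected reply: {reply!r}")  # e.g. an -ERR reply
    # Layout: *2, $<len>, <host>, $<len>, <port>
    return lines[2].decode(), int(lines[4].decode())


def _recv_reply(sock: socket.socket) -> bytes:
    """Read until the reply is complete (null array, error line, or 2-element array)."""
    buf = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("Sentinel closed the connection mid-reply")
        buf += chunk
        if buf.startswith((b"*-1", b"-")) and buf.endswith(b"\r\n"):
            return buf
        if buf.count(b"\r\n") >= 5:
            return buf


def discover_master(sentinels=SENTINELS, name=MASTER_NAME, timeout=0.5):
    """Try each Sentinel in turn; return the first (host, port) reported, else None."""
    request = encode_command("SENTINEL", "get-master-addr-by-name", name)
    for addr in sentinels:
        try:
            with socket.create_connection(addr, timeout=timeout) as sock:
                sock.sendall(request)
                master = parse_master_addr(_recv_reply(sock))
        except (OSError, ValueError):
            continue  # Sentinel unreachable, timed out, or replied unexpectedly
        if master is not None:
            return master
    return None


if __name__ == "__main__":
    print(discover_master() or "no Sentinel reported a master")
```

The Sentinel documentation also recommends that clients confirm the returned node really is the master (for example with the ROLE command) before sending writes, since a Sentinel’s view can briefly lag during a failover.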
Monitoring and Alerting
Effective disaster recovery is incomplete without comprehensive monitoring. Implement checks for:
- Redis Sentinel health (number of masters down, number of sentinels available).
- Redis master and replica status (connected, replication lag).
- HAProxy backend health (number of available servers).
- Application-level metrics (request latency, error rates).
- System resource utilization (CPU, memory, disk I/O) on all nodes.
Tools like Prometheus with Alertmanager, Datadog, or Nagios can be integrated to provide real-time insights and trigger alerts for any anomalies, allowing for proactive intervention before a full-blown disaster occurs.
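For the HAProxy item, the stats socket configured in the global section (/run/haproxy/admin.sock) answers the show stat command with CSV, for example via echo "show stat" | sudo socat stdio /run/haproxy/admin.sock (socat must be installed). The sketch below counts UP servers per backend from that CSV, which is enough to drive a simple "fewer than 2 servers UP" alert. count_up_servers is a hypothetical helper, and SAMPLE keeps only a few of the many columns HAProxy actually emits.

```python
"""Count UP servers per HAProxy backend from `show stat` CSV output.

Sketch: count_up_servers() is a hypothetical helper, and SAMPLE keeps only a few
of the many columns HAProxy really emits (it is illustrative, not real output).
"""
import csv
import io
from collections import Counter


def count_up_servers(show_stat_csv: str) -> Counter:
    """Return {backend name: number of servers whose status starts with "UP"}."""
    text = show_stat_csv.lstrip()
    if text.startswith("# "):
        text = text[2:]  # the header row is prefixed with "# "
    up = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue  # aggregate rows, not individual servers
        # Transitional states such as "UP 1/3" still count as UP.
        up[row["pxname"]] += row["status"].startswith("UP")
    return up


SAMPLE = """# pxname,svname,status,check_status
http_frontend,FRONTEND,OPEN,
http_backend,app1,UP,L7OK
http_backend,app2,DOWN,L4CON
http_backend,BACKEND,UP,
"""

if __name__ == "__main__":
    up = count_up_servers(SAMPLE)
    if up["http_backend"] < 2:
        print(f"ALERT: http_backend has {up['http_backend']} of 2 servers UP")
```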