Disaster Recovery 101: Architecting Auto-Failovers for Redis and C++ Deployments on OVH

Redis Sentinel for High Availability

Automated failover for Redis relies on Redis Sentinel, a distributed system that monitors Redis instances, detects failures, and promotes a replica to master when the current master becomes unreachable. For a robust setup, we’ll deploy at least three Sentinel instances on separate hosts, ideally across different availability zones or even regions. With three Sentinels and a quorum of 2, the deployment tolerates the loss of one Sentinel. If two of the three fail, the remaining Sentinel cannot gather the majority needed to authorize a failover.

The core configuration for Redis Sentinel is managed in a separate configuration file, typically named sentinel.conf. Here’s a breakdown of essential directives:

Sentinel Configuration (`sentinel.conf`)

# Monitor the master named "mymaster" at 192.168.1.100:6379.
# The name is arbitrary but must be consistent across all Sentinels.
# The last argument (2) is the quorum: the number of Sentinels that must
# agree that the master is down before a failover is initiated. A quorum
# of 2 lets a 3-node Sentinel setup tolerate one Sentinel failure.
sentinel monitor mymaster 192.168.1.100 6379 2

# Time in milliseconds the master must be unreachable before a Sentinel
# considers it down.
sentinel down-after-milliseconds mymaster 5000

# The failover timeout. This is the maximum time in milliseconds that Sentinel
# will wait for a failover to complete.
sentinel failover-timeout mymaster 60000

# The parallel syncs parameter specifies the maximum number of replicas that
# can be reconfigured in parallel to replicate with the new master after a
# failover.
sentinel parallel-syncs mymaster 1

# If you are using Redis password authentication, you must configure it here.
# sentinel auth-pass mymaster YourRedisPassword

# Specify the port for Sentinel to listen on (default is 26379).
port 26379

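# Specify the bind address for Sentinel. 0.0.0.0 listens on every interface,
# so pair it with firewall rules or bind to the node's private address instead.
bind 0.0.0.0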

# Log file location.
logfile "/var/log/redis/sentinel.log"

# PID file location.
pidfile "/var/run/redis/sentinel.pid"

# Data directory for Sentinel (optional, but good practice for persistence).
dir "/var/lib/redis/sentinel"

The sentinel monitor directive is critical. mymaster is an arbitrary name for your Redis master. The quorum value (the final argument) dictates how many Sentinels must agree that the master is unreachable before it is marked as down. The failover itself must additionally be authorized by a majority of all Sentinels. For a highly available setup with three Sentinels, a quorum of 2 is standard.
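The interaction between quorum and majority can be checked with a little arithmetic. The following is a minimal sketch, not part of Redis itself; the function names are made up for illustration. It assumes the rule from the Sentinel documentation that a failover leader needs votes from a majority of all known Sentinels, and never fewer than the configured quorum.

```cpp
#include <algorithm>

// Votes a Sentinel needs to be elected failover leader: a strict majority
// of all known Sentinels, and never fewer than the configured quorum.
int votes_needed(int total_sentinels, int quorum) {
    int majority = total_sentinels / 2 + 1;
    return std::max(majority, quorum);
}

// True if the Sentinels that are still running can both agree the master
// is down and authorize a failover. (Hypothetical helper for illustration.)
bool can_fail_over(int total_sentinels, int quorum, int failed_sentinels) {
    int alive = total_sentinels - failed_sentinels;
    return alive >= votes_needed(total_sentinels, quorum);
}
```

With three Sentinels and a quorum of 2, can_fail_over(3, 2, 1) is true and can_fail_over(3, 2, 2) is false, which matches the "tolerates one Sentinel failure" rule of thumb.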

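To start Redis Sentinel, execute the following command on each Sentinel node. Sentinel rewrites its configuration file at runtime to persist state, so the file must be writable by the Sentinel process: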

redis-sentinel /etc/redis/sentinel.conf

C++ Client Integration with Sentinel

Your C++ application needs to know the current Redis master. Instead of hardcoding the master’s IP address, it should ask Sentinel: connect to any Sentinel instance and issue SENTINEL get-master-addr-by-name <master-name>, which returns the IP address and port of the current master. (SENTINEL master <master-name> also works, but it returns the master’s full state as a flat list of field/value pairs that must be searched for the ip and port fields.)
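Because the SENTINEL master reply is a flat list of alternating field names and values, the address should be looked up by field name, not by fixed position. Here is a minimal sketch of that lookup, assuming the reply strings have already been copied out of the client library into a std::vector. The helper names are hypothetical, and std::optional requires C++17.

```cpp
#include <cstddef>
#include <exception>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Turn a flat [field, value, field, value, ...] list into a lookup table.
std::map<std::string, std::string> to_fields(const std::vector<std::string>& flat) {
    std::map<std::string, std::string> fields;
    for (std::size_t i = 0; i + 1 < flat.size(); i += 2) {
        fields[flat[i]] = flat[i + 1];
    }
    return fields;
}

// Extract the master's address by field name. Returns nullopt if either
// field is missing or the port is not a valid number.
std::optional<std::pair<std::string, int>> master_address(const std::vector<std::string>& flat) {
    const auto fields = to_fields(flat);
    const auto ip = fields.find("ip");
    const auto port = fields.find("port");
    if (ip == fields.end() || port == fields.end()) {
        return std::nullopt;
    }
    try {
        return std::make_pair(ip->second, std::stoi(port->second));
    } catch (const std::exception&) {
        return std::nullopt;
    }
}
```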

A common pattern is to use a Redis client library with built-in Sentinel support, such as redis-plus-plus. The hiredis C library has no Sentinel-specific API, but the discovery step is simple enough to implement by hand, as the following sketch does:

Example C++ Snippet (Conceptual with hiredis)

#include <iostream>
#include <string>
#include <utility>
#include <vector>
#include <hiredis/hiredis.h>

// Sentinel endpoints; in practice these would come from configuration.
std::vector<std::pair<std::string, int>> get_sentinel_endpoints() {
    return {
        {"sentinel1.example.com", 26379},
        {"sentinel2.example.com", 26379},
        {"sentinel3.example.com", 26379}
    };
}

// Asks each Sentinel in turn for the current master and returns a connection
// to it, or nullptr if no Sentinel yields a reachable master.
redisContext* connect_to_redis_master(const std::string& master_name) {
    for (const auto& sentinel : get_sentinel_endpoints()) {
        redisContext* sentinel_ctx = redisConnect(sentinel.first.c_str(), sentinel.second);
        // redisConnect can return a non-null context with err set on failure.
        if (sentinel_ctx == nullptr || sentinel_ctx->err) {
            std::cerr << "Failed to connect to Sentinel: " << sentinel.first << ":" << sentinel.second << std::endl;
            if (sentinel_ctx) redisFree(sentinel_ctx);
            continue;
        }

        // The reply is a two-element array [ip, port], or nil if the
        // master name is unknown to this Sentinel.
        redisReply* reply = static_cast<redisReply*>(
            redisCommand(sentinel_ctx, "SENTINEL get-master-addr-by-name %s", master_name.c_str()));
        redisContext* context = nullptr;

        if (reply == nullptr) {
            std::cerr << "Error getting reply from Sentinel: " << sentinel.first << std::endl;
        } else if (reply->type == REDIS_REPLY_ARRAY && reply->elements == 2 &&
                   reply->element[0]->type == REDIS_REPLY_STRING &&
                   reply->element[1]->type == REDIS_REPLY_STRING) {
            std::string master_ip = reply->element[0]->str;
            int master_port = std::stoi(reply->element[1]->str);

            std::cout << "Detected Redis master: " << master_ip << ":" << master_port << std::endl;

            // Connect to the actual master
            context = redisConnect(master_ip.c_str(), master_port);
            if (context == nullptr || context->err) {
                std::cerr << "Failed to connect to Redis master: " << master_ip << ":" << master_port << std::endl;
                if (context) redisFree(context);
                context = nullptr;
            }
        }

        if (reply) freeReplyObject(reply);
        redisFree(sentinel_ctx);
        if (context) return context;
    }
    return nullptr; // Failed to find master
}

int main() {
    const std::string master_name = "mymaster";
    redisContext* ctx = connect_to_redis_master(master_name);

    if (ctx) {
        // Use the context for Redis operations
        redisReply* reply = (redisReply*)redisCommand(ctx, "PING");
        if (reply) {
            std::cout << "PING reply: " << reply->str << std::endl;
            freeReplyObject(reply);
        }
        redisFree(ctx);
    } else {
        std::cerr << "Could not establish connection to Redis master." << std::endl;
    }
    return 0;
}

In a production environment, you would integrate this logic into your application’s connection management. When a connection error occurs, the application should re-query Sentinel to get the new master’s address and re-establish the connection. hiredis itself does not track Sentinel failovers, so this retry logic lives either in your application or in a higher-level client such as redis-plus-plus. An application can also subscribe to a Sentinel’s +switch-master Pub/Sub channel to learn about a promotion as soon as it happens.
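The re-query-and-retry loop described above can be sketched independently of any particular client library. In the sketch below, Endpoint, Resolver, Operation, and run_with_failover are all hypothetical names: resolve stands in for a Sentinel lookup, and operation stands in for one Redis command that returns false on a connection error. It requires C++17 for std::optional.

```cpp
#include <functional>
#include <optional>
#include <string>

struct Endpoint {
    std::string host;
    int port;
};

// Hypothetical hooks: `resolve` asks Sentinel for the current master, and
// `operation` runs one command against it, returning false on a
// connection error.
using Resolver = std::function<std::optional<Endpoint>()>;
using Operation = std::function<bool(const Endpoint&)>;

// Runs `operation` against the current master, asking Sentinel again after
// each failure. Gives up after `max_attempts` tries.
bool run_with_failover(const Resolver& resolve, const Operation& operation, int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        std::optional<Endpoint> master = resolve();
        if (master && operation(*master)) {
            return true;
        }
        // Real code would sleep with backoff here before re-querying,
        // since a failover takes at least down-after-milliseconds.
    }
    return false;
}
```

Passing the lookup and the command as callables keeps the retry policy testable without a running Redis. In a real application, resolve would wrap a function such as connect_to_redis_master.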

OVH Cloud Infrastructure Considerations

Deploying this setup on OVH Cloud requires careful planning of your network and instance placement. For optimal resilience:

Instance Placement: Deploy your Redis master, replicas, and Sentinel instances across different Availability Zones (AZs) within a single OVH region where the region offers them; otherwise, spread them across separate datacenters. This protects against single datacenter failures. For even higher disaster recovery, consider deploying Sentinels in different regions, though the added latency may call for a larger down-after-milliseconds value to avoid false failovers.
Networking: Ensure your instances can communicate with each other on the necessary ports (6379 for Redis, 26379 for Sentinel). Use OVH’s private network capabilities (e.g., vRack) for secure and low-latency communication between your Redis and Sentinel nodes. Do not assume that instances in different zones share a private network by default: attach each one explicitly to the same private network and verify connectivity before relying on it.
Security Groups/Firewalls: Configure firewall rules (e.g., OVH’s Security Groups or iptables on the instances) so that port 6379 accepts traffic only from your application servers, the Sentinels, and the other Redis nodes (for replication), and port 26379 accepts traffic only from your application servers and the other Sentinels. Block both ports from all other sources.
Monitoring and Alerting: Beyond Sentinel’s internal monitoring, set up external monitoring for your Redis and Sentinel instances using OVH’s monitoring tools or third-party solutions. Monitor CPU, memory, disk I/O, network traffic, and Redis-specific metrics (e.g., connected clients, memory usage, latency). Set up alerts for critical thresholds and Sentinel events.
Automated Deployment: Use Infrastructure as Code (IaC) tools like Terraform or Ansible to automate the deployment and configuration of your Redis and Sentinel instances. This ensures consistency and simplifies disaster recovery drills.

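When setting up Redis replication, configure replica-announce-ip and replica-announce-port on every Redis node that sits behind NAT or whose private address differs from the one other nodes should use. Because any node can end up as a replica after a failover, this applies to the initial master as well. These settings control the address a replica reports to its master, which is also the address Sentinel discovers and may later advertise as the new master. Sentinel has equivalent directives for itself: sentinel announce-ip and sentinel announce-port.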

Example Terraform Snippet (Conceptual for OVH)

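# This is a conceptual example. The resource types and attributes below are
# illustrative placeholders, not the OVH provider's actual schema; check the
# provider documentation for the real resource names (for example
# ovh_cloud_project_network_private). Instances on OVH Public Cloud are also
# commonly managed through the OpenStack provider (openstack_compute_instance_v2).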

provider "ovh" {
  # Your OVH credentials and endpoint
  endpoint = "ovh-eu"
}

resource "ovh_compute_instance" "redis_master" {
  name          = "redis-master-01"
  image_name    = "ubuntu-2004"
  flavor_name   = "s1-2" # Adjust flavor as needed
  region        = "GRA"  # Example region
  zone          = "GRA5" # Example zone
  ssh_key_names = ["my-ssh-key"]

  # Attach to a private network for inter-instance communication
  network_interface {
    uuid = ovh_compute_network_private.redis_network.id
  }

  # User data to install and configure Redis
  user_data = file("scripts/install_redis.sh")
}

resource "ovh_compute_instance" "redis_replica" {
  count         = 2 # Two replicas
  name          = "redis-replica-${count.index}"
  image_name    = "ubuntu-2004"
  flavor_name   = "s1-2"
  region        = "GRA"
  zone          = "GRA6" # Different zone
  ssh_key_names = ["my-ssh-key"]

  network_interface {
    uuid = ovh_compute_network_private.redis_network.id
  }

  user_data = file("scripts/install_redis_replica.sh")
}

resource "ovh_compute_instance" "redis_sentinel" {
  count         = 3 # Three sentinels
  name          = "redis-sentinel-${count.index}"
  image_name    = "ubuntu-2004"
  flavor_name   = "s1-2"
  region        = "GRA"
  zone          = element(["GRA5", "GRA6", "GRA7"], count.index) # Different zones
  ssh_key_names = ["my-ssh-key"]

  network_interface {
    uuid = ovh_compute_network_private.redis_network.id
  }

  user_data = file("scripts/install_sentinel.sh")
}

resource "ovh_compute_network_private" "redis_network" {
  name   = "redis-private-network"
  region = "GRA"
  # Optionally specify a network CIDR
  # cidr = "10.0.0.0/24"
}

# Output the IP addresses for configuration
output "redis_master_ip" {
  value = ovh_compute_instance.redis_master.private_ip
}

output "redis_replica_ips" {
  value = ovh_compute_instance.redis_replica[*].private_ip
}

output "redis_sentinel_ips" {
  value = ovh_compute_instance.redis_sentinel[*].private_ip
}

The user_data scripts (e.g., install_redis.sh, install_sentinel.sh) would contain the necessary commands to install Redis, configure redis.conf and sentinel.conf, and start the services. Because the scripts need addresses that are only known at provisioning time (for example, the master’s private IP for the replica and Sentinel configuration), they are typically rendered with Terraform’s templatefile() function instead of file(), or they query the OVH API at boot.

Testing and Validation

Regularly test your failover mechanism. The most straightforward method is to manually stop the Redis master process:

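# On the Redis master instance
sudo systemctl stop redis-server
# or, if Redis is not managed by systemd (a service manager may restart
# a killed process before Sentinel reacts)
sudo pkill redis-server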

Observe the Sentinel logs on your Sentinel instances. You should see events indicating that the master is down (+sdown, then +odown), that a failover is being initiated (+try-failover), and that a replica has been promoted (+promoted-slave, +switch-master). Your C++ application, upon encountering connection errors, should automatically reconnect to the new master.
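The +switch-master event is the one that matters most to clients, because it carries both the old and the new master address. Its payload has the form "<master-name> <old-ip> <old-port> <new-ip> <new-port>", both in the log and on the Pub/Sub channel of the same name. Below is a minimal sketch of parsing that payload. The struct and function names are hypothetical, and the code expects only the part after the event name.

```cpp
#include <optional>
#include <sstream>
#include <string>

struct MasterSwitch {
    std::string name;
    std::string old_host;
    int old_port = 0;
    std::string new_host;
    int new_port = 0;
};

// Parses "<master-name> <old-ip> <old-port> <new-ip> <new-port>".
// Returns nullopt if a field is missing or a port is not numeric.
std::optional<MasterSwitch> parse_switch_master(const std::string& payload) {
    std::istringstream in(payload);
    MasterSwitch sw;
    if (in >> sw.name >> sw.old_host >> sw.old_port >> sw.new_host >> sw.new_port) {
        return sw;
    }
    return std::nullopt;
}
```

A failover test can compare new_host and new_port against the address the application reconnected to, confirming that client and Sentinel agree on the new master.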

To simulate a more complex failure scenario, you can:

Terminate the Redis master instance.
Network partition the master from the Sentinels (e.g., using iptables).
Stop one or two Sentinel instances to test quorum behavior.

After each test, verify that your application can still connect and perform operations. Document the results and any issues encountered. Automating these tests as part of your CI/CD pipeline or a dedicated testing framework is highly recommended for production readiness.
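One building block for automating such tests is a helper that waits for Sentinel to report a different master and measures how long that took. The following is a rough sketch under stated assumptions: current_master is a hypothetical hook that returns the master as an "ip:port" string (for example by wrapping a Sentinel query), and wait_for_failover is an illustrative name, not an existing API. It requires C++17.

```cpp
#include <chrono>
#include <functional>
#include <optional>
#include <string>
#include <thread>

// Polls `current_master` until it reports something other than `old_master`.
// Returns the elapsed time, or nullopt if `timeout` passes first.
std::optional<std::chrono::milliseconds> wait_for_failover(
    const std::function<std::string()>& current_master,
    const std::string& old_master,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds poll_interval) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    while (Clock::now() - start < timeout) {
        if (current_master() != old_master) {
            return std::chrono::duration_cast<std::chrono::milliseconds>(Clock::now() - start);
        }
        std::this_thread::sleep_for(poll_interval);
    }
    return std::nullopt;
}
```

A drill would record the master address, stop the master, and then call this with a timeout somewhat above down-after-milliseconds plus the expected promotion time, failing the test if it returns nullopt.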
