Disaster Recovery 101: Architecting Auto-Failovers for MongoDB and Ruby Deployments on OVH
Establishing a Robust MongoDB Replica Set for High Availability
A foundational element of any disaster recovery strategy for MongoDB is the implementation of a replica set. This ensures data redundancy and provides automatic failover capabilities. For a production deployment on OVH, we’ll configure a minimum of three nodes to maintain quorum and withstand a single node failure. Each node should reside in a different availability zone within the same OVH region for optimal resilience against localized infrastructure issues.
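The quorum arithmetic behind the three-node minimum can be sketched in a few lines of Ruby. This is only an illustration: the helper names `majority` and `fault_tolerance` are hypothetical and are not part of MongoDB or any gem.

```ruby
# Votes needed to elect a primary in a replica set with `voting_members` voters.
def majority(voting_members)
  voting_members / 2 + 1
end

# Number of members that can be lost while a primary can still be elected.
def fault_tolerance(voting_members)
  voting_members - majority(voting_members)
end

[2, 3, 4, 5].each do |n|
  puts "#{n} members: majority #{majority(n)}, tolerates #{fault_tolerance(n)} failure(s)"
end
```

Under this arithmetic a two-node set tolerates no failures at all, and a fourth node adds no fault tolerance over three, which is why odd member counts are the usual choice.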
Let’s assume we have three OVH virtual machines (or bare-metal servers) provisioned with static IP addresses: `192.168.1.10` (node1), `192.168.1.11` (node2), and `192.168.1.12` (node3). We’ll install MongoDB Community Edition on each.
MongoDB Configuration File (`mongod.conf`)
On each node, the MongoDB configuration file (typically `/etc/mongod.conf`) needs to be adjusted. The key parameters for replica set configuration are `replication.replSetName` and `net.bindIp`.
For `node1` (`192.168.1.10`):
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true # Omit on MongoDB 6.1+, where journaling is always on and this option was removed
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
net:
  port: 27017
  bindIp: 0.0.0.0 # Or 127.0.0.1 plus this node's private IP, e.g. 127.0.0.1,192.168.1.10
processManagement:
  fork: true
  pidFilePath: /var/run/mongodb/mongod.pid
replication:
  replSetName: myReplicaSet
# Only for a config server replica set in a sharded cluster; leave out for a plain replica set
# sharding:
#   clusterRole: configsvr
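Repeat this configuration for `node2` and `node3`, ensuring `replication.replSetName` is identical across all nodes. Note that `bindIp` controls which local interfaces `mongod` listens on, not which clients may connect. Setting it to `0.0.0.0` listens on every interface; a more restrictive choice is `127.0.0.1` plus the node's own private IP (for example `127.0.0.1,192.168.1.10` on `node1`). Restrict which hosts can reach port 27017 (the other replica set members and your application servers) with firewall rules rather than with `bindIp`.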
Initializing the Replica Set
After starting the `mongod` service on all three nodes (e.g., `sudo systemctl start mongod` and `sudo systemctl enable mongod`), connect to one of the nodes (e.g., `node1`) using the MongoDB shell. Current releases ship `mongosh`; the legacy `mongo` shell was removed in MongoDB 6.0, so use `mongo` only on older installations:
mongosh --host 192.168.1.10 --port 27017
Once in the shell, initiate the replica set:
rs.initiate(
  {
    _id: "myReplicaSet",
    members: [
      { _id: 0, host: "192.168.1.10:27017" },
      { _id: 1, host: "192.168.1.11:27017" },
      { _id: 2, host: "192.168.1.12:27017" }
    ]
  }
)
You can then check the status of the replica set with `rs.status()`. This command will show which node is the primary and which are secondaries. It may take a few moments for all members to sync and elect a primary.
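The same status information can be condensed programmatically. The sketch below summarizes a document shaped like the `replSetGetStatus` output that backs `rs.status()`, relying on its `members[].name`, `members[].stateStr`, and `members[].health` fields. The function name `summarize_replica_set` and the sample data are made up for illustration.

```ruby
# Summarize a replSetGetStatus-style document (string keys assumed).
def summarize_replica_set(status)
  members = status.fetch("members", [])
  healthy = members.select { |m| m["health"].to_i == 1 }
  primary = healthy.find { |m| m["stateStr"] == "PRIMARY" }
  {
    primary: primary && primary["name"],
    secondaries: healthy.select { |m| m["stateStr"] == "SECONDARY" }.map { |m| m["name"] },
    unhealthy: (members - healthy).map { |m| m["name"] }
  }
end

# Hand-written sample data, not real server output.
sample_status = {
  "set" => "myReplicaSet",
  "members" => [
    { "name" => "192.168.1.10:27017", "stateStr" => "PRIMARY", "health" => 1 },
    { "name" => "192.168.1.11:27017", "stateStr" => "SECONDARY", "health" => 1 },
    { "name" => "192.168.1.12:27017", "stateStr" => "(not reachable/healthy)", "health" => 0 }
  ]
}

p summarize_replica_set(sample_status)
```

With the Ruby driver, a comparable document should be obtainable by running the `replSetGetStatus` command against the `admin` database (for example `client.use(:admin).database.command(replSetGetStatus: 1).documents.first`); treat that call as an assumption to verify against your driver version.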
Integrating Ruby Applications with MongoDB Replica Sets
Your Ruby application needs to be aware of the replica set to leverage its high availability features. This involves configuring the MongoDB connection string to include all members of the replica set and specifying the `replicaSet` option.
Using the `mongo` Gem
The standard `mongo` gem in Ruby provides excellent support for replica sets. When establishing a connection, you should provide a comma-separated list of hosts and the `replicaSet` parameter.
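If the member list lives in configuration rather than in a hard-coded string, the connection string can be assembled from it. This is a minimal sketch: `replica_set_uri` is a hypothetical helper, not part of the `mongo` gem, and it leaves out credentials and URI escaping for brevity.

```ruby
# Build a replica-set connection string from a list of "host:port" members.
def replica_set_uri(hosts, replica_set:, database: nil)
  raise ArgumentError, "at least one host is required" if hosts.empty?

  # A nil database interpolates to an empty string, giving ".../?replicaSet=..."
  "mongodb://#{hosts.join(',')}/#{database}?replicaSet=#{replica_set}"
end

hosts = %w[192.168.1.10:27017 192.168.1.11:27017 192.168.1.12:27017]
puts replica_set_uri(hosts, replica_set: "myReplicaSet")
```

Listing every member matters because the driver can then still find the replica set when the first host in the list is the one that is down.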
Here’s an example using the `mongo` gem in a Rails initializer or a standalone Ruby script:
require 'mongo'

# Connection string for the replica set
# Replace with your actual IPs and replica set name
mongo_uri = "mongodb://192.168.1.10:27017,192.168.1.11:27017,192.168.1.12:27017/?replicaSet=myReplicaSet"

begin
  # Connect to MongoDB and select the database.
  # Note: client[:name] returns a collection, so the database is chosen with the :database option.
  client = Mongo::Client.new(mongo_uri, database: 'my_database')

  # Access a collection
  collection = client[:my_collection]

  # Example operation: insert a document
  result = collection.insert_one({ name: "Test Document", timestamp: Time.now })
  puts "Inserted document with ID: #{result.inserted_id}"

  # Example operation: find documents
  documents = collection.find({ name: "Test Document" }).to_a
  puts "Found documents: #{documents.inspect}"

  # The driver automatically handles failover. If the primary becomes unavailable,
  # it will detect the change and connect to the new primary.
rescue Mongo::Error => e
  puts "Error connecting to MongoDB or performing operation: #{e.message}"
  # Implement retry logic or alert mechanisms here
end
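The `mongo` gem monitors the members of the replica set and discovers the current primary on its own. If the primary fails, the remaining members elect a new primary from among the secondaries; the driver does not trigger the promotion, but it picks up the change through its server monitoring and routes subsequent operations to the new primary. With default settings an election typically completes within about 12 seconds. Operations issued during that window can block until a primary is available or fail with a `Mongo::Error`, so the failover is not entirely invisible to the application: expect a short burst of latency or errors and handle them with retries.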
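One way to handle errors raised during an election is a small retry wrapper with exponential backoff. The sketch below is deliberately driver-agnostic: `with_retries` and `TransientError` are hypothetical names. In a real application you would pass driver error classes such as `Mongo::Error::NoServerAvailable` or `Mongo::Error::SocketError` (check them against your driver version) and only wrap operations that are safe to repeat.

```ruby
# Retry the block when one of the `on` error classes is raised,
# sleeping base_delay, 2*base_delay, 4*base_delay, ... between attempts.
def with_retries(on:, attempts: 5, base_delay: 0.5, sleeper: Kernel.method(:sleep))
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *on
    raise if tries >= attempts # Give up and re-raise the last error

    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end

# Stand-in for a driver error, used only for this illustration.
class TransientError < StandardError; end

calls = 0
result = with_retries(on: [TransientError], base_delay: 0.01) do
  calls += 1
  raise TransientError, "no primary yet" if calls < 3

  "ok after #{calls} attempts"
end
puts result
```

The `sleeper` parameter exists so the delay can be replaced in tests. Recent driver versions already retry certain reads and writes once by default, so a wrapper like this is mainly useful for outages that last longer than a single retry.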
Automating Failover Detection and Response
While MongoDB’s replica set provides automatic failover, robust disaster recovery often requires proactive monitoring and automated response mechanisms. This is crucial for scenarios where manual intervention might be too slow or impossible.
Monitoring MongoDB Replica Set Health
We can use tools like `mongostat` or `mongotop` for real-time monitoring, but for automated alerting, a dedicated monitoring system is essential. Prometheus with the MongoDB Exporter is a popular and effective choice.
1. Deploy MongoDB Exporter:
Run the MongoDB Exporter as a sidecar container or on a separate host. It scrapes metrics directly from MongoDB instances.
# Example using Docker (replace PASSWORD with the monitoring user's password)
docker run -d \
  --name prometheus-mongodb-exporter \
  -p 9271:9271 \
  prom/mongodb-exporter:latest \
  --mongodb.uri="mongodb://user:PASSWORD@192.168.1.10:27017,192.168.1.11:27017,192.168.1.12:27017/?replicaSet=myReplicaSet"
Ensure the MongoDB user has sufficient privileges (e.g., `clusterMonitor` role) to access the necessary metrics.
2. Configure Prometheus:
Add a scrape job to your Prometheus configuration (`prometheus.yml`) to collect metrics from the exporter:
scrape_configs:
  - job_name: 'mongodb'
    static_configs:
      - targets: ['EXPORTER_IP:9271'] # Replace EXPORTER_IP with the exporter's actual IP
Alerting on Failover Events
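We can define Prometheus alerting rules to detect replica set issues; Prometheus evaluates the rules and Alertmanager routes the resulting notifications. A critical alert would fire when the number of available members in the replica set drops below a certain threshold, or when no primary is elected.
Example alerting rules (`rules.yml`). Metric and label names vary between exporter versions, so check them against your exporter's output: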
groups:
  - name: mongodb_alerts
    rules:
      - alert: MongoDBReplicaSetDown
        expr: mongodb_up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "MongoDB replica set is down or unreachable."
          description: "The MongoDB exporter cannot connect to any members of the replica set."
      - alert: MongoDBNoPrimary
        expr: sum(mongodb_replica_member_state{state="primary"}) by (replica_set) == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "MongoDB replica set '{{ $labels.replica_set }}' has no primary."
          description: "No primary member found in replica set '{{ $labels.replica_set }}'. Automatic failover might be failing or taking too long."
      - alert: MongoDBInsufficientSecondaries
        # sum (not count) so that series with value 0 are not counted as secondaries
        expr: sum(mongodb_replica_member_state{state="secondary"}) by (replica_set) < 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MongoDB replica set '{{ $labels.replica_set }}' has insufficient secondaries."
          description: "Replica set '{{ $labels.replica_set }}' has fewer than 2 secondaries. This impacts redundancy and failover capability."
These alerts can be configured to notify your operations team via Slack, PagerDuty, or email, enabling rapid investigation and intervention if the automatic failover doesn't complete successfully.
OVH Specific Considerations for Disaster Recovery
When deploying on OVH, leverage their infrastructure to enhance your DR strategy:
- Multi-Region Deployment: For true disaster recovery, consider deploying a separate MongoDB replica set in a different OVH region. This protects against region-wide outages. A separate replica set does not receive data automatically, so it must be kept current, for example by regularly restoring backups into it. Your application would also need logic to connect to the secondary region's replica set in a catastrophic failure scenario.
- Network Configuration: Ensure your OVH security groups and firewall rules allow necessary traffic between MongoDB nodes and between application servers and MongoDB. For inter-region communication, configure OVH's private network capabilities or secure VPNs.
- Automated Backups: Implement regular, automated backups of your MongoDB data. Store these backups in a separate location, ideally in a different region or even a different cloud provider, using OVH's object storage (e.g., Swift) or other cloud storage solutions. Tools like `mongodump` scripted with `cron` or cloud-native backup services can be utilized.
- Infrastructure as Code (IaC): Use tools like Terraform or Ansible to manage your OVH infrastructure. This allows for rapid redeployment of your MongoDB cluster and application stack in a new region or zone if a disaster occurs.
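As an illustration of the scripted-backup point above, a small Ruby helper can assemble a timestamped `mongodump` invocation for `cron` to run. This is a sketch under assumptions: the helper name `mongodump_command`, the backup directory, and the file naming scheme are hypothetical, while `--uri`, `--gzip`, and `--archive` are standard `mongodump` options.

```ruby
# Build the argument list for a compressed, single-file mongodump archive.
def mongodump_command(uri:, backup_dir:, now: Time.now.utc)
  stamp = now.strftime("%Y%m%dT%H%M%SZ")
  archive = File.join(backup_dir, "mongodb-#{stamp}.archive.gz")
  ["mongodump", "--uri=#{uri}", "--gzip", "--archive=#{archive}"]
end

cmd = mongodump_command(
  uri: "mongodb://192.168.1.10:27017,192.168.1.11:27017,192.168.1.12:27017/?replicaSet=myReplicaSet",
  backup_dir: "/var/backups/mongodb" # Hypothetical location
)
puts cmd.join(" ")

# To actually run the backup (requires mongodump on the PATH):
#   system(*cmd) or raise "mongodump failed"
```

Passing the command as an argument list rather than a single string avoids shell quoting problems with the connection string. Copying the resulting archive to object storage in another region is a separate step not shown here.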
By combining MongoDB's native replica set capabilities with robust monitoring, alerting, and OVH's resilient infrastructure, you can architect a highly available and disaster-resilient deployment for your Ruby applications.