Disaster Recovery 101: Architecting Auto-Failovers for MongoDB and Perl Deployments on OVH

Establishing a Robust MongoDB Replica Set for High Availability

For any mission-critical application, a single point of failure is unacceptable. When architecting disaster recovery for MongoDB, the cornerstone is a well-configured replica set. This isn’t just about backups; it’s about continuous availability. We’ll focus on a three-node replica set, a common and effective pattern for production environments, ensuring automatic failover with minimal downtime.

Our setup will involve three OVH cloud instances, each running a MongoDB instance. For simplicity, we’ll assume these instances are on the same private network for inter-node communication, but the principles extend to geographically distributed deployments with appropriate network configuration and security measures.

MongoDB Configuration for Replica Sets

Each MongoDB instance needs to be configured to participate in a replica set. This is primarily done via the mongod.conf file. We’ll ensure the replication.replSetName parameter is set consistently across all nodes.

Here’s a sample /etc/mongod.conf for the first node (e.g., mongo-node-1). No node is configured as the primary; that role is decided by election:

storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
net:
  bindIp: 0.0.0.0
  port: 27017
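security:
  authorization: enabled
  # Members must also authenticate to each other once access control is on.
  # The path below is an example; the same key file must exist on every node.
  keyFile: /etc/mongodb/keyfile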
replication:
  replSetName: myReplicaSet
processManagement:
  fork: true
  pidFilePath: /var/run/mongodb/mongod.pid

The configuration for mongo-node-2 and mongo-node-3 is identical. You may give each node its own log or PID file path if you want to isolate them further, but replica set functionality does not require it.

Initializing and Configuring the Replica Set

Once MongoDB is installed and configured on all nodes and the mongod service is running on each, initiate the replica set from a shell connected to one of them. Run rs.initiate() on exactly one member; by convention this is the node you expect to become the first primary.

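Open a shell on mongo-node-1. On MongoDB 6.0 and later the shell binary is mongosh; older releases ship the legacy mongo shell, which takes the same arguments here. Because authorization is enabled and no users exist yet, run the shell on the node itself so that the connection arrives over localhost and the localhost exception applies:

mongosh --port 27017

Inside the shell, execute the replica set initialization command: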

rs.initiate(
  {
    _id: "myReplicaSet",
    members: [
      { _id: 0, host: "mongo-node-1:27017" },
      { _id: 1, host: "mongo-node-2:27017" },
      { _id: 2, host: "mongo-node-3:27017" }
    ]
  }
)

After executing this, MongoDB will configure the replica set. You can verify the status by running rs.status() in the shell. You should see one node reported as PRIMARY and the others as SECONDARY. Elections are automatic: one runs when the set is first initiated, and another whenever the secondaries lose contact with the primary for longer than the election timeout (10 seconds by default).
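To script that check instead of reading rs.status() by eye, the member list can be reduced to a small health summary. The following is a minimal sketch under stated assumptions: summarize_members is a hypothetical helper, and the sample hash only mimics the members array (with its name and stateStr fields) that rs.status() reports. It does not talk to a live server.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce the "members" array of an rs.status()-style
# document to the primary's name, the secondaries, and everything else.
sub summarize_members {
    my ($status) = @_;
    my %summary = (primary => undef, secondaries => [], other => []);
    for my $member (@{ $status->{members} || [] }) {
        my $state = $member->{stateStr} // 'UNKNOWN';
        if ($state eq 'PRIMARY') {
            $summary{primary} = $member->{name};
        } elsif ($state eq 'SECONDARY') {
            push @{ $summary{secondaries} }, $member->{name};
        } else {
            push @{ $summary{other} }, "$member->{name} ($state)";
        }
    }
    # "Healthy" here means a primary exists and no member is in another state.
    $summary{healthy} = (defined $summary{primary} && !@{ $summary{other} }) ? 1 : 0;
    return \%summary;
}

# Sample data shaped like rs.status() output (values are illustrative).
my $sample_status = {
    set     => 'myReplicaSet',
    members => [
        { name => 'mongo-node-1:27017', stateStr => 'PRIMARY' },
        { name => 'mongo-node-2:27017', stateStr => 'SECONDARY' },
        { name => 'mongo-node-3:27017', stateStr => 'SECONDARY' },
    ],
};

my $summary = summarize_members($sample_status);
printf "primary=%s secondaries=%d healthy=%d\n",
    $summary->{primary}, scalar @{ $summary->{secondaries} }, $summary->{healthy};
```

In a real script the status document would presumably come from the replSetGetStatus command, for example via run_command on the admin database, rather than from a hard-coded hash.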

Integrating Perl Applications with MongoDB Auto-Failover

Your Perl applications need to be aware of the MongoDB replica set to benefit from its high availability. In practice this means the connection string lists several members as seed hosts and names the replica set, rather than pointing at a single host. The MongoDB Perl driver then discovers the current primary and directs operations to it.

Perl MongoDB Driver Configuration

We’ll use the MongoDB Perl module. Ensure it’s installed via CPAN:

cpan MongoDB

When connecting, use a connection string that specifies the replica set name. This allows the driver to discover all members of the replica set and connect to the current primary.

Here’s a typical Perl script snippet demonstrating the connection:

use MongoDB;
use strict;
use warnings;

# Connection string for the replica set
# Format: mongodb://[user:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]
# The ?replicaSet=myReplicaSet option is crucial for failover.
# With authorization enabled on the servers, add the application user's
# credentials (user:password@) to this URI.
my $dsn = "mongodb://mongo-node-1:27017,mongo-node-2:27017,mongo-node-3:27017/?replicaSet=myReplicaSet";

my $client;
eval {
    # new() validates the URI and options but connects lazily, so an
    # unreachable server surfaces on the first operation, not here.
    $client = MongoDB::MongoClient->new(
        host => $dsn,
        # Optional: timeouts, read preference, and write concern for production
        # (attribute names as used by the v1 and v2 drivers)
        # connect_timeout_ms => 5000,
        # read_pref_mode     => 'primaryPreferred',
        # w                  => 'majority',
        # wtimeout           => 5000,
    );
};
if ($@) {
    die "Failed to create MongoDB client: $@\n";
}

my $db = $client->get_database('my_application_db');
my $collection = $db->get_collection('users');

# Example operation
eval {
    my $user_doc = $collection->find_one({ username => 'testuser' });
    if ($user_doc) {
        print "Found user: " . $user_doc->{'username'} . "\n";
    } else {
        print "User not found.\n";
    }
};
if ($@) {
    die "Database operation failed: $@\n";
}

# Close the connection (optional, as it's usually managed by the driver)
# $client->disconnect;

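The key here is the ?replicaSet=myReplicaSet parameter in the DSN. The MongoDB driver will automatically discover the replica set members and connect to the current primary. If the primary fails, the driver detects the change and routes subsequent operations to the newly elected primary. Operations that were in flight when the primary went down can still fail, so the application must handle those errors itself.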

Handling Failures in Perl Code

While the driver handles the connection failover, your application logic should be prepared for transient errors. Using eval blocks around database operations is essential for catching exceptions thrown by the MongoDB driver during network interruptions or during the brief period of primary election.

In a production scenario, you would retry idempotent operations a bounded number of times, with a pause between attempts, and degrade service gracefully when an operation cannot safely be retried. For writes, use an appropriate write concern (e.g., w: 'majority') so that an acknowledged write has reached a majority of members and survives a failover instead of being rolled back.
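The retry idea can be sketched as a small wrapper with exponential backoff. This is an illustration under assumptions, not the driver's own mechanism: with_retry, attempts, and base_delay are hypothetical names, and the operation shown is a stand-in that fails twice before succeeding so the example runs without a database.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);   # allows fractional-second sleeps

# Hypothetical helper: run $op, retrying on any exception with exponential
# backoff. Only wrap idempotent operations; a retried write may apply twice.
sub with_retry {
    my ($op, %opt) = @_;
    my $attempts = $opt{attempts}   // 5;
    my $delay    = $opt{base_delay} // 0.5;   # seconds before the first retry
    my $last_error;
    for my $try (1 .. $attempts) {
        my $result;
        my $ok = eval { $result = $op->(); 1 };
        return $result if $ok;
        $last_error = $@ || 'unknown error';
        last if $try == $attempts;
        warn "attempt $try failed: $last_error";
        sleep $delay;
        $delay *= 2;   # back off: 0.5s, 1s, 2s, ...
    }
    die "giving up after $attempts attempts: $last_error";
}

# Stand-in for a database call: fails twice, as it might during an election.
my $calls = 0;
my $value = with_retry(
    sub {
        $calls++;
        die "simulated election in progress\n" if $calls < 3;
        return 'ok';
    },
    attempts   => 5,
    base_delay => 0.01,
);
print "result=$value after $calls calls\n";
```

In the application, the code reference would wrap a read such as the find_one call shown earlier. Catching every exception keeps the sketch short; a production version would likely retry only on network or election-related errors and rethrow the rest.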

OVH Infrastructure Considerations for Auto-Failover

When deploying on OVH, several infrastructure aspects are critical for ensuring seamless auto-failover, especially for MongoDB replica sets.

Network Configuration and Firewall Rules

MongoDB replica set members communicate with each other on port 27017 (default). Ensure that your OVH firewall rules (e.g., using iptables or OVH’s cloud firewall service) allow traffic on this port between your MongoDB instances. If your instances are in different availability zones or regions, this becomes even more critical.

Example iptables rules for allowing inter-node communication (run on each MongoDB node):

# Allow traffic from other replica set members on port 27017
# Replace <MONGO_NODE_1_IP>, <MONGO_NODE_2_IP>, etc. with actual IPs
iptables -A INPUT -p tcp --dport 27017 -s <MONGO_NODE_1_IP> -j ACCEPT
iptables -A INPUT -p tcp --dport 27017 -s <MONGO_NODE_2_IP> -j ACCEPT
iptables -A INPUT -p tcp --dport 27017 -s <MONGO_NODE_3_IP> -j ACCEPT

# Allow traffic from your application servers
iptables -A INPUT -p tcp --dport 27017 -s <APP_SERVER_1_IP> -j ACCEPT
iptables -A INPUT -p tcp --dport 27017 -s <APP_SERVER_2_IP> -j ACCEPT

# Optionally, drop other traffic to port 27017 if not explicitly allowed
# iptables -A INPUT -p tcp --dport 27017 -j DROP

# Save iptables rules (method depends on your OS, e.g., iptables-persistent)
# service iptables-persistent save

Application servers must be able to reach every MongoDB node on port 27017, not only the current primary. The Perl driver monitors all members of the replica set in order to discover the primary, and after a failover any of them may take that role.
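A quick way to confirm the firewall rules from an application server is a plain TCP connect test against each node. The sketch below uses only the core IO::Socket::INET module; port_reachable is a hypothetical helper, and the host names are taken from the command line. It verifies that the port accepts connections, not that MongoDB itself is healthy.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: true if a TCP connection to host:port succeeds
# within the timeout (in seconds).
sub port_reachable {
    my ($host, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout // 3,
    );
    return 0 unless $sock;
    close $sock;
    return 1;
}

# Check every host given on the command line against the MongoDB port.
for my $host (@ARGV) {
    printf "%-20s %s\n", $host,
        port_reachable($host, 27017, 2) ? 'reachable' : 'UNREACHABLE';
}
```

Saved under an assumed name such as check_ports.pl, it could be run from each application server as: perl check_ports.pl mongo-node-1 mongo-node-2 mongo-node-3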

Monitoring and Alerting for Failover Events

Automated failover is only part of the story. You need to know when it happens and if it’s successful. Implement robust monitoring for your MongoDB replica set.

Replica Set Status: Regularly query rs.status() to check the health of each member, its state (PRIMARY, SECONDARY, ARBITER, etc.), and its replication lag behind the primary.
Node Health: Monitor CPU, memory, disk I/O, and network traffic on each MongoDB instance.
Application Connectivity: Ensure your Perl applications can consistently connect and perform operations.

Tools like Prometheus with the MongoDB exporter, or commercial solutions, can be integrated. Set up alerts for:

Primary node becoming unavailable.
Secondary nodes falling too far behind the primary (high replication lag).
Replica set election failures.
Application connection errors.

These alerts are crucial for human intervention when automated recovery mechanisms might not be sufficient or when investigating the root cause of frequent failovers.
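The lag alert in particular reduces to simple arithmetic on each member's last applied operation time. The sketch below assumes those timestamps have already been converted to epoch seconds and stored under a hypothetical optime_epoch key (rs.status() reports them as dates in optimeDate); replication_lag, the sample values, and the 30-second threshold are likewise illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: seconds each SECONDARY is behind the PRIMARY, based
# on an assumed per-member "optime_epoch" field (epoch seconds).
sub replication_lag {
    my ($status) = @_;
    my @members = @{ $status->{members} || [] };
    my ($primary) = grep { ($_->{stateStr} // '') eq 'PRIMARY' } @members;
    return {} unless $primary;   # no primary: lag cannot be computed
    my %lag;
    for my $member (grep { ($_->{stateStr} // '') eq 'SECONDARY' } @members) {
        $lag{ $member->{name} } = $primary->{optime_epoch} - $member->{optime_epoch};
    }
    return \%lag;
}

# Illustrative sample: node 3 is a full minute behind.
my $lag_sample = {
    members => [
        { name => 'mongo-node-1:27017', stateStr => 'PRIMARY',   optime_epoch => 1_700_000_100 },
        { name => 'mongo-node-2:27017', stateStr => 'SECONDARY', optime_epoch => 1_700_000_098 },
        { name => 'mongo-node-3:27017', stateStr => 'SECONDARY', optime_epoch => 1_700_000_040 },
    ],
};

my $max_lag = 30;   # alert threshold in seconds (an assumption; tune per workload)
my $lag     = replication_lag($lag_sample);
for my $name (sort keys %$lag) {
    printf "%s lag=%ds %s\n", $name, $lag->{$name},
        $lag->{$name} > $max_lag ? 'ALERT' : 'ok';
}
```

An empty result when no primary is present is deliberate: that situation is itself an alert condition and should be reported separately from lag.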

Considerations for OVH Dedicated Servers vs. Public Cloud Instances

If you are using OVH dedicated servers, network configuration might involve their control panel or specific network interface configurations. For OVH Public Cloud instances, security groups and network ACLs play a similar role to iptables.

For geographically distributed deployments across different OVH regions (e.g., GRA, RBX, BHS), ensure that:

Network latency is acceptable for replication.
Bandwidth costs are factored in.
Appropriate security measures (VPNs, private network peering) are in place if not using OVH’s private network offerings.

An arbiter can be considered when you have an even number of data-bearing nodes. It votes in elections but holds no data, so it restores an odd number of voters without the cost of another full data node. It also adds operational complexity and is typically reserved for more advanced configurations.
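The reason an odd number of voters matters comes down to majority arithmetic: electing a primary needs more than half of the voting members. A minimal sketch, with election_math as a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: votes needed to elect a primary, and how many voting
# members can be lost while an election is still possible.
sub election_math {
    my ($voting_members) = @_;
    my $majority = int($voting_members / 2) + 1;
    return {
        majority           => $majority,
        tolerated_failures => $voting_members - $majority,
    };
}

for my $n (2 .. 5) {
    my $m = election_math($n);
    printf "%d voters: majority=%d, can lose %d\n",
        $n, $m->{majority}, $m->{tolerated_failures};
}
```

Two voters tolerate no failures while three tolerate one, which is what adding an arbiter to a two-node set buys. Going from three voters to four raises the required majority without raising the number of failures the set can survive.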