Disaster Recovery 101: Architecting Auto-Failovers for MongoDB and Laravel Deployments on OVH

Leveraging OVH’s Infrastructure for Automated MongoDB Failover

Architecting a robust disaster recovery strategy for MongoDB, especially in a cloud environment like OVH, hinges on understanding its replication mechanisms and leveraging external orchestration. For automated failover, we’ll focus on a replica set configuration and external monitoring/triggering. This isn’t about MongoDB’s built-in automatic failover (which is primarily for electing a new primary within the replica set), but rather about detecting a complete cluster failure and initiating a switch to a standby environment.

MongoDB Replica Set Configuration for High Availability

A fundamental prerequisite is a properly configured MongoDB replica set. This ensures data redundancy and provides the foundation for failover. We’ll assume a multi-region deployment strategy for true disaster recovery, with a primary replica set in one OVH region and a secondary, standby replica set in another.

Consider a scenario with two OVH regions: GRA (Gravelines) and RBX (Roubaix). Our primary deployment will be in GRA, with a standby in RBX.

On each OVH instance in GRA that will host a replica set member (e.g., a dedicated server or a Public Cloud instance), ensure MongoDB is installed and configured as a replica set member. The configuration file (typically /etc/mongod.conf) should include:

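replication:
  replSetName: "myReplicaSet"
net:
  # 0.0.0.0 listens on every interface; restrict this to localhost plus the
  # instance's private address where possible, and firewall port 27017.
  bindIp: 0.0.0.0
  port: 27017
security:
  # Shared key file for member-to-member authentication (hypothetical path; the
  # same file on every member, readable only by the mongod user, e.g. mode 400).
  # Setting keyFile also enables access control for clients.
  keyFile: /etc/mongodb/keyfile
storage:
  dbPath: /var/lib/mongodb
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
processManagement:
  fork: true
  pidFilePath: /var/run/mongodb/mongod.pid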

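After starting mongod on every member, connect to one of them with mongosh (the legacy mongo shell was removed in MongoDB 6.0) and initiate the replica set. Because the third member is an arbiter, it must be flagged with arbiterOnly; otherwise it joins as an ordinary data-bearing member:

mongosh --port 27017
> rs.initiate( {
   _id : "myReplicaSet",
   members: [
      { _id : 0, host : "mongo-primary-gra.yourdomain.com:27017" },
      { _id : 1, host : "mongo-secondary-gra.yourdomain.com:27017" },
      { _id : 2, host : "mongo-arbiter-gra.yourdomain.com:27017", arbiterOnly : true }
   ]
})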
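If you would rather drive this from Python (the language used for the monitoring script later on), the same initiation can go through PyMongo's generic command interface. This is a minimal sketch assuming PyMongo 4.x: build_replset_config, initiate_replset and member_states are hypothetical helpers, and the client you pass in is assumed to be a MongoClient connected with directConnection=True to one not-yet-initialized member.

```python
from typing import Any, Optional


def build_replset_config(name: str, data_hosts: list[str],
                         arbiter: Optional[str] = None) -> dict:
    """Build a replSetInitiate document: data-bearing members first, then an optional arbiter."""
    members: list[dict] = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    if arbiter is not None:
        # An arbiter votes in elections but stores no data, so it must be flagged explicitly.
        members.append({"_id": len(members), "host": arbiter, "arbiterOnly": True})
    return {"_id": name, "members": members}


def initiate_replset(client: Any, config: dict) -> dict:
    """Run replSetInitiate through a PyMongo MongoClient passed in by the caller."""
    return client.admin.command("replSetInitiate", config)


def member_states(client: Any) -> dict[str, str]:
    """Map each member's host:port to its state string (PRIMARY, SECONDARY, ARBITER, ...)."""
    status = client.admin.command("replSetGetStatus")
    return {m["name"]: m["stateStr"] for m in status["members"]}
```

Usage would look like initiate_replset(pymongo.MongoClient("mongo-primary-gra.yourdomain.com", 27017, directConnection=True), build_replset_config("myReplicaSet", [...], arbiter="mongo-arbiter-gra.yourdomain.com:27017")). Once the first election completes, member_states on that same client should show one PRIMARY, one SECONDARY and one ARBITER.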

The standby replica set in RBX would be configured similarly, but initially, it would not be part of the primary replica set. Data synchronization to the standby can be achieved through various methods:

Manual Backups and Restores: Regular mongodump and mongorestore operations. This is the simplest but least real-time approach.
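Cross-Region Replication: MongoDB does not natively replicate one replica set into a separate one, so either add RBX members to the GRA replica set, or use a cluster-to-cluster tool such as MongoDB's Cluster-to-Cluster Sync (mongosync) to keep a separate RBX replica set in sync. Both require careful network configuration. Note that hidden and delayed members must have priority 0 and can never become primary, so they suit backups rather than serving as the failover target.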
OVH Snapshots: For Public Cloud instances, leverage OVH’s snapshot capabilities for periodic data backups.
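The dump-and-restore option can be sketched as a small job that runs from cron or a systemd timer on a host that reaches both regions. This assumes the MongoDB Database Tools (mongodump and mongorestore) are installed there; the function names, URIs and archive path are placeholders. Your recovery point is at best the interval between runs.

```python
import subprocess


def build_dump_cmd(source_uri: str, archive_path: str) -> list[str]:
    """mongodump of the whole GRA replica set into one gzipped archive file."""
    # --oplog also captures writes made while the dump runs, giving a point-in-time
    # snapshot; it only works for full-instance dumps from replica set members.
    return ["mongodump", f"--uri={source_uri}", f"--archive={archive_path}",
            "--gzip", "--oplog"]


def build_restore_cmd(target_uri: str, archive_path: str) -> list[str]:
    """mongorestore of that archive into the RBX standby."""
    # --drop replaces collections that already exist on the target;
    # --oplogReplay applies the oplog entries captured by --oplog.
    return ["mongorestore", f"--uri={target_uri}", f"--archive={archive_path}",
            "--gzip", "--drop", "--oplogReplay"]


def sync_gra_to_rbx(source_uri: str, target_uri: str, archive_path: str) -> None:
    """Dump from GRA, then restore into RBX; raises CalledProcessError if either step fails."""
    subprocess.run(build_dump_cmd(source_uri, archive_path), check=True)
    subprocess.run(build_restore_cmd(target_uri, archive_path), check=True)
```

Passing the commands as argument lists (rather than a shell string) avoids quoting problems with credentials embedded in the URIs.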

For automated failover, we need a mechanism to detect the failure of the GRA cluster and promote the RBX cluster. This typically involves an external monitoring system and an orchestration script.

Automated Failover Orchestration with a Monitoring Script

A common approach is to use a dedicated monitoring server (which could be a small, low-cost instance in a third OVH region or even an external service) that periodically checks the health of the primary MongoDB cluster. If the primary becomes unreachable, this monitor triggers a failover process.

The monitoring script, written in Python for its ease of use with MongoDB drivers and system commands, would perform the following:

Periodically attempt to connect to the primary MongoDB instance in GRA.
Check for a specific condition (e.g., a health check document in a dedicated collection, or simply successful connection and query).
If the primary is unresponsive for a defined threshold (e.g., 3 consecutive failures), initiate the failover.

Here’s a simplified Python script for monitoring:

import pymongo
import time
import os
import requests # For potential webhook notifications

PRIMARY_MONGO_HOST = "mongo-primary-gra.yourdomain.com"
PRIMARY_MONGO_PORT = 27017
FAILOVER_THRESHOLD = 3 # Number of consecutive failures before triggering
CHECK_INTERVAL = 60 # Seconds between checks

def check_mongo_health():
    client = pymongo.MongoClient(PRIMARY_MONGO_HOST, PRIMARY_MONGO_PORT, serverSelectionTimeoutMS=5000)
    try:
        # 'ping' is sent to the primary by default, so this fails when the GRA
        # set has no reachable primary, not only when the host itself is down.
        client.admin.command('ping')
        return True
    except pymongo.errors.ConnectionFailure:
        return False
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        return False
    finally:
        # Close the client so each check doesn't leak connections and monitor threads.
        client.close()

def trigger_failover():
    print("Primary MongoDB cluster is unresponsive. Initiating failover...")
    # --- Failover logic (simplified) ---
    # This sketch assumes the RBX standby is a separate replica set that is already
    # initialized and can elect its own primary, so "promotion" means confirming
    # RBX has a writable primary and then redirecting the application to it.
    # If the RBX members instead belong to the GRA replica set, losing GRA leaves
    # them without a voting majority and a forced reconfiguration is required
    # (see "Promote Standby" below). That path depends on your setup and isn't shown here.
    rbx_client = pymongo.MongoClient("mongo-primary-rbx.yourdomain.com:27017", serverSelectionTimeoutMS=5000)
    try:
        # MongoClient connects lazily, so creating it proves nothing. 'ping' is sent
        # to the primary by default and fails unless RBX has a reachable primary.
        rbx_client.admin.command('ping')
        print("RBX replica set has a reachable primary and is now the target.")

        # Notify other systems (e.g., Laravel application) to switch connection strings.
        notify_application_failover()

        # Optional: Send alerts
        send_alert("MongoDB failover triggered to RBX region.")

    except Exception as e:
        print(f"Failed to promote RBX replica set or notify application: {e}")
        send_alert(f"MongoDB failover failed: {e}")
    finally:
        rbx_client.close()

def notify_application_failover():
    # This function would contain logic to inform your Laravel application
    # to switch its MongoDB connection string. This could involve:
    # - Updating a configuration file on the application servers and restarting services.
    # - Using an API to update dynamic configuration in a service discovery system.
    # - Triggering a deployment pipeline to update the application's environment variables.
    print("Notifying Laravel application to switch MongoDB connection...")
    # Example: triggering a webhook (replace with your actual endpoint).
    response = requests.post(
        "https://your-app-api.yourdomain.com/mongodb-failover",
        json={"new_primary": "mongo-primary-rbx.yourdomain.com:27017"},
        timeout=10,  # don't let a hung endpoint stall the failover indefinitely
    )
    # Raise on 4xx/5xx (and let connection errors propagate) so the caller,
    # trigger_failover(), reports the failure and sends an alert.
    response.raise_for_status()
    print("Application notified successfully.")

def send_alert(message):
    print(f"ALERT: {message}")
    # Implement actual alerting mechanism (e.g., PagerDuty, Slack, email)
    # Example: Sending a POST request to a Slack webhook
    # slack_webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
    # if slack_webhook_url:
    #     requests.post(slack_webhook_url, json={"text": f"ALERT: {message}"})

if __name__ == "__main__":
    consecutive_failures = 0
    while True:
        if check_mongo_health():
            if consecutive_failures > 0:
                print("Primary MongoDB cluster is back online.")
                consecutive_failures = 0
        else:
            consecutive_failures += 1
            print(f"Primary MongoDB cluster is down. Failure count: {consecutive_failures}/{FAILOVER_THRESHOLD}")
            if consecutive_failures >= FAILOVER_THRESHOLD:
                trigger_failover()
                # After triggering failover, we might want to stop monitoring the old primary
                # or re-evaluate the monitoring strategy. For simplicity, we'll break here
                # or implement a more sophisticated state management.
                break # Exit after triggering failover for this run

        time.sleep(CHECK_INTERVAL)
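The break in the loop above is the bluntest form of state management: the monitor simply exits after a failover. One alternative, sketched here with a hypothetical FailureTracker class, keeps the loop running but latches after the first trigger, so a flapping GRA primary cannot fire trigger_failover() repeatedly and failback stays a deliberate, manual step.

```python
class FailureTracker:
    """Counts consecutive failed health checks and signals failover at most once."""

    def __init__(self, threshold: int):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_failures = 0
        self.failover_triggered = False

    def record(self, healthy: bool) -> bool:
        """Record one check result; return True only on the check that should start failover."""
        if healthy:
            # Any success resets the streak (but does not undo a failover already started).
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold and not self.failover_triggered:
            self.failover_triggered = True
            return True
        return False
```

In the main loop, tracker = FailureTracker(FAILOVER_THRESHOLD) followed by if tracker.record(check_mongo_health()): trigger_failover() would replace both the manual counter and the break.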

This script needs to be deployed on a reliable instance within OVH (preferably in a different zone/region than your primary and standby databases) and run continuously. The trigger_failover function is the critical part. It needs to:

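Promote Standby: If your standby is a separate replica set, make sure it is already initialized and able to elect its own primary, so that promotion amounts to confirming it has a writable primary; this might involve connecting to one of its members and running commands like rs.reconfig() if its configuration needs adjusting. If instead the RBX members belong to the *same* replica set as GRA, remember that an election needs a majority of voting members. With three voting members in GRA, losing the whole region leaves RBX without a majority, so no primary is elected automatically. You would then have to force a reconfiguration (rs.reconfig(cfg, {force: true})) from a surviving RBX member, accepting that writes not yet replicated to RBX are lost, or distribute voting members so no single region holds a majority (for example two in GRA, two in RBX and one in a third location). Either way, this requires careful network setup and an understanding of election timeouts.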
Update Application Configuration: The most crucial step is informing your Laravel application to switch its database connection string to the new primary in RBX.
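When the RBX members share a replica set with GRA and the region holding the voting majority is gone, the surviving members can only get a primary through a forced reconfiguration. A sketch of that last-resort step, under assumptions: client is a PyMongo MongoClient connected with directConnection=True to a surviving RBX member, surviving_hosts lists RBX host:port strings exactly as they appear in the config, and the helper names are hypothetical. Writes acknowledged in GRA but not yet replicated to RBX are lost, and the old GRA members must not simply rejoin without a resync.

```python
from typing import Any


def build_forced_config(current: dict, surviving_hosts: set[str]) -> dict:
    """Return a copy of the replica set config that keeps only the surviving members."""
    members = [m for m in current["members"] if m["host"] in surviving_hosts]
    if not members:
        raise ValueError("none of the configured members are in surviving_hosts")
    # Assumption: 'term' (present since MongoDB 4.4) is server-managed, so it is dropped here.
    new_config = {k: v for k, v in current.items() if k != "term"}
    new_config["members"] = members
    new_config["version"] = current["version"] + 1
    return new_config


def force_reconfig(client: Any, surviving_hosts: set[str]) -> dict:
    """Force-apply the reduced config through a directly connected surviving member."""
    current = client.admin.command("replSetGetConfig")["config"]
    new_config = build_forced_config(current, surviving_hosts)
    # Sends {replSetReconfig: new_config, force: true}, like rs.reconfig(cfg, {force: true}).
    return client.admin.command("replSetReconfig", new_config, force=True)
```

Keeping the config manipulation in a pure function makes it easy to dry-run in a drill: print build_forced_config(...) and review it before anything is applied.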

Integrating with Laravel for Connection Switching

Laravel’s database configuration is typically managed via config/database.php. For dynamic switching during a failover, we need an external mechanism to update this configuration or provide connection details at runtime.

Several strategies can be employed:

Environment Variables: The most common and recommended approach. The Laravel application reads database credentials (host, port, database name, username, password) from environment variables (e.g., .env file or system environment variables). The failover script would then update these environment variables on the application servers and trigger a graceful restart of the PHP-FPM process or the entire application server.
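Configuration Caching: Be mindful of Laravel's configuration caching. Once you run php artisan config:cache, Laravel stops loading the .env file and serves the values env() returned when the cache was built, so changes to either config/database.php or .env have no effect until you rebuild the cache (php artisan config:cache) or clear it (php artisan config:clear). Environment variables alone do not bypass this.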
Service Discovery: For more complex microservice architectures, a service discovery tool (like Consul or etcd) can store the current MongoDB primary endpoint. The failover script updates the service discovery record, and the Laravel application queries it to get the current connection details.
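On the failover side, updating a service discovery record can be a single HTTP call. A minimal sketch for Consul, assuming a local Consul agent on its default HTTP port (8500), no ACL token (otherwise you would add an X-Consul-Token header), and a hypothetical key name mongodb/primary. It writes through Consul's KV HTTP API (PUT /v1/kv/<key>).

```python
import urllib.request

CONSUL_ADDR = "http://127.0.0.1:8500"  # assumption: local agent, default HTTP port
PRIMARY_KEY = "mongodb/primary"        # hypothetical key name


def build_consul_put(new_primary: str, consul_addr: str = CONSUL_ADDR,
                     key: str = PRIMARY_KEY) -> urllib.request.Request:
    """Build a PUT request that stores the new primary endpoint under the KV key."""
    return urllib.request.Request(
        f"{consul_addr}/v1/kv/{key}",
        data=new_primary.encode("utf-8"),
        method="PUT",
    )


def publish_primary(new_primary: str) -> None:
    """Send the KV write; urlopen raises HTTPError on non-2xx responses."""
    with urllib.request.urlopen(build_consul_put(new_primary), timeout=5) as resp:
        # Consul answers a KV PUT with the body "true" or "false".
        if resp.read().strip() != b"true":
            raise RuntimeError("Consul did not acknowledge the KV write")
```

The Laravel side could then read the same key with GET /v1/kv/mongodb/primary?raw, which returns the stored value without Consul's JSON envelope.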

Let’s illustrate the environment variable approach. Your .env file might look like this:

DB_CONNECTION=mongodb
DB_HOST=mongo-primary-gra.yourdomain.com
DB_PORT=27017
DB_DATABASE=myapp_db
DB_USERNAME=db_user
DB_PASSWORD=secret_password

In your Laravel application's config/database.php, you'd read these (this assumes the mongodb/laravel-mongodb package, formerly jenssegers/laravel-mongodb, which provides the mongodb driver):

'mongodb' => [
    'driver' => 'mongodb',
    'host' => env('DB_HOST', 'localhost'),
    'port' => env('DB_PORT', 27017),
    'database' => env('DB_DATABASE'),
    'username' => env('DB_USERNAME'),
    'password' => env('DB_PASSWORD'),
    'options' => [
        // Authentication database (authSource), not the application database
        'database' => env('DB_AUTHENTICATION_DATABASE', 'admin'),
    ],
],
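Where plain SSH is simpler than a configuration management tool, the failover side of the environment-variable approach comes down to rewriting a few KEY=value lines. A minimal sketch with hypothetical helper names that, like Ansible's lineinfile, replaces existing keys and appends missing ones. It assumes simple unquoted KEY=value lines and that it runs as the file's owner (permissions are copied, ownership is not).

```python
import os
import tempfile


def update_env(text: str, updates: dict[str, str]) -> str:
    """Return .env text with matching KEY=value lines replaced; missing keys are appended."""
    remaining = dict(updates)
    lines = text.splitlines()
    for i, line in enumerate(lines):
        key, sep, _ = line.partition("=")
        key = key.strip()
        if sep and key in remaining:
            lines[i] = f"{key}={remaining.pop(key)}"
    lines.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"


def rewrite_env_file(path: str, updates: dict[str, str]) -> None:
    """Rewrite the file atomically so PHP never reads a half-written .env."""
    with open(path, encoding="utf-8") as fh:
        new_text = update_env(fh.read(), updates)
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False, encoding="utf-8") as tmp:
        tmp.write(new_text)
    os.chmod(tmp.name, os.stat(path).st_mode)  # keep the original file permissions
    os.replace(tmp.name, path)  # atomic rename within the same filesystem
```

After rewriting, you still need to run php artisan config:cache (if you cache configuration) and restart PHP-FPM before the new host takes effect.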

The failover script would then execute commands on the Laravel application servers to update the environment variables and restart services. This could be done via SSH or a configuration management tool like Ansible.

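# Example Ansible playbook snippet to update .env, rebuild the config cache and restart php-fpm
- name: Update .env file for MongoDB failover
  lineinfile:
    path: /var/www/html/your_laravel_app/.env
    regexp: "^DB_HOST="
    line: "DB_HOST=mongo-primary-rbx.yourdomain.com"
  notify: Restart PHP-FPM

- name: Update .env file for MongoDB failover (Port)
  lineinfile:
    path: /var/www/html/your_laravel_app/.env
    regexp: "^DB_PORT="
    line: "DB_PORT=27017"
  notify: Restart PHP-FPM

# ... other DB variables ...

# Rebuild the config cache from the updated .env. Once config is cached, Laravel
# no longer reads .env, so skipping this leaves the old host in place.
# (Use "php artisan config:clear" instead if you don't cache config.)
- name: Rebuild config cache
  command: php artisan config:cache
  args:
    chdir: /var/www/html/your_laravel_app

# Long-running queue workers keep the old config in memory; signal them to restart.
- name: Restart queue workers
  command: php artisan queue:restart
  args:
    chdir: /var/www/html/your_laravel_app

# Handlers section in the playbook
# handlers:
#   - name: Restart PHP-FPM
#     service:
#       name: php8.1-fpm # Adjust version as needed
#       state: restarted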

The notify_application_failover function in the Python script would orchestrate these Ansible commands or execute them directly via SSH. A crucial aspect is ensuring the application can gracefully handle the restart and reconnect to the new database primary.
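If notify_application_failover drives Ansible rather than a webhook, it only needs to run ansible-playbook with the new host passed in. A sketch under assumptions: the tasks above are saved as a playbook (the inventory and playbook paths are hypothetical) whose lineinfile tasks use "DB_HOST={{ mongo_host }}" and "DB_PORT={{ mongo_port }}" instead of hard-coded values, and the helper names are illustrative.

```python
import json
import subprocess


def build_playbook_cmd(inventory: str, playbook: str, new_host: str,
                       new_port: int = 27017) -> list[str]:
    """ansible-playbook invocation passing the new MongoDB endpoint as extra vars."""
    # --extra-vars accepts a JSON string, which avoids shell quoting issues.
    extra_vars = json.dumps({"mongo_host": new_host, "mongo_port": new_port})
    return ["ansible-playbook", "-i", inventory, playbook, "--extra-vars", extra_vars]


def run_failover_playbook(inventory: str, playbook: str, new_host: str) -> str:
    """Run the playbook and return its output; raises CalledProcessError or TimeoutExpired."""
    result = subprocess.run(
        build_playbook_cmd(inventory, playbook, new_host),
        check=True, capture_output=True, text=True, timeout=600,
    )
    return result.stdout
```

Because run_failover_playbook raises on failure, calling it from notify_application_failover lets trigger_failover's error handling send the "failover failed" alert.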

OVH Specific Considerations

When deploying across OVH regions (e.g., GRA and RBX), consider:

Network Latency: Ensure your application servers can reach both database regions with acceptable latency. Cross-region traffic costs and latency are factors.
IP Whitelisting: If your MongoDB instances have strict firewall rules, ensure the monitoring server and application servers have access to the database endpoints in both regions.
OVH Public Cloud vs. Dedicated Servers: The automation approach might differ slightly. Public Cloud offers more API-driven control (e.g., for snapshots, instance management), while dedicated servers might rely more on SSH and direct OS-level commands.
Monitoring Server Location: Place your monitoring server in a third, independent OVH region or zone, so that a single regional outage cannot take down a database region together with the monitor that is supposed to detect its failure.
Data Synchronization: For the standby RBX cluster, ensure data is replicated or restored frequently enough to minimize data loss during a failover. This might involve setting up MongoDB replication across regions, using OVH snapshots, or custom data sync scripts.
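When RBX members are part of the GRA replica set, you can watch your data-loss window directly by comparing each secondary's last applied operation time with the primary's. A sketch assuming the input is the document returned by client.admin.command("replSetGetStatus") through PyMongo (which decodes optimeDate as a datetime); the helper names and the lag budget are illustrative. With the dump-and-restore approach, the equivalent window is simply the time since the last successful run.

```python
from datetime import datetime


def replication_lag_seconds(status: dict) -> dict[str, float]:
    """Seconds each SECONDARY trails the PRIMARY, from a replSetGetStatus result."""
    members = status["members"]
    primary = next((m for m in members if m.get("stateStr") == "PRIMARY"), None)
    if primary is None:
        raise RuntimeError("replica set currently has no PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        # Arbiters hold no data; skip anything without a usable optimeDate.
        if m.get("stateStr") == "SECONDARY" and isinstance(m.get("optimeDate"), datetime)
    }


def lagging_members(status: dict, max_lag_seconds: float) -> list[str]:
    """Members whose lag exceeds the data-loss budget you are willing to accept."""
    lags = replication_lag_seconds(status)
    return sorted(name for name, lag in lags.items() if lag > max_lag_seconds)
```

Calling lagging_members from the monitoring script and alerting when RBX shows up gives early warning that a failover right now would lose more data than planned.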

Testing and Refinement

Thorough testing is paramount. Simulate failures by:

Stopping the MongoDB service on the primary GRA instance.
Simulating network partitions.
Testing the entire failover process end-to-end, including application configuration updates and restarts.
Verifying data integrity after failover.
Testing the failback process (returning to the original primary once it’s restored).
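For the data-integrity check, a cheap first pass is comparing per-collection document counts taken from GRA before a drill (with writes paused) and from RBX after failover. A sketch with hypothetical helpers; db is assumed to be a PyMongo Database. Counts catch missing or extra documents but not modified ones, so treat this as a smoke test rather than proof of integrity.

```python
from typing import Any, Optional


def collection_counts(db: Any) -> dict[str, int]:
    """Document count per collection (full counts, so this can be slow on large collections)."""
    return {name: db[name].count_documents({}) for name in db.list_collection_names()}


def diff_counts(before: dict[str, int],
                after: dict[str, int]) -> dict[str, tuple[Optional[int], Optional[int]]]:
    """Collections whose counts differ; None means the collection is missing on that side."""
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }
```

An empty result from diff_counts(collection_counts(gra_db), collection_counts(rbx_db)) is a good sign; anything else points at the collections to inspect first.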

Automated failover is a complex but essential component of a robust disaster recovery strategy. By combining MongoDB’s replication capabilities with external monitoring and orchestration, you can build a resilient system on OVH’s infrastructure that minimizes downtime.
