Disaster Recovery 101: Architecting Auto-Failovers for MongoDB and WooCommerce Deployments on OVH
Establishing a Robust MongoDB Replica Set for High Availability
A foundational element for any disaster recovery strategy involving MongoDB is a properly configured replica set. This ensures data redundancy and provides automatic failover capabilities. For this architecture, we’ll assume a primary, a secondary, and an arbiter node. The arbiter holds no data but votes in elections, giving the set the odd number of voting members it needs to elect a primary by majority when one data-bearing node is lost. We’ll deploy these across different OVH Availability Zones for maximum resilience.
First, ensure MongoDB is installed on your chosen OVH instances. The configuration file, typically located at /etc/mongod.conf, needs to be adjusted for replica set operation. Key parameters include replication.replSetName, net.bindIp, and net.port.
MongoDB Configuration Snippet
# /etc/mongod.conf
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true  # Omit on MongoDB 6.1+, where journaling is always on
net:
  port: 27017
  bindIp: 0.0.0.0  # Or specific IPs for security
replication:
  replSetName: "rs0"  # The name of your replica set
processManagement:
  fork: true
  pidFilePath: /var/run/mongodb/mongod.pid
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
security:
  keyFile: /etc/mongodb-keyfile  # Path to your keyfile
  authorization: enabled
Before starting MongoDB, make sure the keyfile referenced in the configuration exists on every node with restricted permissions. Create it on one node and distribute it securely to the others:
sudo sh -c 'openssl rand -base64 756 > /etc/mongodb-keyfile'
sudo chmod 400 /etc/mongodb-keyfile
sudo chown mongodb:mongodb /etc/mongodb-keyfile
Then, on each node, start the MongoDB service and enable it at boot:
sudo systemctl start mongod
sudo systemctl enable mongod
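Once all nodes are running, initiate the replica set configuration from one of the nodes (typically the one you intend to be the initial primary). Because authorization is enabled and no users exist yet, connect from the node itself so that the localhost exception applies, and create an administrative user once the set is initiated. Open the MongoDB shell (mongosh on MongoDB 5.0 and later, the legacy mongo binary on older releases):
mongosh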
Inside the MongoDB shell, initiate the replica set:
rs.initiate(
  {
    _id: "rs0",
    members: [
      { _id: 0, host: "mongo-node-1.example.com:27017" },
      { _id: 1, host: "mongo-node-2.example.com:27017" },
      { _id: 2, host: "mongo-arbiter.example.com:27017", arbiterOnly: true }
    ]
  }
)
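The same configuration document can also be assembled and sanity-checked in code before it is submitted. The following is a minimal sketch, not part of the original setup: build_replset_config is a hypothetical helper that produces the document shown above and rejects a member list with an even number of voting members. Submitting the result is only described in a comment, since that step needs a running node.

```python
def build_replset_config(name, data_hosts, arbiter_host=None):
    """Return a config document like the one passed to rs.initiate()."""
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    if arbiter_host is not None:
        members.append(
            {"_id": len(members), "host": arbiter_host, "arbiterOnly": True}
        )
    hosts = [m["host"] for m in members]
    if len(set(hosts)) != len(hosts):
        raise ValueError("each member must have a unique host")
    # Every member votes by default; an odd count avoids tied elections.
    if len(members) % 2 == 0:
        raise ValueError("use an odd number of voting members")
    return {"_id": name, "members": members}


config = build_replset_config(
    "rs0",
    ["mongo-node-1.example.com:27017", "mongo-node-2.example.com:27017"],
    arbiter_host="mongo-arbiter.example.com:27017",
)
print(config)
# With a running node, this document could be submitted through PyMongo's
# generic command interface, e.g. client.admin.command("replSetInitiate", config)
# on a client connected directly to that node (assumed here, not exercised).
```

The check on the member count is a design choice for this sketch: it catches the common mistake of dropping the arbiter and ending up with two voting members.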
Verify the replica set status:
rs.status()
The output lists every member with its stateStr and health fields. A healthy set reports one PRIMARY, one SECONDARY, and one ARBITER, each with health: 1. Run rs.status() again whenever you monitor or troubleshoot the replica set.
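To make that check repeatable, the status document can be reduced to the two facts that matter for failover: which member is primary, and which members are unhealthy. The sketch below assumes a document shaped like rs.status() output; summarize_replset is a hypothetical helper and the sample data is hand-written, not captured from a real deployment.

```python
def summarize_replset(status):
    """Summarize a replSetGetStatus-style document.

    Returns (primary_name_or_None, list_of_unhealthy_member_names).
    """
    primary = None
    unhealthy = []
    for member in status.get("members", []):
        if member.get("stateStr") == "PRIMARY":
            primary = member["name"]
        # health is 1 for a reachable member and 0 otherwise.
        if member.get("health") != 1:
            unhealthy.append(member["name"])
    return primary, unhealthy


# Hand-written sample shaped like rs.status() output (illustrative only).
sample_status = {
    "set": "rs0",
    "members": [
        {"name": "mongo-node-1.example.com:27017", "health": 1,
         "stateStr": "PRIMARY"},
        {"name": "mongo-node-2.example.com:27017", "health": 0,
         "stateStr": "(not reachable/healthy)"},
        {"name": "mongo-arbiter.example.com:27017", "health": 1,
         "stateStr": "ARBITER"},
    ],
}
primary, unhealthy = summarize_replset(sample_status)
print(primary, unhealthy)
```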
Automating WooCommerce Application Failover with HAProxy
WooCommerce, being a PHP application, typically runs on web servers like Nginx or Apache, with PHP-FPM processing the dynamic content. To achieve automatic failover for the application layer, we’ll employ HAProxy. HAProxy is a high-performance TCP/HTTP load balancer and proxying solution that excels at health checking and redirecting traffic to healthy backend servers.
We’ll set up HAProxy to monitor multiple WooCommerce application instances. If a primary instance becomes unresponsive, HAProxy will automatically direct traffic to a secondary instance. This requires careful configuration of backend server definitions and health checks.
HAProxy Configuration for WooCommerce
# /etc/haproxy/haproxy.cfg
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

listen stats
    bind *:8404
    mode http
    stats enable
    stats uri /stats
    stats realm Haproxy\ Statistics
    stats auth admin:YourSecurePassword # Change this!

frontend http_frontend
    bind *:80
    mode http
    default_backend webservers

backend webservers
    mode http
    balance roundrobin
    option httpchk GET /wp-cron.php HTTP/1.1\r\nHost:\ yourdomain.com # Customize host header
    http-check expect status 200 # Expect a 200 OK for health check
    server web1 192.168.1.10:80 check fall 3 rise 2 # Primary WooCommerce server
    server web2 192.168.1.11:80 check fall 3 rise 2 # Secondary WooCommerce server
    # Add more servers as needed, ideally in different OVH Availability Zones
In this configuration:
- The global and defaults sections set up general logging, timeouts, and error handling.
- listen stats provides a web interface to monitor HAProxy’s status, including backend server health. Replace the placeholder password before deploying.
- frontend http_frontend listens on port 80 for incoming HTTP traffic.
- backend webservers defines the pool of application servers.
- balance roundrobin distributes traffic evenly across the healthy servers, so both instances serve requests and a failed one is simply taken out of rotation. For strict active/passive failover, add the backup keyword to the web2 server line so that it receives traffic only when web1 is down.
- option httpchk configures HAProxy to perform an HTTP GET request to /wp-cron.php on each backend server. This is a commonly used endpoint for WordPress health checks, but each request to it can run due scheduled tasks, so a dedicated static health endpoint is a lighter alternative. Ensure the Host header is set to your domain.
- http-check expect status 200 ensures that HAProxy considers a server healthy only if it responds with an HTTP 200 status code.
- server web1 ... check fall 3 rise 2 defines a backend server. check enables health checking. fall 3 means the server is considered down after 3 consecutive failed checks. rise 2 means the server is considered up after 2 consecutive successful checks.
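The fall and rise counters are easiest to understand by replaying a sequence of check results. The following is a simplified model written for illustration, not HAProxy's actual implementation: simulate_checks is a hypothetical function that ignores check timing and HAProxy's intermediate states and only tracks consecutive results.

```python
def simulate_checks(results, fall=3, rise=2, initially_up=True):
    """Replay check results (True = passed) and return the server state
    after each check, using fall/rise counting."""
    up = initially_up
    streak = 0  # consecutive results that contradict the current state
    states = []
    for passed in results:
        if passed == up:
            # The result agrees with the current state: reset the counter.
            streak = 0
        else:
            streak += 1
            if (up and streak >= fall) or (not up and streak >= rise):
                up = not up
                streak = 0
        states.append("UP" if up else "DOWN")
    return states


# One pass, three failures, then two passes.
history = simulate_checks([True, False, False, False, True, True])
print(history)  # ['UP', 'UP', 'UP', 'DOWN', 'DOWN', 'UP']
```

With HAProxy's default check interval of 2 seconds (the inter parameter, which the configuration above does not set), fall 3 means a failed server is removed after roughly six seconds, plus any time spent waiting for checks to time out.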
After installing and configuring HAProxy, start and enable the service:
sudo systemctl start haproxy
sudo systemctl enable haproxy
To test the failover, you can stop the web server process on the primary WooCommerce instance (e.g., sudo systemctl stop apache2 or sudo systemctl stop nginx). HAProxy should detect the failure and automatically start sending traffic to the secondary instance. You can observe this in the HAProxy stats page.
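The same observation can be made without a browser, because the stats page is also available as CSV when ;csv is appended to the stats URI. The sketch below only parses such text; backend_server_status is a hypothetical helper, and the sample is an abbreviated, hand-written stand-in for real output, which has many more columns.

```python
import csv
import io


def backend_server_status(stats_csv, backend="webservers"):
    """Map server name -> status for one backend in HAProxy's CSV stats."""
    # The header line starts with "# "; strip it so the column names are clean.
    text = stats_csv.lstrip("# ")
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["svname"]: row["status"]
        for row in reader
        if row["pxname"] == backend
        and row["svname"] not in ("FRONTEND", "BACKEND")
    }


# Abbreviated sample in the shape of the CSV export (illustrative only).
sample = (
    "# pxname,svname,status\n"
    "http_frontend,FRONTEND,OPEN\n"
    "webservers,web1,DOWN\n"
    "webservers,web2,UP\n"
    "webservers,BACKEND,UP\n"
)
statuses = backend_server_status(sample)
print(statuses)  # {'web1': 'DOWN', 'web2': 'UP'}
```

For the configuration above, the CSV text would come from /stats;csv on port 8404, fetched with the stats credentials.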
Orchestrating Database and Application Failover with a Health Check Script
While MongoDB replica sets handle database failover automatically, and HAProxy handles application server failover, coordinating these events and ensuring a seamless transition for the user often requires an external orchestration layer. This is particularly true if your application needs to know which MongoDB node is the current primary to ensure writes are directed correctly, or if you need to perform application-level readiness checks before a new application server takes over.
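For the write-routing concern specifically, official MongoDB drivers generally do not need to be told which node is primary: given a connection string that lists seed members and names the replica set, they discover the topology and follow elections themselves. A minimal sketch of building such a string follows. build_replica_set_uri is a hypothetical helper, and retryWrites is spelled out explicitly even though recent drivers typically enable it by default.

```python
from urllib.parse import urlencode


def build_replica_set_uri(hosts, replica_set, database="", **options):
    """Build a mongodb:// URI that lists several seed members."""
    params = {"replicaSet": replica_set, **options}
    return "mongodb://{}/{}?{}".format(
        ",".join(hosts), database, urlencode(params)
    )


uri = build_replica_set_uri(
    ["mongo-node-1.example.com:27017", "mongo-node-2.example.com:27017"],
    "rs0",
    retryWrites="true",
)
print(uri)
```

Only the data-bearing members are listed as seeds here; the arbiter holds no data, so there is nothing for the application to read from or write to it.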
A common approach is to use a custom script that periodically checks the health of both the MongoDB replica set and the application servers. This script can then trigger actions, such as updating DNS records, notifying monitoring systems, or even reconfiguring HAProxy if dynamic configuration is required (though HAProxy’s built-in health checks are usually sufficient).
Example Python Health Check Script
import pymongo
import requests
import time
import logging

# --- Configuration ---
MONGO_REPLICA_SET_NAME = "rs0"
MONGO_NODES = [
    "mongodb://mongo-node-1.example.com:27017/?replicaSet={}".format(MONGO_REPLICA_SET_NAME),
    "mongodb://mongo-node-2.example.com:27017/?replicaSet={}".format(MONGO_REPLICA_SET_NAME),
    "mongodb://mongo-arbiter.example.com:27017/?replicaSet={}".format(MONGO_REPLICA_SET_NAME)
]
APP_PRIMARY_URL = "http://web1.example.com/wp-cron.php"  # Assuming web1 is primary
APP_SECONDARY_URL = "http://web2.example.com/wp-cron.php"
HEALTH_CHECK_INTERVAL = 30  # seconds
LOG_FILE = "/var/log/dr_health_check.log"

# --- Logging Setup ---
logging.basicConfig(filename=LOG_FILE, level=logging.INFO,
                    format='%(asctime)s - %(levelname)s - %(message)s')
def get_mongo_primary():
    """Connects to MongoDB and returns the primary node's hostname, or None."""
    for mongo_uri in MONGO_NODES:
        client = pymongo.MongoClient(mongo_uri, serverSelectionTimeoutMS=5000)
        try:
            # ping is cheap and confirms that a server can be selected.
            client.admin.command('ping')
            # replSetGetStatus needs an authenticated user (for example one
            # with the clusterMonitor role) when authorization is enabled;
            # add credentials to the URIs above in that case.
            status = client.admin.command('replSetGetStatus')
            for member in status['members']:
                if member['stateStr'] == 'PRIMARY':
                    return member['name']
        except pymongo.errors.ConnectionFailure as e:
            logging.warning(f"Could not connect to {mongo_uri}: {e}")
        except Exception as e:
            logging.error(f"An error occurred while checking MongoDB: {e}")
        finally:
            # Always release the connection, including on the early return.
            client.close()
    return None
def check_app_health(url):
    """Checks if an application URL is healthy (returns 200 OK)."""
    try:
        response = requests.get(url, timeout=5)
        return response.status_code == 200
    except requests.exceptions.RequestException as e:
        logging.warning(f"Application health check failed for {url}: {e}")
        return False
def main():
    logging.info("Starting DR health check script.")
    while True:
        current_mongo_primary = get_mongo_primary()
        if current_mongo_primary:
            logging.info(f"Current MongoDB primary: {current_mongo_primary}")
            # In a more advanced setup, you might compare this to an expected primary
            # and trigger actions if it changes unexpectedly or is unavailable.
        else:
            logging.error("Could not determine MongoDB primary. Replica set might be down.")
            # Trigger critical alert or automated recovery if possible

        is_primary_app_healthy = check_app_health(APP_PRIMARY_URL)
        is_secondary_app_healthy = check_app_health(APP_SECONDARY_URL)
        if is_primary_app_healthy:
            logging.info(f"Application primary ({APP_PRIMARY_URL}) is healthy.")
        else:
            logging.warning(f"Application primary ({APP_PRIMARY_URL}) is unhealthy.")
            if is_secondary_app_healthy:
                logging.info(f"Application secondary ({APP_SECONDARY_URL}) is healthy. Traffic should be directed here.")
                # Here you would implement actions to switch traffic, e.g.,
                # - Update DNS (if HAProxy is not used or needs re-pointing)
                # - Trigger HAProxy reconfigure (if dynamic)
                # - Send alerts
            else:
                logging.error("Both application primary and secondary are unhealthy.")
                # Trigger critical alert

        time.sleep(HEALTH_CHECK_INTERVAL)


if __name__ == "__main__":
    main()
This Python script uses pymongo to query the MongoDB replica set status and requests to check the health of the application endpoints. It runs in an infinite loop, performing checks at a defined interval. If the MongoDB primary cannot be determined or if the primary application server is unhealthy while the secondary is healthy, it logs the events. In a production environment, you would extend this script to:
- Send alerts via email, Slack, or PagerDuty.
- Trigger automated DNS updates (e.g., using OVH’s API to change A records).
- Execute commands to reconfigure HAProxy if its built-in health checks are insufficient or if you need to switch to a completely different set of servers.
- Perform application-specific readiness checks beyond a simple HTTP 200.
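Most of these extensions start from the same building block: noticing that something changed between two polls, so that an alert fires once per event rather than on every loop iteration. The following is a minimal sketch of that idea for the MongoDB primary. PrimaryWatcher is a hypothetical class, and the notify callable stands in for whatever alerting channel you use.

```python
class PrimaryWatcher:
    """Remember the last seen primary and report transitions."""

    def __init__(self, notify):
        self.notify = notify      # callable taking one message string
        self.last_primary = None
        self.seen_any = False     # False until the first poll is recorded

    def observe(self, primary):
        """Record the latest poll result; notify only when it changed."""
        if self.seen_any and primary != self.last_primary:
            if primary is None:
                self.notify(f"MongoDB primary lost (was {self.last_primary})")
            else:
                self.notify(
                    f"MongoDB primary changed: {self.last_primary} -> {primary}"
                )
        self.last_primary = primary
        self.seen_any = True


# Collect messages in a list here; a real notify would post to an alert channel.
messages = []
watcher = PrimaryWatcher(notify=messages.append)
for polled in ["node-1:27017", "node-1:27017", None, "node-2:27017"]:
    watcher.observe(polled)
print(messages)
```

In the script above, the loop body would call watcher.observe(get_mongo_primary()) once per iteration, with the watcher created before the loop starts.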
To run this script reliably, consider deploying it using a process manager like systemd or a container orchestration platform. Ensure the necessary Python libraries (pymongo, requests) are installed in the environment where the script runs.
OVH Specific Considerations and Best Practices
When architecting disaster recovery solutions on OVH, several platform-specific aspects are crucial:
- Availability Zones (AZs): Deploy your MongoDB nodes and application servers across different OVH Availability Zones. This is the most effective way to protect against datacenter-level failures.
- Networking: Ensure your firewall rules (OVH Security Groups or instance-level firewalls) allow traffic between your MongoDB nodes, application servers, and HAProxy instances. For MongoDB, this typically means allowing port 27017. For HAProxy, it means allowing traffic on port 80 (or your application’s port) and the stats port (e.g., 8404).
- IP Addresses: Use private IP addresses for inter-server communication within OVH’s network for better security and performance. Public IPs should be managed by HAProxy or DNS for external access.
- Monitoring: Leverage OVH’s monitoring tools in conjunction with your custom scripts and HAProxy stats. Monitor CPU, memory, disk I/O, and network traffic on all instances.
- Backups: While replication provides high availability, it is not a substitute for backups. Implement a robust backup strategy for your MongoDB data, storing backups off-site or in a separate OVH region.
- DNS Management: If you’re not solely relying on HAProxy for external access, consider using OVH’s DNS services. Automated failover might involve updating DNS records to point to a new HAProxy instance or a different set of application servers.
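The networking point in particular is easy to verify before relying on failover: if a firewall rule blocks port 27017 between zones, the replica set cannot hold an election when it matters. The sketch below is a plain TCP reachability probe using only the standard library. port_is_open and unreachable are hypothetical helpers, and a successful connection shows only that the port accepts connections, not that the service behind it is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def unreachable(targets, timeout=3.0):
    """Return the (host, port) pairs that did not accept a connection."""
    return [(h, p) for h, p in targets if not port_is_open(h, p, timeout)]


# Example call, run from the HAProxy or application host (hosts taken from
# the examples in this article; adjust to your deployment):
#   unreachable([("mongo-node-1.example.com", 27017), ("192.168.1.10", 80)])
```

Running the probe from each host toward every peer it must reach gives a quick matrix of which firewall rules are still missing.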
By combining MongoDB’s native replication, HAProxy’s intelligent load balancing and health checking, and a custom orchestration script, you can build a highly available and resilient WooCommerce deployment on OVH that automatically recovers from common failure scenarios.