Disaster Recovery 101: Architecting Auto-Failovers for MongoDB and PHP Deployments on Linode
Establishing a MongoDB Replica Set for High Availability
A robust disaster recovery strategy for MongoDB hinges on implementing a replica set. This ensures data redundancy and automatic failover. We’ll focus on a three-node setup for quorum and resilience, deployed on Linode instances. Each node will run MongoDB, and we’ll configure them to communicate as a replica set.
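First, ensure MongoDB is installed on each Linode instance. For this example, we’ll assume Ubuntu 22.04 LTS. Ubuntu's own archive does not provide a current MongoDB server package, so install `mongodb-org` from MongoDB's official repository. It provides the `mongod` service and the `mongosh` shell used below (MongoDB 7.0 is shown; adjust the version in the key and repository paths as needed):
Installing MongoDB
On each Linode node (e.g., `mongo-node-1`, `mongo-node-2`, `mongo-node-3`):
sudo apt update
sudo apt install -y gnupg curl
curl -fsSL https://www.mongodb.org/static/pgp/server-7.0.asc | sudo gpg -o /usr/share/keyrings/mongodb-server-7.0.gpg --dearmor
echo "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/7.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list
sudo apt update
sudo apt install -y mongodb-org
sudo systemctl enable --now mongod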
Configuring MongoDB for Replication
The core of replication configuration lies in the MongoDB configuration file, typically located at /etc/mongod.conf. We need to enable replication and specify a replica set name. The file can be identical on every node; the usual exception is `bindIp`, if you bind each node to its own private IP instead of all interfaces. Routing inter-node traffic over Linode's private network is highly recommended.
On each node, edit /etc/mongod.conf. Ensure the following sections are present and correctly configured:
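# /etc/mongod.conf
storage:
  dbPath: /var/lib/mongodb
  # No journal block: journaling is always on since MongoDB 6.1, which no longer accepts storage.journal.enabled
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
net:
  port: 27017
  bindIp: 0.0.0.0 # Or 127.0.0.1 plus this node's private IP; if you keep 0.0.0.0, firewall port 27017 to members and app servers
replication:
  replSetName: myReplicaSet # This name must be identical on all nodes
# No processManagement.fork: the packaged systemd unit expects mongod to stay in the foreground
security:
  authorization: enabled # Highly recommended for production
  keyFile: /etc/mongodb/keyfile # Shared key for member-to-member authentication; identical on every node
Generate the key file once, for example with `openssl rand -base64 756 > keyfile`, copy the same file to the `keyFile` path on every node, and restrict it with `sudo chown mongodb:mongodb` and `sudo chmod 400` (mongod refuses key files that grant group or world permissions). Then restart the MongoDB service on each node: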
sudo systemctl restart mongod
Initializing the Replica Set
Once all nodes are running with the updated configuration, log in to one of them (e.g., `mongo-node-1`) and open `mongosh` against the local instance. Because authorization is enabled and no users exist yet, connect over the localhost interface: MongoDB's localhost exception permits the initial setup only from there.
mongosh --host 127.0.0.1 --port 27017
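Inside the `mongosh` shell, run the following command. Replace the hostnames with your Linode instances' hostnames or private IPs, and make sure every member and every PHP application server can resolve them: clients discover the replica set through exactly these host strings.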
rs.initiate(
{
_id: "myReplicaSet",
members: [
{ _id: 0, host: "mongo-node-1:27017" },
{ _id: 1, host: "mongo-node-2:27017" },
{ _id: 2, host: "mongo-node-3:27017" }
]
}
)
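Allow a few seconds for the election to finish. Because authorization is enabled, create an administrative user in the `admin` database next, while still connected over localhost and on the primary: the localhost exception only applies until the first user exists. Then authenticate as that user and verify the replica set with rs.status(). Every member should be in the `PRIMARY` or `SECONDARY` state, with exactly one `PRIMARY`.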
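The same check can be scripted from PHP, for example in a deployment or smoke-test script. The sketch below assumes you fetch the `replSetGetStatus` command output (the data rs.status() prints) as plain arrays, and keeps the evaluation in a pure function so it can be tested without a cluster. The function name and its return shape are hypothetical; running `replSetGetStatus` itself requires a user with the `clusterMonitor` role or broader once authorization is enabled.

```php
<?php
// Hypothetical helper: evaluate the `members` array returned by the
// replSetGetStatus command (the same data rs.status() prints in mongosh).
function summarizeReplicaSet(array $members): array
{
    $states = [];
    foreach ($members as $member) {
        $state = $member['stateStr'] ?? 'UNKNOWN';
        $states[$state][] = $member['name'] ?? '?';
    }

    $primaries = $states['PRIMARY'] ?? [];
    $secondaries = $states['SECONDARY'] ?? [];

    return [
        'primary' => $primaries[0] ?? null,
        // Healthy here means: exactly one PRIMARY and every other member SECONDARY.
        'healthy' => count($primaries) === 1
            && count($primaries) + count($secondaries) === count($members),
        'states' => $states,
    ];
}

// Usage (assumes an authenticated $client from the mongodb/mongodb library):
//   $status = $client->selectDatabase('admin')->command(
//       ['replSetGetStatus' => 1],
//       ['typeMap' => ['root' => 'array', 'document' => 'array', 'array' => 'array']]
//   )->toArray()[0];
//   print_r(summarizeReplicaSet($status['members']));
```

An unreachable member typically reports the state string `(not reachable/healthy)`, which this function counts as unhealthy.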
Architecting PHP Application for MongoDB Replica Set Connectivity
Your PHP application needs to be aware of the MongoDB replica set and connect to it appropriately. The MongoDB PHP driver (the `mongodb` extension, usually paired with the `mongodb/mongodb` Composer library) treats the hosts in the connection string as a seed list: it contacts any reachable member, discovers the rest of the replica set topology, and automatically sends writes (and, by default, reads) to the current primary.
Connection String Configuration
The connection string should list every member of the replica set, so the driver can still bootstrap when some of them are down, and include the replicaSet option with the name configured in replSetName.
<?php
// Example using the MongoDB PHP library (composer require mongodb/mongodb) on top of the mongodb extension
require __DIR__ . '/vendor/autoload.php';
// These must match the host strings used in rs.initiate()
$mongoHosts = [
    'mongo-node-1:27017',
    'mongo-node-2:27017',
    'mongo-node-3:27017',
];
// Construct the connection string
$connectionString = 'mongodb://' . implode(',', $mongoHosts) . '/?replicaSet=myReplicaSet&authSource=admin'; // authSource: database where the user is defined
try {
    // Authorization is enabled, so credentials are required; the environment variable names are placeholders
    $client = new MongoDB\Client($connectionString, [
        'username' => getenv('MONGO_USER'),
        'password' => getenv('MONGO_PASSWORD'),
    ]);
    // Select a database
    $database = $client->selectDatabase('your_database_name');
    // Perform operations
    $collection = $database->selectCollection('your_collection_name');
    $result = $collection->insertOne(['name' => 'Test Document']);
    echo "Successfully connected and inserted document with ID: " . $result->getInsertedId();
} catch (MongoDB\Driver\Exception\Exception $e) {
    // Handle connection errors or other MongoDB exceptions
    error_log("MongoDB Connection Error: " . $e->getMessage());
    // Implement fallback logic or display an error message to the user
    echo "Error connecting to the database. Please try again later.";
}
?>
In this example:
- `$mongoHosts`: an array containing the host and port of each MongoDB node.
- `implode(',', $mongoHosts)`: joins the host strings into a comma-separated list.
- `?replicaSet=myReplicaSet`: the critical parameter that tells the driver it is connecting to the replica set named `myReplicaSet`; it must match `replSetName`.
- `&authSource=admin`: the database that holds the user's credentials. It only selects where to authenticate; a username and password must be supplied as well.
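If you prefer to embed credentials in the URI rather than passing `username` and `password` in the client's `$uriOptions` array, they must be percent-encoded, and the `?`/`&` joining is easy to get wrong by hand. A minimal sketch of a builder, assuming you keep hosts and options in configuration; the function name is hypothetical.

```php
<?php
// Hypothetical helper: build a replica set URI from configuration values.
// Pass boolean options as the strings 'true'/'false'; http_build_query()
// would turn PHP booleans into 1/0.
function buildReplicaSetUri(
    array $hosts,
    string $replicaSet,
    ?string $username = null,
    ?string $password = null,
    array $options = []
): string {
    $credentials = '';
    if ($username !== null) {
        // Percent-encode so reserved characters (@ : / ?) cannot break the URI.
        $credentials = rawurlencode($username);
        if ($password !== null) {
            $credentials .= ':' . rawurlencode($password);
        }
        $credentials .= '@';
    }

    // replicaSet first, then any extra options such as authSource.
    $query = http_build_query(['replicaSet' => $replicaSet] + $options, '', '&', PHP_QUERY_RFC3986);

    return 'mongodb://' . $credentials . implode(',', $hosts) . '/?' . $query;
}
```

For example, a password of `p@ss:word` becomes `p%40ss%3Aword` in the URI, which you can then pass to `new MongoDB\Client($uri)`.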
Handling Failovers in PHP
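When configured with a replica set connection string, the MongoDB PHP driver handles failover automatically: if the primary becomes unavailable, it detects the newly elected primary and sends subsequent operations there. Failover is not invisible, however. Elections typically complete in under 12 seconds with default settings, and operations issued in that window can fail; retryable writes and reads (enabled by default in current drivers) retry a failed operation only once. Your application should therefore still handle errors gracefully during temporary unavailability.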
The try...catch block in the PHP code above is essential. It catches exceptions thrown by the MongoDB driver, such as connection errors or query failures. Within the catch block, you should log the error and potentially implement a retry mechanism or inform the user of the issue.
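A retry mechanism can be sketched as a small wrapper with exponential backoff. This assumes the operation is safe to repeat and that you only want to retry errors typical of an election, such as lost connections or server errors the driver labels `RetryableWriteError`. The function name, defaults, and default predicate are illustrative assumptions, not part of the MongoDB library.

```php
<?php
// Hypothetical helper: run $operation, retrying transient failures with
// exponential backoff. Non-retryable exceptions are rethrown immediately.
function withRetry(
    callable $operation,
    int $maxAttempts = 5,
    int $baseDelayMs = 200,
    ?callable $isRetryable = null
) {
    // Default predicate: connection problems, or server errors labelled as
    // retryable (e.g. "not primary" responses during an election).
    $isRetryable ??= static function (Throwable $e): bool {
        if ($e instanceof \MongoDB\Driver\Exception\ConnectionException) {
            return true;
        }
        return $e instanceof \MongoDB\Driver\Exception\RuntimeException
            && $e->hasErrorLabel('RetryableWriteError');
    };

    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (Throwable $e) {
            if ($attempt >= $maxAttempts || !$isRetryable($e)) {
                throw $e;
            }
            // base, 2x base, 4x base, ... plus up to 100 ms of random jitter.
            usleep(($baseDelayMs * (2 ** ($attempt - 1)) + random_int(0, 100)) * 1000);
        }
    }
}
```

Usage might look like `withRetry(fn () => $collection->insertOne($doc))`. Give `$doc` an explicit `_id` first: otherwise each attempt generates a new one, and a retry after an ambiguous failure could insert a second copy. Keep the total backoff well below your web server's request timeout.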
Implementing Auto-Failover with a Load Balancer (Optional but Recommended)
MongoDB’s replica set handles failover for the database tier, but your PHP application servers need their own redundancy. A load balancer in front of several PHP application servers distributes traffic among them and, if one server fails, routes traffic to the healthy ones.
For this scenario, we’ll consider using HAProxy, a popular, high-performance TCP/HTTP load balancer. We’ll configure it to balance traffic across your PHP application servers.
HAProxy Installation and Configuration
Install HAProxy on a dedicated Linode instance, or on an existing server with spare CPU and memory. Note that a single HAProxy instance is itself a single point of failure for incoming traffic.
sudo apt update
sudo apt install -y haproxy
Edit the HAProxy configuration file, typically located at /etc/haproxy/haproxy.cfg.
# /etc/haproxy/haproxy.cfg
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000 # Values without a unit are milliseconds
    timeout client 50000
    timeout server 50000
frontend http_frontend
    bind *:80
    default_backend http_backend
backend http_backend
    balance roundrobin
    option httpchk GET /healthz # Health check endpoint on your PHP app
    server php-app-1 192.168.1.10:80 check # Replace with your PHP app server IPs
    server php-app-2 192.168.1.11:80 check
    server php-app-3 192.168.1.12:80 check
Explanation of the HAProxy configuration:
- `global` and `defaults`: standard HAProxy settings for logging, timeouts, and the process itself.
- `frontend http_frontend`: defines the entry point for incoming HTTP traffic on port 80.
- `backend http_backend`: defines the pool of backend servers (your PHP application servers).
- `balance roundrobin`: sends each new request to the next server in turn. Other algorithms include `leastconn`, which prefers the server with the fewest active connections.
- `option httpchk GET /healthz`: configures HAProxy to perform HTTP health checks. Your PHP application must expose a `/healthz` endpoint that returns a 2xx status code when healthy.
- `server php-app-X ... check`: lists your PHP application servers. The `check` option enables health checking, so HAProxy automatically stops sending traffic to servers that fail their checks.
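To see why a stopped server leaves rotation after a few seconds rather than instantly, it helps to model what `check` does. HAProxy probes each server every `inter` milliseconds (2000 by default), marks it DOWN after `fall` consecutive failures (default 3), and marks it UP again after `rise` consecutive successes (default 2); `roundrobin` then cycles through the servers currently UP. The PHP class below is a toy model of that behavior for reasoning about timings, not HAProxy's implementation; the class and method names are hypothetical.

```php
<?php
// Toy model (not HAProxy source): rise/fall health tracking plus round-robin
// selection over the servers that are currently UP.
final class ToyBackend
{
    private array $up = [];      // server name => bool
    private array $streak = [];  // consecutive results contradicting the current state
    private int $next = 0;       // round-robin cursor

    public function __construct(array $servers, private int $rise = 2, private int $fall = 3)
    {
        foreach ($servers as $name) {
            $this->up[$name] = true;  // assumption: servers start UP
            $this->streak[$name] = 0;
        }
    }

    // Record one health-check result (e.g. whether GET /healthz returned 2xx).
    public function recordCheck(string $name, bool $passed): void
    {
        if ($passed === $this->up[$name]) {
            $this->streak[$name] = 0;  // result agrees with the current state
            return;
        }
        $this->streak[$name]++;
        $needed = $this->up[$name] ? $this->fall : $this->rise;
        if ($this->streak[$name] >= $needed) {
            $this->up[$name] = !$this->up[$name];  // flip UP <-> DOWN
            $this->streak[$name] = 0;
        }
    }

    public function isUp(string $name): bool
    {
        return $this->up[$name];
    }

    // Pick the next UP server in order, or null if every server is DOWN.
    public function pick(): ?string
    {
        $names = array_keys($this->up);
        $count = count($names);
        for ($i = 0; $i < $count; $i++) {
            $candidate = $names[($this->next + $i) % $count];
            if ($this->up[$candidate]) {
                $this->next = ($this->next + $i + 1) % $count;
                return $candidate;
            }
        }
        return null;
    }
}
```

With the defaults, a server that stops answering is marked DOWN after roughly 3 × 2 s = 6 s, during which some requests can still reach it. You can tune this per server, e.g. `server php-app-1 192.168.1.10:80 check inter 2s fall 3 rise 2`.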
After configuring HAProxy, restart the service:
sudo systemctl restart haproxy
PHP Application Health Check Endpoint
To make HAProxy’s health checks effective, your PHP application needs a simple endpoint that indicates its health. This endpoint should check critical dependencies, such as the MongoDB connection.
<?php
// public/healthz.php (serve it at /healthz with a web-server rewrite, or point HAProxy's httpchk at /healthz.php)
require __DIR__ . '/../vendor/autoload.php'; // Composer autoloader; adjust the path to your project layout
header('Content-Type: application/json');
$response = ['status' => 'unhealthy', 'message' => 'Unknown error'];
try {
    // Use the same hosts and replica set name as your main application
    $mongoHosts = [
        'mongo-node-1:27017',
        'mongo-node-2:27017',
        'mongo-node-3:27017',
    ];
    // A short server selection timeout stops a failed check from hanging for the driver's 30-second default
    $connectionString = 'mongodb://' . implode(',', $mongoHosts)
        . '/?replicaSet=myReplicaSet&authSource=admin&serverSelectionTimeoutMS=2000';
    $client = new MongoDB\Client($connectionString);
    // Select the current primary (throws if none is reachable), then ping it
    $primary = $client->getManager()->selectServer(
        new MongoDB\Driver\ReadPreference(MongoDB\Driver\ReadPreference::PRIMARY)
    );
    $primary->executeCommand('admin', new MongoDB\Driver\Command(['ping' => 1]));
    $response['status'] = 'healthy';
    $response['message'] = 'Database connection successful';
    http_response_code(200); // OK
} catch (MongoDB\Driver\Exception\Exception $e) {
    $response['message'] = 'Database connection failed: ' . $e->getMessage();
    http_response_code(503); // Service Unavailable
} catch (Throwable $e) {
    // Throwable also catches PHP Errors (e.g. a missing class), not just Exceptions
    $response['message'] = 'An unexpected error occurred: ' . $e->getMessage();
    http_response_code(500); // Internal Server Error
}
echo json_encode($response);
?>
When accessed over HTTP, this healthz.php script pings the current replica set primary. If that succeeds, it returns 200 OK with a JSON payload indicating health; if not, it returns an error status (503 Service Unavailable for database failures) and a descriptive message. HAProxy monitors this endpoint and automatically removes failing PHP servers from its rotation. Because every PHP server depends on the same replica set, a database outage, or an election that outlasts several checks, can fail all of them at once and leave HAProxy with no healthy backend.
Monitoring and Alerting
A critical component of any disaster recovery strategy is robust monitoring and alerting. You need to be notified proactively when issues arise, not after they’ve impacted users.
MongoDB Monitoring
Utilize tools like:
- MongoDB Atlas Monitoring (if using Atlas, though this guide focuses on self-hosted on Linode)
- Prometheus with the MongoDB Exporter: Scrape metrics from your MongoDB instances (e.g., oplog lag, connection counts, disk usage, replication status) and visualize them in Grafana.
- Nagios/Zabbix: Configure checks for MongoDB service status, replica set health, and key performance indicators.
Key metrics to monitor for MongoDB replication:
- `replSetGetStatus` output: specifically `members[].stateStr` (should be `PRIMARY` or `SECONDARY`) and `members[].optimeDate` (compare each secondary against the primary to detect replication lag).
- Disk space on data directories.
- Network latency between nodes.
- CPU and memory usage.
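Replication lag is the gap between the primary's `optimeDate` and each secondary's. A minimal sketch of that calculation, assuming the `members` array from `replSetGetStatus` has been decoded to PHP arrays with each `optimeDate` converted to a `DateTimeInterface` (with the driver, `MongoDB\BSON\UTCDateTime::toDateTime()` performs that conversion). The function name is hypothetical.

```php
<?php
// Hypothetical helper: seconds each SECONDARY trails the PRIMARY, keyed by host.
// Returns null when no PRIMARY is present (e.g. mid-election).
function replicationLagSeconds(array $members): ?array
{
    $primaryOptime = null;
    foreach ($members as $m) {
        if (($m['stateStr'] ?? '') === 'PRIMARY') {
            $primaryOptime = $m['optimeDate'];
            break;
        }
    }
    if ($primaryOptime === null) {
        return null;
    }

    $lag = [];
    foreach ($members as $m) {
        if (($m['stateStr'] ?? '') !== 'SECONDARY') {
            continue;  // skip the primary, arbiters, and unreachable members
        }
        // Whole seconds between the primary's last applied op and this member's.
        $lag[$m['name']] = $primaryOptime->getTimestamp() - $m['optimeDate']->getTimestamp();
    }
    return $lag;
}
```

Alert when the largest value exceeds a threshold you choose. A secondary that falls further behind than the primary's oplog window can no longer catch up and needs a full resync.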
Application and Infrastructure Monitoring
Monitor your PHP application servers and HAProxy:
- HAProxy Stats: Enable the HAProxy stats page (for example, `stats enable` plus `stats auth` in a frontend or `listen` section; the default URI is /haproxy?stats) to view backend server status in real time.
- Application Performance Monitoring (APM) tools (e.g., New Relic, Datadog): Track application response times, error rates, and database query performance.
- System-level monitoring (e.g., Prometheus Node Exporter, Netdata): Monitor CPU, memory, disk I/O, and network traffic on all servers.
- Log Aggregation (e.g., ELK stack, Graylog): Centralize logs from all servers to easily search and analyze errors.
Set up alerts for:
- MongoDB replica set member down or in an unhealthy state.
- Significant oplog lag.
- HAProxy backend server marked as DOWN.
- High application error rates.
- Low disk space.
- High server resource utilization.
Testing Your Failover Strategy
Regularly testing your failover mechanism is non-negotiable. Simulate failures to ensure your automated processes work as expected and that your team knows how to respond.
Simulating MongoDB Failures
To test MongoDB failover:
- Graceful Shutdown: Stop the primary MongoDB instance using `sudo systemctl stop mongod`. Observe the replica set electing a new primary.
- Abrupt Shutdown: Kill the MongoDB process directly (e.g., `sudo kill -9 $(pgrep mongod)`). This simulates a crash and tests the replica set’s resilience.
- Network Partition: Temporarily block network traffic between MongoDB nodes to simulate network issues.
After each simulation, verify that a new primary is elected and that your PHP application can still connect and operate correctly. Check rs.status() to confirm the state of the replica set.
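To turn "the application can still operate" into a number, you can write to the replica set in a loop while you stop the primary and record how long writes failed. The sketch below assumes a scratch collection you are happy to write to; both function names are hypothetical. The outage arithmetic is a pure function, while the probe loop needs the `mongodb/mongodb` library and a running cluster.

```php
<?php
// Pure helper: given probe results [[timestamp, succeeded], ...] in time order,
// return the longest span (seconds) from a first failure to the next success.
// An outage still in progress at the last probe is not counted.
function longestOutage(array $probes): float
{
    $longest = 0.0;
    $failedSince = null;
    foreach ($probes as [$time, $ok]) {
        if (!$ok && $failedSince === null) {
            $failedSince = $time;  // outage starts
        } elseif ($ok && $failedSince !== null) {
            $longest = max($longest, $time - $failedSince);
            $failedSince = null;   // outage ends
        }
    }
    return $longest;
}

// Probe loop (needs a live cluster): a small write roughly every 250 ms for $seconds.
function probeWrites(MongoDB\Collection $collection, int $seconds): array
{
    $probes = [];
    $end = microtime(true) + $seconds;
    while (microtime(true) < $end) {
        $now = microtime(true);
        try {
            $collection->insertOne(['probedAt' => $now]);
            $probes[] = [$now, true];
        } catch (MongoDB\Driver\Exception\Exception $e) {
            $probes[] = [$now, false];
        }
        usleep(250000);
    }
    return $probes;
}
```

Run `probeWrites()` while executing one of the shutdown scenarios, then feed its result to `longestOutage()`. A failed write can block for the driver's default `serverSelectionTimeoutMS` of 30 seconds, so a lower value in the probe's connection string gives finer resolution.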
Simulating Application Server Failures
To test HAProxy and application server failover:
- Stop HAProxy: `sudo systemctl stop haproxy`. All traffic should stop. Restart it to resume.
- Stop PHP Application Server: Stop the web server (e.g., Apache/Nginx) or the PHP-FPM process on one of your application servers. Observe HAProxy marking that server as DOWN in its stats and rerouting traffic to the remaining healthy servers.
- Simulate Application Unresponsiveness: Modify the `healthz.php` script to always return an error or a 500 status code. HAProxy should detect this and take the server out of rotation.
Ensure that during these tests, users experience minimal disruption, or at least a graceful degradation of service rather than a complete outage.
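Hand-editing `healthz.php` for every test is error-prone, because the change has to be reverted afterwards. One alternative, sketched here as an assumption rather than part of the setup above, is to have the health check honor a flag file: while the file exists the endpoint reports 503, so HAProxy drains that server, and deleting the file returns it to rotation. The path `/var/run/healthz-drain` and the function name are hypothetical.

```php
<?php
// Hypothetical drain switch for healthz.php: returns the HTTP status the
// endpoint should send before it checks MongoDB at all.
function drainStatus(string $flagFile = '/var/run/healthz-drain'): ?int
{
    // 503 makes HAProxy count the check as failed; null means "continue".
    return is_file($flagFile) ? 503 : null;
}

// In healthz.php, before the MongoDB ping:
//   if (($status = drainStatus()) !== null) {
//       http_response_code($status);
//       echo json_encode(['status' => 'draining']);
//       exit;
//   }
```

Running `sudo touch /var/run/healthz-drain` on one application server should take it out of rotation after `fall` failed checks, and removing the file brings it back after `rise` successful ones.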