Server Monitoring Best Practices: Keeping Your PHP App and MongoDB Clusters Alive on OVH

Proactive MongoDB Cluster Health Checks with `mongosh` and `cron`

Maintaining the health of a distributed MongoDB cluster, especially on a cloud platform like OVH, requires more than just reactive alerts. Proactive, automated checks are crucial for identifying potential issues before they impact application performance or availability. This section details a robust approach using the `mongosh` shell and `cron` for scheduled health assessments.

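We’ll focus on three key signals: replication lag, disk space utilization, and member status and connectivity. These checks should be executed regularly. Every 5-15 minutes is a sensible default, with cheap checks such as connectivity run more often and slow-moving ones such as disk space run less often, depending on your cluster’s criticality.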

1. Replication Lag Monitoring

Replication lag is a critical indicator of data synchronization issues. High lag can lead to stale reads from secondaries and increases the amount of data at risk during a failover. The replica set status (the data behind `rs.status()`) reports, for every member, the time of the last operation it has applied.

The following `mongosh` script reads the replica set status and reports the replication lag for each secondary. It calculates the difference between the `optimeDate` of the primary and the `optimeDate` of the last operation applied on each secondary, so the result does not depend on the clock of the monitoring host.

Script: `check_replication_lag.js`

// check_replication_lag.js
// Run with: mongosh "<connection string>" --quiet --file check_replication_lag.js
// The script reuses the connection that mongosh opens from the command line (the global `db`).
var adminDb = db.getSiblingDB("admin");

// Threshold for alerting: maximum acceptable lag in seconds
var lagThreshold = 60;

// Ensure we are connected to a replica set and get its status.
// mongosh throws when the command fails, so catch that as well as checking `ok`.
var replStatus = null;
try {
    replStatus = adminDb.runCommand({ replSetGetStatus: 1 });
} catch (e) {
    print("Error: replSetGetStatus failed: " + e.message);
}

if (!replStatus || !replStatus.ok) {
    print("Error: Could not get replica set status. Are you connected to a replica set member?");
    quit(1);
}

var members = replStatus.members;
var primary = null;
var secondaries = [];

// Identify primary and secondaries
members.forEach(function(member) {
    if (member.stateStr === "PRIMARY") {
        primary = member;
    } else if (member.stateStr === "SECONDARY") {
        secondaries.push(member);
    }
});

if (!primary) {
    print("Error: No primary found. Cluster might be in an unhealthy state.");
    quit(1);
}

print("Primary: " + primary.name);
print("--- Replication Lag ---");

secondaries.forEach(function(secondary) {
    // -1 means the member did not report an optimeDate
    var lagSeconds = -1;
    if (secondary.optimeDate && primary.optimeDate) {
        // Lag is how far the secondary's last applied operation trails the primary's
        var lagMs = primary.optimeDate.getTime() - secondary.optimeDate.getTime();
        lagSeconds = Math.max(0, Math.floor(lagMs / 1000));
    }
    print(secondary.name + ": " + lagSeconds + " seconds lag");

    if (lagSeconds > lagThreshold) {
        print("ALERT: High replication lag on " + secondary.name + " (" + lagSeconds + "s)");
        // In a real-world scenario, you'd trigger an alert here (e.g., send to Slack, PagerDuty)
    }
});

// Report members in any other state (arbiters hold no data, so ARBITER is expected and not flagged)
members.forEach(function(member) {
    if (member.stateStr !== "PRIMARY" && member.stateStr !== "SECONDARY" && member.stateStr !== "ARBITER") {
        print("WARNING: Member " + member.name + " is in state: " + member.stateStr);
    }
});

quit(0);

To automate this, we’ll use `cron`. First, ensure you have a MongoDB user that is allowed to run `replSetGetStatus` (the built-in `clusterMonitor` role is sufficient). Then, create a cron job. It’s recommended to run this script from a dedicated monitoring server or one of your application servers that has network access to the MongoDB cluster.

Cron Job Setup

Edit the crontab for the user that will run the script (e.g., `www-data` or a dedicated `monitor` user):

crontab -e

Add the following line to run the script every 10 minutes:

*/10 * * * * /usr/bin/mongosh "mongodb://user:password@your_mongo_host:27017/admin" --quiet --file /path/to/check_replication_lag.js >> /var/log/mongodb/replication_lag.log 2>&1

Explanation:

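`/usr/bin/mongosh`: Path to your MongoDB shell executable.
`"mongodb://user:password@your_mongo_host:27017/admin"`: Your MongoDB connection string. Crucially, use a dedicated monitoring user with no write access; `replSetGetStatus` requires the `clusterMonitor` role, so a plain `read` role is not enough. Note that cron treats an unescaped `%` as a line break, so a `%` in the connection string (for example in a percent-encoded password) must be written as `\%`.
`--quiet`: Suppresses welcome messages and other non-essential output.
`--file /path/to/check_replication_lag.js`: Specifies the script to execute.
`>> /var/log/mongodb/replication_lag.log 2>&1`: Appends standard output and standard error to a log file. This is essential for debugging and historical analysis. The log directory must be writable by the user that owns the crontab.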

For alerting, you would typically parse the log file or modify the script to send alerts directly (e.g., using `curl` to a webhook or a dedicated monitoring agent). A simple approach is to use `grep` on the log file for “ALERT” or “WARNING” and pipe that to an alerting mechanism.
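As a rough sketch of the `grep` approach, the function below prints only the alert lines added to a log since the previous run, so the same alert is not forwarded on every pass. The function name `new_alert_lines` and the state file that stores the number of lines already scanned are assumptions made for this illustration; they are not part of the scripts in this article.

```shell
#!/bin/bash
# Print ALERT/WARNING/ERROR lines appended to a log since the previous call.
# $1 = log file, $2 = state file holding the number of lines already scanned
# (both paths are chosen by the caller; the state file is a hypothetical helper).
new_alert_lines() {
    local log_file="$1" state_file="$2"
    local seen=0 total

    if [ -f "$state_file" ]; then
        seen=$(cat "$state_file")
    fi

    total=$(wc -l < "$log_file")
    total=$((total + 0))   # normalise padding that some wc implementations add

    # If the log was rotated or truncated, start again from the top
    if [ "$total" -lt "$seen" ]; then
        seen=0
    fi

    # Only look at lines after the ones already scanned
    tail -n "+$((seen + 1))" "$log_file" | grep -E '^(ALERT|WARNING|ERROR):' || true

    echo "$total" > "$state_file"
}

# Example call (placeholder paths):
# new_alert_lines /var/log/mongodb/replication_lag.log /var/tmp/replication_lag.offset
```

The output can then be piped to whatever notification command you use. Lines appended between the `wc` and `tail` calls may be reported twice on the next run, which is usually acceptable for alerting.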

2. Disk Space Monitoring

Running out of disk space is a common cause of database downtime. We need to monitor the disk usage of the data directories for each MongoDB instance.

This can be done using standard Linux tools like `df`. We’ll create a script that checks the usage of the filesystem that holds the directory specified by `dbPath` in your MongoDB configuration.

Script: `check_disk_space.sh`

#!/bin/bash

# Configuration
MONGO_CONF="/etc/mongod.conf" # Path to your mongod configuration file
ALERT_THRESHOLD_PERCENT=85   # Alert if usage exceeds this percentage
LOG_FILE="/var/log/mongodb/disk_space.log"

# --- Function to get MongoDB data path ---
get_mongo_data_path() {
    local conf_file="$1"
    # Print the value of the first "dbPath:" key (normally nested under "storage:"),
    # with surrounding whitespace and quotes removed
    sed -n 's/^[[:space:]]*dbPath:[[:space:]]*//p' "$conf_file" | head -n 1 | tr -d "\"'[:space:]"
}

# --- Main Script Logic ---
echo "--- $(date) ---" >> "$LOG_FILE"

# Find all running mongod processes and their config files
pgrep -x mongod | while read -r pid; do
    cmdline=$(ps -p "$pid" -o args=)
    # Extract the config file path from "--config <file>", "--config=<file>" or "-f <file>"
    conf_file=$(echo "$cmdline" | sed -n -E 's/.*(--config|-f)[ =]+([^ ]+).*/\2/p')

    if [ -z "$conf_file" ]; then
        # No config argument on the command line: fall back to common default paths.
        # This is a heuristic and might need adjustment based on your deployment.
        if [ -f "$MONGO_CONF" ]; then
            conf_file="$MONGO_CONF"
        elif [ -f "/etc/mongodb.conf" ]; then
            conf_file="/etc/mongodb.conf"
        else
            echo "Could not determine config file for PID $pid. Skipping." >> "$LOG_FILE"
            continue
        fi
    fi

    DATA_PATH=$(get_mongo_data_path "$conf_file")

    if [ -z "$DATA_PATH" ]; then
        echo "Could not find dbPath in config file: $conf_file for PID $pid. Skipping." >> "$LOG_FILE"
        continue
    fi

    # Ensure DATA_PATH is a valid directory
    if [ ! -d "$DATA_PATH" ]; then
        echo "Data path '$DATA_PATH' from config '$conf_file' (PID $pid) is not a valid directory. Skipping." >> "$LOG_FILE"
        continue
    fi

    # Get disk usage for the filesystem containing the data path.
    # -P forces one output line per filesystem, so long device names cannot wrap and shift the columns.
    DF_LINE=$(df -Ph "$DATA_PATH" | awk 'NR==2')
    USAGE=$(echo "$DF_LINE" | awk '{print $5}' | tr -d '%')
    FREE_SPACE=$(echo "$DF_LINE" | awk '{print $4}')
    MOUNT_POINT=$(echo "$DF_LINE" | awk '{print $6}')

    echo "Instance (PID: $pid, Conf: $conf_file): Data Path = $DATA_PATH, Mount Point = $MOUNT_POINT, Usage = ${USAGE}%, Free = ${FREE_SPACE}" >> "$LOG_FILE"

    if [ "$USAGE" -gt "$ALERT_THRESHOLD_PERCENT" ]; then
        echo "ALERT: Disk space low on mount point '$MOUNT_POINT' for instance (PID: $pid). Usage: ${USAGE}%, Free: ${FREE_SPACE}." >> "$LOG_FILE"
        # Trigger alert here (e.g., send to Slack, PagerDuty)
    fi
done

echo "" >> "$LOG_FILE"

Important Considerations for `check_disk_space.sh`:

This script assumes a standard Linux environment where `mongod` processes are running and their configuration files can be located.
It attempts to dynamically find the `dbPath` from the MongoDB configuration file. You might need to adjust the `awk` command if your configuration structure is significantly different.
The script iterates through running `mongod` processes to check their respective data directories. This is crucial for multi-instance setups on a single host or if you have different `dbPath` configurations.
Ensure the user running the cron job has read permissions for the MongoDB configuration files and the directories being checked.
The `ALERT_THRESHOLD_PERCENT` should be tuned based on your storage provisioning and acceptable risk.
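To make the threshold logic easier to test and tune, the parsing and the comparison can be pulled out into small functions and exercised against canned `df -P` output. This is a minimal sketch: the function names `usage_percent` and `over_threshold` are illustrative, and the sample numbers are made up rather than taken from a real host.

```shell
#!/bin/bash
# Extract the use% column from `df -P` output read on stdin.
# The POSIX output format prints one line per filesystem, so row 2 is the data row.
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit status 0) when usage is strictly above the threshold.
over_threshold() {
    local usage="$1" threshold="$2"
    [ "$usage" -gt "$threshold" ]
}

# Canned df -P output (made-up values) standing in for: df -P /var/lib/mongodb
sample='Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sdb1 103081248 91234567 6587321 94% /var/lib/mongodb'

usage=$(printf '%s\n' "$sample" | usage_percent)
if over_threshold "$usage" 85; then
    echo "ALERT: usage ${usage}% exceeds 85%"
fi
```

Because the comparison is strict, a filesystem sitting exactly at the threshold does not alert. Choose the threshold with that in mind.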

Cron Job Setup for Disk Space

Add a similar entry to your crontab to run this script, for example, every hour:

0 * * * * /path/to/check_disk_space.sh >> /var/log/mongodb/disk_space.log 2>&1

Again, parsing the log file for “ALERT” messages is a common way to integrate this with your alerting system.

3. MongoDB Instance Health and Connectivity

Beyond replication lag and disk space, we need to ensure the MongoDB instances themselves are running and accessible. A simple `ping` or `nc` check is insufficient because it only shows that the host or port is reachable, not that the MongoDB service is actually responding to commands.

We can use `mongosh` to attempt a connection and execute a simple command like `db.runCommand({ ping: 1 })`. This verifies that the MongoDB server is listening on its port and responding to commands.

Script: `check_mongo_connectivity.sh`

#!/bin/bash

# Configuration
MONGO_HOSTS=("host1:27017" "host2:27017" "host3:27017") # List of MongoDB hosts and ports
MONGO_USER="monitor_user" # Read-only user
MONGO_PASS="your_monitor_password" # Password for the monitor user
AUTH_DB="admin" # Authentication database
ALERT_THRESHOLD_SECONDS=5 # Max acceptable ping response time
LOG_FILE="/var/log/mongodb/connectivity.log"

# --- Main Script Logic ---
echo "--- $(date) ---" >> "$LOG_FILE"

for host_port in "${MONGO_HOSTS[@]}"; do
    host=$(echo "$host_port" | cut -d: -f1)
    port=$(echo "$host_port" | cut -d: -f2)

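    # Use mongosh to ping the server; the timeouts keep the connection attempt short.
    # Printing only the "ok" field yields a bare 1 on success. mongosh prints documents
    # as { ok: 1 } rather than JSON, so matching on '"ok": 1' would never succeed.
    # Note: the measured time includes mongosh's own startup time.
    start_time=$(date +%s)
    response=$(/usr/bin/mongosh "mongodb://${MONGO_USER}:${MONGO_PASS}@${host_port}/${AUTH_DB}?serverSelectionTimeoutMS=2000&connectTimeoutMS=2000" --quiet --eval "db.runCommand({ ping: 1 }).ok" 2>&1)
    end_time=$(date +%s)
    duration=$((end_time - start_time))

    if [ "$(echo "$response" | tr -d '[:space:]')" = "1" ]; then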
        echo "SUCCESS: ${host_port} is reachable and responding. Ping time: ${duration}s" >> "$LOG_FILE"
        if [ "$duration" -gt "$ALERT_THRESHOLD_SECONDS" ]; then
            echo "WARNING: ${host_port} responded slowly. Ping time: ${duration}s (Threshold: ${ALERT_THRESHOLD_SECONDS}s)" >> "$LOG_FILE"
            # Trigger slow response alert
        fi
    else
        echo "ERROR: ${host_port} is unreachable or not responding. Response: '$response'" >> "$LOG_FILE"
        # Trigger critical alert
    fi
done

echo "" >> "$LOG_FILE"

Key points for `check_mongo_connectivity.sh`:

MONGO_HOSTS: A crucial array defining all MongoDB instances (primary and secondaries) you want to monitor.
serverSelectionTimeoutMS and connectTimeoutMS: These parameters in the connection string are vital for preventing the script from hanging indefinitely if a server is unresponsive. Adjust them based on your network latency.
db.runCommand({ ping: 1 }): A lightweight command that confirms the MongoDB server is operational.
The script logs both success and failure, along with response times, providing valuable diagnostics.
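The decision logic of the connectivity check can be separated from the `mongosh` call so it can be tested without a database. The sketch below assumes the check evaluates `db.runCommand({ ping: 1 }).ok`, which prints a bare `1` on success; the function name `classify_ping` is hypothetical.

```shell
#!/bin/bash
# Classify one connectivity check result.
# $1 = text printed by mongosh, $2 = elapsed seconds, $3 = slow threshold in seconds
classify_ping() {
    local response="$1" duration="$2" threshold="$3"
    local last_line

    # Keep only the last non-empty line, trimmed, in case other output precedes the value
    last_line=$(printf '%s\n' "$response" |
        awk 'NF { line = $0 } END { gsub(/^[ \t]+|[ \t]+$/, "", line); print line }')

    if [ "$last_line" != "1" ]; then
        echo "ERROR"
    elif [ "$duration" -gt "$threshold" ]; then
        echo "WARNING"
    else
        echo "SUCCESS"
    fi
}

classify_ping "1" 1 5                                        # prints SUCCESS
classify_ping "1" 7 5                                        # prints WARNING
classify_ping "MongoNetworkError: connect ECONNREFUSED" 2 5  # prints ERROR
```

Keeping the classification in one place also makes it straightforward to map each outcome to a different alert severity later.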

Cron Job Setup for Connectivity

Schedule this script to run frequently, perhaps every minute, to catch transient network issues or service disruptions quickly.

* * * * * /path/to/check_mongo_connectivity.sh >> /var/log/mongodb/connectivity.log 2>&1

Integrating with Alerting Systems

The scripts above generate log files with clear “ALERT”, “WARNING”, or “ERROR” indicators. A robust alerting strategy involves a log monitoring tool (such as the ELK stack, Graylog, or Datadog; Prometheus Alertmanager can handle the notification routing) that can:

Ingest these log files.
Parse the log entries to extract relevant information (host, type of alert, severity).
Trigger notifications via Slack, PagerDuty, email, or other channels based on predefined rules.

Alternatively, you can modify the scripts to directly send alerts. For example, using `curl` to post to a webhook:

# Example snippet to add to scripts for Slack alerts
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"
ALERT_MESSAGE="MongoDB Alert: ${host_port} is unreachable."

curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"${ALERT_MESSAGE}\"}" "$SLACK_WEBHOOK_URL"
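One caveat with interpolating a message straight into the JSON body is that a double quote or backslash in the message (for example, in a captured `mongosh` response) produces invalid JSON. A minimal sketch of escaping the message first is shown below; `json_escape` and `build_slack_payload` are illustrative helper names, and the escaping covers only the most common characters.

```shell
#!/bin/bash
# Escape a string for use inside a JSON string literal.
# Handles backslashes, double quotes, newlines, tabs and carriage returns;
# other control characters are not covered by this sketch.
json_escape() {
    local s="$1"
    s=${s//\\/\\\\}      # backslashes first, so later escapes are not doubled
    s=${s//\"/\\\"}
    s=${s//$'\n'/\\n}
    s=${s//$'\t'/\\t}
    s=${s//$'\r'/\\r}
    printf '%s' "$s"
}

# Build the {"text": "..."} body expected by a Slack incoming webhook.
build_slack_payload() {
    printf '{"text":"%s"}' "$(json_escape "$1")"
}

build_slack_payload 'ERROR: host1:27017 said "connection refused"'
```

The result can be passed to `curl` with `--data "$(build_slack_payload "$ALERT_MESSAGE")"` in place of the hand-written JSON string.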

PHP Application Monitoring on OVH

Monitoring your PHP application involves looking at both its performance and its interaction with external services, particularly your MongoDB cluster. OVH’s infrastructure provides various tools, but a comprehensive approach often combines application-level metrics with infrastructure insights.

1. PHP Error and Exception Tracking

The first line of defense is capturing all errors and exceptions. PHP’s built-in error reporting is essential, but for production, you need a centralized logging and error tracking system.

Configuration: `php.ini`

; Enable all error reporting in development, but log to file in production
display_errors = Off
log_errors = On
error_log = /var/log/php/php_errors.log ; Ensure this directory and file are writable by the web server user (e.g., www-data)

; Set a reasonable error reporting level for production
; (on PHP 8.4 and later, drop "& ~E_STRICT": the E_STRICT constant is deprecated there)
error_reporting = E_ALL & ~E_DEPRECATED & ~E_STRICT

; Increase memory limit if needed for complex operations
memory_limit = 256M

; Set a reasonable execution time limit
max_execution_time = 60

For more advanced tracking, integrate a server-side error tracking service such as Sentry or Bugsnag. These services provide rich context for errors, including stack traces, request details, and user information.
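Even without an external service, the PHP error log can feed the same cron-and-log workflow used for MongoDB. The sketch below counts entries per severity; it assumes the default log line shape `[timestamp] PHP <Severity>:  message`, and the function name `count_php_errors` and the sample log lines are made up for illustration.

```shell
#!/bin/bash
# Count PHP error log entries per severity, reading the log on stdin.
# Assumes PHP's default log line shape: "[timestamp] PHP <Severity>:  message"
count_php_errors() {
    awk '
        match($0, /\] PHP (Fatal error|Parse error|Warning|Notice|Deprecated):/) {
            # Skip the leading "] PHP " (6 characters) and the trailing ":"
            sev = substr($0, RSTART + 6, RLENGTH - 7)
            count[sev]++
        }
        END { for (s in count) print s ": " count[s] }
    ' | sort
}

# Demo with made-up log lines
count_php_errors <<'EOF'
[12-Mar-2024 10:15:32 UTC] PHP Warning:  Undefined variable $user in /var/www/app/index.php on line 12
[12-Mar-2024 10:15:40 UTC] PHP Fatal error:  Uncaught MongoDB\Driver\Exception\ConnectionTimeoutException in /var/www/app/db.php:8
[12-Mar-2024 10:16:02 UTC] PHP Warning:  Undefined array key "id" in /var/www/app/index.php on line 30
EOF
```

In practice you would feed it the real log, for example `count_php_errors < /var/log/php/php_errors.log`, and alert when the count of fatal errors rises.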

2. Performance Monitoring (APM)

Understanding where your PHP application spends its time is critical for optimization. Application Performance Monitoring (APM) tools are invaluable here.

Using the New Relic PHP Agent (Example)

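On an OVH VPS or dedicated server you have root access, so installing an APM agent works as on any other Linux host. If using New Relic:

# Install the agent (example for Debian/Ubuntu; the New Relic apt repository must be configured first)
# The package keeps the name newrelic-php5 for all supported PHP versions
sudo apt update
sudo apt install newrelic-php5

# Configure the agent
sudo newrelic-install install

# Edit the agent's newrelic.ini. newrelic-install places it in your PHP
# configuration directory; run "php --ini" to locate it.
sudo nano /path/to/newrelic.ini

# Ensure these settings are present and uncommented:
newrelic.license = "YOUR_LICENSE_KEY"
newrelic.appname = "Your PHP App Name"
newrelic.enabled = true
newrelic.daemon.loglevel = "info"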

After restarting your web server (e.g., Apache or Nginx) and PHP-FPM, New Relic will start reporting transactions, database calls (including MongoDB queries), and external service calls.

3. Monitoring MongoDB Interactions from PHP

Your APM tool should automatically instrument the MongoDB driver. However, you can also add custom instrumentation to track specific query performance or identify slow queries directly within your PHP code.

Custom Instrumentation Example (using New Relic API)

<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader for mongodb/mongodb

// Only call the New Relic API when the agent extension is loaded
$newrelic_loaded = extension_loaded('newrelic');

// Custom metric names must start with "Custom/"
$metric_name = "Custom/MongoDB/find_active_users";
$start_time = microtime(true);

try {
    // Your MongoDB query logic here
    // Example using MongoDB\Client
    $client = new MongoDB\Client("mongodb://user:password@your_mongo_host:27017");
    $collection = $client->selectCollection('your_db', 'users');

    // find() returns a cursor; iterate it so the timing also covers fetching the results
    $users = $collection->find(['status' => 'active'])->toArray();

    // Process results...

} catch (MongoDB\Driver\Exception\Exception $e) {
    if ($newrelic_loaded) {
        // Attach the exception to the current transaction
        newrelic_notice_error("MongoDB query error", $e);
    }
    // Re-throw or handle error appropriately
    throw $e;
} finally {
    if ($newrelic_loaded) {
        $duration = (microtime(true) - $start_time) * 1000; // Duration in milliseconds

        // Record the timing as a custom metric and add context to the transaction
        newrelic_custom_metric($metric_name, $duration);
        newrelic_add_custom_parameter("mongo_host", "your_mongo_host");
        newrelic_add_custom_parameter("mongo_namespace", "your_db.users");
    }
}
?>

This allows you to pinpoint slow MongoDB operations originating from your PHP code, even if the general APM doesn’t highlight them specifically.

4. OVH-Specific Monitoring Tools

OVH offers several monitoring services:

OVHcloud Control Panel Metrics: Provides basic CPU, RAM, disk I/O, and network traffic for your instances. Regularly check these for unusual spikes or sustained high utilization.
Log Management: If you’re using OVH’s managed logging services, ensure your PHP error logs and system logs are ingested and searchable.
Alerting System: Configure OVH’s built-in alerting for critical infrastructure events (e.g., instance down, disk full on host).

While these are useful, they are often infrastructure-centric. For application-level insights, integrating third-party APM and error tracking is highly recommended.

Conclusion

A robust server monitoring strategy for a PHP application and its MongoDB cluster on OVH requires a multi-layered approach. Proactive checks using `mongosh` and `cron` for MongoDB health, combined with application-level monitoring (APM, error tracking) for the PHP application, provide comprehensive visibility. Regularly reviewing logs and integrating with an effective alerting system ensures that potential issues are identified and addressed before they impact your users.