Server Monitoring Best Practices: Keeping Your WordPress App and MongoDB Clusters Alive on DigitalOcean

Proactive Health Checks for WordPress and MongoDB on DigitalOcean

Maintaining the uptime and performance of a production WordPress application backed by a MongoDB cluster on DigitalOcean requires a multi-layered monitoring strategy. This isn’t about setting up a single dashboard and forgetting it; it’s about implementing granular, actionable checks that alert you to potential issues *before* they impact users. We’ll focus on essential metrics, configuration snippets, and diagnostic commands that are critical for keeping these services operational.

Monitoring WordPress Core and PHP-FPM

WordPress is a PHP application, so its health is tied to the PHP-FPM process pool that executes its code. We need to monitor both the application’s responsiveness and the PHP-FPM service’s resource utilization.

WordPress Application Response Time & Error Rate

A simple yet effective way to gauge WordPress health is to fetch a key page (e.g., the homepage) periodically, measure the response time, and check for HTTP errors. Avoid `/wp-admin/` for this: it redirects unauthenticated requests to the login page, so the check would see a 302 instead of a 200. We can use `curl` for the request and run it from a cron job.

Cron Job for WordPress Health Check

Create a shell script (e.g., `/opt/scripts/check_wordpress.sh`) with the following content:

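#!/bin/bash

# Configuration
WP_URL="https://your-wordpress-domain.com"
EXPECTED_HTTP_CODE=200
MAX_RESPONSE_TIME=3  # seconds; alert if the page takes longer than this
ALERT_EMAIL="alerts@your-wordpress-domain.com"  # placeholder: replace with your alert address
LOG_FILE="/var/log/wordpress_health.log"
ERROR_LOG="/var/log/wordpress_health_errors.log"

# Timestamp for logging
TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S")

# Perform the check: capture the HTTP status code and the total response time.
# curl reports 000 as the status if it cannot connect or the request times out.
RESULT=$(curl -s -o /dev/null -w "%{http_code} %{time_total}" --connect-timeout 5 --max-time 10 "$WP_URL")
HTTP_CODE=${RESULT%% *}
RESPONSE_TIME=${RESULT##* }

# Log the result
echo "$TIMESTAMP - $WP_URL - HTTP Code: $HTTP_CODE - Response time: ${RESPONSE_TIME}s" >> "$LOG_FILE"

# Check for errors
if [ "$HTTP_CODE" != "$EXPECTED_HTTP_CODE" ]; then
    echo "$TIMESTAMP - ERROR: WordPress at $WP_URL returned HTTP code $HTTP_CODE. Expected $EXPECTED_HTTP_CODE." >> "$ERROR_LOG"
    # Send an alert (requires the mail command, e.g. from mailutils, and a working MTA)
    echo "WordPress Health Alert: $WP_URL returned HTTP code $HTTP_CODE" | mail -s "URGENT: WordPress Health Check Failed" "$ALERT_EMAIL"
    exit 1
fi

# Check for slow responses (bash cannot compare decimals, so awk does the comparison)
if awk -v t="$RESPONSE_TIME" -v max="$MAX_RESPONSE_TIME" 'BEGIN { exit !(t > max) }'; then
    echo "$TIMESTAMP - WARNING: $WP_URL took ${RESPONSE_TIME}s (limit ${MAX_RESPONSE_TIME}s)." >> "$ERROR_LOG"
    echo "WordPress Health Alert: $WP_URL responded in ${RESPONSE_TIME}s" | mail -s "WARNING: WordPress Slow Response" "$ALERT_EMAIL"
    exit 1
fi

exit 0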

Make the script executable:

sudo chmod +x /opt/scripts/check_wordpress.sh

Add it to cron to run every 5 minutes:

sudo crontab -e
# Add this line:
*/5 * * * * /opt/scripts/check_wordpress.sh

Monitoring PHP-FPM Status

PHP-FPM provides a status page that can be enabled to expose performance metrics. This is invaluable for diagnosing slow PHP execution.

Enabling PHP-FPM Status Page

Locate your PHP-FPM pool configuration file (e.g., `/etc/php/8.1/fpm/pool.d/www.conf` or similar). Add or modify the following directives:

; Ensure the pm.status_path is set
pm.status_path = /status

; Do not expose this page publicly: it reveals pool and request details.
; Restrict it in the web server, e.g. an Nginx location that allows only
; localhost or your monitoring host's IP.

Restart PHP-FPM:

sudo systemctl restart php8.1-fpm # Adjust version as needed

Now, you can access the status page via a web server proxy. For Nginx, add a location block to your WordPress site’s configuration:

location = /status {
    # Allow only local requests; add your monitoring host's IP if it polls remotely
    allow 127.0.0.1;
    deny all;

    # Ensure this matches your PHP-FPM socket or address
    fastcgi_pass unix:/var/run/php/php8.1-fpm.sock;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
}

For per-process details, request `/status?full`; add `json` to the query string (e.g., `/status?json` or `/status?full&json`) for machine-readable output.

The default PHP-FPM status page provides metrics like:

pool: The name of the pool.
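process manager: static, dynamic, or ondemand.
start since: Seconds since the pool was started.
idle processes: Number of idle processes.
active processes: Number of active processes.
total processes: Total number of processes (idle + active).
max active processes: Maximum number of simultaneously active processes since the pool started.
listen queue: Number of requests waiting for a free worker process.
max listen queue: Maximum length the listen queue has reached since the pool started.
listen queue len: The size of the socket queue of pending connections (the listen backlog).
max children reached: Number of times the pool hit its process limit, pm.max_children (dynamic and ondemand only).
slow requests: Number of requests that exceeded request_slowlog_timeout (counted only if that option is set).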

Key metrics to watch are active processes, total processes, listen queue, and max children reached. A consistently high or growing listen queue, or a max children reached count that keeps rising, indicates that PHP-FPM cannot keep up with the request load. The remedy is to tune the pool settings (pm.max_children, plus pm.start_servers, pm.min_spare_servers, and pm.max_spare_servers when pm = dynamic) within the server’s memory budget, or to scale the server resources.
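The two saturation signals above can be checked from cron by parsing the plain-text status page. This is a minimal sketch: the function names and the default threshold of 10 are hypothetical, and the usage comment assumes the status page answers at `http://127.0.0.1/status`. The field labels it matches (`listen queue`, `max children reached`) are the ones PHP-FPM prints in its default text output.

```shell
#!/bin/bash
# Sketch: alert on PHP-FPM saturation using the plain-text status page.

# Print the value of one status field; reads the status page text on stdin.
# Example: fpm_status_value "listen queue"
fpm_status_value() {
    awk -F':' -v key="$1" '$1 == key { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2; exit }'
}

# Return non-zero (and print a message) if requests are queuing beyond
# MAX_QUEUE, or if the pool has hit pm.max_children since it started.
# Usage (hypothetical URL): curl -s http://127.0.0.1/status | check_fpm_status 10
check_fpm_status() {
    local max_queue="${1:-10}" status queue children rc=0
    status=$(cat)
    queue=$(printf '%s\n' "$status" | fpm_status_value "listen queue")
    children=$(printf '%s\n' "$status" | fpm_status_value "max children reached")

    if [ "${queue:-0}" -gt "$max_queue" ]; then
        echo "ALERT: PHP-FPM listen queue is $queue (threshold $max_queue)"
        rc=1
    fi
    if [ "${children:-0}" -gt 0 ]; then
        echo "ALERT: pm.max_children reached $children time(s) since pool start"
        rc=1
    fi
    return "$rc"
}
```

Because `max children reached` is a cumulative counter, a non-zero value keeps alerting until PHP-FPM restarts; storing the previous run’s value and alerting only on an increase would turn it into a "new events" check.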

Monitoring MongoDB Cluster Health and Performance

A MongoDB cluster (replica set or sharded cluster) requires robust monitoring to ensure data availability, consistency, and query performance. DigitalOcean’s managed MongoDB service simplifies some aspects, but understanding the underlying metrics is crucial.

Essential MongoDB Metrics

We need to monitor:

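Replication Lag: How far a secondary’s most recently applied oplog entry trails the primary’s. High lag means stale reads from secondaries, slow writes that wait for majority acknowledgment, and possible rollback of unreplicated writes after a failover.
Disk Usage: Data files, indexes, the journal, and the oplog all grow on disk; running out of space stops writes and can crash `mongod`.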
Network I/O: High network traffic can indicate inefficient queries or large data transfers.
CPU & Memory Usage: Standard resource monitoring.
Query Performance: Slow queries can cripple an application.
Connections: Number of active connections.
Opcounters: Rate of operations (inserts, queries, updates, deletes).
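One detail worth making explicit: the `opcounters` reported by `serverStatus` are cumulative totals since `mongod` started, so a rate needs two samples divided by the interval between them (essentially what `mongostat` displays). A minimal sketch; `opcounter_rate` is a hypothetical helper, and the commented usage assumes `mongosh` is installed and can reach the node.

```shell
#!/bin/bash
# Sketch: turn two cumulative opcounter samples into an operations-per-second rate.
# Usage: opcounter_rate PREVIOUS CURRENT INTERVAL_SECONDS
opcounter_rate() {
    local prev="$1" curr="$2" interval="$3"
    if [ "$interval" -le 0 ]; then
        echo "interval must be positive" >&2
        return 1
    fi
    if [ "$curr" -lt "$prev" ]; then
        # Counters reset when mongod restarts; skip this sample pair.
        echo "counter went backwards (mongod restart?)" >&2
        return 1
    fi
    awk -v d="$((curr - prev))" -v i="$interval" 'BEGIN { printf "%.2f\n", d / i }'
}

# Example usage (assumes mongosh can connect with your usual options):
#   q() { mongosh --quiet --eval 'print(String(db.serverStatus().opcounters.query))'; }
#   prev=$(q); sleep 60; curr=$(q)
#   opcounter_rate "$prev" "$curr" 60   # queries per second over the last minute
```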

Accessing MongoDB Metrics

DigitalOcean provides a metrics dashboard for its managed MongoDB service. For self-hosted MongoDB, you’d typically use tools like:

mongostat: Real-time statistics.
mongotop: Real-time usage by collection.
MongoDB’s built-in diagnostics (e.g., the `serverStatus` command), which Prometheus can collect through an exporter such as Percona’s `mongodb_exporter`.

Monitoring Replication Lag with `replSetGetStatus`

`mongostat --discover` reports statistics for every member of the replica set, but its `repl` column shows each member’s replication role (PRI, SEC, ARB, and so on), not the lag in seconds. To measure lag, connect to the primary and run `replSetGetStatus`:

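# mongosh replaces the legacy mongo shell, which was removed in MongoDB 6.0.
# With --username but no password on the command line, mongosh prompts for it.
mongosh --host mongo-primary.your-domain.com --port 27017 --username your_user --authenticationDatabase admin
db.adminCommand({ replSetGetStatus: 1 })  // run this at the mongosh prompt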

The output of `replSetGetStatus` lists every member with its state (`stateStr`), `optime`, and `optimeDate`. Subtracting a secondary’s `optimeDate` from the primary’s gives that secondary’s lag. For a quick interactive view, `rs.printSecondaryReplicationInfo()` in `mongosh` prints each secondary’s lag behind the primary.

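// Run in mongosh: report each secondary's lag behind the primary
const status = db.adminCommand({ replSetGetStatus: 1 });
const primary = status.members.find(m => m.stateStr === "PRIMARY");
const thresholdSeconds = 60; // alert threshold
if (!primary) {
  print("ALERT: no PRIMARY found in the replica set");
} else {
  status.members
    .filter(m => m.stateStr === "SECONDARY")
    .forEach(m => {
      // Subtracting two Date objects yields milliseconds, so divide by 1000
      const lagSeconds = (primary.optimeDate - m.optimeDate) / 1000;
      print(`${m.name}: ${lagSeconds}s behind primary`);
      if (lagSeconds > thresholdSeconds) print(`ALERT: ${m.name} lag exceeds ${thresholdSeconds}s`);
    });
}
// Example: primary 2023-10-27T10:30:00.123Z, secondary 2023-10-27T10:29:55.456Z -> 4.667s of lag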

For automated monitoring, you can script this using the MongoDB driver in Python or Node.js and send alerts when lag exceeds a threshold.
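As a shell alternative to a driver-based script (swapped in here because the rest of this guide uses bash), the sketch below has `mongosh` print one `<member> <lag_seconds>` line per secondary and lets a small bash function apply the threshold. `MONGO_URI`, the function names, and the 60-second default are hypothetical placeholders; the `replSetGetStatus` fields it reads (`members`, `stateStr`, `name`, `optimeDate`) are part of that command’s output.

```shell
#!/bin/bash
# Sketch: cron-friendly replication lag check.
# Assumes mongosh is installed and MONGO_URI (hypothetical) points at a member.

# Print "<member> <lag_seconds>" for every SECONDARY, relative to the PRIMARY.
# Prints nothing if no PRIMARY is visible (e.g., during an election).
fetch_member_lag() {
    mongosh "$MONGO_URI" --quiet --eval '
        const s = db.adminCommand({ replSetGetStatus: 1 });
        const p = s.members.find(m => m.stateStr === "PRIMARY");
        if (p) {
            s.members
                .filter(m => m.stateStr === "SECONDARY")
                .forEach(m => print(m.name + " " + (p.optimeDate - m.optimeDate) / 1000));
        }
    '
}

# Read "<member> <lag_seconds>" lines on stdin; print an alert for each member
# above THRESHOLD seconds. Returns 1 if any member lags or no data arrived.
check_replication_lag() {
    local threshold="${1:-60}" member lag seen=0 rc=0
    while read -r member lag; do
        [ -z "$member" ] && continue
        seen=1
        # awk does the comparison because lag values are fractional
        if awk -v l="$lag" -v t="$threshold" 'BEGIN { exit !(l > t) }'; then
            echo "ALERT: $member is ${lag}s behind the primary (threshold ${threshold}s)"
            rc=1
        fi
    done
    if [ "$seen" -eq 0 ]; then
        echo "ALERT: no secondary lag data (no primary visible, or no secondaries?)"
        rc=1
    fi
    return "$rc"
}

# Example cron usage (the alert address is a placeholder):
#   out=$(fetch_member_lag | check_replication_lag 60) \
#       || echo "$out" | mail -s "MongoDB replication lag" alerts@your-domain.com
```

Treating "no data" as an alert is a deliberate choice: an election in progress or an unreachable primary is usually something you want to hear about too.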

Monitoring Disk Usage

Standard system-level monitoring tools are sufficient here. On each MongoDB node:

df -h /var/lib/mongodb # Or wherever your data directory is

Set up alerts when disk usage exceeds 80-90% to prevent write failures.
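A minimal sketch of that alert as a cron-friendly check. The function names and the 85% default are illustrative choices, not DigitalOcean or MongoDB settings; `df -P` is used because its POSIX output keeps each filesystem on one line with the capacity percentage in the fifth column.

```shell
#!/bin/bash
# Sketch: alert when a filesystem's usage reaches a percentage threshold.

# Print the Use% of the filesystem holding PATH as a bare integer.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage: check_disk_usage /var/lib/mongodb 85
# Returns 0 if below the threshold, 1 if at or above it, 2 if df failed.
check_disk_usage() {
    local path="$1" threshold="${2:-85}" pct
    pct=$(disk_usage_pct "$path")
    if [ -z "$pct" ]; then
        echo "ERROR: could not read disk usage for $path" >&2
        return 2
    fi
    if [ "$pct" -ge "$threshold" ]; then
        echo "ALERT: $path is ${pct}% full (threshold ${threshold}%)"
        return 1
    fi
}
```

From cron, a non-zero exit from `check_disk_usage` can feed the same `mail`-based alerting used by the WordPress health-check script.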

Monitoring Slow Queries

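On self-hosted nodes, configure slow-operation logging in your MongoDB configuration file (`mongod.conf`). Operations slower than `slowOpThresholdMs` (100 ms by default) are written to the log; `mode: slowOp` additionally records them in each database’s `system.profile` collection: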

# mongod.conf
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
  verbosity: 0 # Default, adjust as needed
  quiet: false

# Record slow operations in the log and in the profiler (system.profile)
operationProfiling:
  slowOpThresholdMs: 100 # Log operations taking longer than 100ms
  mode: slowOp

Restart MongoDB after applying changes.

sudo systemctl restart mongod

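You can then tail this log file, or analyze slow operations over time with tools such as mtools (for example, `mloginfo --queries`, which groups the queries in a log by pattern) or Percona Toolkit’s `pt-mongodb-query-digest`, which aggregates the operations recorded in `system.profile`.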
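For a quick look without extra tooling, the sketch below counts slow-operation entries and reports the slowest `durationMillis` in a log file. It assumes the structured JSON log format introduced in MongoDB 4.4, where slow operations are logged with `"msg":"Slow query"` and a `durationMillis` attribute; older versions write plain-text logs and would need a different pattern. `slow_query_summary` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: summarize slow operations from a MongoDB 4.4+ JSON log.
# Usage: slow_query_summary /var/log/mongodb/mongod.log
slow_query_summary() {
    local log="${1:-/var/log/mongodb/mongod.log}"
    # Keep only slow-operation entries, extract their durations, then
    # count them and track the maximum (both print 0 if there are none).
    grep -F '"msg":"Slow query"' "$log" \
        | grep -o '"durationMillis":[0-9]*' \
        | cut -d: -f2 \
        | awk '{ n++; if ($1 > max) max = $1 }
               END { printf "slow_ops=%d max_ms=%d\n", n, max }'
}
```

Run against a rotated log (or after `tail -n` into a temporary file), this gives a cheap per-interval number to graph or alert on; the log-analysis tools above remain the better choice for grouping queries by shape.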

Integrating with DigitalOcean Monitoring and Alerting

DigitalOcean’s platform offers built-in monitoring for Droplets and Managed Databases. Leverage these features:

Droplet Monitoring

Ensure the DigitalOcean Metrics Agent (`do-agent`) is installed on your WordPress Droplets and on any self-hosted MongoDB Droplets. It provides CPU, memory, disk usage, disk I/O, and network traffic metrics. Configure alert policies directly in the DigitalOcean control panel for critical thresholds (e.g., CPU usage > 90% for 15 minutes, disk space < 10% free).

Managed MongoDB Monitoring

DigitalOcean’s Managed MongoDB service exposes key metrics like:

CPU Usage
Memory Usage
Disk Usage
Network I/O
Connections
Replication Lag (often visible in the UI or via API)

Set up alerts for these metrics within the DigitalOcean control panel for your MongoDB cluster. Pay close attention to replication lag alerts, as they are direct indicators of potential data unavailability or consistency issues.

Centralized Logging and Alert Aggregation

While individual service checks are vital, a centralized logging system and an alert aggregation platform are essential for a holistic view and efficient incident response.

Log Management

Forward logs from your WordPress application (web server access/error logs, PHP-FPM logs), and MongoDB instances to a centralized logging service. Options include:

ELK Stack (Elasticsearch, Logstash, Kibana): Powerful but resource-intensive.
Loki (with Promtail and Grafana): A more lightweight, Prometheus-friendly option.
DigitalOcean Log Management: A managed service that can ingest logs from Droplets.

Configure log forwarding agents (e.g., `rsyslog`, `fluentd`, `promtail`) on each server to send relevant logs. For WordPress, this includes Nginx/Apache access and error logs, and PHP-FPM logs. For MongoDB, the slow query log and general `mongod.log` are critical.

Alert Aggregation

Use an alert manager to consolidate alerts from various sources (cron job failures, DigitalOcean alerts, Prometheus/Alertmanager, custom scripts). Tools like:

Alertmanager (part of Prometheus ecosystem): Excellent for deduplication, grouping, and routing alerts.
Opsgenie: A popular commercial solution.
PagerDuty: Another widely used commercial incident management platform.

Configure these tools to send notifications via Slack, email, or SMS based on severity and on-call schedules. Ensure your alerts are actionable and include enough context (server name, metric, threshold, current value) to facilitate quick diagnosis.
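To keep alerts actionable, every script can format its message the same way before handing it to a notifier. A minimal sketch, assuming a Slack incoming webhook whose URL is stored in a hypothetical `SLACK_WEBHOOK_URL` variable; the function names and message layout are illustrative. Incoming webhooks accept a JSON body with a `text` field, so the message is escaped first (backslashes, double quotes, and newlines only; other control characters are not handled).

```shell
#!/bin/bash
# Sketch: consistent, context-rich alert messages plus a Slack webhook sender.

# Usage: format_alert HOST METRIC THRESHOLD CURRENT_VALUE
format_alert() {
    printf '[%s] %s: current=%s threshold=%s\n' "$1" "$2" "$4" "$3"
}

# Escape backslashes and double quotes with sed, then join lines with a
# literal \n so the result is safe inside a JSON string.
json_escape() {
    printf '%s' "$1" \
        | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
        | awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# POST the message to a Slack incoming webhook (SLACK_WEBHOOK_URL is a placeholder).
send_slack_alert() {
    local text
    text=$(json_escape "$1")
    curl -s -X POST -H 'Content-Type: application/json' \
        --data "{\"text\":\"$text\"}" "$SLACK_WEBHOOK_URL"
}

# Example:
#   send_slack_alert "$(format_alert "$(hostname)" "php-fpm listen queue" 10 42)"
```

Putting the host, metric, current value, and threshold in a fixed order also makes the messages easy to group in Alertmanager, Opsgenie, or PagerDuty if you later route through one of them.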

Conclusion

A robust server monitoring strategy for a WordPress and MongoDB stack on DigitalOcean involves a combination of application-level checks, service-specific metrics, infrastructure-level monitoring, and centralized logging/alerting. By implementing these granular checks and alerts, you move from reactive firefighting to proactive system management, ensuring the stability and performance of your critical applications.