Server Monitoring Best Practices: Keeping Your WordPress App and MongoDB Clusters Alive on OVH

Core Metrics for WordPress and MongoDB on OVH

Effective server monitoring hinges on tracking the right metrics. For a WordPress application backed by MongoDB on OVH, this means a dual focus: the health of the web server stack (PHP-FPM, Nginx) and the performance of the MongoDB cluster.

WordPress Stack Monitoring

The WordPress application itself, typically served by PHP-FPM and Nginx, requires close observation. Key indicators include:

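PHP-FPM Process Management: Monitoring the number of active and idle PHP-FPM processes, together with the listen queue length and the `max children reached` counter, is crucial. A growing listen queue or a rising `max children reached` count usually indicates insufficient worker processes or a bottleneck elsewhere.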
Nginx Request Handling: Track active connections, requests per second, and error rates (4xx, 5xx). High error rates point to application issues or server overload.
Resource Utilization: CPU, RAM, and I/O wait times for the web server and PHP-FPM processes. Sustained high CPU or RAM usage can lead to slow response times.
Disk Space: Essential for logs, uploads, and temporary files. Running out of disk space is a common cause of application failure.
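
As a rough illustration of the Nginx indicators above, the sketch below parses the plain-text output of Nginx's `stub_status` module and derives a request rate from two samples. It assumes `stub_status` is enabled on your server, which this article does not otherwise configure; the function names and the sample values are illustrative only. Note that `stub_status` does not break requests down by status code, so 4xx/5xx rates need access logs or a different exporter.

```python
import re

# Sample stub_status output; the numbers are illustrative.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text: str) -> dict:
    """Parse the plain-text output of Nginx's stub_status module."""
    active = re.search(r"Active connections:\s+(\d+)", text)
    totals = re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.MULTILINE)
    states = re.search(
        r"Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)", text
    )
    if not (active and totals and states):
        raise ValueError("unrecognized stub_status output")
    accepts, handled, requests = map(int, totals.groups())
    reading, writing, waiting = map(int, states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


def requests_per_second(prev: dict, curr: dict, interval_s: float) -> float:
    """Derive a request rate from two samples of the cumulative counter."""
    return (curr["requests"] - prev["requests"]) / interval_s
```

Because `requests` is a cumulative counter, a rate only makes sense between two samples taken a known interval apart, which is the same reason Prometheus applies `rate()` to counters.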

PHP-FPM Configuration and Monitoring

A common setup uses `pm.max_children`, `pm.start_servers`, `pm.min_spare_servers`, and `pm.max_spare_servers` to size PHP-FPM pools. These are static settings, so what you actually monitor is how each running pool behaves against them: active and idle workers, queue depth, and connections accepted. The PHP-FPM status page exposes exactly this.

First, ensure the status page is enabled in your PHP-FPM pool configuration (e.g., `/etc/php/8.1/fpm/pool.d/www.conf`):

; /etc/php/8.1/fpm/pool.d/www.conf
; ... other configurations ...
pm.status_path = /status
; ... other configurations ...

Then, configure Nginx to pass requests for this path to PHP-FPM over FastCGI. This is typically done within your WordPress site’s Nginx configuration file (e.g., `/etc/nginx/sites-available/your-wordpress-site`):

# /etc/nginx/sites-available/your-wordpress-site
server {
    # ... other server configurations ...

    location ~ ^/status$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/run/php/php8.1-fpm.sock; # Adjust to your PHP-FPM socket
        allow 127.0.0.1; # Allow localhost access
        deny all;
    }

    # ... other server configurations ...
}

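With this in place, request `/status` from the server itself, for example with `curl -H 'Host: your-wordpress-site.com' http://127.0.0.1/status`. The `allow 127.0.0.1` rule rejects every other client unless you add further `allow` lines for trusted IPs. The output looks like: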

pool: www
process manager: dynamic
accepted conn: 12346
listen queue: 0
max listen queue: 0
listen queue len: 0
idle processes: 4
active processes: 1
total processes: 5
max children reached: 0
slow requests: 0

Tools like Prometheus with the `node_exporter` and a custom `php_fpm_exporter` (or a generic `textfile_collector` script) can scrape this status page and expose it for monitoring dashboards.
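
A minimal sketch of the parsing step such a script would need is shown below. It assumes the plain-text `key: value` format of the status page; the function names, the sample text, and the `max_children` value passed in are hypothetical and should be adapted to your pool.

```python
# Sample status output; the values are illustrative.
SAMPLE = """pool: www
process manager: dynamic
accepted conn: 12346
listen queue: 0
idle processes: 4
active processes: 1
total processes: 5
max children reached: 0
slow requests: 0
"""


def parse_fpm_status(text: str) -> dict:
    """Turn 'key: value' lines from the PHP-FPM status page into a dict.

    Numeric values become ints; everything else stays a string.
    """
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        key = key.strip().replace(" ", "_")
        value = value.strip()
        metrics[key] = int(value) if value.isdigit() else value
    return metrics


def pool_utilization(metrics: dict, max_children: int) -> float:
    """Fraction of the configured pm.max_children that is currently busy."""
    return metrics["active_processes"] / max_children
```

For example, `pool_utilization(parse_fpm_status(SAMPLE), 10)` gives the share of a ten-worker pool that is busy; a value that stays close to 1.0 suggests the pool is near saturation.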

MongoDB Cluster Monitoring

For a MongoDB cluster, the focus shifts to database performance, replication, and resource usage at the cluster level.

Replica Set Status: Essential for high availability. Monitor the state of each member (PRIMARY, SECONDARY, ARBITER, STARTUP, etc.) and replication lag.
Query Performance: Track slow queries, query execution times, and the number of operations per second (reads, writes).
Connection Counts: Monitor the number of active connections to the MongoDB instances. Excessive connections can indicate application issues or insufficient connection pooling.
Resource Utilization: CPU, RAM, Disk I/O, and Network I/O for each MongoDB node. MongoDB is RAM-intensive, so memory usage and swapping are critical.
Disk Usage: MongoDB data files can grow rapidly. Monitor disk space on all nodes, especially for the data directory.
Oplog Size: For replica sets, the operation log (oplog) is crucial. Monitor its size and utilization to ensure secondaries can keep up.
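
To make the connection-count item concrete, here is a small sketch that works on the `connections` sub-document returned by MongoDB's `serverStatus` command, which reports `current` and `available` connections. The helper names and the 80% threshold are assumptions for illustration, not recommended values.

```python
def connection_utilization(connections: dict) -> float:
    """Share of the connection limit in use, from serverStatus().connections."""
    current = connections["current"]
    available = connections["available"]
    return current / (current + available)


def needs_attention(connections: dict, threshold: float = 0.8) -> bool:
    """True when connection usage crosses the (hypothetical) threshold."""
    return connection_utilization(connections) >= threshold
```

Tracking the ratio rather than the raw count keeps the check meaningful if the connection limit differs between nodes.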

Leveraging MongoDB’s Built-in Tools

MongoDB provides several command-line tools and database commands for introspection. `mongostat` and `mongotop` are invaluable for real-time monitoring.

To monitor a replica set member (e.g., a primary):

# On a MongoDB node
mongostat --host mongodb-node1:27017 --username admin --password 'your_password' --authenticationDatabase admin --discover --rowcount 10

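This command prints ten rows, one per second by default, showing inserts, queries, updates, deletes, and commands per second alongside connection counts, network traffic, and each member’s replica set role. The `--discover` flag automatically detects the other replica set members and reports on them as well.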

For detailed replica set status:

# Connect with mongosh (on MongoDB releases before 6.0, the legacy `mongo` shell accepts the same options)
mongosh --host mongodb-node1:27017 --username admin --password 'your_password' --authenticationDatabase admin

# Inside the shell
rs.status()

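The output of `rs.status()` is a document detailing the state of the replica set, including each member’s state, health, and the time of its last applied operation (`optimeDate`). Replication lag is not reported as a field of its own; it is the difference between the primary’s `optimeDate` and a secondary’s, which `rs.printSecondaryReplicationInfo()` prints for you. Monitoring agents can parse this output.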
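
As a sketch of how a monitoring agent might derive lag from that document, the function below takes a dictionary shaped like the `rs.status()` output (for instance, the result of running the `replSetGetStatus` command through a driver) and compares each secondary’s `optimeDate` with the primary’s. The function name is hypothetical, and the sketch assumes `optimeDate` values arrive as Python `datetime` objects.

```python
from datetime import datetime, timedelta


def replication_lag_seconds(status: dict) -> dict:
    """Map each SECONDARY's name to its lag behind the PRIMARY, in seconds."""
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise RuntimeError("replica set has no PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"  # arbiters carry no data, so skip them
    }
```

Raising when no primary is present keeps the two failure modes separate: a missing primary is an availability problem, whereas a large lag value is a performance problem.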

Monitoring Oplog Usage

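A common issue is an oplog window that is too short. The oplog is a capped collection, so it never fills up; instead, its oldest entries are overwritten. If a secondary falls further behind than the time span the oplog covers, it can no longer catch up and needs a full resync. You can monitor this by checking the size of the oplog and the time difference between its oldest and newest entries.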

# Connect with mongosh to the PRIMARY node (legacy `mongo` shell on MongoDB releases before 6.0)
mongosh --host mongodb-primary:27017 --username admin --password 'your_password' --authenticationDatabase admin

# Inside the shell
db.getReplicationInfo()

This command reports the oplog’s configured size (`logSizeMB`), the space in use (`usedMB`), and the time range it covers (`timeDiff` in seconds, `timeDiffHours`, `tFirst`, and `tLast`). A common rule of thumb is to keep at least 24 hours’ worth of operations in the oplog, but the right window varies significantly with write load.
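
The 24-hour rule of thumb can be turned into a rough sizing estimate. The sketch below assumes a dictionary with the `usedMB` and `timeDiff` fields that `db.getReplicationInfo()` returns, and it assumes the recent write rate is representative of future load, which may not hold for bursty workloads. The function names are hypothetical.

```python
def oplog_window_hours(info: dict) -> float:
    """Time span currently covered by the oplog, in hours."""
    return info["timeDiff"] / 3600


def required_oplog_mb(info: dict, target_hours: float = 24.0) -> float:
    """Estimate the oplog size needed to cover target_hours at the current write rate."""
    hours = oplog_window_hours(info)
    if hours <= 0:
        raise ValueError("oplog window is empty; cannot estimate write rate")
    mb_per_hour = info["usedMB"] / hours
    return mb_per_hour * target_hours
```

For instance, an oplog that holds 1000 MB spanning 10 hours is absorbing about 100 MB per hour, so a 24-hour window would need roughly 2400 MB.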

OVH Specific Considerations and Tools

OVH’s infrastructure provides specific tools and considerations for monitoring.

OVHcloud Control Panel: Provides basic infrastructure metrics like CPU, RAM, disk, and network usage for your instances. While useful for a high-level overview, it’s insufficient for granular application monitoring.
OVHcloud API: Can be used to programmatically fetch instance metrics, allowing integration with external monitoring systems.
Log Management: OVH offers log management services. Centralizing logs from Nginx, PHP-FPM, and MongoDB is crucial for debugging and incident response. Configure your servers to send logs to a central collector (e.g., ELK stack, Graylog, or OVH’s offering).
Network Monitoring: Monitor network traffic to and from your OVH instances. High latency or packet loss can severely impact application performance, especially for distributed databases like MongoDB.

Integrating with Prometheus and Grafana

A robust monitoring stack often involves Prometheus for time-series data collection and alerting, and Grafana for visualization. Here’s a typical setup:

Prometheus Configuration

You’ll need exporters for your various services. For WordPress/PHP-FPM, you might use `node_exporter` with a `textfile_collector` script that scrapes the PHP-FPM status page. For MongoDB, Percona’s `mongodb_exporter` is a widely used choice (MongoDB itself does not ship a Prometheus exporter); its default port, 9216, is the one used below.

# prometheus.yml
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate alerting and recording rules every 15 seconds.

scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['your-wordpress-server-ip:9100'] # IP of your WordPress server

  - job_name: 'php_fpm_status'
    static_configs:
      - targets: ['your-wordpress-server-ip:9091'] # Assuming exporter is running on this port

  - job_name: 'mongodb_exporter'
    static_configs:
      - targets: ['your-mongodb-node1-ip:9216', 'your-mongodb-node2-ip:9216', 'your-mongodb-node3-ip:9216'] # IPs of your MongoDB nodes

The `php_fpm_status` job would rely on a custom exporter or a script that periodically fetches `http://localhost/status` and exposes metrics via an HTTP endpoint (e.g., using Python with Flask or Go).
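
A standard-library-only sketch of such an exporter follows. It assumes the status page is reachable at `http://localhost/status` and requests the JSON variant by appending `?json`; the `phpfpm_` metric prefix, the function and class names, and port 9091 (matching the scrape target above) are illustrative choices, not an established convention.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: the status page is served locally, as configured earlier.
STATUS_URL = "http://localhost/status?json"


def render_metrics(status: dict) -> str:
    """Render numeric status fields in the Prometheus text exposition format."""
    pool = status.get("pool", "unknown")
    lines = []
    for key, value in status.items():
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue  # skip string fields such as "process manager"
        name = "phpfpm_" + key.replace(" ", "_")
        lines.append(f'{name}{{pool="{pool}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Fetch the status page on every scrape so values are always fresh.
        try:
            with urllib.request.urlopen(STATUS_URL, timeout=2) as resp:
                body = render_metrics(json.load(resp)).encode()
            self.send_response(200)
        except (OSError, ValueError) as exc:
            body = f"# scrape failed: {exc}\n".encode()
            self.send_response(503)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9091) -> None:
    """Block forever, answering Prometheus scrapes on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the exporter. Fetching on each scrape, rather than on a timer, avoids serving stale values and keeps the sketch free of background threads; a production exporter would more likely use an official Prometheus client library.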

Grafana Dashboards

Import pre-built Grafana dashboards for Node Exporter and MongoDB Exporter. You can find many on Grafana.com. For PHP-FPM, you’ll likely need to create a custom dashboard or adapt an existing one to display metrics scraped from your PHP-FPM status exporter.

Key Grafana panels to include:

PHP-FPM: Active processes, idle processes, requests, slow requests, accepted connections.
Nginx: Requests per second, error rates (4xx, 5xx), active connections.
MongoDB: Replica set status, replication lag, query/update/insert rates, slow queries, connection counts, memory usage, disk I/O.
System: CPU usage, RAM usage, disk space, network traffic.

Alerting Strategies

Proactive alerting is as important as monitoring. Configure alerts for critical conditions:

High Error Rates: Nginx 5xx errors, PHP-FPM slow requests.
Resource Exhaustion: CPU usage consistently above 90%, low disk space (<10% free), high memory usage leading to swapping.
MongoDB Unavailability: Replica set primary down, high replication lag (> 5 minutes).
PHP-FPM Pool Saturation: `pm.max_children` reached, or a listen queue that keeps growing.
Disk I/O Saturation: High `iowait` on database servers.

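Prometheus evaluates the alert rules and hands firing alerts to Alertmanager, which handles routing and notification. Define the rules in a file that `prometheus.yml` references under `rule_files`: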

# rules.yml
groups:
- name: wordpress_alerts
  rules:
  - alert: HighNginx5xxErrors
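    # The metric name and `status` label depend on your Nginx exporter; adjust them to what it actually exposes.
    expr: sum(rate(nginx_http_requests_total{status=~"5.."}[5m])) by (instance) > 10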
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High rate of 5xx errors on Nginx instance {{ $labels.instance }}"
      description: "Nginx on {{ $labels.instance }} is experiencing a high rate of 5xx errors ({{ $value }} req/s)."

  - alert: LowDiskSpace
    expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100 < 10
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Low disk space on {{ $labels.instance }}"
      description: 'Filesystem on {{ $labels.instance }} has only {{ $value | printf "%.2f" }}% free space remaining.'

- name: mongodb_alerts
  rules:
  - alert: MongoDBPrimaryDown
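    # Metric and label names, here and in the lag rule below, vary between mongodb_exporter versions; check your exporter's /metrics output.
    expr: mongodb_replica_set_state{state="PRIMARY"} == 0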
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "MongoDB replica set primary is down on {{ $labels.instance }}"
      description: "The MongoDB instance {{ $labels.instance }} is not reporting as PRIMARY."

  - alert: MongoDBReplicationLag
    expr: mongodb_replica_set_member_replication_lag_seconds > 300 # 5 minutes lag
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "MongoDB replication lag detected on {{ $labels.instance }}"
      description: "MongoDB member {{ $labels.instance }} is lagging by {{ $value }} seconds."

Configure Alertmanager to route these alerts to your preferred notification channels (Slack, PagerDuty, email).
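
The `for:` clause in the rules above is what keeps short spikes from paging anyone: a rule whose expression becomes true is first pending, and it only fires once the expression has stayed true for the whole duration. The sketch below is a simplified model of that behavior rather than Prometheus’s actual implementation; the function name and the sample values are illustrative.

```python
def alert_states(values, threshold, for_seconds, eval_interval=15):
    """Simulate an alert rule of the form `value > threshold` with a `for:` duration.

    Returns one state per evaluation: "inactive", "pending", or "firing".
    """
    states = []
    active_since = None  # time at which the condition first became true
    for i, value in enumerate(values):
        now = i * eval_interval
        if value > threshold:
            if active_since is None:
                active_since = now
            held_for = now - active_since
            states.append("firing" if held_for >= for_seconds else "pending")
        else:
            active_since = None  # any dip below the threshold resets the clock
            states.append("inactive")
    return states
```

With a threshold of 10, `for_seconds=30`, and the series `[5, 12, 12, 12, 3, 12]`, the states are inactive, pending, pending, firing, inactive, pending: the single dip to 3 resets the timer, so the last sample has to wait out the full duration again.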

Conclusion

A comprehensive monitoring strategy for WordPress and MongoDB on OVH requires a layered approach. By tracking key metrics for both the application stack and the database cluster, leveraging OVH’s infrastructure insights, and implementing robust alerting with tools like Prometheus and Grafana, you can ensure high availability, performance, and rapid incident response.