Server Monitoring Best Practices: Keeping Your WordPress App and DynamoDB Clusters Alive on OVH

Establishing a Robust Monitoring Baseline for WordPress on OVH

Deploying WordPress on OVH, especially when coupled with a managed service like AWS DynamoDB for specific data persistence needs, necessitates a multi-layered monitoring strategy. This isn’t about generic uptime checks; it’s about granular visibility into application performance, underlying infrastructure health, and potential bottlenecks that could impact user experience and data integrity. We’ll focus on actionable metrics and tools, starting with the WordPress application layer itself.

Application Performance Monitoring (APM) with New Relic

For deep application insights, New Relic is a powerful choice. Its PHP agent can provide transaction traces, database query analysis, and error reporting directly from your WordPress application. The initial setup involves installing the agent and configuring it to communicate with your New Relic account.

New Relic PHP Agent Installation and Configuration

On your OVH instance (assuming a Debian or Ubuntu system), install the agent from New Relic’s apt repository or with their tarball installer. The agent is not distributed through PECL. Check New Relic’s compatibility notes for your PHP version first.

Package Installation (Recommended)

After adding New Relic’s apt repository and signing key as described in their installation documentation, install the package and run the interactive installer:

sudo apt update
sudo apt install newrelic-php5
sudo newrelic-install install

The installer asks for your license key, detects the PHP installations on the host, and normally sets up the extension and a newrelic.ini for each of them (there are often separate configurations per SAPI, such as CLI and FPM). If you ever need to load the extension by hand, the line looks like this:

[PHP]
extension=newrelic.so

Then, configure the agent itself. Edit the newrelic.ini that PHP loads for the web SAPI (on Debian/Ubuntu often under /etc/php/<version>/fpm/conf.d/ or /etc/php/<version>/mods-available/; a copy under cli/conf.d/ only affects command-line scripts). Settings carry the newrelic. prefix and only take effect once the leading semicolon is removed. You’ll need your New Relic license key and a meaningful application name.

; This file is automatically generated by the newrelic installer.
; It is recommended to edit the newrelic.ini file directly.
; For more information, visit: https://docs.newrelic.com/docs/php/new-relic-php-installation-configuration

[newrelic]
; Required: your New Relic license key.
newrelic.license = "YOUR_LICENSE_KEY"

; Required: the application name shown in the New Relic UI.
newrelic.appname = "My WordPress App"

; Optional: set to false to switch the agent off without unloading the extension.
newrelic.enabled = true

; Optional: CLI instrumentation. Setting name unverified; confirm it in the
; agent's configuration reference before enabling.
; enable_cli = false

After saving these changes, restart your web server (e.g., Apache or Nginx) and PHP-FPM to ensure the agent is loaded.

Key WordPress Metrics to Monitor with New Relic

Transaction Traces: Identify slow PHP functions, external service calls, and database queries. Look for transactions exceeding your SLO (e.g., > 500ms).
Database Queries: Pinpoint inefficient SQL queries. Monitor query count, average duration, and slow query logs.
Error Rate: Track PHP errors and exceptions. Set up alerts for spikes in error frequency.
External Services: Monitor latency and error rates for API calls to third-party services (e.g., payment gateways, social media APIs).
WordPress Specifics: New Relic often provides insights into WordPress hooks, plugin performance, and theme execution times.

Monitoring the Underlying OVH Infrastructure

While New Relic covers the application, the OVH infrastructure (compute, network, storage) requires its own set of monitoring tools. OVH provides its own monitoring dashboards, but for deeper integration and custom alerting, we’ll leverage Prometheus and Grafana.

Prometheus for Time-Series Metrics Collection

Prometheus is an open-source systems monitoring and alerting toolkit. It works by scraping metrics from configured targets at given intervals, evaluating rule expressions, displaying the results, and triggering alerts if necessary. We’ll use the Node Exporter for system-level metrics and potentially a specific exporter for your web server (e.g., Nginx exporter).

Node Exporter Deployment on OVH Instances

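Download a release of Node Exporter from the official Prometheus GitHub repository. The commands below pin v1.7.0 for amd64 on a typical Ubuntu/Debian system; check the releases page for the current version and adjust the file names to match: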

wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo rm -rf node_exporter-1.7.0.linux-amd64*

Create a systemd unit file, for example /etc/systemd/system/node_exporter.service, to manage the Node Exporter process:

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=nobody
Group=nogroup
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

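Node Exporter listens on port 9100 by default and has no authentication, so open the port only to your Prometheus server rather than to the whole internet. With UFW, where PROMETHEUS_SERVER_IP is a placeholder for that server’s address:

sudo ufw allow from PROMETHEUS_SERVER_IP to any port 9100 proto tcp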
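
Before adding the host to Prometheus, it can help to confirm that the exporter answers and returns sensible values. The following is a minimal Python sketch, not an official client: parse_metrics and fetch_metrics are illustrative helper names, and the parser only handles plain sample lines without trailing timestamps.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text-format sample lines into {series: value}.

    Simplified on purpose: comment lines are skipped and each sample line
    is assumed to end with its value (no trailing timestamp).
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token; everything before it
        # is the metric name plus optional {labels}.
        series, _, value = line.rpartition(" ")
        try:
            samples[series] = float(value)
        except ValueError:
            continue  # ignore lines this simple parser cannot read
    return samples


def fetch_metrics(host, port=9100, timeout=5):
    """Fetch and parse /metrics from a Node Exporter (needs network access)."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Hand-written sample in the exposition format, used for an offline check.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_memory_MemAvailable_bytes 2.147483648e+09
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
"""

if __name__ == "__main__":
    print(parse_metrics(SAMPLE)["node_load1"])
```

Calling fetch_metrics("YOUR_OVH_INSTANCE_IP_1") from the Prometheus server also doubles as a quick firewall test: a timeout there usually means port 9100 is not reachable from that host.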

Prometheus Configuration for Scraping

In your Prometheus configuration file (prometheus.yml), add a scrape job for your OVH instances:

scrape_configs:
  - job_name: 'ovh-instances'
    static_configs:
      # One entry per host: two hosts sharing an instance label would
      # write to the same series, so give each target its own value.
      - targets: ['YOUR_OVH_INSTANCE_IP_1:9100']
        labels:
          instance: 'webserver-01'
      - targets: ['YOUR_OVH_INSTANCE_IP_2:9100']
        labels:
          instance: 'webserver-02'
      - targets: ['YOUR_OVH_INSTANCE_IP_3:9100']
        labels:
          instance: 'webserver-03'

  - job_name: 'nginx'
    static_configs:
      # Assuming nginx-exporter on port 9113
      - targets: ['YOUR_OVH_INSTANCE_IP_1:9113']
        labels:
          instance: 'webserver-01'
      - targets: ['YOUR_OVH_INSTANCE_IP_2:9113']
        labels:
          instance: 'webserver-02'
      - targets: ['YOUR_OVH_INSTANCE_IP_3:9113']
        labels:
          instance: 'webserver-03'

Validate the file with promtool check config prometheus.yml, then restart or reload Prometheus so the new scrape jobs take effect.

Key Infrastructure Metrics to Monitor

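CPU Usage: rate(node_cpu_seconds_total{mode!="idle"}[5m]) – The metric is a counter of CPU seconds, so only its rate is meaningful. Monitor for sustained high utilization.
Memory Usage: node_memory_MemAvailable_bytes compared with node_memory_MemTotal_bytes – Watch the available fraction rather than the absolute number.
Disk I/O: rate(node_disk_io_time_seconds_total[5m]) – A value approaching 1 means the device is busy almost all the time, which indicates disk saturation.
Network Traffic: node_network_receive_bytes_total and node_network_transmit_bytes_total (rate over time) – Monitor bandwidth usage and potential network saturation.
Web Server Metrics (Nginx): Active connections and requests per second from the stub_status-based exporter. Error rates by status class (4xx, 5xx) are not part of stub_status; they require NGINX Plus or an exporter that parses access logs.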
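
Several of the metrics above are counters, so dashboards work on the difference between two samples rather than on the raw value. As a rough illustration of what a rate-based CPU panel computes, here is a small Python sketch. The function name and the input shape (idle counter already summed across cores) are assumptions made for the example, and counter resets after a reboot are not handled.

```python
def cpu_busy_percent(idle_start, idle_end, elapsed_seconds, cores):
    """Average CPU busy percentage between two samples.

    idle_start / idle_end: node_cpu_seconds_total{mode="idle"} summed over
    all cores at the start and end of the window (seconds).
    elapsed_seconds: wall-clock length of the window.
    cores: number of CPU cores on the host.
    """
    if elapsed_seconds <= 0 or cores <= 0:
        raise ValueError("elapsed_seconds and cores must be positive")
    # Idle core-seconds accumulated per wall-clock second (0 .. cores).
    idle_rate = (idle_end - idle_start) / elapsed_seconds
    idle_fraction = idle_rate / cores
    busy = (1.0 - idle_fraction) * 100.0
    # Clamp to absorb small timing jitter between samples.
    return max(0.0, min(100.0, busy))


if __name__ == "__main__":
    # 4 cores, 60 s window, 60 idle core-seconds accumulated in total.
    print(cpu_busy_percent(1000.0, 1060.0, 60.0, 4))
```

In this example one core-equivalent was idle for the whole window on a four-core host, so the other three were busy on average.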

Grafana for Visualization and Alerting

Grafana provides a powerful dashboarding solution that integrates seamlessly with Prometheus. It allows you to visualize your collected metrics and set up sophisticated alerting rules.

Setting up Grafana and Prometheus Data Source

Install Grafana on a dedicated server or one of your OVH instances. Once installed and running, access the Grafana web UI (default port 3000). Open the data sources page (“Configuration” -> “Data Sources” in older releases, “Connections” -> “Data sources” in recent ones) and add a new Prometheus data source, pointing it to your Prometheus server’s URL (e.g., http://localhost:9090 if both run on the same host).

Essential Grafana Dashboards

You can import pre-built dashboards from Grafana.com (search for “Node Exporter Full” and “Nginx”) or create custom ones. Key panels to include:

System Overview (CPU, RAM, Disk, Network)
Web Server Performance (Requests, Errors, Connections)
Application Response Times (from New Relic, if integrated via API or webhook)

Grafana Alerting Configuration

Within Grafana, you can define alert rules based on Prometheus queries. Because node_cpu_seconds_total is a counter, the rule has to evaluate its rate rather than its raw value. For example, an alert for high CPU usage:

Alert: High CPU Usage
Query: 100 - avg(rate(node_cpu_seconds_total{mode="idle", instance="webserver-01"}[5m])) * 100
Condition: IS ABOVE 90
Evaluate every: 1m
For: 5m
Send to: Alertmanager (or other notification channel)
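
The “Evaluate every” and “For” settings work together: the condition is checked at each evaluation, and the alert only fires once it has held without interruption for the full “For” duration. A simplified Python model of that behaviour is sketched below. It assumes the evaluated value is a CPU busy percentage that breaches when it is above the threshold, and first_firing_time is an illustrative name, not a Grafana or Prometheus API.

```python
def first_firing_time(samples, threshold, for_seconds):
    """Return the timestamp at which the alert first fires, or None.

    samples: list of (timestamp_seconds, value) pairs in time order,
             one per rule evaluation.
    threshold: the alert condition is value > threshold.
    for_seconds: how long the condition must hold continuously.
    """
    pending_since = None
    for ts, value in samples:
        if value > threshold:
            if pending_since is None:
                pending_since = ts  # condition just became true: pending
            if ts - pending_since >= for_seconds:
                return ts  # held long enough: firing
        else:
            pending_since = None  # condition cleared: start over
    return None


if __name__ == "__main__":
    # One evaluation per minute; a dip at t=120 resets the pending timer.
    values = [95, 96, 50, 95, 97, 93, 94, 99, 98]
    samples = [(i * 60, v) for i, v in enumerate(values)]
    print(first_firing_time(samples, threshold=90, for_seconds=300))
```

The dip in the third sample is why a short spike does not page anyone: the pending timer restarts and the alert needs another five uninterrupted minutes above the threshold.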

Configure notification channels (e.g., Slack, PagerDuty, email) in Grafana’s alerting settings.

Monitoring DynamoDB Clusters

When using AWS DynamoDB, monitoring shifts to AWS CloudWatch. DynamoDB provides a rich set of metrics that are crucial for understanding performance, cost, and potential throttling.

Key DynamoDB CloudWatch Metrics

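ConsumedReadCapacityUnits / ConsumedWriteCapacityUnits: Tracks the actual capacity consumed. Use the Sum statistic and divide by the period in seconds to get a per-second figure you can compare with provisioned capacity. Essential for cost management and identifying under/over-provisioning.
ProvisionedReadCapacityUnits / ProvisionedWriteCapacityUnits: Shows the capacity you’ve configured (relevant for tables in provisioned mode).
ReadThrottleEvents / WriteThrottleEvents: Critical for identifying throttling. If these are non-zero, requests are being rejected because they exceed the throughput available to the table or one of its indexes.
SuccessfulRequestLatency: Measures the latency of successful requests in milliseconds, reported per operation (the Operation dimension, e.g. GetItem or Query). Monitor the 95th and 99th percentiles where CloudWatch offers them for this metric, otherwise Average and Maximum.
SystemErrors: Tracks requests that failed with an internal DynamoDB error (HTTP 500).
ItemCount: Number of items in the table. This is not a CloudWatch metric; it comes from the DescribeTable API and is refreshed roughly every six hours.
TableSizeBytes: Size of the table in bytes. Like ItemCount, it is reported by DescribeTable rather than CloudWatch.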
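
To judge whether a table is under- or over-provisioned, the consumed and provisioned figures need to be put on the same scale. With the Sum statistic, CloudWatch reports consumed capacity as a total over the period, so it has to be divided by the period length before it can be compared with the provisioned per-second value. A minimal Python sketch of that arithmetic follows; the function name and the sample numbers are illustrative.

```python
def capacity_utilization(consumed_sum, period_seconds, provisioned_units):
    """Percentage of provisioned capacity used, averaged over one period.

    consumed_sum: Sum of ConsumedReadCapacityUnits (or Write) for the period.
    period_seconds: CloudWatch period length in seconds, e.g. 300.
    provisioned_units: provisioned capacity units per second.
    """
    if period_seconds <= 0 or provisioned_units <= 0:
        raise ValueError("period_seconds and provisioned_units must be positive")
    # consumed_sum / period_seconds is the average units consumed per second.
    return consumed_sum * 100.0 / (period_seconds * provisioned_units)


if __name__ == "__main__":
    # 12,000 read units over 5 minutes is 40 per second; 50 are provisioned.
    print(capacity_utilization(12000, 300, 50))
```

Because this is an average over the whole period, a table can still be throttled during short bursts while the figure looks comfortable, which is why the throttle event metrics deserve their own alarm.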

Setting up CloudWatch Alarms

Use the AWS Management Console or AWS CLI to create CloudWatch Alarms for your DynamoDB tables. Focus on actionable alerts:

Example CloudWatch Alarm: Write Throttling

Metric: WriteThrottleEvents
Statistic: Sum
Period: 5 minutes
Threshold type: Static
Condition: Greater/Equal
Value: 1 (Any throttling event is a concern)
Actions: Send notification to SNS topic (which can then trigger Lambda, SQS, etc., or send an email/Slack message).

Example CloudWatch Alarm: High Read Latency

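Metric: SuccessfulRequestLatency (with the Operation dimension, e.g. GetItem or Query)
Statistic: 95th Percentile (fall back to Average if percentiles are not offered for this metric)
Period: 1 minute
Threshold type: Static
Condition: Greater/Equal
Value: 500 (the metric is reported in milliseconds, so this is 500ms; adjust based on your application’s SLO)
Actions: Send notification to SNS topic.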
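
Alarms like these are easier to keep consistent across tables when they are created from code. The sketch below builds the parameters for the write-throttling alarm and passes them to CloudWatch’s put_metric_alarm call through boto3, which is a third-party package that has to be installed separately. The helper names, the alarm naming scheme, the example ARN, and the choice to treat missing data as not breaching (on the assumption that throttle metrics are only published when throttling happens) are all assumptions for illustration.

```python
def write_throttle_alarm(table_name, sns_topic_arn):
    """Build put_metric_alarm parameters for a write-throttling alarm."""
    return {
        "AlarmName": f"{table_name}-write-throttle",  # hypothetical naming scheme
        "Namespace": "AWS/DynamoDB",
        "MetricName": "WriteThrottleEvents",
        "Dimensions": [{"Name": "TableName", "Value": table_name}],
        "Statistic": "Sum",
        "Period": 300,  # 5 minutes
        "EvaluationPeriods": 1,
        "Threshold": 1.0,  # any throttling event is a concern
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Assumption: no datapoints means no throttling, so do not alarm.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params, region="us-east-1"):
    """Create or update the alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the builder works without boto3 installed

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**params)


if __name__ == "__main__":
    params = write_throttle_alarm(
        "YourDynamoDBTableName",
        "arn:aws:sns:us-east-1:123456789012:dynamodb-alerts",  # placeholder ARN
    )
    print(params["AlarmName"])
```

Keeping the parameter builder separate from the API call makes it possible to review or unit-test the alarm definition without touching an AWS account.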

Integrating DynamoDB Metrics with Prometheus/Grafana

While CloudWatch is the primary tool, you can pull DynamoDB metrics into your Prometheus/Grafana stack for a unified dashboard. This typically involves using a CloudWatch exporter for Prometheus or a custom script that queries CloudWatch API and exposes metrics in Prometheus format.

Using `cloudwatch-exporter`

The Prometheus project’s cloudwatch_exporter (prometheus/cloudwatch_exporter on GitHub) can be configured to pull specific CloudWatch metrics and expose them via an HTTP endpoint for Prometheus to scrape. It reads AWS credentials from the standard credential chain (on an OVH host that usually means environment variables or a shared credentials file, since there is no EC2 instance role) and takes a YAML file listing the DynamoDB metrics you’re interested in.

# Example configuration for prometheus/cloudwatch_exporter
region: us-east-1
metrics:
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ConsumedReadCapacityUnits
    aws_dimensions: [TableName]
    aws_dimension_select:
      TableName: ["YourDynamoDBTableName"]
    aws_statistics: [Sum]
    period_seconds: 300 # 5 minutes
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ReadThrottleEvents
    aws_dimensions: [TableName]
    aws_dimension_select:
      TableName: ["YourDynamoDBTableName"]
    aws_statistics: [Sum]
    period_seconds: 300

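Once running, add a scrape job in your prometheus.yml to collect metrics from the `cloudwatch-exporter` instance (it listens on port 9106 by default). Each scrape triggers CloudWatch API requests, which are billed, so keep the metric list short and the scrape interval no shorter than the CloudWatch period.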

Centralized Logging and Alert Aggregation

Beyond metrics, centralized logging is indispensable for debugging and incident response. For application logs (PHP errors, WordPress logs) and system logs, consider a solution like the ELK stack (Elasticsearch, Logstash, Kibana) or a managed service like Datadog or Splunk.

Log Shipping with Filebeat

Filebeat is a lightweight shipper that forwards log files from your OVH servers to a central log aggregation system. Configure Filebeat to tail your PHP error logs, web server access/error logs, and any custom application logs.

# Example filebeat.yml configuration for PHP errors
filebeat.inputs:
- type: filestream # successor to the "log" input, which is deprecated since Filebeat 7.16
  id: php-error-log # filestream inputs need a unique id
  enabled: true
  paths:
    - /var/log/php/error.log # Adjust path as per your PHP configuration
  fields:
    log_type: php_error
  fields_under_root: true

# Output to Logstash or Elasticsearch
output.logstash:
  hosts: ["your-logstash-host:5044"]

Alertmanager for Unified Alerting

If you’re using Prometheus and Grafana, Alertmanager is the de facto standard for handling alerts. It deduplicates, groups, and routes alerts to the correct receiver (email, Slack, PagerDuty). Configure Prometheus to send alerts to Alertmanager, and then configure Alertmanager’s alertmanager.yml to define routing rules and receivers.

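route:
  # Only labels that your alerts actually carry have an effect here.
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver' # Default receiver if no specific route matches
  routes:
    # Page on-call for alerts labelled severity="critical"
    # (the label is a convention you set in your alert rules).
    - matchers:
        - severity="critical"
      receiver: 'critical-receiver'

receivers:
- name: 'default-receiver'
  slack_configs:
  - api_url: 'YOUR_SLACK_WEBHOOK_URL'
    channel: '#alerts'

- name: 'critical-receiver'
  pagerduty_configs:
  - service_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'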
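
The group_by setting decides which alerts are bundled into a single notification: alerts whose values for the listed labels are identical end up in the same group. A simplified Python model of that grouping is sketched below. It is not Alertmanager’s implementation, the function name and label values are made up for the example, and a label that an alert does not carry is treated as an empty string.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Group alert label sets by their values for the group_by labels.

    alerts: list of dicts mapping label name to label value.
    group_by: list of label names, as in the route's group_by setting.
    Returns {group_key: [alerts]}, where group_key is a tuple of
    (label, value) pairs.
    """
    groups = defaultdict(list)
    for labels in alerts:
        # Missing labels are treated as empty values in this model.
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)


if __name__ == "__main__":
    alerts = [
        {"alertname": "HighCPU", "service": "wordpress", "instance": "webserver-01"},
        {"alertname": "HighCPU", "service": "wordpress", "instance": "webserver-02"},
        {"alertname": "DiskFull", "service": "wordpress", "instance": "webserver-01"},
    ]
    for key, members in group_alerts(alerts, ["alertname", "service"]).items():
        print(key, len(members))
```

Here the two HighCPU alerts share one notification even though they come from different instances, while DiskFull is sent separately. Adding instance to group_by would split them into one notification per host.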

This comprehensive approach, combining application-level APM, infrastructure metrics, cloud service monitoring, and centralized logging/alerting, provides the necessary visibility to keep your WordPress application and DynamoDB clusters healthy and performant on OVH.