Server Monitoring Best Practices: Keeping Your WordPress App and DynamoDB Clusters Alive on OVH
Establishing a Robust Monitoring Baseline for WordPress on OVH
Deploying WordPress on OVH, especially when coupled with a managed service like AWS DynamoDB for specific data persistence needs, necessitates a multi-layered monitoring strategy. This isn’t about generic uptime checks; it’s about granular visibility into application performance, underlying infrastructure health, and potential bottlenecks that could impact user experience and data integrity. We’ll focus on actionable metrics and tools, starting with the WordPress application layer itself.
Application Performance Monitoring (APM) with New Relic
For deep application insights, New Relic is a powerful choice. Its PHP agent can provide transaction traces, database query analysis, and error reporting directly from your WordPress application. The initial setup involves installing the agent and configuring it to communicate with your New Relic account.
New Relic PHP Agent Installation and Configuration
On your OVH instance (assuming a typical Linux distribution like Ubuntu or Debian), you’ll typically install the agent from New Relic’s package repository or by downloading the installer tarball. The agent is not published on PECL. Ensure your PHP version is supported by the agent release you install.
Package Installation (Recommended)
First, add New Relic’s apt repository and signing key as described in the official PHP agent installation guide. Then install the package and run the bundled installer, which prompts for your license key:
sudo apt update
sudo apt install newrelic-php5
sudo newrelic-install install
The installer normally places a newrelic.ini that loads the extension into each PHP configuration directory it detects (there are often separate directories for different SAPI configurations like CLI and FPM). If the extension is not loaded afterwards, check that the relevant configuration contains a line like this:
extension=newrelic.so
Then, configure the agent itself. Create or edit the newrelic.ini file (often located in /etc/php/<version>/cli/conf.d/ or similar paths). You’ll need your New Relic license key and a meaningful application name.
; This file is automatically generated by the newrelic installer.
; It is recommended to edit the newrelic.ini file directly.
; For more information, visit: https://docs.newrelic.com/docs/php/new-relic-php-installation-configuration
[newrelic]
; Required: Your New Relic license key.
newrelic.license = "YOUR_LICENSE_KEY"
; Required: The name of your application.
newrelic.appname = "My WordPress App"
; Optional: Set to true to enable the agent.
newrelic.enabled = true
; Optional: To leave CLI scripts uninstrumented, set newrelic.enabled = false
; in the CLI SAPI's copy of this file.
After saving these changes, restart your web server (e.g., Apache or Nginx) and PHP-FPM to ensure the agent is loaded.
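To confirm that the restart actually picked up the agent, one option is to check the module list PHP reports. The sketch below is a minimal illustration, not part of New Relic’s tooling: the helper names are hypothetical, and it assumes the extension shows up as `newrelic` in `php -m` output. It only covers the CLI SAPI; PHP-FPM may read a different configuration directory.

```python
import subprocess


def extension_listed(php_m_output: str, name: str = "newrelic") -> bool:
    """Return True if `name` appears as its own line in `php -m` output."""
    modules = {line.strip().lower() for line in php_m_output.splitlines()}
    return name.lower() in modules


def cli_has_extension(name: str = "newrelic", php_binary: str = "php") -> bool:
    """Run `php -m` and report whether the extension is loaded in the CLI SAPI."""
    result = subprocess.run(
        [php_binary, "-m"], capture_output=True, text=True, check=True
    )
    return extension_listed(result.stdout, name)
```

Calling `cli_has_extension()` on the server returns `False` if the extension line is missing from the CLI configuration, which usually means the newrelic.ini file was not placed in that SAPI’s conf.d directory.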
Key WordPress Metrics to Monitor with New Relic
- Transaction Traces: Identify slow PHP functions, external service calls, and database queries. Look for transactions exceeding your SLO (e.g., > 500ms).
- Database Queries: Pinpoint inefficient SQL queries. Monitor query count, average duration, and slow query logs.
- Error Rate: Track PHP errors and exceptions. Set up alerts for spikes in error frequency.
- External Services: Monitor latency and error rates for API calls to third-party services (e.g., payment gateways, social media APIs).
- WordPress Specifics: New Relic often provides insights into WordPress hooks, plugin performance, and theme execution times.
Monitoring the Underlying OVH Infrastructure
While New Relic covers the application, the OVH infrastructure (compute, network, storage) requires its own set of monitoring tools. OVH provides its own monitoring dashboards, but for deeper integration and custom alerting, we’ll leverage Prometheus and Grafana.
Prometheus for Time-Series Metrics Collection
Prometheus is an open-source systems monitoring and alerting toolkit. It works by scraping metrics from configured targets at given intervals, evaluating rule expressions, displaying the results, and triggering alerts if necessary. We’ll use the Node Exporter for system-level metrics and potentially a specific exporter for your web server (e.g., Nginx exporter).
Node Exporter Deployment on OVH Instances
Download the latest release of Node Exporter from the official Prometheus GitHub repository. For a typical Ubuntu/Debian system:
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo rm -rf node_exporter-1.7.0.linux-amd64*
Create a systemd service file to manage the Node Exporter process:
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=nobody
Group=nogroup
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
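Ensure your firewall (e.g., UFW) allows your Prometheus server to reach the Node Exporter’s default port (9100). Restricting the rule to the Prometheus server’s address avoids exposing host metrics to the public internet (YOUR_PROMETHEUS_SERVER_IP is a placeholder):
sudo ufw allow from YOUR_PROMETHEUS_SERVER_IP to any port 9100 proto tcp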
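Before wiring the exporter into Prometheus, it can help to confirm that the endpoint is reachable and returns metrics. The following is a rough sketch under a few assumptions: the function names are hypothetical, the exporter listens on the default port 9100, and the parser handles only the simple `series value` lines that Node Exporter emits (no timestamps).

```python
from urllib.request import urlopen


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}.

    Comment lines (# HELP / # TYPE) and blank lines are skipped. The series
    key keeps its label set verbatim, e.g. 'node_load1' or
    'node_cpu_seconds_total{cpu="0",mode="idle"}'.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token; label values may
        # themselves contain spaces, so split from the right.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(host: str, port: int = 9100, timeout: float = 5.0) -> dict[str, float]:
    """Fetch and parse /metrics from a Node Exporter instance."""
    with urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Running `fetch_metrics("YOUR_OVH_INSTANCE_IP_1")` from the Prometheus host and looking up `node_load1` is a quick way to verify both the service and the firewall rule.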
Prometheus Configuration for Scraping
In your Prometheus configuration file (prometheus.yml), add a scrape job for your OVH instances:
scrape_configs:
  - job_name: 'ovh-instances'
    static_configs:
      # One target per entry so that each host gets a unique 'instance' label
      - targets: ['YOUR_OVH_INSTANCE_IP_1:9100']
        labels:
          instance: 'webserver-01'
      - targets: ['YOUR_OVH_INSTANCE_IP_2:9100']
        labels:
          instance: 'webserver-02'
      - targets: ['YOUR_OVH_INSTANCE_IP_3:9100']
        labels:
          instance: 'webserver-03'
  - job_name: 'nginx'
    static_configs:
      # Assuming nginx-exporter on port 9113
      - targets: ['YOUR_OVH_INSTANCE_IP_1:9113']
        labels:
          instance: 'webserver-01'
      - targets: ['YOUR_OVH_INSTANCE_IP_2:9113']
        labels:
          instance: 'webserver-02'
      - targets: ['YOUR_OVH_INSTANCE_IP_3:9113']
        labels:
          instance: 'webserver-03'
Restart Prometheus after updating the configuration.
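Once Prometheus is back up, the built-in `up` metric shows whether each configured target is being scraped successfully. As a hedged sketch, the snippet below queries the Prometheus HTTP API (`/api/v1/query`) and lists targets reporting 0. The function names are hypothetical, and the default URL assumes Prometheus is listening locally on port 9090.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def down_targets(query_response: dict) -> list[str]:
    """Given a parsed /api/v1/query response for `up`, list instances at 0."""
    if query_response.get("status") != "success":
        raise ValueError(f"query failed: {query_response}")
    down = []
    for series in query_response["data"]["result"]:
        _timestamp, value = series["value"]  # the value arrives as a string
        if float(value) == 0.0:
            down.append(series["metric"].get("instance", "<unknown>"))
    return sorted(down)


def query_up(prometheus_url: str = "http://localhost:9090") -> dict:
    """Run the instant query `up` against the Prometheus HTTP API."""
    url = f"{prometheus_url}/api/v1/query?{urlencode({'query': 'up'})}"
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

`down_targets(query_up())` returning an empty list suggests every scrape job in prometheus.yml is reachable.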
Key Infrastructure Metrics to Monitor
- CPU Usage: node_cpu_seconds_total (rate over time) – Monitor for sustained high utilization.
- Memory Usage: node_memory_MemAvailable_bytes – Ensure sufficient free memory.
- Disk I/O: node_disk_io_time_seconds_total (rate over time) – Identify disk saturation.
- Network Traffic: node_network_receive_bytes_total and node_network_transmit_bytes_total (rate over time) – Monitor bandwidth usage and potential network saturation.
- Web Server Metrics (Nginx): Active connections, requests per second, error rates (4xx, 5xx).
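Several of these metrics are counters, which is why they are read as a rate over time rather than as raw values. The arithmetic for CPU can be sketched as follows: take two samples of the idle counter, divide the difference by the elapsed time, and subtract the idle fraction from 1. The function name is hypothetical; in PromQL the same idea is commonly written as `100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))`.

```python
def cpu_busy_percent(
    idle_start: float, idle_end: float, interval_seconds: float, cores: int = 1
) -> float:
    """Busy percentage over an interval from two samples of the idle counter.

    idle_start / idle_end are node_cpu_seconds_total{mode="idle"} summed
    across all cores at the start and end of the interval.
    """
    if interval_seconds <= 0 or cores <= 0:
        raise ValueError("interval_seconds and cores must be positive")
    # Idle seconds accumulated per wall-clock second, between 0 and `cores`.
    idle_rate = (idle_end - idle_start) / interval_seconds
    idle_fraction = idle_rate / cores
    return round(100.0 * (1.0 - idle_fraction), 2)


# Two cores that together accumulated 60 idle seconds in a 60-second window
# were idle half of the time.
print(cpu_busy_percent(1000.0, 1060.0, 60.0, cores=2))  # 50.0
```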
Grafana for Visualization and Alerting
Grafana provides a powerful dashboarding solution that integrates seamlessly with Prometheus. It allows you to visualize your collected metrics and set up sophisticated alerting rules.
Setting up Grafana and Prometheus Data Source
Install Grafana on a dedicated server or one of your OVH instances. Once installed and running, access the Grafana web UI (default port 3000). Navigate to “Configuration” -> “Data Sources” (“Connections” -> “Data sources” in recent Grafana releases) and add a new Prometheus data source, pointing it to your Prometheus server’s URL (e.g., http://localhost:9090 if Prometheus runs on the same host as Grafana).
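The same data source can also be created through Grafana’s HTTP API (`POST /api/datasources`), which is convenient when rebuilding servers. The sketch below only builds the request; the function names and the token placeholder are hypothetical, and it assumes you have a Grafana API or service account token with permission to manage data sources.

```python
import json
from urllib.request import Request


def prometheus_datasource_payload(
    name: str = "Prometheus",
    url: str = "http://localhost:9090",
    default: bool = True,
) -> dict:
    """Body for POST /api/datasources describing a Prometheus data source."""
    return {
        "name": name,
        "type": "prometheus",
        "url": url,
        "access": "proxy",  # Grafana's backend performs the queries
        "isDefault": default,
    }


def build_request(grafana_url: str, api_token: str, payload: dict) -> Request:
    """Build (but do not send) the HTTP request that creates the data source."""
    return Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
```

Passing the result of `build_request(...)` to `urllib.request.urlopen` sends it; keeping construction and sending separate makes the payload easy to review first.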
Essential Grafana Dashboards
You can import pre-built dashboards from Grafana.com (search for “Node Exporter Full” and “Nginx”) or create custom ones. Key panels to include:
- System Overview (CPU, RAM, Disk, Network)
- Web Server Performance (Requests, Errors, Connections)
- Application Response Times (from New Relic, if integrated via API or webhook)
Grafana Alerting Configuration
Within Grafana, you can define alert rules based on Prometheus queries. For example, an alert for high CPU usage:
Alert: High CPU Usage
Query: 100 * avg(rate(node_cpu_seconds_total{mode="idle", instance="webserver-01"}[5m]))
Condition: IS BELOW 10 (less than 10% idle, i.e., more than 90% busy)
Evaluate every: 1m
For: 5m
Send to: Alertmanager (or other notification channel)
Configure notification channels (e.g., Slack, PagerDuty, email) in Grafana’s alerting settings.
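The interaction between “Evaluate every” and “For” is easy to misread: a single bad evaluation only moves the rule to a pending state, and it fires once the condition has held for the whole “For” window. A simplified model of that behavior is sketched below, assuming one evaluation per minute and a five-minute window. The function is illustrative and is not Grafana’s implementation.

```python
def alert_states(
    idle_percent_samples: list[float],
    threshold: float = 10.0,
    for_evaluations: int = 5,
) -> list[str]:
    """Return 'normal', 'pending', or 'firing' for each evaluation.

    With one evaluation per minute, for_evaluations=5 models "For: 5m":
    the first breach starts the clock, and the rule fires once the breach
    has persisted for five further evaluation intervals.
    """
    states = []
    consecutive_breaches = 0
    for value in idle_percent_samples:
        if value < threshold:
            consecutive_breaches += 1
            firing = consecutive_breaches > for_evaluations
            states.append("firing" if firing else "pending")
        else:
            consecutive_breaches = 0  # any healthy evaluation resets the clock
            states.append("normal")
    return states
```

A short CPU spike that recovers within the window therefore never notifies anyone, which is the main reason to set “For” to something longer than a single evaluation interval.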
Monitoring DynamoDB Clusters
When using AWS DynamoDB, monitoring shifts to AWS CloudWatch. DynamoDB provides a rich set of metrics that are crucial for understanding performance, cost, and potential throttling.
Key DynamoDB CloudWatch Metrics
- ConsumedReadCapacityUnits / ConsumedWriteCapacityUnits: Tracks the actual capacity consumed. Essential for cost management and identifying under/over-provisioning.
- ProvisionedReadCapacityUnits / ProvisionedWriteCapacityUnits: Shows the capacity you’ve configured.
- ReadThrottleEvents / WriteThrottleEvents: Critical for identifying throttling. If these are non-zero, your application is being limited by provisioned throughput.
- SuccessfulRequestLatency: Measures the latency of successful requests. Monitor the 95th and 99th percentiles.
- SystemErrors: Tracks internal DynamoDB errors.
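- ItemCount: Approximate number of items in the table. This value comes from the DescribeTable API rather than CloudWatch and is refreshed roughly every six hours.
- TableSizeBytes: Approximate size of the table in bytes, also reported by DescribeTable.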
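Consumed and provisioned capacity are not directly comparable as graphed: the consumed metrics are usually read with the Sum statistic over a period, while provisioned capacity is expressed per second. A small worked example of the conversion, with a hypothetical helper name:

```python
def capacity_utilization(
    consumed_sum: float, period_seconds: int, provisioned_units: float
) -> float:
    """Fraction of provisioned capacity in use.

    consumed_sum is the CloudWatch Sum of ConsumedReadCapacityUnits (or
    ConsumedWriteCapacityUnits) over one period; dividing by the period
    length gives average units per second.
    """
    if period_seconds <= 0 or provisioned_units <= 0:
        raise ValueError("period_seconds and provisioned_units must be positive")
    consumed_per_second = consumed_sum / period_seconds
    return consumed_per_second / provisioned_units


# 24,000 read units consumed in a 5-minute period is 80 units per second,
# which is 80% of a table provisioned at 100 read capacity units.
print(capacity_utilization(24_000, 300, 100))
```

Because this is an average over the period, short bursts can still be throttled even when the computed utilization looks comfortable.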
Setting up CloudWatch Alarms
Use the AWS Management Console or AWS CLI to create CloudWatch Alarms for your DynamoDB tables. Focus on actionable alerts:
Example CloudWatch Alarm: Write Throttling
Metric: WriteThrottleEvents
Statistic: Sum
Period: 5 minutes
Threshold type: Static
Condition: Greater/Equal
Value: 1 (Any throttling event is a concern)
Actions: Send notification to SNS topic (which can then trigger Lambda, SQS, etc., or send an email/Slack message).
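The same alarm can be created programmatically. The sketch below assumes the third-party boto3 library and uses its CloudWatch `put_metric_alarm` call. The function names, alarm name, and region default are illustrative placeholders. Missing data is treated as not breaching on the assumption that the throttle metric is only emitted when throttling actually occurs.

```python
def write_throttle_alarm(table_name: str, sns_topic_arn: str) -> dict:
    """Keyword arguments for CloudWatch put_metric_alarm (write throttling)."""
    return {
        "AlarmName": f"{table_name}-write-throttle",  # hypothetical naming scheme
        "Namespace": "AWS/DynamoDB",
        "MetricName": "WriteThrottleEvents",
        "Dimensions": [{"Name": "TableName", "Value": table_name}],
        "Statistic": "Sum",
        "Period": 300,  # 5 minutes
        "EvaluationPeriods": 1,
        "Threshold": 1.0,  # any throttling event is a concern
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params: dict, region: str = "us-east-1") -> None:
    """Create or update the alarm; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above has no dependencies

    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)
```

Keeping the parameters in a plain dictionary makes it straightforward to review them or reuse them for several tables before anything is sent to AWS.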
Example CloudWatch Alarm: High Read Latency
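Metric: SuccessfulRequestLatency (with the Operation dimension set to a read operation such as GetItem or Query)
Statistic: 95th Percentile
Period: 1 minute
Threshold type: Static
Condition: Greater/Equal
Value: 500 (the metric is reported in milliseconds, so this is 500ms; adjust based on your application’s SLO)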
Actions: Send notification to SNS topic.
Integrating DynamoDB Metrics with Prometheus/Grafana
While CloudWatch is the primary tool, you can pull DynamoDB metrics into your Prometheus/Grafana stack for a unified dashboard. This typically involves using a CloudWatch exporter for Prometheus or a custom script that queries CloudWatch API and exposes metrics in Prometheus format.
Using `cloudwatch-exporter`
The cloudwatch-exporter project (available on GitHub) can be configured to scrape specific CloudWatch metrics and expose them via an HTTP endpoint for Prometheus to scrape. You’ll need to configure it with AWS credentials and specify the DynamoDB metrics you’re interested in.
# Example configuration snippet for cloudwatch-exporter
# (keys follow the schema of the prometheus/cloudwatch_exporter project;
# AWS credentials are read from the standard AWS credential chain)
region: us-east-1
metrics:
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ConsumedReadCapacityUnits
    aws_dimensions: [TableName]
    aws_dimension_select:
      TableName: ["YourDynamoDBTableName"]
    aws_statistics: [Sum]
    period_seconds: 300 # 5 minutes
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ReadThrottleEvents
    aws_dimensions: [TableName]
    aws_dimension_select:
      TableName: ["YourDynamoDBTableName"]
    aws_statistics: [Sum]
    period_seconds: 300
Once running, add a scrape job in your prometheus.yml to collect metrics from the `cloudwatch-exporter` instance.
Centralized Logging and Alert Aggregation
Beyond metrics, centralized logging is indispensable for debugging and incident response. For application logs (PHP errors, WordPress logs) and system logs, consider a solution like the ELK stack (Elasticsearch, Logstash, Kibana) or a managed service like Datadog or Splunk.
Log Shipping with Filebeat
Filebeat is a lightweight shipper that forwards log files from your OVH servers to a central log aggregation system. Configure Filebeat to tail your PHP error logs, web server access/error logs, and any custom application logs.
# Example filebeat.yml configuration for PHP errors
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/php/error.log # Adjust path as per your PHP configuration
    fields:
      log_type: php_error
    fields_under_root: true

# Output to Logstash or Elasticsearch
output.logstash:
  hosts: ["your-logstash-host:5044"]
Alertmanager for Unified Alerting
If you’re using Prometheus and Grafana, Alertmanager is the de facto standard for handling alerts. It deduplicates, groups, and routes alerts to the correct receiver (email, Slack, PagerDuty). Configure Prometheus to send alerts to Alertmanager, and then configure Alertmanager’s alertmanager.yml to define routing rules and receivers.
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver' # Default receiver if no specific route matches
  routes:
    # Send alerts labeled severity="critical" to the paging receiver
    - matchers:
        - severity="critical"
      receiver: 'critical-receiver'
receivers:
  - name: 'default-receiver'
    slack_configs:
      - api_url: 'YOUR_SLACK_WEBHOOK_URL'
        channel: '#alerts'
  - name: 'critical-receiver'
    pagerduty_configs:
      - service_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'
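The effect of `group_by` can be pictured as bucketing alerts by the values of the listed labels, so that one notification covers every alert in a bucket. The sketch below models only that grouping step, with hypothetical alert labels. It is not Alertmanager’s implementation and it ignores timing settings such as group_wait.

```python
from collections import defaultdict


def group_alerts(
    alerts: list[dict[str, str]],
    group_by: tuple[str, ...] = ("alertname", "cluster", "service"),
) -> dict[tuple[str, ...], list[dict[str, str]]]:
    """Bucket alerts by the values of the group_by labels.

    A label that is missing from an alert is treated as an empty value.
    """
    groups: dict[tuple[str, ...], list[dict[str, str]]] = defaultdict(list)
    for labels in alerts:
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "HighCPU", "cluster": "ovh", "service": "wordpress", "instance": "webserver-01"},
    {"alertname": "HighCPU", "cluster": "ovh", "service": "wordpress", "instance": "webserver-02"},
    {"alertname": "WriteThrottle", "cluster": "aws", "service": "dynamodb"},
]
for key, members in group_alerts(alerts).items():
    print(key, len(members))
```

Here the two HighCPU alerts end up in one group and would arrive as a single notification, while the DynamoDB alert forms its own group.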
This comprehensive approach, combining application-level APM, infrastructure metrics, cloud service monitoring, and centralized logging/alerting, provides the necessary visibility to keep your WordPress application and DynamoDB clusters healthy and performant on OVH.