Server Monitoring Best Practices: Keeping Your WordPress App and Elasticsearch Clusters Alive on AWS
Proactive Health Checks for WordPress on AWS EC2
Maintaining a high-availability WordPress deployment on AWS EC2 necessitates a multi-layered monitoring strategy. Beyond basic CPU and memory utilization, we need to inspect application-level metrics and critical system processes. This involves leveraging both AWS CloudWatch and custom scripting for granular insights.
EC2 Instance Monitoring with CloudWatch Agent
The CloudWatch Agent is indispensable for collecting system-level metrics beyond the default EC2 metrics. It allows us to push custom metrics, log files, and detailed system performance data to CloudWatch. For a WordPress server, we’ll focus on disk I/O, network traffic, and specific process monitoring.
First, install the CloudWatch Agent on your EC2 instance. The installation process varies slightly by OS, but generally involves downloading the package and running the installer.
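As a rough sketch of that step, the helper below picks an install command from the distribution ID found in /etc/os-release. The function name cw_agent_install_cmd is hypothetical, only Amazon Linux and Debian/Ubuntu on amd64 are covered, and the download URL is the generic S3 location, which should be checked against the AWS documentation for your region and architecture. It prints the command rather than running it, so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the CloudWatch Agent install command
# for a given distribution ID, as found in the ID= field of /etc/os-release.
cw_agent_install_cmd() {
  local os_id="$1"
  case "$os_id" in
    amzn)
      # Amazon Linux ships the agent in its package repositories.
      echo "sudo yum install -y amazon-cloudwatch-agent"
      ;;
    ubuntu|debian)
      # Assumes amd64; adjust the path for arm64 instances.
      echo "wget https://amazoncloudwatch-agent.s3.amazonaws.com/${os_id}/amd64/latest/amazon-cloudwatch-agent.deb && sudo dpkg -i -E ./amazon-cloudwatch-agent.deb"
      ;;
    *)
      echo "unsupported distribution: ${os_id}" >&2
      return 1
      ;;
  esac
}

# Example: ( . /etc/os-release && cw_agent_install_cmd "$ID" )
```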
CloudWatch Agent Configuration
The agent’s configuration is defined in a JSON file, typically located at /opt/aws/amazon-cloudwatch-agent/bin/config.json. Here’s a sample configuration focusing on WordPress-relevant metrics:
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "cwagent"
  },
  "metrics": {
    "namespace": "WordPress/EC2",
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "disk": {
        "measurement": ["used_percent", "free", "total", "inodes_free", "inodes_used"],
        "resources": ["/", "/var/www/html"],
        "ignore_file_system_types": ["tmpfs", "devtmpfs"]
      },
      "mem": {
        "measurement": ["mem_used_percent", "mem_available_percent"]
      },
      "swap": {
        "measurement": ["swap_used_percent"]
      },
      "net": {
        "measurement": ["bytes_sent", "bytes_recv", "packets_sent", "packets_recv"]
      },
      "statsd": {
        "service_address": ":8125"
      },
      "procstat": [
        {
          "exe": "php-fpm",
          "measurement": ["pid_count", "cpu_usage", "memory_rss"]
        },
        {
          "exe": "nginx",
          "measurement": ["pid_count", "cpu_usage", "memory_rss"]
        },
        {
          "exe": "mysqld",
          "measurement": ["pid_count", "cpu_usage", "memory_rss"]
        }
      ]
    }
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/nginx/access.log",
            "log_group_name": "WordPress/EC2/Nginx/Access",
            "log_stream_name": "{instance_id}/nginx-access"
          },
          {
            "file_path": "/var/log/nginx/error.log",
            "log_group_name": "WordPress/EC2/Nginx/Error",
            "log_stream_name": "{instance_id}/nginx-error"
          },
          {
            "file_path": "/var/log/php-fpm/error.log",
            "log_group_name": "WordPress/EC2/PHP-FPM/Error",
            "log_stream_name": "{instance_id}/php-fpm-error"
          },
          {
            "file_path": "/var/log/mysql/error.log",
            "log_group_name": "WordPress/EC2/MySQL/Error",
            "log_stream_name": "{instance_id}/mysql-error"
          }
        ]
      }
    }
  }
}
After saving the configuration, start and enable the agent:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
sudo systemctl enable amazon-cloudwatch-agent
sudo systemctl start amazon-cloudwatch-agent
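To confirm the agent actually came up, its control script can report status as JSON. The helper below is a minimal sketch that assumes the output contains a "status" field with the value "running"; the function name cw_agent_is_running is hypothetical, and the pattern may need adjusting if your agent version prints a different layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the agent's status JSON (read from stdin)
# reports "status": "running". The leading quote in the pattern keeps it from
# matching other keys that merely end in "status".
cw_agent_is_running() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"running"'
}

# Example:
#   sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status \
#     | cw_agent_is_running && echo "agent is running"
```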
Application-Level Monitoring with StatsD and Prometheus (Optional but Recommended)
For deeper application insights, integrating StatsD and Prometheus provides a robust solution. WordPress itself can be instrumented to emit custom metrics. In this setup the CloudWatch Agent’s StatsD listener collects those metrics, while Prometheus is used later for the Elasticsearch cluster.
WordPress PHP-FPM Metrics via StatsD
PHP-FPM does not emit StatsD metrics on its own, so the application has to send them. A common approach is to use a Composer library such as domnikl/statsd from a custom plugin:
<?php
// Example: in a custom plugin or mu-plugin
require_once __DIR__ . '/vendor/autoload.php'; // If using Composer

use Domnikl\Statsd\Client;
use Domnikl\Statsd\Connection\UdpSocket;

$connection = new UdpSocket('127.0.0.1', 8125);
$statsd = new Client($connection);

// Track request duration: start the clock when the plugin loads
// and report once WordPress has finished handling the request.
$startTime = microtime(true);
add_action('shutdown', function () use ($statsd, $startTime) {
    $duration = microtime(true) - $startTime;
    $statsd->timing('request_duration_ms', $duration * 1000);

    // Track fatal errors: count the request if it ended with one.
    $error = error_get_last();
    $fatal = E_ERROR | E_PARSE | E_CORE_ERROR | E_COMPILE_ERROR;
    if ($error !== null && ($error['type'] & $fatal)) {
        $statsd->increment('php_errors');
    }
});

// Track successful logins via the wp_login hook,
// which fires once per successful authentication.
add_action('wp_login', function () use ($statsd) {
    $statsd->increment('user_logins');
});
Configure the CloudWatch Agent’s statsd section (as shown previously) to listen on UDP port 8125. This will forward these custom metrics to CloudWatch under the WordPress/EC2 namespace.
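A quick way to check the listener without touching WordPress is to send a test datagram by hand. The sketch below assumes bash with its /dev/udp pseudo-device available; the function names statsd_line and statsd_send and the STATSD_HOST and STATSD_PORT variables are hypothetical.

```shell
#!/usr/bin/env bash
# Format one StatsD datagram: <name>:<value>|<type>
# (type is c = counter, ms = timer, g = gauge).
statsd_line() {
  printf '%s:%s|%s' "$1" "$2" "$3"
}

# Send a datagram to the agent's StatsD listener through bash's /dev/udp
# pseudo-device. Host and port default to the values used in this article.
statsd_send() {
  local host="${STATSD_HOST:-127.0.0.1}" port="${STATSD_PORT:-8125}"
  statsd_line "$@" > "/dev/udp/${host}/${port}"
}

# Example: statsd_send user_logins 1 c
```

UDP is fire-and-forget, so a successful send does not prove delivery. Look for the metric in the WordPress/EC2 namespace after the next collection interval.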
Elasticsearch Cluster Monitoring on AWS
Monitoring an Elasticsearch cluster on AWS, whether self-managed on EC2 or run through Amazon OpenSearch Service (formerly Amazon Elasticsearch Service), requires a different set of tools and metrics. We’ll focus on self-managed EC2 instances for this example.
Elasticsearch Node and Cluster Health Metrics
Elasticsearch exposes a wealth of information via its REST API. We can use tools like curl, or more effectively, a dedicated monitoring solution like Prometheus with the elasticsearch_exporter.
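For the curl route, the _cluster/health endpoint returns a small JSON document whose status field is green, yellow, or red. The sketch below extracts that field with sed so it works without jq. The function names and the ES_URL variable are hypothetical, and an unauthenticated HTTP endpoint on localhost:9200 is assumed.

```shell
#!/usr/bin/env bash
# Extract the "status" field (green, yellow or red) from a
# _cluster/health JSON response read on stdin.
es_health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Query a node and print its cluster status.
# ES_URL is an assumption; add credentials or TLS options as your cluster requires.
es_cluster_status() {
  curl -s "${ES_URL:-http://localhost:9200}/_cluster/health" | es_health_status
}

# Example: [ "$(es_cluster_status)" = "green" ] || echo "cluster needs attention"
```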
Using Elasticsearch Exporter with Prometheus
The elasticsearch_exporter is a Prometheus exporter that queries Elasticsearch for metrics and exposes them in Prometheus format. Install it on a dedicated monitoring node or one of the Elasticsearch nodes (though a separate node is preferred).
# Download and install the exporter (example for Linux; replace X.Y.Z with a release version)
wget https://github.com/prometheus-community/elasticsearch_exporter/releases/download/vX.Y.Z/elasticsearch_exporter-X.Y.Z.linux-amd64.tar.gz
tar xvfz elasticsearch_exporter-X.Y.Z.linux-amd64.tar.gz
sudo mv elasticsearch_exporter-X.Y.Z.linux-amd64/elasticsearch_exporter /usr/local/bin/

# Configure the exporter (e.g., via systemd service)
# Create a systemd service file: /etc/systemd/system/elasticsearch_exporter.service
# Example content (assumes a "prometheus" user and group already exist):
[Unit]
Description=Prometheus Elasticsearch Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
# Flag names differ between exporter releases; confirm with: elasticsearch_exporter --help
ExecStart=/usr/local/bin/elasticsearch_exporter \
  --es.uri=http://localhost:9200 \
  --web.listen-address=:9114 \
  --es.all \
  --es.indices

[Install]
WantedBy=multi-user.target

# Reload systemd, enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch_exporter
sudo systemctl start elasticsearch_exporter
Ensure your Prometheus configuration scrapes this exporter:
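# prometheus.yml
scrape_configs:
  - job_name: 'elasticsearch'
    static_configs:
      - targets: ['your-es-exporter-host:9114']
    # __address__ is only available during target relabeling,
    # so this belongs under relabel_configs, not metric_relabel_configs.
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        regex: '([^:]+):.*'
        replacement: '$1'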
Key Elasticsearch Metrics to Monitor
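- Cluster Health: elasticsearch_cluster_health_status, exposed as one series per color label (green, yellow, red) with value 1 for the current state. Set alerts for non-green statuses.
- Node Status: elasticsearch_cluster_health_number_of_nodes, alongside the Prometheus up metric for the exporter itself. A drop in the node count means a node has left the cluster.
- JVM Heap Usage: elasticsearch_jvm_memory_used_bytes{area="heap"} divided by elasticsearch_jvm_memory_max_bytes{area="heap"}. Keep this below 80-85% to avoid frequent garbage collection pauses.
- Disk Usage: elasticsearch_filesystem_data_free_bytes and elasticsearch_filesystem_data_size_bytes. Monitor free disk space closely, especially for indices with high write rates.
- Indexing Rate: rate(elasticsearch_indices_indexing_index_total[5m]). A sustained rise without matching capacity can lead to indexing bottlenecks.
- Search Rate: rate(elasticsearch_indices_search_query_total[5m]). Spikes can indicate inefficient queries or heavy load.
- Pending Tasks: elasticsearch_cluster_health_number_of_pending_tasks. A growing number indicates the cluster is struggling to keep up.
- Replication Lag: Monitor shard replication status, for example elasticsearch_cluster_health_unassigned_shards, to ensure data consistency.
Metric names differ between exporter releases, so confirm them against the exporter’s /metrics output before building dashboards or alerts.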
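The heap guidance above comes down to a ratio of used to maximum heap bytes. As a small worked example with hypothetical values (7 GiB used out of an 8 GiB heap), the helpers below compute the percentage and compare it with a threshold. The function names are illustrative only.

```shell
#!/usr/bin/env bash
# Percentage of heap in use, given used and max heap in bytes.
heap_used_percent() {
  awk -v used="$1" -v max="$2" 'BEGIN { printf "%.1f", used / max * 100 }'
}

# Succeed (exit 0) when heap usage is above the threshold (default 85).
heap_above_threshold() {
  awk -v used="$1" -v max="$2" -v limit="${3:-85}" \
    'BEGIN { exit !(used / max * 100 > limit) }'
}

# Example: 7 GiB used of an 8 GiB heap is 87.5%, above the 85% guideline.
#   heap_used_percent 7516192768 8589934592
#   heap_above_threshold 7516192768 8589934592 && echo "heap pressure"
```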
Alerting Strategies
Effective alerting is crucial for proactive issue resolution. We’ll use AWS CloudWatch Alarms for EC2 metrics and Prometheus Alertmanager for Elasticsearch and application-level metrics.
CloudWatch Alarms for WordPress EC2
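Create CloudWatch Alarms based on the metrics collected by the CloudWatch Agent. An alarm only matches a metric whose dimension set is identical to the one the agent publishes (disk metrics usually carry device and fstype in addition to path), so check the metric in the CloudWatch console and adjust --dimensions to match. For example, to alert when disk usage on the WordPress root volume reaches 85%: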
# Using AWS CLI
aws cloudwatch put-metric-alarm \
--alarm-name "WordPress-HighDiskUsage" \
--alarm-description "Alarm when WordPress root disk usage is high" \
--metric-name "disk_used_percent" \
--namespace "WordPress/EC2" \
--statistic "Average" \
--period 300 \
--threshold 85 \
--comparison-operator "GreaterThanOrEqualToThreshold" \
--dimensions "Name=path,Value=/" "Name=InstanceId,Value=i-0abcdef1234567890" \
--evaluation-periods 2 \
--datapoints-to-alarm 2 \
--treat-missing-data "notBreaching" \
--alarm-actions "arn:aws:sns:us-east-1:123456789012:MyAlertsTopic"
Similarly, set alarms for high CPU utilization, low memory, and critical log file errors (e.g., count of “Fatal error” in PHP logs).
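For the log-based case, one option is a CloudWatch Logs metric filter that turns matching log lines into a metric, which an alarm can then watch. The sketch below assumes the PHP-FPM log group name from the agent configuration above. The filter name, metric name, and the AWS_CLI override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count "Fatal error" lines from the PHP-FPM error log group as a
# CloudWatch metric. Filter and metric names are hypothetical.
# AWS_CLI can be set to "echo" for a dry run that only prints the arguments.
put_fatal_error_filter() {
  "${AWS_CLI:-aws}" logs put-metric-filter \
    --log-group-name "WordPress/EC2/PHP-FPM/Error" \
    --filter-name "PHPFatalErrors" \
    --filter-pattern '"Fatal error"' \
    --metric-transformations \
      "metricName=PHPFatalErrorCount,metricNamespace=WordPress/EC2,metricValue=1,defaultValue=0"
}

# Example dry run: AWS_CLI=echo put_fatal_error_filter
```

An alarm on PHPFatalErrorCount with the Sum statistic can then be created with put-metric-alarm in the same way as the disk alarm.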
Prometheus Alertmanager for Elasticsearch
Alerting rules are defined and evaluated in Prometheus, which then sends firing alerts to Alertmanager for routing. Example rules based on Elasticsearch metrics:
# prometheus/alert.rules.yml
# Metric and label names follow the elasticsearch_exporter; verify them against its /metrics output.
groups:
  - name: elasticsearch_alerts
    rules:
      - alert: ElasticsearchClusterRed
        expr: elasticsearch_cluster_health_status{color="red"} == 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Elasticsearch cluster {{ $labels.cluster }} is RED"
          description: "The Elasticsearch cluster {{ $labels.cluster }} has entered a RED health state."
      - alert: ElasticsearchHighJVMPool
        expr: elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"} * 100 > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Elasticsearch JVM heap usage high on {{ $labels.name }}"
          description: 'Elasticsearch node {{ $labels.name }} has JVM heap usage above 85% (current: {{ printf "%.2f" $value }}%).'
      - alert: ElasticsearchLowDiskSpace
        expr: elasticsearch_filesystem_data_free_bytes / elasticsearch_filesystem_data_size_bytes * 100 < 15
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Elasticsearch low disk space on {{ $labels.name }}"
          description: 'Elasticsearch node {{ $labels.name }} has less than 15% free disk space (current: {{ printf "%.2f" $value }}%).'
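Before Prometheus loads a rules file such as the one above, it can be checked with promtool, which ships with Prometheus. This is a minimal sketch; the function name check_alert_rules and the PROMTOOL override variable are hypothetical, and the default path is the one used in this article.

```shell
#!/usr/bin/env bash
# Validate an alerting rules file before Prometheus loads it.
# PROMTOOL can be overridden (for example with "echo") for a dry run.
check_alert_rules() {
  local rules_file="${1:-prometheus/alert.rules.yml}"
  "${PROMTOOL:-promtool}" check rules "$rules_file"
}

# Example: check_alert_rules prometheus/alert.rules.yml && echo "rules OK"
```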
Ensure Alertmanager is configured to route these alerts to your desired notification channels (Slack, PagerDuty, email, etc.).
Conclusion: A Layered Approach
A robust server monitoring strategy for WordPress and Elasticsearch on AWS involves combining AWS-native tools like CloudWatch with open-source solutions like Prometheus and its ecosystem. By monitoring system resources, application processes, and critical service health metrics, and by implementing intelligent alerting, you can ensure the stability, performance, and availability of your critical applications.