Server Monitoring Best Practices: Keeping Your WordPress App and Elasticsearch Clusters Alive on Linode
Establishing a Robust Monitoring Foundation with Prometheus and Grafana
Maintaining high availability for a critical WordPress application and its accompanying Elasticsearch cluster on Linode necessitates a proactive, data-driven approach to monitoring. We’ll leverage Prometheus for metrics collection and alerting, and Grafana for visualization. This setup provides deep insights into resource utilization, application performance, and potential failure points.
Deploying Prometheus on Linode
A dedicated Linode instance is ideal for hosting Prometheus. We’ll start by installing Prometheus from its official binary releases to ensure we have the latest stable version.
Installation Steps
First, download the latest Prometheus release. Replace X.Y.Z with the current version number.
wget https://github.com/prometheus/prometheus/releases/download/vX.Y.Z/prometheus-X.Y.Z.linux-amd64.tar.gz
tar xvfz prometheus-X.Y.Z.linux-amd64.tar.gz
cd prometheus-X.Y.Z.linux-amd64
Move the binaries to a common location, create a dedicated user for Prometheus, and create its configuration and data directories.
sudo mv prometheus /usr/local/bin/
sudo mv promtool /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
Configure Prometheus to run as a systemd service for automatic startup and management. Save the following unit file as /etc/systemd/system/prometheus.service:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
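Move the configuration files into place, set ownership, then reload systemd and start Prometheus. If your release archive does not contain the consoles and console_libraries directories (newer releases no longer ship them), skip those two mv commands and remove the two --web.console flags from the unit file.
sudo mv prometheus.yml /etc/prometheus/
sudo mv consoles /etc/prometheus/
sudo mv console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus
sudo chown -R prometheus:prometheus /var/lib/prometheus
sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus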
Configuring Prometheus for WordPress and Elasticsearch
The core of Prometheus configuration lies in /etc/prometheus/prometheus.yml. We need to define scrape targets for our WordPress application and Elasticsearch cluster. For WordPress, we’ll use node_exporter for system-level metrics and potentially a PHP-FPM exporter if applicable. For Elasticsearch, we’ll use the community-maintained elasticsearch_exporter from the prometheus-community project.
WordPress Monitoring with Node Exporter
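Install node_exporter on each Linode instance running your WordPress application. The steps mirror the Prometheus installation: download the node_exporter release, move the binary to /usr/local/bin, and run it as a systemd service. It listens on port 9100 by default.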
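Since the installation mirrors the Prometheus steps, the main new piece is the service unit. Below is a minimal sketch that stages a unit file for review. It assumes the binary sits at /usr/local/bin/node_exporter and runs as a dedicated node_exporter user; neither is specified in this article, and the staging directory name is illustrative.

```shell
# Stage a systemd unit for node_exporter in a local directory for review.
# Assumptions: binary at /usr/local/bin/node_exporter, system user "node_exporter".
STAGING_DIR="${STAGING_DIR:-./staging}"
mkdir -p "$STAGING_DIR"

cat > "$STAGING_DIR/node_exporter.service" <<'EOF'
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9100
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

echo "Unit written to $STAGING_DIR/node_exporter.service"

# After reviewing the file, install it with root privileges:
#   sudo useradd --no-create-home --shell /bin/false node_exporter
#   sudo install -m 0644 "$STAGING_DIR/node_exporter.service" /etc/systemd/system/
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now node_exporter
```

Writing the unit to a staging directory first keeps the privileged steps explicit and makes the file easy to copy to every WordPress node.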
The scrape jobs in /etc/prometheus/prometheus.yml then look like this:
global:
  scrape_interval: 15s # Scrape targets every 15 seconds unless a job sets its own scrape_interval.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.

scrape_configs:
  # Scrape Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Scrape Node Exporter for WordPress nodes
  - job_name: 'wordpress_nodes'
    static_configs:
      - targets:
          - 'wordpress_node_1_ip:9100' # Replace with actual IP/hostname
          - 'wordpress_node_2_ip:9100' # Replace with actual IP/hostname
          # ... add all WordPress nodes

  # Scrape Elasticsearch Exporter for Elasticsearch nodes
  - job_name: 'elasticsearch'
    static_configs:
      - targets:
          - 'elasticsearch_node_1_ip:9114' # Replace with actual IP/hostname and port
          - 'elasticsearch_node_2_ip:9114' # Replace with actual IP/hostname and port
          # ... add all Elasticsearch nodes
Elasticsearch Exporter Setup
The Elasticsearch exporter (elasticsearch_exporter) can run on a separate Linode instance or on one of the Elasticsearch nodes, as long as it can connect to your Elasticsearch cluster. It listens on port 9114 by default; point the elasticsearch scrape targets at whichever hosts run it.
# Example of running the exporter (adjust --es.uri as needed)
./elasticsearch_exporter --es.uri="http://localhost:9200" --web.listen-address=":9114"
For production, you’d typically run this as a systemd service, similar to Prometheus and Node Exporter.
Deploying Grafana for Visualization
Grafana provides an intuitive interface to visualize the metrics collected by Prometheus. We’ll install Grafana on a dedicated Linode instance.
Installation and Configuration
Add the Grafana APT repository and install Grafana.
sudo apt-get update
sudo apt-get install -y apt-transport-https software-properties-common wget
# apt-key is deprecated, so store the signing key in a dedicated keyring file
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install -y grafana
sudo systemctl daemon-reload
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
Access Grafana at http://your_grafana_linode_ip:3000. The default credentials are username admin and password admin. You’ll be prompted to change the password on first login.
Connecting Grafana to Prometheus
In Grafana, navigate to Configuration (gear icon) -> Data Sources; in newer Grafana versions the same page is under Connections -> Data sources. Click Add data source, select Prometheus, and enter the URL of your Prometheus server (e.g., http://your_prometheus_linode_ip:9090). Save and test the connection.
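The same data source can also be declared in a provisioning file instead of through the UI, which keeps the setup reproducible. The sketch below stages such a file. It assumes Grafana's default provisioning directory on a package install (/etc/grafana/provisioning/datasources); the PROM_URL and STAGING_DIR variables are illustrative names.

```shell
# Stage a Grafana data source provisioning file for review.
# PROM_URL is a placeholder; replace it with your Prometheus address.
PROM_URL="${PROM_URL:-http://your_prometheus_linode_ip:9090}"
STAGING_DIR="${STAGING_DIR:-./staging}"
mkdir -p "$STAGING_DIR"

cat > "$STAGING_DIR/prometheus-datasource.yml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${PROM_URL}
    isDefault: true
EOF

echo "Data source definition written to $STAGING_DIR/prometheus-datasource.yml"

# After reviewing the file, install it and restart Grafana:
#   sudo install -m 0644 "$STAGING_DIR/prometheus-datasource.yml" /etc/grafana/provisioning/datasources/
#   sudo systemctl restart grafana-server
```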
Essential WordPress and Elasticsearch Dashboards
While you can build custom dashboards, leveraging pre-built community dashboards can significantly accelerate your monitoring setup. Search for “WordPress” and “Elasticsearch” on the Grafana Dashboards website (grafana.com/grafana/dashboards/).
Key WordPress Metrics to Monitor
- CPU Usage: node_cpu_seconds_total (rate over time)
- Memory Usage: node_memory_MemAvailable_bytes, node_memory_MemTotal_bytes
- Disk I/O: node_disk_io_time_seconds_total (rate), node_disk_reads_completed_total, node_disk_writes_completed_total
- Network Traffic: node_network_receive_bytes_total, node_network_transmit_bytes_total (rate)
- PHP-FPM (if applicable): Active processes, request duration, slow requests.
- Web Server (Nginx/Apache): Request rates, error rates (4xx, 5xx), response times.
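Most of the node_exporter metrics above are raw counters or byte gauges, so dashboards and alerts usually work from derived rates and percentages. One way to express that is a set of recording rules. The rule names, the 5m window, and the staging path below are illustrative choices rather than anything prescribed by this article.

```shell
# Stage a recording-rules file that turns raw node_exporter metrics
# into per-instance rates and percentages.
STAGING_DIR="${STAGING_DIR:-./staging}"
mkdir -p "$STAGING_DIR"

cat > "$STAGING_DIR/wordpress_recording.yml" <<'EOF'
groups:
  - name: wordpress.recording
    rules:
      # CPU busy percentage per instance, averaged over all cores.
      - record: instance:node_cpu_busy:percent
        expr: 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))
      # Memory in use as a percentage of total memory.
      - record: instance:node_memory_used:percent
        expr: 100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
      # Fraction of time each disk spent doing I/O (0 to 1).
      - record: instance_device:node_disk_io_time:rate5m
        expr: rate(node_disk_io_time_seconds_total[5m])
      # Network throughput in bytes per second, excluding loopback.
      - record: instance_device:node_network_receive_bytes:rate5m
        expr: rate(node_network_receive_bytes_total{device!="lo"}[5m])
      - record: instance_device:node_network_transmit_bytes:rate5m
        expr: rate(node_network_transmit_bytes_total{device!="lo"}[5m])
EOF

# Validate the file if promtool is available on this machine.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$STAGING_DIR/wordpress_recording.yml"
fi
```

Once copied into a directory that Prometheus loads rule files from, the recorded series can be graphed directly in Grafana without repeating the rate() expressions in every panel.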
Key Elasticsearch Metrics to Monitor
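- Cluster Health: elasticsearch_cluster_status (0=green, 1=yellow, 2=red)
- Node Status: elasticsearch_node_up
- JVM Heap Usage: elasticsearch_jvm_memory_heap_used_percent
- Disk Usage: elasticsearch_fs_data_free_bytes, elasticsearch_fs_data_total_bytes
- Indexing Rate: elasticsearch_indices_indexing_index_total (rate)
- Search Rate: elasticsearch_indices_search_query_total (rate)
- Request Latency: elasticsearch_indices_search_query_time_in_millis, elasticsearch_indices_indexing_index_time_in_millis
- Shards: Number of unassigned shards.
Metric names differ between exporters and versions, so confirm them against your exporter’s /metrics output before building dashboards or alerts. For example, the prometheus-community elasticsearch_exporter reports cluster health as elasticsearch_cluster_health_status with a color label rather than as a single numeric status.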
Alerting with Prometheus Alertmanager
Proactive alerting is crucial for preventing downtime. Prometheus Alertmanager handles alerts generated by Prometheus rules.
Setting up Alertmanager
Install Alertmanager similarly to Prometheus. Configure it via a systemd service and a configuration file (e.g., /etc/alertmanager/alertmanager.yml).
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver' # Default receiver
  # Example of routing alerts based on severity (routes must be nested under route)
  # routes:
  #   - receiver: 'critical-receiver'
  #     matchers:
  #       - severity="critical"
  #     continue: true # If this route matches, continue evaluating other routes

receivers:
  - name: 'default-receiver'
    slack_configs:
      - api_url: '' # Slack incoming webhook URL
        channel: '#alerts'
        send_resolved: true
  - name: 'critical-receiver'
    slack_configs:
      - api_url: '' # Slack incoming webhook URL
        channel: '#critical-alerts'
        send_resolved: true
In your prometheus.yml, add the Alertmanager configuration:
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'alertmanager_ip:9093' # Replace with your Alertmanager IP and port
Prometheus Alerting Rules
Define alerting rules in separate YAML files (e.g., /etc/prometheus/rules/wordpress_alerts.yml and /etc/prometheus/rules/elasticsearch_alerts.yml). Load these rules in prometheus.yml under the rule_files directive.
# /etc/prometheus/rules/wordpress_alerts.yml
groups:
  - name: wordpress.rules
    rules:
      - alert: HighCpuUsage
        # Average utilization across all cores; an exact "idle == 0" test would almost never match
        expr: 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 90
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "Node {{ $labels.instance }} has been running above 90% CPU for 10 minutes."
      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Node {{ $labels.instance }} has less than 10% disk space remaining on /."

# /etc/prometheus/rules/elasticsearch_alerts.yml
# Metric names depend on the exporter in use; verify them against its /metrics output.
groups:
  - name: elasticsearch.rules
    rules:
      - alert: ElasticsearchClusterRed
        expr: elasticsearch_cluster_status == 2
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Elasticsearch cluster is RED on {{ $labels.instance }}"
          description: "Elasticsearch cluster {{ $labels.instance }} has entered a RED state."
      - alert: HighElasticsearchHeapUsage
        expr: elasticsearch_jvm_memory_heap_used_percent > 85
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High Elasticsearch JVM heap usage on {{ $labels.instance }}"
          description: 'Elasticsearch node {{ $labels.instance }} is using {{ $value | printf "%.2f" }}% of its JVM heap.'
# In /etc/prometheus/prometheus.yml
rule_files:
  - "/etc/prometheus/rules/*.yml"
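A rule file with a syntax error will cause a reload or restart to fail, so it is worth validating the files first. The sketch below uses promtool, which ships in the Prometheus release archive. The function name validate_prometheus and its directory argument are assumptions made for illustration.

```shell
# validate_prometheus DIR: check DIR/prometheus.yml and every rule file in DIR/rules.
# Returns non-zero on the first failed check so it can gate a restart.
validate_prometheus() {
  conf_dir="${1:-/etc/prometheus}"

  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found on PATH" >&2
    return 127
  fi

  # Checks the main config, including rule files it references via rule_files.
  promtool check config "$conf_dir/prometheus.yml" || return 1

  # Also check each rule file directly, in case it is not referenced yet.
  for f in "$conf_dir"/rules/*.yml; do
    [ -e "$f" ] || continue   # the glob matched nothing
    promtool check rules "$f" || return 1
  done
}

# Typical use on the Prometheus host:
#   validate_prometheus /etc/prometheus && sudo systemctl restart prometheus
```

Chaining the restart behind the check means a broken rule file never takes the running Prometheus instance down.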
Advanced Considerations and Best Practices
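High Availability for Monitoring: For critical production environments, run at least two Prometheus replicas scraping the same targets and run Alertmanager in its built-in cluster mode so that alerts are deduplicated across instances. Thanos or Cortex can then provide a deduplicated global view and long-term storage on top of the replicas. This ensures your monitoring system itself doesn’t become a single point of failure.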
Service Discovery: Instead of static configurations, use a service discovery mechanism (e.g., Consul, Kubernetes SD, or Prometheus’s built-in Linode SD via linode_sd_configs) if your infrastructure is dynamic. Prometheus then updates its scrape targets automatically as services scale up or down.
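For a small fleet without Consul or Kubernetes, file-based discovery is a lightweight middle ground: Prometheus watches a targets file and picks up changes without a restart. The sketch below assumes a hypothetical helper named write_targets and a targets directory of /etc/prometheus/targets; the IP addresses are documentation placeholders.

```shell
# write_targets ROLE PORT HOST...: print one file_sd target group as JSON.
write_targets() {
  role="$1"; port="$2"; shift 2
  targets=""
  for host in "$@"; do
    # Append "host:port", comma-separated after the first entry.
    targets="${targets:+$targets, }\"$host:$port\""
  done
  printf '[{"targets": [%s], "labels": {"role": "%s"}}]\n' "$targets" "$role"
}

write_targets wordpress 9100 192.0.2.11 192.0.2.12

# Redirect the output to /etc/prometheus/targets/wordpress.json, then point
# the scrape job at it in prometheus.yml:
#
#   - job_name: 'wordpress_nodes'
#     file_sd_configs:
#       - files: ['/etc/prometheus/targets/wordpress*.json']
#         refresh_interval: 1m
```

Regenerating the JSON file from a deploy script or cron job is then enough to add or remove nodes.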
Log Aggregation: While metrics are essential, logs provide granular detail for debugging. Integrate a log aggregation system like ELK Stack (Elasticsearch, Logstash, Kibana) or Loki with Prometheus for a comprehensive observability solution.
Security: Secure your monitoring endpoints. Use firewalls to restrict access to Prometheus, Grafana, and Alertmanager. Consider authentication and TLS for sensitive data.
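As one concrete form of the firewall advice, exporter ports can be opened only to the Prometheus server. The sketch below assumes ufw is the firewall in use with incoming traffic denied by default. It prints the commands rather than running them unless DRY_RUN is set to 0; PROM_IP and the helper names are placeholders for illustration.

```shell
PROM_IP="${PROM_IP:-192.0.2.10}"   # address of the Prometheus server (placeholder)
DRY_RUN="${DRY_RUN:-1}"            # 1 prints the commands, 0 executes them

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# allow_from_prometheus PORT...: permit TCP access to each port from PROM_IP only.
allow_from_prometheus() {
  for port in "$@"; do
    run sudo ufw allow from "$PROM_IP" to any port "$port" proto tcp
  done
}

# Ports used in this article: node_exporter (9100) and the Elasticsearch exporter (9114).
# Pass only the ports that are actually served on the host you are configuring.
allow_from_prometheus 9100 9114
```

The same pattern applies to Grafana (3000), Prometheus (9090), and Alertmanager (9093), with the allowed source set to your own admin network instead of the Prometheus server.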
Regular Review: Periodically review your dashboards and alert rules. As your application evolves, your monitoring needs will change. Ensure your alerts are actionable and not overly noisy.