Server Monitoring Best Practices: Keeping Your WordPress App and Elasticsearch Clusters Alive on Linode

Establishing a Robust Monitoring Foundation with Prometheus and Grafana

Keeping a critical WordPress application and its Elasticsearch cluster highly available on Linode requires proactive, metrics-driven monitoring. We’ll use Prometheus for metrics collection and alerting, and Grafana for visualization. Together they expose resource utilization, application performance, and likely failure points.

Deploying Prometheus on Linode

A dedicated Linode instance is ideal for hosting Prometheus. We’ll start by installing Prometheus from its official binary releases to ensure we have the latest stable version.

Installation Steps

First, download the latest Prometheus release. Replace X.Y.Z with the current version number.

wget https://github.com/prometheus/prometheus/releases/download/vX.Y.Z/prometheus-X.Y.Z.linux-amd64.tar.gz
tar xvfz prometheus-X.Y.Z.linux-amd64.tar.gz
cd prometheus-X.Y.Z.linux-amd64
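
The downloaded tarball can also be checked against the checksums published with the release, ideally before extracting it. The sketch below assumes the release page offers a sha256sums.txt file next to the tarball and that sha256sum from GNU coreutils is installed; verify_tarball is a hypothetical helper name, not part of Prometheus.

```shell
# Sketch: verify a downloaded tarball against a sha256sums.txt file.
# verify_tarball is a hypothetical helper, not a Prometheus tool.
verify_tarball() {
  local tarball="$1"
  local sums_file="$2"
  local name expected actual
  name="$(basename "$tarball")"
  # Each line of the sums file is expected to look like "<sha256>  <filename>".
  expected="$(awk -v f="$name" '$2 == f { print $1 }' "$sums_file")"
  actual="$(sha256sum "$tarball" | awk '{ print $1 }')"
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: $name"
  else
    echo "MISMATCH: $name" >&2
    return 1
  fi
}

# Usage (X.Y.Z is the same placeholder as in the download command):
#   wget https://github.com/prometheus/prometheus/releases/download/vX.Y.Z/sha256sums.txt
#   verify_tarball prometheus-X.Y.Z.linux-amd64.tar.gz sha256sums.txt
```

A non-zero return code means the file is either missing from the list or does not match, so the download should be repeated.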

Move the binaries to a common location and create a dedicated user for Prometheus.

sudo mv prometheus /usr/local/bin/
sudo mv promtool /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus

To have systemd start and supervise Prometheus, create /etc/systemd/system/prometheus.service with the following contents.

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target

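From the extracted release directory, move the sample configuration into /etc/prometheus, set ownership, then reload systemd and start Prometheus. If your release tarball does not contain the consoles and console_libraries directories (newer releases may omit them), skip those two mv commands and remove the --web.console.* flags from the unit file.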

sudo mv prometheus.yml /etc/prometheus/
sudo mv consoles /etc/prometheus/
sudo mv console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus
sudo chown -R prometheus:prometheus /var/lib/prometheus
sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus
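
Once the service is started, it helps to confirm that Prometheus is actually serving requests rather than assuming so. The following is a minimal sketch that polls the /-/ready endpoint; wait_for_prometheus is a hypothetical helper, and the default URL assumes Prometheus listens on port 9090 on the same host.

```shell
# Sketch: poll Prometheus until it reports ready, or give up.
# wait_for_prometheus is a hypothetical helper; arguments are
# base URL, number of attempts, and delay in seconds.
wait_for_prometheus() {
  local base_url="${1:-http://localhost:9090}"
  local attempts="${2:-10}"
  local delay="${3:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # /-/ready answers with HTTP 200 once Prometheus can serve traffic;
    # --fail turns any HTTP error status into a non-zero exit code.
    if curl --silent --fail --output /dev/null "$base_url/-/ready"; then
      echo "Prometheus is ready (attempt $i)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Prometheus not ready after $attempts attempts; try: journalctl -u prometheus" >&2
  return 1
}

# Usage:
#   wait_for_prometheus http://localhost:9090 10 2
```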

Configuring Prometheus for WordPress and Elasticsearch

Prometheus is configured through /etc/prometheus/prometheus.yml, where we define scrape targets for the WordPress application and the Elasticsearch cluster. For WordPress, we’ll use node_exporter for system-level metrics and, if PHP-FPM is in use, a PHP-FPM exporter. For Elasticsearch, we’ll use the community-maintained elasticsearch_exporter from the prometheus-community project.

WordPress Monitoring with Node Exporter

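Install node_exporter on each Linode instance running your WordPress application. The steps mirror the Prometheus installation: download the node_exporter release, copy the binary to /usr/local/bin, and run it under a dedicated user. It listens on port 9100 by default.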
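
As with Prometheus, node_exporter is easiest to manage as a systemd service. The sketch below prints a unit file modelled on the Prometheus one; it assumes the binary lives at /usr/local/bin/node_exporter and that a node_exporter system user has been created, neither of which is given in this guide. render_node_exporter_unit is a hypothetical helper.

```shell
# Sketch: print a systemd unit for node_exporter.
# Assumptions: binary at /usr/local/bin/node_exporter and a dedicated
# node_exporter user/group; adjust both to match your installation.
render_node_exporter_unit() {
  cat <<'EOF'
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9100

[Install]
WantedBy=multi-user.target
EOF
}

# Usage:
#   render_node_exporter_unit | sudo tee /etc/systemd/system/node_exporter.service
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now node_exporter
```

Printing the unit instead of writing it directly makes it easy to review the contents before installing them.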

With the exporters running, define the scrape jobs in /etc/prometheus/prometheus.yml:

global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds. Overridden by the scrape_config interval.
  evaluation_interval: 15s # By default, evaluate rules every 15 seconds.

scrape_configs:
  # Scrape Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Scrape Node Exporter for WordPress nodes
  - job_name: 'wordpress_nodes'
    static_configs:
      - targets:
          - 'wordpress_node_1_ip:9100' # Replace with actual IP/hostname
          - 'wordpress_node_2_ip:9100' # Replace with actual IP/hostname
          # ... add all WordPress nodes

  # Scrape Elasticsearch Exporter for Elasticsearch nodes
  - job_name: 'elasticsearch'
    static_configs:
      - targets:
          - 'elasticsearch_node_1_ip:9114' # Replace with actual IP/hostname and port
          - 'elasticsearch_node_2_ip:9114' # Replace with actual IP/hostname and port
          # ... add all Elasticsearch nodes
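
After editing prometheus.yml, it is safer to validate the file before asking the running server to pick it up. The sketch below uses promtool check config and then sends SIGHUP, which Prometheus treats as a reload request; check_and_reload is a hypothetical helper, and the unit name prometheus matches the service created earlier.

```shell
# Sketch: validate prometheus.yml, then signal the running server to reload it.
# check_and_reload is a hypothetical helper.
check_and_reload() {
  local config="${1:-/etc/prometheus/prometheus.yml}"
  # promtool exits non-zero on errors, so a broken file never reaches the server.
  if ! promtool check config "$config"; then
    echo "Config invalid, not reloading" >&2
    return 1
  fi
  # Prometheus re-reads its configuration when it receives SIGHUP.
  sudo systemctl kill --signal=SIGHUP prometheus
  echo "Reload signal sent"
}

# Usage:
#   check_and_reload /etc/prometheus/prometheus.yml
```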

Elasticsearch Exporter Setup

The Elasticsearch exporter (elasticsearch_exporter, maintained under the prometheus-community project) can run on a separate Linode instance or alongside one of the Elasticsearch nodes. Download the release binary and start it with the URL of your cluster, making sure it can reach Elasticsearch. It listens on port 9114 by default.

# Example of running elasticsearch_exporter (adjust --es.uri as needed)
./elasticsearch_exporter --es.uri="http://localhost:9200" --web.listen-address=":9114"

For production, you’d typically run this as a systemd service, similar to Prometheus and Node Exporter.
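
Before pointing Prometheus at the exporter, a quick smoke test of its /metrics output confirms that it can talk to the cluster. The sketch below reads metrics in the Prometheus text format from standard input and prints the current cluster color. It assumes the exporter exposes elasticsearch_cluster_health_status with a color label, one series per color, which is how the prometheus-community exporter behaves as far as we know; check your own /metrics output if the helper finds nothing. cluster_color is a hypothetical helper.

```shell
# Sketch: report the cluster color as seen by the exporter.
# Reads Prometheus text-format metrics on stdin. The metric name and its
# color label are assumptions; confirm them against your /metrics output.
cluster_color() {
  awk '
    index($0, "elasticsearch_cluster_health_status{") == 1 && $NF == 1 {
      if (match($0, /color="[a-z]+"/)) {
        # Skip the 7 characters of color=" and drop the closing quote.
        print substr($0, RSTART + 7, RLENGTH - 8)
        found = 1
        exit
      }
    }
    END { if (!found) exit 1 }
  '
}

# Usage:
#   curl --silent http://localhost:9114/metrics | cluster_color
```

The helper returns a non-zero status when no matching series has the value 1, which usually means the exporter could not reach Elasticsearch.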

Deploying Grafana for Visualization

Grafana provides an intuitive interface to visualize the metrics collected by Prometheus. We’ll install Grafana on a dedicated Linode instance.

Installation and Configuration

Add the Grafana APT repository and install Grafana.

sudo apt-get update
sudo apt-get install -y apt-transport-https software-properties-common wget
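sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list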
sudo apt-get update
sudo apt-get install -y grafana
sudo systemctl daemon-reload
sudo systemctl start grafana-server
sudo systemctl enable grafana-server

Access Grafana at http://your_grafana_linode_ip:3000. The default credentials are username admin and password admin. You’ll be prompted to change the password on first login.

Connecting Grafana to Prometheus

In Grafana, open the data sources page: Configuration (gear icon) -> Data Sources in older releases, or Connections -> Data sources in newer ones. Click Add data source, select Prometheus, and enter the URL of your Prometheus server (e.g., http://your_prometheus_linode_ip:9090). Save and test the connection.
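
The same data source can be declared as a file so that it survives a rebuild of the Grafana instance. The sketch below prints a Grafana provisioning file; it assumes the default package layout, where Grafana reads /etc/grafana/provisioning/datasources/ at startup. render_datasource is a hypothetical helper and the URL is the same placeholder used above.

```shell
# Sketch: print a Grafana data source provisioning file for Prometheus.
# render_datasource is a hypothetical helper; pass your Prometheus URL.
render_datasource() {
  local prometheus_url="$1"
  cat <<EOF
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${prometheus_url}
    isDefault: true
EOF
}

# Usage:
#   render_datasource "http://your_prometheus_linode_ip:9090" \
#     | sudo tee /etc/grafana/provisioning/datasources/prometheus.yml
#   sudo systemctl restart grafana-server
```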

Essential WordPress and Elasticsearch Dashboards

While you can build custom dashboards, leveraging pre-built community dashboards can significantly accelerate your monitoring setup. Search for “WordPress” and “Elasticsearch” on the Grafana Dashboards website (grafana.com/grafana/dashboards/).

Key WordPress Metrics to Monitor

CPU Usage: node_cpu_seconds_total (rate over time)
Memory Usage: node_memory_MemAvailable_bytes, node_memory_MemTotal_bytes
Disk I/O: node_disk_io_time_seconds_total (rate), node_disk_reads_completed_total, node_disk_writes_completed_total
Network Traffic: node_network_receive_bytes_total, node_network_transmit_bytes_total (rate)
PHP-FPM (if applicable): Active processes, request duration, slow requests.
Web Server (Nginx/Apache): Request rates, error rates (4xx, 5xx), response times.
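
The raw node_exporter counters above become useful once they are turned into rates and ratios. The sketch below shows a few PromQL expressions built from those metrics and a small helper that runs them against the Prometheus HTTP API. prom_query and the variable names are hypothetical, and PROM_URL assumes a Prometheus server on localhost:9090.

```shell
# Sketch: run instant PromQL queries against the Prometheus HTTP API.
# prom_query is a hypothetical helper; PROM_URL defaults to a local server.
PROM_URL="${PROM_URL:-http://localhost:9090}"

prom_query() {
  # --get with --data-urlencode lets curl URL-encode the expression.
  curl --silent --get --data-urlencode "query=$1" "$PROM_URL/api/v1/query"
}

# CPU busy percentage per instance, derived from the idle counter.
CPU_BUSY='100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'
# Memory in use as a percentage of total.
MEM_USED='(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100'
# Inbound network throughput in bytes per second.
NET_RX='rate(node_network_receive_bytes_total[5m])'

# Usage:
#   prom_query "$CPU_BUSY"
```

The same expressions can be pasted into a Grafana panel or the Prometheus expression browser.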

Key Elasticsearch Metrics to Monitor

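Metric names below follow the prometheus-community elasticsearch_exporter; confirm them against your exporter’s /metrics output, since names can differ between versions.

Cluster Health: elasticsearch_cluster_health_status (one series per color label; the series with value 1 is the current state)
Node Status: elasticsearch_cluster_health_number_of_nodes, together with Prometheus’s own up series for each scrape target
JVM Heap Usage: elasticsearch_jvm_memory_used_bytes{area="heap"} divided by elasticsearch_jvm_memory_max_bytes{area="heap"}
Disk Usage: elasticsearch_filesystem_data_available_bytes, elasticsearch_filesystem_data_size_bytes
Indexing Rate: elasticsearch_indices_indexing_index_total (rate)
Search Rate: elasticsearch_indices_search_query_total (rate)
Request Latency: elasticsearch_indices_search_query_time_seconds, elasticsearch_indices_indexing_index_time_seconds_total (divide each rate by the matching operation-count rate for a mean latency)
Shards: elasticsearch_cluster_health_unassigned_shards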

Alerting with Prometheus Alertmanager

Proactive alerting is crucial for preventing downtime. Prometheus Alertmanager handles alerts generated by Prometheus rules.

Setting up Alertmanager

Install Alertmanager similarly to Prometheus. Configure it via a systemd service and a configuration file (e.g., /etc/alertmanager/alertmanager.yml).

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver' # Default receiver

receivers:
  - name: 'default-receiver'
    slack_configs:
      - api_url: '' # Required: set this to your Slack incoming webhook URL
        channel: '#alerts'
        send_resolved: true

  - name: 'critical-receiver'
    slack_configs:
      - api_url: '' # Required: set this to your Slack incoming webhook URL
        channel: '#critical-alerts'
        send_resolved: true

# Example of routing alerts based on severity. To use it, move this block
# under the top-level route: key above, indented as a child of route.
#   routes:
#     - receiver: 'critical-receiver'
#       matchers:
#         - severity="critical"
#       continue: true # Keep evaluating sibling routes after this one matches
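
A configuration that parses is not proof that notifications arrive, so it is worth pushing a synthetic alert through Alertmanager once. The sketch below builds a minimal payload for the Alertmanager v2 API and posts it with curl. test_alert_payload, send_test_alert, and the alert name MonitoringSelfTest are hypothetical, and the default URL assumes Alertmanager on localhost:9093.

```shell
# Sketch: build and send a synthetic alert to Alertmanager's v2 API.
# test_alert_payload and send_test_alert are hypothetical helpers.
test_alert_payload() {
  local severity="${1:-warning}"
  printf '[{"labels":{"alertname":"MonitoringSelfTest","severity":"%s"},' "$severity"
  printf '"annotations":{"summary":"Synthetic alert to verify notification routing"}}]\n'
}

send_test_alert() {
  local alertmanager_url="${1:-http://localhost:9093}"
  local severity="${2:-warning}"
  # --data @- makes curl read the request body from the pipe.
  test_alert_payload "$severity" |
    curl --silent --fail --request POST \
      --header 'Content-Type: application/json' \
      --data @- "$alertmanager_url/api/v2/alerts"
}

# Usage:
#   amtool check-config /etc/alertmanager/alertmanager.yml
#   send_test_alert http://localhost:9093 critical
```

If the Slack receiver is configured correctly, the message should appear in the channel after the group_wait period.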

In your prometheus.yml, add the Alertmanager configuration:

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'alertmanager_ip:9093' # Replace with your Alertmanager IP and port

Prometheus Alerting Rules

Define alerting rules in separate YAML files (e.g., /etc/prometheus/rules/wordpress_alerts.yml and /etc/prometheus/rules/elasticsearch_alerts.yml). Load these rules in prometheus.yml under the rule_files directive.

# /etc/prometheus/rules/wordpress_alerts.yml
groups:
  - name: wordpress.rules
    rules:
      - alert: HighCpuUsage
        # Fires when the average idle share across all cores stays below 5%.
        expr: avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) < 0.05
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "Node {{ $labels.instance }} has been running above 95% CPU for 10 minutes."

      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Node {{ $labels.instance }} has less than 10% disk space remaining on /."

# /etc/prometheus/rules/elasticsearch_alerts.yml
groups:
  - name: elasticsearch.rules
    rules:
      - alert: ElasticsearchClusterRed
        expr: elasticsearch_cluster_health_status{color="red"} == 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Elasticsearch cluster is RED on {{ $labels.instance }}"
          description: "Elasticsearch cluster {{ $labels.instance }} has entered a RED state."

      - alert: HighElasticsearchHeapUsage
        expr: (elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"}) * 100 > 85
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High Elasticsearch JVM heap usage on {{ $labels.instance }}"
          description: 'Elasticsearch node {{ $labels.instance }} is using {{ $value | printf "%.2f" }}% of its JVM heap.'

# In /etc/prometheus/prometheus.yml:
rule_files:
  - "/etc/prometheus/rules/*.yml"
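
Alert rules can be exercised offline with promtool test rules before they ever page anyone. The sketch below writes a small test file for the LowDiskSpace rule: it feeds in a filesystem that is 5% free and expects the alert to be firing after ten minutes. It assumes the WordPress rule file is valid and stored at /etc/prometheus/rules/wordpress_alerts.yml; write_disk_rule_test and the target wp1:9100 are invented for the example.

```shell
# Sketch: write a promtool unit test for the LowDiskSpace rule.
# write_disk_rule_test is a hypothetical helper; wp1:9100 is an invented target.
write_disk_rule_test() {
  local out_file="$1"
  cat > "$out_file" <<'EOF'
rule_files:
  - /etc/prometheus/rules/wordpress_alerts.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # 5 of 100 bytes available (5% free), held constant for 20 minutes.
      - series: 'node_filesystem_avail_bytes{instance="wp1:9100", mountpoint="/"}'
        values: '5+0x20'
      - series: 'node_filesystem_size_bytes{instance="wp1:9100", mountpoint="/"}'
        values: '100+0x20'
    alert_rule_test:
      - eval_time: 10m
        alertname: LowDiskSpace
        exp_alerts:
          - exp_labels:
              severity: warning
              instance: wp1:9100
              mountpoint: /
            exp_annotations:
              summary: "Low disk space on wp1:9100"
              description: "Node wp1:9100 has less than 10% disk space remaining on /."
EOF
}

# Usage:
#   write_disk_rule_test /tmp/wordpress_alerts_test.yml
#   promtool test rules /tmp/wordpress_alerts_test.yml
```

The expected annotations must match the rule’s templates exactly, so update the test whenever the summary or description text changes.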

Advanced Considerations and Best Practices

High Availability for Monitoring: For critical production environments, make sure the monitoring stack does not become a single point of failure. Run at least two Prometheus servers scraping the same targets, run Alertmanager as a cluster so notifications are deduplicated, and consider Thanos or Cortex when you need a global query view or long-term storage across those servers.

Service Discovery: Instead of static configurations, leverage service discovery mechanisms (e.g., Consul, Kubernetes SD) if your infrastructure is dynamic. This automatically updates Prometheus scrape targets as services scale up or down.
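
Where Consul or Kubernetes is not available, file-based service discovery is a lighter substitute: Prometheus watches a JSON file of targets and picks up changes without a restart. The sketch below generates such a file from a list of hosts. render_file_sd, the role label, the example addresses, and the target file path are all assumptions made for illustration.

```shell
# Sketch: print a file_sd target list for Prometheus from a list of hosts.
# render_file_sd is a hypothetical helper; file-based discovery stands in
# here for Consul or Kubernetes SD.
render_file_sd() {
  local role="$1"
  local port="$2"
  shift 2
  local targets=""
  local host
  for host in "$@"; do
    # Append a comma separator only when the list already has an entry.
    targets="${targets}${targets:+, }\"${host}:${port}\""
  done
  printf '[{"targets": [%s], "labels": {"role": "%s"}}]\n' "$targets" "$role"
}

# Usage:
#   render_file_sd wordpress 9100 10.0.0.11 10.0.0.12 \
#     | sudo tee /etc/prometheus/targets/wordpress.json
#
# Matching scrape job in prometheus.yml:
#   - job_name: 'wordpress_nodes'
#     file_sd_configs:
#       - files: ['/etc/prometheus/targets/wordpress.json']
```

Regenerating the file from a deployment script keeps the target list in step with the instances that actually exist.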

Log Aggregation: While metrics are essential, logs provide granular detail for debugging. Run a log aggregation system such as the ELK Stack (Elasticsearch, Logstash, Kibana) or Loki alongside Prometheus for a more complete observability setup; Loki can be queried from the same Grafana instance.

Security: Secure your monitoring endpoints. Use firewalls to restrict access to Prometheus, Grafana, and Alertmanager. Consider authentication and TLS for sensitive data.
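
As one concrete way to restrict access, exporter ports can be opened only to the Prometheus server. The sketch below prints ufw commands rather than running them, so they can be reviewed first. It assumes ufw is the host firewall and that its default incoming policy is deny; print_exporter_rules is a hypothetical helper, and 203.0.113.10 is a documentation-range address standing in for your Prometheus server.

```shell
# Sketch: print ufw rules that let only the Prometheus server reach exporter ports.
# print_exporter_rules is a hypothetical helper. It echoes commands instead of
# running them. Assumes ufw with a default-deny incoming policy.
print_exporter_rules() {
  local prometheus_ip="$1"
  shift
  local port
  for port in "$@"; do
    echo "sudo ufw allow from ${prometheus_ip} to any port ${port} proto tcp"
  done
}

# Usage on a WordPress node:
#   print_exporter_rules 203.0.113.10 9100
#   print_exporter_rules 203.0.113.10 9100 | sh   # apply after review
```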

Regular Review: Periodically review your dashboards and alert rules. As your application evolves, your monitoring needs will change. Ensure your alerts are actionable and not overly noisy.