Server Monitoring Best Practices: Keeping Your WordPress App and PostgreSQL Clusters Alive on Linode

Establishing a Baseline: Essential Metrics for WordPress and PostgreSQL

Effective server monitoring hinges on understanding what “normal” looks like for your specific application stack. For a WordPress site backed by a PostgreSQL cluster on Linode, this means tracking key performance indicators (KPIs) across both the web server layer and the database layer. Without a baseline, anomaly detection and proactive issue resolution become guesswork.

For the WordPress application itself, we’ll focus on metrics that indicate responsiveness and resource utilization. This includes:

HTTP Request Rate: The number of requests per second hitting your web server (Nginx or Apache).
Response Time (Average & Percentiles): How long it takes for the server to respond to requests. P95 and P99 are crucial for identifying outlier latency.
Error Rate (HTTP 5xx): The percentage of requests resulting in server-side errors.
CPU Utilization: Overall CPU load on the web server instances.
Memory Usage: RAM consumption, paying close attention to swap usage.
Disk I/O: Read/write operations per second and latency, especially important for serving static assets and WordPress uploads.
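
To make the difference between an average and a tail percentile concrete, here is a minimal Python sketch using the nearest-rank method. The sample latencies are invented for illustration, and Prometheus itself estimates percentiles from histogram buckets rather than raw samples, so treat this as a way to reason about baselines, not as what the monitoring stack computes.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical response times in seconds (two slow outliers)
response_times = [0.12, 0.15, 0.11, 0.14, 0.13, 0.16, 0.12, 0.18, 0.95, 2.40]

mean = sum(response_times) / len(response_times)
median = percentile(response_times, 50)
p95 = percentile(response_times, 95)

print(f"mean={mean:.3f}s median={median:.2f}s p95={p95:.2f}s")
```

The mean is pulled up by the two outliers while the median stays low, which is why the percentiles are tracked alongside the average.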

For the PostgreSQL cluster, the focus shifts to database performance and health:

Query Throughput: Transactions per second (TPS) or queries per second (QPS).
Query Latency (Average & Percentiles): Time taken to execute queries.
Connection Count: Number of active and idle connections. Exceeding `max_connections` is a common failure point.
Replication Lag: For high-availability setups, the delay between the primary and replica(s).
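CPU & Memory Usage: The same measurements as on the web servers, but watched more closely here, since memory pressure on a database node directly degrades query performance.
Disk I/O: Read/write throughput and latency on the data and WAL volumes; database workloads are frequently I/O bound.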
Cache Hit Ratio: Effectiveness of PostgreSQL’s shared buffer cache.
Lock Contention: Identifying queries waiting for locks.
WAL (Write-Ahead Log) Activity: Rate of WAL generation and archiving.
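
Two of the database metrics above are simple ratios, sketched below in Python. The sketch assumes the inputs come from the `blks_hit` and `blks_read` columns of `pg_stat_database` and from the current connection count compared with `max_connections`; the function names and sample numbers are hypothetical.

```python
def cache_hit_ratio(blks_hit, blks_read):
    """Fraction of block requests served from shared buffers; None if there was no activity."""
    total = blks_hit + blks_read
    return blks_hit / total if total else None

def connection_saturation(current_connections, max_connections):
    """Fraction of the max_connections limit currently in use."""
    return current_connections / max_connections

# Hypothetical readings
print(cache_hit_ratio(9900, 100))      # share of reads served from cache
print(connection_saturation(90, 100))  # share of the connection limit in use
```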

Agent-Based Monitoring with Prometheus and Node Exporter

Prometheus is a de facto standard for time-series monitoring in cloud-native environments. Its pull-based model and powerful query language (PromQL) make it ideal for collecting and analyzing metrics. We’ll deploy Node Exporter on each Linode instance to expose system-level metrics.

1. Install Node Exporter on each Linode instance:

Download a release from the Prometheus GitHub repository (v1.7.0 is shown here; check the releases page for the current version). For example, on a Debian/Ubuntu system:

wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo useradd -rs /bin/false node_exporter

2. Create a systemd service for Node Exporter:

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

Save this as /etc/systemd/system/node_exporter.service and then enable and start it:

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter
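
Once the service is running, Node Exporter serves plain-text metrics on port 9100 at /metrics. The following Python sketch shows how that text format can be read; it is a simplified parser that assumes each sample line ends with the value and carries no timestamp, and the sample text is made up for illustration.

```python
def parse_metrics(text):
    """Parse 'name{labels} value' lines, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples

# Hypothetical excerpt of a /metrics response
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_memory_MemAvailable_bytes 3.2e+09
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
"""

metrics = parse_metrics(SAMPLE)
print(metrics["node_load1"])
```

In practice you would feed it the body fetched from http://localhost:9100/metrics, for example with `urllib.request.urlopen`.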

3. Configure Prometheus to scrape Node Exporter:

On your Prometheus server, edit the prometheus.yml configuration file. Add scrape jobs for each Linode instance running Node Exporter.

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.

scrape_configs:
  # Scrape Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Scrape WordPress web servers
  - job_name: 'wordpress_webservers'
    static_configs:
      - targets:
          - '192.0.2.10:9100' # Replace with your Linode IP for Web Server 1
          - '192.0.2.11:9100' # Replace with your Linode IP for Web Server 2
          # Add more web servers as needed

  # Scrape PostgreSQL primary node
  - job_name: 'postgresql_primary'
    static_configs:
      - targets:
          - '192.0.2.20:9100' # Replace with your Linode IP for PostgreSQL Primary

  # Scrape PostgreSQL replica nodes
  - job_name: 'postgresql_replicas'
    static_configs:
      - targets:
          - '192.0.2.21:9100' # Replace with your Linode IP for PostgreSQL Replica 1
          - '192.0.2.22:9100' # Replace with your Linode IP for PostgreSQL Replica 2
          # Add more replicas as needed

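Reload the Prometheus configuration. The HTTP reload endpoint is only available when Prometheus was started with the --web.enable-lifecycle flag; otherwise, send the process a SIGHUP or restart the service: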

curl -X POST http://localhost:9090/-/reload
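
After a reload, it is worth confirming that every target is actually being scraped. Prometheus reports this through its HTTP API at /api/v1/targets. The sketch below filters such a response for targets that are not healthy; the payload is a trimmed, hypothetical example containing only the fields the function reads.

```python
import json

def unhealthy_targets(payload):
    """Return (job, scrape URL, last error) for every active target whose health is not 'up'."""
    data = json.loads(payload)
    return [
        (t["labels"].get("job"), t["scrapeUrl"], t.get("lastError", ""))
        for t in data["data"]["activeTargets"]
        if t["health"] != "up"
    ]

# Hypothetical, trimmed response from /api/v1/targets
SAMPLE = json.dumps({
    "status": "success",
    "data": {"activeTargets": [
        {"labels": {"job": "wordpress_webservers"},
         "scrapeUrl": "http://192.0.2.10:9100/metrics",
         "health": "up", "lastError": ""},
        {"labels": {"job": "postgresql_primary"},
         "scrapeUrl": "http://192.0.2.20:9100/metrics",
         "health": "down", "lastError": "connection refused"},
    ]},
})

for job, url, error in unhealthy_targets(SAMPLE):
    print(f"{job}: {url} is down ({error})")
```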

PostgreSQL Metrics with `postgres_exporter`

Node Exporter provides system-level metrics. To get detailed PostgreSQL metrics, we’ll use postgres_exporter. This exporter connects to your PostgreSQL instances and exposes metrics via an HTTP endpoint, which Prometheus can then scrape.

1. Install `postgres_exporter` on your PostgreSQL nodes:

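Download a release from the prometheus-community/postgres_exporter GitHub repository (the project moved there from wrouesnel/postgres_exporter), following the same steps as for Node Exporter:

wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.13.0/postgres_exporter-0.13.0.linux-amd64.tar.gz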
tar xvfz postgres_exporter-0.13.0.linux-amd64.tar.gz
sudo mv postgres_exporter-0.13.0.linux-amd64/postgres_exporter /usr/local/bin/
sudo useradd -rs /bin/false postgres_exporter

2. Configure `postgres_exporter` to connect to PostgreSQL:

Create a PostgreSQL user for the exporter with minimal privileges. This user only needs to connect and read from `pg_stat_activity`, `pg_stat_database`, and other relevant system catalogs. Avoid granting superuser privileges.

-- Connect to your PostgreSQL database as a superuser
CREATE USER monitoring_user WITH PASSWORD 'your_strong_password';
GRANT CONNECT ON DATABASE your_database TO monitoring_user;
-- On PostgreSQL 10 and later, the built-in pg_monitor role grants read access
-- to the statistics views (pg_stat_activity, pg_stat_database,
-- pg_stat_replication, pg_stat_statements if installed) without superuser rights.
GRANT pg_monitor TO monitoring_user;
-- Add other necessary grants based on the exporter's documentation and your needs

Create a .pgpass file, owned by the system user that runs the exporter, to hold the password. The file stores one password per connection rather than a full connection string:

# ~/.pgpass or /etc/postgres-exporter/.pgpass
# Format: hostname:port:database:username:password
localhost:5432:your_database:monitoring_user:your_strong_password

Ensure the permissions are strict:

chmod 0600 ~/.pgpass
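
One detail that is easy to miss: in a .pgpass entry, a literal colon or backslash inside any field must be escaped with a backslash. The Python sketch below builds a correctly escaped line; the helper names are hypothetical and the password is a made-up example.

```python
def pgpass_escape(field):
    """Escape backslashes and colons, which are special in .pgpass fields."""
    return str(field).replace("\\", "\\\\").replace(":", "\\:")

def pgpass_line(host, port, database, user, password):
    """Build one hostname:port:database:username:password entry."""
    fields = (host, port, database, user, password)
    return ":".join(pgpass_escape(f) for f in fields)

# A password containing both a colon and a backslash
print(pgpass_line("localhost", 5432, "your_database", "monitoring_user", "p:ss\\w"))
```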

3. Create a systemd service for `postgres_exporter`:

[Unit]
Description=PostgreSQL Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=postgres_exporter
Group=postgres_exporter
Type=simple
# Adjust the DATA_SOURCE_NAME to match your PostgreSQL connection string
# Example: "user=monitoring_user password=your_strong_password host=localhost port=5432 dbname=your_database sslmode=disable"
# Or use the .pgpass file by setting PGUSER, PGHOST, etc. or using the default location.
Environment="DATA_SOURCE_NAME=postgresql://monitoring_user:your_strong_password@localhost:5432/your_database?sslmode=disable"
ExecStart=/usr/local/bin/postgres_exporter --web.listen-address=":9187" --extend.query-path=/etc/postgres-exporter/queries.yaml

[Install]
WantedBy=multi-user.target

Save this as /etc/systemd/system/postgres_exporter.service. The custom-queries flag in ExecStart is only needed if you define your own queries in /etc/postgres-exporter/queries.yaml; if you have none, remove the flag rather than pointing it at a missing file. For standard metrics, the exporter works out of the box. Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable postgres_exporter
sudo systemctl start postgres_exporter
sudo systemctl status postgres_exporter
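
If the exporter fails to connect, a common cause is a password containing characters such as @, : or / that break the URL form of DATA_SOURCE_NAME. The sketch below percent-encodes the credentials before assembling the URL. The function name is hypothetical and the password is invented for illustration.

```python
from urllib.parse import quote

def build_dsn(user, password, host, port, database, sslmode="disable"):
    """Assemble a postgresql:// URL with percent-encoded user, password and database."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}?sslmode={sslmode}"
    )

dsn = build_dsn("monitoring_user", "p@ss:w/rd", "localhost", 5432, "your_database")
print(dsn)
```

Note that systemd treats % as a specifier prefix in unit files, so a percent-encoded value placed in an Environment= line needs each % written as %%.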

4. Configure Prometheus to scrape `postgres_exporter`:

Update your prometheus.yml to include scrape jobs for the PostgreSQL exporter running on port 9187 on your database nodes.

# ... (previous scrape configs) ...

scrape_configs:
  # ... (Prometheus, WordPress webservers) ...

  # Scrape PostgreSQL primary node metrics
  - job_name: 'postgresql_primary_metrics'
    static_configs:
      - targets:
          - '192.0.2.20:9187' # Replace with your Linode IP for PostgreSQL Primary

  # Scrape PostgreSQL replica nodes metrics
  - job_name: 'postgresql_replicas_metrics'
    static_configs:
      - targets:
          - '192.0.2.21:9187' # Replace with your Linode IP for PostgreSQL Replica 1
          - '192.0.2.22:9187' # Replace with your Linode IP for PostgreSQL Replica 2
          # Add more replicas as needed

Reload the Prometheus configuration again so the new jobs take effect.

Application-Level Metrics with `php-fpm_exporter` and Custom WordPress Checks

Node Exporter gives us system-level insights and `postgres_exporter` covers the database, but we also need metrics directly from the PHP-FPM processes serving WordPress, and potentially custom checks for WordPress itself.

1. Exposing PHP-FPM Metrics:

PHP-FPM exposes its status via a status page. We can use `php-fpm_exporter` to scrape this and expose it in Prometheus format. First, ensure your PHP-FPM pool configuration (e.g., /etc/php/8.1/fpm/pool.d/www.conf) has the status page enabled:

pm.status_path = /status
ping.path = /ping
ping.response = pong

Restart PHP-FPM:

sudo systemctl restart php8.1-fpm

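Install `php-fpm_exporter` (similar process to Node Exporter/postgres_exporter) and point it at your PHP-FPM status page. Several PHP-FPM exporter projects exist and their flag names differ, so treat the flags in the snippet below as illustrative and confirm them against your exporter's --help output.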

# Example systemd service snippet for php-fpm_exporter
[Service]
User=php-fpm_exporter
Group=php-fpm_exporter
Type=simple
ExecStart=/usr/local/bin/php-fpm_exporter --web.listen-address=":9124" --php-fpm.status-url="http://127.0.0.1/status" --php-fpm.ping-url="http://127.0.0.1/ping"
# Ensure the web server (Nginx/Apache) is configured to proxy these URLs to the PHP-FPM socket or a local listener.

Add a scrape job for `php-fpm_exporter` (e.g., on port 9124) to your prometheus.yml.
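
To get a feel for what the status page reports, the Python sketch below reads the JSON variant of the PHP-FPM status output (requested by appending ?json to the status URL) and derives a pool saturation figure. The sample document is hypothetical and trimmed to the fields the function uses.

```python
import json

def pool_saturation(status_json):
    """Summarize a PHP-FPM status document: how busy the pool is and whether requests queue."""
    status = json.loads(status_json)
    total = status["total processes"]
    return {
        "pool": status["pool"],
        "saturation": status["active processes"] / total if total else 0.0,
        "listen_queue": status["listen queue"],
        "max_children_reached": status["max children reached"],
    }

# Hypothetical, trimmed response from /status?json
SAMPLE = json.dumps({
    "pool": "www",
    "process manager": "dynamic",
    "accepted conn": 1200,
    "listen queue": 0,
    "idle processes": 6,
    "active processes": 2,
    "total processes": 8,
    "max children reached": 0,
    "slow requests": 0,
})

summary = pool_saturation(SAMPLE)
print(summary)
```

A saturation close to 1.0 together with a non-zero listen queue suggests the pool needs more workers (pm.max_children) or that requests are running slowly.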

2. Custom WordPress Health Checks:

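For deeper WordPress insights, consider a simple PHP script that checks critical functionality and exposes the results as metrics. Because the script below prints Prometheus text format, Prometheus can scrape it directly. `blackbox_exporter` is useful for probing whether the URL responds, but it does not ingest metrics from the response body.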

Create a script like /var/www/html/healthcheck.php:

<?php
// The body is Prometheus text exposition format, not JSON
header('Content-Type: text/plain; version=0.0.4; charset=utf-8');

$checks = [];
$startTime = microtime(true);

// Check database connection (basic).
// In production, load credentials from the environment or a config file
// outside the web root instead of hard-coding them.
try {
    $db = new PDO('pgsql:host=localhost;dbname=your_database', 'monitoring_user', 'your_strong_password');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->query('SELECT 1'); // Simple query
    $checks['database_connection'] = true;
} catch (PDOException $e) {
    // Log the details server-side; do not expose them in the response
    error_log('healthcheck: database check failed: ' . $e->getMessage());
    $checks['database_connection'] = false;
}

// Check that WordPress core is present. ABSPATH is only defined after
// WordPress has loaded, so resolve the path relative to this script
// (assumed to sit in the WordPress root).
$checks['wp_core_files'] = file_exists(__DIR__ . '/wp-load.php');

// Add more checks: external API calls, cron job status, etc.

$duration = microtime(true) - $startTime;
$overall = in_array(false, $checks, true) ? 0 : 1;

// Expose metrics in Prometheus text format (simplified)
// For a full implementation, consider a dedicated exporter or library
echo "# HELP wordpress_healthcheck_status Overall health status (1=ok, 0=error)\n";
echo "# TYPE wordpress_healthcheck_status gauge\n";
echo "wordpress_healthcheck_status " . $overall . "\n";

echo "# HELP wordpress_healthcheck_duration_seconds Health check duration in seconds\n";
echo "# TYPE wordpress_healthcheck_duration_seconds gauge\n";
echo "wordpress_healthcheck_duration_seconds " . $duration . "\n";

// One gauge per individual check
foreach ($checks as $key => $ok) {
    $metric = "wordpress_healthcheck_" . $key . "_status";
    echo "# HELP " . $metric . " Status of " . $key . " (1=ok, 0=error)\n";
    echo "# TYPE " . $metric . " gauge\n";
    echo $metric . " " . ($ok ? 1 : 0) . "\n";
}

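Configure your web server (Nginx/Apache) to pass requests for this script to PHP-FPM, and restrict access to your Prometheus server's IP address. Because the script emits Prometheus text format, you can scrape it directly by adding a scrape job with metrics_path set to /healthcheck.php. `blackbox_exporter` HTTP probes can additionally confirm that the endpoint is reachable, but they report probe results (such as probe_success) rather than the metrics in the response body. Alternatively, write a small Go/Python exporter that runs equivalent checks and exposes them as Prometheus metrics.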
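
If you take the Python route, the core of such an exporter is a function that turns check results into the text exposition format. The sketch below mirrors the metric names used by the PHP script; the function name and the input shape (a dict of check name to boolean) are assumptions made for illustration, and serving the output over HTTP is left out.

```python
def render_metrics(checks, duration_seconds):
    """Render check results (name -> True/False) as Prometheus text exposition format."""
    overall = 1 if all(checks.values()) else 0
    lines = [
        "# HELP wordpress_healthcheck_status Overall health status (1=ok, 0=error)",
        "# TYPE wordpress_healthcheck_status gauge",
        f"wordpress_healthcheck_status {overall}",
        "# HELP wordpress_healthcheck_duration_seconds Health check duration in seconds",
        "# TYPE wordpress_healthcheck_duration_seconds gauge",
        f"wordpress_healthcheck_duration_seconds {duration_seconds}",
    ]
    for name, ok in sorted(checks.items()):
        metric = f"wordpress_healthcheck_{name}_status"
        lines += [
            f"# HELP {metric} Status of {name} (1=ok, 0=error)",
            f"# TYPE {metric} gauge",
            f"{metric} {1 if ok else 0}",
        ]
    return "\n".join(lines) + "\n"

output = render_metrics({"database_connection": True, "wp_core_files": False}, 0.0123)
print(output, end="")
```

For anything beyond a sketch, the official Prometheus client library for your language handles escaping and content negotiation for you.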

Alerting with Alertmanager

Collecting metrics is only half the battle. Alerting ensures you’re notified *before* users are impacted. Prometheus integrates with Alertmanager for sophisticated alert routing, grouping, and silencing.

1. Configure Prometheus Alerting Rules:

Define alerting rules in a separate file (e.g., alerts.yml) and include it in your prometheus.yml.

groups:
- name: wordpress_alerts
  rules:
  - alert: HighCpuUsage
    expr: avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) < 0.2
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
      description: "CPU usage on {{ $labels.instance }} is above 80% for the last 10 minutes."

  - alert: HighMemoryUsage
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 90
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High Memory Usage on {{ $labels.instance }}"
      description: "Memory usage on {{ $labels.instance }} is above 90% for the last 5 minutes."

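  # Assumes the web tier exports an http_requests_total counter with a "code" label
  # (for example via a web server log exporter); Node Exporter does not provide one.
  - alert: HighHttp5xxErrorRate
    expr: sum(rate(http_requests_total{code=~"5.."}[5m])) by (instance) / sum(rate(http_requests_total[5m])) by (instance) * 100 > 5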
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High HTTP 5xx error rate on {{ $labels.instance }}"
      description: "More than 5% of requests to {{ $labels.instance }} are resulting in 5xx errors."

  - alert: PostgreSQLHighConnectionCount
    expr: sum by (instance) (pg_stat_activity_count) > 0.9 * max by (instance) (pg_settings_max_connections)
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High PostgreSQL connection count on {{ $labels.instance }}"
      description: "PostgreSQL on {{ $labels.instance }} is above 90% of its max_connections limit ({{ $value }} connections in use)."

  - alert: PostgreSQLReplicationLag
    expr: pg_replication_lag_seconds > 60 # Gauge in seconds: lagging by more than 60 seconds
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "PostgreSQL replication lag detected on {{ $labels.instance }}"
      description: "PostgreSQL replica {{ $labels.instance }} is lagging by more than 60 seconds."

  - alert: WordPressHealthCheckFailed
    expr: wordpress_healthcheck_status == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "WordPress health check failed on {{ $labels.instance }}"
      description: "The WordPress application on {{ $labels.instance }} is reporting a health check failure."

Ensure your prometheus.yml includes:

rule_files:
  - "alerts.yml"
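
The HighCpuUsage expression can look backwards at first, because it alerts on low idle time rather than high usage. The Python sketch below works through the arithmetic for one instance: the per-core increase of the idle counter over the window, divided by the window length, is the idle fraction of that core, and averaging across cores gives the value the rule compares with 0.2. The counter readings are hypothetical, and unlike PromQL's rate() this sketch does not handle counter resets or extrapolation.

```python
def cpu_usage_percent(idle_start, idle_end, elapsed_seconds):
    """Usage in percent from per-core idle counter readings at the window edges."""
    idle_rates = [
        (end - start) / elapsed_seconds
        for start, end in zip(idle_start, idle_end)
    ]
    avg_idle = sum(idle_rates) / len(idle_rates)
    return (1 - avg_idle) * 100

# Two cores over a 5-minute (300 s) window: 60 s and 30 s spent idle
usage = cpu_usage_percent([1000.0, 2000.0], [1060.0, 2030.0], 300)
print(f"{usage:.1f}% busy")
```

Here the average idle fraction is 0.15, which is below the 0.2 threshold, so the rule would enter the pending state and fire if the condition held for the full 10 minutes.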

2. Configure Alertmanager:

Set up Alertmanager to receive alerts from Prometheus and route them to your desired notification channels (Slack, PagerDuty, email, etc.). The configuration involves defining receivers and routes.

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver' # Default receiver if no specific route matches

receivers:
- name: 'default-receiver'
  slack_configs:
  - api_url: 'YOUR_SLACK_WEBHOOK_URL'
    channel: '#alerts'

# Example for PagerDuty
# - name: 'pagerduty-critical'
#   pagerduty_configs:
#   - service_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'

inhibit_rules:
  # While a critical alert (source) is firing, mute matching warning alerts (target)
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'cluster', 'service']
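
Inhibition rules are easy to get backwards, so here is a simplified Python model of how one is evaluated: an alert matching the target matchers is muted while some other firing alert matches the source matchers and both carry the same values for every label listed under equal. This is a sketch of the semantics only, with alerts represented as plain label dictionaries; it is not Alertmanager's implementation.

```python
def is_inhibited(target, active_alerts, source_match, target_match, equal):
    """Return True if `target` is muted by another alert under one inhibit rule."""
    def matches(alert, wanted):
        return all(alert.get(k) == v for k, v in wanted.items())

    if not matches(target, target_match):
        return False
    return any(
        source is not target
        and matches(source, source_match)
        # A label missing on both sides counts as equal
        and all(source.get(label) == target.get(label) for label in equal)
        for source in active_alerts
    )

# Hypothetical alerts, represented by their labels
critical = {"alertname": "DiskFull", "severity": "critical", "instance": "db1"}
warning = {"alertname": "DiskFull", "severity": "warning", "instance": "db1"}
other = {"alertname": "HighCpuUsage", "severity": "warning", "instance": "db1"}
alerts = [critical, warning, other]

rule = dict(source_match={"severity": "critical"},
            target_match={"severity": "warning"},
            equal=["alertname"])

print(is_inhibited(warning, alerts, **rule))  # same alertname as the critical alert
print(is_inhibited(other, alerts, **rule))    # different alertname
```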

Configure Prometheus to send alerts to Alertmanager (typically running on port 9093):

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 'localhost:9093' # Address of your Alertmanager instance

Visualization with Grafana

Dashboards are essential for visualizing trends, understanding historical performance, and quickly diagnosing issues. Grafana is the go-to tool for this, integrating seamlessly with Prometheus.

1. Set up Grafana:

Install Grafana on a separate server or on your Prometheus server. Add Prometheus as a data source in Grafana, pointing to your Prometheus instance’s URL (e.g., http://localhost:9090).

2. Import Pre-built Dashboards:

Many pre-built dashboards are available on Grafana.com for Node Exporter, PostgreSQL Exporter, and general system monitoring. Search for “Node Exporter Full” and “PostgreSQL” dashboards. Import these via the Grafana UI (Dashboards -> Import; the exact menu location varies between Grafana versions).

3. Create Custom Dashboards:

For WordPress-specific metrics or combined views, create custom dashboards. Key panels to include:

Web Server Overview: Request rate, error rate (5xx, 4xx), response time percentiles (P95, P99).
PHP-FPM Performance: Active processes, idle processes, request duration, slow requests.
PostgreSQL Cluster Health: Replication lag, active connections, query latency, cache hit ratio, CPU/Memory usage per node.
System Resources: CPU, Memory, Disk I/O, Network traffic per instance.
WordPress Health Check Status: A simple gauge showing the overall health status from your custom script.

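Use PromQL queries to populate these panels. For example, assuming your web tier exports a request-duration histogram named http_server_requests_seconds (none of the exporters configured above provides one, so substitute your own metric name), the P95 response time per web server is: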

histogram_quantile(0.95, sum(rate(http_server_requests_seconds_bucket[5m])) by (le, instance))

And to show PostgreSQL replication lag (a gauge in seconds, so it is graphed directly rather than wrapped in rate()):

max by (instance) (pg_replication_lag_seconds)
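
Since histogram-based percentiles are estimates, it helps to see how they are derived. The Python sketch below follows the approach of PromQL's histogram_quantile for classic histograms: find the bucket in which the requested rank falls, then interpolate linearly inside it. It is a simplified model with hypothetical bucket counts, not the Prometheus implementation.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper bound,
    ending with (math.inf, total_count).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound
                return buckets[-2][0]
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count

# Hypothetical request-duration buckets in seconds
buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
p95 = histogram_quantile(0.95, buckets)
print(p95)
```

Because the result is interpolated within a bucket, its accuracy depends on how finely the bucket boundaries are chosen around the latencies you care about.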

Advanced Considerations and Best Practices

1. High Availability for Monitoring: Ensure your Prometheus and Alertmanager instances are themselves highly available. Consider running multiple Prometheus instances behind a load balancer or using Thanos/Cortex for long-term storage and global view.

2. Service Discovery: For dynamic environments, use Prometheus’s service discovery mechanisms (e.g., linode_sd_configs for the Linode API, or kubernetes_sd_configs) instead of static configurations.

3. Log Aggregation: Complement metrics with centralized logging (e.g., ELK stack, Loki). Metrics tell you *what* is happening, logs tell you *why*.

4. Performance Tuning: Use the collected metrics to identify bottlenecks. For PostgreSQL, this might involve tuning shared_buffers, work_mem, or analyzing slow queries with `pg_stat_statements`. For WordPress, optimize PHP configuration, caching plugins, and consider a CDN.

5. Security: Secure your monitoring endpoints. Use firewalls to restrict access to Prometheus, Alertmanager, and exporters. Consider authentication for sensitive metrics.

6. Regular Review: Periodically review your alerts and dashboards. Are they still relevant? Are there too many false positives? Adjust thresholds and rules as your application evolves.
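
As a lightweight step toward the service discovery point above, Prometheus can also read targets from JSON files via file_sd_configs, which lets an external script keep the target list current. The Python sketch below builds that file's structure from a host inventory. The inventory shape, the role label, and the function name are assumptions made for illustration.

```python
import json

def file_sd_groups(inventory, port):
    """Group hosts by role into the list-of-target-groups structure file_sd_configs expects."""
    groups = {}
    for host in inventory:
        groups.setdefault(host["role"], []).append(f'{host["ip"]}:{port}')
    return [
        {"targets": sorted(targets), "labels": {"role": role}}
        for role, targets in sorted(groups.items())
    ]

# Hypothetical inventory, for example assembled from your provisioning tooling
inventory = [
    {"ip": "192.0.2.10", "role": "wordpress_webserver"},
    {"ip": "192.0.2.11", "role": "wordpress_webserver"},
    {"ip": "192.0.2.20", "role": "postgresql_primary"},
]

groups = file_sd_groups(inventory, 9100)
print(json.dumps(groups, indent=2))
```

Writing this output to a file referenced by a file_sd_configs entry lets Prometheus pick up changes without a reload.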
