Vengala Vinay

Having 9+ Years of Experience in Software Development

Server Monitoring Best Practices: Keeping Your WordPress App and PostgreSQL Clusters Alive on Linode

Establishing a Baseline: Essential Metrics for WordPress and PostgreSQL

Effective server monitoring hinges on understanding what “normal” looks like for your specific application stack. For a WordPress site backed by a PostgreSQL cluster on Linode, this means tracking key performance indicators (KPIs) across both the web server layer and the database layer. Without a baseline, anomaly detection and proactive issue resolution become guesswork.

For the WordPress application itself, we’ll focus on metrics that indicate responsiveness and resource utilization. This includes:

  • HTTP Request Rate: The number of requests per second hitting your web server (Nginx or Apache).
  • Response Time (Average & Percentiles): How long it takes for the server to respond to requests. P95 and P99 are crucial for identifying outlier latency.
  • Error Rate (HTTP 5xx): The percentage of requests resulting in server-side errors.
  • CPU Utilization: Overall CPU load on the web server instances.
  • Memory Usage: RAM consumption, paying close attention to swap usage.
  • Disk I/O: Read/write operations per second and latency, especially important for serving static assets and WordPress uploads.
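To make the percentile metrics concrete, here is a minimal Python sketch that computes the average, P95, and P99 of a set of response times with the nearest-rank method. The sample values and the function name `percentile` are illustrative assumptions; Prometheus itself estimates percentiles from histogram buckets rather than raw samples.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# Hypothetical response times in seconds: mostly fast, a few slow outliers.
response_times = [0.12] * 95 + [0.30] * 3 + [2.5, 4.0]

average = sum(response_times) / len(response_times)
p95 = percentile(response_times, 95)
p99 = percentile(response_times, 99)
print(f"avg={average:.3f}s p95={p95}s p99={p99}s")
```

In this made-up data set the average and P95 both look healthy, while P99 exposes the slow outliers, which is why the percentiles belong in the baseline.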

For the PostgreSQL cluster, the focus shifts to database performance and health:

  • Query Throughput: Transactions per second (TPS) or queries per second (QPS).
  • Query Latency (Average & Percentiles): Time taken to execute queries.
  • Connection Count: Number of active and idle connections. Exceeding `max_connections` is a common failure point.
  • Replication Lag: For high-availability setups, the delay between the primary and replica(s).
  • CPU & Memory Usage: Similar to web servers, but critical for database operations.
  • Disk I/O: Database operations are heavily I/O bound.
  • Cache Hit Ratio: Effectiveness of PostgreSQL’s shared buffer cache.
  • Lock Contention: Identifying queries waiting for locks.
  • WAL (Write-Ahead Log) Activity: Rate of WAL generation and archiving.
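Two of the database metrics above are simple ratios, and a small sketch shows how they are derived. It assumes the inputs come from the `blks_hit` and `blks_read` columns of `pg_stat_database` and from the current connection count and `max_connections`; the function names are hypothetical.

```python
def cache_hit_ratio(blks_hit, blks_read):
    """Fraction of block requests served from shared buffers.

    blks_read counts blocks that were not found in shared buffers
    (they may still have come from the OS page cache).
    """
    total = blks_hit + blks_read
    if total == 0:
        return None  # no block activity yet, so the ratio is undefined
    return blks_hit / total

def connection_utilization(current_connections, max_connections):
    """Fraction of max_connections currently in use."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return current_connections / max_connections

print(cache_hit_ratio(990, 10))         # hypothetical counters
print(connection_utilization(90, 100))  # hypothetical counts
```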

Agent-Based Monitoring with Prometheus and Node Exporter

Prometheus is a de facto standard for time-series monitoring in cloud-native environments. Its pull-based model and powerful query language (PromQL) make it ideal for collecting and analyzing metrics. We’ll deploy Node Exporter on each Linode instance to expose system-level metrics.

1. Install Node Exporter on each Linode instance:

Download a release from the node_exporter releases page in the Prometheus GitHub organization. The commands below use v1.7.0 on a 64-bit Debian/Ubuntu system; substitute the current version number:

wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo useradd -rs /bin/false node_exporter

2. Create a systemd service for Node Exporter:

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

Save this as /etc/systemd/system/node_exporter.service and then enable and start it:

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter

3. Configure Prometheus to scrape Node Exporter:

On your Prometheus server, edit the prometheus.yml configuration file. Add scrape jobs for each Linode instance running Node Exporter.

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.

scrape_configs:
  # Scrape Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Scrape WordPress web servers
  - job_name: 'wordpress_webservers'
    static_configs:
      - targets:
          - '192.0.2.10:9100' # Replace with your Linode IP for Web Server 1
          - '192.0.2.11:9100' # Replace with your Linode IP for Web Server 2
          # Add more web servers as needed

  # Scrape PostgreSQL primary node
  - job_name: 'postgresql_primary'
    static_configs:
      - targets:
          - '192.0.2.20:9100' # Replace with your Linode IP for PostgreSQL Primary

  # Scrape PostgreSQL replica nodes
  - job_name: 'postgresql_replicas'
    static_configs:
      - targets:
          - '192.0.2.21:9100' # Replace with your Linode IP for PostgreSQL Replica 1
          - '192.0.2.22:9100' # Replace with your Linode IP for PostgreSQL Replica 2
          # Add more replicas as needed

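Reload the Prometheus configuration. The HTTP reload endpoint only works if Prometheus was started with the --web.enable-lifecycle flag; otherwise send SIGHUP to the Prometheus process or restart the service: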

curl -X POST http://localhost:9090/-/reload
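After a reload, it is worth confirming that every target is actually being scraped. The sketch below parses the JSON returned by the Prometheus /api/v1/targets endpoint and lists the targets whose health is not "up". The sample payload and the function name `unhealthy_targets` are illustrative assumptions; a real response carries more fields.

```python
import json

def unhealthy_targets(api_response):
    """Return (job, scrape URL, last error) for every active target that is not 'up'."""
    payload = json.loads(api_response)
    if payload.get("status") != "success":
        raise ValueError("Prometheus API call did not succeed")
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target.get("health") != "up":
            problems.append((
                target.get("labels", {}).get("job"),
                target.get("scrapeUrl"),
                target.get("lastError", ""),
            ))
    return problems

# Trimmed, made-up example of what /api/v1/targets returns.
sample_response = json.dumps({
    "status": "success",
    "data": {"activeTargets": [
        {"labels": {"job": "wordpress_webservers"},
         "scrapeUrl": "http://192.0.2.10:9100/metrics",
         "health": "up", "lastError": ""},
        {"labels": {"job": "postgresql_primary"},
         "scrapeUrl": "http://192.0.2.20:9100/metrics",
         "health": "down", "lastError": "connection refused"},
    ]},
})

problems = unhealthy_targets(sample_response)
for job, url, error in problems:
    print(f"{job}: {url} is down ({error})")
```

Against a live server, the payload could be fetched with `urllib.request.urlopen("http://localhost:9090/api/v1/targets").read()` and passed to the same function.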

PostgreSQL Metrics with `postgres_exporter`

Node Exporter provides system-level metrics. To get detailed PostgreSQL metrics, we’ll use postgres_exporter. This exporter connects to your PostgreSQL instances and exposes metrics via an HTTP endpoint, which Prometheus can then scrape.

1. Install `postgres_exporter` on your PostgreSQL nodes:

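Download a release from the prometheus-community/postgres_exporter repository on GitHub (the project moved there from wrouesnel/postgres_exporter). Similar to Node Exporter:

wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.13.0/postgres_exporter-0.13.0.linux-amd64.tar.gz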
tar xvfz postgres_exporter-0.13.0.linux-amd64.tar.gz
sudo mv postgres_exporter-0.13.0.linux-amd64/postgres_exporter /usr/local/bin/
sudo useradd -rs /bin/false postgres_exporter

2. Configure `postgres_exporter` to connect to PostgreSQL:

Create a PostgreSQL user for the exporter with minimal privileges. On PostgreSQL 10 and later, membership in the built-in `pg_monitor` role is the simplest way to do this: plain SELECT grants on `pg_stat_activity` or `pg_stat_replication` are not enough, because those views hide the details of other users’ sessions from unprivileged roles. Avoid granting superuser privileges.

-- Connect to your PostgreSQL database as a superuser
CREATE USER monitoring_user WITH PASSWORD 'your_strong_password';
GRANT CONNECT ON DATABASE your_database TO monitoring_user;
-- pg_monitor (PostgreSQL 10+) grants read access to the statistics views and
-- settings, including pg_stat_activity, pg_stat_replication and pg_stat_statements
GRANT pg_monitor TO monitoring_user;
-- Add other necessary grants based on the exporter's documentation and your needs

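As an alternative to embedding the password in the connection string, you can store it in a .pgpass password file that belongs to the system user running the exporter. The postgres_exporter user created above has no home directory, so place the file in a dedicated location and point the exporter at it with the PGPASSFILE environment variable:

# /etc/postgres-exporter/.pgpass
# Format: hostname:port:database:username:password
localhost:5432:your_database:monitoring_user:your_strong_password

Ensure the ownership and permissions are strict, since a password file readable by other users is ignored:

sudo chown postgres_exporter: /etc/postgres-exporter/.pgpass
sudo chmod 0600 /etc/postgres-exporter/.pgpass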

3. Create a systemd service for `postgres_exporter`:

[Unit]
Description=PostgreSQL Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=postgres_exporter
Group=postgres_exporter
Type=simple
# Adjust the DATA_SOURCE_NAME to match your PostgreSQL connection string
# Example: "user=monitoring_user password=your_strong_password host=localhost port=5432 dbname=your_database sslmode=disable"
# Or use the .pgpass file by setting PGUSER, PGHOST, etc. or using the default location.
Environment="DATA_SOURCE_NAME=postgresql://monitoring_user:your_strong_password@localhost:5432/your_database?sslmode=disable"
# The custom-query flag is spelled --extend.query-path; omit it if you have no custom queries.
ExecStart=/usr/local/bin/postgres_exporter --web.listen-address=":9187" --extend.query-path=/etc/postgres-exporter/queries.yaml

[Install]
WantedBy=multi-user.target

Save this as /etc/systemd/system/postgres_exporter.service. The --extend.query-path flag is only needed for custom queries, and recent exporter releases mark it as deprecated; if you keep it, create /etc/postgres-exporter/queries.yaml first, otherwise remove the flag from ExecStart. For standard metrics, the exporter works out-of-the-box. Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable postgres_exporter
sudo systemctl start postgres_exporter
sudo systemctl status postgres_exporter
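If the exporter fails to connect, one thing to check is the password inside DATA_SOURCE_NAME: characters such as @, : or / break the URL unless they are percent-encoded. The sketch below builds the URL with the standard library. Because systemd treats % as a specifier prefix in unit files, it also doubles each % for use in an Environment= line. The function names and the sample password are hypothetical.

```python
from urllib.parse import quote

def build_dsn(user, password, host, port, dbname, sslmode="disable"):
    """Build a postgresql:// URL with user, password and database percent-encoded."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}?sslmode={sslmode}"
    )

def systemd_escape(value):
    """Double '%' so systemd does not read it as a specifier inside a unit file."""
    return value.replace("%", "%%")

dsn = build_dsn("monitoring_user", "p@ss:w/rd", "localhost", 5432, "your_database")
print(dsn)
print(f'Environment="DATA_SOURCE_NAME={systemd_escape(dsn)}"')
```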

4. Configure Prometheus to scrape `postgres_exporter`:

Update your prometheus.yml to include scrape jobs for the PostgreSQL exporter running on port 9187 on your database nodes.

# Append these jobs to the existing scrape_configs list;
# do not add a second scrape_configs key to the file.
scrape_configs:
  # ... (Prometheus, WordPress webservers, Node Exporter jobs) ...

  # Scrape PostgreSQL primary node metrics
  - job_name: 'postgresql_primary_metrics'
    static_configs:
      - targets:
          - '192.0.2.20:9187' # Replace with your Linode IP for PostgreSQL Primary

  # Scrape PostgreSQL replica nodes metrics
  - job_name: 'postgresql_replicas_metrics'
    static_configs:
      - targets:
          - '192.0.2.21:9187' # Replace with your Linode IP for PostgreSQL Replica 1
          - '192.0.2.22:9187' # Replace with your Linode IP for PostgreSQL Replica 2
          # Add more replicas as needed

Reload Prometheus configuration again.

Application-Level Metrics with `php-fpm_exporter` and Custom WordPress Checks

While Node Exporter gives us system-level insights, and `postgres_exporter` database insights, we need metrics directly from the PHP-FPM process serving WordPress and potentially custom checks for WordPress itself.

1. Exposing PHP-FPM Metrics:

PHP-FPM exposes its status via a status page. We can use `php-fpm_exporter` to scrape this and expose it in Prometheus format. First, ensure your PHP-FPM pool configuration (e.g., /etc/php/8.1/fpm/pool.d/www.conf) has the status page enabled:

pm.status_path = /status
ping.path = /ping
ping.response = pong

Restart PHP-FPM:

sudo systemctl restart php8.1-fpm

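Install `php-fpm_exporter` (similar process to Node Exporter/postgres_exporter) and configure it to point to your PHP-FPM status page. Several exporters go by this name and their flags differ, so check the --help output of the one you install: the flag names in the snippet below are placeholders. The hipages/php-fpm_exporter project, for instance, is started as `php-fpm_exporter server` and takes the status address through --phpfpm.scrape-uri.

# Example systemd service snippet for php-fpm_exporter
# (flag names are illustrative; substitute those of your exporter)
[Service]
User=php-fpm_exporter
Group=php-fpm_exporter
Type=simple
ExecStart=/usr/local/bin/php-fpm_exporter --web.listen-address=":9124" --php-fpm.status-url="http://127.0.0.1/status" --php-fpm.ping-url="http://127.0.0.1/ping"
# Ensure the web server (Nginx/Apache) is configured to proxy these URLs to the PHP-FPM socket or a local listener.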

Add a scrape job for `php-fpm_exporter` (e.g., on port 9124) to your prometheus.yml.
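To see what the exporter works with, the PHP-FPM status page can also return JSON when requested as /status?json. The sketch below reads such a payload and derives worker utilization and queue pressure. The sample values are made up, and the helper name `pool_utilization` is hypothetical; the key names follow the PHP-FPM status output.

```python
import json

def pool_utilization(status_json):
    """Summarize a PHP-FPM status payload into a few saturation signals."""
    status = json.loads(status_json)
    active = status["active processes"]
    total = status["total processes"]
    return {
        "pool": status["pool"],
        "utilization": active / total if total else 0.0,
        "listen_queue": status["listen queue"],
        # Cumulative since PHP-FPM started, so watch its growth, not its value.
        "max_children_reached": status["max children reached"],
        # Requests waiting in the queue mean no worker was free to take them.
        "queueing": status["listen queue"] > 0,
    }

# Made-up example of the JSON returned by /status?json
sample_status = json.dumps({
    "pool": "www",
    "process manager": "dynamic",
    "accepted conn": 12073,
    "listen queue": 3,
    "max listen queue": 9,
    "idle processes": 2,
    "active processes": 8,
    "total processes": 10,
    "max children reached": 1,
    "slow requests": 0,
})

summary = pool_utilization(sample_status)
print(summary)
```

A utilization that stays close to 1.0 together with a non-empty listen queue usually suggests that pm.max_children is too low for the traffic, or that requests are slow.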

2. Custom WordPress Health Checks:

For deeper WordPress insights, consider a simple PHP script that checks critical functionality and prints the results in the Prometheus text format, so that Prometheus can scrape it like any other target.

Create a script like /var/www/html/healthcheck.php:

<?php
// Prometheus scrapes this endpoint, so serve the text exposition format.
header('Content-Type: text/plain; version=0.0.4; charset=utf-8');

$checks = [];
$startTime = microtime(true);

// Check database connection (basic)
try {
    // Placeholder credentials; load real ones from configuration, not source code.
    $db = new PDO('pgsql:host=localhost;dbname=your_database', 'monitoring_user', 'your_strong_password');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->query('SELECT 1'); // Simple round-trip query
    $checks['database_connection'] = true;
} catch (PDOException $e) {
    // Log the detail server-side instead of exposing it in the response.
    error_log('healthcheck: database check failed: ' . $e->getMessage());
    $checks['database_connection'] = false;
}

// Check that WordPress core is present. ABSPATH is only defined once WordPress
// is loaded, so test relative to this script (assumed to sit in the WordPress root).
$checks['wp_core_files'] = file_exists(__DIR__ . '/wp-load.php');

// Add more checks: external API calls, cron job status, etc.

$duration = microtime(true) - $startTime;
$healthy = !in_array(false, $checks, true);

// Expose metrics in Prometheus text format (simplified)
// For a full implementation, consider a dedicated exporter or library
echo "# HELP wordpress_healthcheck_status Overall health status (1=ok, 0=error)\n";
echo "# TYPE wordpress_healthcheck_status gauge\n";
echo 'wordpress_healthcheck_status ' . ($healthy ? 1 : 0) . "\n";

echo "# HELP wordpress_healthcheck_duration_seconds Health check duration in seconds\n";
echo "# TYPE wordpress_healthcheck_duration_seconds gauge\n";
echo 'wordpress_healthcheck_duration_seconds ' . sprintf('%.6F', $duration) . "\n";

// One gauge per individual check
foreach ($checks as $name => $ok) {
    $metric = 'wordpress_healthcheck_' . $name . '_status';
    echo "# HELP {$metric} Status of {$name} (1=ok, 0=error)\n";
    echo "# TYPE {$metric} gauge\n";
    echo $metric . ' ' . ($ok ? 1 : 0) . "\n";
}

Configure your web server (Nginx/Apache) to pass this script to PHP-FPM and restrict access to it, for example to the IP address of your Prometheus server. Because the script already emits the Prometheus text format, add a scrape job that targets the web servers with metrics_path set to /healthcheck.php. Prometheus’s `blackbox_exporter` is still useful alongside it for probing the public site from outside, but an HTTP probe only reports whether the request succeeded; it does not pass on the metrics in the response body.
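A quick way to sanity-check the script’s output before wiring it into Prometheus is to parse it yourself. The sketch below is a deliberately small parser that handles only label-free samples like the ones the script prints; it is not a full implementation of the exposition format, and the sample text is a made-up response.

```python
def parse_metrics(text):
    """Parse label-free Prometheus text exposition lines into {name: value}."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, value = line.split()[:2]
        metrics[name] = float(value)
    return metrics

# Made-up response from /healthcheck.php with a failing database check.
sample_output = """\
# HELP wordpress_healthcheck_status Overall health status (1=ok, 0=error)
# TYPE wordpress_healthcheck_status gauge
wordpress_healthcheck_status 0
# HELP wordpress_healthcheck_duration_seconds Health check duration in seconds
# TYPE wordpress_healthcheck_duration_seconds gauge
wordpress_healthcheck_duration_seconds 0.004213
wordpress_healthcheck_database_connection_status 0
wordpress_healthcheck_wp_core_files_status 1
"""

metrics = parse_metrics(sample_output)
failed = sorted(name for name, value in metrics.items()
                if name.endswith("_status") and value == 0)
print(failed)
```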

Alerting with Alertmanager

Collecting metrics is only half the battle. Alerting ensures you’re notified *before* users are impacted. Prometheus integrates with Alertmanager for sophisticated alert routing, grouping, and silencing.

1. Configure Prometheus Alerting Rules:

Define alerting rules in a separate file (e.g., alerts.yml) and include it in your prometheus.yml.

groups:
- name: wordpress_alerts
  rules:
  - alert: HighCpuUsage
    expr: avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) < 0.2
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
      description: "CPU usage on {{ $labels.instance }} is above 80% for the last 10 minutes."

  - alert: HighMemoryUsage
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 90
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High Memory Usage on {{ $labels.instance }}"
      description: "Memory usage on {{ $labels.instance }} is above 90% for the last 5 minutes."

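  # Assumes a web server exporter exposes http_requests_total with a "code" label;
  # Node Exporter, postgres_exporter and the health check script do not provide it.
  - alert: HighHttp5xxErrorRate
    expr: sum(rate(http_requests_total{code=~"5.."}[5m])) by (instance) / sum(rate(http_requests_total[5m])) by (instance) * 100 > 5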
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High HTTP 5xx error rate on {{ $labels.instance }}"
      description: "More than 5% of requests to {{ $labels.instance }} are resulting in 5xx errors."

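  - alert: PostgreSQLHighConnectionCount
    # pg_stat_activity_count is split by database and state, so sum it per
    # instance before comparing it with the per-instance max_connections setting.
    expr: sum by (instance) (pg_stat_activity_count) > on (instance) (pg_settings_max_connections * 0.9)
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High PostgreSQL connection count on {{ $labels.instance }}"
      description: "PostgreSQL on {{ $labels.instance }} is nearing its max_connections limit ({{ $value }} connections in use)."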

  - alert: PostgreSQLReplicationLag
    expr: pg_replication_lag_seconds > 60 # Gauge in seconds, so compare it directly (no rate())
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "PostgreSQL replication lag detected on {{ $labels.instance }}"
      description: "PostgreSQL replica {{ $labels.instance }} is lagging by more than 60 seconds."

  - alert: WordPressHealthCheckFailed
    expr: wordpress_healthcheck_status == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "WordPress health check failed on {{ $labels.instance }}"
      description: "The WordPress application on {{ $labels.instance }} is reporting a health check failure."
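The HighCpuUsage expression can look backwards at first, since it alerts on a low idle rate rather than a high busy rate. The sketch below reproduces the arithmetic for a hypothetical two-core instance: the increase of each core’s idle counter over the window, divided by the window length, is the idle fraction that rate() approximates. The sample counter values and function names are assumptions, and the `for:` clause is not modeled.

```python
def idle_fraction(idle_start, idle_end, window_seconds):
    """Idle share of one core, from two node_cpu_seconds_total{mode="idle"} samples."""
    return (idle_end - idle_start) / window_seconds

def high_cpu(per_core_samples, window_seconds=300, idle_threshold=0.2):
    """Return (alert condition, usage fraction), averaging idle time across cores."""
    fractions = [idle_fraction(start, end, window_seconds)
                 for start, end in per_core_samples]
    avg_idle = sum(fractions) / len(fractions)
    return avg_idle < idle_threshold, 1.0 - avg_idle

# Hypothetical idle counters (seconds) at the start and end of a 5-minute window.
samples = [(1000.0, 1030.0),   # core 0 was idle for 30 s of 300 s
           (2000.0, 2060.0)]   # core 1 was idle for 60 s of 300 s

condition, usage = high_cpu(samples)
print(condition, round(usage, 2))
```

An average idle fraction below 0.2 is the same statement as CPU usage above 80%, which matches the alert description.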

Ensure your prometheus.yml includes:

rule_files:
  - "alerts.yml"

2. Configure Alertmanager:

Set up Alertmanager to receive alerts from Prometheus and route them to your desired notification channels (Slack, PagerDuty, email, etc.). The configuration involves defining receivers and routes.

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver' # Default receiver if no specific route matches

receivers:
- name: 'default-receiver'
  slack_configs:
  - api_url: 'YOUR_SLACK_WEBHOOK_URL'
    channel: '#alerts'

# Example for PagerDuty
# - name: 'pagerduty-critical'
#   pagerduty_configs:
#   - service_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'

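inhibit_rules:
  # A firing critical alert (source) mutes the matching warning alert (target),
  # not the other way around.
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'cluster', 'service']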

Configure Prometheus to send alerts to Alertmanager (typically running on port 9093):

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 'localhost:9093' # Address of your Alertmanager instance
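Before relying on this routing, it helps to push a synthetic alert through Alertmanager and check that it reaches the receiver. The sketch below only builds the JSON body for Alertmanager’s v2 endpoint (POST /api/v2/alerts); the alert name TestAlert, the label values, and the helper name are made up for illustration.

```python
import json
from datetime import datetime, timedelta, timezone

def build_test_alert(alertname, instance, severity, summary, now=None, ttl_minutes=5):
    """Build the list-of-alerts body that Alertmanager's /api/v2/alerts accepts."""
    now = now or datetime.now(timezone.utc)
    return [{
        "labels": {
            "alertname": alertname,
            "instance": instance,
            "severity": severity,
        },
        "annotations": {"summary": summary},
        "startsAt": now.isoformat(),
        # Alertmanager treats the alert as resolved once endsAt has passed.
        "endsAt": (now + timedelta(minutes=ttl_minutes)).isoformat(),
    }]

fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # fixed time for a repeatable example
payload = build_test_alert("TestAlert", "192.0.2.10:9100", "warning",
                           "Synthetic alert to verify routing", now=fixed_now)
body = json.dumps(payload).encode("utf-8")
print(body.decode("utf-8"))
```

The body could then be sent with `urllib.request.Request("http://localhost:9093/api/v2/alerts", data=body, headers={"Content-Type": "application/json"}, method="POST")`, after which the alert should show up in the Alertmanager UI and in the configured channel.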

Visualization with Grafana

Dashboards are essential for visualizing trends, understanding historical performance, and quickly diagnosing issues. Grafana is the go-to tool for this, integrating seamlessly with Prometheus.

1. Set up Grafana:

Install Grafana on a separate server or on your Prometheus server. Add Prometheus as a data source in Grafana, pointing to your Prometheus instance’s URL (e.g., http://localhost:9090).

2. Import Pre-built Dashboards:

Many pre-built dashboards are available on Grafana.com for Node Exporter, PostgreSQL Exporter, and general system monitoring. Search for “Node Exporter Full” and “PostgreSQL” dashboards. Import these via the Grafana UI (Dashboards -> New -> Import in recent versions; the menu location varies between releases).

3. Create Custom Dashboards:

For WordPress-specific metrics or combined views, create custom dashboards. Key panels to include:

  • Web Server Overview: Request rate, error rate (5xx, 4xx), response time percentiles (P95, P99).
  • PHP-FPM Performance: Active processes, idle processes, request duration, slow requests.
  • PostgreSQL Cluster Health: Replication lag, active connections, query latency, cache hit ratio, CPU/Memory usage per node.
  • System Resources: CPU, Memory, Disk I/O, Network traffic per instance.
  • WordPress Health Check Status: A simple gauge showing the overall health status from your custom script.

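Use PromQL queries to populate these panels. For example, assuming your web tier exports a request-duration histogram named http_server_requests_seconds (Node Exporter and postgres_exporter do not provide one), the P95 response time for your web servers is: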

histogram_quantile(0.95, sum(rate(http_server_requests_seconds_bucket[5m])) by (le, instance))

And to show PostgreSQL replication lag, which is a gauge measured in seconds and therefore should not be wrapped in rate():

max by (instance) (pg_replication_lag_seconds)
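To show what histogram_quantile does with bucket data, here is a simplified Python sketch of the same linear interpolation over cumulative buckets. The bucket boundaries and counts are made up, and the function is an illustration of the idea rather than a reimplementation of every edge case Prometheus handles.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate a quantile from (upper_bound, cumulative_count) buckets.

    The list must contain at least one finite bucket and the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least one finite bucket plus the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # The quantile lies beyond the last finite bucket:
                # the best available answer is that bucket's upper bound.
                return buckets[-2][0]
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count

# Hypothetical request-duration buckets in seconds (cumulative counts).
buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
p50 = histogram_quantile(0.50, buckets)
p95 = histogram_quantile(0.95, buckets)
p99 = histogram_quantile(0.99, buckets)
print(p50, p95, p99)
```

The P99 result illustrates a practical limit: once the quantile falls into the +Inf bucket the estimate is capped at the highest finite boundary, so bucket boundaries should extend past the latencies you care about.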

Advanced Considerations and Best Practices

1. High Availability for Monitoring: Ensure your Prometheus and Alertmanager instances are themselves highly available. Consider running multiple Prometheus instances behind a load balancer or using Thanos/Cortex for long-term storage and global view.

2. Service Discovery: For dynamic environments, use Prometheus’s service discovery mechanisms (e.g., linode_sd_configs, which queries the Linode API, or kubernetes_sd_configs) instead of static configurations.

3. Log Aggregation: Complement metrics with centralized logging (e.g., ELK stack, Loki). Metrics tell you *what* is happening, logs tell you *why*.

4. Performance Tuning: Use the collected metrics to identify bottlenecks. For PostgreSQL, this might involve tuning shared_buffers, work_mem, or analyzing slow queries with `pg_stat_statements`. For WordPress, optimize PHP configuration, caching plugins, and consider a CDN.

5. Security: Secure your monitoring endpoints. Use firewalls to restrict access to Prometheus, Alertmanager, and exporters. Consider authentication for sensitive metrics.

6. Regular Review: Periodically review your alerts and dashboards. Are they still relevant? Are there too many false positives? Adjust thresholds and rules as your application evolves.
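As a small step toward the service discovery mentioned in item 2, Prometheus can also read targets from a JSON file through file_sd_configs, which a script can regenerate whenever instances change. The sketch below builds that file’s structure from a hypothetical inventory; the `role` label and the function name are assumptions, not something Prometheus requires.

```python
import json

def file_sd_groups(hosts_by_role, port=9100):
    """Build the list of target groups that a file_sd_configs JSON file contains."""
    groups = []
    for role in sorted(hosts_by_role):
        groups.append({
            "targets": [f"{host}:{port}" for host in hosts_by_role[role]],
            "labels": {"role": role},
        })
    return groups

# Hypothetical inventory, for example assembled from the Linode API or a CMDB.
inventory = {
    "wordpress_web": ["192.0.2.10", "192.0.2.11"],
    "postgresql_primary": ["192.0.2.20"],
}

groups = file_sd_groups(inventory)
targets_json = json.dumps(groups, indent=2)
print(targets_json)
```

Written to a file that a file_sd_configs entry points at, this output is picked up by Prometheus without a reload when the file changes.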

Copyright © 2026 · Vinay Vengala