Server Monitoring Best Practices: Keeping Your WooCommerce App and PostgreSQL Clusters Alive on Linode

Proactive PostgreSQL Monitoring with `pg_stat_statements` and `pg_wait_sampling`

Maintaining the health and performance of your PostgreSQL clusters, especially those powering critical WooCommerce applications, demands more than just basic CPU and memory checks. We need deep visibility into query performance and wait events. The `pg_stat_statements` extension is indispensable for identifying slow or resource-intensive queries, while `pg_wait_sampling` provides crucial insights into what your database is *actually* waiting on.

First, ensure `pg_stat_statements` is enabled. This typically involves adding it to `shared_preload_libraries` in your `postgresql.conf` and then creating the extension in each database you want to monitor.

Enabling and Configuring `pg_stat_statements`

Edit your `postgresql.conf` (usually located in `/etc/postgresql/[version]/main/postgresql.conf` or similar on Linode):

shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
pg_stat_statements.track_utility = off
pg_stat_statements.save = on

After modifying `postgresql.conf`, you must restart your PostgreSQL service:

sudo systemctl restart postgresql

Then, connect to your target database and create the extension:

-- Connect to your database, e.g., 'woocommerce_db'
\c woocommerce_db

-- Create the extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Now, you can query `pg_stat_statements` to find the top queries by execution time, calls, or other metrics. This is invaluable for pinpointing performance bottlenecks within your WooCommerce application’s database interactions.

SELECT
    calls,
    total_exec_time,
    mean_exec_time,
    rows,
    substring(query, 1, 60) AS query_snippet
FROM
    pg_stat_statements
ORDER BY
    total_exec_time DESC
LIMIT 20;

Leveraging `pg_wait_sampling` for Deeper Insights

`pg_wait_sampling` is a more advanced extension that samples wait events occurring within PostgreSQL. This helps diagnose issues that `pg_stat_statements` might miss, such as contention on locks, I/O waits, or network latency.

Installation is similar: add it to `shared_preload_libraries` and create the extension. For `pg_wait_sampling`, you’ll also need to configure its sampling rate.

Edit `postgresql.conf`:

shared_preload_libraries = 'pg_stat_statements,pg_wait_sampling' # Add pg_wait_sampling
pg_wait_sampling.sample_rate = 0.1  # Sample 10% of events
pg_wait_sampling.include_superuser = off
pg_wait_sampling.include_background = off

Restart PostgreSQL after these changes.

sudo systemctl restart postgresql

Create the extension in your database:

\c woocommerce_db
CREATE EXTENSION IF NOT EXISTS pg_wait_sampling;

Querying `pg_wait_sampling` reveals the most common wait events. This is crucial for understanding why queries might be slow even if they aren’t the absolute longest-running ones in `pg_stat_statements`.

SELECT
    event,
    count(*) AS occurrences
FROM
    pg_wait_sampling_profile()
GROUP BY
    event
ORDER BY
    occurrences DESC
LIMIT 20;

Common wait events to watch for include `IO_READ`, `IO_WRITE` (indicating disk bottlenecks), `Lock` (contention on tables or rows), `ClientRead` (waiting for client to receive data), and `LWLock` (internal PostgreSQL lock contention).

WooCommerce Application Monitoring with Prometheus and Node Exporter

For your WooCommerce application servers, a robust monitoring stack is essential. Prometheus, coupled with `node_exporter`, provides a powerful, pull-based metrics collection system. This allows us to track system-level metrics and application-specific performance indicators.

Setting up Prometheus and Node Exporter on Linode

First, install `node_exporter` on each of your WooCommerce application servers. This can be done by downloading the latest release from the official Prometheus GitHub repository.

# On each app server:
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
rm -rf node_exporter-1.7.0.linux-amd64*

Create a systemd service file for `node_exporter` to ensure it runs as a service and restarts on boot.

# /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=nobody
Group=nogroup
Type=simple
ExecStart=/usr/local/bin/node_exporter \
    --web.listen-address=":9100" \
    --collector.textfile.directory="/var/lib/node_exporter/textfile-collector"

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter

Next, configure your Prometheus server to scrape metrics from these `node_exporter` instances. Edit your `prometheus.yml` configuration file.

# prometheus.yml
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

scrape_configs:
  # Scrape Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Scrape PostgreSQL exporter (assuming you have one running)
  # - job_name: 'postgres'
  #   static_configs:
  #     - targets: ['your_postgres_exporter_host:9187'] # Example

  # Scrape WooCommerce App Servers
  - job_name: 'woocommerce_app'
    static_configs:
      - targets:
          - 'app_server_1_ip:9100'
          - 'app_server_2_ip:9100'
          # Add all your WooCommerce app server IPs here
    # You can also use service discovery for dynamic environments

Restart your Prometheus server after updating the configuration.

Custom WooCommerce Metrics with Textfile Collector

To gain deeper insights into your WooCommerce application’s performance, you can leverage `node_exporter`’s textfile collector. This allows you to write custom metrics to files in a designated directory, which `node_exporter` will then expose.

Create the directory specified in the `node_exporter` service file:

sudo mkdir -p /var/lib/node_exporter/textfile-collector
sudo chown nobody:nogroup /var/lib/node_exporter/textfile-collector

Now, create a script that generates your custom metrics. For example, let’s track the number of pending orders or the average response time of a critical API endpoint.

#!/usr/bin/env python3
import os
import time
import subprocess

# Assume you have a way to query your WooCommerce DB or API
# For demonstration, we'll use dummy values.

def get_pending_orders():
    # Replace with actual logic to query WooCommerce DB/API
    # Example: SELECT COUNT(*) FROM wp_posts WHERE post_type = 'shop_order' AND post_status IN ('wc-pending', 'wc-processing');
    return 42

def get_api_response_time(url="https://your-woocommerce-app.com/wp-json/wc/v3/products"):
    # Replace with actual logic to measure API response time
    # Example using curl
    try:
        start_time = time.time()
        # Use a specific endpoint that's critical for your app
        subprocess.run(['curl', '-s', '-o', '/dev/null', '-w', '%{http_code}', url], check=True)
        end_time = time.time()
        return (end_time - start_time) * 1000 # in milliseconds
    except subprocess.CalledProcessError:
        return -1 # Indicate an error

def write_metrics():
    pending_orders = get_pending_orders()
    api_response_ms = get_api_response_time()

    metrics_output = f"""
# HELP wc_pending_orders Number of orders in pending or processing status.
# TYPE wc_pending_orders gauge
wc_pending_orders {pending_orders}

# HELP wc_api_response_time_ms Average response time for a critical API endpoint in milliseconds.
# TYPE wc_api_response_time_ms gauge
wc_api_response_time_ms {api_response_ms if api_response_ms != -1 else 'NaN'}
"""
    # Write to a file in the textfile collector directory
    with open("/var/lib/node_exporter/textfile-collector/woocommerce_custom.prom", "w") as f:
        f.write(metrics_output)

if __name__ == "__main__":
    write_metrics()

Schedule this script to run periodically (e.g., every minute) using `cron`.

# Edit crontab for the user that can write to the directory (e.g., root or a dedicated user)
sudo crontab -e

# Add the following line to run the script every minute:
* * * * * /usr/bin/python3 /path/to/your/metrics_script.py

After the script runs, Prometheus will scrape these new metrics from `node_exporter` on port 9100.

HAProxy Load Balancer Monitoring and Alerting

For high availability and scalability, HAProxy is often used to distribute traffic to your WooCommerce application servers. Monitoring HAProxy itself is critical to ensure traffic is being routed correctly and to detect backend server failures.

Enabling HAProxy Stats and Prometheus Exporter

HAProxy provides a built-in stats page. For Prometheus integration, we’ll use the `haproxy_exporter`.

First, enable the HAProxy stats socket in your `haproxy.cfg`:

# haproxy.cfg
listen stats
    bind *:8404
    mode http
    stats enable
    stats uri /haproxy?stats
    stats refresh 10s
    stats auth admin:YourSecurePassword # Change this!

Restart HAProxy:

sudo systemctl restart haproxy

Now, install `haproxy_exporter` on a separate server or on your Prometheus server. Download the binary from its GitHub releases page.

# On the server where haproxy_exporter will run:
wget https://github.com/prometheus/haproxy_exporter/releases/download/v0.15.0/haproxy_exporter-0.15.0.linux-amd64.tar.gz
tar xvfz haproxy_exporter-0.15.0.linux-amd64.tar.gz
sudo mv haproxy_exporter-0.15.0.linux-amd64/haproxy_exporter /usr/local/bin/
rm -rf haproxy_exporter-0.15.0.linux-amd64*

Create a systemd service for `haproxy_exporter`:

# /etc/systemd/system/haproxy_exporter.service
[Unit]
Description=HAProxy Exporter
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=nobody
Group=nogroup
ExecStart=/usr/local/bin/haproxy_exporter \
    --web.listen-address=":9101" \
    --haproxy.scrape-interval="5s" \
    --haproxy.stats-url="http://localhost:8404/haproxy?stats;csv" \
    --haproxy.username="admin" \
    --haproxy.password="YourSecurePassword" # Match the password set in haproxy.cfg

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable haproxy_exporter
sudo systemctl start haproxy_exporter
sudo systemctl status haproxy_exporter

Finally, configure Prometheus to scrape the `haproxy_exporter`.

# prometheus.yml (add this to your existing config)
  - job_name: 'haproxy'
    static_configs:
      - targets: ['your_haproxy_exporter_ip:9101'] # IP of the server running haproxy_exporter

Restart Prometheus.

HAProxy Alerting Rules

Define alerting rules in Prometheus to notify you of critical HAProxy events. Create a new file, e.g., `haproxy_alerts.yml`.

# haproxy_alerts.yml
groups:
- name: haproxy_alerts
  rules:
  - alert: HAProxyBackendDown
    expr: haproxy_backend_up == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "HAProxy backend {{ $labels.backend }} on {{ $labels.instance }} is down."
      description: "The HAProxy backend {{ $labels.backend }} has been down for more than 5 minutes."

  - alert: HAProxyFrontendDown
    expr: haproxy_frontend_up == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "HAProxy frontend {{ $labels.frontend }} on {{ $labels.instance }} is down."
      description: "The HAProxy frontend {{ $labels.frontend }} has been down for more than 5 minutes."

  - alert: HAProxyHighErrorRate
    expr: sum(rate(haproxy_server_errors_total{job="haproxy"}[5m])) by (instance, backend, server) > 10
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High error rate on HAProxy backend {{ $labels.backend }} server {{ $labels.server }}."
      description: "The HAProxy backend {{ $labels.backend }} server {{ $labels.server }} on {{ $labels.instance }} is experiencing a high error rate (more than 10 errors per 5 minutes)."

Ensure Prometheus is configured to load these alert rules by adding `alerting` and `rule_files` sections to your `prometheus.yml`.

# prometheus.yml
rule_files:
  - "haproxy_alerts.yml"
  # - "other_alerts.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - 'alertmanager_host:9093' # Your Alertmanager instance

Centralized Logging with ELK Stack (Elasticsearch, Logstash, Kibana)

Aggregating logs from your WooCommerce application servers, PostgreSQL instances, and HAProxy is crucial for debugging and auditing. The ELK stack (or its modern successor, the Elastic Stack) provides a powerful solution.

Log Shipping with Filebeat

Filebeat is a lightweight shipper that you install on your servers to forward log files to Logstash or directly to Elasticsearch.

Install Filebeat on your WooCommerce app servers, PostgreSQL servers, and HAProxy servers.

# On each server:
# Download and install Filebeat (example for Debian/Ubuntu)
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.11.1-amd64.deb
sudo dpkg -i filebeat-8.11.1-amd64.deb
rm filebeat-8.11.1-amd64.deb

Configure Filebeat to collect relevant logs. Edit `filebeat.yml`.

# filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/apache2/access.log
    - /var/log/apache2/error.log
    - /var/log/php/error.log # Adjust path for your PHP error log
  fields_under_root: true
  fields:
    app_type: woocommerce_app

- type: log
  enabled: true
  paths:
    - /var/log/postgresql/postgresql-*.log # Adjust for your PostgreSQL log path
  fields_under_root: true
  fields:
    app_type: postgresql

- type: log
  enabled: true
  paths:
    - /var/log/haproxy.log # Adjust for your HAProxy log path
  fields_under_root: true
  fields:
    app_type: haproxy

# Configure output to Logstash or Elasticsearch
output.logstash:
  hosts: ["your_logstash_host:5044"] # Or directly to Elasticsearch: ["your_elasticsearch_host:9200"]

# If sending directly to Elasticsearch, you might need authentication
# output.elasticsearch:
#   hosts: ["your_elasticsearch_host:9200"]
#   username: "elastic"
#   password: "changeme"

Enable and start the Filebeat service:

sudo systemctl enable filebeat
sudo systemctl start filebeat
sudo systemctl status filebeat

Logstash Configuration for Processing

On your Logstash server, configure pipelines to parse and enrich logs. Create a pipeline configuration file, e.g., `/etc/logstash/conf.d/02-woocommerce-pipeline.conf`.

# /etc/logstash/conf.d/02-woocommerce-pipeline.conf

input {
  beats {
    port => 5044
  }
}

filter {
  if [app_type] == "woocommerce_app" {
    # Example: Parse Apache access logs
    if "access.log" in [log.file.path] {
      grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
      date {
        match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
      geoip {
        source => "client.address"
      }
      useragent {
        source => "agent.original"
      }
    }
    # Example: Parse PHP error logs (adjust grok pattern as needed)
    if "php/error.log" in [log.file.path] {
      grok {
        match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\] %{LOGLEVEL:loglevel} \[%{DATA:thread}\] %{JAVACLASS:class}: %{GREEDYDATA:error_message}" }
      }
      date {
        match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
      }
    }
  }

  if [app_type] == "postgresql" {
    # Example: Parse PostgreSQL logs (adjust grok pattern)
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{NUMBER:process_id}\] %{DATA:user_at_host} %{DATA:database} %{LOGLEVEL:loglevel} %{GREEDYDATA:log_message}" }
    }
    date {
      match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
    }
  }

  if [app_type] == "haproxy" {
    # Example: Parse HAProxy logs
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:frontend_name}:%{WORD:backend_name}/%{WORD:server_name} %{NUMBER:http_status_code} %{NUMBER:bytes_read} %{NUMBER:elapsed_time_ms} %{NUMBER:bytes_written} %{WORD:http_method} %{URIPATHPARAM:url} %{DATA:client_ip} %{DATA:client_port} %{DATA:server_ip} %{DATA:server_port}" }
    }
    date {
      match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
    }
  }
}

output {
  elasticsearch {
    hosts => ["your_elasticsearch_host:9200"]
    index => "%{app_type}-%{+YYYY.MM.dd}"
    # user => "elastic"
    # password => "changeme"
  }
}

Restart Logstash after applying changes.

Kibana Dashboards for Visualization

In Kibana, create index patterns for your logs (e.g., `woocommerce_app-*`, `postgresql-*`, `haproxy-*`). Then, build dashboards to visualize key metrics:

WooCommerce App: HTTP status code distribution, top requested URLs, error rates (PHP errors, 5xx responses), API response times, user agent analysis, geographic distribution of users.
PostgreSQL: Log levels (errors, warnings), slow query logs (if configured), connection attempts, database-specific errors.
HAProxy: Backend server status (up/down), request rates, error rates, response times, client IP analysis.

Leverage pre-built dashboards or create custom ones tailored to your specific needs. This centralized view is invaluable for rapid troubleshooting and performance analysis.

System Health Checks and Automated Recovery Scripts

Beyond metrics and logs, periodic health checks and automated recovery mechanisms are vital. This involves scripting checks for essential services and implementing basic self-healing capabilities.

Basic Health Check Script

Create a script that runs on each server (app, DB, load balancer) to check the status of critical processes and network connectivity.

#!/bin/bash

# --- Configuration ---
LOG_FILE="/var/log/health_check.log"
HEALTHCHECK_URL="https://your-woocommerce-app.com/wp-cron.php" # Example health check endpoint
PG_HOST="localhost"
PG_PORT="5432"
PG_USER="monitoring_user" # Create a read-only user for monitoring
PG_DB="woocommerce_db"
REQUIRED_SERVICES=("nginx" "php-fpm" "postgresql" "haproxy") # Adjust for your setup

# --- Functions ---
log_message() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | sudo tee -a $LOG_FILE
}

check_service() {
    local service_name=$1
    if systemctl is-active --quiet $service_name; then
        log_message "OK: Service '$service_name' is running."
        return 0
    else
        log_message "ERROR: Service '$service_name' is NOT running."
        return 1
    fi
}

check_network() {
    local url=$1
    if curl -s --head "$url" | grep "200 OK" > /dev/null; then
        log_message "OK: Network check to '$url' successful."
        return 0
    else
        log_message "ERROR: Network check to '$url' failed."
        return 1
    fi
}

check_postgres() {
    if PGPASSWORD=$(cat /etc/pg_monitoring_password); do
        if psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PG_DB -c '\q' > /dev/null 2>&1; then
            log_message "OK: PostgreSQL connection to '$PG_DB' successful."
            return 0
        else
            log_message "ERROR: PostgreSQL connection to '$PG_DB' failed."
            return 1
        fi
    else
        log_message "ERROR: Could not retrieve PostgreSQL password."
        return 1
    fi
}

# --- Main Execution ---
log_message "--- Starting Health Check ---"

# Check required services
for service in "${REQUIRED_SERVICES[@]}"; do
    check_service "$service"
done

# Perform specific checks based on server role (add logic here)
# Example for App Server:
if [[ $(hostname -s) == appserver* ]]; then
    check_network "$HEALTHCHECK_URL"
    # Add checks for PHP-FPM status, disk space, etc.
fi

# Example for DB Server:
if [[ $(hostname -s) == dbserver* ]]; then
    check_postgres
    # Add checks for disk space, replication status (if applicable)
fi

# Example for Load Balancer:
if [[ $(hostname -s) == lbserver* ]]; then
    # Check HAProxy stats endpoint or a dummy backend response
    check_network "http://localhost:8404/haproxy?stats;csv" # Example
fi

log_message "--- Health Check Finished ---"

# --- Automated Recovery (Optional) ---
# This section should be carefully implemented to avoid unintended consequences.
# Example: Restarting a failed service.
# if ! systemctl is-active --quiet nginx; then
#     log_message "Attempting to restart nginx..."
#     sudo systemctl restart nginx
#     sleep 5 # Give it time to start
#     check_service "nginx" # Re-check status
# fi

Securely store the PostgreSQL password (e.g., in `/etc/pg_monitoring_password` with restricted permissions).

# Create the password file
echo "your_monitoring_password" | sudo tee /etc/pg_monitoring_password
sudo chmod 600 /etc/pg_monitoring_password

Schedule this script using `cron` to run at regular intervals (e.g., every 5 minutes).

sudo crontab -e
# Add:
# */5 * * * * /path/to/your/health_check.sh

Alerting on Health Check Failures

You can use tools like `cronwatch` or simply parse the log file generated by the health check script. A more integrated approach is to have the script send metrics to Prometheus (e.g., using `node_exporter`’s textfile collector) and then set up alerts in Prometheus based on these health metrics.

For instance, the health check script could output metrics like:

# Inside the health check script, after checks:
echo "health_check_service_nginx_ok $( [ $(systemctl is-active --quiet nginx) = active ] & echo 1 || echo 0 )"
echo "health_check_postgres_ok $( check_postgres && echo 1 || echo 0 )"
# ... and so on for other checks

These metrics would be written to `/var/lib/node_exporter/textfile-collector/health_checks.prom`, and Prometheus alerts could be configured for when `health_check_service_nginx_ok == 0`, etc.