Server Monitoring Best Practices: Keeping Your WooCommerce App and PostgreSQL Clusters Alive on Linode
Proactive PostgreSQL Monitoring with `pg_stat_statements` and `pg_wait_sampling`
Maintaining the health and performance of your PostgreSQL clusters, especially those powering critical WooCommerce applications, demands more than just basic CPU and memory checks. We need deep visibility into query performance and wait events. The `pg_stat_statements` extension is indispensable for identifying slow or resource-intensive queries, while `pg_wait_sampling` provides crucial insights into what your database is *actually* waiting on.
First, ensure `pg_stat_statements` is enabled. This typically involves adding it to `shared_preload_libraries` in your `postgresql.conf` and then creating the extension in each database you want to monitor.
Enabling and Configuring `pg_stat_statements`
Edit your `postgresql.conf` (usually located in `/etc/postgresql/[version]/main/postgresql.conf` or similar on Linode):
shared_preload_libraries = 'pg_stat_statements' pg_stat_statements.max = 10000 pg_stat_statements.track = all pg_stat_statements.track_utility = off pg_stat_statements.save = on
After modifying `postgresql.conf`, you must restart your PostgreSQL service:
sudo systemctl restart postgresql
Then, connect to your target database and create the extension:
-- Connect to your database, e.g., 'woocommerce_db' \c woocommerce_db -- Create the extension CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
Now, you can query `pg_stat_statements` to find the top queries by execution time, calls, or other metrics. This is invaluable for pinpointing performance bottlenecks within your WooCommerce application’s database interactions.
SELECT
calls,
total_exec_time,
mean_exec_time,
rows,
substring(query, 1, 60) AS query_snippet
FROM
pg_stat_statements
ORDER BY
total_exec_time DESC
LIMIT 20;
Leveraging `pg_wait_sampling` for Deeper Insights
`pg_wait_sampling` is a more advanced extension that samples wait events occurring within PostgreSQL. This helps diagnose issues that `pg_stat_statements` might miss, such as contention on locks, I/O waits, or network latency.
Installation is similar: add it to `shared_preload_libraries` and create the extension. For `pg_wait_sampling`, you’ll also need to configure its sampling rate.
Edit `postgresql.conf`:
shared_preload_libraries = 'pg_stat_statements,pg_wait_sampling' # Add pg_wait_sampling pg_wait_sampling.sample_rate = 0.1 # Sample 10% of events pg_wait_sampling.include_superuser = off pg_wait_sampling.include_background = off
Restart PostgreSQL after these changes.
sudo systemctl restart postgresql
Create the extension in your database:
\c woocommerce_db CREATE EXTENSION IF NOT EXISTS pg_wait_sampling;
Querying `pg_wait_sampling` reveals the most common wait events. This is crucial for understanding why queries might be slow even if they aren’t the absolute longest-running ones in `pg_stat_statements`.
SELECT
event,
count(*) AS occurrences
FROM
pg_wait_sampling_profile()
GROUP BY
event
ORDER BY
occurrences DESC
LIMIT 20;
Common wait events to watch for include `IO_READ`, `IO_WRITE` (indicating disk bottlenecks), `Lock` (contention on tables or rows), `ClientRead` (waiting for client to receive data), and `LWLock` (internal PostgreSQL lock contention).
WooCommerce Application Monitoring with Prometheus and Node Exporter
For your WooCommerce application servers, a robust monitoring stack is essential. Prometheus, coupled with `node_exporter`, provides a powerful, pull-based metrics collection system. This allows us to track system-level metrics and application-specific performance indicators.
Setting up Prometheus and Node Exporter on Linode
First, install `node_exporter` on each of your WooCommerce application servers. This can be done by downloading the latest release from the official Prometheus GitHub repository.
# On each app server: wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/ rm -rf node_exporter-1.7.0.linux-amd64*
Create a systemd service file for `node_exporter` to ensure it runs as a service and restarts on boot.
# /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=nobody
Group=nogroup
Type=simple
ExecStart=/usr/local/bin/node_exporter \
--web.listen-address=":9100" \
--collector.textfile.directory="/var/lib/node_exporter/textfile-collector"
[Install]
WantedBy=multi-user.target
Enable and start the service:
sudo systemctl daemon-reload sudo systemctl enable node_exporter sudo systemctl start node_exporter sudo systemctl status node_exporter
Next, configure your Prometheus server to scrape metrics from these `node_exporter` instances. Edit your `prometheus.yml` configuration file.
# prometheus.yml
global:
scrape_interval: 15s # By default, scrape targets every 15 seconds.
scrape_configs:
# Scrape Prometheus itself
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
# Scrape PostgreSQL exporter (assuming you have one running)
# - job_name: 'postgres'
# static_configs:
# - targets: ['your_postgres_exporter_host:9187'] # Example
# Scrape WooCommerce App Servers
- job_name: 'woocommerce_app'
static_configs:
- targets:
- 'app_server_1_ip:9100'
- 'app_server_2_ip:9100'
# Add all your WooCommerce app server IPs here
# You can also use service discovery for dynamic environments
Restart your Prometheus server after updating the configuration.
Custom WooCommerce Metrics with Textfile Collector
To gain deeper insights into your WooCommerce application’s performance, you can leverage `node_exporter`’s textfile collector. This allows you to write custom metrics to files in a designated directory, which `node_exporter` will then expose.
Create the directory specified in the `node_exporter` service file:
sudo mkdir -p /var/lib/node_exporter/textfile-collector sudo chown nobody:nogroup /var/lib/node_exporter/textfile-collector
Now, create a script that generates your custom metrics. For example, let’s track the number of pending orders or the average response time of a critical API endpoint.
#!/usr/bin/env python3
import os
import time
import subprocess
# Assume you have a way to query your WooCommerce DB or API
# For demonstration, we'll use dummy values.
def get_pending_orders():
# Replace with actual logic to query WooCommerce DB/API
# Example: SELECT COUNT(*) FROM wp_posts WHERE post_type = 'shop_order' AND post_status IN ('wc-pending', 'wc-processing');
return 42
def get_api_response_time(url="https://your-woocommerce-app.com/wp-json/wc/v3/products"):
# Replace with actual logic to measure API response time
# Example using curl
try:
start_time = time.time()
# Use a specific endpoint that's critical for your app
subprocess.run(['curl', '-s', '-o', '/dev/null', '-w', '%{http_code}', url], check=True)
end_time = time.time()
return (end_time - start_time) * 1000 # in milliseconds
except subprocess.CalledProcessError:
return -1 # Indicate an error
def write_metrics():
pending_orders = get_pending_orders()
api_response_ms = get_api_response_time()
metrics_output = f"""
# HELP wc_pending_orders Number of orders in pending or processing status.
# TYPE wc_pending_orders gauge
wc_pending_orders {pending_orders}
# HELP wc_api_response_time_ms Average response time for a critical API endpoint in milliseconds.
# TYPE wc_api_response_time_ms gauge
wc_api_response_time_ms {api_response_ms if api_response_ms != -1 else 'NaN'}
"""
# Write to a file in the textfile collector directory
with open("/var/lib/node_exporter/textfile-collector/woocommerce_custom.prom", "w") as f:
f.write(metrics_output)
if __name__ == "__main__":
write_metrics()
Schedule this script to run periodically (e.g., every minute) using `cron`.
# Edit crontab for the user that can write to the directory (e.g., root or a dedicated user) sudo crontab -e # Add the following line to run the script every minute: * * * * * /usr/bin/python3 /path/to/your/metrics_script.py
After the script runs, Prometheus will scrape these new metrics from `node_exporter` on port 9100.
HAProxy Load Balancer Monitoring and Alerting
For high availability and scalability, HAProxy is often used to distribute traffic to your WooCommerce application servers. Monitoring HAProxy itself is critical to ensure traffic is being routed correctly and to detect backend server failures.
Enabling HAProxy Stats and Prometheus Exporter
HAProxy provides a built-in stats page. For Prometheus integration, we’ll use the `haproxy_exporter`.
First, enable the HAProxy stats socket in your `haproxy.cfg`:
# haproxy.cfg
listen stats
bind *:8404
mode http
stats enable
stats uri /haproxy?stats
stats refresh 10s
stats auth admin:YourSecurePassword # Change this!
Restart HAProxy:
sudo systemctl restart haproxy
Now, install `haproxy_exporter` on a separate server or on your Prometheus server. Download the binary from its GitHub releases page.
# On the server where haproxy_exporter will run: wget https://github.com/prometheus/haproxy_exporter/releases/download/v0.15.0/haproxy_exporter-0.15.0.linux-amd64.tar.gz tar xvfz haproxy_exporter-0.15.0.linux-amd64.tar.gz sudo mv haproxy_exporter-0.15.0.linux-amd64/haproxy_exporter /usr/local/bin/ rm -rf haproxy_exporter-0.15.0.linux-amd64*
Create a systemd service for `haproxy_exporter`:
# /etc/systemd/system/haproxy_exporter.service
[Unit]
Description=HAProxy Exporter
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
User=nobody
Group=nogroup
ExecStart=/usr/local/bin/haproxy_exporter \
--web.listen-address=":9101" \
--haproxy.scrape-interval="5s" \
--haproxy.stats-url="http://localhost:8404/haproxy?stats;csv" \
--haproxy.username="admin" \
--haproxy.password="YourSecurePassword" # Match the password set in haproxy.cfg
[Install]
WantedBy=multi-user.target
Enable and start the service:
sudo systemctl daemon-reload sudo systemctl enable haproxy_exporter sudo systemctl start haproxy_exporter sudo systemctl status haproxy_exporter
Finally, configure Prometheus to scrape the `haproxy_exporter`.
# prometheus.yml (add this to your existing config)
- job_name: 'haproxy'
static_configs:
- targets: ['your_haproxy_exporter_ip:9101'] # IP of the server running haproxy_exporter
Restart Prometheus.
HAProxy Alerting Rules
Define alerting rules in Prometheus to notify you of critical HAProxy events. Create a new file, e.g., `haproxy_alerts.yml`.
# haproxy_alerts.yml
groups:
- name: haproxy_alerts
rules:
- alert: HAProxyBackendDown
expr: haproxy_backend_up == 0
for: 5m
labels:
severity: critical
annotations:
summary: "HAProxy backend {{ $labels.backend }} on {{ $labels.instance }} is down."
description: "The HAProxy backend {{ $labels.backend }} has been down for more than 5 minutes."
- alert: HAProxyFrontendDown
expr: haproxy_frontend_up == 0
for: 5m
labels:
severity: critical
annotations:
summary: "HAProxy frontend {{ $labels.frontend }} on {{ $labels.instance }} is down."
description: "The HAProxy frontend {{ $labels.frontend }} has been down for more than 5 minutes."
- alert: HAProxyHighErrorRate
expr: sum(rate(haproxy_server_errors_total{job="haproxy"}[5m])) by (instance, backend, server) > 10
for: 10m
labels:
severity: warning
annotations:
summary: "High error rate on HAProxy backend {{ $labels.backend }} server {{ $labels.server }}."
description: "The HAProxy backend {{ $labels.backend }} server {{ $labels.server }} on {{ $labels.instance }} is experiencing a high error rate (more than 10 errors per 5 minutes)."
Ensure Prometheus is configured to load these alert rules by adding `alerting` and `rule_files` sections to your `prometheus.yml`.
# prometheus.yml
rule_files:
- "haproxy_alerts.yml"
# - "other_alerts.yml"
alerting:
alertmanagers:
- static_configs:
- targets:
- 'alertmanager_host:9093' # Your Alertmanager instance
Centralized Logging with ELK Stack (Elasticsearch, Logstash, Kibana)
Aggregating logs from your WooCommerce application servers, PostgreSQL instances, and HAProxy is crucial for debugging and auditing. The ELK stack (or its modern successor, the Elastic Stack) provides a powerful solution.
Log Shipping with Filebeat
Filebeat is a lightweight shipper that you install on your servers to forward log files to Logstash or directly to Elasticsearch.
Install Filebeat on your WooCommerce app servers, PostgreSQL servers, and HAProxy servers.
# On each server: # Download and install Filebeat (example for Debian/Ubuntu) curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.11.1-amd64.deb sudo dpkg -i filebeat-8.11.1-amd64.deb rm filebeat-8.11.1-amd64.deb
Configure Filebeat to collect relevant logs. Edit `filebeat.yml`.
# filebeat.yml
filebeat.inputs:
- type: log
enabled: true
paths:
- /var/log/apache2/access.log
- /var/log/apache2/error.log
- /var/log/php/error.log # Adjust path for your PHP error log
fields_under_root: true
fields:
app_type: woocommerce_app
- type: log
enabled: true
paths:
- /var/log/postgresql/postgresql-*.log # Adjust for your PostgreSQL log path
fields_under_root: true
fields:
app_type: postgresql
- type: log
enabled: true
paths:
- /var/log/haproxy.log # Adjust for your HAProxy log path
fields_under_root: true
fields:
app_type: haproxy
# Configure output to Logstash or Elasticsearch
output.logstash:
hosts: ["your_logstash_host:5044"] # Or directly to Elasticsearch: ["your_elasticsearch_host:9200"]
# If sending directly to Elasticsearch, you might need authentication
# output.elasticsearch:
# hosts: ["your_elasticsearch_host:9200"]
# username: "elastic"
# password: "changeme"
Enable and start the Filebeat service:
sudo systemctl enable filebeat sudo systemctl start filebeat sudo systemctl status filebeat
Logstash Configuration for Processing
On your Logstash server, configure pipelines to parse and enrich logs. Create a pipeline configuration file, e.g., `/etc/logstash/conf.d/02-woocommerce-pipeline.conf`.
# /etc/logstash/conf.d/02-woocommerce-pipeline.conf
input {
beats {
port => 5044
}
}
filter {
if [app_type] == "woocommerce_app" {
# Example: Parse Apache access logs
if "access.log" in [log.file.path] {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
geoip {
source => "client.address"
}
useragent {
source => "agent.original"
}
}
# Example: Parse PHP error logs (adjust grok pattern as needed)
if "php/error.log" in [log.file.path] {
grok {
match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\] %{LOGLEVEL:loglevel} \[%{DATA:thread}\] %{JAVACLASS:class}: %{GREEDYDATA:error_message}" }
}
date {
match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
}
}
}
if [app_type] == "postgresql" {
# Example: Parse PostgreSQL logs (adjust grok pattern)
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{NUMBER:process_id}\] %{DATA:user_at_host} %{DATA:database} %{LOGLEVEL:loglevel} %{GREEDYDATA:log_message}" }
}
date {
match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
}
}
if [app_type] == "haproxy" {
# Example: Parse HAProxy logs
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:frontend_name}:%{WORD:backend_name}/%{WORD:server_name} %{NUMBER:http_status_code} %{NUMBER:bytes_read} %{NUMBER:elapsed_time_ms} %{NUMBER:bytes_written} %{WORD:http_method} %{URIPATHPARAM:url} %{DATA:client_ip} %{DATA:client_port} %{DATA:server_ip} %{DATA:server_port}" }
}
date {
match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
}
}
}
output {
elasticsearch {
hosts => ["your_elasticsearch_host:9200"]
index => "%{app_type}-%{+YYYY.MM.dd}"
# user => "elastic"
# password => "changeme"
}
}
Restart Logstash after applying changes.
Kibana Dashboards for Visualization
In Kibana, create index patterns for your logs (e.g., `woocommerce_app-*`, `postgresql-*`, `haproxy-*`). Then, build dashboards to visualize key metrics:
- WooCommerce App: HTTP status code distribution, top requested URLs, error rates (PHP errors, 5xx responses), API response times, user agent analysis, geographic distribution of users.
- PostgreSQL: Log levels (errors, warnings), slow query logs (if configured), connection attempts, database-specific errors.
- HAProxy: Backend server status (up/down), request rates, error rates, response times, client IP analysis.
Leverage pre-built dashboards or create custom ones tailored to your specific needs. This centralized view is invaluable for rapid troubleshooting and performance analysis.
System Health Checks and Automated Recovery Scripts
Beyond metrics and logs, periodic health checks and automated recovery mechanisms are vital. This involves scripting checks for essential services and implementing basic self-healing capabilities.
Basic Health Check Script
Create a script that runs on each server (app, DB, load balancer) to check the status of critical processes and network connectivity.
#!/bin/bash
# --- Configuration ---
LOG_FILE="/var/log/health_check.log"
HEALTHCHECK_URL="https://your-woocommerce-app.com/wp-cron.php" # Example health check endpoint
PG_HOST="localhost"
PG_PORT="5432"
PG_USER="monitoring_user" # Create a read-only user for monitoring
PG_DB="woocommerce_db"
REQUIRED_SERVICES=("nginx" "php-fpm" "postgresql" "haproxy") # Adjust for your setup
# --- Functions ---
log_message() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | sudo tee -a $LOG_FILE
}
check_service() {
local service_name=$1
if systemctl is-active --quiet $service_name; then
log_message "OK: Service '$service_name' is running."
return 0
else
log_message "ERROR: Service '$service_name' is NOT running."
return 1
fi
}
check_network() {
local url=$1
if curl -s --head "$url" | grep "200 OK" > /dev/null; then
log_message "OK: Network check to '$url' successful."
return 0
else
log_message "ERROR: Network check to '$url' failed."
return 1
fi
}
check_postgres() {
if PGPASSWORD=$(cat /etc/pg_monitoring_password); do
if psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PG_DB -c '\q' > /dev/null 2>&1; then
log_message "OK: PostgreSQL connection to '$PG_DB' successful."
return 0
else
log_message "ERROR: PostgreSQL connection to '$PG_DB' failed."
return 1
fi
else
log_message "ERROR: Could not retrieve PostgreSQL password."
return 1
fi
}
# --- Main Execution ---
log_message "--- Starting Health Check ---"
# Check required services
for service in "${REQUIRED_SERVICES[@]}"; do
check_service "$service"
done
# Perform specific checks based on server role (add logic here)
# Example for App Server:
if [[ $(hostname -s) == appserver* ]]; then
check_network "$HEALTHCHECK_URL"
# Add checks for PHP-FPM status, disk space, etc.
fi
# Example for DB Server:
if [[ $(hostname -s) == dbserver* ]]; then
check_postgres
# Add checks for disk space, replication status (if applicable)
fi
# Example for Load Balancer:
if [[ $(hostname -s) == lbserver* ]]; then
# Check HAProxy stats endpoint or a dummy backend response
check_network "http://localhost:8404/haproxy?stats;csv" # Example
fi
log_message "--- Health Check Finished ---"
# --- Automated Recovery (Optional) ---
# This section should be carefully implemented to avoid unintended consequences.
# Example: Restarting a failed service.
# if ! systemctl is-active --quiet nginx; then
# log_message "Attempting to restart nginx..."
# sudo systemctl restart nginx
# sleep 5 # Give it time to start
# check_service "nginx" # Re-check status
# fi
Securely store the PostgreSQL password (e.g., in `/etc/pg_monitoring_password` with restricted permissions).
# Create the password file echo "your_monitoring_password" | sudo tee /etc/pg_monitoring_password sudo chmod 600 /etc/pg_monitoring_password
Schedule this script using `cron` to run at regular intervals (e.g., every 5 minutes).
sudo crontab -e # Add: # */5 * * * * /path/to/your/health_check.sh
Alerting on Health Check Failures
You can use tools like `cronwatch` or simply parse the log file generated by the health check script. A more integrated approach is to have the script send metrics to Prometheus (e.g., using `node_exporter`’s textfile collector) and then set up alerts in Prometheus based on these health metrics.
For instance, the health check script could output metrics like:
# Inside the health check script, after checks: echo "health_check_service_nginx_ok $( [ $(systemctl is-active --quiet nginx) = active ] & echo 1 || echo 0 )" echo "health_check_postgres_ok $( check_postgres && echo 1 || echo 0 )" # ... and so on for other checks
These metrics would be written to `/var/lib/node_exporter/textfile-collector/health_checks.prom`, and Prometheus alerts could be configured for when `health_check_service_nginx_ok == 0`, etc.