Server Monitoring Best Practices: Keeping Your WooCommerce App and Redis Clusters Alive on DigitalOcean
Proactive Redis Cluster Health Checks with `redis-cli`
Maintaining the health of Redis clusters, especially those powering a high-traffic WooCommerce store, requires more than just basic uptime checks. We need to monitor internal cluster state, replication lag, and memory usage. DigitalOcean’s managed Redis offers some visibility, but direct `redis-cli` commands provide granular insights crucial for early detection of issues.
A fundamental check involves verifying the cluster’s `CLUSTER INFO` and `CLUSTER NODES` output. This helps confirm all nodes are reachable, slots are assigned correctly, and the cluster is in a stable state. We can script these checks to run periodically.
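As a rough illustration of the slot check, the sketch below reads `CLUSTER INFO` output on standard input and verifies that all 16384 hash slots are assigned and none are failing. The function name `check_slot_coverage` is a placeholder of ours, and the code assumes the usual `field:value` layout of `CLUSTER INFO`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads CLUSTER INFO text on stdin and reports slot coverage.
check_slot_coverage() {
    local info assigned failed
    # redis-cli emits CRLF line endings when piped, so drop the carriage returns.
    info=$(tr -d '\r')
    assigned=$(awk -F: '$1 == "cluster_slots_assigned" {print $2}' <<< "$info")
    failed=$(awk -F: '$1 == "cluster_slots_fail" {print $2}' <<< "$info")

    # A healthy cluster has all 16384 hash slots assigned and none failing.
    if [ "$assigned" = "16384" ] && [ "$failed" = "0" ]; then
        echo "OK: all 16384 slots assigned"
        return 0
    fi
    echo "ALERT: slots assigned=${assigned:-unknown}, failing=${failed:-unknown}"
    return 1
}
```

It could be fed directly from the CLI, for example `redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" CLUSTER INFO | check_slot_coverage`, with the exit status driving whatever alerting you already have.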
Automating Redis Cluster Status Checks
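We’ll use a simple Bash script that connects to one master node and runs these commands; its output can be parsed to trigger alerts. `CLUSTER INFO` and `CLUSTER NODES` return the cluster-wide view from any node, but replication offsets are tracked per master, so run the lag check against each master you care about.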
Script for Redis Cluster Health
This script connects to a specified Redis master, retrieves cluster information, and checks for common failure indicators such as `cluster_state:fail` or replicas whose replication offset lags far behind the master's `master_repl_offset`.
#!/usr/bin/env bash
# Redis Cluster health check: cluster state, failing nodes, and replication lag.
set -o pipefail

REDIS_HOST="your_redis_master_ip"
REDIS_PORT="6379"
REDIS_PASSWORD="your_redis_password" # Leave empty if authentication is disabled
LAG_THRESHOLD_BYTES=10485760         # Example: alert if a replica is > 10MB behind

# Pass the password through the environment so it does not appear in the process list.
if [ -n "$REDIS_PASSWORD" ]; then
    export REDISCLI_AUTH="$REDIS_PASSWORD"
fi

# Run a redis-cli command against the target node and strip the CR from CRLF line endings.
run_redis_command() {
    redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" "$@" | tr -d '\r'
}

echo "--- Checking Redis Cluster Info ---"
if ! CLUSTER_INFO=$(run_redis_command CLUSTER INFO); then
    echo "ERROR: Could not connect to Redis at $REDIS_HOST:$REDIS_PORT"
    exit 1
fi
echo "$CLUSTER_INFO"

# CLUSTER INFO lines look like "cluster_state:ok", so split on the colon.
CLUSTER_STATE=$(echo "$CLUSTER_INFO" | awk -F: '$1 == "cluster_state" {print $2}')
if [ "$CLUSTER_STATE" != "ok" ]; then
    echo "ALERT: Redis cluster state is NOT ok: ${CLUSTER_STATE:-unknown}"
    # Add your alerting mechanism here (e.g., send email, Slack notification)
fi

echo "--- Checking Redis Cluster Nodes ---"
CLUSTER_NODES=$(run_redis_command CLUSTER NODES)
echo "$CLUSTER_NODES"

# CLUSTER NODES fields:
#   <id> <ip:port@cport> <flags> <master-id> <ping-sent> <pong-recv> <config-epoch> <link-state> <slots...>
# Flag any node marked "fail" or "fail?", or whose cluster bus link is not connected.
while read -r NODE_ID ADDR FLAGS _ _ _ _ LINK_STATE _; do
    [ -z "$NODE_ID" ] && continue
    IP_PORT=${ADDR%%@*} # Keep IP:port, drop the cluster bus port and hostname
    if [[ ",$FLAGS," == *",fail,"* || ",$FLAGS," == *",fail?,"* ]]; then
        echo "ALERT: Node $NODE_ID ($IP_PORT) is flagged as failing: $FLAGS"
    elif [ "$LINK_STATE" != "connected" ]; then
        echo "ALERT: Cluster bus link to $NODE_ID ($IP_PORT) is $LINK_STATE"
    fi
done <<< "$CLUSTER_NODES"

# Replication offsets are not part of CLUSTER NODES; they come from INFO replication.
# On a master, each attached replica appears as:
#   slave0:ip=10.0.0.5,port=6379,state=online,offset=123456,lag=0
# This checks only the master we connected to. Repeat per master for full coverage.
echo "--- Checking Replication Lag for Master ($REDIS_HOST:$REDIS_PORT) ---"
REPL_INFO=$(run_redis_command INFO replication)
PRIMARY_OFFSET=$(echo "$REPL_INFO" | awk -F: '$1 == "master_repl_offset" {print $2}')

while IFS= read -r line; do
    REPLICA_IP_PORT=$(echo "$line" | sed -E 's/.*ip=([^,]+),port=([0-9]+).*/\1:\2/')
    REPLICA_OFFSET=$(echo "$line" | sed -E 's/.*offset=([0-9]+).*/\1/')
    if [[ "$PRIMARY_OFFSET" =~ ^[0-9]+$ && "$REPLICA_OFFSET" =~ ^[0-9]+$ ]]; then
        LAG=$((PRIMARY_OFFSET - REPLICA_OFFSET))
        echo "Replica $REPLICA_IP_PORT lag: $LAG bytes"
        if [ "$LAG" -gt "$LAG_THRESHOLD_BYTES" ]; then
            echo "ALERT: High replication lag for replica $REPLICA_IP_PORT: $LAG bytes"
            # Add alerting here
        fi
    fi
done < <(echo "$REPL_INFO" | grep -E '^slave[0-9]+:')

echo "--- Redis Cluster Health Check Complete ---"
exit 0
To integrate this into a monitoring system like Prometheus, you could use redis_exporter, which connects to Redis directly and exposes the values reported by `INFO` as Prometheus metrics. However, for custom logic and direct alerting, a Bash script executed by cron or a systemd timer is effective. Ensure the host running the script can reach the Redis nodes over the network and has the appropriate credentials if authentication is enabled.
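For the cron route, one possible arrangement is a thin wrapper that runs the health check, keeps only the `ALERT:` and `ERROR:` lines, and hands them to a notifier. This is a sketch under assumptions: the install path `/usr/local/bin/redis_health.sh`, the `notify` function, and the five-minute schedule are all placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the health-check script, meant to be run from cron, e.g.:
#   */5 * * * * /usr/local/bin/redis_health_alert.sh

HEALTH_SCRIPT="${HEALTH_SCRIPT:-/usr/local/bin/redis_health.sh}" # assumed install path

# Print only the alert lines from stdin; exit status 0 means at least one was found.
extract_alerts() {
    grep -E '^(ALERT|ERROR):'
}

# Placeholder notifier that writes to syslog; swap in mail, a Slack webhook, etc.
notify() {
    logger -t redis-health -- "$1"
}

main() {
    local alerts
    if alerts=$("$HEALTH_SCRIPT" 2>&1 | extract_alerts); then
        notify "$alerts"
    fi
}

# Run only when executed directly, so the functions can also be sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    main "$@"
fi
```

Filtering on the line prefix keeps the wrapper independent of the health check's exit code, which is 0 even when alerts were printed.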
Monitoring WooCommerce Application Performance with New Relic
For the WooCommerce application itself, understanding performance bottlenecks is paramount. This involves tracking request latency, error rates, database query times, and external API calls. New Relic is a powerful Application Performance Monitoring (APM) tool that provides deep insights into PHP applications.
Configuring the New Relic PHP Agent
The first step is to install and configure the New Relic PHP agent on your DigitalOcean Droplets. This typically involves downloading the agent, running the installer, and then configuring the `newrelic.ini` file.
Installation and Configuration Steps
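- Install Agent: Add New Relic's package repository for your distribution (see New Relic's PHP agent installation guide), then install the `newrelic-php5` package and run the bundled installer.
sudo apt-get install newrelic-php5
sudo newrelic-install install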
- Configure `newrelic.ini`: Locate the `newrelic.ini` file (often in `/etc/php/[php_version]/cli/conf.d/` or `/etc/php/[php_version]/fpm/conf.d/`). Edit it with your New Relic license key and application name.
; newrelic.ini: settings for the New Relic PHP agent.
; All agent settings use the "newrelic." prefix.
newrelic.license = "YOUR_NEW_RELIC_LICENSE_KEY"
newrelic.appname = "WooCommerce Production App"
;newrelic.high_security = true               ; Uncomment for higher security if needed
;newrelic.distributed_tracing_enabled = true ; Enable distributed tracing for microservices
;newrelic.loglevel = "info"                  ; Adjust log level as needed
After configuration, restart your PHP-FPM service and web server (e.g., Nginx).
sudo systemctl restart php[php_version]-fpm
sudo systemctl restart nginx
Key WooCommerce Metrics to Monitor in New Relic
Once the agent is active, New Relic will automatically collect a wealth of data. Focus on these critical metrics:
- Transaction Traces: Identify slow PHP functions, database queries, and external calls within your WooCommerce requests. Look for transactions exceeding your SLO (Service Level Objective), typically a few hundred milliseconds for most operations.
- Error Rate: Monitor PHP errors, exceptions, and HTTP 5xx responses. A sudden spike indicates a critical issue.
- Database Performance: Analyze query execution times for tables like `wp_posts`, `wp_options`, `wp_wc_order_stats`, and custom WooCommerce tables. Slow queries can cripple performance.
- External Services: Track latency and errors for calls to payment gateways (Stripe, PayPal), shipping APIs, and any third-party integrations.
- Memory Usage: Observe PHP memory consumption. High usage can lead to `Allowed memory size exhausted` errors.
Set up custom alerts in New Relic for these metrics. For instance, an alert for an error rate exceeding 1% over 5 minutes, or average transaction time for `checkout` exceeding 2 seconds.
DigitalOcean Droplet Resource Monitoring with `node_exporter` and Prometheus
Beyond application-specific monitoring, the underlying infrastructure—your DigitalOcean Droplets—needs robust resource monitoring. This includes CPU, memory, disk I/O, and network traffic. A common and powerful stack for this is Prometheus with `node_exporter`.
Deploying `node_exporter`
`node_exporter` is a Prometheus exporter that exposes hardware and OS metrics. It's straightforward to deploy on each Droplet.
Installation and Running `node_exporter`
Download the latest release, extract it, and run it. For production, it's best to run it as a systemd service.
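# Download the latest release (replace with actual version)
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
cd node_exporter-1.7.0.linux-amd64
# Run it (for testing)
./node_exporter
# Install the binary where the systemd unit expects it
sudo cp node_exporter /usr/local/bin/
# Create the unprivileged service account used by the unit
sudo useradd --system --no-create-home --shell /usr/sbin/nologin prometheus
# For systemd service (create /etc/systemd/system/node_exporter.service)
# Ensure you adjust User and Group if running under a different user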
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
# Adjust the path if installed elsewhere
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
After creating the service file, enable and start it:
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter
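Ensure that port 9100 is reachable from your Prometheus server, and only from it. In the DigitalOcean firewall, allow inbound TCP 9100 from the Prometheus server's IP rather than from anywhere, since `node_exporter` serves its metrics without authentication by default.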
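Before pointing Prometheus at a Droplet, it can help to confirm that the exporter is actually serving the metrics you plan to use. The sketch below reads the `/metrics` output on standard input and prints any expected metric family that is absent. The function name `missing_metrics` is hypothetical, and the metric names are the usual Linux `node_exporter` ones.

```shell
#!/usr/bin/env bash
# Hypothetical check: reads Prometheus exposition text on stdin and prints every
# expected metric family that is absent. Prints nothing if all are present.
missing_metrics() {
    local body metric
    body=$(cat)
    for metric in "$@"; do
        # A sample line starts with the metric name followed by "{" (labels) or a space.
        if ! grep -qE "^${metric}(\{| )" <<< "$body"; then
            echo "$metric"
        fi
    done
}

# Example usage against a local exporter on its default port:
#   curl -fsS http://localhost:9100/metrics |
#       missing_metrics node_cpu_seconds_total node_memory_MemAvailable_bytes node_load1
```

Running the same command from the Prometheus server, with the Droplet's IP in place of `localhost`, also exercises the firewall rule.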
Prometheus Configuration for Scraping
Your central Prometheus server needs to be configured to scrape metrics from each Droplet's `node_exporter`. This is done in the `prometheus.yml` configuration file.
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

scrape_configs:
  # Scrape Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Scrape WooCommerce App Droplets
  - job_name: 'woocommerce_app'
    static_configs:
      - targets:
          - 'droplet_ip_1:9100'
          - 'droplet_ip_2:9100'
          # Add all your WooCommerce app Droplet IPs here

  # Scrape Redis Droplets (if not managed and running node_exporter)
  - job_name: 'redis_nodes'
    static_configs:
      - targets:
          - 'redis_droplet_ip_1:9100'
          - 'redis_droplet_ip_2:9100'
          # Add all your Redis Droplet IPs here
          # If using DigitalOcean Managed Redis, you won't run node_exporter on them.
          # You'd typically use their API or a dedicated Redis exporter for Redis metrics.
Validate the file with `promtool check config prometheus.yml`, then restart or reload Prometheus so the new scrape targets take effect.
Essential Droplet Metrics for WooCommerce
- CPU Usage: `node_cpu_seconds_total` (rate over time). High sustained CPU can indicate inefficient code, heavy traffic, or insufficient resources.
- Memory Usage: `node_memory_MemAvailable_bytes` (absolute value). Low available memory can lead to swapping and performance degradation.
- Disk I/O: `node_disk_io_time_seconds_total` (rate over time). High I/O wait times can bottleneck database operations.
- Network Traffic: `node_network_receive_bytes_total` and `node_network_transmit_bytes_total` (rate over time). Spikes can indicate traffic surges or unusual activity.
- Load Average: `node_load1`, `node_load5`, `node_load15`. Consistently high load averages (e.g., exceeding the number of CPU cores) suggest the system is overloaded.
Configure Grafana dashboards to visualize these metrics, correlating them with application performance data from New Relic and Redis cluster status. Define Prometheus alerting rules for critical thresholds, such as CPU usage consistently above 80% for 10 minutes or available memory below 100MB, and let Alertmanager route the resulting notifications.
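The two example thresholds can be written as PromQL and tried against the Prometheus HTTP API before they become alerting rules. This is a sketch, assuming Prometheus listens on `localhost:9090` and treating 100MB as 100 MiB. The function names and the `PROM_URL` variable are ours, and the "for 10 minutes" condition would live in the rule definition rather than in the query.

```shell
#!/usr/bin/env bash
PROM_URL="${PROM_URL:-http://localhost:9090}" # assumed Prometheus address

# Instances whose average CPU utilisation over the last 5 minutes exceeds a percentage.
cpu_above_query() {
    printf '100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > %s' "$1"
}

# Instances with less than the given number of MiB available.
mem_below_query() {
    printf 'node_memory_MemAvailable_bytes < %s' "$(( $1 * 1024 * 1024 ))"
}

# Run an instant query; an empty "result" array in the JSON means no instance matched.
prom_query() {
    curl -fsS --get "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Example:
#   prom_query "$(cpu_above_query 80)"
#   prom_query "$(mem_below_query 100)"
```

Building the expression in one place makes it easy to paste the same string into a Grafana panel and into the `expr` field of an alerting rule, so the dashboard and the alert agree.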