Server Monitoring Best Practices: Keeping Your Magento 2 App and Redis Clusters Alive on OVH
Proactive Health Checks for Magento 2 and Redis on OVH
Maintaining a high-availability Magento 2 deployment, especially when leveraging Redis for caching and session management, demands a robust and proactive monitoring strategy. This guide focuses on essential server-level and application-specific checks, tailored for an OVH infrastructure, ensuring minimal downtime and optimal performance. We’ll cover critical metrics, configuration snippets, and diagnostic commands to keep your Magento 2 clusters and Redis instances operational.
Core Server Metrics Monitoring
Before diving into application specifics, ensuring the underlying server infrastructure is healthy is paramount. We’ll use standard Linux tools and common monitoring agents. For OVH, this typically means monitoring your dedicated servers or VPS instances.
CPU Utilization
Sustained high CPU usage can cripple application performance. We’ll monitor the average load over 1, 5, and 15 minutes, and specifically track user, system, and I/O wait times. A common threshold for alerting is when the 15-minute load average consistently exceeds the number of CPU cores.
A simple Bash script can capture this data:
#!/bin/bash
LOAD_AVG=$(awk '{print $3}' /proc/loadavg) # 15-minute load average (third field)
CPU_CORES=$(nproc)
echo "LOAD_AVG=$LOAD_AVG"
echo "CPU_CORES=$CPU_CORES"
# Example alerting logic (can be integrated with Nagios, Zabbix, Prometheus Alertmanager, etc.)
if (( $(echo "$LOAD_AVG > $CPU_CORES" | bc -l) )); then
echo "ALERT: High CPU load average detected: $LOAD_AVG (CPU Cores: $CPU_CORES)"
fi
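The load average alone does not show how CPU time splits between user, system, and I/O wait. A minimal sketch of that breakdown follows, assuming a Linux host that exposes /proc/stat. The function name `cpu_breakdown` and the `IOWAIT_LIMIT` threshold are illustrative, not part of any standard tool.

```shell
#!/bin/bash
# cpu_breakdown PREV CUR
# PREV and CUR are two samples of the aggregate "cpu" line from /proc/stat.
# Prints the user, system and iowait share of elapsed CPU time as percentages.
cpu_breakdown() {
  awk -v prev="$1" -v cur="$2" 'BEGIN {
    split(prev, p, " "); split(cur, c, " ")
    # Index 1 is the "cpu" label; 2..9 are user nice system idle iowait irq softirq steal
    for (i = 2; i <= 9; i++) total += c[i] - p[i]
    if (total <= 0) { print "0.0 0.0 0.0"; exit }
    printf "%.1f %.1f %.1f\n", 100 * (c[2] - p[2]) / total, 100 * (c[4] - p[4]) / total, 100 * (c[6] - p[6]) / total
  }'
}

# Sample /proc/stat twice, one second apart (Linux only)
if [ -r /proc/stat ]; then
  PREV=$(grep '^cpu ' /proc/stat)
  sleep 1
  CUR=$(grep '^cpu ' /proc/stat)
  read -r USER_PCT SYS_PCT IOWAIT_PCT <<< "$(cpu_breakdown "$PREV" "$CUR")"
  echo "CPU user=${USER_PCT}% system=${SYS_PCT}% iowait=${IOWAIT_PCT}%"
  IOWAIT_LIMIT=20 # Illustrative threshold, tune for your storage
  if awk -v v="$IOWAIT_PCT" -v t="$IOWAIT_LIMIT" 'BEGIN { exit !(v > t) }'; then
    echo "ALERT: High I/O wait: ${IOWAIT_PCT}%"
  fi
fi
```

Sustained I/O wait alongside a high load average usually points at storage rather than PHP or MySQL CPU work, which changes where you look next.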
Memory Usage
Both RAM and swap usage are critical. Insufficient RAM leads to increased disk I/O (swapping), severely impacting Magento’s performance. We’ll monitor free memory, used memory, and swap usage. Alerting on swap usage exceeding a small percentage (e.g., 5%) is a good practice.
Using free -m and parsing its output:
#!/bin/bash
MEM_INFO=$(free -m)
TOTAL_MEM=$(echo "$MEM_INFO" | awk '/^Mem:/ {print $2}')
USED_MEM=$(echo "$MEM_INFO" | awk '/^Mem:/ {print $3}')
FREE_MEM=$(echo "$MEM_INFO" | awk '/^Mem:/ {print $4}')
SWAP_TOTAL=$(echo "$MEM_INFO" | awk '/^Swap:/ {print $2}')
SWAP_USED=$(echo "$MEM_INFO" | awk '/^Swap:/ {print $3}')
echo "TOTAL_MEM=${TOTAL_MEM}MB"
echo "USED_MEM=${USED_MEM}MB"
echo "FREE_MEM=${FREE_MEM}MB"
echo "SWAP_TOTAL=${SWAP_TOTAL}MB"
echo "SWAP_USED=${SWAP_USED}MB"
# Example alerting logic for swap
if [ "$SWAP_TOTAL" -gt 0 ]; then
SWAP_PERCENT=$(awk "BEGIN {printf \"%.2f\", ($SWAP_USED / $SWAP_TOTAL) * 100}")
if (( $(echo "$SWAP_PERCENT > 5.0" | bc -l) )); then
echo "ALERT: High swap usage detected: ${SWAP_PERCENT}% (${SWAP_USED}MB / ${SWAP_TOTAL}MB)"
fi
else
if [ "$SWAP_USED" -gt 0 ]; then
echo "ALERT: Swap space is being used but total swap is reported as 0."
fi
fi
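The "free" column understates usable memory because the kernel can reclaim page cache on demand. A sketch that reads the kernel's own `MemAvailable` estimate from /proc/meminfo may give a more useful alerting signal. The function name `mem_available_pct` and the 10% floor are assumptions for illustration.

```shell
#!/bin/bash
# mem_available_pct: reads /proc/meminfo-style text on stdin and prints
# MemAvailable as a percentage of MemTotal.
mem_available_pct() {
  awk '/^MemTotal:/ { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%.1f\n", 100 * avail / total; else print "0.0" }'
}

if [ -r /proc/meminfo ]; then
  AVAILABLE_PCT=$(mem_available_pct < /proc/meminfo)
  echo "MEM_AVAILABLE=${AVAILABLE_PCT}%"
  MIN_AVAILABLE_PCT=10 # Illustrative floor, adjust to your workload
  if awk -v v="$AVAILABLE_PCT" -v t="$MIN_AVAILABLE_PCT" 'BEGIN { exit !(v < t) }'; then
    echo "ALERT: Low available memory: ${AVAILABLE_PCT}%"
  fi
fi
```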
Disk I/O and Space
Disk bottlenecks are common. Monitoring I/O wait times and disk space utilization is crucial. For Magento, logs and temporary files can consume significant space. Alerting on disk usage exceeding 85-90% is standard.
Checking disk space:
#!/bin/bash
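# Monitor the filesystems that hold key Magento directories.
# Note: df reports usage of the filesystem containing each path, not the size of the directory itself.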
DIRS_TO_MONITOR=("/var/log" "/tmp" "/var/www/html/var/log" "/var/www/html/var/cache" "/var/www/html/var/session")
THRESHOLD=85 # Percentage
for DIR in "${DIRS_TO_MONITOR[@]}"; do
if [ -d "$DIR" ]; then
USAGE=$(df -P "$DIR" | awk 'NR==2 {print $5}' | sed 's/%//') # -P keeps each filesystem on one line
echo "Disk usage for $DIR: $USAGE%"
if [ "$USAGE" -ge "$THRESHOLD" ]; then
echo "ALERT: Disk space critically low on $DIR: $USAGE%"
fi
else
echo "WARNING: Directory $DIR does not exist."
fi
done
# General root filesystem check
ROOT_USAGE=$(df -P / | awk 'NR==2 {print $5}' | sed 's/%//')
echo "Disk usage for /: $ROOT_USAGE%"
if [ "$ROOT_USAGE" -ge "$THRESHOLD" ]; then
echo "ALERT: Disk space critically low on /: $ROOT_USAGE%"
fi
For disk I/O, tools like iostat are invaluable. Monitoring metrics like %util (percentage of time the device was busy) and await (average wait time for I/O requests) can pinpoint disk contention.
#!/bin/bash
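# Monitor utilization and wait times for all devices.
# The first iostat report covers the time since boot, so request two reports and keep the second.
# Column positions differ between sysstat versions, so locate them by header name.
echo "Disk I/O Statistics:"
iostat -dx 5 2 | awk '
/^Device/ { report++; for (i = 1; i <= NF; i++) col[$i] = i; next }
report == 2 && NF > 1 {
# Older sysstat prints a single await column; newer versions split it into r_await and w_await
if ("await" in col) wait = $(col["await"])
else wait = ($(col["r_await"]) > $(col["w_await"])) ? $(col["r_await"]) : $(col["w_await"])
print $1, $(col["%util"]), wait
}' | while read -r DEVICE UTIL WAIT; do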
echo " Device: $DEVICE, %util: $UTIL, await: $WAIT ms"
# Example alerting thresholds (adjust based on your storage performance)
if (( $(echo "$UTIL > 90.0" | bc -l) )); then
echo "ALERT: High disk utilization on $DEVICE: $UTIL%"
fi
if (( $(echo "$WAIT > 50.0" | bc -l) )); then
echo "ALERT: High disk await time on $DEVICE: $WAIT ms"
fi
done
Magento 2 Specific Monitoring
Magento 2’s architecture, with its complex dependency injection, compilation, and caching layers, requires application-aware monitoring.
Web Server (Nginx/Apache) Status
Monitor web server response times, error rates (4xx, 5xx), and active connections. For Nginx, the status module is essential.
# Nginx configuration to enable status module
server {
listen 8080; # Use a non-standard port to avoid conflicts
server_name your_monitoring_domain.com;
location /nginx_status {
stub_status;
allow 127.0.0.1; # Restrict access to localhost for security
deny all;
}
}
You can then fetch this data using curl:
#!/bin/bash
STATUS_URL="http://localhost:8080/nginx_status"
RESPONSE=$(curl -s "$STATUS_URL")
if [ -z "$RESPONSE" ]; then
echo "ALERT: Failed to retrieve Nginx status from $STATUS_URL"
exit 1
fi
ACTIVE_CONNECTIONS=$(echo "$RESPONSE" | awk '/Active connections:/ {print $3}')
REQUESTS=$(echo "$RESPONSE" | awk 'NR==3 {print $3}') # Third counter on the accepts/handled/requests line
# Reading, Writing and Waiting share a single line in stub_status output
READING=$(echo "$RESPONSE" | awk '/Reading:/ {print $2}')
WRITING=$(echo "$RESPONSE" | awk '/Writing:/ {print $4}')
WAITING=$(echo "$RESPONSE" | awk '/Waiting:/ {print $6}')
echo "Nginx Active Connections: $ACTIVE_CONNECTIONS"
echo "Nginx Total Requests: $REQUESTS"
echo "Nginx Reading: $READING"
echo "Nginx Writing: $WRITING"
echo "Nginx Waiting: $WAITING"
# Example alerting: High active connections
if [ "$ACTIVE_CONNECTIONS" -gt 1000 ]; then # Adjust threshold
echo "ALERT: High Nginx active connections: $ACTIVE_CONNECTIONS"
fi
For Apache, use mod_status similarly. Beyond basic metrics, monitor the rate of 5xx errors. Status codes are recorded in the access logs (the error logs hold the corresponding diagnostics), so this can be done by parsing the Nginx/Apache access logs or by using tools like Prometheus with `nginx-exporter` or `apache-exporter`.
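As a rough sketch of the log-parsing approach, the function below computes a 5xx rate from access log lines. It assumes the default "combined" log format, where the status code is the ninth whitespace-separated field. The function name `summarize_5xx`, the log path, and the 1000-line window are illustrative assumptions.

```shell
#!/bin/bash
# summarize_5xx: reads access log lines on stdin and prints
# "<5xx count> <total requests> <5xx percentage>".
summarize_5xx() {
  awk '$9 ~ /^[1-5][0-9][0-9]$/ { total++; if ($9 >= 500) errors++ }
       END { printf "%d %d %.2f\n", errors, total, (total > 0 ? 100 * errors / total : 0) }'
}

ACCESS_LOG="/var/log/nginx/access.log" # Assumed path, adjust to your setup
if [ -r "$ACCESS_LOG" ]; then
  read -r ERRORS TOTAL RATE < <(tail -n 1000 "$ACCESS_LOG" | summarize_5xx)
  echo "5xx responses in last $TOTAL requests: $ERRORS (${RATE}%)"
fi
```

A custom `log_format` moves the status field, so check the field index against your own configuration before relying on the numbers.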
PHP-FPM Performance
Magento 2 is PHP-intensive. PHP-FPM’s performance is critical. Monitor the number of active processes, idle processes, and request queue length. Ensure your `pm.max_children` is set appropriately for your server’s RAM and expected load.
; PHP-FPM pool configuration (e.g., /etc/php/8.1/fpm/pool.d/www.conf)
pm = dynamic
pm.max_children = 100 ; Adjust based on RAM and CPU
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
pm.process_idle_timeout = 10s ; Only takes effect when pm = ondemand
pm.max_requests = 500 ; Restart process after N requests
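A common rule of thumb for sizing `pm.max_children` is to divide the RAM left over for PHP by the average memory footprint of one PHP-FPM worker. The sketch below shows that arithmetic. The function name `suggest_max_children` and the sample figures (8 GB host, 2 GB reserved for the OS and other services, 80 MB per worker) are assumptions for illustration, not measurements.

```shell
#!/bin/bash
# suggest_max_children TOTAL_MB RESERVED_MB PER_PROCESS_MB
# Prints how many PHP-FPM workers fit in the RAM left after the reservation.
suggest_max_children() {
  local total_mb=$1 reserved_mb=$2 per_process_mb=$3
  if [ "$per_process_mb" -le 0 ] || [ "$reserved_mb" -ge "$total_mb" ]; then
    echo 0
    return 1
  fi
  echo $(( (total_mb - reserved_mb) / per_process_mb ))
}

# Hypothetical host: 8192 MB RAM, 2048 MB reserved, 80 MB per worker
suggest_max_children 8192 2048 80 # prints 76
```

Measure the real per-worker footprint under production traffic before settling on a value, since Magento workers vary widely with the extensions installed.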
PHP-FPM’s status page can provide real-time metrics. Enable it in your pool configuration:
; /etc/php/8.1/fpm/pool.d/www.conf
pm.status_path = /fpm_status
listen.allowed_clients = 127.0.0.1
And configure your web server to proxy to it:
# In your Nginx site configuration
location ~ ^/fpm_status {
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_pass unix:/run/php/php8.1-fpm.sock; # Adjust path to your PHP-FPM socket
allow 127.0.0.1;
deny all;
}
Then, fetch the status:
#!/bin/bash
STATUS_URL="http://localhost/fpm_status?json" # JSON output for parsing; append &full for per-process details
MAX_CHILDREN=100 # The status page does not report this limit; keep it in sync with pm.max_children
RESPONSE=$(curl -s "$STATUS_URL")
if [ -z "$RESPONSE" ]; then
echo "ALERT: Failed to retrieve PHP-FPM status from $STATUS_URL"
exit 1
fi
# Parsing JSON output (the status page's keys contain spaces, so they must be quoted in jq)
ACTIVE_PROCESSES=$(echo "$RESPONSE" | jq '."active processes"')
IDLE_PROCESSES=$(echo "$RESPONSE" | jq '."idle processes"')
QUEUE_LENGTH=$(echo "$RESPONSE" | jq '."listen queue"')
echo "PHP-FPM Active Processes: $ACTIVE_PROCESSES"
echo "PHP-FPM Idle Processes: $IDLE_PROCESSES"
echo "PHP-FPM Max Children: $MAX_CHILDREN"
echo "PHP-FPM Queue Length: $QUEUE_LENGTH"
# Example alerting
if [ "$ACTIVE_PROCESSES" -ge "$MAX_CHILDREN" ] && [ "$QUEUE_LENGTH" -gt 0 ]; then
echo "ALERT: PHP-FPM pool is saturated. Active processes: $ACTIVE_PROCESSES, Max children: $MAX_CHILDREN, Queue: $QUEUE_LENGTH"
fi
if [ "$QUEUE_LENGTH" -gt 10 ]; then # Adjust threshold
echo "ALERT: PHP-FPM request queue is growing: $QUEUE_LENGTH"
fi
Magento 2 Specific Checks
Beyond server and web server metrics, we need to check Magento’s own health. This includes checking for deadlocks, slow database queries, and cache health.
Database Health (MySQL/MariaDB)
Magento is heavily database-dependent. Monitor slow query logs, connection counts, and InnoDB status. Look for deadlocks and long-running queries.
-- Connect to MySQL
mysql -u your_user -pyour_password -h your_db_host
-- Check current connections
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
-- Check for active queries (can be resource intensive on busy servers)
SHOW FULL PROCESSLIST;
-- Check InnoDB status for deadlocks and buffer pool usage
SHOW ENGINE INNODB STATUS\G
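You can script checks for deadlocks by parsing the InnoDB status output. Look for the “LATEST DETECTED DEADLOCK” section. InnoDB keeps this section until the server restarts or a newer deadlock replaces it, so compare its timestamp between runs to avoid re-alerting on an old deadlock.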
#!/bin/bash
MYSQL_USER="your_user"
MYSQL_PASS="your_password"
MYSQL_HOST="your_db_host"
INNODB_STATUS=$(mysql -u "$MYSQL_USER" -p"$MYSQL_PASS" -h "$MYSQL_HOST" -e "SHOW ENGINE INNODB STATUS\G")
if echo "$INNODB_STATUS" | grep -q "LATEST DETECTED DEADLOCK"; then
echo "ALERT: InnoDB deadlock detected!"
echo "$INNODB_STATUS" | grep -A 20 "LATEST DETECTED DEADLOCK" # Print relevant section
fi
# Check for long running queries (example: > 60 seconds), ignoring idle connections
LONG_QUERIES=$(mysql -u "$MYSQL_USER" -p"$MYSQL_PASS" -h "$MYSQL_HOST" -e "SELECT id, user, host, db, command, time, state, info FROM information_schema.processlist WHERE command <> 'Sleep' AND time > 60 ORDER BY time DESC;")
if [ -n "$LONG_QUERIES" ] && [ "$(echo "$LONG_QUERIES" | wc -l)" -gt 1 ]; then
echo "ALERT: Long running queries detected:"
echo "$LONG_QUERIES"
fi
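Because the deadlock section stays in the InnoDB status output after the event, a check that only greps for the header will keep firing. One possible refinement, sketched below, extracts the deadlock timestamp and alerts only when it changes. The function names and the state-file path are hypothetical, and the parsing assumes the timestamp is the first line starting with a digit after the section header.

```shell
#!/bin/bash
# latest_deadlock_time: reads SHOW ENGINE INNODB STATUS output on stdin and
# prints the date and time of the most recent deadlock (empty if none).
latest_deadlock_time() {
  awk '/^LATEST DETECTED DEADLOCK/ { found = 1; next }
       found && /^[0-9]/ { print $1, $2; exit }'
}

# deadlock_is_new TIMESTAMP STATE_FILE
# Succeeds only when TIMESTAMP differs from the one recorded by the previous run.
deadlock_is_new() {
  local ts=$1 state_file=$2
  [ -n "$ts" ] || return 1
  if [ -f "$state_file" ] && [ "$(cat "$state_file")" = "$ts" ]; then
    return 1
  fi
  echo "$ts" > "$state_file"
}

# Usage with the INNODB_STATUS variable from the script above
# (the state-file path is an assumption):
# TS=$(echo "$INNODB_STATUS" | latest_deadlock_time)
# if deadlock_is_new "$TS" /var/tmp/last_innodb_deadlock; then
#   echo "ALERT: New InnoDB deadlock at $TS"
# fi
```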
Magento Cache Status
Magento relies heavily on its cache. While Redis is often used for this, ensuring the cache is functional and not stale is important. A simple check is to ensure cache types are enabled and that cache entries are being updated.
#!/bin/bash
MAGENTO_ROOT="/var/www/html" # Adjust to your Magento installation path
BIN_MAGENTO="${MAGENTO_ROOT}/bin/magento"
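# Check that every cache type is enabled.
# cache:status prints one "type: 1" (enabled) or "type: 0" (disabled) line per cache type.
CACHE_STATUS=$("$BIN_MAGENTO" cache:status)
DISABLED_TYPES=$(echo "$CACHE_STATUS" | awk '$NF == "0" {sub(":", "", $1); print $1}')
if [ -z "$CACHE_STATUS" ]; then
echo "ALERT: Could not read Magento cache status!"
elif [ -z "$DISABLED_TYPES" ]; then
echo "Magento Cache: All types enabled"
else
echo "ALERT: Magento cache types disabled:" $DISABLED_TYPES
fi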
# Check for specific cache types (e.g., config, layout, block_html)
# This is more complex and often involves application-level checks or synthetic transactions.
# A simpler approach is to monitor cache hit/miss ratios if your monitoring system can instrument it.
# Example: Attempt to clear a specific cache type and monitor for errors
# This is a disruptive check, use with caution in production.
# echo "Attempting to clear configuration cache..."
# $BIN_MAGENTO cache:clean config
# if [ $? -ne 0 ]; then
# echo "ALERT: Failed to clear Magento configuration cache."
# fi
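When Redis is the cache backend, one way to approximate the hit/miss ratio mentioned in the comments above is to read the `keyspace_hits` and `keyspace_misses` counters from `INFO stats`. The sketch below parses that output. The function name `redis_hit_ratio` is illustrative, and the counters are cumulative since the Redis server started, so the result is a long-run average rather than a current rate.

```shell
#!/bin/bash
# redis_hit_ratio: reads `redis-cli INFO stats` output on stdin and prints the
# keyspace hit ratio as a percentage, or "n/a" when no lookups were recorded.
redis_hit_ratio() {
  # INFO output uses CRLF line endings, so strip the carriage returns first
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits" { hits = $2 }
    $1 == "keyspace_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total > 0) printf "%.2f\n", 100 * hits / total
      else print "n/a"
    }'
}

# Usage against the instance that backs the Magento cache:
# redis-cli INFO stats | redis_hit_ratio
```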
Redis Cluster Monitoring
For Magento 2, Redis is typically used for session storage and caching. A highly available Redis setup (using Redis Sentinel or Redis Cluster) adds complexity but also resilience. Monitoring its health is critical.
Redis Server Health
Monitor Redis’s memory usage, connected clients, and command latency. Ensure persistence (RDB/AOF) is configured and working if required.
#!/bin/bash
REDIS_CLI="redis-cli" # Adjust path if needed
# Monitor memory usage
MEMORY_USAGE=$($REDIS_CLI INFO memory | grep 'used_memory_human:')
echo "Redis Memory Usage: $MEMORY_USAGE"
# Monitor connected clients
CONNECTED_CLIENTS=$($REDIS_CLI INFO clients | grep 'connected_clients:')
echo "Redis Connected Clients: $CONNECTED_CLIENTS"
# Monitor command latency spikes
# This requires latency monitoring to be enabled (latency-monitor-threshold greater than 0).
# $REDIS_CLI LATENCY LATEST
# For simpler checks, monitor overall server responsiveness.
# Basic responsiveness check: time one PING round trip (includes redis-cli startup overhead).
# Note: redis-cli --latency-history runs until interrupted, so it is not suitable inside a script.
START_NS=$(date +%s%N)
PONG=$($REDIS_CLI PING)
END_NS=$(date +%s%N)
RESPONSE_TIME=$(( (END_NS - START_NS) / 1000000 ))
echo "Redis PING round trip: $RESPONSE_TIME ms"
if [ "$PONG" != "PONG" ]; then
echo "ALERT: Redis did not answer PING"
elif [ "$RESPONSE_TIME" -gt 50 ]; then # Adjust threshold
echo "ALERT: High Redis latency detected: $RESPONSE_TIME ms"
fi
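The human-readable memory figure is hard to alert on. A sketch that compares `used_memory` with `maxmemory` may be more practical, assuming your Redis version reports both fields in `INFO memory`. The function name `redis_mem_pct` is illustrative. A `maxmemory` of 0 means no limit is configured, which the sketch reports as "unlimited".

```shell
#!/bin/bash
# redis_mem_pct: reads `redis-cli INFO memory` output on stdin and prints
# used_memory as a percentage of maxmemory, or "unlimited" if no limit is set.
redis_mem_pct() {
  # INFO output uses CRLF line endings, so strip the carriage returns first
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory" { max = $2 }
    END {
      if (max > 0) printf "%.1f\n", 100 * used / max
      else print "unlimited"
    }'
}

# Usage:
# redis-cli INFO memory | redis_mem_pct
```

For a session store, running close to the limit matters more than for a cache, because evicting session keys logs customers out.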
Redis Sentinel Monitoring
If using Redis Sentinel for high availability, monitor Sentinel’s health and its view of the master/replica status.
#!/bin/bash
SENTINEL_CLI="redis-cli -p 26379" # Default Sentinel port
# Check Sentinel's own health first; the checks below cannot work if it is unreachable
if ! $SENTINEL_CLI ping &>/dev/null; then
echo "ALERT: Sentinel instance at $(hostname):26379 is unreachable."
exit 1
fi
# When its output is piped, redis-cli prints replies as alternating field/value lines.
# get_field prints the value that follows the given field name.
get_field() { awk -v key="$1" 'NR % 2 == 1 && $0 == key { getline; print; exit }'; }
# Get the names of all masters monitored by Sentinel
$SENTINEL_CLI sentinel masters | awk 'NR % 2 == 1 && $0 == "name" { getline; print }' | while read -r MASTER_NAME; do
echo "Monitoring Master: $MASTER_NAME"
# Get master status
MASTER_INFO=$($SENTINEL_CLI sentinel master "$MASTER_NAME" < /dev/null)
for FIELD in ip port role-reported num-slaves num-other-sentinels; do
echo "$FIELD: $(echo "$MASTER_INFO" | get_field "$FIELD")"
done
# Check if master is currently flagged as s_down or o_down
if [[ "$(echo "$MASTER_INFO" | get_field flags)" == *down* ]]; then
echo "ALERT: Redis master '$MASTER_NAME' is marked as DOWN by Sentinel!"
fi
# Get replica status (use "sentinel slaves" on Redis versions before 5.0)
$SENTINEL_CLI sentinel replicas "$MASTER_NAME" < /dev/null | awk '{ if (NR % 2) key = $0; else if (key == "ip") ip = $0; else if (key == "flags") print ip, $0 }' | while read -r REPLICA_IP REPLICA_FLAGS; do
echo "Replica $REPLICA_IP flags: $REPLICA_FLAGS"
if [[ "$REPLICA_FLAGS" == *down* ]]; then
echo "ALERT: Redis replica at $REPLICA_IP for master '$MASTER_NAME' is marked as DOWN by Sentinel!"
fi
done
done
Redis Cluster Monitoring
If using Redis Cluster, monitor cluster state, node health, and slot distribution.
#!/bin/bash
REDIS_CLUSTER_CLI="redis-cli -c -p 7000" # Example port, adjust as needed
# Check cluster state
CLUSTER_INFO=$($REDIS_CLUSTER_CLI cluster info)
echo "$CLUSTER_INFO"
# Check cluster nodes
CLUSTER_NODES=$($REDIS_CLUSTER_CLI cluster nodes)
echo "$CLUSTER_NODES"
# Alert if cluster is not in 'ok' state
if echo "$CLUSTER_INFO" | grep -q "cluster_state:fail"; then
echo "ALERT: Redis Cluster state is FAIL!"
fi
# Alert if any node is marked as 'fail'
if echo "$CLUSTER_NODES" | grep -q "fail"; then
echo "ALERT: One or more Redis Cluster nodes are marked as FAIL!"
fi
# Check slot distribution (ensure all slots are covered)
SLOTS_INFO=$($REDIS_CLUSTER_CLI cluster slots)
# This requires more complex parsing to ensure all 16384 slots are assigned and healthy.
# A common check is to ensure the number of assigned slots is correct and no node is reporting errors.
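Rather than parsing `cluster slots`, slot coverage can be checked with the counters that `cluster info` already reports. The sketch below reads `cluster_slots_assigned` and `cluster_slots_ok` and flags anything short of the full 16384 slots. The function name `check_cluster_slots` is illustrative.

```shell
#!/bin/bash
# check_cluster_slots: reads `redis-cli cluster info` output on stdin.
# Prints "OK" and succeeds when all 16384 slots are assigned and healthy;
# otherwise prints an alert line and fails.
check_cluster_slots() {
  # cluster info uses CRLF line endings, so strip the carriage returns first
  tr -d '\r' | awk -F: '
    $1 == "cluster_slots_assigned" { assigned = $2 }
    $1 == "cluster_slots_ok" { ok = $2 }
    END {
      if (assigned == 16384 && ok == 16384) { print "OK"; exit 0 }
      printf "ALERT: slots assigned=%d ok=%d (expected 16384)\n", assigned, ok
      exit 1
    }'
}

# Usage with the CLUSTER_INFO variable from the script above:
# echo "$CLUSTER_INFO" | check_cluster_slots
```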
Alerting and Integration
The Bash scripts above provide the core logic for checks. For production environments, these scripts should be integrated into a comprehensive monitoring system like Prometheus with Alertmanager, Zabbix, Nagios, or Datadog. Key considerations include:
- Thresholds: Tune alert thresholds based on your specific infrastructure performance and acceptable downtime.
- Alert Fatigue: Avoid noisy alerts. Implement intelligent grouping, silencing, and escalation policies.
- Actionable Alerts: Ensure alerts provide enough context (server name, metric, value, timestamp) to quickly diagnose the issue.
- Synthetic Transactions: For deeper application health, implement synthetic transactions that simulate user actions (e.g., adding to cart, checkout) and monitor their success rate and response time.
- Log Aggregation: Centralize logs from all servers (web, app, DB, Redis) using tools like ELK stack or Splunk for easier debugging.
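To make the "Actionable Alerts" point concrete, the scripts above could route their messages through one helper that adds the server name and a timestamp. This is a sketch only. The function name `format_alert`, the key=value layout, and the `ALERT_HOST` and `ALERT_TIME` override variables are assumptions, not a format required by any of the monitoring systems mentioned.

```shell
#!/bin/bash
# format_alert SEVERITY METRIC VALUE THRESHOLD
# Prints one line with timestamp, severity, host, metric, value and threshold.
# ALERT_HOST and ALERT_TIME can be set to override the detected values.
format_alert() {
  local severity=$1 metric=$2 value=$3 threshold=$4
  local host=${ALERT_HOST:-$(hostname 2>/dev/null || echo unknown)}
  local ts=${ALERT_TIME:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
  printf '%s severity=%s host=%s metric=%s value=%s threshold=%s\n' \
    "$ts" "$severity" "$host" "$metric" "$value" "$threshold"
}

# Example: report a swap alert with full context
format_alert warning swap_used_pct 12.5 5
```

A single consistent line format also makes the output easier to parse once the logs are centralized.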
By implementing these detailed server and application-level monitoring practices, you can significantly improve the reliability and performance of your Magento 2 and Redis clusters on OVH, ensuring a stable and responsive e-commerce platform.