Server Monitoring Best Practices: Keeping Your Magento 2 App and Redis Clusters Alive on OVH

Proactive Health Checks for Magento 2 and Redis on OVH

Maintaining a high-availability Magento 2 deployment, especially when leveraging Redis for caching and session management, demands a robust and proactive monitoring strategy. This guide focuses on essential server-level and application-specific checks, tailored to OVH infrastructure, to keep downtime low and performance predictable. We’ll cover critical metrics, configuration snippets, and diagnostic commands to keep your Magento 2 clusters and Redis instances operational.

Core Server Metrics Monitoring

Before diving into application specifics, ensuring the underlying server infrastructure is healthy is paramount. We’ll use standard Linux tools and common monitoring agents. For OVH, this typically means monitoring your dedicated servers or VPS instances.

CPU Utilization

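Sustained high CPU usage can cripple application performance. We’ll monitor the load average over 1, 5, and 15 minutes, and track user, system, and I/O wait time separately. A common alerting threshold is a 15-minute load average that consistently exceeds the number of CPU cores. On Linux, the load average also counts processes blocked in uninterruptible (usually disk) I/O, so a high load with mostly idle CPUs typically points to storage rather than compute.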

A simple Bash script can capture this data:

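#!/bin/bash

# /proc/loadavg starts with the 1-, 5- and 15-minute load averages; reading it avoids
# parsing uptime, whose output format and decimal separator vary with locale
read -r LOAD_1 LOAD_5 LOAD_15 _ < /proc/loadavg
CPU_CORES=$(nproc)

echo "LOAD_AVG=$LOAD_1 $LOAD_5 $LOAD_15"
echo "CPU_CORES=$CPU_CORES"

# Example alerting logic (can be integrated with Nagios, Zabbix, Prometheus Alertmanager, etc.)
# Compare only the 15-minute average with the core count
if (( $(echo "$LOAD_15 > $CPU_CORES" | bc -l) )); then
    echo "ALERT: High 15-minute load average detected: $LOAD_15 (CPU Cores: $CPU_CORES)"
fi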
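The script above only looks at load averages. The user, system, and I/O wait split mentioned earlier can be derived from two samples of the aggregate `cpu` line in /proc/stat. The sketch below assumes the standard field order of that line (user, nice, system, idle, iowait, irq, softirq, steal); `cpu_breakdown` is a hypothetical helper name, not an existing tool.

```bash
#!/bin/bash

# cpu_breakdown (hypothetical helper): prints user/system/iowait percentages between two samples.
# $1 and $2 are the aggregate "cpu" line of /proc/stat taken a few seconds apart.
cpu_breakdown() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, s1, " "); split(b, s2, " ")
        # Fields 2..9: user nice system idle iowait irq softirq steal
        # (guest time is already included in user, so it is not added again)
        for (i = 2; i <= 9; i++) { d[i] = s2[i] - s1[i]; total += d[i] }
        if (total <= 0) exit 1
        printf "user=%.1f system=%.1f iowait=%.1f\n", 100 * d[2] / total, 100 * d[4] / total, 100 * d[6] / total
    }'
}
```

Usage: `S1=$(grep '^cpu ' /proc/stat); sleep 5; S2=$(grep '^cpu ' /proc/stat); cpu_breakdown "$S1" "$S2"`. A persistently high iowait share confirms that a high load average is caused by storage rather than by PHP or MySQL CPU work.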

Memory Usage

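Both RAM and swap usage are critical. Insufficient RAM forces the kernel to swap, turning memory accesses into disk I/O and severely degrading Magento’s performance. We’ll monitor available memory (not just the “free” column, which excludes reclaimable page cache and is usually low on a healthy Linux server), used memory, and swap usage. Alerting when swap usage exceeds a small percentage (e.g., 5%) is a good practice.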

Using free -m and parsing its output:

#!/bin/bash

# LC_ALL=C keeps the "Mem:"/"Swap:" row labels in English regardless of locale
MEM_INFO=$(LC_ALL=C free -m)

TOTAL_MEM=$(echo "$MEM_INFO" | awk '/^Mem:/ {print $2}')
USED_MEM=$(echo "$MEM_INFO" | awk '/^Mem:/ {print $3}')
FREE_MEM=$(echo "$MEM_INFO" | awk '/^Mem:/ {print $4}')
# "available" (7th column in current procps-ng) estimates memory usable without swapping,
# including reclaimable page cache; it is a better health signal than "free"
AVAILABLE_MEM=$(echo "$MEM_INFO" | awk '/^Mem:/ {print $7}')
SWAP_TOTAL=$(echo "$MEM_INFO" | awk '/^Swap:/ {print $2}')
SWAP_USED=$(echo "$MEM_INFO" | awk '/^Swap:/ {print $3}')

echo "TOTAL_MEM=${TOTAL_MEM}MB"
echo "USED_MEM=${USED_MEM}MB"
echo "FREE_MEM=${FREE_MEM}MB"
echo "AVAILABLE_MEM=${AVAILABLE_MEM}MB"
echo "SWAP_TOTAL=${SWAP_TOTAL}MB"
echo "SWAP_USED=${SWAP_USED}MB"

# Example alerting logic for swap (skipped when no swap is configured)
if [ "$SWAP_TOTAL" -gt 0 ]; then
    SWAP_PERCENT=$(awk "BEGIN {printf \"%.2f\", ($SWAP_USED / $SWAP_TOTAL) * 100}")
    if (( $(echo "$SWAP_PERCENT > 5.0" | bc -l) )); then
        echo "ALERT: High swap usage detected: ${SWAP_PERCENT}% (${SWAP_USED}MB / ${SWAP_TOTAL}MB)"
    fi
fi
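Swap usage alone does not show whether the server is actively swapping right now; a few hundred MB may have been swapped out days ago and never touched again. The kernel's cumulative `pswpin`/`pswpout` page counters in /proc/vmstat can be sampled twice to get a rate (vmstat reports the same activity in its si/so columns). This is a minimal sketch; `swap_rate` is a hypothetical helper name.

```bash
#!/bin/bash

# swap_rate (hypothetical helper): pages swapped in/out per second between two samples.
# $1, $2: contents of /proc/vmstat taken $3 seconds apart.
# pswpin/pswpout are cumulative page counts since boot.
swap_rate() {
    local in1 out1 in2 out2
    in1=$(awk '$1 == "pswpin" {print $2}' <<< "$1")
    out1=$(awk '$1 == "pswpout" {print $2}' <<< "$1")
    in2=$(awk '$1 == "pswpin" {print $2}' <<< "$2")
    out2=$(awk '$1 == "pswpout" {print $2}' <<< "$2")
    echo "swap_in_pages_per_sec=$(( (in2 - in1) / $3 )) swap_out_pages_per_sec=$(( (out2 - out1) / $3 ))"
}
```

Usage: `V1=$(cat /proc/vmstat); sleep 10; V2=$(cat /proc/vmstat); swap_rate "$V1" "$V2" 10`. Sustained non-zero swap-in is usually the more urgent signal, because it means processes are waiting on disk to get their memory back.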

Disk I/O and Space

Disk bottlenecks are common. Monitoring I/O wait times and disk space utilization is crucial. For Magento, logs and temporary files can consume significant space. Alerting on disk usage exceeding 85-90% is standard.

Checking disk space:

#!/bin/bash

# Monitor specific Magento directories and general root
DIRS_TO_MONITOR=("/var/log" "/tmp" "/var/www/html/var/log" "/var/www/html/var/cache" "/var/www/html/var/session")
THRESHOLD=85 # Percentage

for DIR in "${DIRS_TO_MONITOR[@]}"; do
    if [ -d "$DIR" ]; then
        # -P (POSIX format) keeps each filesystem on one line even with long device names
        USAGE=$(df -P "$DIR" | awk 'NR==2 {print $5}' | sed 's/%//')
        echo "Disk usage for $DIR: $USAGE%"
        if [ "$USAGE" -ge "$THRESHOLD" ]; then
            echo "ALERT: Disk space critically low on $DIR: $USAGE%"
        fi
    else
        echo "WARNING: Directory $DIR does not exist."
    fi
done

# General root filesystem check
ROOT_USAGE=$(df -P / | awk 'NR==2 {print $5}' | sed 's/%//')
echo "Disk usage for /: $ROOT_USAGE%"
if [ "$ROOT_USAGE" -ge "$THRESHOLD" ]; then
    echo "ALERT: Disk space critically low on /: $ROOT_USAGE%"
fi
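The check above measures blocks only. Magento's file-based cache and session storage (var/cache, var/session) can create very large numbers of small files, so a filesystem can run out of inodes while block usage still looks healthy. A sketch of an inode check over `df -Pi` output follows; `inode_alerts` is a hypothetical helper name and the threshold is passed in the same way as the percentage above.

```bash
#!/bin/bash

# inode_alerts (hypothetical helper): reads "df -Pi" output on stdin and prints
# "<mount point> <IUse%>" for every filesystem whose inode usage is >= $1 percent.
# Filesystems that do not report inode usage (IUse% shown as "-") are skipped.
inode_alerts() {
    awk -v limit="$1" 'NR > 1 && $5 != "-" {
        pct = $5; sub(/%/, "", pct)
        if (pct + 0 >= limit) print $6, $5
    }'
}
```

Usage: `df -Pi | inode_alerts 85`. Any output line is a candidate for an alert.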

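For disk I/O, iostat (from the sysstat package) is invaluable. Metrics such as %util (percentage of time the device was busy) and await (average time, in milliseconds, that I/O requests spend queued and being serviced) can pinpoint disk contention. On SSD and NVMe devices, which serve many requests in parallel, %util can reach 100% well before the device is saturated, so give await more weight there.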

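#!/bin/bash

# Take two 5-second samples: the first report covers the period since boot, so only the
# second reflects current activity. Columns are located by header name because their
# positions (and the "await" vs "r_await"/"w_await" split) differ between sysstat versions.
echo "Disk I/O Statistics:"
LC_ALL=C iostat -dx 5 2 | awk '
    /^Device/ { report++; for (i = 1; i <= NF; i++) col[$i] = i; next }
    report == 2 && NF > 1 {
        # Older sysstat prints a combined "await"; newer versions only have r_await/w_await
        if ("await" in col) wait = $(col["await"])
        else wait = ($(col["r_await"]) > $(col["w_await"])) ? $(col["r_await"]) : $(col["w_await"])
        print $1, $(col["%util"]), wait
    }' | while read -r DEVICE UTIL WAIT; do
    echo "  Device: $DEVICE, %util: $UTIL, await: $WAIT ms"
    # Example alerting thresholds (adjust based on your storage performance)
    if (( $(echo "$UTIL > 90.0" | bc -l) )); then
        echo "ALERT: High disk utilization on $DEVICE: $UTIL%"
    fi
    if (( $(echo "$WAIT > 50.0" | bc -l) )); then
        echo "ALERT: High disk await time on $DEVICE: $WAIT ms"
    fi
done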

Magento 2 Specific Monitoring

Magento 2’s architecture, with its complex dependency injection, compilation, and caching layers, requires application-aware monitoring.

Web Server (Nginx/Apache) Status

Monitor web server response times, error rates (4xx, 5xx), and active connections. For Nginx, the status module is essential.

# Nginx configuration to enable status module
server {
    listen 127.0.0.1:8080; # Non-standard port, bound to loopback only
    server_name your_monitoring_domain.com;

    location /nginx_status {
        stub_status; # Requires ngx_http_stub_status_module; nginx older than 1.7.5 needs "stub_status on;"
        allow 127.0.0.1; # Restrict access to localhost for security
        deny all;
    }
}

You can then fetch this data using curl:

#!/bin/bash

STATUS_URL="http://localhost:8080/nginx_status"

# -f makes curl return nothing on HTTP errors (403/404); --max-time keeps the check from hanging
RESPONSE=$(curl -sf --max-time 5 "$STATUS_URL")

if [ -z "$RESPONSE" ]; then
    echo "ALERT: Failed to retrieve Nginx status from $STATUS_URL"
    exit 1
fi

# stub_status output has this layout:
#   Active connections: 291
#   server accepts handled requests
#    16630948 16630948 31070465
#   Reading: 6 Writing: 179 Waiting: 106
ACTIVE_CONNECTIONS=$(echo "$RESPONSE" | awk '/^Active connections:/ {print $3}')
TOTAL_REQUESTS=$(echo "$RESPONSE" | awk 'NR == 3 {print $3}') # Cumulative since Nginx started
READING=$(echo "$RESPONSE" | awk '/^Reading:/ {print $2}')
WRITING=$(echo "$RESPONSE" | awk '/^Reading:/ {print $4}')
WAITING=$(echo "$RESPONSE" | awk '/^Reading:/ {print $6}')

echo "Nginx Active Connections: $ACTIVE_CONNECTIONS"
echo "Nginx Total Requests: $TOTAL_REQUESTS"
echo "Nginx Reading: $READING"
echo "Nginx Writing: $WRITING"
echo "Nginx Waiting: $WAITING"

# Example alerting: High active connections
if [ "$ACTIVE_CONNECTIONS" -gt 1000 ]; then # Adjust threshold
    echo "ALERT: High Nginx active connections: $ACTIVE_CONNECTIONS"
fi
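The requests counter in stub_status is cumulative, so a single reading says little about current traffic. Sampling it twice gives a request rate, and the gap between the "accepts" and "handled" counters shows connections Nginx dropped because a resource limit such as worker_connections was reached. This is a sketch; `nginx_rates` is a hypothetical helper name.

```bash
#!/bin/bash

# nginx_rates (hypothetical helper): $1, $2 = stub_status output captured $3 seconds apart.
# The third line holds the cumulative "accepts handled requests" counters.
nginx_rates() {
    local r1 a2 h2 r2
    read -r _ _ r1 <<< "$(awk 'NR == 3' <<< "$1")"
    read -r a2 h2 r2 <<< "$(awk 'NR == 3' <<< "$2")"
    # accepts - handled is non-zero only if Nginx had to drop connections
    echo "requests_per_sec=$(( (r2 - r1) / $3 )) dropped_total=$(( a2 - h2 ))"
}
```

Usage: `S1=$(curl -sf http://localhost:8080/nginx_status); sleep 10; S2=$(curl -sf http://localhost:8080/nginx_status); nginx_rates "$S1" "$S2" 10`. A growing `dropped_total` is worth an alert on its own.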

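For Apache, use mod_status in the same way. Beyond these basic metrics, monitor the rate of 5xx responses. Status codes are recorded in the access logs, not the error logs, so this can be done by parsing the Nginx/Apache access logs or by shipping them to your monitoring system; Prometheus exporters such as `nginx-prometheus-exporter` and `apache_exporter` cover the status-page metrics described above.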
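A minimal sketch of the access-log approach, assuming the default "combined" log format used by both Nginx and Apache, where the status code is the ninth whitespace-separated field. `http_5xx_percent` is a hypothetical helper name; custom log formats need a different field number.

```bash
#!/bin/bash

# http_5xx_percent (hypothetical helper): reads access-log lines in "combined" format on stdin
# and prints the share of 5xx responses as a percentage. Lines without a 3-digit status are ignored.
http_5xx_percent() {
    awk '$9 ~ /^[0-9][0-9][0-9]$/ { total++; if ($9 ~ /^5/) errors++ }
         END { if (total) printf "%.1f\n", 100 * errors / total; else print "0.0" }'
}
```

Usage: `tail -n 10000 /var/log/nginx/access.log | http_5xx_percent` (the log path is the Debian/Ubuntu default; adjust it to your vhost). Looking at a fixed number of recent lines keeps the check cheap on busy stores.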

PHP-FPM Performance

Magento 2 is PHP-intensive. PHP-FPM’s performance is critical. Monitor the number of active processes, idle processes, and request queue length. Ensure your `pm.max_children` is set appropriately for your server’s RAM and expected load.

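; PHP-FPM pool configuration (e.g., /etc/php/8.1/fpm/pool.d/www.conf)
; Use ";" for comments: "#" comments are not supported in PHP ini-style files since PHP 7.0
pm = dynamic
pm.max_children = 100       ; Adjust based on RAM and CPU
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
pm.process_idle_timeout = 10s ; Only used when pm = ondemand; ignored with pm = dynamic
pm.max_requests = 500       ; Restart each worker after N requests to contain memory growth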
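A common rule of thumb for sizing `pm.max_children` (not an official formula) is the memory you can dedicate to PHP-FPM divided by the average resident size of one worker. The sketch below assumes the Debian/Ubuntu process name `php-fpm8.1`, matching the pool path above; both helper names are hypothetical.

```bash
#!/bin/bash

# avg_fpm_rss_mb (hypothetical helper): average resident memory of php-fpm processes in MB.
# Assumes the Debian/Ubuntu process name "php-fpm8.1"; ps reports RSS in KiB.
avg_fpm_rss_mb() {
    ps -C php-fpm8.1 -o rss= | awk '{ sum += $1; n++ } END { if (n) printf "%d\n", sum / n / 1024 }'
}

# suggest_max_children (hypothetical helper): $1 = MB reserved for PHP-FPM, $2 = average worker MB
suggest_max_children() {
    echo $(( $1 / $2 ))
}
```

For example, with 6000 MB left for PHP after MySQL, Redis, and the operating system, and workers averaging 75 MB, `suggest_max_children 6000 75` gives 80. Measure under real traffic and keep some headroom, since individual Magento workers can grow well beyond the average during heavy requests.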

PHP-FPM’s status page can provide real-time metrics. Enable it in your pool configuration:

; /etc/php/8.1/fpm/pool.d/www.conf
pm.status_path = /fpm_status
listen.allowed_clients = 127.0.0.1 ; Applies only to TCP listeners: limits which FastCGI clients (web servers) may connect

And configure your web server to proxy to it:

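# In your Nginx site configuration; an exact-match location takes precedence over
# regex locations such as the PHP handler in Magento's sample Nginx config
location = /fpm_status {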
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php/php8.1-fpm.sock; # Adjust path to your PHP-FPM socket
    allow 127.0.0.1;
    deny all;
}

Then, fetch the status:

#!/bin/bash

# "json" makes the output easy to parse; add "&full" for per-process details
STATUS_URL="http://localhost/fpm_status?json"
PM_MAX_CHILDREN=100 # Keep in sync with pm.max_children; the status page does not report it

RESPONSE=$(curl -sf --max-time 5 "$STATUS_URL")

if [ -z "$RESPONSE" ]; then
    echo "ALERT: Failed to retrieve PHP-FPM status from $STATUS_URL"
    exit 1
fi

# Keys in the FPM JSON status output contain spaces, so they must be quoted in jq
ACTIVE_PROCESSES=$(echo "$RESPONSE" | jq '."active processes"')
IDLE_PROCESSES=$(echo "$RESPONSE" | jq '."idle processes"')
TOTAL_PROCESSES=$(echo "$RESPONSE" | jq '."total processes"')
QUEUE_LENGTH=$(echo "$RESPONSE" | jq '."listen queue"')
MAX_CHILDREN_REACHED=$(echo "$RESPONSE" | jq '."max children reached"') # Counter since FPM start

echo "PHP-FPM Active Processes: $ACTIVE_PROCESSES"
echo "PHP-FPM Idle Processes: $IDLE_PROCESSES"
echo "PHP-FPM Total Processes: $TOTAL_PROCESSES (pm.max_children: $PM_MAX_CHILDREN)"
echo "PHP-FPM Queue Length: $QUEUE_LENGTH"
echo "PHP-FPM Max Children Reached: $MAX_CHILDREN_REACHED"

# Example alerting
if [ "$ACTIVE_PROCESSES" -ge "$PM_MAX_CHILDREN" ] && [ "$QUEUE_LENGTH" -gt 0 ]; then
    echo "ALERT: PHP-FPM pool is saturated. Active processes: $ACTIVE_PROCESSES, Max children: $PM_MAX_CHILDREN, Queue: $QUEUE_LENGTH"
fi
if [ "$QUEUE_LENGTH" -gt 10 ]; then # Adjust threshold
    echo "ALERT: PHP-FPM request queue is growing: $QUEUE_LENGTH"
fi

Magento 2 Specific Checks

Beyond server and web server metrics, we need to check Magento’s own health. This includes checking for deadlocks, slow database queries, and cache health.

Database Health (MySQL/MariaDB)

Magento is heavily database-dependent. Monitor slow query logs, connection counts, and InnoDB status. Look for deadlocks and long-running queries.

-- Connect to MySQL from a shell (-p without a value prompts for the password instead of exposing it)
mysql -u your_user -p -h your_db_host

-- Check current connections
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Max_used_connections';

-- Check for active queries (can be resource intensive on busy servers)
SHOW FULL PROCESSLIST;

-- Check InnoDB status for deadlocks and buffer pool usage
SHOW ENGINE INNODB STATUS\G
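Connection counts are easiest to judge relative to `max_connections`. The sketch below combines both values from the mysql client's batch output (`-N -B` prints tab-separated name/value rows without a header); `connection_usage` is a hypothetical helper name.

```bash
#!/bin/bash

# connection_usage (hypothetical helper): reads "name<TAB>value" rows on stdin and prints
# Threads_connected as an integer percentage of max_connections.
connection_usage() {
    awk -F'\t' 'tolower($1) == "threads_connected" { c = $2 }
                tolower($1) == "max_connections"   { m = $2 }
                END { if (m > 0) printf "%d\n", 100 * c / m; else exit 1 }'
}
```

Usage: `mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Threads_connected'; SHOW GLOBAL VARIABLES LIKE 'max_connections';" | connection_usage`. Alerting somewhere around 80% leaves time to react before Magento starts failing with "Too many connections" errors.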

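You can script checks for deadlocks by parsing the InnoDB status output and looking for the “LATEST DETECTED DEADLOCK” section. That section only describes the most recent deadlock and stays in the output until the server restarts, so a useful check compares it with the previous run rather than alerting on its mere presence.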

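#!/bin/bash

# Credentials come from a MySQL option file ([client] user/password/host) so the password
# does not appear in the process list; the path below is only an example.
MYSQL=(mysql --defaults-extra-file=/etc/monitoring/mysql-monitor.cnf)
STATE_FILE="/var/tmp/innodb_last_deadlock" # Example path used to remember the last reported deadlock

INNODB_STATUS=$("${MYSQL[@]}" -e "SHOW ENGINE INNODB STATUS\G")

# Only alert when the deadlock section differs from the one seen on the previous run
DEADLOCK_SECTION=$(echo "$INNODB_STATUS" | grep -A 20 "LATEST DETECTED DEADLOCK")
if [ -n "$DEADLOCK_SECTION" ] && [ "$DEADLOCK_SECTION" != "$(cat "$STATE_FILE" 2>/dev/null)" ]; then
    echo "ALERT: New InnoDB deadlock detected!"
    echo "$DEADLOCK_SECTION" # Print relevant section
    echo "$DEADLOCK_SECTION" > "$STATE_FILE"
fi

# Check for long running queries (example: > 60 seconds), ignoring idle connections, background
# and replication threads; -N -B drops the header row, so empty output means "none found"
LONG_QUERIES=$("${MYSQL[@]}" -N -B -e "SELECT id, user, host, db, command, time, state, info FROM information_schema.processlist WHERE time > 60 AND command NOT IN ('Sleep', 'Daemon', 'Binlog Dump', 'Binlog Dump GTID') AND user <> 'system user' ORDER BY time DESC;")

if [ -n "$LONG_QUERIES" ]; then
    echo "ALERT: Long running queries detected:"
    echo "$LONG_QUERIES"
fi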

Magento Cache Status

Magento relies heavily on its cache. While Redis is often used for this, ensuring the cache is functional and not stale is important. A simple check is to ensure cache types are enabled and that cache entries are being updated.

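#!/bin/bash

MAGENTO_ROOT="/var/www/html" # Adjust to your Magento installation path
BIN_MAGENTO="${MAGENTO_ROOT}/bin/magento"

# Run as the web server user (e.g. sudo -u www-data) so generated files keep the right ownership.
# cache:status lists each cache type as "<type>: 1" (enabled) or "<type>: 0" (disabled).
if ! CACHE_STATUS=$("$BIN_MAGENTO" cache:status); then
    echo "ALERT: bin/magento cache:status failed"
    exit 1
fi
DISABLED_TYPES=$(echo "$CACHE_STATUS" | awk -F': ' '$2 == "0" { gsub(/ /, "", $1); print $1 }')
if [ -z "$DISABLED_TYPES" ]; then
    echo "Magento Cache: all cache types enabled"
else
    echo "ALERT: Disabled Magento cache types: $(echo "$DISABLED_TYPES" | tr '\n' ' ')"
fi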

# Verifying that cache entries are actually written and served (not just that types are enabled)
# usually involves application-level checks or synthetic transactions.
# A simpler approach is to monitor cache hit/miss ratios if your monitoring system can instrument it.

# Example: Attempt to clear a specific cache type and monitor for errors
# This is a disruptive check, use with caution in production.
# echo "Attempting to clear configuration cache..."
# $BIN_MAGENTO cache:clean config
# if [ $? -ne 0 ]; then
#     echo "ALERT: Failed to clear Magento configuration cache."
# fi
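If Magento's cache backend is Redis, a rough hit ratio can be taken from the `keyspace_hits` and `keyspace_misses` counters in `redis-cli INFO stats`. Both counters are server-wide and cumulative since start-up (or since `CONFIG RESETSTAT`), so if sessions share the instance the ratio mixes both workloads. `redis_hit_ratio` is a hypothetical helper name.

```bash
#!/bin/bash

# redis_hit_ratio (hypothetical helper): reads "redis-cli INFO stats" output on stdin
# (CRLF line endings) and prints keyspace_hits / (hits + misses) as a percentage.
redis_hit_ratio() {
    tr -d '\r' | awk -F: '$1 == "keyspace_hits" { h = $2 } $1 == "keyspace_misses" { m = $2 }
        END { if (h + m > 0) printf "%.1f\n", 100 * h / (h + m); else exit 1 }'
}
```

Usage: `redis-cli INFO stats | redis_hit_ratio`. A sudden drop after a deployment often means the cache was flushed and is still warming up; a ratio that stays low points to keys being evicted or never reused.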

Redis Cluster Monitoring

For Magento 2, Redis is typically used for session storage and caching. A highly available Redis setup (Redis Sentinel or Redis Cluster) adds complexity but also resilience. Monitoring its health is critical.

Redis Server Health

Monitor Redis’s memory usage, connected clients, and command latency. Ensure persistence (RDB/AOF) is configured and working if required.

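#!/bin/bash

REDIS_CLI="redis-cli" # Adjust path, host (-h) and port (-p) if needed

# INFO output uses CRLF line endings, so strip the CR before extracting values
# Monitor memory usage
MEMORY_USAGE=$($REDIS_CLI INFO memory | tr -d '\r' | awk -F: '/^used_memory_human:/ {print $2}')
echo "Redis Memory Usage: $MEMORY_USAGE"

# Monitor connected clients
CONNECTED_CLIENTS=$($REDIS_CLI INFO clients | tr -d '\r' | awk -F: '/^connected_clients:/ {print $2}')
echo "Redis Connected Clients: $CONNECTED_CLIENTS"

# Latency spikes: LATENCY LATEST reports the latest spike per event class, but only after
# latency monitoring is enabled, e.g. CONFIG SET latency-monitor-threshold 100 (milliseconds).
# $REDIS_CLI LATENCY LATEST
# For interactive sampling, "redis-cli --latency" runs until interrupted, so it is unsuitable here.

# Basic responsiveness check: time one PING round trip (GNU date with nanoseconds).
# This includes redis-cli start-up time, so treat the value as an upper bound.
START_NS=$(date +%s%N)
PONG=$($REDIS_CLI PING 2>&1)
END_NS=$(date +%s%N)
if [ "$PONG" != "PONG" ]; then
    echo "ALERT: Redis did not answer PING (got: $PONG)"
    exit 1
fi
RESPONSE_TIME=$(( (END_NS - START_NS) / 1000000 ))
echo "Redis PING round trip: ${RESPONSE_TIME} ms"

if [ "$RESPONSE_TIME" -gt 50 ]; then # Adjust threshold
    echo "ALERT: High Redis latency detected: ${RESPONSE_TIME} ms"
fi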
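The same INFO output also covers the memory and persistence points above. The sketch below reads full `redis-cli INFO` output and reports memory above 90% of `maxmemory`, any evicted keys (on an instance holding Magento sessions, an eviction means a customer loses their session), a failed last RDB save, or a failed last AOF write. The helper name `redis_info_issues` and the 90% threshold are assumptions.

```bash
#!/bin/bash

# redis_info_issues (hypothetical helper): reads "redis-cli INFO" output on stdin
# (CRLF line endings) and prints one line per problem found; no output means healthy.
redis_info_issues() {
    tr -d '\r' | awk -F: '
        { v[$1] = $2 }
        END {
            if (v["maxmemory"] > 0 && v["used_memory"] / v["maxmemory"] > 0.9)
                printf "memory at %.0f%% of maxmemory\n", 100 * v["used_memory"] / v["maxmemory"]
            if (v["evicted_keys"] > 0) print "evicted_keys=" v["evicted_keys"]
            if (("rdb_last_bgsave_status" in v) && v["rdb_last_bgsave_status"] != "ok") print "last RDB save failed"
            if (v["aof_enabled"] == 1 && v["aof_last_write_status"] != "ok") print "last AOF write failed"
        }'
}
```

Usage: `redis-cli INFO | redis_info_issues`. Note that `maxmemory` is 0 when no limit is configured, in which case the memory check is skipped and the system-level memory checks above apply instead.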

Redis Sentinel Monitoring

If using Redis Sentinel for high availability, monitor Sentinel’s health and its view of the master/replica status.

#!/bin/bash

SENTINEL_CLI=(redis-cli -p 26379) # Default Sentinel port

# Check Sentinel's own health first (e.g., if it's reachable)
if [ "$("${SENTINEL_CLI[@]}" PING 2>/dev/null)" != "PONG" ]; then
    echo "ALERT: Sentinel instance at $(hostname):26379 is unreachable."
    exit 1
fi

# When writing to a pipe, redis-cli prints SENTINEL replies as a flat list of alternating
# key/value lines; this helper prints every value stored under the key given in $1
field() { awk -v key="$1" 'NR % 2 == 1 { k = $0; next } k == key { print }'; }

# Names of all masters monitored by this Sentinel
for MASTER_NAME in $("${SENTINEL_CLI[@]}" SENTINEL masters | field name); do
    echo "Monitoring Master: $MASTER_NAME"

    # Get master status
    MASTER_INFO=$("${SENTINEL_CLI[@]}" SENTINEL master "$MASTER_NAME")
    for KEY in ip port flags num-slaves num-other-sentinels quorum; do
        echo "  $KEY: $(echo "$MASTER_INFO" | field "$KEY")"
    done

    # flags contains s_down (subjectively down) or o_down (objectively down) when the master is unreachable
    if echo "$MASTER_INFO" | field flags | grep -q "_down"; then
        echo "ALERT: Redis master '$MASTER_NAME' is marked as DOWN by Sentinel!"
    fi

    # Replica status (SENTINEL replicas needs Redis 5+; older versions use SENTINEL slaves)
    "${SENTINEL_CLI[@]}" SENTINEL replicas "$MASTER_NAME" |
        awk 'NR % 2 == 1 { k = $0; next } k == "ip" { ip = $0 } k == "port" { port = $0 } k == "flags" { print ip ":" port, $0 }' |
        while read -r REPLICA_ADDR REPLICA_FLAGS; do
            echo "  Replica $REPLICA_ADDR flags: $REPLICA_FLAGS"
            if [[ "$REPLICA_FLAGS" == *"_down"* || "$REPLICA_FLAGS" == *"disconnected"* ]]; then
                echo "ALERT: Redis replica at $REPLICA_ADDR for master '$MASTER_NAME' is marked as DOWN by Sentinel!"
            fi
        done
done

Redis Cluster Monitoring

If using Redis Cluster, monitor cluster state, node health, and slot distribution.

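#!/bin/bash

# Example port, adjust as needed; -c (follow redirects) is not required for CLUSTER commands
REDIS_CLUSTER_CLI=(redis-cli -p 7000)

# Check cluster state (CLUSTER INFO uses CRLF line endings, so strip the CR)
CLUSTER_INFO=$("${REDIS_CLUSTER_CLI[@]}" CLUSTER INFO | tr -d '\r')
if [ -z "$CLUSTER_INFO" ]; then
    echo "ALERT: Unable to query the Redis Cluster node on port 7000"
    exit 1
fi
echo "$CLUSTER_INFO"

# Check cluster nodes
CLUSTER_NODES=$("${REDIS_CLUSTER_CLI[@]}" CLUSTER NODES)
echo "$CLUSTER_NODES"

# Alert if cluster is not in 'ok' state
if ! echo "$CLUSTER_INFO" | grep -q "^cluster_state:ok$"; then
    echo "ALERT: Redis Cluster state is not OK!"
fi

# Alert if any node is flagged 'fail' or 'fail?' (flags are the third field of CLUSTER NODES)
if echo "$CLUSTER_NODES" | awk '$3 ~ /fail/ { found = 1 } END { exit !found }'; then
    echo "ALERT: One or more Redis Cluster nodes are marked as FAIL!"
fi

# Slot coverage: CLUSTER INFO reports how many of the 16384 hash slots are assigned and healthy,
# so no parsing of CLUSTER SLOTS is needed
SLOTS_ASSIGNED=$(echo "$CLUSTER_INFO" | awk -F: '/^cluster_slots_assigned:/ {print $2}')
SLOTS_OK=$(echo "$CLUSTER_INFO" | awk -F: '/^cluster_slots_ok:/ {print $2}')
if [ "$SLOTS_ASSIGNED" != "16384" ] || [ "$SLOTS_OK" != "16384" ]; then
    echo "ALERT: Incomplete slot coverage (assigned: $SLOTS_ASSIGNED, ok: $SLOTS_OK, expected: 16384)"
fi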

Alerting and Integration

The Bash scripts above provide the core logic for checks. For production environments, these scripts should be integrated into a comprehensive monitoring system like Prometheus with Alertmanager, Zabbix, Nagios, or Datadog. Key considerations include:

Thresholds: Tune alert thresholds based on your specific infrastructure performance and acceptable downtime.
Alert Fatigue: Avoid noisy alerts. Implement intelligent grouping, silencing, and escalation policies.
Actionable Alerts: Ensure alerts provide enough context (server name, metric, value, timestamp) to quickly diagnose the issue.
Synthetic Transactions: For deeper application health, implement synthetic transactions that simulate user actions (e.g., adding to cart, checkout) and monitor their success rate and response time.
Log Aggregation: Centralize logs from all servers (web, app, DB, Redis) using tools like ELK stack or Splunk for easier debugging.
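A full synthetic transaction (add to cart, checkout) needs browser automation or a scripted HTTP flow. As a much smaller starting point, the sketch below probes a single URL with curl's `--write-out` timing variables and evaluates the result; the helper names `probe_url` and `evaluate_probe` and the example limit are hypothetical.

```bash
#!/bin/bash

# probe_url (hypothetical helper): prints "<http_code> <seconds>" for one GET request.
# -L follows redirects (e.g. http -> https); curl prints 000 as the code if no response arrived.
probe_url() {
    curl -s -L -o /dev/null --max-time 10 -w '%{http_code} %{time_total}\n' "$1"
}

# evaluate_probe (hypothetical helper): $1 = output of probe_url, $2 = maximum acceptable seconds.
# Returns non-zero and prints an ALERT line when the probe fails or is too slow.
evaluate_probe() {
    local code secs
    read -r code secs <<< "$1"
    if [ "$code" != "200" ]; then
        echo "ALERT: probe returned HTTP $code"
        return 1
    fi
    if awk -v t="$secs" -v max="$2" 'BEGIN { exit !(t > max) }'; then
        echo "ALERT: probe took ${secs}s (limit ${2}s)"
        return 1
    fi
    echo "OK: HTTP $code in ${secs}s"
}
```

Usage: `evaluate_probe "$(probe_url https://shop.example.com/)" 2`, with your own storefront URL. Probing a category or product page in addition to the home page exercises more of Magento's catalog and cache layers.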

By implementing these detailed server and application-level monitoring practices, you can significantly improve the reliability and performance of your Magento 2 and Redis clusters on OVH, ensuring a stable and responsive e-commerce platform.
