Server Monitoring Best Practices: Keeping Your WordPress App and Redis Clusters Alive on OVH

Core WordPress & Redis Metrics for OVH Infrastructure

Maintaining high availability for WordPress applications, especially those leveraging Redis for caching and session management, requires a granular understanding of key performance indicators. On OVH infrastructure, this translates to monitoring not just the typical web server and database metrics, but also the specific health and performance of your Redis clusters. We’ll focus on actionable metrics that directly impact user experience and system stability.

WordPress Application-Level Monitoring

Beyond basic HTTP status codes and response times, deep application-level monitoring for WordPress is crucial. This involves understanding PHP-FPM performance, WordPress query execution, and potential bottlenecks within the application itself.

PHP-FPM Process Management

PHP-FPM’s process manager is a common bottleneck. Tracking active and idle processes, the listen queue, and how often the pool reaches pm.max_children shows whether the FPM pool is adequately sized for the current load. The status page enabled by `pm.status_path` exposes these figures.

Configuration Snippet (PHP-FPM pool file)

; /etc/php/8.1/fpm/pool.d/www.conf (example path)
[www]
user = www-data
group = www-data
listen = /run/php/php8.1-fpm.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0660

pm = dynamic
pm.max_children = 150
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
; pm.process_idle_timeout only takes effect with pm = ondemand; it is ignored with pm = dynamic
pm.process_idle_timeout = 10s
pm.max_requests = 500

; Enable status page (restrict access to this path at the web server)
pm.status_path = /fpm_status

Monitoring PHP-FPM Status with `curl`

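You can fetch this status page directly, provided the web server forwards `/fpm_status` to the PHP-FPM socket and limits it to trusted clients. For automated monitoring, tools like Prometheus with the `php-fpm_exporter` are ideal, but a quick `curl` check is useful for manual diagnostics.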

curl http://localhost/fpm_status

The plain-text output will look something like this (values are illustrative):

pool:                 www
process manager:      dynamic
start time:           01/Jan/2024:10:00:00 +0000
start since:          86400
accepted conn:        1234567
listen queue:         0
max listen queue:     0
listen queue len:     0
idle processes:       10
active processes:     25
total processes:      35
max active processes: 40
max children reached: 0
slow requests:        10

Key metrics to watch:

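active processes: Should stay well below pm.max_children (150 in the pool above). Spikes indicate high load. Sustained high values may call for a larger pm.max_children, provided the server has enough memory for the extra workers.
idle processes: With pm = dynamic, should sit between pm.min_spare_servers and pm.max_spare_servers. Too few idle processes mean new requests might wait for a worker.
slow requests: Counts requests that ran longer than request_slowlog_timeout, so it stays at 0 unless that directive is set. A rising count is a direct indicator of slow PHP execution.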
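For scripted checks, the status page can be reduced to a single utilization figure. The following is a minimal sketch, assuming the colon-separated `key: value` layout of the plain-text status page. The function names and the sample text are illustrative, and `max_children` has to be passed in because the status page does not report pm.max_children.

```python
def parse_fpm_status(text):
    """Parse the plain-text PHP-FPM status page into a dict of raw strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def pool_utilization(fields, max_children):
    """Fraction of pm.max_children that is currently busy (0.0 to 1.0)."""
    return int(fields["active processes"]) / max_children


# Illustrative sample, not captured from a real server.
SAMPLE = """\
pool:                 www
process manager:      dynamic
idle processes:       10
active processes:     25
total processes:      35
max children reached: 0
slow requests:        10
"""

if __name__ == "__main__":
    status = parse_fpm_status(SAMPLE)
    print(f"utilization: {pool_utilization(status, max_children=150):.1%}")
    print("max children reached:", status["max children reached"])
```

In practice the text would come from the `curl` call shown earlier rather than from a hard-coded sample.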

WordPress Query Performance

Slow database queries are a primary cause of sluggish WordPress sites. The Query Monitor plugin or WordPress's SAVEQUERIES constant can help identify them during development, but both add overhead. For production, we rely on external tools and database-level monitoring.

MySQL Slow Query Log

Ensure your MySQL server is configured to log slow queries. This is invaluable for pinpointing inefficient SQL statements generated by themes, plugins, or WordPress core.

# /etc/mysql/my.cnf or /etc/mysql/mysql.conf.d/mysqld.cnf
# Note: in MySQL option files only "#" may start a comment in the middle of a line.
[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time = 2  # Log queries taking longer than 2 seconds
log_queries_not_using_indexes = 1  # Optional; can be noisy on small tables

Analyze the log file using tools like pt-query-digest from Percona Toolkit for aggregated insights.

pt-query-digest /var/log/mysql/mysql-slow.log > /tmp/slow_query_report.txt
cat /tmp/slow_query_report.txt
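When Percona Toolkit is not installed, a rough first look at the log is still possible with a few lines of Python. This sketch assumes the standard MySQL slow-log header line (`# Query_time: ... Lock_time: ... Rows_sent: ... Rows_examined: ...`). The function name and the sample entries are hypothetical, and it is no substitute for the per-query aggregation pt-query-digest performs.

```python
import re

# Matches the per-entry header line MySQL writes to the slow query log.
HEADER = re.compile(
    r"^# Query_time: (?P<query_time>[\d.]+)\s+Lock_time: [\d.]+\s+"
    r"Rows_sent: \d+\s+Rows_examined: (?P<rows_examined>\d+)"
)


def summarize_slow_log(log_text):
    """Count logged queries and report the worst query time and row scan."""
    entries = []
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            entries.append((float(match["query_time"]), int(match["rows_examined"])))
    if not entries:
        return {"count": 0, "max_query_time": 0.0, "max_rows_examined": 0}
    return {
        "count": len(entries),
        "max_query_time": max(t for t, _ in entries),
        "max_rows_examined": max(r for _, r in entries),
    }


# Hypothetical log excerpt for illustration only.
SAMPLE_LOG = """\
# Time: 2024-01-01T10:00:00.000000Z
# User@Host: wp[wp] @ localhost []  Id:    12
# Query_time: 3.214000  Lock_time: 0.000120 Rows_sent: 10  Rows_examined: 250000
SET timestamp=1704103200;
SELECT option_name FROM wp_options WHERE autoload = 'yes';
# Time: 2024-01-01T10:00:05.000000Z
# User@Host: wp[wp] @ localhost []  Id:    13
# Query_time: 2.050000  Lock_time: 0.000090 Rows_sent: 1  Rows_examined: 48000
SET timestamp=1704103205;
SELECT COUNT(*) FROM wp_postmeta WHERE meta_key = '_thumbnail_id';
"""

if __name__ == "__main__":
    print(summarize_slow_log(SAMPLE_LOG))
```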

Redis Cluster Health and Performance

For WordPress sites using Redis for object caching, page caching, or session management, Redis performance is paramount. We need to monitor Redis itself, not just its presence.

Key Redis Metrics

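Use the redis-cli INFO command to get a wealth of information. Redis 7.0 and later accept several section names in one call, separated by spaces; older versions take one section per call. The commandstats section is not part of the default INFO output, so request it explicitly. For automated monitoring, tools like `redis_exporter` (for Prometheus) are essential.

redis-cli -h your_redis_host -p 6379 INFO memory stats persistence commandstats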

Focus on these sections:

Memory Usage

used_memory:123456789
used_memory_human:117.75M
used_memory_peak:150000000
used_memory_peak_human:143.05M
mem_fragmentation_ratio:1.20

used_memory: Current memory consumption. Watch for it approaching maxmemory.
used_memory_peak: Highest memory usage since startup.
mem_fragmentation_ratio: Ratio of the memory the operating system has given the Redis process (used_memory_rss) to the memory Redis has allocated for data (used_memory). A value significantly above 1.5 can indicate fragmentation, meaning the process occupies more RAM than the data needs. A value below 1 means part of Redis's memory has been swapped to disk, which causes severe latency and should be treated as urgent.
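The memory fields lend themselves to a small automated check. The sketch below parses raw INFO text and flags two conditions: usage close to maxmemory, and a fragmentation ratio outside the healthy range (a ratio below 1 is treated as a sign of swapping). The function names, the 85% and 1.5 thresholds, and the `maxmemory` value in the sample are assumptions chosen for illustration.

```python
def parse_info(text):
    """Turn `redis-cli INFO` output into a dict of raw string values."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and section headers
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value
    return info


def assess_memory(info, usage_warn_pct=85.0, frag_warn=1.5):
    """Return a list of human-readable findings for the memory section."""
    findings = []
    used = int(info["used_memory"])
    maxmemory = int(info.get("maxmemory", "0"))  # 0 means no limit is configured
    if maxmemory > 0:
        pct = used / maxmemory * 100
        if pct >= usage_warn_pct:
            findings.append(f"used_memory at {pct:.1f}% of maxmemory")
    ratio = float(info["mem_fragmentation_ratio"])
    if ratio < 1.0:
        findings.append(f"fragmentation ratio {ratio:.2f} is below 1: likely swapping")
    elif ratio > frag_warn:
        findings.append(f"fragmentation ratio {ratio:.2f} exceeds {frag_warn}")
    return findings


# Sample values from the article, plus a hypothetical 128 MiB maxmemory.
SAMPLE = """\
# Memory
used_memory:123456789
used_memory_peak:150000000
mem_fragmentation_ratio:1.20
maxmemory:134217728
"""

if __name__ == "__main__":
    for finding in assess_memory(parse_info(SAMPLE)):
        print(finding)
```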

Cache Performance (Command Stats)

# commandstats
cmdstat_get:calls=1000000,usec=5000000,usec_per_call=5.00
cmdstat_set:calls=500000,usec=750000,usec_per_call=1.50
cmdstat_del:calls=10000,usec=20000,usec_per_call=2.00

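cmdstat_get, cmdstat_set, cmdstat_del: These show the number of calls, the total execution time in microseconds, and the average time per call for key commands, accumulated since startup or the last CONFIG RESETSTAT. A usec_per_call for GET or SET that climbs above your own baseline indicates slow operations. For WordPress object caching, GET and SET are most critical.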
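Because each commandstats line is a comma-separated list of `key=value` pairs, the per-call average can be recomputed and compared against a threshold in a few lines. This is a sketch with hypothetical function names, and the 1.8 microsecond threshold exists only to make the example produce output.

```python
def parse_commandstats(text):
    """Map command name -> {"calls": ..., "usec": ..., ...} from INFO commandstats."""
    prefix = "cmdstat_"
    stats = {}
    for line in text.splitlines():
        if not line.startswith(prefix):
            continue
        name, _, payload = line.partition(":")
        pairs = (pair.split("=", 1) for pair in payload.split(","))
        stats[name[len(prefix):]] = {key: float(value) for key, value in pairs}
    return stats


def slow_commands(stats, threshold_usec):
    """Commands whose average time per call exceeds threshold_usec, sorted by name."""
    return sorted(
        cmd
        for cmd, s in stats.items()
        if s["calls"] > 0 and s["usec"] / s["calls"] > threshold_usec
    )


# The sample lines shown in the article.
SAMPLE = """\
cmdstat_get:calls=1000000,usec=5000000,usec_per_call=5.00
cmdstat_set:calls=500000,usec=750000,usec_per_call=1.50
cmdstat_del:calls=10000,usec=20000,usec_per_call=2.00
"""

if __name__ == "__main__":
    print(slow_commands(parse_commandstats(SAMPLE), threshold_usec=1.8))
```

Since the counters are cumulative, comparing two snapshots taken some minutes apart gives a more current picture than a single reading.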

Evictions and Expirations

evicted_keys:15000
expired_keys:20000

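evicted_keys: Number of keys removed because of the maxmemory policy. High eviction rates mean your cache is too small for the workload, leading to frequent cache misses and increased database load.
expired_keys: Number of keys removed because their TTL elapsed. This is normal cache turnover and not a problem by itself.

Both values are cumulative counters from the stats section, so watch how fast they grow rather than their absolute size.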
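Turning a cumulative counter such as evicted_keys into a rate takes two readings and the time between them. A minimal sketch follows. The function name is hypothetical, and treating a decrease as a counter reset (for example after a Redis restart) is an assumption about how you want to handle that case.

```python
def per_second_rate(previous, current, interval_seconds):
    """Average per-second increase of a cumulative counter between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current - previous
    if delta < 0:
        # The counter went backwards, so assume it restarted from zero.
        delta = current
    return delta / interval_seconds


if __name__ == "__main__":
    # Hypothetical readings of evicted_keys taken 60 seconds apart.
    print(per_second_rate(previous=15000, current=15600, interval_seconds=60))
```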

Persistence (RDB/AOF)

rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:120
aof_enabled:0
aof_last_write_status:ok

rdb_last_bgsave_status: Should be ‘ok’. Failures here mean your data might not be backed up correctly.
rdb_last_bgsave_time_sec: Time taken for the last background save. Long durations can impact performance.
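These persistence fields can be checked mechanically once INFO output has been parsed into a dictionary of strings. The sketch below is one possible shape for such a check. The function name and the 60-second limit are assumptions to tune against your own dataset size.

```python
def persistence_problems(info, max_bgsave_seconds=60):
    """Return a list of problems found in the persistence section of INFO."""
    problems = []
    if info.get("rdb_last_bgsave_status") != "ok":
        problems.append("last RDB background save did not succeed")
    # Redis reports -1 when no background save has completed yet.
    duration = int(info.get("rdb_last_bgsave_time_sec", "-1"))
    if duration > max_bgsave_seconds:
        problems.append(f"last RDB background save took {duration}s")
    if info.get("aof_enabled") == "1" and info.get("aof_last_write_status") != "ok":
        problems.append("last AOF write did not succeed")
    return problems


if __name__ == "__main__":
    # The sample values shown in the article.
    sample = {
        "rdb_last_bgsave_status": "ok",
        "rdb_last_bgsave_time_sec": "120",
        "aof_enabled": "0",
        "aof_last_write_status": "ok",
    }
    print(persistence_problems(sample))
```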

Redis Cluster Specifics (OVH Managed Redis)

If you are using OVH’s managed Redis service, you’ll have access to their specific monitoring dashboards. These often abstract away the raw `redis-cli` output but provide high-level views of:

Node health (CPU, Memory, Network I/O)
Replication status (if applicable)
Connection counts
Latency

Crucially, check OVH’s documentation for their specific alerts and recommended thresholds. For instance, sustained high CPU usage on a Redis node can indicate expensive commands (such as KEYS or reads of very large values) or a need for scaling.

Infrastructure-Level Monitoring on OVH

OVH provides its own suite of monitoring tools for your instances and services. Leveraging these is the first line of defense.

Instance Metrics

OVH’s control panel offers real-time and historical data for:

CPU Utilization: Sustained high CPU on WordPress web servers can indicate application bottlenecks or insufficient resources.
Network Traffic: Spikes can indicate DDoS attacks or unexpected traffic surges.
Disk I/O: High I/O wait times on database servers or WordPress instances serving many static assets can point to storage performance issues.
Memory Usage: Crucial for both web servers and database servers. OOM (Out-Of-Memory) killer events are catastrophic.

Load Balancer Metrics (if applicable)

If you’re using an OVH Load Balancer (e.g., HAProxy-based), monitor:

Backend health checks: Ensure all your WordPress instances are reported as healthy.
Request rates: Understand traffic patterns.
Response times: Identify latency introduced by the LB or backend servers.
Connection states: Monitor for connection errors or timeouts.

Alerting Strategy

Effective alerting is about notifying the right people about the right problems at the right time, without causing alert fatigue. Define clear thresholds based on historical data and expected performance.
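One way to derive a threshold from historical data is to take a high percentile of past readings and add headroom, so that ordinary peaks do not page anyone. The following is a sketch of that idea only. The function name, the choice of the 95th percentile, and the 20% headroom are assumptions, not recommendations from OVH or any monitoring vendor.

```python
import statistics


def suggest_threshold(samples, headroom=1.2):
    """Suggest an alert threshold: the 95th percentile of samples times headroom."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(samples, n=20)[-1]
    return p95 * headroom


if __name__ == "__main__":
    # Hypothetical history of active PHP-FPM processes, one reading per minute.
    history = [22, 25, 31, 28, 40, 35, 27, 90, 33, 29, 26, 38]
    print(f"suggested threshold: {suggest_threshold(history):.0f}")
```

A threshold derived this way should still be capped by hard limits such as pm.max_children or maxmemory.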

Example Alerting Rules (Conceptual, for Prometheus/Alertmanager)

These are conceptual examples in the YAML rule-file format used by Prometheus 2.0 and later. The metric names are placeholders: each exporter names its series differently, so map them to what your monitoring stack actually exposes.

groups:
  - name: wordpress-redis
    rules:
      # Alert: High PHP-FPM Active Processes (80% of pm.max_children = 150)
      - alert: PHPFPMHighActiveProcesses
        expr: php_fpm_active_processes{pool="www"} > 120
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PHP-FPM pool 'www' has high active processes ({{ $value }})."
          description: "The number of active PHP-FPM processes for pool 'www' is exceeding the threshold. This may indicate a performance bottleneck or insufficient pool size."

      # Alert: Redis Memory Usage Approaching Limit
      # The second clause skips instances with no maxmemory configured (value 0).
      - alert: RedisMemoryHigh
        expr: redis_memory_used_bytes / redis_max_memory_bytes * 100 > 85 and redis_max_memory_bytes > 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: 'Redis memory usage is high ({{ $value | printf "%.2f" }}%).'
          description: "Redis memory usage has exceeded 85% of the configured maxmemory. Evictions may occur, impacting cache performance."

      # Alert: Slow Redis GET Operations (tune the threshold to your own baseline)
      - alert: RedisSlowGet
        expr: rate(redis_commandstats_usec_total{command="get"}[5m]) / rate(redis_commandstats_calls_total{command="get"}[5m]) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'Average Redis GET operation time is high ({{ $value | printf "%.2f" }} usec).'
          description: "Average time for Redis GET commands is exceeding 5 microseconds. This could slow down WordPress object retrieval."

      # Alert: MySQL Slow Query Rate
      # rate() is per second, so multiply by 60 to compare against a per-minute threshold.
      - alert: MySQLSlowQueryRate
        expr: rate(mysql_slow_queries_total[5m]) * 60 > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High rate of MySQL slow queries detected."
          description: "The rate of slow MySQL queries has exceeded 5 per minute. Investigate slow query log for details."

Conclusion

A robust monitoring strategy for WordPress and Redis on OVH involves a multi-layered approach. Start with OVH’s native tools, then layer on application-specific metrics for PHP-FPM and MySQL, and finally, dive deep into Redis performance. Automate data collection and alerting to proactively identify and resolve issues before they impact your users.