Server Monitoring Best Practices: Keeping Your WordPress App and Redis Clusters Alive on OVH
Core WordPress & Redis Metrics for OVH Infrastructure
Maintaining high availability for WordPress applications, especially those leveraging Redis for caching and session management, requires a granular understanding of key performance indicators. On OVH infrastructure, this translates to monitoring not just the typical web server and database metrics, but also the specific health and performance of your Redis clusters. We’ll focus on actionable metrics that directly impact user experience and system stability.
WordPress Application-Level Monitoring
Beyond basic HTTP status codes and response times, deep application-level monitoring for WordPress is crucial. This involves understanding PHP-FPM performance, WordPress query execution, and potential bottlenecks within the application itself.
PHP-FPM Process Management
PHP-FPM’s process manager is a common bottleneck. Monitoring the number of active and idle processes, together with the listen queue and the `max children reached` counter, shows whether your FPM pool is adequately sized for the current load. The status page enabled by `pm.status_path` exposes all of these.
Configuration Snippet (PHP-FPM pool configuration, www.conf)
; /etc/php/8.1/fpm/pool.d/www.conf (example path)
[www]
user = www-data
group = www-data
listen = /run/php/php8.1-fpm.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0660
pm = dynamic
pm.max_children = 150
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
; pm.process_idle_timeout only takes effect with pm = ondemand
pm.process_idle_timeout = 10s
pm.max_requests = 500
; Enable status page
pm.status_path = /fpm_status
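Before raising `pm.max_children`, it helps to check how many workers your RAM can actually hold. A common rule of thumb (a heuristic, not an official PHP-FPM formula) divides the memory left for PHP by the average worker RSS. The sketch below assumes you have measured that average yourself under realistic load; `suggest_max_children` and the example figures are hypothetical.

```python
"""Rule-of-thumb sizing for pm.max_children (hypothetical helper, not part of PHP-FPM)."""


def suggest_max_children(total_ram_mb: int, reserved_mb: int, avg_worker_rss_mb: float) -> int:
    """Return how many PHP-FPM workers fit in the RAM left over after `reserved_mb`.

    total_ram_mb:      RAM on the web node.
    reserved_mb:       memory kept for the OS, web server and any co-located services.
    avg_worker_rss_mb: average resident size of one PHP-FPM worker, measured on
                       your own server (an assumption in this sketch).
    """
    if avg_worker_rss_mb <= 0:
        raise ValueError("avg_worker_rss_mb must be positive")
    available_mb = total_ram_mb - reserved_mb
    if available_mb <= 0:
        return 0
    return int(available_mb // avg_worker_rss_mb)


if __name__ == "__main__":
    # Hypothetical 8 GB node, 2 GB reserved, workers averaging 60 MB RSS.
    print(suggest_max_children(8192, 2048, 60))  # 102
```

If the result is well below the configured `pm.max_children`, raising the limit would risk swapping or OOM kills rather than adding capacity.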
Monitoring PHP-FPM Status with `curl`
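You can fetch this status page directly, provided your web server routes `/fpm_status` to the PHP-FPM socket (restrict that location to localhost or your monitoring hosts so it isn’t publicly exposed). For automated monitoring, tools like Prometheus with the `php-fpm_exporter` are ideal, but a quick `curl` check is useful for manual diagnostics.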
curl http://localhost/fpm_status
The output will look something like this (abridged):
pool:                 www
process manager:      dynamic
accepted conn:        1234567
listen queue:         0
max listen queue:     0
idle processes:       10
active processes:     25
total processes:      35
max children reached: 0
slow requests:        10
Key metrics to watch:
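- `active processes`: Should ideally stay well below `pm.max_children`. Spikes indicate high load; sustained high values might require increasing `pm.max_children`, provided the server has memory for the extra workers.
- `idle processes`: With `pm = dynamic`, should stay within the `pm.min_spare_servers` to `pm.max_spare_servers` range. Too few idle processes mean new requests might wait for a worker to be spawned.
- `listen queue` and `max children reached`: A non-zero listen queue means requests are waiting for a free worker, and a rising `max children reached` counter means the pool has hit `pm.max_children`.
- `slow requests`: A direct indicator of performance issues within PHP execution. It only counts requests once `request_slowlog_timeout` is set for the pool.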
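For scripted checks, the text status page is easy to parse. The following is a minimal sketch that assumes the plain `key: value` format PHP-FPM prints (PHP-FPM can also return JSON if you append `?json` to the status URL). The function names, the 80% busy threshold and the sample text are illustrative assumptions; in practice you would feed it the body returned by `curl http://localhost/fpm_status`.

```python
"""Minimal PHP-FPM status checker (illustrative sketch, hypothetical helper names)."""


def parse_fpm_status(text: str) -> dict:
    """Parse 'key:   value' lines from the plain-text status page.

    Only the first colon splits key from value, so timestamps such as
    'start time: 01/Jan/2024:10:00:00 +0000' survive intact.
    """
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        status[key.strip()] = int(value) if value.isdigit() else value
    return status


def fpm_warnings(status: dict, max_children: int, busy_ratio: float = 0.8) -> list:
    """Return warnings; max_children must match pm.max_children for the pool."""
    warnings = []
    active = status.get("active processes", 0)
    if active >= busy_ratio * max_children:
        warnings.append(f"active processes {active} >= {busy_ratio:.0%} of pm.max_children ({max_children})")
    if status.get("listen queue", 0) > 0:
        warnings.append(f"{status['listen queue']} requests waiting in the listen queue")
    if status.get("max children reached", 0) > 0:
        warnings.append("pool has hit pm.max_children since PHP-FPM started")
    if status.get("slow requests", 0) > 0:
        warnings.append(f"{status['slow requests']} slow requests since PHP-FPM started")
    return warnings


SAMPLE = """pool:                 www
process manager:      dynamic
listen queue:         0
idle processes:       10
active processes:     25
total processes:      35
max children reached: 0
slow requests:        10
"""

if __name__ == "__main__":
    print(fpm_warnings(parse_fpm_status(SAMPLE), max_children=150))
    # ['10 slow requests since PHP-FPM started']
```

Because `max children reached` and `slow requests` are cumulative since PHP-FPM started, alerting on their rate of change is usually more useful than on their absolute value.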
WordPress Query Performance
Slow database queries are a primary cause of sluggish WordPress sites. Enabling the Query Monitor plugin, or defining `SAVEQUERIES` in `wp-config.php` (which records each query in `$wpdb->queries`), can help identify these during development or staging. For production, we rely on external tools and database-level monitoring.
MySQL Slow Query Log
Ensure your MySQL server is configured to log slow queries. This is invaluable for pinpointing inefficient SQL statements generated by themes, plugins, or WordPress core.
# /etc/mysql/my.cnf or /etc/mysql/mysql.conf.d/mysqld.cnf
[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
# Log queries taking longer than 2 seconds
long_query_time = 2
# Optional, but highly recommended; it can be noisy on busy sites
log_queries_not_using_indexes = 1
Analyze the log file using tools like pt-query-digest from Percona Toolkit for aggregated insights.
pt-query-digest /var/log/mysql/mysql-slow.log > /tmp/slow_query_report.txt
cat /tmp/slow_query_report.txt
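pt-query-digest is the right tool for a full report. When Percona Toolkit isn’t installed, a rough first look is possible by reading the `# Query_time:` header that MySQL writes before each logged statement. This sketch assumes that header layout and only summarises timings; it does not fingerprint or group queries the way pt-query-digest does. The script name in the usage comment is hypothetical.

```python
"""Quick summary of a MySQL slow query log (sketch; not a pt-query-digest replacement)."""
import re
import sys

# Header MySQL writes before each slow statement, e.g.
# "# Query_time: 2.500000  Lock_time: 0.000100 Rows_sent: 10  Rows_examined: 250000"
HEADER = re.compile(
    r"^# Query_time:\s+([\d.]+)\s+Lock_time:\s+([\d.]+)"
    r"\s+Rows_sent:\s+(\d+)\s+Rows_examined:\s+(\d+)"
)


def summarize_slow_log(lines) -> dict:
    """Count logged queries and report total/max Query_time and max Rows_examined."""
    count, total_sec, max_sec, max_examined = 0, 0.0, 0.0, 0
    for line in lines:
        match = HEADER.match(line)
        if not match:
            continue
        query_time = float(match.group(1))
        count += 1
        total_sec += query_time
        max_sec = max(max_sec, query_time)
        max_examined = max(max_examined, int(match.group(4)))
    return {
        "queries": count,
        "total_sec": round(total_sec, 3),
        "max_sec": max_sec,
        "max_rows_examined": max_examined,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python slow_summary.py /var/log/mysql/mysql-slow.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        print(summarize_slow_log(fh))
```

A high `max_rows_examined` relative to the rows returned usually points at a missing index; pt-query-digest will then show which query it was.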
Redis Cluster Health and Performance
For WordPress sites using Redis for object caching, page caching, or session management, Redis performance is paramount. We need to monitor Redis itself, not just its presence.
Key Redis Metrics
Use the redis-cli INFO command to get a wealth of information. For automated monitoring, tools like `redis_exporter` (for Prometheus) are essential.
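redis-cli -h your_redis_host -p 6379 INFO memory stats persistence commandstats
Passing several sections in one `INFO` call requires Redis 7.0 or later. On older versions, run `INFO <section>` once per section, or use `INFO all` (note that `commandstats` is not part of the default `INFO` output).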
Focus on these sections:
Memory Usage
used_memory:123456789
used_memory_human:117.74M
used_memory_peak:150000000
used_memory_peak_human:143.05M
mem_fragmentation_ratio:1.20
- `used_memory`: Current memory consumption (bytes allocated by Redis). Watch for it approaching `maxmemory`.
- `used_memory_peak`: Highest memory usage since startup.
- `mem_fragmentation_ratio`: Ratio of the memory the operating system has given Redis (`used_memory_rss`) to `used_memory`. A value significantly above 1.5 can indicate fragmentation, meaning Redis occupies more RAM than its data needs. A value below 1 means the operating system has swapped part of Redis’s memory to disk, which severely hurts latency and should be investigated immediately.
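The same checks can be scripted against raw `INFO` output. A minimal sketch, assuming the standard `key:value` lines with `#` section headers; the 85% and 1.5 thresholds follow the guidance in this article, the helper names are hypothetical, and the `maxmemory` value in the sample is made up (a `maxmemory` of 0 means no limit, so the usage check is skipped).

```python
"""Check Redis memory health from `redis-cli INFO memory` output (illustrative sketch)."""


def parse_info(text: str) -> dict:
    """Parse INFO output into a dict, converting numeric values where possible."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and "# Memory"-style section headers
        key, _, value = line.partition(":")
        for convert in (int, float):
            try:
                info[key] = convert(value)
                break
            except ValueError:
                continue
        else:
            info[key] = value  # non-numeric values such as "117.74M" stay strings
    return info


def memory_warnings(info: dict, usage_limit: float = 0.85) -> list:
    """Flag high usage against maxmemory and suspicious fragmentation ratios."""
    warnings = []
    used = info.get("used_memory", 0)
    maxmemory = info.get("maxmemory", 0)
    if maxmemory and used / maxmemory > usage_limit:
        warnings.append(f"used_memory is {used / maxmemory:.0%} of maxmemory")
    ratio = info.get("mem_fragmentation_ratio")
    if isinstance(ratio, (int, float)):
        if ratio > 1.5:
            warnings.append(f"fragmentation ratio {ratio} > 1.5")
        elif ratio < 1.0:
            warnings.append(f"fragmentation ratio {ratio} < 1: memory may be swapped out")
    return warnings


SAMPLE = """# Memory
used_memory:123456789
used_memory_human:117.74M
maxmemory:134217728
mem_fragmentation_ratio:1.20
"""

if __name__ == "__main__":
    print(memory_warnings(parse_info(SAMPLE)))  # ['used_memory is 92% of maxmemory']
```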
Cache Performance (Command Stats)
# Commandstats
cmdstat_get:calls=1000000,usec=5000000,usec_per_call=5.00
cmdstat_set:calls=500000,usec=750000,usec_per_call=1.50
cmdstat_del:calls=10000,usec=20000,usec_per_call=2.00
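`cmdstat_get`, `cmdstat_set`, `cmdstat_del`: These show the number of calls, the total CPU time in microseconds spent executing each command, and the average per call. The timing is server-side only and excludes network round trips. A high `usec_per_call` for GET or SET indicates slow operations; for WordPress object caching, GET and SET are the most critical. These counters are cumulative since startup (or the last `CONFIG RESETSTAT`) and say nothing about cache effectiveness: for the hit ratio, compare `keyspace_hits` and `keyspace_misses` in the `stats` section.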
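The `cmdstat_*` lines pack their fields into `name=value` pairs. The sketch below, using hypothetical helper names rather than any Redis client API, recomputes the average microseconds per call and flags commands above a threshold. Because the counters are cumulative, the result is a lifetime average; compare two snapshots if you need recent behaviour.

```python
"""Flag slow Redis commands from INFO commandstats output (illustrative sketch)."""


def parse_commandstats(text: str) -> dict:
    """Map command name -> its numeric fields, e.g. {'get': {'calls': 1000000, ...}}."""
    stats = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("cmdstat_"):
            continue
        name, _, fields = line.partition(":")
        parsed = {}
        for field in fields.split(","):
            key, _, value = field.partition("=")
            parsed[key] = float(value) if "." in value else int(value)
        stats[name[len("cmdstat_"):]] = parsed
    return stats


def slow_commands(stats: dict, threshold_usec: float = 5.0, commands=("get", "set")) -> dict:
    """Return {command: average usec per call} for commands above the threshold."""
    slow = {}
    for command in commands:
        entry = stats.get(command)
        if entry and entry.get("calls"):
            average = entry["usec"] / entry["calls"]
            if average > threshold_usec:
                slow[command] = round(average, 2)
    return slow


SAMPLE = """# Commandstats
cmdstat_get:calls=1000000,usec=5000000,usec_per_call=5.00
cmdstat_set:calls=500000,usec=750000,usec_per_call=1.50
cmdstat_del:calls=10000,usec=20000,usec_per_call=2.00
"""

if __name__ == "__main__":
    stats = parse_commandstats(SAMPLE)
    print(slow_commands(stats))                      # {} (GET averages exactly 5.0)
    print(slow_commands(stats, threshold_usec=2.0))  # {'get': 5.0}
```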
Evictions and Expirations
evicted_keys:15000
expired_keys:20000
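- `evicted_keys`: Number of keys removed because of the `maxmemory` policy. High eviction rates mean your cache is too small for the workload, leading to frequent cache misses and increased database load.
- `expired_keys`: Number of keys removed because their TTL elapsed. Steady growth is normal for a cache that sets expirations.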
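Both counters are cumulative, so what matters is how fast they grow. A minimal sketch, assuming you take two `INFO stats` snapshots a known interval apart and have already parsed them into dicts; it also derives the cache hit ratio from the `keyspace_hits` and `keyspace_misses` fields of the same section. The helper name and sample numbers are hypothetical.

```python
"""Eviction/expiration rates and hit ratio between two INFO stats snapshots (sketch)."""


def cache_rates(prev: dict, curr: dict, interval_sec: float) -> dict:
    """Return per-second evictions and expirations plus the hit ratio for the interval."""

    def delta(key: str) -> int:
        change = curr.get(key, 0) - prev.get(key, 0)
        # A negative change means the counters were reset (restart or CONFIG RESETSTAT);
        # treat the current value as the increase since that reset.
        return change if change >= 0 else curr.get(key, 0)

    hits, misses = delta("keyspace_hits"), delta("keyspace_misses")
    lookups = hits + misses
    return {
        "evictions_per_sec": delta("evicted_keys") / interval_sec,
        "expirations_per_sec": delta("expired_keys") / interval_sec,
        "hit_ratio": hits / lookups if lookups else None,
    }


if __name__ == "__main__":
    before = {"evicted_keys": 15000, "expired_keys": 20000,
              "keyspace_hits": 900000, "keyspace_misses": 100000}
    after = {"evicted_keys": 15600, "expired_keys": 20300,
             "keyspace_hits": 909000, "keyspace_misses": 101000}
    print(cache_rates(before, after, interval_sec=60))
    # {'evictions_per_sec': 10.0, 'expirations_per_sec': 5.0, 'hit_ratio': 0.9}
```

A steady non-zero eviction rate combined with a falling hit ratio is the clearest sign the cache is undersized.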
Persistence (RDB/AOF)
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:120
aof_enabled:0
aof_last_write_status:ok
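- `rdb_last_bgsave_status`: Should be `ok`. A failure means snapshots are not being written, so your data might not be persisted correctly. With the default `stop-writes-on-bgsave-error yes`, Redis also rejects write commands until a background save succeeds again.
- `rdb_last_bgsave_time_sec`: Time taken by the last background save (`-1` if none has run yet). Long durations can impact performance, because the forked child competes for CPU and disk I/O while copy-on-write raises memory usage.
- `aof_last_write_status`: Should be `ok`; only meaningful when `aof_enabled` is `1`.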
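These fields are easy to turn into a pass/fail check for a cron job or custom monitoring script. A sketch under stated assumptions: the 60-second save-duration limit is an arbitrary example, the helper names are hypothetical, and the parser expects plain `INFO persistence` output.

```python
"""Pass/fail check for Redis persistence fields (illustrative sketch)."""


def parse_info(text: str) -> dict:
    """Parse key:value INFO lines; integers (including -1) become int, others stay str."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        info[key] = int(value) if value.lstrip("-").isdigit() else value
    return info


def persistence_problems(info: dict, max_bgsave_sec: int = 60) -> list:
    """List persistence problems; an empty list means everything looks healthy."""
    problems = []
    if info.get("rdb_last_bgsave_status", "ok") != "ok":
        problems.append("last RDB background save failed")
    if info.get("aof_enabled") == 1 and info.get("aof_last_write_status", "ok") != "ok":
        problems.append("last AOF write failed")
    # rdb_last_bgsave_time_sec is -1 when no background save has run yet.
    duration = info.get("rdb_last_bgsave_time_sec", -1)
    if isinstance(duration, int) and duration > max_bgsave_sec:
        problems.append(f"last RDB background save took {duration}s")
    return problems


SAMPLE = """# Persistence
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:120
aof_enabled:0
aof_last_write_status:ok
"""

if __name__ == "__main__":
    print(persistence_problems(parse_info(SAMPLE)))  # ['last RDB background save took 120s']
```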
Redis Cluster Specifics (OVH Managed Redis)
If you are using OVH’s managed Redis service, you’ll have access to their specific monitoring dashboards. These often abstract away the raw `redis-cli` output but provide high-level views of:
- Node health (CPU, Memory, Network I/O)
- Replication status (if applicable)
- Connection counts
- Latency
Crucially, check OVH’s documentation for their specific alerts and recommended thresholds. For instance, sustained high CPU usage on a Redis node can indicate expensive commands (such as `KEYS` or large range queries) or a need for scaling.
Infrastructure-Level Monitoring on OVH
OVH provides its own suite of monitoring tools for your instances and services. Leveraging these is the first line of defense.
Instance Metrics
OVH’s control panel offers real-time and historical data for:
- CPU Utilization: Sustained high CPU on WordPress web servers can indicate application bottlenecks or insufficient resources.
- Network Traffic: Spikes can indicate DDoS attacks or unexpected traffic surges.
- Disk I/O: High I/O wait times on database servers or WordPress instances serving many static assets can point to storage performance issues.
- Memory Usage: Crucial for both web servers and database servers. OOM (Out-Of-Memory) killer events are catastrophic.
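The control panel graphs are the primary source, but a quick local check of memory headroom is handy on any Linux instance. This sketch reads `/proc/meminfo` and reports `MemAvailable` as a share of `MemTotal` (`MemAvailable` requires Linux 3.14 or later); the function name is hypothetical. OOM killer events themselves are recorded in the kernel log (`dmesg` or `journalctl -k`).

```python
"""Report available memory from /proc/meminfo (Linux-only sketch)."""
import os


def memory_available_pct(meminfo_text: str) -> float:
    """Return MemAvailable / MemTotal as a percentage."""
    values_kb = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values_kb[key.strip()] = int(parts[0])  # /proc/meminfo reports kB
    return 100.0 * values_kb["MemAvailable"] / values_kb["MemTotal"]


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo", encoding="ascii") as fh:
        print(f"{memory_available_pct(fh.read()):.1f}% of RAM available")
```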
Load Balancer Metrics (if applicable)
If you’re using an OVH Load Balancer (e.g., HAProxy-based), monitor:
- Backend health checks: Ensure all your WordPress instances are reported as healthy.
- Request rates: Understand traffic patterns.
- Response times: Identify latency introduced by the LB or backend servers.
- Connection states: Monitor for connection errors or timeouts.
Alerting Strategy
Effective alerting is about notifying the right people about the right problems at the right time, without causing alert fatigue. Define clear thresholds based on historical data and expected performance.
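One simple way to ground a threshold in historical data is to take a high percentile of past values and add headroom. A sketch, assuming you can export a series of samples (for example a week of active PHP-FPM process counts) from your monitoring system; the 95th percentile and 20% headroom are arbitrary starting points, not recommendations from any tool, and `suggest_threshold` is a hypothetical helper.

```python
"""Derive an alert threshold from historical samples (heuristic sketch)."""
import statistics


def suggest_threshold(samples, percentile: int = 95, headroom: float = 1.2) -> float:
    """Return the given percentile of the samples multiplied by a headroom factor."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # quantiles(n=100) returns 99 cut points; index percentile - 1 is that percentile.
    cut_points = statistics.quantiles(samples, n=100, method="inclusive")
    return cut_points[percentile - 1] * headroom


if __name__ == "__main__":
    history = list(range(1, 101))  # stand-in for exported metric values
    print(round(suggest_threshold(history), 2))  # 114.06
```

Revisit the result whenever traffic patterns change; a threshold derived from a quiet week will fire constantly during a campaign.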
Example Alerting Rules (Conceptual, for Prometheus/Alertmanager)
These are conceptual examples, written in the Prometheus 2.x YAML rule format. The metric names assume the commonly used exporters (`php-fpm_exporter`, `redis_exporter`, `mysqld_exporter`); verify them against what your exporters actually expose. Actual implementation depends on your monitoring stack.
groups:
  - name: wordpress-redis
    rules:
      # Alert: High PHP-FPM Active Processes
      - alert: PHPFPMHighActiveProcesses
        expr: phpfpm_active_processes{pool="www"} > 120
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PHP-FPM pool 'www' has high active processes ({{ $value }})."
          description: "The number of active PHP-FPM processes for pool 'www' is exceeding the threshold. This may indicate a performance bottleneck or insufficient pool size."

      # Alert: Redis Memory Usage Approaching Limit
      # Dividing by (redis_memory_max_bytes > 0) skips instances with no maxmemory set.
      - alert: RedisMemoryHigh
        expr: redis_memory_used_bytes / (redis_memory_max_bytes > 0) * 100 > 85
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: 'Redis memory usage is high ({{ $value | printf "%.2f" }}%).'
          description: "Redis memory usage has exceeded 85% of the configured maxmemory. Evictions may occur, impacting cache performance."

      # Alert: Slow Redis GET Operations (durations are in seconds; 5 microseconds = 0.000005)
      - alert: RedisSlowGet
        expr: rate(redis_commands_duration_seconds_total{cmd="get"}[5m]) / rate(redis_commands_total{cmd="get"}[5m]) > 0.000005
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Average Redis GET operation time is high ({{ $value | humanizeDuration }})."
          description: "Average time for Redis GET commands is exceeding 5 microseconds. This could slow down WordPress object retrieval."

      # Alert: MySQL Slow Query Rate (rate() is per second, so multiply by 60 for per minute)
      - alert: MySQLSlowQueryRate
        expr: rate(mysql_global_status_slow_queries[5m]) * 60 > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High rate of MySQL slow queries detected."
          description: "The rate of slow MySQL queries has exceeded 5 per minute. Investigate the slow query log for details."
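PromQL’s `rate()` returns a per-second average, which is easy to misread when an alert description talks about events per minute. The simplified arithmetic below shows the conversion for a monotonically increasing counter such as MySQL’s `Slow_queries`, including the counter-reset handling `rate()` also performs; unlike Prometheus, it does not extrapolate to the edges of the window. The function name and samples are hypothetical.

```python
"""Per-minute increase of a monotonic counter from (timestamp, value) samples (sketch)."""


def per_minute_rate(samples) -> float:
    """samples: (unix_seconds, counter_value) pairs sorted by time.

    A drop between consecutive samples is treated as a counter reset
    (e.g. a MySQL restart), so the new value counts as fresh increase.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        increase += current - previous if current >= previous else current
    elapsed_sec = samples[-1][0] - samples[0][0]
    if elapsed_sec <= 0:
        raise ValueError("samples must span a positive time range")
    per_second = increase / elapsed_sec  # what rate() reports
    return per_second * 60


if __name__ == "__main__":
    # Slow_queries sampled every minute; the drop to 2 is a restart.
    print(per_minute_rate([(0, 100), (60, 103), (120, 110), (180, 2)]))
```

Here the counter grows by 12 over three minutes, so the per-minute rate is 4 while `rate()` would report roughly 0.067 per second; comparing that raw value against 5 would silently change the meaning of the threshold.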
Conclusion
A robust monitoring strategy for WordPress and Redis on OVH involves a multi-layered approach. Start with OVH’s native tools, then layer on application-specific metrics for PHP-FPM and MySQL, and finally, dive deep into Redis performance. Automate data collection and alerting to proactively identify and resolve issues before they impact your users.