Server Monitoring Best Practices: Keeping Your PHP App and Redis Clusters Alive on Linode
Core Metrics for PHP Applications
Effective monitoring of PHP applications hinges on tracking key performance indicators (KPIs) that directly impact user experience and resource utilization. Beyond basic CPU and memory, we need to delve into application-specific metrics.
Request Latency and Throughput
Understanding how quickly your application responds to requests and how many requests it can handle per unit of time is paramount. Tools like New Relic, Datadog, or even custom Prometheus exporters can provide this data. For a more DIY approach, we can leverage web server logs and process them.
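Consider using Nginx’s access logs to track request times. The default combined log format does not record timing, so define a log_format that includes $request_time (total time Nginx spent on the request) and, optionally, $upstream_response_time (time spent waiting on PHP-FPM). We can then parse these logs to calculate average, p95, and p99 latencies.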
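A minimal PHP sketch of that calculation. It assumes a custom log_format whose last field is $request_time (the format string in the comment is an example, not an Nginx default) and uses the nearest-rank percentile method:

```php
<?php
/**
 * Summarise request latency from an Nginx access log.
 * Assumption: the last field of each line is $request_time (seconds), e.g.
 *   log_format timed '$remote_addr [$time_local] "$request" $status $request_time';
 */

/** Nearest-rank percentile of an ascending-sorted, non-empty array. */
function percentile(array $sorted, int $p): float
{
    $rank = (int) ceil($p * count($sorted) / 100);
    return $sorted[max($rank, 1) - 1];
}

/** @param iterable<string> $lines raw access-log lines */
function latencySummary(iterable $lines): array
{
    $times = [];
    foreach ($lines as $line) {
        $fields = preg_split('/\s+/', trim($line));
        $last = end($fields);
        if (is_numeric($last)) { // skips blank or malformed lines
            $times[] = (float) $last;
        }
    }
    if ($times === []) {
        return ['count' => 0];
    }
    sort($times);
    return [
        'count' => count($times),
        'avg'   => array_sum($times) / count($times),
        'p95'   => percentile($times, 95),
        'p99'   => percentile($times, 99),
    ];
}

// Usage: php latency.php /var/log/nginx/access.log
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    print_r(latencySummary(new SplFileObject($argv[1])));
}
```

Nearest-rank always returns a latency that was actually observed, which keeps the numbers easy to cross-check against individual log lines.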
PHP-FPM Process Management
PHP-FPM (FastCGI Process Manager) is the de facto standard for running PHP applications. Monitoring its process pool is critical. Key metrics include:
- Active Processes: The number of PHP-FPM workers currently handling requests.
- Idle Processes: Workers waiting for new requests.
- Queue Length (listen queue): The number of requests waiting for a free worker. A consistently non-zero queue indicates insufficient worker processes.
- Max Children Reached: How many times the pool has hit pm.max_children since PHP-FPM started. Any increase means requests had to wait for a worker.
- Slow Requests: The number of requests that exceeded request_slowlog_timeout.
PHP-FPM exposes these metrics via a status page. We can configure Nginx to proxy this status page and then use a monitoring agent to scrape it.
Configuring PHP-FPM Status Page
Edit your PHP-FPM pool configuration file (e.g., /etc/php/8.2/fpm/pool.d/www.conf) and ensure the following:
[www]
; ... other settings ...
pm = dynamic
pm.max_children = 50
pm.start_servers = 5
pm.min_spare_servers = 2
pm.max_spare_servers = 10
; pm.process_idle_timeout only takes effect when pm = ondemand
pm.process_idle_timeout = 10s
pm.max_requests = 500
; Enable the status page
pm.status_path = /fpm-status
; Listen on a Unix socket (matches the Nginx examples below)
listen = /run/php/php8.2-fpm.sock
; Or, to use TCP instead: listen = 127.0.0.1:9000
; Optional per-request access log; %d adds the request duration
; access.log = /var/log/php8.2-fpm/access.log
; access.format = "%R - %u %t \"%m %r\" %s %d"
; Log requests slower than 10s; this also drives the "slow requests" counter (the log directory must exist)
request_slowlog_timeout = 10s
slowlog = /var/log/php8.2-fpm/slow.log
After modifying the configuration, reload PHP-FPM:
sudo systemctl reload php8.2-fpm
Proxying PHP-FPM Status with Nginx
In your Nginx site configuration (e.g., /etc/nginx/sites-available/your-app), add a location block to expose the status page. This block should only be accessible from localhost or a trusted monitoring network.
server {
listen 80;
server_name your-app.com;
root /var/www/your-app/public;
index index.php;
location / {
try_files $uri $uri/ /index.php?$query_string;
}
location ~ \.php$ {
include snippets/fastcgi-php.conf;
# If using Unix socket:
fastcgi_pass unix:/run/php/php8.2-fpm.sock;
# If using TCP:
# fastcgi_pass 127.0.0.1:9000;
}
# PHP-FPM Status Page - Restricted Access
location = /fpm-status {
# Allow access only from localhost
allow 127.0.0.1;
allow ::1;
deny all;
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_pass unix:/run/php/php8.2-fpm.sock; # Or your TCP address
# Detail level and format are chosen by query string: ?full for per-process data, ?json (or ?full&json) for JSON
}
}
Reload Nginx:
sudo systemctl reload nginx
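You should now be able to read the status page from the server itself. Because only 127.0.0.1 is allowed, request it over the loopback interface with the site's Host header, for example curl -H "Host: your-app.com" "http://127.0.0.1/fpm-status?full". Requesting http://your-app.com/fpm-status from the same machine typically arrives from the server's public IP and is denied.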
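To feed these numbers into a script or alert, request the JSON form of the page. The PHP sketch below assumes the loopback access and the your-app.com server_name from the example configuration; the 10-request queue threshold and the wording of the warnings are illustrative choices, not PHP-FPM defaults:

```php
<?php
/** Pick the saturation-related fields out of /fpm-status?json. */
function parseFpmStatus(string $json): array
{
    $s = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    return [
        'pool'                 => $s['pool'],
        'active_processes'     => $s['active processes'],
        'idle_processes'       => $s['idle processes'],
        'listen_queue'         => $s['listen queue'],
        'max_children_reached' => $s['max children reached'],
        'slow_requests'        => $s['slow requests'],
    ];
}

/** Human-readable warnings for one snapshot; an empty array means nothing stood out. */
function fpmWarnings(array $status, int $maxQueue = 10): array
{
    $warnings = [];
    if ($status['listen_queue'] > $maxQueue) {
        $warnings[] = "listen queue is {$status['listen_queue']}";
    }
    if ($status['idle_processes'] === 0) {
        $warnings[] = 'no idle workers';
    }
    if ($status['max_children_reached'] > 0) {
        $warnings[] = 'pm.max_children has been hit since FPM started';
    }
    return $warnings;
}

// Usage: php fpm-check.php fetch   (run on the web server itself)
if (PHP_SAPI === 'cli' && ($argv[1] ?? '') === 'fetch') {
    // The Host header must match server_name, and the request must come from 127.0.0.1.
    $context = stream_context_create(['http' => ['header' => "Host: your-app.com\r\n", 'timeout' => 2]]);
    $body = @file_get_contents('http://127.0.0.1/fpm-status?json', false, $context);
    if ($body === false) {
        fwrite(STDERR, "status page unreachable\n");
        exit(2);
    }
    $warnings = fpmWarnings(parseFpmStatus($body));
    echo $warnings === [] ? "OK\n" : 'WARN: ' . implode('; ', $warnings) . "\n";
    exit($warnings === [] ? 0 : 1);
}
```

Because max children reached and slow requests are cumulative since FPM started, a real check would usually compare them with the previous run rather than with zero.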
Application Error Rates
Tracking uncaught exceptions and fatal errors is crucial. This can be achieved through:
- Error Logging: Configure PHP to log errors to a file (log_errors = On plus the error_log directive in php.ini) and monitor these logs.
- APM Tools: Application Performance Monitoring and error-tracking tools (New Relic, Datadog, Sentry) excel at capturing and aggregating exceptions.
- Custom Metrics: Instrument your code to send custom error counts to a metrics system like Prometheus.
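Example of custom error reporting in PHP, assuming the promphp/prometheus_client_php library. Because PHP-FPM workers do not share memory, the registry needs a shared storage adapter (the library's APCu or Redis storage) rather than in-memory storage, or the counts are lost at the end of each request:
// Assuming promphp/prometheus_client_php is installed via Composer
use Prometheus\RegistryInterface;
use Prometheus\Counter;
class ErrorReporter {
private Counter $errorCounter;
public function __construct(RegistryInterface $registry) {
// Exposed as php_application_errors_total (namespace + "_" + name).
// getOrRegisterCounter() avoids an exception if the counter is already registered.
$this->errorCounter = $registry->getOrRegisterCounter(
'php_application',
'errors_total',
'Total number of application errors',
['error_type', 'context']
);
}
public function reportError(string $errorType, string $context = 'unknown'): void {
// Keep label values to a small, fixed set (class names, feature names);
// raw exception messages would create an unbounded number of time series.
$this->errorCounter->inc([$errorType, $context]);
}
}
// Usage example within your application.
// $prometheusRegistry is your configured registry, e.g.
// new \Prometheus\CollectorRegistry(new \Prometheus\Storage\APC())
$errorReporter = new ErrorReporter($prometheusRegistry);
try {
// ... your code that might throw an exception ...
throw new \InvalidArgumentException("Invalid parameter provided.");
} catch (\Throwable $e) {
// 'checkout' is an example context label (the feature or route), not the error message
$errorReporter->reportError(get_class($e), 'checkout');
// Log the error to a file or APM tool as well
error_log(sprintf("Caught %s: %s in %s on line %d", get_class($e), $e->getMessage(), $e->getFile(), $e->getLine()));
// Re-throw or handle appropriately
throw $e;
}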
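A try/catch only covers the code it wraps. To also count exceptions nobody catches and fatal errors, one option is a global hook. The sketch below (PHP 8.0+ for match) takes a hypothetical $report callback that receives a short type label, so it can be wired to a counter such as the ErrorReporter above or to any other sink:

```php
<?php
const FATAL_ERROR_TYPES = E_ERROR | E_PARSE | E_CORE_ERROR | E_COMPILE_ERROR | E_USER_ERROR;

/** Return a label if error_get_last() output describes a fatal error, else null. */
function fatalErrorLabel(?array $lastError): ?string
{
    if ($lastError === null || ($lastError['type'] & FATAL_ERROR_TYPES) === 0) {
        return null;
    }
    return match ($lastError['type']) {
        E_PARSE => 'E_PARSE',
        E_CORE_ERROR => 'E_CORE_ERROR',
        E_COMPILE_ERROR => 'E_COMPILE_ERROR',
        E_USER_ERROR => 'E_USER_ERROR',
        default => 'E_ERROR',
    };
}

/** Register handlers that pass a label for each uncaught exception or fatal error to $report. */
function installErrorHooks(callable $report): void
{
    // Runs for exceptions that escape every try/catch; the request ends afterwards.
    set_exception_handler(function (\Throwable $e) use ($report): void {
        $report(get_class($e));
        error_log(sprintf('Uncaught %s: %s in %s:%d', get_class($e), $e->getMessage(), $e->getFile(), $e->getLine()));
        if (!headers_sent()) {
            http_response_code(500);
        }
    });
    // Fatal errors cannot be caught, but error_get_last() still describes them at shutdown.
    register_shutdown_function(function () use ($report): void {
        $label = fatalErrorLabel(error_get_last());
        if ($label !== null) {
            $report($label);
        }
    });
}
```

For example, call installErrorHooks(fn (string $type) => $errorReporter->reportError($type, 'uncaught')); once near the top of your front controller (public/index.php).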
Redis Cluster Monitoring Essentials
Redis, especially in a cluster configuration, requires dedicated monitoring to ensure data availability, performance, and stability. We’ll focus on metrics relevant to a clustered setup.
Cluster Health and Node Status
The most fundamental check is the health of the Redis cluster itself and its individual nodes. The redis-cli cluster info and redis-cli cluster nodes commands are invaluable.
# Connect to any node in the cluster
redis-cli -c -h <host> -p 6379
# Check overall cluster status
CLUSTER INFO
# List all nodes and their status
CLUSTER NODES
Key indicators from CLUSTER INFO:
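- cluster_state: Should be ok. If it reports fail, the cluster is unhealthy and may refuse queries.
- cluster_slots_assigned, cluster_slots_ok, cluster_slots_pfail, cluster_slots_fail: In a healthy cluster, cluster_slots_assigned and cluster_slots_ok both equal 16384, and cluster_slots_pfail and cluster_slots_fail are zero. pfail (possible failure) means a node is unreachable from the reporting node's point of view but might recover; fail means a majority of masters agree it is down.
- cluster_known_nodes: The total number of nodes the cluster is aware of.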
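These checks are easy to script. A PHP sketch that parses CLUSTER INFO text (for example piped from redis-cli) and applies the rules above; 16384 is the fixed number of hash slots in Redis Cluster, and the script name in the usage comment is hypothetical:

```php
<?php
/** Parse CLUSTER INFO output (key:value lines) into an associative array. */
function parseClusterInfo(string $raw): array
{
    $info = [];
    foreach (preg_split('/\r?\n/', trim($raw)) as $line) {
        if (str_contains($line, ':')) {
            [$key, $value] = explode(':', $line, 2);
            $info[$key] = $value;
        }
    }
    return $info;
}

/** Apply the health rules; an empty result means the cluster looks healthy. */
function clusterInfoProblems(array $info): array
{
    $problems = [];
    $state = $info['cluster_state'] ?? 'missing';
    if ($state !== 'ok') {
        $problems[] = "cluster_state is $state";
    }
    $slotsOk = (int) ($info['cluster_slots_ok'] ?? 0);
    if ($slotsOk !== 16384) { // Redis Cluster always has 16384 hash slots
        $problems[] = "only $slotsOk of 16384 slots are ok";
    }
    foreach (['cluster_slots_pfail', 'cluster_slots_fail'] as $key) {
        $count = (int) ($info[$key] ?? 0);
        if ($count > 0) {
            $problems[] = "$key is $count";
        }
    }
    return $problems;
}

// Usage: redis-cli -h <host> -p 6379 cluster info | php cluster-info-check.php --stdin
if (PHP_SAPI === 'cli' && ($argv[1] ?? '') === '--stdin') {
    $problems = clusterInfoProblems(parseClusterInfo(stream_get_contents(STDIN)));
    echo $problems === [] ? "OK\n" : 'CRITICAL: ' . implode('; ', $problems) . "\n";
    exit($problems === [] ? 0 : 2);
}
```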
From CLUSTER NODES, pay attention to the flags for each node (e.g., myself, master, slave, fail?, fail, handshake, noaddr). A node in the possible-failure (PFAIL) state carries the fail? flag; a node in the fail state is a critical issue.
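The flag check can be scripted the same way. Each CLUSTER NODES line is space-separated: node ID, address (ip:port@cport), comma-separated flags, master ID, ping-sent, pong-received, config epoch, link state, then slots, and a PFAIL node shows the fail? flag. A minimal PHP sketch that lists suspect nodes (the script name in the usage comment is hypothetical):

```php
<?php
/**
 * Return nodes from CLUSTER NODES output that are flagged fail or fail? (PFAIL),
 * or whose cluster-bus link is disconnected.
 */
function failingNodes(string $raw): array
{
    $failing = [];
    foreach (preg_split('/\r?\n/', trim($raw)) as $line) {
        // id, ip:port@cport, flags, master-id, ping-sent, pong-recv, config-epoch, link-state, slots...
        $fields = explode(' ', trim($line));
        if (count($fields) < 8) {
            continue;
        }
        $flags = explode(',', $fields[2]);
        $linkState = $fields[7];
        if (in_array('fail', $flags, true) || in_array('fail?', $flags, true) || $linkState === 'disconnected') {
            $failing[] = ['id' => $fields[0], 'address' => $fields[1], 'flags' => $fields[2], 'link' => $linkState];
        }
    }
    return $failing;
}

// Usage: redis-cli -h <host> -p 6379 cluster nodes | php cluster-nodes-check.php --stdin
if (PHP_SAPI === 'cli' && ($argv[1] ?? '') === '--stdin') {
    foreach (failingNodes(stream_get_contents(STDIN)) as $node) {
        echo "{$node['address']} ({$node['id']}): flags={$node['flags']} link={$node['link']}\n";
    }
}
```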
Replication and Failover Metrics
In a cluster, masters have replicas. Monitoring replication lag and failover events is crucial for data consistency and availability.
# On a master node
INFO replication
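Key metrics from INFO replication on a master:
- master_repl_offset: The master's current replication offset.
- connected_slaves: The number of replicas attached to this master.
- slave0, slave1, ...: One line per replica with its ip, port, state, acknowledged offset, and lag (seconds since that replica last acknowledged). Subtracting a replica's offset from master_repl_offset gives its lag in bytes.
On a replica, INFO replication shows:
- master_host, master_port: Details of the master it’s connected to.
- master_link_status: Should be up. If down, replication has stopped.
- master_last_io_seconds_ago: Seconds since the last interaction with the master. A value that keeps growing well past repl-ping-replica-period (10 seconds by default) means the replica is not receiving updates and its data is going stale.
- slave_repl_offset: The replica's replication offset, comparable with the master's master_repl_offset.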
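On the master, each slaveN line packs the replica's state into comma-separated key=value pairs. A PHP sketch that turns the raw INFO replication text from a master (for example the output of redis-cli -h <host> -p 6379 info replication) into per-replica lag in bytes and seconds:

```php
<?php
/** Per-replica lag from a master's INFO replication output. */
function replicaLag(string $raw): array
{
    $fields = [];
    foreach (preg_split('/\r?\n/', trim($raw)) as $line) {
        if ($line === '' || $line[0] === '#' || !str_contains($line, ':')) {
            continue; // skip section headers such as "# Replication"
        }
        [$key, $value] = explode(':', $line, 2);
        $fields[$key] = $value;
    }
    $masterOffset = (int) ($fields['master_repl_offset'] ?? 0);

    $replicas = [];
    foreach ($fields as $key => $value) {
        if (preg_match('/^slave\d+$/', (string) $key) !== 1) {
            continue;
        }
        // e.g. "ip=10.0.0.2,port=6379,state=online,offset=1000,lag=0"
        $attrs = [];
        foreach (explode(',', $value) as $pair) {
            [$k, $v] = array_pad(explode('=', $pair, 2), 2, '');
            $attrs[$k] = $v;
        }
        $replicas[] = [
            'address'     => ($attrs['ip'] ?? '?') . ':' . ($attrs['port'] ?? '?'),
            'state'       => $attrs['state'] ?? 'unknown',
            'lag_bytes'   => $masterOffset - (int) ($attrs['offset'] ?? 0),
            'lag_seconds' => (int) ($attrs['lag'] ?? 0),
        ];
    }
    return $replicas;
}
```

Byte lag and the lag field answer different questions: the first shows how much data a replica is missing, the second how long since it last acknowledged anything.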
Monitoring failover events can be done by observing the cluster logs or by setting up alerts when a node’s status changes to fail or when a new master is elected.
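Detecting an election can be as simple as diffing the set of masters between polls. A PHP sketch that extracts healthy masters from CLUSTER NODES output (space-separated fields: node ID, ip:port@cport, comma-separated flags, ...) and reports addresses that became masters since the previous snapshot; how you persist the previous snapshot between runs is left open:

```php
<?php
/** Map "ip:port" => node ID for every node flagged master and not flagged fail. */
function mastersFromClusterNodes(string $raw): array
{
    $masters = [];
    foreach (preg_split('/\r?\n/', trim($raw)) as $line) {
        $fields = explode(' ', trim($line));
        if (count($fields) < 3) {
            continue;
        }
        $flags = explode(',', $fields[2]);
        if (in_array('master', $flags, true) && !in_array('fail', $flags, true)) {
            // Drop "@cport" (and any ",hostname") so only ip:port remains.
            $address = explode('@', $fields[1])[0];
            $masters[$address] = $fields[0];
        }
    }
    ksort($masters);
    return $masters;
}

/** Addresses that are masters now but were not in the previous snapshot. */
function newlyPromoted(array $previous, array $current): array
{
    return array_keys(array_diff_key($current, $previous));
}
```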
Memory and Performance Metrics
Standard Redis performance metrics apply, but with a cluster view.
# On any node
INFO memory
INFO stats
INFO persistence
INFO keyspace
INFO clients
Key metrics:
- Memory Usage: used_memory, used_memory_rss, mem_fragmentation_ratio. A ratio well above 1 (1.5 is a common rule of thumb) indicates fragmentation; a ratio below 1 suggests part of Redis's memory has been swapped out.
- Keyspace: db0:keys=...,expires=... in INFO keyspace. Monitor the total number of keys and keys with TTLs (Redis Cluster only supports database 0).
- Commands Processed: total_commands_processed. Track its rate of change (INFO stats also reports instantaneous_ops_per_sec).
- Connections: connected_clients. High client counts can indicate connection leaks or overload.
- Latency: Use redis-cli -h <host> -p <port> --latency-history to sample round-trip latency, SLOWLOG GET to inspect slow commands, and LATENCY LATEST or LATENCY DOCTOR after setting latency-monitor-threshold. MONITOR streams every command and is expensive, so use it only briefly in production.
- Persistence: Monitor RDB and AOF operations, especially rdb_last_bgsave_status and aof_last_bgrewrite_status. Failures here can lead to data loss or performance degradation.
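A PHP sketch that turns a few of these into checks. The 1.5 fragmentation threshold is a rule of thumb rather than an official limit, and the command rate is computed from two INFO stats samples because total_commands_processed is a cumulative counter:

```php
<?php
/** Parse INFO output ("# Section" headers, then key:value lines) into a flat array. */
function parseInfo(string $raw): array
{
    $info = [];
    foreach (preg_split('/\r?\n/', trim($raw)) as $line) {
        if ($line === '' || $line[0] === '#' || !str_contains($line, ':')) {
            continue;
        }
        [$key, $value] = explode(':', $line, 2);
        $info[$key] = $value;
    }
    return $info;
}

/** Warnings for fragmentation and failed persistence jobs. */
function redisInfoWarnings(array $info): array
{
    $warnings = [];
    $frag = (float) ($info['mem_fragmentation_ratio'] ?? 1.0);
    if ($frag > 1.5) {
        $warnings[] = "high fragmentation ratio ($frag)";
    } elseif ($frag < 1.0) {
        $warnings[] = "fragmentation ratio below 1 ($frag): possible swapping";
    }
    foreach (['rdb_last_bgsave_status', 'aof_last_bgrewrite_status'] as $key) {
        if (isset($info[$key]) && $info[$key] !== 'ok') {
            $warnings[] = "$key is {$info[$key]}";
        }
    }
    return $warnings;
}

/** Commands per second between two parsed INFO stats samples taken $seconds apart. */
function commandRate(array $before, array $after, float $seconds): float
{
    return ((int) $after['total_commands_processed'] - (int) $before['total_commands_processed']) / $seconds;
}
```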
Monitoring Tools and Strategies
For effective monitoring, consider these tools and approaches:
Prometheus and Grafana
This is a powerful open-source combination. You’ll need:
- Node Exporter: For system-level metrics (CPU, RAM, Disk, Network) on your Linode instances.
- Redis Exporter: A dedicated exporter (e.g., oliver006/redis_exporter) that scrapes Redis metrics and exposes them in Prometheus format. Each exporter process scrapes the Redis address it is configured with, so run one per cluster node (or use the exporter's multi-target /scrape endpoint).
- PHP-FPM Exporter: If you need more granular PHP-FPM metrics than the status page provides, or if you want to integrate them directly into Prometheus.
- Prometheus Server: To scrape, store, and query metrics.
- Grafana: For visualization and dashboarding. Pre-built Redis dashboards are readily available.
Example configuration for redis_exporter (often run as a systemd service):
# Example systemd unit for redis_exporter (e.g., /etc/systemd/system/redis_exporter.service)
# <host> and <password> are placeholders; run one exporter per Redis node.
[Unit]
Description=Redis Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=redis_exporter
Group=redis_exporter
Type=simple
ExecStart=/usr/local/bin/redis_exporter \
    --redis.addr=redis://<host>:6379 \
    --redis.password=<password> \
    --check-keys=db0=my_key_count \
    --web.listen-address=:9121
Restart=on-failure
[Install]
WantedBy=multi-user.target
Ensure your Prometheus configuration scrapes this exporter:
scrape_configs:
  - job_name: 'redis_cluster'
    static_configs:
      # One target per redis_exporter instance; 9121 is the exporter's default port
      - targets: ['<host>:9121']
Alerting Strategies
Define clear alerting rules based on critical thresholds. For example:
# Prometheus Alerting Rules (e.g., in rules.yml)
groups:
  - name: redis_alerts
    rules:
      - alert: RedisClusterDown
        expr: redis_cluster_state == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Redis cluster is down."
          description: "The Redis cluster is reporting a cluster_state other than ok. Manual intervention required."
      - alert: RedisReplicationLagging
        # Exposed by redis_exporter on masters, one series per replica; confirm the name on your exporter's /metrics page
        expr: redis_connected_slave_lag_seconds > 60 # Replica has not acknowledged for more than 60 seconds
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Redis replication lag detected on {{ $labels.instance }}."
          description: "Replica {{ $labels.slave_ip }}:{{ $labels.slave_port }} of {{ $labels.instance }} is lagging by {{ $value }} seconds."
      - alert: HighPhpFpmQueue
        # PHP-FPM metric names depend on the exporter you run; adjust them to match its output
        expr: php_fpm_queue_length > 10 # Queue length exceeds 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High PHP-FPM queue length on {{ $labels.instance }}."
          description: "PHP-FPM queue length is {{ $value }} on {{ $labels.instance }}. Consider scaling up PHP-FPM workers."
      - alert: HighPhpFpmSlowRequests
        # The slow-request count is cumulative, so alert on its recent increase rather than its absolute value
        expr: increase(php_fpm_slow_requests_total[5m]) > 0
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "PHP-FPM slow requests detected on {{ $labels.instance }}."
          description: "PHP-FPM is reporting slow requests on {{ $labels.instance }}. Investigate application performance."
Integrate Prometheus Alertmanager with Slack, PagerDuty, or email for notifications.
Linode Specific Considerations
When running on Linode, leverage their built-in monitoring and consider network configurations.
Linode Cloud Manager Monitoring
Linode’s Cloud Manager provides basic host-level graphs (CPU, network traffic, disk I/O); memory and per-process detail require Linode's Longview agent or your own agent such as Node Exporter. While useful for overall server health, these graphs are insufficient for deep application or database monitoring. Use them as a first line of defense to detect host-level issues.
Network Latency and Firewalls
Ensure your Linode firewall rules (both Linode’s Cloud Firewall and server-level firewalls like ufw or iptables) allow necessary traffic for:
- Application traffic (HTTP/HTTPS)
- PHP-FPM communication (if using TCP sockets)
- Redis cluster inter-node communication (the client port, 6379 by default, and the cluster bus port, which defaults to the client port plus 10000, e.g. 16379)
- Monitoring agent communication (e.g., Prometheus scraping ports)
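To confirm the rules actually let traffic through, a quick TCP check helps. The PHP sketch below uses fsockopen() and assumes the default bus-port offset (Redis 7 can override it with the cluster-port setting). Application servers only need the client port, so run the full check from one Redis node against the others; the addresses in the usage comment are examples:

```php
<?php
/** Client port plus cluster bus port (client port + 10000 unless cluster-port is set). */
function clusterPorts(int $dataPort): array
{
    return [$dataPort, $dataPort + 10000];
}

/** Return "host:port" entries that refused the connection or timed out. */
function unreachable(array $hosts, array $ports, float $timeout = 1.0): array
{
    $down = [];
    foreach ($hosts as $host) {
        foreach ($ports as $port) {
            $conn = @fsockopen($host, $port, $errno, $errstr, $timeout);
            if ($conn === false) {
                $down[] = "$host:$port";
            } else {
                fclose($conn);
            }
        }
    }
    return $down;
}

// Usage (example private IPs): php redis-port-check.php 192.168.128.10 192.168.128.11
if (PHP_SAPI === 'cli' && isset($argv) && count($argv) > 1) {
    $down = unreachable(array_slice($argv, 1), clusterPorts(6379));
    echo $down === [] ? "All ports reachable\n" : 'Unreachable: ' . implode(', ', $down) . "\n";
    exit($down === [] ? 0 : 1);
}
```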
High network latency between your PHP application servers and Redis cluster nodes can significantly degrade performance. If possible, co-locate them within the same Linode data center or even the same VPC/private network.
Automated Recovery and Health Checks
Beyond just alerting, consider automated actions:
- PHP-FPM: Configure systemd to restart PHP-FPM automatically if it crashes, for example with a drop-in (sudo systemctl edit php8.2-fpm) that sets Restart=on-failure under [Service].
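- Redis: Redis Cluster performs failover itself: when a master is flagged fail, one of its replicas is promoted automatically. (Redis Sentinel provides this for non-clustered master/replica deployments and is not used with Redis Cluster.) Ensure every master has at least one replica, ideally on a different Linode host.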
- Application Health Checks: Implement a dedicated health check endpoint in your PHP application (e.g., /health) that verifies database connections, Redis connectivity, and other critical dependencies. Load balancers (such as Linode NodeBalancers, which support HTTP health checks) or orchestration systems (like Kubernetes, though less common for simple Linode setups) can use these endpoints to remove unhealthy instances from service.
A simple PHP health check endpoint:
<?php
// public/health.php
header('Content-Type: application/json');
$response = ['status' => 'ok', 'dependencies' => []];
$statusCode = 200;
// Check Redis connection (phpredis extension; host and port are placeholders).
// This pings a single node; it does not prove the whole cluster is healthy.
try {
$redis = new Redis();
if (!$redis->connect('127.0.0.1', 6379, 1.0)) { // 1-second connect timeout
throw new \RuntimeException('Redis connect failed');
}
$redis->ping();
$response['dependencies']['redis'] = 'ok';
} catch (\Throwable $e) {
$response['status'] = 'error';
$response['dependencies']['redis'] = 'error: ' . $e->getMessage();
$statusCode = 503; // Service Unavailable
}
// Check Database connection (DSN and credentials are placeholders; load real ones from your config)
try {
$db = new PDO('mysql:host=127.0.0.1;dbname=app', 'app_user', 'app_password', [
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
PDO::ATTR_TIMEOUT => 2, // seconds
]);
$db->query('SELECT 1');
$response['dependencies']['database'] = 'ok';
} catch (\Throwable $e) {
$response['status'] = 'error';
$response['dependencies']['database'] = 'error: ' . $e->getMessage();
$statusCode = 503;
}
// Add more checks as needed (e.g., external API availability)
http_response_code($statusCode);
echo json_encode($response);
exit;
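Because health.php sits in public/, Nginx serves it at /health.php through the existing location ~ \.php$ block, bypassing your framework's front controller. If load balancer checks should use /health instead, or you want to keep frequent probes out of the access log, add a dedicated exact-match location (for example, location = /health with access_log off) that passes this script to PHP-FPM.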