Server Monitoring Best Practices: Keeping Your Magento 2 App and Redis Clusters Alive on DigitalOcean
Proactive Redis Cluster Health Checks with `redis-cli`
Maintaining the health of your Redis clusters is paramount for Magento 2 performance. Beyond basic connectivity, we need to monitor key metrics that indicate potential bottlenecks or impending failures. This involves leveraging `redis-cli` for direct introspection and integrating these checks into a robust monitoring system like Prometheus or Nagios.
A fundamental check is the cluster’s overall status. For Redis Sentinel, this means querying the master and its replicas. For Redis Cluster, it’s about ensuring all nodes are in `ok` state and that the hash slots are fully covered.
Sentinel Cluster Status
Connect to a Sentinel instance and query the master’s status. This command will return the current master’s IP and port, and importantly, the number of Sentinels monitoring it and the number of replicas it has.
redis-cli -h <sentinel-host> -p <sentinel-port> SENTINEL master <master-name>
The reply is a flat list of field/value pairs. Abridged, and shown here one pair per line, it looks something like this (the exact field set varies by Redis version):
"name" "<master-name>"
"ip" "10.10.0.5"
"port" "6379"
"runid" "..."
"flags" "master"
"link-pending-commands" "0"
"last-ping-sent" "0"
"last-ping-reply" "0"
"down-after-milliseconds" "5000"
"num-slaves" "2"
"num-other-sentinels" "2"
"failover-timeout" "10000"
"parallel-syncs" "1"
Key fields to alert on:
- flags: Should be just `master`. `s_down` or `o_down` means Sentinel considers the master subjectively or objectively down.
- num-slaves: Should match your expected number of replicas. A decrease indicates a replica has gone offline or is failing to connect.
- num-other-sentinels: Should match your expected number of Sentinels minus one, since the Sentinel you query does not count itself. A decrease means a Sentinel has been removed or is unreachable.
Replication link health (`master-link-status`) is reported per replica rather than on the master; check it with `SENTINEL replicas`.
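For automated checks, the flat reply can be turned into a dictionary and compared against expected counts. The following is a minimal sketch in Python, assuming the reply has already been fetched as a list of alternating field and value strings (the shape most Redis clients return). The function names, the sample reply, and the expected counts are hypothetical and only serve the illustration.

```python
def pairs_to_dict(flat_reply):
    """Convert a flat [field, value, field, value, ...] reply into a dict."""
    if len(flat_reply) % 2 != 0:
        raise ValueError("expected an even number of items")
    return dict(zip(flat_reply[0::2], flat_reply[1::2]))


def check_sentinel_master(flat_reply, expected_replicas, expected_other_sentinels):
    """Return a list of problems; an empty list means the master looks healthy."""
    info = pairs_to_dict(flat_reply)
    problems = []

    # Sentinel adds s_down / o_down to the flags when it considers the master down.
    flags = info.get("flags", "").split(",")
    if "s_down" in flags or "o_down" in flags:
        problems.append("master flagged down: " + info.get("flags", ""))

    replicas = int(info.get("num-slaves", 0))
    if replicas < expected_replicas:
        problems.append("replicas: %d < %d" % (replicas, expected_replicas))

    sentinels = int(info.get("num-other-sentinels", 0))
    if sentinels < expected_other_sentinels:
        problems.append("other sentinels: %d < %d" % (sentinels, expected_other_sentinels))

    return problems


# Hypothetical, abridged reply: one replica is missing.
sample = ["name", "mymaster", "flags", "master",
          "num-slaves", "1", "num-other-sentinels", "2"]
problems = check_sentinel_master(sample, expected_replicas=2, expected_other_sentinels=2)
print(problems)
```

A wrapper script could exit non-zero whenever the returned list is non-empty, which is the convention Nagios-style checks expect.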
To check the health of individual replicas, use the SENTINEL replicas <master-name> command.
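redis-cli -h <sentinel-host> -p <sentinel-port> SENTINEL replicas <master-name>
This lists each replica with its flags, replication offset, and link state. For each replica, check that `master-link-status` is `ok` and that `master-link-down-time` is 0; anything else means the replica has lost its link to the master.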
Redis Cluster Node Status
For Redis Cluster, the primary tool is redis-cli --cluster check. This command connects to all nodes in the cluster and verifies their state, including hash slot distribution and connectivity.
redis-cli --cluster check <node-host>:<port>
A healthy cluster will report:
...
[OK] All nodes agree about slots configuration.
...
[OK] All 16384 slots covered.
Any deviation from these `[OK]` messages indicates a problem. For example, “[ERR] Not all 16384 slots are covered by nodes.” means some keys are unreachable, and “[ERR] Nodes don't agree about configuration!” points to nodes holding conflicting views of the cluster. This check does not verify replica counts or replication sync; use CLUSTER NODES and `INFO replication` on each replica for that.
To get a quick overview of all nodes and their roles, use CLUSTER NODES.
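redis-cli -h <node-host> -p <port> CLUSTER NODES
Each line shows a node's flags (`master` or `slave`), the ID of its master if it is a replica, and its link state. Ensure every master has at least one replica, that no node is flagged `fail?` or `fail`, and that no link state is `disconnected`.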
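The CLUSTER NODES checks described above can be scripted. Below is a rough Python sketch that parses the command's text output and reports failing nodes, broken links, and masters without a healthy replica. It assumes the standard space-separated line layout (ID, address, flags, master ID, ping sent, pong received, config epoch, link state, slots); the node IDs and addresses in the sample are made up.

```python
def parse_cluster_nodes(text):
    """Parse CLUSTER NODES output into a list of dicts, one per node."""
    nodes = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip empty or malformed lines
        nodes.append({
            "id": parts[0],
            "addr": parts[1],
            "flags": parts[2].split(","),
            "master_id": parts[3],   # "-" for masters
            "link_state": parts[7],  # "connected" or "disconnected"
        })
    return nodes


def find_problems(nodes):
    """Return a list of human-readable problems found in the node table."""
    problems = []
    # Masters that have at least one replica not flagged as failed.
    healthy_replica_of = {
        n["master_id"] for n in nodes
        if "slave" in n["flags"] and "fail" not in n["flags"]
    }
    for n in nodes:
        if "fail" in n["flags"] or "fail?" in n["flags"]:
            problems.append(n["addr"] + " is flagged as failing")
        if n["link_state"] != "connected":
            problems.append(n["addr"] + " link is " + n["link_state"])
        if "master" in n["flags"] and n["id"] not in healthy_replica_of:
            problems.append(n["addr"] + " has no healthy replica")
    return problems


# Hypothetical output: the second master has no replica.
SAMPLE = """\
aaa 10.10.0.5:6379@16379 myself,master - 0 0 1 connected 0-8191
bbb 10.10.0.6:6379@16379 master - 0 1700000000000 2 connected 8192-16383
ccc 10.10.0.7:6379@16379 slave aaa 0 1700000000000 1 connected
"""
print(find_problems(parse_cluster_nodes(SAMPLE)))
```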
Magento 2 Application Performance Monitoring (APM) with New Relic
While Redis health is crucial, understanding how your Magento 2 application interacts with it and other services is equally important. New Relic is a powerful APM tool that provides deep insights into transaction traces, database queries, external service calls, and error rates.
Key Magento 2 Metrics to Track
- Transaction Traces: Identify slow-loading pages, problematic API endpoints, or inefficient cron jobs. Look for transactions exceeding your SLOs (Service Level Objectives).
- Database Queries: Pinpoint slow SQL queries, especially those executed within Magento’s EAV model or complex catalog operations.
- External Services: Monitor latency and error rates for calls to third-party APIs (payment gateways, shipping providers, etc.) and internal services like Redis.
- Error Rates: Track PHP errors, exceptions, and HTTP 5xx errors. Set up alerts for spikes in error frequency.
- Throughput: Monitor requests per minute (RPM) to understand traffic patterns and identify potential load issues.
- Response Time: Track average and percentile response times (e.g., p95, p99) to gauge user experience.
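To see why percentiles matter alongside the average, here is a small Python sketch using the nearest-rank method. The response times are invented for illustration, and New Relic computes its own percentiles internally, so this only demonstrates the concept.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it. pct is an integer between 1 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# 100 hypothetical response times in seconds: most are fast, a few are very slow.
durations = [0.2] * 90 + [0.5] * 5 + [3.0] * 5

average = sum(durations) / len(durations)
p95 = percentile(durations, 95)
p99 = percentile(durations, 99)
print(round(average, 3), p95, p99)
```

In this made-up sample the average stays well under one second while the p99 is three seconds, so an alert based only on the average would miss the slowest requests.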
Configuring New Relic for Magento 2
The New Relic PHP agent is typically installed from New Relic's apt or yum package repository, or by downloading the agent tarball and running its installer script. For DigitalOcean droplets, this involves SSHing into the server and following the official New Relic installation guide.
After installation, you’ll need to configure the `newrelic.ini` file. This file is usually located in your PHP configuration directory (e.g., `/etc/php/8.1/cli/conf.d/` or `/etc/php/8.1/fpm/conf.d/`).
[newrelic]
; Required: your New Relic license key
newrelic.license = "YOUR_LICENSE_KEY"
; Required: the application name that will appear in the New Relic UI
newrelic.appname = "Magento2-Production-Web"
; Optional: uncomment to override the defaults
;newrelic.high_security = false
;newrelic.transaction_tracer.enabled = true
; Record a trace for transactions slower than this (10ms is very aggressive)
;newrelic.transaction_tracer.threshold = "10ms"
;newrelic.error_collector.enabled = true
;newrelic.loglevel = "info"
;newrelic.logfile = "/var/log/newrelic/php_agent.log"
; The agent normally detects Magento 2 on its own; set this only to force detection
;newrelic.framework = "magento2"
After modifying `newrelic.ini`, restart your PHP-FPM service and web server (e.g., Nginx or Apache) for the changes to take effect.
sudo systemctl restart php8.1-fpm
sudo systemctl restart nginx
Alerting on Key APM Metrics
Within the New Relic UI, navigate to the “Alerts & AI” section. Create NRQL (New Relic Query Language) alerts for critical Magento 2 performance indicators.
Example NRQL for High Error Rate:
SELECT count(*) FROM TransactionError WHERE appName = 'Magento2-Production-Web' SINCE 5 minutes ago
Set a threshold (e.g., > 10 errors in 5 minutes) to trigger an alert.
Example NRQL for Slow Transactions:
SELECT average(duration) FROM Transaction WHERE appName = 'Magento2-Production-Web' SINCE 5 minutes ago
Alert if the average duration exceeds your SLO (e.g., > 2 seconds).
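Example NRQL for Redis Latency (if Redis is instrumented):
SELECT average(newrelic.timeslice.value) FROM Metric WHERE appName = 'Magento2-Production-Web' AND metricTimesliceName = 'Datastore/operation/Redis/get' SINCE 5 minutes ago
Monitor the average time spent on Redis GET operations. High latency here directly impacts page load times. The exact metric name depends on the agent version and the Redis client library, so confirm it in the New Relic metric explorer before alerting on it.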
DigitalOcean Droplet Resource Monitoring with `node_exporter` and Prometheus
While APM tools focus on application-level performance, it’s crucial to monitor the underlying infrastructure. For DigitalOcean droplets, this means tracking CPU, memory, disk I/O, and network traffic. Prometheus, coupled with `node_exporter`, is a standard for this type of metric collection.
Setting up `node_exporter`
Download the latest `node_exporter` binary for your droplet’s architecture from the official Prometheus releases page. For example, on an Ubuntu droplet:
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
Create a systemd service file to manage `node_exporter`.
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=nobody
Group=nogroup
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
Save this as `/etc/systemd/system/node_exporter.service` and then enable and start it:
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
Verify that `node_exporter` is running and exposing metrics on port 9100:
curl http://localhost:9100/metrics
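The endpoint returns plain text in the Prometheus exposition format, which is simple enough to inspect in a script when debugging. The Python sketch below parses such text into a dictionary and derives the available-memory percentage from it. The parser is deliberately simplified (it assumes no trailing timestamps), and the sample values are invented.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into {series: value}.

    Simplified: skips comment lines and assumes each sample line ends
    with the value (node_exporter does not append timestamps).
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hypothetical excerpt of node_exporter output.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_memory_MemAvailable_bytes 1.073741824e+09
node_memory_MemTotal_bytes 4.294967296e+09
"""

metrics = parse_metrics(SAMPLE)
available_pct = (metrics["node_memory_MemAvailable_bytes"]
                 / metrics["node_memory_MemTotal_bytes"] * 100)
print(available_pct)
```

Against a live droplet, the text could be fetched with `urllib.request.urlopen("http://localhost:9100/metrics")` and decoded before being passed to `parse_metrics`.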
Configuring Prometheus to Scrape Droplets
In your Prometheus configuration file (e.g., `/etc/prometheus/prometheus.yml`), add a scrape job for your Magento 2 droplets.
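scrape_configs:
  - job_name: 'magento_droplets'
    static_configs:
      # Several targets share this group, so tag them with a role and let
      # Prometheus keep its default per-target instance label.
      - targets: ['<web-01-ip>:9100', '<web-02-ip>:9100', '<web-03-ip>:9100']
        labels:
          role: 'magento-web'
      - targets: ['<redis-master-ip>:9100']
        labels:
          instance: 'magento-redis-master-01'
      - targets: ['<redis-replica-ip>:9100']
        labels:
          instance: 'magento-redis-replica-01'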
Reload Prometheus configuration for the changes to take effect.
sudo systemctl reload prometheus
Key Droplet Metrics for Alerting
Define alerting rules in Prometheus and let Alertmanager route the resulting notifications. Useful metrics to alert on:
- CPU Usage: High CPU can indicate inefficient code, traffic spikes, or resource contention.
- Memory Usage: Running out of memory leads to OOM killer events and application instability.
- Disk I/O Wait: High I/O wait times suggest storage bottlenecks, often exacerbated by slow disk performance or excessive database activity.
- Network Traffic: Sudden spikes or drops can indicate network issues or unusual traffic patterns.
- Filesystem Usage: Ensure logs and temporary directories don’t fill up the disk.
Example Alert Rule (in a Prometheus rules file loaded via `rule_files`):
groups:
  - name: magento_alerts
    rules:
      - alert: HighCpuUsage
        expr: 100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage on {{ $labels.instance }} is above 90% for the last 10 minutes."
      - alert: LowMemoryAvailable
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low memory available on {{ $labels.instance }}"
          description: "Only {{ $value | printf \"%.2f\" }}% of memory is available on {{ $labels.instance }}."
      # node_disk_io_time_seconds_total measures how long the device was busy.
      # DigitalOcean disks usually appear as vd*, so match both sd* and vd*.
      - alert: HighDiskIOWait
        expr: avg by (instance) (rate(node_disk_io_time_seconds_total{device=~"(sd|vd)[a-z]+"}[5m])) > 0.8
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High disk I/O utilization on {{ $labels.instance }}"
          description: "Disks on {{ $labels.instance }} have been busy more than 80% of the time for 15 minutes."
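The CPU expression is easier to reason about with numbers. `node_cpu_seconds_total{mode="idle"}` is a per-core counter of idle seconds, `rate(...)` turns it into the idle fraction over the window, and subtracting the averaged fraction from 100 gives the busy percentage. The Python sketch below reproduces that arithmetic for a hypothetical two-core droplet; the counter values are made up.

```python
def cpu_usage_percent(idle_start, idle_end, interval_seconds):
    """Mimic 100 - avg(rate(idle_counter[window])) * 100.

    idle_start / idle_end: per-core idle counters (in seconds) sampled at
    the start and end of the window.
    """
    idle_fractions = [
        (end - start) / interval_seconds
        for start, end in zip(idle_start, idle_end)
    ]
    avg_idle = sum(idle_fractions) / len(idle_fractions)
    return 100 - avg_idle * 100


# Over a 300 s window, core 0 was idle for 30 s and core 1 for 15 s.
usage = cpu_usage_percent([1000.0, 2000.0], [1030.0, 2015.0], 300)
print(round(usage, 2))
```

With these sample counters the result is above the 90% threshold, so the `HighCpuUsage` rule would fire if the condition held for the full `for: 10m` period.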
Integrating Redis Metrics into Prometheus
To get Redis-specific metrics into Prometheus, we can use the widely used open-source Redis Exporter (`oliver006/redis_exporter`) or leverage `redis-cli` within a custom exporter script. The Redis Exporter is generally preferred for its comprehensive metric set.
Using Redis Exporter
Download and run the Redis Exporter binary. It can be configured to connect to your Redis master, replicas, or Sentinel instances.
# Download and extract (example for Linux AMD64)
wget https://github.com/oliver006/redis_exporter/releases/download/v1.50.0/redis_exporter-v1.50.0.linux-amd64.tar.gz
tar xvfz redis_exporter-v1.50.0.linux-amd64.tar.gz
sudo mv redis_exporter-v1.50.0.linux-amd64/redis_exporter /usr/local/bin/

# Create systemd service
cat <<EOF | sudo tee /etc/systemd/system/redis_exporter.service
[Unit]
Description=Redis Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=nobody
Group=nogroup
Type=simple
ExecStart=/usr/local/bin/redis_exporter --redis.addr=redis://<redis-host>:6379 --redis.password=<redis-password>
# For Sentinel, run a separate exporter instance pointed at the Sentinel port:
# ExecStart=/usr/local/bin/redis_exporter --redis.addr=redis://<sentinel-host>:26379

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable redis_exporter
sudo systemctl start redis_exporter
Add a scrape job to your Prometheus configuration:
scrape_configs:
  - job_name: 'redis_cluster'
    static_configs:
      - targets: ['<redis-exporter-host>:9121'] # Default port for redis_exporter
        labels:
          instance: 'magento-redis-master'
  # Add similar jobs for replicas if using separate exporters or a single exporter configured for them
Key Redis Metrics for Alerting
- redis_up: Should be 1. If 0, the exporter can’t connect to Redis.
- redis_connected_clients: Monitor for excessive client connections.
- redis_instantaneous_ops_per_sec: Track command throughput.
- redis_memory_used_bytes: Monitor memory consumption against the configured limit (redis_memory_max_bytes).
- redis_evicted_keys_total: A rising count indicates memory pressure and loss of cached data.
- redis_connected_slave_lag_seconds: Reported on the master for each connected replica; crucial for confirming replicas are in sync.
- redis_commands_processed_total: Total commands processed by the server.
- redis_keyspace_hits_total and redis_keyspace_misses_total: Calculate hit rate (hits / (hits + misses)). A low hit rate might indicate insufficient memory or inefficient caching.
Example Alert Rules for Redis Replication Lag and Memory Usage:
groups:
  - name: redis_alerts
    rules:
      - alert: RedisReplicationLagging
        expr: redis_connected_slave_lag_seconds{job="redis_cluster"} > 60 # Lagging by more than 60 seconds
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Redis replication lag on {{ $labels.instance }}"
          description: "A replica of {{ $labels.instance }} is lagging by {{ $value | printf \"%.2f\" }} seconds."
      - alert: HighRedisMemoryUsage
        # Compare against the configured maxmemory; skip instances with no limit (maxmemory 0)
        expr: (redis_memory_used_bytes{job="redis_cluster"} / redis_memory_max_bytes{job="redis_cluster"} > 0.9) and (redis_memory_max_bytes{job="redis_cluster"} > 0)
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High Redis memory usage on {{ $labels.instance }}"
          description: "Redis memory usage on {{ $labels.instance }} is at {{ $value | humanizePercentage }} of maxmemory."
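The hit-rate formula from the metric list, hits / (hits + misses), applies to counters that only ever increase, so it is usually evaluated over a time window rather than over the lifetime totals. A minimal Python sketch of both forms follows; the counter readings are hypothetical.

```python
def hit_rate(hits, misses):
    """Return hits / (hits + misses), or None when there were no lookups."""
    total = hits + misses
    if total == 0:
        return None
    return hits / total


def hit_rate_over_window(hits_start, hits_end, misses_start, misses_end):
    """Hit rate for the lookups that happened between two counter readings."""
    return hit_rate(hits_end - hits_start, misses_end - misses_start)


# Lifetime totals versus the last scrape interval (made-up readings).
lifetime = hit_rate(90_000, 10_000)
recent = hit_rate_over_window(89_500, 90_000, 9_500, 10_000)
print(lifetime, recent)
```

In this example the lifetime ratio still looks healthy while the most recent window has dropped to half, which is the kind of regression a windowed calculation exposes.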
Centralized Logging with ELK Stack (Elasticsearch, Logstash, Kibana)
Aggregating logs from all your Magento 2 application servers, Redis nodes, and load balancers into a central location is essential for debugging and auditing. The ELK stack is a robust solution for this.
Log Shipping with Filebeat
Filebeat is a lightweight shipper that forwards log files from your servers to Logstash or Elasticsearch. Install Filebeat on each server (Magento app, Redis, Nginx, etc.).
# Example for Ubuntu/Debian
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.10.4-amd64.deb
sudo dpkg -i filebeat-8.10.4-amd64.deb
Configure Filebeat to collect Magento, Nginx, and Redis logs. Edit `/etc/filebeat/filebeat.yml`:
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/nginx/*.log
    fields_under_root: true
    fields:
      log_type: nginx
  - type: log
    enabled: true
    paths:
      - /var/www/html/magento2/var/log/*.log # Adjust path as needed
    fields_under_root: true
    fields:
      log_type: magento
  - type: log
    enabled: true
    paths:
      - /var/log/redis/redis-server.log # Adjust path if using a different log location
    fields_under_root: true
    fields:
      log_type: redis
output.logstash:
  hosts: ["<logstash-host>:5044"] # Or Elasticsearch output directly
# If sending directly to Elasticsearch:
# output.elasticsearch:
#   hosts: ["<elasticsearch-host>:9200"]
#   index: "filebeat-%{[agent.version]}-%{+yyyy.MM.dd}"
Enable and start Filebeat:
sudo systemctl enable filebeat
sudo systemctl start filebeat
Log Processing with Logstash
Logstash will receive logs from Filebeat, parse them, enrich them, and send them to Elasticsearch. Create a Logstash pipeline configuration (e.g., `/etc/logstash/conf.d/02-magento-pipeline.conf`):
input {
  beats {
    port => 5044
  }
}
filter {
  # Filebeat sets fields_under_root, so log_type arrives as a top-level field.
  if [log_type] == "nginx" {
    # clientip, response, and bytes are the legacy grok field names; on Logstash 8+
    # they require pipeline.ecs_compatibility: disabled for this pipeline.
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    geoip {
      source => "clientip"
    }
    # response_time exists only if the Nginx log format and the grok pattern
    # are extended to capture it.
    mutate {
      convert => {
        "response" => "integer"
        "bytes" => "integer"
        "response_time" => "float"
      }
    }
  }
  if [log_type] == "magento" {
    # Magento (Monolog) lines look like: [timestamp] channel.LEVEL: message
    # Use the JSON filter instead if your logs are structured.
    grok {
      match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\] %{DATA:channel}\.%{LOGLEVEL:log_level}: %{GREEDYDATA:magento_message}" }
    }
    date {
      match => [ "timestamp", "ISO8601", "yyyy-MM-dd HH:mm:ss" ]
    }
  }
  if [log_type] == "redis" {
    # Redis lines look like: pid:role dd Mon yyyy HH:mm:ss.SSS level message,
    # where level is a single character (one of . - * #).
    grok {
      match => { "message" => "%{POSINT:process_id}:%{WORD:role} (?<timestamp>%{MONTHDAY} %{MONTH} %{YEAR} %{TIME}) (?<log_level>[.*#-]) %{GREEDYDATA:redis_message}" }
    }
    date {
      match => [ "timestamp", "dd MMM yyyy HH:mm:ss.SSS" ]
    }
  }
  # Common fields for all log types
  mutate {
    rename => { "message" => "original_message" }
  }
}
output {
  elasticsearch {
    hosts => ["<elasticsearch-host>:9200"]
    index => "%{[log_type]}-%{+yyyy.MM.dd}"
  }
}
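Grok patterns are regular expressions underneath, so it can help to prototype a pattern outside Logstash before deploying it. The Python sketch below shows how a typical Magento log line breaks down into fields. It assumes the common Monolog layout `[timestamp] channel.LEVEL: message`; the sample line is invented, and the regex is an approximation of a grok pattern rather than a copy of one.

```python
import re
from datetime import datetime

# Approximates: \[%{TIMESTAMP_ISO8601}\] channel.LEVEL: message
MAGENTO_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] "
    r"(?P<channel>[\w-]+)\.(?P<level>[A-Z]+): "
    r"(?P<message>.*)$"
)


def parse_magento_line(line):
    """Split one log line into fields, or return None if it does not match."""
    match = MAGENTO_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Accepts both "2024-01-15T10:30:00+00:00" and "2024-01-15 10:30:00".
    fields["timestamp"] = datetime.fromisoformat(fields["timestamp"])
    return fields


# Hypothetical log line.
sample = "[2024-01-15T10:30:00.123456+00:00] main.CRITICAL: Redis connection refused [] []"
parsed = parse_magento_line(sample)
print(parsed["channel"], parsed["level"], parsed["timestamp"].isoformat())
```

Lines that do not match, such as stack-trace continuation lines, return `None` here; in Logstash the equivalent events are tagged `_grokparsefailure` and are worth watching for in Kibana.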
Log Analysis with Kibana
Use Kibana to visualize and search your logs. Create a data view (called an index pattern before Kibana 8) for each log type (e.g., `nginx-*`, `magento-*`, `redis-*`).
Key Kibana Dashboards/Visualizations:
- Nginx Access Logs: Visualize traffic volume, top IP addresses, response codes (especially 4xx and 5xx), and slow requests.
- Magento Logs: Filter for `ERROR` or `FATAL` log levels to quickly identify application issues. Search for specific exceptions or error messages.
- Redis Logs: Monitor for warnings, errors, or specific events like `eviction` or `failover`.
- Correlated Views: Create dashboards that combine Nginx, Magento, and Redis logs, allowing you to trace a request from the web server through the application and to the cache layer.
By layering these monitoring strategies, from infrastructure metrics to application traces and centralized logging, you can build a resilient and observable Magento 2 environment on DigitalOcean and keep your e-commerce platform available and fast.