Server Monitoring Best Practices: Keeping Your Magento 2 App and Redis Clusters Alive on AWS
Proactive Health Checks for Magento 2 on AWS EC2
Maintaining a high-availability Magento 2 instance on AWS requires a multi-layered monitoring strategy. Beyond basic CPU and memory utilization, we need to focus on application-specific metrics and external dependencies. This section details essential checks for your EC2 instances hosting Magento 2.
EC2 Instance Metrics: Beyond the Basics
Amazon CloudWatch provides fundamental EC2 metrics out of the box, such as CPUUtilization and the network and disk I/O counters. Memory utilization is not among them; it requires the CloudWatch agent. These metrics are crucial, but they are often lagging indicators, so we need to supplement them with more granular, application-level checks.
Essential CloudWatch Alarms Configuration
Configure alarms for the following metrics. Set thresholds that reflect your application’s performance baseline, aiming for proactive alerts rather than reactive firefighting.
- CPUUtilization: Alarm if sustained above 80% for 5 minutes.
- NetworkIn/NetworkOut: Alarm if exceeding expected traffic patterns (e.g., sudden spikes indicating potential DDoS or runaway processes).
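- VolumeQueueLength: Alarm if consistently above 2 for any attached EBS volume. This indicates I/O bottlenecks. (EBS publishes this metric per volume in the AWS/EBS namespace.)
- StatusCheckFailed: Alarm immediately if either StatusCheckFailed_Instance or StatusCheckFailed_System is non-zero. A failed system check points to the underlying AWS host or infrastructure; a failed instance check points to a problem inside the instance itself, such as a misconfigured network or an exhausted resource.
Memory metrics require the CloudWatch agent. Once it is installed and configured, it reports mem_used_percent (in the CWAgent namespace by default). A common threshold is alarming if mem_used_percent exceeds 90% for 5 minutes.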
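As a rough sketch of the CPU rule above, an alarm can be created with the AWS CLI's `put-metric-alarm` command. The function name, the alarm name, and the SNS topic ARN passed in are illustrative assumptions. A 300-second period with one evaluation period approximates "above 80% for 5 minutes" on instances with basic (5-minute) monitoring.

```shell
#!/bin/sh
# Sketch: create a "CPU above 80% for 5 minutes" alarm for one EC2 instance.
# create_cpu_alarm and the alarm name are hypothetical; adapt to your naming scheme.
create_cpu_alarm() {
  instance_id=$1   # e.g. i-0123456789abcdef0
  topic_arn=$2     # SNS topic that receives the notification
  aws cloudwatch put-metric-alarm \
    --alarm-name "magento-cpu-high-${instance_id}" \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=${instance_id}" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 80 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "${topic_arn}"
}
```

Call it as `create_cpu_alarm <instance-id> <sns-topic-arn>`. The same shape works for the other metrics in the list by changing the namespace, metric name, dimensions, and threshold.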
Application-Level Monitoring with CloudWatch Agent and Custom Metrics
CloudWatch Agent is indispensable for collecting logs and custom metrics from your EC2 instances. This allows us to monitor Magento 2’s specific health.
Configuring CloudWatch Agent for Magento 2 Logs
The agent can tail Magento 2’s log files and send them to CloudWatch Logs. This is vital for debugging and identifying application errors.
Create or edit the CloudWatch agent configuration file (e.g., /opt/aws/amazon-cloudwatch-agent/bin/config.json).
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "cwagent"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/www/html/magento2/var/log/system.log",
            "log_group_name": "/aws/magento2/ec2/system",
            "log_stream_name": "{instance_id}/system.log"
          },
          {
            "file_path": "/var/www/html/magento2/var/log/exception.log",
            "log_group_name": "/aws/magento2/ec2/exception",
            "log_stream_name": "{instance_id}/exception.log"
          },
          {
            "file_path": "/var/www/html/magento2/var/log/debug.log",
            "log_group_name": "/aws/magento2/ec2/debug",
            "log_stream_name": "{instance_id}/debug.log"
          }
        ]
      }
    }
  },
  "metrics": {
    "metrics_collected": {
      "cpu": {
        "measurement": [
          "cpu_usage_idle",
          "cpu_usage_user",
          "cpu_usage_system",
          "cpu_usage_iowait",
          "cpu_usage_steal"
        ],
        "metrics_collection_interval": 60,
        "totalcpu": true
      },
      "disk": {
        "measurement": [
          "used_percent",
          "inodes_free"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "/"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent"
        ],
        "metrics_collection_interval": 60
      },
      "net": {
        "measurement": [
          "bytes_sent",
          "bytes_recv",
          "packets_sent",
          "packets_recv"
        ],
        "metrics_collection_interval": 60
      }
    }
  }
}
After saving the configuration, restart the agent:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
Custom Metrics for Magento 2 Processes
Custom metrics can also track the status of critical Magento 2 processes (e.g., PHP-FPM, Nginx). A simple script can check whether each process is running and report the result with the AWS CLI. The instance's IAM role needs the cloudwatch:PutMetricData permission for this to work.
Create a script (e.g., /opt/scripts/check_magento_processes.sh):
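#!/bin/bash
# Report whether each critical process is running (1) or not (0).
PROCESSES=("php-fpm" "nginx")
NAMESPACE="Magento2/EC2/Processes"
IMDS="http://169.254.169.254/latest"
# Look up the instance ID once, using an IMDSv2 session token
TOKEN=$(curl -s -X PUT "$IMDS/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" "$IMDS/meta-data/instance-id")
for PROC in "${PROCESSES[@]}"; do
  # -x requires an exact name match; adjust for versioned names such as php-fpm8.2
  if pgrep -x "$PROC" > /dev/null; then
    STATUS=1 # Running
  else
    STATUS=0 # Not Running
  fi
  # Publish metric to CloudWatch
  aws cloudwatch put-metric-data --namespace "$NAMESPACE" \
    --metric-name "${PROC}_status" --value "$STATUS" --unit Count \
    --dimensions "InstanceId=$INSTANCE_ID"
done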
Make the script executable:
chmod +x /opt/scripts/check_magento_processes.sh
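Schedule this script to run periodically using cron (e.g., every minute). Entries in /etc/cron.d need a user field between the schedule and the command. If the aws binary is not on cron's default PATH, use its full path in the script.
echo "* * * * * root /opt/scripts/check_magento_processes.sh" | sudo tee /etc/cron.d/check_magento_processes
Alternatively, the CloudWatch agent's procstat plugin can report process counts without a separate script. Add the following to your config.json inside metrics_collected, under the metrics section: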
"procstat": [
  {
    "pattern": "php-fpm",
    "measurement": [
      "pid_count"
    ],
    "metrics_collection_interval": 60
  },
  {
    "pattern": "nginx",
    "measurement": [
      "pid_count"
    ],
    "metrics_collection_interval": 60
  }
]
Note: You only need one of these two approaches. Calling `aws cloudwatch put-metric-data` from a script is a simple way to push custom metrics directly. The `procstat` plugin in the agent is geared toward collecting process-level statistics such as CPU and memory usage per process. For a simple “is it running?” check, the AWS CLI method is often sufficient and easier to manage.
Web Server (Nginx) Monitoring
Nginx is the gateway for your Magento 2 application. Monitoring its health and performance is paramount.
Nginx Status and Performance Metrics
Enable the Nginx stub_status module to expose basic metrics. In your Nginx configuration (e.g., /etc/nginx/nginx.conf or a site-specific conf file):
http {
    # ... other http configurations ...
    server {
        listen 80;
        server_name your_domain.com;
        location /nginx_status {
            stub_status;
            allow 127.0.0.1; # Restrict access to localhost
            deny all;
        }
        # ... other locations ...
    }
}
Reload Nginx to apply changes:
sudo systemctl reload nginx
You can then fetch these metrics:
curl http://localhost/nginx_status
The output will look like:
Active connections: 123
server accepts handled requests
1667890 1667890 12345678
Reading: 1 Writing: 6 Waiting: 116
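Use a script (similar to the process check script) to parse these values and send them as custom CloudWatch metrics (e.g., ActiveConnections, RequestsPerSecond, ReadingConnections, WritingConnections, WaitingConnections). The accepts, handled, and requests values are cumulative counters since Nginx started. Divide the change in requests between two samples by the sampling interval to get requests per second, and treat any gap between accepts and handled as dropped connections, which usually means a resource limit such as worker_connections has been reached.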
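A minimal sketch of the parsing step, assuming the stub_status output format shown above. The function names and the metric names they print are illustrative, not part of Nginx or CloudWatch.

```shell
#!/bin/sh
# Sketch: turn stub_status output into name=value pairs.
# Reads the status page on stdin, prints one metric per line.
parse_stub_status() {
  awk '
    /^Active connections:/ { print "ActiveConnections=" $3 }
    # The counters line holds three integers: accepts, handled, requests
    /^ *[0-9]+ +[0-9]+ +[0-9]+/ {
      print "Accepts=" $1
      print "Handled=" $2
      print "Requests=" $3
    }
    /^Reading:/ {
      print "ReadingConnections=" $2
      print "WritingConnections=" $4
      print "WaitingConnections=" $6
    }
  '
}

# Requests per second between two samples of the cumulative requests counter.
# Prints 0 if the counter went backwards (e.g. Nginx restarted) or the
# interval is not positive.
requests_per_second() {
  awk -v prev="$1" -v cur="$2" -v secs="$3" 'BEGIN {
    if (secs <= 0 || cur < prev) { print 0; exit }
    printf "%.2f\n", (cur - prev) / secs
  }'
}
```

For example, `curl -s http://localhost/nginx_status | parse_stub_status` prints the current values, and each one can then be passed to `aws cloudwatch put-metric-data`. The previous requests value has to be stored between runs (for instance in a small state file) to compute the rate.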
Nginx Error Log Monitoring
Ensure Nginx error logs (e.g., /var/log/nginx/error.log) are collected by the CloudWatch agent by adding an entry for the file to the collect_list in the agent's logs configuration, following the same pattern as the Magento 2 logs. Then set up CloudWatch Logs metric filters to alert on specific error patterns (e.g., “access forbidden by rule”, “upstream timed out”).
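One way to create such a metric filter is with `aws logs put-metric-filter`. This is a sketch under assumptions: the function name, filter name, metric name, and namespace are hypothetical, and the log group passed in must be whichever group your agent ships the Nginx error log to.

```shell
#!/bin/sh
# Sketch: count "upstream timed out" lines in the Nginx error log group.
# Filter, metric, and namespace names below are illustrative placeholders.
create_upstream_timeout_filter() {
  log_group=$1   # log group that receives /var/log/nginx/error.log
  aws logs put-metric-filter \
    --log-group-name "${log_group}" \
    --filter-name nginx-upstream-timeout \
    --filter-pattern '"upstream timed out"' \
    --metric-transformations \
      metricName=NginxUpstreamTimeouts,metricNamespace=Magento2/Nginx,metricValue=1,defaultValue=0
}
```

The double quotes inside the filter pattern make CloudWatch Logs match the exact phrase rather than the three words independently. A regular CloudWatch alarm on the resulting metric then handles the alerting.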
Redis Cluster Monitoring on AWS ElastiCache
For Magento 2, Redis is critical for caching. Using AWS ElastiCache for Redis simplifies management, but robust monitoring is still essential.
ElastiCache Metrics in CloudWatch
ElastiCache automatically publishes a rich set of metrics to CloudWatch. Key metrics to monitor include:
- CacheHits/CacheMisses: High miss rates indicate potential performance degradation or insufficient cache capacity.
- Evictions: High eviction rates mean Redis is discarding data due to memory pressure. This can lead to increased cache misses.
- CurrConnections: Monitor for unexpected spikes or drops.
- EngineCPUUtilization: For Redis, this is a critical indicator of load.
- ReplicationLag: For Redis (Cluster Mode Disabled) or Redis (Cluster Mode Enabled) with read replicas, monitor replication lag to ensure data consistency across nodes.
- NewConnections: Monitor for excessive connection churn.
- BytesUsedForCache: Track memory usage to prevent OOM errors.
Setting Up ElastiCache Alarms
Configure CloudWatch alarms for your ElastiCache Redis cluster:
- Evictions: Alarm if `Evictions` exceeds a low threshold (e.g., 5 per minute) for 5 minutes. This suggests memory pressure.
- CacheMisses: Alarm if the `CacheMisses` rate significantly increases relative to `CacheHits` (e.g., miss rate > 50% for 10 minutes).
- EngineCPUUtilization: Alarm if consistently above 80% for 5 minutes.
- ReplicationLag: Alarm if `ReplicationLag` exceeds a few seconds (e.g., 5 seconds) for 2 minutes.
- BytesUsedForCache: Alarm if `BytesUsedForCache` exceeds 90% of the allocated memory for 5 minutes.
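The miss-rate condition is a ratio of two metrics rather than a single metric, so it helps to be explicit about the arithmetic: miss rate = misses / (hits + misses). The sketch below computes it for one sample period. The function names are illustrative, and the 50% default mirrors the example threshold above.

```shell
#!/bin/sh
# Sketch: cache miss rate (percent) from CacheHits and CacheMisses sums
# taken over the same period.
cache_miss_rate() {
  awk -v hits="$1" -v misses="$2" 'BEGIN {
    total = hits + misses
    if (total <= 0) { print "0.00"; exit }   # no traffic: report 0 rather than divide by zero
    printf "%.2f\n", 100 * misses / total
  }'
}

# Succeeds (exit status 0) when the rate is above the threshold (default 50).
miss_rate_breaches() {
  awk -v rate="$1" -v limit="${2:-50}" 'BEGIN { exit !(rate > limit) }'
}
```

For instance, `miss_rate_breaches "$(cache_miss_rate "$hits" "$misses")" && echo "miss rate too high"`. The same expression can be evaluated inside CloudWatch itself with a metric math alarm, which avoids running a script at all.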
Redis Performance Tuning and Monitoring Commands
While ElastiCache manages the infrastructure, you can still connect to your Redis instances (using `redis-cli`) to gather more granular insights, especially during troubleshooting.
Connecting to ElastiCache Redis
You’ll need the Redis endpoint and port from your ElastiCache console. Use `redis-cli` with TLS enabled if your cluster requires it.
# If TLS is enabled
redis-cli -h your-redis-endpoint.cache.amazonaws.com -p 6379 --tls
# If TLS is not enabled
redis-cli -h your-redis-endpoint.cache.amazonaws.com -p 6379
Key `redis-cli` Commands for Monitoring
Once connected, these commands provide real-time status:
- `INFO memory`: Detailed memory usage statistics. Look at `used_memory`, `used_memory_rss`, and `mem_fragmentation_ratio`. A fragmentation ratio significantly above 1.5 can indicate memory inefficiency.
- `INFO stats`: Provides hit/miss ratios, commands processed, connections, etc.
- `INFO persistence`: Relevant if RDB or AOF is enabled (less common with ElastiCache unless specifically configured).
- `MONITOR`: (Use with extreme caution in production!) Streams all commands being executed. Useful for identifying slow or unexpected queries, but can heavily impact performance.
- `SLOWLOG GET 10`: Retrieves the 10 most recent entries from the slow log. Essential for identifying performance bottlenecks within Redis itself.
- `CLIENT LIST`: Lists all connected clients, their state, and idle time. Helps identify stale or problematic connections.
You can script these commands to run periodically and push custom metrics to CloudWatch if ElastiCache’s built-in metrics aren’t sufficient. For example, a script could run redis-cli INFO memory, parse the output, and use `aws cloudwatch put-metric-data`.
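A minimal sketch of the parsing half of such a script. `INFO` output is a list of `name:value` lines terminated with CRLF, so the carriage returns need stripping before the values are usable. The function names are illustrative, and the 1.5 default follows the fragmentation guideline above.

```shell
#!/bin/sh
# Sketch: extract one field from `redis-cli INFO ...` output read on stdin.
redis_info_field() {
  field=$1
  # INFO lines end in \r\n; drop the \r, then split on ":"
  tr -d '\r' | awk -F: -v key="$field" '$1 == key { print $2; exit }'
}

# Succeeds (exit status 0) when the fragmentation ratio is above the limit (default 1.5).
fragmentation_is_high() {
  awk -v ratio="$1" -v limit="${2:-1.5}" 'BEGIN { exit !(ratio > limit) }'
}
```

Typical use would be `redis-cli -h <endpoint> INFO memory | redis_info_field mem_fragmentation_ratio`, with the result passed to `aws cloudwatch put-metric-data` as the metric value.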
Application Performance Monitoring (APM) for Magento 2
While server and infrastructure monitoring are crucial, understanding application-level performance bottlenecks within Magento 2 itself requires specialized tools.
Integrating APM Tools
Consider integrating an Application Performance Monitoring (APM) solution. Popular choices include:
- New Relic
- Datadog APM
- Dynatrace
- AWS X-Ray (for distributed tracing, can be integrated with other APM tools)
These tools provide deep insights into:
- Transaction tracing: Identifying slow page loads, API calls, and background tasks.
- Database query performance: Pinpointing inefficient SQL queries.
- External service calls: Monitoring latency and errors from third-party integrations.
- Code-level profiling: Pinpointing specific functions or methods causing performance issues.
For Magento 2, APM is invaluable for diagnosing issues that don’t manifest as simple CPU or memory spikes, such as slow database queries, inefficient third-party API calls, or poorly optimized Magento modules.
Alerting Strategy and Incident Response
Effective monitoring is only half the battle; a well-defined alerting and incident response strategy is critical.
Consolidating Alerts
Use Amazon Simple Notification Service (SNS) to consolidate alerts from CloudWatch Alarms. Point each alarm at an SNS topic, then subscribe the appropriate destinations to that topic:
- Email: For less critical alerts or initial notifications.
- PagerDuty/Opsgenie: For critical alerts requiring immediate attention and on-call rotation.
- Slack/Microsoft Teams: For team-wide visibility and discussion.
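As a sketch of the first step, a topic with an email subscriber can be set up with two AWS CLI calls. The function name is hypothetical, and the topic name and address are whatever you pass in. Email subscriptions stay pending until the recipient confirms them.

```shell
#!/bin/sh
# Sketch: create an SNS topic for alarm notifications and subscribe an email address.
create_alert_topic() {
  topic_name=$1
  email=$2
  # create-topic is idempotent: it returns the existing ARN if the topic already exists
  topic_arn=$(aws sns create-topic --name "${topic_name}" \
    --query TopicArn --output text) || return 1
  aws sns subscribe \
    --topic-arn "${topic_arn}" \
    --protocol email \
    --notification-endpoint "${email}"
}
```

The topic ARN is what CloudWatch alarms reference in their alarm actions. Paging and chat tools are usually attached as additional subscriptions (typically HTTPS endpoints supplied by the vendor) on the same topic.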
Alerting Best Practices
Implement the following:
- Actionable Alerts: Each alert should clearly state the problem, the affected service, and ideally, suggest remediation steps.
- Avoid Alert Fatigue: Tune thresholds carefully. Use CloudWatch composite alarms to combine conditions (e.g., CPU high AND error rate high) and reduce noise. Implement alert silencing during planned maintenance.
- Severity Levels: Differentiate between critical (e.g., site down, Redis unavailable) and warning (e.g., high cache miss rate) alerts.
- Runbooks: Maintain runbooks for common alert types, detailing step-by-step procedures for diagnosis and resolution.
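The multi-condition idea above can be sketched with `aws cloudwatch put-composite-alarm`, which takes a rule expression over existing alarms. The helper names are illustrative, and the child alarm names you pass in must already exist in your account.

```shell
#!/bin/sh
# Sketch: build an alarm rule that fires only when ALL named alarms are in ALARM state.
composite_rule() {
  rule=""
  for name in "$@"; do
    if [ -n "$rule" ]; then
      rule="$rule AND "
    fi
    rule="${rule}ALARM(\"${name}\")"
  done
  printf '%s\n' "$rule"
}

# Create a composite alarm that notifies an SNS topic.
# Usage: create_composite_alarm <alarm-name> <topic-arn> <child-alarm>...
create_composite_alarm() {
  alarm_name=$1
  topic_arn=$2
  shift 2
  aws cloudwatch put-composite-alarm \
    --alarm-name "${alarm_name}" \
    --alarm-rule "$(composite_rule "$@")" \
    --alarm-actions "${topic_arn}"
}
```

With this arrangement only the composite alarm carries a notification action, while the child alarms stay silent, so a CPU spike without a matching rise in errors does not page anyone.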
Conclusion
A comprehensive monitoring strategy for Magento 2 on AWS involves a blend of infrastructure metrics, application-specific logs and custom metrics, web server performance, and robust Redis cluster health checks. By leveraging AWS CloudWatch, the CloudWatch Agent, and potentially APM tools, coupled with a well-defined alerting strategy, you can ensure the stability, performance, and availability of your Magento 2 application.