Server Monitoring Best Practices: Keeping Your Magento 2 App and MongoDB Clusters Alive on AWS
Proactive Monitoring for Magento 2 and MongoDB on AWS
Maintaining a high-availability Magento 2 e-commerce platform, especially when coupled with a MongoDB cluster for specific functionalities (like catalog search or session management), demands a robust and proactive monitoring strategy. This document outlines essential best practices for monitoring your Magento 2 application and its supporting MongoDB infrastructure deployed on Amazon Web Services (AWS).
AWS CloudWatch: The Foundation of Your Monitoring Stack
AWS CloudWatch is your primary tool for collecting and tracking metrics, collecting log files, and setting alarms. For Magento 2 and MongoDB, we’ll focus on key metrics and log analysis.
Magento 2 Application Metrics
Beyond standard EC2 instance metrics (CPU Utilization, Network In/Out, Disk I/O), we need application-specific insights. This often involves custom metrics pushed to CloudWatch.
PHP-FPM Performance
PHP-FPM is critical for Magento 2’s performance. Monitor its active processes, idle processes, and listen queue length. PHP-FPM exposes these values through its status page, from which they can be published to CloudWatch.
First, enable the PHP-FPM status page in your php-fpm.conf (or equivalent configuration file, often in /etc/php/[version]/fpm/pool.d/www.conf):
; /etc/php/8.1/fpm/pool.d/www.conf
; ... other configurations ...
pm.status_path = /fpm-status
; ... other configurations ...
Then, configure the CloudWatch Agent to receive these values. The agent has no built-in plugin for scraping an arbitrary HTTP status page, so a common approach is to enable its StatsD listener and have a small script poll /fpm-status and forward the values (for example as active_processes, idle_processes, listen_queue, and accepted_conn). Create or edit the agent configuration file (e.g., /opt/aws/amazon-cloudwatch-agent/bin/config.json).
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "cwagent"
  },
  "metrics": {
    "namespace": "Magento2/App",
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "statsd": {
        "service_address": ":8125",
        "metrics_collection_interval": 60,
        "metrics_aggregation_interval": 60
      }
    }
  }
}
Apply the configuration and restart the agent:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
sudo systemctl restart amazon-cloudwatch-agent
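As a hedged sketch of the collection step, the snippet below shows how a script might turn the status page's JSON output (requested as /fpm-status?json) into entries shaped for CloudWatch's PutMetricData call. The function name and the metric names are hypothetical choices made for illustration; fetching the page and publishing the result are deliberately left out.

```python
import json

# Keys as reported by the PHP-FPM status page when requested with ?json.
# The metric names on the right are illustrative, not a fixed convention.
FPM_KEYS = {
    "active processes": "active_processes",
    "idle processes": "idle_processes",
    "listen queue": "listen_queue",
    "accepted conn": "accepted_conn",
    "max children reached": "max_children_reached",
}


def fpm_status_to_metric_data(status_json: str) -> list:
    """Convert a PHP-FPM JSON status document into MetricData-style entries."""
    status = json.loads(status_json)
    return [
        {"MetricName": metric_name, "Value": status[key], "Unit": "Count"}
        for key, metric_name in FPM_KEYS.items()
        if key in status  # skip keys that this PHP-FPM build does not report
    ]


if __name__ == "__main__":
    sample = json.dumps({
        "pool": "www", "accepted conn": 1200, "listen queue": 0,
        "idle processes": 6, "active processes": 4, "max children reached": 0,
    })
    for entry in fpm_status_to_metric_data(sample):
        print(entry)
```

The returned list can be handed to boto3's put_metric_data as MetricData, or each value can be sent to a local StatsD listener instead.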
Magento 2 Specific Logs
Magento 2 generates several critical log files (system.log, exception.log, debug.log). Forward these to CloudWatch Logs for analysis and alarming.
Add the following to your CloudWatch Agent configuration file (/opt/aws/amazon-cloudwatch-agent/bin/config.json) under the logs section:
{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/nginx/error.log",
            "log_group_name": "Magento2/Nginx/Error",
            "log_stream_name": "{instance_id}/nginx_error"
          },
          {
            "file_path": "/var/log/php-fpm/error.log",
            "log_group_name": "Magento2/PHP-FPM/Error",
            "log_stream_name": "{instance_id}/php_fpm_error"
          },
          {
            "file_path": "/var/www/html/var/log/system.log",
            "log_group_name": "Magento2/App/System",
            "log_stream_name": "{instance_id}/system"
          },
          {
            "file_path": "/var/www/html/var/log/exception.log",
            "log_group_name": "Magento2/App/Exception",
            "log_stream_name": "{instance_id}/exception"
          }
        ]
      }
    }
  }
}
Ensure the CloudWatch Agent has read permissions for these log files. After updating the agent configuration, apply and restart it as shown previously.
MongoDB Cluster Metrics
For MongoDB, we’ll monitor key operational metrics and query performance. CloudWatch has no native integration for a self-managed MongoDB cluster, so these metrics are typically published by custom scripts or sent through the CloudWatch Agent’s StatsD or collectd listeners.
MongoDB Server Status
The serverStatus command provides a wealth of information. A Python script can periodically execute this command and push relevant metrics to CloudWatch.
Here’s a Python script example using pymongo and boto3:
import logging

import boto3
import pymongo
from pymongo.errors import ConnectionFailure

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# AWS CloudWatch configuration
cloudwatch = boto3.client('cloudwatch')
NAMESPACE = 'Magento2/MongoDB'

# MongoDB connection details (replace with your actual connection string)
MONGO_URI = "mongodb://user:password@mongo1.example.com:27017,mongo2.example.com:27017/?replicaSet=rs0"


def metric(name, value, unit='Count'):
    """Build one CloudWatch MetricData entry."""
    return {'MetricName': name, 'Value': value, 'Unit': unit}


def get_mongo_metrics():
    client = None
    try:
        client = pymongo.MongoClient(MONGO_URI, serverSelectionTimeoutMS=5000)
        # The ping command is cheap and confirms that a server is reachable.
        client.admin.command('ping')
        logging.info("Successfully connected to MongoDB.")

        # serverStatus is run against the admin database
        server_status = client.admin.command('serverStatus')
        conn = server_status['connections']
        net = server_status['network']
        ops = server_status['opcounters']
        mem = server_status['mem']

        metrics = [
            # Connections
            metric('Connections_Current', conn['current']),
            metric('Connections_Available', conn['available']),
            metric('Connections_TotalCreated', conn['totalCreated']),
            # Network (cumulative since mongod started)
            metric('Network_InBytes', net['bytesIn'], 'Bytes'),
            metric('Network_OutBytes', net['bytesOut'], 'Bytes'),
            metric('Network_NumRequests', net['numRequests']),
            # Operations (cumulative since mongod started)
            metric('OpCounters_Insert', ops['insert']),
            metric('OpCounters_Query', ops['query']),
            metric('OpCounters_Update', ops['update']),
            metric('OpCounters_Delete', ops['delete']),
            # Memory (serverStatus reports these in MiB)
            metric('Memory_Resident', mem['resident'], 'Megabytes'),
            metric('Memory_Virtual', mem['virtual'], 'Megabytes'),
        ]

        # WiredTiger cache (if applicable); the statistic names contain spaces
        cache = server_status.get('wiredTiger', {}).get('cache')
        if cache:
            metrics.append(metric('WiredTiger_Cache_BytesUsed',
                                  cache['bytes currently in the cache'], 'Bytes'))
            metrics.append(metric('WiredTiger_Cache_PagesReadIntoCache',
                                  cache['pages read into cache']))
            metrics.append(metric('WiredTiger_Cache_PagesWrittenFromCache',
                                  cache['pages written from cache']))

        # Assumes a replica set; report this member's own state
        if 'myState' in server_status.get('repl', {}):
            state_map = {0: "STARTUP", 1: "PRIMARY", 2: "SECONDARY", 7: "ARBITER"}
            state = server_status['repl']['myState']
            # Using Count for the state code; it can be mapped in CloudWatch
            metrics.append(metric('ReplicaSet_State', state))
            logging.info(f"ReplicaSet State: {state_map.get(state, 'UNKNOWN')}")

        return metrics
    except ConnectionFailure as e:
        logging.error(f"Could not connect to MongoDB: {e}")
        # Push a "Health" metric of 0 so an alarm can fire
        cloudwatch.put_metric_data(Namespace=NAMESPACE, MetricData=[metric('Health', 0)])
        return []
    except Exception as e:
        logging.error(f"An error occurred: {e}")
        return []
    finally:
        if client:
            client.close()


def push_to_cloudwatch(metrics):
    if not metrics:
        logging.warning("No metrics to push.")
        return
    try:
        cloudwatch.put_metric_data(Namespace=NAMESPACE, MetricData=metrics)
        logging.info(f"Successfully pushed {len(metrics)} metrics to CloudWatch.")
    except Exception as e:
        logging.error(f"Failed to push metrics to CloudWatch: {e}")


if __name__ == "__main__":
    metrics_data = get_mongo_metrics()
    if metrics_data:
        push_to_cloudwatch(metrics_data)
        # Push a Health metric if the connection was successful
        cloudwatch.put_metric_data(Namespace=NAMESPACE, MetricData=[metric('Health', 1)])
    else:
        logging.error("Failed to retrieve MongoDB metrics.")
Schedule this script to run periodically (e.g., every minute) using cron. Ensure the script has the necessary IAM permissions to write to CloudWatch.
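Several serverStatus fields, including opcounters and the network byte counts, are cumulative totals since mongod started, so graphing them directly produces an ever-rising line. A minimal sketch of turning two consecutive samples into a per-second rate follows; the function name is hypothetical, and treating a decrease as a restart is an assumption about how resets should be handled.

```python
def counter_rate(previous: int, current: int, elapsed_seconds: float) -> float:
    """Return the per-second rate between two samples of a cumulative counter."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = current - previous
    if delta < 0:
        # Assumption: the counter went backwards because mongod restarted,
        # so everything counted since the restart is the new delta.
        delta = current
    return delta / elapsed_seconds


if __name__ == "__main__":
    # Two samples of opcounters.query taken 60 seconds apart
    print(counter_rate(1000, 1600, 60))
```

A script run from cron would need to persist the previous sample (for example in a small state file) between runs. CloudWatch metric math offers a RATE function as an alternative if you prefer to keep publishing raw totals.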
MongoDB Slow Query Logging
Slow queries can significantly degrade application performance. Enable and monitor MongoDB’s slow query log. Configure the log level and threshold in your mongod.conf.
# /etc/mongod.conf
# ... other configurations ...
systemLog:
  destination: file
  path: "/var/log/mongodb/mongod.log"
  logAppend: true
  verbosity: 0 # Adjust as needed
  quiet: false
  timeStampFormat: iso8601-local
operationProfiling:
  slowOpThresholdMs: 100 # Log operations taking longer than 100 ms
  slowOpSampleRate: 0.1 # Log 10% of slow operations
# ... other configurations ...
You can then use the CloudWatch Agent to tail this log file and send it to CloudWatch Logs. The agent’s filters option restricts what is shipped to lines matching a regular expression. The example below forwards only slow-operation entries, assuming MongoDB 4.4 or later, whose structured log marks them with the message "Slow query". Merge the entry into the existing collect_list alongside the Magento logs.
{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/mongodb/mongod.log",
            "log_group_name": "Magento2/MongoDB/SlowQuery",
            "log_stream_name": "{instance_id}/mongodb_slowquery",
            "filters": [
              {
                "type": "include",
                "expression": "Slow query"
              }
            ]
          }
        ]
      }
    }
  }
}
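To illustrate what can be extracted from these entries, here is a small sketch that counts slow operations in a batch of log lines. It assumes the structured JSON log format of MongoDB 4.4 and later, where a slow operation carries "msg": "Slow query" and a durationMillis value under attr. The function names and the 100 ms default are illustrative choices.

```python
import json
from typing import Iterable, Optional


def slow_query_millis(log_line: str) -> Optional[int]:
    """Return durationMillis for a slow-query log line, or None for any other line."""
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return None  # not a structured (JSON) log line
    if not isinstance(entry, dict) or entry.get("msg") != "Slow query":
        return None
    attr = entry.get("attr")
    if not isinstance(attr, dict):
        return None
    return attr.get("durationMillis")


def count_slow_queries(lines: Iterable[str], threshold_ms: int = 100) -> int:
    """Count slow-query lines whose duration is at least threshold_ms."""
    durations = (slow_query_millis(line) for line in lines)
    return sum(1 for d in durations if d is not None and d >= threshold_ms)
```

The resulting count could be published as a custom metric, which gives the slow query rate alarm a concrete signal to evaluate.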
Alerting Strategies with CloudWatch Alarms
Proactive alerting is crucial. Configure CloudWatch Alarms based on the metrics and logs collected.
Key Alarms for Magento 2
- High PHP-FPM Process Count: Alarm if active_processes exceeds a threshold (e.g., 80% of max_children) for a sustained period.
- High Listen Queue: Alarm if listen_queue is consistently above 0.
- High Nginx/PHP-FPM Error Rate: Create metric filters on CloudWatch Logs for specific error patterns (e.g., “PHP Fatal error”, “Segmentation fault”, “upstream timed out”) and alarm on their frequency.
- Application Exceptions: Alarm on the count of entries in the Magento2/App/Exception log group.
- High Latency (Custom Metric): Instrument your Magento application to send request latency metrics to CloudWatch and alarm on high percentiles (e.g., p95, p99).
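The first alarm in this list can be sketched as a set of parameters for CloudWatch's PutMetricAlarm API. The helper below only builds the parameter dictionary. Its name, the alarm name format, and the choice of five one-minute periods are assumptions for illustration, and it presumes active_processes is published in the Magento2/App namespace with an InstanceId dimension.

```python
from typing import Optional


def fpm_saturation_alarm(instance_id: str, max_children: int,
                         ratio: float = 0.8,
                         sns_topic_arn: Optional[str] = None) -> dict:
    """Build PutMetricAlarm parameters for sustained high PHP-FPM process usage."""
    params = {
        "AlarmName": f"magento2-fpm-active-processes-{instance_id}",
        "Namespace": "Magento2/App",
        "MetricName": "active_processes",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,            # seconds per evaluation period
        "EvaluationPeriods": 5,  # must breach for five consecutive periods
        "Threshold": max_children * ratio,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
    }
    if sns_topic_arn:
        # Notify an SNS topic when the alarm fires (the ARN is supplied by the caller)
        params["AlarmActions"] = [sns_topic_arn]
    return params
```

With boto3 installed and credentials configured, the result can be applied with boto3.client("cloudwatch").put_metric_alarm(**fpm_saturation_alarm("i-0123456789abcdef0", 50)), where the instance ID shown is a placeholder.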
Key Alarms for MongoDB
- MongoDB Health: Alarm if the custom Health metric drops to 0.
- Replica Set State: Alarm if a secondary node is not in the SECONDARY state (value 2) for an extended period, or if the primary is not in the PRIMARY state (value 1).
- High Connection Count: Alarm if Connections_Current approaches the configured maximum.
- Slow Query Rate: If you can extract a count of slow queries from logs or a dedicated metric, alarm on an increasing trend.
- High Disk I/O Wait: Monitor EC2 disk I/O metrics for the EBS volumes hosting MongoDB data and index files.
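For the connection count alarm, "approaching the maximum" is easier to alarm on as a percentage than as a raw count. The sketch below derives that percentage from the current and available values in serverStatus, on the assumption that their sum approximates the configured connection limit. The function name is hypothetical.

```python
def connection_utilization(current: int, available: int) -> float:
    """Return the percentage of the connection limit currently in use."""
    total = current + available  # assumed to approximate the configured maximum
    if total <= 0:
        return 0.0
    return 100.0 * current / total


if __name__ == "__main__":
    # 200 connections open, 600 still available
    print(connection_utilization(200, 600))
```

Publishing this value as a custom metric with the Percent unit allows a single fixed threshold (for example 80) to work across nodes with different limits.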
Distributed Tracing and Performance Analysis
For complex Magento 2 applications, understanding request flow across different services (web server, PHP-FPM, database, external APIs) is vital. AWS X-Ray can provide distributed tracing capabilities.
AWS X-Ray Integration
AWS does not publish an official X-Ray SDK for PHP, so PHP applications are usually traced with OpenTelemetry and exported to X-Ray through the AWS Distro for OpenTelemetry (ADOT) Collector. This allows you to trace requests as they traverse your Magento 2 application, including calls to MongoDB.
Install the OpenTelemetry SDK and OTLP exporter:
composer require open-telemetry/sdk open-telemetry/exporter-otlp
Run the ADOT Collector on your EC2 instances and initialize tracing in your Magento application’s bootstrap process. You’ll need to instrument both incoming requests and outgoing calls to MongoDB.
Database Specific Monitoring Tools
While CloudWatch provides a centralized view, dedicated MongoDB tools offer deeper insights.
Percona Monitoring and Management (PMM)
PMM is an open-source platform for managing and monitoring MySQL, MongoDB, and PostgreSQL performance. Deploying PMM in your AWS environment (e.g., on a separate EC2 instance or ECS task) provides advanced dashboards for MongoDB, including query analytics, performance metrics, and anomaly detection.
Key PMM features for MongoDB:
- Query Analytics: Identify slow, frequent, or problematic queries.
- Performance Dashboards: Detailed views of connections, operations, replication lag, cache usage, and more.
- Alerting: Built-in alerting capabilities that can complement CloudWatch alarms.
Conclusion
A comprehensive monitoring strategy for Magento 2 and MongoDB on AWS involves a layered approach. AWS CloudWatch serves as the central hub for metrics and logs, augmented by custom scripts for application-specific insights and dedicated tools like PMM for deep database analysis. Proactive alerting based on these monitored signals is key to maintaining a stable and performant e-commerce platform.