Server Monitoring Best Practices: Keeping Your Shopify App and MongoDB Clusters Alive on AWS
Establishing a Robust Monitoring Foundation with AWS CloudWatch
For any production Shopify app hosted on AWS, particularly those leveraging MongoDB clusters, a comprehensive monitoring strategy is non-negotiable. AWS CloudWatch serves as the foundational layer for this, providing essential metrics, logs, and alarms. We’ll focus on key areas: EC2 instance health, RDS/DocumentDB performance (if applicable), and application-level metrics.
EC2 Instance Health Monitoring
Beyond the default CloudWatch metrics (CPU Utilization, Network In/Out, Disk Read/Write Ops), we need to ensure our application servers are truly healthy. This involves custom metrics and log analysis.
Key Metrics to Monitor:
- CPU Utilization: Standard, but essential. High sustained CPU can indicate inefficient code or insufficient resources.
- Memory Utilization: EC2 does not publish memory metrics by default, so exhaustion goes unnoticed unless the CloudWatch agent is configured to send them.
- Disk I/O Operations: High I/O can bottleneck applications, especially database-intensive ones.
- Network Traffic: Monitor for unusual spikes or drops that might indicate network issues or DoS attacks.
- Process Health: Ensure critical application processes (e.g., your PHP-FPM workers, Node.js processes) are running. This often requires custom scripting and sending custom metrics.
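As a rough illustration of the process health item above, the following sketch counts running processes by command name and shapes the result as a custom metric. It assumes a Linux host with `/proc` mounted; the metric name `ProcessCount`, the `ProcessName` dimension, and the helper names are hypothetical choices, not part of any existing setup. The CloudWatch agent's `procstat` plugin is an alternative if you prefer configuration over scripting.

```python
import os


def count_processes(name, proc_root="/proc"):
    """Count running processes whose command name equals `name` (Linux only)."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # Only numeric directories are process IDs
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                if fh.read().strip() == name:
                    count += 1
        except OSError:
            continue  # Process exited between listdir() and open()
    return count


def build_process_metric(name, count, instance_id):
    """Build one MetricData entry in the shape put_metric_data expects."""
    return {
        "MetricName": "ProcessCount",  # hypothetical metric name
        "Dimensions": [
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "ProcessName", "Value": name},
        ],
        "Value": count,
        "Unit": "Count",
    }


def publish(metric, namespace="ShopifyApp/EC2"):
    """Send the metric to CloudWatch; boto3 is imported only when publishing."""
    import boto3

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace, MetricData=[metric]
    )
```

An alarm on this metric dropping to zero would then cover the case where the instance is up but the workers are not. Note that the kernel truncates command names in `/proc/<pid>/comm`, and some distributions version the binary name (for example `php-fpm8.1`), so check the exact name on your hosts.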
Configuring the CloudWatch Agent for Enhanced Metrics
The CloudWatch agent allows us to collect system-level metrics (like memory utilization) and custom application metrics. Here’s a sample configuration file for an EC2 instance running a typical web application stack.
Create a file named amazon-cloudwatch-agent.json on your EC2 instance.
Example amazon-cloudwatch-agent.json:
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "cwagent"
  },
  "metrics": {
    "namespace": "ShopifyApp/EC2",
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "aggregation_dimensions": [
      ["InstanceId"]
    ],
    "metrics_collected": {
      "cpu": {
        "measurement": [
          "cpu_usage_idle",
          "cpu_usage_iowait",
          "cpu_usage_user",
          "cpu_usage_system"
        ],
        "totalcpu": true
      },
      "diskio": {
        "measurement": [
          "reads",
          "writes",
          "read_bytes",
          "write_bytes"
        ],
        "resources": [
          "xvda",
          "xvdb"
        ]
      },
      "disk": {
        "measurement": [
          "used_percent"
        ],
        "resources": [
          "*"
        ],
        "ignore_file_system_types": [
          "sysfs",
          "devtmpfs",
          "tmpfs",
          "devfs",
          "iso9660",
          "overlay",
          "aufs",
          "squashfs"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent",
          "mem_available_percent"
        ]
      },
      "net": {
        "measurement": [
          "bytes_recv",
          "bytes_sent",
          "packets_recv",
          "packets_sent"
        ]
      },
      "statsd": {
        "service_address": ":8125",
        "metrics_collection_interval": 60
      }
    }
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/nginx/access.log",
            "log_group_name": "ShopifyApp/Nginx/Access",
            "log_stream_name": "{instance_id}"
          },
          {
            "file_path": "/var/log/nginx/error.log",
            "log_group_name": "ShopifyApp/Nginx/Error",
            "log_stream_name": "{instance_id}"
          },
          {
            "file_path": "/var/log/php-fpm/www-error.log",
            "log_group_name": "ShopifyApp/PHP-FPM/Error",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}
After creating the file, install and start the agent:
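# For Amazon Linux (on CentOS/RHEL the package may first need to be downloaded from AWS)
sudo yum install amazon-cloudwatch-agent -y
# or, for Ubuntu/Debian (if apt cannot find the package, install the .deb that AWS publishes instead)
sudo apt-get install amazon-cloudwatch-agent -y
# Load the configuration file and start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/path/to/your/amazon-cloudwatch-agent.json -s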
MongoDB Cluster Monitoring on AWS (DocumentDB or EC2-hosted)
Monitoring your MongoDB cluster is paramount. The approach differs depending on whether you’re using Amazon DocumentDB (a managed, MongoDB-compatible service) or self-hosting MongoDB on EC2 instances.
DocumentDB Monitoring
DocumentDB integrates seamlessly with CloudWatch. Key metrics are available by default. Ensure you’re monitoring the following:
- CPUUtilization
- DatabaseConnections
- ReadIOPS, WriteIOPS
- ReadLatency, WriteLatency
- FreeableMemory
- NetworkReceiveThroughput, NetworkTransmitThroughput
- DiskQueueDepth
Beyond these, consider enabling Performance Insights for deeper query analysis, and set up alarms on the thresholds that matter most for your workload (CPU, connections, and latency are common starting points).
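To make the alarm suggestion concrete, here is a minimal sketch that builds the arguments for a CloudWatch alarm on a DocumentDB instance metric. It assumes the metrics live in the `AWS/DocDB` namespace under the `DBInstanceIdentifier` dimension; the alarm naming scheme, the default evaluation window, and the SNS topic ARN are placeholders to adapt.

```python
def docdb_alarm(instance_id, metric_name, threshold, sns_topic_arn,
                comparison="GreaterThanThreshold", minutes=15):
    """Build put_metric_alarm arguments for one DocumentDB instance metric."""
    period = 300  # seconds per datapoint
    return {
        "AlarmName": f"docdb-{instance_id}-{metric_name}",  # hypothetical naming
        "Namespace": "AWS/DocDB",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period,
        "EvaluationPeriods": max(1, (minutes * 60) // period),
        "Threshold": float(threshold),
        "ComparisonOperator": comparison,
        "TreatMissingData": "missing",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(alarm_kwargs):
    """Create or update the alarm; boto3 is imported only when called."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**alarm_kwargs)
```

For example, `docdb_alarm("my-docdb-1", "CPUUtilization", 80, topic_arn)` describes an alarm that fires after three consecutive 5-minute averages above 80%. Keeping the argument builder separate from the API call makes the thresholds easy to review and test.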
Self-Hosted MongoDB on EC2 Monitoring
For self-hosted MongoDB, you’ll need to leverage the CloudWatch agent’s custom metrics capabilities and potentially external tools.
Key MongoDB Metrics to Collect:
- Operations per Second (inserts, queries, updates, deletes)
- Connection Count
- Memory Usage (resident, virtual)
- Disk Usage
- Replication Lag (if applicable)
- Lock Percentages
- Network Traffic
You can collect these using the MongoDB `mongostat` and `mongotop` commands, or by running the `serverStatus` command (plus `replSetGetStatus` for replica set state) and sending the data as custom CloudWatch metrics. A common pattern is to use a Python script that runs periodically.
import time

import boto3
from pymongo import MongoClient
from pymongo.errors import ConnectionFailure, OperationFailure

# --- Configuration ---
MONGO_HOST = "your_mongodb_host"
MONGO_PORT = 27017
CLOUDWATCH_NAMESPACE = "ShopifyApp/MongoDB"
# IAM role for EC2 instance should have CloudWatch PutMetricData permissions

# --- CloudWatch Client ---
cloudwatch = boto3.client('cloudwatch')


def metric(name, value, unit):
    """Build one entry for the MetricData list."""
    return {'MetricName': name, 'Value': value, 'Unit': unit}


def get_replication_lag(client):
    """Seconds this member trails the primary, or None if not applicable."""
    try:
        status = client.admin.command('replSetGetStatus')
    except OperationFailure:
        return None  # Not a replica set member, or the user lacks permission
    members = status.get('members', [])
    primary = next((m for m in members if m.get('stateStr') == 'PRIMARY'), None)
    me = next((m for m in members if m.get('self')), None)
    if primary is None or me is None:
        return None
    lag = (primary['optimeDate'] - me['optimeDate']).total_seconds()
    return max(lag, 0.0)


def get_mongo_stats():
    client = None
    try:
        # directConnection=True keeps the client on this node instead of
        # discovering the replica set and routing commands to the primary.
        client = MongoClient(MONGO_HOST, MONGO_PORT,
                             serverSelectionTimeoutMS=5000,
                             directConnection=True)
        client.admin.command('ping')  # Test connection
        stats = client.admin.command('serverStatus')

        # Basic Metrics
        # opcounters and network byte totals are cumulative since mongod
        # started, so graph them as a rate rather than as raw values.
        metrics = [
            metric('OperationsInsert', stats['opcounters']['insert'], 'Count'),
            metric('OperationsQuery', stats['opcounters']['query'], 'Count'),
            metric('OperationsUpdate', stats['opcounters']['update'], 'Count'),
            metric('OperationsDelete', stats['opcounters']['delete'], 'Count'),
            metric('ConnectionsCurrent', stats['connections']['current'], 'Count'),
            metric('ConnectionsAvailable', stats['connections']['available'], 'Count'),
            metric('NetworkIn', stats['network']['bytesIn'], 'Bytes'),
            metric('NetworkOut', stats['network']['bytesOut'], 'Bytes'),
            metric('MemoryResident', stats['mem']['resident'], 'Megabytes'),
            metric('MemoryVirtual', stats['mem']['virtual'], 'Megabytes'),
        ]

        # Replication Lag (if replica set). serverStatus does not report
        # lag, so it is derived from replSetGetStatus.
        if 'repl' in stats:
            lag = get_replication_lag(client)
            if lag is not None:
                metrics.append(metric('ReplicationLag', lag, 'Seconds'))

        return metrics
    except ConnectionFailure as e:
        print(f"Could not connect to MongoDB: {e}")
        return None
    except Exception as e:
        print(f"An error occurred: {e}")
        return None
    finally:
        if client:
            client.close()


def put_metrics_to_cloudwatch(metrics_data):
    if not metrics_data:
        return
    try:
        response = cloudwatch.put_metric_data(
            Namespace=CLOUDWATCH_NAMESPACE,
            MetricData=metrics_data
        )
        print(f"Successfully put metrics to CloudWatch: {response}")
    except Exception as e:
        print(f"Error putting metrics to CloudWatch: {e}")


if __name__ == "__main__":
    print("Starting MongoDB monitoring script...")
    while True:
        mongo_metrics = get_mongo_stats()
        if mongo_metrics:
            put_metrics_to_cloudwatch(mongo_metrics)
        else:
            print("Failed to retrieve MongoDB metrics.")
        print("Waiting for next interval...")
        time.sleep(60)  # Collect metrics every 60 seconds
To run this script, you’ll need to install the Boto3 and PyMongo libraries:
pip install boto3 pymongo
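Then, attach an IAM role to your EC2 instance that allows cloudwatch:PutMetricData. Because the script loops forever, it is best run as a systemd service; to use a cron job instead, remove the `while True` loop and the `time.sleep` call so that each invocation collects once and exits.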
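One detail worth calling out: the `opcounters` values returned by `serverStatus` are running totals since `mongod` started, not operations per second. A minimal sketch of converting two consecutive samples into rates follows; the function names are illustrative, and treating a decreasing counter as a restart is an assumption about how you want resets handled.

```python
def opcounter_rates(previous, current, elapsed_seconds):
    """Convert two cumulative opcounter samples into per-second rates."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rates = {}
    for op, value in current.items():
        if op not in previous:
            continue  # No baseline for this counter yet
        delta = value - previous[op]
        if delta < 0:
            # Counter went backwards: assume mongod restarted and the
            # counter began again from zero.
            delta = value
        rates[op] = delta / elapsed_seconds
    return rates


def rates_to_metric_data(rates):
    """Shape the rates as CloudWatch MetricData entries."""
    return [
        {
            "MetricName": f"Operations{op.capitalize()}PerSecond",
            "Value": rate,
            "Unit": "Count/Second",
        }
        for op, rate in sorted(rates.items())
    ]
```

In the monitoring loop, this would mean keeping the previous `stats['opcounters']` sample and the time it was taken, then publishing the output of `rates_to_metric_data` instead of the raw totals.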
Application-Level Metrics and Logging
Monitoring the underlying infrastructure is only half the battle. We need visibility into our Shopify app’s performance and behavior.
Custom Application Metrics
Instrument your application code to emit custom metrics. For a PHP application, this might involve using a library that can send metrics to a StatsD endpoint, which the CloudWatch agent can then collect. If using Node.js, libraries like @aws-sdk/client-cloudwatch can be used directly.
Example (Conceptual PHP using a StatsD client):
<?php
// Assuming you have a StatsD client library configured
// e.g., using `php-statsd-client` or similar
$statsd = new StatsDClient(['host' => '127.0.0.1', 'port' => 8125]);

// Example: Track API request duration
$startTime = microtime(true);
// ... your API request processing logic ...
$endTime = microtime(true);
$duration = ($endTime - $startTime) * 1000; // in milliseconds
$statsd->timing('api.request.duration', $duration);

// Example: Track successful order creations
if ($orderWasCreatedSuccessfully) {
    $statsd->increment('shopify.orders.created');
} else {
    $statsd->increment('shopify.orders.failed');
}

// Example: Track specific Shopify API call latency
$shopifyApiStartTime = microtime(true);
// ... call Shopify API ...
$shopifyApiEndTime = microtime(true);
$shopifyApiDuration = ($shopifyApiEndTime - $shopifyApiStartTime) * 1000;
$statsd->timing('shopify.api.call.latency', $shopifyApiDuration);
?>
Ensure your amazon-cloudwatch-agent.json is configured to collect StatsD metrics (as shown in the earlier EC2 configuration example). The metrics will appear under the `ShopifyApp/EC2` namespace (or whatever you configure).
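Whichever client library you use, what reaches the agent is a small plain-text UDP datagram. The sketch below shows that wire format in Python using only the standard library; it assumes the agent's StatsD listener is on port 8125 of the local host, and the helper names are illustrative rather than taken from any particular library.

```python
import socket

# Metric types used here: counter, gauge, and timer (milliseconds)
STATSD_TYPES = {"c", "g", "ms"}


def format_statsd(name, value, metric_type):
    """Render one metric in the StatsD line format: <name>:<value>|<type>."""
    if metric_type not in STATSD_TYPES:
        raise ValueError(f"unsupported StatsD type: {metric_type}")
    return f"{name}:{value}|{metric_type}"


def send_statsd(line, host="127.0.0.1", port=8125):
    """Fire-and-forget a single datagram to the StatsD listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

Calling `send_statsd(format_statsd("shopify.orders.created", 1, "c"))` is roughly what the conceptual `increment` call above does. Because UDP is connectionless, a stopped agent does not slow down or break the application; the metric is simply lost.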
Centralized Logging with CloudWatch Logs
Application logs are invaluable for debugging. Configure your web server (Nginx/Apache), PHP-FPM, and your application itself to log to files that the CloudWatch agent monitors. The agent will then stream these logs to CloudWatch Logs, enabling centralized searching, filtering, and analysis.
Key Log Files to Monitor:
- Nginx Access and Error Logs
- PHP-FPM Error Logs
- Application-specific logs (e.g., storage/logs/laravel.log for Laravel)
- MongoDB logs (if self-hosted)
The amazon-cloudwatch-agent.json configuration includes examples for Nginx and PHP-FPM. For application logs, add entries like this to the "files" section:
{
  "file_path": "/var/www/your-app/storage/logs/laravel.log",
  "log_group_name": "ShopifyApp/Laravel/App",
  "log_stream_name": "{instance_id}"
}
Alerting Strategies with CloudWatch Alarms
Metrics and logs are only useful if they trigger action when something goes wrong. CloudWatch Alarms are essential for proactive issue detection.
Critical Alarms to Configure:
- High CPU Utilization: e.g., > 80% for 15 minutes on any web server.
- Low Memory Availability: e.g., < 10% available memory for 10 minutes.
- High Disk Latency or I/O Wait: e.g., > 50 ms average read/write latency for 5 minutes, or a sustained rise in CPU I/O wait (`cpu_usage_iowait`).
- High Error Rates: Monitor Nginx error logs or application error metrics. Use CloudWatch Logs metric filters to count error occurrences.
- Database Connection Issues: High number of failed connections or exceeding connection limits.
- Replication Lag (MongoDB): If replication lag exceeds a defined threshold (e.g., > 60 seconds).
- Application-Specific Thresholds: e.g., API request latency exceeding acceptable limits, high rate of failed Shopify API calls.
Example CloudWatch Logs Metric Filter for Nginx Errors:
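In the CloudWatch console, navigate to your Nginx error log group (e.g., `ShopifyApp/Nginx/Error`) and create a metric filter. The filter pattern could be:
"[error]"
The double quotes make this a literal, case-sensitive term match, so the filter counts lines that contain the Nginx `[error]` level tag; without quotes, square brackets are read as a space-delimited field pattern rather than as literal text. You can create more sophisticated patterns. Then, create an alarm based on this new metric (e.g., “Error Count > 10 in 5 minutes”).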
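The same filter and alarm can be scripted rather than clicked together. The sketch below builds the arguments for both API calls; the filter name, metric name, `ShopifyApp/Nginx` namespace, and SNS topic ARN are hypothetical. The pattern is wrapped in double quotes so that it matches the literal `[error]` tag in Nginx error lines.

```python
def nginx_error_filter(log_group="ShopifyApp/Nginx/Error"):
    """Arguments for logs.put_metric_filter: emit 1 per matching line."""
    return {
        "logGroupName": log_group,
        "filterName": "nginx-error-count",  # hypothetical name
        "filterPattern": '"[error]"',  # quoted so it matches literally
        "metricTransformations": [
            {
                "metricName": "NginxErrorCount",
                "metricNamespace": "ShopifyApp/Nginx",
                "metricValue": "1",
                "defaultValue": 0.0,
            }
        ],
    }


def nginx_error_alarm(sns_topic_arn, max_errors=10, window_seconds=300):
    """Arguments for cloudwatch.put_metric_alarm on the filtered metric."""
    return {
        "AlarmName": "nginx-error-rate",  # hypothetical name
        "Namespace": "ShopifyApp/Nginx",
        "MetricName": "NginxErrorCount",
        "Statistic": "Sum",
        "Period": window_seconds,
        "EvaluationPeriods": 1,
        "Threshold": float(max_errors),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def apply(sns_topic_arn):
    """Create both resources; boto3 is imported only when called."""
    import boto3

    boto3.client("logs").put_metric_filter(**nginx_error_filter())
    boto3.client("cloudwatch").put_metric_alarm(**nginx_error_alarm(sns_topic_arn))
```

Setting `defaultValue` to 0 and treating missing data as not breaching keeps the alarm in the OK state during quiet periods instead of flapping to insufficient data.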
Beyond CloudWatch: Advanced Considerations
While CloudWatch is powerful, consider these for a truly resilient system:
- Distributed Tracing: For complex microservice architectures or deep debugging, tools like AWS X-Ray or open-source alternatives (Jaeger, Zipkin) can trace requests across multiple services.
- Synthetic Monitoring: Use CloudWatch Synthetics Canaries or similar tools to simulate user interactions with your Shopify app (e.g., adding to cart, checkout) to proactively detect availability issues.
- APM Tools: Application Performance Monitoring tools (e.g., Datadog, New Relic, Dynatrace) offer deeper insights into application code performance, database query analysis, and user experience monitoring, often with more intuitive dashboards than raw CloudWatch.
- Automated Remediation: Integrate alarms with AWS Lambda or Systems Manager Automation documents to automatically restart services, scale instances, or trigger other recovery actions.
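As one possible shape for the automated remediation item, here is a sketch of a Lambda handler that reacts to an alarm delivered through SNS by restarting a service with Systems Manager Run Command. Several things are assumptions: the alarm name prefixes and restart commands are hypothetical, the alarm is assumed to carry an `InstanceId` dimension, and the notification is assumed to list dimensions under `Trigger.Dimensions` with lowercase `name` and `value` keys.

```python
import json

# Hypothetical mapping from alarm name prefix to a recovery command
RESTART_COMMANDS = {
    "php-fpm-down": "systemctl restart php-fpm",
    "nginx-down": "systemctl restart nginx",
}


def parse_alarm(event):
    """Extract alarm name, new state, and instance ID from an SNS event."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    dims = {
        d["name"]: d["value"]
        for d in message.get("Trigger", {}).get("Dimensions", [])
    }
    return message["AlarmName"], message["NewStateValue"], dims.get("InstanceId")


def choose_command(alarm_name):
    """Return the recovery command for this alarm, or None if unmapped."""
    for prefix, command in RESTART_COMMANDS.items():
        if alarm_name.startswith(prefix):
            return command
    return None


def handler(event, context):
    name, state, instance_id = parse_alarm(event)
    command = choose_command(name)
    if state != "ALARM" or not instance_id or not command:
        return {"action": "none"}

    import boto3  # Imported only when there is something to do

    boto3.client("ssm").send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": [command]},
    )
    return {"action": command, "instance": instance_id}
```

Restricting the handler to an explicit allow-list of commands keeps an unexpected alarm name from triggering arbitrary actions. In practice you would also want to rate-limit restarts so that a crash loop does not hide the underlying problem.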
By implementing a layered monitoring strategy encompassing infrastructure, database, and application-level metrics, coupled with robust alerting and logging, you can significantly improve the reliability and performance of your Shopify app and its underlying MongoDB clusters on AWS.