Server Monitoring Best Practices: Keeping Your Magento 2 App and MySQL Clusters Alive on AWS
Proactive MySQL Performance Monitoring with Enhanced CloudWatch Metrics
For Magento 2 deployments, especially those leveraging AWS RDS or Aurora, granular visibility into MySQL performance is paramount. Standard CloudWatch metrics offer a baseline, but often fall short of pinpointing the root cause of latency or resource contention. We need to augment these with custom metrics and advanced diagnostics.
A critical area is understanding connection pooling, query execution times, and buffer pool efficiency. We’ll focus on extracting these insights using a combination of AWS CloudWatch Agent and custom SQL queries.
1. Custom CloudWatch Agent Configuration for MySQL Metrics
The CloudWatch Agent can be configured to collect metrics directly from MySQL. This involves setting up a configuration file that specifies the metrics to scrape and how to scrape them. We’ll target key performance indicators like:
Threads_connectedandThreads_running: To monitor connection load.Slow_queries: To track queries exceeding thelong_query_timethreshold.Innodb_buffer_pool_wait_free: Indicates if the InnoDB buffer pool needs more memory.Innodb_row_lock_waits: Highlights contention for row locks.Com_select,Com_insert,Com_update,Com_delete: To understand query type distribution.
Create a configuration file, for example, /opt/aws/amazon-cloudwatch-agent/bin/config.json, with the following structure:
Ensure your MySQL user has the necessary `PROCESS` and `REPLICATION CLIENT` privileges for the agent to query performance schema and status variables.
{
"agent": {
"metrics_collection_interval": 60,
"run_as_user": "cwagent"
},
"metrics": {
"namespace": "MySQL/Custom",
"metrics_collected": {
"mysql": {
"measurement": [
"Threads_connected",
"Threads_running",
"Slow_queries",
"Innodb_buffer_pool_wait_free",
"Innodb_row_lock_waits",
"Com_select",
"Com_insert",
"Com_update",
"Com_delete"
],
"resources": [
"localhost:3306"
],
"user_credentials": {
"user": "cloudwatch_user",
"password": "your_secure_password"
}
}
}
}
}
After creating or updating the configuration file, validate it and restart the agent:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s sudo systemctl restart amazon-cloudwatch-agent
2. Real-time Query Analysis with Performance Schema
While CloudWatch metrics provide aggregated views, understanding individual slow queries requires diving into MySQL’s Performance Schema. This powerful tool allows for detailed instrumentation of server events. We’ll set up a query to identify the top N slowest queries within a given timeframe.
First, ensure Performance Schema is enabled and configured. In your my.cnf or my.ini:
[mysqld] performance_schema = ON ; Consider enabling specific consumers for detailed analysis ; performance_schema_consumer_events_statements_history_long = ON ; performance_schema_consumer_events_statements_digest_text_256 = ON
After restarting MySQL, you can use the following SQL query to find the most time-consuming queries:
SELECT
digest_text,
COUNT_STAR,
SUM_TIMER_WAIT / 1000000000000 AS total_latency_seconds,
AVG_TIMER_WAIT / 1000000000000 AS avg_latency_seconds,
MAX_TIMER_WAIT / 1000000000000 AS max_latency_seconds,
SUM_ROWS_SENT,
AVG_ROWS_SENT,
SUM_ROWS_EXAMINED,
AVG_ROWS_EXAMINED,
FIRST_SEEN,
LAST_SEEN
FROM
performance_schema.events_statements_summary_by_digest
WHERE
SCHEMA_NAME = 'magento_db_name' -- Replace with your Magento database name
ORDER BY
total_latency_seconds DESC
LIMIT 10;
To automate this, you can create a cron job that runs this query periodically and sends the results to CloudWatch Logs or a dedicated alerting system. For example, a Python script:
import pymysql
import boto3
import json
import datetime
DB_HOST = 'your_rds_endpoint.rds.amazonaws.com'
DB_USER = 'cloudwatch_user'
DB_PASSWORD = 'your_secure_password'
DB_NAME = 'magento_db_name'
LOG_GROUP_NAME = '/aws/rds/mysql/magento_slow_queries'
REGION_NAME = 'us-east-1'
def get_slow_queries():
conn = None
try:
conn = pymysql.connect(host=DB_HOST, user=DB_USER, password=DB_PASSWORD, database=DB_NAME, cursorclass=pymysql.cursors.DictCursor)
with conn.cursor() as cursor:
query = """
SELECT
digest_text,
COUNT_STAR,
SUM_TIMER_WAIT / 1000000000000 AS total_latency_seconds,
AVG_TIMER_WAIT / 1000000000000 AS avg_latency_seconds,
MAX_TIMER_WAIT / 1000000000000 AS max_latency_seconds,
SUM_ROWS_SENT,
AVG_ROWS_SENT,
SUM_ROWS_EXAMINED,
AVG_ROWS_EXAMINED,
FIRST_SEEN,
LAST_SEEN
FROM
performance_schema.events_statements_summary_by_digest
WHERE
SCHEMA_NAME = %s AND TIMER_WAIT > 0
ORDER BY
total_latency_seconds DESC
LIMIT 10;
"""
cursor.execute(query, (DB_NAME,))
results = cursor.fetchall()
return results
except Exception as e:
print(f"Error fetching slow queries: {e}")
return []
finally:
if conn:
conn.close()
def send_to_cloudwatch_logs(log_events):
client = boto3.client('logs', region_name=REGION_NAME)
try:
response = client.put_log_events(
logGroupName=LOG_GROUP_NAME,
logStreamName=f"slow_queries_{datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}",
logEvents=log_events
)
print(f"Successfully sent logs to CloudWatch. Response: {response}")
except Exception as e:
print(f"Error sending logs to CloudWatch: {e}")
if __name__ == "__main__":
slow_queries = get_slow_queries()
if slow_queries:
log_events = []
timestamp = int(datetime.datetime.now().timestamp() * 1000)
for query_data in slow_queries:
message = json.dumps(query_data)
log_events.append({
'timestamp': timestamp,
'message': message
})
send_to_cloudwatch_logs(log_events)
else:
print("No slow queries found or an error occurred.")
This script should be scheduled via cron to run every 5-15 minutes. Ensure the IAM role attached to your EC2 instance (or the credentials used by your Lambda function) has permissions for logs:CreateLogGroup, logs:CreateLogStream, and logs:PutLogEvents.
Magento 2 Application Performance Monitoring (APM) on AWS
Monitoring the Magento 2 application layer requires a different approach, focusing on request latency, error rates, and resource utilization within the PHP-FPM and web server (Nginx/Apache) processes. AWS provides several tools, but often a dedicated APM solution offers deeper insights.
1. Leveraging AWS X-Ray for Distributed Tracing
AWS X-Ray is invaluable for tracing requests as they traverse different services, including your Magento application, API gateways, and databases. For PHP, you’ll need to install and configure the X-Ray SDK.
Installation (Composer):
composer require aws/aws-sdk-php aws/aws-xray-sdk-php
Configuration in Magento:
You’ll need to integrate X-Ray tracing into your Magento application’s bootstrap process. This typically involves creating a plugin or an event observer that starts and stops X-Ray segments.
<?php
// app/code/Vendor/XrayTrace/etc/di.xml
use Magento\Framework\Event\ObserverInterface;
use Aws\XRay\XRayClient;
use Magento\Framework\App\RequestInterface;
use Magento\Framework\App\ResponseInterface;
class TraceRequest implements ObserverInterface
{
protected $xrayClient;
protected $request;
protected $response;
public function __construct(
XRayClient $xrayClient,
RequestInterface $request,
ResponseInterface $response
) {
$this->xrayClient = $xrayClient;
$this->request = $request;
$this->response = $response;
}
public function execute(\Magento\Framework\Event\Observer $observer)
{
$this->xrayClient->beginSegment('MagentoRequest');
$this->xrayClient->getCurrentSegment()->putAnnotation('request_uri', $this->request->getRequestUri());
$this->xrayClient->getCurrentSegment()->putAnnotation('request_method', $this->request->getMethod());
// Add an observer to end the segment on response send
$observer->getEvent()->getFrontController()->getResponse()->setBodyCallback(function($body) {
$this->xrayClient->endSegment();
return $body;
});
}
}
?>
<?php
// app/code/Vendor/XrayTrace/etc/events.xml
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="urn:magento:framework:Event/etc/events.xsd">
<event name="controller_action_predispatch">
<observer name="vendor_xraytrace_trace_request" instance="Vendor\XrayTrace\Observer\TraceRequest"/>
</event>
<event name="controller_front_send_response_before">
<observer name="vendor_xraytrace_end_segment" instance="Vendor\XrayTrace\Observer\TraceRequest"/>
</event>
</config>
?>
<?php
// app/code/Vendor/XrayTrace/etc/di.xml (for XRayClient factory)
<?xml version="1.0"?>
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="urn:magento:framework:ObjectManager/etc/config.xsd">
<type name="Aws\XRay\XRayClient">
<arguments>
<argument name="config" xsi:type="array">
<item name="region" xsi:type="string">us-east-1</item> <!-- Replace with your AWS region -->
<item name="service" xsi:type="string">XRay</item>
<item name="version" xsi:type="string">latest</item>
</argument>
</arguments>
</type>
<virtualtype name="Vendor\XrayTrace\Service\XRayClientFactory" type="Aws\XRay\XRayClient">
<arguments>
<argument name="config" xsi:type="array">
<item name="region" xsi:type="string">us-east-1</item> <!-- Replace with your AWS region -->
</argument>
</arguments>
</virtualtype>
<preference for="Aws\XRay\XRayClient" type="Vendor\XrayTrace\Service\XRayClientFactory"/>
</config>
?>
You’ll also need to ensure the X-Ray daemon is running on your EC2 instances or that your Lambda functions are configured to send traces to X-Ray. For EC2, this typically involves installing the daemon and running it as a service.
2. PHP-FPM and Nginx/Apache Metrics with CloudWatch Agent
Beyond MySQL, the web server and PHP-FPM are critical bottlenecks. We can extend the CloudWatch Agent configuration to collect metrics from these components.
Nginx Metrics:
Ensure the Nginx `stub_status` module is enabled and configured in your Nginx configuration:
# nginx.conf or a site-specific conf file
http {
# ... other configurations ...
server {
listen 80;
server_name your_domain.com;
location /nginx_status {
stub_status on;
access_log off;
allow 127.0.0.1; # Allow access only from localhost for security
deny all;
}
# ... other locations ...
}
}
Add the following to your CloudWatch Agent config.json under metrics_collected:
"nginx": {
"measurement": [
"active_connections",
"reading_connections",
"writing_connections",
"waiting_connections"
],
"resources": [
"http://localhost/nginx_status"
],
"metrics_collection_interval": 60
}
PHP-FPM Metrics:
PHP-FPM exposes its status via a TCP socket or a Unix domain socket. Configure it in your PHP-FPM pool configuration (e.g., /etc/php/7.4/fpm/pool.d/www.conf):
; pm.status_path = /fpm_status ; You might need to configure access for the CloudWatch agent user ; or use a TCP socket for easier access. ; Example using TCP socket: listen = 127.0.0.1:9000 ; listen.mode = 0666 ; listen.owner = www-data ; listen.group = www-data ; listen.acl_users = ; listen.acl_groups = ; listen.backlog = 511 ; listen.allowed_clients = 127.0.0.1 ; listen.tcp_nodelay = 1 ; listen.ipv6 = 1 ; For status page access via TCP socket: ; You might need to configure a separate web server block (e.g., Nginx) ; to proxy requests to the FPM socket and expose it. ; Or, configure the CloudWatch agent to connect directly if possible. ; A common approach is to use a small web server to expose FPM status.
If using a Unix socket, you might need a small web server (like Nginx) to proxy requests to it. If using TCP, the CloudWatch agent can often connect directly. Add to config.json:
"phpfpm": {
"measurement": [
"accepted_conn",
"active_processes",
"idle_processes",
"max_children",
"max_requests",
"pool",
"process_mpm",
"slow_requests"
],
"servers": [
"127.0.0.1:9000"
],
"metrics_collection_interval": 60
}
After updating the config.json, restart the CloudWatch agent as shown previously.
Alerting Strategies for High Availability
Effective alerting is the final layer of defense. We need to define thresholds for both custom and standard CloudWatch metrics, and integrate with notification services like SNS.
1. Critical MySQL Alerts
Set up alarms for:
Threads_runningexceeding a high threshold (e.g., > 80% ofmax_connections).Innodb_buffer_pool_wait_freeconsistently above 0.Innodb_row_lock_waitsshowing a significant increase over a short period.Slow_queriescount increasing rapidly.- RDS/Aurora specific metrics like
CPUUtilization,DatabaseConnections,FreeableMemory,ReadIOPS,WriteIOPS.
Example AWS CLI command to create a CloudWatch alarm:
aws cloudwatch put-metric-alarm \
--alarm-name "Magento-MySQL-High-Connections" \
--alarm-description "High number of active MySQL connections" \
--metric-name Threads_running \
--namespace "MySQL/Custom" \
--statistic Average \
--period 300 \
--threshold 150 \
--comparison-operator GreaterThanThreshold \
--dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
--evaluation-periods 2 \
--datapoints-to-alarm 2 \
--treat-missing-data notBreaching \
--alarm-actions arn:aws:sns:us-east-1:123456789012:MagentoAlertsTopic
2. Critical Application Alerts
Configure alarms for:
- High
CPUUtilizationon EC2 instances running Magento. - High
NetworkIn/NetworkOut. - High
RequestCountandHTTPCode_5XX_Countfrom Application Load Balancer (ALB) or CloudFront. - X-Ray trace errors exceeding a threshold.
- PHP-FPM
slow_requestscount. - Nginx
5xxerror rate (can be derived from ALB logs or custom Nginx logging).
For application-level errors not captured by standard metrics, consider using CloudWatch Logs Insights to query your application logs for specific error patterns (e.g., “Fatal error”, “Exception”) and create metric filters or alarms based on those queries.
By combining granular MySQL performance metrics, application-level tracing with X-Ray, and comprehensive web server/PHP-FPM monitoring, you establish a robust observability stack for your Magento 2 application on AWS, ensuring high availability and rapid issue resolution.