Server Monitoring Best Practices: Keeping Your PHP App and MySQL Clusters Alive on AWS

Proactive PHP Application Health Checks with CloudWatch Alarms

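Relying solely on instance-level metrics such as CPU utilization for your PHP application instances on AWS EC2 leaves application failures invisible: a host can look idle while every request fails on a lost database connection. EC2 also does not report memory utilization to CloudWatch unless an agent is installed. Production environments demand granular, application-level insights. We’ll use CloudWatch custom metrics and alarms to monitor key PHP application indicators, so that issues are caught before they impact users.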

A common pattern is to expose application health endpoints that return specific metrics or status codes. For a PHP application, this could be a simple script that checks database connectivity, cache status, and internal service dependencies. We can then use a lightweight agent on the EC2 instance to scrape this endpoint and push custom metrics to CloudWatch.

Implementing a PHP Health Check Endpoint

Create a file, e.g., /var/www/html/healthcheck.php, with the following logic:

<?php
header('Content-Type: application/json');

$status = [
    'status' => 'ok',
    'timestamp' => time(),
    'checks' => []
];

// Database Connection Check (Example using PDO)
try {
    // Replace with your actual DB credentials and connection string
    $db = new PDO(
        'mysql:host=your_rds_endpoint;dbname=your_db_name',
        'db_user',
        'db_password',
        [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, PDO::ATTR_TIMEOUT => 2] // fail fast after 2 seconds
    );
    $status['checks']['database'] = 'ok';
} catch (PDOException $e) {
    $status['status'] = 'error';
    // Log the detail server-side; exception messages can expose hostnames and usernames
    error_log('healthcheck database: ' . $e->getMessage());
    $status['checks']['database'] = 'error';
} finally {
    $db = null; // Close connection
}

// Cache Check (Example using Redis)
try {
    $redis = new Redis();
    $redis->connect('your_redis_host', 6379, 2.0); // Replace with your Redis host; 2-second connect timeout
    if ($redis->ping()) {
        $status['checks']['cache'] = 'ok';
    } else {
        throw new RedisException('Redis PING failed');
    }
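} catch (RedisException $e) {
    $status['status'] = 'error';
    // Log the detail server-side instead of returning it to the caller
    error_log('healthcheck cache: ' . $e->getMessage());
    $status['checks']['cache'] = 'error';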
} finally {
    if (isset($redis) && $redis->isConnected()) {
        $redis->close();
    }
}

// Add more checks as needed (e.g., external API availability, file permissions)

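// Output JSON response; return HTTP 503 on failure so status-code-based checks also detect it
http_response_code($status['status'] === 'ok' ? 200 : 503);
echo json_encode($status);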
exit;
?>

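Ensure your web server (Nginx/Apache) serves this endpoint, but restrict access to localhost and to your load balancer or monitoring hosts, since the response reveals the state of internal dependencies. The PHP process also needs the required extensions (PDO with the MySQL driver, and Redis) installed and enabled.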

Scraping and Publishing Metrics with the CloudWatch Agent

The CloudWatch Agent does not scrape HTTP endpoints itself, so we pair it with a small poller (a cron job or systemd timer) that fetches the healthcheck endpoint, converts each result to 1 (ok) or 0 (error), and sends it as a StatsD gauge to the agent’s StatsD listener. The agent then publishes those gauges as custom metrics. First, install the agent on your EC2 instances and attach an instance role that allows cloudwatch:PutMetricData (the managed CloudWatchAgentServerPolicy covers this).

Configure the agent by creating a configuration file, e.g., /opt/aws/amazon-cloudwatch-agent/bin/config.json. This configuration enables the StatsD listener on port 8125 and publishes whatever it receives under the MyApp/Health namespace.

{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "cwagent"
  },
  "metrics": {
    "namespace": "MyApp/Health",
    "metrics_collected": {
      "statsd": {
        "service_address": ":8125",
        "metrics_collection_interval": 60,
        "metrics_aggregation_interval": 60
      }
    }
  }
}

The poller sends one line per metric over UDP to localhost:8125, for example health_status:1|g, db_check_status:1|g, and cache_check_status:0|g.

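Apply the configuration. The -s flag restarts the agent with the new configuration, so the systemctl command is only needed if the agent is not running afterwards: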

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
sudo systemctl start amazon-cloudwatch-agent

After a few minutes, you should see custom metrics like health_status, db_check_status, and cache_check_status appearing in your CloudWatch console under the MyApp/Health namespace.
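One way to feed the endpoint’s result into CloudWatch is a small poller run every minute from cron or a systemd timer. The following is a minimal sketch, not a definitive implementation. It assumes the CloudWatch Agent’s StatsD listener is enabled on 127.0.0.1:8125 and that the endpoint returns the JSON shape shown earlier; the function names and the CHECK_METRICS mapping are illustrative.

```python
import json
import socket
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost/healthcheck.php"   # endpoint from this article
STATSD_ADDR = ("127.0.0.1", 8125)                 # assumed StatsD listener address
# Maps keys under "checks" in the health JSON to the metric names used in this article.
CHECK_METRICS = {"database": "db_check_status", "cache": "cache_check_status"}


def fetch_health(url=HEALTH_URL, timeout=5.0):
    """Return the decoded health JSON, or an error payload if the fetch fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # A non-2xx response may still carry the JSON body with per-check detail.
        try:
            return json.loads(exc.read().decode("utf-8"))
        except ValueError:
            return {"status": "error"}
    except (OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return {"status": "error"}


def to_gauges(payload):
    """Map the health JSON to gauges: 1 for exactly 'ok', 0 for anything else."""
    if not isinstance(payload, dict):
        payload = {}
    checks = payload.get("checks")
    if not isinstance(checks, dict):
        checks = {}
    gauges = {"health_status": 1 if payload.get("status") == "ok" else 0}
    for key, metric in CHECK_METRICS.items():
        gauges[metric] = 1 if checks.get(key) == "ok" else 0
    return gauges


def to_statsd_lines(gauges):
    """Render gauges in the StatsD line format: name:value|g."""
    return [f"{name}:{value}|g" for name, value in sorted(gauges.items())]


def publish(lines, addr=STATSD_ADDR):
    """Send each line as one UDP datagram to the StatsD listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("ascii"), addr)


if __name__ == "__main__":
    publish(to_statsd_lines(to_gauges(fetch_health())))
```

A failed fetch is reported as 0 for every metric rather than skipped, so an unreachable endpoint shows up as an error instead of as missing data. Metrics received over StatsD may appear with an extra dimension describing the metric type, so check the dimension set in the console before defining alarms.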

Creating CloudWatch Alarms for Critical Failures

Now, let’s set up alarms that trigger notifications when these health metrics indicate a problem. We’ll focus on the health_status and db_check_status metrics.

Navigate to CloudWatch -> Alarms -> Create alarm. Configure the following:

Metric: Select MyApp/Health namespace, health_status metric.
Statistic: Average
Period: 1 minute
Threshold type: Static
Condition: is below
Value: 0.9 (This assumes ‘ok’ is 1 and ‘error’ is 0. A sustained average below 1 indicates errors.)
Datapoints to alarm: 3 out of 3 (Requires 3 consecutive minutes of errors to trigger)
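Missing data treatment: Treat missing data as good (not breaching threshold). Keep in mind that missing data usually means the instance or the agent has stopped reporting, so consider treating it as bad (breaching) or adding a separate alarm for missing data if silence should also page someone.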
Notification: Select an SNS topic to send alerts to (e.g., for email, Slack integration via Lambda).

Repeat a similar process for db_check_status. You might want a more sensitive alarm for database connectivity, perhaps triggering on 2 out of 3 data points.
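The same alarm settings can be expressed through the CloudWatch API, which helps when the alarm must be repeated per metric or per environment. This is a minimal sketch under stated assumptions: the alarm name pattern, the function names, and the SNS topic ARN are hypothetical, boto3 is assumed to be installed, and the Dimensions argument must match whatever dimensions your metrics actually carry.

```python
def health_alarm_params(metric_name, sns_topic_arn, datapoints_to_alarm=3,
                        evaluation_periods=3, dimensions=None,
                        namespace="MyApp/Health"):
    """Build put_metric_alarm arguments mirroring the console settings above."""
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("datapoints_to_alarm must be between 1 and evaluation_periods")
    return {
        "AlarmName": f"myapp-{metric_name}-unhealthy",  # hypothetical naming scheme
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": dimensions or [],
        "Statistic": "Average",
        "Period": 60,                                   # 1 minute
        "EvaluationPeriods": evaluation_periods,
        "DatapointsToAlarm": datapoints_to_alarm,       # e.g. 3 out of 3
        "Threshold": 0.9,
        "ComparisonOperator": "LessThanThreshold",      # "is below"
        "TreatMissingData": "notBreaching",             # "treat missing data as good"
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params):
    """Create or update the alarm; requires boto3 and AWS credentials."""
    import boto3  # third-party, imported lazily so the builder runs without it
    boto3.client("cloudwatch").put_metric_alarm(**params)
```

For the more sensitive database alarm, calling health_alarm_params("db_check_status", topic_arn, datapoints_to_alarm=2) yields the 2 out of 3 variant.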

For more advanced scenarios, you can create composite alarms that depend on multiple individual alarms. For instance, a composite alarm can trigger only when both the application health alarm and the database health alarm are in the ALARM state.
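A composite alarm is defined by a rule expression over the states of other alarms. The sketch below builds such a rule and the arguments for put_composite_alarm; the alarm names and function names are hypothetical, and boto3 is assumed to be installed.

```python
def composite_rule(alarm_names, operator="AND"):
    """Combine child alarms into a rule such as ALARM("a") AND ALARM("b")."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    if not alarm_names:
        raise ValueError("at least one child alarm is required")
    for name in alarm_names:
        if '"' in name:
            raise ValueError("alarm names containing double quotes are not handled here")
    return f" {operator} ".join(f'ALARM("{name}")' for name in alarm_names)


def composite_alarm_params(name, child_alarm_names, sns_topic_arn):
    """Build put_composite_alarm arguments that fire only when all children are in ALARM."""
    return {
        "AlarmName": name,
        "AlarmRule": composite_rule(child_alarm_names, "AND"),
        "AlarmActions": [sns_topic_arn],
    }


def create_composite_alarm(params):
    """Create or update the composite alarm; requires boto3 and AWS credentials."""
    import boto3  # third-party, imported lazily
    boto3.client("cloudwatch").put_composite_alarm(**params)
```

Routing notifications through the composite alarm only, and leaving the child alarms without actions, is one way to reduce duplicate pages when a database outage also fails the application check.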

Optimizing MySQL Cluster Performance with Performance Insights and Enhanced Monitoring

Monitoring a MySQL cluster on Amazon RDS or Aurora requires more than basic CPU and IOPS metrics. Performance Insights offers deep visibility into database load, while Enhanced Monitoring provides OS-level metrics crucial for diagnosing bottlenecks.

Enabling and Utilizing Performance Insights

Performance Insights is a powerful tool for identifying the SQL queries and wait events that are consuming the most database resources. Ensure it’s enabled for your RDS or Aurora cluster.

When creating or modifying your RDS instance or Aurora cluster, enable Performance Insights in the monitoring settings (the exact location varies between console versions). Choose a retention period (e.g., 7 days) and, if needed, the AWS KMS key used to encrypt the collected data.
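The same setting can be applied through the RDS API. The following sketch builds the arguments for modify_db_instance; the function names and the instance identifier are hypothetical, boto3 is assumed to be installed, and the retention validation reflects the allowed values as I understand them from the RDS API reference (7 days, whole months counted as multiples of 31 days, or 731 days), so verify it against the current documentation.

```python
def performance_insights_params(db_instance_id, retention_days=7, kms_key_id=None):
    """Build modify_db_instance arguments that turn on Performance Insights."""
    monthly_values = {31 * months for months in range(1, 24)}
    if retention_days != 7 and retention_days != 731 and retention_days not in monthly_values:
        raise ValueError("retention_days must be 7, a multiple of 31 up to 713, or 731")
    params = {
        "DBInstanceIdentifier": db_instance_id,
        "EnablePerformanceInsights": True,
        "PerformanceInsightsRetentionPeriod": retention_days,
        "ApplyImmediately": True,
    }
    if kms_key_id:
        # Optional customer-managed key for encrypting Performance Insights data.
        params["PerformanceInsightsKMSKeyId"] = kms_key_id
    return params


def enable_performance_insights(params):
    """Apply the change; requires boto3 and AWS credentials."""
    import boto3  # third-party, imported lazily
    boto3.client("rds").modify_db_instance(**params)
```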

Once enabled, navigate to the RDS console, select your cluster, and click the “Performance Insights” tab. You’ll see a dashboard showing:

Database Load: Visualized by wait events (e.g., io/file/innodb/innodb_data_file for disk I/O, lock/table/sql/handler for table locks).
Top SQL: The queries contributing most to the load.
Top Waits: The most frequent wait events.
Hosts/Users: Identifying sources of load.

Actionable Insights:

If io/file/innodb/innodb_data_file is high, investigate slow queries, inefficient indexing, or consider provisioning higher IOPS storage (e.g., gp3 with provisioned IOPS, or io1/io2). For Aurora, this might indicate inefficient read/write patterns or large transactions.
If lock waits are prevalent, analyze transactions for long-running operations, optimize query locking strategies, or consider read replicas for read-heavy workloads.
Use the “Top SQL” to identify candidates for query optimization (adding indexes, rewriting queries).

Configuring Enhanced Monitoring for Deeper OS Insights

While Performance Insights focuses on the database engine, Enhanced Monitoring provides OS-level metrics that can reveal underlying system issues impacting the database. This is particularly useful when diagnosing issues not directly visible in Performance Insights. Enhanced Monitoring is an RDS and Aurora feature; for self-managed MySQL on EC2, collect the equivalent OS metrics with the CloudWatch Agent instead.

Enable Enhanced Monitoring when creating or modifying your RDS instance/cluster, and assign an IAM role that lets RDS deliver the metrics to CloudWatch Logs. Choose an appropriate granularity (1, 5, 10, 15, 30, or 60 seconds). A finer granularity provides more detail but increases CloudWatch Logs costs, because the metrics are written to the RDSOSMetrics log group.
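Enabling Enhanced Monitoring through the API takes two settings: the interval and the monitoring role. The sketch below validates the interval and builds the modify_db_instance arguments; the function names and the role ARN are hypothetical, and boto3 is assumed to be installed.

```python
# Interval values accepted by the RDS API; 0 turns Enhanced Monitoring off.
VALID_INTERVALS = (0, 1, 5, 10, 15, 30, 60)


def enhanced_monitoring_params(db_instance_id, interval_seconds, role_arn=None):
    """Build modify_db_instance arguments for Enhanced Monitoring."""
    if interval_seconds not in VALID_INTERVALS:
        raise ValueError(f"interval_seconds must be one of {VALID_INTERVALS}")
    params = {
        "DBInstanceIdentifier": db_instance_id,
        "MonitoringInterval": interval_seconds,
    }
    if interval_seconds > 0:
        if not role_arn:
            raise ValueError("a monitoring role ARN is required when monitoring is enabled")
        params["MonitoringRoleArn"] = role_arn
    return params


def apply_enhanced_monitoring(params):
    """Apply the change; requires boto3 and AWS credentials."""
    import boto3  # third-party, imported lazily
    boto3.client("rds").modify_db_instance(**params)
```

Starting at 60 seconds and lowering the interval only while investigating an incident keeps log volume, and therefore cost, down.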

Key metrics to monitor:

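CPU Utilization: Overall CPU usage, broken down into user, system, I/O wait, and steal time.
Memory Utilization: Free and cached memory. High swap usage is a critical indicator.
Disk I/O: Read and write operations per second, latency, and queue depth. Correlate these with Performance Insights I/O waits and with the ReadIOPS, WriteIOPS, ReadLatency, and WriteLatency CloudWatch metrics.
Network I/O: Bytes received and transmitted (NetworkReceiveThroughput and NetworkTransmitThroughput in CloudWatch). High network traffic could indicate inefficient data transfer or replication issues.
Process List: Monitor the number of running processes and their CPU and memory usage, especially MySQL-related ones.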

Setting Up RDS/Aurora Specific CloudWatch Alarms

Leverage CloudWatch alarms on both standard RDS metrics and custom metrics derived from Enhanced Monitoring and Performance Insights (via logs or custom scripts).

Essential RDS Alarms:

CPUUtilization: Set a static threshold (e.g., > 80% for 15 minutes) to detect sustained high CPU load.
FreeableMemory: Alarm if it drops below a critical level (e.g., < 100MB for 5 minutes). This indicates memory pressure. CloudWatch reports this metric in bytes, so enter the threshold as 104857600.
SwapUsage: Alarm if it exceeds a small threshold (e.g., > 10MB for 5 minutes, entered as 10485760 bytes). Significant swap usage drastically degrades performance.
ReadIOPS/WriteIOPS: Monitor against provisioned limits or baseline performance.
ReadLatency/WriteLatency: Alarm on high latency (e.g., > 50ms for 5 minutes). These metrics are reported in seconds, so enter the threshold as 0.05.
DatabaseConnections: Alarm if approaching the maximum configured connections.
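The thresholds above are easy to get wrong because CloudWatch expects bytes and seconds rather than megabytes and milliseconds. The following sketch encodes the example thresholds with the units converted; the function names, alarm name pattern, and the 80% connection ratio are illustrative assumptions, and the result is meant to be passed to put_metric_alarm.

```python
MIB = 1024 * 1024  # FreeableMemory and SwapUsage are reported in bytes


def rds_alarm_specs(max_connections, connection_ratio=0.8):
    """Return the example thresholds from this section in CloudWatch units."""
    above, below = "GreaterThanThreshold", "LessThanThreshold"
    return [
        # 3 periods of 5 minutes = 15 minutes of sustained high CPU
        {"MetricName": "CPUUtilization", "ComparisonOperator": above,
         "Threshold": 80.0, "Period": 300, "EvaluationPeriods": 3},
        # 5 periods of 1 minute = 5 minutes
        {"MetricName": "FreeableMemory", "ComparisonOperator": below,
         "Threshold": float(100 * MIB), "Period": 60, "EvaluationPeriods": 5},
        {"MetricName": "SwapUsage", "ComparisonOperator": above,
         "Threshold": float(10 * MIB), "Period": 60, "EvaluationPeriods": 5},
        # Latency metrics are reported in seconds: 50 ms = 0.05 s
        {"MetricName": "ReadLatency", "ComparisonOperator": above,
         "Threshold": 0.05, "Period": 60, "EvaluationPeriods": 5},
        {"MetricName": "WriteLatency", "ComparisonOperator": above,
         "Threshold": 0.05, "Period": 60, "EvaluationPeriods": 5},
        {"MetricName": "DatabaseConnections", "ComparisonOperator": above,
         "Threshold": float(round(max_connections * connection_ratio)),
         "Period": 60, "EvaluationPeriods": 5},
    ]


def to_alarm_params(spec, db_instance_id, sns_topic_arn):
    """Expand one spec into put_metric_alarm arguments for a single DB instance."""
    return {
        **spec,
        "AlarmName": f"rds-{db_instance_id}-{spec['MetricName']}",  # hypothetical naming
        "Namespace": "AWS/RDS",
        "Statistic": "Average",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "AlarmActions": [sns_topic_arn],
    }
```

The max_connections value depends on the instance class and parameter group, so it is passed in rather than guessed.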

Performance Insights Alarms (via CloudWatch Metrics or Custom Metrics):

Performance Insights publishes its aggregate load metrics (DBLoad, DBLoadCPU, and DBLoadNonCPU) to CloudWatch in the AWS/RDS namespace, so you can alarm on overall database load directly. A breakdown by wait event or SQL statement is not published as CloudWatch metrics. For that, use a Lambda function on an Amazon EventBridge schedule (formerly CloudWatch Events) to periodically query the Performance Insights API and publish custom metrics.

Example: Alarm on High DB Load:

Create an alarm on DBLoad with a threshold near the instance’s vCPU count; sustained load above that level means sessions are waiting for resources.
For wait-event detail, have the scheduled Lambda call GetResourceMetrics for db.load.avg grouped by db.wait_event and publish the results with PutMetricData.
Create a CloudWatch alarm on the resulting custom metric (e.g., alarm if the load from a specific wait event exceeds a threshold for 5 minutes).

This approach allows you to proactively alert on specific database performance degradation patterns identified by Performance Insights, complementing the OS-level metrics from Enhanced Monitoring.
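A scheduled job that turns Performance Insights wait-event load into custom CloudWatch metrics could look like the sketch below. It is a sketch under stated assumptions rather than a definitive implementation: the metric name DBLoadByWaitEvent, the MyApp/Database namespace, and the function names are hypothetical; the response fields follow the GetResourceMetrics response shape as I understand it and should be checked against a real response; boto3 is assumed to be installed.

```python
from datetime import datetime, timedelta, timezone


def wait_event_request(resource_id, minutes=5, now=None):
    """Build a GetResourceMetrics request for load grouped by wait event."""
    end = now or datetime.now(timezone.utc)
    return {
        "ServiceType": "RDS",
        "Identifier": resource_id,  # the instance's DbiResourceId, e.g. "db-..."
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "PeriodInSeconds": 60,
        "MetricQueries": [
            {"Metric": "db.load.avg",
             "GroupBy": {"Group": "db.wait_event", "Limit": 5}},
        ],
    }


def to_metric_data(response):
    """Convert the latest datapoint of each wait event into PutMetricData entries."""
    data = []
    for series in response.get("MetricList", []):
        dimensions = series.get("Key", {}).get("Dimensions") or {}
        event = dimensions.get("db.wait_event.name")
        points = [p for p in series.get("DataPoints", []) if p.get("Value") is not None]
        if not event or not points:
            continue  # skip the ungrouped total and empty series
        latest = max(points, key=lambda p: p["Timestamp"])
        data.append({
            "MetricName": "DBLoadByWaitEvent",  # hypothetical metric name
            "Dimensions": [{"Name": "WaitEvent", "Value": event}],
            "Timestamp": latest["Timestamp"],
            "Value": float(latest["Value"]),
        })
    return data


def publish_wait_events(resource_id, namespace="MyApp/Database"):
    """Query Performance Insights and publish the result; requires boto3 and credentials."""
    import boto3  # third-party, imported lazily
    response = boto3.client("pi").get_resource_metrics(**wait_event_request(resource_id))
    data = to_metric_data(response)
    if data:
        boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=data)
```

An alarm on DBLoadByWaitEvent with the WaitEvent dimension set to io/file/innodb/innodb_data_file would then cover the disk I/O case discussed earlier.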
