Server Monitoring Best Practices: Keeping Your PHP App and MySQL Clusters Alive on DigitalOcean

Proactive Health Checks for PHP Applications

A robust monitoring strategy for PHP applications goes beyond simply checking if the web server is responding. We need to ensure the application itself is healthy, processing requests efficiently, and not succumbing to common pitfalls like memory leaks or slow database queries. This involves implementing application-level health checks and integrating them with your chosen monitoring solution.

Implementing a PHP Health Check Endpoint

A simple yet effective approach is to create a dedicated health check endpoint within your PHP application. This endpoint should perform a series of checks and return a clear status. For a typical web application, this might include:

Database connectivity: Verify that the application can connect to its primary database.
Cache connectivity: If using Redis or Memcached, check the connection.
Essential service availability: Ping any other critical external services your app relies on.
Basic application logic: A simple, fast-executing piece of application logic to ensure core functionality is intact.

Here’s a basic example of a PHP health check script. For production, you’d want to abstract database and cache connections into your application’s service layer.

`healthcheck.php` Example

<?php
header('Content-Type: application/json');

$response = [
    'status' => 'unhealthy',
    'checks' => [],
];

// --- Database Check ---
try {
    // Replace with your actual database connection logic
    // Example using PDO:
    $dbHost = getenv('DB_HOST') ?: '127.0.0.1';
    $dbName = getenv('DB_NAME') ?: 'myapp_db';
    $dbUser = getenv('DB_USER') ?: 'db_user';
    $dbPass = getenv('DB_PASS') ?: 'db_password';

    $dsn = "mysql:host={$dbHost};dbname={$dbName};charset=utf8mb4";
    $options = [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_TIMEOUT => 2, // 2-second timeout
    ];
    $pdo = new PDO($dsn, $dbUser, $dbPass, $options);
    $pdo->query('SELECT 1'); // Simple query to test connection
    $response['checks']['database'] = 'healthy';
} catch (PDOException $e) {
    $response['checks']['database'] = 'unhealthy: ' . $e->getMessage();
} catch (Exception $e) {
    $response['checks']['database'] = 'unhealthy: ' . $e->getMessage();
}

// --- Cache Check (Example: Redis) ---
try {
    // Replace with your actual cache connection logic
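    // Example using the phpredis extension (the \Redis class):
    $redisHost = getenv('REDIS_HOST') ?: '127.0.0.1';
    $redisPort = (int) (getenv('REDIS_PORT') ?: 6379); // getenv() returns a string

    $redis = new Redis();
    $redis->connect($redisHost, $redisPort, 1.0); // 1-second timeout
    // phpredis 5+ returns true from ping(); older versions return '+PONG'
    $pong = $redis->ping();
    if ($pong === true || $pong === '+PONG') {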
        $response['checks']['cache'] = 'healthy';
    } else {
        $response['checks']['cache'] = 'unhealthy: PING failed';
    }
} catch (RedisException $e) {
    $response['checks']['cache'] = 'unhealthy: ' . $e->getMessage();
} catch (Exception $e) {
    $response['checks']['cache'] = 'unhealthy: ' . $e->getMessage();
}

// --- Basic Application Logic Check ---
// Example: Check if a critical configuration value is loaded
if (defined('APP_VERSION') && !empty(APP_VERSION)) {
    $response['checks']['app_logic'] = 'healthy';
} else {
    $response['checks']['app_logic'] = 'unhealthy: APP_VERSION not defined or empty';
}

// Determine overall status
$allHealthy = true;
foreach ($response['checks'] as $check) {
    if (strpos($check, 'unhealthy') !== false) {
        $allHealthy = false;
        break;
    }
}

if ($allHealthy) {
    $response['status'] = 'healthy';
    http_response_code(200);
} else {
    http_response_code(503); // Service Unavailable
}

echo json_encode($response, JSON_PRETTY_PRINT);
exit;
?>

Ensure this script is placed in a location accessible by your monitoring system but ideally not directly exposed to the public internet without proper authentication or IP restrictions. For DigitalOcean App Platform, you can configure a health check endpoint directly in the application’s settings.

Integrating with DigitalOcean Monitoring and External Tools

DigitalOcean’s built-in monitoring provides essential metrics like CPU, memory, disk I/O, and network traffic for your Droplets and Managed Databases. However, for application-level health, you’ll want to leverage external monitoring services or set up your own.

External Uptime Monitoring

Services like UptimeRobot, Pingdom, or StatusCake can periodically poll your `healthcheck.php` endpoint (or a simple `GET /` request if you don’t need detailed checks). Configure them to expect a 200 OK status code for a healthy response and alert you on failures.
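
As a rough illustration of what such a poller does with the response, the following Python sketch interprets the status code and JSON body produced by `healthcheck.php`. The function names, the 5-second timeout, and the failure labels (`invalid_json`, `overall_status`, `unreachable`) are assumptions chosen for this example; hosted services apply their own logic.

```python
import json
import urllib.error
import urllib.request


def failing_checks(status_code, body):
    """Return the names of failing checks; an empty list means healthy."""
    try:
        payload = json.loads(body)
    except ValueError:
        return ["invalid_json"]
    if not isinstance(payload, dict):
        return ["invalid_json"]
    checks = payload.get("checks", {})
    if not isinstance(checks, dict):
        checks = {}
    failed = [name for name, result in checks.items()
              if str(result).startswith("unhealthy")]
    # A non-200 code or a non-healthy overall status counts as a failure
    # even when no individual check reports one.
    if not failed and (status_code != 200 or payload.get("status") != "healthy"):
        failed.append("overall_status")
    return failed


def poll(url, timeout=5.0):
    """Fetch the endpoint once and return the list of failing checks."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return failing_checks(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises on a 503, but the JSON body is still readable.
        return failing_checks(err.code, err.read().decode("utf-8"))
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return ["unreachable"]
```

Calling `poll()` with the URL of your own endpoint from a cron job or a small loop gives a minimal self-hosted check. Reading the body on a 503 is what lets an alert name the failing dependency instead of only reporting that the endpoint is down.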

Server-Level Monitoring with Prometheus and Grafana

For more granular control and deeper insights, deploying Prometheus and Grafana on a dedicated Droplet is a powerful solution. You’ll need to install:

Node Exporter: To collect system-level metrics from your application Droplets.
Prometheus: To scrape metrics from Node Exporter and other exporters.
Grafana: For visualizing the collected metrics and creating dashboards.
PHP Exporter (optional but recommended): A custom exporter, or a generic HTTP prober such as the Prometheus Blackbox Exporter, configured to probe your `healthcheck.php` endpoint and report the result as a metric.

Installation Steps (Simplified):

1. Install Node Exporter on Application Droplets

# Download a release (v1.7.0 is shown here; check the releases page for the current version)
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
cd node_exporter-1.7.0.linux-amd64

# Run it (for testing, use systemd for production)
./node_exporter

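# For production, copy the binary into place and create an unprivileged user:
# sudo cp node_exporter /usr/local/bin/
# sudo useradd --system --no-create-home --shell /usr/sbin/nologin prometheus
#
# Then create a systemd service file:
# /etc/systemd/system/node_exporter.service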
# [Unit]
# Description=Node Exporter
# Wants=network-online.target
# After=network-online.target
#
# [Service]
# User=prometheus
# Group=prometheus
# Type=simple
# ExecStart=/usr/local/bin/node_exporter --web.listen-address="0.0.0.0:9100"
#
# [Install]
# WantedBy=multi-user.target

# Then enable and start:
# sudo systemctl daemon-reload
# sudo systemctl enable node_exporter
# sudo systemctl start node_exporter
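# Allow only the Prometheus server to reach the exporter (PROMETHEUS_SERVER_IP is a placeholder):
# sudo ufw allow from PROMETHEUS_SERVER_IP to any port 9100 proto tcp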

2. Set up Prometheus Server

On a separate Droplet (or the same one if resources allow), install Prometheus. A common approach is to use Docker.

# prometheus.yml
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

scrape_configs:
  # Scrape Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Scrape Node Exporter from your application Droplets
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['APP_DROPLET_IP_1:9100', 'APP_DROPLET_IP_2:9100'] # Replace with actual IPs

  # Scrape MySQL Exporter (see next section)
  - job_name: 'mysql_exporter'
    static_configs:
      - targets: ['MYSQL_EXPORTER_IP:9104'] # Replace with actual IP

Run Prometheus using Docker:

docker run -d \
  --name prometheus \
  -p 9090:9090 \
  -v /path/to/your/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus
# Ensure Prometheus can reach your application Droplets on port 9100
# and your MySQL exporter on port 9104. Adjust firewall rules as needed.

3. Set up Grafana

Install Grafana (e.g., via Docker or package manager) and configure Prometheus as a data source. Then, import pre-built dashboards for Node Exporter and potentially MySQL.

MySQL Cluster Monitoring Best Practices

Monitoring MySQL clusters, especially in a high-availability setup (like DigitalOcean’s Managed Databases for MySQL or a custom Galera/Replication setup), requires a different set of metrics. We need to focus on replication lag, query performance, connection counts, and resource utilization specific to the database.

Key MySQL Metrics to Monitor

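Replication Lag: Crucial for understanding how far behind replicas are from the primary. For traditional replication, watch `Seconds_Behind_Master` (reported as `Seconds_Behind_Source` by `SHOW REPLICA STATUS` in MySQL 8.0.22 and later). For Galera, `wsrep_local_state_comment` and `wsrep_cluster_size` describe node and cluster state rather than lag itself, so consider pairing them with `wsrep_local_recv_queue` and `wsrep_flow_control_paused`.
Query Performance: Slow query log analysis, `Threads_running`, and QPS (Queries Per Second) and TPS (Transactions Per Second). QPS and TPS are not status variables; they are derived from the rate of change of counters such as `Queries` and `Com_commit` plus `Com_rollback`.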
Connection Management: `Max_used_connections`, `Threads_connected`, `Aborted_connects`.
Buffer Pool Usage: `Innodb_buffer_pool_read_requests`, `Innodb_buffer_pool_reads` (to calculate hit rate).
Disk I/O: `Innodb_data_reads`, `Innodb_data_writes`.
Error Logs: Monitor MySQL error logs for critical issues.
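
Several of these figures are derived from raw counters, not read directly. The Python sketch below shows the arithmetic for the buffer pool hit rate and for a per-second rate such as QPS. The function names are made up for this illustration, and the inputs are assumed to be values you have already read from `SHOW GLOBAL STATUS`.

```python
def buffer_pool_hit_rate(read_requests, reads):
    """Fraction of logical reads served from the InnoDB buffer pool.

    read_requests: Innodb_buffer_pool_read_requests (logical reads)
    reads:         Innodb_buffer_pool_reads (reads that had to go to disk)
    """
    if read_requests <= 0:
        return None  # no traffic yet, so the ratio is undefined
    return 1.0 - (reads / read_requests)


def per_second_rate(previous, current, interval_seconds):
    """Average per-second increase of a counter between two samples.

    Works for Queries (QPS) or Com_commit + Com_rollback (TPS). A counter
    that went backwards is treated as a server restart, so the current
    value is taken as the whole increase.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current - previous if current >= previous else current
    return delta / interval_seconds
```

Because the status counters are cumulative since server start, a hit rate computed from the raw values describes the whole uptime. Feeding the function the differences between two samples gives the hit rate for that interval, which is usually more useful for alerting.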

Using MySQL Exporter with Prometheus

The `mysqld_exporter` is the standard Prometheus exporter for MySQL. It queries `SHOW GLOBAL STATUS`, `SHOW GLOBAL VARIABLES`, and other relevant tables to expose metrics.

1. Set up a MySQL User for the Exporter

-- On your MySQL server (or a replica you want to monitor)
CREATE USER 'exporter'@'%' IDENTIFIED BY 'your_strong_password' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';
-- FLUSH PRIVILEGES is not needed: CREATE USER and GRANT take effect immediately.

Note: For production, restrict the ‘exporter’ user’s host to the IP of the machine running `mysqld_exporter`.

2. Configure `mysqld_exporter`

Create a `.my.cnf` file for the exporter to use for authentication. This file should be readable only by the user running the exporter.

# ~/.my.cnf (or a dedicated file like /etc/mysqld_exporter/.my.cnf)
[client]
user=exporter
password=your_strong_password
# Host: e.g., 127.0.0.1 or the DO Managed DB endpoint (kept on its own line so the comment is never read as part of the value)
host=YOUR_MYSQL_HOST
port=3306

3. Run `mysqld_exporter`

You can run it directly or via Docker. For DigitalOcean Managed Databases, the `host` in `.my.cnf` will be the connection endpoint provided in your control panel.

# Download and run directly (similar to node_exporter)
# Or using Docker:
docker run -d \
  --name mysqld_exporter \
  -p 9104:9104 \
  -v /path/to/your/.my.cnf:/etc/mysqld_exporter/.my.cnf \
  prom/mysqld-exporter \
  --config.my-cnf=/etc/mysqld_exporter/.my.cnf \
  --collect.global_status \
  --collect.info_schema.tables \
  --collect.info_schema.processlist \
  --collect.binlog_size \
  --collect.slave_status \
  --collect.global_variables \
  --collect.info_schema.innodb_metrics \
  --collect.engine_innodb_status \
  --web.listen-address="0.0.0.0:9104"

# Ensure the Docker host can reach your MySQL server on port 3306.
# Ensure your Prometheus server can reach this mysqld_exporter on port 9104.
# Adjust firewall rules accordingly.

4. Configure Prometheus to Scrape MySQL Exporter

Add the `mysql_exporter` job to your `prometheus.yml` as shown in the Prometheus setup section, replacing `MYSQL_EXPORTER_IP` with the IP address of the Droplet running `mysqld_exporter`.
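
Before relying on the new job, it can help to confirm that the exporter is actually connected to MySQL. `mysqld_exporter` reports this through the `mysql_up` gauge on its `/metrics` page. The Python sketch below parses that text output. The helper names are hypothetical, and the parser deliberately handles only simple label-free metric lines.

```python
def parse_gauges(exposition_text, wanted):
    """Extract selected label-free metrics from Prometheus text output."""
    values = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] in wanted:
            try:
                values[parts[0]] = float(parts[1])
            except ValueError:
                continue  # ignore lines whose value is not numeric
    return values


def exporter_sees_mysql(exposition_text):
    """True when the exporter reports a working MySQL connection."""
    return parse_gauges(exposition_text, {"mysql_up"}).get("mysql_up") == 1.0
```

To use it, fetch `http://MYSQL_EXPORTER_IP:9104/metrics` (for example with `urllib.request.urlopen`) and pass the decoded body to `exporter_sees_mysql()`. A value of 0 usually points to wrong credentials or a blocked port 3306, not a problem with Prometheus itself.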

Monitoring DigitalOcean Managed Databases

DigitalOcean’s Managed Databases offer built-in metrics accessible via their control panel and API. While these are valuable for an overview, they often lack the depth needed for fine-grained troubleshooting. Integrating `mysqld_exporter` provides the Prometheus/Grafana ecosystem with the detailed metrics required for advanced analysis and alerting.

Alerting Strategies

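Once you have metrics flowing into Prometheus, configure alerting rules. Prometheus itself evaluates the rules (load the file through the `rule_files` setting in `prometheus.yml`), while Alertmanager handles grouping, routing, and delivery of the alerts that fire. Define rules based on thresholds and anomalies: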

Example Alerting Rule (`rules.yml`)

groups:
- name: mysql_alerts
  rules:
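  - alert: HighReplicationLag
    expr: mysql_slave_status_seconds_behind_master > 30 # Example threshold for traditional replication; tune for your workload
    # To catch a stopped SQL thread instead:
    # expr: mysql_slave_status_slave_sql_running == 0
    # For Galera, compare the numeric state (4 means Synced); PromQL cannot compare against strings:
    # expr: mysql_global_status_wsrep_local_state != 4
    # Metric names can vary with exporter and server version; confirm them on the exporter's /metrics page.
    for: 5m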
    labels:
      severity: critical
    annotations:
      summary: "High replication lag detected on {{ $labels.instance }}"
      description: "MySQL replica {{ $labels.instance }} is lagging significantly behind the primary."

  - alert: HighQueryRate
    expr: rate(mysql_global_status_queries[5m]) > 1000 # Example: High QPS
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High query rate on {{ $labels.instance }}"
      description: "MySQL instance {{ $labels.instance }} is experiencing a high query load."

  - alert: MySQLConnectionAborted
    expr: increase(mysql_global_status_aborted_connects[5m]) > 10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Aborted MySQL connections on {{ $labels.instance }}"
      description: "More than 10 connections aborted in the last 5 minutes on {{ $labels.instance }}."

Configure Alertmanager to route these alerts to Slack, PagerDuty, or email. Regularly review and tune your alert thresholds to minimize false positives and ensure critical issues are addressed promptly.
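
When tuning thresholds, it helps to keep in mind how the `for:` clause behaves: the expression has to be true at every rule evaluation for the whole duration before the alert moves from pending to firing, and a single false evaluation resets the timer. The Python sketch below models that behavior in simplified form. The function name and sample data are made up for illustration, and real Prometheus evaluation involves more detail, such as staleness handling.

```python
def alert_states(samples, threshold, for_seconds):
    """Model a threshold alert with a `for:` duration.

    samples: list of (timestamp_seconds, value) in evaluation order.
    Returns one state per sample: "inactive", "pending", or "firing".
    """
    states = []
    active_since = None  # when the condition first became true
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp
            held = timestamp - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # one false evaluation resets the timer
            states.append("inactive")
    return states
```

Replaying recorded values through a model like this shows how often a candidate threshold and duration would have fired. A metric that dips below the threshold once every few minutes never fires with `for: 5m`, which can hide a real problem just as easily as it suppresses noise.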