Server Monitoring Best Practices: Keeping Your Laravel App and MySQL Clusters Alive on Linode
Proactive MySQL Cluster Health Checks with `pt-heartbeat`
Keeping a MySQL cluster healthy and synchronized is essential in a high-availability setup for a Laravel application: downtime or replication lag can lead to data inconsistencies and application failures. Monitoring replication lag is therefore a critical part of the job, and the Percona Toolkit offers `pt-heartbeat` for this purpose. It writes a timestamp to a dedicated table on the master at a fixed interval; on each slave, it compares the replicated timestamp with the slave’s current time, and the difference is the replication lag.
First, ensure you have Percona Toolkit installed on your MySQL servers. On Debian/Ubuntu systems, this is typically:
sudo apt-get update
sudo apt-get install percona-toolkit
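Next, create a dedicated database and table for `pt-heartbeat` on your MySQL master server. This table will store the heartbeat timestamp. The layout below follows the table definition given in the `pt-heartbeat` documentation, which keys each row on `server_id`; alternatively, passing `--create-table` on the first `--update` run lets the tool create the table itself.
CREATE DATABASE IF NOT EXISTS heartbeat;
USE heartbeat;
CREATE TABLE IF NOT EXISTS ping (
    ts                    VARCHAR(26) NOT NULL,              -- heartbeat timestamp
    server_id             INT UNSIGNED NOT NULL PRIMARY KEY, -- one row per writing server
    file                  VARCHAR(255) DEFAULT NULL,         -- from SHOW MASTER STATUS
    position              BIGINT UNSIGNED DEFAULT NULL,      -- from SHOW MASTER STATUS
    relay_master_log_file VARCHAR(255) DEFAULT NULL,         -- from SHOW SLAVE STATUS
    exec_master_log_pos   BIGINT UNSIGNED DEFAULT NULL       -- from SHOW SLAVE STATUS
) ENGINE=InnoDB;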
Now, configure `pt-heartbeat` to run on the master. With the `--update` flag, it periodically refreshes the `ts` column in the `heartbeat.ping` table. We’ll set it to run every 5 seconds, so a reading taken between two updates can overstate the lag by up to that interval.
pt-heartbeat --host=YOUR_MASTER_HOST --user=heartbeat_user --password=your_password --database=heartbeat --table=ping --interval=5 --update
Replace YOUR_MASTER_HOST, heartbeat_user, and your_password with your actual credentials. It’s highly recommended to create a dedicated MySQL user with minimal privileges for this operation.
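As a rough sketch of what such a dedicated account could look like, the helper below prints the SQL for a user restricted to the heartbeat schema. The function name `heartbeat_grants_sql` and the host pattern are hypothetical, and the privilege list is an assumption (write access to the heartbeat schema plus `REPLICATION CLIENT` for reading replication status) that you should check against the Percona documentation for your version.

```shell
#!/bin/sh
# Print the SQL for a least-privilege pt-heartbeat account (hypothetical helper).
# $1 = user name, $2 = host pattern the tool connects from, $3 = password
heartbeat_grants_sql() {
    user=$1
    host=$2
    pass=$3
    cat <<SQL
CREATE USER IF NOT EXISTS '${user}'@'${host}' IDENTIFIED BY '${pass}';
-- Read and write access limited to the heartbeat schema
GRANT SELECT, INSERT, UPDATE, DELETE ON heartbeat.* TO '${user}'@'${host}';
-- Assumed to be needed for SHOW MASTER STATUS / SHOW SLAVE STATUS
GRANT REPLICATION CLIENT ON *.* TO '${user}'@'${host}';
SQL
}

# Example (run with an administrative MySQL account on the master):
#   heartbeat_grants_sql heartbeat_user '10.0.0.%' 'your_password' | mysql -u root -p
```

Because the statements are only printed, you can review them before piping them into `mysql`.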
On each of your MySQL slave servers, you’ll use `pt-heartbeat` to monitor the lag. This command connects to the slave and compares the replicated `ts` value with the slave’s current system time, so keep the clocks on all servers synchronized (for example, with NTP).
pt-heartbeat --host=YOUR_SLAVE_HOST --user=heartbeat_user --password=your_password --database=heartbeat --table=ping --interval=5 --monitor
The --monitor flag tells `pt-heartbeat` to print the lag continuously. The tool has no built-in alert thresholds, so to feed a monitoring system (e.g., Nagios, Zabbix, or Prometheus with Alertmanager), run it with --check, which prints the current lag once and exits, and compare the value against your thresholds (for instance, warning at 30 seconds and critical at 60) in a wrapper script or in the monitoring system itself:
pt-heartbeat --host=YOUR_SLAVE_HOST --user=heartbeat_user --password=your_password --database=heartbeat --table=ping --check
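One way to turn a lag reading into an alert state is a small wrapper that follows the Nagios plugin convention of exit codes 0, 1, and 2. This is a minimal sketch: `classify_lag` is a hypothetical helper, and it assumes the lag arrives as a single number of seconds (possibly fractional), which is what `pt-heartbeat --check` is expected to print for one server.

```shell
#!/bin/sh
# Map a lag value (seconds, may be fractional) to a Nagios-style state.
# $1 = lag, $2 = warning threshold, $3 = critical threshold
classify_lag() {
    # awk handles the floating-point comparison that plain sh cannot do
    awk -v lag="$1" -v warn="$2" -v crit="$3" 'BEGIN {
        if (lag + 0 >= crit + 0)      { print "CRITICAL"; exit 2 }
        else if (lag + 0 >= warn + 0) { print "WARNING";  exit 1 }
        else                          { print "OK";       exit 0 }
    }'
}

# Hypothetical usage on a replica (credentials are placeholders):
#   lag=$(pt-heartbeat --host=YOUR_SLAVE_HOST --user=heartbeat_user \
#         --password=your_password --database=heartbeat --table=ping --check)
#   classify_lag "$lag" 30 60
```

The function prints the state and returns the matching exit code, so it can be used directly as a Nagios-style check command.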
Laravel Application Health Checks with a Health Check Endpoint
For your Laravel application, implementing a dedicated health check endpoint is crucial. This endpoint should not only verify that the application is running but also check its critical dependencies, such as the database connection, Redis cache, and any external APIs it relies on.
Create a new controller for your health checks:
php artisan make:controller HealthCheckController
In the `HealthCheckController`, define a method that performs the checks. For a robust check, you’ll want to verify database connectivity and potentially a Redis connection.
<?php

namespace App\Http\Controllers;

use Illuminate\Http\JsonResponse;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\Log;
use Illuminate\Routing\Controller as BaseController;

class HealthCheckController extends BaseController
{
    /**
     * Perform a comprehensive health check.
     *
     * @return \Illuminate\Http\JsonResponse
     */
    public function index(): JsonResponse
    {
        $checks = [];
        $status = 'healthy';

        // 1. Database Connection Check
        try {
            DB::connection()->getPdo();
            $checks['database'] = 'ok';
        } catch (\Throwable $e) {
            $status = 'unhealthy';
            $checks['database'] = 'error: ' . $e->getMessage();
            Log::error('Database connection error: ' . $e->getMessage());
        }

        // 2. Redis Cache Check (if Redis is used)
        try {
            // Attempt to set and get a simple key
            $redis = Cache::store('redis')->connection();
            $redis->set('health_check_key', 'test', 'EX', 1); // Set with 1 second expiry
            $value = $redis->get('health_check_key');
            if ($value === 'test') {
                $checks['redis'] = 'ok';
            } else {
                $status = 'unhealthy';
                $checks['redis'] = 'error: failed to get value after set';
                Log::error('Redis health check failed: value mismatch.');
            }
        } catch (\Throwable $e) {
            $status = 'unhealthy';
            $checks['redis'] = 'error: ' . $e->getMessage();
            Log::error('Redis connection error: ' . $e->getMessage());
        }

        // 3. Add checks for other critical services (e.g., external APIs)
        // Example: Checking an external API
        // (requires `use Illuminate\Support\Facades\Http;` at the top of the file)
        /*
        try {
            $response = Http::timeout(5)->get('https://api.example.com/health');
            if ($response->successful()) {
                $checks['external_api'] = 'ok';
            } else {
                $status = 'unhealthy';
                $checks['external_api'] = 'error: API returned status ' . $response->status();
                Log::error('External API health check failed: status ' . $response->status());
            }
        } catch (\Throwable $e) {
            $status = 'unhealthy';
            $checks['external_api'] = 'error: ' . $e->getMessage();
            Log::error('External API connection error: ' . $e->getMessage());
        }
        */

        return response()->json([
            'status' => $status,
            'checks' => $checks,
        ], $status === 'healthy' ? 200 : 503); // 503 Service Unavailable for unhealthy status
    }
}
Register this endpoint in your routes/api.php or routes/web.php file. For production, routes/api.php is the better fit; note that routes defined there are served under the /api prefix by default, so the endpoint becomes /api/health. Because the response can include raw exception messages, restrict access with middleware or firewall rules rather than leaving it publicly reachable.
// routes/api.php
use App\Http\Controllers\HealthCheckController;
Route::get('/health', [HealthCheckController::class, 'index']);
You can then poll this endpoint with Prometheus and its `blackbox_exporter`, or with a simple `curl` command in a cron job. Treat any response other than 200 as a failure and trigger an alert: a 503 means a dependency check failed, while a timeout or connection error means the application itself is unreachable.
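For the `curl` route, a minimal sketch might look like the following. The helper names `interpret_status` and `probe_health` are hypothetical, and the URL is whatever address your health route is served at. When no HTTP response is received at all, `curl` reports the status code as `000`, which the sketch treats as unreachable.

```shell
#!/bin/sh
# Translate an HTTP status code from the health endpoint into a state.
interpret_status() {
    case "$1" in
        200) echo "healthy" ;;
        503) echo "unhealthy" ;;    # a dependency check failed
        000) echo "unreachable" ;;  # curl could not connect or timed out
        *)   echo "unexpected" ;;   # e.g. 404 (wrong route) or 500 (crash)
    esac
}

# Fetch only the status code: -s silences progress output, -o discards the
# body, and --max-time bounds the whole request to 5 seconds.
# $1 = full URL of the health endpoint
probe_health() {
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1")
    interpret_status "$code"
}

# Hypothetical usage:
#   probe_health https://app.example.com/api/health
```

A cron job can call `probe_health` and notify you whenever the printed state is anything other than `healthy`.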
System-Level Monitoring with Node Exporter and Prometheus
Beyond application-specific checks, robust system-level monitoring is essential for your Linode instances. Prometheus, coupled with `node_exporter`, provides a powerful time-series monitoring solution.
First, install `node_exporter` on each of your Linode servers (both application servers and database servers). The easiest way is often to download the pre-compiled binary.
# Download the latest version (check https://prometheus.io/download/ for the latest)
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
cd node_exporter-1.7.0.linux-amd64

# Move the binary to a standard location
sudo mv node_exporter /usr/local/bin/

# Create a systemd service file for automatic startup
sudo tee /etc/systemd/system/node_exporter.service <<EOF
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=nobody
Group=nogroup
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

# Reload systemd, enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

# Verify it's running
sudo systemctl status node_exporter
# You should see output indicating it's active (running)

# Check metrics endpoint
curl http://localhost:9100/metrics
Ensure your Linode firewall or cloud firewall is configured to allow incoming traffic on port 9100 from your Prometheus server’s IP address.
Next, configure your Prometheus server to scrape these `node_exporter` endpoints. In your prometheus.yml configuration file, add a scrape job:
scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['LINODE_APP_SERVER_1_IP:9100']
        labels:
          instance: 'app-server-1' # Example label, adjust for each instance
      - targets: ['LINODE_APP_SERVER_2_IP:9100']
        labels:
          instance: 'app-server-2'
      - targets: ['LINODE_DB_MASTER_IP:9100']
        labels:
          instance: 'db-master'
      - targets: ['LINODE_DB_SLAVE_1_IP:9100']
        labels:
          instance: 'db-slave-1'
      - targets: ['LINODE_DB_SLAVE_2_IP:9100']
        labels:
          instance: 'db-slave-2'
  - job_name: 'laravel_app_health'
    metrics_path: /metrics # Assuming your Laravel app exposes Prometheus metrics via /metrics
    static_configs:
      - targets: ['LINODE_APP_SERVER_1_IP:80'] # Or your specific app port
        labels:
          instance: 'app-server-1'
      - targets: ['LINODE_APP_SERVER_2_IP:80']
        labels:
          instance: 'app-server-2'
  - job_name: 'mysql_heartbeat'
    metrics_path: /metrics # Assuming an exporter (e.g., mysqld_exporter) publishes the heartbeat lag
    static_configs:
      - targets: ['LINODE_DB_SLAVE_1_IP:9104'] # 9104 is mysqld_exporter's default port; adjust to your exporter
        labels:
          instance: 'db-slave-1'
      - targets: ['LINODE_DB_SLAVE_2_IP:9104']
        labels:
          instance: 'db-slave-2'
Replace LINODE_APP_SERVER_X_IP and LINODE_DB_X_IP with the actual IP addresses of your Linode instances. Note that the JSON health endpoint shown earlier cannot be scraped directly: the `laravel_app_health` job only works if the application also exposes a /metrics endpoint in the Prometheus text format (for example, through a PHP Prometheus client library). Likewise, `pt-heartbeat` has no metrics endpoint of its own, so the `mysql_heartbeat` job needs an exporter that publishes the lag; adjust the targets and ports to match whatever you deploy.
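If you would rather not run a separate exporter for the lag, one option is `node_exporter`'s textfile collector, which publishes any `*.prom` files found in the directory given by its `--collector.textfile.directory` flag. The sketch below writes such a file; the helper `write_lag_metric`, the metric name `mysql_replication_lag_seconds`, and the paths in the usage comment are all assumptions made for illustration.

```shell
#!/bin/sh
# Write a replication-lag gauge in the Prometheus text exposition format.
# $1 = lag in seconds, $2 = destination .prom file in the textfile directory
write_lag_metric() {
    tmp="$2.$$"  # temp file next to the target, so the final mv is atomic
    {
        echo "# HELP mysql_replication_lag_seconds Replication lag reported by pt-heartbeat."
        echo "# TYPE mysql_replication_lag_seconds gauge"
        echo "mysql_replication_lag_seconds $1"
    } > "$tmp" && mv "$tmp" "$2"
}

# Hypothetical usage from cron on a replica (paths and credentials are placeholders):
#   lag=$(pt-heartbeat --host=localhost --user=heartbeat_user \
#         --password=your_password --database=heartbeat --table=ping --check)
#   write_lag_metric "$lag" /var/lib/node_exporter/textfile/mysql_lag.prom
```

Writing to a temporary file and renaming it keeps `node_exporter` from reading a half-written file. With this approach the metric arrives through the existing `node_exporter` job, so a separate scrape job for the lag is not needed.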
Alerting Strategies with Alertmanager
Prometheus alone is not enough; you need a robust alerting system. Alertmanager handles deduplication, grouping, and routing of alerts generated by Prometheus. Configure Prometheus to send alerts to Alertmanager:
# prometheus.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['ALERTMANAGER_HOST:9093'] # Your Alertmanager instance address
Now, define alerting rules in a rules file and load it through the `rule_files` setting in prometheus.yml. For example, to alert on high MySQL replication lag or if the Laravel health check fails:
# rules.yml
groups:
  - name: mysql_alerts
    rules:
      - alert: HighMysqlReplicationLag
        # Assumes mysqld_exporter with its slave status collector enabled
        expr: |
          max by (instance) (mysql_slave_status_seconds_behind_master) > 60 # Alert if lag is more than 60 seconds
        for: 5m # Fire alert only if condition persists for 5 minutes
        labels:
          severity: critical
        annotations:
          summary: "High MySQL replication lag on {{ $labels.instance }}"
          description: "MySQL instance {{ $labels.instance }} has a replication lag of {{ $value }} seconds."
  - name: laravel_alerts
    rules:
      - alert: LaravelAppUnhealthy
        expr: |
          up{job="laravel_app_health"} == 0 # Or check a specific metric indicating unhealthiness
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Laravel application is down on {{ $labels.instance }}"
          description: "The Laravel application on {{ $labels.instance }} is not responding or is unhealthy."
These rules assume that MySQL metrics come from a dedicated exporter such as `mysqld_exporter` (`node_exporter` only covers host-level metrics such as CPU, memory, and disk) and that your Laravel application exposes Prometheus metrics. If it does not, you can use Prometheus’s `blackbox_exporter` to probe the HTTP health check endpoint instead.
Finally, configure Alertmanager’s alertmanager.yml to route these alerts to your desired notification channels (e.g., Slack, PagerDuty, email).
# alertmanager.yml
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver'
receivers:
  - name: 'default-receiver'
    slack_configs:
      - api_url: 'YOUR_SLACK_WEBHOOK_URL'
        channel: '#alerts'
        send_resolved: true
        title: '{{ template "slack.default.title" . }}'
        text: '{{ template "slack.default.text" . }}'
# Add specific routes for different severities or services if needed
# receivers:
#   - name: 'pagerduty-receiver'
#     pagerduty_configs:
#       - service_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'
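Before relying on the routing, it can help to push a synthetic alert through Alertmanager and confirm it reaches the intended channel. The sketch below builds a one-alert JSON payload for Alertmanager's v2 HTTP API (`/api/v2/alerts`); the helper `build_test_alert`, the alert name, and the annotation text are hypothetical, and the label values are assumed not to contain characters that need JSON escaping.

```shell
#!/bin/sh
# Build a one-alert JSON payload for Alertmanager's v2 API.
# $1 = alert name, $2 = severity label
build_test_alert() {
    printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"Test alert, please ignore"}}]\n' "$1" "$2"
}

# Hypothetical usage (ALERTMANAGER_HOST is a placeholder):
#   build_test_alert MonitoringPipelineTest warning |
#     curl -s -X POST -H 'Content-Type: application/json' \
#          --data-binary @- http://ALERTMANAGER_HOST:9093/api/v2/alerts
```

If the receiver is configured correctly, the test alert should show up in the target channel after the `group_wait` period.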
By combining these strategies (proactive database monitoring with `pt-heartbeat`, application-level health checks, system-level metrics via `node_exporter`, and alerting with Prometheus and Alertmanager), you can build a resilient monitoring infrastructure for your Laravel application and MySQL clusters on Linode.