Server Monitoring Best Practices: Keeping Your Laravel App and MySQL Clusters Alive on OVH
Proactive MySQL Cluster Health Checks with `pt-heartbeat`
Maintaining the health and synchronization of a MySQL cluster, especially in a production environment serving a Laravel application, is paramount. Downtime or replication lag can directly impact user experience and business operations. A critical component of this is ensuring replication is functioning correctly and that there are no significant delays between master and replicas. We’ll leverage Percona Toolkit’s `pt-heartbeat` for this.
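`pt-heartbeat` monitors replication lag by writing a timestamp to a table on the master and then reading the replicated row on each replica. Because the replica compares that timestamp with its own clock, this measures the real end-to-end delay independently of MySQL’s own Seconds_Behind_Master estimate, provided the master’s and replicas’ clocks are synchronised.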
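The measurement comes down to one subtraction: the replica’s current time minus the timestamp the master last wrote. Here is a minimal shell sketch, assuming both values are already available as Unix epoch seconds. `hb_lag` is a hypothetical helper and is not part of Percona Toolkit.

```sh
#!/bin/sh
# Hypothetical helper: replication lag as pt-heartbeat measures it.
#   $1 = heartbeat timestamp written on the master (epoch seconds, may be fractional)
#   $2 = current time on the replica (epoch seconds, may be fractional)
# Prints the lag in seconds with millisecond precision.
hb_lag() {
  awk -v stored="$1" -v now="$2" 'BEGIN {
    lag = now - stored
    # A negative value means the clocks disagree, not that the replica is ahead.
    if (lag < 0) lag = 0
    printf "%.3f\n", lag
  }'
}

hb_lag 1700000000.250 1700000003.750   # prints 3.500
```

A negative difference can only come from clock disagreement. For that reason the sketch clamps it to zero instead of reporting a replica that is ahead of its master.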
Setting up `pt-heartbeat` on the Master and Replicas
First, ensure Percona Toolkit is installed on all your MySQL nodes. On Debian/Ubuntu systems, this is typically:
sudo apt update
sudo apt install percona-toolkit
Next, create a dedicated database and table on your MySQL master for `pt-heartbeat` to use. This table will store the heartbeat information.
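-- Connect to your MySQL master first, e.g.: mysql -h master_host -u root -p
CREATE DATABASE IF NOT EXISTS heartbeat;
USE heartbeat;
-- Column layout follows the heartbeat table definition documented for pt-heartbeat
-- (its --create-table option can create and seed this table for you instead)
CREATE TABLE IF NOT EXISTS bpm (
  ts                    varchar(26) NOT NULL,
  server_id             int unsigned NOT NULL PRIMARY KEY,
  file                  varchar(255) DEFAULT NULL,    -- SHOW MASTER STATUS
  position              bigint unsigned DEFAULT NULL, -- SHOW MASTER STATUS
  relay_master_log_file varchar(255) DEFAULT NULL,    -- SHOW SLAVE STATUS
  exec_master_log_pos   bigint unsigned DEFAULT NULL  -- SHOW SLAVE STATUS
) ENGINE=InnoDB;
-- pt-heartbeat expects at least one row; replace 1 with the master's server_id
INSERT INTO bpm (ts, server_id) VALUES (NOW(), 1);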
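Now, run `pt-heartbeat` in two roles. In --update mode against the master, it rewrites the heartbeat row every second. Replication carries that row to each replica, where --check (a single reading) or --monitor (continuous output) compares it with the replica’s clock and reports the lag.
# Writer: keep updating the heartbeat row on the master
pt-heartbeat --update --host=master_host --user=replication_user --password=replication_password --database=heartbeat --table=bpm --interval=1 --daemonize --pid=/var/run/pt-heartbeat.pid --log=/var/log/pt-heartbeat.log
# Reader: print the current lag of one replica once and exit
pt-heartbeat --check --host=replica_host --user=replication_user --password=replication_password --database=heartbeat --table=bpm --master-server-id=1
Replace master_host and replica_host with the actual hostnames or IPs of your master and replica. Replace replication_user/replication_password with credentials that have REPLICATION CLIENT plus SELECT on the heartbeat database; the writer on the master also needs INSERT and UPDATE. Set --master-server-id to the master’s server_id so the reader knows which row to compare against. --interval=1 writes a new heartbeat every second, and --daemonize runs the writer as a background process.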
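pt-heartbeat’s --check mode prints the current lag once and exits. For cron jobs or Nagios-style checks, that number can be turned into a status line and an exit code. The sketch below assumes --check prints a bare number of seconds and uses the conventional Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). `lag_status` is a hypothetical name, and the thresholds are yours to choose.

```sh
#!/bin/sh
# Hypothetical wrapper: classify a replication lag value against thresholds.
#   $1 = lag in seconds (e.g. the output of pt-heartbeat --check)
#   $2 = warning threshold, $3 = critical threshold (seconds)
# Prints a status line and returns 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).
lag_status() {
  awk -v l="$1" -v w="$2" -v c="$3" 'BEGIN {
    # Empty or non-numeric input (e.g. pt-heartbeat could not connect) is UNKNOWN.
    if (l !~ /^[0-9]+([.][0-9]+)?$/) { print "UNKNOWN - no lag value"; exit 3 }
    l += 0; w += 0; c += 0
    if (l >= c) { print "CRITICAL - lag " l "s"; exit 2 }
    if (l >= w) { print "WARNING - lag " l "s"; exit 1 }
    print "OK - lag " l "s"
    exit 0
  }'
}
```

For example: `lag_status "$(pt-heartbeat --check --host=replica_host --user=replication_user --password=replication_password --database=heartbeat --table=bpm --master-server-id=1)" 10 60`. Here the 10- and 60-second thresholds and server_id 1 are illustrative.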
Monitoring Replication Lag with Prometheus and Grafana
To effectively monitor the output of `pt-heartbeat` and other MySQL metrics, we’ll integrate with Prometheus and visualize the data in Grafana. The mysqld_exporter is a Prometheus exporter for MySQL that can collect a wide range of metrics, including replication status.
First, install and configure mysqld_exporter. You’ll need a dedicated MySQL user with sufficient privileges for the exporter to query performance schema and other relevant tables.
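-- Run on the master: account statements replicate, so the user also reaches the replicas.
-- (Creating it separately on a replica can make the replicated CREATE USER fail and stop replication.)
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'your_secure_password';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;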
Create a .my.cnf file for the user running the exporter (e.g., the prometheus user) to hold the credentials. Make it readable only by that user (chmod 600).
# ~/.my.cnf for the exporter user
[client]
user=exporter
password=your_secure_password
host=localhost
Download and run mysqld_exporter. A common approach is to run it as a systemd service.
# Download a release from the mysqld_exporter GitHub releases page (v0.15.0 shown; newer versions may exist)
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.15.0/mysqld_exporter-0.15.0.linux-amd64.tar.gz
tar xvfz mysqld_exporter-0.15.0.linux-amd64.tar.gz
cd mysqld_exporter-0.15.0.linux-amd64/
# Install the binary where the systemd unit below expects it
sudo cp mysqld_exporter /usr/local/bin/
# Create a systemd service file
sudo nano /etc/systemd/system/mysqld_exporter.service
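Add the following content to the systemd service file. The --collect.heartbeat flags are optional; they enable the exporter’s collector for the pt-heartbeat table created earlier (database heartbeat, table bpm).
[Unit]
Description=Prometheus MySQL Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/mysqld_exporter --config.my-cnf=/home/prometheus/.my.cnf --collect.heartbeat --collect.heartbeat.database=heartbeat --collect.heartbeat.table=bpm
[Install]
WantedBy=multi-user.target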
Make sure the prometheus user exists and has ownership of the .my.cnf file and the exporter binary. Then, enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable mysqld_exporter
sudo systemctl start mysqld_exporter
sudo systemctl status mysqld_exporter
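Once the service is running, the exporter serves plain-text Prometheus metrics over HTTP on port 9104 at /metrics. The sketch below spot-checks a single value from that output. `metric_value` is a hypothetical helper. It matches a series only when you pass it exactly as printed, including any label set.

```sh
#!/bin/sh
# Hypothetical helper: print the value of one series from Prometheus text output.
#   $1 = series exactly as it appears, e.g. mysql_up or name{label="x"}
# Returns 1 if the series is not present.
metric_value() {
  awk -v s="$1" '
    /^#/ { next }                       # skip HELP and TYPE lines
    index($0, s " ") == 1 {             # line starts with "<series> "
      rest = substr($0, length(s) + 2)
      split(rest, parts, " ")
      print parts[1]                    # the sample value
      found = 1
      exit
    }
    END { if (!found) exit 1 }
  '
}
```

For example, `curl -s http://localhost:9104/metrics | metric_value mysql_up` should print 1 once the exporter can reach MySQL.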
Configure Prometheus to scrape the mysqld_exporter endpoints. Add the following to your prometheus.yml configuration:
scrape_configs:
  - job_name: 'mysql'
    static_configs:
      - targets:
          - 'mysql_master_ip:9104'
          - 'mysql_replica1_ip:9104'
          - 'mysql_replica2_ip:9104'
    # If using service discovery, adjust accordingly
Restart Prometheus for the changes to take effect. Now, you can import a pre-built MySQL dashboard into Grafana (many are available on Grafana.com, search for “MySQL Overview” or similar) or create custom dashboards. Key metrics to monitor include:
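- Replication lag: with the heartbeat collector enabled, mysql_heartbeat_now_timestamp_seconds minus mysql_heartbeat_stored_timestamp_seconds gives the pt-heartbeat lag, while mysql_slave_status_seconds_behind_master exposes MySQL’s own Seconds_Behind_Master. This is the most important replication signal.
- mysql_up: Indicates whether the exporter can connect to MySQL.
- mysql_global_status_threads_connected: Number of open client connections.
- mysql_global_status_threads_running: Number of threads actively executing queries.
- mysql_global_status_innodb_buffer_pool_wait_free: Counts waits for a clean buffer pool page; a rising value indicates buffer pool pressure.
- mysql_global_status_innodb_row_lock_waits: High values suggest locking issues.
- mysql_slave_status_slave_io_running and mysql_slave_status_slave_sql_running: 1 while the replication I/O and SQL threads are running; essential for replication health. (Exact replication metric names can vary with MySQL and exporter versions, so check your exporter’s /metrics output.)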
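mysqld_exporter’s heartbeat collector (enabled with --collect.heartbeat) exports two gauges per server_id: the timestamp stored in the heartbeat table and the database’s current time. Their difference is the pt-heartbeat lag. The sketch below computes that difference from raw /metrics output and assumes no sample timestamps are appended to the lines. `heartbeat_lag_from_metrics` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: per-server_id pt-heartbeat lag from mysqld_exporter output.
# Reads Prometheus text format on stdin and prints "server_id=<id> lag=<seconds>".
heartbeat_lag_from_metrics() {
  awk '
    # e.g. mysql_heartbeat_now_timestamp_seconds{server_id="1"} 1.7000000035e+09
    /^mysql_heartbeat_(now|stored)_timestamp_seconds[{]/ {
      # Extract the server_id label value (11 = length of the server_id=" prefix).
      match($0, /server_id="[^"]*"/)
      id = substr($0, RSTART + 11, RLENGTH - 12)
      if ($0 ~ /^mysql_heartbeat_now_/) now[id] = $NF
      else stored[id] = $NF
    }
    END {
      # Lag = database clock now minus the timestamp pt-heartbeat last wrote.
      for (id in now)
        if (id in stored) printf "server_id=%s lag=%.3f\n", id, now[id] - stored[id]
    }
  ' | sort
}
```

For example, `curl -s http://localhost:9104/metrics | heartbeat_lag_from_metrics`. It performs the same subtraction a PromQL expression over the two gauges would, which is handy for checking an alert threshold by hand.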
Laravel Application Health Checks
Beyond the database, the Laravel application itself needs robust health checks. This involves monitoring the web server, PHP-FPM, and the application’s internal state.
Nginx and PHP-FPM Monitoring
Nginx’s stub_status module is invaluable for basic web server performance metrics. Ensure it’s enabled in your Nginx configuration.
# In your Nginx site configuration (e.g., /etc/nginx/sites-available/your-laravel-app)
server {
    listen 80;
    server_name your-app.com;
    # ... other configurations ...
    location /nginx_status {
        stub_status;
        allow 127.0.0.1; # Allow access only from localhost for security
        deny all;
    }
    # ... other configurations ...
}
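Test and reload Nginx after making changes: sudo nginx -t && sudo systemctl reload nginx. You can then use NGINX’s nginx-prometheus-exporter to expose these metrics to Prometheus. Run it as a service the same way as mysqld_exporter, point it at the status URL (for example http://127.0.0.1/nginx_status), and add its endpoint to a Prometheus scrape job.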
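The stub_status page is plain text in a fixed four-line layout, which is what nginx-prometheus-exporter reads. You can also inspect it without the exporter. The sketch below turns it into key=value pairs and derives dropped connections as accepts minus handled; nginx documents these two counters as normally equal unless a resource limit such as worker_connections was reached. `parse_stub_status` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: turn nginx stub_status output into key=value lines.
parse_stub_status() {
  awk '
    /^Active connections:/ { print "active=" $3 }
    /^server accepts handled requests/ {
      # The next line holds the three cumulative counters.
      getline
      print "accepts=" $1; print "handled=" $2; print "requests=" $3
      dropped = $1 - $2
    }
    /^Reading:/ { print "reading=" $2; print "writing=" $4; print "waiting=" $6 }
    # accepts and handled only differ when a limit such as worker_connections was hit.
    END { print "dropped=" dropped + 0 }
  '
}
```

On the web host: `curl -s http://127.0.0.1/nginx_status | parse_stub_status`.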
For PHP-FPM, the pm.status_path directive provides similar statistics. Make sure it’s configured and reachable, either through a local Nginx location or by querying the FPM socket directly over FastCGI from the same host.
; In your PHP-FPM pool configuration (e.g., /etc/php/8.1/fpm/pool.d/www.conf)
pm.status_path = /fpm_status
ping.path = /fpm_ping
ping.response = pong
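Reload PHP-FPM after changing the pool (for example sudo systemctl reload php8.1-fpm). Then add an Nginx location that passes /fpm_status and /fpm_ping to PHP-FPM with fastcgi_pass, restricted to 127.0.0.1 as in the stub_status example. Finally, use a PHP-FPM exporter (for example hipages/php-fpm_exporter) to scrape these metrics for Prometheus.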
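PHP-FPM’s status page uses a plain-text "name: value" layout by default. Some values, such as "start time", contain colons themselves. The sketch below reads one field and computes busy workers as a percentage of pm.max_children. The status page does not report pm.max_children, so that value has to come from your pool configuration. It assumes the page is exposed at /fpm_status through Nginx. `fpm_field` and `fpm_utilization` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical helper: read one field from PHP-FPM's plain-text status page.
#   $1 = field name as printed, e.g. "active processes" or "max children reached"
fpm_field() {
  awk -v key="$1" '
    {
      i = index($0, ":")          # split on the first colon only;
      if (i == 0) next            # values such as "start time" contain colons
      if (substr($0, 1, i - 1) != key) next
      value = substr($0, i + 1)
      sub(/^[ \t]+/, "", value)   # drop the alignment padding
      print value
      found = 1
      exit
    }
    END { if (!found) exit 1 }
  '
}

# Busy workers as a percentage of pm.max_children (passed in from the pool config).
fpm_utilization() {
  awk -v active="$1" -v max="$2" 'BEGIN {
    if (max + 0 <= 0) exit 1
    printf "%.0f\n", 100 * active / max
  }'
}
```

For example, `curl -s http://127.0.0.1/fpm_status | fpm_field 'active processes'` reads the busy worker count. Then `fpm_utilization 45 50` prints 90 for 45 busy workers against a hypothetical pm.max_children of 50. A non-zero, growing "max children reached" value means FPM has already hit its process limit.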
Application-Level Health Checks with Laravel
Recent Laravel versions (11 and later) ship a lightweight /up health route, but it only confirms that the application boots. For deeper checks, a common pattern is a dedicated health check route that tests each dependency.
// routes/web.php or routes/api.php
use Illuminate\Support\Facades\Route;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\Redis;
Route::get('/health', function () {
    $status = [
        'database' => false,
        'cache' => false,
        'redis' => false,
        'app' => true, // Assume app is healthy until proven otherwise
    ];
    // Check Database Connection
    try {
        DB::connection()->getPdo();
        $status['database'] = true;
    } catch (\Exception $e) {
        // Log the error if needed
        report($e);
    }
    // Check Cache (e.g., file cache, or a specific cache key)
    try {
        $cacheKey = 'health_check_key_' . uniqid();
        // A 1-second TTL can expire between put and get on second-granularity drivers;
        // 10 seconds avoids false failures, and the key is removed below anyway
        Cache::put($cacheKey, true, 10);
        if (Cache::get($cacheKey)) {
            $status['cache'] = true;
        }
        Cache::forget($cacheKey);
    } catch (\Exception $e) {
        report($e);
    }
    // Check Redis Connection (if used)
    try {
        if (Redis::connection()->client()->ping()) {
            $status['redis'] = true;
        }
    } catch (\Exception $e) {
        report($e);
    }
    // Determine overall health
    $overallHealth = (
        $status['database'] &&
        $status['cache'] &&
        $status['redis'] && // Add other dependencies as needed
        $status['app']
    );
    return response()->json([
        'status' => $overallHealth ? 'healthy' : 'unhealthy',
        'checks' => $status,
    ], $overallHealth ? 200 : 503); // 503 Service Unavailable for unhealthy
});
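Prometheus does not scrape this route directly. Instead, blackbox_exporter probes the /health endpoint over HTTP and exposes the outcome as probe_success, which Prometheus scrapes from the exporter. By default, its http prober treats any 2xx status as success.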
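Before wiring up blackbox_exporter, you can exercise the route’s contract from the shell: 200 when every check passes, 503 otherwise. The function names below are hypothetical, and the 5-second timeout is an arbitrary choice.

```sh
#!/bin/sh
# Hypothetical helpers: check the Laravel /health route the way an HTTP probe would.
# The route answers 200 when every check passes and 503 otherwise.
classify_health() {
  case "$1" in
    200) echo "healthy" ;;
    503) echo "unhealthy"; return 1 ;;
    000) echo "unreachable"; return 1 ;;  # curl prints 000 when no HTTP response arrived
    *)   echo "unexpected status $1"; return 1 ;;
  esac
}

probe_health() {
  # -w prints only the status code; --max-time 5 bounds the whole request.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1" || true)"
  classify_health "$code"
}
```

For example, `probe_health https://your-app.com/health` prints the verdict and returns non-zero unless the route answered 200.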
Alerting Strategies with Alertmanager
Collecting metrics is only half the battle; actionable alerts are crucial. Prometheus integrates with Alertmanager for sophisticated alerting rules and routing.
Define alerting rules in Prometheus, typically in a file like /etc/prometheus/alert.rules.yml that is listed under rule_files in prometheus.yml. Here are examples for MySQL replication lag and availability, application health, and PHP-FPM capacity:
groups:
  - name: mysql_alerts
    rules:
      - alert: HighReplicationLag
        # pt-heartbeat lag from mysqld_exporter's heartbeat collector (--collect.heartbeat)
        expr: (mysql_heartbeat_now_timestamp_seconds{job="mysql"} - mysql_heartbeat_stored_timestamp_seconds{job="mysql"}) > 60 # Alert if lag is over 60 seconds
        for: 5m # Condition must persist for 5 minutes
        labels:
          severity: critical
        annotations:
          summary: "High replication lag on {{ $labels.instance }}"
          description: "MySQL replica {{ $labels.instance }} has a replication lag of {{ $value }} seconds, exceeding the 60-second threshold."
      - alert: MySQLDown
        expr: mysql_up{job="mysql"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MySQL is unreachable on {{ $labels.instance }}"
          description: "The exporter on {{ $labels.instance }} has been unable to connect to MySQL for more than 1 minute."
  - name: laravel_app_alerts
    rules:
      - alert: LaravelAppUnhealthy
        expr: probe_success{job="blackbox", instance="your-app.com/health"} == 0 # Assuming blackbox exporter probes /health
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Laravel application is unhealthy on {{ $labels.instance }}"
          description: "The /health endpoint for {{ $labels.instance }} has been failing for 2 minutes."
      - alert: HighPHPFPMWorkerUtilization
        # Metric names depend on the PHP-FPM exporter you run; adjust them to match its /metrics output
        expr: php_fpm_process_count{job="php-fpm"} > (php_fpm_max_children{job="php-fpm"} * 0.9) # Alert if 90% of max children are used
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High PHP-FPM worker utilization on {{ $labels.instance }}"
          description: "PHP-FPM on {{ $labels.instance }} is running {{ $value }} worker processes, above 90% of pm.max_children."
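The for: clause is easy to misread. An alert becomes pending at the first evaluation where the expression is true, and it fires only once it has stayed true for the whole duration; a single false evaluation resets it. The sketch below replays that state machine over a series of values, one per evaluation interval. `simulate_for` is a hypothetical name. It models a single series, not Prometheus’s per-label-set bookkeeping.

```sh
#!/bin/sh
# Hypothetical simulator for a Prometheus alerting rule's "for:" clause on one series.
#   stdin: one sample value per evaluation
#   $1 = threshold (expr is "value > threshold"), $2 = evaluation interval (s), $3 = for duration (s)
# Prints "<time> <value> <state>" per evaluation.
simulate_for() {
  awk -v threshold="$1" -v interval="$2" -v hold="$3" '
    {
      t = (NR - 1) * interval
      if ($1 + 0 > threshold + 0) {
        # First true evaluation starts the pending period.
        if (!active) { active = 1; since = t }
        state = (t - since >= hold + 0) ? "firing" : "pending"
      } else {
        # Any false evaluation resets the alert.
        active = 0
        state = "inactive"
      }
      print t, $1, state
    }
  '
}
```

The following replays one-minute evaluations of a lag series against "> 60" with a 5-minute for: clause: `printf '%s\n' 10 70 80 90 100 110 120 30 | simulate_for 60 60 300`. It reports pending from t=60 and fires first at t=360, after six consecutive true evaluations spanning five minutes. The alert returns to inactive at t=420.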
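Configure Alertmanager to receive these alerts from Prometheus (listed under the alerting section of prometheus.yml) and to route them to appropriate channels (e.g., Slack, PagerDuty, email). Make sure your Alertmanager configuration (alertmanager.yml) defines receivers and routes. Before reloading, validate both files with promtool check rules /etc/prometheus/alert.rules.yml and amtool check-config alertmanager.yml.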
OVH Specific Considerations
When deploying this on OVH, consider:
- Network Segmentation: Use OVH’s network security groups or firewall rules to restrict access to your MySQL ports (3306) and monitoring endpoints (9104, 9090, etc.) to only necessary IPs (e.g., your application servers, monitoring servers).
- Instance Sizing: Ensure your instances (both for the application and monitoring components) are adequately sized for the expected load. Monitoring itself consumes resources.
- Managed Databases: If you are using OVH’s managed database services, the approach to monitoring might differ. You may need to use OVH’s provided tools or APIs, but the underlying principles of checking replication lag and performance remain the same. For self-managed MySQL on OVH instances, the above applies directly.
- Logging: Centralize logs from your application, Nginx, PHP-FPM, and the monitoring exporters using a stack such as ELK or Graylog, hosted on a separate OVH instance or provided by a dedicated logging service. This is crucial for post-mortem analysis.
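On the instances themselves, a host firewall can back up the network-level rules. The sketch below assumes Ubuntu’s ufw as the host firewall; the source does not name one. It only prints the commands, so you can review them before running them with sudo. ufw checks rules in the order they were added and applies the first match, which is why the allow rules come before the deny. `ufw_restrict_port` is a hypothetical name, and the IP addresses are illustrative.

```sh
#!/bin/sh
# Hypothetical helper: print (not apply) ufw rules that open a TCP port only to listed IPs.
#   $1 = port, remaining arguments = allowed source IPs
ufw_restrict_port() {
  port="$1"
  shift
  for ip in "$@"; do
    printf 'ufw allow from %s to any port %s proto tcp\n' "$ip" "$port"
  done
  # Explicit catch-all deny after the allows (redundant if the default policy already denies).
  printf 'ufw deny %s/tcp\n' "$port"
}
```

For example, `ufw_restrict_port 3306 10.0.0.11 10.0.0.12` covers MySQL and `ufw_restrict_port 9104 10.0.0.20` covers the exporter, with hypothetical application and monitoring server addresses.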
By implementing these layered monitoring strategies, you can achieve a highly resilient and observable Laravel application running on MySQL clusters, ensuring proactive identification and resolution of issues before they impact your users.