Server Monitoring Best Practices: Keeping Your Laravel App and MySQL Clusters Alive on OVH

Proactive MySQL Cluster Health Checks with `pt-heartbeat`

In production, the health and synchronization of the MySQL cluster behind a Laravel application directly affect user experience: downtime takes the site offline, and replication lag lets replicas serve stale reads. A key part of monitoring is therefore confirming that replication is running and that replicas are not falling far behind the master. We’ll use Percona Toolkit’s `pt-heartbeat` for this.

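pt-heartbeat measures replication lag by updating a timestamp row in a table on the master at a fixed interval and comparing the replicated copy of that row on each replica with the current time. Because it measures how long real data takes to arrive, it is generally more reliable than Seconds_Behind_Master, but it is only as accurate as the clock synchronization between the servers, so keep NTP (or chrony) running on every node.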
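
The arithmetic behind that measurement can be sketched in a few lines of shell. This is only an illustration of the idea, not how Percona Toolkit implements it: the function name heartbeat_lag is hypothetical, and both arguments are assumed to be Unix epoch seconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Percona Toolkit.
# lag = time at which the replica reads the row - time the master wrote it.
# Both arguments are Unix epoch seconds; fractions are allowed.
heartbeat_lag() {
  local written_at="$1" read_at="$2"
  # awk does the floating-point subtraction that plain shell arithmetic cannot.
  awk -v w="$written_at" -v r="$read_at" 'BEGIN { printf "%.2f\n", r - w }'
}
```

A negative result means the replica's clock is behind the master's, which is why clock synchronization matters for this kind of check.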

Setting up `pt-heartbeat` on the Master and Replicas

First, ensure Percona Toolkit is installed on all your MySQL nodes. On Debian/Ubuntu systems, this is typically:

sudo apt update
sudo apt install percona-toolkit

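Next, create a dedicated database and table on your MySQL master for `pt-heartbeat` to use. The table must have the columns `pt-heartbeat` expects; the definition below follows the one in the Percona Toolkit documentation. Alternatively, pass --create-table the first time you run `pt-heartbeat` with --update and let the tool create it.

-- Connect to your MySQL master
-- mysql -h master_host -u root -p

CREATE DATABASE IF NOT EXISTS heartbeat;
USE heartbeat;
CREATE TABLE IF NOT EXISTS bpm (
  ts                    varchar(26) NOT NULL,
  server_id             int unsigned NOT NULL PRIMARY KEY,
  file                  varchar(255) DEFAULT NULL,    -- SHOW MASTER STATUS
  position              bigint unsigned DEFAULT NULL, -- SHOW MASTER STATUS
  relay_master_log_file varchar(255) DEFAULT NULL,    -- SHOW SLAVE STATUS
  exec_master_log_pos   bigint unsigned DEFAULT NULL  -- SHOW SLAVE STATUS
) ENGINE=InnoDB;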

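Now run `pt-heartbeat` in two roles. On the master, an updater process (--update) refreshes the heartbeat row every second. On each replica, a monitor process (--monitor) reads the replicated row from the replica’s own copy of the table and reports the lag. Reading the row from the master instead would always show a lag close to zero.

# On the MySQL master: write a heartbeat every second
pt-heartbeat --update --host=master_host --user=replication_user --password=replication_password --database=heartbeat --table=bpm --interval=1 --daemonize --pid=/var/run/pt-heartbeat.pid --log=/var/log/pt-heartbeat.log
# On each MySQL replica: read the replicated heartbeat and log the lag
pt-heartbeat --monitor --host=localhost --user=replication_user --password=replication_password --database=heartbeat --table=bpm --interval=1 --daemonize --pid=/var/run/pt-heartbeat.pid --log=/var/log/pt-heartbeat.log

Replace master_host with the actual hostname or IP of your MySQL master, and replication_user/replication_password with the credentials of an account you have created for this purpose. The updater needs write access to the heartbeat table; the monitor needs only SELECT on it. REPLICATION CLIENT lets the tool read replication status, for example to look up the master’s server ID when --master-server-id is not given. --interval=1 makes each process run once per second, and --daemonize runs it in the background. For a one-off reading, for example from a cron job or a Nagios-style check, use --check in place of --monitor.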
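
For scripted checks, pt-heartbeat's --check mode prints a single lag reading and exits, which makes it easy to wrap. The following is a minimal sketch under stated assumptions: the function names and the WARN_LAG/CRIT_LAG thresholds are hypothetical, and connection credentials are left out, so add --user/--password (or an option file) as your setup requires.

```shell
#!/usr/bin/env bash
# Hypothetical thresholds in seconds; tune them to your workload.
WARN_LAG="${WARN_LAG:-5}"
CRIT_LAG="${CRIT_LAG:-30}"

# Map a lag value (as printed by `pt-heartbeat --check`) to a status word.
classify_lag() {
  awk -v lag="$1" -v warn="$WARN_LAG" -v crit="$CRIT_LAG" 'BEGIN {
    # Anything that is not a plain non-negative number is treated as unknown.
    if (lag !~ /^[0-9]+(\.[0-9]+)?$/) { print "UNKNOWN"; exit }
    if (lag + 0 >= crit + 0)      print "CRITICAL"
    else if (lag + 0 >= warn + 0) print "WARNING"
    else                          print "OK"
  }'
}

# Take one reading from the local replica and print a one-line status.
# Credentials are intentionally omitted here (assumption: supplied elsewhere).
check_replica_lag() {
  local lag
  lag="$(pt-heartbeat --check --host=localhost --database=heartbeat --table=bpm 2>/dev/null)"
  echo "$(classify_lag "$lag") lag=${lag:-n/a}"
}
```

Calling check_replica_lag from cron or a monitoring agent then gives a single line such as a status word followed by the measured lag; an empty reading (for example when the connection fails) is reported as UNKNOWN rather than as zero lag.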

Monitoring Replication Lag with Prometheus and Grafana

To effectively monitor the output of `pt-heartbeat` and other MySQL metrics, we’ll integrate with Prometheus and visualize the data in Grafana. The mysqld_exporter is a Prometheus exporter for MySQL that can collect a wide range of metrics, including replication status.

First, install and configure mysqld_exporter. You’ll need a dedicated MySQL user with sufficient privileges for the exporter to query performance schema and other relevant tables.

-- On each MySQL node (master and replicas):
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'your_secure_password';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;

Create a .my.cnf file in the home directory of the user running the exporter (e.g., the prometheus user) so the password stays off the command line. Restrict the file to that user with chmod 600.

# ~/.my.cnf for the exporter user
[client]
user=exporter
password=your_secure_password
host=localhost

Download and run mysqld_exporter. A common approach is to run it as a systemd service.

# Download a release from the mysqld_exporter GitHub releases page (check there for the current version)
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.15.0/mysqld_exporter-0.15.0.linux-amd64.tar.gz
tar xvfz mysqld_exporter-0.15.0.linux-amd64.tar.gz

# Install the binary where the systemd unit below expects it
sudo install -m 0755 mysqld_exporter-0.15.0.linux-amd64/mysqld_exporter /usr/local/bin/

# Create a systemd service file
sudo nano /etc/systemd/system/mysqld_exporter.service

Add the following content to the systemd service file:

[Unit]
Description=Prometheus MySQL Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/mysqld_exporter --config.my-cnf=/home/prometheus/.my.cnf --collect.heartbeat --collect.heartbeat.database=heartbeat --collect.heartbeat.table=bpm

[Install]
WantedBy=multi-user.target

Make sure the prometheus user exists, owns the .my.cnf file, and can execute the exporter binary. Then, enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable mysqld_exporter
sudo systemctl start mysqld_exporter
sudo systemctl status mysqld_exporter
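
Once the service is running, it is worth confirming that the exporter can actually reach MySQL before wiring it into Prometheus. The sketch below assumes the exporter listens on its default port 9104 on the same host; the function names and the EXPORTER_URL variable are hypothetical.

```shell
#!/usr/bin/env bash
# Print the value of an unlabelled metric from Prometheus text-format input on stdin.
# HELP/TYPE comment lines never match because their first field is "#".
# Exits non-zero if the metric is absent.
metric_value() {
  awk -v name="$1" '
    $1 == name { print $2; found = 1; exit }
    END { if (!found) exit 1 }
  '
}

# Fetch /metrics from the exporter and succeed only if mysql_up is 1.
# EXPORTER_URL is an assumption; override it if your exporter listens elsewhere.
exporter_can_reach_mysql() {
  local url="${EXPORTER_URL:-http://localhost:9104/metrics}"
  [ "$(curl -fs "$url" | metric_value mysql_up)" = "1" ]
}
```

Running exporter_can_reach_mysql && echo "exporter OK" on a database node is a quick smoke test; a failure usually points at the credentials in .my.cnf or at the exporter user's grants.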

Configure Prometheus to scrape the mysqld_exporter endpoints. Add the following to your prometheus.yml configuration:

scrape_configs:
  - job_name: 'mysql'
    static_configs:
      - targets:
          - 'mysql_master_ip:9104'
          - 'mysql_replica1_ip:9104'
          - 'mysql_replica2_ip:9104'
    # If using service discovery, adjust accordingly

Restart Prometheus for the changes to take effect. Now, you can import a pre-built MySQL dashboard into Grafana (many are available on Grafana.com, search for “MySQL Overview” or similar) or create custom dashboards. Key metrics to monitor include:

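mysql_slave_status_seconds_behind_master: Replication lag as reported by Seconds_Behind_Master. If the exporter’s heartbeat collector is enabled (--collect.heartbeat), the pt-heartbeat lag is mysql_heartbeat_now_timestamp_seconds minus mysql_heartbeat_stored_timestamp_seconds.
mysql_up: Indicates whether the exporter can connect to MySQL.
mysql_global_status_threads_connected: Number of currently open connections.
mysql_global_status_threads_running: Number of threads actively executing queries.
mysql_global_status_innodb_buffer_pool_wait_free: Counts waits for a free buffer pool page; a rising value indicates buffer pool pressure.
mysql_global_status_innodb_row_lock_waits: Counts row lock waits; a fast-rising value suggests locking issues.
mysql_slave_status_slave_io_running and mysql_slave_status_slave_sql_running: Both must be 1 on a healthy replica.

Exact metric names vary between mysqld_exporter versions, so confirm them against your exporter’s /metrics output.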
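
The replication-thread metrics can also be checked by hand from a replica's /metrics output. This is a rough sketch, assuming the metric names used by mysqld_exporter's slave status collector (mysql_slave_status_slave_io_running and mysql_slave_status_slave_sql_running); verify them against your own exporter, since names can differ by version. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Read Prometheus text-format metrics on stdin and succeed only if both
# replication threads are reported and every sample equals 1.
# Assumes each sample line ends with its value (no trailing timestamp).
replication_threads_ok() {
  awk '
    /^mysql_slave_status_slave_(io|sql)_running[{ ]/ {
      seen++
      if ($NF != 1) bad++
    }
    END { exit ((seen >= 2 && bad == 0) ? 0 : 1) }
  '
}
```

On a replica, curl -fs http://localhost:9104/metrics | replication_threads_ok returns a non-zero exit status when either thread has stopped or when the metrics are missing altogether.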

Laravel Application Health Checks

Beyond the database, the Laravel application itself needs robust health checks. This involves monitoring the web server, PHP-FPM, and the application’s internal state.

Nginx and PHP-FPM Monitoring

Nginx’s stub_status module is invaluable for basic web server performance metrics. Ensure it’s enabled in your Nginx configuration.

# In your Nginx site configuration (e.g., /etc/nginx/sites-available/your-laravel-app)
server {
    listen 80;
    server_name your-app.com;

    # ... other configurations ...

    location /nginx_status {
        stub_status;
        allow 127.0.0.1; # Allow access only from localhost for security
        deny all;
    }

    # ... other configurations ...
}

Reload Nginx after making changes: sudo systemctl reload nginx. You can then run NGINX’s nginx-prometheus-exporter, pointed at this status URL, and add it to Prometheus as another scrape target, just as you did for mysqld_exporter. With the allow rule above, the exporter must run on the same host as Nginx.
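
Before adding an exporter, you can read the stub_status page directly to see what it exposes. The following sketch turns its four-line output into key=value pairs; the function name and the output keys are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Convert stub_status output (stdin) into key=value lines.
parse_stub_status() {
  awk '
    /^Active connections:/ { print "active=" $3 }
    # The counters line is the only one that starts with a number.
    /^ *[0-9]+ +[0-9]+ +[0-9]+/ {
      print "accepts=" $1
      print "handled=" $2
      print "requests=" $3
    }
    /^Reading:/ { print "reading=" $2; print "writing=" $4; print "waiting=" $6 }
  '
}
```

On the web server itself, curl -s -H 'Host: your-app.com' http://127.0.0.1/nginx_status | parse_stub_status prints the current values. If accepts and handled diverge, Nginx is dropping connections, typically because a resource limit such as worker_connections has been reached.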

For PHP-FPM, the pm.status_path directive provides similar statistics. Ensure it’s configured and accessible (often via a local Nginx proxy or directly if running on the same host).

; In your PHP-FPM pool configuration (e.g., /etc/php/8.1/fpm/pool.d/www.conf)
pm.status_path = /fpm_status
ping.path = /fpm_ping
ping.response = pong

You’ll need to add Nginx location blocks that pass /fpm_status and /fpm_ping to PHP-FPM with fastcgi_pass, restricted to localhost like the stub_status example. Then use a PHP-FPM exporter (for example php-fpm_exporter) to expose these metrics to Prometheus.
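
The plain-text status page is also easy to inspect by hand. The sketch below extracts individual fields and computes the share of busy workers; the function names are hypothetical, and it assumes the default plain-text format of the status page (one "name: value" pair per line).

```shell
#!/usr/bin/env bash
# Print the value of one field from the plain-text PHP-FPM status page (stdin).
fpm_field() {
  awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}

# Percentage of workers that are busy: active / total * 100, rounded down.
# Returns non-zero if the fields are missing or total is zero.
fpm_busy_percent() {
  local status active total
  status="$(cat)"
  active="$(printf '%s\n' "$status" | fpm_field 'active processes')"
  total="$(printf '%s\n' "$status" | fpm_field 'total processes')"
  [ -n "$active" ] && [ -n "$total" ] && [ "$total" -gt 0 ] || return 1
  echo $(( active * 100 / total ))
}
```

Note that the status page reports the number of running workers, not the configured pm.max_children. The "max children reached" counter, readable with fpm_field 'max children reached', is the direct signal that the pool has hit its limit.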

Application-Level Health Checks with Laravel

Laravel provides built-in mechanisms and can be extended for deeper application health checks. A common pattern is to create a dedicated health check route.

// routes/web.php or routes/api.php
use Illuminate\Support\Facades\Route;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\Redis;

Route::get('/health', function () {
    $status = [
        'database' => false,
        'cache' => false,
        'redis' => false,
        'app' => true, // Assume app is healthy until proven otherwise
    ];

    // Check Database Connection
    try {
        DB::connection()->getPdo();
        $status['database'] = true;
    } catch (\Throwable $e) { // Throwable also catches PHP Errors, not just Exceptions
        // Log the error if needed
        report($e);
    }

    // Check Cache (write, read back, and delete a throwaway key)
    try {
        $cacheKey = 'health_check_key_' . uniqid();
        Cache::put($cacheKey, true, 10); // TTL in seconds (Laravel 5.8+); long enough to read back
        if (Cache::get($cacheKey)) {
            $status['cache'] = true;
        }
        Cache::forget($cacheKey);
    } catch (\Throwable $e) {
        report($e);
    }

    // Check Redis Connection (if used)
    try {
        if (Redis::connection()->client()->ping()) {
            $status['redis'] = true;
        }
    } catch (\Throwable $e) { // e.g. a missing Redis client class raises an Error
        report($e);
    }

    // Determine overall health
    $overallHealth = (
        $status['database'] &&
        $status['cache'] &&
        $status['redis'] && // Add other dependencies as needed
        $status['app']
    );

    return response()->json([
        'status' => $overallHealth ? 'healthy' : 'unhealthy',
        'checks' => $status,
    ], $overallHealth ? 200 : 503); // 503 Service Unavailable for unhealthy
});

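This route can be probed with the blackbox_exporter: Prometheus scrapes the exporter, which sends an HTTP request to /health and reports probe_success as 1 only for a successful (by default 2xx) response, so the 503 returned above registers as a failure.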
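
The same endpoint is convenient for load balancer checks and quick manual probes. Here is a minimal sketch using curl; the function names, the HEALTH_URL variable, and the Nagios-style exit codes are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Map the HTTP status returned by /health to a Nagios-style exit code.
health_exit_code() {
  case "$1" in
    200) echo 0 ;;   # all checks passed
    503) echo 2 ;;   # at least one dependency check failed
    *)   echo 3 ;;   # unreachable or unexpected response (curl reports 000)
  esac
}

# Probe the endpoint with a 5-second timeout.
# HEALTH_URL is a placeholder; point it at your own hostname.
probe_health() {
  local url="${HEALTH_URL:-https://your-app.com/health}" code
  code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url")"
  return "$(health_exit_code "$code")"
}
```

Distinguishing 503 from other failures is a deliberate choice here: a 503 means the application answered and reported a broken dependency, while anything else suggests the web server or network path is the problem.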

Alerting Strategies with Alertmanager

Collecting metrics is only half the battle; actionable alerts are crucial. Prometheus integrates with Alertmanager for sophisticated alerting rules and routing.

Define alerting rules in Prometheus, typically in a file like /etc/prometheus/alert.rules.yml. Here are examples for MySQL replication lag and application health:

groups:
- name: mysql_alerts
  rules:
  - alert: HighReplicationLag
    expr: mysql_slave_status_seconds_behind_master{job="mysql"} > 60 # Alert if lag is over 60 seconds
    for: 5m # Condition must persist for 5 minutes
    labels:
      severity: critical
    annotations:
      summary: "High replication lag on {{ $labels.instance }}"
      description: "MySQL replica {{ $labels.instance }} has a replication lag of {{ $value }} seconds, exceeding the 60-second threshold."

  - alert: MySQLDown
    expr: mysql_up{job="mysql"} == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "MySQL is unreachable on {{ $labels.instance }}"
      description: "The exporter on {{ $labels.instance }} has been unable to connect to MySQL for more than 1 minute."

- name: laravel_app_alerts
  rules:
  - alert: LaravelAppUnhealthy
    expr: probe_success{job="blackbox", instance="your-app.com/health"} == 0 # Assuming blackbox exporter probes /health; instance must match the target string in your scrape config
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "Laravel application is unhealthy on {{ $labels.instance }}"
      description: "The /health endpoint for {{ $labels.instance }} returned an error for 2 minutes."

  - alert: HighPHPFPMWorkerUtilization
    # Metric names differ between PHP-FPM exporters; substitute the ones yours exposes.
    expr: php_fpm_process_count{job="php-fpm"} > (php_fpm_max_children{job="php-fpm"} * 0.9) # Alert if 90% of max children are used
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High PHP-FPM worker utilization on {{ $labels.instance }}"
      description: "PHP-FPM on {{ $labels.instance }} has {{ $value }} busy workers, more than 90% of its configured max children."

Reference the rules file under rule_files in prometheus.yml, and list your Alertmanager instance under the alerting section so Prometheus knows where to send firing alerts. Then define receivers and routes in alertmanager.yml to deliver them to the appropriate channels (e.g., Slack, PagerDuty, email).
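
It is worth testing the routing before a real incident does it for you. One way, sketched below, is to post a synthetic alert to Alertmanager's v2 API with curl. The function names and the ALERTMANAGER_URL variable are hypothetical, and the default of localhost:9093 assumes Alertmanager runs on the same host with its default port.

```shell
#!/usr/bin/env bash
# Build the JSON body for a single synthetic alert.
# Arguments are inserted verbatim, so keep them free of quotes and backslashes.
test_alert_json() {
  local name="$1" severity="$2"
  printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"Routing test, please ignore"}}]\n' \
    "$name" "$severity"
}

# POST the alert to Alertmanager. ALERTMANAGER_URL is an assumption.
send_test_alert() {
  local base="${ALERTMANAGER_URL:-http://localhost:9093}"
  test_alert_json "$1" "$2" |
    curl -fsS -X POST -H 'Content-Type: application/json' --data @- "$base/api/v2/alerts"
}
```

For example, send_test_alert RoutingTest critical should arrive in whichever channel your critical route points to. Because the alert carries no end time, Alertmanager treats it as resolved once its resolve timeout passes without the alert being re-sent.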

OVH Specific Considerations

When deploying this on OVH, consider:

Network Segmentation: Use OVH’s network security groups or firewall rules to restrict access to your MySQL ports (3306) and monitoring endpoints (9104, 9090, etc.) to only necessary IPs (e.g., your application servers, monitoring servers).
Instance Sizing: Ensure your instances (both for the application and monitoring components) are adequately sized for the expected load. Monitoring itself consumes resources.
Managed Databases: If you are using OVH’s managed database services, the approach to monitoring might differ. You may need to use OVH’s provided tools or APIs, but the underlying principles of checking replication lag and performance remain the same. For self-managed MySQL on OVH instances, the above applies directly.
Logging: Centralize logs from your application, Nginx, PHP-FPM, and the monitoring exporters using a service like ELK stack or Graylog, potentially hosted on a separate OVH instance or a dedicated logging service. This is crucial for post-mortem analysis.
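
As a complement to OVH's network-level controls, the same port restrictions can be enforced on each host. The sketch below only prints the commands so they can be reviewed first. It assumes ufw is the host firewall, which is a choice made for this example rather than anything OVH-specific, and the function name and IP address are placeholders.

```shell
#!/usr/bin/env bash
# Print (do not execute) ufw commands that allow one source IP
# to reach the given TCP ports on this host.
render_allow_rules() {
  local source_ip="$1" port
  shift
  for port in "$@"; do
    echo "ufw allow from ${source_ip} to any port ${port} proto tcp"
  done
}
```

For instance, render_allow_rules 10.0.0.5 3306 9104 prints one rule per port for a monitoring server at 10.0.0.5. After reviewing the output, run each printed command with sudo, and make sure ufw's default incoming policy is deny so that only the listed sources get through.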

Layering these checks (replication lag, database and web server metrics, application health probes, and alerting) gives you early warning of problems in a Laravel application running on MySQL clusters, so you can resolve them before they affect your users.
