Server Monitoring Best Practices: Keeping Your Laravel App and Redis Clusters Alive on DigitalOcean

Proactive Redis Cluster Health Checks with `redis-cli`

Maintaining the health of a Redis cluster is paramount for any high-traffic Laravel application. Beyond basic uptime, we need to monitor key performance indicators and cluster state. DigitalOcean’s managed Redis offers some insights, but direct `redis-cli` commands provide granular, real-time diagnostics. We’ll focus on commands that reveal cluster integrity and potential bottlenecks.

First, establish a connection to one of your Redis nodes. If you’re using a private network on DigitalOcean, this will be a private IP address. For external access, use the public IP and ensure your firewall rules are correctly configured.

Cluster State and Node Status

The `CLUSTER INFO` command provides a high-level overview of the cluster’s status. Pay close attention to `cluster_state` (should be `ok`) and compare `cluster_slots_assigned` with `cluster_slots_ok`: in a healthy cluster both equal 16384, the total number of hash slots.

redis-cli -h <redis-host> -p <redis-port> CLUSTER INFO
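The two checks on `cluster_state` and the slot counters are easy to script. The following is a minimal sketch, assuming the raw text of `CLUSTER INFO` has already been captured (for example from `redis-cli`); the function names `parse_cluster_info` and `cluster_problems` are illustrative, not part of any library.

```python
TOTAL_SLOTS = 16384  # Redis Cluster always has this many hash slots

SAMPLE_CLUSTER_INFO = (
    "cluster_state:ok\r\n"
    "cluster_slots_assigned:16384\r\n"
    "cluster_slots_ok:16384\r\n"
)


def parse_cluster_info(raw):
    """Turn the 'key:value' lines of CLUSTER INFO into a dict of strings."""
    info = {}
    for line in raw.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:  # skip blank lines and anything without a colon
            info[key] = value
    return info


def cluster_problems(info):
    """Return a list of human-readable problems; empty means healthy."""
    problems = []
    state = info.get("cluster_state", "missing")
    if state != "ok":
        problems.append(f"cluster_state is {state}")
    assigned = int(info.get("cluster_slots_assigned", 0))
    ok = int(info.get("cluster_slots_ok", 0))
    if assigned != TOTAL_SLOTS:
        problems.append(f"only {assigned}/{TOTAL_SLOTS} slots assigned")
    if ok != assigned:
        problems.append(f"{assigned - ok} assigned slots are not ok")
    return problems


if __name__ == "__main__":
    found = cluster_problems(parse_cluster_info(SAMPLE_CLUSTER_INFO))
    print(found or "cluster looks healthy")
```

A monitoring script could exit non-zero whenever the returned list is not empty.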

A more detailed view of individual nodes and their roles is obtained with `CLUSTER NODES`. This output is crucial for identifying nodes that are disconnected, in a `fail` state, or not participating correctly in the cluster.

redis-cli -h <redis-host> -p <redis-port> CLUSTER NODES

Look for:

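The node flagged `myself` (for example `myself,master` or `myself,slave`), which is the node you are connected to, and any node flagged `fail?` or `fail`.
The link state of each node, which should be `connected`.
The master ID field for slaves, ensuring each one points to a valid master (masters show `-` here).
The slots field for masters, confirming they are responsible for the expected hash slots.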
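The checklist above can be sketched as a small parser over the `CLUSTER NODES` output. This is an illustration under assumptions: the node IDs in the sample are shortened placeholders (real IDs are 40 hexadecimal characters), and `ClusterNode`, `parse_cluster_nodes`, and `find_node_problems` are hypothetical names.

```python
from dataclasses import dataclass, field

# Fields per line: id, address, flags, master id, ping-sent, pong-recv,
# config-epoch, link-state, then zero or more slot ranges.
SAMPLE_CLUSTER_NODES = """\
aaa 10.0.0.1:6379@16379 myself,master - 0 0 1 connected 0-8191
bbb 10.0.0.2:6379@16379 master,fail - 0 0 2 disconnected 8192-16383
ccc 10.0.0.3:6379@16379 slave aaa 0 0 1 connected
ddd 10.0.0.4:6379@16379 slave zzz 0 0 3 connected
"""


@dataclass
class ClusterNode:
    node_id: str
    address: str
    flags: list
    master_id: str
    link_state: str
    slots: list = field(default_factory=list)


def parse_cluster_nodes(raw):
    """Parse each non-empty line of CLUSTER NODES into a ClusterNode."""
    nodes = []
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # not a complete node line
        nodes.append(ClusterNode(
            node_id=parts[0],
            address=parts[1],
            flags=parts[2].split(","),
            master_id=parts[3],
            link_state=parts[7],
            slots=parts[8:],
        ))
    return nodes


def find_node_problems(nodes):
    """Apply the four checks from the list above to every node."""
    master_ids = {n.node_id for n in nodes if "master" in n.flags}
    problems = []
    for n in nodes:
        if "fail" in n.flags or "fail?" in n.flags:
            problems.append(f"{n.address}: flagged {','.join(n.flags)}")
        if n.link_state != "connected":
            problems.append(f"{n.address}: link is {n.link_state}")
        if "slave" in n.flags and n.master_id not in master_ids:
            problems.append(f"{n.address}: replica of unknown master {n.master_id}")
        if "master" in n.flags and not n.slots:
            problems.append(f"{n.address}: master owns no slots")
    return problems


if __name__ == "__main__":
    for problem in find_node_problems(parse_cluster_nodes(SAMPLE_CLUSTER_NODES)):
        print(problem)
```

A master without slots is not always an error (it can appear briefly while resharding or after adding a node), so treat that message as a prompt to look closer rather than an alert on its own.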

Performance Metrics for Bottleneck Detection

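The `INFO` command is invaluable for spotting performance degradation, particularly its `CPU`, `MEMORY`, and `STATS` sections. Redis 7.0 and later accept several section names in one call; on older versions, run `INFO` once per section. We can script checks against these metrics.

redis-cli -h <redis-host> -p <redis-port> INFO CPU MEMORY STATS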

Key metrics to monitor:

`used_memory` and `used_memory_peak`: Monitor memory usage against your droplet’s limits and against `maxmemory`, if it is set.
`instantaneous_ops_per_sec`: Compare against the node’s normal baseline; a sustained jump indicates unusually heavy load.
`keyspace_hits` and `keyspace_misses`: A low hit rate, hits / (hits + misses), can suggest insufficient memory for caching or inefficient key usage.
`latest_fork_usec`: The duration of the most recent fork in microseconds. A high value indicates a long fork operation, which can block the main Redis thread. This is particularly important for persistence operations (RDB/AOF).
`evicted_keys`: A counter of keys removed because the memory limit was reached. If it keeps growing, Redis is running out of memory.
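As a sketch of how these metrics might be checked in a script, the example below parses `INFO` output and derives the cache hit rate. The thresholds (`min_hit_rate`, `max_fork_usec`) and the function names are assumptions chosen for illustration; tune them to your own baseline.

```python
SAMPLE_INFO = """\
# Stats
instantaneous_ops_per_sec:1200
keyspace_hits:750
keyspace_misses:250
latest_fork_usec:1200
evicted_keys:3
"""


def parse_info(raw):
    """Parse INFO output into a dict, skipping '# Section' headers."""
    metrics = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            metrics[key] = value
    return metrics


def hit_rate(metrics):
    """Return hits / (hits + misses), or None if there were no lookups yet."""
    hits = int(metrics.get("keyspace_hits", 0))
    misses = int(metrics.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else None


def info_warnings(metrics, min_hit_rate=0.8, max_fork_usec=500_000):
    """Compare a few metrics with (hypothetical) thresholds."""
    found = []
    rate = hit_rate(metrics)
    if rate is not None and rate < min_hit_rate:
        found.append(f"hit rate {rate:.1%} is below {min_hit_rate:.0%}")
    fork_usec = int(metrics.get("latest_fork_usec", 0))
    if fork_usec > max_fork_usec:
        found.append(f"last fork took {fork_usec} microseconds")
    # evicted_keys is cumulative since startup; compare successive samples
    # to see whether evictions are still happening.
    if int(metrics.get("evicted_keys", 0)) > 0:
        found.append("keys have been evicted since startup")
    return found


if __name__ == "__main__":
    for warning in info_warnings(parse_info(SAMPLE_INFO)):
        print(warning)
```

Because `keyspace_hits`, `keyspace_misses`, and `evicted_keys` are counters that only reset on restart, a rate computed between two samples is usually more telling than the lifetime value shown here.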

Laravel Application Health Checks with Artisan and Custom Logic

For your Laravel application, health checks should go beyond simply verifying that the web server is responding. We need to ensure critical dependencies like the database and Redis are accessible and functioning correctly from the application’s perspective.

Artisan Command for Dependency Checks

Create a dedicated Artisan command to encapsulate these checks. This command can be triggered by external monitoring tools (e.g., via a cron job or a dedicated monitoring agent).

<?php
// app/Console/Commands/CheckAppHealth.php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Redis;
use Throwable;

class CheckAppHealth extends Command
{
    protected $signature = 'app:health-check';
    protected $description = 'Performs health checks on application dependencies (DB, Redis)';

    public function handle(): int
    {
        $this->info('Starting application health check...');

        // Run every check, even after a failure, so the output reports each dependency.
        $databaseOk = $this->checkDatabaseConnection();
        $redisOk = $this->checkRedisConnection();

        if (! $databaseOk || ! $redisOk) {
            $this->error('Application health check failed.');
            return 1; // A non-zero exit code signals failure to monitoring systems
        }

        $this->info('Application health check completed successfully.');
        return 0;
    }

    protected function checkDatabaseConnection(): bool
    {
        try {
            // getPdo() opens the connection and throws if the database is unreachable
            DB::connection()->getPdo();
            $this->info('Database connection is healthy.');
            return true;
        } catch (Throwable $e) {
            $this->error('Database connection failed: ' . $e->getMessage());
            return false;
        }
    }

    protected function checkRedisConnection(): bool
    {
        $key = 'app_health_check_' . uniqid();
        $value = 'ping';

        try {
            // Round-trip a short-lived key to verify connectivity and functionality
            Redis::set($key, $value, 'EX', 5);
            $retrievedValue = Redis::get($key);
            Redis::del($key); // Clean up; the 5-second expiry is only a fallback

            if ($retrievedValue !== $value) {
                $this->error('Redis SET/GET operation failed. Retrieved value mismatch.');
                return false;
            }

            $this->info('Redis connection is healthy.');
            return true;
        } catch (Throwable $e) {
            $this->error('Redis connection failed: ' . $e->getMessage());
            return false;
        }
    }
}

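If your application does not auto-discover commands in app/Console/Commands, register the class in app/Console/Kernel.php:

protected $commands = [
    \App\Console\Commands\CheckAppHealth::class,
];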

Scheduling the Health Check

You can schedule this command to run periodically using Laravel’s scheduler. For external monitoring, you’ll typically run it via cron directly on the server or through a CI/CD pipeline.

// app/Console/Kernel.php
protected function schedule(Schedule $schedule)
{
    // Run every minute for immediate feedback, adjust as needed
    $schedule->command('app:health-check')->everyMinute();
}

Ensure your cron daemon is set up to run Laravel’s scheduler:

* * * * * cd /path-to-your-laravel-project && php artisan schedule:run >> /dev/null 2>&1

Server-Level Monitoring with DigitalOcean and Prometheus/Grafana

While application-level checks are vital, robust monitoring requires observing the underlying infrastructure. DigitalOcean provides basic metrics, but for advanced visualization and alerting, integrating Prometheus and Grafana is a standard practice.

DigitalOcean Droplet Metrics

DigitalOcean’s control panel offers graphs for CPU utilization, bandwidth, load, and memory. These are good for a quick glance but lack the configurability and alerting capabilities of dedicated monitoring stacks.

Setting up Prometheus and Grafana

A common approach is to deploy Prometheus and Grafana on a dedicated monitoring Droplet or within your application’s infrastructure if resources permit. We’ll use Docker for ease of deployment.

# docker-compose.yml
version: '3.7'

services:
  prometheus:
    image: prom/prometheus:v2.37.0
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/etc/prometheus/
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    restart: unless-stopped

  grafana:
    image: grafana/grafana:9.1.0
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana-storage:/var/lib/grafana
    restart: unless-stopped

volumes:
  grafana-storage:

Create a prometheus/prometheus.yml file. This configuration will scrape metrics from your Redis nodes (using `redis_exporter`) and your Laravel application servers (using `node_exporter` and potentially a custom exporter for Artisan command results).

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

scrape_configs:
  # Scrape Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Scrape Node Exporter for server metrics
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['<laravel-server-1-ip>:9100', '<laravel-server-2-ip>:9100'] # Replace with your Laravel server IPs

  # Scrape Redis Exporter for Redis metrics
  - job_name: 'redis_exporter'
    static_configs:
      - targets: ['<redis-node-1-ip>:9121', '<redis-node-2-ip>:9121'] # Replace with your Redis node IPs and exporter ports

  # Example for a custom exporter that exposes Artisan command results (more advanced)
  # - job_name: 'laravel_artisan_exporter'
  #   static_configs:
  #     - targets: ['<laravel-server-ip>:9500'] # Assuming a custom exporter runs on port 9500

You’ll need to deploy `node_exporter` and `redis_exporter` on your respective servers. `redis_exporter` typically runs as a separate process that connects to Redis and exposes metrics via an HTTP endpoint (defaulting to port 9121).

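# Example for running redis_exporter in Docker on a Redis node.
# --network host lets the container reach Redis on the node's localhost and
# exposes the exporter on port 9121 without a -p mapping.
docker run -d \
  --name redis_exporter \
  --network host \
  oliver006/redis_exporter:latest \
  --redis.addr=redis://localhost:6379 # Adjust if Redis is not on localhost:6379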

`node_exporter` is usually installed directly on the server and run as a systemd service.

Grafana Dashboards and Alerting

Once Prometheus is scraping metrics, configure Grafana to use Prometheus as a data source. Import pre-built dashboards for Redis and Node Exporter (available on Grafana.com) or create custom ones. Key dashboards to look for:

Redis Cluster Overview
Redis Memory Usage
Redis Performance Metrics
Node Exporter Full Dashboard

Set up alerting rules in Prometheus (or directly in Grafana) for critical conditions:

Redis cluster state is not `ok`.
Redis node is unreachable.
High Redis memory usage (e.g., > 85%).
High `latest_fork_usec` in Redis.
High CPU or Memory usage on Laravel servers.
High disk I/O wait times.
Laravel health check Artisan command returns non-zero exit code.
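Before committing a condition to an alerting rule, it can help to try it as an instant query against the Prometheus HTTP API. The sketch below is one way to do that in Python with only the standard library. It assumes Prometheus is reachable at a base URL such as `http://localhost:9090` and uses the `laravel_health_check_status` gauge from the custom exporter described later in this article; the function names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Shape of a successful instant-query response from /api/v1/query.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "laravel_health_check_status",
                    "dependency": "redis",
                },
                "value": [1700000000.0, "0"],
            }
        ],
    },
}


def build_query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def matching_series(response):
    """Return the label sets of every series the query matched."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    data = response["data"]
    if data.get("resultType") != "vector":
        raise ValueError(f"expected a vector, got {data.get('resultType')}")
    return [sample["metric"] for sample in data["result"]]


def fetch_matching_series(base_url, promql, timeout=5.0):
    """Run the query against a live Prometheus server (needs network access)."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return matching_series(json.load(resp))


if __name__ == "__main__":
    # 'metric == 0' keeps only the series whose current value is 0.
    print(build_query_url("http://localhost:9090", "laravel_health_check_status == 0"))
    for labels in matching_series(SAMPLE_RESPONSE):
        print("failing dependency:", labels.get("dependency"))
```

An empty list means the condition matches nothing right now; any returned label set is a series that would fire the corresponding alert.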

Advanced: Custom Laravel Health Check Exporter

To integrate the results of your `php artisan app:health-check` command directly into Prometheus, you can build a small custom exporter. This exporter would periodically run the Artisan command and expose its results as Prometheus metrics.

A simple Python script that calls Artisan through `subprocess` and exposes the result with the `prometheus_client` library can achieve this:

# laravel_artisan_exporter.py
import os
import subprocess
import time

from prometheus_client import Gauge, start_http_server

# One gauge with a label per dependency
laravel_health_status = Gauge(
    'laravel_health_check_status',
    'Health check status of Laravel dependencies (1 for OK, 0 for FAIL)',
    ['dependency']
)

# Get Laravel project path from environment variable
LARAVEL_PATH = os.environ.get('LARAVEL_PATH', '/var/www/html')  # Default path, adjust as needed

# Success lines printed by app:health-check, keyed by the dependency label
HEALTHY_MESSAGES = {
    'database': 'Database connection is healthy.',
    'redis': 'Redis connection is healthy.',
}


def mark_all_failed():
    for dependency in HEALTHY_MESSAGES:
        laravel_health_status.labels(dependency).set(0)


def run_artisan_health_check():
    try:
        # Capture stdout and stderr so the output can be inspected below
        result = subprocess.run(
            ['php', 'artisan', 'app:health-check'],
            cwd=LARAVEL_PATH,
            capture_output=True,
            text=True,
            timeout=30,  # Don't let a hung command stall the exporter
            check=False  # A non-zero exit code is reported below, not raised
        )
    except (OSError, subprocess.SubprocessError) as e:
        # Covers a missing php binary, a wrong LARAVEL_PATH, and timeouts
        print(f"Error: could not run Artisan in '{LARAVEL_PATH}': {e}")
        mark_all_failed()
        return

    # A dependency counts as healthy only if its success line was printed.
    # If the command stops at the first failure, later checks never print
    # their line and are reported as 0 rather than assumed healthy.
    for dependency, message in HEALTHY_MESSAGES.items():
        healthy = message in result.stdout
        laravel_health_status.labels(dependency).set(1 if healthy else 0)

    if result.returncode == 0:
        print("Artisan health check OK.")
    else:
        print(f"Artisan health check FAILED with exit code {result.returncode}. "
              f"STDOUT: {result.stdout} STDERR: {result.stderr}")


if __name__ == '__main__':
    # Start up the server to expose the metrics on port 9500
    exporter_port = 9500
    print(f"Starting Prometheus exporter on port {exporter_port}")
    start_http_server(exporter_port)

    # Run the health check periodically
    check_interval_seconds = 60  # Run every 60 seconds
    while True:
        run_artisan_health_check()
        time.sleep(check_interval_seconds)

Deploy this script on your Laravel server(s) and configure Prometheus to scrape it (as shown in the `prometheus.yml` example). Ensure the `LARAVEL_PATH` environment variable is set correctly for the script.

Conclusion: A Multi-Layered Approach

Effective server and application monitoring is a continuous process. By combining direct Redis cluster diagnostics, application-level Artisan checks, and infrastructure-wide metrics via Prometheus and Grafana, you build a resilient system. Proactive identification of issues before they impact users is the ultimate goal, and this multi-layered strategy provides the visibility needed to achieve it.