Vengala Vinay

Having 12+ Years of Experience in Software Development

Server Monitoring Best Practices: Keeping Your Laravel App and DynamoDB Clusters Alive on DigitalOcean

Proactive Laravel Application Health Checks

Maintaining the health of a Laravel application goes beyond simply checking if the web server is responding. We need to ensure the application itself is functioning correctly, processing requests efficiently, and not experiencing internal errors. This involves implementing deep health checks that can be integrated into your monitoring stack.

A robust health check endpoint should verify several critical components:

  • Database connectivity and basic query execution.
  • Cache driver accessibility.
  • Queue worker status (though this is often a separate, more involved check).
  • Key external API dependencies (if applicable).
  • Application-level errors (e.g., recent exceptions).
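The aggregation rule behind such an endpoint is simple. Every check runs independently, a single failure flips the overall status to unhealthy, and the HTTP status code carries that verdict so even a basic prober can act on it. Below is a minimal Python model of that logic. It assumes each check is a callable that raises on failure, and the function and check names are hypothetical. The Laravel controller later in this post is what actually serves the endpoint.

```python
from typing import Callable, Dict, Tuple


def run_health_checks(checks: Dict[str, Callable[[], None]]) -> Tuple[int, dict]:
    """Run every check, record 'ok' or 'error: ...', and derive the HTTP status."""
    results = {}
    healthy = True
    for name, check in checks.items():
        try:
            check()  # a check signals failure by raising
            results[name] = "ok"
        except Exception as exc:  # keep going so every failing dependency is reported
            healthy = False
            results[name] = f"error: {exc}"
    body = {"status": "healthy" if healthy else "unhealthy", "checks": results}
    # 503 Service Unavailable lets load balancers and simple probes act on the verdict.
    return (200 if healthy else 503), body


def broken_cache() -> None:
    # Hypothetical failing dependency used for the demonstration.
    raise ConnectionError("cache unreachable")


code, body = run_health_checks({"database": lambda: None, "cache": broken_cache})
print(code, body["status"], body["checks"])
# 503 unhealthy {'database': 'ok', 'cache': 'error: cache unreachable'}
```

Each exception is caught per check, so one failure does not abort the request. This lets the response name exactly which dependency failed.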

Let’s create a custom health check route in Laravel. This route will be polled by our monitoring system.

Implementing the Laravel Health Check Endpoint

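First, define a route in routes/api.php. Laravel serves routes from this file under the /api prefix, so the endpoint will be /api/health. On Laravel 11 and later the file is not created by default, so run php artisan install:api first. You can use routes/web.php instead, but API routes skip the session and CSRF middleware, which makes them a better fit for machine-to-machine polling.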

<?php

// routes/api.php
use App\Http\Controllers\HealthCheckController;
use Illuminate\Support\Facades\Route;

// Served as GET /api/health because routes in this file receive the /api prefix.
Route::get('/health', [HealthCheckController::class, 'index']);

Next, create the HealthCheckController.

<?php

// app/Http/Controllers/HealthCheckController.php
namespace App\Http\Controllers;

use Illuminate\Http\JsonResponse;
use Illuminate\Routing\Controller as BaseController;
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;
use Throwable;

class HealthCheckController extends BaseController
{
    public function index(): JsonResponse
    {
        $status = 'healthy';
        $checks = [];

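        // 1. Database Check: open the connection and run a trivial query
        try {
            DB::select('SELECT 1');
            $checks['database'] = 'ok';
        } catch (Throwable $e) {
            $status = 'unhealthy';
            // The message can include connection details; consider returning a
            // generic 'error' here if the endpoint is publicly reachable.
            $checks['database'] = 'error: ' . $e->getMessage();
            Log::error('Database connection failed for health check', ['exception' => $e]);
        }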

        // 2. Cache Check
        try {
            $cacheKey = 'health_check_cache_test_' . uniqid();
            Cache::put($cacheKey, 'test', 1);
            if (Cache::get($cacheKey) === 'test') {
                $checks['cache'] = 'ok';
                Cache::forget($cacheKey); // Clean up
            } else {
                throw new \Exception('Cache read/write failed');
            }
        } catch (Throwable $e) {
            $status = 'unhealthy';
            $checks['cache'] = 'error: ' . $e->getMessage();
            Log::error('Cache connection/operation failed for health check', ['exception' => $e]);
        }

        // 3. Add more checks as needed (e.g., external APIs, specific service availability)
        // Example: External API check (simplified)
        /*
        try {
            $client = new \GuzzleHttp\Client();
            $response = $client->request('GET', config('services.external_api.url') . '/health', ['timeout' => 5]);
            if ($response->getStatusCode() === 200) {
                $checks['external_api'] = 'ok';
            } else {
                throw new \Exception('External API returned non-200 status');
            }
        } catch (Throwable $e) {
            $status = 'unhealthy';
            $checks['external_api'] = 'error: ' . $e->getMessage();
            Log::error('External API health check failed', ['exception' => $e]);
        }
        */

        // 4. Application Exception Check (e.g., check logs for recent critical errors)
        // This is more complex and might involve parsing log files or using a dedicated logging service.
        // For simplicity, we'll assume a basic check or rely on external monitoring of logs.
        // A more advanced approach might query a logging service like Elasticsearch or Datadog.

        return response()->json([
            'status' => $status,
            'checks' => $checks,
            'timestamp' => now()->toIso8601String(),
        ], $status === 'healthy' ? 200 : 503); // 503 Service Unavailable for unhealthy
    }
}

Ensure your .env file has the correct database and cache configurations. For production, you’ll likely be using Redis for caching and a managed database service like DigitalOcean Managed Databases (PostgreSQL/MySQL).

Monitoring the Health Endpoint with UptimeRobot/Prometheus

You can use external services like UptimeRobot for basic HTTP checks, but for more granular insights and integration with your alerting system, Prometheus is a standard choice. You’ll need a Prometheus exporter that can scrape your Laravel application’s health endpoint.

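A simple approach is to use a generic HTTP prober such as the Prometheus Blackbox Exporter, which records whether the endpoint answered and with which status code. Alternatively, you can write a small custom exporter that also reports each individual check. For instance, you could deploy a small Python application using the prometheus_client library that periodically scrapes your health endpoint and exposes the results as metrics for Prometheus.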

# exporter/app.py
import os
import time

import requests
from prometheus_client import Counter, Gauge, start_http_server

HEALTH_URL = os.environ.get("LARAVEL_HEALTH_URL", "http://localhost/api/health")
EXPORTER_PORT = int(os.environ.get("EXPORTER_PORT", 9101))

# Metrics
app_health_status = Gauge('laravel_app_health_status', 'Laravel application health status (1 for healthy, 0 for unhealthy)', ['check'])
# Check counts only ever go up, so they belong in a Counter rather than a Gauge.
app_health_checks_total = Counter('laravel_app_health_checks_total', 'Total number of health checks performed', ['check'])


def scrape_health_endpoint():
    app_health_checks_total.labels(check='overall').inc()
    try:
        response = requests.get(HEALTH_URL, timeout=10)
        # The Laravel endpoint answers 503 *with* a JSON body when a check fails,
        # so only other status codes count as a failed scrape.
        if response.status_code not in (200, 503):
            raise requests.exceptions.HTTPError(
                f"Unexpected status {response.status_code}", response=response
            )
        data = response.json()
    except (requests.exceptions.RequestException, ValueError) as e:
        # Connection errors, timeouts, unexpected status codes or a non-JSON body
        print(f"Error scraping health endpoint {HEALTH_URL}: {e}")
        app_health_status.labels(check='overall').set(0)
        return

    app_health_status.labels(check='overall').set(1 if data.get('status') == 'healthy' else 0)

    # Update only the checks the endpoint actually reported. Values are set in
    # place rather than zeroed first, so Prometheus never scrapes a transient 0.
    for check_name, check_status in data.get('checks', {}).items():
        app_health_checks_total.labels(check=check_name).inc()
        if check_status == 'ok':
            app_health_status.labels(check=check_name).set(1)
        else:
            app_health_status.labels(check=check_name).set(0)
            print(f"Health check failed for {check_name}: {check_status}")

    print(f"Scraped {HEALTH_URL} successfully. Status: {data.get('status')}")


if __name__ == '__main__':
    print(f"Starting Prometheus exporter on port {EXPORTER_PORT}")
    start_http_server(EXPORTER_PORT)
    print(f"Scraping Laravel health endpoint at {HEALTH_URL}")

    while True:
        scrape_health_endpoint()
        time.sleep(60)  # Scrape every 60 seconds

Deploy this exporter on a separate DigitalOcean Droplet or within your Kubernetes cluster, and configure Prometheus to scrape its metrics endpoint (e.g., http://exporter-ip:9101/metrics).
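Whatever scraper you deploy, it helps to pin down the mapping it has to implement. The Laravel controller answers 503 with a full JSON body when a check fails. A 503 must therefore be parsed rather than discarded as a transport error, or the per-check detail is lost exactly when it matters. Below is a stdlib-only sketch of that mapping. `health_samples` is a hypothetical name, and the response bodies are illustrative.

```python
import json
from typing import Dict


def health_samples(status_code: int, body: str) -> Dict[str, int]:
    """Translate one /api/health response into 0/1 values per check label.

    200 and 503 both carry the JSON report, so both are parsed; any other
    status, or a body that is not JSON, only marks the overall status as down.
    """
    if status_code not in (200, 503):
        return {"overall": 0}
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return {"overall": 0}
    samples = {"overall": 1 if data.get("status") == "healthy" else 0}
    for name, result in data.get("checks", {}).items():
        samples[name] = 1 if result == "ok" else 0
    return samples


unhealthy = '{"status": "unhealthy", "checks": {"database": "ok", "cache": "error: timeout"}}'
print(health_samples(503, unhealthy))  # {'overall': 0, 'database': 1, 'cache': 0}
print(health_samples(502, "<html>Bad Gateway</html>"))  # {'overall': 0}
```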

DynamoDB Cluster Health and Performance Monitoring

Monitoring DynamoDB involves looking at both operational health (availability) and performance metrics (throughput, latency, errors). DynamoDB is a fully managed AWS service, so there are no cluster nodes of your own to instrument. AWS CloudWatch is the primary source of these metrics, and we’ll pull them into the same Prometheus stack that monitors the Droplets.

Key DynamoDB Metrics to Monitor

Focus on these critical metrics, available via AWS CloudWatch:

  • ConsumedReadCapacityUnits / ConsumedWriteCapacityUnits: Track actual usage against provisioned capacity. Spikes can indicate performance issues or inefficient queries.
  • ReadThrottleEvents / WriteThrottleEvents: Crucial for identifying when requests are being throttled due to exceeding provisioned capacity. This directly impacts application performance.
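  • SuccessfulRequestLatency: Measures the latency of successful requests in milliseconds, reported per table and per operation (GetItem, Query, PutItem, and so on). It covers only time spent inside DynamoDB. High values point to issues within DynamoDB itself, so measure network latency between your Droplets and AWS on the application side.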
  • SystemErrors: Count of internal DynamoDB errors. Any non-zero value requires immediate investigation.
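  • UserErrors: Count of requests rejected with an HTTP 400 status (e.g., validation errors or missing permissions). This metric is aggregated per account and region rather than per table. Conditional check failures are counted separately, in ConditionalCheckFailedRequests. While some errors are expected, a sudden surge can point to application bugs.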
  • ReturnedItemCount: Useful for understanding the volume of data being returned by queries.
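Capacity metrics need one conversion before they can be compared with provisioned capacity. CloudWatch publishes ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits as a Sum over each period, whereas provisioned capacity is a per-second figure, so the Sum is divided by the period length first. This applies to provisioned-mode tables only; on-demand tables have no provisioned figure to compare against. Below is a small sketch with made-up numbers (`capacity_utilization` is a hypothetical helper).

```python
def capacity_utilization(consumed_sum: float, period_seconds: int, provisioned_per_second: float) -> float:
    """Percent of provisioned capacity used, averaged over one CloudWatch period."""
    if period_seconds <= 0 or provisioned_per_second <= 0:
        raise ValueError("period and provisioned capacity must be positive")
    # CloudWatch Sum covers the whole period; convert it to units per second first.
    consumed_per_second = consumed_sum / period_seconds
    return 100.0 * consumed_per_second / provisioned_per_second


# A 300-second Sum of 24,000 RCUs against 100 provisioned RCUs is 80 RCU/s.
print(capacity_utilization(24_000, 300, 100))  # 80.0
```

Because this is an average over the period, short bursts can still be throttled while utilization looks comfortable. That is why the throttle metrics deserve their own alerts.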

Integrating CloudWatch Metrics with Prometheus

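To bring CloudWatch metrics into Prometheus, we use the cloudwatch_exporter maintained by the Prometheus project (prometheus/cloudwatch_exporter). Prometheus scrapes the exporter, which in turn pulls data from the CloudWatch API.

First, set up the cloudwatch_exporter. You’ll need AWS credentials. On EC2/ECS an IAM role works, but a DigitalOcean Droplet has no instance role. There, supply an IAM user's access keys (for example through the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables) with read-only CloudWatch permissions such as cloudwatch:ListMetrics and cloudwatch:GetMetricStatistics.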

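# cloudwatch_exporter/config.yml
# DynamoDB metrics for prometheus/cloudwatch_exporter.
# Replace the region and table name with your own values.
---
region: us-east-1 # Or your AWS region
metrics:
  # Capacity and throttle metrics are published per TableName.
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ConsumedReadCapacityUnits
    aws_dimensions: [TableName]
    aws_dimension_select:
      TableName: [your-laravel-table-name]
    aws_statistics: [Sum]
    period_seconds: 300 # 5 minutes
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ConsumedWriteCapacityUnits
    aws_dimensions: [TableName]
    aws_dimension_select:
      TableName: [your-laravel-table-name]
    aws_statistics: [Sum]
    period_seconds: 300
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ReadThrottleEvents
    aws_dimensions: [TableName]
    aws_dimension_select:
      TableName: [your-laravel-table-name]
    aws_statistics: [Sum]
    period_seconds: 60 # 1 minute for throttles
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: WriteThrottleEvents
    aws_dimensions: [TableName]
    aws_dimension_select:
      TableName: [your-laravel-table-name]
    aws_statistics: [Sum]
    period_seconds: 60
  # Latency, system errors and returned items are published per TableName and Operation.
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: SuccessfulRequestLatency # reported in milliseconds
    aws_dimensions: [TableName, Operation]
    aws_dimension_select:
      TableName: [your-laravel-table-name]
    aws_statistics: [Average, Maximum]
    period_seconds: 300
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: SystemErrors
    aws_dimensions: [TableName, Operation]
    aws_dimension_select:
      TableName: [your-laravel-table-name]
    aws_statistics: [Sum]
    period_seconds: 60
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ReturnedItemCount
    aws_dimensions: [TableName, Operation]
    aws_dimension_select:
      TableName: [your-laravel-table-name]
    aws_statistics: [Sum]
    period_seconds: 300
  # UserErrors covers the whole account and region, with no TableName dimension.
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: UserErrors
    aws_statistics: [Sum]
    period_seconds: 60
# Global secondary indexes are selected with an extra dimension, e.g.:
#  - aws_namespace: AWS/DynamoDB
#    aws_metric_name: ConsumedReadCapacityUnits
#    aws_dimensions: [TableName, GlobalSecondaryIndexName]
#    aws_dimension_select:
#      TableName: [your-laravel-table-name]
#      GlobalSecondaryIndexName: [your-index-name]
#    aws_statistics: [Sum]
#    period_seconds: 300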

Deploy the cloudwatch_exporter (e.g., as a Docker container) and configure Prometheus to scrape its metrics endpoint (e.g., http://cloudwatch-exporter-ip:9106/metrics). Port 9106 is the exporter's default, which also keeps it clear of Node Exporter on 9100.
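Alert rules need the Prometheus series names the exporter produces. As far as we understand the prometheus/cloudwatch_exporter convention, the exporter does three things:

- lowercases the namespace;
- converts CamelCase metric, statistic and dimension names to snake_case;
- replaces characters such as / with underscores.

The sketch below approximates that rule so you can predict names. Treat it as an assumption and confirm it against the exporter's own /metrics output. The helper names are hypothetical.

```python
import re


def snake_case(name: str) -> str:
    """CamelCase -> snake_case, e.g. TableName -> table_name."""
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()


def exporter_series_name(namespace: str, metric: str, statistic: str) -> str:
    """Approximate series name: aws_<namespace>_<metric>_<statistic> (assumed convention)."""
    raw = f"{namespace.lower()}_{snake_case(metric)}_{snake_case(statistic)}"
    # Replace characters Prometheus does not allow (such as "/") and collapse repeats.
    return re.sub(r"_+", "_", re.sub(r"[^a-zA-Z0-9:_]", "_", raw))


for metric, stat in [("ReadThrottleEvents", "Sum"), ("SuccessfulRequestLatency", "Maximum")]:
    print(exporter_series_name("AWS/DynamoDB", metric, stat))
print(snake_case("TableName"))  # dimension label name
```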

Alerting on DynamoDB Throttles and Latency

Alerting is paramount. Prometheus evaluates alerting rules against specific thresholds and forwards firing alerts to Alertmanager, which handles routing and notification.
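The `for:` clause in the rules below is what separates a blip from an incident. When the expression first becomes true, the alert is only pending. It turns firing once the condition has held continuously for the configured duration, and a single false evaluation resets it. Below is a simplified Python model of that state machine. It assumes one rule evaluation per minute and ignores evaluation jitter and newer options such as keep_firing_for. The function name is hypothetical.

```python
from typing import Iterable, List


def alert_states(condition_by_step: Iterable[bool], for_steps: int) -> List[str]:
    """Return the alert state after each evaluation of the rule expression."""
    states = []
    active_since = None  # evaluation index where the current true streak began
    for step, condition in enumerate(condition_by_step):
        if not condition:
            active_since = None  # any false evaluation resets the alert
            states.append("inactive")
            continue
        if active_since is None:
            active_since = step
        # Firing once the elapsed time since the streak began reaches the `for` duration.
        states.append("firing" if step - active_since >= for_steps else "pending")
    return states


# With `for: 5m`, a 3-minute blip never fires; a sustained problem fires
# on the sixth consecutive true evaluation.
print(alert_states([True, True, True, False], 5))
print(alert_states([True] * 7, 5))
```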

# prometheus/alert.rules.yml
# Series names follow the prometheus/cloudwatch_exporter convention
# (aws_<namespace>_<metric>_<statistic>, with dimensions as snake_case labels
# such as table_name); verify them against the exporter's /metrics output.
# The job label assumes a Prometheus scrape job named cloudwatch_exporter.
# CloudWatch Sum values are already per-period totals, so they are compared
# directly rather than wrapped in rate(), which is meant for counters.
groups:
- name: dynamodb_alerts
  rules:
  - alert: HighDynamoDBReadThrottleRate
    expr: sum by (table_name) (aws_dynamodb_read_throttle_events_sum{job="cloudwatch_exporter"}) > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Read throttling detected on DynamoDB table {{ $labels.table_name }}"
      description: "DynamoDB table {{ $labels.table_name }} has been reporting throttled read requests for at least 5 minutes."

  - alert: HighDynamoDBWriteThrottleRate
    expr: sum by (table_name) (aws_dynamodb_write_throttle_events_sum{job="cloudwatch_exporter"}) > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Write throttling detected on DynamoDB table {{ $labels.table_name }}"
      description: "DynamoDB table {{ $labels.table_name }} has been reporting throttled write requests for at least 5 minutes."

  - alert: HighDynamoDBReadLatency
    # SuccessfulRequestLatency is reported in milliseconds, per operation.
    expr: max by (table_name, operation) (aws_dynamodb_successful_request_latency_maximum{job="cloudwatch_exporter", operation=~"GetItem|BatchGetItem|Query|Scan"}) > 500
    for: 10m
    labels:
      severity: critical
    annotations:
      summary: "High DynamoDB read latency on table {{ $labels.table_name }}"
      description: "{{ $labels.operation }} on DynamoDB table {{ $labels.table_name }} has had a maximum latency above 500 ms for the last 10 minutes."

  - alert: DynamoDBSystemErrors
    expr: sum by (table_name) (aws_dynamodb_system_errors_sum{job="cloudwatch_exporter"}) > 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "System errors detected in DynamoDB table {{ $labels.table_name }}"
      description: "DynamoDB table {{ $labels.table_name }} is reporting system errors."

These alerts should be routed through Alertmanager to your preferred notification channels (Slack, PagerDuty, email).

Server Resource Monitoring on DigitalOcean Droplets

For your Laravel application servers (DigitalOcean Droplets), standard system resource monitoring is essential. Node Exporter is the de facto standard for exposing host-level metrics to Prometheus.

Deploying Node Exporter

Node Exporter can be deployed as a systemd service or a Docker container. Here’s a typical systemd service file:

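# /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
# Run as an unprivileged user (prometheus or a dedicated node_exporter user).
# systemd only treats whole lines starting with # or ; as comments, so keep
# comments off the setting lines.
User=prometheus
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9100
Restart=on-failure

[Install]
WantedBy=multi-user.target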

After placing this file, enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

Configure Prometheus to scrape the node_exporter targets (e.g., http://your-droplet-ip:9100/metrics).
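Before adding a Droplet to the scrape configuration, it can help to confirm that the exporter answers and that the values look sane. The minimal stdlib sketch below parses unlabeled samples from the Prometheus text format. It then computes the same available-memory percentage used in the memory alert later in this section. The sample text is an abbreviated, illustrative excerpt, and the helper names are hypothetical.

```python
import urllib.request
from typing import Dict


def parse_unlabeled_samples(exposition: str) -> Dict[str, float]:
    """Extract unlabeled samples from Prometheus text format (quick check, not a full parser)."""
    samples = {}
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines, # HELP / # TYPE comments and labeled series.
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        if len(parts) >= 2:
            samples[parts[0]] = float(parts[1])
    return samples


def fetch_samples(url: str) -> Dict[str, float]:
    """Fetch and parse a live /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_unlabeled_samples(resp.read().decode("utf-8"))


def available_memory_percent(samples: Dict[str, float]) -> float:
    return 100.0 * samples["node_memory_MemAvailable_bytes"] / samples["node_memory_MemTotal_bytes"]


sample = """# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 1.5e+09
node_memory_MemTotal_bytes 2e+09
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
"""
samples = parse_unlabeled_samples(sample)
print(f"available memory: {available_memory_percent(samples):.1f}%")  # available memory: 75.0%
```

Against a live host, call `fetch_samples("http://your-droplet-ip:9100/metrics")` instead of parsing the inline sample. The parser deliberately skips labeled series, so use a full client library if you need those.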

Key Server Metrics and Alerts

Focus on:

  • CPU Usage (node_cpu_seconds_total): Monitor overall CPU load and per-core usage. High sustained CPU can indicate inefficient code or insufficient resources.
  • Memory Usage (node_memory_MemAvailable_bytes, node_memory_MemTotal_bytes): Track available memory. Running low leads to swapping and severe performance degradation. On Droplets without swap (the default), it leads instead to the kernel OOM killer terminating processes.
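  • Disk I/O (node_disk_io_time_seconds_total): Track how busy each block device is; the per-second rate of this counter is the fraction of time the device spent servicing I/O. A device that is busy most of the time can bottleneck applications, especially databases. Time the CPU spends waiting on I/O is a separate signal, node_cpu_seconds_total{mode="iowait"}.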
  • Network Traffic (node_network_receive_bytes_total, node_network_transmit_bytes_total): Track network throughput.
  • Load Average (node_load1, node_load5, node_load15): A general indicator of system load.
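The CPU and disk expressions in the alerts below both rest on rate() over a counter. Here is that arithmetic worked by hand with two samples taken 300 seconds apart. The values are illustrative. Real rate() also extrapolates to the window edges and compensates for counter resets, which this sketch does not.

```python
from typing import Dict


def counter_rate(start: float, end: float, seconds: float) -> float:
    """Average per-second increase of a counter between two samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (end - start) / seconds


def cpu_busy_percent(idle_start: Dict[str, float], idle_end: Dict[str, float], seconds: float) -> float:
    """Hand-computed 100 - avg(rate(node_cpu_seconds_total{mode="idle"}[...])) * 100 for one host."""
    idle_rates = [counter_rate(idle_start[cpu], idle_end[cpu], seconds) for cpu in idle_start]
    return 100 - sum(idle_rates) / len(idle_rates) * 100


# Two vCPUs, each idle for 30 of the last 300 seconds, means 90% busy.
idle_t0 = {"0": 1000.0, "1": 2000.0}
idle_t1 = {"0": 1030.0, "1": 2030.0}
print(f"CPU busy: {cpu_busy_percent(idle_t0, idle_t1, 300):.1f}%")

# node_disk_io_time_seconds_total grew by 240 s in 300 s: the device was busy 80% of the time.
print(f"Disk busy: {counter_rate(5000.0, 5240.0, 300):.0%}")
```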

Example Prometheus alerts for server resources:

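# prometheus/server.rules.yml (a separate rule file listed under rule_files in
# prometheus.yml; to keep a single alert.rules.yml instead, append only the
# "- name: server_alerts" group to its existing groups: list, since a second
# top-level groups: key would make the file invalid)
groups: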
- name: server_alerts
  rules:
  - alert: HighCpuUsage
    expr: 100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 > 90
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
      description: "CPU usage on {{ $labels.instance }} has been above 90% for the last 10 minutes."

  - alert: LowAvailableMemory
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
    for: 15m
    labels:
      severity: critical
    annotations:
      summary: "Low available memory on {{ $labels.instance }}"
      description: "Available memory on {{ $labels.instance }} is below 10% for the last 15 minutes."

  - alert: HighDiskUtilization
    # A Droplet's boot disk is usually vda; attached Block Storage volumes appear as sda.
    # Adjust device and threshold to your setup.
    expr: rate(node_disk_io_time_seconds_total{device=~"vda|sda"}[5m]) > 0.8
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High disk utilization on {{ $labels.instance }}"
      description: "Disk {{ $labels.device }} on {{ $labels.instance }} has been busy more than 80% of the time for the last 10 minutes."

These alerts, coupled with the application and database monitoring, provide a comprehensive view of your system’s health, enabling proactive intervention before issues impact users.


A little about the Author

Having 12+ Years of Experience in Software Development, Vinay is a principal software architect, senior systems engineer, and elite technical consultant. He specializes in bespoke PHP/WordPress development, high-performance Magento 2 & Shopify architectures, custom plugin/theme development from scratch, and legacy code modernization (including VB6, VB.NET, PyQt, and Crystal Reports). Known for solving complex database bottlenecks, speed optimization (Core Web Vitals), and advanced security code auditing, Vinay engineers production-ready systems designed to scale under heavy concurrent load conditions.



Copyright © 2026 · Vinay Vengala