Server Monitoring Best Practices: Keeping Your Laravel App and Elasticsearch Clusters Alive on DigitalOcean
Proactive Health Checks for Laravel Applications
Maintaining the health of a Laravel application deployed on DigitalOcean requires a multi-layered monitoring strategy. Beyond basic CPU and memory utilization, we need to ensure the application itself is responsive and its critical components are functioning. This involves implementing application-level health checks and integrating them with a robust monitoring system.
Implementing a Laravel Health Check Endpoint
A dedicated health check endpoint within your Laravel application is the first line of defense. This endpoint should perform essential checks, such as database connectivity, cache availability, and the status of any critical external services. We’ll create a simple controller and route for this.
First, generate a new controller:
php artisan make:controller HealthCheckController
Next, define the health check logic within the controller. This example checks database connectivity and Redis availability.
<?php

namespace App\Http\Controllers;

use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;

class HealthCheckController extends Controller
{
    /**
     * Perform a comprehensive health check.
     *
     * @return \Illuminate\Http\JsonResponse
     */
    public function index()
    {
        $checks = [];
        $status = 200;

        // Check database connection
        try {
            DB::connection()->getPdo();
            $checks['database'] = 'OK';
        } catch (\Throwable $e) {
            $checks['database'] = 'FAILED: ' . $e->getMessage();
            $status = 503; // Service Unavailable
            Log::error('Database connection failed: ' . $e->getMessage());
        }

        // Check cache (assuming Redis) with a write/read round trip
        try {
            // The third argument is the TTL in seconds (Laravel 5.8 and later)
            Cache::put('health_check_test', 'value', 10);
            if (Cache::get('health_check_test') === 'value') {
                $checks['cache'] = 'OK';
                Cache::forget('health_check_test');
            } else {
                $checks['cache'] = 'FAILED: Could not write/read from cache.';
                $status = 503;
                Log::error('Cache write/read failed.');
            }
        } catch (\Throwable $e) {
            $checks['cache'] = 'FAILED: ' . $e->getMessage();
            $status = 503;
            Log::error('Cache connection failed: ' . $e->getMessage());
        }

        // Add more checks here (e.g., external API calls, queue status)

        return response()->json($checks, $status);
    }
}
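Now, register a route for this endpoint in routes/api.php. Routes in this file receive the /api prefix by default, so the endpoint is served at /api/health. It’s crucial to protect this route, especially in production, by using middleware that limits access to trusted IP addresses or internal networks; the throttle middleware shown here only rate-limits requests.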
use App\Http\Controllers\HealthCheckController;
Route::get('/health', [HealthCheckController::class, 'index'])->middleware('throttle:100,1'); // Basic throttling
For production, consider a more robust middleware that restricts access by IP. You can create a custom middleware for this purpose.
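A minimal sketch of such a middleware follows. The class name RestrictToTrustedIps and the allowlist values are hypothetical, chosen only for illustration; the CIDR matching covers IPv4 only, and the allowlist would normally come from configuration rather than a constant.

```php
<?php

namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;

/**
 * Hypothetical middleware: lets a request through only when the client IP
 * is on an allowlist of exact addresses or IPv4 CIDR ranges.
 */
class RestrictToTrustedIps
{
    /** Example values (assumed); load these from config in a real app. */
    private const TRUSTED = ['127.0.0.1', '10.0.0.0/8'];

    public function handle(Request $request, Closure $next)
    {
        if (! self::isTrusted((string) $request->ip(), self::TRUSTED)) {
            abort(403, 'Forbidden');
        }

        return $next($request);
    }

    /** True when $ip equals an entry or falls inside an IPv4 CIDR entry. */
    public static function isTrusted(string $ip, array $allowed): bool
    {
        $ipLong = ip2long($ip); // false for anything that is not IPv4

        foreach ($allowed as $entry) {
            if ($entry === $ip) {
                return true;
            }
            if ($ipLong === false || strpos($entry, '/') === false) {
                continue;
            }

            [$subnet, $bits] = explode('/', $entry, 2);
            $subnetLong = ip2long($subnet);
            $bits = (int) $bits;
            if ($subnetLong === false || $bits < 0 || $bits > 32) {
                continue;
            }

            // Build a 32-bit netmask with the top $bits bits set
            $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
            if (($ipLong & $mask) === ($subnetLong & $mask)) {
                return true;
            }
        }

        return false;
    }
}
```

Register the class under a middleware alias of your choice and attach it to the route in place of, or alongside, the throttle middleware. If the app sits behind a load balancer, $request->ip() only reflects the real client address when trusted proxies are configured. Symfony's IpUtils::checkIp() is an alternative that also handles IPv6 ranges.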
Integrating with DigitalOcean Monitoring and Uptime Checks
DigitalOcean’s built-in monitoring provides essential infrastructure metrics. However, for application-level checks, we need to leverage external services or configure DigitalOcean’s Uptime Checks.
DigitalOcean Uptime Checks:
- Navigate to your Droplet in the DigitalOcean control panel.
- Go to the “Monitoring” tab.
- Under “Uptime Checks,” click “Add Uptime Check.”
- Configure the check:
  - Protocol: HTTP/HTTPS
  - Port: 80 or 443
  - Path: /api/health (the endpoint we created; routes in routes/api.php receive the /api prefix by default)
  - Check Interval: e.g., 1 minute
  - Alerting: Configure email alerts for failures.
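This setup will request the health endpoint at the specified interval. If the endpoint returns a non-2xx status code or times out, DigitalOcean will trigger an alert. Because the controller responds with 503 whenever a check fails, a database or cache outage surfaces as a failed uptime check. This is a good first step for external validation.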
Advanced Monitoring with Prometheus and Grafana
For more granular control and richer visualization, integrating Prometheus and Grafana is a standard practice. We’ll use the promphp/prometheus_client_php library to expose application metrics and configure Prometheus to scrape them.
Install the Prometheus client library:
composer require promphp/prometheus_client_php
Create a new endpoint to expose Prometheus metrics. This endpoint will be scraped by Prometheus.
use Prometheus\CollectorRegistry;
use Prometheus\RenderTextFormat;
use Prometheus\Storage\Redis;

// ... inside a new controller or a dedicated metrics handler
public function metrics()
{
    // InMemory storage is discarded at the end of every PHP request, so counters
    // and histograms would never accumulate. A shared backend such as Redis
    // (requires the phpredis extension) keeps values between requests.
    // Host and port are placeholders; adjust them to your Redis instance.
    $registry = new CollectorRegistry(new Redis(['host' => '127.0.0.1', 'port' => 6379]));

    // Example: Gauge for active users (requires logic to track)
    $activeUsers = $registry->getOrRegisterGauge('myapp', 'active_users', 'Number of currently active users');
    $activeUsers->set(rand(10, 100)); // Replace with actual user count

    // Example: Counter for processed orders
    $ordersProcessed = $registry->getOrRegisterCounter('myapp', 'orders_processed_total', 'Total number of orders processed');
    // Increment this counter when an order is successfully processed

    // Example: Histogram for request duration.
    // The fourth argument is the list of label names; buckets are the fifth.
    $requestDuration = $registry->getOrRegisterHistogram('myapp', 'request_duration_seconds', 'Duration of HTTP requests', [], [0.1, 0.5, 1, 5, 10]);
    // Record request duration in middleware or controller

    // Return a regular Laravel response instead of echo/exit so that
    // middleware and terminating callbacks still run.
    $renderer = new RenderTextFormat();

    return response($renderer->render($registry->getMetricFamilySamples()), 200)
        ->header('Content-Type', RenderTextFormat::MIME_TYPE);
}
Register a route for this metrics endpoint. Ensure this route is accessible only to your Prometheus server.
use App\Http\Controllers\MetricsController; // Assuming you put it in MetricsController
Route::get('/metrics', [MetricsController::class, 'metrics'])->middleware('auth.prometheus'); // Custom middleware for Prometheus IP restriction
You’ll need to create an auth.prometheus middleware to restrict access to your Prometheus server’s IP address.
Elasticsearch Cluster Health and Performance Monitoring
Monitoring Elasticsearch clusters is critical for maintaining the performance and availability of your search and logging infrastructure. This involves tracking cluster health, node status, indexing rates, search latency, and resource utilization.
Elasticsearch Cluster Health API
The Elasticsearch Cluster Health API (_cluster/health) provides a high-level overview of the cluster’s status. It returns information about the number of nodes, indices, shards, and the overall health status (green, yellow, red).
You can query this endpoint using curl:
curl -X GET "localhost:9200/_cluster/health?pretty"
A typical output:
{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 5,
  "active_shards" : 15,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue" : 0,
  "active_shards_percent_as_number" : 100.0
}
A status of green indicates that all primary and replica shards are allocated. A yellow status means all primary shards are allocated, but some replicas are not. A red status signifies that some primary shards are not allocated, so some data is unavailable.
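The three colors map naturally onto alert severities. The sketch below shows one way to do that in PHP on a decoded _cluster/health response; the function name and the severity labels ("ok", "warning", "critical", "unknown") are our own convention, not part of Elasticsearch, and the payload is an illustrative sample rather than real cluster output.

```php
<?php

/**
 * Maps a decoded _cluster/health response to a severity label.
 * The labels are an assumed convention for this example.
 */
function clusterSeverity(array $health): string
{
    switch ($health['status'] ?? null) {
        case 'green':
            return 'ok';        // all primaries and replicas allocated
        case 'yellow':
            return 'warning';   // primaries allocated, some replicas missing
        case 'red':
            return 'critical';  // at least one primary shard unallocated
        default:
            return 'unknown';   // missing or unexpected status field
    }
}

// Illustrative payload with only the fields this example needs
$json = '{"status":"yellow","timed_out":false,"unassigned_shards":2}';
$health = json_decode($json, true);

echo clusterSeverity($health), PHP_EOL; // prints "warning"
```

In a Laravel app the JSON would come from an HTTP call to the cluster rather than a literal string, and the resulting label could feed the health check endpoint or a scheduled command.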
Node Stats and Shard Allocation Monitoring
To dive deeper, monitor individual node statistics and shard allocation. The Node Stats API (_nodes/stats) provides detailed metrics on CPU usage, memory, disk I/O, network traffic, and JVM statistics for each node.
curl -X GET "localhost:9200/_nodes/stats?pretty"
The cluster allocation explain API (_cluster/allocation/explain) is invaluable for diagnosing why shards are not being allocated (e.g., during a yellow or red cluster status).
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"
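This provides a detailed explanation for an unassigned shard, helping you identify issues like insufficient disk space, node attribute mismatches, or shard balancing problems. Called without a request body, the API explains an arbitrary unassigned shard and returns an error when every shard is assigned, so run it only while the cluster reports unassigned shards.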
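The response can be long, so it often helps to reduce it to the deciders that blocked allocation. The following sketch assumes the response has been decoded into a PHP array and reads the node_allocation_decisions and deciders fields; the function name is hypothetical and the sample payload is hand-written for illustration, with placeholder explanation text.

```php
<?php

/**
 * Collects decider explanations from a decoded
 * _cluster/allocation/explain response, keyed by decider name.
 * When several nodes report the same decider, the last one wins.
 */
function allocationBlockers(array $explain): array
{
    $blockers = [];

    foreach ($explain['node_allocation_decisions'] ?? [] as $node) {
        foreach ($node['deciders'] ?? [] as $decider) {
            if (isset($decider['decider'], $decider['explanation'])) {
                $blockers[$decider['decider']] = $decider['explanation'];
            }
        }
    }

    return $blockers;
}

// Hand-written sample, heavily abbreviated; explanation text is a placeholder
$explain = [
    'index' => 'example-index',
    'shard' => 0,
    'primary' => false,
    'current_state' => 'unassigned',
    'node_allocation_decisions' => [
        [
            'node_name' => 'node-1',
            'deciders' => [
                ['decider' => 'disk_threshold', 'decision' => 'NO', 'explanation' => 'placeholder explanation'],
            ],
        ],
    ],
];

foreach (allocationBlockers($explain) as $name => $why) {
    echo $name, ': ', $why, PHP_EOL;
}
```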
Monitoring Indexing and Search Performance
Slow indexing or search queries can cripple an application. The index stats API can be filtered to the indexing (_stats/indexing) and search (_stats/search) metrics to identify bottlenecks.
# Indexing stats
curl -X GET "localhost:9200/_stats/indexing?pretty"
# Search stats
curl -X GET "localhost:9200/_stats/search?pretty"
Key metrics to watch include:
- Indexing: index_total, index_time_in_millis, throttle_time_in_millis (indicates indexing pressure).
- Search: query_total, query_time_in_millis, fetch_total, fetch_time_in_millis. These are cumulative counters, so compare successive samples: query_time_in_millis growing faster than query_total points to slow queries.
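Since the counters only ever grow, average latency over an interval is the change in time divided by the change in count between two samples. Here is a small sketch of that arithmetic; the function name is hypothetical and the sample numbers are invented for illustration. In a real response the counters sit under _all.total.search.

```php
<?php

/**
 * Average query latency in milliseconds between two samples of the
 * cumulative search counters. Returns null when no queries ran in the
 * interval (or the counters were reset, e.g. after a node restart).
 */
function avgQueryLatencyMs(array $previous, array $current): ?float
{
    $queries = $current['query_total'] - $previous['query_total'];
    $millis  = $current['query_time_in_millis'] - $previous['query_time_in_millis'];

    if ($queries <= 0 || $millis < 0) {
        return null;
    }

    return $millis / $queries;
}

// Invented numbers for illustration, not measurements
$before = ['query_total' => 1000, 'query_time_in_millis' => 40000];
$after  = ['query_total' => 1200, 'query_time_in_millis' => 52000];

// 12000 ms spent on 200 queries in the interval
echo avgQueryLatencyMs($before, $after), PHP_EOL; // prints 60
```

The same calculation applies to index_time_in_millis over index_total and to the fetch counters.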
Leveraging Elasticsearch Monitoring Tools
Elasticsearch offers its own monitoring solution, often integrated with Kibana. This provides a user-friendly dashboard for cluster health, node metrics, index performance, and more.
Enabling Elasticsearch Monitoring:
- Ensure the xpack.monitoring.collection.enabled: true setting is present in your elasticsearch.yml configuration file.
- Restart your Elasticsearch nodes.
- Access the “Stack Monitoring” section in Kibana.
This built-in solution is excellent for a quick overview and alerts. For more advanced, custom metrics and integration with your existing monitoring stack (like Prometheus/Grafana), you can use:
- Prometheus Exporters: Use community-developed Elasticsearch exporters for Prometheus (e.g., prometheus-elasticsearch-exporter) to scrape Elasticsearch metrics and feed them into your Prometheus instance.
- Log Aggregation: Forward Elasticsearch logs (including slow logs) to a centralized logging system (like ELK stack or Loki) for analysis and alerting.
Alerting Strategies for Elasticsearch
Set up alerts for critical conditions:
- Cluster Status: Alert immediately if the cluster status is yellow or red.
- Node Health: Monitor CPU, memory, and disk usage on each node. Alert when thresholds are breached (e.g., disk usage > 85%).
- Indexing/Search Latency: Alert on sustained high indexing or search latency.
- Unassigned Shards: Alert on any unassigned shards.
- JVM Heap Usage: Monitor JVM heap usage; high usage can lead to garbage collection pauses and performance degradation.
Configure these alerts within your chosen monitoring system (e.g., Prometheus Alertmanager, Grafana alerting, or Elasticsearch’s alerting features).
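To make the threshold idea concrete, here is a rough sketch that evaluates disk and JVM heap usage from a decoded _nodes/stats response. The function name is hypothetical, the 85% disk threshold comes from the list above, the 90% heap threshold is an assumed value, and the sample data is invented. A dedicated alerting system remains the better home for rules like these.

```php
<?php

/**
 * Returns alert messages for nodes whose disk or JVM heap usage exceeds
 * the given thresholds. $nodesStats is a decoded _nodes/stats response.
 */
function nodeAlerts(array $nodesStats, float $diskMax = 85.0, float $heapMax = 90.0): array
{
    $alerts = [];

    foreach ($nodesStats['nodes'] ?? [] as $id => $node) {
        $name = $node['name'] ?? $id;

        // Disk usage derived from total and available bytes
        $total = $node['fs']['total']['total_in_bytes'] ?? 0;
        $avail = $node['fs']['total']['available_in_bytes'] ?? 0;
        if ($total > 0) {
            $diskUsed = 100 * ($total - $avail) / $total;
            if ($diskUsed > $diskMax) {
                $alerts[] = sprintf('%s: disk usage %.1f%%', $name, $diskUsed);
            }
        }

        // Heap usage is reported directly as a percentage
        $heap = $node['jvm']['mem']['heap_used_percent'] ?? null;
        if ($heap !== null && $heap > $heapMax) {
            $alerts[] = sprintf('%s: JVM heap %d%%', $name, $heap);
        }
    }

    return $alerts;
}

// Invented sample: one node at 90% disk and 50% heap
$stats = [
    'nodes' => [
        'abc123' => [
            'name' => 'es-1',
            'fs'   => ['total' => ['total_in_bytes' => 1000, 'available_in_bytes' => 100]],
            'jvm'  => ['mem' => ['heap_used_percent' => 50]],
        ],
    ],
];

print_r(nodeAlerts($stats));
```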