Server Monitoring Best Practices: Keeping Your Laravel App and Elasticsearch Clusters Alive on AWS

Proactive Health Checks for Laravel Applications on AWS EC2

Maintaining the health of a Laravel application deployed on AWS EC2 instances requires a multi-layered monitoring strategy. Beyond basic CPU and memory utilization, we need to ensure the application itself is responsive, its dependencies are functioning, and potential issues are flagged before they impact end-users. This involves implementing application-level health checks and integrating them with AWS CloudWatch.

Implementing a Laravel Health Check Endpoint

A robust health check endpoint within your Laravel application is the first line of defense. This endpoint should not only verify that the web server is responding but also check critical dependencies like the database connection, Redis cache, and any external APIs your application relies on. We’ll create a dedicated controller and route for this.

Health Check Controller

Create a new controller, for example, app/Http/Controllers/HealthCheckController.php.

// app/Http/Controllers/HealthCheckController.php
namespace App\Http\Controllers;

use Illuminate\Http\JsonResponse;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Redis;
use Throwable;

class HealthCheckController extends Controller
{
    /**
     * Checks the health of the application and its dependencies.
     *
     * @return \Illuminate\Http\JsonResponse
     */
    public function index(): JsonResponse
    {
        $status = 'ok';
        $checks = [];

        // 1. Database Connection Check
        try {
            DB::connection()->getPdo();
            $checks['database'] = 'connected';
        } catch (Throwable $e) {
            // Log the details, but keep them out of the unauthenticated response.
            $status = 'error';
            $checks['database'] = 'failed';
            Log::error('Database connection failed: ' . $e->getMessage());
        }

        // 2. Cache (Redis) Connection Check
        try {
            Redis::connection()->ping();
            $checks['cache'] = 'connected';
        } catch (Throwable $e) {
            $status = 'error';
            $checks['cache'] = 'failed';
            Log::error('Redis connection failed: ' . $e->getMessage());
        }

        // 3. Basic Application Logic Check (e.g., can we access a specific configuration value?)
        try {
            config('app.name'); // Example: checking if config is loaded
            $checks['app_config'] = 'accessible';
        } catch (Throwable $e) {
            $status = 'error';
            $checks['app_config'] = 'failed';
            Log::error('Application config check failed: ' . $e->getMessage());
        }

        // Add more checks as needed (e.g., external API calls, queue status)

        // 503 Service Unavailable signals a failed dependency to monitors and load balancers.
        return response()->json([
            'status' => $status,
            'checks' => $checks,
        ], $status === 'ok' ? 200 : 503);
    }
}

Define the Health Check Route

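Add a route in routes/api.php (or routes/web.php if you prefer, but API is generally better for health checks). Routes in routes/api.php are prefixed with /api by default, so the route below is served at /api/health; register it in routes/web.php instead if you want the bare /health path. Whichever path you choose, use it consistently in the monitors configured later.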

// routes/api.php
use App\Http\Controllers\HealthCheckController;

Route::get('/health', [HealthCheckController::class, 'index']);

Ensure this route is accessible without authentication for monitoring purposes. If your application requires authentication for API routes, you might need to create a separate route group or middleware for health checks.

Configuring AWS CloudWatch Alarms

AWS CloudWatch is essential for monitoring EC2 instances and application health. We’ll set up alarms based on metrics and the health check endpoint.

CloudWatch Agent for System Metrics

The CloudWatch agent collects system-level metrics (CPU, memory, disk, network) and can also collect custom logs. Install and configure the agent on each EC2 instance running your Laravel application.

Installation (Amazon Linux 2 Example)

sudo yum update -y
sudo rpm -Uvh https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm

Configuration

Create a configuration file (e.g., /opt/aws/cloudwatch/cloudwatch-agent.json). This example collects basic system metrics and logs from Laravel’s storage/logs/laravel.log.

{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "cwagent"
  },
  "metrics": {
    "namespace": "LaravelApp/EC2",
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "cpu": {
        "measurement": [
          "cpu_usage_idle",
          "cpu_usage_user",
          "cpu_usage_system",
          "cpu_usage_iowait"
        ],
        "totalcpu": true
      },
      "disk": {
        "measurement": [
          "free",
          "used_percent",
          "inodes_free",
          "inodes_used"
        ],
        "resources": [
          "/",
          "/var/log"
        ],
        "ignore_file_system_types": [
          "sysfs",
          "devtmpfs",
          "tmpfs",
          "devfs",
          "iso9660",
          "overlay",
          "aufs",
          "squashfs"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent",
          "mem_used",
          "mem_total",
          "mem_cached",
          "mem_free"
        ]
      },
      "net": {
        "measurement": [
          "bytes_sent",
          "bytes_recv",
          "packets_sent",
          "packets_recv"
        ]
      }
    }
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/www/html/storage/logs/laravel.log",
            "log_group_name": "LaravelApp/EC2/Logs",
            "log_stream_name": "{instance_id}/laravel",
            "timestamp_format": "%Y-%m-%d %H:%M:%S",
            "timezone": "UTC"
          }
        ]
      }
    }
  }
}

Start the Agent

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/cloudwatch/cloudwatch-agent.json -s

CloudWatch Alarms for System Metrics

Navigate to the CloudWatch console in AWS. Create alarms for critical system metrics. For example, an alarm for high CPU utilization.

Example: High CPU Utilization Alarm

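Metric: CPUUtilization (under the AWS/EC2 namespace). The agent does not publish a metric with this name; if you prefer the agent's data, alarm on cpu_usage_idle in the LaravelApp/EC2 namespace and invert the threshold (e.g., idle at or below 15%).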

Statistic: Average

Period: 5 minutes

Threshold type: Static

Whenever CPU utilization is: Greater/Equal

than: 85%

Datapoints to alarm: 3 out of 3 (This means the condition must be true for 15 consecutive minutes).

Actions: Configure an SNS topic to send notifications (e.g., email, Slack via Lambda integration) when the alarm state changes.
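The same alarm can also be defined in code, which makes it easier to repeat across instances. The following is a minimal sketch using boto3's put_metric_alarm against the built-in AWS/EC2 CPUUtilization metric. The alarm name, instance ID, and SNS topic ARN are placeholders chosen for illustration, not values from this setup.

```python
def build_cpu_alarm(instance_id: str, sns_topic_arn: str) -> dict:
    """Return put_metric_alarm arguments mirroring the console settings above."""
    return {
        "AlarmName": f"laravel-high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,              # 5 minutes, in seconds
        "EvaluationPeriods": 3,
        "DatapointsToAlarm": 3,     # 3 out of 3, i.e. 15 consecutive minutes
        "Threshold": 85.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params: dict) -> None:
    # boto3 is imported lazily so the builder can be inspected without AWS access.
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)


if __name__ == "__main__":
    # Placeholder identifiers; replace with your own before calling create_alarm().
    print(build_cpu_alarm("i-0123456789abcdef0",
                          "arn:aws:sns:us-east-1:123456789012:ops-alerts"))
```

Keeping the parameter builder separate from the API call lets you review or unit-test the alarm definition before anything is created in your account.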

CloudWatch Alarms for Application Health Endpoint

Monitoring the application’s health endpoint requires a slightly different approach, as it’s not a direct EC2 metric. We can use CloudWatch Synthetics Canaries or a custom Lambda function triggered by EventBridge.

Option 1: CloudWatch Synthetics Canaries

Canaries are ideal for simulating user interactions or API calls. We can create a canary to hit our /health endpoint.

Canary Configuration (Node.js Example)

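In the CloudWatch console, create a Synthetics canary. Choose the “API canary” blueprint and provide the URL of your health check endpoint (e.g., http://your-ec2-instance-ip/health or http://your-app-domain.com/health; use /api/health if the route lives in routes/api.php). Prefer HTTPS where your application supports it.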

// Example canary script (Node.js, syn-nodejs-puppeteer runtime)
const synthetics = require('Synthetics');
const log = require('SyntheticsLogger');

// Replace hostname and path with your actual endpoint.
// Use '/api/health' if the route is registered in routes/api.php.
const requestOptions = {
    hostname: 'YOUR_APP_DOMAIN_OR_IP',
    method: 'GET',
    path: '/health',
    port: 80,
    protocol: 'http:',
    headers: {
        'Accept': 'application/json'
    }
};

// Receives the Node.js http.IncomingMessage; rejecting fails the step.
const validateHealth = (res) => new Promise((resolve, reject) => {
    // Require a 2xx status code
    if (res.statusCode < 200 || res.statusCode > 299) {
        reject(new Error(`Received non-success status code: ${res.statusCode}`));
        return;
    }

    let body = '';
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => {
        try {
            // Further check the response body for our custom 'ok' status
            const healthData = JSON.parse(body);
            if (healthData.status === 'ok') {
                log.info('Application health check successful.');
                resolve();
            } else {
                reject(new Error(`Application reported status: ${healthData.status}`));
            }
        } catch (parseError) {
            reject(new Error('Failed to parse health check response body.'));
        }
    });
});

exports.handler = async () => {
    log.info(`Executing health check request to ${requestOptions.hostname}${requestOptions.path}`);
    // A failed step throws, which marks the canary run as failed.
    // The canary's own timeout setting bounds the total run time.
    await synthetics.executeHttpStep('Check health endpoint', requestOptions, validateHealth);
};

Configure the canary to run on a frequent schedule (e.g., every 1 or 5 minutes). Set up a CloudWatch Alarm on the canary's SuccessPercent (or Failed) metric in the CloudWatchSynthetics namespace, using the CanaryName dimension. The alarm will trigger whenever the canary script fails (e.g., non-2xx response, JSON parsing error, or our custom ‘error’ status).
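As a rough illustration, the canary alarm can be expressed with boto3 as well. This sketch assumes the SuccessPercent metric that Synthetics publishes in the CloudWatchSynthetics namespace with a CanaryName dimension. The canary name and topic ARN are hypothetical, and treating missing data as breaching is a design choice made here so that a canary that stops running also raises the alarm.

```python
def build_canary_alarm(canary_name: str, sns_topic_arn: str) -> dict:
    """Alarm when any canary run in a 5-minute window fails."""
    return {
        "AlarmName": f"{canary_name}-failing",  # hypothetical naming scheme
        "Namespace": "CloudWatchSynthetics",
        "MetricName": "SuccessPercent",
        "Dimensions": [{"Name": "CanaryName", "Value": canary_name}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 100.0,
        "ComparisonOperator": "LessThanThreshold",
        # Assumption: no data most likely means the canary is not running at all.
        "TreatMissingData": "breaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params: dict) -> None:
    # Imported lazily so the definition can be built without AWS credentials.
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

If occasional single failures are too noisy for your environment, raising EvaluationPeriods or lowering the threshold is the usual adjustment.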

Option 2: Lambda Function with EventBridge

Alternatively, you can use a Lambda function to poll the health endpoint and trigger an alarm.

Lambda Function (Python Example)

import json
import os
import urllib3

# Replace with your application's health check endpoint
HEALTH_CHECK_URL = os.environ.get('HEALTH_CHECK_URL', 'http://YOUR_APP_URL_OR_IP/health')
HTTP_CLIENT = urllib3.PoolManager()

def lambda_handler(event, context):
    try:
        r = HTTP_CLIENT.request('GET', HEALTH_CHECK_URL, timeout=10)
        
        if r.status != 200:
            raise Exception(f"Health check returned status code: {r.status}")
            
        response_body = json.loads(r.data.decode('utf-8'))
        
        if response_body.get('status') != 'ok':
            raise Exception(f"Application reported status: {response_body.get('status')}")
            
        print(f"Health check successful: {response_body}")
        return {
            'statusCode': 200,
            'body': json.dumps('Health check OK')
        }
        
    except Exception as e:
        print(f"Health check failed: {e}")
        # This exception will cause the Lambda invocation to fail,
        # which can be monitored by CloudWatch.
        raise e

Deploy this Lambda function, then create an EventBridge (formerly CloudWatch Events) rule that invokes it on a schedule (e.g., every minute). Configure a CloudWatch Alarm on the function's Errors metric in the AWS/Lambda namespace. When the handler raises an exception because the health check failed, the invocation is counted as an error and the alarm fires. If the endpoint is only reachable from inside your VPC, attach the function to that VPC.
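The wiring around the function can be sketched with boto3 as follows. This is one possible arrangement, not the only one: the rule name, statement ID, function name, and ARNs are placeholders, and the three-minute evaluation window is an assumption chosen to tolerate a single transient failure.

```python
def build_schedule(rule_name: str, function_arn: str):
    """Return (put_rule, put_targets) arguments for a one-minute schedule."""
    put_rule = {
        "Name": rule_name,
        "ScheduleExpression": "rate(1 minute)",
        "State": "ENABLED",
    }
    put_targets = {
        "Rule": rule_name,
        "Targets": [{"Id": "health-check", "Arn": function_arn}],
    }
    return put_rule, put_targets


def build_errors_alarm(function_name: str, sns_topic_arn: str) -> dict:
    """Alarm when the health-check function errors in 3 consecutive minutes."""
    return {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 3,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def apply(function_name: str, function_arn: str, sns_topic_arn: str) -> None:
    # Imported lazily so the builders above can be used without AWS access.
    import boto3

    put_rule, put_targets = build_schedule("laravel-health-every-minute", function_arn)
    rule_arn = boto3.client("events").put_rule(**put_rule)["RuleArn"]
    # EventBridge needs explicit permission to invoke the function.
    boto3.client("lambda").add_permission(
        FunctionName=function_name,
        StatementId="allow-eventbridge-health-check",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    boto3.client("events").put_targets(**put_targets)
    boto3.client("cloudwatch").put_metric_alarm(
        **build_errors_alarm(function_name, sns_topic_arn)
    )
```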

Monitoring Elasticsearch Clusters on AWS

Elasticsearch clusters, especially when used with Laravel (e.g., for Scout or custom search), require dedicated monitoring. AWS offers Amazon Elasticsearch Service (now OpenSearch Service), which provides built-in metrics. We’ll focus on key metrics and setting up alarms.

Key Elasticsearch Metrics to Monitor

CPU Utilization: High CPU can indicate inefficient queries, indexing bottlenecks, or insufficient resources.
JVM Memory Pressure: Crucial for Elasticsearch performance. High pressure leads to garbage collection pauses and instability.
Disk Space Used: Running out of disk space will halt indexing and searching.
Indexing Rate: Monitor the rate at which documents are being indexed. Sudden drops or spikes can indicate issues.
Search Rate: Similar to indexing rate, monitor search request volume.
Search Latency: High latency directly impacts user experience.
Shards: Monitor the number of unassigned shards, which indicates cluster health problems.
Cluster Status: Elasticsearch reports its health as Green, Yellow, or Red. Red is critical.

Accessing Elasticsearch Metrics in CloudWatch

When you create an OpenSearch Service domain, it automatically publishes metrics to CloudWatch under the AWS/ES namespace (the namespace kept its Elasticsearch-era name). Domain metrics are emitted at one-minute intervals without extra configuration; unlike EC2, there is no separate “Detailed monitoring” setting to enable.

CloudWatch Alarms for Elasticsearch

Create alarms for the key metrics identified above.

Example: High JVM Memory Pressure Alarm

Metric: JVMMemoryPressure (under the AWS/ES namespace).

Statistic: Average

Period: 5 minutes

Threshold type: Static

Whenever JVM Memory Pressure is: Greater/Equal

than: 80% (Adjust based on your cluster’s baseline performance)

Datapoints to alarm: 3 out of 3

Actions: Configure an SNS topic for notifications.

Example: Unassigned Shards Alarm

Metric: Shards.unassigned (under the AWS/ES namespace).

Statistic: Maximum

Period: 1 minute

Threshold type: Static

Whenever Unassigned Shards is: Greater

than: 0

Datapoints to alarm: 1 out of 1

Actions: Configure an SNS topic for notifications. Unassigned shards are a critical indicator of cluster health issues.
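Both domain alarms can be generated from one small table of settings. The sketch below assumes the metric names as they appear in the AWS/ES namespace (JVMMemoryPressure and Shards.unassigned) and the DomainName and ClientId dimensions that domain metrics carry, where ClientId is your AWS account ID. The shard alarm uses the Maximum statistic. The domain name, account ID, and topic ARN are placeholders.

```python
# (metric, statistic, period_seconds, evaluation_periods, operator, threshold)
ALARM_SPECS = [
    ("JVMMemoryPressure", "Average", 300, 3, "GreaterThanOrEqualToThreshold", 80.0),
    ("Shards.unassigned", "Maximum", 60, 1, "GreaterThanThreshold", 0.0),
]


def build_domain_alarms(domain_name: str, account_id: str, sns_topic_arn: str) -> list:
    """Return one put_metric_alarm argument dict per entry in ALARM_SPECS."""
    dimensions = [
        {"Name": "DomainName", "Value": domain_name},
        {"Name": "ClientId", "Value": account_id},
    ]
    alarms = []
    for metric, statistic, period, periods, operator, threshold in ALARM_SPECS:
        alarms.append({
            "AlarmName": f"{domain_name}-{metric}",  # hypothetical naming scheme
            "Namespace": "AWS/ES",
            "MetricName": metric,
            "Dimensions": dimensions,
            "Statistic": statistic,
            "Period": period,
            "EvaluationPeriods": periods,
            "Threshold": threshold,
            "ComparisonOperator": operator,
            "AlarmActions": [sns_topic_arn],
        })
    return alarms


def create_alarms(alarms: list) -> None:
    # Imported lazily so the definitions can be built without AWS credentials.
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    for params in alarms:
        cloudwatch.put_metric_alarm(**params)
```

Adding another metric, such as a cluster status alarm, then only requires one more row in ALARM_SPECS.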

Custom Elasticsearch Monitoring with CloudWatch Logs Insights

For more granular insights, you can configure your Elasticsearch cluster to send slow logs (search and index) to CloudWatch Logs. This allows you to use CloudWatch Logs Insights to query and analyze slow queries.

Enabling Slow Logs

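Slow logs take two steps on OpenSearch Service. First, enable publishing of search and index slow logs to CloudWatch Logs in the domain's log publishing settings. Then set thresholds on each index through the index settings API, for example index.search.slowlog.threshold.query.warn and index.indexing.slowlog.threshold.index.warn (e.g., 10s for warn and 5s for info, so that the warn level captures only the slowest operations).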

Querying Slow Logs with Logs Insights

Once logs are flowing into CloudWatch Logs, use Logs Insights to find slow queries. For example, to find search queries taking longer than 5 seconds:

fields @timestamp, @message
| parse @message /took_millis\[(?<took_millis>\d+)\]/
| filter took_millis > 5000
| sort @timestamp desc
| limit 50

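Logs Insights queries cannot back an alarm directly. To alert on slow queries, create a metric filter on the slow-log log group that counts matching events, then create a CloudWatch Alarm on that metric (e.g., alarm if more than X slow queries are logged in Y minutes).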
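One way to turn slow-log entries into an alarmable metric is a CloudWatch Logs metric filter. The sketch below uses boto3's put_metric_filter and put_metric_alarm. The log group name, filter name, metric name, custom namespace, and the threshold of 10 events per 5 minutes are all illustrative assumptions, and the filter pattern assumes search slow-log lines contain the text index.search.slowlog.

```python
def build_slow_query_filter(log_group: str) -> dict:
    """put_metric_filter arguments that count search slow-log events."""
    return {
        "logGroupName": log_group,
        "filterName": "search-slowlog-count",        # hypothetical name
        # Assumption: slow-log lines include the logger name index.search.slowlog.
        "filterPattern": '"index.search.slowlog"',
        "metricTransformations": [{
            "metricName": "SlowSearchQueries",        # hypothetical metric
            "metricNamespace": "LaravelApp/OpenSearch",  # hypothetical namespace
            "metricValue": "1",                       # each matching event counts once
            "defaultValue": 0.0,
        }],
    }


def build_slow_query_alarm(sns_topic_arn: str, max_per_window: int = 10) -> dict:
    """Alarm when more than max_per_window slow queries occur in 5 minutes."""
    return {
        "AlarmName": "opensearch-slow-search-queries",
        "Namespace": "LaravelApp/OpenSearch",
        "MetricName": "SlowSearchQueries",
        "Statistic": "Sum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": float(max_per_window),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def apply(log_group: str, sns_topic_arn: str) -> None:
    # Imported lazily so the builders can be checked without AWS access.
    import boto3

    boto3.client("logs").put_metric_filter(**build_slow_query_filter(log_group))
    boto3.client("cloudwatch").put_metric_alarm(**build_slow_query_alarm(sns_topic_arn))
```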

Centralized Logging and Alerting Strategy

A robust monitoring strategy is incomplete without a centralized logging and alerting system. AWS services like CloudWatch Logs, SNS, and potentially third-party tools like Datadog or Grafana (with Prometheus) play a crucial role.

Leveraging CloudWatch Logs for Centralization

As shown with the CloudWatch agent and Elasticsearch slow logs, direct logs to CloudWatch Logs. This provides a single pane of glass for application and system logs.

SNS for Alert Fan-out

Amazon Simple Notification Service (SNS) is the backbone of our alerting. When a CloudWatch Alarm changes state (e.g., from OK to ALARM), it publishes a message to an SNS topic. This topic can then fan out notifications to multiple subscribers:

Email: For immediate human notification.
SMS: For critical alerts requiring urgent attention.
AWS Lambda: To trigger automated remediation actions (e.g., restarting a service, scaling up an instance).
SQS: To queue alerts for processing by other services.
HTTP/S endpoints: To integrate with external incident management tools (e.g., PagerDuty, Opsgenie).
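The fan-out described above amounts to one subscription per destination on the same topic. A minimal sketch using boto3's sns.subscribe follows. The email address, Lambda ARN, and webhook URL are placeholders, and the choice of three protocols is only an example.

```python
def build_subscriptions(topic_arn: str, email: str, lambda_arn: str, webhook_url: str) -> list:
    """Return one sns.subscribe argument dict per alert destination."""
    return [
        {"TopicArn": topic_arn, "Protocol": "email", "Endpoint": email},
        {"TopicArn": topic_arn, "Protocol": "lambda", "Endpoint": lambda_arn},
        {"TopicArn": topic_arn, "Protocol": "https", "Endpoint": webhook_url},
    ]


def subscribe_all(subscriptions: list) -> None:
    # Imported lazily so the list can be built and reviewed without AWS access.
    import boto3

    sns = boto3.client("sns")
    for params in subscriptions:
        sns.subscribe(**params)
```

Email and HTTPS subscribers must confirm the subscription before they receive messages, and a Lambda subscriber additionally needs a resource policy that allows sns.amazonaws.com to invoke it.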

Automated Remediation with Lambda

For common, predictable issues, automate remediation. For instance, if a health check fails repeatedly, a Lambda function subscribed to the SNS topic could attempt to restart the Laravel application process (e.g., via Systems Manager Run Command) or trigger an Auto Scaling event.

Example: Lambda to Restart Laravel Process

This Lambda function would be triggered by an SNS notification from a CloudWatch Alarm. It would use the AWS SDK to interact with Systems Manager.

import boto3
import json
import os

ssm = boto3.client('ssm')
instance_id = os.environ['TARGET_INSTANCE_ID'] # Or get dynamically from alarm event
command_document = "AWS-RunShellScript"
command_content = "sudo systemctl restart php-fpm && sudo systemctl restart httpd" # Amazon Linux 2 service names; on Debian/Ubuntu use apache2 and your versioned php-fpm unit

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    try:
        response = ssm.send_command(
            InstanceIds=[instance_id],
            DocumentName=command_document,
            Parameters={'commands': [command_content]},
            TimeoutSeconds=600,
            Comment='Automated restart of Laravel application due to health alert'
        )
        command_id = response['Command']['CommandId']
        print(f"Sent command {command_id} to instance {instance_id}")
        return {
            'statusCode': 200,
            'body': json.dumps(f'Command {command_id} sent successfully.')
        }
    except Exception as e:
        print(f"Error sending command: {e}")
        raise e

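Ensure the Lambda function’s IAM role has permissions for ssm:SendCommand and potentially ec2:DescribeInstances if you need to dynamically find instance IDs. The target instance must also be managed by Systems Manager, which means running the SSM Agent with an instance profile that allows it to register.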
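To take the instance ID from the alarm rather than an environment variable, the handler can read it out of the SNS message. The sketch below assumes the usual shape of a CloudWatch alarm notification delivered through SNS, where the alarm JSON sits in Records[].Sns.Message and its Trigger.Dimensions entries use lowercase name and value keys. The helper name is hypothetical.

```python
import json
from typing import Optional


def instance_id_from_alarm(event: dict) -> Optional[str]:
    """Return the InstanceId dimension of the alarm in an SNS event, if any."""
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        for dimension in message.get("Trigger", {}).get("Dimensions", []):
            if dimension.get("name") == "InstanceId":
                return dimension.get("value")
    # Alarms without an InstanceId dimension (e.g., canary alarms) end up here.
    return None


if __name__ == "__main__":
    sample_alarm = {"Trigger": {"Dimensions": [{"name": "InstanceId",
                                                "value": "i-0123456789abcdef0"}]}}
    sample_event = {"Records": [{"Sns": {"Message": json.dumps(sample_alarm)}}]}
    print(instance_id_from_alarm(sample_event))
```

When the helper returns None, falling back to the TARGET_INSTANCE_ID environment variable keeps the original behavior.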

Conclusion

A comprehensive monitoring strategy for Laravel applications and Elasticsearch on AWS involves proactive health checks at the application level, robust system metrics collection via the CloudWatch agent, and intelligent alerting. By combining CloudWatch Synthetics, alarms, and automated remediation with Lambda and SNS, you can significantly improve the reliability and uptime of your critical services.
