Server Monitoring Best Practices: Keeping Your Ruby App on DigitalOcean and Its DynamoDB Tables Alive

Proactive Health Checks for Ruby Applications on DigitalOcean

Maintaining the health of a Ruby application deployed on DigitalOcean requires a multi-layered monitoring strategy. Beyond basic CPU and memory utilization, we need to ensure the application itself is responsive and that its critical dependencies, like DynamoDB, are reachable and responding within acceptable latency. This section focuses on implementing granular checks for your Ruby application.

Application-Level Health Endpoint

A fundamental practice is exposing a dedicated health check endpoint within your Ruby application. This endpoint should perform essential checks: database connectivity, cache status, and the availability of critical external services. For a Rails application, this can be implemented as a simple controller action.

Consider a basic implementation in Rails:

# app/controllers/health_controller.rb
class HealthController < ApplicationController
  def show
    # Check database connectivity. `execute` raises on failure instead of
    # returning a falsy value, so rescue the error rather than testing the result.
    begin
      ActiveRecord::Base.connection.execute('SELECT 1')
    rescue StandardError
      render json: { status: 'error', message: 'Database connection failed' }, status: 503
      return
    end

    # Add checks for other critical services (e.g., Redis, external APIs)
    # Example:
    # unless $redis.ping == 'PONG'
    #   render json: { status: 'error', message: 'Redis connection failed' }, status: 503
    #   return
    # end

    render json: { status: 'ok', message: 'Application is healthy' }, status: 200
  rescue StandardError => e
    # Log details server-side instead of exposing them in the response, and
    # return 503 so load balancers and probes treat the app as unhealthy.
    Rails.logger.error("Health check failed: #{e.class}: #{e.message}")
    render json: { status: 'error', message: 'Health check failed' }, status: 503
  end
end

And its corresponding route:

# config/routes.rb
Rails.application.routes.draw do
  get 'health', to: 'health#show'
  # ... other routes
end
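The controller above stops at the first failing check. If you would rather report every dependency at once and bound how long each check may take, the logic can be factored into a plain Ruby helper. This is a minimal sketch, not a Rails API: run_health_checks, the check names, and the 2-second default timeout are illustrative assumptions.

```ruby
require 'json'
require 'timeout'

# Hypothetical helper: runs every named check, each bounded by a timeout, and
# returns [http_status, json_body]. A check is any callable that returns a
# truthy value when its dependency is healthy. Exceptions, including
# Timeout::Error (a StandardError subclass), count as failures.
def run_health_checks(checks, timeout_seconds: 2)
  results = checks.to_h do |name, check|
    healthy =
      begin
        outcome = Timeout.timeout(timeout_seconds) { check.call }
        outcome ? true : false
      rescue StandardError
        false
      end
    [name, healthy ? 'ok' : 'error']
  end

  all_ok = results.values.all?('ok')
  body = JSON.generate({ status: all_ok ? 'ok' : 'error', checks: results })
  [all_ok ? 200 : 503, body]
end

# Example use inside HealthController#show (assumes ActiveRecord and a Redis
# client in $redis, as in the commented Redis example in the controller):
#   status, body = run_health_checks({
#     'database' => -> { ActiveRecord::Base.connection.execute('SELECT 1') },
#     'redis'    => -> { $redis.ping == 'PONG' }
#   })
#   render json: body, status: status
```

One design caveat: Timeout.timeout can interrupt a check at an arbitrary point, so where a client library offers its own connect and read timeouts, those are the safer option.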

External Monitoring with DigitalOcean Monitoring and Prometheus

While the application-level health endpoint is crucial for internal checks, external monitoring is vital for simulating user experience and detecting issues before they impact users. DigitalOcean’s built-in monitoring provides basic infrastructure metrics. For more advanced application-level checks, integrating Prometheus with exporters is a robust solution.

We can use a tool like blackbox_exporter to probe our application’s health endpoint from an external perspective. This involves deploying blackbox_exporter and configuring Prometheus to scrape it.

Deploying Blackbox Exporter

A simple way to deploy blackbox_exporter is using Docker on a separate DigitalOcean Droplet or within your existing infrastructure.

docker run -d --name blackbox_exporter \
  -p 9115:9115 \
  -v $(pwd)/blackbox.yml:/config/blackbox.yml \
  prom/blackbox-exporter:latest \
  --config.file=/config/blackbox.yml

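The blackbox.yml configuration file defines the probe modules. The example below includes an HTTP module for the Rails health endpoint and a TCP module that only checks whether a port accepts connections: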

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      # Expect 200 OK (an empty list would accept any 2xx status code)
      valid_status_codes: [200]
      # Optionally, fail the probe unless the body matches a regular expression
      # fail_if_body_not_matches_regexp:
      #   - "Application is healthy"
      # tls_config:
      #   insecure_skip_verify: true # Use with caution, only for internal testing if needed
  tcp_connect:
    prober: tcp
    timeout: 5s
    tcp:
      # The host:port to connect to comes from the probe target, not from this module
      preferred_ip_protocol: "ip4"
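To make the http_2xx module's behavior concrete, here is a rough Ruby approximation of what such a probe does against the health endpoint. It is only a sketch for local testing, not how blackbox_exporter is implemented; probe_http and its return keys are hypothetical names that loosely echo the exporter's probe_success and probe_duration_seconds metrics.

```ruby
require 'net/http'
require 'uri'

# Hypothetical approximation of the http_2xx module: GET the URL with a
# timeout, require a 200 response and, optionally, a body matching a regexp.
def probe_http(url, timeout: 5, body_regexp: nil)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  uri = URI(url)
  # Passing nil as the proxy address disables proxy lookup from the environment.
  http = Net::HTTP.new(uri.host, uri.port, nil)
  http.use_ssl = uri.scheme == 'https'
  http.open_timeout = timeout
  http.read_timeout = timeout
  response = http.start { |conn| conn.get(uri.request_uri) }

  success = response.code == '200' &&
            (body_regexp.nil? || body_regexp.match?(response.body.to_s))
  { success: success, status: response.code.to_i,
    duration: Process.clock_gettime(Process::CLOCK_MONOTONIC) - started }
rescue StandardError => e
  # Connection refused, DNS failures and timeouts all count as a failed probe.
  { success: false, error: e.class.name,
    duration: Process.clock_gettime(Process::CLOCK_MONOTONIC) - started }
end
```

For example, probe_http('http://your-rails-app-domain.com/health', body_regexp: /Application is healthy/) returns a hash whose :success value mirrors what probe_success would report for the same check.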

Configuring Prometheus

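In your Prometheus configuration (prometheus.yml), add a scrape job for blackbox_exporter. The relabel_configs in this job move each listed target into the target URL parameter and replace the scrape address with the exporter's, so Prometheus calls the exporter's /probe endpoint, which runs the probe and returns the result as metrics.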

scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx] # Or [tcp_connect], which expects host:port targets rather than URLs
    static_configs:
      - targets:
        - http://your-rails-app-domain.com/health # Target your application's health endpoint
        - http://another-app-instance.com/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter-host:9115 # Address of your blackbox_exporter instance
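The relabel rules are easiest to understand by looking at the request they produce: the listed target becomes the target query parameter, and the scrape goes to the exporter's /probe endpoint. The helper below, a hypothetical blackbox_probe_url, builds that URL for one target, which is handy for fetching a probe by hand (for example with curl) when debugging.

```ruby
require 'uri'

# Hypothetical helper: reproduces the effect of the relabel rules for one
# target. The original target is URL-encoded into the `target` parameter.
def blackbox_probe_url(exporter_address, target, probe_module: 'http_2xx')
  query = URI.encode_www_form({ module: probe_module, target: target })
  "http://#{exporter_address}/probe?#{query}"
end

blackbox_probe_url('blackbox-exporter-host:9115', 'http://your-rails-app-domain.com/health')
# => "http://blackbox-exporter-host:9115/probe?module=http_2xx&target=http%3A%2F%2Fyour-rails-app-domain.com%2Fhealth"
```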

This setup lets Prometheus measure the availability and response time of your Ruby application's health endpoint from outside the application. Alerting rules can then be defined in Prometheus on blackbox_exporter metrics such as probe_success and probe_duration_seconds, with Alertmanager routing the resulting notifications.
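Alerting rules remain the right mechanism for paging, but for ad hoc checks or scripts it can help to ask Prometheus directly which probes are failing. blackbox_exporter sets probe_success to 0 when a probe fails, so a minimal Ruby sketch against the Prometheus HTTP query API (/api/v1/query) might look like the following. The prometheus-host:9090 address is a placeholder, and query_prometheus and failing_instances are hypothetical names.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Placeholder address for your Prometheus server (an assumption).
PROMETHEUS_URL = 'http://prometheus-host:9090'

# Sends an instant query to the Prometheus HTTP API and returns the raw JSON body.
def query_prometheus(expr, base_url: PROMETHEUS_URL)
  uri = URI("#{base_url}/api/v1/query")
  uri.query = URI.encode_www_form({ query: expr })
  response = Net::HTTP.get_response(uri)
  raise "Prometheus returned HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  response.body
end

# Extracts the instance label from each series in an instant-query result.
# With a filter such as `probe_success == 0`, only failing probes come back.
def failing_instances(response_json)
  data = JSON.parse(response_json)
  raise "Prometheus query failed: #{data['error']}" unless data['status'] == 'success'

  data.dig('data', 'result').map { |series| series.dig('metric', 'instance') }
end

# Usage (needs a reachable Prometheus server):
#   puts failing_instances(query_prometheus('probe_success{job="blackbox"} == 0'))
```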

Monitoring DynamoDB Performance and Health

DynamoDB, as a managed NoSQL database, offloads much of the operational burden. However, performance and cost optimization still require diligent monitoring. Key metrics to track include read/write capacity utilization, latency, and throttled requests.

Leveraging AWS CloudWatch Metrics

AWS CloudWatch is the primary service for monitoring DynamoDB. It provides a rich set of metrics that can be accessed via the AWS CLI, SDKs, or the AWS Management Console. For automated monitoring and alerting, we’ll focus on using the AWS CLI and integrating with Prometheus.

Key DynamoDB metrics to monitor:

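ConsumedReadCapacityUnits: The number of read capacity units consumed over the period; take the Sum and divide by the period length in seconds to get an average per-second rate.
ConsumedWriteCapacityUnits: The number of write capacity units consumed over the period (interpreted the same way).
ProvisionedReadCapacityUnits: The number of read capacity units provisioned per second (provisioned-capacity tables only).
ProvisionedWriteCapacityUnits: The number of write capacity units provisioned per second (provisioned-capacity tables only).
ReadThrottleEvents: The number of read events that were throttled.
WriteThrottleEvents: The number of write events that were throttled.
SuccessfulRequestLatency: The latency of successful requests in milliseconds, reported per table and per operation (the Operation dimension).
ThrottledRequests: The number of requests that were throttled, reported per operation. A request counts once even if several of its read or write events were throttled, so this is not simply the sum of the two throttle-event metrics.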
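Consumed and provisioned capacity are not directly comparable as reported: the consumed metrics are totals over the CloudWatch period, while provisioned capacity is a per-second rate. Dividing the Sum by the period length puts them on the same footing. A small sketch of that arithmetic, with a hypothetical method name:

```ruby
# Hypothetical helper: converts a CloudWatch Sum of consumed capacity over one
# period into an average per-second rate and expresses it as a percentage of
# provisioned capacity (units per second). Provisioned-mode tables only.
def capacity_utilization_percent(consumed_sum, period_seconds, provisioned_units)
  raise ArgumentError, 'provisioned_units must be positive' unless provisioned_units.positive?

  average_per_second = consumed_sum.to_f / period_seconds
  (average_per_second / provisioned_units * 100).round(1)
end

# 15,000 read units consumed over a 300-second period against 100 provisioned RCUs:
capacity_utilization_percent(15_000, 300, 100) # => 50.0
```

Because this is an average over the whole period, short bursts can still be throttled even when the computed utilization looks comfortable, which is why the throttle metrics are worth watching alongside it.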

Automated Metric Collection with AWS CLI and Cron

For a basic setup, you can use cron jobs to periodically fetch CloudWatch metrics using the AWS CLI and store them or push them to a time-series database. This is a less sophisticated approach than dedicated exporters but can be effective for smaller deployments.

#!/usr/bin/env bash
# Example script to fetch read throttle events for a table (uses GNU date syntax).
# Run it from cron as a script file: inside a crontab entry, '%' must be escaped as '\%'.
set -euo pipefail

aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name ReadThrottleEvents \
  --dimensions Name=TableName,Value=YourTableName \
  --start-time "$(date -u -d '5 minutes ago' '+%Y-%m-%dT%H:%M:%SZ')" \
  --end-time "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" \
  --period 300 \
  --statistics Sum \
  --output json >> dynamodb_metrics.log

This script fetches the sum of ReadThrottleEvents over the last 5 minutes. You would need to adapt this for other metrics and potentially parse the JSON output for more advanced processing.
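Appending pretty-printed JSON to a log file with >> produces a stream of concatenated documents that is awkward to parse later. One alternative, sketched here in Ruby under the assumption that the AWS CLI is installed and configured on the Droplet, is to invoke the same command and extract the value directly. cloudwatch_args, datapoint_value, and fetch_metric are hypothetical helper names.

```ruby
require 'json'
require 'open3'
require 'time'

# Builds the same get-metric-statistics call as the shell script, as an argv array.
def cloudwatch_args(metric, table, period: 300, statistic: 'Sum', now: Time.now.utc)
  [
    'aws', 'cloudwatch', 'get-metric-statistics',
    '--namespace', 'AWS/DynamoDB',
    '--metric-name', metric,
    '--dimensions', "Name=TableName,Value=#{table}",
    '--start-time', (now - period).iso8601,
    '--end-time', now.iso8601,
    '--period', period.to_s,
    '--statistics', statistic,
    '--output', 'json'
  ]
end

# Returns the statistic from the most recent datapoint, or nil when
# CloudWatch returned no datapoints for the window.
def datapoint_value(json, statistic: 'Sum')
  points = JSON.parse(json).fetch('Datapoints', [])
  return nil if points.empty?

  points.max_by { |point| Time.iso8601(point['Timestamp']) }[statistic]
end

# Runs the CLI without a shell and parses its output.
def fetch_metric(metric, table, statistic: 'Sum', period: 300)
  args = cloudwatch_args(metric, table, statistic: statistic, period: period)
  stdout, stderr, status = Open3.capture3(*args)
  raise "aws cloudwatch failed: #{stderr}" unless status.success?

  datapoint_value(stdout, statistic: statistic)
end

# Usage: fetch_metric('ReadThrottleEvents', 'YourTableName')
```

Returning nil rather than 0 when no datapoints come back keeps "no data" distinguishable from "zero throttles", which matters if the script feeds an alert.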

Integrating DynamoDB Metrics with Prometheus via CloudWatch Exporter

A more robust and scalable approach is to use the Prometheus cloudwatch_exporter. This exporter can be configured to fetch specific CloudWatch metrics and expose them in the Prometheus exposition format on its /metrics endpoint.

Deploying CloudWatch Exporter

Similar to blackbox_exporter, cloudwatch_exporter can be deployed via Docker. The prom/cloudwatch-exporter image reads its configuration from /config/config.yml and listens on port 9106 inside the container; the command below publishes that port on host port 9118.

docker run -d --name cloudwatch_exporter \
  -p 9118:9106 \
  -v $(pwd)/cloudwatch_exporter.yml:/config/config.yml \
  -e AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY_ID" \
  -e AWS_SECRET_ACCESS_KEY="YOUR_SECRET_ACCESS_KEY" \
  -e AWS_REGION="us-east-1" \
  prom/cloudwatch-exporter:latest

Ensure the IAM user associated with the provided credentials has permissions to access CloudWatch metrics (e.g., cloudwatch:GetMetricStatistics, cloudwatch:ListMetrics).

Configuring CloudWatch Exporter

The cloudwatch_exporter.yml file defines which metrics to collect. Here’s an example for DynamoDB:

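# cloudwatch_exporter.yml
region: us-east-1 # Your AWS region
metrics:
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ConsumedReadCapacityUnits
    aws_dimensions: [TableName]
    aws_statistics: [Sum]
    period_seconds: 300 # 5 minutes
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ConsumedWriteCapacityUnits
    aws_dimensions: [TableName]
    aws_statistics: [Sum]
    period_seconds: 300
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ReadThrottleEvents
    aws_dimensions: [TableName]
    aws_statistics: [Sum]
    period_seconds: 300
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: WriteThrottleEvents
    aws_dimensions: [TableName]
    aws_statistics: [Sum]
    period_seconds: 300
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: SuccessfulRequestLatency
    # Latency is reported per table and per operation (GetItem, Query, ...)
    aws_dimensions: [TableName, Operation]
    aws_statistics: [Average, Maximum]
    period_seconds: 60 # 1 minute
    # Each entry can be limited to specific tables, for example:
    # aws_dimension_select:
    #   TableName: [YourTableName1, YourTableName2]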
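When writing Prometheus queries, you need the names the exporter gives these metrics. As far as we understand its naming convention, it lowercases the namespace, snake-cases the CloudWatch metric name and appends the lowercased statistic, while dimensions become snake-cased labels such as table_name. The Ruby sketch below reproduces that convention; treat it as an assumption and confirm the names against the exporter's /metrics output.

```ruby
# Hypothetical re-implementation of the naming convention we believe
# cloudwatch_exporter uses; verify against its /metrics endpoint.
def to_snake_case(str)
  str.gsub(/([a-z0-9])([A-Z])/, '\1_\2').downcase
end

# Replaces characters that are invalid in Prometheus names and collapses underscores.
def safe_name(str)
  str.gsub(/[^a-zA-Z0-9:_]/, '_').gsub(/__+/, '_')
end

def exporter_metric_name(namespace, metric, statistic)
  safe_name("#{namespace.downcase}_#{to_snake_case(metric)}_#{statistic.downcase}")
end

def exporter_label_name(dimension)
  safe_name(to_snake_case(dimension))
end

exporter_metric_name('AWS/DynamoDB', 'ReadThrottleEvents', 'Sum')
# => "aws_dynamodb_read_throttle_events_sum"
exporter_label_name('TableName')
# => "table_name"
```

Under that assumption, a per-table throttling expression would be sum by (table_name) (aws_dynamodb_read_throttle_events_sum).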

Configuring Prometheus to Scrape CloudWatch Exporter

Add a scrape job to your prometheus.yml to collect metrics from the cloudwatch_exporter.

scrape_configs:
  - job_name: 'cloudwatch_dynamodb'
    # Each scrape triggers CloudWatch API calls, which are billed, so avoid
    # scraping more often than the shortest metric period
    scrape_interval: 60s
    static_configs:
      - targets:
        - 'cloudwatch-exporter-host:9118' # Address of your cloudwatch_exporter instance
        labels:
          aws_region: 'us-east-1' # Attach the region the exporter is configured for
    metric_relabel_configs:
      # Keep only DynamoDB metrics (named like aws_dynamodb_read_throttle_events_sum)
      - source_labels: [__name__]
        regex: 'aws_dynamodb_.*'
        action: keep

With this setup, Prometheus will ingest DynamoDB metrics, allowing you to create dashboards in Grafana and set up alerts for conditions like high read/write utilization, throttling, or increased latency. This proactive monitoring is essential for maintaining application performance and keeping DynamoDB costs under control for an application hosted on DigitalOcean.