Server Monitoring Best Practices: Keeping Your Ruby App and DynamoDB Tables Alive on DigitalOcean
Proactive Health Checks for Ruby Applications on DigitalOcean
Maintaining the health of a Ruby application deployed on DigitalOcean requires a multi-layered monitoring strategy. Beyond basic CPU and memory utilization, we need to ensure the application itself is responsive and its critical dependencies, like DynamoDB, are performing optimally. This section focuses on implementing granular checks for your Ruby application.
Application-Level Health Endpoint
A fundamental practice is exposing a dedicated health check endpoint within your Ruby application. This endpoint should perform essential checks: database connectivity, cache status, and the availability of critical external services. For a Rails application, this can be implemented as a simple controller action.
Consider a basic implementation in Rails:
# app/controllers/health_controller.rb
class HealthController < ApplicationController
  def show
    # Check the database connection. `execute` raises (rather than returning
    # false) when the database is unreachable, so failures reach the rescue below.
    ActiveRecord::Base.connection.execute('SELECT 1')

    # Add checks for other critical services (e.g., Redis, external APIs)
    # Example:
    # unless $redis.ping == 'PONG'
    #   render json: { status: 'error', message: 'Redis connection failed' }, status: 503
    #   return
    # end

    render json: { status: 'ok', message: 'Application is healthy' }, status: 200
  rescue StandardError => e
    # Respond 503 so probes and load balancers treat the instance as unhealthy,
    # and log details instead of echoing raw error messages from a public endpoint.
    Rails.logger.error("Health check failed: #{e.class}: #{e.message}")
    render json: { status: 'error', message: "Health check failed (#{e.class})" }, status: 503
  end
end
And its corresponding route:
# config/routes.rb
Rails.application.routes.draw do
  get 'health', to: 'health#show'
  # ... other routes
end
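The controller above is tied to Rails. As a framework-agnostic sketch of the same idea, the hypothetical HealthCheck class below runs every named check instead of stopping at the first failure, so the response reports which dependency is down, and it returns 503 whenever any check fails. It follows the Rack calling convention (call(env) returning [status, headers, body]) but needs only Ruby’s standard library; the check names and lambdas are placeholders standing in for real database or Redis pings.

```ruby
require 'json'

# Hypothetical, framework-agnostic health checker. Each check is a named
# callable that returns truthy when healthy; a falsy result or an exception
# marks that check as failed.
class HealthCheck
  def initialize(checks)
    @checks = checks
  end

  # Run every check and record true/false per check name.
  def results
    @checks.to_h do |name, check|
      ok = begin
        check.call ? true : false
      rescue StandardError
        false
      end
      [name, ok]
    end
  end

  # Rack-style entry point: 200 when all checks pass, 503 otherwise.
  def call(_env = {})
    res = results
    healthy = res.values.all?
    body = JSON.generate(status: healthy ? 'ok' : 'error', checks: res)
    [healthy ? 200 : 503, { 'content-type' => 'application/json' }, [body]]
  end
end

# Placeholder checks: one passes, one simulates a Redis outage.
checker = HealthCheck.new(
  database: -> { true },
  redis:    -> { raise 'connection refused' }
)
status, _headers, body = checker.call
puts status # => 503
puts body.first
```

Keep each check cheap, since the endpoint is hit on every probe interval.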
External Monitoring with DigitalOcean Monitoring and Prometheus
While the application-level health endpoint is crucial for internal checks, external monitoring is vital for simulating user experience and detecting issues before they impact users. DigitalOcean’s built-in monitoring provides basic infrastructure metrics. For more advanced application-level checks, integrating Prometheus with exporters is a robust solution.
We can use a tool like blackbox_exporter to probe our application’s health endpoint from an external perspective. This involves deploying blackbox_exporter and configuring Prometheus to scrape it.
Deploying Blackbox Exporter
A simple way to deploy blackbox_exporter is using Docker on a separate DigitalOcean Droplet or within your existing infrastructure.
docker run -d --name blackbox_exporter \
  -p 9115:9115 \
  -v "$(pwd)/blackbox.yml:/config/blackbox.yml" \
  prom/blackbox-exporter:latest \
  --config.file=/config/blackbox.yml
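The blackbox.yml configuration file defines the probes. The http_2xx module below probes our Rails app’s health endpoint over HTTP, while tcp_connect only checks that a TCP connection can be opened: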
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      # Expect a 200 OK status code (an empty list would accept any 2xx)
      valid_status_codes: [200]
      # Optionally, fail the probe unless the response body matches a regex
      # fail_if_body_not_matches_regexp:
      #   - "Application is healthy"
      # tls_config:
      #   insecure_skip_verify: true  # Use with caution, only for internal testing if needed
  tcp_connect:
    prober: tcp
    timeout: 5s
    tcp:
      # The port to connect to comes from the probe target (host:port)
      preferred_ip_protocol: "ip4"
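To make the http_2xx module’s semantics concrete, here is a rough Ruby analogue using only Net::HTTP. It fetches the target within a timeout and counts the probe as successful only if the status code is allowed and, optionally, the body matches a pattern. The http_probe method and ProbeResult struct are hypothetical names for illustration; blackbox_exporter itself is written in Go and records much more detail (for example, per-phase HTTP timings).

```ruby
require 'net/http'
require 'uri'

# Illustrative analogue of the http_2xx module: GET the target within a
# timeout, then require an allowed status code and an optional body match.
# Mirrors the idea behind probe_success / probe_duration_seconds only.
ProbeResult = Struct.new(:success, :status, :duration_seconds, :error)

def elapsed_since(start)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

def http_probe(url, timeout: 5, valid_status_codes: [200], body_regexp: nil)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https',
                             open_timeout: timeout, read_timeout: timeout) do |http|
    http.get(uri.request_uri)
  end
  status = response.code.to_i
  success = valid_status_codes.include?(status) &&
            (body_regexp.nil? || body_regexp.match?(response.body.to_s))
  ProbeResult.new(success, status, elapsed_since(started), nil)
rescue StandardError => e
  # Connection refused, DNS failures, timeouts, etc. all count as a failed probe.
  ProbeResult.new(false, nil, elapsed_since(started), e.class.name)
end
```

For example, http_probe('http://your-rails-app-domain.com/health', body_regexp: /Application is healthy/) corresponds to enabling the optional body check in the module above.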
Configuring Prometheus
In your Prometheus configuration (prometheus.yml), add a scrape job for the blackbox_exporter. This job will instruct Prometheus to query the blackbox_exporter for specific targets.
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]  # Or [tcp_connect], which expects host:port targets rather than URLs
    static_configs:
      - targets:
          - http://your-rails-app-domain.com/health  # Target your application's health endpoint
          - http://another-app-instance.com/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter-host:9115  # Address of your blackbox_exporter instance
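The relabeling rules are the least obvious part of this job. The sketch below, using a hypothetical blackbox_scrape_url helper, walks one target through the three rules: the URL is copied into the target query parameter and the instance label, and the scrape address is then pointed at the exporter. It models only these default replace steps, not Prometheus’s full relabeling engine.

```ruby
require 'uri'

# Illustrative walk-through of the three relabel rules in the 'blackbox' job
# for a single static target. Returns the URL Prometheus would scrape and the
# resulting instance label.
def blackbox_scrape_url(target, exporter:, module_name: 'http_2xx', metrics_path: '/probe')
  labels = { '__address__' => target }
  labels['__param_target'] = labels['__address__']    # rule 1: URL becomes ?target=
  labels['instance']       = labels['__param_target'] # rule 2: keep the app URL as instance
  labels['__address__']    = exporter                 # rule 3: actually scrape the exporter
  query = URI.encode_www_form('module' => module_name, 'target' => labels['__param_target'])
  ["http://#{labels['__address__']}#{metrics_path}?#{query}", labels['instance']]
end

url, instance = blackbox_scrape_url('http://your-rails-app-domain.com/health',
                                    exporter: 'blackbox-exporter-host:9115')
puts url
puts instance
```

Without the second rule, every series would carry the exporter’s address as its instance label, making it hard to tell which application instance failed.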
This setup allows Prometheus to monitor the availability and responsiveness of your Ruby application’s health endpoint from an external perspective. Alerting rules can then be defined in Prometheus on the metrics blackbox_exporter generates (for example, probe_success and probe_duration_seconds), with Alertmanager handling notification routing.
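When an alert fires, it can help to query the exporter directly (for example, http://blackbox-exporter-host:9115/probe?module=http_2xx&target= followed by the URL-encoded target) and read probe_success, which is 1 for a passing probe and 0 for a failing one. The minimal parser below is an illustrative sketch with hypothetical method names; it handles only simple sample lines of the Prometheus text format, not the full specification.

```ruby
# Minimal, illustrative parser for simple Prometheus text-format samples:
# "name value" and "name{labels} value". Comment lines (# HELP / # TYPE)
# and anything unparseable are skipped.
SAMPLE_LINE = /\A([a-zA-Z_:][a-zA-Z0-9_:]*(?:\{[^}]*\})?)\s+(\S+)/

def parse_samples(text)
  text.each_line.each_with_object({}) do |line, samples|
    line = line.strip
    next if line.empty? || line.start_with?('#')
    match = SAMPLE_LINE.match(line)
    next unless match
    value = case match[2]
            when 'NaN'  then Float::NAN
            when '+Inf' then Float::INFINITY
            when '-Inf' then -Float::INFINITY
            else Float(match[2], exception: false)
            end
    samples[match[1]] = value unless value.nil?
  end
end

# probe_success is 1 when the probe passed and 0 when it failed;
# a missing sample is treated as unhealthy.
def probe_healthy?(exposition_text)
  parse_samples(exposition_text)['probe_success'] == 1.0
end

# Illustrative response body, not captured from a real exporter.
sample = <<~METRICS
  # TYPE probe_success gauge
  probe_success 1
  probe_http_status_code 200
METRICS

puts probe_healthy?(sample) # => true
```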
Monitoring DynamoDB Performance and Health
DynamoDB, as a managed NoSQL database, offloads much of the operational burden. However, performance and cost optimization still require diligent monitoring. Key metrics to track include read/write capacity utilization, latency, and throttled requests.
Leveraging AWS CloudWatch Metrics
AWS CloudWatch is the primary service for monitoring DynamoDB. It provides a rich set of metrics that can be accessed via the AWS CLI, SDKs, or the AWS Management Console. For automated monitoring and alerting, we’ll focus on using the AWS CLI and integrating with Prometheus.
Key DynamoDB metrics to monitor:
ConsumedReadCapacityUnits: The number of read capacity units consumed by your operations.
ConsumedWriteCapacityUnits: The number of write capacity units consumed.
ProvisionedReadCapacityUnits: The number of read capacity units provisioned.
ProvisionedWriteCapacityUnits: The number of write capacity units provisioned.
ReadThrottleEvents: The number of read requests that were throttled.
WriteThrottleEvents: The number of write requests that were throttled.
SuccessfulRequestLatency: The latency of successful requests.
ThrottledRequests: The number of requests that were throttled (combines read and write).
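The consumed and provisioned metrics are not directly comparable: the Sum of ConsumedReadCapacityUnits or ConsumedWriteCapacityUnits is a total over the CloudWatch period, while the provisioned values are per-second rates, so the Sum has to be divided by the period length first. A minimal sketch, with a hypothetical capacity_utilization method:

```ruby
# Utilization = (Sum of consumed units / period length in seconds) / provisioned units per second.
# Returns nil when there is no provisioned capacity to compare against
# (for example, a table in on-demand mode).
def capacity_utilization(consumed_sum:, period_seconds:, provisioned_units:)
  raise ArgumentError, 'period_seconds must be positive' unless period_seconds.positive?
  return nil if provisioned_units.nil? || provisioned_units.zero?

  (consumed_sum.to_f / period_seconds) / provisioned_units
end

# 15,000 read units consumed over a 300-second period against 100 provisioned RCUs
puts capacity_utilization(consumed_sum: 15_000, period_seconds: 300, provisioned_units: 100) # => 0.5
```

Alerting when this ratio stays above a chosen threshold (0.8, say, which is only an example) gives some warning before requests start being throttled.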
Automated Metric Collection with AWS CLI and Cron
For a basic setup, you can use cron jobs to periodically fetch CloudWatch metrics using the AWS CLI and store them or push them to a time-series database. This is a less sophisticated approach than dedicated exporters but can be effective for smaller deployments.
# Example script: fetch read throttle events for a table over the last 5 minutes.
# Uses GNU date syntax (-d); on macOS/BSD, compute the timestamps differently.
# If you inline this in a crontab entry, escape each % as \%.
aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name ReadThrottleEvents \
  --dimensions Name=TableName,Value=YourTableName \
  --start-time "$(date -u -d '5 minutes ago' '+%Y-%m-%dT%H:%M:%SZ')" \
  --end-time "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" \
  --period 300 \
  --statistics Sum \
  --output json >> dynamodb_metrics.log
This script fetches the sum of ReadThrottleEvents over the last 5 minutes. You would need to adapt this for other metrics and potentially parse the JSON output for more advanced processing.
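Parsing can stay small with Ruby’s standard JSON library. This sketch assumes you capture the output of a single get-metric-statistics call; appending several pretty-printed documents to one log file, as the script above does, would need extra splitting. The total_sum method name and the sample document are illustrative, not real data. CloudWatch returns an empty Datapoints array when nothing was recorded in the window, so the sketch treats that as zero.

```ruby
require 'json'

# Totals the "Sum" values from one `aws cloudwatch get-metric-statistics
# --statistics Sum --output json` document. Empty Datapoints => 0.
def total_sum(json_text)
  datapoints = JSON.parse(json_text).fetch('Datapoints', [])
  datapoints.sum { |dp| dp.fetch('Sum', 0).to_f }
end

# Made-up sample in the shape the CLI returns.
sample_output = <<~JSON
  {
    "Label": "ReadThrottleEvents",
    "Datapoints": [
      { "Timestamp": "2024-01-01T00:00:00Z", "Sum": 3.0, "Unit": "Count" }
    ]
  }
JSON

throttled = total_sum(sample_output)
warn "ReadThrottleEvents in window: #{throttled}" if throttled.positive?
```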
Integrating DynamoDB Metrics with Prometheus via CloudWatch Exporter
A more robust and scalable approach is to use the Prometheus cloudwatch_exporter. This exporter can be configured to scrape specific CloudWatch metrics and expose them in Prometheus-readable format.
Deploying CloudWatch Exporter
Similar to blackbox_exporter, cloudwatch_exporter can be deployed via Docker.
# The official image listens on port 9106 and reads /config/config.yml by default;
# host port 9118 is mapped to it to match the Prometheus scrape job below.
docker run -d --name cloudwatch_exporter \
  -p 9118:9106 \
  -v "$(pwd)/cloudwatch_exporter.yml:/config/config.yml" \
  -e AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY_ID" \
  -e AWS_SECRET_ACCESS_KEY="YOUR_SECRET_ACCESS_KEY" \
  -e AWS_REGION="us-east-1" \
  prom/cloudwatch-exporter:latest
Ensure the IAM user associated with the provided credentials has permissions to access CloudWatch metrics (e.g., cloudwatch:GetMetricStatistics, cloudwatch:ListMetrics).
Configuring CloudWatch Exporter
The cloudwatch_exporter.yml file defines which metrics to collect. Here’s an example for DynamoDB:
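# cloudwatch_exporter.yml
region: us-east-1  # Your AWS region
metrics:
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ConsumedReadCapacityUnits
    aws_dimensions: [TableName]
    aws_statistics: [Sum]
    period_seconds: 300  # 5 minutes
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ConsumedWriteCapacityUnits
    aws_dimensions: [TableName]
    aws_statistics: [Sum]
    period_seconds: 300
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: ReadThrottleEvents
    aws_dimensions: [TableName]
    aws_statistics: [Sum]
    period_seconds: 300
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: WriteThrottleEvents
    aws_dimensions: [TableName]
    aws_statistics: [Sum]
    period_seconds: 300
  - aws_namespace: AWS/DynamoDB
    aws_metric_name: SuccessfulRequestLatency
    aws_dimensions: [TableName, Operation]  # Latency is reported per operation
    aws_statistics: [Average, Maximum]
    period_seconds: 60  # 1 minute
    # You can also restrict collection to specific tables if needed:
    # aws_dimension_select:
    #   TableName: [YourTableName1, YourTableName2]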
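Dashboards, alerts, and relabeling rules all need the Prometheus-side metric names. The sketch below approximates the naming convention of the official prometheus/cloudwatch_exporter (an aws_ prefix plus the namespace, the metric name in snake_case, and the statistic), with dimensions such as TableName becoming labels like table_name. Treat this convention as an assumption and confirm it against the exporter’s own /metrics output; the helper names are hypothetical.

```ruby
# Approximation (an assumption, not a specification) of how the exporter
# derives series names: aws_<namespace>_<metric_snake_case>_<statistic>.
def snake_case(str)
  str.gsub(/([a-z0-9])([A-Z])/, '\1_\2').downcase
end

def exporter_metric_name(namespace, metric, statistic)
  service = namespace.downcase.gsub(/[^a-z0-9:_]/, '_') # "AWS/DynamoDB" -> "aws_dynamodb"
  "#{service}_#{snake_case(metric)}_#{snake_case(statistic)}"
end

puts exporter_metric_name('AWS/DynamoDB', 'ConsumedReadCapacityUnits', 'Sum')
# => aws_dynamodb_consumed_read_capacity_units_sum
```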
Configuring Prometheus to Scrape CloudWatch Exporter
Add a scrape job to your prometheus.yml to collect metrics from the cloudwatch_exporter.
scrape_configs:
  - job_name: 'cloudwatch_dynamodb'
    # CloudWatch data is published per period and each scrape triggers billed
    # CloudWatch API calls, so scraping much faster than that adds little.
    scrape_interval: 60s
    static_configs:
      - targets:
          - 'cloudwatch-exporter-host:9118'  # Address of your cloudwatch_exporter instance
    metric_relabel_configs:
      # Keep only DynamoDB series (the exporter names them aws_dynamodb_*)
      - source_labels: [__name__]
        regex: 'aws_dynamodb_.*'
        action: keep
      # The exporter already turns the TableName dimension into a table_name
      # label, so no extra relabeling is needed for it.
With this setup, Prometheus will ingest DynamoDB metrics, allowing you to create dashboards in Grafana and set up alerts for conditions like high read/write utilization, throttling, or increased latency. This proactive monitoring is essential for keeping the DigitalOcean-hosted application responsive and for controlling DynamoDB costs on AWS.