Server Monitoring Best Practices: Keeping Your Perl App and MongoDB Clusters Alive on DigitalOcean
Establishing a Baseline: Essential Metrics for Perl Apps and MongoDB
Before diving into complex alerting, a robust monitoring strategy begins with understanding your system’s normal operating parameters. For a Perl application, this means tracking request latency, error rates, and resource consumption (CPU, memory, disk I/O). For MongoDB clusters, key indicators include query performance, replication lag, connection counts, and disk usage.
Perl Application Monitoring with `ps` and `top` (Initial Triage)
For immediate, on-server diagnostics of your Perl application, standard Unix utilities are invaluable. While not a long-term monitoring solution, they provide critical real-time insights during an incident.
Identifying High-CPU Perl Processes
A common symptom of an unhealthy Perl application is a runaway process consuming excessive CPU. The top command, or its more modern counterpart htop, is your first line of defense.
Using top
Run top and look for processes with a high `%CPU` value. Note the `PID` (Process ID) of the offending Perl script. You can filter top to show only Perl processes:
top -p $(pgrep -d, perl)
Using htop (Recommended)
htop offers a more user-friendly, color-coded interface. Install it if you don’t have it (e.g., sudo apt-get install htop or sudo yum install htop). Once installed, run htop and press F4 to filter by command name, then type “perl”.
Inspecting Memory Usage
High memory usage can lead to swapping, drastically degrading performance. Use ps to get a detailed view of memory consumption.
ps aux | grep perl | grep -v grep | awk '{print $2, $4, $6, $11}' | sort -k2 -nr
This command outputs: PID, %MEM, RSS (Resident Set Size in KB), and the command name. Sorting by `%MEM` (descending) helps pinpoint memory hogs.
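When the one-liner gets unwieldy, the same ranking can be done in a few lines of Perl. This is only a sketch: it assumes a Linux ps that accepts -eo pid,pmem,rss,args, and the helper name top_perl_by_rss is made up for illustration.

```perl
use strict;
use warnings;

# Parse the text produced by `ps -eo pid,pmem,rss,args` and return the
# Perl processes sorted by resident set size (largest first).
sub top_perl_by_rss {
    my ($ps_output) = @_;
    my @procs;
    for my $line (split /\n/, $ps_output) {
        # Columns: PID, %MEM, RSS in KiB, then the full command line.
        # The header row has no leading number, so it is skipped here.
        next unless $line =~ /^\s*(\d+)\s+([\d.]+)\s+(\d+)\s+(.+)$/;
        my ($pid, $pmem, $rss_kb, $cmd) = ($1, $2, $3, $4);
        next unless $cmd =~ /\bperl\b/;
        push @procs, { pid => $pid, pmem => $pmem, rss_kb => $rss_kb, cmd => $cmd };
    }
    return sort { $b->{rss_kb} <=> $a->{rss_kb} } @procs;
}

# Run against the live process table when executed directly.
my $live = `ps -eo pid,pmem,rss,args 2>/dev/null`;
printf "%-8s %-6s %-10s %s\n", $_->{pid}, $_->{pmem}, $_->{rss_kb}, $_->{cmd}
    for top_perl_by_rss($live // '');
```

Sorting on the RSS column rather than %MEM avoids ties between small processes that all round to 0.0.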
Advanced Perl Application Monitoring: Prometheus & Exporters
For production environments, relying solely on shell commands is insufficient. A robust monitoring stack is essential. Prometheus, with its time-series database and powerful query language (PromQL), is a de facto standard. We’ll use a custom exporter for Perl metrics.
Developing a Custom Perl Exporter
You can write a simple Perl script that exposes metrics via an HTTP endpoint, which Prometheus can then scrape. This script will need to gather application-specific metrics.
Example: Basic Perl Metrics Exporter
This example uses HTTP::Server::Simple::CGI (part of the HTTP::Server::Simple distribution) to expose metrics in the Prometheus text format. It reads process statistics directly from /proc, so it runs on Linux only. Install the server module with: cpan HTTP::Server::Simple.
package MetricsExporter;

use strict;
use warnings;
use base qw(HTTP::Server::Simple::CGI);
use POSIX qw(sysconf _SC_CLK_TCK _SC_PAGESIZE);

# Placeholder request counter. A Prometheus counter must never decrease,
# so replace this with the real total maintained by your application.
my $request_count = 0;

# Read CPU time, resident memory, and open file descriptors from /proc.
sub process_stats {
    my ($pid) = @_;
    open(my $fh, '<', "/proc/$pid/stat") or return;
    my $line = <$fh>;
    close($fh);
    return unless defined $line;
    # The command name may contain spaces, so drop everything up to the last ')'.
    $line =~ s/^.*\)\s+//s;
    my @f = split ' ', $line;
    opendir(my $dh, "/proc/$pid/fd") or return;
    my $fd_count = grep { !/^\.\.?$/ } readdir($dh);
    closedir($dh);
    my $ticks = sysconf(_SC_CLK_TCK)  || 100;
    my $page  = sysconf(_SC_PAGESIZE) || 4096;
    return {
        cpu_seconds => ($f[11] + $f[12]) / $ticks,   # utime + stime, in clock ticks
        rss_bytes   => $f[21] * $page,               # rss, in pages
        fd_count    => $fd_count,
    };
}

# Called by HTTP::Server::Simple::CGI for every incoming request.
sub handle_request {
    my ($self, $cgi) = @_;
    my $body = '';
    # Stats for the current script (the exporter itself)
    my $pid   = $$;
    my $stats = process_stats($pid);
    if ($stats) {
        $body .= "# HELP perl_process_cpu_seconds_total Total CPU time spent by the process in seconds.\n";
        $body .= "# TYPE perl_process_cpu_seconds_total counter\n";
        $body .= "perl_process_cpu_seconds_total{pid=\"$pid\"} $stats->{cpu_seconds}\n\n";
        $body .= "# HELP perl_process_resident_memory_bytes Resident set size of the process in bytes.\n";
        $body .= "# TYPE perl_process_resident_memory_bytes gauge\n";
        $body .= "perl_process_resident_memory_bytes{pid=\"$pid\"} $stats->{rss_bytes}\n\n";
        $body .= "# HELP perl_process_open_fds Number of file descriptors opened by the process.\n";
        $body .= "# TYPE perl_process_open_fds gauge\n";
        $body .= "perl_process_open_fds{pid=\"$pid\"} $stats->{fd_count}\n\n";
    } else {
        $body .= "# ERROR: Could not retrieve stats for PID $pid\n";
    }
    # Add application-specific metrics here (e.g., request count, error count)
    # Example: simulate a request counter that only ever grows
    $request_count += int(rand(10));
    $body .= "# HELP myapp_requests_total Total requests processed by the application.\n";
    $body .= "# TYPE myapp_requests_total counter\n";
    $body .= "myapp_requests_total $request_count\n\n";
    print "HTTP/1.0 200 OK\r\n";
    print "Content-Type: text/plain; version=0.0.4; charset=utf-8\r\n\r\n";
    print $body;
}

package main;

# node_exporter uses port 9100 by default, so use a different one for custom exporters
MetricsExporter->new(9101)->run();
To run this exporter, save it as exporter.pl and execute it: perl exporter.pl. You can then access the metrics at http://your_server_ip:9101/metrics, the path Prometheus scrapes by default.
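As the exporter grows, the repeated HELP/TYPE/sample triplets are easier to maintain through one helper. The sketch below is an illustration, not part of any Prometheus library: render_metric and escape_label_value are hypothetical names, and it handles one sample per metric family only.

```perl
use strict;
use warnings;

# Escape a label value as the Prometheus text format requires:
# backslash, double quote, and newline must be backslash-escaped.
sub escape_label_value {
    my ($value) = @_;
    $value =~ s/\\/\\\\/g;
    $value =~ s/"/\\"/g;
    $value =~ s/\n/\\n/g;
    return $value;
}

# Render one metric family with a single sample.
# Expects name, help, type, value, and an optional labels hashref.
sub render_metric {
    my (%m) = @_;
    my $labels = $m{labels} || {};
    my $label_str = join ',',
        map { sprintf '%s="%s"', $_, escape_label_value($labels->{$_}) }
        sort keys %$labels;
    $label_str = '{' . $label_str . '}' if length $label_str;
    return "# HELP $m{name} $m{help}\n"
         . "# TYPE $m{name} $m{type}\n"
         . $m{name} . $label_str . " $m{value}\n";
}

print render_metric(
    name   => 'myapp_requests_total',
    help   => 'Total requests processed by the application.',
    type   => 'counter',
    labels => { handler => '/login' },
    value  => 42,
);
```

Sorting the label names keeps the output stable between scrapes, which makes diffs of the endpoint easier to read.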
Configuring Prometheus to Scrape the Exporter
Edit your prometheus.yml configuration file to include a scrape job for your Perl application exporter.
scrape_configs:
  - job_name: 'perl_app'
    static_configs:
      - targets: ['your_server_ip:9101'] # Replace with your server's IP and exporter port
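Before reloading Prometheus, it can help to confirm that the endpoint returns parseable samples. The following is a rough sketch using the core HTTP::Tiny module; parse_samples is a hypothetical helper, the parser is deliberately simplified, and the URL is whatever you pass on the command line.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Parse Prometheus text-format output into a hash of "name{labels}" => value.
# Comment lines (# HELP, # TYPE) and blank lines are skipped.
sub parse_samples {
    my ($text) = @_;
    my %samples;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(#|$)/;
        my ($series, $value) = $line =~ /^(\S+?(?:\{.*\})?)\s+(\S+)/ or next;
        $samples{$series} = $value;
    }
    return \%samples;
}

# Fetch and list the samples when a URL is given, for example:
#   perl check_exporter.pl http://your_server_ip:9101/metrics
if (@ARGV) {
    my $res = HTTP::Tiny->new(timeout => 5)->get($ARGV[0]);
    die "Scrape failed: $res->{status} $res->{reason}\n" unless $res->{success};
    my $samples = parse_samples($res->{content});
    printf "%s = %s\n", $_, $samples->{$_} for sort keys %$samples;
}
```

An empty listing or a failed request here usually means Prometheus will mark the target as down as well.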
MongoDB Cluster Monitoring: Built-in Metrics and Tools
MongoDB provides a rich set of metrics accessible via the mongostat and mongotop command-line utilities, as well as through MongoDB Cloud Manager (the successor to the MongoDB Management Service, MMS), MongoDB Atlas, and Ops Manager in MongoDB Enterprise.
Real-time MongoDB Performance with mongostat
mongostat provides a quick overview of MongoDB server statistics. It’s excellent for understanding current load and identifying bottlenecks.
mongostat --host mongodb_host:27017 --username mongo_user --password mongo_password --authenticationDatabase admin --rowcount 10 5
Key metrics to watch:
- insert, query, update, delete: Operations per second. Spikes can indicate heavy load or inefficient queries.
- getmore: Number of getMore operations per second. High values can indicate large cursors or inefficient pagination.
- dirty: Percentage of the WiredTiger cache holding dirty data. High values might suggest I/O bottlenecks or insufficient cache size.
- used: Percentage of the WiredTiger cache in use.
- qrw: Clients queued for reads and writes, shown as read|write (older releases print separate qr and qw columns). High numbers indicate contention.
- idx miss %: Reported only by older, MMAPv1-era releases, where it showed the percentage of index accesses that needed a page fault. Current versions omit it, so check index usage with explain() or the profiler instead.
- net_in, net_out (netIn, netOut in older releases): Network traffic in bytes.
- conn: Number of open connections.
Identifying Slow Queries with mongotop
mongotop reports on the time MongoDB spends reading and writing data per collection. This is crucial for pinpointing slow-performing collections.
mongotop --host mongodb_host:27017 --username mongo_user --password mongo_password --authenticationDatabase admin 5
Look for collections whose total, read, or write times stay high from one interval to the next. This often points to missing indexes or inefficient query patterns.
MongoDB Monitoring with Prometheus and the MongoDB Exporter
For continuous, historical monitoring of your MongoDB cluster, the Prometheus MongoDB exporter is indispensable. It exposes a wide range of MongoDB metrics in a format Prometheus can ingest.
Setting up the MongoDB Exporter
The MongoDB exporter is typically run as a separate service. You can download pre-compiled binaries or build from source. Ensure the user running the exporter has sufficient read permissions on the MongoDB instances.
Configuration Example
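How the exporter is configured depends on which one you deploy, so check its documentation before copying this example. The layout below assumes an exporter that reads a YAML file (e.g., mongodb_exporter.yml) listing the MongoDB connection strings and the metrics to collect.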
# mongodb_exporter.yml
mongodb:
  - name: "my_replica_set"
    uri: "mongodb://mongo_user:mongo_password@mongo1.example.com:27017,mongo2.example.com:27017,mongo3.example.com:27017/?replicaSet=myReplicaSet&authSource=admin"
    # Optional: specify specific databases or collections to monitor
    # databases:
    #   - name: "mydb"
    #     collections:
    #       - "mycollection"
metrics:
  - name: "mongodb_up"
    enabled: true
  - name: "mongodb_mongod_connections"
    enabled: true
  - name: "mongodb_mongod_network_bytes_in"
    enabled: true
  - name: "mongodb_mongod_network_bytes_out"
    enabled: true
  - name: "mongodb_mongod_opcounters_insert"
    enabled: true
  - name: "mongodb_mongod_opcounters_query"
    enabled: true
  - name: "mongodb_mongod_opcounters_update"
    enabled: true
  - name: "mongodb_mongod_opcounters_delete"
    enabled: true
  - name: "mongodb_mongod_opcounters_getmore"
    enabled: true
  - name: "mongodb_mongod_opcounters_command"
    enabled: true
  - name: "mongodb_mongod_cache_used_percent"
    enabled: true
  - name: "mongodb_mongod_cache_dirty_percent"
    enabled: true
  - name: "mongodb_mongod_locks_time_acquiring_micros"
    enabled: true
  - name: "mongodb_mongod_locks_time_waiting_micros"
    enabled: true
  - name: "mongodb_replset_member_state"
    enabled: true
  - name: "mongodb_replset_member_optime_lag"
    enabled: true
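Start the exporter (assuming it’s installed and configured). If your exporter takes its connection string from a flag, as the Percona exporter does with --mongodb.uri, use that in place of --config.file: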
./mongodb_exporter --config.file=mongodb_exporter.yml --web.listen-address=":9216"
Configuring Prometheus for MongoDB Exporter
Add a new job to your prometheus.yml to scrape the MongoDB exporter.
scrape_configs:
  - job_name: 'mongodb'
    static_configs:
      - targets: ['your_mongodb_exporter_ip:9216'] # Replace with your exporter's IP and port
Alerting Strategies: Defining Critical Thresholds
Once you have your metrics flowing into Prometheus, the next step is to define meaningful alerts. Alerts should be actionable and indicate a deviation from normal behavior that requires intervention.
Perl Application Alerts
These alerts focus on the health and performance of your Perl application instances.
# Alerting rules for Perl application
groups:
  - name: perl_app_alerts
    rules:
      - alert: HighCpuUsagePerlApp
        expr: avg by (instance) (rate(perl_process_cpu_seconds_total{job="perl_app"}[5m])) * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "Perl application on {{ $labels.instance }} has used more than 80% of a CPU core for the last 5 minutes."
      - alert: HighMemoryUsagePerlApp
        expr: perl_process_resident_memory_bytes{job="perl_app"} / (1024 * 1024 * 1024) > 4 # Alert if memory usage exceeds 4GB
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High Memory Usage on {{ $labels.instance }}"
          description: "Perl application on {{ $labels.instance }} is using more than 4GB of RAM."
      - alert: PerlAppExporterDown
        expr: up{job="perl_app"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Perl application exporter is down on {{ $labels.instance }}"
          description: "Prometheus cannot scrape metrics from the Perl application exporter on {{ $labels.instance }}."
      - alert: HighRequestRatePerlApp
        expr: rate(myapp_requests_total{job="perl_app"}[5m]) * 60 > 1000 # rate() is per second, so multiply by 60 to alert above 1000 req/min
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "High request rate on {{ $labels.instance }}"
          description: "Perl application on {{ $labels.instance }} is processing {{ $value | printf \"%.0f\" }} requests per minute."
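To make the CPU expression less opaque, here is roughly what a rate over a CPU-seconds counter, multiplied by 100, works out to between two scrapes. This is a simplified sketch (Prometheus also extrapolates to the window edges), and cpu_percent is a hypothetical helper name.

```perl
use strict;
use warnings;

# Approximate rate(counter[window]) * 100 for a CPU-seconds counter:
# the share of one core used between two samples.
# Each sample is [unix_timestamp, counter_value].
sub cpu_percent {
    my ($older, $newer) = @_;
    my $elapsed = $newer->[0] - $older->[0];
    return if $elapsed <= 0;
    my $delta = $newer->[1] - $older->[1];
    # A counter that went backwards means the process restarted; count
    # only what accumulated since the reset, as Prometheus does.
    $delta = $newer->[1] if $delta < 0;
    return 100 * $delta / $elapsed;
}

# 240 CPU-seconds consumed over 300 wall-clock seconds.
printf "%.1f%%\n", cpu_percent([1000, 12.0], [1300, 252.0]);   # prints 80.0%
```

A value above 100 is possible for a multi-threaded or multi-process application, because the counter sums CPU time across cores.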
MongoDB Cluster Alerts
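These alerts focus on the health, performance, and availability of your MongoDB replica sets and individual nodes. Metric names and labels differ between exporters and versions, so confirm each one against your exporter's /metrics output before deploying these rules.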
# Alerting rules for MongoDB cluster
groups:
  - name: mongodb_alerts
    rules:
      - alert: MongoDBReplicaSetMemberDown
        # Healthy states: 1 is PRIMARY, 2 is SECONDARY, 7 is ARBITER
        expr: mongodb_replset_member_state != 1 and mongodb_replset_member_state != 2 and mongodb_replset_member_state != 7
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "MongoDB replica set member is unhealthy on {{ $labels.instance }}"
          description: "The MongoDB instance {{ $labels.instance }} is in state {{ $value }} (expected PRIMARY, SECONDARY, or ARBITER)."
      - alert: HighMongoDBQueryLatency
        # Average lock wait per query > 100ms. Assumes the lock metric is a cumulative counter
        # with a type label that the opcounter metric lacks, hence ignoring(type).
        expr: >-
          rate(mongodb_mongod_locks_time_waiting_micros{job="mongodb", type="read"}[5m])
          / ignoring(type) rate(mongodb_mongod_opcounters_query{job="mongodb"}[5m]) > 100000
          and rate(mongodb_mongod_opcounters_query{job="mongodb"}[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High MongoDB query latency on {{ $labels.instance }}"
          description: "MongoDB instance {{ $labels.instance }} is experiencing high query latency. Average wait time is {{ $value | printf \"%.2f\" }} microseconds per query."
      - alert: HighMongoDBWriteLatency
        # Average lock wait per update > 100ms, under the same assumptions as above.
        expr: >-
          rate(mongodb_mongod_locks_time_waiting_micros{job="mongodb", type="write"}[5m])
          / ignoring(type) rate(mongodb_mongod_opcounters_update{job="mongodb"}[5m]) > 100000
          and rate(mongodb_mongod_opcounters_update{job="mongodb"}[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High MongoDB write latency on {{ $labels.instance }}"
          description: "MongoDB instance {{ $labels.instance }} is experiencing high write latency. Average wait time is {{ $value | printf \"%.2f\" }} microseconds per write."
      - alert: MongoDBReplicationLag
        expr: mongodb_replset_member_optime_lag > 60 # Replication lag greater than 60 seconds
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "MongoDB replication lag detected on {{ $labels.instance }}"
          description: "MongoDB instance {{ $labels.instance }} is lagging behind the primary by {{ $value | printf \"%.0f\" }} seconds."
      - alert: HighCacheUsageMongoDB
        expr: mongodb_mongod_cache_used_percent{job="mongodb"} > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High MongoDB cache usage on {{ $labels.instance }}"
          description: "MongoDB instance {{ $labels.instance }} is using {{ $value | printf \"%.0f\" }}% of its cache."
      - alert: MongoDBExporterDown
        expr: up{job="mongodb"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MongoDB exporter is down on {{ $labels.instance }}"
          description: "Prometheus cannot scrape metrics from the MongoDB exporter on {{ $labels.instance }}."
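Every rule above carries a for: clause, which keeps a brief spike from paging anyone: the alert stays pending until the condition has held for the whole duration. The sketch below simulates that behaviour under simplified assumptions (one evaluation per sample, no missing data); alert_states is a hypothetical helper name.

```perl
use strict;
use warnings;

# Simulate the `for:` clause of an alerting rule. $samples is a list of
# [unix_timestamp, value] evaluations in time order; the alert fires only
# once the value has stayed above $threshold for $for_seconds without a dip.
sub alert_states {
    my ($samples, $threshold, $for_seconds) = @_;
    my ($active_since, @states);
    for my $s (@$samples) {
        my ($ts, $value) = @$s;
        if ($value > $threshold) {
            $active_since = $ts unless defined $active_since;
            push @states, ($ts - $active_since >= $for_seconds) ? 'firing' : 'pending';
        } else {
            undef $active_since;   # any dip below the threshold resets the timer
            push @states, 'inactive';
        }
    }
    return @states;
}

# Cache usage sampled every 60 seconds against a 90% threshold and `for: 2m`.
my @cache = ([0, 92], [60, 95], [120, 85], [180, 93], [240, 94], [300, 96]);
print join(' ', alert_states(\@cache, 90, 120)), "\n";
# prints: pending pending inactive pending pending firing
```

The dip at the third sample resets the timer, so the alert fires only after the second, uninterrupted run.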
Integrating with Alertmanager
Prometheus itself does not send notifications. It relies on Alertmanager for deduplication, grouping, routing, and silencing of alerts. Configure Prometheus to send alerts to Alertmanager.
# prometheus.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager_host:9093'] # Replace with your Alertmanager host and port
In Alertmanager’s configuration (alertmanager.yml), you define receivers (e.g., Slack, PagerDuty, email) and routing rules based on labels (like severity).
# alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver' # Default receiver if no specific route matches
  routes:
    # Example routing for critical alerts to PagerDuty
    - match:
        severity: 'critical'
      receiver: 'pagerduty-receiver'
      continue: true # Keep evaluating the routes that follow after this one matches
    - match:
        severity: 'warning'
      receiver: 'default-receiver'
      continue: true
receivers:
  - name: 'default-receiver'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
        send_resolved: true
  - name: 'pagerduty-receiver'
    pagerduty_configs:
      - service_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'
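The interaction between match, continue, and the default receiver is easy to misread, so a small model may help. This sketch covers a single level of child routes with exact-match labels only and is not how Alertmanager is implemented; receivers_for is a hypothetical helper name.

```perl
use strict;
use warnings;

# Resolve which receivers an alert reaches in a one-level routing tree.
sub receivers_for {
    my ($route, $labels) = @_;
    my @hits;
    for my $child (@{ $route->{routes} || [] }) {
        my $match = $child->{match} || {};
        # Every label in `match` must be present with exactly that value.
        next if grep { ($labels->{$_} // '') ne $match->{$_} } keys %$match;
        push @hits, $child->{receiver};
        # Without `continue`, the first matching child ends the search.
        last unless $child->{continue};
    }
    # With no matching child, the alert falls back to the parent's receiver.
    return @hits ? @hits : ($route->{receiver});
}

my $route = {
    receiver => 'default-receiver',
    routes   => [
        { match => { severity => 'critical' }, receiver => 'pagerduty-receiver', continue => 1 },
        { match => { severity => 'warning'  }, receiver => 'default-receiver',   continue => 1 },
    ],
};
print join(',', receivers_for($route, { severity => 'critical' })), "\n";   # prints pagerduty-receiver
print join(',', receivers_for($route, { severity => 'info' })), "\n";       # prints default-receiver
```

In this model an info alert matches no child route and lands on the default receiver, which is why the root route must always name one.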
DigitalOcean Specific Considerations
When deploying on DigitalOcean, leverage their managed services and infrastructure features:
- Droplet Monitoring: DigitalOcean provides basic CPU, RAM, disk, and network monitoring for Droplets. Integrate these metrics into your Prometheus stack if possible, or use them as a secondary check.
- Managed Databases: If using DigitalOcean Managed MongoDB, they handle much of the underlying infrastructure monitoring. Focus your custom monitoring on application-level metrics and query performance.
- Firewalls: Configure DigitalOcean Cloud Firewalls so that your Prometheus server, Alertmanager, and exporters can reach each other. Open only the ports you need (e.g., 9090 for Prometheus, 9093 for Alertmanager, exporter ports like 9101 and 9216), and restrict their inbound sources to your monitoring hosts rather than the public internet.
- Load Balancers: If your Perl application is behind a DigitalOcean Load Balancer, monitor its health and traffic patterns. Ensure health checks are configured correctly to remove unhealthy application instances from rotation.
Conclusion: Proactive System Health
A comprehensive monitoring strategy for your Perl applications and MongoDB clusters on DigitalOcean involves a layered approach. Start with essential command-line tools for quick diagnostics, implement robust exporters for Prometheus to gather detailed metrics, define actionable alerts in Prometheus, and use Alertmanager for intelligent notification routing. By continuously observing and reacting to these metrics, you can ensure the stability, performance, and availability of your critical systems.