Server Monitoring Best Practices: Keeping Your Perl App and MongoDB Clusters Alive on DigitalOcean

Establishing a Baseline: Essential Metrics for Perl Apps and MongoDB

Before diving into complex alerting, a robust monitoring strategy begins with understanding your system’s normal operating parameters. For a Perl application, this means tracking request latency, error rates, and resource consumption (CPU, memory, disk I/O). For MongoDB clusters, key indicators include query performance, replication lag, connection counts, and disk usage.

Perl Application Monitoring with `ps` and `top` (Initial Triage)

For immediate, on-server diagnostics of your Perl application, standard Unix utilities are invaluable. While not a long-term monitoring solution, they provide critical real-time insights during an incident.

Identifying High-CPU Perl Processes

A common symptom of an unhealthy Perl application is a runaway process consuming excessive CPU. The top command, or its more modern counterpart htop, is your first line of defense.

Using `top`

Run top and look for processes with a high `%CPU` value. Note the `PID` (Process ID) of the offending Perl script. You can restrict top to Perl processes; note that top accepts at most 20 PIDs with -p, and the command fails if no Perl process is running:

top -p $(pgrep -d, perl)

Using `htop` (Recommended)

htop offers a more user-friendly, color-coded interface. Install it if you don’t have it (e.g., sudo apt-get install htop or sudo yum install htop). Once installed, run htop and press F4 to filter by command name, then type “perl”.

Inspecting Memory Usage

High memory usage can lead to swapping, drastically degrading performance. Use ps to get a detailed view of memory consumption.

ps aux | grep perl | grep -v grep | awk '{print $2, $4, $6, $11}' | sort -k2 -nr

This command outputs: PID, %MEM, RSS (Resident Set Size in KB), and the command name. Sorting by `%MEM` (descending) helps pinpoint memory hogs.
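When several Perl workers run on the same host, a per-process listing can hide the total footprint. The following is a minimal sketch, assuming a ps that supports the -eo pid=,rss=,comm= output format (as procps does on Linux); the function name sum_rss_kb is made up for this example. It reads "PID RSS COMMAND" lines and prints the number of matching processes and their combined RSS in KB.

```shell
# sum_rss_kb NAME: read "PID RSS COMMAND" lines on stdin and print
# "<process count> <total RSS in KB>" for processes whose command is NAME.
sum_rss_kb() {
    awk -v name="$1" '
        $3 == name { total += $2; n++ }          # accumulate matching rows
        END        { printf "%d %d\n", n, total } # prints "0 0" if none matched
    '
}

# Usage on a live host:
#   ps -eo pid=,rss=,comm= | sum_rss_kb perl
```

Comparing the total against the Droplet's RAM is a quick way to judge whether swapping is likely.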

Advanced Perl Application Monitoring: Prometheus & Exporters

For production environments, relying solely on shell commands is insufficient. A robust monitoring stack is essential. Prometheus, with its time-series database and powerful query language (PromQL), is a de facto standard. We’ll use a custom exporter for Perl metrics.

Developing a Custom Perl Exporter

You can write a simple Perl script that exposes metrics via an HTTP endpoint, which Prometheus can then scrape. This script will need to gather application-specific metrics.

Example: Basic Perl Metrics Exporter

This example uses the HTTP::Server::Simple module to expose metrics in Prometheus text format. You’ll need to install it: cpan HTTP::Server::Simple.

#!/usr/bin/env perl
use strict;
use warnings;

# HTTP::Server::Simple dispatches each request to handle_request() in a
# subclass, so the exporter lives in its own package.
package MetricsExporter;
use base 'HTTP::Server::Simple::CGI';

# Stand-in for a real application counter. Increment it wherever your app
# handles a request; a Prometheus counter must never decrease.
my $request_count = 0;

# Resident set size in bytes, from the VmRSS line of /proc/<pid>/status (Linux only).
sub rss_bytes {
    my ($pid) = @_;
    open my $fh, '<', "/proc/$pid/status" or return;
    while (my $line = <$fh>) {
        return $1 * 1024 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;    # value is in kB
    }
    return;
}

# Number of open file descriptors, counted from /proc/<pid>/fd (Linux only).
sub open_fds {
    my ($pid) = @_;
    opendir my $dh, "/proc/$pid/fd" or return;
    my $count = grep { !/^\.\.?$/ } readdir $dh;
    closedir $dh;
    return $count;
}

sub handle_request {
    my ($self, $cgi) = @_;
    my $pid  = $$;    # stats describe the exporter process itself
    my $body = '';

    # CPU time (user + system) consumed so far, in seconds
    my ($user, $system) = times;
    my $cpu_seconds = $user + $system;
    $body .= "# HELP perl_process_cpu_seconds_total Total CPU time spent by the process in seconds.\n";
    $body .= "# TYPE perl_process_cpu_seconds_total counter\n";
    $body .= "perl_process_cpu_seconds_total{pid=\"$pid\"} $cpu_seconds\n";

    # Memory usage (resident set size)
    my $rss = rss_bytes($pid);
    if (defined $rss) {
        $body .= "# HELP perl_process_resident_memory_bytes Resident set size of the process in bytes.\n";
        $body .= "# TYPE perl_process_resident_memory_bytes gauge\n";
        $body .= "perl_process_resident_memory_bytes{pid=\"$pid\"} $rss\n";
    }

    # Number of open file descriptors
    my $fd_count = open_fds($pid);
    if (defined $fd_count) {
        $body .= "# HELP perl_process_open_fds Number of file descriptors opened by the process.\n";
        $body .= "# TYPE perl_process_open_fds gauge\n";
        $body .= "perl_process_open_fds{pid=\"$pid\"} $fd_count\n";
    }

    # Add application-specific metrics here (e.g., request count, error count).
    # Example: count scrapes as a stand-in for requests handled.
    $request_count++;
    $body .= "# HELP myapp_requests_total Total requests processed by the application.\n";
    $body .= "# TYPE myapp_requests_total counter\n";
    $body .= "myapp_requests_total $request_count\n";

    # HTTP::Server::Simple::CGI expects the handler to print the full response.
    print "HTTP/1.0 200 OK\r\n";
    print "Content-Type: text/plain; version=0.0.4\r\n";
    print "Content-Length: ", length($body), "\r\n\r\n";
    print $body;
}

package main;

# node_exporter defaults to port 9100, so the custom exporter uses 9101
MetricsExporter->new(9101)->run;

To run this exporter, save it as exporter.pl and execute it: perl exporter.pl. You can then access the metrics at http://your_server_ip:9101/metrics, the path Prometheus scrapes by default.
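Before pointing Prometheus at the exporter, it helps to confirm that the endpoint returns the metrics you expect. The sketch below is one way to do that from the shell; the function name metric_value is hypothetical, and the parsing is deliberately simple (it assumes label values contain no spaces and that samples carry no timestamp).

```shell
# metric_value NAME: read Prometheus text-format output on stdin and print
# the value of the first sample named NAME. Returns non-zero if it is absent.
metric_value() {
    awk -v m="$1" '
        /^#/ { next }                   # skip HELP, TYPE and other comments
        NF >= 2 {
            name = $1
            sub(/\{.*/, "", name)       # drop the {label="..."} part
            if (name == m) { print $NF; found = 1; exit }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage against a running exporter (assumes curl is installed):
#   curl -s http://localhost:9101/metrics | metric_value myapp_requests_total
```

Running the check twice a few seconds apart also shows whether a counter is actually increasing.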

Configuring Prometheus to Scrape the Exporter

Edit your prometheus.yml configuration file to include a scrape job for your Perl application exporter.

scrape_configs:
  - job_name: 'perl_app'
    static_configs:
      - targets: ['your_server_ip:9101'] # Replace with your server's IP and exporter port

MongoDB Cluster Monitoring: Built-in Metrics and Tools

MongoDB provides a rich set of metrics accessible via the mongostat and mongotop command-line utilities, as well as through MongoDB Cloud Manager (formerly the MongoDB Management Service, MMS), Ops Manager for MongoDB Enterprise deployments, and the built-in monitoring of MongoDB Atlas.

Real-time MongoDB Performance with `mongostat`

mongostat provides a quick overview of MongoDB server statistics. It’s excellent for understanding current load and identifying bottlenecks.

mongostat --host mongodb_host:27017 --username mongo_user --password mongo_password --authenticationDatabase admin --rowcount 10 5

Key metrics to watch:

insert, query, update, delete: Operations per second. Spikes can indicate heavy load or inefficient queries.
getmore: Number of getMore operations per second. High values can indicate large cursors or inefficient pagination.
dirty: Percentage of the WiredTiger cache holding dirty (modified but not yet written) data. Sustained high values suggest I/O bottlenecks or an undersized cache.
used: Percentage of the WiredTiger cache in use.
qrw, arw: Queued and active clients, each shown as reads|writes. Persistently non-zero queue lengths indicate contention.
Index usage: Current mongostat releases have no index column (the old idx miss % column applied only to the MMAPv1 storage engine). To find queries that skip indexes, use explain() or the database profiler.
net_in, net_out: Network traffic received and sent by the instance.
conn: Number of open connections.

Identifying Slow Queries with `mongotop`

mongotop reports on the time MongoDB spends reading and writing data per collection. This is crucial for pinpointing slow-performing collections.

mongotop --host mongodb_host:27017 --username mongo_user --password mongo_password --authenticationDatabase admin 5

The trailing 5 is the reporting interval in seconds. Look for collections with high values in the total, read, and write columns, which show the time spent operating on each namespace during the interval. This often points to missing indexes or inefficient query patterns.

MongoDB Monitoring with Prometheus and the MongoDB Exporter

For continuous, historical monitoring of your MongoDB cluster, the Prometheus MongoDB exporter is indispensable. It exposes a wide range of MongoDB metrics in a format Prometheus can ingest.

Setting up the MongoDB Exporter

The MongoDB exporter is typically run as a separate service. You can download pre-compiled binaries or build from source. Ensure the user running the exporter has sufficient read permissions on the MongoDB instances.

Configuration Example

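How the exporter is configured depends on which exporter and release you run. The example below assumes one that reads a YAML file (e.g., mongodb_exporter.yml) specifying the MongoDB connection strings and which metrics to collect. The widely used Percona mongodb_exporter instead takes the connection string from the --mongodb.uri flag or the MONGODB_URI environment variable, so check your exporter’s --help output before copying this layout.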

# mongodb_exporter.yml
mongodb:
  - name: "my_replica_set"
    uri: "mongodb://mongo_user:mongo_password@mongo1.example.com:27017,mongo2.example.com:27017,mongo3.example.com:27017/?replicaSet=myReplicaSet&authSource=admin"
    # Optional: specify specific databases or collections to monitor
    # databases:
    #   - name: "mydb"
    #     collections:
    #       - "mycollection"
    metrics:
      - name: "mongodb_up"
        enabled: true
      - name: "mongodb_mongod_connections"
        enabled: true
      - name: "mongodb_mongod_network_bytes_in"
        enabled: true
      - name: "mongodb_mongod_network_bytes_out"
        enabled: true
      - name: "mongodb_mongod_opcounters_insert"
        enabled: true
      - name: "mongodb_mongod_opcounters_query"
        enabled: true
      - name: "mongodb_mongod_opcounters_update"
        enabled: true
      - name: "mongodb_mongod_opcounters_delete"
        enabled: true
      - name: "mongodb_mongod_opcounters_getmore"
        enabled: true
      - name: "mongodb_mongod_opcounters_command"
        enabled: true
      - name: "mongodb_mongod_cache_used_percent"
        enabled: true
      - name: "mongodb_mongod_cache_dirty_percent"
        enabled: true
      - name: "mongodb_mongod_locks_time_acquiring_micros"
        enabled: true
      - name: "mongodb_mongod_locks_time_waiting_micros"
        enabled: true
      - name: "mongodb_replset_member_state"
        enabled: true
      - name: "mongodb_replset_member_optime_lag"
        enabled: true

Start the exporter (assuming it’s installed and configured):

./mongodb_exporter --config.file=mongodb_exporter.yml --web.listen-address=":9216"

Configuring Prometheus for MongoDB Exporter

Add a new job to your prometheus.yml to scrape the MongoDB exporter.

scrape_configs:
  - job_name: 'mongodb'
    static_configs:
      - targets: ['your_mongodb_exporter_ip:9216'] # Replace with your exporter's IP and port

Alerting Strategies: Defining Critical Thresholds

Once you have your metrics flowing into Prometheus, the next step is to define meaningful alerts. Alerts should be actionable and indicate a deviation from normal behavior that requires intervention.

Perl Application Alerts

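These alerts focus on the health and performance of your Perl application instances. Save them in a rules file and list that file under rule_files in prometheus.yml so that Prometheus loads them.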

# Alerting rules for Perl application
groups:
- name: perl_app_alerts
  rules:
  - alert: HighCpuUsagePerlApp
    expr: avg by (instance) (rate(perl_process_cpu_seconds_total{job="perl_app"}[5m])) * 100 > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
      description: "Perl application on {{ $labels.instance }} is using more than 80% CPU for the last 5 minutes."

  - alert: HighMemoryUsagePerlApp
    expr: perl_process_resident_memory_bytes{job="perl_app"} / (1024 * 1024 * 1024) > 4 # Alert if memory usage exceeds 4GB
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High Memory Usage on {{ $labels.instance }}"
      description: "Perl application on {{ $labels.instance }} is using more than 4GB of RAM."

  - alert: PerlAppExporterDown
    expr: up{job="perl_app"} == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Perl application exporter is down on {{ $labels.instance }}"
      description: "Prometheus cannot scrape metrics from the Perl application exporter on {{ $labels.instance }}."

  - alert: HighRequestRatePerlApp
    expr: rate(myapp_requests_total{job="perl_app"}[5m]) * 60 > 1000 # Example: alert if the request rate exceeds 1000 req/min (rate() is per second, hence * 60)
    for: 2m
    labels:
      severity: info
    annotations:
      summary: "High request rate on {{ $labels.instance }}"
      description: "Perl application on {{ $labels.instance }} is processing {{ $value | printf \"%.0f\" }} requests per minute."
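
PromQL’s rate() returns a per-second average, so a threshold stated in requests per minute needs a factor of 60. The arithmetic can be sketched with two counter samples; the function name per_minute_rate is made up for this example, and the counter-reset handling is a simplification of what Prometheus does.

```shell
# per_minute_rate V1 T1 V2 T2: average per-minute increase of a counter
# between sample V1 taken at T1 seconds and sample V2 taken at T2 seconds.
per_minute_rate() {
    awk -v v1="$1" -v t1="$2" -v v2="$3" -v t2="$4" 'BEGIN {
        if (t2 <= t1) exit 1                   # need a positive time window
        # A lower second value means the counter restarted from zero.
        delta = (v2 >= v1) ? v2 - v1 : v2
        printf "%.1f\n", delta / (t2 - t1) * 60
    }'
}

# Example: 6000 requests over a 5-minute window
#   per_minute_rate 1000 0 7000 300
```

In this example the per-second rate is 20, which would never cross a threshold of 1000 unless it is first multiplied by 60.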

MongoDB Cluster Alerts

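These alerts focus on the health, performance, and availability of your MongoDB replica sets and individual nodes. Exporter releases name their metrics differently, so compare the names below with your exporter’s /metrics output before loading the rules.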

# Alerting rules for MongoDB cluster
groups:
- name: mongodb_alerts
  rules:
  - alert: MongoDBReplicaSetMemberDown
    expr: mongodb_replset_member_state != 1 and mongodb_replset_member_state != 2 and mongodb_replset_member_state != 7 # Healthy states: 1 PRIMARY, 2 SECONDARY, 7 ARBITER
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "MongoDB replica set member is unhealthy on {{ $labels.instance }}"
      description: "The MongoDB instance {{ $labels.instance }} is in state {{ $value }} (expected PRIMARY, SECONDARY, or ARBITER)."

  - alert: HighMongoDBQueryLatency
    expr: rate(mongodb_mongod_locks_time_waiting_micros{job="mongodb", type="read"}[5m]) / ignoring(type) rate(mongodb_mongod_opcounters_query{job="mongodb"}[5m]) > 100000 and rate(mongodb_mongod_opcounters_query{job="mongodb"}[5m]) > 0 # Average wait time per query > 100ms
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High MongoDB query latency on {{ $labels.instance }}"
      description: "MongoDB instance {{ $labels.instance }} is experiencing high query latency. Average wait time is {{ $value | printf \"%.2f\" }} microseconds per query."

  - alert: HighMongoDBWriteLatency
    expr: rate(mongodb_mongod_locks_time_waiting_micros{job="mongodb", type="write"}[5m]) / ignoring(type) rate(mongodb_mongod_opcounters_update{job="mongodb"}[5m]) > 100000 and rate(mongodb_mongod_opcounters_update{job="mongodb"}[5m]) > 0 # Average wait time per update > 100ms
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High MongoDB write latency on {{ $labels.instance }}"
      description: "MongoDB instance {{ $labels.instance }} is experiencing high write latency. Average wait time is {{ $value | printf \"%.2f\" }} microseconds per write."

  - alert: MongoDBReplicationLag
    expr: mongodb_replset_member_optime_lag > 60 # Replication lag greater than 60 seconds
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "MongoDB replication lag detected on {{ $labels.instance }}"
      description: "MongoDB instance {{ $labels.instance }} is lagging behind the primary by {{ $value | printf \"%.0f\" }} seconds."

  - alert: HighCacheUsageMongoDB
    expr: mongodb_mongod_cache_used_percent{job="mongodb"} > 90
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High MongoDB cache usage on {{ $labels.instance }}"
      description: "MongoDB instance {{ $labels.instance }} is using {{ $value | printf \"%.0f\" }}% of its cache."

  - alert: MongoDBExporterDown
    expr: up{job="mongodb"} == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "MongoDB exporter is down on {{ $labels.instance }}"
      description: "Prometheus cannot scrape metrics from the MongoDB exporter on {{ $labels.instance }}."
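
Replica set member state is reported as a number, and only some values indicate trouble. The lookup below is a small sketch using the state numbering from MongoDB’s replica set documentation; the function names are made up for this example. It can help when reading the {{ $value }} in an alert notification.

```shell
# replset_state_name N: print the replica set member state name for number N.
replset_state_name() {
    case "$1" in
        0)  echo STARTUP ;;
        1)  echo PRIMARY ;;
        2)  echo SECONDARY ;;
        3)  echo RECOVERING ;;
        5)  echo STARTUP2 ;;
        6)  echo UNKNOWN ;;
        7)  echo ARBITER ;;
        8)  echo DOWN ;;
        9)  echo ROLLBACK ;;
        10) echo REMOVED ;;
        *)  echo INVALID; return 1 ;;
    esac
}

# replset_state_is_healthy N: succeed for states expected in steady operation.
replset_state_is_healthy() {
    case "$1" in
        1|2|7) return 0 ;;   # PRIMARY, SECONDARY, ARBITER
        *)     return 1 ;;
    esac
}

# Usage:
#   replset_state_is_healthy 8 || echo "member is $(replset_state_name 8)"
```

A member in SECONDARY state is healthy, so an alert on member health should not fire merely because a node is not PRIMARY.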

Integrating with Alertmanager

Prometheus itself does not send notifications. It relies on Alertmanager for deduplication, grouping, routing, and silencing of alerts. Configure Prometheus to send alerts to Alertmanager.

# prometheus.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager_host:9093'] # Replace with your Alertmanager host and port

In Alertmanager’s configuration (alertmanager.yml), you define receivers (e.g., Slack, PagerDuty, email) and routing rules based on labels (like severity).

# alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver' # Used when no child route matches
  routes:
    # Example routing for critical alerts to PagerDuty
    - match:
        severity: 'critical'
      receiver: 'pagerduty-receiver'
      continue: true # Keep evaluating the sibling routes below
    - match:
        severity: 'warning'
      receiver: 'default-receiver'

receivers:
  - name: 'default-receiver'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
        send_resolved: true
  - name: 'pagerduty-receiver'
    pagerduty_configs:
      - service_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'
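
Configuration mistakes in rules or routing tend to surface only when an alert fails to arrive, so it is worth validating the files before reloading. The sketch below wraps promtool and amtool, which ship with Prometheus and Alertmanager; the helper name check_with and the rule file names are assumptions made for this example.

```shell
# check_with TOOL ARGS...: run a validator if it is installed, otherwise
# report that the check was skipped instead of failing.
check_with() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skip: $1 not installed"
    fi
}

# validate_monitoring_configs: check the files used in this guide.
# perl_app_rules.yml and mongodb_rules.yml are hypothetical file names.
validate_monitoring_configs() {
    check_with promtool check config prometheus.yml &&
    check_with promtool check rules perl_app_rules.yml mongodb_rules.yml &&
    check_with amtool check-config alertmanager.yml
}

# Usage, from the directory that holds the configuration files:
#   validate_monitoring_configs && echo "configs look valid"
```

To see which receiver a given label set would reach, amtool config routes test --config.file=alertmanager.yml severity=critical prints the matching receiver.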

DigitalOcean Specific Considerations

When deploying on DigitalOcean, leverage their managed services and infrastructure features:

Droplet Monitoring: DigitalOcean provides basic CPU, RAM, disk, and network monitoring for Droplets. Integrate these metrics into your Prometheus stack if possible, or use them as a secondary check.
Managed Databases: If using DigitalOcean Managed MongoDB, they handle much of the underlying infrastructure monitoring. Focus your custom monitoring on application-level metrics and query performance.
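Firewalls: Ensure your Prometheus server, Alertmanager, and exporters can communicate by configuring DigitalOcean Cloud Firewalls appropriately. Open only the necessary ports (e.g., 9090 for Prometheus, 9093 for Alertmanager, exporter ports like 9101 and 9216), and restrict inbound access to the hosts that need it, for example by allowing only the Prometheus server to reach the exporter ports.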
Load Balancers: If your Perl application is behind a DigitalOcean Load Balancer, monitor its health and traffic patterns. Ensure health checks are configured correctly to remove unhealthy application instances from rotation.

Conclusion: Proactive System Health

A comprehensive monitoring strategy for your Perl applications and MongoDB clusters on DigitalOcean involves a layered approach. Start with essential command-line tools for quick diagnostics, implement robust exporters for Prometheus to gather detailed metrics, define actionable alerts in Prometheus, and use Alertmanager for intelligent notification routing. By continuously observing and reacting to these metrics, you can ensure the stability, performance, and availability of your critical systems.