Server Monitoring Best Practices: Keeping Your Perl App and MongoDB Clusters Alive on Linode

Proactive Health Checks for Perl Applications

Maintaining the health of a Perl application, especially one serving critical functions, requires more than just checking if the process is running. We need to ensure it’s responsive, handling requests efficiently, and not leaking resources. This involves a multi-layered approach, starting with basic process monitoring and extending to application-specific health endpoints.

Process and Resource Monitoring with `monit`

monit is an excellent, lightweight tool for managing and monitoring Unix systems. It can automatically perform actions, like restarting a service, when certain conditions are met. For a Perl application, we’ll typically monitor the main application process (e.g., a FastCGI or PSGI server) and its resource consumption.

First, ensure monit is installed on your Linode instance:

sudo apt update && sudo apt install monit -y

Next, configure monit to watch your Perl application. Assuming your Perl application is managed by a systemd service named myperlapp.service, a basic monit configuration might look like this. Create a new configuration file, for example, /etc/monit/conf.d/myperlapp:

check process myperlapp with pidfile /var/run/myperlapp.pid
  start program = "/bin/systemctl start myperlapp.service"
  stop program  = "/bin/systemctl stop myperlapp.service"
  if failed host 127.0.0.1 port 5000 protocol http then restart
  if 5 restarts within 5 cycles then timeout
  if cpu is greater than 80% for 2 cycles then alert
  if memory is greater than 500 MB for 2 cycles then restart
  if does not exist then restart

In this configuration:

check process myperlapp with pidfile /var/run/myperlapp.pid: Defines a service named myperlapp and specifies its PID file. Ensure your Perl application correctly writes its PID to this file.
start program and stop program: Defines how to start and stop the service using systemd.
if failed host 127.0.0.1 port 5000 protocol http then restart: This is a crucial application-level check. It attempts to connect to your application’s HTTP endpoint (assuming it’s listening on port 5000) and restarts it if the connection fails. Adjust the port and protocol as needed.
if 5 restarts within 5 cycles then timeout: Prevents rapid restart loops. If the service restarts 5 times within 5 monitoring cycles, monit stops monitoring it and alerts you. The cycle length is whatever set daemon specifies in monitrc; Debian’s packaged configuration uses 120 seconds.
if cpu is greater than 80% for 2 cycles then alert: Alerts if CPU usage is consistently high. You might choose to restart instead of just alerting.
if memory is greater than 500 MB for 2 cycles then restart: Restarts the application if it consumes more than 500MB of memory. Tune this based on your application’s typical footprint.
if does not exist then restart: Checks whether the process identified by the PID file is actually running; if it is not, monit treats the service as down and attempts a restart.
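
The restart limit can be pictured as a sliding window over recent cycles. The following Python sketch is only a rough model of that rule, not monit's actual implementation; the class name RestartThrottle and its parameters are hypothetical.

```python
from collections import deque


class RestartThrottle:
    """Rough model of 'if N restarts within M cycles then timeout'."""

    def __init__(self, max_restarts=5, window_cycles=5):
        self.max_restarts = max_restarts
        # Keep only the most recent `window_cycles` observations.
        self.history = deque(maxlen=window_cycles)

    def record_cycle(self, restarted):
        """Record one monitoring cycle; return True once the limit is reached."""
        self.history.append(bool(restarted))
        return sum(self.history) >= self.max_restarts


throttle = RestartThrottle(max_restarts=5, window_cycles=5)
# Five consecutive cycles that each needed a restart.
outcomes = [throttle.record_cycle(True) for _ in range(5)]
print(outcomes)  # [False, False, False, False, True]
```

A service that fails only occasionally never fills the window, so it keeps being restarted; one that fails on every cycle is given up on after the fifth attempt.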

After creating the configuration file, test it and reload monit:

sudo monit -t
sudo systemctl reload monit

You can check the status of your monitored services with:

sudo monit status

Application-Specific Health Endpoints

For more granular insights, your Perl application should expose a dedicated health check endpoint. This endpoint can perform internal checks, such as verifying database connections, cache availability, or the status of external API integrations. A simple Perl script using Mojolicious or Dancer2 can provide this.

Here’s an example using Mojolicious:

package MyApp::Controller::Health;
use Mojo::Base 'Mojolicious::Controller';

use MongoDB;    # official MongoDB Perl driver from CPAN

sub check {
    my $self = shift;

    # Example: Check MongoDB connection. Creating the client does not contact
    # the server, so send a ping command to prove the server is reachable.
    my $mongo_ok = eval {
        my $client = MongoDB->connect(
            'mongodb://localhost:27017',
            { server_selection_timeout_ms => 2000 },    # fail fast when MongoDB is down
        );
        $client->get_database('admin')->run_command([ ping => 1 ]);
        1;
    };
    unless ($mongo_ok) {
        $self->render(json => { status => 'error', message => 'MongoDB connection failed' }, status => 503 );
        return;
    }

    # Example: Check a critical external service (simulated)
    my $external_service_ok = eval {
        # Replace with actual check, e.g., HTTP request to an API
        1; # Assume success for now
    };
    unless ($external_service_ok) {
        $self->render(json => { status => 'error', message => 'External service unavailable' }, status => 503 );
        return;
    }

    # If all checks pass
    $self->render(json => { status => 'ok', message => 'All systems nominal' }, status => 200 );
}

1;

This endpoint, typically mapped to /health, should return a 200 OK status code if all dependencies are healthy and a non-2xx status (e.g., 503 Service Unavailable) otherwise. This endpoint can then be polled by external monitoring services (like Prometheus, Nagios, or even a simple cron job with curl) or used by load balancers for health checks.
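
As an illustration of the polling side, here is a minimal Python sketch of a checker that a cron job could run. It assumes the endpoint returns the JSON shape shown above (a status and a message field); the function names interpret_health and poll are hypothetical, and only the standard library is used.

```python
import json
import urllib.error
import urllib.request


def interpret_health(status_code, body):
    """Return (healthy, message) for a health endpoint response."""
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    message = payload.get("message", "no message")
    # Healthy only when the HTTP status is 2xx and the body agrees.
    healthy = 200 <= status_code < 300 and payload.get("status") == "ok"
    return healthy, message


def poll(url, timeout=5.0):
    """Fetch the health endpoint once and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return interpret_health(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        # Non-2xx responses (such as 503) still carry a JSON body worth reading.
        return interpret_health(err.code, err.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError) as err:
        return False, f"request failed: {err}"
```

Calling poll("http://127.0.0.1:5000/health") from a scheduled script and exiting non-zero when the first element is False is enough for cron or a simple wrapper to raise an alarm.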

Monitoring MongoDB Clusters on Linode

MongoDB clusters, especially replica sets and sharded environments, require robust monitoring to ensure data availability, consistency, and performance. Key metrics include replication lag, oplog window, disk I/O, network traffic, query performance, and resource utilization.

Essential MongoDB Metrics

We’ll focus on metrics that are critical for operational health. Tools like mongostat and mongotop provide real-time, point-in-time views, and mongodump and mongorestore cover backups, but none of them records history or raises alerts. For continuous monitoring, we need a more automated approach.

Key metrics to track:

Replication Lag: The delay between a write being applied on the primary and the same write being applied on a secondary. High lag means secondaries serve stale data, which can break read-your-writes expectations for clients that read from them.
Oplog Window Size: The amount of time covered by the current oplog. A shrinking window can indicate that secondaries are not keeping up, or that the oplog is too small.
Network Traffic: Ingress/egress traffic to/from MongoDB instances.
Disk I/O: Read/write operations per second, latency.
Query Performance: Slow queries, query execution times.
Connections: Number of active connections, connection pool usage.
Memory Usage: Resident memory, virtual memory.
CPU Usage: System CPU, user CPU.
Disk Space: Available disk space on data volumes.
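
To make the first two metrics concrete, the Python sketch below computes them from plain data. It assumes member records shaped like the members array returned by the replSetGetStatus command (name, stateStr, optimeDate); the function names and the sample host names are hypothetical.

```python
from datetime import datetime, timedelta


def replication_lag_seconds(members):
    """Map each secondary's name to its lag behind the primary, in seconds."""
    primaries = [m for m in members if m["stateStr"] == "PRIMARY"]
    if not primaries:
        raise ValueError("no primary in member list")
    primary_optime = primaries[0]["optimeDate"]
    return {
        m["name"]: max(0.0, (primary_optime - m["optimeDate"]).total_seconds())
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


def oplog_window_seconds(first_entry_time, last_entry_time):
    """Time span covered by the oplog: newest entry minus oldest entry."""
    return (last_entry_time - first_entry_time).total_seconds()


now = datetime(2024, 1, 1, 12, 0, 0)
members = [
    {"name": "mongo1:27017", "stateStr": "PRIMARY", "optimeDate": now},
    {"name": "mongo2:27017", "stateStr": "SECONDARY", "optimeDate": now - timedelta(seconds=4)},
    {"name": "mongo3:27017", "stateStr": "SECONDARY", "optimeDate": now - timedelta(seconds=90)},
]
print(replication_lag_seconds(members))
print(oplog_window_seconds(now - timedelta(hours=24), now))
```

A secondary whose lag approaches the oplog window is at risk of falling off the end of the oplog, which is why the two numbers are worth watching together.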

Leveraging Prometheus and Grafana

Prometheus is a popular open-source monitoring and alerting system. It scrapes metrics from configured targets at given intervals, stores them in its built-in time-series database, and evaluates rule expressions against them to trigger alerts. Grafana is a powerful visualization tool that integrates seamlessly with Prometheus.

To monitor MongoDB with Prometheus, we use the mongodb_exporter. This exporter runs as a separate service, connects to your MongoDB instances, collects metrics, and exposes them via an HTTP endpoint (defaulting to port 9216) for Prometheus to scrape.

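First, install the mongodb_exporter on a dedicated monitoring node or one of your MongoDB nodes (if resources permit). Download a release from the exporter’s GitHub releases page. Treat the URL and version below as examples and verify them against the releases page of the exporter you use (the widely used Percona exporter lives at github.com/percona/mongodb_exporter), since archive names and layouts differ between versions.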

# Example for Linux AMD64
wget https://github.com/mongodb-developer/mongodb_exporter/releases/download/v0.10.0/mongodb_exporter-0.10.0.linux.amd64.tar.gz
tar -xzf mongodb_exporter-0.10.0.linux.amd64.tar.gz
sudo mv mongodb_exporter /usr/local/bin/
rm mongodb_exporter-0.10.0.linux.amd64.tar.gz

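Next, create a user in MongoDB with the necessary privileges for the exporter to read metrics. This user needs the clusterMonitor role; granting read on the local database as well lets the exporter read the oplog for replication metrics.

# Connect to your MongoDB primary (use the legacy mongo shell on releases that predate mongosh).
# -p without a value prompts for the password, keeping it out of your shell history.
mongosh --host <your_mongo_primary_ip> -u admin -p --authenticationDatabase admin

// Create the exporter user
use admin
db.createUser({
  user: "exporter",
  pwd: "your_exporter_password",
  roles: [
    { role: "clusterMonitor", db: "admin" },
    { role: "read", db: "local" }
  ]
})
exit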

Now, configure the mongodb_exporter to connect to your MongoDB instances. Keep credentials out of command-line arguments, where they are visible in the process list. The options an exporter accepts vary between projects and versions, so confirm them with mongodb_exporter --help. The setup below assumes the Percona exporter, which reads its connection string from the --mongodb.uri flag or the MONGODB_URI environment variable.

# Create an environment file, e.g., /etc/mongodb_exporter/exporter.env
# Ensure this file has strict permissions: chmod 600 /etc/mongodb_exporter/exporter.env
#
# Example content for a replica set (replace the placeholders and
# percent-encode any special characters in the password):
#
# MONGODB_URI=mongodb://exporter:<exporter_password>@<mongo1_ip>:27017,<mongo2_ip>:27017,<mongo3_ip>:27017/?replicaSet=rs0
#
# For a sharded cluster, point an exporter at the mongos routers instead:
# MONGODB_URI=mongodb://exporter:<exporter_password>@<mongos1_ip>:27017,<mongos2_ip>:27017,<mongos3_ip>:27017
#
# And for config servers (if not part of a replica set already covered):
# MONGODB_URI=mongodb://exporter:<exporter_password>@<config1_ip>:27019,<config2_ip>:27019,<config3_ip>:27019/?replicaSet=configReplSet
#
# Use one MONGODB_URI line per environment file; run a separate exporter
# instance (with its own file and listen port) for each additional target.

Create a systemd service file for mongodb_exporter (e.g., /etc/systemd/system/mongodb_exporter.service):

[Unit]
Description=MongoDB Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=mongodb_exporter
Group=mongodb_exporter
Type=simple
# MONGODB_URI comes from this file, so the password never appears in the process list
EnvironmentFile=/etc/mongodb_exporter/exporter.env
ExecStart=/usr/local/bin/mongodb_exporter --web.listen-address=:9216

Restart=on-failure

[Install]
WantedBy=multi-user.target

Create the user and group for the exporter and give it ownership of its configuration directory. Leave the ownership of /var/log/mongodb unchanged: on a host that also runs mongod, that directory belongs to the database’s own service account, and handing it to the exporter user would stop mongod from writing its logs.

sudo groupadd --system mongodb_exporter
sudo useradd --system -g mongodb_exporter mongodb_exporter
sudo mkdir -p /etc/mongodb_exporter
sudo chown -R mongodb_exporter:mongodb_exporter /etc/mongodb_exporter

Reload systemd, start, and enable the exporter:

sudo systemctl daemon-reload
sudo systemctl start mongodb_exporter
sudo systemctl enable mongodb_exporter
sudo systemctl status mongodb_exporter

Now, configure Prometheus to scrape the mongodb_exporter. In your prometheus.yml file, add a scrape configuration:

scrape_configs:
  - job_name: 'mongodb'
    static_configs:
      - targets: ['<your_mongodb_exporter_ip>:9216'] # Replace with the IP of your exporter host
        labels:
          cluster: 'my-mongo-cluster' # Add a descriptive label
          # If you have multiple MongoDB clusters, add more labels to differentiate

Restart Prometheus after updating its configuration.

Grafana Dashboards for MongoDB

Once Prometheus is scraping MongoDB metrics, you can import pre-built Grafana dashboards or create your own. A popular source for MongoDB dashboards is the Grafana Labs community dashboards. Search for “MongoDB” on grafana.com/grafana/dashboards/.

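A good starting point is the “MongoDB” dashboard (often ID 7426 or similar). After importing, configure the dashboard to use your Prometheus data source. Metric names vary between exporter versions, so treat the names below as examples and confirm them against your exporter’s /metrics output. You’ll want to pay close attention to panels that visualize: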

Replication status and lag (e.g., mongodb_replset_member_state, mongodb_replset_member_oplog_lag_seconds)
Oplog statistics (e.g., mongodb_oplog_window_seconds)
Performance metrics (e.g., mongodb_opcounter_insert, mongodb_opcounter_query, mongodb_network_bytes_in, mongodb_network_bytes_out)
Resource utilization (e.g., mongodb_storage_data_size, mongodb_mem_resident, mongodb_cpu_user_seconds_total)
Connection counts (e.g., mongodb_connections_current)
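
Because dashboards and alert rules silently show nothing when a metric name does not match, it helps to list the names an exporter really exposes. The Python sketch below extracts metric names from text in the Prometheus exposition format; the function name metric_names and the sample text are illustrative assumptions, and the regular expression is a simplification rather than a full parser.

```python
import re

_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"  # metric name
    r"(?P<labels>\{.*\})?"                  # optional {label="value",...}
    r"\s+(?P<value>\S+)"                    # sample value
    r"(?:\s+-?\d+)?$"                       # optional timestamp
)


def metric_names(exposition_text, prefix="mongodb_"):
    """Return the sorted, de-duplicated metric names starting with `prefix`."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match and match.group("name").startswith(prefix):
            names.add(match.group("name"))
    return sorted(names)


sample = """# HELP mongodb_connections_current Current connections
# TYPE mongodb_connections_current gauge
mongodb_connections_current{cluster="my-mongo-cluster"} 42
mongodb_oplog_window_seconds 86400 1700000000000
go_goroutines 12
"""
print(metric_names(sample))
```

Feeding it the body fetched from the exporter's metrics endpoint (port 9216 in this setup) gives a list to compare against the names used in dashboards and rules.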

Alerting with Prometheus Alertmanager

To make your monitoring truly proactive, set up alerting. Prometheus Alertmanager handles alerts sent by Prometheus server instances. It receives alerts, deduplicates them, groups them, and routes them to the correct receiver integration (e.g., Slack, PagerDuty, email).

Define alerting rules in Prometheus. For example, in a file like /etc/prometheus/rules/mongodb_alerts.yml:

groups:
- name: mongodb.rules
  rules:
  - alert: HighReplicationLag
    expr: max by (cluster) (mongodb_replset_member_oplog_lag_seconds{state!="primary"}) > 600 # Alert if any non-primary member lags by more than 10 minutes
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High replication lag detected on {{ $labels.cluster }}"
      description: "MongoDB cluster {{ $labels.cluster }} has replication lag exceeding 10 minutes."

  - alert: LowOplogWindow
    expr: min by (cluster) (mongodb_oplog_window_seconds) < 3600 # Alert if any member's oplog window is less than 1 hour
    for: 10m
    labels:
      severity: critical
    annotations:
      summary: "Low oplog window on {{ $labels.cluster }}"
      description: "MongoDB cluster {{ $labels.cluster }} has an oplog window of less than 1 hour; a secondary that stays offline or lagging for longer than that will need a full resync."

  - alert: HighDiskUsage
    expr: node_filesystem_avail_bytes{mountpoint="/var/lib/mongodb"} / node_filesystem_size_bytes{mountpoint="/var/lib/mongodb"} * 100 < 15 # Needs node_exporter on the MongoDB hosts; alert if less than 15% disk space available
    for: 15m
    labels:
      severity: critical
    annotations:
      summary: "Low disk space on MongoDB data volume for {{ $labels.instance }}"
      description: "MongoDB data volume on {{ $labels.instance }} has less than 15% free space."
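
The for field in each rule means the condition has to hold continuously for that long before the alert fires; until then it is only pending. The Python sketch below is a simplified model of that behaviour, not Prometheus code; the function name alert_state and the sample values are hypothetical.

```python
def alert_state(samples, threshold, for_seconds):
    """Return 'inactive', 'pending' or 'firing' after the last sample.

    `samples` is a time-ordered list of (unix_timestamp, value) pairs, one per
    rule evaluation. The alert condition is `value > threshold`.
    """
    breach_started = None
    state = "inactive"
    for timestamp, value in samples:
        if value > threshold:
            if breach_started is None:
                breach_started = timestamp  # condition just became true
            held = timestamp - breach_started
            state = "firing" if held >= for_seconds else "pending"
        else:
            # Any evaluation below the threshold resets the timer.
            breach_started = None
            state = "inactive"
    return state


# Replication lag in seconds, evaluated once a minute.
steady = [(0, 700), (60, 720), (120, 650), (180, 710), (240, 705), (300, 800)]
flapping = [(0, 700), (60, 500), (120, 700)]
print(alert_state(steady, threshold=600, for_seconds=300))    # firing
print(alert_state(flapping, threshold=600, for_seconds=300))  # pending
```

This is why a longer for value trades detection speed for fewer notifications caused by short spikes.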

Ensure these rules are loaded by Prometheus by adding them to your prometheus.yml:

rule_files:
  - "/etc/prometheus/rules/*.yml"

Configure Alertmanager with receivers for your preferred notification channels (e.g., Slack, email). This typically involves setting up an alertmanager.yml file with routes and receivers.

Linode Specific Considerations

When deploying on Linode, consider the following:

Network Latency: If your Perl app and MongoDB cluster are spread across different Linode data centers or even different regions, monitor network latency. High latency can impact application performance and replication.
Linode NodeBalancers: For high availability of your Perl application, use Linode NodeBalancers. Configure them to use your application’s health check endpoint for intelligent traffic distribution.
Disk I/O Performance: Linode offers different storage options (e.g., SSD, NVMe). For MongoDB, NVMe offers superior I/O performance, which is critical for database workloads. Monitor I/O wait times and throughput.
Resource Limits: Be mindful of Linode’s resource limits (CPU, RAM, network egress). Monitor these closely to avoid unexpected throttling or performance degradation.
Backups: While not strictly monitoring, robust backup strategies are essential. Linode’s Backups service takes disk-level snapshots, which are not guaranteed to capture a running database in a consistent state, so ensure you also have application-level backups (e.g., mongodump) for critical data.
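
For the network latency point in the list above, a quick measurement is the time a TCP handshake takes from the application host to a MongoDB node. The Python sketch below is one way to sample that with the standard library; the function names are hypothetical, and the host and port passed in are whatever your deployment uses.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, timeout=2.0):
    """Time one TCP handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it immediately
    return (time.perf_counter() - start) * 1000.0


def summarize(latencies_ms):
    """Median and worst case of a batch of latency samples."""
    return {
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }
```

Collecting a handful of samples, for example with [tcp_connect_latency_ms(host, 27017) for _ in range(10)], and passing them to summarize gives a baseline to compare against when replication lag or request times start to climb.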

By combining application-level health checks, system resource monitoring, and dedicated database cluster monitoring with tools like Prometheus and Grafana, you can build a resilient infrastructure that keeps your Perl applications and MongoDB clusters running smoothly on Linode.