Server Monitoring Best Practices: Keeping Your Perl App and MongoDB Clusters Alive on Linode
Proactive Health Checks for Perl Applications
Maintaining the health of a Perl application, especially one serving critical functions, requires more than just checking if the process is running. We need to ensure it’s responsive, handling requests efficiently, and not leaking resources. This involves a multi-layered approach, starting with basic process monitoring and extending to application-specific health endpoints.
Process and Resource Monitoring with `monit`
monit is an excellent, lightweight tool for managing and monitoring Unix systems. It can automatically perform actions, like restarting a service, when certain conditions are met. For a Perl application, we’ll typically monitor the main application process (e.g., a FastCGI or PSGI server) and its resource consumption.
First, ensure monit is installed on your Linode instance:
sudo apt update && sudo apt install monit -y
Next, configure monit to watch your Perl application. Assuming your Perl application is managed by a systemd service named myperlapp.service, a basic monit configuration might look like this. Create a new configuration file, for example, /etc/monit/conf.d/myperlapp:
check process myperlapp with pidfile /var/run/myperlapp.pid
    start program = "/bin/systemctl start myperlapp.service"
    stop program  = "/bin/systemctl stop myperlapp.service"
    if failed host 127.0.0.1 port 5000 protocol http then restart
    if 5 restarts within 5 cycles then timeout
    if cpu is greater than 80% for 2 cycles then alert
    if memory is greater than 500 MB for 2 cycles then restart
    if does not exist then restart
In this configuration:
- `check process myperlapp with pidfile /var/run/myperlapp.pid`: Defines a service named myperlapp and specifies its PID file. Ensure your Perl application correctly writes its PID to this file.
- `start program` and `stop program`: Define how to start and stop the service through systemd.
- `if failed host 127.0.0.1 port 5000 protocol http then restart`: A crucial application-level check. Monit connects to your application's HTTP port (5000 here) and sends an HTTP request. If the connection fails or the server answers with an error status, monit restarts the service. Adjust the port and protocol as needed.
- `if 5 restarts within 5 cycles then timeout`: Prevents rapid restart loops. If the service is restarted 5 times within 5 monitoring cycles, monit stops trying and alerts you. The cycle length comes from `set daemon` in monitrc; the Debian/Ubuntu package sets it to 120 seconds (2 minutes).
- `if cpu is greater than 80% for 2 cycles then alert`: Alerts if the process's CPU usage stays high for two consecutive cycles. You might choose to restart instead of just alerting.
- `if memory is greater than 500 MB for 2 cycles then restart`: Restarts the application if it uses more than 500 MB of memory for two consecutive cycles. Tune this based on your application's typical footprint.
- `if does not exist then restart`: Restarts the service if the process is not running, that is, the PID file is missing or refers to a process that no longer exists.
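The restart-loop guard (`if 5 restarts within 5 cycles`) is a sliding window over monitoring cycles. Below is a minimal Python sketch of that idea. `RestartGuard` and `record_restart` are hypothetical names. The class models the rule's behaviour as described above; it does not reproduce monit's internals.

```python
from collections import deque


class RestartGuard:
    """Simplified model of a monit-style "N restarts within M cycles" rule.

    Illustrates the sliding-window idea only; it is not monit's implementation.
    """

    def __init__(self, max_restarts=5, window_cycles=5):
        self.max_restarts = max_restarts
        self.window_cycles = window_cycles
        self.restart_cycles = deque()  # cycle numbers at which restarts happened
        self.monitored = True

    def record_restart(self, cycle):
        """Record a restart during `cycle`; return False once the guard gives up."""
        if not self.monitored:
            return False
        self.restart_cycles.append(cycle)
        # Forget restarts older than the last `window_cycles` cycles.
        while self.restart_cycles and self.restart_cycles[0] <= cycle - self.window_cycles:
            self.restart_cycles.popleft()
        if len(self.restart_cycles) >= self.max_restarts:
            # Too many restarts inside the window: stop restarting and alert instead.
            self.monitored = False
        return self.monitored
```

With the defaults, five restarts in cycles 1 to 5 trip the guard. A restart every other cycle never puts more than three restarts inside the window, so it never trips.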
After creating the configuration file, test it and reload monit:
sudo monit -t
sudo systemctl reload monit
You can check the status of your monitored services with:
sudo monit status
Application-Specific Health Endpoints
For more granular insight, your Perl application should expose a dedicated health check endpoint. This endpoint can perform internal checks, such as verifying database connections, cache availability, or the status of external API integrations. A small route or controller in a framework such as Mojolicious or Dancer2 is enough to provide this.
Here’s an example using Mojolicious:
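package MyApp::Controller::Health;
use Mojo::Base 'Mojolicious::Controller';    # also enables strict and warnings

# Assumes the MongoDB driver from CPAN is installed.
use MongoDB;

# Map it in MyApp::startup, e.g.: $self->routes->get('/health')->to('health#check');
sub check {
    my $self = shift;

    # Check MongoDB. Creating a client does not open a connection, so run a
    # cheap "ping" command; a short server-selection timeout keeps the check
    # from blocking for the driver's 30-second default.
    my $mongo_ok = eval {
        my $client = MongoDB->connect('mongodb://localhost:27017',
            { server_selection_timeout_ms => 2000 });
        $client->db('admin')->run_command([ ping => 1 ]);
        1;
    };
    unless ($mongo_ok) {
        return $self->render(
            json   => { status => 'error', message => 'MongoDB connection failed' },
            status => 503,
        );
    }

    # Example: check a critical external service (simulated)
    my $external_service_ok = eval {
        # Replace with an actual check, e.g. an HTTP request to the API
        1;    # assume success for now
    };
    unless ($external_service_ok) {
        return $self->render(
            json   => { status => 'error', message => 'External service unavailable' },
            status => 503,
        );
    }

    # All checks passed
    return $self->render(json => { status => 'ok', message => 'All systems nominal' }, status => 200);
}

1;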
This endpoint, typically mapped to /health, should return 200 OK when all dependencies are healthy and a non-2xx status (e.g., 503 Service Unavailable) otherwise. External monitoring can then poll it (for example Prometheus via its blackbox exporter, Nagios, or even a simple cron job with curl), and load balancers can use it for their health checks.
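For the cron-job option, a small poller is often enough. The sketch below uses only Python's standard library. It assumes the endpoint returns the JSON shape used in the controller above, with `status` and `message` keys. `check_health` is a hypothetical helper name, and the URL you pass (for example http://127.0.0.1:5000/health) depends on your deployment.

```python
import json
import urllib.error
import urllib.request


def check_health(url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Poll a /health endpoint; healthy only on HTTP 200 with status "ok"."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
            healthy = resp.status == 200 and body.get("status") == "ok"
            return healthy, body.get("message", "")
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for error codes such as 503; read its JSON body.
        try:
            message = json.loads(err.read().decode("utf-8")).get("message", "")
        except ValueError:
            message = ""
        return False, f"HTTP {err.code}: {message}"
    except (urllib.error.URLError, OSError, ValueError) as err:
        # Connection refused, timeout, or a response that is not valid JSON.
        return False, f"check failed: {err}"
```

A cron wrapper would call `check_health(...)` and exit non-zero when the first element is `False`, so cron's mail or your alerting picks it up.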
Monitoring MongoDB Clusters on Linode
MongoDB clusters, especially replica sets and sharded environments, require robust monitoring to ensure data availability, consistency, and performance. Key metrics include replication lag, oplog window, disk I/O, network traffic, query performance, and resource utilization.
Essential MongoDB Metrics
We’ll focus on metrics that are critical for operational health. Tools like mongostat and mongotop provide real-time snapshots, and mongodump and mongorestore are essential for backups, but continuous monitoring needs a more automated approach.
Key metrics to track:
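- Replication Lag: How far each secondary trails the primary, usually measured as the time difference between the last operation applied on the primary and the last one applied on the secondary. High lag means secondaries serve stale data, which can break read-your-writes consistency for clients that read from secondaries.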
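- Oplog Window Size: The span of time covered by the oplog, from its oldest to its newest entry. A shrinking window usually means write volume is growing relative to the oplog's size. If a secondary falls further behind than the window (for example, after being offline), it can no longer catch up from the oplog and must be resynced.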
- Network Traffic: Ingress/egress traffic to/from MongoDB instances.
- Disk I/O: Read/write operations per second, latency.
- Query Performance: Slow queries, query execution times.
- Connections: Number of active connections, connection pool usage.
- Memory Usage: Resident memory, virtual memory.
- CPU Usage: System CPU, user CPU.
- Disk Space: Available disk space on data volumes.
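Replication lag and oplog window are both simple time differences. The Python sketch below computes them from plain data. It assumes a dictionary shaped like the output of the `replSetGetStatus` command (what `rs.status()` returns), whose `members` entries carry `name`, `stateStr` and `optimeDate`. With PyMongo you could obtain such a document via `client.admin.command("replSetGetStatus")`, and the oplog bounds from the `ts` of the first and last entries in `local.oplog.rs`. The function names here are hypothetical.

```python
from datetime import datetime


def replication_lag_seconds(rs_status: dict) -> dict[str, float]:
    """Seconds each SECONDARY trails the PRIMARY, from replSetGetStatus-shaped data."""
    members = rs_status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise ValueError("no PRIMARY in replica set status")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"  # arbiters and other states are skipped
    }


def oplog_window_seconds(first_entry: datetime, last_entry: datetime) -> float:
    """Time span covered by the oplog: newest entry minus oldest entry."""
    return (last_entry - first_entry).total_seconds()
```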
Leveraging Prometheus and Grafana
Prometheus is a popular open-source monitoring and alerting system. It scrapes metrics from configured targets at given intervals, stores them in its time-series database, evaluates rule expressions, and can trigger alerts when conditions are met. Grafana is a visualization tool that can use Prometheus as a data source.
To monitor MongoDB with Prometheus, we use the mongodb_exporter. This exporter runs as a separate service, connects to your MongoDB instances, collects metrics, and exposes them via an HTTP endpoint (defaulting to port 9216) for Prometheus to scrape.
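What Prometheus scrapes from an exporter is plain text in the Prometheus text exposition format: `# HELP` and `# TYPE` lines followed by one `name{labels} value` line per sample. The Python sketch below renders a gauge in that format, so you can recognise what `curl http://<your_mongodb_exporter_ip>:9216/metrics` returns. The function names and the metric name are hypothetical and are not ones mongodb_exporter emits.

```python
def format_labels(labels: dict[str, str]) -> str:
    """Render a label set as {k="v",...}, escaping as the text format requires."""
    if not labels:
        return ""

    def esc(value: str) -> str:
        # Backslash, double quote and newline must be escaped in label values.
        return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    inner = ",".join(f'{k}="{esc(v)}"' for k, v in sorted(labels.items()))
    return "{" + inner + "}"


def render_gauge(name: str, help_text: str, samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one gauge metric family in Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        lines.append(f"{name}{format_labels(labels)} {value}")
    return "\n".join(lines) + "\n"
```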
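First, install the mongodb_exporter on a dedicated monitoring node or on one of your MongoDB nodes (if resources permit). Download the latest release from the exporter project's GitHub releases page (the widely used mongodb_exporter is developed by Percona at github.com/percona/mongodb_exporter).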
# Example for Linux AMD64. Check the releases page for the current version and
# exact asset name; some release tarballs unpack into a versioned directory.
wget https://github.com/mongodb-developer/mongodb_exporter/releases/download/v0.10.0/mongodb_exporter-0.10.0.linux.amd64.tar.gz
tar -xzf mongodb_exporter-0.10.0.linux.amd64.tar.gz
sudo mv mongodb_exporter /usr/local/bin/
rm mongodb_exporter-0.10.0.linux.amd64.tar.gz
Next, create a user in MongoDB with the necessary privileges for the exporter to read metrics. This user should have the clusterMonitor role.
# Connect to your MongoDB primary (mongosh prompts for the password;
# use the legacy mongo shell on older MongoDB releases)
mongosh --host <your_mongo_primary_ip> -u admin --authenticationDatabase admin
// Create the exporter user
use admin
db.createUser({
  user: "exporter",
  pwd: "your_exporter_password",
  roles: [ { role: "clusterMonitor", db: "admin" } ]
})
exit
Now, configure the mongodb_exporter to connect to your MongoDB instances. It’s best practice to keep credentials in a file rather than in command-line arguments, which other users on the host can see in `ps` output.
# Create a credentials file, e.g. /etc/mongodb_exporter/credentials.yml,
# and give it strict permissions:
#   chmod 600 /etc/mongodb_exporter/credentials.yml
#
# Example content:
#
# - uri: "mongodb://exporter:<exporter_password>@<rs_member_1>:27017/?replicaSet=rs0"
# - uri: "mongodb://exporter:<exporter_password>@<rs_member_2>:27017/?replicaSet=rs0"
# - uri: "mongodb://exporter:<exporter_password>@<rs_member_3>:27017/?replicaSet=rs0"
#
# If you have sharded clusters, you'll need to configure the config servers too.
# For a sharded cluster, you might have:
# - uri: "mongodb://exporter:<exporter_password>@<cluster_host_1>:27017"
# - uri: "mongodb://exporter:<exporter_password>@<cluster_host_2>:27017"
# - uri: "mongodb://exporter:<exporter_password>@<cluster_host_3>:27017"
#
# And for config servers (if not part of the replica set already listed):
# - uri: "mongodb://exporter:<exporter_password>@<config_server_1>:27019/?replicaSet=configReplSet"
# - uri: "mongodb://exporter:<exporter_password>@<config_server_2>:27019/?replicaSet=configReplSet"
# - uri: "mongodb://exporter:<exporter_password>@<config_server_3>:27019/?replicaSet=configReplSet"
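If a provisioning script should refuse to continue while the credentials file is readable by other users, a permission check along the lines of this Python sketch could enforce it. `is_private` is a hypothetical helper. It only inspects the permission bits, so modes 600 and 400 pass. The file must still be owned by the user the exporter runs as, or the exporter cannot read it.

```python
import os
import stat


def is_private(path: str) -> bool:
    """True if no group or other permission bits are set on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any read/write/execute bit for group or others makes the file "not private".
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```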
Create a systemd service file for mongodb_exporter (e.g., /etc/systemd/system/mongodb_exporter.service):
[Unit]
Description=MongoDB Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=mongodb_exporter
Group=mongodb_exporter
Type=simple
# Flag names vary between exporter projects and releases (Percona's exporter,
# for example, takes the connection string via --mongodb.uri or the MONGODB_URI
# environment variable); confirm them with `mongodb_exporter --help`.
ExecStart=/usr/local/bin/mongodb_exporter \
    --config.file=/etc/mongodb_exporter/credentials.yml \
    --web.listen-address=:9216 \
    --mongodb.log-dir=/var/log/mongodb \
    --mongodb.slow-queries-log=/var/log/mongodb/mongod.slow.log
Restart=on-failure

[Install]
WantedBy=multi-user.target
Create a dedicated system user and group for the exporter, plus its configuration directory:
sudo groupadd --system mongodb_exporter
sudo useradd --system -g mongodb_exporter --no-create-home --shell /usr/sbin/nologin mongodb_exporter
sudo mkdir -p /etc/mongodb_exporter
sudo chown -R mongodb_exporter:mongodb_exporter /etc/mongodb_exporter
# Do not chown /var/log/mongodb on a MongoDB host: mongod writes its own logs
# there as the mongodb user.
Reload systemd, start, and enable the exporter:
sudo systemctl daemon-reload
sudo systemctl start mongodb_exporter
sudo systemctl enable mongodb_exporter
sudo systemctl status mongodb_exporter
Now, configure Prometheus to scrape the mongodb_exporter. In your prometheus.yml file, add a scrape configuration:
scrape_configs:
  - job_name: 'mongodb'
    static_configs:
      - targets: ['<your_mongodb_exporter_ip>:9216']  # Replace with the IP of your exporter host
        labels:
          cluster: 'my-mongo-cluster'  # Add a descriptive label
          # If you have multiple MongoDB clusters, add more labels to differentiate
Restart Prometheus after updating its configuration.
Grafana Dashboards for MongoDB
Once Prometheus is scraping MongoDB metrics, you can import pre-built Grafana dashboards or create your own. A popular source for MongoDB dashboards is the Grafana Labs community dashboards. Search for “MongoDB” on grafana.com/grafana/dashboards/.
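A good starting point is a community “MongoDB” dashboard (for example ID 7426 or similar). After importing, configure it to use your Prometheus data source, and check that its queries match the metric names your exporter version actually exposes, since names differ between exporter releases. Pay close attention to panels that visualize: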
- Replication status and lag (e.g., `mongodb_replset_member_state`, `mongodb_replset_member_oplog_lag_seconds`)
- Oplog statistics (e.g., `mongodb_oplog_window_seconds`)
- Performance metrics (e.g., `mongodb_opcounter_insert`, `mongodb_opcounter_query`, `mongodb_network_bytes_in`, `mongodb_network_bytes_out`)
- Resource utilization (e.g., `mongodb_storage_data_size`, `mongodb_mem_resident`, `mongodb_cpu_user_seconds_total`)
- Connection counts (e.g., `mongodb_connections_current`)
Alerting with Prometheus Alertmanager
To make your monitoring truly proactive, set up alerting. Prometheus Alertmanager handles alerts sent by Prometheus server instances. It receives alerts, deduplicates them, groups them, and routes them to the correct receiver integration (e.g., Slack, PagerDuty, email).
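Deduplication and grouping are the core of what Alertmanager does before notifying anyone. The Python sketch below is a deliberately simplified model: identical label sets count as one alert, and the remaining alerts are bucketed by the labels you would list under `group_by` in a route. Real Alertmanager also applies timing (`group_wait`, `group_interval`, `repeat_interval`), silences, inhibition and a routing tree, none of which is modelled here. `group_alerts` is a hypothetical name.

```python
from collections import defaultdict


def group_alerts(alerts: list[dict[str, str]], group_by: list[str]) -> dict[tuple[str, ...], list[dict[str, str]]]:
    """Drop duplicate label sets, then bucket alerts by the `group_by` labels."""
    seen = set()
    groups = defaultdict(list)
    for labels in alerts:
        fingerprint = tuple(sorted(labels.items()))
        if fingerprint in seen:
            continue  # e.g. the same alert sent by two Prometheus replicas
        seen.add(fingerprint)
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)
```

Grouping by `alertname` and `cluster` turns a lagging cluster with several slow secondaries into one notification instead of one per member.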
Define alerting rules in Prometheus. For example, in a file like /etc/prometheus/rules/mongodb_alerts.yml:
groups:
  - name: mongodb.rules
    rules:
      - alert: HighReplicationLag
        # max, not avg: a single badly lagging secondary should trigger the alert.
        expr: max by (cluster) (mongodb_replset_member_oplog_lag_seconds) > 600  # Alert if lag is over 10 minutes
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High replication lag detected on {{ $labels.cluster }}"
          description: "MongoDB cluster {{ $labels.cluster }} has a secondary lagging more than 10 minutes behind the primary."
      - alert: LowOplogWindow
        # min, not avg: the smallest window is the one that limits recovery.
        expr: min by (cluster) (mongodb_oplog_window_seconds) < 3600  # Alert if oplog window is less than 1 hour
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Low oplog window on {{ $labels.cluster }}"
          description: "MongoDB cluster {{ $labels.cluster }} has an oplog window under 1 hour, so a lagging or restarted secondary has little time to catch up before it needs a full resync."
      - alert: HighDiskUsage
        # Uses node_exporter metrics and assumes /var/lib/mongodb is its own mount point.
        expr: node_filesystem_avail_bytes{mountpoint="/var/lib/mongodb"} / node_filesystem_size_bytes{mountpoint="/var/lib/mongodb"} * 100 < 15  # Alert if less than 15% disk space available
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on MongoDB data volume for {{ $labels.instance }}"
          description: "MongoDB data volume on {{ $labels.instance }} has less than 15% free space."
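The `for:` field is what keeps a brief spike from paging anyone. A rule is pending while its expression is true and only fires once the expression has stayed true for the whole duration. Below is a simplified Python model of that state machine. `PendingAlert` is a hypothetical name. Prometheus itself tracks this per series at each rule-evaluation interval.

```python
from datetime import datetime, timedelta


class PendingAlert:
    """Simplified model of an alerting rule's `for:` clause for one series."""

    def __init__(self, hold: timedelta):
        self.hold = hold
        self.active_since = None  # when the condition most recently became true

    def evaluate(self, now: datetime, condition: bool) -> str:
        if not condition:
            # Any false evaluation resets the rule to inactive.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        return "firing" if now - self.active_since >= self.hold else "pending"
```

Evaluated against the LowOplogWindow threshold (smallest window below 3600 seconds), a window that dips for a single evaluation goes pending and back to inactive without ever firing.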
Ensure these rules are loaded by Prometheus by adding them to your prometheus.yml:
rule_files:
  - "/etc/prometheus/rules/*.yml"
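Configure Alertmanager with receivers for your preferred notification channels (e.g., Slack, email). This typically involves setting up an alertmanager.yml file with routes and receivers. Prometheus also needs an `alerting` section in prometheus.yml that lists your Alertmanager instances, so it knows where to send firing alerts.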
Linode Specific Considerations
When deploying on Linode, consider the following:
- Network Latency: If your Perl app and MongoDB cluster are spread across different Linode data centers or even different regions, monitor network latency. High latency can impact application performance and replication.
- Linode NodeBalancers: For high availability of your Perl application, use Linode NodeBalancers. Configure their active health checks to request your application’s health check endpoint, so traffic goes only to backends that report healthy.
- Disk I/O Performance: Linode offers different storage options (e.g., SSD, NVMe). For MongoDB, NVMe offers superior I/O performance, which is critical for database workloads. Monitor I/O wait times and throughput.
- Resource Limits: Be mindful of Linode’s resource limits (CPU, RAM, network egress). Monitor these closely to avoid unexpected throttling or performance degradation.
- Backups: While not strictly monitoring, robust backup strategies are essential. Linode provides automated backups, but ensure you also have application-level backups (e.g., mongodump) for critical data.
By combining application-level health checks, system resource monitoring, and dedicated database cluster monitoring with tools like Prometheus and Grafana, you can build a resilient infrastructure that keeps your Perl applications and MongoDB clusters running smoothly on Linode.