Server Monitoring Best Practices: Keeping Your C++ App and MongoDB Clusters Alive on Linode
Proactive Health Checks for C++ Applications
Maintaining the health of a C++ application, especially one serving critical services, requires more than just basic process monitoring. We need to implement application-level health checks that can be queried externally. This allows our monitoring system to understand not just if the process is running, but if it’s *actually* serving requests correctly. For a C++ application, this often involves exposing an HTTP endpoint that performs internal checks.
Consider a simple C++ web server using `libmicrohttpd` or `Boost.Beast`. We can add a dedicated `/health` endpoint. This endpoint should:
1. Verify database connectivity (if applicable).
2. Check internal resource pools (e.g., thread pools, connection pools).
3. Confirm essential background tasks are running.
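The checks listed above can be collected behind a single function that the HTTP handler calls. The following is a minimal sketch under stated assumptions, not a definitive implementation: `HealthCheck`, `HealthReport`, and `run_health_checks` are hypothetical names, and the probes are assumed to be cheap callables supplied by your application.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical types for illustration; not part of any library.
struct HealthCheck {
    std::string name;             // e.g. "database" or "thread_pool"
    std::function<bool()> probe;  // returns true when the component is healthy
};

struct HealthReport {
    int http_status;   // 200 when every probe passes, 503 otherwise
    std::string body;  // small JSON document listing each check
};

// Runs every probe and builds a JSON body such as
// {"status":"ok","checks":{"database":"ok"}}.
// Check names are assumed to be plain identifiers that need no JSON escaping.
HealthReport run_health_checks(const std::vector<HealthCheck>& checks) {
    bool all_ok = true;
    std::string details;
    for (const HealthCheck& check : checks) {
        const bool ok = check.probe();
        all_ok = all_ok && ok;
        if (!details.empty()) details += ",";
        details += "\"" + check.name + "\":\"" + (ok ? "ok" : "fail") + "\"";
    }
    HealthReport report;
    report.http_status = all_ok ? 200 : 503;
    report.body = std::string("{\"status\":\"") + (all_ok ? "ok" : "degraded") +
                  "\",\"checks\":{" + details + "}}";
    return report;
}
```

Listing each check by name in the body lets whoever is on call see which component failed without digging through application logs, while the status code alone is enough for automated consumers.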
Implementing a Health Endpoint in C++
Here’s a conceptual example using `libmicrohttpd`. The actual implementation will depend on your application’s architecture and dependencies.
First, ensure you have `libmicrohttpd` installed. On Debian/Ubuntu:
sudo apt update
sudo apt install libmicrohttpd-dev
Next, the C++ code. We’ll add a handler for the `/health` path.
#include <microhttpd.h>
#include <cstring>   // strcmp, strlen
#include <string>
#include <vector>
#include <iostream>
#include <chrono>
#include <thread>
// Assume these functions exist and perform actual checks
bool is_database_connected() {
    // Simulate a check
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return true; // Or false if connection fails
}
bool are_essential_services_running() {
    // Simulate a check
    std::this_thread::sleep_for(std::chrono::milliseconds(30));
    return true; // Or false if services are down
}
// libmicrohttpd 0.9.71 and later declare the handler's return type as enum MHD_Result;
// with older releases, use int here and for `ret` below.
static enum MHD_Result health_handler(void *cls, struct MHD_Connection *connection,
        const char *url, const char *method, const char *version,
        const char *upload_data, size_t *upload_data_size, void **con_cls) {
    if (strcmp(url, "/health") == 0 && strcmp(method, "GET") == 0) {
        bool db_ok = is_database_connected();
        bool services_ok = are_essential_services_running();
        if (db_ok && services_ok) {
            const char *response = "{\"status\": \"ok\"}";
            struct MHD_Response *mhd_response = MHD_create_response_from_buffer(strlen(response), (void *)response, MHD_RESPMEM_PERSISTENT);
            MHD_add_response_header(mhd_response, MHD_HTTP_HEADER_CONTENT_TYPE, "application/json");
            // MHD_queue_response sets the status code and queues the response for sending
            enum MHD_Result ret = MHD_queue_response(connection, MHD_HTTP_OK, mhd_response);
            MHD_destroy_response(mhd_response);
            return ret;
        } else {
            const char *response = "{\"status\": \"degraded\"}";
            struct MHD_Response *mhd_response = MHD_create_response_from_buffer(strlen(response), (void *)response, MHD_RESPMEM_PERSISTENT);
            MHD_add_response_header(mhd_response, MHD_HTTP_HEADER_CONTENT_TYPE, "application/json");
            // Use 503 Service Unavailable for degraded/unhealthy states
            enum MHD_Result ret = MHD_queue_response(connection, MHD_HTTP_SERVICE_UNAVAILABLE, mhd_response);
            MHD_destroy_response(mhd_response);
            return ret;
        }
    }
    // Handle other requests or return 404
    const char *response = "Not Found";
    struct MHD_Response *mhd_response = MHD_create_response_from_buffer(strlen(response), (void *)response, MHD_RESPMEM_PERSISTENT);
    enum MHD_Result ret = MHD_queue_response(connection, MHD_HTTP_NOT_FOUND, mhd_response);
    MHD_destroy_response(mhd_response);
    return ret;
}
int main() {
    struct MHD_Daemon *daemon;
    // MHD_USE_INTERNAL_POLLING_THREAD makes libmicrohttpd serve requests on its own thread
    daemon = MHD_start_daemon(MHD_USE_INTERNAL_POLLING_THREAD, 8080, NULL, NULL,
                              &health_handler, NULL, MHD_OPTION_END);
    if (daemon == NULL) {
        std::cerr << "Failed to start daemon" << std::endl;
        return 1;
    }
    std::cout << "Server started on port 8080. Health check at /health" << std::endl;
    // Keep the server running
    // In a real app, this would be your main application logic loop
    while (true) {
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
    // Not reached in this example; a real app would leave the loop on shutdown
    MHD_stop_daemon(daemon);
    return 0;
}
Compile this with:
g++ -o health_server health_server.cpp -lmicrohttpd -std=c++11
This simple endpoint returns JSON indicating the application’s health. A `200 OK` signifies all checks passed, while a `503 Service Unavailable` indicates a problem. Load balancers and orchestration systems rely on that status code to decide whether to keep routing traffic to an instance or to restart it.
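Health endpoints tend to be polled every few seconds, so an expensive probe such as the database check can be cached briefly instead of running on every request. The sketch below is one possible approach, assuming a short staleness window is acceptable for your service; `CachedProbe` is a hypothetical helper, not part of `libmicrohttpd`.

```cpp
#include <chrono>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical helper: remembers the result of an expensive health probe for
// a short time so frequent polling does not re-run the probe on every request.
class CachedProbe {
public:
    using Clock = std::chrono::steady_clock;

    CachedProbe(std::function<bool()> probe, std::chrono::milliseconds ttl)
        : probe_(std::move(probe)), ttl_(ttl) {}

    // Returns the cached result while it is younger than the TTL; otherwise
    // re-runs the probe. `now` is a parameter so the logic can be exercised
    // without sleeping.
    bool check(Clock::time_point now = Clock::now()) {
        std::lock_guard<std::mutex> lock(mutex_);  // probe runs under the lock
        if (!has_value_ || now - last_run_ >= ttl_) {
            last_result_ = probe_();
            last_run_ = now;
            has_value_ = true;
        }
        return last_result_;
    }

private:
    std::function<bool()> probe_;
    std::chrono::milliseconds ttl_;
    std::mutex mutex_;
    bool has_value_ = false;
    bool last_result_ = false;
    Clock::time_point last_run_{};
};
```

A handler could hold one `CachedProbe` per dependency and call `check()` in place of the raw probe. The trade-off is that a failure can go unreported for up to one TTL, so keep the TTL shorter than your monitoring system's polling interval.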
Monitoring MongoDB Clusters on Linode
For MongoDB, robust monitoring is essential, especially in a cluster. Even if you use a managed MongoDB service rather than running the cluster yourself on Linode instances, understanding cluster health, performance metrics, and potential bottlenecks is vital. We’ll focus on using `mongostat`, `mongotop`, and Prometheus with a MongoDB exporter.
Essential MongoDB Command-Line Tools
mongostat provides a quick overview of current MongoDB server activity. It’s invaluable for real-time performance analysis.
# On a MongoDB node
mongostat --host mongodb.yourdomain.com --port 27017 --username your_user --password your_password --authenticationDatabase admin --discover
Key metrics to watch:
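- insert, query, update, delete: Operations per second. Spikes or sustained high rates need context.
- dirty: Percentage of the WiredTiger cache holding dirty (modified but not yet flushed) data. High values can indicate an undersized cache or slow writes.
- used: Percentage of the WiredTiger cache currently in use.
- res: Resident memory used by the process.
- qrw, arw: Clients queued waiting to read or write (qrw) and clients actively reading or writing (arw), each shown as `reads|writes`. Persistently high queue numbers suggest I/O bottlenecks or an overloaded server.
- net_in, net_out: Network traffic.
- lock %: Percentage of time locks were held. This column is reported only by older `mongostat` versions; where it is available, high lock contention is a major performance killer.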
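`mongostat` prints the `qrw` and `arw` columns as two numbers separated by a pipe, for example `0|0`. If you script around its output, a small parser keeps threshold checks readable. This is a sketch only: `ReadWritePair`, `parse_read_write`, and `has_queued_clients` are hypothetical names, and the threshold is whatever suits your workload.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical holder for a mongostat "reads|writes" field such as "3|12".
struct ReadWritePair {
    long reads;
    long writes;
};

// Returns true and fills `out` when `field` has the form "<int>|<int>".
// `out` is left untouched when parsing fails.
bool parse_read_write(const std::string& field, ReadWritePair& out) {
    const std::size_t bar = field.find('|');
    if (bar == std::string::npos || bar == 0 || bar + 1 >= field.size()) {
        return false;
    }
    try {
        std::size_t used = 0;
        const std::string lhs = field.substr(0, bar);
        const long reads = std::stol(lhs, &used);
        if (used != lhs.size()) return false;  // trailing junk before the pipe
        const std::string rhs = field.substr(bar + 1);
        const long writes = std::stol(rhs, &used);
        if (used != rhs.size()) return false;  // trailing junk after the pipe
        out.reads = reads;
        out.writes = writes;
        return true;
    } catch (const std::exception&) {
        return false;  // std::stol throws on non-numeric or out-of-range input
    }
}

// Example rule: flag the sample when more than `limit` clients are queued
// on either the read or the write side.
bool has_queued_clients(const ReadWritePair& qrw, long limit) {
    return qrw.reads > limit || qrw.writes > limit;
}
```

For anything beyond quick scripts, `mongostat --json` is usually easier to consume than the column output.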
mongotop shows the time spent reading and writing data on a per-collection basis. This helps identify hot collections.
# On a MongoDB node
mongotop --host mongodb.yourdomain.com --port 27017 --username your_user --password your_password --authenticationDatabase admin
Run this for a few minutes to get meaningful data. Look for collections with consistently high read or write times.
Automated Monitoring with Prometheus and MongoDB Exporter
For production environments, manual checks are insufficient. Prometheus, coupled with a MongoDB exporter (the community-maintained exporter from Percona is a common choice), provides a scalable and comprehensive monitoring solution. This involves deploying the exporter on each MongoDB node (or a dedicated monitoring node that can reach them) and configuring Prometheus to scrape its metrics.
1. Deploy MongoDB Exporter
The exporter can be run as a Docker container or a standalone binary. Using Docker is often simpler for management.
# Example using Docker
docker run -d \
  --name mongodb_exporter \
  -p 9274:9274 \
  -e "DATA_SOURCE_NAME=mongodb://your_user:your_password@mongodb.yourdomain.com:27017/admin?authSource=admin" \
  prom/mongodb-exporter:latest
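Replace `your_user`, `your_password`, and `mongodb.yourdomain.com` with your actual credentials and host. This example assumes the exporter exposes metrics on port `9274`. Image names, the connection-string variable, and the default port differ between exporter distributions (Percona's `mongodb_exporter`, for instance, listens on `9216` by default), so check the documentation of the image you deploy and adjust the port mapping to match.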
2. Configure Prometheus to Scrape MongoDB Exporter
Edit your Prometheus configuration file (`prometheus.yml`). Add a new scrape job for your MongoDB instances.
scrape_configs:
  - job_name: 'mongodb'
    static_configs:
      - targets:
          - 'mongodb-node-1.yourdomain.com:9274'
          - 'mongodb-node-2.yourdomain.com:9274'
          - 'mongodb-node-3.yourdomain.com:9274'
    metrics_path: /metrics
    # If using service discovery, this would be dynamic.
    # For Linode, you might use file_sd_configs or consul_sd_configs.
    # Example using file_sd_configs for static IPs:
    # file_sd_configs:
    #   - files:
    #       - '/etc/prometheus/file_sd/mongodb.json'
If you have a dynamic environment, consider using Prometheus’s service discovery mechanisms (e.g., Consul, Kubernetes, EC2) or a file-based discovery (`file_sd_configs`) where you maintain a JSON file listing your MongoDB exporter endpoints.
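A `file_sd_configs` file is a JSON list of target groups, each with a `targets` array and an optional `labels` object. If your provisioning tooling already knows the node list, it can regenerate that file whenever nodes change. The sketch below shows one way to build the document; `build_file_sd_json` and the `cluster` label are hypothetical, and host names are assumed to need no JSON escaping.

```cpp
#include <string>
#include <vector>

// Builds the JSON document Prometheus reads via file_sd_configs:
// a list containing one target group with "targets" and "labels".
// The "cluster" label is an illustrative choice, not a required key.
std::string build_file_sd_json(const std::vector<std::string>& hosts, int port,
                               const std::string& cluster_label) {
    std::string targets;
    for (const std::string& host : hosts) {
        if (!targets.empty()) targets += ", ";
        targets += "\"" + host + ":" + std::to_string(port) + "\"";
    }
    return "[\n  {\n    \"targets\": [" + targets + "],\n"
           "    \"labels\": {\"cluster\": \"" + cluster_label + "\"}\n  }\n]\n";
}
```

Write the result to a temporary file and rename it into place so Prometheus never reads a half-written document. Prometheus watches the listed files and picks up changes without a restart.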
3. Key Prometheus Metrics for MongoDB
Once Prometheus is scraping, you can build dashboards (e.g., in Grafana) and set up alerts. Metric names differ between exporter versions, so confirm them against your exporter's `/metrics` output. Essential metrics include:
- mongodb_up: Indicates if the exporter can connect to MongoDB.
- mongodb_mongod_connections_current: Number of active connections.
- mongodb_mongod_network_bytes_in_total, mongodb_mongod_network_bytes_out_total: Network throughput.
- mongodb_mongod_opcounters_insert_total, mongodb_mongod_opcounters_query_total, etc.: Operation counts.
- mongodb_mongod_locks_time_acquiring_total: Lock acquisition times.
- mongodb_mongod_memory_resident_total: Resident memory usage.
- mongodb_mongod_metrics_document_total: Counts of documents inserted, returned, updated, and deleted (useful for tracking workload growth).
- mongodb_replication_oplog_window_seconds: Oplog window size. Crucial for replica set health and catch-up times.
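Several of the metrics above are counters (names ending in `_total`), which only ever increase until the process restarts. Dashboards normally graph their per-second rate rather than the raw value. The function below is a simplified sketch of the idea behind PromQL's `rate()`. It ignores the extrapolation Prometheus performs and handles only a single reset between samples; `counter_rate` is a hypothetical name.

```cpp
// Per-second increase between two samples of a counter. If the counter went
// backwards (for example after a mongod restart), the current value is
// treated as the increase since the reset.
double counter_rate(double previous, double current, double elapsed_seconds) {
    if (elapsed_seconds <= 0.0) return 0.0;  // no usable interval
    const double increase = current >= previous ? current - previous : current;
    return increase / elapsed_seconds;
}
```

For example, an insert counter that moves from 1000 to 1600 over 60 seconds corresponds to 10 inserts per second.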
Alerting on Critical Conditions
Alerting is the final, critical piece. Configure Prometheus Alertmanager to notify you of issues before they impact users.
Example Alert Rule (in a Prometheus rules file, e.g., `alerts.yml`):
groups:
  - name: mongodb_alerts
    rules:
      - alert: MongoDBHighConnectionCount
        expr: mongodb_mongod_connections_current > 500 # Adjust threshold
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MongoDB high connection count on {{ $labels.instance }}"
          description: "Instance {{ $labels.instance }} has {{ $value }} active connections, exceeding the threshold."
      - alert: MongoDBOplogWindowLow
        expr: mongodb_replication_oplog_window_seconds < 600 # Oplog window less than 10 minutes
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "MongoDB oplog window is short on {{ $labels.instance }}"
          description: "Instance {{ $labels.instance }} has an oplog window of {{ $value }} seconds. A secondary that falls further behind than this cannot catch up and needs a full resync."
      - alert: MongoDBExporterDown
        expr: up{job="mongodb"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MongoDB exporter is down on {{ $labels.instance }}"
          description: "The Prometheus exporter for MongoDB on {{ $labels.instance }} is unreachable."
Ensure your `alertmanager.yml` is configured to route these alerts to your preferred notification channels (Slack, PagerDuty, email, etc.). Regularly review these alerts and thresholds to tune them for your specific workload and tolerance for downtime.
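When tuning thresholds, it helps to keep in mind what the `for:` clause does: the expression has to stay true across consecutive evaluations for the whole duration before the alert fires, and a single false evaluation resets the timer. The class below is a simplified model of that behaviour for reasoning about rule timing. It is not Prometheus code, and `ForClauseTracker` and `AlertState` are hypothetical names.

```cpp
#include <chrono>

// Simplified model of an alert rule with a `for:` clause.
enum class AlertState { Inactive, Pending, Firing };

class ForClauseTracker {
public:
    using Seconds = std::chrono::seconds;

    explicit ForClauseTracker(Seconds hold) : hold_(hold) {}

    // Feed one rule evaluation: whether the expression matched, and the
    // evaluation time as seconds since an arbitrary starting point.
    AlertState evaluate(bool condition_true, Seconds now) {
        if (!condition_true) {
            state_ = AlertState::Inactive;  // any false evaluation resets
        } else if (state_ == AlertState::Inactive) {
            active_since_ = now;            // condition just became true
            state_ = hold_.count() == 0 ? AlertState::Firing : AlertState::Pending;
        } else if (now - active_since_ >= hold_) {
            state_ = AlertState::Firing;    // held long enough
        }
        return state_;
    }

private:
    Seconds hold_;
    Seconds active_since_{0};
    AlertState state_ = AlertState::Inactive;
};
```

With `for: 5m`, a connection spike that clears after four minutes never leaves the pending state, which is why short `for:` values suit hard failures like an unreachable exporter and longer ones suit noisy metrics.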