Server Monitoring Best Practices: Keeping Your C++ App and MongoDB Clusters Alive on AWS
Proactive C++ Application Health Checks on AWS EC2
Maintaining the health of C++ applications deployed on AWS EC2 instances requires a multi-layered monitoring strategy. Beyond basic CPU and memory utilization, we need to inspect application-specific metrics and ensure critical processes are running. This involves leveraging system-level tools and custom application instrumentation.
Implementing a C++ Application Health Check Endpoint
A robust approach is to embed a simple HTTP health check endpoint directly within your C++ application. This endpoint can perform internal checks, such as verifying database connections, queue status, or the availability of essential services. For this example, we’ll use the `cpprestsdk` for a lightweight HTTP server.
First, ensure you have `cpprestsdk` installed. On Ubuntu/Debian, this is typically:
sudo apt update && sudo apt install libcpprest-dev
Here’s a minimal C++ example demonstrating a health check endpoint:
#include <chrono>
#include <exception>
#include <iostream>
#include <string>
#include <cpprest/http_listener.h>
#include <cpprest/json.h>

using namespace web;
using namespace web::http;
using namespace web::http::experimental::listener;

// Simulate a critical dependency check
bool is_database_connected() {
    // In a real app, this would check your MongoDB connection pool or similar
    return true; // Placeholder
}

void handle_get(http_request message) {
    // Seconds since the Unix epoch, so the value means the same on every platform
    const auto now = std::chrono::duration_cast<std::chrono::seconds>(
        std::chrono::system_clock::now().time_since_epoch()).count();

    web::json::value response_json;
    response_json[U("status")] = web::json::value::string(U("OK"));
    response_json[U("timestamp")] = web::json::value::string(
        utility::conversions::to_string_t(std::to_string(now)));

    // Perform application-specific checks
    status_code code = status_codes::OK;
    if (!is_database_connected()) {
        response_json[U("status")] = web::json::value::string(U("ERROR"));
        response_json[U("message")] = web::json::value::string(U("Database connection failed"));
        // Signal the failure in the HTTP status too, so probes that only
        // inspect the status code (load balancers, for example) notice it
        code = status_codes::ServiceUnavailable;
    }
    message.reply(code, response_json);
}

int main() {
    // The port your application normally runs on, or a dedicated monitoring port
    uri_builder uri(U("http://0.0.0.0:8080"));
    uri.set_path(U("/health"));
    auto addr = uri.to_uri().to_string();
    http_listener listener(addr);
    listener.support(methods::GET, handle_get);
    try {
        listener.open().wait();
        ucout << U("Listening for requests at: ") << listener.uri().to_string() << std::endl;
        // Keep the server running
        std::cout << "Press ENTER to exit." << std::endl;
        std::string line;
        std::getline(std::cin, line);
        listener.close().wait();
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
    }
    return 0;
}
Compile this code (e.g., using g++) and run the resulting binary:
g++ health_check.cpp -o health_check -lcpprest -lboost_system -lpthread -lssl -lcrypto
./health_check
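Once the binary is running, it helps to probe the endpoint from a script and confirm it behaves as a monitor would expect. The following is a minimal sketch of such a client, not part of the application itself: `classify_health` and `probe` are hypothetical helper names, and the sketch assumes the endpoint returns a JSON object with a `status` field as in the example above. It treats a response as healthy only when the HTTP status is 200 and the body reports "OK", so it works whether the failure is signalled in the status code, the body, or both.

```python
import json
import urllib.error
import urllib.request


def classify_health(status_code, body):
    """Return (healthy, detail) for one health-check response."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return False, "response body is not valid JSON"
    if not isinstance(payload, dict):
        return False, "response body is not a JSON object"
    status = payload.get("status")
    if status_code == 200 and status == "OK":
        return True, "OK"
    # Prefer the application's own message when it provides one
    return False, payload.get("message") or f"HTTP {status_code}, status={status!r}"


def probe(url="http://localhost:8080/health", timeout=2.0):
    """Fetch the endpoint once; any network failure counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_health(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses, but the body is still readable
        return classify_health(err.code, err.read().decode("utf-8", "replace"))
    except OSError as err:
        return False, f"request failed: {err}"


if __name__ == "__main__":
    print(probe())
```

Keeping the classification separate from the network call makes the decision logic easy to test without a running server.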
Integrating with AWS CloudWatch Agent
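The CloudWatch Agent collects system metrics and logs from your EC2 instances. Its documented configuration has no section for polling an HTTP endpoint, so the result of our health check has to reach CloudWatch as a custom metric, for example through the agent’s StatsD listener or a direct `PutMetricData` call.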
First, install the CloudWatch Agent on your EC2 instance. Refer to the official AWS documentation for the latest installation instructions for your OS.
Next, create a CloudWatch Agent configuration file (e.g., /opt/aws/amazon-cloudwatch-agent/bin/config.json). This configuration will include:
- Metrics collection (CPU, memory, disk, network).
- Custom metrics from our application’s health check endpoint.
- Log file collection.
Here’s a sample configuration snippet focusing on the custom metrics. The `statsd` section opens a local StatsD listener on UDP port 8125, through which a small poller can report the health endpoint’s status as a numeric metric:
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "cwagent"
  },
  "metrics": {
    "namespace": "MyCppApp",
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "statsd": {
        "service_address": ":8125",
        "metrics_collection_interval": 60,
        "metrics_aggregation_interval": 60
      }
    }
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/my_cpp_app.log",
            "log_group_name": "MyCppAppLogs",
            "log_stream_name": "{instance_id}/my_cpp_app"
          }
        ]
      }
    }
  }
}
Start the agent with this configuration:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
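One way to get the health status into CloudWatch is a small poller that checks the endpoint and reports the result to the agent's StatsD listener. The sketch below assumes the agent configuration contains a `statsd` section under `metrics_collected` listening on UDP port 8125 (the StatsD default); the function names and the `HealthStatus` metric name are illustrative. The status is sent as a gauge of 1 (healthy) or 0 (unhealthy), since CloudWatch metrics are numeric.

```python
import json
import socket
import urllib.request


def endpoint_is_healthy(url="http://localhost:8080/health", timeout=2.0):
    """True only if the endpoint answers 200 with {"status": "OK"}."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and json.load(resp).get("status") == "OK"
    except (OSError, ValueError, AttributeError):
        # Connection errors, HTTP 4xx/5xx, bad JSON, or a non-object body
        return False


def statsd_gauge(name, value):
    """Format one StatsD gauge datagram, e.g. b'HealthStatus:1|g'."""
    return f"{name}:{value}|g".encode("ascii")


def send_health(healthy, host="127.0.0.1", port=8125, name="HealthStatus"):
    """Send 1 (healthy) or 0 (unhealthy) to a StatsD listener over UDP."""
    payload = statsd_gauge(name, 1 if healthy else 0)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    send_health(endpoint_is_healthy())
```

Run it from cron or a systemd timer once per collection interval. The metric should then appear under the namespace set in the agent's `metrics` section, with whatever dimensions the agent attaches.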
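You can then create CloudWatch Alarms on the custom `HealthStatus` metric in the `MyCppApp` namespace. Alarms evaluate numeric values, so publish the status as a number (for example, 1 for “OK” and 0 otherwise) and trigger the alarm when it stays below 1 for a sustained period. Treat missing data as breaching so that a crashed process, which reports nothing at all, also raises the alarm.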
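Such an alarm can also be created programmatically. The sketch below uses boto3's `put_metric_alarm` and assumes the health status is published as a numeric `HealthStatus` metric (1 for healthy, 0 for unhealthy) with a single `InstanceId` dimension. The helper names, the alarm name, and the three-minute evaluation window are illustrative choices. Note that the dimension set in an alarm must match the published metric exactly, so adjust `Dimensions` if your publisher adds others.

```python
def health_alarm_params(instance_id, namespace="MyCppApp",
                        metric_name="HealthStatus", sns_topic_arn=None):
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    params = {
        "AlarmName": f"{namespace}-{metric_name}-{instance_id}",
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Minimum",          # a single 0 in the period counts
        "Period": 60,                    # seconds per evaluation period
        "EvaluationPeriods": 3,          # unhealthy for 3 periods in a row
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # no data means the app is down
    }
    if sns_topic_arn:
        params["AlarmActions"] = [sns_topic_arn]
    return params


def create_health_alarm(instance_id, region="us-east-1", **kwargs):
    """Create or update the alarm; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**health_alarm_params(instance_id, **kwargs))
```

Using the `Minimum` statistic means one unhealthy sample within a period is enough to mark that period as breaching.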
Monitoring MongoDB Clusters on AWS (e.g., using EC2 or DocumentDB)
Monitoring MongoDB clusters, whether self-hosted on EC2 or running on Amazon DocumentDB (a MongoDB-compatible managed service), requires a focus on performance metrics, replication status, and resource utilization.
Self-Hosted MongoDB on EC2: Key Metrics and Tools
For MongoDB instances running on EC2, we’ll use a combination of MongoDB’s built-in tools and CloudWatch Agent.
Essential Metrics to Monitor:
- Connections: `current`, `available`, `totalCreated`.
- Network: `bytesIn`, `bytesOut`, `numRequests`.
- Memory: `resident`, `virtual`, `mapped`.
- Disk: `fsUsedSize`, `fsTotalSize`.
- Operations: `insert`, `query`, `update`, `delete`, `command` (per second).
- Replication: oplog lag (crucial for replica sets).
- Cache: `pagesections`, `page_faults`.
- Locking: `globalLock.currentQueue.total`, `globalLock.lockTime`.
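The operation and network counters reported by `serverStatus` are cumulative since the server started, so a per-second figure has to be derived from two samples. A minimal sketch of that calculation, assuming each sample is a plain dict of counter values such as the `opcounters` section (the function name is hypothetical):

```python
def per_second_rates(previous, current, elapsed_seconds):
    """Convert two samples of cumulative counters into per-second rates."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rates = {}
    for name, value in current.items():
        old = previous.get(name)
        # Skip counters missing from the first sample and non-numeric entries
        if not isinstance(value, (int, float)) or not isinstance(old, (int, float)):
            continue
        delta = value - old
        if delta < 0:
            # The counter went backwards, which means the server restarted;
            # everything counted so far happened since the restart
            delta = value
        rates[name] = delta / elapsed_seconds
    return rates


if __name__ == "__main__":
    before = {"insert": 1000, "query": 5000}
    after = {"insert": 1600, "query": 5300}
    print(per_second_rates(before, after, 60))  # {'insert': 10.0, 'query': 5.0}
```

Handling the restart case explicitly avoids publishing a large negative rate the first time the server comes back up.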
Using mongostat and mongotop:
These command-line utilities provide real-time insights. You can script their output for periodic collection.
# Real-time stats
mongostat --host <your_mongo_host>:<port> --username <user> --password <password> --authenticationDatabase admin 1
# Real-time collection/index usage
mongotop --host <your_mongo_host>:<port> --username <user> --password <password> --authenticationDatabase admin 10
Integrating with CloudWatch Agent for Custom Metrics:
We can create a small script that periodically collects these metrics, either by parsing `mongostat` output or by querying the server directly, and then sends them to CloudWatch using the AWS SDK or by writing to a file that CloudWatch Agent monitors.
Here’s a Python script example that uses `pymongo` to fetch metrics and `boto3` to publish them to CloudWatch:
import time
from datetime import datetime, timezone

import boto3
import pymongo

# --- Configuration ---
MONGO_HOST = "mongodb://user:password@your_mongo_host:27017/?authSource=admin"
CLOUDWATCH_NAMESPACE = "MongoDB"
REGION_NAME = "us-east-1"
INSTANCE_ID = "i-xxxxxxxxxxxxxxxxx"  # Or fetch dynamically
# ---------------------

client = pymongo.MongoClient(MONGO_HOST)
db = client.admin
cloudwatch = boto3.client('cloudwatch', region_name=REGION_NAME)


def metric(name, value, unit='Count'):
    """Build one CloudWatch MetricData entry."""
    return {'MetricName': name, 'Value': value, 'Unit': unit}


def get_mongo_metrics():
    metrics = []
    try:
        # Basic server status
        server_status = db.command('serverStatus')

        # Connections
        connections = server_status['connections']
        metrics.append(metric('ConnectionsCurrent', connections['current']))
        metrics.append(metric('ConnectionsAvailable', connections['available']))

        # Network (cumulative since the server started)
        network = server_status['network']
        metrics.append(metric('NetworkInBytes', network['bytesIn'], 'Bytes'))
        metrics.append(metric('NetworkOutBytes', network['bytesOut'], 'Bytes'))
        metrics.append(metric('NetworkRequests', network['numRequests']))

        # Operations (cumulative counters; a per-second rate requires
        # calculating the difference between two samples, omitted here)
        opcounters = server_status['opcounters']
        metrics.append(metric('OperationsInsert', opcounters['insert']))
        metrics.append(metric('OperationsQuery', opcounters['query']))

        # Replication lag (for replica sets). replSetGetStatus has no
        # ready-made lag field, so compare this member's optimeDate
        # with the primary's.
        if 'repl' in server_status:
            members = db.command('replSetGetStatus')['members']
            primary = next((m for m in members if m.get('stateStr') == 'PRIMARY'), None)
            me = next((m for m in members if m.get('self')), None)
            if primary and me:
                lag = (primary['optimeDate'] - me['optimeDate']).total_seconds()
                metrics.append(metric('ReplicationOplogLagSeconds', max(lag, 0), 'Seconds'))

        # Global Lock
        metrics.append(metric('GlobalLockQueueTotal',
                              server_status['globalLock']['currentQueue']['total']))
    except Exception as e:
        print(f"Error fetching MongoDB metrics: {e}")
        # Optionally send an error metric
        metrics.append(metric('MongoHealthCheckFailed', 1))
    return metrics


def publish_metrics(metrics):
    if not metrics:
        return
    # Add the common dimension and a timestamp to every datum;
    # put_metric_data takes Timestamp per datum, not per call
    now = datetime.now(timezone.utc)
    for m in metrics:
        m['Dimensions'] = [{'Name': 'InstanceId', 'Value': INSTANCE_ID}]
        m['Timestamp'] = now
    try:
        cloudwatch.put_metric_data(
            Namespace=CLOUDWATCH_NAMESPACE,
            MetricData=metrics
        )
        print(f"Published {len(metrics)} metrics to CloudWatch.")
    except Exception as e:
        print(f"Error publishing metrics to CloudWatch: {e}")


if __name__ == "__main__":
    # This script should be run periodically (e.g., via cron or systemd timer)
    # For demonstration, we'll run it once. In production, use a loop with sleep.
    mongo_metrics = get_mongo_metrics()
    publish_metrics(mongo_metrics)
    client.close()

    # Example of running in a loop:
    # while True:
    #     mongo_metrics = get_mongo_metrics()
    #     publish_metrics(mongo_metrics)
    #     time.sleep(60)  # Collect metrics every 60 seconds
Configure the CloudWatch Agent’s config.json to collect logs from MongoDB (e.g., /var/log/mongodb/mongod.log) and potentially custom metrics if you choose to write them to a file in a specific format.
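The script's `INSTANCE_ID` setting is marked "Or fetch dynamically". On EC2 this can be done through the instance metadata service. The sketch below uses the token-based IMDSv2 flow with only the standard library; the function names are hypothetical, and the code only works when run on an EC2 instance with the metadata service reachable.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def token_request(base=IMDS_BASE, ttl_seconds=60):
    """Build the PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{base}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def instance_id_request(token, base=IMDS_BASE):
    """Build the GET request for the instance ID, authenticated by token."""
    return urllib.request.Request(
        f"{base}/meta-data/instance-id",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_instance_id(timeout=2.0):
    """Return this instance's ID; raises OSError if IMDS is unreachable."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode("ascii")
    with urllib.request.urlopen(instance_id_request(token), timeout=timeout) as resp:
        return resp.read().decode("ascii")
```

In the metrics script, `INSTANCE_ID = fetch_instance_id()` would then replace the hard-coded placeholder.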
Amazon DocumentDB Monitoring
Amazon DocumentDB integrates with CloudWatch out of the box. You don’t need custom scripts for basic metrics: DocumentDB automatically publishes instance-level and cluster-level metrics to CloudWatch under the AWS/DocDB namespace.
Key DocumentDB Metrics in CloudWatch:
- Connections: `DatabaseConnections`.
- CPU: `CPUUtilization`.
- Memory: `FreeableMemory`.
- Disk: `FreeLocalStorage`, `VolumeBytesUsed`.
- Operations: `ReadIOPS`, `WriteIOPS`, `ReadLatency`, `WriteLatency`.
- Replication: `DBInstanceReplicaLag`, `DBClusterReplicaLagMaximum`.
Monitoring Replication Lag:
The `DBInstanceReplicaLag` metric is critical. It is reported in milliseconds, so an acceptable lag of, say, 10 seconds corresponds to an alarm threshold of 10,000. Create CloudWatch Alarms to notify you when the lag exceeds your threshold, since sustained lag indicates potential issues with data propagation across your DocumentDB cluster instances.
DocumentDB Performance Insights:
Enable Performance Insights for your DocumentDB cluster. This provides deeper visibility into database load, allowing you to identify slow queries, long-running operations, and other performance bottlenecks. It’s invaluable for tuning and troubleshooting.
Centralized Logging and Alerting
Regardless of whether you’re monitoring C++ apps or MongoDB, a centralized logging and alerting strategy is paramount. Use CloudWatch Logs to aggregate logs from all your instances and services. Configure metric filters and alarms to trigger notifications via SNS when specific error patterns or threshold breaches occur.
Example: CloudWatch Metric Filter for C++ App Errors
In the CloudWatch console, navigate to your Log Group (e.g., MyCppAppLogs) and create a Metric Filter. Use a filter pattern like:
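"[ERROR]"
The double quotes matter: an unquoted pattern in square brackets is interpreted as a space-delimited field pattern, not as a literal term. With the quotes, this filter counts log entries containing “[ERROR]”. You can then create a CloudWatch Alarm on this metric to alert you as soon as application errors appear.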
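The same metric filter can be created from code instead of the console. This is a minimal sketch using boto3's `put_metric_filter`; the filter name `cpp-app-errors`, the metric name `ErrorCount`, and the `MyCppApp` namespace are illustrative assumptions. The pattern is wrapped in double quotes so that it is matched as a literal term.

```python
def error_filter_params(log_group="MyCppAppLogs", namespace="MyCppApp"):
    """Build keyword arguments for CloudWatch Logs put_metric_filter."""
    return {
        "logGroupName": log_group,
        "filterName": "cpp-app-errors",
        # Quoted so that [ERROR] is matched literally
        "filterPattern": '"[ERROR]"',
        "metricTransformations": [{
            "metricName": "ErrorCount",
            "metricNamespace": namespace,
            "metricValue": "1",    # each matching log event adds 1
            "defaultValue": 0.0,   # emit 0 when nothing matched
        }],
    }


def create_error_filter(region="us-east-1", **kwargs):
    """Create or update the filter; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it
    logs = boto3.client("logs", region_name=region)
    logs.put_metric_filter(**error_filter_params(**kwargs))
```

Setting `defaultValue` to 0 gives the metric a continuous series, which makes alarms on it behave predictably during quiet periods.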
For MongoDB, filter for specific error messages or patterns in your MongoDB logs. For DocumentDB, leverage the structured logs and metrics provided by AWS.
Conclusion
A comprehensive monitoring strategy for C++ applications and MongoDB clusters on AWS involves proactive health checks, detailed metric collection, centralized logging, and timely alerting. By combining application-level instrumentation, CloudWatch Agent, and AWS’s managed service monitoring capabilities (like DocumentDB’s integration), you can ensure the stability, performance, and availability of your critical systems.