Server Monitoring Best Practices: Keeping Your C++ App and MySQL Clusters Alive on DigitalOcean
Proactive C++ Application Health Checks
For a C++ application running on DigitalOcean, a live process is not proof of a working service, so health checks need to verify internal state and responsiveness as well as process existence. A common pattern is to expose an HTTP endpoint, served by the application itself, that an external monitoring system polls. This endpoint should run the critical internal checks.
Consider a C++ application that manages a connection pool to MySQL. The health check endpoint should not only confirm the application process is running but also validate the health of its critical dependencies, like the MySQL connection pool. Here’s a conceptual outline of how such an endpoint might be implemented using a lightweight HTTP server library like cpp-httplib.
Implementing a Health Check Endpoint
The health check endpoint, typically exposed on a dedicated port (e.g., 8080), should return a 200 OK status for healthy states and a non-2xx status code (e.g., 503 Service Unavailable) for unhealthy states. The response body can contain detailed JSON diagnostics.
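The status-code contract described above can be sketched as a pair of small pure functions. This is only an illustration: the `HealthState` enum and the function names are hypothetical, and treating a degraded service as 503 is an assumption that matches the example in this section rather than a universal rule.

```cpp
#include <string>

// Hypothetical overall states for the service (names are illustrative).
enum class HealthState { Ok, Degraded, Down };

// Map an overall state to the HTTP status a poller should see.
// Only a fully healthy service answers 200; everything else is 503.
int http_status_for(HealthState state) {
    return state == HealthState::Ok ? 200 : 503;
}

// Human-readable label that could go into the JSON diagnostics body.
std::string label_for(HealthState state) {
    switch (state) {
        case HealthState::Ok:       return "ok";
        case HealthState::Degraded: return "degraded";
        case HealthState::Down:     return "down";
    }
    return "unknown"; // not reached for valid enum values
}
```

Keeping this mapping in one place makes it easy to change later, for instance if a degraded service should stay in rotation and answer 200 with a warning in the body.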
Example C++ Health Check Logic
This snippet demonstrates a basic health check within a C++ application. It assumes you have a MySQLConnectionPool class with a ping() method that verifies connectivity.
#include <exception>
#include <iostream>
#include <string>
#include <thread>            // std::thread, used in the main() sketch below
#include <httplib.h>         // Assuming cpp-httplib is used
#include <nlohmann/json.hpp> // For JSON serialization
// Minimal interface assumed for your MySQL connection pool. A bare forward
// declaration is not enough here because ping() is called below. In a real
// project, include your pool's header instead.
class MySQLConnectionPool {
public:
    bool ping(); // True if the pool can reach MySQL; implemented elsewhere
};
// Assume this is initialized elsewhere and holds the connection pool
extern MySQLConnectionPool g_mysql_pool;
// Run all dependency checks and return the results as a JSON object
nlohmann::json perform_health_check() {
    nlohmann::json status;
    status["application_status"] = "ok";
    status["dependencies"] = nlohmann::json::object();
    // Check MySQL connection pool
    try {
        if (g_mysql_pool.ping()) {
            status["dependencies"]["mysql"] = "ok";
        } else {
            status["application_status"] = "degraded";
            status["dependencies"]["mysql"] = "unreachable";
        }
    } catch (const std::exception& e) {
        status["application_status"] = "degraded";
        status["dependencies"]["mysql"] = "error: " + std::string(e.what());
    }
    // Add other dependency checks here (e.g., Redis, external services)
    return status;
}
// Serves /healthz on the given port; blocks the calling thread while running
void setup_health_check_server(int port) {
    httplib::Server svr;
    svr.Get("/healthz", [](const httplib::Request&, httplib::Response& res) {
        const nlohmann::json status = perform_health_check();
        // 200 only when every check passed; any other state maps to 503
        res.status = (status.at("application_status") == "ok") ? 200 : 503;
        res.set_content(status.dump(4), "application/json"); // Pretty-printed JSON
    });
    std::cout << "Health check server starting on port " << port << std::endl;
    // listen() returns only when the server stops or fails to bind
    if (!svr.listen("0.0.0.0", port)) {
        std::cerr << "Failed to start health check server!" << std::endl;
        // Handle error appropriately, perhaps exit or retry
    }
}
// In your main function:
// int main() {
//     // ... initialize g_mysql_pool ...
//     std::thread health_thread(setup_health_check_server, 8080);
//     // ... rest of your application logic ...
//     health_thread.join();
//     return 0;
// }
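The handler above pings MySQL on every request. If a load balancer and a monitoring system both poll the endpoint every few seconds, it may be worth caching the result briefly. The following is a minimal sketch under that assumption; `CachedHealthCheck` is a hypothetical helper, not part of cpp-httplib, and the probe is any callable that returns the serialized health report.

```cpp
#include <chrono>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical helper: caches the result of an expensive health probe for a
// fixed time-to-live so that frequent polls do not each hit the database.
class CachedHealthCheck {
public:
    using Clock = std::chrono::steady_clock;

    CachedHealthCheck(std::function<std::string()> probe, Clock::duration ttl)
        : probe_(std::move(probe)), ttl_(ttl) {}

    // Returns the cached result while it is fresh, otherwise re-runs the probe.
    // The mutex is held during the probe, so concurrent callers wait for it.
    std::string get() {
        std::lock_guard<std::mutex> lock(mutex_);
        const auto now = Clock::now();
        if (!has_value_ || now - last_run_ >= ttl_) {
            cached_ = probe_();
            last_run_ = now;
            has_value_ = true;
        }
        return cached_;
    }

private:
    std::function<std::string()> probe_;
    Clock::duration ttl_;
    std::mutex mutex_;
    std::string cached_;
    Clock::time_point last_run_{};
    bool has_value_ = false;
};
```

A time-to-live shorter than the polling interval keeps the report fresh while still absorbing bursts of requests. The trade-off is that a failure can go unreported for up to one TTL.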
Monitoring MySQL Clusters on DigitalOcean with Prometheus & Grafana
For MySQL clusters, especially those managed by DigitalOcean’s Managed Databases, a robust monitoring solution is essential. Prometheus, coupled with Grafana for visualization, provides a powerful and flexible stack. We’ll focus on setting up the mysqld_exporter to scrape metrics from your MySQL instances.
Deploying and Configuring mysqld_exporter
The mysqld_exporter is a Prometheus exporter that scrapes metrics from MySQL servers. It requires a MySQL user with appropriate privileges.
MySQL User Permissions
Create a dedicated user for the exporter. Granting minimal necessary privileges reduces the attack surface.
-- Connect to your MySQL instance, then run:
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'your_strong_password';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;
-- If using DigitalOcean Managed Databases, you might need to use a specific host or IP.
-- Consult DO documentation for secure access.
-- Example for a specific IP:
-- GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'192.168.1.100';
Running mysqld_exporter
You can run mysqld_exporter as a systemd service for reliability. First, download the latest release from the official GitHub repository.
Create a systemd service file (e.g., /etc/systemd/system/mysqld_exporter.service):
[Unit]
Description=Prometheus MySQL Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/mysqld_exporter \
    --config.my-cnf=/etc/mysqld_exporter/.my.cnf \
    --web.listen-address=":9104"
[Install]
WantedBy=multi-user.target
Create the configuration file /etc/mysqld_exporter/.my.cnf:
[client]
user=exporter
password=your_strong_password
host=your_mysql_host_or_ip
port=3306
Ensure the prometheus user exists and has ownership of the configuration directory and file. Then, enable and start the service:
sudo useradd --no-create-home prometheus
sudo mkdir -p /etc/mysqld_exporter
sudo chown -R prometheus:prometheus /etc/mysqld_exporter
sudo chmod 600 /etc/mysqld_exporter/.my.cnf # Restrict permissions
sudo systemctl daemon-reload
sudo systemctl enable mysqld_exporter
sudo systemctl start mysqld_exporter
sudo systemctl status mysqld_exporter
Configuring Prometheus to Scrape MySQL Metrics
Edit your Prometheus configuration file (e.g., /etc/prometheus/prometheus.yml) to add a scrape job for mysqld_exporter.
scrape_configs:
  - job_name: 'mysql'
    static_configs:
      - targets: ['localhost:9104'] # Or the IP/hostname of your exporter if remote
        labels:
          cluster: 'your-mysql-cluster-name' # e.g., 'do-mysql-prod-01'
          env: 'production'
Reload Prometheus configuration:
sudo systemctl reload prometheus
Setting up Grafana Dashboards
Import pre-built Grafana dashboards for MySQL. A popular choice is the “MySQL Overview” dashboard (ID 7362 or similar). You can find many community-contributed dashboards on Grafana.com/dashboards.
When configuring the data source in Grafana, point it to your Prometheus instance. Ensure your Prometheus instance is configured to scrape the mysqld_exporter. Key metrics to monitor include:
- mysql_global_status_connections: Cumulative number of connection attempts to the server, successful or not.
- mysql_global_status_threads_connected: Number of currently open connections.
- mysql_global_status_threads_running: Number of threads that are not sleeping.
- mysql_global_status_queries: Total number of statements executed by the server.
- mysql_global_status_slow_queries: Number of queries that have taken longer than long_query_time seconds.
- mysql_slave_status_slave_sql_running and mysql_slave_status_slave_io_running: Replication thread status (if applicable).
- mysql_up: Whether the exporter could connect to MySQL (1) or not (0).
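Several of the metrics above are cumulative counters, so dashboards usually plot their per-second rate rather than the raw value. The sketch below shows the basic arithmetic under simplifying assumptions: `Sample` and `simple_rate` are hypothetical names, and the real PromQL rate() function also extrapolates to the edges of the window, which this version does not.

```cpp
#include <cstddef>
#include <vector>

// One scraped sample of a cumulative counter (e.g. slow queries so far).
struct Sample {
    double timestamp_seconds;
    double value;
};

// Simplified per-second rate over a window of samples, oldest first.
// A drop in value is treated as a counter reset (e.g. MySQL restarted),
// in which case the new value counts as growth since the reset.
double simple_rate(const std::vector<Sample>& samples) {
    if (samples.size() < 2) return 0.0;
    double increase = 0.0;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        const double prev = samples[i - 1].value;
        const double cur = samples[i].value;
        increase += (cur >= prev) ? (cur - prev) : cur;
    }
    const double elapsed =
        samples.back().timestamp_seconds - samples.front().timestamp_seconds;
    return elapsed > 0.0 ? increase / elapsed : 0.0;
}
```

With samples of 100, 130, and 160 taken 15 seconds apart, the counter grew by 60 over 30 seconds, so the rate is 2 per second.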
Integrating Application Health with Load Balancers
DigitalOcean Load Balancers (or any external load balancer like HAProxy) should leverage the application’s health check endpoint. This ensures that unhealthy instances are automatically removed from the load balancing pool, preventing users from hitting failing application servers.
Configuring Load Balancer Health Checks
When setting up a Load Balancer in DigitalOcean, you can configure health checks. For an application exposing a /healthz endpoint on port 8080:
- Protocol: HTTP
- Port: 8080
- Path: /healthz
- Check Interval: e.g., 10 seconds
- Response Timeout: e.g., 5 seconds
- Healthy Threshold: e.g., 2 consecutive successes
- Unhealthy Threshold: e.g., 3 consecutive failures
The load balancer will periodically send requests to http://<droplet_ip>:8080/healthz. If it receives a non-2xx response (like our 503), it marks the droplet as unhealthy and stops sending traffic to it. Once the application recovers and starts returning 200 OK, the droplet will be marked healthy again.
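The interaction between the healthy and unhealthy thresholds can be modeled as a small state machine. This is a hypothetical model for reasoning about the settings, not DigitalOcean's actual implementation; the `HealthTracker` name and the choice to start a backend in rotation are assumptions.

```cpp
// Hypothetical model of how consecutive-result thresholds gate a backend.
class HealthTracker {
public:
    HealthTracker(int healthy_threshold, int unhealthy_threshold)
        : healthy_threshold_(healthy_threshold),
          unhealthy_threshold_(unhealthy_threshold) {}

    // Record one check result and return whether the backend is in rotation.
    bool record(bool check_passed) {
        if (check_passed) {
            failures_ = 0; // a success breaks any failure streak
            if (++successes_ >= healthy_threshold_) healthy_ = true;
        } else {
            successes_ = 0; // a failure breaks any success streak
            if (++failures_ >= unhealthy_threshold_) healthy_ = false;
        }
        return healthy_;
    }

    bool healthy() const { return healthy_; }

private:
    int healthy_threshold_;
    int unhealthy_threshold_;
    int successes_ = 0;
    int failures_ = 0;
    bool healthy_ = true; // assumption: a backend starts in rotation
};
```

With the example values above (interval 10 seconds, unhealthy threshold 3), a failing droplet would keep receiving traffic for roughly 30 seconds before removal, which is the trade-off between fast failover and tolerance for brief blips.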
Alerting Strategies
Effective alerting is crucial. Alerts should be actionable and minimize noise. We’ll use Alertmanager, which integrates seamlessly with Prometheus.
Setting up Alertmanager
Alertmanager handles alerts sent by Prometheus. It deduplicates, groups, and routes them to the correct receiver (e.g., Slack, PagerDuty, email).
A basic Alertmanager configuration (alertmanager.yml):
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname', 'cluster', 'env']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver' # Default receiver if no specific route matches
  routes:
    - receiver: 'critical-alerts'
      match_re:
        severity: 'critical'
      continue: true # Allows matching other routes if needed
receivers:
  - name: 'default-receiver'
    slack_configs:
      - api_url: 'YOUR_SLACK_WEBHOOK_URL'
        channel: '#alerts-general'
        send_resolved: true
  - name: 'critical-alerts'
    slack_configs:
      - api_url: 'YOUR_SLACK_WEBHOOK_URL'
        channel: '#alerts-critical'
        send_resolved: true
    pagerduty_configs:
      - service_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'
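To see what group_by does in the configuration above, it can help to model grouping as bucketing alerts by the values of the listed labels. The sketch below is an illustration only; `group_key` and `group_alerts` are hypothetical names, and Alertmanager's internal representation differs.

```cpp
#include <map>
#include <string>
#include <vector>

using Labels = std::map<std::string, std::string>;

// Build the grouping key for one alert from the configured group_by labels.
// A label the alert does not carry contributes an empty value.
std::string group_key(const Labels& alert,
                      const std::vector<std::string>& group_by) {
    std::string key;
    for (const auto& name : group_by) {
        const auto it = alert.find(name);
        key += name + "=" + (it != alert.end() ? it->second : std::string()) + ";";
    }
    return key;
}

// Bucket alerts so that each bucket would become one notification.
std::map<std::string, std::vector<Labels>> group_alerts(
        const std::vector<Labels>& alerts,
        const std::vector<std::string>& group_by) {
    std::map<std::string, std::vector<Labels>> groups;
    for (const auto& alert : alerts) {
        groups[group_key(alert, group_by)].push_back(alert);
    }
    return groups;
}
```

Because instance is not in the group_by list, two MysqlDown alerts from different instances in the same cluster and environment fall into one bucket and would arrive as a single notification.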
Prometheus Alerting Rules
Define alerting rules in Prometheus. These are PromQL queries that trigger alerts when conditions are met. Place these in a separate file (e.g., /etc/prometheus/rules/mysql_alerts.yml) and include it in prometheus.yml.
groups:
  - name: mysql.rules
    rules:
      - alert: MysqlDown
        expr: mysql_up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "MySQL instance {{ $labels.instance }} is down."
          description: "The mysqld_exporter failed to connect to MySQL instance {{ $labels.instance }} for 5 minutes."
      - alert: HighMysqlConnections
        expr: mysql_global_status_threads_connected > 500 # Adjust threshold
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High number of MySQL connections on {{ $labels.instance }}."
          description: "MySQL instance {{ $labels.instance }} has {{ $value }} active connections, exceeding the threshold."
      - alert: HighMysqlThreadsRunning
        expr: mysql_global_status_threads_running > 100 # Adjust threshold
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High number of running MySQL threads on {{ $labels.instance }}."
          description: "MySQL instance {{ $labels.instance }} has {{ $value }} running threads, indicating potential contention."
      - alert: HighMysqlSlowQueries
        expr: rate(mysql_global_status_slow_queries[5m]) > 10 # Adjust rate and threshold
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High rate of slow MySQL queries on {{ $labels.instance }}."
          description: "MySQL instance {{ $labels.instance }} is experiencing a high rate of slow queries ({{ $value }} per second)."
      - alert: MysqlReplicationLag
        expr: mysql_slave_status_seconds_behind_master > 600 # 10 minutes lag
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "MySQL replication lag detected on {{ $labels.instance }}."
          description: "MySQL replica {{ $labels.instance }} is {{ $value }} seconds behind the primary."
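Each rule above carries a `for` duration, which means the expression has to stay true across evaluations for that long before the alert fires. The following sketch models that behavior in a simplified way; `ForClause` and `AlertState` are hypothetical names, and the actual Prometheus rule evaluator has more detail than is shown here.

```cpp
enum class AlertState { Inactive, Pending, Firing };

// Simplified model of a rule with a `for` duration: the condition must hold
// on every evaluation for at least for_seconds before the alert fires.
class ForClause {
public:
    explicit ForClause(double for_seconds) : for_seconds_(for_seconds) {}

    // Feed one evaluation: the time it ran and whether the expression matched.
    AlertState evaluate(double now_seconds, bool condition_true) {
        if (!condition_true) {
            active_ = false; // any miss resets the timer
            return AlertState::Inactive;
        }
        if (!active_) {
            active_ = true;
            active_since_ = now_seconds; // start of the current streak
        }
        return (now_seconds - active_since_ >= for_seconds_)
                   ? AlertState::Firing
                   : AlertState::Pending;
    }

private:
    double for_seconds_;
    bool active_ = false;
    double active_since_ = 0.0;
};
```

This is why a longer `for` value reduces noise from short spikes at the cost of slower notification: a single evaluation where the condition is false sends the alert back to the start.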
Ensure your Prometheus configuration (prometheus.yml) includes:
rule_files:
  - "/etc/prometheus/rules/*.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093'] # Address of your Alertmanager instance
Reload Prometheus and ensure Alertmanager is running and accessible by Prometheus.
System-Level Monitoring with Node Exporter
While application and database specific metrics are vital, understanding the underlying system performance is equally important. node_exporter provides hardware and OS metrics.
Deploying Node Exporter
Similar to mysqld_exporter, download the latest release and set it up as a systemd service. The default port is 9100.
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter \
    --collector.filesystem.mount-points-exclude="^/(sys|proc|dev|host|etc)($$|/.*)" \
    --web.listen-address=":9100"
[Install]
WantedBy=multi-user.target
Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter
Configuring Prometheus for Node Exporter
Add another job to your prometheus.yml:
scrape_configs:
  # ... other jobs ...
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100'] # Or the IP/hostname of your exporter if remote
        labels:
          cluster: 'your-app-cluster-name' # e.g., 'do-app-prod-web-01'
          env: 'production'
Key Node Exporter Metrics for Alerting
- node_load1, node_load5, node_load15: System load averages.
- node_memory_MemAvailable_bytes: Available memory.
- node_disk_io_time_seconds_total: Disk I/O time.
- node_filesystem_avail_bytes: Filesystem available space.
- node_network_receive_errs_total, node_network_transmit_errs_total: Network interface errors.
Alerting rules for these metrics can be added to your Prometheus rule files, similar to the MySQL alerts, to detect high load, low memory, disk space issues, or network errors.
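As an illustration of the arithmetic behind a disk space alert, the sketch below turns two byte gauges into a percentage and compares it to a threshold. The function names are hypothetical, and it assumes the total size comes from node_filesystem_size_bytes, a node_exporter metric that is not listed above.

```cpp
// Percentage of a filesystem still available, computed from
// node_filesystem_avail_bytes and node_filesystem_size_bytes.
double percent_available(double avail_bytes, double size_bytes) {
    if (size_bytes <= 0.0) return 0.0; // guard against division by zero
    return 100.0 * avail_bytes / size_bytes;
}

// True when the available share drops below the chosen minimum percentage.
bool low_disk_space(double avail_bytes, double size_bytes, double min_percent) {
    return percent_available(avail_bytes, size_bytes) < min_percent;
}
```

A relative threshold such as 10 percent tends to transfer better across droplets with different disk sizes than a fixed byte count, though very large volumes may still warrant an absolute floor.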