Server Monitoring Best Practices: Keeping Your C++ App and MySQL Clusters Alive on DigitalOcean

Proactive C++ Application Health Checks

For a C++ application running on DigitalOcean, health checks need to go beyond confirming that the process exists: they should verify internal state and responsiveness. A common pattern is to have the application serve an HTTP endpoint that an external monitoring system polls, and to run the critical internal checks inside that endpoint's handler.

Consider a C++ application that manages a connection pool to MySQL. The health check endpoint should not only confirm the application process is running but also validate the health of its critical dependencies, like the MySQL connection pool. Here’s a conceptual outline of how such an endpoint might be implemented using a lightweight HTTP server library like cpp-httplib.

Implementing a Health Check Endpoint

The health check endpoint, typically exposed on a dedicated port (e.g., 8080), should return a 200 OK status for healthy states and a non-2xx status code (e.g., 503 Service Unavailable) for unhealthy states. The response body can contain detailed JSON diagnostics.

Example C++ Health Check Logic

This snippet demonstrates a basic health check within a C++ application. It assumes you have a MySQLConnectionPool class with a ping() method that verifies connectivity.

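#include <exception>
#include <iostream>
#include <string>
#include <thread>            // For running the health server on its own thread
#include <httplib.h>         // Assuming cpp-httplib is used
#include <nlohmann/json.hpp> // For JSON serialization

// Minimal interface assumed for your MySQL connection pool.
// A forward declaration alone is not enough, because ping() is called below.
class MySQLConnectionPool {
public:
    bool ping(); // Returns true if a pooled connection can reach MySQL
};

// Assume this is defined and initialized elsewhere and holds the connection pool
extern MySQLConnectionPool g_mysql_pool;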

// Function to perform comprehensive health checks
std::string perform_health_check() {
    nlohmann::json status;
    status["application_status"] = "ok";
    status["dependencies"] = nlohmann::json::object();

    // Check MySQL connection pool
    try {
        if (g_mysql_pool.ping()) {
            status["dependencies"]["mysql"] = "ok";
        } else {
            status["application_status"] = "degraded";
            status["dependencies"]["mysql"] = "unreachable";
        }
    } catch (const std::exception& e) {
        status["application_status"] = "degraded";
        status["dependencies"]["mysql"] = "error: " + std::string(e.what());
    }

    // Add other dependency checks here (e.g., Redis, external services)

    return status.dump(4); // Pretty-printed JSON
}

void setup_health_check_server(int port) {
    httplib::Server svr;

    // No captures are needed: the handler only calls the free function above.
    svr.Get("/healthz", [](const httplib::Request& /*req*/, httplib::Response& res) {
        const std::string health_data = perform_health_check();
        const nlohmann::json status_json = nlohmann::json::parse(health_data);
        const bool healthy = status_json.value("application_status", std::string()) == "ok";

        res.set_content(health_data, "application/json");
        res.status = healthy ? 200 : 503; // 503 Service Unavailable when degraded
    });

    std::cout << "Health check server starting on port " << port << std::endl;
    if (!svr.listen("0.0.0.0", port)) {
        std::cerr << "Failed to start health check server!" << std::endl;
        // Handle error appropriately, perhaps exit or retry
    }
}

// In your main function (requires #include <thread>):
// int main() {
//     // ... initialize g_mysql_pool ...
//     std::thread health_thread(setup_health_check_server, 8080);
//     // ... rest of your application logic ...
//     health_thread.join(); // Blocks for as long as the health server is listening
//     return 0;
// }
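
Because the endpoint above pings MySQL on every request, frequent polling by several monitors turns into frequent database round trips. One option is to cache the result for a short time. The following is a minimal sketch under stated assumptions: CachedCheck is a hypothetical helper, the TTL is a value you would choose yourself, and the clock is injectable only so the behavior can be tested deterministically.

```cpp
#include <chrono>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical helper: caches the result of an expensive check for a short TTL.
class CachedCheck {
public:
    using Clock = std::chrono::steady_clock;

    CachedCheck(std::function<bool()> check, std::chrono::milliseconds ttl,
                std::function<Clock::time_point()> now = [] { return Clock::now(); })
        : check_(std::move(check)), ttl_(ttl), now_(std::move(now)) {}

    // Returns the cached result if it is younger than the TTL; otherwise
    // re-runs the underlying check. The mutex is held while the check runs,
    // so concurrent callers wait instead of issuing duplicate checks.
    bool healthy() {
        std::lock_guard<std::mutex> lock(mutex_);
        const auto t = now_();
        if (!has_value_ || t - last_run_ >= ttl_) {
            last_value_ = check_();
            last_run_ = t;
            has_value_ = true;
        }
        return last_value_;
    }

private:
    std::function<bool()> check_;
    std::chrono::milliseconds ttl_;
    std::function<Clock::time_point()> now_;
    std::mutex mutex_;
    bool has_value_ = false;
    bool last_value_ = false;
    Clock::time_point last_run_{};
};
```

A handler could then call something like a CachedCheck wrapping g_mysql_pool.ping() with a TTL of a second or two. The trade-off is that a failure is reported up to one TTL late, so the TTL should stay well below the polling interval of whatever consumes the endpoint.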

Monitoring MySQL Clusters on DigitalOcean with Prometheus & Grafana

For MySQL clusters, especially those managed by DigitalOcean’s Managed Databases, a robust monitoring solution is essential. Prometheus, coupled with Grafana for visualization, provides a powerful and flexible stack. We’ll focus on setting up the mysqld_exporter to scrape metrics from your MySQL instances.

Deploying and Configuring mysqld_exporter

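The mysqld_exporter is a Prometheus exporter: it connects to a MySQL server, queries its status, and exposes the results over HTTP for Prometheus to scrape. It requires a MySQL user with appropriate privileges. For a DigitalOcean Managed Database, run the exporter on a Droplet that is allowed to reach the database, since you cannot install software on the database nodes themselves.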

MySQL User Permissions

Create a dedicated user for the exporter. Granting minimal necessary privileges reduces the attack surface.

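-- Connect to your MySQL instance as an administrative user, then:
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'your_strong_password' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
-- FLUSH PRIVILEGES is not required after CREATE USER or GRANT.

-- 'localhost' only works when the exporter runs on the same host as MySQL.
-- If using DigitalOcean Managed Databases, you might need to use a specific host or IP.
-- Consult DO documentation for secure access.
-- Example for a specific IP (in MySQL 8.0+ the user must exist before GRANT):
CREATE USER 'exporter'@'192.168.1.100' IDENTIFIED BY 'your_strong_password' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'192.168.1.100';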

Running mysqld_exporter

You can run mysqld_exporter as a systemd service for reliability. First, download the latest release from the official GitHub repository.

Create a systemd service file (e.g., /etc/systemd/system/mysqld_exporter.service):

[Unit]
Description=Prometheus MySQL Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/mysqld_exporter \
  --config.my-cnf=/etc/mysqld_exporter/.my.cnf \
  --web.listen-address=":9104"

[Install]
WantedBy=multi-user.target

Create the configuration file /etc/mysqld_exporter/.my.cnf:

[client]
user=exporter
password=your_strong_password
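host=your_mysql_host_or_ip
# 3306 is the MySQL default. Managed databases often listen on a different port;
# use the one shown in your connection details.
port=3306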

Ensure the prometheus user exists and owns the configuration directory and file (create the directory before writing the file if it does not exist yet). Then restrict the file's permissions, and enable and start the service:

sudo useradd --no-create-home prometheus
sudo mkdir -p /etc/mysqld_exporter
sudo chown -R prometheus:prometheus /etc/mysqld_exporter
sudo chmod 600 /etc/mysqld_exporter/.my.cnf # Restrict permissions

sudo systemctl daemon-reload
sudo systemctl enable mysqld_exporter
sudo systemctl start mysqld_exporter
sudo systemctl status mysqld_exporter

Configuring Prometheus to Scrape MySQL Metrics

Edit your Prometheus configuration file (e.g., /etc/prometheus/prometheus.yml) to add a scrape job for mysqld_exporter.

scrape_configs:
  - job_name: 'mysql'
    static_configs:
      - targets: ['localhost:9104'] # Or the IP/hostname of your exporter if remote
        labels:
          cluster: 'your-mysql-cluster-name' # e.g., 'do-mysql-prod-01'
          env: 'production'

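Reload the Prometheus configuration. This assumes your Prometheus unit file defines ExecReload; if it does not, restart the service or send the process a SIGHUP instead: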

sudo systemctl reload prometheus

Setting up Grafana Dashboards

Import a pre-built Grafana dashboard for MySQL. A popular choice is the “MySQL Overview” dashboard (ID 7362 or similar). Many more community-contributed dashboards are listed at grafana.com/grafana/dashboards.

When configuring the data source in Grafana, point it to your Prometheus instance. Ensure your Prometheus instance is configured to scrape the mysqld_exporter. Key metrics to monitor include:

mysql_global_status_connections: Total connection attempts to the server since startup, successful or not (a counter).
mysql_global_status_threads_connected: Number of currently open connections.
mysql_global_status_threads_running: Number of threads that are not sleeping.
mysql_global_status_queries: Total number of statements executed by the server (a counter).
mysql_global_status_slow_queries: Number of queries that took longer than long_query_time seconds (a counter).
mysql_slave_status_slave_sql_running and mysql_slave_status_slave_io_running: Replication thread status (if applicable).
mysql_up: Whether the exporter could connect to MySQL (1) or not (0).
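
Connections, queries, and slow queries are cumulative counters, so their raw values only ever grow; what is usually graphed is the per-second rate between samples. The arithmetic can be sketched as follows. This is a simplified illustration with hypothetical names (CounterSample, per_second_rate): Prometheus's own rate() works over every sample in the window and extrapolates to the window edges, which this two-sample version does not attempt.

```cpp
#include <stdexcept>

// One scraped sample of a cumulative counter (hypothetical type).
struct CounterSample {
    double timestamp_seconds;  // when the sample was scraped
    double value;              // cumulative counter value at that time
};

// Per-second increase between two samples. If the counter went backwards
// (for example after a MySQL restart), treat it as a reset to zero and count
// the newer value as the whole increase.
// Throws std::invalid_argument if `newer` is not later than `older`.
double per_second_rate(const CounterSample& older, const CounterSample& newer) {
    const double elapsed = newer.timestamp_seconds - older.timestamp_seconds;
    if (elapsed <= 0.0) {
        throw std::invalid_argument("samples must be in increasing time order");
    }
    const double increase =
        newer.value >= older.value ? newer.value - older.value : newer.value;
    return increase / elapsed;
}
```

For example, a slow-query counter that moves from 100 to 700 over 60 seconds corresponds to 10 slow queries per second.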

Integrating Application Health with Load Balancers

DigitalOcean Load Balancers (or any external load balancer like HAProxy) should leverage the application’s health check endpoint. This ensures that unhealthy instances are automatically removed from the load balancing pool, preventing users from hitting failing application servers.

Configuring Load Balancer Health Checks

When setting up a Load Balancer in DigitalOcean, you can configure health checks. For an application exposing a /healthz endpoint on port 8080:

Protocol: HTTP
Port: 8080
Path: /healthz
Check Interval: e.g., 10 seconds
Response Timeout: e.g., 5 seconds
Healthy Threshold: e.g., 2 consecutive successes
Unhealthy Threshold: e.g., 3 consecutive failures

The load balancer will periodically send requests to http://<droplet_ip>:8080/healthz. If it receives a non-2xx response (like our 503), it marks the droplet as unhealthy and stops sending traffic to it. Once the application recovers and starts returning 200 OK, the droplet will be marked healthy again.
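
The interaction between the healthy and unhealthy thresholds is easiest to see as a small state machine. The sketch below is a simplified model, not DigitalOcean's actual implementation: TargetHealth is a hypothetical class, and the real load balancer may differ in details such as the initial state.

```cpp
// Simplified model of threshold-based health tracking (hypothetical class).
class TargetHealth {
public:
    TargetHealth(int healthy_threshold, int unhealthy_threshold,
                 bool initially_healthy = true)
        : healthy_threshold_(healthy_threshold),
          unhealthy_threshold_(unhealthy_threshold),
          healthy_(initially_healthy) {}

    // Records one probe result and returns the resulting state.
    // Only an unbroken run of results flips the state; a single opposite
    // result resets the run.
    bool record(bool probe_succeeded) {
        if (probe_succeeded) {
            failures_ = 0;
            if (!healthy_ && ++successes_ >= healthy_threshold_) {
                healthy_ = true;
                successes_ = 0;
            }
        } else {
            successes_ = 0;
            if (healthy_ && ++failures_ >= unhealthy_threshold_) {
                healthy_ = false;
                failures_ = 0;
            }
        }
        return healthy_;
    }

    bool healthy() const { return healthy_; }

private:
    int healthy_threshold_;
    int unhealthy_threshold_;
    bool healthy_;
    int successes_ = 0;  // consecutive successes while unhealthy
    int failures_ = 0;   // consecutive failures while healthy
};
```

With the example settings above (10 second interval, unhealthy threshold of 3), a droplet that starts returning 503 keeps receiving traffic for roughly 20 to 30 seconds before it is removed, which is worth keeping in mind when choosing the values.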

Alerting Strategies

Effective alerting is crucial. Alerts should be actionable and minimize noise. We’ll use Alertmanager, which integrates seamlessly with Prometheus.

Setting up Alertmanager

Alertmanager handles alerts sent by Prometheus. It deduplicates, groups, and routes them to the correct receiver (e.g., Slack, PagerDuty, email).

A basic Alertmanager configuration (alertmanager.yml):

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'cluster', 'env']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver' # Default receiver if no specific route matches

  routes:
    - receiver: 'critical-alerts'
      matchers: # Replaces the deprecated match/match_re fields
        - severity="critical"
      continue: true # Allows matching other routes if needed

receivers:
  - name: 'default-receiver'
    slack_configs:
      - api_url: 'YOUR_SLACK_WEBHOOK_URL'
        channel: '#alerts-general'
        send_resolved: true

  - name: 'critical-alerts'
    slack_configs:
      - api_url: 'YOUR_SLACK_WEBHOOK_URL'
        channel: '#alerts-critical'
        send_resolved: true
    pagerduty_configs:
      - service_key: 'YOUR_PAGERDUTY_INTEGRATION_KEY'
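
To make the effect of group_by concrete, the following sketch buckets alerts by the values of the grouping labels. It only illustrates the idea under simplifying assumptions; Labels, group_key, and group_alerts are hypothetical names, and Alertmanager's real pipeline also applies timing (group_wait, group_interval), inhibition, and silences.

```cpp
#include <map>
#include <string>
#include <vector>

// An alert is modelled as its label set (hypothetical simplification).
using Labels = std::map<std::string, std::string>;

// Builds the grouping key: the values of the group_by labels, in order.
// A label that is missing from the alert contributes an empty string.
std::vector<std::string> group_key(const Labels& alert,
                                   const std::vector<std::string>& group_by) {
    std::vector<std::string> key;
    for (const auto& name : group_by) {
        const auto it = alert.find(name);
        key.push_back(it != alert.end() ? it->second : std::string());
    }
    return key;
}

// Buckets alerts so that each bucket can go out as a single notification.
std::map<std::vector<std::string>, std::vector<Labels>> group_alerts(
    const std::vector<Labels>& alerts, const std::vector<std::string>& group_by) {
    std::map<std::vector<std::string>, std::vector<Labels>> groups;
    for (const auto& alert : alerts) {
        groups[group_key(alert, group_by)].push_back(alert);
    }
    return groups;
}
```

With group_by set to alertname, cluster, and env, two MysqlDown alerts from different instances in the same cluster land in one bucket and produce one notification rather than two.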

Prometheus Alerting Rules

Define alerting rules in Prometheus. These are PromQL queries that trigger alerts when conditions are met. Place these in a separate file (e.g., /etc/prometheus/rules/mysql_alerts.yml) and include it in prometheus.yml.

groups:
  - name: mysql.rules
    rules:
      - alert: MysqlDown
        expr: mysql_up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "MySQL instance {{ $labels.instance }} is down."
          description: "The mysqld_exporter failed to connect to MySQL instance {{ $labels.instance }} for 5 minutes."

      - alert: HighMysqlConnections
        expr: mysql_global_status_threads_connected > 500 # Adjust threshold
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High number of MySQL connections on {{ $labels.instance }}."
          description: "MySQL instance {{ $labels.instance }} has {{ $value }} active connections, exceeding the threshold."

      - alert: HighMysqlThreadsRunning
        expr: mysql_global_status_threads_running > 100 # Adjust threshold
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High number of running MySQL threads on {{ $labels.instance }}."
          description: "MySQL instance {{ $labels.instance }} has {{ $value }} running threads, indicating potential contention."

      - alert: HighMysqlSlowQueries
        expr: rate(mysql_global_status_slow_queries[5m]) > 10 # Adjust rate and threshold
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High rate of slow MySQL queries on {{ $labels.instance }}."
          description: "MySQL instance {{ $labels.instance }} is experiencing a high rate of slow queries ({{ $value }} per second)."

      - alert: MysqlReplicationLag
        expr: mysql_slave_status_seconds_behind_master > 600 # 10 minutes lag
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "MySQL replication lag detected on {{ $labels.instance }}."
          description: "MySQL replica {{ $labels.instance }} is {{ $value }} seconds behind the primary."
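
Each rule above pairs an expression with a for: duration, which means the alert only fires after the expression has matched on every evaluation for that long. A minimal sketch of that behavior, using a hypothetical ForClauseTracker class and plain seconds for timestamps, could look like this:

```cpp
// States an alert moves through while its rule is evaluated.
enum class AlertState { Inactive, Pending, Firing };

// Tracks the `for:` clause of a single alert (hypothetical class).
class ForClauseTracker {
public:
    explicit ForClauseTracker(double hold_seconds) : hold_seconds_(hold_seconds) {}

    // Called once per rule evaluation with the current time and whether the
    // expression matched. Any evaluation where it does not match resets the
    // timer, so the hold period has to be served again from the start.
    AlertState evaluate(double now_seconds, bool condition_true) {
        if (!condition_true) {
            active_ = false;
            return AlertState::Inactive;
        }
        if (!active_) {
            active_ = true;
            active_since_ = now_seconds;
        }
        return (now_seconds - active_since_ >= hold_seconds_) ? AlertState::Firing
                                                              : AlertState::Pending;
    }

private:
    double hold_seconds_;
    bool active_ = false;
    double active_since_ = 0.0;
};
```

This is why a brief spike in connections or a single failed scrape does not page anyone: the alert stays pending and drops back to inactive as soon as the condition clears.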

Ensure your Prometheus configuration (prometheus.yml) includes:

rule_files:
  - "/etc/prometheus/rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093'] # Address of your Alertmanager instance

Reload Prometheus and ensure Alertmanager is running and accessible by Prometheus.

System-Level Monitoring with Node Exporter

While application and database specific metrics are vital, understanding the underlying system performance is equally important. node_exporter provides hardware and OS metrics.

Deploying Node Exporter

Similar to mysqld_exporter, download the latest release and set it up as a systemd service (e.g., /etc/systemd/system/node_exporter.service). The default port is 9100.

[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter \
  --collector.filesystem.mount-points-exclude="^/(sys|proc|dev|host|etc)($$|/.*)" \
  --collector.netdev.device-exclude=^lo$$ \
  --web.listen-address=":9100"

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter

Configuring Prometheus for Node Exporter

Add another job to your prometheus.yml:

scrape_configs:
  # ... other jobs ...
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100'] # Or the IP/hostname of your exporter if remote
        labels:
          cluster: 'your-app-cluster-name' # e.g., 'do-app-prod-web-01'
          env: 'production'

Key Node Exporter Metrics for Alerting

node_load1, node_load5, node_load15: System load averages.
node_memory_MemAvailable_bytes: Available memory.
node_disk_io_time_seconds_total: Disk I/O time.
node_filesystem_avail_bytes: Filesystem available space.
node_network_receive_errs_total, node_network_transmit_errs_total: Network interface errors.

Alerting rules for these metrics can be added to your Prometheus rule files, similar to the MySQL alerts, to detect high load, low memory, disk space issues, or network errors.
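
Most of these alerts compare a ratio against a threshold rather than a raw value. The sketch below shows that arithmetic under stated assumptions: NodeSnapshot, Thresholds, and breached_conditions are hypothetical names, the default thresholds are placeholders to tune, and node_memory_MemTotal_bytes and node_filesystem_size_bytes are additional node_exporter metrics needed as denominators.

```cpp
#include <string>
#include <vector>

// One point-in-time reading of the relevant node_exporter metrics.
struct NodeSnapshot {
    double load1;                // node_load1
    int cpu_cores;               // number of CPU cores on the droplet
    double mem_available_bytes;  // node_memory_MemAvailable_bytes
    double mem_total_bytes;      // node_memory_MemTotal_bytes
    double fs_avail_bytes;       // node_filesystem_avail_bytes
    double fs_size_bytes;        // node_filesystem_size_bytes
};

// Placeholder thresholds; tune them for your workload.
struct Thresholds {
    double max_load_per_core = 1.5;
    double min_mem_available_fraction = 0.10;
    double min_fs_available_fraction = 0.10;
};

// Returns part / whole, or 0 when the denominator is not positive.
inline double fraction(double part, double whole) {
    return whole > 0.0 ? part / whole : 0.0;
}

// Lists which conditions a snapshot breaches.
std::vector<std::string> breached_conditions(const NodeSnapshot& s,
                                             const Thresholds& t = Thresholds()) {
    std::vector<std::string> breached;
    if (fraction(s.load1, static_cast<double>(s.cpu_cores)) > t.max_load_per_core) {
        breached.push_back("high_load");
    }
    if (fraction(s.mem_available_bytes, s.mem_total_bytes) < t.min_mem_available_fraction) {
        breached.push_back("low_memory");
    }
    if (fraction(s.fs_avail_bytes, s.fs_size_bytes) < t.min_fs_available_fraction) {
        breached.push_back("low_disk");
    }
    return breached;
}
```

Normalizing load by core count and expressing memory and disk as fractions keeps the same thresholds usable across droplet sizes.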