Server Monitoring Best Practices: Keeping Your C App and MySQL Clusters Alive on DigitalOcean
Proactive C Application Health Checks
For a C application running on DigitalOcean, robust health checking is paramount. We’re not just talking about a simple `ping`. We need to ensure the application process is alive, responsive, and not stuck in a deadlock or consuming excessive resources. A common pattern is to expose an HTTP endpoint that the application itself serves, which external monitoring tools can query. This endpoint should perform internal checks.
Consider a C application that manages a pool of worker threads and interacts with a database. The health check endpoint should verify:
- The main application process is running.
- The number of active worker threads is within an acceptable range.
- The connection pool to MySQL is healthy (e.g., can acquire a connection).
- No critical error flags are set within the application’s internal state.
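A thread count alone cannot tell a busy worker from a deadlocked one. One common way to cover the "stuck in a deadlock" case is a per-worker heartbeat that the health check compares against the current time. The sketch below is a minimal illustration under assumptions: `struct worker_heartbeat` and `count_stalled_workers` are hypothetical names that do not appear elsewhere in this article, and a real application would need to synchronize access to the heartbeat slots (for example with a mutex or atomics).

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-worker record: each worker stores the time of its last
 * completed loop iteration here. */
struct worker_heartbeat {
    bool   in_use;     /* slot belongs to a running worker */
    time_t last_beat;  /* time of the most recent heartbeat */
};

/* Count workers whose last heartbeat is older than max_age_seconds at `now`.
 * Unused slots are ignored. */
int count_stalled_workers(const struct worker_heartbeat *workers, size_t n,
                          time_t now, double max_age_seconds) {
    int stalled = 0;
    for (size_t i = 0; i < n; i++) {
        if (workers[i].in_use &&
            difftime(now, workers[i].last_beat) > max_age_seconds) {
            stalled++;
        }
    }
    return stalled;
}
```

A health check could treat any non-zero result as a failure, or tolerate a small number of slow workers, depending on how long a single unit of work may legitimately take.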
Here’s a simplified example of how you might implement such a health check endpoint using a lightweight HTTP server library like libmicrohttpd. This snippet focuses on the health check logic itself, assuming the server setup is handled elsewhere.
C Health Check Endpoint Implementation
#include <stdio.h>
#include <stdlib.h>
#include <microhttpd.h>
#include <string.h>
// Assume these are global or accessible application state variables
extern int active_worker_threads;
extern int max_worker_threads;
extern int db_connection_pool_size;
extern int db_connections_in_use;
extern int critical_error_flag;
// Function to perform internal health checks
int perform_internal_health_checks() {
    if (critical_error_flag) {
        return 0; // Critical error detected
    }
    if (active_worker_threads < 0 || active_worker_threads > max_worker_threads) {
        return 0; // Worker thread count out of bounds
    }
    // A more sophisticated check would attempt to acquire a DB connection
    // For simplicity, we'll just check the pool status
    if (db_connections_in_use > db_connection_pool_size) {
        return 0; // Connection pool overflow (indicative of a problem)
    }
    return 1; // All checks passed
}
// Callback function for HTTP requests.
// libmicrohttpd 0.9.71 and later declare this callback as returning
// enum MHD_Result; older releases use int.
static enum MHD_Result health_check_handler(void *cls, struct MHD_Connection *connection,
                                            const char *url, const char *method,
                                            const char *version, const char *upload_data,
                                            size_t *upload_data_size, void **con_cls) {
    const char *body;
    unsigned int status;
    int is_json = 1;

    if (strcmp(url, "/healthz") == 0 && strcmp(method, "GET") == 0) {
        if (perform_internal_health_checks()) {
            body = "{\"status\": \"OK\"}";
            status = MHD_HTTP_OK;
        } else {
            body = "{\"status\": \"ERROR\", \"message\": \"Application health check failed\"}";
            status = MHD_HTTP_INTERNAL_SERVER_ERROR;
        }
    } else {
        // Handle other requests with a 404
        body = "Not Found";
        status = MHD_HTTP_NOT_FOUND;
        is_json = 0;
    }

    // The body is a string literal, so MHD must neither copy nor free it.
    struct MHD_Response *mhd_response =
        MHD_create_response_from_buffer(strlen(body), (void *)body, MHD_RESPMEM_PERSISTENT);
    if (mhd_response == NULL) {
        return MHD_NO;
    }
    if (is_json) {
        MHD_add_response_header(mhd_response, MHD_HTTP_HEADER_CONTENT_TYPE, "application/json");
    }
    enum MHD_Result ret = MHD_queue_response(connection, status, mhd_response);
    MHD_destroy_response(mhd_response); // Drop our reference; MHD keeps its own until sent
    return ret;
}
// Function to start the HTTP server (simplified)
void start_health_check_server(int port) {
    struct MHD_Daemon *daemon;
    // With these flags MHD runs its own listener and per-connection threads,
    // so this call returns as soon as the daemon is up.
    daemon = MHD_start_daemon(MHD_USE_INTERNAL_POLLING_THREAD | MHD_USE_THREAD_PER_CONNECTION,
                              (unsigned short)port, NULL, NULL,
                              &health_check_handler, NULL, MHD_OPTION_END);
    if (daemon == NULL) {
        fprintf(stderr, "Failed to start HTTP daemon on port %d\n", port);
        exit(1);
    }
    printf("Health check server started on port %d\n", port);
    // In a real app, keep the daemon handle so you can call MHD_stop_daemon() at shutdown.
}
// Example usage (in your main application logic)
/*
int main() {
    // ... initialize application state ...
    active_worker_threads = 5;
    max_worker_threads = 10;
    db_connection_pool_size = 20;
    db_connections_in_use = 3;
    critical_error_flag = 0;
    // Start the health check server on a dedicated port (e.g., 8081)
    start_health_check_server(8081);
    // ... rest of your application logic ...
    // To stop the daemon (not shown here, requires MHD_stop_daemon)
    return 0;
}
*/
This C code exposes a /healthz endpoint. When accessed via GET, it calls perform_internal_health_checks(). If all checks pass, it returns HTTP 200 OK with a JSON payload {"status": "OK"}. Otherwise, it returns HTTP 500 Internal Server Error with a descriptive JSON message. This endpoint should run on a separate port from your main application’s service port to avoid interference.
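One detail worth sketching: the handler runs on a libmicrohttpd thread while the worker threads update the counters, so plain `int` globals are a data race. A minimal way to address that, assuming a C11 compiler, is to hold the state in atomics. The `struct health_state` and `health_state_ok` names below are hypothetical and only mirror the checks in `perform_internal_health_checks()`; each load is atomic, but the set of values read is not a consistent snapshot, which is usually acceptable for a health probe.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical replacement for the plain int globals: one struct of atomics
 * shared between the worker threads and the health check thread. */
struct health_state {
    atomic_int  active_worker_threads;
    atomic_int  max_worker_threads;
    atomic_int  db_connection_pool_size;
    atomic_int  db_connections_in_use;
    atomic_bool critical_error;
};

/* Same checks as perform_internal_health_checks(), using atomic loads. */
bool health_state_ok(struct health_state *s) {
    if (atomic_load(&s->critical_error)) {
        return false; /* critical error detected */
    }
    int active = atomic_load(&s->active_worker_threads);
    int max    = atomic_load(&s->max_worker_threads);
    if (active < 0 || active > max) {
        return false; /* worker thread count out of bounds */
    }
    int in_use = atomic_load(&s->db_connections_in_use);
    int pool   = atomic_load(&s->db_connection_pool_size);
    if (in_use > pool) {
        return false; /* connection pool overflow */
    }
    return true;
}
```

Workers would then update the fields with `atomic_fetch_add` and `atomic_store` instead of plain assignments.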
Monitoring C App Health with Prometheus & Node Exporter
To effectively monitor this C application, we’ll leverage Prometheus. The standard way to get system-level metrics and custom application metrics into Prometheus is via exporters. For system metrics, node_exporter is the de facto standard. For our custom C application health endpoint, we can use blackbox_exporter.
Setting up Node Exporter
On each DigitalOcean Droplet running your C application, install and configure node_exporter. This typically involves downloading the binary and running it as a systemd service.
# Download a release (adjust version as needed)
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
cd node_exporter-1.7.0.linux-amd64
# Create a systemd service file
sudo nano /etc/systemd/system/node_exporter.service
# Paste the following content (adjust the ExecStart path if you installed elsewhere;
# systemd does not allow a trailing comment on the ExecStart line itself):
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=nobody
Group=nogroup
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
# Copy the binary to a standard location
sudo cp node_exporter /usr/local/bin/
# Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter
Ensure that port 9100 (the default for node_exporter) is reachable from your Prometheus server, and restrict it in your DigitalOcean firewall to that server rather than opening it to the internet.
Setting up Blackbox Exporter for C App Health
blackbox_exporter allows Prometheus to probe endpoints over various protocols (HTTP, HTTPS, TCP, ICMP, DNS). We’ll use it to probe our C application’s /healthz endpoint.
First, install blackbox_exporter on a machine that can reach your application’s health port (this could be the Prometheus server itself or a dedicated monitoring host). Then, configure its blackbox.yml file.
# blackbox.yml
modules:
  http_2xx: # A generic module for HTTP 2xx checks
    prober: http
    timeout: 5s
    http:
      method: GET
      # No specific headers needed for our simple healthz endpoint
      # headers:
      #   Host: example.com
      # Validate the response body for specific content.
      # This is crucial for our custom healthz endpoint:
      # the probe fails unless the body matches this regular expression.
      fail_if_body_not_matches_regexp:
        - '"status": "OK"'
      # Alternative: fail if ERROR is present
      # fail_if_body_matches_regexp:
      #   - 'ERROR'
  # Custom module for our C app health check
  c_app_health:
    prober: http
    timeout: 5s
    http:
      method: GET
      # We expect a JSON response, so we check for the specific OK status
      fail_if_body_not_matches_regexp:
        - '"status": "OK"'
      # If the C app returns 500, blackbox will report a non-2xx status,
      # and the probe_success metric for that target will be 0.
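To make the pass/fail rule concrete, here is a small C sketch of the decision the `c_app_health` module is meant to encode: the status must be 2xx and the body must contain the literal text `"status": "OK"`. It is an illustration, not blackbox_exporter's implementation. The function name `probe_would_succeed` is hypothetical, and a substring search stands in for the regular-expression match, which is equivalent here only because the pattern has no metacharacters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when a response would satisfy both conditions:
 * a 2xx status code and a body containing the expected marker. */
bool probe_would_succeed(int http_status, const char *body) {
    if (http_status < 200 || http_status > 299) {
        return false; /* non-2xx status fails the probe */
    }
    if (body == NULL) {
        return false; /* nothing to match against */
    }
    /* The match is whitespace-sensitive: "status":"OK" would not match. */
    return strstr(body, "\"status\": \"OK\"") != NULL;
}
```

The whitespace sensitivity matters in practice: if the C application ever changes how it formats the JSON, the body check has to change with it.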
Start blackbox_exporter with this configuration. It typically runs on port 9115.
# Download and run blackbox_exporter (adjust version)
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz
tar xvfz blackbox_exporter-0.25.0.linux-amd64.tar.gz
cd blackbox_exporter-0.25.0.linux-amd64
# Run it, pointing to your config file
./blackbox_exporter --config.file=blackbox.yml
# Or set up as a systemd service for production
Prometheus Configuration
Now, configure Prometheus to scrape both node_exporter and blackbox_exporter. Add these scrape configurations to your prometheus.yml.
# prometheus.yml
scrape_configs:
  # Scrape Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  # Scrape Node Exporter on all application Droplets
  - job_name: 'node_exporter'
    static_configs:
      # Replace with your actual Droplet IPs or DNS names
      - targets:
          - '192.168.1.10:9100' # App Server 1
          - '192.168.1.11:9100' # App Server 2
          - '192.168.1.12:9100' # App Server 3
    # Use service discovery (e.g., DigitalOcean integration, Consul, file_sd_configs)
    # for dynamic environments. For simplicity, static_configs are shown.
  # Scrape Blackbox Exporter for C App health checks
  - job_name: 'c_app_health_check'
    metrics_path: /probe
    params:
      module: [c_app_health] # Use the custom module defined in blackbox.yml
    static_configs:
      # List the application endpoints to probe. The relabeling below sends each
      # scrape to the blackbox exporter, which then probes the listed endpoint.
      - targets:
          - 'http://192.168.1.10:8081/healthz' # App Server 1 Health Endpoint
          - 'http://192.168.1.11:8081/healthz' # App Server 2 Health Endpoint
          - 'http://192.168.1.12:8081/healthz' # App Server 3 Health Endpoint
    relabel_configs:
      # This relabeling is crucial: it tells Prometheus to scrape the blackbox_exporter
      # but to use the *original* target URL (the app's healthz endpoint) as the
      # instance label for the metrics.
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 'localhost:9115' # Address of your blackbox_exporter
  # Example for MySQL cluster health (see next section)
  # - job_name: 'mysql_cluster_health'
  #   ...
With this setup, Prometheus periodically queries blackbox_exporter, which in turn probes your C application’s /healthz endpoint. If the C app returns anything other than a 2xx response with the expected JSON, or if the endpoint is unreachable, the probe_success metric for that target in the c_app_health_check job is 0, which is the value to alert on.
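The effect of the relabeling is easier to see as the URL Prometheus ends up requesting: the exporter address becomes the host, and the application endpoint travels as a percent-encoded `target` query parameter next to `module`. The C sketch below builds that URL by hand purely for illustration. `url_encode` and `build_probe_url` are hypothetical helpers, not part of Prometheus or blackbox_exporter.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Percent-encode src into dst, leaving only unreserved characters as-is.
 * Returns 0 on success, -1 if dst is too small. */
int url_encode(const char *src, char *dst, size_t dst_size) {
    static const char hex[] = "0123456789ABCDEF";
    size_t j = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        unsigned char c = (unsigned char)src[i];
        if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            if (j + 1 >= dst_size) return -1;
            dst[j++] = (char)c;
        } else {
            if (j + 3 >= dst_size) return -1;
            dst[j++] = '%';
            dst[j++] = hex[c >> 4];
            dst[j++] = hex[c & 0x0F];
        }
    }
    if (j >= dst_size) return -1;
    dst[j] = '\0';
    return 0;
}

/* Build the /probe URL for one target. Returns 0 on success, -1 on overflow. */
int build_probe_url(const char *exporter_addr, const char *module,
                    const char *target, char *out, size_t out_size) {
    char encoded[512];
    if (url_encode(target, encoded, sizeof encoded) != 0) return -1;
    int n = snprintf(out, out_size, "http://%s/probe?module=%s&target=%s",
                     exporter_addr, module, encoded);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Requesting the resulting URL with curl against a running exporter is a quick way to debug a module before wiring it into Prometheus.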
MySQL Cluster Monitoring: Percona Monitoring and Management (PMM)
For MySQL clusters, especially on DigitalOcean where you might be managing multiple nodes for replication or sharding, a dedicated monitoring solution is highly recommended. Percona Monitoring and Management (PMM) is an excellent open-source choice that provides deep insights into MySQL performance and health.
Deploying Percona PMM Server
PMM consists of a server component (which collects and visualizes data) and client agents that run on your database nodes. The easiest way to deploy the PMM server is using Docker on a dedicated Droplet.
# On a dedicated Droplet for PMM Server
# Ensure Docker and Docker Compose are installed
# Create a directory for PMM data
sudo mkdir -p /opt/pmm/data
sudo chown -R "$(id -u):$(id -g)" /opt/pmm/data
# Create a docker-compose.yml file
nano docker-compose.yml
version: '3.7'
services:
  pmm-server:
    image: percona/pmm-server:2.38.0 # Official image; use a specific, stable version
    container_name: pmm-server
    restart: always
    ports:
      - "80:80" # PMM UI
      - "443:443" # PMM UI (HTTPS)
      - "3306:3306" # MySQL client access (optional, for direct DB access)
      - "9003:9003" # QAN API
      - "9009:9009" # Prometheus API
      - "9010:9010" # Alertmanager API
    volumes:
      - /opt/pmm/data:/srv/data
      - /opt/pmm/log:/var/log
    environment:
      - VIRTUAL_HOST=pmm.yourdomain.com # Set your FQDN here
      - LETSENCRYPT_HOST=pmm.yourdomain.com # For automatic SSL
- [email protected] # For SSL
- TIMEZONE=UTC # Or your preferred timezone
Start the PMM server:
sudo docker-compose up -d
Access the PMM UI via the FQDN you configured (e.g., https://pmm.yourdomain.com). You’ll need to set up DNS for this FQDN to point to your PMM server Droplet’s IP address. The default credentials are usually admin/admin, which you should change immediately.
Deploying PMM Agents on MySQL Nodes
For each MySQL node in your cluster, you need to deploy the PMM client agent. This agent collects metrics and sends them to the PMM server.
# On each MySQL node
# Ensure Docker is installed
# Create a directory for PMM agent data
sudo mkdir -p /opt/pmm/agent
sudo chown -R "$(id -u):$(id -g)" /opt/pmm/agent
# Create a docker-compose.yml for the agent
nano docker-compose-agent.yml
version: '3.7'
services:
  pmm-client:
    image: percona/pmm-client:2.38.0 # Match PMM Server version
    container_name: pmm-client
    restart: always
    pid: host
    network_mode: host # Agent needs to access MySQL on host network
    environment:
      # Variable names follow the pmm-client Docker image conventions
      - PMM_AGENT_SERVER_ADDRESS=pmm.yourdomain.com:443 # FQDN and port of your PMM Server
      - PMM_AGENT_SERVER_USERNAME=admin # PMM Server admin user
      - PMM_AGENT_SERVER_PASSWORD=your_pmm_admin_password # PMM Server admin password
      # - PMM_AGENT_SERVER_INSECURE_TLS=1 # Only if PMM server uses self-signed certs initially
      - PMM_AGENT_SETUP=1 # Register this node with PMM Server on first start
      - PMM_AGENT_CONFIG_FILE=config/pmm-agent.yaml
    volumes:
      - /opt/pmm/agent:/srv/agent-data
      - /var/run/docker.sock:/var/run/docker.sock # If MySQL is in Docker
      - /var/lib/mysql:/var/lib/mysql # If MySQL is on host
      - /etc/mysql:/etc/mysql # If MySQL config is on host
Start the PMM agent on each MySQL node:
sudo docker-compose -f docker-compose-agent.yml up -d
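After starting the agent, you need to register your MySQL instance with PMM. You can do this in the PMM UI: navigate to “Inventory” -> “Add Service” (the exact menu names vary between PMM versions) and select “MySQL/MariaDB/Percona Server”, then provide the connection details for your MySQL instance (host, port, user, password). Alternatively, run pmm-admin add mysql on the node with the same connection details. Use a dedicated MySQL monitoring user rather than root. PMM will then start collecting metrics.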
Key MySQL Metrics to Monitor with PMM
Once PMM is collecting data, focus on these critical metrics:
- Replication Lag: Essential for high availability. Monitor `Seconds_Behind_Master` (or equivalent for other replication types). PMM visualizes this clearly.
- Query Performance: Use the Query Analytics (QAN) feature to identify slow queries, high-frequency queries, and their impact.
- Connections: `Threads_connected`, `Threads_running`. High numbers can indicate performance bottlenecks or DoS attacks.
- InnoDB Metrics: Buffer pool hit rate, I/O activity (reads/writes), deadlocks, row lock waits.
- Replication Status: Check `SHOW REPLICA STATUS` (or `SHOW SLAVE STATUS`) for errors, and ensure `Replica_IO_Running` and `Replica_SQL_Running` are ‘Yes’. PMM agents can be configured to run these checks.
- Disk I/O: Monitor read/write latency and throughput on the underlying storage.
- CPU/Memory Usage: Standard system metrics, but correlated with MySQL activity.
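The replication checks in this list combine into a simple rule that a C application can also apply to its own view of a replica, for example before routing reads to it. The sketch below assumes the three values have already been fetched from the replica status output by whatever client code the application uses. The `struct replica_status` layout and the `replica_is_healthy` name are hypothetical, and a negative `seconds_behind` stands in for the NULL that MySQL reports when replication is not running.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for fields taken from the replica status output. */
struct replica_status {
    const char *io_running;     /* Replica_IO_Running: "Yes", "No", "Connecting" */
    const char *sql_running;    /* Replica_SQL_Running: "Yes" or "No" */
    long        seconds_behind; /* replication lag in seconds, or -1 when NULL */
};

/* A replica is healthy when both threads run and lag is known and bounded. */
bool replica_is_healthy(const struct replica_status *st, long max_lag_seconds) {
    if (st->io_running == NULL || strcmp(st->io_running, "Yes") != 0) {
        return false; /* I/O thread stopped or still connecting */
    }
    if (st->sql_running == NULL || strcmp(st->sql_running, "Yes") != 0) {
        return false; /* SQL thread stopped, often after an apply error */
    }
    if (st->seconds_behind < 0) {
        return false; /* lag unknown */
    }
    return st->seconds_behind <= max_lag_seconds;
}
```

Treating "Connecting" as unhealthy is a deliberate choice here: the replica may recover on its own, but it is not receiving changes in the meantime.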
Alerting on MySQL Issues
PMM integrates with Alertmanager for sophisticated alerting. Configure alert rules in PMM’s alerting section. For example:
# Example alert rule within PMM's alerting configuration
groups:
  - name: mysql_alerts
    rules:
      - alert: MySQLReplicationLagging
        # Metric name as exposed by mysqld_exporter
        expr: mysql_slave_status_seconds_behind_master > 60 # Lagging by more than 60 seconds
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "MySQL replication lag detected on {{ $labels.instance }}"
          description: "MySQL instance {{ $labels.instance }} is lagging behind master by {{ $value }} seconds."
      - alert: HighMySQLConnections
        expr: mysql_global_status_threads_connected > 500 # More than 500 connections
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High number of MySQL connections on {{ $labels.instance }}"
          description: "MySQL instance {{ $labels.instance }} has {{ $value }} active connections."
Ensure your PMM server is configured to send alerts to an Alertmanager instance, which can then route notifications via email, Slack, PagerDuty, etc.
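The `for:` clause in these rules is what keeps a brief spike from paging anyone: the condition has to hold at every evaluation for the whole duration before the alert fires, and a single false evaluation resets it. The C sketch below models that pending/firing behavior as a small state machine. It is an illustration of the semantics rather than Prometheus code, and the `alert_tracker` type and `alert_evaluate` function are hypothetical names.

```c
#include <stdbool.h>
#include <time.h>

enum alert_state { ALERT_INACTIVE, ALERT_PENDING, ALERT_FIRING };

/* Tracks one alert instance between rule evaluations. */
struct alert_tracker {
    enum alert_state state;
    time_t           active_since; /* when the condition first became true */
};

/* Feed in the result of one rule evaluation taken at time `now`. */
enum alert_state alert_evaluate(struct alert_tracker *t, bool condition,
                                time_t now, double for_seconds) {
    if (!condition) {
        t->state = ALERT_INACTIVE; /* any false evaluation resets the timer */
        return t->state;
    }
    if (t->state == ALERT_INACTIVE) {
        t->state = ALERT_PENDING;
        t->active_since = now;
    }
    if (difftime(now, t->active_since) >= for_seconds) {
        t->state = ALERT_FIRING;
    }
    return t->state;
}
```

With `for: 5m`, replication lag that exceeds the threshold for two minutes and then recovers never leaves the pending state, so no notification is sent.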
Centralized Logging with ELK Stack (Elasticsearch, Logstash, Kibana)
Effective monitoring isn’t complete without centralized logging. Aggregating logs from your C applications and MySQL servers into a single, searchable location is crucial for debugging and incident response.
Log Collection Agents (Filebeat)
On each Droplet running your C application or MySQL, deploy Filebeat. Filebeat is a lightweight shipper that forwards log files to Logstash or directly to Elasticsearch.
# On each application/database Droplet
# Download and install Filebeat (adjust version)
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.11.3-amd64.deb
sudo dpkg -i filebeat-8.11.3-amd64.deb
# Configure Filebeat to collect logs
sudo nano /etc/filebeat/filebeat.yml
# /etc/filebeat/filebeat.yml
filebeat.inputs:
  # Collect logs from your C application
  - type: log
    enabled: true
    paths:
      - /var/log/your_c_app/*.log # Adjust path to your app's log files
    # If your C app logs JSON and Filebeat should decode it, uncomment and configure
    # (leave commented out if Logstash does the JSON parsing):
    # json.keys_under_root: true
    # json.message_key: log # Assuming your JSON has a 'log' field for the message
  # Collect MySQL error logs
  - type: log
    enabled: true
    paths:
      - /var/log/mysql/error.log # Adjust path as per your MySQL config
    # MySQL logs are often plain text, no JSON parsing needed here
# Output to Logstash (recommended for processing)
output.logstash:
  hosts: ["your_logstash_ip:5044"] # Replace with your Logstash server IP and port
# Or output directly to Elasticsearch if not using Logstash
# output.elasticsearch:
#   hosts: ["your_elasticsearch_ip:9200"]
# Disable the default modules unless you need them
# filebeat.modules:
#   - module: nginx
#     ...
#   - module: mysql
#     ...
# Disable Elasticsearch template and ILM setup if using Logstash
# setup.template.enabled: false
# setup.ilm.enabled: false
Enable and start Filebeat:
sudo systemctl enable filebeat
sudo systemctl start filebeat
sudo systemctl status filebeat
Logstash Configuration
Logstash will receive logs from Filebeat, parse them, enrich them if necessary, and send them to Elasticsearch.
# On your Logstash server
# Create a pipeline configuration file
sudo nano /etc/logstash/conf.d/02-beats-pipeline.conf
# /etc/logstash/conf.d/02-beats-pipeline.conf
input {
  beats {
    port => 5044
  }
}
filter {
  # Parse C application logs if they are JSON
  # (Filebeat sets [input][type] to the input type; [agent][type] is always "filebeat")
  if [input][type] == "log" and [log][file][path] =~ /\/var\/log\/your_c_app\// {
    json {
      source => "message" # Assuming the entire message is a JSON string
      # If your JSON has a specific field for the actual log message, use that:
      # source => "[log][original]" # Example if Filebeat parsed it into log.original
      remove_field => ["message"] # Remove original message field after parsing
    }
    # Add custom fields for C app logs if needed
    mutate {
      add_field => { "application" => "my_c_app" }
    }
  }
  # Parse MySQL error logs (example: simple grok for the MySQL 5.7+/8.0 format,
  # e.g. "2023-11-15T10:00:00.123456Z 0 [Warning] ...")
  if [log][file][path] =~ /mysql\/error\.log/ {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{NUMBER:thread_id} \[%{WORD:loglevel}\] %{GREEDYDATA:mysql_message}" }
      # The original line stays in "message"; the parsed text lands in "mysql_message"
    }
    mutate {
      add_field => { "application" => "mysql" }
      # Convert timestamp if needed, or rely on Logstash's @timestamp
    }
  }
  # Add GeoIP information based on source IP (if logs come from different servers)
  # geoip { source => "client.address" }
  # Standard date parsing for @timestamp
  date {
    match => [ "timestamp", "ISO8601" ] # If your C app logs ISO8601 timestamps
    target => "@timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["your_elasticsearch_ip:9200"] # Replace with your Elasticsearch host
    index => "%{application}-%{+YYYY.MM.dd}" # Example: my_c_app-2023.11.15 or mysql-2023.11.15
    # user => "elastic"
    # password => "changeme"
  }
}
Restart Logstash after making changes.
Kibana for Visualization and Analysis
Once data flows into Elasticsearch, Kibana provides the interface for searching, visualizing, and creating dashboards.
- Discover Tab: Search and filter logs using Lucene query syntax.
- Visualize Tab: Create charts, graphs, and maps from your log data.
- Dashboard Tab: Combine visualizations into comprehensive dashboards. For example, a dashboard showing C application error rates alongside MySQL slow query counts.
- Index Patterns: Ensure you have created index patterns in Kibana that match your Elasticsearch index naming convention (e.g., `my_c_app-*`, `mysql-*`).
For example, to visualize C application errors, you might create a bar chart showing the count of logs where application: "my_c_app" and loglevel: "ERROR" (or similar fields depending on your log structure) over time.
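For that query to work, the C application has to emit the fields the pipeline expects. A minimal sketch of a formatter that writes one JSON object per line, with an ISO 8601 `timestamp`, a `loglevel`, and the message under `log`, might look like the following. The field names are chosen to line up with the Filebeat and Logstash examples in this article, the function names `json_escape` and `format_log_line` are hypothetical, and the level is assumed to be a fixed literal such as "ERROR" that needs no escaping.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Copy src into dst with JSON string escaping. Returns 0, or -1 on overflow. */
static int json_escape(const char *src, char *dst, size_t dst_size) {
    size_t j = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        unsigned char c = (unsigned char)src[i];
        char esc[7];
        if (c == '"')       snprintf(esc, sizeof esc, "\\\"");
        else if (c == '\\') snprintf(esc, sizeof esc, "\\\\");
        else if (c == '\n') snprintf(esc, sizeof esc, "\\n");
        else if (c < 0x20)  snprintf(esc, sizeof esc, "\\u%04x", (unsigned)c);
        else { esc[0] = (char)c; esc[1] = '\0'; }
        size_t len = strlen(esc);
        if (j + len >= dst_size) return -1;
        memcpy(dst + j, esc, len);
        j += len;
    }
    if (j >= dst_size) return -1;
    dst[j] = '\0';
    return 0;
}

/* Format one log record as a single-line JSON object with a UTC timestamp.
 * Returns 0 on success, -1 if anything does not fit. */
int format_log_line(char *out, size_t out_size, time_t ts,
                    const char *level, const char *msg) {
    char stamp[32];
    char escaped[1024];
    /* gmtime() uses shared static storage; multi-threaded POSIX code
     * should call gmtime_r() instead. */
    struct tm *utc = gmtime(&ts);
    if (utc == NULL) return -1;
    if (strftime(stamp, sizeof stamp, "%Y-%m-%dT%H:%M:%SZ", utc) == 0) return -1;
    if (json_escape(msg, escaped, sizeof escaped) != 0) return -1;
    int n = snprintf(out, out_size,
                     "{\"timestamp\":\"%s\",\"loglevel\":\"%s\",\"log\":\"%s\"}",
                     stamp, level, escaped);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Writing each record followed by a newline to a file under /var/log/your_c_app/ gives Filebeat one event per line, and the `timestamp` field is in a form the Logstash date filter can parse as ISO8601.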
Automated Deployments and Configuration Management
Manual configuration and deployment are error-prone. Infrastructure as Code (IaC) tools like Terraform and configuration management tools like Ansible are essential for maintaining consistency and repeatability.
Terraform for Infrastructure Provisioning
Use Terraform to define your DigitalOcean Droplets, VPCs, firewalls, and load balancers.
# main.tf
provider "digitalocean" {
token = var.do_token
}
resource "digitalocean_droplet" "app_server" {
count = 3 # Number of application servers
image = "ubuntu-22-04-x64"
region = "nyc3"
size = "s-2vcpu-4gb"
ssh_keys = [digitalocean_ssh_key.deployer.id]
tags = ["app-server", "production"]
# Ensure monitoring is enabled
monitoring = true
}
resource "digitalocean_droplet" "mysql_node" {
count = 3 # Number of MySQL nodes
image = "ubuntu-22-04-x64"
region = "nyc3"
size = "s-4vcpu-8gb" # Larger size for DBs
ssh_keys = [digitalocean_ssh_key.deployer.id]
tags = ["mysql-node", "production"]
monitoring = true
}
resource "digitalocean_droplet" "pmm_server" {
image = "ubuntu-22-04-x64"
region = "nyc3"
size = "s-2vcpu-4gb"
ssh_keys = [digitalocean_ssh_key.deployer.id]
tags = ["pmm-server", "monitoring"]
monitoring = true
}
resource "digitalocean_ssh_key" "deployer" {
name = "deployer-key"
public_key = file("~/.ssh/id_rsa.pub") # Path to your public SSH key
}
# Define firewall rules
resource "digitalocean_firewall" "app_firewall" {
name = "app-firewall"
# Attach the firewall to the application Droplets
droplet_ids = digitalocean_droplet.app_server[*].id
# Allow SSH from trusted IPs
inbound_rule {
protocol = "tcp"
port_range = "22"
# Add your trusted IP ranges here
source_addresses = ["YOUR_HOME_IP/32"]
}
# Allow the Prometheus/blackbox host to probe the C App health check port.
# Avoid opening this port to 0.0.0.0/0.
inbound_rule {
protocol = "tcp"
port_range = "8081" # C App health check port
source_addresses = ["YOUR_PROMETHEUS_SERVER_IP/32"]
}
# Allow Prometheus to scrape node_exporter
inbound_rule {
protocol = "tcp"
port_range = "91