Server Monitoring Best Practices: Keeping Your C App and PostgreSQL Clusters Alive on DigitalOcean

Proactive PostgreSQL Cluster Health Checks

Maintaining the health of a PostgreSQL cluster, especially in a distributed cloud environment like DigitalOcean, requires more than just basic uptime checks. We need to go deeper, monitoring key performance indicators (KPIs) that signal potential issues before they impact application availability. This involves a multi-layered approach, starting with essential system metrics and progressing to PostgreSQL-specific diagnostics.

System-Level Monitoring with Prometheus and Node Exporter

For robust system-level monitoring, Prometheus is the de facto standard. We’ll deploy node_exporter on each DigitalOcean Droplet hosting a PostgreSQL instance. This exporter exposes a wealth of system metrics that Prometheus can scrape.

First, ensure you have Prometheus installed and configured. Then, on each PostgreSQL Droplet, install node_exporter. A common method is to download the pre-compiled binary:

wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo useradd -rs /bin/false node_exporter
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

Next, create a systemd service file to manage the node_exporter process:

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

Save this as /etc/systemd/system/node_exporter.service, then enable and start it:

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter

Now, configure your Prometheus server to scrape these targets. Add the following to your prometheus.yml configuration:

scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: [':9100', ':9100', ':9100'] # Replace with your PostgreSQL Droplet IPs

Restart Prometheus for the changes to take effect. Key metrics to monitor here include:

node_cpu_seconds_total: CPU utilization trends.
node_memory_MemAvailable_bytes: Available memory. Low availability is a critical indicator.
node_disk_io_time_seconds_total: Disk I/O wait times. High values suggest I/O bottlenecks.
node_network_receive_errs_total and node_network_transmit_errs_total: Network errors.

PostgreSQL-Specific Monitoring with the PostgreSQL Exporter

While node_exporter gives us system context, we need PostgreSQL-specific metrics. The postgres_exporter is an excellent choice for this. It queries PostgreSQL directly and exposes metrics in a Prometheus-compatible format.

Install postgres_exporter on a dedicated monitoring Droplet or one of the PostgreSQL nodes (though a separate node is preferred for isolation). Download the binary similarly to node_exporter.

wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.13.0/postgres_exporter-0.13.0.linux-amd64.tar.gz
tar xvfz postgres_exporter-0.13.0.linux-amd64.tar.gz
sudo mv postgres_exporter-0.13.0.linux-amd64/postgres_exporter /usr/local/bin/
sudo useradd -rs /bin/false postgres_exporter
sudo chown postgres_exporter:postgres_exporter /usr/local/bin/postgres_exporter

Create a PostgreSQL user with minimal privileges for the exporter to connect with. This user should only have `CONNECT` privilege on the databases you want to monitor and `SELECT` on specific system views.

-- Connect to your PostgreSQL instance as a superuser
CREATE USER monitoring_user WITH PASSWORD 'your_secure_password';
GRANT CONNECT ON DATABASE your_database_name TO monitoring_user;
-- On each database you want to monitor:
GRANT USAGE ON SCHEMA pg_catalog TO monitoring_user;
GRANT SELECT ON pg_stat_activity TO monitoring_user;
GRANT SELECT ON pg_stat_database TO monitoring_user;
GRANT SELECT ON pg_stat_replication TO monitoring_user;
GRANT SELECT ON pg_stat_statements TO monitoring_user; -- If pg_stat_statements is enabled
GRANT SELECT ON pg_locks TO monitoring_user;
GRANT SELECT ON pg_settings TO monitoring_user;

Create a .pgpass file for the postgres_exporter user to authenticate:

# ~/.pgpass
hostname:port:database:username:password
localhost:5432:your_database_name:monitoring_user:your_secure_password

Set appropriate permissions:

sudo chown postgres_exporter:postgres_exporter ~/.pgpass
sudo chmod 600 ~/.pgpass

Now, create a systemd service for postgres_exporter. You’ll need to specify the connection string. If running on the same host as PostgreSQL, it might be simpler. For remote connections, use the full DSN.

[Unit]
Description=PostgreSQL Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=postgres_exporter
Group=postgres_exporter
Type=simple
# Example for a local connection:
# ExecStart=/usr/local/bin/postgres_exporter --web.listen-address=":9187" --extend.queries-path=/etc/postgres_exporter/queries.yaml
# Example for a remote connection (replace with your actual DSN):
ExecStart=/usr/local/bin/postgres_exporter --web.listen-address=":9187" --pg.connstr="postgresql://monitoring_user:your_secure_password@:5432/your_database_name" --extend.queries-path=/etc/postgres_exporter/queries.yaml

[Install]
WantedBy=multi-user.target

Save this as /etc/systemd/system/postgres_exporter.service. You might also want to create a custom queries.yaml for additional metrics. Then, enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable postgres_exporter
sudo systemctl start postgres_exporter
sudo systemctl status postgres_exporter

Configure Prometheus to scrape the postgres_exporter:

scrape_configs:
  - job_name: 'postgres_exporter'
    static_configs:
      - targets: [':9187'] # IP of the Droplet running postgres_exporter

Key PostgreSQL metrics to monitor include:

pg_up: Basic connectivity check.
pg_stat_database_numbackends: Number of active connections. High numbers can indicate connection pool exhaustion or slow queries.
pg_stat_replication_lag_seconds: Replication lag for standby servers. Critical for HA/DR.
pg_stat_database_blks_read and pg_stat_database_blks_hit: Cache hit ratio. Low hit ratios suggest insufficient memory or inefficient queries.
pg_stat_activity_count: Number of active queries, broken down by state (active, idle, etc.).
pg_locks_count: Number of active locks. High contention is a major performance killer.
pg_settings_max_connections: Monitor against actual usage.

Application-Level Monitoring with Custom Metrics

Your C application is the direct consumer of PostgreSQL. Monitoring its interaction with the database is crucial. This involves instrumenting your C code to expose custom metrics that can be scraped by Prometheus.

The prometheus-client-c library is a good starting point. You’ll need to integrate it into your C application’s build process.

# Example build integration (using CMake)
# Add this to your CMakeLists.txt
find_package(prometheus_client_c REQUIRED)
target_link_libraries(your_app PRIVATE prometheus_client_c::prometheus_client_c)

In your C code, you can define and update metrics:

#include <prometheus/counter.h>
#include <prometheus/registry.h>
#include <prometheus/exposer.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

// Global registry and exposer
prometheusio_registry_t* registry = NULL;
prometheusio_exposer_t* exposer = NULL;

// Define a counter for successful database operations
prometheusio_counter_t* db_operations_success_counter = NULL;

void initialize_prometheus_metrics() {
    // Create a new registry
    registry = prometheusio_registry_new();
    if (!registry) {
        fprintf(stderr, "Failed to create Prometheus registry\n");
        exit(EXIT_FAILURE);
    }

    // Create a counter metric
    db_operations_success_counter = prometheusio_counter_new(
        "app_db_operations_total",
        "Total number of successful database operations.",
        prometheusio_labels_new("operation_type", NULL) // Example label
    );
    if (!db_operations_success_counter) {
        fprintf(stderr, "Failed to create success counter metric\n");
        prometheusio_registry_destroy(registry);
        exit(EXIT_FAILURE);
    }
    prometheusio_registry_register(registry, db_operations_success_counter);

    // Create an HTTP exposer to serve metrics on port 9394
    exposer = prometheusio_exposer_new(registry, "0.0.0.0", "9394");
    if (!exposer) {
        fprintf(stderr, "Failed to create Prometheus exposer\n");
        prometheusio_registry_destroy(registry);
        exit(EXIT_FAILURE);
    }
    prometheusio_exposer_start(exposer);
    printf("Prometheus metrics exposed on http://0.0.0.0:9394/metrics\n");
}

void increment_db_success(const char* op_type) {
    if (db_operations_success_counter) {
        prometheusio_counter_inc(db_operations_success_counter, prometheusio_labels_new("operation_type", op_type, NULL));
    }
}

int main() {
    initialize_prometheus_metrics();

    // Simulate database operations
    while (1) {
        // ... perform database operation ...
        printf("Performing a 'read' operation...\n");
        increment_db_success("read");
        sleep(5);

        // ... perform another database operation ...
        printf("Performing a 'write' operation...\n");
        increment_db_success("write");
        sleep(10);
    }

    // Cleanup (though this loop is infinite in this example)
    prometheusio_exposer_destroy(exposer);
    prometheusio_registry_destroy(registry);
    return 0;
}

Compile and run your C application. Then, configure Prometheus to scrape your application’s metrics endpoint (e.g., http://:9394/metrics).

scrape_configs:
  - job_name: 'my_c_app'
    static_configs:
      - targets: [':9394'] # IP of the Droplet running your C app

Custom metrics to consider:

Database query latency (e.g., app_db_query_duration_seconds histogram).
Number of failed database operations (app_db_operations_failed_total counter).
Connection pool usage (app_db_pool_connections_current gauge).
Specific application logic timings that depend on database interaction.

Alerting with Alertmanager

Collecting metrics is only half the battle. We need to be notified when things go wrong. Alertmanager integrates with Prometheus to handle alerts. Define alert rules in Prometheus based on the metrics we’re collecting.

Example alert rule for high PostgreSQL replication lag:

groups:
- name: postgresql_alerts
  rules:
  - alert: HighReplicationLag
    expr: pg_replication_lag_seconds > 60 # Alert if lag is over 60 seconds
    for: 5m # Condition must be true for 5 minutes
    labels:
      severity: critical
    annotations:
      summary: "High replication lag detected on {{ $labels.instance }}"
      description: "PostgreSQL replication lag on {{ $labels.instance }} is {{ $value }} seconds, exceeding the threshold."

Example alert rule for low available memory on a PostgreSQL node:

groups:
- name: node_alerts
  rules:
  - alert: LowAvailableMemory
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10 # Less than 10% available memory
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Low available memory on {{ $labels.instance }}"
      description: "Droplet {{ $labels.instance }} has only {{ printf "%.2f" (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100) }}% available memory."

Configure Alertmanager to route these alerts to your preferred notification channels (Slack, PagerDuty, email, etc.). Ensure your Alertmanager configuration points to your Prometheus server.

Log Aggregation and Analysis

Metrics tell you *what* is happening, but logs often tell you *why*. Centralizing logs from your PostgreSQL instances and C application is essential for debugging and incident response.

Consider using a log aggregation tool like Loki, Elasticsearch, or a managed service. For Loki, you’d deploy Promtail on each Droplet to collect logs and forward them to your Loki instance.

Configure Promtail to tail PostgreSQL logs (typically found in /var/log/postgresql/postgresql-*.log) and your C application’s logs.

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/postgresql/postgresql-*.log
  - job_name: my_c_app_logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: my_c_app
          __path__: /var/log/my_c_app/*.log # Adjust path as needed

In Grafana, you can then query Loki to visualize and search your logs alongside your metrics, providing a holistic view of your system’s health.

Regular Health Checks and Maintenance

Beyond automated monitoring, schedule regular manual checks and maintenance. This includes:

VACUUM and ANALYZE: Ensure these are running regularly (either via cron jobs or PostgreSQL’s autovacuum). Monitor pg_stat_user_tables for last_autovacuum and last_autoanalyze.
Disk Space: Proactively monitor disk usage on all Droplets.
Backups: Verify that your PostgreSQL backups are completing successfully and test restores periodically.
Connection Pooling: If using a connection pooler like PgBouncer, monitor its metrics as well.
PostgreSQL Version Updates: Keep your PostgreSQL instances patched and updated.

By implementing this comprehensive monitoring strategy, you can significantly improve the reliability and performance of your C application and PostgreSQL clusters on DigitalOcean, moving from reactive firefighting to proactive system management.

Server Monitoring Best Practices: Keeping Your C App and PostgreSQL Clusters Alive on DigitalOcean

Proactive PostgreSQL Cluster Health Checks

System-Level Monitoring with Prometheus and Node Exporter

PostgreSQL-Specific Monitoring with the PostgreSQL Exporter

Application-Level Monitoring with Custom Metrics

Alerting with Alertmanager

Log Aggregation and Analysis

Regular Health Checks and Maintenance

Recent Posts

Top Categories

Our Products

Our Services