Vengala Vinay

Having 9+ Years of Experience in Software Development

Server Monitoring Best Practices: Keeping Your C App and PostgreSQL Clusters Alive on OVH

Proactive C Application Health Checks with `systemd`

For critical C applications deployed on OVH infrastructure, robust health checking is paramount. We’ll leverage `systemd`’s built-in capabilities to ensure our application is not only running but also responsive. This involves defining a `systemd` service unit with specific health check directives.

Consider a typical C application that serves traffic on one port (e.g., 8080) and exposes a health check endpoint (e.g., `/healthz`), which the example code serves on a separate port (8081). We’ll create a `systemd` service file to manage this application and its health monitoring.

`systemd` Service Unit Configuration

Create a file named `my-c-app.service` in `/etc/systemd/system/`:

[Unit]
Description=My Critical C Application
After=network.target

[Service]
ExecStart=/usr/local/bin/my_c_app --config /etc/my_c_app/config.conf
ExecStop=/bin/kill -s TERM $MAINPID
Restart=on-failure
RestartSec=5s

# Health Check Configuration
Type=notify
NotifyAccess=all
WatchdogSec=10s

# User and Group for security
User=appuser
Group=appgroup

[Install]
WantedBy=multi-user.target

In this configuration:

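  • Type=notify: Tells `systemd` that the application will report when startup is complete by sending “READY=1”. The unit is not considered started until that message arrives.
  • NotifyAccess=all: Accepts notifications from any process in the service’s control group. `main` (the default with `Type=notify`) is sufficient if only the main process calls `sd_notify()`.
  • WatchdogSec=10s: This is crucial. The application must send a “WATCHDOG=1” keep-alive to `systemd` at least once every 10 seconds. `systemd` does not poll the application; if no keep-alive arrives within the interval, `systemd` treats the service as failed, terminates it, and (because of `Restart=on-failure`) starts it again.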
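
`systemd` passes the configured timeout to the service in the `WATCHDOG_USEC` environment variable (in microseconds), and the `sd_watchdog_enabled(3)` documentation recommends sending the keep-alive at half that interval. The arithmetic can be sketched in shell as follows; the function name `watchdog_ping_interval_ms` is hypothetical and used only for illustration.

```shell
#!/bin/bash
# Print the suggested keep-alive interval in milliseconds (half of WATCHDOG_USEC).
# Returns 1 and prints nothing when the watchdog is not enabled for this process.
watchdog_ping_interval_ms() {
    local usec="${WATCHDOG_USEC:-}"
    # Require a positive decimal integer (10# guards against octal parsing)
    [[ "$usec" =~ ^[0-9]+$ ]] && (( 10#$usec > 0 )) || return 1
    echo $(( 10#$usec / 2 / 1000 ))
}

if interval=$(watchdog_ping_interval_ms); then
    echo "watchdog enabled, send WATCHDOG=1 every ${interval} ms"
else
    echo "watchdog not enabled"
fi
```

With `WatchdogSec=10s`, `WATCHDOG_USEC` is 10000000, so the function prints 5000. Pinging at half the timeout leaves room for one delayed keep-alive before `systemd` acts.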

Your C application needs to be modified to support `systemd`’s notification protocol.

C Application Modifications for `systemd` Notification

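Your C application must call `sd_notify(3)` with “READY=1” once, when it is ready to accept connections, and may send “STATUS=…” messages to update the status text shown by `systemctl status`. For the watchdog, it must send “WATCHDOG=1” at regular intervals shorter than `WatchdogSec` (half the timeout is the usual convention). `systemd` never signals the application to ask for a response; a missing keep-alive is what triggers the restart. Link the program against `libsystemd` (`-lsystemd`).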

// Build: gcc -o my_c_app my_c_app.c -lsystemd (needs the libsystemd development headers)
#include <systemd/sd-daemon.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#define PORT 8080
#define HEALTH_CHECK_PORT 8081 // Separate port for health check endpoint
#define LOOP_SLEEP_USEC 100000 // Main loop tick: 100ms

// Create a non-blocking listening socket for the health check endpoint
// and return its file descriptor.
static int setup_health_check_server(void) {
    int sockfd;
    int reuse = 1;
    struct sockaddr_in serv_addr;

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) {
        perror("ERROR opening socket");
        exit(1);
    }

    // Allow an immediate rebind after systemd restarts the service
    setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));

    // Set socket to non-blocking so accept() never stalls the main loop
    int flags = fcntl(sockfd, F_GETFL, 0);
    if (flags == -1 || fcntl(sockfd, F_SETFL, flags | O_NONBLOCK) == -1) {
        perror("fcntl O_NONBLOCK");
        exit(1);
    }

    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(HEALTH_CHECK_PORT);

    if (bind(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        perror("ERROR on binding");
        close(sockfd);
        exit(1);
    }

    if (listen(sockfd, 5) < 0) {
        perror("ERROR on listen");
        close(sockfd);
        exit(1);
    }

    printf("Health check server listening on port %d\n", HEALTH_CHECK_PORT);
    return sockfd;
}

static void handle_health_check_request(int client_sock) {
    static const char body[] = "{\"status\": \"ok\"}";
    char buffer[1024];
    char response[256];

    // Bound the read so a silent client cannot stall the loop past WatchdogSec
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    setsockopt(client_sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    // Read and discard the request (e.g., GET /healthz HTTP/1.1)
    if (read(client_sock, buffer, sizeof(buffer)) < 0) {
        perror("read");
    }

    // Content-Length is computed from the body instead of being hard-coded
    int len = snprintf(response, sizeof(response),
                       "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n"
                       "Content-Length: %zu\r\nConnection: close\r\n\r\n%s",
                       sizeof(body) - 1, body);
    send(client_sock, response, (size_t)len, 0);
    close(client_sock);
}

int main(int argc, char *argv[]) {
    (void)argc;
    (void)argv;
    // Parse arguments, load config, etc.
    // ...

    // Setup main application server (e.g., listening on PORT 8080)
    // ...

    // Setup health check server and keep the listening descriptor
    int health_check_server_fd = setup_health_check_server();

    // Ask systemd whether WatchdogSec is set and how long the timeout is
    uint64_t watchdog_usec = 0;
    int watchdog_enabled = sd_watchdog_enabled(0, &watchdog_usec) > 0;
    uint64_t since_ping_usec = 0;

    // Tell systemd that startup is complete (required with Type=notify)
    sd_notify(0, "READY=1\nSTATUS=Processing requests");

    while (1) {
        // Main application logic
        // ...

        // Send the watchdog keep-alive at half the configured timeout.
        // Elapsed time is approximated by the loop sleep; if the loop hangs,
        // the keep-alive stops and systemd restarts the service.
        if (watchdog_enabled && since_ping_usec >= watchdog_usec / 2) {
            int r = sd_notify(0, "WATCHDOG=1");
            if (r < 0) {
                // sd_notify returns a negative errno-style code on failure
                fprintf(stderr, "sd_notify failed: %s\n", strerror(-r));
            }
            since_ping_usec = 0;
        }

        // Handle at most one pending health check request per tick
        struct sockaddr_in client_addr;
        socklen_t client_len = sizeof(client_addr);
        int client_sock = accept(health_check_server_fd, (struct sockaddr *)&client_addr, &client_len);
        if (client_sock >= 0) {
            handle_health_check_request(client_sock);
        } else if (errno != EWOULDBLOCK && errno != EAGAIN) {
            perror("accept");
        }

        usleep(LOOP_SLEEP_USEC);
        since_ping_usec += LOOP_SLEEP_USEC;
    }
}

After creating the service file and modifying your C application, reload `systemd`, enable, and start the service:

sudo systemctl daemon-reload
sudo systemctl enable my-c-app.service
sudo systemctl start my-c-app.service
sudo systemctl status my-c-app.service

This setup ensures that `systemd` actively monitors your C application. If the application becomes unresponsive (fails to signal `systemd` within `WatchdogSec`), `systemd` will automatically attempt to restart it.
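
One way to confirm that the watchdog really fires is to freeze the service’s main process and watch the journal. The helpers below are a minimal sketch under that assumption; the function names are hypothetical, and `SIGSTOP` is used because a stopped process cannot send keep-alives.

```shell
#!/bin/bash
# Hypothetical helpers for exercising a unit's watchdog.

# Print the watchdog timeout systemd has registered for the unit.
show_watchdog() {
    systemctl show -p WatchdogUSec --value "$1"
}

# Freeze the unit's main process so it stops sending WATCHDOG=1.
freeze_main_pid() {
    local unit="$1" pid
    pid=$(systemctl show -p MainPID --value "$unit") || return 1
    # MainPID is 0 when the service is not running
    if ! [[ "$pid" =~ ^[0-9]+$ ]] || [ "$pid" -eq 0 ]; then
        echo "no running main process for $unit" >&2
        return 1
    fi
    sudo kill -STOP "$pid"
}
```

For example, run `show_watchdog my-c-app.service`, then `freeze_main_pid my-c-app.service`, and follow `journalctl -u my-c-app.service -f`. A watchdog timeout should be logged once `WatchdogSec` elapses; because a stopped process cannot act on the abort signal, the actual restart may only happen after the unit’s stop timeout expires.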

PostgreSQL Cluster Monitoring with `pg_monitor` and `pg_stat_statements`

Monitoring PostgreSQL clusters, especially in a high-availability setup on OVH, requires a multi-faceted approach. We’ll focus on key metrics that indicate performance, availability, and potential issues, using a custom shell script, `pg_monitor.sh` (unrelated to PostgreSQL’s predefined `pg_monitor` role), and the `pg_stat_statements` extension that ships with PostgreSQL.

Enabling and Configuring `pg_stat_statements`

`pg_stat_statements` is invaluable for identifying slow queries. Ensure it’s enabled in your `postgresql.conf`.

# postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
pg_stat_statements.max = 10000
pg_stat_statements.save = on

After modifying `postgresql.conf`, you must restart your PostgreSQL instances for the changes to take effect. Then, create the extension in each database you want to monitor:

-- In each database you want to monitor
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

You can then query `pg_stat_statements` to find the most resource-intensive queries:

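-- Column names are for PostgreSQL 13+; on 12 and earlier use
-- total_time, mean_time, and stddev_time instead.
SELECT
    calls,
    total_exec_time,
    rows,
    mean_exec_time,
    stddev_exec_time,
    userid::regrole AS "user",
    query
FROM
    pg_stat_statements
ORDER BY
    total_exec_time DESC
LIMIT 20;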

Custom Monitoring Script (`pg_monitor.sh`)

A shell script can aggregate critical metrics. This script should be run periodically (e.g., via cron or `systemd` timers) on each PostgreSQL node.

#!/bin/bash

# Configuration
PG_USER="monitor_user"
PG_HOST="localhost"
PG_PORT="5432"
LOG_FILE="/var/log/pg_monitor.log"
ALERT_THRESHOLD_CPU=80 # %
ALERT_THRESHOLD_MEM=80 # %
ALERT_THRESHOLD_DISK=90 # %
ALERT_THRESHOLD_CONNECTIONS=500
ALERT_THRESHOLD_SLOTS=0 # Minimum expected replication slots; 0 never alerts, so set it to your real slot count

# --- Functions ---

log_message() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}

send_alert() {
    local metric="$1"
    local value="$2"
    local threshold="$3"
    local message="ALERT: PostgreSQL cluster issue on ${PG_HOST}:${PG_PORT}. Metric: ${metric}, Value: ${value}, Threshold: ${threshold}"
    log_message "$message"
    # In a production environment, integrate with your alerting system (e.g., PagerDuty, Slack, Prometheus Alertmanager)
    # echo "$message" | mail -s "PostgreSQL Alert on ${PG_HOST}" alerts@example.com   # placeholder address
}

check_replication_status() {
    # -d is required: without it psql tries a database named after the user
    local slots_used=$(psql -U "$PG_USER" -h "$PG_HOST" -p "$PG_PORT" -d postgres -tAc "SELECT count(*) FROM pg_replication_slots;")
    if [ "$slots_used" -lt "$ALERT_THRESHOLD_SLOTS" ]; then
        send_alert "Replication Slots" "$slots_used" "$ALERT_THRESHOLD_SLOTS"
    fi
    # Add checks for streaming replication lag if applicable
}

check_connection_count() {
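    # Count client sessions only; pg_stat_activity also lists background processes (PostgreSQL 10+)
    local current_connections=$(psql -U "$PG_USER" -h "$PG_HOST" -p "$PG_PORT" -d postgres -tAc "SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'client backend';")
    local max_connections=$(psql -U "$PG_USER" -h "$PG_HOST" -p "$PG_PORT" -d postgres -tAc "SHOW max_connections;")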
    if [ "$current_connections" -gt "$ALERT_THRESHOLD_CONNECTIONS" ]; then
        send_alert "Current Connections" "$current_connections" "$ALERT_THRESHOLD_CONNECTIONS"
    fi
    if [ "$current_connections" -gt "$((max_connections * 90 / 100))" ]; then
        send_alert "Connection Usage" "$current_connections/$max_connections" "90% of max_connections"
    fi
}

check_disk_usage() {
    # -P keeps each filesystem on one line even when the device name is long
    local disk_usage=$(df -P /var/lib/postgresql/data | awk 'NR==2 {sub(/%/, "", $5); print $5}')
    if [ "$disk_usage" -gt "$ALERT_THRESHOLD_DISK" ]; then
        send_alert "Disk Usage" "$disk_usage%" "$ALERT_THRESHOLD_DISK%"
    fi
}

check_cpu_usage() {
    # This is a simplified check. For accurate PostgreSQL CPU usage,
    # consider tools like `pg_top` or more advanced system monitoring.
    local cpu_usage=$(top -bn1 | grep "Cpu(s)" | sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | awk '{print 100 - $1}')
    if (( $(echo "$cpu_usage > $ALERT_THRESHOLD_CPU" | bc -l) )); then
        send_alert "CPU Usage" "${cpu_usage}%" "${ALERT_THRESHOLD_CPU}%"
    fi
}

check_memory_usage() {
    # Similar to CPU, this is a system-wide check.
    local mem_usage=$(free | grep Mem: | awk '{print $3/$2 * 100.0}')
    if (( $(echo "$mem_usage > $ALERT_THRESHOLD_MEM" | bc -l) )); then
        send_alert "Memory Usage" "${mem_usage}%" "${ALERT_THRESHOLD_MEM}%"
    fi
}

check_pg_is_running() {
    if ! pg_isready -h "$PG_HOST" -p "$PG_PORT" -U "$PG_USER" >/dev/null 2>&1; then
        send_alert "PostgreSQL Service" "Not Running" "Running"
        return 1
    fi
    return 0
}

# --- Main Execution ---
log_message "Starting PostgreSQL health check..."

if ! check_pg_is_running; then
    log_message "PostgreSQL is not running on ${PG_HOST}:${PG_PORT}. Exiting."
    exit 1
fi

check_cpu_usage
check_memory_usage
check_disk_usage
check_connection_count
check_replication_status

log_message "PostgreSQL health check finished."
exit 0
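
The script leaves streaming replication lag as a placeholder comment. A minimal sketch of such a check is shown here, assuming it runs on the primary of a PostgreSQL 10+ cluster; the function name `check_replication_lag`, the `PG_DB` variable, and the 16 MiB default limit are hypothetical choices, not part of the original script.

```shell
#!/bin/bash
# Connection settings mirror pg_monitor.sh; PG_DB is an assumed addition.
PG_USER="${PG_USER:-monitor_user}"
PG_HOST="${PG_HOST:-localhost}"
PG_PORT="${PG_PORT:-5432}"
PG_DB="${PG_DB:-postgres}"

# Returns 0 if the worst replay lag is within the limit, 1 if it exceeds
# the limit, and 2 if the query failed or returned something unexpected.
check_replication_lag() {
    local max_lag_bytes="${1:-16777216}"   # assumed default: 16 MiB
    local lag
    lag=$(psql -U "$PG_USER" -h "$PG_HOST" -p "$PG_PORT" -d "$PG_DB" -tAc \
        "SELECT COALESCE(MAX(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)), 0)::bigint
         FROM pg_stat_replication;") || return 2
    [[ "$lag" =~ ^[0-9]+$ ]] || return 2
    if [ "$lag" -gt "$max_lag_bytes" ]; then
        echo "ALERT: replication lag ${lag} bytes exceeds ${max_lag_bytes}"
        return 1
    fi
    return 0
}
```

Measuring lag in bytes of WAL keeps the check independent of clock differences between nodes. On a standby, `pg_current_wal_lsn()` raises an error, so the function returns 2 there.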

To make this script executable and schedule it:

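sudo chmod +x /path/to/your/pg_monitor.sh
# Create a user with read-only permissions for monitoring
sudo -u postgres psql -c "CREATE USER monitor_user WITH PASSWORD 'your_secure_password';"
sudo -u postgres psql -c "GRANT pg_read_all_stats TO monitor_user;"
# The script runs non-interactively, so store the password in ~/.pgpass (mode 0600)
# of the account that runs it.
# Add to root's crontab for hourly checks, keeping any existing entries
# (piping only the new line into `crontab -` would replace the whole crontab)
(sudo crontab -l 2>/dev/null; echo "0 * * * * /path/to/your/pg_monitor.sh") | sudo crontab -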
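
A `systemd` timer is the alternative to cron mentioned earlier. The sketch below writes a matching service and timer unit into a directory of your choice; the unit names `pg-monitor.service` and `pg-monitor.timer`, the function name, and the default script path are assumptions for illustration.

```shell
#!/bin/bash
# Write pg-monitor.service and pg-monitor.timer into the given directory.
# Usage: write_pg_monitor_units <directory> [path-to-pg_monitor.sh]
write_pg_monitor_units() {
    local dir="$1" script="${2:-/usr/local/bin/pg_monitor.sh}"

    cat > "$dir/pg-monitor.service" <<EOF
[Unit]
Description=PostgreSQL health check (pg_monitor.sh)

[Service]
Type=oneshot
ExecStart=$script
EOF

    cat > "$dir/pg-monitor.timer" <<EOF
[Unit]
Description=Run the PostgreSQL health check hourly

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

To install, call `write_pg_monitor_units /etc/systemd/system /path/to/your/pg_monitor.sh` as root, then run `systemctl daemon-reload` and `systemctl enable --now pg-monitor.timer`. Compared with cron, the timer records each run in the journal and `Persistent=true` catches up on a run missed while the machine was down.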

OVH Specific Considerations

When running on OVH, pay close attention to:

  • Network Latency: If your PostgreSQL cluster spans multiple OVH regions or availability zones, monitor inter-node latency. Tools like `ping` and `mtr` can help diagnose network issues.
  • Disk I/O: OVH offers various disk types (SSD, NVMe). Monitor I/O wait times and throughput using `iostat` to ensure your chosen storage meets performance requirements.
  • Resource Limits: Be aware of CPU and memory limits imposed by your OVH instance type. Use `top`, `htop`, and `free` to monitor resource utilization.
  • Firewall Rules: Ensure that PostgreSQL ports (default 5432) are accessible between your application servers and database nodes, and that monitoring tools can reach the database.
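
The inter-node latency check can be automated along the same lines as the other checks. This is a rough sketch that parses the summary line printed by `ping`; the function names and any threshold you pass in are hypothetical.

```shell
#!/bin/bash
# Extract the average round-trip time (ms) from ping's summary line on stdin.
# Handles the Linux ("rtt min/avg/max/mdev = ...") and BSD
# ("round-trip min/avg/max/stddev = ...") formats; fails if no summary is found.
parse_avg_rtt_ms() {
    awk -F'/' '/^(rtt|round-trip) / { print $5; found = 1 } END { exit !found }'
}

# Measure the average latency to a peer node and print it in ms.
node_latency_ms() {
    local host="$1" count="${2:-5}"
    ping -c "$count" -q "$host" 2>/dev/null | parse_avg_rtt_ms
}

# Succeed when a (possibly fractional) value is above the limit.
latency_exceeds() {
    awk -v v="$1" -v l="$2" 'BEGIN { exit !(v > l) }'
}
```

A check such as `rtt=$(node_latency_ms db-node-2) && latency_exceeds "$rtt" 5 && echo "high latency: ${rtt} ms"` could then feed the same alerting path as the other metrics; `db-node-2` and the 5 ms limit are placeholders.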

By combining `systemd`’s process management with detailed PostgreSQL metrics and custom scripting, you can build a resilient monitoring strategy for your C applications and PostgreSQL clusters on OVH.

Copyright © 2026 · Vinay Vengala