Server Monitoring Best Practices: Keeping Your C App and Redis Clusters Alive on Linode

Proactive C Application Health Checks with Systemd

For critical C applications running on Linode, health monitoring should not rely on external probes alone, because they can be slow to detect internal application failures. Tying health checks to the system’s service manager, systemd, gives faster and more granular detection. We’ll configure systemd to verify the application’s health when it starts and to restart it when it fails.

Assume your C application listens on a specific port (e.g., 8080) and has a health check endpoint (e.g., /health) that returns an HTTP 200 OK status when healthy. If the application crashes or hangs, this endpoint will become unreachable.
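Before wiring the endpoint into systemd, it can help to probe it by hand. The following is a minimal sketch, assuming the example URL above; the function names classify_status and probe are illustrative and not part of any tool.

```shell
#!/bin/bash
# Sketch: probe the example health endpoint and classify the result.
# HEALTH_URL defaults to the example endpoint from the text; adjust as needed.
HEALTH_URL="${HEALTH_URL:-http://localhost:8080/health}"

# classify_status <http_code>: map curl's %{http_code} value to a verdict.
# curl prints 000 when it received no HTTP response at all.
classify_status() {
    case "$1" in
        200) echo "healthy" ;;
        000) echo "unreachable" ;;
        *)   echo "unhealthy" ;;
    esac
}

# probe <url>: print the HTTP status code returned by the endpoint,
# or 000 if the connection fails or the 5-second limit is hit.
probe() {
    curl --silent --output /dev/null --max-time 5 --write-out '%{http_code}' "$1"
}

# Example (requires the application to be running):
#   classify_status "$(probe "$HEALTH_URL")"
```

A hung application typically shows up as "unreachable" here, because the request times out before any status line arrives.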

Systemd Service Unit Configuration

Create a systemd service file for your application. Let’s assume your application binary is located at /opt/myapp/myapp-server and its configuration is at /etc/myapp/myapp.conf.

Create the service file /etc/systemd/system/myapp.service:

[Unit]
Description=My C Application Server
After=network.target

[Service]
ExecStart=/opt/myapp/myapp-server --config /etc/myapp/myapp.conf
Restart=on-failure
RestartSec=5
User=myappuser
Group=myappgroup
WorkingDirectory=/opt/myapp

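# Startup health check: runs once, after the main process has been started
ExecStartPost=/usr/bin/curl --fail --silent --output /dev/null --connect-timeout 5 --max-time 5 --retry 5 --retry-delay 1 --retry-connrefused http://localhost:8080/health

[Install]
WantedBy=multi-user.target

Explanation:

Description: A human-readable description of the service.
After=network.target: Orders the service after the network stack has been set up.
ExecStart: The command to start your C application.
Restart=on-failure: Configures systemd to restart the service if it exits with a non-zero status code, is killed by a signal, or hits a start or watchdog timeout.
RestartSec=5: Waits 5 seconds before attempting a restart.
User/Group: Runs the application as a non-root user for security.
WorkingDirectory: Sets the working directory for the application.
ExecStartPost: This is crucial. It runs once after the main ExecStart process has been started; with the default Type=simple that is immediately after the process is spawned, not when it is ready to serve. We use curl to hit the /health endpoint. --fail makes curl return a non-zero exit code if the HTTP status is 400 or higher. --silent suppresses progress meters and error messages. --output /dev/null discards the response body; shell redirection such as > /dev/null 2>&1 does not work here because systemd does not run the command through a shell. --connect-timeout 5 limits the connection attempt to 5 seconds and --max-time 5 caps each request at 5 seconds. --retry 5 --retry-delay 1 --retry-connrefused give the application a few seconds to open its port. If the check still fails, systemd treats the start as failed, and Restart=on-failure schedules a new attempt.
Periodic checks: systemd has no HealthCheckIntervalSec= or HealthCheckTimeoutSec= directives, and ExecStartPost is not repeated after startup. For ongoing liveness checks, either set WatchdogSec= and have the application send keep-alive notifications with sd_notify(3), or run the same curl probe from a systemd timer or cron job that restarts the service when the probe fails.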
[Install]: Defines how the service should be enabled.
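Because ExecStartPost runs only once per start, a recurring probe has to live outside the unit file. The following is a rough sketch of such a probe, meant to be run every few seconds by a systemd timer or cron job. The script path, the state file location, the MAX_FAILURES threshold, and the "run" argument are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical /opt/scripts/myapp_healthcheck.sh (name and paths are assumptions).
# Restarts the service after several consecutive failed health probes.
HEALTH_URL="${HEALTH_URL:-http://localhost:8080/health}"
SERVICE="${SERVICE:-myapp.service}"
MAX_FAILURES="${MAX_FAILURES:-3}"
STATE_FILE="${STATE_FILE:-/run/myapp-health.failures}"

# next_failure_count <previous_count> <probe_exit_status>
# Resets the counter on success, increments it on failure.
next_failure_count() {
    if [ "$2" -eq 0 ]; then
        echo 0
    else
        echo $(( $1 + 1 ))
    fi
}

run_check() {
    local previous=0 status=0 count
    # Load the failure count left behind by the previous invocation, if any.
    if [ -f "$STATE_FILE" ]; then
        previous=$(cat "$STATE_FILE")
    fi
    curl --fail --silent --output /dev/null --max-time 5 "$HEALTH_URL" || status=$?
    count=$(next_failure_count "$previous" "$status")
    if [ "$count" -ge "$MAX_FAILURES" ]; then
        echo "health check failed $count times in a row, restarting $SERVICE"
        systemctl restart "$SERVICE"
        count=0
    fi
    echo "$count" > "$STATE_FILE"
}

# Probe only when invoked as: myapp_healthcheck.sh run
if [ "${1:-}" = "run" ]; then
    run_check
fi
```

Requiring several consecutive failures avoids restarting the application over a single slow response. The script needs permission to restart the service and to write the state file, so it would typically run as root.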

After creating the service file, reload systemd, enable, and start your application:

sudo systemctl daemon-reload
sudo systemctl enable myapp.service
sudo systemctl start myapp.service

You can check the status and logs with:

sudo systemctl status myapp.service
sudo journalctl -u myapp.service -f

Redis Cluster Monitoring with Redis-CLI and Prometheus Exporter

Monitoring a Redis cluster involves tracking node health, memory usage, latency, and command statistics. A common and effective approach is to use the built-in redis-cli for basic checks and integrate with Prometheus using the redis_exporter for comprehensive metrics collection.

Basic Redis Node Health Check Script

We can create a simple shell script to check the status of each node in the Redis cluster. This script can be run periodically by cron or integrated into a more sophisticated monitoring system.

Create a script, e.g., /opt/scripts/check_redis_cluster.sh:

#!/bin/bash

REDIS_HOSTS=("redis-node-1:6379" "redis-node-2:6379" "redis-node-3:6379")
EXPECTED_MASTERS=3 # Redis Cluster needs at least 3 masters for majority-based failover
EXPECTED_SLAVES=0  # Set to your replica count (e.g., 3 for a 6-node cluster)

CLUSTER_INFO=$(redis-cli -h redis-node-1 -p 6379 cluster info 2>&1)
CLUSTER_NODES=$(redis-cli -h redis-node-1 -p 6379 cluster nodes 2>&1)

# Check cluster_state
if echo "$CLUSTER_INFO" | grep -q "cluster_state:ok"; then
    echo "Redis cluster state is OK."
else
    echo "ERROR: Redis cluster state is NOT OK. Cluster Info:"
    echo "$CLUSTER_INFO"
    exit 1
fi

# Check number of masters and slaves
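# Match on the flags field (3rd column) and skip nodes flagged fail or fail?
MASTERS=$(echo "$CLUSTER_NODES" | awk '$3 ~ /master/ && $3 !~ /fail/' | wc -l)
SLAVES=$(echo "$CLUSTER_NODES" | awk '$3 ~ /slave/ && $3 !~ /fail/' | wc -l)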

if [ "$MASTERS" -eq "$EXPECTED_MASTERS" ] && [ "$SLAVES" -eq "$EXPECTED_SLAVES" ]; then
    echo "Redis cluster has $MASTERS masters and $SLAVES slaves. (Expected: $EXPECTED_MASTERS masters, $EXPECTED_SLAVES slaves)"
else
    echo "WARNING: Redis cluster has $MASTERS masters and $SLAVES slaves. (Expected: $EXPECTED_MASTERS masters, $EXPECTED_SLAVES slaves)"
    # Depending on criticality, you might want to exit 1 here.
fi

# Check individual node reachability and role
for NODE in "${REDIS_HOSTS[@]}"; do
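    HOST="${NODE%%:*}"
    PORT="${NODE##*:}"
    echo "Checking node: $HOST:$PORT"
    # Compare the reply itself: error replies do not always produce a non-zero exit code
    if [ "$(redis-cli -h "$HOST" -p "$PORT" ping 2>/dev/null)" = "PONG" ]; then
        # ROLE returns several lines; the first one is the role name
        ROLE=$(redis-cli -h "$HOST" -p "$PORT" role | head -n 1)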
        echo "  Node $HOST:$PORT is PINGable. Role: $ROLE"
    else
        echo "  ERROR: Node $HOST:$PORT is NOT PINGable."
        exit 1
    fi
done

exit 0
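To make the master and replica counting easier to follow, here is a small sketch that parses text in the cluster nodes layout, where the third field holds comma-separated flags. The sample data is made up (node IDs shortened) and the function name count_healthy_role is illustrative.

```shell
#!/bin/bash
# count_healthy_role <role>: read `cluster nodes` output on stdin and count
# nodes whose flags (3rd field) include <role> but neither fail nor fail?.
count_healthy_role() {
    awk -v role="$1" '
        {
            n = split($3, flags, ",")
            has_role = 0; failed = 0
            for (i = 1; i <= n; i++) {
                if (flags[i] == role) has_role = 1
                if (flags[i] == "fail" || flags[i] == "fail?") failed = 1
            }
            if (has_role && !failed) count++
        }
        END { print count + 0 }
    '
}

# Made-up sample in the `cluster nodes` layout; real node IDs are 40 hex characters.
SAMPLE_NODES='a1 10.0.0.1:6379@16379 myself,master - 0 0 1 connected 0-5460
b2 10.0.0.2:6379@16379 master - 0 1700000000000 2 connected 5461-10922
c3 10.0.0.3:6379@16379 master,fail - 0 1700000000000 3 disconnected 10923-16383
d4 10.0.0.4:6379@16379 slave a1 0 1700000000000 1 connected'

echo "masters: $(printf '%s\n' "$SAMPLE_NODES" | count_healthy_role master)"   # masters: 2
echo "replicas: $(printf '%s\n' "$SAMPLE_NODES" | count_healthy_role slave)"   # replicas: 1
```

Against a live cluster, the same function could read from redis-cli -h redis-node-1 -p 6379 cluster nodes instead of the sample.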

Make the script executable and add it to cron:

chmod +x /opt/scripts/check_redis_cluster.sh
# Add to cron: run every 5 minutes (appends to the existing crontab instead of replacing it)
(crontab -l 2>/dev/null; echo "*/5 * * * * /opt/scripts/check_redis_cluster.sh >> /var/log/redis_cluster_check.log 2>&1") | crontab -

Integrating Redis Exporter with Prometheus

For more detailed metrics and integration with a centralized monitoring system like Prometheus, deploying the redis_exporter is highly recommended. This exporter runs as a separate service and exposes Redis metrics in a Prometheus-compatible format.

Installation:

Download the latest release from the redis_exporter releases page. For example, on a Debian/Ubuntu system:

wget https://github.com/oliver006/redis_exporter/releases/download/v1.47.0/redis_exporter-v1.47.0.linux-amd64.tar.gz
tar xvfz redis_exporter-v1.47.0.linux-amd64.tar.gz
sudo mv redis_exporter-v1.47.0.linux-amd64/redis_exporter /usr/local/bin/

Systemd Service for Redis Exporter:

Create a systemd service file /etc/systemd/system/redis_exporter.service. For simplicity, this example points the exporter at a single node. Each exporter process has one default Redis address; to cover every node, run one exporter per node or use the exporter’s multi-target /scrape endpoint from Prometheus.

[Unit]
Description=Redis Exporter
After=network.target

[Service]
User=redis_exporter
Group=redis_exporter
ExecStart=/usr/local/bin/redis_exporter \
  --redis.addr=redis://redis-node-1:6379 \
  --web.listen-address=:9121 \
  --namespace=redis
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

Explanation:

--redis.addr: Specifies the Redis instance to connect to. Repeating the flag does not add more targets.
--web.listen-address: The address and port on which the exporter serves metrics (:9121 is the default).
--namespace: A prefix for Prometheus metric names. The default, redis, matches the redis_* metric names used in the alert rules later in this article.

Create a user for the exporter, set up permissions, reload systemd, and start the service:

sudo useradd --system --no-create-home redis_exporter
sudo systemctl daemon-reload
sudo systemctl enable redis_exporter.service
sudo systemctl start redis_exporter.service

The exporter will now be available at http://localhost:9121/metrics. Configure your Prometheus server to scrape this endpoint.
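A quick way to confirm the exporter is talking to Redis is to pull a single metric out of its output. The sketch below assumes the Prometheus text format and a sample without timestamps. The helper name metric_value and the sample data are illustrative, and the metric prefix depends on the --namespace setting.

```shell
#!/bin/bash
# metric_value <name>: read Prometheus text-format metrics on stdin and print
# the value of the first sample whose metric name matches <name> exactly.
# Assumes samples carry no trailing timestamp, so the value is the last field.
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                        # skip HELP and TYPE comment lines
        {
            metric = $1
            brace = index(metric, "{")       # strip the label set, if present
            if (brace > 0) metric = substr(metric, 1, brace - 1)
            if (metric == name) { print $NF; exit }
        }
    '
}

# Made-up sample of exporter output, for illustration only.
SAMPLE_METRICS='# TYPE redis_up gauge
redis_up 1
# TYPE redis_connected_clients gauge
redis_connected_clients 12
redis_commands_total{cmd="get"} 42'

printf '%s\n' "$SAMPLE_METRICS" | metric_value redis_up    # prints 1
```

Against the running exporter, the same helper could be fed with curl --silent http://localhost:9121/metrics | metric_value redis_up.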

Prometheus Configuration for Redis Cluster Scraping

In your Prometheus configuration file (e.g., /etc/prometheus/prometheus.yml), add a scrape job for the Redis exporter:

scrape_configs:
  - job_name: 'redis_cluster'
    static_configs:
      - targets: ['localhost:9121'] # Or the IP/hostname of your redis_exporter instance
    metrics_path: '/metrics'

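If you have multiple Redis exporters or want to use service discovery, adjust the static_configs accordingly. After updating the Prometheus configuration, reload it. The HTTP reload endpoint below works only when Prometheus was started with --web.enable-lifecycle; otherwise, send SIGHUP to the Prometheus process.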

curl -X POST http://localhost:9090/-/reload

Alerting on Key Redis Metrics

With Prometheus collecting metrics, you can define alerting rules in a rules file (e.g., rules.yml) that is listed under rule_files in prometheus.yml. The entries below belong in the rules: list of a rule group in that file. Here are some essential alerts for a Redis cluster:

- alert: RedisClusterDown
  expr: |
    up{job="redis_cluster"} == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Redis cluster exporter is down on {{ $labels.instance }}"
    description: "The Prometheus exporter for Redis cluster is not reachable."

- alert: RedisNodeNotPinging
  expr: |
    redis_up{job="redis_cluster"} == 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Redis node {{ $labels.instance }} is not responding to PING"
    description: "The Redis node {{ $labels.instance }} is down or unreachable."

- alert: RedisHighMemoryUsage
  expr: |
    redis_memory_used_bytes{job="redis_cluster"} / redis_total_system_memory_bytes{job="redis_cluster"} * 100 > 85
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: 'Redis node {{ $labels.instance }} has high memory usage ({{ $value | printf "%.2f" }}%)'
    description: 'Redis node {{ $labels.instance }} is using {{ $value | printf "%.2f" }}% of total system memory.'

- alert: RedisHighLatency
  expr: |
    rate(redis_commands_duration_seconds_total{job="redis_cluster",cmd="ping"}[5m])
      / rate(redis_commands_total{job="redis_cluster",cmd="ping"}[5m]) > 0.01
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Redis node {{ $labels.instance }} has high PING latency"
    description: "Redis node {{ $labels.instance }} is experiencing high latency for PING commands."

- alert: RedisClusterStateNotOk
  expr: |
    # Assumes the exporter reports cluster_state as a gauge: 1 for "ok", 0 otherwise.
    redis_cluster_state{job="redis_cluster"} == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Redis cluster state is not OK on {{ $labels.instance }}"
    description: "The Redis cluster state reported via {{ $labels.instance }} is not 'ok'."
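The arithmetic behind a latency alert such as RedisHighLatency is a ratio of two counter increases: the growth in cumulative command time divided by the growth in command count over the same window. A small worked sketch with made-up sample values (the function name avg_latency is illustrative):

```shell
#!/bin/bash
# avg_latency <duration_start> <count_start> <duration_end> <count_end>
# Prints the average seconds per call between two samples of a cumulative
# duration counter and a cumulative call counter.
avg_latency() {
    awk -v d0="$1" -v c0="$2" -v d1="$3" -v c1="$4" 'BEGIN {
        calls = c1 - c0
        if (calls <= 0) { print "no-calls"; exit }   # avoid dividing by zero
        printf "%.4f\n", (d1 - d0) / calls
    }'
}

# 2.5 s of additional command time spread over 200 additional calls:
avg_latency 10 1000 12.5 1200    # prints 0.0125
```

In this example the average of 0.0125 seconds per call exceeds the 0.01 threshold, so the alert condition would hold for that window.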

These alerts, when combined with Prometheus Alertmanager, provide a robust notification system for critical issues affecting your C application and Redis cluster on Linode.
