Server Monitoring Best Practices: Keeping Your C App and Redis Clusters Alive on Linode
Proactive C Application Health Checks with Systemd
For critical C applications running on Linode, robust health monitoring is essential. Relying solely on external probes can delay detection of internal application failures. Integrating health checks into the system's service manager, systemd, gives a more immediate and granular signal. We'll configure systemd to verify the application's health endpoint at startup and to restart the service when it fails.
Assume your C application listens on a specific port (e.g., 8080) and has a health check endpoint (e.g., /health) that returns an HTTP 200 OK status when healthy. If the application crashes or hangs, this endpoint will become unreachable.
Systemd Service Unit Configuration
Create a systemd service file for your application. Let’s assume your application binary is located at /opt/myapp/myapp-server and its configuration is at /etc/myapp/myapp.conf.
Create the service file /etc/systemd/system/myapp.service:
[Unit]
Description=My C Application Server
After=network.target

[Service]
ExecStart=/opt/myapp/myapp-server --config /etc/myapp/myapp.conf
Restart=on-failure
RestartSec=5
User=myappuser
Group=myappgroup
WorkingDirectory=/opt/myapp
# Startup health check
ExecStartPost=/usr/bin/curl --fail --silent --output /dev/null --retry 5 --retry-connrefused --connect-timeout 5 http://localhost:8080/health

[Install]
WantedBy=multi-user.target
Explanation:
Description: A human-readable description of the service.
After=network.target: Orders the service after basic network setup. If the application needs a fully configured network at startup, use After=network-online.target together with Wants=network-online.target instead.
ExecStart: The command to start your C application.
Restart=on-failure: Configures systemd to restart the service if it exits with a non-zero status code, is terminated by a signal, or times out.
RestartSec=5: Waits 5 seconds before attempting a restart.
User/Group: Runs the application as a non-root user for security.
WorkingDirectory: Sets the working directory for the application.
ExecStartPost: This is crucial. It runs once after the main ExecStart process has been started. We use curl to hit the /health endpoint. --fail makes curl return a non-zero exit code if the HTTP status is 400 or higher. --silent suppresses progress meters and error messages. --connect-timeout 5 limits each connection attempt to 5 seconds. --retry 5 --retry-connrefused gives the application a few seconds to open its port before the check is treated as failed. --output /dev/null discards the response body, since we only care about the exit status; shell redirection such as > /dev/null is not available because systemd does not run the command through a shell. If the check fails, systemd treats the start as failed.
Periodic checks: systemd has no built-in directive for recurring HTTP probes, and ExecStartPost is not re-run while the service is up. To detect hangs after startup, either set WatchdogSec= and have the application send WATCHDOG=1 through sd_notify() at regular intervals, or run an external probe from a systemd timer or cron.
[Install]: Defines how the service should be enabled.
After creating the service file, reload systemd, enable, and start your application:
sudo systemctl daemon-reload
sudo systemctl enable myapp.service
sudo systemctl start myapp.service
You can check the status and logs with:
sudo systemctl status myapp.service
sudo journalctl -u myapp.service -f
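For a recurring check that runs independently of the unit file, one option is a small probe script triggered by a systemd timer or cron. The following is a minimal sketch, not a definitive implementation: the HEALTH_URL, SERVICE, ATTEMPTS and DELAY variables, the probe and retry_check helper names, and the --run flag are all assumptions introduced for this example.

```shell
#!/bin/bash
# Sketch of a recurring health probe for myapp.service.
# All variable names, helper names and the --run flag are assumptions
# made for this example; they are not part of systemd or of the application.

HEALTH_URL="${HEALTH_URL:-http://localhost:8080/health}"
SERVICE="${SERVICE:-myapp.service}"
ATTEMPTS="${ATTEMPTS:-3}"   # failed tries tolerated before restarting
DELAY="${DELAY:-2}"         # seconds to wait between tries

# Succeeds when the endpoint answers with an HTTP status below 400 within 5 seconds.
probe() {
    curl --fail --silent --output /dev/null --max-time 5 "$1"
}

# retry_check ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY seconds between tries.
# Returns 0 as soon as one try succeeds, 1 if every try fails.
retry_check() {
    local attempts=$1 delay=$2
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # No point in sleeping after the final failed try.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}

# Only act when explicitly asked to, so the helpers can be sourced and tested.
if [ "${1:-}" = "--run" ]; then
    if ! retry_check "$ATTEMPTS" "$DELAY" probe "$HEALTH_URL"; then
        echo "health check failed $ATTEMPTS times, restarting $SERVICE" >&2
        systemctl restart "$SERVICE"
    fi
fi
```

Requiring several consecutive failures before restarting avoids bouncing the service on a single slow response. The script needs permission to restart the unit, so it would typically run as root.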
Redis Cluster Monitoring with Redis-CLI and Prometheus Exporter
Monitoring a Redis cluster involves tracking node health, memory usage, latency, and command statistics. A common and effective approach is to use the built-in redis-cli for basic checks and integrate with Prometheus using the redis_exporter for comprehensive metrics collection.
Basic Redis Node Health Check Script
We can create a simple shell script to check the status of each node in the Redis cluster. This script can be run periodically by cron or integrated into a more sophisticated monitoring system.
Create a script, e.g., /opt/scripts/check_redis_cluster.sh:
#!/bin/bash
REDIS_HOSTS=("redis-node-1:6379" "redis-node-2:6379" "redis-node-3:6379")
# Redis Cluster needs at least 3 masters; adjust both values to your topology.
EXPECTED_MASTERS=3
EXPECTED_SLAVES=0
CLUSTER_INFO=$(redis-cli -h redis-node-1 -p 6379 cluster info 2>&1)
CLUSTER_NODES=$(redis-cli -h redis-node-1 -p 6379 cluster nodes 2>&1)
# Check cluster_state
if echo "$CLUSTER_INFO" | grep -q "cluster_state:ok"; then
    echo "Redis cluster state is OK."
else
    echo "ERROR: Redis cluster state is NOT OK. Cluster Info:"
    echo "$CLUSTER_INFO"
    exit 1
fi
# Check number of masters and slaves
# The flags column (field 3) of CLUSTER NODES holds the role; skip nodes flagged fail or fail?
MASTERS=$(echo "$CLUSTER_NODES" | awk '$3 ~ /master/ && $3 !~ /fail/ { n++ } END { print n + 0 }')
SLAVES=$(echo "$CLUSTER_NODES" | awk '$3 ~ /slave/ && $3 !~ /fail/ { n++ } END { print n + 0 }')
if [ "$MASTERS" -eq "$EXPECTED_MASTERS" ] && [ "$SLAVES" -eq "$EXPECTED_SLAVES" ]; then
    echo "Redis cluster has $MASTERS masters and $SLAVES slaves. (Expected: $EXPECTED_MASTERS masters, $EXPECTED_SLAVES slaves)"
else
    echo "WARNING: Redis cluster has $MASTERS masters and $SLAVES slaves. (Expected: $EXPECTED_MASTERS masters, $EXPECTED_SLAVES slaves)"
    # Depending on criticality, you might want to exit 1 here.
fi
# Check individual node reachability and role
for NODE in "${REDIS_HOSTS[@]}"; do
    # Split "host:port" with parameter expansion instead of spawning cut
    HOST="${NODE%%:*}"
    PORT="${NODE##*:}"
    echo "Checking node: $HOST:$PORT"
    if redis-cli -h "$HOST" -p "$PORT" ping > /dev/null 2>&1; then
        # ROLE prints several lines; the first one is the role name
        ROLE=$(redis-cli -h "$HOST" -p "$PORT" role | head -n 1)
        echo " Node $HOST:$PORT is PINGable. Role: $ROLE"
    else
        echo " ERROR: Node $HOST:$PORT is NOT PINGable."
        exit 1
    fi
done
exit 0
Make the script executable and add it to cron:
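chmod +x /opt/scripts/check_redis_cluster.sh
# Add to cron: run every 5 minutes. Existing crontab entries are kept;
# piping only the new line into "crontab -" would replace the whole crontab.
# The user owning the crontab must be able to write to /var/log.
(crontab -l 2>/dev/null; echo "*/5 * * * * /opt/scripts/check_redis_cluster.sh >> /var/log/redis_cluster_check.log 2>&1") | crontab -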
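Beyond cluster_state, the CLUSTER INFO output also reports slot coverage through the cluster_slots_assigned, cluster_slots_pfail and cluster_slots_fail fields. As a hedged sketch of how the script could be extended, the helpers below extract a field and check that all 16384 hash slots are assigned with none failing. The function names cluster_info_field and slots_healthy are hypothetical and introduced only for this example.

```shell
#!/bin/bash
# cluster_info_field FIELD
# Reads `redis-cli cluster info` output on stdin and prints the value of FIELD.
# Redis terminates these lines with CRLF, so carriage returns are stripped first.
# Returns 1 when FIELD is not present.
cluster_info_field() {
    local field=$1
    tr -d '\r' | awk -F: -v key="$field" '
        $1 == key { print $2; found = 1 }
        END { exit !found }
    '
}

# slots_healthy INFO_TEXT
# Succeeds only when all 16384 hash slots are assigned and none is in
# pfail (possibly failing) or fail state.
slots_healthy() {
    local info=$1 assigned pfail fail
    assigned=$(printf '%s\n' "$info" | cluster_info_field cluster_slots_assigned) || return 1
    pfail=$(printf '%s\n' "$info" | cluster_info_field cluster_slots_pfail) || return 1
    fail=$(printf '%s\n' "$info" | cluster_info_field cluster_slots_fail) || return 1
    [ "$assigned" -eq 16384 ] && [ "$pfail" -eq 0 ] && [ "$fail" -eq 0 ]
}
```

In the check script this could be called as slots_healthy "$CLUSTER_INFO", exiting with an error when it returns non-zero.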
Integrating Redis Exporter with Prometheus
For more detailed metrics and integration with a centralized monitoring system like Prometheus, deploying the redis_exporter is highly recommended. This exporter runs as a separate service and exposes Redis metrics in a Prometheus-compatible format.
Installation:
Download a release from the redis_exporter releases page. For example, for v1.47.0 on a 64-bit x86 Linux system:
wget https://github.com/oliver006/redis_exporter/releases/download/v1.47.0/redis_exporter-v1.47.0.linux-amd64.tar.gz
tar xvfz redis_exporter-v1.47.0.linux-amd64.tar.gz
# The archive unpacks into a versioned directory that contains the binary
sudo mv redis_exporter-v1.47.0.linux-amd64/redis_exporter /usr/local/bin/
Systemd Service for Redis Exporter:
Create a systemd service file /etc/systemd/system/redis_exporter.service. Each exporter process connects to the single Redis address given by --redis.addr, so for simplicity this example scrapes one node. To cover every node in the cluster, run one exporter per node or use the exporter's /scrape?target= endpoint together with Prometheus relabeling.
[Unit]
Description=Redis Exporter
After=network.target

[Service]
User=redis_exporter
Group=redis_exporter
ExecStart=/usr/local/bin/redis_exporter \
    --redis.addr=redis://redis-node-1:6379 \
    --include-system-metrics \
    --namespace=redis
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
Explanation:
--redis.addr: Specifies the Redis instance to connect to.
--include-system-metrics: Exports host-level metrics such as redis_total_system_memory_bytes, which the memory usage alert in this guide relies on.
--namespace: The prefix for Prometheus metric names. The default, redis, produces names such as redis_up and redis_memory_used_bytes, which is what the alert rules in this guide expect.
Keyspace statistics such as redis_db_keys are collected by default and need no extra flag.
Create a user for the exporter, set up permissions, reload systemd, and start the service:
sudo useradd --system --no-create-home redis_exporter
sudo systemctl daemon-reload
sudo systemctl enable redis_exporter.service
sudo systemctl start redis_exporter.service
The exporter will now be available at http://localhost:9121/metrics. Configure your Prometheus server to scrape this endpoint.
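Before pointing Prometheus at the exporter, it can be worth confirming that the exporter actually reaches Redis, which it reports through the redis_up metric (1 when the connection works). The helper below is a minimal sketch for reading a single value out of the Prometheus text format; the name metric_value is hypothetical, and the parsing assumes samples carry no trailing timestamp.

```shell
#!/bin/bash
# metric_value NAME
# Reads Prometheus text-format metrics on stdin and prints the value of the
# first sample whose metric name is exactly NAME (labels, if any, are ignored).
# Assumes samples have no trailing timestamp, so the value is the last field.
# Returns 1 when no such sample exists.
metric_value() {
    local name=$1
    awk -v name="$name" '
        /^#/ { next }                      # skip HELP and TYPE comment lines
        {
            metric = $1
            sub(/[{].*/, "", metric)       # drop the {label="..."} part, if present
            if (metric == name) { print $NF; found = 1; exit }
        }
        END { exit !found }
    '
}
```

For example, curl --silent http://localhost:9121/metrics | metric_value redis_up should print 1 on a working setup.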
Prometheus Configuration for Redis Cluster Scraping
In your Prometheus configuration file (e.g., /etc/prometheus/prometheus.yml), add a scrape job for the Redis exporter:
scrape_configs:
  - job_name: 'redis_cluster'
    static_configs:
      - targets: ['localhost:9121'] # Or the IP/hostname of your redis_exporter instance
    metrics_path: '/metrics'
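If you have multiple Redis exporters or want to use service discovery, adjust the static_configs accordingly. After updating the Prometheus configuration, reload it. The HTTP reload endpoint only works when Prometheus was started with --web.enable-lifecycle; otherwise send SIGHUP to the Prometheus process or restart the service.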
curl -X POST http://localhost:9090/-/reload
Alerting on Key Redis Metrics
With Prometheus collecting metrics, you can define alerting rules in a rules file (e.g., rules.yml) that is listed under rule_files in prometheus.yml. Here are some essential alerts for a Redis cluster:
groups:
  - name: redis_cluster
    rules:
      - alert: RedisClusterDown
        expr: |
          up{job="redis_cluster"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Redis cluster exporter is down on {{ $labels.instance }}"
          description: "The Prometheus exporter for Redis cluster is not reachable."
      - alert: RedisNodeNotPinging
        expr: |
          redis_up{job="redis_cluster"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Redis node {{ $labels.instance }} is not responding to PING"
          description: "The Redis node {{ $labels.instance }} is down or unreachable."
      - alert: RedisHighMemoryUsage
        # redis_total_system_memory_bytes requires the exporter flag --include-system-metrics
        expr: |
          redis_memory_used_bytes{job="redis_cluster"} / redis_total_system_memory_bytes{job="redis_cluster"} * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'Redis node {{ $labels.instance }} has high memory usage ({{ printf "%.2f" $value }}%)'
          description: 'Redis node {{ $labels.instance }} is using {{ printf "%.2f" $value }}% of total system memory.'
      - alert: RedisHighLatency
        # Average server-side execution time per PING over 5 minutes, threshold 10 ms.
        # Metric and label names follow redis_exporter v1.x; verify them against your /metrics output.
        expr: |
          rate(redis_commands_duration_seconds_total{job="redis_cluster",cmd="ping"}[5m])
            / rate(redis_commands_total{job="redis_cluster",cmd="ping"}[5m]) > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis node {{ $labels.instance }} has high PING latency"
          description: "Redis node {{ $labels.instance }} is experiencing high latency for PING commands."
      - alert: RedisClusterStateNotOk
        # The exporter reports cluster_state as a number: 1 for ok, 0 otherwise.
        expr: |
          redis_cluster_state{job="redis_cluster"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Redis cluster state is not OK on {{ $labels.instance }}"
          description: "The Redis node {{ $labels.instance }} reports a cluster_state other than 'ok'."
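To sanity-check the 85% threshold of RedisHighMemoryUsage against live data, the same ratio can be computed by hand from the exporter output. This is only an illustrative sketch: the function name memory_usage_percent is hypothetical, and it assumes both metrics appear without a trailing timestamp.

```shell
#!/bin/bash
# memory_usage_percent
# Reads Prometheus text-format metrics on stdin and prints
# redis_memory_used_bytes / redis_total_system_memory_bytes * 100
# with two decimals. Returns 1 if either metric is missing or the total is zero.
memory_usage_percent() {
    awk '
        $1 ~ /^redis_memory_used_bytes([{]|$)/         { used = $NF }
        $1 ~ /^redis_total_system_memory_bytes([{]|$)/ { total = $NF }
        END {
            if (used == "" || total == "" || total + 0 == 0) exit 1
            printf "%.2f\n", used / total * 100
        }
    '
}
```

Running curl --silent http://localhost:9121/metrics | memory_usage_percent prints the percentage the alert expression compares against 85.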
These alerts, when combined with Prometheus Alertmanager, provide a robust notification system for critical issues affecting your C application and Redis cluster on Linode.