Server Monitoring Best Practices: Keeping Your Python App and MySQL Clusters Alive on OVH

Proactive Health Checks for Python Applications

Maintaining the health of Python applications, especially those deployed on cloud infrastructure like OVH, requires a multi-layered monitoring approach. Beyond basic uptime checks, we need to delve into application-specific metrics and error detection. A common pattern is to expose a dedicated health check endpoint within the application itself.

For a Flask application, this might look like:

from flask import Flask, jsonify
import logging

app = Flask(__name__)

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def check_database_connection():
    # Placeholder: this stub always reports healthy until a real check is wired in.
    # In a real scenario, this would attempt a quick query or connection.
    try:
        # Example: Using SQLAlchemy (replace with your actual ORM/DB library).
        # SQLAlchemy 2.x requires raw SQL strings to be wrapped in text().
        # from sqlalchemy import text
        # from your_app.database import db_session
        # db_session.execute(text("SELECT 1"))
        return True, "Database connection is healthy."
    except Exception as e:
        logging.error(f"Database connection check failed: {e}")
        return False, f"Database connection error: {str(e)}"

@app.route('/healthz')
def health_check():
    db_healthy, db_message = check_database_connection()

    # Add other checks as needed: cache, external services, etc.
    # For example, checking a Redis connection:
    # redis_healthy, redis_message = check_redis_connection()

    if db_healthy: # and redis_healthy:
        return jsonify({
            "status": "ok",
            "message": "Application is healthy.",
            "database": {"status": "ok", "message": db_message}
            # "redis": {"status": "ok", "message": redis_message}
        }), 200
    else:
        return jsonify({
            "status": "error",
            "message": "Application is unhealthy.",
            "database": {"status": "error", "message": db_message}
            # "redis": {"status": "error", "message": redis_message}
        }), 503 # Service Unavailable

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

This endpoint should be polled by an external monitoring system (e.g., Prometheus via the blackbox exporter, Nagios, or even a simple cron job with `curl`). The response code (200 for healthy, 503 for unhealthy) lets a probe decide pass or fail without parsing the body, while the JSON payload identifies which dependency failed.
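As a rough illustration of the simplest polling option, the sketch below fetches the endpoint with the standard library and treats anything other than a 2xx response with `"status": "ok"` as unhealthy. The function names `evaluate_health` and `poll` are made up for this example, and the URL in the usage note is an assumption about where the app listens.

```python
import json
import urllib.error
import urllib.request


def evaluate_health(status_code, body):
    """Return (healthy, detail) from an HTTP status code and raw response body."""
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    detail = payload.get("message", "no message in response")
    healthy = 200 <= status_code < 300 and payload.get("status") == "ok"
    return healthy, detail


def poll(url, timeout=3.0):
    """Fetch the health endpoint once; network failures count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return evaluate_health(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses, but the JSON body is still readable.
        return evaluate_health(err.code, err.read().decode("utf-8"))
    except OSError as err:
        # Covers URLError, refused connections, and timeouts.
        return False, f"request failed: {err}"
```

Calling `poll("http://127.0.0.1:5000/healthz")` from a cron-driven script and exiting non-zero when the first element is `False` is usually enough for a basic probe. A dedicated monitoring system remains the better choice once retries and alert routing matter.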

Leveraging Prometheus for Application and Infrastructure Metrics

Prometheus is a de facto standard for metrics-based monitoring. For Python applications, the prometheus_client library is invaluable. We can instrument our application to expose custom metrics.

First, install the library:

pip install prometheus_client

Then, integrate it into your Flask app:

from flask import Flask, Response
from prometheus_client import Counter, Histogram, generate_latest, CONTENT_TYPE_LATEST
import time
import random

app = Flask(__name__)

# Custom metrics
REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP Requests', ['method', 'endpoint'])
# A Histogram exports the *_bucket series that histogram_quantile() needs for percentile alerts.
REQUEST_LATENCY = Histogram('http_request_duration_seconds', 'HTTP Request Latency', ['endpoint'])

@app.route('/metrics')
def metrics():
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)

@app.route('/')
@REQUEST_LATENCY.labels(endpoint='/').time()
def index():
    REQUEST_COUNT.labels(method='GET', endpoint='/').inc()
    time.sleep(random.uniform(0.1, 0.5)) # Simulate work
    return "Hello, World!"

@app.route('/api/data')
@REQUEST_LATENCY.labels(endpoint='/api/data').time()
def get_data():
    REQUEST_COUNT.labels(method='GET', endpoint='/api/data').inc()
    time.sleep(random.uniform(0.2, 1.0)) # Simulate work
    return {"data": "some_value"}

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

This exposes a /metrics endpoint that Prometheus can scrape. We’re tracking request counts and latency. Beyond Counter, the library offers other metric types:

Gauge for values that can go up or down (e.g., current number of active connections).
Histogram for distributions of values (e.g., response times, useful for calculating percentiles).
Summary for tracking the count and sum of observations. Note that the Python client's Summary does not export quantiles, so use a Histogram when you need percentiles.
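To make the Histogram-to-percentile link concrete, here is a minimal sketch of the kind of estimate Prometheus's `histogram_quantile()` performs: it finds the bucket containing the requested rank and interpolates linearly inside it. The function name `estimate_quantile` and the bucket counts are invented for illustration, and edge-case handling is simplified compared with the real implementation.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, ending with an upper bound of math.inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper) or count == prev_count:
                # Beyond the last finite bucket (or an empty bucket):
                # fall back to the previous bucket's upper bound.
                return prev_bound
            # Linear interpolation between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound


# Made-up cumulative counts: 10 requests took <= 0.1s, 60 took <= 0.5s, and so on.
buckets = [(0.1, 10), (0.5, 60), (1.0, 90), (2.5, 100), (math.inf, 100)]
p95 = estimate_quantile(0.95, buckets)  # falls inside the (1.0, 2.5] bucket
```

Because the estimate is interpolated, its accuracy depends on the bucket boundaries. It is worth placing a boundary close to any latency threshold you intend to alert on.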

Monitoring MySQL Clusters with Percona Monitoring and Management (PMM)

For MySQL clusters, especially in a production environment, a robust solution like Percona Monitoring and Management (PMM) is highly recommended. PMM provides deep insights into MySQL performance, availability, and query analysis.

Deployment typically involves:

Deploying the PMM Server (a Docker container or VM).
Installing the PMM Client on each MySQL node.

The PMM Client uses the mysqld_exporter to collect metrics from MySQL instances. Configuration involves ensuring the exporter can connect to your MySQL instances with appropriate credentials.

On your MySQL server (e.g., running on OVH’s dedicated servers or VPS), you’ll need to create a dedicated monitoring user:

CREATE USER 'pmm_monitor'@'localhost' IDENTIFIED BY 'your_strong_password';
GRANT PROCESS, REPLICATION CLIENT, SHOW DATABASES, SELECT ON *.* TO 'pmm_monitor'@'localhost';
FLUSH PRIVILEGES;

Then, configure the mysqld_exporter. This is often done via environment variables when running it as a service or within Docker.

# Example using Docker Compose for PMM Client
version: '3'

services:
  pmm-client:
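    image: percona/pmm-client:2 # Official PMM 2 client image; pin an exact version in production
    container_name: pmm-client
    environment:
      # pmm-agent reads its settings from PMM_AGENT_* variables
      - PMM_AGENT_SERVER_ADDRESS=your_pmm_server_ip_or_hostname:443
      - PMM_AGENT_SERVER_USERNAME=admin
      - PMM_AGENT_SERVER_PASSWORD=your_pmm_server_password
      - PMM_AGENT_SETUP=1 # Register this node with the server on first start
      - PMM_AGENT_CONFIG_FILE=config/pmm-agent.yaml
      # - PMM_AGENT_SERVER_INSECURE_TLS=1 # Only if the server uses a self-signed certificate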
    volumes:
      - /usr/local/percona/pmm2/collectors:/usr/local/percona/pmm2/collectors
      - /usr/local/percona/pmm2/exporters:/usr/local/percona/pmm2/exporters
      - /usr/local/percona/pmm2/config:/usr/local/percona/pmm2/config
      - /var/lib/mysql:/var/lib/mysql # For socket file access if needed
    network_mode: host # Or configure specific network
    restart: always

# Add your MySQL service here if running in the same compose file
# For existing MySQL instances, you'll run the pmm-client container separately
# and point it to your MySQL host/port.

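Once the PMM client is running and registered, node-level metrics are collected automatically. Each MySQL instance still has to be added as a service with `pmm-admin add mysql`, using the monitoring user created above. You can then access the PMM web UI to visualize these metrics, set up alerts, and analyze query performance.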

OVH Specific Considerations: Network and Firewall

When deploying monitoring agents or exposing metrics endpoints on OVH infrastructure, firewall rules are critical. Ensure that:

Your Prometheus server can reach the /metrics endpoints of your Python applications (typically port 5000 or 8000).
Your PMM clients can reach the PMM server (port 443 by default, used for registration and, in PMM 2's default push mode, for sending collected data).
Your PMM client can reach your MySQL cluster nodes on the MySQL port (e.g., 3306).
Your application health check endpoints (e.g., /healthz) are accessible by your monitoring probes.

OVH’s control panel provides tools to manage security groups and firewall rules. For instance, to allow Prometheus (running on a specific IP) to scrape your application on port 5000:

# Example using OVH API or control panel to add a rule:
# Allow TCP traffic on port 5000 from Prometheus server IP (e.g., 1.2.3.4)
# to your application server IP.

Similarly, for PMM, ensure the necessary ports are open between the PMM server, PMM clients, and your MySQL instances.
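After changing firewall rules, it helps to verify reachability from the machine that will actually initiate the connection. The following is a small sketch using only the standard library. The helper names are invented for this example, and the hosts and ports in the usage note are placeholders.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections (port closed), timeouts (often a
        # firewall silently dropping packets), and DNS failures.
        return False


def check_targets(targets, timeout=3.0):
    """Map each (label, host, port) target to its reachability."""
    return {
        label: port_is_reachable(host, port, timeout)
        for label, host, port in targets
    }
```

Run from the Prometheus host, something like `check_targets([("app metrics", "app-server-ip", 5000)])` shows whether the scrape port is open. Run from a PMM client, the same call with the PMM server on 443 and the MySQL node on 3306 covers the database side. A successful TCP connect only proves the port is open, not that authentication or TLS will succeed.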

Alerting Strategies: From Prometheus Alertmanager to PMM Alerts

Effective monitoring is incomplete without a robust alerting strategy. Prometheus integrates with Alertmanager for sophisticated alert routing and deduplication.

Define alerting rules in Prometheus configuration files (e.g., rules.yml):

groups:
- name: python_app_alerts
  rules:
  - alert: HighRequestLatency
    expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{endpoint="/api/data"}[5m])) by (le, endpoint)) > 2.0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High latency detected for /api/data endpoint"
      description: "95th percentile latency for /api/data is {{ $value }}s, exceeding threshold."

  - alert: AppUnhealthy
    expr: probe_success{job="my_python_app_healthcheck"} == 0
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "Python application health check failed"
      description: "The /healthz endpoint for my_python_app is returning an error."

- name: mysql_alerts
  rules:
  - alert: MySQLHighCPU
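    # CPU is a host-level metric from node_exporter; adjust the instance matcher to your node names.
    expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle", instance=~"mysql-node-.*"}[5m])) > 0.8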
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on MySQL instance"
      description: "MySQL instance {{ $labels.instance }} is experiencing high CPU usage ({{ $value }})."

  - alert: MySQLReplicationLag
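    # Lag is reported by replicas, not by the primary; mysqld_exporter derives it from the replica status.
    expr: mysql_slave_status_seconds_behind_master{instance=~"mysql-node-.*"} > 60
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "MySQL replication lag detected"
      description: "MySQL replica {{ $labels.instance }} is {{ $value }} seconds behind its source."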

Configure Alertmanager to route these alerts to Slack, PagerDuty, or email. PMM also has its own alerting capabilities, often integrated with Alertmanager or providing direct notification channels.
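If none of the built-in receivers fits, Alertmanager can also POST alerts as JSON to a webhook. The sketch below shows one way to turn such a payload into readable lines for a chat message or ticket. It assumes the standard webhook fields (`alerts`, `labels`, `annotations`, `status`); the function name and the sample payload are made up for illustration.

```python
def summarize_alerts(payload):
    """Turn an Alertmanager webhook payload (dict) into one text line per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append(
            "[{status}] {severity} {name}: {summary}".format(
                status=alert.get("status", "unknown").upper(),
                severity=labels.get("severity", "none"),
                name=labels.get("alertname", "unnamed"),
                summary=annotations.get("summary", ""),
            )
        )
    return lines


# Hypothetical payload shaped like the AppUnhealthy rule defined earlier.
sample_payload = {
    "version": "4",
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "AppUnhealthy", "severity": "critical"},
            "annotations": {"summary": "Python application health check failed"},
        }
    ],
}
messages = summarize_alerts(sample_payload)
```

Keeping the formatting in a pure function like this makes it easy to test separately from whatever HTTP framework receives the webhook.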

For MySQL, PMM’s query analytics is crucial for identifying slow queries that might be impacting performance. Setting up alerts for specific slow query patterns or high `Threads_running` can prevent cascading failures.

Log Aggregation and Analysis

Metrics tell you *what* is happening, but logs tell you *why*. Centralized log aggregation is essential for debugging issues across distributed systems.

Tools like Elasticsearch, Logstash, and Kibana (ELK stack), or Grafana Loki, are commonly used. On OVH, you might deploy these yourself or use managed services.

For Python applications, ensure your logging is configured to output in a structured format (e.g., JSON). This makes parsing and searching much easier.

import logging
import json

# Configure JSON logging
class JsonFormatter(logging.Formatter):
    def format(self, record):
        log_entry = {
            "timestamp": self.formatTime(record, self.datefmt),
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
            "pathname": record.pathname,
            "lineno": record.lineno,
        }
        # Add exception info if present
        if record.exc_info:
            log_entry['exc_info'] = self.formatException(record.exc_info)
        return json.dumps(log_entry)

logger = logging.getLogger('my_app')
logger.setLevel(logging.INFO)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

logger.info("Application started successfully.")
try:
    result = 1 / 0
except ZeroDivisionError:
    logger.exception("An error occurred during calculation.")

Configure Logstash or Fluentd agents on your OVH servers to collect these logs and forward them to your central logging cluster. For MySQL, ensure the error log and slow query log are being collected; the general query log records every statement and is usually too costly to leave enabled in production.
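To see why structured output pays off, consider what a collector or an ad hoc script has to do to find errors. With one JSON object per line, filtering by severity needs only a few lines and no regular expressions. This is a sketch: `filter_logs` is an invented helper, and the sample lines are fabricated to match the fields produced by the formatter above.

```python
import json

LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def filter_logs(lines, min_level="ERROR"):
    """Return parsed log entries whose level is at or above min_level."""
    threshold = LEVELS[min_level]
    matches = []
    for line in lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # Skip lines that are not JSON, such as stray tracebacks.
        if isinstance(entry, dict) and LEVELS.get(entry.get("level"), 0) >= threshold:
            matches.append(entry)
    return matches


# Fabricated sample lines for illustration only.
sample_lines = [
    '{"level": "INFO", "message": "Application started successfully.", "logger": "my_app"}',
    '{"level": "ERROR", "message": "An error occurred during calculation.", "logger": "my_app"}',
    "Traceback (most recent call last):",
]
errors = filter_logs(sample_lines)
```

Log platforms such as Elasticsearch or Loki apply the same idea at scale by indexing or filtering on fields like `level` and `logger`.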

By combining application-level health checks, Prometheus metrics, PMM for database insights, robust alerting, and centralized logging, you build a resilient monitoring strategy capable of keeping your Python applications and MySQL clusters healthy and performant on OVH infrastructure.
