The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and Elasticsearch on OVH for Python

Nginx as a High-Performance Frontend for Python Applications

When deploying Python web applications, particularly those using WSGI servers like Gunicorn, Nginx serves as an indispensable frontend. Its primary roles are efficient static file serving, SSL termination, load balancing, and acting as a reverse proxy to your application server. Optimizing Nginx is crucial for handling high traffic volumes and ensuring low latency.

Core Nginx Configuration Tuning

The main configuration file is typically /etc/nginx/nginx.conf; it usually includes additional files from /etc/nginx/conf.d/ or /etc/nginx/sites-enabled/. We'll focus on key directives in the events and http blocks and on per-site server configurations.

Worker Processes and Connections

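The number of worker processes should ideally match the number of CPU cores available on your server; worker_processes auto does this for you. This allows Nginx to use all available processing power for handling requests. The worker_connections directive defines the maximum number of simultaneous connections that each worker process can handle. That count includes connections to upstream servers, not only client connections, and it cannot usefully exceed the per-process open-file limit (see worker_rlimit_nofile and ulimit -n). A common starting point is 1024 or higher, depending on your expected load and system limits.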

Example `nginx.conf` Snippet

# /etc/nginx/nginx.conf

user www-data;
worker_processes auto; # Or set to the number of CPU cores, e.g., worker_processes 4;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 4096; # Adjust based on system limits and expected load
    multi_accept on;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # ... other http configurations ...
}
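
As a rough illustration of what these two settings imply, the sketch below estimates the upper bound on simultaneous client connections. It assumes every request is proxied, so each client connection is paired with one upstream connection; the function name is hypothetical and the result is a ceiling, not a measured capacity.

```python
def max_client_connections(worker_processes: int,
                           worker_connections: int,
                           proxied: bool = True) -> int:
    """Estimate the upper bound on simultaneous client connections.

    worker_connections counts every connection a worker holds, including
    those to upstream servers. A proxied request therefore uses two slots
    (client + upstream), which halves the client-facing budget.
    """
    total = worker_processes * worker_connections
    return total // 2 if proxied else total


if __name__ == "__main__":
    # 4 workers (one per core) with worker_connections 4096, as in the snippet
    print(max_client_connections(4, 4096))         # 8192
    print(max_client_connections(4, 4096, False))  # 16384
```

The open-file limit of the Nginx processes still applies on top of this estimate, so raise it accordingly if you increase worker_connections.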

Keep-Alive and Buffering

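keepalive_timeout controls how long an idle keep-alive connection stays open; keeping connections open spares returning clients the cost of a new TCP (and TLS) handshake. Buffering directives control how Nginx handles request and response bodies. By default Nginx reads the entire request body before passing it upstream, which shields application workers from slow clients. Disabling this (proxy_request_buffering off) streams uploads straight to the application and avoids buffering large bodies in memory or temporary files on the Nginx host, but it is only advisable if your application server copes well with slow uploads; Gunicorn's sync workers generally do not. Response buffering (proxy_buffering on) lets an application worker hand off its response quickly and move on while Nginx delivers it at the client's pace.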

Example Server Block for Python App

# /etc/nginx/sites-available/your_python_app.conf

server {
    listen 80;
    server_name your_domain.com www.your_domain.com;
    charset utf-8;

    # Static files
    location /static/ {
        alias /path/to/your/app/static/;
        expires 30d;
        access_log off;
        add_header Cache-Control "public";
    }

    # Media files (if applicable)
    location /media/ {
        alias /path/to/your/app/media/;
        expires 30d;
        access_log off;
        add_header Cache-Control "public";
    }

    # Proxy to Gunicorn/FPM
    location / {
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # For Gunicorn (Unix socket or TCP)
        # proxy_pass http://unix:/path/to/your/app.sock;
        proxy_pass http://127.0.0.1:8000; # Example for Gunicorn on TCP

        # For PHP-FPM
        # include snippets/fastcgi-php.conf;
        # fastcgi_pass unix:/var/run/php/php7.4-fpm.sock;

        proxy_connect_timeout 75s;
        proxy_read_timeout 300s; # Adjust based on expected request times
        proxy_send_timeout 300s;

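        # Request body handling
        client_max_body_size 100M; # Set a reasonable limit
        client_body_buffer_size 128k; # Larger bodies spill to a temporary file
        # proxy_request_buffering off; # Uncomment only if the app server copes with slow uploads
        # Note: client_header_buffer_size and large_client_header_buffers are valid
        # only in the http or server context, so set them there if you need them.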

        # Response buffering and compression (optional but recommended)
        proxy_buffering on;
        proxy_buffer_size 128k;
        proxy_buffers 8 128k;
        proxy_busy_buffers_size 256k;

        gzip on;
        gzip_vary on;
        gzip_proxied any;
        gzip_comp_level 6;
        gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
    }

    # Optional: Error pages
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
        root /usr/share/nginx/html;
    }
}

Gunicorn Tuning for Python WSGI Applications

Gunicorn (Green Unicorn) is a popular WSGI HTTP Server for Python. Its performance is heavily influenced by the number of worker processes, worker types, and connection handling. On OVH, where resources can be provisioned with varying CPU and RAM, tuning is essential.

Worker Processes and Types

The number of worker processes is a critical tuning parameter. A common recommendation is to set it to (2 * number_of_cores) + 1. This formula aims to keep CPU cores busy while accounting for I/O waits. Gunicorn offers several worker types:

Sync Workers (default): Simple, but can block under heavy load.
Eventlet/Gevent Workers: Asynchronous, non-blocking I/O. Excellent for I/O-bound applications. Requires installing eventlet or gevent.
Gthread Workers: Each worker process runs a pool of threads (set with --threads). A simpler alternative to async workers when the application needs moderate concurrency but its libraries are not gevent/eventlet-compatible.

For Python web applications that spend most of their time waiting on external API calls or database queries, gevent or eventlet workers are a strong option because they can handle many concurrent connections without blocking. This only holds if the libraries you use cooperate with their patched I/O; CPU-bound code still blocks the whole worker.

Tuning Worker Connections and Timeouts

--worker-connections (for async workers like gevent/eventlet) specifies the maximum number of clients that can be served simultaneously by a single worker process. The default is 1000. --timeout (default 30 seconds) is a worker liveness limit: a worker that stays silent for longer is killed and restarted. With sync workers this effectively caps how long a single request may run, so set it higher than your longest expected request but not so high that hung workers hold resources indefinitely. With async workers it only checks that the worker process is still responsive and is not tied to the duration of an individual request.

Gunicorn Command Line Example

# Example using gevent workers, 4 CPU cores, and a socket file
# Assuming your WSGI app is in 'my_app.wsgi:application'

gunicorn --workers 9 \
         --worker-class gevent \
         --worker-connections 2000 \
         --bind unix:/path/to/your/app.sock \
         --timeout 120 \
         --graceful-timeout 120 \
         --log-level info \
         --access-logfile /var/log/gunicorn/access.log \
         --error-logfile /var/log/gunicorn/error.log \
         my_app.wsgi:application

Note: If binding to a TCP port (e.g., --bind 127.0.0.1:8000), ensure that Nginx is configured to proxy to this address. Using a Unix socket is generally preferred for performance when Nginx and Gunicorn are on the same machine, as it avoids the overhead of TCP/IP.
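
The same settings can also live in a Gunicorn configuration file, which is itself plain Python. The sketch below mirrors the command line above and derives the worker count from the (2 * cores) + 1 rule instead of hard-coding it; the file name gunicorn.conf.py and the paths are placeholders carried over from the example.

```python
# gunicorn.conf.py: a sketch mirroring the command-line example
import multiprocessing

# (2 * cores) + 1, the rule of thumb described earlier
workers = multiprocessing.cpu_count() * 2 + 1

worker_class = "gevent"        # requires the gevent package to be installed
worker_connections = 2000      # per-worker limit, used by async workers

bind = "unix:/path/to/your/app.sock"

timeout = 120                  # seconds a worker may stay silent
graceful_timeout = 120         # seconds to finish requests on restart

loglevel = "info"
accesslog = "/var/log/gunicorn/access.log"
errorlog = "/var/log/gunicorn/error.log"
```

It would be loaded with gunicorn -c gunicorn.conf.py my_app.wsgi:application. Options passed on the command line still override values from the file.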

PHP-FPM Tuning for PHP Applications

If your application is PHP-based, PHP-FPM (FastCGI Process Manager) is the standard way to interface PHP with web servers like Nginx. Tuning PHP-FPM is crucial for managing resource consumption and request throughput.

Process Management Modes

PHP-FPM offers three primary process management `pm` settings:

static: A fixed number of child processes are spawned when the FPM master process starts. This offers the most predictable performance but can be inefficient if traffic fluctuates wildly.
dynamic: FPM starts pm.start_servers processes and spawns more as needed, up to pm.max_children. It keeps the number of idle processes between pm.min_spare_servers and pm.max_spare_servers, killing surplus idle ones. This is a good balance for fluctuating traffic.
ondemand: Processes are spawned only when a request is received and are killed after pm.process_idle_timeout of inactivity. This saves resources but can introduce a slight delay on the first request after a period of inactivity.

For most production environments, dynamic is the recommended setting. You’ll need to tune pm.max_children, pm.start_servers, pm.min_spare_servers, and pm.max_spare_servers.

Tuning Key PHP-FPM Directives

These directives are typically found in the PHP-FPM pool configuration file, often located at /etc/php/X.Y/fpm/pool.d/www.conf (replace X.Y with your PHP version).

Example PHP-FPM Pool Configuration

; /etc/php/7.4/fpm/pool.d/www.conf

[www]
user = www-data
group = www-data
listen = /var/run/php/php7.4-fpm.sock ; Or a TCP port like 127.0.0.1:9000
listen.owner = www-data
listen.group = www-data
listen.mode = 0660

pm = dynamic
pm.max_children = 100       ; Max number of children that can be alive at the same time.
pm.start_servers = 5        ; Number of children to start when the process manager is started.
pm.min_spare_servers = 5    ; Number of required "spare" processes. The process manager will fork if there are less than this.
pm.max_spare_servers = 15   ; Number of "spare" processes. The process manager will kill if there are more than this.

pm.process_idle_timeout = 10s ; Idle time after which a process is killed. Used only when pm = ondemand.
pm.max_requests = 500       ; The number of requests each child process should execute before respawning. Limits the impact of memory leaks.

request_terminate_timeout = 120s ; Hard limit on request execution time. Keep it in line with Nginx's fastcgi_read_timeout.
request_slowlog_timeout = 30s    ; Log a backtrace for requests running longer than this. Requires slowlog to be set.
slowlog = /var/log/php/php-fpm-slowlog.log ; Must be set when request_slowlog_timeout is non-zero; the directory must exist

catch_workers_output = yes
; rlimit_files = 1024 ; Adjust based on system limits

Tuning Strategy: Start with conservative values for pm.max_children and gradually increase them while monitoring server CPU and memory usage. A common approach is to set pm.max_children to a value that, when multiplied by the average memory footprint of a PHP-FPM worker process, does not exceed your available RAM. pm.start_servers, pm.min_spare_servers, and pm.max_spare_servers should be tuned to ensure a quick response to traffic spikes without creating excessive idle processes.
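
The memory-based rule for pm.max_children can be written down as a small calculation. This is a sketch under stated assumptions: the function name is hypothetical, and both the reserved memory (OS, Nginx, databases, caches) and the average worker footprint are figures you have to measure on your own server.

```python
def estimate_max_children(total_ram_mb: int,
                          reserved_mb: int,
                          avg_worker_mb: int) -> int:
    """Upper bound for pm.max_children based on available memory.

    total_ram_mb   -- physical RAM on the server
    reserved_mb    -- memory kept for the OS and other services
    avg_worker_mb  -- measured average resident size of one PHP-FPM worker
    """
    if avg_worker_mb <= 0:
        raise ValueError("avg_worker_mb must be positive")
    available_mb = total_ram_mb - reserved_mb
    # Always allow at least one worker, even on a very small machine.
    return max(1, available_mb // avg_worker_mb)


if __name__ == "__main__":
    # Illustrative numbers only: 8 GB server, 2 GB reserved, 60 MB per worker
    print(estimate_max_children(8192, 2048, 60))  # 102
```

Treat the result as a ceiling and start below it, since worker memory use varies from request to request.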

Elasticsearch Performance Tuning on OVH

Elasticsearch, while not directly part of the web serving stack, is often a critical component for search and logging. Optimizing its performance on OVH infrastructure involves JVM tuning, disk I/O considerations, and cluster configuration.

JVM Heap Size

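The Java Virtual Machine (JVM) heap size is paramount. Elasticsearch uses the heap for indexing buffers, caches, and query execution, while Lucene relies on the operating system's filesystem cache for fast access to index files, so the memory left outside the heap matters too. The usual recommendation is to give the heap no more than 50% of your system's RAM and to stay below roughly 30-32GB. Above that threshold the JVM can no longer use compressed ordinary object pointers (compressed oops), so object references take more space and the extra heap buys less than expected. The heap size is configured in jvm.options, typically found in /etc/elasticsearch/jvm.options; recent versions prefer custom settings in a file under /etc/elasticsearch/jvm.options.d/.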

Example `jvm.options` Snippet

# /etc/elasticsearch/jvm.options

# Xms represents the initial size of the heap, and Xmx represents the maximum size.
# Set both to the same value to avoid heap resizing.
# Example for a server with 64GB RAM:
-Xms30g
-Xmx30g

# Other JVM settings can be tuned, but heap size is the most critical.
# Swapping hurts Elasticsearch badly. The lines below are shell commands to run
# on the host, not JVM options: either disable swap entirely
# swapoff -a
# or minimize swappiness:
# sysctl vm.swappiness=1
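
The heap sizing rule above (half of RAM, capped below the compressed-oops threshold) can be expressed as a short helper. This is only a sketch: the function name is hypothetical, and the 30 GB cap is an assumption taken from the 64GB example, not an exact JVM limit.

```python
def recommended_heap_gb(system_ram_gb: int, cap_gb: int = 30) -> int:
    """Half of system RAM, capped to stay within compressed-oops range."""
    return min(system_ram_gb // 2, cap_gb)


if __name__ == "__main__":
    for ram in (16, 64, 128):
        heap = recommended_heap_gb(ram)
        # Xms and Xmx are set to the same value to avoid heap resizing
        print(f"{ram} GB RAM: -Xms{heap}g -Xmx{heap}g")
```

For a 64 GB server this yields the -Xms30g / -Xmx30g pair used in the snippet, leaving the rest of the RAM to the filesystem cache.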

Disk I/O and Storage

Elasticsearch is I/O intensive. On OVH, choose storage that offers high IOPS. SSDs are a must for production environments. For large deployments, consider NVMe SSDs. Ensure your filesystem is mounted with appropriate options, such as noatime, to reduce unnecessary writes.

Shard and Replica Strategy

The number of shards per index significantly impacts performance. Too many shards can overwhelm the cluster, while too few can limit parallelism. A common guideline is to aim for shards between 10GB and 50GB. The number of replicas affects search performance (more replicas can improve read throughput) and resilience but increases indexing overhead and storage requirements.
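
To make the shard-size guideline concrete, the sketch below derives a primary shard count from an expected index size. The function name and the 30 GB default target are assumptions picked from within the 10GB to 50GB range; the expected index size is something you need to estimate from your own data.

```python
import math


def primary_shards_for(index_size_gb: float,
                       target_shard_gb: float = 30.0) -> int:
    """Number of primary shards so each stays near target_shard_gb."""
    if index_size_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    # Round up so no shard exceeds the target; never go below one shard.
    return max(1, math.ceil(index_size_gb / target_shard_gb))


if __name__ == "__main__":
    print(primary_shards_for(90))    # 3
    print(primary_shards_for(200))   # 7
    print(primary_shards_for(5))     # 1
```

Because the primary shard count of an existing index cannot be changed without reindexing or using the split/shrink APIs, it is worth estimating for the index's expected size rather than its size on day one.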

Example Index Settings (via API)

PUT /my-index
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "refresh_interval": "5s"
    }
  }
}

refresh_interval controls how often newly indexed data becomes searchable. The default is 1 second. Increasing it to 5s or more reduces refresh overhead and can improve indexing throughput, at the cost of new documents taking longer to appear in search results.
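
Since refresh_interval is a dynamic setting, it can also be changed on an existing index through the update settings API, for example to relax it during a bulk load and restore it afterwards. The sketch below only builds the HTTP request with the standard library; the base URL, the helper name, and the absence of authentication and TLS are assumptions that will not hold on a secured cluster.

```python
import json
import urllib.request


def build_refresh_request(base_url: str, index: str,
                          interval: str) -> urllib.request.Request:
    """Build a PUT /<index>/_settings request changing refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}})
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical local node; adjust host, port, and credentials as needed
    req = build_refresh_request("http://localhost:9200", "my-index", "30s")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To apply it against a reachable cluster:
    # urllib.request.urlopen(req)
```

Sending the same request with "1s" after the bulk load puts the index back to the default interval.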

Elasticsearch Configuration (`elasticsearch.yml`)

Key settings in /etc/elasticsearch/elasticsearch.yml include:

Essential `elasticsearch.yml` Directives

# /etc/elasticsearch/elasticsearch.yml

# Cluster settings
cluster.name: "my-es-cluster"

# Node settings
node.name: "node-1"
network.host: "0.0.0.0" # Binds to all interfaces; prefer a specific private IP and firewall port 9200

# Discovery settings (for multi-node clusters)
discovery.seed_hosts: ["host1", "host2"]
cluster.initial_master_nodes: ["node-1", "node-2"] # Only needed when bootstrapping a new cluster

# File system settings
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

# JVM heap settings are in jvm.options, not here.

# Performance tuning
indices.memory.index_buffer_size: "20%" # Default is 10% of the heap; increase cautiously
thread_pool.write.queue_size: 1000 # Default varies by version; check it before overriding, and raise only if seeing write rejections
thread_pool.search.queue_size: 1000 # Default is 1000, adjust based on search load

# Disable swapping
bootstrap.memory_lock: true # Requires appropriate OS configuration (ulimit -l)

Important: bootstrap.memory_lock: true only works if the OS allows the Elasticsearch user to lock memory, e.g., via /etc/security/limits.conf, or via LimitMEMLOCK=infinity in a systemd override when Elasticsearch runs as a systemd service. Also, disable swap entirely if possible on your OVH instances.

Monitoring and Iterative Tuning

Tuning is not a one-time event. Continuous monitoring is key. Utilize tools like:

Nginx: stub_status module, Nginx Amplify, Prometheus exporters.
Gunicorn: Built-in logging, built-in StatsD instrumentation (--statsd-host), which can be bridged to Prometheus with statsd_exporter.
PHP-FPM: Status page, Prometheus exporters (e.g., php-fpm_exporter).
Elasticsearch: Elasticsearch's own monitoring APIs, Kibana's Stack Monitoring, Prometheus exporters (e.g., elasticsearch_exporter).
System-level: htop, vmstat, iostat, Prometheus Node Exporter.

Analyze metrics such as CPU utilization, memory usage, I/O wait times, request latency, error rates, and garbage collection activity (for Elasticsearch). Make incremental changes to configurations and observe their impact. This iterative process will lead to a highly optimized stack tailored to your specific workload on OVH.
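
As a starting point for collecting such metrics yourself, the sketch below parses the plain-text output of the Nginx stub_status module into a dictionary. The sample text follows the format shown in the Nginx documentation; the function name is hypothetical, and fetching the page (from whatever location you expose stub_status on) is left out.

```python
import re

# Matches the four-line plain-text report produced by stub_status
STUB_STATUS_RE = re.compile(
    r"Active connections:\s*(?P<active>\d+)\s+"
    r"server accepts handled requests\s+"
    r"(?P<accepts>\d+)\s+(?P<handled>\d+)\s+(?P<requests>\d+)\s+"
    r"Reading:\s*(?P<reading>\d+)\s+"
    r"Writing:\s*(?P<writing>\d+)\s+"
    r"Waiting:\s*(?P<waiting>\d+)"
)


def parse_stub_status(text: str) -> dict:
    """Return the stub_status counters as a dict of integers."""
    match = STUB_STATUS_RE.search(text)
    if match is None:
        raise ValueError("unrecognized stub_status output")
    return {name: int(value) for name, value in match.groupdict().items()}


SAMPLE = (
    "Active connections: 291 \n"
    "server accepts handled requests\n"
    " 16630948 16630948 31070465 \n"
    "Reading: 6 Writing: 179 Waiting: 106 \n"
)

if __name__ == "__main__":
    stats = parse_stub_status(SAMPLE)
    print(stats["active"], stats["waiting"])
    # accepts minus handled is the number of dropped connections
    print(stats["accepts"] - stats["handled"])
```

A non-zero difference between accepts and handled usually means Nginx hit a resource limit such as worker_connections, which ties the monitoring data back to the tuning parameters discussed at the start.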