The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and Elasticsearch on OVH for C

Nginx Tuning for High Throughput on OVH Instances

Optimizing Nginx is paramount for handling significant traffic loads, especially on cloud infrastructure like OVH where instance resources are finite. The following configurations focus on maximizing connection handling, efficient file serving, and robust proxying to your application servers (Gunicorn/FPM).

Worker Processes and Connections

The `worker_processes` directive dictates how many worker processes Nginx will spawn. A common recommendation is one worker per CPU core; setting it to `auto` makes Nginx detect the core count and do this for you. The `worker_connections` directive sets the maximum number of simultaneous connections that each worker process can handle. Multiplying `worker_processes` by `worker_connections` gives the total connection capacity. That count includes connections to upstream servers, so a proxied request occupies two connections: one to the client and one to the backend.

Consider a typical OVH instance with 8 vCPUs. We’ll set `worker_processes` to 8 and `worker_connections` to a reasonably high value, ensuring it doesn’t exceed the system’s file descriptor limits.

worker_processes 8;
# Or use 'auto' for Nginx to decide based on CPU cores
# worker_processes auto;

events {
    worker_connections 4096; # Adjust based on system limits (ulimit -n)
    multi_accept on;
    use epoll; # For Linux systems
}

http {
    # ... other http configurations ...
}
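The capacity arithmetic can be sketched in a few lines of Python. This is only a rough estimate: the function name `max_clients` is made up for illustration, and the halving for proxied traffic assumes every client request also holds one upstream connection.

```python
def max_clients(worker_processes: int, worker_connections: int, proxied: bool = True) -> int:
    """Rough upper bound on simultaneous clients for an Nginx instance."""
    total = worker_processes * worker_connections
    # worker_connections counts upstream connections too, so a proxied
    # client effectively consumes two slots (client side + backend side).
    return total // 2 if proxied else total


# Values from the 8 vCPU example above.
print(max_clients(8, 4096))         # every request proxied to a backend
print(max_clients(8, 4096, False))  # static files only
```

The real ceiling is usually lower, since file descriptor limits, memory, and backend capacity tend to bite before this number is reached.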

File Descriptor Limits

Nginx workers require file descriptors for each connection and open file. Insufficient file descriptor limits (`ulimit -n`) will lead to “Too many open files” errors. You must increase these limits system-wide and for the Nginx user.

Edit `/etc/security/limits.conf`:

# /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
root soft nofile 65536
root hard nofile 65536

Then, configure Nginx to use these limits. In your `nginx.conf` (or a separate file sourced by it), add:

user nginx; # Or your Nginx user
worker_rlimit_nofile 65536;

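Note that `limits.conf` is applied by PAM to login sessions, so it does not affect a service started by systemd; there, the limit comes from `LimitNOFILE` in the unit file (or a drop-in override). The `worker_rlimit_nofile` directive raises the limit for worker processes either way, and takes effect once Nginx is reloaded or restarted. Because `ulimit -n` only reports the limit of the shell you run it in, verify the running workers instead by checking the `Max open files` row in `/proc/<worker-pid>/limits`.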
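To confirm which limit a running process really has, one option is to read the `Max open files` row from its `/proc/<pid>/limits` file. The sketch below parses that format; `SAMPLE_LIMITS` is an invented excerpt for illustration, and on a real host you would pass in the contents of the file for an Nginx worker PID.

```python
import resource
from typing import Tuple

# Invented excerpt in the format of /proc/<pid>/limits (illustration only).
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max processes             63450                63450                processes
Max open files            65536                65536                files
"""


def _to_int(value: str) -> int:
    # The kernel prints "unlimited" for limits without a cap.
    return resource.RLIM_INFINITY if value == "unlimited" else int(value)


def parse_max_open_files(limits_text: str) -> Tuple[int, int]:
    """Return (soft, hard) from the 'Max open files' row."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()  # ['Max', 'open', 'files', soft, hard, 'files']
            return _to_int(fields[3]), _to_int(fields[4])
    raise ValueError("no 'Max open files' row found")


print(parse_max_open_files(SAMPLE_LIMITS))
# Limit of the current Python process, for comparison:
print(resource.getrlimit(resource.RLIMIT_NOFILE))
```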

Keepalive Connections

Enabling keepalive connections reduces the overhead of establishing new TCP connections for subsequent requests from the same client. This is crucial for performance, especially with HTTP/1.1.

http {
    # ...
    keepalive_timeout 65; # Default is 75. Adjust based on expected client behavior.
    keepalive_requests 1000; # Max requests per keepalive connection.
    # ...
}

Gzip Compression

Compressing responses significantly reduces bandwidth usage and improves perceived load times. With `gzip_proxied any`, Nginx also compresses responses coming from your application servers (Gunicorn/FPM), so there is no need to enable compression in the application as well.

http {
    # ...
    gzip on;
    gzip_vary on;
    gzip_proxied any; # Compress proxied responses
    gzip_comp_level 6; # Compression level (1-9)
    gzip_buffers 16 8k; # Number and size of buffers
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
    # ...
}

Buffering and Timeouts for Proxied Requests

When proxying to Gunicorn or PHP-FPM, Nginx buffering and timeout settings are critical. Improper tuning can lead to Nginx closing connections prematurely or consuming excessive memory. Adjust these based on your application’s typical response times.

location / {
    proxy_pass http://your_app_backend;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    # Buffering settings
    proxy_buffering on;
    proxy_buffers 8 16k; # Adjust buffer size based on typical response payloads
    proxy_buffer_size 32k; # Buffer for the first part of the response (mainly headers)

    # Timeouts
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s; # Crucial for long-running requests
}
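Buffer settings translate directly into memory, so it can help to estimate the worst case before raising them. The sketch below assumes each proxied connection may use up to `proxy_buffer_size` plus all of its `proxy_buffers`; Nginx allocates those buffers on demand, so treat the result as an approximate upper bound. The function name and the figure of 1000 concurrent connections are illustrative assumptions.

```python
def proxy_buffer_bytes(buffer_size_kb: int, buffers_count: int, buffers_each_kb: int) -> int:
    """Approximate worst-case buffer memory for one proxied connection, in bytes."""
    return (buffer_size_kb + buffers_count * buffers_each_kb) * 1024


# proxy_buffer_size 32k; proxy_buffers 8 16k;
per_connection = proxy_buffer_bytes(32, 8, 16)
concurrent = 1000  # hypothetical number of simultaneous proxied requests

print(f"{per_connection // 1024} KiB per connection")
print(f"{per_connection * concurrent / 1024 ** 2:.1f} MiB for {concurrent} connections")
```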

Gunicorn Tuning for Python Applications

Gunicorn (Green Unicorn) is a popular WSGI HTTP Server for Python. Tuning its worker processes and threads is key to handling concurrent requests efficiently.

Worker Processes and Threads

Gunicorn’s `workers` setting determines the number of worker processes. A common starting point is `(2 * number_of_cores) + 1`. For I/O-bound applications, you might increase this. The `threads` setting (available with `gthread` worker type) allows for concurrency within a single process, useful for I/O-bound tasks but can be limited by the Global Interpreter Lock (GIL) for CPU-bound tasks.

For a server with 8 vCPUs, a good starting point for `sync` workers (which are process-based) would be around 17 workers. If using `gthread`, you might use fewer workers and more threads.

# Example command line for Gunicorn
gunicorn --workers 17 \
         --worker-class sync \
         --bind 0.0.0.0:8000 \
         your_app.wsgi:application

# Example with threads (gthread worker class)
gunicorn --workers 4 \
         --threads 8 \
         --worker-class gthread \
         --bind 0.0.0.0:8000 \
         your_app.wsgi:application

The `sync` worker class is generally recommended for its simplicity and robustness. `gthread` can offer better concurrency for I/O-bound tasks but requires careful tuning and understanding of Python’s GIL.
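The same settings can live in a Gunicorn configuration file, which is plain Python. The sketch below applies the `(2 * number_of_cores) + 1` starting point; the helper name `suggested_workers` is made up for illustration, and the file would be passed with `gunicorn -c gunicorn.conf.py your_app.wsgi:application`.

```python
# gunicorn.conf.py
import multiprocessing


def suggested_workers(cores: int) -> int:
    """Starting point from the rule of thumb: (2 * cores) + 1."""
    return 2 * cores + 1


# Gunicorn reads these module-level names as settings and ignores the rest.
workers = suggested_workers(multiprocessing.cpu_count())
worker_class = "sync"
bind = "0.0.0.0:8000"
```

On an 8 vCPU machine this yields the 17 workers used in the command-line example. Keeping the formula in the config file means the value follows the instance size if you later resize the server.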

Timeout and Keepalive

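Gunicorn’s `timeout` setting defines how long a worker may stay silent before it is killed and restarted (default 30 seconds); for `sync` workers this is effectively the maximum time a single request may take. Set it slightly higher than your application’s longest expected request, and keep Nginx’s `proxy_read_timeout` at least as high, otherwise Nginx gives up on the request first. `keepalive` (`--keep-alive` on the command line) is the number of seconds to wait for the next request on a keep-alive connection (default 2); it has nothing to do with restarting workers. The `sync` worker class ignores it, so the example uses `gthread`.

# Example command line for Gunicorn
gunicorn --workers 4 \
         --threads 8 \
         --worker-class gthread \
         --timeout 120 \
         --keep-alive 5 \
         --bind 0.0.0.0:8000 \
         your_app.wsgi:application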

Worker Memory Limits

For long-running applications, monitoring and limiting worker memory usage is crucial to prevent OOM (Out Of Memory) errors. Gunicorn doesn’t have a direct built-in mechanism for this, but you can use external tools or implement periodic restarts based on memory usage.

A common strategy is to recycle each worker after it has served a fixed number of requests, using `--max-requests` together with `--max-requests-jitter`.

# Example command line for Gunicorn
gunicorn --workers 17 \
         --worker-class sync \
         --max-requests 5000 \
         --max-requests-jitter 500 \
         --bind 0.0.0.0:8000 \
         your_app.wsgi:application

`--max-requests-jitter` adds a random number of requests, between 0 and the jitter value, to each worker’s limit. Workers therefore reach their limits at different times instead of all restarting at once and causing a spike in load.
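The effect of the jitter is easy to see in a small simulation. This is a sketch of the idea rather than Gunicorn's own code: each worker's restart threshold is modelled as `max_requests` plus a random integer between 0 and the jitter value, and `per_worker_limits` is a hypothetical helper name.

```python
import random
from typing import List, Optional


def per_worker_limits(max_requests: int, jitter: int, workers: int,
                      seed: Optional[int] = None) -> List[int]:
    """Simulate the request count at which each worker would be recycled."""
    rng = random.Random(seed)
    return [max_requests + rng.randint(0, jitter) for _ in range(workers)]


# Values from the command line above: 17 workers, 5000 requests, jitter 500.
limits = per_worker_limits(5000, 500, 17, seed=42)
print(sorted(limits))
print("distinct thresholds:", len(set(limits)))
```

Without jitter every worker would hit exactly 5000 requests at roughly the same moment under evenly distributed load.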

PHP-FPM Tuning for PHP Applications

PHP-FPM (FastCGI Process Manager) is the standard for running PHP applications with web servers like Nginx. Its performance is heavily influenced by its process management settings.

Process Manager Settings

PHP-FPM offers three process management strategies: `static`, `dynamic`, and `ondemand`. For production environments, `dynamic` or `static` are generally preferred.

`dynamic`: This is often the best balance. It starts with a minimum number of processes and spawns more as needed, up to a maximum. It also kills idle processes to save resources.

; /etc/php/X.Y/fpm/pool.d/www.conf (adjust X.Y for your PHP version)
pm = dynamic
pm.max_children = 100       ; Max number of children that can be alive at the same time.
pm.start_servers = 2        ; Number of children created on startup.
pm.min_spare_servers = 1    ; Min number of idle processes.
pm.max_spare_servers = 5    ; Max number of idle processes.
pm.max_requests = 500       ; Max number of requests each child process will serve.
                            ; Set to 0 to disable. Helps prevent memory leaks.

`static`: This pre-forks a fixed number of processes. It offers the most predictable performance but can be wasteful if traffic is inconsistent.

; /etc/php/X.Y/fpm/pool.d/www.conf
pm = static
pm.max_children = 150       ; Fixed number of children. Adjust based on server resources.
pm.max_requests = 0         ; Disable automatic restarts for static pool.

The values for `pm.max_children` and `pm.max_requests` should be tuned based on your server’s RAM and the typical memory footprint of your PHP application. A common starting point for `pm.max_children` is to calculate based on available RAM: `Available RAM / Average PHP Process Size`. For example, if you have 16GB RAM and each PHP process averages 50MB, the theoretical ceiling is about 327 `max_children` (16384MB / 50MB ≈ 327). In practice, subtract the RAM needed by the OS and other services before dividing.
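The calculation can be written out as a small helper. This is a sketch under stated assumptions: the name `fpm_max_children` is invented, and the 6144MB reserve in the second call (for example, a 4GB Elasticsearch heap plus 2GB for the OS) is a hypothetical figure, not a measurement.

```python
def fpm_max_children(total_ram_mb: int, reserved_mb: int, avg_process_mb: int) -> int:
    """Estimate pm.max_children from RAM left after other services are accounted for."""
    if avg_process_mb <= 0:
        raise ValueError("avg_process_mb must be positive")
    usable = max(total_ram_mb - reserved_mb, 0)
    return usable // avg_process_mb


print(fpm_max_children(16384, 0, 50))     # theoretical ceiling, nothing reserved
print(fpm_max_children(16384, 6144, 50))  # with a hypothetical 6GB reserve
```

Measure the average process size on your own workload (for instance from the resident set size of running `php-fpm` children) rather than relying on the 50MB example.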

Process Idle Timeout

The `pm.process_idle_timeout` directive determines how long an idle process is kept alive before being killed. It applies only when `pm = ondemand`; with `dynamic`, idle processes are trimmed according to `pm.max_spare_servers` instead. This helps conserve resources during low traffic periods.

; /etc/php/X.Y/fpm/pool.d/www.conf
pm.process_idle_timeout = 10s ; Only used when pm = ondemand

Nginx Configuration for PHP-FPM

Ensure your Nginx configuration correctly passes requests to PHP-FPM, including appropriate timeouts.

location ~ \.php$ {
    include snippets/fastcgi-php.conf;
    # With php-fpm (or other unix sockets):
    fastcgi_pass unix:/run/php/phpX.Y-fpm.sock; # Adjust PHP version
    # Or with TCP/IP:
    # fastcgi_pass 127.0.0.1:9000;

    # Increase FastCGI timeouts if your PHP scripts can take longer
    fastcgi_connect_timeout 60s;
    fastcgi_send_timeout 60s;
    fastcgi_read_timeout 300s; # Crucial for long-running PHP scripts
}

Elasticsearch Tuning for Performance and Scalability

Elasticsearch performance is heavily dependent on JVM heap size, file system cache, and shard configuration. On OVH instances, managing memory and disk I/O is critical.

JVM Heap Size

The most critical Elasticsearch setting is the JVM heap size. Set it to no more than 50% of the system’s total RAM, and keep it below roughly 30-32GB: above that threshold the JVM can no longer use compressed ordinary object pointers (compressed oops), and every object reference takes more memory.

Edit `config/jvm.options` (or `config/jvm.options.d/*.options` in newer versions):

-Xms4g
-Xmx4g

For a server with 8GB RAM, setting `-Xms4g -Xmx4g` (4GB heap) is a reasonable starting point. This leaves 4GB for the OS and file system cache, which is vital for Elasticsearch performance.
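The heap rule can be expressed as a short helper that also prints the matching `jvm.options` lines. It is a sketch only: the function names are invented, and the 31GB cap is an assumed safe value just under the compressed-oops threshold, which you may want to lower for your JVM.

```python
def suggested_heap_gb(total_ram_gb: float, cap_gb: int = 31) -> int:
    """Half of RAM, rounded down, never above the cap and never below 1GB."""
    return max(1, min(int(total_ram_gb / 2), cap_gb))


def jvm_heap_options(total_ram_gb: float) -> str:
    """Render -Xms/-Xmx lines with identical values, as recommended above."""
    heap = suggested_heap_gb(total_ram_gb)
    return f"-Xms{heap}g\n-Xmx{heap}g"


print(jvm_heap_options(8))    # the 8GB example from the text
print(jvm_heap_options(128))  # large host: the cap applies
```

On a host shared with other services, as in the combined deployment later in this article, pass the RAM you are willing to give Elasticsearch rather than the machine total.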

File System Cache

Elasticsearch relies heavily on the operating system’s file system cache to store index data. Ensuring sufficient free RAM for the OS is paramount. Avoid running other memory-intensive applications on the same node.

Shard Allocation and Size

The number and size of shards significantly impact performance. Aim for shard sizes between 10GB and 50GB. Too many small shards increase overhead; too few large shards can hinder rebalancing and recovery.

When creating indices, define the number of primary shards. For example, to create an index with 3 primary shards:

{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  }
}

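The number of primary shards is fixed when the index is created, so choose it from the expected index size and the target shard size above; changing it later requires a reindex, split, or shrink. Spreading primaries across data nodes helps, but more primaries than data nodes rarely pays off for a small index. Replicas provide redundancy and can improve read performance but increase indexing load; on a single-node cluster a replica can never be allocated, so set `number_of_replicas` to 0 there.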
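One way to pick the primary shard count is to divide the expected index size by a target shard size inside the 10GB to 50GB range. The sketch below does that and builds the matching settings body; the 30GB target and the helper names are assumptions for illustration.

```python
import json
import math


def primary_shards(expected_index_gb: float, target_shard_gb: float = 30) -> int:
    """Number of primaries so each shard lands near the target size."""
    if target_shard_gb <= 0:
        raise ValueError("target_shard_gb must be positive")
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


def index_settings(expected_index_gb: float, replicas: int = 1) -> dict:
    """Build the settings body used when creating the index."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards(expected_index_gb),
                "number_of_replicas": replicas,
            }
        }
    }


# A hypothetical index expected to grow to about 90GB.
print(json.dumps(index_settings(90), indent=2))
```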

Swapping Disabled

Elasticsearch performance degrades severely if the JVM heap is swapped to disk. Ensure swapping is disabled for the Elasticsearch process and the system.

In `elasticsearch.yml`:

bootstrap.memory_lock: true

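You’ll also need to allow the Elasticsearch user to lock memory (`memlock`). The file below covers PAM sessions; when Elasticsearch runs under systemd, set `LimitMEMLOCK=infinity` in a unit override as well.

# /etc/security/limits.d/elasticsearch.conf
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited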

And ensure the system’s `vm.max_map_count` is sufficiently high (e.g., 262144):

# /etc/sysctl.conf
vm.max_map_count=262144

Apply sysctl changes with `sysctl -p`.
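After applying the change, the live value can be checked by reading `/proc/sys/vm/max_map_count`. The helper names below are made up for illustration; the parsing is split from the file read so the comparison can be exercised without touching the system.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # value used in the sysctl example above


def meets_minimum(raw_value: str, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """Compare the text read from the sysctl file against the required minimum."""
    return int(raw_value.strip()) >= required


def check_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> bool:
    """Read the live kernel value (Linux only) and check it."""
    return meets_minimum(Path(path).read_text())


print(meets_minimum("262144\n"))  # format as it appears in the proc file
print(meets_minimum("65530\n"))   # a lower value, which would fail the check
```

Calling `check_max_map_count()` on the Elasticsearch host performs the same comparison against the running kernel.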

Disk I/O and Network

For optimal performance, use fast SSDs for Elasticsearch data directories. Monitor disk I/O using tools like `iostat`. Ensure your network configuration on OVH allows for low latency between nodes if running a cluster.

Putting It All Together: A Sample OVH Deployment

Consider a typical OVH instance (e.g., 8 vCPU, 16GB RAM) hosting Nginx, PHP-FPM, and a single-node Elasticsearch instance. This is a common scenario for smaller applications or development environments. For production, consider dedicated nodes for each service.

Nginx Configuration Snippet

# /etc/nginx/nginx.conf
user www-data;
worker_processes 8; # Based on 8 vCPUs

events {
    worker_connections 4096;
    multi_accept on;
    use epoll;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    sendfile        on;
    tcp_nopush      on;
    tcp_nodelay     on;
    keepalive_timeout 65;
    keepalive_requests 1000;
    types_hash_max_size 2048;

    server_tokens off; # Security best practice

    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # location blocks are only valid inside a server block
    server {
        listen 80;
        server_name _;      # Placeholder: set your domain
        root /var/www/html; # Placeholder: set your document root

        # Proxy to PHP-FPM
        location ~ \.php$ {
            include snippets/fastcgi-php.conf;
            fastcgi_pass unix:/run/php/php8.1-fpm.sock; # Adjust PHP version
            fastcgi_connect_timeout 60s;
            fastcgi_send_timeout 60s;
            fastcgi_read_timeout 300s;
        }

        # Proxy to Gunicorn (if applicable)
        location / {
            proxy_pass http://127.0.0.1:8000; # Assuming Gunicorn runs on localhost:8000
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            proxy_buffering on;
            proxy_buffers 8 16k;
            proxy_buffer_size 32k;

            proxy_connect_timeout 60s;
            proxy_send_timeout 60s;
            proxy_read_timeout 60s;
        }

        # Serve static files directly
        location ~* \.(jpg|jpeg|png|gif|ico|css|js|svg|woff|woff2)$ {
            expires 30d;
            access_log off;
            add_header Cache-Control "public";
        }
    }
}

PHP-FPM Configuration Snippet

; /etc/php/8.1/fpm/pool.d/www.conf (adjust PHP version)
[www]
user = www-data
group = www-data
listen = /run/php/php8.1-fpm.sock

pm = dynamic
pm.max_children = 100
pm.start_servers = 5
pm.min_spare_servers = 2
pm.max_spare_servers = 10
pm.max_requests = 1000
pm.process_idle_timeout = 10s ; Only used when pm = ondemand; ignored with dynamic

Elasticsearch Configuration Snippet

# /etc/elasticsearch/elasticsearch.yml
cluster.name: "my-ovh-cluster"
node.name: "es-node-1"
network.host: 127.0.0.1 # Or a private IP if part of a cluster
http.port: 9200
discovery.type: single-node # Single node; for a cluster, remove this and list peers in discovery.seed_hosts

# Heap size, memlock limits, and vm.max_map_count are set in jvm.options, limits.d, and sysctl.conf as described above
bootstrap.memory_lock: true
# xpack.security.enabled: false # Disabled here only for simplicity; keep security enabled in production

# /etc/elasticsearch/jvm.options
-Xms4g
-Xmx4g
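Because all three services share one machine, it is worth adding up what the snippets above commit in the worst case. The sketch below does that for the 16GB example; the 50MB per PHP process is the illustrative figure used earlier in this article, and the function name is made up.

```python
def remaining_ram_mb(total_mb: int, es_heap_mb: int, fpm_children: int,
                     fpm_process_mb: int, other_mb: int = 0) -> int:
    """RAM left for the OS, page cache, and Nginx after worst-case commitments."""
    committed = es_heap_mb + fpm_children * fpm_process_mb + other_mb
    return total_mb - committed


# 16GB host, 4GB Elasticsearch heap, pm.max_children = 100 at ~50MB each.
left = remaining_ram_mb(16384, 4096, 100, 50)
print(f"{left} MB left for OS, file system cache, and Nginx")
```

A negative or small result means `pm.max_children` or the heap should come down, since Elasticsearch depends on free memory for the file system cache.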

This comprehensive playbook provides a solid foundation for tuning Nginx, Gunicorn/FPM, and Elasticsearch on OVH infrastructure. Remember that continuous monitoring and iterative adjustments based on real-world load are key to maintaining optimal performance.