The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and Elasticsearch on OVH for Ruby

Nginx as a High-Performance Reverse Proxy and Load Balancer

When deploying Ruby applications, particularly those built on Ruby on Rails, Nginx sits in front of the application server and handles incoming HTTP traffic. It serves static assets, terminates SSL, and reverse-proxies requests to the application server. For Ruby that server is usually Puma; Gunicorn fills the same role for Python services and is covered here because mixed stacks are common. On OVH infrastructure, a well-tuned Nginx layer can reduce latency and improve overall throughput.

A common setup has Nginx proxying requests to a Puma (or Gunicorn) instance listening on a local port. Here’s an Nginx configuration snippet for this scenario. We’ll focus on connection management, buffering, and static file serving.

Core Nginx Configuration for Ruby Applications

This configuration assumes your application server (Puma or Gunicorn) is listening on a local port, 8000 in this example. Adjust `proxy_pass` accordingly. Nginx also serves static assets directly, offloading that work from the application server.

# /etc/nginx/sites-available/your_ruby_app

server {
    listen 80;
    server_name your_domain.com www.your_domain.com;

    # SSL Configuration (Recommended for production)
    # listen 443 ssl http2;
    # ssl_certificate /etc/letsencrypt/live/your_domain.com/fullchain.pem;
    # ssl_certificate_key /etc/letsencrypt/live/your_domain.com/privkey.pem;
    # include /etc/letsencrypt/options-ssl-nginx.conf;
    # ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;

    # Gzip Compression
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml;

    # Static Assets
    location ~ ^/(assets|images|javascripts|stylesheets|system)/ {
        root /path/to/your/rails/app/public; # Adjust to your Rails public directory
        expires max;
        add_header Cache-Control public;
        access_log off;
    }

    # Proxy to Application Server (Gunicorn/Puma)
    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Buffer and timeout settings. proxy_buffer_size must be large enough for the
        # upstream response headers, otherwise Nginx returns 502 Bad Gateway.
        proxy_buffers 8 16k;
        proxy_buffer_size 32k;
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;

        proxy_pass http://127.0.0.1:8000; # Assuming Gunicorn is on port 8000
        proxy_redirect off;
    }

    # Health Check Endpoint (Optional but Recommended)
    # location /health {
    #     access_log off;
    #     default_type text/plain;
    #     return 200 'OK';
    # }

    # Error Pages
    error_page 500 502 503 504 /500.html;
    location = /500.html {
        root /path/to/your/rails/app/public; # Adjust to your Rails public directory
        internal;
    }
}

Key Directives Explained:

gzip: Enables Gzip compression for responses.
gzip_types: Specifies MIME types to compress.
location ~ ^/(assets|images|...): Directs Nginx to serve static files from the specified `root` path. `expires max` and `add_header Cache-Control public` instruct browsers and intermediate caches to aggressively cache these assets.
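proxy_set_header: Passes the original host, client IP, and protocol to the backend. Rails relies on these headers for URL generation, request logging, and detecting HTTPS behind the proxy.
proxy_buffers, proxy_buffer_size: These control how Nginx buffers the upstream response. `proxy_buffer_size` holds the first part of the response, which contains the headers; if it is too small for large cookies or headers, Nginx returns a 502 and logs "upstream sent too big header". `proxy_buffers` sets the number and size of the buffers used for the rest of the response. Adjust `8 16k` and `32k` based on typical response sizes and application behavior.
proxy_connect_timeout, proxy_send_timeout, proxy_read_timeout: These control how long Nginx waits to establish a connection to the upstream server, and how long it waits between two successive write or read operations on that connection. Increase `proxy_read_timeout` if your application performs long-running operations, and keep it consistent with the application server’s own request timeout.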
proxy_pass http://127.0.0.1:8000;: The core directive that forwards requests to your application server.
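
To get a feel for what the buffer settings cost in memory, here is a rough sketch that estimates an upper bound on proxy buffer memory per proxied connection (`proxy_buffer_size` plus `number * size` from `proxy_buffers`). The helper names and the figure of 1,000 concurrent connections are illustrative assumptions, not part of Nginx, and real usage is typically lower because buffers are allocated as needed.

```python
import re

# Multipliers for the size suffixes used in Nginx configuration values.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert an Nginx-style size such as '16k' or '1m' to bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def proxy_buffer_bytes(proxy_buffers: str, proxy_buffer_size: str) -> int:
    """Upper bound of buffer memory for one proxied connection."""
    count, size = proxy_buffers.split()
    return parse_size(proxy_buffer_size) + int(count) * parse_size(size)


# Values taken from the configuration above.
per_connection = proxy_buffer_bytes("8 16k", "32k")
print(per_connection // 1024, "KiB per connection")
# Hypothetical load of 1000 simultaneous proxied connections.
print(per_connection * 1000 // 1024 ** 2, "MiB for 1000 concurrent connections")
```

With the values from this configuration the bound is 160 KiB per connection, which is worth keeping in mind before raising buffer sizes on a small instance.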

After modifying your Nginx configuration, always test it before reloading:

sudo nginx -t && sudo systemctl reload nginx

Gunicorn/Puma Tuning for Ruby Applications

Gunicorn is a WSGI HTTP server for Python applications; Puma is its closest counterpart for Ruby and serves Rack applications such as Rails. In both cases, performance is directly tied to how efficiently the server handles concurrent requests. Tuning involves selecting the right worker type and the number of workers and threads.

Worker Types and Scaling

Gunicorn supports several worker types:

Sync Workers (Default): Each worker handles one request at a time. Simple but can be a bottleneck under high load.
Async Workers (e.g., Gevent, Eventlet): These workers can handle multiple requests concurrently using non-blocking I/O. This is generally preferred for I/O-bound applications.
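Threads (Gunicorn’s `threads` option): Allows a single worker process to handle multiple requests using threads. Setting `threads` above 1 with the `sync` worker class makes Gunicorn use the `gthread` worker instead. Be mindful of the Global Interpreter Lock (GIL) in CPython, which can limit true parallelism for CPU-bound tasks.

For Ruby applications run via Puma, the concept is similar. Puma is threaded by default and can also fork several worker processes (clustered mode), so tuning means setting `workers` and `threads` in `config/puma.rb`. MRI Ruby has a Global VM Lock that limits CPU-bound threads in the same way the GIL does in CPython.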

Gunicorn Configuration Example

You can configure Gunicorn via command-line arguments or a Python configuration file. For production, a configuration file is cleaner.

# gunicorn_config.py

import multiprocessing

# Number of worker processes. A common starting point is (2 * number_of_cores) + 1.
# For I/O bound applications, consider using async workers and fewer processes.
workers = multiprocessing.cpu_count() * 2 + 1

# Worker class. 'sync' is default. 'gevent' or 'eventlet' for async.
# If using gevent/eventlet, ensure they are installed: pip install gevent
worker_class = 'sync' # Or 'gevent'

# Each worker process will run 'threads' number of threads.
# With worker_class 'sync', setting threads > 1 makes Gunicorn switch to the
# 'gthread' worker. Be cautious with threads due to Python's GIL.
threads = 2

# Bind to a socket. Nginx will proxy to this.
bind = "127.0.0.1:8000"

# Logging settings
loglevel = 'info'
accesslog = '-' # Log to stdout, which can be captured by systemd/Docker
errorlog = '-'  # Log to stderr

# Timeout for worker requests. Adjust based on your application's longest operations.
# Nginx's proxy_read_timeout should be at least this long, or Nginx returns a 504 first.
timeout = 120

# Maximum number of requests a worker will process before restarting.
# Helps prevent memory leaks.
max_requests = 5000
max_requests_jitter = 1000 # Randomize restart to avoid thundering herd

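# Directory for the worker heartbeat file (the Gunicorn setting is worker_tmp_dir).
# A memory-backed path such as /dev/shm avoids stalls when the disk is slow.
# worker_tmp_dir = "/dev/shm"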

To run Gunicorn with this configuration:

gunicorn -c gunicorn_config.py your_app.wsgi:application

Tuning Considerations:

Worker Count: Start with `(2 * CPU cores) + 1` for sync workers. If your application is heavily I/O bound (e.g., making many external API calls, database queries), consider async workers (gevent/eventlet) and potentially fewer worker processes, as each async worker can handle many concurrent connections.
Threads: For sync workers, threads can improve concurrency for I/O-bound tasks within a single process. However, Python’s GIL means threads won’t provide true parallelism for CPU-bound tasks.
Timeouts: Ensure `timeout` is set high enough to accommodate your application’s longest-running requests without being so high that it masks performance issues or hangs indefinitely.
Max Requests: Regularly restarting workers helps mitigate memory leaks and ensures a fresh process.
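
The worker-count guidance above can be sketched as a small helper. This is a minimal illustration, not Gunicorn’s own logic: the function name, the memory-per-worker figures, and the idea of capping the count by available RAM are assumptions added here to show how the `(2 * cores) + 1` rule interacts with a memory budget on a small instance.

```python
import multiprocessing


def suggested_workers(cpu_cores: int, ram_mb: int, worker_mb: int,
                      reserve_mb: int = 512) -> int:
    """Apply the (2 * cores) + 1 rule, capped by how many workers fit in RAM."""
    if cpu_cores < 1 or worker_mb < 1:
        raise ValueError("cpu_cores and worker_mb must be positive")
    by_cpu = 2 * cpu_cores + 1
    # Leave reserve_mb for the OS and Nginx; always allow at least one worker.
    by_memory = max((ram_mb - reserve_mb) // worker_mb, 1)
    return min(by_cpu, by_memory)


# Hypothetical 4-core, 8 GB instance with workers of about 300 MB each:
# the CPU rule is the limit.
print(suggested_workers(cpu_cores=4, ram_mb=8192, worker_mb=300))
# Hypothetical 4-core, 2 GB instance: memory becomes the limit.
print(suggested_workers(cpu_cores=4, ram_mb=2048, worker_mb=300))
# Using the core count of the machine this runs on.
print(suggested_workers(multiprocessing.cpu_count(), ram_mb=8192, worker_mb=300))
```

Measure the real resident memory of a warmed-up worker before trusting the `worker_mb` figure; the same reasoning applies to Puma workers.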

Elasticsearch Performance Tuning on OVH

Elasticsearch, often used for logging, search, and analytics, can become a performance bottleneck if not properly configured, especially on shared or resource-constrained OVH instances. Tuning focuses on JVM heap size, file descriptors, and shard allocation.

JVM Heap Size Configuration

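The JVM heap is critical for Elasticsearch performance. Set it to no more than 50% of the system’s available RAM, leaving the rest for the operating system’s filesystem cache, and keep it below the threshold (roughly 30-32GB, depending on the JVM) at which the JVM stops using compressed ordinary object pointers (compressed oops). Set the initial and maximum heap to the same value.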

# /etc/elasticsearch/jvm.options

# Set initial and max heap size
# Example for a server with 16GB RAM:
-Xms8g
-Xmx8g

# Example for a server with 64GB RAM (50% would be 32GB, so cap at ~30GB):
# -Xms30g
# -Xmx30g

Important: After changing jvm.options, restart Elasticsearch:

sudo systemctl restart elasticsearch
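
The heap sizing rule can be expressed as a short calculation. The sketch below assumes a 30GB ceiling as a conservative stand-in for the compressed-oops threshold (the exact cutoff varies by JVM and platform), and the function names are illustrative.

```python
def recommended_heap_gb(total_ram_gb: float, ceiling_gb: int = 30) -> int:
    """Half of RAM, rounded down, never above the assumed compressed-oops ceiling."""
    if total_ram_gb < 2:
        raise ValueError("too little RAM for this rule of thumb")
    return min(int(total_ram_gb // 2), ceiling_gb)


def jvm_heap_flags(total_ram_gb: float) -> list[str]:
    """Return matching -Xms/-Xmx lines; both are set to the same value."""
    heap = recommended_heap_gb(total_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


for ram in (16, 64):
    print(ram, "GB RAM ->", " ".join(jvm_heap_flags(ram)))
```

For a 16GB server this yields an 8GB heap, and for a 64GB server it yields the 30GB ceiling rather than the full 50%.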

File Descriptors Limit

Elasticsearch uses a large number of file descriptors for its indices and network connections. The default limits are often too low. Increase them system-wide and for the Elasticsearch user.

# /etc/security/limits.conf
# Add these lines for the 'elasticsearch' user (or the user ES runs as)
elasticsearch   soft    nofile  65536
elasticsearch   hard    nofile  65536
elasticsearch   soft    nproc   4096
elasticsearch   hard    nproc   4096

# /etc/systemd/system/elasticsearch.service.d/override.conf (if using systemd)
# Create this file if it doesn't exist
[Service]
LimitNOFILE=65536
LimitNPROC=4096

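Note: `limits.conf` is applied by PAM at login, so you need to log out and back in (or reboot) for changes to take effect in interactive sessions. It does not apply to services started by systemd; for those, the `LimitNOFILE` and `LimitNPROC` values in the override file are what count, and reloading the daemon and restarting the service is sufficient: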

sudo systemctl daemon-reload
sudo systemctl restart elasticsearch
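
To confirm that the new limit actually reached the running service, you can inspect `/proc/<pid>/limits` for the Elasticsearch process. The sketch below parses the "Max open files" row of that file. The sample text is an illustrative excerpt, and the parsing helper is an assumption of this example rather than an Elasticsearch tool.

```python
import re


def parse_open_files_limit(limits_text: str) -> tuple[str, str]:
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns are separated by runs of spaces: name, soft, hard, units.
            fields = re.split(r"\s{2,}", line.strip())
            return fields[1], fields[2]
    raise ValueError("no 'Max open files' row found")


# Illustrative excerpt in the layout Linux uses for this file.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max processes             4096                 4096                 processes
Max open files            65536                65536                files
"""

soft, hard = parse_open_files_limit(SAMPLE)
print(f"soft={soft} hard={hard}")
# The kernel reports "unlimited" as text, so guard before converting.
if soft != "unlimited" and int(soft) < 65536:
    print("nofile limit is below the value configured above")
```

On a live host, pass the contents of the real file instead of `SAMPLE`; the service PID can be found with `systemctl show -p MainPID elasticsearch`.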

Shard Allocation and Indexing Performance

Improper shard distribution can lead to uneven disk I/O and CPU load. For OVH instances, especially those with potentially slower disks or network, optimizing shard count and allocation is key.

Shard Count: Aim for a shard size between 10GB and 50GB. Too many small shards increase overhead. Too few large shards can hinder rebalancing and recovery.
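
As a rough worked example of the shard-size guideline, the sketch below derives a primary shard count from an expected index size. The 30GB target is an assumed midpoint of the 10GB to 50GB range, and the function name is illustrative.

```python
import math


def primary_shard_count(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Smallest number of primaries that keeps each shard at or under the target size."""
    if index_size_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(math.ceil(index_size_gb / target_shard_gb), 1)


for size_gb in (5, 120, 400):
    shards = primary_shard_count(size_gb)
    print(f"{size_gb} GB index -> {shards} primary shard(s), "
          f"about {size_gb / shards:.1f} GB each")
```

Because the primary shard count is fixed when an index is created, base the estimate on the size you expect the index to reach, not its size on day one.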

Allocation Awareness: If you have multiple nodes in different physical locations (e.g., different OVH data centers or availability zones), configure allocation awareness to ensure replicas are placed on different nodes/zones for high availability.

# PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "zone"
  }
}

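Then, on each node, assign the `zone` attribute in `elasticsearch.yml` and restart the node. Awareness has no effect until nodes declare this attribute (the value `zone-a` is a placeholder):

# /etc/elasticsearch/elasticsearch.yml
node.attr.zone: zone-a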

Indexing Buffer: For write-heavy workloads, tuning `indices.memory.index_buffer_size` can help. It is a static node setting in `elasticsearch.yml`, defaults to 10% of the heap, and is shared by all shards on the node.

# /etc/elasticsearch/elasticsearch.yml
# Example: a fixed 256MB index buffer. The default of 10% is about 800MB on an 8GB heap
# and is often sufficient; adjust only if profiling shows the buffer is a bottleneck.
indices.memory.index_buffer_size: 256m

Monitoring: Regularly monitor Elasticsearch using tools like Kibana’s Stack Monitoring, Prometheus/Grafana, or the `_cat` APIs to identify bottlenecks. Pay attention to CPU usage, disk I/O, JVM heap usage, and garbage collection activity.
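
As one small example of using the `_cat` APIs for monitoring, the sketch below flags nodes whose heap usage exceeds a threshold, working from the JSON that `GET _cat/nodes?format=json&h=name,heap.percent,cpu` returns. The sample rows, the 75% threshold, and the function name are assumptions for illustration; fetching the data over HTTP is left out so the example stays self-contained.

```python
import json


def nodes_over_heap_threshold(cat_nodes: list[dict], threshold: int = 75) -> list[str]:
    """Return the names of nodes whose heap.percent is above the threshold."""
    hot = []
    for node in cat_nodes:
        # Values typically arrive as strings, so convert before comparing.
        if int(node["heap.percent"]) > threshold:
            hot.append(node["name"])
    return hot


# Illustrative response body; real output depends on your cluster.
SAMPLE_RESPONSE = """
[
  {"name": "es-node-1", "heap.percent": "62", "cpu": "14"},
  {"name": "es-node-2", "heap.percent": "88", "cpu": "71"}
]
"""

nodes = json.loads(SAMPLE_RESPONSE)
print(nodes_over_heap_threshold(nodes))
```

A check like this is a starting point for an alert; sustained high heap usage combined with frequent garbage collection is the pattern worth acting on.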

Putting It All Together: A Holistic Approach

Optimizing a stack on OVH for Ruby applications involves a layered approach. Nginx handles the edge, efficiently routing and serving static content. Gunicorn/Puma manages application concurrency, and Elasticsearch provides robust data indexing and retrieval. Each layer must be tuned independently and then observed in concert. Start with sensible defaults, monitor performance metrics closely (CPU, RAM, I/O, network, latency), and iterate on configurations based on real-world load and observed bottlenecks. For OVH, consider their specific instance types and network performance characteristics when making tuning decisions.