The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and Elasticsearch on DigitalOcean for Ruby

Nginx as a High-Performance Frontend Proxy

For Ruby applications, Nginx serves as an indispensable frontend proxy, efficiently handling static assets, SSL termination, and load balancing. Optimizing Nginx is crucial for maximizing throughput and minimizing latency. We’ll focus on key directives that impact performance and resource utilization.

Tuning Worker Processes and Connections

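The number of worker processes determines how many CPU cores Nginx can use. A common best practice is to set worker_processes to the number of available CPU cores. The worker_connections directive limits the number of simultaneous connections each worker process can open, so the theoretical maximum is worker_processes * worker_connections. There are two caveats. First, the limit counts every connection, including connections to upstream servers, so a proxied request uses two (client and upstream). Second, each worker is also capped by its open-file limit, which you can raise with worker_rlimit_nofile.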
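The arithmetic above can be sketched in Python. This is a simplified model, not something Nginx computes itself: it assumes every proxied client holds two connections (browser to Nginx, Nginx to upstream) and that each worker is capped by the smaller of worker_connections and its open-file limit. The function name is hypothetical.

```python
def max_proxied_clients(worker_processes: int, worker_connections: int,
                        nofile_limit: int) -> int:
    """Rough upper bound on concurrent proxied clients (hypothetical helper).

    Simplified model: each worker holds at most min(worker_connections,
    nofile_limit) connections, and each proxied client uses two of them
    (client -> Nginx and Nginx -> upstream).
    """
    per_worker = min(worker_connections, nofile_limit)
    return worker_processes * per_worker // 2


if __name__ == "__main__":
    # 4 CPU cores, worker_connections 4096, open-file limit raised to 65535.
    print(max_proxied_clients(4, 4096, 65535))  # 8192
    # Same settings, but each worker is still at a 1024 open-file limit.
    print(max_proxied_clients(4, 4096, 1024))   # 2048
```

The second call shows why raising worker_connections alone may not help: the open-file limit becomes the ceiling first.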

Example Nginx Configuration Snippet

worker_processes auto; # Or set to the number of CPU cores
events {
    worker_connections 4096; # Adjust based on expected load and server memory
    multi_accept on;
}

http {
    # ... other http configurations ...

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off; # Important for security and reducing response size

    # Gzip compression for text-based assets
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # ... upstream configurations for Gunicorn/Puma ...
    upstream ruby_app {
        server 127.0.0.1:8000; # Assuming Gunicorn is listening on port 8000
        # Or for Puma:
        # server unix:/path/to/your/app.sock fail_timeout=0;
    }

    server {
        listen 80;
        server_name your_domain.com www.your_domain.com;

        location / {
            proxy_pass http://ruby_app;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_read_timeout 300s; # Increase if your app has long-running requests
            proxy_connect_timeout 75s;
        }

        # Serve static assets directly from Nginx
        location ~ ^/(assets|images|javascripts|stylesheets)/ {
            root /path/to/your/rails/public; # Adjust to your Rails public directory
            expires max;
            add_header Cache-Control public;
        }
    }
}

Explanation:

worker_processes auto;: Automatically sets the number of worker processes to the number of CPU cores.
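worker_connections 4096;: A generous value (the built-in default is 512). Each connection costs some memory, so monitor your server and adjust if necessary.
multi_accept on;: Lets a worker accept all pending new connections at once instead of one at a time.
sendfile on;: Lets the kernel copy file data directly to the socket (the sendfile system call), avoiding a round trip through user space.
tcp_nopush on; and tcp_nodelay on;: tcp_nopush (used together with sendfile) sends the response headers and the start of a file in full packets; tcp_nodelay disables Nagle's algorithm on keep-alive connections so small final packets are not delayed.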
keepalive_timeout 65;: Sets the timeout for keep-alive connections.
server_tokens off;: Hides Nginx version information, a minor security enhancement.
gzip_* directives: Enable and configure Gzip compression for text-based responses.
upstream ruby_app: Defines a backend server group. Adjust the server directive to match your application server’s listening address (TCP or Unix socket).
proxy_set_header directives: Pass essential client information to the backend application.
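proxy_read_timeout and proxy_connect_timeout: proxy_read_timeout is how long Nginx waits between two successive reads from the backend (not for the whole response) before returning a 504; raise it for slow endpoints. proxy_connect_timeout bounds connection setup and, per the Nginx documentation, usually cannot exceed 75 seconds.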
Static asset location block: Offloads serving static files to Nginx for maximum performance.
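To get a feel for what gzip_comp_level trades off, the sketch below compresses a sample payload at several levels with Python's gzip module, which, like Nginx's gzip filter, is built on zlib and uses the same 1 to 9 scale. The CSS sample is made up; run it against your own assets to judge whether levels above 6 save enough bytes to justify the extra CPU per response.

```python
import gzip

# Hypothetical sample payload: repetitive CSS, the kind of text gzip_types targets.
SAMPLE = b".btn { color: #333; padding: 4px 8px; border-radius: 3px; }\n" * 500


def gzip_sizes(payload: bytes, levels=(1, 6, 9)) -> dict:
    """Compressed size of payload at each level (same 1-9 scale as gzip_comp_level)."""
    sizes = {}
    for level in levels:
        blob = gzip.compress(payload, compresslevel=level)
        if gzip.decompress(blob) != payload:  # round-trip sanity check
            raise ValueError(f"round trip failed at level {level}")
        sizes[level] = len(blob)
    return sizes


if __name__ == "__main__":
    print(f"original: {len(SAMPLE)} bytes")
    for level, size in gzip_sizes(SAMPLE).items():
        print(f"level {level}: {size} bytes")
```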

SSL/TLS Optimization

When using SSL/TLS, several directives can improve performance:

Example SSL/TLS Configuration Snippet

server {
    listen 443 ssl http2; # On Nginx 1.25.1+, use "listen 443 ssl;" together with "http2 on;"
    server_name your_domain.com www.your_domain.com;

    ssl_certificate /etc/letsencrypt/live/your_domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your_domain.com/privkey.pem;

    # Modern TLS configuration
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;
    ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384';
    ssl_session_cache shared:SSL:10m; # Adjust size based on memory and expected connections
    ssl_session_timeout 10m;
    ssl_session_tickets off; # Consider enabling if performance is critical and security implications understood

    # OCSP Stapling
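    ssl_stapling on;
    ssl_stapling_verify on;
    ssl_trusted_certificate /etc/letsencrypt/live/your_domain.com/chain.pem; # Issuer chain, needed by ssl_stapling_verify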
    resolver 8.8.8.8 8.8.4.4 valid=300s; # Use your preferred DNS resolvers
    resolver_timeout 5s;

    # HSTS (HTTP Strict Transport Security)
    add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;

    # ... rest of your server configuration ...
    location / {
        proxy_pass http://ruby_app;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 300s;
        proxy_connect_timeout 75s;
    }
}

Explanation:

ssl_protocols TLSv1.2 TLSv1.3;: Enforces modern, secure TLS versions.
ssl_ciphers: A curated list of strong, performant cipher suites.
ssl_session_cache and ssl_session_timeout: Enable session resumption to reduce handshake overhead for returning clients.
ssl_session_tickets off;: Disabling session tickets can improve security but might slightly increase handshake latency for some clients. Enable if your threat model allows.
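ssl_stapling on; and ssl_stapling_verify on;: Nginx fetches the certificate's OCSP response from the CA, caches it, and staples it to the TLS handshake, so clients do not have to query the CA's OCSP responder themselves. Verification requires the issuer chain, configured with ssl_trusted_certificate. Stapling only works if the certificate names an OCSP responder; if it does not (Let's Encrypt has phased out OCSP, for example), Nginx logs a warning and ignores these directives.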
resolver: Specifies DNS servers for OCSP stapling.
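add_header Strict-Transport-Security ...: Tells browsers to use only HTTPS for this domain (and, with includeSubDomains, its subdomains) for max-age seconds, here two years. After the first visit, the browser upgrades http:// URLs itself, skipping the HTTP-to-HTTPS redirect round trip. The preload token only matters if you submit the domain to the browser preload list, which is hard to undo.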
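Two numbers in this block are easy to sanity-check. The Nginx documentation states that one megabyte of ssl_session_cache holds about 4000 sessions, so shared:SSL:10m covers roughly 40,000 sessions cached within the ssl_session_timeout window, and the HSTS max-age of 63072000 seconds is two years. The helper names below are hypothetical.

```python
import math

SESSIONS_PER_MB = 4000           # per the Nginx docs: 1 MB of cache ~ 4000 sessions
SECONDS_PER_DAY = 24 * 60 * 60


def session_cache_mb(expected_sessions: int) -> int:
    """Smallest whole-megabyte ssl_session_cache size for the sessions expected
    to be cached at once (i.e. within ssl_session_timeout)."""
    return max(1, math.ceil(expected_sessions / SESSIONS_PER_MB))


def hsts_days(max_age_seconds: int) -> float:
    """Convert an HSTS max-age value to days."""
    return max_age_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    print(session_cache_mb(40_000))   # 10 -> shared:SSL:10m
    print(session_cache_mb(100_000))  # 25 -> shared:SSL:25m
    print(hsts_days(63_072_000))      # 730.0 (two years)
```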

Gunicorn/Puma: The Ruby Application Server

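The choice of application server depends on your stack. Puma is a Rack server written for Ruby and has been the default in Rails since version 5. Gunicorn is a Python WSGI server (its pre-fork worker model was ported from Ruby's Unicorn), so the Gunicorn guidance below applies to Python services running alongside your Ruby app, and its process model carries over to Unicorn. Either way, tuning means balancing concurrency, memory usage, and request latency.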

Gunicorn Tuning

Gunicorn’s concurrency model is primarily based on worker processes. The most critical settings are --workers and --threads (if using a threaded worker type).

Worker Type Considerations

For most WSGI applications, the default sync worker type is a sound choice. If your application spends much of its time waiting on I/O and you're comfortable with the added complexity, the gevent or eventlet worker types (which require installing the respective libraries) can offer higher concurrency with fewer processes.

Calculating Optimal Workers

A common heuristic for the sync worker type is:

(2 * number_of_cpu_cores) + 1

This formula aims to keep CPU cores busy while accounting for I/O waits. For threaded workers (like gthread), you’d adjust based on threads per worker.
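Gunicorn configuration files are plain Python, so the heuristic can be computed rather than hard-coded. A minimal sketch of a gunicorn.conf.py, loaded with gunicorn -c gunicorn.conf.py your_app.wsgi:application; sync_workers is a hypothetical helper (Gunicorn ignores names in the file that are not settings), and threads is left at 1 so the workers stay sync.

```python
# gunicorn.conf.py
# Load with: gunicorn -c gunicorn.conf.py your_app.wsgi:application
import multiprocessing


def sync_workers(cores: int) -> int:
    """(2 * cores) + 1: keeps every core busy while some workers wait on I/O."""
    return 2 * cores + 1


workers = sync_workers(multiprocessing.cpu_count())
worker_class = "sync"
threads = 1  # >1 would make Gunicorn switch to the gthread worker

bind = "127.0.0.1:8000"
timeout = 120
graceful_timeout = 120
loglevel = "info"
accesslog = "/var/log/gunicorn/access.log"
errorlog = "/var/log/gunicorn/error.log"
```

Treat the computed value as a ceiling: on a small Droplet, memory per worker often limits the worker count before CPU does.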

Example Gunicorn Command Line / Systemd Service

# Example command line
gunicorn --workers 4 \
         --threads 2 \
         --worker-class sync \
         --bind 127.0.0.1:8000 \
         --timeout 120 \
         --graceful-timeout 120 \
         --log-level info \
         --access-logfile /var/log/gunicorn/access.log \
         --error-logfile /var/log/gunicorn/error.log \
         your_app.wsgi:application # Replace with your actual WSGI application entry point

# Example Systemd service file (/etc/systemd/system/gunicorn.service)
[Unit]
Description=Gunicorn instance to serve your_app
After=network.target

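[Service]
User=your_app_user
Group=www-data
WorkingDirectory=/path/to/your/app
Environment="PATH=/path/to/your/app/venv/bin"
# systemd creates /run/gunicorn and /var/log/gunicorn owned by User, because
# the unprivileged service user cannot create files directly in /run or /var/log
RuntimeDirectory=gunicorn
LogsDirectory=gunicorn
ExecStart=/path/to/your/app/venv/bin/gunicorn \
          --workers 4 \
          --threads 2 \
          --worker-class sync \
          --bind unix:/run/gunicorn/gunicorn.sock \
          --timeout 120 \
          --graceful-timeout 120 \
          --log-level info \
          --access-logfile /var/log/gunicorn/access.log \
          --error-logfile /var/log/gunicorn/error.log \
          your_app.wsgi:application

[Install]
WantedBy=multi-user.target

Explanation:

--workers 4: Start from the heuristic above, then adjust for available memory, since each worker is a separate process with its own copy of the application.
--threads 2: Threads per worker, used by the gthread worker class. Gunicorn automatically switches a sync worker to gthread when --threads is greater than 1, so this example effectively runs gthread workers; drop --threads (or set it to 1) for pure sync workers.
--worker-class sync: The default, suitable for most CPU-bound or mixed workloads.
--bind unix:/run/gunicorn/gunicorn.sock: Binding to a Unix socket avoids TCP overhead for local traffic between Nginx and Gunicorn. Point the Nginx upstream at the same path (server unix:/run/gunicorn/gunicorn.sock;) and make sure the Nginx user can access it, for example through the www-data group.
--timeout 120: Workers silent for longer than this are killed and restarted (the default is 30 seconds). Set it above your application's longest expected request.
--graceful-timeout 120: Time allowed for in-flight requests to finish during a reload or shutdown.
your_app.wsgi:application: The Python import path to your WSGI application object.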

Puma Tuning

Puma is a multi-threaded server. Its concurrency is managed by --workers (for multi-processing) and --threads (for multi-threading within each worker).

Calculating Optimal Workers and Threads

A common starting point for Puma:

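Workers: (2 * number_of_cpu_cores) + 1 is a common starting point borrowed from Gunicorn's sync-worker heuristic. Because each Puma worker also runs several threads, you may need fewer, and memory per worker is often the real limit on a small Droplet.
Threads: 5 (the value long used in Rails' generated config/puma.rb; tune it based on how I/O-bound versus CPU-bound your app is).

The maximum number of requests processed at once is workers * threads, but that is an upper bound. On CRuby (MRI), the Global VM Lock lets only one thread per worker execute Ruby code at a time, so extra threads pay off mainly while requests wait on I/O such as database queries or HTTP calls. Puma's reactor also buffers slow clients before a request reaches the thread pool.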
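A back-of-the-envelope model makes the GVL point concrete. The sketch below is a simplification, not something Puma reports: it bounds throughput by Little's law (in-flight slots divided by average latency) and by Ruby CPU time (at most one Ruby thread per worker, and no more busy workers than cores). The function and its inputs are hypothetical; take latency and CPU time per request from your own monitoring.

```python
def puma_throughput_ceiling(workers: int, threads: int, cores: int,
                            avg_latency_s: float, ruby_cpu_s: float) -> float:
    """Rough requests/second ceiling for a Puma cluster on CRuby (simplified model).

    avg_latency_s: wall-clock time per request, including I/O waits.
    ruby_cpu_s:    time per request spent executing Ruby (holding the GVL).
    """
    # Little's law: throughput <= concurrency / latency.
    slot_bound = workers * threads / avg_latency_s
    # GVL: one Ruby thread per worker at a time, and at most `cores` run in parallel.
    gvl_bound = min(workers, cores) / ruby_cpu_s
    return min(slot_bound, gvl_bound)


if __name__ == "__main__":
    # CPU-heavy: 100 ms requests with 50 ms of Ruby CPU, on 2 cores.
    print(puma_throughput_ceiling(4, 5, 2, 0.100, 0.050))  # ~40: more threads won't help
    # I/O-heavy: 200 ms requests with 10 ms of Ruby CPU, on 2 cores.
    print(puma_throughput_ceiling(2, 5, 2, 0.200, 0.010))  # ~50: more threads would help
```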

Example Puma Command Line / Systemd Service

# Example command line
puma -w 4 -t 5 -b unix:///path/to/your/app.sock \
     --pidfile tmp/pids/puma.pid \
     --state tmp/pids/puma.state \
     --redirect-stdout log/puma.log \
     --redirect-stderr log/puma_error.log \
     --control-url tcp://127.0.0.1:9292 \
     --environment production

# Example Systemd service file (/etc/systemd/system/puma.service)
[Unit]
Description=Puma HTTP Server for your_app
After=network.target

[Service]
User=your_app_user
Group=www-data
WorkingDirectory=/path/to/your/app
Environment="RAILS_ENV=production"
# systemd creates /run/puma owned by User; the service user cannot write to /run itself.
# Puma's output goes to the journal (journalctl -u puma).
RuntimeDirectory=puma
ExecStart=/path/to/your/app/bin/puma -w 4 -t 5 -b unix:///run/puma/puma.sock \
          --pidfile /run/puma/puma.pid \
          --state /run/puma/puma.state \
          --control-url tcp://127.0.0.1:9292 \
          -C config/puma.rb
# Without a config file, drop -C and keep the options above.

[Install]
WantedBy=multi-user.target

Explanation:

-w 4: Number of worker processes.
-t 5: Number of threads per worker (Puma also accepts a min:max range, such as 1:5).
-b unix:///run/puma/puma.sock: Binding to a Unix socket. Point the Nginx upstream at the same path (server unix:/run/puma/puma.sock;).
--control-url tcp://127.0.0.1:9292: Enables Puma's control server, which pumactl uses for status checks and management (for example, pumactl -S /run/puma/puma.state stats).
-C config/puma.rb: Puma can also be configured via a Ruby file, offering more granular control. Options given on the command line take precedence over the file.

Elasticsearch Performance Tuning

Elasticsearch, while not directly part of the Ruby application stack, is often a critical dependency for search functionality. Its performance directly impacts user experience. Tuning involves JVM heap, shard allocation, and indexing strategies.

JVM Heap Size Configuration

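Elasticsearch runs on the JVM, so heap size matters. Elastic's guidance is to set the minimum (Xms) and maximum (Xmx) heap to the same value, no more than 50% of the system's RAM, and below the threshold for compressed object pointers (just under 32GB; about 26GB is safe on most systems). Leave the rest of RAM to the operating system's file cache, which Lucene relies on heavily. Since version 7.11, Elasticsearch sizes the heap automatically; to override it, put the options in a file under jvm.options.d/ rather than editing jvm.options itself.

Example jvm.options Snippet

# /etc/elasticsearch/jvm.options.d/heap.options (or similar path; the file name is your choice)

# Xms is the initial heap size and Xmx the maximum; keep them equal.
# Example for a server with 32GB RAM (50% = 16GB). On a 64GB server, stay
# below the compressed-oops cutoff (about 26-30GB) rather than using 32GB.
-Xms16g
-Xmx16g

# Other JVM options...
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/lib/elasticsearch
-XX:+ExitOnOutOfMemoryError

Note: After changing JVM options, you must restart the Elasticsearch service.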
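The sizing rule can be written as a small helper that emits the two heap lines. This is a sketch under stated assumptions: the 26GB cap is the figure Elastic describes as safe on most systems, and you should confirm on your own JVM that compressed object pointers stay enabled at the size you pick. The function name is hypothetical.

```python
def heap_options(ram_mb: int, cap_mb: int = 26 * 1024) -> list[str]:
    """-Xms/-Xmx lines: half of RAM, capped below the compressed-oops cutoff.

    cap_mb is an assumption (~26 GB, described by Elastic as safe on most
    systems); raise it only after confirming compressed oops stay enabled.
    """
    heap_mb = min(ram_mb // 2, cap_mb)
    # Xms == Xmx so the JVM never resizes the heap at runtime.
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


if __name__ == "__main__":
    for ram_gb in (4, 32, 64):
        print(ram_gb, "GB RAM ->", heap_options(ram_gb * 1024))
```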

Shard Allocation and Sizing

The number and size of shards significantly impact search and indexing performance. Aim for shard sizes between 10GB and 50GB. Too many small shards increase overhead; too few large shards can hinder rebalancing and recovery.
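Because the number of primary shards is fixed when an index is created (changing it later means a shrink, split, or reindex), it is worth estimating up front. A sketch that divides the expected index size by a target shard size inside the 10 to 50GB band; the 30GB default and the function name are assumptions for illustration.

```python
import math


def primary_shards(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Primary shard count that keeps each shard near target_shard_gb."""
    if not 10 <= target_shard_gb <= 50:
        raise ValueError("target_shard_gb should stay within the 10-50 GB guideline")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


if __name__ == "__main__":
    print(primary_shards(120))      # 4 shards of ~30 GB
    print(primary_shards(5))        # 1: a small index needs only one shard
    print(primary_shards(200, 50))  # 4 shards of ~50 GB
```

Base the estimate on the size you expect the index to reach, not its size on day one.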

Shard Allocation Awareness and Filtering

For multi-node clusters, use shard allocation awareness to ensure shards are distributed across different physical racks or availability zones. You can also use allocation filtering to control where shards are placed.

Example Cluster Settings (via API)

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "zone",
    "cluster.routing.allocation.enable": "all",
    "cluster.routing.allocation.exclude._ip": "192.168.1.10"
  }
}

Explanation:

cluster.routing.allocation.awareness.attributes: "zone": Tells Elasticsearch to consider the ‘zone’ attribute when allocating shards. Each node must declare it in elasticsearch.yml, for example node.attr.zone: zone-a.
cluster.routing.allocation.enable: "all": Ensures shards are allocated. Other options include primaries, new_primaries, and none.
cluster.routing.allocation.exclude._ip: "192.168.1.10": Moves shards off a node and keeps new ones away (e.g., for maintenance). Reset it to null when the work is done. It is set as persistent here because transient cluster settings are deprecated since Elasticsearch 7.16.
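The same settings can be applied from a script. A minimal sketch using only Python's standard library, assuming an unauthenticated cluster at http://localhost:9200 (a hypothetical address; production clusters usually require TLS and credentials). It builds the drain request for maintenance and the matching reset, where null removes the setting.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unauthenticated cluster


def cluster_settings_request(persistent: dict, base_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT _cluster/settings request."""
    body = json.dumps({"persistent": persistent}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Before maintenance: move shards off the node.
drain = cluster_settings_request({"cluster.routing.allocation.exclude._ip": "192.168.1.10"})
# After maintenance: None serializes to null, which resets the setting.
undrain = cluster_settings_request({"cluster.routing.allocation.exclude._ip": None})

# To send one: urllib.request.urlopen(drain)
```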

Indexing Performance

During indexing, you can temporarily disable replica shards and refresh intervals to improve throughput. Remember to re-enable them afterward.

Example Indexing Settings (via API)

PUT /your_index/_settings
{
  "index": {
    "number_of_replicas": 0,
    "refresh_interval": "-1"
  }
}

After indexing is complete, revert these settings (use your desired replica count and refresh interval; 1s is the default interval):

PUT /your_index/_settings
{
  "index": {
    "number_of_replicas": 1,
    "refresh_interval": "1s"
  }
}

Note: Setting refresh_interval to -1 disables periodic refreshes, so newly indexed documents are not searchable until the interval is restored or you trigger a refresh manually (POST /your_index/_refresh). Setting number_of_replicas to 0 leaves a single copy of each shard while indexing, so a node failure during the load can lose data that you would have to re-index.
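Since a failed or interrupted load can leave the index without replicas and without refreshes, it helps to wrap the load so the settings are restored no matter what. A sketch using Python's standard library, assuming an unauthenticated cluster at http://localhost:9200 and a caller-supplied load function (both hypothetical); the send parameter lets the HTTP layer be swapped out, for example in tests.

```python
import json
import urllib.request
from typing import Callable, Optional

ES_URL = "http://localhost:9200"  # assumption: local, unauthenticated cluster

FAST_LOAD = {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}
NORMAL = {"index": {"number_of_replicas": 1, "refresh_interval": "1s"}}

Sender = Callable[[str, str, Optional[dict]], None]


def http_send(method: str, path: str, body: Optional[dict]) -> None:
    """Send one JSON request to the cluster and discard the response body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(ES_URL + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        resp.read()


def bulk_load(index: str, load: Callable[[], None], send: Sender = http_send) -> None:
    """Relax replicas and refresh, run the load, then always restore and refresh."""
    send("PUT", f"/{index}/_settings", FAST_LOAD)
    try:
        load()
    finally:
        send("PUT", f"/{index}/_settings", NORMAL)
        send("POST", f"/{index}/_refresh", None)
```

Usage: bulk_load("your_index", my_loader), where my_loader is your own function issuing the _bulk requests. Adjust NORMAL to your usual replica count and refresh interval.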

Monitoring and Iterative Tuning

Performance tuning is not a one-time task. Continuous monitoring is essential. Key metrics to track include:

Nginx: Active connections, requests per second, error rates (5xx, 4xx), worker connections, upstream response times.
Gunicorn/Puma: Request latency, worker utilization, memory usage per worker, error rates, queue lengths (if applicable).
Elasticsearch: JVM heap usage, CPU utilization, disk I/O, search latency, indexing rate, shard status, cluster health (green, yellow, red).
System: CPU load, memory usage, disk I/O, network traffic.

Tools like Prometheus with Grafana, Datadog, New Relic, or even basic system monitoring tools (htop, vmstat, iostat) are invaluable. Regularly review these metrics, identify bottlenecks, and make incremental adjustments to your configurations. Always test changes in a staging environment before deploying to production.
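For Nginx, a cheap source of the connection metrics listed above is ngx_http_stub_status_module, which is usually built into packaged Nginx (check with nginx -V). Assuming a location such as /nginx_status with stub_status enabled and restricted to localhost (the path is hypothetical), this sketch parses its plain-text output; the sample is the one shown in the Nginx documentation. A gap between accepts and handled means connections were dropped because a resource limit, such as worker_connections, was reached.

```python
import re

# Sample output in the format documented for ngx_http_stub_status_module.
SAMPLE = (
    "Active connections: 291 \n"
    "server accepts handled requests\n"
    " 16630948 16630948 31070465 \n"
    "Reading: 6 Writing: 179 Waiting: 106 \n"
)


def _ints(pattern: str, text: str) -> list[int]:
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise ValueError(f"unexpected stub_status output: {pattern!r} not found")
    return [int(g) for g in match.groups()]


def parse_stub_status(text: str) -> dict:
    (active,) = _ints(r"Active connections:\s*(\d+)", text)
    accepts, handled, requests = _ints(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text)
    reading, writing, waiting = _ints(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text)
    return {
        "active": active, "accepts": accepts, "handled": handled,
        "requests": requests, "reading": reading, "writing": writing,
        "waiting": waiting,
        # accepts > handled means connections were dropped (resource limits hit)
        "dropped": accepts - handled,
    }


if __name__ == "__main__":
    print(parse_stub_status(SAMPLE))
    # Live: urllib.request.urlopen("http://127.0.0.1/nginx_status").read().decode()
```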
