The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and MongoDB on AWS for Ruby

Nginx Tuning for High-Traffic Ruby Applications

Optimizing Nginx as a reverse proxy and static file server is crucial for any high-traffic Ruby application. We’ll focus on key directives that directly impact performance and resource utilization, particularly when serving dynamic content proxied to Gunicorn or Puma (for Python/Ruby respectively) or PHP-FPM.

Worker Processes and Connections

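The worker_processes directive determines how many worker processes Nginx spawns. Setting it to auto is generally recommended, since Nginx then starts one worker per detected CPU core. The worker_connections directive sets the maximum number of simultaneous connections that each worker process can open. That count includes connections to proxied backends as well as connections from clients, so a proxied request consumes two. The built-in default is 512 and a common starting point is 1024, but this should be tuned based on your application’s concurrency needs, the server’s available RAM, and the open file descriptor limit (see worker_rlimit_nofile), since every connection uses at least one descriptor.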
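
As a rough illustration of how these two directives bound capacity, the Ruby sketch below multiplies them out. It assumes each proxied client occupies two connections (one from the client, one to the backend). The method name estimated_max_clients and the use of Etc.nprocessors as a stand-in for what worker_processes auto detects are illustrative choices, not anything Nginx provides.

```ruby
require 'etc'

# Rough ceiling on simultaneous clients for one Nginx instance.
# connections_per_client is 2 when proxying (client side + upstream side)
# and 1 when Nginx only serves static files.
def estimated_max_clients(worker_processes:, worker_connections:, connections_per_client: 2)
  (worker_processes * worker_connections) / connections_per_client
end

# Assumption: the core count Ruby sees matches what `worker_processes auto` detects.
cores = Etc.nprocessors
clients = estimated_max_clients(worker_processes: cores, worker_connections: 4096)
puts "cores=#{cores} estimated proxied clients=#{clients}"
```

Treat the result as an upper bound only; file descriptor limits, memory, and the backend usually saturate well before it is reached.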

Keepalive Connections

The keepalive_timeout directive controls how long an idle keep-alive client connection stays open (75 seconds by default). Reusing a connection avoids the overhead of establishing a new TCP connection for subsequent requests from the same client. A value between 60 and 120 seconds is a good balance, preventing idle connections from consuming resources indefinitely while still benefiting from connection reuse. keepalive_requests limits the number of requests that can be served over a single keep-alive connection; once the limit is reached Nginx closes the connection, which frees the memory allocated to it.

Buffering and Timeouts

Nginx uses buffers to handle requests and responses. Sizing client_body_buffer_size and client_header_buffer_size to fit typical requests avoids writing small request bodies to temporary files on disk. For proxied requests, proxy_buffers and proxy_buffer_size are critical: proxy_buffer_size holds the first part of the backend response (normally the headers), and proxy_buffers sets the number and size of the buffers used for the rest. Size them to hold a typical response from your backend application; larger responses spill to a temporary file on disk. Timeouts like proxy_connect_timeout, proxy_send_timeout, and proxy_read_timeout limit how long Nginx waits on a slow or unresponsive backend. All three default to 60 seconds, and values between 30 and 60 seconds are typical.
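
Buffer settings translate directly into memory, so it helps to multiply them out before raising them. The sketch below is a rough upper bound, assuming every proxied connection fills proxy_buffer_size plus all of its proxy_buffers at the same moment. The method names and the figure of 2048 concurrent proxied connections are hypothetical.

```ruby
KIB = 1024

# Bytes of response buffering one proxied connection can hold in memory:
# proxy_buffer_size plus (number * size) from proxy_buffers.
def proxy_buffer_bytes_per_connection(buffer_size:, buffers_count:, buffers_size:)
  buffer_size + (buffers_count * buffers_size)
end

# Worst case across all proxied connections that are active at once.
def worst_case_buffer_bytes(per_connection:, proxied_connections:)
  per_connection * proxied_connections
end

# proxy_buffer_size 16k; proxy_buffers 4 32k;
per_conn = proxy_buffer_bytes_per_connection(buffer_size: 16 * KIB, buffers_count: 4, buffers_size: 32 * KIB)
total = worst_case_buffer_bytes(per_connection: per_conn, proxied_connections: 2048)
puts "per connection: #{per_conn / KIB} KiB, worst case: #{total / (KIB * KIB)} MiB"
```

In practice most connections use far less than the worst case, but the calculation shows why large buffers and high connection counts should not be raised together without checking available RAM.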

Gzip Compression

Enabling Gzip compression significantly reduces the amount of data transferred over the network, improving page load times. Directives like gzip on, gzip_vary on, gzip_proxied any, gzip_comp_level (typically 4-6), and gzip_types are essential. Ensure you include common MIME types for your application’s assets.
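
To get a feel for the trade-off behind gzip_comp_level, the sketch below compresses a made-up JSON payload with Ruby’s Zlib at several levels. Nginx relies on the same zlib compression levels, but the payload and the sizes it produces here are illustrative only; measure with your own responses before choosing a level.

```ruby
require 'json'
require 'zlib'

# Hypothetical JSON payload standing in for an API response.
payload = JSON.generate(
  Array.new(200) { |i| { id: i, name: "item-#{i}", tags: %w[alpha beta gamma] } }
)

# Compress at a fast, a middle, and the maximum level and report the sizes.
[1, 6, 9].each do |level|
  compressed = Zlib.gzip(payload, level: level)
  puts "level #{level}: #{payload.bytesize} -> #{compressed.bytesize} bytes"
end
```

Higher levels typically shave off a little more size at a noticeably higher CPU cost, which is why the mid-range values are the usual recommendation.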

Nginx Configuration Snippet

Here’s a sample Nginx configuration snippet incorporating these optimizations. Remember to adjust values based on your specific AWS instance type and application load.

worker_processes auto;
events {
    worker_connections 4096; # Adjust based on RAM and expected concurrency
    multi_accept on;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile        on;
    tcp_nopush      on;
    tcp_nodelay     on;

    keepalive_timeout 65;
    keepalive_requests 1000;

    # Buffering
    client_body_buffer_size    10K;
    client_header_buffer_size  1K;
    client_header_timeout      30s;
    client_body_timeout        30s;
    large_client_header_buffers  2 8k; # For potentially large headers

    # Gzip Compression
    gzip on;
    gzip_vary on;
    gzip_proxied any; # Compress all proxied responses
    gzip_comp_level 6;
    gzip_min_length 256; # Don't compress very small responses
    gzip_types text/plain text/css application/json application/javascript application/x-javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml;

    # Proxy settings (example for Gunicorn/Puma)
    proxy_connect_timeout       60s;
    proxy_send_timeout          60s;
    proxy_read_timeout          60s;
    proxy_buffer_size           16k;
    proxy_buffers               4 32k;
    proxy_busy_buffers_size     64k;
    proxy_temp_file_write_size  64k;

    # SSL configuration (if applicable)
    # ssl_protocols TLSv1.2 TLSv1.3;
    # ssl_prefer_server_ciphers on;
    # ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:...';

    server {
        listen 80;
        server_name your_domain.com;

        location / {
            proxy_pass http://your_backend_app_address; # e.g., http://127.0.0.1:8000
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        # Serve static files directly
        location /static/ {
            alias /path/to/your/static/files/;
            expires 30d;
            access_log off;
            add_header Cache-Control "public";
        }

        # Health check endpoint
        location /health {
            access_log off;
            default_type text/plain; # Sets Content-Type for the body below; add_header would emit a second Content-Type header
            return 200 "OK";
        }
    }

    # Include other server blocks or configurations as needed
}

Gunicorn/Puma Tuning for Ruby/Python Applications

When using Gunicorn (Python) or Puma (Ruby) as your WSGI/Rack HTTP server, tuning its worker processes and threads is paramount. These servers bridge the gap between Nginx and your application code.

Worker Processes and Threads

The number of worker processes (--workers in both Gunicorn and Puma) is commonly started at (2 * Number of CPU Cores) + 1, the rule of thumb from Gunicorn’s documentation. The extra workers keep the CPU busy while others wait on I/O. For CPU-bound applications, a number closer to the core count is usually more appropriate to avoid excessive context switching, and every additional worker also costs a full copy of the application in memory. Threads (--threads N in Gunicorn, --threads MIN:MAX in Puma) allow a single worker process to handle multiple requests concurrently. The optimal number of threads depends heavily on your application’s I/O patterns. A common starting point for Puma is 5 threads per worker.
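
The formula and its effect on concurrency can be sketched in a few lines of Ruby. The method names are hypothetical, and Etc.nprocessors is assumed to report the cores available to the application server.

```ruby
require 'etc'

# The (2 * cores) + 1 starting point for worker processes.
def suggested_workers(cores)
  (2 * cores) + 1
end

# Upper bound on requests one application server can process at once:
# each worker runs up to `threads` requests concurrently.
def max_in_flight_requests(workers:, threads:)
  workers * threads
end

cores = Etc.nprocessors
workers = suggested_workers(cores)
in_flight = max_in_flight_requests(workers: workers, threads: 5)
puts "cores=#{cores} workers=#{workers} max in-flight requests=#{in_flight}"
```

The in-flight figure is also a useful input when sizing database connection pools, since every one of those requests may need a connection.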

Timeouts and Queues

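Gunicorn’s --timeout sets how long a worker may stay silent before the master process kills and restarts it (30 seconds by default). It should be set slightly higher than your application’s longest expected request processing time. Puma has no equivalent command-line flag: its worker_timeout setting in config/puma.rb only detects hung workers in cluster mode, so per-request limits are usually enforced in the application, for example with the rack-timeout gem. Persistent connections reduce latency and are controlled by --keep-alive in Gunicorn and by persistent_timeout in config/puma.rb for Puma. Gunicorn’s --worker-connections is not deprecated, but it only applies to specific worker classes such as gevent and eventlet and has no effect on the default sync worker, where concurrency comes from workers and threads. In Puma, the queue_requests setting (enabled by default) lets the server buffer incoming requests while all worker threads are busy.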
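
Because timeouts exist at several layers, it is worth checking that they are ordered sensibly. The sketch below is one hypothetical way to do that: it assumes a backend that sends nothing until the response is ready, so that Nginx’s proxy_read_timeout effectively caps the whole request. The method name and the 45-second longest request are assumptions for illustration.

```ruby
# Returns human-readable warnings for a chain of timeouts (all in seconds).
def timeout_warnings(longest_request:, app_server_timeout:, proxy_read_timeout:)
  warnings = []
  if app_server_timeout <= longest_request
    warnings << "app server timeout (#{app_server_timeout}s) does not exceed " \
                "the longest expected request (#{longest_request}s)"
  end
  if proxy_read_timeout < app_server_timeout
    warnings << "proxy_read_timeout (#{proxy_read_timeout}s) is shorter than the " \
                "app server timeout (#{app_server_timeout}s); Nginx may return 504 " \
                "while the worker is still running"
  end
  warnings
end

# Values taken from the examples in this article: Gunicorn --timeout 120
# behind an Nginx proxy_read_timeout of 60s.
puts timeout_warnings(longest_request: 45, app_server_timeout: 120, proxy_read_timeout: 60)
```

With those values the check flags the mismatch between the proxy and the application server, which is a common source of unexplained 504 responses.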

Gunicorn Configuration Example

Here’s a typical Gunicorn command-line invocation for a Python application:

gunicorn --workers 3 \
         --threads 2 \
         --timeout 120 \
         --bind 0.0.0.0:8000 \
         your_project.wsgi:application

Puma Configuration Example

And a Puma command-line invocation for a Ruby application:

bundle exec puma -C config/puma.rb

Where config/puma.rb might contain:

# config/puma.rb
workers 5 # (2 * CPU cores) + 1, assuming a 2-core instance
threads 0, 5 # Min threads 0, Max threads 5

environment ENV.fetch('RAILS_ENV') { 'production' }

bind 'unix:///path/to/your/app.sock' # Or 'tcp://0.0.0.0:9292' if not using a Unix socket

# Other settings like pidfile, log_requests, etc.

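MongoDB Tuning on AWS (EC2/DocumentDB)

Optimizing MongoDB performance on AWS involves both instance-level tuning and MongoDB configuration adjustments. We’ll cover both scenarios: running MongoDB yourself on EC2, and using a managed service. Note that Amazon RDS does not offer a MongoDB engine; the AWS-managed option is Amazon DocumentDB (with MongoDB compatibility), which implements the MongoDB API rather than running mongod itself.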

Instance Selection and Storage

For EC2, choose instance types with sufficient CPU, RAM, and network throughput. i3, i4i, and m5d/r5d instances with local NVMe SSDs are excellent for performance-critical workloads due to low latency and high IOPS, but instance store volumes are ephemeral (data is lost when the instance stops or terminates), so pair them with a replica set and backups. On EBS-backed instances, provision sufficient IOPS for the volume. For a managed service, select an instance class that matches your performance needs. MongoDB’s performance is often I/O bound, so storage performance is paramount.

MongoDB Configuration (`mongod.conf`)

Key parameters in mongod.conf (or /etc/mongod.conf) include:

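storage.wiredTiger.engineConfig.cacheSizeGB: This is arguably the most critical setting. The value is an absolute size in GB, not a fraction of RAM. By default WiredTiger uses the larger of 50% of (RAM - 1 GB) or 256 MB, and the remaining memory still serves reads through the filesystem cache. Values up to 75% of RAM are sometimes suggested for dedicated hosts, but MongoDB’s documentation advises against going above the default without a measured reason, so leave enough for the OS and other processes.
storage.journal.enabled: Always keep this enabled for durability. From MongoDB 6.1 onward journaling is always on and the option has been removed, so omit it on those versions.
operationProfiling.mode: Set to slowOp or all for performance analysis, but disable in production unless actively debugging.
net.bindIp: Ensure this is set correctly to allow connections from your application servers. Prefer specific private IPs; 0.0.0.0 listens on every interface and should only be used behind restrictive security groups with authentication enabled.
sharding.clusterRole: Required if the node is part of a sharded cluster (configsvr or shardsvr).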
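
Since cacheSizeGB takes an absolute number of gigabytes, it helps to compute a concrete baseline before overriding it. The sketch below reproduces the default that MongoDB documents for the WiredTiger internal cache: the larger of 50% of (RAM - 1 GB) or 256 MB. The method name is hypothetical.

```ruby
# Default WiredTiger internal cache size in GB for a host with `ram_gb` of RAM:
# the larger of 50% of (RAM - 1 GB) or 0.25 GB (256 MB).
def default_wiredtiger_cache_gb(ram_gb)
  [0.5 * (ram_gb - 1), 0.25].max
end

[2, 8, 32].each do |ram_gb|
  puts "#{ram_gb} GB RAM -> #{default_wiredtiger_cache_gb(ram_gb)} GB cache"
end
```

Comparing a proposed cacheSizeGB against this baseline makes it obvious when a value was mistakenly entered as a fraction rather than a size.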

MongoDB Configuration Snippet (EC2)

# /etc/mongod.conf
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4 # Absolute size in GB, not a fraction: roughly 50% of an 8GB RAM instance
    collectionConfig:
      blockCompressor: snappy # Or zstd for a better compression ratio at higher CPU cost
    indexConfig:
      prefixCompression: true

operationProfiling:
  mode: off # Set to 'slowOp' or 'all' for debugging, then turn off

net:
  port: 27017
  bindIp: 0.0.0.0 # Or specific private IPs for security

# Sharding settings (if applicable; sharded members must also set replication.replSetName)
# sharding:
#   clusterRole: configsvr # Or shardsvr for a shard member

# Replication settings (if applicable)
# replication:
#   replSetName: rs0

Amazon DocumentDB Specifics

Amazon RDS has no MongoDB engine, so the managed option on AWS is Amazon DocumentDB (with MongoDB compatibility). It exposes the MongoDB API but does not run mongod, so the low-level configuration is managed by AWS. Your primary tuning levers are:

Instance Class: Choose an appropriate class (e.g., db.r6g, db.r5) with sufficient RAM and vCPUs.
Storage and IOPS: Cluster storage scales automatically and is not provisioned as gp3 or io1/io2 EBS volumes. Those choices apply when you run MongoDB yourself on EC2, where gp3 allows independent scaling of IOPS and throughput; provision enough to meet your application’s read/write demands.
Parameter Groups: While AWS manages many parameters, you can tune some via custom cluster parameter groups, such as the profiler settings. There is no WiredTiger cache to size, so wiredTigerCacheSizeGB does not apply.
Monitoring: Leverage CloudWatch metrics (CPU utilization, freeable memory, network throughput, IOPS, latency) to identify bottlenecks.

Indexing Strategy

Regardless of deployment method, a robust indexing strategy is non-negotiable for MongoDB performance. Regularly analyze slow queries by querying the profiler’s output collection with db.system.profile.find(), by reading the slow-operation entries in the mongod log, or by running explain() on suspect queries. Ensure that your application’s most frequent and critical queries are supported by appropriate indexes. Avoid over-indexing, as indexes consume disk space and slow down write operations.

Connection Pooling

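Ensure your Ruby/Python application is using connection pooling for MongoDB. The official mongo Ruby driver (which Mongoid uses; the older Moped driver is no longer maintained) and PyMongo (Python) manage connection pools automatically. Pools are per process, so configure the pool size (max_pool_size in the Ruby driver, maxPoolSize in PyMongo) to at least match the number of threads in each worker process. This keeps threads from waiting on a connection and improves latency, while keeping the total across all workers below what the database can accept.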
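
The total number of connections a deployment can open grows multiplicatively, which is easy to underestimate. The sketch below assumes each worker process holds its own pools and that the driver keeps one pool per MongoDB server it connects to (for example, three for a three-member replica set). The method name and the host and worker counts are hypothetical.

```ruby
# Upper bound on connections an application tier can open against MongoDB.
def max_mongo_connections(app_hosts:, workers_per_host:, pool_size:, mongo_servers: 1)
  app_hosts * workers_per_host * pool_size * mongo_servers
end

threads_per_worker = 5
pool_size = threads_per_worker # one connection per thread that may query at once

total = max_mongo_connections(
  app_hosts: 4,
  workers_per_host: 5,
  pool_size: pool_size,
  mongo_servers: 3
)
puts "up to #{total} connections across the deployment"
```

Compare the result with the connection limit of your database instance before raising worker counts or pool sizes.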