The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and MongoDB on AWS for Ruby
Nginx Tuning for High-Traffic Ruby Applications
Optimizing Nginx as a reverse proxy and static file server is crucial for any high-traffic Ruby application. We’ll focus on key directives that directly impact performance and resource utilization, particularly when serving dynamic content proxied to Gunicorn or Puma (for Python/Ruby respectively) or PHP-FPM.
Worker Processes and Connections
The worker_processes directive determines how many worker processes Nginx spawns. Setting it to auto is generally recommended, since Nginx then starts one worker per detected CPU core. The worker_connections directive sets the maximum number of simultaneous connections each worker process can hold open. That count includes connections to proxied backends as well as connections from clients, so a proxied request uses two. A common starting point is 1024, but this should be tuned based on your application’s concurrency needs, the server’s available RAM, and the open-file limit of the worker processes.
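As a rough illustration of how these two directives combine, the following Ruby sketch estimates an upper bound on simultaneous clients. The helper name estimated_max_clients and the sample values are assumptions made for this example; real capacity also depends on memory, file descriptor limits, and backend speed.

```ruby
require "etc"

# Rough upper bound on simultaneous clients Nginx can serve.
# connections_per_request is 2 when proxying (client + upstream), 1 for static files.
def estimated_max_clients(worker_processes:, worker_connections:, connections_per_request: 2)
  (worker_processes * worker_connections) / connections_per_request
end

cores = Etc.nprocessors # what `worker_processes auto` would typically resolve to
puts "Detected cores: #{cores}"
puts "Proxied capacity: #{estimated_max_clients(worker_processes: cores, worker_connections: 1024)}"
```

Treat the result as a ceiling for sizing, not as a throughput prediction.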
Keepalive Connections
Keepalive connections reduce the overhead of establishing new TCP connections for subsequent requests from the same client. The keepalive_timeout directive sets how long an idle keepalive connection stays open. A value between 60 and 120 seconds is a good balance, preventing idle connections from consuming resources indefinitely while still benefiting from connection reuse. keepalive_requests limits the number of requests that can be served over a single keepalive connection; closing connections periodically lets Nginx free the memory allocated per connection.
Buffering and Timeouts
Nginx uses buffers to handle requests and responses. Tuning client_body_buffer_size and client_header_buffer_size keeps small requests in memory; a request body larger than client_body_buffer_size is written to a temporary file on disk. For proxied requests, proxy_buffer_size and proxy_buffers are critical: the former holds the first part of the backend response (normally the headers), the latter the rest. Size them to hold a typical response from your backend application, since responses that do not fit spill to temporary files. Timeouts like proxy_connect_timeout, proxy_send_timeout, and proxy_read_timeout are vital to prevent Nginx from holding connections open indefinitely if the backend is slow or unresponsive. Values between 30 and 60 seconds are typical.
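To get a feel for what a given buffer configuration costs in memory, here is a small Ruby sketch. It assumes a rough worst case in which each proxied connection uses proxy_buffer_size plus every buffer from proxy_buffers. The helper names and the figure of 2,000 concurrent connections are hypothetical.

```ruby
# Convert an Nginx size string such as "16k" or "1m" to bytes.
def nginx_size_to_bytes(size)
  match = size.strip.match(/\A(\d+)([kKmM]?)\z/)
  raise ArgumentError, "unrecognized size: #{size.inspect}" unless match

  multiplier = { "" => 1, "k" => 1024, "m" => 1024 * 1024 }.fetch(match[2].downcase)
  match[1].to_i * multiplier
end

# Approximate per-connection response buffer memory:
# proxy_buffer_size plus (number * size) from proxy_buffers.
def proxy_buffer_bytes(proxy_buffer_size:, buffer_count:, buffer_size:)
  nginx_size_to_bytes(proxy_buffer_size) + buffer_count * nginx_size_to_bytes(buffer_size)
end

per_connection = proxy_buffer_bytes(proxy_buffer_size: "16k", buffer_count: 4, buffer_size: "32k")
concurrent = 2_000 # hypothetical number of simultaneous proxied responses
puts "Per connection: #{per_connection / 1024} KiB"
puts "At #{concurrent} connections: #{per_connection * concurrent / (1024 * 1024)} MiB"
```

Nginx allocates these buffers on demand, so actual usage is normally lower than this estimate.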
Gzip Compression
Enabling Gzip compression significantly reduces the amount of data transferred over the network, improving page load times. Directives like gzip on, gzip_vary on, gzip_proxied any, gzip_comp_level (typically 4-6), and gzip_types are essential. Ensure you include common MIME types for your application’s assets.
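The trade-off behind gzip_comp_level can be explored offline with Ruby’s standard Zlib library. This is only a sketch: the helper gzip_sizes and the sample JSON payload are made up for illustration, and Nginx’s results on your own responses will differ.

```ruby
require "zlib"

# Compressed size in bytes at each gzip level, mirroring gzip_comp_level.
def gzip_sizes(payload, levels: [1, 4, 6, 9])
  levels.to_h { |level| [level, Zlib.gzip(payload, level: level).bytesize] }
end

# Hypothetical payload: repetitive JSON, similar to a typical API response.
records = Array.new(500) { |i| %({"id":#{i},"status":"active","role":"member"}) }
payload = "[#{records.join(",")}]"

puts "Original: #{payload.bytesize} bytes"
gzip_sizes(payload).each do |level, size|
  ratio = (100.0 * size / payload.bytesize).round(1)
  puts "Level #{level}: #{size} bytes (#{ratio}% of original)"
end
```

Higher levels usually shrink the output only slightly more while costing more CPU, which is why mid-range levels are the typical choice.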
Nginx Configuration Snippet
Here’s a sample Nginx configuration snippet incorporating these optimizations. Remember to adjust values based on your specific AWS instance type and application load.
worker_processes auto;

events {
    worker_connections 4096; # Adjust based on RAM and expected concurrency
    multi_accept on;
}

http {
    include mime.types;
    default_type application/octet-stream;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    keepalive_timeout 65;
    keepalive_requests 1000;

    # Buffering
    client_body_buffer_size 10K;
    client_header_buffer_size 1K;
    client_header_timeout 30s;
    client_body_timeout 30s;
    large_client_header_buffers 2 8k; # For potentially large headers

    # Gzip Compression
    gzip on;
    gzip_vary on;
    gzip_proxied any; # Compress all proxied responses
    gzip_comp_level 6;
    gzip_min_length 256; # Don't compress very small responses
    gzip_types text/plain text/css application/json application/javascript application/x-javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml;

    # Proxy settings (example for Gunicorn/Puma)
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;
    proxy_buffer_size 16k;
    proxy_buffers 4 32k;
    proxy_busy_buffers_size 64k;
    proxy_temp_file_write_size 64k;

    # SSL configuration (if applicable)
    # ssl_protocols TLSv1.2 TLSv1.3;
    # ssl_prefer_server_ciphers on;
    # ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:...';

    server {
        listen 80;
        server_name your_domain.com;

        location / {
            proxy_pass http://your_backend_app_address; # e.g., http://127.0.0.1:8000
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        # Serve static files directly
        location /static/ {
            alias /path/to/your/static/files/;
            expires 30d;
            access_log off;
            add_header Cache-Control "public";
        }

        # Health check endpoint
        location /health {
            access_log off;
            default_type text/plain; # Sets Content-Type; add_header would emit a second Content-Type header
            return 200 "OK";
        }
    }

    # Include other server blocks or configurations as needed
}
Gunicorn/Puma Tuning for Ruby/Python Applications
When using Gunicorn (Python) or Puma (Ruby) as your WSGI/Rack HTTP server, tuning its worker processes and threads is paramount. These servers bridge the gap between Nginx and your application code.
Worker Processes and Threads
The number of worker processes (--workers in both Gunicorn and Puma) is often started at (2 * Number of CPU Cores) + 1, the rule of thumb given in the Gunicorn documentation. The formula assumes that some workers are waiting on I/O at any given moment. For CPU-bound applications, a number closer to the core count is more appropriate to avoid excessive context switching. Threads (--threads in Gunicorn, --threads MIN:MAX in Puma) allow a single worker process to handle multiple requests concurrently. The optimal number of threads depends heavily on your application’s I/O patterns. A common starting point for Puma is 5 threads per worker.
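The sizing rule above can be sketched in a few lines of Ruby. The helper names suggested_workers and max_concurrent_requests are assumptions for this example, and the result is a starting point for load testing, not a final setting.

```ruby
require "etc"

# Rule of thumb from the text: (2 * cores) + 1 worker processes.
def suggested_workers(cores)
  (2 * cores) + 1
end

# Upper bound on requests one app server can process at the same time.
def max_concurrent_requests(workers:, threads_per_worker:)
  workers * threads_per_worker
end

cores = Etc.nprocessors
workers = suggested_workers(cores)
puts "Cores: #{cores}, suggested workers: #{workers}"
puts "Concurrent requests at 5 threads each: #{max_concurrent_requests(workers: workers, threads_per_worker: 5)}"
```

Each worker is a full copy of the application in memory, so check the result against available RAM before adopting it.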
Timeouts and Queues
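Gunicorn’s --timeout kills and restarts any worker that has been silent for longer than the given number of seconds, so set it slightly higher than your application’s longest expected request processing time. Puma has no equivalent per-request timeout flag: its worker_timeout setting in config/puma.rb only detects hung worker processes in cluster mode, so request time limits are usually enforced inside the application, for example with the rack-timeout gem. Persistent connections reduce latency; Gunicorn exposes --keep-alive for this, while Puma uses the persistent_timeout setting in its config file. Gunicorn’s --worker-connections applies only to the async worker classes (eventlet, gevent); with sync or threaded workers, concurrency comes from the worker and thread counts. Puma’s queue_requests setting (enabled by default) controls whether Puma buffers and queues incoming requests while all threads are busy.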
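Timeouts at different layers are easy to set inconsistently. The Ruby sketch below checks two relationships: the app server timeout should exceed the longest expected request, and (an assumption added for this example, not a rule stated above) Nginx’s proxy_read_timeout should not be shorter than the app server timeout, otherwise Nginx may give up while the worker is still busy. The function name and sample values are hypothetical.

```ruby
# Returns a list of warnings for a set of timeouts, all in seconds.
def timeout_warnings(longest_request:, app_server_timeout:, proxy_read_timeout:)
  warnings = []
  if app_server_timeout <= longest_request
    warnings << "app server timeout (#{app_server_timeout}s) does not exceed " \
                "the longest expected request (#{longest_request}s)"
  end
  if proxy_read_timeout < app_server_timeout
    warnings << "proxy_read_timeout (#{proxy_read_timeout}s) is shorter than " \
                "the app server timeout (#{app_server_timeout}s)"
  end
  warnings
end

# Hypothetical values: 45s worst-case request, 120s app server timeout, 60s proxy_read_timeout.
timeout_warnings(longest_request: 45, app_server_timeout: 120, proxy_read_timeout: 60).each do |warning|
  puts "WARNING: #{warning}"
end
```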
Gunicorn Configuration Example
Here’s a typical Gunicorn command-line invocation for a Python application:
gunicorn --workers 3 \
--threads 2 \
--timeout 120 \
--bind 0.0.0.0:8000 \
your_project.wsgi:application
Puma Configuration Example
And a Puma command-line invocation for a Ruby application:
bundle exec puma -C config/puma.rb
Where config/puma.rb might contain:
# config/puma.rb
workers 5 # (2 * CPU cores) + 1, here for a 2-core instance
threads 0, 5 # Min threads 0, Max threads 5
environment ENV.fetch('RAILS_ENV') { 'production' }
bind 'unix:///path/to/your/app.sock' # Or 'tcp://0.0.0.0:9292' if not using a Unix socket
# Other settings like pidfile, log_requests, etc.
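MongoDB Tuning on AWS (EC2/Managed Services)
Optimizing MongoDB performance on AWS involves both instance-level tuning and MongoDB configuration adjustments. We’ll cover two scenarios: running MongoDB yourself on EC2, and using a managed service. Note that Amazon RDS does not offer a MongoDB engine; the AWS-native managed option is Amazon DocumentDB (with MongoDB compatibility), and MongoDB Atlas can also be deployed on AWS.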
Instance Selection and Storage
For EC2, choose instance types with sufficient CPU, RAM, and network throughput. i3, i4i, and m5d/r5d instances with local NVMe SSDs are excellent for performance-critical workloads due to low latency and high IOPS, but instance store volumes are ephemeral, so rely on replica sets and backups for durability. When the data lives on EBS instead, provision sufficient IOPS and throughput for the volume. For a managed service, select an instance class that matches your performance needs. MongoDB’s performance is often I/O bound, so storage performance is paramount.
MongoDB Configuration (mongod.conf)
Key parameters in mongod.conf (typically /etc/mongod.conf) include:
- storage.wiredTiger.engineConfig.cacheSizeGB: This is arguably the most critical setting. The value is an absolute size in GB, not a fraction of RAM. MongoDB’s default is 50% of (RAM - 1 GB); a common recommendation is to allocate 50-75% of available RAM to the WiredTiger cache, leaving enough for the OS and other processes.
- storage.journal.enabled: Always keep this enabled for durability. In MongoDB 6.1 and later, journaling is always on and the option has been removed.
- operationProfiling.mode: Set to slowOp or all for performance analysis, but disable in production unless actively debugging.
- net.bindIp: Ensure this is set correctly to allow connections from your application servers (e.g., 0.0.0.0 or specific private IPs). Binding to specific private IPs is the safer choice.
- sharding.clusterRole: If part of a sharded cluster, this is essential.
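Because cacheSizeGB takes an absolute number of gigabytes, it helps to compute the value explicitly. The Ruby sketch below compares MongoDB’s documented default, the larger of 50% of (RAM - 1 GB) or 0.25 GB, with the fraction-based guidance. The helper names and the 8 GB instance are assumptions for illustration.

```ruby
# MongoDB's documented default: the larger of 50% of (RAM - 1 GB) or 0.25 GB.
def default_wiredtiger_cache_gb(ram_gb)
  [0.5 * (ram_gb - 1), 0.25].max
end

# Cache size in GB for a chosen fraction of RAM, as in the 50-75% guidance.
def cache_gb_for_fraction(ram_gb, fraction)
  (ram_gb * fraction).round(2)
end

ram_gb = 8 # hypothetical instance memory
puts "Default cache: #{default_wiredtiger_cache_gb(ram_gb)} GB"
puts "50% of RAM:    #{cache_gb_for_fraction(ram_gb, 0.5)} GB"
puts "75% of RAM:    #{cache_gb_for_fraction(ram_gb, 0.75)} GB"
```

Values above the default leave less room for the filesystem cache and other processes, so raise the setting only after measuring.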
MongoDB Configuration Snippet (EC2)
# /etc/mongod.conf
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true # Omit on MongoDB 6.1+, where journaling is always on and this option was removed
  wiredTiger:
    engineConfig:
      cacheSizeGB: 6 # Absolute size in GB, not a fraction. Example: 75% of an 8GB RAM instance
    collectionConfig:
      blockCompressor: snappy # Or zstd for a higher compression ratio at more CPU cost
    indexConfig:
      prefixCompression: true
operationProfiling:
  mode: off # Set to 'slowOp' or 'all' for debugging, then turn off
net:
  port: 27017
  bindIp: 0.0.0.0 # Or specific private IPs for security
# Sharding settings (if applicable)
# sharding:
#   clusterRole: shardsvr # Or configsvr on config server members
# Replication settings (if applicable)
# replication:
#   replSetName: rs0
Managed Service Specifics (Amazon DocumentDB)
Amazon RDS has no MongoDB engine, so the managed AWS option is Amazon DocumentDB (with MongoDB compatibility). Many low-level configurations are handled by AWS, and mongod.conf settings such as the WiredTiger cache size are not exposed. Your primary tuning levers are:
- Instance Class: Choose an appropriate class (e.g., r6g, r5) with sufficient RAM and vCPUs.
- Storage Type and IOPS: DocumentDB cluster storage scales automatically, so there is no EBS volume type or IOPS figure to provision. The gp3 and io1/io2 choices apply to self-managed MongoDB on EC2, where gp3 allows independent scaling of IOPS and throughput. Provision enough IOPS there to meet your application’s read/write demands.
- Parameter Groups: While AWS manages most parameters, you can tune some via custom cluster parameter groups, such as the profiler settings.
- Monitoring: Leverage CloudWatch metrics (CPU utilization, Network In/Out, Disk Queue Depth, IOPS, Latency) to identify bottlenecks.
Indexing Strategy
Regardless of deployment method, a robust indexing strategy is non-negotiable for MongoDB performance. Regularly analyze slow queries using the database profiler, which records operations in the system.profile collection (query it with db.system.profile.find() once profiling is enabled), and inspect individual queries with explain(). Ensure that your application’s most frequent and critical queries are supported by appropriate indexes. Avoid over-indexing, as indexes consume disk space and slow down write operations.
Connection Pooling
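Ensure your Ruby/Python application is using connection pooling for MongoDB. The official drivers, the mongo gem (Ruby, also used by Mongoid) and PyMongo (Python), manage connection pools automatically; the older Moped driver is no longer maintained. Each worker process gets its own pool, so size the pool (max_pool_size in the Ruby driver, maxPoolSize in PyMongo) to at least the number of threads per worker. This prevents threads from waiting on a connection while keeping the total connection count on the database predictable.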
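A quick way to sanity-check pool sizing is to multiply it out across the fleet. This Ruby sketch assumes one pool per worker process and ignores the driver’s extra monitoring connections. The helper names and the deployment numbers are hypothetical.

```ruby
# Total connections the database may see if every pool fills up.
def max_db_connections(app_servers:, workers_per_server:, max_pool_size:)
  app_servers * workers_per_server * max_pool_size
end

# Each worker process has its own pool, so the pool only needs to
# cover the threads of that one worker.
def pool_covers_threads?(max_pool_size:, threads_per_worker:)
  max_pool_size >= threads_per_worker
end

# Hypothetical deployment: 3 app servers, 4 workers each, 5 threads per worker.
total = max_db_connections(app_servers: 3, workers_per_server: 4, max_pool_size: 5)
puts "Worst-case connections: #{total}"
puts "Pool covers threads: #{pool_covers_threads?(max_pool_size: 5, threads_per_worker: 5)}"
```

Compare the worst-case figure with the connection limit of your MongoDB instance before scaling out the application tier.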