The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and Redis on AWS for Ruby

Nginx as a High-Performance Frontend for Ruby Applications

When deploying Ruby applications, particularly those built with frameworks like Ruby on Rails or Sinatra, Nginx serves as an indispensable frontend. Its primary roles are to efficiently handle static file serving, SSL termination, request buffering, and load balancing, offloading these critical tasks from your application server. This section details optimal Nginx configurations for this purpose.

Optimizing Nginx for Static Assets and Request Handling

A well-tuned Nginx configuration can dramatically reduce the load on your application servers. Key directives to focus on include worker processes, connection limits, and caching strategies.

Worker Processes and Connections

The number of worker processes should generally match the number of CPU cores available on your Nginx instance; worker_processes auto does this for you. The worker_connections directive sets the maximum number of simultaneous connections a single worker process can hold open, and that count includes connections to upstream servers as well as connections to clients. A common starting point is 1024; the snippet below uses 4096 for a busier host. Tune it against your application's concurrency needs and the process's open file limit.
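As a rough capacity check, the relationship between the two directives can be sketched in Ruby. The helper name `max_proxied_clients` is hypothetical, and the halving assumes each proxied request holds one client connection plus one upstream connection, so treat the result as an upper bound and not a measurement.

```ruby
# Rough upper bound on simultaneous clients for an Nginx reverse proxy.
# Assumption: every proxied request occupies two connections in a worker
# (one to the client, one to the upstream application server).
def max_proxied_clients(worker_processes:, worker_connections:)
  (worker_processes * worker_connections) / 2
end

puts max_proxied_clients(worker_processes: 4, worker_connections: 1024)
```

Real capacity is usually lower, because keepalive connections and the operating system's file descriptor limit also count against each worker.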

Nginx Configuration Snippet

worker_processes auto; # Auto-detect based on CPU cores
events {
    worker_connections 4096; # Adjust based on expected load and system limits
    multi_accept on;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile        on;
    tcp_nopush      on;
    tcp_nodelay     on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    # Gzip compression for text-based assets
    gzip on;
    gzip_disable "msie6";
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # location blocks are only valid inside a server block, not directly in http.
    server {
        listen 80;
        server_name yourdomain.com;
        root /path/to/your/public; # Adjust to your app's public directory

        # Static file caching
        location ~* \.(jpg|jpeg|png|gif|ico|css|js|svg|woff|woff2|ttf|eot)$ {
            expires 30d;
            add_header Cache-Control "public, no-transform";
            access_log off;
        }

        # Proxy to your application server (e.g., Puma or Unicorn)
        location / {
            proxy_pass http://your_app_server_upstream; # Defined below
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_read_timeout 300s; # Increase if your app has long-running requests
            proxy_connect_timeout 75s;
            proxy_send_timeout 300s;
            proxy_buffer_size 128k;
            proxy_buffers 4 256k;
            proxy_busy_buffers_size 256k;
        }
    }

    # Define your upstream application server(s)
    upstream your_app_server_upstream {
        # Puma or Unicorn (Ruby) on the same host, via the Unix socket set in config/puma.rb.
        # All worker processes share this one socket; do not list one socket per worker.
        server unix:/var/www/your_app/shared/tmp/sockets/puma.sock fail_timeout=0;

        # Alternatively, over TCP on the same host:
        # server 127.0.0.1:3000 fail_timeout=0;

        # Example for multiple app servers (load balancing)
        # server app1.example.com:8000 weight=5;
        # server app2.example.com:8000 weight=1;
        # server app3.example.com:8000 backup;

        # Gunicorn (Python) is proxied the same way. PHP-FPM is not: it speaks
        # FastCGI and needs fastcgi_pass instead of proxy_pass.
    }

    # Optional: Rate limiting
    # limit_req_zone is declared here at the http level; limit_req goes in a
    # location inside a server block.
    # limit_req_zone $binary_remote_addr zone=mylimit:10m rate=5r/s;
    # location / {
    #     limit_req zone=mylimit burst=20 nodelay;
    #     proxy_pass http://your_app_server_upstream;
    #     # ... other proxy settings
    # }
}

SSL Termination and HTTP/2

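Offloading SSL/TLS encryption and decryption to Nginx is standard practice. Enabling HTTP/2 can further improve performance through request multiplexing and header compression. HTTP/2 server push is no longer a reason to enable it, since Nginx removed support for it in version 1.25.1. Ensure your Nginx is built with the --with-http_v2_module flag; running nginx -V lists the configured modules. On Nginx 1.25.1 and later, the http2 parameter of the listen directive used below is deprecated in favor of a separate "http2 on;" directive.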

SSL Configuration Snippet

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name yourdomain.com www.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;
    ssl_session_tickets off; # Consider security implications

    # OCSP Stapling
    ssl_stapling on;
    ssl_stapling_verify on;
    resolver 8.8.8.8 8.8.4.4 valid=300s; # Use your preferred DNS resolvers
    resolver_timeout 5s;

    # HSTS (HTTP Strict Transport Security)
    add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;

    # ... rest of your server block configuration (e.g., location /)
    location / {
        proxy_pass http://your_app_server_upstream;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 300s;
        proxy_connect_timeout 75s;
        proxy_send_timeout 300s;
        proxy_buffer_size 128k;
        proxy_buffers 4 256k;
        proxy_busy_buffers_size 256k;
    }

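    # Serve static assets directly
    location ~* \.(jpg|jpeg|png|gif|ico|css|js|svg|woff|woff2|ttf|eot)$ {
        # root is prepended to the full request URI, so point it at the public
        # directory itself: /assets/app.css is then read from public/assets/app.css.
        root /path/to/your/public; # Adjust to your app's public directory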
        expires 30d;
        add_header Cache-Control "public, no-transform";
        access_log off;
    }
}

Tuning Puma/Unicorn for Ruby Applications

Gunicorn is a Python WSGI server and PHP-FPM is a PHP process manager, so neither can run a Ruby application. Their Ruby counterparts are Unicorn, a pre-forking server in which each worker process handles one request at a time, and Puma, which combines multiple worker processes with multiple threads and is the default server in Rails. The general principles are the same as for Gunicorn, but the specific parameters differ. This playbook focuses on tuning Puma, which matters most when you run your own servers instead of a managed platform such as Heroku or AWS Elastic Beanstalk with a pre-configured application server.

Puma Worker and Thread Configuration

Puma is a multi-threaded, multi-process web server for Ruby. Its performance is heavily influenced by the number of workers and threads configured. A common strategy is to use a combination of multiple worker processes and multiple threads per worker.
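One way to pick a starting worker count is sketched below: one worker per CPU core, capped by how many workers fit in memory. The helper name `suggested_workers` and the 512 MB per-worker figure are assumptions for illustration only; measure your own application's memory use before relying on the result.

```ruby
require "etc"

# Suggest a Puma worker count: one per CPU core, capped by available memory.
# per_worker_mb is an assumed figure; measure your app's resident memory.
def suggested_workers(cpu_count:, available_mb:, per_worker_mb:)
  by_memory = available_mb / per_worker_mb
  [[cpu_count, by_memory].min, 1].max
end

workers = suggested_workers(cpu_count: Etc.nprocessors, available_mb: 8192, per_worker_mb: 512)
threads = 5
puts "workers=#{workers} threads=#{threads} max_in_flight=#{workers * threads}"
```

The product of workers and threads is the number of requests the instance can process at once, which is the figure to compare against expected peak concurrency.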

Puma Configuration (config/puma.rb)

The config/puma.rb file is where you define Puma’s behavior. Here’s a typical setup for a production environment:

# config/puma.rb

# Set the environment
environment ENV.fetch('RAILS_ENV') { 'production' }

# Number of workers. Typically set to the number of CPU cores available to the application.
# For AWS EC2 instances, this often means matching the vCPU count.
# Example: For an m5.large (2 vCPU), you might start with 2 workers.
# In cluster mode the master process binds the socket or port once and every
# worker accepts connections from it, so no per-worker socket or port is needed.
workers ENV.fetch('WEB_CONCURRENCY') { 2 }.to_i

# Number of threads per worker. This determines how many requests a single worker process can handle concurrently.
# A common starting point is 5. Tune based on I/O bound vs CPU bound nature of your app.
threads_count = ENV.fetch('RAILS_MAX_THREADS') { 5 }.to_i
threads threads_count, threads_count

# Bind to a TCP socket or a Unix socket.
# For Nginx proxying on the same instance, a Unix socket is often preferred for performance and simplicity.
# Ensure the path is accessible by the Nginx user.
# Example: If Nginx runs as www-data, www-data needs read/write access to the socket
# and search (execute) permission on every directory above it.
# If Nginx is on a different server, use a TCP socket; with multiple EC2 app instances,
# Nginx should proxy to their private IPs rather than their public ones.
# This example assumes Nginx is on the same instance and proxies to a Unix socket.
# To use TCP instead, replace the bind below with 'tcp://127.0.0.1:3000' (or your chosen port).
bind "unix:///var/www/your_app/shared/tmp/sockets/puma.sock"

# Connection limits.
# Puma has no max_connections setting. The number of requests processed at once
# is bounded by workers * threads; additional connections wait in the socket's
# listen backlog (or in Puma's own queue while queue_requests is enabled).
# If you see connection issues under load, raise workers or threads, or add instances.

# Daemonization.
# Puma 5 removed the daemonize option, so a config that still sets it will not load.
# Run Puma in the foreground under a process manager such as systemd or supervisord,
# which handles backgrounding and restarts. Only Puma 4 and earlier accept:
# daemonize false

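# PID/state files and logging
pidfile "/var/www/your_app/shared/tmp/pids/puma.pid"
state_path "/var/www/your_app/shared/tmp/pids/puma.state"
log_requests true
# Puma has no access_log/error_log directives; stdout_redirect sends stdout and
# stderr to files (the third argument appends instead of truncating).
stdout_redirect "/var/www/your_app/log/puma_access.log", "/var/www/your_app/log/puma_error.log", true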

# Preload the application code.
# This is crucial for performance as it loads your entire application into memory before starting workers.
preload_app!

# Callbacks for application lifecycle events.
on_worker_boot do
  # Worker specific setup for database connections, etc.
  # For example, to reset database connections for each worker:
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end

# Allow Puma to be restarted by `rails restart` command.
plugin :tmp_restart

# Other useful settings:
# queue_requests false # Default is true: Puma buffers incoming requests itself before handing them to a thread.
# worker_timeout 60 # Seconds before the master restarts a worker that has stopped checking in.

Tuning Considerations for Puma

Workers vs. Threads: The optimal ratio depends on your application’s workload. If your app is I/O bound (e.g., heavy database queries, external API calls), you can benefit from more threads per worker. If it’s CPU bound, more workers are generally better.
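preload_app!: Recommended in production when you deploy with full or hot restarts. It loads your application code once in the master process before forking, which speeds up worker startup and lets workers share memory through copy-on-write. It cannot be combined with phased restarts.
Database Connections: Active Record keeps a separate connection pool in every worker process, so size the pool per worker, not per server. The pool must be at least as large as the number of threads per worker: with 5 threads per worker, a pool of 5 is enough and a pool of 10 leaves headroom. The database, in turn, must accept the combined total. For example, 4 workers with a pool of 5 can open up to 4 * 5 = 20 connections per instance, so check that figure, multiplied by the number of instances, against the database's connection limit.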
Process Manager: Use systemd or supervisord to manage Puma processes. This ensures automatic restarts on failure and proper startup during server boot.
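The connection arithmetic can be sketched as a small helper. This assumes each Puma worker process holds its own Active Record pool; the name `connection_budget` and the example numbers are hypothetical.

```ruby
# Worst-case database connections opened by a fleet of Puma processes.
# Assumption: each worker process holds its own Active Record pool.
def connection_budget(workers:, threads:, pool:, instances: 1)
  {
    pool_covers_threads: pool >= threads,
    max_db_connections: workers * pool * instances
  }
end

p connection_budget(workers: 4, threads: 5, pool: 5, instances: 3)
```

If `pool_covers_threads` is false, threads can time out waiting for a connection; if `max_db_connections` exceeds what the database allows, lower the pool or put a connection pooler in front of the database.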

Redis for Caching and Session Management

Redis is an in-memory data structure store, used as a database, cache, and message broker. It’s exceptionally fast and a perfect fit for caching frequently accessed data, session storage, and background job queues in Ruby applications.

Optimizing Redis Performance

Tuning Redis involves configuring memory usage, persistence, and network settings. On AWS, consider using ElastiCache for a managed Redis experience, or self-hosting on EC2 instances.

Redis Configuration (redis.conf)

# redis.conf

# Network settings
# Bind to localhost if Redis and your app are on the same instance.
# If Redis runs on a separate EC2 instance, bind to that instance's private IP.
# (ElastiCache manages networking for you; this file does not apply to it.)
# For security, avoid binding to 0.0.0.0 unless absolutely necessary and protected by firewall.
bind 127.0.0.1

# Port to listen on
port 6379

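# Max memory usage. Crucial for preventing Redis from consuming all available RAM.
# Set this to a value less than your total system RAM to leave room for the OS and other processes.
# Example: For a 4GB RAM instance, set to 3GB, or lower if RDB snapshots or AOF
# rewrites run, because the forked child process can temporarily need extra memory.
maxmemory 3gb
# Evict least recently used keys when maxmemory is reached.
# Note: redis.conf does not allow a comment on the same line as a directive.
maxmemory-policy allkeys-lru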

# Persistence settings
# RDB (Redis Database) snapshots: "save <seconds> <changes>" takes a snapshot
# if at least <changes> keys changed within <seconds> seconds.
# After 900 seconds if at least 1 key changed
save 900 1
# After 300 seconds if at least 10 keys changed
save 300 10
# After 60 seconds if at least 10000 keys changed
save 60 10000

# AOF (Append Only File) - provides better durability than RDB alone.
# appendonly yes
# appendfilename "appendonly.aof"
# appendfsync everysec # fsync every second (good balance of performance and durability)

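# Client connection settings
# Default is 511. Increase if you see connection refused errors under high load;
# the kernel's net.core.somaxconn must be at least as large for it to take effect.
tcp-backlog 511
# 0 disables the idle timeout, so the server never closes idle client connections.
timeout 0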

# Replication (if using Redis Sentinel or Cluster)
# replica-serve-stale-data yes
# replica-read-only yes

# Lua scripting
# Max execution time for Lua scripts in milliseconds
lua-time-limit 5000

# Slowlog settings
# Log commands that take longer than 10000 microseconds (10 ms)
slowlog-log-slower-than 10000
# Number of slow log entries to keep
slowlog-max-len 128

# Other useful settings:
# databases 16 # Number of databases (default is 16)
# supervised systemd # If running Redis as a systemd service

AWS Specific Considerations

EC2 Instance Sizing: Choose an EC2 instance type with sufficient RAM for your Redis dataset and overhead. For memory-intensive workloads, consider memory-optimized instances (e.g., `r` series).

Security Groups: Configure AWS Security Groups to allow inbound traffic to your Redis instance only from your application servers’ security group on port 6379. If using ElastiCache, ensure your VPC and Subnet Group are correctly configured.

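Monitoring: If you use ElastiCache, CloudWatch publishes Redis metrics such as `CacheHits`, `CacheMisses`, `CurrConnections`, `BytesUsedForCache`, and `Evictions`; use them to monitor performance and identify potential bottlenecks. For self-hosted Redis on EC2, the same figures come from the Redis INFO command and have to be exported to CloudWatch yourself.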
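A hit ratio is often easier to read than the raw counters. The sketch below assumes you have already fetched `CacheHits` and `CacheMisses` totals for the same period; the helper name `cache_hit_ratio` is hypothetical.

```ruby
# Cache hit ratio from two counters covering the same time window.
# Returns nil when there was no traffic, to avoid dividing by zero.
def cache_hit_ratio(hits, misses)
  total = hits + misses
  return nil if total.zero?

  hits.to_f / total
end

ratio = cache_hit_ratio(900, 100)
puts format("hit ratio: %.1f%%", ratio * 100)
```

A ratio that falls while evictions rise usually means the working set no longer fits in maxmemory.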

Integrating Redis with Ruby

The redis gem (developed in the redis-rb project) is the standard client for interacting with Redis from Ruby.

Example Usage (Rails initializer)

# config/initializers/redis.rb

# Use environment variables for configuration
redis_host = ENV.fetch('REDIS_HOST', '127.0.0.1')
redis_port = ENV.fetch('REDIS_PORT', 6379).to_i
redis_db = ENV.fetch('REDIS_DB', 0).to_i

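# For session store (the :redis_session_store adapter comes from the redis-session-store gem)
Rails.application.config.session_store :redis_session_store,
  redis: {
    host: redis_host,
    port: redis_port,
    db: redis_db,
    # Add password if your Redis instance requires it
    # password: ENV['REDIS_PASSWORD']
  }

# For general caching and direct Redis access
$redis = Redis.new(host: redis_host, port: redis_port, db: redis_db)
# If your Redis instance requires a password, pass it to the constructor instead:
# $redis = Redis.new(host: redis_host, port: redis_port, db: redis_db, password: ENV['REDIS_PASSWORD'])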

# Example of using Redis for caching
# class MyCache
#   def self.fetch(key, &block)
#     Rails.cache.fetch(key, expires_in: 1.hour, &block)
#   end
# end

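# To use Redis as the Rails cache store, set it in config/environments/production.rb
# rather than here, because Rails builds the cache before the files in
# config/initializers run. Note that cache_store is assigned, not called:
# config.cache_store = :redis_cache_store, { url: "redis://#{ENV.fetch('REDIS_HOST', '127.0.0.1')}:#{ENV.fetch('REDIS_PORT', 6379)}/#{ENV.fetch('REDIS_DB', 0)}" }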
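Outside of `Rails.cache`, the cache-aside pattern can be written directly against a Redis client. The sketch below is illustrative only: `fetch_cached` and `FakeStore` are hypothetical names, and the in-memory stand-in exists so the example runs without a Redis server. It relies only on `get` and `setex`, which the redis gem's client also provides.

```ruby
require "json"

# Cache-aside helper. `store` is anything that responds to get(key) and
# setex(key, ttl_seconds, value), as the redis gem's client does.
def fetch_cached(store, key, ttl:)
  cached = store.get(key)
  return JSON.parse(cached) if cached

  value = yield
  store.setex(key, ttl, JSON.generate(value))
  value
end

# In-memory stand-in so the sketch runs without a Redis server.
# It ignores the TTL, which a real Redis would enforce.
class FakeStore
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def setex(key, _ttl, value)
    @data[key] = value
    "OK"
  end
end

store = FakeStore.new
calls = 0
2.times { fetch_cached(store, "user:1", ttl: 3600) { calls += 1; { "name" => "Ada" } } }
puts calls
```

The block runs only on the first call; the second call is served from the store. Values pass through JSON, so symbol keys come back as strings. In the application you would pass a real client in place of `FakeStore.new`.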

Putting It All Together: A Typical AWS Deployment Stack

A common and robust setup on AWS for a Ruby application would look like this:

Load Balancer: AWS Application Load Balancer (ALB) or Network Load Balancer (NLB) for distributing traffic across multiple EC2 instances.
Web Server: Nginx running on EC2 instances, configured as detailed above, handling static assets, SSL termination, and proxying to the application server.
Application Server: Puma (or Unicorn) running on EC2 instances, managed by systemd or supervisord, listening on a Unix socket or localhost TCP port.
Caching/Session Store: Redis, either self-hosted on dedicated EC2 instances or using AWS ElastiCache for managed Redis.
Database: AWS RDS (e.g., PostgreSQL, MySQL) or Aurora.
Background Jobs: Sidekiq (using Redis) or AWS SQS.

Example EC2 Instance Configuration (Conceptual)

Imagine an EC2 instance running your application. The Nginx configuration would point to a Puma socket, and Puma would be managed by systemd.

Systemd Service File for Puma

# /etc/systemd/system/puma.service

[Unit]
Description=Puma Application Server
After=network.target

[Service]
Type=simple
# systemd does not support trailing comments, so notes go on their own lines.
# Your application user:
User=deploy
# The group Nginx runs as, if using Unix sockets:
Group=www-data

# Set environment variables for your application
Environment="RAILS_ENV=production"
# Useful for containerized environments or systemd logging
Environment="RAILS_LOG_TO_STDOUT=true"
# Read by workers in puma.rb (overrides its default of 2)
Environment="WEB_CONCURRENCY=4"
# Read by threads in puma.rb
Environment="RAILS_MAX_THREADS=5"
# If using ElastiCache
Environment="REDIS_HOST=your-redis-host.xxxxxx.ng.0001.use1.cache.amazonaws.com"
Environment="REDIS_PORT=6379"
Environment="REDIS_DB=0"
# Environment="REDIS_PASSWORD=your_redis_password"

WorkingDirectory=/var/www/your_app/current
ExecStart=/usr/local/bin/bundle exec puma -C /var/www/your_app/shared/puma.rb

RestartSec=5
Restart=always

# If using Unix sockets, ensure the user/group has permissions on the socket directory.
# With Nginx running as www-data and the app user 'deploy', you might add the
# following; the "+" prefix runs the command with full privileges and replaces
# the deprecated PermissionsStartOnly=true:
# ExecStartPre=+/bin/chown deploy:www-data /var/www/your_app/shared/tmp/sockets
# ExecStartPre=+/bin/chmod 775 /var/www/your_app/shared/tmp/sockets

[Install]
WantedBy=multi-user.target

With this setup, Nginx receives incoming requests, serves static files directly, and forwards dynamic requests to Puma via the Unix socket. Puma, managed by systemd, handles the application logic, leveraging Redis for caching and sessions. This layered approach ensures high performance, scalability, and resilience.