The Ultimate DevOps Playbook: Tuning Nginx, Puma, and MySQL/PostgreSQL on AWS for Ruby

Nginx as a High-Performance Frontend for Ruby Applications

When deploying Ruby applications, particularly those built with frameworks like Ruby on Rails or Sinatra, Nginx serves as an indispensable frontend. Its strengths lie in efficient static file serving, SSL termination, request buffering, and load balancing. Properly tuning Nginx is crucial for minimizing latency and maximizing throughput.

Nginx Configuration for Ruby Backends

The core of Nginx’s role is its proxy_pass directive, which forwards requests to your application server (Puma or Unicorn for Ruby; the equivalents are Gunicorn for Python and PHP-FPM for PHP). A complete example configuration for a typical setup follows the directive overview below.

Key Nginx Directives and Tuning Parameters

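Let’s break down the essential directives. Note that they live in different contexts: worker_processes belongs in the main context, worker_connections in the events block, and the rest in the http, server, or location blocks.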

worker_processes: Controls the number of worker processes. Setting this to auto or the number of CPU cores is a common starting point.
worker_connections: The maximum number of simultaneous connections each worker can open. The count includes upstream connections, so a proxied request uses two (one to the client, one to Puma). It is also capped by the per-process open file limit, which you can raise with worker_rlimit_nofile.
keepalive_timeout: The time to keep an HTTP connection open for subsequent requests. A moderate value (e.g., 65 seconds) balances resource usage with client responsiveness.
client_max_body_size: Crucial for handling file uploads. Set this to a reasonable maximum size for request bodies.
proxy_connect_timeout, proxy_send_timeout, proxy_read_timeout: These define timeouts for communication with the upstream server. Adjust them based on your application’s typical response times.
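proxy_buffer_size, proxy_buffers: These control how Nginx buffers responses from the upstream. Buffering lets Nginx read the response from Puma quickly and free that Puma thread while Nginx feeds the data to slow clients. Larger buffers reduce spills to temporary files on disk but consume more memory per connection.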
gzip and gzip_types: Enable compression for text-based assets to reduce bandwidth.
ssl_session_cache and ssl_session_timeout: Optimize SSL handshake performance.
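As a rough capacity check, the worker settings above combine into a simple upper bound. The sketch below assumes every dynamic request is proxied to Puma, so each client occupies two connections, and it ignores the file-descriptor limit; the method name is hypothetical.

```ruby
# Rough upper bound on concurrent clients Nginx can hold open.
# Assumption: when proxied, each client occupies two connections
# (client to Nginx, Nginx to Puma); static files occupy one.
def nginx_max_clients(worker_processes:, worker_connections:, proxied: true)
  connections_per_client = proxied ? 2 : 1
  (worker_processes * worker_connections) / connections_per_client
end

# Example: a 4-core host with worker_processes auto and worker_connections 4096
puts nginx_max_clients(worker_processes: 4, worker_connections: 4096)                  # 8192
puts nginx_max_clients(worker_processes: 4, worker_connections: 4096, proxied: false)  # 16384
```

In practice the number of Puma threads usually caps throughput long before this bound; the Nginx figure matters mostly for idle keepalive connections and slow clients.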

Example Nginx Configuration Snippet

This configuration assumes your Ruby application is served by Puma on 127.0.0.1:3000. Adapt the upstream block and proxy settings as needed.

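# /etc/nginx/nginx.conf
# (user, events, and http must live in the main config file; a file in
#  sites-available is included inside http {}, so it can hold only the
#  upstream and server blocks)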

user www-data;
worker_processes auto; # Or set to the number of CPU cores
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 4096; # Adjust based on expected load and backend connections
    multi_accept on;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logging settings
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log warn;

    # Gzip compression
    gzip on;
    gzip_disable "msie6";
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml;

    # Buffering and timeouts for upstream
    proxy_buffer_size 128k;
    proxy_buffers 4 256k;
    proxy_busy_buffers_size 256k;
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;

    # Client body size for uploads
    client_max_body_size 50M; # Adjust as needed

    # SSL settings (if applicable)
    # ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    # ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;
    # ssl_protocols TLSv1.2 TLSv1.3;
    # ssl_prefer_server_ciphers on;
    # ssl_session_cache shared:SSL:10m;
    # ssl_session_timeout 10m;

    upstream puma_app {
        server 127.0.0.1:3000; # Your Puma server address and port
        # For multiple Puma instances or load balancing:
        # server 127.0.0.1:3001;
        # server 127.0.0.1:3002;
        # least_conn; # Or other load balancing methods
    }

    server {
        listen 80;
        # listen 443 ssl http2; # Uncomment for SSL

        server_name yourdomain.com www.yourdomain.com;

        root /var/www/your_app/public; # Path to your Rails/Sinatra public directory

        # Serve an existing file from root if there is one; otherwise hand
        # the request to Puma through the named location below.
        location / {
            try_files $uri @puma;
        }

        location @puma {
            proxy_pass http://puma_app;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_redirect off;
        }

        # Serve static assets directly from Nginx for performance
        location ~ ^/(assets|images|javascripts|stylesheets|system)/ {
            expires 1y;
            add_header Cache-Control "public";
            access_log off;
            try_files $uri $uri/ =404;
        }

        # Handle favicon and robots.txt
        location = /favicon.ico { access_log off; log_not_found off; }
        location = /robots.txt  { access_log off; log_not_found off; }

        # Deny access to hidden files
        location ~ /\. {
            deny all;
        }
    }
}
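The proxy buffer values in the configuration above must satisfy two constraints that nginx checks at startup, and they determine how much memory a single buffered response can claim. The Ruby sketch below encodes those checks as we read them from nginx’s configuration error messages (exact boundary handling may differ; nginx -t is the authoritative test) and computes the per-connection ceiling. The struct and method names are hypothetical.

```ruby
# Hypothetical checker for the proxy buffer directives (all sizes in KB).
ProxyBuffers = Struct.new(:buffer_size_k, :count, :size_k, :busy_k, keyword_init: true)

# Approximates the two constraints nginx reports at config load:
#  - proxy_busy_buffers_size >= max(proxy_buffer_size, one proxy_buffers buffer)
#  - proxy_busy_buffers_size <= (number of proxy_buffers - 1) * buffer size
def validate_proxy_buffers(cfg)
  errors = []
  min_busy = [cfg.buffer_size_k, cfg.size_k].max
  max_busy = (cfg.count - 1) * cfg.size_k
  errors << "busy buffers below #{min_busy}k" if cfg.busy_k < min_busy
  errors << "busy buffers above #{max_busy}k" if cfg.busy_k > max_busy
  errors
end

# Upper bound on buffer memory for one proxied response:
# the header buffer plus every proxy_buffers buffer in use.
def worst_case_buffer_k(cfg)
  cfg.buffer_size_k + cfg.count * cfg.size_k
end

cfg = ProxyBuffers.new(buffer_size_k: 128, count: 4, size_k: 256, busy_k: 256)
p validate_proxy_buffers(cfg)  # []
puts worst_case_buffer_k(cfg)  # 1152
```

With the example values, one response can use up to about 1.1 MB of buffer memory. Nginx allocates buffers as needed, so this is a ceiling, but it multiplies across concurrent large responses.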

Puma/Unicorn Tuning for Ruby Applications

Your application server (Puma or Unicorn for most Ruby apps; Gunicorn plays the same role for Python WSGI apps) is the direct interface to your Ruby code. Tuning its worker count, threads, and timeouts is critical.

Puma Configuration and Best Practices

Puma is a popular, multi-platform web server for Ruby. Its configuration is typically done via a config/puma.rb file.

# config/puma.rb

# Set the environment
environment ENV.fetch('RAILS_ENV') { 'production' }

# Number of threads to use for each worker.
# A common starting point is 5. Adjust based on your application's I/O bound vs CPU bound nature.
threads_count = ENV.fetch('RAILS_MAX_THREADS') { 5 }.to_i
threads threads_count, threads_count

# Number of worker processes.
# A common starting point is one worker per CPU core, bounded by available memory
# (each worker is a full Ruby process). Leave headroom if Nginx or the database share the host.
worker_count = ENV.fetch('WEB_CONCURRENCY') { 2 }.to_i
workers worker_count

# Bind to a specific address and port.
# Nginx will proxy to this address.
bind "tcp://127.0.0.1:3000"

# If using a Unix socket instead of TCP:
# bind "unix:///path/to/your/app/shared/sockets/puma.sock"

# Do not daemonize: the daemonize option was removed in Puma 5.
# Run Puma in the foreground under systemd or another process manager.

# Logging
stdout_redirect "/path/to/your/app/shared/log/puma.stdout.log", "/path/to/your/app/shared/log/puma.stderr.log"
pidfile "/path/to/your/app/shared/pids/puma.pid"
state_path "/path/to/your/app/shared/pids/puma.state"

# Seconds to wait for workers to finish in-flight requests on shutdown or restart
worker_shutdown_timeout 15

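# Preload the application code before forking workers.
# Workers share the preloaded memory via copy-on-write and boot faster,
# at the cost of phased restarts (pumactl phased-restart) not being available.
preload_app!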

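# Callbacks for worker lifecycle
# Close the master's database connections before forking so workers
# never share a socket inherited from the parent process.
before_fork do
  ActiveRecord::Base.connection_pool.disconnect! if defined?(ActiveRecord::Base)
end

# Each worker opens its own database connections after it boots.
on_worker_boot do
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
end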

Tuning Considerations for Puma:

Threads vs. Workers: Threads are lighter weight but share memory; workers are heavier but provide process isolation. On MRI (CRuby), the Global VM Lock lets only one thread per process execute Ruby code at a time, so threads help mainly while waiting on I/O, and workers are what use multiple cores. A common strategy is to run multiple workers, each with multiple threads.
threads: If your application is I/O bound (e.g., making many external API calls or database queries), increasing threads can improve concurrency. If it’s CPU bound, extra threads mostly add contention. Keep the ActiveRecord pool in config/database.yml at least as large as the thread count.
workers: This determines the number of Ruby processes. For CPU-bound work, more workers use more cores; for I/O-bound work, fewer workers with more threads might suffice. A good starting point is one worker per CPU core, as long as memory allows.
preload_app!: Loads your application code once in the master before forking, so workers share that memory via copy-on-write and boot faster. The trade-off is that phased restarts are not available.
bind: Ensure this matches what Nginx is configured to proxy to. Using 127.0.0.1 is standard for local communication.
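The worker and thread guidance above can be turned into a starting plan. The helper below is a hypothetical sketch: it caps workers at the core count and at what memory allows, using a per-worker memory figure you would measure on your own app (for example the resident size of a warmed-up worker) and an assumed reserve for Nginx and the OS.

```ruby
# Hypothetical sizing helper: one Puma worker per core, but never more
# workers than memory allows. per_worker_mb and reserve_mb are values you
# measure yourself; they are not Puma defaults.
def puma_plan(cores:, ram_mb:, per_worker_mb:, reserve_mb:, threads:)
  by_memory = (ram_mb - reserve_mb) / per_worker_mb
  workers = [[cores, by_memory].min, 1].max
  { workers: workers, threads: threads, concurrency: workers * threads }
end

plan = puma_plan(cores: 4, ram_mb: 4096, per_worker_mb: 512, reserve_mb: 1024, threads: 5)
puts plan[:workers]      # 4 (memory alone would allow 6)
puts plan[:concurrency]  # 20 requests in flight across all workers
```

The resulting values map onto WEB_CONCURRENCY and RAILS_MAX_THREADS in config/puma.rb, and the concurrency figure is also how many database connections this host may hold at peak.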

MySQL/PostgreSQL Tuning for High-Traffic Ruby Apps

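Database performance is often the bottleneck, so tuning your database server (MySQL or PostgreSQL) is as critical as tuning your web and application servers. On Amazon RDS you cannot edit my.cnf or postgresql.conf directly; apply these settings through a DB parameter group instead.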

MySQL Tuning Parameters

Key parameters in my.cnf (or mysqld.cnf) to focus on:

# /etc/mysql/my.cnf or /etc/mysql/mysql.conf.d/mysqld.cnf

[mysqld]
# General
user                    = mysql
pid-file                = /var/run/mysqld/mysqld.pid
socket                  = /var/run/mysqld/mysqld.sock
datadir                 = /var/lib/mysql
log-error               = /var/log/mysql/error.log
# General Query Log (use with caution in production, can be very verbose)
# general_log_file        = /var/log/mysql/mysql.log
# general_log             = 1

# Performance Tuning
innodb_buffer_pool_size = 768M  # Crucial: ~70-80% of RAM on a dedicated DB server (768M suits ~1 GB)
innodb_log_file_size    = 256M  # Larger redo logs reduce checkpoint flushing; MySQL 8.0.30+ uses innodb_redo_log_capacity
innodb_log_buffer_size  = 16M   # Buffer for transaction logs
innodb_flush_log_at_trx_commit = 1 # Default is 1 (ACID compliant), 2 can be faster but less safe on crash
innodb_flush_method     = O_DIRECT # Bypass OS cache for InnoDB data files

# Connection Handling
max_connections         = 200   # Adjust based on application needs and server capacity
thread_cache_size       = 16    # Cache threads for reuse
table_open_cache        = 2000  # Cache open table file descriptors
table_definition_cache  = 1000  # Cache table definitions

# Query Cache (Deprecated in MySQL 5.7, removed in 8.0. Use application-level caching.)
# query_cache_type        = 0
# query_cache_size        = 0

# Per-session buffers: allocated per connection (some per operation), so memory use scales with max_connections
tmp_table_size          = 64M
max_heap_table_size     = 64M
sort_buffer_size        = 4M
join_buffer_size        = 4M
read_rnd_buffer_size    = 4M
read_buffer_size        = 4M

# Replication (if applicable)
# server-id = 1
# log_bin = mysql-bin
# binlog_format = ROW
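The per-session buffers in this file are allocated per connection, which is easy to overlook. The following sketch makes a deliberately pessimistic estimate by adding the global buffers to max_connections times the per-session buffers, using the values from the config above. It is a hypothetical helper, not a model of MySQL’s allocator: real usage is usually far lower, while some buffers (such as join buffers) can be allocated several times by one query.

```ruby
# Rough worst-case MySQL memory in MB: global buffers plus per-session
# buffers multiplied by max_connections. An order-of-magnitude warning,
# not a precise model.
def mysql_worst_case_mb(buffer_pool_mb:, log_buffer_mb:, max_connections:, per_session_mb:)
  buffer_pool_mb + log_buffer_mb + max_connections * per_session_mb.sum
end

# sort_buffer_size, join_buffer_size, read_rnd_buffer_size, read_buffer_size
per_session = [4, 4, 4, 4]
puts mysql_worst_case_mb(buffer_pool_mb: 768, log_buffer_mb: 16,
                         max_connections: 200, per_session_mb: per_session)  # 3984
```

Nearly 4 GB of worst-case demand next to a buffer pool that, at 70-80% of RAM, implies roughly a 1 GB host is a signal to lower max_connections or the per-session buffers.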

MySQL Tuning Notes:

innodb_buffer_pool_size: This is the single most important setting for InnoDB. It caches data and indexes. Allocate as much RAM as possible without starving the OS or other processes.
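innodb_log_file_size: Larger redo logs let InnoDB checkpoint less often, which smooths write-heavy workloads, but they lengthen crash recovery.
innodb_flush_log_at_trx_commit: With 1 (the default and safest), the redo log is flushed to disk at every commit. With 2, it is written to the OS at each commit but flushed to disk only about once per second, so an OS crash or power loss can lose roughly the last second of transactions; a crash of mysqld alone does not.
max_connections: Don’t set this too high. Each connection consumes memory, including its per-session buffers. Every Puma worker keeps its own ActiveRecord connection pool, so peak demand is roughly app hosts × workers per host × pool size, plus background jobs.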
Query Cache: Avoid using the MySQL query cache in modern versions. It has scalability issues and is often disabled in favor of application-level caching (e.g., Redis, Memcached).
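To check whether max_connections leaves enough room for your Rails fleet, multiply out the per-process pools. The sketch below assumes one ActiveRecord pool per Puma worker, a fixed count of other clients (for example background job processes), and a hypothetical headroom of 10 connections for admin and monitoring sessions; the same arithmetic applies to PostgreSQL.

```ruby
# Peak connection demand: every Puma worker has its own pool, so demand
# multiplies across hosts, workers, and pool size. other_clients is an
# assumed extra consumer such as background job processes.
def db_connection_demand(app_hosts:, workers_per_host:, pool_per_worker:, other_clients: 0)
  app_hosts * workers_per_host * pool_per_worker + other_clients
end

# headroom is a hypothetical reserve for admin and monitoring sessions.
def fits_under_max_connections?(demand, max_connections:, headroom: 10)
  demand + headroom <= max_connections
end

demand = db_connection_demand(app_hosts: 3, workers_per_host: 4, pool_per_worker: 5, other_clients: 25)
puts demand                                                     # 85
puts fits_under_max_connections?(demand, max_connections: 200)  # true
```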

PostgreSQL Tuning Parameters

Key parameters in postgresql.conf:

# /etc/postgresql/X.Y/main/postgresql.conf

# Shared Memory
shared_buffers = 1GB       # ~25% of total RAM is a common starting point
effective_cache_size = 3GB # ~75% of total RAM, informs the planner about OS cache

# WAL (Write-Ahead Logging)
wal_level = replica        # Or 'logical' if using logical replication
wal_buffers = 16MB         # Default (-1) auto-sizes to 1/32 of shared_buffers, capped at one WAL segment (16MB)
wal_writer_delay = 200ms   # How often WAL writer wakes up
min_wal_size = 1GB         # WAL space kept for reuse instead of being removed
max_wal_size = 4GB         # Soft limit on WAL growth between automatic checkpoints

# Checkpointing
checkpoint_timeout = 5min  # Maximum time between automatic checkpoints
checkpoint_completion_target = 0.9 # Spread checkpoint over 90% of the interval

# Memory and I/O
work_mem = 16MB             # For sorts, hashes, etc. Per operation.
maintenance_work_mem = 256MB # For VACUUM, CREATE INDEX, etc.
random_page_cost = 1.1      # Default is 4.0; ~1.1 suits SSD-backed storage such as EBS gp3
seq_page_cost = 1.0

# Connection Handling
max_connections = 200      # Adjust based on application needs
shared_preload_libraries = 'pg_stat_statements' # Needs a restart, then CREATE EXTENSION pg_stat_statements;

# Autovacuum
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 15s
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.2 # Fraction of table rows (0.2 = 20%) added to the vacuum threshold
autovacuum_analyze_scale_factor = 0.1 # Fraction of table rows (0.1 = 10%) added to the analyze threshold

# Logging
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_min_duration_statement = 250ms # Log slow queries
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0 # Log every temp file; a positive value (kB) logs only larger ones
log_autovacuum_min_duration = 0 # Log every autovacuum action
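The memory settings above are derived from total RAM (the comments imply a 4 GB host: 1 GB is 25%, 3 GB is 75%). The hypothetical helper below recomputes them for any host size and adds a pessimistic bound on work_mem, assuming each connection runs ops_per_query sort or hash operations at once. It also applies the documented auto-sizing rule for wal_buffers = -1: 1/32 of shared_buffers, capped at one WAL segment (16MB by default).

```ruby
# Starting values from total RAM using the 25% / 75% rules of thumb,
# plus a worst-case work_mem bound. ops_per_query is an assumption:
# a complex query can run several sort/hash nodes, each up to work_mem.
def pg_memory_plan(ram_mb:, max_connections:, work_mem_mb:, ops_per_query: 2)
  shared = ram_mb / 4
  {
    shared_buffers_mb: shared,
    effective_cache_size_mb: ram_mb * 3 / 4,
    # wal_buffers = -1 auto-sizes to 1/32 of shared_buffers, capped at 16MB
    # (the 64kB floor is ignored in this integer-MB sketch)
    wal_buffers_mb: [shared / 32, 16].min,
    worst_case_work_mem_mb: max_connections * work_mem_mb * ops_per_query
  }
end

plan = pg_memory_plan(ram_mb: 4096, max_connections: 200, work_mem_mb: 16)
puts plan[:shared_buffers_mb]       # 1024
puts plan[:effective_cache_size_mb] # 3072
puts plan[:worst_case_work_mem_mb]  # 6400
```

The 6,400 MB work_mem bound exceeds the whole 4 GB host, which is why work_mem is usually raised per session or per role (for example with ALTER ROLE ... SET work_mem) for heavy reports rather than globally.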

PostgreSQL Tuning Notes:

shared_buffers: One of the most critical parameters: the memory PostgreSQL dedicates to caching data pages. PostgreSQL also relies on the OS file cache, which is why about 25% of RAM, rather than most of it, is the usual starting point.
effective_cache_size: This tells the query planner how much memory is available for caching by both PostgreSQL (shared_buffers) and the OS file system cache; it allocates nothing. Setting it realistically high (e.g., 50-75% of RAM) makes index scans look cheaper to the planner.
WAL Tuning: Parameters like wal_buffers, min_wal_size, and max_wal_size affect write performance and recovery time.
work_mem and maintenance_work_mem: work_mem applies per sort or hash operation, and a single complex query can run several at once across many connections, so be cautious about raising it globally. maintenance_work_mem is for maintenance tasks such as VACUUM and CREATE INDEX.
Autovacuum: Crucial for PostgreSQL to reclaim space from dead tuples and update statistics. Tune its parameters to ensure it runs effectively without impacting performance.
log_min_duration_statement: Essential for identifying slow queries. Set it to a value that captures problematic queries without flooding your logs.
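The autovacuum settings follow a formula from the PostgreSQL documentation: a table is vacuumed once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples, and analyzed once changed tuples exceed the analogous analyze threshold. The sketch below evaluates both with the values from the config above; it ignores per-table overrides and the insert-based threshold added in PostgreSQL 13, and the method name is hypothetical.

```ruby
# vacuum threshold  = autovacuum_vacuum_threshold  + autovacuum_vacuum_scale_factor  * reltuples
# analyze threshold = autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples
# Defaults mirror the postgresql.conf example; per-table overrides are not modeled.
def autovacuum_thresholds(reltuples, vac_base: 50, vac_scale: 0.2, an_base: 50, an_scale: 0.1)
  {
    vacuum: (vac_base + vac_scale * reltuples).round,
    analyze: (an_base + an_scale * reltuples).round
  }
end

t = autovacuum_thresholds(10_000_000)
puts t[:vacuum]   # 2000050 dead tuples before autovacuum runs VACUUM
puts t[:analyze]  # 1000050 changed tuples before auto-ANALYZE
```

On a 10-million-row table, that is two million dead rows before vacuum starts. For large, busy tables a per-table override such as ALTER TABLE big_table SET (autovacuum_vacuum_scale_factor = 0.01); (big_table is a placeholder) brings the trigger down to about 100,000.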

Monitoring and Iterative Tuning

Tuning is not a one-time event. Continuous monitoring is key to identifying bottlenecks and validating changes. Use tools like:

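AWS CloudWatch: Monitor CPU utilization, network I/O, and disk I/O for your EC2 and RDS instances. EC2 memory metrics require the CloudWatch agent; RDS reports FreeableMemory natively.
Nginx Status Module: The stub_status directive (from ngx_http_stub_status_module, compiled into most packaged builds) reports active connections, accepted and handled connections, total requests, and reading/writing/waiting connection counts.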
Application Performance Monitoring (APM) Tools: New Relic, Datadog, Scout APM, or Skylight provide deep insights into application code performance, database query times, and external service calls.
Database Performance Tools: pg_stat_statements (PostgreSQL) or mysqltuner.pl / pt-query-digest (MySQL) help identify slow queries and resource usage.
System Monitoring: Tools like htop, vmstat, iostat, and netstat on your servers.
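The stub_status page is plain text, so it is easy to scrape into whatever metrics system you use. The sketch below parses the output format shown in the nginx documentation into a hash and derives dropped connections from accepts minus handled. It assumes a location exposing stub_status exists (the /nginx_status path in the code comment is hypothetical) and leaves the HTTP fetch out.

```ruby
# Turn stub_status text into numbers for a metrics pipeline. Fetching the
# page (e.g. via Net::HTTP from a hypothetical http://127.0.0.1/nginx_status)
# is left out.
def parse_stub_status(text)
  active = text[/Active connections:\s+(\d+)/, 1] or raise ArgumentError, "no active count"
  counters = text.match(/^\s*(\d+)\s+(\d+)\s+(\d+)\s*$/) or raise ArgumentError, "no counters"
  states = text.match(/Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)/) or
    raise ArgumentError, "no connection states"
  accepts, handled, requests = counters.captures.map(&:to_i)
  reading, writing, waiting = states.captures.map(&:to_i)
  {
    active: active.to_i, accepts: accepts, handled: handled, requests: requests,
    reading: reading, writing: writing, waiting: waiting,
    # accepts > handled means connections were dropped, e.g. worker_connections was hit
    dropped: accepts - handled
  }
end

sample = <<~TXT
  Active connections: 291
  server accepts handled requests
   16630948 16630948 31070465
  Reading: 6 Writing: 179 Waiting: 106
TXT

stats = parse_stub_status(sample)
puts stats[:active]   # 291
puts stats[:dropped]  # 0
```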

When making changes, adjust one parameter at a time, monitor the impact, and document your findings. This iterative approach ensures you understand the effect of each tuning step and can revert if necessary.
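To tell whether a single change helped, compare the same latency percentile before and after it rather than averages, which hide tail latency. A minimal sketch, assuming you can export response times in milliseconds from your access logs or APM tool; the helper names are hypothetical and the usage numbers are placeholders.

```ruby
# Nearest-rank percentile; Rational keeps the rank exact for integer pct.
def percentile(samples, pct)
  raise ArgumentError, "no samples" if samples.empty?
  sorted = samples.sort
  rank = (pct.to_r / 100 * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

# Relative change from before to after, in percent (negative = faster).
def percent_change(before, after)
  ((after - before) * 100.0 / before).round(1)
end

# Placeholder inputs; substitute real latency samples from each run.
puts percentile((1..100).to_a, 95)  # 95
puts percent_change(200, 150)       # -25.0
```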