The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and Redis on Google Cloud for Ruby

Nginx as a High-Performance Frontend Proxy

When deploying Ruby applications on Google Cloud, Nginx serves as an indispensable frontend proxy. Its role extends beyond simple request routing; it handles SSL termination, static file serving, caching, and load balancing, offloading these critical tasks from your application servers. Proper tuning of Nginx is paramount for achieving optimal performance and stability.

Nginx Configuration for Ruby Applications

A robust Nginx configuration for a Ruby application, typically served by a Rack server such as Puma or Unicorn, involves several key directives. We’ll focus on optimizing worker processes, connection handling, and buffering.

Worker Processes and Connections

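The worker_processes directive sets how many worker processes Nginx spawns. A common practice is to match the number of CPU cores on your instance; setting it to auto lets Nginx detect the core count. The worker_connections directive, which belongs in the events block, sets the maximum number of simultaneous connections each worker can open. That figure counts every connection, including those to proxied upstream servers, so the theoretical ceiling is worker_processes * worker_connections connections in total, and roughly half that many clients when every request is proxied.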
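The arithmetic above can be sketched in a few lines of Ruby. This is only an illustrative estimate: the helper name max_nginx_clients is hypothetical, and it assumes each proxied client occupies two connections (one from the client, one to the upstream). Real limits also depend on open-file limits and keep-alive behaviour.

```ruby
require "etc"

# Rough upper bound on simultaneous clients for one Nginx instance.
# Assumption: a proxied request uses two connections per client
# (client -> Nginx and Nginx -> upstream), so capacity is halved.
def max_nginx_clients(worker_processes:, worker_connections:, proxied: true)
  total_connections = worker_processes * worker_connections
  proxied ? total_connections / 2 : total_connections
end

# Etc.nprocessors reports the CPU count visible to this process,
# which is a reasonable starting value for worker_processes.
puts "CPU cores detected: #{Etc.nprocessors}"

# The 4-core example from this section: 4 workers x 1024 connections.
puts max_nginx_clients(worker_processes: 4, worker_connections: 1024) # prints 2048
```

Treat the result as a ceiling for capacity planning rather than a throughput prediction.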

Example Nginx Configuration Snippet

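worker_processes belongs in the main (top-level) context of nginx.conf, typically located at /etc/nginx/nginx.conf. worker_connections must sit inside the events block, and the remaining directives belong in the http context.

# Adjust worker_processes based on your VM's CPU cores.
# For a 4-core VM, set to 4 (or use "auto" to match the core count).
worker_processes 4;

events {
    # Maximum number of simultaneous connections per worker.
    # A common starting point is 1024, but this can be tuned higher.
    worker_connections 1024;
}

# The remaining directives go inside the http { ... } block.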

# Enable Gzip compression for text-based assets.
gzip on;
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

# Increase the buffer size for requests.
client_body_buffer_size 128k;
client_max_body_size 10m; # Adjust as needed for file uploads

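# Timeout settings that free connections held by slow or idle clients.
# The 3m values are generous (Nginx defaults to 60s for each); lower them to reclaim resources sooner.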
client_header_timeout 3m;
client_body_timeout 3m;
send_timeout 3m;
keepalive_timeout 65;

# tcp_nopush (TCP_CORK on Linux) only takes effect when sendfile is on.
sendfile on;
tcp_nopush on;
# tcp_nodelay disables Nagle's algorithm on keep-alive connections.
tcp_nodelay on;

Proxying to Gunicorn/Puma

When proxying to a Rack application server such as Puma or Unicorn (Gunicorn plays the equivalent role for Python WSGI applications), it’s crucial to configure appropriate timeouts and buffer sizes so that slow responses are not cut off and data transfers smoothly.

Example Server Block for Upstream Proxying

This configuration assumes your Ruby application server is listening on 127.0.0.1:8000. Adjust the proxy_pass directive accordingly.

server {
    listen 80;
    server_name your_domain.com www.your_domain.com;

    # Serve static files directly from Nginx for performance.
    location /static/ {
        alias /path/to/your/app/public/static/;
        expires 30d;
        access_log off;
    }

    location / {
        proxy_pass http://127.0.0.1:8000; # Address of your Puma (or other app server) listener
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Proxy timeouts. 60s is already the Nginx default for all three;
        # raise proxy_read_timeout if legitimate requests run longer.
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;

        # Buffering settings
        proxy_buffering on;
        proxy_buffer_size 128k;
        proxy_buffers 8 128k;
        proxy_busy_buffers_size 256k;
    }

    # Optional: SSL configuration (if not handled by Google Cloud Load Balancer)
    # listen 443 ssl;
    # ssl_certificate /etc/letsencrypt/live/your_domain.com/fullchain.pem;
    # ssl_certificate_key /etc/letsencrypt/live/your_domain.com/privkey.pem;
    # include /etc/letsencrypt/options-ssl-nginx.conf;
    # ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
}
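To illustrate what the proxy_set_header lines deliver to the application, here is a minimal Rack-style handler written without any gems. It is a sketch under assumptions: the method name client_info is hypothetical, and trusting X-Forwarded-For is only reasonable when requests can reach the app server solely through Nginx. Rack exposes request headers in the env hash with an HTTP_ prefix, so X-Real-IP arrives as HTTP_X_REAL_IP.

```ruby
# Rack-style handler: takes the env hash, returns [status, headers, body].
# Reads the headers set by Nginx in the server block above.
def client_info(env)
  # X-Forwarded-For can hold a comma-separated chain of addresses;
  # the first entry is the original client as reported by the proxies.
  forwarded_for = env.fetch("HTTP_X_FORWARDED_FOR", "")
  client_ip = forwarded_for.split(",").first&.strip ||
              env["HTTP_X_REAL_IP"] ||
              env["REMOTE_ADDR"]

  # X-Forwarded-Proto carries the scheme the client used to reach Nginx.
  scheme = env.fetch("HTTP_X_FORWARDED_PROTO", "http")

  body = "client=#{client_ip} scheme=#{scheme} host=#{env['HTTP_HOST']}\n"
  [200, { "content-type" => "text/plain" }, [body]]
end

# Simulated env, roughly as it might look after passing through Nginx.
env = {
  "REMOTE_ADDR" => "127.0.0.1",
  "HTTP_HOST" => "your_domain.com",
  "HTTP_X_REAL_IP" => "203.0.113.7",
  "HTTP_X_FORWARDED_FOR" => "203.0.113.7",
  "HTTP_X_FORWARDED_PROTO" => "https"
}
puts client_info(env).last.first
```

Without these headers the application would see only 127.0.0.1 as the remote address, since Nginx is the direct peer.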

Gunicorn/Puma Tuning for Ruby Applications

Your Ruby application server (e.g., Puma, Unicorn) is the heart of your application’s request processing. Tuning its worker count, threads, and timeout settings is critical. We’ll use Puma as the primary example, as it’s a popular choice for Ruby web applications.

Puma Worker and Thread Configuration

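Puma’s concurrency model is based on workers (forked processes) and threads within each worker. A common strategy is to run multiple workers, each with multiple threads. On MRI, the global VM lock lets only one thread per process execute Ruby code at a time, so threads mainly help I/O-bound work while additional workers provide CPU parallelism. The optimal numbers therefore depend on whether your application is I/O-bound or CPU-bound and on the available CPU cores.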
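As a rough sketch of how workers and threads combine, the snippet below computes the request capacity of a Puma cluster. The helper puma_capacity and its keys are hypothetical names chosen for illustration, and the 2-worker, 4-thread layout is just an example. The database pool figure assumes one connection per thread per worker process, which is the usual sizing rule for ActiveRecord.

```ruby
# Summarises a Puma cluster layout (illustrative helper, not a Puma API).
def puma_capacity(workers:, max_threads:)
  {
    # Each thread serves one request at a time.
    max_concurrent_requests: workers * max_threads,
    # Connection pools are per process, so each worker needs at least
    # as many database connections as it has threads.
    db_pool_per_worker: max_threads,
    # Total connections the database may see from this one instance.
    db_connections_total: workers * max_threads
  }
end

plan = puma_capacity(workers: 2, max_threads: 4)
puts "Concurrent requests: #{plan[:max_concurrent_requests]}"  # prints 8
puts "DB pool per worker:  #{plan[:db_pool_per_worker]}"       # prints 4
```

Checking the total against your database's connection limit before raising either number avoids pool exhaustion under load.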

Example Puma Command Line Arguments

When starting Puma, you can specify these parameters. For a typical Google Cloud instance with 4 vCPUs, a good starting point might be 2 workers, each with 4 threads.

# Example for a Rails application.
# -t takes min:max threads; command-line flags override values in config/puma.rb.
bundle exec puma -C config/puma.rb -w 2 -t 4:4 --bind tcp://127.0.0.1:8000

Puma Configuration File (config/puma.rb)

For more granular control, use a puma.rb configuration file. It supports more sophisticated settings, such as environment-specific values and lifecycle hooks. Note that Puma 5.0 removed built-in daemonization, so run Puma under a process manager such as systemd.

# config/puma.rb

# Number of workers. Adjust based on CPU cores.
# For a 4-core VM, 2-4 workers is a good starting point.
workers 2

# Number of threads per worker.
# If your app is I/O bound, you might increase this.
# If CPU bound, keep it closer to 1.
threads 0, 4 # Min threads 0, Max threads 4

# Bind to a TCP socket for Nginx to proxy to.
bind "tcp://127.0.0.1:8000"

# Environment (e.g., "production")
environment ENV.fetch("RAILS_ENV") { "production" }

# Daemonization: the daemonize option was removed in Puma 5.0.
# Run Puma in the foreground under a process manager such as systemd.
# daemonize false # Only valid on Puma 4.x and earlier

# PID and state files, and request logging
pidfile "tmp/pids/puma.pid"
state_path "tmp/pids/puma.state"
log_requests true

# Graceful shutdown: seconds a worker gets to finish in-flight requests
# before it is killed (cluster mode).
worker_shutdown_timeout 15

# Preload application code
preload_app!

# Callbacks for worker startup/shutdown
on_worker_boot do
  # Worker specific setup
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end

on_worker_shutdown do
  # Worker specific cleanup
end

# Worker timeout (in seconds): how long the master waits for a worker to check in
# before treating it as hung and restarting it. This is not a request timeout
# and does not protect against slow requests.
worker_timeout 60

Gunicorn (for Python, but principles apply)

Gunicorn is a WSGI server for Python; it does not run Ruby applications. Its worker and thread concepts are nonetheless similar to Puma’s. For a Ruby app, you’d use a Rack-compatible server like Puma or Unicorn. For comparison, a Gunicorn command for a Python app might look like:

# Example for a Python app with Gunicorn
gunicorn --workers 4 --threads 2 --bind 127.0.0.1:8000 myapp.wsgi:application

Redis Performance Tuning on Google Cloud

Redis is often used as a cache, session store, or message broker. Optimizing its performance on Google Cloud involves both Redis configuration and understanding its interaction with your application and network.

Redis Configuration (`redis.conf`)

Key parameters in redis.conf for performance include memory management, persistence, and network settings.

# redis.conf

# Note: redis.conf does not support trailing comments; keep each comment on its own line.

# Max memory to use. Crucial for preventing Redis from consuming all available RAM.
# Set this to a value less than your instance's total RAM, leaving room for the OS and other processes.
# Example for a 4GB RAM instance:
maxmemory 3gb
# Eviction policy: LRU is common for caching.
maxmemory-policy allkeys-lru

# Persistence settings: RDB snapshots are good for backups, AOF for durability.
# For caching, you might disable persistence or use RDB only.
# For session stores, AOF might be preferred.
# Snapshot after 900 seconds (15 minutes) if at least 1 key changed.
save 900 1
# Snapshot after 300 seconds (5 minutes) if at least 10 keys changed.
save 300 10
# Snapshot after 60 seconds if at least 10000 keys changed.
save 60 10000
# Set to 'yes' for AOF persistence. For pure caching, 'no' is fine.
appendonly no

# TCP settings
# Default is 511; can be increased if you see connection issues under high load.
# The kernel's net.core.somaxconn setting caps the effective value.
tcp-backlog 511
# Send TCP keepalive probes every 300 seconds to detect dead peers.
tcp-keepalive 300

# Performance tuning for I/O
# Enable I/O threading on Redis 6.0+ for better multi-core utilization.
# io-threads 4
# If using io-threads, also use them for reads.
# io-threads-do-reads yes

Google Cloud Network Considerations

Network latency between your application servers and Redis is a significant factor. Place your Redis instance and application servers in the same Google Cloud region and, ideally, the same zone, and connect over internal IP addresses within the same VPC network to minimize latency.
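One way to check the latency your application servers actually see is to time a few PING round trips from one of them. The sketch below uses only Ruby's standard library and sends PING as an inline command over a raw TCP socket. It assumes the instance accepts unauthenticated, unencrypted connections; the helper name redis_ping_latencies_ms is hypothetical, and the host and port in the usage comment are placeholders.

```ruby
require "socket"

# Times PING round trips to a Redis server and returns latencies in ms.
# Assumption: no AUTH or TLS is required on the target instance.
def redis_ping_latencies_ms(host, port, samples: 5, timeout: 2)
  latencies = []
  Socket.tcp(host, port, connect_timeout: timeout) do |sock|
    samples.times do
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      sock.write("PING\r\n")   # Redis accepts simple inline commands
      reply = sock.gets        # a healthy server answers "+PONG\r\n"
      raise "unexpected reply: #{reply.inspect}" unless reply&.start_with?("+PONG")
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      latencies << elapsed * 1000.0
    end
  end
  latencies
end

# Usage (placeholder address; replace with your instance's internal IP):
#   samples = redis_ping_latencies_ms("10.0.0.5", 6379)
#   puts format("avg %.2f ms", samples.sum / samples.size)
```

Running it from an instance in another zone and comparing the averages gives a quick sense of how much placement matters for your workload.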

Monitoring and Diagnostics

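Regularly monitor Redis performance using commands like INFO and, for short debugging sessions, MONITOR. MONITOR streams every command the server processes and noticeably reduces throughput, so avoid leaving it running in production. Pay attention to: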

used_memory: Track memory consumption against maxmemory.
instantaneous_ops_per_sec: High values indicate heavy load.
keyspace_hits and keyspace_misses: Monitor cache hit ratio.
evicted_keys: Indicates memory pressure and aggressive eviction.
rejected_connections: Connections refused because the maxclients limit was reached.

Example Redis `INFO` Output Analysis

# Server
redis_version:6.2.6
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:abcdef123456

# Clients
connected_clients:150
client_recent_max_input_buffer:2
client_recent_max_output_buffer:0
blocked_clients:0

# Memory
used_memory:1073741824 # 1GB used
used_memory_human:1.00G
used_memory_rss:1258291200 # 1.2GB RSS (includes overhead)
used_memory_peak:1500000000 # Peak memory usage
used_memory_peak_human:1.40G
total_system_memory:4000000000 # 4GB total system memory
total_system_memory_human:3.73G
maxmemory:3221225472 # 3GB maxmemory limit
maxmemory_human:3.00G
maxmemory_policy:allkeys-lru

# Persistence
rdb_bgsave_in_progress:0
rdb_last_save_time:1678886400
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1

# Stats
total_connections_received:1000000
total_commands_processed:50000000
instantaneous_ops_per_sec:15000
total_net_input_bytes:10000000000
total_net_output_bytes:20000000000
instantaneous_input_kbps:1000.00
instantaneous_output_kbps:2000.00
rejected_connections:0
sync_full:1
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
evicted_keys:5000 # Indicates memory pressure, keys are being removed.
keyspace_hits:45000000
keyspace_misses:5000000

# Keyspace
db0:keys=1000000,expires=0,avg_ttl=0

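In this example, used_memory (1 GB) is well within maxmemory (3 GB), and the hit ratio is keyspace_hits / (keyspace_hits + keyspace_misses) = 45,000,000 / 50,000,000 = 90%. evicted_keys is a cumulative counter since the server started (or since the last CONFIG RESETSTAT), so a nonzero value shows that Redis reached maxmemory at some point and removed keys, even though current usage is low. If the counter keeps climbing and that is undesirable for your caching strategy, increase maxmemory (if system RAM allows), shorten TTLs or reduce the data stored, or choose a different eviction policy such as volatile-lru, which evicts only keys that have an expiry set.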
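The figures discussed above can be derived from raw INFO text with a small parser. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the sample string reuses the values from the example output rather than data from a live server.

```ruby
# Parses INFO text into a Hash of field name => value (both strings).
# Section headers ("# Memory") and blank lines are skipped.
def parse_redis_info(text)
  text.each_line.with_object({}) do |line, fields|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    key, value = line.split(":", 2)
    fields[key] = value if value
  end
end

# Fraction of key lookups served from Redis (0.0 when there were none).
def cache_hit_ratio(info)
  hits = info.fetch("keyspace_hits").to_f
  misses = info.fetch("keyspace_misses").to_f
  total = hits + misses
  total.zero? ? 0.0 : hits / total
end

# Fraction of maxmemory in use; nil when maxmemory is 0 (unlimited).
def memory_utilization(info)
  limit = info.fetch("maxmemory").to_f
  limit.zero? ? nil : info.fetch("used_memory").to_f / limit
end

sample = <<~INFO
  # Memory
  used_memory:1073741824
  maxmemory:3221225472
  # Stats
  evicted_keys:5000
  keyspace_hits:45000000
  keyspace_misses:5000000
INFO

info = parse_redis_info(sample)
puts format("hit ratio: %.1f%%", cache_hit_ratio(info) * 100)          # prints hit ratio: 90.0%
puts format("memory used: %.1f%% of maxmemory", memory_utilization(info) * 100) # prints memory used: 33.3% of maxmemory
puts "evicted keys so far: #{info.fetch('evicted_keys')}"
```

Sampling these values periodically and comparing successive readings of evicted_keys shows whether eviction is ongoing or only happened in the past.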