The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and PostgreSQL on Google Cloud for Ruby

Nginx Configuration for High-Traffic Ruby Applications

Optimizing Nginx is crucial for serving Ruby web applications efficiently, especially when dealing with high concurrency. The primary goals are to minimize latency, maximize throughput, and ensure robust error handling. We’ll focus on key directives that impact performance and stability.

Worker Processes and Connections

The worker_processes directive determines how many worker processes Nginx spawns. Setting it to auto is generally recommended; Nginx then starts one worker per detected CPU core. The worker_connections directive sets the maximum number of simultaneous connections each worker process can open. That count includes connections to upstream servers as well as to clients, and it cannot exceed the process's open-file limit (see worker_rlimit_nofile). Set it high enough to accommodate peak traffic, but not so high that it exhausts system resources.
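As a rough illustration of how these two directives combine, the sketch below estimates an upper bound on simultaneous clients. It assumes that each proxied client occupies two connections in a worker (one to the client, one to the upstream); the method name max_nginx_clients is hypothetical and not part of any Nginx tooling.

```ruby
# Rough upper bound on simultaneous clients Nginx can serve.
# Assumption: when proxying, each client uses two connections
# (client <-> Nginx and Nginx <-> upstream), so capacity is halved.
def max_nginx_clients(worker_processes:, worker_connections:, proxying: true)
  total = worker_processes * worker_connections
  proxying ? total / 2 : total
end

# Example: 4 CPU cores (worker_processes auto) and worker_connections 4096
puts max_nginx_clients(worker_processes: 4, worker_connections: 4096)
```

The real ceiling is usually lower, because file descriptor limits and backend capacity are reached first.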

Example Nginx Configuration Snippet

worker_processes auto;
events {
    worker_connections 4096; # Adjust based on your server's RAM and expected load
    multi_accept on;
}

http {
    # ... other http directives ...

    server {
        listen 80;
        server_name your_domain.com;
        root /path/to/your/ruby/app/public; # For static files

        location / {
            proxy_pass http://unix:/path/to/your/app.sock; # Or http://127.0.0.1:PORT
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_read_timeout 300s; # Increase for long-running requests
            proxy_connect_timeout 75s;
        }

        # Serve static assets directly from Nginx for better performance
        location ~ ^/(assets|images|javascripts|stylesheets)/ {
            expires 1y;
            add_header Cache-Control "public";
            try_files $uri =404;
        }

        # Deny access to hidden files
        location ~ /\. {
            deny all;
        }
    }
}

Buffering and Timeouts

Nginx uses buffers to handle requests and responses. Tuning client_body_buffer_size and proxy_buffers, together with the client_max_body_size upload limit, can prevent issues with large uploads or slow clients. Timeouts like proxy_read_timeout and proxy_connect_timeout keep connections from being held open indefinitely by slow backend applications or database queries.

Gunicorn/Puma Tuning for Ruby Applications

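Puma is the usual application server for Ruby (Rack and Rails) applications. Gunicorn is a Python WSGI HTTP server and cannot run Ruby code; it is covered here only because many teams run Python services alongside their Ruby apps, and its closest Ruby counterpart is Unicorn. The principles of tuning are the same for both: manage worker processes, threads, and timeouts effectively.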

Worker and Thread Configuration

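The number of worker processes and threads directly impacts concurrency. A common strategy is to set the number of workers based on CPU cores and then use threads to handle I/O-bound tasks within each worker. For Puma, this is controlled by the workers and threads settings in config/puma.rb (or --workers and --threads on the command line). Gunicorn exposes the same two knobs as --workers and --threads. In both cases an interpreter lock (the GVL in MRI Ruby, the GIL in CPython) lets only one thread per process execute interpreter code at a time, so threads add concurrency mainly for I/O-bound work.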
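To make the worker and thread arithmetic concrete, here is a minimal sketch that estimates request concurrency and the database connections a Puma deployment may open. The method name puma_capacity and the instances parameter are hypothetical, and the estimate assumes every Puma thread can hold one ActiveRecord connection at a time.

```ruby
# Estimate request concurrency and database connections for a Puma deployment.
# Assumption: every Puma thread may hold one ActiveRecord connection at a time.
def puma_capacity(workers:, threads:, instances: 1)
  processes = [workers, 1].max # workers = 0 means single (non-clustered) mode
  per_instance = processes * threads
  {
    concurrent_requests: per_instance * instances,
    db_connections: per_instance * instances,
    pool_size_per_process: threads # the `pool` value in database.yml
  }
end

# Example: 2 workers with 5 threads each on a single server
p puma_capacity(workers: 2, threads: 5)
```

The db_connections figure is the one to compare against the database's connection limit before raising either setting.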

Puma Configuration Example (via `config/puma.rb`)

# config/puma.rb
workers Integer(ENV.fetch("WEB_CONCURRENCY") { 2 }) # Number of workers
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS") { 5 }) # Threads per worker

threads threads_count, threads_count

preload_app!

rackup "config.ru" # Path to the rackup file (config.ru is Puma's default)
environment ENV.fetch("RAILS_ENV") { "development" }

on_worker_boot do
  # Worker specific setup for Rails.
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
end

# Allow Puma to be restarted by `rails restart` command.
plugin :tmp_restart

# Bind to a Unix socket for Nginx to proxy to
bind "unix:///path/to/your/app.sock"

# Or bind to a TCP socket
# bind "tcp://0.0.0.0:3000"

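# worker_timeout is not a per-request timeout: it is how long the master waits
# for a worker to check in before restarting it (cluster mode only).
# Puma's default is 60 seconds. Puma has no built-in request timeout, so
# long-running requests are bounded by Nginx's proxy_read_timeout or by
# middleware such as rack-timeout.
worker_timeout 60 # seconds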

Gunicorn Configuration Example (Command Line, Python WSGI apps only)

gunicorn --workers 4 \
         --threads 2 \
         --bind unix:/path/to/your/app.sock \
         --timeout 120 \
         --log-level info \
         your_app.wsgi:application

Note: In CPython, the Global Interpreter Lock (GIL) means that Gunicorn threads are best suited for I/O-bound tasks; for CPU-bound tasks, multiple worker processes are more effective. The same holds for MRI Ruby, whose Global VM Lock (GVL) lets only one thread run Ruby code at a time. If your Ruby app is heavily CPU-bound, add Puma workers rather than threads.

Timeouts and Graceful Restarts

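Setting appropriate timeout values in your web server is crucial. With Gunicorn, a worker that exceeds --timeout is killed and restarted, and the in-flight request fails. Puma's worker_timeout only detects hung worker processes and does not bound request time. Make sure whichever limit applies is consistent with Nginx's proxy_read_timeout. For zero-downtime deployments, use graceful restarts: Puma supports phased restarts (pumactl phased-restart or SIGUSR1, which requires preload_app! to be off) and hot restarts (SIGUSR2), while Gunicorn reloads workers gracefully on SIGHUP. The plugin :tmp_restart line only lets touching tmp/restart.txt trigger a restart.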
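The interaction between the proxy timeout and the application server timeout can be sketched as a small decision function. This is an illustration under stated assumptions, not a description of any tool: first_timeout is a hypothetical name, the app server is assumed to kill workers after app_timeout seconds (as Gunicorn's --timeout does), and the request is assumed to send no data until it finishes, so proxy_read_timeout effectively bounds the whole request.

```ruby
# Predict which limit fires first for a slow request.
# Assumption: the app server kills workers after app_timeout seconds
# (Gunicorn's --timeout behaves this way; Puma's worker_timeout does not).
def first_timeout(proxy_read_timeout:, app_timeout:)
  if app_timeout < proxy_read_timeout
    :app_server_kills_worker # Nginx sees the upstream close, typically a 502
  elsif app_timeout > proxy_read_timeout
    :nginx_gives_up          # Nginx returns a 504 while the worker keeps running
  else
    :race                    # Equal values: either side may win
  end
end

# Example with the values used in this guide (300s in Nginx, 120s in Gunicorn)
puts first_timeout(proxy_read_timeout: 300, app_timeout: 120)
```

Which outcome is preferable depends on the application; the point is to choose it deliberately rather than leave the two values unrelated.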

PostgreSQL Tuning on Google Cloud (GCP)

Database performance is often the bottleneck. Tuning PostgreSQL on GCP, particularly with Cloud SQL, involves understanding its configuration parameters and leveraging GCP’s managed services effectively.

Key PostgreSQL Configuration Parameters

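These parameters are typically set in postgresql.conf. On Cloud SQL, you cannot edit that file; instead you set the supported parameters as database flags in the GCP Console (under “Flags”) or with gcloud. Not every parameter shown here is exposed as a flag, so check the supported-flags list for your PostgreSQL version.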

Memory Allocation

# Shared memory buffer for caching data pages.
# Aim for 25% of total RAM, but not more than ~8GB for typical workloads.
shared_buffers = 2GB

# Write-ahead log buffer: shared memory for WAL not yet written to disk.
# Larger values can help when many sessions commit at once; they do not affect recovery time.
wal_buffers = 16MB

# Memory available to each sort or hash operation before it spills to disk.
# This is per operation, not per connection: one complex query can use several
# multiples of work_mem, and every connection can do so at once.
# Keep it modest when max_connections is high and raise it per session or
# per query (SET work_mem) where needed.
work_mem = 64MB

# Maximum memory each autovacuum worker may use.
# Adequate memory lets vacuum process more dead tuples per pass, limiting bloat.
autovacuum_work_mem = 1GB # Or set to -1 (the default) to use maintenance_work_mem

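# Background writer: maximum buffers written per round (default 100) and the
# multiplier applied to recent buffer demand to size the next round (default 2.0).
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 1.0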

Connection Management

# Maximum number of concurrent connections.
# This is a critical parameter. Each connection consumes RAM.
# Set based on your application's needs and server RAM.
max_connections = 100

# Maximum number of background worker processes.
# Parallel query workers are drawn from this pool.
max_worker_processes = 8 # Should be >= max_parallel_workers; autovacuum workers are counted separately

# Number of autovacuum worker processes.
autovacuum_max_workers = 3

# Maximum number of parallel workers per Gather node (roughly, per query),
# followed by the total number of parallel workers across all sessions.
max_parallel_workers_per_gather = 4
max_parallel_workers = 4
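The memory and connection settings above interact, so it helps to estimate the worst case before raising any of them. The sketch below is a deliberately crude model: postgres_memory_estimate_mb is a hypothetical helper, and it assumes each connection runs operations_per_query sort or hash operations at once, each using the full work_mem. It ignores per-backend overhead and maintenance memory.

```ruby
# Worst-case PostgreSQL memory estimate in MB.
# Assumption: every connection runs `operations_per_query` sort/hash
# operations simultaneously, each consuming the full work_mem.
def postgres_memory_estimate_mb(shared_buffers_mb:, work_mem_mb:, max_connections:, operations_per_query: 1)
  shared_buffers_mb + work_mem_mb * max_connections * operations_per_query
end

# Values from the settings above: shared_buffers = 2GB, work_mem = 64MB, max_connections = 100
estimate = postgres_memory_estimate_mb(shared_buffers_mb: 2048, work_mem_mb: 64, max_connections: 100)
puts "Worst case: #{estimate} MB"
```

With those values the estimate is 8448 MB, which is why a 64MB work_mem is only safe with 100 connections on an instance that has RAM to spare.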

Write-Ahead Log (WAL) Tuning

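# synchronous_commit controls whether a commit waits for its WAL to be flushed.
# 'on' (default) waits for the flush; 'off' is faster but can lose the most
# recent commits after a crash. 'remote_write' and 'local' only differ from
# 'on' when synchronous replication is configured.
synchronous_commit = on

# wal_level sets how much information is written to WAL; 'replica' (default)
# supports replication and WAL archiving.
wal_level = replica
wal_sync_method = fsync  # How WAL is forced to disk; the default varies by platform
wal_writer_delay = 200ms # Default is 200ms, can be tuned.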

# WAL segment size is fixed when the cluster is created (initdb --wal-segsize,
# default 16MB); it cannot be changed in postgresql.conf or through Cloud SQL flags.
# For heavy write loads, tune checkpoint spacing with max_wal_size instead.
# wal_segment_size = 16MB  (read-only, shown for reference)

Cloud SQL Specific Optimizations

When using Cloud SQL for PostgreSQL, leverage its features:

Instance Sizing: Choose an instance size (CPU, RAM) that matches your workload. Start with a reasonable size and monitor performance.
Read Replicas: Offload read traffic to read replicas to reduce load on the primary instance.
Connection Pooling: Use a connection pooler like PgBouncer. Cloud SQL doesn’t manage this directly, so you’ll need to deploy it separately or within your application environment. This is critical for managing max_connections effectively.
Monitoring: Utilize Cloud Monitoring to track CPU utilization, memory usage, disk I/O, and PostgreSQL-specific metrics (e.g., cache hit ratio, active connections, query latency).
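One of the metrics mentioned above, the cache hit ratio, is simple to derive from the blks_hit and blks_read counters in PostgreSQL's pg_stat_database view. The helper below is a minimal sketch; the method name cache_hit_ratio is hypothetical, and fetching the counters from the database is left out.

```ruby
# Cache hit ratio from pg_stat_database counters.
# blks_hit:  block requests served from shared_buffers
# blks_read: block requests that had to go to disk or the OS cache
def cache_hit_ratio(blks_hit:, blks_read:)
  total = blks_hit + blks_read
  return nil if total.zero? # no activity recorded yet

  blks_hit.to_f / total
end

ratio = cache_hit_ratio(blks_hit: 990, blks_read: 10)
puts format("Cache hit ratio: %.1f%%", ratio * 100)
```

A ratio that stays well below the high nineties on a read-heavy workload often suggests that shared_buffers or instance memory is too small for the working set.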

Example PgBouncer Configuration (`pgbouncer.ini`)

[databases]
# Format: database_name = connection_string
# Example:
mydb = host=YOUR_CLOUD_SQL_IP port=5432 dbname=your_db_name user=your_user password=your_password

[pgbouncer]
; Listen on a Unix socket (created as .s.PGSQL.6432 in this directory) or on a TCP address
unix_socket_dir = /var/run/pgbouncer
listen_port = 6432
; listen_addr = 0.0.0.0

; Pool mode: session, transaction, or statement
pool_mode = session

; Maximum number of client connections PgBouncer accepts in total.
; Adjust based on your application's concurrency and server resources.
max_client_conn = 1000

; Maximum number of server connections per pool.
; This should be significantly lower than max_client_conn.
default_pool_size = 20

; Minimum number of server connections per pool.
min_pool_size = 5

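; Close client connections that have been idle longer than this many seconds.
; Keep this above your application pool's idle time (0 disables it), or pooled
; application connections will be dropped unexpectedly.
client_idle_timeout = 60

; Close server connections that have been idle longer than this many seconds.
server_idle_timeout = 60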

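; Maximum number of server connections to keep open per database.
; This is a hard limit and must stay below PostgreSQL's max_connections
; (100 in the example above), leaving room for admin and replication sessions.
max_db_connections = 80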

; Logging
log_connections = 0
log_disconnections = 0
log_pooler_errors = 1
log_stats = 0
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

In your application’s database configuration (e.g., database.yml for Rails), you would then point to the PgBouncer socket or port instead of directly to PostgreSQL.
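Before pointing the application at PgBouncer, it is worth checking that the pool sizes are mutually consistent. The sketch below is a hypothetical helper (pgbouncer_sizing_problems is not part of PgBouncer) that compares the application's connection demand with max_client_conn and the pools' server-side demand with PostgreSQL's max_connections. It assumes the PostgreSQL default of 3 superuser-reserved connections, and pools stands for the number of database/user pairs.

```ruby
# Report obvious inconsistencies between app, PgBouncer, and PostgreSQL limits.
# Assumption: `pools` is the number of database/user pairs, each of which may
# open up to default_pool_size server connections.
def pgbouncer_sizing_problems(app_connections:, max_client_conn:, default_pool_size:,
                              pools:, postgres_max_connections:, reserved: 3)
  problems = []
  if app_connections > max_client_conn
    problems << "application may open #{app_connections} connections, max_client_conn is #{max_client_conn}"
  end

  server_side = default_pool_size * pools
  usable = postgres_max_connections - reserved
  if server_side > usable
    problems << "pools may open #{server_side} server connections, PostgreSQL allows #{usable}"
  end
  problems
end

# Illustrative numbers: 40 app threads, one pool of 20, PostgreSQL max_connections = 100
puts pgbouncer_sizing_problems(app_connections: 40, max_client_conn: 1000,
                               default_pool_size: 20, pools: 1,
                               postgres_max_connections: 100).inspect
```

An empty list means the limits nest correctly: application demand fits within PgBouncer, and PgBouncer's demand fits within PostgreSQL.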

Monitoring and Iteration

Performance tuning is an iterative process. Continuously monitor your application and infrastructure metrics. Use tools like pg_stat_statements to identify slow queries and optimize them. Benchmark changes before and after applying them to quantify improvements. For GCP, Cloud Logging and Cloud Monitoring are indispensable for this.
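One simple way to quantify a change is to compare latency percentiles from a benchmark run before and after it. The sketch below uses the nearest-rank percentile method; percentile and compare_runs are hypothetical helper names, and the sample latencies are made up for illustration.

```ruby
# Nearest-rank percentile of a list of latency samples.
def percentile(samples, pct)
  raise ArgumentError, "samples must not be empty" if samples.empty?

  sorted = samples.sort
  rank = (pct / 100.0 * sorted.length).ceil - 1
  sorted[[rank, 0].max]
end

# Compare two benchmark runs at the given percentiles.
def compare_runs(before, after, pcts: [50, 95])
  pcts.to_h do |pct|
    b = percentile(before, pct)
    a = percentile(after, pct)
    [pct, { before: b, after: a, change_pct: ((a - b) * 100.0 / b).round(1) }]
  end
end

# Made-up request latencies in milliseconds
before = [100, 120, 110, 300, 105]
after  = [80, 90, 85, 150, 82]
compare_runs(before, after).each do |pct, result|
  puts "p#{pct}: #{result[:before]}ms -> #{result[:after]}ms (#{result[:change_pct]}%)"
end
```

Comparing the tail (p95 or p99) as well as the median matters because tuning changes often move the two in different directions.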