The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and DynamoDB on Google Cloud for Ruby
Nginx as a High-Performance Frontend for Ruby Applications
When deploying Ruby applications, particularly those using frameworks like Rails or Sinatra, on Google Cloud Platform (GCP), Nginx often serves as the primary web server and reverse proxy. Its efficiency in handling static assets, SSL termination, and load balancing makes it an indispensable component. Proper tuning of Nginx is crucial for maximizing throughput and minimizing latency.
Nginx Worker Processes and Connections
The core of Nginx performance tuning lies in its worker processes and connection handling. The worker_processes directive sets how many worker processes Nginx spawns. A common best practice is to match the number of CPU cores available on the instance. In a cloud environment, where the same configuration may run on machine types of different sizes, setting it to auto is often preferred: Nginx then starts one worker per available CPU core.
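The worker_connections directive limits the number of simultaneous connections that each worker process can open. Multiplied by the number of worker processes, it gives the maximum number of connections Nginx can hold. This count includes connections to upstream servers, not only to clients, so when Nginx proxies to an application server each in-flight client request uses two connections. Each connection also consumes a file descriptor, so the per-process open-file limit (ulimit -n, or the worker_rlimit_nofile directive) must be at least as high as worker_connections. Set the value high enough to absorb peak traffic but not so high that it exhausts system resources; 1024 is a common starting point, raised according to expected load and available memory.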
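As a rough sketch of that arithmetic, assuming every proxied request holds one client connection and one upstream connection, usable client capacity is about half of worker_processes × worker_connections. The helper name nginx_capacity and the 4-vCPU instance are illustrative assumptions, not part of any Nginx tooling.

```ruby
# Rough connection-capacity estimate for Nginx (hypothetical helper).
# Assumption: when proxying, each client request also holds an upstream
# connection, so only about half of the connection slots serve clients.
def nginx_capacity(worker_processes:, worker_connections:, proxying: true)
  total = worker_processes * worker_connections
  clients = proxying ? total / 2 : total
  { total_connections: total, approx_clients: clients }
end

# Example: a hypothetical 4-vCPU instance (worker_processes auto => 4)
# with worker_connections 4096.
p nginx_capacity(worker_processes: 4, worker_connections: 4096)
```

On that instance the estimate is 16,384 connection slots, or roughly 8,192 concurrently connected clients when every request is proxied.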
Optimizing Keep-Alive and Buffers
HTTP keep-alive connections significantly reduce the overhead of establishing new TCP connections for each request. The keepalive_timeout directive controls how long an idle keep-alive connection will remain open. A value between 60 and 120 seconds is generally a good balance, allowing clients to reuse connections without holding them open indefinitely and consuming resources.
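The timeout also determines how many idle connections Nginx carries. A rough upper bound comes from Little's law (connections held = arrival rate × time held), under the simplifying assumption that each new client opens one connection, finishes its requests quickly, and then sits idle until keepalive_timeout closes it. The rate of 50 new clients per second is a made-up figure for illustration.

```ruby
# Rough upper bound on idle keep-alive connections via Little's law
# (L = lambda * W). Assumption: each new client opens one connection,
# finishes its requests quickly, then stays idle until keepalive_timeout.
def idle_keepalive_connections(new_clients_per_sec:, keepalive_timeout_s:)
  new_clients_per_sec * keepalive_timeout_s
end

[65, 120].each do |timeout|
  idle = idle_keepalive_connections(new_clients_per_sec: 50, keepalive_timeout_s: timeout)
  puts "keepalive_timeout #{timeout}s -> up to ~#{idle} idle connections"
end
```

At 50 new clients per second, moving from 65 to 120 seconds nearly doubles the idle connections that count against worker_connections (about 3,250 versus 6,000).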
Buffer directives are critical for efficient request processing. client_body_buffer_size sets the size of the buffer used to read client request bodies; if a body is larger than this buffer, all or part of it is written to a temporary file. client_header_buffer_size sets the buffer for reading request headers; headers that do not fit fall back to the larger buffers defined by large_client_header_buffers. Raising these values can improve performance for applications that receive large request bodies or headers, but the buffers are allocated for each request being read, so excessively large values consume significant memory under load. A common starting point for client_body_buffer_size is 128k or 256k.
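To make the memory trade-off concrete, the sketch below mirrors the buffering rule and estimates worst-case buffer memory. The helper names and the figure of 1,000 concurrent uploads are illustrative assumptions, and the size parser is simplified: Nginx accepts more size forms than it handles.

```ruby
# Hypothetical helpers that mirror the buffering rule described above.
SIZE_UNITS = { "" => 1, "k" => 1024, "m" => 1024 * 1024 }.freeze

# Parses a simple Nginx-style size such as "256k", "1m", or "8192" into bytes.
def parse_size(value)
  m = /\A(\d+)([km]?)\z/i.match(value.to_s.strip)
  raise ArgumentError, "unsupported size: #{value}" unless m
  m[1].to_i * SIZE_UNITS.fetch(m[2].downcase)
end

# Bodies larger than client_body_buffer_size go (at least partly) to a temp file.
def body_destination(body_bytes, client_body_buffer_size)
  body_bytes > parse_size(client_body_buffer_size) ? :temp_file : :memory
end

# Worst case if every concurrent upload fills its body buffer, in MiB.
def worst_case_body_buffer_mib(concurrent_uploads, client_body_buffer_size)
  concurrent_uploads * parse_size(client_body_buffer_size) / (1024.0 * 1024)
end

p body_destination(300_000, "256k")         # 300,000-byte body vs 256 KiB buffer
p worst_case_body_buffer_mib(1_000, "256k") # MiB of buffer memory
```

With 256k buffers, 1,000 simultaneous uploads could pin about 250 MiB, which is why the value is best raised only as far as typical request bodies require.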
Gzip Compression and Caching
Enabling Gzip compression can dramatically reduce the bandwidth required for transferring text-based assets (HTML, CSS, JavaScript, JSON). The gzip directive should be set to on. Further tuning involves gzip_types to specify MIME types to compress, gzip_min_length to avoid compressing very small files, and gzip_comp_level for the compression ratio (a value of 4-6 is often a good compromise between compression and CPU usage).
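Nginx's gzip module is built on zlib, so Ruby's Zlib gives a feel for how the compression level trades CPU for size and why very small bodies are not worth compressing. This is an illustration only: the JSON payload is synthetic, and Zlib::Deflate.deflate emits zlib framing, which differs from gzip framing by a few header bytes.

```ruby
require "json"
require "zlib"

# Illustration only: compressed size at a given zlib level.
def compressed_size(data, level)
  Zlib::Deflate.deflate(data, level).bytesize
end

# Synthetic, repetitive JSON similar to a typical API listing.
payload = JSON.generate(Array.new(500) { |i| { id: i, name: "item-#{i}", status: "active" } })

[1, 4, 6, 9].each do |level|
  size = compressed_size(payload, level)
  puts format("level %d: %d -> %d bytes (%.1f%%)", level, payload.bytesize, size, 100.0 * size / payload.bytesize)
end

# Tiny bodies gain nothing: framing overhead can make the output larger,
# which is what gzip_min_length guards against.
tiny = '{"ok":true}'
puts "tiny body: #{tiny.bytesize} -> #{compressed_size(tiny, 6)} bytes"
```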
Browser caching is also essential. The expires directive, often used within location blocks for static assets, tells the browser how long it can cache a resource. Setting this to a long duration (e.g., 30d for 30 days) for versioned assets significantly reduces server load.
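A positive expires value makes Nginx send both an Expires header and a Cache-Control: max-age header expressed in seconds. The hypothetical helper below shows that conversion for simple durations; it handles only the s, m, h, d, and w suffixes (in Nginx, m means minutes).

```ruby
# Hypothetical helper: converts a simple Nginx duration ("30d", "12h", "1w")
# into the max-age value that `expires` puts in the Cache-Control header.
NGINX_TIME_UNITS = { "s" => 1, "m" => 60, "h" => 3_600, "d" => 86_400, "w" => 604_800 }.freeze

def max_age_for(duration)
  m = /\A(\d+)([smhdw])\z/.match(duration)
  raise ArgumentError, "unsupported duration: #{duration}" unless m
  m[1].to_i * NGINX_TIME_UNITS.fetch(m[2])
end

puts "expires 30d -> Cache-Control: max-age=#{max_age_for('30d')}"
```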
Nginx Configuration Example
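Here’s a sample Nginx configuration snippet for a Ruby application, incorporating some of these optimizations. It covers only the global and http-level settings and assumes Nginx is acting as a reverse proxy to a Puma (or Gunicorn) application server, with the server blocks that do the proxying kept in the virtual host files it includes.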
nginx.conf Snippet
# Global settings
worker_processes auto; # Or set to the number of CPU cores
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;
events {
    worker_connections 4096; # Adjust based on expected load and system limits
    multi_accept on;
}
http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    # Gzip Compression
    gzip on;
    gzip_disable "msie6";
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml;
    # Buffers
    client_body_buffer_size 256k;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 8k;
    # MIME types
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    # Logging
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log warn;
    # SSL Configuration (example)
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;
    ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384';
    # Include virtual host configurations (Debian/Ubuntu layout).
    # Do not glob /etc/nginx/*.conf here: that would re-include nginx.conf itself.
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
Tuning Gunicorn/Puma for Ruby Applications
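For Ruby applications, Puma is the most common application server (it is the default for Rails), with Unicorn as a popular preforking alternative. Gunicorn often appears alongside them in deployment guides, but it is a Python WSGI HTTP server (its pre-fork model was ported from Ruby's Unicorn) and cannot serve Rack applications; it belongs in this stack only if Python services sit behind the same Nginx. The tuning concepts carry over between them, and the primary parameters revolve around worker processes, threads, and timeouts.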
Worker Processes and Threads
Both Gunicorn and Puma use worker processes to handle requests concurrently, and both can also run several threads inside each worker (Puma natively; Gunicorn through its --threads option, which switches to the threaded gthread worker). The optimal number of workers and threads depends heavily on whether the application is I/O-bound or CPU-bound and on the underlying hardware (CPU cores, memory).
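A common heuristic for CPU-bound applications is to set the number of worker processes equal to the number of CPU cores. For I/O-bound applications, where requests spend much of their time waiting on databases or external APIs, threads let each worker keep serving while other requests wait. In CRuby (MRI), the global VM lock lets only one thread run Ruby code at a time, so threads help with I/O waits but extra processes are needed to use more cores. A good starting point for Puma is WEB_CONCURRENCY (workers) equal to the number of CPU cores, memory permitting, and RAILS_MAX_THREADS (threads per worker) between 3 and 5; the maximum number of requests in progress at once is then WEB_CONCURRENCY × RAILS_MAX_THREADS. For Gunicorn, the --workers flag plays the same role as Puma's workers setting, and the Gunicorn documentation suggests (2 × number of cores) + 1 workers as a starting value.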
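The starting-point heuristics above can be sketched as two small functions. The helper names, the default of 5 threads, and the per-worker memory figure are assumptions to tune; Etc.nprocessors reports the CPUs visible to the process, which may not reflect a container's CPU quota.

```ruby
require "etc"

# Sketch of the starting-point heuristics (hypothetical helpers).
def puma_plan(cores: Etc.nprocessors, threads: 5, ram_mb: nil, mb_per_worker: nil)
  workers = cores
  # Each worker is a full copy of the app, so memory can cap the count.
  workers = [workers, ram_mb / mb_per_worker].min if ram_mb && mb_per_worker
  workers = [workers, 1].max
  { workers: workers, threads: threads, max_in_flight: workers * threads }
end

# Gunicorn's documented starting point: (2 x cores) + 1 workers.
def gunicorn_workers(cores: Etc.nprocessors)
  (2 * cores) + 1
end

p puma_plan(cores: 4)                                    # CPU-limited
p puma_plan(cores: 8, ram_mb: 2_048, mb_per_worker: 512) # memory-limited
puts "gunicorn --workers #{gunicorn_workers(cores: 4)}"
```

The resulting workers and threads values map onto WEB_CONCURRENCY and RAILS_MAX_THREADS for Puma.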
Gunicorn Configuration Example
When running Gunicorn, settings are passed as command-line arguments or in a configuration file. Because Gunicorn loads Python WSGI applications, an invocation like the following applies to a Python service behind the same Nginx rather than to a Rack app:
gunicorn --workers 4 --threads 2 --bind 0.0.0.0:8000 myapp.wsgi:application
Here:
--workers 4: Spawns 4 worker processes. Adjust based on CPU cores.
--threads 2: Each worker process can handle 2 concurrent requests using threads. This is most useful for I/O-bound tasks.
--bind 0.0.0.0:8000: Listens on all network interfaces on port 8000.
myapp.wsgi:application: Points to the WSGI application object, given as module:variable. Rack applications cannot be loaded this way; Ruby apps need a Rack server such as Puma.
Puma Configuration Example
Puma is often configured via a config/puma.rb file in a Rails application. The WEB_CONCURRENCY environment variable controls the number of worker processes, and RAILS_MAX_THREADS controls threads per worker.
# config/puma.rb
# Specifies the number of processes
# Should be equal to the number of available cores
workers ENV.fetch("WEB_CONCURRENCY") { 2 }.to_i
# Specifies the number of threads per process
# Should be set to accommodate I/O bound tasks
threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }.to_i
threads threads_count, threads_count
environment ENV.fetch("RAILS_ENV") { "development" }
# Each worker runs at most threads_count requests at a time;
# further requests wait until one of its threads is free.
# Specifies the bind address and port
# Use a Unix socket for better performance when Nginx is on the same host
# bind "unix:///var/www/my_app/shared/tmp/sockets/puma.sock"
bind "tcp://0.0.0.0:3000"
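# PID and state files (the state file is read by pumactl)
pidfile "/var/www/my_app/shared/tmp/pids/puma.pid"
state_path "/var/www/my_app/shared/tmp/pids/puma.state"
# Start Puma's control server, used by pumactl and monitoring tools
activate_control_app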
# Allow Puma to be restarted by `rails restart` command.
plugin :tmp_restart
# Logging
stdout_redirect "/var/log/puma/puma_stdout.log", "/var/log/puma/puma_stderr.log", true
To run this configuration, you would typically set the environment variables:
export WEB_CONCURRENCY=4
export RAILS_MAX_THREADS=5
bundle exec puma -C config/puma.rb
DynamoDB Performance Tuning on GCP
DynamoDB is not part of the web server stack, and it is an AWS service rather than a GCP one, but applications running on GCP sometimes use it for persistent storage, calling it across clouds through the AWS SDK. Optimizing DynamoDB involves understanding its throughput model, indexing strategies, and query patterns.
Throughput Provisioning (RCUs and WCUs)
DynamoDB offers two capacity modes. In provisioned mode, throughput is measured in Read Capacity Units (RCUs) and Write Capacity Units (WCUs). One RCU allows one strongly consistent read per second for an item up to 4 KB, or two eventually consistent reads per second; one WCU allows one write per second for an item up to 1 KB. Larger items consume additional units, with sizes rounded up to the next 4 KB for reads and the next 1 KB for writes.
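Those rules translate directly into a capacity estimate. The helpers and the example workload (6 KB items read 100 times per second, 2.5 KB items written 50 times per second) are hypothetical; transactional reads and writes, which cost more, are not modeled.

```ruby
# Capacity-unit arithmetic for provisioned mode (hypothetical helpers).
# 1 RCU = one strongly consistent read/s of an item up to 4 KB, or two
# eventually consistent reads; 1 WCU = one write/s of an item up to 1 KB.
# Item sizes round up to the next 4 KB (reads) or 1 KB (writes).
def rcus_needed(item_kb:, reads_per_sec:, strongly_consistent: true)
  units = (item_kb / 4.0).ceil * reads_per_sec
  strongly_consistent ? units : (units / 2.0).ceil
end

def wcus_needed(item_kb:, writes_per_sec:)
  item_kb.ceil * writes_per_sec
end

puts "strongly consistent reads: #{rcus_needed(item_kb: 6, reads_per_sec: 100)} RCUs"
puts "eventually consistent reads: #{rcus_needed(item_kb: 6, reads_per_sec: 100, strongly_consistent: false)} RCUs"
puts "writes: #{wcus_needed(item_kb: 2.5, writes_per_sec: 50)} WCUs"
```

The rounding matters: a 6 KB item costs as much to read as an 8 KB one, so keeping items at or below a 4 KB boundary can halve read costs.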
Key Tuning Strategies:
- Auto Scaling: Configure DynamoDB Auto Scaling to adjust provisioned throughput automatically based on actual traffic. It reduces throttling as traffic grows and saves costs during lulls, although it reacts over minutes rather than seconds and cannot absorb sudden spikes on its own. Set appropriate minimum and maximum values for RCUs and WCUs.
- On-Demand Capacity: For unpredictable workloads, consider DynamoDB On-Demand capacity. This eliminates the need to provision throughput, and you pay per request. It's often more cost-effective for new or highly variable workloads.
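- Monitoring: Continuously monitor the CloudWatch metrics ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits against ProvisionedReadCapacityUnits and ProvisionedWriteCapacityUnits. Consumption sustained at the provisioned level, especially together with non-zero ReadThrottleEvents or WriteThrottleEvents, means requests are being throttled. Consumption far below the provisioned level means you are probably over-provisioned.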
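Consumed capacity is best read with CloudWatch's Sum statistic and divided by the period length to get units per second before comparing it with provisioned capacity. The classifier below is a hypothetical sketch of that check; its 90% and 30% thresholds are assumptions to adjust.

```ruby
# Hypothetical classifier for one CloudWatch datapoint.
# consumed_sum: Sum of ConsumedRead/WriteCapacityUnits over the period.
def capacity_status(consumed_sum:, period_s:, provisioned:, throttle_events: 0,
                    high_water: 0.9, low_water: 0.3)
  return :throttled if throttle_events.positive?

  utilization = consumed_sum.fdiv(period_s) / provisioned
  return :near_limit if utilization >= high_water
  return :over_provisioned if utilization < low_water

  :ok
end

p capacity_status(consumed_sum: 3_000, period_s: 60, provisioned: 100) # 50 units/s
p capacity_status(consumed_sum: 600, period_s: 60, provisioned: 100)   # 10 units/s
p capacity_status(consumed_sum: 5_700, period_s: 60, provisioned: 100, throttle_events: 3)
```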
Indexing Strategies
The choice of primary keys (Partition Key and Sort Key) and secondary indexes (Global Secondary Indexes - GSIs, Local Secondary Indexes - LSIs) is paramount for efficient querying. A good partition key distributes data evenly across partitions to avoid hot spots. Sort keys enable efficient range queries within a partition.
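DynamoDB hashes the partition key to choose a partition. The sketch below uses MD5 modulo 8 purely as a stand-in (it is not DynamoDB's actual hashing or partition count) to show why a low-cardinality key such as a status flag concentrates traffic while a per-user key spreads it. The key formats are hypothetical.

```ruby
require "digest"

# Stand-in for DynamoDB's internal key hashing: MD5 modulo a fixed number of
# partitions. This is NOT DynamoDB's real algorithm; it only shows the effect
# of partition-key cardinality on how evenly work spreads.
def bucket_counts(keys, partitions: 8)
  counts = Array.new(partitions, 0)
  keys.each { |key| counts[Digest::MD5.hexdigest(key).to_i(16) % partitions] += 1 }
  counts
end

requests = 10_000
# Low cardinality: only two distinct partition-key values.
low_card = Array.new(requests) { |i| i % 10 < 9 ? "status#active" : "status#archived" }
# High cardinality: one partition-key value per user.
high_card = Array.new(requests) { |i| "user##{i}" }

p bucket_counts(low_card)  # at most 2 of 8 buckets receive any traffic
p bucket_counts(high_card) # roughly 1,250 per bucket
```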
Query Optimization
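Avoid Scan operations on large tables. A Scan reads every item in the table, and any filter expression is applied only after the read, so filtered-out items still consume read capacity. Instead, design your data model and indexes to support Query operations, which read only the items sharing a single partition key value (optionally narrowed by a sort-key condition) and are therefore far cheaper and faster.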
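As a minimal sketch of a targeted Query, assuming the aws-sdk-dynamodb gem and a hypothetical "orders" table keyed by customer_id (partition key) and created_at (sort key, ISO-8601 strings), the request can be built as a plain hash and passed to Aws::DynamoDB::Client#query. Building the hash separately keeps the example runnable without AWS credentials; the SDK's default simple-attributes mode accepts plain Ruby values in expression_attribute_values.

```ruby
# Builds Query parameters for a hypothetical "orders" table.
def recent_orders_query(customer_id, since_iso8601, limit: 25)
  {
    table_name: "orders",
    # Targets one partition and a sort-key range instead of scanning the table.
    key_condition_expression: "customer_id = :cid AND #ts >= :since",
    expression_attribute_names: { "#ts" => "created_at" },
    expression_attribute_values: { ":cid" => customer_id, ":since" => since_iso8601 },
    scan_index_forward: false, # newest items first
    limit: limit
  }
end

params = recent_orders_query("c-123", "2024-01-01T00:00:00Z")
p params

# With the gem installed and credentials configured (not executed here):
#   require "aws-sdk-dynamodb"
#   client = Aws::DynamoDB::Client.new(region: "us-east-1")
#   client.query(params).items.each { |item| puts item["created_at"] }
```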
When using GSIs, ensure that the GSI's keys align with your query patterns. Projecting only the necessary attributes to the GSI can reduce storage costs and improve performance.
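For example, a GSI that answers "orders for a given product" could project only the attributes that query needs. The sketch below shows the index definition in the shape of the global_secondary_indexes parameter of Aws::DynamoDB::Client#create_table, plus a Query against it. The table, index, and attribute names are hypothetical; the create_table call would also need attribute_definitions entries for sku and created_at, and in provisioned mode the index needs its own provisioned_throughput.

```ruby
# Hypothetical GSI definition for an "orders" table.
def orders_by_sku_index
  {
    index_name: "sku-created_at-index",
    key_schema: [
      { attribute_name: "sku", key_type: "HASH" },
      { attribute_name: "created_at", key_type: "RANGE" }
    ],
    # INCLUDE projects the table and index keys plus only the listed attributes.
    projection: { projection_type: "INCLUDE", non_key_attributes: %w[quantity customer_id] }
  }
end

# Queries against a GSI must name the index explicitly.
def orders_by_sku_query(sku)
  {
    table_name: "orders",
    index_name: "sku-created_at-index",
    key_condition_expression: "sku = :sku",
    expression_attribute_values: { ":sku" => sku }
  }
end

p orders_by_sku_index[:projection]
p orders_by_sku_query("SKU-42")
```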
DynamoDB Example: Auto Scaling Configuration (AWS CLI)
While GCP doesn't directly host DynamoDB (it's an AWS service), if your application on GCP needs to interact with DynamoDB, you'd manage its configuration via AWS tools. Here's an example of setting up Auto Scaling for a DynamoDB table using the AWS CLI:
# 1. Register read and write capacity as scalable targets (must run before adding policies)
aws application-autoscaling register-scalable-target \
--service-namespace dynamodb \
--resource-id table/your-table-name \
--scalable-dimension dynamodb:table:ReadCapacityUnits \
--min-capacity 5 \
--max-capacity 1000
aws application-autoscaling register-scalable-target \
--service-namespace dynamodb \
--resource-id table/your-table-name \
--scalable-dimension dynamodb:table:WriteCapacityUnits \
--min-capacity 5 \
--max-capacity 1000
# 2. Attach a target-tracking policy to read capacity
aws application-autoscaling put-scaling-policy \
--service-namespace dynamodb \
--resource-id table/your-table-name \
--scalable-dimension dynamodb:table:ReadCapacityUnits \
--policy-name MyDynamoDBReadAutoScalingPolicy \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "DynamoDBReadCapacityUtilization"
},
"ScaleInCooldown": 300,
"ScaleOutCooldown": 300,
"DisableScaleIn": false
}'
# 3. Attach a target-tracking policy to write capacity
aws application-autoscaling put-scaling-policy \
--service-namespace dynamodb \
--resource-id table/your-table-name \
--scalable-dimension dynamodb:table:WriteCapacityUnits \
--policy-name MyDynamoDBWriteAutoScalingPolicy \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
},
"ScaleInCooldown": 300,
"ScaleOutCooldown": 300,
"DisableScaleIn": false
}'
Remember to replace your-table-name with your actual DynamoDB table name and adjust the TargetValue, min-capacity, and max-capacity according to your application's needs and cost considerations.
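To reason about what a TargetValue of 70.0 implies, a simplified approximation is that the policy aims for provisioned capacity at which consumed / provisioned is about 70%, clamped to the registered min-capacity and max-capacity. The helper below is a hypothetical sketch; the real Application Auto Scaling algorithm also relies on CloudWatch alarms and cooldowns, which it ignores.

```ruby
# Simplified approximation of a target-tracking policy (hypothetical helper).
# Rational arithmetic avoids float rounding surprises (e.g. 350 / 0.7).
def target_capacity(consumed_per_sec, target_pct: 70, min: 5, max: 1_000)
  desired = (Rational(consumed_per_sec) * 100 / target_pct).ceil
  desired.clamp(min, max)
end

[1, 350, 900].each do |consumed|
  puts "consuming #{consumed} units/s -> provision about #{target_capacity(consumed)}"
end
```

The low and high cases show the bounds at work: tiny loads stay at the minimum of 5, and demand above 700 units per second is capped at 1,000, where throttling becomes likely.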