The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and MongoDB on Google Cloud for Ruby
Nginx as a High-Performance Frontend for Ruby Applications
When deploying Ruby applications on Google Cloud, Nginx serves as an indispensable frontend. Its role extends beyond simple reverse proxying; it handles SSL termination, static file serving, request buffering, and load balancing, offloading these critical tasks from your application servers. Optimizing Nginx is paramount for achieving low latency and high throughput.
Nginx Configuration Tuning
The core of Nginx performance lies in its configuration. We’ll focus on key directives in the main context and in the events, http, and server blocks.
Worker Processes and Connections
The worker_processes directive sets how many worker processes Nginx spawns. Setting it to auto is generally recommended, since Nginx then matches the number of available CPU cores. The worker_connections directive, set in the events block, caps the simultaneous connections each worker can hold open. That count includes connections to upstream servers as well as clients, and it cannot exceed the worker’s open-file limit (raise it with worker_rlimit_nofile if needed). A common starting point is 1024; increase it based on your server’s memory and expected load.
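To make the capacity arithmetic concrete, here is a minimal Ruby sketch. It assumes Nginx is acting purely as a reverse proxy, so each client request occupies two connections on a worker (one to the client, one to the upstream). The helper name max_proxied_clients is made up for this example, and the result is a rough ceiling rather than a guarantee.

```ruby
require "etc"

# Rough upper bound on concurrent clients for an Nginx reverse proxy.
# Each proxied request holds two connections on a worker: one to the
# client and one to the upstream, so the raw total is halved.
def max_proxied_clients(worker_processes:, worker_connections:)
  (worker_processes * worker_connections) / 2
end

# With `worker_processes auto`, Nginx tries to match the CPU count;
# Etc.nprocessors reports that figure for the machine running this script.
cores = Etc.nprocessors
clients = max_proxied_clients(worker_processes: cores, worker_connections: 1024)
puts "cores=#{cores} approx_max_clients=#{clients}"
```

Real capacity is usually lower, since memory, file-descriptor limits, and the application server saturate first.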
Keepalive Connections
Enabling keepalive connections reduces the overhead of establishing new TCP connections for each request. The keepalive_timeout directive specifies how long an idle keepalive connection will remain open. A value between 60 and 120 seconds is a good balance, preventing resource exhaustion while still offering performance benefits.
Buffering and Request Size Limits
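Nginx buffers client requests before passing them to the application. A request body larger than client_body_buffer_size is written to a temporary file, so sizing this buffer for your typical request avoids unnecessary disk I/O. client_header_buffer_size covers request headers; headers that do not fit fall back to the larger buffers set by large_client_header_buffers. For typical web applications, values around 16k or 32k are sufficient. Set client_max_body_size to accommodate the largest expected file upload; Nginx rejects larger requests with a 413 error.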
Gzip Compression
Enabling Gzip compression significantly reduces the amount of data transferred over the network, leading to faster page load times. Ensure gzip on; is set, along with appropriate gzip_types to compress common content types like HTML, CSS, JavaScript, and JSON.
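To get a feel for what compression buys on a JSON response, the following Ruby sketch compresses a synthetic payload with the standard zlib library at a few levels. The payload shape and field names are invented for illustration, and Ruby’s zlib levels only approximate what gzip_comp_level does inside Nginx, so treat the numbers as indicative.

```ruby
require "json"
require "zlib"

# Build a repetitive JSON payload similar to a typical API list response.
# The field names here are made up for the example.
def sample_payload(rows)
  JSON.generate((1..rows).map { |i| { id: i, status: "active", role: "member" } })
end

# Compress at a given level and report original and compressed byte sizes.
def gzip_sizes(body, level)
  compressed = Zlib.gzip(body, level: level)
  { original: body.bytesize, compressed: compressed.bytesize }
end

body = sample_payload(500)
[1, 6, 9].each do |level|
  sizes = gzip_sizes(body, level)
  ratio = (100.0 * sizes[:compressed] / sizes[:original]).round(1)
  puts "level #{level}: #{sizes[:original]} -> #{sizes[:compressed]} bytes (#{ratio}%)"
end
```

Higher levels cost more CPU per response for diminishing size gains, which is why mid-range levels are a common compromise.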
SSL/TLS Optimization
For secure connections, optimize SSL/TLS settings. ssl_session_cache shared:SSL:10m; and ssl_session_timeout 10m; enable session caching, reducing the CPU load for repeated SSL handshakes. Using modern cipher suites and protocols is also crucial for security and performance.
Example Nginx Configuration Snippet
Here’s a sample Nginx configuration snippet incorporating these optimizations. This would typically reside in your nginx.conf or a site-specific configuration file.
worker_processes auto;
worker_rlimit_nofile 65535;
events {
worker_connections 4096;
multi_accept on;
}
http {
include mime.types;
default_type application/octet-stream;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
keepalive_requests 1000;
client_body_buffer_size 16k;
client_header_buffer_size 16k;
client_max_body_size 50m; # Adjust based on expected file uploads
gzip on;
gzip_disable "msie6";
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_buffers 16 8k;
gzip_http_version 1.1;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/rss+xml text/javascript image/svg+xml;
# SSL Configuration (example for a specific server block)
# ssl_certificate /etc/nginx/ssl/your_domain.crt;
# ssl_certificate_key /etc/nginx/ssl/your_domain.key;
# ssl_protocols TLSv1.2 TLSv1.3;
# ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384';
# ssl_prefer_server_ciphers on;
# ssl_session_cache shared:SSL:10m;
# ssl_session_timeout 10m;
# ssl_session_tickets off; # Consider security implications
# Proxy to your Ruby application server (e.g., Puma/Unicorn); location blocks belong inside a server block
# location / {
# proxy_pass http://your_app_backend;
# proxy_set_header Host $host;
# proxy_set_header X-Real-IP $remote_addr;
# proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
# proxy_set_header X-Forwarded-Proto $scheme;
# proxy_read_timeout 300s; # Adjust as needed for long-running requests
# proxy_connect_timeout 75s;
# }
# Serve static files directly
# location /assets/ {
# alias /path/to/your/app/public/assets/;
# expires 30d;
# add_header Cache-Control "public";
# }
}
Puma/Unicorn Tuning for Ruby Applications
Your Ruby application server is the next critical layer. Gunicorn is a Python server; its Ruby counterparts are Puma and Unicorn, and this section uses Puma. Tuning its worker count, threads, and timeouts directly affects how your application handles concurrent requests and its overall responsiveness.
Worker and Thread Configuration
For Puma, the --workers (-w) flag sets the number of worker processes, and the --threads (-t) flag sets the thread pool size per worker, given as min:max. A common strategy is to set workers equal to the number of CPU cores available to the instance, then tune threads according to whether the application is I/O-bound or CPU-bound. I/O-bound applications benefit from more threads, because a thread waiting on the database or network releases the interpreter lock for others. On MRI, only one thread per process runs Ruby code at a time, so CPU-bound applications gain little from extra threads and are better served by fewer threads and more workers.
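The sizing rule above can be sketched in a few lines of Ruby. The helper name puma_sizing and the thread counts (8 for I/O-bound, 2 for CPU-bound) are assumptions chosen for illustration, not Puma defaults; the point is only to show how workers and threads multiply into total concurrency.

```ruby
require "etc"

# Suggest Puma worker and thread counts from the core count and workload type.
# The thread counts (8 and 2) are illustrative starting points, not Puma defaults.
def puma_sizing(cores:, io_bound:)
  threads = io_bound ? 8 : 2
  { workers: cores, threads: threads, max_concurrency: cores * threads }
end

sizing = puma_sizing(cores: Etc.nprocessors, io_bound: true)
puts "puma -w #{sizing[:workers]} -t #{sizing[:threads]}:#{sizing[:threads]}"
puts "each worker needs a connection pool of at least #{sizing[:threads]}"
puts "MongoDB may see up to #{sizing[:max_concurrency]} connections from this host"
```

The last two lines matter for the database tier: every Puma thread can hold a MongoDB connection at once, so the driver’s pool size per process should not be smaller than the thread count.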
Timeouts and Keepalive
Puma itself does not enforce a per-request timeout. Its worker_timeout setting, available in the config file, controls how long the master process waits for a worker to check in before restarting it, which protects against hung processes rather than slow requests. Request-level limits are usually enforced in the application (for example with the rack-timeout gem) and by Nginx’s proxy_read_timeout. Set the application limit slightly above your longest expected request, and keep proxy_read_timeout at or above it so Nginx does not drop requests the application is still allowed to finish.
Example Puma Command Line Arguments
When starting Puma, you might use arguments like these:
# Assuming a 4-core VM and an I/O-bound application
puma -w 4 -t 8:8 --bind tcp://0.0.0.0:3000 --environment production --pidfile /tmp/puma.pid
Explanation:
- -w 4: Spawns 4 worker processes.
- -t 8:8: Configures a fixed pool of 8 threads per worker (minimum and maximum both 8).
- --bind tcp://0.0.0.0:3000: Binds Puma to listen on all interfaces on port 3000.
- --environment production: Sets the application environment (Rails reads it as RAILS_ENV).
- --pidfile /tmp/puma.pid: Specifies a PID file for process management.
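A request-level time limit can also be enforced inside the application as Rack middleware. The sketch below is a deliberately simplified stand-in for what a gem like rack-timeout does; the class name RequestTimeout and the toy apps are hypothetical. It uses Ruby’s Timeout module, which interrupts the thread at an arbitrary point and can leave resources in an inconsistent state, so take it as an illustration of the idea rather than production code.

```ruby
require "timeout"

# Minimal Rack-style middleware that aborts slow requests.
# Simplified stand-in for rack-timeout; names here are hypothetical.
class RequestTimeout
  def initialize(app, seconds:)
    @app = app
    @seconds = seconds
  end

  # Run the wrapped app, returning a 503 response if it exceeds the limit.
  def call(env)
    Timeout.timeout(@seconds) { @app.call(env) }
  rescue Timeout::Error
    [503, { "content-type" => "text/plain" }, ["Request timed out\n"]]
  end
end

# Two toy Rack apps: one fast, one that sleeps past the limit.
fast_app = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok\n"]] }
slow_app = lambda do |_env|
  sleep 0.5
  [200, { "content-type" => "text/plain" }, ["late\n"]]
end

puts RequestTimeout.new(fast_app, seconds: 0.1).call({}).first
puts RequestTimeout.new(slow_app, seconds: 0.1).call({}).first
```

The fast app passes through with its own status, while the slow one is cut off and answered with a 503. Whatever limit you choose here should sit at or below Nginx’s proxy_read_timeout.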
MongoDB Performance Tuning on Google Cloud
MongoDB’s performance is heavily influenced by its configuration, hardware, and query patterns. On Google Cloud, leveraging appropriate machine types and disk configurations is crucial.
Hardware and Disk Selection
For production MongoDB deployments, choose Compute Engine instances with sufficient RAM and CPU. For storage, use SSD Persistent Disks. Local SSDs can offer higher IOPS but are ephemeral, making them unsuitable for primary data storage unless you have a robust backup and replication strategy. Consider using Google Cloud’s managed MongoDB solutions (like MongoDB Atlas on GCP) which abstract away much of this infrastructure management.
MongoDB Configuration (`mongod.conf`)
Key parameters to tune in mongod.conf (typically /etc/mongod.conf) include:
Storage Engine and Cache
MongoDB uses the WiredTiger storage engine by default, so storage.engine: wiredTiger only needs to be set if you want it stated explicitly. The WiredTiger cache size is critical. By default, it is the larger of 50% of (RAM minus 1 GB) and 256 MB. You can override it with storage.wiredTiger.engineConfig.cacheSizeGB, which takes an absolute size in gigabytes rather than a percentage. MongoDB also relies on the filesystem cache, and the OS and any co-located application server need memory too, so raise the value above the default only after monitoring shows the cache is the bottleneck.
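As a quick reference, this Ruby sketch computes the documented default WiredTiger cache size, the larger of 50% of (RAM minus 1 GB) and 256 MB, for a few instance sizes. The helper name is made up, and the RAM figures are examples rather than specific Compute Engine machine types.

```ruby
# Default WiredTiger cache size in GB per the MongoDB documentation:
# the larger of 50% of (RAM - 1 GB) and 0.25 GB (256 MB).
def default_wiredtiger_cache_gb(ram_gb)
  [0.5 * (ram_gb - 1), 0.25].max
end

[1, 4, 8, 16, 64].each do |ram|
  puts "#{ram} GB RAM -> #{default_wiredtiger_cache_gb(ram)} GB cache"
end
```

If you do set cacheSizeGB explicitly, comparing your value against this default is a useful sanity check.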
Journaling
Journaling is crucial for durability and is enabled by default. On versions before MongoDB 6.1 it is controlled by storage.journal.enabled: true; from 6.1 onward journaling is always on and the option has been removed. While it has a performance overhead, it’s essential for production environments, and disabling it on older versions is rarely recommended, even for write-heavy workloads where some data loss on catastrophic failure might be acceptable.
Network and Connection Settings
net.port and net.bindIp are standard network configurations. Ensure your firewall rules (VPC firewall in GCP) allow access only from necessary sources.
Indexing Strategy
The most significant performance gains in MongoDB often come from proper indexing. Analyze your query patterns using explain() and ensure indexes exist for fields used in query filters, sorts, and projections. Avoid over-indexing, as indexes consume disk space and slow down write operations.
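When one index has to serve filters and sorts together, key order matters. MongoDB’s documentation describes an Equality, Sort, Range (ESR) guideline for ordering compound index keys, and the Ruby sketch below builds an index specification in that order. The helper name and the fields (customer_id, created_at, total) are hypothetical; the resulting hash is what you would pass to your driver’s index-creation call, and explain() remains the way to confirm the index is actually used.

```ruby
# Order compound-index keys by MongoDB's Equality, Sort, Range (ESR) guideline.
# sort and range map field names to a direction (1 ascending, -1 descending).
def esr_index_spec(equality:, sort: {}, range: {})
  spec = {}
  equality.each { |field| spec[field] = 1 }
  sort.each { |field, direction| spec[field] ||= direction }
  range.each { |field, direction| spec[field] ||= direction }
  spec
end

# Hypothetical query: one customer's orders above some total, newest first.
spec = esr_index_spec(
  equality: [:customer_id],
  sort: { created_at: -1 },
  range: { total: 1 }
)
puts spec.inspect
```

Ruby hashes preserve insertion order, so the printed spec lists the equality field first, then the sort field, then the range field.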
Example `mongod.conf` Snippet
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true # Always on from MongoDB 6.1, where this option was removed; drop the key on newer versions
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      cacheSizeGB: 3.5 # Absolute size in GB; 3.5 equals the default on an 8 GB instance
# For a standalone instance, adjust bindIp if needed. For replica sets, use appropriate IPs.
net:
  port: 27017
  bindIp: 0.0.0.0 # Or specific IPs for security
# Sharding settings would go here if applicable
# sharding:
#   clusterRole: configsvr
# Replication settings for replica sets
# replication:
#   replSetName: myReplicaSetName
# Security settings
# security:
#   authorization: enabled
#   keyFile: /path/to/keyfile
# Logging settings
# systemLog:
#   destination: file
#   path: /var/log/mongodb/mongod.log
#   logAppend: true
#   verbosity: 0
Note: cacheSizeGB is an absolute size in gigabytes, not a fraction of RAM, and it belongs under engineConfig. The 3.5 shown above is illustrative: it equals the default for an 8 GB instance, 50% of (8 GB minus 1 GB). Calculate the value for your own Google Cloud instance’s RAM, and leave sufficient memory for the OS, the filesystem cache, and your application server.
Monitoring and Iterative Tuning
Performance tuning is not a one-time event. Continuous monitoring is essential. Utilize Google Cloud’s Cloud Monitoring, Nginx’s status module, and application-level performance monitoring (APM) tools. Key metrics to watch include:
- Nginx: Active connections, requests per second, error rates (4xx, 5xx), upstream response times.
- Application Server (Puma/Unicorn): Request latency, worker/thread utilization, memory usage, garbage collection activity.
- MongoDB: Query execution times, index hit rates, cache hit rates, disk I/O, network traffic, oplog lag (for replica sets).
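For the Nginx metrics, the status module (stub_status) returns a small plain-text report that is easy to scrape. The Ruby sketch below parses that report into a hash. The sample text follows the format shown in the Nginx documentation; how you fetch it (for example from a locally exposed status URL) is left out and depends on your own configuration.

```ruby
# Parse the plain-text output of Nginx's stub_status module into a Hash.
def parse_stub_status(text)
  active = text[/Active connections:\s+(\d+)/, 1]
  counters = text.match(/^\s*(\d+)\s+(\d+)\s+(\d+)\s*$/)
  states = text.match(/Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)/)
  raise ArgumentError, "unrecognized stub_status output" unless active && counters && states

  {
    active: active.to_i,
    accepts: counters[1].to_i,
    handled: counters[2].to_i,
    requests: counters[3].to_i,
    reading: states[1].to_i,
    writing: states[2].to_i,
    waiting: states[3].to_i
  }
end

# Sample report in the format documented for stub_status.
SAMPLE = <<~STATUS
  Active connections: 291
  server accepts handled requests
   16630948 16630948 31070465
  Reading: 6 Writing: 179 Waiting: 106
STATUS

stats = parse_stub_status(SAMPLE)
puts "active=#{stats[:active]} dropped=#{stats[:accepts] - stats[:handled]}"
```

A gap between accepts and handled indicates connections dropped because a resource limit such as worker_connections was reached, which makes it a useful signal to alert on.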
Regularly review these metrics, identify bottlenecks, and make incremental adjustments to your configurations. Load testing in a staging environment that mirrors production is highly recommended before applying significant changes to a live system.