The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/Puma, and Elasticsearch on Google Cloud for Ruby

Nginx Tuning for High-Traffic Ruby Applications on Google Cloud

Optimizing Nginx is paramount for serving high-traffic Ruby applications, especially when leveraging Google Cloud Platform (GCP). This section details critical Nginx configurations for performance, security, and scalability, focusing on worker processes, connection handling, caching, and SSL/TLS optimization.

Worker Processes and Connections

The `worker_processes` directive dictates how many worker processes Nginx will spawn. A common best practice is to set this to the number of CPU cores available on your instance; `auto` lets Nginx detect the core count itself, which suits instances whose size may change. The `worker_connections` directive sets the maximum number of simultaneous connections that each worker process can open. This count includes connections to upstream servers as well as to clients, so a proxied request typically uses two. The product of `worker_processes` and `worker_connections` gives the total connection capacity. Ensure your system’s file descriptor limits are also raised to accommodate these connections.

Systemd Service File for File Descriptor Limits

To increase file descriptor limits for Nginx, modify its systemd service file. This ensures that Nginx can handle a large number of open connections without hitting OS-level limits.

[Unit]
Description=The Nginx HTTP Server
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/usr/sbin/nginx -t
ExecStart=/usr/sbin/nginx
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true
# Increased file descriptor limit
LimitNOFILE=65536
# Increased process limit
LimitNPROC=65536

[Install]
WantedBy=multi-user.target

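The simplest way to apply these limits is to run sudo systemctl edit nginx, which creates a drop-in file at /etc/systemd/system/nginx.service.d/override.conf; a drop-in needs only the [Service] header and the two Limit lines. Alternatively, place a full unit file like the one above at /etc/systemd/system/nginx.service. After either change, reload the systemd daemon and restart Nginx: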

sudo systemctl daemon-reload
sudo systemctl restart nginx

Nginx Configuration Snippet

In your main Nginx configuration file (e.g., /etc/nginx/nginx.conf), set worker_processes in the main (top-level) context and adjust the connection directives within the events block:

worker_processes auto; # Or set to the number of CPU cores
events {
    worker_connections 4096; # Adjust based on expected load and system limits
    multi_accept on;
}
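
The capacity arithmetic behind these two directives can be sketched in a few lines of Python. This is a rough estimate under stated assumptions, not an Nginx formula: the function names are illustrative, the 4-core instance is hypothetical, and the factor of two reflects a proxied request holding one client connection and one upstream connection.

```python
def estimate_capacity(worker_processes: int, worker_connections: int,
                      proxied: bool = True) -> int:
    """Rough upper bound on concurrent clients Nginx can serve.

    Assumption: when Nginx proxies to an app server, each client uses
    two connections (client side plus upstream side).
    """
    total = worker_processes * worker_connections
    return total // 2 if proxied else total


def fd_limit_sufficient(worker_connections: int, limit_nofile: int) -> bool:
    """Each open connection needs at least one file descriptor in its worker."""
    return limit_nofile >= worker_connections


# Values from the snippets in this section, assuming a 4-core instance.
print(estimate_capacity(4, 4096))          # 8192 proxied clients
print(fd_limit_sufficient(4096, 65536))    # True
```

Treat the result as a ceiling rather than a target; the application server behind Nginx usually saturates well before this number is reached.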

Gzip Compression and Buffering

Enabling Gzip compression significantly reduces the bandwidth required to transfer assets, leading to faster load times. Buffering directives control how Nginx handles request and response bodies, which can impact memory usage and latency.

http {
    # ... other http settings ...

    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml;

    # Buffering settings
    proxy_buffering on;
    proxy_buffer_size 16k;
    proxy_buffers 8 16k;
    proxy_busy_buffers_size 32k;
    proxy_temp_file_write_size 32k;
}

SSL/TLS Optimization

For secure connections, SSL/TLS optimization is crucial. This includes enabling HTTP/2, optimizing cipher suites, and leveraging session caching.

server {
    listen 443 ssl http2; # Enable HTTP/2 (Nginx 1.25.1+ deprecates this parameter in favor of a separate "http2 on;" directive)
    server_name your_domain.com;

    ssl_certificate /etc/letsencrypt/live/your_domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your_domain.com/privkey.pem;

    # Modern TLS configuration
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
    ssl_session_cache shared:SSL:10m; # Adjust size as needed
    ssl_session_timeout 10m;
    ssl_session_tickets off; # Consider security implications

    # OCSP Stapling
    ssl_stapling on;
    ssl_stapling_verify on;
    resolver 8.8.8.8 8.8.4.4 valid=300s; # Google DNS, adjust if necessary
    resolver_timeout 5s;

    # ... rest of your server configuration ...
}
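
Sizing `ssl_session_cache` is simple arithmetic. The Nginx documentation states that one megabyte of shared cache holds about 4000 sessions; the sketch below uses that figure. The function names and the traffic numbers in the example are assumptions for illustration only.

```python
SESSIONS_PER_MB = 4000  # approximate figure from the Nginx documentation


def approx_cached_sessions(cache_mb: int) -> int:
    """Approximate number of TLS sessions a shared cache of this size holds."""
    return cache_mb * SESSIONS_PER_MB


def cache_mb_for(new_sessions_per_min: int, timeout_min: int) -> int:
    """Smallest whole-megabyte cache that can hold every session created
    within one ssl_session_timeout window."""
    needed = new_sessions_per_min * timeout_min
    return max(1, -(-needed // SESSIONS_PER_MB))  # ceiling division


print(approx_cached_sessions(10))   # 40000 sessions for shared:SSL:10m
print(cache_mb_for(2000, 10))       # 5 MB for 2000 new sessions/min, 10m timeout
```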

Gunicorn/Puma Tuning for Ruby Applications

The application server (Gunicorn for Python, Puma for Ruby) is the bridge between Nginx and your application code. Proper configuration here is vital for handling concurrent requests efficiently.

Gunicorn Configuration (if applicable for Python-based services)

For Python applications, Gunicorn’s worker count and type significantly impact performance. The Gunicorn documentation suggests (2 * number_of_cores) + 1 workers as a starting point. The worker class also matters: gevent or eventlet suit I/O-bound applications, while the default sync class is simpler but handles only one request at a time per worker.

# Example Gunicorn command
gunicorn --workers 5 \
         --worker-class gevent \
         --bind 0.0.0.0:8000 \
         your_app.wsgi:application
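
The same settings can live in a Gunicorn configuration file, which is plain Python. The sketch below derives the worker count from the core count instead of hard-coding it; the helper name `default_workers` is illustrative, and the file would be loaded with something like `gunicorn -c gunicorn.conf.py your_app.wsgi:application`.

```python
# gunicorn.conf.py: a minimal sketch, not a complete production config
import multiprocessing


def default_workers(cores: int) -> int:
    """Starting point from the text: (2 * number_of_cores) + 1."""
    return 2 * cores + 1


workers = default_workers(multiprocessing.cpu_count())
worker_class = "gevent"   # requires the gevent package to be installed
bind = "0.0.0.0:8000"
```

On a 2-core instance this yields the 5 workers used in the command above.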

In a GCP environment, consider using a managed service like Cloud Run or App Engine Flexible for easier scaling and management of your Python applications, which abstract away some of these Gunicorn tuning concerns.

Puma Configuration (for Ruby Applications)

Puma, a popular choice for Ruby, can run in single mode (one process with a thread pool) or clustered mode (several forked worker processes, each with its own thread pool). For typical web applications, clustered mode with a modest number of threads per worker is effective. The -w flag sets the number of worker processes, and -t sets the number of threads per worker (a single value such as 5, or a min:max range such as 1:5).

# Example Puma command (often managed by systemd or similar)
# For a 4-core instance:
# 2 worker processes, each with 5 threads
puma -w 2 -t 5 --bind tcp://0.0.0.0:9292 --pidfile /var/run/puma.pid /path/to/your/app/config.ru

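The optimal ratio of workers to threads depends heavily on your application’s I/O patterns and on whether it is CPU-bound or memory-bound. On MRI (CRuby), the global VM lock lets only one thread per process execute Ruby code at a time, so additional workers add CPU parallelism while additional threads mainly help requests that wait on I/O. A common starting point is to set the total number of threads (workers * threads_per_worker) to roughly 2 * number_of_cores, and then adjust based on performance testing.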
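
The starting-point rule can be written down as a small calculation. This is only a sketch of the heuristic in the paragraph above: the function names are illustrative, and the default of two threads per core is the assumption stated in the text rather than a Puma recommendation.

```python
def puma_threads_per_worker(cores: int, workers: int,
                            threads_per_core: int = 2) -> int:
    """Split a total thread budget of cores * threads_per_core across workers."""
    total_threads = cores * threads_per_core
    return max(1, total_threads // workers)


def puma_flags(cores: int, workers: int) -> str:
    """Render the -w/-t flags for the computed starting point."""
    threads = puma_threads_per_worker(cores, workers)
    return f"-w {workers} -t {threads}"


print(puma_flags(cores=4, workers=2))   # -w 2 -t 4
print(puma_flags(cores=4, workers=4))   # -w 4 -t 2
```

For the 4-core example this lands at 4 threads per worker, close to the 5 used in the command; either is a reasonable value to begin load testing from.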

Elasticsearch Tuning on Google Cloud

Elasticsearch performance is critical for search functionality. Tuning involves JVM heap size, shard allocation, and indexing strategies. For GCP, consider using Elasticsearch Service on Elastic Cloud or self-managing on Compute Engine instances.

JVM Heap Size

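The JVM heap size is arguably the most critical Elasticsearch tuning parameter. It should be set to no more than 50% of the total system RAM, leaving the rest for the operating system’s filesystem cache, and it should stay below the threshold (roughly 30-32GB) above which the JVM can no longer use compressed ordinary object pointers (compressed oops). Set the minimum and maximum heap to the same value with -Xms and -Xmx in jvm.options or through the ES_JAVA_OPTS environment variable; the ES_HEAP_SIZE variable used by Elasticsearch versions before 5.0 is no longer supported.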

# In /etc/elasticsearch/jvm.options or via environment variables
-Xms4g
-Xmx4g

For a GCP instance with 8GB RAM, setting -Xms4g -Xmx4g is a reasonable starting point. Restart Elasticsearch after changing this.
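
The two heap rules combine into a one-line calculation. The sketch below assumes a conservative 31GB cap to stay under the compressed-oops threshold; the constant and function names are illustrative, and the exact threshold varies by JVM, so verify it on your own nodes.

```python
COMPRESSED_OOPS_CAP_GB = 31  # assumption: conservative cap below the ~32GB threshold


def recommended_heap_gb(total_ram_gb: float) -> int:
    """Half of system RAM, capped to keep compressed oops enabled."""
    half = int(total_ram_gb // 2)
    return max(1, min(half, COMPRESSED_OOPS_CAP_GB))


def jvm_heap_options(total_ram_gb: float) -> list[str]:
    """Render matching -Xms/-Xmx lines for jvm.options."""
    heap = recommended_heap_gb(total_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


print(jvm_heap_options(8))     # ['-Xms4g', '-Xmx4g']
print(jvm_heap_options(128))   # ['-Xms31g', '-Xmx31g']
```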

Shard Allocation and Size

The number of primary shards per index impacts performance. Aim for primary shard sizes between 10GB and 50GB. Too many small shards increase overhead; too few large shards can hinder recovery and rebalancing. Elasticsearch’s default shard allocation settings are generally good, but monitor the cluster health API.

# Example: Creating an index with a specific number of primary shards
# number_of_shards: adjust based on expected data volume and query load
# number_of_replicas: adjust based on availability needs
PUT /my-index
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  }
}
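
A primary shard count can be estimated from the expected index size and a target shard size inside the 10GB to 50GB range. The sketch below is a hedged starting point only: the 30GB default target and the function name are assumptions, and query load may justify a different count.

```python
import math


def primary_shards(expected_index_gb: float, target_shard_gb: float = 30.0) -> int:
    """Number of primary shards that keeps each shard near the target size."""
    if not 10 <= target_shard_gb <= 50:
        raise ValueError("target shard size should stay between 10GB and 50GB")
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


print(primary_shards(90))        # 3, matching the example index above
print(primary_shards(5))         # 1: a small index needs only one shard
print(primary_shards(100, 50))   # 2
```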

Use the Cluster Allocation Explain API (GET _cluster/allocation/explain) to diagnose shard placement issues.

Indexing Performance

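For high-volume indexing, consider disabling automatic refreshes by setting `refresh_interval` to -1 during bulk indexing operations and restoring it afterward. Also, tune the number of concurrent indexing threads and the size of each bulk request, watching the write thread pool queue for rejections.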

# Temporarily disable refresh for bulk indexing
PUT /my-index/_settings
{
  "index": {
    "refresh_interval": "-1"
  }
}

# Perform bulk indexing...

# Re-enable refresh (e.g., every 5 seconds)
PUT /my-index/_settings
{
  "index": {
    "refresh_interval": "5s"
  }
}
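
When bulk loads are scripted, it helps to guarantee that refresh is restored even if the load fails partway. The sketch below wraps the two settings calls in a context manager using only the Python standard library. It assumes an unauthenticated node at http://localhost:9200; `refresh_request` and `refresh_disabled` are hypothetical helper names, not part of any Elasticsearch client.

```python
import json
import urllib.request
from contextlib import contextmanager

ES_URL = "http://localhost:9200"  # assumption: local node without authentication


def refresh_request(index: str, interval: str) -> urllib.request.Request:
    """Build the PUT /<index>/_settings request that sets refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def _send(request: urllib.request.Request) -> None:
    """Send the request and discard the response body."""
    with urllib.request.urlopen(request) as response:
        response.read()


@contextmanager
def refresh_disabled(index: str, restore_to: str = "5s", send=_send):
    """Disable refresh for the duration of the block, then restore it."""
    send(refresh_request(index, "-1"))
    try:
        yield
    finally:
        # Runs even if the bulk load raises, so refresh is never left off.
        send(refresh_request(index, restore_to))

# Usage (requires a reachable cluster):
#     with refresh_disabled("my-index"):
#         ...  # issue bulk requests here
```

The `send` parameter exists so the HTTP call can be swapped out, for example with an authenticated client or a stub in tests.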

Monitor the Elasticsearch `_cat/thread_pool` API to understand thread pool usage and identify potential bottlenecks.

Monitoring and Diagnostics on GCP

Effective monitoring is key to identifying performance issues before they impact users. Leverage GCP’s built-in tools and integrate them with your application stack.

Nginx and Application Server Metrics

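Use Nginx’s `stub_status` module to expose the number of active connections, cumulative counts of accepted connections and handled requests (from which requests per second can be derived), and the number of connections in the reading, writing, and waiting states. For Ruby applications, integrate a library such as the prometheus-client gem (the official Prometheus Ruby client) to expose application-level metrics (e.g., request latency, error rates) that can be scraped by Prometheus.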

# In nginx.conf, within http block
http {
    # ...
    server {
        # ...
        location /nginx_status {
            stub_status;
            allow 127.0.0.1; # Restrict access
            deny all;
        }
        # ...
    }
    # ...
}
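
The `stub_status` page is plain text, so turning it into metrics takes only a small parser. The sketch below parses a sample in the format shown in the Nginx documentation and derives a request rate from two samples. The function names are illustrative, and fetching the page (for example from http://127.0.0.1/nginx_status) is left out.

```python
import re

SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text: str) -> dict:
    """Parse the plain-text stub_status page into a dict of integers."""
    active = re.search(r"Active connections:[ \t]+(\d+)", text)
    counters = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE)
    states = re.search(r"Reading:[ \t]+(\d+)[ \t]+Writing:[ \t]+(\d+)[ \t]+Waiting:[ \t]+(\d+)", text)
    if not (active and counters and states):
        raise ValueError("unexpected stub_status format")
    accepts, handled, requests = map(int, counters.groups())
    reading, writing, waiting = map(int, states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts, "handled": handled, "requests": requests,
        "reading": reading, "writing": writing, "waiting": waiting,
    }


def requests_per_second(previous: dict, current: dict, seconds: float) -> float:
    """The requests counter is cumulative, so the rate is a delta over time."""
    return (current["requests"] - previous["requests"]) / seconds


print(parse_stub_status(SAMPLE)["active"])   # 291
```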

GCP’s Operations Suite (formerly Stackdriver) can ingest these metrics, providing dashboards and alerting capabilities.

Elasticsearch Monitoring

Utilize the Elasticsearch Monitoring features, often integrated with Kibana, to track cluster health, node statistics, JVM usage, indexing rates, and search performance. GCP’s Operations Suite can also ingest logs from Elasticsearch nodes for centralized analysis.

# Example: Checking cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Example: Checking node stats
curl -X GET "localhost:9200/_nodes/stats?pretty"

Set up alerts in GCP Operations Suite for critical Elasticsearch conditions such as high CPU utilization, low disk space, or unhealthy cluster status.
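
Before wiring up alerts, it can help to decide how each cluster status maps to a severity. The sketch below classifies a `_cluster/health` response. The severity labels and the function name are assumptions for illustration, and the sample document is trimmed to a few fields that the API returns.

```python
import json

# Trimmed sample of a _cluster/health response (hypothetical values).
SAMPLE_HEALTH = json.loads("""
{
  "cluster_name": "my-cluster",
  "status": "yellow",
  "number_of_nodes": 3,
  "unassigned_shards": 2
}
""")


def health_severity(health: dict) -> str:
    """Map the cluster status color to an alert severity."""
    status = health.get("status")
    if status == "red":
        return "critical"   # at least one primary shard is unassigned
    if status == "yellow":
        return "warning"    # all primaries assigned, some replicas are not
    if status == "green":
        return "ok"
    return "unknown"        # missing or unexpected status value


print(health_severity(SAMPLE_HEALTH))   # warning
```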

Conclusion

Tuning Nginx, your application server (Gunicorn/Puma), and Elasticsearch is an ongoing process. This playbook provides a solid foundation for optimizing your Ruby stack on Google Cloud. Remember to benchmark changes, monitor performance continuously, and iterate based on real-world usage patterns.