The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and Elasticsearch on AWS for Ruby

Nginx Tuning for High-Traffic Ruby Applications

Optimizing Nginx as a reverse proxy and static file server is paramount for any high-traffic Ruby application. We’ll focus on key directives that directly impact performance and resource utilization, particularly when serving dynamic content proxied to Gunicorn or Puma (for Python/Ruby respectively) or FPM (for PHP). This section assumes a standard Nginx setup on an EC2 instance within AWS.

Worker Processes and Connections

The worker_processes directive dictates how many worker processes Nginx will spawn. A common recommendation is to set this to the number of CPU cores available. For dynamic applications, setting it to auto is often the most robust approach, allowing Nginx to determine the optimal number based on the system’s CPU count.

The worker_connections directive defines the maximum number of simultaneous connections that each worker process can open. The theoretical ceiling is worker_processes * worker_connections, but that count includes connections to upstream servers as well as connections from clients, so a reverse proxy typically serves about half that many clients at once. Set the value high enough to accommodate peak traffic, but not so high that it exceeds the per-process open file limit (see worker_rlimit_nofile and ulimit -n). A good starting point is 1024 or 2048, tuned from there based on actual load and OS limits.

Configuration Snippet

# In nginx.conf, within the 'main' context
worker_processes auto; # Or set to the number of CPU cores

events {
    worker_connections 4096; # Adjust based on load and OS limits
    multi_accept on; # Allows workers to accept multiple connections at once
}
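
As a rough capacity check, the arithmetic above can be sketched in Ruby. The function name and the "divide by two when proxying" rule are illustrative assumptions, not Nginx settings; the real limit also depends on open file limits and upstream keepalive behavior.

```ruby
# Rough upper bound on concurrent client connections for one Nginx instance.
# proxied: when true, each client connection is assumed to hold one upstream
# connection as well, so only half of the connection slots serve clients.
def nginx_max_clients(worker_processes:, worker_connections:, proxied: true)
  total = worker_processes * worker_connections
  proxied ? total / 2 : total
end

# 4 workers with the worker_connections value from the snippet above.
puts nginx_max_clients(worker_processes: 4, worker_connections: 4096)                 # 8192
puts nginx_max_clients(worker_processes: 4, worker_connections: 4096, proxied: false) # 16384
```

If the estimate is far above your expected peak, a smaller worker_connections value leaves more file descriptors for other processes.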

Keepalive Connections

HTTP keep-alive connections reduce the overhead of establishing new TCP connections for each request. Tuning keepalive_timeout and keepalive_requests can significantly improve performance by allowing clients to reuse existing connections. A shorter keepalive_timeout can free up resources faster, while a longer one can improve latency for clients making multiple requests.

Configuration Snippet

# In nginx.conf, within the 'http' context
http {
    # ... other http directives ...

    keepalive_timeout 65; # Seconds to keep a connection open
    keepalive_requests 100; # Max requests per keepalive connection

    # ... rest of http block ...
}

Buffering and Gzip Compression

Nginx buffering can help manage traffic spikes and improve response times. Directives like client_body_buffer_size, client_max_body_size, proxy_buffers, and proxy_buffer_size control how Nginx handles request and response bodies. For dynamic content proxied to backend applications, tuning these can prevent excessive memory usage or slow responses.

Enabling Gzip compression is a low-hanging fruit for performance gains, especially for text-based assets (HTML, CSS, JS). Ensure it’s configured to compress responses before sending them to the client.

Configuration Snippet

# In nginx.conf, within the 'http' context
http {
    # ... other http directives ...

    client_body_buffer_size 128k;
    client_max_body_size 10m; # Adjust based on expected file uploads
    proxy_buffers 8 16k;
    proxy_buffer_size 32k;
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;

    # Gzip Compression
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6; # Compression level (1-9)
    # text/html is always compressed and does not need to be listed
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/rss+xml text/javascript;

    # ... rest of http block ...
}
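
To get a feel for the gzip_comp_level trade-off before changing it, the following sketch compresses a sample payload at several levels with Ruby's standard Zlib library. The payload is a made-up stand-in for a JSON response, and Ruby's Zlib is only a proxy for what Nginx does internally, so treat the numbers as indicative.

```ruby
require 'zlib'

# Hypothetical sample payload: repetitive JSON, loosely resembling an API response.
payload = '{"id":1,"name":"example","tags":["a","b","c"]}' * 200

# Return a Hash of compression level => compressed size in bytes.
# Level 1 is fastest, level 9 spends the most CPU for the smallest output.
def gzip_sizes(data, levels = [1, 6, 9])
  levels.to_h { |level| [level, Zlib.gzip(data, level: level).bytesize] }
end

sizes = gzip_sizes(payload)
sizes.each do |level, size|
  ratio = (100.0 * size / payload.bytesize).round(1)
  puts "level #{level}: #{size} bytes (#{ratio}% of original)"
end
```

On real responses the gain from level 6 to level 9 is usually small compared with the extra CPU time, which is why a mid-range level is a common default.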

Gunicorn/Puma Tuning for Ruby Applications

Ruby applications are typically served by Puma. Gunicorn is a Python WSGI server and does not run Ruby code, so it is relevant here only for Python services that share the same Nginx front end. Tuning worker processes, threads, and timeouts is crucial for handling concurrent requests efficiently and preventing application-level bottlenecks.

Worker Processes and Threads (Puma)

Puma uses a multi-process, multi-threaded model. The -w (workers) flag sets the number of worker processes, and the -t (threads) flag sets the minimum and maximum number of threads per worker in min:max form. A common strategy is to set workers to the number of CPU cores and threads to a value that balances concurrency with memory usage. For I/O-bound applications, a higher thread count can be beneficial; on MRI, the global VM lock limits how much additional threads help CPU-bound work.

Command Line Example

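# Example for a 4-core instance: 4 workers, 5 threads each.
# The config file is passed with -C; a bare path would be read as the rackup file.
puma -w 4 -t 5:5 -C config/puma.rb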

Puma Configuration File (config/puma.rb)

The config/puma.rb file offers more granular control. Key settings include workers, threads, and worker_timeout. The workers and threads directives mirror the command-line flags. Puma has no per-request timeout directive: worker_timeout only tells the master process to restart a worker that has stopped checking in, which guards against hung processes but not against slow requests. To cap request duration, use application-level middleware such as rack-timeout and keep Nginx’s proxy_read_timeout in line with it.

Configuration Snippet (config/puma.rb)

# config/puma.rb

workers 4 # Number of worker processes
threads 0, 16 # Min and max threads per worker (with a minimum of 0, threads are started on demand)

# Restart a worker that has not checked in with the master for this many seconds.
# This is not a request timeout.
worker_timeout 30

# Environment and port
environment ENV.fetch('RAILS_ENV') { 'production' }
port ENV.fetch('PORT') { 3000 }

# Setting workers above 0 already runs Puma in clustered mode.
# Preloading the application before forking lets workers share memory via copy-on-write:
# preload_app!
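
The relationship between workers, threads, and resource usage can be sketched as simple arithmetic. The helper names and the per-worker memory figure below are hypothetical; measure your own application before relying on the estimate.

```ruby
# Maximum number of requests a Puma cluster can process at the same time.
def puma_max_concurrency(workers:, max_threads:)
  workers * max_threads
end

# Approximate resident memory of the cluster in MB.
# per_worker_mb is an assumed figure you would measure on a running worker.
def puma_memory_mb(workers:, per_worker_mb:)
  workers * per_worker_mb
end

puts puma_max_concurrency(workers: 4, max_threads: 5)  # 20
puts puma_max_concurrency(workers: 4, max_threads: 16) # 64
puts puma_memory_mb(workers: 4, per_worker_mb: 300)    # 1200
```

Each worker is a separate process with its own connection pools, so a database pool size at least as large as the per-worker thread maximum is a sensible starting assumption.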

Gunicorn Configuration (if applicable)

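Gunicorn is a Python WSGI server and cannot run a Ruby application; Ruby’s closest counterpart is Unicorn, the project Gunicorn was ported from. This subsection applies only if your stack also includes a Python service behind the same Nginx. The tuning principles are similar to Puma’s: the --workers flag sets the number of worker processes, --threads sets the number of threads per worker, and --timeout sets how long a silent worker may run before it is killed and restarted.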

Command Line Example

# Example for a 4-core instance
gunicorn --workers 4 --threads 5 --timeout 30 myapp.wsgi:application

PHP-FPM Tuning

For PHP applications served via Nginx, PHP-FPM (FastCGI Process Manager) is the standard. Tuning FPM pools is essential for performance. The primary directives to consider are pm (process manager type), pm.max_children, pm.start_servers, pm.min_spare_servers, pm.max_spare_servers, and pm.max_requests.

Process Manager Types

PHP-FPM offers three process manager types:

static: The number of child processes is fixed. Good for predictable loads.
dynamic: The number of child processes varies between pm.min_spare_servers and pm.max_children.
ondemand: Child processes are created only when a request is received.

For most production environments, dynamic offers a good balance between resource utilization and responsiveness. static can be beneficial if you have a very stable, high-traffic site and want to avoid the overhead of process spawning/killing.

Tuning Directives

pm.max_children: The maximum number of child processes that can be active at the same time. This is the most critical setting and should be tuned based on your server’s RAM and the memory footprint of your PHP application. A common formula is (Total RAM - RAM for OS/Nginx/DB) / Average PHP process size.

pm.start_servers: The number of child processes started when the FPM master process is started.

pm.min_spare_servers: The minimum number of idle server processes. If there are fewer idle processes than this, the master process will spawn more children.

pm.max_spare_servers: The maximum number of idle server processes. If there are more idle processes than this, the master process will kill off the extra children.

pm.max_requests: The number of requests each child process should execute before respawning. This helps to prevent memory leaks and keep processes fresh. A value between 500 and 1000 is common.
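
The pm.max_children formula above can be sketched as a small Ruby helper. The function name and the headroom parameter are illustrative assumptions; the average process size should come from measuring your own pool under load.

```ruby
# Estimate pm.max_children from available memory.
# headroom: fraction of the usable memory to leave free (0.0 = use all of it).
def fpm_max_children(total_ram_mb:, reserved_mb:, avg_process_mb:, headroom: 0.0)
  raise ArgumentError, 'avg_process_mb must be positive' unless avg_process_mb.positive?

  usable_mb = (total_ram_mb - reserved_mb) * (1.0 - headroom)
  [(usable_mb / avg_process_mb).floor, 1].max
end

# 8GB server, ~2GB reserved for OS/Nginx/DB, ~30MB per PHP process.
puts fpm_max_children(total_ram_mb: 8192, reserved_mb: 2048, avg_process_mb: 30)                # 204
puts fpm_max_children(total_ram_mb: 8192, reserved_mb: 2048, avg_process_mb: 30, headroom: 0.5) # 102
```

The first figure is a ceiling rather than a target; starting well below it and raising the value while watching memory is the safer path.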

Configuration Snippet (php-fpm.conf or pool.d/www.conf)

; Example for a server with 8GB RAM, assuming ~2GB for OS/Nginx/DB
; and average PHP process size of ~30MB

; pm = dynamic
; pm.max_children = 64 ; Ceiling: (8GB - 2GB) / 30MB = 6144MB / 30MB = ~204. Start lower and raise it while watching memory.
; pm.start_servers = 10
; pm.min_spare_servers = 5
; pm.max_spare_servers = 20
; pm.max_requests = 500

; For a more static approach on a high-traffic server:
pm = static
pm.max_children = 100 ; Adjust based on RAM and application footprint
pm.max_requests = 1000

; Other important settings
listen = /run/php/php7.4-fpm.sock ; Or TCP/IP socket
listen.owner = www-data
listen.group = www-data
listen.mode = 0660

request_terminate_timeout = 120s ; Max execution time for a script

Elasticsearch Tuning on AWS

Optimizing Elasticsearch, especially on AWS, involves careful consideration of instance types, JVM heap size, shard allocation, and indexing strategies. For Ruby applications, Elasticsearch is often used for search functionality, logging aggregation, or analytics.

Instance Type Selection

Choosing the right EC2 instance type is crucial. For Elasticsearch, memory-optimized instances (e.g., r5, r6g) are generally preferred due to Elasticsearch’s heavy reliance on heap memory and OS file system cache. Compute-optimized instances might be suitable for heavy indexing workloads, while general-purpose instances can be a starting point for smaller deployments.

JVM Heap Size

The JVM heap size is arguably the most critical Elasticsearch tuning parameter. Set it to no more than 50% of the system’s total RAM, leaving the rest for the OS file system cache, and keep it below roughly 30-32GB, the threshold above which the JVM can no longer use compressed ordinary object pointers (compressed oops). Set the minimum (-Xms) and maximum (-Xmx) to the same value. A heap that is too large can lead to long garbage collection pauses, while one that is too small can cause OutOfMemory errors.

Configuration Snippet (jvm.options)

# In /etc/elasticsearch/jvm.options or similar path

# Example for an instance with 64GB RAM
-Xms30g
-Xmx30g

# Example for an instance with 16GB RAM
# -Xms8g
# -Xmx8g

Important: After changing jvm.options, you must restart the Elasticsearch service.
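
The heap sizing rule can be expressed as a short Ruby sketch. The helper names and the 30GB default cap are assumptions chosen to match the examples above; the exact compressed oops threshold depends on the JVM and should be verified on your nodes.

```ruby
# Suggested heap size in GB: half of RAM, capped below the compressed oops threshold.
def es_heap_gb(total_ram_gb:, cap_gb: 30)
  [total_ram_gb / 2, cap_gb].min
end

# The two jvm.options lines for that heap size (min and max set equal).
def jvm_heap_flags(total_ram_gb:, cap_gb: 30)
  heap = es_heap_gb(total_ram_gb: total_ram_gb, cap_gb: cap_gb)
  ["-Xms#{heap}g", "-Xmx#{heap}g"]
end

puts jvm_heap_flags(total_ram_gb: 64) # -Xms30g, -Xmx30g
puts jvm_heap_flags(total_ram_gb: 16) # -Xms8g, -Xmx8g
```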

Shard Allocation and Shard Count

The number of primary shards per index significantly impacts performance. Too many shards can increase overhead, while too few can limit parallelism. A common recommendation is to aim for a shard size between 10GB and 50GB. The total number of shards across your cluster should be manageable by your nodes. Avoid oversharding.

Elasticsearch’s shard allocation awareness and filtering can be used to distribute shards intelligently across availability zones or specific instance types within AWS, improving resilience and performance.

Example: Creating an Index with Specific Shard Count

PUT /my_logs
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  }
}
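
A starting shard count can be derived from the expected index size and a target shard size. The sketch below assumes a 30GB target, which sits inside the 10GB to 50GB range mentioned above; the function name and default are illustrative.

```ruby
# Number of primary shards needed so that each stays near target_shard_gb.
def primary_shard_count(index_size_gb:, target_shard_gb: 30)
  raise ArgumentError, 'target_shard_gb must be positive' unless target_shard_gb.positive?

  [(index_size_gb.to_f / target_shard_gb).ceil, 1].max
end

puts primary_shard_count(index_size_gb: 90) # 3
puts primary_shard_count(index_size_gb: 5)  # 1
```

Because the primary shard count of an existing index cannot be changed with a simple settings update, it is worth estimating the expected size before the index is created.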

Indexing Performance

For high-volume indexing, consider the following:

Refresh Interval: The index.refresh_interval setting controls how often new documents become visible for search. Increasing this interval (e.g., from 1s to 30s, or to -1 to disable refresh during bulk indexing) can significantly boost indexing throughput, at the cost of newly indexed documents taking longer to appear in search results. Restore the original value once the bulk load finishes.
Bulk API: Always use the Bulk API for indexing multiple documents. Tune the size of your bulk requests (e.g., 5-15MB) based on your cluster’s performance.
Translog Durability: Setting index.translog.durability to async can improve indexing performance because the translog is fsynced on a timer (index.translog.sync_interval, 5s by default) rather than after every request. The trade-off is that acknowledged writes from the last interval can be lost if a node fails. Use with caution.

Example: Adjusting Refresh Interval

PUT /my_logs/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
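
For the Bulk API advice above, the following sketch groups documents into newline-delimited request bodies that stay under a byte budget. It uses only Ruby's standard library; the function name, the 5MB default, and the sample documents are assumptions, and actually sending each body (POST to /_bulk with Content-Type application/x-ndjson) is left to whichever HTTP or Elasticsearch client you use.

```ruby
require 'json'

# Build Bulk API request bodies (NDJSON). A new body is started whenever adding
# the next document would push the current one past max_bytes.
def bulk_bodies(index:, docs:, max_bytes: 5 * 1024 * 1024)
  bodies = []
  current = +''
  docs.each do |doc|
    # Each document is an action line followed by a source line.
    entry = JSON.generate({ index: { _index: index } }) + "\n" + JSON.generate(doc) + "\n"
    if !current.empty? && current.bytesize + entry.bytesize > max_bytes
      bodies << current
      current = +''
    end
    current << entry
  end
  bodies << current unless current.empty?
  bodies
end

# Hypothetical log documents for the my_logs index.
docs = [{ message: 'a' }, { message: 'b' }, { message: 'c' }]
bulk_bodies(index: 'my_logs', docs: docs).each do |body|
  puts "#{body.lines.size / 2} documents, #{body.bytesize} bytes"
end
```

A single document larger than max_bytes still gets its own body, so the budget is a target rather than a hard limit.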

Monitoring and Diagnostics

Effective monitoring is key to identifying bottlenecks and validating tuning efforts. Utilize AWS CloudWatch for EC2 metrics (CPU, network, and, with the CloudWatch agent installed, memory), ELB metrics, and RDS metrics. For Elasticsearch, leverage its built-in monitoring APIs and consider tools like Metricbeat or commercial solutions.

Key metrics to watch:

Nginx: Active connections, requests per second, error rates (5xx, 4xx), upstream response times.
Gunicorn/Puma: Worker utilization, request queue length, response times, error rates.
PHP-FPM: Pool utilization (active processes, idle processes), request duration, slow requests.
Elasticsearch: JVM heap usage, garbage collection activity, indexing rate, search latency, disk I/O, CPU utilization.

For deep dives, use tools like strace, perf, and application-specific profiling tools. Regularly review Nginx access and error logs, and application logs for recurring issues.
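
As one way to collect the Nginx connection metrics listed above, the sketch below parses the plain-text output of Nginx's stub_status module. It assumes stub_status is enabled on some internal location, which the configuration in this article does not set up; the function name and the sample text are illustrative.

```ruby
# Parse stub_status output into a Hash, or return nil if the text does not match.
def parse_stub_status(text)
  active   = text[/Active connections:\s+(\d+)/, 1]
  counters = text.match(/^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$/)
  states   = text.match(/Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)/)
  return nil unless active && counters && states

  {
    active: active.to_i,
    accepts: counters[1].to_i,
    handled: counters[2].to_i,
    requests: counters[3].to_i,
    reading: states[1].to_i,
    writing: states[2].to_i,
    waiting: states[3].to_i
  }
end

# Sample text in the format the module emits.
sample = <<~STATUS
  Active connections: 291
  server accepts handled requests
   16630948 16630948 31070465
  Reading: 6 Writing: 179 Waiting: 106
STATUS

stats = parse_stub_status(sample)
puts "active=#{stats[:active]} dropped=#{stats[:accepts] - stats[:handled]}"
```

A growing gap between accepts and handled suggests connections are being dropped, often because worker_connections or the open file limit has been reached.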
