The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and DynamoDB on AWS for Ruby

Nginx Configuration for High-Traffic Ruby Applications

Optimizing Nginx as a reverse proxy for Ruby applications, especially those served by Puma or Unicorn, is critical for handling concurrent requests and minimizing latency. The key lies in fine-tuning worker processes, connection limits, and caching strategies.

For a typical setup serving a Ruby application via Puma or Unicorn (the Ruby counterparts of Python's Gunicorn), Nginx acts as the front-end, handling SSL termination, static file serving, and request routing. We’ll focus on the Nginx configuration directives that directly impact performance.

Worker Processes and Connections

The worker_processes directive determines how many worker processes Nginx will spawn. A common recommendation is to set this to the number of CPU cores available on the server. The worker_connections directive sets the maximum number of simultaneous connections that each worker process can handle. The total maximum connections will be worker_processes * worker_connections. That figure counts every connection a worker opens, including connections to the backend, so a reverse proxy can serve roughly half that many clients at once.

Tuning `worker_processes` and `worker_connections`

Start by identifying the number of CPU cores. You can typically find this using nproc or by inspecting /proc/cpuinfo.
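As a rough sketch of the arithmetic above, the following Ruby snippet estimates the connection ceiling for a given configuration. The method name nginx_connection_capacity is hypothetical, and halving the total for proxied traffic assumes each client request holds one client connection and one upstream connection.

```ruby
require 'etc'

# Hypothetical helper: estimates the connection ceiling for an Nginx setup.
# Assumes a proxied request uses two connections (client plus upstream).
def nginx_connection_capacity(worker_processes:, worker_connections:, proxying: true)
  total = worker_processes * worker_connections
  { total_connections: total, approx_clients: proxying ? total / 2 : total }
end

# Etc.nprocessors reports the CPU count, much like `nproc`.
cores = Etc.nprocessors
capacity = nginx_connection_capacity(worker_processes: cores, worker_connections: 4096)
puts "#{cores} workers x 4096 connections = #{capacity[:total_connections]} total, " \
     "about #{capacity[:approx_clients]} proxied clients"
```

The result is an upper bound only; file descriptor limits and backend capacity usually cap throughput earlier.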

Example Nginx Configuration Snippet

worker_processes auto; # 'auto' lets Nginx decide based on CPU cores
# Or explicitly set based on CPU cores, e.g., for 4 cores:
# worker_processes 4;

events {
    worker_connections 4096; # Adjust based on expected load and system limits
    multi_accept on;
}

http {
    # ... other http configurations ...

    server {
        listen 80;
        server_name your_domain.com;

        # Serve static files directly from Nginx
        location ~ ^/(assets|images|javascripts)/ {
            root /path/to/your/rails/public;
            expires 1y;
            add_header Cache-Control "public";
        }

        location / {
            proxy_pass http://unix:/path/to/your/app.sock; # Or http://127.0.0.1:PORT
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            proxy_connect_timeout 60s;
            proxy_send_timeout 60s;
            proxy_read_timeout 60s;
        }

        # ... other server configurations ...
    }
}

Explanation:

worker_processes auto;: Recommended for most modern systems. Nginx will automatically detect the number of CPU cores and set worker_processes accordingly.
worker_connections 4096;: This is a common starting point. The actual optimal value depends on your application’s I/O patterns and the system’s file descriptor limits (ulimit -n). Ensure your system’s limits are high enough.
multi_accept on;: Allows workers to accept multiple new connections at once, improving efficiency under heavy load.
proxy_set_header directives: Crucial for passing accurate client information to the backend application.
proxy_connect_timeout, proxy_send_timeout, proxy_read_timeout: These timeouts prevent Nginx from holding connections open indefinitely if the backend application is slow or unresponsive. Adjust these based on your application’s typical response times.
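The file descriptor check mentioned for worker_connections can be sketched in Ruby. The helper name descriptors_sufficient? is hypothetical, and the script reads the limit of the process running it, which is only a proxy for the limit Nginx itself runs under (Nginx can raise its own limit with worker_rlimit_nofile).

```ruby
# Hypothetical helper: true when the open-file limit covers the configured
# worker_connections. Each connection needs at least one descriptor.
def descriptors_sufficient?(worker_connections:, nofile_limit:)
  nofile_limit >= worker_connections
end

# Process.getrlimit returns [soft_limit, hard_limit] for the current process,
# the same values `ulimit -n` and `ulimit -Hn` report in a shell.
soft_limit, _hard_limit = Process.getrlimit(Process::RLIMIT_NOFILE)
ok = descriptors_sufficient?(worker_connections: 4096, nofile_limit: soft_limit)
puts "soft nofile limit #{soft_limit}: #{ok ? 'sufficient' : 'too low'} for 4096 connections"
```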

Buffering and Keepalive

Nginx buffering can significantly impact performance by allowing it to buffer responses from the backend before sending them to the client. keepalive_timeout controls how long an idle connection will remain open. For HTTP/1.1, persistent connections are the default, reducing the overhead of establishing new TCP connections.

Tuning Buffering and Keepalive

http {
    # ...

    proxy_buffering on;
    proxy_buffer_size 16k;
    proxy_buffers 8 32k; # Number of buffers and size of each buffer
    proxy_busy_buffers_size 64k; # Size of busy buffers

    keepalive_timeout 65; # Default is 75. Lowering can free up resources faster.
    keepalive_requests 100; # Number of requests per keepalive connection.

    # ...
}

Explanation:

proxy_buffering on;: Enables buffering of responses from the proxied server. This can improve performance by allowing Nginx to send data to the client more efficiently, especially for slow clients.
proxy_buffer_size, proxy_buffers, proxy_busy_buffers_size: These directives control the size and number of buffers Nginx uses for proxying. Tuning these can be complex and application-dependent. Larger values might be beneficial for applications returning large responses, but can also increase memory usage. Start with defaults and tune based on monitoring.
keepalive_timeout 65;: Keeps an idle client connection open for up to 65 seconds so that later requests can reuse it instead of opening a new TCP connection. The value should be long enough to allow for multiple requests over the same connection but short enough to release resources from idle connections.
keepalive_requests 100;: Limits the number of requests served over a single keepalive connection before Nginx closes it. Closing connections periodically frees per-connection memory and stops one long-lived connection from being held indefinitely.
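Because the buffer directives constrain one another, a quick sanity check can help before reloading Nginx. The sketch below encodes the rule, as I understand Nginx's startup validation, that proxy_busy_buffers_size must be at least the larger of proxy_buffer_size and one proxy_buffers buffer, and no more than all proxy_buffers minus one buffer. The method name proxy_buffer_report is hypothetical.

```ruby
KB = 1024

# Hypothetical validator for the proxy buffer settings discussed above.
def proxy_buffer_report(buffer_size:, buffers_count:, buffers_size:, busy_size:)
  lower = [buffer_size, buffers_size].max
  upper = (buffers_count - 1) * buffers_size
  {
    # proxy_buffers are allocated per connection, so this is the pool one
    # buffered response can occupy.
    pool_bytes_per_connection: buffers_count * buffers_size,
    busy_size_valid: busy_size.between?(lower, upper)
  }
end

report = proxy_buffer_report(buffer_size: 16 * KB, buffers_count: 8,
                             buffers_size: 32 * KB, busy_size: 64 * KB)
puts "buffer pool per connection: #{report[:pool_bytes_per_connection] / KB}k, " \
     "busy size valid: #{report[:busy_size_valid]}"
```

Multiplying the per-connection pool by the expected number of concurrent proxied responses gives a rough worst-case memory figure.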

Puma/Unicorn Configuration for Ruby Applications

When running Ruby applications (like Rails or Sinatra) on AWS, you’ll typically use a Rack server like Puma or Unicorn. These servers manage worker processes that handle incoming requests from Nginx. Their configuration directly impacts concurrency and throughput.

Puma Configuration

Puma is a popular, multi-threaded, multi-process web server for Ruby. Its configuration revolves around the number of workers and threads.

Tuning Workers and Threads

The optimal balance between workers and threads depends heavily on your application’s I/O-bound vs. CPU-bound nature and the available CPU cores. A common strategy is to use multiple workers, each with multiple threads.
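To make the trade-off concrete, here is a small sketch of how workers and threads combine. Each Puma thread handles one request at a time, so the ceiling on in-flight requests is workers multiplied by the maximum thread count. The method name puma_request_capacity is hypothetical.

```ruby
# Hypothetical helper: upper bound on requests Puma can process at once.
def puma_request_capacity(workers:, max_threads:)
  # With workers set to 0, Puma runs in single mode as one process.
  processes = [workers, 1].max
  processes * max_threads
end

capacity = puma_request_capacity(workers: 2, max_threads: 5)
puts "2 workers x 5 threads = up to #{capacity} concurrent requests"
```

On MRI, threads within one process do not execute Ruby code in parallel, so extra threads mainly help I/O-bound work, while extra workers add CPU parallelism at the cost of memory.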

Example Puma Configuration (`config/puma.rb`)

# config/puma.rb

# Set the environment
environment ENV.fetch('RAILS_ENV') { 'production' }

# Number of threads per worker (minimum, maximum).
# A common starting point is a maximum of 5. Adjust based on your application's performance
# and the number of CPU cores. Too many threads can lead to context-switching overhead.
threads 0, 16 # Min threads, max threads. A minimum of 0 lets an idle pool shrink to zero threads; Puma does not auto-tune this to the core count.

# Number of worker processes.
# For multi-core systems, a good starting point is (number of CPU cores - 1).
# If using a single core, set to 1.
workers ENV.fetch('WEB_CONCURRENCY') { 2 }.to_i # Example: 2 workers

# Set the maximum number of connections per worker.
# This is often tied to the number of threads.
# max_concurrency ENV.fetch('RAILS_MAX_THREADS') { 5 }.to_i # Not a direct Puma setting, but a conceptual limit

# Bind to a Unix socket or TCP port
# If Nginx is on the same server, a Unix socket is generally faster.
# If Nginx is on a different server, use a TCP port.
# Example for Unix socket:
# bind "unix:///path/to/your/app.sock"
# Example for TCP port:
bind "tcp://0.0.0.0:9292" # Or your preferred port

# Alternative to the fixed `threads 0, 16` line above: read the thread bounds
# from the environment, as the default Rails configuration does.
# max_threads_count = ENV.fetch('RAILS_MAX_THREADS') { 5 }.to_i
# min_threads_count = ENV.fetch('RAILS_MIN_THREADS') { 1 }.to_i
# threads min_threads_count, max_threads_count

# Seconds a worker may go without checking in with the master process
# before it is treated as hung and restarted. This is not a request timeout.
worker_timeout 60 # Seconds

# Logging
stdout_redirect 'log/puma.stdout.log', 'log/puma.stderr.log', true

# State file for Puma to manage workers
state_path "tmp/pids/puma.state"

# Start the control server that pumactl uses to inspect and manage Puma
activate_control_app

# Preload the application before starting workers
preload_app!

# Callbacks
on_worker_boot do
  # Worker specific setup
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end

on_worker_shutdown do
  # Worker specific cleanup
end

on_worker_fork do
  # Runs in the master process just before each worker is forked.
  # Do not open per-worker connections here; use on_worker_boot for that.
end

# Allow Puma to be restarted by `rails restart` command.
plugin :tmp_restart

Explanation:

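threads 0, 16: Sets the minimum and maximum number of threads in each worker's pool. A minimum of 0 lets the pool shrink to zero threads when idle; it does not make Puma auto-tune to the number of CPU cores. The maximum of 16 caps the requests one worker handles concurrently. On MRI, where only one thread per process runs Ruby code at a time, a maximum of around 5 is a common starting point.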
workers ENV.fetch('WEB_CONCURRENCY') { 2 }.to_i: Sets the number of worker processes. This is often set via an environment variable. A common strategy is to set this to CPU_CORES - 1.
bind "tcp://0.0.0.0:9292": Configures Puma to listen on a specific TCP port. Ensure this port is accessible from your Nginx server if they are on different instances.
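max_threads_count and min_threads_count: These are not Puma directives. In the default Rails configuration they are local variables holding RAILS_MAX_THREADS and RAILS_MIN_THREADS, which are then passed to the threads directive.
worker_timeout 60: A safety mechanism. If a worker fails to check in with the master process for 60 seconds, it is treated as hung and is killed and restarted. It is not a per-request timeout.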
preload_app!: This directive tells Puma to load the entire application stack before forking worker processes. This is crucial for performance as it avoids loading the application code repeatedly for each worker.
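on_worker_boot/on_worker_fork: on_worker_boot runs inside each newly started worker and is the place to re-establish database connections or other resources. on_worker_fork runs in the master process just before a worker is forked, so it should not open per-worker connections.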
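Since every worker establishes its own database connections after forking, it is worth checking that the connection pool and the database limit match the Puma settings. The sketch below assumes a Rails-style pool per process; the method name db_connection_budget and its parameters are hypothetical.

```ruby
# Hypothetical helper: checks Puma settings against database connection limits.
def db_connection_budget(workers:, max_threads:, pool_size:, db_max_connections:)
  processes = [workers, 1].max
  total = processes * pool_size # worst case: every pool fully checked out
  {
    # Each thread may need its own connection, so the pool should cover them.
    pool_covers_threads: pool_size >= max_threads,
    total_connections: total,
    within_db_limit: total <= db_max_connections
  }
end

budget = db_connection_budget(workers: 2, max_threads: 5, pool_size: 5, db_max_connections: 100)
puts "worst-case DB connections: #{budget[:total_connections]}, " \
     "pool covers threads: #{budget[:pool_covers_threads]}, " \
     "within limit: #{budget[:within_db_limit]}"
```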

Unicorn Configuration

Unicorn is a simpler, process-based HTTP server for Rack applications. It does not use threads within its workers, meaning each worker handles one request at a time. This makes it easier to reason about concurrency but can be less efficient for I/O-bound tasks compared to threaded servers like Puma.

Tuning Workers

The primary tuning parameter for Unicorn is the number of worker processes. A common recommendation is (CPU_CORES * 2) + 1 for I/O-bound applications, or simply CPU_CORES for CPU-bound applications.
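The two rules of thumb above can be written as a small helper. This is only a sketch of the stated formulas; the method name unicorn_worker_count is hypothetical, and available memory often imposes a lower limit than either formula.

```ruby
require 'etc'

# Hypothetical helper applying the two worker-count rules of thumb.
def unicorn_worker_count(cores:, io_bound:)
  io_bound ? (cores * 2) + 1 : cores
end

cores = Etc.nprocessors
puts "I/O-bound: #{unicorn_worker_count(cores: cores, io_bound: true)} workers"
puts "CPU-bound: #{unicorn_worker_count(cores: cores, io_bound: false)} workers"
```

The chosen value could then be exported as WEB_CONCURRENCY, which the configuration below reads.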

Example Unicorn Configuration (`config/unicorn.rb`)

# config/unicorn.rb

# Set the number of worker processes.
# A common recommendation for I/O bound apps is (CPU_CORES * 2) + 1.
# For CPU bound apps, just CPU_CORES.
worker_processes ENV.fetch('WEB_CONCURRENCY') { 4 }.to_i # Example: 4 workers

# Load the app into memory before forking workers.
preload_app true

# Set the path to the Unicorn socket or TCP port.
# If Nginx is on the same server, a Unix socket is generally faster.
# If Nginx is on a different server, use a TCP port.
# Example for Unix socket:
# listen "/path/to/your/app.sock"
# Example for TCP port:
listen ENV.fetch('PORT') { 3000 }.to_i # Listen on a specific port

# Unicorn has no built-in directive for restarting a worker after a fixed
# number of requests. That behaviour is usually added separately, for example
# with the unicorn-worker-killer gem, so no max_requests setting appears here.

# Set the timeout for worker processes.
# If a worker takes longer than this to respond, it will be killed and restarted.
timeout 30

# Logging
stderr_path "log/unicorn.stderr.log"
stdout_path "log/unicorn.stdout.log"

# PID file
pid "/path/to/your/tmp/pids/unicorn.pid"

# Before worker processes are forked
before_fork do |server, worker|
  # The following is highly recommended for Rails in production:
  #
  # Ensure that the master process does not hold connections to the database.
  # This is because workers will be forked from the master, and if the master
  # has connections to the database, then the workers will inherit them.
  #
  # If you are using ActiveRecord, you can do this like so:
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
end

# After worker processes are forked
after_fork do |server, worker|
  # The following is highly recommended for Rails in production:
  #
  # Redis connection pool
  # if defined?(Redis)
  #   $redis.client.reconnect
  # end
  #
  # If you are using ActiveRecord, you can do this like so:
  defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end

Explanation:

worker_processes ENV.fetch('WEB_CONCURRENCY') { 4 }.to_i: Sets the number of worker processes. This is typically set via an environment variable.
preload_app true: Similar to Puma’s preload_app!, this loads the application code before forking workers, significantly improving startup time and reducing memory duplication.
listen ENV.fetch('PORT') { 3000 }.to_i: Configures Unicorn to listen on a specific port.
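max_requests: Unicorn's configuration has no max_requests directive, and including one raises an error at startup. Restarting a worker after a set number of requests, which helps contain memory leaks, is typically provided by an add-on such as the unicorn-worker-killer gem.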
timeout 30: If a worker doesn’t respond within 30 seconds, it’s killed.
before_fork and after_fork callbacks: Essential for managing database connections and other resources. Disconnecting the database connection in before_fork and re-establishing it in after_fork ensures each worker has its own clean connection.

DynamoDB Tuning and Best Practices on AWS

DynamoDB is a fully managed NoSQL database service that offers seamless scalability. However, achieving optimal performance and cost-efficiency requires careful consideration of its throughput provisioning, data modeling, and query patterns.

Throughput Provisioning: RCU and WCU

In provisioned capacity mode, you define the Read Capacity Units (RCUs) and Write Capacity Units (WCUs) your table needs, and requests that exceed the provisioned throughput are throttled. DynamoDB also offers an on-demand mode that bills per request; this section focuses on provisioned capacity.

Understanding RCUs and WCUs

RCU (Read Capacity Unit): One RCU can perform one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size.
WCU (Write Capacity Unit): One WCU can perform one write per second for an item up to 1 KB in size.
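A worked calculation helps when sizing a table. The sketch below assumes the standard sizing rule: item sizes round up to the next 4 KB for reads and the next 1 KB for writes, and an eventually consistent read costs half of a strongly consistent one. The method names are hypothetical.

```ruby
RCU_BLOCK_BYTES = 4 * 1024
WCU_BLOCK_BYTES = 1024

# Hypothetical helper: RCUs needed for a steady read rate.
def read_capacity_units(reads_per_second:, item_bytes:, strongly_consistent: false)
  units_per_read = (item_bytes.to_f / RCU_BLOCK_BYTES).ceil
  total = reads_per_second * units_per_read
  strongly_consistent ? total : (total / 2.0).ceil
end

# Hypothetical helper: WCUs needed for a steady write rate.
def write_capacity_units(writes_per_second:, item_bytes:)
  writes_per_second * (item_bytes.to_f / WCU_BLOCK_BYTES).ceil
end

# 100 reads/s of 6 KB items: each read spans two 4 KB blocks.
puts read_capacity_units(reads_per_second: 100, item_bytes: 6 * 1024, strongly_consistent: true)
puts read_capacity_units(reads_per_second: 100, item_bytes: 6 * 1024)
# 50 writes/s of 2.5 KB items: each write rounds up to 3 KB.
puts write_capacity_units(writes_per_second: 50, item_bytes: 2560)
```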

Tuning Throughput

1. Auto Scaling: This is the most recommended approach for dynamic workloads. Configure Auto Scaling to automatically adjust RCUs and WCUs based on actual traffic, maintaining a target utilization percentage (e.g., 70%).

AWS CLI Example for Auto Scaling Configuration

# Register the table's read and write capacity as scalable targets,
# setting the minimum and maximum capacity for Auto Scaling
aws application-autoscaling register-scalable-target --service-namespace dynamodb \
    --resource-id table/YourTableName \
    --scalable-dimension dynamodb:table:ReadCapacityUnits \
    --min-capacity 5 \
    --max-capacity 1000

aws application-autoscaling register-scalable-target --service-namespace dynamodb \
    --resource-id table/YourTableName \
    --scalable-dimension dynamodb:table:WriteCapacityUnits \
    --min-capacity 5 \
    --max-capacity 1000

# Attach a target-tracking scaling policy to each dimension
aws application-autoscaling put-scaling-policy --service-namespace dynamodb \
    --resource-id table/YourTableName \
    --scalable-dimension dynamodb:table:ReadCapacityUnits \
    --policy-name YourTableReadScalingPolicy \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration '{
        "TargetValue": 70.0,
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 300,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        }
    }'

aws application-autoscaling put-scaling-policy --service-namespace dynamodb \
    --resource-id table/YourTableName \
    --scalable-dimension dynamodb:table:WriteCapacityUnits \
    --policy-name YourTableWriteScalingPolicy \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration '{
        "TargetValue": 70.0,
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 300,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
        }
    }'

2. Manual Provisioning: For predictable workloads, you can manually set RCU/WCU. Monitor CloudWatch metrics (ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits, ThrottledRequests) to adjust these values. If you see frequent throttling, increase provisioned capacity. If capacity is consistently underutilized, decrease it to save costs.
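Whether capacity is set manually or by Auto Scaling, the underlying arithmetic is the same: provision enough that observed consumption sits at the target utilization. The sketch below is a simplification of that idea, not a model of how Auto Scaling actually reacts (it responds to CloudWatch alarms and cooldown periods). The method name target_capacity is hypothetical, and the defaults mirror the 70% target and the 5 to 1000 range used in this section.

```ruby
# Hypothetical helper: capacity that puts consumption at the target utilization.
def target_capacity(consumed_units:, target_percent: 70, min_capacity: 5, max_capacity: 1000)
  # Rational arithmetic avoids floating-point rounding before the ceiling.
  desired = Rational(consumed_units * 100, target_percent).ceil
  desired.clamp(min_capacity, max_capacity)
end

puts target_capacity(consumed_units: 350)   # consumption of 350 at 70% utilization
puts target_capacity(consumed_units: 1)     # floor applies
puts target_capacity(consumed_units: 2000)  # ceiling applies
```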

Data Modeling for Performance

DynamoDB’s performance is heavily influenced by your data model. A well-designed schema minimizes the need for complex queries and scans.

Key Principles:

Single Table Design: Often preferred for performance and cost. Group related entities into a single table, using a generic partition key (e.g., PK) and sort key (e.g., SK).
Access Patterns First: Design your tables around how you will query the data, not just how the data relates logically.
Avoid Scans: Full table scans are expensive and slow. Always aim to query using primary keys or secondary indexes.
Use Secondary Indexes (GSI/LSI): Global Secondary Indexes (GSIs) and Local Secondary Indexes (LSIs) allow you to query data on attributes other than the primary key. GSIs are more flexible but have their own provisioned throughput.

Example: Single Table Design with GSI

Consider a scenario where you need to store users and their orders. Instead of two tables, use one. In the definition below, PK is the partition key (e.g., "USER#user_id" or "ORDER#order_id"), SK is the sort key (e.g., "ORDER#order_id" or "METADATA"), and GSI1_PK and GSI1_SK are the key attributes of the index. The projection type can be ALL, KEYS_ONLY, or INCLUDE:

{
    "TableName": "MyAppTable",
    "KeySchema": [
        { "AttributeName": "PK", "KeyType": "HASH" },
        { "AttributeName": "SK", "KeyType": "RANGE" }
    ],
    "AttributeDefinitions": [
        { "AttributeName": "PK", "AttributeType": "S" },
        { "AttributeName": "SK", "AttributeType": "S" },
        { "AttributeName": "GSI1_PK", "AttributeType": "S" },
        { "AttributeName": "GSI1_SK", "AttributeType": "S" }
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "GSI1",
            "KeySchema": [
                { "AttributeName": "GSI1_PK", "KeyType": "HASH" },
                { "AttributeName": "GSI1_SK", "KeyType": "RANGE" }
            ],
            "Projection": {
                "ProjectionType": "ALL"
            },
            "ProvisionedThroughput": {
                "ReadCapacityUnits": 10,
                "WriteCapacityUnits": 10
            }
        }
    ],
    "ProvisionedThroughput": {
        "ReadCapacityUnits": 10,
        "WriteCapacityUnits": 10
    }
}

In this model:

A user might have an item like: PK: "USER#123", SK: "METADATA", name: "Alice", email: "[email protected]"
An order for that user might be: PK: "USER#123", SK: "ORDER#abc", order_date: "2023-10-27", total: 50.00
To query all orders for user 123: Query(PK='USER#123', SK begins_with='ORDER#')
To query orders by date across all users (using GSI): Query(IndexName='GSI1', GSI1_PK='ORDERS', GSI1_SK='2023-10-27') (assuming GSI1_PK is ‘ORDERS’ and GSI1_SK is the order date for all order items).
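The key conventions in this model are easy to get subtly wrong, so it can help to build items through one small module. This is a minimal sketch under the assumptions stated above (all orders share the GSI1_PK value 'ORDERS' and use the order date as GSI1_SK); the module and method names are hypothetical.

```ruby
# Hypothetical helpers that build items following the PK/SK conventions above.
module SingleTableKeys
  module_function

  def user_metadata_item(user_id, attrs = {})
    { 'PK' => format('USER#%s', user_id), 'SK' => 'METADATA' }.merge(attrs)
  end

  def order_item(user_id, order_id, order_date, attrs = {})
    {
      'PK' => format('USER#%s', user_id),
      'SK' => format('ORDER#%s', order_id),
      'GSI1_PK' => 'ORDERS',   # assumption: one shared value for all orders
      'GSI1_SK' => order_date, # assumption: ISO date string, sorts chronologically
      'order_date' => order_date
    }.merge(attrs)
  end
end

user = SingleTableKeys.user_metadata_item('123', 'name' => 'Alice')
order = SingleTableKeys.order_item('123', 'abc', '2023-10-27', 'total' => 50)
puts user.inspect
puts order.inspect
```

One design caveat: a single GSI1_PK value places every order in one index partition, which can become a hot spot at high write rates.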

Query Optimization

Efficiently querying DynamoDB is paramount. Understand the difference between Query and Scan operations.

`Query` vs. `Scan`

Query: Retrieves items based on the primary key (partition key and optional sort key conditions). It’s efficient as it only reads data from the relevant partitions.
Scan: Reads every item in a table or index. It’s inefficient and should be avoided in production for large tables. If you must scan, a FilterExpression reduces what is returned, but DynamoDB still reads (and bills for) every item before filtering, so it does not lower the read cost.

Best Practices for Queries:

Use Primary Keys: Always try to query using the partition key and sort key.
Leverage Secondary Indexes: Use GSIs for querying on non-key attributes.
ProjectionExpression: Specify only the attributes you need to retrieve to reduce data transfer and cost.
Limit: Use the Limit parameter to retrieve a subset of items if you don’t need all matching results in a single request. Combine with LastEvaluatedKey for pagination.
Batch Operations: For multiple item reads or writes, use BatchGetItem (up to 100 items per call) or BatchWriteItem (up to 25 put or delete requests per call) to reduce the number of API calls and improve efficiency.
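Pagination with LastEvaluatedKey can be sketched as a loop that feeds each page's key back in as the next request's exclusive start key. The helper query_all_items is hypothetical; it works with any client whose query method returns an object responding to items and last_evaluated_key, as the Ruby SDK client does. A stand-in client (FakeClient, also hypothetical) is used here so the example runs without AWS credentials.

```ruby
# Hypothetical helper: follows last_evaluated_key until all pages are read.
def query_all_items(client, params)
  items = []
  start_key = nil
  loop do
    page_params = start_key ? params.merge(exclusive_start_key: start_key) : params
    page = client.query(page_params)
    items.concat(page.items)
    start_key = page.last_evaluated_key
    break if start_key.nil? || start_key.empty?
  end
  items
end

# Stand-in for the SDK client and response, for illustration only.
FakePage = Struct.new(:items, :last_evaluated_key)

class FakeClient
  def initialize(pages)
    @pages = pages
  end

  def query(params)
    key = params[:exclusive_start_key]
    @pages[key ? key['page'] : 0]
  end
end

pages = [
  FakePage.new([{ 'SK' => 'ORDER#a' }, { 'SK' => 'ORDER#b' }], { 'page' => 1 }),
  FakePage.new([{ 'SK' => 'ORDER#c' }], nil)
]
all_items = query_all_items(FakeClient.new(pages), { table_name: 'MyAppTable' })
puts "Fetched #{all_items.length} items across #{pages.length} pages"
```

Collecting every page in memory suits small result sets; for large ones, processing each page as it arrives keeps memory use flat.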

Example: Ruby SDK for DynamoDB Query

require 'aws-sdk-dynamodb'

# Initialize the DynamoDB client
dynamodb = Aws::DynamoDB::Client.new(region: 'us-east-1') # Replace with your region

table_name = 'MyAppTable'
user_id = '123'
order_prefix = 'ORDER#'

# Query for all orders for a specific user
params = {
  table_name: table_name,
  key_condition_expression: '#pk = :pk_val AND begins_with(#sk, :sk_prefix)',
  expression_attribute_names: {
    '#pk' => 'PK',
    '#sk' => 'SK'
  },
  expression_attribute_values: {
    ':pk_val' => "USER\##{user_id}",
    ':sk_prefix' => order_prefix
  }
}

begin
  result = dynamodb.query(params)
  puts "Found #{result.items.count} orders for user #{user_id}:"
  result.items.each do |item|
    puts "- Order ID: #{item['SK'].split('#').last}, Total: #{item['total']}"
  end
rescue Aws::DynamoDB::Errors::ServiceError => e
  puts "Error querying DynamoDB: #{e.message}"
end

# Example of using a GSI
gsi_params = {
  table_name: table_name,
  index_name: 'GSI1',
  key_condition_expression: '#gsi_pk = :gsi_pk_val AND #gsi_sk = :gsi_sk_val',
  expression_attribute_names: {
    '#gsi_pk' => 'GSI1_PK',
    '#gsi_sk' => 'GSI1_SK'
  },
  expression_attribute_values: {
    ':gsi_pk_val' => 'ORDERS', # Assuming GSI1_PK is 'ORDERS' for all order items
    ':gsi_sk_val' => '2023-10-27' # Query for orders on a specific date
  }
}

begin
  gsi_result = dynamodb.query(gsi_params)
  puts "Found #{gsi_result.items.count} orders on 2023-10-27:"
  gsi_result.items.each do |item|
    puts "- User ID: #{item['PK'].split('#').last}, Order ID: #{item['SK'].split('#').last}"
  end
rescue Aws::DynamoDB::Errors::ServiceError => e
  puts "Error querying GSI: #{e.message}"
end

By meticulously tuning Nginx, your Ruby application server (Puma/Unicorn), and DynamoDB, you can build a highly performant and scalable infrastructure on AWS. Continuous monitoring and iterative adjustments based on real-world metrics are key to maintaining optimal performance.
