The Ultimate DevOps Playbook: Tuning Nginx, Gunicorn/FPM, and Elasticsearch on Linode for Ruby

Nginx as a High-Performance Frontend for Ruby Applications

When deploying Ruby applications, particularly those built with frameworks like Ruby on Rails or Sinatra, Nginx serves as an indispensable frontend. Its strengths lie in efficiently handling static assets, SSL termination, request buffering, and load balancing. For optimal performance, we’ll focus on tuning Nginx for maximum throughput and minimal latency.

Core Nginx Configuration Tuning

The primary configuration file, typically located at /etc/nginx/nginx.conf, contains global settings. Key directives to consider for performance include:

worker_processes: Set this to the number of CPU cores available on your Linode instance. Too few can lead to underutilization; too many can cause context-switching overhead.
worker_connections: The maximum number of simultaneous connections each worker process can open. This count includes connections to upstream servers, so a proxied request uses two. A common starting point is 1024 or higher, depending on expected traffic.
keepalive_timeout: Controls how long an idle keep-alive connection stays open. The default is 75 seconds. A lower value frees resources sooner, while a higher value saves reconnects for clients with high latency.
sendfile: Set to on so the kernel copies file data directly to the socket without passing it through user space, which speeds up static file delivery.
tcp_nopush and tcp_nodelay: tcp_nopush takes effect only when sendfile is on and sends the response headers and the start of a file in full packets. tcp_nodelay disables Nagle’s algorithm on keep-alive connections so small writes are not held back. Enabling both reduces the packet count without delaying the final packet.

Here’s an example snippet from nginx.conf:

worker_processes auto; # Or set to the number of CPU cores
events {
    worker_connections 4096; # Adjust based on expected load
    multi_accept on;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    sendfile        on;
    tcp_nopush      on;
    tcp_nodelay     on;

    keepalive_timeout  65;
    keepalive_requests 1000; # Limit requests per keepalive connection

    # Gzip compression for dynamic content
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # Buffering settings for upstream connections
    proxy_buffering on;
    proxy_buffer_size 16k;
    proxy_buffers 8 32k;
    proxy_busy_buffers_size 64k;

    # Include server configurations
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
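
As a rough capacity check for these settings, the upper bound on simultaneous clients is about worker_processes × worker_connections, halved when Nginx proxies to an application server because each request then holds a client connection and an upstream connection. The sketch below is only an estimate; the helper name is hypothetical, and the 4-core figure is an assumption about what worker_processes auto resolves to.

```ruby
# Rough upper bound on simultaneous clients for an Nginx configuration.
# Hypothetical helper for illustration; not part of Nginx or any library.
def max_clients(worker_processes:, worker_connections:, proxied: true)
  total = worker_processes * worker_connections
  # A proxied request holds one client connection and one upstream connection.
  proxied ? total / 2 : total
end

# Assumes `worker_processes auto` resolved to 4 on a 4-core instance.
puts max_clients(worker_processes: 4, worker_connections: 4096)
puts max_clients(worker_processes: 4, worker_connections: 4096, proxied: false)
```

The real ceiling is usually lower, since file descriptor limits and the application server's own concurrency come into play first.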

Optimizing Static Asset Delivery

Nginx excels at serving static files directly, bypassing the Ruby application entirely. This is crucial for performance. Configure your server block to leverage this:

server {
    listen 80;
    server_name your_domain.com www.your_domain.com;
    root /var/www/your_app/public; # Path to your Rails/Sinatra public directory

    # Serve static assets directly
    location ~ ^/(assets|images|javascripts|stylesheets|system)/ {
        expires 1y;
        add_header Cache-Control "public";
        try_files $uri $uri/ =404;
    }

    # Proxy requests to the application server (Puma/Unicorn)
    location / {
        proxy_pass http://unix:/run/your_app.sock; # Or http://127.0.0.1:8000
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 300s; # Increase timeout for potentially long requests
        proxy_connect_timeout 75s;
    }

    # SSL configuration (if applicable)
    # listen 443 ssl;
    # ssl_certificate /etc/letsencrypt/live/your_domain.com/fullchain.pem;
    # ssl_certificate_key /etc/letsencrypt/live/your_domain.com/privkey.pem;
    # ... other SSL settings
}

Puma Tuning for Ruby Applications

Gunicorn is a Python WSGI server and cannot run Ruby applications; its pre-fork design was ported from Ruby’s Unicorn. Ruby applications are typically served by Puma or Unicorn. We’ll focus on Puma, as it’s the default for Rails and widely adopted.

Puma Worker and Thread Configuration

Puma operates with a master process that spawns multiple workers. Each worker can then manage multiple threads. The key is to balance the number of workers and threads to match your server’s CPU and memory resources, and the nature of your application’s workload (I/O-bound vs. CPU-bound).

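Workers: Each worker is a separate Ruby process. More workers increase parallelism but consume more memory. A common starting point is one worker per CPU core, reduced if memory is tight. The (CPU cores * 2) + 1 formula often quoted here comes from Gunicorn’s documentation and may allocate more Ruby processes than your memory allows.
Threads: Threads within a worker handle concurrent requests. On MRI, the global VM lock lets only one thread per process run Ruby code at a time, so extra threads mainly help I/O-bound workloads, and excessive threads lead to contention and context-switching overhead. A typical range is 4-16 threads per worker, with the lower end suiting CPU-bound applications.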
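
One way to pick a starting worker count is to cap it by both CPU cores and available memory, then check how many threads (and therefore database connections) the result implies. The sketch below assumes a hypothetical 512MB per worker and 1GB reserved for the OS and other services; the helper names are illustrative, and the memory figures should come from measuring your own application.

```ruby
require 'etc'

# Suggest a Puma worker count bounded by CPU cores and by memory.
# All memory figures are assumptions; measure your app's real footprint.
def suggested_workers(cores:, total_ram_mb:, per_worker_mb:, reserved_mb: 1024)
  by_memory = (total_ram_mb - reserved_mb) / per_worker_mb
  [[cores, by_memory].min, 1].max
end

# Total threads across all workers. The database must accept at least
# this many connections if every thread can hold one.
def total_threads(workers:, threads_per_worker:)
  workers * threads_per_worker
end

workers = suggested_workers(cores: Etc.nprocessors, total_ram_mb: 8192, per_worker_mb: 512)
threads = total_threads(workers: workers, threads_per_worker: 5)
puts "workers=#{workers} total_threads=#{threads}"
```

The resulting worker count maps to WEB_CONCURRENCY and the per-worker thread count to RAILS_MAX_THREADS in the configuration that follows.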

You can configure Puma via a config/puma.rb file in your Rails application:

# config/puma.rb

# Set the environment
environment ENV.fetch('RAILS_ENV') { 'production' }

# Number of workers to spawn.
# For a Linode with 4 cores, 4 workers (one per core) is a reasonable starting point.
# Adjust based on memory usage.
workers ENV.fetch('WEB_CONCURRENCY') { 4 }.to_i

# Minimum and maximum number of threads per worker (set to the same value here).
# If your app is I/O bound, you might increase this.
threads_count = ENV.fetch('RAILS_MAX_THREADS') { 5 }.to_i
threads threads_count, threads_count

# Bind to a Unix socket for Nginx to connect to.
# The Puma user must be able to create this socket, and Nginx needs
# read/write access to it.
bind "unix:///run/your_app.sock"

# Or bind to a TCP port if Nginx is on a different machine or for development.
# bind "tcp://0.0.0.0:8000"

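# Per-worker concurrency is already capped by the `threads` setting above;
# Puma has no separate max-connections directive.

# Puma has no per-request timeout setting. worker_timeout restarts a worker
# that stops checking in with the master (a hung or dead process).
# To time out slow requests, use middleware such as the rack-timeout gem.
worker_timeout 60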

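# Logging (this file lives in config/, so the log directory is one level up)
log_dir = File.expand_path('../log', __dir__)
stdout_redirect "#{log_dir}/puma.stdout.log", "#{log_dir}/puma.stderr.log", true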

# Preload the application code before workers are forked.
preload_app!

# Callbacks for worker lifecycle
on_worker_boot do
  # Worker specific setup code.
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end

on_worker_shutdown do
  # Worker specific cleanup code.
end

# Allow Puma to be restarted by `rails restart` command.
plugin :tmp_restart

To run Puma with these settings, you’d typically use a process manager like systemd. Here’s a sample systemd service file for your application (e.g., /etc/systemd/system/your_app.service):

[Unit]
Description=Puma Application Server
After=network.target

[Service]
Type=simple
# Application user and group. systemd does not support trailing comments
# on the same line as a setting, so comments go on their own lines.
User=deploy
Group=www-data
WorkingDirectory=/var/www/your_app
Environment="RAILS_ENV=production"
# Generated Rails configs that check RAILS_LOG_TO_STDOUT treat any non-empty
# value as enabled, so leave it unset when logging to files.
# WEB_CONCURRENCY and RAILS_MAX_THREADS are read by config/puma.rb.
Environment="WEB_CONCURRENCY=4"
Environment="RAILS_MAX_THREADS=5"
ExecStart=/usr/local/bin/bundle exec puma -C config/puma.rb
ExecStop=/bin/kill -s TERM $MAINPID
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

After creating or modifying the service file, reload systemd and start your application:

sudo systemctl daemon-reload
sudo systemctl enable your_app.service
sudo systemctl start your_app.service
sudo systemctl status your_app.service

Elasticsearch Performance Tuning on Linode

Elasticsearch, while not directly serving web requests, is often a critical component for search functionality in Ruby applications. Optimizing its performance involves JVM tuning, shard management, and hardware considerations.

JVM Heap Size Configuration

The Java Virtual Machine (JVM) heap size is arguably the most critical setting for Elasticsearch performance. It dictates how much memory Elasticsearch can use for its data structures, caches, and operations. A common recommendation is to set the heap to no more than 50% of the system’s RAM, leaving the rest for the filesystem cache, and to keep it below the point (a little under 32GB) where the JVM can no longer use compressed ordinary object pointers (compressed oops).

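Edit the jvm.options file, typically located at /etc/elasticsearch/jvm.options (recent Elasticsearch versions prefer a custom file under /etc/elasticsearch/jvm.options.d/ instead):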

-Xms4g
-Xmx4g

In this example, we’ve allocated 4GB of RAM for the heap. Adjust -Xms (initial heap size) and -Xmx (maximum heap size) based on your Linode instance’s RAM and your cluster’s needs. Ensure both are set to the same value to prevent resizing during operation.
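
The sizing rule above can be sketched as a small calculation. This is an illustration, not an official formula: the 31GB cap is an assumed safe value below the compressed oops threshold, and the helper names are hypothetical.

```ruby
# Suggest an Elasticsearch heap size in whole gigabytes.
# Assumptions: the 50% rule from the text, and a 31 GB cap chosen to stay
# below the compressed-oops threshold (verify the threshold on your JVM).
COMPRESSED_OOPS_CAP_GB = 31

def suggested_heap_gb(total_ram_gb)
  [[total_ram_gb / 2, COMPRESSED_OOPS_CAP_GB].min, 1].max
end

# Produce matching -Xms/-Xmx lines so the heap never resizes at runtime.
def jvm_heap_options(total_ram_gb)
  heap = suggested_heap_gb(total_ram_gb)
  ["-Xms#{heap}g", "-Xmx#{heap}g"]
end

puts jvm_heap_options(8)
```

For an 8GB Linode this yields the same 4GB values shown above; on instances that also run Nginx and Puma, a smaller heap may be more appropriate.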

Shard Allocation and Sizing

The number and size of shards significantly impact search and indexing performance. Too many small shards can overwhelm the cluster with overhead; too few large shards can limit parallelism and recovery speed.

Shard Size: Aim for shards between 10GB and 50GB. For a 100GB index, that allows 2-10 primary shards; 2-5 is a reasonable starting point.
Replicas: For high availability and read performance, use replicas. A common setup is 1 replica per primary shard.
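
The shard size guideline above translates into a range of primary shard counts for a given index size. A minimal sketch, assuming the 10GB and 50GB bounds from the text and hypothetical helper names:

```ruby
# Range of primary shard counts that keeps each shard within a target size band.
# The default bounds are the 10-50 GB guideline from the text.
def primary_shard_range(index_size_gb, min_shard_gb: 10, max_shard_gb: 50)
  fewest = (index_size_gb.to_f / max_shard_gb).ceil
  most = (index_size_gb.to_f / min_shard_gb).floor
  fewest = 1 if fewest < 1
  most = fewest if most < fewest # small indices still need one shard
  fewest..most
end

# Total shard copies the cluster must host: primaries plus their replicas.
def total_shards(primaries, replicas)
  primaries * (1 + replicas)
end

range = primary_shard_range(100)
puts "primary shards: #{range.min} to #{range.max}"
puts "total shards for 3 primaries, 1 replica: #{total_shards(3, 1)}"
```

Each replica adds a full copy of every primary shard, so disk usage and shard count both scale with the replica setting.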

You can manage shard settings via the Elasticsearch API. For example, to set the number of primary shards to 3 and replicas to 1 for an index named my_index:

PUT /my_index
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  }
}

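To update settings on an existing index (number_of_replicas can be changed at any time, while number_of_shards is fixed when the index is created):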

PUT /my_index/_settings
{
  "index": {
    "number_of_replicas": 2
  }
}
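
From a Ruby application, the same settings update can be issued with the standard library’s Net::HTTP. The sketch below only builds the request and does not send it; the base URL is an assumption (an unsecured local node on the default port), and the helper name is hypothetical. A secured cluster would also need TLS and credentials.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not send) a request that updates the replica count of an index.
# base_url is an assumption: an unsecured local node on the default port.
def replica_update_request(index, replicas, base_url: 'http://localhost:9200')
  uri = URI.join(base_url, "/#{index}/_settings")
  request = Net::HTTP::Put.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate({ index: { number_of_replicas: replicas } })
  [uri, request]
end

uri, request = replica_update_request('my_index', 2)
puts "#{request.method} #{uri}"
puts request.body
# To send it: Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```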

Filesystem Cache and Swappiness

Elasticsearch relies heavily on the operating system’s filesystem cache. Ensure your Linode instance is configured to maximize its use.

Swappiness: Set the vm.swappiness kernel parameter to a low value (e.g., 1 or 10) to discourage the OS from swapping out Elasticsearch’s memory. Edit /etc/sysctl.conf and add/modify the line: vm.swappiness = 1. Then apply with sudo sysctl -p.
File Descriptors: Elasticsearch requires a high number of open file descriptors. When Elasticsearch runs under systemd, the limit comes from the service unit (LimitNOFILE); /etc/security/limits.conf applies only to sessions started through PAM, such as logins.

# Example /etc/security/limits.conf entries
* soft nofile 65536
* hard nofile 65536
root soft nofile 65536
root hard nofile 65536

# Example systemd service override for file descriptors
# Create a file like /etc/systemd/system/elasticsearch.service.d/override.conf
[Service]
LimitNOFILE=65536

Settings applied with sysctl -p take effect immediately. Changes to limits.conf apply to new login sessions, and systemd overrides require sudo systemctl daemon-reload. In either case, restart Elasticsearch so the process picks up the new limits.
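
To confirm what a process actually inherits, a short Ruby check can print the open-file limit and the current swappiness. This is a sketch for a quick sanity check: it reports the limits of the Ruby process itself, so a systemd-managed Elasticsearch service may show different values, and the /proc path exists only on Linux.

```ruby
# Report the open-file limits of the current process.
soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
puts "open files: soft=#{soft} hard=#{hard}"

# Read the kernel's current swappiness value (Linux only).
swappiness_path = '/proc/sys/vm/swappiness'
if File.readable?(swappiness_path)
  puts "vm.swappiness=#{File.read(swappiness_path).strip}"
else
  puts 'vm.swappiness is not available on this platform'
end
```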

Monitoring and Diagnostics

Continuous monitoring is key to identifying bottlenecks. Use tools like:

Nginx: nginx -t (validates the configuration before a reload), tail -f /var/log/nginx/access.log, tail -f /var/log/nginx/error.log, netstat -tulnp | grep nginx.
Puma: systemctl status your_app.service, journalctl -u your_app.service -f, check Puma’s log files.
Elasticsearch: Elasticsearch’s own monitoring APIs (e.g., _cat/nodes, _cat/indices, _cluster/stats), and external tools like Prometheus with the Elasticsearch exporter, or commercial APM solutions.

Regularly review logs for errors, high latency, and resource exhaustion. For Elasticsearch, pay close attention to garbage collection logs and search/indexing latency metrics.
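
As one example of log review, the sketch below counts HTTP status classes in Nginx access log lines. It assumes the default combined log format, where the status code follows the quoted request line; the helper name is hypothetical and the sample lines are made up for illustration.

```ruby
# Count HTTP status classes (2xx, 3xx, ...) in Nginx access log lines.
# Assumes the default "combined" format: the status follows the quoted request.
STATUS_PATTERN = /" (\d{3}) /

def status_class_counts(lines)
  counts = Hash.new(0)
  lines.each do |line|
    match = STATUS_PATTERN.match(line)
    counts["#{match[1][0]}xx"] += 1 if match
  end
  counts
end

# Made-up sample lines using documentation-range IP addresses.
sample = [
  '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"',
  '203.0.113.5 - - [10/Oct/2024:13:55:37 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0"',
  '203.0.113.9 - - [10/Oct/2024:13:55:38 +0000] "POST /search HTTP/1.1" 502 157 "-" "curl/8.0"'
]
p status_class_counts(sample)
```

To run it against a real log, pass File.foreach('/var/log/nginx/access.log') instead of the sample array. A rising 5xx count usually points at the application server or its upstream dependencies rather than Nginx itself.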