Scaling Ruby on Linode to Handle 50,000+ Concurrent Requests

Understanding the Bottlenecks: From Single Server to Distributed Architecture

Achieving 50,000+ concurrent requests on a Ruby application, especially one hosted on Linode, necessitates a fundamental shift from a monolithic, single-server mindset to a robust, distributed architecture. The primary bottlenecks typically manifest in three key areas: CPU saturation within the application servers, I/O limitations (database, network, disk), and memory exhaustion. Simply throwing more RAM at a single Linode instance will yield diminishing returns and eventually hit hard limits. Our strategy will focus on horizontal scaling, intelligent load balancing, and optimizing each layer of the stack.

Strategic Infrastructure: Linode Instance Types and Network Design

For this scale, we’ll use Linode’s Dedicated CPU (compute-optimized) or High Memory plans. Dedicated CPU plans give consistent CPU performance without contention from neighboring instances, which matters for Ruby’s often CPU-bound request handling. High Memory plans help if your application has a large memory footprint, but always profile first. A typical setup might involve:

Web/Application Servers: Multiple Linode instances running your Ruby application (e.g., Rails, Sinatra). The number will depend on your application’s resource consumption per request. Start with 3-5 and scale up.
Load Balancer: A dedicated Linode instance (or a managed service if preferred) configured with HAProxy or Nginx. This is the entry point for all traffic.
Database Server: A separate, powerful Linode instance (potentially High Memory) dedicated to your database (e.g., PostgreSQL, MySQL). Consider read replicas for scaling read-heavy workloads.
Caching Layer: A separate Linode instance for Redis or Memcached to offload database reads.
Background Job Workers: Separate instances for processing asynchronous tasks (e.g., Sidekiq, Delayed Job).
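
Before sizing the application tier, it helps to estimate how many requests are actually in flight. The sketch below applies Little's law (requests in flight = arrival rate times average latency). The helper names `in_flight` and `servers_needed`, the traffic figures, and the worker and thread counts are all illustrative assumptions, not measurements.

```ruby
# Rough capacity estimate. All numbers below are illustrative assumptions.

# Little's law: average requests in flight = arrival rate * average latency.
def in_flight(requests_per_second, avg_latency_ms)
  (requests_per_second * avg_latency_ms / 1000.0).ceil
end

# Each Puma thread serves one request at a time, so one server can process
# workers * threads requests simultaneously.
def servers_needed(concurrent_requests, workers:, threads:)
  slots_per_server = workers * threads
  (concurrent_requests.to_f / slots_per_server).ceil
end

concurrent = in_flight(5_000, 200) # 5,000 req/s at 200 ms average latency
puts concurrent                    # => 1000
puts servers_needed(concurrent, workers: 4, threads: 5) # => 50
```

With these assumed numbers, 1,000 in-flight requests already need 50 servers of this shape. Connections that are merely open or queued at the load balancer do not each occupy a thread, so measure real latency and throughput before sizing the tier.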

Load Balancer Configuration: HAProxy for High Availability

HAProxy is an excellent choice for its performance, reliability, and feature set. The configuration below runs it in HTTP mode, which enables path-based routing, HTTP health checks, and per-request logging. TCP mode has lower overhead but cannot inspect individual requests.

HAProxy Configuration Snippet

Assume your application servers are on IPs 192.168.1.10, 192.168.1.11, 192.168.1.12, all listening on port 3000 (e.g., Puma). The load balancer will listen on port 80.

# /etc/haproxy/haproxy.cfg

global
    log /dev/log    local0
    log /dev/log    local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5000ms
    timeout client  50000ms
    timeout server  50000ms
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend http_frontend
    bind *:80
    acl is_static       path_beg       -i /assets /images /javascripts /stylesheets
    use_backend static_backend if is_static
    default_backend     app_servers

backend app_servers
    balance roundrobin
    option httpchk GET /healthcheck.txt HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    server app1 192.168.1.10:3000 check
    server app2 192.168.1.11:3000 check
    server app3 192.168.1.12:3000 check

backend static_backend
    # Serve static assets directly if Nginx is configured on LB or use CDN
    # For simplicity, this example assumes static assets are handled by app servers
    # or a separate Nginx instance. If serving from LB, configure Nginx here.
    # For now, we'll just pass through to app servers.
    balance roundrobin
    server app1 192.168.1.10:3000
    server app2 192.168.1.11:3000
    server app3 192.168.1.12:3000

listen stats
    bind *:8404
    mode http
    stats enable
    stats uri /stats
    stats realm Haproxy\ Statistics
    stats auth admin:YourSecurePassword

Explanation:

global and defaults: Standard HAProxy configuration for logging, timeouts, and error handling.
frontend http_frontend: Listens on port 80. It uses an ACL to potentially route static assets to a different backend (though in this simplified example, they still go to app servers).
backend app_servers: Uses roundrobin for load distribution. option httpchk and http-check expect status 200 configure health checks to ensure traffic is only sent to healthy application instances.
backend static_backend: A placeholder for serving static assets. In a production environment, you’d likely use Nginx on the load balancer itself or a CDN.
listen stats: Exposes HAProxy’s statistics page on port 8404, invaluable for monitoring. Remember to change YourSecurePassword.
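
To make the interaction between roundrobin and health checks concrete, here is a small Ruby model of the idea. It is a sketch of the behaviour only, not HAProxy's implementation; the `Server` and `RoundRobinPool` names are hypothetical.

```ruby
# Hypothetical model of round-robin balancing over health-checked servers.
Server = Struct.new(:name, :healthy)

class RoundRobinPool
  def initialize(servers)
    @servers = servers
    @index = 0
  end

  # Return the next healthy server, skipping any marked down,
  # or nil when every server is failing its health check.
  def next_server
    @servers.size.times do
      candidate = @servers[@index]
      @index = (@index + 1) % @servers.size
      return candidate if candidate.healthy
    end
    nil
  end
end

pool = RoundRobinPool.new([
  Server.new("app1", true),
  Server.new("app2", false), # failing its health check
  Server.new("app3", true)
])
picks = Array.new(4) { pool.next_server.name }
puts picks.inspect # => ["app1", "app3", "app1", "app3"]
```

Traffic keeps rotating across the remaining servers while app2 is down, which is the behaviour the `check` keyword on each server line is meant to provide.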

Application Server Optimization: Puma and Ruby Tuning

The choice of application server and its configuration is paramount. Puma is a popular and performant choice for Ruby. We need to tune its worker and thread counts based on the Linode instance’s CPU cores and available RAM.

Puma Configuration (`config/puma.rb`)

# config/puma.rb

# Change to match your Linode instance's CPU cores.
# For a 4-core instance, you might start with 2 workers.
# Over-provisioning workers can lead to context switching overhead.
workers ENV.fetch("WEB_CONCURRENCY") { 2 }.to_i

# Threads per worker. This is crucial.
# A common starting point is 5 threads per worker.
# Tune this based on your application's I/O vs CPU bound nature.
# If your app is very CPU-bound, fewer threads might be better.
# If it's I/O bound (waiting for DB, external APIs), more threads can help.
threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }.to_i
threads threads_count, threads_count

preload_app!

# Environment-specific settings
environment ENV.fetch("RAILS_ENV") { "production" }

# Logging
stdout_redirect "/var/log/puma.stdout.log", "/var/log/puma.stderr.log", true

# Daemonizing was removed in Puma 5. Run Puma under systemd or another
# process manager instead.
# daemonize true  # Puma 4 and earlier only

# PID file location
pidfile "/var/run/puma.pid"

# State file location
state_path "/var/run/puma.state"

# Allow Puma to be restarted by `rails restart` command.
plugin :tmp_restart

# Listener configuration. Choose one of the two options below.
# If using a Unix socket for communication with Nginx/HAProxy on the same machine:
# activate_control_app "unix:///tmp/pumactl.sock"
# bind "unix:///var/run/puma.sock"

# If listening on TCP (e.g., for HAProxy on a different machine):
bind "tcp://0.0.0.0:3000"

on_worker_boot do
  # Worker specific setup code.
  # e.g., ActiveRecord::Base.establish_connection
end

before_fork do
  # Ensures that the main process doesn't hold onto connections.
  ActiveRecord::Base.connection_pool.disconnect! if defined?(ActiveRecord::Base)
end

Tuning Strategy:

Workers vs. Threads: Puma uses multiple processes (workers) and multiple threads within each worker. Workers leverage multiple CPU cores, while threads handle concurrent requests within a single process. The optimal balance depends heavily on your application’s characteristics.
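CPU Cores: Start with workers at roughly half the number of CPU cores on your Linode instance, which leaves cores for the OS and other processes. If CPU stays underused under load, raise the count toward one worker per core.
Memory: Each worker is a full copy of the application, and each thread adds its own overhead, so monitor memory usage closely. A common starting point is 5 threads per worker. Keep the database connection pool at least as large as the thread count, or threads will block waiting for a connection.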
preload_app!: Essential for production. It loads your application code before forking workers, reducing memory duplication and startup time for each worker.
Health Checks: Ensure your application has a simple endpoint (e.g., /healthcheck.txt returning “OK”) that HAProxy can use to verify its health.
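
The health check endpoint can be as small as a Rack middleware. The sketch below assumes the `/healthcheck.txt` path used in the HAProxy configuration; the `HealthCheck` class name and the stand-in inner application are hypothetical.

```ruby
# Minimal Rack-style middleware. It needs no gems because a Rack app is any
# object that responds to #call(env) and returns [status, headers, body].
class HealthCheck
  PATH = "/healthcheck.txt" # matches the path in the HAProxy health check

  def initialize(app)
    @app = app
  end

  def call(env)
    if env["PATH_INFO"] == PATH
      [200, { "content-type" => "text/plain" }, ["OK"]]
    else
      @app.call(env)
    end
  end
end

# Stand-in for the real application (hypothetical).
inner_app = ->(_env) { [404, { "content-type" => "text/plain" }, ["Not Found"]] }
stack = HealthCheck.new(inner_app)

status, _headers, body = stack.call("PATH_INFO" => "/healthcheck.txt")
puts "#{status} #{body.first}" # => 200 OK
```

In a Rails application, one option is to register it with `config.middleware.insert_before 0, HealthCheck` so the check answers before the rest of the stack runs. A check this shallow only proves the process is accepting requests; it says nothing about the database or cache.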

Database Scaling and Optimization

The database is often the most significant bottleneck. For 50,000+ concurrent requests, a single database instance will struggle. We need to consider:

Read Replicas

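Configure PostgreSQL or MySQL replication. Direct read-heavy queries to replica instances, leaving the primary instance to handle writes. This requires application-level logic (for example, Rails’ multiple-database support) or a proxy such as Pgpool-II (PostgreSQL) or ProxySQL (MySQL) to route queries. Note that PgBouncer pools connections but does not split reads from writes.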
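
As a rough illustration of application-level routing, the sketch below sends writes to the primary and rotates reads across replicas. The `ReplicaRouter` class and the host labels are hypothetical; in a Rails 6+ application the built-in `connects_to` and `connected_to(role: :reading)` APIs cover the same ground.

```ruby
# Hypothetical read/write splitter. Connections are plain labels here.
class ReplicaRouter
  def initialize(primary:, replicas:)
    @primary = primary
    @replicas = replicas
    @next_replica = 0
  end

  # Writes, and reads that must see their own writes, go to the primary.
  def for_write
    @primary
  end

  # Reads rotate across replicas and fall back to the primary if none exist.
  def for_read
    return @primary if @replicas.empty?

    replica = @replicas[@next_replica % @replicas.size]
    @next_replica += 1
    replica
  end
end

router = ReplicaRouter.new(primary: "db-primary",
                           replicas: ["db-replica-1", "db-replica-2"])
puts router.for_write # => db-primary
puts router.for_read  # => db-replica-1
puts router.for_read  # => db-replica-2
puts router.for_read  # => db-replica-1
```

Replicas lag behind the primary, so any read that must reflect a write made moments earlier should still use the primary.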

Connection Pooling

With a pooled ORM such as ActiveRecord, each Puma or Sidekiq thread can hold one database connection, so the total grows with the number of servers, workers, and threads. At this scale that total can exceed the database’s connection limit. Put a connection pooler such as PgBouncer (PostgreSQL) or ProxySQL (MySQL) in front of the database, and size your ORM’s pool to match the thread count per process.
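
A quick back-of-the-envelope calculation shows why pooling matters. The fleet sizes below are illustrative assumptions, and the helper names are hypothetical; the limit of 100 is PostgreSQL's default `max_connections`.

```ruby
# Estimate how many database connections the fleet can open at once.
# Every count here is an illustrative assumption.
def app_connections(servers:, workers:, threads:)
  # Each Puma thread may hold one connection from its worker's pool.
  servers * workers * threads
end

def job_connections(instances:, concurrency:)
  # Each Sidekiq thread may hold one connection as well.
  instances * concurrency
end

total = app_connections(servers: 10, workers: 4, threads: 5) +
        job_connections(instances: 2, concurrency: 25)
max_connections = 100 # PostgreSQL's default max_connections

puts total                   # => 250
puts total > max_connections # => true, so a pooler or a higher limit is needed
```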

Query Optimization

This is non-negotiable. Regularly analyze slow queries using EXPLAIN ANALYZE (PostgreSQL) or the slow query log (MySQL). Ensure proper indexing, avoid N+1 query problems, and denormalize where appropriate for read performance.

Caching Strategies

Implement aggressive caching at multiple levels:

HTTP Caching: Use HTTP headers (Cache-Control, ETag) for browser and CDN caching.
Application-Level Caching: Cache expensive computations, API responses, or frequently accessed data in Redis or Memcached.
Fragment Caching (Rails): Cache parts of your views.
Russian Doll Caching (Rails): Cache nested view fragments.
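
Application-level caching usually follows the cache-aside pattern: look in the cache first, and only on a miss run the expensive work and store the result. The sketch below uses an in-memory hash as a stand-in for Redis or Memcached; the `TtlCache` class is hypothetical. In Rails, `Rails.cache.fetch(key, expires_in: ...)` offers the same behaviour.

```ruby
# In-memory stand-in for Redis/Memcached, showing the cache-aside pattern.
class TtlCache
  Entry = Struct.new(:value, :expires_at)

  # The clock is injectable so expiry can be tested without sleeping.
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @store = {}
    @clock = clock
  end

  # Return the cached value, or run the block, store its result, and return it.
  def fetch(key, ttl:)
    entry = @store[key]
    return entry.value if entry && entry.expires_at > @clock.call

    value = yield
    @store[key] = Entry.new(value, @clock.call + ttl)
    value
  end
end

cache = TtlCache.new
db_calls = 0
load_user = -> { db_calls += 1; "user-42" } # pretend this is a slow query

2.times { cache.fetch("user:42", ttl: 60, &load_user) }
puts db_calls # => 1
```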

Redis/Memcached Setup

Deploy Redis or Memcached on a separate Linode instance. Ensure it’s accessible only from your application servers and background workers, over private networking.

# Example: Basic Redis configuration for performance
# /etc/redis/redis.conf

daemonize yes
pidfile /var/run/redis_6379.pid
port 6379
# Consider binding to a private IP for security
# bind 192.168.1.50

tcp-backlog 511
timeout 0
tcp-keepalive 300
# The memory allocator is chosen when Redis is compiled, not in redis.conf.
# jemalloc is the default on Linux builds.

databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /var/lib/redis

logfile /var/log/redis/redis-server.log
loglevel notice

# Enable append only file for durability if needed, but can impact performance
# appendonly no

# Max clients - adjust based on expected connections
# maxclients 10000

# Max memory - CRITICAL for preventing OOM errors
# Set this to a value less than the Linode's total RAM, leaving room for OS and other processes.
# Example for an 8GB RAM Linode:
# maxmemory 6gb
# maxmemory-policy allkeys-lru # Or another suitable eviction policy

Tuning maxmemory and maxmemory-policy is crucial to prevent Redis from consuming all available RAM and causing instability.
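
To show what an LRU eviction policy does once the limit is reached, here is a tiny exact LRU cache in Ruby. It is only an illustration of the idea: the `LruCache` class is hypothetical, it bounds by entry count rather than bytes, and Redis itself uses an approximated LRU based on sampling.

```ruby
# Exact LRU cache bounded by entry count. Redis bounds by bytes and uses
# an approximated LRU, so this only illustrates the eviction idea.
class LruCache
  def initialize(max_entries)
    @max_entries = max_entries
    @data = {} # Ruby hashes preserve insertion order
  end

  def get(key)
    return nil unless @data.key?(key)

    value = @data.delete(key) # re-insert to mark as most recently used
    @data[key] = value
  end

  def set(key, value)
    @data.delete(key)
    @data[key] = value
    @data.delete(@data.keys.first) if @data.size > @max_entries # evict oldest
    value
  end

  def keys
    @data.keys
  end
end

cache = LruCache.new(2)
cache.set(:a, 1)
cache.set(:b, 2)
cache.get(:a)    # :a is now the most recently used
cache.set(:c, 3) # evicts :b, the least recently used
puts cache.keys.inspect # => [:a, :c]
```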

Background Job Processing

Offload any non-critical, time-consuming tasks (email sending, image processing, report generation) to background job workers. This frees up your web application servers to handle incoming requests quickly.
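
The hand-off can be pictured with a toy in-process queue: the request path enqueues work and returns immediately, while worker threads drain the queue. This is a sketch of the pattern only; the `JobQueue` class is hypothetical, and Sidekiq performs the same hand-off through Redis so that jobs survive restarts and can run on other machines.

```ruby
# Toy in-process job queue illustrating the enqueue-and-return pattern.
class JobQueue
  def initialize(worker_count)
    @queue = Queue.new
    @workers = Array.new(worker_count) do
      Thread.new do
        # Queue#pop blocks until a job arrives; nil is the stop signal.
        while (job = @queue.pop)
          job.call
        end
      end
    end
  end

  # Called from the request path: returns immediately.
  def enqueue(&job)
    @queue << job
  end

  # Signal each worker to stop, then wait for queued jobs to drain.
  def shutdown
    @workers.size.times { @queue << nil }
    @workers.each(&:join)
  end
end

sent = Queue.new # thread-safe collector for the example
jobs = JobQueue.new(2)
3.times { |i| jobs.enqueue { sent << "email-#{i}" } }
jobs.shutdown
puts sent.size # => 3
```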

Sidekiq Configuration Example

Run Sidekiq on dedicated Linode instances. The number of instances and the concurrency (threads/processes) within Sidekiq depend on the workload.

# config/initializers/sidekiq.rb

# Example configuration for Sidekiq concurrency
# Adjust based on your worker Linode's CPU cores and memory.
# Sidekiq jobs are often I/O bound, so a process can usually run more
# threads than a Puma worker does. Keep the database pool at least this large.
concurrency = ENV.fetch("SIDEKIQ_CONCURRENCY") { 25 }.to_i

Sidekiq.configure_server do |config|
  config.redis = { url: ENV.fetch("REDIS_URL") { "redis://localhost:6379/0" } }
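  # config.concurrency= requires Sidekiq 7+. On older versions, set it with
  # the -c flag or in config/sidekiq.yml instead.
  config.concurrency = concurrency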
  # Other configurations like Sidekiq Enterprise features, etc.
end

Sidekiq.configure_client do |config|
  config.redis = { url: ENV.fetch("REDIS_URL") { "redis://localhost:6379/0" } }
end

Deployment Strategy:

Deploy Sidekiq processes using a process manager like systemd or foreman.
Ensure Sidekiq workers can connect to your Redis instance.
Monitor Sidekiq queues and worker performance using the Sidekiq Web UI.

Monitoring, Alerting, and Iterative Scaling

Scaling is not a one-time event; it’s an ongoing process. Robust monitoring is essential.

Key Metrics to Monitor

Application Servers: CPU utilization, memory usage, request latency, error rates (5xx, 4xx), Puma worker/thread status.
Load Balancer: Active connections, request rate, backend server health, HAProxy stats.
Database: CPU, memory, disk I/O, active connections, query latency, replication lag.
Cache (Redis/Memcached): Memory usage, hit/miss ratio, CPU, network I/O.
Background Workers: Queue depth, job processing time, worker utilization.

Utilize tools like Prometheus/Grafana, Datadog, New Relic, or Linode’s built-in monitoring. Set up alerts for critical thresholds (e.g., CPU > 80% for 5 minutes, high error rates, replication lag). When metrics indicate saturation, scale horizontally by adding more application servers, database replicas, or worker instances. Remember to test changes in a staging environment before deploying to production.
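
An alert rule such as "CPU > 80% for 5 minutes" boils down to checking that every sample in a trailing window breaches the threshold. The sketch below assumes one sample per minute; the `sustained_breach?` helper and the sample data are hypothetical, and a real deployment would express this in the monitoring tool's own rule language.

```ruby
# Fire only when every sample in the trailing window exceeds the threshold,
# so that brief spikes do not page anyone. Sample spacing is an assumption.
def sustained_breach?(samples, threshold:, window:)
  return false if samples.size < window

  samples.last(window).all? { |value| value > threshold }
end

# One CPU sample per minute (illustrative data).
cpu = [62, 85, 91, 88, 84, 90, 87]
puts sustained_breach?(cpu, threshold: 80, window: 5)          # => true
puts sustained_breach?([95, 40, 95], threshold: 80, window: 3) # => false
```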

Conclusion: A Layered Approach

Handling 50,000+ concurrent requests with Ruby on Linode is achievable through a well-architected, distributed system. It requires meticulous tuning at every layer: intelligent load balancing with HAProxy, optimized application server configurations (Puma), robust database scaling (read replicas, connection pooling), effective caching, and efficient background job processing. Continuous monitoring and iterative adjustments are key to maintaining performance and stability under heavy load.