Scaling Ruby on OVH to Handle 50,000+ Concurrent Requests

Architectural Foundation: Beyond a Single Rails Instance

Achieving 50,000+ concurrent requests with Ruby on Rails on any cloud provider, including OVH, necessitates a fundamental shift from a monolithic, single-instance deployment to a distributed, horizontally scalable architecture. This isn’t about tweaking a few Rails parameters; it’s about designing a system that can gracefully handle load by adding more resources rather than trying to squeeze performance out of a single, overburdened server. Our approach centers on a multi-tiered system: a robust load balancing layer, a fleet of application servers, a dedicated caching layer, and a performant database cluster.

Load Balancing with HAProxy: The Gatekeeper

HAProxy is our chosen weapon for distributing incoming traffic. Its low resource footprint, high performance, and extensive configuration options make it ideal for this task. We’ll configure it to manage health checks and distribute requests across our Rails application servers using a round-robin or least-connections algorithm. For high availability, we deploy HAProxy in an active-passive or active-active cluster, often leveraging Keepalived for floating IP management.
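To make the least-connections choice concrete, here is a minimal Ruby sketch of the selection rule. It is an illustration, not HAProxy’s implementation: `Backend` and `pick_leastconn` are hypothetical names, and real HAProxy also applies server weights and rotates among servers that are tied.

```ruby
# Hypothetical model of a backend server as the load balancer sees it.
Backend = Struct.new(:name, :active_connections, :healthy)

# Pick the healthy backend with the fewest active connections.
# Returns nil when no backend is healthy.
def pick_leastconn(backends)
  backends.select(&:healthy).min_by(&:active_connections)
end

backends = [
  Backend.new("app1", 12, true),
  Backend.new("app2", 3, false), # failed its health check, so it is skipped
  Backend.new("app3", 7, true)
]

chosen = pick_leastconn(backends)
```

Least-connections tends to suit Rails better than plain round-robin when request durations vary widely, because a server stuck on slow requests stops receiving new ones.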

HAProxy Configuration for Rails

Here’s a sample HAProxy configuration snippet. Note the health check (`option httpchk GET /health`) which is crucial for ensuring traffic is only sent to healthy application instances. We also set appropriate timeouts to prevent long-lived connections from hogging resources.

global
    log /dev/log    local0
    log /dev/log    local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5000
    timeout client  50000
    timeout server  50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend http_frontend
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/your_domain.pem
    acl is_static       path_beg       -i /assets
    use_backend static_backend if is_static
    default_backend     rails_backend

backend rails_backend
    balance leastconn
    option httpchk GET /health
    http-request set-header X-Forwarded-Port %[dst_port]
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    # Ports must match the Puma bind address (tcp://0.0.0.0:8080 in config/puma.rb)
    server app1 10.0.0.1:8080 check
    server app2 10.0.0.2:8080 check
    server app3 10.0.0.3:8080 check
    server app4 10.0.0.4:8080 check
    # Add more app servers as needed

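backend static_backend
    # HAProxy proxies requests and cannot serve files from disk itself.
    # Point this at a web server (for example nginx) that serves public/assets,
    # bypassing Rails. 10.0.0.10:80 is a placeholder address.
    server static1 10.0.0.10:80 check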
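The `/health` endpoint that HAProxy probes has to exist in the application. One way to provide it is a small Rack middleware that answers before the request reaches the rest of the stack. The sketch below is an assumption about how you might do this, not part of the original setup: `HealthCheck` is a hypothetical class name, and the lambda stands in for the Rails app.

```ruby
# Minimal Rack middleware that answers the load balancer's health probe.
# HealthCheck is a hypothetical name; "/health" matches the HAProxy config.
class HealthCheck
  def initialize(app, path: "/health")
    @app = app
    @path = path
  end

  def call(env)
    if env["PATH_INFO"] == @path
      [200, { "content-type" => "text/plain" }, ["ok"]]
    else
      @app.call(env) # everything else goes to the wrapped application
    end
  end
end

# Stand-in downstream app; any Rack-compatible callable works here.
downstream = ->(_env) { [404, { "content-type" => "text/plain" }, ["not found"]] }
app = HealthCheck.new(downstream)

status, _headers, body = app.call("PATH_INFO" => "/health")
```

In a Rails app this would typically be registered with `config.middleware.insert_before 0, HealthCheck`. Rails 7.1 and later also generate a built-in `/up` health route, in which case the HAProxy check path could point there instead.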

Application Servers: Puma and Worker Processes

We’re using Puma as our application server. Its multi-process, multi-threaded model is essential: a single Puma worker process cannot come close to this load. We configure Puma to run multiple worker processes, each with multiple threads. The optimal ratio depends heavily on whether the application is I/O-bound or CPU-bound and on the available CPU and memory. Because MRI’s global VM lock lets only one thread per process execute Ruby code at a time, a common starting point is one worker process per CPU core with around 5 threads per worker; raise the thread count only if profiling shows that requests spend most of their time waiting on I/O.
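A rough way to turn worker and thread counts into a fleet size is Little’s Law: requests in flight equal the arrival rate multiplied by the mean latency. The sketch below applies it under stated assumptions; the method names are hypothetical and every number is a made-up example, so substitute measurements from your own load tests.

```ruby
# Little's Law: requests in flight = arrival rate (req/s) * mean latency (s).
# Latency is passed in milliseconds to keep the example arithmetic exact.
def threads_needed(requests_per_second:, mean_latency_ms:)
  (requests_per_second * mean_latency_ms / 1000.0).ceil
end

# Each server handles at most workers * threads requests at the same time.
def servers_needed(threads:, workers_per_server:, threads_per_worker:)
  (threads.to_f / (workers_per_server * threads_per_worker)).ceil
end

in_flight = threads_needed(requests_per_second: 5_000, mean_latency_ms: 50)
servers = servers_needed(threads: in_flight, workers_per_server: 4, threads_per_worker: 5)
```

With these example inputs, 250 requests are in flight on average and 13 servers of 4 workers by 5 threads are needed before adding any headroom. The same arithmetic shows why latency matters as much as server count: halving mean latency halves the threads required.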

Puma Configuration (`config/puma.rb`)

This configuration aims to maximize concurrency while leaving room for the OS and other services. Tune the `workers` count to your server’s CPU cores and memory, and set `threads` to the number of requests each worker process should handle concurrently. `preload_app!` loads the application once in the master process before the workers are forked, which speeds up worker boot and lets the workers share memory through copy-on-write.

# config/puma.rb
workers Integer(ENV.fetch("WEB_CONCURRENCY") { 4 }) # Number of worker processes
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS") { 5 }) # Threads per worker

threads threads_count, threads_count

preload_app!

rackup "config.ru"
environment ENV.fetch("RAILS_ENV") { "production" }

# Allow Puma to be restarted by `rails restart` command.
plugin :tmp_restart

# Configure bind address and port
bind "tcp://0.0.0.0:8080" # Or your desired port

# Logging
stdout_redirect "/var/log/puma.stdout.log", "/var/log/puma.stderr.log", true

# State file for Puma
state_path "/tmp/puma.state"

# PID file
pidfile "/tmp/puma.pid"

# With `preload_app!`, the app is loaded once in the master process and the
# workers are forked from it. Database connections must not be shared across
# forks, so each worker opens its own connection when it boots.
on_worker_boot do
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end

# Clean up when workers exit
on_worker_shutdown do
  ActiveRecord::Base.connection_pool.disconnect! if defined?(ActiveRecord)
end

To manage these Puma workers, we use a process manager like `systemd` or `foreman`. For production, `systemd` is preferred for its robustness and integration with the OS.

Systemd Service File Example

# /etc/systemd/system/your_app.service
[Unit]
Description=YourApp Puma Server
After=network.target

[Service]
Type=simple
User=your_app_user
Group=your_app_group
WorkingDirectory=/path/to/your_app
Environment="RAILS_ENV=production"
Environment="WEB_CONCURRENCY=4"
Environment="RAILS_MAX_THREADS=5"
ExecStart=/usr/local/bin/bundle exec puma -C config/puma.rb
ExecStop=/usr/local/bin/bundle exec pumactl -S /tmp/puma.state stop
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Caching Strategy: Redis for Speed

A significant portion of requests can be served from cache, dramatically reducing the load on application servers and the database. Redis is our go-to for in-memory caching due to its speed, versatility, and features like TTL (Time To Live) for automatic expiry of stale entries. We implement caching at various levels: fragment caching within views, page caching for static-like content (available through the `actionpack-page_caching` gem since Rails 4), and object caching for expensive database queries.
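Object caching usually follows a fetch-or-compute pattern: return the cached value if it is still fresh, otherwise run the expensive query and store the result with a TTL. In Rails this is what `Rails.cache.fetch(key, expires_in: ...) { ... }` does. The sketch below reproduces the pattern in plain Ruby so it runs without Rails or Redis; `TtlCache` is a hypothetical class for illustration only, and it is not thread-safe.

```ruby
# Illustration only: a tiny in-memory cache with the same fetch-or-compute
# shape as Rails.cache.fetch. TtlCache is a hypothetical class, not a Rails API.
class TtlCache
  Entry = Struct.new(:value, :expires_at)

  # The clock is injectable so expiry can be demonstrated without sleeping.
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @store = {}
    @clock = clock
  end

  # Returns the cached value if present and fresh; otherwise runs the block,
  # stores its result for `expires_in` seconds, and returns it.
  def fetch(key, expires_in:)
    entry = @store[key]
    return entry.value if entry && entry.expires_at > @clock.call

    value = yield
    @store[key] = Entry.new(value, @clock.call + expires_in)
    value
  end
end

now = 0.0
cache = TtlCache.new(clock: -> { now })
calls = 0
expensive_query = -> { calls += 1; "report-#{calls}" }

first  = cache.fetch("report", expires_in: 60) { expensive_query.call } # computed
second = cache.fetch("report", expires_in: 60) { expensive_query.call } # served from cache
now = 61.0
third  = cache.fetch("report", expires_in: 60) { expensive_query.call } # expired, recomputed
```

The expensive block runs only twice for three reads. With a real Redis store the TTL is enforced by Redis itself, so expired keys also free memory without application involvement.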

Rails Cache Store Configuration

# config/environments/production.rb
Rails.application.configure do
  # ... other configurations ...

  config.cache_store = :redis_cache_store, {
    url: ENV.fetch("REDIS_URL") { "redis://localhost:6379/0" },
    driver: :hiredis, # Faster C-based driver; needs the hiredis gem (hiredis-client on redis-rb 5+) in the Gemfile
    expires_in: 90.minutes, # Default expiration for cache entries
    # Other Redis options can be passed here
    # e.g., :password => 'your_redis_password'
  }

  # Enable eager loading for production
  config.eager_load = true

  # ... other configurations ...
end

For high availability and performance, Redis should be deployed as a cluster or with Sentinel for failover, ideally on separate instances from the application servers. OVH’s managed Redis services can simplify this, but for maximum control and performance, self-hosting on dedicated instances is often preferred.

Database Scaling: PostgreSQL and Read Replicas

The database is often the ultimate bottleneck. For a read-heavy application, implementing read replicas is paramount. PostgreSQL’s built-in replication capabilities are robust. We configure a primary PostgreSQL instance for writes and multiple replica instances for reads. Application code needs to be aware of this separation, directing write operations to the primary and read operations to the replicas.

Rails Database Configuration with Replicas

# config/database.yml

default: &default
  adapter: postgresql
  encoding: unicode
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %> # Per-process pool; keep it at least as large as the Puma thread count
  host: primary_db_host # Your primary DB host
  username: your_db_user
  password: your_db_password

development:
  <<: *default
  database: your_app_development

test:
  <<: *default
  database: your_app_test

production:
  primary:
    <<: *default
    database: your_app_production
    host: primary_db_host # Explicitly set primary host

  replica:
    <<: *default
    database: your_app_production
    host: replica_db_host_1 # Your first replica host
    replica: true # Marks the connection as a read replica; Rails will not run migrations against it

  replica_2:
    <<: *default
    database: your_app_production
    host: replica_db_host_2 # Your second replica host
    replica: true

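# Rails (6.0+) routes queries by role rather than with a per-query call:
#   class ApplicationRecord < ActiveRecord::Base
#     self.abstract_class = true
#     connects_to database: { writing: :primary, reading: :replica }
#   end
#   ActiveRecord::Base.connected_to(role: :reading) { User.find(1) } # Replica
#   User.find(1) # Primary (the default writing role)
# connects_to maps each role to a single database; to spread reads across
# several replicas, use a gem like 'makara' or a load balancer in front of them.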
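The pool size interacts with the Puma settings: every worker process keeps its own pool, so the connection count seen by PostgreSQL grows with servers, workers, and pool size together. A quick check like the one below helps catch a fleet that would exceed the database limit. The method name is hypothetical and the figures are examples; 100 is PostgreSQL’s default `max_connections`, so check the value on your own server.

```ruby
# Worst-case connections opened against one database by the web tier.
def total_db_connections(app_servers:, workers_per_server:, pool_per_worker:)
  app_servers * workers_per_server * pool_per_worker
end

POSTGRES_MAX_CONNECTIONS = 100 # PostgreSQL default; adjust to your server

total = total_db_connections(app_servers: 4, workers_per_server: 4, pool_per_worker: 5)
fits = total <= POSTGRES_MAX_CONNECTIONS
```

Four servers with four workers and a pool of five need up to 80 connections, which fits the default limit but leaves little room for Sidekiq processes, consoles, and migrations. Those need to be added to the budget as well.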

For automatic load balancing and failover across read replicas, consider gems like `makara`. This abstracts away the complexity of directing reads to available replicas.
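Conceptually, read/write splitting comes down to a routing decision per query. The following sketch shows that decision with round-robin selection across replicas. It is not makara’s or Rails’ implementation: `ReplicaRouter` is a hypothetical class, the host names are the placeholders from the configuration above, and it ignores failover and replication lag.

```ruby
# Hypothetical router: writes go to the primary, reads rotate across replicas.
class ReplicaRouter
  def initialize(primary:, replicas:)
    @primary = primary
    @replicas = replicas
    @next_index = 0
    @lock = Mutex.new # Puma threads may call for_read concurrently
  end

  def for_write
    @primary
  end

  # Falls back to the primary when no replica is configured.
  def for_read
    return @primary if @replicas.empty?

    @lock.synchronize do
      host = @replicas[@next_index % @replicas.size]
      @next_index += 1
      host
    end
  end
end

router = ReplicaRouter.new(
  primary: "primary_db_host",
  replicas: ["replica_db_host_1", "replica_db_host_2"]
)

reads = 3.times.map { router.for_read }
write_target = router.for_write
```

One caveat applies to any such scheme: a read issued right after a write may hit a replica that has not yet received the change, so reads that must see the latest data should still go to the primary.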

Asynchronous Processing: Sidekiq for Background Jobs

Any task that doesn't require an immediate response should be offloaded to a background job processing system. Sidekiq, powered by Redis, is an excellent choice for Ruby. This includes sending emails, processing images, generating reports, and any long-running API calls. By offloading these, we keep our web request cycle short and responsive.

Sidekiq Configuration and Deployment

# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
  # The fallback uses Redis DB 1. If REDIS_URL is set, make sure it does not point at the cache database.
  config.redis = { url: ENV.fetch("REDIS_URL") { "redis://localhost:6379/1" } }
end

Sidekiq.configure_client do |config|
  config.redis = { url: ENV.fetch("REDIS_URL") { "redis://localhost:6379/1" } }
end

Sidekiq workers should be deployed on separate servers or as separate `systemd` services, scaled independently based on the job queue load. Monitoring queue lengths is critical to ensure jobs are processed in a timely manner.
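A simple way to reason about queue length is to estimate how long the current backlog would take to drain at the present capacity. The sketch below does that arithmetic under the assumption that all jobs take about the same time; the method name and the figures are illustrative. In a running system the queue size could come from Sidekiq’s API (`Sidekiq::Queue.new("default").size` after `require "sidekiq/api"`).

```ruby
# Estimated seconds to work through the backlog if no new jobs arrive.
# processes   - number of Sidekiq processes
# concurrency - threads per Sidekiq process
def drain_seconds(queue_size:, processes:, concurrency:, mean_job_seconds:)
  jobs_per_second = (processes * concurrency) / mean_job_seconds.to_f
  queue_size / jobs_per_second
end

backlog = drain_seconds(
  queue_size: 12_000,
  processes: 2,
  concurrency: 10,
  mean_job_seconds: 0.5
)
```

With these example numbers the backlog takes 300 seconds to clear. If that exceeds what the jobs can tolerate (for instance, password reset emails), it is a signal to add Sidekiq processes or move slow jobs to their own queue.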

Monitoring and Alerting: The Eyes of the System

To maintain performance and stability at scale, comprehensive monitoring is non-negotiable. We deploy tools like Prometheus for metrics collection, Grafana for visualization, and Alertmanager for notifications. Key metrics to track include:

HAProxy: Request rates, error rates (5xx, 4xx), connection counts, backend health.
Puma: Worker/thread utilization, request latency, memory usage.
Redis: Memory usage, hit/miss ratio, command latency, connections.
PostgreSQL: Query performance, connection counts, CPU/memory usage, replication lag.
Sidekiq: Queue lengths, job processing times, worker availability.
Application-level metrics: Custom metrics for critical business operations.

Alerting should be configured for critical thresholds (e.g., high error rates, long queue lengths, database replication lag exceeding acceptable limits) to proactively address issues before they impact users.
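The threshold logic itself is simple, and writing it out can help when agreeing on limits with the team. In the stack described here, the rules would live in Prometheus and Alertmanager configuration rather than in application code, so the Ruby below is only a sketch of the comparison. The metric names and limits are invented for illustration.

```ruby
# Hypothetical limits; choose values from your own baseline measurements.
THRESHOLDS = {
  error_rate_5xx: 0.01,          # fraction of responses
  sidekiq_queue_length: 10_000,  # jobs waiting
  replication_lag_seconds: 30
}.freeze

# Returns the names of all metrics that exceed their threshold.
# Metrics missing from the sample are treated as 0 (not breached).
def breached(metrics, thresholds = THRESHOLDS)
  thresholds.select { |name, limit| metrics.fetch(name, 0) > limit }.keys
end

sample = {
  error_rate_5xx: 0.002,
  sidekiq_queue_length: 25_000,
  replication_lag_seconds: 45
}

alerts = breached(sample)
```

For this sample, the queue length and replication lag alerts fire while the error rate stays quiet.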

OVH Specific Considerations

When deploying on OVH, leverage their dedicated servers for compute-intensive tasks (application servers, HAProxy) and their managed database services (if suitable for your SLA) or deploy your own PostgreSQL/Redis clusters on dedicated instances. Ensure your network configuration within OVH allows for efficient communication between these tiers. Utilize OVH's load balancing services if you prefer a managed solution, though HAProxy offers more granular control. For persistent storage, consider OVH's block storage solutions for your application servers.

Conclusion: Iterative Scaling

Scaling to 50,000+ concurrent requests is an ongoing process. The architecture described provides a robust foundation. Continuous performance testing, profiling, and monitoring are essential to identify bottlenecks and iteratively refine the configuration. Each component—load balancer, application server, cache, and database—must be scaled and optimized in tandem. This distributed, multi-tiered approach, combined with smart caching and asynchronous processing, is the key to handling such high loads effectively on OVH or any other cloud infrastructure.