Scaling Ruby on OVH to Handle 50,000+ Concurrent Requests
Architectural Foundation: Beyond a Single Rails Instance
Achieving 50,000+ concurrent requests with Ruby on Rails on any cloud provider, including OVH, necessitates a fundamental shift from a monolithic, single-instance deployment to a distributed, horizontally scalable architecture. This isn’t about tweaking a few Rails parameters; it’s about designing a system that can gracefully handle load by adding more resources rather than trying to squeeze performance out of a single, overburdened server. Our approach centers on a multi-tiered system: a robust load balancing layer, a fleet of application servers, a dedicated caching layer, and a performant database cluster.
Load Balancing with HAProxy: The Gatekeeper
HAProxy is our chosen weapon for distributing incoming traffic. Its low resource footprint, high performance, and extensive configuration options make it ideal for this task. We’ll configure it to manage health checks and distribute requests across our Rails application servers using a round-robin or least-connections algorithm. For high availability, we deploy HAProxy in an active-passive or active-active cluster, often leveraging Keepalived for floating IP management.
HAProxy Configuration for Rails
Here’s a sample HAProxy configuration snippet. Note the health check (`option httpchk GET /health`) which is crucial for ensuring traffic is only sent to healthy application instances. We also set appropriate timeouts to prevent long-lived connections from hogging resources.
global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
stats timeout 30s
user haproxy
group haproxy
daemon
defaults
log global
mode http
option httplog
option dontlognull
timeout connect 5000
timeout client 50000
timeout server 50000
errorfile 400 /etc/haproxy/errors/400.http
errorfile 403 /etc/haproxy/errors/403.http
errorfile 408 /etc/haproxy/errors/408.http
errorfile 500 /etc/haproxy/errors/500.http
errorfile 502 /etc/haproxy/errors/502.http
errorfile 503 /etc/haproxy/errors/503.http
errorfile 504 /etc/haproxy/errors/504.http
frontend http_frontend
bind *:80
bind *:443 ssl crt /etc/ssl/certs/your_domain.pem
acl is_static path_beg -i /assets
use_backend static_backend if is_static
default_backend rails_backend
backend rails_backend
balance leastconn
option httpchk GET /health
http-request set-header X-Forwarded-Port %[dst_port]
http-request add-header X-Forwarded-Proto https if { ssl_fc }
# Port 8080 matches the `bind` line in config/puma.rb
server app1 10.0.0.1:8080 check
server app2 10.0.0.2:8080 check
server app3 10.0.0.3:8080 check
server app4 10.0.0.4:8080 check
# Add more app servers as needed
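backend static_backend
# HAProxy cannot serve files from disk, so a `server` line needs an address, not a path.
# Point this backend at a web server (for example nginx) or object store that serves
# /var/www/your_app/public/assets, bypassing Rails. The address below is a placeholder.
server static1 10.0.0.10:80 check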
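The `option httpchk GET /health` check only works if each application server actually answers on `/health`. A minimal sketch of one way to do that is a small Rack middleware that replies before the request reaches the Rails router. The class name `HealthCheck` and the stand-in downstream app are assumptions for illustration, not part of the original setup.

```ruby
# Minimal Rack middleware that answers the load balancer's health probe.
# HEALTH_PATH mirrors the `option httpchk GET /health` line; HealthCheck is a hypothetical name.
class HealthCheck
  HEALTH_PATH = "/health"

  def initialize(app)
    @app = app
  end

  def call(env)
    if env["PATH_INFO"] == HEALTH_PATH
      # Reply without touching the Rails router, sessions, or the database.
      [200, { "content-type" => "text/plain" }, ["ok"]]
    else
      @app.call(env)
    end
  end
end

# Stand-in for the Rails app so the sketch runs on its own.
downstream = ->(_env) { [404, { "content-type" => "text/plain" }, ["not found"]] }
stack = HealthCheck.new(downstream)

status, _headers, body = stack.call("REQUEST_METHOD" => "GET", "PATH_INFO" => "/health")
puts "#{status} #{body.first}"
```

In a Rails app this could be registered with `config.middleware.insert_before 0, HealthCheck`. Note that this is a liveness check only: it reports that the Puma process is up, not that the database or Redis is reachable.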
Application Servers: Puma and Worker Processes
We use Puma as our application server; its multi-process, multi-threaded model is essential here. A single Puma worker process cannot serve this load, so we run several workers, each with several threads. The right ratio depends on whether the application is I/O-bound or CPU-bound and on the CPU and memory available. Because MRI’s global VM lock lets only one thread per process run Ruby code at a time, a common starting point is about one worker per CPU core with around 5 threads per worker, raising the thread count only if requests spend most of their time waiting on I/O.
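To put the worker and thread counts in perspective, here is a rough capacity sketch. It assumes each Puma thread processes one request at a time and uses Little's Law (requests in flight = arrival rate × average latency). The server count, request rate, and latency are illustrative assumptions, and the helper names are hypothetical.

```ruby
# Back-of-the-envelope capacity estimate. All inputs are illustrative assumptions.

# Requests the fleet can process at the same instant: one per Puma thread.
def puma_slots(servers:, workers:, threads:)
  servers * workers * threads
end

# Little's Law: requests in flight = arrival rate (req/s) * average latency (s).
def in_flight(requests_per_second:, avg_latency_seconds:)
  requests_per_second * avg_latency_seconds
end

# Servers required if every in-flight request needed its own Puma thread.
def servers_needed(target_in_flight:, workers:, threads:)
  (target_in_flight.to_f / (workers * threads)).ceil
end

slots = puma_slots(servers: 4, workers: 4, threads: 5)
puts "4 servers x 4 workers x 5 threads = #{slots} requests processed at once"

needed = servers_needed(target_in_flight: 50_000, workers: 4, threads: 5)
puts "Servers for 50,000 requests processed simultaneously: #{needed}"

load = in_flight(requests_per_second: 10_000, avg_latency_seconds: 0.05)
puts "In flight at 10,000 req/s with 50 ms latency: #{load.round}"
```

The gap between the first two numbers is the point: with 4 workers and 5 threads per server, the headline figure is only reachable if most of those "concurrent requests" are connections waiting in HAProxy or Puma queues, or if average latency is kept very low through caching and background jobs.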
Puma Configuration (`config/puma.rb`)
This configuration aims to maximize concurrency while leaving room for the OS and other services. Tune the `workers` count to your server’s CPU cores and memory, and set `threads` to the number of requests each worker should handle concurrently. `preload_app!` loads the application once in the master process before the workers are forked, which speeds up worker boot and lets workers share memory through copy-on-write.
# config/puma.rb
workers Integer(ENV.fetch("WEB_CONCURRENCY") { 4 }) # Number of worker processes
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS") { 5 }) # Threads per worker
threads threads_count, threads_count
preload_app!
rackup "config.ru"
environment ENV.fetch("RAILS_ENV") { "production" }
# Allow Puma to be restarted by `rails restart` command.
plugin :tmp_restart
# Configure bind address and port
bind "tcp://0.0.0.0:8080" # Or your desired port
# Logging
stdout_redirect "/var/log/puma.stdout.log", "/var/log/puma.stderr.log", true
# State file for Puma
state_path "/tmp/puma.state"
# PID file
pidfile "/tmp/puma.pid"
# With `preload_app!`, the app is loaded once in the master process and the
# workers are forked from it. Connections opened before the fork must not be
# shared between processes, so each worker opens its own connection on boot.
on_worker_boot do
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end
# Release the worker's database connections when it exits
on_worker_shutdown do
  ActiveRecord::Base.connection_pool.disconnect! if defined?(ActiveRecord)
end
To manage these Puma workers, we use a process manager like `systemd` or `foreman`. For production, `systemd` is preferred for its robustness and integration with the OS.
Systemd Service File Example
# /etc/systemd/system/your_app.service
[Unit]
Description=YourApp Puma Server
After=network.target
[Service]
Type=simple
User=your_app_user
Group=your_app_group
WorkingDirectory=/path/to/your_app
Environment="RAILS_ENV=production"
Environment="WEB_CONCURRENCY=4"
Environment="RAILS_MAX_THREADS=5"
ExecStart=/usr/local/bin/bundle exec puma -C config/puma.rb
# `pumactl` is Puma's control command; the `puma` binary itself has no `stop` subcommand
ExecStop=/usr/local/bin/bundle exec pumactl --pidfile /tmp/puma.pid stop
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Caching Strategy: Redis for Speed
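A significant portion of requests can be served from cache, dramatically reducing the load on application servers and the database. Redis is our go-to for in-memory caching due to its speed, versatility, and per-key TTL (Time To Live), which expires stale entries automatically. We cache at several levels: fragment caching within views, low-level caching of expensive query results with `Rails.cache.fetch`, and full-page caching for static-like content. Note that page caching was extracted from Rails core in 4.0 into the `actionpack-page_caching` gem and writes HTML files to disk rather than to Redis, so at this scale it is often handled by a CDN or reverse proxy instead.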
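The pattern behind caching expensive queries is "fetch or compute, with an expiry". The sketch below shows those semantics with a tiny in-memory stand-in so that it runs without Rails or Redis. `TinyCache` and its injectable clock are hypothetical; in the application the equivalent call would be `Rails.cache.fetch(key, expires_in: ...) { ... }`.

```ruby
# A tiny in-memory stand-in for Rails.cache that shows fetch-with-expiry semantics.
# TinyCache is hypothetical and exists only so the example runs on its own.
class TinyCache
  Entry = Struct.new(:value, :expires_at)

  # The clock is injectable so expiry can be demonstrated without sleeping.
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @entries = {}
    @lock = Mutex.new
  end

  # Returns the cached value if it is still fresh; otherwise runs the block,
  # stores the result with a new expiry time, and returns it.
  def fetch(key, expires_in:)
    @lock.synchronize do
      entry = @entries[key]
      return entry.value if entry && entry.expires_at > @clock.call

      value = yield
      @entries[key] = Entry.new(value, @clock.call + expires_in)
      value
    end
  end
end

now = 0.0
cache = TinyCache.new(clock: -> { now })
calls = 0
expensive_query = -> { calls += 1; "report-#{calls}" }

first  = cache.fetch("report", expires_in: 60) { expensive_query.call }
second = cache.fetch("report", expires_in: 60) { expensive_query.call } # served from cache
now = 61.0                                                              # entry has expired
third  = cache.fetch("report", expires_in: 60) { expensive_query.call } # recomputed

puts [first, second, third].inspect
puts "expensive query ran #{calls} times"
```

The single lock keeps the sketch short but serializes every key; a real cache store such as Redis does not have that limitation.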
Rails Cache Store Configuration
# config/environments/production.rb
Rails.application.configure do
  # ... other configurations ...
  config.cache_store = :redis_cache_store, {
    url: ENV.fetch("REDIS_URL") { "redis://localhost:6379/0" },
    driver: :hiredis, # Faster C-based driver; requires the matching hiredis gem in the Gemfile
    expires_in: 90.minutes, # Default expiration for cache entries
    # Other Redis options can be passed here
    # e.g., password: 'your_redis_password'
  }
  # Enable eager loading for production
  config.eager_load = true
  # ... other configurations ...
end
For high availability and performance, Redis should be deployed as a cluster or with Sentinel for failover, ideally on separate instances from the application servers. OVH’s managed Redis services can simplify this, but for maximum control and performance, self-hosting on dedicated instances is often preferred.
Database Scaling: PostgreSQL and Read Replicas
The database is often the ultimate bottleneck. For a read-heavy application, implementing read replicas is paramount. PostgreSQL’s built-in replication capabilities are robust. We configure a primary PostgreSQL instance for writes and multiple replica instances for reads. Application code needs to be aware of this separation, directing write operations to the primary and read operations to the replicas.
Rails Database Configuration with Replicas
# config/database.yml
default: &default
  adapter: postgresql
  encoding: unicode
  pool: 5 # Adjust pool size based on Puma threads and worker count
  host: primary_db_host # Your primary DB host
  username: your_db_user
  password: your_db_password
development:
  <<: *default
  database: your_app_development
test:
  <<: *default
  database: your_app_test
production:
  primary:
    <<: *default
    database: your_app_production
    host: primary_db_host # Explicitly set primary host
  replica:
    <<: *default
    database: your_app_production
    host: replica_db_host_1 # Your first replica host
    replica: true # Marks this connection as a read replica (Rails 6.0+)
  replica_2:
    <<: *default
    database: your_app_production
    host: replica_db_host_2 # Your second replica host
    replica: true
# Using a replica from Rails 6.0+ (declared in ApplicationRecord, used in application code):
#   connects_to database: { writing: :primary, reading: :replica }
#   ActiveRecord::Base.connected_to(role: :reading) { User.find(1) } # Runs on the replica
#   User.find(1) # Outside such a block, queries go to the primary (writing role)
# A role maps to one connection here; spreading reads across replica and replica_2
# needs a TCP load balancer in front of the replicas or a gem such as 'makara'.
For automatic load balancing and failover across read replicas, consider gems like `makara`. This abstracts away the complexity of directing reads to available replicas.
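The `pool` setting in `database.yml` interacts with the Puma settings, because every worker process keeps its own connection pool for each configured database. A rough connection budget, using the sample values from this article (4 app servers, 4 workers, pool of 5) and hypothetical helper names, might look like this:

```ruby
# Connection budget sketch: each Puma worker process holds its own ActiveRecord pool,
# so the worst case per database host is servers * workers * pool.
def max_db_connections(servers:, workers:, pool:)
  servers * workers * pool
end

# PostgreSQL's shipped default for max_connections; your server may be configured differently.
POSTGRES_DEFAULT_MAX_CONNECTIONS = 100

small_fleet = max_db_connections(servers: 4, workers: 4, pool: 5)
large_fleet = max_db_connections(servers: 40, workers: 4, pool: 5) # illustrative larger fleet

[small_fleet, large_fleet].each do |needed|
  fits = needed <= POSTGRES_DEFAULT_MAX_CONNECTIONS
  puts "Up to #{needed} connections per database host; fits the default limit? #{fits}"
end
```

Once the fleet outgrows the database's connection limit, raising `max_connections` or putting a connection pooler such as PgBouncer in front of PostgreSQL are the usual options; which one fits depends on your memory budget and workload.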
Asynchronous Processing: Sidekiq for Background Jobs
Any task that doesn't require an immediate response should be offloaded to a background job processing system. Sidekiq, powered by Redis, is an excellent choice for Ruby. This includes sending emails, processing images, generating reports, and any long-running API calls. By offloading these, we keep our web request cycle short and responsive.
Sidekiq Configuration and Deployment
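# config/initializers/sidekiq.rb
# Note: the Rails cache store also reads REDIS_URL. When that variable is set, Sidekiq
# and the cache share one Redis database; the /1 fallback applies only when it is unset.
# Point Sidekiq at its own Redis URL if jobs must not compete with cache eviction.
Sidekiq.configure_server do |config|
  config.redis = { url: ENV.fetch("REDIS_URL") { "redis://localhost:6379/1" } }
end
Sidekiq.configure_client do |config|
  config.redis = { url: ENV.fetch("REDIS_URL") { "redis://localhost:6379/1" } }
end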
Sidekiq workers should be deployed on separate servers or as separate `systemd` services, scaled independently based on the job queue load. Monitoring queue lengths is critical to ensure jobs are processed in a timely manner.
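Queue length is easier to act on when it is converted into time. The sketch below estimates how long a backlog takes to drain for a given number of Sidekiq processes and threads per process (Sidekiq's `concurrency` setting). The function name and every number are illustrative assumptions.

```ruby
# Estimate how long a Sidekiq backlog takes to drain. All numbers are illustrative.
def drain_seconds(backlog:, arrival_per_sec:, processes:, concurrency:, avg_job_seconds:)
  # Jobs the fleet can finish per second if every thread stays busy.
  capacity_per_sec = (processes * concurrency) / avg_job_seconds.to_f
  surplus = capacity_per_sec - arrival_per_sec
  return Float::INFINITY if surplus <= 0 # the queue never shrinks

  backlog / surplus
end

healthy = drain_seconds(backlog: 12_000, arrival_per_sec: 50,
                        processes: 2, concurrency: 10, avg_job_seconds: 0.25)
puts "Backlog drains in about #{healthy.round} seconds"

overloaded = drain_seconds(backlog: 12_000, arrival_per_sec: 50,
                           processes: 1, concurrency: 10, avg_job_seconds: 0.25)
puts "With one process the backlog drains? #{overloaded.finite?}"
```

An infinite result means jobs arrive faster than they are processed, which is the signal to add Sidekiq processes rather than wait.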
Monitoring and Alerting: The Eyes of the System
To maintain performance and stability at scale, comprehensive monitoring is non-negotiable. We deploy tools like Prometheus for metrics collection, Grafana for visualization, and Alertmanager for notifications. Key metrics to track include:
- HAProxy: Request rates, error rates (5xx, 4xx), connection counts, backend health.
- Puma: Worker/thread utilization, request latency, memory usage.
- Redis: Memory usage, hit/miss ratio, command latency, connections.
- PostgreSQL: Query performance, connection counts, CPU/memory usage, replication lag.
- Sidekiq: Queue lengths, job processing times, worker availability.
- Application-level metrics: Custom metrics for critical business operations.
Alerting should be configured for critical thresholds (e.g., high error rates, long queue lengths, database replication lag exceeding acceptable limits) to proactively address issues before they impact users.
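The logic behind threshold alerts is simple enough to sketch. In practice these rules would live in your Prometheus and Alertmanager configuration; the Ruby below only illustrates the comparison, and the metric names and limits are hypothetical values to be replaced with your own.

```ruby
# Hypothetical thresholds; tune them to your own service-level objectives.
THRESHOLDS = {
  error_rate: 0.01,             # fraction of responses that are 5xx
  sidekiq_queue_length: 10_000, # jobs waiting across queues
  replication_lag_seconds: 5    # replica delay behind the primary
}.freeze

# Returns the names of all metrics whose current value exceeds its limit.
def breached(metrics, thresholds = THRESHOLDS)
  thresholds.select { |name, limit| metrics.fetch(name) > limit }.keys
end

sample = { error_rate: 0.03, sidekiq_queue_length: 1_200, replication_lag_seconds: 12 }
puts "Alerts firing: #{breached(sample).inspect}"
```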
OVH Specific Considerations
When deploying on OVH, leverage their dedicated servers for compute-intensive tasks (application servers, HAProxy) and their managed database services (if suitable for your SLA) or deploy your own PostgreSQL/Redis clusters on dedicated instances. Ensure your network configuration within OVH allows for efficient communication between these tiers. Utilize OVH's load balancing services if you prefer a managed solution, though HAProxy offers more granular control. For persistent storage, consider OVH's block storage solutions for your application servers.
Conclusion: Iterative Scaling
Scaling to 50,000+ concurrent requests is an ongoing process. The architecture described provides a robust foundation. Continuous performance testing, profiling, and monitoring are essential to identify bottlenecks and iteratively refine the configuration. Each component—load balancer, application server, cache, and database—must be scaled and optimized in tandem. This distributed, multi-tiered approach, combined with smart caching and asynchronous processing, is the key to handling such high loads effectively on OVH or any other cloud infrastructure.