Scaling Ruby on DigitalOcean to Handle 50,000+ Concurrent Requests

Architectural Foundation: Beyond Single-Instance Rails

Scaling a Ruby on Rails application to handle 50,000+ concurrent requests on DigitalOcean necessitates moving beyond a single-instance deployment: the same Rails codebase runs on many Droplets, with supporting services split out around it. The core principle is to distribute load, decouple services, and leverage asynchronous processing. This involves a multi-pronged approach focusing on application server scaling, database optimization, caching strategies, and asynchronous task management.

Application Server Scaling with Puma and Load Balancing

For Rails applications, Puma is the de facto standard application server and has been the default since Rails 5. To achieve high concurrency, we run multiple Puma workers, each with multiple threads. The optimal configuration depends heavily on the server's CPU cores and available RAM. The numbers below are a common starting point for a DigitalOcean Droplet with 8 vCPUs and 32GB RAM.

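By convention, the WEB_CONCURRENCY environment variable sets the number of Puma workers, and each worker is a separate Ruby process. RAILS_MIN_THREADS and RAILS_MAX_THREADS set the minimum and maximum number of threads inside each worker. Memory is mostly a per-worker cost, since every worker is a full Ruby process, so choose WEB_CONCURRENCY such that the number of workers times each worker's memory footprint fits comfortably in RAM. A common heuristic is one worker per vCPU, or (number of vCPUs) - 1 to leave a core for the OS and other processes. Threads add comparatively little memory, but because of Ruby's Global VM Lock (GVL) only one thread per worker runs Ruby code at a time, so extra threads help mainly while requests wait on I/O such as database queries, Redis, or external HTTP calls. For 8 vCPUs, we might start with 7 workers and 15 threads per worker and adjust after load testing; CPU-bound applications usually do better with fewer threads. With this configuration a single Droplet processes at most 7 * 15 = 105 requests at a time, which is why horizontal scaling and the caching layers below are essential for 50,000+ concurrent requests.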

To manage multiple Puma instances and distribute traffic, a load balancer is essential. DigitalOcean’s Load Balancers are a managed service that simplifies this. We’ll configure them to distribute incoming HTTP/S traffic across multiple Droplets running our Rails application.

Puma Configuration Example

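This configuration is typically set via environment variables. If you're using a tool like foreman or systemd for process management, set these variables in your process manager's configuration file. Puma only picks them up if config/puma.rb reads them; the file Rails generates differs between versions, but it reads RAILS_MAX_THREADS for the thread count and PORT for the listening port.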

# Example for systemd service file (excerpt)
[Service]
Environment="RAILS_ENV=production"
Environment="WEB_CONCURRENCY=7"
Environment="RAILS_MIN_THREADS=1"
Environment="RAILS_MAX_THREADS=15"
# The config/puma.rb that Rails generates reads PORT
Environment="PORT=3000"
# ... other service configurations (User, WorkingDirectory, ExecStart, Restart)
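A minimal config/puma.rb that consumes the variables above might look like this. It assumes Puma 5 or later, and the fallback defaults are illustrative, not recommendations:

```ruby
# config/puma.rb (a sketch; fallback values are illustrative)
max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
min_threads = Integer(ENV.fetch("RAILS_MIN_THREADS", max_threads))
threads min_threads, max_threads

# Number of forked worker processes
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))

port ENV.fetch("PORT", 3000)
environment ENV.fetch("RAILS_ENV", "production")

# Boot the app once in the master process, then fork the workers
preload_app!
```

preload_app! boots the application once in the master process before forking, so workers share unchanged memory pages through copy-on-write. The trade-off is that Puma's phased restarts are unavailable with it.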

DigitalOcean Load Balancer Setup

When setting up a DigitalOcean Load Balancer, you’ll typically:

Create a new Load Balancer in the DigitalOcean control panel.
Configure forwarding rules:

Protocol: HTTP, Port: 80, forwarding to HTTP port 3000 on the Droplets
Protocol: HTTPS, Port: 443, forwarding to HTTP port 3000 (if using SSL termination at the LB)

Add your application Droplets as backend servers. Ensure they are listening on the port specified in your Puma configuration (e.g., 3000).
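Configure health checks to ensure traffic is only sent to healthy application instances. A common health check endpoint is /health, which should return 200 OK; Rails 7.1 and later generate a /up route, served by Rails::HealthController, that can be used for this.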
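A health endpoint can also be answered before a request reaches the Rails router. Below is a minimal sketch written as plain Rack middleware, so no gems are required. HealthCheck and its check: option are hypothetical names. In a real app the check might run a cheap query such as ActiveRecord::Base.connection.execute("SELECT 1").

```ruby
# Minimal Rack middleware that answers the health path before the app sees it.
# HealthCheck and the `check:` callable are hypothetical names for illustration.
class HealthCheck
  def initialize(app, path: "/health", check: -> { true })
    @app = app
    @path = path
    @check = check
  end

  def call(env)
    # Every other path goes straight to the wrapped application
    return @app.call(env) unless env["PATH_INFO"] == @path

    healthy =
      begin
        @check.call
      rescue StandardError
        false # a dependency that raises (e.g. database down) counts as unhealthy
      end

    # A 503 fails the load balancer's health check, so after enough failures
    # it stops routing traffic to this Droplet
    status = healthy ? 200 : 503
    [status, { "content-type" => "text/plain" }, [healthy ? "OK" : "UNAVAILABLE"]]
  end
end
```

In a Rails app it could be registered at the top of the middleware stack with config.middleware.insert_before 0, HealthCheck. Whether the check should touch the database is a design choice. A deep check removes Droplets that have lost database access. However, a database outage then marks every Droplet unhealthy at once, so many teams keep the load balancer's check shallow.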

Database Scaling and Optimization

The database is often the bottleneck. For 50,000+ concurrent requests, a single PostgreSQL or MySQL instance will likely struggle. We need to consider read replicas, connection pooling, and query optimization.

Read Replicas

DigitalOcean Managed Databases make it easy to add read replicas. Configure your Rails application to send read-heavy queries to the replica(s) and writes to the primary. Rails 6 and later support this natively through multiple-database configuration; gems like makara provide a proxy-based alternative.

# config/database.yml (Rails 6+ multiple-database configuration)
production:
  primary:
    adapter: postgresql
    database: myapp_production
    host: primary_db_host
    username: <%= ENV['DB_USERNAME'] %>
    password: <%= ENV['DB_PASSWORD'] %>
  replica:
    adapter: postgresql
    database: myapp_production
    host: replica_db_host
    username: <%= ENV['DB_USERNAME'] %>
    password: <%= ENV['DB_PASSWORD'] %>
    replica: true # Marks this as a read-only replica; Rails won't run migrations against it

# app/models/application_record.rb
# Map connection roles to the database.yml entries above
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
  connects_to database: { writing: :primary, reading: :replica }
end

# Run a block of read-only queries against the replica
# (User and its active column are placeholders)
ActiveRecord::Base.connected_to(role: :reading) do
  User.where(active: true).to_a
end
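Rails can also pick the role automatically for each request. The settings below apply to Rails 6 and later; whether they live in production.rb or an initializer depends on your version. They assume ApplicationRecord declares writing and reading roles with connects_to. With these settings, GET and HEAD requests read from the replica unless the same session wrote to the primary within the delay window. This keeps replication lag from showing users stale data right after they save something.

```ruby
# Automatic role switching (Rails 6+): reads go to the replica except
# within 2 seconds of a write made by the same session
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver = ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_resolver_context = ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
```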

Connection Pooling

Each Puma worker is a separate process with its own Active Record connection pool, so the pool setting in database.yml is a per-process limit. It should be at least RAILS_MAX_THREADS so that every thread can check out a connection. Setting it to WEB_CONCURRENCY * RAILS_MAX_THREADS over-provisions every process and invites an excessive number of connections to the database. What the database must support is the total across all processes. That is 7 workers * 15 threads = 105 connections per application Droplet, multiplied by the number of Droplets behind the load balancer, plus overhead for other services.

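# config/database.yml
production:
  # Per-process pool: follow the thread count of the process loading this file
  # (RAILS_MAX_THREADS=15 for Puma; give Sidekiq processes their own value, e.g. 25).
  # With the primary/replica layout, set pool under each entry.
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  # ... other settings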

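If you run PostgreSQL yourself, raise max_connections in postgresql.conf to cover that total. For example, one application Droplet (105 connections) plus background job workers might call for max_connections = 150. Every PostgreSQL connection is served by its own backend process, so make sure the database server has enough RAM. DigitalOcean Managed Databases instead tie the connection limit to the node size. They also offer connection pools, backed by PgBouncer, that let many application connections share fewer server connections. This becomes important once you run several application Droplets.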
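This arithmetic is easy to get wrong as Droplets are added, so here is a back-of-the-envelope helper in plain Ruby. required_db_connections is a hypothetical name, and every input is an assumption you supply. Compare the result, plus headroom for consoles, migrations, and monitoring, against max_connections.

```ruby
# Back-of-the-envelope estimate of peak database connections.
# required_db_connections is a hypothetical helper; all inputs are assumptions.
def required_db_connections(web_droplets:, workers_per_droplet:, threads_per_worker:,
                            sidekiq_processes: 0, sidekiq_concurrency: 0)
  # Every Puma thread can hold one connection from its worker's pool
  web = web_droplets * workers_per_droplet * threads_per_worker
  # Every Sidekiq thread can hold one connection as well
  jobs = sidekiq_processes * sidekiq_concurrency
  web + jobs
end

# One application Droplet (7 workers * 15 threads) plus one Sidekiq process (25 threads)
puts required_db_connections(web_droplets: 1, workers_per_droplet: 7,
                             threads_per_worker: 15,
                             sidekiq_processes: 1, sidekiq_concurrency: 25) # → 130

# The same setup on three application Droplets
puts required_db_connections(web_droplets: 3, workers_per_droplet: 7,
                             threads_per_worker: 15,
                             sidekiq_processes: 1, sidekiq_concurrency: 25) # → 340
```

The first figure fits under max_connections = 150. The second does not, which is the point where a connection pooler or a larger database plan becomes necessary.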

Query Optimization

Regularly analyze slow queries using pg_stat_statements (for PostgreSQL) or the MySQL slow query log. Make sure columns that are frequently filtered, joined, or sorted on are indexed. rails-pg-extras helps surface slow queries, unused indexes, and heavy sequential scans, while bullet flags N+1 queries during development.

Caching Strategies

Aggressive caching is non-negotiable. Implement multiple layers of caching:

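Russian Doll Caching

Full-page caching was extracted from Rails core into the actionpack-page_caching gem in Rails 4, so for most applications fragment caching is the built-in tool. Russian Doll Caching nests fragment caches: each record's fragment is cached separately, and an outer fragment is reused for as long as its key stays the same. Because the key is derived from the record's updated_at, a change to a nested record only expires the outer fragment if it also updates the parent's timestamp, e.g. with belongs_to :post, touch: true in the Comment model.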

# app/views/posts/_post.html.erb
<%= cache post do %>
  <h2><%= post.title %></h2>
  <p><%= post.body %></p>
  <%= render 'comments/comments', post: post %>
<% end %>
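The sketch below shows, in plain Ruby, the key-based expiration that Russian Doll Caching relies on. It also shows what goes wrong when a child record changes without updating its parent's timestamp, which is the job of belongs_to ... touch: true in Rails. FragmentCache, Record, and render_post are hypothetical stand-ins, not Rails classes.

```ruby
# Plain-Ruby sketch of key-based cache expiration (hypothetical classes, not Rails).
class FragmentCache
  attr_reader :renders

  def initialize
    @store = {}
    @renders = 0
  end

  # Return the stored fragment for key, or render it with the block and store it
  def fetch(key)
    return @store[key] if @store.key?(key)

    @renders += 1
    @store[key] = yield
  end
end

# The key embeds updated_at, so any change yields a new key; old entries are
# simply never read again
Record = Struct.new(:table, :id, :updated_at) do
  def cache_key
    "#{table}/#{id}-#{updated_at}"
  end
end

# Outer post fragment wrapping an inner comment fragment
def render_post(cache, post, comment)
  cache.fetch(post.cache_key) do
    inner = cache.fetch(comment.cache_key) { "<li>comment #{comment.id} v#{comment.updated_at}</li>" }
    "<article>post #{post.id}<ul>#{inner}</ul></article>"
  end
end

cache = FragmentCache.new
post = Record.new("posts", 1, 100)
comment = Record.new("comments", 7, 100)

render_post(cache, post, comment)    # both fragments rendered
render_post(cache, post, comment)    # both served from cache
comment.updated_at = 200             # comment edited, post not touched
render_post(cache, post, comment)    # outer hit: the old comment is still shown
post.updated_at = comment.updated_at # what touch: true would do
render_post(cache, post, comment)    # new outer and inner keys: both re-rendered
puts cache.renders                   # → 4
```

After the comment changes, the post's key is unchanged, so the outer fragment is served stale. Only touching the post produces a new outer key and re-renders both levels.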

Fragment Caching

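Cache specific parts of your views that are expensive to render. In the key below, 'v1' is a manual version you can bump to invalidate these fragments. The comments relation contributes a key that changes when comments are added, removed, or updated. Rails also mixes a digest of the template into the key, so template edits expire the fragment on their own.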

<%= cache ['v1', @post, @post.comments.order(:created_at)] do %>
  <h3>Comments</h3>
  <%= render @post.comments %>
<% end %>

HTTP Caching (ETags, Last-Modified)

Leverage HTTP caching headers to allow browsers and intermediate proxies to cache responses. This reduces load on your application servers.

# app/controllers/posts_controller.rb
class PostsController < ApplicationController
  def show
    @post = Post.find(params[:id])
    fresh_when(last_modified: @post.updated_at, public: true)
  end
end
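Under the hood, fresh_when compares the response's validators with the request's conditional headers. When they match, it answers 304 Not Modified with an empty body. Below is a plain-Ruby sketch of the Last-Modified half of that exchange; Rails also handles ETag and If-None-Match. conditional_get is a hypothetical helper, not a Rails API.

```ruby
require "time"

# Sketch of the Last-Modified / If-Modified-Since exchange.
# conditional_get is hypothetical; it returns a Rack-style [status, headers, body].
def conditional_get(last_modified, if_modified_since)
  headers = {
    "last-modified" => last_modified.httpdate,
    "cache-control" => "public" # allow shared caches (proxies, CDNs) to store it
  }

  if if_modified_since
    begin
      # HTTP dates have one-second resolution, so compare whole seconds
      since = Time.httpdate(if_modified_since)
      return [304, headers, []] if last_modified.to_i <= since.to_i
    rescue ArgumentError
      # An unparseable date is ignored and the full response is sent
    end
  end

  [200, headers, ["full body"]]
end

last_modified = Time.utc(2024, 1, 1, 12, 0, 0)
p conditional_get(last_modified, nil).first                    # → 200
p conditional_get(last_modified, last_modified.httpdate).first # → 304
```

A 304 carries no body, so the application skips rendering and the client reuses its cached copy.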

External Caching Stores (Redis/Memcached)

For shared caching across application instances and for caching complex data structures or query results, use an external caching store like Redis. DigitalOcean Managed Databases can host Redis instances.

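# config/environments/production.rb
# Point Rails.cache at Redis so every application server shares one cache
config.cache_store = :redis_cache_store, { url: ENV["REDIS_URL"] }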

# Example usage in a model or service
def cached_data
  Rails.cache.fetch('my_complex_data', expires_in: 1.hour) do
    # Expensive computation or database query
    fetch_from_database_or_api
  end
end

Asynchronous Task Processing

Any operation that is not critical for the immediate user response should be moved to a background job queue. This includes sending emails, processing images, generating reports, and performing complex calculations.

Sidekiq Configuration

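Sidekiq is a popular and robust background job processing library for Ruby that uses Redis as its backend. You can run Sidekiq processes on separate Droplets or alongside your Rails app, but the latter requires careful resource management. Because Sidekiq keeps jobs in Redis, point it at a Redis instance with the noeviction memory policy, separate from a cache instance that is allowed to evict keys.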

# Example systemd service for Sidekiq
[Unit]
Description=Sidekiq
After=network.target redis-server.service

[Service]
User=deploy
Group=deploy
WorkingDirectory=/home/deploy/myapp
Environment="RAILS_ENV=production"
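# Managed Redis typically requires TLS, i.e. a rediss:// URL
Environment="REDIS_URL=redis://your_redis_host:6379/0"
# Adjust the bundle path to match your Ruby installation (rbenv, asdf, system Ruby)
ExecStart=/usr/local/bin/bundle exec sidekiq -C /home/deploy/myapp/config/sidekiq.yml
Restart=always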

[Install]
WantedBy=multi-user.target

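The sidekiq.yml file configures concurrency, queues, and other settings. For high throughput, you might use several queues with weights. On each fetch, Sidekiq picks which queue to check first in proportion to its weight, so busy queues get more attention without starving the others. Queues listed without weights are instead processed in strict priority order.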

# config/sidekiq.yml
:concurrency: 25 # Threads per Sidekiq process; its database pool needs at least this many connections
:queues:
  - [default, 6]
  - [mailers, 3]
  - [reports, 1]
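The weighted ordering can be sketched in plain Ruby, following the approach of Sidekiq's basic fetcher: repeat each queue by its weight, shuffle, and drop duplicates. weighted_queue_order is a hypothetical helper, not part of Sidekiq's API. The simulation counts which queue ends up first over many fetches.

```ruby
# Sketch of weighted queue ordering: repeat each queue by its weight,
# shuffle, then keep the first occurrence of each queue.
# weighted_queue_order is a hypothetical helper, not Sidekiq's API.
def weighted_queue_order(weights, rng = Random.new)
  expanded = weights.flat_map { |queue, weight| [queue] * weight }
  expanded.shuffle(random: rng).uniq
end

weights = { "default" => 6, "mailers" => 3, "reports" => 1 }
rng = Random.new(42)

# Count which queue is checked first across many simulated fetches
first_counts = Hash.new(0)
10_000.times { first_counts[weighted_queue_order(weights, rng).first] += 1 }
p first_counts # counts land near 6000 / 3000 / 1000
```

With weights of 6, 3, and 1, default comes first in about 60% of fetches. reports still comes first about 10% of the time, which is how weights avoid starvation.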

Monitoring and Alerting

To maintain performance and stability under heavy load, robust monitoring is crucial. Key metrics to track include:

Application Performance Monitoring (APM): Tools like New Relic, Datadog, or Scout APM provide deep insights into request latency, error rates, and transaction traces.
Server Metrics: CPU utilization, memory usage, disk I/O, and network traffic on your Droplets. DigitalOcean’s built-in monitoring is a good start.
Database Metrics: Connection counts, query latency, cache hit rates, and replication lag.
Queue Depth: Monitor the number of jobs waiting in your Sidekiq queues.
Load Balancer Metrics: Request rates, error rates, and backend health.

Set up alerts for critical thresholds (e.g., CPU > 80%, error rate > 5%, queue depth > 1000) to proactively address issues before they impact users.
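The thresholds above can be expressed as data and checked against a metrics snapshot. This is a plain-Ruby sketch; the metric names, the snapshot format, and breached_thresholds are assumptions. In a real app the queue depth could come from Sidekiq's API (Sidekiq::Queue.new("default").size after require "sidekiq/api"), and the other values could come from your monitoring provider.

```ruby
# Example thresholds from the text; metric names are assumptions
THRESHOLDS = {
  cpu_percent: 80,
  error_rate_percent: 5,
  queue_depth: 1_000
}.freeze

# Returns the metrics that exceed their threshold; metrics missing from the
# snapshot are skipped rather than treated as breaches
def breached_thresholds(snapshot, thresholds = THRESHOLDS)
  thresholds.select { |metric, limit| snapshot.key?(metric) && snapshot[metric] > limit }.keys
end

p breached_thresholds({ cpu_percent: 92, error_rate_percent: 1.5, queue_depth: 4_200 })
# → [:cpu_percent, :queue_depth]
```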

CDN and Asset Optimization

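Offload static asset delivery to a Content Delivery Network (CDN). DigitalOcean Spaces (S3-compatible object storage) has a built-in CDN option and can also be placed behind a CDN like Cloudflare or Fastly. rails assets:precompile writes files to public/assets on the build machine, so uploading them to Spaces is a separate deploy step. Serving assets from the edge significantly reduces the load on your application servers.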

# config/environments/production.rb
config.action_controller.asset_host = ENV['ASSET_HOST'] # e.g., "https://cdn.yourdomain.com"
config.public_file_server.enabled = false # Disable serving assets from Rails

Ensure your assets are fingerprinted (e.g., application-abcdef12345.css) to leverage browser caching effectively.
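Fingerprinting works because the file name is derived from the file's contents. A changed file therefore gets a new URL, and old URLs can be cached indefinitely. Below is a plain-Ruby sketch of the idea. fingerprinted_name is hypothetical, and Sprockets and Propshaft each use their own digest algorithm and length.

```ruby
require "digest"

# Sketch of content fingerprinting: the digest changes only when the bytes
# change, so each fingerprinted URL can be cached forever by browsers and CDNs.
# fingerprinted_name is a hypothetical helper.
def fingerprinted_name(filename, content)
  ext = File.extname(filename)
  base = File.basename(filename, ext)
  "#{base}-#{Digest::SHA256.hexdigest(content)}#{ext}"
end

puts fingerprinted_name("application.css", "body { color: #333 }")
```

Deploying a new stylesheet produces a new name, so there is no need to purge the CDN for that file.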

Conclusion: Iterative Scaling

Scaling to 50,000+ concurrent requests is not a one-time task but an ongoing process. Start with these foundational elements: robust application server configuration, a scalable database strategy with read replicas, aggressive caching, and asynchronous task processing. Continuously monitor performance, identify bottlenecks, and iterate on your architecture. Each component – application servers, database, caching layer, and background job system – needs to be scaled independently based on observed load and performance characteristics.