Scaling C++ on OVH to Handle 50,000+ Concurrent Requests

Architectural Foundation: Asynchronous C++ and High-Performance Networking

Achieving 50,000+ concurrent requests with C++ on OVH infrastructure requires a non-blocking architectural foundation. Thread-per-request models become a bottleneck at this scale because every thread carries its own stack and adds context-switching overhead. Our approach uses an event-driven, asynchronous I/O model built on a library such as Boost.Asio (or standalone Asio, the basis of the Networking TS, which was not adopted into C++20), coupled with a carefully tuned operating system and network stack.

The core principle is to maximize CPU utilization by ensuring that threads are always performing useful work rather than waiting for I/O operations to complete. This is achieved by registering I/O events (e.g., socket readability/writability) with an event loop (like epoll on Linux) and having a pool of worker threads that process these events asynchronously.
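As a minimal sketch of the readiness model described above, the following Linux-only example registers descriptors with epoll and reports which of them are readable. The names poll_readable and demo_pipe_becomes_readable are hypothetical. A real server would keep one long-lived epoll instance (Asio does this internally on Linux) rather than creating one per call.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <vector>

// Registers every descriptor in `fds` for readability, waits up to
// `timeout_ms` milliseconds, and returns the descriptors that are ready.
std::vector<int> poll_readable(const std::vector<int>& fds, int timeout_ms)
{
    std::vector<int> ready;
    int ep = epoll_create1(0);
    if (ep < 0)
        return ready;

    for (int fd : fds) {
        epoll_event ev{};
        ev.events = EPOLLIN;   // interested in "data available to read"
        ev.data.fd = fd;       // returned to us when the event fires
        epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev); // errors ignored in this sketch
    }

    // epoll_wait requires room for at least one event
    std::vector<epoll_event> events(fds.empty() ? 1 : fds.size());
    int n = epoll_wait(ep, events.data(), static_cast<int>(events.size()), timeout_ms);
    for (int i = 0; i < n; ++i)
        if (events[i].events & EPOLLIN)
            ready.push_back(events[i].data.fd);

    close(ep);
    return ready;
}

// Demonstration with a pipe: the read end is idle until a byte is written.
bool demo_pipe_becomes_readable()
{
    int p[2];
    if (pipe(p) != 0)
        return false;

    bool idle_before = poll_readable({p[0]}, 0).empty();
    char byte = 'x';
    bool wrote = write(p[1], &byte, 1) == 1;
    std::vector<int> r = poll_readable({p[0]}, 100);
    bool ready_after = r.size() == 1 && r[0] == p[0];

    close(p[0]);
    close(p[1]);
    return idle_before && wrote && ready_after;
}
```

No thread blocks on an individual socket here: the kernel reports which descriptors have work, and only those are handed to a worker.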

Optimizing the C++ Application for Concurrency

The C++ application itself must be designed with concurrency in mind. This involves:

Minimizing Shared Mutable State: Use thread-safe data structures (e.g., std::atomic, concurrent queues) or design components to be largely immutable or partitioned to avoid contention.
Efficient Memory Management: Employ custom allocators or memory pools to reduce the overhead of dynamic memory allocation, especially for frequently created and destroyed objects.
Lock-Free Programming (where applicable): For specific critical sections, explore lock-free algorithms and data structures to eliminate blocking entirely, though this significantly increases complexity.
CPU Affinity and NUMA Awareness: Pinning threads to specific CPU cores and being mindful of Non-Uniform Memory Access (NUMA) can yield substantial performance gains by reducing cache misses and memory latency.
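One way to picture the "partitioned state" idea from the list above is a sharded counter: each worker thread updates its own cache-line-padded slot, and the slots are summed only when a total is needed. This is a sketch under assumptions: sharded_counter and count_with_threads are hypothetical names, a 64-byte cache line is assumed, and C++17 is assumed so that the over-aligned slots are allocated correctly.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// One slot per shard, padded to an assumed 64-byte cache line so that
// neighboring slots do not share a line (avoids false sharing).
struct alignas(64) padded_counter {
    std::atomic<std::uint64_t> value{0};
};

class sharded_counter {
public:
    explicit sharded_counter(std::size_t shards) : slots_(shards ? shards : 1) {}

    // Each thread passes its own shard index, so threads rarely contend.
    void add(std::size_t shard, std::uint64_t n = 1)
    {
        slots_[shard % slots_.size()].value.fetch_add(n, std::memory_order_relaxed);
    }

    // Sums all shards; the result is exact once the writers have finished.
    std::uint64_t total() const
    {
        std::uint64_t sum = 0;
        for (const auto& s : slots_)
            sum += s.value.load(std::memory_order_relaxed);
        return sum;
    }

private:
    std::vector<padded_counter> slots_;
};

// Spawns `thread_count` threads that each increment their own shard.
std::uint64_t count_with_threads(unsigned thread_count, std::uint64_t increments_per_thread)
{
    sharded_counter counter(thread_count);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < thread_count; ++t)
        threads.emplace_back([&counter, t, increments_per_thread] {
            for (std::uint64_t i = 0; i < increments_per_thread; ++i)
                counter.add(t);
        });
    for (auto& th : threads)
        th.join();
    return counter.total();
}
```

The same partitioning approach applies to per-thread memory pools or per-thread connection tables: share only what must be shared.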

Consider a simplified example of an asynchronous HTTP server handler using Boost.Asio:

Example: Asynchronous HTTP Request Handler (Conceptual)

#include <boost/asio.hpp>
#include <boost/beast.hpp>
#include <iostream>
#include <string>
#include <vector>
#include <thread>
#include <memory>

namespace beast = boost::beast;
namespace http = beast::http;
namespace net = boost::asio;
using tcp = net::ip::tcp;

// Represents a single connection
class session : public std::enable_shared_from_this<session>
{
public:
    session(tcp::socket socket)
        : socket_(std::move(socket))
    {
    }

    void start()
    {
        do_read();
    }

private:
    void do_read()
    {
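        // Reset the message before each read: Beast expects an empty request
        // object, and stale fields would otherwise leak into the next request.
        // buffer_ is left alone because it may already hold pipelined data.
        request_ = {};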

        // Read a request at the beginning of the connection
        http::async_read(socket_, buffer_, request_,
            beast::bind_front_handler(
                &session::on_read,
                shared_from_this()));
    }

    void on_read(beast::error_code ec, std::size_t bytes_transferred)
    {
        boost::ignore_unused(bytes_transferred);

        // This means they closed the connection
        if(ec == http::error::end_of_stream)
            return;

        if(ec)
        {
            std::cerr << "Error reading request: " << ec.message() << std::endl;
            return;
        }

        // Process the request and prepare the response
        handle_request();

        // Send the response
        http::async_write(socket_, response_,
            beast::bind_front_handler(
                &session::on_write,
                shared_from_this()));
    }

    void handle_request()
    {
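        // Start from an empty response so headers and body from a previous
        // keep-alive request do not carry over
        response_ = {};
        response_.version(request_.version());
        response_.keep_alive(request_.keep_alive());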

        // Example: Simple echo or static content
        if (request_.method() == http::verb::get) {
            response_.result(http::status::ok);
            response_.set(http::field::server, BOOST_BEAST_VERSION_STRING);
            response_.set(http::field::content_type, "text/plain");
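            // std::string(...) works across Boost versions; string_view::to_string()
            // is not available in newer releases
            response_.body() = "Hello from C++ on OVH! You requested: " + std::string(request_.target());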
        } else {
            response_.result(http::status::bad_request);
            response_.set(http::field::content_type, "text/plain");
            response_.body() = "Unsupported method";
        }
        response_.prepare_payload();
    }

    void on_write(beast::error_code ec, std::size_t bytes_transferred)
    {
        boost::ignore_unused(bytes_transferred);

        if(ec)
        {
            std::cerr << "Error writing response: " << ec.message() << std::endl;
            return;
        }

        // Without keep-alive, shut down the send side and stop; the session
        // is destroyed once the last shared_ptr to it goes away
        if(!response_.keep_alive())
        {
            beast::error_code ignored_ec;
            socket_.shutdown(tcp::socket::shutdown_send, ignored_ec);
            return;
        }

        // Keep-alive: read the next request on the same connection
        do_read();
    }

    tcp::socket socket_;
    beast::flat_buffer buffer_;
    http::request<http::string_body> request_;
    http::response<http::string_body> response_;
};

// Accepts incoming connections and launches the sessions
class listener : public std::enable_shared_from_this<listener>
{
public:
    listener(net::io_context& ioc, tcp::endpoint endpoint)
        : ioc_(ioc), acceptor_(ioc, endpoint)
    {
    }

    void run()
    {
        do_accept();
    }

private:
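    void do_accept()
    {
        // Capture a shared_ptr rather than a raw `this`: run_server() drops
        // its own reference right after run(), so the pending handler must
        // keep the listener alive.
        acceptor_.async_accept(
            [self = shared_from_this()](beast::error_code ec, tcp::socket socket)
            {
                if(ec == net::error::operation_aborted)
                    return; // acceptor was closed; stop accepting

                if(!ec)
                    std::make_shared<session>(std::move(socket))->start();

                // Continue accepting new connections
                self->do_accept();
            });
    }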

    net::io_context& ioc_;
    tcp::acceptor acceptor_;
};

// Main server setup
void run_server(net::io_context& ioc, const std::string& address, unsigned short port, int thread_count)
{
    auto const bind_address = net::ip::make_address(address);
    tcp::endpoint endpoint(bind_address, port);

    // Create and launch a listener
    std::make_shared<listener>(ioc, endpoint)->run();

    // The io_context is required for all I/O
    // Create and launch a group of threads to run the io_context.
    std::vector<std::thread> threads;
    for(int i = 0; i < thread_count; ++i)
        threads.emplace_back(
        [&ioc]
        {
            ioc.run();
        });

    // Wait for all threads to complete.
    for(auto& t : threads)
        t.join();
}

// Example usage in main()
/*
int main()
{
    try
    {
        auto const address = "0.0.0.0";
        unsigned short port = 8080;
        // Use all available cores; hardware_concurrency() may return 0 if unknown
        unsigned hw = std::thread::hardware_concurrency();
        int thread_count = hw ? static_cast<int>(hw) : 1;

        net::io_context ioc;

        run_server(ioc, address, port, thread_count);
    }
    catch (const std::exception& e)
    {
        std::cerr << "Exception: " << e.what() << std::endl;
        return EXIT_FAILURE;
    }
}
*/

In this example, the session class handles a single client connection. The listener class accepts new connections and passes them to new session objects. The net::io_context is run by a pool of threads, allowing multiple I/O operations to be processed concurrently without blocking.
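To make the thread-pool behavior concrete without depending on Boost, here is a standard-library-only sketch of the idea behind running one context from several threads: handlers sit in a shared queue, and every thread that calls run() drains it until no work remains. mini_context and run_handlers_on_pool are hypothetical, heavily simplified stand-ins. They are not how Asio is implemented, and they ignore I/O readiness and exceptions entirely.

```cpp
#include <atomic>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

class mini_context {
public:
    // Queue a handler for execution by whichever thread is inside run().
    void post(std::function<void()> handler)
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push_back(std::move(handler));
        }
        cv_.notify_one();
    }

    // Executes handlers until the queue is empty and no handler is running.
    void run()
    {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return !queue_.empty() || in_flight_ == 0; });
            if (queue_.empty())
                return; // nothing queued, nothing running: out of work

            auto handler = std::move(queue_.front());
            queue_.pop_front();
            ++in_flight_;

            lock.unlock();
            handler();      // run outside the lock so threads work in parallel
            lock.lock();

            --in_flight_;
            if (queue_.empty() && in_flight_ == 0)
                cv_.notify_all(); // release threads waiting for more work
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> queue_;
    int in_flight_ = 0;
};

// Posts `handler_count` trivial handlers, runs them on `thread_count`
// threads, and returns how many handlers completed.
int run_handlers_on_pool(int handler_count, int thread_count)
{
    mini_context ctx;
    std::atomic<int> completed{0};
    for (int i = 0; i < handler_count; ++i)
        ctx.post([&completed] { completed.fetch_add(1, std::memory_order_relaxed); });

    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back([&ctx] { ctx.run(); });
    for (auto& t : threads)
        t.join();
    return completed.load();
}
```

In the Beast example, each session issues only one asynchronous operation at a time, so its handlers never run concurrently even though several threads service the same io_context.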

OVH Infrastructure Tuning and Configuration

Leveraging OVH’s bare-metal servers provides the necessary control over the operating system and network stack. Key areas for tuning include:

1. Kernel Parameters (sysctl)

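These parameters directly influence the network stack’s capacity and efficiency. Apply them at runtime with sysctl -w parameter=value, or persist them in /etc/sysctl.conf (or a file under /etc/sysctl.d/) and load them with sysctl -p. Keep comments on their own lines; trailing comments after a value are not reliably supported.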

# System-wide limit on open file handles. Check the current value first
# (sysctl fs.file-max); many systems already default higher, so do not lower it.
fs.file-max = 200000
# Upper bound for a process's RLIMIT_NOFILE. The process limit itself must
# still be raised (ulimit -n, or LimitNOFILE= in a systemd unit).
fs.nr_open = 1000000

# Maximum accept-queue length that a listen() backlog can request
net.core.somaxconn = 4096
# Maximum number of half-open (SYN_RECV) connections queued per listener
net.ipv4.tcp_max_syn_backlog = 2048
# Fall back to SYN cookies when the SYN queue overflows (SYN-flood protection)
net.ipv4.tcp_syncookies = 1

# Largest socket buffer (bytes) an application may request via SO_RCVBUF/SO_SNDBUF
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# TCP ignores net.core.rmem_default/wmem_default; it uses the autotuning range
# below (min, default, max in bytes). Keep the default small: 16 MB per socket
# is not sustainable at 50,000 connections.
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Per-CPU queue length for packets received from the NIC but not yet processed
net.core.netdev_max_backlog = 5000

# Enable TCP Fast Open (if supported and desired): 1 = client, 2 = server, 3 = both
net.ipv4.tcp_fastopen = 3

# How long orphaned connections may stay in FIN_WAIT_2 (seconds). This does not
# shorten TIME_WAIT, which is fixed at 60 seconds in Linux.
net.ipv4.tcp_fin_timeout = 30
# Allow reuse of TIME_WAIT sockets for new outgoing connections
net.ipv4.tcp_tw_reuse = 1
# net.ipv4.tcp_tw_recycle was removed in Linux 4.12 and broke clients behind
# NAT on older kernels; do not set it.

# Raise the ARP/neighbor cache garbage-collection thresholds
net.ipv4.neigh.default.gc_thresh1 = 1024
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh3 = 4096

# Use the BBR congestion control algorithm with the fq queueing discipline
# Ensure kernel module is loaded: sudo modprobe tcp_bbr
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

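Note on tcp_tw_recycle: This option was removed in Linux 4.12, so setting it on a current kernel fails. On older kernels it dropped connections from clients behind NAT, because hosts sharing one public address present inconsistent TCP timestamps. tcp_tw_reuse is the safer option, but it only affects outgoing connections (for example, from a proxy to its backends). Test thoroughly.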
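The file-descriptor settings above only raise system-wide ceilings; each process still runs with its own RLIMIT_NOFILE. A minimal sketch, assuming Linux and a hypothetical helper name raise_fd_limit, of how the server could lift its soft limit to the hard limit at startup:

```cpp
#include <sys/resource.h>

// Tries to raise the soft limit on open file descriptors to the hard limit.
// Returns the soft limit in effect afterwards, or 0 if it cannot be read.
rlim_t raise_fd_limit()
{
    struct rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return 0;

    if (lim.rlim_cur < lim.rlim_max) {
        struct rlimit wanted = lim;
        wanted.rlim_cur = wanted.rlim_max; // no privileges needed up to the hard limit
        if (setrlimit(RLIMIT_NOFILE, &wanted) == 0)
            lim = wanted;
    }
    return lim.rlim_cur;
}
```

Calling this before the acceptor starts, and logging the returned value, makes it obvious when the hard limit itself (set through ulimit -n or a systemd LimitNOFILE= directive) is still below the target connection count.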

2. Network Interface Configuration

Ensure your network interfaces are configured for optimal performance. This includes disabling power-saving features and potentially tuning interrupt coalescing.

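# Check current offload settings first (replace eth0 with your interface name)
sudo ethtool -k eth0

# Offloads (TSO, GSO, GRO) normally lower the CPU cost per byte and should stay
# enabled. Disable them only if benchmarks show a latency benefit for your workload:
# sudo ethtool -K eth0 tso off gso off gro off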

# Adjust interrupt coalescing (requires careful benchmarking)
# Lower values mean more frequent interrupts, potentially higher CPU, but lower latency.
# Higher values mean fewer interrupts, lower CPU, but higher latency.
# Example: Set rx-usecs to 0 (disable coalescing)
# sudo ethtool -C eth0 rx-usecs 0 tx-usecs 0
# This is highly hardware-dependent and requires extensive testing.

3. Application Deployment and Load Balancing

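To handle 50,000+ concurrent requests reliably, do not depend on a single server. A well-tuned machine may hold that many connections, but it offers no redundancy and little headroom for traffic spikes, so a distributed architecture with load balancing is essential.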

3.1. Load Balancer Configuration (HAProxy Example)

HAProxy is an excellent choice for high-performance load balancing. Configure it to distribute traffic efficiently across your C++ application instances.

global
    log /dev/log    local0
    log /dev/log    local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
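    daemon
    # Process-wide connection ceiling; it must exceed the target concurrency,
    # and the default is often far lower
    maxconn 100000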

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5000
    timeout client  50000 # Increased client timeout
    timeout server  50000 # Increased server timeout
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend http_frontend
    bind *:80
    mode http
    # Per-frontend connection limit (the global maxconn still caps the whole process)
    maxconn 100000
    # Listen backlog for the frontend socket (capped by net.core.somaxconn)
    backlog 4096
    default_backend http_backend

backend http_backend
    mode http
    # Use roundrobin for even distribution, or leastconn to favor servers with fewer active connections
    balance roundrobin
    # Keep server-side connections open between requests (HAProxy's default mode);
    # option http-server-close would instead close them after every response
    option http-keep-alive
    # Adjust server timeouts if necessary
    timeout server 50000
    # Health checks: GET /healthz must return 200. GET is used because the
    # example handler above rejects other methods.
    option httpchk GET /healthz
    http-check expect status 200
    # List your C++ application servers here
    server app1 192.168.1.10:8080 check
    server app2 192.168.1.11:8080 check
    server app3 192.168.1.12:8080 check
    server app4 192.168.1.13:8080 check
    # Add more servers as needed

Key HAProxy Settings:

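balance roundrobin or leastconn: Distributes load. This directive belongs in a backend (or defaults) section, not in a frontend.
maxconn: Caps concurrent connections globally and per frontend. It must exceed your target concurrency.
timeout client/server: Values are in milliseconds. Generous timeouts accommodate potentially long-running asynchronous operations.
option httpchk: Essential for monitoring backend health. Ensure your C++ app has a /healthz endpoint and that it accepts the HTTP method used by the check.
option http-server-close: Closes the server-side connection after each response. Prefer option http-keep-alive (the default) unless your C++ app cannot handle persistent connections.
tune.bufsize and tune.maxaccept: Advanced global parameters for HAProxy’s per-connection buffer size and the number of connections accepted per wake-up.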
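The /healthz endpoint mentioned above can be very small. The following sketch shows one possible routing decision in plain C++. simple_response, route_health, and the ready flag are hypothetical names, and wiring this into the Beast handler (for example, by checking request_.target() inside handle_request) is left as an assumption.

```cpp
#include <string>

// Minimal status/body pair, independent of any HTTP library.
struct simple_response {
    int status;
    std::string body;
};

// Decides the health-check answer. `ready` would be cleared by the
// application during startup or shutdown so the load balancer stops
// routing traffic to this instance.
simple_response route_health(const std::string& method,
                             const std::string& target,
                             bool ready)
{
    if (target != "/healthz")
        return {404, "not found"};
    if (method != "GET" && method != "HEAD")
        return {405, "method not allowed"};
    if (!ready)
        return {503, "unavailable"};
    return {200, "ok"};
}
```

Returning 503 while draining lets HAProxy mark the server down through http-check expect status 200 before the process actually exits.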

3.2. Scaling Strategy

Start with a modest number of powerful OVH bare-metal servers (e.g., 4-8 cores, 32GB+ RAM). Deploy your C++ application instances on each, configuring the thread pool size (e.g., number of io_context threads) to match or slightly exceed the number of CPU cores available to the application process. Use HAProxy to distribute traffic. Monitor CPU, memory, network I/O, and request latency. Gradually add more application servers and/or more powerful servers as needed, adjusting HAProxy’s backend pool. Auto-scaling can be implemented using orchestration tools like Kubernetes or custom scripts that monitor metrics and adjust server counts.
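A rough sizing calculation can guide the initial server count. The sketch below is illustrative arithmetic only: servers_needed is a hypothetical helper, and the per-server capacity must come from your own load tests, not from this article.

```cpp
#include <cstdint>

// Number of servers required so that `target_concurrency` connections fit
// while each server keeps `headroom_percent` of its measured capacity free.
// Returns 0 for invalid input (no capacity, or headroom of 100% or more).
std::uint64_t servers_needed(std::uint64_t target_concurrency,
                             std::uint64_t per_server_capacity,
                             unsigned headroom_percent)
{
    if (per_server_capacity == 0 || headroom_percent >= 100)
        return 0;

    // Capacity that may actually be used once headroom is reserved
    std::uint64_t usable = per_server_capacity * (100 - headroom_percent) / 100;
    if (usable == 0)
        return 0;

    // Integer ceiling of target / usable
    return (target_concurrency + usable - 1) / usable;
}
```

For example, with a hypothetical measured capacity of 15,000 connections per server and 30% headroom, servers_needed(50000, 15000, 30) yields 5. Adding one more server beyond that figure covers the loss of a single machine.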

Monitoring and Performance Analysis

Continuous monitoring is critical. Use tools like:

Prometheus & Grafana: For collecting and visualizing metrics. Instrument your C++ application to expose metrics (e.g., active connections, request latency, error counts) via an HTTP endpoint.
System Monitoring Tools: htop, iotop, netstat -anp, ss -s, vmstat, iostat.
Profiling Tools: perf, gprof, Valgrind (callgrind) for deep dives into performance bottlenecks within the C++ code.
Load Testing Tools: wrk, k6, vegeta to simulate high concurrency and measure performance under load.
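As a sketch of the instrumentation mentioned for Prometheus above, the following renders a few counters in the Prometheus text exposition format, which the application could return from a metrics endpoint. The metric names, the server_metrics struct, and render_prometheus are hypothetical; a production service would more likely use an existing client library.

```cpp
#include <atomic>
#include <cstdint>
#include <sstream>
#include <string>

// Counters updated by the request-handling code (names are illustrative).
struct server_metrics {
    std::atomic<std::int64_t> active_connections{0};
    std::atomic<std::uint64_t> requests_total{0};
    std::atomic<std::uint64_t> request_errors_total{0};
};

// Renders the metrics as "# TYPE" metadata lines followed by "name value".
std::string render_prometheus(const server_metrics& m)
{
    std::ostringstream out;
    out << "# TYPE app_active_connections gauge\n"
        << "app_active_connections " << m.active_connections.load() << "\n"
        << "# TYPE app_requests_total counter\n"
        << "app_requests_total " << m.requests_total.load() << "\n"
        << "# TYPE app_request_errors_total counter\n"
        << "app_request_errors_total " << m.request_errors_total.load() << "\n";
    return out.str();
}
```

Gauges such as active connections go up and down, while counters only increase; Prometheus derives rates from the counters at query time.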

Key metrics to track:

Request Latency (p95, p99)
CPU Utilization (per core and overall)
Memory Usage
Network Throughput and Packet Loss
File Descriptor Usage
TCP Connection States (ESTABLISHED, TIME_WAIT, etc.)
Application-specific error rates
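For the latency percentiles in the list above, a small helper shows what p95 and p99 mean in practice. This is a sketch using the nearest-rank definition; the function name percentile is hypothetical, and monitoring systems often use histogram-based estimates instead of exact samples.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. `p` must be in [1, 100].
// Returns 0.0 for empty input or an out-of-range p.
double percentile(std::vector<double> samples, unsigned p)
{
    if (samples.empty() || p == 0 || p > 100)
        return 0.0;

    // rank = ceil(p * N / 100), computed in integers to avoid rounding issues
    std::size_t rank = (static_cast<std::size_t>(p) * samples.size() + 99) / 100;

    // Partially sort so that the element at position rank-1 is in its final place
    auto nth = samples.begin() + static_cast<std::ptrdiff_t>(rank - 1);
    std::nth_element(samples.begin(), nth, samples.end());
    return *nth;
}
```

Tail percentiles matter more than averages here: a handful of slow requests can hide behind a healthy mean while p99 reveals them.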

Regularly analyze these metrics to identify bottlenecks, whether they lie in the C++ application, the OS configuration, the network, or the load balancer. Iteratively tune parameters and architecture based on this data.