Vengala Vinay

Having 9+ Years of Experience in Software Development


Scaling C++ on OVH to Handle 50,000+ Concurrent Requests

Architectural Foundation: Asynchronous C++ and High-Performance Networking

Sustaining 50,000+ concurrent requests with C++ on OVH infrastructure requires a non-blocking architecture. A thread-per-request model becomes a bottleneck at this scale because every connection pays for its own thread stack and for context switches. Our approach uses an event-driven, asynchronous I/O model built on Boost.Asio (or standalone Asio, the basis of the C++ Networking TS, which has not been merged into the C++ standard as of C++23), coupled with a carefully tuned operating system and network stack.
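To put the memory cost of thread-per-request in rough numbers, the sketch below multiplies a connection count by a per-thread stack reservation. The 8 MiB figure is an assumption (a common Linux default, see ulimit -s), it describes reserved virtual address space rather than resident memory, and the function name is illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Virtual address space reserved for thread stacks, in GiB, if every
// connection gets its own thread. stack_mib is an assumed per-thread
// reservation (8 MiB is a common Linux default), not a measured value.
constexpr double stack_reservation_gib(std::uint64_t connections,
                                       std::uint64_t stack_mib = 8)
{
    return static_cast<double>(connections * stack_mib) / 1024.0;
}

// Sanity check at compile time: 16 threads x 8 MiB = 128 MiB = 0.125 GiB.
static_assert(stack_reservation_gib(16) == 0.125, "unit conversion is wrong");
```

With 50,000 threads the same formula gives 400,000 MiB, about 390 GiB of stack reservations, whereas an event-driven server needs only as many stacks as it has worker threads.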

The core principle is to maximize CPU utilization by ensuring that threads are always performing useful work rather than waiting for I/O operations to complete. This is achieved by registering I/O events (e.g., socket readability/writability) with an event loop (like epoll on Linux) and having a pool of worker threads that process these events asynchronously.
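The registration step can be sketched directly against the Linux epoll API. This is a minimal illustration, assuming Linux; the helper names watch_readable and poll_readable are hypothetical, and a library such as Boost.Asio performs the equivalent work internally.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cassert>
#include <vector>

// Ask the kernel to report when fd becomes readable (level-triggered).
bool watch_readable(int epfd, int fd)
{
    assert(epfd >= 0 && fd >= 0);
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0;
}

// Block for at most timeout_ms and return the descriptors that are readable.
// A real event loop would dispatch each one to its handler instead.
std::vector<int> poll_readable(int epfd, int timeout_ms)
{
    epoll_event events[64];
    const int n = epoll_wait(epfd, events, 64, timeout_ms);
    std::vector<int> ready;
    for (int i = 0; i < n; ++i)  // n is -1 on error, so the loop is skipped
        if (events[i].events & EPOLLIN)
            ready.push_back(events[i].data.fd);
    return ready;
}
```

A worker thread would call poll_readable in a loop and only touch sockets that are known to have data, so it never sleeps inside a read on an idle connection.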

Optimizing the C++ Application for Concurrency

The C++ application itself must be designed with concurrency in mind. This involves:

  • Minimizing Shared Mutable State: Use thread-safe primitives and containers (e.g., std::atomic, concurrent queues), or design components to be immutable or partitioned per thread so that workers rarely contend for the same data.
  • Efficient Memory Management: Employ custom allocators or memory pools to reduce the overhead of dynamic memory allocation, especially for frequently created and destroyed objects.
  • Lock-Free Programming (where applicable): For specific critical sections, explore lock-free algorithms and data structures to eliminate blocking entirely, though this significantly increases complexity.
  • CPU Affinity and NUMA Awareness: Pinning threads to specific CPU cores and being mindful of Non-Uniform Memory Access (NUMA) can yield substantial performance gains by reducing cache misses and memory latency.
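As an illustration of the memory-pool point, here is a minimal sketch of a fixed-capacity object pool. The FixedPool name and interface are hypothetical, and the pool is deliberately not thread-safe: the assumption is one pool per worker thread, which also fits the advice to partition state.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Hypothetical fixed-capacity pool: all storage is allocated once up front,
// so create()/destroy() never call the general-purpose allocator.
template <typename T>
class FixedPool {
public:
    explicit FixedPool(std::size_t capacity) : storage_(capacity)
    {
        free_.reserve(capacity);
        for (auto& slot : storage_)
            free_.push_back(&slot);
    }

    // Returns nullptr when the pool is exhausted.
    template <typename... Args>
    T* create(Args&&... args)
    {
        if (free_.empty())
            return nullptr;
        Slot* slot = free_.back();
        // Construct first: if T's constructor throws, the slot stays free.
        T* obj = ::new (static_cast<void*>(slot->bytes)) T(std::forward<Args>(args)...);
        free_.pop_back();
        return obj;
    }

    // p must have come from create() on this pool.
    void destroy(T* p)
    {
        assert(p != nullptr);
        p->~T();
        free_.push_back(reinterpret_cast<Slot*>(p));
    }

    std::size_t available() const { return free_.size(); }

private:
    struct Slot { alignas(T) unsigned char bytes[sizeof(T)]; };
    std::vector<Slot>  storage_;  // owns the raw memory
    std::vector<Slot*> free_;     // stack of currently unused slots
};
```

Whether a pool like this beats the system allocator depends on the workload, so it is worth measuring before adopting one.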

Consider a simplified example of an asynchronous HTTP server handler using Boost.Asio:

Example: Asynchronous HTTP Request Handler (Conceptual)

#include <boost/asio.hpp>
#include <boost/beast.hpp>
#include <iostream>
#include <string>
#include <vector>
#include <thread>
#include <memory>

namespace beast = boost::beast;
namespace http = beast::http;
namespace net = boost::asio;
using tcp = net::ip::tcp;

// Represents a single connection
class session : public std::enable_shared_from_this<session>
{
public:
    session(tcp::socket socket)
        : socket_(std::move(socket))
    {
    }

    void start()
    {
        do_read();
    }

private:
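    void do_read()
    {
        // Reset the message before each read: Beast leaves the behavior
        // undefined if a previously parsed request is reused. Keep buffer_
        // untouched, since it may already hold the start of the next
        // pipelined request.
        request_ = {};

        // Read one complete request from the connection
        http::async_read(socket_, buffer_, request_,
            beast::bind_front_handler(
                &session::on_read,
                shared_from_this()));
    }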

    void on_read(beast::error_code ec, std::size_t bytes_transferred)
    {
        boost::ignore_unused(bytes_transferred);

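        // The peer closed the connection cleanly: shut down our side and stop
        if(ec == http::error::end_of_stream)
        {
            beast::error_code ignored_ec;
            socket_.shutdown(tcp::socket::shutdown_send, ignored_ec);
            return;
        }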

        if(ec)
        {
            std::cerr << "Error reading request: " << ec.message() << std::endl;
            return;
        }

        // Process the request and prepare the response
        handle_request();

        // Send the response
        http::async_write(socket_, response_,
            beast::bind_front_handler(
                &session::on_write,
                shared_from_this()));
    }

    void handle_request()
    {
        // Start from an empty response so nothing leaks from the previous
        // request on a keep-alive connection
        response_ = {};
        response_.version(request_.version());
        response_.keep_alive(request_.keep_alive());
        response_.set(http::field::server, BOOST_BEAST_VERSION_STRING);
        response_.set(http::field::content_type, "text/plain");

        // Example: echo the request target for GET, reject everything else
        if (request_.method() == http::verb::get) {
            response_.result(http::status::ok);
            // std::string(...) works with every Beast string_view flavor;
            // string_view::to_string() is not available in newer Boost releases
            response_.body() = "Hello from C++ on OVH! You requested: " + std::string(request_.target());
        } else {
            response_.result(http::status::method_not_allowed);
            response_.set(http::field::allow, "GET");
            response_.body() = "Unsupported method";
        }
        response_.prepare_payload();
    }

    void on_write(beast::error_code ec, std::size_t bytes_transferred)
    {
        boost::ignore_unused(bytes_transferred);

        if(ec)
        {
            std::cerr << "Error writing response: " << ec.message() << std::endl;
            return;
        }

        // If we're not keeping the connection alive, close it and stop here
        if(!response_.keep_alive())
        {
            beast::error_code ignored_ec;
            socket_.shutdown(tcp::socket::shutdown_send, ignored_ec);
            return;
        }

        // Keep-alive: wait for the next request on the same connection
        do_read();
    }

    tcp::socket socket_;
    beast::flat_buffer buffer_;
    http::request<http::string_body> request_;
    http::response<http::string_body> response_;
};

// Accepts incoming connections and launches the sessions
class listener : public std::enable_shared_from_this<listener>
{
public:
    listener(net::io_context& ioc, tcp::endpoint endpoint)
        : ioc_(ioc), acceptor_(ioc, endpoint)
    {
    }

    void run()
    {
        do_accept();
    }

private:
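    void do_accept()
    {
        // Capture a shared_ptr rather than `this`: the listener is created
        // as a temporary shared_ptr, so only the pending handler keeps it alive.
        acceptor_.async_accept(
            [self = shared_from_this()](beast::error_code ec, tcp::socket socket)
            {
                if(!ec)
                    std::make_shared<session>(std::move(socket))->start();
                else
                    std::cerr << "Error accepting connection: " << ec.message() << std::endl;

                // Continue accepting new connections
                self->do_accept();
            });
    }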

    net::io_context& ioc_;
    tcp::acceptor acceptor_;
};

// Main server setup
void run_server(net::io_context& ioc, const std::string& address, unsigned short port, int thread_count)
{
    auto const bind_address = net::ip::make_address(address);
    tcp::endpoint endpoint(bind_address, port);

    // Create and launch a listener
    std::make_shared<listener>(ioc, endpoint)->run();

    // The io_context is required for all I/O
    // Create and launch a group of threads to run the io_context.
    std::vector<std::thread> threads;
    for(int i = 0; i < thread_count; ++i)
        threads.emplace_back(
        [&ioc]
        {
            ioc.run();
        });

    // Wait for all threads to complete.
    for(auto& t : threads)
        t.join();
}

// Example usage in main()
/*
int main()
{
    try
    {
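        auto const address = "0.0.0.0";
        unsigned short port = 8080;
        // hardware_concurrency() may return 0 when the core count is unknown
        unsigned const hw = std::thread::hardware_concurrency();
        int const thread_count = hw > 0 ? static_cast<int>(hw) : 1;

        // The concurrency hint tells Asio how many threads will call run()
        net::io_context ioc{thread_count};

        run_server(ioc, address, port, thread_count);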
    }
    catch (const std::exception& e)
    {
        std::cerr << "Exception: " << e.what() << std::endl;
        return EXIT_FAILURE;
    }
}
*/

In this example, the session class handles a single client connection. The listener class accepts new connections and passes them to new session objects. The net::io_context is run by a pool of threads, allowing multiple I/O operations to be processed concurrently without blocking.
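The handler above returns the same body for every GET. A load balancer health check needs a dedicated path, and one way to keep that decision testable without sockets is to put it in a pure function. The sketch below is an assumption for illustration: the Reply struct and route() are hypothetical names and are not part of Beast; handle_request() would copy the result into response_.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical transport-independent result of routing one request.
struct Reply {
    int status;
    std::string body;
};

// Decide the response from the method and target alone.
Reply route(std::string_view method, std::string_view target)
{
    assert(!target.empty());  // an HTTP request target is never empty

    if (target == "/healthz") {
        // Cheap liveness answer for load balancer checks
        if (method == "GET" || method == "HEAD")
            return {200, "ok"};
        return {405, "method not allowed"};
    }

    if (method != "GET")
        return {405, "method not allowed"};

    return {200, "Hello from C++ on OVH! You requested: " + std::string(target)};
}
```

Because route() touches no shared state, any io_context thread can call it concurrently without locking.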

OVH Infrastructure Tuning and Configuration

Leveraging OVH’s bare-metal servers provides the necessary control over the operating system and network stack. Key areas for tuning include:

1. Kernel Parameters (sysctl)

These parameters directly influence the network stack’s capacity and efficiency. Apply them at runtime with sysctl -w parameter=value, or add them to /etc/sysctl.conf for persistence and load the file with sysctl -p.

# System-wide ceiling on open file handles (all processes combined)
fs.file-max = 2000000
# Upper bound that any single process's file descriptor limit can be raised to
fs.nr_open = 1000000

# Maximum accept-queue length that a listen() backlog may request
net.core.somaxconn = 4096
# Maximum number of half-open (SYN received) connections queued
net.ipv4.tcp_max_syn_backlog = 2048
# Fall back to SYN cookies when the SYN queue overflows (SYN flood protection)
net.ipv4.tcp_syncookies = 1

# Largest socket receive/send buffer, in bytes, that an application may request
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# Default receive/send buffer size in bytes for non-TCP sockets
# (TCP takes its defaults from net.ipv4.tcp_rmem and net.ipv4.tcp_wmem)
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216

# Packets queued per CPU when the NIC delivers faster than the kernel processes them
net.core.netdev_max_backlog = 5000

# Enable TCP Fast Open (if supported and desired)
# Bitmask: 1 = client, 2 = server, 3 = both
net.ipv4.tcp_fastopen = 3

# Seconds an orphaned connection may stay in FIN_WAIT_2
# (this does not shorten TIME_WAIT, which Linux fixes at 60 seconds)
net.ipv4.tcp_fin_timeout = 30
# Allow sockets in TIME_WAIT to be reused for new outgoing connections
net.ipv4.tcp_tw_reuse = 1

# Increase the maximum number of ARP cache entries
net.ipv4.neigh.default.gc_thresh1 = 1024
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh3 = 4096

# Enable TCP congestion control algorithm (e.g., BBR for better throughput)
# Ensure kernel module is loaded: sudo modprobe tcp_bbr
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

Note on TIME_WAIT tuning: tcp_tw_recycle is deliberately absent. It broke connections from clients behind NAT and was removed in Linux 4.12, so setting it on a current kernel fails. tcp_tw_reuse is the safe option, but it only affects outgoing connections (for example, from a proxy to its backends). Test thoroughly.
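The kernel settings above only raise ceilings. Each process is still bound by its own RLIMIT_NOFILE soft limit, which is often as low as 1024. A minimal sketch, assuming Linux and a hypothetical helper name raise_fd_limit, of lifting the soft limit to the hard limit at startup:

```cpp
#include <sys/resource.h>
#include <cassert>

// Raise the soft file descriptor limit to the hard limit.
// Returns the resulting soft limit, or 0 if either system call fails.
rlim_t raise_fd_limit()
{
    rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return 0;
    assert(lim.rlim_cur <= lim.rlim_max);  // the soft limit never exceeds the hard limit

    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0)
        return 0;
    return lim.rlim_cur;
}
```

The hard limit itself comes from outside the process, for example LimitNOFILE in a systemd unit or /etc/security/limits.conf, and it cannot exceed fs.nr_open.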

2. Network Interface Configuration

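Ensure your network interfaces are configured for optimal performance. This includes disabling power-saving features, reviewing offload settings, and potentially tuning interrupt coalescing.

# Example for eth0: disable segmentation and receive offloads.
# This can trim latency but costs more CPU per byte and often lowers throughput,
# so keep offloads enabled unless benchmarks show a benefit.
sudo ethtool -K eth0 tso off gso off gro off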

# Check current settings
sudo ethtool -k eth0

# Adjust interrupt coalescing (requires careful benchmarking)
# Lower values mean more frequent interrupts, potentially higher CPU, but lower latency.
# Higher values mean fewer interrupts, lower CPU, but higher latency.
# Example: Set rx-usecs to 0 (disable coalescing)
# sudo ethtool -C eth0 rx-usecs 0 tx-usecs 0
# This is highly hardware-dependent and requires extensive testing.

3. Application Deployment and Load Balancing

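To handle 50,000+ concurrent requests reliably, plan for more than one server. A single tuned machine can hold that many open connections, but it leaves no headroom for traffic spikes and no redundancy, so a distributed architecture with load balancing is essential.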

3.1. Load Balancer Configuration (HAProxy Example)

HAProxy is an excellent choice for high-performance load balancing. Configure it to distribute traffic efficiently across your C++ application instances.

global
    log /dev/log    local0
    log /dev/log    local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    # Process-wide cap on concurrent connections; size it above the expected peak
    maxconn 100000

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    # Per-frontend connection cap inherited by the frontend below
    maxconn 100000
    timeout connect 5000
    timeout client  50000 # Increased client timeout
    timeout server  50000 # Increased server timeout
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend http_frontend
    bind *:80
    mode http
    # Request a deeper accept queue for the listening socket
    # (the kernel caps it at net.core.somaxconn)
    backlog 4096
    default_backend http_backend

backend http_backend
    mode http
    # roundrobin spreads requests evenly; leastconn favors servers with fewer active connections
    balance roundrobin
    # Keep server-side connections open and reuse idle ones when it is safe to do so
    option http-keep-alive
    http-reuse safe
    # Adjust server timeouts if necessary
    timeout server 50000
    # Health check: GET /healthz must return 200 (http-check send needs HAProxy 2.2 or later)
    option httpchk
    http-check send meth GET uri /healthz ver HTTP/1.1 hdr Host localhost
    http-check expect status 200

    # List your C++ application servers here
    server app1 192.168.1.10:8080 check
    server app2 192.168.1.11:8080 check
    server app3 192.168.1.12:8080 check
    server app4 192.168.1.13:8080 check
    # Add more servers as needed

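Key HAProxy Settings:

  • balance roundrobin or leastconn: Selects how requests are distributed across the backend servers.
  • maxconn: Caps concurrent connections for the process and for each frontend. Set it above the 50,000 target and make sure the file descriptor limit allows it.
  • timeout client/server: Generous timeouts to accommodate potentially long-running asynchronous operations.
  • option httpchk: Essential for monitoring backend health. Ensure your C++ app has a /healthz endpoint.
  • option http-keep-alive versus option http-server-close: Keep-alive (HAProxy’s default mode) reuses server-side connections, while http-server-close closes them after every response and adds connection setup cost at high request rates. Test either against your C++ app’s keep-alive handling.
  • tune.bufsize and tune.maxaccept: Advanced tuning parameters for HAProxy’s internal buffer size and for the number of connections accepted per event-loop wakeup.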
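To make the difference between the two balance modes concrete, here is a simplified sketch of both selection rules. It illustrates the idea only and is not HAProxy’s implementation (which also accounts for server weights and health); the Backend struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical view of one backend server as the balancer sees it.
struct Backend {
    std::string name;
    int active_connections = 0;
};

// Round robin: hand out indices 0, 1, ..., n-1, 0, 1, ... regardless of load.
std::size_t pick_roundrobin(std::size_t& cursor, std::size_t backend_count)
{
    assert(backend_count > 0);
    const std::size_t chosen = cursor % backend_count;
    cursor = (chosen + 1) % backend_count;
    return chosen;
}

// Least connections: choose the backend with the fewest active connections
// (the first one wins on a tie).
std::size_t pick_leastconn(const std::vector<Backend>& backends)
{
    assert(!backends.empty());
    const auto it = std::min_element(
        backends.begin(), backends.end(),
        [](const Backend& a, const Backend& b) {
            return a.active_connections < b.active_connections;
        });
    return static_cast<std::size_t>(it - backends.begin());
}
```

Round robin suits short, uniform requests; least connections tends to behave better when request durations vary widely, because slow requests stop piling up on one server.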

3.2. Scaling Strategy

Start with a modest number of powerful OVH bare-metal servers (e.g., 4-8 cores, 32GB+ RAM). Deploy your C++ application instances on each, configuring the thread pool size (e.g., number of io_context threads) to match or slightly exceed the number of CPU cores available to the application process. Use HAProxy to distribute traffic. Monitor CPU, memory, network I/O, and request latency. Gradually add more application servers and/or more powerful servers as needed, adjusting HAProxy’s backend pool. Auto-scaling can be implemented using orchestration tools like Kubernetes or custom scripts that monitor metrics and adjust server counts.
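Sizing the thread pool and pinning workers can be sketched as follows. This assumes Linux (sched_getaffinity and sched_setaffinity are Linux-specific), and all three function names are hypothetical helpers rather than part of Asio.

```cpp
#include <sched.h>
#include <cassert>
#include <thread>

// Number of io_context threads to start: one per core, minus any cores
// reserved for other work, but never fewer than one.
unsigned choose_thread_count(unsigned reserved_cores = 0)
{
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0)  // the core count could not be determined
        hw = 1;
    return hw > reserved_cores ? hw - reserved_cores : 1;
}

// First CPU the calling thread is currently allowed to run on, or -1.
int first_allowed_cpu()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

// Pin the calling thread to a single CPU; returns false if the kernel refuses.
bool pin_current_thread(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

Each worker would call pin_current_thread with its own CPU index before entering ioc.run(). Pinning helps most on NUMA machines and can hurt when other processes compete for the same cores, so benchmark both configurations.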

Monitoring and Performance Analysis

Continuous monitoring is critical. Use tools like:

  • Prometheus & Grafana: For collecting and visualizing metrics. Instrument your C++ application to expose metrics (e.g., active connections, request latency, error counts) via an HTTP endpoint.
  • System Monitoring Tools: htop, iotop, netstat -anp, ss -s, vmstat, iostat.
  • Profiling Tools: perf, gprof, Valgrind (callgrind) for deep dives into performance bottlenecks within the C++ code.
  • Load Testing Tools: wrk, k6, vegeta to simulate high concurrency and measure performance under load.
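Instrumenting the application for Prometheus can be as small as a few atomic counters rendered in the Prometheus text exposition format. The sketch below is illustrative: the Metrics struct and the metric names app_requests_total and app_active_connections are assumptions, and a real service would return the rendered text from a /metrics HTTP handler.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical process-wide metrics, safe to update from any worker thread.
struct Metrics {
    std::atomic<std::uint64_t> requests_total{0};
    std::atomic<std::int64_t>  active_connections{0};

    void on_open()    { active_connections.fetch_add(1, std::memory_order_relaxed); }
    void on_request() { requests_total.fetch_add(1, std::memory_order_relaxed); }
    void on_close()
    {
        const auto before = active_connections.fetch_sub(1, std::memory_order_relaxed);
        assert(before > 0);  // every close must match an earlier open
        (void)before;
    }
};

// Render the current values in the Prometheus text exposition format.
std::string render_metrics(const Metrics& m)
{
    std::ostringstream out;
    out << "# TYPE app_requests_total counter\n"
        << "app_requests_total "
        << m.requests_total.load(std::memory_order_relaxed) << "\n"
        << "# TYPE app_active_connections gauge\n"
        << "app_active_connections "
        << m.active_connections.load(std::memory_order_relaxed) << "\n";
    return out.str();
}
```

Relaxed atomics are enough here because each value is an independent statistic; nothing else is synchronized through them.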

Key metrics to track:

  • Request Latency (p95, p99)
  • CPU Utilization (per core and overall)
  • Memory Usage
  • Network Throughput and Packet Loss
  • File Descriptor Usage
  • TCP Connection States (ESTABLISHED, TIME_WAIT, etc.)
  • Application-specific error rates
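For the latency percentiles, a small helper using the nearest-rank method shows what p95 and p99 mean in practice. This is a sketch with a hypothetical function name; production systems usually rely on histograms (as Prometheus does) instead of sorting raw samples.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Requires 0 < p <= 100.
double percentile(std::vector<double> samples, double p)
{
    assert(!samples.empty());
    assert(p > 0.0 && p <= 100.0);

    std::sort(samples.begin(), samples.end());
    // rank is 1-based: ceil(p * N / 100)
    const auto rank = static_cast<std::size_t>(
        std::ceil(p * static_cast<double>(samples.size()) / 100.0));
    return samples[rank - 1];
}
```

With ten samples from 1 ms to 10 ms, percentile(samples, 50) is 5 ms while percentile(samples, 95) is 10 ms: a single slow request dominates the tail even when the median looks healthy.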

Regularly analyze these metrics to identify bottlenecks, whether they lie in the C++ application, the OS configuration, the network, or the load balancer. Iteratively tune parameters and architecture based on this data.

Copyright © 2026 · Vinay Vengala