Scaling C++ on OVH to Handle 50,000+ Concurrent Requests
Architectural Foundation: Asynchronous C++ and High-Performance Networking
Achieving 50,000+ concurrent requests with C++ on OVH infrastructure requires a non-blocking, efficient architectural foundation. Traditional thread-per-request models quickly become a bottleneck because every connection costs a thread stack and adds context-switching overhead. Our approach uses an event-driven, asynchronous I/O model, primarily through Boost.Asio (or standalone Asio, the basis of the C++ Networking TS, which has not been merged into the standard as of C++20), coupled with a carefully tuned operating system and network stack.
The core principle is to maximize CPU utilization by ensuring that threads are always performing useful work rather than waiting for I/O operations to complete. This is achieved by registering I/O events (e.g., socket readability/writability) with an event loop (like epoll on Linux) and having a pool of worker threads that process these events asynchronously.
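To make the event-loop idea concrete, here is a minimal Linux-only sketch that talks to epoll directly instead of going through Boost.Asio. The helper names watch_readable and wait_ready are made up for illustration, and the 64-event batch size is an arbitrary assumption.

```cpp
#include <sys/epoll.h>
#include <unistd.h>

// Register `fd` with the epoll instance `epfd` for readability notifications.
// Returns true on success.
bool watch_readable(int epfd, int fd)
{
    epoll_event ev{};
    ev.events = EPOLLIN;   // level-triggered: reported for as long as data is pending
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0;
}

// Wait up to `timeout_ms` milliseconds and return the number of descriptors
// that are ready (0 on timeout, -1 on error). A real event loop would hand
// each returned event to its handler instead of only counting them.
int wait_ready(int epfd, int timeout_ms)
{
    epoll_event events[64];
    return epoll_wait(epfd, events, 64, timeout_ms);
}
```

A worker thread would call wait_ready in a loop and only touch sockets the kernel has reported as ready, so it never blocks on an idle connection. On Linux, Boost.Asio's io_context is built on the same epoll mechanism.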
Optimizing the C++ Application for Concurrency
The C++ application itself must be designed with concurrency in mind. This involves:
- Minimizing Shared Mutable State: Use thread-safe data structures (e.g., std::atomic, concurrent queues) or design components to be largely immutable or partitioned to avoid contention.
- Efficient Memory Management: Employ custom allocators or memory pools to reduce the overhead of dynamic memory allocation, especially for frequently created and destroyed objects.
- Lock-Free Programming (where applicable): For specific critical sections, explore lock-free algorithms and data structures to eliminate blocking entirely, though this significantly increases complexity.
- CPU Affinity and NUMA Awareness: Pinning threads to specific CPU cores and being mindful of Non-Uniform Memory Access (NUMA) can yield substantial performance gains by reducing cache misses and memory latency.
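As one possible illustration of partitioning state to avoid contention, the sketch below keeps one counter slot per worker instead of a single shared counter. The class name sharded_counter and the 64-byte cache-line size are assumptions for this example, not something the application above prescribes.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// One slot per worker, aligned to an assumed 64-byte cache line so that two
// workers never write to the same line (avoids false sharing).
struct alignas(64) padded_counter {
    std::atomic<std::uint64_t> value{0};
};

class sharded_counter {
public:
    // At least one slot is always allocated so the modulo below is safe.
    explicit sharded_counter(std::size_t shards) : slots_(shards ? shards : 1) {}

    // Each worker increments only its own slot. Relaxed ordering is enough
    // because the counter does not guard any other data.
    void add(std::size_t shard, std::uint64_t n = 1)
    {
        slots_[shard % slots_.size()].value.fetch_add(n, std::memory_order_relaxed);
    }

    // Readers sum all slots; the result is approximate while writers are active.
    std::uint64_t total() const
    {
        std::uint64_t sum = 0;
        for (const auto& slot : slots_)
            sum += slot.value.load(std::memory_order_relaxed);
        return sum;
    }

private:
    std::vector<padded_counter> slots_;
};
```

Each io_context thread would be given a fixed shard index at startup. The trade-off is extra memory per counter and a slightly more expensive read in exchange for uncontended writes.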
Consider a simplified example of an asynchronous HTTP server handler using Boost.Asio:
Example: Asynchronous HTTP Request Handler (Conceptual)
#include <boost/asio.hpp>
#include <boost/beast.hpp>
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

namespace beast = boost::beast;
namespace http = beast::http;
namespace net = boost::asio;
using tcp = net::ip::tcp;

// Represents a single connection
class session : public std::enable_shared_from_this<session>
{
public:
    explicit session(tcp::socket socket)
        : socket_(std::move(socket))
    {
    }

    void start()
    {
        do_read();
    }

private:
    void do_read()
    {
        // Reset the parsed message before every read. The buffer is left
        // untouched: it may already hold the start of the next request.
        request_ = {};

        // Read one complete request
        http::async_read(socket_, buffer_, request_,
            beast::bind_front_handler(
                &session::on_read,
                shared_from_this()));
    }

    void on_read(beast::error_code ec, std::size_t bytes_transferred)
    {
        boost::ignore_unused(bytes_transferred);

        // This means they closed the connection
        if(ec == http::error::end_of_stream)
            return;

        if(ec)
        {
            std::cerr << "Error reading request: " << ec.message() << std::endl;
            return;
        }

        // Process the request and prepare the response
        handle_request();

        // Send the response
        http::async_write(socket_, response_,
            beast::bind_front_handler(
                &session::on_write,
                shared_from_this()));
    }

    void handle_request()
    {
        // Start from an empty response so headers and body from a previous
        // request on this connection do not leak into this one.
        response_ = {};
        response_.version(request_.version());
        response_.keep_alive(request_.keep_alive());
        response_.set(http::field::server, BOOST_BEAST_VERSION_STRING);
        response_.set(http::field::content_type, "text/plain");

        // Example: Simple echo or static content
        if (request_.method() == http::verb::get) {
            response_.result(http::status::ok);
            response_.body() = "Hello from C++ on OVH! You requested: " + std::string(request_.target());
        } else {
            response_.result(http::status::bad_request);
            response_.body() = "Unsupported method";
        }
        response_.prepare_payload();
    }

    void on_write(beast::error_code ec, std::size_t bytes_transferred)
    {
        boost::ignore_unused(bytes_transferred);

        if(ec)
        {
            std::cerr << "Error writing response: " << ec.message() << std::endl;
            return;
        }

        // If we're not keeping the connection alive, close it and stop
        if(!response_.keep_alive())
        {
            beast::error_code ignored_ec;
            socket_.shutdown(tcp::socket::shutdown_send, ignored_ec);
            return;
        }

        // Keep-alive: wait for the next request on the same connection
        do_read();
    }

    tcp::socket socket_;
    beast::flat_buffer buffer_;
    http::request<http::string_body> request_;
    http::response<http::string_body> response_;
};

// Accepts incoming connections and launches the sessions
class listener : public std::enable_shared_from_this<listener>
{
public:
    listener(net::io_context& ioc, tcp::endpoint endpoint)
        : ioc_(ioc), acceptor_(ioc, endpoint)
    {
    }

    void run()
    {
        do_accept();
    }

private:
    void do_accept()
    {
        // Capture a shared_ptr, not `this`: the pending handler is the only
        // thing keeping the listener alive once run_server() drops its pointer.
        acceptor_.async_accept(
            [self = shared_from_this()](beast::error_code ec, tcp::socket socket)
            {
                if(ec == net::error::operation_aborted)
                    return; // The acceptor was closed; stop accepting

                if(!ec)
                    std::make_shared<session>(std::move(socket))->start();
                else
                    std::cerr << "Error accepting connection: " << ec.message() << std::endl;

                // Continue accepting new connections
                self->do_accept();
            });
    }

    net::io_context& ioc_;
    tcp::acceptor acceptor_;
};

// Main server setup
void run_server(net::io_context& ioc, const std::string& address, unsigned short port, int thread_count)
{
    auto const bind_address = net::ip::make_address(address);
    tcp::endpoint endpoint(bind_address, port);

    // Create and launch a listener
    std::make_shared<listener>(ioc, endpoint)->run();

    // Create and launch a group of threads to run the io_context.
    std::vector<std::thread> threads;
    for(int i = 0; i < thread_count; ++i)
        threads.emplace_back(
            [&ioc]
            {
                ioc.run();
            });

    // Wait for all threads to complete.
    for(auto& t : threads)
        t.join();
}
// Example usage in main()
/*
int main()
{
    try
    {
        auto const address = "0.0.0.0";
        unsigned short port = 8080;
        // Use all available cores; hardware_concurrency() may return 0 if unknown
        unsigned const hw = std::thread::hardware_concurrency();
        int thread_count = hw > 0 ? static_cast<int>(hw) : 1;
        net::io_context ioc;
        run_server(ioc, address, port, thread_count);
    }
    catch (const std::exception& e)
    {
        std::cerr << "Exception: " << e.what() << std::endl;
        return EXIT_FAILURE;
    }
}
*/
In this example, the session class handles a single client connection. The listener class accepts new connections and passes them to new session objects. The net::io_context is run by a pool of threads, allowing multiple I/O operations to be processed concurrently without blocking.
OVH Infrastructure Tuning and Configuration
Leveraging OVH’s bare-metal servers provides the necessary control over the operating system and network stack. Key areas for tuning include:
1. Kernel Parameters (sysctl)
These parameters directly influence the network stack’s capacity and efficiency. Apply these using sysctl -w parameter=value or by adding them to /etc/sysctl.conf for persistence.
# Increase the system-wide maximum number of open files (file descriptors).
# Check the current value first; it may already be higher than this.
fs.file-max = 200000
# Raise the ceiling for the per-process file descriptor limit
fs.nr_open = 1000000
# Increase the maximum accept-queue length a listening socket may request
net.core.somaxconn = 4096
# Increase the backlog of half-open (SYN received) connections
net.ipv4.tcp_max_syn_backlog = 2048
# Protect against SYN floods
net.ipv4.tcp_syncookies = 1
# Raise the maximum socket receive/send buffer sizes (all protocols, not only UDP)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# Default receive/send buffer sizes. TCP sockets take their defaults from
# net.ipv4.tcp_rmem and net.ipv4.tcp_wmem instead, and 16 MiB per socket is a
# large default, so check memory use under load.
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
# Increase the per-CPU queue of received packets waiting for the kernel to process them
net.core.netdev_max_backlog = 5000
# Enable TCP Fast Open (if supported and desired). 1: client, 2: server, 3: both
net.ipv4.tcp_fastopen = 3
# Shorten how long orphaned connections stay in FIN_WAIT_2
# (this does not change the TIME_WAIT duration, which is fixed at 60 seconds on Linux)
net.ipv4.tcp_fin_timeout = 30
# Allow reuse of sockets in TIME_WAIT for new outgoing connections
net.ipv4.tcp_tw_reuse = 1
# net.ipv4.tcp_tw_recycle is intentionally not set: it was removed in Linux 4.12
# Increase the maximum number of ARP cache entries
net.ipv4.neigh.default.gc_thresh1 = 1024
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh3 = 4096
# Select the TCP congestion control algorithm (e.g., BBR for better throughput)
# Ensure kernel module is loaded: sudo modprobe tcp_bbr
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
Note on tcp_tw_recycle: older guides recommend it for rapid connection reuse, but it broke connections from clients behind NAT and was removed in Linux 4.12, so it should not be set. tcp_tw_reuse is the safer option; it only affects outgoing connections, such as those from a proxy to its backends. Comments are kept on their own lines because a trailing comment after a value is not reliably ignored in /etc/sysctl.conf. Test thoroughly.
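The fs.file-max and fs.nr_open values only set ceilings. Each process still has its own open-file limit (RLIMIT_NOFILE), whose soft value is often as low as 1024 by default. The sketch below shows one way a server could raise its soft limit to the hard limit at startup; the function name raise_fd_limit is hypothetical, and the hard limit itself still has to be raised outside the process (for example through ulimit or the service manager).

```cpp
#include <sys/resource.h>

// Raise this process's soft open-file limit to its hard limit.
// Returns the resulting soft limit, or 0 if the limits could not be read.
rlim_t raise_fd_limit()
{
    struct rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return 0;

    if (lim.rlim_cur < lim.rlim_max) {
        struct rlimit wanted = lim;
        wanted.rlim_cur = lim.rlim_max;
        // If the kernel refuses the new value, keep reporting the old one.
        if (setrlimit(RLIMIT_NOFILE, &wanted) == 0)
            lim = wanted;
    }
    return lim.rlim_cur;
}
```

Calling this once before the listener starts, and logging the returned value, makes it easy to spot a deployment where the limit is still too low for the expected number of connections.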
2. Network Interface Configuration
Ensure your network interfaces are configured for optimal performance. This includes disabling power-saving features, reviewing offload settings, and potentially tuning interrupt coalescing.
# Check current offload settings for eth0
sudo ethtool -k eth0
# Offloads (TSO/GSO/GRO) normally reduce CPU cost per byte. Disabling them may
# lower latency for some workloads but usually costs throughput, so benchmark
# before and after.
sudo ethtool -K eth0 tso off gso off gro off
# Adjust interrupt coalescing (requires careful benchmarking)
# Lower values mean more frequent interrupts, potentially higher CPU, but lower latency.
# Higher values mean fewer interrupts, lower CPU, but higher latency.
# Example: Set rx-usecs to 0 (disable coalescing)
# sudo ethtool -C eth0 rx-usecs 0 tx-usecs 0
# This is highly hardware-dependent and requires extensive testing.
3. Application Deployment and Load Balancing
A single well-tuned server can hold 50,000 open connections, but serving that many in-flight requests with acceptable latency, and surviving the loss of a host, usually calls for several application servers behind a load balancer.
3.1. Load Balancer Configuration (HAProxy Example)
HAProxy is an excellent choice for high-performance load balancing. Configure it to distribute traffic efficiently across your C++ application instances.
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    # Connection ceiling for the whole process (example value). If unset,
    # HAProxy falls back to a default that may be far below 50,000.
    maxconn 100000

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000 # Increased client timeout
    timeout server 50000 # Increased server timeout
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend http_frontend
    bind *:80
    mode http
    # Increase backlog for the listening socket (capped by net.core.somaxconn)
    backlog 4096
    default_backend http_backend

backend http_backend
    mode http
    # Use roundrobin for even distribution, or leastconn for servers with fewer active connections.
    # balance and option httpchk belong in the backend; they are not valid in a frontend.
    balance roundrobin
    # Closes the server-side connection after each response while keeping
    # client-side keep-alive. Use "option http-keep-alive" instead to reuse
    # backend connections.
    option http-server-close
    # Adjust server timeouts if necessary
    timeout server 50000
    # Configure health checks: expect HTTP 200 from GET /healthz
    option httpchk GET /healthz
    http-check expect status 200
    # List your C++ application servers here
    server app1 192.168.1.10:8080 check
    server app2 192.168.1.11:8080 check
    server app3 192.168.1.12:8080 check
    server app4 192.168.1.13:8080 check
    # Add more servers as needed
Key HAProxy Settings:
- balance roundrobin or leastconn: Distributes load.
- timeout client/server: Generous timeouts to accommodate potentially long-running asynchronous operations.
- option httpchk: Essential for monitoring backend health. Ensure your C++ app has a /healthz endpoint.
- option http-server-close: Can be useful, but test its interaction with your C++ app's keep-alive handling.
- tune.bufsize and tune.maxaccept: Advanced tuning parameters for HAProxy's internal buffer size and the number of connections it accepts per wakeup.
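The health check only works if the application answers on /healthz. A minimal sketch of such routing, kept independent of any HTTP library, could look like the following; route_result and route are hypothetical names, and the non-health branch simply mirrors the echo behaviour of the earlier handler.

```cpp
#include <string>

struct route_result {
    unsigned status;    // HTTP status code to send
    std::string body;   // plain-text response body
};

// Decide how to answer a request from its method and target.
// "/healthz" is the path requested by the HAProxy health check; everything
// else is placeholder application logic.
route_result route(const std::string& method, const std::string& target)
{
    if (target == "/healthz") {
        if (method == "GET" || method == "HEAD")
            return {200, "ok"};
        return {405, "Method Not Allowed"};
    }
    if (method != "GET")
        return {400, "Unsupported method"};
    return {200, "Hello from C++ on OVH! You requested: " + target};
}
```

Keeping the health route cheap and free of downstream dependencies is a deliberate choice here: it tells the load balancer whether the process can serve HTTP at all. A deeper readiness check would be a separate design decision.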
3.2. Scaling Strategy
Start with a modest number of powerful OVH bare-metal servers (e.g., 4-8 cores, 32GB+ RAM). Deploy your C++ application instances on each, configuring the thread pool size (e.g., number of io_context threads) to match or slightly exceed the number of CPU cores available to the application process. Use HAProxy to distribute traffic. Monitor CPU, memory, network I/O, and request latency. Gradually add more application servers and/or more powerful servers as needed, adjusting HAProxy’s backend pool. Auto-scaling can be implemented using orchestration tools like Kubernetes or custom scripts that monitor metrics and adjust server counts.
Monitoring and Performance Analysis
Continuous monitoring is critical. Use tools like:
- Prometheus & Grafana: For collecting and visualizing metrics. Instrument your C++ application to expose metrics (e.g., active connections, request latency, error counts) via an HTTP endpoint.
- System Monitoring Tools: htop, iotop, netstat -anp, ss -s, vmstat, iostat.
- Profiling Tools: perf, gprof, Valgrind (callgrind) for deep dives into performance bottlenecks within the C++ code.
- Load Testing Tools: wrk, k6, vegeta to simulate high concurrency and measure performance under load.
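For the Prometheus instrumentation mentioned in the first item, the application needs to render its counters as text. The sketch below assumes three illustrative metrics (the app_ names are invented for this example) and produces the Prometheus text exposition format. Serving the string over HTTP, typically on a /metrics path, is left to the request handler.

```cpp
#include <atomic>
#include <cstdint>
#include <string>

// Counters updated by the request handlers (names are illustrative).
struct app_metrics {
    std::atomic<std::int64_t> active_connections{0};
    std::atomic<std::uint64_t> requests_total{0};
    std::atomic<std::uint64_t> errors_total{0};
};

// Render the counters in the Prometheus text exposition format:
// a "# TYPE <name> <type>" line followed by "<name> <value>".
std::string render_prometheus(const app_metrics& m)
{
    std::string out;
    out += "# TYPE app_active_connections gauge\n";
    out += "app_active_connections " + std::to_string(m.active_connections.load()) + "\n";
    out += "# TYPE app_requests_total counter\n";
    out += "app_requests_total " + std::to_string(m.requests_total.load()) + "\n";
    out += "# TYPE app_errors_total counter\n";
    out += "app_errors_total " + std::to_string(m.errors_total.load()) + "\n";
    return out;
}
```

A session would increment active_connections when it starts and decrement it when it ends, and bump requests_total once per handled request.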
Key metrics to track:
- Request Latency (p95, p99)
- CPU Utilization (per core and overall)
- Memory Usage
- Network Throughput and Packet Loss
- File Descriptor Usage
- TCP Connection States (ESTABLISHED, TIME_WAIT, etc.)
- Application-specific error rates
Regularly analyze these metrics to identify bottlenecks, whether they lie in the C++ application, the OS configuration, the network, or the load balancer. Iteratively tune parameters and architecture based on this data.
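The p95 and p99 latency figures in the list above can be computed from raw samples in several ways. The sketch below uses the nearest-rank definition, which is one common choice and an assumption here; monitoring systems that work from histograms will report slightly different values.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least `pct`
// percent of all samples are less than or equal to it.
// `pct` is expected in 1..100; returns 0 for an empty input.
double percentile(std::vector<double> samples, unsigned pct)
{
    if (samples.empty())
        return 0.0;

    const std::size_t n = samples.size();
    // rank = ceil(pct * n / 100), computed in integers to avoid rounding issues
    std::size_t rank = (static_cast<std::size_t>(pct) * n + 99) / 100;
    rank = std::min(std::max<std::size_t>(rank, 1), n);

    // Partial sort: only the element at position rank - 1 needs to be in place.
    std::nth_element(samples.begin(), samples.begin() + (rank - 1), samples.end());
    return samples[rank - 1];
}
```

The vector is taken by value because nth_element reorders it. For a long-running server, a bounded sample window or a histogram would be needed to keep memory use fixed.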