Tuning Linux Kernel TCP Stack Parameters (/etc/sysctl.conf) on Ubuntu 22.04 LTS to Handle 50k Concurrent HTTP Requests

Understanding the Bottlenecks for High Concurrency

Handling 50,000 concurrent HTTP requests on a single Ubuntu 22.04 LTS server is a demanding task that pushes the limits of the operating system’s networking stack. The primary bottlenecks typically lie within the Linux kernel’s TCP/IP implementation, specifically in areas related to connection management, memory allocation for buffers, and the handling of network events. Without proper tuning, the system can quickly exhaust resources, leading to dropped connections, high latency, and ultimately, service degradation. This document outlines specific `sysctl` parameters and their rationale for optimizing the TCP stack for such high-load scenarios.

Essential `sysctl` Parameters for TCP Stack Tuning

The primary file for persistent kernel parameter settings is /etc/sysctl.conf. Values in it are applied at boot, so they survive reboots; Ubuntu also reads drop-in files under /etc/sysctl.d/. After modifying /etc/sysctl.conf, you can apply the changes immediately using the command sudo sysctl -p.

1. Network Buffer Sizes and Limits

TCP performance is heavily influenced by the size of the send and receive buffers. Insufficient buffer space can lead to packet loss, especially under high throughput conditions, as the kernel may not have enough memory to hold incoming or outgoing data. Conversely, excessively large buffers can increase latency and memory consumption.

1.1. Maximum Receive Buffer (`net.core.rmem_max`)

This parameter sets the maximum receive buffer size, in bytes, that an application can request with SO_RCVBUF on any socket type. For high concurrency, a larger ceiling lets sockets absorb bursts of incoming data without dropping packets. A common recommendation for high-performance servers is 16 MB or more. Note that the kernel's automatic TCP receive-buffer tuning is bounded by net.ipv4.tcp_rmem rather than by this value.

net.core.rmem_max = 16777216

1.2. Default Receive Buffer (`net.core.rmem_default`)

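This is the default receive buffer size, in bytes, given to a socket that does not set one explicitly. It should stay well below rmem_max: every socket that keeps the default is allowed to queue this much data, so a 16 MB default multiplied across tens of thousands of sockets could exhaust memory. A few hundred kilobytes is a more reasonable starting point, although setting it too low can still cause drops even if rmem_max is high.

net.core.rmem_default = 262144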

1.3. Maximum Send Buffer (`net.core.wmem_max`)

Similar to receive buffers, send buffers are critical for outgoing data. For high concurrency, especially with persistent connections or large responses, sufficient send buffer space is crucial to prevent the application from being bottlenecked by the network. We’ll set this to the same value as rmem_max.

net.core.wmem_max = 16777216

1.4. Default Send Buffer (`net.core.wmem_default`)

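The default send buffer size, in bytes. As with the receive side, keep it well below wmem_max so that tens of thousands of sockets using the default cannot exhaust memory, while still leaving enough room for smooth data transmission.

net.core.wmem_default = 262144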
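The trade-off between buffer size and memory consumption can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are made up for this example, and the result is an upper bound that assumes every connection fills both buffers at once, which real traffic rarely does.

```python
def worst_case_buffer_bytes(connections: int, rcv_bytes: int, snd_bytes: int) -> int:
    """Upper bound on buffer memory if every connection fills both buffers."""
    return connections * (rcv_bytes + snd_bytes)


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


if __name__ == "__main__":
    connections = 50_000
    # Compare a 16 MiB per-direction buffer with a 128 KiB one.
    for label, size in (("16 MiB", 16 * 2**20), ("128 KiB", 128 * 2**10)):
        total = worst_case_buffer_bytes(connections, size, size)
        print(f"{label} per direction: {to_gib(total):,.1f} GiB worst case")
```

For 50,000 connections this works out to about 1,562.5 GiB with 16 MiB buffers and about 12.2 GiB with 128 KiB buffers, which is why a large value makes sense as a ceiling but not as a per-socket default.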

2. TCP Connection Management

High concurrency implies a large number of active and potentially idle TCP connections. The kernel needs to efficiently manage these connections, including their state, retransmission timers, and memory associated with them.

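2.1. Listen Backlog Limit (`net.core.somaxconn`)

This parameter caps the number of fully established connections that can wait in a listening socket's accept queue. It does not limit the total number of connections. For 50,000 concurrent requests, the queue needs to be large enough to absorb connections that arrive faster than the application can accept them. The default was 128 on older kernels and is 4096 since Linux 5.4, which covers the 5.15 kernel shipped with Ubuntu 22.04; either can be too low under connection bursts. A value of 65535 is a common starting point for high-concurrency servers. The application must also request a large backlog in its listen() call, because the kernel uses the smaller of the two values.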

net.core.somaxconn = 65535
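Raising the kernel limit only helps if the application also asks for a large backlog. The following sketch shows that interaction under stated assumptions: effective_backlog and open_listener are hypothetical helper names, and the backlog values in the usage note are examples, not recommendations from any particular server.

```python
import socket


def effective_backlog(requested: int, somaxconn: int) -> int:
    """The kernel caps the backlog passed to listen() at net.core.somaxconn."""
    return min(requested, somaxconn)


def read_somaxconn(path: str = "/proc/sys/net/core/somaxconn") -> int:
    """Read the current kernel limit (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


def open_listener(host: str, port: int, backlog: int) -> socket.socket:
    """Open a TCP listening socket that requests the given accept-queue size."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)  # silently truncated to somaxconn if larger
    return sock
```

For example, an application that requests a backlog of 511 still gets 511 even with somaxconn at 65535, while one that requests 65535 on a kernel limited to 4096 gets 4096.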

2.2. TCP SYN Backlog (`net.ipv4.tcp_max_syn_backlog`)

This controls the maximum number of outstanding SYN requests that can be queued. When a client sends a SYN packet, the server responds with SYN-ACK and adds the connection to this backlog. If the backlog is full, new SYN requests may be dropped. For high concurrency, this needs to be significantly increased. A value of 4096 or higher is recommended.

net.ipv4.tcp_max_syn_backlog = 4096
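Whether the queues are actually overflowing can be checked from the kernel's own counters. The sketch below parses text in the format of /proc/net/netstat and extracts the TcpExt ListenOverflows and ListenDrops counters; the function names are illustrative, and the assumption is that the file uses alternating header and value lines as on current Linux kernels.

```python
def parse_proc_net_netstat(text: str) -> dict[str, int]:
    """Parse alternating 'Prefix: names...' / 'Prefix: values...' line pairs."""
    lines = text.strip().splitlines()
    counters: dict[str, int] = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        _, nums = values.split(":", 1)
        for name, num in zip(names.split(), nums.split()):
            counters[f"{prefix}.{name}"] = int(num)
    return counters


def listen_queue_drops(text: str) -> dict[str, int]:
    """Return the counters that grow when a listen queue is full."""
    counters = parse_proc_net_netstat(text)
    return {
        "overflows": counters.get("TcpExt.ListenOverflows", 0),
        "drops": counters.get("TcpExt.ListenDrops", 0),
    }


if __name__ == "__main__":
    with open("/proc/net/netstat") as f:
        print(listen_queue_drops(f.read()))
```

The counters are cumulative since boot, so sample them twice during a load test and compare; values that keep growing suggest the backlog settings or the application's accept rate need attention.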

2.3. TCP Connection Reuse (`net.ipv4.tcp_tw_reuse`)

When a TCP connection is closed, the side that closes first enters the TIME_WAIT state to ensure that any delayed packets from the previous incarnation of the connection are handled correctly. If the server is closing connections rapidly, many sockets can end up in TIME_WAIT, consuming resources and tying up local ports. Enabling tcp_tw_reuse allows the kernel to reuse sockets in TIME_WAIT for new outgoing connections if it is deemed safe (i.e., the new connection's timestamp is later than the last timestamp seen on the previous connection). It relies on TCP timestamps (net.ipv4.tcp_timestamps, enabled by default) and has no effect on connections the server accepts. It mainly helps when the server itself opens many outbound connections, for example as a reverse proxy talking to upstream backends.

net.ipv4.tcp_tw_reuse = 1

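2.4. FIN-WAIT-2 Timeout (`net.ipv4.tcp_fin_timeout`)

Despite what the name suggests, this parameter does not change the TIME_WAIT duration. It controls how long an orphaned connection (one the application has already closed) stays in the FIN-WAIT-2 state waiting for the peer's FIN. While tcp_tw_reuse helps with TIME_WAIT, reducing the FIN-WAIT-2 timeout from its default of 60 seconds can also free up resources faster. A value of 30 seconds is often sufficient.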

net.ipv4.tcp_fin_timeout = 30
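To see whether TIME_WAIT or FIN-WAIT-2 sockets are actually piling up, you can count connections by state. The sketch below parses text in the format of /proc/net/tcp, where the fourth column is the state as a hexadecimal code; count_tcp_states is a name chosen for this example. With tens of thousands of sockets, ss -s is much faster than reading this file, so treat this as an illustration of the data rather than a monitoring tool.

```python
from collections import Counter

# State codes as used in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp: str) -> Counter:
    """Count sockets per TCP state from /proc/net/tcp-style text."""
    counts: Counter = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    with open("/proc/net/tcp") as f:  # IPv4 only; /proc/net/tcp6 holds IPv6
        for state, n in count_tcp_states(f.read()).most_common():
            print(f"{state:12} {n}")
```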

3. Network Event Handling and Resource Limits

Efficiently handling a large number of network events (like incoming packets or connection state changes) is crucial. This involves the kernel’s ability to process these events without getting bogged down.

3.1. File Descriptor Limits (`fs.file-max`)

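Each network connection consumes a file descriptor. With 50,000 concurrent connections, the server needs at least that many descriptors, plus those for log files, listening sockets, and any upstream connections. The system-wide limit on open file handles is controlled by fs.file-max and should sit well above the expected number of connections, e.g., 100,000 or more. On Ubuntu 22.04 the default is usually already far higher than that, so check the current value with sysctl fs.file-max first and add the line below only if the current value is lower; writing a smaller number would reduce the limit.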

fs.file-max = 100000

Additionally, the limits for individual processes must be increased. This is typically done in /etc/security/limits.conf. For example:

* soft nofile 100000
* hard nofile 100000
root soft nofile 100000
root hard nofile 100000

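The wildcard entries do not apply to root, which is why root is listed separately. For finer control, apply the limits to the specific user or group running your web server process (e.g., www-data) instead of using the wildcard. Also note that limits.conf is applied by PAM at login, so a web server started as a systemd service does not pick it up; set LimitNOFILE= in the service unit instead.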
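A quick way to confirm that the per-process limit actually reached the server process is to query it from inside that process. The sketch below uses Python's standard resource module; fd_headroom and its reserved parameter are hypothetical, with reserved standing in for descriptors used by logs, listening sockets, and upstream connections.

```python
import resource


def fd_headroom(soft_limit: int, expected_connections: int, reserved: int = 1024) -> int:
    """Descriptors left over after connections and a reserve; negative means too low."""
    return soft_limit - (expected_connections + reserved)


def current_nofile() -> tuple[int, int]:
    """Return the (soft, hard) RLIMIT_NOFILE values for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = current_nofile()
    spare = fd_headroom(soft, expected_connections=50_000)
    print(f"soft={soft} hard={hard} headroom={spare}")
    if spare < 0:
        print("soft limit is too low for 50,000 connections")
```

With a soft limit of 100,000 the headroom is 48,976 descriptors; with the common default of 1,024 it is deeply negative, and the process would start failing accept() calls long before reaching the target.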

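3.2. Epoll Watch Limit (`fs.epoll.max_user_watches`)

Modern high-performance servers often use event-driven I/O models like epoll, which lets a single process monitor many file descriptors efficiently. The fs.epoll.max_user_watches parameter limits the total number of file descriptors that a single user can register across all epoll instances. For 50,000 connections, a value of 100,000 or more is appropriate. The kernel derives the default from available memory (each watch costs roughly 160 bytes on a 64-bit kernel), so on a server with several gigabytes of RAM it is often already higher. Check it with sysctl fs.epoll.max_user_watches and add the line below only if the current value is lower.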

fs.epoll.max_user_watches = 100000
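Each descriptor registered with an event loop counts as one watch against this limit. The sketch below shows a single registration and readiness check using Python's selectors module, which uses epoll on Linux; wait_for_readable and the "conn-a" label are invented for the example, and a local socket pair stands in for a client connection.

```python
import selectors
import socket


def wait_for_readable(timeout: float = 1.0) -> list:
    """Register one socket, make it readable, and return (label, data) pairs."""
    sel = selectors.DefaultSelector()  # epoll-backed on Linux
    a, b = socket.socketpair()         # stand-in for a client connection
    try:
        a.setblocking(False)
        # One register() call corresponds to one watch.
        sel.register(a, selectors.EVENT_READ, data="conn-a")
        b.send(b"ping")
        ready = sel.select(timeout)
        return [(key.data, key.fileobj.recv(16)) for key, _ in ready]
    finally:
        sel.unregister(a)
        sel.close()
        a.close()
        b.close()


if __name__ == "__main__":
    print(wait_for_readable())
```

A real server registers every accepted connection the same way, so 50,000 connections mean at least 50,000 watches for the user running the server.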

3.3. TCP Fast Open (`net.ipv4.tcp_fastopen`)

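TCP Fast Open (TFO) allows data to be sent in the initial SYN packet once a client holds a cookie from an earlier connection, saving a round trip on subsequent connections. This can be beneficial for HTTP, especially with keep-alive disabled or for short-lived connections. The value is a bitmask: 1 enables client-side TFO, 2 enables server-side TFO, and 3 enables both. For a server, enabling server-side TFO (2) is the primary goal. The sysctl only permits TFO; the server application must also enable it on its listening socket with the TCP_FASTOPEN socket option.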

net.ipv4.tcp_fastopen = 2
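The bitmask and the application-side step can be sketched as follows. The helper names, the backlog, and the tfo_queue length of 256 are assumptions made for illustration; socket.TCP_FASTOPEN is the standard constant Python exposes on Linux, and most web servers offer their own configuration option for this instead.

```python
import socket

TFO_CLIENT = 0x1  # bit 0: allow TFO on outgoing connections
TFO_SERVER = 0x2  # bit 1: allow TFO on listening sockets


def decode_tcp_fastopen(value: int) -> dict[str, bool]:
    """Decode the client and server bits of net.ipv4.tcp_fastopen."""
    return {"client": bool(value & TFO_CLIENT), "server": bool(value & TFO_SERVER)}


def open_tfo_listener(host: str, port: int, backlog: int = 1024,
                      tfo_queue: int = 256) -> socket.socket:
    """Open a listener that accepts TFO connections (Linux only)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    # tfo_queue bounds the number of pending TFO requests on this socket.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_FASTOPEN, tfo_queue)
    sock.listen(backlog)
    return sock
```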

3.4. TCP Congestion Control Algorithm

Although it is not a buffer-size setting, the choice of congestion control algorithm can affect performance under load. Ubuntu 22.04 LTS defaults to cubic, which is a good general-purpose choice. BBR (Bottleneck Bandwidth and Round-trip propagation time) can do better on paths with a high bandwidth-delay product or with packet loss that is not caused by congestion. BBR is available in recent Linux kernels and can be selected via sysctl once the kernel provides it, either built in or as the tcp_bbr module.

# To check current algorithm:
sysctl net.ipv4.tcp_congestion_control

# To set to BBR (if available and compiled):
# net.ipv4.tcp_congestion_control = bbr

On kernels that ship BBR as a module, it appears in net.ipv4.tcp_available_congestion_control only after tcp_bbr has been loaded (for example with sudo modprobe tcp_bbr). If you make BBR persistent in sysctl.conf, make sure the module is also loaded at boot, for instance through an entry in /etc/modules-load.d/. For this guide, we’ll assume the default cubic is sufficient unless specific network conditions warrant further investigation.
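Before switching algorithms, it helps to confirm that the kernel currently offers the one you want. This minimal sketch checks a name against the space-separated list the kernel exposes in /proc/sys/net/ipv4/tcp_available_congestion_control; can_select is a hypothetical helper name.

```python
AVAILABLE_PATH = "/proc/sys/net/ipv4/tcp_available_congestion_control"


def can_select(name: str, available_text: str) -> bool:
    """True if the algorithm is in the kernel's space-separated list."""
    return name in available_text.split()


if __name__ == "__main__":
    with open(AVAILABLE_PATH) as f:
        available = f.read()
    print("available:", available.strip())
    print("bbr selectable:", can_select("bbr", available))
```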

Applying the Changes

To apply these settings, first create or edit the /etc/sysctl.conf file and add the parameters listed above. Then, execute the following command to load the new settings:

sudo sysctl -p

sysctl -p prints each key and value as it is applied. To verify an individual setting afterwards, query it by name rather than searching the full output of sysctl -a:

sysctl net.core.rmem_max
sysctl net.core.somaxconn
# and so on for other parameters
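Checking parameters one at a time gets tedious, so the comparison can be scripted. The sketch below parses sysctl.conf-style text and compares each entry with the live value under /proc/sys; the function names are illustrative, and it assumes plain key = value lines with # or ; comments, ignoring less common syntax.

```python
from pathlib import Path


def parse_sysctl_conf(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, skipping blanks and # or ; comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key: str) -> Path:
    """Map a sysctl key such as net.core.somaxconn to its /proc/sys file."""
    return Path("/proc/sys") / key.replace(".", "/")


def find_mismatches(expected: dict[str, str]) -> dict[str, tuple]:
    """Return {key: (expected, actual)} for settings that differ or are missing."""
    mismatches = {}
    for key, want in expected.items():
        try:
            have = " ".join(proc_path(key).read_text().split())
        except OSError:
            have = None  # key not present on this kernel or not readable
        if have != want:
            mismatches[key] = (want, have)
    return mismatches


if __name__ == "__main__":
    conf = Path("/etc/sysctl.conf").read_text()
    for key, (want, have) in find_mismatches(parse_sysctl_conf(conf)).items():
        print(f"{key}: expected {want}, found {have}")
```

An empty result means every setting in the file matches the running kernel.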

Application-Level Considerations

While kernel tuning is essential, it’s only one part of the equation. The web server or application handling these 50,000 concurrent requests must also be configured optimally. This includes:

Worker Processes/Threads: Ensure your web server (e.g., Nginx, Apache) is configured with an adequate number of worker processes or threads to handle the load without becoming CPU-bound.
Event-Driven Architecture: Using event-driven I/O models (like Nginx’s default or Apache’s event MPM) is crucial for handling many connections with fewer resources.
Keep-Alive Settings: Properly configuring HTTP keep-alive can reduce the overhead of establishing new TCP connections for subsequent requests, but excessively long keep-alive timeouts can tie up resources.
Application Performance: The application logic itself must be efficient. Slow application responses will lead to longer connection durations and higher resource utilization, regardless of kernel tuning.
Memory Allocation: Ensure the server has sufficient RAM to accommodate the increased buffer sizes and the memory footprint of the application and its connections.

Monitoring and Iteration

Tuning is an iterative process. After applying these changes, it’s critical to monitor the system’s performance under load. Key metrics to watch include:

CPU Usage: High CPU usage might indicate the application or kernel is struggling.
Memory Usage: Monitor RAM and swap usage. Increased buffer sizes will consume more memory.
Network Statistics: Use tools like netstat -s, ss -s, and sar -n DEV to observe packet drops, retransmissions, and connection states (especially TIME_WAIT and SYN_RECV).
Application Latency: Measure the response times of your HTTP requests.
Connection Counts: Monitor the number of established, listening, and TIME_WAIT connections.

Tools like htop, atop, sar, and Prometheus/Grafana are invaluable for this monitoring. If performance issues persist, further adjustments to these parameters, or even more advanced kernel tuning (e.g., per-CPU network queues, interrupt balancing), may be necessary. Always test changes in a staging environment before deploying to production.
