Tuning RHEL 9 Network Interface Card (NIC) Rings and Transmit Queues (txqueuelen) for Ultra-Low Latency APIs
Understanding NIC Ring Buffers and Transmit Queues
For enterprise applications demanding ultra-low latency, particularly those serving high-frequency trading platforms, real-time analytics, or critical API endpoints, the performance of the network interface card (NIC) is paramount. Two key, often overlooked, sets of tuning parameters directly affect network throughput and latency: the NIC’s receive (RX) and transmit (TX) ring buffer sizes, and the interface's transmit queue length (txqueuelen). The ring sizes set how many packet descriptors the NIC can hold while it waits for the kernel to process received packets or to transmit queued ones. txqueuelen limits how many packets the kernel can queue for the interface before passing them to the driver. Undersized buffers cause packet drops under bursty load (visible as RX drops), and the resulting retransmissions add latency. Oversized buffers let packets sit in queues longer than necessary.
Assessing Current NIC Configuration
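Before tuning, it’s crucial to understand the current state of your network interfaces. We’ll use the ethtool utility, a standard Linux tool for querying and changing network driver and hardware settings. The examples use eth0. On RHEL 9, interfaces usually have predictable names such as ens3 or enp1s0, so substitute your own (ip link show lists them).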
Displaying NIC Information
To view driver information, including the driver name and version, firmware version, and PCI bus address, use:
sudo ethtool -i eth0
To inspect the current and maximum ring buffer sizes (RX and TX descriptors):
sudo ethtool -g eth0
The output has two sections, each listing the RX, RX Mini, RX Jumbo, and TX ring sizes:
- Pre-set maximums: The largest number of descriptors the driver and hardware support for each ring. Descriptors are entries that point to the memory buffers where packets are stored.
- Current hardware settings: The number of descriptors currently in use.
Depending on the driver and ethtool version, additional fields may appear. Coalescing values such as rx-usecs and rx-frames are not part of this output. They are reported by ethtool -c, covered under Interrupt Coalescing below.
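To compare the two sections at a glance, the following minimal sketch pipes ethtool -g output through awk and prints the current and maximum RX/TX sizes side by side. The function name ring_report is hypothetical. The sketch assumes the two-section layout described above and ignores any extra fields that newer ethtool versions print.

```shell
#!/usr/bin/env bash
# Sketch: summarize `ethtool -g` output as current vs. maximum RX/TX ring sizes.
# ring_report is a hypothetical helper; it reads the ethtool output on stdin.
ring_report() {
  awk '
    /^Pre-set maximums:/          { section = "max"; next }
    /^Current hardware settings:/ { section = "cur"; next }
    # Match only the plain "RX:" and "TX:" lines ("RX Mini:" has "RX" as its first field)
    section != "" && ($1 == "RX:" || $1 == "TX:") {
      value[section, substr($1, 1, 2)] = $2
    }
    END {
      printf "RX current=%s max=%s\n", value["cur", "RX"], value["max", "RX"]
      printf "TX current=%s max=%s\n", value["cur", "TX"], value["max", "TX"]
    }'
}
```

For example, sudo ethtool -g eth0 | ring_report prints one line per ring. A large gap between current and max shows how much headroom the driver allows.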
To view the transmit queue length (txqueuelen) for an interface:
ip link show eth0
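Look for the qlen value in the output (for example, qlen 1000). It is the maximum number of packets the kernel can queue for transmission on this interface before dropping them. Strictly, it is the default limit for queueing disciplines that take their length from the device, such as pfifo_fast. Others, such as fq_codel, use their own limit, so also check the qdisc field in the same output (or run tc qdisc show dev eth0).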
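If you want to read these values from a script, a small awk filter can pull a single attribute, such as qlen or qdisc, out of ip link show output. This is a sketch. The helper names link_attr and txqueuelen_of are hypothetical. The sysfs file /sys/class/net/<interface>/tx_queue_len holds the same value that ip reports as qlen.

```shell
#!/usr/bin/env bash
# Sketch: extract one attribute (such as qlen or qdisc) from `ip link show` output on stdin.
# link_attr and txqueuelen_of are hypothetical helper names.
link_attr() {
  awk -v name="$1" '{
    for (i = 1; i < NF; i++) {
      if ($i == name) {
        v = $(i + 1)
        sub(/\\$/, "", v)   # "ip -o" output joins lines with a backslash
        print v
        exit
      }
    }
  }'
}

# The same value is exposed in sysfs, e.g. txqueuelen_of eth0
txqueuelen_of() {
  cat "/sys/class/net/$1/tx_queue_len"
}
```

For example, ip link show eth0 | link_attr qdisc shows which queueing discipline is active. This tells you whether the qlen value actually limits that queue.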
Tuning Ring Buffers with ethtool
Increasing the number of RX and TX descriptors can significantly reduce packet drops under heavy load, because it provides more buffer space for incoming and outgoing packets. The optimal values depend heavily on the NIC hardware, driver, and workload. For high-performance scenarios, a common starting point is to raise the values toward the pre-set maximums reported by ethtool -g, then measure the effect.
Setting Ring Buffer Sizes
Use the -G option with ethtool to set the ring parameters. For example, to set RX and TX descriptors to 4096 each:
sudo ethtool -G eth0 rx 4096 tx 4096
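Because ethtool -G rejects sizes above the pre-set maximums, a small wrapper can clamp the request first. The following minimal sketch reads the maximums from ethtool -g and passes the smaller of the requested and maximum values to ethtool -G. The helper names clamp_ring, preset_max, and apply_rings are hypothetical. The sketch assumes root privileges and the ethtool -g output layout described earlier.

```shell
#!/usr/bin/env bash
# Sketch: request ring sizes but never exceed the driver's "Pre-set maximums".
# Assumes ethtool is installed and the caller is root; all helper names are hypothetical.

# Print the smaller of a requested ring size and the hardware maximum.
clamp_ring() {
  local requested=$1 max=$2
  if [ "$requested" -gt "$max" ]; then
    echo "$max"
  else
    echo "$requested"
  fi
}

# Print the pre-set maximum for RX or TX from `ethtool -g` output on stdin.
preset_max() {
  awk -v key="$1:" '
    /^Pre-set maximums:/          { in_max = 1; next }
    /^Current hardware settings:/ { in_max = 0 }
    in_max && $1 == key           { print $2; exit }'
}

# Apply clamped RX/TX ring sizes to a device, e.g. apply_rings eth0 8192 8192
apply_rings() {
  local dev=$1 want_rx=$2 want_tx=$3 out max_rx max_tx
  out=$(ethtool -g "$dev") || return 1
  max_rx=$(printf '%s\n' "$out" | preset_max RX)
  max_tx=$(printf '%s\n' "$out" | preset_max TX)
  # Give up if the maximums could not be parsed
  [ -n "$max_rx" ] && [ -n "$max_tx" ] || return 1
  ethtool -G "$dev" rx "$(clamp_ring "$want_rx" "$max_rx")" \
                    tx "$(clamp_ring "$want_tx" "$max_tx")"
}
```

For example, on a NIC whose maximum is 4096, apply_rings eth0 8192 8192 sets both rings to 4096 instead of failing with an invalid-argument error.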
Important Considerations:
- Memory Consumption: Each descriptor consumes memory. Very large values can lead to excessive RAM usage. Monitor system memory after changes.
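- Driver Support: Not all drivers accept arbitrary values. Some may round the requested size, and values above the pre-set maximums are rejected. On many drivers, changing ring sizes briefly resets the link, so apply changes during a maintenance window. Consult your NIC vendor’s documentation.
- Hardware Limitations: Each ring has a hardware limit on the number of descriptors, reported as the pre-set maximums by ethtool -g.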
- Workload Specificity: For very high packet rates (e.g., millions of packets per second), you might need even larger values. For latency-sensitive applications with moderate throughput, excessively large buffers might introduce slight delays due to cache effects.
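Memory use grows roughly with descriptors × queues × buffer size. The following minimal sketch estimates it under the assumptions stated in its comments. The 2048-byte buffer size is an illustrative assumption, not a figure for any particular driver, and rx_buffer_mib is a hypothetical name. You can read the number of queues with ethtool -l eth0. With 8 queues and 2 KiB buffers, going from 512 to 4096 descriptors raises the estimate from 8 MiB to 64 MiB.

```shell
#!/usr/bin/env bash
# Sketch: rough RX buffer memory for a given ring size.
# Assumes one receive buffer of buf_bytes per descriptor (2048 is an illustrative
# default, not a driver fact) and counts RX rings only: they are pre-filled with
# buffers, while TX descriptors point at packets the stack has already allocated.
rx_buffer_mib() {
  local descriptors=$1 queues=$2 buf_bytes=${3:-2048}
  echo $(( descriptors * queues * buf_bytes / 1024 / 1024 ))
}
```

For example, rx_buffer_mib 4096 16 4096 estimates 256 MiB for 16 queues with page-sized buffers. On a NIC with many queues, this is large enough to matter.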
Making Ring Buffer Changes Persistent
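Changes made with ethtool are not persistent across reboots. Note that RHEL 9 manages networking with NetworkManager by default, and systemd-networkd is not part of a standard RHEL 9 installation. Practical ways to persist ring sizes include:
- NetworkManager's ethtool settings in the connection profile.
- A systemd .link file, which systemd-udevd applies (not systemd-networkd).
- A custom systemd service that runs ethtool at boot.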
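Because RHEL 9 uses NetworkManager, one option is to store the ring sizes in the connection profile through nmcli's ethtool.ring-rx and ethtool.ring-tx properties. NetworkManager applies them whenever it activates the profile. The wrapper below is a minimal sketch. persist_rings_nm is a hypothetical name, and the first argument must be the profile name shown by nmcli connection show, which often, but not always, matches the device name.

```shell
#!/usr/bin/env bash
# Sketch: persist ring sizes in a NetworkManager connection profile (RHEL 9 default stack).
# persist_rings_nm is a hypothetical helper; run it as root.
persist_rings_nm() {
  local conn=$1 rx=$2 tx=$3
  nmcli connection modify "$conn" ethtool.ring-rx "$rx" ethtool.ring-tx "$tx" || return 1
  # Reactivate the profile so NetworkManager applies the new ring sizes now
  nmcli connection up "$conn"
}
```

For example, run persist_rings_nm eth0 4096 4096, then check the result with ethtool -g eth0. Reactivating the profile briefly interrupts traffic on that interface.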
Using a systemd .link file
Although .link files live in /etc/systemd/network/, systemd-udevd applies them when a device appears. They therefore work on RHEL 9 even though NetworkManager, not systemd-networkd, manages the interface. (A .network file configures addresses and routes for systemd-networkd and cannot set ring sizes.) Only the first matching .link file in lexical order is applied, so give yours a name that sorts before the shipped 99-default.link, for example /etc/systemd/network/10-eth0.link:
[Match]
# Example MAC address; replace with your NIC's address
MACAddress=00:11:22:33:44:55

[Link]
# A matching file replaces 99-default.link, so keep its naming policies
# (check /usr/lib/systemd/network/99-default.link on your system)
NamePolicy=keep kernel database onboard slot path
MACAddressPolicy=persistent
# Set RX/TX descriptors to 4096
RxBufferSize=4096
TxBufferSize=4096
The settings take effect the next time the device is added, typically at boot.
Alternatively, create a systemd service that runs the ethtool commands at boot. This works for any setting ethtool can change, including the coalescing settings described later.
Create a service file, e.g., /etc/systemd/system/ethtool-eth0.service:
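[Unit]
Description=Set ethtool options for eth0
After=network.target

[Service]
Type=oneshot
# Keep the unit active after ethtool exits; without this, ExecStop runs immediately
RemainAfterExit=yes
ExecStart=/usr/sbin/ethtool -G eth0 rx 4096 tx 4096
# Optional: restore the previous sizes on stop (example values; use your original "Current hardware settings")
ExecStop=/usr/sbin/ethtool -G eth0 rx 1024 tx 1024

[Install]
WantedBy=multi-user.target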
Then, reload systemd so it picks up the new unit, and enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable ethtool-eth0.service
sudo systemctl start ethtool-eth0.service
Verify the settings with ethtool -g eth0 after starting the service.
Tuning Transmit Queue Length (txqueuelen)
The txqueuelen parameter controls the maximum number of packets that can be buffered in the kernel's transmit queue for a given interface, in front of the NIC's TX ring. When the application sends bursts faster than the NIC can transmit, the TX ring fills first and packets then wait in this queue. Once the queue holds txqueuelen packets, further packets are dropped, so a small txqueuelen makes drops more likely during bursts. Conversely, a very large txqueuelen can increase latency by holding packets in the kernel queue longer than necessary.
Setting txqueuelen
You can adjust txqueuelen using the ip command:
sudo ip link set dev eth0 txqueuelen 1000
The default value is often 1000. For ultra-low latency, you might consider a value that balances throughput and latency. For some high-frequency trading scenarios, a smaller value (e.g., 250-500) might be preferred to minimize queuing delays, assuming the NIC ring buffers and CPU can keep up. For general high-throughput, increasing it slightly might be beneficial.
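To put numbers on this trade-off, a full queue drains in about qlen × packet size × 8 / link rate. At 10 Gbit/s with 1500-byte packets, the default 1000 packets represent about 1.2 ms of queueing, compared with about 0.3 ms at 250. The following sketch does this arithmetic. It assumes the queue is full of full-size packets, ignores Ethernet framing overhead, and takes an integer link rate. drain_time_us is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: worst-case time (microseconds) to drain a full transmit queue at line rate.
# Assumes every queued packet has pkt_bytes bytes and ignores framing overhead.
drain_time_us() {
  local qlen=$1 pkt_bytes=$2 link_gbps=$3   # link_gbps: integer Gbit/s
  # bits / (Gbit/s) gives nanoseconds; dividing by 1000 more gives microseconds
  echo $(( qlen * pkt_bytes * 8 / (link_gbps * 1000) ))
}
```

For example, drain_time_us 1000 1500 1 gives 12000. On a 1 Gbit/s link, the default queue can therefore hold about 12 ms of traffic.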
Making txqueuelen Changes Persistent
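Similar to ring buffers, txqueuelen changes made with ip are not persistent. Because RHEL 9 uses NetworkManager rather than systemd-networkd, a systemd .link file is the more reliable systemd-based option.
Using a systemd .link file (TransmitQueueLength=)
The [Link] section of a .network file has no transmit queue setting. Instead, add TransmitQueueLength= to the [Link] section of the interface's .link file. For example, use /etc/systemd/network/10-eth0.link; the name must sort before 99-default.link, because only the first matching .link file is applied. If you already created a .link file for the ring sizes, add the option to that same file:
[Match]
# Example MAC address; replace with your NIC's address
MACAddress=00:11:22:33:44:55

[Link]
# A matching file replaces 99-default.link, so keep its naming policies
NamePolicy=keep kernel database onboard slot path
MACAddressPolicy=persistent
# Set txqueuelen to 1000
TransmitQueueLength=1000
systemd-udevd applies .link files when the device is added, so restarting systemd-networkd has no effect. Reboot for the setting to take effect:
sudo systemctl reboot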
Verify the change with ip link show eth0.
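If .link files are not an option, a udev rule can set the same sysfs attribute (tx_queue_len) that ip link set dev eth0 txqueuelen changes. The following sketch renders such a rule. It matches the NIC by MAC address, so the rule does not depend on whether the kernel name eth0 survives interface renaming. The function name txqlen_udev_rule, the MAC address, and the rules file name in the usage example are all illustrative.

```shell
#!/usr/bin/env bash
# Sketch: render a udev rule that sets tx_queue_len when the interface is added.
# txqlen_udev_rule is a hypothetical helper name.
txqlen_udev_rule() {
  # sysfs reports MAC addresses in lower case, so normalize before matching
  local mac qlen=$2
  mac=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf 'ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="%s", ATTR{tx_queue_len}="%s"\n' "$mac" "$qlen"
}
```

For example, txqlen_udev_rule 00:11:22:33:44:55 500 | sudo tee /etc/udev/rules.d/60-txqueuelen.rules writes the rule. It takes effect the next time the device is added, for example at boot.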
Advanced Considerations: Interrupt Coalescing and CPU Affinity
While ring buffers and txqueuelen are critical, achieving ultra-low latency often requires a holistic approach. Two other key areas are interrupt coalescing and CPU affinity.
Interrupt Coalescing
Interrupt coalescing is a mechanism where the NIC delays sending an interrupt to the CPU until a certain number of packets have arrived or a specific time interval has passed. This reduces the CPU overhead per packet but can increase latency. For low-latency applications, you often want to disable or significantly reduce interrupt coalescing.
Tuning Interrupt Coalescing
Use ethtool -c to view the current interrupt coalescing settings (ethtool -C changes them):
sudo ethtool -c eth0
To disable coalescing (set to 0):
sudo ethtool -C eth0 adaptive-rx off adaptive-tx off rx-usecs 0 tx-usecs 0
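Note: The exact parameters and their availability depend on the NIC driver; adaptive-rx/tx and rx/tx-usecs are common. On most drivers, setting rx-usecs and tx-usecs to 0 with adaptive coalescing off effectively disables coalescing, at the cost of more interrupts and higher CPU usage. To persist these settings, add another ExecStart= line to the systemd service described earlier (a Type=oneshot service may have several), or store them in the NetworkManager connection profile.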
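On RHEL 9, NetworkManager can also hold coalescing settings through its ethtool.coalesce-* properties. The following minimal sketch mirrors the ethtool -C example above. It assumes these properties take integer values, where 0 disables adaptive coalescing. disable_coalescing_nm is a hypothetical helper name. Check the result with ethtool -c afterwards, because driver support varies.

```shell
#!/usr/bin/env bash
# Sketch: persist "coalescing off" in a NetworkManager profile instead of a custom service.
# disable_coalescing_nm is a hypothetical helper; run it as root.
disable_coalescing_nm() {
  local conn=$1
  # 0 turns adaptive coalescing off; rx/tx-usecs 0 mirrors the ethtool -C example
  nmcli connection modify "$conn" \
    ethtool.coalesce-adaptive-rx 0 ethtool.coalesce-adaptive-tx 0 \
    ethtool.coalesce-rx-usecs 0 ethtool.coalesce-tx-usecs 0 || return 1
  # Reactivate the profile so the settings are applied now
  nmcli connection up "$conn"
}
```

For example, run disable_coalescing_nm eth0. As with ring sizes, reactivating the profile briefly interrupts traffic.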
CPU Affinity
To minimize cache misses and context switching overhead, it’s highly beneficial to bind network interrupt handlers and application threads to specific CPU cores. This is known as CPU affinity.
Setting CPU Affinity
This is typically managed through kernel boot parameters (e.g., isolcpus) and application-specific configurations or systemd service files. For network interrupts, you can often map IRQs to specific CPUs. First, find the IRQ for your NIC:
grep eth0 /proc/interrupts
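Then, you can manually assign the IRQ to a CPU core (e.g., core 4) in one of two ways:
- Write a hexadecimal CPU bitmask to /proc/irq/<IRQ_NUMBER>/smp_affinity (core 4 is 10).
- Write a CPU list, such as 4, to /proc/irq/<IRQ_NUMBER>/smp_affinity_list.

Multi-queue NICs usually have one IRQ per queue. Some drivers name their IRQs after the driver or PCI address rather than the interface, so the grep above may need a different pattern. If irqbalance is running, it can move the IRQs again, so stop it or exclude those IRQs or CPUs in its configuration. For persistence, use irqbalance configuration or a systemd service.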
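The lookup-and-pin steps can be scripted. The following sketch collects the IRQ numbers whose /proc/interrupts line mentions a pattern, builds the hexadecimal mask for one CPU, and writes it to each IRQ's smp_affinity file. It assumes the IRQ names contain the pattern and that irqbalance will not undo the change. The mask helper only handles CPUs 0 to 31. Writing the CPU number to smp_affinity_list avoids the mask arithmetic entirely. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find a NIC's IRQs in /proc/interrupts and pin them to one CPU.
# Assumes IRQ names contain the pattern (some drivers use the PCI address or driver name).

# Print IRQ numbers whose /proc/interrupts line mentions the pattern (reads stdin).
irqs_matching() {
  awk -v pat="$1" 'index($0, pat) {
    irq = $1
    sub(/:$/, "", irq)
    if (irq ~ /^[0-9]+$/) print irq
  }'
}

# Hex bitmask for a single CPU, as expected by smp_affinity (CPUs 0-31 only here).
cpu_mask() {
  local cpu=$1
  if [ "$cpu" -lt 0 ] || [ "$cpu" -gt 31 ]; then return 1; fi
  printf '%x\n' $(( 1 << cpu ))
}

# Pin every matching IRQ to one CPU (requires root), e.g. pin_irqs eth0 4
pin_irqs() {
  local pattern=$1 cpu=$2 mask irq
  mask=$(cpu_mask "$cpu") || return 1
  for irq in $(irqs_matching "$pattern" < /proc/interrupts); do
    echo "$mask" > "/proc/irq/$irq/smp_affinity"
  done
}
```

For example, running pin_irqs eth0 4 as root writes 10 to the smp_affinity file of every eth0 IRQ. You can confirm the result with cat /proc/irq/<IRQ_NUMBER>/smp_affinity_list.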
For application threads, use the taskset command or libraries within your application framework to pin threads to specific cores. For example, to run a process on CPU core 4:
taskset -c 4 your_application_command
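If the application runs as a systemd service, the unit-file counterpart of taskset is the CPUAffinity= setting. Unlike taskset, it persists across service restarts. The following sketch writes a drop-in file that sets it. The unit name myapp.service and the helper name write_affinity_dropin are hypothetical. The optional third argument exists only so the function can be exercised outside /etc/systemd/system.

```shell
#!/usr/bin/env bash
# Sketch: pin a systemd-managed application to specific CPUs with a drop-in file.
# write_affinity_dropin and the unit name used with it are hypothetical.
write_affinity_dropin() {
  local unit=$1 cpus=$2 root=${3:-/etc/systemd/system}
  local dir="$root/$unit.d"
  mkdir -p "$dir" || return 1
  # CPUAffinity= accepts CPU indices or ranges, e.g. "4" or "4-5"
  printf '[Service]\nCPUAffinity=%s\n' "$cpus" > "$dir/cpu-affinity.conf"
}
```

For example, run write_affinity_dropin myapp.service 4 as root, then sudo systemctl daemon-reload and restart the service.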
Monitoring and Validation
After applying these tuning parameters, continuous monitoring is essential. Key metrics to watch include:
- Packet Drops: Use ip -s link show eth0 to see interface-level RX/TX drop counters, and netstat -s | grep -i drop (or nstat) for protocol-level drops.
- Latency: Employ tools like ping (for basic RTT), sockperf, or application-level latency measurements.
- Throughput: Use iperf3 or application-specific metrics.
- CPU Usage: Monitor CPU load, especially on cores handling network traffic, using top, htop, or sar.
- NIC Statistics: ethtool -S eth0 provides detailed hardware-level statistics, including various types of errors and drops.
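Drop counters are most useful as rates, so it helps to sample them twice and compare. The following sketch reads the per-interface rx_dropped and tx_dropped counters from sysfs, which are the same interface statistics that ip -s link reports, and prints how much they grew over an interval. drop_delta is a hypothetical name. The SYSFS_NET override exists only for testing and defaults to the real sysfs path.

```shell
#!/usr/bin/env bash
# Sketch: report RX/TX drops an interface accumulated over an interval (default 10 s).
# drop_delta is a hypothetical helper; SYSFS_NET can be overridden for testing.
drop_delta() {
  local dev=$1 interval=${2:-10} base=${SYSFS_NET:-/sys/class/net}
  local stats="$base/$dev/statistics" rx1 tx1 rx2 tx2
  rx1=$(cat "$stats/rx_dropped") && tx1=$(cat "$stats/tx_dropped") || return 1
  sleep "$interval"
  rx2=$(cat "$stats/rx_dropped") && tx2=$(cat "$stats/tx_dropped") || return 1
  echo "$dev rx_dropped=+$(( rx2 - rx1 )) tx_dropped=+$(( tx2 - tx1 )) over ${interval}s"
}
```

For example, run drop_delta eth0 60 before and after a tuning change under the same load. This shows whether the change actually reduced drops.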
Iteratively adjust the ring buffer sizes, txqueuelen, and interrupt coalescing settings based on observed performance and application requirements. The goal is to find the sweet spot that minimizes packet loss and latency without introducing excessive CPU overhead or memory consumption.