Troubleshooting High Load Average and I/O Wait Spikes on Rocky Linux 9: Tuning ext4 and XFS Mount Parameters
Identifying the Root Cause: Load Average vs. I/O Wait
High load average on a Linux system, Rocky Linux 9 included, is a symptom rather than a diagnosis. On Linux it counts tasks that are running or waiting for a CPU, plus tasks in uninterruptible sleep (state D), which usually means they are blocked on I/O. A load average that stays above the number of CPU cores suggests a bottleneck, but it does not say whether the bottleneck is CPU or storage, so separating CPU-bound from I/O-bound load is the first step in troubleshooting. High I/O wait (wa in `top`, %iowait in `iostat`) is the share of time CPUs sat idle while at least one I/O request, usually disk I/O, was outstanding. This guide focuses on tuning filesystem mount options for ext4 and XFS to mitigate these I/O-related spikes.
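One quick way to tell the two cases apart is to count tasks by state: Linux includes tasks in uninterruptible sleep (state D) in the load average alongside runnable ones, and a pile-up of D-state tasks usually points at storage. The sketch below is a minimal illustration; the function name count_task_states is made up for this example, and it assumes input in the two-column form produced by ps -eo state=,comm=.

```shell
# count_task_states: summarize runnable (R) and uninterruptible (D) tasks.
# Reads "STATE COMMAND" lines on stdin (hypothetical helper for illustration).
count_task_states() {
    awk '
        $1 ~ /^R/ { r++ }   # running or waiting for a CPU
        $1 ~ /^D/ { d++ }   # uninterruptible sleep, usually blocked on I/O
        END { printf "runnable=%d uninterruptible=%d\n", r, d }
    '
}

# Usage on a live system:
#   ps -eo state=,comm= | count_task_states
```

If the uninterruptible count tracks the load average during a spike, the rest of this guide applies; if the runnable count dominates, the problem is more likely CPU contention.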
Initial Diagnostics: Gathering System Metrics
Before altering any configurations, establish a baseline and pinpoint the problematic I/O patterns. The following commands provide essential insights:
System-wide Performance Overview
The top command offers a real-time snapshot. Pay close attention to the load average (load average: X.XX, Y.YY, Z.ZZ) and the wa value in the %Cpu(s) line.
top - 10:30:00 up 10 days, 2:15, 1 user, load average: 1.50, 1.60, 1.70
Tasks: 250 total, 1 running, 249 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.0 us, 2.0 sy, 0.0 ni, 92.0 id, 1.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 16000.0 total, 8000.0 free, 4000.0 used, 4000.0 buff/cache
MiB Swap: 2000.0 total, 2000.0 free, 0.0 used. 11000.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1234 mysql 20 0 500000 10000 5000 S 0.5 0.1 12:34:56 mysqld
5678 nginx 20 0 20000 5000 2000 S 0.2 0.0 5:01:23 nginx
9012 appuser 20 0 10000 3000 1000 R 0.1 0.0 0:00:01 script.sh
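For logging or alerting, the wa value can be pulled out of top's batch output. This is a rough sketch rather than a robust parser: cpu_iowait is a hypothetical helper name, and it assumes the %Cpu(s) line layout shown above with a period as the decimal separator.

```shell
# cpu_iowait: print the "wa" percentage from top's %Cpu(s) summary line.
# Reads top batch-mode output on stdin (hypothetical helper for illustration).
cpu_iowait() {
    awk '/^%Cpu/ {
        # Values precede their labels, so print the field before "wa".
        for (i = 2; i <= NF; i++)
            if ($i ~ /^wa,?$/) { print $(i - 1); exit }
    }'
}

# Usage on a live system:
#   top -bn1 | cpu_iowait
```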
The iostat command provides more granular disk I/O statistics. Use it with a short interval to capture spikes.
iostat -dx 5
Linux 5.14.0-284.11.1.el9_2.x86_64 (rocky9-prod-web)  04/15/2024  _x86_64_  (8 CPU)

Device   r/s    w/s  rkB/s  wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  await  %util
sda     0.00   0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00   0.00   0.00
sdb     0.00  10.00   0.00  40.00    0.00    0.00   0.00   0.00     0.00     5.00    0.00   5.00  10.00
sdc     0.00   0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00   0.00   0.00
...
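Focus on %util (the share of time the device had requests in flight), await (average time per request in milliseconds, queueing included; newer sysstat releases report only the separate r_await and w_await values), and aqu-sz (average queue length). High values here, especially when they line up with load average spikes, indicate disk I/O as the primary bottleneck. Keep in mind that %util near 100% means saturation on a single disk, but SSDs and RAID arrays serve requests in parallel and may still have headroom at that figure.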
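When watching many devices, it can help to filter iostat output down to the busy ones. The sketch below is one way to do that under stated assumptions: busy_devices is a hypothetical helper, the default threshold of 80 is an arbitrary starting point, and the %util column is located by its header name because the column order differs between sysstat versions.

```shell
# busy_devices: list devices whose %util meets or exceeds a threshold.
# Reads `iostat -dx` output on stdin (hypothetical helper for illustration).
busy_devices() {
    threshold="${1:-80}"   # assumed default; tune for your hardware
    awk -v limit="$threshold" '
        # Header row: remember which column holds %util.
        $1 == "Device" || $1 == "Device:" {
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        # Data rows: print device name and %util when over the limit.
        col && NF >= col && $col + 0 >= limit + 0 { print $1, $col }
    '
}

# Usage on a live system (two reports, five seconds apart):
#   iostat -dx 5 2 | busy_devices 80
```

The first report from iostat covers the time since boot, so only the later reports reflect current activity.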
Tuning Filesystem Mount Options: ext4
Rocky Linux 9 installs XFS by default and fully supports ext4 as well. For ext4, several mount options can influence I/O behavior. The most impactful for reducing I/O wait are related to journaling and writeback behavior.
Understanding Key ext4 Mount Options
data=writeback: The most aggressive journaling mode. Only metadata is journaled, and data blocks may reach the disk before or after the metadata that references them. This reduces I/O overhead, but after a crash recently written files can contain stale data even though the filesystem structure itself stays consistent. It suits performance-critical workloads where data integrity is managed at the application level or where the data can be regenerated.
data=ordered (default): Only metadata is journaled, but data blocks are forced out to the filesystem before the corresponding metadata is committed to the journal. This provides a good balance between performance and data integrity.
data=journal: Both data and metadata are journaled. This offers the highest level of data integrity but incurs the most I/O overhead, since data is written twice.
commit=seconds: Controls how often the journal is committed to disk. The default is 5 seconds. Increasing this value (e.g., to 30 or 60) reduces the frequency of write bursts and can smooth out I/O spikes, but it also widens the window of potential data loss in a crash.
nobarrier: Disables write barriers, the cache flushes ext4 issues so that journal writes reach stable storage in order. This can improve performance, but it is only safe when the write cache is non-volatile, for example a RAID controller with battery-backed cache. Used on storage with a volatile write cache, it can lead to severe corruption after a power loss.
Applying ext4 Mount Options
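Some of these options can be tested without a reboot by remounting the filesystem, but ext4 refuses to change the data journaling mode on a remount. A longer commit interval can be applied live, while data=writeback requires unmounting and mounting again. For example, on a filesystem mounted at /data:
# First, check current mount options
mount | grep /data
# Example output (the exact option list varies):
# /dev/sdb1 on /data type ext4 (rw,relatime)

# Apply a longer commit interval with a live remount
sudo mount -o remount,commit=30 /data

# Switching the data mode needs a full unmount and mount
# (replace /dev/sdb1 with your actual device and stop services using /data first)
sudo umount /data
sudo mount -o data=writeback,commit=30 /dev/sdb1 /data

# Verify the change
mount | grep /data
# Expected output:
# /dev/sdb1 on /data type ext4 (rw,relatime,data=writeback,commit=30)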
To make these changes permanent, edit the /etc/fstab file. Add or modify the options for the relevant filesystem entry. For example:
# /etc/fstab
UUID=...  /         ext4  defaults,errors=remount-ro            0 1
UUID=...  /data     ext4  rw,relatime,data=writeback,commit=30  0 2
UUID=...  /var/log  ext4  rw,relatime,data=ordered,commit=60    0 2
Caution: Using data=writeback or a very high commit value increases the risk of data loss or corruption during unexpected shutdowns. Assess your application’s tolerance for this risk. For critical data, stick to data=ordered or consider XFS.
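Because these options are easy to forget once they are in place, a periodic audit of fstab can be worthwhile. The following is a minimal sketch, not a complete checker: risky_ext4_entries is a hypothetical helper that reads fstab-formatted text and reports ext4 entries using data=writeback, nobarrier, or barrier=0.

```shell
# risky_ext4_entries: report ext4 fstab entries with reduced crash safety.
# Reads fstab-formatted text on stdin (hypothetical helper for illustration).
risky_ext4_entries() {
    awk '
        /^[[:space:]]*#/ || NF < 4 { next }   # skip comments and short lines
        $3 == "ext4" && $4 ~ /(^|,)(data=writeback|nobarrier|barrier=0)(,|$)/ {
            print $2 ": " $4                  # mount point and its options
        }
    '
}

# Usage on a live system:
#   risky_ext4_entries < /etc/fstab
```

An empty result means no ext4 entry in the file uses any of the three options.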
Tuning Filesystem Mount Options: XFS
XFS is known for its high performance, especially with large files and concurrent I/O. It journals metadata only, and its tuning parameters differ from ext4's.
Understanding Key XFS Mount Options
logbufs=N: Specifies the number of in-memory log buffers. Valid values range from 2 to 8, and the default is 8, so on current kernels there is normally nothing to gain by raising it.
logbsize=N: Sets the size of each in-memory log buffer. The default is usually 32k; filesystems with version 2 logs also accept 64k, 128k, and 256k, and the value must be a multiple of the log stripe unit chosen at mkfs.xfs time. A larger value can help metadata-heavy write workloads by letting more changes accumulate before each log write.
noatime/relatime: These options control the update of file access times. noatime disables access time updates entirely, while relatime (the usual default) only updates the access time if it is older than the last modification or more than 24 hours old. Using noatime can reduce small writes, especially on busy systems with many reads.
swalloc: A mount option that takes no value. It rounds data allocations up to stripe-width boundaries when a file larger than the stripe width is extended, so it only matters on striped storage such as RAID. The size of the log itself is not a mount option; it is set at filesystem creation with mkfs.xfs.
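A mount fails if a log buffer option is out of range, so it can be useful to sanity-check values before editing fstab. The sketch below encodes the ranges given in the kernel's XFS documentation as understood here (logbufs from 2 to 8; logbsize of 16k, 32k, 64k, 128k, or 256k for version 2 logs). Both function names are hypothetical, and the check does not cover the requirement that logbsize be a multiple of the log stripe unit.

```shell
# valid_logbufs: succeed if the value is within the documented 2-8 range.
# (hypothetical helper for illustration)
valid_logbufs() {
    case "$1" in
        [2-8]) return 0 ;;
        *)     return 1 ;;
    esac
}

# valid_logbsize: succeed for the sizes accepted by version 2 logs,
# written with the "k" suffix. Does not check the log stripe unit.
valid_logbsize() {
    case "$1" in
        16k|32k|64k|128k|256k) return 0 ;;
        *)                     return 1 ;;
    esac
}

# Usage:
#   valid_logbsize 256k && echo "logbsize looks acceptable"
```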
Applying XFS Mount Options
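As with ext4, only some options take effect on a remount. noatime can be applied live, while the log buffer settings are fixed when the filesystem is mounted and need a full unmount and mount. For example, to disable access time updates and enlarge the log buffers on a filesystem mounted at /data:
# Check current mount options
mount | grep /data
# Example output:
# /dev/sdc1 on /data type xfs (rw,relatime,attr2,inode64,noquota)

# Disable access time updates with a live remount
sudo mount -o remount,noatime /data

# Larger log buffers need a full unmount and mount
# (replace /dev/sdc1 with your actual device and stop services using /data first)
sudo umount /data
sudo mount -o noatime,logbsize=256k /dev/sdc1 /data

# Verify the change
mount | grep /data
# Expected output (the exact option list varies):
# /dev/sdc1 on /data type xfs (rw,noatime,attr2,inode64,logbufs=8,logbsize=256k,noquota)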
To make these changes permanent, edit /etc/fstab. For XFS, geometry such as the log size and stripe unit is fixed when the filesystem is created with mkfs.xfs, but mount options like noatime and logbsize can be set in fstab.
# /etc/fstab
UUID=...  /      xfs  defaults,inode64,noquota                  0 1
UUID=...  /data  xfs  rw,noatime,inode64,noquota,logbsize=256k  0 2
Note: logbufs and logbsize are mount-time options, not mkfs.xfs options. What is fixed at creation time is the on-disk log itself, including its size and stripe unit, and logbsize must be a multiple of that stripe unit. For existing filesystems, consider the impact of noatime on applications that rely on access times.
Advanced Considerations and Further Tuning
Beyond mount options, several other factors contribute to I/O performance and can be tuned:
I/O Scheduler Tuning
The I/O scheduler determines the order in which I/O requests are sent to the storage device. For NVMe drives and other fast SSDs, the none or mq-deadline schedulers are often recommended; kyber also targets fast devices. For HDDs, mq-deadline or bfq is usually a better fit. You can check and set the scheduler for a device:
# Check current scheduler for a device (e.g., sdb)
cat /sys/block/sdb/queue/scheduler
# Example output (the active scheduler is shown in brackets):
# mq-deadline [bfq] none

# Set scheduler (e.g., to none); writing to sysfs requires root privileges
echo none | sudo tee /sys/block/sdb/queue/scheduler

# To make this persistent across reboots, use udev rules.
# Create a file like /etc/udev/rules.d/60-ioschedulers.rules
# /etc/udev/rules.d/60-ioschedulers.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="nvme[0-9]n[0-9]", ATTR{queue/scheduler}="none"
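After a reboot or a udev reload, it is worth confirming which scheduler each device actually ended up with. The kernel marks the active scheduler with square brackets in the sysfs file, and the sketch below extracts it; active_scheduler is a hypothetical helper name, and it prints nothing if the input contains no bracketed entry.

```shell
# active_scheduler: print the bracketed (active) entry from a sysfs
# scheduler string such as "mq-deadline [bfq] none".
# (hypothetical helper for illustration)
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage on a live system:
#   for f in /sys/block/*/queue/scheduler; do
#       printf '%s: %s\n' "$f" "$(active_scheduler < "$f")"
#   done
```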
Swappiness and Memory Management
Excessive swapping to disk can cause severe I/O wait. While not directly a filesystem tuning parameter, reducing the kernel’s tendency to swap can alleviate I/O pressure.
# Check current swappiness
cat /proc/sys/vm/swappiness

# Set swappiness temporarily (e.g., to 10)
sudo sysctl vm.swappiness=10

# Make permanent by editing /etc/sysctl.conf or a file in /etc/sysctl.d/
# Example: /etc/sysctl.d/99-swappiness.conf
# vm.swappiness = 10
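Before lowering swappiness, it helps to confirm that swap is in use at all. A minimal sketch, assuming input in the format of /proc/meminfo (the SwapTotal and SwapFree fields, reported in kB); swap_used_kib is a hypothetical helper name.

```shell
# swap_used_kib: print swap currently in use, in KiB.
# Reads /proc/meminfo-formatted text on stdin (hypothetical helper).
swap_used_kib() {
    awk '
        $1 == "SwapTotal:" { total = $2 }
        $1 == "SwapFree:"  { free = $2 }
        END { print total - free }
    '
}

# Usage on a live system:
#   swap_used_kib < /proc/meminfo
```

A result of 0 suggests swapping is not contributing to the I/O wait, in which case changing vm.swappiness is unlikely to help.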
Application-Level Tuning
Ultimately, the most effective tuning often occurs at the application level. Database tuning (e.g., buffer pool sizes, query optimization), web server configuration (e.g., caching, connection limits), and application code can significantly reduce I/O load. For instance, a database application might benefit from larger buffer pools to keep frequently accessed data in RAM, reducing disk reads.
Conclusion and Best Practices
Troubleshooting high load average and I/O wait spikes on Rocky Linux 9 requires a systematic approach. Start with diagnostics to confirm I/O as the bottleneck. Then, carefully evaluate filesystem mount options for ext4 and XFS, understanding the trade-offs between performance and data integrity. For ext4, data=writeback and adjusted commit intervals are powerful but risky. For XFS, tuning log buffers and using noatime can yield benefits. Always test changes in a staging environment before applying them to production. Remember to consider I/O scheduler settings and application-level optimizations for a holistic performance tuning strategy.