On Tue, Jun 17, 2025 at 01:54:43PM +0200, Christian Theune wrote:
On 17. Jun 2025, at 07:44, Christian Theune <ct@flyingcircus.io> wrote:
On 16. Jun 2025, at 14:15, Carlos Maiolino <cem@kernel.org> wrote:
On Mon, Jun 16, 2025 at 12:09:21PM +0200, Christian Theune wrote:
# xfs_info /tmp/
meta-data=/dev/vdb1              isize=512    agcount=8, agsize=229376 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0    bigtime=0 inobtcount=0 nrext64=0
         =                       exchange=0
data     =                       bsize=4096   blocks=1833979, imaxpct=25
         =                       sunit=1024   swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
This is worrisome. Your journal is only 10 MiB, which can easily keep stalling IO while waiting for log space to be freed; depending on the nature of the machine this is easy to trigger. I'm curious though how you made this FS, because 2560 blocks is below the minimum log size that xfsprogs has allowed since (/me goes look into the git log) 2022, xfsprogs 5.15.
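For reference, a quick back-of-the-envelope check of that figure against the log section of the xfs_info output above (log blocks times log block size):

    2560 blocks * 4096 B/block = 10485760 B = 10 MiB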
FWIW, one of the reasons the minimum journal size was increased is the latency/stalls that happen while waiting for free log space, which is exactly the symptom you've been seeing.
I'd suggest you check the xfsprogs commit below if you want more details, but if this is one of the filesystems where you see the stalls, this might very well be the cause:
Interesting catch! I’ll double check this against our fleet and the affected machines and will dive into the traffic patterns of the specific underlying devices.
This filesystem is used for /tmp and is created fresh by our hypervisor after a “cold boot”. It could be that a number of VMs have not seen a cold boot for a couple of years, even though they get kernel upgrades via warm reboots quite regularly. We’re in the process of changing the /tmp filesystem creation to happen fresh during initrd, so that the xfsprogs inside the VM, which more closely matches the guest kernel, creates it.
I’ve checked the log sizes. A number of machines with very long uptimes have this outdated 10 MiB size. Many machines with shorter uptimes have larger sizes (several hundred megabytes). Checking our codebase: we let xfsprogs do its thing and don’t fiddle with the defaults.
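In case it helps to reproduce the check elsewhere, this is roughly the kind of per-host survey I mean (a sketch only: it assumes xfs_info is in PATH, takes the XFS mount points from /proc/mounts, and just prints the raw blocks= field of the log section; multiply by the log bsize, 4096 above, to get bytes):

    for m in $(awk '$3 == "xfs" { print $2 }' /proc/mounts); do
        # print the mount point and the blocks= field from the "log" line
        printf '%s: ' "$m"
        xfs_info "$m" | awk '/^log / { for (i = 1; i <= NF; i++) if ($i ~ /^blocks=/) { gsub(",", "", $i); print $i } }'
    done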
The log sizes of the affected machines weren’t all set to 10 MiB - even machines with larger sizes were affected.
Bear in mind that this was the worst one you sent me. Just because the other FS has a larger log doesn't mean it's enough, or that it won't face the same problems too; IIRC, the other FS configuration you sent had a 64 MiB log. xfsprogs picks a default log size based on the FS size, so the log won't take up too much disk space, but that default may not be enough. This is of course speculation on my part given the logs you provided, but it might be worth testing your VMs with larger log sizes.
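Just to illustrate what such a test could look like (device name and log size here are arbitrary examples, not a recommendation), the log size can be set explicitly at mkfs time:

    # mkfs.xfs -f -l size=256m /dev/vdb1

xfs_info on the resulting filesystem should then show the corresponding blocks= count in its log section.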
I’ll follow up - as promised - with further analysis of whether IO starvation from the underlying storage may have occurred.
Christian
--
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick