On Tue, Jun 17, 2025 at 01:54:43PM +0200, Christian Theune wrote:
On 17. Jun 2025, at 07:44, Christian Theune <ct@flyingcircus.io> wrote:
On 16. Jun 2025, at 14:15, Carlos Maiolino <cem@kernel.org> wrote:
On Mon, Jun 16, 2025 at 12:09:21PM +0200, Christian Theune wrote:
# xfs_info /tmp/
meta-data=/dev/vdb1              isize=512    agcount=8, agsize=229376 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0    bigtime=0 inobtcount=0 nrext64=0
         =                       exchange=0
data     =                       bsize=4096   blocks=1833979, imaxpct=25
         =                       sunit=1024   swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
This is worrisome. Your journal is only 10 MiB, which can easily keep stalling IO while waiting for log space to be freed; depending on the nature of the machine this is easy to trigger. I'm curious though how you made this FS, because 2560 blocks is below the minimum log size that xfsprogs has allowed since (/me goes look into the git log) 2022, xfsprogs 5.15.
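For reference, a quick back-of-the-envelope check of that figure against the log section of the xfs_info output above (log blocks times log block size):

    2560 blocks * 4096 B/block = 10485760 B = 10 MiB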
FWIW, one of the reasons the minimum journal size was increased is the latency/stalls that happen while waiting for free log space, which is exactly the symptom you've been seeing.
I'd suggest you check the xfsprogs commit below if you want more details, but if this is one of the filesystems where you see the stalls, this might very well be the cause:
Interesting catch! I’ll double check this against our fleet and the affected machines and will dive into the traffic patterns of the specific underlying devices.
This filesystem is used for /tmp and is created fresh by our hypervisor after a “cold boot”. It could be that a number of VMs have not seen a cold boot for a couple of years, even though they get kernel upgrades via warm reboots quite regularly. We’re in the process of changing the /tmp filesystem creation to happen fresh during initrd, so that the xfsprogs inside the VM, which more closely matches the guest kernel, creates it.
I’ve checked the log sizes. A number of machines with very long uptimes have this outdated 10 MiB size. Many machines with shorter uptimes have larger sizes (several hundred megabytes). Checking our codebase: we let xfsprogs do its thing and don’t fiddle with the defaults.
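In case it helps to reproduce the check elsewhere, this is roughly the kind of per-host survey I mean (a sketch only: it assumes xfs_info is in PATH, takes the XFS mount points from /proc/mounts, and just prints the raw blocks= field of the log section; multiply by the log bsize, 4096 above, to get bytes):

    for m in $(awk '$3 == "xfs" { print $2 }' /proc/mounts); do
        # print the mount point and the blocks= field from the "log" line
        printf '%s: ' "$m"
        xfs_info "$m" | awk '/^log / { for (i = 1; i <= NF; i++) if ($i ~ /^blocks=/) { gsub(",", "", $i); print $i } }'
    done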
The log sizes of the affected machines weren’t all set to 10 MiB - even machines with larger sizes were affected.
Bear in mind that this was the worst one you sent me. Just because the other FS has a larger log doesn't mean it's enough, or that it won't face the same problems too; IIRC, the other FS configuration you sent had a 64 MiB log. xfsprogs picks a default log size based on the FS size, so the log won't take up too much disk space, but that default may not be enough. This is of course speculation on my part given the logs you provided, but it might be worth testing your VMs with larger log sizes.
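Just to illustrate what such a test could look like (device name and log size here are arbitrary examples, not a recommendation), the log size can be set explicitly at mkfs time:

    # mkfs.xfs -f -l size=256m /dev/vdb1

xfs_info on the resulting filesystem should then show the corresponding blocks= count in its log section.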
I’ll follow up - as promised - with further analysis of whether IO starvation from the underlying storage may have occurred.
Christian
--
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick