On 18. Jun 2025, at 00:28, Dave Chinner <david@fromorbit.com> wrote:
On Mon, Jun 16, 2025 at 12:09:21PM +0200, Christian Theune wrote:
Can you share the xfs_info of one of these filesystems? I'm curious about the FS geometry.
Sure:
# xfs_info /
meta-data=/dev/disk/by-label/root isize=512    agcount=21, agsize=655040 blks
         =                        sectsz=512   attr=2, projid32bit=1
         =                        crc=1        finobt=1, sparse=1, rmapbt=0
         =                        reflink=1    bigtime=1 inobtcount=1 nrext64=0
         =                        exchange=0
data     =                        bsize=4096   blocks=13106171, imaxpct=25
         =                        sunit=0      swidth=0 blks
naming   =version 2               bsize=4096   ascii-ci=0, ftype=1, parent=0
log      =internal log            bsize=4096   blocks=16384, version=2
         =                        sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                    extsz=4096   blocks=0, rtextents=0
From the logs, it was /dev/vda1 that was getting hung up, so I'm going to assume the workload is hitting the root partition, not:
# xfs_info /tmp/
meta-data=/dev/vdb1              isize=512    agcount=8, agsize=229376 blks
... this one that has a small log.
IOWs, I don't think the log size is a contributing factor here.
The indication from the logs is that the system is hung up waiting on slow journal writes. e.g. there are processes hung waiting for transaction reservations (i.e. no journal space available). Journal space is backed up on metadata writeback trying to force the journal to stable storage (which is blocked waiting for journal IO completion so it can issue more journal IO) and getting blocked so it can't make progress, either.
I think part of the issue is that journal writes issue device cache flushes and FUA writes, both of which require written data to be on stable storage before returning.
All this points to whatever storage is backing these VMs is extremely slow at guaranteeing persistence of data and eventually it can't keep up with the application making changes to the filesystem. When the journal IO latency gets high enough you start to see things backing up and stall warnings appearing.
IOWs, this does not look like a filesystem issue from the information presented, just storage that can't keep up with the rate at which the filesystem can make modifications in memory. When the fs finally starts to throttle on the slow storage, that's when you notice just how slow the storage actually is...
[ Historical note: this is exactly the sort of thing we have seen for years with hardware RAID5/6 adapters with large amounts of NVRAM and random write workloads. They run as fast as NVRAM can sink the 4kB random writes, then when the NVRAM fills, they have to wait for hundreds of MB of cached 4kB random writes to be written to the RAID5/6 luns at 50-100 IOPS. This causes the exact same "filesystem is hung" symptoms as you are describing in this thread. ]
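As rough arithmetic on that failure mode (a sketch only: the 512 MB cache size and 75 sustained IOPS below are illustrative assumptions within the ranges quoted above, not measurements):

```python
# Back-of-the-envelope drain time for an NVRAM write cache full of 4kB
# random writes, sunk to a RAID5/6 lun at the 50-100 IOPS quoted above.
nvram_bytes = 512 * 1024**2   # assumed cache size ("hundreds of MB")
io_bytes = 4 * 1024           # 4kB random writes
sustained_iops = 75           # assumed mid-range of 50-100 IOPS

cached_ios = nvram_bytes // io_bytes
drain_seconds = cached_ios / sustained_iops
print(f"{cached_ios} cached writes, ~{drain_seconds / 60:.0f} min to drain")
# → 131072 cached writes, ~29 min to drain
```

Tens of minutes of backed-up writeback is easily long enough to trip the 120s hung-task detector several times over.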
Yeah, I’m very wary of reporting these tracebacks as potential bugs, because they can easily be just a symptom of slow storage. My problem here is that I can’t point to anything showing that the storage actually was slow.
I’ve gone through all metrics and logs on the KVM servers as well as the Ceph servers, and they’ve been performing completely at baseline with respect to errors, queue depths, IOPS, and latency.
I’ve done a measurement to try to emulate those accesses by running
$ fio --rw=randrw --name=synctest --bs=4k --direct=1 --numjobs=1 \
      --ioengine=libaio --iodepth=1 --runtime=600 --write_barrier=1 --size=60m
I hope this behaviour is sufficiently comparable to what XFS log flushing does (maybe with a different read/write ratio than 0.5?). This resulted in [1].
My interpretation of this measurement (and the VM showed no illness while this was running over 10 minutes): the VM is throttled at 250 IOPS, and after 10 minutes of 4k random writes it reports an average right at that limit. The latencies are a bit varied, which could be due to the Qemu throttling. The max latency was 133ms, the average 2ms. This is on a 10G storage network with Ceph, which requires another network round trip for replication before ACKing a write.
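As a sanity check on that interpretation, the aggregate rate can be recomputed from the issued I/O counts of the time_based run in [1] (a minimal sketch; the numbers are copied from the fio output):

```python
# Recompute aggregate IOPS from the time_based fio run in [1]:
# "issued rwts: total=72728,72217" over run=600006msec.
reads, writes = 72728, 72217
runtime_s = 600.006

combined_iops = (reads + writes) / runtime_s
print(f"combined IOPS: {combined_iops:.1f}")  # → combined IOPS: 241.6
```

That is just under the configured 250 IOPS cap, consistent with the VM sitting at the Qemu throttle for essentially the whole run.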
There is one more (somewhat far-fetched) thing I could pull in here: out of the 11 VMs exhibiting those symptoms, I have seen two (including one already running 6.12 at that time) log a stacktrace [2] that reminded me of the memory/folio issue we debugged late last year / earlier this year (https://lkml.org/lkml/2024/9/12/1472) … maybe it’s worthwhile to consider whether this might be related. The outside symptoms are similar: it recovers on its own at some point, and I can’t show any issue with the underlying storage at all.
I’m out of ideas for now, but I’ll keep thinking about this. If anyone has any pointers for further tests in any direction, I’m open to anything. ;)
Thanks for all the help so far,
Christian
[1] The test results:
synctest: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.38
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=484KiB/s,w=517KiB/s][r=121,w=129 IOPS][eta 00m:00s]
synctest: (groupid=0, jobs=1): err= 0: pid=655973: Wed Jun 18 09:28:31 2025
  read: IOPS=122, BW=489KiB/s (501kB/s)(29.9MiB/62557msec)
    slat (usec): min=8, max=1096, avg=22.94, stdev=24.68
    clat (usec): min=285, max=133773, avg=2745.14, stdev=2723.35
     lat (usec): min=296, max=133812, avg=2768.08, stdev=2723.35
    clat percentiles (usec):
     |  1.00th=[   416],  5.00th=[   506], 10.00th=[   611], 20.00th=[   832],
     | 30.00th=[  1713], 40.00th=[  2540], 50.00th=[  2737], 60.00th=[  3458],
     | 70.00th=[  3621], 80.00th=[  3785], 90.00th=[  4555], 95.00th=[  4817],
     | 99.00th=[  6063], 99.50th=[  7898], 99.90th=[ 22676], 99.95th=[ 78119],
     | 99.99th=[133694]
   bw (  KiB/s): min=  304, max=  608, per=99.90%, avg=489.79, stdev=59.34, samples=125
   iops        : min=   76, max=  152, avg=122.45, stdev=14.83, samples=125
  write: IOPS=123, BW=493KiB/s (504kB/s)(30.1MiB/62557msec); 0 zone resets
    slat (usec): min=11, max=1102, avg=25.54, stdev=23.33
    clat (usec): min=1434, max=140566, avg=5337.34, stdev=7163.78
     lat (usec): min=1453, max=140651, avg=5362.88, stdev=7164.02
    clat percentiles (usec):
     |  1.00th=[  1713],  5.00th=[  1926], 10.00th=[  2114], 20.00th=[  2868],
     | 30.00th=[  3720], 40.00th=[  4080], 50.00th=[  4490], 60.00th=[  5014],
     | 70.00th=[  5276], 80.00th=[  5866], 90.00th=[  6652], 95.00th=[  8979],
     | 99.00th=[ 33162], 99.50th=[ 64750], 99.90th=[ 99091], 99.95th=[111674],
     | 99.99th=[141558]
   bw (  KiB/s): min=  336, max=  672, per=99.86%, avg=492.93, stdev=60.30, samples=125
   iops        : min=   84, max=  168, avg=123.23, stdev=15.08, samples=125
  lat (usec)   : 500=2.28%, 750=6.19%, 1000=2.66%
  lat (msec)   : 2=9.39%, 4=40.59%, 10=36.82%, 20=1.26%, 50=0.41%
  lat (msec)   : 100=0.34%, 250=0.05%
  cpu          : usr=0.17%, sys=0.73%, ctx=15398, majf=0, minf=11
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=7655,7705,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
   READ: bw=489KiB/s (501kB/s), 489KiB/s-489KiB/s (501kB/s-501kB/s), io=29.9MiB (31.4MB), run=62557-62557msec
  WRITE: bw=493KiB/s (504kB/s), 493KiB/s-493KiB/s (504kB/s-504kB/s), io=30.1MiB (31.6MB), run=62557-62557msec
Disk stats (read/write):
  vda: ios=7662/7865, sectors=61856/71089, merge=0/5, ticks=21263/105909, in_queue=127177, util=98.62%

ctheune@hannover96stag00:~/ > fio --rw=randrw --name=synctest --bs=4k --direct=1 --numjobs=1 --ioengine=libaio --iodepth=1 --runtime=600 --write_barrier=1 --size=60m --time_based
synctest: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.38
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=408KiB/s,w=344KiB/s][r=102,w=86 IOPS][eta 00m:00s]
synctest: (groupid=0, jobs=1): err= 0: pid=656097: Wed Jun 18 09:38:54 2025
  read: IOPS=121, BW=485KiB/s (496kB/s)(284MiB/600006msec)
    slat (usec): min=7, max=1126, avg=23.70, stdev=22.33
    clat (usec): min=99, max=128424, avg=2834.26, stdev=1977.13
     lat (usec): min=299, max=128439, avg=2857.97, stdev=1975.91
    clat percentiles (usec):
     |  1.00th=[  416],  5.00th=[  519], 10.00th=[  660], 20.00th=[ 1516],
     | 30.00th=[ 2442], 40.00th=[ 2638], 50.00th=[ 2802], 60.00th=[ 3523],
     | 70.00th=[ 3654], 80.00th=[ 3818], 90.00th=[ 4555], 95.00th=[ 4752],
     | 99.00th=[ 5211], 99.50th=[ 7308], 99.90th=[12125], 99.95th=[23725],
     | 99.99th=[71828]
   bw (  KiB/s): min=   64, max=  672, per=100.00%, avg=485.12, stdev=75.21, samples=1199
   iops        : min=   16, max=  168, avg=121.27, stdev=18.80, samples=1199
  write: IOPS=120, BW=481KiB/s (493kB/s)(282MiB/600006msec); 0 zone resets
    slat (usec): min=10, max=1168, avg=26.84, stdev=22.15
    clat (usec): min=1193, max=307265, avg=5397.02, stdev=8361.82
     lat (usec): min=1343, max=307308, avg=5423.86, stdev=8362.12
    clat percentiles (usec):
     |  1.00th=[  1729],  5.00th=[  1975], 10.00th=[  2245], 20.00th=[  3097],
     | 30.00th=[  3884], 40.00th=[  4178], 50.00th=[  4621], 60.00th=[  5014],
     | 70.00th=[  5276], 80.00th=[  5800], 90.00th=[  6456], 95.00th=[  7898],
     | 99.00th=[ 32900], 99.50th=[ 66847], 99.90th=[132645], 99.95th=[154141],
     | 99.99th=[170918]
   bw (  KiB/s): min=   56, max=  672, per=99.91%, avg=481.74, stdev=74.40, samples=1199
   iops        : min=   14, max=  168, avg=120.43, stdev=18.60, samples=1199
  lat (usec)   : 100=0.01%, 250=0.01%, 500=2.08%, 750=4.49%, 1000=2.13%
  lat (msec)   : 2=8.44%, 4=42.49%, 10=38.81%, 20=0.89%, 50=0.31%
  lat (msec)   : 100=0.24%, 250=0.11%, 500=0.01%
  cpu          : usr=0.20%, sys=0.75%, ctx=145300, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=72728,72217,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
   READ: bw=485KiB/s (496kB/s), 485KiB/s-485KiB/s (496kB/s-496kB/s), io=284MiB (298MB), run=600006-600006msec
  WRITE: bw=481KiB/s (493kB/s), 481KiB/s-481KiB/s (493kB/s-493kB/s), io=282MiB (296MB), run=600006-600006msec
Disk stats (read/write):
  vda: ios=72722/74795, sectors=582248/768053, merge=0/67, ticks=206476/860665, in_queue=1067364, util=98.78%
[2] The second type of hung task, which I have considered unrelated so far:
May 17 03:30:23 kernel: INFO: task kworker/u18:2:19320 blocked for more than 122 seconds.
May 17 03:30:23 kernel:       Not tainted 6.12.28 #1-NixOS
May 17 03:30:23 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 03:30:23 kernel: task:kworker/u18:2 state:D stack:0 pid:19320 tgid:19320 ppid:2 flags:0x00004000
May 17 03:30:23 kernel: Workqueue: writeback wb_workfn (flush-253:0)
May 17 03:30:23 kernel: Call Trace:
May 17 03:30:23 kernel:  <TASK>
May 17 03:30:23 kernel:  __schedule+0x442/0x12d0
May 17 03:30:23 kernel:  schedule+0x27/0xf0
May 17 03:30:23 kernel:  io_schedule+0x46/0x70
May 17 03:30:23 kernel:  folio_wait_bit_common+0x13f/0x340
May 17 03:30:23 kernel:  ? __pfx_wake_page_function+0x10/0x10
May 17 03:30:23 kernel:  writeback_iter+0x1ec/0x2d0
May 17 03:30:23 kernel:  iomap_writepages+0x74/0x9e0
May 17 03:30:23 kernel:  ? virtqueue_add_split+0xb1/0x7a0 [virtio_ring]
May 17 03:30:23 kernel:  ? virtqueue_add_split+0x2af/0x7a0 [virtio_ring]
May 17 03:30:23 kernel:  xfs_vm_writepages+0x67/0xa0 [xfs]
May 17 03:30:23 kernel:  do_writepages+0x8a/0x290
May 17 03:30:23 kernel:  ? enqueue_hrtimer+0x35/0x90
May 17 03:30:23 kernel:  ? hrtimer_start_range_ns+0x2b7/0x450
May 17 03:30:23 kernel:  __writeback_single_inode+0x3d/0x350
May 17 03:30:23 kernel:  ? wbc_detach_inode+0x116/0x250
May 17 03:30:23 kernel:  writeback_sb_inodes+0x228/0x4e0
May 17 03:30:23 kernel:  __writeback_inodes_wb+0x4c/0xf0
May 17 03:30:23 kernel:  wb_writeback+0x1ac/0x330
May 17 03:30:23 kernel:  ? get_nr_inodes+0x3b/0x60
May 17 03:30:23 kernel:  wb_workfn+0x357/0x460
May 17 03:30:23 kernel:  process_one_work+0x192/0x3b0
May 17 03:30:23 kernel:  worker_thread+0x230/0x340
May 17 03:30:23 kernel:  ? __pfx_worker_thread+0x10/0x10
May 17 03:30:23 kernel:  kthread+0xd0/0x100
May 17 03:30:23 kernel:  ? __pfx_kthread+0x10/0x10
May 17 03:30:23 kernel:  ret_from_fork+0x34/0x50
May 17 03:30:23 kernel:  ? __pfx_kthread+0x10/0x10
May 17 03:30:23 kernel:  ret_from_fork_asm+0x1a/0x30
May 17 03:30:23 kernel:  </TASK>
May 17 03:30:23 kernel: INFO: task nix:21146 blocked for more than 122 seconds.
May 17 03:30:23 kernel:       Not tainted 6.12.28 #1-NixOS
May 17 03:30:23 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 03:30:23 kernel: task:nix state:D stack:0 pid:21146 tgid:21146 ppid:21145 flags:0x00000002
May 17 03:30:23 kernel: Call Trace:
May 17 03:30:23 kernel:  <TASK>
May 17 03:30:23 kernel:  __schedule+0x442/0x12d0
May 17 03:30:23 kernel:  ? xas_load+0xd/0xe0
May 17 03:30:23 kernel:  ? xa_load+0x77/0xb0
May 17 03:30:23 kernel:  schedule+0x27/0xf0
May 17 03:30:23 kernel:  io_schedule+0x46/0x70
May 17 03:30:23 kernel:  folio_wait_bit_common+0x13f/0x340
May 17 03:30:23 kernel:  ? __pfx_wake_page_function+0x10/0x10
May 17 03:30:23 kernel:  folio_wait_writeback+0x2b/0x90
May 17 03:30:23 kernel:  truncate_inode_partial_folio+0x5e/0x1c0
May 17 03:30:23 kernel:  truncate_inode_pages_range+0x1de/0x410
May 17 03:30:23 kernel:  truncate_pagecache+0x47/0x60
May 17 03:30:23 kernel:  xfs_setattr_size+0xf6/0x3c0 [xfs]
May 17 03:30:23 kernel:  xfs_vn_setattr+0x85/0x150 [xfs]
May 17 03:30:23 kernel:  notify_change+0x301/0x500
May 17 03:30:23 kernel:  ? do_truncate+0x98/0xf0
May 17 03:30:23 kernel:  do_truncate+0x98/0xf0
May 17 03:30:23 kernel:  do_ftruncate+0x104/0x170
May 17 03:30:23 kernel:  do_sys_ftruncate+0x3d/0x80
May 17 03:30:23 kernel:  do_syscall_64+0xb7/0x210
May 17 03:30:23 kernel:  entry_SYSCALL_64_after_hwframe+0x77/0x7f
May 17 03:30:23 kernel: RIP: 0033:0x7f87aeb0d75b
May 17 03:30:23 kernel: RSP: 002b:00007ffddbfc04b8 EFLAGS: 00000213 ORIG_RAX: 000000000000004d
May 17 03:30:23 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f87aeb0d75b
May 17 03:30:23 kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000008
May 17 03:30:23 kernel: RBP: 0000557d5dd46498 R08: 0000000000000000 R09: 0000000000000000
May 17 03:30:23 kernel: R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000000008
May 17 03:30:23 kernel: R13: 0000557d5dcd2aa0 R14: 0000557d5dd46468 R15: 0000000000000008
May 17 03:30:23 kernel:  </TASK>
May 17 03:32:26 kernel: INFO: task kworker/u18:2:19320 blocked for more than 245 seconds.
May 17 03:32:26 kernel:       Not tainted 6.12.28 #1-NixOS
May 17 03:32:26 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 03:32:26 kernel: task:kworker/u18:2 state:D stack:0 pid:19320 tgid:19320 ppid:2 flags:0x00004000
May 17 03:32:26 kernel: Workqueue: writeback wb_workfn (flush-253:0)
May 17 03:32:26 kernel: Call Trace:
May 17 03:32:26 kernel:  <TASK>
May 17 03:32:26 kernel:  __schedule+0x442/0x12d0
May 17 03:32:26 kernel:  schedule+0x27/0xf0
May 17 03:32:26 kernel:  io_schedule+0x46/0x70
May 17 03:32:26 kernel:  folio_wait_bit_common+0x13f/0x340
May 17 03:32:26 kernel:  ? __pfx_wake_page_function+0x10/0x10
May 17 03:32:26 kernel:  writeback_iter+0x1ec/0x2d0
May 17 03:32:26 kernel:  iomap_writepages+0x74/0x9e0
May 17 03:32:26 kernel:  ? virtqueue_add_split+0xb1/0x7a0 [virtio_ring]
May 17 03:32:26 kernel:  ? virtqueue_add_split+0x2af/0x7a0 [virtio_ring]
May 17 03:32:26 kernel:  xfs_vm_writepages+0x67/0xa0 [xfs]
May 17 03:32:26 kernel:  do_writepages+0x8a/0x290
May 17 03:32:26 kernel:  ? enqueue_hrtimer+0x35/0x90
May 17 03:32:26 kernel:  ? hrtimer_start_range_ns+0x2b7/0x450
May 17 03:32:26 kernel:  __writeback_single_inode+0x3d/0x350
May 17 03:32:26 kernel:  ? wbc_detach_inode+0x116/0x250
May 17 03:32:26 kernel:  writeback_sb_inodes+0x228/0x4e0
May 17 03:32:26 kernel:  __writeback_inodes_wb+0x4c/0xf0
May 17 03:32:26 kernel:  wb_writeback+0x1ac/0x330
May 17 03:32:26 kernel:  ? get_nr_inodes+0x3b/0x60
May 17 03:32:26 kernel:  wb_workfn+0x357/0x460
May 17 03:32:26 kernel:  process_one_work+0x192/0x3b0
May 17 03:32:26 kernel:  worker_thread+0x230/0x340
May 17 03:32:26 kernel:  ? __pfx_worker_thread+0x10/0x10
May 17 03:32:26 kernel:  kthread+0xd0/0x100
May 17 03:32:26 kernel:  ? __pfx_kthread+0x10/0x10
May 17 03:32:26 kernel:  ret_from_fork+0x34/0x50
May 17 03:32:26 kernel:  ? __pfx_kthread+0x10/0x10
May 17 03:32:26 kernel:  ret_from_fork_asm+0x1a/0x30
May 17 03:32:26 kernel:  </TASK>
May 17 03:32:26 kernel: INFO: task nix:21146 blocked for more than 245 seconds.
May 17 03:32:26 kernel:       Not tainted 6.12.28 #1-NixOS
May 17 03:32:26 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 03:32:26 kernel: task:nix state:D stack:0 pid:21146 tgid:21146 ppid:21145 flags:0x00000002
May 17 03:32:26 kernel: Call Trace:
May 17 03:32:26 kernel:  <TASK>
May 17 03:32:26 kernel:  __schedule+0x442/0x12d0
May 17 03:32:26 kernel:  ? xas_load+0xd/0xe0
May 17 03:32:26 kernel:  ? xa_load+0x77/0xb0
May 17 03:32:26 kernel:  schedule+0x27/0xf0
May 17 03:32:26 kernel:  io_schedule+0x46/0x70
May 17 03:32:26 kernel:  folio_wait_bit_common+0x13f/0x340
May 17 03:32:26 kernel:  ? __pfx_wake_page_function+0x10/0x10
May 17 03:32:26 kernel:  folio_wait_writeback+0x2b/0x90
May 17 03:32:26 kernel:  truncate_inode_partial_folio+0x5e/0x1c0
May 17 03:32:26 kernel:  truncate_inode_pages_range+0x1de/0x410
May 17 03:32:26 kernel:  truncate_pagecache+0x47/0x60
May 17 03:32:26 kernel:  xfs_setattr_size+0xf6/0x3c0 [xfs]
May 17 03:32:26 kernel:  xfs_vn_setattr+0x85/0x150 [xfs]
May 17 03:32:26 kernel:  notify_change+0x301/0x500
May 17 03:32:26 kernel:  ? do_truncate+0x98/0xf0
May 17 03:32:26 kernel:  do_truncate+0x98/0xf0
May 17 03:32:26 kernel:  do_ftruncate+0x104/0x170
May 17 03:32:26 kernel:  do_sys_ftruncate+0x3d/0x80
May 17 03:32:26 kernel:  do_syscall_64+0xb7/0x210
May 17 03:32:26 kernel:  entry_SYSCALL_64_after_hwframe+0x77/0x7f
May 17 03:32:26 kernel: RIP: 0033:0x7f87aeb0d75b
May 17 03:32:26 kernel: RSP: 002b:00007ffddbfc04b8 EFLAGS: 00000213 ORIG_RAX: 000000000000004d
May 17 03:32:26 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f87aeb0d75b
May 17 03:32:26 kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000008
May 17 03:32:26 kernel: RBP: 0000557d5dd46498 R08: 0000000000000000 R09: 0000000000000000
May 17 03:32:26 kernel: R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000000008
May 17 03:32:26 kernel: R13: 0000557d5dcd2aa0 R14: 0000557d5dd46468 R15: 0000000000000008
May 17 03:32:26 kernel:  </TASK>
May 17 03:34:29 kernel: INFO: task kworker/u18:2:19320 blocked for more than 368 seconds.
May 17 03:34:29 kernel:       Not tainted 6.12.28 #1-NixOS
May 17 03:34:29 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 03:34:29 kernel: task:kworker/u18:2 state:D stack:0 pid:19320 tgid:19320 ppid:2 flags:0x00004000
May 17 03:34:29 kernel: Workqueue: writeback wb_workfn (flush-253:0)
May 17 03:34:29 kernel: Call Trace:
May 17 03:34:29 kernel:  <TASK>
May 17 03:34:29 kernel:  __schedule+0x442/0x12d0
May 17 03:34:29 kernel:  schedule+0x27/0xf0
May 17 03:34:29 kernel:  io_schedule+0x46/0x70
May 17 03:34:29 kernel:  folio_wait_bit_common+0x13f/0x340
May 17 03:34:29 kernel:  ? __pfx_wake_page_function+0x10/0x10
May 17 03:34:29 kernel:  writeback_iter+0x1ec/0x2d0
May 17 03:34:29 kernel:  iomap_writepages+0x74/0x9e0
May 17 03:34:29 kernel:  ? virtqueue_add_split+0xb1/0x7a0 [virtio_ring]
May 17 03:34:29 kernel:  ? virtqueue_add_split+0x2af/0x7a0 [virtio_ring]
May 17 03:34:29 kernel:  xfs_vm_writepages+0x67/0xa0 [xfs]
May 17 03:34:29 kernel:  do_writepages+0x8a/0x290
May 17 03:34:29 kernel:  ? enqueue_hrtimer+0x35/0x90
May 17 03:34:29 kernel:  ? hrtimer_start_range_ns+0x2b7/0x450
May 17 03:34:29 kernel:  __writeback_single_inode+0x3d/0x350
May 17 03:34:29 kernel:  ? wbc_detach_inode+0x116/0x250
May 17 03:34:29 kernel:  writeback_sb_inodes+0x228/0x4e0
May 17 03:34:29 kernel:  __writeback_inodes_wb+0x4c/0xf0
May 17 03:34:29 kernel:  wb_writeback+0x1ac/0x330
May 17 03:34:29 kernel:  ? get_nr_inodes+0x3b/0x60
May 17 03:34:29 kernel:  wb_workfn+0x357/0x460
May 17 03:34:29 kernel:  process_one_work+0x192/0x3b0
May 17 03:34:29 kernel:  worker_thread+0x230/0x340
May 17 03:34:29 kernel:  ? __pfx_worker_thread+0x10/0x10
May 17 03:34:29 kernel:  kthread+0xd0/0x100
May 17 03:34:29 kernel:  ? __pfx_kthread+0x10/0x10
May 17 03:34:29 kernel:  ret_from_fork+0x34/0x50
May 17 03:34:29 kernel:  ? __pfx_kthread+0x10/0x10
May 17 03:34:29 kernel:  ret_from_fork_asm+0x1a/0x30
May 17 03:34:29 kernel:  </TASK>
May 17 03:34:29 kernel: INFO: task nix:21146 blocked for more than 368 seconds.
May 17 03:34:29 kernel:       Not tainted 6.12.28 #1-NixOS
May 17 03:34:29 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 03:34:29 kernel: task:nix state:D stack:0 pid:21146 tgid:21146 ppid:21145 flags:0x00000002
May 17 03:34:29 kernel: Call Trace:
May 17 03:34:29 kernel:  <TASK>
May 17 03:34:29 kernel:  __schedule+0x442/0x12d0
May 17 03:34:29 kernel:  ? xas_load+0xd/0xe0
May 17 03:34:29 kernel:  ? xa_load+0x77/0xb0
May 17 03:34:29 kernel:  schedule+0x27/0xf0
May 17 03:34:29 kernel:  io_schedule+0x46/0x70
May 17 03:34:29 kernel:  folio_wait_bit_common+0x13f/0x340
May 17 03:34:29 kernel:  ? __pfx_wake_page_function+0x10/0x10
May 17 03:34:29 kernel:  folio_wait_writeback+0x2b/0x90
May 17 03:34:29 kernel:  truncate_inode_partial_folio+0x5e/0x1c0
May 17 03:34:29 kernel:  truncate_inode_pages_range+0x1de/0x410
May 17 03:34:29 kernel:  truncate_pagecache+0x47/0x60
May 17 03:34:29 kernel:  xfs_setattr_size+0xf6/0x3c0 [xfs]
May 17 03:34:29 kernel:  xfs_vn_setattr+0x85/0x150 [xfs]
May 17 03:34:29 kernel:  notify_change+0x301/0x500
May 17 03:34:29 kernel:  ? do_truncate+0x98/0xf0
May 17 03:34:29 kernel:  do_truncate+0x98/0xf0
May 17 03:34:29 kernel:  do_ftruncate+0x104/0x170
May 17 03:34:29 kernel:  do_sys_ftruncate+0x3d/0x80
May 17 03:34:29 kernel:  do_syscall_64+0xb7/0x210
May 17 03:34:29 kernel:  entry_SYSCALL_64_after_hwframe+0x77/0x7f
May 17 03:34:29 kernel: RIP: 0033:0x7f87aeb0d75b
May 17 03:34:29 kernel: RSP: 002b:00007ffddbfc04b8 EFLAGS: 00000213 ORIG_RAX: 000000000000004d
May 17 03:34:29 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f87aeb0d75b
May 17 03:34:29 kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000008
May 17 03:34:29 kernel: RBP: 0000557d5dd46498 R08: 0000000000000000 R09: 0000000000000000
May 17 03:34:29 kernel: R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000000008
May 17 03:34:29 kernel: R13: 0000557d5dcd2aa0 R14: 0000557d5dd46468 R15: 0000000000000008
May 17 03:34:29 kernel:  </TASK>