On 18. Jun 2025, at 00:28, Dave Chinner <david@fromorbit.com> wrote:
On Mon, Jun 16, 2025 at 12:09:21PM +0200, Christian Theune wrote:
Can you share the xfs_info of one of these filesystems? I'm curious about the FS geometry.
Sure:
# xfs_info /
meta-data=/dev/disk/by-label/root isize=512    agcount=21, agsize=655040 blks
         =                        sectsz=512   attr=2, projid32bit=1
         =                        crc=1        finobt=1, sparse=1, rmapbt=0
         =                        reflink=1    bigtime=1 inobtcount=1 nrext64=0
         =                        exchange=0
data     =                        bsize=4096   blocks=13106171, imaxpct=25
         =                        sunit=0      swidth=0 blks
naming   =version 2               bsize=4096   ascii-ci=0, ftype=1, parent=0
log      =internal log            bsize=4096   blocks=16384, version=2
         =                        sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                    extsz=4096   blocks=0, rtextents=0
From the logs, it was /dev/vda1 that was getting hung up, so I'm going to assume the workload is hitting the root partition, not:
# xfs_info /tmp/
meta-data=/dev/vdb1              isize=512    agcount=8, agsize=229376 blks
... this one that has a small log.
IOWs, I don't think the log size is a contributing factor here.
The indication from the logs is that the system is hung up waiting on slow journal writes. e.g. there are processes hung waiting for transaction reservations (i.e. no journal space available). Journal space is backed up on metadata writeback trying to force the journal to stable storage (which is blocked waiting for journal IO completion so it can issue more journal IO) and getting blocked so it can't make progress, either.
I think part of the issue is that journal writes issue device cache flushes and FUA writes, both of which require written data to be on stable storage before returning.
All this points to whatever storage is backing these VMs is extremely slow at guaranteeing persistence of data and eventually it can't keep up with the application making changes to the filesystem. When the journal IO latency gets high enough you start to see things backing up and stall warnings appearing.
IOWs, this does not look like a filesystem issue from the information presented, just storage that can't keep up with the rate at which the filesystem can make modifications in memory. When the fs finally starts to throttle on the slow storage, that's when you notice just how slow the storage actually is...
[ Historical note: this is exactly the sort of thing we have seen for years with hardware RAID5/6 adapters with large amounts of NVRAM and random write workloads. They run as fast as NVRAM can sink the 4kB random writes, then when the NVRAM fills, they have to wait for hundreds of MB of cached 4kB random writes to be written to the RAID5/6 luns at 50-100 IOPS. This causes the exact same "filesystem is hung" symptoms as you are describing in this thread. ]
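As rough arithmetic on that failure mode (a sketch only: the 512 MB cache size and 75 sustained IOPS below are illustrative assumptions within the ranges quoted above, not measurements):

```python
# Back-of-the-envelope drain time for an NVRAM write cache full of 4kB
# random writes, sunk to a RAID5/6 lun at the 50-100 IOPS quoted above.
nvram_bytes = 512 * 1024**2   # assumed cache size ("hundreds of MB")
io_bytes = 4 * 1024           # 4kB random writes
sustained_iops = 75           # assumed mid-range of 50-100 IOPS

cached_ios = nvram_bytes // io_bytes
drain_seconds = cached_ios / sustained_iops
print(f"{cached_ios} cached writes, ~{drain_seconds / 60:.0f} min to drain")
# → 131072 cached writes, ~29 min to drain
```

Tens of minutes of backed-up writeback is easily long enough to trip the 120s hung-task detector several times over.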
Yeah, I’m very wary of reporting these tracebacks as potential bugs, because they can easily be just a symptom of slow storage. My problem here is that I can’t point to anything showing that the storage actually was slow.
I’ve gone through all metrics and logs on the KVM servers as well as the Ceph servers, and they’ve been performing completely at baseline with respect to errors, queue depths, IOPS, and latency.
I’ve done a measurement to try to emulate those accesses by running
$ fio --rw=randrw --name=synctest --bs=4k --direct=1 --numjobs=1 \
      --ioengine=libaio --iodepth=1 --runtime=600 --write_barrier=1 --size=60m
I hope this behaviour is sufficiently comparable to what XFS log flushing does (maybe with a different read/write ratio than 0.5?). This resulted in [1].
My interpretation of this measurement (and the VM showed no illness while this was running over 10 minutes): the VM is throttled at 250 IOPS, and after 10 minutes of 4k random writes it reports an average right at that limit. The latencies are a bit varied, which could be due to the Qemu throttling. The max latency was 133ms, the average 2ms. This is on a 10G storage network with Ceph, which requires another network round trip for replication before ACKing a write.
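As a sanity check on that interpretation, the aggregate rate can be recomputed from the issued I/O counts of the time_based run in [1] (a minimal sketch; the numbers are copied from the fio output):

```python
# Recompute aggregate IOPS from the time_based fio run in [1]:
# "issued rwts: total=72728,72217" over run=600006msec.
reads, writes = 72728, 72217
runtime_s = 600.006

combined_iops = (reads + writes) / runtime_s
print(f"combined IOPS: {combined_iops:.1f}")  # → combined IOPS: 241.6
```

That is just under the configured 250 IOPS cap, consistent with the VM sitting at the Qemu throttle for essentially the whole run.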
There is one more (somewhat far-fetched) thing I could pull in here: out of the 11 VMs exhibiting those symptoms, I have seen two (including one already running 6.12 at that time) log a stacktrace [2] that reminded me of the memory/folio issue we debugged late last year / earlier this year (https://lkml.org/lkml/2024/9/12/1472) … maybe it’s worthwhile to consider whether this might be related. The outside symptoms are similar: it recovers on its own at some point, and I can’t show any issue with the underlying storage at all.
I’m out of ideas for now, but I’ll keep thinking about this. If anyone has any pointers for further tests in any direction, I’m open to anything. ;)
Thanks for all the help so far,
Christian
[1] The test results:
synctest: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.38
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=484KiB/s,w=517KiB/s][r=121,w=129 IOPS][eta 00m:00s]
synctest: (groupid=0, jobs=1): err= 0: pid=655973: Wed Jun 18 09:28:31 2025
  read: IOPS=122, BW=489KiB/s (501kB/s)(29.9MiB/62557msec)
    slat (usec): min=8, max=1096, avg=22.94, stdev=24.68
    clat (usec): min=285, max=133773, avg=2745.14, stdev=2723.35
     lat (usec): min=296, max=133812, avg=2768.08, stdev=2723.35
    clat percentiles (usec):
     |  1.00th=[   416],  5.00th=[   506], 10.00th=[   611], 20.00th=[   832],
     | 30.00th=[  1713], 40.00th=[  2540], 50.00th=[  2737], 60.00th=[  3458],
     | 70.00th=[  3621], 80.00th=[  3785], 90.00th=[  4555], 95.00th=[  4817],
     | 99.00th=[  6063], 99.50th=[  7898], 99.90th=[ 22676], 99.95th=[ 78119],
     | 99.99th=[133694]
   bw (  KiB/s): min=  304, max=  608, per=99.90%, avg=489.79, stdev=59.34, samples=125
   iops        : min=   76, max=  152, avg=122.45, stdev=14.83, samples=125
  write: IOPS=123, BW=493KiB/s (504kB/s)(30.1MiB/62557msec); 0 zone resets
    slat (usec): min=11, max=1102, avg=25.54, stdev=23.33
    clat (usec): min=1434, max=140566, avg=5337.34, stdev=7163.78
     lat (usec): min=1453, max=140651, avg=5362.88, stdev=7164.02
    clat percentiles (usec):
     |  1.00th=[  1713],  5.00th=[  1926], 10.00th=[  2114], 20.00th=[  2868],
     | 30.00th=[  3720], 40.00th=[  4080], 50.00th=[  4490], 60.00th=[  5014],
     | 70.00th=[  5276], 80.00th=[  5866], 90.00th=[  6652], 95.00th=[  8979],
     | 99.00th=[ 33162], 99.50th=[ 64750], 99.90th=[ 99091], 99.95th=[111674],
     | 99.99th=[141558]
   bw (  KiB/s): min=  336, max=  672, per=99.86%, avg=492.93, stdev=60.30, samples=125
   iops        : min=   84, max=  168, avg=123.23, stdev=15.08, samples=125
  lat (usec)   : 500=2.28%, 750=6.19%, 1000=2.66%
  lat (msec)   : 2=9.39%, 4=40.59%, 10=36.82%, 20=1.26%, 50=0.41%
  lat (msec)   : 100=0.34%, 250=0.05%
  cpu          : usr=0.17%, sys=0.73%, ctx=15398, majf=0, minf=11
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=7655,7705,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
   READ: bw=489KiB/s (501kB/s), 489KiB/s-489KiB/s (501kB/s-501kB/s), io=29.9MiB (31.4MB), run=62557-62557msec
  WRITE: bw=493KiB/s (504kB/s), 493KiB/s-493KiB/s (504kB/s-504kB/s), io=30.1MiB (31.6MB), run=62557-62557msec
Disk stats (read/write):
  vda: ios=7662/7865, sectors=61856/71089, merge=0/5, ticks=21263/105909, in_queue=127177, util=98.62%

ctheune@hannover96stag00:~/ > fio --rw=randrw --name=synctest --bs=4k --direct=1 --numjobs=1 --ioengine=libaio --iodepth=1 --runtime=600 --write_barrier=1 --size=60m --time_based
synctest: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.38
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=408KiB/s,w=344KiB/s][r=102,w=86 IOPS][eta 00m:00s]
synctest: (groupid=0, jobs=1): err= 0: pid=656097: Wed Jun 18 09:38:54 2025
  read: IOPS=121, BW=485KiB/s (496kB/s)(284MiB/600006msec)
    slat (usec): min=7, max=1126, avg=23.70, stdev=22.33
    clat (usec): min=99, max=128424, avg=2834.26, stdev=1977.13
     lat (usec): min=299, max=128439, avg=2857.97, stdev=1975.91
    clat percentiles (usec):
     |  1.00th=[  416],  5.00th=[  519], 10.00th=[  660], 20.00th=[ 1516],
     | 30.00th=[ 2442], 40.00th=[ 2638], 50.00th=[ 2802], 60.00th=[ 3523],
     | 70.00th=[ 3654], 80.00th=[ 3818], 90.00th=[ 4555], 95.00th=[ 4752],
     | 99.00th=[ 5211], 99.50th=[ 7308], 99.90th=[12125], 99.95th=[23725],
     | 99.99th=[71828]
   bw (  KiB/s): min=   64, max=  672, per=100.00%, avg=485.12, stdev=75.21, samples=1199
   iops        : min=   16, max=  168, avg=121.27, stdev=18.80, samples=1199
  write: IOPS=120, BW=481KiB/s (493kB/s)(282MiB/600006msec); 0 zone resets
    slat (usec): min=10, max=1168, avg=26.84, stdev=22.15
    clat (usec): min=1193, max=307265, avg=5397.02, stdev=8361.82
     lat (usec): min=1343, max=307308, avg=5423.86, stdev=8362.12
    clat percentiles (usec):
     |  1.00th=[  1729],  5.00th=[  1975], 10.00th=[  2245], 20.00th=[  3097],
     | 30.00th=[  3884], 40.00th=[  4178], 50.00th=[  4621], 60.00th=[  5014],
     | 70.00th=[  5276], 80.00th=[  5800], 90.00th=[  6456], 95.00th=[  7898],
     | 99.00th=[ 32900], 99.50th=[ 66847], 99.90th=[132645], 99.95th=[154141],
     | 99.99th=[170918]
   bw (  KiB/s): min=   56, max=  672, per=99.91%, avg=481.74, stdev=74.40, samples=1199
   iops        : min=   14, max=  168, avg=120.43, stdev=18.60, samples=1199
  lat (usec)   : 100=0.01%, 250=0.01%, 500=2.08%, 750=4.49%, 1000=2.13%
  lat (msec)   : 2=8.44%, 4=42.49%, 10=38.81%, 20=0.89%, 50=0.31%
  lat (msec)   : 100=0.24%, 250=0.11%, 500=0.01%
  cpu          : usr=0.20%, sys=0.75%, ctx=145300, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=72728,72217,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
   READ: bw=485KiB/s (496kB/s), 485KiB/s-485KiB/s (496kB/s-496kB/s), io=284MiB (298MB), run=600006-600006msec
  WRITE: bw=481KiB/s (493kB/s), 481KiB/s-481KiB/s (493kB/s-493kB/s), io=282MiB (296MB), run=600006-600006msec
Disk stats (read/write):
  vda: ios=72722/74795, sectors=582248/768053, merge=0/67, ticks=206476/860665, in_queue=1067364, util=98.78%
[2] The second type of hung task, which I have considered unrelated so far:
May 17 03:30:23 kernel: INFO: task kworker/u18:2:19320 blocked for more than 122 seconds.
May 17 03:30:23 kernel:       Not tainted 6.12.28 #1-NixOS
May 17 03:30:23 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 03:30:23 kernel: task:kworker/u18:2 state:D stack:0 pid:19320 tgid:19320 ppid:2 flags:0x00004000
May 17 03:30:23 kernel: Workqueue: writeback wb_workfn (flush-253:0)
May 17 03:30:23 kernel: Call Trace:
May 17 03:30:23 kernel:  <TASK>
May 17 03:30:23 kernel:  __schedule+0x442/0x12d0
May 17 03:30:23 kernel:  schedule+0x27/0xf0
May 17 03:30:23 kernel:  io_schedule+0x46/0x70
May 17 03:30:23 kernel:  folio_wait_bit_common+0x13f/0x340
May 17 03:30:23 kernel:  ? __pfx_wake_page_function+0x10/0x10
May 17 03:30:23 kernel:  writeback_iter+0x1ec/0x2d0
May 17 03:30:23 kernel:  iomap_writepages+0x74/0x9e0
May 17 03:30:23 kernel:  ? virtqueue_add_split+0xb1/0x7a0 [virtio_ring]
May 17 03:30:23 kernel:  ? virtqueue_add_split+0x2af/0x7a0 [virtio_ring]
May 17 03:30:23 kernel:  xfs_vm_writepages+0x67/0xa0 [xfs]
May 17 03:30:23 kernel:  do_writepages+0x8a/0x290
May 17 03:30:23 kernel:  ? enqueue_hrtimer+0x35/0x90
May 17 03:30:23 kernel:  ? hrtimer_start_range_ns+0x2b7/0x450
May 17 03:30:23 kernel:  __writeback_single_inode+0x3d/0x350
May 17 03:30:23 kernel:  ? wbc_detach_inode+0x116/0x250
May 17 03:30:23 kernel:  writeback_sb_inodes+0x228/0x4e0
May 17 03:30:23 kernel:  __writeback_inodes_wb+0x4c/0xf0
May 17 03:30:23 kernel:  wb_writeback+0x1ac/0x330
May 17 03:30:23 kernel:  ? get_nr_inodes+0x3b/0x60
May 17 03:30:23 kernel:  wb_workfn+0x357/0x460
May 17 03:30:23 kernel:  process_one_work+0x192/0x3b0
May 17 03:30:23 kernel:  worker_thread+0x230/0x340
May 17 03:30:23 kernel:  ? __pfx_worker_thread+0x10/0x10
May 17 03:30:23 kernel:  kthread+0xd0/0x100
May 17 03:30:23 kernel:  ? __pfx_kthread+0x10/0x10
May 17 03:30:23 kernel:  ret_from_fork+0x34/0x50
May 17 03:30:23 kernel:  ? __pfx_kthread+0x10/0x10
May 17 03:30:23 kernel:  ret_from_fork_asm+0x1a/0x30
May 17 03:30:23 kernel:  </TASK>
May 17 03:30:23 kernel: INFO: task nix:21146 blocked for more than 122 seconds.
May 17 03:30:23 kernel:       Not tainted 6.12.28 #1-NixOS
May 17 03:30:23 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 03:30:23 kernel: task:nix state:D stack:0 pid:21146 tgid:21146 ppid:21145 flags:0x00000002
May 17 03:30:23 kernel: Call Trace:
May 17 03:30:23 kernel:  <TASK>
May 17 03:30:23 kernel:  __schedule+0x442/0x12d0
May 17 03:30:23 kernel:  ? xas_load+0xd/0xe0
May 17 03:30:23 kernel:  ? xa_load+0x77/0xb0
May 17 03:30:23 kernel:  schedule+0x27/0xf0
May 17 03:30:23 kernel:  io_schedule+0x46/0x70
May 17 03:30:23 kernel:  folio_wait_bit_common+0x13f/0x340
May 17 03:30:23 kernel:  ? __pfx_wake_page_function+0x10/0x10
May 17 03:30:23 kernel:  folio_wait_writeback+0x2b/0x90
May 17 03:30:23 kernel:  truncate_inode_partial_folio+0x5e/0x1c0
May 17 03:30:23 kernel:  truncate_inode_pages_range+0x1de/0x410
May 17 03:30:23 kernel:  truncate_pagecache+0x47/0x60
May 17 03:30:23 kernel:  xfs_setattr_size+0xf6/0x3c0 [xfs]
May 17 03:30:23 kernel:  xfs_vn_setattr+0x85/0x150 [xfs]
May 17 03:30:23 kernel:  notify_change+0x301/0x500
May 17 03:30:23 kernel:  ? do_truncate+0x98/0xf0
May 17 03:30:23 kernel:  do_truncate+0x98/0xf0
May 17 03:30:23 kernel:  do_ftruncate+0x104/0x170
May 17 03:30:23 kernel:  do_sys_ftruncate+0x3d/0x80
May 17 03:30:23 kernel:  do_syscall_64+0xb7/0x210
May 17 03:30:23 kernel:  entry_SYSCALL_64_after_hwframe+0x77/0x7f
May 17 03:30:23 kernel: RIP: 0033:0x7f87aeb0d75b
May 17 03:30:23 kernel: RSP: 002b:00007ffddbfc04b8 EFLAGS: 00000213 ORIG_RAX: 000000000000004d
May 17 03:30:23 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f87aeb0d75b
May 17 03:30:23 kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000008
May 17 03:30:23 kernel: RBP: 0000557d5dd46498 R08: 0000000000000000 R09: 0000000000000000
May 17 03:30:23 kernel: R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000000008
May 17 03:30:23 kernel: R13: 0000557d5dcd2aa0 R14: 0000557d5dd46468 R15: 0000000000000008
May 17 03:30:23 kernel:  </TASK>
May 17 03:32:26 kernel: INFO: task kworker/u18:2:19320 blocked for more than 245 seconds.
May 17 03:32:26 kernel:       Not tainted 6.12.28 #1-NixOS
May 17 03:32:26 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 03:32:26 kernel: task:kworker/u18:2 state:D stack:0 pid:19320 tgid:19320 ppid:2 flags:0x00004000
May 17 03:32:26 kernel: Workqueue: writeback wb_workfn (flush-253:0)
May 17 03:32:26 kernel: Call Trace:
May 17 03:32:26 kernel:  <TASK>
May 17 03:32:26 kernel:  __schedule+0x442/0x12d0
May 17 03:32:26 kernel:  schedule+0x27/0xf0
May 17 03:32:26 kernel:  io_schedule+0x46/0x70
May 17 03:32:26 kernel:  folio_wait_bit_common+0x13f/0x340
May 17 03:32:26 kernel:  ? __pfx_wake_page_function+0x10/0x10
May 17 03:32:26 kernel:  writeback_iter+0x1ec/0x2d0
May 17 03:32:26 kernel:  iomap_writepages+0x74/0x9e0
May 17 03:32:26 kernel:  ? virtqueue_add_split+0xb1/0x7a0 [virtio_ring]
May 17 03:32:26 kernel:  ? virtqueue_add_split+0x2af/0x7a0 [virtio_ring]
May 17 03:32:26 kernel:  xfs_vm_writepages+0x67/0xa0 [xfs]
May 17 03:32:26 kernel:  do_writepages+0x8a/0x290
May 17 03:32:26 kernel:  ? enqueue_hrtimer+0x35/0x90
May 17 03:32:26 kernel:  ? hrtimer_start_range_ns+0x2b7/0x450
May 17 03:32:26 kernel:  __writeback_single_inode+0x3d/0x350
May 17 03:32:26 kernel:  ? wbc_detach_inode+0x116/0x250
May 17 03:32:26 kernel:  writeback_sb_inodes+0x228/0x4e0
May 17 03:32:26 kernel:  __writeback_inodes_wb+0x4c/0xf0
May 17 03:32:26 kernel:  wb_writeback+0x1ac/0x330
May 17 03:32:26 kernel:  ? get_nr_inodes+0x3b/0x60
May 17 03:32:26 kernel:  wb_workfn+0x357/0x460
May 17 03:32:26 kernel:  process_one_work+0x192/0x3b0
May 17 03:32:26 kernel:  worker_thread+0x230/0x340
May 17 03:32:26 kernel:  ? __pfx_worker_thread+0x10/0x10
May 17 03:32:26 kernel:  kthread+0xd0/0x100
May 17 03:32:26 kernel:  ? __pfx_kthread+0x10/0x10
May 17 03:32:26 kernel:  ret_from_fork+0x34/0x50
May 17 03:32:26 kernel:  ? __pfx_kthread+0x10/0x10
May 17 03:32:26 kernel:  ret_from_fork_asm+0x1a/0x30
May 17 03:32:26 kernel:  </TASK>
May 17 03:32:26 kernel: INFO: task nix:21146 blocked for more than 245 seconds.
May 17 03:32:26 kernel:       Not tainted 6.12.28 #1-NixOS
May 17 03:32:26 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 03:32:26 kernel: task:nix state:D stack:0 pid:21146 tgid:21146 ppid:21145 flags:0x00000002
May 17 03:32:26 kernel: Call Trace:
May 17 03:32:26 kernel:  <TASK>
May 17 03:32:26 kernel:  __schedule+0x442/0x12d0
May 17 03:32:26 kernel:  ? xas_load+0xd/0xe0
May 17 03:32:26 kernel:  ? xa_load+0x77/0xb0
May 17 03:32:26 kernel:  schedule+0x27/0xf0
May 17 03:32:26 kernel:  io_schedule+0x46/0x70
May 17 03:32:26 kernel:  folio_wait_bit_common+0x13f/0x340
May 17 03:32:26 kernel:  ? __pfx_wake_page_function+0x10/0x10
May 17 03:32:26 kernel:  folio_wait_writeback+0x2b/0x90
May 17 03:32:26 kernel:  truncate_inode_partial_folio+0x5e/0x1c0
May 17 03:32:26 kernel:  truncate_inode_pages_range+0x1de/0x410
May 17 03:32:26 kernel:  truncate_pagecache+0x47/0x60
May 17 03:32:26 kernel:  xfs_setattr_size+0xf6/0x3c0 [xfs]
May 17 03:32:26 kernel:  xfs_vn_setattr+0x85/0x150 [xfs]
May 17 03:32:26 kernel:  notify_change+0x301/0x500
May 17 03:32:26 kernel:  ? do_truncate+0x98/0xf0
May 17 03:32:26 kernel:  do_truncate+0x98/0xf0
May 17 03:32:26 kernel:  do_ftruncate+0x104/0x170
May 17 03:32:26 kernel:  do_sys_ftruncate+0x3d/0x80
May 17 03:32:26 kernel:  do_syscall_64+0xb7/0x210
May 17 03:32:26 kernel:  entry_SYSCALL_64_after_hwframe+0x77/0x7f
May 17 03:32:26 kernel: RIP: 0033:0x7f87aeb0d75b
May 17 03:32:26 kernel: RSP: 002b:00007ffddbfc04b8 EFLAGS: 00000213 ORIG_RAX: 000000000000004d
May 17 03:32:26 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f87aeb0d75b
May 17 03:32:26 kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000008
May 17 03:32:26 kernel: RBP: 0000557d5dd46498 R08: 0000000000000000 R09: 0000000000000000
May 17 03:32:26 kernel: R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000000008
May 17 03:32:26 kernel: R13: 0000557d5dcd2aa0 R14: 0000557d5dd46468 R15: 0000000000000008
May 17 03:32:26 kernel:  </TASK>
May 17 03:34:29 kernel: INFO: task kworker/u18:2:19320 blocked for more than 368 seconds.
May 17 03:34:29 kernel:       Not tainted 6.12.28 #1-NixOS
May 17 03:34:29 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 03:34:29 kernel: task:kworker/u18:2 state:D stack:0 pid:19320 tgid:19320 ppid:2 flags:0x00004000
May 17 03:34:29 kernel: Workqueue: writeback wb_workfn (flush-253:0)
May 17 03:34:29 kernel: Call Trace:
May 17 03:34:29 kernel:  <TASK>
May 17 03:34:29 kernel:  __schedule+0x442/0x12d0
May 17 03:34:29 kernel:  schedule+0x27/0xf0
May 17 03:34:29 kernel:  io_schedule+0x46/0x70
May 17 03:34:29 kernel:  folio_wait_bit_common+0x13f/0x340
May 17 03:34:29 kernel:  ? __pfx_wake_page_function+0x10/0x10
May 17 03:34:29 kernel:  writeback_iter+0x1ec/0x2d0
May 17 03:34:29 kernel:  iomap_writepages+0x74/0x9e0
May 17 03:34:29 kernel:  ? virtqueue_add_split+0xb1/0x7a0 [virtio_ring]
May 17 03:34:29 kernel:  ? virtqueue_add_split+0x2af/0x7a0 [virtio_ring]
May 17 03:34:29 kernel:  xfs_vm_writepages+0x67/0xa0 [xfs]
May 17 03:34:29 kernel:  do_writepages+0x8a/0x290
May 17 03:34:29 kernel:  ? enqueue_hrtimer+0x35/0x90
May 17 03:34:29 kernel:  ? hrtimer_start_range_ns+0x2b7/0x450
May 17 03:34:29 kernel:  __writeback_single_inode+0x3d/0x350
May 17 03:34:29 kernel:  ? wbc_detach_inode+0x116/0x250
May 17 03:34:29 kernel:  writeback_sb_inodes+0x228/0x4e0
May 17 03:34:29 kernel:  __writeback_inodes_wb+0x4c/0xf0
May 17 03:34:29 kernel:  wb_writeback+0x1ac/0x330
May 17 03:34:29 kernel:  ? get_nr_inodes+0x3b/0x60
May 17 03:34:29 kernel:  wb_workfn+0x357/0x460
May 17 03:34:29 kernel:  process_one_work+0x192/0x3b0
May 17 03:34:29 kernel:  worker_thread+0x230/0x340
May 17 03:34:29 kernel:  ? __pfx_worker_thread+0x10/0x10
May 17 03:34:29 kernel:  kthread+0xd0/0x100
May 17 03:34:29 kernel:  ? __pfx_kthread+0x10/0x10
May 17 03:34:29 kernel:  ret_from_fork+0x34/0x50
May 17 03:34:29 kernel:  ? __pfx_kthread+0x10/0x10
May 17 03:34:29 kernel:  ret_from_fork_asm+0x1a/0x30
May 17 03:34:29 kernel:  </TASK>
May 17 03:34:29 kernel: INFO: task nix:21146 blocked for more than 368 seconds.
May 17 03:34:29 kernel:       Not tainted 6.12.28 #1-NixOS
May 17 03:34:29 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 03:34:29 kernel: task:nix state:D stack:0 pid:21146 tgid:21146 ppid:21145 flags:0x00000002
May 17 03:34:29 kernel: Call Trace:
May 17 03:34:29 kernel:  <TASK>
May 17 03:34:29 kernel:  __schedule+0x442/0x12d0
May 17 03:34:29 kernel:  ? xas_load+0xd/0xe0
May 17 03:34:29 kernel:  ? xa_load+0x77/0xb0
May 17 03:34:29 kernel:  schedule+0x27/0xf0
May 17 03:34:29 kernel:  io_schedule+0x46/0x70
May 17 03:34:29 kernel:  folio_wait_bit_common+0x13f/0x340
May 17 03:34:29 kernel:  ? __pfx_wake_page_function+0x10/0x10
May 17 03:34:29 kernel:  folio_wait_writeback+0x2b/0x90
May 17 03:34:29 kernel:  truncate_inode_partial_folio+0x5e/0x1c0
May 17 03:34:29 kernel:  truncate_inode_pages_range+0x1de/0x410
May 17 03:34:29 kernel:  truncate_pagecache+0x47/0x60
May 17 03:34:29 kernel:  xfs_setattr_size+0xf6/0x3c0 [xfs]
May 17 03:34:29 kernel:  xfs_vn_setattr+0x85/0x150 [xfs]
May 17 03:34:29 kernel:  notify_change+0x301/0x500
May 17 03:34:29 kernel:  ? do_truncate+0x98/0xf0
May 17 03:34:29 kernel:  do_truncate+0x98/0xf0
May 17 03:34:29 kernel:  do_ftruncate+0x104/0x170
May 17 03:34:29 kernel:  do_sys_ftruncate+0x3d/0x80
May 17 03:34:29 kernel:  do_syscall_64+0xb7/0x210
May 17 03:34:29 kernel:  entry_SYSCALL_64_after_hwframe+0x77/0x7f
May 17 03:34:29 kernel: RIP: 0033:0x7f87aeb0d75b
May 17 03:34:29 kernel: RSP: 002b:00007ffddbfc04b8 EFLAGS: 00000213 ORIG_RAX: 000000000000004d
May 17 03:34:29 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f87aeb0d75b
May 17 03:34:29 kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000008
May 17 03:34:29 kernel: RBP: 0000557d5dd46498 R08: 0000000000000000 R09: 0000000000000000
May 17 03:34:29 kernel: R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000000008
May 17 03:34:29 kernel: R13: 0000557d5dcd2aa0 R14: 0000557d5dd46468 R15: 0000000000000008
May 17 03:34:29 kernel:  </TASK>