On Sun, Apr 21, 2024 at 01:19:44PM +0530, Ritesh Harjani (IBM) wrote:
> An async dio write to a sparse file can generate a lot of extents,
> and when we unlink this file (using rm), the kernel can be busy
> unmapping and freeing those extents as part of transaction
> processing. Add cond_resched() in xfs_defer_finish_noroll() to
> avoid soft lockup messages. Here is a call trace of such a soft
> lockup:
> watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [rm:81335]
> CPU: 1 PID: 81335 Comm: rm Kdump: loaded Tainted: G L X 5.14.21-150500.53-default
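(For context, the posted patch puts the yield inside XFS's deferred-ops
processing loop. A minimal sketch of that placement, paraphrasing the
rough shape of xfs_defer_finish_noroll() in fs/xfs/libxfs/xfs_defer.c,
with error unwinding and intent relogging elided; the exact hunk in the
posted patch may differ:)

    int
    xfs_defer_finish_noroll(
            struct xfs_trans        **tp)
    {
            struct xfs_defer_pending *dfp = NULL;
            int                     error;
            LIST_HEAD(dop_pending);

            /* Until we run out of pending work to finish... */
            while (!list_empty(&dop_pending) ||
                   !list_empty(&(*tp)->t_dfops)) {
                    /* Proposed: yield once per deferred work item. */
                    cond_resched();

                    xfs_defer_create_intents(*tp);
                    list_splice_init(&(*tp)->t_dfops, &dop_pending);

                    error = xfs_defer_trans_roll(tp);
                    if (error)
                            return error;

                    dfp = list_first_entry(&dop_pending,
                                    struct xfs_defer_pending, dfp_list);
                    error = xfs_defer_finish_one(*tp, dfp);
                    if (error && error != -EAGAIN)
                            return error;
            }
            return 0;
    }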
Can you reproduce this on a current TOT kernel? 5.14 is pretty old, and this stack trace:
> NIP [c00800001b174768] xfs_extent_busy_trim+0xc0/0x2a0 [xfs]
> LR [c00800001b1746f4] xfs_extent_busy_trim+0x4c/0x2a0 [xfs]
> Call Trace:
>   0xc0000000a8268340 (unreliable)
>   xfs_alloc_compute_aligned+0x5c/0x150 [xfs]
>   xfs_alloc_ag_vextent_size+0x1dc/0x8c0 [xfs]
>   xfs_alloc_ag_vextent+0x17c/0x1c0 [xfs]
>   xfs_alloc_fix_freelist+0x274/0x4b0 [xfs]
>   xfs_free_extent_fix_freelist+0x84/0xe0 [xfs]
>   __xfs_free_extent+0xa0/0x240 [xfs]
>   xfs_trans_free_extent+0x6c/0x140 [xfs]
>   xfs_defer_finish_noroll+0x2b0/0x650 [xfs]
>   xfs_inactive_truncate+0xe8/0x140 [xfs]
>   xfs_fs_destroy_inode+0xdc/0x320 [xfs]
>   destroy_inode+0x6c/0xc0
.... doesn't exist anymore.
xfs_inactive_truncate() is now done from a background inodegc thread, not directly in destroy_inode().
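(For reference, a rough sketch of the current call chain, with function
names taken from recent kernels, fs/xfs/xfs_super.c and
fs/xfs/xfs_icache.c; intermediate steps elided:)

    /* Foreground: the unlinking task only queues the inode... */
    xfs_fs_destroy_inode()
      -> xfs_inode_mark_reclaimable()
        -> xfs_inodegc_queue()

    /* ...and a background workqueue does the actual inactivation. */
    xfs_inodegc_worker()
      -> xfs_inodegc_inactivate()
        -> xfs_inactive()
          -> xfs_inactive_truncate()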
I also suspect that any sort of cond_resched() should be in the top-level loop in xfs_bunmapi_range(), not hidden deep in the defer code. The problem is the number of extents being processed without yielding, not the time spent processing each individual deferred work chain to free an extent. Hence the explicit rescheduling should be in the top-level loop, where it can be easily explained and understood, not hidden deep inside the defer chain mechanism....
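(A minimal sketch of what that might look like, paraphrasing the loop
in xfs_bunmapi_range() from fs/xfs/libxfs/xfs_bmap.c; assumed current
shape, untested. Each pass unmaps up to XFS_ITRUNC_MAX_EXTENTS extents
and finishes their deferred frees:)

    int
    xfs_bunmapi_range(
            struct xfs_trans        **tpp,
            struct xfs_inode        *ip,
            uint32_t                flags,
            xfs_fileoff_t           startoff,
            xfs_fileoff_t           endoff)
    {
            xfs_filblks_t           unmap_len = endoff - startoff + 1;
            int                     error = 0;

            while (unmap_len > 0) {
                    /*
                     * Yield once per unmap batch so that removing a
                     * file with a huge number of extents cannot hog
                     * the CPU and trip the soft lockup watchdog.
                     */
                    cond_resched();

                    error = __xfs_bunmapi(*tpp, ip, startoff, &unmap_len,
                                    flags, XFS_ITRUNC_MAX_EXTENTS);
                    if (error)
                            break;

                    /* Free the just-unmapped extents. */
                    error = xfs_defer_finish(tpp);
                    if (error)
                            break;
            }
            return error;
    }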
-Dave.