In the LTS branches currently being maintained (linux-5.4 through linux-6.6), a deadlock can occur during network reconnection when multiple processes or threads write to the same file concurrently.
Take the code of linux-5.10 as an example. The simplified deadlock process is as follows:
```
Process 1                                     Process 2
lock_page() - [1]
wait_on_page_writeback() - [2]
 Waiting for writeback, blocked by [4]
                                              lock_page() - [3] Blocked by [1]
                                              end_page_writeback() - [4] Won't execute
```
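For intuition, the same dependency cycle can be modelled in a few lines of userspace C. This is only a toy model, not CIFS code: `page_lock` stands for PG_locked, the `writeback` flag for PG_writeback, and the `usleep()` simply makes sure process1 wins the race for the lock, as P1 does in the real trace.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;  /* models PG_locked    */
static pthread_mutex_t wb_lock   = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wb_done   = PTHREAD_COND_INITIALIZER;
static bool writeback = true;                                   /* models PG_writeback */

/* Process 1: holds the page lock, then waits for writeback to end. */
static void *process1(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&page_lock);                 /* [1] lock_page()              */
	pthread_mutex_lock(&wb_lock);
	while (writeback)                               /* [2] wait_on_page_writeback() */
		pthread_cond_wait(&wb_done, &wb_lock);  /*     blocked by [4]           */
	pthread_mutex_unlock(&wb_lock);
	pthread_mutex_unlock(&page_lock);
	return NULL;
}

/* Process 2: must take the same page lock before it can end writeback. */
static void *process2(void *arg)
{
	(void)arg;
	usleep(100 * 1000);          /* let process1 take the page lock first          */
	pthread_mutex_lock(&page_lock);                 /* [3] lock_page() - blocked by [1]      */
	pthread_mutex_lock(&wb_lock);
	writeback = false;                              /* [4] end_page_writeback() - never runs */
	pthread_cond_broadcast(&wb_done);
	pthread_mutex_unlock(&wb_lock);
	pthread_mutex_unlock(&page_lock);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, process1, NULL);
	pthread_create(&t2, NULL, process2, NULL);
	pthread_join(t1, NULL);      /* hangs forever: process1 waits for [4], process2 for [1] */
	pthread_join(t2, NULL);
	puts("never reached");
	return 0;
}
```

Dropping `page_lock` in process1 before waiting on `wb_done` breaks the cycle, which is essentially what the later mainline refactoring does.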
Based on my research, I'm going to use two detailed scenarios to illustrate the issue.
Scenario 1:
```
P1 (dd)                          P2 (cifsd)                              P3 (cifsiod)
cifs_writepages
 wdata_prepare_pages
  lock_page - [1]
  wait_on_page_writeback - [2]
   Waiting for writeback,
   blocked by [4]
   wait_on_page_bit
                                 cifs_demultiplex_thread
                                  cifs_read_from_socket
                                   cifs_readv_from_socket
                                   - If another process triggers
                                     reconnect at this point
                                   cifs_reconnect
                                   - mid->mid_state updated to
                                     MID_RETRY_NEEDED
                                  smb2_writev_callback
                                   mid_entry->callback()
                                   - mid_state leads to
                                     wdata->result = -EAGAIN
                                   wdata->result = -EAGAIN
                                   queue_work(cifsiod_wq, &wdata->work);
                                                                         cifs_writev_complete - worker function
                                                                         - wdata->result == -EAGAIN
                                                                           Condition satisfied
                                                                         cifs_writev_requeue
                                                                          lock_page - [3] Blocked by [1]
                                                                          end_page_writeback - [4] Won't execute
                                                                          unlock_page
```
Mainline refactoring commit d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list") unlocks the folio while waiting for writeback to complete. That change went into v6.3-rc1, so scenario 1 only affects LTS versions from linux-5.4 to linux-6.1.
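For reference, the ordering after that refactor is roughly the following. This is only a simplified illustration of the pattern, not the literal mainline code, and the surrounding loop and labels are omitted:

```c
/* Simplified illustration of the post-refactor ordering (not verbatim):
 * the folio lock is released *before* waiting, so the requeue worker can
 * take the lock and end writeback.
 */
if (folio_test_writeback(folio)) {
	folio_unlock(folio);                   /* drop the lock first          */
	if (wbc->sync_mode != WB_SYNC_NONE)
		folio_wait_writeback(folio);   /* then wait, with no lock held */
	/* ...skip this folio and rescan it on the next pass... */
}
```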
Call stack trace:
```
# cat /proc/34/stack
[<0>] __lock_page+0x147/0x3a0
[<0>] cifs_writev_requeue.cold+0x185/0x28e
[<0>] process_one_work+0x1df/0x3b0
[<0>] worker_thread+0x4a/0x3c0
[<0>] kthread+0x125/0x160
[<0>] ret_from_fork+0x22/0x30

# cat /proc/465/stack
[<0>] wait_on_page_bit+0x106/0x2e0
[<0>] wait_on_page_writeback+0x25/0xd0
[<0>] cifs_writepages+0x5ee/0xf60
[<0>] do_writepages+0x43/0xe0
[<0>] __filemap_fdatawrite_range+0xcd/0x110
[<0>] file_write_and_wait_range+0x40/0x90
[<0>] cifs_strict_fsync+0x35/0x470
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x67/0xd1
[ 369.826215] INFO: task kworker/1:1:34 blocked for more than 122 seconds.
[ 369.828964]       Not tainted 5.10.0+ #164
[ 369.830623] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.835104] task:kworker/1:1     state:D stack:13472 pid:   34 ppid:     2 flags:0x00004000
[ 369.838448] Workqueue: cifsiod cifs_writev_complete
[ 369.840242] Call Trace:
[ 369.841219]  __schedule+0x401/0x8e0
[ 369.842568]  schedule+0x49/0x130
[ 369.843785]  io_schedule+0x12/0x40
[ 369.845079]  __lock_page+0x147/0x3a0
[ 369.846444]  ? add_to_page_cache_lru+0x180/0x180
[ 369.847963]  cifs_writev_requeue.cold+0x185/0x28e
[ 369.849193]  process_one_work+0x1df/0x3b0
[ 369.850248]  worker_thread+0x4a/0x3c0
[ 369.851216]  ? process_one_work+0x3b0/0x3b0
[ 369.852308]  kthread+0x125/0x160
[ 369.853167]  ? kthread_park+0x90/0x90
[ 369.854142]  ret_from_fork+0x22/0x30
[ 369.855054] INFO: task kworker/u8:3:96 blocked for more than 122 seconds.
[ 369.856781]       Not tainted 5.10.0+ #164
[ 369.857851] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.859419] task:kworker/u8:3    state:D stack:12744 pid:   96 ppid:     2 flags:0x00004000
[ 369.861041] Workqueue: writeback wb_workfn (flush-cifs-2)
[ 369.862095] Call Trace:
[ 369.862583]  __schedule+0x401/0x8e0
[ 369.863280]  schedule+0x49/0x130
[ 369.863912]  io_schedule+0x12/0x40
[ 369.864604]  __lock_page+0x147/0x3a0
[ 369.865322]  ? add_to_page_cache_lru+0x180/0x180
[ 369.866246]  cifs_writepages+0x620/0xf60
[ 369.867005]  do_writepages+0x43/0xe0
[ 369.867737]  ? __blk_mq_try_issue_directly+0x121/0x1c0
[ 369.868750]  __writeback_single_inode+0x3d/0x320
[ 369.869589]  writeback_sb_inodes+0x20d/0x480
[ 369.870367]  __writeback_inodes_wb+0x4c/0xe0
[ 369.871148]  wb_writeback+0x201/0x2f0
[ 369.871797]  wb_workfn+0x38a/0x4e0
[ 369.872427]  ? check_preempt_curr+0x47/0x70
[ 369.873191]  ? ttwu_do_wakeup.isra.0+0x17/0x170
[ 369.873999]  process_one_work+0x1df/0x3b0
[ 369.874741]  worker_thread+0x4a/0x3c0
[ 369.875421]  ? process_one_work+0x3b0/0x3b0
[ 369.876180]  kthread+0x125/0x160
[ 369.876761]  ? kthread_park+0x90/0x90
[ 369.877431]  ret_from_fork+0x22/0x30
[ 369.878106] INFO: task a.out:465 blocked for more than 122 seconds.
[ 369.879225]       Not tainted 5.10.0+ #164
[ 369.879945] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.881316] task:a.out           state:D stack:12752 pid:  465 ppid:   386 flags:0x00000002
[ 369.882791] Call Trace:
[ 369.883263]  __schedule+0x401/0x8e0
[ 369.883884]  schedule+0x49/0x130
[ 369.884447]  io_schedule+0x12/0x40
[ 369.885054]  wait_on_page_bit+0x106/0x2e0
[ 369.885795]  ? add_to_page_cache_lru+0x180/0x180
[ 369.886631]  wait_on_page_writeback+0x25/0xd0
[ 369.887427]  cifs_writepages+0x5ee/0xf60
[ 369.888151]  do_writepages+0x43/0xe0
[ 369.888789]  ? __generic_file_write_iter+0xfd/0x1d0
[ 369.889663]  __filemap_fdatawrite_range+0xcd/0x110
[ 369.890523]  file_write_and_wait_range+0x40/0x90
[ 369.891360]  cifs_strict_fsync+0x35/0x470
[ 369.892094]  do_fsync+0x38/0x70
[ 369.892657]  __x64_sys_fsync+0x10/0x20
[ 369.893336]  do_syscall_64+0x33/0x40
[ 369.893978]  entry_SYSCALL_64_after_hwframe+0x67/0xd1
[ 369.894883] RIP: 0033:0x7f660e208950
[ 369.895538] RSP: 002b:00007fff52b27b78 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
[ 369.896882] RAX: ffffffffffffffda RBX: 00007fff52b28cb8 RCX: 00007f660e208950
[ 369.898139] RDX: 0000000000001000 RSI: 00007fff52b27b80 RDI: 0000000000000003
[ 369.899395] RBP: 00007fff52b28ba0 R08: 0000000000000410 R09: 0000000000000001
[ 369.900661] R10: 00007f660e11c400 R11: 0000000000000202 R12: 0000000000000000
[ 369.901925] R13: 00007fff52b28cc8 R14: 00007f660e328000 R15: 000055b5aeb6fdd8
[ 369.903202] INFO: task sync:468 blocked for more than 122 seconds.
[ 369.904311]       Not tainted 5.10.0+ #164
[ 369.905034] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.906457] task:sync            state:D stack:13632 pid:  468 ppid:   386 flags:0x00004002
[ 369.907930] Call Trace:
[ 369.908369]  __schedule+0x401/0x8e0
[ 369.908984]  schedule+0x49/0x130
[ 369.909582]  io_schedule+0x12/0x40
[ 369.910208]  wait_on_page_bit+0x106/0x2e0
[ 369.910918]  ? add_to_page_cache_lru+0x180/0x180
[ 369.911758]  wait_on_page_writeback+0x25/0xd0
[ 369.912560]  __filemap_fdatawait_range+0x83/0x110
[ 369.913408]  ? __add_pages+0x6f/0x1b0
[ 369.914089]  filemap_fdatawait_keep_errors+0x1a/0x50
[ 369.914957]  sync_inodes_sb+0x208/0x2a0
[ 369.915666]  ? __x64_sys_tee+0xd0/0xd0
[ 369.916344]  iterate_supers+0x90/0xe0
[ 369.916983]  ksys_sync+0x40/0xb0
[ 369.917590]  __do_sys_sync+0xa/0x20
[ 369.918240]  do_syscall_64+0x33/0x40
[ 369.918884]  entry_SYSCALL_64_after_hwframe+0x67/0xd1
[ 369.919800] RIP: 0033:0x7f746d820987
[ 369.920451] RSP: 002b:00007ffce853fd78 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2
[ 369.921798] RAX: ffffffffffffffda RBX: 00007ffce853fed8 RCX: 00007f746d820987
[ 369.923063] RDX: 00007f746d8f4801 RSI: 00007ffce8541f71 RDI: 00007f746d8b05ad
[ 369.924339] RBP: 0000000000000001 R08: 000000000000ffff R09: 0000000000000000
[ 369.925605] R10: 00007f746d7308a0 R11: 0000000000000206 R12: 000055b8487470fb
[ 369.926866] R13: 0000000000000000 R14: 0000000000000000 R15: 000055b848749ce0
[ 369.928138] Kernel panic - not syncing: hung_task: blocked tasks
[ 369.929191] CPU: 3 PID: 35 Comm: khungtaskd Not tainted 5.10.0+ #164
[ 369.952450] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
[ 369.956984] Call Trace:
[ 369.957973]  dump_stack+0x57/0x6e
[ 369.959273]  panic+0x115/0x2f1
[ 369.960476]  watchdog.cold+0xb5/0xb5
[ 369.961884]  ? hungtask_pm_notify+0x40/0x40
[ 369.963310]  kthread+0x125/0x160
[ 369.964354]  ? kthread_park+0x90/0x90
[ 369.965551]  ret_from_fork+0x22/0x30
[ 369.967673] Kernel Offset: 0xd600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 369.971025] ---[ end Kernel panic - not syncing: hung_task: blocked tasks ]---
```
Scenario 2:
Scenario 2 occurs in strict cache mode.
```
P1 (dd)                          P2 (cifsd)                              P3 (cifsiod)
cifs_strict_writev
 cifs_zap_mapping
 - If something breaks the oplock
  cifs_revalidate_mapping
   cifs_invalidate_mapping
    invalidate_inode_pages2
     invalidate_inode_pages2_range
      lock_page - [1]
      wait_on_page_writeback - [2]
       Waiting for writeback,
       blocked by [4]
       wait_on_page_bit
                                 cifs_demultiplex_thread
                                  cifs_read_from_socket
                                   cifs_readv_from_socket
                                   - If another process triggers
                                     reconnect at this point
                                   cifs_reconnect
                                   - mid->mid_state updated to
                                     MID_RETRY_NEEDED
                                  smb2_writev_callback
                                   mid_entry->callback()
                                   - mid_state leads to
                                     wdata->result = -EAGAIN
                                   wdata->result = -EAGAIN
                                   queue_work(cifsiod_wq, &wdata->work);
                                                                         cifs_writev_complete - worker function
                                                                         - wdata->result == -EAGAIN
                                                                           Condition satisfied
                                                                         cifs_writev_requeue
                                                                          lock_page - [3] Blocked by [1]
                                                                          end_page_writeback - [4] Won't execute
                                                                          unlock_page
```
Mainline refactoring commit 3ee1a1fc3981 ("cifs: Cut over to using netfslib") fails the write directly instead of resending the data when smb2_writev_callback() detects a write failure, which avoids this problem. That change went into v6.10-rc1, so scenario 2 affects LTS versions from linux-5.4 to linux-6.6.
```
# cat /proc/522/stack
[<0>] wait_on_page_bit+0x106/0x150
[<0>] invalidate_inode_pages2_range+0x2cc/0x580
[<0>] cifs_invalidate_mapping+0x2c/0x50 [cifs]
[<0>] cifs_revalidate_mapping+0x4c/0x90 [cifs]
[<0>] cifs_strict_writev+0x17a/0x250 [cifs]
[<0>] __vfs_write+0x14f/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x57/0xd0
[<0>] do_syscall_64+0x63/0x250
[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[<0>] 0xffffffffffffffff

# cat /proc/33/stack
[<0>] __lock_page+0x10c/0x160
[<0>] cifs_writev_requeue.cold+0x17e/0x239 [cifs]
[<0>] process_one_work+0x1a9/0x3f0
[<0>] worker_thread+0x50/0x3c0
[<0>] kthread+0x117/0x130
[<0>] ret_from_fork+0x35/0x40
[<0>] 0xffffffffffffffff
```
The root cause of the deadlock is that the page/folio is locked again in cifs_writev_requeue(). In order to fix it safely on the LTS branches, I would like to clarify the following questions:
1. Is resending necessary? If retransmission is not required, simply terminating the write would avoid this problem. Is that an acceptable solution? (A hypothetical sketch of what that could look like follows this list.)
2. Is it necessary to lock the page/folio in cifs_writev_requeue()? Based on my code review (I may have missed something), no path appears to modify a page while it is marked PG_writeback; therefore, the page would not need to be held locked during wait_on_page_writeback().
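To make question 1 concrete, here is one hypothetical shape such a change could take on the LTS code. This is an untested sketch, not a proposed patch: cifs_writedata_fail_pages() is a made-up helper name, and it assumes struct cifs_writedata still exposes nr_pages and pages[] as in 5.10. The idea is simply to end writeback with an error instead of re-locking the pages.

```c
/* Hypothetical, untested sketch for question 1: instead of requeueing in
 * cifs_writev_complete() when wdata->result == -EAGAIN, fail the pages and
 * end writeback without a second lock_page(), so [3] never happens.
 */
static void cifs_writedata_fail_pages(struct cifs_writedata *wdata, int error)
{
	unsigned int i;

	for (i = 0; i < wdata->nr_pages; i++) {
		struct page *page = wdata->pages[i];

		SetPageError(page);
		mapping_set_error(page->mapping, error);
		end_page_writeback(page);	/* wakes the waiter at [2] */
		put_page(page);
	}
}
```

Whether losing the retransmission is acceptable is exactly what question 1 asks.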
Adding David Howells in case he has opinions.
On Sun, Jan 26, 2025 at 1:37 AM Wang Zhaolong <wangzhaolong1@huawei.com> wrote:
This is before cifs moved over to using netfslib (v6.9), at which point netfslib took over all of the interaction with the VFS/VM for I/O and the handling of pages/folios. Do you know if the same problem occurs after that point?
David
Apologies for the earlier context-less ping 🙏. Here's the situation:
I have been tracking the latest progress on fixing an issue involving a deadlock in the CIFS write path caused by a network interruption. This problem affects LTS Linux kernel versions 5.4.y through 6.6.y. The reason it is limited to the LTS versions is that the issue was avoided in mainline 6.9 by the netfs-based restructuring of the CIFS code.
In my previous email, I provided the code call flow of the issue, as well as the invasive kernel modification I used to reproduce it. If there is anything else I can provide to help move this forward, please let me know.
Thank you for your time and support!
Best regards, Wang Zhaolong
On Tue, Mar 18, 2025 at 09:50:25PM +0800, Wang Zhaolong wrote:
Friendly ping.
Empty pings with no context are not good :(