Every time I run test srp/002 the following deadlock is triggered:
task:multipathd
Call Trace:
 <TASK>
 __schedule+0x8c1/0x1bf0
 schedule+0xdd/0x270
 schedule_preempt_disabled+0x1c/0x30
 __mutex_lock+0xb89/0x1650
 mutex_lock_nested+0x1f/0x30
 dm_table_set_restrictions+0x823/0xdf0
 __bind+0x166/0x590
 dm_swap_table+0x2a7/0x490
 do_resume+0x1b1/0x610
 dev_suspend+0x55/0x1a0
 ctl_ioctl+0x3a5/0x7e0
 dm_ctl_ioctl+0x12/0x20
 __x64_sys_ioctl+0x127/0x1a0
 x64_sys_call+0xe2b/0x17d0
 do_syscall_64+0x96/0x3a0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
 </TASK>

task:(udev-worker)
Call Trace:
 <TASK>
 __schedule+0x8c1/0x1bf0
 schedule+0xdd/0x270
 blk_mq_freeze_queue_wait+0xf2/0x140
 blk_mq_freeze_queue_nomemsave+0x23/0x30
 queue_ra_store+0x14e/0x290
 queue_attr_store+0x23e/0x2c0
 sysfs_kf_write+0xde/0x140
 kernfs_fop_write_iter+0x3b2/0x630
 vfs_write+0x4fd/0x1390
 ksys_write+0xfd/0x230
 __x64_sys_write+0x76/0xc0
 x64_sys_call+0x276/0x17d0
 do_syscall_64+0x96/0x3a0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
 </TASK>
Fix this by removing the superfluous queue freezing/unfreezing code from queue_ra_store().
Cc: Nilay Shroff <nilay@linux.ibm.com>
Cc: stable@vger.kernel.org
Fixes: b07a889e8335 ("block: move q->sysfs_lock and queue-freeze under show/store method")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-sysfs.c | 3 ---
 1 file changed, 3 deletions(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index b2b9b89d6967..1f63b184c6e9 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -105,7 +105,6 @@ queue_ra_store(struct gendisk *disk, const char *page, size_t count)
 {
 	unsigned long ra_kb;
 	ssize_t ret;
-	unsigned int memflags;
 	struct request_queue *q = disk->queue;
 
 	ret = queue_var_store(&ra_kb, page, count);
@@ -116,10 +115,8 @@ queue_ra_store(struct gendisk *disk, const char *page, size_t count)
 	 * calculated from the queue limits by queue_limits_commit_update.
 	 */
 	mutex_lock(&q->limits_lock);
-	memflags = blk_mq_freeze_queue(q);
 	disk->bdi->ra_pages = ra_kb >> (PAGE_SHIFT - 10);
 	mutex_unlock(&q->limits_lock);
-	blk_mq_unfreeze_queue(q, memflags);
 
 	return ret;
 }
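For reference, this is roughly what queue_ra_store() looks like with the freeze
calls removed, reconstructed from the diff context above. The "static ssize_t"
storage class, the error check after queue_var_store() and the first lines of
the locking comment are not visible in the hunks and are assumptions:

static ssize_t
queue_ra_store(struct gendisk *disk, const char *page, size_t count)
{
	unsigned long ra_kb;
	ssize_t ret;
	struct request_queue *q = disk->queue;

	ret = queue_var_store(&ra_kb, page, count);
	if (ret < 0)		/* assumed error check, not shown in the hunks */
		return ret;

	/*
	 * ... ->ra_pages is usually
	 * calculated from the queue limits by queue_limits_commit_update.
	 */
	mutex_lock(&q->limits_lock);
	/* Only ->ra_pages changes here; no bio or request processing
	 * parameter is modified, hence no queue freeze. */
	disk->bdi->ra_pages = ra_kb >> (PAGE_SHIFT - 10);
	mutex_unlock(&q->limits_lock);

	return ret;
}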
On 6/25/25 10:15 PM, Christoph Hellwig wrote:
On Wed, Jun 25, 2025 at 12:54:50PM -0700, Bart Van Assche wrote:
Fix this by removing the superfluous queue freezing/unfreezing code from queue_ra_store().
You'll need to explain why it is useless here.
If nobody objects I will add the following text to the patch description:
"Freezing the request queue from inside a block layer sysfs store callback function is essential when modifying parameters that affect how bios or requests are processed, e.g. parameters that affect bio_split_to_limit(). Freezing the request queue when modifying parameters that do not affect bio nor request processing is not necessary."
Thanks,
Bart.
On 6/26/25 1:24 AM, Bart Van Assche wrote:
Every time I run test srp/002 the following deadlock is triggered:
task:multipathd
Call Trace:
 <TASK>
 __schedule+0x8c1/0x1bf0
 schedule+0xdd/0x270
 schedule_preempt_disabled+0x1c/0x30
 __mutex_lock+0xb89/0x1650
 mutex_lock_nested+0x1f/0x30
 dm_table_set_restrictions+0x823/0xdf0
 __bind+0x166/0x590
 dm_swap_table+0x2a7/0x490
 do_resume+0x1b1/0x610
 dev_suspend+0x55/0x1a0
 ctl_ioctl+0x3a5/0x7e0
 dm_ctl_ioctl+0x12/0x20
 __x64_sys_ioctl+0x127/0x1a0
 x64_sys_call+0xe2b/0x17d0
 do_syscall_64+0x96/0x3a0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
 </TASK>

task:(udev-worker)
Call Trace:
 <TASK>
 __schedule+0x8c1/0x1bf0
 schedule+0xdd/0x270
 blk_mq_freeze_queue_wait+0xf2/0x140
 blk_mq_freeze_queue_nomemsave+0x23/0x30
 queue_ra_store+0x14e/0x290
 queue_attr_store+0x23e/0x2c0
 sysfs_kf_write+0xde/0x140
 kernfs_fop_write_iter+0x3b2/0x630
 vfs_write+0x4fd/0x1390
 ksys_write+0xfd/0x230
 __x64_sys_write+0x76/0xc0
 x64_sys_call+0x276/0x17d0
 do_syscall_64+0x96/0x3a0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
 </TASK>
It seems that some other thread on your system acquired ->freeze_lock and never released it, and that prevents the udev-worker thread from making forward progress. As udev-worker can't make forward progress, it now also prevents multipathd from making progress (multipathd is waiting on ->limits_lock, which has been acquired by udev-worker).
If you haven't enabled lockdep on your system then can you please configure lockdep and rerun the srp/002 test? I think you will encounter a lockdep splat, and that should help further investigate the deadlock.
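For reference, a typical lockdep-enabled debug configuration looks roughly like the fragment below; this is an assumption about the usual options rather than a prescription for your setup:

CONFIG_PROVE_LOCKING=y        # lock dependency tracking; selects CONFIG_LOCKDEP
CONFIG_DEBUG_LOCK_ALLOC=y     # detect incorrect freeing of live locks
CONFIG_DEBUG_ATOMIC_SLEEP=y   # optional: warns about sleeping in atomic context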
Thanks, --Nilay
On 6/25/25 10:31 PM, Nilay Shroff wrote:
It seems that some other thread on your system acquired ->freeze_lock and never released it, and that prevents the udev-worker thread from making forward progress.
That's wrong. blk_mq_freeze_queue_wait() is waiting for q_usage_counter to drop to zero as the below output shows:
(gdb) list *(blk_mq_freeze_queue_wait+0xf2)
0xffffffff823ab0b2 is in blk_mq_freeze_queue_wait (block/blk-mq.c:190).
185     }
186     EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
187
188     void blk_mq_freeze_queue_wait(struct request_queue *q)
189     {
190             wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
191     }
192     EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
193
194     int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
If you haven't enabled lockdep on your system then can you please configure lockdep and rerun the srp/002 test?
Lockdep was enabled during the test and didn't complain.
This is my analysis of the deadlock:
* Multiple requests are pending:

  # (cd /sys/kernel/debug/block && grep -aH . */*/*/*list) | head
  dm-2/hctx0/cpu0/default_rq_list:0000000035c26c20 {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=137, .internal_tag=-1}
  dm-2/hctx0/cpu0/default_rq_list:000000005060461e {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=136, .internal_tag=-1}
  dm-2/hctx0/cpu0/default_rq_list:000000007cd295ec {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=135, .internal_tag=-1}
  dm-2/hctx0/cpu0/default_rq_list:00000000a4a8006b {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=134, .internal_tag=-1}
  dm-2/hctx0/cpu0/default_rq_list:000000001f93036f {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=140, .internal_tag=-1}
  dm-2/hctx0/cpu0/default_rq_list:00000000333baffb {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=173, .internal_tag=-1}
  dm-2/hctx0/cpu0/default_rq_list:000000002c050850 {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=141, .internal_tag=-1}
  dm-2/hctx0/cpu0/default_rq_list:000000000668dd8b {.op=WRITE, .cmd_flags=SYNC|META|PRIO, .rq_flags=IO_STAT, .state=idle, .tag=133, .internal_tag=-1}
  dm-2/hctx0/cpu0/default_rq_list:0000000079b67c9f {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=207, .internal_tag=-1}
  dm-2/hctx0/cpu107/default_rq_list:0000000036254afb {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=1384, .internal_tag=-1}
* queue_if_no_path is enabled for the multipath device dm-2:

  # ls -l /dev/mapper/mpatha
  lrwxrwxrwx 1 root root 7 Jun 26 08:50 /dev/mapper/mpatha -> ../dm-2
  # dmsetup table mpatha
  0 65536 multipath 1 queue_if_no_path 1 alua 1 1 service-time 0 1 2 8:32 1 1
* The block device 8:32 is being deleted:

  # grep '^8:32$' /sys/class/block/*/dev | wc -l
  0
* blk_mq_freeze_queue_nomemsave() waits for the pending requests to finish. Because the only path in the multipath is being deleted and because queue_if_no_path is enabled, blk_mq_freeze_queue_nomemsave() hangs.
Bart.
On 6/26/25 9:32 PM, Bart Van Assche wrote:
On 6/25/25 10:31 PM, Nilay Shroff wrote:
It seems that some other thread on your system acquired ->freeze_lock and never released it, and that prevents the udev-worker thread from making forward progress.
That's wrong. blk_mq_freeze_queue_wait() is waiting for q_usage_counter to drop to zero as the below output shows:
(gdb) list *(blk_mq_freeze_queue_wait+0xf2)
0xffffffff823ab0b2 is in blk_mq_freeze_queue_wait (block/blk-mq.c:190).
185     }
186     EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
187
188     void blk_mq_freeze_queue_wait(struct request_queue *q)
189     {
190             wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
191     }
192     EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
193
194     int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
If you haven't enabled lockdep on your system then can you please configure lockdep and rerun the srp/002 test?
Lockdep was enabled during the test and didn't complain.
This is my analysis of the deadlock:
- Multiple requests are pending:
# (cd /sys/kernel/debug/block && grep -aH . */*/*/*list) | head
dm-2/hctx0/cpu0/default_rq_list:0000000035c26c20 {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=137, .internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:000000005060461e {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=136, .internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:000000007cd295ec {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=135, .internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:00000000a4a8006b {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=134, .internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:000000001f93036f {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=140, .internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:00000000333baffb {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=173, .internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:000000002c050850 {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=141, .internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:000000000668dd8b {.op=WRITE, .cmd_flags=SYNC|META|PRIO, .rq_flags=IO_STAT, .state=idle, .tag=133, .internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:0000000079b67c9f {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=207, .internal_tag=-1}
dm-2/hctx0/cpu107/default_rq_list:0000000036254afb {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=1384, .internal_tag=-1}
- queue_if_no_path is enabled for the multipath device dm-2:
# ls -l /dev/mapper/mpatha
lrwxrwxrwx 1 root root 7 Jun 26 08:50 /dev/mapper/mpatha -> ../dm-2
# dmsetup table mpatha
0 65536 multipath 1 queue_if_no_path 1 alua 1 1 service-time 0 1 2 8:32 1 1
- The block device 8:32 is being deleted:
# grep '^8:32$' /sys/class/block/*/dev | wc -l
0
- blk_mq_freeze_queue_nomemsave() waits for the pending requests to
finish. Because the only path in the multipath is being deleted and because queue_if_no_path is enabled, blk_mq_freeze_queue_nomemsave() hangs.
Thanks! This makes sense now. But then we have a few other limits (e.g. iostats_passthrough, iostats, write_cache, etc.) that are accessed in the I/O hot path. If we were to update those limits, we would acquire ->limits_lock and also freeze the queue. So I wonder how those could be addressed?
Thanks, --Nilay
On 6/26/25 11:16 PM, Nilay Shroff wrote:
Thanks! This makes sense now. But then we have a few other limits (e.g. iostats_passthrough, iostats, write_cache, etc.) that are accessed in the I/O hot path. If we were to update those limits, we would acquire ->limits_lock and also freeze the queue. So I wonder how those could be addressed?
Is there any Linux distro that sets these sysfs attributes from a udev rule? If not, I don't think that we have to worry about these sysfs attributes.
Thanks,
Bart.