On 6/25/25 10:31 PM, Nilay Shroff wrote:
> It seems that some other thread on your system acquired ->freeze_lock and never released it, and that prevents the udev-worker thread from making forward progress.
That's wrong. blk_mq_freeze_queue_wait() is waiting for q_usage_counter to drop to zero, as the output below shows:
(gdb) list *(blk_mq_freeze_queue_wait+0xf2)
0xffffffff823ab0b2 is in blk_mq_freeze_queue_wait (block/blk-mq.c:190).
185     }
186     EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
187
188     void blk_mq_freeze_queue_wait(struct request_queue *q)
189     {
190             wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
191     }
192     EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
193
194     int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
> If you haven't enabled lockdep on your system, can you please configure lockdep and rerun the srp/002 test?
Lockdep was enabled during the test and didn't complain.
This is my analysis of the deadlock:
* Multiple requests are pending:

# (cd /sys/kernel/debug/block && grep -aH . */*/*/*list) | head
dm-2/hctx0/cpu0/default_rq_list:0000000035c26c20 {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=137, .internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:000000005060461e {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=136, .internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:000000007cd295ec {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=135, .internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:00000000a4a8006b {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=134, .internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:000000001f93036f {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=140, .internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:00000000333baffb {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=173, .internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:000000002c050850 {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=141, .internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:000000000668dd8b {.op=WRITE, .cmd_flags=SYNC|META|PRIO, .rq_flags=IO_STAT, .state=idle, .tag=133, .internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:0000000079b67c9f {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=207, .internal_tag=-1}
dm-2/hctx0/cpu107/default_rq_list:0000000036254afb {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=1384, .internal_tag=-1}
* queue_if_no_path is enabled for the multipath device dm-2:

# ls -l /dev/mapper/mpatha
lrwxrwxrwx 1 root root 7 Jun 26 08:50 /dev/mapper/mpatha -> ../dm-2
# dmsetup table mpatha
0 65536 multipath 1 queue_if_no_path 1 alua 1 1 service-time 0 1 2 8:32 1 1
* The block device 8:32 is being deleted:

# grep '^8:32$' /sys/class/block/*/dev | wc -l
0
* blk_mq_freeze_queue_nomemsave() waits for the pending requests to finish. Because the only path of the multipath device is being deleted and queue_if_no_path is enabled, the pending requests are held on the queue instead of being failed, so they never complete and blk_mq_freeze_queue_nomemsave() hangs.
Bart.