On Sun, Nov 11, 2018 at 02:06:54AM -0600, Corey Wright wrote:
The recently released stable version 3.18.125 introduced a deadlock because dm_get_live_table() is called twice within __dm_destroy().
The backported commit e1db66a5 "dm: fix AB-BA deadlock in __dm_destroy()" doesn't *move* the dm_get_live_table() call from before the mutex_lock(), as the original commit 2a708cff does, but instead *adds* a new dm_get_live_table() call after the mutex_lock(). The two dm_get_live_table() calls result in a deadlock:

[ 311.291323] INFO: task cryptsetup:209 blocked for more than 120 seconds.
[ 311.420925] Not tainted 3.18.125+1-amd64 #1
[ 311.559858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 311.651116] cryptsetup D 0000000000000000 0 209 203 0x00000000
[ 311.732304] ffff88007abfd5f0 0000000000000082 0000000000000001 ffff88007a470390
[ 311.873420] 00000000000136c0 ffff88007a78bfd8 00000000000136c0 ffff88007abfd5f0
[ 311.934275] 0000000000000001 ffff88007a78bc70 7fffffffffffffff ffff88007a78bc68
[ 311.940115] Call Trace:
[ 311.949956] [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod]
[ 312.179891] [<ffffffff81553d8a>] ? schedule_timeout+0x24a/0x2d0
[ 312.375447] [<ffffffff810a7ba4>] ? __wake_up+0x34/0x50
[ 312.377825] [<ffffffff810c3374>] ? srcu_readers_seq_idx.isra.8+0x54/0x70
[ 312.557921] [<ffffffff815519b0>] ? wait_for_completion+0xb0/0x120
[ 312.561314] [<ffffffff81096340>] ? wake_up_state+0x20/0x20
[ 312.664457] [<ffffffff810c36e8>] ? __synchronize_srcu+0xd8/0x120
[ 312.768794] [<ffffffff810c3260>] ? call_srcu+0x70/0x70
[ 312.790337] [<ffffffffa01ca757>] ? __dm_destroy+0x107/0x2e0 [dm_mod]
[ 312.909878] [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod]
[ 312.978804] [<ffffffffa01d0c4e>] ? dev_remove+0xde/0x120 [dm_mod]
[ 313.082322] [<ffffffffa01d12e3>] ? ctl_ioctl+0x203/0x4c0 [dm_mod]
[ 313.175957] [<ffffffffa01d15b3>] ? dm_ctl_ioctl+0x13/0x20 [dm_mod]
[ 313.301981] [<ffffffff811d40c0>] ? do_vfs_ioctl+0x2d0/0x4a0
[ 313.384648] [<ffffffff8108848c>] ? task_work_run+0xbc/0xf0
[ 313.489669] [<ffffffff811d4311>] ? SyS_ioctl+0x81/0xa0
[ 313.510846] [<ffffffff815551cd>] ? system_call_fastpath+0x16/0x1b

Removing the original dm_get_live_table() call from before the mutex_lock() prevents the deadlock.
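
For anyone trying to picture why the extra call hangs: the trace shows __dm_destroy() stuck under __synchronize_srcu(), which fits dm_get_live_table() entering an SRCU read-side section that is then left only once. The following is a toy userspace model of that imbalance, not the kernel code. The struct and all toy_*/destroy_* names are hypothetical, and it assumes a single matching dm_put_live_table() in the function, which is how I read the backported code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an SRCU domain: it only counts outstanding readers.
 * Hypothetical; the real implementation is per-CPU and index based. */
struct toy_srcu {
    int readers;
};

/* Models the read-side entry taken by dm_get_live_table(). */
static int toy_read_lock(struct toy_srcu *s)
{
    s->readers++;
    return 0; /* the real call returns an index; unused in this model */
}

/* Models the read-side exit done by dm_put_live_table(). */
static void toy_read_unlock(struct toy_srcu *s, int idx)
{
    (void)idx;
    s->readers--;
}

/* A real synchronize_srcu() sleeps until the readers drain. The model
 * only reports whether it would still be waiting. */
static bool toy_sync_would_block(const struct toy_srcu *s)
{
    return s->readers > 0;
}

/* Shape of the bad backport: two entries, one exit. */
static bool destroy_backported(struct toy_srcu *s)
{
    int idx;

    idx = toy_read_lock(s);  /* original call, before mutex_lock() */
    idx = toy_read_lock(s);  /* call added by the backport, after mutex_lock() */
    toy_read_unlock(s, idx); /* the single put (assumed) */
    return toy_sync_would_block(s);
}

/* Shape after dropping the first call: one entry, one exit. */
static bool destroy_fixed(struct toy_srcu *s)
{
    int idx = toy_read_lock(s);

    toy_read_unlock(s, idx);
    return toy_sync_would_block(s);
}
```

Calling destroy_backported() on a zeroed struct toy_srcu leaves one reader outstanding, so the modelled synchronize step would wait forever; destroy_fixed() leaves none.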
Thanks for maintaining 3.18!
PS: Greg, was this a subtle attempt to get someone to speak up and say "I am using this!" as you requested in the 3.18.125 release announcement? ;)
Hm, interesting. It looks like git did the wrong thing here, sorry for that :(
--
Thanks,
Sasha