The recently released stable version 3.18.125 introduced a deadlock because dm_get_live_table() is called twice within __dm_destroy().
The backported commit e1db66a5 "dm: fix AB-BA deadlock in __dm_destroy()" doesn't *move* the dm_get_live_table() call from before the mutex_lock(), as the original commit 2a708cff does, but instead *adds* a new dm_get_live_table() call after the mutex_lock(). The two dm_get_live_table() calls result in a deadlock:
[ 311.291323] INFO: task cryptsetup:209 blocked for more than 120 seconds. [ 311.420925] Not tainted 3.18.125+1-amd64 #1 [ 311.559858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 311.651116] cryptsetup D 0000000000000000 0 209 203 0x00000000 [ 311.732304] ffff88007abfd5f0 0000000000000082 0000000000000001 ffff88007a470390 [ 311.873420] 00000000000136c0 ffff88007a78bfd8 00000000000136c0 ffff88007abfd5f0 [ 311.934275] 0000000000000001 ffff88007a78bc70 7fffffffffffffff ffff88007a78bc68 [ 311.940115] Call Trace: [ 311.949956] [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod] [ 312.179891] [<ffffffff81553d8a>] ? schedule_timeout+0x24a/0x2d0 [ 312.375447] [<ffffffff810a7ba4>] ? __wake_up+0x34/0x50 [ 312.377825] [<ffffffff810c3374>] ? srcu_readers_seq_idx.isra.8+0x54/0x70 [ 312.557921] [<ffffffff815519b0>] ? wait_for_completion+0xb0/0x120 [ 312.561314] [<ffffffff81096340>] ? wake_up_state+0x20/0x20 [ 312.664457] [<ffffffff810c36e8>] ? __synchronize_srcu+0xd8/0x120 [ 312.768794] [<ffffffff810c3260>] ? call_srcu+0x70/0x70 [ 312.790337] [<ffffffffa01ca757>] ? __dm_destroy+0x107/0x2e0 [dm_mod] [ 312.909878] [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod] [ 312.978804] [<ffffffffa01d0c4e>] ? dev_remove+0xde/0x120 [dm_mod] [ 313.082322] [<ffffffffa01d12e3>] ? ctl_ioctl+0x203/0x4c0 [dm_mod] [ 313.175957] [<ffffffffa01d15b3>] ? dm_ctl_ioctl+0x13/0x20 [dm_mod] [ 313.301981] [<ffffffff811d40c0>] ? do_vfs_ioctl+0x2d0/0x4a0 [ 313.384648] [<ffffffff8108848c>] ? task_work_run+0xbc/0xf0 [ 313.489669] [<ffffffff811d4311>] ? SyS_ioctl+0x81/0xa0 [ 313.510846] [<ffffffff815551cd>] ? system_call_fastpath+0x16/0x1b
Removing the original dm_get_live_table() call from before the mutex_lock() prevents the deadlock.
Thanks for maintaining 3.18!
PS Greg, Was this a subtle attempt to get someone to speak up and say "I am using this!" as you requested in the 3.18.125 release announcement? ;)
Corey -- undefined@pobox.com
__dm_destroy() takes io_barrier SRCU lock (dm_get_live_table) twice which leads to a deadlock. Remove taking lock before holding suspend_lock to prevent a different potential deadlock.
Signed-off-by: Corey Wright undefined@pobox.com Fixes: e1db66a5fdcc ("dm: fix AB-BA deadlock in __dm_destroy()") Cc: Sasha Levin sashal@kernel.org --- drivers/md/dm.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 00c86ff..3955df0 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -2589,7 +2589,6 @@ static void __dm_destroy(struct mapped_device *md, bool wait) might_sleep();
spin_lock(&_minor_lock); - map = dm_get_live_table(md, &srcu_idx); idr_replace(&_minor_idr, MINOR_ALLOCED, MINOR(disk_devt(dm_disk(md)))); set_bit(DMF_FREEING, &md->flags); spin_unlock(&_minor_lock);
On Sun, Nov 11, 2018 at 02:06:54AM -0600, Corey Wright wrote:
The recently released stable version 3.18.125 introduced a deadlock because dm_get_live_table() is called twice within __dm_destroy().
The backported commit e1db66a5 "dm: fix AB-BA deadlock in __dm_destroy()" doesn't *move* the dm_get_live_table() call from before the mutex_lock(), as the original commit 2a708cff does, but instead *adds* a new dm_get_live_table() call after the mutex_lock(). The two dm_get_live_table() calls result in a deadlock:
[ 311.291323] INFO: task cryptsetup:209 blocked for more than 120 seconds. [ 311.420925] Not tainted 3.18.125+1-amd64 #1 [ 311.559858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 311.651116] cryptsetup D 0000000000000000 0 209 203 0x00000000 [ 311.732304] ffff88007abfd5f0 0000000000000082 0000000000000001 ffff88007a470390 [ 311.873420] 00000000000136c0 ffff88007a78bfd8 00000000000136c0 ffff88007abfd5f0 [ 311.934275] 0000000000000001 ffff88007a78bc70 7fffffffffffffff ffff88007a78bc68 [ 311.940115] Call Trace: [ 311.949956] [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod] [ 312.179891] [<ffffffff81553d8a>] ? schedule_timeout+0x24a/0x2d0 [ 312.375447] [<ffffffff810a7ba4>] ? __wake_up+0x34/0x50 [ 312.377825] [<ffffffff810c3374>] ? srcu_readers_seq_idx.isra.8+0x54/0x70 [ 312.557921] [<ffffffff815519b0>] ? wait_for_completion+0xb0/0x120 [ 312.561314] [<ffffffff81096340>] ? wake_up_state+0x20/0x20 [ 312.664457] [<ffffffff810c36e8>] ? __synchronize_srcu+0xd8/0x120 [ 312.768794] [<ffffffff810c3260>] ? call_srcu+0x70/0x70 [ 312.790337] [<ffffffffa01ca757>] ? __dm_destroy+0x107/0x2e0 [dm_mod] [ 312.909878] [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod] [ 312.978804] [<ffffffffa01d0c4e>] ? dev_remove+0xde/0x120 [dm_mod] [ 313.082322] [<ffffffffa01d12e3>] ? ctl_ioctl+0x203/0x4c0 [dm_mod] [ 313.175957] [<ffffffffa01d15b3>] ? dm_ctl_ioctl+0x13/0x20 [dm_mod] [ 313.301981] [<ffffffff811d40c0>] ? do_vfs_ioctl+0x2d0/0x4a0 [ 313.384648] [<ffffffff8108848c>] ? task_work_run+0xbc/0xf0 [ 313.489669] [<ffffffff811d4311>] ? SyS_ioctl+0x81/0xa0 [ 313.510846] [<ffffffff815551cd>] ? system_call_fastpath+0x16/0x1b
Removing the original dm_get_live_table() call from before the mutex_lock() prevents the deadlock.
Thanks for maintaining 3.18!
PS Greg, Was this a subtle attempt to get someone to speak up and say "I am using this!" as you requested in the 3.18.125 release announcement? ;)
Heh, no it was not, but it's nice to see someone actually read that :)
Why are you using 3.18 and why can you not just use a newer kernel version?
thanks,
greg k-h
On Sun, 11 Nov 2018 12:53:28 -0800 Greg KH greg@kroah.com wrote:
On Sun, Nov 11, 2018 at 02:06:54AM -0600, Corey Wright wrote:
The recently released stable version 3.18.125 introduced a deadlock because dm_get_live_table() is called twice within __dm_destroy().
The backported commit e1db66a5 "dm: fix AB-BA deadlock in __dm_destroy()" doesn't *move* the dm_get_live_table() call from before the mutex_lock(), as the original commit 2a708cff does, but instead *adds* a new dm_get_live_table() call after the mutex_lock(). The two dm_get_live_table() calls result in a deadlock:
[ 311.291323] INFO: task cryptsetup:209 blocked for more than 120 seconds. [ 311.420925] Not tainted 3.18.125+1-amd64 #1 [ 311.559858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 311.651116] cryptsetup D 0000000000000000 0 209 203 0x00000000 [ 311.732304] ffff88007abfd5f0 0000000000000082 0000000000000001 ffff88007a470390 [ 311.873420] 00000000000136c0 ffff88007a78bfd8 00000000000136c0 ffff88007abfd5f0 [ 311.934275] 0000000000000001 ffff88007a78bc70 7fffffffffffffff ffff88007a78bc68 [ 311.940115] Call Trace: [ 311.949956] [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod] [ 312.179891] [<ffffffff81553d8a>] ? schedule_timeout+0x24a/0x2d0 [ 312.375447] [<ffffffff810a7ba4>] ? __wake_up+0x34/0x50 [ 312.377825] [<ffffffff810c3374>] ? srcu_readers_seq_idx.isra.8+0x54/0x70 [ 312.557921] [<ffffffff815519b0>] ? wait_for_completion+0xb0/0x120 [ 312.561314] [<ffffffff81096340>] ? wake_up_state+0x20/0x20 [ 312.664457] [<ffffffff810c36e8>] ? __synchronize_srcu+0xd8/0x120 [ 312.768794] [<ffffffff810c3260>] ? call_srcu+0x70/0x70 [ 312.790337] [<ffffffffa01ca757>] ? __dm_destroy+0x107/0x2e0 [dm_mod] [ 312.909878] [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod] [ 312.978804] [<ffffffffa01d0c4e>] ? dev_remove+0xde/0x120 [dm_mod] [ 313.082322] [<ffffffffa01d12e3>] ? ctl_ioctl+0x203/0x4c0 [dm_mod] [ 313.175957] [<ffffffffa01d15b3>] ? dm_ctl_ioctl+0x13/0x20 [dm_mod] [ 313.301981] [<ffffffff811d40c0>] ? do_vfs_ioctl+0x2d0/0x4a0 [ 313.384648] [<ffffffff8108848c>] ? task_work_run+0xbc/0xf0 [ 313.489669] [<ffffffff811d4311>] ? SyS_ioctl+0x81/0xa0 [ 313.510846] [<ffffffff815551cd>] ? system_call_fastpath+0x16/0x1b
Removing the original dm_get_live_table() call from before the mutex_lock() prevents the deadlock.
Thanks for maintaining 3.18!
PS Greg, Was this a subtle attempt to get someone to speak up and say "I am using this!" as you requested in the 3.18.125 release announcement? ;)
Heh, no it was not, but it's nice to see someone actually read that :)
Why are you using 3.18 and why can you not just use a newer kernel version?
Full disclosure: I have no expectations that you maintain 3.18, especially not for my personal use-case. I am not a device manufacturer and therefor realize I'm not the intended audience for long-term stable kernels.
It's all your fault! You make staying on 3.18 too easy. ;) And even when the process breaks, like the bad backport I encountered with 3.18.125, as a software developer who doesn't do C programming professionally any more I enjoyed the exercise.
I run 3.18 patched with Linux-VServer separating several services on a single personal server. I was hoping to migrate to LXC so as to use my distro's kernel, but between busyness of job and family and continued stable updates I've stuck with 3.18. Linux-VServer even supports 4.4 and 4.9, but momentum has kept me on 3.18.
If you drop 3.18 stable updates then I'll make do. (Maybe that'll be the kick in the pants I need to finally migrate to LXC or maybe I'll keep kicking the can down the road and migrate to 4.4 or 4.9 patched with Linux-VServer.)
thanks,
greg k-h
Thank you for your and Sasha's support of 3.18 specifically, but the Linux kernel community in general!
Corey -- undefined@pobox.com
On Sun, Nov 11, 2018 at 02:06:54AM -0600, Corey Wright wrote:
The recently released stable version 3.18.125 introduced a deadlock because dm_get_live_table() is called twice within __dm_destroy().
The backported commit e1db66a5 "dm: fix AB-BA deadlock in __dm_destroy()" doesn't *move* the dm_get_live_table() call from before the mutex_lock(), as the original commit 2a708cff does, but instead *adds* a new dm_get_live_table() call after the mutex_lock(). The two dm_get_live_table() calls result in a deadlock:
[ 311.291323] INFO: task cryptsetup:209 blocked for more than 120 seconds. [ 311.420925] Not tainted 3.18.125+1-amd64 #1 [ 311.559858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 311.651116] cryptsetup D 0000000000000000 0 209 203 0x00000000 [ 311.732304] ffff88007abfd5f0 0000000000000082 0000000000000001 ffff88007a470390 [ 311.873420] 00000000000136c0 ffff88007a78bfd8 00000000000136c0 ffff88007abfd5f0 [ 311.934275] 0000000000000001 ffff88007a78bc70 7fffffffffffffff ffff88007a78bc68 [ 311.940115] Call Trace: [ 311.949956] [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod] [ 312.179891] [<ffffffff81553d8a>] ? schedule_timeout+0x24a/0x2d0 [ 312.375447] [<ffffffff810a7ba4>] ? __wake_up+0x34/0x50 [ 312.377825] [<ffffffff810c3374>] ? srcu_readers_seq_idx.isra.8+0x54/0x70 [ 312.557921] [<ffffffff815519b0>] ? wait_for_completion+0xb0/0x120 [ 312.561314] [<ffffffff81096340>] ? wake_up_state+0x20/0x20 [ 312.664457] [<ffffffff810c36e8>] ? __synchronize_srcu+0xd8/0x120 [ 312.768794] [<ffffffff810c3260>] ? call_srcu+0x70/0x70 [ 312.790337] [<ffffffffa01ca757>] ? __dm_destroy+0x107/0x2e0 [dm_mod] [ 312.909878] [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod] [ 312.978804] [<ffffffffa01d0c4e>] ? dev_remove+0xde/0x120 [dm_mod] [ 313.082322] [<ffffffffa01d12e3>] ? ctl_ioctl+0x203/0x4c0 [dm_mod] [ 313.175957] [<ffffffffa01d15b3>] ? dm_ctl_ioctl+0x13/0x20 [dm_mod] [ 313.301981] [<ffffffff811d40c0>] ? do_vfs_ioctl+0x2d0/0x4a0 [ 313.384648] [<ffffffff8108848c>] ? task_work_run+0xbc/0xf0 [ 313.489669] [<ffffffff811d4311>] ? SyS_ioctl+0x81/0xa0 [ 313.510846] [<ffffffff815551cd>] ? system_call_fastpath+0x16/0x1b
Removing the original dm_get_live_table() call from before the mutex_lock() prevents the deadlock.
Thanks for maintaining 3.18!
PS Greg, Was this a subtle attempt to get someone to speak up and say "I am using this!" as you requested in the 3.18.125 release announcement? ;)
Hm, interesting. It looks like git did the wrong thing here, sorry for that :(
-- Thanks, Sasha
linux-stable-mirror@lists.linaro.org