On Sun, Apr 22, 2018 at 12:42:57PM +0200, Greg Kroah-Hartman wrote:
On Sun, Apr 22, 2018 at 03:36:32AM -0700, Nathan Chancellor wrote:
From: Greg Thelen <gthelen@google.com>
commit 2e898e4c0a3897ccd434adac5abb8330194f527b upstream.
lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if the page's memcg is undergoing move accounting, which occurs when a process leaves its memcg for a new one that has memory.move_charge_at_immigrate set.
unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if the given inode is switching writeback domains. Switches occur when enough writes are issued from a new domain.
This existing pattern is thus suspicious:
  lock_page_memcg(page);
  unlocked_inode_to_wb_begin(inode, &locked);
  ...
  unlocked_inode_to_wb_end(inode, locked);
  unlock_page_memcg(page);
If an inode switch and a process memcg migration are both in-flight then unlocked_inode_to_wb_end() will unconditionally enable interrupts while still holding the lock_page_memcg() irq spinlock. This suggests the possibility of deadlock if an interrupt occurs before unlock_page_memcg().
  truncate
    __cancel_dirty_page
      lock_page_memcg
      unlocked_inode_to_wb_begin
      unlocked_inode_to_wb_end
        <interrupts mistakenly enabled>
                                        <interrupt>
                                        end_page_writeback
                                          test_clear_page_writeback
                                            lock_page_memcg
                                              <deadlock>
      unlock_page_memcg
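To make the nesting problem concrete, here is a minimal userspace sketch. It is not kernel code: a single boolean stands in for the CPU's interrupt-enable state, and the model_* helpers and irqs_on_after_inner() are hypothetical names invented for illustration. It only models how the irq/irqsave variants treat that state, under the assumption that the outer lock corresponds to lock_page_memcg() and the inner pair to unlocked_inode_to_wb_begin/end().

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: one flag stands in for the interrupt-enable state. */
static bool irqs_enabled = true;

/* Models spin_lock_irqsave(): remember the current state, then disable. */
static unsigned long model_lock_irqsave(void)
{
	unsigned long flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

/* Models spin_unlock_irqrestore(): put back whatever state was saved. */
static void model_unlock_irqrestore(unsigned long flags)
{
	irqs_enabled = (flags != 0);
}

/* Models spin_lock_irq(): disable without remembering the old state. */
static void model_lock_irq(void)
{
	irqs_enabled = false;
}

/* Models spin_unlock_irq(): enable unconditionally. */
static void model_unlock_irq(void)
{
	irqs_enabled = true;
}

/*
 * Nest an inner critical section inside an outer irqsave section and
 * report whether interrupts are enabled again before the outer unlock.
 */
static bool irqs_on_after_inner(bool inner_uses_irqsave)
{
	unsigned long outer, inner;
	bool leaked;

	irqs_enabled = true;
	outer = model_lock_irqsave();		/* stands in for lock_page_memcg() */
	assert(!irqs_enabled);			/* outer section runs with irqs off */

	if (inner_uses_irqsave) {
		inner = model_lock_irqsave();	/* save/restore variant */
		model_unlock_irqrestore(inner);
	} else {
		model_lock_irq();		/* existing begin() behaviour */
		model_unlock_irq();		/* existing end() behaviour */
	}

	leaked = irqs_enabled;			/* window in which an interrupt could fire */
	model_unlock_irqrestore(outer);		/* stands in for unlock_page_memcg() */
	return leaked;
}
```

Calling irqs_on_after_inner(false) models the existing nesting and reports that interrupts are back on while the outer lock is still held; irqs_on_after_inner(true) models a save/restore inner pair, which leaves them off until the outer unlock.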
Due to configuration limitations this deadlock is not currently possible because we don't mix cgroup writeback (a cgroupv2 feature) and memory.move_charge_at_immigrate (a cgroupv1 feature).
If the kernel is hacked to always claim inode switching and memcg moving_account, then this script triggers lockup in less than a minute:
  cd /mnt/cgroup/memory
  mkdir a b
  echo 1 > a/memory.move_charge_at_immigrate
  echo 1 > b/memory.move_charge_at_immigrate
  (
    echo $BASHPID > a/cgroup.procs
    while true; do
      dd if=/dev/zero of=/mnt/big bs=1M count=256
    done
  ) &
  while true; do
    sync
  done &
  sleep 1h &
  SLEEP=$!
  while true; do
    echo $SLEEP > a/cgroup.procs
    echo $SLEEP > b/cgroup.procs
  done
The deadlock does not seem possible, so it's debatable if there's any reason to modify the kernel. I suggest we should, to prevent future surprises. And Wang Long said "this deadlock occurs three times in our environment", so there's more reason to apply this, even to stable.
Stable 4.4 has minor conflicts applying this patch. For a clean 4.4 patch see "[PATCH for-4.4] writeback: safer lock nesting"
https://lkml.org/lkml/2018/4/11/146
[gthelen@google.com: v4]
  Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
[akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
Signed-off-by: Greg Thelen <gthelen@google.com>
Reported-by: Wang Long <wanglong19@meituan.com>
Acked-by: Wang Long <wanglong19@meituan.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org> [v4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[natechancellor: Adjust context due to lack of b93b016313b3b]
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
I tried a "simple" backport like this as well, and it too blew up when building :(
This patch dies with:
$ make M=mm
  CC      mm/filemap.o
In file included from ./include/linux/kernel.h:13:0,
                 from ./include/linux/list.h:9,
                 from ./include/linux/wait.h:7,
                 from ./include/linux/wait_bit.h:8,
                 from ./include/linux/fs.h:6,
                 from ./include/linux/dax.h:5,
                 from mm/filemap.c:14:
./include/linux/backing-dev.h: In function ‘unlocked_inode_to_wb_begin’:
./include/linux/backing-dev.h:373:52: error: ‘flags’ undeclared (first use in this function); did you mean ‘class’?
  spin_lock_irqsave(&inode->i_mapping->tree_lock, *flags);
                                                   ^
./include/linux/typecheck.h:11:9: note: in definition of macro ‘typecheck’
 typeof(x) __dummy2; \
        ^
./include/linux/spinlock.h:340:2: note: in expansion of macro ‘raw_spin_lock_irqsave’
 raw_spin_lock_irqsave(spinlock_check(lock), flags); \
 ^~~~~~~~~~~~~~~~~~~~~
./include/linux/backing-dev.h:373:3: note: in expansion of macro ‘spin_lock_irqsave’
  spin_lock_irqsave(&inode->i_mapping->tree_lock, *flags);
  ^~~~~~~~~~~~~~~~~
./include/linux/backing-dev.h:373:52: note: each undeclared identifier is reported only once for each function it appears in
  spin_lock_irqsave(&inode->i_mapping->tree_lock, *flags);
                                                   ^
./include/linux/typecheck.h:11:9: note: in definition of macro ‘typecheck’
 typeof(x) __dummy2; \
        ^
./include/linux/spinlock.h:340:2: note: in expansion of macro ‘raw_spin_lock_irqsave’
 raw_spin_lock_irqsave(spinlock_check(lock), flags); \
 ^~~~~~~~~~~~~~~~~~~~~
./include/linux/backing-dev.h:373:3: note: in expansion of macro ‘spin_lock_irqsave’
  spin_lock_irqsave(&inode->i_mapping->tree_lock, *flags);
  ^~~~~~~~~~~~~~~~~
./include/linux/typecheck.h:12:18: warning: comparison of distinct pointer types lacks a cast
 (void)(&__dummy == &__dummy2); \
                 ^
./include/linux/spinlock.h:221:3: note: in expansion of macro ‘typecheck’
  typecheck(unsigned long, flags); \
  ^~~~~~~~~
./include/linux/spinlock.h:340:2: note: in expansion of macro ‘raw_spin_lock_irqsave’
 raw_spin_lock_irqsave(spinlock_check(lock), flags); \
 ^~~~~~~~~~~~~~~~~~~~~
./include/linux/backing-dev.h:373:3: note: in expansion of macro ‘spin_lock_irqsave’
  spin_lock_irqsave(&inode->i_mapping->tree_lock, *flags);
  ^~~~~~~~~~~~~~~~~
In file included from mm/filemap.c:29:0:
./include/linux/backing-dev.h: In function ‘unlocked_inode_to_wb_end’:
./include/linux/backing-dev.h:391:56: error: ‘flags’ undeclared (first use in this function); did you mean ‘class’?
  spin_unlock_irqrestore(&inode->i_mapping->tree_lock, flags);
                                                       ^~~~~
                                                       class
make[1]: *** [scripts/Makefile.build:325: mm/filemap.o] Error 1
make: *** [Makefile:1561: _module_mm] Error 2
Can you test-build your patches? :)
thanks,
greg k-h
Sigh, I don't know what happened, but I was looking at an old version of the patch for reference :/
I just sent v2, and I have verified that everything builds properly using x86_64_defconfig.
Again, really sorry about all the noise!
Nathan