Re: [PATCH 4.14/4.16 v2] writeback: safer lock nesting

22 Apr 2018


On Sun, Apr 22, 2018 at 5:24 AM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
...
On Sun, Apr 22, 2018 at 04:15:12AM -0700, Nathan Chancellor wrote:
...
From: Greg Thelen <gthelen@google.com>
commit 2e898e4c0a3897ccd434adac5abb8330194f527b upstream.
lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
the page's memcg is undergoing move accounting, which occurs when a
process leaves its memcg for a new one that has
memory.move_charge_at_immigrate set.
unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
the given inode is switching writeback domains.  Switches occur when
enough writes are issued from a new domain.
This existing pattern is thus suspicious:
    lock_page_memcg(page);
    unlocked_inode_to_wb_begin(inode, &locked);
    ...
    unlocked_inode_to_wb_end(inode, locked);
    unlock_page_memcg(page);
If inode switch and process memcg migration are both in-flight then
unlocked_inode_to_wb_end() will unconditionally enable interrupts while
still holding the lock_page_memcg() irq spinlock.  This suggests the
possibility of deadlock if an interrupt occurs before
unlock_page_memcg().
...
...
truncate
__cancel_dirty_page
lock_page_memcg
unlocked_inode_to_wb_begin
unlocked_inode_to_wb_end
<interrupts mistakenly enabled>
                                <interrupt>
                                end_page_writeback
                                test_clear_page_writeback
                                lock_page_memcg
                                <deadlock>
unlock_page_memcg
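
As a rough userspace model of the interleaving above (not kernel code), the sketch below tracks a single simulated interrupt-enable flag. All names in it (`irqs_enabled`, the `sim_*` helpers, `nested_irq_unsafe()`, `nested_irq_safe()`) are hypothetical stand-ins chosen for illustration. The second function assumes the inner critical section is changed to save and restore the interrupt state instead of unconditionally enabling it, which is one way to make the nesting safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated per-CPU interrupt-enable flag (hypothetical, illustration only). */
static bool irqs_enabled = true;

/* Models spin_lock_irqsave(): remember the current state, then disable. */
static bool sim_lock_irqsave(void)
{
    bool flags = irqs_enabled;
    irqs_enabled = false;
    return flags;
}

/* Models spin_unlock_irqrestore(): put back whatever state was saved. */
static void sim_unlock_irqrestore(bool flags)
{
    irqs_enabled = flags;
}

/* Models spin_lock_irq(): disable without remembering the prior state. */
static void sim_lock_irq(void)
{
    irqs_enabled = false;
}

/* Models spin_unlock_irq(): enable unconditionally. */
static void sim_unlock_irq(void)
{
    irqs_enabled = true;
}

/*
 * Suspicious nesting: returns true if interrupts are enabled while the
 * outer (lock_page_memcg-like) lock is still held.
 */
bool nested_irq_unsafe(void)
{
    irqs_enabled = true;
    bool outer = sim_lock_irqsave();   /* lock_page_memcg() */
    sim_lock_irq();                    /* unlocked_inode_to_wb_begin() */
    assert(!irqs_enabled);             /* both locks held, IRQs off */
    sim_unlock_irq();                  /* unlocked_inode_to_wb_end() */
    bool window = irqs_enabled;        /* outer lock is still held here */
    sim_unlock_irqrestore(outer);      /* unlock_page_memcg() */
    return window;
}

/* Same nesting, but the inner section saves and restores the state. */
bool nested_irq_safe(void)
{
    irqs_enabled = true;
    bool outer = sim_lock_irqsave();   /* outer lock */
    bool inner = sim_lock_irqsave();   /* inner lock, state saved */
    assert(!irqs_enabled);
    sim_unlock_irqrestore(inner);      /* restores "disabled" */
    bool window = irqs_enabled;        /* outer lock is still held here */
    sim_unlock_irqrestore(outer);
    return window;
}
```

Called from a small test harness, `nested_irq_unsafe()` reports that the window exists (interrupts on while the outer lock is held), and `nested_irq_safe()` reports that it does not. The model ignores the spinlocks themselves and only follows the interrupt flag, since that flag is what the trace above turns on.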


Due to configuration limitations this deadlock is not currently possible
because we don't mix cgroup writeback (a cgroupv2 feature) and
memory.move_charge_at_immigrate (a cgroupv1 feature).
If the kernel is hacked to always claim inode switching and memcg
moving_account, then this script triggers lockup in less than a minute:
  cd /mnt/cgroup/memory
  mkdir a b
  echo 1 > a/memory.move_charge_at_immigrate
  echo 1 > b/memory.move_charge_at_immigrate
  (
    echo $BASHPID > a/cgroup.procs
    while true; do
      dd if=/dev/zero of=/mnt/big bs=1M count=256
    done
  ) &
  while true; do
    sync
  done &
  sleep 1h &
  SLEEP=$!
  while true; do
    echo $SLEEP > a/cgroup.procs
    echo $SLEEP > b/cgroup.procs
  done
The deadlock does not seem possible, so it's debatable if there's any
reason to modify the kernel.  I suggest we should, to prevent future
surprises.  And Wang Long said "this deadlock occurs three times in our
environment", so there's more reason to apply this, even to stable.
Stable 4.4 has minor conflicts applying this patch.  For a clean 4.4
patch see "[PATCH for-4.4] writeback: safer lock nesting"
https://lkml.org/lkml/2018/4/11/146
Wang Long said "this deadlock occurs three times in our environment"
[gthelen@google.com: v4]
  Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
...
...
[akpm@linux-foundation.org: comment tweaks, struct initialization
simplification]
...
...
Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
...
...
Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
...
...
Signed-off-by: Greg Thelen <gthelen@google.com>
Reported-by: Wang Long <wanglong19@meituan.com>
Acked-by: Wang Long <wanglong19@meituan.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org>  [v4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[natechancellor: Adjust context due to lack of b93b016313b3b]
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
...
Thanks for all of these, now queued up.
...
greg k-h

I reviewed the 4.4, 4.9, 4.14, 4.16 queued stable backports for
("writeback: safer lock nesting").  They all look good.  Thanks.

I don't know if it's customary to add an author's reviewed-by to
non-trivial backports.  If useful, here you go:

Reviewed-by: Greg Thelen <gthelen@google.com>
