Re: [Linaro-mm-sig] [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv

17 Sep 2020


      On Thu, Jul 30, 2020 at 06:45:14PM +0200, Thomas Hellström (Intel) wrote:
...
On 7/30/20 3:17 PM, Daniel Vetter wrote:
...
On Thu, Jul 30, 2020 at 2:17 PM Thomas Hellström (Intel)
thomas_os@shipmail.org wrote:
...
On 7/28/20 3:58 PM, Daniel Vetter wrote:
...
GPU drivers need this in their shrinkers, to be able to throw out
mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers,
but that loop is resolved by trylocking in shrinkers.
So full hierarchy is now (ignore some of the other branches we already
have primed):
mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write
I hope that's not inconsistent with anything mm or fs does, adding
relevant people.
Looks OK to me. The mapping_dirty_helpers run under the i_mmap_lock, but
don't allocate any memory AFAICT.
Since huge page-table-entry splitting may happen under the i_mmap_lock
from unmap_mapping_range() it might be worth figuring out how new page
directory pages are allocated, though.
ofc I'm not an mm expert at all, but I did try to scroll through all
i_mmap_lock_write/read callers. Found the following:

kernel/events/uprobes.c in build_map_info:
       /*
        * Needs GFP_NOWAIT to avoid i_mmap_rwsem recursion through
        * reclaim. This is optimistic, no harm done if it fails.
        */


I got lost in the hugetlb.c code and couldn't convince myself it's


not allocating page directories at various levels with something else
than GFP_KERNEL.
So looks like the recursion is clearly there and known, but the
hugepage code is too complex and flying over my head.
-Daniel
OK, so I inverted your annotation and ran a memory hog, and got the below
splat. So clearly your proposed reclaim->i_mmap_lock locking order is an
already established one.
So
Reviewed-by: Thomas Hellström thomas.hellstrom@intel.com
No one complaining that this is a terrible idea and two reviews from
people who know stuff, so I went ahead and pushed this to drm-misc-next.
Thanks for taking a look at this.
-Daniel
...
8<---------------------------------------------------------------------------------------------
[  308.324654] WARNING: possible circular locking dependency detected
[  308.324655] 5.8.0-rc2+ #16 Not tainted
[  308.324656] ------------------------------------------------------
[  308.324657] kswapd0/98 is trying to acquire lock:
[  308.324658] ffff92a16f758428 (&mapping->i_mmap_rwsem){++++}-{3:3}, at:
rmap_walk_file+0x1c0/0x2f0
[  308.324663]
               but task is already holding lock:
[  308.324664] ffffffffb0960240 (fs_reclaim){+.+.}-{0:0}, at:
__fs_reclaim_acquire+0x5/0x30
[  308.324666]
               which lock already depends on the new lock.
[  308.324667]
               the existing dependency chain (in reverse order) is:
[  308.324667]
               -> #1 (fs_reclaim){+.+.}-{0:0}:
[  308.324670]        fs_reclaim_acquire+0x34/0x40
[  308.324672]        dma_resv_lockdep+0x186/0x224
[  308.324675]        do_one_initcall+0x5d/0x2c0
[  308.324676]        kernel_init_freeable+0x222/0x288
[  308.324678]        kernel_init+0xa/0x107
[  308.324679]        ret_from_fork+0x1f/0x30
[  308.324680]
               -> #0 (&mapping->i_mmap_rwsem){++++}-{3:3}:
[  308.324682]        __lock_acquire+0x119f/0x1fc0
[  308.324683]        lock_acquire+0xa4/0x3b0
[  308.324685]        down_read+0x2d/0x110
[  308.324686]        rmap_walk_file+0x1c0/0x2f0
[  308.324687]        page_referenced+0x133/0x150
[  308.324689]        shrink_active_list+0x142/0x610
[  308.324690]        balance_pgdat+0x229/0x620
[  308.324691]        kswapd+0x200/0x470
[  308.324693]        kthread+0x11f/0x140
[  308.324694]        ret_from_fork+0x1f/0x30
[  308.324694]
               other info that might help us debug this:
[  308.324695]  Possible unsafe locking scenario:
[  308.324695]        CPU0                    CPU1
[  308.324696]        ----                    ----
[  308.324696]   lock(fs_reclaim);
[  308.324697] lock(&mapping->i_mmap_rwsem);
[  308.324698]                                lock(fs_reclaim);
[  308.324699]   lock(&mapping->i_mmap_rwsem);
[  308.324699]
                *** DEADLOCK ***
[  308.324700] 1 lock held by kswapd0/98:
[  308.324701]  #0: ffffffffb0960240 (fs_reclaim){+.+.}-{0:0}, at:
__fs_reclaim_acquire+0x5/0x30
[  308.324702]
               stack backtrace:
[  308.324704] CPU: 1 PID: 98 Comm: kswapd0 Not tainted 5.8.0-rc2+ #16
[  308.324705] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
Desktop Reference Platform, BIOS 6.00 07/29/2019
[  308.324706] Call Trace:
[  308.324710]  dump_stack+0x92/0xc8
[  308.324711]  check_noncircular+0x12d/0x150
[  308.324713]  __lock_acquire+0x119f/0x1fc0
[  308.324715]  lock_acquire+0xa4/0x3b0
[  308.324716]  ? rmap_walk_file+0x1c0/0x2f0
[  308.324717]  ? __lock_acquire+0x394/0x1fc0
[  308.324719]  down_read+0x2d/0x110
[  308.324720]  ? rmap_walk_file+0x1c0/0x2f0
[  308.324721]  rmap_walk_file+0x1c0/0x2f0
[  308.324722]  page_referenced+0x133/0x150
[  308.324724]  ? __page_set_anon_rmap+0x70/0x70
[  308.324725]  ? page_get_anon_vma+0x190/0x190
[  308.324726]  shrink_active_list+0x142/0x610
[  308.324728]  balance_pgdat+0x229/0x620
[  308.324730]  kswapd+0x200/0x470
[  308.324731]  ? lockdep_hardirqs_on_prepare+0xf5/0x170
[  308.324733]  ? finish_wait+0x80/0x80
[  308.324734]  ? balance_pgdat+0x620/0x620
[  308.324736]  kthread+0x11f/0x140
[  308.324737]  ? kthread_create_worker_on_cpu+0x40/0x40
[  308.324739]  ret_from_fork+0x1f/0x30
...
...
/Thomas
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Re: [Linaro-mm-sig] [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv