On Thu, Nov 05, 2020 at 11:50:58AM -0800, Mike Kravetz wrote:
Qian Cai reported the following BUG in [1]:
[ 6147.019063][T45242] LTP: starting move_pages12
[ 6147.475680][T64921] BUG: unable to handle page fault for address: ffffffffffffffe0
...
[ 6147.525866][T64921] RIP: 0010:anon_vma_interval_tree_iter_first+0xa2/0x170
avc_start_pgoff at mm/interval_tree.c:63
[ 6147.620914][T64921] Call Trace:
[ 6147.624078][T64921]  rmap_walk_anon+0x141/0xa30
rmap_walk_anon at mm/rmap.c:1864
[ 6147.628639][T64921]  try_to_unmap+0x209/0x2d0
try_to_unmap at mm/rmap.c:1763
[ 6147.633026][T64921]  ? rmap_walk_locked+0x140/0x140
[ 6147.637936][T64921]  ? page_remove_rmap+0x1190/0x1190
[ 6147.643020][T64921]  ? page_not_mapped+0x10/0x10
[ 6147.647668][T64921]  ? page_get_anon_vma+0x290/0x290
[ 6147.652664][T64921]  ? page_mapcount_is_zero+0x10/0x10
[ 6147.657838][T64921]  ? hugetlb_page_mapping_lock_write+0x97/0x180
[ 6147.663972][T64921]  migrate_pages+0x1005/0x1fb0
[ 6147.668617][T64921]  ? remove_migration_pte+0xac0/0xac0
[ 6147.673875][T64921]  move_pages_and_store_status.isra.47+0xd7/0x1a0
[ 6147.680181][T64921]  ? migrate_pages+0x1fb0/0x1fb0
[ 6147.685002][T64921]  __x64_sys_move_pages+0xa5c/0x1100
[ 6147.690176][T64921]  ? trace_hardirqs_on+0x20/0x1b5
[ 6147.695084][T64921]  ? move_pages_and_store_status.isra.47+0x1a0/0x1a0
[ 6147.701653][T64921]  ? rcu_read_lock_sched_held+0xaa/0xd0
[ 6147.707088][T64921]  ? switch_fpu_return+0x196/0x400
[ 6147.712083][T64921]  ? lockdep_hardirqs_on_prepare+0x38c/0x550
[ 6147.717954][T64921]  ? do_syscall_64+0x24/0x310
[ 6147.722513][T64921]  do_syscall_64+0x5f/0x310
[ 6147.726897][T64921]  ? trace_hardirqs_off+0x12/0x1a0
[ 6147.731894][T64921]  ? asm_exc_page_fault+0x8/0x30
[ 6147.736714][T64921]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Hugh Dickins diagnosed this as a migration bug caused by code introduced to use i_mmap_rwsem for pmd sharing synchronization. Specifically, the routine unmap_and_move_huge_page() is always passing the TTU_RMAP_LOCKED flag to try_to_unmap() while holding i_mmap_rwsem. This is wrong for anon pages, as the anon_vma lock should be held in that case. Further analysis suggested that i_mmap_rwsem need not be held at all when calling try_to_unmap for anon pages, as an anon page can never be part of a shared pmd mapping.
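The intended fix in unmap_and_move_huge_page() can be sketched roughly as follows (a non-compilable pseudocode sketch in kernel C style, not the literal diff; surrounding error paths and labels are elided):

```c
static int unmap_and_move_huge_page(/* ... */)
{
	struct address_space *mapping = NULL;
	/* ... */
	if (page_mapped(hpage)) {
		/*
		 * Anon pages can never be part of a shared pmd mapping,
		 * so only take i_mmap_rwsem (and tell lower levels via
		 * TTU_RMAP_LOCKED) for file-backed hugetlb pages.
		 */
		if (!PageAnon(hpage)) {
			mapping = hugetlb_page_mapping_lock_write(hpage);
			if (!mapping)
				goto unlock_put_anon;	/* caller retries */
		}
		try_to_unmap(hpage,
			TTU_MIGRATION | TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS |
			(mapping ? TTU_RMAP_LOCKED : 0));
		/* ... */
	}
	/* ... */
}
```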
Discussion also revealed that the hack in hugetlb_page_mapping_lock_write to drop the page lock and acquire i_mmap_rwsem is wrong. There is no way to keep the mapping valid while dropping the page lock.
This patch does the following:
- Do not take i_mmap_rwsem and set TTU_RMAP_LOCKED for anon pages when calling try_to_unmap.
- Remove the hacky code in hugetlb_page_mapping_lock_write. The routine will now simply do a 'trylock' while still holding the page lock. If the trylock fails, it will return NULL. This could impact the callers:
  - migration calling code will receive -EAGAIN and retry up to the hard coded limit (10).
  - memory error code will treat the page as BUSY. This will force any mapping tasks to be killed with SIGKILL instead of SIGBUS.
Do note that this change in behavior only happens when there is a race. None of the standard kernel testing suites actually hit this race, but it is possible.
[1] https://lore.kernel.org/lkml/20200708012044.GC992@lca.pw/
[2] https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.an...
Reported-by: Qian Cai <cai@lca.pw>
Suggested-by: Hugh Dickins <hughd@google.com>
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
This approach looks simpler and better than the former ones. Thank you for the update.
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>