Re: [PATCH v2] mm/gup: fix try_grab_compound_head() race with split_huge_page()

16 Jun 2021


On Wed, Jun 16, 2021 at 10:27 AM Vlastimil Babka <vbabka@suse.cz> wrote:
On 6/16/21 1:10 AM, Yang Shi wrote:
On Tue, Jun 15, 2021 at 5:10 AM Jann Horn <jannh@google.com> wrote:
On Tue, Jun 15, 2021 at 8:37 AM John Hubbard <jhubbard@nvidia.com> wrote:
On 6/14/21 6:20 PM, Jann Horn wrote:
try_grab_compound_head() is used to grab a reference to a page from
get_user_pages_fast(), which is only protected against concurrent
freeing of page tables (via local_irq_save()), but not against
concurrent TLB flushes, freeing of data pages, or splitting of compound
pages.
[...]
Reviewed-by: John Hubbard <jhubbard@nvidia.com>

Thanks!
[...]
@@ -55,8 +72,23 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
      if (WARN_ON_ONCE(page_ref_count(head) < 0))
              return NULL;
      if (unlikely(!page_cache_add_speculative(head, refs)))
              return NULL;


      /*
       * At this point we have a stable reference to the head page; but it
       * could be that between the compound_head() lookup and the refcount
       * increment, the compound page was split, in which case we'd end up
       * holding a reference on a page that has nothing to do with the page
       * we were given anymore.
       * So now that the head page is stable, recheck that the pages still
       * belong together.
       */
      if (unlikely(compound_head(page) != head)) {

I was just wondering what could happen here. For example: the page gets split and
reallocated into a different-sized compound page, one in which page still points
to head. I think that's OK, because we don't look at or change other huge page
fields.

But I thought I'd mention it in case anyone else has any clever ideas about
how this simple check might be insufficient here. It seems fine to me, but I
routinely lack enough imagination about concurrent operations. :)
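
For readers following along, the "take a reference, then recheck" pattern in the
hunk above can be sketched in userspace with C11 atomics. This is only an
illustration under stated assumptions, not the kernel code: struct fake_page,
fake_compound_head(), fake_add_unless_zero(), fake_try_get_head() and the
simulate_split_of test hook are all hypothetical names, and what happens on a
mismatch (drop the references and fail) is an assumption of the sketch, since
the quoted hunk is cut off at the opening brace.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel layout. */
struct fake_page {
	atomic_int refcount;
	_Atomic(struct fake_page *) head;	/* NULL: page is its own head */
};

/* Test-only hook (assumption): if set, "split" this tail page in the
 * window between the head lookup and the refcount increment. */
static struct fake_page *simulate_split_of;

static struct fake_page *fake_compound_head(struct fake_page *page)
{
	struct fake_page *head =
		atomic_load_explicit(&page->head, memory_order_acquire);

	return head ? head : page;
}

/* Add refs to the refcount unless it is zero (page is being freed). */
static bool fake_add_unless_zero(struct fake_page *page, int refs)
{
	int old = atomic_load_explicit(&page->refcount, memory_order_relaxed);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak_explicit(&page->refcount,
			&old, old + refs,
			memory_order_acq_rel, memory_order_relaxed));
	return true;
}

static struct fake_page *fake_try_get_head(struct fake_page *page, int refs)
{
	struct fake_page *head = fake_compound_head(page);

	if (simulate_split_of) {
		/* Simulated concurrent split: the tail becomes its own head. */
		atomic_store(&simulate_split_of->head, NULL);
		simulate_split_of = NULL;
	}

	if (!fake_add_unless_zero(head, refs))
		return NULL;

	/* The head is now pinned; recheck that page still belongs to it. */
	if (fake_compound_head(page) != head) {
		/* Assumed failure path: give the references back. */
		atomic_fetch_sub_explicit(&head->refcount, refs,
					  memory_order_release);
		return NULL;
	}
	return head;
}
```

The sketch only covers the split case that the comment in the hunk describes.
It does not model the "split, then reallocated into a different compound page"
scenario raised above, nor the memory-ordering questions discussed in the rest
of the thread.
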

Hmmm... I think the scariest aspect here is probably the interaction
with concurrent allocation of a compound page on architectures with
store-store reordering (like ARM). *If* the page allocator handled
compound pages with lockless, non-atomic percpu freelists, I think it
might be possible that the zeroing of tail_page->compound_head in
put_page() could be reordered after the page has been freed,
reallocated and set to refcount 1 again?

That shouldn't be possible at the moment, but it is still a bit scary.
It might be possible after Mel's "mm/page_alloc: Allow high-order
pages to be stored on the per-cpu lists" patch
(https://patchwork.kernel.org/project/linux-mm/patch/20210611135753.GC30378@t...).

Those would be percpu indeed, but not "lockless, non-atomic", no? They are
protected by a local_lock.

The local_lock is *not* a lock on non-PREEMPT_RT kernels, IIUC. It
disables preemption and IRQs. But disabling preemption is a no-op on
non-preemptible kernels. Disabling IRQs does guarantee atomic context, but
I'm not sure whether that is equivalent to "atomic freelists" in Jann's
context.

I think the lockless page cache code also has to deal with somewhat
similar ordering concerns when it uses page_cache_get_speculative(),
e.g. in mapping_get_entry() - first it looks up a page pointer with
xas_load(), and any access to the page later on would be a _dependent
load_, but if the page then gets freed, reallocated, and inserted into
the page cache again before the refcount increment and the re-check
using xas_reload(), then there would be no data dependency from
xas_reload() to the following use of the page...
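
The lookup sequence described above (load a pointer, take a speculative
reference, then reload and compare) can be sketched roughly as follows in
userspace C11. All names here (struct obj, slot, try_lookup_get()) are
hypothetical stand-ins rather than the page cache API. The sketch sidesteps the
dependency question by using acquire ordering on the reload, which is an
assumption of this example and says nothing about what the kernel code
actually needs or does. It also assumes the object's memory stays valid while
the refcount is probed, which the kernel gets from RCU.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a page. */
struct obj {
	atomic_int refcount;
	int payload;
};

/* Hypothetical single slot standing in for an XArray entry. */
static _Atomic(struct obj *) slot;

/*
 * One lookup attempt. Returns the object with an extra reference held,
 * or NULL if the slot is empty or the attempt raced with a free or a
 * replacement (a real caller would typically retry in that case).
 */
static struct obj *try_lookup_get(void)
{
	/* Step 1: load the pointer (the xas_load() analogue). */
	struct obj *o = atomic_load_explicit(&slot, memory_order_acquire);
	int old;

	if (!o)
		return NULL;

	/* Step 2: speculative get, i.e. increment unless already zero. */
	old = atomic_load_explicit(&o->refcount, memory_order_relaxed);
	do {
		if (old == 0)
			return NULL;	/* object is on its way out */
	} while (!atomic_compare_exchange_weak_explicit(&o->refcount,
			&old, old + 1,
			memory_order_acquire, memory_order_relaxed));

	/*
	 * Step 3: reload and compare (the xas_reload() analogue). The
	 * acquire load orders later accesses to *o after this check
	 * without relying on a data dependency from the reload.
	 */
	if (atomic_load_explicit(&slot, memory_order_acquire) != o) {
		atomic_fetch_sub_explicit(&o->refcount, 1,
					  memory_order_release);
		return NULL;
	}
	return o;
}
```
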

    
