Re: [PATCH v2 4/8] mm/gup: track FOLL_PIN pages

30 Jan 2020

On 1/29/20 5:51 AM, Kirill A. Shutemov wrote:
...
On Tue, Jan 28, 2020 at 07:24:13PM -0800, John Hubbard wrote:
...
Add tracking of pages that were pinned via FOLL_PIN. This tracking is
implemented via overloading of page->_refcount: pins are added by
adding GUP_PIN_COUNTING_BIAS (1024) to the refcount. This provides a
fuzzy indication of pinning, and it can have false positives (and that's
OK). Please see the pre-existing
Documentation/core-api/pin_user_pages.rst for details.
As mentioned in pin_user_pages.rst, callers who effectively set FOLL_PIN
(typically via pin_user_pages*()) are required to ultimately free such
pages via unpin_user_page().
Please also not the limitation, discussed in pin_user_pages.rst under
s/not/note/
Fixed, thanks!
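
For illustration only, here is a user-space toy model of the accounting described in the commit message. It assumes a single int counter per page: a FOLL_GET reference adds 1, a FOLL_PIN reference adds GUP_PIN_COUNTING_BIAS (1024), and unpinning subtracts the same bias. struct model_page and the model_* helpers are hypothetical, and this is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Bias value from the patch description; the rest is a toy model. */
#define GUP_PIN_COUNTING_BIAS 1024

/* Hypothetical stand-in for struct page: only the refcount is modeled. */
struct model_page {
	int refcount;
};

/* FOLL_GET-style reference: adds 1, like get_page(). */
static void model_get(struct model_page *p)
{
	p->refcount += 1;
}

/* FOLL_PIN-style reference: adds GUP_PIN_COUNTING_BIAS per pin. */
static void model_pin(struct model_page *p)
{
	p->refcount += GUP_PIN_COUNTING_BIAS;
}

/* A pin is dropped by subtracting the same bias, never by a plain put. */
static void model_unpin(struct model_page *p)
{
	assert(p->refcount >= GUP_PIN_COUNTING_BIAS);
	p->refcount -= GUP_PIN_COUNTING_BIAS;
}

/* Fuzzy query: true once the count reaches the bias, pinned or not. */
static bool model_maybe_pinned(const struct model_page *p)
{
	return p->refcount >= GUP_PIN_COUNTING_BIAS;
}
```

Starting from a count of 1, one pin plus one plain get gives 1 + 1024 + 1 = 1026. Unpinning returns the count to 2, which the fuzzy query reads as not pinned.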
...
...
...
+/**
+ * page_dma_pinned() - report if a page is pinned for DMA.
+ *
+ * This function checks if a page has been pinned via a call to
+ * pin_user_pages*().
+ *
+ * For non-huge pages, the return value is partially fuzzy: false is not fuzzy,
+ * because it means "definitely not pinned for DMA", but true means "probably
+ * pinned for DMA, but possibly a false positive due to having at least
+ * GUP_PIN_COUNTING_BIAS worth of normal page references".
+ *
+ * False positives are OK, because: a) it's unlikely for a page to get that many
+ * refcounts, and b) all the callers of this routine are expected to be able to
+ * deal gracefully with a false positive.

I wonder if we should reverse the logic and name -- page_not_dma_pinned()
or something -- to emphasise that we can only know for sure when the page
is not pinned, but not necessarily when it is.
This is an interesting point. I agree that it's worth maybe adding information
into the function name, but I'd like to keep the bool "positive", because there
will be a number of callers that ask "if it is possibly dma-pinned, then ...".
So combining that, how about this function name:
page_maybe_dma_pinned()
, which I could live with and I think would be acceptable?
...
...
+ *
+ * For more information, please see Documentation/core-api/pin_user_pages.rst.
+ *
+ * @page:	pointer to page to be queried.
+ * @Return:	True, if it is likely that the page has been "dma-pinned".
+ *		False, if the page is definitely not dma-pinned.
+ */

+static inline bool page_dma_pinned(struct page *page)
+{
+	/*
+	 * page_ref_count() is signed. If that refcount overflows, then
+	 * page_ref_count() returns a negative value, and callers will avoid
+	 * further incrementing the refcount.
+	 *
+	 * Here, for that overflow case, use the sign bit to count a little
+	 * bit higher via unsigned math, and thus still get an accurate result
+	 * from page_dma_pinned().
+	 */
+	return ((unsigned int)page_ref_count(compound_head(page))) >=
+		GUP_PIN_COUNTING_BIAS;

Do you expect it to be called on tail pages?
Yes, we definitely cannot rule that out.
...
...
+}

#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
#define SECTION_IN_PAGE_FLAGS
#endif
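
The points raised about this function (an exact "false", a fuzzy "true", the unsigned cast, and the compound_head() lookup behind the tail-page question) can be exercised with a small user-space model. It assumes, loosely following the kernel, that a tail page has no count of its own and defers to its head page. struct model_page, model_compound_head() and model_page_maybe_dma_pinned() are hypothetical names; the last one merely borrows the name proposed above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define GUP_PIN_COUNTING_BIAS 1024

/*
 * Hypothetical page model: a tail page points at its head, and only the
 * head's refcount is meaningful for the compound page as a whole.
 */
struct model_page {
	struct model_page *head;	/* NULL for a head or order-0 page */
	int refcount;			/* signed, like page_ref_count() */
};

static struct model_page *model_compound_head(struct model_page *p)
{
	return p->head ? p->head : p;
}

/*
 * Same shape as the quoted check: look up the head page, then compare its
 * signed count as unsigned, so a count that has wrapped negative still
 * compares as "at least GUP_PIN_COUNTING_BIAS".
 */
static bool model_page_maybe_dma_pinned(struct model_page *p)
{
	return (unsigned int)model_compound_head(p)->refcount >=
	       GUP_PIN_COUNTING_BIAS;
}
```

In this model, dropping the model_compound_head() call would make every tail page read as unpinned, because a tail page's own count stays at zero.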

diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 14d14beb1f7f..b9cbe553d1e7 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -102,6 +102,16 @@ static inline void page_ref_sub(struct page *page, int nr)
 	__page_ref_mod(page, -nr);
 }
 
+static inline int page_ref_sub_return(struct page *page, int nr)
+{
+	int ret = atomic_sub_return(nr, &page->_refcount);
+
+	if (page_ref_tracepoint_active(__tracepoint_page_ref_mod))
+		__page_ref_mod(page, -nr);
+
+	return ret;
+}
+

I see an opportunity to split the patch further.
ah, OK. I wasn't sure how far to go before I get tagged for "excessive
patch splitting"! haha. Anyway, are you suggesting to put the
page_ref_sub_return() routine into its own patch?
Another thing to split out would be adding the flags to the remaining
functions, such as undo_dev_pagemap(). That burns quite a few lines of
diff. Anything else to split out?
...
...
 static inline void page_ref_inc(struct page *page)
 {
 	atomic_inc(&page->_refcount);
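
A rough user-space analogue of page_ref_sub_return(), using C11 atomics, shows why a *_return variant is handy: the caller sees the post-subtraction value from the same atomic operation, so it can act on "that was the last reference" without re-reading a counter that another thread may already have changed. The hunk does not show the patch's own caller; model_unpin_and_test_free() is a hypothetical one, and struct model_page stands in for struct page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define GUP_PIN_COUNTING_BIAS 1024

/* Hypothetical stand-in for struct page's atomic _refcount. */
struct model_page {
	atomic_int refcount;
};

/*
 * Same return contract as page_ref_sub_return(): subtract nr atomically and
 * return the NEW count. C11 atomic_fetch_sub() returns the old count, so
 * the new one is old - nr, taken from that single atomic operation.
 */
static int model_page_ref_sub_return(struct model_page *p, int nr)
{
	return atomic_fetch_sub(&p->refcount, nr) - nr;
}

/* Hypothetical caller: drop one pin and report whether nothing is left. */
static bool model_unpin_and_test_free(struct model_page *p)
{
	return model_page_ref_sub_return(p, GUP_PIN_COUNTING_BIAS) == 0;
}
```

With a starting count of 1 + GUP_PIN_COUNTING_BIAS, dropping the plain reference returns 1024, and the subsequent unpin returns 0, so the helper reports the page as free.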
diff --git a/mm/gup.c b/mm/gup.c
index 9e117998274c..7a96490dcc54 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -44,6 +44,136 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
 	return head;
 }
 
+/*
+ * try_grab_compound_head() - attempt to elevate a page's refcount, by a
+ * flags-dependent amount.
+ *
+ * "grab" names in this file mean, "look at flags to decide whether to use
+ * FOLL_PIN or FOLL_GET behavior, when incrementing the page's refcount."
+ *
+ * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the
+ * same time. (That's true throughout the get_user_pages*() and
+ * pin_user_pages*() APIs.) Cases:
+ *
+ *    FOLL_GET: page's refcount will be incremented by 1.
+ *    FOLL_PIN: page's refcount will be incremented by GUP_PIN_COUNTING_BIAS.
+ *
+ * Return: head page (with refcount appropriately incremented) for success, or
+ * NULL upon failure. If neither FOLL_GET nor FOLL_PIN was set, that's
+ * considered failure, and furthermore, a likely bug in the caller, so a warning
+ * is also emitted.
+ */
+static __maybe_unused struct page *try_grab_compound_head(struct page *page,
+					  int refs,
+					  unsigned int flags)
+{
+	if (flags & FOLL_GET)
+		return try_get_compound_head(page, refs);
+	else if (flags & FOLL_PIN) {
+		refs *= GUP_PIN_COUNTING_BIAS;
+		return try_get_compound_head(page, refs);

Maybe overflow detection? At least under VM_BUG_ON()?
OK, yes I see now that there is no check to see if we're about to overflow
the refs, in this path. I'll add one.
...
...
...
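
One way to add the overflow check requested above is sketched below in user space. It is a toy model, not the fix that was eventually merged: the MODEL_FOLL_* bits are placeholders rather than the kernel's FOLL_GET/FOLL_PIN values, model_try_get() only approximates try_get_compound_head() (it refuses a free page or an addition that would overflow), and the warning is a plain fprintf(). The pin path refuses any refs for which refs * GUP_PIN_COUNTING_BIAS would not fit in an int, and it checks before multiplying.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

#define GUP_PIN_COUNTING_BIAS 1024

/* Placeholder flag bits for this model; not the kernel's FOLL_* values. */
#define MODEL_FOLL_GET (1u << 0)
#define MODEL_FOLL_PIN (1u << 1)

/* Hypothetical head page: only its refcount is modeled. */
struct model_page {
	int refcount;
};

/*
 * Rough stand-in for try_get_compound_head(): refuse a page whose count is
 * already zero or negative, or an addition that would overflow the count.
 */
static struct model_page *model_try_get(struct model_page *head, int refs)
{
	if (head->refcount <= 0 || head->refcount > INT_MAX - refs)
		return NULL;
	head->refcount += refs;
	return head;
}

static struct model_page *model_try_grab(struct model_page *head, int refs,
					 unsigned int flags)
{
	assert(refs > 0);

	/* As in the quoted code, FOLL_GET is tested first. */
	if (flags & MODEL_FOLL_GET)
		return model_try_get(head, refs);

	if (flags & MODEL_FOLL_PIN) {
		/* Overflow check: refs * BIAS must still fit in an int. */
		if (refs > INT_MAX / GUP_PIN_COUNTING_BIAS)
			return NULL;
		return model_try_get(head, refs * GUP_PIN_COUNTING_BIAS);
	}

	/* Neither flag: failure plus a warning, as the comment describes. */
	fprintf(stderr, "model_try_grab: neither FOLL_GET nor FOLL_PIN set\n");
	return NULL;
}
```

Testing refs > INT_MAX / GUP_PIN_COUNTING_BIAS before the multiplication avoids relying on signed overflow, which is undefined in standard C. In kernel code, the same test could sit under VM_BUG_ON() or WARN_ON_ONCE(), as the review suggests.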
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0a55dec68925..b1079aaa6f24 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -958,6 +958,11 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 	 */
 	WARN_ONCE(flags & FOLL_COW, "mm: In follow_devmap_pmd with FOLL_COW set");
 
+	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
+	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
+			 (FOLL_PIN | FOLL_GET)))

Too many parentheses.
OK, I'll remove at least one. :)
...
...

+		return NULL;
+
 	if (flags & FOLL_WRITE && !pmd_write(*pmd))
 		return NULL;
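
For reference on the "too many parentheses" point: in C, == binds more tightly than both & and |, so in the masked form neither the parentheses around flags & (...) nor those around the right-hand FOLL_PIN | FOLL_GET can simply be dropped. If fewer parentheses are the goal, testing the two bits separately is one equivalent spelling. The sketch below checks that equivalence exhaustively over three placeholder flag bits (the MODEL_FOLL_* values are hypothetical, not the kernel's).

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag bits; the kernel's FOLL_* values differ. */
#define MODEL_FOLL_GET   (1u << 0)
#define MODEL_FOLL_PIN   (1u << 1)
#define MODEL_FOLL_WRITE (1u << 2)

/* Form used in the patch: mask both bits, then compare against both bits. */
static bool both_set_masked(unsigned int flags)
{
	return (flags & (MODEL_FOLL_PIN | MODEL_FOLL_GET)) ==
	       (MODEL_FOLL_PIN | MODEL_FOLL_GET);
}

/* Alternative spelling: test each bit on its own. */
static bool both_set_split(unsigned int flags)
{
	return (flags & MODEL_FOLL_PIN) && (flags & MODEL_FOLL_GET);
}
```

Applied to the patch, the alternative would read if (WARN_ON_ONCE((flags & FOLL_PIN) && (flags & FOLL_GET))), which is a matter of taste rather than correctness.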

@@ -973,7 +978,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 	 * device mapped pages can only be returned if the
 	 * caller will manage the page reference count.
 	 */
-	if (!(flags & FOLL_GET))
+	if (!(flags & (FOLL_GET | FOLL_PIN)))
 		return ERR_PTR(-EEXIST);
 
 	pfn += (addr & ~PMD_MASK) >> PAGE_SHIFT;
@@ -981,7 +986,8 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 	if (!*pgmap)
 		return ERR_PTR(-EFAULT);
 	page = pfn_to_page(pfn);
-	get_page(page);
+	if (!try_grab_page(page, flags))
+		page = ERR_PTR(-ENOMEM);
 
 	return page;
 }
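
The two changes in this function amount to a small decision table, modeled below with hypothetical names. A request with neither FOLL_GET nor FOLL_PIN still gets -EEXIST. Because try_grab_page() can report failure (the old get_page() call returned nothing), a failed grab now turns into -ENOMEM. The grab_ok parameter stands in for the result of try_grab_page(), whose implementation is not shown in this excerpt.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder flag bits; the kernel's FOLL_* values differ. */
#define MODEL_FOLL_GET (1u << 0)
#define MODEL_FOLL_PIN (1u << 1)

/*
 * Hypothetical summary of the tail of follow_devmap_pmd() after the patch.
 * Returns 0 when the page would be handed back, else a negative errno.
 * grab_ok stands in for the result of try_grab_page(page, flags).
 */
static int model_devmap_result(unsigned int flags, bool grab_ok)
{
	/* Device pages are returned only if the caller manages the refcount. */
	if (!(flags & (MODEL_FOLL_GET | MODEL_FOLL_PIN)))
		return -EEXIST;

	/* get_page() had no failure return; a failed grab now means -ENOMEM. */
	if (!grab_ok)
		return -ENOMEM;

	return 0;
}
```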
@@ -1101,6 +1107,11 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
 	if (flags & FOLL_WRITE && !pud_write(*pud))
 		return NULL;
 
+	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
+	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
+			 (FOLL_PIN | FOLL_GET)))
+		return NULL;

Ditto.
ACK.
...
...
...
@@ -4965,6 +4958,12 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 	struct page *page = NULL;
 	spinlock_t *ptl;
 	pte_t pte;
+
+	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
+	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
+			 (FOLL_PIN | FOLL_GET)))
+		return NULL;

Ditto.
ACK.
thanks,
-- 
John Hubbard
NVIDIA