The patch titled Subject: mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs has been added to the -mm tree. Its filename is mm-page_alloc-fix-ref-bias-in-page_frag_alloc-for-1-byte-allocs.patch
This patch should soon appear at http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-fix-ref-bias-in-page_... and later at http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-fix-ref-bias-in-page_...
Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated there every 3-4 working days
------------------------------------------------------ From: Jann Horn jannh@google.com Subject: mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum number of references that we might need to create in the fastpath later, the bump-allocation fastpath only has to modify the non-atomic bias value that tracks the number of extra references we hold instead of the atomic refcount. The maximum number of allocations we can serve (under the assumption that no allocation is made with size 0) is nc->size, so that's the bias used.
However, even when all memory in the allocation has been given away, a reference to the page is still held; and in the `offset < 0` slowpath, the page may be reused if everyone else has dropped their references. This means that the necessary number of references is actually `nc->size+1`.
Luckily, from a quick grep, it looks like the only path that can call page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which requires CAP_NET_ADMIN in the init namespace and is only intended to be used for kernel testing and fuzzing.
To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the `offset < 0` path, below the virt_to_page() call, and then repeatedly call writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI, with a vector consisting of 15 elements containing 1 byte each.
Link: http://lkml.kernel.org/r/20190213204157.12570-1-jannh@google.com Signed-off-by: Jann Horn jannh@google.com Cc: Michal Hocko mhocko@suse.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Pavel Tatashin pavel.tatashin@microsoft.com Cc: Oscar Salvador osalvador@suse.de Cc: Mel Gorman mgorman@techsingularity.net Cc: Aaron Lu aaron.lu@intel.com Cc: Alexander Duyck alexander.h.duyck@redhat.com Cc: David Miller davem@davemloft.net Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org ---
mm/page_alloc.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-fix-ref-bias-in-page_frag_alloc-for-1-byte-allocs +++ a/mm/page_alloc.c @@ -4845,11 +4845,11 @@ refill: /* Even if we own the page, we do not use atomic_set(). * This would break get_page_unless_zero() users. */ - page_ref_add(page, size - 1); + page_ref_add(page, size);
/* reset page count bias and offset to start of new frag */ nc->pfmemalloc = page_is_pfmemalloc(page); - nc->pagecnt_bias = size; + nc->pagecnt_bias = size + 1; nc->offset = size; }
@@ -4865,10 +4865,10 @@ refill: size = nc->size; #endif /* OK, page count is 0, we can safely set it */ - set_page_count(page, size); + set_page_count(page, size + 1);
/* reset page count bias and offset to start of new frag */ - nc->pagecnt_bias = size; + nc->pagecnt_bias = size + 1; offset = size - fragsz; }
_
Patches currently in -mm which might be from jannh@google.com are
mm-page_alloc-fix-ref-bias-in-page_frag_alloc-for-1-byte-allocs.patch