The patch titled
Subject: mm/khugepaged: collapse_shmem() without freezing new_page
has been removed from the -mm tree. Its filename was
mm-khugepaged-collapse_shmem-without-freezing-new_page.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/khugepaged: collapse_shmem() without freezing new_page
khugepaged's collapse_shmem() does almost all of its work, to assemble the
huge new_page from 512 scattered old pages, with the new_page's refcount
frozen to 0 (and refcounts of all old pages so far also frozen to 0).
Including shmem_getpage() to read in any which were out on swap, memory
reclaim if necessary to allocate their intermediate pages, and copying
over all the data from old to new.
Imagine the frozen refcount as a spinlock held, but without any lock
debugging to highlight the abuse: it's not good, and under serious load
heads into lockups - speculative getters of the page are not expecting to
spin while khugepaged is rescheduled.
One can get a little further under load by hacking around elsewhere; but
fortunately, freezing the new_page turns out to have been entirely
unnecessary, with no hacks needed elsewhere.
The huge new_page lock is already held throughout, and guards all its
subpages as they are brought one by one into the page cache tree; and
anything reading the data in that page, without the lock, before it has
been marked PageUptodate, would already be in the wrong. So simply
eliminate the freezing of the new_page.
Each of the old pages remains frozen with refcount 0 after it has been
replaced by a new_page subpage in the page cache tree, until they are all
unfrozen on success or failure: just as before. They could be unfrozen
sooner, but cause no problem once no longer visible to find_get_entry(),
filemap_map_pages() and other speculative lookups.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261527570.2275@eggly.anvils
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
--- a/mm/khugepaged.c~mm-khugepaged-collapse_shmem-without-freezing-new_page
+++ a/mm/khugepaged.c
@@ -1287,7 +1287,7 @@ static void retract_page_tables(struct a
* collapse_shmem - collapse small tmpfs/shmem pages into huge one.
*
* Basic scheme is simple, details are more complex:
- * - allocate and freeze a new huge page;
+ * - allocate and lock a new huge page;
* - scan page cache replacing old pages with the new one
* + swap in pages if necessary;
* + fill in gaps;
@@ -1295,11 +1295,11 @@ static void retract_page_tables(struct a
* - if replacing succeeds:
* + copy data over;
* + free old pages;
- * + unfreeze huge page;
+ * + unlock huge page;
* - if replacing failed;
* + put all pages back and unfreeze them;
* + restore gaps in the page cache;
- * + free huge page;
+ * + unlock and free huge page;
*/
static void collapse_shmem(struct mm_struct *mm,
struct address_space *mapping, pgoff_t start,
@@ -1333,13 +1333,11 @@ static void collapse_shmem(struct mm_str
__SetPageSwapBacked(new_page);
new_page->index = start;
new_page->mapping = mapping;
- BUG_ON(!page_ref_freeze(new_page, 1));
/*
- * At this point the new_page is 'frozen' (page_count() is zero),
- * locked and not up-to-date. It's safe to insert it into the page
- * cache, because nobody would be able to map it or use it in other
- * way until we unfreeze it.
+ * At this point the new_page is locked and not up-to-date.
+ * It's safe to insert it into the page cache, because nobody would
+ * be able to map it or use it in another way until we unlock it.
*/
/* This will be less messy when we use multi-index entries */
@@ -1491,9 +1489,8 @@ xa_unlocked:
index++;
}
- /* Everything is ready, let's unfreeze the new_page */
SetPageUptodate(new_page);
- page_ref_unfreeze(new_page, HPAGE_PMD_NR);
+ page_ref_add(new_page, HPAGE_PMD_NR - 1);
set_page_dirty(new_page);
mem_cgroup_commit_charge(new_page, memcg, false, true);
lru_cache_add_anon(new_page);
@@ -1541,8 +1538,6 @@ xa_unlocked:
VM_BUG_ON(nr_none);
xas_unlock_irq(&xas);
- /* Unfreeze new_page, caller would take care about freeing it */
- page_ref_unfreeze(new_page, 1);
mem_cgroup_cancel_charge(new_page, memcg, true);
new_page->mapping = NULL;
}
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch
The patch titled
Subject: mm/khugepaged: collapse_shmem() remember to clear holes
has been removed from the -mm tree. Its filename was
mm-khugepaged-collapse_shmem-remember-to-clear-holes.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/khugepaged: collapse_shmem() remember to clear holes
Huge tmpfs testing reminds us that there is no __GFP_ZERO in the gfp flags
khugepaged uses to allocate a huge page - in all common cases it would
just be a waste of effort - so collapse_shmem() must remember to clear out
any holes that it instantiates.
The obvious place to do so, where they are put into the page cache tree,
is not a good choice: because interrupts are disabled there. Leave it
until further down, once success is assured, where the other pages are
copied (before setting PageUptodate).
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261525080.2275@eggly.anvils
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 10 ++++++++++
1 file changed, 10 insertions(+)
--- a/mm/khugepaged.c~mm-khugepaged-collapse_shmem-remember-to-clear-holes
+++ a/mm/khugepaged.c
@@ -1467,7 +1467,12 @@ xa_unlocked:
* Replacing old pages with new one has succeeded, now we
* need to copy the content and free the old pages.
*/
+ index = start;
list_for_each_entry_safe(page, tmp, &pagelist, lru) {
+ while (index < page->index) {
+ clear_highpage(new_page + (index % HPAGE_PMD_NR));
+ index++;
+ }
copy_highpage(new_page + (page->index % HPAGE_PMD_NR),
page);
list_del(&page->lru);
@@ -1477,6 +1482,11 @@ xa_unlocked:
ClearPageActive(page);
ClearPageUnevictable(page);
put_page(page);
+ index++;
+ }
+ while (index < end) {
+ clear_highpage(new_page + (index % HPAGE_PMD_NR));
+ index++;
}
local_irq_disable();
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch
The patch titled
Subject: mm/khugepaged: fix crashes due to misaccounted holes
has been removed from the -mm tree. Its filename was
mm-khugepaged-fix-crashes-due-to-misaccounted-holes.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/khugepaged: fix crashes due to misaccounted holes
Huge tmpfs testing on a shortish file mapped into a pmd-rounded extent hit
shmem_evict_inode()'s WARN_ON(inode->i_blocks) followed by clear_inode()'s
BUG_ON(inode->i_data.nrpages) when the file was later closed and unlinked.
khugepaged's collapse_shmem() was forgetting to update mapping->nrpages on
the rollback path, after it had added but then needs to undo some holes.
There is indeed an irritating asymmetry between shmem_charge(), whose
callers want it to increment nrpages after successfully accounting blocks,
and shmem_uncharge(), when __delete_from_page_cache() already decremented
nrpages itself: oh well, just add a comment on that to them both.
And shmem_recalc_inode() is supposed to be called when the accounting is
expected to be in balance (so it can deduce from imbalance that reclaim
discarded some pages): so change shmem_charge() to update nrpages earlier
(though it's rare for the difference to matter at all).
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261523450.2275@eggly.anvils
Fixes: 800d8c63b2e98 ("shmem: add huge pages support")
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 5 ++++-
mm/shmem.c | 6 +++++-
2 files changed, 9 insertions(+), 2 deletions(-)
--- a/mm/khugepaged.c~mm-khugepaged-fix-crashes-due-to-misaccounted-holes
+++ a/mm/khugepaged.c
@@ -1506,9 +1506,12 @@ xa_unlocked:
khugepaged_pages_collapsed++;
} else {
struct page *page;
+
/* Something went wrong: roll back page cache changes */
- shmem_uncharge(mapping->host, nr_none);
xas_lock_irq(&xas);
+ mapping->nrpages -= nr_none;
+ shmem_uncharge(mapping->host, nr_none);
+
xas_set(&xas, start);
xas_for_each(&xas, page, end - 1) {
page = list_first_entry_or_null(&pagelist,
--- a/mm/shmem.c~mm-khugepaged-fix-crashes-due-to-misaccounted-holes
+++ a/mm/shmem.c
@@ -297,12 +297,14 @@ bool shmem_charge(struct inode *inode, l
if (!shmem_inode_acct_block(inode, pages))
return false;
+ /* nrpages adjustment first, then shmem_recalc_inode() when balanced */
+ inode->i_mapping->nrpages += pages;
+
spin_lock_irqsave(&info->lock, flags);
info->alloced += pages;
inode->i_blocks += pages * BLOCKS_PER_PAGE;
shmem_recalc_inode(inode);
spin_unlock_irqrestore(&info->lock, flags);
- inode->i_mapping->nrpages += pages;
return true;
}
@@ -312,6 +314,8 @@ void shmem_uncharge(struct inode *inode,
struct shmem_inode_info *info = SHMEM_I(inode);
unsigned long flags;
+ /* nrpages adjustment done by __delete_from_page_cache() or caller */
+
spin_lock_irqsave(&info->lock, flags);
info->alloced -= pages;
inode->i_blocks -= pages * BLOCKS_PER_PAGE;
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch
The patch titled
Subject: mm/khugepaged: collapse_shmem() stop if punched or truncated
has been removed from the -mm tree. Its filename was
mm-khugepaged-collapse_shmem-stop-if-punched-or-truncated.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/khugepaged: collapse_shmem() stop if punched or truncated
Huge tmpfs testing showed that although collapse_shmem() recognizes a
concurrently truncated or hole-punched page correctly, its handling of
holes was liable to refill an emptied extent. Add check to stop that.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261522040.2275@eggly.anvils
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Reviewed-by: Matthew Wilcox <willy(a)infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 11 +++++++++++
1 file changed, 11 insertions(+)
--- a/mm/khugepaged.c~mm-khugepaged-collapse_shmem-stop-if-punched-or-truncated
+++ a/mm/khugepaged.c
@@ -1359,6 +1359,17 @@ static void collapse_shmem(struct mm_str
VM_BUG_ON(index != xas.xa_index);
if (!page) {
+ /*
+ * Stop if extent has been truncated or hole-punched,
+ * and is now completely empty.
+ */
+ if (index == start) {
+ if (!xas_next_entry(&xas, end - 1)) {
+ result = SCAN_TRUNCATED;
+ break;
+ }
+ xas_set(&xas, index);
+ }
if (!shmem_charge(mapping->host, 1)) {
result = SCAN_FAIL;
break;
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch
The patch titled
Subject: mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
has been removed from the -mm tree. Its filename was
mm-huge_memory-fix-lockdep-complaint-on-32-bit-i_size_read.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
Huge tmpfs testing, on 32-bit kernel with lockdep enabled, showed that
__split_huge_page() was using i_size_read() while holding the irq-safe
lru_lock and page tree lock, but the 32-bit i_size_read() uses an
irq-unsafe seqlock which should not be nested inside them.
Instead, read the i_size earlier in split_huge_page_to_list(), and pass
the end offset down to __split_huge_page(): all while holding head page
lock, which is enough to prevent truncation of that extent before the page
tree lock has been taken.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261520070.2275@eggly.anvils
Fixes: baa355fd33142 ("thp: file pages support for split_huge_page()")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
--- a/mm/huge_memory.c~mm-huge_memory-fix-lockdep-complaint-on-32-bit-i_size_read
+++ a/mm/huge_memory.c
@@ -2439,12 +2439,11 @@ static void __split_huge_page_tail(struc
}
static void __split_huge_page(struct page *page, struct list_head *list,
- unsigned long flags)
+ pgoff_t end, unsigned long flags)
{
struct page *head = compound_head(page);
struct zone *zone = page_zone(head);
struct lruvec *lruvec;
- pgoff_t end = -1;
int i;
lruvec = mem_cgroup_page_lruvec(head, zone->zone_pgdat);
@@ -2452,9 +2451,6 @@ static void __split_huge_page(struct pag
/* complete memcg works before add pages to LRU */
mem_cgroup_split_huge_fixup(head);
- if (!PageAnon(page))
- end = DIV_ROUND_UP(i_size_read(head->mapping->host), PAGE_SIZE);
-
for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
__split_huge_page_tail(head, i, lruvec, list);
/* Some pages can be beyond i_size: drop them from page cache */
@@ -2626,6 +2622,7 @@ int split_huge_page_to_list(struct page
int count, mapcount, extra_pins, ret;
bool mlocked;
unsigned long flags;
+ pgoff_t end;
VM_BUG_ON_PAGE(is_huge_zero_page(page), page);
VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -2648,6 +2645,7 @@ int split_huge_page_to_list(struct page
ret = -EBUSY;
goto out;
}
+ end = -1;
mapping = NULL;
anon_vma_lock_write(anon_vma);
} else {
@@ -2661,6 +2659,15 @@ int split_huge_page_to_list(struct page
anon_vma = NULL;
i_mmap_lock_read(mapping);
+
+ /*
+ *__split_huge_page() may need to trim off pages beyond EOF:
+ * but on 32-bit, i_size_read() takes an irq-unsafe seqlock,
+ * which cannot be nested inside the page tree lock. So note
+ * end now: i_size itself may be changed at any moment, but
+ * head page lock is good enough to serialize the trimming.
+ */
+ end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);
}
/*
@@ -2707,7 +2714,7 @@ int split_huge_page_to_list(struct page
if (mapping)
__dec_node_page_state(page, NR_SHMEM_THPS);
spin_unlock(&pgdata->split_queue_lock);
- __split_huge_page(page, list, flags);
+ __split_huge_page(page, list, end, flags);
if (PageSwapCache(head)) {
swp_entry_t entry = { .val = page_private(head) };
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch
The patch titled
Subject: mm/huge_memory: splitting set mapping+index before unfreeze
has been removed from the -mm tree. Its filename was
mm-huge_memory-splitting-set-mappingindex-before-unfreeze.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/huge_memory: splitting set mapping+index before unfreeze
Huge tmpfs stress testing has occasionally hit shmem_undo_range()'s
VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page).
Move the setting of mapping and index up before the page_ref_unfreeze() in
__split_huge_page_tail() to fix this: so that a page cache lookup cannot
get a reference while the tail's mapping and index are unstable.
In fact, might as well move them up before the smp_wmb(): I don't see an
actual need for that, but if I'm missing something, this way round is
safer than the other, and no less efficient.
You might argue that VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page) is
misplaced, and should be left until after the trylock_page(); but left as
is has not crashed since, and gives more stringent assurance.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261516380.2275@eggly.anvils
Fixes: e9b61f19858a5 ("thp: reintroduce split_huge_page()")
Requires: 605ca5ede764 ("mm/huge_memory.c: reorder operations in __split_huge_page_tail()")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/mm/huge_memory.c~mm-huge_memory-splitting-set-mappingindex-before-unfreeze
+++ a/mm/huge_memory.c
@@ -2402,6 +2402,12 @@ static void __split_huge_page_tail(struc
(1L << PG_unevictable) |
(1L << PG_dirty)));
+ /* ->mapping in first tail page is compound_mapcount */
+ VM_BUG_ON_PAGE(tail > 2 && page_tail->mapping != TAIL_MAPPING,
+ page_tail);
+ page_tail->mapping = head->mapping;
+ page_tail->index = head->index + tail;
+
/* Page flags must be visible before we make the page non-compound. */
smp_wmb();
@@ -2422,12 +2428,6 @@ static void __split_huge_page_tail(struc
if (page_is_idle(head))
set_page_idle(page_tail);
- /* ->mapping in first tail page is compound_mapcount */
- VM_BUG_ON_PAGE(tail > 2 && page_tail->mapping != TAIL_MAPPING,
- page_tail);
- page_tail->mapping = head->mapping;
-
- page_tail->index = head->index + tail;
page_cpupid_xchg_last(page_tail, page_cpupid_last(head));
/*
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch
The patch titled
Subject: mm/huge_memory: rename freeze_page() to unmap_page()
has been removed from the -mm tree. Its filename was
mm-huge_memory-rename-freeze_page-to-unmap_page.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/huge_memory: rename freeze_page() to unmap_page()
The term "freeze" is used in several ways in the kernel, and in mm it has
the particular meaning of forcing page refcount temporarily to 0.
freeze_page() is just too confusing a name for a function that unmaps a
page: rename it unmap_page(), and rename unfreeze_page() remap_page().
Went to change the mention of freeze_page() added later in mm/rmap.c, but
found it to be incorrect: ordinary page reclaim reaches there too; but the
substance of the comment still seems correct, so edit it down.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261514080.2275@eggly.anvils
Fixes: e9b61f19858a5 ("thp: reintroduce split_huge_page()")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 12 ++++++------
mm/rmap.c | 13 +++----------
2 files changed, 9 insertions(+), 16 deletions(-)
--- a/mm/huge_memory.c~mm-huge_memory-rename-freeze_page-to-unmap_page
+++ a/mm/huge_memory.c
@@ -2350,7 +2350,7 @@ void vma_adjust_trans_huge(struct vm_are
}
}
-static void freeze_page(struct page *page)
+static void unmap_page(struct page *page)
{
enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS |
TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
@@ -2365,7 +2365,7 @@ static void freeze_page(struct page *pag
VM_BUG_ON_PAGE(!unmap_success, page);
}
-static void unfreeze_page(struct page *page)
+static void remap_page(struct page *page)
{
int i;
if (PageTransHuge(page)) {
@@ -2483,7 +2483,7 @@ static void __split_huge_page(struct pag
spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags);
- unfreeze_page(head);
+ remap_page(head);
for (i = 0; i < HPAGE_PMD_NR; i++) {
struct page *subpage = head + i;
@@ -2664,7 +2664,7 @@ int split_huge_page_to_list(struct page
}
/*
- * Racy check if we can split the page, before freeze_page() will
+ * Racy check if we can split the page, before unmap_page() will
* split PMDs
*/
if (!can_split_huge_page(head, &extra_pins)) {
@@ -2673,7 +2673,7 @@ int split_huge_page_to_list(struct page
}
mlocked = PageMlocked(page);
- freeze_page(head);
+ unmap_page(head);
VM_BUG_ON_PAGE(compound_mapcount(head), head);
/* Make sure the page is not on per-CPU pagevec as it takes pin */
@@ -2727,7 +2727,7 @@ int split_huge_page_to_list(struct page
fail: if (mapping)
xa_unlock(&mapping->i_pages);
spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags);
- unfreeze_page(head);
+ remap_page(head);
ret = -EBUSY;
}
--- a/mm/rmap.c~mm-huge_memory-rename-freeze_page-to-unmap_page
+++ a/mm/rmap.c
@@ -1627,16 +1627,9 @@ static bool try_to_unmap_one(struct page
address + PAGE_SIZE);
} else {
/*
- * We should not need to notify here as we reach this
- * case only from freeze_page() itself only call from
- * split_huge_page_to_list() so everything below must
- * be true:
- * - page is not anonymous
- * - page is locked
- *
- * So as it is a locked file back page thus it can not
- * be remove from the page cache and replace by a new
- * page before mmu_notifier_invalidate_range_end so no
+ * This is a locked file-backed page, thus it cannot
+ * be removed from the page cache and replaced by a new
+ * page before mmu_notifier_invalidate_range_end, so no
* concurrent thread might update its page table to
* point at new page while a device still is using this
* page.
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch
The patch titled
Subject: userfaultfd: shmem: add i_size checks
has been removed from the -mm tree. Its filename was
userfaultfd-shmem-add-i_size-checks.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Andrea Arcangeli <aarcange(a)redhat.com>
Subject: userfaultfd: shmem: add i_size checks
With MAP_SHARED: recheck the i_size after taking the PT lock, to serialize
against truncate with the PT lock. Delete the page from the pagecache if
the i_size_read check fails.
With MAP_PRIVATE: check the i_size after the PT lock before mapping
anonymous memory or zeropages into the MAP_PRIVATE shmem mapping.
A mostly irrelevant cleanup: like we do the delete_from_page_cache()
pagecache removal after dropping the PT lock, the PT lock is a spinlock so
drop it before the sleepable page lock.
Link: http://lkml.kernel.org/r/20181126173452.26955-5-aarcange@redhat.com
Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reviewed-by: Hugh Dickins <hughd(a)google.com>
Reported-by: Jann Horn <jannh(a)google.com>
Cc: <stable(a)vger.kernel.org>
Cc: "Dr. David Alan Gilbert" <dgilbert(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shmem.c | 18 ++++++++++++++++--
mm/userfaultfd.c | 26 ++++++++++++++++++++++++--
2 files changed, 40 insertions(+), 4 deletions(-)
--- a/mm/shmem.c~userfaultfd-shmem-add-i_size-checks
+++ a/mm/shmem.c
@@ -2216,6 +2216,7 @@ static int shmem_mfill_atomic_pte(struct
struct page *page;
pte_t _dst_pte, *dst_pte;
int ret;
+ pgoff_t offset, max_off;
ret = -ENOMEM;
if (!shmem_inode_acct_block(inode, 1))
@@ -2253,6 +2254,12 @@ static int shmem_mfill_atomic_pte(struct
__SetPageSwapBacked(page);
__SetPageUptodate(page);
+ ret = -EFAULT;
+ offset = linear_page_index(dst_vma, dst_addr);
+ max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
+ if (unlikely(offset >= max_off))
+ goto out_release;
+
ret = mem_cgroup_try_charge_delay(page, dst_mm, gfp, &memcg, false);
if (ret)
goto out_release;
@@ -2268,8 +2275,14 @@ static int shmem_mfill_atomic_pte(struct
if (dst_vma->vm_flags & VM_WRITE)
_dst_pte = pte_mkwrite(pte_mkdirty(_dst_pte));
- ret = -EEXIST;
dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
+
+ ret = -EFAULT;
+ max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
+ if (unlikely(offset >= max_off))
+ goto out_release_uncharge_unlock;
+
+ ret = -EEXIST;
if (!pte_none(*dst_pte))
goto out_release_uncharge_unlock;
@@ -2287,13 +2300,14 @@ static int shmem_mfill_atomic_pte(struct
/* No need to invalidate - it was non-present before */
update_mmu_cache(dst_vma, dst_addr, dst_pte);
- unlock_page(page);
pte_unmap_unlock(dst_pte, ptl);
+ unlock_page(page);
ret = 0;
out:
return ret;
out_release_uncharge_unlock:
pte_unmap_unlock(dst_pte, ptl);
+ delete_from_page_cache(page);
out_release_uncharge:
mem_cgroup_cancel_charge(page, memcg, false);
out_release:
--- a/mm/userfaultfd.c~userfaultfd-shmem-add-i_size-checks
+++ a/mm/userfaultfd.c
@@ -33,6 +33,8 @@ static int mcopy_atomic_pte(struct mm_st
void *page_kaddr;
int ret;
struct page *page;
+ pgoff_t offset, max_off;
+ struct inode *inode;
if (!*pagep) {
ret = -ENOMEM;
@@ -73,8 +75,17 @@ static int mcopy_atomic_pte(struct mm_st
if (dst_vma->vm_flags & VM_WRITE)
_dst_pte = pte_mkwrite(pte_mkdirty(_dst_pte));
- ret = -EEXIST;
dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
+ if (dst_vma->vm_file) {
+ /* the shmem MAP_PRIVATE case requires checking the i_size */
+ inode = dst_vma->vm_file->f_inode;
+ offset = linear_page_index(dst_vma, dst_addr);
+ max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
+ ret = -EFAULT;
+ if (unlikely(offset >= max_off))
+ goto out_release_uncharge_unlock;
+ }
+ ret = -EEXIST;
if (!pte_none(*dst_pte))
goto out_release_uncharge_unlock;
@@ -108,11 +119,22 @@ static int mfill_zeropage_pte(struct mm_
pte_t _dst_pte, *dst_pte;
spinlock_t *ptl;
int ret;
+ pgoff_t offset, max_off;
+ struct inode *inode;
_dst_pte = pte_mkspecial(pfn_pte(my_zero_pfn(dst_addr),
dst_vma->vm_page_prot));
- ret = -EEXIST;
dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
+ if (dst_vma->vm_file) {
+ /* the shmem MAP_PRIVATE case requires checking the i_size */
+ inode = dst_vma->vm_file->f_inode;
+ offset = linear_page_index(dst_vma, dst_addr);
+ max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
+ ret = -EFAULT;
+ if (unlikely(offset >= max_off))
+ goto out_unlock;
+ }
+ ret = -EEXIST;
if (!pte_none(*dst_pte))
goto out_unlock;
set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
_
Patches currently in -mm which might be from aarcange(a)redhat.com are
The patch titled
Subject: userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
has been removed from the -mm tree. Its filename was
userfaultfd-shmem-hugetlbfs-only-allow-to-register-vm_maywrite-vmas.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Andrea Arcangeli <aarcange(a)redhat.com>
Subject: userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
After the VMA to register the uffd onto is found, check that it has
VM_MAYWRITE set before allowing registration. This way we inherit all
common code checks before allowing to fill file holes in shmem and
hugetlbfs with UFFDIO_COPY.
The userfaultfd memory model is not applicable for readonly files unless
it's a MAP_PRIVATE.
Link: http://lkml.kernel.org/r/20181126173452.26955-4-aarcange@redhat.com
Fixes: ff62a3421044 ("hugetlb: implement memfd sealing")
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reviewed-by: Hugh Dickins <hughd(a)google.com>
Reported-by: Jann Horn <jannh(a)google.com>
Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Cc: <stable(a)vger.kernel.org>
Cc: "Dr. David Alan Gilbert" <dgilbert(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 15 +++++++++++++++
mm/userfaultfd.c | 15 ++++++---------
2 files changed, 21 insertions(+), 9 deletions(-)
--- a/fs/userfaultfd.c~userfaultfd-shmem-hugetlbfs-only-allow-to-register-vm_maywrite-vmas
+++ a/fs/userfaultfd.c
@@ -1361,6 +1361,19 @@ static int userfaultfd_register(struct u
ret = -EINVAL;
if (!vma_can_userfault(cur))
goto out_unlock;
+
+ /*
+ * UFFDIO_COPY will fill file holes even without
+ * PROT_WRITE. This check enforces that if this is a
+ * MAP_SHARED, the process has write permission to the backing
+ * file. If VM_MAYWRITE is set it also enforces that on a
+ * MAP_SHARED vma: there is no F_WRITE_SEAL and no further
+ * F_WRITE_SEAL can be taken until the vma is destroyed.
+ */
+ ret = -EPERM;
+ if (unlikely(!(cur->vm_flags & VM_MAYWRITE)))
+ goto out_unlock;
+
/*
* If this vma contains ending address, and huge pages
* check alignment.
@@ -1406,6 +1419,7 @@ static int userfaultfd_register(struct u
BUG_ON(!vma_can_userfault(vma));
BUG_ON(vma->vm_userfaultfd_ctx.ctx &&
vma->vm_userfaultfd_ctx.ctx != ctx);
+ WARN_ON(!(vma->vm_flags & VM_MAYWRITE));
/*
* Nothing to do: this vma is already registered into this
@@ -1552,6 +1566,7 @@ static int userfaultfd_unregister(struct
cond_resched();
BUG_ON(!vma_can_userfault(vma));
+ WARN_ON(!(vma->vm_flags & VM_MAYWRITE));
/*
* Nothing to do: this vma is already registered into this
--- a/mm/userfaultfd.c~userfaultfd-shmem-hugetlbfs-only-allow-to-register-vm_maywrite-vmas
+++ a/mm/userfaultfd.c
@@ -205,8 +205,9 @@ retry:
if (!dst_vma || !is_vm_hugetlb_page(dst_vma))
goto out_unlock;
/*
- * Only allow __mcopy_atomic_hugetlb on userfaultfd
- * registered ranges.
+ * Check the vma is registered in uffd, this is
+ * required to enforce the VM_MAYWRITE check done at
+ * uffd registration time.
*/
if (!dst_vma->vm_userfaultfd_ctx.ctx)
goto out_unlock;
@@ -459,13 +460,9 @@ retry:
if (!dst_vma)
goto out_unlock;
/*
- * Be strict and only allow __mcopy_atomic on userfaultfd
- * registered ranges to prevent userland errors going
- * unnoticed. As far as the VM consistency is concerned, it
- * would be perfectly safe to remove this check, but there's
- * no useful usage for __mcopy_atomic ouside of userfaultfd
- * registered ranges. This is after all why these are ioctls
- * belonging to the userfaultfd and not syscalls.
+ * Check the vma is registered in uffd, this is required to
+ * enforce the VM_MAYWRITE check done at uffd registration
+ * time.
*/
if (!dst_vma->vm_userfaultfd_ctx.ctx)
goto out_unlock;
_
Patches currently in -mm which might be from aarcange(a)redhat.com are
The patch titled
Subject: userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
has been removed from the -mm tree. Its filename was
userfaultfd-shmem-allocate-anonymous-memory-for-map_private-shmem.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Andrea Arcangeli <aarcange(a)redhat.com>
Subject: userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
Userfaultfd did not create private memory when UFFDIO_COPY was invoked on
a MAP_PRIVATE shmem mapping. Instead it wrote to the shmem file, even
when that had not been opened for writing. Though, fortunately, that
could only happen where there was a hole in the file.
Fix the shmem-backed implementation of UFFDIO_COPY to create private
memory for MAP_PRIVATE mappings. The hugetlbfs-backed implementation was
already correct.
This change is visible to userland, if userfaultfd has been used in
unintended ways: so it introduces a small risk of incompatibility, but is
necessary in order to respect file permissions.
An app that uses UFFDIO_COPY for anything like postcopy live migration
won't notice the difference, and in fact it'll run faster because there
will be no copy-on-write and memory waste in the tmpfs pagecache anymore.
Userfaults on MAP_PRIVATE shmem keep triggering only on file holes like
before.
The real zeropage can also be built on a MAP_PRIVATE shmem mapping through
UFFDIO_ZEROPAGE and that's safe because the zeropage pte is never dirty,
in turn even an mprotect upgrading the vma permission from PROT_READ to
PROT_READ|PROT_WRITE won't make the zeropage pte writable.
Link: http://lkml.kernel.org/r/20181126173452.26955-3-aarcange@redhat.com
Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reported-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reviewed-by: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org>
Cc: "Dr. David Alan Gilbert" <dgilbert(a)redhat.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
--- a/mm/userfaultfd.c~userfaultfd-shmem-allocate-anonymous-memory-for-map_private-shmem
+++ a/mm/userfaultfd.c
@@ -380,7 +380,17 @@ static __always_inline ssize_t mfill_ato
{
ssize_t err;
- if (vma_is_anonymous(dst_vma)) {
+ /*
+ * The normal page fault path for a shmem will invoke the
+ * fault, fill the hole in the file and COW it right away. The
+ * result generates plain anonymous memory. So when we are
+ * asked to fill an hole in a MAP_PRIVATE shmem mapping, we'll
+ * generate anonymous memory directly without actually filling
+ * the hole. For the MAP_PRIVATE case the robustness check
+ * only happens in the pagetable (to verify it's still none)
+ * and not in the radix tree.
+ */
+ if (!(dst_vma->vm_flags & VM_SHARED)) {
if (!zeropage)
err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma,
dst_addr, src_addr, page);
@@ -489,7 +499,8 @@ retry:
* dst_vma.
*/
err = -ENOMEM;
- if (vma_is_anonymous(dst_vma) && unlikely(anon_vma_prepare(dst_vma)))
+ if (!(dst_vma->vm_flags & VM_SHARED) &&
+ unlikely(anon_vma_prepare(dst_vma)))
goto out_unlock;
while (src_addr < src_start + len) {
_
Patches currently in -mm which might be from aarcange(a)redhat.com are
The patch titled
Subject: userfaultfd: use ENOENT instead of EFAULT if the atomic copy user fails
has been removed from the -mm tree. Its filename was
userfaultfd-use-enoent-instead-of-efault-if-the-atomic-copy-user-fails.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Andrea Arcangeli <aarcange(a)redhat.com>
Subject: userfaultfd: use ENOENT instead of EFAULT if the atomic copy user fails
Patch series "userfaultfd shmem updates".
Jann found two bugs in the userfaultfd shmem MAP_SHARED backend: the lack
of the VM_MAYWRITE check and the lack of i_size checks.
Then looking into the above we also fixed the MAP_PRIVATE case.
Hugh by source review also found a data loss source if UFFDIO_COPY is used
on shmem MAP_SHARED PROT_READ mappings (the production usages incidentally
run with PROT_READ|PROT_WRITE, so the data loss couldn't happen in those
production usages like with QEMU).
The whole patchset is marked for stable.
We verified QEMU postcopy live migration with guest running on shmem
MAP_PRIVATE run as well as before after the fix of shmem MAP_PRIVATE.
Regardless if it's shmem or hugetlbfs or MAP_PRIVATE or MAP_SHARED, QEMU
unconditionally invokes a punch hole if the guest mapping is filebacked
and a MADV_DONTNEED too (needed to get rid of the MAP_PRIVATE COWs and for
the anon backend).
This patch (of 5):
We internally used EFAULT to communicate with the caller, switch to
ENOENT, so EFAULT can be used as a non internal retval.
Link: http://lkml.kernel.org/r/20181126173452.26955-2-aarcange@redhat.com
Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reviewed-by: Hugh Dickins <hughd(a)google.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 2 +-
mm/shmem.c | 2 +-
mm/userfaultfd.c | 6 +++---
3 files changed, 5 insertions(+), 5 deletions(-)
--- a/mm/hugetlb.c~userfaultfd-use-enoent-instead-of-efault-if-the-atomic-copy-user-fails
+++ a/mm/hugetlb.c
@@ -4080,7 +4080,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_s
/* fallback to copy_from_user outside mmap_sem */
if (unlikely(ret)) {
- ret = -EFAULT;
+ ret = -ENOENT;
*pagep = page;
/* don't free the page */
goto out;
--- a/mm/shmem.c~userfaultfd-use-enoent-instead-of-efault-if-the-atomic-copy-user-fails
+++ a/mm/shmem.c
@@ -2238,7 +2238,7 @@ static int shmem_mfill_atomic_pte(struct
*pagep = page;
shmem_inode_unacct_blocks(inode, 1);
/* don't free the page */
- return -EFAULT;
+ return -ENOENT;
}
} else { /* mfill_zeropage_atomic */
clear_highpage(page);
--- a/mm/userfaultfd.c~userfaultfd-use-enoent-instead-of-efault-if-the-atomic-copy-user-fails
+++ a/mm/userfaultfd.c
@@ -48,7 +48,7 @@ static int mcopy_atomic_pte(struct mm_st
/* fallback to copy_from_user outside mmap_sem */
if (unlikely(ret)) {
- ret = -EFAULT;
+ ret = -ENOENT;
*pagep = page;
/* don't free the page */
goto out;
@@ -274,7 +274,7 @@ retry:
cond_resched();
- if (unlikely(err == -EFAULT)) {
+ if (unlikely(err == -ENOENT)) {
up_read(&dst_mm->mmap_sem);
BUG_ON(!page);
@@ -530,7 +530,7 @@ retry:
src_addr, &page, zeropage);
cond_resched();
- if (unlikely(err == -EFAULT)) {
+ if (unlikely(err == -ENOENT)) {
void *page_kaddr;
up_read(&dst_mm->mmap_sem);
_
Patches currently in -mm which might be from aarcange(a)redhat.com are
The patch titled
Subject: lib/test_kmod.c: fix rmmod double free
has been removed from the -mm tree. Its filename was
test_kmod-fix-rmmod-double-free.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Luis Chamberlain <mcgrof(a)kernel.org>
Subject: lib/test_kmod.c: fix rmmod double free
We free the misc device string twice on rmmod; fix this. Without this we
cannot remove the module without crashing.
Link: http://lkml.kernel.org/r/20181124050500.5257-1-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org>
Reported-by: Randy Dunlap <rdunlap(a)infradead.org>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: <stable(a)vger.kernel.org> [4.12+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/test_kmod.c | 1 -
1 file changed, 1 deletion(-)
--- a/lib/test_kmod.c~test_kmod-fix-rmmod-double-free
+++ a/lib/test_kmod.c
@@ -1214,7 +1214,6 @@ void unregister_test_dev_kmod(struct kmo
dev_info(test_dev->dev, "removing interface\n");
misc_deregister(&test_dev->misc_dev);
- kfree(&test_dev->misc_dev.name);
mutex_unlock(&test_dev->config_mutex);
mutex_unlock(&test_dev->trigger_mutex);
_
Patches currently in -mm which might be from mcgrof(a)kernel.org are
The patch titled
Subject: mm: use swp_offset as key in shmem_replace_page()
has been removed from the -mm tree. Its filename was
mm-use-swp_offset-as-key-in-shmem_replace_page.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Yu Zhao <yuzhao(a)google.com>
Subject: mm: use swp_offset as key in shmem_replace_page()
We changed the key of swap cache tree from swp_entry_t.val to swp_offset.
We need to do so in shmem_replace_page() as well.
Hugh said:
: shmem_replace_page() has been wrong since the day I wrote it: good
: enough to work on swap "type" 0, which is all most people ever use
: (especially those few who need shmem_replace_page() at all), but broken
: once there are any non-0 swp_type bits set in the higher order bits.
Link: http://lkml.kernel.org/r/20181121215442.138545-1-yuzhao@google.com
Fixes: f6ab1f7f6b2d ("mm, swap: use offset of swap entry as key of swap cache")
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reviewed-by: Matthew Wilcox <willy(a)infradead.org>
Acked-by: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org> [4.9+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shmem.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/shmem.c~mm-use-swp_offset-as-key-in-shmem_replace_page
+++ a/mm/shmem.c
@@ -1509,11 +1509,13 @@ static int shmem_replace_page(struct pag
{
struct page *oldpage, *newpage;
struct address_space *swap_mapping;
+ swp_entry_t entry;
pgoff_t swap_index;
int error;
oldpage = *pagep;
- swap_index = page_private(oldpage);
+ entry.val = page_private(oldpage);
+ swap_index = swp_offset(entry);
swap_mapping = page_mapping(oldpage);
/*
@@ -1532,7 +1534,7 @@ static int shmem_replace_page(struct pag
__SetPageLocked(newpage);
__SetPageSwapBacked(newpage);
SetPageUptodate(newpage);
- set_page_private(newpage, swap_index);
+ set_page_private(newpage, entry.val);
SetPageSwapCache(newpage);
/*
_
Patches currently in -mm which might be from yuzhao(a)google.com are
mm-remove-pte_lock_deinit.patch
mm-dont-expose-page-to-fast-gup-before-its-ready.patch
The patch titled
Subject: mm: cleancache: fix corruption on missed inode invalidation
has been removed from the -mm tree. Its filename was
mm-cleancache-fix-corruption-on-missed-inode-invalidation.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Subject: mm: cleancache: fix corruption on missed inode invalidation
If all pages are deleted from the mapping by memory reclaim and also
moved to the cleancache:
__delete_from_page_cache
(no shadow case)
unaccount_page_cache_page
cleancache_put_page
page_cache_delete
mapping->nrpages -= nr
(nrpages becomes 0)
We don't clean the cleancache for an inode after final file truncation
(removal).
truncate_inode_pages_final
check (nrpages || nrexceptional) is false
no truncate_inode_pages
no cleancache_invalidate_inode(mapping)
These way when reading the new file created with same inode we may get
these trash leftover pages from cleancache and see wrong data instead of
the contents of the new file.
Fix it by always doing truncate_inode_pages which is already ready for
nrpages == 0 && nrexceptional == 0 case and just invalidates inode.
[akpm(a)linux-foundation.org: add comment, per Jan]
Link: http://lkml.kernel.org/r/20181112095734.17979-1-ptikhomirov@virtuozzo.com
Fixes: commit 91b0abe36a7b ("mm + fs: store shadow entries in page cache")
Signed-off-by: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Reviewed-by: Vasily Averin <vvs(a)virtuozzo.com>
Reviewed-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/truncate.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/mm/truncate.c~mm-cleancache-fix-corruption-on-missed-inode-invalidation
+++ a/mm/truncate.c
@@ -517,9 +517,13 @@ void truncate_inode_pages_final(struct a
*/
xa_lock_irq(&mapping->i_pages);
xa_unlock_irq(&mapping->i_pages);
-
- truncate_inode_pages(mapping, 0);
}
+
+ /*
+ * Cleancache needs notification even if there are no pages or shadow
+ * entries.
+ */
+ truncate_inode_pages(mapping, 0);
}
EXPORT_SYMBOL(truncate_inode_pages_final);
_
Patches currently in -mm which might be from ptikhomirov(a)virtuozzo.com are
The commit b1b8f45b3130 ("ARM: dts: bcm2837: Add missing GPIOs of Expander")
introduced a wifi power sequence. Unfortunately the polarity of the reset
GPIOs were wrong and broke the wifi support on Raspberry Pi 3 B and
later in 3 B+. This wasn't discovered before since the power sequence
takes only effect in case the relevant MMC driver is compiled as a module.
Fixes: b1b8f45b3130 ("ARM: dts: bcm2837: Add missing GPIOs of Expander")
Cc: stable(a)vger.kernel.org
Reported-by: Matthias Lueschner <lueschem(a)gmail.com>
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=911443
Signed-off-by: Stefan Wahren <stefan.wahren(a)i2se.com>
---
Hi, i like to have this included in 4.20 if possible.
Stefan
arch/arm/boot/dts/bcm2837-rpi-3-b-plus.dts | 2 +-
arch/arm/boot/dts/bcm2837-rpi-3-b.dts | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm/boot/dts/bcm2837-rpi-3-b-plus.dts b/arch/arm/boot/dts/bcm2837-rpi-3-b-plus.dts
index 4adb85e..9376224 100644
--- a/arch/arm/boot/dts/bcm2837-rpi-3-b-plus.dts
+++ b/arch/arm/boot/dts/bcm2837-rpi-3-b-plus.dts
@@ -31,7 +31,7 @@
wifi_pwrseq: wifi-pwrseq {
compatible = "mmc-pwrseq-simple";
- reset-gpios = <&expgpio 1 GPIO_ACTIVE_HIGH>;
+ reset-gpios = <&expgpio 1 GPIO_ACTIVE_LOW>;
};
};
diff --git a/arch/arm/boot/dts/bcm2837-rpi-3-b.dts b/arch/arm/boot/dts/bcm2837-rpi-3-b.dts
index c318bcb..89e6fd5 100644
--- a/arch/arm/boot/dts/bcm2837-rpi-3-b.dts
+++ b/arch/arm/boot/dts/bcm2837-rpi-3-b.dts
@@ -26,7 +26,7 @@
wifi_pwrseq: wifi-pwrseq {
compatible = "mmc-pwrseq-simple";
- reset-gpios = <&expgpio 1 GPIO_ACTIVE_HIGH>;
+ reset-gpios = <&expgpio 1 GPIO_ACTIVE_LOW>;
};
};
--
2.7.4
NullFunc packets should never be duplicate just like
QoS-NullFunc packets.
We saw a client that enters / exits power save with
NullFunc frames (and not with QoS-NullFunc) despite the
fact that the association supports HT.
This specific client also re-uses a non-zero sequence number
for different NullFunc frames.
At some point, the client had to send a retransmission of
the NullFunc frame and we dropped it, leading to a
misalignment in the power save state.
Fix this by never consider a NullFunc frame as duplicate,
just like we do for QoS NullFunc frames.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=201449
CC: <stable(a)vger.kernel.org>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
---
net/mac80211/rx.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 3bd3b57..2c6515a 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1403,6 +1403,7 @@ ieee80211_rx_h_check_dup(struct ieee80211_rx_data *rx)
return RX_CONTINUE;
if (ieee80211_is_ctl(hdr->frame_control) ||
+ ieee80211_is_nullfunc(hdr->frame_control) ||
ieee80211_is_qos_nullfunc(hdr->frame_control) ||
is_multicast_ether_addr(hdr->addr1))
return RX_CONTINUE;
--
2.7.4
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: dvb-usb-v2: Fix incorrect use of transfer_flags URB_FREE_BUFFER
Author: Malcolm Priestley <tvboxspy(a)gmail.com>
Date: Mon Nov 26 15:18:25 2018 -0500
commit 1a0c10ed7bb1 ("media: dvb-usb-v2: stop using coherent memory for
URBs") incorrectly adds URB_FREE_BUFFER after every urb transfer.
It cannot use this flag because it reconfigures the URBs accordingly
to suit connected devices. In doing a call to usb_free_urb is made and
invertedly frees the buffers.
The stream buffer should remain constant while driver is up.
Signed-off-by: Malcolm Priestley <tvboxspy(a)gmail.com>
CC: stable(a)vger.kernel.org # v4.18+
Signed-off-by: Sean Young <sean(a)mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung(a)kernel.org>
drivers/media/usb/dvb-usb-v2/usb_urb.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
---
diff --git a/drivers/media/usb/dvb-usb-v2/usb_urb.c b/drivers/media/usb/dvb-usb-v2/usb_urb.c
index 024c751eb165..2ad2ddeaff51 100644
--- a/drivers/media/usb/dvb-usb-v2/usb_urb.c
+++ b/drivers/media/usb/dvb-usb-v2/usb_urb.c
@@ -155,7 +155,6 @@ static int usb_urb_alloc_bulk_urbs(struct usb_data_stream *stream)
stream->props.u.bulk.buffersize,
usb_urb_complete, stream);
- stream->urb_list[i]->transfer_flags = URB_FREE_BUFFER;
stream->urbs_initialized++;
}
return 0;
@@ -186,7 +185,7 @@ static int usb_urb_alloc_isoc_urbs(struct usb_data_stream *stream)
urb->complete = usb_urb_complete;
urb->pipe = usb_rcvisocpipe(stream->udev,
stream->props.endpoint);
- urb->transfer_flags = URB_ISO_ASAP | URB_FREE_BUFFER;
+ urb->transfer_flags = URB_ISO_ASAP;
urb->interval = stream->props.u.isoc.interval;
urb->number_of_packets = stream->props.u.isoc.framesperurb;
urb->transfer_buffer_length = stream->props.u.isoc.framesize *
@@ -210,7 +209,7 @@ static int usb_free_stream_buffers(struct usb_data_stream *stream)
if (stream->state & USB_STATE_URB_BUF) {
while (stream->buf_num) {
stream->buf_num--;
- stream->buf_list[stream->buf_num] = NULL;
+ kfree(stream->buf_list[stream->buf_num]);
}
}
From: Todd Kjos <tkjos(a)android.com>
commit 7bada55ab50697861eee6bb7d60b41e68a961a9c upstream
Malicious code can attempt to free buffers using the BC_FREE_BUFFER
ioctl to binder. There are protections against a user freeing a buffer
while in use by the kernel, however there was a window where
BC_FREE_BUFFER could be used to free a recently allocated buffer that
was not completely initialized. This resulted in a use-after-free
detected by KASAN with a malicious test program.
This window is closed by setting the buffer's allow_user_free attribute
to 0 when the buffer is allocated or when the user has previously freed
it instead of waiting for the caller to set it. The problem was that
when the struct buffer was recycled, allow_user_free was stale and set
to 1 allowing a free to go through.
Signed-off-by: Todd Kjos <tkjos(a)google.com>
Acked-by: Arve Hjønnevåg <arve(a)android.com>
Cc: stable <stable(a)vger.kernel.org> # 4.14
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/android/binder.c | 21 ++++++++++++---------
drivers/android/binder_alloc.c | 14 ++++++--------
drivers/android/binder_alloc.h | 3 +--
3 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index a86c27948fcab..96a0f940e54d9 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -2918,7 +2918,6 @@ static void binder_transaction(struct binder_proc *proc,
t->buffer = NULL;
goto err_binder_alloc_buf_failed;
}
- t->buffer->allow_user_free = 0;
t->buffer->debug_id = t->debug_id;
t->buffer->transaction = t;
t->buffer->target_node = target_node;
@@ -3407,14 +3406,18 @@ static int binder_thread_write(struct binder_proc *proc,
buffer = binder_alloc_prepare_to_free(&proc->alloc,
data_ptr);
- if (buffer == NULL) {
- binder_user_error("%d:%d BC_FREE_BUFFER u%016llx no match\n",
- proc->pid, thread->pid, (u64)data_ptr);
- break;
- }
- if (!buffer->allow_user_free) {
- binder_user_error("%d:%d BC_FREE_BUFFER u%016llx matched unreturned buffer\n",
- proc->pid, thread->pid, (u64)data_ptr);
+ if (IS_ERR_OR_NULL(buffer)) {
+ if (PTR_ERR(buffer) == -EPERM) {
+ binder_user_error(
+ "%d:%d BC_FREE_BUFFER u%016llx matched unreturned or currently freeing buffer\n",
+ proc->pid, thread->pid,
+ (u64)data_ptr);
+ } else {
+ binder_user_error(
+ "%d:%d BC_FREE_BUFFER u%016llx no match\n",
+ proc->pid, thread->pid,
+ (u64)data_ptr);
+ }
break;
}
binder_debug(BINDER_DEBUG_FREE_BUFFER,
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 58e4658f9dd6c..b9281f2725a6a 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -149,14 +149,12 @@ static struct binder_buffer *binder_alloc_prepare_to_free_locked(
else {
/*
* Guard against user threads attempting to
- * free the buffer twice
+ * free the buffer when in use by kernel or
+ * after it's already been freed.
*/
- if (buffer->free_in_progress) {
- pr_err("%d:%d FREE_BUFFER u%016llx user freed buffer twice\n",
- alloc->pid, current->pid, (u64)user_ptr);
- return NULL;
- }
- buffer->free_in_progress = 1;
+ if (!buffer->allow_user_free)
+ return ERR_PTR(-EPERM);
+ buffer->allow_user_free = 0;
return buffer;
}
}
@@ -486,7 +484,7 @@ struct binder_buffer *binder_alloc_new_buf_locked(struct binder_alloc *alloc,
rb_erase(best_fit, &alloc->free_buffers);
buffer->free = 0;
- buffer->free_in_progress = 0;
+ buffer->allow_user_free = 0;
binder_insert_allocated_buffer_locked(alloc, buffer);
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
"%d: binder_alloc_buf size %zd got %pK\n",
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index 2dd33b6df1044..a3ad7683b6f23 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -50,8 +50,7 @@ struct binder_buffer {
unsigned free:1;
unsigned allow_user_free:1;
unsigned async_transaction:1;
- unsigned free_in_progress:1;
- unsigned debug_id:28;
+ unsigned debug_id:29;
struct binder_transaction *transaction;
--
2.20.0.rc1.387.gf8505762e3-goog
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6ff38bd40230af35e446239396e5fc8ebd6a5248 Mon Sep 17 00:00:00 2001
From: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Date: Fri, 30 Nov 2018 14:09:00 -0800
Subject: [PATCH] mm: cleancache: fix corruption on missed inode invalidation
If all pages are deleted from the mapping by memory reclaim and also
moved to the cleancache:
__delete_from_page_cache
(no shadow case)
unaccount_page_cache_page
cleancache_put_page
page_cache_delete
mapping->nrpages -= nr
(nrpages becomes 0)
We don't clean the cleancache for an inode after final file truncation
(removal).
truncate_inode_pages_final
check (nrpages || nrexceptional) is false
no truncate_inode_pages
no cleancache_invalidate_inode(mapping)
These way when reading the new file created with same inode we may get
these trash leftover pages from cleancache and see wrong data instead of
the contents of the new file.
Fix it by always doing truncate_inode_pages which is already ready for
nrpages == 0 && nrexceptional == 0 case and just invalidates inode.
[akpm(a)linux-foundation.org: add comment, per Jan]
Link: http://lkml.kernel.org/r/20181112095734.17979-1-ptikhomirov@virtuozzo.com
Fixes: commit 91b0abe36a7b ("mm + fs: store shadow entries in page cache")
Signed-off-by: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Reviewed-by: Vasily Averin <vvs(a)virtuozzo.com>
Reviewed-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/truncate.c b/mm/truncate.c
index 45d68e90b703..798e7ccfb030 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -517,9 +517,13 @@ void truncate_inode_pages_final(struct address_space *mapping)
*/
xa_lock_irq(&mapping->i_pages);
xa_unlock_irq(&mapping->i_pages);
-
- truncate_inode_pages(mapping, 0);
}
+
+ /*
+ * Cleancache needs notification even if there are no pages or shadow
+ * entries.
+ */
+ truncate_inode_pages(mapping, 0);
}
EXPORT_SYMBOL(truncate_inode_pages_final);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6ff38bd40230af35e446239396e5fc8ebd6a5248 Mon Sep 17 00:00:00 2001
From: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Date: Fri, 30 Nov 2018 14:09:00 -0800
Subject: [PATCH] mm: cleancache: fix corruption on missed inode invalidation
If all pages are deleted from the mapping by memory reclaim and also
moved to the cleancache:
__delete_from_page_cache
(no shadow case)
unaccount_page_cache_page
cleancache_put_page
page_cache_delete
mapping->nrpages -= nr
(nrpages becomes 0)
We don't clean the cleancache for an inode after final file truncation
(removal).
truncate_inode_pages_final
check (nrpages || nrexceptional) is false
no truncate_inode_pages
no cleancache_invalidate_inode(mapping)
These way when reading the new file created with same inode we may get
these trash leftover pages from cleancache and see wrong data instead of
the contents of the new file.
Fix it by always doing truncate_inode_pages which is already ready for
nrpages == 0 && nrexceptional == 0 case and just invalidates inode.
[akpm(a)linux-foundation.org: add comment, per Jan]
Link: http://lkml.kernel.org/r/20181112095734.17979-1-ptikhomirov@virtuozzo.com
Fixes: commit 91b0abe36a7b ("mm + fs: store shadow entries in page cache")
Signed-off-by: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Reviewed-by: Vasily Averin <vvs(a)virtuozzo.com>
Reviewed-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/truncate.c b/mm/truncate.c
index 45d68e90b703..798e7ccfb030 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -517,9 +517,13 @@ void truncate_inode_pages_final(struct address_space *mapping)
*/
xa_lock_irq(&mapping->i_pages);
xa_unlock_irq(&mapping->i_pages);
-
- truncate_inode_pages(mapping, 0);
}
+
+ /*
+ * Cleancache needs notification even if there are no pages or shadow
+ * entries.
+ */
+ truncate_inode_pages(mapping, 0);
}
EXPORT_SYMBOL(truncate_inode_pages_final);