commit 93b3a674485f6a4b8ffff85d1682d5e8b7c51560 upstream.
Commit 93b3a674485f ("mm,vmstat: reduce zone->lock holding time by
/proc/pagetypeinfo") upstream caps the number of iterations over each
free_list at 100,000, and also drops the zone->lock in between each
migrate type. Capping the iteration count alters the file contents in
some cases, which means this approach may not be suitable for stable
backports.
However, dropping zone->lock between migrate types (and, as a result,
page orders) does not change the /proc/pagetypeinfo file contents. It
can significantly reduce the length of time spent with IRQs disabled,
which can prevent the missed interrupts or soft lockups we have observed
on systems with particularly large amounts of memory.
Thus, this commit is a modified version of the upstream one that only
drops the lock between migrate types.
Fixes: 467c996c1e19 ("Print out statistics in relation to fragmentation avoidance to /proc/pagetypeinfo")
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Reviewed-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
---
mm/vmstat.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 13b74c4314a7e..663069cf7724a 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1316,6 +1316,9 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
list_for_each(curr, &area->free_list[mtype])
freecount++;
seq_printf(m, "%6lu ", freecount);
+ spin_unlock_irq(&zone->lock);
+ cond_resched();
+ spin_lock_irq(&zone->lock);
}
seq_putc(m, '\n');
}
--
2.27.0
From: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
commit 175e476b8cdf2a4de7432583b49c871345e4f8a1 upstream.
When a new table value was assigned, it was followed by a write memory
barrier. This ensured that all writes before this point would complete
before any writes after this point. However, to determine whether the
rules are unused, the sequence counter is read. To ensure that all
writes have been done before these reads, a full memory barrier is
needed, not just a write memory barrier. The same argument applies when
incrementing the counter, before the rules are read.
Changing to smp_mb() instead of smp_wmb() fixes the kernel panic
reported in cc00bcaa5899 (a panic which is still present), while still
maintaining the same speed of replacing tables.
The smp_mb() barriers potentially slow the packet path; however, testing
has shown no measurable change in performance on a 4-core MIPS64
platform.
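To illustrate the ordering requirement, the following is a hedged
userspace analogue using C11 atomics; the names ruleset, recseq, NCPUS
and use_rules() are illustrative only, not the kernel's, and
atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb():

#include <stdatomic.h>

#define NCPUS 4

static _Atomic(void *) ruleset;    /* cf. table->private */
static atomic_uint recseq[NCPUS];  /* cf. xt_recseq.sequence */

static void use_rules(void *rules) { (void)rules; }  /* stub */

/* Packet path, cf. xt_write_recseq_begin()/end(); odd = in traversal. */
void packet_path(int cpu)
{
	atomic_fetch_add_explicit(&recseq[cpu], 1, memory_order_relaxed);
	/*
	 * Full fence: the counter increment must be visible before the
	 * ruleset pointer is loaded. A write-only barrier would not
	 * order the increment against this subsequent load.
	 */
	atomic_thread_fence(memory_order_seq_cst);
	use_rules(atomic_load_explicit(&ruleset, memory_order_relaxed));
	atomic_fetch_add_explicit(&recseq[cpu], 1, memory_order_release);
}

/* Replacement path, cf. xt_replace_table(). */
void *replace_ruleset(void *newinfo)
{
	void *old = atomic_load_explicit(&ruleset, memory_order_relaxed);

	atomic_store_explicit(&ruleset, newinfo, memory_order_relaxed);
	/*
	 * Full fence: the new pointer must be published before each
	 * CPU's sequence counter is read. A write memory barrier only
	 * orders this store against later stores, so the loop below
	 * could read stale counters and declare the old rules unused
	 * while a CPU is still traversing them.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	for (int cpu = 0; cpu < NCPUS; cpu++) {
		unsigned int s = atomic_load_explicit(&recseq[cpu],
						      memory_order_relaxed);
		/* Wait out any traversal that began before the switch. */
		while ((s & 1) && s == atomic_load_explicit(&recseq[cpu],
						memory_order_acquire))
			;
	}
	return old;  /* old ruleset is now safe to free */
}

The same pattern explains both hunks below: the reader's fence is the
one in xt_write_recseq_begin(), and the writer's is the one added after
the ->private assignment.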
Fixes: 7f5c6d4f665b ("netfilter: get rid of atomic ops in fast path")
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[Ported to stable, affected barrier is added by d3d40f237480abf3268956daf18cdc56edd32834 in mainline]
Signed-off-by: Pavel Machek (CIP) <pavel@denx.de>
Signed-off-by: Nobuhiro Iwamatsu (CIP) <nobuhiro1.iwamatsu@toshiba.co.jp>
---
include/linux/netfilter/x_tables.h | 2 +-
net/netfilter/x_tables.c | 3 +++
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 6923e4049de3af..304b60b4952627 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -327,7 +327,7 @@ static inline unsigned int xt_write_recseq_begin(void)
* since addend is most likely 1
*/
__this_cpu_add(xt_recseq.sequence, addend);
- smp_wmb();
+ smp_mb();
return addend;
}
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 7e261fab7ef8de..480ccd52a73fdb 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1140,6 +1140,9 @@ xt_replace_table(struct xt_table *table,
smp_wmb();
table->private = newinfo;
+ /* make sure all cpus see new ->private value */
+ smp_mb();
+
/*
* Even though table entries have now been swapped, other CPU's
* may still be using the old entries. This is okay, because
--
2.31.1
From: Ding Hui <dinghui@sangfor.com.cn>
Subject: mm/page_alloc: fix counting of free pages after take off from buddy
Recently we found that a lot of MemFree is left in /proc/meminfo after
doing a lot of page soft offlining, which is not correct.
Before Oscar's rework of soft offline for free pages [1], if we soft
offlined free pages, these pages were left in the buddy allocator with
the HWPoison flag set, and NR_FREE_PAGES was not updated immediately.
So the difference between NR_FREE_PAGES and the real number of available
free pages was also large at the beginning.
However, with the workload running, whenever we subsequently caught an
HWPoison page in one of the allocation functions, we removed it from the
buddy allocator, updated NR_FREE_PAGES at the same time, and tried
again, so NR_FREE_PAGES got closer and closer to the real number of
available free pages (disregarding unpoison_memory()).
Now, for soft offlined free pages, after a successful call to
take_page_off_buddy(), the page no longer belongs to the buddy
allocator and will never be used again, but we missed accounting for
NR_FREE_PAGES in this situation, and there is no later chance for it to
be updated.
Do the update in take_page_off_buddy() like rmqueue() does, but avoid
double counting if someone has already called set_migratetype_isolate()
on the page.
[1]: commit 06be6ff3d2ec ("mm,hwpoison: rework soft offline for free pages")
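The invariant being restored is the one the allocation path already
maintains; conceptually (a hedged sketch, where account_removed_page()
is a hypothetical helper for illustration, not a kernel function):

/*
 * Whenever 1 << order pages leave the buddy free lists, NR_FREE_PAGES
 * must drop accordingly, as rmqueue() arranges on the allocation path.
 * The exception is an isolated pageblock: set_migratetype_isolate()
 * has already subtracted its free pages, so subtracting again here
 * would double count.
 */
static void account_removed_page(struct zone *zone, unsigned int order,
				 int migratetype)
{
	if (!is_migrate_isolate(migratetype))
		__mod_zone_freepage_state(zone, -(1 << order), migratetype);
}

In take_page_off_buddy() itself, break_down_buddy_pages() returns the
split remainder to the free lists, so only a single order-0 page
actually leaves the allocator, which is why the patch subtracts exactly
one page.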
Link: https://lkml.kernel.org/r/20210526075247.11130-1-dinghui@sangfor.com.cn
Fixes: 06be6ff3d2ec ("mm,hwpoison: rework soft offline for free pages")
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/page_alloc.c~mm-page_alloc-fix-counting-of-free-pages-after-take-off-from-buddy
+++ a/mm/page_alloc.c
@@ -9158,6 +9158,8 @@ bool take_page_off_buddy(struct page *pa
del_page_from_free_list(page_head, zone, page_order);
break_down_buddy_pages(zone, page_head, page, 0,
page_order, migratetype);
+ if (!is_migrate_isolate(migratetype))
+ __mod_zone_freepage_state(zone, -1, migratetype);
ret = true;
break;
}