The patch titled
Subject: mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node
has been added to the -mm tree. Its filename is
mm-pages_allocc-dont-create-zone_movable-beyond-the-end-of-a-node.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-pages_allocc-dont-create-zone_…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-pages_allocc-dont-create-zone_…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Alistair Popple <apopple(a)nvidia.com>
Subject: mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node
ZONE_MOVABLE uses the remaining memory in each node. Its starting pfn is
also aligned to MAX_ORDER_NR_PAGES. It is possible for the remaining
memory in a node to be less than MAX_ORDER_NR_PAGES, meaning there is not
enough room for ZONE_MOVABLE on that node.
Unfortunately this condition is not checked for. This leads to
zone_movable_pfn[] getting set to a pfn greater than the last pfn in a
node.
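To make the overshoot concrete, here is a minimal sketch with made-up pfn values (assuming MAX_ORDER_NR_PAGES == 0x400, i.e. 1024 pages; roundup() as in the kernel's math helpers):

    /*
     * Hypothetical numbers only: a node whose usable range ends at
     * pfn 0x23f80, with ZONE_MOVABLE due to start at pfn 0x23c40.
     */
    unsigned long end_pfn = 0x23f80;                  /* one past the node's last pfn */
    unsigned long movable = roundup(0x23c40, 0x400);  /* = 0x24000 */

    /*
     * movable (0x24000) >= end_pfn (0x23f80): the aligned start of
     * ZONE_MOVABLE lies beyond the node, and nothing checked for it.
     */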
calculate_node_totalpages() then sets zone->present_pages to be greater
than zone->spanned_pages which is invalid, as spanned_pages represents the
maximum number of pages in a zone assuming no holes.
Subsequently it is possible that free_area_init_core() will observe a zone
of size zero with present pages. In this case it will skip setting up the
zone, including the initialisation of its free_lists[].
However populated_zone() checks zone->present_pages to see if a zone has
memory available. This is used by iterators such as walk_zones_in_node().
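For reference, populated_zone() is essentially just this test (paraphrased from include/linux/mmzone.h); it says nothing about whether the zone's free lists were ever initialised:

    static inline int populated_zone(struct zone *zone)
    {
            return !!zone->present_pages;
    }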
pagetypeinfo_showfree() uses this to walk the free_list of each zone in
each node; these lists are assumed to be initialised because the zone is
not empty. As free_area_init_core() never initialised the free_lists[],
this results in the following kernel crash when trying to read
/proc/pagetypeinfo:
[ 67.534914] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 67.535429] #PF: supervisor read access in kernel mode
[ 67.535789] #PF: error_code(0x0000) - not-present page
[ 67.536128] PGD 0 P4D 0
[ 67.536305] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
[ 67.536696] CPU: 0 PID: 456 Comm: cat Not tainted 5.16.0 #461
[ 67.537096] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[ 67.537638] RIP: 0010:pagetypeinfo_show+0x163/0x460
[ 67.537992] Code: 9e 82 e8 80 57 0e 00 49 8b 06 b9 01 00 00 00 4c 39 f0 75 16 e9 65 02 00 00 48 83 c1 01 48 81 f9 a0 86 01 00 0f 84 48 02 00 00 <48> 8b 00 4c 39 f0 75 e7 48 c7 c2 80 a2 e2 82 48 c7 c6 79 ef e3 82
[ 67.538259] RSP: 0018:ffffc90001c4bd10 EFLAGS: 00010003
[ 67.538259] RAX: 0000000000000000 RBX: ffff88801105f638 RCX: 0000000000000001
[ 67.538259] RDX: 0000000000000001 RSI: 000000000000068b RDI: ffff8880163dc68b
[ 67.538259] RBP: ffffc90001c4bd90 R08: 0000000000000001 R09: ffff8880163dc67e
[ 67.538259] R10: 656c6261766f6d6e R11: 6c6261766f6d6e55 R12: ffff88807ffb4a00
[ 67.538259] R13: ffff88807ffb49f8 R14: ffff88807ffb4580 R15: ffff88807ffb3000
[ 67.538259] FS: 00007f9c83eff5c0(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[ 67.538259] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 67.538259] CR2: 0000000000000000 CR3: 0000000013c8e000 CR4: 0000000000350ef0
[ 67.538259] Call Trace:
[ 67.538259] <TASK>
[ 67.538259] seq_read_iter+0x128/0x460
[ 67.538259] ? aa_file_perm+0x1af/0x5f0
[ 67.538259] proc_reg_read_iter+0x51/0x80
[ 67.538259] ? lock_is_held_type+0xea/0x140
[ 67.538259] new_sync_read+0x113/0x1a0
[ 67.538259] vfs_read+0x136/0x1d0
[ 67.538259] ksys_read+0x70/0xf0
[ 67.538259] __x64_sys_read+0x1a/0x20
[ 67.538259] do_syscall_64+0x3b/0xc0
[ 67.538259] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 67.538259] RIP: 0033:0x7f9c83e23cce
[ 67.538259] Code: c0 e9 b6 fe ff ff 50 48 8d 3d 6e 13 0a 00 e8 c9 e3 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
[ 67.538259] RSP: 002b:00007fff116e1a08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 67.538259] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f9c83e23cce
[ 67.538259] RDX: 0000000000020000 RSI: 00007f9c83a2c000 RDI: 0000000000000003
[ 67.538259] RBP: 00007f9c83a2c000 R08: 00007f9c83a2b010 R09: 0000000000000000
[ 67.538259] R10: 00007f9c83f2d7d0 R11: 0000000000000246 R12: 0000000000000000
[ 67.538259] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[ 67.538259] </TASK>
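The failing walk boils down to the following (a simplified sketch of pagetypeinfo_showfree_print(); local variable names abbreviated). Because free_area_init_core() skipped the zone, the free_list heads were never INIT_LIST_HEAD()ed and still hold the NULL pointers from the zeroed zone structure:

    struct free_area *area = &zone->free_area[order];
    struct list_head *curr;
    unsigned long freecount = 0;

    /* head->next is NULL, so the first pointer chase faults at address 0 */
    list_for_each(curr, &area->free_list[mtype])
            freecount++;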
Fix this by checking that the aligned zone_movable_pfn[] does not exceed
the end of the node, and if it does, skip creating a movable zone on that
node.
Link: https://lkml.kernel.org/r/20220215025831.2113067-1-apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple(a)nvidia.com>
Fixes: 2a1e274acf0b ("Create the ZONE_MOVABLE zone")
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/mm/page_alloc.c~mm-pages_allocc-dont-create-zone_movable-beyond-the-end-of-a-node
+++ a/mm/page_alloc.c
@@ -7998,10 +7998,17 @@ restart:
out2:
/* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
- for (nid = 0; nid < MAX_NUMNODES; nid++)
+ for (nid = 0; nid < MAX_NUMNODES; nid++) {
+ unsigned long start_pfn, end_pfn;
+
zone_movable_pfn[nid] =
roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
+ get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
+ if (zone_movable_pfn[nid] >= end_pfn)
+ zone_movable_pfn[nid] = 0;
+ }
+
out:
/* restore the node_state */
node_states[N_MEMORY] = saved_node_state;
_
Patches currently in -mm which might be from apopple(a)nvidia.com are
mm-pages_allocc-dont-create-zone_movable-beyond-the-end-of-a-node.patch
mm-remove-the-vma-check-in-migrate_vma_setup.patch
mm-gup-migrate-device-coherent-pages-when-pinning-instead-of-failing.patch
The patch titled
Subject: memfd: fix shmem huge page failed to set F_SEAL_WRITE attribute problem
has been added to the -mm tree. Its filename is
fix-shmem-huge-page-failed-to-set-f_seal_write-attribute-problem.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/fix-shmem-huge-page-failed-to-set…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/fix-shmem-huge-page-failed-to-set…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: wangyong <wang.yong12(a)zte.com.cn>
Subject: memfd: fix shmem huge page failed to set F_SEAL_WRITE attribute problem
After enabling transparent hugepage support for tmpfs with the following
command:
echo always > /sys/kernel/mm/transparent_hugepage/shmem_enabled
the docker program fails with EBUSY when it tries to add the F_SEAL_WRITE
seal:
fcntl(5, F_ADD_SEALS, F_SEAL_WRITE) = -1
It turns out that in memfd_wait_for_pins(), the huge page has a page_count
of 512 and a page_mapcount of 0, so it trips the check
page_count(page) - page_mapcount(page) != 1
and is reported as pinned. But the page is not actually busy at this
time; the compound page order of the huge page should therefore be taken
into account in the calculation.
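Sketched with hypothetical numbers for an idle PMD-sized shmem huge page (order 9, 512 subpages), where the page cache holds one reference per subpage, the adjusted check from the patch works out as:

    count = page_count(page);       /* 512: refcount lives on the compound head */
    if (PageTransCompound(page))
            count -= (1 << compound_order(compound_head(page))) - 1;  /* -511 */

    /* count - page_mapcount(page) == 1 - 0 == 1: no longer misreported as pinned */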
Link: https://lkml.kernel.org/r/20220215073743.1769979-1-cgel.zte@gmail.com
Signed-off-by: wangyong <wang.yong12(a)zte.com.cn>
Reported-by: Zeal Robot <zealci(a)zte.com.cn>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: Yang Yang <yang.yang29(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memfd.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
--- a/mm/memfd.c~fix-shmem-huge-page-failed-to-set-f_seal_write-attribute-problem
+++ a/mm/memfd.c
@@ -31,6 +31,7 @@
static void memfd_tag_pins(struct xa_state *xas)
{
struct page *page;
+ int count = 0;
unsigned int tagged = 0;
lru_add_drain();
@@ -39,8 +40,12 @@ static void memfd_tag_pins(struct xa_sta
xas_for_each(xas, page, ULONG_MAX) {
if (xa_is_value(page))
continue;
+
page = find_subpage(page, xas->xa_index);
- if (page_count(page) - page_mapcount(page) > 1)
+ count = page_count(page);
+ if (PageTransCompound(page))
+ count -= (1 << compound_order(compound_head(page))) - 1;
+ if (count - page_mapcount(page) > 1)
xas_set_mark(xas, MEMFD_TAG_PINNED);
if (++tagged % XA_CHECK_SCHED)
@@ -67,11 +72,12 @@ static int memfd_wait_for_pins(struct ad
{
XA_STATE(xas, &mapping->i_pages, 0);
struct page *page;
- int error, scan;
+ int error, scan, count;
memfd_tag_pins(&xas);
error = 0;
+ count = 0;
for (scan = 0; scan <= LAST_SCAN; scan++) {
unsigned int tagged = 0;
@@ -89,8 +95,12 @@ static int memfd_wait_for_pins(struct ad
bool clear = true;
if (xa_is_value(page))
continue;
+
page = find_subpage(page, xas.xa_index);
- if (page_count(page) - page_mapcount(page) != 1) {
+ count = page_count(page);
+ if (PageTransCompound(page))
+ count -= (1 << compound_order(compound_head(page))) - 1;
+ if (count - page_mapcount(page) != 1) {
/*
* On the last scan, we clean up all those tags
* we inserted; but make a note that we still
_
Patches currently in -mm which might be from wang.yong12(a)zte.com.cn are
fix-shmem-huge-page-failed-to-set-f_seal_write-attribute-problem.patch
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
If the only thing that is changing is SAGV vs. no SAGV, but
the number of active planes and the total data rate remain
unchanged, we currently bail out of intel_bw_atomic_check()
early and forget to actually compute the new QGV point
mask, and thus won't actually enable/disable SAGV as requested.
This goes poorly if we end up running with SAGV enabled
when we shouldn't; it usually results in underruns.
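In other words, the old flow looked roughly like this (paraphrased pseudocode; the names below are illustrative, not the actual driver symbols):

    if (total_data_rate_unchanged && active_planes_unchanged)
            return 0;   /* early bail: a SAGV-only change is dropped */

    /* never reached when only SAGV changed: */
    recompute_qgv_point_mask(state);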
To fix this, let's go through the QGV point mask computation
if anyone else has already added the bw state for us.
Cc: stable(a)vger.kernel.org
Cc: Stanislav Lisovskiy <stanislav.lisovskiy(a)intel.com>
Fixes: 20f505f22531 ("drm/i915: Restrict qgv points which don't have enough bandwidth.")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/intel_bw.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_bw.c b/drivers/gpu/drm/i915/display/intel_bw.c
index 23aa8e06de18..d72ccee7d53b 100644
--- a/drivers/gpu/drm/i915/display/intel_bw.c
+++ b/drivers/gpu/drm/i915/display/intel_bw.c
@@ -846,6 +846,13 @@ int intel_bw_atomic_check(struct intel_atomic_state *state)
if (num_psf_gv_points > 0)
mask |= REG_GENMASK(num_psf_gv_points - 1, 0) << ADLS_PSF_PT_SHIFT;
+ /*
+ * If we already have the bw state then recompute everything
+ * even if pipe data_rate / active_planes didn't change.
+ * Other things (such as SAGV) may have changed.
+ */
+ new_bw_state = intel_atomic_get_new_bw_state(state);
+
for_each_oldnew_intel_crtc_in_state(state, crtc, old_crtc_state,
new_crtc_state, i) {
unsigned int old_data_rate =
--
2.34.1