From: Liu Shixin <liushixin2(a)huawei.com>
commit f1897f2f08b28ae59476d8b73374b08f856973af upstream.
syzkaller reported such a BUG_ON():
------------[ cut here ]------------
kernel BUG at mm/khugepaged.c:1835!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G W 6.13.0-rc6 #22
Tainted: [W]=WARN
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : collapse_file+0xa44/0x1400
lr : collapse_file+0x88/0x1400
sp : ffff80008afe3a60
...
Call trace:
collapse_file+0xa44/0x1400 (P)
hpage_collapse_scan_file+0x278/0x400
madvise_collapse+0x1bc/0x678
madvise_vma_behavior+0x32c/0x448
madvise_walk_vmas.constprop.0+0xbc/0x140
do_madvise.part.0+0xdc/0x2c8
__arm64_sys_madvise+0x68/0x88
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0x128
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x190/0x198
This indicates that the pgoff is unaligned. After analysis, I confirm the
vma is mapped to /dev/zero. Such a vma certainly has vm_file, but it is
set to anonymous by mmap_zero(). So even if it's mmapped by 2m-unaligned,
it can pass the check in thp_vma_allowable_order() as it is an
anonymous-mmap, but then be collapsed as a file-mmap.
It seems the problem has existed for a long time, but actually, since we
have khugepaged_max_ptes_none check before, we will skip collapse it as it
is /dev/zero and so has no present page. But commit d8ea7cc8547c limit
the check for only khugepaged, so the BUG_ON() can be triggered by
madvise_collapse().
Add vma_is_anonymous() check to make such vma be processed by
hpage_collapse_scan_pmd().
Link: https://lkml.kernel.org/r/20250111034511.2223353-1-liushixin2@huawei.com
Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mattew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[acsjakub: backport, clean apply]
Cc: Jakub Acs <acsjakub(a)amazon.de>
Cc: linux-mm(a)kvack.org
---
Ran into the crash with syzkaller, backporting this patch works - the
reproducer no longer crashes.
Please let me know if there was a reason not to backport.
mm/khugepaged.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b538c3d48386..abd5764e4864 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2404,7 +2404,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma,
khugepaged_scan.address);
@@ -2750,7 +2750,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
nodes_clear(cc->alloc_nmask);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma, addr);
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
From: Liu Shixin <liushixin2(a)huawei.com>
commit f1897f2f08b28ae59476d8b73374b08f856973af upstream.
syzkaller reported such a BUG_ON():
------------[ cut here ]------------
kernel BUG at mm/khugepaged.c:1835!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
...
CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G W 6.13.0-rc6 #22
Tainted: [W]=WARN
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : collapse_file+0xa44/0x1400
lr : collapse_file+0x88/0x1400
sp : ffff80008afe3a60
...
Call trace:
collapse_file+0xa44/0x1400 (P)
hpage_collapse_scan_file+0x278/0x400
madvise_collapse+0x1bc/0x678
madvise_vma_behavior+0x32c/0x448
madvise_walk_vmas.constprop.0+0xbc/0x140
do_madvise.part.0+0xdc/0x2c8
__arm64_sys_madvise+0x68/0x88
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0x128
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x190/0x198
This indicates that the pgoff is unaligned. After analysis, I confirm the
vma is mapped to /dev/zero. Such a vma certainly has vm_file, but it is
set to anonymous by mmap_zero(). So even if it's mmapped by 2m-unaligned,
it can pass the check in thp_vma_allowable_order() as it is an
anonymous-mmap, but then be collapsed as a file-mmap.
It seems the problem has existed for a long time, but actually, since we
have khugepaged_max_ptes_none check before, we will skip collapse it as it
is /dev/zero and so has no present page. But commit d8ea7cc8547c limit
the check for only khugepaged, so the BUG_ON() can be triggered by
madvise_collapse().
Add vma_is_anonymous() check to make such vma be processed by
hpage_collapse_scan_pmd().
Link: https://lkml.kernel.org/r/20250111034511.2223353-1-liushixin2@huawei.com
Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mattew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nanyong Sun <sunnanyong(a)huawei.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[acsjakub: backport, clean apply]
Signed-off-by: Jakub Acs <acsjakub(a)amazon.de>
Cc: linux-mm(a)kvack.org
---
v1 -> v2: fix missing sign-off
Ran into the crash with syzkaller, backporting this patch works - the
reproducer no longer crashes.
Please let me know if there was a reason not to backport.
mm/khugepaged.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b538c3d48386..abd5764e4864 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2404,7 +2404,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
VM_BUG_ON(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma,
khugepaged_scan.address);
@@ -2750,7 +2750,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
nodes_clear(cc->alloc_nmask);
- if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
+ if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma, addr);
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
From: Edip Hazuri <edip(a)medip.dev>
The mute led on this laptop is using ALC245 but requires a quirk to work
This patch enables the existing quirk for the device.
Tested on Victus 16-S0063NT Laptop. The LED behaviour works
as intended.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Edip Hazuri <edip(a)medip.dev>
---
sound/hda/codecs/realtek/alc269.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/sound/hda/codecs/realtek/alc269.c b/sound/hda/codecs/realtek/alc269.c
index 05019fa732..77322ff8a6 100644
--- a/sound/hda/codecs/realtek/alc269.c
+++ b/sound/hda/codecs/realtek/alc269.c
@@ -6528,6 +6528,7 @@ static const struct hda_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x103c, 0x8bbe, "HP Victus 16-r0xxx (MB 8BBE)", ALC245_FIXUP_HP_MUTE_LED_COEFBIT),
SND_PCI_QUIRK(0x103c, 0x8bc8, "HP Victus 15-fa1xxx", ALC245_FIXUP_HP_MUTE_LED_COEFBIT),
SND_PCI_QUIRK(0x103c, 0x8bcd, "HP Omen 16-xd0xxx", ALC245_FIXUP_HP_MUTE_LED_V1_COEFBIT),
+ SND_PCI_QUIRK(0x103c, 0x8bd4, "HP Victus 16-s0xxx (MB 8BD4)", ALC245_FIXUP_HP_MUTE_LED_COEFBIT),
SND_PCI_QUIRK(0x103c, 0x8bdd, "HP Envy 17", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x103c, 0x8bde, "HP Envy 17", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x103c, 0x8bdf, "HP Envy 15", ALC287_FIXUP_CS35L41_I2C_2),
--
2.50.1
Hello,
Until kernel version 6.7, a write-sealed memfd could not be mapped as
shared and read-only. This was clearly a bug, and was not inline with
the description of F_SEAL_WRITE in the man page for fcntl()[1].
Lorenzo's series [2] fixed that issue and was merged in kernel version
6.7, but was not backported to older kernels. So, this issue is still
present on kernels 5.4, 5.10, 5.15, 6.1, and 6.6.
This series backports Lorenzo's series to the 5.4 kernel.
[1] https://man7.org/linux/man-pages/man2/fcntl.2.html
[2] https://lore.kernel.org/all/913628168ce6cce77df7d13a63970bae06a526e0.169711…
Lorenzo Stoakes (3):
mm: drop the assumption that VM_SHARED always implies writable
mm: update memfd seal write check to include F_SEAL_WRITE
mm: perform the mapping_map_writable() check after call_mmap()
fs/hugetlbfs/inode.c | 2 +-
include/linux/fs.h | 4 ++--
include/linux/mm.h | 26 +++++++++++++++++++-------
kernel/fork.c | 2 +-
mm/filemap.c | 2 +-
mm/madvise.c | 2 +-
mm/mmap.c | 26 ++++++++++++++++----------
mm/shmem.c | 2 +-
8 files changed, 42 insertions(+), 24 deletions(-)
--
2.50.1.552.g942d659e1b-goog
The process of adding an I2C adapter can invoke I2C accesses on that new
adapter (see i2c_detect()).
Ensure we have set the adapter's driver data to avoid null pointer
dereferences in the xfer functions during the adapter add.
This has been noted in the past and the same fix proposed but not
completed. See:
https://lore.kernel.org/lkml/ef597e73-ed71-168e-52af-0d19b03734ac@vigem.de/
Signed-off-by: Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
Signed-off-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Sumanth Gavini <sumanth.gavini(a)yahoo.com>
---
drivers/hid/hid-mcp2221.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hid/hid-mcp2221.c b/drivers/hid/hid-mcp2221.c
index de52e9f7bb8c..9973545c1c4b 100644
--- a/drivers/hid/hid-mcp2221.c
+++ b/drivers/hid/hid-mcp2221.c
@@ -873,12 +873,12 @@ static int mcp2221_probe(struct hid_device *hdev,
"MCP2221 usb-i2c bridge on hidraw%d",
((struct hidraw *)hdev->hidraw)->minor);
+ i2c_set_adapdata(&mcp->adapter, mcp);
ret = i2c_add_adapter(&mcp->adapter);
if (ret) {
hid_err(hdev, "can't add usb-i2c adapter: %d\n", ret);
goto err_i2c;
}
- i2c_set_adapdata(&mcp->adapter, mcp);
/* Setup GPIO chip */
mcp->gc = devm_kzalloc(&hdev->dev, sizeof(*mcp->gc), GFP_KERNEL);
--
2.43.0
The userprogs infrastructure does not expect clang being used with GNU ld
and in that case uses /usr/bin/ld for linking, not the configured $(LD).
This fallback is problematic as it will break when cross-compiling.
Mixing clang and GNU ld is used for example when building for SPARC64,
as ld.lld is not sufficient; see Documentation/kbuild/llvm.rst.
Relax the check around --ld-path so it gets used for all linkers.
Fixes: dfc1b168a8c4 ("kbuild: userprogs: use correct lld when linking through clang")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Nathan, you originally proposed the check for $(CONFIG_LD_IS_LLD) [0],
could you take a look at this?
[0] https://lore.kernel.org/all/20250213175437.GA2756218@ax162/
---
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index c09766beb7eff4780574682b8ea44475fc0a5188..e300c6546c845c300edb5f0033719963c7da8f9b 100644
--- a/Makefile
+++ b/Makefile
@@ -1134,7 +1134,7 @@ KBUILD_USERCFLAGS += $(filter -m32 -m64 --target=%, $(KBUILD_CPPFLAGS) $(KBUILD
KBUILD_USERLDFLAGS += $(filter -m32 -m64 --target=%, $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS))
# userspace programs are linked via the compiler, use the correct linker
-ifeq ($(CONFIG_CC_IS_CLANG)$(CONFIG_LD_IS_LLD),yy)
+ifneq ($(CONFIG_CC_IS_CLANG),)
KBUILD_USERLDFLAGS += --ld-path=$(LD)
endif
---
base-commit: 6832a9317eee280117cd695fa885b2b7a7a38daf
change-id: 20250723-userprogs-clang-gnu-ld-7a1c16fc852d
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
This patchset fixes the xe driver probe fail with -ETIMEDOUT issue in
linux 6.12.35 and later version. The failure is caused by commit
d42b44736ea2 ("drm/xe/gt: Update handling of xe_force_wake_get return"),
which incorrectly handles the return value of xe_force_wake_get as
"refcounted domain mask" (as introduced in 6.13), rather than status
code (as used in 6.12).
In 6.12 stable kernel, xe_force_wake_get still returns a status code.
The update incorrectly treats the return value as a mask, causing the
return value of 0 to be misinterpreted as an error. As a result, the
driver probe fails with -ETIMEDOUT in xe_pci_probe -> xe_device_probe
-> xe_gt_init_hwconfig -> xe_force_wake_get.
[ 1254.323172] xe 0000:00:02.0: [drm] Found ALDERLAKE_P (device ID 46a6) display version 13.00 stepping D0
[ 1254.323175] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] ALDERLAKE_P 46a6:000c dgfx:0 gfx:Xe_LP (12.00) media:Xe_M (12.00) display:yes dma_m_s:39 tc:1 gscfi:0 cscfi:0
[ 1254.323275] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] Stepping = (G:C0, M:C0, B:**)
[ 1254.323328] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] SR-IOV support: no (mode: none)
[ 1254.323379] xe 0000:00:02.0: [drm:intel_pch_type [xe]] Found Alder Lake PCH
[ 1254.323475] xe 0000:00:02.0: probe with driver xe failed with error -110
Similar return handling issue cause by API mismatch are also found in:
Commit 95a75ed2b005 ("drm/xe/tests/mocs: Update xe_force_wake_get() return handling")
Commit 9ffd6ec2de08 ("drm/xe/devcoredump: Update handling of xe_force_wake_get return")
This patchset fixes them by reverting them all.
Additionally, commit deb05f8431f3 ("drm/xe/forcewake: Add a helper
xe_force_wake_ref_has_domain()") is also reverted as it is not needed in
6.12.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5373
Cc: Badal Nilawar <badal.nilawar(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray(a)intel.com>
Cc: Nirmoy Das <nirmoy.das(a)intel.com>
Tomita Moeko (4):
Revert "drm/xe/gt: Update handling of xe_force_wake_get return"
Revert "drm/xe/tests/mocs: Update xe_force_wake_get() return handling"
Revert "drm/xe/devcoredump: Update handling of xe_force_wake_get
return"
Revert "drm/xe/forcewake: Add a helper xe_force_wake_ref_has_domain()"
drivers/gpu/drm/xe/tests/xe_mocs.c | 21 +++---
drivers/gpu/drm/xe/xe_devcoredump.c | 14 ++--
drivers/gpu/drm/xe/xe_force_wake.h | 16 -----
drivers/gpu/drm/xe/xe_gt.c | 105 +++++++++++++---------------
4 files changed, 63 insertions(+), 93 deletions(-)
--
2.47.2
Greg recently reported 3 patches that could not be applied without
conflicts in v6.1:
- f8a1d9b18c5e ("mptcp: make fallback action and fallback decision atomic")
- def5b7b2643e ("mptcp: plug races between subflow fail and subflow creation")
- da9b2fc7b73d ("mptcp: reset fallback status gracefully at disconnect() time")
Conflicts have been resolved, and documented in each patch.
Paolo Abeni (3):
mptcp: make fallback action and fallback decision atomic
mptcp: plug races between subflow fail and subflow creation
mptcp: reset fallback status gracefully at disconnect() time
net/mptcp/options.c | 3 ++-
net/mptcp/pm.c | 8 ++++++-
net/mptcp/protocol.c | 55 ++++++++++++++++++++++++++++++++++++--------
net/mptcp/protocol.h | 27 +++++++++++++++++-----
net/mptcp/subflow.c | 30 +++++++++++++++---------
5 files changed, 95 insertions(+), 28 deletions(-)
--
2.50.0
The patch titled
Subject: mm: memory-tiering: fix PGPROMOTE_CANDIDATE counting
has been added to the -mm mm-new branch. Its filename is
mm-memory-tiering-fix-pgpromote_candidate-counting.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ruan Shiyang <ruansy.fnst(a)fujitsu.com>
Subject: mm: memory-tiering: fix PGPROMOTE_CANDIDATE counting
Date: Tue, 29 Jul 2025 11:51:01 +0800
Goto-san reported confusing pgpromote statistics where the
pgpromote_success count significantly exceeded pgpromote_candidate.
On a system with three nodes (nodes 0-1: DRAM 4GB, node 2: NVDIMM 4GB):
# Enable demotion only
echo 1 > /sys/kernel/mm/numa/demotion_enabled
numactl -m 0-1 memhog -r200 3500M >/dev/null &
pid=$!
sleep 2
numactl memhog -r100 2500M >/dev/null &
sleep 10
kill -9 $pid # terminate the 1st memhog
# Enable promotion
echo 2 > /proc/sys/kernel/numa_balancing
After a few seconds, we observeed `pgpromote_candidate < pgpromote_success`
$ grep -e pgpromote /proc/vmstat
pgpromote_success 2579
pgpromote_candidate 0
In this scenario, after terminating the first memhog, the conditions for
pgdat_free_space_enough() are quickly met, and triggers promotion.
However, these migrated pages are only counted for in PGPROMOTE_SUCCESS,
not in PGPROMOTE_CANDIDATE.
To solve these confusing statistics, introduce PGPROMOTE_CANDIDATE_NRL to
count the missed promotion pages. And also, not counting these pages into
PGPROMOTE_CANDIDATE is to avoid changing the existing algorithm or
performance of the promotion rate limit.
Link: https://lkml.kernel.org/r/20250729035101.1601407-1-ruansy.fnst@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com>
Signed-off-by: Ruan Shiyang <ruansy.fnst(a)fujitsu.com>
Reported-by: Yasunori Gotou (Fujitsu) <y-goto(a)fujitsu.com>
Suggested-by: Huang Ying <ying.huang(a)linux.alibaba.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Valentin Schneider <vschneid(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mmzone.h | 16 +++++++++++++++-
kernel/sched/fair.c | 5 +++--
mm/vmstat.c | 1 +
3 files changed, 19 insertions(+), 3 deletions(-)
--- a/include/linux/mmzone.h~mm-memory-tiering-fix-pgpromote_candidate-counting
+++ a/include/linux/mmzone.h
@@ -234,7 +234,21 @@ enum node_stat_item {
#endif
#ifdef CONFIG_NUMA_BALANCING
PGPROMOTE_SUCCESS, /* promote successfully */
- PGPROMOTE_CANDIDATE, /* candidate pages to promote */
+ /**
+ * Candidate pages for promotion based on hint fault latency. This
+ * counter is used to control the promotion rate and adjust the hot
+ * threshold.
+ */
+ PGPROMOTE_CANDIDATE,
+ /**
+ * Not rate-limited (NRL) candidate pages for those can be promoted
+ * without considering hot threshold because of enough free pages in
+ * fast-tier node. These promotions bypass the regular hotness checks
+ * and do NOT influence the promotion rate-limiter or
+ * threshold-adjustment logic.
+ * This is for statistics/monitoring purposes.
+ */
+ PGPROMOTE_CANDIDATE_NRL,
#endif
/* PGDEMOTE_*: pages demoted */
PGDEMOTE_KSWAPD,
--- a/kernel/sched/fair.c~mm-memory-tiering-fix-pgpromote_candidate-counting
+++ a/kernel/sched/fair.c
@@ -1940,11 +1940,13 @@ bool should_numa_migrate_memory(struct t
struct pglist_data *pgdat;
unsigned long rate_limit;
unsigned int latency, th, def_th;
+ long nr = folio_nr_pages(folio);
pgdat = NODE_DATA(dst_nid);
if (pgdat_free_space_enough(pgdat)) {
/* workload changed, reset hot threshold */
pgdat->nbp_threshold = 0;
+ mod_node_page_state(pgdat, PGPROMOTE_CANDIDATE_NRL, nr);
return true;
}
@@ -1958,8 +1960,7 @@ bool should_numa_migrate_memory(struct t
if (latency >= th)
return false;
- return !numa_promotion_rate_limit(pgdat, rate_limit,
- folio_nr_pages(folio));
+ return !numa_promotion_rate_limit(pgdat, rate_limit, nr);
}
this_cpupid = cpu_pid_to_cpupid(dst_cpu, current->pid);
--- a/mm/vmstat.c~mm-memory-tiering-fix-pgpromote_candidate-counting
+++ a/mm/vmstat.c
@@ -1280,6 +1280,7 @@ const char * const vmstat_text[] = {
#ifdef CONFIG_NUMA_BALANCING
[I(PGPROMOTE_SUCCESS)] = "pgpromote_success",
[I(PGPROMOTE_CANDIDATE)] = "pgpromote_candidate",
+ [I(PGPROMOTE_CANDIDATE_NRL)] = "pgpromote_candidate_nrl",
#endif
[I(PGDEMOTE_KSWAPD)] = "pgdemote_kswapd",
[I(PGDEMOTE_DIRECT)] = "pgdemote_direct",
_
Patches currently in -mm which might be from ruansy.fnst(a)fujitsu.com are
mm-memory-tiering-fix-pgpromote_candidate-counting.patch