Hi Greg and Sasha,
Please consider applying commit 8dcd71b45df3 ("powerpc/prom_init: Use
-ffreestanding to avoid a reference to bcmp") to 5.4, as it resolves an
error seen in that branch with newer versions of clang, which look for
loops they can turn into strlen() calls, something the compiler is not
allowed to do in this early boot code. The commit landed in 5.5, so it is
already in every other stable branch.
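For reference, this is the sort of pattern involved (a hypothetical,
self-contained illustration, not the prom_init code): a byte-counting loop
that newer clang's idiom recognition may rewrite as a strlen() libcall,
which -ffreestanding forbids it from emitting.

#include <stddef.h>

/* A hand-rolled length loop of the kind the optimizer can replace with a
 * strlen() call unless -ffreestanding/-fno-builtin is in effect.
 */
static size_t count_bytes(const char *s)
{
	size_t n = 0;

	while (s[n] != '\0')
		n++;
	return n;
}

int main(void)
{
	return count_bytes("prom_init") == 9 ? 0 : 1;
}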
Please let me know if there are any issues.
Cheers,
Nathan
From: Mario Limonciello <mario.limonciello(a)amd.com>
On systems that only have an SRA sensor connected to SFH, the sensor
doesn't get enabled because a misguided optimization breaks out of the
sensor walk loop early.
This optimization is unnecessary in the first place: if there is only one
device, the loop only runs once. Drop the condition and explicitly mark
the sensor as enabled.
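A self-contained sketch of the effect (a simplified model of the walk
loop, not the driver code; the names mirror the diff below, and the
firmware enable call is assumed to succeed): with the old early break, an
SRA-only system never reaches the enable path, so its lone sensor stays
disabled.

#include <stdbool.h>
#include <stdio.h>

enum { SENSOR_DISABLED, SENSOR_ENABLED };
enum { ACCEL_IDX, SRA_IDX };

int main(void)
{
	int sensor_idx[] = { SRA_IDX };		/* SRA-only system */
	int sensor_sts[1];
	bool is_any_sensor_enabled = false;
	int i, num_hid_devices = 1;

	for (i = 0; i < num_hid_devices; i++) {
		sensor_sts[i] = SENSOR_DISABLED;
		/* removed: if (num_hid_devices == 1 && sensor_idx[0] == SRA_IDX)
		 *                  break;
		 */
		if (sensor_idx[i] == SRA_IDX) {
			/* assume the enable request to the firmware succeeded */
			sensor_sts[i] = SENSOR_ENABLED;
			is_any_sensor_enabled = true;
			continue;
		}
	}

	printf("SRA sensor is %s\n",
	       sensor_sts[0] == SENSOR_ENABLED ? "enabled" : "disabled");
	return is_any_sensor_enabled ? 0 : 1;
}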
Reported-by: Yijun Shen <Yijun.Shen(a)dell.com>
Tested-by: Yijun Shen <Yijun_Shen(a)Dell.com>
Fixes: d1c444b47100d ("HID: amd_sfh: Add support to export device operating states")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
v2:
* Add tag
---
drivers/hid/amd-sfh-hid/sfh1_1/amd_sfh_init.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/hid/amd-sfh-hid/sfh1_1/amd_sfh_init.c b/drivers/hid/amd-sfh-hid/sfh1_1/amd_sfh_init.c
index 25f0ebfcbd5f5..c1bdf1e0d44af 100644
--- a/drivers/hid/amd-sfh-hid/sfh1_1/amd_sfh_init.c
+++ b/drivers/hid/amd-sfh-hid/sfh1_1/amd_sfh_init.c
@@ -134,9 +134,6 @@ static int amd_sfh1_1_hid_client_init(struct amd_mp2_dev *privdata)
for (i = 0; i < cl_data->num_hid_devices; i++) {
cl_data->sensor_sts[i] = SENSOR_DISABLED;
- if (cl_data->num_hid_devices == 1 && cl_data->sensor_idx[0] == SRA_IDX)
- break;
-
if (cl_data->sensor_idx[i] == SRA_IDX) {
info.sensor_idx = cl_data->sensor_idx[i];
writel(0, privdata->mmio + amd_get_p2c_val(privdata, 0));
@@ -145,8 +142,10 @@ static int amd_sfh1_1_hid_client_init(struct amd_mp2_dev *privdata)
(privdata, cl_data->sensor_idx[i], ENABLE_SENSOR);
cl_data->sensor_sts[i] = (status == 0) ? SENSOR_ENABLED : SENSOR_DISABLED;
- if (cl_data->sensor_sts[i] == SENSOR_ENABLED)
+ if (cl_data->sensor_sts[i] == SENSOR_ENABLED) {
+ cl_data->is_any_sensor_enabled = true;
privdata->dev_en.is_sra_present = true;
+ }
continue;
}
--
2.43.0
The patch titled
Subject: mm/contig_alloc: fix alloc_contig_range when __GFP_COMP and order < MAX_ORDER
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-contig_alloc-fix-alloc_contig_range-when-__gfp_comp-and-order-max_order.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jinjiang Tu <tujinjiang(a)huawei.com>
Subject: mm/contig_alloc: fix alloc_contig_range when __GFP_COMP and order < MAX_ORDER
Date: Mon, 21 Apr 2025 09:36:20 +0800
When calling alloc_contig_range() with __GFP_COMP and the order of the
requested pfn range is pageblock_order (less than MAX_ORDER), I triggered
the following WARNING:
PFN range: requested [2150105088, 2150105600), allocated [2150105088, 2150106112)
WARNING: CPU: 3 PID: 580 at mm/page_alloc.c:6877 alloc_contig_range+0x280/0x340
alloc_contig_range() marks the pageblocks of the requested pfn range as
isolated, migrates the pages if they are in use, and frees them to the
MIGRATE_ISOLATE freelist.
Suppose two alloc_contig_range() calls run at the same time, with
requested pfn ranges [0x80280000, 0x80280200) and [0x80280200, 0x80280400)
respectively. If both memory ranges are in use, alloc_contig_range() will
migrate the pages and free them to the MIGRATE_ISOLATE freelist.
__free_one_page() will then merge the MIGRATE_ISOLATE buddies into larger
buddies, eventually producing a MAX_ORDER buddy. Finally,
find_large_buddy() in alloc_contig_range() returns that MAX_ORDER buddy
and triggers the WARNING.
To fix it, call free_contig_range() to free the excess pfn range.
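A quick, self-contained check of the numbers in the WARNING above
(illustration only; the order names assume x86-64 with 4K pages, where
pageblock_order is 9 and MAX_PAGE_ORDER is 10):

#include <stdio.h>

int main(void)
{
	unsigned long req_start = 2150105088UL, req_end = 2150105600UL;
	unsigned long got_start = 2150105088UL, got_end = 2150106112UL;

	/* both sizes are powers of two, so ctz gives the order directly */
	printf("requested: %lu pages -> order %d (one pageblock)\n",
	       req_end - req_start, __builtin_ctzl(req_end - req_start));
	printf("allocated: %lu pages -> order %d (a MAX_ORDER buddy)\n",
	       got_end - got_start, __builtin_ctzl(got_end - got_start));
	return 0;
}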
Link: https://lkml.kernel.org/r/20250421013620.459740-1-tujinjiang@huawei.com
Fixes: e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
Signed-off-by: Jinjiang Tu <tujinjiang(a)huawei.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
--- a/mm/page_alloc.c~mm-contig_alloc-fix-alloc_contig_range-when-__gfp_comp-and-order-max_order
+++ a/mm/page_alloc.c
@@ -6706,6 +6706,7 @@ int alloc_contig_range_noprof(unsigned l
.alloc_contig = true,
};
INIT_LIST_HEAD(&cc.migratepages);
+ bool is_range_aligned;
gfp_mask = current_gfp_context(gfp_mask);
if (__alloc_contig_verify_gfp_mask(gfp_mask, (gfp_t *)&cc.gfp_mask))
@@ -6794,7 +6795,14 @@ int alloc_contig_range_noprof(unsigned l
goto done;
}
- if (!(gfp_mask & __GFP_COMP)) {
+ /*
+ * With __GFP_COMP and the requested order < MAX_PAGE_ORDER,
+ * isolated free pages can have higher order than the requested
+ * one. Use split_free_pages() to free out of range pages.
+ */
+ is_range_aligned = is_power_of_2(end - start);
+ if (!(gfp_mask & __GFP_COMP) ||
+ (is_range_aligned && ilog2(end - start) < MAX_PAGE_ORDER)) {
split_free_pages(cc.freepages, gfp_mask);
/* Free head and tail (if any) */
@@ -6802,7 +6810,15 @@ int alloc_contig_range_noprof(unsigned l
free_contig_range(outer_start, start - outer_start);
if (end != outer_end)
free_contig_range(end, outer_end - end);
- } else if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
+
+ outer_start = start;
+ outer_end = end;
+
+ if (!(gfp_mask & __GFP_COMP))
+ goto done;
+ }
+
+ if (start == outer_start && end == outer_end && is_range_aligned) {
struct page *head = pfn_to_page(start);
int order = ilog2(end - start);
_
Patches currently in -mm which might be from tujinjiang(a)huawei.com are
mm-contig_alloc-fix-alloc_contig_range-when-__gfp_comp-and-order-max_order.patch
The patch titled
Subject: watchdog: fix watchdog may detect false positive of softlockup
has been added to the -mm mm-nonmm-unstable branch. Its filename is
watchdog-fix-watchdog-may-detect-false-positive-of-softlockup.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Luo Gengkun <luogengkun(a)huaweicloud.com>
Subject: watchdog: fix watchdog may detect false positive of softlockup
Date: Mon, 21 Apr 2025 03:50:21 +0000
When updating `watchdog_thresh`, there is a race condition between writing
the new `watchdog_thresh` value and stopping the old watchdog timer. If
the old timer triggers during this window, it may falsely detect a
softlockup due to the old interval and the new `watchdog_thresh` value
being used. The problem can be described as follows:
# We assume the previous watchdog_thresh is 60, so the watchdog timer
# fires every 24s.
echo 10 > /proc/sys/kernel/watchdog_thresh (User space)
|
+------>+ update watchdog_thresh (We are in kernel now)
|
| # using old interval and new `watchdog_thresh`
+------>+ watchdog hrtimer (irq context: detect softlockup)
|
|
+-------+
|
|
+ softlockup_stop_all
To fix this problem, introduce a shadow variable for `watchdog_thresh`.
The update to the actual `watchdog_thresh` is delayed until after the old
timer is stopped, preventing false positives.
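A minimal, self-contained model of that ordering (an assumption-laden
sketch: locking and the real hrtimer machinery are omitted, only the
publish-after-stop sequence is shown):

#include <stdio.h>

static int watchdog_thresh = 60;	/* value the running timers still use */
static int watchdog_thresh_next = 60;	/* shadow written by the sysctl */

static void sysctl_write_thresh(int new_thresh)
{
	watchdog_thresh_next = new_thresh;	/* step 1: stash only */
}

static void reconfigure(void)
{
	/* step 2: stop the old timers (softlockup_stop_all() in the kernel) */
	printf("timers stopped, thresh still %d\n", watchdog_thresh);
	/* step 3: only now publish the new threshold and restart */
	watchdog_thresh = watchdog_thresh_next;
	printf("timers restarted with thresh %d\n", watchdog_thresh);
}

int main(void)
{
	sysctl_write_thresh(10);
	reconfigure();
	return 0;
}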
The following testcase may help to understand this problem.
---------------------------------------------
echo RT_RUNTIME_SHARE > /sys/kernel/debug/sched/features
echo -1 > /proc/sys/kernel/sched_rt_runtime_us
echo 0 > /sys/kernel/debug/sched/fair_server/cpu3/runtime
echo 60 > /proc/sys/kernel/watchdog_thresh
taskset -c 3 chrt -r 99 /bin/bash -c "while true;do true; done" &
echo 10 > /proc/sys/kernel/watchdog_thresh &
---------------------------------------------
The test case above first removes the throttling restrictions for
real-time tasks. It then sets watchdog_thresh to 60 and executes a
real-time task, a simple while(1) loop, on cpu3. Consequently, the final
command gets blocked because the presence of this real-time thread
prevents kworker:3 from being selected by the scheduler. This eventually
triggers a softlockup detection on cpu3, because watchdog_timer_fn
operates with inconsistent values: the old interval together with the
updated watchdog_thresh.
Link: https://lkml.kernel.org/r/20250421035021.3507649-1-luogengkun@huaweicloud.c…
Signed-off-by: Luo Gengkun <luogengkun(a)huaweicloud.com>
Cc: Doug Anderson <dianders(a)chromium.org>
Cc: Joel Granados <joel.granados(a)kernel.org>
Cc: Song Liu <song(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/watchdog.c | 37 ++++++++++++++++++++++++-------------
1 file changed, 24 insertions(+), 13 deletions(-)
--- a/kernel/watchdog.c~watchdog-fix-watchdog-may-detect-false-positive-of-softlockup
+++ a/kernel/watchdog.c
@@ -47,6 +47,7 @@ int __read_mostly watchdog_user_enabled
static int __read_mostly watchdog_hardlockup_user_enabled = WATCHDOG_HARDLOCKUP_DEFAULT;
static int __read_mostly watchdog_softlockup_user_enabled = 1;
int __read_mostly watchdog_thresh = 10;
+static int __read_mostly watchdog_thresh_next;
static int __read_mostly watchdog_hardlockup_available;
struct cpumask watchdog_cpumask __read_mostly;
@@ -870,12 +871,20 @@ int lockup_detector_offline_cpu(unsigned
return 0;
}
-static void __lockup_detector_reconfigure(void)
+static void __lockup_detector_reconfigure(bool thresh_changed)
{
cpus_read_lock();
watchdog_hardlockup_stop();
softlockup_stop_all();
+ /*
+ * To prevent watchdog_timer_fn from using the old interval and
+ * the new watchdog_thresh at the same time, which could lead to
+ * false softlockup reports, it is necessary to update the
+ * watchdog_thresh after the softlockup is completed.
+ */
+ if (thresh_changed)
+ watchdog_thresh = READ_ONCE(watchdog_thresh_next);
set_sample_period();
lockup_detector_update_enable();
if (watchdog_enabled && watchdog_thresh)
@@ -888,7 +897,7 @@ static void __lockup_detector_reconfigur
void lockup_detector_reconfigure(void)
{
mutex_lock(&watchdog_mutex);
- __lockup_detector_reconfigure();
+ __lockup_detector_reconfigure(false);
mutex_unlock(&watchdog_mutex);
}
@@ -908,7 +917,7 @@ static __init void lockup_detector_setup
return;
mutex_lock(&watchdog_mutex);
- __lockup_detector_reconfigure();
+ __lockup_detector_reconfigure(false);
softlockup_initialized = true;
mutex_unlock(&watchdog_mutex);
}
@@ -924,11 +933,11 @@ static void __lockup_detector_reconfigur
}
void lockup_detector_reconfigure(void)
{
- __lockup_detector_reconfigure();
+ __lockup_detector_reconfigure(false);
}
static inline void lockup_detector_setup(void)
{
- __lockup_detector_reconfigure();
+ __lockup_detector_reconfigure(false);
}
#endif /* !CONFIG_SOFTLOCKUP_DETECTOR */
@@ -946,11 +955,11 @@ void lockup_detector_soft_poweroff(void)
#ifdef CONFIG_SYSCTL
/* Propagate any changes to the watchdog infrastructure */
-static void proc_watchdog_update(void)
+static void proc_watchdog_update(bool thresh_changed)
{
/* Remove impossible cpus to keep sysctl output clean. */
cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
- __lockup_detector_reconfigure();
+ __lockup_detector_reconfigure(thresh_changed);
}
/*
@@ -984,7 +993,7 @@ static int proc_watchdog_common(int whic
} else {
err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (!err && old != READ_ONCE(*param))
- proc_watchdog_update();
+ proc_watchdog_update(false);
}
mutex_unlock(&watchdog_mutex);
return err;
@@ -1035,11 +1044,13 @@ static int proc_watchdog_thresh(const st
mutex_lock(&watchdog_mutex);
- old = READ_ONCE(watchdog_thresh);
+ watchdog_thresh_next = READ_ONCE(watchdog_thresh);
+
+ old = watchdog_thresh_next;
err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
- if (!err && write && old != READ_ONCE(watchdog_thresh))
- proc_watchdog_update();
+ if (!err && write && old != READ_ONCE(watchdog_thresh_next))
+ proc_watchdog_update(true);
mutex_unlock(&watchdog_mutex);
return err;
@@ -1060,7 +1071,7 @@ static int proc_watchdog_cpumask(const s
err = proc_do_large_bitmap(table, write, buffer, lenp, ppos);
if (!err && write)
- proc_watchdog_update();
+ proc_watchdog_update(false);
mutex_unlock(&watchdog_mutex);
return err;
@@ -1080,7 +1091,7 @@ static const struct ctl_table watchdog_s
},
{
.procname = "watchdog_thresh",
- .data = &watchdog_thresh,
+ .data = &watchdog_thresh_next,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_watchdog_thresh,
_
Patches currently in -mm which might be from luogengkun(a)huaweicloud.com are
watchdog-fix-watchdog-may-detect-false-positive-of-softlockup.patch
The patch titled
Subject: mm/huge_memory: fix dereferencing invalid pmd migration entry
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-huge_memory-fix-dereferencing-invalid-pmd-migration-entry.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Gavin Guo <gavinguo(a)igalia.com>
Subject: mm/huge_memory: fix dereferencing invalid pmd migration entry
Date: Mon, 21 Apr 2025 19:35:36 +0800
When migrating a THP, concurrent access to the PMD migration entry during
a deferred split scan can lead to an invalid address access, as
illustrated below. To prevent this invalid access, it is necessary to
check the PMD migration entry and return early. In this context, there is
no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the
equality of the target folio. Since the PMD migration entry is locked, it
cannot serve as the target.
Mailing list discussion and explanation from Hugh Dickins: "An anon_vma
lookup points to a location which may contain the folio of interest, but
might instead contain another folio: and weeding out those other folios is
precisely what the "folio != pmd_folio(*pmd)" check (and the "risk of
replacing the wrong folio" comment a few lines above it) is for."
BUG: unable to handle page fault for address: ffffea60001db008
CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
Call Trace:
<TASK>
try_to_migrate_one+0x28c/0x3730
rmap_walk_anon+0x4f6/0x770
unmap_folio+0x196/0x1f0
split_huge_page_to_list_to_order+0x9f6/0x1560
deferred_split_scan+0xac5/0x12a0
shrinker_debugfs_scan_write+0x376/0x470
full_proxy_write+0x15c/0x220
vfs_write+0x2fc/0xcb0
ksys_write+0x146/0x250
do_syscall_64+0x6a/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The bug is found by syzkaller on an internal kernel, then confirmed on
upstream.
Link: https://lkml.kernel.org/r/20250421113536.3682201-1-gavinguo@igalia.com
Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/
Link: https://lore.kernel.org/all/20250418085802.2973519-1-gavinguo@igalia.com/
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Gavin Guo <gavinguo(a)igalia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Gavin Shan <gshan(a)redhat.com>
Cc: Florent Revest <revest(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
--- a/mm/huge_memory.c~mm-huge_memory-fix-dereferencing-invalid-pmd-migration-entry
+++ a/mm/huge_memory.c
@@ -3075,6 +3075,8 @@ static void __split_huge_pmd_locked(stru
void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmd, bool freeze, struct folio *folio)
{
+ bool pmd_migration = is_pmd_migration_entry(*pmd);
+
VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio));
VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE));
VM_WARN_ON_ONCE(folio && !folio_test_locked(folio));
@@ -3085,9 +3087,12 @@ void split_huge_pmd_locked(struct vm_are
* require a folio to check the PMD against. Otherwise, there
* is a risk of replacing the wrong folio.
*/
- if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
- is_pmd_migration_entry(*pmd)) {
- if (folio && folio != pmd_folio(*pmd))
+ if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || pmd_migration) {
+ /*
+ * Do not apply pmd_folio() to a migration entry; and folio lock
+ * guarantees that it must be of the wrong folio anyway.
+ */
+ if (folio && (pmd_migration || folio != pmd_folio(*pmd)))
return;
__split_huge_pmd_locked(vma, pmd, address, freeze);
}
_
Patches currently in -mm which might be from gavinguo(a)igalia.com are
mm-huge_memory-fix-dereferencing-invalid-pmd-migration-entry.patch
The amdgpu_sched_hw_submission parameter controls the number of hardware
jobs that can be scheduled concurrently per ring. This value is later
multiplied by ring->max_dw to compute buffer offsets or command patch
regions.
If amdgpu_sched_hw_submission is set too high (by user input via module
parameter), the multiplication can overflow, resulting in corrupted memory
offsets or ring buffer overflows.
Clamp amdgpu_sched_hw_submission to a practical upper bound (e.g. 2^16)
to prevent arithmetic overflow when computing ring buffer offsets during
initialization, especially in jpeg_v1_0_start().
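A self-contained illustration of the kind of overflow being guarded
against (the max_dw value here is made up; real rings differ): a 32-bit
product of max_dw and an oversized submission count wraps, while a value
clamped to 65536 stays in range.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t max_dw = 16384;		/* assumed ring->max_dw */
	uint32_t submission = 1u << 20;		/* oversized module parameter */
	uint32_t wrapped = max_dw * submission;	/* 2^34 truncated to 32 bits */
	uint64_t real = (uint64_t)max_dw * submission;

	printf("32-bit product: %u, full product: %llu\n",
	       wrapped, (unsigned long long)real);
	printf("clamped to 65536 the product is %llu (fits in 32 bits: %s)\n",
	       (unsigned long long)max_dw * 65536,
	       (uint64_t)max_dw * 65536 <= UINT32_MAX ? "yes" : "no");
	return 0;
}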
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Cc: stable(a)vger.kernel.org # v5.9+
Fixes: 5d5bd5e32e6e ("drm/amdgpu: restrict the hw sched jobs number to power of two")
Signed-off-by: Alexey Nepomnyashih <sdl(a)nppct.ru>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 018dfccd771b..69217a021b0e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2056,6 +2056,10 @@ static int amdgpu_device_check_arguments(struct amdgpu_device *adev)
dev_warn(adev->dev, "sched hw submission jobs (%d) must be at least 2\n",
amdgpu_sched_hw_submission);
amdgpu_sched_hw_submission = 2;
+ } else if (amdgpu_sched_hw_submission > 65536) {
+ dev_warn(adev->dev, "sched hw submission jobs (%d) is too large\n",
+ amdgpu_sched_hw_submission);
+ amdgpu_sched_hw_submission = 65536;
} else if (!is_power_of_2(amdgpu_sched_hw_submission)) {
dev_warn(adev->dev, "sched hw submission jobs (%d) must be a power of 2\n",
amdgpu_sched_hw_submission);
--
2.43.0
When migrating a THP, concurrent access to the PMD migration entry
during a deferred split scan can lead to an invalid address access, as
illustrated below. To prevent this invalid access, it is necessary to
check the PMD migration entry and return early. In this context, there is
no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the
equality of the target folio. Since the PMD migration entry is locked,
it cannot serve as the target.
Mailing list discussion and explanation from Hugh Dickins:
"An anon_vma lookup points to a location which may contain the folio of
interest, but might instead contain another folio: and weeding out those
other folios is precisely what the "folio != pmd_folio(*pmd)" check
(and the "risk of replacing the wrong folio" comment a few lines above
it) is for."
BUG: unable to handle page fault for address: ffffea60001db008
CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
Call Trace:
<TASK>
try_to_migrate_one+0x28c/0x3730
rmap_walk_anon+0x4f6/0x770
unmap_folio+0x196/0x1f0
split_huge_page_to_list_to_order+0x9f6/0x1560
deferred_split_scan+0xac5/0x12a0
shrinker_debugfs_scan_write+0x376/0x470
full_proxy_write+0x15c/0x220
vfs_write+0x2fc/0xcb0
ksys_write+0x146/0x250
do_syscall_64+0x6a/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The bug is found by syzkaller on an internal kernel, then confirmed on
upstream.
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gavin Guo <gavinguo(a)igalia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Zi Yan <ziy(a)nvidia.com>
Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/
---
V1 -> V2: Add explanation from Hugh and correct the wording from page
fault to invalid address access.
mm/huge_memory.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2a47682d1ab7..0cb9547dcff2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3075,6 +3075,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmd, bool freeze, struct folio *folio)
{
+ bool pmd_migration = is_pmd_migration_entry(*pmd);
+
VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio));
VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE));
VM_WARN_ON_ONCE(folio && !folio_test_locked(folio));
@@ -3085,10 +3087,18 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
* require a folio to check the PMD against. Otherwise, there
* is a risk of replacing the wrong folio.
*/
- if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
- is_pmd_migration_entry(*pmd)) {
- if (folio && folio != pmd_folio(*pmd))
- return;
+ if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || pmd_migration) {
+ if (folio) {
+ /*
+ * Do not apply pmd_folio() to a migration entry; and
+ * folio lock guarantees that it must be of the wrong
+ * folio anyway.
+ */
+ if (pmd_migration)
+ return;
+ if (folio != pmd_folio(*pmd))
+ return;
+ }
__split_huge_pmd_locked(vma, pmd, address, freeze);
}
}
base-commit: a24588245776dafc227243a01bfbeb8a59bafba9
--
2.43.0