V1 -> V2:
- Because `pmd_val` variable broke ppc builds due to its name,
renamed it to `_pmd`. see [1].
[1] https://lore.kernel.org/stable/aS7lPZPYuChOTdXU@hyeyoo
- Added David Hildenbrand's Acked-by [2], thanks a lot!
[2] https://lore.kernel.org/linux-mm/ac8d7137-3819-4a75-9dd3-fb3d2259ebe4@kerne…
# TL;DR
previous discussion: https://lore.kernel.org/linux-mm/20250921232709.1608699-1-harry.yoo@oracle.…
A "bad pmd" error occurs due to race condition between
change_prot_numa() and THP migration. The mainline kernel does not have
this bug as commit 670ddd8cdc fixes the race condition. 6.1.y, 5.15.y,
5.10.y, 5.4.y are affected by this bug.
Fixing this in -stable kernels is tricky because pte_map_offset_lock()
has different semantics in pre-6.5 and post-6.5 kernels. I am trying to
backport the same mechanism we have in the mainline kernel.
Since the code looks bit different due to different semantics of
pte_map_offset_lock(), it'd be best to get this reviewed by MM folks.
# Testing
I verified that the bug described below is not reproduced anymore
(on a downstream kernel) after applying this patch series. It used to
trigger in few days of intensive numa balancing testing, but it survived
2 weeks with this applied.
# Bug Description
It was reported that a bad pmd is seen when automatic NUMA
balancing is marking page table entries as prot_numa:
[2437548.196018] mm/pgtable-generic.c:50: bad pmd 00000000af22fc02(dffffffe71fbfe02)
[2437548.235022] Call Trace:
[2437548.238234] <TASK>
[2437548.241060] dump_stack_lvl+0x46/0x61
[2437548.245689] panic+0x106/0x2e5
[2437548.249497] pmd_clear_bad+0x3c/0x3c
[2437548.253967] change_pmd_range.isra.0+0x34d/0x3a7
[2437548.259537] change_p4d_range+0x156/0x20e
[2437548.264392] change_protection_range+0x116/0x1a9
[2437548.269976] change_prot_numa+0x15/0x37
[2437548.274774] task_numa_work+0x1b8/0x302
[2437548.279512] task_work_run+0x62/0x95
[2437548.283882] exit_to_user_mode_loop+0x1a4/0x1a9
[2437548.289277] exit_to_user_mode_prepare+0xf4/0xfc
[2437548.294751] ? sysvec_apic_timer_interrupt+0x34/0x81
[2437548.300677] irqentry_exit_to_user_mode+0x5/0x25
[2437548.306153] asm_sysvec_apic_timer_interrupt+0x16/0x1b
This is due to a race condition between change_prot_numa() and
THP migration because the kernel doesn't check is_swap_pmd() and
pmd_trans_huge() atomically:
change_prot_numa() THP migration
======================================================================
- change_pmd_range()
-> is_swap_pmd() returns false,
meaning it's not a PMD migration
entry.
- do_huge_pmd_numa_page()
-> migrate_misplaced_page() sets
migration entries for the THP.
- change_pmd_range()
-> pmd_none_or_clear_bad_unless_trans_huge()
-> pmd_none() and pmd_trans_huge() returns false
- pmd_none_or_clear_bad_unless_trans_huge()
-> pmd_bad() returns true for the migration entry!
The upstream commit 670ddd8cdcbd ("mm/mprotect: delete
pmd_none_or_clear_bad_unless_trans_huge()") closes this race condition
by checking is_swap_pmd() and pmd_trans_huge() atomically.
# Backporting note
commit a79390f5d6a7 ("mm/mprotect: use long for page accountings and retval")
is backported to return an error code (negative value) in
change_pte_range().
Unlike the mainline, pte_offset_map_lock() does not check if the pmd
entry is a migration entry or a hugepage; acquires PTL unconditionally
instead of returning failure. Therefore, it is necessary to keep the
!is_swap_pmd() && !pmd_trans_huge() && !pmd_devmap() checks in
change_pmd_range() before acquiring the PTL.
After acquiring the lock, open-code the semantics of
pte_offset_map_lock() in the mainline kernel; change_pte_range() fails
if the pmd value has changed. This requires adding pmd_old parameter
(pmd_t value that is read before calling the function) to
change_pte_range().
Hugh Dickins (1):
mm/mprotect: delete pmd_none_or_clear_bad_unless_trans_huge()
Peter Xu (1):
mm/mprotect: use long for page accountings and retval
include/linux/hugetlb.h | 4 +-
include/linux/mm.h | 2 +-
mm/hugetlb.c | 4 +-
mm/mempolicy.c | 2 +-
mm/mprotect.c | 125 ++++++++++++++++++----------------------
5 files changed, 61 insertions(+), 76 deletions(-)
--
2.43.0
When bnxt_init_one() fails during initialization (e.g.,
bnxt_init_int_mode returns -ENODEV), the error path calls
bnxt_free_hwrm_resources() which destroys the DMA pool and sets
bp->hwrm_dma_pool to NULL. Subsequently, bnxt_ptp_clear() is called,
which invokes ptp_clock_unregister().
Since commit a60fc3294a37 ("ptp: rework ptp_clock_unregister() to
disable events"), ptp_clock_unregister() now calls
ptp_disable_all_events(), which in turn invokes the driver's .enable()
callback (bnxt_ptp_enable()) to disable PTP events before completing the
unregistration.
bnxt_ptp_enable() attempts to send HWRM commands via bnxt_ptp_cfg_pin()
and bnxt_ptp_cfg_event(), both of which call hwrm_req_init(). This
function tries to allocate from bp->hwrm_dma_pool, causing a NULL
pointer dereference:
bnxt_en 0000:01:00.0 (unnamed net_device) (uninitialized): bnxt_init_int_mode err: ffffffed
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
Call Trace:
__hwrm_req_init (drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.c:72)
bnxt_ptp_enable (drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:323 drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:517)
ptp_disable_all_events (drivers/ptp/ptp_chardev.c:66)
ptp_clock_unregister (drivers/ptp/ptp_clock.c:518)
bnxt_ptp_clear (drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:1134)
bnxt_init_one (drivers/net/ethernet/broadcom/bnxt/bnxt.c:16889)
Lines are against commit f8f9c1f4d0c7 ("Linux 6.19-rc3")
Fix this by clearing and unregistering ptp (bnxt_ptp_clear()) before
freeing HWRM resources.
Suggested-by: Pavan Chebbi <pavan.chebbi(a)broadcom.com>
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Fixes: a60fc3294a37 ("ptp: rework ptp_clock_unregister() to disable events")
Cc: stable(a)vger.kernel.org
---
Changes in v2:
- Instead of checking for HWRM resources in bnxt_ptp_enable(), call it
when HWRM resources are availble (Pavan Chebbi)
- Link to v1: https://patch.msgid.link/20251231-bnxt-v1-1-8f9cde6698b4@debian.org
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index d160e54ac121..5a4af8abf848 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -16891,11 +16891,11 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
init_err_pci_clean:
bnxt_hwrm_func_drv_unrgtr(bp);
+ bnxt_ptp_clear(bp);
+ kfree(bp->ptp_cfg);
bnxt_free_hwrm_resources(bp);
bnxt_hwmon_uninit(bp);
bnxt_ethtool_free(bp);
- bnxt_ptp_clear(bp);
- kfree(bp->ptp_cfg);
bp->ptp_cfg = NULL;
kfree(bp->fw_health);
bp->fw_health = NULL;
---
base-commit: e146b276a817807b8f4a94b5781bf80c6c00601b
change-id: 20251231-bnxt-c54d317d8bfe
Best regards,
--
Breno Leitao <leitao(a)debian.org>
When vm.dirtytime_expire_seconds is set to 0, wakeup_dirtytime_writeback()
schedules delayed work with a delay of 0, causing immediate execution.
The function then reschedules itself with 0 delay again, creating an
infinite busy loop that causes 100% kworker CPU usage.
Fix by:
- Only scheduling delayed work in wakeup_dirtytime_writeback() when
dirtytime_expire_interval is non-zero
- Cancelling the delayed work in dirtytime_interval_handler() when
the interval is set to 0
- Adding a guard in start_dirtytime_writeback() for defensive coding
Tested by booting kernel in QEMU with virtme-ng:
- Before fix: kworker CPU spikes to ~73%
- After fix: CPU remains at normal levels
- Setting interval back to non-zero correctly resumes writeback
Fixes: a2f4870697a5 ("fs: make sure the timestamps for lazytime inodes eventually get written")
Cc: stable(a)vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220227
Signed-off-by: Laveesh Bansal <laveeshb(a)laveeshbansal.com>
---
fs/fs-writeback.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6800886c4d10..cd21c74cd0e5 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2492,7 +2492,8 @@ static void wakeup_dirtytime_writeback(struct work_struct *w)
wb_wakeup(wb);
}
rcu_read_unlock();
- schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
+ if (dirtytime_expire_interval)
+ schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
}
static int dirtytime_interval_handler(const struct ctl_table *table, int write,
@@ -2501,8 +2502,12 @@ static int dirtytime_interval_handler(const struct ctl_table *table, int write,
int ret;
ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
- if (ret == 0 && write)
- mod_delayed_work(system_percpu_wq, &dirtytime_work, 0);
+ if (ret == 0 && write) {
+ if (dirtytime_expire_interval)
+ mod_delayed_work(system_percpu_wq, &dirtytime_work, 0);
+ else
+ cancel_delayed_work_sync(&dirtytime_work);
+ }
return ret;
}
@@ -2519,7 +2524,8 @@ static const struct ctl_table vm_fs_writeback_table[] = {
static int __init start_dirtytime_writeback(void)
{
- schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
+ if (dirtytime_expire_interval)
+ schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
register_sysctl_init("vm", vm_fs_writeback_table);
return 0;
}
--
2.43.0
Xorg fails to start with defaults on AMD KV260, /var/log/Xorg.0.log:
[ 23.491] (II) Loading /usr/lib/xorg/modules/drivers/fbdev_drv.so
[ 23.491] (II) Module fbdev: vendor="X.Org Foundation"
[ 23.491] compiled for 1.21.1.18, module version = 0.5.1
[ 23.491] Module class: X.Org Video Driver
[ 23.491] ABI class: X.Org Video Driver, version 25.2
[ 23.491] (II) modesetting: Driver for Modesetting Kernel Drivers:
kms
[ 23.491] (II) FBDEV: driver for framebuffer: fbdev
[ 23.510] (II) modeset(0): using drv /dev/dri/card1
[ 23.511] (WW) Falling back to old probe method for fbdev
[ 23.511] (II) Loading sub module "fbdevhw"
[ 23.511] (II) LoadModule: "fbdevhw"
[ 23.511] (II) Loading /usr/lib/xorg/modules/libfbdevhw.so
[ 23.511] (II) Module fbdevhw: vendor="X.Org Foundation"
[ 23.511] compiled for 1.21.1.18, module version = 0.0.2
[ 23.511] ABI class: X.Org Video Driver, version 25.2
[ 23.512] (II) modeset(0): Creating default Display subsection in
Screen section
"Default Screen Section" for depth/fbbpp 24/32
[ 23.512] (==) modeset(0): Depth 24, (==) framebuffer bpp 32
[ 23.512] (==) modeset(0): RGB weight 888
[ 23.512] (==) modeset(0): Default visual is TrueColor
...
[ 23.911] (II) Loading sub module "fb"
[ 23.911] (II) LoadModule: "fb"
[ 23.911] (II) Module "fb" already built-in
[ 23.911] (II) UnloadModule: "fbdev"
[ 23.911] (II) Unloading fbdev
[ 23.912] (II) UnloadSubModule: "fbdevhw"
[ 23.912] (II) Unloading fbdevhw
[ 24.238] (==) modeset(0): Backing store enabled
[ 24.238] (==) modeset(0): Silken mouse enabled
[ 24.249] (II) modeset(0): Initializing kms color map for depth 24, 8
bpc.
[ 24.250] (==) modeset(0): DPMS enabled
[ 24.250] (II) modeset(0): [DRI2] Setup complete
[ 24.250] (II) modeset(0): [DRI2] DRI driver: kms_swrast
[ 24.250] (II) modeset(0): [DRI2] VDPAU driver: kms_swrast
...
[ 24.770] (II) modeset(0): Disabling kernel dirty updates, not
required.
[ 24.770] (EE) modeset(0): failed to set mode: Invalid argument
xorg tries to use 24 and 32 bpp which pass on the fb API but which
don't actually work on AMD KV260 when Xorg starts. As a workaround
Xorg config can set color depth to 16 using /etc/X11/xorg.conf snippet:
Section "Screen"
Identifier "Default Screen"
Monitor "Configured Monitor"
Device "Configured Video Device"
DefaultDepth 16
EndSection
But this is cumbersome on images meant for multiple different arm64
devices and boards. So instead set 16 bpp as preferred depth
in zynqmp_kms fb driver which is used by Xorg in the logic to find
out a working depth.
Now Xorg startup and bpp query using fb API works and HDMI display
shows graphics. /var/log/Xorg.0.log shows:
[ 23.219] (II) LoadModule: "fbdev"
[ 23.219] (II) Loading /usr/lib/xorg/modules/drivers/fbdev_drv.so
[ 23.219] (II) Module fbdev: vendor="X.Org Foundation"
[ 23.219] compiled for 1.21.1.18, module version = 0.5.1
[ 23.219] Module class: X.Org Video Driver
[ 23.219] ABI class: X.Org Video Driver, version 25.2
[ 23.219] (II) modesetting: Driver for Modesetting Kernel Drivers:
kms
[ 23.219] (II) FBDEV: driver for framebuffer: fbdev
[ 23.238] (II) modeset(0): using drv /dev/dri/card1
[ 23.238] (WW) Falling back to old probe method for fbdev
[ 23.238] (II) Loading sub module "fbdevhw"
[ 23.238] (II) LoadModule: "fbdevhw"
[ 23.239] (II) Loading /usr/lib/xorg/modules/libfbdevhw.so
[ 23.239] (II) Module fbdevhw: vendor="X.Org Foundation"
[ 23.239] compiled for 1.21.1.18, module version = 0.0.2
[ 23.239] ABI class: X.Org Video Driver, version 25.2
[ 23.240] (II) modeset(0): Creating default Display subsection in Screen section
"Default Screen Section" for depth/fbbpp 16/16
[ 23.240] (==) modeset(0): Depth 16, (==) framebuffer bpp 16
[ 23.240] (==) modeset(0): RGB weight 565
[ 23.240] (==) modeset(0): Default visual is TrueColor
...
[ 24.015] (==) modeset(0): Backing store enabled
[ 24.015] (==) modeset(0): Silken mouse enabled
[ 24.027] (II) modeset(0): Initializing kms color map for depth 16, 6 bpc.
[ 24.028] (==) modeset(0): DPMS enabled
[ 24.028] (II) modeset(0): [DRI2] Setup complete
[ 24.028] (II) modeset(0): [DRI2] DRI driver: kms_swrast
[ 24.028] (II) modeset(0): [DRI2] VDPAU driver: kms_swrast
Cc: Bill Mills <bill.mills(a)linaro.org>
Cc: Ilias Apalodimas <ilias.apalodimas(a)linaro.org>
Cc: Anatoliy Klymenko <anatoliy.klymenko(a)amd.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mikko Rapeli <mikko.rapeli(a)linaro.org>
---
drivers/gpu/drm/xlnx/zynqmp_kms.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/xlnx/zynqmp_kms.c b/drivers/gpu/drm/xlnx/zynqmp_kms.c
index ccc35cacd10cb..a42192c827af0 100644
--- a/drivers/gpu/drm/xlnx/zynqmp_kms.c
+++ b/drivers/gpu/drm/xlnx/zynqmp_kms.c
@@ -506,6 +506,7 @@ int zynqmp_dpsub_drm_init(struct zynqmp_dpsub *dpsub)
drm->mode_config.min_height = 0;
drm->mode_config.max_width = ZYNQMP_DISP_MAX_WIDTH;
drm->mode_config.max_height = ZYNQMP_DISP_MAX_HEIGHT;
+ drm->mode_config.preferred_depth = 16;
ret = drm_vblank_init(drm, 1);
if (ret)
--
2.34.1
From: Anatoliy Klymenko <anatoliy.klymenko(a)amd.com>
Use RG16 buffer format for fbdev emulation. Fbdev backend is being used
by Mali 400 userspace driver which expects 16 bit RGB pixel color format.
This change should not affect console or other fbdev applications as we
still have plenty of colors left.
Cc: Bill Mills <bill.mills(a)linaro.org>
Cc: Ilias Apalodimas <ilias.apalodimas(a)linaro.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Anatoliy Klymenko <anatoliy.klymenko(a)amd.com>
Signed-off-by: Mikko Rapeli <mikko.rapeli(a)linaro.org>
---
drivers/gpu/drm/xlnx/zynqmp_kms.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xlnx/zynqmp_kms.c b/drivers/gpu/drm/xlnx/zynqmp_kms.c
index 2bee0a2275ede..ccc35cacd10cb 100644
--- a/drivers/gpu/drm/xlnx/zynqmp_kms.c
+++ b/drivers/gpu/drm/xlnx/zynqmp_kms.c
@@ -525,7 +525,7 @@ int zynqmp_dpsub_drm_init(struct zynqmp_dpsub *dpsub)
goto err_poll_fini;
/* Initialize fbdev generic emulation. */
- drm_client_setup_with_fourcc(drm, DRM_FORMAT_RGB888);
+ drm_client_setup_with_fourcc(drm, DRM_FORMAT_RGB565);
return 0;
--
2.34.1
If `locked_pages` is zero, the page array must not be allocated:
ceph_process_folio_batch() uses `locked_pages` to decide when to
allocate `pages`, and redundant allocations trigger
ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and
writeback stall) or even a kernel panic. Consequently, the main loop in
ceph_writepages_start() assumes that the lifetime of `pages` is confined
to a single iteration.
The ceph_submit_write() function claims ownership of the page array on
success. But failures only redirty/unlock the pages and fail to free the
array, making the failure case in ceph_submit_write() fatal.
Free the page array in ceph_submit_write()'s error-handling 'if' block
so that the caller's invariant (that the array does not outlive the
iteration) is maintained unconditionally, allowing failures in
ceph_submit_write() to be recoverable as originally intended.
Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sam Edwards <CFSworks(a)gmail.com>
---
fs/ceph/addr.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 2b722916fb9b..91cc43950162 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1466,6 +1466,13 @@ int ceph_submit_write(struct address_space *mapping,
unlock_page(page);
}
+ if (ceph_wbc->from_pool) {
+ mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
+ ceph_wbc->from_pool = false;
+ } else
+ kfree(ceph_wbc->pages);
+ ceph_wbc->pages = NULL;
+
ceph_osdc_put_request(req);
return -EIO;
}
--
2.51.2