The patch titled
Subject: Squashfs: reject negative file sizes in squashfs_read_inode()
has been added to the -mm mm-nonmm-unstable branch. Its filename is
squashfs-reject-negative-file-sizes-in-squashfs_read_inode.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Phillip Lougher <phillip(a)squashfs.org.uk>
Subject: Squashfs: reject negative file sizes in squashfs_read_inode()
Date: Fri, 26 Sep 2025 22:59:35 +0100
Syskaller reports a "WARNING in ovl_copy_up_file" in overlayfs.
This warning is ultimately caused because the underlying Squashfs file
system returns a file with a negative file size.
This commit checks for a negative file size and returns EINVAL.
Link: https://lkml.kernel.org/r/20250926215935.107233-1-phillip@squashfs.org.uk
Fixes: 6545b246a2c8 ("Squashfs: inode operations")
Signed-off-by: Phillip Lougher <phillip(a)squashfs.org.uk>
Reported-by: syzbot+f754e01116421e9754b9(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68d580e5.a00a0220.303701.0019.GAE@google.com/
Cc: Amir Goldstein <amir73il(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/squashfs/inode.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
--- a/fs/squashfs/inode.c~squashfs-reject-negative-file-sizes-in-squashfs_read_inode
+++ a/fs/squashfs/inode.c
@@ -145,6 +145,10 @@ int squashfs_read_inode(struct inode *in
goto failed_read;
inode->i_size = le32_to_cpu(sqsh_ino->file_size);
+ if (inode->i_size < 0) {
+ err = -EINVAL;
+ goto failed_read;
+ }
frag = le32_to_cpu(sqsh_ino->fragment);
if (frag != SQUASHFS_INVALID_FRAG) {
/*
@@ -197,6 +201,10 @@ int squashfs_read_inode(struct inode *in
goto failed_read;
inode->i_size = le64_to_cpu(sqsh_ino->file_size);
+ if (inode->i_size < 0) {
+ err = -EINVAL;
+ goto failed_read;
+ }
frag = le32_to_cpu(sqsh_ino->fragment);
if (frag != SQUASHFS_INVALID_FRAG) {
/*
@@ -249,8 +257,12 @@ int squashfs_read_inode(struct inode *in
if (err < 0)
goto failed_read;
- set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
inode->i_size = le16_to_cpu(sqsh_ino->file_size);
+ if (inode->i_size < 0) {
+ err = -EINVAL;
+ goto failed_read;
+ }
+ set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
inode->i_op = &squashfs_dir_inode_ops;
inode->i_fop = &squashfs_dir_ops;
inode->i_mode |= S_IFDIR;
@@ -273,9 +285,13 @@ int squashfs_read_inode(struct inode *in
if (err < 0)
goto failed_read;
+ inode->i_size = le32_to_cpu(sqsh_ino->file_size);
+ if (inode->i_size < 0) {
+ err = -EINVAL;
+ goto failed_read;
+ }
xattr_id = le32_to_cpu(sqsh_ino->xattr);
set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
- inode->i_size = le32_to_cpu(sqsh_ino->file_size);
inode->i_op = &squashfs_dir_inode_ops;
inode->i_fop = &squashfs_dir_ops;
inode->i_mode |= S_IFDIR;
@@ -302,7 +318,7 @@ int squashfs_read_inode(struct inode *in
goto failed_read;
inode->i_size = le32_to_cpu(sqsh_ino->symlink_size);
- if (inode->i_size > PAGE_SIZE) {
+ if (inode->i_size < 0 || inode->i_size > PAGE_SIZE) {
ERROR("Corrupted symlink\n");
return -EINVAL;
}
_
Patches currently in -mm which might be from phillip(a)squashfs.org.uk are
squashfs-fix-uninit-value-in-squashfs_get_parent.patch
squashfs-add-additional-inode-sanity-checking.patch
squashfs-add-seek_data-seek_hole-support.patch
squashfs-reject-negative-file-sizes-in-squashfs_read_inode.patch
This patch series enables a future version of tune2fs to be able to
modify certain parts of the ext4 superblock without to write to the
block device.
The first patch fixes a potential buffer overrun caused by a
maliciously moified superblock. The second patch adds support for
32-bit uid and gid's which can have access to the reserved blocks pool.
The last patch adds the ioctl's which will be used by tune2fs.
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
---
Changes in v2:
- fix bugs that were detected using sparse
- remove tune (unsafe) ability to clear certain compat faatures
- add the ability to set the encoding and encoding flags for case folding
- Link to v1: https://lore.kernel.org/r/20250908-tune2fs-v1-0-e3a6929f3355@mit.edu
---
Theodore Ts'o (3):
ext4: avoid potential buffer over-read in parse_apply_sb_mount_options()
ext4: add support for 32-bit default reserved uid and gid values
ext4: implemet new ioctls to set and get superblock parameters
fs/ext4/ext4.h | 16 +++-
fs/ext4/ioctl.c | 312 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
fs/ext4/super.c | 25 +++----
include/uapi/linux/ext4.h | 53 +++++++++++++
4 files changed, 382 insertions(+), 24 deletions(-)
---
base-commit: b320789d6883cc00ac78ce83bccbfe7ed58afcf0
change-id: 20250830-tune2fs-3376beb72403
Best regards,
--
Theodore Ts'o <tytso(a)mit.edu>
Make sure to drop the reference taken to the ocmem platform device when
looking up its driver data.
Note that holding a reference to a device does not prevent its driver
data from going away so there is no point in keeping the reference.
Also note that commit 0ff027027e05 ("soc: qcom: ocmem: Fix missing
put_device() call in of_get_ocmem") fixed the leak in a lookup error
path, but the reference is still leaking on success.
Fixes: 88c1e9404f1d ("soc: qcom: add OCMEM driver")
Cc: stable(a)vger.kernel.org # 5.5: 0ff027027e05
Cc: Brian Masney <bmasney(a)redhat.com>
Cc: Miaoqian Lin <linmq006(a)gmail.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/soc/qcom/ocmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/soc/qcom/ocmem.c b/drivers/soc/qcom/ocmem.c
index 9c3bd37b6579..71130a2f62e9 100644
--- a/drivers/soc/qcom/ocmem.c
+++ b/drivers/soc/qcom/ocmem.c
@@ -202,9 +202,9 @@ struct ocmem *of_get_ocmem(struct device *dev)
}
ocmem = platform_get_drvdata(pdev);
+ put_device(&pdev->dev);
if (!ocmem) {
dev_err(dev, "Cannot get ocmem\n");
- put_device(&pdev->dev);
return ERR_PTR(-ENODEV);
}
return ocmem;
--
2.49.1
Hello,
This is Chenglong from Google Container Optimized OS. I'm reporting a
severe CPU hang regression that occurs after a high volume of file
creation and subsequent cgroup cleanup.
Through bisection, the issue appears to be caused by a chain reaction
between three commits related to writeback, unbound workqueues, and
CPU-hogging detection. The issue is greatly alleviated on the latest
mainline kernel but is not fully resolved, still occurring
intermittently (~1 in 10 runs).
How to reproduce
The kernel v6.1 is good. The hang is reliably triggered(over 80%
chance) on kernels v6.6 and 6.12 and intermittently on
mainline(6.17-rc7) with the following steps:
Environment: A machine with a fast SSD and a high core count (e.g.,
Google Cloud's N2-standard-128).
Workload: Concurrently generate a large number of files (e.g., 2
million) using multiple services managed by systemd-run. This creates
significant I/O and cgroup churn.
Trigger: After the file generation completes, terminate the
systemd-run services.
Result: Shortly after the services are killed, the system's CPU load
spikes, leading to a massive number of kworker/+inode_switch_wbs
threads and a system-wide hang/livelock where the machine becomes
unresponsive (20s - 300s).
Analysis and Problematic Commits
1. The initial commit: The process begins with a worker that can get
stuck busy-waiting on a spinlock.
Commit: ("writeback, cgroup: release dying cgwbs by switching attached inodes")
Effect: This introduced the inode_switch_wbs_work_fn worker to clean
up cgroup writeback structures. Under our test load, this worker
appears to hit a highly contended wb->list_lock spinlock, causing it
to burn 100% CPU without sleeping.
2. The Kworker Explosion: A subsequent change misinterprets the
spinning worker from Stage 1, leading to a runaway feedback loop of
worker creation.
Commit: 616db8779b1e ("workqueue: Automatically mark CPU-hogging work
items CPU_INTENSIVE")
Effect: This logic sees the spinning worker, marks it as
CPU_INTENSIVE, and excludes it from concurrency management. To handle
the work backlog, it spawns a new kworker, which then also gets stuck
on the same lock, repeating the cycle. This directly causes the
kworker count to explode from <50 to 100-2000+.
3. The System-Wide Lockdown: The final piece allows this localized
worker explosion to saturate the entire system.
Commit: 8639ecebc9b1 ("workqueue: Implement non-strict affinity scope
for unbound workqueues")
Effect: This change introduced non-strict affinity as the default. It
allows the hundreds of kworkers created in Stage 2 to be spread by the
scheduler across all available CPU cores, turning the problem into a
system-wide hang.
Current Status and Mitigation
Mainline Status: On the latest mainline kernel, the hang is far less
frequent and the kworker counts are reduced back to normal (<50),
suggesting other changes have partially mitigated the issue. However,
the hang still occurs, and when it does, the kworker count still
explodes (e.g., 300+), indicating the underlying feedback loop
remains.
Workaround: A reliable mitigation is to revert to the old workqueue
behavior by setting affinity_strict to 1. This contains the kworker
proliferation to a single CPU pod, preventing the system-wide hang.
Questions
Given that the issue is not fully resolved, could you please provide
some guidance?
1. Is this a known issue, and are there patches in development that
might fully address the underlying spinlock contention or the kworker
feedback loop?
2. Is there a better long-term mitigation we can apply other than
forcing strict affinity?
Thank you for your time and help.
Best regards,
Chenglong
On Wed, Sep 24, 2025 at 05:24:15PM -0700, Chenglong Tang wrote:
> The kernel v6.1 is good. The hang is reliably triggered(over 80% chance) on
> kernels v6.6 and 6.12 and intermittently on mainline(6.17-rc7) with the
> following steps:
> -
>
> *Environment:* A machine with a fast SSD and a high core count (e.g.,
> Google Cloud's N2-standard-128).
> -
>
> *Workload:* Concurrently generate a large number of files (e.g., 2 million)
> using multiple services managed by systemd-run. This creates significant
> I/O and cgroup churn.
> -
>
> *Trigger:* After the file generation completes, terminate the systemd-run
> services.
> -
>
> *Result:* Shortly after the services are killed, the system's CPU load
> spikes, leading to a massive number of kworker/+inode_switch_wbs threads
> and a system-wide hang/livelock where the machine becomes unresponsive (20s
> - 300s).
Sounds like:
http://lkml.kernel.org/r/20250912103522.2935-1-jack@suse.cz
Can you see whether those patches resolve the problem?
Thanks.
--
tejun
Make sure to drop the reference taken to the ahb platform device when
looking up its driver data while enabling the smmu.
Note that holding a reference to a device does not prevent its driver
data from going away.
Fixes: 89c788bab1f0 ("ARM: tegra: Add SMMU enabler in AHB")
Cc: stable(a)vger.kernel.org # 3.5
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/amba/tegra-ahb.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/amba/tegra-ahb.c b/drivers/amba/tegra-ahb.c
index c0e8b765522d..f23c3ed01810 100644
--- a/drivers/amba/tegra-ahb.c
+++ b/drivers/amba/tegra-ahb.c
@@ -144,6 +144,7 @@ int tegra_ahb_enable_smmu(struct device_node *dn)
if (!dev)
return -EPROBE_DEFER;
ahb = dev_get_drvdata(dev);
+ put_device(dev);
val = gizmo_readl(ahb, AHB_ARBITRATION_XBAR_CTRL);
val |= AHB_ARBITRATION_XBAR_CTRL_SMMU_INIT_DONE;
gizmo_writel(ahb, val, AHB_ARBITRATION_XBAR_CTRL);
--
2.49.1
Make sure to drop the reference taken to the pbs platform device when
looking up its driver data.
Note that holding a reference to a device does not prevent its driver
data from going away so there is no point in keeping the reference.
Fixes: 5b2dd77be1d8 ("soc: qcom: add QCOM PBS driver")
Cc: stable(a)vger.kernel.org # 6.9
Cc: Anjelique Melendez <quic_amelende(a)quicinc.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/soc/qcom/qcom-pbs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/soc/qcom/qcom-pbs.c b/drivers/soc/qcom/qcom-pbs.c
index 1cc5d045f9dd..06b4a596e275 100644
--- a/drivers/soc/qcom/qcom-pbs.c
+++ b/drivers/soc/qcom/qcom-pbs.c
@@ -173,6 +173,8 @@ struct pbs_dev *get_pbs_client_device(struct device *dev)
return ERR_PTR(-EINVAL);
}
+ platform_device_put(pdev);
+
return pbs;
}
EXPORT_SYMBOL_GPL(get_pbs_client_device);
--
2.49.1
ARCH_STRICT_ALIGN is used for hardware without UAL, now it only control
the -mstrict-align flag. However, ACPI structures are packed by default
so will cause unaligned accesses.
To avoid this, define ACPI_MISALIGNMENT_NOT_SUPPORTED in asm/acenv.h to
align ACPI structures if ARCH_STRICT_ALIGN enabled.
Cc: stable(a)vger.kernel.org
Reported-by: Binbin Zhou <zhoubinbin(a)loongson.cn>
Suggested-by: Xi Ruoyao <xry111(a)xry111.site>
Suggested-by: Jiaxun Yang <jiaxun.yang(a)flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
V2: Modify asm/acenv.h instead of Makefile.
arch/loongarch/include/asm/acenv.h | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/loongarch/include/asm/acenv.h b/arch/loongarch/include/asm/acenv.h
index 52f298f7293b..483c955f2ae5 100644
--- a/arch/loongarch/include/asm/acenv.h
+++ b/arch/loongarch/include/asm/acenv.h
@@ -10,9 +10,8 @@
#ifndef _ASM_LOONGARCH_ACENV_H
#define _ASM_LOONGARCH_ACENV_H
-/*
- * This header is required by ACPI core, but we have nothing to fill in
- * right now. Will be updated later when needed.
- */
+#ifdef CONFIG_ARCH_STRICT_ALIGN
+#define ACPI_MISALIGNMENT_NOT_SUPPORTED
+#endif /* CONFIG_ARCH_STRICT_ALIGN */
#endif /* _ASM_LOONGARCH_ACENV_H */
--
2.47.3
Make sure to drop the reference taken to the mbox platform device when
looking up its driver data.
Note that holding a reference to a device does not prevent its driver
data from going away so there is no point in keeping the reference.
Fixes: 6e1457fcad3f ("soc: apple: mailbox: Add ASC/M3 mailbox driver")
Cc: stable(a)vger.kernel.org # 6.8
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/soc/apple/mailbox.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/soc/apple/mailbox.c b/drivers/soc/apple/mailbox.c
index 49a0955e82d6..1685da1da23d 100644
--- a/drivers/soc/apple/mailbox.c
+++ b/drivers/soc/apple/mailbox.c
@@ -299,11 +299,18 @@ struct apple_mbox *apple_mbox_get(struct device *dev, int index)
return ERR_PTR(-EPROBE_DEFER);
mbox = platform_get_drvdata(pdev);
- if (!mbox)
- return ERR_PTR(-EPROBE_DEFER);
+ if (!mbox) {
+ mbox = ERR_PTR(-EPROBE_DEFER);
+ goto out_put_pdev;
+ }
+
+ if (!device_link_add(dev, &pdev->dev, DL_FLAG_AUTOREMOVE_CONSUMER)) {
+ mbox = ERR_PTR(-ENODEV);
+ goto out_put_pdev;
+ }
- if (!device_link_add(dev, &pdev->dev, DL_FLAG_AUTOREMOVE_CONSUMER))
- return ERR_PTR(-ENODEV);
+out_put_pdev:
+ put_device(&pdev->dev);
return mbox;
}
--
2.49.1
The ACPI has ways to annotate the location of a USB device. Wire that
annotation to a v4l2 control.
To support all possible devices, add a way to annotate USB devices on DT
as well. The original binding discussion happened here:
https://lore.kernel.org/linux-devicetree/20241212-usb-orientation-v1-1-0b69…
The following patches are needed regardless if we finally add support
for orientation and rotation or not:
- media: uvcvideo: Always set default_value
- media: uvcvideo: Set a function for UVC_EXT_GPIO_UNIT
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Changes in v3:
- refactor dt bindings
- add media: uvcvideo: Use current_value for read-only controls
- get_(max|cur|def) = swentity_get_cur
- virtual_entity add codestyle
- Codestyle
- Fix xu get_info and get_len
- Drop ACPI_DEVICE_SWNODE_DEV_ROTATION
- Add missing select V4L2_FWNODE
- Link to v2: https://lore.kernel.org/r/20250605-uvc-orientation-v2-0-5710f9d030aa@chromi…
Changes in v2:
- Add support for rotation
- Rename fwnode to swentity
- Remove the patch to move the gpio file
- Remove patches already in media-committers
- Change priority of data origins
- Patch mipi-disco
- Link to v1: https://lore.kernel.org/r/20250403-uvc-orientation-v1-0-1a0cc595a62d@chromi…
---
Ricardo Ribalda (12):
media: uvcvideo: Always set default_value
media: uvcvideo: Set a function for UVC_EXT_GPIO_UNIT
media: v4l: fwnode: Support ACPI's _PLD for v4l2_fwnode_device_parse
ACPI: mipi-disco-img: Do not duplicate rotation info into swnodes
media: ipu-bridge: Use v4l2_fwnode_device_parse helper
media: ipu-bridge: Use v4l2_fwnode for unknown rotations
dt-bindings: media: Add usb-camera-module
media: uvcvideo: Add support for V4L2_CID_CAMERA_ORIENTATION
media: uvcvideo: Fill ctrl->info.selector earlier
media: uvcvideo: Add uvc_ctrl_query_entity helper
media: uvcvideo: Use current_value for read-only controls
media: uvcvideo: Add support for V4L2_CID_CAMERA_ROTATION
.../bindings/media/usb-camera-module.yaml | 46 +++++
MAINTAINERS | 1 +
drivers/acpi/mipi-disco-img.c | 15 --
drivers/media/pci/intel/Kconfig | 1 +
drivers/media/pci/intel/ipu-bridge.c | 58 +++---
drivers/media/usb/uvc/Kconfig | 1 +
drivers/media/usb/uvc/Makefile | 3 +-
drivers/media/usb/uvc/uvc_ctrl.c | 201 +++++++++++++++------
drivers/media/usb/uvc/uvc_driver.c | 22 ++-
drivers/media/usb/uvc/uvc_entity.c | 3 +-
drivers/media/usb/uvc/uvc_swentity.c | 107 +++++++++++
drivers/media/usb/uvc/uvcvideo.h | 22 +++
drivers/media/v4l2-core/v4l2-fwnode.c | 84 ++++++++-
include/acpi/acpi_bus.h | 1 -
include/linux/usb/uvc.h | 3 +
15 files changed, 441 insertions(+), 127 deletions(-)
---
base-commit: afb100a5ea7a13d7e6937dcd3b36b19dc6cc9328
change-id: 20250403-uvc-orientation-5f7f19da5adb
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
From: Conor Dooley <conor.dooley(a)microchip.com>
mpfs_gpio_direction_output() actually sets the line to input mode.
Use the correct register settings for output mode so that this function
actually works as intended.
This was a copy-paste mistake made when converting to regmap during the
driver submission process. It went unnoticed because my test for output
mode is toggling LEDs on an Icicle kit which functions with the
incorrect code. The internal reporter has yet to test the patch, but on
their system the incorrect setting may be the reason for failures to
drive the GPIO lines on the BeagleV-fire board.
CC: stable(a)vger.kernel.org
Fixes: a987b78f3615e ("gpio: mpfs: add polarfire soc gpio support")
Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
---
CC: Conor Dooley <conor.dooley(a)microchip.com>
CC: Daire McNamara <daire.mcnamara(a)microchip.com>
CC: Cyril.Jean(a)microchip.com
CC: Linus Walleij <linus.walleij(a)linaro.org>
CC: Bartosz Golaszewski <brgl(a)bgdev.pl>
CC: linux-riscv(a)lists.infradead.org
CC: linux-gpio(a)vger.kernel.org
CC: linux-kernel(a)vger.kernel.org
---
drivers/gpio/gpio-mpfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpio/gpio-mpfs.c b/drivers/gpio/gpio-mpfs.c
index 82d557a7e5d8d..9468795b96348 100644
--- a/drivers/gpio/gpio-mpfs.c
+++ b/drivers/gpio/gpio-mpfs.c
@@ -69,7 +69,7 @@ static int mpfs_gpio_direction_output(struct gpio_chip *gc, unsigned int gpio_in
struct mpfs_gpio_chip *mpfs_gpio = gpiochip_get_data(gc);
regmap_update_bits(mpfs_gpio->regs, MPFS_GPIO_CTRL(gpio_index),
- MPFS_GPIO_DIR_MASK, MPFS_GPIO_EN_IN);
+ MPFS_GPIO_DIR_MASK, MPFS_GPIO_EN_OUT | MPFS_GPIO_EN_OUT_BUF);
regmap_update_bits(mpfs_gpio->regs, mpfs_gpio->offsets->outp, BIT(gpio_index),
value << gpio_index);
--
2.47.3
Hi,
Access a comprehensive, verified database of 501,853 IAA Mobility 2025 attendees and 750 exhibitors to grow your network and maximize business opportunities.
What you’ll get: Contact name, Job Title, Business Name, Physical Address, Phone Numbers, Official Email address and many more.
Delivered within 48 hours and 100% opt-in and GDPR compliant.
Special Discount: Enjoy 20% off for a limited time!
Would you prefer the entire attendee database or a customized segment tailored by geography, job role, or industry? I’ll forward the same to our Business Development Manager, who will provide you with more details about list acquisition.
Best regards,
Garnet Conwell
Sr. Marketing Manager
P.S. To opt out of future emails, simply reply “Unfollow.”
From: Max Kellermann <max.kellermann(a)ionos.com>
Commit 20d72b00ca81 ("netfs: Fix the request's work item to not
require a ref") modified netfs_alloc_request() to initialize the
reference counter to 2 instead of 1. The rationale was that the
requet's "work" would release the second reference after completion
(via netfs_{read,write}_collection_worker()). That works most of the
time if all goes well.
However, it leaks this additional reference if the request is released
before the I/O operation has been submitted: the error code path only
decrements the reference counter once and the work item will never be
queued because there will never be a completion.
This has caused outages of our whole server cluster today because
tasks were blocked in netfs_wait_for_outstanding_io(), leading to
deadlocks in Ceph (another bug that I will address soon in another
patch). This was caused by a netfs_pgpriv2_begin_copy_to_cache() call
which failed in fscache_begin_write_operation(). The leaked
netfs_io_request was never completed, leaving `netfs_inode.io_count`
with a positive value forever.
All of this is super-fragile code. Finding out which code paths will
lead to an eventual completion and which do not is hard to see:
- Some functions like netfs_create_write_req() allocate a request, but
will never submit any I/O.
- netfs_unbuffered_read_iter_locked() calls netfs_unbuffered_read()
and then netfs_put_request(); however, netfs_unbuffered_read() can
also fail early before submitting the I/O request, therefore another
netfs_put_request() call must be added there.
A rule of thumb is that functions that return a `netfs_io_request` do
not submit I/O, and all of their callers must be checked.
For my taste, the whole netfs code needs an overhaul to make reference
counting easier to understand and less fragile & obscure. But to fix
this bug here and now and produce a patch that is adequate for a
stable backport, I tried a minimal approach that quickly frees the
request object upon early failure.
I decided against adding a second netfs_put_request() each time
because that would cause code duplication which obscures the code
further. Instead, I added the function netfs_put_failed_request()
which frees such a failed request synchronously under the assumption
that the reference count is exactly 2 (as initially set by
netfs_alloc_request() and never touched), verified by a
WARN_ON_ONCE(). It then deinitializes the request object (without
going through the "cleanup_work" indirection) and frees the allocation
(with RCU protection to protect against concurrent access by
netfs_requests_seq_start()).
All code paths that fail early have been changed to call
netfs_put_failed_request() instead of netfs_put_request().
Additionally, I have added a netfs_put_request() call to
netfs_unbuffered_read() as explained above because the
netfs_put_failed_request() approach does not work there.
Fixes: 20d72b00ca81 ("netfs: Fix the request's work item to not require a ref")
Signed-off-by: Max Kellermann <max.kellermann(a)ionos.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
cc: Paulo Alcantara <pc(a)manguebit.org>
cc: netfs(a)lists.linux.dev,
cc: linux-fsdevel(a)vger.kernel.org,
cc: stable(a)vger.kernel.org
---
Changes
=======
ver #3)
- Log the refcount in the tracepoint in netfs_put_failed_request().
ver #2)
- Fix missing RCU handling in netfs_put_failed_request().
fs/netfs/buffered_read.c | 10 +++++-----
fs/netfs/direct_read.c | 7 ++++++-
fs/netfs/direct_write.c | 6 +++++-
fs/netfs/internal.h | 1 +
fs/netfs/objects.c | 28 +++++++++++++++++++++++++---
fs/netfs/read_pgpriv2.c | 2 +-
fs/netfs/read_single.c | 2 +-
fs/netfs/write_issue.c | 3 +--
8 files changed, 45 insertions(+), 14 deletions(-)
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 18b3dc74c70e..37ab6f28b5ad 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -369,7 +369,7 @@ void netfs_readahead(struct readahead_control *ractl)
return netfs_put_request(rreq, netfs_rreq_trace_put_return);
cleanup_free:
- return netfs_put_request(rreq, netfs_rreq_trace_put_failed);
+ return netfs_put_failed_request(rreq);
}
EXPORT_SYMBOL(netfs_readahead);
@@ -472,7 +472,7 @@ static int netfs_read_gaps(struct file *file, struct folio *folio)
return ret < 0 ? ret : 0;
discard:
- netfs_put_request(rreq, netfs_rreq_trace_put_discard);
+ netfs_put_failed_request(rreq);
alloc_error:
folio_unlock(folio);
return ret;
@@ -532,7 +532,7 @@ int netfs_read_folio(struct file *file, struct folio *folio)
return ret < 0 ? ret : 0;
discard:
- netfs_put_request(rreq, netfs_rreq_trace_put_discard);
+ netfs_put_failed_request(rreq);
alloc_error:
folio_unlock(folio);
return ret;
@@ -699,7 +699,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
return 0;
error_put:
- netfs_put_request(rreq, netfs_rreq_trace_put_failed);
+ netfs_put_failed_request(rreq);
error:
if (folio) {
folio_unlock(folio);
@@ -754,7 +754,7 @@ int netfs_prefetch_for_write(struct file *file, struct folio *folio,
return ret < 0 ? ret : 0;
error_put:
- netfs_put_request(rreq, netfs_rreq_trace_put_discard);
+ netfs_put_failed_request(rreq);
error:
_leave(" = %d", ret);
return ret;
diff --git a/fs/netfs/direct_read.c b/fs/netfs/direct_read.c
index a05e13472baf..a498ee8d6674 100644
--- a/fs/netfs/direct_read.c
+++ b/fs/netfs/direct_read.c
@@ -131,6 +131,7 @@ static ssize_t netfs_unbuffered_read(struct netfs_io_request *rreq, bool sync)
if (rreq->len == 0) {
pr_err("Zero-sized read [R=%x]\n", rreq->debug_id);
+ netfs_put_request(rreq, netfs_rreq_trace_put_discard);
return -EIO;
}
@@ -205,7 +206,7 @@ ssize_t netfs_unbuffered_read_iter_locked(struct kiocb *iocb, struct iov_iter *i
if (user_backed_iter(iter)) {
ret = netfs_extract_user_iter(iter, rreq->len, &rreq->buffer.iter, 0);
if (ret < 0)
- goto out;
+ goto error_put;
rreq->direct_bv = (struct bio_vec *)rreq->buffer.iter.bvec;
rreq->direct_bv_count = ret;
rreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
@@ -238,6 +239,10 @@ ssize_t netfs_unbuffered_read_iter_locked(struct kiocb *iocb, struct iov_iter *i
if (ret > 0)
orig_count -= ret;
return ret;
+
+error_put:
+ netfs_put_failed_request(rreq);
+ return ret;
}
EXPORT_SYMBOL(netfs_unbuffered_read_iter_locked);
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index a16660ab7f83..a9d1c3b2c084 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -57,7 +57,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
n = netfs_extract_user_iter(iter, len, &wreq->buffer.iter, 0);
if (n < 0) {
ret = n;
- goto out;
+ goto error_put;
}
wreq->direct_bv = (struct bio_vec *)wreq->buffer.iter.bvec;
wreq->direct_bv_count = n;
@@ -101,6 +101,10 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
out:
netfs_put_request(wreq, netfs_rreq_trace_put_return);
return ret;
+
+error_put:
+ netfs_put_failed_request(wreq);
+ return ret;
}
EXPORT_SYMBOL(netfs_unbuffered_write_iter_locked);
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index d4f16fefd965..4319611f5354 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -87,6 +87,7 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
void netfs_get_request(struct netfs_io_request *rreq, enum netfs_rreq_ref_trace what);
void netfs_clear_subrequests(struct netfs_io_request *rreq);
void netfs_put_request(struct netfs_io_request *rreq, enum netfs_rreq_ref_trace what);
+void netfs_put_failed_request(struct netfs_io_request *rreq);
struct netfs_io_subrequest *netfs_alloc_subrequest(struct netfs_io_request *rreq);
static inline void netfs_see_request(struct netfs_io_request *rreq,
diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index e8c99738b5bb..39d5e13f7248 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -116,10 +116,8 @@ static void netfs_free_request_rcu(struct rcu_head *rcu)
netfs_stat_d(&netfs_n_rh_rreq);
}
-static void netfs_free_request(struct work_struct *work)
+static void netfs_deinit_request(struct netfs_io_request *rreq)
{
- struct netfs_io_request *rreq =
- container_of(work, struct netfs_io_request, cleanup_work);
struct netfs_inode *ictx = netfs_inode(rreq->inode);
unsigned int i;
@@ -149,6 +147,14 @@ static void netfs_free_request(struct work_struct *work)
if (atomic_dec_and_test(&ictx->io_count))
wake_up_var(&ictx->io_count);
+}
+
+static void netfs_free_request(struct work_struct *work)
+{
+ struct netfs_io_request *rreq =
+ container_of(work, struct netfs_io_request, cleanup_work);
+
+ netfs_deinit_request(rreq);
call_rcu(&rreq->rcu, netfs_free_request_rcu);
}
@@ -167,6 +173,22 @@ void netfs_put_request(struct netfs_io_request *rreq, enum netfs_rreq_ref_trace
}
}
+/*
+ * Free a request (synchronously) that was just allocated but has
+ * failed before it could be submitted.
+ */
+void netfs_put_failed_request(struct netfs_io_request *rreq)
+{
+ /* new requests have two references (see
+ * netfs_alloc_request(), and this function is only allowed on
+ * new request objects
+ */
+ WARN_ON_ONCE(refcount_read(&rreq->ref) != 2);
+
+ trace_netfs_rreq_ref(rreq->debug_id, 0, netfs_rreq_trace_put_failed);
+ netfs_free_request(&rreq->cleanup_work);
+}
+
/*
* Allocate and partially initialise an I/O request structure.
*/
diff --git a/fs/netfs/read_pgpriv2.c b/fs/netfs/read_pgpriv2.c
index 8097bc069c1d..a1489aa29f78 100644
--- a/fs/netfs/read_pgpriv2.c
+++ b/fs/netfs/read_pgpriv2.c
@@ -118,7 +118,7 @@ static struct netfs_io_request *netfs_pgpriv2_begin_copy_to_cache(
return creq;
cancel_put:
- netfs_put_request(creq, netfs_rreq_trace_put_return);
+ netfs_put_failed_request(creq);
cancel:
rreq->copy_to_cache = ERR_PTR(-ENOBUFS);
clear_bit(NETFS_RREQ_FOLIO_COPY_TO_CACHE, &rreq->flags);
diff --git a/fs/netfs/read_single.c b/fs/netfs/read_single.c
index fa622a6cd56d..5c0dc4efc792 100644
--- a/fs/netfs/read_single.c
+++ b/fs/netfs/read_single.c
@@ -189,7 +189,7 @@ ssize_t netfs_read_single(struct inode *inode, struct file *file, struct iov_ite
return ret;
cleanup_free:
- netfs_put_request(rreq, netfs_rreq_trace_put_failed);
+ netfs_put_failed_request(rreq);
return ret;
}
EXPORT_SYMBOL(netfs_read_single);
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index 0584cba1a043..dd8743bc8d7f 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -133,8 +133,7 @@ struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
return wreq;
nomem:
- wreq->error = -ENOMEM;
- netfs_put_request(wreq, netfs_rreq_trace_put_failed);
+ netfs_put_failed_request(wreq);
return ERR_PTR(-ENOMEM);
}
A chip freeze is observed on i.MX7D when PCIe RC kicks off the PM_PME
message and no any devices are connected on the port.
To workaroud such kind of issue, skip PME_Turn_Off message if there is
no endpoint connected.
Cc: stable(a)vger.kernel.org
Fixes: 4774faf854f5 ("PCI: dwc: Implement generic suspend/resume functionality")
Fixes: a528d1a72597 ("PCI: imx6: Use DWC common suspend resume method")
Signed-off-by: Richard Zhu <hongxing.zhu(a)nxp.com>
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
---
drivers/pci/controller/dwc/pcie-designware-host.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index 57a1ba08c427..b303a74b0fd7 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -1008,12 +1008,15 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci)
u32 val;
int ret;
- if (pci->pp.ops->pme_turn_off) {
- pci->pp.ops->pme_turn_off(&pci->pp);
- } else {
- ret = dw_pcie_pme_turn_off(pci);
- if (ret)
- return ret;
+ /* Skip PME_Turn_Off message if there is no endpoint connected */
+ if (dw_pcie_get_ltssm(pci) > DW_PCIE_LTSSM_DETECT_WAIT) {
+ if (pci->pp.ops->pme_turn_off) {
+ pci->pp.ops->pme_turn_off(&pci->pp);
+ } else {
+ ret = dw_pcie_pme_turn_off(pci);
+ if (ret)
+ return ret;
+ }
}
if (dwc_quirk(pci, QUIRK_NOL2POLL_IN_PM)) {
--
2.37.1
The quilt patch titled
Subject: mm/damon/sysfs: do not ignore callback's return value in damon_sysfs_damon_call()
has been removed from the -mm tree. Its filename was
mm-damon-sysfs-do-not-ignore-callbacks-return-value-in-damon_sysfs_damon_call.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Akinobu Mita <akinobu.mita(a)gmail.com>
Subject: mm/damon/sysfs: do not ignore callback's return value in damon_sysfs_damon_call()
Date: Sat, 20 Sep 2025 22:25:46 +0900
The callback return value is ignored in damon_sysfs_damon_call(), which
means that it is not possible to detect invalid user input when writing
commands such as 'commit' to
/sys/kernel/mm/damon/admin/kdamonds/<K>/state. Fix it.
Link: https://lkml.kernel.org/r/20250920132546.5822-1-akinobu.mita@gmail.com
Fixes: f64539dcdb87 ("mm/damon/sysfs: use damon_call() for update_schemes_stats")
Signed-off-by: Akinobu Mita <akinobu.mita(a)gmail.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [6.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/sysfs.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-do-not-ignore-callbacks-return-value-in-damon_sysfs_damon_call
+++ a/mm/damon/sysfs.c
@@ -1592,12 +1592,14 @@ static int damon_sysfs_damon_call(int (*
struct damon_sysfs_kdamond *kdamond)
{
struct damon_call_control call_control = {};
+ int err;
if (!kdamond->damon_ctx)
return -EINVAL;
call_control.fn = fn;
call_control.data = kdamond;
- return damon_call(kdamond->damon_ctx, &call_control);
+ err = damon_call(kdamond->damon_ctx, &call_control);
+ return err ? err : call_control.return_code;
}
struct damon_sysfs_schemes_walk_data {
_
Patches currently in -mm which might be from akinobu.mita(a)gmail.com are
The quilt patch titled
Subject: fs/proc/task_mmu: check p->vec_buf for NULL
has been removed from the -mm tree. Its filename was
fs-proc-task_mmu-check-p-vec_buf-for-null.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jakub Acs <acsjakub(a)amazon.de>
Subject: fs/proc/task_mmu: check p->vec_buf for NULL
Date: Mon, 22 Sep 2025 08:22:05 +0000
When the PAGEMAP_SCAN ioctl is invoked with vec_len = 0 reaches
pagemap_scan_backout_range(), kernel panics with null-ptr-deref:
[ 44.936808] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 44.937797] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 44.938391] CPU: 1 UID: 0 PID: 2480 Comm: reproducer Not tainted 6.17.0-rc6 #22 PREEMPT(none)
[ 44.939062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 44.939935] RIP: 0010:pagemap_scan_thp_entry.isra.0+0x741/0xa80
<snip registers, unreliable trace>
[ 44.946828] Call Trace:
[ 44.947030] <TASK>
[ 44.949219] pagemap_scan_pmd_entry+0xec/0xfa0
[ 44.952593] walk_pmd_range.isra.0+0x302/0x910
[ 44.954069] walk_pud_range.isra.0+0x419/0x790
[ 44.954427] walk_p4d_range+0x41e/0x620
[ 44.954743] walk_pgd_range+0x31e/0x630
[ 44.955057] __walk_page_range+0x160/0x670
[ 44.956883] walk_page_range_mm+0x408/0x980
[ 44.958677] walk_page_range+0x66/0x90
[ 44.958984] do_pagemap_scan+0x28d/0x9c0
[ 44.961833] do_pagemap_cmd+0x59/0x80
[ 44.962484] __x64_sys_ioctl+0x18d/0x210
[ 44.962804] do_syscall_64+0x5b/0x290
[ 44.963111] entry_SYSCALL_64_after_hwframe+0x76/0x7e
vec_len = 0 in pagemap_scan_init_bounce_buffer() means no buffers are
allocated and p->vec_buf remains set to NULL.
This breaks an assumption made later in pagemap_scan_backout_range(), that
page_region is always allocated for p->vec_buf_index.
Fix it by explicitly checking p->vec_buf for NULL before dereferencing.
Other sites that might run into same deref-issue are already (directly or
transitively) protected by checking p->vec_buf.
Note:
From PAGEMAP_SCAN man page, it seems vec_len = 0 is valid when no output
is requested and it's only the side effects caller is interested in,
hence it passes check in pagemap_scan_get_args().
This issue was found by syzkaller.
Link: https://lkml.kernel.org/r/20250922082206.6889-1-acsjakub@amazon.de
Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Jakub Acs <acsjakub(a)amazon.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Jinjiang Tu <tujinjiang(a)huawei.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Penglei Jiang <superman.xpt(a)gmail.com>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: "Micha�� Miros��aw" <mirq-linux(a)rere.qmqm.pl>
Cc: Stephen Rothwell <sfr(a)canb.auug.org.au>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_mmu.c | 3 +++
1 file changed, 3 insertions(+)
--- a/fs/proc/task_mmu.c~fs-proc-task_mmu-check-p-vec_buf-for-null
+++ a/fs/proc/task_mmu.c
@@ -2417,6 +2417,9 @@ static void pagemap_scan_backout_range(s
{
struct page_region *cur_buf = &p->vec_buf[p->vec_buf_index];
+ if (!p->vec_buf)
+ return;
+
if (cur_buf->start != addr)
cur_buf->end = addr;
else
_
Patches currently in -mm which might be from acsjakub(a)amazon.de are
The quilt patch titled
Subject: kmsan: fix out-of-bounds access to shadow memory
has been removed from the -mm tree. Its filename was
kmsan-fix-out-of-bounds-access-to-shadow-memory.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Eric Biggers <ebiggers(a)kernel.org>
Subject: kmsan: fix out-of-bounds access to shadow memory
Date: Thu, 11 Sep 2025 12:58:58 -0700
Running sha224_kunit on a KMSAN-enabled kernel results in a crash in
kmsan_internal_set_shadow_origin():
BUG: unable to handle page fault for address: ffffbc3840291000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1810067 P4D 1810067 PUD 192d067 PMD 3c17067 PTE 0
Oops: 0000 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 81 Comm: kunit_try_catch Tainted: G N 6.17.0-rc3 #10 PREEMPT(voluntary)
Tainted: [N]=TEST
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
RIP: 0010:kmsan_internal_set_shadow_origin+0x91/0x100
[...]
Call Trace:
<TASK>
__msan_memset+0xee/0x1a0
sha224_final+0x9e/0x350
test_hash_buffer_overruns+0x46f/0x5f0
? kmsan_get_shadow_origin_ptr+0x46/0xa0
? __pfx_test_hash_buffer_overruns+0x10/0x10
kunit_try_run_case+0x198/0xa00
This occurs when memset() is called on a buffer that is not 4-byte aligned
and extends to the end of a guard page, i.e. the next page is unmapped.
The bug is that the loop at the end of kmsan_internal_set_shadow_origin()
accesses the wrong shadow memory bytes when the address is not 4-byte
aligned. Since each 4 bytes are associated with an origin, it rounds the
address and size so that it can access all the origins that contain the
buffer. However, when it checks the corresponding shadow bytes for a
particular origin, it incorrectly uses the original unrounded shadow
address. This results in reads from shadow memory beyond the end of the
buffer's shadow memory, which crashes when that memory is not mapped.
To fix this, correctly align the shadow address before accessing the 4
shadow bytes corresponding to each origin.
Link: https://lkml.kernel.org/r/20250911195858.394235-1-ebiggers@kernel.org
Fixes: 2ef3cec44c60 ("kmsan: do not wipe out origin when doing partial unpoisoning")
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
Tested-by: Alexander Potapenko <glider(a)google.com>
Reviewed-by: Alexander Potapenko <glider(a)google.com>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmsan/core.c | 10 +++++++---
mm/kmsan/kmsan_test.c | 16 ++++++++++++++++
2 files changed, 23 insertions(+), 3 deletions(-)
--- a/mm/kmsan/core.c~kmsan-fix-out-of-bounds-access-to-shadow-memory
+++ a/mm/kmsan/core.c
@@ -195,7 +195,8 @@ void kmsan_internal_set_shadow_origin(vo
u32 origin, bool checked)
{
u64 address = (u64)addr;
- u32 *shadow_start, *origin_start;
+ void *shadow_start;
+ u32 *aligned_shadow, *origin_start;
size_t pad = 0;
KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
@@ -214,9 +215,12 @@ void kmsan_internal_set_shadow_origin(vo
}
__memset(shadow_start, b, size);
- if (!IS_ALIGNED(address, KMSAN_ORIGIN_SIZE)) {
+ if (IS_ALIGNED(address, KMSAN_ORIGIN_SIZE)) {
+ aligned_shadow = shadow_start;
+ } else {
pad = address % KMSAN_ORIGIN_SIZE;
address -= pad;
+ aligned_shadow = shadow_start - pad;
size += pad;
}
size = ALIGN(size, KMSAN_ORIGIN_SIZE);
@@ -230,7 +234,7 @@ void kmsan_internal_set_shadow_origin(vo
* corresponding shadow slot is zero.
*/
for (int i = 0; i < size / KMSAN_ORIGIN_SIZE; i++) {
- if (origin || !shadow_start[i])
+ if (origin || !aligned_shadow[i])
origin_start[i] = origin;
}
}
--- a/mm/kmsan/kmsan_test.c~kmsan-fix-out-of-bounds-access-to-shadow-memory
+++ a/mm/kmsan/kmsan_test.c
@@ -556,6 +556,21 @@ DEFINE_TEST_MEMSETXX(16)
DEFINE_TEST_MEMSETXX(32)
DEFINE_TEST_MEMSETXX(64)
+/* Test case: ensure that KMSAN does not access shadow memory out of bounds. */
+static void test_memset_on_guarded_buffer(struct kunit *test)
+{
+ void *buf = vmalloc(PAGE_SIZE);
+
+ kunit_info(test,
+ "memset() on ends of guarded buffer should not crash\n");
+
+ for (size_t size = 0; size <= 128; size++) {
+ memset(buf, 0xff, size);
+ memset(buf + PAGE_SIZE - size, 0xff, size);
+ }
+ vfree(buf);
+}
+
static noinline void fibonacci(int *array, int size, int start)
{
if (start < 2 || (start == size))
@@ -677,6 +692,7 @@ static struct kunit_case kmsan_test_case
KUNIT_CASE(test_memset16),
KUNIT_CASE(test_memset32),
KUNIT_CASE(test_memset64),
+ KUNIT_CASE(test_memset_on_guarded_buffer),
KUNIT_CASE(test_long_origin_chain),
KUNIT_CASE(test_stackdepot_roundtrip),
KUNIT_CASE(test_unpoison_memory),
_
Patches currently in -mm which might be from ebiggers(a)kernel.org are
The quilt patch titled
Subject: mm/hugetlb: fix folio is still mapped when deleted
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-folio-is-still-mapped-when-deleted.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jinjiang Tu <tujinjiang(a)huawei.com>
Subject: mm/hugetlb: fix folio is still mapped when deleted
Date: Fri, 12 Sep 2025 15:41:39 +0800
Migration may be raced with fallocating hole. remove_inode_single_folio
will unmap the folio if the folio is still mapped. However, it's called
without folio lock. If the folio is migrated and the mapped pte has been
converted to migration entry, folio_mapped() returns false, and won't
unmap it. Due to extra refcount held by remove_inode_single_folio,
migration fails, restores migration entry to normal pte, and the folio is
mapped again. As a result, we triggered BUG in filemap_unaccount_folio.
The log is as follows:
BUG: Bad page cache in process hugetlb pfn:156c00
page: refcount:515 mapcount:0 mapping:0000000099fef6e1 index:0x0 pfn:0x156c00
head: order:9 mapcount:1 entire_mapcount:1 nr_pages_mapped:0 pincount:0
aops:hugetlbfs_aops ino:dcc dentry name(?):"my_hugepage_file"
flags: 0x17ffffc00000c1(locked|waiters|head|node=0|zone=2|lastcpupid=0x1fffff)
page_type: f4(hugetlb)
page dumped because: still mapped when deleted
CPU: 1 UID: 0 PID: 395 Comm: hugetlb Not tainted 6.17.0-rc5-00044-g7aac71907bde-dirty #484 NONE
Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Call Trace:
<TASK>
dump_stack_lvl+0x4f/0x70
filemap_unaccount_folio+0xc4/0x1c0
__filemap_remove_folio+0x38/0x1c0
filemap_remove_folio+0x41/0xd0
remove_inode_hugepages+0x142/0x250
hugetlbfs_fallocate+0x471/0x5a0
vfs_fallocate+0x149/0x380
Hold folio lock before checking if the folio is mapped to avold race with
migration.
Link: https://lkml.kernel.org/r/20250912074139.3575005-1-tujinjiang@huawei.com
Fixes: 4aae8d1c051e ("mm/hugetlbfs: unmap pages if page fault raced with hole punch")
Signed-off-by: Jinjiang Tu <tujinjiang(a)huawei.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/hugetlbfs/inode.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
--- a/fs/hugetlbfs/inode.c~mm-hugetlb-fix-folio-is-still-mapped-when-deleted
+++ a/fs/hugetlbfs/inode.c
@@ -517,14 +517,16 @@ static bool remove_inode_single_folio(st
/*
* If folio is mapped, it was faulted in after being
- * unmapped in caller. Unmap (again) while holding
- * the fault mutex. The mutex will prevent faults
- * until we finish removing the folio.
+ * unmapped in caller or hugetlb_vmdelete_list() skips
+ * unmapping it due to fail to grab lock. Unmap (again)
+ * while holding the fault mutex. The mutex will prevent
+ * faults until we finish removing the folio. Hold folio
+ * lock to guarantee no concurrent migration.
*/
+ folio_lock(folio);
if (unlikely(folio_mapped(folio)))
hugetlb_unmap_file_folio(h, mapping, folio, index);
- folio_lock(folio);
/*
* We must remove the folio from page cache before removing
* the region/ reserve map (hugetlb_unreserve_pages). In
_
Patches currently in -mm which might be from tujinjiang(a)huawei.com are