The patch titled
Subject: zram: fix uninitialized ZRAM not releasing backing device
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
zram-fix-uninitialized-zram-not-releasing-backing-device.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Kairui Song <kasong(a)tencent.com>
Subject: zram: fix uninitialized ZRAM not releasing backing device
Date: Thu, 5 Dec 2024 02:02:24 +0800
Setting backing device is done before ZRAM initialization. If we set the
backing device, then remove the ZRAM module without initializing the
device, the backing device reference will be leaked and the device will be
hold forever.
Fix this by always check and release the backing device when resetting or
removing ZRAM.
Link: https://lkml.kernel.org/r/20241204180224.31069-3-ryncsn@gmail.com
Fixes: 013bf95a83ec ("zram: add interface to specif backing device")
Signed-off-by: Kairui Song <kasong(a)tencent.com>
Reported-by: Desheng Wu <deshengwu(a)tencent.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/block/zram/zram_drv.c | 3 +++
1 file changed, 3 insertions(+)
--- a/drivers/block/zram/zram_drv.c~zram-fix-uninitialized-zram-not-releasing-backing-device
+++ a/drivers/block/zram/zram_drv.c
@@ -2335,6 +2335,9 @@ static void zram_reset_device(struct zra
zram->limit_pages = 0;
if (!init_done(zram)) {
+ /* Backing device could be set before ZRAM initialization. */
+ reset_bdev(zram);
+
up_write(&zram->init_lock);
return;
}
_
Patches currently in -mm which might be from kasong(a)tencent.com are
zram-refuse-to-use-zero-sized-block-device-as-backing-device.patch
zram-fix-uninitialized-zram-not-releasing-backing-device.patch
The patch titled
Subject: zram: refuse to use zero sized block device as backing device
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
zram-refuse-to-use-zero-sized-block-device-as-backing-device.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Kairui Song <kasong(a)tencent.com>
Subject: zram: refuse to use zero sized block device as backing device
Date: Thu, 5 Dec 2024 02:02:23 +0800
Patch series "zram: fix backing device setup issue".
This series fixes two bugs of backing device setting:
- ZRAM should reject using a zero sized (or the uninitialized ZRAM
device itself) as the backing device.
- Fix backing device leaking when removing a uninitialized ZRAM
device.
This patch (of 2):
Setting a zero sized block device as backing device is pointless, and one
can easily create a recursive loop by setting the uninitialized ZRAM
device itself as its own backing device by (zram0 is uninitialized):
echo /dev/zram0 > /sys/block/zram0/backing_dev
It's definitely a wrong config, and the module will pin itself, kernel
should refuse doing so in the first place.
By refusing to use zero sized device we avoided misuse cases including
this one above.
Link: https://lkml.kernel.org/r/20241204180224.31069-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20241204180224.31069-2-ryncsn@gmail.com
Fixes: 013bf95a83ec ("zram: add interface to specif backing device")
Signed-off-by: Kairui Song <kasong(a)tencent.com>
Reported-by: Desheng Wu <deshengwu(a)tencent.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/block/zram/zram_drv.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/drivers/block/zram/zram_drv.c~zram-refuse-to-use-zero-sized-block-device-as-backing-device
+++ a/drivers/block/zram/zram_drv.c
@@ -614,6 +614,12 @@ static ssize_t backing_dev_store(struct
}
nr_pages = i_size_read(inode) >> PAGE_SHIFT;
+ /* Refuse to use zero sized device (also prevents self reference) */
+ if (!nr_pages) {
+ err = -EINVAL;
+ goto out;
+ }
+
bitmap_sz = BITS_TO_LONGS(nr_pages) * sizeof(long);
bitmap = kvzalloc(bitmap_sz, GFP_KERNEL);
if (!bitmap) {
_
Patches currently in -mm which might be from kasong(a)tencent.com are
zram-refuse-to-use-zero-sized-block-device-as-backing-device.patch
zram-fix-uninitialized-zram-not-releasing-backing-device.patch
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: c9a4b55431e5220347881e148725bed69c84e037
Gitweb: https://git.kernel.org/tip/c9a4b55431e5220347881e148725bed69c84e037
Author: Len Brown <len.brown(a)intel.com>
AuthorDate: Tue, 12 Nov 2024 21:07:00 -05:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Wed, 04 Dec 2024 12:30:14 -08:00
x86/cpu: Add Lunar Lake to list of CPUs with a broken MONITOR implementation
Under some conditions, MONITOR wakeups on Lunar Lake processors
can be lost, resulting in significant user-visible delays.
Add Lunar Lake to X86_BUG_MONITOR so that wake_up_idle_cpu()
always sends an IPI, avoiding this potential delay.
Reported originally here:
https://bugzilla.kernel.org/show_bug.cgi?id=219364
[ dhansen: tweak subject ]
Signed-off-by: Len Brown <len.brown(a)intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/a4aa8842a3c3bfdb7fe9807710eef159cbf0e705.173146…
---
arch/x86/kernel/cpu/intel.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index d1de300..8ded9f8 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -555,7 +555,9 @@ static void init_intel(struct cpuinfo_x86 *c)
c->x86_vfm == INTEL_WESTMERE_EX))
set_cpu_bug(c, X86_BUG_CLFLUSH_MONITOR);
- if (boot_cpu_has(X86_FEATURE_MWAIT) && c->x86_vfm == INTEL_ATOM_GOLDMONT)
+ if (boot_cpu_has(X86_FEATURE_MWAIT) &&
+ (c->x86_vfm == INTEL_ATOM_GOLDMONT ||
+ c->x86_vfm == INTEL_LUNARLAKE_M))
set_cpu_bug(c, X86_BUG_MONITOR);
#ifdef CONFIG_X86_64
From: Keith Busch <kbusch(a)kernel.org>
[ Upstream commit 2fa046449a82a7d0f6d9721dd83e348816038444 ]
The "bus" and "cxl_bus" reset methods reset a device by asserting Secondary
Bus Reset on the bridge leading to the device. These only work if the
device is the only device below the bridge.
Add a sysfs 'reset_subordinate' attribute on bridges that can assert
Secondary Bus Reset regardless of how many devices are below the bridge.
This resets all the devices below a bridge in a single command, including
the locking and config space save/restore that reset methods normally do.
This may be the only way to reset devices that don't support other reset
methods (ACPI, FLR, PM reset, etc).
Link: https://lore.kernel.org/r/20241025222755.3756162-1-kbusch@meta.com
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
[bhelgaas: commit log, add capable(CAP_SYS_ADMIN) check]
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Reviewed-by: Alex Williamson <alex.williamson(a)redhat.com>
Reviewed-by: Amey Narkhede <ameynarkhede03(a)gmail.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
Documentation/ABI/testing/sysfs-bus-pci | 11 +++++++++++
drivers/pci/pci-sysfs.c | 26 +++++++++++++++++++++++++
drivers/pci/pci.c | 2 +-
drivers/pci/pci.h | 1 +
4 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index 44d4b2be92fd4..c68d1d9a4d479 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -125,6 +125,17 @@ Description:
will be present in sysfs. Writing 1 to this file
will perform reset.
+What: /sys/bus/pci/devices/.../reset_subordinate
+Date: October 2024
+Contact: linux-pci(a)vger.kernel.org
+Description:
+ This is visible only for bridge devices. If you want to reset
+ all devices attached through the subordinate bus of a specific
+ bridge device, writing 1 to this will try to do it. This will
+ affect all devices attached to the system through this bridge
+ similiar to writing 1 to their individual "reset" file, so use
+ with caution.
+
What: /sys/bus/pci/devices/.../vpd
Date: February 2008
Contact: Ben Hutchings <bwh(a)kernel.org>
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index f68798763af8d..d5287dfac9171 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -507,6 +507,31 @@ static ssize_t dev_bus_rescan_store(struct device *dev,
}
static DEVICE_ATTR(rescan, (S_IWUSR|S_IWGRP), NULL, dev_bus_rescan_store);
+static ssize_t reset_subordinate_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+ struct pci_bus *bus = pdev->subordinate;
+ unsigned long val;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ if (kstrtoul(buf, 0, &val) < 0)
+ return -EINVAL;
+
+ if (val) {
+ int ret = __pci_reset_bus(bus);
+
+ if (ret)
+ return ret;
+ }
+
+ return count;
+}
+static DEVICE_ATTR_WO(reset_subordinate);
+
#if defined(CONFIG_PM) && defined(CONFIG_ACPI)
static ssize_t d3cold_allowed_store(struct device *dev,
struct device_attribute *attr,
@@ -778,6 +803,7 @@ static struct attribute *pci_dev_attrs[] = {
static struct attribute *pci_bridge_attrs[] = {
&dev_attr_subordinate_bus_number.attr,
&dev_attr_secondary_bus_number.attr,
+ &dev_attr_reset_subordinate.attr,
NULL,
};
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index aa2be8d815048..e22040c8a0aec 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5295,7 +5295,7 @@ EXPORT_SYMBOL_GPL(pci_probe_reset_bus);
*
* Same as above except return -EAGAIN if the bus cannot be locked
*/
-static int __pci_reset_bus(struct pci_bus *bus)
+int __pci_reset_bus(struct pci_bus *bus)
{
int rc;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 39725b71300f8..1db9b8ff6043d 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -36,6 +36,7 @@ int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vmai,
int pci_probe_reset_function(struct pci_dev *dev);
int pci_bridge_secondary_bus_reset(struct pci_dev *dev);
int pci_bus_error_reset(struct pci_dev *dev);
+int __pci_reset_bus(struct pci_bus *bus);
/**
* struct pci_platform_pm_ops - Firmware PM callbacks
--
2.43.0
From: Qi Han <hanqi(a)vivo.com>
[ Upstream commit d5c367ef8287fb4d235c46a2f8c8d68715f3a0ca ]
creating a large files during checkpoint disable until it runs out of
space and then delete it, then remount to enable checkpoint again, and
then unmount the filesystem triggers the f2fs_bug_on as below:
------------[ cut here ]------------
kernel BUG at fs/f2fs/inode.c:896!
CPU: 2 UID: 0 PID: 1286 Comm: umount Not tainted 6.11.0-rc7-dirty #360
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:f2fs_evict_inode+0x58c/0x610
Call Trace:
__die_body+0x15/0x60
die+0x33/0x50
do_trap+0x10a/0x120
f2fs_evict_inode+0x58c/0x610
do_error_trap+0x60/0x80
f2fs_evict_inode+0x58c/0x610
exc_invalid_op+0x53/0x60
f2fs_evict_inode+0x58c/0x610
asm_exc_invalid_op+0x16/0x20
f2fs_evict_inode+0x58c/0x610
evict+0x101/0x260
dispose_list+0x30/0x50
evict_inodes+0x140/0x190
generic_shutdown_super+0x2f/0x150
kill_block_super+0x11/0x40
kill_f2fs_super+0x7d/0x140
deactivate_locked_super+0x2a/0x70
cleanup_mnt+0xb3/0x140
task_work_run+0x61/0x90
The root cause is: creating large files during disable checkpoint
period results in not enough free segments, so when writing back root
inode will failed in f2fs_enable_checkpoint. When umount the file
system after enabling checkpoint, the root inode is dirty in
f2fs_evict_inode function, which triggers BUG_ON. The steps to
reproduce are as follows:
dd if=/dev/zero of=f2fs.img bs=1M count=55
mount f2fs.img f2fs_dir -o checkpoint=disable:10%
dd if=/dev/zero of=big bs=1M count=50
sync
rm big
mount -o remount,checkpoint=enable f2fs_dir
umount f2fs_dir
Let's redirty inode when there is not free segments during checkpoint
is disable.
Signed-off-by: Qi Han <hanqi(a)vivo.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/f2fs/inode.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 53e1a757e4e17..b0cbb01df8cba 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -631,8 +631,10 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
!is_inode_flag_set(inode, FI_DIRTY_INODE))
return 0;
- if (!f2fs_is_checkpoint_ready(sbi))
+ if (!f2fs_is_checkpoint_ready(sbi)) {
+ f2fs_mark_inode_dirty_sync(inode, true);
return -ENOSPC;
+ }
/*
* We need to balance fs here to prevent from producing dirty node pages
--
2.43.0
From: Qi Han <hanqi(a)vivo.com>
[ Upstream commit d5c367ef8287fb4d235c46a2f8c8d68715f3a0ca ]
creating a large files during checkpoint disable until it runs out of
space and then delete it, then remount to enable checkpoint again, and
then unmount the filesystem triggers the f2fs_bug_on as below:
------------[ cut here ]------------
kernel BUG at fs/f2fs/inode.c:896!
CPU: 2 UID: 0 PID: 1286 Comm: umount Not tainted 6.11.0-rc7-dirty #360
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:f2fs_evict_inode+0x58c/0x610
Call Trace:
__die_body+0x15/0x60
die+0x33/0x50
do_trap+0x10a/0x120
f2fs_evict_inode+0x58c/0x610
do_error_trap+0x60/0x80
f2fs_evict_inode+0x58c/0x610
exc_invalid_op+0x53/0x60
f2fs_evict_inode+0x58c/0x610
asm_exc_invalid_op+0x16/0x20
f2fs_evict_inode+0x58c/0x610
evict+0x101/0x260
dispose_list+0x30/0x50
evict_inodes+0x140/0x190
generic_shutdown_super+0x2f/0x150
kill_block_super+0x11/0x40
kill_f2fs_super+0x7d/0x140
deactivate_locked_super+0x2a/0x70
cleanup_mnt+0xb3/0x140
task_work_run+0x61/0x90
The root cause is: creating large files during disable checkpoint
period results in not enough free segments, so when writing back root
inode will failed in f2fs_enable_checkpoint. When umount the file
system after enabling checkpoint, the root inode is dirty in
f2fs_evict_inode function, which triggers BUG_ON. The steps to
reproduce are as follows:
dd if=/dev/zero of=f2fs.img bs=1M count=55
mount f2fs.img f2fs_dir -o checkpoint=disable:10%
dd if=/dev/zero of=big bs=1M count=50
sync
rm big
mount -o remount,checkpoint=enable f2fs_dir
umount f2fs_dir
Let's redirty inode when there is not free segments during checkpoint
is disable.
Signed-off-by: Qi Han <hanqi(a)vivo.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/f2fs/inode.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index b23e6a848e9b7..452c0240cc11e 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -701,8 +701,10 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
!is_inode_flag_set(inode, FI_DIRTY_INODE))
return 0;
- if (!f2fs_is_checkpoint_ready(sbi))
+ if (!f2fs_is_checkpoint_ready(sbi)) {
+ f2fs_mark_inode_dirty_sync(inode, true);
return -ENOSPC;
+ }
/*
* We need to balance fs here to prevent from producing dirty node pages
--
2.43.0
From: Qi Han <hanqi(a)vivo.com>
[ Upstream commit d5c367ef8287fb4d235c46a2f8c8d68715f3a0ca ]
creating a large files during checkpoint disable until it runs out of
space and then delete it, then remount to enable checkpoint again, and
then unmount the filesystem triggers the f2fs_bug_on as below:
------------[ cut here ]------------
kernel BUG at fs/f2fs/inode.c:896!
CPU: 2 UID: 0 PID: 1286 Comm: umount Not tainted 6.11.0-rc7-dirty #360
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:f2fs_evict_inode+0x58c/0x610
Call Trace:
__die_body+0x15/0x60
die+0x33/0x50
do_trap+0x10a/0x120
f2fs_evict_inode+0x58c/0x610
do_error_trap+0x60/0x80
f2fs_evict_inode+0x58c/0x610
exc_invalid_op+0x53/0x60
f2fs_evict_inode+0x58c/0x610
asm_exc_invalid_op+0x16/0x20
f2fs_evict_inode+0x58c/0x610
evict+0x101/0x260
dispose_list+0x30/0x50
evict_inodes+0x140/0x190
generic_shutdown_super+0x2f/0x150
kill_block_super+0x11/0x40
kill_f2fs_super+0x7d/0x140
deactivate_locked_super+0x2a/0x70
cleanup_mnt+0xb3/0x140
task_work_run+0x61/0x90
The root cause is: creating large files during disable checkpoint
period results in not enough free segments, so when writing back root
inode will failed in f2fs_enable_checkpoint. When umount the file
system after enabling checkpoint, the root inode is dirty in
f2fs_evict_inode function, which triggers BUG_ON. The steps to
reproduce are as follows:
dd if=/dev/zero of=f2fs.img bs=1M count=55
mount f2fs.img f2fs_dir -o checkpoint=disable:10%
dd if=/dev/zero of=big bs=1M count=50
sync
rm big
mount -o remount,checkpoint=enable f2fs_dir
umount f2fs_dir
Let's redirty inode when there is not free segments during checkpoint
is disable.
Signed-off-by: Qi Han <hanqi(a)vivo.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/f2fs/inode.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 27bd8d1bae4d3..558f478d037d0 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -724,8 +724,10 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
!is_inode_flag_set(inode, FI_DIRTY_INODE))
return 0;
- if (!f2fs_is_checkpoint_ready(sbi))
+ if (!f2fs_is_checkpoint_ready(sbi)) {
+ f2fs_mark_inode_dirty_sync(inode, true);
return -ENOSPC;
+ }
/*
* We need to balance fs here to prevent from producing dirty node pages
--
2.43.0