From: Qi Han hanqi@vivo.com
[ Upstream commit d5c367ef8287fb4d235c46a2f8c8d68715f3a0ca ]
creating a large files during checkpoint disable until it runs out of space and then delete it, then remount to enable checkpoint again, and then unmount the filesystem triggers the f2fs_bug_on as below:
------------[ cut here ]------------ kernel BUG at fs/f2fs/inode.c:896! CPU: 2 UID: 0 PID: 1286 Comm: umount Not tainted 6.11.0-rc7-dirty #360 Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:f2fs_evict_inode+0x58c/0x610 Call Trace: __die_body+0x15/0x60 die+0x33/0x50 do_trap+0x10a/0x120 f2fs_evict_inode+0x58c/0x610 do_error_trap+0x60/0x80 f2fs_evict_inode+0x58c/0x610 exc_invalid_op+0x53/0x60 f2fs_evict_inode+0x58c/0x610 asm_exc_invalid_op+0x16/0x20 f2fs_evict_inode+0x58c/0x610 evict+0x101/0x260 dispose_list+0x30/0x50 evict_inodes+0x140/0x190 generic_shutdown_super+0x2f/0x150 kill_block_super+0x11/0x40 kill_f2fs_super+0x7d/0x140 deactivate_locked_super+0x2a/0x70 cleanup_mnt+0xb3/0x140 task_work_run+0x61/0x90
The root cause is: creating large files during disable checkpoint period results in not enough free segments, so when writing back root inode will failed in f2fs_enable_checkpoint. When umount the file system after enabling checkpoint, the root inode is dirty in f2fs_evict_inode function, which triggers BUG_ON. The steps to reproduce are as follows:
dd if=/dev/zero of=f2fs.img bs=1M count=55 mount f2fs.img f2fs_dir -o checkpoint=disable:10% dd if=/dev/zero of=big bs=1M count=50 sync rm big mount -o remount,checkpoint=enable f2fs_dir umount f2fs_dir
Let's redirty inode when there is not free segments during checkpoint is disable.
Signed-off-by: Qi Han hanqi@vivo.com Reviewed-by: Chao Yu chao@kernel.org Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/inode.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index 53e1a757e4e17..b0cbb01df8cba 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -631,8 +631,10 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc) !is_inode_flag_set(inode, FI_DIRTY_INODE)) return 0;
- if (!f2fs_is_checkpoint_ready(sbi)) + if (!f2fs_is_checkpoint_ready(sbi)) { + f2fs_mark_inode_dirty_sync(inode, true); return -ENOSPC; + }
/* * We need to balance fs here to prevent from producing dirty node pages
From: Keith Busch kbusch@kernel.org
[ Upstream commit 2fa046449a82a7d0f6d9721dd83e348816038444 ]
The "bus" and "cxl_bus" reset methods reset a device by asserting Secondary Bus Reset on the bridge leading to the device. These only work if the device is the only device below the bridge.
Add a sysfs 'reset_subordinate' attribute on bridges that can assert Secondary Bus Reset regardless of how many devices are below the bridge.
This resets all the devices below a bridge in a single command, including the locking and config space save/restore that reset methods normally do.
This may be the only way to reset devices that don't support other reset methods (ACPI, FLR, PM reset, etc).
Link: https://lore.kernel.org/r/20241025222755.3756162-1-kbusch@meta.com Signed-off-by: Keith Busch kbusch@kernel.org [bhelgaas: commit log, add capable(CAP_SYS_ADMIN) check] Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Alex Williamson alex.williamson@redhat.com Reviewed-by: Amey Narkhede ameynarkhede03@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/ABI/testing/sysfs-bus-pci | 11 +++++++++++ drivers/pci/pci-sysfs.c | 26 +++++++++++++++++++++++++ drivers/pci/pci.c | 2 +- drivers/pci/pci.h | 1 + 4 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index 8bfee557e50ea..e84434d48dcf0 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -125,6 +125,17 @@ Description: will be present in sysfs. Writing 1 to this file will perform reset.
+What: /sys/bus/pci/devices/.../reset_subordinate +Date: October 2024 +Contact: linux-pci@vger.kernel.org +Description: + This is visible only for bridge devices. If you want to reset + all devices attached through the subordinate bus of a specific + bridge device, writing 1 to this will try to do it. This will + affect all devices attached to the system through this bridge + similiar to writing 1 to their individual "reset" file, so use + with caution. + What: /sys/bus/pci/devices/.../vpd Date: February 2008 Contact: Ben Hutchings bwh@kernel.org diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index 90d5a29a6ff3d..131c6d7e86f8f 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -505,6 +505,31 @@ static ssize_t bus_rescan_store(struct device *dev, static struct device_attribute dev_attr_bus_rescan = __ATTR(rescan, 0200, NULL, bus_rescan_store);
+static ssize_t reset_subordinate_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct pci_bus *bus = pdev->subordinate; + unsigned long val; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (kstrtoul(buf, 0, &val) < 0) + return -EINVAL; + + if (val) { + int ret = __pci_reset_bus(bus); + + if (ret) + return ret; + } + + return count; +} +static DEVICE_ATTR_WO(reset_subordinate); + #if defined(CONFIG_PM) && defined(CONFIG_ACPI) static ssize_t d3cold_allowed_store(struct device *dev, struct device_attribute *attr, @@ -628,6 +653,7 @@ static struct attribute *pci_dev_attrs[] = { static struct attribute *pci_bridge_attrs[] = { &dev_attr_subordinate_bus_number.attr, &dev_attr_secondary_bus_number.attr, + &dev_attr_reset_subordinate.attr, NULL, };
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 41050a35631fa..ad5bd17f77a3b 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5538,7 +5538,7 @@ EXPORT_SYMBOL_GPL(pci_probe_reset_bus); * * Same as above except return -EAGAIN if the bus cannot be locked */ -static int __pci_reset_bus(struct pci_bus *bus) +int __pci_reset_bus(struct pci_bus *bus) { int rc;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 725d2b0d45693..7a737ef76e6de 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -41,6 +41,7 @@ int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vmai, int pci_probe_reset_function(struct pci_dev *dev); int pci_bridge_secondary_bus_reset(struct pci_dev *dev); int pci_bus_error_reset(struct pci_dev *dev); +int __pci_reset_bus(struct pci_bus *bus);
#define PCI_PM_D2_DELAY 200 #define PCI_PM_D3_WAIT 10
From: Mengyuan Lou mengyuanlou@net-swift.com
[ Upstream commit aa46a3736afcb7b0793766d22479b8b99fc1b322 ]
Wangxun FF5xxx NICs are similar to SFxxx, RP1000 and RP2000 NICs. They may be multi-function devices, but they do not advertise an ACS capability.
But the hardware does isolate FF5xxx functions as though it had an ACS capability and PCI_ACS_RR and PCI_ACS_CR were set in the ACS Control register, i.e., all peer-to-peer traffic is directed upstream instead of being routed internally.
Add ACS quirk for FF5xxx NICs in pci_quirk_wangxun_nic_acs() so the functions can be in independent IOMMU groups.
Link: https://lore.kernel.org/r/E16053DB2B80E9A5+20241115024604.30493-1-mengyuanlo... Signed-off-by: Mengyuan Lou mengyuanlou@net-swift.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/quirks.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index b60954b04a077..6a2d64d050c04 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -4870,18 +4870,21 @@ static int pci_quirk_brcm_acs(struct pci_dev *dev, u16 acs_flags) }
/* - * Wangxun 10G/1G NICs have no ACS capability, and on multi-function - * devices, peer-to-peer transactions are not be used between the functions. - * So add an ACS quirk for below devices to isolate functions. + * Wangxun 40G/25G/10G/1G NICs have no ACS capability, but on + * multi-function devices, the hardware isolates the functions by + * directing all peer-to-peer traffic upstream as though PCI_ACS_RR and + * PCI_ACS_CR were set. * SFxxx 1G NICs(em). * RP1000/RP2000 10G NICs(sp). + * FF5xxx 40G/25G/10G NICs(aml). */ static int pci_quirk_wangxun_nic_acs(struct pci_dev *dev, u16 acs_flags) { switch (dev->device) { - case 0x0100 ... 0x010F: - case 0x1001: - case 0x2001: + case 0x0100 ... 0x010F: /* EM */ + case 0x1001: case 0x2001: /* SP */ + case 0x5010: case 0x5025: case 0x5040: /* AML */ + case 0x5110: case 0x5125: case 0x5140: /* AML */ return pci_acs_ctrl_enabled(acs_flags, PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF); }
From: Defa Li defa.li@mediatek.com
[ Upstream commit 6cf7b65f7029914dc0cd7db86fac9ee5159008c6 ]
A deadlock may happen since the i3c_master_register() acquires &i3cbus->lock twice. See the log below. Use i3cdev->desc->info instead of calling i3c_device_info() to avoid acquiring the lock twice.
v2: - Modified the title and commit message
============================================ WARNING: possible recursive locking detected 6.11.0-mainline -------------------------------------------- init/1 is trying to acquire lock: f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_bus_normaluse_lock
but task is already holding lock: f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_master_register
other info that might help us debug this: Possible unsafe locking scenario:
CPU0 ---- lock(&i3cbus->lock); lock(&i3cbus->lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by init/1: #0: fcffff809b6798f8 (&dev->mutex){....}-{3:3}, at: __driver_attach #1: f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_master_register
stack backtrace: CPU: 6 UID: 0 PID: 1 Comm: init Call trace: dump_backtrace+0xfc/0x17c show_stack+0x18/0x28 dump_stack_lvl+0x40/0xc0 dump_stack+0x18/0x24 print_deadlock_bug+0x388/0x390 __lock_acquire+0x18bc/0x32ec lock_acquire+0x134/0x2b0 down_read+0x50/0x19c i3c_bus_normaluse_lock+0x14/0x24 i3c_device_get_info+0x24/0x58 i3c_device_uevent+0x34/0xa4 dev_uevent+0x310/0x384 kobject_uevent_env+0x244/0x414 kobject_uevent+0x14/0x20 device_add+0x278/0x460 device_register+0x20/0x34 i3c_master_register_new_i3c_devs+0x78/0x154 i3c_master_register+0x6a0/0x6d4 mtk_i3c_master_probe+0x3b8/0x4d8 platform_probe+0xa0/0xe0 really_probe+0x114/0x454 __driver_probe_device+0xa0/0x15c driver_probe_device+0x3c/0x1ac __driver_attach+0xc4/0x1f0 bus_for_each_dev+0x104/0x160 driver_attach+0x24/0x34 bus_add_driver+0x14c/0x294 driver_register+0x68/0x104 __platform_driver_register+0x20/0x30 init_module+0x20/0xfe4 do_one_initcall+0x184/0x464 do_init_module+0x58/0x1ec load_module+0xefc/0x10c8 __arm64_sys_finit_module+0x238/0x33c invoke_syscall+0x58/0x10c el0_svc_common+0xa8/0xdc do_el0_svc+0x1c/0x28 el0_svc+0x50/0xac el0t_64_sync_handler+0x70/0xbc el0t_64_sync+0x1a8/0x1ac
Signed-off-by: Defa Li defa.li@mediatek.com Link: https://lore.kernel.org/r/20241107132549.25439-1-defa.li@mediatek.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i3c/master.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/i3c/master.c b/drivers/i3c/master.c index 02dc8e29c8280..0503f8398a86a 100644 --- a/drivers/i3c/master.c +++ b/drivers/i3c/master.c @@ -257,7 +257,8 @@ static int i3c_device_uevent(struct device *dev, struct kobj_uevent_env *env) struct i3c_device_info devinfo; u16 manuf, part, ext;
- i3c_device_get_info(i3cdev, &devinfo); + if (i3cdev->desc) + devinfo = i3cdev->desc->info; manuf = I3C_PID_MANUF_ID(devinfo.pid); part = I3C_PID_PART_ID(devinfo.pid); ext = I3C_PID_EXTRA_INFO(devinfo.pid);
linux-stable-mirror@lists.linaro.org