[PATCH AUTOSEL 6.11 01/13] f2fs: fix f2fs_bug_on when uninstalling filesystem call f2fs_evict_inode.

List overview All Threads
Download

newer

older

[PATCH AUTOSEL 6.6 01/12] f2fs:...

[PATCH AUTOSEL 6.12 01/15] f2fs:...

Sasha Levin

4 Dec 2024 4 Dec '24

4 p.m.

From: Qi Han hanqi@vivo.com

[ Upstream commit d5c367ef8287fb4d235c46a2f8c8d68715f3a0ca ]

creating a large files during checkpoint disable until it runs out of space and then delete it, then remount to enable checkpoint again, and then unmount the filesystem triggers the f2fs_bug_on as below:

------------[ cut here ]------------ kernel BUG at fs/f2fs/inode.c:896! CPU: 2 UID: 0 PID: 1286 Comm: umount Not tainted 6.11.0-rc7-dirty #360 Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:f2fs_evict_inode+0x58c/0x610 Call Trace: __die_body+0x15/0x60 die+0x33/0x50 do_trap+0x10a/0x120 f2fs_evict_inode+0x58c/0x610 do_error_trap+0x60/0x80 f2fs_evict_inode+0x58c/0x610 exc_invalid_op+0x53/0x60 f2fs_evict_inode+0x58c/0x610 asm_exc_invalid_op+0x16/0x20 f2fs_evict_inode+0x58c/0x610 evict+0x101/0x260 dispose_list+0x30/0x50 evict_inodes+0x140/0x190 generic_shutdown_super+0x2f/0x150 kill_block_super+0x11/0x40 kill_f2fs_super+0x7d/0x140 deactivate_locked_super+0x2a/0x70 cleanup_mnt+0xb3/0x140 task_work_run+0x61/0x90

The root cause is: creating large files during disable checkpoint period results in not enough free segments, so when writing back root inode will failed in f2fs_enable_checkpoint. When umount the file system after enabling checkpoint, the root inode is dirty in f2fs_evict_inode function, which triggers BUG_ON. The steps to reproduce are as follows:

dd if=/dev/zero of=f2fs.img bs=1M count=55 mount f2fs.img f2fs_dir -o checkpoint=disable:10% dd if=/dev/zero of=big bs=1M count=50 sync rm big mount -o remount,checkpoint=enable f2fs_dir umount f2fs_dir

Let's redirty inode when there is not free segments during checkpoint is disable.

Signed-off-by: Qi Han hanqi@vivo.com Reviewed-by: Chao Yu chao@kernel.org Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/inode.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index 4729c49bf6d7e..b0676e8338334 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -775,8 +775,10 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc) !is_inode_flag_set(inode, FI_DIRTY_INODE)) return 0;

- if (!f2fs_is_checkpoint_ready(sbi)) + if (!f2fs_is_checkpoint_ready(sbi)) { + f2fs_mark_inode_dirty_sync(inode, true); return -ENOSPC; + }

/* * We need to balance fs here to prevent from producing dirty node pages

-- 2.43.0

Show replies by date

Sasha Levin

4 Dec 4 Dec

4 p.m.

New subject: [PATCH AUTOSEL 6.11 02/13] KMSAN: uninit-value in inode_go_dump (5)

From: Qianqiang Liu qianqiang.liu@163.com

[ Upstream commit f9417fcfca3c5e30a0b961e7250fab92cfa5d123 ]

When mounting of a corrupted disk image fails, the error message printed can reference uninitialized inode fields. To prevent that from happening, always initialize those fields.

Reported-by: syzbot+aa0730b0a42646eb1359@syzkaller.appspotmail.com Signed-off-by: Qianqiang Liu qianqiang.liu@163.com Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/gfs2/super.c | 2 ++ 1 file changed, 2 insertions(+)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index 6678060ed4d2b..d60d53810bc12 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -1537,11 +1537,13 @@ static struct inode *gfs2_alloc_inode(struct super_block *sb) if (!ip) return NULL; ip->i_no_addr = 0; + ip->i_no_formal_ino = 0; ip->i_flags = 0; ip->i_gl = NULL; gfs2_holder_mark_uninitialized(&ip->i_iopen_gh); memset(&ip->i_res, 0, sizeof(ip->i_res)); RB_CLEAR_NODE(&ip->i_res.rs_node); + ip->i_diskflags = 0; ip->i_rahead = 0; return &ip->i_inode; }

-- 2.43.0

Sasha Levin

4 p.m.

New subject: [PATCH AUTOSEL 6.11 03/13] i3c: mipi-i3c-hci: Mask ring interrupts before ring stop request

From: Jarkko Nikula jarkko.nikula@linux.intel.com

[ Upstream commit 6ca2738174e4ee44edb2ab2d86ce74f015a0cc32 ]

Bus cleanup path in DMA mode may trigger a RING_OP_STAT interrupt when the ring is being stopped. Depending on timing between ring stop request completion, interrupt handler removal and code execution this may lead to a NULL pointer dereference in hci_dma_irq_handler() if it gets to run after the io_data pointer is set to NULL in hci_dma_cleanup().

Prevent this my masking the ring interrupts before ring stop request.

Signed-off-by: Jarkko Nikula jarkko.nikula@linux.intel.com Link: https://lore.kernel.org/r/20240920144432.62370-2-jarkko.nikula@linux.intel.c... Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i3c/master/mipi-i3c-hci/dma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/i3c/master/mipi-i3c-hci/dma.c b/drivers/i3c/master/mipi-i3c-hci/dma.c index a918e96b21fdd..13adc58400942 100644 --- a/drivers/i3c/master/mipi-i3c-hci/dma.c +++ b/drivers/i3c/master/mipi-i3c-hci/dma.c @@ -159,10 +159,10 @@ static void hci_dma_cleanup(struct i3c_hci *hci) for (i = 0; i < rings->total; i++) { rh = &rings->headers[i];

+ rh_reg_write(INTR_SIGNAL_ENABLE, 0); rh_reg_write(RING_CONTROL, 0); rh_reg_write(CR_SETUP, 0); rh_reg_write(IBI_SETUP, 0); - rh_reg_write(INTR_SIGNAL_ENABLE, 0);

if (rh->xfer) dma_free_coherent(&hci->master.dev,

-- 2.43.0

Sasha Levin

4 p.m.

New subject: [PATCH AUTOSEL 6.11 04/13] PCI: qcom: Add support for IPQ9574

From: devi priya quic_devipriy@quicinc.com

[ Upstream commit a63b74f2e35be3829f256922037ae5cee6bb844a ]

Add the new IPQ9574 platform which is based on the Qcom IP rev. 1.27.0 and Synopsys IP rev. 5.80a.

The platform itself has four PCIe Gen3 controllers: two single-lane and two dual-lane, all are based on Synopsys IP rev. 5.70a. As such, reuse all the members of 'ops_2_9_0'.

Link: https://lore.kernel.org/r/20240801054803.3015572-5-quic_srichara@quicinc.com Co-developed-by: Anusha Rao quic_anusha@quicinc.com Signed-off-by: Anusha Rao quic_anusha@quicinc.com Signed-off-by: devi priya quic_devipriy@quicinc.com Signed-off-by: Sricharan Ramabadhran quic_srichara@quicinc.com [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński kwilczynski@kernel.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/controller/dwc/pcie-qcom.c | 1 + 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c index 0b3020c7a50a4..ee8b08f6a641a 100644 --- a/drivers/pci/controller/dwc/pcie-qcom.c +++ b/drivers/pci/controller/dwc/pcie-qcom.c @@ -1769,6 +1769,7 @@ static const struct of_device_id qcom_pcie_match[] = { { .compatible = "qcom,pcie-ipq8064-v2", .data = &cfg_2_1_0 }, { .compatible = "qcom,pcie-ipq8074", .data = &cfg_2_3_3 }, { .compatible = "qcom,pcie-ipq8074-gen3", .data = &cfg_2_9_0 }, + { .compatible = "qcom,pcie-ipq9574", .data = &cfg_2_9_0 }, { .compatible = "qcom,pcie-msm8996", .data = &cfg_2_3_2 }, { .compatible = "qcom,pcie-qcs404", .data = &cfg_2_4_0 }, { .compatible = "qcom,pcie-sa8540p", .data = &cfg_sc8280xp },

-- 2.43.0

Sasha Levin

4 p.m.

New subject: [PATCH AUTOSEL 6.11 05/13] PCI: vmd: Add DID 8086:B06F and 8086:B60B for Intel client SKUs

From: Nirmal Patel nirmal.patel@linux.ntel.com

[ Upstream commit b727484cace4be22be9321cc0bc9487648ba447b ]

Add support for this VMD device which supports the bus restriction mode. The feature that turns off vector 0 for MSI-X remapping is also enabled.

Link: https://lore.kernel.org/r/20241011175657.249948-1-nirmal.patel@linux.intel.c... Signed-off-by: Nirmal Patel nirmal.patel@linux.ntel.com Signed-off-by: Krzysztof Wilczyński kwilczynski@kernel.org Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/controller/vmd.c | 4 ++++ 1 file changed, 4 insertions(+)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index a726de0af011f..4429a3ca1de17 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -1111,6 +1111,10 @@ static const struct pci_device_id vmd_ids[] = { .driver_data = VMD_FEATS_CLIENT,}, {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_VMD_9A0B), .driver_data = VMD_FEATS_CLIENT,}, + {PCI_VDEVICE(INTEL, 0xb60b), + .driver_data = VMD_FEATS_CLIENT,}, + {PCI_VDEVICE(INTEL, 0xb06f), + .driver_data = VMD_FEATS_CLIENT,}, {0,} }; MODULE_DEVICE_TABLE(pci, vmd_ids);

-- 2.43.0

Sasha Levin

4 p.m.

New subject: [PATCH AUTOSEL 6.11 06/13] PCI: vmd: Set devices to D0 before enabling PM L1 Substates

From: Jian-Hong Pan jhp@endlessos.org

[ Upstream commit d66041063192497a4a97d21dbf86b79a03a7f4fb ]

The remapped PCIe Root Port and the child device have PM L1 Substates capability, but they are disabled originally.

Here is a failed example on ASUS B1400CEAE:

Capabilities: [900 v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+ PortCommonModeRestoreTime=32us PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=101376ns L1SubCtl2: T_PwrOn=50us

Enable PCI-PM L1 PM Substates for devices below VMD while they are in D0 (see PCIe r6.0, sec 5.5.4).

Link: https://lore.kernel.org/r/20241001083438.10070-4-jhp@endlessos.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 Signed-off-by: Jian-Hong Pan jhp@endlessos.org Signed-off-by: Krzysztof Wilczyński kwilczynski@kernel.org Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Kuppuswamy Sathyanarayanan sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/controller/vmd.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index 4429a3ca1de17..73c672cb830f3 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -751,11 +751,9 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata) if (!(features & VMD_FEAT_BIOS_PM_QUIRK)) return 0;

- pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL); - pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR); if (!pos) - return 0; + goto out_state_change;

/* * Skip if the max snoop LTR is non-zero, indicating BIOS has set it @@ -763,7 +761,7 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata) */ pci_read_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, &ltr_reg); if (!!(ltr_reg & (PCI_LTR_VALUE_MASK | PCI_LTR_SCALE_MASK))) - return 0; + goto out_state_change;

/* * Set the default values to the maximum required by the platform to @@ -775,6 +773,13 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata) pci_write_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, ltr_reg); pci_info(pdev, "VMD: Default LTR value set by driver\n");

+out_state_change: + /* + * Ensure devices are in D0 before enabling PCI-PM L1 PM Substates, per + * PCIe r6.0, sec 5.5.4. + */ + pci_set_power_state_locked(pdev, PCI_D0); + pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL); return 0; }

-- 2.43.0

Sasha Levin

4 p.m.

New subject: [PATCH AUTOSEL 6.11 07/13] PCI: Detect and trust built-in Thunderbolt chips

From: Esther Shimanovich eshimanovich@chromium.org

[ Upstream commit 3b96b895127b7c0aed63d82c974b46340e8466c1 ]

Some computers with CPUs that lack Thunderbolt features use discrete Thunderbolt chips to add Thunderbolt functionality. These Thunderbolt chips are located within the chassis; between the Root Port labeled ExternalFacingPort and the USB-C port.

These Thunderbolt PCIe devices should be labeled as fixed and trusted, as they are built into the computer. Otherwise, security policies that rely on those flags may have unintended results, such as preventing USB-C ports from enumerating.

Detect the above scenario through the process of elimination.

1) Integrated Thunderbolt host controllers already have Thunderbolt implemented, so anything outside their external facing Root Port is removable and untrusted.

Detect them using the following properties:

- Most integrated host controllers have the "usb4-host-interface" ACPI property, as described here:

https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-...

- Integrated Thunderbolt PCIe Root Ports before Alder Lake do not have the "usb4-host-interface" ACPI property. Identify those by their PCI IDs instead.

2) If a Root Port does not have integrated Thunderbolt capabilities, but has the "ExternalFacingPort" ACPI property, that means the manufacturer has opted to use a discrete Thunderbolt host controller that is built into the computer.

This host controller can be identified by virtue of being located directly below an external-facing Root Port that lacks integrated Thunderbolt. Label it as trusted and fixed.

Everything downstream from it is untrusted and removable.

The "ExternalFacingPort" ACPI property is described here: https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-...

Link: https://lore.kernel.org/r/20240910-trust-tbt-fix-v5-1-7a7a42a5f496@chromium.... Suggested-by: Mika Westerberg mika.westerberg@linux.intel.com Signed-off-by: Esther Shimanovich eshimanovich@chromium.org Signed-off-by: Bjorn Helgaas bhelgaas@google.com Tested-by: Mika Westerberg mika.westerberg@linux.intel.com Tested-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Mika Westerberg mika.westerberg@linux.intel.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/pci/acpi.c | 119 ++++++++++++++++++++++++++++++++++++++++++++ drivers/pci/probe.c | 30 ++++++++--- include/linux/pci.h | 6 +++ 3 files changed, 148 insertions(+), 7 deletions(-)

diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c index 55c4b07ec1f63..0c316bae1726e 100644 --- a/arch/x86/pci/acpi.c +++ b/arch/x86/pci/acpi.c @@ -250,6 +250,125 @@ void __init pci_acpi_crs_quirks(void) pr_info("Please notify linux-pci@vger.kernel.org so future kernels can do this automatically\n"); }

+/* + * Check if pdev is part of a PCIe switch that is directly below the + * specified bridge. + */ +static bool pcie_switch_directly_under(struct pci_dev *bridge, + struct pci_dev *pdev) +{ + struct pci_dev *parent = pci_upstream_bridge(pdev); + + /* If the device doesn't have a parent, it's not under anything */ + if (!parent) + return false; + + /* + * If the device has a PCIe type, check if it is below the + * corresponding PCIe switch components (if applicable). Then check + * if its upstream port is directly beneath the specified bridge. + */ + switch (pci_pcie_type(pdev)) { + case PCI_EXP_TYPE_UPSTREAM: + return parent == bridge; + + case PCI_EXP_TYPE_DOWNSTREAM: + if (pci_pcie_type(parent) != PCI_EXP_TYPE_UPSTREAM) + return false; + parent = pci_upstream_bridge(parent); + return parent == bridge; + + case PCI_EXP_TYPE_ENDPOINT: + if (pci_pcie_type(parent) != PCI_EXP_TYPE_DOWNSTREAM) + return false; + parent = pci_upstream_bridge(parent); + if (!parent || pci_pcie_type(parent) != PCI_EXP_TYPE_UPSTREAM) + return false; + parent = pci_upstream_bridge(parent); + return parent == bridge; + } + + return false; +} + +static bool pcie_has_usb4_host_interface(struct pci_dev *pdev) +{ + struct fwnode_handle *fwnode; + + /* + * For USB4, the tunneled PCIe Root or Downstream Ports are marked + * with the "usb4-host-interface" ACPI property, so we look for + * that first. This should cover most cases. + */ + fwnode = fwnode_find_reference(dev_fwnode(&pdev->dev), + "usb4-host-interface", 0); + if (!IS_ERR(fwnode)) { + fwnode_handle_put(fwnode); + return true; + } + + /* + * Any integrated Thunderbolt 3/4 PCIe Root Ports from Intel + * before Alder Lake do not have the "usb4-host-interface" + * property so we use their PCI IDs instead. All these are + * tunneled. This list is not expected to grow. + */ + if (pdev->vendor == PCI_VENDOR_ID_INTEL) { + switch (pdev->device) { + /* Ice Lake Thunderbolt 3 PCIe Root Ports */ + case 0x8a1d: + case 0x8a1f: + case 0x8a21: + case 0x8a23: + /* Tiger Lake-LP Thunderbolt 4 PCIe Root Ports */ + case 0x9a23: + case 0x9a25: + case 0x9a27: + case 0x9a29: + /* Tiger Lake-H Thunderbolt 4 PCIe Root Ports */ + case 0x9a2b: + case 0x9a2d: + case 0x9a2f: + case 0x9a31: + return true; + } + } + + return false; +} + +bool arch_pci_dev_is_removable(struct pci_dev *pdev) +{ + struct pci_dev *parent, *root; + + /* pdev without a parent or Root Port is never tunneled */ + parent = pci_upstream_bridge(pdev); + if (!parent) + return false; + root = pcie_find_root_port(pdev); + if (!root) + return false; + + /* Internal PCIe devices are not tunneled */ + if (!root->external_facing) + return false; + + /* Anything directly behind a "usb4-host-interface" is tunneled */ + if (pcie_has_usb4_host_interface(parent)) + return true; + + /* + * Check if this is a discrete Thunderbolt/USB4 controller that is + * directly behind the non-USB4 PCIe Root Port marked as + * "ExternalFacingPort". Those are not behind a PCIe tunnel. + */ + if (pcie_switch_directly_under(root, pdev)) + return false; + + /* PCIe devices after the discrete chip are tunneled */ + return true; +} + #ifdef CONFIG_PCI_MMCONFIG static int check_segment(u16 seg, struct device *dev, char *estr) { diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index d203e23b75620..6f5e4fa854f78 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1631,23 +1631,33 @@ static void set_pcie_thunderbolt(struct pci_dev *dev)

static void set_pcie_untrusted(struct pci_dev *dev) { - struct pci_dev *parent; + struct pci_dev *parent = pci_upstream_bridge(dev);

+ if (!parent) + return; /* - * If the upstream bridge is untrusted we treat this device + * If the upstream bridge is untrusted we treat this device as * untrusted as well. */ - parent = pci_upstream_bridge(dev); - if (parent && (parent->untrusted || parent->external_facing)) + if (parent->untrusted) { + dev->untrusted = true; + return; + } + + if (arch_pci_dev_is_removable(dev)) { + pci_dbg(dev, "marking as untrusted\n"); dev->untrusted = true; + } }

static void pci_set_removable(struct pci_dev *dev) { struct pci_dev *parent = pci_upstream_bridge(dev);

+ if (!parent) + return; /* - * We (only) consider everything downstream from an external_facing + * We (only) consider everything tunneled below an external_facing * device to be removable by the user. We're mainly concerned with * consumer platforms with user accessible thunderbolt ports that are * vulnerable to DMA attacks, and we expect those ports to be marked by @@ -1657,9 +1667,15 @@ static void pci_set_removable(struct pci_dev *dev) * accessible to user / may not be removed by end user, and thus not * exposed as "removable" to userspace. */ - if (parent && - (parent->external_facing || dev_is_removable(&parent->dev))) + if (dev_is_removable(&parent->dev)) { + dev_set_removable(&dev->dev, DEVICE_REMOVABLE); + return; + } + + if (arch_pci_dev_is_removable(dev)) { + pci_dbg(dev, "marking as removable\n"); dev_set_removable(&dev->dev, DEVICE_REMOVABLE); + } }

/** diff --git a/include/linux/pci.h b/include/linux/pci.h index 37d97bef060f9..3eea098a47865 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2602,6 +2602,12 @@ pci_host_bridge_acpi_msi_domain(struct pci_bus *bus) { return NULL; } static inline bool pci_pr3_present(struct pci_dev *pdev) { return false; } #endif

+#if defined(CONFIG_X86) && defined(CONFIG_ACPI) +bool arch_pci_dev_is_removable(struct pci_dev *pdev); +#else +static inline bool arch_pci_dev_is_removable(struct pci_dev *pdev) { return false; } +#endif + #ifdef CONFIG_EEH static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev) {

-- 2.43.0

Sasha Levin

4 p.m.

New subject: [PATCH AUTOSEL 6.11 08/13] PCI: starfive: Enable controller runtime PM before probing host bridge

From: Mayank Rana quic_mrana@quicinc.com

[ Upstream commit 6168efbebace0db443185d4c6701ca8170a8788d ]

A PCI controller device, e.g., StarFive, is parent to PCI host bridge device. We must enable runtime PM of the controller before enabling runtime PM of the host bridge, which will happen in pci_host_probe(), to avoid this warning:

pcie-starfive 940000000.pcie: Enabling runtime PM for inactive device with active children

Fix this issue by enabling StarFive controller device's runtime PM before calling pci_host_probe() in plda_pcie_host_init().

Link: https://lore.kernel.org/r/20241111-runtime_pm-v7-1-9c164eefcd87@quicinc.com Tested-by: Marek Szyprowski m.szyprowski@samsung.com Signed-off-by: Mayank Rana quic_mrana@quicinc.com [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/controller/plda/pcie-starfive.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/controller/plda/pcie-starfive.c b/drivers/pci/controller/plda/pcie-starfive.c index c9933ecf68338..0564fdce47c2a 100644 --- a/drivers/pci/controller/plda/pcie-starfive.c +++ b/drivers/pci/controller/plda/pcie-starfive.c @@ -404,6 +404,9 @@ static int starfive_pcie_probe(struct platform_device *pdev) if (ret) return ret;

+ pm_runtime_enable(&pdev->dev); + pm_runtime_get_sync(&pdev->dev); + plda->host_ops = &sf_host_ops; plda->num_events = PLDA_MAX_EVENT_NUM; /* mask doorbell event */ @@ -413,11 +416,12 @@ static int starfive_pcie_probe(struct platform_device *pdev) plda->events_bitmap <<= PLDA_NUM_DMA_EVENTS; ret = plda_pcie_host_init(&pcie->plda, &starfive_pcie_ops, &stf_pcie_event); - if (ret) + if (ret) { + pm_runtime_put_sync(&pdev->dev); + pm_runtime_disable(&pdev->dev); return ret; + }

- pm_runtime_enable(&pdev->dev); - pm_runtime_get_sync(&pdev->dev); platform_set_drvdata(pdev, pcie);

return 0;

-- 2.43.0

Sasha Levin

4 p.m.

New subject: [PATCH AUTOSEL 6.11 09/13] PCI: Add 'reset_subordinate' to reset hierarchy below bridge

From: Keith Busch kbusch@kernel.org

[ Upstream commit 2fa046449a82a7d0f6d9721dd83e348816038444 ]

The "bus" and "cxl_bus" reset methods reset a device by asserting Secondary Bus Reset on the bridge leading to the device. These only work if the device is the only device below the bridge.

Add a sysfs 'reset_subordinate' attribute on bridges that can assert Secondary Bus Reset regardless of how many devices are below the bridge.

This resets all the devices below a bridge in a single command, including the locking and config space save/restore that reset methods normally do.

This may be the only way to reset devices that don't support other reset methods (ACPI, FLR, PM reset, etc).

Link: https://lore.kernel.org/r/20241025222755.3756162-1-kbusch@meta.com Signed-off-by: Keith Busch kbusch@kernel.org [bhelgaas: commit log, add capable(CAP_SYS_ADMIN) check] Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Alex Williamson alex.williamson@redhat.com Reviewed-by: Amey Narkhede ameynarkhede03@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/ABI/testing/sysfs-bus-pci | 11 +++++++++++ drivers/pci/pci-sysfs.c | 26 +++++++++++++++++++++++++ drivers/pci/pci.c | 2 +- drivers/pci/pci.h | 1 + 4 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index ecf47559f495b..7f3e6bc3ff0ff 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -163,6 +163,17 @@ Description: will be present in sysfs. Writing 1 to this file will perform reset.

+What: /sys/bus/pci/devices/.../reset_subordinate +Date: October 2024 +Contact: linux-pci@vger.kernel.org +Description: + This is visible only for bridge devices. If you want to reset + all devices attached through the subordinate bus of a specific + bridge device, writing 1 to this will try to do it. This will + affect all devices attached to the system through this bridge + similiar to writing 1 to their individual "reset" file, so use + with caution. + What: /sys/bus/pci/devices/.../vpd Date: February 2008 Contact: Ben Hutchings bwh@kernel.org diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index 40cfa716392fb..4c6ff5249a953 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -517,6 +517,31 @@ static ssize_t bus_rescan_store(struct device *dev, static struct device_attribute dev_attr_bus_rescan = __ATTR(rescan, 0200, NULL, bus_rescan_store);

+static ssize_t reset_subordinate_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct pci_bus *bus = pdev->subordinate; + unsigned long val; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (kstrtoul(buf, 0, &val) < 0) + return -EINVAL; + + if (val) { + int ret = __pci_reset_bus(bus); + + if (ret) + return ret; + } + + return count; +} +static DEVICE_ATTR_WO(reset_subordinate); + #if defined(CONFIG_PM) && defined(CONFIG_ACPI) static ssize_t d3cold_allowed_store(struct device *dev, struct device_attribute *attr, @@ -621,6 +646,7 @@ static struct attribute *pci_dev_attrs[] = { static struct attribute *pci_bridge_attrs[] = { &dev_attr_subordinate_bus_number.attr, &dev_attr_secondary_bus_number.attr, + &dev_attr_reset_subordinate.attr, NULL, };

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 51407c376a222..50868daaecdcf 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5869,7 +5869,7 @@ EXPORT_SYMBOL_GPL(pci_probe_reset_bus); * * Same as above except return -EAGAIN if the bus cannot be locked */ -static int __pci_reset_bus(struct pci_bus *bus) +int __pci_reset_bus(struct pci_bus *bus) { int rc;

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 7c06e55c5072e..822269cd67bed 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -89,6 +89,7 @@ bool pci_reset_supported(struct pci_dev *dev); void pci_init_reset_methods(struct pci_dev *dev); int pci_bridge_secondary_bus_reset(struct pci_dev *dev); int pci_bus_error_reset(struct pci_dev *dev); +int __pci_reset_bus(struct pci_bus *bus);

struct pci_cap_saved_data { u16 cap_nr;

-- 2.43.0

Sasha Levin

4 p.m.

New subject: [PATCH AUTOSEL 6.11 10/13] PCI: Add ACS quirk for Wangxun FF5xxx NICs

From: Mengyuan Lou mengyuanlou@net-swift.com

[ Upstream commit aa46a3736afcb7b0793766d22479b8b99fc1b322 ]

Wangxun FF5xxx NICs are similar to SFxxx, RP1000 and RP2000 NICs. They may be multi-function devices, but they do not advertise an ACS capability.

But the hardware does isolate FF5xxx functions as though it had an ACS capability and PCI_ACS_RR and PCI_ACS_CR were set in the ACS Control register, i.e., all peer-to-peer traffic is directed upstream instead of being routed internally.

Add ACS quirk for FF5xxx NICs in pci_quirk_wangxun_nic_acs() so the functions can be in independent IOMMU groups.

Link: https://lore.kernel.org/r/E16053DB2B80E9A5+20241115024604.30493-1-mengyuanlo... Signed-off-by: Mengyuan Lou mengyuanlou@net-swift.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/quirks.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index dccb60c1d9cc3..8103bc24a54ea 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -4996,18 +4996,21 @@ static int pci_quirk_brcm_acs(struct pci_dev *dev, u16 acs_flags) }

/* - * Wangxun 10G/1G NICs have no ACS capability, and on multi-function - * devices, peer-to-peer transactions are not be used between the functions. - * So add an ACS quirk for below devices to isolate functions. + * Wangxun 40G/25G/10G/1G NICs have no ACS capability, but on + * multi-function devices, the hardware isolates the functions by + * directing all peer-to-peer traffic upstream as though PCI_ACS_RR and + * PCI_ACS_CR were set. * SFxxx 1G NICs(em). * RP1000/RP2000 10G NICs(sp). + * FF5xxx 40G/25G/10G NICs(aml). */ static int pci_quirk_wangxun_nic_acs(struct pci_dev *dev, u16 acs_flags) { switch (dev->device) { - case 0x0100 ... 0x010F: - case 0x1001: - case 0x2001: + case 0x0100 ... 0x010F: /* EM */ + case 0x1001: case 0x2001: /* SP */ + case 0x5010: case 0x5025: case 0x5040: /* AML */ + case 0x5110: case 0x5125: case 0x5140: /* AML */ return pci_acs_ctrl_enabled(acs_flags, PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF); }

-- 2.43.0

Sasha Levin

4 p.m.

New subject: [PATCH AUTOSEL 6.11 11/13] i3c: Use i3cdev->desc->info instead of calling i3c_device_get_info() to avoid deadlock

From: Defa Li defa.li@mediatek.com

[ Upstream commit 6cf7b65f7029914dc0cd7db86fac9ee5159008c6 ]

A deadlock may happen since the i3c_master_register() acquires &i3cbus->lock twice. See the log below. Use i3cdev->desc->info instead of calling i3c_device_info() to avoid acquiring the lock twice.

v2: - Modified the title and commit message

============================================ WARNING: possible recursive locking detected 6.11.0-mainline -------------------------------------------- init/1 is trying to acquire lock: f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_bus_normaluse_lock

but task is already holding lock: f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_master_register

other info that might help us debug this: Possible unsafe locking scenario:

CPU0 ---- lock(&i3cbus->lock); lock(&i3cbus->lock);

*** DEADLOCK ***

May be due to missing lock nesting notation

2 locks held by init/1: #0: fcffff809b6798f8 (&dev->mutex){....}-{3:3}, at: __driver_attach #1: f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_master_register

stack backtrace: CPU: 6 UID: 0 PID: 1 Comm: init Call trace: dump_backtrace+0xfc/0x17c show_stack+0x18/0x28 dump_stack_lvl+0x40/0xc0 dump_stack+0x18/0x24 print_deadlock_bug+0x388/0x390 __lock_acquire+0x18bc/0x32ec lock_acquire+0x134/0x2b0 down_read+0x50/0x19c i3c_bus_normaluse_lock+0x14/0x24 i3c_device_get_info+0x24/0x58 i3c_device_uevent+0x34/0xa4 dev_uevent+0x310/0x384 kobject_uevent_env+0x244/0x414 kobject_uevent+0x14/0x20 device_add+0x278/0x460 device_register+0x20/0x34 i3c_master_register_new_i3c_devs+0x78/0x154 i3c_master_register+0x6a0/0x6d4 mtk_i3c_master_probe+0x3b8/0x4d8 platform_probe+0xa0/0xe0 really_probe+0x114/0x454 __driver_probe_device+0xa0/0x15c driver_probe_device+0x3c/0x1ac __driver_attach+0xc4/0x1f0 bus_for_each_dev+0x104/0x160 driver_attach+0x24/0x34 bus_add_driver+0x14c/0x294 driver_register+0x68/0x104 __platform_driver_register+0x20/0x30 init_module+0x20/0xfe4 do_one_initcall+0x184/0x464 do_init_module+0x58/0x1ec load_module+0xefc/0x10c8 __arm64_sys_finit_module+0x238/0x33c invoke_syscall+0x58/0x10c el0_svc_common+0xa8/0xdc do_el0_svc+0x1c/0x28 el0_svc+0x50/0xac el0t_64_sync_handler+0x70/0xbc el0t_64_sync+0x1a8/0x1ac

Signed-off-by: Defa Li defa.li@mediatek.com Link: https://lore.kernel.org/r/20241107132549.25439-1-defa.li@mediatek.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i3c/master.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/i3c/master.c b/drivers/i3c/master.c index 7028f03c2c42e..5e3b63f78d2d2 100644 --- a/drivers/i3c/master.c +++ b/drivers/i3c/master.c @@ -282,7 +282,8 @@ static int i3c_device_uevent(const struct device *dev, struct kobj_uevent_env *e struct i3c_device_info devinfo; u16 manuf, part, ext;

- i3c_device_get_info(i3cdev, &devinfo); + if (i3cdev->desc) + devinfo = i3cdev->desc->info; manuf = I3C_PID_MANUF_ID(devinfo.pid); part = I3C_PID_PART_ID(devinfo.pid); ext = I3C_PID_EXTRA_INFO(devinfo.pid);

-- 2.43.0

Sasha Levin

4 p.m.

New subject: [PATCH AUTOSEL 6.11 12/13] f2fs: print message if fscorrupted was found in f2fs_new_node_page()

From: Chao Yu chao@kernel.org

[ Upstream commit 81520c684ca67aea6a589461a3caebb9b11dcc90 ]

If fs corruption occurs in f2fs_new_node_page(), let's print more information about corrupted metadata into kernel log.

Meanwhile, it updates to record ERROR_INCONSISTENT_NAT instead of ERROR_INVALID_BLKADDR if blkaddr in nat entry is not NULL_ADDR which means nat bitmap and nat entry is inconsistent.

Signed-off-by: Chao Yu chao@kernel.org Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/node.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index b72ef96f7e33a..6ed4b1b2e7d94 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1331,7 +1331,12 @@ struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs) err = -EFSCORRUPTED; dec_valid_node_count(sbi, dn->inode, !ofs); set_sbi_flag(sbi, SBI_NEED_FSCK); - f2fs_handle_error(sbi, ERROR_INVALID_BLKADDR); + f2fs_warn_ratelimited(sbi, + "f2fs_new_node_page: inconsistent nat entry, " + "ino:%u, nid:%u, blkaddr:%u, ver:%u, flag:%u", + new_ni.ino, new_ni.nid, new_ni.blk_addr, + new_ni.version, new_ni.flag); + f2fs_handle_error(sbi, ERROR_INCONSISTENT_NAT); goto fail; } #endif

-- 2.43.0

Sasha Levin

4 p.m.

New subject: [PATCH AUTOSEL 6.11 13/13] f2fs: fix to shrink read extent node in batches

From: Chao Yu chao@kernel.org

[ Upstream commit 3fc5d5a182f6a1f8bd4dc775feb54c369dd2c343 ]

We use rwlock to protect core structure data of extent tree during its shrink, however, if there is a huge number of extent nodes in extent tree, during shrink of extent tree, it may hold rwlock for a very long time, which may trigger kernel hang issue.

This patch fixes to shrink read extent node in batches, so that, critical region of the rwlock can be shrunk to avoid its extreme long time hold.

Reported-by: Xiuhong Wang xiuhong.wang@unisoc.com Closes: https://lore.kernel.org/linux-f2fs-devel/20241112110627.1314632-1-xiuhong.wa... Signed-off-by: Xiuhong Wang xiuhong.wang@unisoc.com Signed-off-by: Zhiguo Niu zhiguo.niu@unisoc.com Signed-off-by: Chao Yu chao@kernel.org Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/extent_cache.c | 69 +++++++++++++++++++++++++----------------- 1 file changed, 41 insertions(+), 28 deletions(-)

diff --git a/fs/f2fs/extent_cache.c b/fs/f2fs/extent_cache.c index 62ac440d94168..368d9cbdea743 100644 --- a/fs/f2fs/extent_cache.c +++ b/fs/f2fs/extent_cache.c @@ -346,21 +346,22 @@ static struct extent_tree *__grab_extent_tree(struct inode *inode, }

static unsigned int __free_extent_tree(struct f2fs_sb_info *sbi, - struct extent_tree *et) + struct extent_tree *et, unsigned int nr_shrink) { struct rb_node *node, *next; struct extent_node *en; - unsigned int count = atomic_read(&et->node_cnt); + unsigned int count;

node = rb_first_cached(&et->root); - while (node) { + + for (count = 0; node && count < nr_shrink; count++) { next = rb_next(node); en = rb_entry(node, struct extent_node, rb_node); __release_extent_node(sbi, et, en); node = next; }

- return count - atomic_read(&et->node_cnt); + return count; }

static void __drop_largest_extent(struct extent_tree *et, @@ -579,6 +580,30 @@ static struct extent_node *__insert_extent_tree(struct f2fs_sb_info *sbi, return en; }

+static unsigned int __destroy_extent_node(struct inode *inode, + enum extent_type type) +{ + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + struct extent_tree *et = F2FS_I(inode)->extent_tree[type]; + unsigned int nr_shrink = type == EX_READ ? + READ_EXTENT_CACHE_SHRINK_NUMBER : + AGE_EXTENT_CACHE_SHRINK_NUMBER; + unsigned int node_cnt = 0; + + if (!et || !atomic_read(&et->node_cnt)) + return 0; + + while (atomic_read(&et->node_cnt)) { + write_lock(&et->lock); + node_cnt += __free_extent_tree(sbi, et, nr_shrink); + write_unlock(&et->lock); + } + + f2fs_bug_on(sbi, atomic_read(&et->node_cnt)); + + return node_cnt; +} + static void __update_extent_tree_range(struct inode *inode, struct extent_info *tei, enum extent_type type) { @@ -717,9 +742,6 @@ static void __update_extent_tree_range(struct inode *inode, } }

- if (is_inode_flag_set(inode, FI_NO_EXTENT)) - __free_extent_tree(sbi, et); - if (et->largest_updated) { et->largest_updated = false; updated = true; @@ -737,6 +759,9 @@ static void __update_extent_tree_range(struct inode *inode, out_read_extent_cache: write_unlock(&et->lock);

+ if (is_inode_flag_set(inode, FI_NO_EXTENT)) + __destroy_extent_node(inode, EX_READ); + if (updated) f2fs_mark_inode_dirty_sync(inode, true); } @@ -899,10 +924,14 @@ static unsigned int __shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink list_for_each_entry_safe(et, next, &eti->zombie_list, list) { if (atomic_read(&et->node_cnt)) { write_lock(&et->lock); - node_cnt += __free_extent_tree(sbi, et); + node_cnt += __free_extent_tree(sbi, et, + nr_shrink - node_cnt - tree_cnt); write_unlock(&et->lock); } - f2fs_bug_on(sbi, atomic_read(&et->node_cnt)); + + if (atomic_read(&et->node_cnt)) + goto unlock_out; + list_del_init(&et->list); radix_tree_delete(&eti->extent_tree_root, et->ino); kmem_cache_free(extent_tree_slab, et); @@ -1041,23 +1070,6 @@ unsigned int f2fs_shrink_age_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink return __shrink_extent_tree(sbi, nr_shrink, EX_BLOCK_AGE); }

-static unsigned int __destroy_extent_node(struct inode *inode, - enum extent_type type) -{ - struct f2fs_sb_info *sbi = F2FS_I_SB(inode); - struct extent_tree *et = F2FS_I(inode)->extent_tree[type]; - unsigned int node_cnt = 0; - - if (!et || !atomic_read(&et->node_cnt)) - return 0; - - write_lock(&et->lock); - node_cnt = __free_extent_tree(sbi, et); - write_unlock(&et->lock); - - return node_cnt; -} - void f2fs_destroy_extent_node(struct inode *inode) { __destroy_extent_node(inode, EX_READ); @@ -1066,7 +1078,6 @@ void f2fs_destroy_extent_node(struct inode *inode)

static void __drop_extent_tree(struct inode *inode, enum extent_type type) { - struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct extent_tree *et = F2FS_I(inode)->extent_tree[type]; bool updated = false;

@@ -1074,7 +1085,6 @@ static void __drop_extent_tree(struct inode *inode, enum extent_type type) return;

write_lock(&et->lock); - __free_extent_tree(sbi, et); if (type == EX_READ) { set_inode_flag(inode, FI_NO_EXTENT); if (et->largest.len) { @@ -1083,6 +1093,9 @@ static void __drop_extent_tree(struct inode *inode, enum extent_type type) } } write_unlock(&et->lock); + + __destroy_extent_node(inode, type); + if (updated) f2fs_mark_inode_dirty_sync(inode, true); }

-- 2.43.0

388

days inactive

388

days old

linux-stable-mirror@lists.linaro.org

12 comments

participants

tags (0)

participants (1)

Sasha Levin