August 2024 - Linux-stable-mirror

FAILED: patch "[PATCH] ata: libata-core: Fix null pointer dereference on error" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 5d92c7c566dc76d96e0e19e481d926bbe6631c1e # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024070107-underrate-unusable-ddb9@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: 5d92c7c566dc ("ata: libata-core: Fix null pointer dereference on error") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 5d92c7c566dc76d96e0e19e481d926bbe6631c1e Mon Sep 17 00:00:00 2001 From: Niklas Cassel <cassel(a)kernel.org> Date: Sat, 29 Jun 2024 14:42:11 +0200 Subject: [PATCH] ata: libata-core: Fix null pointer dereference on error If the ata_port_alloc() call in ata_host_alloc() fails, ata_host_release() will get called. However, the code in ata_host_release() tries to free ata_port struct members unconditionally, which can lead to the following: BUG: unable to handle page fault for address: 0000000000003990 PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 10 PID: 594 Comm: (udev-worker) Not tainted 6.10.0-rc5 #44 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:ata_host_release.cold+0x2f/0x6e [libata] Code: e4 4d 63 f4 44 89 e2 48 c7 c6 90 ad 32 c0 48 c7 c7 d0 70 33 c0 49 83 c6 0e 41 RSP: 0018:ffffc90000ebb968 EFLAGS: 00010246 RAX: 0000000000000041 RBX: ffff88810fb52e78 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88813b3218c0 RDI: ffff88813b3218c0 RBP: ffff88810fb52e40 R08: 0000000000000000 R09: 6c65725f74736f68 R10: ffffc90000ebb738 R11: 73692033203a746e R12: 0000000000000004 R13: 0000000000000000 R14: 0000000000000011 R15: 0000000000000006 FS: 00007f6cc55b9980(0000) GS:ffff88813b300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000003990 CR3: 00000001122a2000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die_body.cold+0x19/0x27 ? page_fault_oops+0x15a/0x2f0 ? exc_page_fault+0x7e/0x180 ? asm_exc_page_fault+0x26/0x30 ? ata_host_release.cold+0x2f/0x6e [libata] ? ata_host_release.cold+0x2f/0x6e [libata] release_nodes+0x35/0xb0 devres_release_group+0x113/0x140 ata_host_alloc+0xed/0x120 [libata] ata_host_alloc_pinfo+0x14/0xa0 [libata] ahci_init_one+0x6c9/0xd20 [ahci] Do not access ata_port struct members unconditionally. Fixes: 633273a3ed1c ("libata-pmp: hook PMP support and enable it") Cc: stable(a)vger.kernel.org Reviewed-by: Damien Le Moal <dlemoal(a)kernel.org> Reviewed-by: Hannes Reinecke <hare(a)suse.de> Reviewed-by: John Garry <john.g.garry(a)oracle.com> Link: https://lore.kernel.org/r/20240629124210.181537-7-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel(a)kernel.org> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index efb5195da60c..bdccf4ea251a 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -5517,6 +5517,9 @@ static void ata_host_release(struct kref *kref) for (i = 0; i < host->n_ports; i++) { struct ata_port *ap = host->ports[i]; + if (!ap) + continue; + kfree(ap->pmp_link); kfree(ap->slave_link); kfree(ap->ncq_sense_buf);

10 months, 1 week

2
1
0 0

[PATCH v3] mm: Fix race between __split_huge_pmd_locked() and GUP-fast

by Ryan Roberts

__split_huge_pmd_locked() can be called for a present THP, devmap or (non-present) migration entry. It calls pmdp_invalidate() unconditionally on the pmdp and only determines if it is present or not based on the returned old pmd. This is a problem for the migration entry case because pmd_mkinvalid(), called by pmdp_invalidate() must only be called for a present pmd. On arm64 at least, pmd_mkinvalid() will mark the pmd such that any future call to pmd_present() will return true. And therefore any lockless pgtable walker could see the migration entry pmd in this state and start interpretting the fields as if it were present, leading to BadThings (TM). GUP-fast appears to be one such lockless pgtable walker. x86 does not suffer the above problem, but instead pmd_mkinvalid() will corrupt the offset field of the swap entry within the swap pte. See link below for discussion of that problem. Fix all of this by only calling pmdp_invalidate() for a present pmd. And for good measure let's add a warning to all implementations of pmdp_invalidate[_ad](). I've manually reviewed all other pmdp_invalidate[_ad]() call sites and believe all others to be conformant. This is a theoretical bug found during code review. I don't have any test case to trigger it in practice. Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/all/0dd7827a-6334-439a-8fd0-43c98e6af22b@arm.com/ Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path") Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com> --- Right v3; this goes back to the original approach in v1 to fix core-mm rather than push the fix into arm64, since we discovered that x86 can't handle pmd_mkinvalid() being called for non-present pmds either. I'm pulling in more arch maintainers because this version adds some warnings in arch code to help spot incorrect usage. Although Catalin had already accepted v2 (fixing arm64) [2] into for-next/fixes, he's agreed to either remove or revert it. Changes since v1 [1] ==================== - Improve pmdp_mkinvalid() docs to make it clear it can only be called for present pmd (per JohnH, Zi Yan) - Added warnings to arch overrides of pmdp_invalidate[_ad]() (per Zi Yan) - Moved comment next to new location of pmpd_invalidate() (per Zi Yan) [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.… [2] https://lore.kernel.org/all/20240430133138.732088-1-ryan.roberts@arm.com/ Thanks, Ryan Documentation/mm/arch_pgtable_helpers.rst | 6 ++- arch/powerpc/mm/book3s64/pgtable.c | 1 + arch/s390/include/asm/pgtable.h | 4 +- arch/sparc/mm/tlb.c | 1 + arch/x86/mm/pgtable.c | 2 + mm/huge_memory.c | 49 ++++++++++++----------- mm/pgtable-generic.c | 2 + 7 files changed, 39 insertions(+), 26 deletions(-) diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst index 2466d3363af7..ad50ca6f495e 100644 --- a/Documentation/mm/arch_pgtable_helpers.rst +++ b/Documentation/mm/arch_pgtable_helpers.rst @@ -140,7 +140,8 @@ PMD Page Table Helpers +---------------------------+--------------------------------------------------+ | pmd_swp_clear_soft_dirty | Clears a soft dirty swapped PMD | +---------------------------+--------------------------------------------------+ -| pmd_mkinvalid | Invalidates a mapped PMD [1] | +| pmd_mkinvalid | Invalidates a present PMD; do not call for | +| | non-present PMD [1] | +---------------------------+--------------------------------------------------+ | pmd_set_huge | Creates a PMD huge mapping | +---------------------------+--------------------------------------------------+ @@ -196,7 +197,8 @@ PUD Page Table Helpers +---------------------------+--------------------------------------------------+ | pud_mkdevmap | Creates a ZONE_DEVICE mapped PUD | +---------------------------+--------------------------------------------------+ -| pud_mkinvalid | Invalidates a mapped PUD [1] | +| pud_mkinvalid | Invalidates a present PUD; do not call for | +| | non-present PUD [1] | +---------------------------+--------------------------------------------------+ | pud_set_huge | Creates a PUD huge mapping | +---------------------------+--------------------------------------------------+ diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c index 83823db3488b..2975ea0841ba 100644 --- a/arch/powerpc/mm/book3s64/pgtable.c +++ b/arch/powerpc/mm/book3s64/pgtable.c @@ -170,6 +170,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, { unsigned long old_pmd; + VM_WARN_ON_ONCE(!pmd_present(*pmdp)); old_pmd = pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, _PAGE_INVALID); flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE); return __pmd(old_pmd); diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index 60950e7a25f5..480bea44559d 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1768,8 +1768,10 @@ static inline pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma, static inline pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmdp) { - pmd_t pmd = __pmd(pmd_val(*pmdp) | _SEGMENT_ENTRY_INVALID); + pmd_t pmd; + VM_WARN_ON_ONCE(!pmd_present(*pmdp)); + pmd = __pmd(pmd_val(*pmdp) | _SEGMENT_ENTRY_INVALID); return pmdp_xchg_direct(vma->vm_mm, addr, pmdp, pmd); } diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c index b44d79d778c7..ef69127d7e5e 100644 --- a/arch/sparc/mm/tlb.c +++ b/arch/sparc/mm/tlb.c @@ -249,6 +249,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, { pmd_t old, entry; + VM_WARN_ON_ONCE(!pmd_present(*pmdp)); entry = __pmd(pmd_val(*pmdp) & ~_PAGE_VALID); old = pmdp_establish(vma, address, pmdp, entry); flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE); diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index d007591b8059..103cbccf1d7d 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -631,6 +631,8 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma, pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp) { + VM_WARN_ON_ONCE(!pmd_present(*pmdp)); + /* * No flush is necessary. Once an invalid PTE is established, the PTE's * access and dirty bits cannot be updated. diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 89f58c7603b2..dd1fc105f70b 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2493,32 +2493,11 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, return __split_huge_zero_page_pmd(vma, haddr, pmd); } - /* - * Up to this point the pmd is present and huge and userland has the - * whole access to the hugepage during the split (which happens in - * place). If we overwrite the pmd with the not-huge version pointing - * to the pte here (which of course we could if all CPUs were bug - * free), userland could trigger a small page size TLB miss on the - * small sized TLB while the hugepage TLB entry is still established in - * the huge TLB. Some CPU doesn't like that. - * See http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum - * 383 on page 105. Intel should be safe but is also warns that it's - * only safe if the permission and cache attributes of the two entries - * loaded in the two TLB is identical (which should be the case here). - * But it is generally safer to never allow small and huge TLB entries - * for the same virtual address to be loaded simultaneously. So instead - * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the - * current pmd notpresent (atomically because here the pmd_trans_huge - * must remain set at all times on the pmd until the split is complete - * for this pmd), then we flush the SMP TLB and finally we write the - * non-huge version of the pmd entry with pmd_populate. - */ - old_pmd = pmdp_invalidate(vma, haddr, pmd); - - pmd_migration = is_pmd_migration_entry(old_pmd); + pmd_migration = is_pmd_migration_entry(*pmd); if (unlikely(pmd_migration)) { swp_entry_t entry; + old_pmd = *pmd; entry = pmd_to_swp_entry(old_pmd); page = pfn_swap_entry_to_page(entry); write = is_writable_migration_entry(entry); @@ -2529,6 +2508,30 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, soft_dirty = pmd_swp_soft_dirty(old_pmd); uffd_wp = pmd_swp_uffd_wp(old_pmd); } else { + /* + * Up to this point the pmd is present and huge and userland has + * the whole access to the hugepage during the split (which + * happens in place). If we overwrite the pmd with the not-huge + * version pointing to the pte here (which of course we could if + * all CPUs were bug free), userland could trigger a small page + * size TLB miss on the small sized TLB while the hugepage TLB + * entry is still established in the huge TLB. Some CPU doesn't + * like that. See + * http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum + * 383 on page 105. Intel should be safe but is also warns that + * it's only safe if the permission and cache attributes of the + * two entries loaded in the two TLB is identical (which should + * be the case here). But it is generally safer to never allow + * small and huge TLB entries for the same virtual address to be + * loaded simultaneously. So instead of doing "pmd_populate(); + * flush_pmd_tlb_range();" we first mark the current pmd + * notpresent (atomically because here the pmd_trans_huge must + * remain set at all times on the pmd until the split is + * complete for this pmd), then we flush the SMP TLB and finally + * we write the non-huge version of the pmd entry with + * pmd_populate. + */ + old_pmd = pmdp_invalidate(vma, haddr, pmd); page = pmd_page(old_pmd); folio = page_folio(page); if (pmd_dirty(old_pmd)) { diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index 4fcd959dcc4d..a78a4adf711a 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -198,6 +198,7 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp) pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp) { + VM_WARN_ON_ONCE(!pmd_present(*pmdp)); pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp)); flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE); return old; @@ -208,6 +209,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp) { + VM_WARN_ON_ONCE(!pmd_present(*pmdp)); return pmdp_invalidate(vma, address, pmdp); } #endif -- 2.25.1

10 months, 1 week

5
8
0 0

[PATCH] MIPS: fw: Gracefully handle unknown firmware protocols

by Bjørn Mork

Boards based on the same SoC family can use different boot loaders. These may pass numeric arguments which we erroneously interpret as command line or environment pointers. Such errors will cause boot to halt at an early stage since commit 056a68cea01e ("mips: allow firmware to pass RNG seed to kernel"). One known example of this issue is a HPE switch using a BootWare boot loader. It was found to pass these arguments to the kernel: 0x00020000 0x00060000 0xfffdffff 0x0000416c We can avoid hanging by validating that both passed pointers are in KSEG1 as expected. Cc: stable(a)vger.kernel.org Fixes: 14aecdd41921 ("MIPS: FW: Add environment variable processing.") Signed-off-by: Bjørn Mork <bjorn(a)mork.no> --- arch/mips/fw/lib/cmdline.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/mips/fw/lib/cmdline.c b/arch/mips/fw/lib/cmdline.c index 892765b742bb..51238c4f9455 100644 --- a/arch/mips/fw/lib/cmdline.c +++ b/arch/mips/fw/lib/cmdline.c @@ -22,7 +22,7 @@ void __init fw_init_cmdline(void) int i; /* Validate command line parameters. */ - if ((fw_arg0 >= CKSEG0) || (fw_arg1 < CKSEG0)) { + if (fw_arg0 >= CKSEG0 || fw_arg1 < CKSEG0 || fw_arg1 >= CKSEG2) { fw_argc = 0; _fw_argv = NULL; } else { @@ -31,7 +31,7 @@ void __init fw_init_cmdline(void) } /* Validate environment pointer. */ - if (fw_arg2 < CKSEG0) + if (fw_arg2 < CKSEG0 || fw_arg2 >= CKSEG2) _fw_envp = NULL; else _fw_envp = (int *)fw_arg2; -- 2.39.2

10 months, 1 week

4
9
0 0

[PATCH] btrfs: qgroup: add missing extent changeset release

by Fedor Pchelkin

The extent changeset may have some additional memory dynamically allocated for ulist in result of clear_record_extent_bits() execution. Release it after the local changeset is no longer needed in BTRFS_QGROUP_MODE_DISABLED case. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Reported-by: syzbot+81670362c283f3dd889c(a)syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/000000000000aa8c0c060ade165e@google.com Fixes: af0e2aab3b70 ("btrfs: qgroup: flush reservations during quota disable") Cc: stable(a)vger.kernel.org # 6.10+ Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru> --- fs/btrfs/qgroup.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 5d57a285d59b..4f1fa5d427e1 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -4345,9 +4345,10 @@ static int __btrfs_qgroup_release_data(struct btrfs_inode *inode, if (btrfs_qgroup_mode(inode->root->fs_info) == BTRFS_QGROUP_MODE_DISABLED) { extent_changeset_init(&changeset); - return clear_record_extent_bits(&inode->io_tree, start, - start + len - 1, - EXTENT_QGROUP_RESERVED, &changeset); + ret = clear_record_extent_bits(&inode->io_tree, start, + start + len - 1, + EXTENT_QGROUP_RESERVED, &changeset); + goto out; } /* In release case, we shouldn't have @reserved */ -- 2.39.2

10 months, 1 week

4
8
0 0

[PATCH v3] firmware_loader: Block path traversal

by Jann Horn

Most firmware names are hardcoded strings, or are constructed from fairly constrained format strings where the dynamic parts are just some hex numbers or such. However, there are a couple codepaths in the kernel where firmware file names contain string components that are passed through from a device or semi-privileged userspace; the ones I could find (not counting interfaces that require root privileges) are: - lpfc_sli4_request_firmware_update() seems to construct the firmware filename from "ModelName", a string that was previously parsed out of some descriptor ("Vital Product Data") in lpfc_fill_vpd() - nfp_net_fw_find() seems to construct a firmware filename from a model name coming from nfp_hwinfo_lookup(pf->hwinfo, "nffw.partno"), which I think parses some descriptor that was read from the device. (But this case likely isn't exploitable because the format string looks like "netronome/nic_%s", and there shouldn't be any *folders* starting with "netronome/nic_". The previous case was different because there, the "%s" is *at the start* of the format string.) - module_flash_fw_schedule() is reachable from the ETHTOOL_MSG_MODULE_FW_FLASH_ACT netlink command, which is marked as GENL_UNS_ADMIN_PERM (meaning CAP_NET_ADMIN inside a user namespace is enough to pass the privilege check), and takes a userspace-provided firmware name. (But I think to reach this case, you need to have CAP_NET_ADMIN over a network namespace that a special kind of ethernet device is mapped into, so I think this is not a viable attack path in practice.) Fix it by rejecting any firmware names containing ".." path components. For what it's worth, I went looking and haven't found any USB device drivers that use the firmware loader dangerously. Cc: stable(a)vger.kernel.org Reviewed-by: Danilo Krummrich <dakr(a)kernel.org> Fixes: abb139e75c2c ("firmware: teach the kernel to load firmware files directly from the filesystem") Signed-off-by: Jann Horn <jannh(a)google.com> --- Changes in v3: - replace name_contains_dotdot implementation (Danilo) - add missing \n in log format string (Danilo) - Link to v2: https://lore.kernel.org/r/20240823-firmware-traversal-v2-1-880082882709@goo… Changes in v2: - describe fix in commit message (dakr) - write check more clearly and with comment in separate helper (dakr) - document new restriction in comment above request_firmware() (dakr) - warn when new restriction is triggered - Link to v1: https://lore.kernel.org/r/20240820-firmware-traversal-v1-1-8699ffaa9276@goo… --- drivers/base/firmware_loader/main.c | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c index a03ee4b11134..324a9a3c087a 100644 --- a/drivers/base/firmware_loader/main.c +++ b/drivers/base/firmware_loader/main.c @@ -849,6 +849,26 @@ static void fw_log_firmware_info(const struct firmware *fw, const char *name, {} #endif +/* + * Reject firmware file names with ".." path components. + * There are drivers that construct firmware file names from device-supplied + * strings, and we don't want some device to be able to tell us "I would like to + * be sent my firmware from ../../../etc/shadow, please". + * + * Search for ".." surrounded by either '/' or start/end of string. + * + * This intentionally only looks at the firmware name, not at the firmware base + * directory or at symlink contents. + */ +static bool name_contains_dotdot(const char *name) +{ + size_t name_len = strlen(name); + + return strcmp(name, "..") == 0 || strncmp(name, "../", 3) == 0 || + strstr(name, "/../") != NULL || + (name_len >= 3 && strcmp(name+name_len-3, "/..") == 0); +} + /* called from request_firmware() and request_firmware_work_func() */ static int _request_firmware(const struct firmware **firmware_p, const char *name, @@ -869,6 +889,14 @@ _request_firmware(const struct firmware **firmware_p, const char *name, goto out; } + if (name_contains_dotdot(name)) { + dev_warn(device, + "Firmware load for '%s' refused, path contains '..' component\n", + name); + ret = -EINVAL; + goto out; + } + ret = _request_firmware_prepare(&fw, name, device, buf, size, offset, opt_flags); if (ret <= 0) /* error or already assigned */ @@ -946,6 +974,8 @@ _request_firmware(const struct firmware **firmware_p, const char *name, * @name will be used as $FIRMWARE in the uevent environment and * should be distinctive enough not to be confused with any other * firmware image for this or any other device. + * It must not contain any ".." path components - "foo/bar..bin" is + * allowed, but "foo/../bar.bin" is not. * * Caller must hold the reference count of @device. * --- base-commit: b0da640826ba3b6506b4996a6b23a429235e6923 change-id: 20240820-firmware-traversal-6df8501b0fe4 -- Jann Horn <jannh(a)google.com>

10 months, 1 week

2
1
0 0

[PATCH 1/2] drm/v3d: Disable preemption while updating GPU stats

by Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> We forgot to disable preemption around the write_seqcount_begin/end() pair while updating GPU stats: [ ] WARNING: CPU: 2 PID: 12 at include/linux/seqlock.h:221 __seqprop_assert.isra.0+0x128/0x150 [v3d] [ ] Workqueue: v3d_bin drm_sched_run_job_work [gpu_sched] <...snip...> [ ] Call trace: [ ] __seqprop_assert.isra.0+0x128/0x150 [v3d] [ ] v3d_job_start_stats.isra.0+0x90/0x218 [v3d] [ ] v3d_bin_job_run+0x23c/0x388 [v3d] [ ] drm_sched_run_job_work+0x520/0x6d0 [gpu_sched] [ ] process_one_work+0x62c/0xb48 [ ] worker_thread+0x468/0x5b0 [ ] kthread+0x1c4/0x1e0 [ ] ret_from_fork+0x10/0x20 Fix it. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> Fixes: 6abe93b621ab ("drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler") Cc: Maíra Canal <mcanal(a)igalia.com> Cc: <stable(a)vger.kernel.org> # v6.10+ Acked-by: Maíra Canal <mcanal(a)igalia.com> --- drivers/gpu/drm/v3d/v3d_sched.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 42d4f4a2dba2..cc2e5a89467b 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -136,6 +136,8 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue) struct v3d_stats *local_stats = &file->stats[queue]; u64 now = local_clock(); + preempt_disable(); + write_seqcount_begin(&local_stats->lock); local_stats->start_ns = now; write_seqcount_end(&local_stats->lock); @@ -143,6 +145,8 @@ v3d_job_start_stats(struct v3d_job *job, enum v3d_queue queue) write_seqcount_begin(&global_stats->lock); global_stats->start_ns = now; write_seqcount_end(&global_stats->lock); + + preempt_enable(); } static void @@ -164,8 +168,10 @@ v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue) struct v3d_stats *local_stats = &file->stats[queue]; u64 now = local_clock(); + preempt_disable(); v3d_stats_update(local_stats, now); v3d_stats_update(global_stats, now); + preempt_enable(); } static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job) -- 2.44.0

10 months, 1 week

2
1
0 0

[RESEND PATCH v1] mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0

by Hailong Liu

The __vmap_pages_range_noflush() assumes its argument pages** contains pages with the same page shift. However, since commit e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes __GFP_NOFAIL with high order in vm_area_alloc_pages() and page allocation failed for high order, the pages** may contain two different page shifts (high order and order-0). This could lead __vmap_pages_range_noflush() to perform incorrect mappings, potentially resulting in memory corruption. Users might encounter this as follows (vmap_allow_huge = true, 2M is for PMD_SIZE): kvmalloc(2M, __GFP_NOFAIL|GFP_X) __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP) vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0 vmap_pages_range() vmap_pages_range_noflush() __vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens We can remove the fallback code because if a high-order allocation fails, __vmalloc_node_range_noprof() will retry with order-0. Therefore, it is unnecessary to fallback to order-0 here. Therefore, fix this by removing the fallback code. Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations") Signed-off-by: Hailong Liu <hailong.liu(a)oppo.com> Reported-by: Tangquan Zheng <zhengtangquan(a)oppo.com> Cc: <stable(a)vger.kernel.org> CC: Barry Song <21cnbao(a)gmail.com> CC: Baoquan He <bhe(a)redhat.com> CC: Matthew Wilcox <willy(a)infradead.org> --- mm/vmalloc.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 6b783baf12a1..af2de36549d6 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3584,15 +3584,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid, page = alloc_pages_noprof(alloc_gfp, order); else page = alloc_pages_node_noprof(nid, alloc_gfp, order); - if (unlikely(!page)) { - if (!nofail) - break; - - /* fall back to the zero order allocations */ - alloc_gfp |= __GFP_NOFAIL; - order = 0; - continue; - } + if (unlikely(!page)) + break; /* * Higher order allocations must be able to be treated as --- Sorry for fat fingers. with .rej file. resend this. Baoquan suggests set page_shift to 0 if fallback in (2 and concern about performance of retry with order-0. But IMO with retry, - Save memory usage if high order allocation failed. - Keep consistancy with align and page-shift. - make use of bulk allocator with order-0 [2] https://lore.kernel.org/lkml/20240725035318.471-1-hailong.liu@oppo.com/ -- 2.30.0

10 months, 1 week

6
29
0 0

[PATCH] vfs: fix race between evice_inodes() and find_inode()&iput()

by Julian Sun

Hi, all Recently I noticed a bug[1] in btrfs, after digged it into and I believe it'a race in vfs. Let's assume there's a inode (ie ino 261) with i_count 1 is called by iput(), and there's a concurrent thread calling generic_shutdown_super(). cpu0: cpu1: iput() // i_count is 1 ->spin_lock(inode) ->dec i_count to 0 ->iput_final() generic_shutdown_super() ->__inode_add_lru() ->evict_inodes() // cause some reason[2] ->if (atomic_read(inode->i_count)) continue; // return before // inode 261 passed the above check // list_lru_add_obj() // and then schedule out ->spin_unlock() // note here: the inode 261 // was still at sb list and hash list, // and I_FREEING|I_WILL_FREE was not been set btrfs_iget() // after some function calls ->find_inode() // found the above inode 261 ->spin_lock(inode) // check I_FREEING|I_WILL_FREE // and passed ->__iget() ->spin_unlock(inode) // schedule back ->spin_lock(inode) // check (I_NEW|I_FREEING|I_WILL_FREE) flags, // passed and set I_FREEING iput() ->spin_unlock(inode) ->spin_lock(inode) ->evict() // dec i_count to 0 ->iput_final() ->spin_unlock() ->evict() Now, we have two threads simultaneously evicting the same inode, which may trigger the BUG(inode->i_state & I_CLEAR) statement both within clear_inode() and iput(). To fix the bug, recheck the inode->i_count after holding i_lock. Because in the most scenarios, the first check is valid, and the overhead of spin_lock() can be reduced. If there is any misunderstanding, please let me know, thanks. [1]: https://lore.kernel.org/linux-btrfs/000000000000eabe1d0619c48986@google.com/ [2]: The reason might be 1. SB_ACTIVE was removed or 2. mapping_shrinkable() return false when I reproduced the bug. Reported-by: syzbot+67ba3c42bcbb4665d3ad(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=67ba3c42bcbb4665d3ad CC: stable(a)vger.kernel.org Fixes: 63997e98a3be ("split invalidate_inodes()") Signed-off-by: Julian Sun <sunjunchao2870(a)gmail.com> --- fs/inode.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/inode.c b/fs/inode.c index 3a41f83a4ba5..011f630777d0 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -723,6 +723,10 @@ void evict_inodes(struct super_block *sb) continue; spin_lock(&inode->i_lock); + if (atomic_read(&inode->i_count)) { + spin_unlock(&inode->i_lock); + continue; + } if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) { spin_unlock(&inode->i_lock); continue; -- 2.39.2

10 months, 1 week

4
5
0 0

[PATCH] f2fs: Do not check the FI_DIRTY_INODE flag when umounting a ro fs.

by Julian Sun

Hi, all. Recently syzbot reported a bug as following: kernel BUG at fs/f2fs/inode.c:896! CPU: 1 UID: 0 PID: 5217 Comm: syz-executor605 Not tainted 6.11.0-rc4-syzkaller-00033-g872cf28b8df9 #0 RIP: 0010:f2fs_evict_inode+0x1598/0x15c0 fs/f2fs/inode.c:896 Call Trace: <TASK> evict+0x532/0x950 fs/inode.c:704 dispose_list fs/inode.c:747 [inline] evict_inodes+0x5f9/0x690 fs/inode.c:797 generic_shutdown_super+0x9d/0x2d0 fs/super.c:627 kill_block_super+0x44/0x90 fs/super.c:1696 kill_f2fs_super+0x344/0x690 fs/f2fs/super.c:4898 deactivate_locked_super+0xc4/0x130 fs/super.c:473 cleanup_mnt+0x41f/0x4b0 fs/namespace.c:1373 task_work_run+0x24f/0x310 kernel/task_work.c:228 ptrace_notify+0x2d2/0x380 kernel/signal.c:2402 ptrace_report_syscall include/linux/ptrace.h:415 [inline] ptrace_report_syscall_exit include/linux/ptrace.h:477 [inline] syscall_exit_work+0xc6/0x190 kernel/entry/common.c:173 syscall_exit_to_user_mode_prepare kernel/entry/common.c:200 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:205 [inline] syscall_exit_to_user_mode+0x279/0x370 kernel/entry/common.c:218 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f The syzbot constructed the following scenario: concurrently creating directories and setting the file system to read-only. In this case, while f2fs was making dir, the filesystem switched to readonly, and when it tried to clear the dirty flag, it triggered this code path: f2fs_mkdir()-> f2fs_sync_fs()->f2fs_write_checkpoint() ->f2fs_readonly(). This resulted FI_DIRTY_INODE flag not being cleared, which eventually led to a bug being triggered during the FI_DIRTY_INODE check in f2fs_evict_inode(). In this case, we cannot do anything further, so if filesystem is readonly, do not trigger the BUG. Instead, clean up resources to the best of our ability to prevent triggering subsequent resource leak checks. If there is anything important I'm missing, please let me know, thanks. Reported-by: syzbot+ebea2790904673d7c618(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ebea2790904673d7c618 Fixes: ca7d802a7d8e ("f2fs: detect dirty inode in evict_inode") CC: stable(a)vger.kernel.org Signed-off-by: Julian Sun <sunjunchao2870(a)gmail.com> --- fs/f2fs/inode.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index aef57172014f..52d273383ec2 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -892,8 +892,12 @@ void f2fs_evict_inode(struct inode *inode) atomic_read(&fi->i_compr_blocks)); if (likely(!f2fs_cp_error(sbi) && - !is_sbi_flag_set(sbi, SBI_CP_DISABLED))) - f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE)); + !is_sbi_flag_set(sbi, SBI_CP_DISABLED))) { + if (!f2fs_readonly(sbi->sb)) + f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE)); + else + f2fs_inode_synced(inode); + } else f2fs_inode_synced(inode); -- 2.39.2

10 months, 1 week

2
2
0 0

[PATCH] remoteproc: k3-r5: Fix error handling when power-up failed

by Jan Kiszka

From: Jan Kiszka <jan.kiszka(a)siemens.com> By simply bailing out, the driver was violating its rule and internal assumptions that either both or no rproc should be initialized. E.g., this could cause the first core to be available but not the second one, leading to crashes on its shutdown later on while trying to dereference that second instance. Fixes: 61f6f68447ab ("remoteproc: k3-r5: Wait for core0 power-up before powering up core1") Signed-off-by: Jan Kiszka <jan.kiszka(a)siemens.com> --- drivers/remoteproc/ti_k3_r5_remoteproc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c index 39a47540c590..eb09d2e9b32a 100644 --- a/drivers/remoteproc/ti_k3_r5_remoteproc.c +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c @@ -1332,7 +1332,7 @@ static int k3_r5_cluster_rproc_init(struct platform_device *pdev) dev_err(dev, "Timed out waiting for %s core to power up!\n", rproc->name); - return ret; + goto err_powerup; } } @@ -1348,6 +1348,7 @@ static int k3_r5_cluster_rproc_init(struct platform_device *pdev) } } +err_powerup: rproc_del(rproc); err_add: k3_r5_reserved_mem_exit(kproc); -- 2.43.0

10 months, 1 week

4
8
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror August 2024