Boards based on the same SoC family can use different boot loaders.
These may pass numeric arguments which we erroneously interpret as
command line or environment pointers. Such errors will cause boot
to halt at an early stage since commit 056a68cea01e ("mips: allow
firmware to pass RNG seed to kernel").
One known example of this issue is a HPE switch using a BootWare
boot loader. It was found to pass these arguments to the kernel:
0x00020000 0x00060000 0xfffdffff 0x0000416c
We can avoid hanging by validating that both passed pointers are in
KSEG1 as expected.
Cc: stable(a)vger.kernel.org
Fixes: 14aecdd41921 ("MIPS: FW: Add environment variable processing.")
Signed-off-by: Bjørn Mork <bjorn(a)mork.no>
---
arch/mips/fw/lib/cmdline.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/mips/fw/lib/cmdline.c b/arch/mips/fw/lib/cmdline.c
index 892765b742bb..51238c4f9455 100644
--- a/arch/mips/fw/lib/cmdline.c
+++ b/arch/mips/fw/lib/cmdline.c
@@ -22,7 +22,7 @@ void __init fw_init_cmdline(void)
int i;
/* Validate command line parameters. */
- if ((fw_arg0 >= CKSEG0) || (fw_arg1 < CKSEG0)) {
+ if (fw_arg0 >= CKSEG0 || fw_arg1 < CKSEG0 || fw_arg1 >= CKSEG2) {
fw_argc = 0;
_fw_argv = NULL;
} else {
@@ -31,7 +31,7 @@ void __init fw_init_cmdline(void)
}
/* Validate environment pointer. */
- if (fw_arg2 < CKSEG0)
+ if (fw_arg2 < CKSEG0 || fw_arg2 >= CKSEG2)
_fw_envp = NULL;
else
_fw_envp = (int *)fw_arg2;
--
2.39.2
The extent changeset may have some additional memory dynamically allocated
for ulist in result of clear_record_extent_bits() execution.
Release it after the local changeset is no longer needed in
BTRFS_QGROUP_MODE_DISABLED case.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Reported-by: syzbot+81670362c283f3dd889c(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/000000000000aa8c0c060ade165e@google.com
Fixes: af0e2aab3b70 ("btrfs: qgroup: flush reservations during quota disable")
Cc: stable(a)vger.kernel.org # 6.10+
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
fs/btrfs/qgroup.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 5d57a285d59b..4f1fa5d427e1 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -4345,9 +4345,10 @@ static int __btrfs_qgroup_release_data(struct btrfs_inode *inode,
if (btrfs_qgroup_mode(inode->root->fs_info) == BTRFS_QGROUP_MODE_DISABLED) {
extent_changeset_init(&changeset);
- return clear_record_extent_bits(&inode->io_tree, start,
- start + len - 1,
- EXTENT_QGROUP_RESERVED, &changeset);
+ ret = clear_record_extent_bits(&inode->io_tree, start,
+ start + len - 1,
+ EXTENT_QGROUP_RESERVED, &changeset);
+ goto out;
}
/* In release case, we shouldn't have @reserved */
--
2.39.2
Most firmware names are hardcoded strings, or are constructed from fairly
constrained format strings where the dynamic parts are just some hex
numbers or such.
However, there are a couple codepaths in the kernel where firmware file
names contain string components that are passed through from a device or
semi-privileged userspace; the ones I could find (not counting interfaces
that require root privileges) are:
- lpfc_sli4_request_firmware_update() seems to construct the firmware
filename from "ModelName", a string that was previously parsed out of
some descriptor ("Vital Product Data") in lpfc_fill_vpd()
- nfp_net_fw_find() seems to construct a firmware filename from a model
name coming from nfp_hwinfo_lookup(pf->hwinfo, "nffw.partno"), which I
think parses some descriptor that was read from the device.
(But this case likely isn't exploitable because the format string looks
like "netronome/nic_%s", and there shouldn't be any *folders* starting
with "netronome/nic_". The previous case was different because there,
the "%s" is *at the start* of the format string.)
- module_flash_fw_schedule() is reachable from the
ETHTOOL_MSG_MODULE_FW_FLASH_ACT netlink command, which is marked as
GENL_UNS_ADMIN_PERM (meaning CAP_NET_ADMIN inside a user namespace is
enough to pass the privilege check), and takes a userspace-provided
firmware name.
(But I think to reach this case, you need to have CAP_NET_ADMIN over a
network namespace that a special kind of ethernet device is mapped into,
so I think this is not a viable attack path in practice.)
Fix it by rejecting any firmware names containing ".." path components.
For what it's worth, I went looking and haven't found any USB device
drivers that use the firmware loader dangerously.
Cc: stable(a)vger.kernel.org
Reviewed-by: Danilo Krummrich <dakr(a)kernel.org>
Fixes: abb139e75c2c ("firmware: teach the kernel to load firmware files directly from the filesystem")
Signed-off-by: Jann Horn <jannh(a)google.com>
---
Changes in v3:
- replace name_contains_dotdot implementation (Danilo)
- add missing \n in log format string (Danilo)
- Link to v2: https://lore.kernel.org/r/20240823-firmware-traversal-v2-1-880082882709@goo…
Changes in v2:
- describe fix in commit message (dakr)
- write check more clearly and with comment in separate helper (dakr)
- document new restriction in comment above request_firmware() (dakr)
- warn when new restriction is triggered
- Link to v1: https://lore.kernel.org/r/20240820-firmware-traversal-v1-1-8699ffaa9276@goo…
---
drivers/base/firmware_loader/main.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c
index a03ee4b11134..324a9a3c087a 100644
--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -849,6 +849,26 @@ static void fw_log_firmware_info(const struct firmware *fw, const char *name,
{}
#endif
+/*
+ * Reject firmware file names with ".." path components.
+ * There are drivers that construct firmware file names from device-supplied
+ * strings, and we don't want some device to be able to tell us "I would like to
+ * be sent my firmware from ../../../etc/shadow, please".
+ *
+ * Search for ".." surrounded by either '/' or start/end of string.
+ *
+ * This intentionally only looks at the firmware name, not at the firmware base
+ * directory or at symlink contents.
+ */
+static bool name_contains_dotdot(const char *name)
+{
+ size_t name_len = strlen(name);
+
+ return strcmp(name, "..") == 0 || strncmp(name, "../", 3) == 0 ||
+ strstr(name, "/../") != NULL ||
+ (name_len >= 3 && strcmp(name+name_len-3, "/..") == 0);
+}
+
/* called from request_firmware() and request_firmware_work_func() */
static int
_request_firmware(const struct firmware **firmware_p, const char *name,
@@ -869,6 +889,14 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
goto out;
}
+ if (name_contains_dotdot(name)) {
+ dev_warn(device,
+ "Firmware load for '%s' refused, path contains '..' component\n",
+ name);
+ ret = -EINVAL;
+ goto out;
+ }
+
ret = _request_firmware_prepare(&fw, name, device, buf, size,
offset, opt_flags);
if (ret <= 0) /* error or already assigned */
@@ -946,6 +974,8 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
* @name will be used as $FIRMWARE in the uevent environment and
* should be distinctive enough not to be confused with any other
* firmware image for this or any other device.
+ * It must not contain any ".." path components - "foo/bar..bin" is
+ * allowed, but "foo/../bar.bin" is not.
*
* Caller must hold the reference count of @device.
*
---
base-commit: b0da640826ba3b6506b4996a6b23a429235e6923
change-id: 20240820-firmware-traversal-6df8501b0fe4
--
Jann Horn <jannh(a)google.com>
The __vmap_pages_range_noflush() assumes its argument pages** contains
pages with the same page shift. However, since commit e9c3cda4d86e
("mm, vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags
includes __GFP_NOFAIL with high order in vm_area_alloc_pages()
and page allocation failed for high order, the pages** may contain
two different page shifts (high order and order-0). This could
lead __vmap_pages_range_noflush() to perform incorrect mappings,
potentially resulting in memory corruption.
Users might encounter this as follows (vmap_allow_huge = true, 2M is for PMD_SIZE):
kvmalloc(2M, __GFP_NOFAIL|GFP_X)
__vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP)
vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0
vmap_pages_range()
vmap_pages_range_noflush()
__vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens
We can remove the fallback code because if a high-order
allocation fails, __vmalloc_node_range_noprof() will retry with
order-0. Therefore, it is unnecessary to fallback to order-0
here. Therefore, fix this by removing the fallback code.
Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")
Signed-off-by: Hailong Liu <hailong.liu(a)oppo.com>
Reported-by: Tangquan Zheng <zhengtangquan(a)oppo.com>
Cc: <stable(a)vger.kernel.org>
CC: Barry Song <21cnbao(a)gmail.com>
CC: Baoquan He <bhe(a)redhat.com>
CC: Matthew Wilcox <willy(a)infradead.org>
---
mm/vmalloc.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6b783baf12a1..af2de36549d6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3584,15 +3584,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
page = alloc_pages_noprof(alloc_gfp, order);
else
page = alloc_pages_node_noprof(nid, alloc_gfp, order);
- if (unlikely(!page)) {
- if (!nofail)
- break;
-
- /* fall back to the zero order allocations */
- alloc_gfp |= __GFP_NOFAIL;
- order = 0;
- continue;
- }
+ if (unlikely(!page))
+ break;
/*
* Higher order allocations must be able to be treated as
---
Sorry for fat fingers. with .rej file. resend this.
Baoquan suggests set page_shift to 0 if fallback in (2 and concern about
performance of retry with order-0. But IMO with retry,
- Save memory usage if high order allocation failed.
- Keep consistancy with align and page-shift.
- make use of bulk allocator with order-0
[2] https://lore.kernel.org/lkml/20240725035318.471-1-hailong.liu@oppo.com/
--
2.30.0
Hi, all
Recently I noticed a bug[1] in btrfs, after digged it into
and I believe it'a race in vfs.
Let's assume there's a inode (ie ino 261) with i_count 1 is
called by iput(), and there's a concurrent thread calling
generic_shutdown_super().
cpu0: cpu1:
iput() // i_count is 1
->spin_lock(inode)
->dec i_count to 0
->iput_final() generic_shutdown_super()
->__inode_add_lru() ->evict_inodes()
// cause some reason[2] ->if (atomic_read(inode->i_count)) continue;
// return before // inode 261 passed the above check
// list_lru_add_obj() // and then schedule out
->spin_unlock()
// note here: the inode 261
// was still at sb list and hash list,
// and I_FREEING|I_WILL_FREE was not been set
btrfs_iget()
// after some function calls
->find_inode()
// found the above inode 261
->spin_lock(inode)
// check I_FREEING|I_WILL_FREE
// and passed
->__iget()
->spin_unlock(inode) // schedule back
->spin_lock(inode)
// check (I_NEW|I_FREEING|I_WILL_FREE) flags,
// passed and set I_FREEING
iput() ->spin_unlock(inode)
->spin_lock(inode) ->evict()
// dec i_count to 0
->iput_final()
->spin_unlock()
->evict()
Now, we have two threads simultaneously evicting
the same inode, which may trigger the BUG(inode->i_state & I_CLEAR)
statement both within clear_inode() and iput().
To fix the bug, recheck the inode->i_count after holding i_lock.
Because in the most scenarios, the first check is valid, and
the overhead of spin_lock() can be reduced.
If there is any misunderstanding, please let me know, thanks.
[1]: https://lore.kernel.org/linux-btrfs/000000000000eabe1d0619c48986@google.com/
[2]: The reason might be 1. SB_ACTIVE was removed or 2. mapping_shrinkable()
return false when I reproduced the bug.
Reported-by: syzbot+67ba3c42bcbb4665d3ad(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=67ba3c42bcbb4665d3ad
CC: stable(a)vger.kernel.org
Fixes: 63997e98a3be ("split invalidate_inodes()")
Signed-off-by: Julian Sun <sunjunchao2870(a)gmail.com>
---
fs/inode.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/inode.c b/fs/inode.c
index 3a41f83a4ba5..011f630777d0 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -723,6 +723,10 @@ void evict_inodes(struct super_block *sb)
continue;
spin_lock(&inode->i_lock);
+ if (atomic_read(&inode->i_count)) {
+ spin_unlock(&inode->i_lock);
+ continue;
+ }
if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
spin_unlock(&inode->i_lock);
continue;
--
2.39.2
Hi, all.
Recently syzbot reported a bug as following:
kernel BUG at fs/f2fs/inode.c:896!
CPU: 1 UID: 0 PID: 5217 Comm: syz-executor605 Not tainted 6.11.0-rc4-syzkaller-00033-g872cf28b8df9 #0
RIP: 0010:f2fs_evict_inode+0x1598/0x15c0 fs/f2fs/inode.c:896
Call Trace:
<TASK>
evict+0x532/0x950 fs/inode.c:704
dispose_list fs/inode.c:747 [inline]
evict_inodes+0x5f9/0x690 fs/inode.c:797
generic_shutdown_super+0x9d/0x2d0 fs/super.c:627
kill_block_super+0x44/0x90 fs/super.c:1696
kill_f2fs_super+0x344/0x690 fs/f2fs/super.c:4898
deactivate_locked_super+0xc4/0x130 fs/super.c:473
cleanup_mnt+0x41f/0x4b0 fs/namespace.c:1373
task_work_run+0x24f/0x310 kernel/task_work.c:228
ptrace_notify+0x2d2/0x380 kernel/signal.c:2402
ptrace_report_syscall include/linux/ptrace.h:415 [inline]
ptrace_report_syscall_exit include/linux/ptrace.h:477 [inline]
syscall_exit_work+0xc6/0x190 kernel/entry/common.c:173
syscall_exit_to_user_mode_prepare kernel/entry/common.c:200 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:205 [inline]
syscall_exit_to_user_mode+0x279/0x370 kernel/entry/common.c:218
do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The syzbot constructed the following scenario: concurrently
creating directories and setting the file system to read-only.
In this case, while f2fs was making dir, the filesystem switched to
readonly, and when it tried to clear the dirty flag, it triggered this
code path: f2fs_mkdir()-> f2fs_sync_fs()->f2fs_write_checkpoint()
->f2fs_readonly(). This resulted FI_DIRTY_INODE flag not being cleared,
which eventually led to a bug being triggered during the FI_DIRTY_INODE
check in f2fs_evict_inode().
In this case, we cannot do anything further, so if filesystem is readonly,
do not trigger the BUG. Instead, clean up resources to the best of our
ability to prevent triggering subsequent resource leak checks.
If there is anything important I'm missing, please let me know, thanks.
Reported-by: syzbot+ebea2790904673d7c618(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ebea2790904673d7c618
Fixes: ca7d802a7d8e ("f2fs: detect dirty inode in evict_inode")
CC: stable(a)vger.kernel.org
Signed-off-by: Julian Sun <sunjunchao2870(a)gmail.com>
---
fs/f2fs/inode.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index aef57172014f..52d273383ec2 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -892,8 +892,12 @@ void f2fs_evict_inode(struct inode *inode)
atomic_read(&fi->i_compr_blocks));
if (likely(!f2fs_cp_error(sbi) &&
- !is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
- f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE));
+ !is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
+ if (!f2fs_readonly(sbi->sb))
+ f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE));
+ else
+ f2fs_inode_synced(inode);
+ }
else
f2fs_inode_synced(inode);
--
2.39.2
From: Jan Kiszka <jan.kiszka(a)siemens.com>
By simply bailing out, the driver was violating its rule and internal
assumptions that either both or no rproc should be initialized. E.g.,
this could cause the first core to be available but not the second one,
leading to crashes on its shutdown later on while trying to dereference
that second instance.
Fixes: 61f6f68447ab ("remoteproc: k3-r5: Wait for core0 power-up before powering up core1")
Signed-off-by: Jan Kiszka <jan.kiszka(a)siemens.com>
---
drivers/remoteproc/ti_k3_r5_remoteproc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c
index 39a47540c590..eb09d2e9b32a 100644
--- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
+++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
@@ -1332,7 +1332,7 @@ static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
dev_err(dev,
"Timed out waiting for %s core to power up!\n",
rproc->name);
- return ret;
+ goto err_powerup;
}
}
@@ -1348,6 +1348,7 @@ static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
}
}
+err_powerup:
rproc_del(rproc);
err_add:
k3_r5_reserved_mem_exit(kproc);
--
2.43.0
The TDG_VM_WR TDCALL is used to ask the TDX module to change some
TD-specific VM configuration. There is currently only one user in the
kernel of this TDCALL leaf. More will be added shortly.
Refactor to make way for more users of TDG_VM_WR who will need to modify
other TD configuration values.
Add a wrapper for the TDG_VM_RD TDCALL that requests TD-specific
metadata from the TDX module. There are currently no users for
TDG_VM_RD. Mark it as __maybe_unused until the first user appears.
This is preparation for enumeration and enabling optional TD features.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reviewed-by: Kai Huang <kai.huang(a)intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/coco/tdx/tdx.c | 32 ++++++++++++++++++++++++++-----
arch/x86/include/asm/shared/tdx.h | 1 +
2 files changed, 28 insertions(+), 5 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 078e2bac2553..64717a96a936 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -77,6 +77,32 @@ static inline void tdcall(u64 fn, struct tdx_module_args *args)
panic("TDCALL %lld failed (Buggy TDX module!)\n", fn);
}
+/* Read TD-scoped metadata */
+static inline u64 __maybe_unused tdg_vm_rd(u64 field, u64 *value)
+{
+ struct tdx_module_args args = {
+ .rdx = field,
+ };
+ u64 ret;
+
+ ret = __tdcall_ret(TDG_VM_RD, &args);
+ *value = args.r8;
+
+ return ret;
+}
+
+/* Write TD-scoped metadata */
+static inline u64 tdg_vm_wr(u64 field, u64 value, u64 mask)
+{
+ struct tdx_module_args args = {
+ .rdx = field,
+ .r8 = value,
+ .r9 = mask,
+ };
+
+ return __tdcall(TDG_VM_WR, &args);
+}
+
/**
* tdx_mcall_get_report0() - Wrapper to get TDREPORT0 (a.k.a. TDREPORT
* subtype 0) using TDG.MR.REPORT TDCALL.
@@ -924,10 +950,6 @@ static void tdx_kexec_finish(void)
void __init tdx_early_init(void)
{
- struct tdx_module_args args = {
- .rdx = TDCS_NOTIFY_ENABLES,
- .r9 = -1ULL,
- };
u64 cc_mask;
u32 eax, sig[3];
@@ -946,7 +968,7 @@ void __init tdx_early_init(void)
cc_set_mask(cc_mask);
/* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
- tdcall(TDG_VM_WR, &args);
+ tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL);
/*
* All bits above GPA width are reserved and kernel treats shared bit
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index fdfd41511b02..7e12cfa28bec 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -16,6 +16,7 @@
#define TDG_VP_VEINFO_GET 3
#define TDG_MR_REPORT 4
#define TDG_MEM_PAGE_ACCEPT 6
+#define TDG_VM_RD 7
#define TDG_VM_WR 8
/* TDCS fields. To be used by TDG.VM.WR and TDG.VM.RD module calls */
--
2.45.2