January 2022 - Linux-stable-mirror

[PATCH] gpio: Revert regression in sysfs-gpio (gpiolib.c)

by Marcelo Roberto Jimenez

Some GPIO lines have stopped working after the patch commit 2ab73c6d8323f ("gpio: Support GPIO controllers without pin-ranges") And this has supposedly been fixed in the following patches commit 89ad556b7f96a ("gpio: Avoid using pin ranges with !PINCTRL") commit 6dbbf84603961 ("gpiolib: Don't free if pin ranges are not defined") But an erratic behavior where some GPIO lines work while others do not work has been introduced. This patch reverts those changes so that the sysfs-gpio interface works properly again. Signed-off-by: Marcelo Roberto Jimenez <marcelo.jimenez(a)gmail.com> --- Hi, My system is ARM926EJ-S rev 5 (v5l) (AT91SAM9G25), the board is an ACME Systems Arietta. The system used sysfs-gpio to manage a few gpio lines, and I have noticed that some have stopped working. The test script is very simple: #! /bin/bash cd /sys/class/gpio/ echo 24 > export cd pioA24 echo out > direction echo 0 > value cat value echo 1 > value cat value echo 0 > value cat value echo 1 > value cat value cd .. echo 24 > unexport In a "good" kernel, this script outputs 0, 1, 0, 1. In a bad kernel, the output result is 1, 1, 1, 1. Also it must be possible to run this script twice without errors, that was the issue with the gpiochip_generic_free() call that had been addressed in another patch. In my system PINCTRL is automatically selected by SOC_AT91SAM9 [=y] && ARCH_AT91 [=y] && ARCH_MULTI_V5 [=y] So it is not an option to disable it to make it work. Best regards, Marcelo. drivers/gpio/gpiolib.c | 10 ---------- 1 file changed, 10 deletions(-) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index af5bb8fedfea..ac69ec8fb37a 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -1804,11 +1804,6 @@ static inline void gpiochip_irqchip_free_valid_mask(struct gpio_chip *gc) */ int gpiochip_generic_request(struct gpio_chip *gc, unsigned offset) { -#ifdef CONFIG_PINCTRL - if (list_empty(&gc->gpiodev->pin_ranges)) - return 0; -#endif - return pinctrl_gpio_request(gc->gpiodev->base + offset); } EXPORT_SYMBOL_GPL(gpiochip_generic_request); @@ -1820,11 +1815,6 @@ EXPORT_SYMBOL_GPL(gpiochip_generic_request); */ void gpiochip_generic_free(struct gpio_chip *gc, unsigned offset) { -#ifdef CONFIG_PINCTRL - if (list_empty(&gc->gpiodev->pin_ranges)) - return; -#endif - pinctrl_gpio_free(gc->gpiodev->base + offset); } EXPORT_SYMBOL_GPL(gpiochip_generic_free); -- 2.30.2

3 years, 6 months

9
27
0 0

[PATCH 5.10.y] drm/cirrus: fix a NULL vs IS_ERR() checks

by Shile Zhang

The function drm_gem_shmem_vmap can returns error pointers as well, which could cause following kernel crash: BUG: unable to handle page fault for address: fffffffffffffffc PGD 1426a12067 P4D 1426a12067 PUD 1426a14067 PMD 0 Oops: 0000 [#1] SMP NOPTI CPU: 12 PID: 3598532 Comm: stress-ng Kdump: loaded Not tainted 5.10.50.x86_64 #1 ... RIP: 0010:memcpy_toio+0x23/0x50 Code: 00 00 00 00 0f 1f 00 0f 1f 44 00 00 48 85 d2 74 28 40 f6 c7 01 75 2b 48 83 fa 01 76 06 40 f6 c7 02 75 17 48 89 d1 48 c1 e9 02 <f3> a5 f6 c2 02 74 02 66 a5 f6 c2 01 74 01 a4 c3 66 a5 48 83 ea 02 RSP: 0018:ffffafbf8a203c68 EFLAGS: 00010216 RAX: 0000000000000000 RBX: fffffffffffffffc RCX: 0000000000000200 RDX: 0000000000000800 RSI: fffffffffffffffc RDI: ffffafbf82000000 RBP: ffffafbf82000000 R08: 0000000000000002 R09: 0000000000000000 R10: 00000000000002b5 R11: 0000000000000000 R12: 0000000000000800 R13: ffff8a6801099300 R14: 0000000000000001 R15: 0000000000000300 FS: 00007f4a6bc5f740(0000) GS:ffff8a8641900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffffffffffffc CR3: 00000016d3874001 CR4: 00000000003606e0 Call Trace: drm_fb_memcpy_dstclip+0x5e/0x80 [drm_kms_helper] cirrus_fb_blit_rect.isra.0+0xb7/0xe0 [cirrus] cirrus_pipe_update+0x9f/0xa8 [cirrus] drm_atomic_helper_commit_planes+0xb8/0x220 [drm_kms_helper] drm_atomic_helper_commit_tail+0x42/0x80 [drm_kms_helper] commit_tail+0xce/0x130 [drm_kms_helper] drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper] drm_client_modeset_commit_atomic+0x1c4/0x200 [drm] drm_client_modeset_commit_locked+0x53/0x80 [drm] drm_client_modeset_commit+0x24/0x40 [drm] drm_fbdev_client_restore+0x48/0x85 [drm_kms_helper] drm_client_dev_restore+0x64/0xb0 [drm] drm_release+0xf2/0x110 [drm] __fput+0x96/0x240 task_work_run+0x5c/0x90 exit_to_user_mode_loop+0xce/0xd0 exit_to_user_mode_prepare+0x6a/0x70 syscall_exit_to_user_mode+0x12/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f4a6bd82c2b Fixes: ab3e023b1b4c9 ("drm/cirrus: rewrite and modernize driver.") CC: stable(a)vger.kernel.org Reported-by: Wen Kang <kw01107137(a)alibaba-inc.com> Signed-off-by: Shile Zhang <shile.zhang(a)linux.alibaba.com> --- drivers/gpu/drm/tiny/cirrus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/tiny/cirrus.c b/drivers/gpu/drm/tiny/cirrus.c index 744a8e337e41..d64f6bb767ee 100644 --- a/drivers/gpu/drm/tiny/cirrus.c +++ b/drivers/gpu/drm/tiny/cirrus.c @@ -323,7 +323,7 @@ static int cirrus_fb_blit_rect(struct drm_framebuffer *fb, ret = -ENOMEM; vmap = drm_gem_shmem_vmap(fb->obj[0]); - if (!vmap) + if (IS_ERR_OR_NULL(vmap)) goto out_dev_exit; if (cirrus->cpp == fb->format->cpp[0]) -- 2.33.0.rc2

3 years, 7 months

2
10
0 0

[PATCH 1/3] pwm-sun4i: convert "next_period" to local variable

by Max Kellermann

Its value is calculated in sun4i_pwm_apply() and is used only there. Cc: stable(a)vger.kernel.org Signed-off-by: Max Kellermann <max.kellermann(a)gmail.com> --- drivers/pwm/pwm-sun4i.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/pwm/pwm-sun4i.c b/drivers/pwm/pwm-sun4i.c index 91ca67651abd..c7c564a6bf36 100644 --- a/drivers/pwm/pwm-sun4i.c +++ b/drivers/pwm/pwm-sun4i.c @@ -89,7 +89,6 @@ struct sun4i_pwm_chip { void __iomem *base; spinlock_t ctrl_lock; const struct sun4i_pwm_data *data; - unsigned long next_period[2]; }; static inline struct sun4i_pwm_chip *to_sun4i_pwm_chip(struct pwm_chip *chip) @@ -237,6 +236,7 @@ static int sun4i_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, int ret; unsigned int delay_us, prescaler = 0; unsigned long now; + unsigned long next_period; bool bypass; pwm_get_state(pwm, &cstate); @@ -284,7 +284,7 @@ static int sun4i_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, val = (duty & PWM_DTY_MASK) | PWM_PRD(period); sun4i_pwm_writel(sun4i_pwm, val, PWM_CH_PRD(pwm->hwpwm)); - sun4i_pwm->next_period[pwm->hwpwm] = jiffies + + next_period = jiffies + nsecs_to_jiffies(cstate.period + 1000); if (state->polarity != PWM_POLARITY_NORMAL) @@ -306,9 +306,8 @@ static int sun4i_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, /* We need a full period to elapse before disabling the channel. */ now = jiffies; - if (time_before(now, sun4i_pwm->next_period[pwm->hwpwm])) { - delay_us = jiffies_to_usecs(sun4i_pwm->next_period[pwm->hwpwm] - - now); + if (time_before(now, next_period)) { + delay_us = jiffies_to_usecs(next_period - now); if ((delay_us / 500) > MAX_UDELAY_MS) msleep(delay_us / 1000 + 1); else -- 2.34.0

3 years, 7 months

4
11
0 0

[PATCH 01/12] kexec: Allow architecture code to opt-out at runtime

by Joerg Roedel

From: Joerg Roedel <jroedel(a)suse.de> Allow a runtime opt-out of kexec support for architecture code in case the kernel is running in an environment where kexec is not properly supported yet. This will be used on x86 when the kernel is running as an SEV-ES guest. SEV-ES guests need special handling for kexec to hand over all CPUs to the new kernel. This requires special hypervisor support and handling code in the guest which is not yet implemented. Cc: stable(a)vger.kernel.org # v5.10+ Signed-off-by: Joerg Roedel <jroedel(a)suse.de> --- include/linux/kexec.h | 1 + kernel/kexec.c | 14 ++++++++++++++ kernel/kexec_file.c | 9 +++++++++ 3 files changed, 24 insertions(+) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 0c994ae37729..85c30dcd0bdc 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -201,6 +201,7 @@ int arch_kexec_kernel_verify_sig(struct kimage *image, void *buf, unsigned long buf_len); #endif int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf); +bool arch_kexec_supported(void); extern int kexec_add_buffer(struct kexec_buf *kbuf); int kexec_locate_mem_hole(struct kexec_buf *kbuf); diff --git a/kernel/kexec.c b/kernel/kexec.c index c82c6c06f051..d03134160458 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -195,11 +195,25 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, * that to happen you need to do that yourself. */ +bool __weak arch_kexec_supported(void) +{ + return true; +} + static inline int kexec_load_check(unsigned long nr_segments, unsigned long flags) { int result; + /* + * The architecture may support kexec in general, but the kernel could + * run in an environment where it is not (yet) possible to execute a new + * kernel. Allow the architecture code to opt-out of kexec support when + * it is running in such an environment. + */ + if (!arch_kexec_supported()) + return -ENOSYS; + /* We only trust the superuser with rebooting the system. */ if (!capable(CAP_SYS_BOOT) || kexec_load_disabled) return -EPERM; diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 33400ff051a8..96d08a512e9c 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -358,6 +358,15 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, int ret = 0, i; struct kimage **dest_image, *image; + /* + * The architecture may support kexec in general, but the kernel could + * run in an environment where it is not (yet) possible to execute a new + * kernel. Allow the architecture code to opt-out of kexec support when + * it is running in such an environment. + */ + if (!arch_kexec_supported()) + return -ENOSYS; + /* We only trust the superuser with rebooting the system. */ if (!capable(CAP_SYS_BOOT) || kexec_load_disabled) return -EPERM; -- 2.31.1

3 years, 7 months

2
1
0 0

[PATCH AUTOSEL 5.15 01/16] remoteproc: coredump: Correct argument 2 type for memcpy_fromio

by Sasha Levin

From: Peng Fan <peng.fan(a)nxp.com> [ Upstream commit 876e0b26ccd211ca92607d83c87cc1f097784c6d ] Address the sparse check warning: >> drivers/remoteproc/remoteproc_coredump.c:169:53: sparse: warning: incorrect type in argument 2 (different address spaces) sparse: expected void const volatile [noderef] __iomem *src sparse: got void *[assigned] ptr Reported-by: kernel test robot <lkp(a)intel.com> Signed-off-by: Peng Fan <peng.fan(a)nxp.com> Link: https://lore.kernel.org/r/20211110032101.517487-1-peng.fan@oss.nxp.com Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- drivers/remoteproc/remoteproc_coredump.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/remoteproc/remoteproc_coredump.c b/drivers/remoteproc/remoteproc_coredump.c index c892f433a323e..4b093420d98aa 100644 --- a/drivers/remoteproc/remoteproc_coredump.c +++ b/drivers/remoteproc/remoteproc_coredump.c @@ -166,7 +166,7 @@ static void rproc_copy_segment(struct rproc *rproc, void *dest, memset(dest, 0xff, size); } else { if (is_iomem) - memcpy_fromio(dest, ptr, size); + memcpy_fromio(dest, (void const __iomem *)ptr, size); else memcpy(dest, ptr, size); } -- 2.34.1

3 years, 8 months

2
16
0 0

[PATCH] riscv: fix build with binutils 2.38

by Aurelien Jarno

From version 2.38, binutils default to ISA spec version 20191213. This means that the csr read/write (csrr*/csrw*) instructions and fence.i instruction has separated from the `I` extension, become two standalone extensions: Zicsr and Zifencei. As the kernel uses those instruction, this causes the following build failure: CC arch/riscv/kernel/vdso/vgettimeofday.o <<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h: Assembler messages: <<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h:71: Error: unrecognized opcode `csrr a5,0xc01' <<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h:71: Error: unrecognized opcode `csrr a5,0xc01' <<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h:71: Error: unrecognized opcode `csrr a5,0xc01' <<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h:71: Error: unrecognized opcode `csrr a5,0xc01' The fix is to specify those extensions explicitely in -march. However as older binutils version do not support this, we first need to detect that. Cc: stable(a)vger.kernel.org # 4.15+ Cc: Kito Cheng <kito.cheng(a)gmail.com> Signed-off-by: Aurelien Jarno <aurelien(a)aurel32.net> --- arch/riscv/Makefile | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile index 8a107ed18b0d..7d81102cffd4 100644 --- a/arch/riscv/Makefile +++ b/arch/riscv/Makefile @@ -50,6 +50,12 @@ riscv-march-$(CONFIG_ARCH_RV32I) := rv32ima riscv-march-$(CONFIG_ARCH_RV64I) := rv64ima riscv-march-$(CONFIG_FPU) := $(riscv-march-y)fd riscv-march-$(CONFIG_RISCV_ISA_C) := $(riscv-march-y)c + +# Newer binutils versions default to ISA spec version 20191213 which moves some +# instructions from the I extension to the Zicsr and Zifencei extensions. +toolchain-need-zicsr-zifencei := $(call cc-option-yn, -march=$(riscv-march-y)_zicsr_zifencei) +riscv-march-$(toolchain-need-zicsr-zifencei) := $(riscv-march-y)_zicsr_zifencei + KBUILD_CFLAGS += -march=$(subst fd,,$(riscv-march-y)) KBUILD_AFLAGS += -march=$(riscv-march-y) -- 2.34.1

3 years, 8 months

6
12
0 0

[PATCH] mmc: sdhci-xenon: fix 1.8v regulator stabilization

by Marcin Wojtas

From: Alex Leibovich <alexl(a)marvell.com> Automatic Clock Gating is a feature used for the power consumption optimisation. It turned out that during early init phase it may prevent the stable voltage switch to 1.8V - due to that on some platfroms an endless printout in dmesg can be observed: "mmc1: 1.8V regulator output did not became stable" Fix the problem by disabling the ACG at very beginning of the sdhci_init and let that be enabled later. Fixes: 3a3748dba881 ("mmc: sdhci-xenon: Add Marvell Xenon SDHC core functionality") Signed-off-by: Alex Leibovich <alexl(a)marvell.com> Signed-off-by: Marcin Wojtas <mw(a)semihalf.com> Cc: stable(a)vger.kernel.org --- drivers/mmc/host/sdhci-xenon.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/host/sdhci-xenon.c b/drivers/mmc/host/sdhci-xenon.c index c67611fdaa8a..4b05f6fdefb4 100644 --- a/drivers/mmc/host/sdhci-xenon.c +++ b/drivers/mmc/host/sdhci-xenon.c @@ -168,7 +168,12 @@ static void xenon_reset_exit(struct sdhci_host *host, /* Disable tuning request and auto-retuning again */ xenon_retune_setup(host); - xenon_set_acg(host, true); + /* + * The ACG should be turned off at the early init time, in order + * to solve a possile issues with the 1.8V regulator stabilization. + * The feature is enabled in later stage. + */ + xenon_set_acg(host, false); xenon_set_sdclk_off_idle(host, sdhc_id, false); -- 2.29.0

3 years, 8 months

4
10
0 0

[PATCH -next] jffs2: fix use-after-free in jffs2_clear_xattr_subsystem

by Baokun Li

When we mount a jffs2 image, assume that the first few blocks of the image are normal and contain at least one xattr-related inode, but the next block is abnormal. As a result, an error is returned in jffs2_scan_eraseblock(). jffs2_clear_xattr_subsystem() is then called in jffs2_build_filesystem() and then again in jffs2_do_fill_super(). Finally we can observe the following report: ================================================================== BUG: KASAN: use-after-free in jffs2_clear_xattr_subsystem+0x95/0x6ac Read of size 8 at addr ffff8881243384e0 by task mount/719 Call Trace: dump_stack+0x115/0x16b jffs2_clear_xattr_subsystem+0x95/0x6ac jffs2_do_fill_super+0x84f/0xc30 jffs2_fill_super+0x2ea/0x4c0 mtd_get_sb+0x254/0x400 mtd_get_sb_by_nr+0x4f/0xd0 get_tree_mtd+0x498/0x840 jffs2_get_tree+0x25/0x30 vfs_get_tree+0x8d/0x2e0 path_mount+0x50f/0x1e50 do_mount+0x107/0x130 __se_sys_mount+0x1c5/0x2f0 __x64_sys_mount+0xc7/0x160 do_syscall_64+0x45/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Allocated by task 719: kasan_save_stack+0x23/0x60 __kasan_kmalloc.constprop.0+0x10b/0x120 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc+0x1c0/0x870 jffs2_alloc_xattr_ref+0x2f/0xa0 jffs2_scan_medium.cold+0x3713/0x4794 jffs2_do_mount_fs.cold+0xa7/0x2253 jffs2_do_fill_super+0x383/0xc30 jffs2_fill_super+0x2ea/0x4c0 [...] Freed by task 719: kmem_cache_free+0xcc/0x7b0 jffs2_free_xattr_ref+0x78/0x98 jffs2_clear_xattr_subsystem+0xa1/0x6ac jffs2_do_mount_fs.cold+0x5e6/0x2253 jffs2_do_fill_super+0x383/0xc30 jffs2_fill_super+0x2ea/0x4c0 [...] The buggy address belongs to the object at ffff8881243384b8 which belongs to the cache jffs2_xattr_ref of size 48 The buggy address is located 40 bytes inside of 48-byte region [ffff8881243384b8, ffff8881243384e8) [...] ================================================================== The triggering of the BUG is shown in the following stack: ----------------------------------------------------------- jffs2_fill_super jffs2_do_fill_super jffs2_do_mount_fs jffs2_build_filesystem jffs2_scan_medium jffs2_scan_eraseblock <--- ERROR jffs2_clear_xattr_subsystem <--- free jffs2_clear_xattr_subsystem <--- free again ----------------------------------------------------------- An error is returned in jffs2_do_mount_fs(). If the error is returned by jffs2_sum_init(), the jffs2_clear_xattr_subsystem() does not need to be executed. If the error is returned by jffs2_build_filesystem(), the jffs2_clear_xattr_subsystem() also does not need to be executed again. So move jffs2_clear_xattr_subsystem() from 'out_inohash' to 'out_root' to fix this UAF problem. Fixes: aa98d7cf59b5 ("[JFFS2][XATTR] XATTR support on JFFS2 (version. 5)") Cc: stable(a)vger.kernel.org Reported-by: Hulk Robot <hulkci(a)huawei.com> Signed-off-by: Baokun Li <libaokun1(a)huawei.com> --- fs/jffs2/fs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/jffs2/fs.c b/fs/jffs2/fs.c index 2ac410477c4f..71f03a5d36ed 100644 --- a/fs/jffs2/fs.c +++ b/fs/jffs2/fs.c @@ -603,8 +603,8 @@ int jffs2_do_fill_super(struct super_block *sb, struct fs_context *fc) jffs2_free_ino_caches(c); jffs2_free_raw_node_refs(c); kvfree(c->blocks); - out_inohash: jffs2_clear_xattr_subsystem(c); + out_inohash: kfree(c->inocache_list); out_wbuf: jffs2_flash_cleanup(c); -- 2.31.1

3 years, 8 months

3
4
0 0

[tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success

by tip-bot2 for Stefan Roese

The following commit has been merged into the irq/urgent branch of tip: Commit-ID: 83dbf898a2d45289be875deb580e93050ba67529 Gitweb: https://git.kernel.org/tip/83dbf898a2d45289be875deb580e93050ba67529 Author: Stefan Roese <sr(a)denx.de> AuthorDate: Tue, 14 Dec 2021 12:49:32 +01:00 Committer: Thomas Gleixner <tglx(a)linutronix.de> CommitterDate: Tue, 14 Dec 2021 13:23:32 +01:00 PCI/MSI: Mask MSI-X vectors only on success Masking all unused MSI-X entries is done to ensure that a crash kernel starts from a clean slate, which correponds to the reset state of the device as defined in the PCI-E specificion 3.0 and later: Vector Control for MSI-X Table Entries -------------------------------------- "00: Mask bit: When this bit is set, the function is prohibited from sending a message using this MSI-X Table entry. ... This bit’s state after reset is 1 (entry is masked)." A Marvell NVME device fails to deliver MSI interrupts after trying to enable MSI-X interrupts due to that masking. It seems to take the MSI-X mask bits into account even when MSI-X is disabled. While not specification compliant, this can be cured by moving the masking into the success path, so that the MSI-X table entries stay in device reset state when the MSI-X setup fails. [ tglx: Move it into the success path, add comment and amend changelog ] Fixes: aa8092c1d1f1 ("PCI/MSI: Mask all unused MSI-X entries") Signed-off-by: Stefan Roese <sr(a)denx.de> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Cc: linux-pci(a)vger.kernel.org Cc: Bjorn Helgaas <bhelgaas(a)google.com> Cc: Michal Simek <michal.simek(a)xilinx.com> Cc: Marek Vasut <marex(a)denx.de> Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/20211210161025.3287927-1-sr@denx.de --- drivers/pci/msi.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 48e3f4e..6748cf9 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -722,9 +722,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, goto out_disable; } - /* Ensure that all table entries are masked. */ - msix_mask_all(base, tsize); - ret = msix_setup_entries(dev, base, entries, nvec, affd); if (ret) goto out_disable; @@ -751,6 +748,16 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, /* Set MSI-X enabled bits and unmask the function */ pci_intx_for_msi(dev, 0); dev->msix_enabled = 1; + + /* + * Ensure that all table entries are masked to prevent + * stale entries from firing in a crash kernel. + * + * Done late to deal with a broken Marvell NVME device + * which takes the MSI-X mask bits into account even + * when MSI-X is disabled, which prevents MSI delivery. + */ + msix_mask_all(base, tsize); pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); pcibios_free_irq(dev);

3 years, 8 months

4
3
0 0

FAILED: patch "[PATCH] bpf: Fix toctou on read-only map's constant scalar tracking" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 353050be4c19e102178ccc05988101887c25ae53 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann <daniel(a)iogearbox.net> Date: Tue, 9 Nov 2021 18:48:08 +0000 Subject: [PATCH] bpf: Fix toctou on read-only map's constant scalar tracking Commit a23740ec43ba ("bpf: Track contents of read-only maps as scalars") is checking whether maps are read-only both from BPF program side and user space side, and then, given their content is constant, reading out their data via map->ops->map_direct_value_addr() which is then subsequently used as known scalar value for the register, that is, it is marked as __mark_reg_known() with the read value at verification time. Before a23740ec43ba, the register content was marked as an unknown scalar so the verifier could not make any assumptions about the map content. The current implementation however is prone to a TOCTOU race, meaning, the value read as known scalar for the register is not guaranteed to be exactly the same at a later point when the program is executed, and as such, the prior made assumptions of the verifier with regards to the program will be invalid which can cause issues such as OOB access, etc. While the BPF_F_RDONLY_PROG map flag is always fixed and required to be specified at map creation time, the map->frozen property is initially set to false for the map given the map value needs to be populated, e.g. for global data sections. Once complete, the loader "freezes" the map from user space such that no subsequent updates/deletes are possible anymore. For the rest of the lifetime of the map, this freeze one-time trigger cannot be undone anymore after a successful BPF_MAP_FREEZE cmd return. Meaning, any new BPF_* cmd calls which would update/delete map entries will be rejected with -EPERM since map_get_sys_perms() removes the FMODE_CAN_WRITE permission. This also means that pending update/delete map entries must still complete before this guarantee is given. This corner case is not an issue for loaders since they create and prepare such program private map in successive steps. However, a malicious user is able to trigger this TOCTOU race in two different ways: i) via userfaultfd, and ii) via batched updates. For i) userfaultfd is used to expand the competition interval, so that map_update_elem() can modify the contents of the map after map_freeze() and bpf_prog_load() were executed. This works, because userfaultfd halts the parallel thread which triggered a map_update_elem() at the time where we copy key/value from the user buffer and this already passed the FMODE_CAN_WRITE capability test given at that time the map was not "frozen". Then, the main thread performs the map_freeze() and bpf_prog_load(), and once that had completed successfully, the other thread is woken up to complete the pending map_update_elem() which then changes the map content. For ii) the idea of the batched update is similar, meaning, when there are a large number of updates to be processed, it can increase the competition interval between the two. It is therefore possible in practice to modify the contents of the map after executing map_freeze() and bpf_prog_load(). One way to fix both i) and ii) at the same time is to expand the use of the map's map->writecnt. The latter was introduced in fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY") and further refined in 1f6cb19be2e2 ("bpf: Prevent re-mmap()'ing BPF map as writable for initially r/o mapping") with the rationale to make a writable mmap()'ing of a map mutually exclusive with read-only freezing. The counter indicates writable mmap() mappings and then prevents/fails the freeze operation. Its semantics can be expanded beyond just mmap() by generally indicating ongoing write phases. This would essentially span any parallel regular and batched flavor of update/delete operation and then also have map_freeze() fail with -EBUSY. For the check_mem_access() in the verifier we expand upon the bpf_map_is_rdonly() check ensuring that all last pending writes have completed via bpf_map_write_active() test. Once the map->frozen is set and bpf_map_write_active() indicates a map->writecnt of 0 only then we are really guaranteed to use the map's data as known constants. For map->frozen being set and pending writes in process of still being completed we fall back to marking that register as unknown scalar so we don't end up making assumptions about it. With this, both TOCTOU reproducers from i) and ii) are fixed. Note that the map->writecnt has been converted into a atomic64 in the fix in order to avoid a double freeze_mutex mutex_{un,}lock() pair when updating map->writecnt in the various map update/delete BPF_* cmd flavors. Spanning the freeze_mutex over entire map update/delete operations in syscall side would not be possible due to then causing everything to be serialized. Similarly, something like synchronize_rcu() after setting map->frozen to wait for update/deletes to complete is not possible either since it would also have to span the user copy which can sleep. On the libbpf side, this won't break d66562fba1ce ("libbpf: Add BPF object skeleton support") as the anonymous mmap()-ed "map initialization image" is remapped as a BPF map-backed mmap()-ed memory where for .rodata it's non-writable. Fixes: a23740ec43ba ("bpf: Track contents of read-only maps as scalars") Reported-by: w1tcher.bupt(a)gmail.com Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net> Acked-by: Andrii Nakryiko <andrii(a)kernel.org> Signed-off-by: Alexei Starovoitov <ast(a)kernel.org> diff --git a/include/linux/bpf.h b/include/linux/bpf.h index f715e8863f4d..e7a163a3146b 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -193,7 +193,7 @@ struct bpf_map { atomic64_t usercnt; struct work_struct work; struct mutex freeze_mutex; - u64 writecnt; /* writable mmap cnt; protected by freeze_mutex */ + atomic64_t writecnt; }; static inline bool map_value_has_spin_lock(const struct bpf_map *map) @@ -1419,6 +1419,7 @@ void bpf_map_put(struct bpf_map *map); void *bpf_map_area_alloc(u64 size, int numa_node); void *bpf_map_area_mmapable_alloc(u64 size, int numa_node); void bpf_map_area_free(void *base); +bool bpf_map_write_active(const struct bpf_map *map); void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr); int generic_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr, diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 50f96ea4452a..1033ee8c0caf 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -132,6 +132,21 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr) return map; } +static void bpf_map_write_active_inc(struct bpf_map *map) +{ + atomic64_inc(&map->writecnt); +} + +static void bpf_map_write_active_dec(struct bpf_map *map) +{ + atomic64_dec(&map->writecnt); +} + +bool bpf_map_write_active(const struct bpf_map *map) +{ + return atomic64_read(&map->writecnt) != 0; +} + static u32 bpf_map_value_size(const struct bpf_map *map) { if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH || @@ -601,11 +616,8 @@ static void bpf_map_mmap_open(struct vm_area_struct *vma) { struct bpf_map *map = vma->vm_file->private_data; - if (vma->vm_flags & VM_MAYWRITE) { - mutex_lock(&map->freeze_mutex); - map->writecnt++; - mutex_unlock(&map->freeze_mutex); - } + if (vma->vm_flags & VM_MAYWRITE) + bpf_map_write_active_inc(map); } /* called for all unmapped memory region (including initial) */ @@ -613,11 +625,8 @@ static void bpf_map_mmap_close(struct vm_area_struct *vma) { struct bpf_map *map = vma->vm_file->private_data; - if (vma->vm_flags & VM_MAYWRITE) { - mutex_lock(&map->freeze_mutex); - map->writecnt--; - mutex_unlock(&map->freeze_mutex); - } + if (vma->vm_flags & VM_MAYWRITE) + bpf_map_write_active_dec(map); } static const struct vm_operations_struct bpf_map_default_vmops = { @@ -668,7 +677,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma) goto out; if (vma->vm_flags & VM_MAYWRITE) - map->writecnt++; + bpf_map_write_active_inc(map); out: mutex_unlock(&map->freeze_mutex); return err; @@ -1139,6 +1148,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr) map = __bpf_map_get(f); if (IS_ERR(map)) return PTR_ERR(map); + bpf_map_write_active_inc(map); if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) { err = -EPERM; goto err_put; @@ -1174,6 +1184,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr) free_key: kvfree(key); err_put: + bpf_map_write_active_dec(map); fdput(f); return err; } @@ -1196,6 +1207,7 @@ static int map_delete_elem(union bpf_attr *attr) map = __bpf_map_get(f); if (IS_ERR(map)) return PTR_ERR(map); + bpf_map_write_active_inc(map); if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) { err = -EPERM; goto err_put; @@ -1226,6 +1238,7 @@ static int map_delete_elem(union bpf_attr *attr) out: kvfree(key); err_put: + bpf_map_write_active_dec(map); fdput(f); return err; } @@ -1533,6 +1546,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr) map = __bpf_map_get(f); if (IS_ERR(map)) return PTR_ERR(map); + bpf_map_write_active_inc(map); if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ) || !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) { err = -EPERM; @@ -1597,6 +1611,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr) free_key: kvfree(key); err_put: + bpf_map_write_active_dec(map); fdput(f); return err; } @@ -1624,8 +1639,7 @@ static int map_freeze(const union bpf_attr *attr) } mutex_lock(&map->freeze_mutex); - - if (map->writecnt) { + if (bpf_map_write_active(map)) { err = -EBUSY; goto err_put; } @@ -4171,6 +4185,9 @@ static int bpf_map_do_batch(const union bpf_attr *attr, union bpf_attr __user *uattr, int cmd) { + bool has_read = cmd == BPF_MAP_LOOKUP_BATCH || + cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH; + bool has_write = cmd != BPF_MAP_LOOKUP_BATCH; struct bpf_map *map; int err, ufd; struct fd f; @@ -4183,16 +4200,13 @@ static int bpf_map_do_batch(const union bpf_attr *attr, map = __bpf_map_get(f); if (IS_ERR(map)) return PTR_ERR(map); - - if ((cmd == BPF_MAP_LOOKUP_BATCH || - cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH) && - !(map_get_sys_perms(map, f) & FMODE_CAN_READ)) { + if (has_write) + bpf_map_write_active_inc(map); + if (has_read && !(map_get_sys_perms(map, f) & FMODE_CAN_READ)) { err = -EPERM; goto err_put; } - - if (cmd != BPF_MAP_LOOKUP_BATCH && - !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) { + if (has_write && !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) { err = -EPERM; goto err_put; } @@ -4205,8 +4219,9 @@ static int bpf_map_do_batch(const union bpf_attr *attr, BPF_DO_BATCH(map->ops->map_update_batch); else BPF_DO_BATCH(map->ops->map_delete_batch); - err_put: + if (has_write) + bpf_map_write_active_dec(map); fdput(f); return err; } diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 65d2f93b7030..50efda51515b 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -4056,7 +4056,22 @@ static void coerce_reg_to_size(struct bpf_reg_state *reg, int size) static bool bpf_map_is_rdonly(const struct bpf_map *map) { - return (map->map_flags & BPF_F_RDONLY_PROG) && map->frozen; + /* A map is considered read-only if the following condition are true: + * + * 1) BPF program side cannot change any of the map content. The + * BPF_F_RDONLY_PROG flag is throughout the lifetime of a map + * and was set at map creation time. + * 2) The map value(s) have been initialized from user space by a + * loader and then "frozen", such that no new map update/delete + * operations from syscall side are possible for the rest of + * the map's lifetime from that point onwards. + * 3) Any parallel/pending map update/delete operations from syscall + * side have been completed. Only after that point, it's safe to + * assume that map value(s) are immutable. + */ + return (map->map_flags & BPF_F_RDONLY_PROG) && + READ_ONCE(map->frozen) && + !bpf_map_write_active(map); } static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val)

3 years, 8 months

5
10
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror January 2022