From: Herbert Xu herbert@gondor.apana.org.au
[ Upstream commit 882aa6525cabcfa0cea61e1a19c9af4c543118ac ]
Module device tables need to be declared as maybe_unused because they will be unused when built-in and the corresponding option is also disabled.
This patch adds the maybe_unused attributes to OF and ACPI. This also allows us to remove the ifdef around the ACPI data structure.
Reported-by: kernel test robot lkp@intel.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Reviewed-by: Vinod Koul vkoul@kernel.org Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/crypto/qcom-rng.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/crypto/qcom-rng.c b/drivers/crypto/qcom-rng.c index 6cc4fd005fe0..648e3bc9c4e9 100644 --- a/drivers/crypto/qcom-rng.c +++ b/drivers/crypto/qcom-rng.c @@ -8,6 +8,7 @@ #include <linux/clk.h> #include <linux/crypto.h> #include <linux/iopoll.h> +#include <linux/kernel.h> #include <linux/module.h> #include <linux/of.h> #include <linux/platform_device.h> @@ -200,15 +201,13 @@ static int qcom_rng_remove(struct platform_device *pdev) return 0; }
-#if IS_ENABLED(CONFIG_ACPI) -static const struct acpi_device_id qcom_rng_acpi_match[] = { +static const struct acpi_device_id __maybe_unused qcom_rng_acpi_match[] = { { .id = "QCOM8160", .driver_data = 1 }, {} }; MODULE_DEVICE_TABLE(acpi, qcom_rng_acpi_match); -#endif
-static const struct of_device_id qcom_rng_of_match[] = { +static const struct of_device_id __maybe_unused qcom_rng_of_match[] = { { .compatible = "qcom,prng", .data = (void *)0}, { .compatible = "qcom,prng-ee", .data = (void *)1}, {}
From: Andreas Gruenbacher agruenba@redhat.com
[ Upstream commit 204c0300c4e99707e9fb6e57840aa1127060e63f ]
Switch from strlcpy to strscpy and make sure that @count is the size of the smaller of the source and destination buffers. This prevents reading beyond the end of the source buffer when the source string isn't null terminated.
Found by a modified version of syzkaller.
Suggested-by: Wolfram Sang wsa+renesas@sang-engineering.com Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/gfs2/ops_fstype.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index 29b27d769860..2841134f7812 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -377,8 +377,10 @@ static int init_names(struct gfs2_sbd *sdp, int silent) if (!table[0]) table = sdp->sd_vfs->s_id;
- strlcpy(sdp->sd_proto_name, proto, GFS2_FSNAME_LEN); - strlcpy(sdp->sd_table_name, table, GFS2_FSNAME_LEN); + BUILD_BUG_ON(GFS2_LOCKNAME_LEN > GFS2_FSNAME_LEN); + + strscpy(sdp->sd_proto_name, proto, GFS2_LOCKNAME_LEN); + strscpy(sdp->sd_table_name, table, GFS2_LOCKNAME_LEN);
table = sdp->sd_table_name; while ((table = strchr(table, '/'))) @@ -1349,13 +1351,13 @@ static int gfs2_parse_param(struct fs_context *fc, struct fs_parameter *param)
switch (o) { case Opt_lockproto: - strlcpy(args->ar_lockproto, param->string, GFS2_LOCKNAME_LEN); + strscpy(args->ar_lockproto, param->string, GFS2_LOCKNAME_LEN); break; case Opt_locktable: - strlcpy(args->ar_locktable, param->string, GFS2_LOCKNAME_LEN); + strscpy(args->ar_locktable, param->string, GFS2_LOCKNAME_LEN); break; case Opt_hostdata: - strlcpy(args->ar_hostdata, param->string, GFS2_LOCKNAME_LEN); + strscpy(args->ar_hostdata, param->string, GFS2_LOCKNAME_LEN); break; case Opt_spectator: args->ar_spectator = 1;
From: Tejun Heo tj@kernel.org
[ Upstream commit dc79ec1b232ad2c165d381d3dd2626df4ef9b5a4 ]
There's a seemingly harmless data-race around cgrp_dfl_visible detected by kernel concurrency sanitizer. Let's remove it by throwing WRITE/READ_ONCE at it.
Signed-off-by: Tejun Heo tj@kernel.org Reported-by: Abhishek Shah abhishek.shah@columbia.edu Cc: Gabriel Ryan gabe@cs.columbia.edu Reviewed-by: Christian Brauner (Microsoft) brauner@kernel.org Link: https://lore.kernel.org/netdev/20220819072256.fn7ctciefy4fc4cu@wittgenstein/ Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/cgroup/cgroup.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index d14575c0e464..194060a6492a 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -2178,7 +2178,7 @@ static int cgroup_get_tree(struct fs_context *fc) struct cgroup_fs_context *ctx = cgroup_fc2context(fc); int ret;
- cgrp_dfl_visible = true; + WRITE_ONCE(cgrp_dfl_visible, true); cgroup_get_live(&cgrp_dfl_root.cgrp); ctx->root = &cgrp_dfl_root;
@@ -6022,7 +6022,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns, struct cgroup *cgrp; int ssid, count = 0;
- if (root == &cgrp_dfl_root && !cgrp_dfl_visible) + if (root == &cgrp_dfl_root && !READ_ONCE(cgrp_dfl_visible)) continue;
seq_printf(m, "%d:", root->hierarchy_id);
From: Marek Bykowski marek.bykowski@gmail.com
[ Upstream commit d5e3050c0feb8bf7b9a75482fafcc31b90257926 ]
If the properties 'linux,initrd-start' and 'linux,initrd-end' of the chosen node populated from the bootloader, eg. U-Boot, are so that start > end, then the phys_initrd_size calculated from end - start is negative that subsequently gets converted to a high positive value for being unsigned long long. Then, the memory region with the (invalid) size is added to the bootmem and attempted being paged in paging_init() that results in the kernel fault.
For example, on the FVP ARM64 system I'm running, the U-Boot populates the 'linux,initrd-start' with 8800_0000 and 'linux,initrd-end' with 0. The phys_initrd_size calculated is then ffff_ffff_7800_0000 (= 0 - 8800_0000 = -8800_0000 + ULLONG_MAX + 1). paging_init() then attempts to map the address 8800_0000 + ffff_ffff_7800_0000 and oops'es as below.
It should be stressed, it is generally a fault of the bootloader's with the kernel relying on it, however we should not allow the bootloader's misconfiguration to lead to the kernel oops. Not only the kernel should be bullet proof against it but also finding the root cause of the paging fault spanning over the bootloader, DT, and kernel may happen is not so easy.
Unable to handle kernel paging request at virtual address fffffffefe43c000 Mem abort info: ESR = 0x96000007 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000007 CM = 0, WnR = 0 swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000080e3d000 [fffffffefe43c000] pgd=0000000080de9003, pud=0000000080de9003 Unable to handle kernel paging request at virtual address ffffff8000de9f90 Mem abort info: ESR = 0x96000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000005 CM = 0, WnR = 0 swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000080e3d000 [ffffff8000de9f90] pgd=0000000000000000, pud=0000000000000000 Internal error: Oops: 96000005 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.51-yocto-standard #1 Hardware name: FVP Base (DT) pstate: 60000085 (nZCv daIf -PAN -UAO) pc : show_pte+0x12c/0x1b4 lr : show_pte+0x100/0x1b4 sp : ffffffc010ce3b30 x29: ffffffc010ce3b30 x28: ffffffc010ceed80 x27: fffffffefe43c000 x26: fffffffefe43a028 x25: 0000000080bf0000 x24: 0000000000000025 x23: ffffffc010b8d000 x22: ffffffc010e3d000 x23: ffffffc010b8d000 x22: ffffffc010e3d000 x21: 0000000080de9000 x20: ffffff7f80000f90 x19: fffffffefe43c000 x18: 0000000000000030 x17: 0000000000001400 x16: 0000000000001c00 x15: ffffffc010cef1b8 x14: ffffffffffffffff x13: ffffffc010df1f40 x12: ffffffc010df1b70 x11: ffffffc010ce3b30 x10: ffffffc010ce3b30 x9 : 00000000ffffffc8 x8 : 0000000000000000 x7 : 000000000000000f x6 : ffffffc010df16e8 x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff x2 : 0000000000000000 x1 : 0000008080000000 x0 : ffffffc010af1d68 Call trace: show_pte+0x12c/0x1b4 die_kernel_fault+0x54/0x78 __do_kernel_fault+0x11c/0x128 do_translation_fault+0x58/0xac do_mem_abort+0x50/0xb0 el1_da+0x1c/0x90 __create_pgd_mapping+0x348/0x598 paging_init+0x3f0/0x70d0 setup_arch+0x2c0/0x5d4 start_kernel+0x94/0x49c Code: 92748eb5 900052a0 9135a000 cb010294 (f8756a96)
Signed-off-by: Marek Bykowski marek.bykowski@gmail.com Link: https://lore.kernel.org/r/20220909023358.76881-1-marek.bykowski@gmail.com Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/of/fdt.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 6d519ef3c5da..0193a5e82fab 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -889,6 +889,8 @@ static void __init early_init_dt_check_for_initrd(unsigned long node) if (!prop) return; end = of_read_number(prop, len/4); + if (start > end) + return;
__early_init_dt_declare_initrd(start, end); phys_initrd_start = start;
From: Peter Zijlstra peterz@infradead.org
[ Upstream commit 7a7621dfa417aa3715d2a3bd1bdd6cf5018274d0 ]
When 'discussing' control flow Masami mentioned the LOOP* instructions and I realized objtool doesn't decode them properly.
As it turns out, these instructions are somewhat inefficient and as such unlikely to be emitted by the compiler (a few vmlinux.o checks can't find a single one) so this isn't critical, but still, best to decode them properly.
Reported-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/Yxhd4EMKyoFoH9y4@hirez.programming.kicks-ass.net Signed-off-by: Sasha Levin sashal@kernel.org --- tools/objtool/arch/x86/decode.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c index a62e032863a8..4a7652719130 100644 --- a/tools/objtool/arch/x86/decode.c +++ b/tools/objtool/arch/x86/decode.c @@ -435,6 +435,12 @@ int arch_decode_instruction(struct elf *elf, struct section *sec, *type = INSN_CONTEXT_SWITCH; break;
+ case 0xe0: /* loopne */ + case 0xe1: /* loope */ + case 0xe2: /* loop */ + *type = INSN_JUMP_CONDITIONAL; + break; + case 0xe8: *type = INSN_CALL; break;
From: Andrew Price anprice@redhat.com
[ Upstream commit 670f8ce56dd0632dc29a0322e188cc73ce3c6b92 ]
Fuzzers like to scribble over sb_bsize_shift but in reality it's very unlikely that this field would be corrupted on its own. Nevertheless it should be checked to avoid the possibility of messy mount errors due to bad calculations. It's always a fixed value based on the block size so we can just check that it's the expected value.
Tested with:
mkfs.gfs2 -O -p lock_nolock /dev/vdb for i in 0 -1 64 65 32 33; do gfs2_edit -p sb field sb_bsize_shift $i /dev/vdb mount /dev/vdb /mnt/test && umount /mnt/test done
Before this patch we get a withdraw after
[ 76.413681] gfs2: fsid=loop0.0: fatal: invalid metadata block [ 76.413681] bh = 19 (type: exp=5, found=4) [ 76.413681] function = gfs2_meta_buffer, file = fs/gfs2/meta_io.c, line = 492
and with UBSAN configured we also get complaints like
[ 76.373395] UBSAN: shift-out-of-bounds in fs/gfs2/ops_fstype.c:295:19 [ 76.373815] shift exponent 4294967287 is too large for 64-bit type 'long unsigned int'
After the patch, these complaints don't appear, mount fails immediately and we get an explanation in dmesg.
Reported-by: syzbot+dcf33a7aae997956fe06@syzkaller.appspotmail.com Signed-off-by: Andrew Price anprice@redhat.com Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/gfs2/ops_fstype.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index 2841134f7812..2112ff7a0172 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -180,7 +180,10 @@ static int gfs2_check_sb(struct gfs2_sbd *sdp, int silent) pr_warn("Invalid superblock size\n"); return -EINVAL; } - + if (sb->sb_bsize_shift != ffs(sb->sb_bsize) - 1) { + pr_warn("Invalid block size shift\n"); + return -EINVAL; + } return 0; }
From: Yury Norov yury.norov@gmail.com
[ Upstream commit 546a073d628111e3338af689938407e77d5dc38f ]
generic_secondary_common_init() calls LOAD_REG_ADDR(r7, nr_cpu_ids) conditionally on CONFIG_SMP. However, if 'NR_CPUS == 1', kernel doesn't use the nr_cpu_ids, and in C code, it's just: #if NR_CPUS == 1 #define nr_cpu_ids ...
This series makes declaration of nr_cpu_ids conditional on NR_CPUS == 1, and that reveals the issue, because compiler can't link the LOAD_REG_ADDR(r7, nr_cpu_ids) against nonexisting symbol.
Current code looks unsafe for those who build kernel with CONFIG_SMP=y and NR_CPUS == 1. This is weird configuration, but not disallowed.
Fix the linker error by replacing LOAD_REG_ADDR() with LOAD_REG_IMMEDIATE() conditionally on NR_CPUS == 1.
As the following patch adds CONFIG_FORCE_NR_CPUS option that has the similar effect on nr_cpu_ids, make the generic_secondary_common_init() conditional on it too.
Reported-by: Stephen Rothwell sfr@canb.auug.org.au Signed-off-by: Yury Norov yury.norov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/head_64.S | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index 9019f1395d39..be6e94f203c1 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -395,8 +395,12 @@ generic_secondary_common_init: #else LOAD_REG_ADDR(r8, paca_ptrs) /* Load paca_ptrs pointe */ ld r8,0(r8) /* Get base vaddr of array */ +#if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS) + LOAD_REG_IMMEDIATE(r7, NR_CPUS) +#else LOAD_REG_ADDR(r7, nr_cpu_ids) /* Load nr_cpu_ids address */ lwz r7,0(r7) /* also the max paca allocated */ +#endif li r5,0 /* logical cpu id */ 1: sldi r9,r5,3 /* get paca_ptrs[] index from cpu id */
From: Greg Ungerer gerg@linux-m68k.org
[ Upstream commit 750321ace9107e103f254bf46900629ff347eb7b ]
Compiling for a classic m68k non-MMU target with no specific CPU selected fails with the following error:
arch/m68k/68000/ints.c: In function 'process_int':
arch/m68k/68000/ints.c:82:30: error: 'ISR' undeclared (first use in this function)
82 | unsigned long pend = ISR; | ^~~
This interrupt handling code is specific to the 68328 family of 68000 parts. There is a couple of variants (68EZ328, 68VZ328) and the common ancestor of them the strait 68328.
The code here includes a specific header for each variant type. But if none is selected then nothing is included to supply the appropriate register and bit flags defines.
Rearrange the includes so that at least one type is always included. At the very least the 68328 base type should be the fallback, so make that true.
Reported-by: kernel test robot lkp@intel.com Reviewed-by: Geert Uytterhoeven geert@linux-m68k.org Signed-off-by: Greg Ungerer gerg@linux-m68k.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/m68k/68000/ints.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/m68k/68000/ints.c b/arch/m68k/68000/ints.c index cda49b12d7be..f9a5ec781408 100644 --- a/arch/m68k/68000/ints.c +++ b/arch/m68k/68000/ints.c @@ -18,12 +18,12 @@ #include <asm/io.h> #include <asm/machdep.h>
-#if defined(CONFIG_M68328) -#include <asm/MC68328.h> -#elif defined(CONFIG_M68EZ328) +#if defined(CONFIG_M68EZ328) #include <asm/MC68EZ328.h> #elif defined(CONFIG_M68VZ328) #include <asm/MC68VZ328.h> +#else +#include <asm/MC68328.h> #endif
/* assembler routines */
From: Greg Ungerer gerg@linux-m68k.org
[ Upstream commit 18011e50c497f04a57a8e00122906f04922b30b4 ]
The family of classic 68000 parts supported when in non-mmu mode all currently use the legacy timer support. Move the selection of that config option, LEGACY_TIMER_TICK, into the core CPU configuration.
This fixes compilation if no specific CPU variant is selected, since the LEGACY_TIMER_TICK option was only selected in the specific CPU variant configurations.
Reviewed-by: Geert Uytterhoeven geert@linux-m68k.org Signed-off-by: Greg Ungerer gerg@linux-m68k.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/m68k/Kconfig.cpu | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu index 6bc7fc14163f..aeee0743938f 100644 --- a/arch/m68k/Kconfig.cpu +++ b/arch/m68k/Kconfig.cpu @@ -43,6 +43,7 @@ config M68000 select GENERIC_CSUM select CPU_NO_EFFICIENT_FFS select HAVE_ARCH_HASH + select LEGACY_TIMER_TICK help The Freescale (was Motorola) 68000 CPU is the first generation of the well known M68K family of processors. The CPU core as well as
From: Alexander Potapenko glider@google.com
[ Upstream commit f630a5d0ca59a6e73b61e3f82c371dc230da99ff ]
KMSAN metadata for adjacent physical pages may not be adjacent, therefore accessing such pages together may lead to metadata corruption. We disable merging pages in biovec to prevent such corruptions.
Link: https://lkml.kernel.org/r/20220915150417.722975-28-glider@google.com Signed-off-by: Alexander Potapenko glider@google.com Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Alexei Starovoitov ast@kernel.org Cc: Andrey Konovalov andreyknvl@gmail.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Andy Lutomirski luto@kernel.org Cc: Arnd Bergmann arnd@arndb.de Cc: Borislav Petkov bp@alien8.de Cc: Christoph Hellwig hch@lst.de Cc: Christoph Lameter cl@linux.com Cc: David Rientjes rientjes@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Eric Biggers ebiggers@google.com Cc: Eric Biggers ebiggers@kernel.org Cc: Eric Dumazet edumazet@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Herbert Xu herbert@gondor.apana.org.au Cc: Ilya Leoshkevich iii@linux.ibm.com Cc: Ingo Molnar mingo@redhat.com Cc: Jens Axboe axboe@kernel.dk Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kees Cook keescook@chromium.org Cc: Marco Elver elver@google.com Cc: Mark Rutland mark.rutland@arm.com Cc: Matthew Wilcox willy@infradead.org Cc: Michael S. Tsirkin mst@redhat.com Cc: Pekka Enberg penberg@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Petr Mladek pmladek@suse.com Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Steven Rostedt rostedt@goodmis.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Vasily Gorbik gor@linux.ibm.com Cc: Vegard Nossum vegard.nossum@oracle.com Cc: Vlastimil Babka vbabka@suse.cz Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk.h | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/block/blk.h b/block/blk.h index ee3d5664d962..3358ef4244fe 100644 --- a/block/blk.h +++ b/block/blk.h @@ -79,6 +79,13 @@ static inline bool biovec_phys_mergeable(struct request_queue *q, phys_addr_t addr1 = page_to_phys(vec1->bv_page) + vec1->bv_offset; phys_addr_t addr2 = page_to_phys(vec2->bv_page) + vec2->bv_offset;
+ /* + * Merging adjacent physical pages may not work correctly under KMSAN + * if their metadata pages aren't adjacent. Just disable merging. + */ + if (IS_ENABLED(CONFIG_KMSAN)) + return false; + if (addr1 + vec1->bv_len != addr2) return false; if (xen_domain() && !xen_biovec_phys_mergeable(vec1, vec2->bv_page))
On Mon, Oct 17, 2022 at 08:10:59PM -0400, Sasha Levin wrote:
From: Alexander Potapenko glider@google.com
[ Upstream commit f630a5d0ca59a6e73b61e3f82c371dc230da99ff ]
KMSAN metadata for adjacent physical pages may not be adjacent, therefore accessing such pages together may lead to metadata corruption. We disable merging pages in biovec to prevent such corruptions.
Link: https://lkml.kernel.org/r/20220915150417.722975-28-glider@google.com Signed-off-by: Alexander Potapenko glider@google.com Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Alexei Starovoitov ast@kernel.org Cc: Andrey Konovalov andreyknvl@gmail.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Andy Lutomirski luto@kernel.org Cc: Arnd Bergmann arnd@arndb.de Cc: Borislav Petkov bp@alien8.de Cc: Christoph Hellwig hch@lst.de Cc: Christoph Lameter cl@linux.com Cc: David Rientjes rientjes@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Eric Biggers ebiggers@google.com Cc: Eric Biggers ebiggers@kernel.org Cc: Eric Dumazet edumazet@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Herbert Xu herbert@gondor.apana.org.au Cc: Ilya Leoshkevich iii@linux.ibm.com Cc: Ingo Molnar mingo@redhat.com Cc: Jens Axboe axboe@kernel.dk Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kees Cook keescook@chromium.org Cc: Marco Elver elver@google.com Cc: Mark Rutland mark.rutland@arm.com Cc: Matthew Wilcox willy@infradead.org Cc: Michael S. Tsirkin mst@redhat.com Cc: Pekka Enberg penberg@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Petr Mladek pmladek@suse.com Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Steven Rostedt rostedt@goodmis.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Vasily Gorbik gor@linux.ibm.com Cc: Vegard Nossum vegard.nossum@oracle.com Cc: Vlastimil Babka vbabka@suse.cz Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org
block/blk.h | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/block/blk.h b/block/blk.h index ee3d5664d962..3358ef4244fe 100644 --- a/block/blk.h +++ b/block/blk.h @@ -79,6 +79,13 @@ static inline bool biovec_phys_mergeable(struct request_queue *q, phys_addr_t addr1 = page_to_phys(vec1->bv_page) + vec1->bv_offset; phys_addr_t addr2 = page_to_phys(vec2->bv_page) + vec2->bv_offset;
- /*
* Merging adjacent physical pages may not work correctly under KMSAN
* if their metadata pages aren't adjacent. Just disable merging.
*/
- if (IS_ENABLED(CONFIG_KMSAN))
return false;
- if (addr1 + vec1->bv_len != addr2) return false; if (xen_domain() && !xen_biovec_phys_mergeable(vec1, vec2->bv_page))
So KMSAN is being backported to 5.4?
- Eric
On 10/18/22 02:22, Eric Biggers wrote:
On Mon, Oct 17, 2022 at 08:10:59PM -0400, Sasha Levin wrote:
From: Alexander Potapenko glider@google.com
[ Upstream commit f630a5d0ca59a6e73b61e3f82c371dc230da99ff ]
KMSAN metadata for adjacent physical pages may not be adjacent, therefore accessing such pages together may lead to metadata corruption. We disable merging pages in biovec to prevent such corruptions.
Link: https://lkml.kernel.org/r/20220915150417.722975-28-glider@google.com Signed-off-by: Alexander Potapenko glider@google.com Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Alexei Starovoitov ast@kernel.org Cc: Andrey Konovalov andreyknvl@gmail.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Andy Lutomirski luto@kernel.org Cc: Arnd Bergmann arnd@arndb.de Cc: Borislav Petkov bp@alien8.de Cc: Christoph Hellwig hch@lst.de Cc: Christoph Lameter cl@linux.com Cc: David Rientjes rientjes@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Eric Biggers ebiggers@google.com Cc: Eric Biggers ebiggers@kernel.org Cc: Eric Dumazet edumazet@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Herbert Xu herbert@gondor.apana.org.au Cc: Ilya Leoshkevich iii@linux.ibm.com Cc: Ingo Molnar mingo@redhat.com Cc: Jens Axboe axboe@kernel.dk Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kees Cook keescook@chromium.org Cc: Marco Elver elver@google.com Cc: Mark Rutland mark.rutland@arm.com Cc: Matthew Wilcox willy@infradead.org Cc: Michael S. Tsirkin mst@redhat.com Cc: Pekka Enberg penberg@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Petr Mladek pmladek@suse.com Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Steven Rostedt rostedt@goodmis.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Vasily Gorbik gor@linux.ibm.com Cc: Vegard Nossum vegard.nossum@oracle.com Cc: Vlastimil Babka vbabka@suse.cz Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org
block/blk.h | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/block/blk.h b/block/blk.h index ee3d5664d962..3358ef4244fe 100644 --- a/block/blk.h +++ b/block/blk.h @@ -79,6 +79,13 @@ static inline bool biovec_phys_mergeable(struct request_queue *q, phys_addr_t addr1 = page_to_phys(vec1->bv_page) + vec1->bv_offset; phys_addr_t addr2 = page_to_phys(vec2->bv_page) + vec2->bv_offset;
- /*
* Merging adjacent physical pages may not work correctly under KMSAN
* if their metadata pages aren't adjacent. Just disable merging.
*/
- if (IS_ENABLED(CONFIG_KMSAN))
return false;
- if (addr1 + vec1->bv_len != addr2) return false; if (xen_domain() && !xen_biovec_phys_mergeable(vec1, vec2->bv_page))
So KMSAN is being backported to 5.4?
No, AUTOSEL is drunk and should drop the random kmsan patches for random stable versions.
- Eric
From: Dominique Martinet asmadeus@codewreck.org
[ Upstream commit 52f1c45dde9136f964d63a77d19826c8a74e2c7f ]
syzbot reported a double-lock here and we no longer need this lock after requests have been moved off to local list: just drop the lock earlier.
Link: https://lkml.kernel.org/r/20220904064028.1305220-1-asmadeus@codewreck.org Reported-by: syzbot+50f7e8d06c3768dd97f3@syzkaller.appspotmail.com Signed-off-by: Dominique Martinet asmadeus@codewreck.org Tested-by: Schspa Shi schspa@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/9p/trans_fd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index 60eb9a2b209b..70bcacb3bbdc 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -205,6 +205,8 @@ static void p9_conn_cancel(struct p9_conn *m, int err) list_move(&req->req_list, &cancel_list); }
+ spin_unlock(&m->client->lock); + list_for_each_entry_safe(req, rtmp, &cancel_list, req_list) { p9_debug(P9_DEBUG_ERROR, "call back req %p\n", req); list_del(&req->req_list); @@ -212,7 +214,6 @@ static void p9_conn_cancel(struct p9_conn *m, int err) req->t_err = err; p9_client_cb(m->client, req, REQ_STATUS_ERROR); } - spin_unlock(&m->client->lock); }
static __poll_t
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit ef575281b21e9a34dfae544a187c6aac2ae424a9 ]
syzbot is reporting hung task at p9_fd_close() [1], for p9_mux_poll_stop() from p9_conn_destroy() from p9_fd_close() is failing to interrupt already started kernel_read() from p9_fd_read() from p9_read_work() and/or kernel_write() from p9_fd_write() from p9_write_work() requests.
Since p9_socket_open() sets O_NONBLOCK flag, p9_mux_poll_stop() does not need to interrupt kernel_read()/kernel_write(). However, since p9_fd_open() does not set O_NONBLOCK flag, but pipe blocks unless signal is pending, p9_mux_poll_stop() needs to interrupt kernel_read()/kernel_write() when the file descriptor refers to a pipe. In other words, pipe file descriptor needs to be handled as if socket file descriptor.
We somehow need to interrupt kernel_read()/kernel_write() on pipes.
A minimal change, which this patch is doing, is to set O_NONBLOCK flag from p9_fd_open(), for O_NONBLOCK flag does not affect reading/writing of regular files. But this approach changes O_NONBLOCK flag on userspace- supplied file descriptors (which might break userspace programs), and O_NONBLOCK flag could be changed by userspace. It would be possible to set O_NONBLOCK flag every time p9_fd_read()/p9_fd_write() is invoked, but still remains small race window for clearing O_NONBLOCK flag.
If we don't want to manipulate O_NONBLOCK flag, we might be able to surround kernel_read()/kernel_write() with set_thread_flag(TIF_SIGPENDING) and recalc_sigpending(). Since p9_read_work()/p9_write_work() works are processed by kernel threads which process global system_wq workqueue, signals could not be delivered from remote threads when p9_mux_poll_stop() from p9_conn_destroy() from p9_fd_close() is called. Therefore, calling set_thread_flag(TIF_SIGPENDING)/recalc_sigpending() every time would be needed if we count on signals for making kernel_read()/kernel_write() non-blocking.
Link: https://lkml.kernel.org/r/345de429-a88b-7097-d177-adecf9fed342@I-love.SAKURA... Link: https://syzkaller.appspot.com/bug?extid=8b41a1365f1106fd0f33 [1] Reported-by: syzbot syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Tested-by: syzbot syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com Reviewed-by: Christian Schoenebeck linux_oss@crudebyte.com [Dominique: add comment at Christian's suggestion] Signed-off-by: Dominique Martinet asmadeus@codewreck.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/9p/trans_fd.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index 70bcacb3bbdc..b691871d9a02 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -821,11 +821,14 @@ static int p9_fd_open(struct p9_client *client, int rfd, int wfd) goto out_free_ts; if (!(ts->rd->f_mode & FMODE_READ)) goto out_put_rd; + /* prevent workers from hanging on IO when fd is a pipe */ + ts->rd->f_flags |= O_NONBLOCK; ts->wr = fget(wfd); if (!ts->wr) goto out_put_rd; if (!(ts->wr->f_mode & FMODE_WRITE)) goto out_put_wr; + ts->wr->f_flags |= O_NONBLOCK;
client->trans = ts; client->status = Connected;
From: Angus Chen angus.chen@jaguarmicro.com
[ Upstream commit 71491c54eafa318fdd24a1f26a1c82b28e1ac21d ]
The background is that we use dpu in cloud computing,the arch is x86,80 cores. We will have a lots of virtio devices,like 512 or more. When we probe about 200 virtio_blk devices,it will fail and the stack is printed as follows:
[25338.485128] virtio-pci 0000:b3:00.0: virtio_pci: leaving for legacy driver [25338.496174] genirq: Flags mismatch irq 0. 00000080 (virtio418) vs. 00015a00 (timer) [25338.503822] CPU: 20 PID: 5431 Comm: kworker/20:0 Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.30.1.el8.x86_64 [25338.516403] Hardware name: Inspur NF5280M5/YZMB-00882-10E, BIOS 4.1.21 08/25/2021 [25338.523881] Workqueue: events work_for_cpu_fn [25338.528235] Call Trace: [25338.530687] dump_stack+0x5c/0x80 [25338.534000] __setup_irq.cold.53+0x7c/0xd3 [25338.538098] request_threaded_irq+0xf5/0x160 [25338.542371] vp_find_vqs+0xc7/0x190 [25338.545866] init_vq+0x17c/0x2e0 [virtio_blk] [25338.550223] ? ncpus_cmp_func+0x10/0x10 [25338.554061] virtblk_probe+0xe6/0x8a0 [virtio_blk] [25338.558846] virtio_dev_probe+0x158/0x1f0 [25338.562861] really_probe+0x255/0x4a0 [25338.566524] ? __driver_attach_async_helper+0x90/0x90 [25338.571567] driver_probe_device+0x49/0xc0 [25338.575660] bus_for_each_drv+0x79/0xc0 [25338.579499] __device_attach+0xdc/0x160 [25338.583337] bus_probe_device+0x9d/0xb0 [25338.587167] device_add+0x418/0x780 [25338.590654] register_virtio_device+0x9e/0xe0 [25338.595011] virtio_pci_probe+0xb3/0x140 [25338.598941] local_pci_probe+0x41/0x90 [25338.602689] work_for_cpu_fn+0x16/0x20 [25338.606443] process_one_work+0x1a7/0x360 [25338.610456] ? create_worker+0x1a0/0x1a0 [25338.614381] worker_thread+0x1cf/0x390 [25338.618132] ? create_worker+0x1a0/0x1a0 [25338.622051] kthread+0x116/0x130 [25338.625283] ? kthread_flush_work_fn+0x10/0x10 [25338.629731] ret_from_fork+0x1f/0x40 [25338.633395] virtio_blk: probe of virtio418 failed with error -16
The log : "genirq: Flags mismatch irq 0. 00000080 (virtio418) vs. 00015a00 (timer)" was printed because of the irq 0 is used by timer exclusive,and when vp_find_vqs call vp_find_vqs_msix and returns false twice (for whatever reason), then it will call vp_find_vqs_intx as a fallback. Because vp_dev->pci_dev->irq is zero, we request irq 0 with flag IRQF_SHARED, and get a backtrace like above.
According to PCI spec about "Interrupt Pin" Register (Offset 3Dh): "The Interrupt Pin register is a read-only register that identifies the legacy interrupt Message(s) the Function uses. Valid values are 01h, 02h, 03h, and 04h that map to legacy interrupt Messages for INTA, INTB, INTC, and INTD respectively. A value of 00h indicates that the Function uses no legacy interrupt Message(s)."
So if vp_dev->pci_dev->pin is zero, we should not request legacy interrupt.
Signed-off-by: Angus Chen angus.chen@jaguarmicro.com Suggested-by: Michael S. Tsirkin mst@redhat.com Message-Id: 20220930000915.548-1-angus.chen@jaguarmicro.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/virtio/virtio_pci_common.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index 1e890ef17687..0b4b3011ff92 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -403,6 +403,9 @@ int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs, err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names, false, ctx, desc); if (!err) return 0; + /* Is there an interrupt pin? If not give up. */ + if (!(to_vp_device(vdev)->pci_dev->pin)) + return err; /* Finally fall back to regular interrupts. */ return vp_find_vqs_intx(vdev, nvqs, vqs, callbacks, names, ctx); }
linux-stable-mirror@lists.linaro.org