This is the start of the stable review cycle for the 4.9.169 release. There are 76 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Apr 17 18:36:37 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.169-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.9.169-rc1
Andre Przywara andre.przywara@arm.com PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller
Max Filippov jcmvbkbc@gmail.com xtensa: fix return_address
Mel Gorman mgorman@techsingularity.net sched/fair: Do not re-read ->h_load_next during hierarchical load calculation
Dan Carpenter dan.carpenter@oracle.com xen: Prevent buffer overflow in privcmd ioctl
Will Deacon will.deacon@arm.com arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
David Engraf david.engraf@sysgo.com ARM: dts: at91: Fix typo in ISC_D0 on PC9
Cornelia Huck cohuck@redhat.com virtio: Honour 'may_reduce_num' in vring_create_virtqueue
Stephen Boyd swboyd@chromium.org genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent()
Jérôme Glisse jglisse@redhat.com block: do not leak memory in bio_copy_user_iov()
Filipe Manana fdmanana@suse.com Btrfs: do not allow trimming when a fs is mounted with the nologreplay option
S.j. Wang shengjiu.wang@nxp.com ASoC: fsl_esai: fix channel swap issue when stream starts
Arnd Bergmann arnd@arndb.de include/linux/bitrev.h: fix constant bitrev
Helge Deller deller@gmx.de parisc: Detect QEMU earlier in boot process
Zubin Mithra zsm@chromium.org ALSA: seq: Fix OOB-reads from strlcpy
Sheena Mira-ato sheena.mira-ato@alliedtelesis.co.nz ip6_tunnel: Match to ARPHRD_TUNNEL6 for dev type
Li RongQing lirongqing@baidu.com net: ethtool: not call vzalloc for zero sized memory request
Eric Dumazet edumazet@google.com netns: provide pure entropy for net_hash_mix()
Yuval Avnery yuvalav@mellanox.com net/mlx5e: Add a lock on tir list
Michael Chan michael.chan@broadcom.com bnxt_en: Improve RX consumer index validity check.
Michael Chan michael.chan@broadcom.com bnxt_en: Reset device on RX buffer errors.
Stephen Suryaputra ssuryaextr@gmail.com vrf: check accept_source_route on the original netdevice
Koen De Schepper koen.de_schepper@nokia-bell-labs.com tcp: Ensure DCTCP reacts to losses
Xin Long lucien.xin@gmail.com sctp: initialize _pad of sockaddr_in before copying to user memory
Bjørn Mork bjorn@mork.no qmi_wwan: add Olicard 600
Andrea Righi andrea.righi@canonical.com openvswitch: fix flow actions reallocation
Mao Wenan maowenan@huawei.com net: rds: force to destroy connection if t_sock is NULL in rds_tcp_kill_sock().
Jiri Slaby jslaby@suse.cz kcm: switch order of device registration to fix a crash
Lorenzo Bianconi lorenzo.bianconi@redhat.com ipv6: sit: reset ip header pointer in ipip6_rcv
Junwei Hu hujunwei4@huawei.com ipv6: Fix dangling pointer when ipv6 fragment
Greg Kroah-Hartman gregkh@linuxfoundation.org tty: ldisc: add sysctl to prevent autoloading of ldiscs
Greg Kroah-Hartman gregkh@linuxfoundation.org tty: mark Siemens R3964 line discipline as BROKEN
Yueyi Li liyueyi@live.com arm64: kaslr: Reserve size of ARM64_MEMSTART_ALIGN in linear region
Michael Ellerman mpe@ellerman.id.au powerpc/security: Fix spectre_v2 reporting
Christophe Leroy christophe.leroy@c-s.fr powerpc/fsl: Fix the flush of branch predictor.
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Fixed warning: orphan section `__btb_flush_fixup'
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Update Spectre v2 reporting
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Enable runtime patching if nospectre_v2 boot arg is used
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Flush branch predictor when entering KVM
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Flush the branch predictor at each kernel entry (32 bit)
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Add nospectre_v2 command line argument
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Emulate SPRN_BUCSR register
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Fix spectre_v2 mitigations reporting
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Add macro to flush the branch predictor
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Add infrastructure to fixup branch predictor flush
Michael Ellerman mpe@ellerman.id.au powerpc/powernv: Query firmware for count cache flush settings
Michael Ellerman mpe@ellerman.id.au powerpc/pseries: Query hypervisor for count cache flush settings
Michael Ellerman mpe@ellerman.id.au powerpc/64s: Add support for software count cache flush
Michael Ellerman mpe@ellerman.id.au powerpc/64s: Add new security feature flags for count cache flush
Michael Ellerman mpe@ellerman.id.au powerpc/asm: Add a patch_site macro & helpers for patching instructions
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Sanitize the syscall table for NXP PowerPC 32 bit platforms
Diana Craciun diana.craciun@nxp.com powerpc/fsl: Add barrier_nospec implementation for NXP PowerPC Book3E
Diana Craciun diana.craciun@nxp.com powerpc/64: Make meltdown reporting Book3S 64 specific
Michael Ellerman mpe@ellerman.id.au powerpc/64: Call setup_barrier_nospec() from setup_arch()
Michael Ellerman mpe@ellerman.id.au powerpc/64: Add CONFIG_PPC_BARRIER_NOSPEC
Diana Craciun diana.craciun@nxp.com powerpc/64: Make stf barrier PPC_BOOK3S_64 specific.
Diana Craciun diana.craciun@nxp.com powerpc/64: Disable the speculation barrier from the command line
Michael Ellerman mpe@ellerman.id.au powerpc64s: Show ori31 availability in spectre_v1 sysfs file not v2
Michal Suchanek msuchanek@suse.de powerpc/64s: Enhance the information in cpu_show_spectre_v1()
Michael Ellerman mpe@ellerman.id.au powerpc/64: Use barrier_nospec in syscall entry
Michael Ellerman mpe@ellerman.id.au powerpc: Use barrier_nospec in copy_from_user()
Michal Suchanek msuchanek@suse.de powerpc/64s: Enable barrier_nospec based on firmware settings
Michal Suchanek msuchanek@suse.de powerpc/64s: Patch barrier_nospec in modules
Michael Neuling mikey@neuling.org powerpc: Avoid code patching freed init sections
Michal Suchanek msuchanek@suse.de powerpc/64s: Add support for ori barrier_nospec patching
Michal Suchanek msuchanek@suse.de powerpc/64s: Add barrier_nospec
Andreas Schwab schwab@linux-m68k.org powerpc: Fix invalid use of register expressions
Nick Desaulniers ndesaulniers@google.com lib/string.c: implement a basic bcmp
Nick Desaulniers ndesaulniers@google.com x86/vdso: Drop implicit common-page-size linker flag
Alistair Strachan astrachan@google.com x86: vdso: Use $LD instead of $CC to link
Nick Desaulniers ndesaulniers@google.com kbuild: clang: choose GCC_TOOLCHAIN_DIR not on LD
Breno Leitao leitao@debian.org powerpc/tm: Limit TM code inside PPC_TRANSACTIONAL_MEM
Andy Lutomirski luto@kernel.org x86/power: Make restore_processor_context() sane
Andy Lutomirski luto@kernel.org x86/power/32: Move SYSENTER MSR restoration to fix_processor_context()
Andy Lutomirski luto@kernel.org x86/power/64: Use struct desc_ptr for the IDT in struct saved_context
Andy Lutomirski luto@kernel.org x86/power: Fix some ordering bugs in __restore_processor_context()
-------------
Diffstat:
 Makefile | 6 +-
 arch/arm/boot/dts/sama5d2-pinfunc.h | 2 +-
 arch/arm64/include/asm/futex.h | 16 +-
 arch/arm64/mm/init.c | 2 +-
 arch/parisc/kernel/process.c | 6 -
 arch/parisc/kernel/setup.c | 3 +
 arch/powerpc/Kconfig | 7 +-
 arch/powerpc/include/asm/asm-prototypes.h | 6 +
 arch/powerpc/include/asm/barrier.h | 21 ++
 arch/powerpc/include/asm/code-patching-asm.h | 18 ++
 arch/powerpc/include/asm/code-patching.h | 2 +
 arch/powerpc/include/asm/feature-fixups.h | 21 ++
 arch/powerpc/include/asm/hvcall.h | 2 +
 arch/powerpc/include/asm/ppc_asm.h | 23 ++-
 arch/powerpc/include/asm/security_features.h | 7 +
 arch/powerpc/include/asm/setup.h | 21 ++
 arch/powerpc/include/asm/uaccess.h | 11 +-
 arch/powerpc/kernel/Makefile | 3 +-
 arch/powerpc/kernel/entry_32.S | 10 +
 arch/powerpc/kernel/entry_64.S | 69 +++++++
 arch/powerpc/kernel/exceptions-64e.S | 27 ++-
 arch/powerpc/kernel/head_booke.h | 12 ++
 arch/powerpc/kernel/head_fsl_booke.S | 15 ++
 arch/powerpc/kernel/module.c | 10 +-
 arch/powerpc/kernel/security.c | 216 ++++++++++++++++++++-
 arch/powerpc/kernel/setup-common.c | 3 +
 arch/powerpc/kernel/signal_64.c | 23 ++-
 arch/powerpc/kernel/swsusp_asm64.S | 2 +-
 arch/powerpc/kernel/vmlinux.lds.S | 19 +-
 arch/powerpc/kvm/bookehv_interrupts.S | 4 +
 arch/powerpc/kvm/e500_emulate.c | 7 +
 arch/powerpc/lib/code-patching.c | 24 +++
 arch/powerpc/lib/copypage_power7.S | 14 +-
 arch/powerpc/lib/copyuser_power7.S | 66 +++---
 arch/powerpc/lib/feature-fixups.c | 93 +++++++++
 arch/powerpc/lib/memcpy_power7.S | 66 +++---
 arch/powerpc/lib/string_64.S | 2 +-
 arch/powerpc/mm/mem.c | 2 +
 arch/powerpc/mm/tlb_low_64e.S | 7 +
 arch/powerpc/platforms/powernv/setup.c | 7 +
 arch/powerpc/platforms/pseries/setup.c | 7 +
 arch/x86/entry/vdso/Makefile | 22 +--
 arch/x86/include/asm/suspend_32.h | 8 +-
 arch/x86/include/asm/suspend_64.h | 19 +-
 arch/x86/include/asm/xen/hypercall.h | 3 +
 arch/x86/power/cpu.c | 96 ++++-----
 arch/xtensa/kernel/stacktrace.c | 6 +-
 block/bio.c | 5 +-
 drivers/char/Kconfig | 2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 14 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c | 7 +
 drivers/net/usb/qmi_wwan.c | 1 +
 drivers/pci/quirks.c | 2 +
 drivers/tty/Kconfig | 23 +++
 drivers/tty/tty_io.c | 3 +
 drivers/tty/tty_ldisc.c | 47 +++++
 drivers/virtio/virtio_ring.c | 2 +
 fs/btrfs/ioctl.c | 10 +
 include/linux/bitrev.h | 36 ++--
 include/linux/mlx5/driver.h | 2 +
 include/linux/string.h | 3 +
 include/linux/virtio_ring.h | 2 +-
 include/net/ip.h | 2 +-
 include/net/net_namespace.h | 1 +
 include/net/netns/hash.h | 15 +-
 kernel/irq/chip.c | 4 +
 kernel/sched/fair.c | 6 +-
 lib/string.c | 20 ++
 net/core/ethtool.c | 47 +++--
 net/core/net_namespace.c | 1 +
 net/ipv4/ip_input.c | 7 +-
 net/ipv4/ip_options.c | 4 +-
 net/ipv4/tcp_dctcp.c | 36 ++--
 net/ipv6/ip6_output.c | 4 +-
 net/ipv6/ip6_tunnel.c | 4 +-
 net/ipv6/sit.c | 4 +
 net/kcm/kcmsock.c | 16 +-
 net/openvswitch/flow_netlink.c | 4 +-
 net/rds/tcp.c | 2 +-
 net/sctp/protocol.c | 1 +
 sound/core/seq/seq_clientmgr.c | 6 +-
 sound/soc/fsl/fsl_esai.c | 47 ++++-
 82 files changed, 1138 insertions(+), 288 deletions(-)
[ Upstream commit 5b06bbcfc2c621da3009da8decb7511500c293ed ]
__restore_processor_context() had a couple of ordering bugs. It restored GSBASE after calling load_gs_index(), and the latter can call into tracing code. It also tried to restore segment registers before restoring the LDT, which is straight-up wrong.
Reorder the code so that we restore GSBASE, then the descriptor tables, then the segments.
This fixes two bugs. First, it fixes a regression that broke resume under certain configurations due to irqflag tracing in native_load_gs_index(). Second, it fixes resume when the userspace process that initiated suspend had funny segments. The latter can be reproduced by compiling this:
// SPDX-License-Identifier: GPL-2.0
/*
 * ldt_echo.c - Echo argv[1] while using an LDT segment
 */

int main(int argc, char **argv)
{
	int ret;
	size_t len;
	char *buf;

	const struct user_desc desc = {
		.entry_number    = 0,
		.base_addr       = 0,
		.limit           = 0xfffff,
		.seg_32bit       = 1,
		.contents        = 0, /* Data, grow-up */
		.read_exec_only  = 0,
		.limit_in_pages  = 1,
		.seg_not_present = 0,
		.useable         = 0
	};

	if (argc != 2)
		errx(1, "Usage: %s STRING", argv[0]);

	len = asprintf(&buf, "%s\n", argv[1]);
	if (len < 0)
		errx(1, "Out of memory");

	ret = syscall(SYS_modify_ldt, 1, &desc, sizeof(desc));
	if (ret < -1)
		errno = -ret;
	if (ret)
		err(1, "modify_ldt");

	asm volatile ("movw %0, %%es" :: "rm" ((unsigned short)7));
	write(1, buf, len);
	return 0;
}
and running ldt_echo >/sys/power/mem
Without the fix, the latter causes a triple fault on resume.
Fixes: ca37e57bbe0c ("x86/entry/64: Add missing irqflags tracing to native_load_gs_index()")
Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/6b31721ea92f51ea839e79bd97ade4a75b1eeea2.151205730...
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/power/cpu.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 53cace2ec0e2..73063dfed476 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -222,8 +222,20 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	load_idt((const struct desc_ptr *)&ctxt->idt_limit);
 #endif

+#ifdef CONFIG_X86_64
 	/*
-	 * segment registers
+	 * We need GSBASE restored before percpu access can work.
+	 * percpu access can happen in exception handlers or in complicated
+	 * helpers like load_gs_index().
+	 */
+	wrmsrl(MSR_GS_BASE, ctxt->gs_base);
+#endif
+
+	fix_processor_context();
+
+	/*
+	 * Restore segment registers. This happens after restoring the GDT
+	 * and LDT, which happen in fix_processor_context().
 	 */
 #ifdef CONFIG_X86_32
 	loadsegment(es, ctxt->es);
@@ -244,13 +256,14 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	load_gs_index(ctxt->gs);
 	asm volatile ("movw %0, %%ss" :: "r" (ctxt->ss));

+	/*
+	 * Restore FSBASE and user GSBASE after reloading the respective
+	 * segment selectors.
+	 */
 	wrmsrl(MSR_FS_BASE, ctxt->fs_base);
-	wrmsrl(MSR_GS_BASE, ctxt->gs_base);
 	wrmsrl(MSR_KERNEL_GS_BASE, ctxt->gs_kernel_base);
 #endif

-	fix_processor_context();
-
 	do_fpu_end();
 	x86_platform.restore_sched_clock_state();
 	mtrr_bp_restore();
[ Upstream commit 090edbe23ff57940fca7f57d9165ce57a826bd7a ]
x86_64's saved_context nonsensically used separate idt_limit and idt_base fields and then cast &idt_limit to struct desc_ptr *.
This was correct (with -fno-strict-aliasing), but it was confusing, served no purpose, and required #ifdeffery. Simplify this by using struct desc_ptr directly.
No change in functionality.
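As a rough sketch of the layout change (simplified for illustration; struct desc_ptr is written out locally here, while the kernel's definition lives in its own headers, and the real fields are in the diff below):

	/* Sketch only: why the old cast worked and why the struct is cleaner. */
	struct desc_ptr {
		unsigned short size;
		unsigned long address;
	} __attribute__((packed));

	struct saved_context_old {
		unsigned short idt_limit;	/* &idt_limit was cast to desc_ptr *,   */
		unsigned long idt_base;		/* relying on these two fields exactly  */
	};					/* mirroring desc_ptr's packed layout   */

	struct saved_context_new {
		struct desc_ptr idt;		/* same bytes, no cast, no #ifdef */
	};

With the struct used directly, store_idt() and load_idt() take &ctxt->idt on both 32-bit and 64-bit, which is what removes the #ifdeffery.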
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Zhang Rui <rui.zhang@intel.com>
Link: http://lkml.kernel.org/r/967909ce38d341b01d45eff53e278e2728a3a93a.1513286253...
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/include/asm/suspend_64.h | 3 +--
 arch/x86/power/cpu.c | 11 +----------
 2 files changed, 2 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/suspend_64.h b/arch/x86/include/asm/suspend_64.h
index 2bd96b4df140..ab899e5f3a85 100644
--- a/arch/x86/include/asm/suspend_64.h
+++ b/arch/x86/include/asm/suspend_64.h
@@ -29,8 +29,7 @@ struct saved_context {
 	u16 gdt_pad;			/* Unused */
 	struct desc_ptr gdt_desc;
 	u16 idt_pad;
-	u16 idt_limit;
-	unsigned long idt_base;
+	struct desc_ptr idt;
 	u16 ldt;
 	u16 tss;
 	unsigned long tr;
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 73063dfed476..ec923a1cdaf0 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -82,12 +82,8 @@ static void __save_processor_state(struct saved_context *ctxt)
 	/*
 	 * descriptor tables
 	 */
-#ifdef CONFIG_X86_32
 	store_idt(&ctxt->idt);
-#else
-/* CONFIG_X86_64 */
-	store_idt((struct desc_ptr *)&ctxt->idt_limit);
-#endif
+
 	/*
 	 * We save it here, but restore it only in the hibernate case.
 	 * For ACPI S3 resume, this is loaded via 'early_gdt_desc' in 64-bit
@@ -215,12 +211,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	 * now restore the descriptor tables to their proper values
 	 * ltr is done i fix_processor_context().
 	 */
-#ifdef CONFIG_X86_32
 	load_idt(&ctxt->idt);
-#else
-/* CONFIG_X86_64 */
-	load_idt((const struct desc_ptr *)&ctxt->idt_limit);
-#endif

 #ifdef CONFIG_X86_64
 	/*
[ Upstream commit 896c80bef4d3b357814a476663158aaf669d0fb3 ]
x86_64 restores system call MSRs in fix_processor_context(), and x86_32 restores them along with segment registers. The 64-bit variant makes more sense, so move the 32-bit code to match the 64-bit code.

No change in runtime behavior is expected.
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Zhang Rui <rui.zhang@intel.com>
Link: http://lkml.kernel.org/r/65158f8d7ee64dd6bbc6c1c83b3b34aaa854e3ae.1513286253...
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/power/cpu.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index ec923a1cdaf0..2335e8beb0cf 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -174,6 +174,9 @@ static void fix_processor_context(void)
 	write_gdt_entry(desc, GDT_ENTRY_TSS, &tss, DESC_TSS);

 	syscall_init();				/* This sets MSR_*STAR and related */
+#else
+	if (boot_cpu_has(X86_FEATURE_SEP))
+		enable_sep_cpu();
 #endif
 	load_TR_desc();				/* This does ltr */
 	load_mm_ldt(current->active_mm);	/* This does lldt */
@@ -233,12 +236,6 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	loadsegment(fs, ctxt->fs);
 	loadsegment(gs, ctxt->gs);
 	loadsegment(ss, ctxt->ss);
-
-	/*
-	 * sysenter MSRs
-	 */
-	if (boot_cpu_has(X86_FEATURE_SEP))
-		enable_sep_cpu();
 #else
 /* CONFIG_X86_64 */
 	asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
[ Upstream commit 7ee18d677989e99635027cee04c878950e0752b9 ]
My previous attempt to fix a couple of bugs in __restore_processor_context():
5b06bbcfc2c6 ("x86/power: Fix some ordering bugs in __restore_processor_context()")
... introduced yet another bug, breaking suspend-resume.
Rather than trying to come up with a minimal fix, let's try to clean it up for real. This patch fixes quite a few things:
- The old code saved a nonsensical subset of segment registers. The only registers that need to be saved are those that contain userspace state or those that can't be trivially restored without percpu access working. (On x86_32, we can restore percpu access by writing __KERNEL_PERCPU to %fs. On x86_64, it's easier to save and restore the kernel's GSBASE.) With this patch, we restore hardcoded values to the kernel state where applicable and explicitly restore the user state after fixing all the descriptor tables.
- We used to use an unholy mix of inline asm and C helpers for segment register access. Let's get rid of the inline asm.
This fixes the reported s2ram hangs and makes the code all around more logical.
Analyzed-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reported-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Tested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Zhang Rui <rui.zhang@intel.com>
Fixes: 5b06bbcfc2c6 ("x86/power: Fix some ordering bugs in __restore_processor_context()")
Link: http://lkml.kernel.org/r/398ee68e5c0f766425a7b746becfc810840770ff.1513286253...
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/include/asm/suspend_32.h | 8 +++-
 arch/x86/include/asm/suspend_64.h | 16 ++++++-
 arch/x86/power/cpu.c | 79 ++++++++++++++++---------------
 3 files changed, 62 insertions(+), 41 deletions(-)
diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
index 8e9dbe7b73a1..5cc2ce4ab8a3 100644
--- a/arch/x86/include/asm/suspend_32.h
+++ b/arch/x86/include/asm/suspend_32.h
@@ -11,7 +11,13 @@

 /* image of the saved processor state */
 struct saved_context {
-	u16 es, fs, gs, ss;
+	/*
+	 * On x86_32, all segment registers, with the possible exception of
+	 * gs, are saved at kernel entry in pt_regs.
+	 */
+#ifdef CONFIG_X86_32_LAZY_GS
+	u16 gs;
+#endif
 	unsigned long cr0, cr2, cr3, cr4;
 	u64 misc_enable;
 	bool misc_enable_saved;
diff --git a/arch/x86/include/asm/suspend_64.h b/arch/x86/include/asm/suspend_64.h
index ab899e5f3a85..701751918921 100644
--- a/arch/x86/include/asm/suspend_64.h
+++ b/arch/x86/include/asm/suspend_64.h
@@ -19,8 +19,20 @@
  */
 struct saved_context {
 	struct pt_regs regs;
-	u16 ds, es, fs, gs, ss;
-	unsigned long gs_base, gs_kernel_base, fs_base;
+
+	/*
+	 * User CS and SS are saved in current_pt_regs(). The rest of the
+	 * segment selectors need to be saved and restored here.
+	 */
+	u16 ds, es, fs, gs;
+
+	/*
+	 * Usermode FSBASE and GSBASE may not match the fs and gs selectors,
+	 * so we save them separately. We save the kernelmode GSBASE to
+	 * restore percpu access after resume.
+	 */
+	unsigned long kernelmode_gs_base, usermode_gs_base, fs_base;
+
 	unsigned long cr0, cr2, cr3, cr4, cr8;
 	u64 misc_enable;
 	bool misc_enable_saved;
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 2335e8beb0cf..054e27671df9 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -99,22 +99,18 @@ static void __save_processor_state(struct saved_context *ctxt)
 	/*
 	 * segment registers
 	 */
-#ifdef CONFIG_X86_32
-	savesegment(es, ctxt->es);
-	savesegment(fs, ctxt->fs);
+#ifdef CONFIG_X86_32_LAZY_GS
 	savesegment(gs, ctxt->gs);
-	savesegment(ss, ctxt->ss);
-#else
-/* CONFIG_X86_64 */
-	asm volatile ("movw %%ds, %0" : "=m" (ctxt->ds));
-	asm volatile ("movw %%es, %0" : "=m" (ctxt->es));
-	asm volatile ("movw %%fs, %0" : "=m" (ctxt->fs));
-	asm volatile ("movw %%gs, %0" : "=m" (ctxt->gs));
-	asm volatile ("movw %%ss, %0" : "=m" (ctxt->ss));
+#endif
+#ifdef CONFIG_X86_64
+	savesegment(gs, ctxt->gs);
+	savesegment(fs, ctxt->fs);
+	savesegment(ds, ctxt->ds);
+	savesegment(es, ctxt->es);

 	rdmsrl(MSR_FS_BASE, ctxt->fs_base);
-	rdmsrl(MSR_GS_BASE, ctxt->gs_base);
-	rdmsrl(MSR_KERNEL_GS_BASE, ctxt->gs_kernel_base);
+	rdmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
+	rdmsrl(MSR_KERNEL_GS_BASE, ctxt->usermode_gs_base);
 	mtrr_save_fixed_ranges(NULL);

 	rdmsrl(MSR_EFER, ctxt->efer);
@@ -185,9 +181,12 @@ static void fix_processor_context(void)
 }

 /**
- * __restore_processor_state - restore the contents of CPU registers saved
- * by __save_processor_state()
- * @ctxt - structure to load the registers contents from
+ * __restore_processor_state - restore the contents of CPU registers saved
+ *                             by __save_processor_state()
+ * @ctxt - structure to load the registers contents from
+ *
+ * The asm code that gets us here will have restored a usable GDT, although
+ * it will be pointing to the wrong alias.
  */
 static void notrace __restore_processor_state(struct saved_context *ctxt)
 {
@@ -210,46 +209,50 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	write_cr2(ctxt->cr2);
 	write_cr0(ctxt->cr0);

+	/* Restore the IDT. */
+	load_idt(&ctxt->idt);
+
 	/*
-	 * now restore the descriptor tables to their proper values
-	 * ltr is done i fix_processor_context().
+	 * Just in case the asm code got us here with the SS, DS, or ES
+	 * out of sync with the GDT, update them.
 	 */
-	load_idt(&ctxt->idt);
+	loadsegment(ss, __KERNEL_DS);
+	loadsegment(ds, __USER_DS);
+	loadsegment(es, __USER_DS);

-#ifdef CONFIG_X86_64
 	/*
-	 * We need GSBASE restored before percpu access can work.
-	 * percpu access can happen in exception handlers or in complicated
-	 * helpers like load_gs_index().
+	 * Restore percpu access. Percpu access can happen in exception
+	 * handlers or in complicated helpers like load_gs_index().
 	 */
-	wrmsrl(MSR_GS_BASE, ctxt->gs_base);
+#ifdef CONFIG_X86_64
+	wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
+#else
+	loadsegment(fs, __KERNEL_PERCPU);
+	loadsegment(gs, __KERNEL_STACK_CANARY);
 #endif

+	/* Restore the TSS, RO GDT, LDT, and usermode-relevant MSRs. */
 	fix_processor_context();

 	/*
-	 * Restore segment registers. This happens after restoring the GDT
-	 * and LDT, which happen in fix_processor_context().
+	 * Now that we have descriptor tables fully restored and working
+	 * exception handling, restore the usermode segments.
 	 */
-#ifdef CONFIG_X86_32
+#ifdef CONFIG_X86_64
+	loadsegment(ds, ctxt->es);
 	loadsegment(es, ctxt->es);
 	loadsegment(fs, ctxt->fs);
-	loadsegment(gs, ctxt->gs);
-	loadsegment(ss, ctxt->ss);
-#else
-/* CONFIG_X86_64 */
-	asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
-	asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
-	asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
 	load_gs_index(ctxt->gs);
-	asm volatile ("movw %0, %%ss" :: "r" (ctxt->ss));

 	/*
-	 * Restore FSBASE and user GSBASE after reloading the respective
-	 * segment selectors.
+	 * Restore FSBASE and GSBASE after restoring the selectors, since
+	 * restoring the selectors clobbers the bases. Keep in mind
+	 * that MSR_KERNEL_GS_BASE is horribly misnamed.
 	 */
 	wrmsrl(MSR_FS_BASE, ctxt->fs_base);
-	wrmsrl(MSR_KERNEL_GS_BASE, ctxt->gs_kernel_base);
+	wrmsrl(MSR_KERNEL_GS_BASE, ctxt->usermode_gs_base);
+#elif defined(CONFIG_X86_32_LAZY_GS)
+	loadsegment(gs, ctxt->gs);
 #endif

 	do_fpu_end();
[ Upstream commit 897bc3df8c5aebb54c32d831f917592e873d0559 ]
Commit e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint") moved a code block around, and this block uses a 'msr' variable outside of the CONFIG_PPC_TRANSACTIONAL_MEM block; however, the 'msr' variable is declared inside a CONFIG_PPC_TRANSACTIONAL_MEM block, causing a possible build error when CONFIG_PPC_TRANSACTIONAL_MEM is not defined.
error: 'msr' undeclared (first use in this function)
This is not causing a compilation error in the mainline kernel, because 'msr' is being used as an argument of MSR_TM_ACTIVE(), which is defined as the following when CONFIG_PPC_TRANSACTIONAL_MEM is *not* set:
#define MSR_TM_ACTIVE(x) 0
This patch just fixes the issue by avoiding use of the 'msr' variable outside the CONFIG_PPC_TRANSACTIONAL_MEM block, rather than trusting the MSR_TM_ACTIVE() definition.
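A reduced sketch of why the old arrangement happened to build (hypothetical code, not the actual signal_64.c; MSR_TM_ACTIVE() is shown as the kernel defines it when TM is off):

	struct pt_regs { unsigned long msr; };	/* stand-in for illustration */

	#ifndef CONFIG_PPC_TRANSACTIONAL_MEM
	#define MSR_TM_ACTIVE(x) 0		/* argument discarded entirely */
	#endif

	void sketch_sigreturn(struct pt_regs *regs)
	{
	#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
		unsigned long msr = regs->msr;	/* declared only under TM */
	#endif
		/*
		 * With TM off this preprocesses to "if (0)", so the
		 * undeclared 'msr' never reaches the compiler. Any use of
		 * 'msr' outside a macro that discards it would fail with
		 * "'msr' undeclared (first use in this function)".
		 */
		if (MSR_TM_ACTIVE(msr))
			/* TM restore path */;
	}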
Cc: stable@vger.kernel.org
Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Fixes: e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/signal_64.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index d929afab7b24..bdf2f7b995bb 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -746,12 +746,25 @@ int sys_rt_sigreturn(unsigned long r3, unsigned long r4, unsigned long r5,
 		if (restore_tm_sigcontexts(current, &uc->uc_mcontext,
 					   &uc_transact->uc_mcontext))
 			goto badframe;
-	}
-	else
-	/* Fall through, for non-TM restore */
+	} else
 #endif
-	if (restore_sigcontext(current, NULL, 1, &uc->uc_mcontext))
-		goto badframe;
+	{
+		/*
+		 * Fall through, for non-TM restore
+		 *
+		 * Unset MSR[TS] on the thread regs since MSR from user
+		 * context does not have MSR active, and recheckpoint was
+		 * not called since restore_tm_sigcontexts() was not called
+		 * also.
+		 *
+		 * If not unsetting it, the code can RFID to userspace with
+		 * MSR[TS] set, but without CPU in the proper state,
+		 * causing a TM bad thing.
+		 */
+		current->thread.regs->msr &= ~MSR_TS_MASK;
+		if (restore_sigcontext(current, NULL, 1, &uc->uc_mcontext))
+			goto badframe;
+	}

 	if (restore_altstack(&uc->uc_stack))
 		goto badframe;
commit ad15006cc78459d059af56729c4d9bed7c7fd860 upstream.
This causes an issue when trying to build with `make LD=ld.lld` if ld.lld and the rest of the cross tools aren't in the same directory (e.g. /usr/local/bin), as is the case for Android's build system: GCC_TOOLCHAIN_DIR gets set based on `which $(LD)`, which points at where the LLVM tools are, not where the GCC/binutils tools are located.
Instead, select the GCC_TOOLCHAIN_DIR based on another tool provided by binutils for which LLVM does not provide a substitute, such as elfedit.
Fixes: 785f11aa595b ("kbuild: Add better clang cross build support")
Link: https://github.com/ClangBuiltLinux/linux/issues/341
Suggested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index f44094d2b147..ba3c48809f44 100644
--- a/Makefile
+++ b/Makefile
@@ -507,7 +507,7 @@ endif
 ifeq ($(cc-name),clang)
 ifneq ($(CROSS_COMPILE),)
 CLANG_FLAGS := --target=$(notdir $(CROSS_COMPILE:%-=%))
-GCC_TOOLCHAIN_DIR := $(dir $(shell which $(LD)))
+GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit))
 CLANG_FLAGS += --prefix=$(GCC_TOOLCHAIN_DIR)
 GCC_TOOLCHAIN := $(realpath $(GCC_TOOLCHAIN_DIR)/..)
 endif
The vdso{32,64}.so can fail to link with CC=clang when clang tries to find a suitable GCC toolchain to link these libraries with.
/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o: access beyond end of merged section (782)
This happens because the host environment leaked into the cross compiler environment due to the way clang searches for suitable GCC toolchains.
Clang is a retargetable compiler, and each invocation of it must provide --target=<something> --gcc-toolchain=<something> to allow it to find the correct binutils for cross compilation. These flags had been added to KBUILD_CFLAGS, but the vdso code uses CC and not KBUILD_CFLAGS (for various reasons) which breaks clang's ability to find the correct linker when cross compiling.
Most of the time this goes unnoticed because the host linker is new enough to work anyway, or is incompatible and skipped, but this cannot be reliably assumed.
This change alters the vdso makefile to just use LD directly, which bypasses clang and thus the searching problem. The makefile will just use ${CROSS_COMPILE}ld instead, which is always what we want. This matches the method used to link vmlinux.
This drops references to DISABLE_LTO; this option doesn't seem to be set anywhere, and without knowing its possible values it's not clear how to convert it from a CC flag to an LD flag.
Signed-off-by: Alistair Strachan <astrachan@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: kernel-team@android.com
Cc: joel@joelfernandes.org
Cc: Andi Kleen <andi.kleen@intel.com>
Link: https://lkml.kernel.org/r/20180803173931.117515-1-astrachan@google.com
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/entry/vdso/Makefile | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index d5409660f5de..2ae92c6b1de6 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -47,10 +47,8 @@ targets += $(vdso_img_sodbg)

 export CPPFLAGS_vdso.lds += -P -C

-VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
-			-Wl,--no-undefined \
-			-Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 \
-			$(DISABLE_LTO)
+VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no-undefined \
+			-z max-page-size=4096 -z common-page-size=4096

 $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
 	$(call if_changed,vdso)
@@ -96,10 +94,8 @@ CFLAGS_REMOVE_vvar.o = -pg
 #

 CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdsox32.lds = -Wl,-m,elf32_x86_64 \
-			   -Wl,-soname=linux-vdso.so.1 \
-			   -Wl,-z,max-page-size=4096 \
-			   -Wl,-z,common-page-size=4096
+VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \
+			   -z max-page-size=4096 -z common-page-size=4096

 # 64-bit objects to re-brand as x32
 vobjs64-for-x32 := $(filter-out $(vobjs-nox32),$(vobjs-y))
@@ -127,7 +123,7 @@ $(obj)/vdsox32.so.dbg: $(src)/vdsox32.lds $(vobjx32s) FORCE
 	$(call if_changed,vdso)

 CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1
+VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -soname linux-gate.so.1

 # This makes sure the $(obj) subdirectory exists even though vdso32/
 # is not a kbuild sub-make subdirectory.
@@ -165,13 +161,13 @@ $(obj)/vdso32.so.dbg: FORCE \
 # The DSO images are built using a special linker script.
 #
 quiet_cmd_vdso = VDSO    $@
-      cmd_vdso = $(CC) -nostdlib -o $@ \
+      cmd_vdso = $(LD) -nostdlib -o $@ \
		       $(VDSO_LDFLAGS) $(VDSO_LDFLAGS_$(filter %.lds,$(^F))) \
-		       -Wl,-T,$(filter %.lds,$^) $(filter %.o,$^) && \
+		       -T $(filter %.lds,$^) $(filter %.o,$^) && \
		sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'

-VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=both) \
-	$(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS)
+VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
+	$(call ld-option, --build-id) -Bsymbolic
 GCOV_PROFILE := n

 #
GNU linker's -z common-page-size's default value is based on the target architecture. arch/x86/entry/vdso/Makefile sets it to the architecture default, which is implicit and redundant. Drop it.
Fixes: 2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu")
Reported-by: Dmitry Golovin <dima@golovin.in>
Reported-by: Bill Wendling <morbo@google.com>
Suggested-by: Dmitry Golovin <dima@golovin.in>
Suggested-by: Rui Ueyama <ruiu@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Fangrui Song <maskray@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181206191231.192355-1-ndesaulniers@google.com
Link: https://bugs.llvm.org/show_bug.cgi?id=38774
Link: https://github.com/ClangBuiltLinux/linux/issues/31
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/entry/vdso/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 2ae92c6b1de6..756dc9432d15 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -48,7 +48,7 @@ targets += $(vdso_img_sodbg)
 export CPPFLAGS_vdso.lds += -P -C

 VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no-undefined \
-			-z max-page-size=4096 -z common-page-size=4096
+			-z max-page-size=4096

 $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
 	$(call if_changed,vdso)
@@ -95,7 +95,7 @@ CFLAGS_REMOVE_vvar.o = -pg

 CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds)
 VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \
-			   -z max-page-size=4096 -z common-page-size=4096
+			   -z max-page-size=4096

 # 64-bit objects to re-brand as x32
 vobjs64-for-x32 := $(filter-out $(vobjs-nox32),$(vobjs-y))
[ Upstream commit 5f074f3e192f10c9fade898b9b3b8812e3d83342 ]
A recent optimization in Clang (r355672) lowers comparisons of the return value of memcmp against zero to comparisons of the return value of bcmp against zero. This helps some platforms that implement bcmp more efficiently than memcmp. glibc simply aliases bcmp to memcmp, but an optimized implementation is in the works.
This results in linkage failures for all targets with Clang due to the undefined symbol. For now, just implement bcmp as a tail call to memcmp to unbreak the build. This routine can be further optimized in the future. (A minimal illustration of the lowering follows the list of alternatives below.)
Other ideas discussed:
* A weak alias was discussed, but breaks for architectures that define their own implementations of memcmp since aliases to declarations are not permitted (only definitions). Arch-specific memcmp implementations typically declare memcmp in C headers, but implement them in assembly.
* -ffreestanding also is used sporadically throughout the kernel.
* -fno-builtin-bcmp doesn't work when doing LTO.
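As a minimal illustration of the lowering described above (a hypothetical caller, not part of the patch): any equality-only use of memcmp() qualifies, which is how the bcmp symbol can appear in a kernel that never calls it by name:

	#include <string.h>

	/* Hypothetical helper: only the zero/non-zero result is used, so
	 * Clang (r355672 and later) may emit a call to bcmp() here instead
	 * of memcmp(). */
	static int buffers_equal(const void *a, const void *b, size_t n)
	{
		return memcmp(a, b, n) == 0;	/* may be lowered to !bcmp(a, b, n) */
	}

Since bcmp()'s contract only promises a zero/non-zero answer, the simple tail call to memcmp() in the patch below is a conforming (if unoptimized) implementation.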
Link: https://bugs.llvm.org/show_bug.cgi?id=41035
Link: https://code.woboq.org/userspace/glibc/string/memcmp.c.html#bcmp
Link: https://github.com/llvm/llvm-project/commit/8e16d73346f8091461319a7dfc4ddd18...
Link: https://github.com/ClangBuiltLinux/linux/issues/416
Link: http://lkml.kernel.org/r/20190313211335.165605-1-ndesaulniers@google.com
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: James Y Knight <jyknight@google.com>
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Suggested-by: Nathan Chancellor <natechancellor@gmail.com>
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/linux/string.h | 3 +++
 lib/string.c | 20 ++++++++++++++++++++
 2 files changed, 23 insertions(+)
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -111,6 +111,9 @@ extern void * memscan(void *,int,__kerne
 #ifndef __HAVE_ARCH_MEMCMP
 extern int memcmp(const void *,const void *,__kernel_size_t);
 #endif
+#ifndef __HAVE_ARCH_BCMP
+extern int bcmp(const void *,const void *,__kernel_size_t);
+#endif
 #ifndef __HAVE_ARCH_MEMCHR
 extern void * memchr(const void *,int,__kernel_size_t);
 #endif
--- a/lib/string.c
+++ b/lib/string.c
@@ -772,6 +772,26 @@ __visible int memcmp(const void *cs, con
 EXPORT_SYMBOL(memcmp);
 #endif

+#ifndef __HAVE_ARCH_BCMP
+/**
+ * bcmp - returns 0 if and only if the buffers have identical contents.
+ * @a: pointer to first buffer.
+ * @b: pointer to second buffer.
+ * @len: size of buffers.
+ *
+ * The sign or magnitude of a non-zero return value has no particular
+ * meaning, and architectures may implement their own more efficient bcmp(). So
+ * while this particular implementation is a simple (tail) call to memcmp, do
+ * not rely on anything but whether the return value is zero or non-zero.
+ */
+#undef bcmp
+int bcmp(const void *a, const void *b, size_t len)
+{
+	return memcmp(a, b, len);
+}
+EXPORT_SYMBOL(bcmp);
+#endif
+
 #ifndef __HAVE_ARCH_MEMSCAN
 /**
 * memscan - Find a character in an area of memory.
commit 8a583c0a8d316d8ea52ea78491174ab1a3e9ef9d upstream.
binutils >= 2.26 now warns about misuse of register expressions in assembler operands that are actually literals, for example:
arch/powerpc/kernel/entry_64.S:535: Warning: invalid register expression
In practice these are almost all uses of r0 that should just be a literal 0.
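For illustration (a standalone snippet, not taken from the patch): in these cache instructions the first operand is an RA base field where 0 means "no base register", so the assembler expects the literal 0, and spelling it as the r0 register expression is what newer binutils flags:

	/* Hedged example: C inline asm, powerpc target, binutils >= 2.26. */
	static inline void touch_line(const void *p)
	{
		/* Old spelling -- assembles, but warns "invalid register
		 * expression":
		 *     asm volatile("dcbt r0,%0" : : "r"(p));
		 */
		asm volatile("dcbt 0,%0" : : "r"(p));	/* literal 0, no warning */
	}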
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
[mpe: Mention r0 is almost always the culprit, fold in purgatory change]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/include/asm/ppc_asm.h | 2 +-
 arch/powerpc/kernel/swsusp_asm64.S | 2 +-
 arch/powerpc/lib/copypage_power7.S | 14 +++----
 arch/powerpc/lib/copyuser_power7.S | 66 +++++++++++++++---------------
 arch/powerpc/lib/memcpy_power7.S | 66 +++++++++++++++---------------
 arch/powerpc/lib/string_64.S | 2 +-
 6 files changed, 76 insertions(+), 76 deletions(-)
diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index c73750b0d9fa..24e95be3bfaf 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -437,7 +437,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
 	.machine push ;			\
 	.machine "power4" ;		\
 	lis	scratch,0x60000000@h;	\
-	dcbt	r0,scratch,0b01010;	\
+	dcbt	0,scratch,0b01010;	\
 	.machine pop

 /*
diff --git a/arch/powerpc/kernel/swsusp_asm64.S b/arch/powerpc/kernel/swsusp_asm64.S
index 988f38dced0f..82d8aae81c6a 100644
--- a/arch/powerpc/kernel/swsusp_asm64.S
+++ b/arch/powerpc/kernel/swsusp_asm64.S
@@ -179,7 +179,7 @@ nothing_to_copy:
 	sld	r3, r3, r0
 	li	r0, 0
 1:
-	dcbf	r0,r3
+	dcbf	0,r3
 	addi	r3,r3,0x20
 	bdnz	1b

diff --git a/arch/powerpc/lib/copypage_power7.S b/arch/powerpc/lib/copypage_power7.S
index a84d333ecb09..ca5fc8fa7efc 100644
--- a/arch/powerpc/lib/copypage_power7.S
+++ b/arch/powerpc/lib/copypage_power7.S
@@ -45,13 +45,13 @@ _GLOBAL(copypage_power7)
 	.machine push
 	.machine "power4"
 	/* setup read stream 0  */
-	dcbt	r0,r4,0b01000	/* addr from */
-	dcbt	r0,r7,0b01010	/* length and depth from */
+	dcbt	0,r4,0b01000	/* addr from */
+	dcbt	0,r7,0b01010	/* length and depth from */
 	/* setup write stream 1 */
-	dcbtst	r0,r9,0b01000	/* addr to */
-	dcbtst	r0,r10,0b01010	/* length and depth to */
+	dcbtst	0,r9,0b01000	/* addr to */
+	dcbtst	0,r10,0b01010	/* length and depth to */
 	eieio
-	dcbt	r0,r8,0b01010	/* all streams GO */
+	dcbt	0,r8,0b01010	/* all streams GO */
 	.machine pop

 #ifdef CONFIG_ALTIVEC
@@ -83,7 +83,7 @@ _GLOBAL(copypage_power7)
 	li	r12,112

 	.align	5
-1:	lvx	v7,r0,r4
+1:	lvx	v7,0,r4
 	lvx	v6,r4,r6
 	lvx	v5,r4,r7
 	lvx	v4,r4,r8
@@ -92,7 +92,7 @@ _GLOBAL(copypage_power7)
 	lvx	v1,r4,r11
 	lvx	v0,r4,r12
 	addi	r4,r4,128
-	stvx	v7,r0,r3
+	stvx	v7,0,r3
 	stvx	v6,r3,r6
 	stvx	v5,r3,r7
 	stvx	v4,r3,r8
diff --git a/arch/powerpc/lib/copyuser_power7.S b/arch/powerpc/lib/copyuser_power7.S
index da0c568d18c4..391694814691 100644
--- a/arch/powerpc/lib/copyuser_power7.S
+++ b/arch/powerpc/lib/copyuser_power7.S
@@ -327,13 +327,13 @@ err1;	stb	r0,0(r3)
 	.machine push
 	.machine "power4"
 	/* setup read stream 0 */
-	dcbt	r0,r6,0b01000	/* addr from */
-	dcbt	r0,r7,0b01010	/* length and depth from */
+	dcbt	0,r6,0b01000	/* addr from */
+	dcbt	0,r7,0b01010	/* length and depth from */
 	/* setup write stream 1 */
-	dcbtst	r0,r9,0b01000	/* addr to */
-	dcbtst	r0,r10,0b01010	/* length and depth to */
+	dcbtst	0,r9,0b01000	/* addr to */
+	dcbtst	0,r10,0b01010	/* length and depth to */
 	eieio
-	dcbt	r0,r8,0b01010	/* all streams GO */
+	dcbt	0,r8,0b01010	/* all streams GO */
 	.machine pop

 	beq	cr1,.Lunwind_stack_nonvmx_copy
@@ -388,26 +388,26 @@ err3;	std	r0,0(r3)
 	li	r11,48

 	bf	cr7*4+3,5f
-err3;	lvx	v1,r0,r4
+err3;	lvx	v1,0,r4
 	addi	r4,r4,16
-err3;	stvx	v1,r0,r3
+err3;	stvx	v1,0,r3
 	addi	r3,r3,16

 5:	bf	cr7*4+2,6f
-err3;	lvx	v1,r0,r4
+err3;	lvx	v1,0,r4
 err3;	lvx	v0,r4,r9
 	addi	r4,r4,32
-err3;	stvx	v1,r0,r3
+err3;	stvx	v1,0,r3
 err3;	stvx	v0,r3,r9
 	addi	r3,r3,32

 6:	bf	cr7*4+1,7f
-err3;	lvx	v3,r0,r4
+err3;	lvx	v3,0,r4
 err3;	lvx	v2,r4,r9
 err3;	lvx	v1,r4,r10
 err3;	lvx	v0,r4,r11
 	addi	r4,r4,64
-err3;	stvx	v3,r0,r3
+err3;	stvx	v3,0,r3
 err3;	stvx	v2,r3,r9
 err3;	stvx	v1,r3,r10
 err3;	stvx	v0,r3,r11
@@ -433,7 +433,7 @@ err3;	stvx	v0,r3,r11
 	 */
 	.align	5
 8:
-err4;	lvx	v7,r0,r4
+err4;	lvx	v7,0,r4
 err4;	lvx	v6,r4,r9
 err4;	lvx	v5,r4,r10
 err4;	lvx	v4,r4,r11
@@ -442,7 +442,7 @@ err4;	lvx	v2,r4,r14
 err4;	lvx	v1,r4,r15
 err4;	lvx	v0,r4,r16
 	addi	r4,r4,128
-err4;	stvx	v7,r0,r3
+err4;	stvx	v7,0,r3
 err4;	stvx	v6,r3,r9
 err4;	stvx	v5,r3,r10
 err4;	stvx	v4,r3,r11
@@ -463,29 +463,29 @@ err4;	stvx	v0,r3,r16
 	mtocrf	0x01,r6

 	bf	cr7*4+1,9f
-err3;	lvx	v3,r0,r4
+err3;	lvx	v3,0,r4
 err3;	lvx	v2,r4,r9
 err3;	lvx	v1,r4,r10
 err3;	lvx	v0,r4,r11
 	addi	r4,r4,64
-err3;	stvx	v3,r0,r3
+err3;	stvx	v3,0,r3
 err3;	stvx	v2,r3,r9
 err3;	stvx	v1,r3,r10
 err3;	stvx	v0,r3,r11
 	addi	r3,r3,64

 9:	bf	cr7*4+2,10f
-err3;	lvx	v1,r0,r4
+err3;	lvx	v1,0,r4
 err3;	lvx	v0,r4,r9
 	addi	r4,r4,32
-err3;	stvx	v1,r0,r3
+err3;	stvx	v1,0,r3
 err3;	stvx	v0,r3,r9
 	addi	r3,r3,32

 10:	bf	cr7*4+3,11f
-err3;	lvx	v1,r0,r4
+err3;	lvx	v1,0,r4
 	addi	r4,r4,16
-err3;	stvx	v1,r0,r3
+err3;	stvx	v1,0,r3
 	addi	r3,r3,16

 	/* Up to 15B to go */
@@ -565,25 +565,25 @@ err3;	lvx	v0,0,r4
 	addi	r4,r4,16

 	bf	cr7*4+3,5f
-err3;	lvx	v1,r0,r4
+err3;	lvx	v1,0,r4
 	VPERM(v8,v0,v1,v16)
 	addi	r4,r4,16
-err3;	stvx	v8,r0,r3
+err3;	stvx	v8,0,r3
 	addi	r3,r3,16
 	vor	v0,v1,v1

 5:	bf	cr7*4+2,6f
-err3;	lvx	v1,r0,r4
+err3;	lvx	v1,0,r4
 	VPERM(v8,v0,v1,v16)
 err3;	lvx	v0,r4,r9
 	VPERM(v9,v1,v0,v16)
 	addi	r4,r4,32
-err3;	stvx	v8,r0,r3
+err3;	stvx	v8,0,r3
 err3;	stvx	v9,r3,r9
 	addi	r3,r3,32

 6:	bf	cr7*4+1,7f
-err3;	lvx	v3,r0,r4
+err3;	lvx	v3,0,r4
 	VPERM(v8,v0,v3,v16)
 err3;	lvx	v2,r4,r9
 	VPERM(v9,v3,v2,v16)
@@ -592,7 +592,7 @@ err3;	lvx	v1,r4,r10
 err3;	lvx	v0,r4,r11
 	VPERM(v11,v1,v0,v16)
 	addi	r4,r4,64
-err3;	stvx	v8,r0,r3
+err3;	stvx	v8,0,r3
 err3;	stvx	v9,r3,r9
 err3;	stvx	v10,r3,r10
 err3;	stvx	v11,r3,r11
@@ -618,7 +618,7 @@ err3;	stvx	v11,r3,r11
 	 */
 	.align	5
 8:
-err4;	lvx	v7,r0,r4
+err4;	lvx	v7,0,r4
 	VPERM(v8,v0,v7,v16)
 err4;	lvx	v6,r4,r9
 	VPERM(v9,v7,v6,v16)
@@ -635,7 +635,7 @@ err4;	lvx	v1,r4,r15
 err4;	lvx	v0,r4,r16
 	VPERM(v15,v1,v0,v16)
 	addi	r4,r4,128
-err4;	stvx	v8,r0,r3
+err4;	stvx	v8,0,r3
 err4;	stvx	v9,r3,r9
 err4;	stvx	v10,r3,r10
 err4;	stvx	v11,r3,r11
@@ -656,7 +656,7 @@ err4;	stvx	v15,r3,r16
 	mtocrf	0x01,r6

 	bf	cr7*4+1,9f
-err3;	lvx	v3,r0,r4
+err3;	lvx	v3,0,r4
 	VPERM(v8,v0,v3,v16)
 err3;	lvx	v2,r4,r9
 	VPERM(v9,v3,v2,v16)
@@ -665,27 +665,27 @@ err3;	lvx	v1,r4,r10
 err3;	lvx	v0,r4,r11
 	VPERM(v11,v1,v0,v16)
 	addi	r4,r4,64
-err3;	stvx	v8,r0,r3
+err3;	stvx	v8,0,r3
 err3;	stvx	v9,r3,r9
 err3;	stvx	v10,r3,r10
 err3;	stvx	v11,r3,r11
 	addi	r3,r3,64

 9:	bf	cr7*4+2,10f
-err3;	lvx	v1,r0,r4
+err3;	lvx	v1,0,r4
 	VPERM(v8,v0,v1,v16)
 err3;	lvx	v0,r4,r9
 	VPERM(v9,v1,v0,v16)
 	addi	r4,r4,32
-err3;	stvx	v8,r0,r3
+err3;	stvx	v8,0,r3
 err3;	stvx	v9,r3,r9
 	addi	r3,r3,32

 10:	bf	cr7*4+3,11f
-err3;	lvx	v1,r0,r4
+err3;	lvx	v1,0,r4
 	VPERM(v8,v0,v1,v16)
 	addi	r4,r4,16
-err3;	stvx	v8,r0,r3
+err3;	stvx	v8,0,r3
 	addi	r3,r3,16

 	/* Up to 15B to go */
diff --git a/arch/powerpc/lib/memcpy_power7.S b/arch/powerpc/lib/memcpy_power7.S
index 786234fd4e91..193909abd18b 100644
--- a/arch/powerpc/lib/memcpy_power7.S
+++ b/arch/powerpc/lib/memcpy_power7.S
@@ -261,12 +261,12 @@ _GLOBAL(memcpy_power7)

 	.machine push
 	.machine "power4"
-	dcbt	r0,r6,0b01000
-	dcbt	r0,r7,0b01010
-	dcbtst	r0,r9,0b01000
-	dcbtst	r0,r10,0b01010
+	dcbt	0,r6,0b01000
+	dcbt	0,r7,0b01010
+	dcbtst	0,r9,0b01000
+	dcbtst	0,r10,0b01010
 	eieio
-	dcbt	r0,r8,0b01010	/* GO */
+	dcbt	0,r8,0b01010	/* GO */
 	.machine pop

 	beq	cr1,.Lunwind_stack_nonvmx_copy
@@ -321,26 +321,26 @@ _GLOBAL(memcpy_power7)
 	li	r11,48

 	bf	cr7*4+3,5f
-	lvx	v1,r0,r4
+	lvx	v1,0,r4
 	addi	r4,r4,16
-	stvx	v1,r0,r3
+	stvx	v1,0,r3
 	addi	r3,r3,16

 5:	bf	cr7*4+2,6f
-	lvx	v1,r0,r4
+	lvx	v1,0,r4
 	lvx	v0,r4,r9
 	addi	r4,r4,32
-	stvx	v1,r0,r3
+	stvx	v1,0,r3
 	stvx	v0,r3,r9
 	addi	r3,r3,32

 6:	bf	cr7*4+1,7f
-	lvx	v3,r0,r4
+	lvx	v3,0,r4
 	lvx	v2,r4,r9
 	lvx	v1,r4,r10
 	lvx	v0,r4,r11
 	addi	r4,r4,64
-	stvx	v3,r0,r3
+	stvx	v3,0,r3
 	stvx	v2,r3,r9
 	stvx	v1,r3,r10
 	stvx	v0,r3,r11
@@ -366,7 +366,7 @@ _GLOBAL(memcpy_power7)
 	 */
 	.align	5
 8:
-	lvx	v7,r0,r4
+	lvx	v7,0,r4
 	lvx	v6,r4,r9
 	lvx	v5,r4,r10
 	lvx	v4,r4,r11
@@ -375,7 +375,7 @@ _GLOBAL(memcpy_power7)
 	lvx	v1,r4,r15
 	lvx	v0,r4,r16
 	addi	r4,r4,128
-	stvx	v7,r0,r3
+	stvx	v7,0,r3
 	stvx	v6,r3,r9
 	stvx	v5,r3,r10
 	stvx	v4,r3,r11
@@ -396,29 +396,29 @@ _GLOBAL(memcpy_power7)
 	mtocrf	0x01,r6

 	bf	cr7*4+1,9f
-	lvx	v3,r0,r4
+	lvx	v3,0,r4
 	lvx	v2,r4,r9
 	lvx	v1,r4,r10
 	lvx	v0,r4,r11
 	addi	r4,r4,64
-	stvx	v3,r0,r3
+	stvx	v3,0,r3
 	stvx	v2,r3,r9
 	stvx	v1,r3,r10
 	stvx	v0,r3,r11
 	addi	r3,r3,64

 9:	bf	cr7*4+2,10f
-	lvx	v1,r0,r4
+	lvx	v1,0,r4
 	lvx	v0,r4,r9
 	addi	r4,r4,32
-	stvx	v1,r0,r3
+	stvx	v1,0,r3
 	stvx	v0,r3,r9
 	addi	r3,r3,32

 10:	bf	cr7*4+3,11f
-	lvx	v1,r0,r4
+	lvx	v1,0,r4
 	addi	r4,r4,16
-	stvx	v1,r0,r3
+	stvx	v1,0,r3
 	addi	r3,r3,16

 	/* Up to 15B to go */
@@ -499,25 +499,25 @@ _GLOBAL(memcpy_power7)
 	addi	r4,r4,16

 	bf	cr7*4+3,5f
-	lvx	v1,r0,r4
+	lvx	v1,0,r4
 	VPERM(v8,v0,v1,v16)
 	addi	r4,r4,16
-	stvx	v8,r0,r3
+	stvx	v8,0,r3
 	addi	r3,r3,16
 	vor	v0,v1,v1

 5:	bf	cr7*4+2,6f
-	lvx	v1,r0,r4
+	lvx	v1,0,r4
 	VPERM(v8,v0,v1,v16)
 	lvx	v0,r4,r9
 	VPERM(v9,v1,v0,v16)
 	addi	r4,r4,32
-	stvx	v8,r0,r3
+	stvx	v8,0,r3
 	stvx	v9,r3,r9
 	addi	r3,r3,32

 6:	bf	cr7*4+1,7f
-	lvx	v3,r0,r4
+	lvx	v3,0,r4
 	VPERM(v8,v0,v3,v16)
 	lvx	v2,r4,r9
 	VPERM(v9,v3,v2,v16)
@@ -526,7 +526,7 @@ _GLOBAL(memcpy_power7)
 	lvx	v0,r4,r11
 	VPERM(v11,v1,v0,v16)
 	addi	r4,r4,64
-	stvx	v8,r0,r3
+	stvx	v8,0,r3
 	stvx	v9,r3,r9
 	stvx	v10,r3,r10
 	stvx	v11,r3,r11
@@ -552,7 +552,7 @@ _GLOBAL(memcpy_power7)
 	 */
 	.align	5
 8:
-	lvx	v7,r0,r4
+	lvx	v7,0,r4
 	VPERM(v8,v0,v7,v16)
 	lvx	v6,r4,r9
 	VPERM(v9,v7,v6,v16)
@@ -569,7 +569,7 @@ _GLOBAL(memcpy_power7)
 	lvx	v0,r4,r16
 	VPERM(v15,v1,v0,v16)
 	addi	r4,r4,128
-	stvx	v8,r0,r3
+	stvx	v8,0,r3
 	stvx	v9,r3,r9
 	stvx	v10,r3,r10
 	stvx	v11,r3,r11
@@ -590,7 +590,7 @@ _GLOBAL(memcpy_power7)
 	mtocrf	0x01,r6

 	bf	cr7*4+1,9f
-	lvx	v3,r0,r4
+	lvx	v3,0,r4
 	VPERM(v8,v0,v3,v16)
 	lvx	v2,r4,r9
 	VPERM(v9,v3,v2,v16)
@@ -599,27 +599,27 @@ _GLOBAL(memcpy_power7)
 	lvx	v0,r4,r11
 	VPERM(v11,v1,v0,v16)
 	addi	r4,r4,64
-	stvx	v8,r0,r3
+	stvx	v8,0,r3
 	stvx	v9,r3,r9
 	stvx	v10,r3,r10
 	stvx	v11,r3,r11
 	addi	r3,r3,64

 9:	bf	cr7*4+2,10f
-	lvx	v1,r0,r4
+	lvx	v1,0,r4
 	VPERM(v8,v0,v1,v16)
 	lvx	v0,r4,r9
 	VPERM(v9,v1,v0,v16)
 	addi	r4,r4,32
-	stvx	v8,r0,r3
+	stvx	v8,0,r3
 	stvx	v9,r3,r9
 	addi	r3,r3,32

 10:	bf	cr7*4+3,11f
-	lvx	v1,r0,r4
+	lvx	v1,0,r4
 	VPERM(v8,v0,v1,v16)
 	addi	r4,r4,16
-	stvx	v8,r0,r3
+	stvx	v8,0,r3
 	addi	r3,r3,16

 	/* Up to 15B to go */
diff --git a/arch/powerpc/lib/string_64.S b/arch/powerpc/lib/string_64.S
index 57ace356c949..11e6372537fd 100644
--- a/arch/powerpc/lib/string_64.S
+++ b/arch/powerpc/lib/string_64.S
@@ -192,7 +192,7 @@ err1;	std	r0,8(r3)
 	mtctr	r6
 	mr	r8,r3
 14:
-err1;	dcbz	r0,r3
+err1;	dcbz	0,r3
 	add	r3,r3,r9
 	bdnz	14b
commit a6b3964ad71a61bb7c61d80a60bea7d42187b2eb upstream.
A no-op form of ori (or immediate of 0 into r31 and the result stored in r31) has been re-tasked as a speculation barrier. The instruction only acts as a barrier on newer machines with appropriate firmware support. On older CPUs it remains a harmless no-op.
Implement barrier_nospec using this instruction.
mpe: The semantics of the instruction are believed to be that it prevents execution of subsequent instructions until preceding branches have been fully resolved and are no longer executing speculatively. There is no further documentation available at this time.
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/include/asm/barrier.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index 798ab37c9930..352ea3e3cc05 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -77,6 +77,21 @@ do {									\

 #define smp_mb__before_spinlock()	smp_mb()

+#ifdef CONFIG_PPC_BOOK3S_64
+/*
+ * Prevent execution of subsequent instructions until preceding branches have
+ * been fully resolved and are no longer executing speculatively.
+ */
+#define barrier_nospec_asm ori 31,31,0
+
+// This also acts as a compiler barrier due to the memory clobber.
+#define barrier_nospec() asm (stringify_in_c(barrier_nospec_asm) ::: "memory")
+
+#else /* !CONFIG_PPC_BOOK3S_64 */
+#define barrier_nospec_asm
+#define barrier_nospec()
+#endif
+
 #include <asm-generic/barrier.h>

 #endif /* _ASM_POWERPC_BARRIER_H */
commit 2eea7f067f495e33b8b116b35b5988ab2b8aec55 upstream.
Based on the RFI patching. This is required to be able to disable the speculation barrier.
Only one barrier type is supported and it does nothing when the firmware does not enable it. Also, re-patching modules is not supported. So the only meaningful thing that can be done is patching out the speculation barrier at boot when the user says it is not wanted.
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/include/asm/barrier.h | 2 +-
 arch/powerpc/include/asm/feature-fixups.h | 9 ++++++++
 arch/powerpc/include/asm/setup.h | 1 +
 arch/powerpc/kernel/security.c | 9 ++++++++
 arch/powerpc/kernel/vmlinux.lds.S | 7 ++++++
 arch/powerpc/lib/feature-fixups.c | 27 +++++++++++++++++++++++
 6 files changed, 54 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index 352ea3e3cc05..a8131162104f 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -82,7 +82,7 @@ do {									\
  * Prevent execution of subsequent instructions until preceding branches have
  * been fully resolved and are no longer executing speculatively.
  */
-#define barrier_nospec_asm ori 31,31,0
+#define barrier_nospec_asm NOSPEC_BARRIER_FIXUP_SECTION; nop

 // This also acts as a compiler barrier due to the memory clobber.
 #define barrier_nospec() asm (stringify_in_c(barrier_nospec_asm) ::: "memory")
diff --git a/arch/powerpc/include/asm/feature-fixups.h b/arch/powerpc/include/asm/feature-fixups.h
index 0bf8202feca6..afd3efd38938 100644
--- a/arch/powerpc/include/asm/feature-fixups.h
+++ b/arch/powerpc/include/asm/feature-fixups.h
@@ -213,6 +213,14 @@ void setup_feature_keys(void);
 	FTR_ENTRY_OFFSET 951b-952b;			\
 	.popsection;

+#define NOSPEC_BARRIER_FIXUP_SECTION			\
+953:							\
+	.pushsection __barrier_nospec_fixup,"a";	\
+	.align 2;					\
+954:							\
+	FTR_ENTRY_OFFSET 953b-954b;			\
+	.popsection;
+

 #ifndef __ASSEMBLY__

@@ -220,6 +228,7 @@ extern long stf_barrier_fallback;
 extern long __start___stf_entry_barrier_fixup, __stop___stf_entry_barrier_fixup;
 extern long __start___stf_exit_barrier_fixup, __stop___stf_exit_barrier_fixup;
 extern long __start___rfi_flush_fixup, __stop___rfi_flush_fixup;
+extern long __start___barrier_nospec_fixup, __stop___barrier_nospec_fixup;

 #endif

diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index 3f160cd20107..703ddf752516 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -50,6 +50,7 @@ enum l1d_flush_type {

 void setup_rfi_flush(enum l1d_flush_type, bool enable);
 void do_rfi_flush_fixups(enum l1d_flush_type types);
+void do_barrier_nospec_fixups(bool enable);

 #endif /* !__ASSEMBLY__ */

diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 2277df84ef6e..8b1cf9c81b82 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -10,10 +10,19 @@
 #include <linux/seq_buf.h>

 #include <asm/security_features.h>
+#include <asm/setup.h>

 unsigned long powerpc_security_features __read_mostly = SEC_FTR_DEFAULT;

+static bool barrier_nospec_enabled;
+
+static void enable_barrier_nospec(bool enable)
+{
+	barrier_nospec_enabled = enable;
+	do_barrier_nospec_fixups(enable);
+}
+
 ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf)
 {
 	bool thread_priv;
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index c16fddbb6ab8..61def0be6914 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -153,6 +153,13 @@ SECTIONS
 		*(__rfi_flush_fixup)
 		__stop___rfi_flush_fixup = .;
 	}
+
+	. = ALIGN(8);
+	__spec_barrier_fixup : AT(ADDR(__spec_barrier_fixup) - LOAD_OFFSET) {
+		__start___barrier_nospec_fixup = .;
+		*(__barrier_nospec_fixup)
+		__stop___barrier_nospec_fixup = .;
+	}
 #endif

 	EXCEPTION_TABLE(0)
diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c
index cf1398e3c2e0..f82ae6bb2365 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -277,6 +277,33 @@ void do_rfi_flush_fixups(enum l1d_flush_type types)
 		(types &  L1D_FLUSH_MTTRIG)     ? "mttrig type" : "unknown");
 }
+
+void do_barrier_nospec_fixups(bool enable)
+{
+	unsigned int instr, *dest;
+	long *start, *end;
+	int i;
+
+	start = PTRRELOC(&__start___barrier_nospec_fixup),
+	end = PTRRELOC(&__stop___barrier_nospec_fixup);
+
+	instr = 0x60000000; /* nop */
+
+	if (enable) {
+		pr_info("barrier-nospec: using ORI speculation barrier\n");
+		instr = 0x63ff0000; /* ori 31,31,0 speculation barrier */
+	}
+
+	for (i = 0; start < end; start++, i++) {
+		dest = (void *)start + *start;
+
+		pr_devel("patching dest %lx\n", (unsigned long)dest);
+		patch_instruction(dest, instr);
+	}
+
+	printk(KERN_DEBUG "barrier-nospec: patched %d locations\n", i);
+}
+
 #endif /* CONFIG_PPC_BOOK3S_64 */

 void do_lwsync_fixups(unsigned long value, void *fixup_start, void *fixup_end)
commit 51c3c62b58b357e8d35e4cc32f7b4ec907426fe3 upstream.
This stops us from doing code patching in init sections after they've been freed.
In this chain: kvm_guest_init() -> kvm_use_magic_page() -> fault_in_pages_readable() -> __get_user() -> __get_user_nocheck() -> barrier_nospec();
We have a code patching location at barrier_nospec(), and kvm_guest_init() is an init function. This whole chain gets inlined, so when we free the init section (and with it kvm_guest_init()), this code goes away and hence should no longer be patched.
We saw this as userspace memory corruption when using a memory checker while doing partition migration testing on powervm (this starts the code patching post migration via /sys/kernel/mobility/migration). In theory, it could also happen when using /sys/kernel/debug/powerpc/barrier_nospec.
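The guard the fix installs is small; a minimal sketch, using the names from the diff below (init_mem_is_free is set in free_initmem(), and init_section_contains() tests whether an address lies within the init section):

	/* Refuse to patch code that lived in the now-freed init section. */
	if (init_mem_is_free && init_section_contains(addr, 4)) {
		pr_debug("Skipping init section patching addr: 0x%px\n", addr);
		return 0;	/* the code is gone, there is nothing to patch */
	}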
Cc: stable@vger.kernel.org # 4.13+ Signed-off-by: Michael Neuling mikey@neuling.org Reviewed-by: Nicholas Piggin npiggin@gmail.com Reviewed-by: Christophe Leroy christophe.leroy@c-s.fr Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/setup.h | 1 + arch/powerpc/lib/code-patching.c | 8 ++++++++ arch/powerpc/mm/mem.c | 2 ++ 3 files changed, 11 insertions(+)
diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index 703ddf752516..709f4e739ae8 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -8,6 +8,7 @@ extern void ppc_printk_progress(char *s, unsigned short hex);
extern unsigned int rtas_data; extern unsigned long long memory_limit; +extern bool init_mem_is_free; extern unsigned long klimit; extern void *zalloc_maybe_bootmem(size_t size, gfp_t mask);
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 753d591f1b52..c77c486fbf24 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -14,12 +14,20 @@ #include <asm/page.h> #include <asm/code-patching.h> #include <asm/uaccess.h> +#include <asm/setup.h> +#include <asm/sections.h>
int patch_instruction(unsigned int *addr, unsigned int instr) { int err;
+ /* Make sure we aren't patching a freed init section */ + if (init_mem_is_free && init_section_contains(addr, 4)) { + pr_debug("Skipping init section patching addr: 0x%px\n", addr); + return 0; + } + __put_user_size(instr, addr, 4, err); if (err) return err; diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index 5f844337de21..1e93dbc88e80 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -62,6 +62,7 @@ #endif
unsigned long long memory_limit; +bool init_mem_is_free;
#ifdef CONFIG_HIGHMEM pte_t *kmap_pte; @@ -396,6 +397,7 @@ void __init mem_init(void) void free_initmem(void) { ppc_md.progress = ppc_printk_progress; + init_mem_is_free = true; free_initmem_default(POISON_FREE_INITMEM); }
commit 815069ca57c142eb71d27439bc27f41a433a67b3 upstream.
Note that, unlike RFI (which is patched only in the kernel), the nospec state reflects the settings at the time the module was loaded.
Iterating all modules and re-patching every time the settings change is not implemented.
Based on lwsync patching.
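Concretely, each module carries its own __spec_barrier_fixup section, and module_finalize() patches it once against the global state at load time; a sketch matching the diff below:

	/* Patch the module's barrier_nospec sites once, at module load. */
	sect = find_section(hdr, sechdrs, "__spec_barrier_fixup");
	if (sect != NULL)
		do_barrier_nospec_fixups_range(barrier_nospec_enabled,
					       (void *)sect->sh_addr,
					       (void *)sect->sh_addr + sect->sh_size);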
Signed-off-by: Michal Suchanek msuchanek@suse.de Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/setup.h | 7 +++++++ arch/powerpc/kernel/module.c | 6 ++++++ arch/powerpc/kernel/security.c | 2 +- arch/powerpc/lib/feature-fixups.c | 16 +++++++++++++--- 4 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index 709f4e739ae8..a225b5c42e76 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -52,6 +52,13 @@ enum l1d_flush_type { void setup_rfi_flush(enum l1d_flush_type, bool enable); void do_rfi_flush_fixups(enum l1d_flush_type types); void do_barrier_nospec_fixups(bool enable); +extern bool barrier_nospec_enabled; + +#ifdef CONFIG_PPC_BOOK3S_64 +void do_barrier_nospec_fixups_range(bool enable, void *start, void *end); +#else +static inline void do_barrier_nospec_fixups_range(bool enable, void *start, void *end) { }; +#endif
#endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c index 30b89d5cbb03..d30f0626dcd0 100644 --- a/arch/powerpc/kernel/module.c +++ b/arch/powerpc/kernel/module.c @@ -72,6 +72,12 @@ int module_finalize(const Elf_Ehdr *hdr, do_feature_fixups(powerpc_firmware_features, (void *)sect->sh_addr, (void *)sect->sh_addr + sect->sh_size); + + sect = find_section(hdr, sechdrs, "__spec_barrier_fixup"); + if (sect != NULL) + do_barrier_nospec_fixups_range(barrier_nospec_enabled, + (void *)sect->sh_addr, + (void *)sect->sh_addr + sect->sh_size); #endif
sect = find_section(hdr, sechdrs, "__lwsync_fixup"); diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c index 8b1cf9c81b82..34d436fe2498 100644 --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -15,7 +15,7 @@
unsigned long powerpc_security_features __read_mostly = SEC_FTR_DEFAULT;
-static bool barrier_nospec_enabled; +bool barrier_nospec_enabled;
static void enable_barrier_nospec(bool enable) { diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index f82ae6bb2365..a1222c441df5 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -278,14 +278,14 @@ void do_rfi_flush_fixups(enum l1d_flush_type types) : "unknown"); }
-void do_barrier_nospec_fixups(bool enable) +void do_barrier_nospec_fixups_range(bool enable, void *fixup_start, void *fixup_end) { unsigned int instr, *dest; long *start, *end; int i;
- start = PTRRELOC(&__start___barrier_nospec_fixup), - end = PTRRELOC(&__stop___barrier_nospec_fixup); + start = fixup_start; + end = fixup_end;
instr = 0x60000000; /* nop */
@@ -304,6 +304,16 @@ void do_barrier_nospec_fixups(bool enable) printk(KERN_DEBUG "barrier-nospec: patched %d locations\n", i); }
+void do_barrier_nospec_fixups(bool enable) +{ + void *start, *end; + + start = PTRRELOC(&__start___barrier_nospec_fixup), + end = PTRRELOC(&__stop___barrier_nospec_fixup); + + do_barrier_nospec_fixups_range(enable, start, end); +} + #endif /* CONFIG_PPC_BOOK3S_64 */
void do_lwsync_fixups(unsigned long value, void *fixup_start, void *fixup_end)
commit cb3d6759a93c6d0aea1c10deb6d00e111c29c19c upstream.
Check what firmware told us and enable/disable the barrier_nospec as appropriate.
We err on the side of enabling the barrier, as it's a no-op on older systems; see the comment for more detail.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/setup.h | 1 + arch/powerpc/kernel/security.c | 60 ++++++++++++++++++++++++++ arch/powerpc/platforms/powernv/setup.c | 1 + arch/powerpc/platforms/pseries/setup.c | 1 + 4 files changed, 63 insertions(+)
diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index a225b5c42e76..84ae150ce6a6 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -51,6 +51,7 @@ enum l1d_flush_type {
void setup_rfi_flush(enum l1d_flush_type, bool enable); void do_rfi_flush_fixups(enum l1d_flush_type types); +void setup_barrier_nospec(void); void do_barrier_nospec_fixups(bool enable); extern bool barrier_nospec_enabled;
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c index 34d436fe2498..d0e974da4918 100644 --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -9,6 +9,7 @@ #include <linux/device.h> #include <linux/seq_buf.h>
+#include <asm/debug.h> #include <asm/security_features.h> #include <asm/setup.h>
@@ -23,6 +24,65 @@ static void enable_barrier_nospec(bool enable) do_barrier_nospec_fixups(enable); }
+void setup_barrier_nospec(void) +{ + bool enable; + + /* + * It would make sense to check SEC_FTR_SPEC_BAR_ORI31 below as well. + * But there's a good reason not to. The two flags we check below are + * both enabled by default in the kernel, so if the hcall is not + * functional they will be enabled. + * On a system where the host firmware has been updated (so the ori + * functions as a barrier), but on which the hypervisor (KVM/Qemu) has + * not been updated, we would like to enable the barrier. Dropping the + * check for SEC_FTR_SPEC_BAR_ORI31 achieves that. The only downside is + * we potentially enable the barrier on systems where the host firmware + * is not updated, but that's harmless as it's a no-op. + */ + enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && + security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR); + + enable_barrier_nospec(enable); +} + +#ifdef CONFIG_DEBUG_FS +static int barrier_nospec_set(void *data, u64 val) +{ + switch (val) { + case 0: + case 1: + break; + default: + return -EINVAL; + } + + if (!!val == !!barrier_nospec_enabled) + return 0; + + enable_barrier_nospec(!!val); + + return 0; +} + +static int barrier_nospec_get(void *data, u64 *val) +{ + *val = barrier_nospec_enabled ? 1 : 0; + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(fops_barrier_nospec, + barrier_nospec_get, barrier_nospec_set, "%llu\n"); + +static __init int barrier_nospec_debugfs_init(void) +{ + debugfs_create_file("barrier_nospec", 0600, powerpc_debugfs_root, NULL, + &fops_barrier_nospec); + return 0; +} +device_initcall(barrier_nospec_debugfs_init); +#endif /* CONFIG_DEBUG_FS */ + ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) { bool thread_priv; diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index 17203abf38e8..eb5464648810 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -123,6 +123,7 @@ static void pnv_setup_rfi_flush(void) security_ftr_enabled(SEC_FTR_L1D_FLUSH_HV));
setup_rfi_flush(type, enable); + setup_barrier_nospec(); }
static void __init pnv_setup_arch(void) diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 91ade7755823..2b2759c98c7e 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -525,6 +525,7 @@ void pseries_setup_rfi_flush(void) security_ftr_enabled(SEC_FTR_L1D_FLUSH_PR);
setup_rfi_flush(types, enable); + setup_barrier_nospec(); }
static void __init pSeries_setup_arch(void)
commit ddf35cf3764b5a182b178105f57515b42e2634f8 upstream.
Based on the x86 commit doing the same.
See commit 304ec1b05031 ("x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec") and b3bbfb3fb5d2 ("x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec") for more detail.
In all cases we are ordering the load from the potentially user-controlled pointer vs a previous branch based on an access_ok() check or similar.
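A minimal sketch of the pattern being hardened (schematic, not the exact macro bodies): the barrier keeps the dependent load from issuing speculatively before the access_ok() branch has resolved.

	if (access_ok(VERIFY_READ, uaddr, size)) {
		/* Without this, the CPU may load *uaddr speculatively even
		 * when the check would fail, leaking data via a side channel. */
		barrier_nospec();
		__get_user_size(val, uaddr, size, err);
	}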
Based on a patch from Michal Suchanek.
Signed-off-by: Michal Suchanek msuchanek@suse.de Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/uaccess.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h index 31913b3ac7ab..da852153c1f8 100644 --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -269,6 +269,7 @@ do { \ __chk_user_ptr(ptr); \ if (!is_kernel_addr((unsigned long)__gu_addr)) \ might_fault(); \ + barrier_nospec(); \ __get_user_size(__gu_val, __gu_addr, (size), __gu_err); \ (x) = (__typeof__(*(ptr)))__gu_val; \ __gu_err; \ @@ -280,8 +281,10 @@ do { \ unsigned long __gu_val = 0; \ __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ might_fault(); \ - if (access_ok(VERIFY_READ, __gu_addr, (size))) \ + if (access_ok(VERIFY_READ, __gu_addr, (size))) { \ + barrier_nospec(); \ __get_user_size(__gu_val, __gu_addr, (size), __gu_err); \ + } \ (x) = (__force __typeof__(*(ptr)))__gu_val; \ __gu_err; \ }) @@ -292,6 +295,7 @@ do { \ unsigned long __gu_val; \ __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ __chk_user_ptr(ptr); \ + barrier_nospec(); \ __get_user_size(__gu_val, __gu_addr, (size), __gu_err); \ (x) = (__force __typeof__(*(ptr)))__gu_val; \ __gu_err; \ @@ -348,15 +352,19 @@ static inline unsigned long __copy_from_user_inatomic(void *to,
switch (n) { case 1: + barrier_nospec(); __get_user_size(*(u8 *)to, from, 1, ret); break; case 2: + barrier_nospec(); __get_user_size(*(u16 *)to, from, 2, ret); break; case 4: + barrier_nospec(); __get_user_size(*(u32 *)to, from, 4, ret); break; case 8: + barrier_nospec(); __get_user_size(*(u64 *)to, from, 8, ret); break; } @@ -366,6 +374,7 @@ static inline unsigned long __copy_from_user_inatomic(void *to,
check_object_size(to, n, false);
+ barrier_nospec(); return __copy_tofrom_user((__force void __user *)to, from, n); }
commit 51973a815c6b46d7b23b68d6af371ad1c9d503ca upstream.
Our syscall entry is done in assembly so patch in an explicit barrier_nospec.
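In C terms, the asm added below fences the syscall-table load behind the bounds check; a rough sketch (the handler signature is schematic):

	long ret = -ENOSYS;

	if (nr < NR_syscalls) {
		/* Don't let the table load be speculated past the check. */
		barrier_nospec();
		ret = sys_call_table[nr](regs);
	}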
Based on a patch by Michal Suchanek.
Signed-off-by: Michal Suchanek msuchanek@suse.de Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/entry_64.S | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index e24ae0fa80ed..11e390662384 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -38,6 +38,7 @@ #include <asm/context_tracking.h> #include <asm/tm.h> #include <asm/ppc-opcode.h> +#include <asm/barrier.h> #include <asm/export.h> #ifdef CONFIG_PPC_BOOK3S #include <asm/exception-64s.h> @@ -180,6 +181,15 @@ system_call: /* label this so stack traces look sane */ clrldi r8,r8,32 15: slwi r0,r0,4 + + barrier_nospec_asm + /* + * Prevent the load of the handler below (based on the user-passed + * system call number) being speculatively executed until the test + * against NR_syscalls and branch to .Lsyscall_enosys above has + * committed. + */ + ldx r12,r11,r0 /* Fetch system call handler [ptr] */ mtctr r12 bctrl /* Call handler */
commit a377514519b9a20fa1ea9adddbb4129573129cef upstream.
We now have barrier_nospec as a mitigation, so print it in cpu_show_spectre_v1() when enabled.
Signed-off-by: Michal Suchanek msuchanek@suse.de Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/security.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c index d0e974da4918..f189f946d935 100644 --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -121,6 +121,9 @@ ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, c if (!security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR)) return sprintf(buf, "Not affected\n");
+ if (barrier_nospec_enabled) + return sprintf(buf, "Mitigation: __user pointer sanitization\n"); + return sprintf(buf, "Vulnerable\n"); }
commit 6d44acae1937b81cf8115ada8958e04f601f3f2e upstream.
When I added the spectre_v2 information in sysfs, I included the availability of the ori31 speculation barrier.
Although the ori31 barrier can be used to mitigate v2, it's primarily intended as a Spectre v1 mitigation. Spectre v2 is mitigated by hardware changes.
So rework the sysfs files to show the ori31 information in the spectre_v1 file, rather than v2.
Currently we display, e.g.:
$ grep . spectre_v*
spectre_v1:Mitigation: __user pointer sanitization
spectre_v2:Mitigation: Indirect branch cache disabled, ori31 speculation barrier enabled
After:
$ grep . spectre_v*
spectre_v1:Mitigation: __user pointer sanitization, ori31 speculation barrier enabled
spectre_v2:Mitigation: Indirect branch cache disabled
Fixes: d6fbe1c55c55 ("powerpc/64s: Wire up cpu_show_spectre_v2()") Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/security.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c index f189f946d935..bf298d0c475f 100644 --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -118,25 +118,35 @@ ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, cha
ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, char *buf) { - if (!security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR)) - return sprintf(buf, "Not affected\n"); + struct seq_buf s; + + seq_buf_init(&s, buf, PAGE_SIZE - 1);
- if (barrier_nospec_enabled) - return sprintf(buf, "Mitigation: __user pointer sanitization\n"); + if (security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR)) { + if (barrier_nospec_enabled) + seq_buf_printf(&s, "Mitigation: __user pointer sanitization"); + else + seq_buf_printf(&s, "Vulnerable");
- return sprintf(buf, "Vulnerable\n"); + if (security_ftr_enabled(SEC_FTR_SPEC_BAR_ORI31)) + seq_buf_printf(&s, ", ori31 speculation barrier enabled"); + + seq_buf_printf(&s, "\n"); + } else + seq_buf_printf(&s, "Not affected\n"); + + return s.len; }
ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, char *buf) { - bool bcs, ccd, ori; struct seq_buf s; + bool bcs, ccd;
seq_buf_init(&s, buf, PAGE_SIZE - 1);
bcs = security_ftr_enabled(SEC_FTR_BCCTRL_SERIALISED); ccd = security_ftr_enabled(SEC_FTR_COUNT_CACHE_DISABLED); - ori = security_ftr_enabled(SEC_FTR_SPEC_BAR_ORI31);
if (bcs || ccd) { seq_buf_printf(&s, "Mitigation: "); @@ -152,9 +162,6 @@ ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, c } else seq_buf_printf(&s, "Vulnerable");
- if (ori) - seq_buf_printf(&s, ", ori31 speculation barrier enabled"); - seq_buf_printf(&s, "\n");
return s.len;
commit cf175dc315f90185128fb061dc05b6fbb211aa2f upstream.
The speculation barrier can be disabled from the command line with the parameter: "nospectre_v1".
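For example, a kernel booted with nospectre_v1 leaves the barrier disabled, which can be confirmed via the debugfs file added earlier (assuming debugfs is mounted at /sys/kernel/debug):

$ cat /proc/cmdline
... nospectre_v1
$ cat /sys/kernel/debug/powerpc/barrier_nospec
0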
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/security.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c index bf298d0c475f..813e38ff81ce 100644 --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -17,6 +17,7 @@ unsigned long powerpc_security_features __read_mostly = SEC_FTR_DEFAULT;
bool barrier_nospec_enabled; +static bool no_nospec;
static void enable_barrier_nospec(bool enable) { @@ -43,9 +44,18 @@ void setup_barrier_nospec(void) enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) && security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR);
- enable_barrier_nospec(enable); + if (!no_nospec) + enable_barrier_nospec(enable); }
+static int __init handle_nospectre_v1(char *p) +{ + no_nospec = true; + + return 0; +} +early_param("nospectre_v1", handle_nospectre_v1); + #ifdef CONFIG_DEBUG_FS static int barrier_nospec_set(void *data, u64 val) {
Hi,
I have tested the patches on NXP platforms and they worked as expected.
Diana
Diana Madalina Craciun diana.craciun@nxp.com writes:
Hi,
I have tested the patches on NXP platforms and they worked as expected.
Thanks Diana.
cheers
commit 6453b532f2c8856a80381e6b9a1f5ea2f12294df upstream.
NXP Book3E platforms are not vulnerable to speculative store bypass, so make the mitigations PPC_BOOK3S_64 specific.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/security.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c index 813e38ff81ce..926ed3c38741 100644 --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -177,6 +177,7 @@ ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, c return s.len; }
+#ifdef CONFIG_PPC_BOOK3S_64 /* * Store-forwarding barrier support. */ @@ -324,3 +325,4 @@ static __init int stf_barrier_debugfs_init(void) } device_initcall(stf_barrier_debugfs_init); #endif /* CONFIG_DEBUG_FS */ +#endif /* CONFIG_PPC_BOOK3S_64 */
commit 179ab1cbf883575c3a585bcfc0f2160f1d22a149 upstream.
Add a config symbol to encode which platforms support the barrier_nospec speculation barrier. Currently this is just Book3S 64 but we will add Book3E in a future patch.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/Kconfig | 7 ++++++- arch/powerpc/include/asm/barrier.h | 6 +++--- arch/powerpc/include/asm/setup.h | 2 +- arch/powerpc/kernel/Makefile | 3 ++- arch/powerpc/kernel/module.c | 4 +++- arch/powerpc/kernel/vmlinux.lds.S | 4 +++- arch/powerpc/lib/feature-fixups.c | 6 ++++-- 7 files changed, 22 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 0a6bb48854e3..a238698178fc 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -128,7 +128,7 @@ config PPC select ARCH_HAS_GCOV_PROFILE_ALL select GENERIC_SMP_IDLE_THREAD select GENERIC_CMOS_UPDATE - select GENERIC_CPU_VULNERABILITIES if PPC_BOOK3S_64 + select GENERIC_CPU_VULNERABILITIES if PPC_BARRIER_NOSPEC select GENERIC_TIME_VSYSCALL_OLD select GENERIC_CLOCKEVENTS select GENERIC_CLOCKEVENTS_BROADCAST if SMP @@ -164,6 +164,11 @@ config PPC select HAVE_ARCH_HARDENED_USERCOPY select HAVE_KERNEL_GZIP
+config PPC_BARRIER_NOSPEC + bool + default y + depends on PPC_BOOK3S_64 + config GENERIC_CSUM def_bool CPU_LITTLE_ENDIAN
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h index a8131162104f..465a64316897 100644 --- a/arch/powerpc/include/asm/barrier.h +++ b/arch/powerpc/include/asm/barrier.h @@ -77,7 +77,7 @@ do { \
#define smp_mb__before_spinlock() smp_mb()
-#ifdef CONFIG_PPC_BOOK3S_64 +#ifdef CONFIG_PPC_BARRIER_NOSPEC /* * Prevent execution of subsequent instructions until preceding branches have * been fully resolved and are no longer executing speculatively. @@ -87,10 +87,10 @@ do { \ // This also acts as a compiler barrier due to the memory clobber. #define barrier_nospec() asm (stringify_in_c(barrier_nospec_asm) ::: "memory")
-#else /* !CONFIG_PPC_BOOK3S_64 */ +#else /* !CONFIG_PPC_BARRIER_NOSPEC */ #define barrier_nospec_asm #define barrier_nospec() -#endif +#endif /* CONFIG_PPC_BARRIER_NOSPEC */
#include <asm-generic/barrier.h>
diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index 84ae150ce6a6..38525bd2ed65 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -55,7 +55,7 @@ void setup_barrier_nospec(void); void do_barrier_nospec_fixups(bool enable); extern bool barrier_nospec_enabled;
-#ifdef CONFIG_PPC_BOOK3S_64 +#ifdef CONFIG_PPC_BARRIER_NOSPEC void do_barrier_nospec_fixups_range(bool enable, void *start, void *end); #else static inline void do_barrier_nospec_fixups_range(bool enable, void *start, void *end) { }; diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 13885786282b..d80fbf0884ff 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -44,9 +44,10 @@ obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \ obj-$(CONFIG_VDSO32) += vdso32/ obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_ppc970.o cpu_setup_pa6t.o -obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_power.o security.o +obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_power.o obj-$(CONFIG_PPC_BOOK3S_64) += mce.o mce_power.o obj-$(CONFIG_PPC_BOOK3E_64) += exceptions-64e.o idle_book3e.o +obj-$(CONFIG_PPC_BARRIER_NOSPEC) += security.o obj-$(CONFIG_PPC64) += vdso64/ obj-$(CONFIG_ALTIVEC) += vecemu.o obj-$(CONFIG_PPC_970_NAP) += idle_power4.o diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c index d30f0626dcd0..3b1c3bb91025 100644 --- a/arch/powerpc/kernel/module.c +++ b/arch/powerpc/kernel/module.c @@ -72,13 +72,15 @@ int module_finalize(const Elf_Ehdr *hdr, do_feature_fixups(powerpc_firmware_features, (void *)sect->sh_addr, (void *)sect->sh_addr + sect->sh_size); +#endif /* CONFIG_PPC64 */
+#ifdef CONFIG_PPC_BARRIER_NOSPEC sect = find_section(hdr, sechdrs, "__spec_barrier_fixup"); if (sect != NULL) do_barrier_nospec_fixups_range(barrier_nospec_enabled, (void *)sect->sh_addr, (void *)sect->sh_addr + sect->sh_size); -#endif +#endif /* CONFIG_PPC_BARRIER_NOSPEC */
sect = find_section(hdr, sechdrs, "__lwsync_fixup"); if (sect != NULL) diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 61def0be6914..5c6cf58943b9 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -153,14 +153,16 @@ SECTIONS *(__rfi_flush_fixup) __stop___rfi_flush_fixup = .; } +#endif /* CONFIG_PPC64 */
+#ifdef CONFIG_PPC_BARRIER_NOSPEC . = ALIGN(8); __spec_barrier_fixup : AT(ADDR(__spec_barrier_fixup) - LOAD_OFFSET) { __start___barrier_nospec_fixup = .; *(__barrier_nospec_fixup) __stop___barrier_nospec_fixup = .; } -#endif +#endif /* CONFIG_PPC_BARRIER_NOSPEC */
EXCEPTION_TABLE(0)
diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index a1222c441df5..5df57f7bae0a 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -304,6 +304,9 @@ void do_barrier_nospec_fixups_range(bool enable, void *fixup_start, void *fixup_ printk(KERN_DEBUG "barrier-nospec: patched %d locations\n", i); }
+#endif /* CONFIG_PPC_BOOK3S_64 */ + +#ifdef CONFIG_PPC_BARRIER_NOSPEC void do_barrier_nospec_fixups(bool enable) { void *start, *end; @@ -313,8 +316,7 @@ void do_barrier_nospec_fixups(bool enable)
do_barrier_nospec_fixups_range(enable, start, end); } - -#endif /* CONFIG_PPC_BOOK3S_64 */ +#endif /* CONFIG_PPC_BARRIER_NOSPEC */
void do_lwsync_fixups(unsigned long value, void *fixup_start, void *fixup_end) {
commit af375eefbfb27cbb5b831984e66d724a40d26b5c upstream.
Currently we require platform code to call setup_barrier_nospec(). But if we add an empty definition for the !CONFIG_PPC_BARRIER_NOSPEC case then we can call it in setup_arch().
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/setup.h | 4 ++++ arch/powerpc/kernel/setup-common.c | 2 ++ arch/powerpc/platforms/powernv/setup.c | 1 - arch/powerpc/platforms/pseries/setup.c | 1 - 4 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index 38525bd2ed65..d3e9da62d029 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -51,7 +51,11 @@ enum l1d_flush_type {
void setup_rfi_flush(enum l1d_flush_type, bool enable); void do_rfi_flush_fixups(enum l1d_flush_type types); +#ifdef CONFIG_PPC_BARRIER_NOSPEC void setup_barrier_nospec(void); +#else +static inline void setup_barrier_nospec(void) { }; +#endif void do_barrier_nospec_fixups(bool enable); extern bool barrier_nospec_enabled;
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index bf0f712ac0e0..d5a128f54537 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -918,6 +918,8 @@ void __init setup_arch(char **cmdline_p) if (ppc_md.setup_arch) ppc_md.setup_arch();
+ setup_barrier_nospec(); + paging_init();
/* Initialize the MMU context management stuff. */ diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index eb5464648810..17203abf38e8 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -123,7 +123,6 @@ static void pnv_setup_rfi_flush(void) security_ftr_enabled(SEC_FTR_L1D_FLUSH_HV));
setup_rfi_flush(type, enable); - setup_barrier_nospec(); }
static void __init pnv_setup_arch(void) diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 2b2759c98c7e..91ade7755823 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -525,7 +525,6 @@ void pseries_setup_rfi_flush(void) security_ftr_enabled(SEC_FTR_L1D_FLUSH_PR);
setup_rfi_flush(types, enable); - setup_barrier_nospec(); }
static void __init pSeries_setup_arch(void)
commit 406d2b6ae3420f5bb2b3db6986dc6f0b6dbb637b upstream.
In a subsequent patch we will enable building security.c for Book3E. However the NXP platforms are not vulnerable to Meltdown, so make the Meltdown vulnerability reporting PPC_BOOK3S_64 specific.
Signed-off-by: Diana Craciun diana.craciun@nxp.com [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/security.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c index 926ed3c38741..2f30fc8ed0a8 100644 --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -93,6 +93,7 @@ static __init int barrier_nospec_debugfs_init(void) device_initcall(barrier_nospec_debugfs_init); #endif /* CONFIG_DEBUG_FS */
+#ifdef CONFIG_PPC_BOOK3S_64 ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) { bool thread_priv; @@ -125,6 +126,7 @@ ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, cha
return sprintf(buf, "Vulnerable\n"); } +#endif
ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, char *buf) {
commit ebcd1bfc33c7a90df941df68a6e5d4018c022fba upstream.
Implement the barrier_nospec as an isync; sync instruction sequence. The implementation uses the infrastructure built for Book3S 64.
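A hedged sketch of what an enabled barrier site reduces to on these cores (the real sites are two patched nop slots, per the diff below):

	/* Illustrative only: on e500 the isync; sync pair acts as a
	 * speculation barrier, stalling subsequent instructions until
	 * prior branches have resolved. */
	static inline void barrier_nospec_e500_sketch(void)
	{
		asm volatile("isync; sync" ::: "memory");
	}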
Signed-off-by: Diana Craciun diana.craciun@nxp.com [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/Kconfig | 2 +- arch/powerpc/include/asm/barrier.h | 8 +++++++- arch/powerpc/lib/feature-fixups.c | 31 ++++++++++++++++++++++++++++++ 3 files changed, 39 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index a238698178fc..fa8f2aa88189 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -167,7 +167,7 @@ config PPC config PPC_BARRIER_NOSPEC bool default y - depends on PPC_BOOK3S_64 + depends on PPC_BOOK3S_64 || PPC_FSL_BOOK3E
config GENERIC_CSUM def_bool CPU_LITTLE_ENDIAN diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h index 465a64316897..80024c4f2093 100644 --- a/arch/powerpc/include/asm/barrier.h +++ b/arch/powerpc/include/asm/barrier.h @@ -77,12 +77,18 @@ do { \
#define smp_mb__before_spinlock() smp_mb()
+#ifdef CONFIG_PPC_BOOK3S_64 +#define NOSPEC_BARRIER_SLOT nop +#elif defined(CONFIG_PPC_FSL_BOOK3E) +#define NOSPEC_BARRIER_SLOT nop; nop +#endif + #ifdef CONFIG_PPC_BARRIER_NOSPEC /* * Prevent execution of subsequent instructions until preceding branches have * been fully resolved and are no longer executing speculatively. */ -#define barrier_nospec_asm NOSPEC_BARRIER_FIXUP_SECTION; nop +#define barrier_nospec_asm NOSPEC_BARRIER_FIXUP_SECTION; NOSPEC_BARRIER_SLOT
// This also acts as a compiler barrier due to the memory clobber. #define barrier_nospec() asm (stringify_in_c(barrier_nospec_asm) ::: "memory") diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index 5df57f7bae0a..b3e362437ec4 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -318,6 +318,37 @@ void do_barrier_nospec_fixups(bool enable) } #endif /* CONFIG_PPC_BARRIER_NOSPEC */
+#ifdef CONFIG_PPC_FSL_BOOK3E +void do_barrier_nospec_fixups_range(bool enable, void *fixup_start, void *fixup_end) +{ + unsigned int instr[2], *dest; + long *start, *end; + int i; + + start = fixup_start; + end = fixup_end; + + instr[0] = PPC_INST_NOP; + instr[1] = PPC_INST_NOP; + + if (enable) { + pr_info("barrier-nospec: using isync; sync as speculation barrier\n"); + instr[0] = PPC_INST_ISYNC; + instr[1] = PPC_INST_SYNC; + } + + for (i = 0; start < end; start++, i++) { + dest = (void *)start + *start; + + pr_devel("patching dest %lx\n", (unsigned long)dest); + patch_instruction(dest, instr[0]); + patch_instruction(dest + 1, instr[1]); + } + + printk(KERN_DEBUG "barrier-nospec: patched %d locations\n", i); +} +#endif /* CONFIG_PPC_FSL_BOOK3E */ + void do_lwsync_fixups(unsigned long value, void *fixup_start, void *fixup_end) { long *start, *end;
commit c28218d4abbf4f2035495334d8bfcba64bda4787 upstream.
Use barrier_nospec to sanitize the syscall table.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/entry_32.S | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 370645687cc7..bdd88f9d7926 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -34,6 +34,7 @@ #include <asm/ftrace.h> #include <asm/ptrace.h> #include <asm/export.h> +#include <asm/barrier.h>
/* * MSR_KERNEL is > 0x10000 on 4xx/Book-E since it include MSR_CE. @@ -347,6 +348,15 @@ syscall_dotrace_cont: ori r10,r10,sys_call_table@l slwi r0,r0,2 bge- 66f + + barrier_nospec_asm + /* + * Prevent the load of the handler below (based on the user-passed + * system call number) being speculatively executed until the test + * against NR_syscalls and branch to .66f above has + * committed. + */ + lwzx r10,r10,r0 /* Fetch system call handler [ptr] */ mtlr r10 addi r9,r1,STACK_FRAME_OVERHEAD
commit 06d0bbc6d0f56dacac3a79900e9a9a0d5972d818 upstream.
Add a macro and some helper C functions for patching single asm instructions.
The gas macro means we can do something like:
1:	nop
	patch_site	1b, patch__foo
Which is less visually distracting than defining a GLOBAL symbol at 1b, and also doesn't pollute the symbol table, which can confuse e.g. perf.
These are obviously similar to our existing feature sections, but are not automatically patched based on CPU/MMU features; rather, they are designed to be manually patched by C code at some arbitrary point.
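A hedged usage sketch from the C side, reusing the illustrative patch__foo site from above; the site stores a relative offset to the instruction, which the helper resolves before patching:

	extern s32 patch__foo;	/* emitted by the patch_site macro in asm */

	/* Resolve the stored offset and rewrite the instruction there. */
	patch_instruction_site(&patch__foo, PPC_INST_NOP);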
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/code-patching-asm.h | 18 ++++++++++++++++++ arch/powerpc/include/asm/code-patching.h | 2 ++ arch/powerpc/lib/code-patching.c | 16 ++++++++++++++++ 3 files changed, 36 insertions(+) create mode 100644 arch/powerpc/include/asm/code-patching-asm.h
diff --git a/arch/powerpc/include/asm/code-patching-asm.h b/arch/powerpc/include/asm/code-patching-asm.h new file mode 100644 index 000000000000..ed7b1448493a --- /dev/null +++ b/arch/powerpc/include/asm/code-patching-asm.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +/* + * Copyright 2018, Michael Ellerman, IBM Corporation. + */ +#ifndef _ASM_POWERPC_CODE_PATCHING_ASM_H +#define _ASM_POWERPC_CODE_PATCHING_ASM_H + +/* Define a "site" that can be patched */ +.macro patch_site label name + .pushsection ".rodata" + .balign 4 + .global \name +\name: + .4byte \label - . + .popsection +.endm + +#endif /* _ASM_POWERPC_CODE_PATCHING_ASM_H */ diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h index b4ab1f497335..ab934f8232bd 100644 --- a/arch/powerpc/include/asm/code-patching.h +++ b/arch/powerpc/include/asm/code-patching.h @@ -28,6 +28,8 @@ unsigned int create_cond_branch(const unsigned int *addr, unsigned long target, int flags); int patch_branch(unsigned int *addr, unsigned long target, int flags); int patch_instruction(unsigned int *addr, unsigned int instr); +int patch_instruction_site(s32 *addr, unsigned int instr); +int patch_branch_site(s32 *site, unsigned long target, int flags);
int instr_is_relative_branch(unsigned int instr); int instr_is_relative_link_branch(unsigned int instr); diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index c77c486fbf24..14535ad4cdd1 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -40,6 +40,22 @@ int patch_branch(unsigned int *addr, unsigned long target, int flags) return patch_instruction(addr, create_branch(addr, target, flags)); }
+int patch_branch_site(s32 *site, unsigned long target, int flags) +{ + unsigned int *addr; + + addr = (unsigned int *)((unsigned long)site + *site); + return patch_instruction(addr, create_branch(addr, target, flags)); +} + +int patch_instruction_site(s32 *site, unsigned int instr) +{ + unsigned int *addr; + + addr = (unsigned int *)((unsigned long)site + *site); + return patch_instruction(addr, instr); +} + unsigned int create_branch(const unsigned int *addr, unsigned long target, int flags) {
commit dc8c6cce9a26a51fc19961accb978217a3ba8c75 upstream.
Add security feature flags to indicate the need for software to flush the count cache on context switch, and for the presence of a hardware assisted count cache flush.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/security_features.h | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/arch/powerpc/include/asm/security_features.h b/arch/powerpc/include/asm/security_features.h index 44989b22383c..a0d47bc18a5c 100644 --- a/arch/powerpc/include/asm/security_features.h +++ b/arch/powerpc/include/asm/security_features.h @@ -59,6 +59,9 @@ static inline bool security_ftr_enabled(unsigned long feature) // Indirect branch prediction cache disabled #define SEC_FTR_COUNT_CACHE_DISABLED 0x0000000000000020ull
+// bcctr 2,0,0 triggers a hardware assisted count cache flush +#define SEC_FTR_BCCTR_FLUSH_ASSIST 0x0000000000000800ull +
// Features indicating need for Spectre/Meltdown mitigations
@@ -74,6 +77,9 @@ static inline bool security_ftr_enabled(unsigned long feature) // Firmware configuration indicates user favours security over performance #define SEC_FTR_FAVOUR_SECURITY 0x0000000000000200ull
+// Software required to flush count cache on context switch +#define SEC_FTR_FLUSH_COUNT_CACHE 0x0000000000000400ull +
// Features enabled by default #define SEC_FTR_DEFAULT \
commit ee13cb249fabdff8b90aaff61add347749280087 upstream.
Some CPU revisions support a mode where the count cache needs to be flushed by software on context switch. Additionally some revisions may have a hardware accelerated flush, in which case the software flush sequence can be shortened.
If we detect the appropriate flag from firmware we patch a branch into _switch() which takes us to a count cache flush sequence.
That sequence in turn may be patched to return early if we detect that the CPU supports accelerating the flush sequence in hardware.
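Putting that together, the runtime decision (as implemented by toggle_count_cache_flush() in the diff below) is roughly:

	if (!enable || !security_ftr_enabled(SEC_FTR_FLUSH_COUNT_CACHE)) {
		/* No flush wanted: nop out the call site in _switch(). */
		patch_instruction_site(&patch__call_flush_count_cache, PPC_INST_NOP);
	} else {
		/* Branch to the software flush sequence on context switch. */
		patch_branch_site(&patch__call_flush_count_cache,
				  (u64)&flush_count_cache, BRANCH_SET_LINK);
		/* With hardware assist, shorten the sequence to an early return. */
		if (security_ftr_enabled(SEC_FTR_BCCTR_FLUSH_ASSIST))
			patch_instruction_site(&patch__flush_count_cache_return,
					       PPC_INST_BLR);
	}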
Add debugfs support for reporting the state of the flush, as well as runtime disabling it.
And modify the spectre_v2 sysfs file to report the state of the software flush.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/asm-prototypes.h | 6 ++ arch/powerpc/include/asm/security_features.h | 1 + arch/powerpc/kernel/entry_64.S | 54 +++++++++++ arch/powerpc/kernel/security.c | 98 +++++++++++++++++++- 4 files changed, 154 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h index e0baba1535e6..f3daa175f86c 100644 --- a/arch/powerpc/include/asm/asm-prototypes.h +++ b/arch/powerpc/include/asm/asm-prototypes.h @@ -121,4 +121,10 @@ extern s64 __ashrdi3(s64, int); extern int __cmpdi2(s64, s64); extern int __ucmpdi2(u64, u64);
+/* Patch sites */ +extern s32 patch__call_flush_count_cache; +extern s32 patch__flush_count_cache_return; + +extern long flush_count_cache; + #endif /* _ASM_POWERPC_ASM_PROTOTYPES_H */ diff --git a/arch/powerpc/include/asm/security_features.h b/arch/powerpc/include/asm/security_features.h index a0d47bc18a5c..759597bf0fd8 100644 --- a/arch/powerpc/include/asm/security_features.h +++ b/arch/powerpc/include/asm/security_features.h @@ -22,6 +22,7 @@ enum stf_barrier_type {
void setup_stf_barrier(void); void do_stf_barrier_fixups(enum stf_barrier_type types); +void setup_count_cache_flush(void);
static inline void security_ftr_set(unsigned long feature) { diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 11e390662384..6625cec9e7c0 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -26,6 +26,7 @@ #include <asm/page.h> #include <asm/mmu.h> #include <asm/thread_info.h> +#include <asm/code-patching-asm.h> #include <asm/ppc_asm.h> #include <asm/asm-offsets.h> #include <asm/cputable.h> @@ -483,6 +484,57 @@ _GLOBAL(ret_from_kernel_thread) li r3,0 b .Lsyscall_exit
+#ifdef CONFIG_PPC_BOOK3S_64 + +#define FLUSH_COUNT_CACHE \ +1: nop; \ + patch_site 1b, patch__call_flush_count_cache + + +#define BCCTR_FLUSH .long 0x4c400420 + +.macro nops number + .rept \number + nop + .endr +.endm + +.balign 32 +.global flush_count_cache +flush_count_cache: + /* Save LR into r9 */ + mflr r9 + + .rept 64 + bl .+4 + .endr + b 1f + nops 6 + + .balign 32 + /* Restore LR */ +1: mtlr r9 + li r9,0x7fff + mtctr r9 + + BCCTR_FLUSH + +2: nop + patch_site 2b patch__flush_count_cache_return + + nops 3 + + .rept 278 + .balign 32 + BCCTR_FLUSH + nops 7 + .endr + + blr +#else +#define FLUSH_COUNT_CACHE +#endif /* CONFIG_PPC_BOOK3S_64 */ + /* * This routine switches between two different tasks. The process * state of one is saved on its kernel stack. Then the state @@ -514,6 +566,8 @@ _GLOBAL(_switch) std r23,_CCR(r1) std r1,KSP(r3) /* Set old stack pointer */
+ FLUSH_COUNT_CACHE + #ifdef CONFIG_SMP /* We need a sync somewhere here to make sure that if the * previous task gets rescheduled on another CPU, it sees all diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c index 2f30fc8ed0a8..fd4703b6ddc0 100644 --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -9,6 +9,8 @@ #include <linux/device.h> #include <linux/seq_buf.h>
+#include <asm/asm-prototypes.h> +#include <asm/code-patching.h> #include <asm/debug.h> #include <asm/security_features.h> #include <asm/setup.h> @@ -16,6 +18,13 @@
unsigned long powerpc_security_features __read_mostly = SEC_FTR_DEFAULT;
+enum count_cache_flush_type { + COUNT_CACHE_FLUSH_NONE = 0x1, + COUNT_CACHE_FLUSH_SW = 0x2, + COUNT_CACHE_FLUSH_HW = 0x4, +}; +static enum count_cache_flush_type count_cache_flush_type; + bool barrier_nospec_enabled; static bool no_nospec;
@@ -160,17 +169,29 @@ ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, c bcs = security_ftr_enabled(SEC_FTR_BCCTRL_SERIALISED); ccd = security_ftr_enabled(SEC_FTR_COUNT_CACHE_DISABLED);
- if (bcs || ccd) { + if (bcs || ccd || count_cache_flush_type != COUNT_CACHE_FLUSH_NONE) { + bool comma = false; seq_buf_printf(&s, "Mitigation: ");
- if (bcs) + if (bcs) { seq_buf_printf(&s, "Indirect branch serialisation (kernel only)"); + comma = true; + } + + if (ccd) { + if (comma) + seq_buf_printf(&s, ", "); + seq_buf_printf(&s, "Indirect branch cache disabled"); + comma = true; + }
- if (bcs && ccd) + if (comma) seq_buf_printf(&s, ", ");
- if (ccd) - seq_buf_printf(&s, "Indirect branch cache disabled"); + seq_buf_printf(&s, "Software count cache flush"); + + if (count_cache_flush_type == COUNT_CACHE_FLUSH_HW) + seq_buf_printf(&s, "(hardware accelerated)"); } else seq_buf_printf(&s, "Vulnerable");
@@ -327,4 +348,71 @@ static __init int stf_barrier_debugfs_init(void) } device_initcall(stf_barrier_debugfs_init); #endif /* CONFIG_DEBUG_FS */ + +static void toggle_count_cache_flush(bool enable) +{ + if (!enable || !security_ftr_enabled(SEC_FTR_FLUSH_COUNT_CACHE)) { + patch_instruction_site(&patch__call_flush_count_cache, PPC_INST_NOP); + count_cache_flush_type = COUNT_CACHE_FLUSH_NONE; + pr_info("count-cache-flush: software flush disabled.\n"); + return; + } + + patch_branch_site(&patch__call_flush_count_cache, + (u64)&flush_count_cache, BRANCH_SET_LINK); + + if (!security_ftr_enabled(SEC_FTR_BCCTR_FLUSH_ASSIST)) { + count_cache_flush_type = COUNT_CACHE_FLUSH_SW; + pr_info("count-cache-flush: full software flush sequence enabled.\n"); + return; + } + + patch_instruction_site(&patch__flush_count_cache_return, PPC_INST_BLR); + count_cache_flush_type = COUNT_CACHE_FLUSH_HW; + pr_info("count-cache-flush: hardware assisted flush sequence enabled\n"); +} + +void setup_count_cache_flush(void) +{ + toggle_count_cache_flush(true); +} + +#ifdef CONFIG_DEBUG_FS +static int count_cache_flush_set(void *data, u64 val) +{ + bool enable; + + if (val == 1) + enable = true; + else if (val == 0) + enable = false; + else + return -EINVAL; + + toggle_count_cache_flush(enable); + + return 0; +} + +static int count_cache_flush_get(void *data, u64 *val) +{ + if (count_cache_flush_type == COUNT_CACHE_FLUSH_NONE) + *val = 0; + else + *val = 1; + + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(fops_count_cache_flush, count_cache_flush_get, + count_cache_flush_set, "%llu\n"); + +static __init int count_cache_flush_debugfs_init(void) +{ + debugfs_create_file("count_cache_flush", 0600, powerpc_debugfs_root, + NULL, &fops_count_cache_flush); + return 0; +} +device_initcall(count_cache_flush_debugfs_init); +#endif /* CONFIG_DEBUG_FS */ #endif /* CONFIG_PPC_BOOK3S_64 */
commit ba72dc171954b782a79d25e0f4b3ed91090c3b1e upstream.
Use the existing hypercall to determine the appropriate settings for the count cache flush, and then call the generic powerpc code to set it up based on the security feature flags.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/hvcall.h | 2 ++ arch/powerpc/platforms/pseries/setup.c | 7 +++++++ 2 files changed, 9 insertions(+)
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 9d978102bf0d..9587d301db55 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -316,10 +316,12 @@ #define H_CPU_CHAR_BRANCH_HINTS_HONORED (1ull << 58) // IBM bit 5 #define H_CPU_CHAR_THREAD_RECONFIG_CTRL (1ull << 57) // IBM bit 6 #define H_CPU_CHAR_COUNT_CACHE_DISABLED (1ull << 56) // IBM bit 7 +#define H_CPU_CHAR_BCCTR_FLUSH_ASSIST (1ull << 54) // IBM bit 9
#define H_CPU_BEHAV_FAVOUR_SECURITY (1ull << 63) // IBM bit 0 #define H_CPU_BEHAV_L1D_FLUSH_PR (1ull << 62) // IBM bit 1 #define H_CPU_BEHAV_BNDS_CHK_SPEC_BAR (1ull << 61) // IBM bit 2 +#define H_CPU_BEHAV_FLUSH_COUNT_CACHE (1ull << 58) // IBM bit 5
#ifndef __ASSEMBLY__ #include <linux/types.h> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 91ade7755823..adb09ab87f7c 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -475,6 +475,12 @@ static void init_cpu_char_feature_flags(struct h_cpu_char_result *result) if (result->character & H_CPU_CHAR_COUNT_CACHE_DISABLED) security_ftr_set(SEC_FTR_COUNT_CACHE_DISABLED);
+ if (result->character & H_CPU_CHAR_BCCTR_FLUSH_ASSIST) + security_ftr_set(SEC_FTR_BCCTR_FLUSH_ASSIST); + + if (result->behaviour & H_CPU_BEHAV_FLUSH_COUNT_CACHE) + security_ftr_set(SEC_FTR_FLUSH_COUNT_CACHE); + /* * The features below are enabled by default, so we instead look to see * if firmware has *disabled* them, and clear them if so. @@ -525,6 +531,7 @@ void pseries_setup_rfi_flush(void) security_ftr_enabled(SEC_FTR_L1D_FLUSH_PR);
setup_rfi_flush(types, enable); + setup_count_cache_flush(); }
static void __init pSeries_setup_arch(void)
commit 99d54754d3d5f896a8f616b0b6520662bc99d66b upstream.
Look for fw-features properties to determine the appropriate settings for the count cache flush, and then call the generic powerpc code to set it up based on the security feature flags.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/platforms/powernv/setup.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index 17203abf38e8..365e2b620201 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -77,6 +77,12 @@ static void init_fw_feat_flags(struct device_node *np) if (fw_feature_is("enabled", "fw-count-cache-disabled", np)) security_ftr_set(SEC_FTR_COUNT_CACHE_DISABLED);
+ if (fw_feature_is("enabled", "fw-count-cache-flush-bcctr2,0,0", np)) + security_ftr_set(SEC_FTR_BCCTR_FLUSH_ASSIST); + + if (fw_feature_is("enabled", "needs-count-cache-flush-on-context-switch", np)) + security_ftr_set(SEC_FTR_FLUSH_COUNT_CACHE); + /* * The features below are enabled by default, so we instead look to see * if firmware has *disabled* them, and clear them if so. @@ -123,6 +129,7 @@ static void pnv_setup_rfi_flush(void) security_ftr_enabled(SEC_FTR_L1D_FLUSH_HV));
setup_rfi_flush(type, enable); + setup_count_cache_flush(); }
static void __init pnv_setup_arch(void)
commit 76a5eaa38b15dda92cd6964248c39b5a6f3a4e9d upstream.
In order to protect against speculation attacks (Spectre variant 2) on NXP PowerPC platforms, the branch predictor should be flushed when the privilege level is changed. This patch adds the infrastructure to fix up, at runtime, the code sections that perform the branch predictor flush, depending on a boot argument added later in a separate patch.
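Each entry in the new __btb_flush_fixup section is a pair of relative offsets bracketing one flush sequence; when the flush is not wanted, the fixup code resolves the pair and nops the range out, as sketched here from the diff below:

	static void patch_btb_flush_section(long *entry)
	{
		/* entry[0]/entry[1] are offsets back to the start/end of one
		 * START_BTB_FLUSH_SECTION .. END_BTB_FLUSH_SECTION range. */
		unsigned int *start = (void *)entry + entry[0];
		unsigned int *end = (void *)entry + entry[1];

		for (; start < end; start++)
			patch_instruction(start, PPC_INST_NOP);
	}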
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/feature-fixups.h | 12 ++++++++++++ arch/powerpc/include/asm/setup.h | 2 ++ arch/powerpc/kernel/vmlinux.lds.S | 8 ++++++++ arch/powerpc/lib/feature-fixups.c | 23 +++++++++++++++++++++++ 4 files changed, 45 insertions(+)
diff --git a/arch/powerpc/include/asm/feature-fixups.h b/arch/powerpc/include/asm/feature-fixups.h index afd3efd38938..175128e19025 100644 --- a/arch/powerpc/include/asm/feature-fixups.h +++ b/arch/powerpc/include/asm/feature-fixups.h @@ -221,6 +221,17 @@ void setup_feature_keys(void); FTR_ENTRY_OFFSET 953b-954b; \ .popsection;
+#define START_BTB_FLUSH_SECTION \ +955: \ + +#define END_BTB_FLUSH_SECTION \ +956: \ + .pushsection __btb_flush_fixup,"a"; \ + .align 2; \ +957: \ + FTR_ENTRY_OFFSET 955b-957b; \ + FTR_ENTRY_OFFSET 956b-957b; \ + .popsection;
#ifndef __ASSEMBLY__
@@ -229,6 +240,7 @@ extern long __start___stf_entry_barrier_fixup, __stop___stf_entry_barrier_fixup; extern long __start___stf_exit_barrier_fixup, __stop___stf_exit_barrier_fixup; extern long __start___rfi_flush_fixup, __stop___rfi_flush_fixup; extern long __start___barrier_nospec_fixup, __stop___barrier_nospec_fixup; +extern long __start__btb_flush_fixup, __stop__btb_flush_fixup;
#endif
diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index d3e9da62d029..23ee67e279ae 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -65,6 +65,8 @@ void do_barrier_nospec_fixups_range(bool enable, void *start, void *end); static inline void do_barrier_nospec_fixups_range(bool enable, void *start, void *end) { }; #endif
+void do_btb_flush_fixups(void); + #endif /* !__ASSEMBLY__ */
#endif /* _ASM_POWERPC_SETUP_H */ diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 5c6cf58943b9..50d365060855 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -164,6 +164,14 @@ SECTIONS } #endif /* CONFIG_PPC_BARRIER_NOSPEC */
+#ifdef CONFIG_PPC_FSL_BOOK3E + . = ALIGN(8); + __spec_btb_flush_fixup : AT(ADDR(__spec_btb_flush_fixup) - LOAD_OFFSET) { + __start__btb_flush_fixup = .; + *(__btb_flush_fixup) + __stop__btb_flush_fixup = .; + } +#endif EXCEPTION_TABLE(0)
NOTES :kernel :notes diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index b3e362437ec4..e6ed0ec94bc8 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -347,6 +347,29 @@ void do_barrier_nospec_fixups_range(bool enable, void *fixup_start, void *fixup_
printk(KERN_DEBUG "barrier-nospec: patched %d locations\n", i); } + +static void patch_btb_flush_section(long *curr) +{ + unsigned int *start, *end; + + start = (void *)curr + *curr; + end = (void *)curr + *(curr + 1); + for (; start < end; start++) { + pr_devel("patching dest %lx\n", (unsigned long)start); + patch_instruction(start, PPC_INST_NOP); + } +} + +void do_btb_flush_fixups(void) +{ + long *start, *end; + + start = PTRRELOC(&__start__btb_flush_fixup); + end = PTRRELOC(&__stop__btb_flush_fixup); + + for (; start < end; start += 2) + patch_btb_flush_section(start); +} #endif /* CONFIG_PPC_FSL_BOOK3E */
void do_lwsync_fixups(unsigned long value, void *fixup_start, void *fixup_end)
commit 1cbf8990d79ff69da8ad09e8a3df014e1494462b upstream.
The BUCSR register can be used to invalidate the entries in the branch prediction mechanisms.
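A hedged C-side sketch of the flush, mirroring the BTB_FLUSH asm macro in the diff below (mtspr()/isync() are the kernel's existing helpers):

	static inline void btb_flush_sketch(void)
	{
		/* Writing BUCSR_INIT to BUCSR invalidates the branch
		 * predictor entries; the isync ensures the write takes
		 * effect before subsequent instructions execute. */
		mtspr(SPRN_BUCSR, BUCSR_INIT);
		isync();
	}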
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/ppc_asm.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h index 24e95be3bfaf..bbd35ba36a22 100644 --- a/arch/powerpc/include/asm/ppc_asm.h +++ b/arch/powerpc/include/asm/ppc_asm.h @@ -780,4 +780,25 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601) .long 0x2400004c /* rfid */ #endif /* !CONFIG_PPC_BOOK3E */ #endif /* __ASSEMBLY__ */ + +/* + * Helper macro for exception table entries + */ +#define EX_TABLE(_fault, _target) \ + stringify_in_c(.section __ex_table,"a";)\ + stringify_in_c(.balign 4;) \ + stringify_in_c(.long (_fault) - . ;) \ + stringify_in_c(.long (_target) - . ;) \ + stringify_in_c(.previous) + +#ifdef CONFIG_PPC_FSL_BOOK3E +#define BTB_FLUSH(reg) \ + lis reg,BUCSR_INIT@h; \ + ori reg,reg,BUCSR_INIT@l; \ + mtspr SPRN_BUCSR,reg; \ + isync; +#else +#define BTB_FLUSH(reg) +#endif /* CONFIG_PPC_FSL_BOOK3E */ + #endif /* _ASM_POWERPC_PPC_ASM_H */
commit 7d8bad99ba5a22892f0cad6881289fdc3875a930 upstream.
Currently for CONFIG_PPC_FSL_BOOK3E the spectre_v2 file is incorrect:
$ cat /sys/devices/system/cpu/vulnerabilities/spectre_v2 "Mitigation: Software count cache flush"
Which is wrong. Fix it to report vulnerable for now.
Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/security.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c index fd4703b6ddc0..fc41bccd9ab6 100644 --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -23,7 +23,7 @@ enum count_cache_flush_type { COUNT_CACHE_FLUSH_SW = 0x2, COUNT_CACHE_FLUSH_HW = 0x4, }; -static enum count_cache_flush_type count_cache_flush_type; +static enum count_cache_flush_type count_cache_flush_type = COUNT_CACHE_FLUSH_NONE;
bool barrier_nospec_enabled; static bool no_nospec;
commit 98518c4d8728656db349f875fcbbc7c126d4c973 upstream.
In order to flush the branch predictor, the guest kernel performs writes to the BUCSR register, which is hypervisor privileged. However, the branch predictor is already flushed at each KVM entry, so just return to the guest as soon as possible.
Signed-off-by: Diana Craciun diana.craciun@nxp.com [mpe: Tweak comment formatting] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kvm/e500_emulate.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c index 990db69a1d0b..fa88f641ac03 100644 --- a/arch/powerpc/kvm/e500_emulate.c +++ b/arch/powerpc/kvm/e500_emulate.c @@ -277,6 +277,13 @@ int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu *vcpu, int sprn, ulong spr_va vcpu->arch.pwrmgtcr0 = spr_val; break;
+ case SPRN_BUCSR: + /* + * If we are here, it means that we have already flushed the + * branch predictor, so just return to guest. + */ + break; + /* extra exceptions */ #ifdef CONFIG_SPE_POSSIBLE case SPRN_IVOR32:
commit f633a8ad636efb5d4bba1a047d4a0f1ef719aa06 upstream.
When the nospectre_v2 command line argument is present, the Spectre variant 2 mitigations are disabled.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/setup.h | 5 +++++ arch/powerpc/kernel/security.c | 21 +++++++++++++++++++++ 2 files changed, 26 insertions(+)
diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index 23ee67e279ae..862ebce3ae54 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -65,6 +65,11 @@ void do_barrier_nospec_fixups_range(bool enable, void *start, void *end); static inline void do_barrier_nospec_fixups_range(bool enable, void *start, void *end) { }; #endif
+#ifdef CONFIG_PPC_FSL_BOOK3E +void setup_spectre_v2(void); +#else +static inline void setup_spectre_v2(void) {}; +#endif void do_btb_flush_fixups(void);
#endif /* !__ASSEMBLY__ */ diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c index fc41bccd9ab6..6dc5cdc2b87c 100644 --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -27,6 +27,10 @@ static enum count_cache_flush_type count_cache_flush_type = COUNT_CACHE_FLUSH_NO
bool barrier_nospec_enabled; static bool no_nospec; +static bool btb_flush_enabled; +#ifdef CONFIG_PPC_FSL_BOOK3E +static bool no_spectrev2; +#endif
static void enable_barrier_nospec(bool enable) { @@ -102,6 +106,23 @@ static __init int barrier_nospec_debugfs_init(void) device_initcall(barrier_nospec_debugfs_init); #endif /* CONFIG_DEBUG_FS */
+#ifdef CONFIG_PPC_FSL_BOOK3E +static int __init handle_nospectre_v2(char *p) +{ + no_spectrev2 = true; + + return 0; +} +early_param("nospectre_v2", handle_nospectre_v2); +void setup_spectre_v2(void) +{ + if (no_spectrev2) + do_btb_flush_fixups(); + else + btb_flush_enabled = true; +} +#endif /* CONFIG_PPC_FSL_BOOK3E */ + #ifdef CONFIG_PPC_BOOK3S_64 ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) {
commit 10c5e83afd4a3f01712d97d3bb1ae34d5b74a185 upstream.
In order to protect against speculation attacks on indirect branches, the branch predictor is flushed at kernel entry to protect against the following situations:
- a userspace process attacking another userspace process
- a userspace process attacking the kernel
Basically, when the privilege level changes (i.e. the kernel is entered), the branch predictor state is flushed.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/entry_64.S | 5 +++++ arch/powerpc/kernel/exceptions-64e.S | 26 +++++++++++++++++++++++++- arch/powerpc/mm/tlb_low_64e.S | 7 +++++++ 3 files changed, 37 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 6625cec9e7c0..390ebf4ef384 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -80,6 +80,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM) std r0,GPR0(r1) std r10,GPR1(r1) beq 2f /* if from kernel mode */ +#ifdef CONFIG_PPC_FSL_BOOK3E +START_BTB_FLUSH_SECTION + BTB_FLUSH(r10) +END_BTB_FLUSH_SECTION +#endif ACCOUNT_CPU_USER_ENTRY(r13, r10, r11) 2: std r2,GPR2(r1) std r3,GPR3(r1) diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index ca03eb229a9a..79c6fee6368d 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -295,7 +295,8 @@ ret_from_mc_except: andi. r10,r11,MSR_PR; /* save stack pointer */ \ beq 1f; /* branch around if supervisor */ \ ld r1,PACAKSAVE(r13); /* get kernel stack coming from usr */\ -1: cmpdi cr1,r1,0; /* check if SP makes sense */ \ +1: type##_BTB_FLUSH \ + cmpdi cr1,r1,0; /* check if SP makes sense */ \ bge- cr1,exc_##n##_bad_stack;/* bad stack (TODO: out of line) */ \ mfspr r10,SPRN_##type##_SRR0; /* read SRR0 before touching stack */
@@ -327,6 +328,29 @@ ret_from_mc_except: #define SPRN_MC_SRR0 SPRN_MCSRR0 #define SPRN_MC_SRR1 SPRN_MCSRR1
+#ifdef CONFIG_PPC_FSL_BOOK3E +#define GEN_BTB_FLUSH \ + START_BTB_FLUSH_SECTION \ + beq 1f; \ + BTB_FLUSH(r10) \ + 1: \ + END_BTB_FLUSH_SECTION + +#define CRIT_BTB_FLUSH \ + START_BTB_FLUSH_SECTION \ + BTB_FLUSH(r10) \ + END_BTB_FLUSH_SECTION + +#define DBG_BTB_FLUSH CRIT_BTB_FLUSH +#define MC_BTB_FLUSH CRIT_BTB_FLUSH +#define GDBELL_BTB_FLUSH GEN_BTB_FLUSH +#else +#define GEN_BTB_FLUSH +#define CRIT_BTB_FLUSH +#define DBG_BTB_FLUSH +#define GDBELL_BTB_FLUSH +#endif + #define NORMAL_EXCEPTION_PROLOG(n, intnum, addition) \ EXCEPTION_PROLOG(n, intnum, GEN, addition##_GEN(n))
diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S index eb82d787d99a..b7e9c09dfe19 100644 --- a/arch/powerpc/mm/tlb_low_64e.S +++ b/arch/powerpc/mm/tlb_low_64e.S @@ -69,6 +69,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) std r15,EX_TLB_R15(r12) std r10,EX_TLB_CR(r12) #ifdef CONFIG_PPC_FSL_BOOK3E +START_BTB_FLUSH_SECTION + mfspr r11, SPRN_SRR1 + andi. r10,r11,MSR_PR + beq 1f + BTB_FLUSH(r10) +1: +END_BTB_FLUSH_SECTION std r7,EX_TLB_R7(r12) #endif TLB_MISS_PROLOG_STATS
commit 7fef436295bf6c05effe682c8797dfcb0deb112a upstream.
In order to protect against speculation attacks on indirect branches, the branch predictor is flushed at kernel entry to protect against the following situations:
- a userspace process attacking another userspace process
- a userspace process attacking the kernel
Basically, when the privilege level changes (i.e. the kernel is entered), the branch predictor state is flushed.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/head_booke.h | 6 ++++++ arch/powerpc/kernel/head_fsl_booke.S | 15 +++++++++++++++ 2 files changed, 21 insertions(+)
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h index a620203f7de3..384bb4d80520 100644 --- a/arch/powerpc/kernel/head_booke.h +++ b/arch/powerpc/kernel/head_booke.h @@ -42,6 +42,9 @@ andi. r11, r11, MSR_PR; /* check whether user or kernel */\ mr r11, r1; \ beq 1f; \ +START_BTB_FLUSH_SECTION \ + BTB_FLUSH(r11) \ +END_BTB_FLUSH_SECTION \ /* if from user, start at top of this thread's kernel stack */ \ lwz r11, THREAD_INFO-THREAD(r10); \ ALLOC_STACK_FRAME(r11, THREAD_SIZE); \ @@ -127,6 +130,9 @@ stw r9,_CCR(r8); /* save CR on stack */\ mfspr r11,exc_level_srr1; /* check whether user or kernel */\ DO_KVM BOOKE_INTERRUPT_##intno exc_level_srr1; \ +START_BTB_FLUSH_SECTION \ + BTB_FLUSH(r10) \ +END_BTB_FLUSH_SECTION \ andi. r11,r11,MSR_PR; \ mfspr r11,SPRN_SPRG_THREAD; /* if from user, start at top of */\ lwz r11,THREAD_INFO-THREAD(r11); /* this thread's kernel stack */\ diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S index bf4c6021515f..60a0aeefc4a7 100644 --- a/arch/powerpc/kernel/head_fsl_booke.S +++ b/arch/powerpc/kernel/head_fsl_booke.S @@ -452,6 +452,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) mfcr r13 stw r13, THREAD_NORMSAVE(3)(r10) DO_KVM BOOKE_INTERRUPT_DTLB_MISS SPRN_SRR1 +START_BTB_FLUSH_SECTION + mfspr r11, SPRN_SRR1 + andi. r10,r11,MSR_PR + beq 1f + BTB_FLUSH(r10) +1: +END_BTB_FLUSH_SECTION mfspr r10, SPRN_DEAR /* Get faulting address */
/* If we are faulting a kernel address, we have to use the @@ -546,6 +553,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) mfcr r13 stw r13, THREAD_NORMSAVE(3)(r10) DO_KVM BOOKE_INTERRUPT_ITLB_MISS SPRN_SRR1 +START_BTB_FLUSH_SECTION + mfspr r11, SPRN_SRR1 + andi. r10,r11,MSR_PR + beq 1f + BTB_FLUSH(r10) +1: +END_BTB_FLUSH_SECTION + mfspr r10, SPRN_SRR0 /* Get faulting address */
/* If we are faulting a kernel address, we have to use the
commit e7aa61f47b23afbec41031bc47ca8d6cb6516abc upstream.
Switching from the guest to the host is another place where speculative accesses can be exploited. Flush the branch predictor when entering KVM.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kvm/bookehv_interrupts.S | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S index 81bd8a07aa51..612b7f6a887f 100644 --- a/arch/powerpc/kvm/bookehv_interrupts.S +++ b/arch/powerpc/kvm/bookehv_interrupts.S @@ -75,6 +75,10 @@ PPC_LL r1, VCPU_HOST_STACK(r4) PPC_LL r2, HOST_R2(r1)
+START_BTB_FLUSH_SECTION + BTB_FLUSH(r10) +END_BTB_FLUSH_SECTION + mfspr r10, SPRN_PID lwz r8, VCPU_HOST_PID(r4) PPC_LL r11, VCPU_SHARED(r4)
commit 3bc8ea8603ae4c1e09aca8de229ad38b8091fcb3 upstream.
If the user chooses not to use the mitigations, replace the code sequence with nops.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/setup-common.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index d5a128f54537..5e7d70c5d065 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -919,6 +919,7 @@ void __init setup_arch(char **cmdline_p) ppc_md.setup_arch();
setup_barrier_nospec(); + setup_spectre_v2();
paging_init();
commit dfa88658fb0583abb92e062c7a9cd5a5b94f2a46 upstream.
Report branch predictor state flush as a mitigation for Spectre variant 2.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/security.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c index 6dc5cdc2b87c..43ce800e73bf 100644 --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -213,8 +213,11 @@ ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, c
if (count_cache_flush_type == COUNT_CACHE_FLUSH_HW) seq_buf_printf(&s, "(hardware accelerated)"); - } else + } else if (btb_flush_enabled) { + seq_buf_printf(&s, "Mitigation: Branch predictor state flush"); + } else { seq_buf_printf(&s, "Vulnerable"); + }
seq_buf_printf(&s, "\n");
commit 039daac5526932ec731e4499613018d263af8b3e upstream.
Fixed the following build warning: powerpc-linux-gnu-ld: warning: orphan section `__btb_flush_fixup' from `arch/powerpc/kernel/head_44x.o' being placed in section `__btb_flush_fixup'.
Signed-off-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/head_booke.h | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h index 384bb4d80520..7b98c7351f6c 100644 --- a/arch/powerpc/kernel/head_booke.h +++ b/arch/powerpc/kernel/head_booke.h @@ -31,6 +31,16 @@ */ #define THREAD_NORMSAVE(offset) (THREAD_NORMSAVES + (offset * 4))
+#ifdef CONFIG_PPC_FSL_BOOK3E +#define BOOKE_CLEAR_BTB(reg) \ +START_BTB_FLUSH_SECTION \ + BTB_FLUSH(reg) \ +END_BTB_FLUSH_SECTION +#else +#define BOOKE_CLEAR_BTB(reg) +#endif + + #define NORMAL_EXCEPTION_PROLOG(intno) \ mtspr SPRN_SPRG_WSCRATCH0, r10; /* save one register */ \ mfspr r10, SPRN_SPRG_THREAD; \ @@ -42,9 +52,7 @@ andi. r11, r11, MSR_PR; /* check whether user or kernel */\ mr r11, r1; \ beq 1f; \ -START_BTB_FLUSH_SECTION \ - BTB_FLUSH(r11) \ -END_BTB_FLUSH_SECTION \ + BOOKE_CLEAR_BTB(r11) \ /* if from user, start at top of this thread's kernel stack */ \ lwz r11, THREAD_INFO-THREAD(r10); \ ALLOC_STACK_FRAME(r11, THREAD_SIZE); \ @@ -130,9 +138,7 @@ END_BTB_FLUSH_SECTION \ stw r9,_CCR(r8); /* save CR on stack */\ mfspr r11,exc_level_srr1; /* check whether user or kernel */\ DO_KVM BOOKE_INTERRUPT_##intno exc_level_srr1; \ -START_BTB_FLUSH_SECTION \ - BTB_FLUSH(r10) \ -END_BTB_FLUSH_SECTION \ + BOOKE_CLEAR_BTB(r10) \ andi. r11,r11,MSR_PR; \ mfspr r11,SPRN_SPRG_THREAD; /* if from user, start at top of */\ lwz r11,THREAD_INFO-THREAD(r11); /* this thread's kernel stack */\
commit 27da80719ef132cf8c80eb406d5aeb37dddf78cc upstream.
The commit identified below adds the MC_BTB_FLUSH macro only when CONFIG_PPC_FSL_BOOK3E is defined. This results in the following error on some configs (seen several times with kisskb randconfig_defconfig):
arch/powerpc/kernel/exceptions-64e.S:576: Error: Unrecognized opcode: `mc_btb_flush' make[3]: *** [scripts/Makefile.build:367: arch/powerpc/kernel/exceptions-64e.o] Error 1 make[2]: *** [scripts/Makefile.build:492: arch/powerpc/kernel] Error 2 make[1]: *** [Makefile:1043: arch/powerpc] Error 2 make: *** [Makefile:152: sub-make] Error 2
This patch adds a blank definition of MC_BTB_FLUSH for other cases.
Fixes: 10c5e83afd4a ("powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)") Cc: Diana Craciun diana.craciun@nxp.com Signed-off-by: Christophe Leroy christophe.leroy@c-s.fr Reviewed-by: Daniel Axtens dja@axtens.net Reviewed-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/exceptions-64e.S | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 79c6fee6368d..423b5257d3a1 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -348,6 +348,7 @@ ret_from_mc_except: #define GEN_BTB_FLUSH #define CRIT_BTB_FLUSH #define DBG_BTB_FLUSH +#define MC_BTB_FLUSH #define GDBELL_BTB_FLUSH #endif
commit 92edf8df0ff2ae86cc632eeca0e651fd8431d40d upstream.
When I updated the spectre_v2 reporting to handle software count cache flush I got the logic wrong when there's no software count cache enabled at all.
The result is that on systems with the software count cache flush disabled we print:
Mitigation: Indirect branch cache disabled, Software count cache flush
Which correctly indicates that the count cache is disabled, but incorrectly says the software count cache flush is enabled.
The root of the problem is that we are trying to handle all combinations of options. But we know now that we only expect to see the software count cache flush enabled if the other options are false.
So split the two cases, which simplifies the logic and fixes the bug. We were also missing a space before "(hardware accelerated)".
The result is we see one of:
Mitigation: Indirect branch serialisation (kernel only) Mitigation: Indirect branch cache disabled Mitigation: Software count cache flush Mitigation: Software count cache flush (hardware accelerated)
Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Michael Ellerman mpe@ellerman.id.au Reviewed-by: Michael Neuling mikey@neuling.org Reviewed-by: Diana Craciun diana.craciun@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/security.c | 23 ++++++++--------------- 1 file changed, 8 insertions(+), 15 deletions(-)
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c index 43ce800e73bf..30542e833ebe 100644 --- a/arch/powerpc/kernel/security.c +++ b/arch/powerpc/kernel/security.c @@ -190,29 +190,22 @@ ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, c bcs = security_ftr_enabled(SEC_FTR_BCCTRL_SERIALISED); ccd = security_ftr_enabled(SEC_FTR_COUNT_CACHE_DISABLED);
- if (bcs || ccd || count_cache_flush_type != COUNT_CACHE_FLUSH_NONE) { - bool comma = false; + if (bcs || ccd) { seq_buf_printf(&s, "Mitigation: ");
- if (bcs) { + if (bcs) seq_buf_printf(&s, "Indirect branch serialisation (kernel only)"); - comma = true; - }
- if (ccd) { - if (comma) - seq_buf_printf(&s, ", "); - seq_buf_printf(&s, "Indirect branch cache disabled"); - comma = true; - } - - if (comma) + if (bcs && ccd) seq_buf_printf(&s, ", ");
- seq_buf_printf(&s, "Software count cache flush"); + if (ccd) + seq_buf_printf(&s, "Indirect branch cache disabled"); + } else if (count_cache_flush_type != COUNT_CACHE_FLUSH_NONE) { + seq_buf_printf(&s, "Mitigation: Software count cache flush");
if (count_cache_flush_type == COUNT_CACHE_FLUSH_HW) - seq_buf_printf(&s, "(hardware accelerated)"); + seq_buf_printf(&s, " (hardware accelerated)"); } else if (btb_flush_enabled) { seq_buf_printf(&s, "Mitigation: Branch predictor state flush"); } else {
[ Upstream commit c8a43c18a97845e7f94ed7d181c11f41964976a2 ]
When KASLR is enabled (CONFIG_RANDOMIZE_BASE=y), the top 4K of kernel virtual address space may be mapped to physical addresses despite being reserved for ERR_PTR values.
Fix the randomization of the linear region so that we avoid mapping the last page of the virtual address space.
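For illustration only (not part of the patch): the arithmetic below, with a made-up range and an arbitrary stand-in value for ARM64_MEMSTART_ALIGN, shows how the old rounding-up could push the randomized offset all the way to the end of the spare range, while the fixed version always stays at least one alignment unit short of it.

#include <stdio.h>
#include <stdint.h>

#define ALIGN_SZ (1ULL << 30)	/* arbitrary stand-in for ARM64_MEMSTART_ALIGN */

int main(void)
{
	uint64_t range = 16 * ALIGN_SZ;	/* hypothetical spare range */
	uint64_t seed = 0xffff;		/* worst-case 16-bit seed */

	uint64_t old_units = range / ALIGN_SZ + 1;	/* before the fix */
	uint64_t new_units = range / ALIGN_SZ;		/* after the fix */

	/* mirrors: memstart_addr -= ARM64_MEMSTART_ALIGN * ((range * seed) >> 16) */
	uint64_t old_off = ALIGN_SZ * ((old_units * seed) >> 16);
	uint64_t new_off = ALIGN_SZ * ((new_units * seed) >> 16);

	/* old_off reaches 'range' itself here; new_off stays strictly below it */
	printf("old max offset %#llx vs new max offset %#llx (range %#llx)\n",
	       (unsigned long long)old_off, (unsigned long long)new_off,
	       (unsigned long long)range);
	return 0;
}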
Cc: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: liyueyi liyueyi@live.com [will: rewrote commit message; merged in suggestion from Ard] Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- arch/arm64/mm/init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index fa6b2fad7a3d..5d3df68272f5 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -272,7 +272,7 @@ void __init arm64_memblock_init(void) * memory spans, randomize the linear region as well. */ if (memstart_offset_seed > 0 && range >= ARM64_MEMSTART_ALIGN) { - range = range / ARM64_MEMSTART_ALIGN + 1; + range /= ARM64_MEMSTART_ALIGN; memstart_addr -= ARM64_MEMSTART_ALIGN * ((range * memstart_offset_seed) >> 16); }
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit c7084edc3f6d67750f50d4183134c4fb5712a5c8 upstream.
The n_r3964 line discipline driver was written in a different time, when SMP machines were rare, and users were trusted to do the right thing. Since then, the world has moved on but not this code; it has stayed rooted in the past with its lovely hand-crafted list structures and loads of "interesting" race conditions all over the place.
After attempting to clean up most of the issues, I just gave up and am now marking the driver as BROKEN so that hopefully someone who has this hardware will show up out of the woodwork (I know you are out there!) and will help with debugging a raft of changes that I had laying around for the code, but was too afraid to commit as odds are they would break things.
Many thanks to Jann and Linus for pointing out the initial problems in this codebase, as well as many reviews of my attempts to fix the issues. It was a case of whack-a-mole, and as you can see, the mole won.
Reported-by: Jann Horn jannh@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
--- drivers/char/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/char/Kconfig +++ b/drivers/char/Kconfig @@ -377,7 +377,7 @@ config XILINX_HWICAP
config R3964 tristate "Siemens R3964 line discipline" - depends on TTY + depends on TTY && BROKEN ---help--- This driver allows synchronous communication with devices using the Siemens R3964 packet protocol. Unless you are dealing with special
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit 7c0cca7c847e6e019d67b7d793efbbe3b947d004 upstream.
By default, the kernel will automatically load the module of any line discipline that is asked for. As this sometimes isn't the safest thing to do, provide a sysctl to disable this feature.
By default, we set this to 'y' as that is the historical way that Linux has worked, and we do not want to break working systems. But in the future, perhaps this can default to 'n' to disable this functionality.
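For completeness, a small user-space sketch (illustrative only, needs root; the /proc path simply mirrors the dev.tty.ldisc_autoload sysctl registered by this patch) showing how the new knob can be inspected and turned off at runtime:

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/dev/tty/ldisc_autoload", "r+");
	int val;

	if (!f) {
		perror("ldisc_autoload");
		return 1;
	}
	if (fscanf(f, "%d", &val) == 1)
		printf("ldisc autoload is currently %s\n",
		       val ? "enabled" : "disabled");
	rewind(f);
	fprintf(f, "0\n");	/* restrict autoloading to CAP_SYS_MODULE */
	fclose(f);
	return 0;
}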
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/Kconfig | 23 +++++++++++++++++++++++ drivers/tty/tty_io.c | 3 +++ drivers/tty/tty_ldisc.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 73 insertions(+)
--- a/drivers/tty/Kconfig +++ b/drivers/tty/Kconfig @@ -455,4 +455,27 @@ config MIPS_EJTAG_FDC_KGDB_CHAN help FDC channel number to use for KGDB.
+config LDISC_AUTOLOAD + bool "Automatically load TTY Line Disciplines" + default y + help + Historically the kernel has always automatically loaded any + line discipline that is in a kernel module when a user asks + for it to be loaded with the TIOCSETD ioctl, or through other + means. This is not always the best thing to do on systems + where you know you will not be using some of the more + "ancient" line disciplines, so prevent the kernel from doing + this unless the request is coming from a process with the + CAP_SYS_MODULE permissions. + + Say 'Y' here if you trust your userspace users to do the right + thing, or if you have only provided the line disciplines that + you know you will be using, or if you wish to continue to use + the traditional method of on-demand loading of these modules + by any user. + + This functionality can be changed at runtime with the + dev.tty.ldisc_autoload sysctl, this configuration option will + only set the default value of this functionality. + endif # TTY --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -520,6 +520,8 @@ void proc_clear_tty(struct task_struct * tty_kref_put(tty); }
+extern void tty_sysctl_init(void); + /** * proc_set_tty - set the controlling terminal * @@ -3705,6 +3707,7 @@ void console_sysfs_notify(void) */ int __init tty_init(void) { + tty_sysctl_init(); cdev_init(&tty_cdev, &tty_fops); if (cdev_add(&tty_cdev, MKDEV(TTYAUX_MAJOR, 0), 1) || register_chrdev_region(MKDEV(TTYAUX_MAJOR, 0), 1, "/dev/tty") < 0) --- a/drivers/tty/tty_ldisc.c +++ b/drivers/tty/tty_ldisc.c @@ -155,6 +155,13 @@ static void put_ldops(struct tty_ldisc_o * takes tty_ldiscs_lock to guard against ldisc races */
+#if defined(CONFIG_LDISC_AUTOLOAD) + #define INITIAL_AUTOLOAD_STATE 1 +#else + #define INITIAL_AUTOLOAD_STATE 0 +#endif +static int tty_ldisc_autoload = INITIAL_AUTOLOAD_STATE; + static struct tty_ldisc *tty_ldisc_get(struct tty_struct *tty, int disc) { struct tty_ldisc *ld; @@ -169,6 +176,8 @@ static struct tty_ldisc *tty_ldisc_get(s */ ldops = get_ldops(disc); if (IS_ERR(ldops)) { + if (!capable(CAP_SYS_MODULE) && !tty_ldisc_autoload) + return ERR_PTR(-EPERM); request_module("tty-ldisc-%d", disc); ldops = get_ldops(disc); if (IS_ERR(ldops)) @@ -774,3 +783,41 @@ void tty_ldisc_deinit(struct tty_struct tty_ldisc_put(tty->ldisc); tty->ldisc = NULL; } + +static int zero; +static int one = 1; +static struct ctl_table tty_table[] = { + { + .procname = "ldisc_autoload", + .data = &tty_ldisc_autoload, + .maxlen = sizeof(tty_ldisc_autoload), + .mode = 0644, + .proc_handler = proc_dointvec, + .extra1 = &zero, + .extra2 = &one, + }, + { } +}; + +static struct ctl_table tty_dir_table[] = { + { + .procname = "tty", + .mode = 0555, + .child = tty_table, + }, + { } +}; + +static struct ctl_table tty_root_table[] = { + { + .procname = "dev", + .mode = 0555, + .child = tty_dir_table, + }, + { } +}; + +void tty_sysctl_init(void) +{ + register_sysctl_table(tty_root_table); +}
From: Junwei Hu hujunwei4@huawei.com
[ Upstream commit ef0efcd3bd3fd0589732b67fb586ffd3c8705806 ]
At the beginning of the ip6_fragment function, the prevhdr pointer is obtained from ip6_find_1stfragopt. However, all pointers into the skb header may change when skb_checksum_help is called for an skb with skb->ip_summed == CHECKSUM_PARTIAL. The prevhdr pointer will be dangling if it is not reloaded after __skb_linearize runs inside skb_checksum_help.
Here, I add a variable, nexthdr_offset, to record the offset, which does not change even after __skb_linearize is called.
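To see the bug class outside the kernel, here is a user-space analogue (not kernel code; realloc() plays the role of __skb_linearize() moving the skb header) of saving an offset instead of a raw pointer:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	char *buf = malloc(16);
	char *field;
	size_t field_off;

	if (!buf)
		return 1;
	strcpy(buf, "hdr");
	field = buf + 1;		/* raw pointer into the buffer */
	field_off = field - buf;	/* offset survives a relocation */

	buf = realloc(buf, 1 << 20);	/* the buffer may move */
	if (!buf)
		return 1;
	/* 'field' may now dangle; recompute it from the saved offset,
	 * exactly as the patch recomputes prevhdr from nexthdr_offset */
	field = buf + field_off;
	printf("%c\n", *field);
	free(buf);
	return 0;
}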
Fixes: 405c92f7a541 ("ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment") Signed-off-by: Junwei Hu hujunwei4@huawei.com Reported-by: Wenhao Zhang zhangwenhao8@huawei.com Reported-by: syzbot+e8ce541d095e486074fc@syzkaller.appspotmail.com Reviewed-by: Zhiqiang Liu liuzhiqiang26@huawei.com Acked-by: Martin KaFai Lau kafai@fb.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_output.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -592,7 +592,7 @@ int ip6_fragment(struct net *net, struct inet6_sk(skb->sk) : NULL; struct ipv6hdr *tmp_hdr; struct frag_hdr *fh; - unsigned int mtu, hlen, left, len; + unsigned int mtu, hlen, left, len, nexthdr_offset; int hroom, troom; __be32 frag_id; int ptr, offset = 0, err = 0; @@ -603,6 +603,7 @@ int ip6_fragment(struct net *net, struct goto fail; hlen = err; nexthdr = *prevhdr; + nexthdr_offset = prevhdr - skb_network_header(skb);
mtu = ip6_skb_dst_mtu(skb);
@@ -637,6 +638,7 @@ int ip6_fragment(struct net *net, struct (err = skb_checksum_help(skb))) goto fail;
+ prevhdr = skb_network_header(skb) + nexthdr_offset; hroom = LL_RESERVED_SPACE(rt->dst.dev); if (skb_has_frag_list(skb)) { int first_len = skb_pagelen(skb);
From: Lorenzo Bianconi lorenzo.bianconi@redhat.com
[ Upstream commit bb9bd814ebf04f579be466ba61fc922625508807 ]
ipip6 tunnels run iptunnel_pull_header on received skbs. This can trigger the following use-after-free access through the iph pointer, since the packet will be 'uncloned' by pskb_expand_head if it is a cloned gso skb (e.g. if the packet has been sent through a veth device):
[ 706.369655] BUG: KASAN: use-after-free in ipip6_rcv+0x1678/0x16e0 [sit] [ 706.449056] Read of size 1 at addr ffffe01b6bd855f5 by task ksoftirqd/1/= [ 706.669494] Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 08/19/2016 [ 706.771839] Call trace: [ 706.801159] dump_backtrace+0x0/0x2f8 [ 706.845079] show_stack+0x24/0x30 [ 706.884833] dump_stack+0xe0/0x11c [ 706.925629] print_address_description+0x68/0x260 [ 706.982070] kasan_report+0x178/0x340 [ 707.025995] __asan_report_load1_noabort+0x30/0x40 [ 707.083481] ipip6_rcv+0x1678/0x16e0 [sit] [ 707.132623] tunnel64_rcv+0xd4/0x200 [tunnel4] [ 707.185940] ip_local_deliver_finish+0x3b8/0x988 [ 707.241338] ip_local_deliver+0x144/0x470 [ 707.289436] ip_rcv_finish+0x43c/0x14b0 [ 707.335447] ip_rcv+0x628/0x1138 [ 707.374151] __netif_receive_skb_core+0x1670/0x2600 [ 707.432680] __netif_receive_skb+0x28/0x190 [ 707.482859] process_backlog+0x1d0/0x610 [ 707.529913] net_rx_action+0x37c/0xf68 [ 707.574882] __do_softirq+0x288/0x1018 [ 707.619852] run_ksoftirqd+0x70/0xa8 [ 707.662734] smpboot_thread_fn+0x3a4/0x9e8 [ 707.711875] kthread+0x2c8/0x350 [ 707.750583] ret_from_fork+0x10/0x18
[ 707.811302] Allocated by task 16982: [ 707.854182] kasan_kmalloc.part.1+0x40/0x108 [ 707.905405] kasan_kmalloc+0xb4/0xc8 [ 707.948291] kasan_slab_alloc+0x14/0x20 [ 707.994309] __kmalloc_node_track_caller+0x158/0x5e0 [ 708.053902] __kmalloc_reserve.isra.8+0x54/0xe0 [ 708.108280] __alloc_skb+0xd8/0x400 [ 708.150139] sk_stream_alloc_skb+0xa4/0x638 [ 708.200346] tcp_sendmsg_locked+0x818/0x2b90 [ 708.251581] tcp_sendmsg+0x40/0x60 [ 708.292376] inet_sendmsg+0xf0/0x520 [ 708.335259] sock_sendmsg+0xac/0xf8 [ 708.377096] sock_write_iter+0x1c0/0x2c0 [ 708.424154] new_sync_write+0x358/0x4a8 [ 708.470162] __vfs_write+0xc4/0xf8 [ 708.510950] vfs_write+0x12c/0x3d0 [ 708.551739] ksys_write+0xcc/0x178 [ 708.592533] __arm64_sys_write+0x70/0xa0 [ 708.639593] el0_svc_handler+0x13c/0x298 [ 708.686646] el0_svc+0x8/0xc
[ 708.739019] Freed by task 17: [ 708.774597] __kasan_slab_free+0x114/0x228 [ 708.823736] kasan_slab_free+0x10/0x18 [ 708.868703] kfree+0x100/0x3d8 [ 708.905320] skb_free_head+0x7c/0x98 [ 708.948204] skb_release_data+0x320/0x490 [ 708.996301] pskb_expand_head+0x60c/0x970 [ 709.044399] __iptunnel_pull_header+0x3b8/0x5d0 [ 709.098770] ipip6_rcv+0x41c/0x16e0 [sit] [ 709.146873] tunnel64_rcv+0xd4/0x200 [tunnel4] [ 709.200195] ip_local_deliver_finish+0x3b8/0x988 [ 709.255596] ip_local_deliver+0x144/0x470 [ 709.303692] ip_rcv_finish+0x43c/0x14b0 [ 709.349705] ip_rcv+0x628/0x1138 [ 709.388413] __netif_receive_skb_core+0x1670/0x2600 [ 709.446943] __netif_receive_skb+0x28/0x190 [ 709.497120] process_backlog+0x1d0/0x610 [ 709.544169] net_rx_action+0x37c/0xf68 [ 709.589131] __do_softirq+0x288/0x1018
[ 709.651938] The buggy address belongs to the object at ffffe01b6bd85580 which belongs to the cache kmalloc-1024 of size 1024 [ 709.804356] The buggy address is located 117 bytes inside of 1024-byte region [ffffe01b6bd85580, ffffe01b6bd85980) [ 709.946340] The buggy address belongs to the page: [ 710.003824] page:ffff7ff806daf600 count:1 mapcount:0 mapping:ffffe01c4001f600 index:0x0 [ 710.099914] flags: 0xfffff8000000100(slab) [ 710.149059] raw: 0fffff8000000100 dead000000000100 dead000000000200 ffffe01c4001f600 [ 710.242011] raw: 0000000000000000 0000000000380038 00000001ffffffff 0000000000000000 [ 710.334966] page dumped because: kasan: bad access detected
Fix it by resetting the iph pointer after iptunnel_pull_header.
Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap") Tested-by: Jianlin Shi jishi@redhat.com Signed-off-by: Lorenzo Bianconi lorenzo.bianconi@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/sit.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -661,6 +661,10 @@ static int ipip6_rcv(struct sk_buff *skb !net_eq(tunnel->net, dev_net(tunnel->dev)))) goto out;
+ /* skb can be uncloned in iptunnel_pull_header, so + * old iph is no longer valid + */ + iph = (const struct iphdr *)skb_mac_header(skb); err = IP_ECN_decapsulate(iph, skb); if (unlikely(err)) { if (log_ecn_error)
From: Jiri Slaby jslaby@suse.cz
[ Upstream commit 3c446e6f96997f2a95bf0037ef463802162d2323 ]
When kcm is loaded while many processes try to create a KCM socket, a crash occurs: BUG: unable to handle kernel NULL pointer dereference at 000000000000000e IP: mutex_lock+0x27/0x40 kernel/locking/mutex.c:240 PGD 8000000016ef2067 P4D 8000000016ef2067 PUD 3d6e9067 PMD 0 Oops: 0002 [#1] SMP KASAN PTI CPU: 0 PID: 7005 Comm: syz-executor.5 Not tainted 4.12.14-396-default #1 SLE15-SP1 (unreleased) RIP: 0010:mutex_lock+0x27/0x40 kernel/locking/mutex.c:240 RSP: 0018:ffff88000d487a00 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 000000000000000e RCX: 1ffff100082b0719 ... CR2: 000000000000000e CR3: 000000004b1bc003 CR4: 0000000000060ef0 Call Trace: kcm_create+0x600/0xbf0 [kcm] __sock_create+0x324/0x750 net/socket.c:1272 ...
This is due to a race between sock_create and the not-yet-completed register_pernet_device: kcm_create tries to do net_generic(net, kcm_net_id), but kcm_net_id is not initialized yet.
So switch the order of the two to close the race.
This can be reproduced with multiple processes doing socket(PF_KCM, ...) and one process doing module removal.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reviewed-by: Michal Kubecek mkubecek@suse.cz Signed-off-by: Jiri Slaby jslaby@suse.cz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/kcm/kcmsock.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/net/kcm/kcmsock.c +++ b/net/kcm/kcmsock.c @@ -2058,14 +2058,14 @@ static int __init kcm_init(void) if (err) goto fail;
- err = sock_register(&kcm_family_ops); - if (err) - goto sock_register_fail; - err = register_pernet_device(&kcm_net_ops); if (err) goto net_ops_fail;
+ err = sock_register(&kcm_family_ops); + if (err) + goto sock_register_fail; + err = kcm_proc_init(); if (err) goto proc_init_fail; @@ -2073,12 +2073,12 @@ static int __init kcm_init(void) return 0;
proc_init_fail: - unregister_pernet_device(&kcm_net_ops); - -net_ops_fail: sock_unregister(PF_KCM);
sock_register_fail: + unregister_pernet_device(&kcm_net_ops); + +net_ops_fail: proto_unregister(&kcm_proto);
fail: @@ -2094,8 +2094,8 @@ fail: static void __exit kcm_exit(void) { kcm_proc_exit(); - unregister_pernet_device(&kcm_net_ops); sock_unregister(PF_KCM); + unregister_pernet_device(&kcm_net_ops); proto_unregister(&kcm_proto); destroy_workqueue(kcm_wq);
From: Mao Wenan maowenan@huawei.com
[ Upstream commit cb66ddd156203daefb8d71158036b27b0e2caf63 ]
When a net namespace is being cleaned up, rds_tcp_exit_net() calls rds_tcp_kill_sock(). If t_sock is NULL, it does not call rds_conn_destroy(), rds_conn_path_destroy() and rds_tcp_conn_free() to free the connection, and the worker cp_conn_w is not stopped; afterwards the net is freed in net_drop_ns(), while the worker cp_conn_w, rds_connect_worker(), still calls rds_tcp_conn_path_connect() and references the 'net' which has already been freed.
In rds_tcp_conn_path_connect(), rds_tcp_set_callbacks() sets t_sock = sock before sock->ops->connect, but if connect() fails, it calls rds_tcp_restore_callbacks() and sets t_sock = NULL. If connect() keeps failing, rds_connect_worker() tries to reconnect all the time, so rds_tcp_kill_sock() never gets to cancel the worker cp_conn_w and free the connections.
Therefore, the condition !tc->t_sock is not needed on the cleanup_net->rds_tcp_exit_net->rds_tcp_kill_sock path, because tc->t_sock is always NULL there, and there is no other path to cancel cp_conn_w and free the connection. So this patch drops that condition to fix this.
rds_tcp_kill_sock(): ... if (net != c_net || !tc->t_sock) ... Acked-by: Santosh Shilimkar santosh.shilimkar@oracle.com
================================================================== BUG: KASAN: use-after-free in inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340 Read of size 4 at addr ffff8003496a4684 by task kworker/u8:4/3721
CPU: 3 PID: 3721 Comm: kworker/u8:4 Not tainted 5.1.0 #11 Hardware name: linux,dummy-virt (DT) Workqueue: krdsd rds_connect_worker Call trace: dump_backtrace+0x0/0x3c0 arch/arm64/kernel/time.c:53 show_stack+0x28/0x38 arch/arm64/kernel/traps.c:152 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x120/0x188 lib/dump_stack.c:113 print_address_description+0x68/0x278 mm/kasan/report.c:253 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x21c/0x348 mm/kasan/report.c:409 __asan_report_load4_noabort+0x30/0x40 mm/kasan/report.c:429 inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340 __sock_create+0x4f8/0x770 net/socket.c:1276 sock_create_kern+0x50/0x68 net/socket.c:1322 rds_tcp_conn_path_connect+0x2b4/0x690 net/rds/tcp_connect.c:114 rds_connect_worker+0x108/0x1d0 net/rds/threads.c:175 process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153 worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296 kthread+0x2f0/0x378 kernel/kthread.c:255 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
Allocated by task 687: save_stack mm/kasan/kasan.c:448 [inline] set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xd4/0x180 mm/kasan/kasan.c:553 kasan_slab_alloc+0x14/0x20 mm/kasan/kasan.c:490 slab_post_alloc_hook mm/slab.h:444 [inline] slab_alloc_node mm/slub.c:2705 [inline] slab_alloc mm/slub.c:2713 [inline] kmem_cache_alloc+0x14c/0x388 mm/slub.c:2718 kmem_cache_zalloc include/linux/slab.h:697 [inline] net_alloc net/core/net_namespace.c:384 [inline] copy_net_ns+0xc4/0x2d0 net/core/net_namespace.c:424 create_new_namespaces+0x300/0x658 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xa0/0x198 kernel/nsproxy.c:206 ksys_unshare+0x340/0x628 kernel/fork.c:2577 __do_sys_unshare kernel/fork.c:2645 [inline] __se_sys_unshare kernel/fork.c:2643 [inline] __arm64_sys_unshare+0x38/0x58 kernel/fork.c:2643 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall arch/arm64/kernel/syscall.c:47 [inline] el0_svc_common+0x168/0x390 arch/arm64/kernel/syscall.c:83 el0_svc_handler+0x60/0xd0 arch/arm64/kernel/syscall.c:129 el0_svc+0x8/0xc arch/arm64/kernel/entry.S:960
Freed by task 264: save_stack mm/kasan/kasan.c:448 [inline] set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x114/0x220 mm/kasan/kasan.c:521 kasan_slab_free+0x10/0x18 mm/kasan/kasan.c:528 slab_free_hook mm/slub.c:1370 [inline] slab_free_freelist_hook mm/slub.c:1397 [inline] slab_free mm/slub.c:2952 [inline] kmem_cache_free+0xb8/0x3a8 mm/slub.c:2968 net_free net/core/net_namespace.c:400 [inline] net_drop_ns.part.6+0x78/0x90 net/core/net_namespace.c:407 net_drop_ns net/core/net_namespace.c:406 [inline] cleanup_net+0x53c/0x6d8 net/core/net_namespace.c:569 process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153 worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296 kthread+0x2f0/0x378 kernel/kthread.c:255 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
The buggy address belongs to the object at ffff8003496a3f80 which belongs to the cache net_namespace of size 7872 The buggy address is located 1796 bytes inside of 7872-byte region [ffff8003496a3f80, ffff8003496a5e40) The buggy address belongs to the page: page:ffff7e000d25a800 count:1 mapcount:0 mapping:ffff80036ce4b000 index:0x0 compound_mapcount: 0 flags: 0xffffe0000008100(slab|head) raw: 0ffffe0000008100 dead000000000100 dead000000000200 ffff80036ce4b000 raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff8003496a4580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8003496a4600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8003496a4680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff8003496a4700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8003496a4780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ==================================================================
Fixes: 467fa15356ac("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Mao Wenan maowenan@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/rds/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/rds/tcp.c +++ b/net/rds/tcp.c @@ -527,7 +527,7 @@ static void rds_tcp_kill_sock(struct net list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) { struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net);
- if (net != c_net || !tc->t_sock) + if (net != c_net) continue; if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn)) { list_move_tail(&tc->t_tcp_node, &tmp_list);
From: Andrea Righi andrea.righi@canonical.com
[ Upstream commit f28cd2af22a0c134e4aa1c64a70f70d815d473fb ]
The flow action buffer can be resized if it's not big enough to contain all the requested flow actions. However, this resize doesn't take the newly requested size into account: the buffer is only increased by a factor of 2. This might not be enough to contain the new data, causing a buffer overflow, for example:
[ 42.044472] ============================================================================= [ 42.045608] BUG kmalloc-96 (Not tainted): Redzone overwritten [ 42.046415] -----------------------------------------------------------------------------
[ 42.047715] Disabling lock debugging due to kernel taint [ 42.047716] INFO: 0x8bf2c4a5-0x720c0928. First byte 0x0 instead of 0xcc [ 42.048677] INFO: Slab 0xbc6d2040 objects=29 used=18 fp=0xdc07dec4 flags=0x2808101 [ 42.049743] INFO: Object 0xd53a3464 @offset=2528 fp=0xccdcdebb
[ 42.050747] Redzone 76f1b237: cc cc cc cc cc cc cc cc ........ [ 42.051839] Object d53a3464: 6b 6b 6b 6b 6b 6b 6b 6b 0c 00 00 00 6c 00 00 00 kkkkkkkk....l... [ 42.053015] Object f49a30cc: 6c 00 0c 00 00 00 00 00 00 00 00 03 78 a3 15 f6 l...........x... [ 42.054203] Object acfe4220: 20 00 02 00 ff ff ff ff 00 00 00 00 00 00 00 00 ............... [ 42.055370] Object 21024e91: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 42.056541] Object 070e04c3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 42.057797] Object 948a777a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 42.059061] Redzone 8bf2c4a5: 00 00 00 00 .... [ 42.060189] Padding a681b46e: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
Fix by making sure the new buffer is properly resized to contain all the requested data.
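Stated with made-up numbers, the policy change is easy to see in isolation (user-space sketch, not the kernel code; all values are invented):

#include <stdio.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

int main(void)
{
	int cur_size = 96;	/* what ksize(*sfa) might report */
	int next_offset = 40;	/* bytes already in use */
	int req_size = 200;	/* newly requested attribute size */

	int old_policy = cur_size * 2;	/* 192 bytes: still too small */
	int new_policy = MAX(next_offset + req_size, cur_size * 2);	/* 240 */

	printf("needed %d, old policy grows to %d, fixed policy grows to %d\n",
	       next_offset + req_size, old_policy, new_policy);
	return 0;
}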
BugLink: https://bugs.launchpad.net/bugs/1813244 Signed-off-by: Andrea Righi andrea.righi@canonical.com Acked-by: Pravin B Shelar pshelar@ovn.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/openvswitch/flow_netlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -1853,14 +1853,14 @@ static struct nlattr *reserve_sfa_size(s
struct sw_flow_actions *acts; int new_acts_size; - int req_size = NLA_ALIGN(attr_len); + size_t req_size = NLA_ALIGN(attr_len); int next_offset = offsetof(struct sw_flow_actions, actions) + (*sfa)->actions_len;
if (req_size <= (ksize(*sfa) - next_offset)) goto out;
- new_acts_size = ksize(*sfa) * 2; + new_acts_size = max(next_offset + req_size, ksize(*sfa) * 2);
if (new_acts_size > MAX_ACTIONS_BUFSIZE) { if ((MAX_ACTIONS_BUFSIZE - next_offset) < req_size) {
From: Bjørn Mork bjorn@mork.no
[ Upstream commit 6289d0facd9ebce4cc83e5da39e15643ee998dc5 ]
This is a Qualcomm based device with a QMI function on interface 4. It is mode switched from 2020:2030 using a standard eject message.
T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 6 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=2020 ProdID=2031 Rev= 2.32 S: Manufacturer=Mobile Connect S: Product=Mobile Connect S: SerialNumber=0123456789ABCDEF C:* #Ifs= 6 Cfg#= 1 Atr=80 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) E: Ad=83(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) E: Ad=85(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) E: Ad=87(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) E: Ad=89(I) Atr=03(Int.) MxPS= 8 Ivl=32ms E: Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 5 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=(none) E: Ad=8a(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=125us
Signed-off-by: Bjørn Mork bjorn@mork.no Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -890,6 +890,7 @@ static const struct usb_device_id produc {QMI_FIXED_INTF(0x19d2, 0x2002, 4)}, /* ZTE (Vodafone) K3765-Z */ {QMI_FIXED_INTF(0x2001, 0x7e19, 4)}, /* D-Link DWM-221 B1 */ {QMI_FIXED_INTF(0x2001, 0x7e35, 4)}, /* D-Link DWM-222 */ + {QMI_FIXED_INTF(0x2020, 0x2031, 4)}, /* Olicard 600 */ {QMI_FIXED_INTF(0x2020, 0x2033, 4)}, /* BroadMobi BM806U */ {QMI_FIXED_INTF(0x0f3d, 0x68a2, 8)}, /* Sierra Wireless MC7700 */ {QMI_FIXED_INTF(0x114f, 0x68a2, 8)}, /* Sierra Wireless MC7750 */
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 09279e615c81ce55e04835970601ae286e3facbe ]
Syzbot report a kernel-infoleak:
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32 Call Trace: _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32 copy_to_user include/linux/uaccess.h:174 [inline] sctp_getsockopt_peer_addrs net/sctp/socket.c:5911 [inline] sctp_getsockopt+0x1668e/0x17f70 net/sctp/socket.c:7562 ... Uninit was stored to memory at: sctp_transport_init net/sctp/transport.c:61 [inline] sctp_transport_new+0x16d/0x9a0 net/sctp/transport.c:115 sctp_assoc_add_peer+0x532/0x1f70 net/sctp/associola.c:637 sctp_process_param net/sctp/sm_make_chunk.c:2548 [inline] sctp_process_init+0x1a1b/0x3ed0 net/sctp/sm_make_chunk.c:2361 ... Bytes 8-15 of 16 are uninitialized
It was caused by the fact that the _pad field (bytes 8-15) of a v4 addr (saved in struct sockaddr_in) wasn't initialized, but was directly copied to user memory in sctp_getsockopt_peer_addrs().
So fix it by calling memset(addr->v4.sin_zero, 0, 8) to initialize _pad of sockaddr_in before copying it to user memory in sctp_v4_addr_to_user(), as sctp_v6_addr_to_user() does.
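The layout that makes those 8 bytes leak is visible from plain user space (illustrative only; the memset line is the essence of the fix):

#include <stdio.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

int main(void)
{
	struct sockaddr_in sin;	/* deliberately not zero-initialized */

	sin.sin_family = AF_INET;
	sin.sin_port = htons(9);
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	/* without this, sin_zero holds whatever was on the stack */
	memset(sin.sin_zero, 0, sizeof(sin.sin_zero));

	printf("sin_zero: %zu bytes at offset %zu of %zu\n",
	       sizeof(sin.sin_zero),
	       offsetof(struct sockaddr_in, sin_zero),
	       sizeof(struct sockaddr_in));
	return 0;
}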
Reported-by: syzbot+86b5c7c236a22616a72f@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Tested-by: Alexander Potapenko glider@google.com Acked-by: Neil Horman nhorman@tuxdriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/protocol.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -600,6 +600,7 @@ out: static int sctp_v4_addr_to_user(struct sctp_sock *sp, union sctp_addr *addr) { /* No address mapping for V4 sockets */ + memset(addr->v4.sin_zero, 0, sizeof(addr->v4.sin_zero)); return sizeof(struct sockaddr_in); }
From: Koen De Schepper koen.de_schepper@nokia-bell-labs.com
[ Upstream commit aecfde23108b8e637d9f5c5e523b24fb97035dc3 ]
RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to loss episodes in the same way as conventional TCP".
Currently, Linux DCTCP performs no cwnd reduction when losses are encountered. Optionally, the dctcp_clamp_alpha_on_loss module parameter resets alpha to its maximal value if an RTO happens. This behavior is sub-optimal for at least two reasons: i) it ignores losses triggering fast retransmissions; and ii) it causes an unnecessarily large cwnd reduction in the future if the loss was isolated, as it resets the historical term of DCTCP's alpha EWMA to its maximal value (i.e., denoting total congestion). The second reason has an especially noticeable effect when using DCTCP in high BDP environments, where alpha normally stays at low values.
This patch replaces the clamping of alpha by setting ssthresh to half of cwnd for both fast retransmissions and RTOs, at most once per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter has been removed.
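Restated outside the kernel (plain integers stand in for the tcp_sock fields; this mirrors the dctcp_react_to_loss() helper added by the patch):

#include <stdio.h>

/* ssthresh = max(cwnd / 2, 2), applied on fast retransmit and RTO */
static unsigned int react_to_loss(unsigned int snd_cwnd)
{
	unsigned int half = snd_cwnd >> 1;

	return half > 2 ? half : 2;
}

int main(void)
{
	printf("cwnd 40 -> ssthresh %u\n", react_to_loss(40));	/* 20 */
	printf("cwnd 3  -> ssthresh %u\n", react_to_loss(3));	/* 2 */
	return 0;
}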
The table below shows experimental results where we measured the drop probability of a PIE AQM (not applying ECN marks) at a bottleneck in the presence of a single TCP flow with either the alpha-clamping option enabled or the cwnd halving proposed by this patch. Results using reno or cubic are given for comparison.
                  | Link    | RTT      | Drop
TCP CC            | speed   | base+AQM | probability
==================|=========|==========|============
CUBIC             | 40Mbps  | 7+20ms   |       0.21%
RENO              |         |          |       0.19%
DCTCP-CLAMP-ALPHA |         |          |      25.80%
DCTCP-HALVE-CWND  |         |          |       0.22%
------------------|---------|----------|------------
CUBIC             | 100Mbps | 7+20ms   |       0.03%
RENO              |         |          |       0.02%
DCTCP-CLAMP-ALPHA |         |          |      23.30%
DCTCP-HALVE-CWND  |         |          |       0.04%
------------------|---------|----------|------------
CUBIC             | 800Mbps | 1+1ms    |       0.04%
RENO              |         |          |       0.05%
DCTCP-CLAMP-ALPHA |         |          |      18.70%
DCTCP-HALVE-CWND  |         |          |       0.06%
We see that, without halving its cwnd for all sources of losses, DCTCP drives the AQM to large drop probabilities in order to keep the queue length under control (i.e., it repeatedly faces RTOs). Instead, if DCTCP reacts to all sources of losses, it can be controlled by the AQM using drop levels similar to those of cubic or reno.
Signed-off-by: Koen De Schepper koen.de_schepper@nokia-bell-labs.com Signed-off-by: Olivier Tilmans olivier.tilmans@nokia-bell-labs.com Cc: Bob Briscoe research@bobbriscoe.net Cc: Lawrence Brakmo brakmo@fb.com Cc: Florian Westphal fw@strlen.de Cc: Daniel Borkmann borkmann@iogearbox.net Cc: Yuchung Cheng ycheng@google.com Cc: Neal Cardwell ncardwell@google.com Cc: Eric Dumazet edumazet@google.com Cc: Andrew Shewmaker agshew@gmail.com Cc: Glenn Judd glenn.judd@morganstanley.com Acked-by: Florian Westphal fw@strlen.de Acked-by: Neal Cardwell ncardwell@google.com Acked-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_dctcp.c | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-)
--- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -66,11 +66,6 @@ static unsigned int dctcp_alpha_on_init module_param(dctcp_alpha_on_init, uint, 0644); MODULE_PARM_DESC(dctcp_alpha_on_init, "parameter for initial alpha value");
-static unsigned int dctcp_clamp_alpha_on_loss __read_mostly; -module_param(dctcp_clamp_alpha_on_loss, uint, 0644); -MODULE_PARM_DESC(dctcp_clamp_alpha_on_loss, - "parameter for clamping alpha on loss"); - static struct tcp_congestion_ops dctcp_reno;
static void dctcp_reset(const struct tcp_sock *tp, struct dctcp *ca) @@ -211,21 +206,23 @@ static void dctcp_update_alpha(struct so } }
-static void dctcp_state(struct sock *sk, u8 new_state) +static void dctcp_react_to_loss(struct sock *sk) { - if (dctcp_clamp_alpha_on_loss && new_state == TCP_CA_Loss) { - struct dctcp *ca = inet_csk_ca(sk); + struct dctcp *ca = inet_csk_ca(sk); + struct tcp_sock *tp = tcp_sk(sk);
- /* If this extension is enabled, we clamp dctcp_alpha to - * max on packet loss; the motivation is that dctcp_alpha - * is an indicator to the extend of congestion and packet - * loss is an indicator of extreme congestion; setting - * this in practice turned out to be beneficial, and - * effectively assumes total congestion which reduces the - * window by half. - */ - ca->dctcp_alpha = DCTCP_MAX_ALPHA; - } + ca->loss_cwnd = tp->snd_cwnd; + tp->snd_ssthresh = max(tp->snd_cwnd >> 1U, 2U); +} + +static void dctcp_state(struct sock *sk, u8 new_state) +{ + if (new_state == TCP_CA_Recovery && + new_state != inet_csk(sk)->icsk_ca_state) + dctcp_react_to_loss(sk); + /* We handle RTO in dctcp_cwnd_event to ensure that we perform only + * one loss-adjustment per RTT. + */ }
static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev) @@ -237,6 +234,9 @@ static void dctcp_cwnd_event(struct sock case CA_EVENT_ECN_NO_CE: dctcp_ce_state_1_to_0(sk); break; + case CA_EVENT_LOSS: + dctcp_react_to_loss(sk); + break; default: /* Don't care for the rest. */ break;
From: Stephen Suryaputra ssuryaextr@gmail.com
[ Upstream commit 8c83f2df9c6578ea4c5b940d8238ad8a41b87e9e ]
The configuration check to accept source route IP options should be made on the incoming netdevice when skb->dev is an l3mdev master. The route lookup for the source route next hop also needs the incoming netdev.
v2->v3: - Simplify by passing the original netdevice down the stack (per David Ahern).
Signed-off-by: Stephen Suryaputra ssuryaextr@gmail.com Reviewed-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/ip.h | 2 +- net/ipv4/ip_input.c | 7 +++---- net/ipv4/ip_options.c | 4 ++-- 3 files changed, 6 insertions(+), 7 deletions(-)
--- a/include/net/ip.h +++ b/include/net/ip.h @@ -580,7 +580,7 @@ int ip_options_get_from_user(struct net unsigned char __user *data, int optlen); void ip_options_undo(struct ip_options *opt); void ip_forward_options(struct sk_buff *skb); -int ip_options_rcv_srr(struct sk_buff *skb); +int ip_options_rcv_srr(struct sk_buff *skb, struct net_device *dev);
/* * Functions provided by ip_sockglue.c --- a/net/ipv4/ip_input.c +++ b/net/ipv4/ip_input.c @@ -259,11 +259,10 @@ int ip_local_deliver(struct sk_buff *skb ip_local_deliver_finish); }
-static inline bool ip_rcv_options(struct sk_buff *skb) +static inline bool ip_rcv_options(struct sk_buff *skb, struct net_device *dev) { struct ip_options *opt; const struct iphdr *iph; - struct net_device *dev = skb->dev;
/* It looks as overkill, because not all IP options require packet mangling. @@ -299,7 +298,7 @@ static inline bool ip_rcv_options(struct } }
- if (ip_options_rcv_srr(skb)) + if (ip_options_rcv_srr(skb, dev)) goto drop; }
@@ -361,7 +360,7 @@ static int ip_rcv_finish(struct net *net } #endif
- if (iph->ihl > 5 && ip_rcv_options(skb)) + if (iph->ihl > 5 && ip_rcv_options(skb, dev)) goto drop;
rt = skb_rtable(skb); --- a/net/ipv4/ip_options.c +++ b/net/ipv4/ip_options.c @@ -614,7 +614,7 @@ void ip_forward_options(struct sk_buff * } }
-int ip_options_rcv_srr(struct sk_buff *skb) +int ip_options_rcv_srr(struct sk_buff *skb, struct net_device *dev) { struct ip_options *opt = &(IPCB(skb)->opt); int srrspace, srrptr; @@ -649,7 +649,7 @@ int ip_options_rcv_srr(struct sk_buff *s
orefdst = skb->_skb_refdst; skb_dst_set(skb, NULL); - err = ip_route_input(skb, nexthop, iph->saddr, iph->tos, skb->dev); + err = ip_route_input(skb, nexthop, iph->saddr, iph->tos, dev); rt2 = skb_rtable(skb); if (err || (rt2->rt_type != RTN_UNICAST && rt2->rt_type != RTN_LOCAL)) { skb_dst_drop(skb);
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit 8e44e96c6c8e8fb80b84a2ca11798a8554f710f2 ]
If the RX completion indicates RX buffer errors, the RX ring will be disabled by firmware and no packets will be received on that ring from that point on. Recover by resetting the device.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -1400,11 +1400,17 @@ static int bnxt_rx_pkt(struct bnxt *bp,
rx_buf->data = NULL; if (rxcmp1->rx_cmp_cfa_code_errors_v2 & RX_CMP_L2_ERRORS) { + u32 rx_err = le32_to_cpu(rxcmp1->rx_cmp_cfa_code_errors_v2); + bnxt_reuse_rx_data(rxr, cons, data); if (agg_bufs) bnxt_reuse_rx_agg_bufs(bnapi, cp_cons, agg_bufs);
rc = -EIO; + if (rx_err & RX_CMPL_ERRORS_BUFFER_ERROR_MASK) { + netdev_warn(bp->dev, "RX buffer error %x\n", rx_err); + bnxt_sched_reset(bp, rxr); + } goto next_rx; }
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit a1b0e4e684e9c300b9e759b46cb7a0147e61ddff ]
There is logic to check that the RX/TPA consumer index is the expected index, to work around a hardware problem. However, the potentially bad consumer index is first used to index into an array to reference an entry. This can potentially crash if the bad consumer index is beyond the legal range. Improve the logic to use the consumer index for dereferencing only after the validity check, and log an error message when the check fails.
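The reordering boils down to a validate-before-dereference pattern; a hypothetical user-space reduction (all names invented):

#include <stdio.h>

#define RING_SIZE 8

static int ring[RING_SIZE];

static int handle_pkt(unsigned int cons, unsigned int expected)
{
	if (cons != expected) {
		/* log and bail out before touching ring[cons], which may
		 * be out of range when the hardware misbehaves */
		fprintf(stderr, "cons %x != expected cons %x\n",
			cons, expected);
		return -1;
	}
	return ring[cons];	/* safe: cons was validated first */
}

int main(void)
{
	printf("%d\n", handle_pkt(3, 3));	/* ok */
	printf("%d\n", handle_pkt(1000, 3));	/* rejected, no OOB read */
	return 0;
}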
Fixes: fa7e28127a5a ("bnxt_en: Add workaround to detect bad opaque in rx completion (part 2)") Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -959,6 +959,8 @@ static void bnxt_tpa_start(struct bnxt * tpa_info = &rxr->rx_tpa[agg_id];
if (unlikely(cons != rxr->rx_next_cons)) { + netdev_warn(bp->dev, "TPA cons %x != expected cons %x\n", + cons, rxr->rx_next_cons); bnxt_sched_reset(bp, rxr); return; } @@ -1377,14 +1379,16 @@ static int bnxt_rx_pkt(struct bnxt *bp, }
cons = rxcmp->rx_cmp_opaque; - rx_buf = &rxr->rx_buf_ring[cons]; - data = rx_buf->data; if (unlikely(cons != rxr->rx_next_cons)) { int rc1 = bnxt_discard_rx(bp, bnapi, raw_cons, rxcmp);
+ netdev_warn(bp->dev, "RX cons %x != expected cons %x\n", + cons, rxr->rx_next_cons); bnxt_sched_reset(bp, rxr); return rc1; } + rx_buf = &rxr->rx_buf_ring[cons]; + data = rx_buf->data; prefetch(data);
agg_bufs = (le32_to_cpu(rxcmp->rx_cmp_misc_v1) & RX_CMP_AGG_BUFS) >>
From: Yuval Avnery yuvalav@mellanox.com
[ Upstream commit 80a2a9026b24c6bd34b8d58256973e22270bedec ]
The tirs refresh path loops over a global list of tirs while netdevs are adding and removing tirs from that list; that is why a lock is required.
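As a hedged sketch of both sides of the race, with pthreads standing in for the kernel mutex API and hypothetical names:

#include <pthread.h>
#include <stddef.h>

struct tir {
        struct tir *next;
        int tirn;
};

static struct tir *tirs_list;           /* global list of tirs */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

static void create_tir(struct tir *t)
{
        pthread_mutex_lock(&list_lock);         /* the list_add() side */
        t->next = tirs_list;
        tirs_list = t;
        pthread_mutex_unlock(&list_lock);
}

static void refresh_tirs(void (*modify)(struct tir *))
{
        /* The traversal takes the same lock, so no tir can be added
         * or removed mid-walk. */
        pthread_mutex_lock(&list_lock);
        for (struct tir *t = tirs_list; t != NULL; t = t->next)
                modify(t);
        pthread_mutex_unlock(&list_lock);
}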
Fixes: 724b2aa15126 ("net/mlx5e: TIRs management refactoring") Signed-off-by: Yuval Avnery yuvalav@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_common.c | 7 +++++++ include/linux/mlx5/driver.h | 2 ++ 2 files changed, 9 insertions(+)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c @@ -45,7 +45,9 @@ int mlx5e_create_tir(struct mlx5_core_de if (err) return err;
+ mutex_lock(&mdev->mlx5e_res.td.list_lock); list_add(&tir->list, &mdev->mlx5e_res.td.tirs_list); + mutex_unlock(&mdev->mlx5e_res.td.list_lock);
return 0; } @@ -53,8 +55,10 @@ int mlx5e_create_tir(struct mlx5_core_de void mlx5e_destroy_tir(struct mlx5_core_dev *mdev, struct mlx5e_tir *tir) { + mutex_lock(&mdev->mlx5e_res.td.list_lock); mlx5_core_destroy_tir(mdev, tir->tirn); list_del(&tir->list); + mutex_unlock(&mdev->mlx5e_res.td.list_lock); }
static int mlx5e_create_mkey(struct mlx5_core_dev *mdev, u32 pdn, @@ -114,6 +118,7 @@ int mlx5e_create_mdev_resources(struct m }
INIT_LIST_HEAD(&mdev->mlx5e_res.td.tirs_list); + mutex_init(&mdev->mlx5e_res.td.list_lock);
return 0;
@@ -151,6 +156,7 @@ int mlx5e_refresh_tirs_self_loopback_ena
MLX5_SET(modify_tir_in, in, bitmask.self_lb_en, 1);
+ mutex_lock(&mdev->mlx5e_res.td.list_lock); list_for_each_entry(tir, &mdev->mlx5e_res.td.tirs_list, list) { err = mlx5_core_modify_tir(mdev, tir->tirn, in, inlen); if (err) @@ -159,6 +165,7 @@ int mlx5e_refresh_tirs_self_loopback_ena
out: kvfree(in); + mutex_unlock(&mdev->mlx5e_res.td.list_lock);
return err; } --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -578,6 +578,8 @@ enum mlx5_pci_status { };
struct mlx5_td { + /* protects tirs list changes while tirs refresh */ + struct mutex list_lock; struct list_head tirs_list; u32 tdn; };
From: Eric Dumazet edumazet@google.com
[ Upstream commit 355b98553789b646ed97ad801a619ff898471b92 ]
net_hash_mix() currently uses the kernel address of a struct net, and its output is used in many places that could reveal this address to a patient attacker, thus defeating KASLR for the typical case (the initial net namespace: &init_net is not dynamically allocated).
I believe the original implementation tried to avoid spending too many cycles in this function, but security comes first.
Also provide entropy regardless of CONFIG_NET_NS.
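A sketch of the before/after, with userspace stand-ins (random()) for the kernel primitives:

#include <stdint.h>
#include <stdlib.h>

struct net {
        uint32_t hash_mix;
};

/* Before: the salt is derived from the object's address, so every
 * hash that mixes it in leaks bits of kernel layout. */
static uint32_t net_hash_mix_old(const struct net *net)
{
        return (uint32_t)((uintptr_t)net >> 6); /* ~L1_CACHE_SHIFT */
}

/* After: pure entropy, drawn once at namespace setup
 * (get_random_bytes() in the kernel). */
static void setup_net(struct net *net)
{
        net->hash_mix = (uint32_t)random();
}

static uint32_t net_hash_mix_new(const struct net *net)
{
        return net->hash_mix;
}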
Fixes: 0b4419162aa6 ("netns: introduce the net_hash_mix "salt" for hashes") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: Amit Klein aksecurity@gmail.com Reported-by: Benny Pinkas benny@pinkas.net Cc: Pavel Emelyanov xemul@openvz.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/net_namespace.h | 1 + include/net/netns/hash.h | 15 ++------------- net/core/net_namespace.c | 1 + 3 files changed, 4 insertions(+), 13 deletions(-)
--- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -53,6 +53,7 @@ struct net { */ spinlock_t rules_mod_lock;
+ u32 hash_mix; atomic64_t cookie_gen;
struct list_head list; /* list of network namespaces */ --- a/include/net/netns/hash.h +++ b/include/net/netns/hash.h @@ -1,21 +1,10 @@ #ifndef __NET_NS_HASH_H__ #define __NET_NS_HASH_H__
-#include <asm/cache.h> - -struct net; +#include <net/net_namespace.h>
static inline u32 net_hash_mix(const struct net *net) { -#ifdef CONFIG_NET_NS - /* - * shift this right to eliminate bits, that are - * always zeroed - */ - - return (u32)(((unsigned long)net) >> L1_CACHE_SHIFT); -#else - return 0; -#endif + return net->hash_mix; } #endif --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -282,6 +282,7 @@ static __net_init int setup_net(struct n
atomic_set(&net->count, 1); atomic_set(&net->passive, 1); + get_random_bytes(&net->hash_mix, sizeof(u32)); net->dev_base_seq = 1; net->user_ns = user_ns; idr_init(&net->netns_ids);
From: Li RongQing lirongqing@baidu.com
[ Upstream commit 3d8830266ffc28c16032b859e38a0252e014b631 ]
NULL or ZERO_SIZE_PTR will be returned for a zero-sized memory request, and dereferencing them will lead to a segfault.
So it is unnecessary to call vzalloc for a zero-sized memory request, and we should not call functions which may dereference the NULL allocated memory.
This also fixes a possible memory leak: if phy_ethtool_get_stats() returns an error, memory should be freed before exit.
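The guarded-allocation shape applied in all three handlers, as a standalone sketch (malloc()/memcpy() standing in for kmalloc()/copy_to_user(); names hypothetical):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int get_stats(uint64_t *user_buf, size_t n_stats,
                     void (*fill)(uint64_t *))
{
        uint64_t *data = NULL;

        if (n_stats) {
                data = malloc(n_stats * sizeof(*data));
                if (!data)
                        return -1;              /* -ENOMEM */
                fill(data);
        }
        /* Skip the copy when n_stats == 0: data is NULL (or, in the
         * kernel, possibly ZERO_SIZE_PTR) and must not be read. */
        if (n_stats)
                memcpy(user_buf, data, n_stats * sizeof(*data));
        free(data);
        return 0;
}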
Signed-off-by: Li RongQing lirongqing@baidu.com Reviewed-by: Wang Li wangli39@baidu.com Reviewed-by: Michal Kubecek mkubecek@suse.cz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/ethtool.c | 49 +++++++++++++++++++++++++++++++------------------ 1 file changed, 31 insertions(+), 18 deletions(-)
--- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -1801,17 +1801,22 @@ static int ethtool_get_strings(struct ne
gstrings.len = ret;
- data = kcalloc(gstrings.len, ETH_GSTRING_LEN, GFP_USER); - if (!data) - return -ENOMEM; + if (gstrings.len) { + data = kcalloc(gstrings.len, ETH_GSTRING_LEN, GFP_USER); + if (!data) + return -ENOMEM;
- __ethtool_get_strings(dev, gstrings.string_set, data); + __ethtool_get_strings(dev, gstrings.string_set, data); + } else { + data = NULL; + }
ret = -EFAULT; if (copy_to_user(useraddr, &gstrings, sizeof(gstrings))) goto out; useraddr += sizeof(gstrings); - if (copy_to_user(useraddr, data, gstrings.len * ETH_GSTRING_LEN)) + if (gstrings.len && + copy_to_user(useraddr, data, gstrings.len * ETH_GSTRING_LEN)) goto out; ret = 0;
@@ -1899,17 +1904,21 @@ static int ethtool_get_stats(struct net_ return -EFAULT;
stats.n_stats = n_stats; - data = kmalloc(n_stats * sizeof(u64), GFP_USER); - if (!data) - return -ENOMEM; + if (n_stats) { + data = kmalloc(n_stats * sizeof(u64), GFP_USER); + if (!data) + return -ENOMEM;
- ops->get_ethtool_stats(dev, &stats, data); + ops->get_ethtool_stats(dev, &stats, data); + } else { + data = NULL; + }
ret = -EFAULT; if (copy_to_user(useraddr, &stats, sizeof(stats))) goto out; useraddr += sizeof(stats); - if (copy_to_user(useraddr, data, stats.n_stats * sizeof(u64))) + if (n_stats && copy_to_user(useraddr, data, n_stats * sizeof(u64))) goto out; ret = 0;
@@ -1938,19 +1947,23 @@ static int ethtool_get_phy_stats(struct return -EFAULT;
stats.n_stats = n_stats; - data = kmalloc_array(n_stats, sizeof(u64), GFP_USER); - if (!data) - return -ENOMEM; - - mutex_lock(&phydev->lock); - phydev->drv->get_stats(phydev, &stats, data); - mutex_unlock(&phydev->lock); + if (n_stats) { + data = kmalloc_array(n_stats, sizeof(u64), GFP_USER); + if (!data) + return -ENOMEM; + + mutex_lock(&phydev->lock); + phydev->drv->get_stats(phydev, &stats, data); + mutex_unlock(&phydev->lock); + } else { + data = NULL; + }
ret = -EFAULT; if (copy_to_user(useraddr, &stats, sizeof(stats))) goto out; useraddr += sizeof(stats); - if (copy_to_user(useraddr, data, stats.n_stats * sizeof(u64))) + if (n_stats && copy_to_user(useraddr, data, n_stats * sizeof(u64))) goto out; ret = 0;
From: Sheena Mira-ato sheena.mira-ato@alliedtelesis.co.nz
[ Upstream commit b2e54b09a3d29c4db883b920274ca8dca4d9f04d ]
The device type for ip6 tunnels is set to ARPHRD_TUNNEL6. However, the ip4ip6_err function is expecting the device type of the tunnel to be ARPHRD_TUNNEL. Since the device types do not match, the function exits and the ICMP error packet is not sent to the originating host. Note that the device type for IPv4 tunnels is set to ARPHRD_TUNNEL.
The fix is to expect a tunnel device type of ARPHRD_TUNNEL6 instead. Now the tunnel device type matches and the ICMP error packet is sent to the originating host.
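Reduced to its core, the corrected check looks like this (a sketch, not the ip6_tunnel code; constants as in include/uapi/linux/if_arp.h):

#define ARPHRD_TUNNEL   768     /* IPIP tunnel   */
#define ARPHRD_TUNNEL6  769     /* IP6IP6 tunnel */

struct net_device {
        unsigned short type;
};

static int tunnel_route_ok(const struct net_device *dev)
{
        /* ip6 tunnel devices register as ARPHRD_TUNNEL6; comparing
         * against ARPHRD_TUNNEL (the IPv4 value) could never match. */
        return dev->type == ARPHRD_TUNNEL6;
}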
Signed-off-by: Sheena Mira-ato sheena.mira-ato@alliedtelesis.co.nz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_tunnel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -634,7 +634,7 @@ ip4ip6_err(struct sk_buff *skb, struct i IPPROTO_IPIP, RT_TOS(eiph->tos), 0); if (IS_ERR(rt) || - rt->dst.dev->type != ARPHRD_TUNNEL) { + rt->dst.dev->type != ARPHRD_TUNNEL6) { if (!IS_ERR(rt)) ip_rt_put(rt); goto out; @@ -644,7 +644,7 @@ ip4ip6_err(struct sk_buff *skb, struct i ip_rt_put(rt); if (ip_route_input(skb2, eiph->daddr, eiph->saddr, eiph->tos, skb2->dev) || - skb_dst(skb2)->dev->type != ARPHRD_TUNNEL) + skb_dst(skb2)->dev->type != ARPHRD_TUNNEL6) goto out; }
From: Zubin Mithra zsm@chromium.org
commit 212ac181c158c09038c474ba68068be49caecebb upstream.
When ioctl calls are made with non-null-terminated userspace strings, strlcpy causes an OOB-read from within strlen. Fix by changing to use strscpy instead.
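For illustration, simplified (not the kernel) implementations of the two, showing why one over-reads and the other cannot:

#include <string.h>
#include <sys/types.h>

/* strlcpy()-like: the unconditional strlen(src) walks src until a
 * NUL byte -- an out-of-bounds read when src arrived from userspace
 * without a terminator. */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
        size_t len = strlen(src);       /* may read far past 'size' */

        if (size) {
                size_t n = (len >= size) ? size - 1 : len;
                memcpy(dst, src, n);
                dst[n] = '\0';
        }
        return len;
}

/* strscpy()-like: reads and writes are both bounded by 'size'. */
static ssize_t sketch_strscpy(char *dst, const char *src, size_t size)
{
        size_t i;

        if (!size)
                return -1;              /* -E2BIG in the kernel */
        for (i = 0; i + 1 < size && src[i]; i++)
                dst[i] = src[i];
        dst[i] = '\0';
        return (i + 1 == size && src[i]) ? -1 : (ssize_t)i;
}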
Signed-off-by: Zubin Mithra zsm@chromium.org Reviewed-by: Guenter Roeck groeck@chromium.org Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/core/seq/seq_clientmgr.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/sound/core/seq/seq_clientmgr.c +++ b/sound/core/seq/seq_clientmgr.c @@ -1249,7 +1249,7 @@ static int snd_seq_ioctl_set_client_info
/* fill the info fields */ if (client_info->name[0]) - strlcpy(client->name, client_info->name, sizeof(client->name)); + strscpy(client->name, client_info->name, sizeof(client->name));
client->filter = client_info->filter; client->event_lost = client_info->event_lost; @@ -1527,7 +1527,7 @@ static int snd_seq_ioctl_create_queue(st /* set queue name */ if (!info->name[0]) snprintf(info->name, sizeof(info->name), "Queue-%d", q->queue); - strlcpy(q->name, info->name, sizeof(q->name)); + strscpy(q->name, info->name, sizeof(q->name)); snd_use_lock_free(&q->use_lock);
return 0; @@ -1589,7 +1589,7 @@ static int snd_seq_ioctl_set_queue_info( queuefree(q); return -EPERM; } - strlcpy(q->name, info->name, sizeof(q->name)); + strscpy(q->name, info->name, sizeof(q->name)); queuefree(q);
return 0;
From: Helge Deller deller@gmx.de
commit d006e95b5561f708d0385e9677ffe2c46f2ae345 upstream.
While adding LASI support to QEMU, I noticed that the QEMU detection in the kernel happens much too late. For example, when a LASI chip is found by the kernel, it registers the LASI LED driver as well. But when we run on QEMU it makes sense to avoid spending unnecessary CPU cycles, so we need to access the running_on_qemu flag earlier than before.
This patch now makes the QEMU detection the first task of the Linux kernel by moving it to where the kernel enters the C code.
Fixes: 310d82784fb4 ("parisc: qemu idle sleep support") Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/parisc/kernel/process.c | 6 ------ arch/parisc/kernel/setup.c | 3 +++ 2 files changed, 3 insertions(+), 6 deletions(-)
--- a/arch/parisc/kernel/process.c +++ b/arch/parisc/kernel/process.c @@ -206,12 +206,6 @@ void __cpuidle arch_cpu_idle(void)
static int __init parisc_idle_init(void) { - const char *marker; - - /* check QEMU/SeaBIOS marker in PAGE0 */ - marker = (char *) &PAGE0->pad0; - running_on_qemu = (memcmp(marker, "SeaBIOS", 8) == 0); - if (!running_on_qemu) cpu_idle_poll_ctrl(1);
--- a/arch/parisc/kernel/setup.c +++ b/arch/parisc/kernel/setup.c @@ -403,6 +403,9 @@ void start_parisc(void) int ret, cpunum; struct pdc_coproc_cfg coproc_cfg;
+ /* check QEMU/SeaBIOS marker in PAGE0 */ + running_on_qemu = (memcmp(&PAGE0->pad0, "SeaBIOS", 8) == 0); + cpunum = smp_processor_id();
set_firmware_width_unlocked();
Hi Greg,
please do NOT apply this patch to the 4.9 tree. See below: it was tagged for v4.14+ only. It breaks the build on 4.9; I got a 0-day build bug report about it.
Thanks, Helge
On 15.04.19 20:44, Greg Kroah-Hartman wrote:
From: Helge Deller deller@gmx.de
commit d006e95b5561f708d0385e9677ffe2c46f2ae345 upstream.
[...]
Cc: stable@vger.kernel.org # v4.14+
[...]
On Tue, Apr 16, 2019 at 03:50:16PM +0200, Helge Deller wrote:
Hi Greg,
please do NOT apply this patch to the 4.9 tree. See below, it was tagged for v4.14+ only.
Yes, but I added it to 4.9 because:
commit d006e95b5561f708d0385e9677ffe2c46f2ae345 upstream.
[...]
Fixes: 310d82784fb4 ("parisc: qemu idle sleep support")
This commit is in 4.9.76. So why wouldn't it be valid in 4.9.y (with the exception of the fact that it doesn't build...)?
thanks,
greg k-h
On 16.04.19 16:23, Greg Kroah-Hartman wrote:
On Tue, Apr 16, 2019 at 03:50:16PM +0200, Helge Deller wrote:
Hi Greg,
please do NOT apply this patch to the 4.9 tree. See below, it was tagged for v4.14+ only.
Yes, but I added it to 4.9 because:
Ah, ok.
[...]
Fixes: 310d82784fb4 ("parisc: qemu idle sleep support")
This commit is in 4.9.76. So why wouldn't it be valid in 4.9.y (with the exception of the fact that it doesn't build...)?
In my opinion it was not a critical-enough patch to ask for backporting this and the other dependent patch. Idle-sleep does work without this patch; it's just upcoming qemu support which will not run optimally (but does work).
That said, of course it would be nice to get that backported in 4.9. I *think* if you pull in 5ffa8518851f1401817c15d2a7eecc0373c26ff9 too, it should be ok and fix the build issue.
Helge
On Tue, Apr 16, 2019 at 05:00:15PM +0200, Helge Deller wrote:
On 16.04.19 16:23, Greg Kroah-Hartman wrote:
On Tue, Apr 16, 2019 at 03:50:16PM +0200, Helge Deller wrote:
Hi Greg,
please do NOT apply this patch to the 4.9 tree. See below, it was tagged for v4.14+ only.
Yes, but I added it to 4.9 because:
Ah, ok.
[...]
Fixes: 310d82784fb4 ("parisc: qemu idle sleep support")
This commit is in 4.9.76. So why wouldn't it be valid in 4.9.y (with the exception of the fact that it doesn't build...)?
In my opinion it was not a critical-enough patch to ask for backporting this and the other dependent patch. Idle-sleep does work without this patch; it's just upcoming qemu support which will not run optimally (but does work).
That said, of course it would be nice to get that backported in 4.9. I *think* if you pull in 5ffa8518851f1401817c15d2a7eecc0373c26ff9 too, it should be ok and fix the build issue.
Ok, I've pulled that patch in now as well, let me regenerate the tree and see if that solves the issue or not.
thanks,
greg k-h
From: Arnd Bergmann arnd@arndb.de
commit 6147e136ff5071609b54f18982dea87706288e21 upstream.
clang points out with hundreds of warnings that the bitrev macros have a problem with constant input:
drivers/hwmon/sht15.c:187:11: error: variable '__x' is uninitialized when used within its own initialization [-Werror,-Wuninitialized] u8 crc = bitrev8(data->val_status & 0x0F); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/bitrev.h:102:21: note: expanded from macro 'bitrev8' __constant_bitrev8(__x) : \ ~~~~~~~~~~~~~~~~~~~^~~~ include/linux/bitrev.h:67:11: note: expanded from macro '__constant_bitrev8' u8 __x = x; \ ~~~ ^
Both the bitrev and the __constant_bitrev macros use an internal variable named __x, which goes horribly wrong when passing one to the other.
The obvious fix is to rename one of the variables, so this adds an extra '_'.
It seems we got away with this because
- there are only a few drivers using bitrev macros
- usually there are no constant arguments to those
- when they are constant, they tend to be either 0 or (unsigned)-1 (drivers/isdn/i4l/isdnhdlc.o, drivers/iio/amplifiers/ad8366.c) and give the correct result by pure chance.
In fact, the only driver that I could find that gets different results with this is drivers/net/wan/slic_ds26522.c, which in turn is a driver for fairly rare hardware (adding the maintainer to Cc for testing).
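For the record, a reduced reproduction of the shadowing (not the kernel macros):

#define INNER(x) ({ unsigned int __x = (x); __x << 1; })
#define OUTER(x) ({ unsigned int __x = (x); INNER(__x); })

/* OUTER(3) expands INNER(__x) into:
 *
 *      ({ unsigned int __x = (__x); __x << 1; })
 *
 * where the initializer (__x) now names the inner, just-declared and
 * uninitialized __x instead of the outer one.  Renaming one side --
 * the patch adds an extra underscore, ___x -- removes the
 * self-reference. */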
Link: http://lkml.kernel.org/r/20190322140503.123580-1-arnd@arndb.de Fixes: 556d2f055bf6 ("ARM: 8187/1: add CONFIG_HAVE_ARCH_BITREVERSE to support rbit instruction") Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Nick Desaulniers ndesaulniers@google.com Cc: Zhao Qiang qiang.zhao@nxp.com Cc: Yalin Wang yalin.wang@sonymobile.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/bitrev.h | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-)
--- a/include/linux/bitrev.h +++ b/include/linux/bitrev.h @@ -31,32 +31,32 @@ static inline u32 __bitrev32(u32 x)
#define __constant_bitrev32(x) \ ({ \ - u32 __x = x; \ - __x = (__x >> 16) | (__x << 16); \ - __x = ((__x & (u32)0xFF00FF00UL) >> 8) | ((__x & (u32)0x00FF00FFUL) << 8); \ - __x = ((__x & (u32)0xF0F0F0F0UL) >> 4) | ((__x & (u32)0x0F0F0F0FUL) << 4); \ - __x = ((__x & (u32)0xCCCCCCCCUL) >> 2) | ((__x & (u32)0x33333333UL) << 2); \ - __x = ((__x & (u32)0xAAAAAAAAUL) >> 1) | ((__x & (u32)0x55555555UL) << 1); \ - __x; \ + u32 ___x = x; \ + ___x = (___x >> 16) | (___x << 16); \ + ___x = ((___x & (u32)0xFF00FF00UL) >> 8) | ((___x & (u32)0x00FF00FFUL) << 8); \ + ___x = ((___x & (u32)0xF0F0F0F0UL) >> 4) | ((___x & (u32)0x0F0F0F0FUL) << 4); \ + ___x = ((___x & (u32)0xCCCCCCCCUL) >> 2) | ((___x & (u32)0x33333333UL) << 2); \ + ___x = ((___x & (u32)0xAAAAAAAAUL) >> 1) | ((___x & (u32)0x55555555UL) << 1); \ + ___x; \ })
#define __constant_bitrev16(x) \ ({ \ - u16 __x = x; \ - __x = (__x >> 8) | (__x << 8); \ - __x = ((__x & (u16)0xF0F0U) >> 4) | ((__x & (u16)0x0F0FU) << 4); \ - __x = ((__x & (u16)0xCCCCU) >> 2) | ((__x & (u16)0x3333U) << 2); \ - __x = ((__x & (u16)0xAAAAU) >> 1) | ((__x & (u16)0x5555U) << 1); \ - __x; \ + u16 ___x = x; \ + ___x = (___x >> 8) | (___x << 8); \ + ___x = ((___x & (u16)0xF0F0U) >> 4) | ((___x & (u16)0x0F0FU) << 4); \ + ___x = ((___x & (u16)0xCCCCU) >> 2) | ((___x & (u16)0x3333U) << 2); \ + ___x = ((___x & (u16)0xAAAAU) >> 1) | ((___x & (u16)0x5555U) << 1); \ + ___x; \ })
#define __constant_bitrev8(x) \ ({ \ - u8 __x = x; \ - __x = (__x >> 4) | (__x << 4); \ - __x = ((__x & (u8)0xCCU) >> 2) | ((__x & (u8)0x33U) << 2); \ - __x = ((__x & (u8)0xAAU) >> 1) | ((__x & (u8)0x55U) << 1); \ - __x; \ + u8 ___x = x; \ + ___x = (___x >> 4) | (___x << 4); \ + ___x = ((___x & (u8)0xCCU) >> 2) | ((___x & (u8)0x33U) << 2); \ + ___x = ((___x & (u8)0xAAU) >> 1) | ((___x & (u8)0x55U) << 1); \ + ___x; \ })
#define bitrev32(x) \
From: S.j. Wang shengjiu.wang@nxp.com
commit 0ff4e8c61b794a4bf6c854ab071a1abaaa80f358 upstream.
There is a very low probability (< 0.1%) that a channel swap happens at the beginning of a stream when multiple output/input pins are enabled. The issue is that the hardware can't send data to the correct pin at the beginning with the normal enable flow.
This is a hardware issue, but there is no erratum for it. The workaround flow is: each time playback/recording starts, first clear xSMA/xSMB, then enable TE/RE, then enable xSMB and xSMA (xSMB must be enabled before xSMA). This makes xSMA the trigger-start register, whereas previously the xCR_TE or xCR_RE bit started the stream.
Fixes: 43d24e76b698 ("ASoC: fsl_esai: Add ESAI CPU DAI driver") Cc: stable@vger.kernel.org Reviewed-by: Fabio Estevam festevam@gmail.com Acked-by: Nicolin Chen nicoleotsuka@gmail.com Signed-off-by: Shengjiu Wang shengjiu.wang@nxp.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/soc/fsl/fsl_esai.c | 47 +++++++++++++++++++++++++++++++++++++---------- 1 file changed, 37 insertions(+), 10 deletions(-)
--- a/sound/soc/fsl/fsl_esai.c +++ b/sound/soc/fsl/fsl_esai.c @@ -59,6 +59,8 @@ struct fsl_esai { u32 fifo_depth; u32 slot_width; u32 slots; + u32 tx_mask; + u32 rx_mask; u32 hck_rate[2]; u32 sck_rate[2]; bool hck_dir[2]; @@ -359,21 +361,13 @@ static int fsl_esai_set_dai_tdm_slot(str regmap_update_bits(esai_priv->regmap, REG_ESAI_TCCR, ESAI_xCCR_xDC_MASK, ESAI_xCCR_xDC(slots));
- regmap_update_bits(esai_priv->regmap, REG_ESAI_TSMA, - ESAI_xSMA_xS_MASK, ESAI_xSMA_xS(tx_mask)); - regmap_update_bits(esai_priv->regmap, REG_ESAI_TSMB, - ESAI_xSMB_xS_MASK, ESAI_xSMB_xS(tx_mask)); - regmap_update_bits(esai_priv->regmap, REG_ESAI_RCCR, ESAI_xCCR_xDC_MASK, ESAI_xCCR_xDC(slots));
- regmap_update_bits(esai_priv->regmap, REG_ESAI_RSMA, - ESAI_xSMA_xS_MASK, ESAI_xSMA_xS(rx_mask)); - regmap_update_bits(esai_priv->regmap, REG_ESAI_RSMB, - ESAI_xSMB_xS_MASK, ESAI_xSMB_xS(rx_mask)); - esai_priv->slot_width = slot_width; esai_priv->slots = slots; + esai_priv->tx_mask = tx_mask; + esai_priv->rx_mask = rx_mask;
return 0; } @@ -594,6 +588,7 @@ static int fsl_esai_trigger(struct snd_p bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK; u8 i, channels = substream->runtime->channels; u32 pins = DIV_ROUND_UP(channels, esai_priv->slots); + u32 mask;
switch (cmd) { case SNDRV_PCM_TRIGGER_START: @@ -606,15 +601,38 @@ static int fsl_esai_trigger(struct snd_p for (i = 0; tx && i < channels; i++) regmap_write(esai_priv->regmap, REG_ESAI_ETDR, 0x0);
+ /* + * When set the TE/RE in the end of enablement flow, there + * will be channel swap issue for multi data line case. + * In order to workaround this issue, we switch the bit + * enablement sequence to below sequence + * 1) clear the xSMB & xSMA: which is done in probe and + * stop state. + * 2) set TE/RE + * 3) set xSMB + * 4) set xSMA: xSMA is the last one in this flow, which + * will trigger esai to start. + */ regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), tx ? ESAI_xCR_TE_MASK : ESAI_xCR_RE_MASK, tx ? ESAI_xCR_TE(pins) : ESAI_xCR_RE(pins)); + mask = tx ? esai_priv->tx_mask : esai_priv->rx_mask; + + regmap_update_bits(esai_priv->regmap, REG_ESAI_xSMB(tx), + ESAI_xSMB_xS_MASK, ESAI_xSMB_xS(mask)); + regmap_update_bits(esai_priv->regmap, REG_ESAI_xSMA(tx), + ESAI_xSMA_xS_MASK, ESAI_xSMA_xS(mask)); + break; case SNDRV_PCM_TRIGGER_SUSPEND: case SNDRV_PCM_TRIGGER_STOP: case SNDRV_PCM_TRIGGER_PAUSE_PUSH: regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), tx ? ESAI_xCR_TE_MASK : ESAI_xCR_RE_MASK, 0); + regmap_update_bits(esai_priv->regmap, REG_ESAI_xSMA(tx), + ESAI_xSMA_xS_MASK, 0); + regmap_update_bits(esai_priv->regmap, REG_ESAI_xSMB(tx), + ESAI_xSMB_xS_MASK, 0);
/* Disable and reset FIFO */ regmap_update_bits(esai_priv->regmap, REG_ESAI_xFCR(tx), @@ -904,6 +922,15 @@ static int fsl_esai_probe(struct platfor return ret; }
+ esai_priv->tx_mask = 0xFFFFFFFF; + esai_priv->rx_mask = 0xFFFFFFFF; + + /* Clear the TSMA, TSMB, RSMA, RSMB */ + regmap_write(esai_priv->regmap, REG_ESAI_TSMA, 0); + regmap_write(esai_priv->regmap, REG_ESAI_TSMB, 0); + regmap_write(esai_priv->regmap, REG_ESAI_RSMA, 0); + regmap_write(esai_priv->regmap, REG_ESAI_RSMB, 0); + ret = devm_snd_soc_register_component(&pdev->dev, &fsl_esai_component, &fsl_esai_dai, 1); if (ret) {
From: Filipe Manana fdmanana@suse.com
commit f35f06c35560a86e841631f0243b83a984dc11a9 upstream.
When a filesystem is mounted with the nologreplay mount option, which requires it to be mounted in RO mode as well, we cannot allow discard on free space inside block groups, because log trees refer to extents that are not pinned in a block group's free space cache (pinning the extents is precisely the first phase of replaying a log tree).
So do not allow the fitrim ioctl to do anything when the filesystem is mounted with the nologreplay option, because later it can be mounted RW without that option, which causes log replay to happen and result in either a failure to replay the log trees (leading to a mount failure), a crash or some silent corruption.
Reported-by: Darrick J. Wong darrick.wong@oracle.com Fixes: 96da09192cda ("btrfs: Introduce new mount option to disable tree log replay") CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Nikolay Borisov nborisov@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/ioctl.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -385,6 +385,16 @@ static noinline int btrfs_ioctl_fitrim(s if (!capable(CAP_SYS_ADMIN)) return -EPERM;
+ /* + * If the fs is mounted with nologreplay, which requires it to be + * mounted in RO mode as well, we can not allow discard on free space + * inside block groups, because log trees refer to extents that are not + * pinned in a block group's free space cache (pinning the extents is + * precisely the first phase of replaying a log tree). + */ + if (btrfs_test_opt(fs_info, NOLOGREPLAY)) + return -EROFS; + rcu_read_lock(); list_for_each_entry_rcu(device, &fs_info->fs_devices->devices, dev_list) {
From: Jérôme Glisse jglisse@redhat.com
commit a3761c3c91209b58b6f33bf69dd8bb8ec0c9d925 upstream.
When bio_add_pc_page() fails in bio_copy_user_iov() we should free the page we just allocated; otherwise we are leaking it.
Cc: linux-block@vger.kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: stable@vger.kernel.org Reviewed-by: Chaitanya Kulkarni chaitanya.kulkarni@wdc.com Signed-off-by: Jérôme Glisse jglisse@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- block/bio.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/block/bio.c +++ b/block/bio.c @@ -1214,8 +1214,11 @@ struct bio *bio_copy_user_iov(struct req } }
- if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes) + if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes) { + if (!map_data) + __free_page(page); break; + }
len -= bytes; offset = 0;
From: Stephen Boyd swboyd@chromium.org
commit 325aa19598e410672175ed50982f902d4e3f31c5 upstream.
If a child irqchip calls irq_chip_set_wake_parent() but its parent irqchip has the IRQCHIP_SKIP_SET_WAKE flag set, an error is returned.
This is inconsistent behaviour vs. set_irq_wake_real() which returns 0 when the irqchip has the IRQCHIP_SKIP_SET_WAKE flag set. It doesn't attempt to walk the chain of parents and set irq wake on any chips that don't have the flag set either. If the intent is to call the .irq_set_wake() callback of the parent irqchip, then we expect irqchip implementations to omit the IRQCHIP_SKIP_SET_WAKE flag and implement an .irq_set_wake() function that calls irq_chip_set_wake_parent().
The problem has been observed on a Qualcomm sdm845 device where set wake fails on any GPIO interrupts after applying work-in-progress wakeup irq patches to the GPIO driver. The chain of chips looks like this:
QCOM GPIO -> QCOM PDC (SKIP) -> ARM GIC (SKIP)
The GPIO controller's parent is the QCOM PDC irqchip, which in turn has the ARM GIC as its parent. The QCOM PDC irqchip has the IRQCHIP_SKIP_SET_WAKE flag set, and so does the grandparent ARM GIC.
The GPIO driver doesn't know if the parent needs to set wake or not, so it unconditionally calls irq_chip_set_wake_parent() causing this function to return a failure because the parent irqchip (PDC) doesn't have the .irq_set_wake() callback set. Returning 0 instead makes everything work and irqs from the GPIO controller can be configured for wakeup.
Make it consistent by returning 0 (success) from irq_chip_set_wake_parent() when a parent chip has IRQCHIP_SKIP_SET_WAKE set.
[ tglx: Massaged changelog ]
Fixes: 08b55e2a9208e ("genirq: Add irqchip_set_wake_parent") Signed-off-by: Stephen Boyd swboyd@chromium.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Marc Zyngier marc.zyngier@arm.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-gpio@vger.kernel.org Cc: Lina Iyer ilina@codeaurora.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190325181026.247796-1-swboyd@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/irq/chip.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/kernel/irq/chip.c +++ b/kernel/irq/chip.c @@ -1142,6 +1142,10 @@ int irq_chip_set_vcpu_affinity_parent(st int irq_chip_set_wake_parent(struct irq_data *data, unsigned int on) { data = data->parent_data; + + if (data->chip->flags & IRQCHIP_SKIP_SET_WAKE) + return 0; + if (data->chip->irq_set_wake) return data->chip->irq_set_wake(data, on);
From: Cornelia Huck cohuck@redhat.com
commit cf94db21905333e610e479688add629397a4b384 upstream.
vring_create_virtqueue() allows the caller to specify via the may_reduce_num parameter whether the vring code is allowed to allocate a smaller ring than specified.
However, the split ring allocation code tries to allocate a smaller ring on allocation failure regardless of what the caller specified. This may cause trouble for e.g. virtio-pci in legacy mode, which does not support ring resizing. (The packed ring code does not resize in any case.)
Let's fix this by bailing out immediately in the split ring code if the requested size cannot be allocated and may_reduce_num has not been specified.
While at it, fix a typo in the usage instructions.
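The fixed allocation loop, reduced to a standalone sketch (calloc() standing in for the DMA allocation):

#include <stdbool.h>
#include <stdlib.h>

static void *alloc_ring(unsigned int *num, bool may_reduce_num)
{
        for (; *num; *num /= 2) {
                void *queue = calloc(*num, 16); /* stand-in alloc */

                if (queue)
                        return queue;
                if (!may_reduce_num)
                        return NULL;    /* bail out rather than
                                         * silently shrink the ring */
        }
        return NULL;
}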
Fixes: 2a2d1382fe9d ("virtio: Add improved queue allocation API") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Cornelia Huck cohuck@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Halil Pasic pasic@linux.ibm.com Reviewed-by: Jens Freimann jfreimann@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/virtio/virtio_ring.c | 2 ++ include/linux/virtio_ring.h | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -1040,6 +1040,8 @@ struct virtqueue *vring_create_virtqueue GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO); if (queue) break; + if (!may_reduce_num) + return NULL; }
if (!num) --- a/include/linux/virtio_ring.h +++ b/include/linux/virtio_ring.h @@ -62,7 +62,7 @@ struct virtqueue; /* * Creates a virtqueue and allocates the descriptor ring. If * may_reduce_num is set, then this may allocate a smaller ring than - * expected. The caller should query virtqueue_get_ring_size to learn + * expected. The caller should query virtqueue_get_vring_size to learn * the actual size of the ring. */ struct virtqueue *vring_create_virtqueue(unsigned int index,
From: David Engraf david.engraf@sysgo.com
commit e7dfb6d04e4715be1f3eb2c60d97b753fd2e4516 upstream.
The function argument for the ISC_D0 on PC9 was incorrect. According to the documentation it should be 'C' aka 3.
Signed-off-by: David Engraf david.engraf@sysgo.com Reviewed-by: Nicolas Ferre nicolas.ferre@microchip.com Signed-off-by: Ludovic Desroches ludovic.desroches@microchip.com Fixes: 7f16cb676c00 ("ARM: at91/dt: add sama5d2 pinmux") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/sama5d2-pinfunc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/boot/dts/sama5d2-pinfunc.h +++ b/arch/arm/boot/dts/sama5d2-pinfunc.h @@ -517,7 +517,7 @@ #define PIN_PC9__GPIO PINMUX_PIN(PIN_PC9, 0, 0) #define PIN_PC9__FIQ PINMUX_PIN(PIN_PC9, 1, 3) #define PIN_PC9__GTSUCOMP PINMUX_PIN(PIN_PC9, 2, 1) -#define PIN_PC9__ISC_D0 PINMUX_PIN(PIN_PC9, 2, 1) +#define PIN_PC9__ISC_D0 PINMUX_PIN(PIN_PC9, 3, 1) #define PIN_PC9__TIOA4 PINMUX_PIN(PIN_PC9, 4, 2) #define PIN_PC10 74 #define PIN_PC10__GPIO PINMUX_PIN(PIN_PC10, 0, 0)
From: Will Deacon will.deacon@arm.com
commit 045afc24124d80c6998d9c770844c67912083506 upstream.
Rather embarrassingly, our futex() FUTEX_WAKE_OP implementation doesn't explicitly set the return value on the non-faulting path and instead leaves it holding the result of the underlying atomic operation. This means that any FUTEX_WAKE_OP atomic operation which computes a non-zero value will be reported as having failed. Regrettably, I wrote the buggy code back in 2011 and it was upstreamed as part of the initial arm64 support in 2012.
The reasons we appear to get away with this are:
1. FUTEX_WAKE_OP is rarely used and therefore doesn't appear to get exercised by futex() test applications
2. If the result of the atomic operation is zero, the system call behaves correctly
3. Prior to version 2.25, the only operation used by GLIBC set the futex to zero, and therefore worked as expected. From 2.25 onwards, FUTEX_WAKE_OP is not used by GLIBC at all.
Fix the implementation by ensuring that the return value is either 0 to indicate that the atomic operation completed successfully, or -EFAULT if we encountered a fault when accessing the user mapping.
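Stripped of the assembly and the store-exclusive retry loop, the bug class looks like this in plain C (a sketch, not the arm64 code):

/* Before (buggy shape): ret doubles as the scratch register for the
 * atomic computation, so any non-zero computed value is reported to
 * the caller as a failure. */
static int futex_add_buggy(int *uaddr, int oparg, int *oldval)
{
        int ret;

        *oldval = *uaddr;
        ret = *oldval + oparg;  /* computed value leaks out as status */
        *uaddr = ret;
        return ret;             /* should have been 0 on success */
}

/* After: the computed value and the status travel separately. */
static int futex_add_fixed(int *uaddr, int oparg, int *oldval)
{
        int tmp;

        *oldval = *uaddr;
        tmp = *oldval + oparg;
        *uaddr = tmp;
        return 0;               /* or -EFAULT on a faulting access */
}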
Cc: stable@kernel.org Fixes: 6170a97460db ("arm64: Atomic operations") Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/include/asm/futex.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/arch/arm64/include/asm/futex.h +++ b/arch/arm64/include/asm/futex.h @@ -33,8 +33,8 @@ " prfm pstl1strm, %2\n" \ "1: ldxr %w1, %2\n" \ insn "\n" \ -"2: stlxr %w3, %w0, %2\n" \ -" cbnz %w3, 1b\n" \ +"2: stlxr %w0, %w3, %2\n" \ +" cbnz %w0, 1b\n" \ " dmb ish\n" \ "3:\n" \ " .pushsection .fixup,"ax"\n" \ @@ -53,29 +53,29 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr) { - int oldval = 0, ret, tmp; + int oldval, ret, tmp;
pagefault_disable();
switch (op) { case FUTEX_OP_SET: - __futex_atomic_op("mov %w0, %w4", + __futex_atomic_op("mov %w3, %w4", ret, oldval, uaddr, tmp, oparg); break; case FUTEX_OP_ADD: - __futex_atomic_op("add %w0, %w1, %w4", + __futex_atomic_op("add %w3, %w1, %w4", ret, oldval, uaddr, tmp, oparg); break; case FUTEX_OP_OR: - __futex_atomic_op("orr %w0, %w1, %w4", + __futex_atomic_op("orr %w3, %w1, %w4", ret, oldval, uaddr, tmp, oparg); break; case FUTEX_OP_ANDN: - __futex_atomic_op("and %w0, %w1, %w4", + __futex_atomic_op("and %w3, %w1, %w4", ret, oldval, uaddr, tmp, ~oparg); break; case FUTEX_OP_XOR: - __futex_atomic_op("eor %w0, %w1, %w4", + __futex_atomic_op("eor %w3, %w1, %w4", ret, oldval, uaddr, tmp, oparg); break; default:
On Mon, Apr 15, 2019 at 08:44:36PM +0200, Greg Kroah-Hartman wrote:
From: Will Deacon will.deacon@arm.com
commit 045afc24124d80c6998d9c770844c67912083506 upstream.
[...]
This causes a (false) build warning with AOSP's GCC 4.9.4 (which is used to build nearly all arm64 Android kernels before 4.14):
  CC      kernel/futex.o
../kernel/futex.c: In function 'do_futex':
../kernel/futex.c:1492:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized]
   return oldval == cmparg;
          ^
In file included from ../kernel/futex.c:69:0:
../arch/arm64/include/asm/futex.h:56:6: note: 'oldval' was declared here
  int oldval, ret, tmp;
      ^
The only reason I bring this up is that Qualcomm-based kernels have a Python script that emulates -Werror, meaning this will be fatal for a large number of kernels when this eventually gets merged into them.
Nathan
On Mon, Apr 15, 2019 at 03:01:51PM -0700, Nathan Chancellor wrote:
On Mon, Apr 15, 2019 at 08:44:36PM +0200, Greg Kroah-Hartman wrote:
[...]
This causes a (false) build warning with AOSP's GCC 4.9.4 (which is used to build nearly all arm64 Android kernels before 4.14):
  CC      kernel/futex.o
../kernel/futex.c: In function 'do_futex':
../kernel/futex.c:1492:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized]
   return oldval == cmparg;
          ^
In file included from ../kernel/futex.c:69:0:
../arch/arm64/include/asm/futex.h:56:6: note: 'oldval' was declared here
  int oldval, ret, tmp;
      ^
The only reason I bring this up is that Qualcomm-based kernels have a Python script that emulates -Werror, meaning this will be fatal for a large number of kernels when this eventually gets merged into them.
Argh, really? That's a buggy compiler that you have there, as oldval will be set correctly if all is good, and if not, ret will be set and the code will error out.
Working around broken compilers is not something I really like doing :(
That being said, does this also show up in the 4.19.y and 5.0.y tree right now? If not, why not?
thanks,
greg k-h
On Tue, Apr 16, 2019 at 11:00:52AM +0200, Greg Kroah-Hartman wrote:
On Mon, Apr 15, 2019 at 03:01:51PM -0700, Nathan Chancellor wrote:
On Mon, Apr 15, 2019 at 08:44:36PM +0200, Greg Kroah-Hartman wrote:
[...]
This causes a (false) build warning with AOSP's GCC 4.9.4 (which is used to build nearly all arm64 Android kernels before 4.14):
  CC      kernel/futex.o
../kernel/futex.c: In function 'do_futex':
../kernel/futex.c:1492:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized]
   return oldval == cmparg;
          ^
In file included from ../kernel/futex.c:69:0:
../arch/arm64/include/asm/futex.h:56:6: note: 'oldval' was declared here
  int oldval, ret, tmp;
      ^
The only reason I bring this up is that Qualcomm-based kernels have a Python script that emulates -Werror, meaning this will be fatal for a large number of kernels when this eventually gets merged into them.
Argh, really? That's a buggy compiler that you have there, as oldval will be set correctly if all is good, and if not, ret will be set and the code will error out.
Correct.
Working around broken compilers is not something I really like doing :(
Indeed, I wouldn't have brought it up if it wasn't the compiler for all Android 4.9 kernels aside from the Pixel 3 (XL).
That being said, does this also show up in the 4.19.y and 5.0.y tree right now? If not, why not?
It does.
$ make ARCH=arm64 CROSS_COMPILE=<path>/bin/aarch64-linux-gnu- defconfig kernel/futex.o
thanks,
greg k-h
On Tue, Apr 16, 2019 at 09:47:51AM -0700, Nathan Chancellor wrote:
On Tue, Apr 16, 2019 at 11:00:52AM +0200, Greg Kroah-Hartman wrote:
On Mon, Apr 15, 2019 at 03:01:51PM -0700, Nathan Chancellor wrote:
On Mon, Apr 15, 2019 at 08:44:36PM +0200, Greg Kroah-Hartman wrote:
[...]
Argh, really? That's a buggy compiler that you have there, as oldval will be set correctly if all is good, and if not, ret will be set and the code will error out.
Correct.
Working around broken compilers is not something I really like doing :(
Indeed, I wouldn't have brought it up if it wasn't the compiler for all Android 4.9 kernels aside from the Pixel 3 (XL).
That being said, does this also show up in the 4.19.y and 5.0.y tree right now? If not, why not?
It does.
$ make ARCH=arm64 CROSS_COMPILE=<path>/bin/aarch64-linux-gnu- defconfig kernel/futex.o
Great, so it seems this needs to be fixed in Linus's tree first, before I can backport it everywhere.
Want me to send a patch for this or can you?
thanks,
greg k-h
On Wed, Apr 17, 2019 at 08:15:08AM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 16, 2019 at 09:47:51AM -0700, Nathan Chancellor wrote:
On Tue, Apr 16, 2019 at 11:00:52AM +0200, Greg Kroah-Hartman wrote:
On Mon, Apr 15, 2019 at 03:01:51PM -0700, Nathan Chancellor wrote:
On Mon, Apr 15, 2019 at 08:44:36PM +0200, Greg Kroah-Hartman wrote:
From: Will Deacon will.deacon@arm.com
commit 045afc24124d80c6998d9c770844c67912083506 upstream.
Rather embarrassingly, our futex() FUTEX_WAKE_OP implementation doesn't explicitly set the return value on the non-faulting path and instead leaves it holding the result of the underlying atomic operation. This means that any FUTEX_WAKE_OP atomic operation which computes a non-zero value will be reported as having failed. Regrettably, I wrote the buggy code back in 2011 and it was upstreamed as part of the initial arm64 support in 2012.
The reasons we appear to get away with this are:
FUTEX_WAKE_OP is rarely used and therefore doesn't appear to get exercised by futex() test applications
If the result of the atomic operation is zero, the system call behaves correctly
Prior to version 2.25, the only operation used by GLIBC set the futex to zero, and therefore worked as expected. From 2.25 onwards, FUTEX_WAKE_OP is not used by GLIBC at all.
Fix the implementation by ensuring that the return value is either 0 to indicate that the atomic operation completed successfully, or -EFAULT if we encountered a fault when accessing the user mapping.
Cc: stable@kernel.org Fixes: 6170a97460db ("arm64: Atomic operations") Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
arch/arm64/include/asm/futex.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/arch/arm64/include/asm/futex.h +++ b/arch/arm64/include/asm/futex.h @@ -33,8 +33,8 @@ " prfm pstl1strm, %2\n" \ "1: ldxr %w1, %2\n" \ insn "\n" \ -"2: stlxr %w3, %w0, %2\n" \ -" cbnz %w3, 1b\n" \ +"2: stlxr %w0, %w3, %2\n" \ +" cbnz %w0, 1b\n" \ " dmb ish\n" \ "3:\n" \ " .pushsection .fixup,"ax"\n" \ @@ -53,29 +53,29 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr) {
- int oldval = 0, ret, tmp;
- int oldval, ret, tmp;
pagefault_disable(); switch (op) { case FUTEX_OP_SET:
__futex_atomic_op("mov %w0, %w4",
break; case FUTEX_OP_ADD:__futex_atomic_op("mov %w3, %w4", ret, oldval, uaddr, tmp, oparg);
__futex_atomic_op("add %w0, %w1, %w4",
break; case FUTEX_OP_OR:__futex_atomic_op("add %w3, %w1, %w4", ret, oldval, uaddr, tmp, oparg);
__futex_atomic_op("orr %w0, %w1, %w4",
break; case FUTEX_OP_ANDN:__futex_atomic_op("orr %w3, %w1, %w4", ret, oldval, uaddr, tmp, oparg);
__futex_atomic_op("and %w0, %w1, %w4",
break; case FUTEX_OP_XOR:__futex_atomic_op("and %w3, %w1, %w4", ret, oldval, uaddr, tmp, ~oparg);
__futex_atomic_op("eor %w0, %w1, %w4",
break; default:__futex_atomic_op("eor %w3, %w1, %w4", ret, oldval, uaddr, tmp, oparg);
This causes a (false) build warning with AOSP's GCC 4.9.4 (which is used to build nearly all arm64 Android kernels before 4.14):
CC kernel/futex.o ../kernel/futex.c: In function 'do_futex': ../kernel/futex.c:1492:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized] return oldval == cmparg; ^ In file included from ../kernel/futex.c:69:0: ../arch/arm64/include/asm/futex.h:56:6: note: 'oldval' was declared here int oldval, ret, tmp; ^
The only reason I bring this up is that Qualcomm-based kernels have a Python script that emulates -Werror, meaning this warning will be fatal for a large number of kernels when this change eventually gets merged into them.
Argh, really? That's a buggy compiler that you have there, as oldval will be set correctly if all is good, and if not, ret will be set and the code will error out.
Correct.
Working around broken compilers is not something I really like doing :(
Indeed, I wouldn't have brought it up if it wasn't the compiler for all Android 4.9 kernels aside from the Pixel 3 (XL).
That being said, does this also show up in the 4.19.y and 5.0.y tree right now? If not, why not?
It does.
$ make ARCH=arm64 CROSS_COMPILE=<path>/bin/aarch64-linux-gnu- defconfig kernel/futex.o
Great, so it seems this needs to be fixed in Linus's tree first, before I can backport it everywhere.
Well, is it worth working around this in Linus's tree? I know you hate taking patches just for stable but this compiler won't be used on 4.14+ according to [1] and support for it is planned to be discontinued in less than a year [2]. This warning doesn't happen with Clang or newer versions of GCC (I tested 6.3 in a Debian Docker image, which seems to be the oldest I can find). I suppose there could be other buggy/ancient compilers to work around...
Want me to send a patch for this or can you?
I am happy to send a patch regardless of where it goes, just want to be sure we are all on the same page.
[1]: https://android.googlesource.com/platform/test/vts-testcase/kernel/+/e1622ae... [2]: https://android.googlesource.com/platform/prebuilts/clang/host/linux-x86/+/a...
Thanks, Nathan
On Tue, Apr 16, 2019 at 11:41:53PM -0700, Nathan Chancellor wrote:
On Wed, Apr 17, 2019 at 08:15:08AM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 16, 2019 at 09:47:51AM -0700, Nathan Chancellor wrote:
On Tue, Apr 16, 2019 at 11:00:52AM +0200, Greg Kroah-Hartman wrote:
On Mon, Apr 15, 2019 at 03:01:51PM -0700, Nathan Chancellor wrote:
On Mon, Apr 15, 2019 at 08:44:36PM +0200, Greg Kroah-Hartman wrote:
[...]
Well, is it worth working around this in Linus's tree? I know you hate taking patches just for stable but this compiler won't be used on 4.14+ according to [1] and support for it is planned to be discontinued in less than a year [2]. This warning doesn't happen with Clang or newer versions of GCC (I tested 6.3 in a Debian Docker image, which seems to be the oldest I can find). I suppose there could be other buggy/ancient compilers to work around...
Yes, it is worth it. I do not take fixes that are not in Linus's tree; otherwise we will have major problems over time.
As this is a valid compiler version to be using for 5.0+, it should be fixed upstream first.
thanks,
greg k-h
On Mon, Apr 15, 2019 at 03:01:51PM -0700, Nathan Chancellor wrote:
On Mon, Apr 15, 2019 at 08:44:36PM +0200, Greg Kroah-Hartman wrote:
From: Will Deacon will.deacon@arm.com
[...]
This causes a (false) build warning with AOSP's GCC 4.9.4 (which is used to build nearly all arm64 Android kernels before 4.14):
  CC      kernel/futex.o
../kernel/futex.c: In function 'do_futex':
../kernel/futex.c:1492:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized]
   return oldval == cmparg;
                 ^
In file included from ../kernel/futex.c:69:0:
../arch/arm64/include/asm/futex.h:56:6: note: 'oldval' was declared here
  int oldval, ret, tmp;
      ^
The only reason I bring this up is that Qualcomm-based kernels have a Python script that emulates -Werror, meaning this warning will be fatal for a large number of kernels when this change eventually gets merged into them.
Thanks. Does restoring the initial assignment of 0 suppress the bogus warning? If so, please could you send a patch on top for stable (assuming Greg is ok with the simple change for this)?
Will
On Tue, Apr 16, 2019 at 10:13:40AM +0100, Will Deacon wrote:
On Mon, Apr 15, 2019 at 03:01:51PM -0700, Nathan Chancellor wrote:
On Mon, Apr 15, 2019 at 08:44:36PM +0200, Greg Kroah-Hartman wrote:
From: Will Deacon will.deacon@arm.com
[...]
Thanks. Does restoring the initial assignment of 0 suppress the bogus warning? If so, please could you send a patch on top for stable (assuming Greg is ok with the simple change for this)?
Will
Yes, it does and I sure can. Greg, let me know if that is okay.
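For reference, the follow-up being discussed would be a one-liner on top of the backported fix, along these lines (a sketch of the idea, not the patch as actually submitted):

--- a/arch/arm64/include/asm/futex.h
+++ b/arch/arm64/include/asm/futex.h
@@ -53,7 +53,7 @@ static inline int
 arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
 {
-	int oldval, ret, tmp;
+	int oldval = 0, ret, tmp;

The initializer is dead code on a correct compiler (oldval is only consumed when ret is 0, in which case the asm has written it), but it is enough to silence the GCC 4.9 -Wmaybe-uninitialized false positive.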
From: Dan Carpenter dan.carpenter@oracle.com
commit 42d8644bd77dd2d747e004e367cb0c895a606f39 upstream.
The "call" variable comes from the user in privcmd_ioctl_hypercall(). It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32) elements. We need to put an upper bound on it to prevent an out of bounds access.
Cc: stable@vger.kernel.org
Fixes: 1246ae0bb992 ("xen: add variable hypercall caller")
Signed-off-by: Dan Carpenter dan.carpenter@oracle.com
Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com
Signed-off-by: Juergen Gross jgross@suse.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/xen/hypercall.h | 3 +++
 1 file changed, 3 insertions(+)

--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -216,6 +216,9 @@ privcmd_call(unsigned call,
 	__HYPERCALL_DECLS;
 	__HYPERCALL_5ARG(a1, a2, a3, a4, a5);

+	if (call >= PAGE_SIZE / sizeof(hypercall_page[0]))
+		return -EINVAL;
+
 	stac();
 	asm volatile(CALL_NOSPEC
 		     : __HYPERCALL_5PARAM
From: Mel Gorman mgorman@techsingularity.net
commit 0e9f02450da07fc7b1346c8c32c771555173e397 upstream.
A NULL pointer dereference bug was reported on a distribution kernel, but the same issue should be present in the mainline kernel. It occurred on s390 but should not be arch-specific. A partial oops looks like:
Unable to handle kernel pointer dereference in virtual kernel address space
...
Call Trace:
...
 try_to_wake_up+0xfc/0x450
 vhost_poll_wakeup+0x3a/0x50 [vhost]
 __wake_up_common+0xbc/0x178
 __wake_up_common_lock+0x9e/0x160
 __wake_up_sync_key+0x4e/0x60
 sock_def_readable+0x5e/0x98
The bug hits any time between 1 hour and 3 days. The dereference occurs in update_cfs_rq_h_load when accumulating h_load. The problem is that cfs_rq->h_load_next is not protected by any locking and can be updated by parallel calls to task_h_load. Depending on the compiler, code may be generated that re-reads cfs_rq->h_load_next after the check for NULL and then oopses when reading se->avg.load_avg. The disassembly showed that it was possible to re-read h_load_next after the check for NULL.
While this does not appear to be an issue for later compilers, it is still only by accident that correct code is generated. Full locking in this path would have high overhead, so this patch uses READ_ONCE to read h_load_next only once and checks for NULL before dereferencing. It was confirmed that there were no further oopses after 10 days of testing.
As Peter pointed out, it is also necessary to use WRITE_ONCE() to avoid any potential problems with store tearing.
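A reduced illustration of the pattern (a standalone sketch, not the scheduler code): without READ_ONCE() the compiler is free to re-load the shared pointer between the NULL check and the dereference, so the two reads can see different values.

/* READ_ONCE/WRITE_ONCE boil down to volatile accesses, which force exactly
 * one load or store where a plain access could be duplicated or torn. */
#define READ_ONCE(x)       (*(const volatile typeof(x) *)&(x))
#define WRITE_ONCE(x, val) (*((volatile typeof(x) *)&(x)) = (val))

struct node { struct node *next; unsigned long load; };

/* Racy: head->next may be re-read after the NULL check; if another CPU
 * does WRITE_ONCE(head->next, NULL) in between, the second read can
 * return NULL and the dereference oopses. */
unsigned long walk_racy(struct node *head)
{
	unsigned long sum = 0;
	while (head->next != NULL) {		/* load #1 */
		sum += head->next->load;	/* load #2: may differ */
		head = head->next;		/* load #3 */
	}
	return sum;
}

/* Safe: the pointer is loaded once per iteration, so the value that was
 * NULL-checked is exactly the value that gets dereferenced. */
unsigned long walk_once(struct node *head)
{
	struct node *n;
	unsigned long sum = 0;

	while ((n = READ_ONCE(head->next)) != NULL) {
		sum += n->load;
		head = n;
	}
	return sum;
}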
Signed-off-by: Mel Gorman mgorman@techsingularity.net
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Reviewed-by: Valentin Schneider valentin.schneider@arm.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Mike Galbraith efault@gmx.de
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: stable@vger.kernel.org
Fixes: 685207963be9 ("sched: Move h_load calculation to task_h_load()")
Link: https://lkml.kernel.org/r/20190319123610.nsivgf3mjbjjesxb@techsingularity.ne...
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/sched/fair.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6634,10 +6634,10 @@ static void update_cfs_rq_h_load(struct
 	if (cfs_rq->last_h_load_update == now)
 		return;

-	cfs_rq->h_load_next = NULL;
+	WRITE_ONCE(cfs_rq->h_load_next, NULL);
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
-		cfs_rq->h_load_next = se;
+		WRITE_ONCE(cfs_rq->h_load_next, se);
 		if (cfs_rq->last_h_load_update == now)
 			break;
 	}
@@ -6647,7 +6647,7 @@ static void update_cfs_rq_h_load(struct
 		cfs_rq->last_h_load_update = now;
 	}

-	while ((se = cfs_rq->h_load_next) != NULL) {
+	while ((se = READ_ONCE(cfs_rq->h_load_next)) != NULL) {
 		load = cfs_rq->h_load;
 		load = div64_ul(load * se->avg.load_avg,
 				cfs_rq_load_avg(cfs_rq) + 1);
From: Max Filippov jcmvbkbc@gmail.com
commit ada770b1e74a77fff2d5f539bf6c42c25f4784db upstream.
return_address returns the address that is one level higher in the call stack than requested in its argument, because level 0 corresponds to its caller's return address. Use the requested level as the number of stack frames to skip.
This fixes the address reported by might_sleep and friends.
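A hypothetical caller illustrates the off-by-one (a sketch, not from the patch):

/* foo() asks for level 0, i.e. its own return address. Before the fix,
 * xtensa's return_address() skipped one frame too many and reported
 * where bar() returns to instead of where foo() returns to. */
unsigned long return_address(unsigned level);	/* xtensa prototype */

void foo(void)
{
	unsigned long addr = return_address(0);
	/* With the fix: addr points into bar(), just after the call to
	 * foo(), which is what might_sleep and friends expect to report. */
	(void)addr;
}

void bar(void)
{
	foo();
}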
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov jcmvbkbc@gmail.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/xtensa/kernel/stacktrace.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/arch/xtensa/kernel/stacktrace.c
+++ b/arch/xtensa/kernel/stacktrace.c
@@ -272,10 +272,14 @@ static int return_address_cb(struct stac
 	return 1;
 }

+/*
+ * level == 0 is for the return address from the caller of this function,
+ * not from this function itself.
+ */
 unsigned long return_address(unsigned level)
 {
 	struct return_addr_data r = {
-		.skip = level + 1,
+		.skip = level,
 	};
 	walk_stackframe(stack_pointer(NULL), return_address_cb, &r);
 	return r.addr;
From: Andre Przywara andre.przywara@arm.com
commit 9cde402a59770a0669d895399c13407f63d7d209 upstream.
There is a Marvell 88SE9170 PCIe SATA controller I found on a board here. Some quick testing with the ARM SMMU enabled reveals that it suffers from the same requester ID mixup problems as the other Marvell chips listed already.
Add the PCI vendor/device ID to the list of chips which need the workaround.
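For context, the quirk the new ID is attached to is tiny; roughly along these lines (a simplified rendering from memory, assumed to match the 4.9-era helper rather than quoted verbatim):

static void quirk_dma_func1_alias(struct pci_dev *dev)
{
	/* DMA from these chips arrives tagged with function 1's requester
	 * ID, so register that function as an alias for IOMMU and
	 * interrupt-remapping lookups. */
	if (PCI_FUNC(dev->devfn) != 1)
		pci_add_dma_alias(dev, PCI_DEVFN(PCI_SLOT(dev->devfn), 1));
}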
Signed-off-by: Andre Przywara andre.przywara@arm.com
Signed-off-by: Bjorn Helgaas bhelgaas@google.com
CC: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/quirks.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3866,6 +3866,8 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_M
 /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c14 */
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9130,
 			 quirk_dma_func1_alias);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9170,
+			 quirk_dma_func1_alias);
 /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c47 + c57 */
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9172,
 			 quirk_dma_func1_alias);
stable-rc/linux-4.9.y boot: 92 boots: 0 failed, 83 passed with 9 offline (v4.9.168-77-ga5905936a4b8)
Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.9.y/kernel/v4.9.1...
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.9.y/kernel/v4.9.168-77-g...
Tree: stable-rc
Branch: linux-4.9.y
Git Describe: v4.9.168-77-ga5905936a4b8
Git Commit: a5905936a4b81c2c035f5abf34914d802573b0e3
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 49 unique boards, 22 SoC families, 15 builds out of 197
Offline Platforms:
arm:

    qcom_defconfig:
        gcc-7
            qcom-apq8064-cm-qs600: 1 offline lab
            qcom-apq8064-ifc6410: 1 offline lab

    tegra_defconfig:
        gcc-7
            tegra20-iris-512: 1 offline lab

    multi_v7_defconfig:
        gcc-7
            qcom-apq8064-cm-qs600: 1 offline lab
            qcom-apq8064-ifc6410: 1 offline lab
            sun5i-r8-chip: 1 offline lab
            tegra20-iris-512: 1 offline lab

    sunxi_defconfig:
        gcc-7
            sun5i-r8-chip: 1 offline lab

arm64:

    defconfig:
        gcc-7
            apq8016-sbc: 1 offline lab
---
For more info write to info@kernelci.org
On Tue, 16 Apr 2019 at 00:15, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
[...]
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary
------------------------------------------------------------------------
kernel: 4.9.169-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.9.y
git commit: a5905936a4b81c2c035f5abf34914d802573b0e3
git describe: v4.9.168-77-ga5905936a4b8
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.168-77-...
No regressions (compared to build v4.9.168)
No fixes (compared to build v4.9.168)
Ran 22703 total tests in the following environments and test suites.
Environments
--------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64
Test Suites
-----------
* boot
* install-android-platform-tools-r2600
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* perf
* spectre-meltdown-checker-test
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
On 15/04/2019 19:43, Greg Kroah-Hartman wrote:
[...]
All tests are passing for Tegra ...
Test results for stable-v4.9:
    8 builds:	8 pass, 0 fail
    16 boots:	16 pass, 0 fail
    22 tests:	22 pass, 0 fail

Linux version:	4.9.169-rc1-ga590593
Boards tested:	tegra124-jetson-tk1, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Cheers
Jon
On 4/15/19 11:43 AM, Greg Kroah-Hartman wrote:
[...]
Building parisc:defconfig ... failed
--------------
Error log:
arch/parisc/kernel/setup.c: In function 'start_parisc':
arch/parisc/kernel/setup.c:407:2: error: 'running_on_qemu' undeclared
This affects all parisc builds.
Final results follow later.
Guenter
On Mon, Apr 15, 2019 at 08:43:24PM +0200, Greg Kroah-Hartman wrote:
[...]
Build results:
	total: 172 pass: 165 fail: 7
	Failed builds:
		<all parisc>
Qemu test results:
	total: 320 pass: 304 fail: 16
	Failed tests:
		<all parisc>
Build failures as already reported.
Thanks, Guenter
On Tue, Apr 16, 2019 at 09:29:35AM -0700, Guenter Roeck wrote:
On Mon, Apr 15, 2019 at 08:43:24PM +0200, Greg Kroah-Hartman wrote:
[...]
Build failures as already reported.
I should have those build failures fixed.
thanks,
greg k-h
On Tue, Apr 16, 2019 at 08:46:00PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 16, 2019 at 09:29:35AM -0700, Guenter Roeck wrote:
On Mon, Apr 15, 2019 at 08:43:24PM +0200, Greg Kroah-Hartman wrote:
[...]
I should have those build failures fixed.
I didn't rebuild the other images, but parisc build and boot tests are now fine (with v4.9.168-78-g6ecae2ce7b5a).
Guenter
On Tue, Apr 16, 2019 at 01:19:13PM -0700, Guenter Roeck wrote:
On Tue, Apr 16, 2019 at 08:46:00PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 16, 2019 at 09:29:35AM -0700, Guenter Roeck wrote:
On Mon, Apr 15, 2019 at 08:43:24PM +0200, Greg Kroah-Hartman wrote:
[...]
I should have those build failures fixed.
I didn't rebuild the other images, but parisc build and boot tests are now fine (with v4.9.168-78-g6ecae2ce7b5a).
Wonderful, thanks for testing this, and all of the others.
greg k-h
On 4/15/19 12:43 PM, Greg Kroah-Hartman wrote:
[...]
Compiled and booted on my test system. No dmesg regressions.
thanks,

-- Shuah