This is the start of the stable review cycle for the 4.14.112 release. There are 69 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Apr 17 18:36:38 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.112-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.112-rc1
Tomohiro Mayama parly-gh@iris.mystia.org arm64: dts: rockchip: Fix vcc_host1_5v GPIO polarity on rk3328-rock64
Katsuhiro Suzuki katsuhiro@katsuster.net arm64: dts: rockchip: fix vcc_host1_5v pin assign on rk3328-rock64
Ilya Dryomov idryomov@gmail.com dm table: propagate BDI_CAP_STABLE_WRITES to fix sporadic checksum errors
Andre Przywara andre.przywara@arm.com PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller
Lendacky, Thomas Thomas.Lendacky@amd.com x86/perf/amd: Remove need to check "running" bit in NMI handler
Lendacky, Thomas Thomas.Lendacky@amd.com x86/perf/amd: Resolve NMI latency issues for active PMCs
Lendacky, Thomas Thomas.Lendacky@amd.com x86/perf/amd: Resolve race condition when disabling PMC
Max Filippov jcmvbkbc@gmail.com xtensa: fix return_address
Mel Gorman mgorman@techsingularity.net sched/fair: Do not re-read ->h_load_next during hierarchical load calculation
Dan Carpenter dan.carpenter@oracle.com xen: Prevent buffer overflow in privcmd ioctl
Will Deacon will.deacon@arm.com arm64: backtrace: Don't bother trying to unwind the userspace stack
Peter Geis pgwipeout@gmail.com arm64: dts: rockchip: fix rk3328 rgmii high tx error rate
Will Deacon will.deacon@arm.com arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
David Engraf david.engraf@sysgo.com ARM: dts: at91: Fix typo in ISC_D0 on PC9
Peter Ujfalusi peter.ujfalusi@ti.com ARM: dts: am335x-evm: Correct the regulators for the audio codec
Peter Ujfalusi peter.ujfalusi@ti.com ARM: dts: am335x-evmsk: Correct the regulators for the audio codec
Cornelia Huck cohuck@redhat.com virtio: Honour 'may_reduce_num' in vring_create_virtqueue
Kefeng Wang wangkefeng.wang@huawei.com genirq: Initialize request_mutex if CONFIG_SPARSE_IRQ=n
Stephen Boyd swboyd@chromium.org genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent()
Jason Yan yanaijie@huawei.com block: fix the return errno for direct IO
Jérôme Glisse jglisse@redhat.com block: do not leak memory in bio_copy_user_iov()
Anand Jain anand.jain@oracle.com btrfs: prop: fix vanished compression property after failed set
Anand Jain anand.jain@oracle.com btrfs: prop: fix zstd compression parameter validation
Filipe Manana fdmanana@suse.com Btrfs: do not allow trimming when a fs is mounted with the nologreplay option
S.j. Wang shengjiu.wang@nxp.com ASoC: fsl_esai: fix channel swap issue when stream starts
Arnd Bergmann arnd@arndb.de include/linux/bitrev.h: fix constant bitrev
Dave Airlie airlied@redhat.com drm/udl: add a release method and delay modeset teardown
Andrei Vagin avagin@gmail.com alarmtimer: Return correct remaining time
Sven Schnelle svens@stackframe.org parisc: regs_return_value() should return gpr28
Helge Deller deller@gmx.de parisc: Detect QEMU earlier in boot process
Peter Geis pgwipeout@gmail.com arm64: dts: rockchip: fix rk3328 sdmmc0 write errors
Haiyang Zhang haiyangz@microsoft.com hv_netvsc: Fix unwanted wakeup after tx_disable
Sheena Mira-ato sheena.mira-ato@alliedtelesis.co.nz ip6_tunnel: Match to ARPHRD_TUNNEL6 for dev type
Zubin Mithra zsm@chromium.org ALSA: seq: Fix OOB-reads from strlcpy
Li RongQing lirongqing@baidu.com net: ethtool: not call vzalloc for zero sized memory request
Eric Dumazet edumazet@google.com netns: provide pure entropy for net_hash_mix()
Davide Caratti dcaratti@redhat.com net/sched: act_sample: fix divide by zero in the traffic path
Michael Chan michael.chan@broadcom.com bnxt_en: Reset device on RX buffer errors.
Michael Chan michael.chan@broadcom.com bnxt_en: Improve RX consumer index validity check.
Jakub Kicinski jakub.kicinski@netronome.com nfp: validate the return code from dev_queue_xmit()
Yuval Avnery yuvalav@mellanox.com net/mlx5e: Add a lock on tir list
Gavi Teitz gavi@mellanox.com net/mlx5e: Fix error handling when refreshing TIRs
Stephen Suryaputra ssuryaextr@gmail.com vrf: check accept_source_route on the original netdevice
Koen De Schepper koen.de_schepper@nokia-bell-labs.com tcp: Ensure DCTCP reacts to losses
Xin Long lucien.xin@gmail.com sctp: initialize _pad of sockaddr_in before copying to user memory
Bjørn Mork bjorn@mork.no qmi_wwan: add Olicard 600
Andrea Righi andrea.righi@canonical.com openvswitch: fix flow actions reallocation
Nicolas Dichtel nicolas.dichtel@6wind.com net/sched: fix ->get helper of the matchall cls
Mao Wenan maowenan@huawei.com net: rds: force to destroy connection if t_sock is NULL in rds_tcp_kill_sock().
Artemy Kovalyov artemyko@mellanox.com net/mlx5: Decrease default mr cache size
Steffen Klassert steffen.klassert@secunet.com net-gro: Fix GRO flush when receiving a GSO packet.
Jiri Slaby jslaby@suse.cz kcm: switch order of device registration to fix a crash
Lorenzo Bianconi lorenzo.bianconi@redhat.com ipv6: sit: reset ip header pointer in ipip6_rcv
Junwei Hu hujunwei4@huawei.com ipv6: Fix dangling pointer when ipv6 fragment
Greg Kroah-Hartman gregkh@linuxfoundation.org tty: ldisc: add sysctl to prevent autoloading of ldiscs
Greg Kroah-Hartman gregkh@linuxfoundation.org tty: mark Siemens R3964 line discipline as BROKEN
Yueyi Li liyueyi@live.com arm64: kaslr: Reserve size of ARM64_MEMSTART_ALIGN in linear region
Gilad Ben-Yossef gilad@benyossef.com stating: ccree: revert "staging: ccree: fix leak of import() after init()"
Nick Desaulniers ndesaulniers@google.com lib/string.c: implement a basic bcmp
Nick Desaulniers ndesaulniers@google.com x86/vdso: Drop implicit common-page-size linker flag
Alistair Strachan astrachan@google.com x86: vdso: Use $LD instead of $CC to link
Nick Desaulniers ndesaulniers@google.com kbuild: clang: choose GCC_TOOLCHAIN_DIR not on LD
Breno Leitao leitao@debian.org powerpc/tm: Limit TM code inside PPC_TRANSACTIONAL_MEM
Yan Zhao yan.y.zhao@intel.com drm/i915/gvt: do not let pin count of shadow mm go negative
Andy Lutomirski luto@kernel.org x86/power: Make restore_processor_context() sane
Andy Lutomirski luto@kernel.org x86/power/32: Move SYSENTER MSR restoration to fix_processor_context()
Andy Lutomirski luto@kernel.org x86/power/64: Use struct desc_ptr for the IDT in struct saved_context
Andy Lutomirski luto@kernel.org x86/power: Fix some ordering bugs in __restore_processor_context()
Marek Behún marek.behun@nic.cz net: sfp: move sfp_register_socket call from sfp_remove to sfp_probe
-------------
Diffstat:
Makefile | 6 +-
arch/arm/boot/dts/am335x-evm.dts | 26 +++-
arch/arm/boot/dts/am335x-evmsk.dts | 26 +++-
arch/arm/boot/dts/sama5d2-pinfunc.h | 2 +-
arch/arm64/boot/dts/rockchip/rk3328-rock64.dts | 5 +-
arch/arm64/boot/dts/rockchip/rk3328.dtsi | 58 ++++-----
arch/arm64/include/asm/futex.h | 16 +--
arch/arm64/kernel/traps.c | 15 ++-
arch/arm64/mm/init.c | 2 +-
arch/parisc/include/asm/ptrace.h | 2 +-
arch/parisc/kernel/process.c | 6 -
arch/parisc/kernel/setup.c | 3 +
arch/powerpc/kernel/signal_64.c | 23 +++-
arch/x86/entry/vdso/Makefile | 22 ++--
arch/x86/events/amd/core.c | 140 ++++++++++++++++++++-
arch/x86/events/core.c | 13 +-
arch/x86/include/asm/suspend_32.h | 8 +-
arch/x86/include/asm/suspend_64.h | 19 ++-
arch/x86/include/asm/xen/hypercall.h | 3 +
arch/x86/power/cpu.c | 96 +++++++-------
arch/xtensa/kernel/stacktrace.c | 6 +-
block/bio.c | 5 +-
drivers/char/Kconfig | 2 +-
drivers/gpu/drm/i915/gvt/gtt.c | 2 +-
drivers/gpu/drm/udl/udl_drv.c | 1 +
drivers/gpu/drm/udl/udl_drv.h | 1 +
drivers/gpu/drm/udl/udl_main.c | 8 +-
drivers/md/dm-table.c | 39 ++++++
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 16 ++-
.../net/ethernet/mellanox/mlx5/core/en_common.c | 13 +-
drivers/net/ethernet/mellanox/mlx5/core/main.c | 20 ---
drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 2 +-
drivers/net/hyperv/hyperv_net.h | 1 +
drivers/net/hyperv/netvsc.c | 6 +-
drivers/net/hyperv/netvsc_drv.c | 32 ++++-
drivers/net/phy/sfp.c | 8 +-
drivers/net/usb/qmi_wwan.c | 1 +
drivers/pci/quirks.c | 2 +
drivers/staging/ccree/ssi_hash.c | 11 +-
drivers/tty/Kconfig | 24 ++++
drivers/tty/tty_io.c | 3 +
drivers/tty/tty_ldisc.c | 47 +++++++
drivers/virtio/virtio_ring.c | 2 +
fs/block_dev.c | 8 +-
fs/btrfs/ioctl.c | 10 ++
fs/btrfs/props.c | 8 +-
include/linux/bitrev.h | 46 +++----
include/linux/mlx5/driver.h | 2 +
include/linux/string.h | 3 +
include/linux/virtio_ring.h | 2 +-
include/net/ip.h | 2 +-
include/net/net_namespace.h | 1 +
include/net/netns/hash.h | 15 +--
kernel/irq/chip.c | 4 +
kernel/irq/irqdesc.c | 1 +
kernel/sched/fair.c | 6 +-
kernel/time/alarmtimer.c | 2 +-
lib/string.c | 20 +++
net/core/ethtool.c | 41 +++---
net/core/net_namespace.c | 1 +
net/core/skbuff.c | 2 +-
net/ipv4/ip_input.c | 7 +-
net/ipv4/ip_options.c | 4 +-
net/ipv4/tcp_dctcp.c | 36 +++---
net/ipv6/ip6_output.c | 4 +-
net/ipv6/ip6_tunnel.c | 4 +-
net/ipv6/sit.c | 4 +
net/kcm/kcmsock.c | 16 +--
net/openvswitch/flow_netlink.c | 4 +-
net/rds/tcp.c | 2 +-
net/sched/act_sample.c | 10 +-
net/sched/cls_matchall.c | 5 +
net/sctp/protocol.c | 1 +
sound/core/seq/seq_clientmgr.c | 6 +-
sound/soc/fsl/fsl_esai.c | 47 +++++--
75 files changed, 749 insertions(+), 318 deletions(-)
Commit c4ba68b8691e4, backported from upstream to 4.14 stable, was apparently applied incorrectly: instead of calling sfp_register_socket in sfp_probe, the socket registering code was put into sfp_remove. This is obviously wrong.
The commit first appeared in 4.14.104. Fix it for the next 4.14 release.
Fixes: c4ba68b8691e4 ("net: sfp: do not probe SFP module before we're attached") Cc: stable stable@vger.kernel.org Cc: Russell King rmk+kernel@armlinux.org.uk Cc: David S. Miller davem@davemloft.net Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Sasha Levin sashal@kernel.org Signed-off-by: Marek Behún marek.behun@nic.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/sfp.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index a1b68b19d912..5ab725a571a8 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -878,6 +878,10 @@ static int sfp_probe(struct platform_device *pdev)
         if (poll)
                 mod_delayed_work(system_wq, &sfp->poll, poll_jiffies);
 
+        sfp->sfp_bus = sfp_register_socket(sfp->dev, sfp, &sfp_module_ops);
+        if (!sfp->sfp_bus)
+                return -ENOMEM;
+
         return 0;
 }
 
@@ -887,10 +891,6 @@ static int sfp_remove(struct platform_device *pdev)
 
         sfp_unregister_socket(sfp->sfp_bus);
 
-        sfp->sfp_bus = sfp_register_socket(sfp->dev, sfp, &sfp_module_ops);
-        if (!sfp->sfp_bus)
-                return -ENOMEM;
-
         return 0;
 }
[ Upstream commit 5b06bbcfc2c621da3009da8decb7511500c293ed ]
__restore_processor_context() had a couple of ordering bugs. It restored GSBASE after calling load_gs_index(), and the latter can call into tracing code. It also tried to restore segment registers before restoring the LDT, which is straight-up wrong.
Reorder the code so that we restore GSBASE, then the descriptor tables, then the segments.
This fixes two bugs. First, it fixes a regression that broke resume under certain configurations due to irqflag tracing in native_load_gs_index(). Second, it fixes resume when the userspace process that initiated suspend had funny segments. The latter can be reproduced by compiling this:
// SPDX-License-Identifier: GPL-2.0
/*
 * ldt_echo.c - Echo argv[1] while using an LDT segment
 */

#define _GNU_SOURCE             /* for asprintf() */
#include <err.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/ldt.h>

int main(int argc, char **argv)
{
        int ret;
        size_t len;
        char *buf;

        const struct user_desc desc = {
                .entry_number    = 0,
                .base_addr       = 0,
                .limit           = 0xfffff,
                .seg_32bit       = 1,
                .contents        = 0, /* Data, grow-up */
                .read_exec_only  = 0,
                .limit_in_pages  = 1,
                .seg_not_present = 0,
                .useable         = 0
        };

        if (argc != 2)
                errx(1, "Usage: %s STRING", argv[0]);

        len = asprintf(&buf, "%s\n", argv[1]);
        if (len < 0)
                errx(1, "Out of memory");

        ret = syscall(SYS_modify_ldt, 1, &desc, sizeof(desc));
        if (ret < -1)
                errno = -ret;
        if (ret)
                err(1, "modify_ldt");

        asm volatile ("movw %0, %%es" :: "rm" ((unsigned short)7));
        write(1, buf, len);
        return 0;
}
and running: ldt_echo mem >/sys/power/state
Without the fix, the latter causes a triple fault on resume.
Fixes: ca37e57bbe0c ("x86/entry/64: Add missing irqflags tracing to native_load_gs_index()") Reported-by: Jarkko Nikula jarkko.nikula@linux.intel.com Signed-off-by: Andy Lutomirski luto@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Jarkko Nikula jarkko.nikula@linux.intel.com Cc: Peter Zijlstra a.p.zijlstra@chello.nl Cc: Borislav Petkov bp@alien8.de Cc: Linus Torvalds torvalds@linux-foundation.org Link: https://lkml.kernel.org/r/6b31721ea92f51ea839e79bd97ade4a75b1eeea2.151205730... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/power/cpu.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 04d5157fe7f8..a51d2dfb57d1 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -228,8 +228,20 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
         load_idt((const struct desc_ptr *)&ctxt->idt_limit);
 #endif
 
+#ifdef CONFIG_X86_64
         /*
-         * segment registers
+         * We need GSBASE restored before percpu access can work.
+         * percpu access can happen in exception handlers or in complicated
+         * helpers like load_gs_index().
+         */
+        wrmsrl(MSR_GS_BASE, ctxt->gs_base);
+#endif
+
+        fix_processor_context();
+
+        /*
+         * Restore segment registers.  This happens after restoring the GDT
+         * and LDT, which happen in fix_processor_context().
          */
 #ifdef CONFIG_X86_32
         loadsegment(es, ctxt->es);
@@ -250,13 +262,14 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
         load_gs_index(ctxt->gs);
         asm volatile ("movw %0, %%ss" :: "r" (ctxt->ss));
 
+        /*
+         * Restore FSBASE and user GSBASE after reloading the respective
+         * segment selectors.
+         */
         wrmsrl(MSR_FS_BASE, ctxt->fs_base);
-        wrmsrl(MSR_GS_BASE, ctxt->gs_base);
         wrmsrl(MSR_KERNEL_GS_BASE, ctxt->gs_kernel_base);
 #endif
 
-        fix_processor_context();
-
         do_fpu_end();
         tsc_verify_tsc_adjust(true);
         x86_platform.restore_sched_clock_state();
[ Upstream commit 090edbe23ff57940fca7f57d9165ce57a826bd7a ]
x86_64's saved_context nonsensically used separate idt_limit and idt_base fields and then cast &idt_limit to struct desc_ptr *.
This was correct (with -fno-strict-aliasing), but it's confusing, served no purpose, and required #ifdeffery. Simplify this by using struct desc_ptr directly.
No change in functionality.
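For illustration, here is a minimal standalone sketch (hypothetical, stripped-down structs; the real saved_context has many more members) of why the old cast worked at all, and only thanks to -fno-strict-aliasing:

#include <stdint.h>
#include <stdio.h>

struct desc_ptr {
        uint16_t size;
        unsigned long address;
} __attribute__((packed));

/* Old layout: two separate fields that happen to mirror desc_ptr. */
struct old_ctx {
        uint16_t idt_limit;
        unsigned long idt_base;
} __attribute__((packed));

/* New layout: just store the descriptor pointer directly. */
struct new_ctx {
        struct desc_ptr idt;
};

int main(void)
{
        struct old_ctx old = { .idt_limit = 0xfff, .idt_base = 0x1000UL };

        /* The old code did the equivalent of this cast.  It only works
         * because idt_limit/idt_base sit back to back in a packed struct,
         * exactly matching desc_ptr; dereferencing it also relies on the
         * kernel being built with -fno-strict-aliasing. */
        const struct desc_ptr *p = (const struct desc_ptr *)&old.idt_limit;

        printf("limit=%#x base=%#lx\n", p->size, p->address);
        return 0;
}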
Tested-by: Jarkko Nikula jarkko.nikula@linux.intel.com Signed-off-by: Andy Lutomirski luto@kernel.org Acked-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Thomas Gleixner tglx@linutronix.de Cc: Borislav Petkov bpetkov@suse.de Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Pavel Machek pavel@ucw.cz Cc: Peter Zijlstra peterz@infradead.org Cc: Rafael J. Wysocki rjw@rjwysocki.net Cc: Zhang Rui rui.zhang@intel.com Link: http://lkml.kernel.org/r/967909ce38d341b01d45eff53e278e2728a3a93a.1513286253... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/suspend_64.h | 3 +-- arch/x86/power/cpu.c | 11 +---------- 2 files changed, 2 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/suspend_64.h b/arch/x86/include/asm/suspend_64.h
index 7306e911faee..600e9e0aea51 100644
--- a/arch/x86/include/asm/suspend_64.h
+++ b/arch/x86/include/asm/suspend_64.h
@@ -30,8 +30,7 @@ struct saved_context {
         u16 gdt_pad;                    /* Unused */
         struct desc_ptr gdt_desc;
         u16 idt_pad;
-        u16 idt_limit;
-        unsigned long idt_base;
+        struct desc_ptr idt;
         u16 ldt;
         u16 tss;
         unsigned long tr;
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index a51d2dfb57d1..cba2e2c3f89e 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -82,12 +82,8 @@ static void __save_processor_state(struct saved_context *ctxt)
         /*
          * descriptor tables
          */
-#ifdef CONFIG_X86_32
         store_idt(&ctxt->idt);
-#else
-/* CONFIG_X86_64 */
-        store_idt((struct desc_ptr *)&ctxt->idt_limit);
-#endif
+
         /*
          * We save it here, but restore it only in the hibernate case.
          * For ACPI S3 resume, this is loaded via 'early_gdt_desc' in 64-bit
@@ -221,12 +217,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
          * now restore the descriptor tables to their proper values
          * ltr is done i fix_processor_context().
          */
-#ifdef CONFIG_X86_32
         load_idt(&ctxt->idt);
-#else
-/* CONFIG_X86_64 */
-        load_idt((const struct desc_ptr *)&ctxt->idt_limit);
-#endif
 
 #ifdef CONFIG_X86_64
         /*
On Mon 2019-04-15 20:58:21, Greg Kroah-Hartman wrote:
[ Upstream commit 090edbe23ff57940fca7f57d9165ce57a826bd7a ]
x86_64's saved_context nonsensically used separate idt_limit and idt_base fields and then cast &idt_limit to struct desc_ptr *.
This was correct (with -fno-strict-aliasing), but it's confusing, served no purpose, and required #ifdeffery. Simplify this by using struct desc_ptr directly.
No change in functionality.
While this is a nice cleanup, I don't think it fixes a real bug. I don't think it is suitable for stable. Pavel
[ Upstream commit 896c80bef4d3b357814a476663158aaf669d0fb3 ]
x86_64 restores system call MSRs in fix_processor_context(), and x86_32 restored them along with segment registers. The 64-bit variant makes more sense, so move the 32-bit code to match the 64-bit code.
No side effects are expected to runtime behavior.
Tested-by: Jarkko Nikula jarkko.nikula@linux.intel.com Signed-off-by: Andy Lutomirski luto@kernel.org Acked-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Thomas Gleixner tglx@linutronix.de Cc: Borislav Petkov bpetkov@suse.de Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Pavel Machek pavel@ucw.cz Cc: Peter Zijlstra peterz@infradead.org Cc: Rafael J. Wysocki rjw@rjwysocki.net Cc: Zhang Rui rui.zhang@intel.com Link: http://lkml.kernel.org/r/65158f8d7ee64dd6bbc6c1c83b3b34aaa854e3ae.1513286253... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/power/cpu.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index cba2e2c3f89e..8e1668470b23 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -176,6 +176,9 @@ static void fix_processor_context(void)
         write_gdt_entry(desc, GDT_ENTRY_TSS, &tss, DESC_TSS);
 
         syscall_init();                         /* This sets MSR_*STAR and related */
+#else
+        if (boot_cpu_has(X86_FEATURE_SEP))
+                enable_sep_cpu();
 #endif
         load_TR_desc();                         /* This does ltr */
         load_mm_ldt(current->active_mm);        /* This does lldt */
@@ -239,12 +242,6 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
         loadsegment(fs, ctxt->fs);
         loadsegment(gs, ctxt->gs);
         loadsegment(ss, ctxt->ss);
-
-        /*
-         * sysenter MSRs
-         */
-        if (boot_cpu_has(X86_FEATURE_SEP))
-                enable_sep_cpu();
 #else
 /* CONFIG_X86_64 */
         asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
On Mon 2019-04-15 20:58:22, Greg Kroah-Hartman wrote:
[ Upstream commit 896c80bef4d3b357814a476663158aaf669d0fb3 ]
x86_64 restores system call MSRs in fix_processor_context(), and x86_32 restored them along with segment registers. The 64-bit variant makes more sense, so move the 32-bit code to match the 64-bit code.
No side effects are expected to runtime behavior.
Again, this is cleanup, not a bugfix. Pavel
On Mon, Apr 15, 2019 at 10:07:47PM +0200, Pavel Machek wrote:
On Mon 2019-04-15 20:58:22, Greg Kroah-Hartman wrote:
[ Upstream commit 896c80bef4d3b357814a476663158aaf669d0fb3 ]
x86_64 restores system call MSRs in fix_processor_context(), and x86_32 restored them along with segment registers. The 64-bit variant makes more sense, so move the 32-bit code to match the 64-bit code.
No side effects are expected to runtime behavior.
Again, this is cleanup, not a bugfix.
Indeed, these two are just preparatory patches rather than fixes. Patches that come later need these.
-- Thanks, Sasha
[ Upstream commit 7ee18d677989e99635027cee04c878950e0752b9 ]
My previous attempt to fix a couple of bugs in __restore_processor_context():
5b06bbcfc2c6 ("x86/power: Fix some ordering bugs in __restore_processor_context()")
... introduced yet another bug, breaking suspend-resume.
Rather than trying to come up with a minimal fix, let's try to clean it up for real. This patch fixes quite a few things:
- The old code saved a nonsensical subset of segment registers. The only registers that need to be saved are those that contain userspace state or those that can't be trivially restored without percpu access working. (On x86_32, we can restore percpu access by writing __KERNEL_PERCPU to %fs. On x86_64, it's easier to save and restore the kernel's GSBASE.) With this patch, we restore hardcoded values to the kernel state where applicable and explicitly restore the user state after fixing all the descriptor tables.
- We used to use an unholy mix of inline asm and C helpers for segment register access. Let's get rid of the inline asm.
This fixes the reported s2ram hangs and makes the code all around more logical.
Analyzed-by: Linus Torvalds torvalds@linux-foundation.org Reported-by: Jarkko Nikula jarkko.nikula@linux.intel.com Reported-by: Pavel Machek pavel@ucw.cz Tested-by: Jarkko Nikula jarkko.nikula@linux.intel.com Tested-by: Pavel Machek pavel@ucw.cz Signed-off-by: Andy Lutomirski luto@kernel.org Acked-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Thomas Gleixner tglx@linutronix.de Cc: Borislav Petkov bpetkov@suse.de Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Rafael J. Wysocki rjw@rjwysocki.net Cc: Zhang Rui rui.zhang@intel.com Fixes: 5b06bbcfc2c6 ("x86/power: Fix some ordering bugs in __restore_processor_context()") Link: http://lkml.kernel.org/r/398ee68e5c0f766425a7b746becfc810840770ff.1513286253... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/suspend_32.h | 8 +++- arch/x86/include/asm/suspend_64.h | 16 ++++++- arch/x86/power/cpu.c | 79 ++++++++++++++++--------------- 3 files changed, 62 insertions(+), 41 deletions(-)
diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
index 982c325dad33..8be6afb58471 100644
--- a/arch/x86/include/asm/suspend_32.h
+++ b/arch/x86/include/asm/suspend_32.h
@@ -12,7 +12,13 @@
 
 /* image of the saved processor state */
 struct saved_context {
-        u16 es, fs, gs, ss;
+        /*
+         * On x86_32, all segment registers, with the possible exception of
+         * gs, are saved at kernel entry in pt_regs.
+         */
+#ifdef CONFIG_X86_32_LAZY_GS
+        u16 gs;
+#endif
         unsigned long cr0, cr2, cr3, cr4;
         u64 misc_enable;
         bool misc_enable_saved;
diff --git a/arch/x86/include/asm/suspend_64.h b/arch/x86/include/asm/suspend_64.h
index 600e9e0aea51..a7af9f53c0cb 100644
--- a/arch/x86/include/asm/suspend_64.h
+++ b/arch/x86/include/asm/suspend_64.h
@@ -20,8 +20,20 @@
  */
 struct saved_context {
         struct pt_regs regs;
-        u16 ds, es, fs, gs, ss;
-        unsigned long gs_base, gs_kernel_base, fs_base;
+
+        /*
+         * User CS and SS are saved in current_pt_regs().  The rest of the
+         * segment selectors need to be saved and restored here.
+         */
+        u16 ds, es, fs, gs;
+
+        /*
+         * Usermode FSBASE and GSBASE may not match the fs and gs selectors,
+         * so we save them separately.  We save the kernelmode GSBASE to
+         * restore percpu access after resume.
+         */
+        unsigned long kernelmode_gs_base, usermode_gs_base, fs_base;
+
         unsigned long cr0, cr2, cr3, cr4, cr8;
         u64 misc_enable;
         bool misc_enable_saved;
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 8e1668470b23..a7d966964c6f 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -99,22 +99,18 @@ static void __save_processor_state(struct saved_context *ctxt)
         /*
          * segment registers
          */
-#ifdef CONFIG_X86_32
-        savesegment(es, ctxt->es);
-        savesegment(fs, ctxt->fs);
+#ifdef CONFIG_X86_32_LAZY_GS
         savesegment(gs, ctxt->gs);
-        savesegment(ss, ctxt->ss);
-#else
-/* CONFIG_X86_64 */
-        asm volatile ("movw %%ds, %0" : "=m" (ctxt->ds));
-        asm volatile ("movw %%es, %0" : "=m" (ctxt->es));
-        asm volatile ("movw %%fs, %0" : "=m" (ctxt->fs));
-        asm volatile ("movw %%gs, %0" : "=m" (ctxt->gs));
-        asm volatile ("movw %%ss, %0" : "=m" (ctxt->ss));
+#endif
+#ifdef CONFIG_X86_64
+        savesegment(gs, ctxt->gs);
+        savesegment(fs, ctxt->fs);
+        savesegment(ds, ctxt->ds);
+        savesegment(es, ctxt->es);
 
         rdmsrl(MSR_FS_BASE, ctxt->fs_base);
-        rdmsrl(MSR_GS_BASE, ctxt->gs_base);
-        rdmsrl(MSR_KERNEL_GS_BASE, ctxt->gs_kernel_base);
+        rdmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
+        rdmsrl(MSR_KERNEL_GS_BASE, ctxt->usermode_gs_base);
         mtrr_save_fixed_ranges(NULL);
 
         rdmsrl(MSR_EFER, ctxt->efer);
@@ -191,9 +187,12 @@ static void fix_processor_context(void)
 }
 
 /**
- *        __restore_processor_state - restore the contents of CPU registers saved
- *                by __save_processor_state()
- *        @ctxt - structure to load the registers contents from
+ * __restore_processor_state - restore the contents of CPU registers saved
+ *                             by __save_processor_state()
+ * @ctxt - structure to load the registers contents from
+ *
+ * The asm code that gets us here will have restored a usable GDT, although
+ * it will be pointing to the wrong alias.
  */
 static void notrace __restore_processor_state(struct saved_context *ctxt)
 {
@@ -216,46 +215,50 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
         write_cr2(ctxt->cr2);
         write_cr0(ctxt->cr0);
 
+        /* Restore the IDT. */
+        load_idt(&ctxt->idt);
+
         /*
-         * now restore the descriptor tables to their proper values
-         * ltr is done i fix_processor_context().
+         * Just in case the asm code got us here with the SS, DS, or ES
+         * out of sync with the GDT, update them.
          */
-        load_idt(&ctxt->idt);
+        loadsegment(ss, __KERNEL_DS);
+        loadsegment(ds, __USER_DS);
+        loadsegment(es, __USER_DS);
 
-#ifdef CONFIG_X86_64
         /*
-         * We need GSBASE restored before percpu access can work.
-         * percpu access can happen in exception handlers or in complicated
-         * helpers like load_gs_index().
+         * Restore percpu access.  Percpu access can happen in exception
+         * handlers or in complicated helpers like load_gs_index().
          */
-        wrmsrl(MSR_GS_BASE, ctxt->gs_base);
+#ifdef CONFIG_X86_64
+        wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
+#else
+        loadsegment(fs, __KERNEL_PERCPU);
+        loadsegment(gs, __KERNEL_STACK_CANARY);
 #endif
 
+        /* Restore the TSS, RO GDT, LDT, and usermode-relevant MSRs. */
         fix_processor_context();
 
         /*
-         * Restore segment registers.  This happens after restoring the GDT
-         * and LDT, which happen in fix_processor_context().
+         * Now that we have descriptor tables fully restored and working
+         * exception handling, restore the usermode segments.
          */
-#ifdef CONFIG_X86_32
+#ifdef CONFIG_X86_64
+        loadsegment(ds, ctxt->es);
         loadsegment(es, ctxt->es);
         loadsegment(fs, ctxt->fs);
-        loadsegment(gs, ctxt->gs);
-        loadsegment(ss, ctxt->ss);
-#else
-/* CONFIG_X86_64 */
-        asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
-        asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
-        asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
         load_gs_index(ctxt->gs);
-        asm volatile ("movw %0, %%ss" :: "r" (ctxt->ss));
 
         /*
-         * Restore FSBASE and user GSBASE after reloading the respective
-         * segment selectors.
+         * Restore FSBASE and GSBASE after restoring the selectors, since
+         * restoring the selectors clobbers the bases.  Keep in mind
+         * that MSR_KERNEL_GS_BASE is horribly misnamed.
          */
         wrmsrl(MSR_FS_BASE, ctxt->fs_base);
-        wrmsrl(MSR_KERNEL_GS_BASE, ctxt->gs_kernel_base);
+        wrmsrl(MSR_KERNEL_GS_BASE, ctxt->usermode_gs_base);
+#elif defined(CONFIG_X86_32_LAZY_GS)
+        loadsegment(gs, ctxt->gs);
 #endif
 
         do_fpu_end();
On Mon 2019-04-15 20:58:23, Greg Kroah-Hartman wrote:
[ Upstream commit 7ee18d677989e99635027cee04c878950e0752b9 ]
My previous attempt to fix a couple of bugs in __restore_processor_context():
5b06bbcfc2c6 ("x86/power: Fix some ordering bugs in __restore_processor_context()")
... introduced yet another bug, breaking suspend-resume.
Rather than trying to come up with a minimal fix, let's try to clean it up for real. This patch fixes quite a few things:
5b06bbcfc2c6 fixed a theoretical bug; rather than porting it to stable and then fixing it up, it would be better not to port it to stable in the first place, or simply to revert it there.
Thanks, Pavel
On Mon, Apr 15, 2019 at 1:04 PM Pavel Machek pavel@denx.de wrote:
On Mon 2019-04-15 20:58:23, Greg Kroah-Hartman wrote:
[ Upstream commit 7ee18d677989e99635027cee04c878950e0752b9 ]
My previous attempt to fix a couple of bugs in __restore_processor_context():
5b06bbcfc2c6 ("x86/power: Fix some ordering bugs in __restore_processor_context()")
... introduced yet another bug, breaking suspend-resume.
Rather than trying to come up with a minimal fix, let's try to clean it up for real. This patch fixes quite a few things:
5b06bbcfc2c6 fixed theoretical bug; rather than porting it to stable than fixing it up, it would be better not to port it to stable in the first place or simply revert it there.
Are you sure about that? The bug was reported by real users who had their systems really crash:
https://lore.kernel.org/lkml/?q=0fede9f9-88b0-a6e7-1027-dfb2019b8ef2%40linux...
https://lore.kernel.org/lkml/CA+55aFwsMuHUBQz5kDNwRf17JnasXMWjvmLq5qXGH-694y...
And we had a report that the bug got backported:
https://lore.kernel.org/stable/20190407160005.djiw4reapwvbxmgo@debian/
And if we're going to backport some of the fix, we should definitely backport the whole set to avoid having the -stable kernels be in a state that was never in any released kernel.
--Andy
On Mon 2019-04-15 13:18:56, Andy Lutomirski wrote:
On Mon, Apr 15, 2019 at 1:04 PM Pavel Machek pavel@denx.de wrote:
On Mon 2019-04-15 20:58:23, Greg Kroah-Hartman wrote:
[ Upstream commit 7ee18d677989e99635027cee04c878950e0752b9 ]
My previous attempt to fix a couple of bugs in __restore_processor_context():
5b06bbcfc2c6 ("x86/power: Fix some ordering bugs in __restore_processor_context()")
... introduced yet another bug, breaking suspend-resume.
Rather than trying to come up with a minimal fix, let's try to clean it up for real. This patch fixes quite a few things:
5b06bbcfc2c6 fixed theoretical bug; rather than porting it to stable than fixing it up, it would be better not to port it to stable in the first place or simply revert it there.
Are you sure about that? The bug was reported by real users who had their systems really crash:
https://lore.kernel.org/lkml/?q=0fede9f9-88b0-a6e7-1027-dfb2019b8ef2%40linux...
https://lore.kernel.org/lkml/CA+55aFwsMuHUBQz5kDNwRf17JnasXMWjvmLq5qXGH-694y...
And we had a report that the bug got backported:
https://lore.kernel.org/stable/20190407160005.djiw4reapwvbxmgo@debian/
And if we're going to backport some of the fix, we should definitely backport the whole set to avoid having the -stable kernels be in a state that was never in any released kernel.
I agree it should be all or nothing. And I may have been slightly confused.
Anyway: 5b06bbcfc2c6 is a fix for ca37e57bbe0cf, and that one is for a "mostly harmless warning" (quoting the changelog). ca37e57bbe0cf is not present in 4.4.178; I believe the best solution is not to add that one in the first place, so we don't have to fix it up. Pavel
On Tue, Apr 16, 2019 at 01:56:44PM +0200, Pavel Machek wrote:
On Mon 2019-04-15 13:18:56, Andy Lutomirski wrote:
On Mon, Apr 15, 2019 at 1:04 PM Pavel Machek pavel@denx.de wrote:
On Mon 2019-04-15 20:58:23, Greg Kroah-Hartman wrote:
[ Upstream commit 7ee18d677989e99635027cee04c878950e0752b9 ]
My previous attempt to fix a couple of bugs in __restore_processor_context():
5b06bbcfc2c6 ("x86/power: Fix some ordering bugs in __restore_processor_context()")
... introduced yet another bug, breaking suspend-resume.
Rather than trying to come up with a minimal fix, let's try to clean it up for real. This patch fixes quite a few things:
5b06bbcfc2c6 fixed theoretical bug; rather than porting it to stable than fixing it up, it would be better not to port it to stable in the first place or simply revert it there.
Are you sure about that? The bug was reported by real users who had their systems really crash:
https://lore.kernel.org/lkml/?q=0fede9f9-88b0-a6e7-1027-dfb2019b8ef2%40linux...
https://lore.kernel.org/lkml/CA+55aFwsMuHUBQz5kDNwRf17JnasXMWjvmLq5qXGH-694y...
And we had a report that the bug got backported:
https://lore.kernel.org/stable/20190407160005.djiw4reapwvbxmgo@debian/
And if we're going to backport some of the fix, we should definitely backport the whole set to avoid having the -stable kernels be in a state that was never in any released kernel.
I agree it should be all or nothing. And I may have been slightly confused.
Anyway: 5b06bbcfc2c6 is fix for ca37e57bbe0cf, and that one is for "mostly harmless warning" (quoting changelog). ca37e57bbe0cf is not present in 4.4.178, I believe best solution is not to add that one in the first place, so we don't have to fix it up.
5b06bbcfc2c6 describes in its commit message two bugs that it fixes: one, as you pointed out, is fixing an issue introduced by ca37e57bbe0cf. The other one, which to my understanding is unrelated to ca37e57bbe0cf, is fixing "resume when the userspace process that initiated suspect had funny segments", and provides a straightforward test case for that issue.
-- Thanks, Sasha
[ Upstream commit 663a50ceac75c2208d2ad95365bc8382fd42f44d ]
The shadow mm's pin count gets increased in the workload preparation phase, which happens after workload scanning, and it gets decreased in complete_current_workload() after the workload completes. If a workload hits a scanning error, its shadow mm pin count is never increased, yet it still gets decreased at the end. This patch keeps the shadow mm's pin count from going below 0.
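For illustration, a plain C model of the difference between an unconditional decrement and the decrement-if-positive behaviour used by the fix (a sketch of the semantics only, not the kernel's atomic implementation):

#include <stdio.h>

/* Unconditional decrement: can drive the count below zero. */
static int dec(int *v)
{
        return --(*v);
}

/* Decrement only if the result stays non-negative; returns the would-be
 * new value either way, mirroring atomic_dec_if_positive() semantics. */
static int dec_if_positive(int *v)
{
        int new = *v - 1;

        if (new >= 0)
                *v = new;
        return new;
}

int main(void)
{
        int plain = 0, guarded = 0;   /* scanning failed: pin count was never raised */

        dec(&plain);                  /* old behaviour: count ends up at -1 */
        dec_if_positive(&guarded);    /* patched behaviour: count stays at 0 */
        printf("plain=%d guarded=%d\n", plain, guarded);
        return 0;
}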
Fixes: 2707e4446688 ("drm/i915/gvt: vGPU graphics memory virtualization") Cc: zhenyuw@linux.intel.com Cc: stable@vger.kernel.org #4.14+ Signed-off-by: Yan Zhao yan.y.zhao@intel.com Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gvt/gtt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index dadacbe558ab..1a1f7eb46d1e 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1629,7 +1629,7 @@ void intel_vgpu_unpin_mm(struct intel_vgpu_mm *mm)
         if (WARN_ON(mm->type != INTEL_GVT_MM_PPGTT))
                 return;
 
-        atomic_dec(&mm->pincount);
+        atomic_dec_if_positive(&mm->pincount);
 }
 
 /**
[ Upstream commit 897bc3df8c5aebb54c32d831f917592e873d0559 ]
Commit e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint") moved a code block around, and this block uses a 'msr' variable outside of the CONFIG_PPC_TRANSACTIONAL_MEM block; however, the 'msr' variable is declared inside a CONFIG_PPC_TRANSACTIONAL_MEM block, causing a possible error when CONFIG_PPC_TRANSACTIONAL_MEM is not defined.
error: 'msr' undeclared (first use in this function)
This is not causing a compilation error in the mainline kernel, because 'msr' is being used as an argument of MSR_TM_ACTIVE(), which is defined as the following when CONFIG_PPC_TRANSACTIONAL_MEM is *not* set:
#define MSR_TM_ACTIVE(x) 0
This patch just fixes this issue by avoiding any use of the 'msr' variable outside the CONFIG_PPC_TRANSACTIONAL_MEM block, rather than relying on the MSR_TM_ACTIVE() definition.
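A self-contained sketch of the failure mode, using hypothetical CONFIG_FEATURE / FEATURE_ACTIVE() names in place of CONFIG_PPC_TRANSACTIONAL_MEM / MSR_TM_ACTIVE():

#include <stdio.h>

#ifdef CONFIG_FEATURE
# define FEATURE_ACTIVE(x) ((x) & 1)
#else
# define FEATURE_ACTIVE(x) 0    /* the argument vanishes during preprocessing */
#endif

int main(void)
{
#ifdef CONFIG_FEATURE
        int msr = 1;            /* only declared when the feature is enabled */
#endif
        /* With CONFIG_FEATURE unset this still compiles, because 'msr' is
         * discarded by the macro before the compiler ever sees it.  Any
         * other use of 'msr' fails with "'msr' undeclared", which is the
         * trap the patch avoids by keeping 'msr' inside the #ifdef block. */
        printf("%d\n", FEATURE_ACTIVE(msr));
        return 0;
}

Building it with and without -DCONFIG_FEATURE shows why the mainline kernel compiled cleanly even though 'msr' is only declared under the config option.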
Cc: stable@vger.kernel.org Reported-by: Christoph Biedl linux-kernel.bfrz@manchmal.in-ulm.de Fixes: e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint") Signed-off-by: Breno Leitao leitao@debian.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/signal_64.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 979b9463e17b..927384d85faf 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -746,12 +746,25 @@ int sys_rt_sigreturn(unsigned long r3, unsigned long r4, unsigned long r5,
                 if (restore_tm_sigcontexts(current, &uc->uc_mcontext,
                                            &uc_transact->uc_mcontext))
                         goto badframe;
-        }
-        else
-        /* Fall through, for non-TM restore */
+        } else
 #endif
-        if (restore_sigcontext(current, NULL, 1, &uc->uc_mcontext))
-                goto badframe;
+        {
+                /*
+                 * Fall through, for non-TM restore
+                 *
+                 * Unset MSR[TS] on the thread regs since MSR from user
+                 * context does not have MSR active, and recheckpoint was
+                 * not called since restore_tm_sigcontexts() was not called
+                 * also.
+                 *
+                 * If not unsetting it, the code can RFID to userspace with
+                 * MSR[TS] set, but without CPU in the proper state,
+                 * causing a TM bad thing.
+                 */
+                current->thread.regs->msr &= ~MSR_TS_MASK;
+                if (restore_sigcontext(current, NULL, 1, &uc->uc_mcontext))
+                        goto badframe;
+        }
 
         if (restore_altstack(&uc->uc_stack))
                 goto badframe;
commit ad15006cc78459d059af56729c4d9bed7c7fd860 upstream.
This causes an issue when trying to build with `make LD=ld.lld` if ld.lld and the rest of your cross tools aren't in the same directory (e.g. /usr/local/bin), as is the case for Android's build system, because GCC_TOOLCHAIN_DIR then gets set based on `which $(LD)`, which points to where the LLVM tools are, not where the GCC/binutils tools are located.
Instead, select the GCC_TOOLCHAIN_DIR based on another tool provided by binutils for which LLVM does not provide a substitute, such as elfedit.
Fixes: 785f11aa595b ("kbuild: Add better clang cross build support") Link: https://github.com/ClangBuiltLinux/linux/issues/341 Suggested-by: Nathan Chancellor natechancellor@gmail.com Reviewed-by: Nathan Chancellor natechancellor@gmail.com Tested-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com Signed-off-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index da223c660c9a..85753250984c 100644
--- a/Makefile
+++ b/Makefile
@@ -480,7 +480,7 @@ endif
 ifeq ($(cc-name),clang)
 ifneq ($(CROSS_COMPILE),)
 CLANG_FLAGS     := --target=$(notdir $(CROSS_COMPILE:%-=%))
-GCC_TOOLCHAIN_DIR := $(dir $(shell which $(LD)))
+GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit))
 CLANG_FLAGS     += --prefix=$(GCC_TOOLCHAIN_DIR)
 GCC_TOOLCHAIN   := $(realpath $(GCC_TOOLCHAIN_DIR)/..)
 endif
commit 379d98ddf41344273d9718556f761420f4dc80b3 upstream.
The vdso{32,64}.so can fail to link with CC=clang when clang tries to find a suitable GCC toolchain to link these libraries with.
/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o: access beyond end of merged section (782)
This happens because the host environment leaked into the cross compiler environment due to the way clang searches for suitable GCC toolchains.
Clang is a retargetable compiler, and each invocation of it must provide --target=<something> --gcc-toolchain=<something> to allow it to find the correct binutils for cross compilation. These flags had been added to KBUILD_CFLAGS, but the vdso code uses CC and not KBUILD_CFLAGS (for various reasons) which breaks clang's ability to find the correct linker when cross compiling.
Most of the time this goes unnoticed because the host linker is new enough to work anyway, or is incompatible and skipped, but this cannot be reliably assumed.
This change alters the vdso makefile to just use LD directly, which bypasses clang and thus the searching problem. The makefile will just use ${CROSS_COMPILE}ld instead, which is always what we want. This matches the method used to link vmlinux.
This drops references to DISABLE_LTO; this option doesn't seem to be set anywhere, and not knowing what its possible values are, it's not clear how to convert it from CC to LD flag.
Signed-off-by: Alistair Strachan astrachan@google.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Andy Lutomirski luto@kernel.org Cc: "H. Peter Anvin" hpa@zytor.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: kernel-team@android.com Cc: joel@joelfernandes.org Cc: Andi Kleen andi.kleen@intel.com Link: https://lkml.kernel.org/r/20180803173931.117515-1-astrachan@google.com Signed-off-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/entry/vdso/Makefile | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 0a550dc5c525..0defcc939ab4 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -48,10 +48,8 @@ targets += $(vdso_img_sodbg)
 
 export CPPFLAGS_vdso.lds += -P -C
 
-VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
-                        -Wl,--no-undefined \
-                        -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 \
-                        $(DISABLE_LTO)
+VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no-undefined \
+                        -z max-page-size=4096 -z common-page-size=4096
 
 $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
 	$(call if_changed,vdso)
@@ -103,10 +101,8 @@ CFLAGS_REMOVE_vvar.o = -pg
 #
 
 CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdsox32.lds = -Wl,-m,elf32_x86_64 \
-                           -Wl,-soname=linux-vdso.so.1 \
-                           -Wl,-z,max-page-size=4096 \
-                           -Wl,-z,common-page-size=4096
+VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \
+                           -z max-page-size=4096 -z common-page-size=4096
 
 # 64-bit objects to re-brand as x32
 vobjs64-for-x32 := $(filter-out $(vobjs-nox32),$(vobjs-y))
@@ -134,7 +130,7 @@ $(obj)/vdsox32.so.dbg: $(src)/vdsox32.lds $(vobjx32s) FORCE
 	$(call if_changed,vdso)
 
 CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1
+VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -soname linux-gate.so.1
 
 # This makes sure the $(obj) subdirectory exists even though vdso32/
 # is not a kbuild sub-make subdirectory.
@@ -180,13 +176,13 @@ $(obj)/vdso32.so.dbg: FORCE \
 # The DSO images are built using a special linker script.
 #
 quiet_cmd_vdso = VDSO    $@
-      cmd_vdso = $(CC) -nostdlib -o $@ \
+      cmd_vdso = $(LD) -nostdlib -o $@ \
                        $(VDSO_LDFLAGS) $(VDSO_LDFLAGS_$(filter %.lds,$(^F))) \
-                       -Wl,-T,$(filter %.lds,$^) $(filter %.o,$^) && \
+                       -T $(filter %.lds,$^) $(filter %.o,$^) && \
                 sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
 
-VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=both) \
-        $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS)
+VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
+        $(call ld-option, --build-id) -Bsymbolic
 GCOV_PROFILE := n
 
 #
On Mon, 2019-04-15 at 20:58 +0200, Greg Kroah-Hartman wrote:
commit 379d98ddf41344273d9718556f761420f4dc80b3 upstream.
Hi,
With this patch in 4.14.112 build-id is now missing in vdso32.so:
$ file arch/x86/entry/vdso/vdso*so*
arch/x86/entry/vdso/vdso32.so:     ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, stripped
arch/x86/entry/vdso/vdso32.so.dbg: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, with debug_info, not stripped
arch/x86/entry/vdso/vdso64.so:     ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=d80730a5b561a3161e488a369d1c76c250b584b4, stripped
arch/x86/entry/vdso/vdso64.so.dbg: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=d80730a5b561a3161e488a369d1c76c250b584b4, with debug_info, not stripped
Based on a quick check, "$(call ld-option, --build-id)" fails due to some 32/64 bit mismatch, so the --build-id linker flag is not used when linking vdso32.so
Perhaps scripts/Kbuild.include is missing some change in 4.14.y to make this work properly.
-Tommi
The vdso{32,64}.so can fail to link with CC=clang when clang tries to find a suitable GCC toolchain to link these libraries with.
/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o: access beyond end of merged section (782)
This happens because the host environment leaked into the cross compiler environment due to the way clang searches for suitable GCC toolchains.
Clang is a retargetable compiler, and each invocation of it must provide --target=<something> --gcc-toolchain=<something> to allow it to find the correct binutils for cross compilation. These flags had been added to KBUILD_CFLAGS, but the vdso code uses CC and not KBUILD_CFLAGS (for various reasons) which breaks clang's ability to find the correct linker when cross compiling.
Most of the time this goes unnoticed because the host linker is new enough to work anyway, or is incompatible and skipped, but this cannot be reliably assumed.
This change alters the vdso makefile to just use LD directly, which bypasses clang and thus the searching problem. The makefile will just use ${CROSS_COMPILE}ld instead, which is always what we want. This matches the method used to link vmlinux.
This drops references to DISABLE_LTO; this option doesn't seem to be set anywhere, and not knowing what its possible values are, it's not clear how to convert it from CC to LD flag.
Signed-off-by: Alistair Strachan astrachan@google.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Andy Lutomirski luto@kernel.org Cc: "H. Peter Anvin" hpa@zytor.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: kernel-team@android.com Cc: joel@joelfernandes.org Cc: Andi Kleen andi.kleen@intel.com Link: https://lkml.kernel.org/r/20180803173931.117515-1-astrachan@google.com Signed-off-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org
arch/x86/entry/vdso/Makefile | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile index 0a550dc5c525..0defcc939ab4 100644 --- a/arch/x86/entry/vdso/Makefile +++ b/arch/x86/entry/vdso/Makefile @@ -48,10 +48,8 @@ targets += $(vdso_img_sodbg) export CPPFLAGS_vdso.lds += -P -C -VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
-Wl,--no-undefined \
-Wl,-z,max-page-size=4096 -Wl,-z,common-page-
size=4096 \
$(DISABLE_LTO)
+VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no- undefined \
-z max-page-size=4096 -z common-page-size=4096
$(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE $(call if_changed,vdso) @@ -103,10 +101,8 @@ CFLAGS_REMOVE_vvar.o = -pg # CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds) -VDSO_LDFLAGS_vdsox32.lds = -Wl,-m,elf32_x86_64 \
-Wl,-soname=linux-vdso.so.1 \
-Wl,-z,max-page-size=4096 \
-Wl,-z,common-page-size=4096
+VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \
-z max-page-size=4096 -z common-page-
size=4096 # 64-bit objects to re-brand as x32 vobjs64-for-x32 := $(filter-out $(vobjs-nox32),$(vobjs-y)) @@ -134,7 +130,7 @@ $(obj)/vdsox32.so.dbg: $(src)/vdsox32.lds $(vobjx32s) FORCE $(call if_changed,vdso) CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds) -VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux- gate.so.1 +VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -soname linux-gate.so.1 # This makes sure the $(obj) subdirectory exists even though vdso32/ # is not a kbuild sub-make subdirectory. @@ -180,13 +176,13 @@ $(obj)/vdso32.so.dbg: FORCE \ # The DSO images are built using a special linker script. # quiet_cmd_vdso = VDSO $@
cmd_vdso = $(CC) -nostdlib -o $@ \
cmd_vdso = $(LD) -nostdlib -o $@ \ $(VDSO_LDFLAGS) $(VDSO_LDFLAGS_$(filter
%.lds,$(^F))) \
-Wl,-T,$(filter %.lds,$^) $(filter %.o,$^) && \
sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'-T $(filter %.lds,$^) $(filter %.o,$^) && \
-VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash- style=both) \
- $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic
$(LTO_CFLAGS) +VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
- $(call ld-option, --build-id) -Bsymbolic
GCOV_PROFILE := n #
On Fri, Apr 26, 2019 at 11:41:30AM +0000, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
On Mon, 2019-04-15 at 20:58 +0200, Greg Kroah-Hartman wrote:
commit 379d98ddf41344273d9718556f761420f4dc80b3 upstream.
Hi,
With this patch in 4.14.112 build-id is now missing in vdso32.so:
$ file arch/x86/entry/vdso/vdso*so* arch/x86/entry/vdso/vdso32.so: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, stripped arch/x86/entry/vdso/vdso32.so.dbg: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, with debug_info, not stripped arch/x86/entry/vdso/vdso64.so: ELF 64-bit LSB pie executable, x86- 64, version 1 (SYSV), dynamically linked, BuildID[sha1]=d80730a5b561a3161e488a369d1c76c250b584b4, stripped arch/x86/entry/vdso/vdso64.so.dbg: ELF 64-bit LSB pie executable, x86- 64, version 1 (SYSV), dynamically linked, BuildID[sha1]=d80730a5b561a3161e488a369d1c76c250b584b4, with debug_info, not stripped
Based on quick check, "$(call ld-option, --build-id)" fails due to some 32/64 bit mismatch, so the --build-id linker flag is not used when linking vdso32.so
Perhaps scripts/Kbuild.include is missing some change in 4.14.y to make this work properly.
Hi Tommi,
This appears to be fixed by commit 0294e6f4a000 ("kbuild: simplify ld-option implementation") upstream. Could you test the attached backport and make sure everything works on your end? Assuming that it does, I will test the other stable releases and see if this is needed and send those backports along.
Thanks and sorry for the trouble! Nathan
-Tommi
The vdso{32,64}.so can fail to link with CC=clang when clang tries to find a suitable GCC toolchain to link these libraries with.
/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o: access beyond end of merged section (782)
This happens because the host environment leaked into the cross compiler environment due to the way clang searches for suitable GCC toolchains.
Clang is a retargetable compiler, and each invocation of it must provide --target=<something> --gcc-toolchain=<something> to allow it to find the correct binutils for cross compilation. These flags had been added to KBUILD_CFLAGS, but the vdso code uses CC and not KBUILD_CFLAGS (for various reasons) which breaks clang's ability to find the correct linker when cross compiling.
Most of the time this goes unnoticed because the host linker is new enough to work anyway, or is incompatible and skipped, but this cannot be reliably assumed.
This change alters the vdso makefile to just use LD directly, which bypasses clang and thus the searching problem. The makefile will just use ${CROSS_COMPILE}ld instead, which is always what we want. This matches the method used to link vmlinux.
This drops references to DISABLE_LTO; this option doesn't seem to be set anywhere, and not knowing what its possible values are, it's not clear how to convert it from CC to LD flag.
Signed-off-by: Alistair Strachan astrachan@google.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Andy Lutomirski luto@kernel.org Cc: "H. Peter Anvin" hpa@zytor.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: kernel-team@android.com Cc: joel@joelfernandes.org Cc: Andi Kleen andi.kleen@intel.com Link: https://lkml.kernel.org/r/20180803173931.117515-1-astrachan@google.com Signed-off-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org
arch/x86/entry/vdso/Makefile | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile index 0a550dc5c525..0defcc939ab4 100644 --- a/arch/x86/entry/vdso/Makefile +++ b/arch/x86/entry/vdso/Makefile @@ -48,10 +48,8 @@ targets += $(vdso_img_sodbg) export CPPFLAGS_vdso.lds += -P -C -VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
-Wl,--no-undefined \
-Wl,-z,max-page-size=4096 -Wl,-z,common-page-
size=4096 \
$(DISABLE_LTO)
+VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no- undefined \
-z max-page-size=4096 -z common-page-size=4096
$(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE $(call if_changed,vdso) @@ -103,10 +101,8 @@ CFLAGS_REMOVE_vvar.o = -pg # CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds) -VDSO_LDFLAGS_vdsox32.lds = -Wl,-m,elf32_x86_64 \
-Wl,-soname=linux-vdso.so.1 \
-Wl,-z,max-page-size=4096 \
-Wl,-z,common-page-size=4096
+VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \
-z max-page-size=4096 -z common-page-
size=4096 # 64-bit objects to re-brand as x32 vobjs64-for-x32 := $(filter-out $(vobjs-nox32),$(vobjs-y)) @@ -134,7 +130,7 @@ $(obj)/vdsox32.so.dbg: $(src)/vdsox32.lds $(vobjx32s) FORCE $(call if_changed,vdso) CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds) -VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux- gate.so.1 +VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -soname linux-gate.so.1 # This makes sure the $(obj) subdirectory exists even though vdso32/ # is not a kbuild sub-make subdirectory. @@ -180,13 +176,13 @@ $(obj)/vdso32.so.dbg: FORCE \ # The DSO images are built using a special linker script. # quiet_cmd_vdso = VDSO $@
cmd_vdso = $(CC) -nostdlib -o $@ \
cmd_vdso = $(LD) -nostdlib -o $@ \ $(VDSO_LDFLAGS) $(VDSO_LDFLAGS_$(filter
%.lds,$(^F))) \
-Wl,-T,$(filter %.lds,$^) $(filter %.o,$^) && \
sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'-T $(filter %.lds,$^) $(filter %.o,$^) && \
-VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash- style=both) \
- $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic
$(LTO_CFLAGS) +VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
- $(call ld-option, --build-id) -Bsymbolic
GCOV_PROFILE := n #
On Fri, 2019-04-26 at 05:48 -0700, Nathan Chancellor wrote:
On Fri, Apr 26, 2019 at 11:41:30AM +0000, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
On Mon, 2019-04-15 at 20:58 +0200, Greg Kroah-Hartman wrote:
commit 379d98ddf41344273d9718556f761420f4dc80b3 upstream.
Hi,
With this patch in 4.14.112 build-id is now missing in vdso32.so:
$ file arch/x86/entry/vdso/vdso*so* arch/x86/entry/vdso/vdso32.so: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, stripped arch/x86/entry/vdso/vdso32.so.dbg: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, with debug_info, not stripped arch/x86/entry/vdso/vdso64.so: ELF 64-bit LSB pie executable, x86- 64, version 1 (SYSV), dynamically linked, BuildID[sha1]=d80730a5b561a3161e488a369d1c76c250b584b4, stripped arch/x86/entry/vdso/vdso64.so.dbg: ELF 64-bit LSB pie executable, x86- 64, version 1 (SYSV), dynamically linked, BuildID[sha1]=d80730a5b561a3161e488a369d1c76c250b584b4, with debug_info, not stripped
Based on quick check, "$(call ld-option, --build-id)" fails due to some 32/64 bit mismatch, so the --build-id linker flag is not used when linking vdso32.so
Perhaps scripts/Kbuild.include is missing some change in 4.14.y to make this work properly.
Hi Tommi,
This appears to be fixed by commit 0294e6f4a000 ("kbuild: simplify ld-option implementation") upstream. Could you test the attached backport and make sure everything works on your end? Assuming that it does, I will test the other stable releases and see if this is needed and send those backports along.
Yes this patch fixes it. Many thanks!
-Tommi
Thanks and sorry for the trouble! Nathan
-Tommi
The vdso{32,64}.so can fail to link with CC=clang when clang tries to find a suitable GCC toolchain to link these libraries with.
/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o: access beyond end of merged section (782)
This happens because the host environment leaked into the cross compiler environment due to the way clang searches for suitable GCC toolchains.
Clang is a retargetable compiler, and each invocation of it must provide --target=<something> --gcc-toolchain=<something> to allow it to find the correct binutils for cross compilation. These flags had been added to KBUILD_CFLAGS, but the vdso code uses CC and not KBUILD_CFLAGS (for various reasons) which breaks clang's ability to find the correct linker when cross compiling.
Most of the time this goes unnoticed because the host linker is new enough to work anyway, or is incompatible and skipped, but this cannot be reliably assumed.
This change alters the vdso makefile to just use LD directly, which bypasses clang and thus the searching problem. The makefile will just use ${CROSS_COMPILE}ld instead, which is always what we want. This matches the method used to link vmlinux.
This drops references to DISABLE_LTO; this option doesn't seem to be set anywhere, and not knowing what its possible values are, it's not clear how to convert it from CC to LD flag.
Signed-off-by: Alistair Strachan astrachan@google.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Andy Lutomirski luto@kernel.org Cc: "H. Peter Anvin" hpa@zytor.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: kernel-team@android.com Cc: joel@joelfernandes.org Cc: Andi Kleen andi.kleen@intel.com Link: https://lkml.kernel.org/r/20180803173931.117515-1-astrachan@google.com Signed-off-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org
arch/x86/entry/vdso/Makefile | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 0a550dc5c525..0defcc939ab4 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -48,10 +48,8 @@ targets += $(vdso_img_sodbg)

 export CPPFLAGS_vdso.lds += -P -C

-VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
-			-Wl,--no-undefined \
-			-Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 \
-			$(DISABLE_LTO)
+VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no-undefined \
+			-z max-page-size=4096 -z common-page-size=4096

 $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
 	$(call if_changed,vdso)
@@ -103,10 +101,8 @@ CFLAGS_REMOVE_vvar.o = -pg
 #

 CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdsox32.lds = -Wl,-m,elf32_x86_64 \
-			   -Wl,-soname=linux-vdso.so.1 \
-			   -Wl,-z,max-page-size=4096 \
-			   -Wl,-z,common-page-size=4096
+VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \
+			   -z max-page-size=4096 -z common-page-size=4096

 # 64-bit objects to re-brand as x32
 vobjs64-for-x32 := $(filter-out $(vobjs-nox32),$(vobjs-y))
@@ -134,7 +130,7 @@ $(obj)/vdsox32.so.dbg: $(src)/vdsox32.lds $(vobjx32s) FORCE
 	$(call if_changed,vdso)

 CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1
+VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -soname linux-gate.so.1

 # This makes sure the $(obj) subdirectory exists even though vdso32/
 # is not a kbuild sub-make subdirectory.
@@ -180,13 +176,13 @@ $(obj)/vdso32.so.dbg: FORCE \
 # The DSO images are built using a special linker script.
 #
 quiet_cmd_vdso = VDSO    $@
-      cmd_vdso = $(CC) -nostdlib -o $@ \
+      cmd_vdso = $(LD) -nostdlib -o $@ \
		       $(VDSO_LDFLAGS) $(VDSO_LDFLAGS_$(filter %.lds,$(^F))) \
-		       -Wl,-T,$(filter %.lds,$^) $(filter %.o,$^) && \
+		       -T $(filter %.lds,$^) $(filter %.o,$^) && \
		 sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'

-VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=both) \
-	$(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS)
+VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \
+	$(call ld-option, --build-id) -Bsymbolic
 GCOV_PROFILE := n

 #
On Fri, Apr 26, 2019 at 01:23:17PM +0000, Rantala, Tommi T. (Nokia - FI/Espoo) wrote:
Yes this patch fixes it. Many thanks!
Thanks for verifying!
Greg, attached are backports for that commit for 4.4, 4.9, and 4.14. It appeared in 4.16 so it is not needed with a newer version.
Thanks, Nathan
On Fri, Apr 26, 2019 at 06:34:22AM -0700, Nathan Chancellor wrote:
Greg, attached are backports for that commit for 4.4, 4.9, and 4.14. It appeared in 4.16 so it is not needed with a newer version.
Now queued up, thanks!
greg k-h
commit ac3e233d29f7f77f28243af0132057d378d3ea58 upstream.
GNU linker's -z common-page-size's default value is based on the target architecture. arch/x86/entry/vdso/Makefile sets it to the architecture default, which is implicit and redundant. Drop it.
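If in doubt, the effect (or non-effect) of the flag can be checked on the built objects; for instance (illustrative commands), the LOAD segment alignment of the DSO is visible in its program headers, and the linker's built-in page-size defaults are part of its default script:

 $ readelf -lW arch/x86/entry/vdso/vdso64.so.dbg | grep LOAD
 $ ld --verbose | grep -i pagesize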
Fixes: 2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu") Reported-by: Dmitry Golovin dima@golovin.in Reported-by: Bill Wendling morbo@google.com Suggested-by: Dmitry Golovin dima@golovin.in Suggested-by: Rui Ueyama ruiu@google.com Signed-off-by: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Andy Lutomirski luto@kernel.org Cc: Andi Kleen andi@firstfloor.org Cc: Fangrui Song maskray@google.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Thomas Gleixner tglx@linutronix.de Cc: x86-ml x86@kernel.org Link: https://lkml.kernel.org/r/20181206191231.192355-1-ndesaulniers@google.com Link: https://bugs.llvm.org/show_bug.cgi?id=38774 Link: https://github.com/ClangBuiltLinux/linux/issues/31 Signed-off-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/entry/vdso/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile index 0defcc939ab4..839015f1b0de 100644 --- a/arch/x86/entry/vdso/Makefile +++ b/arch/x86/entry/vdso/Makefile @@ -49,7 +49,7 @@ targets += $(vdso_img_sodbg) export CPPFLAGS_vdso.lds += -P -C
VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no-undefined \ - -z max-page-size=4096 -z common-page-size=4096 + -z max-page-size=4096
$(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE $(call if_changed,vdso) @@ -102,7 +102,7 @@ CFLAGS_REMOVE_vvar.o = -pg
CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds) VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \ - -z max-page-size=4096 -z common-page-size=4096 + -z max-page-size=4096
# 64-bit objects to re-brand as x32 vobjs64-for-x32 := $(filter-out $(vobjs-nox32),$(vobjs-y))
[ Upstream commit 5f074f3e192f10c9fade898b9b3b8812e3d83342 ]
A recent optimization in Clang (r355672) lowers comparisons of the return value of memcmp against zero to comparisons of the return value of bcmp against zero. This helps some platforms that implement bcmp more efficiently than memcmp. glibc simply aliases bcmp to memcmp, but an optimized implementation is in the works.
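A minimal example of the kind of call the lowering applies to (illustrative code, not taken from the kernel):

 /* With the Clang optimization described above, an equality-only use of
  * memcmp() like this may be emitted as a call to bcmp() instead, which is
  * why the symbol now has to exist.
  */
 #include <string.h>

 int keys_equal(const void *a, const void *b, size_t len)
 {
 	return memcmp(a, b, len) == 0;
 }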
This results in linkage failures for all targets with Clang due to the undefined symbol. For now, just implement bcmp as a tail call to memcmp to unbreak the build. This routine can be further optimized in the future.
Other ideas discussed:
* A weak alias was discussed, but breaks for architectures that define their own implementations of memcmp since aliases to declarations are not permitted (only definitions). Arch-specific memcmp implementations typically declare memcmp in C headers, but implement them in assembly.
* -ffreestanding also is used sporadically throughout the kernel.
* -fno-builtin-bcmp doesn't work when doing LTO.
Link: https://bugs.llvm.org/show_bug.cgi?id=41035 Link: https://code.woboq.org/userspace/glibc/string/memcmp.c.html#bcmp Link: https://github.com/llvm/llvm-project/commit/8e16d73346f8091461319a7dfc4ddd18... Link: https://github.com/ClangBuiltLinux/linux/issues/416 Link: http://lkml.kernel.org/r/20190313211335.165605-1-ndesaulniers@google.com Signed-off-by: Nick Desaulniers ndesaulniers@google.com Reported-by: Nathan Chancellor natechancellor@gmail.com Reported-by: Adhemerval Zanella adhemerval.zanella@linaro.org Suggested-by: Arnd Bergmann arnd@arndb.de Suggested-by: James Y Knight jyknight@google.com Suggested-by: Masahiro Yamada yamada.masahiro@socionext.com Suggested-by: Nathan Chancellor natechancellor@gmail.com Suggested-by: Rasmus Villemoes linux@rasmusvillemoes.dk Acked-by: Steven Rostedt (VMware) rostedt@goodmis.org Reviewed-by: Nathan Chancellor natechancellor@gmail.com Tested-by: Nathan Chancellor natechancellor@gmail.com Reviewed-by: Masahiro Yamada yamada.masahiro@socionext.com Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: David Laight David.Laight@ACULAB.COM Cc: Rasmus Villemoes linux@rasmusvillemoes.dk Cc: Namhyung Kim namhyung@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Dan Williams dan.j.williams@intel.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/string.h | 3 +++ lib/string.c | 20 ++++++++++++++++++++ 2 files changed, 23 insertions(+)
diff --git a/include/linux/string.h b/include/linux/string.h index 96115bf561b4..3d43329c20be 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -142,6 +142,9 @@ extern void * memscan(void *,int,__kernel_size_t); #ifndef __HAVE_ARCH_MEMCMP extern int memcmp(const void *,const void *,__kernel_size_t); #endif +#ifndef __HAVE_ARCH_BCMP +extern int bcmp(const void *,const void *,__kernel_size_t); +#endif #ifndef __HAVE_ARCH_MEMCHR extern void * memchr(const void *,int,__kernel_size_t); #endif diff --git a/lib/string.c b/lib/string.c index 5e8d410a93df..1530643edf00 100644 --- a/lib/string.c +++ b/lib/string.c @@ -865,6 +865,26 @@ __visible int memcmp(const void *cs, const void *ct, size_t count) EXPORT_SYMBOL(memcmp); #endif
+#ifndef __HAVE_ARCH_BCMP +/** + * bcmp - returns 0 if and only if the buffers have identical contents. + * @a: pointer to first buffer. + * @b: pointer to second buffer. + * @len: size of buffers. + * + * The sign or magnitude of a non-zero return value has no particular + * meaning, and architectures may implement their own more efficient bcmp(). So + * while this particular implementation is a simple (tail) call to memcmp, do + * not rely on anything but whether the return value is zero or non-zero. + */ +#undef bcmp +int bcmp(const void *a, const void *b, size_t len) +{ + return memcmp(a, b, len); +} +EXPORT_SYMBOL(bcmp); +#endif + #ifndef __HAVE_ARCH_MEMSCAN /** * memscan - Find a character in an area of memory.
commit 293edc27f8bc8a44978e9e95902b07b74f1c7523 upstream
This reverts commit c5f39d07860c ("staging: ccree: fix leak of import() after init()") and commit aece09024414 ("staging: ccree: Uninitialized return in ssi_ahash_import()").
This is the wrong solution and ends up relying on uninitialized memory, although it was not obvious to me at the time.
Cc: stable@vger.kernel.org Signed-off-by: Gilad Ben-Yossef gilad@benyossef.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/ccree/ssi_hash.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/drivers/staging/ccree/ssi_hash.c b/drivers/staging/ccree/ssi_hash.c index e266a70a1b32..13291aeaf350 100644 --- a/drivers/staging/ccree/ssi_hash.c +++ b/drivers/staging/ccree/ssi_hash.c @@ -1781,7 +1781,7 @@ static int ssi_ahash_import(struct ahash_request *req, const void *in) struct device *dev = &ctx->drvdata->plat_dev->dev; struct ahash_req_ctx *state = ahash_request_ctx(req); u32 tmp; - int rc = 0; + int rc;
memcpy(&tmp, in, sizeof(u32)); if (tmp != CC_EXPORT_MAGIC) { @@ -1790,12 +1790,9 @@ static int ssi_ahash_import(struct ahash_request *req, const void *in) } in += sizeof(u32);
- /* call init() to allocate bufs if the user hasn't */ - if (!state->digest_buff) { - rc = ssi_hash_init(state, ctx); - if (rc) - goto out; - } + rc = ssi_hash_init(state, ctx); + if (rc) + goto out;
dma_sync_single_for_cpu(dev, state->digest_buff_dma_addr, ctx->inter_digestsize, DMA_BIDIRECTIONAL);
[ Upstream commit c8a43c18a97845e7f94ed7d181c11f41964976a2 ]
When KASLR is enabled (CONFIG_RANDOMIZE_BASE=y), the top 4K of kernel virtual address space may be mapped to physical addresses despite being reserved for ERR_PTR values.
Fix the randomization of the linear region so that we avoid mapping the last page of the virtual address space.
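Roughly worked through (assuming the seed takes its 16-bit maximum of 0xffff and, say, 16 ARM64_MEMSTART_ALIGN units of slack): the old "range / ARM64_MEMSTART_ALIGN + 1" gives 17, and (17 * 0xffff) >> 16 = 16, i.e. the linear map could be shifted by the entire slack so that it reached the very end of the address space; with the fix the multiplier is at most (16 * 0xffff) >> 16 = 15, which keeps the reserved last page out of the mapping.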
Cc: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: liyueyi liyueyi@live.com [will: rewrote commit message; merged in suggestion from Ard] Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Sasha Levin (Microsoft) sashal@kernel.org --- arch/arm64/mm/init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index caa295cd5d09..9e6c822d458d 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -447,7 +447,7 @@ void __init arm64_memblock_init(void) * memory spans, randomize the linear region as well. */ if (memstart_offset_seed > 0 && range >= ARM64_MEMSTART_ALIGN) { - range = range / ARM64_MEMSTART_ALIGN + 1; + range /= ARM64_MEMSTART_ALIGN; memstart_addr -= ARM64_MEMSTART_ALIGN * ((range * memstart_offset_seed) >> 16); }
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit c7084edc3f6d67750f50d4183134c4fb5712a5c8 upstream.
The n_r3964 line discipline driver was written in a different time, when SMP machines were rare, and users were trusted to do the right thing. Since then, the world has moved on but not this code, it has stayed rooted in the past with its lovely hand-crafted list structures and loads of "interesting" race conditions all over the place.
After attempting to clean up most of the issues, I just gave up and am now marking the driver as BROKEN so that hopefully someone who has this hardware will show up out of the woodwork (I know you are out there!) and will help with debugging a raft of changes that I had laying around for the code, but was too afraid to commit as odds are they would break things.
Many thanks to Jann and Linus for pointing out the initial problems in this codebase, as well as many reviews of my attempts to fix the issues. It was a case of whack-a-mole, and as you can see, the mole won.
Reported-by: Jann Horn jannh@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
--- drivers/char/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/char/Kconfig +++ b/drivers/char/Kconfig @@ -380,7 +380,7 @@ config XILINX_HWICAP
config R3964 tristate "Siemens R3964 line discipline" - depends on TTY + depends on TTY && BROKEN ---help--- This driver allows synchronous communication with devices using the Siemens R3964 packet protocol. Unless you are dealing with special
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit 7c0cca7c847e6e019d67b7d793efbbe3b947d004 upstream.
By default, the kernel will automatically load the module of any line discipline that is asked for. As this sometimes isn't the safest thing to do, provide a sysctl to disable this feature.
By default, we set this to 'y' as that is the historical way that Linux has worked, and we do not want to break working systems. But in the future, perhaps this can default to 'n' to prevent this functionality.
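For example, the knob added by this patch can be flipped at runtime (the path follows from the sysctl tables registered below):

 # cat /proc/sys/dev/tty/ldisc_autoload        -> 1 (autoload allowed, the default)
 # echo 0 > /proc/sys/dev/tty/ldisc_autoload   -> now only CAP_SYS_MODULE tasks can trigger autoload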
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/Kconfig | 24 ++++++++++++++++++++++++ drivers/tty/tty_io.c | 3 +++ drivers/tty/tty_ldisc.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 74 insertions(+)
--- a/drivers/tty/Kconfig +++ b/drivers/tty/Kconfig @@ -467,4 +467,28 @@ config VCC depends on SUN_LDOMS help Support for Sun logical domain consoles. + +config LDISC_AUTOLOAD + bool "Automatically load TTY Line Disciplines" + default y + help + Historically the kernel has always automatically loaded any + line discipline that is in a kernel module when a user asks + for it to be loaded with the TIOCSETD ioctl, or through other + means. This is not always the best thing to do on systems + where you know you will not be using some of the more + "ancient" line disciplines, so prevent the kernel from doing + this unless the request is coming from a process with the + CAP_SYS_MODULE permissions. + + Say 'Y' here if you trust your userspace users to do the right + thing, or if you have only provided the line disciplines that + you know you will be using, or if you wish to continue to use + the traditional method of on-demand loading of these modules + by any user. + + This functionality can be changed at runtime with the + dev.tty.ldisc_autoload sysctl, this configuration option will + only set the default value of this functionality. + endif # TTY --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -511,6 +511,8 @@ static const struct file_operations hung static DEFINE_SPINLOCK(redirect_lock); static struct file *redirect;
+extern void tty_sysctl_init(void); + /** * tty_wakeup - request more data * @tty: terminal @@ -3332,6 +3334,7 @@ void console_sysfs_notify(void) */ int __init tty_init(void) { + tty_sysctl_init(); cdev_init(&tty_cdev, &tty_fops); if (cdev_add(&tty_cdev, MKDEV(TTYAUX_MAJOR, 0), 1) || register_chrdev_region(MKDEV(TTYAUX_MAJOR, 0), 1, "/dev/tty") < 0) --- a/drivers/tty/tty_ldisc.c +++ b/drivers/tty/tty_ldisc.c @@ -155,6 +155,13 @@ static void put_ldops(struct tty_ldisc_o * takes tty_ldiscs_lock to guard against ldisc races */
+#if defined(CONFIG_LDISC_AUTOLOAD) + #define INITIAL_AUTOLOAD_STATE 1 +#else + #define INITIAL_AUTOLOAD_STATE 0 +#endif +static int tty_ldisc_autoload = INITIAL_AUTOLOAD_STATE; + static struct tty_ldisc *tty_ldisc_get(struct tty_struct *tty, int disc) { struct tty_ldisc *ld; @@ -169,6 +176,8 @@ static struct tty_ldisc *tty_ldisc_get(s */ ldops = get_ldops(disc); if (IS_ERR(ldops)) { + if (!capable(CAP_SYS_MODULE) && !tty_ldisc_autoload) + return ERR_PTR(-EPERM); request_module("tty-ldisc-%d", disc); ldops = get_ldops(disc); if (IS_ERR(ldops)) @@ -841,3 +850,41 @@ void tty_ldisc_deinit(struct tty_struct tty_ldisc_put(tty->ldisc); tty->ldisc = NULL; } + +static int zero; +static int one = 1; +static struct ctl_table tty_table[] = { + { + .procname = "ldisc_autoload", + .data = &tty_ldisc_autoload, + .maxlen = sizeof(tty_ldisc_autoload), + .mode = 0644, + .proc_handler = proc_dointvec, + .extra1 = &zero, + .extra2 = &one, + }, + { } +}; + +static struct ctl_table tty_dir_table[] = { + { + .procname = "tty", + .mode = 0555, + .child = tty_table, + }, + { } +}; + +static struct ctl_table tty_root_table[] = { + { + .procname = "dev", + .mode = 0555, + .child = tty_dir_table, + }, + { } +}; + +void tty_sysctl_init(void) +{ + register_sysctl_table(tty_root_table); +}
From: Junwei Hu hujunwei4@huawei.com
[ Upstream commit ef0efcd3bd3fd0589732b67fb586ffd3c8705806 ]
At the beginning of the ip6_fragment function, the prevhdr pointer is obtained from ip6_find_1stfragopt(). However, all pointers into the skb header may change when skb_checksum_help() is called for an skb with skb->ip_summed == CHECKSUM_PARTIAL: the prevhdr pointer is left dangling if it is not reloaded after __skb_linearize() runs inside skb_checksum_help().
Here, I add a variable, nexthdr_offset, to record the offset instead, which does not change even after __skb_linearize() is called.
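The same save-an-offset/recompute-the-pointer idiom, sketched outside the networking code with made-up names (illustration only, error handling omitted):

 /* A pointer into a buffer is invalidated when the buffer is reallocated;
  * an offset from the (re-fetched) base is not.
  */
 #include <stdlib.h>

 struct buf { unsigned char *base; size_t len; };

 static void grow(struct buf *b, size_t extra)
 {
 	b->base = realloc(b->base, b->len + extra);  /* may move the data */
 	b->len += extra;
 }

 unsigned char *keep_header_valid(struct buf *b, unsigned char *hdr)
 {
 	size_t off = hdr - b->base;   /* remember the position, not the pointer */

 	grow(b, 128);                 /* anything that may relocate the buffer  */

 	return b->base + off;         /* recompute from the (possibly new) base */
 }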
Fixes: 405c92f7a541 ("ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment") Signed-off-by: Junwei Hu hujunwei4@huawei.com Reported-by: Wenhao Zhang zhangwenhao8@huawei.com Reported-by: syzbot+e8ce541d095e486074fc@syzkaller.appspotmail.com Reviewed-by: Zhiqiang Liu liuzhiqiang26@huawei.com Acked-by: Martin KaFai Lau kafai@fb.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_output.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -611,7 +611,7 @@ int ip6_fragment(struct net *net, struct inet6_sk(skb->sk) : NULL; struct ipv6hdr *tmp_hdr; struct frag_hdr *fh; - unsigned int mtu, hlen, left, len; + unsigned int mtu, hlen, left, len, nexthdr_offset; int hroom, troom; __be32 frag_id; int ptr, offset = 0, err = 0; @@ -622,6 +622,7 @@ int ip6_fragment(struct net *net, struct goto fail; hlen = err; nexthdr = *prevhdr; + nexthdr_offset = prevhdr - skb_network_header(skb);
mtu = ip6_skb_dst_mtu(skb);
@@ -656,6 +657,7 @@ int ip6_fragment(struct net *net, struct (err = skb_checksum_help(skb))) goto fail;
+ prevhdr = skb_network_header(skb) + nexthdr_offset; hroom = LL_RESERVED_SPACE(rt->dst.dev); if (skb_has_frag_list(skb)) { unsigned int first_len = skb_pagelen(skb);
From: Lorenzo Bianconi lorenzo.bianconi@redhat.com
[ Upstream commit bb9bd814ebf04f579be466ba61fc922625508807 ]
ipip6 tunnels run iptunnel_pull_header on received skbs. This can trigger the following use-after-free on the iph pointer, since the packet will be 'uncloned' by pskb_expand_head if it is a cloned gso skb (e.g. if the packet has been sent through a veth device)
[ 706.369655] BUG: KASAN: use-after-free in ipip6_rcv+0x1678/0x16e0 [sit] [ 706.449056] Read of size 1 at addr ffffe01b6bd855f5 by task ksoftirqd/1/= [ 706.669494] Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 08/19/2016 [ 706.771839] Call trace: [ 706.801159] dump_backtrace+0x0/0x2f8 [ 706.845079] show_stack+0x24/0x30 [ 706.884833] dump_stack+0xe0/0x11c [ 706.925629] print_address_description+0x68/0x260 [ 706.982070] kasan_report+0x178/0x340 [ 707.025995] __asan_report_load1_noabort+0x30/0x40 [ 707.083481] ipip6_rcv+0x1678/0x16e0 [sit] [ 707.132623] tunnel64_rcv+0xd4/0x200 [tunnel4] [ 707.185940] ip_local_deliver_finish+0x3b8/0x988 [ 707.241338] ip_local_deliver+0x144/0x470 [ 707.289436] ip_rcv_finish+0x43c/0x14b0 [ 707.335447] ip_rcv+0x628/0x1138 [ 707.374151] __netif_receive_skb_core+0x1670/0x2600 [ 707.432680] __netif_receive_skb+0x28/0x190 [ 707.482859] process_backlog+0x1d0/0x610 [ 707.529913] net_rx_action+0x37c/0xf68 [ 707.574882] __do_softirq+0x288/0x1018 [ 707.619852] run_ksoftirqd+0x70/0xa8 [ 707.662734] smpboot_thread_fn+0x3a4/0x9e8 [ 707.711875] kthread+0x2c8/0x350 [ 707.750583] ret_from_fork+0x10/0x18
[ 707.811302] Allocated by task 16982: [ 707.854182] kasan_kmalloc.part.1+0x40/0x108 [ 707.905405] kasan_kmalloc+0xb4/0xc8 [ 707.948291] kasan_slab_alloc+0x14/0x20 [ 707.994309] __kmalloc_node_track_caller+0x158/0x5e0 [ 708.053902] __kmalloc_reserve.isra.8+0x54/0xe0 [ 708.108280] __alloc_skb+0xd8/0x400 [ 708.150139] sk_stream_alloc_skb+0xa4/0x638 [ 708.200346] tcp_sendmsg_locked+0x818/0x2b90 [ 708.251581] tcp_sendmsg+0x40/0x60 [ 708.292376] inet_sendmsg+0xf0/0x520 [ 708.335259] sock_sendmsg+0xac/0xf8 [ 708.377096] sock_write_iter+0x1c0/0x2c0 [ 708.424154] new_sync_write+0x358/0x4a8 [ 708.470162] __vfs_write+0xc4/0xf8 [ 708.510950] vfs_write+0x12c/0x3d0 [ 708.551739] ksys_write+0xcc/0x178 [ 708.592533] __arm64_sys_write+0x70/0xa0 [ 708.639593] el0_svc_handler+0x13c/0x298 [ 708.686646] el0_svc+0x8/0xc
[ 708.739019] Freed by task 17: [ 708.774597] __kasan_slab_free+0x114/0x228 [ 708.823736] kasan_slab_free+0x10/0x18 [ 708.868703] kfree+0x100/0x3d8 [ 708.905320] skb_free_head+0x7c/0x98 [ 708.948204] skb_release_data+0x320/0x490 [ 708.996301] pskb_expand_head+0x60c/0x970 [ 709.044399] __iptunnel_pull_header+0x3b8/0x5d0 [ 709.098770] ipip6_rcv+0x41c/0x16e0 [sit] [ 709.146873] tunnel64_rcv+0xd4/0x200 [tunnel4] [ 709.200195] ip_local_deliver_finish+0x3b8/0x988 [ 709.255596] ip_local_deliver+0x144/0x470 [ 709.303692] ip_rcv_finish+0x43c/0x14b0 [ 709.349705] ip_rcv+0x628/0x1138 [ 709.388413] __netif_receive_skb_core+0x1670/0x2600 [ 709.446943] __netif_receive_skb+0x28/0x190 [ 709.497120] process_backlog+0x1d0/0x610 [ 709.544169] net_rx_action+0x37c/0xf68 [ 709.589131] __do_softirq+0x288/0x1018
[ 709.651938] The buggy address belongs to the object at ffffe01b6bd85580 which belongs to the cache kmalloc-1024 of size 1024 [ 709.804356] The buggy address is located 117 bytes inside of 1024-byte region [ffffe01b6bd85580, ffffe01b6bd85980) [ 709.946340] The buggy address belongs to the page: [ 710.003824] page:ffff7ff806daf600 count:1 mapcount:0 mapping:ffffe01c4001f600 index:0x0 [ 710.099914] flags: 0xfffff8000000100(slab) [ 710.149059] raw: 0fffff8000000100 dead000000000100 dead000000000200 ffffe01c4001f600 [ 710.242011] raw: 0000000000000000 0000000000380038 00000001ffffffff 0000000000000000 [ 710.334966] page dumped because: kasan: bad access detected
Fix it by resetting the iph pointer after iptunnel_pull_header.
Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap") Tested-by: Jianlin Shi jishi@redhat.com Signed-off-by: Lorenzo Bianconi lorenzo.bianconi@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/sit.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -661,6 +661,10 @@ static int ipip6_rcv(struct sk_buff *skb !net_eq(tunnel->net, dev_net(tunnel->dev)))) goto out;
+ /* skb can be uncloned in iptunnel_pull_header, so + * old iph is no longer valid + */ + iph = (const struct iphdr *)skb_mac_header(skb); err = IP_ECN_decapsulate(iph, skb); if (unlikely(err)) { if (log_ecn_error)
From: Jiri Slaby jslaby@suse.cz
[ Upstream commit 3c446e6f96997f2a95bf0037ef463802162d2323 ]
When kcm is loaded while many processes try to create a KCM socket, a crash occurs:

 BUG: unable to handle kernel NULL pointer dereference at 000000000000000e
 IP: mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
 PGD 8000000016ef2067 P4D 8000000016ef2067 PUD 3d6e9067 PMD 0
 Oops: 0002 [#1] SMP KASAN PTI
 CPU: 0 PID: 7005 Comm: syz-executor.5 Not tainted 4.12.14-396-default #1 SLE15-SP1 (unreleased)
 RIP: 0010:mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
 RSP: 0018:ffff88000d487a00 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 000000000000000e RCX: 1ffff100082b0719
 ...
 CR2: 000000000000000e CR3: 000000004b1bc003 CR4: 0000000000060ef0
 Call Trace:
  kcm_create+0x600/0xbf0 [kcm]
  __sock_create+0x324/0x750 net/socket.c:1272
 ...
This is due to a race between sock_create and an unfinished register_pernet_device: kcm_create tries to do net_generic(net, kcm_net_id), but kcm_net_id is not initialized yet.
So switch the order of the two to close the race.
This can be reproduced with multiple processes doing socket(PF_KCM, ...) and one process doing module removal.
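A reproducer is essentially just that, hammered in a loop; a minimal sketch (error handling omitted; the PF_KCM fallback define is only there for older userspace headers):

 #include <sys/socket.h>
 #include <unistd.h>

 #ifndef PF_KCM
 #define PF_KCM 41
 #endif

 int main(void)
 {
 	for (;;) {
 		int fd = socket(PF_KCM, SOCK_DGRAM, 0);
 		if (fd >= 0)
 			close(fd);
 	}
 }

Run many instances of this while loading and unloading the kcm module to hit the window.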
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reviewed-by: Michal Kubecek mkubecek@suse.cz Signed-off-by: Jiri Slaby jslaby@suse.cz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/kcm/kcmsock.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/net/kcm/kcmsock.c +++ b/net/kcm/kcmsock.c @@ -2059,14 +2059,14 @@ static int __init kcm_init(void) if (err) goto fail;
- err = sock_register(&kcm_family_ops); - if (err) - goto sock_register_fail; - err = register_pernet_device(&kcm_net_ops); if (err) goto net_ops_fail;
+ err = sock_register(&kcm_family_ops); + if (err) + goto sock_register_fail; + err = kcm_proc_init(); if (err) goto proc_init_fail; @@ -2074,12 +2074,12 @@ static int __init kcm_init(void) return 0;
proc_init_fail: - unregister_pernet_device(&kcm_net_ops); - -net_ops_fail: sock_unregister(PF_KCM);
sock_register_fail: + unregister_pernet_device(&kcm_net_ops); + +net_ops_fail: proto_unregister(&kcm_proto);
fail: @@ -2095,8 +2095,8 @@ fail: static void __exit kcm_exit(void) { kcm_proc_exit(); - unregister_pernet_device(&kcm_net_ops); sock_unregister(PF_KCM); + unregister_pernet_device(&kcm_net_ops); proto_unregister(&kcm_proto); destroy_workqueue(kcm_wq);
From: Steffen Klassert steffen.klassert@secunet.com
[ Upstream commit 0ab03f353d3613ea49d1f924faf98559003670a8 ]
Currently we may incorrectly merge a received GSO packet, or a packet with a frag_list, into a packet sitting in the gro_hash list. skb_segment() may then crash because its assumptions about the skb layout are not met. The correct behaviour would be to flush the packet in the gro_hash list and send the received GSO packet directly afterwards. Commit d61d072e87c8e ("net-gro: avoid reorders") sets NAPI_GRO_CB(skb)->flush in this case, but this flag is not checked before merging. This patch makes sure to check it and to not merge in that case.
Fixes: d61d072e87c8e ("net-gro: avoid reorders") Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/skbuff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3808,7 +3808,7 @@ int skb_gro_receive(struct sk_buff **hea struct sk_buff *lp, *p = *head; unsigned int delta_truesize;
- if (unlikely(p->len + len >= 65536)) + if (unlikely(p->len + len >= 65536 || NAPI_GRO_CB(skb)->flush)) return -E2BIG;
lp = NAPI_GRO_CB(p)->last;
From: Artemy Kovalyov artemyko@mellanox.com
[ Upstream commit e8b26b2135dedc0284490bfeac06dfc4418d0105 ]
Delete initialization of high order entries in mr cache to decrease initial memory footprint. When required, the administrator can populate the entries with memory keys via the /sys interface.
This approach is very helpful to significantly reduce the per HW function memory footprint in virtualization environments such as SRIOV.
Fixes: 9603b61de1ee ("mlx5: Move pci device handling from mlx5_ib to mlx5_core") Signed-off-by: Artemy Kovalyov artemyko@mellanox.com Signed-off-by: Moni Shoua monis@mellanox.com Signed-off-by: Leon Romanovsky leonro@mellanox.com Reported-by: Shalom Toledo shalomt@mellanox.com Acked-by: Or Gerlitz ogerlitz@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/main.c | 20 -------------------- 1 file changed, 20 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -155,26 +155,6 @@ static struct mlx5_profile profile[] = { .size = 8, .limit = 4 }, - .mr_cache[16] = { - .size = 8, - .limit = 4 - }, - .mr_cache[17] = { - .size = 8, - .limit = 4 - }, - .mr_cache[18] = { - .size = 8, - .limit = 4 - }, - .mr_cache[19] = { - .size = 4, - .limit = 2 - }, - .mr_cache[20] = { - .size = 4, - .limit = 2 - }, }, };
From: Mao Wenan maowenan@huawei.com
[ Upstream commit cb66ddd156203daefb8d71158036b27b0e2caf63 ]
When cleaning up a net namespace, rds_tcp_exit_net() calls rds_tcp_kill_sock(). If t_sock is NULL, it will not call rds_conn_destroy(), rds_conn_path_destroy() and rds_tcp_conn_free() to free the connection, and the worker cp_conn_w is not stopped; afterwards the net is freed in net_drop_ns(), while cp_conn_w's rds_connect_worker() will still call rds_tcp_conn_path_connect() and reference the 'net' which has already been freed.
In rds_tcp_conn_path_connect(), rds_tcp_set_callbacks() sets t_sock = sock before sock->ops->connect, but if connect() fails it calls rds_tcp_restore_callbacks() and sets t_sock = NULL. If connect always fails, rds_connect_worker() keeps trying to reconnect, so rds_tcp_kill_sock() never gets to cancel the worker cp_conn_w and free the connections.
Therefore, the !tc->t_sock condition is not needed on the cleanup_net -> rds_tcp_exit_net -> rds_tcp_kill_sock path, because tc->t_sock is always NULL there and there is no other path that cancels cp_conn_w and frees the connection. So this patch fixes that.
rds_tcp_kill_sock():
	...
	if (net != c_net || !tc->t_sock)
	...

Acked-by: Santosh Shilimkar santosh.shilimkar@oracle.com
================================================================== BUG: KASAN: use-after-free in inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340 Read of size 4 at addr ffff8003496a4684 by task kworker/u8:4/3721
CPU: 3 PID: 3721 Comm: kworker/u8:4 Not tainted 5.1.0 #11 Hardware name: linux,dummy-virt (DT) Workqueue: krdsd rds_connect_worker Call trace: dump_backtrace+0x0/0x3c0 arch/arm64/kernel/time.c:53 show_stack+0x28/0x38 arch/arm64/kernel/traps.c:152 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x120/0x188 lib/dump_stack.c:113 print_address_description+0x68/0x278 mm/kasan/report.c:253 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x21c/0x348 mm/kasan/report.c:409 __asan_report_load4_noabort+0x30/0x40 mm/kasan/report.c:429 inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340 __sock_create+0x4f8/0x770 net/socket.c:1276 sock_create_kern+0x50/0x68 net/socket.c:1322 rds_tcp_conn_path_connect+0x2b4/0x690 net/rds/tcp_connect.c:114 rds_connect_worker+0x108/0x1d0 net/rds/threads.c:175 process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153 worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296 kthread+0x2f0/0x378 kernel/kthread.c:255 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
Allocated by task 687: save_stack mm/kasan/kasan.c:448 [inline] set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xd4/0x180 mm/kasan/kasan.c:553 kasan_slab_alloc+0x14/0x20 mm/kasan/kasan.c:490 slab_post_alloc_hook mm/slab.h:444 [inline] slab_alloc_node mm/slub.c:2705 [inline] slab_alloc mm/slub.c:2713 [inline] kmem_cache_alloc+0x14c/0x388 mm/slub.c:2718 kmem_cache_zalloc include/linux/slab.h:697 [inline] net_alloc net/core/net_namespace.c:384 [inline] copy_net_ns+0xc4/0x2d0 net/core/net_namespace.c:424 create_new_namespaces+0x300/0x658 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xa0/0x198 kernel/nsproxy.c:206 ksys_unshare+0x340/0x628 kernel/fork.c:2577 __do_sys_unshare kernel/fork.c:2645 [inline] __se_sys_unshare kernel/fork.c:2643 [inline] __arm64_sys_unshare+0x38/0x58 kernel/fork.c:2643 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall arch/arm64/kernel/syscall.c:47 [inline] el0_svc_common+0x168/0x390 arch/arm64/kernel/syscall.c:83 el0_svc_handler+0x60/0xd0 arch/arm64/kernel/syscall.c:129 el0_svc+0x8/0xc arch/arm64/kernel/entry.S:960
Freed by task 264: save_stack mm/kasan/kasan.c:448 [inline] set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x114/0x220 mm/kasan/kasan.c:521 kasan_slab_free+0x10/0x18 mm/kasan/kasan.c:528 slab_free_hook mm/slub.c:1370 [inline] slab_free_freelist_hook mm/slub.c:1397 [inline] slab_free mm/slub.c:2952 [inline] kmem_cache_free+0xb8/0x3a8 mm/slub.c:2968 net_free net/core/net_namespace.c:400 [inline] net_drop_ns.part.6+0x78/0x90 net/core/net_namespace.c:407 net_drop_ns net/core/net_namespace.c:406 [inline] cleanup_net+0x53c/0x6d8 net/core/net_namespace.c:569 process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153 worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296 kthread+0x2f0/0x378 kernel/kthread.c:255 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
The buggy address belongs to the object at ffff8003496a3f80 which belongs to the cache net_namespace of size 7872 The buggy address is located 1796 bytes inside of 7872-byte region [ffff8003496a3f80, ffff8003496a5e40) The buggy address belongs to the page: page:ffff7e000d25a800 count:1 mapcount:0 mapping:ffff80036ce4b000 index:0x0 compound_mapcount: 0 flags: 0xffffe0000008100(slab|head) raw: 0ffffe0000008100 dead000000000100 dead000000000200 ffff80036ce4b000 raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff8003496a4580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8003496a4600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8003496a4680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff8003496a4700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8003496a4780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ==================================================================
Fixes: 467fa15356ac("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Mao Wenan maowenan@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/rds/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/rds/tcp.c +++ b/net/rds/tcp.c @@ -530,7 +530,7 @@ static void rds_tcp_kill_sock(struct net list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) { struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net);
- if (net != c_net || !tc->t_sock) + if (net != c_net) continue; if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn)) { list_move_tail(&tc->t_tcp_node, &tmp_list);
From: Nicolas Dichtel nicolas.dichtel@6wind.com
[ Upstream commit 0db6f8befc32c68bb13d7ffbb2e563c79e913e13 ]
It always returned NULL, so it was never possible to get the filter.
Example: $ ip link add foo type dummy $ ip link add bar type dummy $ tc qdisc add dev foo clsact $ tc filter add dev foo protocol all pref 1 ingress handle 1234 \ matchall action mirred ingress mirror dev bar
Before the patch: $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall Error: Specified filter handle not found. We have an error talking to the kernel
After: $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall filter ingress protocol all pref 1 matchall chain 0 handle 0x4d2 not_in_hw action order 1: mirred (Ingress Mirror to device bar) pipe index 1 ref 1 bind 1
CC: Yotam Gigi yotamg@mellanox.com CC: Jiri Pirko jiri@mellanox.com Fixes: fd62d9f5c575 ("net/sched: matchall: Fix configuration race") Signed-off-by: Nicolas Dichtel nicolas.dichtel@6wind.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/cls_matchall.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/net/sched/cls_matchall.c +++ b/net/sched/cls_matchall.c @@ -125,6 +125,11 @@ static void mall_destroy(struct tcf_prot
static void *mall_get(struct tcf_proto *tp, u32 handle) { + struct cls_mall_head *head = rtnl_dereference(tp->root); + + if (head && head->handle == handle) + return head; + return NULL; }
From: Andrea Righi andrea.righi@canonical.com
[ Upstream commit f28cd2af22a0c134e4aa1c64a70f70d815d473fb ]
The flow action buffer can be resized if it is not big enough to contain all the requested flow actions. However, this resize does not take the newly requested size into account; the buffer is only increased by a factor of 2x. That might not be enough to contain the new data, causing a buffer overflow, for example:
[ 42.044472] =============================================================================
[ 42.045608] BUG kmalloc-96 (Not tainted): Redzone overwritten
[ 42.046415] -----------------------------------------------------------------------------

[ 42.047715] Disabling lock debugging due to kernel taint
[ 42.047716] INFO: 0x8bf2c4a5-0x720c0928. First byte 0x0 instead of 0xcc
[ 42.048677] INFO: Slab 0xbc6d2040 objects=29 used=18 fp=0xdc07dec4 flags=0x2808101
[ 42.049743] INFO: Object 0xd53a3464 @offset=2528 fp=0xccdcdebb

[ 42.050747] Redzone 76f1b237: cc cc cc cc cc cc cc cc                          ........
[ 42.051839] Object d53a3464: 6b 6b 6b 6b 6b 6b 6b 6b 0c 00 00 00 6c 00 00 00  kkkkkkkk....l...
[ 42.053015] Object f49a30cc: 6c 00 0c 00 00 00 00 00 00 00 00 03 78 a3 15 f6  l...........x...
[ 42.054203] Object acfe4220: 20 00 02 00 ff ff ff ff 00 00 00 00 00 00 00 00   ...............
[ 42.055370] Object 21024e91: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 42.056541] Object 070e04c3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 42.057797] Object 948a777a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 42.059061] Redzone 8bf2c4a5: 00 00 00 00                                      ....
[ 42.060189] Padding a681b46e: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
Fix by making sure the new buffer is properly resized to contain all the requested data.
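As a concrete (made-up) illustration of the arithmetic: with ksize(*sfa) = 96, next_offset = 80 and a request of req_size = 200, the old code grew the buffer to 2 * 96 = 192 bytes, which is still smaller than the 80 + 200 = 280 bytes actually needed, so the subsequent copy overran the allocation; with max(next_offset + req_size, ksize(*sfa) * 2) the new size is at least 280 and the copy fits.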
BugLink: https://bugs.launchpad.net/bugs/1813244 Signed-off-by: Andrea Righi andrea.righi@canonical.com Acked-by: Pravin B Shelar pshelar@ovn.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/openvswitch/flow_netlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -1967,14 +1967,14 @@ static struct nlattr *reserve_sfa_size(s
struct sw_flow_actions *acts; int new_acts_size; - int req_size = NLA_ALIGN(attr_len); + size_t req_size = NLA_ALIGN(attr_len); int next_offset = offsetof(struct sw_flow_actions, actions) + (*sfa)->actions_len;
if (req_size <= (ksize(*sfa) - next_offset)) goto out;
- new_acts_size = ksize(*sfa) * 2; + new_acts_size = max(next_offset + req_size, ksize(*sfa) * 2);
if (new_acts_size > MAX_ACTIONS_BUFSIZE) { if ((MAX_ACTIONS_BUFSIZE - next_offset) < req_size) {
From: Bjørn Mork bjorn@mork.no
[ Upstream commit 6289d0facd9ebce4cc83e5da39e15643ee998dc5 ]
This is a Qualcomm based device with a QMI function on interface 4. It is mode switched from 2020:2030 using a standard eject message.
T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  6 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=2020 ProdID=2031 Rev= 2.32
S:  Manufacturer=Mobile Connect
S:  Product=Mobile Connect
S:  SerialNumber=0123456789ABCDEF
C:* #Ifs= 6 Cfg#= 1 Atr=80 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
E:  Ad=83(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
E:  Ad=85(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
E:  Ad=87(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
E:  Ad=89(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
E:  Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 5 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=(none)
E:  Ad=8a(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=125us
Signed-off-by: Bjørn Mork bjorn@mork.no Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1188,6 +1188,7 @@ static const struct usb_device_id produc {QMI_FIXED_INTF(0x19d2, 0x2002, 4)}, /* ZTE (Vodafone) K3765-Z */ {QMI_FIXED_INTF(0x2001, 0x7e19, 4)}, /* D-Link DWM-221 B1 */ {QMI_FIXED_INTF(0x2001, 0x7e35, 4)}, /* D-Link DWM-222 */ + {QMI_FIXED_INTF(0x2020, 0x2031, 4)}, /* Olicard 600 */ {QMI_FIXED_INTF(0x2020, 0x2033, 4)}, /* BroadMobi BM806U */ {QMI_FIXED_INTF(0x0f3d, 0x68a2, 8)}, /* Sierra Wireless MC7700 */ {QMI_FIXED_INTF(0x114f, 0x68a2, 8)}, /* Sierra Wireless MC7750 */
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 09279e615c81ce55e04835970601ae286e3facbe ]
Syzbot report a kernel-infoleak:
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
Call Trace:
 _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
 copy_to_user include/linux/uaccess.h:174 [inline]
 sctp_getsockopt_peer_addrs net/sctp/socket.c:5911 [inline]
 sctp_getsockopt+0x1668e/0x17f70 net/sctp/socket.c:7562
 ...
Uninit was stored to memory at:
 sctp_transport_init net/sctp/transport.c:61 [inline]
 sctp_transport_new+0x16d/0x9a0 net/sctp/transport.c:115
 sctp_assoc_add_peer+0x532/0x1f70 net/sctp/associola.c:637
 sctp_process_param net/sctp/sm_make_chunk.c:2548 [inline]
 sctp_process_init+0x1a1b/0x3ed0 net/sctp/sm_make_chunk.c:2361
 ...
Bytes 8-15 of 16 are uninitialized
It was caused by the _pad field (bytes 8-15) of a v4 addr (saved in struct sockaddr_in) not being initialized, yet being copied directly to user memory in sctp_getsockopt_peer_addrs().
So fix it by calling memset(addr->v4.sin_zero, 0, 8) to initialize _pad of sockaddr_in before copying it to user memory in sctp_v4_addr_to_user(), as sctp_v6_addr_to_user() does.
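For reference, the (standard) layout that makes bytes 8-15 the uninitialized ones; the struct below only mirrors sockaddr_in to connect the KMSAN report to the fix:

 /* Same layout as struct sockaddr_in (16 bytes); the trailing padding is
  * exactly the "Bytes 8-15 of 16" flagged above and must be zeroed before
  * copy_to_user().
  */
 struct sockaddr_in_layout {
 	unsigned short sin_family;   /* bytes 0-1  */
 	unsigned short sin_port;     /* bytes 2-3  */
 	unsigned int   sin_addr;     /* bytes 4-7  */
 	unsigned char  sin_zero[8];  /* bytes 8-15 */
 };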
Reported-by: syzbot+86b5c7c236a22616a72f@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Tested-by: Alexander Potapenko glider@google.com Acked-by: Neil Horman nhorman@tuxdriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/protocol.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -605,6 +605,7 @@ out: static int sctp_v4_addr_to_user(struct sctp_sock *sp, union sctp_addr *addr) { /* No address mapping for V4 sockets */ + memset(addr->v4.sin_zero, 0, sizeof(addr->v4.sin_zero)); return sizeof(struct sockaddr_in); }
From: Koen De Schepper koen.de_schepper@nokia-bell-labs.com
[ Upstream commit aecfde23108b8e637d9f5c5e523b24fb97035dc3 ]
RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to loss episodes in the same way as conventional TCP".
Currently, Linux DCTCP performs no cwnd reduction when losses are encountered. Optionally, the dctcp_clamp_alpha_on_loss module parameter resets alpha to its maximal value if an RTO happens. This behavior is sub-optimal for at least two reasons: i) it ignores losses triggering fast retransmissions; and ii) it causes unnecessarily large cwnd reductions in the future if the loss was isolated, as it resets the historical term of DCTCP's alpha EWMA to its maximal value (i.e., denoting total congestion). The second reason has an especially noticeable effect when using DCTCP in high BDP environments, where alpha normally stays at low values.
This patch replaces the clamping of alpha with setting ssthresh to half of cwnd for both fast retransmissions and RTOs, at most once per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter has been removed.
The table below shows experimental results where we measured the drop probability of a PIE AQM (not applying ECN marks) at a bottleneck in the presence of a single TCP flow with either the alpha-clamping option enabled or the cwnd halving proposed by this patch. Results using reno or cubic are given for comparison.
                  | Link    | RTT      | Drop
 TCP CC           | speed   | base+AQM | probability
==================|=========|==========|============
CUBIC             | 40Mbps  | 7+20ms   |    0.21%
RENO              |         |          |    0.19%
DCTCP-CLAMP-ALPHA |         |          |   25.80%
DCTCP-HALVE-CWND  |         |          |    0.22%
------------------|---------|----------|------------
CUBIC             | 100Mbps | 7+20ms   |    0.03%
RENO              |         |          |    0.02%
DCTCP-CLAMP-ALPHA |         |          |   23.30%
DCTCP-HALVE-CWND  |         |          |    0.04%
------------------|---------|----------|------------
CUBIC             | 800Mbps | 1+1ms    |    0.04%
RENO              |         |          |    0.05%
DCTCP-CLAMP-ALPHA |         |          |   18.70%
DCTCP-HALVE-CWND  |         |          |    0.06%
We see that, without halving its cwnd for all sources of losses, DCTCP drives the AQM to large drop probabilities in order to keep the queue length under control (i.e., it repeatedly faces RTOs). Instead, if DCTCP reacts to all sources of losses, it can be controlled by the AQM using drop levels similar to those of cubic or reno.
Signed-off-by: Koen De Schepper koen.de_schepper@nokia-bell-labs.com Signed-off-by: Olivier Tilmans olivier.tilmans@nokia-bell-labs.com Cc: Bob Briscoe research@bobbriscoe.net Cc: Lawrence Brakmo brakmo@fb.com Cc: Florian Westphal fw@strlen.de Cc: Daniel Borkmann borkmann@iogearbox.net Cc: Yuchung Cheng ycheng@google.com Cc: Neal Cardwell ncardwell@google.com Cc: Eric Dumazet edumazet@google.com Cc: Andrew Shewmaker agshew@gmail.com Cc: Glenn Judd glenn.judd@morganstanley.com Acked-by: Florian Westphal fw@strlen.de Acked-by: Neal Cardwell ncardwell@google.com Acked-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_dctcp.c | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-)
--- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -66,11 +66,6 @@ static unsigned int dctcp_alpha_on_init module_param(dctcp_alpha_on_init, uint, 0644); MODULE_PARM_DESC(dctcp_alpha_on_init, "parameter for initial alpha value");
-static unsigned int dctcp_clamp_alpha_on_loss __read_mostly; -module_param(dctcp_clamp_alpha_on_loss, uint, 0644); -MODULE_PARM_DESC(dctcp_clamp_alpha_on_loss, - "parameter for clamping alpha on loss"); - static struct tcp_congestion_ops dctcp_reno;
static void dctcp_reset(const struct tcp_sock *tp, struct dctcp *ca) @@ -211,21 +206,23 @@ static void dctcp_update_alpha(struct so } }
-static void dctcp_state(struct sock *sk, u8 new_state) +static void dctcp_react_to_loss(struct sock *sk) { - if (dctcp_clamp_alpha_on_loss && new_state == TCP_CA_Loss) { - struct dctcp *ca = inet_csk_ca(sk); + struct dctcp *ca = inet_csk_ca(sk); + struct tcp_sock *tp = tcp_sk(sk);
- /* If this extension is enabled, we clamp dctcp_alpha to - * max on packet loss; the motivation is that dctcp_alpha - * is an indicator to the extend of congestion and packet - * loss is an indicator of extreme congestion; setting - * this in practice turned out to be beneficial, and - * effectively assumes total congestion which reduces the - * window by half. - */ - ca->dctcp_alpha = DCTCP_MAX_ALPHA; - } + ca->loss_cwnd = tp->snd_cwnd; + tp->snd_ssthresh = max(tp->snd_cwnd >> 1U, 2U); +} + +static void dctcp_state(struct sock *sk, u8 new_state) +{ + if (new_state == TCP_CA_Recovery && + new_state != inet_csk(sk)->icsk_ca_state) + dctcp_react_to_loss(sk); + /* We handle RTO in dctcp_cwnd_event to ensure that we perform only + * one loss-adjustment per RTT. + */ }
static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev) @@ -237,6 +234,9 @@ static void dctcp_cwnd_event(struct sock case CA_EVENT_ECN_NO_CE: dctcp_ce_state_1_to_0(sk); break; + case CA_EVENT_LOSS: + dctcp_react_to_loss(sk); + break; default: /* Don't care for the rest. */ break;
From: Stephen Suryaputra ssuryaextr@gmail.com
[ Upstream commit 8c83f2df9c6578ea4c5b940d8238ad8a41b87e9e ]
Configuration check to accept source route IP options should be made on the incoming netdevice when the skb->dev is an l3mdev master. The route lookup for the source route next hop also needs the incoming netdev.
v2->v3: - Simplify by passing the original netdevice down the stack (per David Ahern).
Signed-off-by: Stephen Suryaputra ssuryaextr@gmail.com Reviewed-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/ip.h | 2 +- net/ipv4/ip_input.c | 7 +++---- net/ipv4/ip_options.c | 4 ++-- 3 files changed, 6 insertions(+), 7 deletions(-)
--- a/include/net/ip.h +++ b/include/net/ip.h @@ -603,7 +603,7 @@ int ip_options_get_from_user(struct net unsigned char __user *data, int optlen); void ip_options_undo(struct ip_options *opt); void ip_forward_options(struct sk_buff *skb); -int ip_options_rcv_srr(struct sk_buff *skb); +int ip_options_rcv_srr(struct sk_buff *skb, struct net_device *dev);
/* * Functions provided by ip_sockglue.c --- a/net/ipv4/ip_input.c +++ b/net/ipv4/ip_input.c @@ -259,11 +259,10 @@ int ip_local_deliver(struct sk_buff *skb ip_local_deliver_finish); }
-static inline bool ip_rcv_options(struct sk_buff *skb) +static inline bool ip_rcv_options(struct sk_buff *skb, struct net_device *dev) { struct ip_options *opt; const struct iphdr *iph; - struct net_device *dev = skb->dev;
/* It looks as overkill, because not all IP options require packet mangling. @@ -299,7 +298,7 @@ static inline bool ip_rcv_options(struct } }
- if (ip_options_rcv_srr(skb)) + if (ip_options_rcv_srr(skb, dev)) goto drop; }
@@ -362,7 +361,7 @@ static int ip_rcv_finish(struct net *net } #endif
- if (iph->ihl > 5 && ip_rcv_options(skb)) + if (iph->ihl > 5 && ip_rcv_options(skb, dev)) goto drop;
rt = skb_rtable(skb); --- a/net/ipv4/ip_options.c +++ b/net/ipv4/ip_options.c @@ -612,7 +612,7 @@ void ip_forward_options(struct sk_buff * } }
-int ip_options_rcv_srr(struct sk_buff *skb) +int ip_options_rcv_srr(struct sk_buff *skb, struct net_device *dev) { struct ip_options *opt = &(IPCB(skb)->opt); int srrspace, srrptr; @@ -647,7 +647,7 @@ int ip_options_rcv_srr(struct sk_buff *s
orefdst = skb->_skb_refdst; skb_dst_set(skb, NULL); - err = ip_route_input(skb, nexthop, iph->saddr, iph->tos, skb->dev); + err = ip_route_input(skb, nexthop, iph->saddr, iph->tos, dev); rt2 = skb_rtable(skb); if (err || (rt2->rt_type != RTN_UNICAST && rt2->rt_type != RTN_LOCAL)) { skb_dst_drop(skb);
From: Gavi Teitz gavi@mellanox.com
[ Upstream commit bc87a0036826a37b43489b029af8143bd07c6cca ]
Previously, a false error was returned if the TIRs list was empty, since the err value was initialized to -ENOMEM and was only updated when a TIR was refreshed. This is resolved by initializing the err value to zero.
Fixes: b676f653896a ("net/mlx5e: Refactor refresh TIRs") Signed-off-by: Gavi Teitz gavi@mellanox.com Reviewed-by: Roi Dayan roid@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_common.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c @@ -140,15 +140,17 @@ int mlx5e_refresh_tirs(struct mlx5e_priv { struct mlx5_core_dev *mdev = priv->mdev; struct mlx5e_tir *tir; - int err = -ENOMEM; + int err = 0; u32 tirn = 0; int inlen; void *in;
inlen = MLX5_ST_SZ_BYTES(modify_tir_in); in = kvzalloc(inlen, GFP_KERNEL); - if (!in) + if (!in) { + err = -ENOMEM; goto out; + }
if (enable_uc_lb) MLX5_SET(modify_tir_in, in, ctx.self_lb_block,
From: Yuval Avnery yuvalav@mellanox.com
[ Upstream commit 80a2a9026b24c6bd34b8d58256973e22270bedec ]
Refresh tirs is looping over a global list of tirs while netdevs are adding and removing tirs from that list. That is why a lock is required.
Fixes: 724b2aa15126 ("net/mlx5e: TIRs management refactoring") Signed-off-by: Yuval Avnery yuvalav@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_common.c | 7 +++++++ include/linux/mlx5/driver.h | 2 ++ 2 files changed, 9 insertions(+)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c @@ -45,7 +45,9 @@ int mlx5e_create_tir(struct mlx5_core_de if (err) return err;
+ mutex_lock(&mdev->mlx5e_res.td.list_lock); list_add(&tir->list, &mdev->mlx5e_res.td.tirs_list); + mutex_unlock(&mdev->mlx5e_res.td.list_lock);
return 0; } @@ -53,8 +55,10 @@ int mlx5e_create_tir(struct mlx5_core_de void mlx5e_destroy_tir(struct mlx5_core_dev *mdev, struct mlx5e_tir *tir) { + mutex_lock(&mdev->mlx5e_res.td.list_lock); mlx5_core_destroy_tir(mdev, tir->tirn); list_del(&tir->list); + mutex_unlock(&mdev->mlx5e_res.td.list_lock); }
static int mlx5e_create_mkey(struct mlx5_core_dev *mdev, u32 pdn, @@ -114,6 +118,7 @@ int mlx5e_create_mdev_resources(struct m }
INIT_LIST_HEAD(&mdev->mlx5e_res.td.tirs_list); + mutex_init(&mdev->mlx5e_res.td.list_lock);
return 0;
@@ -158,6 +163,7 @@ int mlx5e_refresh_tirs(struct mlx5e_priv
MLX5_SET(modify_tir_in, in, bitmask.self_lb_en, 1);
+ mutex_lock(&mdev->mlx5e_res.td.list_lock); list_for_each_entry(tir, &mdev->mlx5e_res.td.tirs_list, list) { tirn = tir->tirn; err = mlx5_core_modify_tir(mdev, tirn, in, inlen); @@ -169,6 +175,7 @@ out: kvfree(in); if (err) netdev_err(priv->netdev, "refresh tir(0x%x) failed, %d\n", tirn, err); + mutex_unlock(&mdev->mlx5e_res.td.list_lock);
return err; } --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -743,6 +743,8 @@ struct mlx5_pagefault { };
struct mlx5_td { + /* protects tirs list changes while tirs refresh */ + struct mutex list_lock; struct list_head tirs_list; u32 tdn; };
From: Jakub Kicinski jakub.kicinski@netronome.com
[ Upstream commit c8ba5b91a04e3e2643e48501c114108802f21cda ]
dev_queue_xmit() may return error codes as well as netdev_tx_t, and it always consumes the skb. Make sure we always return a correct netdev_tx_t value.
Fixes: eadfa4c3be99 ("nfp: add stats and xmit helpers for representors") Signed-off-by: Jakub Kicinski jakub.kicinski@netronome.com Reviewed-by: John Hurley john.hurley@netronome.com Reviewed-by: Simon Horman simon.horman@netronome.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c @@ -200,7 +200,7 @@ static netdev_tx_t nfp_repr_xmit(struct ret = dev_queue_xmit(skb); nfp_repr_inc_tx_stats(netdev, len, ret);
- return ret; + return NETDEV_TX_OK; }
static int nfp_repr_stop(struct net_device *netdev)
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit a1b0e4e684e9c300b9e759b46cb7a0147e61ddff ]
There is logic to check that the RX/TPA consumer index is the expected index to work around a hardware problem. However, the potentially bad consumer index is first used to index into an array to reference an entry. This can potentially crash if the bad consumer index is beyond legal range. Improve the logic to use the consumer index for dereferencing after the validity check and log an error message.
Fixes: fa7e28127a5a ("bnxt_en: Add workaround to detect bad opaque in rx completion (part 2)") Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -1076,6 +1076,8 @@ static void bnxt_tpa_start(struct bnxt * tpa_info = &rxr->rx_tpa[agg_id];
if (unlikely(cons != rxr->rx_next_cons)) { + netdev_warn(bp->dev, "TPA cons %x != expected cons %x\n", + cons, rxr->rx_next_cons); bnxt_sched_reset(bp, rxr); return; } @@ -1528,15 +1530,17 @@ static int bnxt_rx_pkt(struct bnxt *bp, }
cons = rxcmp->rx_cmp_opaque; - rx_buf = &rxr->rx_buf_ring[cons]; - data = rx_buf->data; - data_ptr = rx_buf->data_ptr; if (unlikely(cons != rxr->rx_next_cons)) { int rc1 = bnxt_discard_rx(bp, bnapi, raw_cons, rxcmp);
+ netdev_warn(bp->dev, "RX cons %x != expected cons %x\n", + cons, rxr->rx_next_cons); bnxt_sched_reset(bp, rxr); return rc1; } + rx_buf = &rxr->rx_buf_ring[cons]; + data = rx_buf->data; + data_ptr = rx_buf->data_ptr; prefetch(data_ptr);
misc = le32_to_cpu(rxcmp->rx_cmp_misc_v1);
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit 8e44e96c6c8e8fb80b84a2ca11798a8554f710f2 ]
If the RX completion indicates RX buffer errors, the RX ring will be disabled by firmware and no packets will be received on that ring from that point on. Recover by resetting the device.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -1557,11 +1557,17 @@ static int bnxt_rx_pkt(struct bnxt *bp,
rx_buf->data = NULL; if (rxcmp1->rx_cmp_cfa_code_errors_v2 & RX_CMP_L2_ERRORS) { + u32 rx_err = le32_to_cpu(rxcmp1->rx_cmp_cfa_code_errors_v2); + bnxt_reuse_rx_data(rxr, cons, data); if (agg_bufs) bnxt_reuse_rx_agg_bufs(bnapi, cp_cons, agg_bufs);
rc = -EIO; + if (rx_err & RX_CMPL_ERRORS_BUFFER_ERROR_MASK) { + netdev_warn(bp->dev, "RX buffer error %x\n", rx_err); + bnxt_sched_reset(bp, rxr); + } goto next_rx; }
From: Davide Caratti dcaratti@redhat.com
[ Upstream commit fae2708174ae95d98d19f194e03d6e8f688ae195 ]
The control path of the 'sample' action does not validate the value of 'rate' provided by the user, but then uses it as a divisor in the traffic path. Validate it in tcf_sample_init(), and return -EINVAL with a proper extack message if the value is zero, to fix the splat triggered by the script below:
# tc f a dev test0 egress matchall action sample rate 0 group 1 index 2 # tc -s a s action sample total acts 1
action order 0: sample rate 1/0 group 1 pipe index 2 ref 1 bind 1 installed 19 sec used 19 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 # ping 192.0.2.1 -I test0 -c1 -q
divide error: 0000 [#1] SMP PTI CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ #591 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample] Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 <41> f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01 RSP: 0018:ffffae320190ba30 EFLAGS: 00010246 RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49 RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0 RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000 R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80 R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000 FS: 00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0 Call Trace: tcf_action_exec+0x7c/0x1c0 tcf_classify+0x57/0x160 __dev_queue_xmit+0x3dc/0xd10 ip_finish_output2+0x257/0x6d0 ip_output+0x75/0x280 ip_send_skb+0x15/0x40 raw_sendmsg+0xae3/0x1410 sock_sendmsg+0x36/0x40 __sys_sendto+0x10e/0x140 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x60/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe [...] Kernel panic - not syncing: Fatal exception in interrupt
Add a TDC selftest to document that 'rate' is now being validated.
Reported-by: Matteo Croce mcroce@redhat.com Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Signed-off-by: Davide Caratti dcaratti@redhat.com Acked-by: Yotam Gigi yotam.gi@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/act_sample.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/net/sched/act_sample.c +++ b/net/sched/act_sample.c @@ -45,6 +45,7 @@ static int tcf_sample_init(struct net *n struct tc_sample *parm; struct tcf_sample *s; bool exists = false; + u32 rate; int ret;
if (!nla) @@ -73,10 +74,17 @@ static int tcf_sample_init(struct net *n if (!ovr) return -EEXIST; } - s = to_sample(*a);
+ rate = nla_get_u32(tb[TCA_SAMPLE_RATE]); + if (!rate) { + tcf_idr_release(*a, bind); + return -EINVAL; + } + + s = to_sample(*a); s->tcf_action = parm->action; s->rate = nla_get_u32(tb[TCA_SAMPLE_RATE]); + s->rate = rate; s->psample_group_num = nla_get_u32(tb[TCA_SAMPLE_PSAMPLE_GROUP]); psample_group = psample_group_get(net, s->psample_group_num); if (!psample_group) {
From: Eric Dumazet edumazet@google.com
[ Upstream commit 355b98553789b646ed97ad801a619ff898471b92 ]
net_hash_mix() currently uses the kernel address of a struct net, and is used in many places that could reveal this address to a patient attacker, thus defeating KASLR in the typical case (the initial net namespace, where &init_net is not dynamically allocated).
I believe the original implementation tried to avoid spending too many cycles in this function, but security comes first.
Also provide entropy regardless of CONFIG_NET_NS.
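As a rough illustration of the change (a userspace sketch with made-up names, not the kernel code): a salt derived from the object's address leaks address bits to anyone who can observe where keys hash, while a randomly drawn salt spreads buckets just as well without exposing kernel layout.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct net { uint32_t hash_mix; };

    /* old behaviour: salt derived from the object's address */
    static uint32_t hash_mix_old(const struct net *net)
    {
            return (uint32_t)((uintptr_t)net >> 6);  /* stand-in for L1_CACHE_SHIFT */
    }

    /* new behaviour: salt drawn once when the namespace is set up */
    static uint32_t hash_mix_new(const struct net *net)
    {
            return net->hash_mix;
    }

    int main(void)
    {
            struct net *net = malloc(sizeof(*net));

            net->hash_mix = (uint32_t)rand();  /* stand-in for get_random_bytes() */
            printf("address-derived mix %#x leaks bits of %p\n",
                   (unsigned)hash_mix_old(net), (void *)net);
            printf("random mix %#x says nothing about layout\n",
                   (unsigned)hash_mix_new(net));
            free(net);
            return 0;
    }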
Fixes: 0b4419162aa6 ("netns: introduce the net_hash_mix "salt" for hashes") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: Amit Klein aksecurity@gmail.com Reported-by: Benny Pinkas benny@pinkas.net Cc: Pavel Emelyanov xemul@openvz.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/net_namespace.h | 1 + include/net/netns/hash.h | 15 ++------------- net/core/net_namespace.c | 1 + 3 files changed, 4 insertions(+), 13 deletions(-)
--- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -56,6 +56,7 @@ struct net { */ spinlock_t rules_mod_lock;
+ u32 hash_mix; atomic64_t cookie_gen;
struct list_head list; /* list of network namespaces */ --- a/include/net/netns/hash.h +++ b/include/net/netns/hash.h @@ -2,21 +2,10 @@ #ifndef __NET_NS_HASH_H__ #define __NET_NS_HASH_H__
-#include <asm/cache.h> - -struct net; +#include <net/net_namespace.h>
static inline u32 net_hash_mix(const struct net *net) { -#ifdef CONFIG_NET_NS - /* - * shift this right to eliminate bits, that are - * always zeroed - */ - - return (u32)(((unsigned long)net) >> L1_CACHE_SHIFT); -#else - return 0; -#endif + return net->hash_mix; } #endif --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -285,6 +285,7 @@ static __net_init int setup_net(struct n
atomic_set(&net->count, 1); refcount_set(&net->passive, 1); + get_random_bytes(&net->hash_mix, sizeof(u32)); net->dev_base_seq = 1; net->user_ns = user_ns; idr_init(&net->netns_ids);
From: Li RongQing lirongqing@baidu.com
[ Upstream commit 3d8830266ffc28c16032b859e38a0252e014b631 ]
NULL or ZERO_SIZE_PTR will be returned for a zero-sized memory request, and dereferencing them will lead to a segfault,
so it is unnecessary to call vzalloc for a zero-sized memory request, and functions which may dereference the NULL allocation should not be called.
This also fixes a possible memory leak: if phy_ethtool_get_stats returns an error, the memory should be freed before exit.
Signed-off-by: Li RongQing lirongqing@baidu.com Reviewed-by: Wang Li wangli39@baidu.com Reviewed-by: Michal Kubecek mkubecek@suse.cz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/ethtool.c | 43 +++++++++++++++++++++++++++---------------- 1 file changed, 27 insertions(+), 16 deletions(-)
--- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -1815,11 +1815,15 @@ static int ethtool_get_strings(struct ne WARN_ON_ONCE(!ret);
gstrings.len = ret; - data = vzalloc(gstrings.len * ETH_GSTRING_LEN); - if (gstrings.len && !data) - return -ENOMEM; + if (gstrings.len) { + data = vzalloc(gstrings.len * ETH_GSTRING_LEN); + if (!data) + return -ENOMEM;
- __ethtool_get_strings(dev, gstrings.string_set, data); + __ethtool_get_strings(dev, gstrings.string_set, data); + } else { + data = NULL; + }
ret = -EFAULT; if (copy_to_user(useraddr, &gstrings, sizeof(gstrings))) @@ -1915,11 +1919,14 @@ static int ethtool_get_stats(struct net_ return -EFAULT;
stats.n_stats = n_stats; - data = vzalloc(n_stats * sizeof(u64)); - if (n_stats && !data) - return -ENOMEM; - - ops->get_ethtool_stats(dev, &stats, data); + if (n_stats) { + data = vzalloc(n_stats * sizeof(u64)); + if (!data) + return -ENOMEM; + ops->get_ethtool_stats(dev, &stats, data); + } else { + data = NULL; + }
ret = -EFAULT; if (copy_to_user(useraddr, &stats, sizeof(stats))) @@ -1955,13 +1962,17 @@ static int ethtool_get_phy_stats(struct return -EFAULT;
stats.n_stats = n_stats; - data = vzalloc(n_stats * sizeof(u64)); - if (n_stats && !data) - return -ENOMEM; - - mutex_lock(&phydev->lock); - phydev->drv->get_stats(phydev, &stats, data); - mutex_unlock(&phydev->lock); + if (n_stats) { + data = vzalloc(n_stats * sizeof(u64)); + if (!data) + return -ENOMEM; + + mutex_lock(&phydev->lock); + phydev->drv->get_stats(phydev, &stats, data); + mutex_unlock(&phydev->lock); + } else { + data = NULL; + }
ret = -EFAULT; if (copy_to_user(useraddr, &stats, sizeof(stats)))
From: Zubin Mithra zsm@chromium.org
commit 212ac181c158c09038c474ba68068be49caecebb upstream.
When ioctl calls are made with non-null-terminated userspace strings, strlcpy causes an OOB-read from within strlen. Fix by changing to use strscpy instead.
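A minimal userspace sketch of the difference (simplified semantics and hypothetical helper names, not the kernel implementations): strlcpy's return value is strlen(src), so the source is scanned until a NUL is found regardless of the destination size, which is the out-of-bounds read when userspace passes an unterminated string; an strscpy-style copy never reads more than the given number of bytes.

    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    /* strlcpy-like: the strlen() call reads src without any bound */
    static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
    {
            size_t len = strlen(src);               /* unbounded read of src */

            if (size) {
                    size_t copy = (len >= size) ? size - 1 : len;

                    memcpy(dst, src, copy);
                    dst[copy] = '\0';
            }
            return len;
    }

    /* strscpy-like: reads at most size bytes of src, always terminates dst */
    static long sketch_strscpy(char *dst, const char *src, size_t size)
    {
            size_t i;

            if (!size)
                    return -1;
            for (i = 0; i + 1 < size && src[i]; i++) /* bounded read */
                    dst[i] = src[i];
            dst[i] = '\0';
            return src[i] ? -1 /* truncated */ : (long)i;
    }

    int main(void)
    {
            char unterminated[4] = { 'a', 'b', 'c', 'd' }; /* no NUL */
            char dst[8];

            sketch_strlcpy(dst, "ok", sizeof(dst));  /* fine: src is terminated */
            /* sketch_strlcpy(dst, unterminated, 4) would strlen() past the
             * 4-byte buffer; the bounded copy below stops in time. */
            printf("%ld\n", sketch_strscpy(dst, unterminated, sizeof(unterminated)));
            return 0;
    }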
Signed-off-by: Zubin Mithra zsm@chromium.org Reviewed-by: Guenter Roeck groeck@chromium.org Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/core/seq/seq_clientmgr.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/sound/core/seq/seq_clientmgr.c +++ b/sound/core/seq/seq_clientmgr.c @@ -1249,7 +1249,7 @@ static int snd_seq_ioctl_set_client_info
/* fill the info fields */ if (client_info->name[0]) - strlcpy(client->name, client_info->name, sizeof(client->name)); + strscpy(client->name, client_info->name, sizeof(client->name));
client->filter = client_info->filter; client->event_lost = client_info->event_lost; @@ -1527,7 +1527,7 @@ static int snd_seq_ioctl_create_queue(st /* set queue name */ if (!info->name[0]) snprintf(info->name, sizeof(info->name), "Queue-%d", q->queue); - strlcpy(q->name, info->name, sizeof(q->name)); + strscpy(q->name, info->name, sizeof(q->name)); snd_use_lock_free(&q->use_lock);
return 0; @@ -1589,7 +1589,7 @@ static int snd_seq_ioctl_set_queue_info( queuefree(q); return -EPERM; } - strlcpy(q->name, info->name, sizeof(q->name)); + strscpy(q->name, info->name, sizeof(q->name)); queuefree(q);
return 0;
From: Sheena Mira-ato sheena.mira-ato@alliedtelesis.co.nz
[ Upstream commit b2e54b09a3d29c4db883b920274ca8dca4d9f04d ]
The device type for ip6 tunnels is set to ARPHRD_TUNNEL6. However, the ip4ip6_err function is expecting the device type of the tunnel to be ARPHRD_TUNNEL. Since the device types do not match, the function exits and the ICMP error packet is not sent to the originating host. Note that the device type for IPv4 tunnels is set to ARPHRD_TUNNEL.
Fix is to expect a tunnel device type of ARPHRD_TUNNEL6 instead. Now the tunnel device type matches and the ICMP error packet is sent to the originating host.
Signed-off-by: Sheena Mira-ato sheena.mira-ato@alliedtelesis.co.nz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_tunnel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -633,7 +633,7 @@ ip4ip6_err(struct sk_buff *skb, struct i IPPROTO_IPIP, RT_TOS(eiph->tos), 0); if (IS_ERR(rt) || - rt->dst.dev->type != ARPHRD_TUNNEL) { + rt->dst.dev->type != ARPHRD_TUNNEL6) { if (!IS_ERR(rt)) ip_rt_put(rt); goto out; @@ -643,7 +643,7 @@ ip4ip6_err(struct sk_buff *skb, struct i ip_rt_put(rt); if (ip_route_input(skb2, eiph->daddr, eiph->saddr, eiph->tos, skb2->dev) || - skb_dst(skb2)->dev->type != ARPHRD_TUNNEL) + skb_dst(skb2)->dev->type != ARPHRD_TUNNEL6) goto out; }
From: Haiyang Zhang haiyangz@microsoft.com
[ Upstream commit 1b704c4a1ba95574832e730f23817b651db2aa59 ]
After a queue has been stopped, the wakeup mechanism may wake it up again when ring buffer usage drops below a threshold. This may cause a send-path panic on a NULL pointer when we have stopped all tx queues in netvsc_detach and start removing the netvsc device.
This patch fixes it by adding a tx_disable flag to prevent unwanted queue wakeups.
Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Reported-by: Mohammed Gamal mgamal@redhat.com Signed-off-by: Haiyang Zhang haiyangz@microsoft.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/hyperv/hyperv_net.h | 1 + drivers/net/hyperv/netvsc.c | 6 ++++-- drivers/net/hyperv/netvsc_drv.c | 32 ++++++++++++++++++++++++++------ 3 files changed, 31 insertions(+), 8 deletions(-)
--- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -779,6 +779,7 @@ struct netvsc_device {
wait_queue_head_t wait_drain; bool destroy; + bool tx_disable; /* if true, do not wake up queue again */
/* Receive buffer allocated by us but manages by NetVSP */ void *recv_buf; --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -107,6 +107,7 @@ static struct netvsc_device *alloc_net_d
init_waitqueue_head(&net_device->wait_drain); net_device->destroy = false; + net_device->tx_disable = false; atomic_set(&net_device->open_cnt, 0); net_device->max_pkt = RNDIS_MAX_PKT_DEFAULT; net_device->pkt_align = RNDIS_PKT_ALIGN_DEFAULT; @@ -712,7 +713,7 @@ static void netvsc_send_tx_complete(stru } else { struct netdev_queue *txq = netdev_get_tx_queue(ndev, q_idx);
- if (netif_tx_queue_stopped(txq) && + if (netif_tx_queue_stopped(txq) && !net_device->tx_disable && (hv_ringbuf_avail_percent(&channel->outbound) > RING_AVAIL_PERCENT_HIWATER || queue_sends < 1)) { netif_tx_wake_queue(txq); @@ -865,7 +866,8 @@ static inline int netvsc_send_pkt( netif_tx_stop_queue(txq); } else if (ret == -EAGAIN) { netif_tx_stop_queue(txq); - if (atomic_read(&nvchan->queue_sends) < 1) { + if (atomic_read(&nvchan->queue_sends) < 1 && + !net_device->tx_disable) { netif_tx_wake_queue(txq); ret = -ENOSPC; } --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -108,6 +108,15 @@ static void netvsc_set_rx_mode(struct ne rcu_read_unlock(); }
+static void netvsc_tx_enable(struct netvsc_device *nvscdev, + struct net_device *ndev) +{ + nvscdev->tx_disable = false; + virt_wmb(); /* ensure queue wake up mechanism is on */ + + netif_tx_wake_all_queues(ndev); +} + static int netvsc_open(struct net_device *net) { struct net_device_context *ndev_ctx = netdev_priv(net); @@ -128,7 +137,7 @@ static int netvsc_open(struct net_device rdev = nvdev->extension; if (!rdev->link_state) { netif_carrier_on(net); - netif_tx_wake_all_queues(net); + netvsc_tx_enable(nvdev, net); }
if (vf_netdev) { @@ -183,6 +192,17 @@ static int netvsc_wait_until_empty(struc } }
+static void netvsc_tx_disable(struct netvsc_device *nvscdev, + struct net_device *ndev) +{ + if (nvscdev) { + nvscdev->tx_disable = true; + virt_wmb(); /* ensure txq will not wake up after stop */ + } + + netif_tx_disable(ndev); +} + static int netvsc_close(struct net_device *net) { struct net_device_context *net_device_ctx = netdev_priv(net); @@ -191,7 +211,7 @@ static int netvsc_close(struct net_devic struct netvsc_device *nvdev = rtnl_dereference(net_device_ctx->nvdev); int ret;
- netif_tx_disable(net); + netvsc_tx_disable(nvdev, net);
/* No need to close rndis filter if it is removed already */ if (!nvdev) @@ -893,7 +913,7 @@ static int netvsc_detach(struct net_devi
/* If device was up (receiving) then shutdown */ if (netif_running(ndev)) { - netif_tx_disable(ndev); + netvsc_tx_disable(nvdev, ndev);
ret = rndis_filter_close(nvdev); if (ret) { @@ -1720,7 +1740,7 @@ static void netvsc_link_change(struct wo if (rdev->link_state) { rdev->link_state = false; netif_carrier_on(net); - netif_tx_wake_all_queues(net); + netvsc_tx_enable(net_device, net); } else { notify = true; } @@ -1730,7 +1750,7 @@ static void netvsc_link_change(struct wo if (!rdev->link_state) { rdev->link_state = true; netif_carrier_off(net); - netif_tx_stop_all_queues(net); + netvsc_tx_disable(net_device, net); } kfree(event); break; @@ -1739,7 +1759,7 @@ static void netvsc_link_change(struct wo if (!rdev->link_state) { rdev->link_state = true; netif_carrier_off(net); - netif_tx_stop_all_queues(net); + netvsc_tx_disable(net_device, net); event->event = RNDIS_STATUS_MEDIA_CONNECT; spin_lock_irqsave(&ndev_ctx->lock, flags); list_add(&event->list, &ndev_ctx->reconfig_events);
From: Peter Geis pgwipeout@gmail.com
commit 09f91381fa5de1d44bc323d8bf345f5d57b3d9b5 upstream.
Various rk3328 based boards experience occasional sdmmc0 write errors. This is due to the rk3328.dtsi tx drive levels being set to 4ma, vs 8ma per the rk3328 datasheet default settings.
Fix this by setting the tx signal pins to 8ma. Inspiration from tonymac32's patch, https://github.com/ayufan-rock64/linux-kernel/commit/dc1212b347e0da17c5460bc...
Fixes issues on the rk3328-roc-cc and the rk3328-rock64 (as per the above commit message).
Tested on the rk3328-roc-cc board.
Fixes: 52e02d377a72 ("arm64: dts: rockchip: add core dtsi file for RK3328 SoCs") Cc: stable@vger.kernel.org Signed-off-by: Peter Geis pgwipeout@gmail.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/boot/dts/rockchip/rk3328.dtsi | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
--- a/arch/arm64/boot/dts/rockchip/rk3328.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3328.dtsi @@ -1333,11 +1333,11 @@
sdmmc0 { sdmmc0_clk: sdmmc0-clk { - rockchip,pins = <1 RK_PA6 1 &pcfg_pull_none_4ma>; + rockchip,pins = <1 RK_PA6 1 &pcfg_pull_none_8ma>; };
sdmmc0_cmd: sdmmc0-cmd { - rockchip,pins = <1 RK_PA4 1 &pcfg_pull_up_4ma>; + rockchip,pins = <1 RK_PA4 1 &pcfg_pull_up_8ma>; };
sdmmc0_dectn: sdmmc0-dectn { @@ -1349,14 +1349,14 @@ };
sdmmc0_bus1: sdmmc0-bus1 { - rockchip,pins = <1 RK_PA0 1 &pcfg_pull_up_4ma>; + rockchip,pins = <1 RK_PA0 1 &pcfg_pull_up_8ma>; };
sdmmc0_bus4: sdmmc0-bus4 { - rockchip,pins = <1 RK_PA0 1 &pcfg_pull_up_4ma>, - <1 RK_PA1 1 &pcfg_pull_up_4ma>, - <1 RK_PA2 1 &pcfg_pull_up_4ma>, - <1 RK_PA3 1 &pcfg_pull_up_4ma>; + rockchip,pins = <1 RK_PA0 1 &pcfg_pull_up_8ma>, + <1 RK_PA1 1 &pcfg_pull_up_8ma>, + <1 RK_PA2 1 &pcfg_pull_up_8ma>, + <1 RK_PA3 1 &pcfg_pull_up_8ma>; };
sdmmc0_gpio: sdmmc0-gpio {
From: Helge Deller deller@gmx.de
commit d006e95b5561f708d0385e9677ffe2c46f2ae345 upstream.
While adding LASI support to QEMU, I noticed that the QEMU detection in the kernel happens much too late. For example, when a LASI chip is found by the kernel, it registers the LASI LED driver as well. But when we run on QEMU it makes sense to avoid spending unnecessary CPU cycles, so we need to access the running_on_QEMU flag earlier than before.
This patch now makes the QEMU detection the first task of the Linux kernel by moving it to where the kernel enters the C code.
Fixes: 310d82784fb4 ("parisc: qemu idle sleep support") Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/parisc/kernel/process.c | 6 ------ arch/parisc/kernel/setup.c | 3 +++ 2 files changed, 3 insertions(+), 6 deletions(-)
--- a/arch/parisc/kernel/process.c +++ b/arch/parisc/kernel/process.c @@ -209,12 +209,6 @@ void __cpuidle arch_cpu_idle(void)
static int __init parisc_idle_init(void) { - const char *marker; - - /* check QEMU/SeaBIOS marker in PAGE0 */ - marker = (char *) &PAGE0->pad0; - running_on_qemu = (memcmp(marker, "SeaBIOS", 8) == 0); - if (!running_on_qemu) cpu_idle_poll_ctrl(1);
--- a/arch/parisc/kernel/setup.c +++ b/arch/parisc/kernel/setup.c @@ -406,6 +406,9 @@ void __init start_parisc(void) int ret, cpunum; struct pdc_coproc_cfg coproc_cfg;
+ /* check QEMU/SeaBIOS marker in PAGE0 */ + running_on_qemu = (memcmp(&PAGE0->pad0, "SeaBIOS", 8) == 0); + cpunum = smp_processor_id();
set_firmware_width_unlocked();
From: Sven Schnelle svens@stackframe.org
commit 45efd871bf0a47648f119d1b41467f70484de5bc upstream.
While working on kretprobes for PA-RISC I was wondering why the kprobes sanity test always fails on kretprobes. This is caused by returning gpr20 instead of gpr28.
Signed-off-by: Sven Schnelle svens@stackframe.org Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # 4.14+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/parisc/include/asm/ptrace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/parisc/include/asm/ptrace.h +++ b/arch/parisc/include/asm/ptrace.h @@ -22,7 +22,7 @@ unsigned long profile_pc(struct pt_regs
static inline unsigned long regs_return_value(struct pt_regs *regs) { - return regs->gr[20]; + return regs->gr[28]; }
#endif
From: Andrei Vagin avagin@gmail.com
commit 07d7e12091f4ab869cc6a4bb276399057e73b0b3 upstream.
To calculate a remaining time, it's required to subtract the current time from the expiration time. In alarm_timer_remaining() the arguments of ktime_sub are swapped.
Fixes: d653d8457c76 ("alarmtimer: Implement remaining callback") Signed-off-by: Andrei Vagin avagin@gmail.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Mukesh Ojha mojha@codeaurora.org Cc: Stephen Boyd sboyd@kernel.org Cc: John Stultz john.stultz@linaro.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190408041542.26338-1-avagin@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/time/alarmtimer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/time/alarmtimer.c +++ b/kernel/time/alarmtimer.c @@ -597,7 +597,7 @@ static ktime_t alarm_timer_remaining(str { struct alarm *alarm = &timr->it.alarm.alarmtimer;
- return ktime_sub(now, alarm->node.expires); + return ktime_sub(alarm->node.expires, now); }
/**
From: Dave Airlie airlied@redhat.com
commit 9b39b013037fbfa8d4b999345d9e904d8a336fc2 upstream.
If we unplug a udl device, the usb callback will deinit the mode_config struct; however, userspace will still have an open file descriptor and a framebuffer on that device. When userspace closes the fd, we'll oops because it will try to look things up in the object idr which we've destroyed.
This punts destroying the mode objects until release time instead.
Cc: stable@vger.kernel.org Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Dave Airlie airlied@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20190405031715.5959-2-airlied@... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/udl/udl_drv.c | 1 + drivers/gpu/drm/udl/udl_drv.h | 1 + drivers/gpu/drm/udl/udl_main.c | 8 +++++++- 3 files changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/udl/udl_drv.c +++ b/drivers/gpu/drm/udl/udl_drv.c @@ -47,6 +47,7 @@ static struct drm_driver driver = { .driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_PRIME, .load = udl_driver_load, .unload = udl_driver_unload, + .release = udl_driver_release,
/* gem hooks */ .gem_free_object = udl_gem_free_object, --- a/drivers/gpu/drm/udl/udl_drv.h +++ b/drivers/gpu/drm/udl/udl_drv.h @@ -101,6 +101,7 @@ void udl_urb_completion(struct urb *urb)
int udl_driver_load(struct drm_device *dev, unsigned long flags); void udl_driver_unload(struct drm_device *dev); +void udl_driver_release(struct drm_device *dev);
int udl_fbdev_init(struct drm_device *dev); void udl_fbdev_cleanup(struct drm_device *dev); --- a/drivers/gpu/drm/udl/udl_main.c +++ b/drivers/gpu/drm/udl/udl_main.c @@ -378,6 +378,12 @@ void udl_driver_unload(struct drm_device udl_free_urb_list(dev);
udl_fbdev_cleanup(dev); - udl_modeset_cleanup(dev); kfree(udl); } + +void udl_driver_release(struct drm_device *dev) +{ + udl_modeset_cleanup(dev); + drm_dev_fini(dev); + kfree(dev); +}
From: Arnd Bergmann arnd@arndb.de
commit 6147e136ff5071609b54f18982dea87706288e21 upstream.
clang points out with hundreds of warnings that the bitrev macros have a problem with constant input:
drivers/hwmon/sht15.c:187:11: error: variable '__x' is uninitialized when used within its own initialization [-Werror,-Wuninitialized] u8 crc = bitrev8(data->val_status & 0x0F); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/bitrev.h:102:21: note: expanded from macro 'bitrev8' __constant_bitrev8(__x) : \ ~~~~~~~~~~~~~~~~~~~^~~~ include/linux/bitrev.h:67:11: note: expanded from macro '__constant_bitrev8' u8 __x = x; \ ~~~ ^
Both the bitrev and the __constant_bitrev macros use an internal variable named __x, which goes horribly wrong when passing one to the other.
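The effect is easy to reproduce outside the kernel; a tiny sketch with hypothetical macros (not the bitrev ones) shows the same clang warning and the same breakage:

    #include <stdio.h>

    /* both macros use a temporary named __x, like the old bitrev code */
    #define inner(x)  ({ unsigned char __x = (x); (unsigned char)(__x + 1); })
    #define outer(x)  ({ unsigned char __x = (x); inner(__x); })

    int main(void)
    {
            /* inner(__x) expands to "unsigned char __x = (__x);" -- the inner
             * __x shadows the outer one and is read before it is initialized,
             * so this prints garbage instead of 6. */
            printf("%u\n", outer(5));
            return 0;
    }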
The obvious fix is to rename one of the variables, so this adds an extra '_'.
It seems we got away with this because
- there are only a few drivers using bitrev macros
- usually there are no constant arguments to those
- when they are constant, they tend to be either 0 or (unsigned)-1 (drivers/isdn/i4l/isdnhdlc.o, drivers/iio/amplifiers/ad8366.c) and give the correct result by pure chance.
In fact, the only driver that I could find that gets different results with this is drivers/net/wan/slic_ds26522.c, which in turn is a driver for fairly rare hardware (adding the maintainer to Cc for testing).
Link: http://lkml.kernel.org/r/20190322140503.123580-1-arnd@arndb.de Fixes: 556d2f055bf6 ("ARM: 8187/1: add CONFIG_HAVE_ARCH_BITREVERSE to support rbit instruction") Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Nick Desaulniers ndesaulniers@google.com Cc: Zhao Qiang qiang.zhao@nxp.com Cc: Yalin Wang yalin.wang@sonymobile.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/bitrev.h | 46 +++++++++++++++++++++++----------------------- 1 file changed, 23 insertions(+), 23 deletions(-)
--- a/include/linux/bitrev.h +++ b/include/linux/bitrev.h @@ -34,41 +34,41 @@ static inline u32 __bitrev32(u32 x)
#define __constant_bitrev32(x) \ ({ \ - u32 __x = x; \ - __x = (__x >> 16) | (__x << 16); \ - __x = ((__x & (u32)0xFF00FF00UL) >> 8) | ((__x & (u32)0x00FF00FFUL) << 8); \ - __x = ((__x & (u32)0xF0F0F0F0UL) >> 4) | ((__x & (u32)0x0F0F0F0FUL) << 4); \ - __x = ((__x & (u32)0xCCCCCCCCUL) >> 2) | ((__x & (u32)0x33333333UL) << 2); \ - __x = ((__x & (u32)0xAAAAAAAAUL) >> 1) | ((__x & (u32)0x55555555UL) << 1); \ - __x; \ + u32 ___x = x; \ + ___x = (___x >> 16) | (___x << 16); \ + ___x = ((___x & (u32)0xFF00FF00UL) >> 8) | ((___x & (u32)0x00FF00FFUL) << 8); \ + ___x = ((___x & (u32)0xF0F0F0F0UL) >> 4) | ((___x & (u32)0x0F0F0F0FUL) << 4); \ + ___x = ((___x & (u32)0xCCCCCCCCUL) >> 2) | ((___x & (u32)0x33333333UL) << 2); \ + ___x = ((___x & (u32)0xAAAAAAAAUL) >> 1) | ((___x & (u32)0x55555555UL) << 1); \ + ___x; \ })
#define __constant_bitrev16(x) \ ({ \ - u16 __x = x; \ - __x = (__x >> 8) | (__x << 8); \ - __x = ((__x & (u16)0xF0F0U) >> 4) | ((__x & (u16)0x0F0FU) << 4); \ - __x = ((__x & (u16)0xCCCCU) >> 2) | ((__x & (u16)0x3333U) << 2); \ - __x = ((__x & (u16)0xAAAAU) >> 1) | ((__x & (u16)0x5555U) << 1); \ - __x; \ + u16 ___x = x; \ + ___x = (___x >> 8) | (___x << 8); \ + ___x = ((___x & (u16)0xF0F0U) >> 4) | ((___x & (u16)0x0F0FU) << 4); \ + ___x = ((___x & (u16)0xCCCCU) >> 2) | ((___x & (u16)0x3333U) << 2); \ + ___x = ((___x & (u16)0xAAAAU) >> 1) | ((___x & (u16)0x5555U) << 1); \ + ___x; \ })
#define __constant_bitrev8x4(x) \ ({ \ - u32 __x = x; \ - __x = ((__x & (u32)0xF0F0F0F0UL) >> 4) | ((__x & (u32)0x0F0F0F0FUL) << 4); \ - __x = ((__x & (u32)0xCCCCCCCCUL) >> 2) | ((__x & (u32)0x33333333UL) << 2); \ - __x = ((__x & (u32)0xAAAAAAAAUL) >> 1) | ((__x & (u32)0x55555555UL) << 1); \ - __x; \ + u32 ___x = x; \ + ___x = ((___x & (u32)0xF0F0F0F0UL) >> 4) | ((___x & (u32)0x0F0F0F0FUL) << 4); \ + ___x = ((___x & (u32)0xCCCCCCCCUL) >> 2) | ((___x & (u32)0x33333333UL) << 2); \ + ___x = ((___x & (u32)0xAAAAAAAAUL) >> 1) | ((___x & (u32)0x55555555UL) << 1); \ + ___x; \ })
#define __constant_bitrev8(x) \ ({ \ - u8 __x = x; \ - __x = (__x >> 4) | (__x << 4); \ - __x = ((__x & (u8)0xCCU) >> 2) | ((__x & (u8)0x33U) << 2); \ - __x = ((__x & (u8)0xAAU) >> 1) | ((__x & (u8)0x55U) << 1); \ - __x; \ + u8 ___x = x; \ + ___x = (___x >> 4) | (___x << 4); \ + ___x = ((___x & (u8)0xCCU) >> 2) | ((___x & (u8)0x33U) << 2); \ + ___x = ((___x & (u8)0xAAU) >> 1) | ((___x & (u8)0x55U) << 1); \ + ___x; \ })
#define bitrev32(x) \
From: S.j. Wang shengjiu.wang@nxp.com
commit 0ff4e8c61b794a4bf6c854ab071a1abaaa80f358 upstream.
There is a very low probability (< 0.1%) that a channel swap happens at the beginning of a stream when multiple output/input pins are enabled. The issue is that the hardware can't send data to the correct pin at the beginning with the normal enable flow.
This is a hardware issue, but there is no erratum; the workaround flow is: each time playback/recording starts, first clear xSMA/xSMB, then enable TE/RE, then enable xSMB and xSMA (xSMB must be enabled before xSMA). This makes xSMA the register that triggers the start, whereas previously the xCR_TE or xCR_RE bit was used for starting.
Fixes commit 43d24e76b698 ("ASoC: fsl_esai: Add ESAI CPU DAI driver") Cc: stable@vger.kernel.org Reviewed-by: Fabio Estevam festevam@gmail.com Acked-by: Nicolin Chen nicoleotsuka@gmail.com Signed-off-by: Shengjiu Wang shengjiu.wang@nxp.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/soc/fsl/fsl_esai.c | 47 +++++++++++++++++++++++++++++++++++++---------- 1 file changed, 37 insertions(+), 10 deletions(-)
--- a/sound/soc/fsl/fsl_esai.c +++ b/sound/soc/fsl/fsl_esai.c @@ -58,6 +58,8 @@ struct fsl_esai { u32 fifo_depth; u32 slot_width; u32 slots; + u32 tx_mask; + u32 rx_mask; u32 hck_rate[2]; u32 sck_rate[2]; bool hck_dir[2]; @@ -358,21 +360,13 @@ static int fsl_esai_set_dai_tdm_slot(str regmap_update_bits(esai_priv->regmap, REG_ESAI_TCCR, ESAI_xCCR_xDC_MASK, ESAI_xCCR_xDC(slots));
- regmap_update_bits(esai_priv->regmap, REG_ESAI_TSMA, - ESAI_xSMA_xS_MASK, ESAI_xSMA_xS(tx_mask)); - regmap_update_bits(esai_priv->regmap, REG_ESAI_TSMB, - ESAI_xSMB_xS_MASK, ESAI_xSMB_xS(tx_mask)); - regmap_update_bits(esai_priv->regmap, REG_ESAI_RCCR, ESAI_xCCR_xDC_MASK, ESAI_xCCR_xDC(slots));
- regmap_update_bits(esai_priv->regmap, REG_ESAI_RSMA, - ESAI_xSMA_xS_MASK, ESAI_xSMA_xS(rx_mask)); - regmap_update_bits(esai_priv->regmap, REG_ESAI_RSMB, - ESAI_xSMB_xS_MASK, ESAI_xSMB_xS(rx_mask)); - esai_priv->slot_width = slot_width; esai_priv->slots = slots; + esai_priv->tx_mask = tx_mask; + esai_priv->rx_mask = rx_mask;
return 0; } @@ -593,6 +587,7 @@ static int fsl_esai_trigger(struct snd_p bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK; u8 i, channels = substream->runtime->channels; u32 pins = DIV_ROUND_UP(channels, esai_priv->slots); + u32 mask;
switch (cmd) { case SNDRV_PCM_TRIGGER_START: @@ -605,15 +600,38 @@ static int fsl_esai_trigger(struct snd_p for (i = 0; tx && i < channels; i++) regmap_write(esai_priv->regmap, REG_ESAI_ETDR, 0x0);
+ /* + * When set the TE/RE in the end of enablement flow, there + * will be channel swap issue for multi data line case. + * In order to workaround this issue, we switch the bit + * enablement sequence to below sequence + * 1) clear the xSMB & xSMA: which is done in probe and + * stop state. + * 2) set TE/RE + * 3) set xSMB + * 4) set xSMA: xSMA is the last one in this flow, which + * will trigger esai to start. + */ regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), tx ? ESAI_xCR_TE_MASK : ESAI_xCR_RE_MASK, tx ? ESAI_xCR_TE(pins) : ESAI_xCR_RE(pins)); + mask = tx ? esai_priv->tx_mask : esai_priv->rx_mask; + + regmap_update_bits(esai_priv->regmap, REG_ESAI_xSMB(tx), + ESAI_xSMB_xS_MASK, ESAI_xSMB_xS(mask)); + regmap_update_bits(esai_priv->regmap, REG_ESAI_xSMA(tx), + ESAI_xSMA_xS_MASK, ESAI_xSMA_xS(mask)); + break; case SNDRV_PCM_TRIGGER_SUSPEND: case SNDRV_PCM_TRIGGER_STOP: case SNDRV_PCM_TRIGGER_PAUSE_PUSH: regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), tx ? ESAI_xCR_TE_MASK : ESAI_xCR_RE_MASK, 0); + regmap_update_bits(esai_priv->regmap, REG_ESAI_xSMA(tx), + ESAI_xSMA_xS_MASK, 0); + regmap_update_bits(esai_priv->regmap, REG_ESAI_xSMB(tx), + ESAI_xSMB_xS_MASK, 0);
/* Disable and reset FIFO */ regmap_update_bits(esai_priv->regmap, REG_ESAI_xFCR(tx), @@ -903,6 +921,15 @@ static int fsl_esai_probe(struct platfor return ret; }
+ esai_priv->tx_mask = 0xFFFFFFFF; + esai_priv->rx_mask = 0xFFFFFFFF; + + /* Clear the TSMA, TSMB, RSMA, RSMB */ + regmap_write(esai_priv->regmap, REG_ESAI_TSMA, 0); + regmap_write(esai_priv->regmap, REG_ESAI_TSMB, 0); + regmap_write(esai_priv->regmap, REG_ESAI_RSMA, 0); + regmap_write(esai_priv->regmap, REG_ESAI_RSMB, 0); + ret = devm_snd_soc_register_component(&pdev->dev, &fsl_esai_component, &fsl_esai_dai, 1); if (ret) {
From: Filipe Manana fdmanana@suse.com
commit f35f06c35560a86e841631f0243b83a984dc11a9 upstream.
When a filesystem is mounted with the nologreplay mount option, which requires it to be mounted in RO mode as well, we can not allow discard on free space inside block groups, because log trees refer to extents that are not pinned in a block group's free space cache (pinning the extents is precisely the first phase of replaying a log tree).
So do not allow the fitrim ioctl to do anything when the filesystem is mounted with the nologreplay option, because later it can be mounted RW without that option, which causes log replay to happen and result in either a failure to replay the log trees (leading to a mount failure), a crash or some silent corruption.
Reported-by: Darrick J. Wong darrick.wong@oracle.com Fixes: 96da09192cda ("btrfs: Introduce new mount option to disable tree log replay") CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Nikolay Borisov nborisov@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/ioctl.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -357,6 +357,16 @@ static noinline int btrfs_ioctl_fitrim(s if (!capable(CAP_SYS_ADMIN)) return -EPERM;
+ /* + * If the fs is mounted with nologreplay, which requires it to be + * mounted in RO mode as well, we can not allow discard on free space + * inside block groups, because log trees refer to extents that are not + * pinned in a block group's free space cache (pinning the extents is + * precisely the first phase of replaying a log tree). + */ + if (btrfs_test_opt(fs_info, NOLOGREPLAY)) + return -EROFS; + rcu_read_lock(); list_for_each_entry_rcu(device, &fs_info->fs_devices->devices, dev_list) {
From: Anand Jain anand.jain@oracle.com
commit 50398fde997f6be8faebdb5f38e9c9c467370f51 upstream.
We let pass zstd compression parameter even if it is not fully valid. For example:
$ btrfs prop set /btrfs compression zst $ btrfs prop get /btrfs compression compression=zst
zlib and lzo are fine.
Fix it by checking the correct prefix length.
Fixes: 5c1aab1dd544 ("btrfs: Add zstd support") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Nikolay Borisov nborisov@suse.com Signed-off-by: Anand Jain anand.jain@oracle.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/props.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/btrfs/props.c +++ b/fs/btrfs/props.c @@ -416,7 +416,7 @@ static int prop_compression_apply(struct btrfs_set_fs_incompat(fs_info, COMPRESS_LZO); } else if (!strncmp("zlib", value, 4)) { type = BTRFS_COMPRESS_ZLIB; - } else if (!strncmp("zstd", value, len)) { + } else if (!strncmp("zstd", value, 4)) { type = BTRFS_COMPRESS_ZSTD; btrfs_set_fs_incompat(fs_info, COMPRESS_ZSTD); } else {
From: Anand Jain anand.jain@oracle.com
commit 272e5326c7837697882ce3162029ba893059b616 upstream.
The compression property resets to NULL, instead of to the old value, if we fail to set the new compression parameter.
$ btrfs prop get /btrfs compression compression=lzo $ btrfs prop set /btrfs compression zli ERROR: failed to set compression for /btrfs: Invalid argument $ btrfs prop get /btrfs compression
This is because the compression property ->validate() is successful for 'zli' as the strncmp() used the length passed from the userspace.
Fix it by using the expected string length in strncmp().
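A small userspace demonstration of the difference (illustrative only, not the btrfs code): bounding strncmp() by the user-supplied length accepts any prefix of a valid name, while bounding it by the expected name's length rejects truncated input.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            const char *value = "zli";        /* what userspace passed */
            size_t len = strlen(value);       /* 3 */

            /* old check: bounded by the user-supplied length -> bogus match */
            printf("%d\n", strncmp("zlib", value, len)); /* prints 0 */

            /* fixed check: bounded by the expected name's length -> rejected */
            printf("%d\n", strncmp("zlib", value, 4));   /* prints non-zero */
            return 0;
    }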
Fixes: 63541927c8d1 ("Btrfs: add support for inode properties") Fixes: 5c1aab1dd544 ("btrfs: Add zstd support") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Nikolay Borisov nborisov@suse.com Signed-off-by: Anand Jain anand.jain@oracle.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/props.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/btrfs/props.c +++ b/fs/btrfs/props.c @@ -386,11 +386,11 @@ int btrfs_subvol_inherit_props(struct bt
static int prop_compression_validate(const char *value, size_t len) { - if (!strncmp("lzo", value, len)) + if (!strncmp("lzo", value, 3)) return 0; - else if (!strncmp("zlib", value, len)) + else if (!strncmp("zlib", value, 4)) return 0; - else if (!strncmp("zstd", value, len)) + else if (!strncmp("zstd", value, 4)) return 0;
return -EINVAL;
From: Jérôme Glisse jglisse@redhat.com
commit a3761c3c91209b58b6f33bf69dd8bb8ec0c9d925 upstream.
When bio_add_pc_page() fails in bio_copy_user_iov() we should free the page we just allocated otherwise we are leaking it.
Cc: linux-block@vger.kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: stable@vger.kernel.org Reviewed-by: Chaitanya Kulkarni chaitanya.kulkarni@wdc.com Signed-off-by: Jérôme Glisse jglisse@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- block/bio.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/block/bio.c +++ b/block/bio.c @@ -1280,8 +1280,11 @@ struct bio *bio_copy_user_iov(struct req } }
- if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes) + if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes) { + if (!map_data) + __free_page(page); break; + }
len -= bytes; offset = 0;
From: Jason Yan yanaijie@huawei.com
commit a89afe58f1a74aac768a5eb77af95ef4ee15beaa upstream.
If the last bio to complete is not dio->bio, its status will not be assigned to dio->bio if it is an error. This causes the status of the whole IO to be wrong.
ksoftirqd/21-117 [021] ..s. 4017.966090: 8,0 C N 4883648 [0] <idle>-0 [018] ..s. 4017.970888: 8,0 C WS 4924800 + 1024 [0] <idle>-0 [018] ..s. 4017.970909: 8,0 D WS 4935424 + 1024 [<idle>] <idle>-0 [018] ..s. 4017.970924: 8,0 D WS 4936448 + 321 [<idle>] ksoftirqd/21-117 [021] ..s. 4017.995033: 8,0 C R 4883648 + 336 [65475] ksoftirqd/21-117 [021] d.s. 4018.001988: myprobe1: (blkdev_bio_end_io+0x0/0x168) bi_status=7 ksoftirqd/21-117 [021] d.s. 4018.001992: myprobe: (aio_complete_rw+0x0/0x148) x0=0xffff802f2595ad80 res=0x12a000 res2=0x0
We always have to assign bio->bi_status to dio->bio.bi_status because we will only check dio->bio.bi_status when we return the whole IO to the upper layer.
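A simplified userspace sketch of that rule (assumed structure and names, not the fs/block_dev.c code): every completing child folds its error into the shared status before the last-reference check, so an error on the last-completing bio cannot be lost.

    #include <stdatomic.h>
    #include <stdio.h>

    struct dio {
            atomic_int ref;   /* outstanding child I/Os */
            int status;       /* first error seen, 0 if none */
    };

    /* completion handler for one child I/O; status != 0 means error */
    static void child_end_io(struct dio *dio, int status)
    {
            /* record the error unconditionally, before the refcount check */
            if (status && !dio->status)
                    dio->status = status;

            if (atomic_fetch_sub(&dio->ref, 1) == 1)
                    printf("whole I/O completes with status %d\n", dio->status);
    }

    int main(void)
    {
            struct dio dio = { .ref = 2, .status = 0 };

            child_end_io(&dio, 0);    /* first child: success */
            child_end_io(&dio, -5);   /* last child: -EIO; must not be lost */
            return 0;
    }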
Fixes: 542ff7bf18c6 ("block: new direct I/O implementation") Cc: stable@vger.kernel.org Cc: Christoph Hellwig hch@lst.de Cc: Jens Axboe axboe@kernel.dk Reviewed-by: Ming Lei ming.lei@redhat.com Signed-off-by: Jason Yan yanaijie@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/block_dev.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -306,10 +306,10 @@ static void blkdev_bio_end_io(struct bio struct blkdev_dio *dio = bio->bi_private; bool should_dirty = dio->should_dirty;
- if (dio->multi_bio && !atomic_dec_and_test(&dio->ref)) { - if (bio->bi_status && !dio->bio.bi_status) - dio->bio.bi_status = bio->bi_status; - } else { + if (bio->bi_status && !dio->bio.bi_status) + dio->bio.bi_status = bio->bi_status; + + if (!dio->multi_bio || atomic_dec_and_test(&dio->ref)) { if (!dio->is_sync) { struct kiocb *iocb = dio->iocb; ssize_t ret;
From: Stephen Boyd swboyd@chromium.org
commit 325aa19598e410672175ed50982f902d4e3f31c5 upstream.
If a child irqchip calls irq_chip_set_wake_parent() but its parent irqchip has the IRQCHIP_SKIP_SET_WAKE flag set, an error is returned.
This is inconsistent behaviour vs. set_irq_wake_real() which returns 0 when the irqchip has the IRQCHIP_SKIP_SET_WAKE flag set. It doesn't attempt to walk the chain of parents and set irq wake on any chips that don't have the flag set either. If the intent is to call the .irq_set_wake() callback of the parent irqchip, then we expect irqchip implementations to omit the IRQCHIP_SKIP_SET_WAKE flag and implement an .irq_set_wake() function that calls irq_chip_set_wake_parent().
The problem has been observed on a Qualcomm sdm845 device where set wake fails on any GPIO interrupts after applying work in progress wakeup irq patches to the GPIO driver. The chain of chips looks like this:
QCOM GPIO -> QCOM PDC (SKIP) -> ARM GIC (SKIP)
The GPIO controllers parent is the QCOM PDC irqchip which in turn has ARM GIC as parent. The QCOM PDC irqchip has the IRQCHIP_SKIP_SET_WAKE flag set, and so does the grandparent ARM GIC.
The GPIO driver doesn't know if the parent needs to set wake or not, so it unconditionally calls irq_chip_set_wake_parent() causing this function to return a failure because the parent irqchip (PDC) doesn't have the .irq_set_wake() callback set. Returning 0 instead makes everything work and irqs from the GPIO controller can be configured for wakeup.
Make it consistent by returning 0 (success) from irq_chip_set_wake_parent() when a parent chip has IRQCHIP_SKIP_SET_WAKE set.
[ tglx: Massaged changelog ]
Fixes: 08b55e2a9208e ("genirq: Add irqchip_set_wake_parent") Signed-off-by: Stephen Boyd swboyd@chromium.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Marc Zyngier marc.zyngier@arm.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-gpio@vger.kernel.org Cc: Lina Iyer ilina@codeaurora.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190325181026.247796-1-swboyd@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/irq/chip.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/kernel/irq/chip.c +++ b/kernel/irq/chip.c @@ -1363,6 +1363,10 @@ int irq_chip_set_vcpu_affinity_parent(st int irq_chip_set_wake_parent(struct irq_data *data, unsigned int on) { data = data->parent_data; + + if (data->chip->flags & IRQCHIP_SKIP_SET_WAKE) + return 0; + if (data->chip->irq_set_wake) return data->chip->irq_set_wake(data, on);
From: Kefeng Wang wangkefeng.wang@huawei.com
commit e8458e7afa855317b14915d7b86ab3caceea7eb6 upstream.
When CONFIG_SPARSE_IRQ is disabled, the request_mutex in struct irq_desc is not initialized, which causes a malfunction.
Fixes: 9114014cf4e6 ("genirq: Add mutex to irq desc to serialize request/free_irq()") Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Mukesh Ojha mojha@codeaurora.org Cc: Marc Zyngier marc.zyngier@arm.com Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190404074512.145533-1-wangkefeng.wang@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/irq/irqdesc.c | 1 + 1 file changed, 1 insertion(+)
--- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -535,6 +535,7 @@ int __init early_irq_init(void) alloc_masks(&desc[i], node); raw_spin_lock_init(&desc[i].lock); lockdep_set_class(&desc[i].lock, &irq_desc_lock_class); + mutex_init(&desc[i].request_mutex); desc_set_defaults(i, &desc[i], node, NULL, NULL); } return arch_early_irq_init();
From: Cornelia Huck cohuck@redhat.com
commit cf94db21905333e610e479688add629397a4b384 upstream.
vring_create_virtqueue() allows the caller to specify via the may_reduce_num parameter whether the vring code is allowed to allocate a smaller ring than specified.
However, the split ring allocation code tries to allocate a smaller ring on allocation failure regardless of what the caller specified. This may cause trouble for e.g. virtio-pci in legacy mode, which does not support ring resizing. (The packed ring code does not resize in any case.)
Let's fix this by bailing out immediately in the split ring code if the requested size cannot be allocated and may_reduce_num has not been specified.
While at it, fix a typo in the usage instructions.
Fixes: 2a2d1382fe9d ("virtio: Add improved queue allocation API") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Cornelia Huck cohuck@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Halil Pasic pasic@linux.ibm.com Reviewed-by: Jens Freimann jfreimann@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/virtio/virtio_ring.c | 2 ++ include/linux/virtio_ring.h | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -1087,6 +1087,8 @@ struct virtqueue *vring_create_virtqueue GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO); if (queue) break; + if (!may_reduce_num) + return NULL; }
if (!num) --- a/include/linux/virtio_ring.h +++ b/include/linux/virtio_ring.h @@ -63,7 +63,7 @@ struct virtqueue; /* * Creates a virtqueue and allocates the descriptor ring. If * may_reduce_num is set, then this may allocate a smaller ring than - * expected. The caller should query virtqueue_get_ring_size to learn + * expected. The caller should query virtqueue_get_vring_size to learn * the actual size of the ring. */ struct virtqueue *vring_create_virtqueue(unsigned int index,
From: Peter Ujfalusi peter.ujfalusi@ti.com
commit 6691370646e844be98bb6558c024269791d20bd7 upstream.
Correctly map the regulators used by tlv320aic3106. Both the 1.8V and 3.3V supplies for the codec are derived from VBAT via fixed regulators.
Cc: Stable@vger.kernel.org # v4.14+ Signed-off-by: Peter Ujfalusi peter.ujfalusi@ti.com Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/am335x-evmsk.dts | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-)
--- a/arch/arm/boot/dts/am335x-evmsk.dts +++ b/arch/arm/boot/dts/am335x-evmsk.dts @@ -73,6 +73,24 @@ enable-active-high; };
+ /* TPS79518 */ + v1_8d_reg: fixedregulator-v1_8d { + compatible = "regulator-fixed"; + regulator-name = "v1_8d"; + vin-supply = <&vbat>; + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <1800000>; + }; + + /* TPS78633 */ + v3_3d_reg: fixedregulator-v3_3d { + compatible = "regulator-fixed"; + regulator-name = "v3_3d"; + vin-supply = <&vbat>; + regulator-min-microvolt = <3300000>; + regulator-max-microvolt = <3300000>; + }; + leds { pinctrl-names = "default"; pinctrl-0 = <&user_leds_s0>; @@ -493,10 +511,10 @@ status = "okay";
/* Regulators */ - AVDD-supply = <&vaux2_reg>; - IOVDD-supply = <&vaux2_reg>; - DRVDD-supply = <&vaux2_reg>; - DVDD-supply = <&vbat>; + AVDD-supply = <&v3_3d_reg>; + IOVDD-supply = <&v3_3d_reg>; + DRVDD-supply = <&v3_3d_reg>; + DVDD-supply = <&v1_8d_reg>; }; };
From: Peter Ujfalusi peter.ujfalusi@ti.com
commit 4f96dc0a3e79ec257a2b082dab3ee694ff88c317 upstream.
Correctly map the regulators used by tlv320aic3106. Both the 1.8V and 3.3V supplies for the codec are derived from VBAT via fixed regulators.
Cc: Stable@vger.kernel.org # v4.14+ Signed-off-by: Peter Ujfalusi peter.ujfalusi@ti.com Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/am335x-evm.dts | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-)
--- a/arch/arm/boot/dts/am335x-evm.dts +++ b/arch/arm/boot/dts/am335x-evm.dts @@ -57,6 +57,24 @@ enable-active-high; };
+ /* TPS79501 */ + v1_8d_reg: fixedregulator-v1_8d { + compatible = "regulator-fixed"; + regulator-name = "v1_8d"; + vin-supply = <&vbat>; + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <1800000>; + }; + + /* TPS79501 */ + v3_3d_reg: fixedregulator-v3_3d { + compatible = "regulator-fixed"; + regulator-name = "v3_3d"; + vin-supply = <&vbat>; + regulator-min-microvolt = <3300000>; + regulator-max-microvolt = <3300000>; + }; + matrix_keypad: matrix_keypad0 { compatible = "gpio-matrix-keypad"; debounce-delay-ms = <5>; @@ -492,10 +510,10 @@ status = "okay";
/* Regulators */ - AVDD-supply = <&vaux2_reg>; - IOVDD-supply = <&vaux2_reg>; - DRVDD-supply = <&vaux2_reg>; - DVDD-supply = <&vbat>; + AVDD-supply = <&v3_3d_reg>; + IOVDD-supply = <&v3_3d_reg>; + DRVDD-supply = <&v3_3d_reg>; + DVDD-supply = <&v1_8d_reg>; }; };
From: David Engraf david.engraf@sysgo.com
commit e7dfb6d04e4715be1f3eb2c60d97b753fd2e4516 upstream.
The function argument for ISC_D0 on PC9 was incorrect. According to the documentation, it should be 'C', i.e. 3.
Signed-off-by: David Engraf david.engraf@sysgo.com Reviewed-by: Nicolas Ferre nicolas.ferre@microchip.com Signed-off-by: Ludovic Desroches ludovic.desroches@microchip.com Fixes: 7f16cb676c00 ("ARM: at91/dt: add sama5d2 pinmux") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/sama5d2-pinfunc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/boot/dts/sama5d2-pinfunc.h +++ b/arch/arm/boot/dts/sama5d2-pinfunc.h @@ -518,7 +518,7 @@ #define PIN_PC9__GPIO PINMUX_PIN(PIN_PC9, 0, 0) #define PIN_PC9__FIQ PINMUX_PIN(PIN_PC9, 1, 3) #define PIN_PC9__GTSUCOMP PINMUX_PIN(PIN_PC9, 2, 1) -#define PIN_PC9__ISC_D0 PINMUX_PIN(PIN_PC9, 2, 1) +#define PIN_PC9__ISC_D0 PINMUX_PIN(PIN_PC9, 3, 1) #define PIN_PC9__TIOA4 PINMUX_PIN(PIN_PC9, 4, 2) #define PIN_PC10 74 #define PIN_PC10__GPIO PINMUX_PIN(PIN_PC10, 0, 0)
From: Will Deacon will.deacon@arm.com
commit 045afc24124d80c6998d9c770844c67912083506 upstream.
Rather embarrassingly, our futex() FUTEX_WAKE_OP implementation doesn't explicitly set the return value on the non-faulting path and instead leaves it holding the result of the underlying atomic operation. This means that any FUTEX_WAKE_OP atomic operation which computes a non-zero value will be reported as having failed. Regrettably, I wrote the buggy code back in 2011 and it was upstreamed as part of the initial arm64 support in 2012.
The reasons we appear to get away with this are:
1. FUTEX_WAKE_OP is rarely used and therefore doesn't appear to get exercised by futex() test applications
2. If the result of the atomic operation is zero, the system call behaves correctly
3. Prior to version 2.25, the only operation used by GLIBC set the futex to zero, and therefore worked as expected. From 2.25 onwards, FUTEX_WAKE_OP is not used by GLIBC at all.
Fix the implementation by ensuring that the return value is either 0 to indicate that the atomic operation completed successfully, or -EFAULT if we encountered a fault when accessing the user mapping.
Cc: stable@kernel.org Fixes: 6170a97460db ("arm64: Atomic operations") Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/include/asm/futex.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/arch/arm64/include/asm/futex.h +++ b/arch/arm64/include/asm/futex.h @@ -30,8 +30,8 @@ do { \ " prfm pstl1strm, %2\n" \ "1: ldxr %w1, %2\n" \ insn "\n" \ -"2: stlxr %w3, %w0, %2\n" \ -" cbnz %w3, 1b\n" \ +"2: stlxr %w0, %w3, %2\n" \ +" cbnz %w0, 1b\n" \ " dmb ish\n" \ "3:\n" \ " .pushsection .fixup,"ax"\n" \ @@ -50,30 +50,30 @@ do { \ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *_uaddr) { - int oldval = 0, ret, tmp; + int oldval, ret, tmp; u32 __user *uaddr = __uaccess_mask_ptr(_uaddr);
pagefault_disable();
switch (op) { case FUTEX_OP_SET: - __futex_atomic_op("mov %w0, %w4", + __futex_atomic_op("mov %w3, %w4", ret, oldval, uaddr, tmp, oparg); break; case FUTEX_OP_ADD: - __futex_atomic_op("add %w0, %w1, %w4", + __futex_atomic_op("add %w3, %w1, %w4", ret, oldval, uaddr, tmp, oparg); break; case FUTEX_OP_OR: - __futex_atomic_op("orr %w0, %w1, %w4", + __futex_atomic_op("orr %w3, %w1, %w4", ret, oldval, uaddr, tmp, oparg); break; case FUTEX_OP_ANDN: - __futex_atomic_op("and %w0, %w1, %w4", + __futex_atomic_op("and %w3, %w1, %w4", ret, oldval, uaddr, tmp, ~oparg); break; case FUTEX_OP_XOR: - __futex_atomic_op("eor %w0, %w1, %w4", + __futex_atomic_op("eor %w3, %w1, %w4", ret, oldval, uaddr, tmp, oparg); break; default:
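For readers who have never used FUTEX_WAKE_OP, here is a minimal user-space sketch of a call that exercises the path fixed above. The wake_op() helper is invented for illustration; per the commit message, on an unfixed arm64 kernel the non-zero result of the FUTEX_OP_ADD operation leaked into the return value, so a call like this was reported as having failed.

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Atomically do *uaddr2 += 1, wake up to one waiter on uaddr, and wake
 * up to one waiter on uaddr2 if the old value of *uaddr2 was > 0. */
static long wake_op(int *uaddr, int *uaddr2)
{
	return syscall(SYS_futex, uaddr, FUTEX_WAKE_OP, 1, 1, uaddr2,
		       FUTEX_OP(FUTEX_OP_ADD, 1, FUTEX_OP_CMP_GT, 0));
}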
From: Peter Geis pgwipeout@gmail.com
commit 6fd8b9780ec1a49ac46e0aaf8775247205e66231 upstream.
Several rk3328 based boards experience high rgmii tx error rates. This is due to several pins in the rk3328.dtsi rgmii pinmux that are missing a defined drive strength setting, which causes the pinmux driver to default to 2ma (bit mask 00).
These pins are only defined in the rk3328.dtsi, and are not listed in the rk3328 specification. The TRM only lists them as "Reserved" (RK3328 TRM V1.1, 3.3.3 Detail Register Description, GRF_GPIO0B_IOMUX, GRF_GPIO0C_IOMUX, GRF_GPIO0D_IOMUX). However, removal of these pins from the rgmii pinmux definition causes the interface to fail to transmit.
Also, the rgmii tx and rx pins defined in the dtsi are not consistent with the rk3328 specification, with tx pins currently set to 12ma and rx pins set to 2ma.
Fix this by setting tx pins to 8ma and the rx pins to 4ma, consistent with the specification. Defining the drive strength for the undefined pins eliminated the high tx packet error rate observed under heavy data transfers. Aligning the drive strength to the TRM values eliminated the occasional packet retry errors under iperf3 testing. This allows much higher data rates with no recorded tx errors.
Tested on the rk3328-roc-cc board.
Fixes: 52e02d377a72 ("arm64: dts: rockchip: add core dtsi file for RK3328 SoCs") Cc: stable@vger.kernel.org Signed-off-by: Peter Geis pgwipeout@gmail.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/boot/dts/rockchip/rk3328.dtsi | 44 +++++++++++++++---------------- 1 file changed, 22 insertions(+), 22 deletions(-)
--- a/arch/arm64/boot/dts/rockchip/rk3328.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3328.dtsi @@ -1530,50 +1530,50 @@ rgmiim1_pins: rgmiim1-pins { rockchip,pins = /* mac_txclk */ - <1 RK_PB4 2 &pcfg_pull_none_12ma>, + <1 RK_PB4 2 &pcfg_pull_none_8ma>, /* mac_rxclk */ - <1 RK_PB5 2 &pcfg_pull_none_2ma>, + <1 RK_PB5 2 &pcfg_pull_none_4ma>, /* mac_mdio */ - <1 RK_PC3 2 &pcfg_pull_none_2ma>, + <1 RK_PC3 2 &pcfg_pull_none_4ma>, /* mac_txen */ - <1 RK_PD1 2 &pcfg_pull_none_12ma>, + <1 RK_PD1 2 &pcfg_pull_none_8ma>, /* mac_clk */ - <1 RK_PC5 2 &pcfg_pull_none_2ma>, + <1 RK_PC5 2 &pcfg_pull_none_4ma>, /* mac_rxdv */ - <1 RK_PC6 2 &pcfg_pull_none_2ma>, + <1 RK_PC6 2 &pcfg_pull_none_4ma>, /* mac_mdc */ - <1 RK_PC7 2 &pcfg_pull_none_2ma>, + <1 RK_PC7 2 &pcfg_pull_none_4ma>, /* mac_rxd1 */ - <1 RK_PB2 2 &pcfg_pull_none_2ma>, + <1 RK_PB2 2 &pcfg_pull_none_4ma>, /* mac_rxd0 */ - <1 RK_PB3 2 &pcfg_pull_none_2ma>, + <1 RK_PB3 2 &pcfg_pull_none_4ma>, /* mac_txd1 */ - <1 RK_PB0 2 &pcfg_pull_none_12ma>, + <1 RK_PB0 2 &pcfg_pull_none_8ma>, /* mac_txd0 */ - <1 RK_PB1 2 &pcfg_pull_none_12ma>, + <1 RK_PB1 2 &pcfg_pull_none_8ma>, /* mac_rxd3 */ - <1 RK_PB6 2 &pcfg_pull_none_2ma>, + <1 RK_PB6 2 &pcfg_pull_none_4ma>, /* mac_rxd2 */ - <1 RK_PB7 2 &pcfg_pull_none_2ma>, + <1 RK_PB7 2 &pcfg_pull_none_4ma>, /* mac_txd3 */ - <1 RK_PC0 2 &pcfg_pull_none_12ma>, + <1 RK_PC0 2 &pcfg_pull_none_8ma>, /* mac_txd2 */ - <1 RK_PC1 2 &pcfg_pull_none_12ma>, + <1 RK_PC1 2 &pcfg_pull_none_8ma>,
/* mac_txclk */ - <0 RK_PB0 1 &pcfg_pull_none>, + <0 RK_PB0 1 &pcfg_pull_none_8ma>, /* mac_txen */ - <0 RK_PB4 1 &pcfg_pull_none>, + <0 RK_PB4 1 &pcfg_pull_none_8ma>, /* mac_clk */ - <0 RK_PD0 1 &pcfg_pull_none>, + <0 RK_PD0 1 &pcfg_pull_none_4ma>, /* mac_txd1 */ - <0 RK_PC0 1 &pcfg_pull_none>, + <0 RK_PC0 1 &pcfg_pull_none_8ma>, /* mac_txd0 */ - <0 RK_PC1 1 &pcfg_pull_none>, + <0 RK_PC1 1 &pcfg_pull_none_8ma>, /* mac_txd3 */ - <0 RK_PC7 1 &pcfg_pull_none>, + <0 RK_PC7 1 &pcfg_pull_none_8ma>, /* mac_txd2 */ - <0 RK_PC6 1 &pcfg_pull_none>; + <0 RK_PC6 1 &pcfg_pull_none_8ma>; };
rmiim1_pins: rmiim1-pins {
From: Will Deacon will.deacon@arm.com
commit 1e6f5440a6814d28c32d347f338bfef68bc3e69d upstream.
Calling dump_backtrace() with a pt_regs argument corresponding to userspace doesn't make any sense and our unwinder will simply print "Call trace:" before unwinding the stack looking for user frames.
Rather than go through this song and dance, just return early if we're passed a user register state.
Cc: stable@vger.kernel.org Fixes: 1149aad10b1e ("arm64: Add dump_backtrace() in show_regs") Reported-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/kernel/traps.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
--- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c @@ -145,10 +145,16 @@ static void dump_instr(const char *lvl, void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) { struct stackframe frame; - int skip; + int skip = 0;
pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
+ if (regs) { + if (user_mode(regs)) + return; + skip = 1; + } + if (!tsk) tsk = current;
@@ -169,7 +175,6 @@ void dump_backtrace(struct pt_regs *regs frame.graph = tsk->curr_ret_stack; #endif
- skip = !!regs; printk("Call trace:\n"); while (1) { unsigned long stack; @@ -232,15 +237,13 @@ static int __die(const char *str, int er return ret;
print_modules(); - __show_regs(regs); pr_emerg("Process %.*s (pid: %d, stack limit = 0x%p)\n", TASK_COMM_LEN, tsk->comm, task_pid_nr(tsk), end_of_stack(tsk)); + show_regs(regs);
- if (!user_mode(regs)) { - dump_backtrace(regs, tsk); + if (!user_mode(regs)) dump_instr(KERN_EMERG, regs); - }
return ret; }
From: Dan Carpenter dan.carpenter@oracle.com
commit 42d8644bd77dd2d747e004e367cb0c895a606f39 upstream.
The "call" variable comes from the user in privcmd_ioctl_hypercall(). It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32) elements. We need to put an upper bound on it to prevent an out of bounds access.
Cc: stable@vger.kernel.org Fixes: 1246ae0bb992 ("xen: add variable hypercall caller") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/xen/hypercall.h | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -217,6 +217,9 @@ privcmd_call(unsigned call, __HYPERCALL_DECLS; __HYPERCALL_5ARG(a1, a2, a3, a4, a5);
+ if (call >= PAGE_SIZE / sizeof(hypercall_page[0])) + return -EINVAL; + stac(); asm volatile(CALL_NOSPEC : __HYPERCALL_5PARAM
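To make the new bound concrete: with a 4 KiB page and 32-byte hypercall stubs (as stated in the commit message), only indices below 128 are valid. A stand-alone sketch with invented names:

#include <stdint.h>

#define TOY_PAGE_SIZE	4096u
#define TOY_STUB_SIZE	32u	/* hypercall_page[] has PAGE_SIZE / 32 entries */

/* A user-supplied "call" index must stay below 4096 / 32 = 128;
 * anything larger would index past the hypercall page. */
static int toy_call_in_range(unsigned int call)
{
	return call < TOY_PAGE_SIZE / TOY_STUB_SIZE;
}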
From: Mel Gorman mgorman@techsingularity.net
commit 0e9f02450da07fc7b1346c8c32c771555173e397 upstream.
A NULL pointer dereference bug was reported on a distribution kernel, but the same issue should be present in the mainline kernel. It occurred on s390 but should not be arch-specific. A partial oops looks like:
Unable to handle kernel pointer dereference in virtual kernel address space ... Call Trace: ... try_to_wake_up+0xfc/0x450 vhost_poll_wakeup+0x3a/0x50 [vhost] __wake_up_common+0xbc/0x178 __wake_up_common_lock+0x9e/0x160 __wake_up_sync_key+0x4e/0x60 sock_def_readable+0x5e/0x98
The bug hits any time between 1 hour and 3 days. The dereference occurs in update_cfs_rq_h_load when accumulating h_load. The problem is that cfs_rq->h_load_next is not protected by any locking and can be updated by parallel calls to task_h_load. Depending on the compiler, code may be generated that re-reads cfs_rq->h_load_next after the check for NULL and then oopses when reading se->avg.load_avg. The disassembly showed that it was possible to re-read h_load_next after the check for NULL.
While this does not appear to be an issue for later compilers, it is still only by accident that correct code is generated. Full locking in this path would have high overhead, so this patch uses READ_ONCE to read h_load_next only once and check it for NULL before dereferencing. It was confirmed that there were no further oopses after 10 days of testing.
As Peter pointed out, it is also necessary to use WRITE_ONCE() to avoid any potential problems with store tearing.
Signed-off-by: Mel Gorman mgorman@techsingularity.net Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Valentin Schneider valentin.schneider@arm.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mike Galbraith efault@gmx.de Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Fixes: 685207963be9 ("sched: Move h_load calculation to task_h_load()") Link: https://lkml.kernel.org/r/20190319123610.nsivgf3mjbjjesxb@techsingularity.ne... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/sched/fair.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7018,10 +7018,10 @@ static void update_cfs_rq_h_load(struct if (cfs_rq->last_h_load_update == now) return;
- cfs_rq->h_load_next = NULL; + WRITE_ONCE(cfs_rq->h_load_next, NULL); for_each_sched_entity(se) { cfs_rq = cfs_rq_of(se); - cfs_rq->h_load_next = se; + WRITE_ONCE(cfs_rq->h_load_next, se); if (cfs_rq->last_h_load_update == now) break; } @@ -7031,7 +7031,7 @@ static void update_cfs_rq_h_load(struct cfs_rq->last_h_load_update = now; }
- while ((se = cfs_rq->h_load_next) != NULL) { + while ((se = READ_ONCE(cfs_rq->h_load_next)) != NULL) { load = cfs_rq->h_load; load = div64_ul(load * se->avg.load_avg, cfs_rq_load_avg(cfs_rq) + 1);
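For anyone less familiar with the pattern, a minimal invented illustration of the hazard the patch closes; struct node and both functions are hypothetical, only READ_ONCE() is the real kernel macro:

struct node { struct node *next; long value; };

long walk_racy(struct node *p)
{
	/* The pointer is loaded twice (or may be re-loaded by the
	 * compiler even if the source reads it once): a concurrent
	 * writer can set p->next to NULL between the check and the
	 * dereference, causing an oops. */
	if (p->next)
		return p->next->value;
	return 0;
}

long walk_safe(struct node *p)
{
	/* A single forced load: the pointer that was checked is the
	 * same pointer that is dereferenced. */
	struct node *n = READ_ONCE(p->next);

	return n ? n->value : 0;
}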
From: Max Filippov jcmvbkbc@gmail.com
commit ada770b1e74a77fff2d5f539bf6c42c25f4784db upstream.
return_address returns the address that is one level higher in the call stack than requested in its argument, because level 0 corresponds to its caller's return address. Use the requested level as the number of stack frames to skip.
This fixes the address reported by might_sleep and friends.
Cc: stable@vger.kernel.org Signed-off-by: Max Filippov jcmvbkbc@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/xtensa/kernel/stacktrace.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/arch/xtensa/kernel/stacktrace.c +++ b/arch/xtensa/kernel/stacktrace.c @@ -253,10 +253,14 @@ static int return_address_cb(struct stac return 1; }
+/* + * level == 0 is for the return address from the caller of this function, + * not from this function itself. + */ unsigned long return_address(unsigned level) { struct return_addr_data r = { - .skip = level + 1, + .skip = level, }; walk_stackframe(stack_pointer(NULL), return_address_cb, &r); return r.addr;
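A hedged sketch of the contract the fix restores, with an invented kernel-style caller:

static void report_caller(void)
{
	/* With the fix, return_address(0) yields report_caller()'s own
	 * return address, i.e. a location inside whoever called it.
	 * The old code skipped one extra frame and pointed one level
	 * further up the stack, which is what broke might_sleep(). */
	pr_info("called from %pS\n", (void *)return_address(0));
}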
From: Lendacky, Thomas Thomas.Lendacky@amd.com
commit 914123fa39042e651d79eaf86bbf63a1b938dddf upstream.
On AMD processors, the detection of an overflowed counter in the NMI handler relies on the current value of the counter. So, for example, to check for overflow on a 48 bit counter, bit 47 is checked to see if it is 1 (not overflowed) or 0 (overflowed).
There is currently a race condition present when disabling and then updating the PMC. Increased NMI latency in newer AMD processors makes this race condition more pronounced. If the counter value has overflowed, it is possible to update the PMC value before the NMI handler can run. The updated PMC value is not an overflowed value, so when the perf NMI handler does run, it will not find an overflowed counter. This may appear as an unknown NMI resulting in either a panic or a series of messages, depending on how the kernel is configured.
To eliminate this race condition, the PMC value must be checked after disabling the counter. Add an AMD function, amd_pmu_disable_all(), that will wait for the NMI handler to reset any active and overflowed counter after calling x86_pmu_disable_all().
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org # 4.14.x- Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@kernel.org Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/amd/core.c | 65 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 62 insertions(+), 3 deletions(-)
--- a/arch/x86/events/amd/core.c +++ b/arch/x86/events/amd/core.c @@ -3,6 +3,7 @@ #include <linux/types.h> #include <linux/init.h> #include <linux/slab.h> +#include <linux/delay.h> #include <asm/apicdef.h>
#include "../perf_event.h" @@ -429,6 +430,64 @@ static void amd_pmu_cpu_dead(int cpu) } }
+/* + * When a PMC counter overflows, an NMI is used to process the event and + * reset the counter. NMI latency can result in the counter being updated + * before the NMI can run, which can result in what appear to be spurious + * NMIs. This function is intended to wait for the NMI to run and reset + * the counter to avoid possible unhandled NMI messages. + */ +#define OVERFLOW_WAIT_COUNT 50 + +static void amd_pmu_wait_on_overflow(int idx) +{ + unsigned int i; + u64 counter; + + /* + * Wait for the counter to be reset if it has overflowed. This loop + * should exit very, very quickly, but just in case, don't wait + * forever... + */ + for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) { + rdmsrl(x86_pmu_event_addr(idx), counter); + if (counter & (1ULL << (x86_pmu.cntval_bits - 1))) + break; + + /* Might be in IRQ context, so can't sleep */ + udelay(1); + } +} + +static void amd_pmu_disable_all(void) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + int idx; + + x86_pmu_disable_all(); + + /* + * This shouldn't be called from NMI context, but add a safeguard here + * to return, since if we're in NMI context we can't wait for an NMI + * to reset an overflowed counter value. + */ + if (in_nmi()) + return; + + /* + * Check each counter for overflow and wait for it to be reset by the + * NMI if it has overflowed. This relies on the fact that all active + * counters are always enabled when this function is caled and + * ARCH_PERFMON_EVENTSEL_INT is always set. + */ + for (idx = 0; idx < x86_pmu.num_counters; idx++) { + if (!test_bit(idx, cpuc->active_mask)) + continue; + + amd_pmu_wait_on_overflow(idx); + } +} + static struct event_constraint * amd_get_event_constraints(struct cpu_hw_events *cpuc, int idx, struct perf_event *event) @@ -622,7 +681,7 @@ static ssize_t amd_event_sysfs_show(char static __initconst const struct x86_pmu amd_pmu = { .name = "AMD", .handle_irq = x86_pmu_handle_irq, - .disable_all = x86_pmu_disable_all, + .disable_all = amd_pmu_disable_all, .enable_all = x86_pmu_enable_all, .enable = x86_pmu_enable_event, .disable = x86_pmu_disable_event, @@ -728,7 +787,7 @@ void amd_pmu_enable_virt(void) cpuc->perf_ctr_virt_mask = 0;
/* Reload all events */ - x86_pmu_disable_all(); + amd_pmu_disable_all(); x86_pmu_enable_all(0); } EXPORT_SYMBOL_GPL(amd_pmu_enable_virt); @@ -746,7 +805,7 @@ void amd_pmu_disable_virt(void) cpuc->perf_ctr_virt_mask = AMD64_EVENTSEL_HOSTONLY;
/* Reload all events */ - x86_pmu_disable_all(); + amd_pmu_disable_all(); x86_pmu_enable_all(0); } EXPORT_SYMBOL_GPL(amd_pmu_disable_virt);
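A small stand-alone sketch of the overflow test described in the commit message (the helper name is invented; the real code reads the counter MSR and checks the same bit):

#include <stdint.h>

/* On a 48-bit PMC, bit 47 (cntval_bits - 1) reads 1 while the counter
 * has not yet overflowed and 0 once it has wrapped. */
static int pmc_overflowed(uint64_t counter, unsigned int cntval_bits)
{
	return !(counter & (1ULL << (cntval_bits - 1)));
}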
From: Lendacky, Thomas Thomas.Lendacky@amd.com
commit 6d3edaae16c6c7d238360f2841212c2b26774d5e upstream.
On AMD processors, the detection of an overflowed PMC counter in the NMI handler relies on the current value of the PMC. So, for example, to check for overflow on a 48-bit counter, bit 47 is checked to see if it is 1 (not overflowed) or 0 (overflowed).
When the perf NMI handler executes it does not know in advance which PMC counters have overflowed. As such, the NMI handler will process all active PMC counters that have overflowed. NMI latency in newer AMD processors can result in multiple overflowed PMC counters being processed in one NMI and then a subsequent NMI, that does not appear to be a back-to-back NMI, not finding any PMC counters that have overflowed. This may appear to be an unhandled NMI resulting in either a panic or a series of messages, depending on how the kernel was configured.
To mitigate this issue, add an AMD handle_irq callback function, amd_pmu_handle_irq(), that will invoke the common x86_pmu_handle_irq() function and, upon return, perform some additional processing to indicate whether the NMI has been handled or would have been handled had an earlier NMI not handled the overflowed PMC. Whenever a PMC is active, a per-CPU variable is set to the lesser of the number of active PMCs or 2; this indicates how many further NMIs can still be attributed to PMC overflows. The value of 2 covers the case where an NMI does not arrive at the LAPIC in time to be collapsed into an already pending NMI. Each time the function is called without having handled an overflowed counter, the per-CPU value is checked: if it is non-zero, it is decremented and the handler reports the NMI as handled; if it is zero, the handler reports the NMI as not handled.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org # 4.14.x- Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@kernel.org Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/amd/core.c | 56 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 55 insertions(+), 1 deletion(-)
--- a/arch/x86/events/amd/core.c +++ b/arch/x86/events/amd/core.c @@ -4,10 +4,13 @@ #include <linux/init.h> #include <linux/slab.h> #include <linux/delay.h> +#include <linux/nmi.h> #include <asm/apicdef.h>
#include "../perf_event.h"
+static DEFINE_PER_CPU(unsigned int, perf_nmi_counter); + static __initconst const u64 amd_hw_cache_event_ids [PERF_COUNT_HW_CACHE_MAX] [PERF_COUNT_HW_CACHE_OP_MAX] @@ -488,6 +491,57 @@ static void amd_pmu_disable_all(void) } }
+/* + * Because of NMI latency, if multiple PMC counters are active or other sources + * of NMIs are received, the perf NMI handler can handle one or more overflowed + * PMC counters outside of the NMI associated with the PMC overflow. If the NMI + * doesn't arrive at the LAPIC in time to become a pending NMI, then the kernel + * back-to-back NMI support won't be active. This PMC handler needs to take into + * account that this can occur, otherwise this could result in unknown NMI + * messages being issued. Examples of this is PMC overflow while in the NMI + * handler when multiple PMCs are active or PMC overflow while handling some + * other source of an NMI. + * + * Attempt to mitigate this by using the number of active PMCs to determine + * whether to return NMI_HANDLED if the perf NMI handler did not handle/reset + * any PMCs. The per-CPU perf_nmi_counter variable is set to a minimum of the + * number of active PMCs or 2. The value of 2 is used in case an NMI does not + * arrive at the LAPIC in time to be collapsed into an already pending NMI. + */ +static int amd_pmu_handle_irq(struct pt_regs *regs) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + int active, handled; + + /* + * Obtain the active count before calling x86_pmu_handle_irq() since + * it is possible that x86_pmu_handle_irq() may make a counter + * inactive (through x86_pmu_stop). + */ + active = __bitmap_weight(cpuc->active_mask, X86_PMC_IDX_MAX); + + /* Process any counter overflows */ + handled = x86_pmu_handle_irq(regs); + + /* + * If a counter was handled, record the number of possible remaining + * NMIs that can occur. + */ + if (handled) { + this_cpu_write(perf_nmi_counter, + min_t(unsigned int, 2, active)); + + return handled; + } + + if (!this_cpu_read(perf_nmi_counter)) + return NMI_DONE; + + this_cpu_dec(perf_nmi_counter); + + return NMI_HANDLED; +} + static struct event_constraint * amd_get_event_constraints(struct cpu_hw_events *cpuc, int idx, struct perf_event *event) @@ -680,7 +734,7 @@ static ssize_t amd_event_sysfs_show(char
static __initconst const struct x86_pmu amd_pmu = { .name = "AMD", - .handle_irq = x86_pmu_handle_irq, + .handle_irq = amd_pmu_handle_irq, .disable_all = amd_pmu_disable_all, .enable_all = x86_pmu_enable_all, .enable = x86_pmu_enable_event,
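To make the bookkeeping concrete, a toy, non-kernel model of the perf_nmi_counter logic added above (the function and its calling convention are invented; in the real code the variable is per-CPU and the return values are NMI_HANDLED/NMI_DONE):

static unsigned int perf_nmi_counter;

/* handled: number of overflowed PMCs processed by this NMI
 * active:  number of PMCs that were active when it fired */
static int toy_handle_nmi(int handled, int active)
{
	if (handled) {
		/* Up to min(active, 2) later "empty" NMIs may still be ours. */
		perf_nmi_counter = active < 2 ? active : 2;
		return 1;	/* handled */
	}

	if (!perf_nmi_counter)
		return 0;	/* genuinely not ours */

	perf_nmi_counter--;
	return 1;		/* claim a latent PMC overflow NMI */
}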
From: Lendacky, Thomas Thomas.Lendacky@amd.com
commit 3966c3feca3fd10b2935caa0b4a08c7dd59469e5 upstream.
Spurious interrupt support was added to perf in the following commit, almost a decade ago:
63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters")
The two previous patches (resolving the race condition when disabling a PMC and NMI latency mitigation) allow for the removal of this older spurious interrupt support.
Currently in x86_pmu_stop(), the bit for the PMC in the active_mask bitmap is cleared before disabling the PMC, which sets up a race condition. This race condition was mitigated by introducing the running bitmap. That race condition can be eliminated by first disabling the PMC, waiting for PMC reset on overflow and then clearing the bit for the PMC in the active_mask bitmap. The NMI handler will not re-enable a disabled counter.
If x86_pmu_stop() is called from the perf NMI handler, the NMI latency mitigation support will guard against any unhandled NMI messages.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org # 4.14.x- Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@kernel.org Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/amd/core.c | 21 +++++++++++++++++++-- arch/x86/events/core.c | 13 +++---------- 2 files changed, 22 insertions(+), 12 deletions(-)
--- a/arch/x86/events/amd/core.c +++ b/arch/x86/events/amd/core.c @@ -4,8 +4,8 @@ #include <linux/init.h> #include <linux/slab.h> #include <linux/delay.h> -#include <linux/nmi.h> #include <asm/apicdef.h> +#include <asm/nmi.h>
#include "../perf_event.h"
@@ -491,6 +491,23 @@ static void amd_pmu_disable_all(void) } }
+static void amd_pmu_disable_event(struct perf_event *event) +{ + x86_pmu_disable_event(event); + + /* + * This can be called from NMI context (via x86_pmu_stop). The counter + * may have overflowed, but either way, we'll never see it get reset + * by the NMI if we're already in the NMI. And the NMI latency support + * below will take care of any pending NMI that might have been + * generated by the overflow. + */ + if (in_nmi()) + return; + + amd_pmu_wait_on_overflow(event->hw.idx); +} + /* * Because of NMI latency, if multiple PMC counters are active or other sources * of NMIs are received, the perf NMI handler can handle one or more overflowed @@ -738,7 +755,7 @@ static __initconst const struct x86_pmu .disable_all = amd_pmu_disable_all, .enable_all = x86_pmu_enable_all, .enable = x86_pmu_enable_event, - .disable = x86_pmu_disable_event, + .disable = amd_pmu_disable_event, .hw_config = amd_pmu_hw_config, .schedule_events = x86_schedule_events, .eventsel = MSR_K7_EVNTSEL0, --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -1328,8 +1328,9 @@ void x86_pmu_stop(struct perf_event *eve struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); struct hw_perf_event *hwc = &event->hw;
- if (__test_and_clear_bit(hwc->idx, cpuc->active_mask)) { + if (test_bit(hwc->idx, cpuc->active_mask)) { x86_pmu.disable(event); + __clear_bit(hwc->idx, cpuc->active_mask); cpuc->events[hwc->idx] = NULL; WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED); hwc->state |= PERF_HES_STOPPED; @@ -1426,16 +1427,8 @@ int x86_pmu_handle_irq(struct pt_regs *r apic_write(APIC_LVTPC, APIC_DM_NMI);
for (idx = 0; idx < x86_pmu.num_counters; idx++) { - if (!test_bit(idx, cpuc->active_mask)) { - /* - * Though we deactivated the counter some cpus - * might still deliver spurious interrupts still - * in flight. Catch them: - */ - if (__test_and_clear_bit(idx, cpuc->running)) - handled++; + if (!test_bit(idx, cpuc->active_mask)) continue; - }
event = cpuc->events[idx];
From: Andre Przywara andre.przywara@arm.com
commit 9cde402a59770a0669d895399c13407f63d7d209 upstream.
I found a Marvell 88SE9170 PCIe SATA controller on a board here. Some quick testing with the ARM SMMU enabled reveals that it suffers from the same requester ID mixup problems as the other Marvell chips already listed.
Add the PCI vendor/device ID to the list of chips which need the workaround.
Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com CC: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/quirks.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -3888,6 +3888,8 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_M /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c14 */ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9130, quirk_dma_func1_alias); +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9170, + quirk_dma_func1_alias); /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c47 + c57 */ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9172, quirk_dma_func1_alias);
From: Ilya Dryomov idryomov@gmail.com
commit eb40c0acdc342b815d4d03ae6abb09e80c0f2988 upstream.
Some devices don't use blk_integrity but still want stable pages because they do their own checksumming. Examples include rbd and iSCSI when data digests are negotiated. Stacking DM (and thus LVM) on top of these devices results in sporadic checksum errors.
Set BDI_CAP_STABLE_WRITES if any underlying device has it set.
Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-table.c | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+)
--- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -1789,6 +1789,36 @@ static bool dm_table_supports_discards(s return true; }
+static int device_requires_stable_pages(struct dm_target *ti, + struct dm_dev *dev, sector_t start, + sector_t len, void *data) +{ + struct request_queue *q = bdev_get_queue(dev->bdev); + + return q && bdi_cap_stable_pages_required(q->backing_dev_info); +} + +/* + * If any underlying device requires stable pages, a table must require + * them as well. Only targets that support iterate_devices are considered: + * don't want error, zero, etc to require stable pages. + */ +static bool dm_table_requires_stable_pages(struct dm_table *t) +{ + struct dm_target *ti; + unsigned i; + + for (i = 0; i < dm_table_get_num_targets(t); i++) { + ti = dm_table_get_target(t, i); + + if (ti->type->iterate_devices && + ti->type->iterate_devices(ti, device_requires_stable_pages, NULL)) + return true; + } + + return false; +} + void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q, struct queue_limits *limits) { @@ -1838,6 +1868,15 @@ void dm_table_set_restrictions(struct dm dm_table_verify_integrity(t);
/* + * Some devices don't use blk_integrity but still want stable pages + * because they do their own checksumming. + */ + if (dm_table_requires_stable_pages(t)) + q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES; + else + q->backing_dev_info->capabilities &= ~BDI_CAP_STABLE_WRITES; + + /* * Determine whether or not this queue's I/O timings contribute * to the entropy pool, Only request-based targets use this. * Clear QUEUE_FLAG_ADD_RANDOM if any underlying device does not
From: Katsuhiro Suzuki katsuhiro@katsuster.net
commit ef05bcb60c1a8841e38c91923ba998181117a87c upstream.
This patch fixes the pin assignment of vcc_host1_5v. This regulator is controlled by the USB20_HOST_DRV signal.
The ROCK64 schematic says that the GPIO0_A2 pin is used as USB20_HOST_DRV; GPIO0_D3 is for SPDIF_TX_M0.
Signed-off-by: Katsuhiro Suzuki katsuhiro@katsuster.net Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/boot/dts/rockchip/rk3328-rock64.dts | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/rockchip/rk3328-rock64.dts +++ b/arch/arm64/boot/dts/rockchip/rk3328-rock64.dts @@ -83,7 +83,7 @@ vcc_host1_5v: vcc_otg_5v: vcc-host1-5v-regulator { compatible = "regulator-fixed"; enable-active-high; - gpio = <&gpio0 RK_PD3 GPIO_ACTIVE_HIGH>; + gpio = <&gpio0 RK_PA2 GPIO_ACTIVE_HIGH>; pinctrl-names = "default"; pinctrl-0 = <&usb20_host_drv>; regulator-name = "vcc_host1_5v"; @@ -275,7 +275,7 @@
usb2 { usb20_host_drv: usb20-host-drv { - rockchip,pins = <0 RK_PD3 RK_FUNC_GPIO &pcfg_pull_none>; + rockchip,pins = <0 RK_PA2 RK_FUNC_GPIO &pcfg_pull_none>; }; };
From: Tomohiro Mayama parly-gh@iris.mystia.org
commit a8772e5d826d0f61f8aa9c284b3ab49035d5273d upstream.
This patch makes the USB ports function again.
Fixes: 955bebde057e ("arm64: dts: rockchip: add rk3328-rock64 board") Cc: stable@vger.kernel.org Suggested-by: Robin Murphy robin.murphy@arm.com Signed-off-by: Tomohiro Mayama parly-gh@iris.mystia.org Tested-by: Katsuhiro Suzuki katsuhiro@katsuster.net Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/boot/dts/rockchip/rk3328-rock64.dts | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/rockchip/rk3328-rock64.dts +++ b/arch/arm64/boot/dts/rockchip/rk3328-rock64.dts @@ -82,8 +82,7 @@
vcc_host1_5v: vcc_otg_5v: vcc-host1-5v-regulator { compatible = "regulator-fixed"; - enable-active-high; - gpio = <&gpio0 RK_PA2 GPIO_ACTIVE_HIGH>; + gpio = <&gpio0 RK_PA2 GPIO_ACTIVE_LOW>; pinctrl-names = "default"; pinctrl-0 = <&usb20_host_drv>; regulator-name = "vcc_host1_5v";
stable-rc/linux-4.14.y boot: 107 boots: 1 failed, 96 passed with 9 offline, 1 untried/unknown (v4.14.111-70-g58023abef2c4)
Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.14.y/kernel/v4.14... Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.14.y/kernel/v4.14.111-70...
Tree: stable-rc
Branch: linux-4.14.y
Git Describe: v4.14.111-70-g58023abef2c4
Git Commit: 58023abef2c4691261db5b40563dc18cc95f3c37
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 62 unique boards, 24 SoC families, 14 builds out of 201
Boot Failure Detected:
arm64:
defconfig: gcc-7:
    rk3399-firefly: 1 failed lab
Offline Platforms:
arm:
qcom_defconfig: gcc-7
    qcom-apq8064-cm-qs600: 1 offline lab
    qcom-apq8064-ifc6410: 1 offline lab

tegra_defconfig: gcc-7
    tegra20-iris-512: 1 offline lab

multi_v7_defconfig: gcc-7
    qcom-apq8064-cm-qs600: 1 offline lab
    qcom-apq8064-ifc6410: 1 offline lab
    sun5i-r8-chip: 1 offline lab
    tegra20-iris-512: 1 offline lab

sunxi_defconfig: gcc-7
    sun5i-r8-chip: 1 offline lab
arm64:
defconfig: gcc-7
    apq8016-sbc: 1 offline lab
--- For more info write to info@kernelci.org
On Tue, 16 Apr 2019 at 00:32, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary
------------------------------------------------------------------------
kernel: 4.14.112-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.14.y
git commit: 58023abef2c4691261db5b40563dc18cc95f3c37
git describe: v4.14.111-70-g58023abef2c4
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.111-7...
No regressions (compared to build v4.14.111)
No fixes (compared to build v4.14.111)
Ran 23296 total tests in the following environments and test suites.
Environments
--------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64
Test Suites
-----------
* boot
* install-android-platform-tools-r2600
* kselftest
* libhugetlbfs
* ltp-commands-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fs-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-sched-tests
* ltp-syscalls-tests
* ltp-timers-tests
* perf
* spectre-meltdown-checker-test
* v4l2-compliance
* kvm-unit-tests
* ltp-cap_bounds-tests
* ltp-cpuhotplug-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-securebits-tests
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
On 15/04/2019 19:58, Greg Kroah-Hartman wrote:
All tests are passing for Tegra ...
Test results for stable-v4.14:
    8 builds: 8 pass, 0 fail
    16 boots: 16 pass, 0 fail
    22 tests: 22 pass, 0 fail
Linux version: 4.14.112-rc1-g58023ab
Boards tested: tegra124-jetson-tk1, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Cheers Jon
On Mon, Apr 15, 2019 at 08:58:18PM +0200, Greg Kroah-Hartman wrote:
Build results:
    total: 172 pass: 172 fail: 0
Qemu test results:
    total: 333 pass: 333 fail: 0
Guenter
On 4/15/19 12:58 PM, Greg Kroah-Hartman wrote:
Compiled and booted on my test system. No dmesg regressions.
thanks,
-- Shuah