This is the start of the stable review cycle for the 5.0.8 release. There are 117 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Apr 17 18:36:42 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.0.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.0.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.0.8-rc1
Gerd Hoffmann kraxel@redhat.com drm/virtio: do NOT reuse resource ids
Marc Orr marcorr@google.com KVM: x86: nVMX: fix x2APIC VTPR read intercept
Marc Orr marcorr@google.com KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887)
Mikulas Patocka mpatocka@redhat.com dm integrity: fix deadlock with overlapping I/O
Mike Snitzer snitzer@redhat.com dm: disable DISCARD if the underlying storage no longer supports it
Ilya Dryomov idryomov@gmail.com dm table: propagate BDI_CAP_STABLE_WRITES to fix sporadic checksum errors
Mikulas Patocka mpatocka@redhat.com dm: revert 8f50e358153d ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE")
Mikulas Patocka mpatocka@redhat.com dm integrity: change memcmp to strncmp in dm_integrity_ctr
Nicholas Piggin npiggin@gmail.com powerpc/64s/radix: Fix radix segment exception handling
Chuck Lever chuck.lever@oracle.com xprtrdma: Fix helper that drains the transport
Sergey Miroshnichenko s.miroshnichenko@yadro.com PCI: pciehp: Ignore Link State Changes after powering off a slot
Andre Przywara andre.przywara@arm.com PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller
Lendacky, Thomas Thomas.Lendacky@amd.com x86/perf/amd: Remove need to check "running" bit in NMI handler
Lendacky, Thomas Thomas.Lendacky@amd.com x86/perf/amd: Resolve NMI latency issues for active PMCs
Lendacky, Thomas Thomas.Lendacky@amd.com x86/perf/amd: Resolve race condition when disabling PMC
Alexander Potapenko glider@google.com x86/asm: Use stricter assembly constraints in bitops
Rasmus Villemoes linux@rasmusvillemoes.dk x86/asm: Remove dead __GNUC__ conditionals
Dmitry V. Levin ldv@altlinux.org csky: Fix syscall_get_arguments() and syscall_set_arguments()
Max Filippov jcmvbkbc@gmail.com xtensa: fix return_address
Mel Gorman mgorman@techsingularity.net sched/fair: Do not re-read ->h_load_next during hierarchical load calculation
Dan Carpenter dan.carpenter@oracle.com xen: Prevent buffer overflow in privcmd ioctl
Moni Shoua monis@mellanox.com IB/mlx5: Reset access mask when looping inside page fault handler
Ard Biesheuvel ard.biesheuvel@linaro.org arm64/ftrace: fix inadvertent BUG() in trampoline check
Will Deacon will.deacon@arm.com arm64: backtrace: Don't bother trying to unwind the userspace stack
Peter Geis pgwipeout@gmail.com arm64: dts: rockchip: fix rk3328 rgmii high tx error rate
Tomohiro Mayama parly-gh@iris.mystia.org arm64: dts: rockchip: Fix vcc_host1_5v GPIO polarity on rk3328-rock64
Will Deacon will.deacon@arm.com arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
David Engraf david.engraf@sysgo.com ARM: dts: at91: Fix typo in ISC_D0 on PC9
David Summers beagleboard@davidjohnsummers.uk ARM: dts: rockchip: Fix SD card detection on rk3288-tinker
Peter Ujfalusi peter.ujfalusi@ti.com ARM: dts: am335x-evm: Correct the regulators for the audio codec
Peter Ujfalusi peter.ujfalusi@ti.com ARM: dts: am335x-evmsk: Correct the regulators for the audio codec
Jonas Karlman jonas@kwiboo.se ARM: dts: rockchip: fix rk3288 cpu opp node reference
Janusz Krzysztofik jmkrzyszt@gmail.com ARM: OMAP1: ams-delta: Fix broken GPIO ID allocation
Jani Nikula jani.nikula@intel.com drm/i915/dp: revert back to max link rate and lane count on eDP
Cornelia Huck cohuck@redhat.com virtio: Honour 'may_reduce_num' in vring_create_virtqueue
Kefeng Wang wangkefeng.wang@huawei.com genirq: Initialize request_mutex if CONFIG_SPARSE_IRQ=n
Stephen Boyd swboyd@chromium.org genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent()
Jason Yan yanaijie@huawei.com block: fix the return errno for direct IO
Jérôme Glisse jglisse@redhat.com block: do not leak memory in bio_copy_user_iov()
Bart Van Assche bvanassche@acm.org block: Revert v5.0 blk_mq_request_issue_directly() changes
Dmitry V. Levin ldv@altlinux.org riscv: Fix syscall_get_arguments() and syscall_set_arguments()
Anand Jain anand.jain@oracle.com btrfs: prop: fix vanished compression property after failed set
Anand Jain anand.jain@oracle.com btrfs: prop: fix zstd compression parameter validation
Filipe Manana fdmanana@suse.com Btrfs: do not allow trimming when a fs is mounted with the nologreplay option
S.j. Wang shengjiu.wang@nxp.com ASoC: fsl_esai: fix channel swap issue when stream starts
Guenter Roeck linux@roeck-us.net ASoC: intel: Fix crash at suspend/resume after failed codec registration
Greg Thelen gthelen@google.com mm: writeback: use exact memcg dirty counts
Arnd Bergmann arnd@arndb.de include/linux/bitrev.h: fix constant bitrev
David Rientjes rientjes@google.com kvm: svm: fix potential get_num_contig_pages overflow
Dave Airlie airlied@redhat.com drm/udl: add a release method and delay modeset teardown
Jernej Skrabec jernej.skrabec@siol.net drm/sun4i: DW HDMI: Lower max. supported rate for H6
Yan Zhao yan.y.zhao@intel.com drm/i915/gvt: do not deliver a workload if its creation fails
Andrei Vagin avagin@gmail.com alarmtimer: Return correct remaining time
Sven Schnelle svens@stackframe.org parisc: also set iaoq_b in instruction_pointer_set()
Sven Schnelle svens@stackframe.org parisc: regs_return_value() should return gpr28
Helge Deller deller@gmx.de parisc: Detect QEMU earlier in boot process
Faiz Abbas faiz_abbas@ti.com mmc: sdhci-omap: Don't finish_mrq() on a command error during tuning
Daniel Drake drake@endlessm.com mmc: alcor: don't write data before command has completed
Peter Geis pgwipeout@gmail.com arm64: dts: rockchip: fix rk3328 sdmmc0 write errors
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()
Hui Wang hui.wang@canonical.com ALSA: hda - Add two more machines to the power_save_blacklist
Oleksandr Andrushchenko oleksandr_andrushchenko@epam.com ALSA: xen-front: Do not use stream buffer size before it is set
Richard Sailer rs@tuxedocomputers.com ALSA: hda/realtek - Add quirk for Tuxedo XC 1509
Jian-Hong Pan jian-hong@endlessm.com ALSA: hda/realtek: Enable headset MIC of Acer TravelMate B114-21 with ALC233
Zubin Mithra zsm@chromium.org ALSA: seq: Fix OOB-reads from strlcpy
Erik Schmauss erik.schmauss@intel.com ACPICA: Namespace: remove address node from global list after method termination
Furquan Shaikh furquan@google.com ACPICA: Clear status of GPEs before enabling them
Peter Hutterer peter.hutterer@who-t.net HID: logitech: Handle 0 scroll events for the m560
Steve French stfrench@microsoft.com SMB3: Allow persistent handle timeout to be configurable on mount
Eddie James eajames@linux.ibm.com hwmon: (occ) Fix power sensor indexing
Axel Lin axel.lin@ingics.com hwmon: (w83773g) Select REGMAP_I2C to fix build error
Greg Kroah-Hartman gregkh@linuxfoundation.org tty: ldisc: add sysctl to prevent autoloading of ldiscs
Greg Kroah-Hartman gregkh@linuxfoundation.org tty: mark Siemens R3964 line discipline as BROKEN
Neil Armstrong narmstrong@baylibre.com Revert "clk: meson: clean-up clock registration"
Nick Desaulniers ndesaulniers@google.com lib/string.c: implement a basic bcmp
Nick Desaulniers ndesaulniers@google.com kbuild: clang: choose GCC_TOOLCHAIN_DIR not on LD
Huy Nguyen huyn@mellanox.com net/mlx5e: Update xon formula
Huy Nguyen huyn@mellanox.com net/mlx5e: Update xoff formula
Aditya Pakki pakki001@umn.edu net: mlx5: Add a missing check on idr_find, free buf
Heiner Kallweit hkallweit1@gmail.com r8169: disable default rx interrupt coalescing on RTL8168
Alexander Lobakin alobakin@dlink.ru net: core: netif_receive_skb_list: unlist skb before passing to pt->func
Miaohe Lin linmiaohe@huawei.com net: vrf: Fix ping failed when vrf mtu is set to 0
Lorenzo Bianconi lorenzo.bianconi@redhat.com net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop
Nikolay Aleksandrov nikolay@cumulusnetworks.com net: bridge: always clear mcast matching struct on reports and leaves
Lorenzo Bianconi lorenzo.bianconi@redhat.com net: ip6_gre: fix possible use-after-free in ip6erspan_rcv
Lorenzo Bianconi lorenzo.bianconi@redhat.com net: ip_gre: fix possible use-after-free in erspan_rcv
Michael Chan michael.chan@broadcom.com bnxt_en: Reset device on RX buffer errors.
Michael Chan michael.chan@broadcom.com bnxt_en: Improve RX consumer index validity check.
Jakub Kicinski jakub.kicinski@netronome.com nfp: disable netpoll on representors
Jakub Kicinski jakub.kicinski@netronome.com nfp: validate the return code from dev_queue_xmit()
Yuval Avnery yuvalav@mellanox.com net/mlx5e: Add a lock on tir list
Gavi Teitz gavi@mellanox.com net/mlx5e: Fix error handling when refreshing TIRs
Stephen Suryaputra ssuryaextr@gmail.com vrf: check accept_source_route on the original netdevice
Dust Li dust.li@linux.alibaba.com tcp: fix a potential NULL pointer dereference in tcp_sk_exit
Koen De Schepper koen.de_schepper@nokia-bell-labs.com tcp: Ensure DCTCP reacts to losses
Xin Long lucien.xin@gmail.com sctp: initialize _pad of sockaddr_in before copying to user memory
Heiner Kallweit hkallweit1@gmail.com r8169: disable ASPM again
Bjørn Mork bjorn@mork.no qmi_wwan: add Olicard 600
Andrea Righi andrea.righi@canonical.com openvswitch: fix flow actions reallocation
Nicolas Dichtel nicolas.dichtel@6wind.com net/sched: fix ->get helper of the matchall cls
Davide Caratti dcaratti@redhat.com net/sched: act_sample: fix divide by zero in the traffic path
Mao Wenan maowenan@huawei.com net: rds: force to destroy connection if t_sock is NULL in rds_tcp_kill_sock().
Eric Dumazet edumazet@google.com netns: provide pure entropy for net_hash_mix()
Artemy Kovalyov artemyko@mellanox.com net/mlx5: Decrease default mr cache size
Steffen Klassert steffen.klassert@secunet.com net-gro: Fix GRO flush when receiving a GSO packet.
Li RongQing lirongqing@baidu.com net: ethtool: not call vzalloc for zero sized memory request
Jiri Slaby jslaby@suse.cz kcm: switch order of device registration to fix a crash
Lorenzo Bianconi lorenzo.bianconi@redhat.com ipv6: sit: reset ip header pointer in ipip6_rcv
Junwei Hu hujunwei4@huawei.com ipv6: Fix dangling pointer when ipv6 fragment
Sheena Mira-ato sheena.mira-ato@alliedtelesis.co.nz ip6_tunnel: Match to ARPHRD_TUNNEL6 for dev type
Thomas Falcon tlfalcon@linux.ibm.com ibmvnic: Fix completion structure initialization
Haiyang Zhang haiyangz@microsoft.com hv_netvsc: Fix unwanted wakeup after tx_disable
Taehee Yoo ap420073@gmail.com netfilter: nf_tables: add missing ->release_ops() in error path of newrule()
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: use-after-free in dynamic operations
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_compat: use .release_ops and remove list of extension
Masahiro Yamada yamada.masahiro@socionext.com kbuild: pkg: use -f $(srctree)/Makefile to recurse to top Makefile
Yan Zhao yan.y.zhao@intel.com drm/i915/gvt: do not let pin count of shadow mm go negative
-------------
Diffstat:
Makefile | 6 +- arch/arm/boot/dts/am335x-evm.dts | 26 +- arch/arm/boot/dts/am335x-evmsk.dts | 26 +- arch/arm/boot/dts/rk3288-tinker.dtsi | 3 +- arch/arm/boot/dts/rk3288.dtsi | 6 +- arch/arm/boot/dts/sama5d2-pinfunc.h | 2 +- arch/arm/mach-omap1/board-ams-delta.c | 2 + arch/arm64/boot/dts/rockchip/rk3328-rock64.dts | 3 +- arch/arm64/boot/dts/rockchip/rk3328.dtsi | 58 ++--- arch/arm64/include/asm/futex.h | 16 +- arch/arm64/include/asm/module.h | 5 + arch/arm64/kernel/ftrace.c | 3 +- arch/arm64/kernel/traps.c | 15 +- arch/csky/include/asm/syscall.h | 10 +- arch/parisc/include/asm/ptrace.h | 5 +- arch/parisc/kernel/process.c | 6 - arch/parisc/kernel/setup.c | 3 + arch/powerpc/kernel/exceptions-64s.S | 12 + arch/riscv/include/asm/syscall.h | 12 +- arch/x86/events/amd/core.c | 140 +++++++++- arch/x86/events/core.c | 13 +- arch/x86/include/asm/bitops.h | 47 ++-- arch/x86/include/asm/string_32.h | 20 -- arch/x86/include/asm/string_64.h | 15 -- arch/x86/include/asm/xen/hypercall.h | 3 + arch/x86/kvm/svm.c | 10 +- arch/x86/kvm/vmx/nested.c | 74 +++--- arch/xtensa/kernel/stacktrace.c | 6 +- block/bio.c | 5 +- block/blk-core.c | 4 +- block/blk-mq-sched.c | 8 +- block/blk-mq.c | 122 ++++----- block/blk-mq.h | 6 +- drivers/acpi/acpica/evgpe.c | 6 +- drivers/acpi/acpica/nsobject.c | 4 + drivers/char/Kconfig | 2 +- drivers/clk/meson/meson-aoclk.c | 15 +- drivers/gpu/drm/i915/gvt/gtt.c | 2 +- drivers/gpu/drm/i915/gvt/scheduler.c | 5 +- drivers/gpu/drm/i915/intel_dp.c | 69 +---- drivers/gpu/drm/sun4i/sun8i_dw_hdmi.c | 9 +- drivers/gpu/drm/udl/udl_drv.c | 1 + drivers/gpu/drm/udl/udl_drv.h | 1 + drivers/gpu/drm/udl/udl_main.c | 8 +- drivers/gpu/drm/virtio/virtgpu_object.c | 13 + drivers/hid/hid-logitech-hidpp.c | 5 +- drivers/hwmon/Kconfig | 1 + drivers/hwmon/occ/common.c | 6 +- drivers/infiniband/hw/mlx5/odp.c | 3 +- drivers/md/dm-core.h | 1 + drivers/md/dm-integrity.c | 12 +- drivers/md/dm-rq.c | 11 +- drivers/md/dm-table.c | 39 +++ drivers/md/dm.c | 30 ++- drivers/mmc/host/alcor.c | 34 +-- drivers/mmc/host/sdhci-omap.c | 38 +++ drivers/net/ethernet/broadcom/bnxt/bnxt.c | 16 +- drivers/net/ethernet/cavium/thunder/nicvf_main.c | 20 +- drivers/net/ethernet/ibm/ibmvnic.c | 5 +- .../ethernet/mellanox/mlx5/core/en/port_buffer.c | 39 +-- .../net/ethernet/mellanox/mlx5/core/en_common.c | 13 +- drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c | 14 +- drivers/net/ethernet/mellanox/mlx5/core/main.c | 20 -- drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 4 +- drivers/net/ethernet/realtek/r8169.c | 8 +- drivers/net/hyperv/hyperv_net.h | 1 + drivers/net/hyperv/netvsc.c | 6 +- drivers/net/hyperv/netvsc_drv.c | 32 ++- drivers/net/usb/qmi_wwan.c | 1 + drivers/net/vrf.c | 8 +- drivers/pci/hotplug/pciehp_ctrl.c | 4 + drivers/pci/quirks.c | 2 + drivers/tty/Kconfig | 24 ++ drivers/tty/tty_io.c | 3 + drivers/tty/tty_ldisc.c | 47 ++++ drivers/virtio/virtio_ring.c | 2 + fs/block_dev.c | 8 +- fs/btrfs/ioctl.c | 10 + fs/btrfs/props.c | 8 +- fs/cifs/cifsfs.c | 2 + fs/cifs/cifsglob.h | 8 + fs/cifs/connect.c | 30 ++- fs/cifs/smb2file.c | 4 +- fs/cifs/smb2pdu.c | 14 +- include/linux/bitrev.h | 46 ++-- include/linux/memcontrol.h | 5 +- include/linux/mlx5/driver.h | 2 + include/linux/string.h | 3 + include/linux/virtio_ring.h | 2 +- include/net/ip.h | 2 +- include/net/net_namespace.h | 1 + include/net/netfilter/nf_tables.h | 3 + include/net/netns/hash.h | 10 +- kernel/irq/chip.c | 4 + kernel/irq/irqdesc.c | 1 + kernel/sched/fair.c | 6 +- kernel/time/alarmtimer.c | 2 +- lib/string.c | 20 ++ mm/huge_memory.c | 36 +++ mm/memcontrol.c | 20 +- net/bridge/br_multicast.c | 3 + net/core/dev.c | 4 +- net/core/ethtool.c | 46 ++-- net/core/net_namespace.c | 1 + net/core/skbuff.c | 2 +- net/ipv4/ip_gre.c | 15 +- net/ipv4/ip_input.c | 7 +- net/ipv4/ip_options.c | 4 +- net/ipv4/tcp_dctcp.c | 36 +-- net/ipv4/tcp_ipv4.c | 3 +- net/ipv6/ip6_gre.c | 21 +- net/ipv6/ip6_output.c | 4 +- net/ipv6/ip6_tunnel.c | 4 +- net/ipv6/sit.c | 4 + net/kcm/kcmsock.c | 16 +- net/netfilter/nf_tables_api.c | 16 +- net/netfilter/nft_compat.c | 281 ++++----------------- net/openvswitch/flow_netlink.c | 4 +- net/rds/tcp.c | 2 +- net/sched/act_sample.c | 10 +- net/sched/cls_matchall.c | 5 + net/sctp/protocol.c | 1 + net/sunrpc/xprtrdma/verbs.c | 2 +- scripts/package/Makefile | 4 +- scripts/package/builddeb | 10 +- scripts/package/buildtar | 2 +- scripts/package/mkdebian | 6 +- sound/core/seq/seq_clientmgr.c | 6 +- sound/pci/hda/hda_intel.c | 4 + sound/pci/hda/patch_realtek.c | 31 ++- sound/soc/fsl/fsl_esai.c | 47 +++- sound/soc/intel/atom/sst-mfld-platform-pcm.c | 8 + sound/xen/xen_snd_front_alsa.c | 2 +- 133 files changed, 1287 insertions(+), 847 deletions(-)
[ Upstream commit 663a50ceac75c2208d2ad95365bc8382fd42f44d ]
shadow mm's pin count got increased in workload preparation phase, which is after workload scanning. it will get decreased in complete_current_workload() anyway after workload completion. Sometimes, if a workload meets a scanning error, its shadow mm pin count will not get increased but will get decreased in the end. This patch lets shadow mm's pin count not go below 0.
Fixes: 2707e4446688 ("drm/i915/gvt: vGPU graphics memory virtualization") Cc: zhenyuw@linux.intel.com Cc: stable@vger.kernel.org #4.14+ Signed-off-by: Yan Zhao yan.y.zhao@intel.com Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gvt/gtt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c index c7103dd2d8d5..563ab8590061 100644 --- a/drivers/gpu/drm/i915/gvt/gtt.c +++ b/drivers/gpu/drm/i915/gvt/gtt.c @@ -1942,7 +1942,7 @@ void _intel_vgpu_mm_release(struct kref *mm_ref) */ void intel_vgpu_unpin_mm(struct intel_vgpu_mm *mm) { - atomic_dec(&mm->pincount); + atomic_dec_if_positive(&mm->pincount); }
/**
[ Upstream commit 175209cce23d6b0669ed5366add2517e26cd75cd ]
'$(MAKE) KBUILD_SRC=' changes the working directory back and forth between objtree and srctree.
It is better to recurse to the top-level Makefile directly.
Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/package/Makefile | 4 ++-- scripts/package/builddeb | 10 +++++----- scripts/package/buildtar | 2 +- scripts/package/mkdebian | 6 ++++-- 4 files changed, 12 insertions(+), 10 deletions(-)
diff --git a/scripts/package/Makefile b/scripts/package/Makefile index 453fecee62f0..aa39c2b5e46a 100644 --- a/scripts/package/Makefile +++ b/scripts/package/Makefile @@ -59,7 +59,7 @@ rpm-pkg: FORCE # binrpm-pkg # --------------------------------------------------------------------------- binrpm-pkg: FORCE - $(MAKE) KBUILD_SRC= + $(MAKE) -f $(srctree)/Makefile $(CONFIG_SHELL) $(MKSPEC) prebuilt > $(objtree)/binkernel.spec +rpmbuild $(RPMOPTS) --define "_builddir $(objtree)" --target \ $(UTS_MACHINE) -bb $(objtree)/binkernel.spec @@ -102,7 +102,7 @@ clean-dirs += $(objtree)/snap/ # tarball targets # --------------------------------------------------------------------------- tar%pkg: FORCE - $(MAKE) KBUILD_SRC= + $(MAKE) -f $(srctree)/Makefile $(CONFIG_SHELL) $(srctree)/scripts/package/buildtar $@
clean-dirs += $(objtree)/tar-install/ diff --git a/scripts/package/builddeb b/scripts/package/builddeb index f43a274f4f1d..8ac25d10a6ad 100755 --- a/scripts/package/builddeb +++ b/scripts/package/builddeb @@ -86,12 +86,12 @@ cp "$($MAKE -s -f $srctree/Makefile image_name)" "$tmpdir/$installed_image_path" if grep -q "^CONFIG_OF_EARLY_FLATTREE=y" $KCONFIG_CONFIG ; then # Only some architectures with OF support have this target if [ -d "${srctree}/arch/$SRCARCH/boot/dts" ]; then - $MAKE KBUILD_SRC= INSTALL_DTBS_PATH="$tmpdir/usr/lib/$packagename" dtbs_install + $MAKE -f $srctree/Makefile INSTALL_DTBS_PATH="$tmpdir/usr/lib/$packagename" dtbs_install fi fi
if grep -q '^CONFIG_MODULES=y' $KCONFIG_CONFIG ; then - INSTALL_MOD_PATH="$tmpdir" $MAKE KBUILD_SRC= modules_install + INSTALL_MOD_PATH="$tmpdir" $MAKE -f $srctree/Makefile modules_install rm -f "$tmpdir/lib/modules/$version/build" rm -f "$tmpdir/lib/modules/$version/source" if [ "$ARCH" = "um" ] ; then @@ -113,14 +113,14 @@ if grep -q '^CONFIG_MODULES=y' $KCONFIG_CONFIG ; then # resign stripped modules MODULE_SIG_ALL="$(grep -s '^CONFIG_MODULE_SIG_ALL=y' $KCONFIG_CONFIG || true)" if [ -n "$MODULE_SIG_ALL" ]; then - INSTALL_MOD_PATH="$tmpdir" $MAKE KBUILD_SRC= modules_sign + INSTALL_MOD_PATH="$tmpdir" $MAKE -f $srctree/Makefile modules_sign fi fi fi
if [ "$ARCH" != "um" ]; then - $MAKE headers_check KBUILD_SRC= - $MAKE headers_install KBUILD_SRC= INSTALL_HDR_PATH="$libc_headers_dir/usr" + $MAKE -f $srctree/Makefile headers_check + $MAKE -f $srctree/Makefile headers_install INSTALL_HDR_PATH="$libc_headers_dir/usr" fi
# Install the maintainer scripts diff --git a/scripts/package/buildtar b/scripts/package/buildtar index d624a07a4e77..cfd2a4a3fe42 100755 --- a/scripts/package/buildtar +++ b/scripts/package/buildtar @@ -57,7 +57,7 @@ dirs=boot # Try to install modules # if grep -q '^CONFIG_MODULES=y' "${KCONFIG_CONFIG}"; then - make ARCH="${ARCH}" O="${objtree}" KBUILD_SRC= INSTALL_MOD_PATH="${tmpdir}" modules_install + make ARCH="${ARCH}" -f ${srctree}/Makefile INSTALL_MOD_PATH="${tmpdir}" modules_install dirs="$dirs lib" fi
diff --git a/scripts/package/mkdebian b/scripts/package/mkdebian index edcad61fe3cd..f030961c5165 100755 --- a/scripts/package/mkdebian +++ b/scripts/package/mkdebian @@ -205,13 +205,15 @@ EOF cat <<EOF > debian/rules #!$(command -v $MAKE) -f
+srctree ?= . + build: $(MAKE) KERNELRELEASE=${version} ARCH=${ARCH} \ - KBUILD_BUILD_VERSION=${revision} KBUILD_SRC= + KBUILD_BUILD_VERSION=${revision} -f $(srctree)/Makefile
binary-arch: $(MAKE) KERNELRELEASE=${version} ARCH=${ARCH} \ - KBUILD_BUILD_VERSION=${revision} KBUILD_SRC= intdeb-pkg + KBUILD_BUILD_VERSION=${revision} -f $(srctree)/Makefile intdeb-pkg
clean: rm -rf debian/*tmp debian/files
[ Upstream commit b8e204006340b7aaf32bd2b9806c692f6e0cb38a ]
Add .release_ops, that is called in case of error at a later stage in the expression initialization path, ie. .select_ops() has been already set up operations and that needs to be undone. This allows us to unwind .select_ops from the error path, ie. release the dynamic operations for this extension.
Moreover, allocate one single operation instead of recycling them, this comes at the cost of consuming a bit more memory per rule, but it simplifies the infrastructure.
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netfilter/nf_tables.h | 3 + net/netfilter/nf_tables_api.c | 7 +- net/netfilter/nft_compat.c | 281 ++++++------------------------ 3 files changed, 64 insertions(+), 227 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 3d58acf94dd2..0612439909dc 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -691,10 +691,12 @@ static inline void nft_set_gc_batch_add(struct nft_set_gc_batch *gcb, gcb->elems[gcb->head.cnt++] = elem; }
+struct nft_expr_ops; /** * struct nft_expr_type - nf_tables expression type * * @select_ops: function to select nft_expr_ops + * @release_ops: release nft_expr_ops * @ops: default ops, used when no select_ops functions is present * @list: used internally * @name: Identifier @@ -707,6 +709,7 @@ static inline void nft_set_gc_batch_add(struct nft_set_gc_batch *gcb, struct nft_expr_type { const struct nft_expr_ops *(*select_ops)(const struct nft_ctx *, const struct nlattr * const tb[]); + void (*release_ops)(const struct nft_expr_ops *ops); const struct nft_expr_ops *ops; struct list_head list; const char *name; diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index e1724f9d8b9d..dc3670f2860e 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -2129,6 +2129,7 @@ struct nft_expr *nft_expr_init(const struct nft_ctx *ctx, { struct nft_expr_info info; struct nft_expr *expr; + struct module *owner; int err;
err = nf_tables_expr_parse(ctx, nla, &info); @@ -2148,7 +2149,11 @@ struct nft_expr *nft_expr_init(const struct nft_ctx *ctx, err3: kfree(expr); err2: - module_put(info.ops->type->owner); + owner = info.ops->type->owner; + if (info.ops->type->release_ops) + info.ops->type->release_ops(info.ops); + + module_put(owner); err1: return ERR_PTR(err); } diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c index 0a4bad55a8aa..469f9da5073b 100644 --- a/net/netfilter/nft_compat.c +++ b/net/netfilter/nft_compat.c @@ -22,23 +22,6 @@ #include <linux/netfilter_bridge/ebtables.h> #include <linux/netfilter_arp/arp_tables.h> #include <net/netfilter/nf_tables.h> -#include <net/netns/generic.h> - -struct nft_xt { - struct list_head head; - struct nft_expr_ops ops; - refcount_t refcnt; - - /* used only when transaction mutex is locked */ - unsigned int listcnt; - - /* Unlike other expressions, ops doesn't have static storage duration. - * nft core assumes they do. We use kfree_rcu so that nft core can - * can check expr->ops->size even after nft_compat->destroy() frees - * the nft_xt struct that holds the ops structure. - */ - struct rcu_head rcu_head; -};
/* Used for matches where *info is larger than X byte */ #define NFT_MATCH_LARGE_THRESH 192 @@ -47,46 +30,6 @@ struct nft_xt_match_priv { void *info; };
-struct nft_compat_net { - struct list_head nft_target_list; - struct list_head nft_match_list; -}; - -static unsigned int nft_compat_net_id __read_mostly; -static struct nft_expr_type nft_match_type; -static struct nft_expr_type nft_target_type; - -static struct nft_compat_net *nft_compat_pernet(struct net *net) -{ - return net_generic(net, nft_compat_net_id); -} - -static void nft_xt_get(struct nft_xt *xt) -{ - /* refcount_inc() warns on 0 -> 1 transition, but we can't - * init the reference count to 1 in .select_ops -- we can't - * undo such an increase when another expression inside the same - * rule fails afterwards. - */ - if (xt->listcnt == 0) - refcount_set(&xt->refcnt, 1); - else - refcount_inc(&xt->refcnt); - - xt->listcnt++; -} - -static bool nft_xt_put(struct nft_xt *xt) -{ - if (refcount_dec_and_test(&xt->refcnt)) { - WARN_ON_ONCE(!list_empty(&xt->head)); - kfree_rcu(xt, rcu_head); - return true; - } - - return false; -} - static int nft_compat_chain_validate_dependency(const struct nft_ctx *ctx, const char *tablename) { @@ -281,7 +224,6 @@ nft_target_init(const struct nft_ctx *ctx, const struct nft_expr *expr, struct xt_target *target = expr->ops->data; struct xt_tgchk_param par; size_t size = XT_ALIGN(nla_len(tb[NFTA_TARGET_INFO])); - struct nft_xt *nft_xt; u16 proto = 0; bool inv = false; union nft_entry e = {}; @@ -305,8 +247,6 @@ nft_target_init(const struct nft_ctx *ctx, const struct nft_expr *expr, if (!target->target) return -EINVAL;
- nft_xt = container_of(expr->ops, struct nft_xt, ops); - nft_xt_get(nft_xt); return 0; }
@@ -325,8 +265,8 @@ nft_target_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr) if (par.target->destroy != NULL) par.target->destroy(&par);
- if (nft_xt_put(container_of(expr->ops, struct nft_xt, ops))) - module_put(me); + module_put(me); + kfree(expr->ops); }
static int nft_extension_dump_info(struct sk_buff *skb, int attr, @@ -499,7 +439,6 @@ __nft_match_init(const struct nft_ctx *ctx, const struct nft_expr *expr, struct xt_match *match = expr->ops->data; struct xt_mtchk_param par; size_t size = XT_ALIGN(nla_len(tb[NFTA_MATCH_INFO])); - struct nft_xt *nft_xt; u16 proto = 0; bool inv = false; union nft_entry e = {}; @@ -515,13 +454,7 @@ __nft_match_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
nft_match_set_mtchk_param(&par, ctx, match, info, &e, proto, inv);
- ret = xt_check_match(&par, size, proto, inv); - if (ret < 0) - return ret; - - nft_xt = container_of(expr->ops, struct nft_xt, ops); - nft_xt_get(nft_xt); - return 0; + return xt_check_match(&par, size, proto, inv); }
static int @@ -564,8 +497,8 @@ __nft_match_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr, if (par.match->destroy != NULL) par.match->destroy(&par);
- if (nft_xt_put(container_of(expr->ops, struct nft_xt, ops))) - module_put(me); + module_put(me); + kfree(expr->ops); }
static void @@ -574,18 +507,6 @@ nft_match_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr) __nft_match_destroy(ctx, expr, nft_expr_priv(expr)); }
-static void nft_compat_deactivate(const struct nft_ctx *ctx, - const struct nft_expr *expr, - enum nft_trans_phase phase) -{ - struct nft_xt *xt = container_of(expr->ops, struct nft_xt, ops); - - if (phase == NFT_TRANS_ABORT || phase == NFT_TRANS_COMMIT) { - if (--xt->listcnt == 0) - list_del_init(&xt->head); - } -} - static void nft_match_large_destroy(const struct nft_ctx *ctx, const struct nft_expr *expr) { @@ -780,19 +701,13 @@ static const struct nfnetlink_subsystem nfnl_compat_subsys = { .cb = nfnl_nft_compat_cb, };
-static bool nft_match_cmp(const struct xt_match *match, - const char *name, u32 rev, u32 family) -{ - return strcmp(match->name, name) == 0 && match->revision == rev && - (match->family == NFPROTO_UNSPEC || match->family == family); -} +static struct nft_expr_type nft_match_type;
static const struct nft_expr_ops * nft_match_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[]) { - struct nft_compat_net *cn; - struct nft_xt *nft_match; + struct nft_expr_ops *ops; struct xt_match *match; unsigned int matchsize; char *mt_name; @@ -808,16 +723,6 @@ nft_match_select_ops(const struct nft_ctx *ctx, rev = ntohl(nla_get_be32(tb[NFTA_MATCH_REV])); family = ctx->family;
- cn = nft_compat_pernet(ctx->net); - - /* Re-use the existing match if it's already loaded. */ - list_for_each_entry(nft_match, &cn->nft_match_list, head) { - struct xt_match *match = nft_match->ops.data; - - if (nft_match_cmp(match, mt_name, rev, family)) - return &nft_match->ops; - } - match = xt_request_find_match(family, mt_name, rev); if (IS_ERR(match)) return ERR_PTR(-ENOENT); @@ -827,65 +732,62 @@ nft_match_select_ops(const struct nft_ctx *ctx, goto err; }
- /* This is the first time we use this match, allocate operations */ - nft_match = kzalloc(sizeof(struct nft_xt), GFP_KERNEL); - if (nft_match == NULL) { + ops = kzalloc(sizeof(struct nft_expr_ops), GFP_KERNEL); + if (!ops) { err = -ENOMEM; goto err; }
- refcount_set(&nft_match->refcnt, 0); - nft_match->ops.type = &nft_match_type; - nft_match->ops.eval = nft_match_eval; - nft_match->ops.init = nft_match_init; - nft_match->ops.destroy = nft_match_destroy; - nft_match->ops.deactivate = nft_compat_deactivate; - nft_match->ops.dump = nft_match_dump; - nft_match->ops.validate = nft_match_validate; - nft_match->ops.data = match; + ops->type = &nft_match_type; + ops->eval = nft_match_eval; + ops->init = nft_match_init; + ops->destroy = nft_match_destroy; + ops->dump = nft_match_dump; + ops->validate = nft_match_validate; + ops->data = match;
matchsize = NFT_EXPR_SIZE(XT_ALIGN(match->matchsize)); if (matchsize > NFT_MATCH_LARGE_THRESH) { matchsize = NFT_EXPR_SIZE(sizeof(struct nft_xt_match_priv));
- nft_match->ops.eval = nft_match_large_eval; - nft_match->ops.init = nft_match_large_init; - nft_match->ops.destroy = nft_match_large_destroy; - nft_match->ops.dump = nft_match_large_dump; + ops->eval = nft_match_large_eval; + ops->init = nft_match_large_init; + ops->destroy = nft_match_large_destroy; + ops->dump = nft_match_large_dump; }
- nft_match->ops.size = matchsize; + ops->size = matchsize;
- nft_match->listcnt = 0; - list_add(&nft_match->head, &cn->nft_match_list); - - return &nft_match->ops; + return ops; err: module_put(match->me); return ERR_PTR(err); }
+static void nft_match_release_ops(const struct nft_expr_ops *ops) +{ + struct xt_match *match = ops->data; + + module_put(match->me); + kfree(ops); +} + static struct nft_expr_type nft_match_type __read_mostly = { .name = "match", .select_ops = nft_match_select_ops, + .release_ops = nft_match_release_ops, .policy = nft_match_policy, .maxattr = NFTA_MATCH_MAX, .owner = THIS_MODULE, };
-static bool nft_target_cmp(const struct xt_target *tg, - const char *name, u32 rev, u32 family) -{ - return strcmp(tg->name, name) == 0 && tg->revision == rev && - (tg->family == NFPROTO_UNSPEC || tg->family == family); -} +static struct nft_expr_type nft_target_type;
static const struct nft_expr_ops * nft_target_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[]) { - struct nft_compat_net *cn; - struct nft_xt *nft_target; + struct nft_expr_ops *ops; struct xt_target *target; char *tg_name; u32 rev, family; @@ -905,18 +807,6 @@ nft_target_select_ops(const struct nft_ctx *ctx, strcmp(tg_name, "standard") == 0) return ERR_PTR(-EINVAL);
- cn = nft_compat_pernet(ctx->net); - /* Re-use the existing target if it's already loaded. */ - list_for_each_entry(nft_target, &cn->nft_target_list, head) { - struct xt_target *target = nft_target->ops.data; - - if (!target->target) - continue; - - if (nft_target_cmp(target, tg_name, rev, family)) - return &nft_target->ops; - } - target = xt_request_find_target(family, tg_name, rev); if (IS_ERR(target)) return ERR_PTR(-ENOENT); @@ -931,113 +821,55 @@ nft_target_select_ops(const struct nft_ctx *ctx, goto err; }
- /* This is the first time we use this target, allocate operations */ - nft_target = kzalloc(sizeof(struct nft_xt), GFP_KERNEL); - if (nft_target == NULL) { + ops = kzalloc(sizeof(struct nft_expr_ops), GFP_KERNEL); + if (!ops) { err = -ENOMEM; goto err; }
- refcount_set(&nft_target->refcnt, 0); - nft_target->ops.type = &nft_target_type; - nft_target->ops.size = NFT_EXPR_SIZE(XT_ALIGN(target->targetsize)); - nft_target->ops.init = nft_target_init; - nft_target->ops.destroy = nft_target_destroy; - nft_target->ops.deactivate = nft_compat_deactivate; - nft_target->ops.dump = nft_target_dump; - nft_target->ops.validate = nft_target_validate; - nft_target->ops.data = target; + ops->type = &nft_target_type; + ops->size = NFT_EXPR_SIZE(XT_ALIGN(target->targetsize)); + ops->init = nft_target_init; + ops->destroy = nft_target_destroy; + ops->dump = nft_target_dump; + ops->validate = nft_target_validate; + ops->data = target;
if (family == NFPROTO_BRIDGE) - nft_target->ops.eval = nft_target_eval_bridge; + ops->eval = nft_target_eval_bridge; else - nft_target->ops.eval = nft_target_eval_xt; - - nft_target->listcnt = 0; - list_add(&nft_target->head, &cn->nft_target_list); + ops->eval = nft_target_eval_xt;
- return &nft_target->ops; + return ops; err: module_put(target->me); return ERR_PTR(err); }
+static void nft_target_release_ops(const struct nft_expr_ops *ops) +{ + struct xt_target *target = ops->data; + + module_put(target->me); + kfree(ops); +} + static struct nft_expr_type nft_target_type __read_mostly = { .name = "target", .select_ops = nft_target_select_ops, + .release_ops = nft_target_release_ops, .policy = nft_target_policy, .maxattr = NFTA_TARGET_MAX, .owner = THIS_MODULE, };
-static int __net_init nft_compat_init_net(struct net *net) -{ - struct nft_compat_net *cn = nft_compat_pernet(net); - - INIT_LIST_HEAD(&cn->nft_target_list); - INIT_LIST_HEAD(&cn->nft_match_list); - - return 0; -} - -static void __net_exit nft_compat_exit_net(struct net *net) -{ - struct nft_compat_net *cn = nft_compat_pernet(net); - struct nft_xt *xt, *next; - - if (list_empty(&cn->nft_match_list) && - list_empty(&cn->nft_target_list)) - return; - - /* If there was an error that caused nft_xt expr to not be initialized - * fully and noone else requested the same expression later, the lists - * contain 0-refcount entries that still hold module reference. - * - * Clean them here. - */ - mutex_lock(&net->nft.commit_mutex); - list_for_each_entry_safe(xt, next, &cn->nft_target_list, head) { - struct xt_target *target = xt->ops.data; - - list_del_init(&xt->head); - - if (refcount_read(&xt->refcnt)) - continue; - module_put(target->me); - kfree(xt); - } - - list_for_each_entry_safe(xt, next, &cn->nft_match_list, head) { - struct xt_match *match = xt->ops.data; - - list_del_init(&xt->head); - - if (refcount_read(&xt->refcnt)) - continue; - module_put(match->me); - kfree(xt); - } - mutex_unlock(&net->nft.commit_mutex); -} - -static struct pernet_operations nft_compat_net_ops = { - .init = nft_compat_init_net, - .exit = nft_compat_exit_net, - .id = &nft_compat_net_id, - .size = sizeof(struct nft_compat_net), -}; - static int __init nft_compat_module_init(void) { int ret;
- ret = register_pernet_subsys(&nft_compat_net_ops); - if (ret < 0) - goto err_target; - ret = nft_register_expr(&nft_match_type); if (ret < 0) - goto err_pernet; + return ret;
ret = nft_register_expr(&nft_target_type); if (ret < 0) @@ -1054,8 +886,6 @@ static int __init nft_compat_module_init(void) nft_unregister_expr(&nft_target_type); err_match: nft_unregister_expr(&nft_match_type); -err_pernet: - unregister_pernet_subsys(&nft_compat_net_ops); return ret; }
@@ -1064,7 +894,6 @@ static void __exit nft_compat_module_exit(void) nfnetlink_subsys_unregister(&nfnl_compat_subsys); nft_unregister_expr(&nft_target_type); nft_unregister_expr(&nft_match_type); - unregister_pernet_subsys(&nft_compat_net_ops); }
MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_NFT_COMPAT);
[ Upstream commit 3f3a390dbd59d236f62cff8e8b20355ef7069e3d ]
Smatch reports:
net/netfilter/nf_tables_api.c:2167 nf_tables_expr_destroy() error: dereferencing freed memory 'expr->ops'
net/netfilter/nf_tables_api.c 2162 static void nf_tables_expr_destroy(const struct nft_ctx *ctx, 2163 struct nft_expr *expr) 2164 { 2165 if (expr->ops->destroy) 2166 expr->ops->destroy(ctx, expr); ^^^^ --> 2167 module_put(expr->ops->type->owner); ^^^^^^^^^ 2168 }
Smatch says there are three functions which free expr->ops.
Fixes: b8e204006340 ("netfilter: nft_compat: use .release_ops and remove list of extension") Reported-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index dc3670f2860e..f20b904873c6 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -2119,9 +2119,11 @@ static int nf_tables_newexpr(const struct nft_ctx *ctx, static void nf_tables_expr_destroy(const struct nft_ctx *ctx, struct nft_expr *expr) { + const struct nft_expr_type *type = expr->ops->type; + if (expr->ops->destroy) expr->ops->destroy(ctx, expr); - module_put(expr->ops->type->owner); + module_put(type->owner); }
struct nft_expr *nft_expr_init(const struct nft_ctx *ctx,
[ Upstream commit b25a31bf0ca091aa8bdb9ab329b0226257568bbe ]
->release_ops() callback releases resources and this is used in error path. If nf_tables_newrule() fails after ->select_ops(), it should release resources. but it can not call ->destroy() because that should be called after ->init(). At this point, ->release_ops() should be used for releasing resources.
Test commands: modprobe -rv xt_tcpudp iptables-nft -I INPUT -m tcp <-- error command lsmod
Result: Module Size Used by xt_tcpudp 20480 2 <-- it should be 0
Fixes: b8e204006340 ("netfilter: nft_compat: use .release_ops and remove list of extension") Signed-off-by: Taehee Yoo ap420073@gmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index f20b904873c6..acb124ce92ec 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -2753,8 +2753,11 @@ static int nf_tables_newrule(struct net *net, struct sock *nlsk, nf_tables_rule_release(&ctx, rule); err1: for (i = 0; i < n; i++) { - if (info[i].ops != NULL) + if (info[i].ops) { module_put(info[i].ops->type->owner); + if (info[i].ops->type->release_ops) + info[i].ops->type->release_ops(info[i].ops); + } } kvfree(info); return err;
[ Upstream commit 1b704c4a1ba95574832e730f23817b651db2aa59 ]
After queue stopped, the wakeup mechanism may wake it up again when ring buffer usage is lower than a threshold. This may cause send path panic on NULL pointer when we stopped all tx queues in netvsc_detach and start removing the netvsc device.
This patch fix it by adding a tx_disable flag to prevent unwanted queue wakeup.
Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Reported-by: Mohammed Gamal mgamal@redhat.com Signed-off-by: Haiyang Zhang haiyangz@microsoft.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/hyperv/hyperv_net.h | 1 + drivers/net/hyperv/netvsc.c | 6 ++++-- drivers/net/hyperv/netvsc_drv.c | 32 ++++++++++++++++++++++++++------ 3 files changed, 31 insertions(+), 8 deletions(-)
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index e859ae2e42d5..49f41b64077b 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -987,6 +987,7 @@ struct netvsc_device {
wait_queue_head_t wait_drain; bool destroy; + bool tx_disable; /* if true, do not wake up queue again */
/* Receive buffer allocated by us but manages by NetVSP */ void *recv_buf; diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 813d195bbd57..e0dce373cdd9 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -110,6 +110,7 @@ static struct netvsc_device *alloc_net_device(void)
init_waitqueue_head(&net_device->wait_drain); net_device->destroy = false; + net_device->tx_disable = false;
net_device->max_pkt = RNDIS_MAX_PKT_DEFAULT; net_device->pkt_align = RNDIS_PKT_ALIGN_DEFAULT; @@ -719,7 +720,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev, } else { struct netdev_queue *txq = netdev_get_tx_queue(ndev, q_idx);
- if (netif_tx_queue_stopped(txq) && + if (netif_tx_queue_stopped(txq) && !net_device->tx_disable && (hv_get_avail_to_write_percent(&channel->outbound) > RING_AVAIL_PERCENT_HIWATER || queue_sends < 1)) { netif_tx_wake_queue(txq); @@ -874,7 +875,8 @@ static inline int netvsc_send_pkt( } else if (ret == -EAGAIN) { netif_tx_stop_queue(txq); ndev_ctx->eth_stats.stop_queue++; - if (atomic_read(&nvchan->queue_sends) < 1) { + if (atomic_read(&nvchan->queue_sends) < 1 && + !net_device->tx_disable) { netif_tx_wake_queue(txq); ndev_ctx->eth_stats.wake_queue++; ret = -ENOSPC; diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index cf4897043e83..b20fb0fb595b 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -109,6 +109,15 @@ static void netvsc_set_rx_mode(struct net_device *net) rcu_read_unlock(); }
+static void netvsc_tx_enable(struct netvsc_device *nvscdev, + struct net_device *ndev) +{ + nvscdev->tx_disable = false; + virt_wmb(); /* ensure queue wake up mechanism is on */ + + netif_tx_wake_all_queues(ndev); +} + static int netvsc_open(struct net_device *net) { struct net_device_context *ndev_ctx = netdev_priv(net); @@ -129,7 +138,7 @@ static int netvsc_open(struct net_device *net) rdev = nvdev->extension; if (!rdev->link_state) { netif_carrier_on(net); - netif_tx_wake_all_queues(net); + netvsc_tx_enable(nvdev, net); }
if (vf_netdev) { @@ -184,6 +193,17 @@ static int netvsc_wait_until_empty(struct netvsc_device *nvdev) } }
+static void netvsc_tx_disable(struct netvsc_device *nvscdev, + struct net_device *ndev) +{ + if (nvscdev) { + nvscdev->tx_disable = true; + virt_wmb(); /* ensure txq will not wake up after stop */ + } + + netif_tx_disable(ndev); +} + static int netvsc_close(struct net_device *net) { struct net_device_context *net_device_ctx = netdev_priv(net); @@ -192,7 +212,7 @@ static int netvsc_close(struct net_device *net) struct netvsc_device *nvdev = rtnl_dereference(net_device_ctx->nvdev); int ret;
- netif_tx_disable(net); + netvsc_tx_disable(nvdev, net);
/* No need to close rndis filter if it is removed already */ if (!nvdev) @@ -920,7 +940,7 @@ static int netvsc_detach(struct net_device *ndev,
/* If device was up (receiving) then shutdown */ if (netif_running(ndev)) { - netif_tx_disable(ndev); + netvsc_tx_disable(nvdev, ndev);
ret = rndis_filter_close(nvdev); if (ret) { @@ -1908,7 +1928,7 @@ static void netvsc_link_change(struct work_struct *w) if (rdev->link_state) { rdev->link_state = false; netif_carrier_on(net); - netif_tx_wake_all_queues(net); + netvsc_tx_enable(net_device, net); } else { notify = true; } @@ -1918,7 +1938,7 @@ static void netvsc_link_change(struct work_struct *w) if (!rdev->link_state) { rdev->link_state = true; netif_carrier_off(net); - netif_tx_stop_all_queues(net); + netvsc_tx_disable(net_device, net); } kfree(event); break; @@ -1927,7 +1947,7 @@ static void netvsc_link_change(struct work_struct *w) if (!rdev->link_state) { rdev->link_state = true; netif_carrier_off(net); - netif_tx_stop_all_queues(net); + netvsc_tx_disable(net_device, net); event->event = RNDIS_STATUS_MEDIA_CONNECT; spin_lock_irqsave(&ndev_ctx->lock, flags); list_add(&event->list, &ndev_ctx->reconfig_events);
[ Upstream commit bbd669a868bba591ffd38b7bc75a7b361bb54b04 ]
Fix device initialization completion handling for vNIC adapters. Initialize the completion structure on probe and reinitialize when needed. This also fixes a race condition during kdump where the driver can attempt to access the completion struct before it is initialized:
Unable to handle kernel paging request for data at address 0x00000000 Faulting instruction address: 0xc0000000081acbe0 Oops: Kernel access of bad area, sig: 11 [#1] LE SMP NR_CPUS=2048 NUMA pSeries Modules linked in: ibmvnic(+) ibmveth sunrpc overlay squashfs loop CPU: 19 PID: 301 Comm: systemd-udevd Not tainted 4.18.0-64.el8.ppc64le #1 NIP: c0000000081acbe0 LR: c0000000081ad964 CTR: c0000000081ad900 REGS: c000000027f3f990 TRAP: 0300 Not tainted (4.18.0-64.el8.ppc64le) MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28228288 XER: 00000006 CFAR: c000000008008934 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1 GPR00: c0000000081ad964 c000000027f3fc10 c0000000095b5800 c0000000221b4e58 GPR04: 0000000000000003 0000000000000001 000049a086918581 00000000000000d4 GPR08: 0000000000000007 0000000000000000 ffffffffffffffe8 d0000000014dde28 GPR12: c0000000081ad900 c000000009a00c00 0000000000000001 0000000000000100 GPR16: 0000000000000038 0000000000000007 c0000000095e2230 0000000000000006 GPR20: 0000000000400140 0000000000000001 c00000000910c880 0000000000000000 GPR24: 0000000000000000 0000000000000006 0000000000000000 0000000000000003 GPR28: 0000000000000001 0000000000000001 c0000000221b4e60 c0000000221b4e58 NIP [c0000000081acbe0] __wake_up_locked+0x50/0x100 LR [c0000000081ad964] complete+0x64/0xa0 Call Trace: [c000000027f3fc10] [c000000027f3fc60] 0xc000000027f3fc60 (unreliable) [c000000027f3fc60] [c0000000081ad964] complete+0x64/0xa0 [c000000027f3fca0] [d0000000014dad58] ibmvnic_handle_crq+0xce0/0x1160 [ibmvnic] [c000000027f3fd50] [d0000000014db270] ibmvnic_tasklet+0x98/0x130 [ibmvnic] [c000000027f3fda0] [c00000000813f334] tasklet_action_common.isra.3+0xc4/0x1a0 [c000000027f3fe00] [c000000008cd13f4] __do_softirq+0x164/0x400 [c000000027f3fef0] [c00000000813ed64] irq_exit+0x184/0x1c0 [c000000027f3ff20] [c0000000080188e8] __do_irq+0xb8/0x210 [c000000027f3ff90] [c00000000802d0a4] call_do_irq+0x14/0x24 [c000000026a5b010] [c000000008018adc] do_IRQ+0x9c/0x130 [c000000026a5b060] [c000000008008ce4] hardware_interrupt_common+0x114/0x120
Signed-off-by: Thomas Falcon tlfalcon@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ibm/ibmvnic.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 5ecbb1adcf3b..51cfe95f3e24 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1885,6 +1885,7 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter, */ adapter->state = VNIC_PROBED;
+ reinit_completion(&adapter->init_done); rc = init_crq_queue(adapter); if (rc) { netdev_err(adapter->netdev, @@ -4625,7 +4626,7 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter) old_num_rx_queues = adapter->req_rx_queues; old_num_tx_queues = adapter->req_tx_queues;
- init_completion(&adapter->init_done); + reinit_completion(&adapter->init_done); adapter->init_done_rc = 0; ibmvnic_send_crq_init(adapter); if (!wait_for_completion_timeout(&adapter->init_done, timeout)) { @@ -4680,7 +4681,6 @@ static int ibmvnic_init(struct ibmvnic_adapter *adapter)
adapter->from_passive_init = false;
- init_completion(&adapter->init_done); adapter->init_done_rc = 0; ibmvnic_send_crq_init(adapter); if (!wait_for_completion_timeout(&adapter->init_done, timeout)) { @@ -4759,6 +4759,7 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id) INIT_WORK(&adapter->ibmvnic_reset, __ibmvnic_reset); INIT_LIST_HEAD(&adapter->rwi_list); spin_lock_init(&adapter->rwi_lock); + init_completion(&adapter->init_done); adapter->resetting = false;
adapter->mac_change_pending = false;
[ Upstream commit b2e54b09a3d29c4db883b920274ca8dca4d9f04d ]
The device type for ip6 tunnels is set to ARPHRD_TUNNEL6. However, the ip4ip6_err function is expecting the device type of the tunnel to be ARPHRD_TUNNEL. Since the device types do not match, the function exits and the ICMP error packet is not sent to the originating host. Note that the device type for IPv4 tunnels is set to ARPHRD_TUNNEL.
Fix is to expect a tunnel device type of ARPHRD_TUNNEL6 instead. Now the tunnel device type matches and the ICMP error packet is sent to the originating host.
Signed-off-by: Sheena Mira-ato sheena.mira-ato@alliedtelesis.co.nz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_tunnel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 0c6403cf8b52..ade1390c6348 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -627,7 +627,7 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, rt = ip_route_output_ports(dev_net(skb->dev), &fl4, NULL, eiph->daddr, eiph->saddr, 0, 0, IPPROTO_IPIP, RT_TOS(eiph->tos), 0); - if (IS_ERR(rt) || rt->dst.dev->type != ARPHRD_TUNNEL) { + if (IS_ERR(rt) || rt->dst.dev->type != ARPHRD_TUNNEL6) { if (!IS_ERR(rt)) ip_rt_put(rt); goto out; @@ -636,7 +636,7 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, } else { if (ip_route_input(skb2, eiph->daddr, eiph->saddr, eiph->tos, skb2->dev) || - skb_dst(skb2)->dev->type != ARPHRD_TUNNEL) + skb_dst(skb2)->dev->type != ARPHRD_TUNNEL6) goto out; }
[ Upstream commit ef0efcd3bd3fd0589732b67fb586ffd3c8705806 ]
At the beginning of ip6_fragment func, the prevhdr pointer is obtained in the ip6_find_1stfragopt func. However, all the pointers pointing into skb header may change when calling skb_checksum_help func with skb->ip_summed = CHECKSUM_PARTIAL condition. The prevhdr pointe will be dangling if it is not reloaded after calling __skb_linearize func in skb_checksum_help func.
Here, I add a variable, nexthdr_offset, to evaluate the offset, which does not changes even after calling __skb_linearize func.
Fixes: 405c92f7a541 ("ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment") Signed-off-by: Junwei Hu hujunwei4@huawei.com Reported-by: Wenhao Zhang zhangwenhao8@huawei.com Reported-by: syzbot+e8ce541d095e486074fc@syzkaller.appspotmail.com Reviewed-by: Zhiqiang Liu liuzhiqiang26@huawei.com Acked-by: Martin KaFai Lau kafai@fb.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_output.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 5f9fa0302b5a..e71227390bec 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -595,7 +595,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, inet6_sk(skb->sk) : NULL; struct ipv6hdr *tmp_hdr; struct frag_hdr *fh; - unsigned int mtu, hlen, left, len; + unsigned int mtu, hlen, left, len, nexthdr_offset; int hroom, troom; __be32 frag_id; int ptr, offset = 0, err = 0; @@ -606,6 +606,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, goto fail; hlen = err; nexthdr = *prevhdr; + nexthdr_offset = prevhdr - skb_network_header(skb);
mtu = ip6_skb_dst_mtu(skb);
@@ -640,6 +641,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, (err = skb_checksum_help(skb))) goto fail;
+ prevhdr = skb_network_header(skb) + nexthdr_offset; hroom = LL_RESERVED_SPACE(rt->dst.dev); if (skb_has_frag_list(skb)) { unsigned int first_len = skb_pagelen(skb);
[ Upstream commit bb9bd814ebf04f579be466ba61fc922625508807 ]
ipip6 tunnels run iptunnel_pull_header on received skbs. This can determine the following use-after-free accessing iph pointer since the packet will be 'uncloned' running pskb_expand_head if it is a cloned gso skb (e.g if the packet has been sent though a veth device)
[ 706.369655] BUG: KASAN: use-after-free in ipip6_rcv+0x1678/0x16e0 [sit] [ 706.449056] Read of size 1 at addr ffffe01b6bd855f5 by task ksoftirqd/1/= [ 706.669494] Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 08/19/2016 [ 706.771839] Call trace: [ 706.801159] dump_backtrace+0x0/0x2f8 [ 706.845079] show_stack+0x24/0x30 [ 706.884833] dump_stack+0xe0/0x11c [ 706.925629] print_address_description+0x68/0x260 [ 706.982070] kasan_report+0x178/0x340 [ 707.025995] __asan_report_load1_noabort+0x30/0x40 [ 707.083481] ipip6_rcv+0x1678/0x16e0 [sit] [ 707.132623] tunnel64_rcv+0xd4/0x200 [tunnel4] [ 707.185940] ip_local_deliver_finish+0x3b8/0x988 [ 707.241338] ip_local_deliver+0x144/0x470 [ 707.289436] ip_rcv_finish+0x43c/0x14b0 [ 707.335447] ip_rcv+0x628/0x1138 [ 707.374151] __netif_receive_skb_core+0x1670/0x2600 [ 707.432680] __netif_receive_skb+0x28/0x190 [ 707.482859] process_backlog+0x1d0/0x610 [ 707.529913] net_rx_action+0x37c/0xf68 [ 707.574882] __do_softirq+0x288/0x1018 [ 707.619852] run_ksoftirqd+0x70/0xa8 [ 707.662734] smpboot_thread_fn+0x3a4/0x9e8 [ 707.711875] kthread+0x2c8/0x350 [ 707.750583] ret_from_fork+0x10/0x18
[ 707.811302] Allocated by task 16982: [ 707.854182] kasan_kmalloc.part.1+0x40/0x108 [ 707.905405] kasan_kmalloc+0xb4/0xc8 [ 707.948291] kasan_slab_alloc+0x14/0x20 [ 707.994309] __kmalloc_node_track_caller+0x158/0x5e0 [ 708.053902] __kmalloc_reserve.isra.8+0x54/0xe0 [ 708.108280] __alloc_skb+0xd8/0x400 [ 708.150139] sk_stream_alloc_skb+0xa4/0x638 [ 708.200346] tcp_sendmsg_locked+0x818/0x2b90 [ 708.251581] tcp_sendmsg+0x40/0x60 [ 708.292376] inet_sendmsg+0xf0/0x520 [ 708.335259] sock_sendmsg+0xac/0xf8 [ 708.377096] sock_write_iter+0x1c0/0x2c0 [ 708.424154] new_sync_write+0x358/0x4a8 [ 708.470162] __vfs_write+0xc4/0xf8 [ 708.510950] vfs_write+0x12c/0x3d0 [ 708.551739] ksys_write+0xcc/0x178 [ 708.592533] __arm64_sys_write+0x70/0xa0 [ 708.639593] el0_svc_handler+0x13c/0x298 [ 708.686646] el0_svc+0x8/0xc
[ 708.739019] Freed by task 17: [ 708.774597] __kasan_slab_free+0x114/0x228 [ 708.823736] kasan_slab_free+0x10/0x18 [ 708.868703] kfree+0x100/0x3d8 [ 708.905320] skb_free_head+0x7c/0x98 [ 708.948204] skb_release_data+0x320/0x490 [ 708.996301] pskb_expand_head+0x60c/0x970 [ 709.044399] __iptunnel_pull_header+0x3b8/0x5d0 [ 709.098770] ipip6_rcv+0x41c/0x16e0 [sit] [ 709.146873] tunnel64_rcv+0xd4/0x200 [tunnel4] [ 709.200195] ip_local_deliver_finish+0x3b8/0x988 [ 709.255596] ip_local_deliver+0x144/0x470 [ 709.303692] ip_rcv_finish+0x43c/0x14b0 [ 709.349705] ip_rcv+0x628/0x1138 [ 709.388413] __netif_receive_skb_core+0x1670/0x2600 [ 709.446943] __netif_receive_skb+0x28/0x190 [ 709.497120] process_backlog+0x1d0/0x610 [ 709.544169] net_rx_action+0x37c/0xf68 [ 709.589131] __do_softirq+0x288/0x1018
[ 709.651938] The buggy address belongs to the object at ffffe01b6bd85580 which belongs to the cache kmalloc-1024 of size 1024 [ 709.804356] The buggy address is located 117 bytes inside of 1024-byte region [ffffe01b6bd85580, ffffe01b6bd85980) [ 709.946340] The buggy address belongs to the page: [ 710.003824] page:ffff7ff806daf600 count:1 mapcount:0 mapping:ffffe01c4001f600 index:0x0 [ 710.099914] flags: 0xfffff8000000100(slab) [ 710.149059] raw: 0fffff8000000100 dead000000000100 dead000000000200 ffffe01c4001f600 [ 710.242011] raw: 0000000000000000 0000000000380038 00000001ffffffff 0000000000000000 [ 710.334966] page dumped because: kasan: bad access detected
Fix it resetting iph pointer after iptunnel_pull_header
Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap") Tested-by: Jianlin Shi jishi@redhat.com Signed-off-by: Lorenzo Bianconi lorenzo.bianconi@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/sit.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index 07e21a82ce4c..b2109b74857d 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -669,6 +669,10 @@ static int ipip6_rcv(struct sk_buff *skb) !net_eq(tunnel->net, dev_net(tunnel->dev)))) goto out;
+ /* skb can be uncloned in iptunnel_pull_header, so + * old iph is no longer valid + */ + iph = (const struct iphdr *)skb_mac_header(skb); err = IP_ECN_decapsulate(iph, skb); if (unlikely(err)) { if (log_ecn_error)
[ Upstream commit 3c446e6f96997f2a95bf0037ef463802162d2323 ]
When kcm is loaded while many processes try to create a KCM socket, a crash occurs: BUG: unable to handle kernel NULL pointer dereference at 000000000000000e IP: mutex_lock+0x27/0x40 kernel/locking/mutex.c:240 PGD 8000000016ef2067 P4D 8000000016ef2067 PUD 3d6e9067 PMD 0 Oops: 0002 [#1] SMP KASAN PTI CPU: 0 PID: 7005 Comm: syz-executor.5 Not tainted 4.12.14-396-default #1 SLE15-SP1 (unreleased) RIP: 0010:mutex_lock+0x27/0x40 kernel/locking/mutex.c:240 RSP: 0018:ffff88000d487a00 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 000000000000000e RCX: 1ffff100082b0719 ... CR2: 000000000000000e CR3: 000000004b1bc003 CR4: 0000000000060ef0 Call Trace: kcm_create+0x600/0xbf0 [kcm] __sock_create+0x324/0x750 net/socket.c:1272 ...
This is due to race between sock_create and unfinished register_pernet_device. kcm_create tries to do "net_generic(net, kcm_net_id)". but kcm_net_id is not initialized yet.
So switch the order of the two to close the race.
This can be reproduced with mutiple processes doing socket(PF_KCM, ...) and one process doing module removal.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reviewed-by: Michal Kubecek mkubecek@suse.cz Signed-off-by: Jiri Slaby jslaby@suse.cz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/kcm/kcmsock.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c index 571d824e4e24..b919db02c7f9 100644 --- a/net/kcm/kcmsock.c +++ b/net/kcm/kcmsock.c @@ -2054,14 +2054,14 @@ static int __init kcm_init(void) if (err) goto fail;
- err = sock_register(&kcm_family_ops); - if (err) - goto sock_register_fail; - err = register_pernet_device(&kcm_net_ops); if (err) goto net_ops_fail;
+ err = sock_register(&kcm_family_ops); + if (err) + goto sock_register_fail; + err = kcm_proc_init(); if (err) goto proc_init_fail; @@ -2069,12 +2069,12 @@ static int __init kcm_init(void) return 0;
proc_init_fail: - unregister_pernet_device(&kcm_net_ops); - -net_ops_fail: sock_unregister(PF_KCM);
sock_register_fail: + unregister_pernet_device(&kcm_net_ops); + +net_ops_fail: proto_unregister(&kcm_proto);
fail: @@ -2090,8 +2090,8 @@ static int __init kcm_init(void) static void __exit kcm_exit(void) { kcm_proc_exit(); - unregister_pernet_device(&kcm_net_ops); sock_unregister(PF_KCM); + unregister_pernet_device(&kcm_net_ops); proto_unregister(&kcm_proto); destroy_workqueue(kcm_wq);
[ Upstream commit 3d8830266ffc28c16032b859e38a0252e014b631 ]
NULL or ZERO_SIZE_PTR will be returned for zero sized memory request, and derefencing them will lead to a segfault
so it is unnecessory to call vzalloc for zero sized memory request and not call functions which maybe derefence the NULL allocated memory
this also fixes a possible memory leak if phy_ethtool_get_stats returns error, memory should be freed before exit
Signed-off-by: Li RongQing lirongqing@baidu.com Reviewed-by: Wang Li wangli39@baidu.com Reviewed-by: Michal Kubecek mkubecek@suse.cz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/ethtool.c | 46 ++++++++++++++++++++++++++++++---------------- 1 file changed, 30 insertions(+), 16 deletions(-)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c index 158264f7cfaf..3a7f19a61768 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -1794,11 +1794,16 @@ static int ethtool_get_strings(struct net_device *dev, void __user *useraddr) WARN_ON_ONCE(!ret);
gstrings.len = ret; - data = vzalloc(array_size(gstrings.len, ETH_GSTRING_LEN)); - if (gstrings.len && !data) - return -ENOMEM;
- __ethtool_get_strings(dev, gstrings.string_set, data); + if (gstrings.len) { + data = vzalloc(array_size(gstrings.len, ETH_GSTRING_LEN)); + if (!data) + return -ENOMEM; + + __ethtool_get_strings(dev, gstrings.string_set, data); + } else { + data = NULL; + }
ret = -EFAULT; if (copy_to_user(useraddr, &gstrings, sizeof(gstrings))) @@ -1894,11 +1899,15 @@ static int ethtool_get_stats(struct net_device *dev, void __user *useraddr) return -EFAULT;
stats.n_stats = n_stats; - data = vzalloc(array_size(n_stats, sizeof(u64))); - if (n_stats && !data) - return -ENOMEM;
- ops->get_ethtool_stats(dev, &stats, data); + if (n_stats) { + data = vzalloc(array_size(n_stats, sizeof(u64))); + if (!data) + return -ENOMEM; + ops->get_ethtool_stats(dev, &stats, data); + } else { + data = NULL; + }
ret = -EFAULT; if (copy_to_user(useraddr, &stats, sizeof(stats))) @@ -1938,16 +1947,21 @@ static int ethtool_get_phy_stats(struct net_device *dev, void __user *useraddr) return -EFAULT;
stats.n_stats = n_stats; - data = vzalloc(array_size(n_stats, sizeof(u64))); - if (n_stats && !data) - return -ENOMEM;
- if (dev->phydev && !ops->get_ethtool_phy_stats) { - ret = phy_ethtool_get_stats(dev->phydev, &stats, data); - if (ret < 0) - return ret; + if (n_stats) { + data = vzalloc(array_size(n_stats, sizeof(u64))); + if (!data) + return -ENOMEM; + + if (dev->phydev && !ops->get_ethtool_phy_stats) { + ret = phy_ethtool_get_stats(dev->phydev, &stats, data); + if (ret < 0) + goto out; + } else { + ops->get_ethtool_phy_stats(dev, &stats, data); + } } else { - ops->get_ethtool_phy_stats(dev, &stats, data); + data = NULL; }
ret = -EFAULT;
[ Upstream commit 0ab03f353d3613ea49d1f924faf98559003670a8 ]
Currently we may merge incorrectly a received GSO packet or a packet with frag_list into a packet sitting in the gro_hash list. skb_segment() may crash case because the assumptions on the skb layout are not met. The correct behaviour would be to flush the packet in the gro_hash list and send the received GSO packet directly afterwards. Commit d61d072e87c8e ("net-gro: avoid reorders") sets NAPI_GRO_CB(skb)->flush in this case, but this is not checked before merging. This patch makes sure to check this flag and to not merge in that case.
Fixes: d61d072e87c8e ("net-gro: avoid reorders") Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/skbuff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 2415d9cb9b89..ef2cd5712098 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3801,7 +3801,7 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb) unsigned int delta_truesize; struct sk_buff *lp;
- if (unlikely(p->len + len >= 65536)) + if (unlikely(p->len + len >= 65536 || NAPI_GRO_CB(skb)->flush)) return -E2BIG;
lp = NAPI_GRO_CB(p)->last;
[ Upstream commit e8b26b2135dedc0284490bfeac06dfc4418d0105 ]
Delete initialization of high order entries in mr cache to decrease initial memory footprint. When required, the administrator can populate the entries with memory keys via the /sys interface.
This approach is very helpful to significantly reduce the per HW function memory footprint in virtualization environments such as SRIOV.
Fixes: 9603b61de1ee ("mlx5: Move pci device handling from mlx5_ib to mlx5_core") Signed-off-by: Artemy Kovalyov artemyko@mellanox.com Signed-off-by: Moni Shoua monis@mellanox.com Signed-off-by: Leon Romanovsky leonro@mellanox.com Reported-by: Shalom Toledo shalomt@mellanox.com Acked-by: Or Gerlitz ogerlitz@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/main.c | 20 ------------------- 1 file changed, 20 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index be81b319b0dc..694edd899322 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -163,26 +163,6 @@ static struct mlx5_profile profile[] = { .size = 8, .limit = 4 }, - .mr_cache[16] = { - .size = 8, - .limit = 4 - }, - .mr_cache[17] = { - .size = 8, - .limit = 4 - }, - .mr_cache[18] = { - .size = 8, - .limit = 4 - }, - .mr_cache[19] = { - .size = 4, - .limit = 2 - }, - .mr_cache[20] = { - .size = 4, - .limit = 2 - }, }, };
[ Upstream commit 355b98553789b646ed97ad801a619ff898471b92 ]
net_hash_mix() currently uses kernel address of a struct net, and is used in many places that could be used to reveal this address to a patient attacker, thus defeating KASLR, for the typical case (initial net namespace, &init_net is not dynamically allocated)
I believe the original implementation tried to avoid spending too many cycles in this function, but security comes first.
Also provide entropy regardless of CONFIG_NET_NS.
Fixes: 0b4419162aa6 ("netns: introduce the net_hash_mix "salt" for hashes") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: Amit Klein aksecurity@gmail.com Reported-by: Benny Pinkas benny@pinkas.net Cc: Pavel Emelyanov xemul@openvz.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/net_namespace.h | 1 + include/net/netns/hash.h | 10 ++-------- net/core/net_namespace.c | 1 + 3 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 99d4148e0f90..1c3126c14930 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -58,6 +58,7 @@ struct net { */ spinlock_t rules_mod_lock;
+ u32 hash_mix; atomic64_t cookie_gen;
struct list_head list; /* list of network namespaces */ diff --git a/include/net/netns/hash.h b/include/net/netns/hash.h index 16a842456189..d9b665151f3d 100644 --- a/include/net/netns/hash.h +++ b/include/net/netns/hash.h @@ -2,16 +2,10 @@ #ifndef __NET_NS_HASH_H__ #define __NET_NS_HASH_H__
-#include <asm/cache.h> - -struct net; +#include <net/net_namespace.h>
static inline u32 net_hash_mix(const struct net *net) { -#ifdef CONFIG_NET_NS - return (u32)(((unsigned long)net) >> ilog2(sizeof(*net))); -#else - return 0; -#endif + return net->hash_mix; } #endif diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index b02fb19df2cc..40c249c574c1 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -304,6 +304,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
refcount_set(&net->count, 1); refcount_set(&net->passive, 1); + get_random_bytes(&net->hash_mix, sizeof(u32)); net->dev_base_seq = 1; net->user_ns = user_ns; idr_init(&net->netns_ids);
[ Upstream commit cb66ddd156203daefb8d71158036b27b0e2caf63 ]
When it is to cleanup net namespace, rds_tcp_exit_net() will call rds_tcp_kill_sock(), if t_sock is NULL, it will not call rds_conn_destroy(), rds_conn_path_destroy() and rds_tcp_conn_free() to free connection, and the worker cp_conn_w is not stopped, afterwards the net is freed in net_drop_ns(); While cp_conn_w rds_connect_worker() will call rds_tcp_conn_path_connect() and reference 'net' which has already been freed.
In rds_tcp_conn_path_connect(), rds_tcp_set_callbacks() will set t_sock = sock before sock->ops->connect, but if connect() is failed, it will call rds_tcp_restore_callbacks() and set t_sock = NULL, if connect is always failed, rds_connect_worker() will try to reconnect all the time, so rds_tcp_kill_sock() will never to cancel worker cp_conn_w and free the connections.
Therefore, the condition !tc->t_sock is not needed if it is going to do cleanup_net->rds_tcp_exit_net->rds_tcp_kill_sock, because tc->t_sock is always NULL, and there is on other path to cancel cp_conn_w and free connection. So this patch is to fix this.
rds_tcp_kill_sock(): ... if (net != c_net || !tc->t_sock) ... Acked-by: Santosh Shilimkar santosh.shilimkar@oracle.com
================================================================== BUG: KASAN: use-after-free in inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340 Read of size 4 at addr ffff8003496a4684 by task kworker/u8:4/3721
CPU: 3 PID: 3721 Comm: kworker/u8:4 Not tainted 5.1.0 #11 Hardware name: linux,dummy-virt (DT) Workqueue: krdsd rds_connect_worker Call trace: dump_backtrace+0x0/0x3c0 arch/arm64/kernel/time.c:53 show_stack+0x28/0x38 arch/arm64/kernel/traps.c:152 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x120/0x188 lib/dump_stack.c:113 print_address_description+0x68/0x278 mm/kasan/report.c:253 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x21c/0x348 mm/kasan/report.c:409 __asan_report_load4_noabort+0x30/0x40 mm/kasan/report.c:429 inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340 __sock_create+0x4f8/0x770 net/socket.c:1276 sock_create_kern+0x50/0x68 net/socket.c:1322 rds_tcp_conn_path_connect+0x2b4/0x690 net/rds/tcp_connect.c:114 rds_connect_worker+0x108/0x1d0 net/rds/threads.c:175 process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153 worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296 kthread+0x2f0/0x378 kernel/kthread.c:255 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
Allocated by task 687: save_stack mm/kasan/kasan.c:448 [inline] set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xd4/0x180 mm/kasan/kasan.c:553 kasan_slab_alloc+0x14/0x20 mm/kasan/kasan.c:490 slab_post_alloc_hook mm/slab.h:444 [inline] slab_alloc_node mm/slub.c:2705 [inline] slab_alloc mm/slub.c:2713 [inline] kmem_cache_alloc+0x14c/0x388 mm/slub.c:2718 kmem_cache_zalloc include/linux/slab.h:697 [inline] net_alloc net/core/net_namespace.c:384 [inline] copy_net_ns+0xc4/0x2d0 net/core/net_namespace.c:424 create_new_namespaces+0x300/0x658 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xa0/0x198 kernel/nsproxy.c:206 ksys_unshare+0x340/0x628 kernel/fork.c:2577 __do_sys_unshare kernel/fork.c:2645 [inline] __se_sys_unshare kernel/fork.c:2643 [inline] __arm64_sys_unshare+0x38/0x58 kernel/fork.c:2643 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall arch/arm64/kernel/syscall.c:47 [inline] el0_svc_common+0x168/0x390 arch/arm64/kernel/syscall.c:83 el0_svc_handler+0x60/0xd0 arch/arm64/kernel/syscall.c:129 el0_svc+0x8/0xc arch/arm64/kernel/entry.S:960
Freed by task 264: save_stack mm/kasan/kasan.c:448 [inline] set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x114/0x220 mm/kasan/kasan.c:521 kasan_slab_free+0x10/0x18 mm/kasan/kasan.c:528 slab_free_hook mm/slub.c:1370 [inline] slab_free_freelist_hook mm/slub.c:1397 [inline] slab_free mm/slub.c:2952 [inline] kmem_cache_free+0xb8/0x3a8 mm/slub.c:2968 net_free net/core/net_namespace.c:400 [inline] net_drop_ns.part.6+0x78/0x90 net/core/net_namespace.c:407 net_drop_ns net/core/net_namespace.c:406 [inline] cleanup_net+0x53c/0x6d8 net/core/net_namespace.c:569 process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153 worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296 kthread+0x2f0/0x378 kernel/kthread.c:255 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
The buggy address belongs to the object at ffff8003496a3f80 which belongs to the cache net_namespace of size 7872 The buggy address is located 1796 bytes inside of 7872-byte region [ffff8003496a3f80, ffff8003496a5e40) The buggy address belongs to the page: page:ffff7e000d25a800 count:1 mapcount:0 mapping:ffff80036ce4b000 index:0x0 compound_mapcount: 0 flags: 0xffffe0000008100(slab|head) raw: 0ffffe0000008100 dead000000000100 dead000000000200 ffff80036ce4b000 raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff8003496a4580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8003496a4600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8003496a4680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff8003496a4700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8003496a4780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ==================================================================
Fixes: 467fa15356ac("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Mao Wenan maowenan@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/rds/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rds/tcp.c b/net/rds/tcp.c index c16f0a362c32..a729c47db781 100644 --- a/net/rds/tcp.c +++ b/net/rds/tcp.c @@ -600,7 +600,7 @@ static void rds_tcp_kill_sock(struct net *net) list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) { struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net);
- if (net != c_net || !tc->t_sock) + if (net != c_net) continue; if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn)) { list_move_tail(&tc->t_tcp_node, &tmp_list);
[ Upstream commit fae2708174ae95d98d19f194e03d6e8f688ae195 ]
the control path of 'sample' action does not validate the value of 'rate' provided by the user, but then it uses it as divisor in the traffic path. Validate it in tcf_sample_init(), and return -EINVAL with a proper extack message in case that value is zero, to fix a splat with the script below:
# tc f a dev test0 egress matchall action sample rate 0 group 1 index 2 # tc -s a s action sample total acts 1
action order 0: sample rate 1/0 group 1 pipe index 2 ref 1 bind 1 installed 19 sec used 19 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 # ping 192.0.2.1 -I test0 -c1 -q
divide error: 0000 [#1] SMP PTI CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ #591 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample] Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 <41> f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01 RSP: 0018:ffffae320190ba30 EFLAGS: 00010246 RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49 RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0 RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000 R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80 R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000 FS: 00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0 Call Trace: tcf_action_exec+0x7c/0x1c0 tcf_classify+0x57/0x160 __dev_queue_xmit+0x3dc/0xd10 ip_finish_output2+0x257/0x6d0 ip_output+0x75/0x280 ip_send_skb+0x15/0x40 raw_sendmsg+0xae3/0x1410 sock_sendmsg+0x36/0x40 __sys_sendto+0x10e/0x140 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x60/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe [...] Kernel panic - not syncing: Fatal exception in interrupt
Add a TDC selftest to document that 'rate' is now being validated.
Reported-by: Matteo Croce mcroce@redhat.com Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Signed-off-by: Davide Caratti dcaratti@redhat.com Acked-by: Yotam Gigi yotam.gi@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/act_sample.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c index 1a0c682fd734..fd62fe6c8e73 100644 --- a/net/sched/act_sample.c +++ b/net/sched/act_sample.c @@ -43,8 +43,8 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla, struct tc_action_net *tn = net_generic(net, sample_net_id); struct nlattr *tb[TCA_SAMPLE_MAX + 1]; struct psample_group *psample_group; + u32 psample_group_num, rate; struct tc_sample *parm; - u32 psample_group_num; struct tcf_sample *s; bool exists = false; int ret, err; @@ -80,6 +80,12 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla, return -EEXIST; }
+ rate = nla_get_u32(tb[TCA_SAMPLE_RATE]); + if (!rate) { + NL_SET_ERR_MSG(extack, "invalid sample rate"); + tcf_idr_release(*a, bind); + return -EINVAL; + } psample_group_num = nla_get_u32(tb[TCA_SAMPLE_PSAMPLE_GROUP]); psample_group = psample_group_get(net, psample_group_num); if (!psample_group) { @@ -91,7 +97,7 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla,
spin_lock_bh(&s->tcf_lock); s->tcf_action = parm->action; - s->rate = nla_get_u32(tb[TCA_SAMPLE_RATE]); + s->rate = rate; s->psample_group_num = psample_group_num; RCU_INIT_POINTER(s->psample_group, psample_group);
[ Upstream commit 0db6f8befc32c68bb13d7ffbb2e563c79e913e13 ]
It returned always NULL, thus it was never possible to get the filter.
Example: $ ip link add foo type dummy $ ip link add bar type dummy $ tc qdisc add dev foo clsact $ tc filter add dev foo protocol all pref 1 ingress handle 1234 \ matchall action mirred ingress mirror dev bar
Before the patch: $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall Error: Specified filter handle not found. We have an error talking to the kernel
After: $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall filter ingress protocol all pref 1 matchall chain 0 handle 0x4d2 not_in_hw action order 1: mirred (Ingress Mirror to device bar) pipe index 1 ref 1 bind 1
CC: Yotam Gigi yotamg@mellanox.com CC: Jiri Pirko jiri@mellanox.com Fixes: fd62d9f5c575 ("net/sched: matchall: Fix configuration race") Signed-off-by: Nicolas Dichtel nicolas.dichtel@6wind.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/cls_matchall.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c index 0e408ee9dcec..5ba07cd11e31 100644 --- a/net/sched/cls_matchall.c +++ b/net/sched/cls_matchall.c @@ -125,6 +125,11 @@ static void mall_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
static void *mall_get(struct tcf_proto *tp, u32 handle) { + struct cls_mall_head *head = rtnl_dereference(tp->root); + + if (head && head->handle == handle) + return head; + return NULL; }
[ Upstream commit f28cd2af22a0c134e4aa1c64a70f70d815d473fb ]
The flow action buffer can be resized if it's not big enough to contain all the requested flow actions. However, this resize doesn't take into account the new requested size, the buffer is only increased by a factor of 2x. This might be not enough to contain the new data, causing a buffer overflow, for example:
[ 42.044472] ============================================================================= [ 42.045608] BUG kmalloc-96 (Not tainted): Redzone overwritten [ 42.046415] -----------------------------------------------------------------------------
[ 42.047715] Disabling lock debugging due to kernel taint [ 42.047716] INFO: 0x8bf2c4a5-0x720c0928. First byte 0x0 instead of 0xcc [ 42.048677] INFO: Slab 0xbc6d2040 objects=29 used=18 fp=0xdc07dec4 flags=0x2808101 [ 42.049743] INFO: Object 0xd53a3464 @offset=2528 fp=0xccdcdebb
[ 42.050747] Redzone 76f1b237: cc cc cc cc cc cc cc cc ........ [ 42.051839] Object d53a3464: 6b 6b 6b 6b 6b 6b 6b 6b 0c 00 00 00 6c 00 00 00 kkkkkkkk....l... [ 42.053015] Object f49a30cc: 6c 00 0c 00 00 00 00 00 00 00 00 03 78 a3 15 f6 l...........x... [ 42.054203] Object acfe4220: 20 00 02 00 ff ff ff ff 00 00 00 00 00 00 00 00 ............... [ 42.055370] Object 21024e91: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 42.056541] Object 070e04c3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 42.057797] Object 948a777a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 42.059061] Redzone 8bf2c4a5: 00 00 00 00 .... [ 42.060189] Padding a681b46e: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
Fix by making sure the new buffer is properly resized to contain all the requested data.
BugLink: https://bugs.launchpad.net/bugs/1813244 Signed-off-by: Andrea Righi andrea.righi@canonical.com Acked-by: Pravin B Shelar pshelar@ovn.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/openvswitch/flow_netlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c index 691da853bef5..4bdf5e3ac208 100644 --- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -2306,14 +2306,14 @@ static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa,
struct sw_flow_actions *acts; int new_acts_size; - int req_size = NLA_ALIGN(attr_len); + size_t req_size = NLA_ALIGN(attr_len); int next_offset = offsetof(struct sw_flow_actions, actions) + (*sfa)->actions_len;
if (req_size <= (ksize(*sfa) - next_offset)) goto out;
- new_acts_size = ksize(*sfa) * 2; + new_acts_size = max(next_offset + req_size, ksize(*sfa) * 2);
if (new_acts_size > MAX_ACTIONS_BUFSIZE) { if ((MAX_ACTIONS_BUFSIZE - next_offset) < req_size) {
[ Upstream commit 6289d0facd9ebce4cc83e5da39e15643ee998dc5 ]
This is a Qualcomm based device with a QMI function on interface 4. It is mode switched from 2020:2030 using a standard eject message.
T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 6 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=2020 ProdID=2031 Rev= 2.32 S: Manufacturer=Mobile Connect S: Product=Mobile Connect S: SerialNumber=0123456789ABCDEF C:* #Ifs= 6 Cfg#= 1 Atr=80 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) E: Ad=83(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) E: Ad=85(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) E: Ad=87(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) E: Ad=89(I) Atr=03(Int.) MxPS= 8 Ivl=32ms E: Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 5 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=(none) E: Ad=8a(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=125us
Signed-off-by: Bjørn Mork bjorn@mork.no Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 74bebbdb4b15..9195f3476b1d 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1203,6 +1203,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x19d2, 0x2002, 4)}, /* ZTE (Vodafone) K3765-Z */ {QMI_FIXED_INTF(0x2001, 0x7e19, 4)}, /* D-Link DWM-221 B1 */ {QMI_FIXED_INTF(0x2001, 0x7e35, 4)}, /* D-Link DWM-222 */ + {QMI_FIXED_INTF(0x2020, 0x2031, 4)}, /* Olicard 600 */ {QMI_FIXED_INTF(0x2020, 0x2033, 4)}, /* BroadMobi BM806U */ {QMI_FIXED_INTF(0x0f3d, 0x68a2, 8)}, /* Sierra Wireless MC7700 */ {QMI_FIXED_INTF(0x114f, 0x68a2, 8)}, /* Sierra Wireless MC7750 */
[ Upstream commit b75bb8a5b755d0c7bf1ac071e4df2349a7644a1e ]
There's a significant number of reports that re-enabling ASPM causes different issues, ranging from decreased performance to system not booting at all. This affects only a minority of users, but the number of affected users is big enough that we better switch off ASPM again.
This will hurt notebook users who are not affected by the issues, they may see decreased battery runtime w/o ASPM. With the PCI core folks is being discussed to add generic sysfs attributes to control ASPM. Once this is in place brave enough users can re-enable ASPM on their system.
Fixes: a99790bf5c7f ("r8169: Reinstate ASPM Support") Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/realtek/r8169.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index f55d177ae894..5adb00f521db 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -28,6 +28,7 @@ #include <linux/pm_runtime.h> #include <linux/firmware.h> #include <linux/prefetch.h> +#include <linux/pci-aspm.h> #include <linux/ipv6.h> #include <net/ip6_checksum.h>
@@ -7224,6 +7225,11 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) return rc; }
+ /* Disable ASPM completely as that cause random device stop working + * problems as well as full system hangs for some PCIe devices users. + */ + pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1); + /* enable device (incl. PCI PM wakeup and hotplug setup) */ rc = pcim_enable_device(pdev); if (rc < 0) {
[ Upstream commit 09279e615c81ce55e04835970601ae286e3facbe ]
Syzbot report a kernel-infoleak:
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32 Call Trace: _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32 copy_to_user include/linux/uaccess.h:174 [inline] sctp_getsockopt_peer_addrs net/sctp/socket.c:5911 [inline] sctp_getsockopt+0x1668e/0x17f70 net/sctp/socket.c:7562 ... Uninit was stored to memory at: sctp_transport_init net/sctp/transport.c:61 [inline] sctp_transport_new+0x16d/0x9a0 net/sctp/transport.c:115 sctp_assoc_add_peer+0x532/0x1f70 net/sctp/associola.c:637 sctp_process_param net/sctp/sm_make_chunk.c:2548 [inline] sctp_process_init+0x1a1b/0x3ed0 net/sctp/sm_make_chunk.c:2361 ... Bytes 8-15 of 16 are uninitialized
It was caused by that th _pad field (the 8-15 bytes) of a v4 addr (saved in struct sockaddr_in) wasn't initialized, but directly copied to user memory in sctp_getsockopt_peer_addrs().
So fix it by calling memset(addr->v4.sin_zero, 0, 8) to initialize _pad of sockaddr_in before copying it to user memory in sctp_v4_addr_to_user(), as sctp_v6_addr_to_user() does.
Reported-by: syzbot+86b5c7c236a22616a72f@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Tested-by: Alexander Potapenko glider@google.com Acked-by: Neil Horman nhorman@tuxdriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sctp/protocol.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index 6abc8b274270..951afdeea5e9 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -600,6 +600,7 @@ static struct sock *sctp_v4_create_accept_sk(struct sock *sk, static int sctp_v4_addr_to_user(struct sctp_sock *sp, union sctp_addr *addr) { /* No address mapping for V4 sockets */ + memset(addr->v4.sin_zero, 0, sizeof(addr->v4.sin_zero)); return sizeof(struct sockaddr_in); }
[ Upstream commit aecfde23108b8e637d9f5c5e523b24fb97035dc3 ]
RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to loss episodes in the same way as conventional TCP".
Currently, Linux DCTCP performs no cwnd reduction when losses are encountered. Optionally, the dctcp_clamp_alpha_on_loss resets alpha to its maximal value if a RTO happens. This behavior is sub-optimal for at least two reasons: i) it ignores losses triggering fast retransmissions; and ii) it causes unnecessary large cwnd reduction in the future if the loss was isolated as it resets the historical term of DCTCP's alpha EWMA to its maximal value (i.e., denoting a total congestion). The second reason has an especially noticeable effect when using DCTCP in high BDP environments, where alpha normally stays at low values.
This patch replace the clamping of alpha by setting ssthresh to half of cwnd for both fast retransmissions and RTOs, at most once per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter has been removed.
The table below shows experimental results where we measured the drop probability of a PIE AQM (not applying ECN marks) at a bottleneck in the presence of a single TCP flow with either the alpha-clamping option enabled or the cwnd halving proposed by this patch. Results using reno or cubic are given for comparison.
| Link | RTT | Drop TCP CC | speed | base+AQM | probability ==================|=========|==========|============ CUBIC | 40Mbps | 7+20ms | 0.21% RENO | | | 0.19% DCTCP-CLAMP-ALPHA | | | 25.80% DCTCP-HALVE-CWND | | | 0.22% ------------------|---------|----------|------------ CUBIC | 100Mbps | 7+20ms | 0.03% RENO | | | 0.02% DCTCP-CLAMP-ALPHA | | | 23.30% DCTCP-HALVE-CWND | | | 0.04% ------------------|---------|----------|------------ CUBIC | 800Mbps | 1+1ms | 0.04% RENO | | | 0.05% DCTCP-CLAMP-ALPHA | | | 18.70% DCTCP-HALVE-CWND | | | 0.06%
We see that, without halving its cwnd for all source of losses, DCTCP drives the AQM to large drop probabilities in order to keep the queue length under control (i.e., it repeatedly faces RTOs). Instead, if DCTCP reacts to all source of losses, it can then be controlled by the AQM using similar drop levels than cubic or reno.
Signed-off-by: Koen De Schepper koen.de_schepper@nokia-bell-labs.com Signed-off-by: Olivier Tilmans olivier.tilmans@nokia-bell-labs.com Cc: Bob Briscoe research@bobbriscoe.net Cc: Lawrence Brakmo brakmo@fb.com Cc: Florian Westphal fw@strlen.de Cc: Daniel Borkmann borkmann@iogearbox.net Cc: Yuchung Cheng ycheng@google.com Cc: Neal Cardwell ncardwell@google.com Cc: Eric Dumazet edumazet@google.com Cc: Andrew Shewmaker agshew@gmail.com Cc: Glenn Judd glenn.judd@morganstanley.com Acked-by: Florian Westphal fw@strlen.de Acked-by: Neal Cardwell ncardwell@google.com Acked-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_dctcp.c | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c index cd4814f7e962..359da68d7c06 100644 --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -67,11 +67,6 @@ static unsigned int dctcp_alpha_on_init __read_mostly = DCTCP_MAX_ALPHA; module_param(dctcp_alpha_on_init, uint, 0644); MODULE_PARM_DESC(dctcp_alpha_on_init, "parameter for initial alpha value");
-static unsigned int dctcp_clamp_alpha_on_loss __read_mostly; -module_param(dctcp_clamp_alpha_on_loss, uint, 0644); -MODULE_PARM_DESC(dctcp_clamp_alpha_on_loss, - "parameter for clamping alpha on loss"); - static struct tcp_congestion_ops dctcp_reno;
static void dctcp_reset(const struct tcp_sock *tp, struct dctcp *ca) @@ -164,21 +159,23 @@ static void dctcp_update_alpha(struct sock *sk, u32 flags) } }
-static void dctcp_state(struct sock *sk, u8 new_state) +static void dctcp_react_to_loss(struct sock *sk) { - if (dctcp_clamp_alpha_on_loss && new_state == TCP_CA_Loss) { - struct dctcp *ca = inet_csk_ca(sk); + struct dctcp *ca = inet_csk_ca(sk); + struct tcp_sock *tp = tcp_sk(sk);
- /* If this extension is enabled, we clamp dctcp_alpha to - * max on packet loss; the motivation is that dctcp_alpha - * is an indicator to the extend of congestion and packet - * loss is an indicator of extreme congestion; setting - * this in practice turned out to be beneficial, and - * effectively assumes total congestion which reduces the - * window by half. - */ - ca->dctcp_alpha = DCTCP_MAX_ALPHA; - } + ca->loss_cwnd = tp->snd_cwnd; + tp->snd_ssthresh = max(tp->snd_cwnd >> 1U, 2U); +} + +static void dctcp_state(struct sock *sk, u8 new_state) +{ + if (new_state == TCP_CA_Recovery && + new_state != inet_csk(sk)->icsk_ca_state) + dctcp_react_to_loss(sk); + /* We handle RTO in dctcp_cwnd_event to ensure that we perform only + * one loss-adjustment per RTT. + */ }
static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev) @@ -190,6 +187,9 @@ static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev) case CA_EVENT_ECN_NO_CE: dctcp_ece_ack_update(sk, ev, &ca->prior_rcv_nxt, &ca->ce_state); break; + case CA_EVENT_LOSS: + dctcp_react_to_loss(sk); + break; default: /* Don't care for the rest. */ break;
[ Upstream commit b506bc975f60f06e13e74adb35e708a23dc4e87c ]
When tcp_sk_init() failed in inet_ctl_sock_create(), 'net->ipv4.tcp_congestion_control' will be left uninitialized, but tcp_sk_exit() hasn't check for that.
This patch add checking on 'net->ipv4.tcp_congestion_control' in tcp_sk_exit() to prevent NULL-ptr dereference.
Fixes: 6670e1524477 ("tcp: Namespace-ify sysctl_tcp_default_congestion_control") Signed-off-by: Dust Li dust.li@linux.alibaba.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_ipv4.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 1aae9ab57fe9..00852f47a73d 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2578,7 +2578,8 @@ static void __net_exit tcp_sk_exit(struct net *net) { int cpu;
- module_put(net->ipv4.tcp_congestion_control->owner); + if (net->ipv4.tcp_congestion_control) + module_put(net->ipv4.tcp_congestion_control->owner);
for_each_possible_cpu(cpu) inet_ctl_sock_destroy(*per_cpu_ptr(net->ipv4.tcp_sk, cpu));
[ Upstream commit 8c83f2df9c6578ea4c5b940d8238ad8a41b87e9e ]
Configuration check to accept source route IP options should be made on the incoming netdevice when the skb->dev is an l3mdev master. The route lookup for the source route next hop also needs the incoming netdev.
v2->v3: - Simplify by passing the original netdevice down the stack (per David Ahern).
Signed-off-by: Stephen Suryaputra ssuryaextr@gmail.com Reviewed-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/ip.h | 2 +- net/ipv4/ip_input.c | 7 +++---- net/ipv4/ip_options.c | 4 ++-- 3 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/include/net/ip.h b/include/net/ip.h index be3cad9c2e4c..583526aad1d0 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -677,7 +677,7 @@ int ip_options_get_from_user(struct net *net, struct ip_options_rcu **optp, unsigned char __user *data, int optlen); void ip_options_undo(struct ip_options *opt); void ip_forward_options(struct sk_buff *skb); -int ip_options_rcv_srr(struct sk_buff *skb); +int ip_options_rcv_srr(struct sk_buff *skb, struct net_device *dev);
/* * Functions provided by ip_sockglue.c diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c index 1f4737b77067..ccf0d31b6ce5 100644 --- a/net/ipv4/ip_input.c +++ b/net/ipv4/ip_input.c @@ -257,11 +257,10 @@ int ip_local_deliver(struct sk_buff *skb) ip_local_deliver_finish); }
-static inline bool ip_rcv_options(struct sk_buff *skb) +static inline bool ip_rcv_options(struct sk_buff *skb, struct net_device *dev) { struct ip_options *opt; const struct iphdr *iph; - struct net_device *dev = skb->dev;
/* It looks as overkill, because not all IP options require packet mangling. @@ -297,7 +296,7 @@ static inline bool ip_rcv_options(struct sk_buff *skb) } }
- if (ip_options_rcv_srr(skb)) + if (ip_options_rcv_srr(skb, dev)) goto drop; }
@@ -353,7 +352,7 @@ static int ip_rcv_finish_core(struct net *net, struct sock *sk, } #endif
- if (iph->ihl > 5 && ip_rcv_options(skb)) + if (iph->ihl > 5 && ip_rcv_options(skb, dev)) goto drop;
rt = skb_rtable(skb); diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c index 32a35043c9f5..3db31bb9df50 100644 --- a/net/ipv4/ip_options.c +++ b/net/ipv4/ip_options.c @@ -612,7 +612,7 @@ void ip_forward_options(struct sk_buff *skb) } }
-int ip_options_rcv_srr(struct sk_buff *skb) +int ip_options_rcv_srr(struct sk_buff *skb, struct net_device *dev) { struct ip_options *opt = &(IPCB(skb)->opt); int srrspace, srrptr; @@ -647,7 +647,7 @@ int ip_options_rcv_srr(struct sk_buff *skb)
orefdst = skb->_skb_refdst; skb_dst_set(skb, NULL); - err = ip_route_input(skb, nexthop, iph->saddr, iph->tos, skb->dev); + err = ip_route_input(skb, nexthop, iph->saddr, iph->tos, dev); rt2 = skb_rtable(skb); if (err || (rt2->rt_type != RTN_UNICAST && rt2->rt_type != RTN_LOCAL)) { skb_dst_drop(skb);
[ Upstream commit bc87a0036826a37b43489b029af8143bd07c6cca ]
Previously, a false positive would be caught if the TIRs list is empty, since the err value was initialized to -ENOMEM, and was only updated if a TIR is refreshed. This is resolved by initializing the err value to zero.
Fixes: b676f653896a ("net/mlx5e: Refactor refresh TIRs") Signed-off-by: Gavi Teitz gavi@mellanox.com Reviewed-by: Roi Dayan roid@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_common.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c index 3078491cc0d0..8100786f6fb5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c @@ -141,15 +141,17 @@ int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb) { struct mlx5_core_dev *mdev = priv->mdev; struct mlx5e_tir *tir; - int err = -ENOMEM; + int err = 0; u32 tirn = 0; int inlen; void *in;
inlen = MLX5_ST_SZ_BYTES(modify_tir_in); in = kvzalloc(inlen, GFP_KERNEL); - if (!in) + if (!in) { + err = -ENOMEM; goto out; + }
if (enable_uc_lb) MLX5_SET(modify_tir_in, in, ctx.self_lb_block,
[ Upstream commit 80a2a9026b24c6bd34b8d58256973e22270bedec ]
Refresh tirs is looping over a global list of tirs while netdevs are adding and removing tirs from that list. That is why a lock is required.
Fixes: 724b2aa15126 ("net/mlx5e: TIRs management refactoring") Signed-off-by: Yuval Avnery yuvalav@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_common.c | 7 +++++++ include/linux/mlx5/driver.h | 2 ++ 2 files changed, 9 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c index 8100786f6fb5..1539cf3de5dc 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c @@ -45,7 +45,9 @@ int mlx5e_create_tir(struct mlx5_core_dev *mdev, if (err) return err;
+ mutex_lock(&mdev->mlx5e_res.td.list_lock); list_add(&tir->list, &mdev->mlx5e_res.td.tirs_list); + mutex_unlock(&mdev->mlx5e_res.td.list_lock);
return 0; } @@ -53,8 +55,10 @@ int mlx5e_create_tir(struct mlx5_core_dev *mdev, void mlx5e_destroy_tir(struct mlx5_core_dev *mdev, struct mlx5e_tir *tir) { + mutex_lock(&mdev->mlx5e_res.td.list_lock); mlx5_core_destroy_tir(mdev, tir->tirn); list_del(&tir->list); + mutex_unlock(&mdev->mlx5e_res.td.list_lock); }
static int mlx5e_create_mkey(struct mlx5_core_dev *mdev, u32 pdn, @@ -114,6 +118,7 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev) }
INIT_LIST_HEAD(&mdev->mlx5e_res.td.tirs_list); + mutex_init(&mdev->mlx5e_res.td.list_lock);
return 0;
@@ -159,6 +164,7 @@ int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb)
MLX5_SET(modify_tir_in, in, bitmask.self_lb_en, 1);
+ mutex_lock(&mdev->mlx5e_res.td.list_lock); list_for_each_entry(tir, &mdev->mlx5e_res.td.tirs_list, list) { tirn = tir->tirn; err = mlx5_core_modify_tir(mdev, tirn, in, inlen); @@ -170,6 +176,7 @@ int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb) kvfree(in); if (err) netdev_err(priv->netdev, "refresh tir(0x%x) failed, %d\n", tirn, err); + mutex_unlock(&mdev->mlx5e_res.td.list_lock);
return err; } diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 54299251d40d..4f001619f854 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -591,6 +591,8 @@ enum mlx5_pagefault_type_flags { };
struct mlx5_td { + /* protects tirs list changes while tirs refresh */ + struct mutex list_lock; struct list_head tirs_list; u32 tdn; };
[ Upstream commit c8ba5b91a04e3e2643e48501c114108802f21cda ]
dev_queue_xmit() may return error codes as well as netdev_tx_t, and it always consumes the skb. Make sure we always return a correct netdev_tx_t value.
Fixes: eadfa4c3be99 ("nfp: add stats and xmit helpers for representors") Signed-off-by: Jakub Kicinski jakub.kicinski@netronome.com Reviewed-by: John Hurley john.hurley@netronome.com Reviewed-by: Simon Horman simon.horman@netronome.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c index 69d7aebda09b..7d62e3698f08 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c @@ -196,7 +196,7 @@ static netdev_tx_t nfp_repr_xmit(struct sk_buff *skb, struct net_device *netdev) ret = dev_queue_xmit(skb); nfp_repr_inc_tx_stats(netdev, len, ret);
- return ret; + return NETDEV_TX_OK; }
static int nfp_repr_stop(struct net_device *netdev)
[ Upstream commit c3e1f7fff69c78169c8ac40cc74ac4307f74e36d ]
NFP reprs are software device on top of the PF's vNIC. The comment above __dev_queue_xmit() sayeth:
When calling this method, interrupts MUST be enabled. This is because the BH enable code must have IRQs enabled so that it will not deadlock.
For netconsole we can't guarantee IRQ state, let's just disable netpoll on representors to be on the safe side.
When the initial implementation of NFP reprs was added by the commit 5de73ee46704 ("nfp: general representor implementation") .ndo_poll_controller was required for netpoll to be enabled.
Fixes: ac3d9dd034e5 ("netpoll: make ndo_poll_controller() optional") Signed-off-by: Jakub Kicinski jakub.kicinski@netronome.com Reviewed-by: John Hurley john.hurley@netronome.com Reviewed-by: Simon Horman simon.horman@netronome.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c index 7d62e3698f08..73db94e55fd0 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c @@ -384,7 +384,7 @@ int nfp_repr_init(struct nfp_app *app, struct net_device *netdev, netdev->features &= ~(NETIF_F_TSO | NETIF_F_TSO6); netdev->gso_max_segs = NFP_NET_LSO_MAX_SEGS;
- netdev->priv_flags |= IFF_NO_QUEUE; + netdev->priv_flags |= IFF_NO_QUEUE | IFF_DISABLE_NETPOLL; netdev->features |= NETIF_F_LLTX;
if (nfp_app_has_tc(app)) {
[ Upstream commit a1b0e4e684e9c300b9e759b46cb7a0147e61ddff ]
There is logic to check that the RX/TPA consumer index is the expected index to work around a hardware problem. However, the potentially bad consumer index is first used to index into an array to reference an entry. This can potentially crash if the bad consumer index is beyond legal range. Improve the logic to use the consumer index for dereferencing after the validity check and log an error message.
Fixes: fa7e28127a5a ("bnxt_en: Add workaround to detect bad opaque in rx completion (part 2)") Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 803f7990d32b..351417e74ae2 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -1129,6 +1129,8 @@ static void bnxt_tpa_start(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, tpa_info = &rxr->rx_tpa[agg_id];
if (unlikely(cons != rxr->rx_next_cons)) { + netdev_warn(bp->dev, "TPA cons %x != expected cons %x\n", + cons, rxr->rx_next_cons); bnxt_sched_reset(bp, rxr); return; } @@ -1581,15 +1583,17 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr, }
cons = rxcmp->rx_cmp_opaque; - rx_buf = &rxr->rx_buf_ring[cons]; - data = rx_buf->data; - data_ptr = rx_buf->data_ptr; if (unlikely(cons != rxr->rx_next_cons)) { int rc1 = bnxt_discard_rx(bp, cpr, raw_cons, rxcmp);
+ netdev_warn(bp->dev, "RX cons %x != expected cons %x\n", + cons, rxr->rx_next_cons); bnxt_sched_reset(bp, rxr); return rc1; } + rx_buf = &rxr->rx_buf_ring[cons]; + data = rx_buf->data; + data_ptr = rx_buf->data_ptr; prefetch(data_ptr);
misc = le32_to_cpu(rxcmp->rx_cmp_misc_v1);
[ Upstream commit 8e44e96c6c8e8fb80b84a2ca11798a8554f710f2 ]
If the RX completion indicates RX buffers errors, the RX ring will be disabled by firmware and no packets will be received on that ring from that point on. Recover by resetting the device.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 351417e74ae2..40ca339ec3df 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -1610,11 +1610,17 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
rx_buf->data = NULL; if (rxcmp1->rx_cmp_cfa_code_errors_v2 & RX_CMP_L2_ERRORS) { + u32 rx_err = le32_to_cpu(rxcmp1->rx_cmp_cfa_code_errors_v2); + bnxt_reuse_rx_data(rxr, cons, data); if (agg_bufs) bnxt_reuse_rx_agg_bufs(cpr, cp_cons, agg_bufs);
rc = -EIO; + if (rx_err & RX_CMPL_ERRORS_BUFFER_ERROR_MASK) { + netdev_warn(bp->dev, "RX buffer error %x\n", rx_err); + bnxt_sched_reset(bp, rxr); + } goto next_rx; }
[ Upstream commit 492b67e28ee5f2a2522fb72e3d3bcb990e461514 ]
erspan tunnels run __iptunnel_pull_header on received skbs to remove gre and erspan headers. This can determine a possible use-after-free accessing pkt_md pointer in erspan_rcv since the packet will be 'uncloned' running pskb_expand_head if it is a cloned gso skb (e.g if the packet has been sent though a veth device). Fix it resetting pkt_md pointer after __iptunnel_pull_header
Fixes: 1d7e2ed22f8d ("net: erspan: refactor existing erspan code") Signed-off-by: Lorenzo Bianconi lorenzo.bianconi@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/ip_gre.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 6ae89f2b541b..2d5734079e6b 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -259,7 +259,6 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi, struct net *net = dev_net(skb->dev); struct metadata_dst *tun_dst = NULL; struct erspan_base_hdr *ershdr; - struct erspan_metadata *pkt_md; struct ip_tunnel_net *itn; struct ip_tunnel *tunnel; const struct iphdr *iph; @@ -282,9 +281,6 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi, if (unlikely(!pskb_may_pull(skb, len))) return PACKET_REJECT;
- ershdr = (struct erspan_base_hdr *)(skb->data + gre_hdr_len); - pkt_md = (struct erspan_metadata *)(ershdr + 1); - if (__iptunnel_pull_header(skb, len, htons(ETH_P_TEB), @@ -292,8 +288,9 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi, goto drop;
if (tunnel->collect_md) { + struct erspan_metadata *pkt_md, *md; struct ip_tunnel_info *info; - struct erspan_metadata *md; + unsigned char *gh; __be64 tun_id; __be16 flags;
@@ -306,6 +303,14 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi, if (!tun_dst) return PACKET_REJECT;
+ /* skb can be uncloned in __iptunnel_pull_header, so + * old pkt_md is no longer valid and we need to reset + * it + */ + gh = skb_network_header(skb) + + skb_network_header_len(skb); + pkt_md = (struct erspan_metadata *)(gh + gre_hdr_len + + sizeof(*ershdr)); md = ip_tunnel_info_opts(&tun_dst->u.tun_info); md->version = ver; md2 = &md->u.md2;
[ Upstream commit 2a3cabae4536edbcb21d344e7aa8be7a584d2afb ]
erspan_v6 tunnels run __iptunnel_pull_header on received skbs to remove erspan header. This can determine a possible use-after-free accessing pkt_md pointer in ip6erspan_rcv since the packet will be 'uncloned' running pskb_expand_head if it is a cloned gso skb (e.g if the packet has been sent though a veth device). Fix it resetting pkt_md pointer after __iptunnel_pull_header
Fixes: 1d7e2ed22f8d ("net: erspan: refactor existing erspan code") Signed-off-by: Lorenzo Bianconi lorenzo.bianconi@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_gre.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index 26f25b6e2833..438f1a5fd19a 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -524,11 +524,10 @@ static int ip6gre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi) return PACKET_REJECT; }
-static int ip6erspan_rcv(struct sk_buff *skb, int gre_hdr_len, - struct tnl_ptk_info *tpi) +static int ip6erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi, + int gre_hdr_len) { struct erspan_base_hdr *ershdr; - struct erspan_metadata *pkt_md; const struct ipv6hdr *ipv6h; struct erspan_md2 *md2; struct ip6_tnl *tunnel; @@ -547,18 +546,16 @@ static int ip6erspan_rcv(struct sk_buff *skb, int gre_hdr_len, if (unlikely(!pskb_may_pull(skb, len))) return PACKET_REJECT;
- ershdr = (struct erspan_base_hdr *)skb->data; - pkt_md = (struct erspan_metadata *)(ershdr + 1); - if (__iptunnel_pull_header(skb, len, htons(ETH_P_TEB), false, false) < 0) return PACKET_REJECT;
if (tunnel->parms.collect_md) { + struct erspan_metadata *pkt_md, *md; struct metadata_dst *tun_dst; struct ip_tunnel_info *info; - struct erspan_metadata *md; + unsigned char *gh; __be64 tun_id; __be16 flags;
@@ -571,6 +568,14 @@ static int ip6erspan_rcv(struct sk_buff *skb, int gre_hdr_len, if (!tun_dst) return PACKET_REJECT;
+ /* skb can be uncloned in __iptunnel_pull_header, so + * old pkt_md is no longer valid and we need to reset + * it + */ + gh = skb_network_header(skb) + + skb_network_header_len(skb); + pkt_md = (struct erspan_metadata *)(gh + gre_hdr_len + + sizeof(*ershdr)); info = &tun_dst->u.tun_info; md = ip_tunnel_info_opts(info); md->version = ver; @@ -607,7 +612,7 @@ static int gre_rcv(struct sk_buff *skb)
if (unlikely(tpi.proto == htons(ETH_P_ERSPAN) || tpi.proto == htons(ETH_P_ERSPAN2))) { - if (ip6erspan_rcv(skb, hdr_len, &tpi) == PACKET_RCVD) + if (ip6erspan_rcv(skb, &tpi, hdr_len) == PACKET_RCVD) return 0; goto out; }
[ Upstream commit 1515a63fc413f160d20574ab0894e7f1020c7be2 ]
We need to be careful and always zero the whole br_ip struct when it is used for matching since the rhashtable change. This patch fixes all the places which didn't properly clear it which in turn might've caused mismatches.
Thanks for the great bug report with reproducing steps and bisection.
Steps to reproduce (from the bug report): ip link add br0 type bridge mcast_querier 1 ip link set br0 up
ip link add v2 type veth peer name v3 ip link set v2 master br0 ip link set v2 up ip link set v3 up ip addr add 3.0.0.2/24 dev v3
ip netns add test ip link add v1 type veth peer name v1 netns test ip link set v1 master br0 ip link set v1 up ip -n test link set v1 up ip -n test addr add 3.0.0.1/24 dev v1
# Multicast receiver ip netns exec test socat UDP4-RECVFROM:5588,ip-add-membership=224.224.224.224:3.0.0.1,fork -
# Multicast sender echo hello | nc -u -s 3.0.0.2 224.224.224.224 5588
Reported-by: liam.mcbirnie@boeing.com Fixes: 19e3a9c90c53 ("net: bridge: convert multicast to generic rhashtable") Signed-off-by: Nikolay Aleksandrov nikolay@cumulusnetworks.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/br_multicast.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index ac92b2eb32b1..e4777614a8a0 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -599,6 +599,7 @@ static int br_ip4_multicast_add_group(struct net_bridge *br, if (ipv4_is_local_multicast(group)) return 0;
+ memset(&br_group, 0, sizeof(br_group)); br_group.u.ip4 = group; br_group.proto = htons(ETH_P_IP); br_group.vid = vid; @@ -1489,6 +1490,7 @@ static void br_ip4_multicast_leave_group(struct net_bridge *br,
own_query = port ? &port->ip4_own_query : &br->ip4_own_query;
+ memset(&br_group, 0, sizeof(br_group)); br_group.u.ip4 = group; br_group.proto = htons(ETH_P_IP); br_group.vid = vid; @@ -1512,6 +1514,7 @@ static void br_ip6_multicast_leave_group(struct net_bridge *br,
own_query = port ? &port->ip6_own_query : &br->ip6_own_query;
+ memset(&br_group, 0, sizeof(br_group)); br_group.u.ip6 = *group; br_group.proto = htons(ETH_P_IPV6); br_group.vid = vid;
[ Upstream commit 2ec1ed2aa68782b342458681aa4d16b65c9014d6 ]
When a bpf program is uploaded, the driver computes the number of xdp tx queues resulting in the allocation of additional qsets. Starting from commit '2ecbe4f4a027 ("net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them")' the driver runs link state polling for each VF resulting in the following NULL pointer dereference:
[ 56.169256] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 [ 56.178032] Mem abort info: [ 56.180834] ESR = 0x96000005 [ 56.183877] Exception class = DABT (current EL), IL = 32 bits [ 56.189792] SET = 0, FnV = 0 [ 56.192834] EA = 0, S1PTW = 0 [ 56.195963] Data abort info: [ 56.198831] ISV = 0, ISS = 0x00000005 [ 56.202662] CM = 0, WnR = 0 [ 56.205619] user pgtable: 64k pages, 48-bit VAs, pgdp = 0000000021f0c7a0 [ 56.212315] [0000000000000020] pgd=0000000000000000, pud=0000000000000000 [ 56.219094] Internal error: Oops: 96000005 [#1] SMP [ 56.260459] CPU: 39 PID: 2034 Comm: ip Not tainted 5.1.0-rc3+ #3 [ 56.266452] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS T49 02/02/2018 [ 56.273315] pstate: 80000005 (Nzcv daif -PAN -UAO) [ 56.278098] pc : __ll_sc___cmpxchg_case_acq_64+0x4/0x20 [ 56.283312] lr : mutex_lock+0x2c/0x50 [ 56.286962] sp : ffff0000219af1b0 [ 56.290264] x29: ffff0000219af1b0 x28: ffff800f64de49a0 [ 56.295565] x27: 0000000000000000 x26: 0000000000000015 [ 56.300865] x25: 0000000000000000 x24: 0000000000000000 [ 56.306165] x23: 0000000000000000 x22: ffff000011117000 [ 56.311465] x21: ffff800f64dfc080 x20: 0000000000000020 [ 56.316766] x19: 0000000000000020 x18: 0000000000000001 [ 56.322066] x17: 0000000000000000 x16: ffff800f2e077080 [ 56.327367] x15: 0000000000000004 x14: 0000000000000000 [ 56.332667] x13: ffff000010964438 x12: 0000000000000002 [ 56.337967] x11: 0000000000000000 x10: 0000000000000c70 [ 56.343268] x9 : ffff0000219af120 x8 : ffff800f2e077d50 [ 56.348568] x7 : 0000000000000027 x6 : 000000062a9d6a84 [ 56.353869] x5 : 0000000000000000 x4 : ffff800f2e077480 [ 56.359169] x3 : 0000000000000008 x2 : ffff800f2e077080 [ 56.364469] x1 : 0000000000000000 x0 : 0000000000000020 [ 56.369770] Process ip (pid: 2034, stack limit = 0x00000000c862da3a) [ 56.376110] Call trace: [ 56.378546] __ll_sc___cmpxchg_case_acq_64+0x4/0x20 [ 56.383414] drain_workqueue+0x34/0x198 [ 56.387247] nicvf_open+0x48/0x9e8 [nicvf] [ 56.391334] nicvf_open+0x898/0x9e8 [nicvf] [ 56.395507] nicvf_xdp+0x1bc/0x238 [nicvf] [ 56.399595] dev_xdp_install+0x68/0x90 [ 56.403333] dev_change_xdp_fd+0xc8/0x240 [ 56.407333] do_setlink+0x8e0/0xbe8 [ 56.410810] __rtnl_newlink+0x5b8/0x6d8 [ 56.414634] rtnl_newlink+0x54/0x80 [ 56.418112] rtnetlink_rcv_msg+0x22c/0x2f8 [ 56.422199] netlink_rcv_skb+0x60/0x120 [ 56.426023] rtnetlink_rcv+0x28/0x38 [ 56.429587] netlink_unicast+0x1c8/0x258 [ 56.433498] netlink_sendmsg+0x1b4/0x350 [ 56.437410] sock_sendmsg+0x4c/0x68 [ 56.440887] ___sys_sendmsg+0x240/0x280 [ 56.444711] __sys_sendmsg+0x68/0xb0 [ 56.448275] __arm64_sys_sendmsg+0x2c/0x38 [ 56.452361] el0_svc_handler+0x9c/0x128 [ 56.456186] el0_svc+0x8/0xc [ 56.459056] Code: 35ffff91 2a1003e0 d65f03c0 f9800011 (c85ffc10) [ 56.465166] ---[ end trace 4a57fdc27b0a572c ]--- [ 56.469772] Kernel panic - not syncing: Fatal exception
Fix it by checking nicvf_rx_mode_wq pointer in nicvf_open and nicvf_stop
Fixes: 2ecbe4f4a027 ("net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them") Fixes: 2c632ad8bc74 ("net: thunderx: move link state polling function to VF") Reported-by: Matteo Croce mcroce@redhat.com Signed-off-by: Lorenzo Bianconi lorenzo.bianconi@redhat.com Tested-by: Matteo Croce mcroce@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/cavium/thunder/nicvf_main.c | 20 +++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index 503cfadff4ac..d4ee9f9c8c34 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -1328,10 +1328,11 @@ int nicvf_stop(struct net_device *netdev) struct nicvf_cq_poll *cq_poll = NULL; union nic_mbx mbx = {};
- cancel_delayed_work_sync(&nic->link_change_work); - /* wait till all queued set_rx_mode tasks completes */ - drain_workqueue(nic->nicvf_rx_mode_wq); + if (nic->nicvf_rx_mode_wq) { + cancel_delayed_work_sync(&nic->link_change_work); + drain_workqueue(nic->nicvf_rx_mode_wq); + }
mbx.msg.msg = NIC_MBOX_MSG_SHUTDOWN; nicvf_send_msg_to_pf(nic, &mbx); @@ -1452,7 +1453,8 @@ int nicvf_open(struct net_device *netdev) struct nicvf_cq_poll *cq_poll = NULL;
/* wait till all queued set_rx_mode tasks completes if any */ - drain_workqueue(nic->nicvf_rx_mode_wq); + if (nic->nicvf_rx_mode_wq) + drain_workqueue(nic->nicvf_rx_mode_wq);
netif_carrier_off(netdev);
@@ -1550,10 +1552,12 @@ int nicvf_open(struct net_device *netdev) /* Send VF config done msg to PF */ nicvf_send_cfg_done(nic);
- INIT_DELAYED_WORK(&nic->link_change_work, - nicvf_link_status_check_task); - queue_delayed_work(nic->nicvf_rx_mode_wq, - &nic->link_change_work, 0); + if (nic->nicvf_rx_mode_wq) { + INIT_DELAYED_WORK(&nic->link_change_work, + nicvf_link_status_check_task); + queue_delayed_work(nic->nicvf_rx_mode_wq, + &nic->link_change_work, 0); + }
return 0; cleanup:
[ Upstream commit 5055376a3b44c4021de8830c9157f086a97731df ]
When the mtu of a vrf device is set to 0, it would cause ping failed. So I think we should limit vrf mtu in a reasonable range to solve this problem. I set dev->min_mtu to IPV6_MIN_MTU, so it will works for both ipv4 and ipv6. And if dev->max_mtu still be 0 can be confusing, so I set dev->max_mtu to ETH_MAX_MTU.
Here is the reproduce step:
1.Config vrf interface and set mtu to 0: 3: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master vrf1 state UP mode DEFAULT group default qlen 1000 link/ether 52:54:00:9e:dd:c1 brd ff:ff:ff:ff:ff:ff
2.Ping peer: 3: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master vrf1 state UP group default qlen 1000 link/ether 52:54:00:9e:dd:c1 brd ff:ff:ff:ff:ff:ff inet 10.0.0.1/16 scope global enp4s0 valid_lft forever preferred_lft forever connect: Network is unreachable
3.Set mtu to default value, ping works: PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data. 64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=1.88 ms
Fixes: ad49bc6361ca2 ("net: vrf: remove MTU limits for vrf device") Signed-off-by: Miaohe Lin linmiaohe@huawei.com Reviewed-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/vrf.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 6d1a1abbed27..cd15c32b2e43 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -1275,8 +1275,12 @@ static void vrf_setup(struct net_device *dev) dev->priv_flags |= IFF_NO_QUEUE; dev->priv_flags |= IFF_NO_RX_HANDLER;
- dev->min_mtu = 0; - dev->max_mtu = 0; + /* VRF devices do not care about MTU, but if the MTU is set + * too low then the ipv4 and ipv6 protocols are disabled + * which breaks networking. + */ + dev->min_mtu = IPV6_MIN_MTU; + dev->max_mtu = ETH_MAX_MTU; }
static int vrf_validate(struct nlattr *tb[], struct nlattr *data[],
[ Upstream commit 9a5a90d167b0e5fe3d47af16b68fd09ce64085cd ]
__netif_receive_skb_list_ptype() leaves skb->next poisoned before passing it to pt_prev->func handler, what may produce (in certain cases, e.g. DSA setup) crashes like:
[ 88.606777] CPU 0 Unable to handle kernel paging request at virtual address 0000000e, epc == 80687078, ra == 8052cc7c [ 88.618666] Oops[#1]: [ 88.621196] CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc2-dlink-00206-g4192a172-dirty #1473 [ 88.630885] $ 0 : 00000000 10000400 00000002 864d7850 [ 88.636709] $ 4 : 87c0ddf0 864d7800 87c0ddf0 00000000 [ 88.642526] $ 8 : 00000000 49600000 00000001 00000001 [ 88.648342] $12 : 00000000 c288617b dadbee27 25d17c41 [ 88.654159] $16 : 87c0ddf0 85cff080 80790000 fffffffd [ 88.659975] $20 : 80797b20 ffffffff 00000001 864d7800 [ 88.665793] $24 : 00000000 8011e658 [ 88.671609] $28 : 80790000 87c0dbc0 87cabf00 8052cc7c [ 88.677427] Hi : 00000003 [ 88.680622] Lo : 7b5b4220 [ 88.683840] epc : 80687078 vlan_dev_hard_start_xmit+0x1c/0x1a0 [ 88.690532] ra : 8052cc7c dev_hard_start_xmit+0xac/0x188 [ 88.696734] Status: 10000404 IEp [ 88.700422] Cause : 50000008 (ExcCode 02) [ 88.704874] BadVA : 0000000e [ 88.708069] PrId : 0001a120 (MIPS interAptiv (multi)) [ 88.713005] Modules linked in: [ 88.716407] Process swapper (pid: 0, threadinfo=(ptrval), task=(ptrval), tls=00000000) [ 88.725219] Stack : 85f61c28 00000000 0000000e 80780000 87c0ddf0 85cff080 80790000 8052cc7c [ 88.734529] 87cabf00 00000000 00000001 85f5fb40 807b0000 864d7850 87cabf00 807d0000 [ 88.743839] 864d7800 8655f600 00000000 85cff080 87c1c000 0000006a 00000000 8052d96c [ 88.753149] 807a0000 8057adb8 87c0dcc8 87c0dc50 85cfff08 00000558 87cabf00 85f58c50 [ 88.762460] 00000002 85f58c00 864d7800 80543308 fffffff4 00000001 85f58c00 864d7800 [ 88.771770] ... [ 88.774483] Call Trace: [ 88.777199] [<80687078>] vlan_dev_hard_start_xmit+0x1c/0x1a0 [ 88.783504] [<8052cc7c>] dev_hard_start_xmit+0xac/0x188 [ 88.789326] [<8052d96c>] __dev_queue_xmit+0x6e8/0x7d4 [ 88.794955] [<805a8640>] ip_finish_output2+0x238/0x4d0 [ 88.800677] [<805ab6a0>] ip_output+0xc8/0x140 [ 88.805526] [<805a68f4>] ip_forward+0x364/0x560 [ 88.810567] [<805a4ff8>] ip_rcv+0x48/0xe4 [ 88.815030] [<80528d44>] __netif_receive_skb_one_core+0x44/0x58 [ 88.821635] [<8067f220>] dsa_switch_rcv+0x108/0x1ac [ 88.827067] [<80528f80>] __netif_receive_skb_list_core+0x228/0x26c [ 88.833951] [<8052ed84>] netif_receive_skb_list+0x1d4/0x394 [ 88.840160] [<80355a88>] lunar_rx_poll+0x38c/0x828 [ 88.845496] [<8052fa78>] net_rx_action+0x14c/0x3cc [ 88.850835] [<806ad300>] __do_softirq+0x178/0x338 [ 88.856077] [<8012a2d4>] irq_exit+0xbc/0x100 [ 88.860846] [<802f8b70>] plat_irq_dispatch+0xc0/0x144 [ 88.866477] [<80105974>] handle_int+0x14c/0x158 [ 88.871516] [<806acfb0>] r4k_wait+0x30/0x40 [ 88.876462] Code: afb10014 8c8200a0 00803025 <9443000c> 94a20468 00000000 10620042 00a08025 9605046a [ 88.887332] [ 88.888982] ---[ end trace eb863d007da11cf1 ]--- [ 88.894122] Kernel panic - not syncing: Fatal exception in interrupt [ 88.901202] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Fix this by pulling skb off the sublist and zeroing skb->next pointer before calling ptype callback.
Fixes: 88eb1944e18c ("net: core: propagate SKB lists through packet_type lookup") Reviewed-by: Edward Cree ecree@solarflare.com Signed-off-by: Alexander Lobakin alobakin@dlink.ru Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/dev.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c index 5d03889502eb..12824e007e06 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -5014,8 +5014,10 @@ static inline void __netif_receive_skb_list_ptype(struct list_head *head, if (pt_prev->list_func != NULL) pt_prev->list_func(head, pt_prev, orig_dev); else - list_for_each_entry_safe(skb, next, head, list) + list_for_each_entry_safe(skb, next, head, list) { + skb_list_del_init(skb); pt_prev->func(skb, skb->dev, pt_prev, orig_dev); + } }
static void __netif_receive_skb_list_core(struct list_head *head, bool pfmemalloc)
[ Upstream commit 288ac524cf70a8e7ed851a61ed2a9744039dae8d ]
It was reported that re-introducing ASPM, in combination with RX interrupt coalescing, results in significantly increased packet latency, see [0]. Disabling ASPM or RX interrupt coalescing fixes the issue. Therefore change the driver's default to disable RX interrupt coalescing. Users still have the option to enable RX coalescing via ethtool.
[0] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=925496
Fixes: a99790bf5c7f ("r8169: Reinstate ASPM Support") Reported-by: Mike Crowe mac@mcrowe.com Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/realtek/r8169.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 5adb00f521db..365cddbfc684 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -5333,7 +5333,7 @@ static void rtl_hw_start_8168(struct rtl8169_private *tp) tp->cp_cmd |= PktCntrDisable | INTT_1; RTL_W16(tp, CPlusCmd, tp->cp_cmd);
- RTL_W16(tp, IntrMitigate, 0x5151); + RTL_W16(tp, IntrMitigate, 0x5100);
/* Work around for RxFIFO overflow. */ if (tp->mac_version == RTL_GIGA_MAC_VER_11) {
[ Upstream commit 8e949363f017e2011464812a714fb29710fb95b4 ]
idr_find() can return a NULL value to 'flow' which is used without a check. The patch adds a check to avoid potential NULL pointer dereference.
In case of mlx5_fpga_sbu_conn_sendmsg() failure, free buf allocated using kzalloc.
Fixes: ab412e1dd7db ("net/mlx5: Accel, add TLS rx offload routines") Signed-off-by: Aditya Pakki pakki001@umn.edu Reviewed-by: Yuval Shaia yuval.shaia@oracle.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c index 5cf5f2a9d51f..8de64e88c670 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c @@ -217,15 +217,21 @@ int mlx5_fpga_tls_resync_rx(struct mlx5_core_dev *mdev, u32 handle, u32 seq, void *cmd; int ret;
+ rcu_read_lock(); + flow = idr_find(&mdev->fpga->tls->rx_idr, ntohl(handle)); + rcu_read_unlock(); + + if (!flow) { + WARN_ONCE(1, "Received NULL pointer for handle\n"); + return -EINVAL; + } + buf = kzalloc(size, GFP_ATOMIC); if (!buf) return -ENOMEM;
cmd = (buf + 1);
- rcu_read_lock(); - flow = idr_find(&mdev->fpga->tls->rx_idr, ntohl(handle)); - rcu_read_unlock(); mlx5_fpga_tls_flow_to_cmd(flow, cmd);
MLX5_SET(tls_cmd, cmd, swid, ntohl(handle)); @@ -238,6 +244,8 @@ int mlx5_fpga_tls_resync_rx(struct mlx5_core_dev *mdev, u32 handle, u32 seq, buf->complete = mlx_tls_kfree_complete;
ret = mlx5_fpga_sbu_conn_sendmsg(mdev->fpga->tls->conn, buf); + if (ret < 0) + kfree(buf);
return ret; }
[ Upstream commit 5ec983e924c7978aaec3cf8679ece9436508bb20 ]
Set minimum speed in xoff threshold formula to 40Gbps
Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Huy Nguyen huyn@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/en/port_buffer.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c index eac245a93f91..f00de0c987cd 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c @@ -122,7 +122,9 @@ static int port_set_buffer(struct mlx5e_priv *priv, return err; }
-/* xoff = ((301+2.16 * len [m]) * speed [Gbps] + 2.72 MTU [B]) */ +/* xoff = ((301+2.16 * len [m]) * speed [Gbps] + 2.72 MTU [B]) + * minimum speed value is 40Gbps + */ static u32 calculate_xoff(struct mlx5e_priv *priv, unsigned int mtu) { u32 speed; @@ -130,10 +132,9 @@ static u32 calculate_xoff(struct mlx5e_priv *priv, unsigned int mtu) int err;
err = mlx5e_port_linkspeed(priv->mdev, &speed); - if (err) { - mlx5_core_warn(priv->mdev, "cannot get port speed\n"); - return 0; - } + if (err) + speed = SPEED_40000; + speed = max_t(u32, speed, SPEED_40000);
xoff = (301 + 216 * priv->dcbx.cable_len / 100) * speed / 1000 + 272 * mtu / 100;
[ Upstream commit e28408e98bced123038857b6e3c81fa12a2e3e68 ]
Set xon = xoff - netdev's max_mtu. netdev's max_mtu will give enough time for the pause frame to arrive at the sender.
Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Huy Nguyen huyn@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../mellanox/mlx5/core/en/port_buffer.c | 28 +++++++++++-------- 1 file changed, 16 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c index f00de0c987cd..4ab0d030b544 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c @@ -143,7 +143,7 @@ static u32 calculate_xoff(struct mlx5e_priv *priv, unsigned int mtu) }
static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer, - u32 xoff, unsigned int mtu) + u32 xoff, unsigned int max_mtu) { int i;
@@ -155,11 +155,12 @@ static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer, }
if (port_buffer->buffer[i].size < - (xoff + mtu + (1 << MLX5E_BUFFER_CELL_SHIFT))) + (xoff + max_mtu + (1 << MLX5E_BUFFER_CELL_SHIFT))) return -ENOMEM;
port_buffer->buffer[i].xoff = port_buffer->buffer[i].size - xoff; - port_buffer->buffer[i].xon = port_buffer->buffer[i].xoff - mtu; + port_buffer->buffer[i].xon = + port_buffer->buffer[i].xoff - max_mtu; }
return 0; @@ -167,7 +168,7 @@ static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer,
/** * update_buffer_lossy() - * mtu: device's MTU + * max_mtu: netdev's max_mtu * pfc_en: <input> current pfc configuration * buffer: <input> current prio to buffer mapping * xoff: <input> xoff value @@ -184,7 +185,7 @@ static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer, * Return 0 if no error. * Set change to true if buffer configuration is modified. */ -static int update_buffer_lossy(unsigned int mtu, +static int update_buffer_lossy(unsigned int max_mtu, u8 pfc_en, u8 *buffer, u32 xoff, struct mlx5e_port_buffer *port_buffer, bool *change) @@ -221,7 +222,7 @@ static int update_buffer_lossy(unsigned int mtu, }
if (changed) { - err = update_xoff_threshold(port_buffer, xoff, mtu); + err = update_xoff_threshold(port_buffer, xoff, max_mtu); if (err) return err;
@@ -231,6 +232,7 @@ static int update_buffer_lossy(unsigned int mtu, return 0; }
+#define MINIMUM_MAX_MTU 9216 int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv, u32 change, unsigned int mtu, struct ieee_pfc *pfc, @@ -242,12 +244,14 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv, bool update_prio2buffer = false; u8 buffer[MLX5E_MAX_PRIORITY]; bool update_buffer = false; + unsigned int max_mtu; u32 total_used = 0; u8 curr_pfc_en; int err; int i;
mlx5e_dbg(HW, priv, "%s: change=%x\n", __func__, change); + max_mtu = max_t(unsigned int, priv->netdev->max_mtu, MINIMUM_MAX_MTU);
err = mlx5e_port_query_buffer(priv, &port_buffer); if (err) @@ -255,7 +259,7 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
if (change & MLX5E_PORT_BUFFER_CABLE_LEN) { update_buffer = true; - err = update_xoff_threshold(&port_buffer, xoff, mtu); + err = update_xoff_threshold(&port_buffer, xoff, max_mtu); if (err) return err; } @@ -265,7 +269,7 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv, if (err) return err;
- err = update_buffer_lossy(mtu, pfc->pfc_en, buffer, xoff, + err = update_buffer_lossy(max_mtu, pfc->pfc_en, buffer, xoff, &port_buffer, &update_buffer); if (err) return err; @@ -277,8 +281,8 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv, if (err) return err;
- err = update_buffer_lossy(mtu, curr_pfc_en, prio2buffer, xoff, - &port_buffer, &update_buffer); + err = update_buffer_lossy(max_mtu, curr_pfc_en, prio2buffer, + xoff, &port_buffer, &update_buffer); if (err) return err; } @@ -302,7 +306,7 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv, return -EINVAL;
update_buffer = true; - err = update_xoff_threshold(&port_buffer, xoff, mtu); + err = update_xoff_threshold(&port_buffer, xoff, max_mtu); if (err) return err; } @@ -310,7 +314,7 @@ int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv, /* Need to update buffer configuration if xoff value is changed */ if (!update_buffer && xoff != priv->dcbx.xoff) { update_buffer = true; - err = update_xoff_threshold(&port_buffer, xoff, mtu); + err = update_xoff_threshold(&port_buffer, xoff, max_mtu); if (err) return err; }
[ Upstream commit ad15006cc78459d059af56729c4d9bed7c7fd860 ]
This causes an issue when trying to build with `make LD=ld.lld` if ld.lld and the rest of your cross tools aren't in the same directory (ex. /usr/local/bin) (as is the case for Android's build system), as the GCC_TOOLCHAIN_DIR then gets set based on `which $(LD)` which will point where LLVM tools are, not GCC/binutils tools are located.
Instead, select the GCC_TOOLCHAIN_DIR based on another tool provided by binutils for which LLVM does not provide a substitute for, such as elfedit.
Fixes: 785f11aa595b ("kbuild: Add better clang cross build support") Link: https://github.com/ClangBuiltLinux/linux/issues/341 Suggested-by: Nathan Chancellor natechancellor@gmail.com Reviewed-by: Nathan Chancellor natechancellor@gmail.com Tested-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com Signed-off-by: Sasha Levin sashal@kernel.org --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile index af99c77c7066..be885715f83f 100644 --- a/Makefile +++ b/Makefile @@ -510,7 +510,7 @@ endif ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),) ifneq ($(CROSS_COMPILE),) CLANG_FLAGS := --target=$(notdir $(CROSS_COMPILE:%-=%)) -GCC_TOOLCHAIN_DIR := $(dir $(shell which $(LD))) +GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit)) CLANG_FLAGS += --prefix=$(GCC_TOOLCHAIN_DIR) GCC_TOOLCHAIN := $(realpath $(GCC_TOOLCHAIN_DIR)/..) endif
[ Upstream commit 5f074f3e192f10c9fade898b9b3b8812e3d83342 ]
A recent optimization in Clang (r355672) lowers comparisons of the return value of memcmp against zero to comparisons of the return value of bcmp against zero. This helps some platforms that implement bcmp more efficiently than memcmp. glibc simply aliases bcmp to memcmp, but an optimized implementation is in the works.
This results in linkage failures for all targets with Clang due to the undefined symbol. For now, just implement bcmp as a tailcail to memcmp to unbreak the build. This routine can be further optimized in the future.
Other ideas discussed:
* A weak alias was discussed, but breaks for architectures that define their own implementations of memcmp since aliases to declarations are not permitted (only definitions). Arch-specific memcmp implementations typically declare memcmp in C headers, but implement them in assembly.
* -ffreestanding also is used sporadically throughout the kernel.
* -fno-builtin-bcmp doesn't work when doing LTO.
Link: https://bugs.llvm.org/show_bug.cgi?id=41035 Link: https://code.woboq.org/userspace/glibc/string/memcmp.c.html#bcmp Link: https://github.com/llvm/llvm-project/commit/8e16d73346f8091461319a7dfc4ddd18... Link: https://github.com/ClangBuiltLinux/linux/issues/416 Link: http://lkml.kernel.org/r/20190313211335.165605-1-ndesaulniers@google.com Signed-off-by: Nick Desaulniers ndesaulniers@google.com Reported-by: Nathan Chancellor natechancellor@gmail.com Reported-by: Adhemerval Zanella adhemerval.zanella@linaro.org Suggested-by: Arnd Bergmann arnd@arndb.de Suggested-by: James Y Knight jyknight@google.com Suggested-by: Masahiro Yamada yamada.masahiro@socionext.com Suggested-by: Nathan Chancellor natechancellor@gmail.com Suggested-by: Rasmus Villemoes linux@rasmusvillemoes.dk Acked-by: Steven Rostedt (VMware) rostedt@goodmis.org Reviewed-by: Nathan Chancellor natechancellor@gmail.com Tested-by: Nathan Chancellor natechancellor@gmail.com Reviewed-by: Masahiro Yamada yamada.masahiro@socionext.com Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: David Laight David.Laight@ACULAB.COM Cc: Rasmus Villemoes linux@rasmusvillemoes.dk Cc: Namhyung Kim namhyung@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Dan Williams dan.j.williams@intel.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/string.h | 3 +++ lib/string.c | 20 ++++++++++++++++++++ 2 files changed, 23 insertions(+)
diff --git a/include/linux/string.h b/include/linux/string.h index 7927b875f80c..6ab0a6fa512e 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -150,6 +150,9 @@ extern void * memscan(void *,int,__kernel_size_t); #ifndef __HAVE_ARCH_MEMCMP extern int memcmp(const void *,const void *,__kernel_size_t); #endif +#ifndef __HAVE_ARCH_BCMP +extern int bcmp(const void *,const void *,__kernel_size_t); +#endif #ifndef __HAVE_ARCH_MEMCHR extern void * memchr(const void *,int,__kernel_size_t); #endif diff --git a/lib/string.c b/lib/string.c index 38e4ca08e757..3ab861c1a857 100644 --- a/lib/string.c +++ b/lib/string.c @@ -866,6 +866,26 @@ __visible int memcmp(const void *cs, const void *ct, size_t count) EXPORT_SYMBOL(memcmp); #endif
+#ifndef __HAVE_ARCH_BCMP +/** + * bcmp - returns 0 if and only if the buffers have identical contents. + * @a: pointer to first buffer. + * @b: pointer to second buffer. + * @len: size of buffers. + * + * The sign or magnitude of a non-zero return value has no particular + * meaning, and architectures may implement their own more efficient bcmp(). So + * while this particular implementation is a simple (tail) call to memcmp, do + * not rely on anything but whether the return value is zero or non-zero. + */ +#undef bcmp +int bcmp(const void *a, const void *b, size_t len) +{ + return memcmp(a, b, len); +} +EXPORT_SYMBOL(bcmp); +#endif + #ifndef __HAVE_ARCH_MEMSCAN /** * memscan - Find a character in an area of memory.
This reverts commit 9b0f430450cf230e736bc40f95bf34fbdb99cead.
This patch was not initially a fix and is dependent on other changes which are not fixes eithers.
With this change, multiple Amlogic based boards fails to boot, as reported by kernelci.
Cc: stable@vger.kernel.org # 5.0.7 Signed-off-by: Neil Armstrong narmstrong@baylibre.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/meson/meson-aoclk.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/drivers/clk/meson/meson-aoclk.c b/drivers/clk/meson/meson-aoclk.c index 258c8d259ea1..f965845917e3 100644 --- a/drivers/clk/meson/meson-aoclk.c +++ b/drivers/clk/meson/meson-aoclk.c @@ -65,20 +65,15 @@ int meson_aoclkc_probe(struct platform_device *pdev) return ret; }
- /* Populate regmap */ - for (clkid = 0; clkid < data->num_clks; clkid++) + /* + * Populate regmap and register all clks + */ + for (clkid = 0; clkid < data->num_clks; clkid++) { data->clks[clkid]->map = regmap;
- /* Register all clks */ - for (clkid = 0; clkid < data->hw_data->num; clkid++) { - if (!data->hw_data->hws[clkid]) - continue; - ret = devm_clk_hw_register(dev, data->hw_data->hws[clkid]); - if (ret) { - dev_err(dev, "Clock registration failed\n"); + if (ret) return ret; - } }
return devm_of_clk_add_hw_provider(dev, of_clk_hw_onecell_get,
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit c7084edc3f6d67750f50d4183134c4fb5712a5c8 upstream.
The n_r3964 line discipline driver was written in a different time, when SMP machines were rare, and users were trusted to do the right thing. Since then, the world has moved on but not this code, it has stayed rooted in the past with its lovely hand-crafted list structures and loads of "interesting" race conditions all over the place.
After attempting to clean up most of the issues, I just gave up and am now marking the driver as BROKEN so that hopefully someone who has this hardware will show up out of the woodwork (I know you are out there!) and will help with debugging a raft of changes that I had laying around for the code, but was too afraid to commit as odds are they would break things.
Many thanks to Jann and Linus for pointing out the initial problems in this codebase, as well as many reviews of my attempts to fix the issues. It was a case of whack-a-mole, and as you can see, the mole won.
Reported-by: Jann Horn jannh@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
--- drivers/char/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/char/Kconfig +++ b/drivers/char/Kconfig @@ -351,7 +351,7 @@ config XILINX_HWICAP
config R3964 tristate "Siemens R3964 line discipline" - depends on TTY + depends on TTY && BROKEN ---help--- This driver allows synchronous communication with devices using the Siemens R3964 packet protocol. Unless you are dealing with special
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit 7c0cca7c847e6e019d67b7d793efbbe3b947d004 upstream.
By default, the kernel will automatically load the module of any line dicipline that is asked for. As this sometimes isn't the safest thing to do, provide a sysctl to disable this feature.
By default, we set this to 'y' as that is the historical way that Linux has worked, and we do not want to break working systems. But in the future, perhaps this can default to 'n' to prevent this functionality.
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/Kconfig | 24 ++++++++++++++++++++++++ drivers/tty/tty_io.c | 3 +++ drivers/tty/tty_ldisc.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 74 insertions(+)
--- a/drivers/tty/Kconfig +++ b/drivers/tty/Kconfig @@ -441,4 +441,28 @@ config VCC depends on SUN_LDOMS help Support for Sun logical domain consoles. + +config LDISC_AUTOLOAD + bool "Automatically load TTY Line Disciplines" + default y + help + Historically the kernel has always automatically loaded any + line discipline that is in a kernel module when a user asks + for it to be loaded with the TIOCSETD ioctl, or through other + means. This is not always the best thing to do on systems + where you know you will not be using some of the more + "ancient" line disciplines, so prevent the kernel from doing + this unless the request is coming from a process with the + CAP_SYS_MODULE permissions. + + Say 'Y' here if you trust your userspace users to do the right + thing, or if you have only provided the line disciplines that + you know you will be using, or if you wish to continue to use + the traditional method of on-demand loading of these modules + by any user. + + This functionality can be changed at runtime with the + dev.tty.ldisc_autoload sysctl, this configuration option will + only set the default value of this functionality. + endif # TTY --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -513,6 +513,8 @@ static const struct file_operations hung static DEFINE_SPINLOCK(redirect_lock); static struct file *redirect;
+extern void tty_sysctl_init(void); + /** * tty_wakeup - request more data * @tty: terminal @@ -3483,6 +3485,7 @@ void console_sysfs_notify(void) */ int __init tty_init(void) { + tty_sysctl_init(); cdev_init(&tty_cdev, &tty_fops); if (cdev_add(&tty_cdev, MKDEV(TTYAUX_MAJOR, 0), 1) || register_chrdev_region(MKDEV(TTYAUX_MAJOR, 0), 1, "/dev/tty") < 0) --- a/drivers/tty/tty_ldisc.c +++ b/drivers/tty/tty_ldisc.c @@ -156,6 +156,13 @@ static void put_ldops(struct tty_ldisc_o * takes tty_ldiscs_lock to guard against ldisc races */
+#if defined(CONFIG_LDISC_AUTOLOAD) + #define INITIAL_AUTOLOAD_STATE 1 +#else + #define INITIAL_AUTOLOAD_STATE 0 +#endif +static int tty_ldisc_autoload = INITIAL_AUTOLOAD_STATE; + static struct tty_ldisc *tty_ldisc_get(struct tty_struct *tty, int disc) { struct tty_ldisc *ld; @@ -170,6 +177,8 @@ static struct tty_ldisc *tty_ldisc_get(s */ ldops = get_ldops(disc); if (IS_ERR(ldops)) { + if (!capable(CAP_SYS_MODULE) && !tty_ldisc_autoload) + return ERR_PTR(-EPERM); request_module("tty-ldisc-%d", disc); ldops = get_ldops(disc); if (IS_ERR(ldops)) @@ -845,3 +854,41 @@ void tty_ldisc_deinit(struct tty_struct tty_ldisc_put(tty->ldisc); tty->ldisc = NULL; } + +static int zero; +static int one = 1; +static struct ctl_table tty_table[] = { + { + .procname = "ldisc_autoload", + .data = &tty_ldisc_autoload, + .maxlen = sizeof(tty_ldisc_autoload), + .mode = 0644, + .proc_handler = proc_dointvec, + .extra1 = &zero, + .extra2 = &one, + }, + { } +}; + +static struct ctl_table tty_dir_table[] = { + { + .procname = "tty", + .mode = 0555, + .child = tty_table, + }, + { } +}; + +static struct ctl_table tty_root_table[] = { + { + .procname = "dev", + .mode = 0555, + .child = tty_dir_table, + }, + { } +}; + +void tty_sysctl_init(void) +{ + register_sysctl_table(tty_root_table); +}
From: Axel Lin axel.lin@ingics.com
commit a165dcc923ada2ffdee1d4f41f12f81b66d04c55 upstream.
Select REGMAP_I2C to avoid below build error: ERROR: "__devm_regmap_init_i2c" [drivers/hwmon/w83773g.ko] undefined!
Fixes: ee249f271524 ("hwmon: Add W83773G driver") Cc: stable@vger.kernel.org Signed-off-by: Axel Lin axel.lin@ingics.com Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hwmon/Kconfig | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/hwmon/Kconfig +++ b/drivers/hwmon/Kconfig @@ -1759,6 +1759,7 @@ config SENSORS_VT8231 config SENSORS_W83773G tristate "Nuvoton W83773G" depends on I2C + select REGMAP_I2C help If you say yes here you get support for the Nuvoton W83773G hardware monitoring chip.
From: Eddie James eajames@linux.ibm.com
commit 8e6af454117a51dbf6c8a47c00180a0c235052fe upstream.
In the case of power sensor version 0xA0, the sensor indexing overlapped with the "caps" power sensors, resulting in probe failure and kernel warnings. Fix this by specifying the next index for each power sensor version.
Fixes: 54076cb3b5ff ("hwmon (occ): Add sensor attributes and register ...") Cc: stable@vger.kernel.org Signed-off-by: Eddie James eajames@linux.ibm.com Tested-by: Joel Stanley joel@jms.id.au Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hwmon/occ/common.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/hwmon/occ/common.c +++ b/drivers/hwmon/occ/common.c @@ -889,6 +889,8 @@ static int occ_setup_sensor_attrs(struct s++; } } + + s = (sensors->power.num_sensors * 4) + 1; } else { for (i = 0; i < sensors->power.num_sensors; ++i) { s = i + 1; @@ -917,11 +919,11 @@ static int occ_setup_sensor_attrs(struct show_power, NULL, 3, i); attr++; } - }
- if (sensors->caps.num_sensors >= 1) { s = sensors->power.num_sensors + 1; + }
+ if (sensors->caps.num_sensors >= 1) { snprintf(attr->name, sizeof(attr->name), "power%d_label", s); attr->sensor = OCC_INIT_ATTR(attr->name, 0444, show_caps, NULL, 0, 0);
From: Steve French stfrench@microsoft.com
commit ca567eb2b3f014d5be0f44c6f68b01a522f15ca4 upstream.
Reconnecting after server or network failure can be improved (to maintain availability and protect data integrity) by allowing the client to choose the default persistent (or resilient) handle timeout in some use cases. Today we default to 0 which lets the server pick the default timeout (usually 120 seconds) but this can be problematic for some workloads. Add the new mount parameter to cifs.ko for SMB3 mounts "handletimeout" which enables the user to override the default handle timeout for persistent (mount option "persistenthandles") or resilient handles (mount option "resilienthandles"). Maximum allowed is 16 minutes (960000 ms). Units for the timeout are expressed in milliseconds. See section 2.2.14.2.12 and 2.2.31.3 of the MS-SMB2 protocol specification for more information.
Signed-off-by: Steve French stfrench@microsoft.com Reviewed-by: Pavel Shilovsky pshilov@microsoft.com Reviewed-by: Ronnie Sahlberg lsahlber@redhat.com CC: Stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/cifsfs.c | 2 ++ fs/cifs/cifsglob.h | 8 ++++++++ fs/cifs/connect.c | 30 +++++++++++++++++++++++++++++- fs/cifs/smb2file.c | 4 +++- fs/cifs/smb2pdu.c | 14 +++++++++++--- 5 files changed, 53 insertions(+), 5 deletions(-)
--- a/fs/cifs/cifsfs.c +++ b/fs/cifs/cifsfs.c @@ -559,6 +559,8 @@ cifs_show_options(struct seq_file *s, st tcon->ses->server->echo_interval / HZ); if (tcon->snapshot_time) seq_printf(s, ",snapshot=%llu", tcon->snapshot_time); + if (tcon->handle_timeout) + seq_printf(s, ",handletimeout=%u", tcon->handle_timeout); /* convert actimeo and display it in seconds */ seq_printf(s, ",actimeo=%lu", cifs_sb->actimeo / HZ);
--- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -60,6 +60,12 @@ #define CIFS_MAX_ACTIMEO (1 << 30)
/* + * Max persistent and resilient handle timeout (milliseconds). + * Windows durable max was 960000 (16 minutes) + */ +#define SMB3_MAX_HANDLE_TIMEOUT 960000 + +/* * MAX_REQ is the maximum number of requests that WE will send * on one socket concurrently. */ @@ -572,6 +578,7 @@ struct smb_vol { struct nls_table *local_nls; unsigned int echo_interval; /* echo interval in secs */ __u64 snapshot_time; /* needed for timewarp tokens */ + __u32 handle_timeout; /* persistent and durable handle timeout in ms */ unsigned int max_credits; /* smb3 max_credits 10 < credits < 60000 */ };
@@ -1028,6 +1035,7 @@ struct cifs_tcon { __u32 vol_serial_number; __le64 vol_create_time; __u64 snapshot_time; /* for timewarp tokens - timestamp of snapshot */ + __u32 handle_timeout; /* persistent and durable handle timeout in ms */ __u32 ss_flags; /* sector size flags */ __u32 perf_sector_size; /* best sector size for perf */ __u32 max_chunks; --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -103,7 +103,7 @@ enum { Opt_cruid, Opt_gid, Opt_file_mode, Opt_dirmode, Opt_port, Opt_blocksize, Opt_rsize, Opt_wsize, Opt_actimeo, - Opt_echo_interval, Opt_max_credits, + Opt_echo_interval, Opt_max_credits, Opt_handletimeout, Opt_snapshot,
/* Mount options which take string value */ @@ -208,6 +208,7 @@ static const match_table_t cifs_mount_op { Opt_rsize, "rsize=%s" }, { Opt_wsize, "wsize=%s" }, { Opt_actimeo, "actimeo=%s" }, + { Opt_handletimeout, "handletimeout=%s" }, { Opt_echo_interval, "echo_interval=%s" }, { Opt_max_credits, "max_credits=%s" }, { Opt_snapshot, "snapshot=%s" }, @@ -1600,6 +1601,9 @@ cifs_parse_mount_options(const char *mou
vol->actimeo = CIFS_DEF_ACTIMEO;
+ /* Most clients set timeout to 0, allows server to use its default */ + vol->handle_timeout = 0; /* See MS-SMB2 spec section 2.2.14.2.12 */ + /* offer SMB2.1 and later (SMB3 etc). Secure and widely accepted */ vol->ops = &smb30_operations; vol->vals = &smbdefault_values; @@ -1998,6 +2002,18 @@ cifs_parse_mount_options(const char *mou goto cifs_parse_mount_err; } break; + case Opt_handletimeout: + if (get_option_ul(args, &option)) { + cifs_dbg(VFS, "%s: Invalid handletimeout value\n", + __func__); + goto cifs_parse_mount_err; + } + vol->handle_timeout = option; + if (vol->handle_timeout > SMB3_MAX_HANDLE_TIMEOUT) { + cifs_dbg(VFS, "Invalid handle cache timeout, longer than 16 minutes\n"); + goto cifs_parse_mount_err; + } + break; case Opt_echo_interval: if (get_option_ul(args, &option)) { cifs_dbg(VFS, "%s: Invalid echo interval value\n", @@ -3164,6 +3180,8 @@ static int match_tcon(struct cifs_tcon * return 0; if (tcon->snapshot_time != volume_info->snapshot_time) return 0; + if (tcon->handle_timeout != volume_info->handle_timeout) + return 0; return 1; }
@@ -3278,6 +3296,16 @@ cifs_get_tcon(struct cifs_ses *ses, stru tcon->snapshot_time = volume_info->snapshot_time; }
+ if (volume_info->handle_timeout) { + if (ses->server->vals->protocol_id == 0) { + cifs_dbg(VFS, + "Use SMB2.1 or later for handle timeout option\n"); + rc = -EOPNOTSUPP; + goto out_fail; + } else + tcon->handle_timeout = volume_info->handle_timeout; + } + tcon->ses = ses; if (volume_info->password) { tcon->password = kstrdup(volume_info->password, GFP_KERNEL); --- a/fs/cifs/smb2file.c +++ b/fs/cifs/smb2file.c @@ -68,7 +68,9 @@ smb2_open_file(const unsigned int xid, s
if (oparms->tcon->use_resilient) { - nr_ioctl_req.Timeout = 0; /* use server default (120 seconds) */ + /* default timeout is 0, servers pick default (120 seconds) */ + nr_ioctl_req.Timeout = + cpu_to_le32(oparms->tcon->handle_timeout); nr_ioctl_req.Reserved = 0; rc = SMB2_ioctl(xid, oparms->tcon, fid->persistent_fid, fid->volatile_fid, FSCTL_LMR_REQUEST_RESILIENCY, --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -1837,8 +1837,9 @@ add_lease_context(struct TCP_Server_Info }
static struct create_durable_v2 * -create_durable_v2_buf(struct cifs_fid *pfid) +create_durable_v2_buf(struct cifs_open_parms *oparms) { + struct cifs_fid *pfid = oparms->fid; struct create_durable_v2 *buf;
buf = kzalloc(sizeof(struct create_durable_v2), GFP_KERNEL); @@ -1852,7 +1853,14 @@ create_durable_v2_buf(struct cifs_fid *p (struct create_durable_v2, Name)); buf->ccontext.NameLength = cpu_to_le16(4);
- buf->dcontext.Timeout = 0; /* Should this be configurable by workload */ + /* + * NB: Handle timeout defaults to 0, which allows server to choose + * (most servers default to 120 seconds) and most clients default to 0. + * This can be overridden at mount ("handletimeout=") if the user wants + * a different persistent (or resilient) handle timeout for all opens + * opens on a particular SMB3 mount. + */ + buf->dcontext.Timeout = cpu_to_le32(oparms->tcon->handle_timeout); buf->dcontext.Flags = cpu_to_le32(SMB2_DHANDLE_FLAG_PERSISTENT); generate_random_uuid(buf->dcontext.CreateGuid); memcpy(pfid->create_guid, buf->dcontext.CreateGuid, 16); @@ -1905,7 +1913,7 @@ add_durable_v2_context(struct kvec *iov, struct smb2_create_req *req = iov[0].iov_base; unsigned int num = *num_iovec;
- iov[num].iov_base = create_durable_v2_buf(oparms->fid); + iov[num].iov_base = create_durable_v2_buf(oparms); if (iov[num].iov_base == NULL) return -ENOMEM; iov[num].iov_len = sizeof(struct create_durable_v2);
From: Peter Hutterer peter.hutterer@who-t.net
commit fd35759ce32b60d3eb52436894bab996dbf8cffa upstream.
hidpp_scroll_counter_handle_scroll() doesn't expect a 0-value scroll event, it gets interpreted as a negative scroll direction event. This can cause scroll direction resets and thus broken scrolling.
Fixes: 4435ff2f09a2fc ("HID: logitech: Enable high-resolution scrolling on Logitech mice") Cc: stable@vger.kernel.org # v5.0 Reported-and-tested-by: Aimo Metsälä aimetsal@outlook.com Signed-off-by: Peter Hutterer peter.hutterer@who-t.net Signed-off-by: Benjamin Tissoires benjamin.tissoires@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hid/hid-logitech-hidpp.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/hid/hid-logitech-hidpp.c +++ b/drivers/hid/hid-logitech-hidpp.c @@ -2608,8 +2608,9 @@ static int m560_raw_event(struct hid_dev input_report_rel(mydata->input, REL_Y, v);
v = hid_snto32(data[6], 8); - hidpp_scroll_counter_handle_scroll( - &hidpp->vertical_wheel_counter, v); + if (v != 0) + hidpp_scroll_counter_handle_scroll( + &hidpp->vertical_wheel_counter, v);
input_sync(mydata->input); }
From: Furquan Shaikh furquan@google.com
commit c8b1917c8987a6fa3695d479b4d60fbbbc3e537b upstream.
Commit 18996f2db918 ("ACPICA: Events: Stop unconditionally clearing ACPI IRQs during suspend/resume") was added to stop clearing event status bits unconditionally in the system-wide suspend and resume paths. This was done because of an issue with a laptop lid appaering to be closed even when it was used to wake up the system from suspend (see https://bugzilla.kernel.org/show_bug.cgi?id=196249), which happened because event status bits were cleared unconditionally on system resume. Though this change fixed the issue in the resume path, it introduced regressions in a few suspend paths.
First regression was reported and fixed in the S5 entry path by commit fa85015c0d95 ("ACPICA: Clear status of all events when entering S5"). Next regression was reported and fixed for all legacy sleep paths by commit f317c7dc12b7 ("ACPICA: Clear status of all events when entering sleep states"). However, there still is a suspend-to-idle regression, since suspend-to-idle does not follow the legacy sleep paths.
In the suspend-to-idle case, wakeup is enabled as part of device suspend. If the status bits of wakeup GPEs are set when they are enabled, it causes a premature system wakeup to occur.
To address that problem, partially revert commit 18996f2db918 to restore GPE status bits clearing before the GPE is enabled in acpi_ev_enable_gpe().
Fixes: 18996f2db918 ("ACPICA: Events: Stop unconditionally clearing ACPI IRQs during suspend/resume") Signed-off-by: Furquan Shaikh furquan@google.com Cc: 4.17+ stable@vger.kernel.org # 4.17+ [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/acpica/evgpe.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/acpi/acpica/evgpe.c +++ b/drivers/acpi/acpica/evgpe.c @@ -81,8 +81,12 @@ acpi_status acpi_ev_enable_gpe(struct ac
ACPI_FUNCTION_TRACE(ev_enable_gpe);
- /* Enable the requested GPE */ + /* Clear the GPE status */ + status = acpi_hw_clear_gpe(gpe_event_info); + if (ACPI_FAILURE(status)) + return_ACPI_STATUS(status);
+ /* Enable the requested GPE */ status = acpi_hw_low_set_gpe(gpe_event_info, ACPI_GPE_ENABLE); return_ACPI_STATUS(status); }
From: Erik Schmauss erik.schmauss@intel.com
commit c5781ffbbd4f742a58263458145fe7f0ac01d9e0 upstream.
ACPICA commit b233720031a480abd438f2e9c643080929d144c3
ASL operation_regions declare a range of addresses that it uses. In a perfect world, the range of addresses should be used exclusively by the AML interpreter. The OS can use this information to decide which drivers to load so that the AML interpreter and device drivers use different regions of memory.
During table load, the address information is added to a global address range list. Each node in this list contains an address range as well as a namespace node of the operation_region. This list is deleted at ACPI shutdown.
Unfortunately, ASL operation_regions can be declared inside of control methods. Although this is not recommended, modern firmware contains such code. New module level code changes unintentionally removed the functionality of adding and removing nodes to the global address range list.
A few months ago, support for adding addresses has been re- implemented. However, the removal of the address range list was missed and resulted in some systems to crash due to the address list containing bogus namespace nodes from operation_regions declared in control methods. In order to fix the crash, this change removes dynamic operation_regions after control method termination.
Link: https://github.com/acpica/acpica/commit/b2337200 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202475 Fixes: 4abb951b73ff ("ACPICA: AML interpreter: add region addresses in global list during initialization") Reported-by: Michael J Gruber mjg@fedoraproject.org Signed-off-by: Erik Schmauss erik.schmauss@intel.com Signed-off-by: Bob Moore robert.moore@intel.com Cc: 4.20+ stable@vger.kernel.org # 4.20+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/acpica/nsobject.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/acpi/acpica/nsobject.c +++ b/drivers/acpi/acpica/nsobject.c @@ -186,6 +186,10 @@ void acpi_ns_detach_object(struct acpi_n } }
+ if (obj_desc->common.type == ACPI_TYPE_REGION) { + acpi_ut_remove_address_range(obj_desc->region.space_id, node); + } + /* Clear the Node entry in all cases */
node->object = NULL;
From: Zubin Mithra zsm@chromium.org
commit 212ac181c158c09038c474ba68068be49caecebb upstream.
When ioctl calls are made with non-null-terminated userspace strings, strlcpy causes an OOB-read from within strlen. Fix by changing to use strscpy instead.
Signed-off-by: Zubin Mithra zsm@chromium.org Reviewed-by: Guenter Roeck groeck@chromium.org Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/core/seq/seq_clientmgr.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/sound/core/seq/seq_clientmgr.c +++ b/sound/core/seq/seq_clientmgr.c @@ -1252,7 +1252,7 @@ static int snd_seq_ioctl_set_client_info
/* fill the info fields */ if (client_info->name[0]) - strlcpy(client->name, client_info->name, sizeof(client->name)); + strscpy(client->name, client_info->name, sizeof(client->name));
client->filter = client_info->filter; client->event_lost = client_info->event_lost; @@ -1530,7 +1530,7 @@ static int snd_seq_ioctl_create_queue(st /* set queue name */ if (!info->name[0]) snprintf(info->name, sizeof(info->name), "Queue-%d", q->queue); - strlcpy(q->name, info->name, sizeof(q->name)); + strscpy(q->name, info->name, sizeof(q->name)); snd_use_lock_free(&q->use_lock);
return 0; @@ -1592,7 +1592,7 @@ static int snd_seq_ioctl_set_queue_info( queuefree(q); return -EPERM; } - strlcpy(q->name, info->name, sizeof(q->name)); + strscpy(q->name, info->name, sizeof(q->name)); queuefree(q);
return 0;
From: Jian-Hong Pan jian-hong@endlessm.com
commit ea5c7eba216e832906e594799b8670f1954a588c upstream.
The Acer TravelMate B114-21 laptop cannot detect and record sound from headset MIC. This patch adds the ALC233_FIXUP_ACER_HEADSET_MIC HDA verb quirk chained with ALC233_FIXUP_ASUS_MIC_NO_PRESENCE pin quirk to fix this issue.
[ fixed the missing brace and reordered the entry -- tiwai ]
Signed-off-by: Jian-Hong Pan jian-hong@endlessm.com Signed-off-by: Daniel Drake drake@endlessm.com Reviewed-by: Kailang Yang kailang@realtek.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/patch_realtek.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -5661,6 +5661,7 @@ enum { ALC233_FIXUP_ASUS_MIC_NO_PRESENCE, ALC233_FIXUP_EAPD_COEF_AND_MIC_NO_PRESENCE, ALC233_FIXUP_LENOVO_MULTI_CODECS, + ALC233_FIXUP_ACER_HEADSET_MIC, ALC294_FIXUP_LENOVO_MIC_LOCATION, ALC225_FIXUP_DELL_WYSE_MIC_NO_PRESENCE, ALC700_FIXUP_INTEL_REFERENCE, @@ -6488,6 +6489,16 @@ static const struct hda_fixup alc269_fix .type = HDA_FIXUP_FUNC, .v.func = alc233_alc662_fixup_lenovo_dual_codecs, }, + [ALC233_FIXUP_ACER_HEADSET_MIC] = { + .type = HDA_FIXUP_VERBS, + .v.verbs = (const struct hda_verb[]) { + { 0x20, AC_VERB_SET_COEF_INDEX, 0x45 }, + { 0x20, AC_VERB_SET_PROC_COEF, 0x5089 }, + { } + }, + .chained = true, + .chain_id = ALC233_FIXUP_ASUS_MIC_NO_PRESENCE + }, [ALC294_FIXUP_LENOVO_MIC_LOCATION] = { .type = HDA_FIXUP_PINS, .v.pins = (const struct hda_pintbl[]) { @@ -6735,6 +6746,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x1025, 0x1290, "Acer Veriton Z4860G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), SND_PCI_QUIRK(0x1025, 0x1291, "Acer Veriton Z4660G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), SND_PCI_QUIRK(0x1025, 0x1308, "Acer Aspire Z24-890", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), + SND_PCI_QUIRK(0x1025, 0x132a, "Acer TravelMate B114-21", ALC233_FIXUP_ACER_HEADSET_MIC), SND_PCI_QUIRK(0x1025, 0x1330, "Acer TravelMate X514-51T", ALC255_FIXUP_ACER_HEADSET_MIC), SND_PCI_QUIRK(0x1028, 0x0470, "Dell M101z", ALC269_FIXUP_DELL_M101Z), SND_PCI_QUIRK(0x1028, 0x054b, "Dell XPS one 2710", ALC275_FIXUP_DELL_XPS),
From: Richard Sailer rs@tuxedocomputers.com
commit 80690a276f444a68a332136d98bfea1c338bc263 upstream.
This adds a SND_PCI_QUIRK(...) line for the Tuxedo XC 1509.
The Tuxedo XC 1509 and the System76 oryp5 are the same barebone notebooks manufactured by Clevo. To name the fixups both use after the actual underlying hardware, this patch also changes System76_orpy5 to clevo_pb51ed in 2 enum symbols and one function name, matching the other pci_quirk entries which are also named after the device ODM.
Fixes: 7f665b1c3283 ("ALSA: hda/realtek - Headset microphone and internal speaker support for System76 oryp5") Signed-off-by: Richard Sailer rs@tuxedocomputers.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/patch_realtek.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -1864,8 +1864,8 @@ enum { ALC887_FIXUP_BASS_CHMAP, ALC1220_FIXUP_GB_DUAL_CODECS, ALC1220_FIXUP_CLEVO_P950, - ALC1220_FIXUP_SYSTEM76_ORYP5, - ALC1220_FIXUP_SYSTEM76_ORYP5_PINS, + ALC1220_FIXUP_CLEVO_PB51ED, + ALC1220_FIXUP_CLEVO_PB51ED_PINS, };
static void alc889_fixup_coef(struct hda_codec *codec, @@ -2070,7 +2070,7 @@ static void alc1220_fixup_clevo_p950(str static void alc_fixup_headset_mode_no_hp_mic(struct hda_codec *codec, const struct hda_fixup *fix, int action);
-static void alc1220_fixup_system76_oryp5(struct hda_codec *codec, +static void alc1220_fixup_clevo_pb51ed(struct hda_codec *codec, const struct hda_fixup *fix, int action) { @@ -2322,18 +2322,18 @@ static const struct hda_fixup alc882_fix .type = HDA_FIXUP_FUNC, .v.func = alc1220_fixup_clevo_p950, }, - [ALC1220_FIXUP_SYSTEM76_ORYP5] = { + [ALC1220_FIXUP_CLEVO_PB51ED] = { .type = HDA_FIXUP_FUNC, - .v.func = alc1220_fixup_system76_oryp5, + .v.func = alc1220_fixup_clevo_pb51ed, }, - [ALC1220_FIXUP_SYSTEM76_ORYP5_PINS] = { + [ALC1220_FIXUP_CLEVO_PB51ED_PINS] = { .type = HDA_FIXUP_PINS, .v.pins = (const struct hda_pintbl[]) { { 0x19, 0x01a1913c }, /* use as headset mic, without its own jack detect */ {} }, .chained = true, - .chain_id = ALC1220_FIXUP_SYSTEM76_ORYP5, + .chain_id = ALC1220_FIXUP_CLEVO_PB51ED, }, };
@@ -2411,8 +2411,9 @@ static const struct snd_pci_quirk alc882 SND_PCI_QUIRK(0x1558, 0x9501, "Clevo P950HR", ALC1220_FIXUP_CLEVO_P950), SND_PCI_QUIRK(0x1558, 0x95e1, "Clevo P95xER", ALC1220_FIXUP_CLEVO_P950), SND_PCI_QUIRK(0x1558, 0x95e2, "Clevo P950ER", ALC1220_FIXUP_CLEVO_P950), - SND_PCI_QUIRK(0x1558, 0x96e1, "System76 Oryx Pro (oryp5)", ALC1220_FIXUP_SYSTEM76_ORYP5_PINS), - SND_PCI_QUIRK(0x1558, 0x97e1, "System76 Oryx Pro (oryp5)", ALC1220_FIXUP_SYSTEM76_ORYP5_PINS), + SND_PCI_QUIRK(0x1558, 0x96e1, "System76 Oryx Pro (oryp5)", ALC1220_FIXUP_CLEVO_PB51ED_PINS), + SND_PCI_QUIRK(0x1558, 0x97e1, "System76 Oryx Pro (oryp5)", ALC1220_FIXUP_CLEVO_PB51ED_PINS), + SND_PCI_QUIRK(0x1558, 0x65d1, "Tuxedo Book XC1509", ALC1220_FIXUP_CLEVO_PB51ED_PINS), SND_PCI_QUIRK_VENDOR(0x1558, "Clevo laptop", ALC882_FIXUP_EAPD), SND_PCI_QUIRK(0x161f, 0x2054, "Medion laptop", ALC883_FIXUP_EAPD), SND_PCI_QUIRK(0x17aa, 0x3a0d, "Lenovo Y530", ALC882_FIXUP_LENOVO_Y530),
From: Oleksandr Andrushchenko oleksandr_andrushchenko@epam.com
commit 8b030a57e35a0efc1a8aa18bb10555bc5066ac40 upstream.
This fixes the regression introduced while moving to Xen shared buffer implementation.
Fixes: 58f9d806d16a ("ALSA: xen-front: Use Xen common shared buffer implementation") Reviewed-by: Juergen Gross jgross@suse.com Signed-off-by: Oleksandr Andrushchenko oleksandr_andrushchenko@epam.com Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/xen/xen_snd_front_alsa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/xen/xen_snd_front_alsa.c +++ b/sound/xen/xen_snd_front_alsa.c @@ -441,7 +441,7 @@ static int shbuf_setup_backstore(struct { int i;
- stream->buffer = alloc_pages_exact(stream->buffer_sz, GFP_KERNEL); + stream->buffer = alloc_pages_exact(buffer_sz, GFP_KERNEL); if (!stream->buffer) return -ENOMEM;
From: Hui Wang hui.wang@canonical.com
commit cae30527901d9590db0e12ace994c1d58bea87fd upstream.
Recently we set CONFIG_SND_HDA_POWER_SAVE_DEFAULT to 1 when configuring the kernel, then two machines were reported to have noise after installing the new kernel. Put them in the blacklist, the noise disappears.
https://bugs.launchpad.net/bugs/1821663 Cc: stable@vger.kernel.org Signed-off-by: Hui Wang hui.wang@canonical.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/hda_intel.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -2142,6 +2142,8 @@ static struct snd_pci_quirk power_save_b SND_PCI_QUIRK(0x8086, 0x2040, "Intel DZ77BH-55K", 0), /* https://bugzilla.kernel.org/show_bug.cgi?id=199607 */ SND_PCI_QUIRK(0x8086, 0x2057, "Intel NUC5i7RYB", 0), + /* https://bugs.launchpad.net/bugs/1821663 */ + SND_PCI_QUIRK(0x8086, 0x2064, "Intel SDP 8086:2064", 0), /* https://bugzilla.redhat.com/show_bug.cgi?id=1520902 */ SND_PCI_QUIRK(0x8086, 0x2068, "Intel NUC7i3BNB", 0), /* https://bugzilla.kernel.org/show_bug.cgi?id=198611 */ @@ -2150,6 +2152,8 @@ static struct snd_pci_quirk power_save_b SND_PCI_QUIRK(0x17aa, 0x367b, "Lenovo IdeaCentre B550", 0), /* https://bugzilla.redhat.com/show_bug.cgi?id=1572975 */ SND_PCI_QUIRK(0x17aa, 0x36a7, "Lenovo C50 All in one", 0), + /* https://bugs.launchpad.net/bugs/1821663 */ + SND_PCI_QUIRK(0x1631, 0xe017, "Packard Bell NEC IMEDIA 5204", 0), {} }; #endif /* CONFIG_PM */
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
commit c6f3c5ee40c10bb65725047a220570f718507001 upstream.
With some architectures like ppc64, set_pmd_at() cannot cope with a situation where there is already some (different) valid entry present.
Use pmdp_set_access_flags() instead to modify the pfn which is built to deal with modifying existing PMD entries.
This is similar to commit cae85cb8add3 ("mm/memory.c: fix modifying of page protection by insert_pfn()")
We also do similar update w.r.t insert_pfn_pud eventhough ppc64 don't support pud pfn entries now.
Without this patch we also see the below message in kernel log "BUG: non-zero pgtables_bytes on freeing mm:"
Link: http://lkml.kernel.org/r/20190402115125.18803-1-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Reported-by: Chandan Rajendra chandan@linux.ibm.com Reviewed-by: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/huge_memory.c | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+)
--- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -753,6 +753,21 @@ static void insert_pfn_pmd(struct vm_are spinlock_t *ptl;
ptl = pmd_lock(mm, pmd); + if (!pmd_none(*pmd)) { + if (write) { + if (pmd_pfn(*pmd) != pfn_t_to_pfn(pfn)) { + WARN_ON_ONCE(!is_huge_zero_pmd(*pmd)); + goto out_unlock; + } + entry = pmd_mkyoung(*pmd); + entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); + if (pmdp_set_access_flags(vma, addr, pmd, entry, 1)) + update_mmu_cache_pmd(vma, addr, pmd); + } + + goto out_unlock; + } + entry = pmd_mkhuge(pfn_t_pmd(pfn, prot)); if (pfn_t_devmap(pfn)) entry = pmd_mkdevmap(entry); @@ -764,11 +779,16 @@ static void insert_pfn_pmd(struct vm_are if (pgtable) { pgtable_trans_huge_deposit(mm, pmd, pgtable); mm_inc_nr_ptes(mm); + pgtable = NULL; }
set_pmd_at(mm, addr, pmd, entry); update_mmu_cache_pmd(vma, addr, pmd); + +out_unlock: spin_unlock(ptl); + if (pgtable) + pte_free(mm, pgtable); }
vm_fault_t vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr, @@ -819,6 +839,20 @@ static void insert_pfn_pud(struct vm_are spinlock_t *ptl;
ptl = pud_lock(mm, pud); + if (!pud_none(*pud)) { + if (write) { + if (pud_pfn(*pud) != pfn_t_to_pfn(pfn)) { + WARN_ON_ONCE(!is_huge_zero_pud(*pud)); + goto out_unlock; + } + entry = pud_mkyoung(*pud); + entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma); + if (pudp_set_access_flags(vma, addr, pud, entry, 1)) + update_mmu_cache_pud(vma, addr, pud); + } + goto out_unlock; + } + entry = pud_mkhuge(pfn_t_pud(pfn, prot)); if (pfn_t_devmap(pfn)) entry = pud_mkdevmap(entry); @@ -828,6 +862,8 @@ static void insert_pfn_pud(struct vm_are } set_pud_at(mm, addr, pud, entry); update_mmu_cache_pud(vma, addr, pud); + +out_unlock: spin_unlock(ptl); }
From: Peter Geis pgwipeout@gmail.com
commit 09f91381fa5de1d44bc323d8bf345f5d57b3d9b5 upstream.
Various rk3328 based boards experience occasional sdmmc0 write errors. This is due to the rk3328.dtsi tx drive levels being set to 4ma, vs 8ma per the rk3328 datasheet default settings.
Fix this by setting the tx signal pins to 8ma. Inspiration from tonymac32's patch, https://github.com/ayufan-rock64/linux-kernel/commit/dc1212b347e0da17c5460bc...
Fixes issues on the rk3328-roc-cc and the rk3328-rock64 (as per the above commit message).
Tested on the rk3328-roc-cc board.
Fixes: 52e02d377a72 ("arm64: dts: rockchip: add core dtsi file for RK3328 SoCs") Cc: stable@vger.kernel.org Signed-off-by: Peter Geis pgwipeout@gmail.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/boot/dts/rockchip/rk3328.dtsi | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
--- a/arch/arm64/boot/dts/rockchip/rk3328.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3328.dtsi @@ -1431,11 +1431,11 @@
sdmmc0 { sdmmc0_clk: sdmmc0-clk { - rockchip,pins = <1 RK_PA6 1 &pcfg_pull_none_4ma>; + rockchip,pins = <1 RK_PA6 1 &pcfg_pull_none_8ma>; };
sdmmc0_cmd: sdmmc0-cmd { - rockchip,pins = <1 RK_PA4 1 &pcfg_pull_up_4ma>; + rockchip,pins = <1 RK_PA4 1 &pcfg_pull_up_8ma>; };
sdmmc0_dectn: sdmmc0-dectn { @@ -1447,14 +1447,14 @@ };
sdmmc0_bus1: sdmmc0-bus1 { - rockchip,pins = <1 RK_PA0 1 &pcfg_pull_up_4ma>; + rockchip,pins = <1 RK_PA0 1 &pcfg_pull_up_8ma>; };
sdmmc0_bus4: sdmmc0-bus4 { - rockchip,pins = <1 RK_PA0 1 &pcfg_pull_up_4ma>, - <1 RK_PA1 1 &pcfg_pull_up_4ma>, - <1 RK_PA2 1 &pcfg_pull_up_4ma>, - <1 RK_PA3 1 &pcfg_pull_up_4ma>; + rockchip,pins = <1 RK_PA0 1 &pcfg_pull_up_8ma>, + <1 RK_PA1 1 &pcfg_pull_up_8ma>, + <1 RK_PA2 1 &pcfg_pull_up_8ma>, + <1 RK_PA3 1 &pcfg_pull_up_8ma>; };
sdmmc0_gpio: sdmmc0-gpio {
From: Daniel Drake drake@endlessm.com
commit 157c99c5a2956a9ab1ae12de0136a2d8a1b1a307 upstream.
The alcor driver is setting up data transfer and submitting the associated MMC command at the same time. While this works most of the time, it occasionally causes problems upon write.
In the working case, after setting up the data transfer and submitting the MMC command, an interrupt comes in a moment later with CMD_END and WRITE_BUF_RDY bits set. The data transfer then happens without problem.
However, on occasion, the interrupt that arrives at that point only has WRITE_BUF_RDY set. The hardware notifies that it's ready to write data, but the associated MMC command is still running. Regardless, the driver was proceeding to write data immediately, and that would then cause another interrupt indicating data CRC error, and the write would fail.
Additionally, the transfer setup function alcor_trigger_data_transfer() was being called 3 times for each write operation, which was confusing and may be contributing to this issue.
Solve this by tweaking the driver behaviour to follow the sequence observed in the original ampe_stor vendor driver: 1. When starting request handling, write 0 to DATA_XFER_CTRL 2. Submit the command 3. Wait for CMD_END interrupt and then trigger data transfer 4. For the PIO case, trigger the next step of the data transfer only upon the following DATA_END interrupt, which occurs after the block has been written.
I confirmed that the read path still works (DMA & PIO) and also now presents more consistency with the operations performed by ampe_stor.
Signed-off-by: Daniel Drake drake@endlessm.com Fixes: c5413ad815a6 ("mmc: add new Alcor Micro Cardreader SD/MMC driver") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/alcor.c | 34 +++++++++++++--------------------- 1 file changed, 13 insertions(+), 21 deletions(-)
--- a/drivers/mmc/host/alcor.c +++ b/drivers/mmc/host/alcor.c @@ -48,7 +48,6 @@ struct alcor_sdmmc_host { struct mmc_command *cmd; struct mmc_data *data; unsigned int dma_on:1; - unsigned int early_data:1;
struct mutex cmd_mutex;
@@ -144,8 +143,7 @@ static void alcor_data_set_dma(struct al host->sg_count--; }
-static void alcor_trigger_data_transfer(struct alcor_sdmmc_host *host, - bool early) +static void alcor_trigger_data_transfer(struct alcor_sdmmc_host *host) { struct alcor_pci_priv *priv = host->alcor_pci; struct mmc_data *data = host->data; @@ -155,13 +153,6 @@ static void alcor_trigger_data_transfer( ctrl |= AU6601_DATA_WRITE;
if (data->host_cookie == COOKIE_MAPPED) { - if (host->early_data) { - host->early_data = false; - return; - } - - host->early_data = early; - alcor_data_set_dma(host); ctrl |= AU6601_DATA_DMA_MODE; host->dma_on = 1; @@ -231,6 +222,7 @@ static void alcor_prepare_sg_miter(struc static void alcor_prepare_data(struct alcor_sdmmc_host *host, struct mmc_command *cmd) { + struct alcor_pci_priv *priv = host->alcor_pci; struct mmc_data *data = cmd->data;
if (!data) @@ -248,7 +240,7 @@ static void alcor_prepare_data(struct al if (data->host_cookie != COOKIE_MAPPED) alcor_prepare_sg_miter(host);
- alcor_trigger_data_transfer(host, true); + alcor_write8(priv, 0, AU6601_DATA_XFER_CTRL); }
static void alcor_send_cmd(struct alcor_sdmmc_host *host, @@ -435,7 +427,7 @@ static int alcor_cmd_irq_done(struct alc if (!host->data) return false;
- alcor_trigger_data_transfer(host, false); + alcor_trigger_data_transfer(host); host->cmd = NULL; return true; } @@ -456,7 +448,7 @@ static void alcor_cmd_irq_thread(struct if (!host->data) alcor_request_complete(host, 1); else - alcor_trigger_data_transfer(host, false); + alcor_trigger_data_transfer(host); host->cmd = NULL; }
@@ -487,15 +479,9 @@ static int alcor_data_irq_done(struct al break; case AU6601_INT_READ_BUF_RDY: alcor_trf_block_pio(host, true); - if (!host->blocks) - break; - alcor_trigger_data_transfer(host, false); return 1; case AU6601_INT_WRITE_BUF_RDY: alcor_trf_block_pio(host, false); - if (!host->blocks) - break; - alcor_trigger_data_transfer(host, false); return 1; case AU6601_INT_DMA_END: if (!host->sg_count) @@ -508,8 +494,14 @@ static int alcor_data_irq_done(struct al break; }
- if (intmask & AU6601_INT_DATA_END) - return 0; + if (intmask & AU6601_INT_DATA_END) { + if (!host->dma_on && host->blocks) { + alcor_trigger_data_transfer(host); + return 1; + } else { + return 0; + } + }
return 1; }
From: Faiz Abbas faiz_abbas@ti.com
commit 5c41ea6d52003b5bc77c2a82fd5ca7d480237d89 upstream.
commit 5b0d62108b46 ("mmc: sdhci-omap: Add platform specific reset callback") skips data resets during tuning operation. Because of this, a data error or data finish interrupt might still arrive after a command error has been handled and the mrq ended. This ends up with a "mmc0: Got data interrupt 0x00000002 even though no data operation was in progress" error message.
Fix this by adding a platform specific callback for sdhci_irq. Mark the mrq as a failure but wait for a data interrupt instead of calling finish_mrq().
Fixes: 5b0d62108b46 ("mmc: sdhci-omap: Add platform specific reset callback") Signed-off-by: Faiz Abbas faiz_abbas@ti.com Acked-by: Adrian Hunter adrian.hunter@intel.com Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mmc/host/sdhci-omap.c | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+)
--- a/drivers/mmc/host/sdhci-omap.c +++ b/drivers/mmc/host/sdhci-omap.c @@ -797,6 +797,43 @@ void sdhci_omap_reset(struct sdhci_host sdhci_reset(host, mask); }
+#define CMD_ERR_MASK (SDHCI_INT_CRC | SDHCI_INT_END_BIT | SDHCI_INT_INDEX |\ + SDHCI_INT_TIMEOUT) +#define CMD_MASK (CMD_ERR_MASK | SDHCI_INT_RESPONSE) + +static u32 sdhci_omap_irq(struct sdhci_host *host, u32 intmask) +{ + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); + struct sdhci_omap_host *omap_host = sdhci_pltfm_priv(pltfm_host); + + if (omap_host->is_tuning && host->cmd && !host->data_early && + (intmask & CMD_ERR_MASK)) { + + /* + * Since we are not resetting data lines during tuning + * operation, data error or data complete interrupts + * might still arrive. Mark this request as a failure + * but still wait for the data interrupt + */ + if (intmask & SDHCI_INT_TIMEOUT) + host->cmd->error = -ETIMEDOUT; + else + host->cmd->error = -EILSEQ; + + host->cmd = NULL; + + /* + * Sometimes command error interrupts and command complete + * interrupt will arrive together. Clear all command related + * interrupts here. + */ + sdhci_writel(host, intmask & CMD_MASK, SDHCI_INT_STATUS); + intmask &= ~CMD_MASK; + } + + return intmask; +} + static struct sdhci_ops sdhci_omap_ops = { .set_clock = sdhci_omap_set_clock, .set_power = sdhci_omap_set_power, @@ -807,6 +844,7 @@ static struct sdhci_ops sdhci_omap_ops = .platform_send_init_74_clocks = sdhci_omap_init_74_clocks, .reset = sdhci_omap_reset, .set_uhs_signaling = sdhci_omap_set_uhs_signaling, + .irq = sdhci_omap_irq, };
static int sdhci_omap_set_capabilities(struct sdhci_omap_host *omap_host)
From: Helge Deller deller@gmx.de
commit d006e95b5561f708d0385e9677ffe2c46f2ae345 upstream.
While adding LASI support to QEMU, I noticed that the QEMU detection in the kernel happens much too late. For example, when a LASI chip is found by the kernel, it registers the LASI LED driver as well. But when we run on QEMU it makes sense to avoid spending unnecessary CPU cycles, so we need to access the running_on_QEMU flag earlier than before.
This patch now makes the QEMU detection the fist task of the Linux kernel by moving it to where the kernel enters the C-coding.
Fixes: 310d82784fb4 ("parisc: qemu idle sleep support") Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/parisc/kernel/process.c | 6 ------ arch/parisc/kernel/setup.c | 3 +++ 2 files changed, 3 insertions(+), 6 deletions(-)
--- a/arch/parisc/kernel/process.c +++ b/arch/parisc/kernel/process.c @@ -210,12 +210,6 @@ void __cpuidle arch_cpu_idle(void)
static int __init parisc_idle_init(void) { - const char *marker; - - /* check QEMU/SeaBIOS marker in PAGE0 */ - marker = (char *) &PAGE0->pad0; - running_on_qemu = (memcmp(marker, "SeaBIOS", 8) == 0); - if (!running_on_qemu) cpu_idle_poll_ctrl(1);
--- a/arch/parisc/kernel/setup.c +++ b/arch/parisc/kernel/setup.c @@ -396,6 +396,9 @@ void __init start_parisc(void) int ret, cpunum; struct pdc_coproc_cfg coproc_cfg;
+ /* check QEMU/SeaBIOS marker in PAGE0 */ + running_on_qemu = (memcmp(&PAGE0->pad0, "SeaBIOS", 8) == 0); + cpunum = smp_processor_id();
init_cpu_topology();
From: Sven Schnelle svens@stackframe.org
commit 45efd871bf0a47648f119d1b41467f70484de5bc upstream.
While working on kretprobes for PA-RISC I was wondering while the kprobes sanity test always fails on kretprobes. This is caused by returning gpr20 instead of gpr28.
Signed-off-by: Sven Schnelle svens@stackframe.org Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # 4.14+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/parisc/include/asm/ptrace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/parisc/include/asm/ptrace.h +++ b/arch/parisc/include/asm/ptrace.h @@ -22,7 +22,7 @@ unsigned long profile_pc(struct pt_regs
static inline unsigned long regs_return_value(struct pt_regs *regs) { - return regs->gr[20]; + return regs->gr[28]; }
static inline void instruction_pointer_set(struct pt_regs *regs,
From: Sven Schnelle svens@stackframe.org
commit f324fa58327791b2696628b31480e7e21c745706 upstream.
When setting the instruction pointer on PA-RISC we also need to set the back of the instruction queue to the new offset, otherwise we will execute on instruction from the new location, and jumping back to the old location stored in iaoq_b.
Signed-off-by: Sven Schnelle svens@stackframe.org Signed-off-by: Helge Deller deller@gmx.de Fixes: 75ebedf1d263 ("parisc: Add HAVE_REGS_AND_STACK_ACCESS_API feature") Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/parisc/include/asm/ptrace.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/parisc/include/asm/ptrace.h +++ b/arch/parisc/include/asm/ptrace.h @@ -28,7 +28,8 @@ static inline unsigned long regs_return_ static inline void instruction_pointer_set(struct pt_regs *regs, unsigned long val) { - regs->iaoq[0] = val; + regs->iaoq[0] = val; + regs->iaoq[1] = val + 4; }
/* Query offset/name of register from its name/offset */
From: Andrei Vagin avagin@gmail.com
commit 07d7e12091f4ab869cc6a4bb276399057e73b0b3 upstream.
To calculate a remaining time, it's required to subtract the current time from the expiration time. In alarm_timer_remaining() the arguments of ktime_sub are swapped.
Fixes: d653d8457c76 ("alarmtimer: Implement remaining callback") Signed-off-by: Andrei Vagin avagin@gmail.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Mukesh Ojha mojha@codeaurora.org Cc: Stephen Boyd sboyd@kernel.org Cc: John Stultz john.stultz@linaro.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190408041542.26338-1-avagin@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/time/alarmtimer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/time/alarmtimer.c +++ b/kernel/time/alarmtimer.c @@ -594,7 +594,7 @@ static ktime_t alarm_timer_remaining(str { struct alarm *alarm = &timr->it.alarm.alarmtimer;
- return ktime_sub(now, alarm->node.expires); + return ktime_sub(alarm->node.expires, now); }
/**
From: Yan Zhao yan.y.zhao@intel.com
commit dade58ed5af6365ac50ff4259c2a0bf31219e285 upstream.
in workload creation routine, if any failure occurs, do not queue this workload for delivery. if this failure is fatal, enter into failsafe mode.
Fixes: 6d76303553ba ("drm/i915/gvt: Move common vGPU workload creation into scheduler.c") Cc: stable@vger.kernel.org #4.19+ Cc: zhenyuw@linux.intel.com Signed-off-by: Yan Zhao yan.y.zhao@intel.com Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/gvt/scheduler.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/i915/gvt/scheduler.c +++ b/drivers/gpu/drm/i915/gvt/scheduler.c @@ -1475,8 +1475,9 @@ intel_vgpu_create_workload(struct intel_ intel_runtime_pm_put(dev_priv); }
- if (ret && (vgpu_is_vm_unhealthy(ret))) { - enter_failsafe_mode(vgpu, GVT_FAILSAFE_GUEST_ERR); + if (ret) { + if (vgpu_is_vm_unhealthy(ret)) + enter_failsafe_mode(vgpu, GVT_FAILSAFE_GUEST_ERR); intel_vgpu_destroy_workload(workload); return ERR_PTR(ret); }
From: Jernej Skrabec jernej.skrabec@siol.net
commit cd9063757a227cf31ebf5391ccda2bf583b0806e upstream.
Currently resolutions with pixel clock higher than 340 MHz don't work with H6 HDMI controller. They just produce a blank screen.
Limit maximum pixel clock rate to 340 MHz until scrambling is supported.
Cc: stable@vger.kernel.org # 5.0 Fixes: 40bb9d3147b2 ("drm/sun4i: Add support for H6 DW HDMI controller") Signed-off-by: Jernej Skrabec jernej.skrabec@siol.net Signed-off-by: Maxime Ripard maxime.ripard@bootlin.com Link: https://patchwork.freedesktop.org/patch/msgid/20190324190609.32721-1-jernej.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/sun4i/sun8i_dw_hdmi.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/sun4i/sun8i_dw_hdmi.c +++ b/drivers/gpu/drm/sun4i/sun8i_dw_hdmi.c @@ -48,8 +48,13 @@ static enum drm_mode_status sun8i_dw_hdmi_mode_valid_h6(struct drm_connector *connector, const struct drm_display_mode *mode) { - /* This is max for HDMI 2.0b (4K@60Hz) */ - if (mode->clock > 594000) + /* + * Controller support maximum of 594 MHz, which correlates to + * 4K@60Hz 4:4:4 or RGB. However, for frequencies greater than + * 340 MHz scrambling has to be enabled. Because scrambling is + * not yet implemented, just limit to 340 MHz for now. + */ + if (mode->clock > 340000) return MODE_CLOCK_HIGH;
return MODE_OK;
From: Dave Airlie airlied@redhat.com
commit 9b39b013037fbfa8d4b999345d9e904d8a336fc2 upstream.
If we unplug a udl device, the usb callback with deinit the mode_config struct, however userspace will still have an open file descriptor and a framebuffer on that device. When userspace closes the fd, we'll oops because it'll try and look stuff up in the object idr which we've destroyed.
This punts destroying the mode objects until release time instead.
Cc: stable@vger.kernel.org Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Dave Airlie airlied@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20190405031715.5959-2-airlied@... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/udl/udl_drv.c | 1 + drivers/gpu/drm/udl/udl_drv.h | 1 + drivers/gpu/drm/udl/udl_main.c | 8 +++++++- 3 files changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/udl/udl_drv.c +++ b/drivers/gpu/drm/udl/udl_drv.c @@ -51,6 +51,7 @@ static struct drm_driver driver = { .driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_PRIME, .load = udl_driver_load, .unload = udl_driver_unload, + .release = udl_driver_release,
/* gem hooks */ .gem_free_object_unlocked = udl_gem_free_object, --- a/drivers/gpu/drm/udl/udl_drv.h +++ b/drivers/gpu/drm/udl/udl_drv.h @@ -104,6 +104,7 @@ void udl_urb_completion(struct urb *urb)
int udl_driver_load(struct drm_device *dev, unsigned long flags); void udl_driver_unload(struct drm_device *dev); +void udl_driver_release(struct drm_device *dev);
int udl_fbdev_init(struct drm_device *dev); void udl_fbdev_cleanup(struct drm_device *dev); --- a/drivers/gpu/drm/udl/udl_main.c +++ b/drivers/gpu/drm/udl/udl_main.c @@ -378,6 +378,12 @@ void udl_driver_unload(struct drm_device udl_free_urb_list(dev);
udl_fbdev_cleanup(dev); - udl_modeset_cleanup(dev); kfree(udl); } + +void udl_driver_release(struct drm_device *dev) +{ + udl_modeset_cleanup(dev); + drm_dev_fini(dev); + kfree(dev); +}
From: David Rientjes rientjes@google.com
commit ede885ecb2cdf8a8dd5367702e3d964ec846a2d5 upstream.
get_num_contig_pages() could potentially overflow int so make its type consistent with its usage.
Reported-by: Cfir Cohen cfir@google.com Cc: stable@vger.kernel.org Signed-off-by: David Rientjes rientjes@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/svm.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -6422,11 +6422,11 @@ e_free: return ret; }
-static int get_num_contig_pages(int idx, struct page **inpages, - unsigned long npages) +static unsigned long get_num_contig_pages(unsigned long idx, + struct page **inpages, unsigned long npages) { unsigned long paddr, next_paddr; - int i = idx + 1, pages = 1; + unsigned long i = idx + 1, pages = 1;
/* find the number of contiguous pages starting from idx */ paddr = __sme_page_pa(inpages[idx]); @@ -6445,12 +6445,12 @@ static int get_num_contig_pages(int idx,
static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp) { - unsigned long vaddr, vaddr_end, next_vaddr, npages, size; + unsigned long vaddr, vaddr_end, next_vaddr, npages, pages, size, i; struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; struct kvm_sev_launch_update_data params; struct sev_data_launch_update_data *data; struct page **inpages; - int i, ret, pages; + int ret;
if (!sev_guest(kvm)) return -ENOTTY;
From: Arnd Bergmann arnd@arndb.de
commit 6147e136ff5071609b54f18982dea87706288e21 upstream.
clang points out with hundreds of warnings that the bitrev macros have a problem with constant input:
drivers/hwmon/sht15.c:187:11: error: variable '__x' is uninitialized when used within its own initialization [-Werror,-Wuninitialized] u8 crc = bitrev8(data->val_status & 0x0F); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/bitrev.h:102:21: note: expanded from macro 'bitrev8' __constant_bitrev8(__x) : \ ~~~~~~~~~~~~~~~~~~~^~~~ include/linux/bitrev.h:67:11: note: expanded from macro '__constant_bitrev8' u8 __x = x; \ ~~~ ^
Both the bitrev and the __constant_bitrev macros use an internal variable named __x, which goes horribly wrong when passing one to the other.
The obvious fix is to rename one of the variables, so this adds an extra '_'.
It seems we got away with this because
- there are only a few drivers using bitrev macros
- usually there are no constant arguments to those
- when they are constant, they tend to be either 0 or (unsigned)-1 (drivers/isdn/i4l/isdnhdlc.o, drivers/iio/amplifiers/ad8366.c) and give the correct result by pure chance.
In fact, the only driver that I could find that gets different results with this is drivers/net/wan/slic_ds26522.c, which in turn is a driver for fairly rare hardware (adding the maintainer to Cc for testing).
Link: http://lkml.kernel.org/r/20190322140503.123580-1-arnd@arndb.de Fixes: 556d2f055bf6 ("ARM: 8187/1: add CONFIG_HAVE_ARCH_BITREVERSE to support rbit instruction") Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Nick Desaulniers ndesaulniers@google.com Cc: Zhao Qiang qiang.zhao@nxp.com Cc: Yalin Wang yalin.wang@sonymobile.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/bitrev.h | 46 +++++++++++++++++++++++----------------------- 1 file changed, 23 insertions(+), 23 deletions(-)
--- a/include/linux/bitrev.h +++ b/include/linux/bitrev.h @@ -34,41 +34,41 @@ static inline u32 __bitrev32(u32 x)
#define __constant_bitrev32(x) \ ({ \ - u32 __x = x; \ - __x = (__x >> 16) | (__x << 16); \ - __x = ((__x & (u32)0xFF00FF00UL) >> 8) | ((__x & (u32)0x00FF00FFUL) << 8); \ - __x = ((__x & (u32)0xF0F0F0F0UL) >> 4) | ((__x & (u32)0x0F0F0F0FUL) << 4); \ - __x = ((__x & (u32)0xCCCCCCCCUL) >> 2) | ((__x & (u32)0x33333333UL) << 2); \ - __x = ((__x & (u32)0xAAAAAAAAUL) >> 1) | ((__x & (u32)0x55555555UL) << 1); \ - __x; \ + u32 ___x = x; \ + ___x = (___x >> 16) | (___x << 16); \ + ___x = ((___x & (u32)0xFF00FF00UL) >> 8) | ((___x & (u32)0x00FF00FFUL) << 8); \ + ___x = ((___x & (u32)0xF0F0F0F0UL) >> 4) | ((___x & (u32)0x0F0F0F0FUL) << 4); \ + ___x = ((___x & (u32)0xCCCCCCCCUL) >> 2) | ((___x & (u32)0x33333333UL) << 2); \ + ___x = ((___x & (u32)0xAAAAAAAAUL) >> 1) | ((___x & (u32)0x55555555UL) << 1); \ + ___x; \ })
#define __constant_bitrev16(x) \ ({ \ - u16 __x = x; \ - __x = (__x >> 8) | (__x << 8); \ - __x = ((__x & (u16)0xF0F0U) >> 4) | ((__x & (u16)0x0F0FU) << 4); \ - __x = ((__x & (u16)0xCCCCU) >> 2) | ((__x & (u16)0x3333U) << 2); \ - __x = ((__x & (u16)0xAAAAU) >> 1) | ((__x & (u16)0x5555U) << 1); \ - __x; \ + u16 ___x = x; \ + ___x = (___x >> 8) | (___x << 8); \ + ___x = ((___x & (u16)0xF0F0U) >> 4) | ((___x & (u16)0x0F0FU) << 4); \ + ___x = ((___x & (u16)0xCCCCU) >> 2) | ((___x & (u16)0x3333U) << 2); \ + ___x = ((___x & (u16)0xAAAAU) >> 1) | ((___x & (u16)0x5555U) << 1); \ + ___x; \ })
#define __constant_bitrev8x4(x) \ ({ \ - u32 __x = x; \ - __x = ((__x & (u32)0xF0F0F0F0UL) >> 4) | ((__x & (u32)0x0F0F0F0FUL) << 4); \ - __x = ((__x & (u32)0xCCCCCCCCUL) >> 2) | ((__x & (u32)0x33333333UL) << 2); \ - __x = ((__x & (u32)0xAAAAAAAAUL) >> 1) | ((__x & (u32)0x55555555UL) << 1); \ - __x; \ + u32 ___x = x; \ + ___x = ((___x & (u32)0xF0F0F0F0UL) >> 4) | ((___x & (u32)0x0F0F0F0FUL) << 4); \ + ___x = ((___x & (u32)0xCCCCCCCCUL) >> 2) | ((___x & (u32)0x33333333UL) << 2); \ + ___x = ((___x & (u32)0xAAAAAAAAUL) >> 1) | ((___x & (u32)0x55555555UL) << 1); \ + ___x; \ })
#define __constant_bitrev8(x) \ ({ \ - u8 __x = x; \ - __x = (__x >> 4) | (__x << 4); \ - __x = ((__x & (u8)0xCCU) >> 2) | ((__x & (u8)0x33U) << 2); \ - __x = ((__x & (u8)0xAAU) >> 1) | ((__x & (u8)0x55U) << 1); \ - __x; \ + u8 ___x = x; \ + ___x = (___x >> 4) | (___x << 4); \ + ___x = ((___x & (u8)0xCCU) >> 2) | ((___x & (u8)0x33U) << 2); \ + ___x = ((___x & (u8)0xAAU) >> 1) | ((___x & (u8)0x55U) << 1); \ + ___x; \ })
#define bitrev32(x) \
From: Greg Thelen gthelen@google.com
commit 0b3d6e6f2dd0a7b697b1aa8c167265908940624b upstream.
Since commit a983b5ebee57 ("mm: memcontrol: fix excessive complexity in memory.stat reporting") memcg dirty and writeback counters are managed as:
1) per-memcg per-cpu values in range of [-32..32]
2) per-memcg atomic counter
When a per-cpu counter cannot fit in [-32..32] it's flushed to the atomic. Stat readers only check the atomic. Thus readers such as balance_dirty_pages() may see a nontrivial error margin: 32 pages per cpu.
Assuming 100 cpus: 4k x86 page_size: 13 MiB error per memcg 64k ppc page_size: 200 MiB error per memcg
Considering that dirty+writeback are used together for some decisions the errors double.
This inaccuracy can lead to undeserved oom kills. One nasty case is when all per-cpu counters hold positive values offsetting an atomic negative value (i.e. per_cpu[*]=32, atomic=n_cpu*-32). balance_dirty_pages() only consults the atomic and does not consider throttling the next n_cpu*32 dirty pages. If the file_lru is in the 13..200 MiB range then there's absolutely no dirty throttling, which burdens vmscan with only dirty+writeback pages thus resorting to oom kill.
It could be argued that tiny containers are not supported, but it's more subtle. It's the amount the space available for file lru that matters. If a container has memory.max-200MiB of non reclaimable memory, then it will also suffer such oom kills on a 100 cpu machine.
The following test reliably ooms without this patch. This patch avoids oom kills.
$ cat test mount -t cgroup2 none /dev/cgroup cd /dev/cgroup echo +io +memory > cgroup.subtree_control mkdir test cd test echo 10M > memory.max (echo $BASHPID > cgroup.procs && exec /memcg-writeback-stress /foo) (echo $BASHPID > cgroup.procs && exec dd if=/dev/zero of=/foo bs=2M count=100)
$ cat memcg-writeback-stress.c /* * Dirty pages from all but one cpu. * Clean pages from the non dirtying cpu. * This is to stress per cpu counter imbalance. * On a 100 cpu machine: * - per memcg per cpu dirty count is 32 pages for each of 99 cpus * - per memcg atomic is -99*32 pages * - thus the complete dirty limit: sum of all counters 0 * - balance_dirty_pages() only sees atomic count -99*32 pages, which * it max()s to 0. * - So a workload can dirty -99*32 pages before balance_dirty_pages() * cares. */ #define _GNU_SOURCE #include <err.h> #include <fcntl.h> #include <sched.h> #include <stdlib.h> #include <stdio.h> #include <sys/stat.h> #include <sys/sysinfo.h> #include <sys/types.h> #include <unistd.h>
static char *buf; static int bufSize;
static void set_affinity(int cpu) { cpu_set_t affinity;
CPU_ZERO(&affinity); CPU_SET(cpu, &affinity); if (sched_setaffinity(0, sizeof(affinity), &affinity)) err(1, "sched_setaffinity"); }
static void dirty_on(int output_fd, int cpu) { int i, wrote;
set_affinity(cpu); for (i = 0; i < 32; i++) { for (wrote = 0; wrote < bufSize; ) { int ret = write(output_fd, buf+wrote, bufSize-wrote); if (ret == -1) err(1, "write"); wrote += ret; } } }
int main(int argc, char **argv) { int cpu, flush_cpu = 1, output_fd; const char *output;
if (argc != 2) errx(1, "usage: output_file");
output = argv[1]; bufSize = getpagesize(); buf = malloc(getpagesize()); if (buf == NULL) errx(1, "malloc failed");
output_fd = open(output, O_CREAT|O_RDWR); if (output_fd == -1) err(1, "open(%s)", output);
for (cpu = 0; cpu < get_nprocs(); cpu++) { if (cpu != flush_cpu) dirty_on(output_fd, cpu); }
set_affinity(flush_cpu); if (fsync(output_fd)) err(1, "fsync(%s)", output); if (close(output_fd)) err(1, "close(%s)", output); free(buf); }
Make balance_dirty_pages() and wb_over_bg_thresh() work harder to collect exact per memcg counters. This avoids the aforementioned oom kills.
This does not affect the overhead of memory.stat, which still reads the single atomic counter.
Why not use percpu_counter? memcg already handles cpus going offline, so no need for that overhead from percpu_counter. And the percpu_counter spinlocks are more heavyweight than is required.
It probably also makes sense to use exact dirty and writeback counters in memcg oom reports. But that is saved for later.
Link: http://lkml.kernel.org/r/20190329174609.164344-1-gthelen@google.com Signed-off-by: Greg Thelen gthelen@google.com Reviewed-by: Roman Gushchin guro@fb.com Acked-by: Johannes Weiner hannes@cmpxchg.org Cc: Michal Hocko mhocko@kernel.org Cc: Vladimir Davydov vdavydov.dev@gmail.com Cc: Tejun Heo tj@kernel.org Cc: stable@vger.kernel.org [4.16+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/memcontrol.h | 5 ++++- mm/memcontrol.c | 20 ++++++++++++++++++-- 2 files changed, 22 insertions(+), 3 deletions(-)
--- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -561,7 +561,10 @@ struct mem_cgroup *lock_page_memcg(struc void __unlock_page_memcg(struct mem_cgroup *memcg); void unlock_page_memcg(struct page *page);
-/* idx can be of type enum memcg_stat_item or node_stat_item */ +/* + * idx can be of type enum memcg_stat_item or node_stat_item. + * Keep in sync with memcg_exact_page_state(). + */ static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx) { --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3882,6 +3882,22 @@ struct wb_domain *mem_cgroup_wb_domain(s return &memcg->cgwb_domain; }
+/* + * idx can be of type enum memcg_stat_item or node_stat_item. + * Keep in sync with memcg_exact_page(). + */ +static unsigned long memcg_exact_page_state(struct mem_cgroup *memcg, int idx) +{ + long x = atomic_long_read(&memcg->stat[idx]); + int cpu; + + for_each_online_cpu(cpu) + x += per_cpu_ptr(memcg->stat_cpu, cpu)->count[idx]; + if (x < 0) + x = 0; + return x; +} + /** * mem_cgroup_wb_stats - retrieve writeback related stats from its memcg * @wb: bdi_writeback in question @@ -3907,10 +3923,10 @@ void mem_cgroup_wb_stats(struct bdi_writ struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css); struct mem_cgroup *parent;
- *pdirty = memcg_page_state(memcg, NR_FILE_DIRTY); + *pdirty = memcg_exact_page_state(memcg, NR_FILE_DIRTY);
/* this should eventually include NR_UNSTABLE_NFS */ - *pwriteback = memcg_page_state(memcg, NR_WRITEBACK); + *pwriteback = memcg_exact_page_state(memcg, NR_WRITEBACK); *pfilepages = mem_cgroup_nr_lru_pages(memcg, (1 << LRU_INACTIVE_FILE) | (1 << LRU_ACTIVE_FILE)); *pheadroom = PAGE_COUNTER_MAX;
From: Guenter Roeck linux@roeck-us.net
commit 8f71370f4b02730e8c27faf460af7a3586e24e1f upstream.
If codec registration fails after the ASoC Intel SST driver has been probed, the kernel will Oops and crash at suspend/resume.
general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 1 PID: 2811 Comm: cat Tainted: G W 4.19.30 #15 Hardware name: GOOGLE Clapper, BIOS Google_Clapper.5216.199.7 08/22/2014 RIP: 0010:snd_soc_suspend+0x5a/0xd21 Code: 03 80 3c 10 00 49 89 d7 74 0b 48 89 df e8 71 72 c4 fe 4c 89 fa 48 8b 03 48 89 45 d0 48 8d 98 a0 01 00 00 48 89 d8 48 c1 e8 03 <8a> 04 10 84 c0 0f 85 85 0c 00 00 80 3b 00 0f 84 6b 0c 00 00 48 8b RSP: 0018:ffff888035407750 EFLAGS: 00010202 RAX: 0000000000000034 RBX: 00000000000001a0 RCX: 0000000000000000 RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88805c417098 RBP: ffff8880354077b0 R08: dffffc0000000000 R09: ffffed100b975718 R10: 0000000000000001 R11: ffffffff949ea4a3 R12: 1ffff1100b975746 R13: dffffc0000000000 R14: ffff88805cba4588 R15: dffffc0000000000 FS: 0000794a78e91b80(0000) GS:ffff888068d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007bd5283ccf58 CR3: 000000004b7aa000 CR4: 00000000001006e0 Call Trace: ? dpm_complete+0x67b/0x67b ? i915_gem_suspend+0x14d/0x1ad sst_soc_prepare+0x91/0x1dd ? sst_be_hw_params+0x7e/0x7e dpm_prepare+0x39a/0x88b dpm_suspend_start+0x13/0x9d suspend_devices_and_enter+0x18f/0xbd7 ? arch_suspend_enable_irqs+0x11/0x11 ? printk+0xd9/0x12d ? lock_release+0x95f/0x95f ? log_buf_vmcoreinfo_setup+0x131/0x131 ? rcu_read_lock_sched_held+0x140/0x22a ? __bpf_trace_rcu_utilization+0xa/0xa ? __pm_pr_dbg+0x186/0x190 ? pm_notifier_call_chain+0x39/0x39 ? suspend_test+0x9d/0x9d pm_suspend+0x2f4/0x728 ? trace_suspend_resume+0x3da/0x3da ? lock_release+0x95f/0x95f ? kernfs_fop_write+0x19f/0x32d state_store+0xd8/0x147 ? sysfs_kf_read+0x155/0x155 kernfs_fop_write+0x23e/0x32d __vfs_write+0x108/0x608 ? vfs_read+0x2e9/0x2e9 ? rcu_read_lock_sched_held+0x140/0x22a ? __bpf_trace_rcu_utilization+0xa/0xa ? debug_smp_processor_id+0x10/0x10 ? selinux_file_permission+0x1c5/0x3c8 ? rcu_sync_lockdep_assert+0x6a/0xad ? __sb_start_write+0x129/0x2ac vfs_write+0x1aa/0x434 ksys_write+0xfe/0x1be ? __ia32_sys_read+0x82/0x82 do_syscall_64+0xcd/0x120 entry_SYSCALL_64_after_hwframe+0x49/0xbe
In the observed situation, the problem is seen because the codec driver failed to probe due to a hardware problem.
max98090 i2c-193C9890:00: Failed to read device revision: -1 max98090 i2c-193C9890:00: ASoC: failed to probe component -1 cht-bsw-max98090 cht-bsw-max98090: ASoC: failed to instantiate card -1 cht-bsw-max98090 cht-bsw-max98090: snd_soc_register_card failed -1 cht-bsw-max98090: probe of cht-bsw-max98090 failed with error -1
The problem is similar to the problem solved with commit 2fc995a87f2e ("ASoC: intel: Fix crash at suspend/resume without card registration"), but codec registration fails at a later point. At that time, the pointer checked with the above mentioned commit is already set, but it is not cleared if the device is subsequently removed. Adding a remove function to clear the pointer fixes the problem.
Cc: stable@vger.kernel.org Cc: Jarkko Nikula jarkko.nikula@linux.intel.com Cc: Curtis Malainey cujomalainey@chromium.org Signed-off-by: Guenter Roeck linux@roeck-us.net Acked-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/soc/intel/atom/sst-mfld-platform-pcm.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/sound/soc/intel/atom/sst-mfld-platform-pcm.c +++ b/sound/soc/intel/atom/sst-mfld-platform-pcm.c @@ -711,9 +711,17 @@ static int sst_soc_probe(struct snd_soc_ return sst_dsp_init_v2_dpcm(component); }
+static void sst_soc_remove(struct snd_soc_component *component) +{ + struct sst_data *drv = dev_get_drvdata(component->dev); + + drv->soc_card = NULL; +} + static const struct snd_soc_component_driver sst_soc_platform_drv = { .name = DRV_NAME, .probe = sst_soc_probe, + .remove = sst_soc_remove, .ops = &sst_platform_ops, .compr_ops = &sst_platform_compr_ops, .pcm_new = sst_pcm_new,
From: S.j. Wang shengjiu.wang@nxp.com
commit 0ff4e8c61b794a4bf6c854ab071a1abaaa80f358 upstream.
There is very low possibility ( < 0.1% ) that channel swap happened in beginning when multi output/input pin is enabled. The issue is that hardware can't send data to correct pin in the beginning with the normal enable flow.
This is hardware issue, but there is no errata, the workaround flow is that: Each time playback/recording, firstly clear the xSMA/xSMB, then enable TE/RE, then enable xSMB and xSMA (xSMB must be enabled before xSMA). Which is to use the xSMA as the trigger start register, previously the xCR_TE or xCR_RE is the bit for starting.
Fixes commit 43d24e76b698 ("ASoC: fsl_esai: Add ESAI CPU DAI driver") Cc: stable@vger.kernel.org Reviewed-by: Fabio Estevam festevam@gmail.com Acked-by: Nicolin Chen nicoleotsuka@gmail.com Signed-off-by: Shengjiu Wang shengjiu.wang@nxp.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/soc/fsl/fsl_esai.c | 47 +++++++++++++++++++++++++++++++++++++---------- 1 file changed, 37 insertions(+), 10 deletions(-)
--- a/sound/soc/fsl/fsl_esai.c +++ b/sound/soc/fsl/fsl_esai.c @@ -54,6 +54,8 @@ struct fsl_esai { u32 fifo_depth; u32 slot_width; u32 slots; + u32 tx_mask; + u32 rx_mask; u32 hck_rate[2]; u32 sck_rate[2]; bool hck_dir[2]; @@ -361,21 +363,13 @@ static int fsl_esai_set_dai_tdm_slot(str regmap_update_bits(esai_priv->regmap, REG_ESAI_TCCR, ESAI_xCCR_xDC_MASK, ESAI_xCCR_xDC(slots));
- regmap_update_bits(esai_priv->regmap, REG_ESAI_TSMA, - ESAI_xSMA_xS_MASK, ESAI_xSMA_xS(tx_mask)); - regmap_update_bits(esai_priv->regmap, REG_ESAI_TSMB, - ESAI_xSMB_xS_MASK, ESAI_xSMB_xS(tx_mask)); - regmap_update_bits(esai_priv->regmap, REG_ESAI_RCCR, ESAI_xCCR_xDC_MASK, ESAI_xCCR_xDC(slots));
- regmap_update_bits(esai_priv->regmap, REG_ESAI_RSMA, - ESAI_xSMA_xS_MASK, ESAI_xSMA_xS(rx_mask)); - regmap_update_bits(esai_priv->regmap, REG_ESAI_RSMB, - ESAI_xSMB_xS_MASK, ESAI_xSMB_xS(rx_mask)); - esai_priv->slot_width = slot_width; esai_priv->slots = slots; + esai_priv->tx_mask = tx_mask; + esai_priv->rx_mask = rx_mask;
return 0; } @@ -596,6 +590,7 @@ static int fsl_esai_trigger(struct snd_p bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK; u8 i, channels = substream->runtime->channels; u32 pins = DIV_ROUND_UP(channels, esai_priv->slots); + u32 mask;
switch (cmd) { case SNDRV_PCM_TRIGGER_START: @@ -608,15 +603,38 @@ static int fsl_esai_trigger(struct snd_p for (i = 0; tx && i < channels; i++) regmap_write(esai_priv->regmap, REG_ESAI_ETDR, 0x0);
+ /* + * When set the TE/RE in the end of enablement flow, there + * will be channel swap issue for multi data line case. + * In order to workaround this issue, we switch the bit + * enablement sequence to below sequence + * 1) clear the xSMB & xSMA: which is done in probe and + * stop state. + * 2) set TE/RE + * 3) set xSMB + * 4) set xSMA: xSMA is the last one in this flow, which + * will trigger esai to start. + */ regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), tx ? ESAI_xCR_TE_MASK : ESAI_xCR_RE_MASK, tx ? ESAI_xCR_TE(pins) : ESAI_xCR_RE(pins)); + mask = tx ? esai_priv->tx_mask : esai_priv->rx_mask; + + regmap_update_bits(esai_priv->regmap, REG_ESAI_xSMB(tx), + ESAI_xSMB_xS_MASK, ESAI_xSMB_xS(mask)); + regmap_update_bits(esai_priv->regmap, REG_ESAI_xSMA(tx), + ESAI_xSMA_xS_MASK, ESAI_xSMA_xS(mask)); + break; case SNDRV_PCM_TRIGGER_SUSPEND: case SNDRV_PCM_TRIGGER_STOP: case SNDRV_PCM_TRIGGER_PAUSE_PUSH: regmap_update_bits(esai_priv->regmap, REG_ESAI_xCR(tx), tx ? ESAI_xCR_TE_MASK : ESAI_xCR_RE_MASK, 0); + regmap_update_bits(esai_priv->regmap, REG_ESAI_xSMA(tx), + ESAI_xSMA_xS_MASK, 0); + regmap_update_bits(esai_priv->regmap, REG_ESAI_xSMB(tx), + ESAI_xSMB_xS_MASK, 0);
/* Disable and reset FIFO */ regmap_update_bits(esai_priv->regmap, REG_ESAI_xFCR(tx), @@ -906,6 +924,15 @@ static int fsl_esai_probe(struct platfor return ret; }
+ esai_priv->tx_mask = 0xFFFFFFFF; + esai_priv->rx_mask = 0xFFFFFFFF; + + /* Clear the TSMA, TSMB, RSMA, RSMB */ + regmap_write(esai_priv->regmap, REG_ESAI_TSMA, 0); + regmap_write(esai_priv->regmap, REG_ESAI_TSMB, 0); + regmap_write(esai_priv->regmap, REG_ESAI_RSMA, 0); + regmap_write(esai_priv->regmap, REG_ESAI_RSMB, 0); + ret = devm_snd_soc_register_component(&pdev->dev, &fsl_esai_component, &fsl_esai_dai, 1); if (ret) {
From: Filipe Manana fdmanana@suse.com
commit f35f06c35560a86e841631f0243b83a984dc11a9 upstream.
Whan a filesystem is mounted with the nologreplay mount option, which requires it to be mounted in RO mode as well, we can not allow discard on free space inside block groups, because log trees refer to extents that are not pinned in a block group's free space cache (pinning the extents is precisely the first phase of replaying a log tree).
So do not allow the fitrim ioctl to do anything when the filesystem is mounted with the nologreplay option, because later it can be mounted RW without that option, which causes log replay to happen and result in either a failure to replay the log trees (leading to a mount failure), a crash or some silent corruption.
Reported-by: Darrick J. Wong darrick.wong@oracle.com Fixes: 96da09192cda ("btrfs: Introduce new mount option to disable tree log replay") CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Nikolay Borisov nborisov@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/ioctl.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -501,6 +501,16 @@ static noinline int btrfs_ioctl_fitrim(s if (!capable(CAP_SYS_ADMIN)) return -EPERM;
+ /* + * If the fs is mounted with nologreplay, which requires it to be + * mounted in RO mode as well, we can not allow discard on free space + * inside block groups, because log trees refer to extents that are not + * pinned in a block group's free space cache (pinning the extents is + * precisely the first phase of replaying a log tree). + */ + if (btrfs_test_opt(fs_info, NOLOGREPLAY)) + return -EROFS; + rcu_read_lock(); list_for_each_entry_rcu(device, &fs_info->fs_devices->devices, dev_list) {
From: Anand Jain anand.jain@oracle.com
commit 50398fde997f6be8faebdb5f38e9c9c467370f51 upstream.
We let pass zstd compression parameter even if it is not fully valid. For example:
$ btrfs prop set /btrfs compression zst $ btrfs prop get /btrfs compression compression=zst
zlib and lzo are fine.
Fix it by checking the correct prefix length.
Fixes: 5c1aab1dd544 ("btrfs: Add zstd support") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Nikolay Borisov nborisov@suse.com Signed-off-by: Anand Jain anand.jain@oracle.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/props.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/btrfs/props.c +++ b/fs/btrfs/props.c @@ -396,7 +396,7 @@ static int prop_compression_apply(struct btrfs_set_fs_incompat(fs_info, COMPRESS_LZO); } else if (!strncmp("zlib", value, 4)) { type = BTRFS_COMPRESS_ZLIB; - } else if (!strncmp("zstd", value, len)) { + } else if (!strncmp("zstd", value, 4)) { type = BTRFS_COMPRESS_ZSTD; btrfs_set_fs_incompat(fs_info, COMPRESS_ZSTD); } else {
From: Anand Jain anand.jain@oracle.com
commit 272e5326c7837697882ce3162029ba893059b616 upstream.
The compression property resets to NULL, instead of the old value if we fail to set the new compression parameter.
$ btrfs prop get /btrfs compression compression=lzo $ btrfs prop set /btrfs compression zli ERROR: failed to set compression for /btrfs: Invalid argument $ btrfs prop get /btrfs compression
This is because the compression property ->validate() is successful for 'zli' as the strncmp() used the length passed from the userspace.
Fix it by using the expected string length in strncmp().
Fixes: 63541927c8d1 ("Btrfs: add support for inode properties") Fixes: 5c1aab1dd544 ("btrfs: Add zstd support") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Nikolay Borisov nborisov@suse.com Signed-off-by: Anand Jain anand.jain@oracle.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/props.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/btrfs/props.c +++ b/fs/btrfs/props.c @@ -366,11 +366,11 @@ int btrfs_subvol_inherit_props(struct bt
static int prop_compression_validate(const char *value, size_t len) { - if (!strncmp("lzo", value, len)) + if (!strncmp("lzo", value, 3)) return 0; - else if (!strncmp("zlib", value, len)) + else if (!strncmp("zlib", value, 4)) return 0; - else if (!strncmp("zstd", value, len)) + else if (!strncmp("zstd", value, 4)) return 0;
return -EINVAL;
From: Dmitry V. Levin ldv@altlinux.org
commit 10a16997db3d99fc02c026cf2c6e6c670acafab0 upstream.
RISC-V syscall arguments are located in orig_a0,a1..a5 fields of struct pt_regs.
Due to an off-by-one bug and a bug in pointer arithmetic syscall_get_arguments() was reading s3..s7 fields instead of a1..a5. Likewise, syscall_set_arguments() was writing s3..s7 fields instead of a1..a5.
Link: http://lkml.kernel.org/r/20190329171221.GA32456@altlinux.org
Fixes: e2c0cdfba7f69 ("RISC-V: User-facing API") Cc: Ingo Molnar mingo@redhat.com Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: Albert Ou aou@eecs.berkeley.edu Cc: linux-riscv@lists.infradead.org Cc: stable@vger.kernel.org # v4.15+ Acked-by: Palmer Dabbelt palmer@sifive.com Signed-off-by: Dmitry V. Levin ldv@altlinux.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/riscv/include/asm/syscall.h | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
--- a/arch/riscv/include/asm/syscall.h +++ b/arch/riscv/include/asm/syscall.h @@ -79,10 +79,11 @@ static inline void syscall_get_arguments if (i == 0) { args[0] = regs->orig_a0; args++; - i++; n--; + } else { + i--; } - memcpy(args, ®s->a1 + i * sizeof(regs->a1), n * sizeof(args[0])); + memcpy(args, ®s->a1 + i, n * sizeof(args[0])); }
static inline void syscall_set_arguments(struct task_struct *task, @@ -94,10 +95,11 @@ static inline void syscall_set_arguments if (i == 0) { regs->orig_a0 = args[0]; args++; - i++; n--; - } - memcpy(®s->a1 + i * sizeof(regs->a1), args, n * sizeof(regs->a0)); + } else { + i--; + } + memcpy(®s->a1 + i, args, n * sizeof(regs->a1)); }
static inline int syscall_get_arch(void)
From: Bart Van Assche bvanassche@acm.org
commit fd9c40f64c514bdc585a21e2e33fa5f83ca8811b upstream.
blk_mq_try_issue_directly() can return BLK_STS*_RESOURCE for requests that have been queued. If that happens when blk_mq_try_issue_directly() is called by the dm-mpath driver then dm-mpath will try to resubmit a request that is already queued and a kernel crash follows. Since it is nontrivial to fix blk_mq_request_issue_directly(), revert the blk_mq_request_issue_directly() changes that went into kernel v5.0.
This patch reverts the following commits: * d6a51a97c0b2 ("blk-mq: replace and kill blk_mq_request_issue_directly") # v5.0. * 5b7a6f128aad ("blk-mq: issue directly with bypass 'false' in blk_mq_sched_insert_requests") # v5.0. * 7f556a44e61d ("blk-mq: refactor the code of issue request directly") # v5.0.
Cc: Christoph Hellwig hch@infradead.org Cc: Ming Lei ming.lei@redhat.com Cc: Jianchao Wang jianchao.w.wang@oracle.com Cc: Hannes Reinecke hare@suse.com Cc: Johannes Thumshirn jthumshirn@suse.de Cc: James Smart james.smart@broadcom.com Cc: Dongli Zhang dongli.zhang@oracle.com Cc: Laurence Oberman loberman@redhat.com Cc: stable@vger.kernel.org Reported-by: Laurence Oberman loberman@redhat.com Tested-by: Laurence Oberman loberman@redhat.com Fixes: 7f556a44e61d ("blk-mq: refactor the code of issue request directly") # v5.0. Signed-off-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- block/blk-core.c | 4 - block/blk-mq-sched.c | 8 ++- block/blk-mq.c | 122 ++++++++++++++++++++++++++------------------------- block/blk-mq.h | 6 -- 4 files changed, 71 insertions(+), 69 deletions(-)
--- a/block/blk-core.c +++ b/block/blk-core.c @@ -1246,8 +1246,6 @@ static int blk_cloned_rq_check_limits(st */ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *rq) { - blk_qc_t unused; - if (blk_cloned_rq_check_limits(q, rq)) return BLK_STS_IOERR;
@@ -1263,7 +1261,7 @@ blk_status_t blk_insert_cloned_request(s * bypass a potential scheduler on the bottom device for * insert. */ - return blk_mq_try_issue_directly(rq->mq_hctx, rq, &unused, true, true); + return blk_mq_request_issue_directly(rq, true); } EXPORT_SYMBOL_GPL(blk_insert_cloned_request);
--- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -423,10 +423,12 @@ void blk_mq_sched_insert_requests(struct * busy in case of 'none' scheduler, and this way may save * us one extra enqueue & dequeue to sw queue. */ - if (!hctx->dispatch_busy && !e && !run_queue_async) + if (!hctx->dispatch_busy && !e && !run_queue_async) { blk_mq_try_issue_list_directly(hctx, list); - else - blk_mq_insert_requests(hctx, ctx, list); + if (list_empty(list)) + return; + } + blk_mq_insert_requests(hctx, ctx, list); }
blk_mq_run_hw_queue(hctx, run_queue_async); --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1805,74 +1805,76 @@ static blk_status_t __blk_mq_issue_direc return ret; }
-blk_status_t blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, +static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, struct request *rq, blk_qc_t *cookie, - bool bypass, bool last) + bool bypass_insert, bool last) { struct request_queue *q = rq->q; bool run_queue = true; - blk_status_t ret = BLK_STS_RESOURCE; - int srcu_idx; - bool force = false;
- hctx_lock(hctx, &srcu_idx); /* - * hctx_lock is needed before checking quiesced flag. + * RCU or SRCU read lock is needed before checking quiesced flag. * - * When queue is stopped or quiesced, ignore 'bypass', insert - * and return BLK_STS_OK to caller, and avoid driver to try to - * dispatch again. + * When queue is stopped or quiesced, ignore 'bypass_insert' from + * blk_mq_request_issue_directly(), and return BLK_STS_OK to caller, + * and avoid driver to try to dispatch again. */ - if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q))) { + if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)) { run_queue = false; - bypass = false; - goto out_unlock; + bypass_insert = false; + goto insert; }
- if (unlikely(q->elevator && !bypass)) - goto out_unlock; + if (q->elevator && !bypass_insert) + goto insert;
if (!blk_mq_get_dispatch_budget(hctx)) - goto out_unlock; + goto insert;
if (!blk_mq_get_driver_tag(rq)) { blk_mq_put_dispatch_budget(hctx); - goto out_unlock; + goto insert; }
- /* - * Always add a request that has been through - *.queue_rq() to the hardware dispatch list. - */ - force = true; - ret = __blk_mq_issue_directly(hctx, rq, cookie, last); -out_unlock: + return __blk_mq_issue_directly(hctx, rq, cookie, last); +insert: + if (bypass_insert) + return BLK_STS_RESOURCE; + + blk_mq_request_bypass_insert(rq, run_queue); + return BLK_STS_OK; +} + +static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, + struct request *rq, blk_qc_t *cookie) +{ + blk_status_t ret; + int srcu_idx; + + might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING); + + hctx_lock(hctx, &srcu_idx); + + ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false, true); + if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) + blk_mq_request_bypass_insert(rq, true); + else if (ret != BLK_STS_OK) + blk_mq_end_request(rq, ret); + + hctx_unlock(hctx, srcu_idx); +} + +blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last) +{ + blk_status_t ret; + int srcu_idx; + blk_qc_t unused_cookie; + struct blk_mq_hw_ctx *hctx = rq->mq_hctx; + + hctx_lock(hctx, &srcu_idx); + ret = __blk_mq_try_issue_directly(hctx, rq, &unused_cookie, true, last); hctx_unlock(hctx, srcu_idx); - switch (ret) { - case BLK_STS_OK: - break; - case BLK_STS_DEV_RESOURCE: - case BLK_STS_RESOURCE: - if (force) { - blk_mq_request_bypass_insert(rq, run_queue); - /* - * We have to return BLK_STS_OK for the DM - * to avoid livelock. Otherwise, we return - * the real result to indicate whether the - * request is direct-issued successfully. - */ - ret = bypass ? BLK_STS_OK : ret; - } else if (!bypass) { - blk_mq_sched_insert_request(rq, false, - run_queue, false); - } - break; - default: - if (!bypass) - blk_mq_end_request(rq, ret); - break; - }
return ret; } @@ -1880,20 +1882,22 @@ out_unlock: void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx, struct list_head *list) { - blk_qc_t unused; - blk_status_t ret = BLK_STS_OK; - while (!list_empty(list)) { + blk_status_t ret; struct request *rq = list_first_entry(list, struct request, queuelist);
list_del_init(&rq->queuelist); - if (ret == BLK_STS_OK) - ret = blk_mq_try_issue_directly(hctx, rq, &unused, - false, + ret = blk_mq_request_issue_directly(rq, list_empty(list)); + if (ret != BLK_STS_OK) { + if (ret == BLK_STS_RESOURCE || + ret == BLK_STS_DEV_RESOURCE) { + blk_mq_request_bypass_insert(rq, list_empty(list)); - else - blk_mq_sched_insert_request(rq, false, true, false); + break; + } + blk_mq_end_request(rq, ret); + } }
/* @@ -1901,7 +1905,7 @@ void blk_mq_try_issue_list_directly(stru * the driver there was more coming, but that turned out to * be a lie. */ - if (ret != BLK_STS_OK && hctx->queue->mq_ops->commit_rqs) + if (!list_empty(list) && hctx->queue->mq_ops->commit_rqs) hctx->queue->mq_ops->commit_rqs(hctx); }
@@ -2014,13 +2018,13 @@ static blk_qc_t blk_mq_make_request(stru if (same_queue_rq) { data.hctx = same_queue_rq->mq_hctx; blk_mq_try_issue_directly(data.hctx, same_queue_rq, - &cookie, false, true); + &cookie); } } else if ((q->nr_hw_queues > 1 && is_sync) || (!q->elevator && !data.hctx->dispatch_busy)) { blk_mq_put_ctx(data.ctx); blk_mq_bio_to_request(rq, bio); - blk_mq_try_issue_directly(data.hctx, rq, &cookie, false, true); + blk_mq_try_issue_directly(data.hctx, rq, &cookie); } else { blk_mq_put_ctx(data.ctx); blk_mq_bio_to_request(rq, bio); --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -67,10 +67,8 @@ void blk_mq_request_bypass_insert(struct void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx, struct list_head *list);
-blk_status_t blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, - struct request *rq, - blk_qc_t *cookie, - bool bypass, bool last); +/* Used by blk_insert_cloned_request() to issue request directly */ +blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last); void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx, struct list_head *list);
From: Jérôme Glisse jglisse@redhat.com
commit a3761c3c91209b58b6f33bf69dd8bb8ec0c9d925 upstream.
When bio_add_pc_page() fails in bio_copy_user_iov() we should free the page we just allocated otherwise we are leaking it.
Cc: linux-block@vger.kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: stable@vger.kernel.org Reviewed-by: Chaitanya Kulkarni chaitanya.kulkarni@wdc.com Signed-off-by: Jérôme Glisse jglisse@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- block/bio.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/block/bio.c +++ b/block/bio.c @@ -1238,8 +1238,11 @@ struct bio *bio_copy_user_iov(struct req } }
- if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes) + if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes) { + if (!map_data) + __free_page(page); break; + }
len -= bytes; offset = 0;
From: Jason Yan yanaijie@huawei.com
commit a89afe58f1a74aac768a5eb77af95ef4ee15beaa upstream.
If the last bio returned is not dio->bio, the status of the bio will not assigned to dio->bio if it is error. This will cause the whole IO status wrong.
ksoftirqd/21-117 [021] ..s. 4017.966090: 8,0 C N 4883648 [0] <idle>-0 [018] ..s. 4017.970888: 8,0 C WS 4924800 + 1024 [0] <idle>-0 [018] ..s. 4017.970909: 8,0 D WS 4935424 + 1024 [<idle>] <idle>-0 [018] ..s. 4017.970924: 8,0 D WS 4936448 + 321 [<idle>] ksoftirqd/21-117 [021] ..s. 4017.995033: 8,0 C R 4883648 + 336 [65475] ksoftirqd/21-117 [021] d.s. 4018.001988: myprobe1: (blkdev_bio_end_io+0x0/0x168) bi_status=7 ksoftirqd/21-117 [021] d.s. 4018.001992: myprobe: (aio_complete_rw+0x0/0x148) x0=0xffff802f2595ad80 res=0x12a000 res2=0x0
We always have to assign bio->bi_status to dio->bio.bi_status because we will only check dio->bio.bi_status when we return the whole IO to the upper layer.
Fixes: 542ff7bf18c6 ("block: new direct I/O implementation") Cc: stable@vger.kernel.org Cc: Christoph Hellwig hch@lst.de Cc: Jens Axboe axboe@kernel.dk Reviewed-by: Ming Lei ming.lei@redhat.com Signed-off-by: Jason Yan yanaijie@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/block_dev.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -298,10 +298,10 @@ static void blkdev_bio_end_io(struct bio struct blkdev_dio *dio = bio->bi_private; bool should_dirty = dio->should_dirty;
- if (dio->multi_bio && !atomic_dec_and_test(&dio->ref)) { - if (bio->bi_status && !dio->bio.bi_status) - dio->bio.bi_status = bio->bi_status; - } else { + if (bio->bi_status && !dio->bio.bi_status) + dio->bio.bi_status = bio->bi_status; + + if (!dio->multi_bio || atomic_dec_and_test(&dio->ref)) { if (!dio->is_sync) { struct kiocb *iocb = dio->iocb; ssize_t ret;
From: Stephen Boyd swboyd@chromium.org
commit 325aa19598e410672175ed50982f902d4e3f31c5 upstream.
If a child irqchip calls irq_chip_set_wake_parent() but its parent irqchip has the IRQCHIP_SKIP_SET_WAKE flag set an error is returned.
This is inconsistent behaviour vs. set_irq_wake_real() which returns 0 when the irqchip has the IRQCHIP_SKIP_SET_WAKE flag set. It doesn't attempt to walk the chain of parents and set irq wake on any chips that don't have the flag set either. If the intent is to call the .irq_set_wake() callback of the parent irqchip, then we expect irqchip implementations to omit the IRQCHIP_SKIP_SET_WAKE flag and implement an .irq_set_wake() function that calls irq_chip_set_wake_parent().
The problem has been observed on a Qualcomm sdm845 device where set wake fails on any GPIO interrupts after applying work in progress wakeup irq patches to the GPIO driver. The chain of chips looks like this:
QCOM GPIO -> QCOM PDC (SKIP) -> ARM GIC (SKIP)
The GPIO controllers parent is the QCOM PDC irqchip which in turn has ARM GIC as parent. The QCOM PDC irqchip has the IRQCHIP_SKIP_SET_WAKE flag set, and so does the grandparent ARM GIC.
The GPIO driver doesn't know if the parent needs to set wake or not, so it unconditionally calls irq_chip_set_wake_parent() causing this function to return a failure because the parent irqchip (PDC) doesn't have the .irq_set_wake() callback set. Returning 0 instead makes everything work and irqs from the GPIO controller can be configured for wakeup.
Make it consistent by returning 0 (success) from irq_chip_set_wake_parent() when a parent chip has IRQCHIP_SKIP_SET_WAKE set.
[ tglx: Massaged changelog ]
Fixes: 08b55e2a9208e ("genirq: Add irqchip_set_wake_parent") Signed-off-by: Stephen Boyd swboyd@chromium.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Marc Zyngier marc.zyngier@arm.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-gpio@vger.kernel.org Cc: Lina Iyer ilina@codeaurora.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190325181026.247796-1-swboyd@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/irq/chip.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/kernel/irq/chip.c +++ b/kernel/irq/chip.c @@ -1384,6 +1384,10 @@ int irq_chip_set_vcpu_affinity_parent(st int irq_chip_set_wake_parent(struct irq_data *data, unsigned int on) { data = data->parent_data; + + if (data->chip->flags & IRQCHIP_SKIP_SET_WAKE) + return 0; + if (data->chip->irq_set_wake) return data->chip->irq_set_wake(data, on);
From: Kefeng Wang wangkefeng.wang@huawei.com
commit e8458e7afa855317b14915d7b86ab3caceea7eb6 upstream.
When CONFIG_SPARSE_IRQ is disable, the request_mutex in struct irq_desc is not initialized which causes malfunction.
Fixes: 9114014cf4e6 ("genirq: Add mutex to irq desc to serialize request/free_irq()") Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Mukesh Ojha mojha@codeaurora.org Cc: Marc Zyngier marc.zyngier@arm.com Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190404074512.145533-1-wangkefeng.wang@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/irq/irqdesc.c | 1 + 1 file changed, 1 insertion(+)
--- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -558,6 +558,7 @@ int __init early_irq_init(void) alloc_masks(&desc[i], node); raw_spin_lock_init(&desc[i].lock); lockdep_set_class(&desc[i].lock, &irq_desc_lock_class); + mutex_init(&desc[i].request_mutex); desc_set_defaults(i, &desc[i], node, NULL, NULL); } return arch_early_irq_init();
From: Cornelia Huck cohuck@redhat.com
commit cf94db21905333e610e479688add629397a4b384 upstream.
vring_create_virtqueue() allows the caller to specify via the may_reduce_num parameter whether the vring code is allowed to allocate a smaller ring than specified.
However, the split ring allocation code tries to allocate a smaller ring on allocation failure regardless of what the caller specified. This may cause trouble for e.g. virtio-pci in legacy mode, which does not support ring resizing. (The packed ring code does not resize in any case.)
Let's fix this by bailing out immediately in the split ring code if the requested size cannot be allocated and may_reduce_num has not been specified.
While at it, fix a typo in the usage instructions.
Fixes: 2a2d1382fe9d ("virtio: Add improved queue allocation API") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Cornelia Huck cohuck@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Halil Pasic pasic@linux.ibm.com Reviewed-by: Jens Freimann jfreimann@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/virtio/virtio_ring.c | 2 ++ include/linux/virtio_ring.h | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -871,6 +871,8 @@ static struct virtqueue *vring_create_vi GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO); if (queue) break; + if (!may_reduce_num) + return NULL; }
if (!num) --- a/include/linux/virtio_ring.h +++ b/include/linux/virtio_ring.h @@ -63,7 +63,7 @@ struct virtqueue; /* * Creates a virtqueue and allocates the descriptor ring. If * may_reduce_num is set, then this may allocate a smaller ring than - * expected. The caller should query virtqueue_get_ring_size to learn + * expected. The caller should query virtqueue_get_vring_size to learn * the actual size of the ring. */ struct virtqueue *vring_create_virtqueue(unsigned int index,
From: Jani Nikula jani.nikula@intel.com
commit 21635d7311734d2d1b177f8a95e2f9386174b76d upstream.
Commit 7769db588384 ("drm/i915/dp: optimize eDP 1.4+ link config fast and narrow") started to optize the eDP 1.4+ link config, both per spec and as preparation for display stream compression support.
Sadly, we again face panels that flat out fail with parameters they claim to support. Revert, and go back to the drawing board.
v2: Actually revert to max params instead of just wide-and-slow.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109959 Fixes: 7769db588384 ("drm/i915/dp: optimize eDP 1.4+ link config fast and narrow") Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Manasi Navare manasi.d.navare@intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com Cc: Matt Atwood matthew.s.atwood@intel.com Cc: "Lee, Shawn C" shawn.c.lee@intel.com Cc: Dave Airlie airlied@gmail.com Cc: intel-gfx@lists.freedesktop.org Cc: stable@vger.kernel.org # v5.0+ Reviewed-by: Rodrigo Vivi rodrigo.vivi@intel.com Reviewed-by: Manasi Navare manasi.d.navare@intel.com Tested-by: Albert Astals Cid aacid@kde.org # v5.0 backport Tested-by: Emanuele Panigati ilpanich@gmail.com # v5.0 backport Tested-by: Matteo Iervasi matteoiervasi@gmail.com # v5.0 backport Signed-off-by: Jani Nikula jani.nikula@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20190405075220.9815-1-jani.nik... (cherry picked from commit f11cb1c19ad0563b3c1ea5eb16a6bac0e401f428) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/intel_dp.c | 69 +++++----------------------------------- 1 file changed, 10 insertions(+), 59 deletions(-)
--- a/drivers/gpu/drm/i915/intel_dp.c +++ b/drivers/gpu/drm/i915/intel_dp.c @@ -1845,42 +1845,6 @@ intel_dp_compute_link_config_wide(struct return false; }
-/* Optimize link config in order: max bpp, min lanes, min clock */ -static bool -intel_dp_compute_link_config_fast(struct intel_dp *intel_dp, - struct intel_crtc_state *pipe_config, - const struct link_config_limits *limits) -{ - struct drm_display_mode *adjusted_mode = &pipe_config->base.adjusted_mode; - int bpp, clock, lane_count; - int mode_rate, link_clock, link_avail; - - for (bpp = limits->max_bpp; bpp >= limits->min_bpp; bpp -= 2 * 3) { - mode_rate = intel_dp_link_required(adjusted_mode->crtc_clock, - bpp); - - for (lane_count = limits->min_lane_count; - lane_count <= limits->max_lane_count; - lane_count <<= 1) { - for (clock = limits->min_clock; clock <= limits->max_clock; clock++) { - link_clock = intel_dp->common_rates[clock]; - link_avail = intel_dp_max_data_rate(link_clock, - lane_count); - - if (mode_rate <= link_avail) { - pipe_config->lane_count = lane_count; - pipe_config->pipe_bpp = bpp; - pipe_config->port_clock = link_clock; - - return true; - } - } - } - } - - return false; -} - static int intel_dp_dsc_compute_bpp(struct intel_dp *intel_dp, u8 dsc_max_bpc) { int i, num_bpc; @@ -2013,15 +1977,13 @@ intel_dp_compute_link_config(struct inte limits.min_bpp = 6 * 3; limits.max_bpp = intel_dp_compute_bpp(intel_dp, pipe_config);
- if (intel_dp_is_edp(intel_dp) && intel_dp->edp_dpcd[0] < DP_EDP_14) { + if (intel_dp_is_edp(intel_dp)) { /* * Use the maximum clock and number of lanes the eDP panel - * advertizes being capable of. The eDP 1.3 and earlier panels - * are generally designed to support only a single clock and - * lane configuration, and typically these values correspond to - * the native resolution of the panel. With eDP 1.4 rate select - * and DSC, this is decreasingly the case, and we need to be - * able to select less than maximum link config. + * advertizes being capable of. The panels are generally + * designed to support only a single clock and lane + * configuration, and typically these values correspond to the + * native resolution of the panel. */ limits.min_lane_count = limits.max_lane_count; limits.min_clock = limits.max_clock; @@ -2035,22 +1997,11 @@ intel_dp_compute_link_config(struct inte intel_dp->common_rates[limits.max_clock], limits.max_bpp, adjusted_mode->crtc_clock);
- if (intel_dp_is_edp(intel_dp)) - /* - * Optimize for fast and narrow. eDP 1.3 section 3.3 and eDP 1.4 - * section A.1: "It is recommended that the minimum number of - * lanes be used, using the minimum link rate allowed for that - * lane configuration." - * - * Note that we use the max clock and lane count for eDP 1.3 and - * earlier, and fast vs. wide is irrelevant. - */ - ret = intel_dp_compute_link_config_fast(intel_dp, pipe_config, - &limits); - else - /* Optimize for slow and wide. */ - ret = intel_dp_compute_link_config_wide(intel_dp, pipe_config, - &limits); + /* + * Optimize for slow and wide. This is the place to add alternative + * optimization policy. + */ + ret = intel_dp_compute_link_config_wide(intel_dp, pipe_config, &limits);
/* enable compression if the mode doesn't fit available BW */ if (!ret) {
From: Janusz Krzysztofik jmkrzyszt@gmail.com
commit 3e2cf62efec52fb49daed437cc486c3cb9a0afa2 upstream.
In order to request dynamic allocationn of GPIO IDs, a negative number should be passed as a base GPIO ID via platform data. Unfortuntely, commit 771e53c4d1a1 ("ARM: OMAP1: ams-delta: Drop board specific global GPIO numbers") didn't follow that rule while switching to dynamically allocated GPIO IDs for Amstrad Delta latches, making their IDs overlapping with those already assigned to OMAP GPIO devices. Fix it.
Fixes: 771e53c4d1a1 ("ARM: OMAP1: ams-delta: Drop board specific global GPIO numbers") Signed-off-by: Janusz Krzysztofik jmkrzyszt@gmail.com Cc: stable@vger.kernel.org Acked-by: Aaro Koskinen aaro.koskinen@iki.fi Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/mach-omap1/board-ams-delta.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/arm/mach-omap1/board-ams-delta.c +++ b/arch/arm/mach-omap1/board-ams-delta.c @@ -182,6 +182,7 @@ static struct resource latch1_resources[
static struct bgpio_pdata latch1_pdata = { .label = LATCH1_LABEL, + .base = -1, .ngpio = LATCH1_NGPIO, };
@@ -219,6 +220,7 @@ static struct resource latch2_resources[
static struct bgpio_pdata latch2_pdata = { .label = LATCH2_LABEL, + .base = -1, .ngpio = LATCH2_NGPIO, };
From: Jonas Karlman jonas@kwiboo.se
commit 6b2fde3dbfab6ebc45b0cd605e17ca5057ff9a3b upstream.
The following error can be seen during boot:
of: /cpus/cpu@501: Couldn't find opp node
Change cpu nodes to use operating-points-v2 in order to fix this.
Fixes: ce76de984649 ("ARM: dts: rockchip: convert rk3288 to operating-points-v2") Cc: stable@vger.kernel.org Signed-off-by: Jonas Karlman jonas@kwiboo.se Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/rk3288.dtsi | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/arch/arm/boot/dts/rk3288.dtsi +++ b/arch/arm/boot/dts/rk3288.dtsi @@ -70,7 +70,7 @@ compatible = "arm,cortex-a12"; reg = <0x501>; resets = <&cru SRST_CORE1>; - operating-points = <&cpu_opp_table>; + operating-points-v2 = <&cpu_opp_table>; #cooling-cells = <2>; /* min followed by max */ clock-latency = <40000>; clocks = <&cru ARMCLK>; @@ -80,7 +80,7 @@ compatible = "arm,cortex-a12"; reg = <0x502>; resets = <&cru SRST_CORE2>; - operating-points = <&cpu_opp_table>; + operating-points-v2 = <&cpu_opp_table>; #cooling-cells = <2>; /* min followed by max */ clock-latency = <40000>; clocks = <&cru ARMCLK>; @@ -90,7 +90,7 @@ compatible = "arm,cortex-a12"; reg = <0x503>; resets = <&cru SRST_CORE3>; - operating-points = <&cpu_opp_table>; + operating-points-v2 = <&cpu_opp_table>; #cooling-cells = <2>; /* min followed by max */ clock-latency = <40000>; clocks = <&cru ARMCLK>;
From: Peter Ujfalusi peter.ujfalusi@ti.com
commit 6691370646e844be98bb6558c024269791d20bd7 upstream.
Correctly map the regulators used by tlv320aic3106. Both 1.8V and 3.3V for the codec is derived from VBAT via fixed regulators.
Cc: Stable@vger.kernel.org # v4.14+ Signed-off-by: Peter Ujfalusi peter.ujfalusi@ti.com Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/am335x-evmsk.dts | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-)
--- a/arch/arm/boot/dts/am335x-evmsk.dts +++ b/arch/arm/boot/dts/am335x-evmsk.dts @@ -73,6 +73,24 @@ enable-active-high; };
+ /* TPS79518 */ + v1_8d_reg: fixedregulator-v1_8d { + compatible = "regulator-fixed"; + regulator-name = "v1_8d"; + vin-supply = <&vbat>; + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <1800000>; + }; + + /* TPS78633 */ + v3_3d_reg: fixedregulator-v3_3d { + compatible = "regulator-fixed"; + regulator-name = "v3_3d"; + vin-supply = <&vbat>; + regulator-min-microvolt = <3300000>; + regulator-max-microvolt = <3300000>; + }; + leds { pinctrl-names = "default"; pinctrl-0 = <&user_leds_s0>; @@ -501,10 +519,10 @@ status = "okay";
/* Regulators */ - AVDD-supply = <&vaux2_reg>; - IOVDD-supply = <&vaux2_reg>; - DRVDD-supply = <&vaux2_reg>; - DVDD-supply = <&vbat>; + AVDD-supply = <&v3_3d_reg>; + IOVDD-supply = <&v3_3d_reg>; + DRVDD-supply = <&v3_3d_reg>; + DVDD-supply = <&v1_8d_reg>; }; };
From: Peter Ujfalusi peter.ujfalusi@ti.com
commit 4f96dc0a3e79ec257a2b082dab3ee694ff88c317 upstream.
Correctly map the regulators used by tlv320aic3106. Both 1.8V and 3.3V for the codec is derived from VBAT via fixed regulators.
Cc: Stable@vger.kernel.org # v4.14+ Signed-off-by: Peter Ujfalusi peter.ujfalusi@ti.com Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/am335x-evm.dts | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-)
--- a/arch/arm/boot/dts/am335x-evm.dts +++ b/arch/arm/boot/dts/am335x-evm.dts @@ -57,6 +57,24 @@ enable-active-high; };
+ /* TPS79501 */ + v1_8d_reg: fixedregulator-v1_8d { + compatible = "regulator-fixed"; + regulator-name = "v1_8d"; + vin-supply = <&vbat>; + regulator-min-microvolt = <1800000>; + regulator-max-microvolt = <1800000>; + }; + + /* TPS79501 */ + v3_3d_reg: fixedregulator-v3_3d { + compatible = "regulator-fixed"; + regulator-name = "v3_3d"; + vin-supply = <&vbat>; + regulator-min-microvolt = <3300000>; + regulator-max-microvolt = <3300000>; + }; + matrix_keypad: matrix_keypad0 { compatible = "gpio-matrix-keypad"; debounce-delay-ms = <5>; @@ -499,10 +517,10 @@ status = "okay";
/* Regulators */ - AVDD-supply = <&vaux2_reg>; - IOVDD-supply = <&vaux2_reg>; - DRVDD-supply = <&vaux2_reg>; - DVDD-supply = <&vbat>; + AVDD-supply = <&v3_3d_reg>; + IOVDD-supply = <&v3_3d_reg>; + DRVDD-supply = <&v3_3d_reg>; + DVDD-supply = <&v1_8d_reg>; }; };
From: David Summers beagleboard@davidjohnsummers.uk
commit 8dbc4d5ddb59f49cb3e85bccf42a4720b27a6576 upstream.
The Problem:
On ASUS Tinker Board S, when booting from the eMMC, and there is card in the sd slot, there are constant errors.
Also when warm reboot, uboot can not access the sd slot
Cause:
Identified by Robin Murphy @ ARM. The Card Detect on rk3288 devices is pulled up by vccio-sd; so when the regulator powers this off, card detect gives spurious errors. A second problem, is during power down, vccio-sd apprears to be powered down. This causes a problem when warm rebooting from the sd card. This was identified by Jonas Karlman.
History:
A common fault on these rk3288 board, which impliment the reference design.
When this arose before:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-August/281153.htm...
And Ulf and Jaehoon clearly said this was a broken card detect design, which should be solved via polling
Solution:
Hence broken-cd is set as a property. This cures the errors. The powering down of vccio-sd during reboot is cured by adding regulator-boot-on.
This solutions has been fairly widely reviewed and tested.
Fixes: e58c5e739d6f ("ARM: dts: rockchip: move shared tinker-board nodes to a common dtsi") Cc: stable@vger.kernel.org [Heiko: slightly inaccurate fixes but tinker is a sbc (aka like a Pi) where we can hopefully expect people not to rely on overly old stable kernels] Signed-off-by: David Summers beagleboard@davidjohnsummers.uk Reviewed-by: Jonas Karlman jonas@kwiboo.se Tested-by: Jonas Karlman jonas@kwiboo.se Reviewed-by: Robin Murphy robin.murphy@arm.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/rk3288-tinker.dtsi | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/arm/boot/dts/rk3288-tinker.dtsi +++ b/arch/arm/boot/dts/rk3288-tinker.dtsi @@ -254,6 +254,7 @@ };
vccio_sd: LDO_REG5 { + regulator-boot-on; regulator-min-microvolt = <1800000>; regulator-max-microvolt = <3300000>; regulator-name = "vccio_sd"; @@ -430,7 +431,7 @@ bus-width = <4>; cap-mmc-highspeed; cap-sd-highspeed; - card-detect-delay = <200>; + broken-cd; disable-wp; /* wp not hooked up */ pinctrl-names = "default"; pinctrl-0 = <&sdmmc_clk &sdmmc_cmd &sdmmc_cd &sdmmc_bus4>;
From: David Engraf david.engraf@sysgo.com
commit e7dfb6d04e4715be1f3eb2c60d97b753fd2e4516 upstream.
The function argument for the ISC_D0 on PC9 was incorrect. According to the documentation it should be 'C' aka 3.
Signed-off-by: David Engraf david.engraf@sysgo.com Reviewed-by: Nicolas Ferre nicolas.ferre@microchip.com Signed-off-by: Ludovic Desroches ludovic.desroches@microchip.com Fixes: 7f16cb676c00 ("ARM: at91/dt: add sama5d2 pinmux") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/boot/dts/sama5d2-pinfunc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/boot/dts/sama5d2-pinfunc.h +++ b/arch/arm/boot/dts/sama5d2-pinfunc.h @@ -518,7 +518,7 @@ #define PIN_PC9__GPIO PINMUX_PIN(PIN_PC9, 0, 0) #define PIN_PC9__FIQ PINMUX_PIN(PIN_PC9, 1, 3) #define PIN_PC9__GTSUCOMP PINMUX_PIN(PIN_PC9, 2, 1) -#define PIN_PC9__ISC_D0 PINMUX_PIN(PIN_PC9, 2, 1) +#define PIN_PC9__ISC_D0 PINMUX_PIN(PIN_PC9, 3, 1) #define PIN_PC9__TIOA4 PINMUX_PIN(PIN_PC9, 4, 2) #define PIN_PC10 74 #define PIN_PC10__GPIO PINMUX_PIN(PIN_PC10, 0, 0)
From: Will Deacon will.deacon@arm.com
commit 045afc24124d80c6998d9c770844c67912083506 upstream.
Rather embarrassingly, our futex() FUTEX_WAKE_OP implementation doesn't explicitly set the return value on the non-faulting path and instead leaves it holding the result of the underlying atomic operation. This means that any FUTEX_WAKE_OP atomic operation which computes a non-zero value will be reported as having failed. Regrettably, I wrote the buggy code back in 2011 and it was upstreamed as part of the initial arm64 support in 2012.
The reasons we appear to get away with this are:
1. FUTEX_WAKE_OP is rarely used and therefore doesn't appear to get exercised by futex() test applications
2. If the result of the atomic operation is zero, the system call behaves correctly
3. Prior to version 2.25, the only operation used by GLIBC set the futex to zero, and therefore worked as expected. From 2.25 onwards, FUTEX_WAKE_OP is not used by GLIBC at all.
Fix the implementation by ensuring that the return value is either 0 to indicate that the atomic operation completed successfully, or -EFAULT if we encountered a fault when accessing the user mapping.
Cc: stable@kernel.org Fixes: 6170a97460db ("arm64: Atomic operations") Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/include/asm/futex.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/arch/arm64/include/asm/futex.h +++ b/arch/arm64/include/asm/futex.h @@ -30,8 +30,8 @@ do { \ " prfm pstl1strm, %2\n" \ "1: ldxr %w1, %2\n" \ insn "\n" \ -"2: stlxr %w3, %w0, %2\n" \ -" cbnz %w3, 1b\n" \ +"2: stlxr %w0, %w3, %2\n" \ +" cbnz %w0, 1b\n" \ " dmb ish\n" \ "3:\n" \ " .pushsection .fixup,"ax"\n" \ @@ -50,30 +50,30 @@ do { \ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *_uaddr) { - int oldval = 0, ret, tmp; + int oldval, ret, tmp; u32 __user *uaddr = __uaccess_mask_ptr(_uaddr);
pagefault_disable();
switch (op) { case FUTEX_OP_SET: - __futex_atomic_op("mov %w0, %w4", + __futex_atomic_op("mov %w3, %w4", ret, oldval, uaddr, tmp, oparg); break; case FUTEX_OP_ADD: - __futex_atomic_op("add %w0, %w1, %w4", + __futex_atomic_op("add %w3, %w1, %w4", ret, oldval, uaddr, tmp, oparg); break; case FUTEX_OP_OR: - __futex_atomic_op("orr %w0, %w1, %w4", + __futex_atomic_op("orr %w3, %w1, %w4", ret, oldval, uaddr, tmp, oparg); break; case FUTEX_OP_ANDN: - __futex_atomic_op("and %w0, %w1, %w4", + __futex_atomic_op("and %w3, %w1, %w4", ret, oldval, uaddr, tmp, ~oparg); break; case FUTEX_OP_XOR: - __futex_atomic_op("eor %w0, %w1, %w4", + __futex_atomic_op("eor %w3, %w1, %w4", ret, oldval, uaddr, tmp, oparg); break; default:
From: Tomohiro Mayama parly-gh@iris.mystia.org
commit a8772e5d826d0f61f8aa9c284b3ab49035d5273d upstream.
This patch makes USB ports functioning again.
Fixes: 955bebde057e ("arm64: dts: rockchip: add rk3328-rock64 board") Cc: stable@vger.kernel.org Suggested-by: Robin Murphy robin.murphy@arm.com Signed-off-by: Tomohiro Mayama parly-gh@iris.mystia.org Tested-by: Katsuhiro Suzuki katsuhiro@katsuster.net Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/boot/dts/rockchip/rk3328-rock64.dts | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/rockchip/rk3328-rock64.dts +++ b/arch/arm64/boot/dts/rockchip/rk3328-rock64.dts @@ -46,8 +46,7 @@
vcc_host1_5v: vcc_otg_5v: vcc-host1-5v-regulator { compatible = "regulator-fixed"; - enable-active-high; - gpio = <&gpio0 RK_PA2 GPIO_ACTIVE_HIGH>; + gpio = <&gpio0 RK_PA2 GPIO_ACTIVE_LOW>; pinctrl-names = "default"; pinctrl-0 = <&usb20_host_drv>; regulator-name = "vcc_host1_5v";
From: Peter Geis pgwipeout@gmail.com
commit 6fd8b9780ec1a49ac46e0aaf8775247205e66231 upstream.
Several rk3328 based boards experience high rgmii tx error rates. This is due to several pins in the rk3328.dtsi rgmii pinmux that are missing a defined pull strength setting. This causes the pinmux driver to default to 2ma (bit mask 00).
These pins are only defined in the rk3328.dtsi, and are not listed in the rk3328 specification. The TRM only lists them as "Reserved" (RK3328 TRM V1.1, 3.3.3 Detail Register Description, GRF_GPIO0B_IOMUX, GRF_GPIO0C_IOMUX, GRF_GPIO0D_IOMUX). However, removal of these pins from the rgmii pinmux definition causes the interface to fail to transmit.
Also, the rgmii tx and rx pins defined in the dtsi are not consistent with the rk3328 specification, with tx pins currently set to 12ma and rx pins set to 2ma.
Fix this by setting tx pins to 8ma and the rx pins to 4ma, consistent with the specification. Defining the drive strength for the undefined pins eliminated the high tx packet error rate observed under heavy data transfers. Aligning the drive strength to the TRM values eliminated the occasional packet retry errors under iperf3 testing. This allows much higher data rates with no recorded tx errors.
Tested on the rk3328-roc-cc board.
Fixes: 52e02d377a72 ("arm64: dts: rockchip: add core dtsi file for RK3328 SoCs") Cc: stable@vger.kernel.org Signed-off-by: Peter Geis pgwipeout@gmail.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/boot/dts/rockchip/rk3328.dtsi | 44 +++++++++++++++---------------- 1 file changed, 22 insertions(+), 22 deletions(-)
--- a/arch/arm64/boot/dts/rockchip/rk3328.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3328.dtsi @@ -1628,50 +1628,50 @@ rgmiim1_pins: rgmiim1-pins { rockchip,pins = /* mac_txclk */ - <1 RK_PB4 2 &pcfg_pull_none_12ma>, + <1 RK_PB4 2 &pcfg_pull_none_8ma>, /* mac_rxclk */ - <1 RK_PB5 2 &pcfg_pull_none_2ma>, + <1 RK_PB5 2 &pcfg_pull_none_4ma>, /* mac_mdio */ - <1 RK_PC3 2 &pcfg_pull_none_2ma>, + <1 RK_PC3 2 &pcfg_pull_none_4ma>, /* mac_txen */ - <1 RK_PD1 2 &pcfg_pull_none_12ma>, + <1 RK_PD1 2 &pcfg_pull_none_8ma>, /* mac_clk */ - <1 RK_PC5 2 &pcfg_pull_none_2ma>, + <1 RK_PC5 2 &pcfg_pull_none_4ma>, /* mac_rxdv */ - <1 RK_PC6 2 &pcfg_pull_none_2ma>, + <1 RK_PC6 2 &pcfg_pull_none_4ma>, /* mac_mdc */ - <1 RK_PC7 2 &pcfg_pull_none_2ma>, + <1 RK_PC7 2 &pcfg_pull_none_4ma>, /* mac_rxd1 */ - <1 RK_PB2 2 &pcfg_pull_none_2ma>, + <1 RK_PB2 2 &pcfg_pull_none_4ma>, /* mac_rxd0 */ - <1 RK_PB3 2 &pcfg_pull_none_2ma>, + <1 RK_PB3 2 &pcfg_pull_none_4ma>, /* mac_txd1 */ - <1 RK_PB0 2 &pcfg_pull_none_12ma>, + <1 RK_PB0 2 &pcfg_pull_none_8ma>, /* mac_txd0 */ - <1 RK_PB1 2 &pcfg_pull_none_12ma>, + <1 RK_PB1 2 &pcfg_pull_none_8ma>, /* mac_rxd3 */ - <1 RK_PB6 2 &pcfg_pull_none_2ma>, + <1 RK_PB6 2 &pcfg_pull_none_4ma>, /* mac_rxd2 */ - <1 RK_PB7 2 &pcfg_pull_none_2ma>, + <1 RK_PB7 2 &pcfg_pull_none_4ma>, /* mac_txd3 */ - <1 RK_PC0 2 &pcfg_pull_none_12ma>, + <1 RK_PC0 2 &pcfg_pull_none_8ma>, /* mac_txd2 */ - <1 RK_PC1 2 &pcfg_pull_none_12ma>, + <1 RK_PC1 2 &pcfg_pull_none_8ma>,
/* mac_txclk */ - <0 RK_PB0 1 &pcfg_pull_none>, + <0 RK_PB0 1 &pcfg_pull_none_8ma>, /* mac_txen */ - <0 RK_PB4 1 &pcfg_pull_none>, + <0 RK_PB4 1 &pcfg_pull_none_8ma>, /* mac_clk */ - <0 RK_PD0 1 &pcfg_pull_none>, + <0 RK_PD0 1 &pcfg_pull_none_4ma>, /* mac_txd1 */ - <0 RK_PC0 1 &pcfg_pull_none>, + <0 RK_PC0 1 &pcfg_pull_none_8ma>, /* mac_txd0 */ - <0 RK_PC1 1 &pcfg_pull_none>, + <0 RK_PC1 1 &pcfg_pull_none_8ma>, /* mac_txd3 */ - <0 RK_PC7 1 &pcfg_pull_none>, + <0 RK_PC7 1 &pcfg_pull_none_8ma>, /* mac_txd2 */ - <0 RK_PC6 1 &pcfg_pull_none>; + <0 RK_PC6 1 &pcfg_pull_none_8ma>; };
rmiim1_pins: rmiim1-pins {
From: Will Deacon will.deacon@arm.com
commit 1e6f5440a6814d28c32d347f338bfef68bc3e69d upstream.
Calling dump_backtrace() with a pt_regs argument corresponding to userspace doesn't make any sense and our unwinder will simply print "Call trace:" before unwinding the stack looking for user frames.
Rather than go through this song and dance, just return early if we're passed a user register state.
Cc: stable@vger.kernel.org Fixes: 1149aad10b1e ("arm64: Add dump_backtrace() in show_regs") Reported-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/kernel/traps.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
--- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c @@ -102,10 +102,16 @@ static void dump_instr(const char *lvl, void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) { struct stackframe frame; - int skip; + int skip = 0;
pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
+ if (regs) { + if (user_mode(regs)) + return; + skip = 1; + } + if (!tsk) tsk = current;
@@ -126,7 +132,6 @@ void dump_backtrace(struct pt_regs *regs frame.graph = 0; #endif
- skip = !!regs; printk("Call trace:\n"); do { /* skip until specified stack frame */ @@ -176,15 +181,13 @@ static int __die(const char *str, int er return ret;
print_modules(); - __show_regs(regs); pr_emerg("Process %.*s (pid: %d, stack limit = 0x%p)\n", TASK_COMM_LEN, tsk->comm, task_pid_nr(tsk), end_of_stack(tsk)); + show_regs(regs);
- if (!user_mode(regs)) { - dump_backtrace(regs, tsk); + if (!user_mode(regs)) dump_instr(KERN_EMERG, regs); - }
return ret; }
From: Ard Biesheuvel ard.biesheuvel@linaro.org
commit 5a3ae7b314a2259b1188b22b392f5eba01e443ee upstream.
The ftrace trampoline code (which deals with modules loaded out of BL range of the core kernel) uses plt_entries_equal() to check whether the per-module trampoline equals a zero buffer, to decide whether the trampoline has already been initialized.
This triggers a BUG() in the opcode manipulation code, since we end up checking the ADRP offset of a 0x0 opcode, which is not an ADRP instruction.
So instead, add a helper to check whether a PLT is initialized, and call that from the frace code.
Cc: stable@vger.kernel.org # v5.0 Fixes: bdb85cd1d206 ("arm64/module: switch to ADRP/ADD sequences for PLT entries") Acked-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/include/asm/module.h | 5 +++++ arch/arm64/kernel/ftrace.c | 3 +-- 2 files changed, 6 insertions(+), 2 deletions(-)
--- a/arch/arm64/include/asm/module.h +++ b/arch/arm64/include/asm/module.h @@ -73,4 +73,9 @@ static inline bool is_forbidden_offset_f struct plt_entry get_plt_entry(u64 dst, void *pc); bool plt_entries_equal(const struct plt_entry *a, const struct plt_entry *b);
+static inline bool plt_entry_is_initialized(const struct plt_entry *e) +{ + return e->adrp || e->add || e->br; +} + #endif /* __ASM_MODULE_H */ --- a/arch/arm64/kernel/ftrace.c +++ b/arch/arm64/kernel/ftrace.c @@ -107,8 +107,7 @@ int ftrace_make_call(struct dyn_ftrace * trampoline = get_plt_entry(addr, mod->arch.ftrace_trampoline); if (!plt_entries_equal(mod->arch.ftrace_trampoline, &trampoline)) { - if (!plt_entries_equal(mod->arch.ftrace_trampoline, - &(struct plt_entry){})) { + if (plt_entry_is_initialized(mod->arch.ftrace_trampoline)) { pr_err("ftrace: far branches to multiple entry points unsupported inside a single module\n"); return -EINVAL; }
From: Moni Shoua monis@mellanox.com
commit 1abe186ed8a6593069bc122da55fc684383fdc1c upstream.
If page-fault handler spans multiple MRs then the access mask needs to be reset before each MR handling or otherwise write access will be granted to mapped pages instead of read-only.
Cc: stable@vger.kernel.org # 3.19 Fixes: 7bdf65d411c1 ("IB/mlx5: Handle page faults") Reported-by: Jerome Glisse jglisse@redhat.com Signed-off-by: Moni Shoua monis@mellanox.com Signed-off-by: Leon Romanovsky leonro@mellanox.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/hw/mlx5/odp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/infiniband/hw/mlx5/odp.c +++ b/drivers/infiniband/hw/mlx5/odp.c @@ -560,7 +560,7 @@ static int pagefault_mr(struct mlx5_ib_d struct ib_umem_odp *odp_mr = to_ib_umem_odp(mr->umem); bool downgrade = flags & MLX5_PF_FLAGS_DOWNGRADE; bool prefetch = flags & MLX5_PF_FLAGS_PREFETCH; - u64 access_mask = ODP_READ_ALLOWED_BIT; + u64 access_mask; u64 start_idx, page_mask; struct ib_umem_odp *odp; size_t size; @@ -582,6 +582,7 @@ next_mr: page_shift = mr->umem->page_shift; page_mask = ~(BIT(page_shift) - 1); start_idx = (io_virt - (mr->mmkey.iova & page_mask)) >> page_shift; + access_mask = ODP_READ_ALLOWED_BIT;
if (prefetch && !downgrade && !mr->umem->writable) { /* prefetch with write-access must
From: Dan Carpenter dan.carpenter@oracle.com
commit 42d8644bd77dd2d747e004e367cb0c895a606f39 upstream.
The "call" variable comes from the user in privcmd_ioctl_hypercall(). It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32) elements. We need to put an upper bound on it to prevent an out of bounds access.
Cc: stable@vger.kernel.org Fixes: 1246ae0bb992 ("xen: add variable hypercall caller") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/xen/hypercall.h | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -206,6 +206,9 @@ xen_single_call(unsigned int call, __HYPERCALL_DECLS; __HYPERCALL_5ARG(a1, a2, a3, a4, a5);
+ if (call >= PAGE_SIZE / sizeof(hypercall_page[0])) + return -EINVAL; + asm volatile(CALL_NOSPEC : __HYPERCALL_5PARAM : [thunk_target] "a" (&hypercall_page[call])
From: Mel Gorman mgorman@techsingularity.net
commit 0e9f02450da07fc7b1346c8c32c771555173e397 upstream.
A NULL pointer dereference bug was reported on a distribution kernel but the same issue should be present on mainline kernel. It occured on s390 but should not be arch-specific. A partial oops looks like:
Unable to handle kernel pointer dereference in virtual kernel address space ... Call Trace: ... try_to_wake_up+0xfc/0x450 vhost_poll_wakeup+0x3a/0x50 [vhost] __wake_up_common+0xbc/0x178 __wake_up_common_lock+0x9e/0x160 __wake_up_sync_key+0x4e/0x60 sock_def_readable+0x5e/0x98
The bug hits any time between 1 hour to 3 days. The dereference occurs in update_cfs_rq_h_load when accumulating h_load. The problem is that cfq_rq->h_load_next is not protected by any locking and can be updated by parallel calls to task_h_load. Depending on the compiler, code may be generated that re-reads cfq_rq->h_load_next after the check for NULL and then oops when reading se->avg.load_avg. The dissassembly showed that it was possible to reread h_load_next after the check for NULL.
While this does not appear to be an issue for later compilers, it's still an accident if the correct code is generated. Full locking in this path would have high overhead so this patch uses READ_ONCE to read h_load_next only once and check for NULL before dereferencing. It was confirmed that there were no further oops after 10 days of testing.
As Peter pointed out, it is also necessary to use WRITE_ONCE() to avoid any potential problems with store tearing.
Signed-off-by: Mel Gorman mgorman@techsingularity.net Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Valentin Schneider valentin.schneider@arm.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mike Galbraith efault@gmx.de Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Fixes: 685207963be9 ("sched: Move h_load calculation to task_h_load()") Link: https://lkml.kernel.org/r/20190319123610.nsivgf3mjbjjesxb@techsingularity.ne... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/sched/fair.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7713,10 +7713,10 @@ static void update_cfs_rq_h_load(struct if (cfs_rq->last_h_load_update == now) return;
- cfs_rq->h_load_next = NULL; + WRITE_ONCE(cfs_rq->h_load_next, NULL); for_each_sched_entity(se) { cfs_rq = cfs_rq_of(se); - cfs_rq->h_load_next = se; + WRITE_ONCE(cfs_rq->h_load_next, se); if (cfs_rq->last_h_load_update == now) break; } @@ -7726,7 +7726,7 @@ static void update_cfs_rq_h_load(struct cfs_rq->last_h_load_update = now; }
- while ((se = cfs_rq->h_load_next) != NULL) { + while ((se = READ_ONCE(cfs_rq->h_load_next)) != NULL) { load = cfs_rq->h_load; load = div64_ul(load * se->avg.load_avg, cfs_rq_load_avg(cfs_rq) + 1);
From: Max Filippov jcmvbkbc@gmail.com
commit ada770b1e74a77fff2d5f539bf6c42c25f4784db upstream.
return_address returns the address that is one level higher in the call stack than requested in its argument, because level 0 corresponds to its caller's return address. Use requested level as the number of stack frames to skip.
This fixes the address reported by might_sleep and friends.
Cc: stable@vger.kernel.org Signed-off-by: Max Filippov jcmvbkbc@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/xtensa/kernel/stacktrace.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/arch/xtensa/kernel/stacktrace.c +++ b/arch/xtensa/kernel/stacktrace.c @@ -253,10 +253,14 @@ static int return_address_cb(struct stac return 1; }
+/* + * level == 0 is for the return address from the caller of this function, + * not from this function itself. + */ unsigned long return_address(unsigned level) { struct return_addr_data r = { - .skip = level + 1, + .skip = level, }; walk_stackframe(stack_pointer(NULL), return_address_cb, &r); return r.addr;
From: Dmitry V. Levin ldv@altlinux.org
commit ed3bb007021b9bddb90afae28a19f08ed8890add upstream.
C-SKY syscall arguments are located in orig_a0,a1,a2,a3,regs[0],regs[1] fields of struct pt_regs.
Due to an off-by-one bug and a bug in pointer arithmetic syscall_get_arguments() was reading orig_a0,regs[1..5] fields instead. Likewise, syscall_set_arguments() was writing orig_a0,regs[1..5] fields instead.
Link: http://lkml.kernel.org/r/20190329171230.GB32456@altlinux.org
Fixes: 4859bfca11c7d ("csky: System Call") Cc: Ingo Molnar mingo@redhat.com Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: stable@vger.kernel.org # v4.20+ Tested-by: Guo Ren ren_guo@c-sky.com Acked-by: Guo Ren ren_guo@c-sky.com Signed-off-by: Dmitry V. Levin ldv@altlinux.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/csky/include/asm/syscall.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/arch/csky/include/asm/syscall.h +++ b/arch/csky/include/asm/syscall.h @@ -49,10 +49,11 @@ syscall_get_arguments(struct task_struct if (i == 0) { args[0] = regs->orig_a0; args++; - i++; n--; + } else { + i--; } - memcpy(args, ®s->a1 + i * sizeof(regs->a1), n * sizeof(args[0])); + memcpy(args, ®s->a1 + i, n * sizeof(args[0])); }
static inline void @@ -63,10 +64,11 @@ syscall_set_arguments(struct task_struct if (i == 0) { regs->orig_a0 = args[0]; args++; - i++; n--; + } else { + i--; } - memcpy(®s->a1 + i * sizeof(regs->a1), args, n * sizeof(regs->a0)); + memcpy(®s->a1 + i, args, n * sizeof(regs->a1)); }
static inline int
From: Rasmus Villemoes linux@rasmusvillemoes.dk
commit 88ca66d8540ca26119b1428cddb96b37925bdf01 upstream.
The minimum supported gcc version is >= 4.6, so these can be removed.
Signed-off-by: Rasmus Villemoes linux@rasmusvillemoes.dk Signed-off-by: Borislav Petkov bp@suse.de Cc: "H. Peter Anvin" hpa@zytor.com Cc: Dan Williams dan.j.williams@intel.com Cc: Geert Uytterhoeven geert@linux-m68k.org Cc: Ingo Molnar mingo@redhat.com Cc: Matthew Wilcox willy@infradead.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: x86-ml x86@kernel.org Link: https://lkml.kernel.org/r/20190111084931.24601-1-linux@rasmusvillemoes.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/bitops.h | 6 ------ arch/x86/include/asm/string_32.h | 20 -------------------- arch/x86/include/asm/string_64.h | 15 --------------- 3 files changed, 41 deletions(-)
--- a/arch/x86/include/asm/bitops.h +++ b/arch/x86/include/asm/bitops.h @@ -36,13 +36,7 @@ * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1). */
-#if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 1) -/* Technically wrong, but this avoids compilation errors on some gcc - versions. */ -#define BITOP_ADDR(x) "=m" (*(volatile long *) (x)) -#else #define BITOP_ADDR(x) "+m" (*(volatile long *) (x)) -#endif
#define ADDR BITOP_ADDR(addr)
--- a/arch/x86/include/asm/string_32.h +++ b/arch/x86/include/asm/string_32.h @@ -179,14 +179,7 @@ static inline void *__memcpy3d(void *to, * No 3D Now! */
-#if (__GNUC__ >= 4) #define memcpy(t, f, n) __builtin_memcpy(t, f, n) -#else -#define memcpy(t, f, n) \ - (__builtin_constant_p((n)) \ - ? __constant_memcpy((t), (f), (n)) \ - : __memcpy((t), (f), (n))) -#endif
#endif #endif /* !CONFIG_FORTIFY_SOURCE */ @@ -282,12 +275,7 @@ void *__constant_c_and_count_memset(void
{ int d0, d1; -#if __GNUC__ == 4 && __GNUC_MINOR__ == 0 - /* Workaround for broken gcc 4.0 */ - register unsigned long eax asm("%eax") = pattern; -#else unsigned long eax = pattern; -#endif
switch (count % 4) { case 0: @@ -321,15 +309,7 @@ void *__constant_c_and_count_memset(void #define __HAVE_ARCH_MEMSET extern void *memset(void *, int, size_t); #ifndef CONFIG_FORTIFY_SOURCE -#if (__GNUC__ >= 4) #define memset(s, c, count) __builtin_memset(s, c, count) -#else -#define memset(s, c, count) \ - (__builtin_constant_p(c) \ - ? __constant_c_x_memset((s), (0x01010101UL * (unsigned char)(c)), \ - (count)) \ - : __memset((s), (c), (count))) -#endif #endif /* !CONFIG_FORTIFY_SOURCE */
#define __HAVE_ARCH_MEMSET16 --- a/arch/x86/include/asm/string_64.h +++ b/arch/x86/include/asm/string_64.h @@ -14,21 +14,6 @@ extern void *memcpy(void *to, const void *from, size_t len); extern void *__memcpy(void *to, const void *from, size_t len);
-#ifndef CONFIG_FORTIFY_SOURCE -#if (__GNUC__ == 4 && __GNUC_MINOR__ < 3) || __GNUC__ < 4 -#define memcpy(dst, src, len) \ -({ \ - size_t __len = (len); \ - void *__ret; \ - if (__builtin_constant_p(len) && __len >= 64) \ - __ret = __memcpy((dst), (src), __len); \ - else \ - __ret = __builtin_memcpy((dst), (src), __len); \ - __ret; \ -}) -#endif -#endif /* !CONFIG_FORTIFY_SOURCE */ - #define __HAVE_ARCH_MEMSET void *memset(void *s, int c, size_t n); void *__memset(void *s, int c, size_t n);
From: Alexander Potapenko glider@google.com
commit 5b77e95dd7790ff6c8fbf1cd8d0104ebed818a03 upstream.
There's a number of problems with how arch/x86/include/asm/bitops.h is currently using assembly constraints for the memory region bitops are modifying:
1) Use memory clobber in bitops that touch arbitrary memory
Certain bit operations that read/write bits take a base pointer and an arbitrarily large offset to address the bit relative to that base. Inline assembly constraints aren't expressive enough to tell the compiler that the assembly directive is going to touch a specific memory location of unknown size, therefore we have to use the "memory" clobber to indicate that the assembly is going to access memory locations other than those listed in the inputs/outputs.
To indicate that BTR/BTS instructions don't necessarily touch the first sizeof(long) bytes of the argument, we also move the address to assembly inputs.
This particular change leads to size increase of 124 kernel functions in a defconfig build. For some of them the diff is in NOP operations, other end up re-reading values from memory and may potentially slow down the execution. But without these clobbers the compiler is free to cache the contents of the bitmaps and use them as if they weren't changed by the inline assembly.
2) Use byte-sized arguments for operations touching single bytes.
Passing a long value to ANDB/ORB/XORB instructions makes the compiler treat sizeof(long) bytes as being clobbered, which isn't the case. This may theoretically lead to worse code in the case of heavy optimization.
Practical impact:
I've built a defconfig kernel and looked through some of the functions generated by GCC 7.3.0 with and without this clobber, and didn't spot any miscompilations.
However there is a (trivial) theoretical case where this code leads to miscompilation:
https://lkml.org/lkml/2019/3/28/393
using just GCC 8.3.0 with -O2. It isn't hard to imagine someone writes such a function in the kernel someday.
So the primary motivation is to fix an existing misuse of the asm directive, which happens to work in certain configurations now, but isn't guaranteed to work under different circumstances.
[ --mingo: Added -stable tag because defconfig only builds a fraction of the kernel and the trivial testcase looks normal enough to be used in existing or in-development code. ]
Signed-off-by: Alexander Potapenko glider@google.com Cc: stable@vger.kernel.org Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Brian Gerst brgerst@gmail.com Cc: Denys Vlasenko dvlasenk@redhat.com Cc: Dmitry Vyukov dvyukov@google.com Cc: H. Peter Anvin hpa@zytor.com Cc: James Y Knight jyknight@google.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Paul E. McKenney paulmck@linux.ibm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/20190402112813.193378-1-glider@google.com [ Edited the changelog, tidied up one of the defines. ] Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/bitops.h | 41 ++++++++++++++++++----------------------- 1 file changed, 18 insertions(+), 23 deletions(-)
--- a/arch/x86/include/asm/bitops.h +++ b/arch/x86/include/asm/bitops.h @@ -36,16 +36,17 @@ * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1). */
-#define BITOP_ADDR(x) "+m" (*(volatile long *) (x)) +#define RLONG_ADDR(x) "m" (*(volatile long *) (x)) +#define WBYTE_ADDR(x) "+m" (*(volatile char *) (x))
-#define ADDR BITOP_ADDR(addr) +#define ADDR RLONG_ADDR(addr)
/* * We do the locked ops that don't return the old value as * a mask operation on a byte. */ #define IS_IMMEDIATE(nr) (__builtin_constant_p(nr)) -#define CONST_MASK_ADDR(nr, addr) BITOP_ADDR((void *)(addr) + ((nr)>>3)) +#define CONST_MASK_ADDR(nr, addr) WBYTE_ADDR((void *)(addr) + ((nr)>>3)) #define CONST_MASK(nr) (1 << ((nr) & 7))
/** @@ -73,7 +74,7 @@ set_bit(long nr, volatile unsigned long : "memory"); } else { asm volatile(LOCK_PREFIX __ASM_SIZE(bts) " %1,%0" - : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); + : : RLONG_ADDR(addr), "Ir" (nr) : "memory"); } }
@@ -88,7 +89,7 @@ set_bit(long nr, volatile unsigned long */ static __always_inline void __set_bit(long nr, volatile unsigned long *addr) { - asm volatile(__ASM_SIZE(bts) " %1,%0" : ADDR : "Ir" (nr) : "memory"); + asm volatile(__ASM_SIZE(bts) " %1,%0" : : ADDR, "Ir" (nr) : "memory"); }
/** @@ -110,8 +111,7 @@ clear_bit(long nr, volatile unsigned lon : "iq" ((u8)~CONST_MASK(nr))); } else { asm volatile(LOCK_PREFIX __ASM_SIZE(btr) " %1,%0" - : BITOP_ADDR(addr) - : "Ir" (nr)); + : : RLONG_ADDR(addr), "Ir" (nr) : "memory"); } }
@@ -131,7 +131,7 @@ static __always_inline void clear_bit_un
static __always_inline void __clear_bit(long nr, volatile unsigned long *addr) { - asm volatile(__ASM_SIZE(btr) " %1,%0" : ADDR : "Ir" (nr)); + asm volatile(__ASM_SIZE(btr) " %1,%0" : : ADDR, "Ir" (nr) : "memory"); }
static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr) @@ -139,7 +139,7 @@ static __always_inline bool clear_bit_un bool negative; asm volatile(LOCK_PREFIX "andb %2,%1" CC_SET(s) - : CC_OUT(s) (negative), ADDR + : CC_OUT(s) (negative), WBYTE_ADDR(addr) : "ir" ((char) ~(1 << nr)) : "memory"); return negative; } @@ -155,13 +155,9 @@ static __always_inline bool clear_bit_un * __clear_bit() is non-atomic and implies release semantics before the memory * operation. It can be used for an unlock if no other CPUs can concurrently * modify other bits in the word. - * - * No memory barrier is required here, because x86 cannot reorder stores past - * older loads. Same principle as spin_unlock. */ static __always_inline void __clear_bit_unlock(long nr, volatile unsigned long *addr) { - barrier(); __clear_bit(nr, addr); }
@@ -176,7 +172,7 @@ static __always_inline void __clear_bit_ */ static __always_inline void __change_bit(long nr, volatile unsigned long *addr) { - asm volatile(__ASM_SIZE(btc) " %1,%0" : ADDR : "Ir" (nr)); + asm volatile(__ASM_SIZE(btc) " %1,%0" : : ADDR, "Ir" (nr) : "memory"); }
/** @@ -196,8 +192,7 @@ static __always_inline void change_bit(l : "iq" ((u8)CONST_MASK(nr))); } else { asm volatile(LOCK_PREFIX __ASM_SIZE(btc) " %1,%0" - : BITOP_ADDR(addr) - : "Ir" (nr)); + : : RLONG_ADDR(addr), "Ir" (nr) : "memory"); } }
@@ -242,8 +237,8 @@ static __always_inline bool __test_and_s
asm(__ASM_SIZE(bts) " %2,%1" CC_SET(c) - : CC_OUT(c) (oldbit), ADDR - : "Ir" (nr)); + : CC_OUT(c) (oldbit) + : ADDR, "Ir" (nr) : "memory"); return oldbit; }
@@ -282,8 +277,8 @@ static __always_inline bool __test_and_c
asm volatile(__ASM_SIZE(btr) " %2,%1" CC_SET(c) - : CC_OUT(c) (oldbit), ADDR - : "Ir" (nr)); + : CC_OUT(c) (oldbit) + : ADDR, "Ir" (nr) : "memory"); return oldbit; }
@@ -294,8 +289,8 @@ static __always_inline bool __test_and_c
asm volatile(__ASM_SIZE(btc) " %2,%1" CC_SET(c) - : CC_OUT(c) (oldbit), ADDR - : "Ir" (nr) : "memory"); + : CC_OUT(c) (oldbit) + : ADDR, "Ir" (nr) : "memory");
return oldbit; } @@ -326,7 +321,7 @@ static __always_inline bool variable_tes asm volatile(__ASM_SIZE(bt) " %2,%1" CC_SET(c) : CC_OUT(c) (oldbit) - : "m" (*(unsigned long *)addr), "Ir" (nr)); + : "m" (*(unsigned long *)addr), "Ir" (nr) : "memory");
return oldbit; }
From: Lendacky, Thomas Thomas.Lendacky@amd.com
commit 914123fa39042e651d79eaf86bbf63a1b938dddf upstream.
On AMD processors, the detection of an overflowed counter in the NMI handler relies on the current value of the counter. So, for example, to check for overflow on a 48 bit counter, bit 47 is checked to see if it is 1 (not overflowed) or 0 (overflowed).
There is currently a race condition present when disabling and then updating the PMC. Increased NMI latency in newer AMD processors makes this race condition more pronounced. If the counter value has overflowed, it is possible to update the PMC value before the NMI handler can run. The updated PMC value is not an overflowed value, so when the perf NMI handler does run, it will not find an overflowed counter. This may appear as an unknown NMI resulting in either a panic or a series of messages, depending on how the kernel is configured.
To eliminate this race condition, the PMC value must be checked after disabling the counter. Add an AMD function, amd_pmu_disable_all(), that will wait for the NMI handler to reset any active and overflowed counter after calling x86_pmu_disable_all().
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org # 4.14.x- Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@kernel.org Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/amd/core.c | 65 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 62 insertions(+), 3 deletions(-)
--- a/arch/x86/events/amd/core.c +++ b/arch/x86/events/amd/core.c @@ -3,6 +3,7 @@ #include <linux/types.h> #include <linux/init.h> #include <linux/slab.h> +#include <linux/delay.h> #include <asm/apicdef.h>
#include "../perf_event.h" @@ -429,6 +430,64 @@ static void amd_pmu_cpu_dead(int cpu) } }
+/* + * When a PMC counter overflows, an NMI is used to process the event and + * reset the counter. NMI latency can result in the counter being updated + * before the NMI can run, which can result in what appear to be spurious + * NMIs. This function is intended to wait for the NMI to run and reset + * the counter to avoid possible unhandled NMI messages. + */ +#define OVERFLOW_WAIT_COUNT 50 + +static void amd_pmu_wait_on_overflow(int idx) +{ + unsigned int i; + u64 counter; + + /* + * Wait for the counter to be reset if it has overflowed. This loop + * should exit very, very quickly, but just in case, don't wait + * forever... + */ + for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) { + rdmsrl(x86_pmu_event_addr(idx), counter); + if (counter & (1ULL << (x86_pmu.cntval_bits - 1))) + break; + + /* Might be in IRQ context, so can't sleep */ + udelay(1); + } +} + +static void amd_pmu_disable_all(void) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + int idx; + + x86_pmu_disable_all(); + + /* + * This shouldn't be called from NMI context, but add a safeguard here + * to return, since if we're in NMI context we can't wait for an NMI + * to reset an overflowed counter value. + */ + if (in_nmi()) + return; + + /* + * Check each counter for overflow and wait for it to be reset by the + * NMI if it has overflowed. This relies on the fact that all active + * counters are always enabled when this function is caled and + * ARCH_PERFMON_EVENTSEL_INT is always set. + */ + for (idx = 0; idx < x86_pmu.num_counters; idx++) { + if (!test_bit(idx, cpuc->active_mask)) + continue; + + amd_pmu_wait_on_overflow(idx); + } +} + static struct event_constraint * amd_get_event_constraints(struct cpu_hw_events *cpuc, int idx, struct perf_event *event) @@ -622,7 +681,7 @@ static ssize_t amd_event_sysfs_show(char static __initconst const struct x86_pmu amd_pmu = { .name = "AMD", .handle_irq = x86_pmu_handle_irq, - .disable_all = x86_pmu_disable_all, + .disable_all = amd_pmu_disable_all, .enable_all = x86_pmu_enable_all, .enable = x86_pmu_enable_event, .disable = x86_pmu_disable_event, @@ -732,7 +791,7 @@ void amd_pmu_enable_virt(void) cpuc->perf_ctr_virt_mask = 0;
/* Reload all events */ - x86_pmu_disable_all(); + amd_pmu_disable_all(); x86_pmu_enable_all(0); } EXPORT_SYMBOL_GPL(amd_pmu_enable_virt); @@ -750,7 +809,7 @@ void amd_pmu_disable_virt(void) cpuc->perf_ctr_virt_mask = AMD64_EVENTSEL_HOSTONLY;
/* Reload all events */ - x86_pmu_disable_all(); + amd_pmu_disable_all(); x86_pmu_enable_all(0); } EXPORT_SYMBOL_GPL(amd_pmu_disable_virt);
From: Lendacky, Thomas Thomas.Lendacky@amd.com
commit 6d3edaae16c6c7d238360f2841212c2b26774d5e upstream.
On AMD processors, the detection of an overflowed PMC counter in the NMI handler relies on the current value of the PMC. So, for example, to check for overflow on a 48-bit counter, bit 47 is checked to see if it is 1 (not overflowed) or 0 (overflowed).
When the perf NMI handler executes it does not know in advance which PMC counters have overflowed. As such, the NMI handler will process all active PMC counters that have overflowed. NMI latency in newer AMD processors can result in multiple overflowed PMC counters being processed in one NMI and then a subsequent NMI, that does not appear to be a back-to-back NMI, not finding any PMC counters that have overflowed. This may appear to be an unhandled NMI resulting in either a panic or a series of messages, depending on how the kernel was configured.
To mitigate this issue, add an AMD handle_irq callback function, amd_pmu_handle_irq(), that will invoke the common x86_pmu_handle_irq() function and upon return perform some additional processing that will indicate if the NMI has been handled or would have been handled had an earlier NMI not handled the overflowed PMC. Using a per-CPU variable, a minimum value of the number of active PMCs or 2 will be set whenever a PMC is active. This is used to indicate the possible number of NMIs that can still occur. The value of 2 is used for when an NMI does not arrive at the LAPIC in time to be collapsed into an already pending NMI. Each time the function is called without having handled an overflowed counter, the per-CPU value is checked. If the value is non-zero, it is decremented and the NMI indicates that it handled the NMI. If the value is zero, then the NMI indicates that it did not handle the NMI.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org # 4.14.x- Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@kernel.org Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/amd/core.c | 56 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 55 insertions(+), 1 deletion(-)
--- a/arch/x86/events/amd/core.c +++ b/arch/x86/events/amd/core.c @@ -4,10 +4,13 @@ #include <linux/init.h> #include <linux/slab.h> #include <linux/delay.h> +#include <linux/nmi.h> #include <asm/apicdef.h>
#include "../perf_event.h"
+static DEFINE_PER_CPU(unsigned int, perf_nmi_counter); + static __initconst const u64 amd_hw_cache_event_ids [PERF_COUNT_HW_CACHE_MAX] [PERF_COUNT_HW_CACHE_OP_MAX] @@ -488,6 +491,57 @@ static void amd_pmu_disable_all(void) } }
+/* + * Because of NMI latency, if multiple PMC counters are active or other sources + * of NMIs are received, the perf NMI handler can handle one or more overflowed + * PMC counters outside of the NMI associated with the PMC overflow. If the NMI + * doesn't arrive at the LAPIC in time to become a pending NMI, then the kernel + * back-to-back NMI support won't be active. This PMC handler needs to take into + * account that this can occur, otherwise this could result in unknown NMI + * messages being issued. Examples of this is PMC overflow while in the NMI + * handler when multiple PMCs are active or PMC overflow while handling some + * other source of an NMI. + * + * Attempt to mitigate this by using the number of active PMCs to determine + * whether to return NMI_HANDLED if the perf NMI handler did not handle/reset + * any PMCs. The per-CPU perf_nmi_counter variable is set to a minimum of the + * number of active PMCs or 2. The value of 2 is used in case an NMI does not + * arrive at the LAPIC in time to be collapsed into an already pending NMI. + */ +static int amd_pmu_handle_irq(struct pt_regs *regs) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + int active, handled; + + /* + * Obtain the active count before calling x86_pmu_handle_irq() since + * it is possible that x86_pmu_handle_irq() may make a counter + * inactive (through x86_pmu_stop). + */ + active = __bitmap_weight(cpuc->active_mask, X86_PMC_IDX_MAX); + + /* Process any counter overflows */ + handled = x86_pmu_handle_irq(regs); + + /* + * If a counter was handled, record the number of possible remaining + * NMIs that can occur. + */ + if (handled) { + this_cpu_write(perf_nmi_counter, + min_t(unsigned int, 2, active)); + + return handled; + } + + if (!this_cpu_read(perf_nmi_counter)) + return NMI_DONE; + + this_cpu_dec(perf_nmi_counter); + + return NMI_HANDLED; +} + static struct event_constraint * amd_get_event_constraints(struct cpu_hw_events *cpuc, int idx, struct perf_event *event) @@ -680,7 +734,7 @@ static ssize_t amd_event_sysfs_show(char
static __initconst const struct x86_pmu amd_pmu = { .name = "AMD", - .handle_irq = x86_pmu_handle_irq, + .handle_irq = amd_pmu_handle_irq, .disable_all = amd_pmu_disable_all, .enable_all = x86_pmu_enable_all, .enable = x86_pmu_enable_event,
From: Lendacky, Thomas Thomas.Lendacky@amd.com
commit 3966c3feca3fd10b2935caa0b4a08c7dd59469e5 upstream.
Spurious interrupt support was added to perf in the following commit, almost a decade ago:
63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters")
The two previous patches (resolving the race condition when disabling a PMC and NMI latency mitigation) allow for the removal of this older spurious interrupt support.
Currently in x86_pmu_stop(), the bit for the PMC in the active_mask bitmap is cleared before disabling the PMC, which sets up a race condition. This race condition was mitigated by introducing the running bitmap. That race condition can be eliminated by first disabling the PMC, waiting for PMC reset on overflow and then clearing the bit for the PMC in the active_mask bitmap. The NMI handler will not re-enable a disabled counter.
If x86_pmu_stop() is called from the perf NMI handler, the NMI latency mitigation support will guard against any unhandled NMI messages.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org # 4.14.x- Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@kernel.org Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/amd/core.c | 21 +++++++++++++++++++-- arch/x86/events/core.c | 13 +++---------- 2 files changed, 22 insertions(+), 12 deletions(-)
--- a/arch/x86/events/amd/core.c +++ b/arch/x86/events/amd/core.c @@ -4,8 +4,8 @@ #include <linux/init.h> #include <linux/slab.h> #include <linux/delay.h> -#include <linux/nmi.h> #include <asm/apicdef.h> +#include <asm/nmi.h>
#include "../perf_event.h"
@@ -491,6 +491,23 @@ static void amd_pmu_disable_all(void) } }
+static void amd_pmu_disable_event(struct perf_event *event) +{ + x86_pmu_disable_event(event); + + /* + * This can be called from NMI context (via x86_pmu_stop). The counter + * may have overflowed, but either way, we'll never see it get reset + * by the NMI if we're already in the NMI. And the NMI latency support + * below will take care of any pending NMI that might have been + * generated by the overflow. + */ + if (in_nmi()) + return; + + amd_pmu_wait_on_overflow(event->hw.idx); +} + /* * Because of NMI latency, if multiple PMC counters are active or other sources * of NMIs are received, the perf NMI handler can handle one or more overflowed @@ -738,7 +755,7 @@ static __initconst const struct x86_pmu .disable_all = amd_pmu_disable_all, .enable_all = x86_pmu_enable_all, .enable = x86_pmu_enable_event, - .disable = x86_pmu_disable_event, + .disable = amd_pmu_disable_event, .hw_config = amd_pmu_hw_config, .schedule_events = x86_schedule_events, .eventsel = MSR_K7_EVNTSEL0, --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -1349,8 +1349,9 @@ void x86_pmu_stop(struct perf_event *eve struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); struct hw_perf_event *hwc = &event->hw;
- if (__test_and_clear_bit(hwc->idx, cpuc->active_mask)) { + if (test_bit(hwc->idx, cpuc->active_mask)) { x86_pmu.disable(event); + __clear_bit(hwc->idx, cpuc->active_mask); cpuc->events[hwc->idx] = NULL; WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED); hwc->state |= PERF_HES_STOPPED; @@ -1447,16 +1448,8 @@ int x86_pmu_handle_irq(struct pt_regs *r apic_write(APIC_LVTPC, APIC_DM_NMI);
for (idx = 0; idx < x86_pmu.num_counters; idx++) { - if (!test_bit(idx, cpuc->active_mask)) { - /* - * Though we deactivated the counter some cpus - * might still deliver spurious interrupts still - * in flight. Catch them: - */ - if (__test_and_clear_bit(idx, cpuc->running)) - handled++; + if (!test_bit(idx, cpuc->active_mask)) continue; - }
event = cpuc->events[idx];
From: Andre Przywara andre.przywara@arm.com
commit 9cde402a59770a0669d895399c13407f63d7d209 upstream.
There is a Marvell 88SE9170 PCIe SATA controller I found on a board here. Some quick testing with the ARM SMMU enabled reveals that it suffers from the same requester ID mixup problems as the other Marvell chips listed already.
Add the PCI vendor/device ID to the list of chips which need the workaround.
Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com CC: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/quirks.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -3877,6 +3877,8 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_M /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c14 */ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9130, quirk_dma_func1_alias); +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9170, + quirk_dma_func1_alias); /* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c47 + c57 */ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9172, quirk_dma_func1_alias);
From: Sergey Miroshnichenko s.miroshnichenko@yadro.com
commit 3943af9d01e94330d0cfac6fccdbc829aad50c92 upstream.
During a safe hot remove, the OS powers off the slot, which may cause a Data Link Layer State Changed event. The slot has already been set to OFF_STATE, so that event results in re-enabling the device, making it impossible to safely remove it.
Clear out the Presence Detect Changed and Data Link Layer State Changed events when the disabled slot has settled down.
It is still possible to re-enable the device if it remains in the slot after pressing the Attention Button by pressing it again.
Fixes the problem that Micah reported below: an NVMe drive power button may not actually turn off the drive.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203237 Reported-by: Micah Parrish micah.parrish@hpe.com Tested-by: Micah Parrish micah.parrish@hpe.com Signed-off-by: Sergey Miroshnichenko s.miroshnichenko@yadro.com [bhelgaas: changelog, add bugzilla URL] Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Lukas Wunner lukas@wunner.de Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/hotplug/pciehp_ctrl.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/pci/hotplug/pciehp_ctrl.c +++ b/drivers/pci/hotplug/pciehp_ctrl.c @@ -115,6 +115,10 @@ static void remove_board(struct controll * removed from the slot/adapter. */ msleep(1000); + + /* Ignore link or presence changes caused by power off */ + atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC), + &ctrl->pending_events); }
/* turn off Green LED */
From: Chuck Lever chuck.lever@oracle.com
commit e1ede312f17e96a9c5cda9aaa1cdcf442c1a5da8 upstream.
We want to drain only the RQ first. Otherwise the transport can deadlock on ->close if there are outstanding Send completions.
Fixes: 6d2d0ee27c7a ("xprtrdma: Replace rpcrdma_receive_wq ... ") Signed-off-by: Chuck Lever chuck.lever@oracle.com Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/sunrpc/xprtrdma/verbs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -90,7 +90,7 @@ static void rpcrdma_xprt_drain(struct rp /* Flush Receives, then wait for deferred Reply work * to complete. */ - ib_drain_qp(ia->ri_id->qp); + ib_drain_rq(ia->ri_id->qp); drain_workqueue(buf->rb_completion_wq);
/* Deferred Reply processing might have scheduled
From: Nicholas Piggin npiggin@gmail.com
commit 7100e8704b61247649c50551b965e71d168df30b upstream.
Commit 48e7b76957 ("powerpc/64s/hash: Convert SLB miss handlers to C") broke the radix-mode segment exception handler. In radix mode, this is exception is not an SLB miss, rather it signals that the EA is outside the range translated by any page table.
The commit lost the radix feature alternate code patch, which can cause faults to some EAs to kernel BUG at arch/powerpc/mm/slb.c:639!
The original radix code would send faults to slb_miss_large_addr, which would end up faulting due to slb_addr_limit being 0. This patch sends radix directly to do_bad_slb_fault, which is a bit clearer.
Fixes: 48e7b7695745 ("powerpc/64s/hash: Convert SLB miss handlers to C") Cc: stable@vger.kernel.org # v4.20+ Reported-by: Anton Blanchard anton@samba.org Signed-off-by: Nicholas Piggin npiggin@gmail.com Reviewed-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kernel/exceptions-64s.S | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -612,11 +612,17 @@ EXC_COMMON_BEGIN(data_access_slb_common) ld r4,PACA_EXSLB+EX_DAR(r13) std r4,_DAR(r1) addi r3,r1,STACK_FRAME_OVERHEAD +BEGIN_MMU_FTR_SECTION + /* HPT case, do SLB fault */ bl do_slb_fault cmpdi r3,0 bne- 1f b fast_exception_return 1: /* Error case */ +MMU_FTR_SECTION_ELSE + /* Radix case, access is outside page table range */ + li r3,-EFAULT +ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) std r3,RESULT(r1) bl save_nvgprs RECONCILE_IRQ_STATE(r10, r11) @@ -661,11 +667,17 @@ EXC_COMMON_BEGIN(instruction_access_slb_ EXCEPTION_PROLOG_COMMON(0x480, PACA_EXSLB) ld r4,_NIP(r1) addi r3,r1,STACK_FRAME_OVERHEAD +BEGIN_MMU_FTR_SECTION + /* HPT case, do SLB fault */ bl do_slb_fault cmpdi r3,0 bne- 1f b fast_exception_return 1: /* Error case */ +MMU_FTR_SECTION_ELSE + /* Radix case, access is outside page table range */ + li r3,-EFAULT +ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) std r3,RESULT(r1) bl save_nvgprs RECONCILE_IRQ_STATE(r10, r11)
From: Mikulas Patocka mpatocka@redhat.com
commit 0d74e6a3b6421d98eeafbed26f29156d469bc0b5 upstream.
If the string opt_string is small, the function memcmp can access bytes that are beyond the terminating nul character. In theory, it could cause segfault, if opt_string were located just below some unmapped memory.
Change from memcmp to strncmp so that we don't read bytes beyond the end of the string.
Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-integrity.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/md/dm-integrity.c +++ b/drivers/md/dm-integrity.c @@ -3185,7 +3185,7 @@ static int dm_integrity_ctr(struct dm_ta journal_watermark = val; else if (sscanf(opt_string, "commit_time:%u%c", &val, &dummy) == 1) sync_msec = val; - else if (!memcmp(opt_string, "meta_device:", strlen("meta_device:"))) { + else if (!strncmp(opt_string, "meta_device:", strlen("meta_device:"))) { if (ic->meta_dev) { dm_put_device(ti, ic->meta_dev); ic->meta_dev = NULL; @@ -3204,17 +3204,17 @@ static int dm_integrity_ctr(struct dm_ta goto bad; } ic->sectors_per_block = val >> SECTOR_SHIFT; - } else if (!memcmp(opt_string, "internal_hash:", strlen("internal_hash:"))) { + } else if (!strncmp(opt_string, "internal_hash:", strlen("internal_hash:"))) { r = get_alg_and_key(opt_string, &ic->internal_hash_alg, &ti->error, "Invalid internal_hash argument"); if (r) goto bad; - } else if (!memcmp(opt_string, "journal_crypt:", strlen("journal_crypt:"))) { + } else if (!strncmp(opt_string, "journal_crypt:", strlen("journal_crypt:"))) { r = get_alg_and_key(opt_string, &ic->journal_crypt_alg, &ti->error, "Invalid journal_crypt argument"); if (r) goto bad; - } else if (!memcmp(opt_string, "journal_mac:", strlen("journal_mac:"))) { + } else if (!strncmp(opt_string, "journal_mac:", strlen("journal_mac:"))) { r = get_alg_and_key(opt_string, &ic->journal_mac_alg, &ti->error, "Invalid journal_mac argument"); if (r)
From: Mikulas Patocka mpatocka@redhat.com
commit 75ae193626de3238ca5fb895868ec91c94e63b1b upstream.
The limit was already incorporated to dm-crypt with commit 4e870e948fba ("dm crypt: fix error with too large bios"), so we don't need to apply it globally to all targets. The quantity BIO_MAX_PAGES * PAGE_SIZE is wrong anyway because the variable ti->max_io_len it is supposed to be in the units of 512-byte sectors not in bytes.
Reduction of the limit to 1048576 sectors could even cause data corruption in rare cases - suppose that we have a dm-striped device with stripe size 768MiB. The target will call dm_set_target_max_io_len with the value 1572864. The buggy code would reduce it to 1048576. Now, the dm-core will errorneously split the bios on 1048576-sector boundary insetad of 1572864-sector boundary and pass these stripe-crossing bios to the striped target.
Cc: stable@vger.kernel.org # v4.16+ Fixes: 8f50e358153d ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE") Signed-off-by: Mikulas Patocka mpatocka@redhat.com Acked-by: Ming Lei ming.lei@redhat.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-)
--- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1060,15 +1060,7 @@ int dm_set_target_max_io_len(struct dm_t return -EINVAL; }
- /* - * BIO based queue uses its own splitting. When multipage bvecs - * is switched on, size of the incoming bio may be too big to - * be handled in some targets, such as crypt. - * - * When these targets are ready for the big bio, we can remove - * the limit. - */ - ti->max_io_len = min_t(uint32_t, len, BIO_MAX_PAGES * PAGE_SIZE); + ti->max_io_len = (uint32_t) len;
return 0; }
From: Ilya Dryomov idryomov@gmail.com
commit eb40c0acdc342b815d4d03ae6abb09e80c0f2988 upstream.
Some devices don't use blk_integrity but still want stable pages because they do their own checksumming. Examples include rbd and iSCSI when data digests are negotiated. Stacking DM (and thus LVM) on top of these devices results in sporadic checksum errors.
Set BDI_CAP_STABLE_WRITES if any underlying device has it set.
Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-table.c | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+)
--- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -1852,6 +1852,36 @@ static bool dm_table_supports_secure_era return true; }
+static int device_requires_stable_pages(struct dm_target *ti, + struct dm_dev *dev, sector_t start, + sector_t len, void *data) +{ + struct request_queue *q = bdev_get_queue(dev->bdev); + + return q && bdi_cap_stable_pages_required(q->backing_dev_info); +} + +/* + * If any underlying device requires stable pages, a table must require + * them as well. Only targets that support iterate_devices are considered: + * don't want error, zero, etc to require stable pages. + */ +static bool dm_table_requires_stable_pages(struct dm_table *t) +{ + struct dm_target *ti; + unsigned i; + + for (i = 0; i < dm_table_get_num_targets(t); i++) { + ti = dm_table_get_target(t, i); + + if (ti->type->iterate_devices && + ti->type->iterate_devices(ti, device_requires_stable_pages, NULL)) + return true; + } + + return false; +} + void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q, struct queue_limits *limits) { @@ -1910,6 +1940,15 @@ void dm_table_set_restrictions(struct dm dm_table_verify_integrity(t);
/* + * Some devices don't use blk_integrity but still want stable pages + * because they do their own checksumming. + */ + if (dm_table_requires_stable_pages(t)) + q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES; + else + q->backing_dev_info->capabilities &= ~BDI_CAP_STABLE_WRITES; + + /* * Determine whether or not this queue's I/O timings contribute * to the entropy pool, Only request-based targets use this. * Clear QUEUE_FLAG_ADD_RANDOM if any underlying device does not
From: Mike Snitzer snitzer@redhat.com
commit bcb44433bba5eaff293888ef22ffa07f1f0347d6 upstream.
Storage devices which report supporting discard commands like WRITE_SAME_16 with unmap, but reject discard commands sent to the storage device. This is a clear storage firmware bug but it doesn't change the fact that should a program cause discards to be sent to a multipath device layered on this buggy storage, all paths can end up failed at the same time from the discards, causing possible I/O loss.
The first discard to a path will fail with Illegal Request, Invalid field in cdb, e.g.: kernel: sd 8:0:8:19: [sdfn] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE kernel: sd 8:0:8:19: [sdfn] tag#0 Sense Key : Illegal Request [current] kernel: sd 8:0:8:19: [sdfn] tag#0 Add. Sense: Invalid field in cdb kernel: sd 8:0:8:19: [sdfn] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 a0 08 00 00 00 80 00 00 00 kernel: blk_update_request: critical target error, dev sdfn, sector 10487808
The SCSI layer converts this to the BLK_STS_TARGET error number, the sd device disables its support for discard on this path, and because of the BLK_STS_TARGET error multipath fails the discard without failing any path or retrying down a different path. But subsequent discards can cause path failures. Any discards sent to the path which already failed a discard ends up failing with EIO from blk_cloned_rq_check_limits with an "over max size limit" error since the discard limit was set to 0 by the sd driver for the path. As the error is EIO, this now fails the path and multipath tries to send the discard down the next path. This cycle continues as discards are sent until all paths fail.
Fix this by training DM core to disable DISCARD if the underlying storage already did so.
Also, fix branching in dm_done() and clone_endio() to reflect the mutually exclussive nature of the IO operations in question.
Cc: stable@vger.kernel.org Reported-by: David Jeffery djeffery@redhat.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-core.h | 1 + drivers/md/dm-rq.c | 11 +++++++---- drivers/md/dm.c | 20 ++++++++++++++++---- 3 files changed, 24 insertions(+), 8 deletions(-)
--- a/drivers/md/dm-core.h +++ b/drivers/md/dm-core.h @@ -115,6 +115,7 @@ struct mapped_device { struct srcu_struct io_barrier; };
+void disable_discard(struct mapped_device *md); void disable_write_same(struct mapped_device *md); void disable_write_zeroes(struct mapped_device *md);
--- a/drivers/md/dm-rq.c +++ b/drivers/md/dm-rq.c @@ -206,11 +206,14 @@ static void dm_done(struct request *clon }
if (unlikely(error == BLK_STS_TARGET)) { - if (req_op(clone) == REQ_OP_WRITE_SAME && - !clone->q->limits.max_write_same_sectors) + if (req_op(clone) == REQ_OP_DISCARD && + !clone->q->limits.max_discard_sectors) + disable_discard(tio->md); + else if (req_op(clone) == REQ_OP_WRITE_SAME && + !clone->q->limits.max_write_same_sectors) disable_write_same(tio->md); - if (req_op(clone) == REQ_OP_WRITE_ZEROES && - !clone->q->limits.max_write_zeroes_sectors) + else if (req_op(clone) == REQ_OP_WRITE_ZEROES && + !clone->q->limits.max_write_zeroes_sectors) disable_write_zeroes(tio->md); }
--- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -963,6 +963,15 @@ static void dec_pending(struct dm_io *io } }
+void disable_discard(struct mapped_device *md) +{ + struct queue_limits *limits = dm_get_queue_limits(md); + + /* device doesn't really support DISCARD, disable it */ + limits->max_discard_sectors = 0; + blk_queue_flag_clear(QUEUE_FLAG_DISCARD, md->queue); +} + void disable_write_same(struct mapped_device *md) { struct queue_limits *limits = dm_get_queue_limits(md); @@ -988,11 +997,14 @@ static void clone_endio(struct bio *bio) dm_endio_fn endio = tio->ti->type->end_io;
if (unlikely(error == BLK_STS_TARGET) && md->type != DM_TYPE_NVME_BIO_BASED) { - if (bio_op(bio) == REQ_OP_WRITE_SAME && - !bio->bi_disk->queue->limits.max_write_same_sectors) + if (bio_op(bio) == REQ_OP_DISCARD && + !bio->bi_disk->queue->limits.max_discard_sectors) + disable_discard(md); + else if (bio_op(bio) == REQ_OP_WRITE_SAME && + !bio->bi_disk->queue->limits.max_write_same_sectors) disable_write_same(md); - if (bio_op(bio) == REQ_OP_WRITE_ZEROES && - !bio->bi_disk->queue->limits.max_write_zeroes_sectors) + else if (bio_op(bio) == REQ_OP_WRITE_ZEROES && + !bio->bi_disk->queue->limits.max_write_zeroes_sectors) disable_write_zeroes(md); }
From: Mikulas Patocka mpatocka@redhat.com
commit 4ed319c6ac08e9a28fca7ac188181ac122f4de84 upstream.
dm-integrity will deadlock if overlapping I/O is issued to it, the bug was introduced by commit 724376a04d1a ("dm integrity: implement fair range locks"). Users rarely use overlapping I/O so this bug went undetected until now.
Fix this bug by correcting, likely cut-n-paste, typos in ranges_overlap() and also remove a flawed ranges_overlap() check in remove_range_unlocked(). This condition could leave unprocessed bios hanging on wait_list forever.
Cc: stable@vger.kernel.org # v4.19+ Fixes: 724376a04d1a ("dm integrity: implement fair range locks") Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-integrity.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/md/dm-integrity.c +++ b/drivers/md/dm-integrity.c @@ -913,7 +913,7 @@ static void copy_from_journal(struct dm_ static bool ranges_overlap(struct dm_integrity_range *range1, struct dm_integrity_range *range2) { return range1->logical_sector < range2->logical_sector + range2->n_sectors && - range2->logical_sector + range2->n_sectors > range2->logical_sector; + range1->logical_sector + range1->n_sectors > range2->logical_sector; }
static bool add_new_range(struct dm_integrity_c *ic, struct dm_integrity_range *new_range, bool check_waiting) @@ -959,8 +959,6 @@ static void remove_range_unlocked(struct struct dm_integrity_range *last_range = list_first_entry(&ic->wait_list, struct dm_integrity_range, wait_entry); struct task_struct *last_range_task; - if (!ranges_overlap(range, last_range)) - break; last_range_task = last_range->task; list_del(&last_range->wait_entry); if (!add_new_range(ic, last_range, false)) {
From: Marc Orr marcorr@google.com
commit acff78477b9b4f26ecdf65733a4ed77fe837e9dc upstream.
The nested_vmx_prepare_msr_bitmap() function doesn't directly guard the x2APIC MSR intercepts with the "virtualize x2APIC mode" MSR. As a result, we discovered the potential for a buggy or malicious L1 to get access to L0's x2APIC MSRs, via an L2, as follows.
1. L1 executes WRMSR(IA32_SPEC_CTRL, 1). This causes the spec_ctrl variable, in nested_vmx_prepare_msr_bitmap() to become true. 2. L1 disables "virtualize x2APIC mode" in VMCS12. 3. L1 enables "APIC-register virtualization" in VMCS12.
Now, KVM will set VMCS02's x2APIC MSR intercepts from VMCS12, and then set "virtualize x2APIC mode" to 0 in VMCS02. Oops.
This patch closes the leak by explicitly guarding VMCS02's x2APIC MSR intercepts with VMCS12's "virtualize x2APIC mode" control.
The scenario outlined above and fix prescribed here, were verified with a related patch in kvm-unit-tests titled "Add leak scenario to virt_x2apic_mode_test".
Note, it looks like this issue may have been introduced inadvertently during a merge---see 15303ba5d1cd.
Signed-off-by: Marc Orr marcorr@google.com Reviewed-by: Jim Mattson jmattson@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/vmx/nested.c | 72 ++++++++++++++++++++++++++++------------------ 1 file changed, 44 insertions(+), 28 deletions(-)
--- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -500,6 +500,17 @@ static void nested_vmx_disable_intercept } }
+static inline void enable_x2apic_msr_intercepts(unsigned long *msr_bitmap) { + int msr; + + for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) { + unsigned word = msr / BITS_PER_LONG; + + msr_bitmap[word] = ~0; + msr_bitmap[word + (0x800 / sizeof(long))] = ~0; + } +} + /* * Merge L0's and L1's MSR bitmap, return false to indicate that * we do not use the hardware. @@ -541,39 +552,44 @@ static inline bool nested_vmx_prepare_ms return false;
msr_bitmap_l1 = (unsigned long *)kmap(page); - if (nested_cpu_has_apic_reg_virt(vmcs12)) { - /* - * L0 need not intercept reads for MSRs between 0x800 and 0x8ff, it - * just lets the processor take the value from the virtual-APIC page; - * take those 256 bits directly from the L1 bitmap. - */ - for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) { - unsigned word = msr / BITS_PER_LONG; - msr_bitmap_l0[word] = msr_bitmap_l1[word]; - msr_bitmap_l0[word + (0x800 / sizeof(long))] = ~0; - } - } else { - for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) { - unsigned word = msr / BITS_PER_LONG; - msr_bitmap_l0[word] = ~0; - msr_bitmap_l0[word + (0x800 / sizeof(long))] = ~0; - } - }
- nested_vmx_disable_intercept_for_msr( - msr_bitmap_l1, msr_bitmap_l0, - X2APIC_MSR(APIC_TASKPRI), - MSR_TYPE_W); + /* + * To keep the control flow simple, pay eight 8-byte writes (sixteen + * 4-byte writes on 32-bit systems) up front to enable intercepts for + * the x2APIC MSR range and selectively disable them below. + */ + enable_x2apic_msr_intercepts(msr_bitmap_l0); + + if (nested_cpu_has_virt_x2apic_mode(vmcs12)) { + if (nested_cpu_has_apic_reg_virt(vmcs12)) { + /* + * L0 need not intercept reads for MSRs between 0x800 + * and 0x8ff, it just lets the processor take the value + * from the virtual-APIC page; take those 256 bits + * directly from the L1 bitmap. + */ + for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) { + unsigned word = msr / BITS_PER_LONG; + + msr_bitmap_l0[word] = msr_bitmap_l1[word]; + } + }
- if (nested_cpu_has_vid(vmcs12)) { - nested_vmx_disable_intercept_for_msr( - msr_bitmap_l1, msr_bitmap_l0, - X2APIC_MSR(APIC_EOI), - MSR_TYPE_W); nested_vmx_disable_intercept_for_msr( msr_bitmap_l1, msr_bitmap_l0, - X2APIC_MSR(APIC_SELF_IPI), + X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_W); + + if (nested_cpu_has_vid(vmcs12)) { + nested_vmx_disable_intercept_for_msr( + msr_bitmap_l1, msr_bitmap_l0, + X2APIC_MSR(APIC_EOI), + MSR_TYPE_W); + nested_vmx_disable_intercept_for_msr( + msr_bitmap_l1, msr_bitmap_l0, + X2APIC_MSR(APIC_SELF_IPI), + MSR_TYPE_W); + } }
if (spec_ctrl)
From: Marc Orr marcorr@google.com
commit c73f4c998e1fd4249b9edfa39e23f4fda2b9b041 upstream.
Referring to the "VIRTUALIZING MSR-BASED APIC ACCESSES" chapter of the SDM, when "virtualize x2APIC mode" is 1 and "APIC-register virtualization" is 0, a RDMSR of 808H should return the VTPR from the virtual APIC page.
However, for nested, KVM currently fails to disable the read intercept for this MSR. This means that a RDMSR exit takes precedence over "virtualize x2APIC mode", and KVM passes through L1's TPR to L2, instead of sourcing the value from L2's virtual APIC page.
This patch fixes the issue by disabling the read intercept, in VMCS02, for the VTPR when "APIC-register virtualization" is 0.
The issue described above and fix prescribed here, were verified with a related patch in kvm-unit-tests titled "Test VMX's virtualize x2APIC mode w/ nested".
Signed-off-by: Marc Orr marcorr@google.com Reviewed-by: Jim Mattson jmattson@google.com Fixes: c992384bde84f ("KVM: vmx: speed up MSR bitmap merge") Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/vmx/nested.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -578,7 +578,7 @@ static inline bool nested_vmx_prepare_ms nested_vmx_disable_intercept_for_msr( msr_bitmap_l1, msr_bitmap_l0, X2APIC_MSR(APIC_TASKPRI), - MSR_TYPE_W); + MSR_TYPE_R | MSR_TYPE_W);
if (nested_cpu_has_vid(vmcs12)) { nested_vmx_disable_intercept_for_msr(
From: Gerd Hoffmann kraxel@redhat.com
commit 16065fcdd19ddb9e093192914ac863884f308766 upstream.
Bisected guest kernel changes crashing qemu. Landed at "6c1cd97bda drm/virtio: fix resource id handling". Looked again, and noticed we where not only leaking *some* ids, but *all* ids. The old code never ever called virtio_gpu_resource_id_put().
So, commit 6c1cd97bda effectively makes the linux kernel starting re-using IDs after releasing them, and apparently virglrenderer can't deal with that. Oops.
This patch puts a temporary stopgap into place for the 5.0 release.
Signed-off-by: Gerd Hoffmann kraxel@redhat.com Reviewed-by: Dave Airlie airlied@redhat.com Signed-off-by: Dave Airlie airlied@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20190208140409.15280-1-kraxel@... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/virtio/virtgpu_object.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
--- a/drivers/gpu/drm/virtio/virtgpu_object.c +++ b/drivers/gpu/drm/virtio/virtgpu_object.c @@ -28,10 +28,21 @@ static int virtio_gpu_resource_id_get(struct virtio_gpu_device *vgdev, uint32_t *resid) { +#if 0 int handle = ida_alloc(&vgdev->resource_ida, GFP_KERNEL);
if (handle < 0) return handle; +#else + static int handle; + + /* + * FIXME: dirty hack to avoid re-using IDs, virglrenderer + * can't deal with that. Needs fixing in virglrenderer, also + * should figure a better way to handle that in the guest. + */ + handle++; +#endif
*resid = handle + 1; return 0; @@ -39,7 +50,9 @@ static int virtio_gpu_resource_id_get(st
static void virtio_gpu_resource_id_put(struct virtio_gpu_device *vgdev, uint32_t id) { +#if 0 ida_free(&vgdev->resource_ida, id - 1); +#endif }
static void virtio_gpu_ttm_bo_destroy(struct ttm_buffer_object *tbo)
On 15/04/2019 19:59, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.0.8 release. There are 117 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Apr 17 18:36:42 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.0.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.0.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v5.0: 11 builds: 11 pass, 0 fail 22 boots: 22 pass, 0 fail 30 tests: 30 pass, 0 fail
Linux version: 5.0.8-rc1-ge6fdfdb Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Cheers Jon
On Tue, Apr 16, 2019 at 11:34:44AM +0100, Jon Hunter wrote:
On 15/04/2019 19:59, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.0.8 release. There are 117 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Apr 17 18:36:42 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.0.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.0.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v5.0: 11 builds: 11 pass, 0 fail 22 boots: 22 pass, 0 fail 30 tests: 30 pass, 0 fail
Linux version: 5.0.8-rc1-ge6fdfdb Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Thanks for testing all of these and letting me know.
greg k-h
On Tue, 16 Apr 2019 at 00:39, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.0.8 release. There are 117 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Apr 17 18:36:42 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.0.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.0.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary ------------------------------------------------------------------------
kernel: 5.0.8-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-5.0.y git commit: e6fdfdbfcc6a9ee160e553eb0eaab811d6de783e git describe: v5.0.7-118-ge6fdfdbfcc6a Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-5.0-oe/build/v5.0.7-118-g...
No regressions (compared to build v5.0.7)
No fixes (compared to build v5.0.7)
Ran 23487 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - hi6220-hikey - i386 - juno-r2 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - x86
Test Suites ----------- * boot * install-android-platform-tools-r2600 * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * spectre-meltdown-checker-test * v4l2-compliance * kvm-unit-tests * ltp-fs-tests * perf * ltp-open-posix-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none
On Tue, Apr 16, 2019 at 04:57:07PM +0530, Naresh Kamboju wrote:
On Tue, 16 Apr 2019 at 00:39, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.0.8 release. There are 117 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Apr 17 18:36:42 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.0.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.0.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Thanks for testing all of these and letting me know.
greg k-h
On Mon, Apr 15, 2019 at 08:59:30PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.0.8 release. There are 117 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Apr 17 18:36:42 UTC 2019. Anything received after that time might be too late.
Build results: total: 159 pass: 159 fail: 0 Qemu test results: total: 349 pass: 349 fail: 0
Guenter
On 4/15/19 12:59 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.0.8 release. There are 117 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Apr 17 18:36:42 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.0.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.0.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On Tue, Apr 16, 2019 at 03:41:02PM -0600, shuah wrote:
On 4/15/19 12:59 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.0.8 release. There are 117 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Apr 17 18:36:42 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.0.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.0.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Thanks for testing these and letting me know.
greg k-h
linux-stable-mirror@lists.linaro.org