This is the start of the stable review cycle for the 5.16.15 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 16 Mar 2022 11:27:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.15-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.16.15-rc1
Niklas Cassel niklas.cassel@wdc.com riscv: dts: k210: fix broken IRQs on hart1
Filipe Manana fdmanana@suse.com btrfs: make send work with concurrent block group relocation
Zhengjun Xing zhengjun.xing@linux.intel.com perf parse: Fix event parser error for hybrid systems
Thomas Zimmermann tzimmermann@suse.de drm/panel: Select DRM_DP_HELPER for DRM_PANEL_EDP
Li Huafei lihuafei1@huawei.com x86/traps: Mark do_int3() NOKPROBE_SYMBOL
Jarkko Sakkinen jarkko@kernel.org x86/sgx: Free backing memory after faulting the enclave page
Peter Zijlstra peterz@infradead.org x86/module: Fix the paravirt vs alternative order
Ross Philipson ross.philipson@oracle.com x86/boot: Add setup_indirect support in early_memremap_is_setup_data()
Ross Philipson ross.philipson@oracle.com x86/boot: Fix memremap of setup_indirect structures
David Howells dhowells@redhat.com watch_queue: Make comment about setting ->defunct more accurate
David Howells dhowells@redhat.com watch_queue: Fix lack of barrier/sync/lock between post and read
David Howells dhowells@redhat.com watch_queue: Free the alloc bitmap when the watch_queue is torn down
David Howells dhowells@redhat.com watch_queue: Fix the alloc bitmap size to reflect notes allocated
David Howells dhowells@redhat.com watch_queue: Fix to always request a pow-of-2 pipe ring size
David Howells dhowells@redhat.com watch_queue: Fix to release page in ->release()
David Howells dhowells@redhat.com watch_queue, pipe: Free watchqueue state after clearing pipe ring
David Howells dhowells@redhat.com watch_queue: Fix filter limit check
Russell King (Oracle) rmk+kernel@armlinux.org.uk ARM: fix Thumb2 regression with Spectre BHB
Dima Chumak dchumak@nvidia.com net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE
Michael S. Tsirkin mst@redhat.com virtio: acknowledge all features before access
Michael S. Tsirkin mst@redhat.com virtio: unexport virtio_finalize_features
Halil Pasic pasic@linux.ibm.com swiotlb: rework "fix info leak with DMA_FROM_DEVICE"
Paul Semel semelpaul@gmail.com arm64: kasan: fix include error in MTE functions
Catalin Marinas catalin.marinas@arm.com arm64: Ensure execute-only permissions are not allowed without EPAN
Pali Rohár pali@kernel.org arm64: dts: marvell: armada-37xx: Remap IO space to bus address 0x0
Daniel Bristot de Oliveira bristot@kernel.org tracing/osnoise: Do not unregister events twice
Nicolas Saenz Julienne nsaenzju@redhat.com tracing/osnoise: Force quiescent states while tracing
Emil Renner Berthing kernel@esmil.dk riscv: Fix auipc+jalr relocation range checks
Rong Chen rong.chen@amlogic.com mmc: meson: Fix usage of meson_mmc_post_req()
Jisheng Zhang jszhang@kernel.org riscv: alternative only works on !XIP_KERNEL
Robert Hancock robert.hancock@calian.com net: macb: Fix lost RX packet wakeup race in NAPI receive
Dan Carpenter dan.carpenter@oracle.com staging: gdm724x: fix use after free in gdm_lte_rx()
Hans de Goede hdegoede@redhat.com staging: rtl8723bs: Fix access-point mode deadlock
Miklos Szeredi mszeredi@redhat.com fuse: fix pipe buffer lifetime for direct_io
Miklos Szeredi mszeredi@redhat.com fuse: fix fileattr op failure
Randy Dunlap rdunlap@infradead.org ARM: Spectre-BHB: provide empty stub for non-config
Mike Kravetz mike.kravetz@oracle.com selftests/memfd: clean up mapping in mfd_fail_write
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com selftest/vm: fix map_fixed_noreplace test failure
Christophe Leroy christophe.leroy@csgroup.eu tracing: Fix selftest config check for function graph start up test
Daniel Bristot de Oliveira bristot@kernel.org tracing/osnoise: Make osnoise_main to sleep for microseconds
Sven Schnelle svens@linux.ibm.com tracing: Ensure trace buffer is at least 4096 bytes large
Niels Dossche dossche.niels@gmail.com ipv6: prevent a possible race condition with lifetimes
Marek Marczykowski-Górecki marmarek@invisiblethingslab.com Revert "xen-netback: Check for hotplug-status existence before watching"
Marek Marczykowski-Górecki marmarek@invisiblethingslab.com Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"
Guchun Chen guchun.chen@amd.com drm/amdgpu: bypass tiling flag check in virtual display case (v2)
Shreeya Patel shreeya.patel@collabora.com gpio: Return EPROBE_DEFER if gc->to_irq is NULL
Alex Deucher alexander.deucher@amd.com PCI: Mark all AMD Navi10 and Navi14 GPU ATS as broken
Varun Prakash varun@chelsio.com nvme-tcp: send H2CData PDUs based on MAXH2CDATA
Vikash Chandola vikash.chandola@linux.intel.com hwmon: (pmbus) Clear pmbus fault/warning bits after read
suresh kumar suresh2514@gmail.com net-sysfs: add check for netdevice being present to speed_show
Duoming Zhou duoming@zju.edu.cn drivers: hamradio: 6pack: fix UAF bug caused by mod_timer()
Wanpeng Li wanpengli@tencent.com x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU
Nikhil Gupta nikhil.gupta@nxp.com of/fdt: move elfcorehdr reservation early for crash dump kernel
Maxime Ripard maxime@cerno.tech drm/vc4: hdmi: Unregister codec device on unbind
Jon Lin jon.lin@rock-chips.com spi: rockchip: terminate dma transmission when slave abort
Jon Lin jon.lin@rock-chips.com spi: rockchip: Fix error in getting num-cs property
Anton Romanov romanton@google.com kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode
Wanpeng Li wanpengli@tencent.com KVM: Fix lockdep false negative during host resume
Andy Shevchenko andriy.shevchenko@linux.intel.com pinctrl: tigerlake: Revert "Add Alder Lake-M ACPI ID"
Heikki Krogerus heikki.krogerus@linux.intel.com usb: dwc3: pci: add support for the Intel Raptor Lake-S
Halil Pasic pasic@linux.ibm.com swiotlb: fix info leak with DMA_FROM_DEVICE
Kumar Kartikeya Dwivedi memxor@gmail.com selftests/bpf: Add test for bpf_timer overwriting crash
Heiner Kallweit hkallweit1@gmail.com net: phy: meson-gxl: improve link-up behavior
Jeremy Linton jeremy.linton@arm.com net: bcmgenet: Don't claim WOL when its not available
Jianglei Nie niejianglei2021@163.com net: arc_emac: Fix use after free in arc_mdio_probe()
Eric Dumazet edumazet@google.com sctp: fix kernel-infoleak for SCTP sockets
Clément Léger clement.leger@bootlin.com net: phy: DP83822: clear MISR2 register to disable interrupts
Miaoqian Lin linmq006@gmail.com gianfar: ethtool: Fix refcount leak in gfar_get_ts_info
Linus Torvalds torvalds@linux-foundation.org mm: gup: make fault_in_safe_writeable() use fixup_user_fault()
Mark Featherston mark@embeddedTS.com gpio: ts4900: Do not set DAT and OE together
Guillaume Nault gnault@redhat.com selftests: pmtu.sh: Kill nettest processes launched in subshell.
Guillaume Nault gnault@redhat.com selftests: pmtu.sh: Kill tcpdump processes launched by subshell.
Pavel Skripkin paskripkin@gmail.com NFC: port100: fix use-after-free in port100_send_complete
Ben Ben-Ishay benishay@nvidia.com net/mlx5e: SHAMPO, reduce TIR indication
Roi Dayan roid@nvidia.com net/mlx5e: Lag, Only handle events from highest priority multipath entry
Moshe Shemesh moshe@nvidia.com net/mlx5: Fix a race on command flush flow
Mohammad Kabat mohammadkab@nvidia.com net/mlx5: Fix size field in bufferx_reg struct
Duoming Zhou duoming@zju.edu.cn ax25: Fix NULL pointer dereference in ax25_kill_by_device
Miaoqian Lin linmq006@gmail.com net: marvell: prestera: Add missing of_node_put() in prestera_switch_set_base_mac_addr
Jiasheng Jiang jiasheng@iscas.ac.cn net: ethernet: lpc_eth: Handle error for clk_enable
Jiasheng Jiang jiasheng@iscas.ac.cn net: ethernet: ti: cpts: Handle error for clk_enable
Tung Nguyen tung.q.nguyen@dektech.com.au tipc: fix incorrect order of state message data sanity check
Miaoqian Lin linmq006@gmail.com ethernet: Fix error handling in xemaclite_of_probe
Jedrzej Jagielski jedrzej.jagielski@intel.com ice: Fix curr_link_speed advertised speed
Christophe JAILLET christophe.jaillet@wanadoo.fr ice: Don't use GFP_KERNEL in atomic context
Dave Ertman david.m.ertman@intel.com ice: Fix error with handling of bonding MTU
Jacob Keller jacob.e.keller@intel.com ice: stop disabling VFs due to PF error responses
Jacob Keller jacob.e.keller@intel.com i40e: stop disabling VFs due to PF error responses
Michal Maloszewski michal.maloszewski@intel.com iavf: Fix handling of vlan strip virtual channel messages
Joel Stanley joel@jms.id.au ARM: dts: aspeed: Fix AST2600 quad spi group
Russell King (Oracle) rmk+kernel@armlinux.org.uk net: dsa: mt7530: fix incorrect test in mt753x_phylink_validate()
Jernej Skrabec jernej.skrabec@gmail.com drm/sun4i: mixer: Fix P010 and P210 format numbers
Jouni Högander jouni.hogander@intel.com drm/i915/psr: Set "SF Partial Frame Enable" also on full update
Andy Shevchenko andriy.shevchenko@linux.intel.com gpiolib: acpi: Convert ACPI value of debounce to microseconds
Fabio Estevam festevam@denx.de smsc95xx: Ignore -ENODEV errors when device is unplugged
Tom Rix trix@redhat.com qed: return status of qed_iov_get_link
Eric Dumazet edumazet@google.com net: gro: move skb_gro_receive_list to udp_offload.c
Steffen Klassert steffen.klassert@secunet.com esp: Fix BEET mode inter address family tunneling on GSO
Steffen Klassert steffen.klassert@secunet.com esp: Fix possible buffer overflow in ESP transformation
Jia-Ju Bai baijiaju1990@gmail.com net: qlogic: check the return value of dma_alloc_coherent() in qed_vf_hw_prepare()
Jia-Ju Bai baijiaju1990@gmail.com isdn: hfcpci: check the return value of dma_set_mask() in setup_hw()
Zhang Min zhang.min9@zte.com.cn vdpa: fix use-after-free on vp_vdpa_remove
Xie Yongji xieyongji@bytedance.com virtio-blk: Remove BUG_ON() in virtio_queue_rq()
Xie Yongji xieyongji@bytedance.com virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zero
Anirudh Rayabharam mail@anirudhrb.com vhost: fix hung thread due to erroneous iotlb entries
Alexey Khoroshilov khoroshilov@ispras.ru mISDN: Fix memory leak in dsp_pipeline_build()
Heiner Kallweit hkallweit1@gmail.com net: phy: meson-gxl: fix interrupt handling in forced mode
Xie Yongji xieyongji@bytedance.com vduse: Fix returning wrong type in vduse_domain_alloc_iova()
Si-Wei Liu si-wei.liu@oracle.com vdpa/mlx5: add validation for VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command
Tung Nguyen tung.q.nguyen@dektech.com.au tipc: fix kernel panic when enabling bearer
Pali Rohár pali@kernel.org arm64: dts: armada-3720-turris-mox: Add missing ethernet0 alias
Jia-Ju Bai baijiaju1990@gmail.com HID: nintendo: check the return value of alloc_workqueue()
Dmitry Torokhov dmitry.torokhov@gmail.com HID: vivaldi: fix sysfs attributes leak
AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com soc: mediatek: mt8192-mmsys: Fix dither to dsi0 path's input sel
Taniya Das tdas@codeaurora.org clk: qcom: dispcc: Update the transition delay for MDSS GDSC
Taniya Das tdas@codeaurora.org clk: qcom: gdsc: Add support to update GDSC transition delay
Maxime Ripard maxime@cerno.tech ARM: boot: dts: bcm2711: Fix HVS register range
Pavel Skripkin paskripkin@gmail.com HID: hid-thrustmaster: fix OOB read in thrustmaster_interrupts
Jiri Kosina jkosina@suse.cz HID: elo: Revert USB reference counting
Bjorn Andersson bjorn.andersson@linaro.org arm64: dts: qcom: sm8350: Correct UFS symbol clocks
Konrad Dybcio konrad.dybcio@somainline.org arm64: dts: qcom: sm8350: Describe GCC dependency clocks
-------------
Diffstat:
Makefile | 4 +- arch/arm/boot/dts/aspeed-g6-pinctrl.dtsi | 2 +- arch/arm/boot/dts/bcm2711.dtsi | 1 + arch/arm/include/asm/spectre.h | 6 + arch/arm/kernel/entry-armv.S | 4 +- arch/arm64/Kconfig | 3 - .../boot/dts/marvell/armada-3720-turris-mox.dts | 8 +- arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 2 +- arch/arm64/boot/dts/qcom/sm8350.dtsi | 48 ++- arch/arm64/include/asm/mte-kasan.h | 1 + arch/arm64/include/asm/pgtable-prot.h | 4 +- arch/arm64/include/asm/pgtable.h | 11 - arch/arm64/mm/mmap.c | 17 + arch/riscv/Kconfig.erratas | 1 + arch/riscv/Kconfig.socs | 4 +- arch/riscv/boot/dts/canaan/k210.dtsi | 3 +- arch/riscv/kernel/module.c | 21 +- arch/x86/kernel/cpu/sgx/encl.c | 57 +++- arch/x86/kernel/e820.c | 41 ++- arch/x86/kernel/kdebugfs.c | 37 ++- arch/x86/kernel/ksysfs.c | 77 ++++- arch/x86/kernel/kvm.c | 9 +- arch/x86/kernel/module.c | 13 +- arch/x86/kernel/setup.c | 34 +- arch/x86/kernel/traps.c | 1 + arch/x86/kvm/x86.c | 7 + arch/x86/mm/ioremap.c | 57 +++- drivers/block/virtio_blk.c | 20 +- drivers/clk/qcom/dispcc-sc7180.c | 5 +- drivers/clk/qcom/dispcc-sc7280.c | 5 +- drivers/clk/qcom/dispcc-sm8250.c | 5 +- drivers/clk/qcom/gdsc.c | 26 +- drivers/clk/qcom/gdsc.h | 8 +- drivers/gpio/gpio-ts4900.c | 24 +- drivers/gpio/gpiolib-acpi.c | 6 +- drivers/gpio/gpiolib.c | 20 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 2 +- drivers/gpu/drm/i915/display/intel_psr.c | 16 +- drivers/gpu/drm/i915/i915_reg.h | 1 + drivers/gpu/drm/panel/Kconfig | 1 + drivers/gpu/drm/sun4i/sun8i_mixer.h | 8 +- drivers/gpu/drm/vc4/vc4_hdmi.c | 8 + drivers/gpu/drm/vc4/vc4_hdmi.h | 1 + drivers/hid/hid-elo.c | 7 +- drivers/hid/hid-nintendo.c | 4 + drivers/hid/hid-thrustmaster.c | 6 + drivers/hid/hid-vivaldi.c | 2 +- drivers/hwmon/pmbus/pmbus_core.c | 5 + drivers/isdn/hardware/mISDN/hfcpci.c | 6 +- drivers/isdn/mISDN/dsp_pipeline.c | 6 +- drivers/mmc/host/meson-gx-mmc.c | 15 +- drivers/net/dsa/mt7530.c | 2 +- drivers/net/ethernet/arc/emac_mdio.c | 5 +- drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c | 7 + drivers/net/ethernet/cadence/macb_main.c | 25 +- drivers/net/ethernet/freescale/gianfar_ethtool.c | 1 + drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 6 +- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 57 +--- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h | 5 - drivers/net/ethernet/intel/iavf/iavf_virtchnl.c | 40 +++ drivers/net/ethernet/intel/ice/ice.h | 1 + drivers/net/ethernet/intel/ice/ice_ethtool.c | 2 +- drivers/net/ethernet/intel/ice/ice_main.c | 31 +- drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 18 -- drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h | 3 - .../net/ethernet/marvell/prestera/prestera_main.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 15 +- drivers/net/ethernet/mellanox/mlx5/core/en/tir.c | 3 - drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +- drivers/net/ethernet/mellanox/mlx5/core/lag/mp.c | 11 +- .../ethernet/mellanox/mlx5/core/lib/fs_chains.c | 3 - drivers/net/ethernet/nxp/lpc_eth.c | 5 +- drivers/net/ethernet/qlogic/qed/qed_sriov.c | 18 +- drivers/net/ethernet/qlogic/qed/qed_vf.c | 7 + drivers/net/ethernet/ti/cpts.c | 4 +- drivers/net/ethernet/xilinx/xilinx_emaclite.c | 4 +- drivers/net/hamradio/6pack.c | 4 +- drivers/net/phy/dp83822.c | 2 +- drivers/net/phy/meson-gxl.c | 31 +- drivers/net/usb/smsc95xx.c | 28 +- drivers/net/xen-netback/xenbus.c | 14 +- drivers/nfc/port100.c | 2 + drivers/nvme/host/tcp.c | 63 +++- drivers/of/fdt.c | 2 +- drivers/pci/quirks.c | 14 +- drivers/pinctrl/intel/pinctrl-tigerlake.c | 1 - drivers/soc/mediatek/mt8192-mmsys.h | 3 +- drivers/spi/spi-rockchip.c | 13 +- drivers/staging/gdm724x/gdm_lte.c | 5 +- drivers/staging/rtl8723bs/core/rtw_mlme_ext.c | 7 +- drivers/staging/rtl8723bs/core/rtw_recv.c | 10 +- drivers/staging/rtl8723bs/core/rtw_sta_mgt.c | 22 +- drivers/staging/rtl8723bs/core/rtw_xmit.c | 16 +- drivers/staging/rtl8723bs/hal/rtl8723bs_xmit.c | 2 + drivers/usb/dwc3/dwc3-pci.c | 4 + drivers/vdpa/mlx5/net/mlx5_vnet.c | 16 + drivers/vdpa/vdpa_user/iova_domain.c | 2 +- drivers/vdpa/virtio_pci/vp_vdpa.c | 2 +- drivers/vhost/iotlb.c | 11 + drivers/vhost/vhost.c | 5 + drivers/virtio/virtio.c | 40 +-- fs/btrfs/block-group.c | 9 +- fs/btrfs/ctree.c | 98 ++++-- fs/btrfs/ctree.h | 14 +- fs/btrfs/disk-io.c | 4 +- fs/btrfs/relocation.c | 13 - fs/btrfs/send.c | 357 ++++++++++++++++++--- fs/btrfs/transaction.c | 4 + fs/fuse/dev.c | 12 +- fs/fuse/file.c | 1 + fs/fuse/fuse_i.h | 1 + fs/fuse/ioctl.c | 9 +- fs/pipe.c | 11 +- include/linux/mlx5/mlx5_ifc.h | 5 +- include/linux/netdevice.h | 1 - include/linux/nvme-tcp.h | 1 + include/linux/virtio.h | 1 - include/linux/virtio_config.h | 3 +- include/linux/watch_queue.h | 3 +- include/net/esp.h | 2 + kernel/dma/swiotlb.c | 22 +- kernel/trace/trace.c | 10 +- kernel/trace/trace_osnoise.c | 84 +++-- kernel/trace/trace_selftest.c | 6 +- kernel/watch_queue.c | 15 +- mm/gup.c | 57 ++-- net/ax25/af_ax25.c | 7 + net/core/net-sysfs.c | 2 +- net/core/skbuff.c | 26 -- net/ipv4/esp4.c | 5 + net/ipv4/esp4_offload.c | 3 + net/ipv4/udp_offload.c | 27 ++ net/ipv6/addrconf.c | 2 + net/ipv6/esp6.c | 5 + net/ipv6/esp6_offload.c | 3 + net/sctp/diag.c | 9 +- net/tipc/bearer.c | 12 +- net/tipc/link.c | 9 +- tools/perf/util/parse-events.c | 6 +- .../testing/selftests/bpf/prog_tests/timer_crash.c | 32 ++ tools/testing/selftests/bpf/progs/timer_crash.c | 54 ++++ tools/testing/selftests/memfd/memfd_test.c | 1 + tools/testing/selftests/net/pmtu.sh | 21 +- tools/testing/selftests/vm/map_fixed_noreplace.c | 49 ++- virt/kvm/kvm_main.c | 4 +- 145 files changed, 1650 insertions(+), 646 deletions(-)
From: Konrad Dybcio konrad.dybcio@somainline.org
[ Upstream commit 9ea9eb36b3c046fc48e737db4de69f7acd12f9be ]
Add all the clock names that the GCC driver expects to get via DT, so that the clock handles can be filled as the development progresses.
Signed-off-by: Konrad Dybcio konrad.dybcio@somainline.org Signed-off-by: Bjorn Andersson bjorn.andersson@linaro.org Link: https://lore.kernel.org/r/20211114012755.112226-8-konrad.dybcio@somainline.o... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/qcom/sm8350.dtsi | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sm8350.dtsi b/arch/arm64/boot/dts/qcom/sm8350.dtsi index c13858cf50dd..db102b293154 100644 --- a/arch/arm64/boot/dts/qcom/sm8350.dtsi +++ b/arch/arm64/boot/dts/qcom/sm8350.dtsi @@ -583,8 +583,30 @@ gcc: clock-controller@100000 { #clock-cells = <1>; #reset-cells = <1>; #power-domain-cells = <1>; - clock-names = "bi_tcxo", "sleep_clk"; - clocks = <&rpmhcc RPMH_CXO_CLK>, <&sleep_clk>; + clock-names = "bi_tcxo", + "sleep_clk", + "pcie_0_pipe_clk", + "pcie_1_pipe_clk", + "ufs_card_rx_symbol_0_clk", + "ufs_card_rx_symbol_1_clk", + "ufs_card_tx_symbol_0_clk", + "ufs_phy_rx_symbol_0_clk", + "ufs_phy_rx_symbol_1_clk", + "ufs_phy_tx_symbol_0_clk", + "usb3_phy_wrapper_gcc_usb30_pipe_clk", + "usb3_uni_phy_sec_gcc_usb30_pipe_clk"; + clocks = <&rpmhcc RPMH_CXO_CLK>, + <&sleep_clk>, + <0>, + <0>, + <0>, + <0>, + <0>, + <0>, + <0>, + <0>, + <0>, + <0>; };
ipcc: mailbox@408000 {
From: Bjorn Andersson bjorn.andersson@linaro.org
[ Upstream commit 0fd4dcb607ce29110d6c0b481a98c4ff3d300551 ]
The introduction of '9a61f813fcc8 ("clk: qcom: regmap-mux: fix parent clock lookup")' broke UFS support on SM8350.
The cause for this is that the symbol clocks have a specified rate in the "freq-table-hz" table in the UFS node, which causes the UFS code to request a rate change, for which the "bi_tcxo" happens to provide the closest rate. Prior to the change in regmap-mux it was determined (incorrectly) that no change was needed and everything worked.
The rates of 75 and 300MHz matches the documentation for the symbol clocks, but we don't represent the parent clocks today. So let's mimic the configuration found in other platforms, by omitting the rate for the symbol clocks as well to avoid the rate change.
While at it also fill in the dummy symbol clocks that was dropped from the GCC driver as it was upstreamed.
Fixes: 59c7cf814783 ("arm64: dts: qcom: sm8350: Add UFS nodes") Signed-off-by: Bjorn Andersson bjorn.andersson@linaro.org Reviewed-by: Vinod Koul vkoul@kernel.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Link: https://lore.kernel.org/r/20211222162058.3418902-1-bjorn.andersson@linaro.or... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/qcom/sm8350.dtsi | 28 +++++++++++++++++++++++----- 1 file changed, 23 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sm8350.dtsi b/arch/arm64/boot/dts/qcom/sm8350.dtsi index db102b293154..1a70a70ed056 100644 --- a/arch/arm64/boot/dts/qcom/sm8350.dtsi +++ b/arch/arm64/boot/dts/qcom/sm8350.dtsi @@ -34,6 +34,24 @@ sleep_clk: sleep-clk { clock-frequency = <32000>; #clock-cells = <0>; }; + + ufs_phy_rx_symbol_0_clk: ufs-phy-rx-symbol-0 { + compatible = "fixed-clock"; + clock-frequency = <1000>; + #clock-cells = <0>; + }; + + ufs_phy_rx_symbol_1_clk: ufs-phy-rx-symbol-1 { + compatible = "fixed-clock"; + clock-frequency = <1000>; + #clock-cells = <0>; + }; + + ufs_phy_tx_symbol_0_clk: ufs-phy-tx-symbol-0 { + compatible = "fixed-clock"; + clock-frequency = <1000>; + #clock-cells = <0>; + }; };
cpus { @@ -602,9 +620,9 @@ gcc: clock-controller@100000 { <0>, <0>, <0>, - <0>, - <0>, - <0>, + <&ufs_phy_rx_symbol_0_clk>, + <&ufs_phy_rx_symbol_1_clk>, + <&ufs_phy_tx_symbol_0_clk>, <0>, <0>; }; @@ -1227,8 +1245,8 @@ ufs_mem_hc: ufshc@1d84000 { <75000000 300000000>, <0 0>, <0 0>, - <75000000 300000000>, - <75000000 300000000>; + <0 0>, + <0 0>; status = "disabled"; };
From: Jiri Kosina jkosina@suse.cz
[ Upstream commit ac89895213d8950dba6ab342863a0959f73142a7 ]
Commit 817b8b9c539 ("HID: elo: fix memory leak in elo_probe") introduced memory leak on error path, but more importantly the whole USB reference counting is not needed at all in the first place, as the driver itself doesn't change the reference counting in any way, and the associated usb_device is guaranteed to be kept around by USB core as long as the driver binding exists.
Reported-by: Alan Stern stern@rowland.harvard.edu Reported-by: Dan Carpenter dan.carpenter@oracle.com Fixes: fbf42729d0e ("HID: elo: update the reference count of the usb device structure") Fixes: 817b8b9c539 ("HID: elo: fix memory leak in elo_probe") Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-elo.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/drivers/hid/hid-elo.c b/drivers/hid/hid-elo.c index 9b42b0cdeef0..2876cb6a7dca 100644 --- a/drivers/hid/hid-elo.c +++ b/drivers/hid/hid-elo.c @@ -228,7 +228,6 @@ static int elo_probe(struct hid_device *hdev, const struct hid_device_id *id) { struct elo_priv *priv; int ret; - struct usb_device *udev;
if (!hid_is_usb(hdev)) return -EINVAL; @@ -238,8 +237,7 @@ static int elo_probe(struct hid_device *hdev, const struct hid_device_id *id) return -ENOMEM;
INIT_DELAYED_WORK(&priv->work, elo_work); - udev = interface_to_usbdev(to_usb_interface(hdev->dev.parent)); - priv->usbdev = usb_get_dev(udev); + priv->usbdev = interface_to_usbdev(to_usb_interface(hdev->dev.parent));
hid_set_drvdata(hdev, priv);
@@ -262,7 +260,6 @@ static int elo_probe(struct hid_device *hdev, const struct hid_device_id *id)
return 0; err_free: - usb_put_dev(udev); kfree(priv); return ret; } @@ -271,8 +268,6 @@ static void elo_remove(struct hid_device *hdev) { struct elo_priv *priv = hid_get_drvdata(hdev);
- usb_put_dev(priv->usbdev); - hid_hw_stop(hdev); cancel_delayed_work_sync(&priv->work); kfree(priv);
From: Pavel Skripkin paskripkin@gmail.com
[ Upstream commit fc3ef2e3297b3c0e2006b5d7b3d66965e3392036 ]
Syzbot reported an slab-out-of-bounds Read in thrustmaster_probe() bug. The root case is in missing validation check of actual number of endpoints.
Code should not blindly access usb_host_interface::endpoint array, since it may contain less endpoints than code expects.
Fix it by adding missing validaion check and print an error if number of endpoints do not match expected number
Fixes: c49c33637802 ("HID: support for initialization of some Thrustmaster wheels") Reported-and-tested-by: syzbot+35eebd505e97d315d01c@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin paskripkin@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-thrustmaster.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/hid/hid-thrustmaster.c b/drivers/hid/hid-thrustmaster.c index 03b935ff02d5..9da4240530dd 100644 --- a/drivers/hid/hid-thrustmaster.c +++ b/drivers/hid/hid-thrustmaster.c @@ -158,6 +158,12 @@ static void thrustmaster_interrupts(struct hid_device *hdev) return; }
+ if (usbif->cur_altsetting->desc.bNumEndpoints < 2) { + kfree(send_buf); + hid_err(hdev, "Wrong number of endpoints?\n"); + return; + } + ep = &usbif->cur_altsetting->endpoint[1]; b_ep = ep->desc.bEndpointAddress;
From: Maxime Ripard maxime@cerno.tech
[ Upstream commit 515415d316168c6521d74ea8280287e28d7303e6 ]
While the HVS has the same context memory size in the BCM2711 than in the previous SoCs, the range allocated to the registers doubled and it now takes 16k + 16k, compared to 8k + 16k before.
The KMS driver will use the whole context RAM though, eventually resulting in a pointer dereference error when we access the higher half of the context memory since it hasn't been mapped.
Fixes: 4564363351e2 ("ARM: dts: bcm2711: Enable the display pipeline") Signed-off-by: Maxime Ripard maxime@cerno.tech Signed-off-by: Stefan Wahren stefan.wahren@i2se.com Signed-off-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/bcm2711.dtsi | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/arm/boot/dts/bcm2711.dtsi b/arch/arm/boot/dts/bcm2711.dtsi index dff18fc9a906..21294f775a20 100644 --- a/arch/arm/boot/dts/bcm2711.dtsi +++ b/arch/arm/boot/dts/bcm2711.dtsi @@ -290,6 +290,7 @@ pixelvalve4: pixelvalve@7e216000 {
hvs: hvs@7e400000 { compatible = "brcm,bcm2711-hvs"; + reg = <0x7e400000 0x8000>; interrupts = <GIC_SPI 97 IRQ_TYPE_LEVEL_HIGH>; };
From: Taniya Das tdas@codeaurora.org
[ Upstream commit 4e7c4d3652f96f41179aab3ff53025c7a550d689 ]
GDSCs have multiple transition delays which are used for the GDSC FSM states. Older targets/designs required these values to be updated from gdsc code to certain default values for the FSM state to work as expected. But on the newer targets/designs the values updated from the GDSC driver can hamper the FSM state to not work as expected.
On SC7180 we observe black screens because the gdsc is being enabled/disabled very rapidly and the GDSC FSM state does not work as expected. This is due to the fact that the GDSC reset value is being updated from SW.
Thus add support to update the transition delay from the clock controller gdscs as required.
Fixes: 45dd0e55317cc ("clk: qcom: Add support for GDSCs) Signed-off-by: Taniya Das tdas@codeaurora.org Link: https://lore.kernel.org/r/20220223185606.3941-1-tdas@codeaurora.org Reviewed-by: Bjorn Andersson bjorn.andersson@linaro.org Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/qcom/gdsc.c | 26 +++++++++++++++++++++----- drivers/clk/qcom/gdsc.h | 8 +++++++- 2 files changed, 28 insertions(+), 6 deletions(-)
diff --git a/drivers/clk/qcom/gdsc.c b/drivers/clk/qcom/gdsc.c index 7e1dd8ccfa38..44520efc6c72 100644 --- a/drivers/clk/qcom/gdsc.c +++ b/drivers/clk/qcom/gdsc.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-only /* - * Copyright (c) 2015, 2017-2018, The Linux Foundation. All rights reserved. + * Copyright (c) 2015, 2017-2018, 2022, The Linux Foundation. All rights reserved. */
#include <linux/bitops.h> @@ -35,9 +35,14 @@ #define CFG_GDSCR_OFFSET 0x4
/* Wait 2^n CXO cycles between all states. Here, n=2 (4 cycles). */ -#define EN_REST_WAIT_VAL (0x2 << 20) -#define EN_FEW_WAIT_VAL (0x8 << 16) -#define CLK_DIS_WAIT_VAL (0x2 << 12) +#define EN_REST_WAIT_VAL 0x2 +#define EN_FEW_WAIT_VAL 0x8 +#define CLK_DIS_WAIT_VAL 0x2 + +/* Transition delay shifts */ +#define EN_REST_WAIT_SHIFT 20 +#define EN_FEW_WAIT_SHIFT 16 +#define CLK_DIS_WAIT_SHIFT 12
#define RETAIN_MEM BIT(14) #define RETAIN_PERIPH BIT(13) @@ -380,7 +385,18 @@ static int gdsc_init(struct gdsc *sc) */ mask = HW_CONTROL_MASK | SW_OVERRIDE_MASK | EN_REST_WAIT_MASK | EN_FEW_WAIT_MASK | CLK_DIS_WAIT_MASK; - val = EN_REST_WAIT_VAL | EN_FEW_WAIT_VAL | CLK_DIS_WAIT_VAL; + + if (!sc->en_rest_wait_val) + sc->en_rest_wait_val = EN_REST_WAIT_VAL; + if (!sc->en_few_wait_val) + sc->en_few_wait_val = EN_FEW_WAIT_VAL; + if (!sc->clk_dis_wait_val) + sc->clk_dis_wait_val = CLK_DIS_WAIT_VAL; + + val = sc->en_rest_wait_val << EN_REST_WAIT_SHIFT | + sc->en_few_wait_val << EN_FEW_WAIT_SHIFT | + sc->clk_dis_wait_val << CLK_DIS_WAIT_SHIFT; + ret = regmap_update_bits(sc->regmap, sc->gdscr, mask, val); if (ret) return ret; diff --git a/drivers/clk/qcom/gdsc.h b/drivers/clk/qcom/gdsc.h index d7cc4c21a9d4..ad313d7210bd 100644 --- a/drivers/clk/qcom/gdsc.h +++ b/drivers/clk/qcom/gdsc.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0-only */ /* - * Copyright (c) 2015, 2017-2018, The Linux Foundation. All rights reserved. + * Copyright (c) 2015, 2017-2018, 2022, The Linux Foundation. All rights reserved. */
#ifndef __QCOM_GDSC_H__ @@ -22,6 +22,9 @@ struct reset_controller_dev; * @cxcs: offsets of branch registers to toggle mem/periph bits in * @cxc_count: number of @cxcs * @pwrsts: Possible powerdomain power states + * @en_rest_wait_val: transition delay value for receiving enr ack signal + * @en_few_wait_val: transition delay value for receiving enf ack signal + * @clk_dis_wait_val: transition delay value for halting clock * @resets: ids of resets associated with this gdsc * @reset_count: number of @resets * @rcdev: reset controller @@ -36,6 +39,9 @@ struct gdsc { unsigned int clamp_io_ctrl; unsigned int *cxcs; unsigned int cxc_count; + unsigned int en_rest_wait_val; + unsigned int en_few_wait_val; + unsigned int clk_dis_wait_val; const u8 pwrsts; /* Powerdomain allowable state bitfields */ #define PWRSTS_OFF BIT(0)
From: Taniya Das tdas@codeaurora.org
[ Upstream commit 6e6fec3f961c00ca34ffb4bf2ad9febb4b499f8d ]
On SC7180 we observe black screens because the gdsc is being enabled/disabled very rapidly and the GDSC FSM state does not work as expected. This is due to the fact that the GDSC reset value is being updated from SW.
The recommended transition delay for mdss core gdsc updated for SC7180/SC7280/SM8250.
Fixes: dd3d06622138 ("clk: qcom: Add display clock controller driver for SC7180") Fixes: 1a00c962f9cd ("clk: qcom: Add display clock controller driver for SC7280") Fixes: 80a18f4a8567 ("clk: qcom: Add display clock controller driver for SM8150 and SM8250") Signed-off-by: Taniya Das tdas@codeaurora.org Link: https://lore.kernel.org/r/20220223185606.3941-2-tdas@codeaurora.org Reviewed-by: Bjorn Andersson bjorn.andersson@linaro.org [sboyd@kernel.org: lowercase hex] Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/qcom/dispcc-sc7180.c | 5 ++++- drivers/clk/qcom/dispcc-sc7280.c | 5 ++++- drivers/clk/qcom/dispcc-sm8250.c | 5 ++++- 3 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/clk/qcom/dispcc-sc7180.c b/drivers/clk/qcom/dispcc-sc7180.c index 538e4963c915..5d2ae297e741 100644 --- a/drivers/clk/qcom/dispcc-sc7180.c +++ b/drivers/clk/qcom/dispcc-sc7180.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-only /* - * Copyright (c) 2019, The Linux Foundation. All rights reserved. + * Copyright (c) 2019, 2022, The Linux Foundation. All rights reserved. */
#include <linux/clk-provider.h> @@ -625,6 +625,9 @@ static struct clk_branch disp_cc_mdss_vsync_clk = {
static struct gdsc mdss_gdsc = { .gdscr = 0x3000, + .en_rest_wait_val = 0x2, + .en_few_wait_val = 0x2, + .clk_dis_wait_val = 0xf, .pd = { .name = "mdss_gdsc", }, diff --git a/drivers/clk/qcom/dispcc-sc7280.c b/drivers/clk/qcom/dispcc-sc7280.c index 4ef4ae231794..ad596d567f6a 100644 --- a/drivers/clk/qcom/dispcc-sc7280.c +++ b/drivers/clk/qcom/dispcc-sc7280.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-only /* - * Copyright (c) 2021, The Linux Foundation. All rights reserved. + * Copyright (c) 2021-2022, The Linux Foundation. All rights reserved. */
#include <linux/clk-provider.h> @@ -787,6 +787,9 @@ static struct clk_branch disp_cc_sleep_clk = {
static struct gdsc disp_cc_mdss_core_gdsc = { .gdscr = 0x1004, + .en_rest_wait_val = 0x2, + .en_few_wait_val = 0x2, + .clk_dis_wait_val = 0xf, .pd = { .name = "disp_cc_mdss_core_gdsc", }, diff --git a/drivers/clk/qcom/dispcc-sm8250.c b/drivers/clk/qcom/dispcc-sm8250.c index 566fdfa0a15b..db9379634fb2 100644 --- a/drivers/clk/qcom/dispcc-sm8250.c +++ b/drivers/clk/qcom/dispcc-sm8250.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 /* - * Copyright (c) 2018-2020, The Linux Foundation. All rights reserved. + * Copyright (c) 2018-2020, 2022, The Linux Foundation. All rights reserved. */
#include <linux/clk-provider.h> @@ -1126,6 +1126,9 @@ static struct clk_branch disp_cc_mdss_vsync_clk = {
static struct gdsc mdss_gdsc = { .gdscr = 0x3000, + .en_rest_wait_val = 0x2, + .en_few_wait_val = 0x2, + .clk_dis_wait_val = 0xf, .pd = { .name = "mdss_gdsc", },
From: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com
[ Upstream commit c432cd598a185afefba1ac3b0ee226f222f71341 ]
In commit d687e056a18f ("soc: mediatek: mmsys: Add mt8192 mmsys routing table"), the mmsys routing table for mt8192 was introduced but the input selector for DITHER->DSI0 has no value assigned to it.
This means that we are clearing bit 0 instead of setting it, blocking communication between these two blocks; due to that, any display that is connected to DSI0 will not work, as no data will go through. The effect of that issue is that, during bootup, the DRM will block for some time, while atomically waiting for a vblank that never happens; later, the situation doesn't get better, leaving the display in a non-functional state.
To fix this issue, fix the route entry in the table by assigning the dither input selector to MT8192_DISP_DSI0_SEL_IN.
Fixes: d687e056a18f ("soc: mediatek: mmsys: Add mt8192 mmsys routing table") Signed-off-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Tested-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com Reviewed-by: Nícolas F. R. A. Prado nfraprado@collabora.com Link: https://lore.kernel.org/r/20220128142056.359900-1-angelogioacchino.delregno@... Signed-off-by: Matthias Brugger matthias.bgg@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/soc/mediatek/mt8192-mmsys.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/soc/mediatek/mt8192-mmsys.h b/drivers/soc/mediatek/mt8192-mmsys.h index 6f0a57044a7b..6aae0b12b6ff 100644 --- a/drivers/soc/mediatek/mt8192-mmsys.h +++ b/drivers/soc/mediatek/mt8192-mmsys.h @@ -53,7 +53,8 @@ static const struct mtk_mmsys_routes mmsys_mt8192_routing_table[] = { MT8192_AAL0_SEL_IN_CCORR0 }, { DDP_COMPONENT_DITHER, DDP_COMPONENT_DSI0, - MT8192_DISP_DSI0_SEL_IN, MT8192_DSI0_SEL_IN_DITHER0 + MT8192_DISP_DSI0_SEL_IN, MT8192_DSI0_SEL_IN_DITHER0, + MT8192_DSI0_SEL_IN_DITHER0 }, { DDP_COMPONENT_RDMA0, DDP_COMPONENT_COLOR0, MT8192_DISP_RDMA0_SOUT_SEL, MT8192_RDMA0_SOUT_COLOR0,
From: Dmitry Torokhov dmitry.torokhov@gmail.com
[ Upstream commit cc71d37fd1f11e0495b1cf580909ebea37eaa886 ]
The driver creates the top row map sysfs attribute in input_configured() method; unfortunately we do not have a callback that is executed when HID interface is unbound, thus we are leaking these sysfs attributes, for example when device is disconnected.
To fix it let's switch to managed version of adding sysfs attributes which will ensure that they are destroyed when the driver is unbound.
Fixes: 14c9c014babe ("HID: add vivaldi HID driver") Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Tested-by: Stephen Boyd swboyd@chromium.org Reviewed-by: Stephen Boyd swboyd@chromium.org Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-vivaldi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hid/hid-vivaldi.c b/drivers/hid/hid-vivaldi.c index 576518e704ee..d57ec1767037 100644 --- a/drivers/hid/hid-vivaldi.c +++ b/drivers/hid/hid-vivaldi.c @@ -143,7 +143,7 @@ static void vivaldi_feature_mapping(struct hid_device *hdev, static int vivaldi_input_configured(struct hid_device *hdev, struct hid_input *hidinput) { - return sysfs_create_group(&hdev->dev.kobj, &input_attribute_group); + return devm_device_add_group(&hdev->dev, &input_attribute_group); }
static const struct hid_device_id vivaldi_table[] = {
From: Jia-Ju Bai baijiaju1990@gmail.com
[ Upstream commit fe23b6bbeac40de957724b90a88d46fb336e29a9 ]
The function alloc_workqueue() in nintendo_hid_probe() can fail, but there is no check of its return value. To fix this bug, its return value should be checked with new error handling code.
Fixes: c4eae84feff3e ("HID: nintendo: add rumble support") Reported-by: TOTE Robot oslab@tsinghua.edu.cn Signed-off-by: Jia-Ju Bai baijiaju1990@gmail.com Reviewed-by: Silvan Jegen s.jegen@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-nintendo.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/hid/hid-nintendo.c b/drivers/hid/hid-nintendo.c index b6a9a0f3966e..2204de889739 100644 --- a/drivers/hid/hid-nintendo.c +++ b/drivers/hid/hid-nintendo.c @@ -2128,6 +2128,10 @@ static int nintendo_hid_probe(struct hid_device *hdev, spin_lock_init(&ctlr->lock); ctlr->rumble_queue = alloc_workqueue("hid-nintendo-rumble_wq", WQ_FREEZABLE | WQ_MEM_RECLAIM, 0); + if (!ctlr->rumble_queue) { + ret = -ENOMEM; + goto err; + } INIT_WORK(&ctlr->rumble_worker, joycon_rumble_worker);
ret = hid_parse(hdev);
From: Pali Rohár pali@kernel.org
[ Upstream commit a0e897d1b36793fe0ab899f2fe93dff25c82f418 ]
U-Boot uses ethernet* aliases for setting MAC addresses. Therefore define also alias for ethernet0.
Fixes: 7109d817db2e ("arm64: dts: marvell: add DTS for Turris Mox") Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Gregory CLEMENT gregory.clement@bootlin.com Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts b/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts index 04da07ae4420..1eddf31d8bd8 100644 --- a/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts +++ b/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts @@ -18,6 +18,7 @@ / {
aliases { spi0 = &spi0; + ethernet0 = ð0; ethernet1 = ð1; mmc0 = &sdhci0; mmc1 = &sdhci1;
From: Tung Nguyen tung.q.nguyen@dektech.com.au
[ Upstream commit be4977b847f5d5cedb64d50eaaf2218c3a55a3a3 ]
When enabling a bearer on a node, a kernel panic is observed:
[ 4.498085] RIP: 0010:tipc_mon_prep+0x4e/0x130 [tipc] ... [ 4.520030] Call Trace: [ 4.520689] <IRQ> [ 4.521236] tipc_link_build_proto_msg+0x375/0x750 [tipc] [ 4.522654] tipc_link_build_state_msg+0x48/0xc0 [tipc] [ 4.524034] __tipc_node_link_up+0xd7/0x290 [tipc] [ 4.525292] tipc_rcv+0x5da/0x730 [tipc] [ 4.526346] ? __netif_receive_skb_core+0xb7/0xfc0 [ 4.527601] tipc_l2_rcv_msg+0x5e/0x90 [tipc] [ 4.528737] __netif_receive_skb_list_core+0x20b/0x260 [ 4.530068] netif_receive_skb_list_internal+0x1bf/0x2e0 [ 4.531450] ? dev_gro_receive+0x4c2/0x680 [ 4.532512] napi_complete_done+0x6f/0x180 [ 4.533570] virtnet_poll+0x29c/0x42e [virtio_net] ...
The node in question is receiving activate messages in another thread after changing bearer status to allow message sending/ receiving in current thread:
thread 1 | thread 2 -------- | -------- | tipc_enable_bearer() | test_and_set_bit_lock() | tipc_bearer_xmit_skb() | | tipc_l2_rcv_msg() | tipc_rcv() | __tipc_node_link_up() | tipc_link_build_state_msg() | tipc_link_build_proto_msg() | tipc_mon_prep() | { | ... | // null-pointer dereference | u16 gen = mon->dom_gen; | ... | } // Not being executed yet | tipc_mon_create() | { | ... | // allocate | mon = kzalloc(); | ... | } |
Monitoring pointer in thread 2 is dereferenced before monitoring data is allocated in thread 1. This causes kernel panic.
This commit fixes it by allocating the monitoring data before enabling the bearer to receive messages.
Fixes: 35c55c9877f8 ("tipc: add neighbor monitoring framework") Reported-by: Shuang Li shuali@redhat.com Acked-by: Jon Maloy jmaloy@redhat.com Signed-off-by: Tung Nguyen tung.q.nguyen@dektech.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/tipc/bearer.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 60bc74b76adc..1cb5907d90d8 100644 --- a/net/tipc/bearer.c +++ b/net/tipc/bearer.c @@ -352,16 +352,18 @@ static int tipc_enable_bearer(struct net *net, const char *name, goto rejected; }
- test_and_set_bit_lock(0, &b->up); - rcu_assign_pointer(tn->bearer_list[bearer_id], b); - if (skb) - tipc_bearer_xmit_skb(net, bearer_id, skb, &b->bcast_addr); - + /* Create monitoring data before accepting activate messages */ if (tipc_mon_create(net, bearer_id)) { bearer_disable(net, b); + kfree_skb(skb); return -ENOMEM; }
+ test_and_set_bit_lock(0, &b->up); + rcu_assign_pointer(tn->bearer_list[bearer_id], b); + if (skb) + tipc_bearer_xmit_skb(net, bearer_id, skb, &b->bcast_addr); + pr_info("Enabled bearer <%s>, priority %u\n", name, prio);
return res;
From: Si-Wei Liu si-wei.liu@oracle.com
[ Upstream commit ed0f849fc3a63ed2ddf5e72cdb1de3bdbbb0f8eb ]
When control vq receives a VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command request from the driver, presently there is no validation against the number of queue pairs to configure, or even if multiqueue had been negotiated or not is unverified. This may lead to kernel panic due to uninitialized resource for the queues were there any bogus request sent down by untrusted driver. Tie up the loose ends there.
Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support") Signed-off-by: Si-Wei Liu si-wei.liu@oracle.com Link: https://lore.kernel.org/r/1642206481-30721-4-git-send-email-si-wei.liu@oracl... Signed-off-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Eli Cohen elic@nvidia.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index ef6da39ccb3f..7b4ab7cfc359 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1571,11 +1571,27 @@ static virtio_net_ctrl_ack handle_ctrl_mq(struct mlx5_vdpa_dev *mvdev, u8 cmd)
switch (cmd) { case VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET: + /* This mq feature check aligns with pre-existing userspace + * implementation. + * + * Without it, an untrusted driver could fake a multiqueue config + * request down to a non-mq device that may cause kernel to + * panic due to uninitialized resources for extra vqs. Even with + * a well behaving guest driver, it is not expected to allow + * changing the number of vqs on a non-mq device. + */ + if (!MLX5_FEATURE(mvdev, VIRTIO_NET_F_MQ)) + break; + read = vringh_iov_pull_iotlb(&cvq->vring, &cvq->riov, (void *)&mq, sizeof(mq)); if (read != sizeof(mq)) break;
newqps = mlx5vdpa16_to_cpu(mvdev, mq.virtqueue_pairs); + if (newqps < VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN || + newqps > mlx5_vdpa_max_qps(mvdev->max_vqs)) + break; + if (ndev->cur_num_vqs == 2 * newqps) { status = VIRTIO_NET_OK; break;
From: Xie Yongji xieyongji@bytedance.com
[ Upstream commit b9d102dafec6af1c07b610faf0a6d4e8aee14ae0 ]
This fixes the following smatch warnings:
drivers/vdpa/vdpa_user/iova_domain.c:305 vduse_domain_alloc_iova() warn: should 'iova_pfn << shift' be a 64 bit type?
Fixes: 8c773d53fb7b ("vduse: Implement an MMU-based software IOTLB") Reported-by: kernel test robot lkp@intel.com Reported-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Xie Yongji xieyongji@bytedance.com Link: https://lore.kernel.org/r/20220121083940.102-1-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vdpa/vdpa_user/iova_domain.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c index 1daae2608860..0678c2514197 100644 --- a/drivers/vdpa/vdpa_user/iova_domain.c +++ b/drivers/vdpa/vdpa_user/iova_domain.c @@ -302,7 +302,7 @@ vduse_domain_alloc_iova(struct iova_domain *iovad, iova_len = roundup_pow_of_two(iova_len); iova_pfn = alloc_iova_fast(iovad, iova_len, limit >> shift, true);
- return iova_pfn << shift; + return (dma_addr_t)iova_pfn << shift; }
static void vduse_domain_free_iova(struct iova_domain *iovad,
From: Heiner Kallweit hkallweit1@gmail.com
[ Upstream commit a502a8f04097e038c3daa16c5202a9538116d563 ]
This PHY doesn't support a link-up interrupt source. If aneg is enabled we use the "aneg complete" interrupt for this purpose, but if aneg is disabled link-up isn't signaled currently. According to a vendor driver there's an additional "energy detect" interrupt source that can be used to signal link-up if aneg is disabled. We can safely ignore this interrupt source if aneg is enabled.
This patch was tested on a TX3 Mini TV box with S905W (even though boot message says it's a S905D).
This issue has been existing longer, but due to changes in phylib and the driver the patch applies only from the commit marked as fixed.
Fixes: 84c8f773d2dc ("net: phy: meson-gxl: remove the use of .ack_callback()") Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Link: https://lore.kernel.org/r/04cac530-ea1b-850e-6cfa-144a55c4d75d@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/meson-gxl.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c index 7e7904fee1d9..c49062ad72c6 100644 --- a/drivers/net/phy/meson-gxl.c +++ b/drivers/net/phy/meson-gxl.c @@ -30,8 +30,12 @@ #define INTSRC_LINK_DOWN BIT(4) #define INTSRC_REMOTE_FAULT BIT(5) #define INTSRC_ANEG_COMPLETE BIT(6) +#define INTSRC_ENERGY_DETECT BIT(7) #define INTSRC_MASK 30
+#define INT_SOURCES (INTSRC_LINK_DOWN | INTSRC_ANEG_COMPLETE | \ + INTSRC_ENERGY_DETECT) + #define BANK_ANALOG_DSP 0 #define BANK_WOL 1 #define BANK_BIST 3 @@ -200,7 +204,6 @@ static int meson_gxl_ack_interrupt(struct phy_device *phydev)
static int meson_gxl_config_intr(struct phy_device *phydev) { - u16 val; int ret;
if (phydev->interrupts == PHY_INTERRUPT_ENABLED) { @@ -209,16 +212,9 @@ static int meson_gxl_config_intr(struct phy_device *phydev) if (ret) return ret;
- val = INTSRC_ANEG_PR - | INTSRC_PARALLEL_FAULT - | INTSRC_ANEG_LP_ACK - | INTSRC_LINK_DOWN - | INTSRC_REMOTE_FAULT - | INTSRC_ANEG_COMPLETE; - ret = phy_write(phydev, INTSRC_MASK, val); + ret = phy_write(phydev, INTSRC_MASK, INT_SOURCES); } else { - val = 0; - ret = phy_write(phydev, INTSRC_MASK, val); + ret = phy_write(phydev, INTSRC_MASK, 0);
/* Ack any pending IRQ */ ret = meson_gxl_ack_interrupt(phydev); @@ -237,9 +233,16 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev) return IRQ_NONE; }
+ irq_status &= INT_SOURCES; + if (irq_status == 0) return IRQ_NONE;
+ /* Aneg-complete interrupt is used for link-up detection */ + if (phydev->autoneg == AUTONEG_ENABLE && + irq_status == INTSRC_ENERGY_DETECT) + return IRQ_HANDLED; + phy_trigger_machine(phydev);
return IRQ_HANDLED;
From: Alexey Khoroshilov khoroshilov@ispras.ru
[ Upstream commit c6a502c2299941c8326d029cfc8a3bc8a4607ad5 ]
dsp_pipeline_build() allocates dup pointer by kstrdup(cfg), but then it updates dup variable by strsep(&dup, "|"). As a result when it calls kfree(dup), the dup variable contains NULL.
Found by Linux Driver Verification project (linuxtesting.org) with SVACE.
Signed-off-by: Alexey Khoroshilov khoroshilov@ispras.ru Fixes: 960366cf8dbb ("Add mISDN DSP") Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/isdn/mISDN/dsp_pipeline.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/isdn/mISDN/dsp_pipeline.c b/drivers/isdn/mISDN/dsp_pipeline.c index e11ca6bbc7f4..c3b2c99b5cd5 100644 --- a/drivers/isdn/mISDN/dsp_pipeline.c +++ b/drivers/isdn/mISDN/dsp_pipeline.c @@ -192,7 +192,7 @@ void dsp_pipeline_destroy(struct dsp_pipeline *pipeline) int dsp_pipeline_build(struct dsp_pipeline *pipeline, const char *cfg) { int found = 0; - char *dup, *tok, *name, *args; + char *dup, *next, *tok, *name, *args; struct dsp_element_entry *entry, *n; struct dsp_pipeline_entry *pipeline_entry; struct mISDN_dsp_element *elem; @@ -203,10 +203,10 @@ int dsp_pipeline_build(struct dsp_pipeline *pipeline, const char *cfg) if (!list_empty(&pipeline->list)) _dsp_pipeline_destroy(pipeline);
- dup = kstrdup(cfg, GFP_ATOMIC); + dup = next = kstrdup(cfg, GFP_ATOMIC); if (!dup) return 0; - while ((tok = strsep(&dup, "|"))) { + while ((tok = strsep(&next, "|"))) { if (!strlen(tok)) continue; name = strsep(&tok, "(");
From: Anirudh Rayabharam mail@anirudhrb.com
[ Upstream commit e2ae38cf3d91837a493cb2093c87700ff3cbe667 ]
In vhost_iotlb_add_range_ctx(), range size can overflow to 0 when start is 0 and last is ULONG_MAX. One instance where it can happen is when userspace sends an IOTLB message with iova=size=uaddr=0 (vhost_process_iotlb_msg). So, an entry with size = 0, start = 0, last = ULONG_MAX ends up in the iotlb. Next time a packet is sent, iotlb_access_ok() loops indefinitely due to that erroneous entry.
Call Trace: <TASK> iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340 vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366 vhost_transport_do_send_pkt+0xe0/0xfd0 drivers/vhost/vsock.c:104 vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK>
Reported by syzbot at: https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87
To fix this, do two things:
1. Return -EINVAL in vhost_chr_write_iter() when userspace asks to map a range with size 0. 2. Fix vhost_iotlb_add_range_ctx() to handle the range [0, ULONG_MAX] by splitting it into two entries.
Fixes: 0bbe30668d89e ("vhost: factor out IOTLB") Reported-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com Tested-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com Signed-off-by: Anirudh Rayabharam mail@anirudhrb.com Link: https://lore.kernel.org/r/20220305095525.5145-1-mail@anirudhrb.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/iotlb.c | 11 +++++++++++ drivers/vhost/vhost.c | 5 +++++ 2 files changed, 16 insertions(+)
diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c index 670d56c879e5..40b098320b2a 100644 --- a/drivers/vhost/iotlb.c +++ b/drivers/vhost/iotlb.c @@ -57,6 +57,17 @@ int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb, if (last < start) return -EFAULT;
+ /* If the range being mapped is [0, ULONG_MAX], split it into two entries + * otherwise its size would overflow u64. + */ + if (start == 0 && last == ULONG_MAX) { + u64 mid = last / 2; + + vhost_iotlb_add_range_ctx(iotlb, start, mid, addr, perm, opaque); + addr += mid + 1; + start = mid + 1; + } + if (iotlb->limit && iotlb->nmaps == iotlb->limit && iotlb->flags & VHOST_IOTLB_FLAG_RETIRE) { diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 59edb5a1ffe2..55475fd59fb7 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1170,6 +1170,11 @@ ssize_t vhost_chr_write_iter(struct vhost_dev *dev, goto done; }
+ if (msg.size == 0) { + ret = -EINVAL; + goto done; + } + if (dev->msg_handler) ret = dev->msg_handler(dev, &msg); else
On Mon, Mar 14, 2022 at 12:53:20PM +0100, Greg Kroah-Hartman wrote:
From: Anirudh Rayabharam mail@anirudhrb.com
[ Upstream commit e2ae38cf3d91837a493cb2093c87700ff3cbe667 ]
This breaks batching of IOTLB messages. [1] fixes it but hasn't landed in Linus' tree yet.
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?h=linu...
- Anirudh.
In vhost_iotlb_add_range_ctx(), range size can overflow to 0 when start is 0 and last is ULONG_MAX. One instance where it can happen is when userspace sends an IOTLB message with iova=size=uaddr=0 (vhost_process_iotlb_msg). So, an entry with size = 0, start = 0, last = ULONG_MAX ends up in the iotlb. Next time a packet is sent, iotlb_access_ok() loops indefinitely due to that erroneous entry.
Call Trace: <TASK> iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340 vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366 vhost_transport_do_send_pkt+0xe0/0xfd0 drivers/vhost/vsock.c:104 vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK>
Reported by syzbot at: https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87
To fix this, do two things:
- Return -EINVAL in vhost_chr_write_iter() when userspace asks to map a range with size 0.
- Fix vhost_iotlb_add_range_ctx() to handle the range [0, ULONG_MAX] by splitting it into two entries.
Fixes: 0bbe30668d89e ("vhost: factor out IOTLB") Reported-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com Tested-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com Signed-off-by: Anirudh Rayabharam mail@anirudhrb.com Link: https://lore.kernel.org/r/20220305095525.5145-1-mail@anirudhrb.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org
drivers/vhost/iotlb.c | 11 +++++++++++ drivers/vhost/vhost.c | 5 +++++ 2 files changed, 16 insertions(+)
diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c index 670d56c879e5..40b098320b2a 100644 --- a/drivers/vhost/iotlb.c +++ b/drivers/vhost/iotlb.c @@ -57,6 +57,17 @@ int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb, if (last < start) return -EFAULT;
- /* If the range being mapped is [0, ULONG_MAX], split it into two entries
* otherwise its size would overflow u64.
*/
- if (start == 0 && last == ULONG_MAX) {
u64 mid = last / 2;
vhost_iotlb_add_range_ctx(iotlb, start, mid, addr, perm, opaque);
addr += mid + 1;
start = mid + 1;
- }
- if (iotlb->limit && iotlb->nmaps == iotlb->limit && iotlb->flags & VHOST_IOTLB_FLAG_RETIRE) {
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 59edb5a1ffe2..55475fd59fb7 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1170,6 +1170,11 @@ ssize_t vhost_chr_write_iter(struct vhost_dev *dev, goto done; }
- if (msg.size == 0) {
ret = -EINVAL;
goto done;
- }
- if (dev->msg_handler) ret = dev->msg_handler(dev, &msg); else
-- 2.34.1
From: Xie Yongji xieyongji@bytedance.com
[ Upstream commit dacc73ed0b88f1a787ec20385f42ca9dd9eddcd0 ]
Currently the value of max_discard_segment will be set to MAX_DISCARD_SEGMENTS (256) with no basis in hardware if device set 0 to max_discard_seg in configuration space. It's incorrect since the device might not be able to handle such large descriptors. To fix it, let's follow max_segments restrictions in this case.
Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") Signed-off-by: Xie Yongji xieyongji@bytedance.com Link: https://lore.kernel.org/r/20220304100058.116-1-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/block/virtio_blk.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 6ae38776e30e..87f239eb0a99 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -926,9 +926,15 @@ static int virtblk_probe(struct virtio_device *vdev)
virtio_cread(vdev, struct virtio_blk_config, max_discard_seg, &v); + + /* + * max_discard_seg == 0 is out of spec but we always + * handled it. + */ + if (!v) + v = sg_elems - 2; blk_queue_max_discard_segments(q, - min_not_zero(v, - MAX_DISCARD_SEGMENTS)); + min(v, MAX_DISCARD_SEGMENTS));
blk_queue_flag_set(QUEUE_FLAG_DISCARD, q); }
From: Xie Yongji xieyongji@bytedance.com
[ Upstream commit e030759a1ddcbf61d42b6e996bfeb675e0032d8b ]
Currently we have a BUG_ON() to make sure the number of sg list does not exceed queue_max_segments() in virtio_queue_rq(). However, the block layer uses queue_max_discard_segments() instead of queue_max_segments() to limit the sg list for discard requests. So the BUG_ON() might be triggered if virtio-blk device reports a larger value for max discard segment than queue_max_segments(). To fix it, let's simply remove the BUG_ON() which has become unnecessary after commit 02746e26c39e("virtio-blk: avoid preallocating big SGL for data"). And the unused vblk->sg_elems can also be removed together.
Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") Suggested-by: Christoph Hellwig hch@infradead.org Signed-off-by: Xie Yongji xieyongji@bytedance.com Reviewed-by: Max Gurtovoy mgurtovoy@nvidia.com Link: https://lore.kernel.org/r/20220304100058.116-2-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/block/virtio_blk.c | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 87f239eb0a99..b3df5e5452a7 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -76,9 +76,6 @@ struct virtio_blk { */ refcount_t refs;
- /* What host tells us, plus 2 for header & tailer. */ - unsigned int sg_elems; - /* Ida index - used to track minor number allocations. */ int index;
@@ -322,8 +319,6 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, blk_status_t status; int err;
- BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems); - status = virtblk_setup_cmd(vblk->vdev, req, vbr); if (unlikely(status)) return status; @@ -783,8 +778,6 @@ static int virtblk_probe(struct virtio_device *vdev) /* Prevent integer overflows and honor max vq size */ sg_elems = min_t(u32, sg_elems, VIRTIO_BLK_MAX_SG_ELEMS - 2);
- /* We need extra sg elements at head and tail. */ - sg_elems += 2; vdev->priv = vblk = kmalloc(sizeof(*vblk), GFP_KERNEL); if (!vblk) { err = -ENOMEM; @@ -796,7 +789,6 @@ static int virtblk_probe(struct virtio_device *vdev) mutex_init(&vblk->vdev_mutex);
vblk->vdev = vdev; - vblk->sg_elems = sg_elems;
INIT_WORK(&vblk->config_work, virtblk_config_changed_work);
@@ -854,7 +846,7 @@ static int virtblk_probe(struct virtio_device *vdev) set_disk_ro(vblk->disk, 1);
/* We can handle whatever the host told us to handle. */ - blk_queue_max_segments(q, vblk->sg_elems-2); + blk_queue_max_segments(q, sg_elems);
/* No real sector limit. */ blk_queue_max_hw_sectors(q, -1U); @@ -932,7 +924,7 @@ static int virtblk_probe(struct virtio_device *vdev) * handled it. */ if (!v) - v = sg_elems - 2; + v = sg_elems; blk_queue_max_discard_segments(q, min(v, MAX_DISCARD_SEGMENTS));
From: Zhang Min zhang.min9@zte.com.cn
[ Upstream commit eb057b44dbe35ae14527830236a92f51de8f9184 ]
When vp_vdpa driver is unbind, vp_vdpa is freed in vdpa_unregister_device and then vp_vdpa->mdev.pci_dev is dereferenced in vp_modern_remove, triggering use-after-free.
Call Trace of unbinding driver free vp_vdpa : do_syscall_64 vfs_write kernfs_fop_write_iter device_release_driver_internal pci_device_remove vp_vdpa_remove vdpa_unregister_device kobject_release device_release kfree
Call Trace of dereference vp_vdpa->mdev.pci_dev: vp_modern_remove pci_release_selected_regions pci_release_region pci_resource_len pci_resource_end (dev)->resource[(bar)].end
Signed-off-by: Zhang Min zhang.min9@zte.com.cn Signed-off-by: Yi Wang wang.yi59@zte.com.cn Link: https://lore.kernel.org/r/20220301091059.46869-1-wang.yi59@zte.com.cn Signed-off-by: Michael S. Tsirkin mst@redhat.com Fixes: 64b9f64f80a6 ("vdpa: introduce virtio pci driver") Reviewed-by: Stefano Garzarella sgarzare@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vdpa/virtio_pci/vp_vdpa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c b/drivers/vdpa/virtio_pci/vp_vdpa.c index e3ff7875e123..fab161961160 100644 --- a/drivers/vdpa/virtio_pci/vp_vdpa.c +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c @@ -525,8 +525,8 @@ static void vp_vdpa_remove(struct pci_dev *pdev) { struct vp_vdpa *vp_vdpa = pci_get_drvdata(pdev);
- vdpa_unregister_device(&vp_vdpa->vdpa); vp_modern_remove(&vp_vdpa->mdev); + vdpa_unregister_device(&vp_vdpa->vdpa); }
static struct pci_driver vp_vdpa_driver = {
From: Jia-Ju Bai baijiaju1990@gmail.com
[ Upstream commit d0aeb0d4a3f7d2a0df7e9545892bbeede8f2ac7e ]
The function dma_set_mask() in setup_hw() can fail, so its return value should be checked.
Fixes: 1700fe1a10dc ("Add mISDN HFC PCI driver") Reported-by: TOTE Robot oslab@tsinghua.edu.cn Signed-off-by: Jia-Ju Bai baijiaju1990@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/isdn/hardware/mISDN/hfcpci.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/isdn/hardware/mISDN/hfcpci.c b/drivers/isdn/hardware/mISDN/hfcpci.c index bd087cca1c1d..af17459c1a5c 100644 --- a/drivers/isdn/hardware/mISDN/hfcpci.c +++ b/drivers/isdn/hardware/mISDN/hfcpci.c @@ -2005,7 +2005,11 @@ setup_hw(struct hfc_pci *hc) } /* Allocate memory for FIFOS */ /* the memory needs to be on a 32k boundary within the first 4G */ - dma_set_mask(&hc->pdev->dev, 0xFFFF8000); + if (dma_set_mask(&hc->pdev->dev, 0xFFFF8000)) { + printk(KERN_WARNING + "HFC-PCI: No usable DMA configuration!\n"); + return -EIO; + } buffer = dma_alloc_coherent(&hc->pdev->dev, 0x8000, &hc->hw.dmahandle, GFP_KERNEL); /* We silently assume the address is okay if nonzero */
From: Jia-Ju Bai baijiaju1990@gmail.com
[ Upstream commit e0058f0fa80f6e09c4d363779c241c45a3c56b94 ]
The function dma_alloc_coherent() in qed_vf_hw_prepare() can fail, so its return value should be checked.
Fixes: 1408cc1fa48c ("qed: Introduce VFs") Reported-by: TOTE Robot oslab@tsinghua.edu.cn Signed-off-by: Jia-Ju Bai baijiaju1990@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qlogic/qed/qed_vf.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_vf.c b/drivers/net/ethernet/qlogic/qed/qed_vf.c index 597cd9cd57b5..7b0e390c0b07 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_vf.c +++ b/drivers/net/ethernet/qlogic/qed/qed_vf.c @@ -513,6 +513,9 @@ int qed_vf_hw_prepare(struct qed_hwfn *p_hwfn) p_iov->bulletin.size, &p_iov->bulletin.phys, GFP_KERNEL); + if (!p_iov->bulletin.p_virt) + goto free_pf2vf_reply; + DP_VERBOSE(p_hwfn, QED_MSG_IOV, "VF's bulletin Board [%p virt 0x%llx phys 0x%08x bytes]\n", p_iov->bulletin.p_virt, @@ -552,6 +555,10 @@ int qed_vf_hw_prepare(struct qed_hwfn *p_hwfn)
return rc;
+free_pf2vf_reply: + dma_free_coherent(&p_hwfn->cdev->pdev->dev, + sizeof(union pfvf_tlvs), + p_iov->pf2vf_reply, p_iov->pf2vf_reply_phys); free_vf2pf_request: dma_free_coherent(&p_hwfn->cdev->pdev->dev, sizeof(union vfpf_tlvs),
From: Steffen Klassert steffen.klassert@secunet.com
[ Upstream commit ebe48d368e97d007bfeb76fcb065d6cfc4c96645 ]
The maximum message size that can be send is bigger than the maximum site that skb_page_frag_refill can allocate. So it is possible to write beyond the allocated buffer.
Fix this by doing a fallback to COW in that case.
v2:
Avoid get get_order() costs as suggested by Linus Torvalds.
Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible") Reported-by: valis sec@valis.email Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/esp.h | 2 ++ net/ipv4/esp4.c | 5 +++++ net/ipv6/esp6.c | 5 +++++ 3 files changed, 12 insertions(+)
diff --git a/include/net/esp.h b/include/net/esp.h index 9c5637d41d95..90cd02ff77ef 100644 --- a/include/net/esp.h +++ b/include/net/esp.h @@ -4,6 +4,8 @@
#include <linux/skbuff.h>
+#define ESP_SKB_FRAG_MAXSIZE (PAGE_SIZE << SKB_FRAG_PAGE_ORDER) + struct ip_esp_hdr;
static inline struct ip_esp_hdr *ip_esp_hdr(const struct sk_buff *skb) diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c index e1b1d080e908..70e6c87fbe3d 100644 --- a/net/ipv4/esp4.c +++ b/net/ipv4/esp4.c @@ -446,6 +446,7 @@ int esp_output_head(struct xfrm_state *x, struct sk_buff *skb, struct esp_info * struct page *page; struct sk_buff *trailer; int tailen = esp->tailen; + unsigned int allocsz;
/* this is non-NULL only with TCP/UDP Encapsulation */ if (x->encap) { @@ -455,6 +456,10 @@ int esp_output_head(struct xfrm_state *x, struct sk_buff *skb, struct esp_info * return err; }
+ allocsz = ALIGN(skb->data_len + tailen, L1_CACHE_BYTES); + if (allocsz > ESP_SKB_FRAG_MAXSIZE) + goto cow; + if (!skb_cloned(skb)) { if (tailen <= skb_tailroom(skb)) { nfrags = 1; diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c index 883b53fd7846..b7b573085bd5 100644 --- a/net/ipv6/esp6.c +++ b/net/ipv6/esp6.c @@ -483,6 +483,7 @@ int esp6_output_head(struct xfrm_state *x, struct sk_buff *skb, struct esp_info struct page *page; struct sk_buff *trailer; int tailen = esp->tailen; + unsigned int allocsz;
if (x->encap) { int err = esp6_output_encap(x, skb, esp); @@ -491,6 +492,10 @@ int esp6_output_head(struct xfrm_state *x, struct sk_buff *skb, struct esp_info return err; }
+ allocsz = ALIGN(skb->data_len + tailen, L1_CACHE_BYTES); + if (allocsz > ESP_SKB_FRAG_MAXSIZE) + goto cow; + if (!skb_cloned(skb)) { if (tailen <= skb_tailroom(skb)) { nfrags = 1;
From: Steffen Klassert steffen.klassert@secunet.com
[ Upstream commit 053c8fdf2c930efdff5496960842bbb5c34ad43a ]
The xfrm{4,6}_beet_gso_segment() functions did not correctly set the SKB_GSO_IPXIP4 and SKB_GSO_IPXIP6 gso types for the address family tunneling case. Fix this by setting these gso types.
Fixes: 384a46ea7bdc7 ("esp4: add gso_segment for esp4 beet mode") Fixes: 7f9e40eb18a99 ("esp6: add gso_segment for esp6 beet mode") Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/esp4_offload.c | 3 +++ net/ipv6/esp6_offload.c | 3 +++ 2 files changed, 6 insertions(+)
diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c index 8e4e9aa12130..dad5d29a6a8d 100644 --- a/net/ipv4/esp4_offload.c +++ b/net/ipv4/esp4_offload.c @@ -159,6 +159,9 @@ static struct sk_buff *xfrm4_beet_gso_segment(struct xfrm_state *x, skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV4; }
+ if (proto == IPPROTO_IPV6) + skb_shinfo(skb)->gso_type |= SKB_GSO_IPXIP4; + __skb_pull(skb, skb_transport_offset(skb)); ops = rcu_dereference(inet_offloads[proto]); if (likely(ops && ops->callbacks.gso_segment)) diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c index a349d4798077..302170882382 100644 --- a/net/ipv6/esp6_offload.c +++ b/net/ipv6/esp6_offload.c @@ -198,6 +198,9 @@ static struct sk_buff *xfrm6_beet_gso_segment(struct xfrm_state *x, ipv6_skip_exthdr(skb, 0, &proto, &frag); }
+ if (proto == IPPROTO_IPIP) + skb_shinfo(skb)->gso_type |= SKB_GSO_IPXIP6; + __skb_pull(skb, skb_transport_offset(skb)); ops = rcu_dereference(inet6_offloads[proto]); if (likely(ops && ops->callbacks.gso_segment))
From: Eric Dumazet edumazet@google.com
[ Upstream commit 0b935d7f8c07bf0a192712bdbf76dbf45ef8b115 ]
This helper is used once, no need to keep it in fat net/core/skbuff.c
Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/netdevice.h | 1 - net/core/skbuff.c | 26 -------------------------- net/ipv4/udp_offload.c | 27 +++++++++++++++++++++++++++ 3 files changed, 27 insertions(+), 27 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 049858c671ef..7500ac08c9ba 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3007,7 +3007,6 @@ struct net_device *dev_get_by_napi_id(unsigned int napi_id); int netdev_get_name(struct net *net, char *name, int ifindex); int dev_restart(struct net_device *dev); int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb); -int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb);
static inline unsigned int skb_gro_offset(const struct sk_buff *skb) { diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 56e23333e708..f1e3d70e8987 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3919,32 +3919,6 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, } EXPORT_SYMBOL_GPL(skb_segment_list);
-int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb) -{ - if (unlikely(p->len + skb->len >= 65536)) - return -E2BIG; - - if (NAPI_GRO_CB(p)->last == p) - skb_shinfo(p)->frag_list = skb; - else - NAPI_GRO_CB(p)->last->next = skb; - - skb_pull(skb, skb_gro_offset(skb)); - - NAPI_GRO_CB(p)->last = skb; - NAPI_GRO_CB(p)->count++; - p->data_len += skb->len; - - /* sk owenrship - if any - completely transferred to the aggregated packet */ - skb->destructor = NULL; - p->truesize += skb->truesize; - p->len += skb->len; - - NAPI_GRO_CB(skb)->same_flow = 1; - - return 0; -} - /** * skb_segment - Perform protocol segmentation on skb. * @head_skb: buffer to segment diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index 86d32a1e62ac..c2398f9e46f0 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -424,6 +424,33 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, return segs; }
+static int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb) +{ + if (unlikely(p->len + skb->len >= 65536)) + return -E2BIG; + + if (NAPI_GRO_CB(p)->last == p) + skb_shinfo(p)->frag_list = skb; + else + NAPI_GRO_CB(p)->last->next = skb; + + skb_pull(skb, skb_gro_offset(skb)); + + NAPI_GRO_CB(p)->last = skb; + NAPI_GRO_CB(p)->count++; + p->data_len += skb->len; + + /* sk owenrship - if any - completely transferred to the aggregated packet */ + skb->destructor = NULL; + p->truesize += skb->truesize; + p->len += skb->len; + + NAPI_GRO_CB(skb)->same_flow = 1; + + return 0; +} + + #define UDP_GRO_CNT_MAX 64 static struct sk_buff *udp_gro_receive_segment(struct list_head *head, struct sk_buff *skb)
From: Tom Rix trix@redhat.com
[ Upstream commit d9dc0c84ad2d4cc911ba252c973d1bf18d5eb9cf ]
Clang static analysis reports this issue qed_sriov.c:4727:19: warning: Assigned value is garbage or undefined ivi->max_tx_rate = tx_rate ? tx_rate : link.speed; ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
link is only sometimes set by the call to qed_iov_get_link() qed_iov_get_link fails without setting link or returning status. So change the decl to return status.
Fixes: 73390ac9d82b ("qed*: support ndo_get_vf_config") Signed-off-by: Tom Rix trix@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qlogic/qed/qed_sriov.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_sriov.c b/drivers/net/ethernet/qlogic/qed/qed_sriov.c index 8ac38828ba45..48cf4355bc47 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_sriov.c +++ b/drivers/net/ethernet/qlogic/qed/qed_sriov.c @@ -3806,11 +3806,11 @@ bool qed_iov_mark_vf_flr(struct qed_hwfn *p_hwfn, u32 *p_disabled_vfs) return found; }
-static void qed_iov_get_link(struct qed_hwfn *p_hwfn, - u16 vfid, - struct qed_mcp_link_params *p_params, - struct qed_mcp_link_state *p_link, - struct qed_mcp_link_capabilities *p_caps) +static int qed_iov_get_link(struct qed_hwfn *p_hwfn, + u16 vfid, + struct qed_mcp_link_params *p_params, + struct qed_mcp_link_state *p_link, + struct qed_mcp_link_capabilities *p_caps) { struct qed_vf_info *p_vf = qed_iov_get_vf_info(p_hwfn, vfid, @@ -3818,7 +3818,7 @@ static void qed_iov_get_link(struct qed_hwfn *p_hwfn, struct qed_bulletin_content *p_bulletin;
if (!p_vf) - return; + return -EINVAL;
p_bulletin = p_vf->bulletin.p_virt;
@@ -3828,6 +3828,7 @@ static void qed_iov_get_link(struct qed_hwfn *p_hwfn, __qed_vf_get_link_state(p_hwfn, p_link, p_bulletin); if (p_caps) __qed_vf_get_link_caps(p_hwfn, p_caps, p_bulletin); + return 0; }
static int @@ -4686,6 +4687,7 @@ static int qed_get_vf_config(struct qed_dev *cdev, struct qed_public_vf_info *vf_info; struct qed_mcp_link_state link; u32 tx_rate; + int ret;
/* Sanitize request */ if (IS_VF(cdev)) @@ -4699,7 +4701,9 @@ static int qed_get_vf_config(struct qed_dev *cdev,
vf_info = qed_iov_get_public_vf_info(hwfn, vf_id, true);
- qed_iov_get_link(hwfn, vf_id, NULL, &link, NULL); + ret = qed_iov_get_link(hwfn, vf_id, NULL, &link, NULL); + if (ret) + return ret;
/* Fill information about VF */ ivi->vf = vf_id;
From: Fabio Estevam festevam@denx.de
[ Upstream commit c70c453abcbf3ecbaadd4c3236a5119b8da365cf ]
According to Documentation/driver-api/usb/URB.rst when a device is unplugged usb_submit_urb() returns -ENODEV.
This error code propagates all the way up to usbnet_read_cmd() and usbnet_write_cmd() calls inside the smsc95xx.c driver during Ethernet cable unplug, unbind or reboot.
This causes the following errors to be shown on reboot, for example:
ci_hdrc ci_hdrc.1: remove, state 1 usb usb2: USB disconnect, device number 1 usb 2-1: USB disconnect, device number 2 usb 2-1.1: USB disconnect, device number 3 smsc95xx 2-1.1:1.0 eth1: unregister 'smsc95xx' usb-ci_hdrc.1-1.1, smsc95xx USB 2.0 Ethernet smsc95xx 2-1.1:1.0 eth1: Failed to read reg index 0x00000114: -19 smsc95xx 2-1.1:1.0 eth1: Error reading MII_ACCESS smsc95xx 2-1.1:1.0 eth1: __smsc95xx_mdio_read: MII is busy smsc95xx 2-1.1:1.0 eth1: Failed to read reg index 0x00000114: -19 smsc95xx 2-1.1:1.0 eth1: Error reading MII_ACCESS smsc95xx 2-1.1:1.0 eth1: __smsc95xx_mdio_read: MII is busy smsc95xx 2-1.1:1.0 eth1: hardware isn't capable of remote wakeup usb 2-1.4: USB disconnect, device number 4 ci_hdrc ci_hdrc.1: USB bus 2 deregistered ci_hdrc ci_hdrc.0: remove, state 4 usb usb1: USB disconnect, device number 1 ci_hdrc ci_hdrc.0: USB bus 1 deregistered imx2-wdt 30280000.watchdog: Device shutdown: Expect reboot! reboot: Restarting system
Ignore the -ENODEV errors inside __smsc95xx_mdio_read() and __smsc95xx_phy_wait_not_busy() and do not print error messages when -ENODEV is returned.
Fixes: a049a30fc27c ("net: usb: Correct PHY handling of smsc95xx") Signed-off-by: Fabio Estevam festevam@denx.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/smsc95xx.c | 28 ++++++++++++++++++++-------- 1 file changed, 20 insertions(+), 8 deletions(-)
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c index bc1e3dd67c04..a0f29482294d 100644 --- a/drivers/net/usb/smsc95xx.c +++ b/drivers/net/usb/smsc95xx.c @@ -84,9 +84,10 @@ static int __must_check __smsc95xx_read_reg(struct usbnet *dev, u32 index, ret = fn(dev, USB_VENDOR_REQUEST_READ_REGISTER, USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE, 0, index, &buf, 4); - if (unlikely(ret < 0)) { - netdev_warn(dev->net, "Failed to read reg index 0x%08x: %d\n", - index, ret); + if (ret < 0) { + if (ret != -ENODEV) + netdev_warn(dev->net, "Failed to read reg index 0x%08x: %d\n", + index, ret); return ret; }
@@ -116,7 +117,7 @@ static int __must_check __smsc95xx_write_reg(struct usbnet *dev, u32 index, ret = fn(dev, USB_VENDOR_REQUEST_WRITE_REGISTER, USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE, 0, index, &buf, 4); - if (unlikely(ret < 0)) + if (ret < 0 && ret != -ENODEV) netdev_warn(dev->net, "Failed to write reg index 0x%08x: %d\n", index, ret);
@@ -159,6 +160,9 @@ static int __must_check __smsc95xx_phy_wait_not_busy(struct usbnet *dev, do { ret = __smsc95xx_read_reg(dev, MII_ADDR, &val, in_pm); if (ret < 0) { + /* Ignore -ENODEV error during disconnect() */ + if (ret == -ENODEV) + return 0; netdev_warn(dev->net, "Error reading MII_ACCESS\n"); return ret; } @@ -194,7 +198,8 @@ static int __smsc95xx_mdio_read(struct usbnet *dev, int phy_id, int idx, addr = mii_address_cmd(phy_id, idx, MII_READ_ | MII_BUSY_); ret = __smsc95xx_write_reg(dev, MII_ADDR, addr, in_pm); if (ret < 0) { - netdev_warn(dev->net, "Error writing MII_ADDR\n"); + if (ret != -ENODEV) + netdev_warn(dev->net, "Error writing MII_ADDR\n"); goto done; }
@@ -206,7 +211,8 @@ static int __smsc95xx_mdio_read(struct usbnet *dev, int phy_id, int idx,
ret = __smsc95xx_read_reg(dev, MII_DATA, &val, in_pm); if (ret < 0) { - netdev_warn(dev->net, "Error reading MII_DATA\n"); + if (ret != -ENODEV) + netdev_warn(dev->net, "Error reading MII_DATA\n"); goto done; }
@@ -214,6 +220,10 @@ static int __smsc95xx_mdio_read(struct usbnet *dev, int phy_id, int idx,
done: mutex_unlock(&dev->phy_mutex); + + /* Ignore -ENODEV error during disconnect() */ + if (ret == -ENODEV) + return 0; return ret; }
@@ -235,7 +245,8 @@ static void __smsc95xx_mdio_write(struct usbnet *dev, int phy_id, val = regval; ret = __smsc95xx_write_reg(dev, MII_DATA, val, in_pm); if (ret < 0) { - netdev_warn(dev->net, "Error writing MII_DATA\n"); + if (ret != -ENODEV) + netdev_warn(dev->net, "Error writing MII_DATA\n"); goto done; }
@@ -243,7 +254,8 @@ static void __smsc95xx_mdio_write(struct usbnet *dev, int phy_id, addr = mii_address_cmd(phy_id, idx, MII_WRITE_ | MII_BUSY_); ret = __smsc95xx_write_reg(dev, MII_ADDR, addr, in_pm); if (ret < 0) { - netdev_warn(dev->net, "Error writing MII_ADDR\n"); + if (ret != -ENODEV) + netdev_warn(dev->net, "Error writing MII_ADDR\n"); goto done; }
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
[ Upstream commit 660c619b9d7ccd28648ee3766cdbe94ec7b27402 ]
It appears that GPIO ACPI library uses ACPI debounce values directly. However, the GPIO library APIs expect the debounce timeout to be in microseconds.
Convert ACPI value of debounce to microseconds.
While at it, document this detail where it is appropriate.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215664 Reported-by: Kai-Heng Feng kai.heng.feng@canonical.com Fixes: 8dcb7a15a585 ("gpiolib: acpi: Take into account debounce settings") Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Tested-by: Kai-Heng Feng kai.heng.feng@canonical.com Reviewed-by: Mika Westerberg mika.westerberg@linux.intel.com Signed-off-by: Bartosz Golaszewski brgl@bgdev.pl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpiolib-acpi.c | 6 ++++-- drivers/gpio/gpiolib.c | 10 ++++++++++ 2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/gpio/gpiolib-acpi.c b/drivers/gpio/gpiolib-acpi.c index feb8157d2d67..c49b3b5334cd 100644 --- a/drivers/gpio/gpiolib-acpi.c +++ b/drivers/gpio/gpiolib-acpi.c @@ -308,7 +308,8 @@ static struct gpio_desc *acpi_request_own_gpiod(struct gpio_chip *chip, if (IS_ERR(desc)) return desc;
- ret = gpio_set_debounce_timeout(desc, agpio->debounce_timeout); + /* ACPI uses hundredths of milliseconds units */ + ret = gpio_set_debounce_timeout(desc, agpio->debounce_timeout * 10); if (ret) dev_warn(chip->parent, "Failed to set debounce-timeout for pin 0x%04X, err %d\n", @@ -1049,7 +1050,8 @@ int acpi_dev_gpio_irq_get_by(struct acpi_device *adev, const char *name, int ind if (ret < 0) return ret;
- ret = gpio_set_debounce_timeout(desc, info.debounce); + /* ACPI uses hundredths of milliseconds units */ + ret = gpio_set_debounce_timeout(desc, info.debounce * 10); if (ret) return ret;
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index abfbf546d159..a1dca6dc03b4 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -2191,6 +2191,16 @@ static int gpio_set_bias(struct gpio_desc *desc) return gpio_set_config_with_argument_optional(desc, bias, arg); }
+/** + * gpio_set_debounce_timeout() - Set debounce timeout + * @desc: GPIO descriptor to set the debounce timeout + * @debounce: Debounce timeout in microseconds + * + * The function calls the certain GPIO driver to set debounce timeout + * in the hardware. + * + * Returns 0 on success, or negative error code otherwise. + */ int gpio_set_debounce_timeout(struct gpio_desc *desc, unsigned int debounce) { return gpio_set_config_with_argument_optional(desc,
From: Jouni Högander jouni.hogander@intel.com
[ Upstream commit 804f468853179b9b58af05c153c411931aa5b310 ]
Currently we are observing occasional screen flickering when PSR2 selective fetch is enabled. More specifically glitch seems to happen on full frame update when cursor moves to coords x = -1 or y = -1.
According to Bspec SF Single full frame should not be set if SF Partial Frame Enable is not set. This happened to be true for ADLP as PSR2_MAN_TRK_CTL_ENABLE is always set and for ADL_P it's actually "SF Partial Frame Enable" (Bit 31).
Setting "SF Partial Frame Enable" bit also on full update seems to fix screen flickering.
Also make code more clear by setting PSR2_MAN_TRK_CTL_ENABLE only if not on ADL_P. Bit 31 has different meaning in ADL_P.
Bspec: 49274
v2: Fix Mihai Harpau email address v3: Modify commit message and remove unnecessary comment
Tested-by: Lyude Paul lyude@redhat.com Fixes: 7f6002e58025 ("drm/i915/display: Enable PSR2 selective fetch by default") Reported-by: Lyude Paul lyude@redhat.com Cc: Mihai Harpau mharpau@gmail.com Cc: José Roberto de Souza jose.souza@intel.com Cc: Ville Syrjälä ville.syrjala@linux.intel.com Bugzilla: https://gitlab.freedesktop.org/drm/intel/-/issues/5077 Signed-off-by: Jouni Högander jouni.hogander@intel.com Reviewed-by: José Roberto de Souza jose.souza@intel.com Signed-off-by: José Roberto de Souza jose.souza@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20220225070228.855138-1-jouni.... (cherry picked from commit 8d5516d18b323cf7274d1cf5fe76f4a691f879c6) Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/display/intel_psr.c | 16 ++++++++++++++-- drivers/gpu/drm/i915/i915_reg.h | 1 + 2 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_psr.c b/drivers/gpu/drm/i915/display/intel_psr.c index 7a205fd5023b..3ba8b717e176 100644 --- a/drivers/gpu/drm/i915/display/intel_psr.c +++ b/drivers/gpu/drm/i915/display/intel_psr.c @@ -1400,6 +1400,13 @@ static inline u32 man_trk_ctl_single_full_frame_bit_get(struct drm_i915_private PSR2_MAN_TRK_CTL_SF_SINGLE_FULL_FRAME; }
+static inline u32 man_trk_ctl_partial_frame_bit_get(struct drm_i915_private *dev_priv) +{ + return IS_ALDERLAKE_P(dev_priv) ? + ADLP_PSR2_MAN_TRK_CTL_SF_PARTIAL_FRAME_UPDATE : + PSR2_MAN_TRK_CTL_SF_PARTIAL_FRAME_UPDATE; +} + static void psr_force_hw_tracking_exit(struct intel_dp *intel_dp) { struct drm_i915_private *dev_priv = dp_to_i915(intel_dp); @@ -1495,7 +1502,13 @@ static void psr2_man_trk_ctl_calc(struct intel_crtc_state *crtc_state, { struct intel_crtc *crtc = to_intel_crtc(crtc_state->uapi.crtc); struct drm_i915_private *dev_priv = to_i915(crtc->base.dev); - u32 val = PSR2_MAN_TRK_CTL_ENABLE; + u32 val = 0; + + if (!IS_ALDERLAKE_P(dev_priv)) + val = PSR2_MAN_TRK_CTL_ENABLE; + + /* SF partial frame enable has to be set even on full update */ + val |= man_trk_ctl_partial_frame_bit_get(dev_priv);
if (full_update) { /* @@ -1515,7 +1528,6 @@ static void psr2_man_trk_ctl_calc(struct intel_crtc_state *crtc_state, } else { drm_WARN_ON(crtc_state->uapi.crtc->dev, clip->y1 % 4 || clip->y2 % 4);
- val |= PSR2_MAN_TRK_CTL_SF_PARTIAL_FRAME_UPDATE; val |= PSR2_MAN_TRK_CTL_SU_REGION_START_ADDR(clip->y1 / 4 + 1); val |= PSR2_MAN_TRK_CTL_SU_REGION_END_ADDR(clip->y2 / 4 + 1); } diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 14ce8809efdd..e927776ae183 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -4738,6 +4738,7 @@ enum { #define ADLP_PSR2_MAN_TRK_CTL_SU_REGION_START_ADDR(val) REG_FIELD_PREP(ADLP_PSR2_MAN_TRK_CTL_SU_REGION_START_ADDR_MASK, val) #define ADLP_PSR2_MAN_TRK_CTL_SU_REGION_END_ADDR_MASK REG_GENMASK(12, 0) #define ADLP_PSR2_MAN_TRK_CTL_SU_REGION_END_ADDR(val) REG_FIELD_PREP(ADLP_PSR2_MAN_TRK_CTL_SU_REGION_END_ADDR_MASK, val) +#define ADLP_PSR2_MAN_TRK_CTL_SF_PARTIAL_FRAME_UPDATE REG_BIT(31) #define ADLP_PSR2_MAN_TRK_CTL_SF_SINGLE_FULL_FRAME REG_BIT(14) #define ADLP_PSR2_MAN_TRK_CTL_SF_CONTINUOS_FULL_FRAME REG_BIT(13)
From: Jernej Skrabec jernej.skrabec@gmail.com
[ Upstream commit 9470c29faa91c804aa04de4c10634bf02462bfa5 ]
It turns out that DE3 manual has inverted YUV and YVU format numbers for P010 and P210. Invert them.
This was tested by playing video decoded to P010 and additionally confirmed by looking at BSP driver source.
Fixes: 169ca4b38932 ("drm/sun4i: Add separate DE3 VI layer formats") Signed-off-by: Jernej Skrabec jernej.skrabec@gmail.com Signed-off-by: Maxime Ripard maxime@cerno.tech Link: https://patchwork.freedesktop.org/patch/msgid/20220228181436.1424550-1-jerne... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/sun4i/sun8i_mixer.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/sun4i/sun8i_mixer.h b/drivers/gpu/drm/sun4i/sun8i_mixer.h index 145833a9d82d..5b3fbee18671 100644 --- a/drivers/gpu/drm/sun4i/sun8i_mixer.h +++ b/drivers/gpu/drm/sun4i/sun8i_mixer.h @@ -111,10 +111,10 @@ /* format 13 is semi-planar YUV411 VUVU */ #define SUN8I_MIXER_FBFMT_YUV411 14 /* format 15 doesn't exist */ -/* format 16 is P010 YVU */ -#define SUN8I_MIXER_FBFMT_P010_YUV 17 -/* format 18 is P210 YVU */ -#define SUN8I_MIXER_FBFMT_P210_YUV 19 +#define SUN8I_MIXER_FBFMT_P010_YUV 16 +/* format 17 is P010 YVU */ +#define SUN8I_MIXER_FBFMT_P210_YUV 18 +/* format 19 is P210 YVU */ /* format 20 is packed YVU444 10-bit */ /* format 21 is packed YUV444 10-bit */
From: Russell King (Oracle) rmk+kernel@armlinux.org.uk
[ Upstream commit e5417cbf7ab5df1632e68fe7d9e6331fc0e7dbd6 ]
Discussing one of the tests in mt753x_phylink_validate() with Landen Chao confirms that the "||" should be "&&". Fix this.
Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch") Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Link: https://lore.kernel.org/r/E1nRCF0-00CiXD-7q@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/mt7530.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c index fb59efc7f926..14bf1828cbba 100644 --- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -2928,7 +2928,7 @@ mt753x_phylink_validate(struct dsa_switch *ds, int port,
phylink_set_port_modes(mask);
- if (state->interface != PHY_INTERFACE_MODE_TRGMII || + if (state->interface != PHY_INTERFACE_MODE_TRGMII && !phy_interface_mode_is_8023z(state->interface)) { phylink_set(mask, 10baseT_Half); phylink_set(mask, 10baseT_Full);
From: Joel Stanley joel@jms.id.au
[ Upstream commit 2f6edb6bcb2f3f41d876e0eba2ba97f87a0296ea ]
Requesting quad mode for the FMC resulted in an error:
&fmc { status = "okay"; + pinctrl-names = "default"; + pinctrl-0 = <&pinctrl_fwqspi_default>'
[ 0.742963] aspeed-g6-pinctrl 1e6e2000.syscon:pinctrl: invalid function FWQSPID in map table 
This is because the quad mode pins are a group of pins, not a function.
After applying this patch we can request the pins and the QSPI data lines are muxed:
# cat /sys/kernel/debug/pinctrl/1e6e2000.syscon:pinctrl-aspeed-g6-pinctrl/pinmux-pins |grep 1e620000.spi pin 196 (AE12): device 1e620000.spi function FWSPID group FWQSPID pin 197 (AF12): device 1e620000.spi function FWSPID group FWQSPID pin 240 (Y1): device 1e620000.spi function FWSPID group FWQSPID pin 241 (Y2): device 1e620000.spi function FWSPID group FWQSPID pin 242 (Y3): device 1e620000.spi function FWSPID group FWQSPID pin 243 (Y4): device 1e620000.spi function FWSPID group FWQSPID
Fixes: f510f04c8c83 ("ARM: dts: aspeed: Add AST2600 pinmux nodes") Signed-off-by: Joel Stanley joel@jms.id.au Reviewed-by: Andrew Jeffery andrew@aj.id.au Link: https://lore.kernel.org/r/20220304011010.974863-1-joel@jms.id.au Link: https://lore.kernel.org/r/20220304011010.974863-1-joel@jms.id.au' Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/aspeed-g6-pinctrl.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/aspeed-g6-pinctrl.dtsi b/arch/arm/boot/dts/aspeed-g6-pinctrl.dtsi index 6dde51c2aed3..e4775bbceecc 100644 --- a/arch/arm/boot/dts/aspeed-g6-pinctrl.dtsi +++ b/arch/arm/boot/dts/aspeed-g6-pinctrl.dtsi @@ -118,7 +118,7 @@ pinctrl_fwspid_default: fwspid_default { };
pinctrl_fwqspid_default: fwqspid_default { - function = "FWQSPID"; + function = "FWSPID"; groups = "FWQSPID"; };
From: Michal Maloszewski michal.maloszewski@intel.com
[ Upstream commit 2cf29e55894886965722e6625f6a03630b4db31d ]
Modify netdev->features for vlan stripping based on virtual channel messages received from the PF. Change is needed to synchronize vlan strip status between PF sysfs and iavf ethtool.
Fixes: 5951a2b9812d ("iavf: Fix VLAN feature flags after VFR") Signed-off-by: Norbert Ciosek norbertx.ciosek@intel.com Signed-off-by: Michal Maloszewski michal.maloszewski@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/intel/iavf/iavf_virtchnl.c | 40 +++++++++++++++++++ 1 file changed, 40 insertions(+)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c index d3da65d24bd6..c83ac6adeeb7 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c +++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c @@ -1460,6 +1460,22 @@ void iavf_request_reset(struct iavf_adapter *adapter) adapter->current_op = VIRTCHNL_OP_UNKNOWN; }
+/** + * iavf_netdev_features_vlan_strip_set - update vlan strip status + * @netdev: ptr to netdev being adjusted + * @enable: enable or disable vlan strip + * + * Helper function to change vlan strip status in netdev->features. + */ +static void iavf_netdev_features_vlan_strip_set(struct net_device *netdev, + const bool enable) +{ + if (enable) + netdev->features |= NETIF_F_HW_VLAN_CTAG_RX; + else + netdev->features &= ~NETIF_F_HW_VLAN_CTAG_RX; +} + /** * iavf_virtchnl_completion * @adapter: adapter structure @@ -1683,8 +1699,18 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter, } break; case VIRTCHNL_OP_ENABLE_VLAN_STRIPPING: + dev_warn(&adapter->pdev->dev, "Changing VLAN Stripping is not allowed when Port VLAN is configured\n"); + /* Vlan stripping could not be enabled by ethtool. + * Disable it in netdev->features. + */ + iavf_netdev_features_vlan_strip_set(netdev, false); + break; case VIRTCHNL_OP_DISABLE_VLAN_STRIPPING: dev_warn(&adapter->pdev->dev, "Changing VLAN Stripping is not allowed when Port VLAN is configured\n"); + /* Vlan stripping could not be disabled by ethtool. + * Enable it in netdev->features. + */ + iavf_netdev_features_vlan_strip_set(netdev, true); break; default: dev_err(&adapter->pdev->dev, "PF returned error %d (%s) to our request %d\n", @@ -1918,6 +1944,20 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter, spin_unlock_bh(&adapter->adv_rss_lock); } break; + case VIRTCHNL_OP_ENABLE_VLAN_STRIPPING: + /* PF enabled vlan strip on this VF. + * Update netdev->features if needed to be in sync with ethtool. + */ + if (!v_retval) + iavf_netdev_features_vlan_strip_set(netdev, true); + break; + case VIRTCHNL_OP_DISABLE_VLAN_STRIPPING: + /* PF disabled vlan strip on this VF. + * Update netdev->features if needed to be in sync with ethtool. + */ + if (!v_retval) + iavf_netdev_features_vlan_strip_set(netdev, false); + break; default: if (adapter->current_op && (v_opcode != adapter->current_op)) dev_warn(&adapter->pdev->dev, "Expected response %d from PF, received %d\n",
From: Jacob Keller jacob.e.keller@intel.com
[ Upstream commit 5710ab79166504013f7c0ae6a57e7d2fd26e5c43 ]
The i40e_vc_send_msg_to_vf_ex (and its wrapper i40e_vc_send_msg_to_vf) function has logic to detect "failure" responses sent to the VF. If a VF is sent more than I40E_DEFAULT_NUM_INVALID_MSGS_ALLOWED, then the VF is marked as disabled. In either case, a dev_info message is printed stating that a VF opcode failed.
This logic originates from the early implementation of VF support in commit 5c3c48ac6bf5 ("i40e: implement virtual device interface").
That commit did not go far enough. The "logic" for this behavior seems to be that error responses somehow indicate a malicious VF. This is not really true. The PF might be sending an error for any number of reasons such as lacking resources, an unsupported operation, etc. This does not indicate a malicious VF. We already have a separate robust malicious VF detection which relies on hardware logic to detect and prevent a variety of behaviors.
There is no justification for this behavior in the original implementation. In fact, a later commit 18b7af57d9c1 ("i40e: Lower some message levels") reduced the opcode failure message from a dev_err to a dev_info. In addition, recent commit 01cbf50877e6 ("i40e: Fix to not show opcode msg on unsuccessful VF MAC change") changed the logic to allow quieting it for expected failures.
That commit prevented this logic from kicking in for specific circumstances. This change did not go far enough. The behavior is not documented nor is it part of any requirement for our products. Other operating systems such as the FreeBSD implementation of our driver do not include this logic.
It is clear this check does not make sense, and causes problems which led to ugly workarounds.
Fix this by just removing the entire logic and the need for the i40e_vc_send_msg_to_vf_ex function.
Fixes: 01cbf50877e6 ("i40e: Fix to not show opcode msg on unsuccessful VF MAC change") Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface") Signed-off-by: Jacob Keller jacob.e.keller@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/intel/i40e/i40e_debugfs.c | 6 +- .../ethernet/intel/i40e/i40e_virtchnl_pf.c | 57 +++---------------- .../ethernet/intel/i40e/i40e_virtchnl_pf.h | 5 -- 3 files changed, 9 insertions(+), 59 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c index 1e57cc8c47d7..9db5001297c7 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c +++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c @@ -742,10 +742,8 @@ static void i40e_dbg_dump_vf(struct i40e_pf *pf, int vf_id) vsi = pf->vsi[vf->lan_vsi_idx]; dev_info(&pf->pdev->dev, "vf %2d: VSI id=%d, seid=%d, qps=%d\n", vf_id, vf->lan_vsi_id, vsi->seid, vf->num_queue_pairs); - dev_info(&pf->pdev->dev, " num MDD=%lld, invalid msg=%lld, valid msg=%lld\n", - vf->num_mdd_events, - vf->num_invalid_msgs, - vf->num_valid_msgs); + dev_info(&pf->pdev->dev, " num MDD=%lld\n", + vf->num_mdd_events); } else { dev_info(&pf->pdev->dev, "invalid VF id %d\n", vf_id); } diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index c6f643e54c4f..babf8b7fa767 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -1917,19 +1917,17 @@ int i40e_pci_sriov_configure(struct pci_dev *pdev, int num_vfs) /***********************virtual channel routines******************/
/** - * i40e_vc_send_msg_to_vf_ex + * i40e_vc_send_msg_to_vf * @vf: pointer to the VF info * @v_opcode: virtual channel opcode * @v_retval: virtual channel return value * @msg: pointer to the msg buffer * @msglen: msg length - * @is_quiet: true for not printing unsuccessful return values, false otherwise * * send msg to VF **/ -static int i40e_vc_send_msg_to_vf_ex(struct i40e_vf *vf, u32 v_opcode, - u32 v_retval, u8 *msg, u16 msglen, - bool is_quiet) +static int i40e_vc_send_msg_to_vf(struct i40e_vf *vf, u32 v_opcode, + u32 v_retval, u8 *msg, u16 msglen) { struct i40e_pf *pf; struct i40e_hw *hw; @@ -1944,25 +1942,6 @@ static int i40e_vc_send_msg_to_vf_ex(struct i40e_vf *vf, u32 v_opcode, hw = &pf->hw; abs_vf_id = vf->vf_id + hw->func_caps.vf_base_id;
- /* single place to detect unsuccessful return values */ - if (v_retval && !is_quiet) { - vf->num_invalid_msgs++; - dev_info(&pf->pdev->dev, "VF %d failed opcode %d, retval: %d\n", - vf->vf_id, v_opcode, v_retval); - if (vf->num_invalid_msgs > - I40E_DEFAULT_NUM_INVALID_MSGS_ALLOWED) { - dev_err(&pf->pdev->dev, - "Number of invalid messages exceeded for VF %d\n", - vf->vf_id); - dev_err(&pf->pdev->dev, "Use PF Control I/F to enable the VF\n"); - set_bit(I40E_VF_STATE_DISABLED, &vf->vf_states); - } - } else { - vf->num_valid_msgs++; - /* reset the invalid counter, if a valid message is received. */ - vf->num_invalid_msgs = 0; - } - aq_ret = i40e_aq_send_msg_to_vf(hw, abs_vf_id, v_opcode, v_retval, msg, msglen, NULL); if (aq_ret) { @@ -1975,23 +1954,6 @@ static int i40e_vc_send_msg_to_vf_ex(struct i40e_vf *vf, u32 v_opcode, return 0; }
-/** - * i40e_vc_send_msg_to_vf - * @vf: pointer to the VF info - * @v_opcode: virtual channel opcode - * @v_retval: virtual channel return value - * @msg: pointer to the msg buffer - * @msglen: msg length - * - * send msg to VF - **/ -static int i40e_vc_send_msg_to_vf(struct i40e_vf *vf, u32 v_opcode, - u32 v_retval, u8 *msg, u16 msglen) -{ - return i40e_vc_send_msg_to_vf_ex(vf, v_opcode, v_retval, - msg, msglen, false); -} - /** * i40e_vc_send_resp_to_vf * @vf: pointer to the VF info @@ -2813,7 +2775,6 @@ static int i40e_vc_get_stats_msg(struct i40e_vf *vf, u8 *msg) * i40e_check_vf_permission * @vf: pointer to the VF info * @al: MAC address list from virtchnl - * @is_quiet: set true for printing msg without opcode info, false otherwise * * Check that the given list of MAC addresses is allowed. Will return -EPERM * if any address in the list is not valid. Checks the following conditions: @@ -2828,15 +2789,13 @@ static int i40e_vc_get_stats_msg(struct i40e_vf *vf, u8 *msg) * addresses might not be accurate. **/ static inline int i40e_check_vf_permission(struct i40e_vf *vf, - struct virtchnl_ether_addr_list *al, - bool *is_quiet) + struct virtchnl_ether_addr_list *al) { struct i40e_pf *pf = vf->pf; struct i40e_vsi *vsi = pf->vsi[vf->lan_vsi_idx]; int mac2add_cnt = 0; int i;
- *is_quiet = false; for (i = 0; i < al->num_elements; i++) { struct i40e_mac_filter *f; u8 *addr = al->list[i].addr; @@ -2860,7 +2819,6 @@ static inline int i40e_check_vf_permission(struct i40e_vf *vf, !ether_addr_equal(addr, vf->default_lan_addr.addr)) { dev_err(&pf->pdev->dev, "VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation\n"); - *is_quiet = true; return -EPERM; }
@@ -2897,7 +2855,6 @@ static int i40e_vc_add_mac_addr_msg(struct i40e_vf *vf, u8 *msg) (struct virtchnl_ether_addr_list *)msg; struct i40e_pf *pf = vf->pf; struct i40e_vsi *vsi = NULL; - bool is_quiet = false; i40e_status ret = 0; int i;
@@ -2914,7 +2871,7 @@ static int i40e_vc_add_mac_addr_msg(struct i40e_vf *vf, u8 *msg) */ spin_lock_bh(&vsi->mac_filter_hash_lock);
- ret = i40e_check_vf_permission(vf, al, &is_quiet); + ret = i40e_check_vf_permission(vf, al); if (ret) { spin_unlock_bh(&vsi->mac_filter_hash_lock); goto error_param; @@ -2952,8 +2909,8 @@ static int i40e_vc_add_mac_addr_msg(struct i40e_vf *vf, u8 *msg)
error_param: /* send the response to the VF */ - return i40e_vc_send_msg_to_vf_ex(vf, VIRTCHNL_OP_ADD_ETH_ADDR, - ret, NULL, 0, is_quiet); + return i40e_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_ETH_ADDR, + ret, NULL, 0); }
/** diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h index 03c42fd0fea1..a554d0a0b09b 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h @@ -10,8 +10,6 @@
#define I40E_VIRTCHNL_SUPPORTED_QTYPES 2
-#define I40E_DEFAULT_NUM_INVALID_MSGS_ALLOWED 10 - #define I40E_VLAN_PRIORITY_SHIFT 13 #define I40E_VLAN_MASK 0xFFF #define I40E_PRIORITY_MASK 0xE000 @@ -92,9 +90,6 @@ struct i40e_vf { u8 num_queue_pairs; /* num of qps assigned to VF vsis */ u8 num_req_queues; /* num of requested qps */ u64 num_mdd_events; /* num of mdd events detected */ - /* num of continuous malformed or invalid msgs detected */ - u64 num_invalid_msgs; - u64 num_valid_msgs; /* num of valid msgs detected */
unsigned long vf_caps; /* vf's adv. capabilities */ unsigned long vf_states; /* vf's runtime states */
From: Jacob Keller jacob.e.keller@intel.com
[ Upstream commit 79498d5af8e458102242d1667cf44df1f1564e63 ]
The ice_vc_send_msg_to_vf function has logic to detect "failure" responses being sent to a VF. If a VF is sent more than ICE_DFLT_NUM_INVAL_MSGS_ALLOWED then the VF is marked as disabled. Almost identical logic also existed in the i40e driver.
This logic was added to the ice driver in commit 1071a8358a28 ("ice: Implement virtchnl commands for AVF support") which itself copied from the i40e implementation in commit 5c3c48ac6bf5 ("i40e: implement virtual device interface").
Neither commit provides a proper explanation or justification of the check. In fact, later commits to i40e changed the logic to allow bypassing the check in some specific instances.
The "logic" for this seems to be that error responses somehow indicate a malicious VF. This is not really true. The PF might be sending an error for any number of reasons such as lack of resources, etc.
Additionally, this causes the PF to log an info message for every failed VF response which may confuse users, and can spam the kernel log.
This behavior is not documented as part of any requirement for our products and other operating system drivers such as the FreeBSD implementation of our drivers do not include this type of check.
In fact, the change from dev_err to dev_info in i40e commit 18b7af57d9c1 ("i40e: Lower some message levels") explains that these messages typically don't actually indicate a real issue. It is quite likely that a user who hits this in practice will be very confused as the VF will be disabled without an obvious way to recover.
We already have robust malicious driver detection logic using actual hardware detection mechanisms that detect and prevent invalid device usage. Remove the logic since its not a documented requirement and the behavior is not intuitive.
Fixes: 1071a8358a28 ("ice: Implement virtchnl commands for AVF support") Signed-off-by: Jacob Keller jacob.e.keller@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/intel/ice/ice_virtchnl_pf.c | 18 ------------------ .../net/ethernet/intel/ice/ice_virtchnl_pf.h | 3 --- 2 files changed, 21 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c index a12cc305c461..e17813fb71a1 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c @@ -2297,24 +2297,6 @@ ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode,
dev = ice_pf_to_dev(pf);
- /* single place to detect unsuccessful return values */ - if (v_retval) { - vf->num_inval_msgs++; - dev_info(dev, "VF %d failed opcode %d, retval: %d\n", vf->vf_id, - v_opcode, v_retval); - if (vf->num_inval_msgs > ICE_DFLT_NUM_INVAL_MSGS_ALLOWED) { - dev_err(dev, "Number of invalid messages exceeded for VF %d\n", - vf->vf_id); - dev_err(dev, "Use PF Control I/F to enable the VF\n"); - set_bit(ICE_VF_STATE_DIS, vf->vf_states); - return -EIO; - } - } else { - vf->num_valid_msgs++; - /* reset the invalid counter, if a valid message is received. */ - vf->num_inval_msgs = 0; - } - aq_ret = ice_aq_send_msg_to_vf(&pf->hw, vf->vf_id, v_opcode, v_retval, msg, msglen, NULL); if (aq_ret && pf->hw.mailboxq.sq_last_status != ICE_AQ_RC_ENOSYS) { diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h index 7e28ecbbe7af..f33c0889a5d4 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h @@ -14,7 +14,6 @@ #define ICE_MAX_MACADDR_PER_VF 18
/* Malicious Driver Detection */ -#define ICE_DFLT_NUM_INVAL_MSGS_ALLOWED 10 #define ICE_MDD_EVENTS_THRESHOLD 30
/* Static VF transaction/status register def */ @@ -134,8 +133,6 @@ struct ice_vf { unsigned int max_tx_rate; /* Maximum Tx bandwidth limit in Mbps */ DECLARE_BITMAP(vf_states, ICE_VF_STATES_NBITS); /* VF runtime states */
- u64 num_inval_msgs; /* number of continuous invalid msgs */ - u64 num_valid_msgs; /* number of valid msgs detected */ unsigned long vf_caps; /* VF's adv. capabilities */ u8 num_req_qs; /* num of queue pairs requested by VF */ u16 num_mac;
From: Dave Ertman david.m.ertman@intel.com
[ Upstream commit 97b0129146b1544bbb0773585327896da3bb4e0a ]
When a bonded interface is destroyed, .ndo_change_mtu can be called during the tear-down process while the RTNL lock is held. This is a problem since the auxiliary driver linked to the LAN driver needs to be notified of the MTU change, and this requires grabbing a device_lock on the auxiliary_device's dev. Currently this is being attempted in the same execution context as the call to .ndo_change_mtu which is causing a dead-lock.
Move the notification of the changed MTU to a separate execution context (watchdog service task) and eliminate the "before" notification.
Fixes: 348048e724a0e ("ice: Implement iidc operations") Signed-off-by: Dave Ertman david.m.ertman@intel.com Tested-by: Jonathan Toppins jtoppins@redhat.com Tested-by: Gurucharan G gurucharanx.g@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice.h | 1 + drivers/net/ethernet/intel/ice/ice_main.c | 29 +++++++++++------------ 2 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index b067dd9c71e7..fa91896ae699 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -483,6 +483,7 @@ enum ice_pf_flags { ICE_FLAG_MDD_AUTO_RESET_VF, ICE_FLAG_LINK_LENIENT_MODE_ENA, ICE_FLAG_PLUG_AUX_DEV, + ICE_FLAG_MTU_CHANGED, ICE_PF_FLAGS_NBITS /* must be last */ };
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 8ee778aaa800..fc04b4cf4ae0 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -2240,6 +2240,17 @@ static void ice_service_task(struct work_struct *work) if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) ice_plug_aux_dev(pf);
+ if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags)) { + struct iidc_event *event; + + event = kzalloc(sizeof(*event), GFP_KERNEL); + if (event) { + set_bit(IIDC_EVENT_AFTER_MTU_CHANGE, event->type); + ice_send_event_to_aux(pf, event); + kfree(event); + } + } + ice_clean_adminq_subtask(pf); ice_check_media_subtask(pf); ice_check_for_hang_subtask(pf); @@ -6822,7 +6833,6 @@ static int ice_change_mtu(struct net_device *netdev, int new_mtu) struct ice_netdev_priv *np = netdev_priv(netdev); struct ice_vsi *vsi = np->vsi; struct ice_pf *pf = vsi->back; - struct iidc_event *event; u8 count = 0; int err = 0;
@@ -6857,14 +6867,6 @@ static int ice_change_mtu(struct net_device *netdev, int new_mtu) return -EBUSY; }
- event = kzalloc(sizeof(*event), GFP_KERNEL); - if (!event) - return -ENOMEM; - - set_bit(IIDC_EVENT_BEFORE_MTU_CHANGE, event->type); - ice_send_event_to_aux(pf, event); - clear_bit(IIDC_EVENT_BEFORE_MTU_CHANGE, event->type); - netdev->mtu = (unsigned int)new_mtu;
/* if VSI is up, bring it down and then back up */ @@ -6872,21 +6874,18 @@ static int ice_change_mtu(struct net_device *netdev, int new_mtu) err = ice_down(vsi); if (err) { netdev_err(netdev, "change MTU if_down err %d\n", err); - goto event_after; + return err; }
err = ice_up(vsi); if (err) { netdev_err(netdev, "change MTU if_up err %d\n", err); - goto event_after; + return err; } }
netdev_dbg(netdev, "changed MTU to %d\n", new_mtu); -event_after: - set_bit(IIDC_EVENT_AFTER_MTU_CHANGE, event->type); - ice_send_event_to_aux(pf, event); - kfree(event); + set_bit(ICE_FLAG_MTU_CHANGED, pf->flags);
return err; }
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 3d97f1afd8d831e0c0dc1157418f94b8faa97b54 ]
ice_misc_intr() is an irq handler. It should not sleep.
Use GFP_ATOMIC instead of GFP_KERNEL when allocating some memory.
Fixes: 348048e724a0 ("ice: Implement iidc operations") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Tested-by: Leszek Kaliszczuk leszek.kaliszczuk@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index fc04b4cf4ae0..676e837d48cf 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -3016,7 +3016,7 @@ static irqreturn_t ice_misc_intr(int __always_unused irq, void *data) struct iidc_event *event;
ena_mask &= ~ICE_AUX_CRIT_ERR; - event = kzalloc(sizeof(*event), GFP_KERNEL); + event = kzalloc(sizeof(*event), GFP_ATOMIC); if (event) { set_bit(IIDC_EVENT_CRIT_ERR, event->type); /* report the entire OICR value to AUX driver */
From: Jedrzej Jagielski jedrzej.jagielski@intel.com
[ Upstream commit ad35ffa252af67d4cc7c744b9377a2b577748e3f ]
Change curr_link_speed advertised speed, due to link_info.link_speed is not equal phy.curr_user_speed_req. Without this patch it is impossible to set advertised speed to same as link_speed.
Testing Hints: Try to set advertised speed to 25G only with 25G default link (use ethtool -s 0x80000000)
Fixes: 48cb27f2fd18 ("ice: Implement handlers for ethtool PHY/link operations") Signed-off-by: Grzegorz Siwik grzegorz.siwik@intel.com Signed-off-by: Jedrzej Jagielski jedrzej.jagielski@intel.com Tested-by: Gurucharan gurucharanx.g@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_ethtool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c index 572519e402f4..b05a5029b61f 100644 --- a/drivers/net/ethernet/intel/ice/ice_ethtool.c +++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c @@ -2314,7 +2314,7 @@ ice_set_link_ksettings(struct net_device *netdev, goto done; }
- curr_link_speed = pi->phy.link_info.link_speed; + curr_link_speed = pi->phy.curr_user_speed_req; adv_link_speed = ice_ksettings_find_adv_link_speed(ks);
/* If speed didn't get set, set it to what it currently is.
From: Miaoqian Lin linmq006@gmail.com
[ Upstream commit b19ab4b38b06aae12442b2de95ccf58b5dc53584 ]
This node pointer is returned by of_parse_phandle() with refcount incremented in this function. Calling of_node_put() to avoid the refcount leak. As the remove function do.
Fixes: 5cdaaa12866e ("net: emaclite: adding MDIO and phy lib support") Signed-off-by: Miaoqian Lin linmq006@gmail.com Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/20220308024751.2320-1-linmq006@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/xilinx/xilinx_emaclite.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/xilinx/xilinx_emaclite.c b/drivers/net/ethernet/xilinx/xilinx_emaclite.c index 0815de581c7f..7ae67b054191 100644 --- a/drivers/net/ethernet/xilinx/xilinx_emaclite.c +++ b/drivers/net/ethernet/xilinx/xilinx_emaclite.c @@ -1186,7 +1186,7 @@ static int xemaclite_of_probe(struct platform_device *ofdev) if (rc) { dev_err(dev, "Cannot register network device, aborting\n"); - goto error; + goto put_node; }
dev_info(dev, @@ -1194,6 +1194,8 @@ static int xemaclite_of_probe(struct platform_device *ofdev) (unsigned long __force)ndev->mem_start, lp->base_addr, ndev->irq); return 0;
+put_node: + of_node_put(lp->phy_node); error: free_netdev(ndev); return rc;
From: Tung Nguyen tung.q.nguyen@dektech.com.au
[ Upstream commit c79fcc27be90b308b3fa90811aefafdd4078668c ]
When receiving a state message, function tipc_link_validate_msg() is called to validate its header portion. Then, its data portion is validated before it can be accessed correctly. However, current data sanity check is done after the message header is accessed to update some link variables.
This commit fixes this issue by moving the data sanity check to the beginning of state message handling and right after the header sanity check.
Fixes: 9aa422ad3266 ("tipc: improve size validations for received domain records") Acked-by: Jon Maloy jmaloy@redhat.com Signed-off-by: Tung Nguyen tung.q.nguyen@dektech.com.au Link: https://lore.kernel.org/r/20220308021200.9245-1-tung.q.nguyen@dektech.com.au Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/tipc/link.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/net/tipc/link.c b/net/tipc/link.c index 4e7936d9b442..115a4a7950f5 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -2285,6 +2285,11 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, break;
case STATE_MSG: + /* Validate Gap ACK blocks, drop if invalid */ + glen = tipc_get_gap_ack_blks(&ga, l, hdr, true); + if (glen > dlen) + break; + l->rcv_nxt_state = msg_seqno(hdr) + 1;
/* Update own tolerance if peer indicates a non-zero value */ @@ -2310,10 +2315,6 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, break; }
- /* Receive Gap ACK blocks from peer if any */ - glen = tipc_get_gap_ack_blks(&ga, l, hdr, true); - if(glen > dlen) - break; tipc_mon_rcv(l->net, data + glen, dlen - glen, l->addr, &l->mon_state, l->bearer_id);
From: Jiasheng Jiang jiasheng@iscas.ac.cn
[ Upstream commit 6babfc6e6fab068018c36e8f6605184b8c0b349d ]
As the potential failure of the clk_enable(), it should be better to check it and return error if fails.
Fixes: 8a2c9a5ab4b9 ("net: ethernet: ti: cpts: rework initialization/deinitialization") Signed-off-by: Jiasheng Jiang jiasheng@iscas.ac.cn Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ti/cpts.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c index dc70a6bfaa6a..92ca739fac01 100644 --- a/drivers/net/ethernet/ti/cpts.c +++ b/drivers/net/ethernet/ti/cpts.c @@ -568,7 +568,9 @@ int cpts_register(struct cpts *cpts) for (i = 0; i < CPTS_MAX_EVENTS; i++) list_add(&cpts->pool_data[i].list, &cpts->pool);
- clk_enable(cpts->refclk); + err = clk_enable(cpts->refclk); + if (err) + return err;
cpts_write32(cpts, CPTS_EN, control); cpts_write32(cpts, TS_PEND_EN, int_enable);
From: Jiasheng Jiang jiasheng@iscas.ac.cn
[ Upstream commit 2169b79258c8be803d2595d6456b1e77129fe154 ]
As the potential failure of the clk_enable(), it should be better to check it and return error if fails.
Fixes: b7370112f519 ("lpc32xx: Added ethernet driver") Signed-off-by: Jiasheng Jiang jiasheng@iscas.ac.cn Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/nxp/lpc_eth.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/nxp/lpc_eth.c b/drivers/net/ethernet/nxp/lpc_eth.c index bc39558fe82b..756f97dce85b 100644 --- a/drivers/net/ethernet/nxp/lpc_eth.c +++ b/drivers/net/ethernet/nxp/lpc_eth.c @@ -1471,6 +1471,7 @@ static int lpc_eth_drv_resume(struct platform_device *pdev) { struct net_device *ndev = platform_get_drvdata(pdev); struct netdata_local *pldat; + int ret;
if (device_may_wakeup(&pdev->dev)) disable_irq_wake(ndev->irq); @@ -1480,7 +1481,9 @@ static int lpc_eth_drv_resume(struct platform_device *pdev) pldat = netdev_priv(ndev);
/* Enable interface clock */ - clk_enable(pldat->clk); + ret = clk_enable(pldat->clk); + if (ret) + return ret;
/* Reset and initialize */ __lpc_eth_reset(pldat);
From: Miaoqian Lin linmq006@gmail.com
[ Upstream commit c9ffa3e2bc451816ce0295e40063514fabf2bd36 ]
This node pointer is returned by of_find_compatible_node() with refcount incremented. Calling of_node_put() to aovid the refcount leak.
Fixes: 501ef3066c89 ("net: marvell: prestera: Add driver for Prestera family ASIC devices") Signed-off-by: Miaoqian Lin linmq006@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/prestera/prestera_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/marvell/prestera/prestera_main.c b/drivers/net/ethernet/marvell/prestera/prestera_main.c index c687dc9aa973..36c5b1eba30d 100644 --- a/drivers/net/ethernet/marvell/prestera/prestera_main.c +++ b/drivers/net/ethernet/marvell/prestera/prestera_main.c @@ -553,6 +553,7 @@ static int prestera_switch_set_base_mac_addr(struct prestera_switch *sw) dev_info(prestera_dev(sw), "using random base mac address\n"); } of_node_put(base_mac_np); + of_node_put(np);
return prestera_hw_switch_mac_set(sw, sw->base_mac); }
From: Duoming Zhou duoming@zju.edu.cn
[ Upstream commit 71171ac8eb34ce7fe6b3267dce27c313ab3cb3ac ]
When two ax25 devices attempted to establish connection, the requester use ax25_create(), ax25_bind() and ax25_connect() to initiate connection. The receiver use ax25_rcv() to accept connection and use ax25_create_cb() in ax25_rcv() to create ax25_cb, but the ax25_cb->sk is NULL. When the receiver is detaching, a NULL pointer dereference bug caused by sock_hold(sk) in ax25_kill_by_device() will happen. The corresponding fail log is shown below:
=============================================================== BUG: KASAN: null-ptr-deref in ax25_device_event+0xfd/0x290 Call Trace: ... ax25_device_event+0xfd/0x290 raw_notifier_call_chain+0x5e/0x70 dev_close_many+0x174/0x220 unregister_netdevice_many+0x1f7/0xa60 unregister_netdevice_queue+0x12f/0x170 unregister_netdev+0x13/0x20 mkiss_close+0xcd/0x140 tty_ldisc_release+0xc0/0x220 tty_release_struct+0x17/0xa0 tty_release+0x62d/0x670 ...
This patch add condition check in ax25_kill_by_device(). If s->sk is NULL, it will goto if branch to kill device.
Fixes: 4e0f718daf97 ("ax25: improve the incomplete fix to avoid UAF and NPD bugs") Reported-by: Thomas Osterried thomas@osterried.de Signed-off-by: Duoming Zhou duoming@zju.edu.cn Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ax25/af_ax25.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c index 44a8730c26ac..00bb087c2ca8 100644 --- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -87,6 +87,13 @@ static void ax25_kill_by_device(struct net_device *dev) ax25_for_each(s, &ax25_list) { if (s->ax25_dev == ax25_dev) { sk = s->sk; + if (!sk) { + spin_unlock_bh(&ax25_list_lock); + s->ax25_dev = NULL; + ax25_disconnect(s, ENETUNREACH); + spin_lock_bh(&ax25_list_lock); + goto again; + } sock_hold(sk); spin_unlock_bh(&ax25_list_lock); lock_sock(sk);
From: Mohammad Kabat mohammadkab@nvidia.com
[ Upstream commit ac77998b7ac3044f0509b097da9637184598980d ]
According to HW spec the field "size" should be 16 bits in bufferx register.
Fixes: e281682bf294 ("net/mlx5_core: HW data structs/types definitions cleanup") Signed-off-by: Mohammad Kabat mohammadkab@nvidia.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/mlx5/mlx5_ifc.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index fbaab440a484..58a60e46c319 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -9875,8 +9875,8 @@ struct mlx5_ifc_bufferx_reg_bits { u8 reserved_at_0[0x6]; u8 lossy[0x1]; u8 epsb[0x1]; - u8 reserved_at_8[0xc]; - u8 size[0xc]; + u8 reserved_at_8[0x8]; + u8 size[0x10];
u8 xoff_threshold[0x10]; u8 xon_threshold[0x10];
From: Moshe Shemesh moshe@nvidia.com
[ Upstream commit 063bd355595428750803d8736a9bb7c8db67d42d ]
Fix a refcount use after free warning due to a race on command entry. Such race occurs when one of the commands releases its last refcount and frees its index and entry while another process running command flush flow takes refcount to this command entry. The process which handles commands flush may see this command as needed to be flushed if the other process released its refcount but didn't release the index yet. Fix it by adding the needed spin lock.
It fixes the following warning trace:
refcount_t: addition on 0; use-after-free. WARNING: CPU: 11 PID: 540311 at lib/refcount.c:25 refcount_warn_saturate+0x80/0xe0 ... RIP: 0010:refcount_warn_saturate+0x80/0xe0 ... Call Trace: <TASK> mlx5_cmd_trigger_completions+0x293/0x340 [mlx5_core] mlx5_cmd_flush+0x3a/0xf0 [mlx5_core] enter_error_state+0x44/0x80 [mlx5_core] mlx5_fw_fatal_reporter_err_work+0x37/0xe0 [mlx5_core] process_one_work+0x1be/0x390 worker_thread+0x4d/0x3d0 ? rescuer_thread+0x350/0x350 kthread+0x141/0x160 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 </TASK>
Fixes: 50b2412b7e78 ("net/mlx5: Avoid possible free of command entry while timeout comp handler") Signed-off-by: Moshe Shemesh moshe@nvidia.com Reviewed-by: Eran Ben Elisha eranbe@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index 17fe05809653..3eacd8739929 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -131,11 +131,8 @@ static int cmd_alloc_index(struct mlx5_cmd *cmd)
static void cmd_free_index(struct mlx5_cmd *cmd, int idx) { - unsigned long flags; - - spin_lock_irqsave(&cmd->alloc_lock, flags); + lockdep_assert_held(&cmd->alloc_lock); set_bit(idx, &cmd->bitmask); - spin_unlock_irqrestore(&cmd->alloc_lock, flags); }
static void cmd_ent_get(struct mlx5_cmd_work_ent *ent) @@ -145,17 +142,21 @@ static void cmd_ent_get(struct mlx5_cmd_work_ent *ent)
static void cmd_ent_put(struct mlx5_cmd_work_ent *ent) { + struct mlx5_cmd *cmd = ent->cmd; + unsigned long flags; + + spin_lock_irqsave(&cmd->alloc_lock, flags); if (!refcount_dec_and_test(&ent->refcnt)) - return; + goto out;
if (ent->idx >= 0) { - struct mlx5_cmd *cmd = ent->cmd; - cmd_free_index(cmd, ent->idx); up(ent->page_queue ? &cmd->pages_sem : &cmd->sem); }
cmd_free_ent(ent); +out: + spin_unlock_irqrestore(&cmd->alloc_lock, flags); }
static struct mlx5_cmd_layout *get_inst(struct mlx5_cmd *cmd, int idx)
From: Roi Dayan roid@nvidia.com
[ Upstream commit ad11c4f1d8fd1f03639460e425a36f7fd0ea83f5 ]
There could be multiple multipath entries but changing the port affinity for each one doesn't make much sense and there should be a default one. So only track the entry with lowest priority value. The commit doesn't affect existing users with a single entry.
Fixes: 544fe7c2e654 ("net/mlx5e: Activate HW multipath and handle port affinity based on FIB events") Signed-off-by: Roi Dayan roid@nvidia.com Reviewed-by: Maor Dickman maord@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/lag/mp.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/mp.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/mp.c index 1ca01a5b6cdd..626aa60b6099 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/mp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/mp.c @@ -126,6 +126,10 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev, return; }
+ /* Handle multipath entry with lower priority value */ + if (mp->mfi && mp->mfi != fi && fi->fib_priority >= mp->mfi->fib_priority) + return; + /* Handle add/replace event */ nhs = fib_info_num_path(fi); if (nhs == 1) { @@ -135,12 +139,13 @@ static void mlx5_lag_fib_route_event(struct mlx5_lag *ldev, int i = mlx5_lag_dev_get_netdev_idx(ldev, nh_dev);
if (i < 0) - i = MLX5_LAG_NORMAL_AFFINITY; - else - ++i; + return;
+ i++; mlx5_lag_set_port_affinity(ldev, i); } + + mp->mfi = fi; return; }
From: Ben Ben-Ishay benishay@nvidia.com
[ Upstream commit 99a2b9be077ae3a5d97fbf5f7782e0f2e9812978 ]
SHAMPO is an RQ / WQ feature, an indication was added to the TIR in the first place to enforce suitability between connected TIR and RQ, this enforcement does not exist in current the Firmware implementation and was redundant in the first place.
Fixes: 83439f3c37aa ("net/mlx5e: Add HW-GRO offload") Signed-off-by: Ben Ben-Ishay benishay@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en/tir.c | 3 --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +-- include/linux/mlx5/mlx5_ifc.h | 1 - 3 files changed, 1 insertion(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tir.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tir.c index da169b816665..d4239e3b3c88 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tir.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tir.c @@ -88,9 +88,6 @@ void mlx5e_tir_builder_build_packet_merge(struct mlx5e_tir_builder *builder, (MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ - rough_max_l2_l3_hdr_sz) >> 8); MLX5_SET(tirc, tirc, lro_timeout_period_usecs, pkt_merge_param->timeout); break; - case MLX5E_PACKET_MERGE_SHAMPO: - MLX5_SET(tirc, tirc, packet_merge_mask, MLX5_TIRC_PACKET_MERGE_MASK_SHAMPO); - break; default: break; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index d92b82cdfd4e..22de7327c5a8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -3592,8 +3592,7 @@ static int set_feature_hw_gro(struct net_device *netdev, bool enable) goto out; }
- err = mlx5e_safe_switch_params(priv, &new_params, - mlx5e_modify_tirs_packet_merge_ctx, NULL, reset); + err = mlx5e_safe_switch_params(priv, &new_params, NULL, NULL, reset); out: mutex_unlock(&priv->state_lock); return err; diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 58a60e46c319..66522bc56a0b 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -3410,7 +3410,6 @@ enum { enum { MLX5_TIRC_PACKET_MERGE_MASK_IPV4_LRO = BIT(0), MLX5_TIRC_PACKET_MERGE_MASK_IPV6_LRO = BIT(1), - MLX5_TIRC_PACKET_MERGE_MASK_SHAMPO = BIT(2), };
enum {
From: Pavel Skripkin paskripkin@gmail.com
[ Upstream commit f80cfe2f26581f188429c12bd937eb905ad3ac7b ]
Syzbot reported UAF in port100_send_complete(). The root case is in missing usb_kill_urb() calls on error handling path of ->probe function.
port100_send_complete() accesses devm allocated memory which will be freed on probe failure. We should kill this urbs before returning an error from probe function to prevent reported use-after-free
Fail log:
BUG: KASAN: use-after-free in port100_send_complete+0x16e/0x1a0 drivers/nfc/port100.c:935 Read of size 1 at addr ffff88801bb59540 by task ksoftirqd/2/26 ... Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0x8d/0x303 mm/kasan/report.c:255 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:459 port100_send_complete+0x16e/0x1a0 drivers/nfc/port100.c:935 __usb_hcd_giveback_urb+0x2b0/0x5c0 drivers/usb/core/hcd.c:1670
...
Allocated by task 1255: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:45 [inline] set_alloc_info mm/kasan/common.c:436 [inline] ____kasan_kmalloc mm/kasan/common.c:515 [inline] ____kasan_kmalloc mm/kasan/common.c:474 [inline] __kasan_kmalloc+0xa6/0xd0 mm/kasan/common.c:524 alloc_dr drivers/base/devres.c:116 [inline] devm_kmalloc+0x96/0x1d0 drivers/base/devres.c:823 devm_kzalloc include/linux/device.h:209 [inline] port100_probe+0x8a/0x1320 drivers/nfc/port100.c:1502
Freed by task 1255: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track+0x21/0x30 mm/kasan/common.c:45 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370 ____kasan_slab_free mm/kasan/common.c:366 [inline] ____kasan_slab_free+0xff/0x140 mm/kasan/common.c:328 kasan_slab_free include/linux/kasan.h:236 [inline] __cache_free mm/slab.c:3437 [inline] kfree+0xf8/0x2b0 mm/slab.c:3794 release_nodes+0x112/0x1a0 drivers/base/devres.c:501 devres_release_all+0x114/0x190 drivers/base/devres.c:530 really_probe+0x626/0xcc0 drivers/base/dd.c:670
Reported-and-tested-by: syzbot+16bcb127fb73baeecb14@syzkaller.appspotmail.com Fixes: 0347a6ab300a ("NFC: port100: Commands mechanism implementation") Signed-off-by: Pavel Skripkin paskripkin@gmail.com Reviewed-by: Krzysztof Kozlowski krzysztof.kozlowski@canonical.com Link: https://lore.kernel.org/r/20220308185007.6987-1-paskripkin@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/port100.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/nfc/port100.c b/drivers/nfc/port100.c index d7db1a0e6be1..00d8ea6dcb5d 100644 --- a/drivers/nfc/port100.c +++ b/drivers/nfc/port100.c @@ -1612,7 +1612,9 @@ static int port100_probe(struct usb_interface *interface, nfc_digital_free_device(dev->nfc_digital_dev);
error: + usb_kill_urb(dev->in_urb); usb_free_urb(dev->in_urb); + usb_kill_urb(dev->out_urb); usb_free_urb(dev->out_urb); usb_put_dev(dev->udev);
From: Guillaume Nault gnault@redhat.com
[ Upstream commit 18dfc667550fe9c032a6dcc3402b50e691e18029 ]
The cleanup() function takes care of killing processes launched by the test functions. It relies on variables like ${tcpdump_pids} to get the relevant PIDs. But tests are run in their own subshell, so updated *_pids values are invisible to other shells. Therefore cleanup() never sees any process to kill:
$ ./tools/testing/selftests/net/pmtu.sh -t pmtu_ipv4_exception TEST: ipv4: PMTU exceptions [ OK ] TEST: ipv4: PMTU exceptions - nexthop objects [ OK ]
$ pgrep -af tcpdump 6084 tcpdump -s 0 -i veth_A-R1 -w pmtu_ipv4_exception_veth_A-R1.pcap 6085 tcpdump -s 0 -i veth_R1-A -w pmtu_ipv4_exception_veth_R1-A.pcap 6086 tcpdump -s 0 -i veth_R1-B -w pmtu_ipv4_exception_veth_R1-B.pcap 6087 tcpdump -s 0 -i veth_B-R1 -w pmtu_ipv4_exception_veth_B-R1.pcap 6088 tcpdump -s 0 -i veth_A-R2 -w pmtu_ipv4_exception_veth_A-R2.pcap 6089 tcpdump -s 0 -i veth_R2-A -w pmtu_ipv4_exception_veth_R2-A.pcap 6090 tcpdump -s 0 -i veth_R2-B -w pmtu_ipv4_exception_veth_R2-B.pcap 6091 tcpdump -s 0 -i veth_B-R2 -w pmtu_ipv4_exception_veth_B-R2.pcap 6228 tcpdump -s 0 -i veth_A-R1 -w pmtu_ipv4_exception_veth_A-R1.pcap 6229 tcpdump -s 0 -i veth_R1-A -w pmtu_ipv4_exception_veth_R1-A.pcap 6230 tcpdump -s 0 -i veth_R1-B -w pmtu_ipv4_exception_veth_R1-B.pcap 6231 tcpdump -s 0 -i veth_B-R1 -w pmtu_ipv4_exception_veth_B-R1.pcap 6232 tcpdump -s 0 -i veth_A-R2 -w pmtu_ipv4_exception_veth_A-R2.pcap 6233 tcpdump -s 0 -i veth_R2-A -w pmtu_ipv4_exception_veth_R2-A.pcap 6234 tcpdump -s 0 -i veth_R2-B -w pmtu_ipv4_exception_veth_R2-B.pcap 6235 tcpdump -s 0 -i veth_B-R2 -w pmtu_ipv4_exception_veth_B-R2.pcap
Fix this by running cleanup() in the context of the test subshell. Now that each test cleans the environment after completion, there's no need for calling cleanup() again when the next test starts. So let's drop it from the setup() function. This is okay because cleanup() is also called when pmtu.sh starts, so even the first test starts in a clean environment.
Also, use tcpdump's immediate mode. Otherwise it might not have time to process buffered packets, resulting in missing packets or even empty pcap files for short tests.
Note: PAUSE_ON_FAIL is still evaluated before cleanup(), so one can still inspect the test environment upon failure when using -p.
Fixes: a92a0a7b8e7c ("selftests: pmtu: Simplify cleanup and namespace names") Signed-off-by: Guillaume Nault gnault@redhat.com Reviewed-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/pmtu.sh | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh index 543ad7513a8e..2e8972573d91 100755 --- a/tools/testing/selftests/net/pmtu.sh +++ b/tools/testing/selftests/net/pmtu.sh @@ -865,7 +865,6 @@ setup_ovs_bridge() { setup() { [ "$(id -u)" -ne 0 ] && echo " need to run as root" && return $ksft_skip
- cleanup for arg do eval setup_${arg} || { echo " ${arg} not supported"; return 1; } done @@ -876,7 +875,7 @@ trace() {
for arg do [ "${ns_cmd}" = "" ] && ns_cmd="${arg}" && continue - ${ns_cmd} tcpdump -s 0 -i "${arg}" -w "${name}_${arg}.pcap" 2> /dev/null & + ${ns_cmd} tcpdump --immediate-mode -s 0 -i "${arg}" -w "${name}_${arg}.pcap" 2> /dev/null & tcpdump_pids="${tcpdump_pids} $!" ns_cmd= done @@ -1836,6 +1835,10 @@ run_test() {
unset IFS
+ # Since cleanup() relies on variables modified by this subshell, it + # has to run in this context. + trap cleanup EXIT + if [ "$VERBOSE" = "1" ]; then printf "\n##########################################################################\n\n" fi
From: Guillaume Nault gnault@redhat.com
[ Upstream commit 94a4a4fe4c696413932eed8bdec46574de9576b8 ]
When using "run_cmd <command> &", then "$!" refers to the PID of the subshell used to run <command>, not the command itself. Therefore nettest_pids actually doesn't contain the list of the nettest commands running in the background. So cleanup() can't kill them and the nettest processes run until completion (fortunately they have a 5s timeout).
Fix this by defining a new command for running processes in the background, for which "$!" really refers to the PID of the command run.
Also, double quote variables on the modified lines, to avoid shellcheck warnings.
Fixes: ece1278a9b81 ("selftests: net: add ESP-in-UDP PMTU test") Signed-off-by: Guillaume Nault gnault@redhat.com Reviewed-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/pmtu.sh | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh index 2e8972573d91..694732e4b344 100755 --- a/tools/testing/selftests/net/pmtu.sh +++ b/tools/testing/selftests/net/pmtu.sh @@ -374,6 +374,16 @@ run_cmd() { return $rc }
+run_cmd_bg() { + cmd="$*" + + if [ "$VERBOSE" = "1" ]; then + printf " COMMAND: %s &\n" "${cmd}" + fi + + $cmd 2>&1 & +} + # Find the auto-generated name for this namespace nsname() { eval echo $NS_$1 @@ -670,10 +680,10 @@ setup_nettest_xfrm() { [ ${1} -eq 6 ] && proto="-6" || proto="" port=${2}
- run_cmd ${ns_a} nettest ${proto} -q -D -s -x -p ${port} -t 5 & + run_cmd_bg "${ns_a}" nettest "${proto}" -q -D -s -x -p "${port}" -t 5 nettest_pids="${nettest_pids} $!"
- run_cmd ${ns_b} nettest ${proto} -q -D -s -x -p ${port} -t 5 & + run_cmd_bg "${ns_b}" nettest "${proto}" -q -D -s -x -p "${port}" -t 5 nettest_pids="${nettest_pids} $!" }
From: Mark Featherston mark@embeddedTS.com
[ Upstream commit 03fe003547975680fdb9ff5ab0e41cb68276c4f2 ]
This works around an issue with the hardware where both OE and DAT are exposed in the same register. If both are updated simultaneously, the harware makes no guarantees that OE or DAT will actually change in any given order and may result in a glitch of a few ns on a GPIO pin when changing direction and value in a single write.
Setting direction to input now only affects OE bit. Setting direction to output updates DAT first, then OE.
Fixes: 9c6686322d74 ("gpio: add Technologic I2C-FPGA gpio support") Signed-off-by: Mark Featherston mark@embeddedTS.com Signed-off-by: Kris Bahnsen kris@embeddedTS.com Signed-off-by: Bartosz Golaszewski brgl@bgdev.pl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-ts4900.c | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-)
diff --git a/drivers/gpio/gpio-ts4900.c b/drivers/gpio/gpio-ts4900.c index d885032cf814..d918d2df4de2 100644 --- a/drivers/gpio/gpio-ts4900.c +++ b/drivers/gpio/gpio-ts4900.c @@ -1,7 +1,7 @@ /* * Digital I/O driver for Technologic Systems I2C FPGA Core * - * Copyright (C) 2015 Technologic Systems + * Copyright (C) 2015, 2018 Technologic Systems * Copyright (C) 2016 Savoir-Faire Linux * * This program is free software; you can redistribute it and/or @@ -55,19 +55,33 @@ static int ts4900_gpio_direction_input(struct gpio_chip *chip, { struct ts4900_gpio_priv *priv = gpiochip_get_data(chip);
- /* - * This will clear the output enable bit, the other bits are - * dontcare when this is cleared + /* Only clear the OE bit here, requires a RMW. Prevents potential issue + * with OE and data getting to the physical pin at different times. */ - return regmap_write(priv->regmap, offset, 0); + return regmap_update_bits(priv->regmap, offset, TS4900_GPIO_OE, 0); }
static int ts4900_gpio_direction_output(struct gpio_chip *chip, unsigned int offset, int value) { struct ts4900_gpio_priv *priv = gpiochip_get_data(chip); + unsigned int reg; int ret;
+ /* If changing from an input to an output, we need to first set the + * proper data bit to what is requested and then set OE bit. This + * prevents a glitch that can occur on the IO line + */ + regmap_read(priv->regmap, offset, ®); + if (!(reg & TS4900_GPIO_OE)) { + if (value) + reg = TS4900_GPIO_OUT; + else + reg &= ~TS4900_GPIO_OUT; + + regmap_write(priv->regmap, offset, reg); + } + if (value) ret = regmap_write(priv->regmap, offset, TS4900_GPIO_OE | TS4900_GPIO_OUT);
From: Linus Torvalds torvalds@linux-foundation.org
[ Upstream commit fe673d3f5bf1fc50cdc4b754831db91a2ec10126 ]
Instead of using GUP, make fault_in_safe_writeable() actually force a 'handle_mm_fault()' using the same fixup_user_fault() machinery that futexes already use.
Using the GUP machinery meant that fault_in_safe_writeable() did not do everything that a real fault would do, ranging from not auto-expanding the stack segment, to not updating accessed or dirty flags in the page tables (GUP sets those flags on the pages themselves).
The latter causes problems on architectures (like s390) that do accessed bit handling in software, which meant that fault_in_safe_writeable() didn't actually do all the fault handling it needed to, and trying to access the user address afterwards would still cause faults.
Reported-and-tested-by: Andreas Gruenbacher agruenba@redhat.com Fixes: cdd591fc86e3 ("iov_iter: Introduce fault_in_iov_iter_writeable") Link: https://lore.kernel.org/all/CAHc6FU5nP+nziNGG0JAF1FUx-GV7kKFvM7aZuU_XD2_1v4v... Acked-by: David Hildenbrand david@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/gup.c | 57 +++++++++++++++++++------------------------------------- 1 file changed, 19 insertions(+), 38 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c index 37087529bb95..b7e5e80538c9 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1723,11 +1723,11 @@ EXPORT_SYMBOL(fault_in_writeable); * @uaddr: start of address range * @size: length of address range * - * Faults in an address range using get_user_pages, i.e., without triggering - * hardware page faults. This is primarily useful when we already know that - * some or all of the pages in the address range aren't in memory. + * Faults in an address range for writing. This is primarily useful when we + * already know that some or all of the pages in the address range aren't in + * memory. * - * Other than fault_in_writeable(), this function is non-destructive. + * Unlike fault_in_writeable(), this function is non-destructive. * * Note that we don't pin or otherwise hold the pages referenced that we fault * in. There's no guarantee that they'll stay in memory for any duration of @@ -1738,46 +1738,27 @@ EXPORT_SYMBOL(fault_in_writeable); */ size_t fault_in_safe_writeable(const char __user *uaddr, size_t size) { - unsigned long start = (unsigned long)untagged_addr(uaddr); - unsigned long end, nstart, nend; + unsigned long start = (unsigned long)uaddr, end; struct mm_struct *mm = current->mm; - struct vm_area_struct *vma = NULL; - int locked = 0; + bool unlocked = false;
- nstart = start & PAGE_MASK; + if (unlikely(size == 0)) + return 0; end = PAGE_ALIGN(start + size); - if (end < nstart) + if (end < start) end = 0; - for (; nstart != end; nstart = nend) { - unsigned long nr_pages; - long ret;
- if (!locked) { - locked = 1; - mmap_read_lock(mm); - vma = find_vma(mm, nstart); - } else if (nstart >= vma->vm_end) - vma = vma->vm_next; - if (!vma || vma->vm_start >= end) - break; - nend = end ? min(end, vma->vm_end) : vma->vm_end; - if (vma->vm_flags & (VM_IO | VM_PFNMAP)) - continue; - if (nstart < vma->vm_start) - nstart = vma->vm_start; - nr_pages = (nend - nstart) / PAGE_SIZE; - ret = __get_user_pages_locked(mm, nstart, nr_pages, - NULL, NULL, &locked, - FOLL_TOUCH | FOLL_WRITE); - if (ret <= 0) + mmap_read_lock(mm); + do { + if (fixup_user_fault(mm, start, FAULT_FLAG_WRITE, &unlocked)) break; - nend = nstart + ret * PAGE_SIZE; - } - if (locked) - mmap_read_unlock(mm); - if (nstart == end) - return 0; - return size - min_t(size_t, nstart - start, size); + start = (start + PAGE_SIZE) & PAGE_MASK; + } while (start != end); + mmap_read_unlock(mm); + + if (size > (unsigned long)uaddr - start) + return size - ((unsigned long)uaddr - start); + return 0; } EXPORT_SYMBOL(fault_in_safe_writeable);
From: Miaoqian Lin linmq006@gmail.com
[ Upstream commit 2ac5b58e645c66932438bb021cb5b52097ce70b0 ]
The of_find_compatible_node() function returns a node pointer with refcount incremented, We should use of_node_put() on it when done Add the missing of_node_put() to release the refcount.
Fixes: 7349a74ea75c ("net: ethernet: gianfar_ethtool: get phc index through drvdata") Signed-off-by: Miaoqian Lin linmq006@gmail.com Reviewed-by: Jesse Brandeburg jesse.brandeburg@intel.com Reviewed-by: Claudiu Manoil claudiu.manoil@nxp.com Link: https://lore.kernel.org/r/20220310015313.14938-1-linmq006@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/freescale/gianfar_ethtool.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c index 7b32ed29bf4c..8c17fe5d66ed 100644 --- a/drivers/net/ethernet/freescale/gianfar_ethtool.c +++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c @@ -1460,6 +1460,7 @@ static int gfar_get_ts_info(struct net_device *dev, ptp_node = of_find_compatible_node(NULL, NULL, "fsl,etsec-ptp"); if (ptp_node) { ptp_dev = of_find_device_by_node(ptp_node); + of_node_put(ptp_node); if (ptp_dev) ptp = platform_get_drvdata(ptp_dev); }
From: Clément Léger clement.leger@bootlin.com
[ Upstream commit 37c9d66c95564c85a001d8a035354f0220a1e1c3 ]
MISR1 was cleared twice but the original author intention was probably to clear MISR1 & MISR2 to completely disable interrupts. Fix it to clear MISR2.
Fixes: 87461f7a58ab ("net: phy: DP83822 initial driver submission") Signed-off-by: Clément Léger clement.leger@bootlin.com Reviewed-by: Andrew Lunn andrew@lunn.ch Reviewed-by: Florian Fainelli f.fainelli@gmail.com Link: https://lore.kernel.org/r/20220309142228.761153-1-clement.leger@bootlin.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/dp83822.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/dp83822.c b/drivers/net/phy/dp83822.c index 211b5476a6f5..ce17b2af3218 100644 --- a/drivers/net/phy/dp83822.c +++ b/drivers/net/phy/dp83822.c @@ -274,7 +274,7 @@ static int dp83822_config_intr(struct phy_device *phydev) if (err < 0) return err;
- err = phy_write(phydev, MII_DP83822_MISR1, 0); + err = phy_write(phydev, MII_DP83822_MISR2, 0); if (err < 0) return err;
From: Eric Dumazet edumazet@google.com
[ Upstream commit 633593a808980f82d251d0ca89730d8bb8b0220c ]
syzbot reported a kernel infoleak [1] of 4 bytes.
After analysis, it turned out r->idiag_expires is not initialized if inet_sctp_diag_fill() calls inet_diag_msg_common_fill()
Make sure to clear idiag_timer/idiag_retrans/idiag_expires and let inet_diag_msg_sctpasoc_fill() fill them again if needed.
[1]
BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline] BUG: KMSAN: kernel-infoleak in copyout lib/iov_iter.c:154 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x6ef/0x25a0 lib/iov_iter.c:668 instrument_copy_to_user include/linux/instrumented.h:121 [inline] copyout lib/iov_iter.c:154 [inline] _copy_to_iter+0x6ef/0x25a0 lib/iov_iter.c:668 copy_to_iter include/linux/uio.h:162 [inline] simple_copy_to_iter+0xf3/0x140 net/core/datagram.c:519 __skb_datagram_iter+0x2d5/0x11b0 net/core/datagram.c:425 skb_copy_datagram_iter+0xdc/0x270 net/core/datagram.c:533 skb_copy_datagram_msg include/linux/skbuff.h:3696 [inline] netlink_recvmsg+0x669/0x1c80 net/netlink/af_netlink.c:1977 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] __sys_recvfrom+0x795/0xa10 net/socket.c:2097 __do_sys_recvfrom net/socket.c:2115 [inline] __se_sys_recvfrom net/socket.c:2111 [inline] __x64_sys_recvfrom+0x19d/0x210 net/socket.c:2111 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae
Uninit was created at: slab_post_alloc_hook mm/slab.h:737 [inline] slab_alloc_node mm/slub.c:3247 [inline] __kmalloc_node_track_caller+0xe0c/0x1510 mm/slub.c:4975 kmalloc_reserve net/core/skbuff.c:354 [inline] __alloc_skb+0x545/0xf90 net/core/skbuff.c:426 alloc_skb include/linux/skbuff.h:1158 [inline] netlink_dump+0x3e5/0x16c0 net/netlink/af_netlink.c:2248 __netlink_dump_start+0xcf8/0xe90 net/netlink/af_netlink.c:2373 netlink_dump_start include/linux/netlink.h:254 [inline] inet_diag_handler_cmd+0x2e7/0x400 net/ipv4/inet_diag.c:1341 sock_diag_rcv_msg+0x24a/0x620 netlink_rcv_skb+0x40c/0x7e0 net/netlink/af_netlink.c:2494 sock_diag_rcv+0x63/0x80 net/core/sock_diag.c:277 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x1093/0x1360 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x14d9/0x1720 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] sock_write_iter+0x594/0x690 net/socket.c:1061 do_iter_readv_writev+0xa7f/0xc70 do_iter_write+0x52c/0x1500 fs/read_write.c:851 vfs_writev fs/read_write.c:924 [inline] do_writev+0x645/0xe00 fs/read_write.c:967 __do_sys_writev fs/read_write.c:1040 [inline] __se_sys_writev fs/read_write.c:1037 [inline] __x64_sys_writev+0xe5/0x120 fs/read_write.c:1037 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae
Bytes 68-71 of 2508 are uninitialized Memory access of size 2508 starts at ffff888114f9b000 Data copied to user address 00007f7fe09ff2e0
CPU: 1 PID: 3478 Comm: syz-executor306 Not tainted 5.17.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Cc: Vlad Yasevich vyasevich@gmail.com Cc: Neil Horman nhorman@tuxdriver.com Cc: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Reviewed-by: Xin Long lucien.xin@gmail.com Link: https://lore.kernel.org/r/20220310001145.297371-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/sctp/diag.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/net/sctp/diag.c b/net/sctp/diag.c index 034e2c74497d..d9c6d8f30f09 100644 --- a/net/sctp/diag.c +++ b/net/sctp/diag.c @@ -61,10 +61,6 @@ static void inet_diag_msg_sctpasoc_fill(struct inet_diag_msg *r, r->idiag_timer = SCTP_EVENT_TIMEOUT_T3_RTX; r->idiag_retrans = asoc->rtx_data_chunks; r->idiag_expires = jiffies_to_msecs(t3_rtx->expires - jiffies); - } else { - r->idiag_timer = 0; - r->idiag_retrans = 0; - r->idiag_expires = 0; } }
@@ -144,13 +140,14 @@ static int inet_sctp_diag_fill(struct sock *sk, struct sctp_association *asoc, r = nlmsg_data(nlh); BUG_ON(!sk_fullsock(sk));
+ r->idiag_timer = 0; + r->idiag_retrans = 0; + r->idiag_expires = 0; if (asoc) { inet_diag_msg_sctpasoc_fill(r, sk, asoc); } else { inet_diag_msg_common_fill(r, sk); r->idiag_state = sk->sk_state; - r->idiag_timer = 0; - r->idiag_retrans = 0; }
if (inet_diag_msg_attrs_fill(sk, skb, r, ext, user_ns, net_admin))
From: Jianglei Nie niejianglei2021@163.com
[ Upstream commit bc0e610a6eb0d46e4123fafdbe5e6141d9fff3be ]
If bus->state is equal to MDIOBUS_ALLOCATED, mdiobus_free(bus) will free the "bus". But bus->name is still used in the next line, which will lead to a use after free.
We can fix it by putting the name in a local variable and make the bus->name point to the rodata section "name",then use the name in the error message without referring to bus to avoid the uaf.
Fixes: 95b5fc03c189 ("net: arc_emac: Make use of the helper function dev_err_probe()") Signed-off-by: Jianglei Nie niejianglei2021@163.com Link: https://lore.kernel.org/r/20220309121824.36529-1-niejianglei2021@163.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/arc/emac_mdio.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/arc/emac_mdio.c b/drivers/net/ethernet/arc/emac_mdio.c index 9acf589b1178..87f40c2ba904 100644 --- a/drivers/net/ethernet/arc/emac_mdio.c +++ b/drivers/net/ethernet/arc/emac_mdio.c @@ -132,6 +132,7 @@ int arc_mdio_probe(struct arc_emac_priv *priv) { struct arc_emac_mdio_bus_data *data = &priv->bus_data; struct device_node *np = priv->dev->of_node; + const char *name = "Synopsys MII Bus"; struct mii_bus *bus; int error;
@@ -142,7 +143,7 @@ int arc_mdio_probe(struct arc_emac_priv *priv) priv->bus = bus; bus->priv = priv; bus->parent = priv->dev; - bus->name = "Synopsys MII Bus"; + bus->name = name; bus->read = &arc_mdio_read; bus->write = &arc_mdio_write; bus->reset = &arc_mdio_reset; @@ -167,7 +168,7 @@ int arc_mdio_probe(struct arc_emac_priv *priv) if (error) { mdiobus_free(bus); return dev_err_probe(priv->dev, error, - "cannot register MDIO bus %s\n", bus->name); + "cannot register MDIO bus %s\n", name); }
return 0;
From: Jeremy Linton jeremy.linton@arm.com
[ Upstream commit 00b022f8f876a3a036b0df7f971001bef6398605 ]
Some of the bcmgenet platforms don't correctly support WOL, yet ethtool returns:
"Supports Wake-on: gsf"
which is false.
Ideally if there isn't a wol_irq, or there is something else that keeps the device from being able to wakeup it should display:
"Supports Wake-on: d"
This patch checks whether the device can wakup, before using the hard-coded supported flags. This corrects the ethtool reporting, as well as the WOL configuration because ethtool verifies that the mode is supported before attempting it.
Fixes: c51de7f3976b ("net: bcmgenet: add Wake-on-LAN support code") Signed-off-by: Jeremy Linton jeremy.linton@arm.com Tested-by: Peter Robinson pbrobinson@gmail.com Acked-by: Florian Fainelli f.fainelli@gmail.com Link: https://lore.kernel.org/r/20220310045535.224450-1-jeremy.linton@arm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c b/drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c index e31a5a397f11..f55d9d9c01a8 100644 --- a/drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c @@ -40,6 +40,13 @@ void bcmgenet_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol) { struct bcmgenet_priv *priv = netdev_priv(dev); + struct device *kdev = &priv->pdev->dev; + + if (!device_can_wakeup(kdev)) { + wol->supported = 0; + wol->wolopts = 0; + return; + }
wol->supported = WAKE_MAGIC | WAKE_MAGICSECURE | WAKE_FILTER; wol->wolopts = priv->wolopts;
From: Heiner Kallweit hkallweit1@gmail.com
[ Upstream commit 2c87c6f9fbddc5b84d67b2fa3f432fcac6d99d93 ]
Sometimes the link comes up but no data flows. This patch fixes this behavior. It's not clear what's the root cause of the issue.
According to the tests one other link-up issue remains. In very rare cases the link isn't even reported as up.
Fixes: 84c8f773d2dc ("net: phy: meson-gxl: remove the use of .ack_callback()") Tested-by: Erico Nunes nunes.erico@gmail.com Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Link: https://lore.kernel.org/r/e3473452-a1f9-efcf-5fdd-02b6f44c3fcd@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/meson-gxl.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c index c49062ad72c6..73f7962a37d3 100644 --- a/drivers/net/phy/meson-gxl.c +++ b/drivers/net/phy/meson-gxl.c @@ -243,7 +243,13 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev) irq_status == INTSRC_ENERGY_DETECT) return IRQ_HANDLED;
- phy_trigger_machine(phydev); + /* Give PHY some time before MAC starts sending data. This works + * around an issue where network doesn't come up properly. + */ + if (!(irq_status & INTSRC_LINK_DOWN)) + phy_queue_state_machine(phydev, msecs_to_jiffies(100)); + else + phy_trigger_machine(phydev);
return IRQ_HANDLED; }
From: Kumar Kartikeya Dwivedi memxor@gmail.com
[ Upstream commit a7e75016a0753c24d6c995bc02501ae35368e333 ]
Add a test that validates that timer value is not overwritten when doing a copy_map_value call in the kernel. Without the prior fix, this test triggers a crash.
Signed-off-by: Kumar Kartikeya Dwivedi memxor@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Link: https://lore.kernel.org/bpf/20220209070324.1093182-3-memxor@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../selftests/bpf/prog_tests/timer_crash.c | 32 +++++++++++ .../testing/selftests/bpf/progs/timer_crash.c | 54 +++++++++++++++++++ 2 files changed, 86 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/timer_crash.c create mode 100644 tools/testing/selftests/bpf/progs/timer_crash.c
diff --git a/tools/testing/selftests/bpf/prog_tests/timer_crash.c b/tools/testing/selftests/bpf/prog_tests/timer_crash.c new file mode 100644 index 000000000000..f74b82305da8 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/timer_crash.c @@ -0,0 +1,32 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <test_progs.h> +#include "timer_crash.skel.h" + +enum { + MODE_ARRAY, + MODE_HASH, +}; + +static void test_timer_crash_mode(int mode) +{ + struct timer_crash *skel; + + skel = timer_crash__open_and_load(); + if (!ASSERT_OK_PTR(skel, "timer_crash__open_and_load")) + return; + skel->bss->pid = getpid(); + skel->bss->crash_map = mode; + if (!ASSERT_OK(timer_crash__attach(skel), "timer_crash__attach")) + goto end; + usleep(1); +end: + timer_crash__destroy(skel); +} + +void test_timer_crash(void) +{ + if (test__start_subtest("array")) + test_timer_crash_mode(MODE_ARRAY); + if (test__start_subtest("hash")) + test_timer_crash_mode(MODE_HASH); +} diff --git a/tools/testing/selftests/bpf/progs/timer_crash.c b/tools/testing/selftests/bpf/progs/timer_crash.c new file mode 100644 index 000000000000..f8f7944e70da --- /dev/null +++ b/tools/testing/selftests/bpf/progs/timer_crash.c @@ -0,0 +1,54 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <vmlinux.h> +#include <bpf/bpf_tracing.h> +#include <bpf/bpf_helpers.h> + +struct map_elem { + struct bpf_timer timer; + struct bpf_spin_lock lock; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, int); + __type(value, struct map_elem); +} amap SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 1); + __type(key, int); + __type(value, struct map_elem); +} hmap SEC(".maps"); + +int pid = 0; +int crash_map = 0; /* 0 for amap, 1 for hmap */ + +SEC("fentry/do_nanosleep") +int sys_enter(void *ctx) +{ + struct map_elem *e, value = {}; + void *map = crash_map ? (void *)&hmap : (void *)&amap; + + if (bpf_get_current_task_btf()->tgid != pid) + return 0; + + *(void **)&value = (void *)0xdeadcaf3; + + bpf_map_update_elem(map, &(int){0}, &value, 0); + /* For array map, doing bpf_map_update_elem will do a + * check_and_free_timer_in_array, which will trigger the crash if timer + * pointer was overwritten, for hmap we need to use bpf_timer_cancel. + */ + if (crash_map == 1) { + e = bpf_map_lookup_elem(map, &(int){0}); + if (!e) + return 0; + bpf_timer_cancel(&e->timer); + } + return 0; +} + +char _license[] SEC("license") = "GPL";
From: Halil Pasic pasic@linux.ibm.com
[ Upstream commit ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e ]
The problem I'm addressing was discovered by the LTP test covering cve-2018-1000204.
A short description of what happens follows: 1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO interface with: dxfer_len == 524288, dxdfer_dir == SG_DXFER_FROM_DEV and a corresponding dxferp. The peculiar thing about this is that TUR is not reading from the device. 2) In sg_start_req() the invocation of blk_rq_map_user() effectively bounces the user-space buffer. As if the device was to transfer into it. Since commit a45b599ad808 ("scsi: sg: allocate with __GFP_ZERO in sg_build_indirect()") we make sure this first bounce buffer is allocated with GFP_ZERO. 3) For the rest of the story we keep ignoring that we have a TUR, so the device won't touch the buffer we prepare as if the we had a DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device and the buffer allocated by SG is mapped by the function virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here scatter-gather and not scsi generics). This mapping involves bouncing via the swiotlb (we need swiotlb to do virtio in protected guest like s390 Secure Execution, or AMD SEV). 4) When the SCSI TUR is done, we first copy back the content of the second (that is swiotlb) bounce buffer (which most likely contains some previous IO data), to the first bounce buffer, which contains all zeros. Then we copy back the content of the first bounce buffer to the user-space buffer. 5) The test case detects that the buffer, which it zero-initialized, ain't all zeros and fails.
One can argue that this is an swiotlb problem, because without swiotlb we leak all zeros, and the swiotlb should be transparent in a sense that it does not affect the outcome (if all other participants are well behaved).
Copying the content of the original buffer into the swiotlb buffer is the only way I can think of to make swiotlb transparent in such scenarios. So let's do just that if in doubt, but allow the driver to tell us that the whole mapped buffer is going to be overwritten, in which case we can preserve the old behavior and avoid the performance impact of the extra bounce.
Signed-off-by: Halil Pasic pasic@linux.ibm.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/core-api/dma-attributes.rst | 8 ++++++++ include/linux/dma-mapping.h | 8 ++++++++ kernel/dma/swiotlb.c | 3 ++- 3 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst index 1887d92e8e92..17706dc91ec9 100644 --- a/Documentation/core-api/dma-attributes.rst +++ b/Documentation/core-api/dma-attributes.rst @@ -130,3 +130,11 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged subsystem that the buffer is fully accessible at the elevated privilege level (and ideally inaccessible or at least read-only at the lesser-privileged levels). + +DMA_ATTR_OVERWRITE +------------------ + +This is a hint to the DMA-mapping subsystem that the device is expected to +overwrite the entire mapped size, thus the caller does not require any of the +previous buffer contents to be preserved. This allows bounce-buffering +implementations to optimise DMA_FROM_DEVICE transfers. diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index dca2b1355bb1..6150d11a607e 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -61,6 +61,14 @@ */ #define DMA_ATTR_PRIVILEGED (1UL << 9)
+/* + * This is a hint to the DMA-mapping subsystem that the device is expected + * to overwrite the entire mapped size, thus the caller does not require any + * of the previous buffer contents to be preserved. This allows + * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers. + */ +#define DMA_ATTR_OVERWRITE (1UL << 10) + /* * A dma_addr_t can hold any valid DMA or bus address for the platform. It can * be given to a device to use as a DMA source or target. It is specific to a diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 8e840fbbed7c..d958b1201092 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -582,7 +582,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr, mem->slots[index + i].orig_addr = slot_addr(orig_addr, i); tlb_addr = slot_addr(mem->start, index) + offset; if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && - (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) + (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE || + dir == DMA_BIDIRECTIONAL)) swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE); return tlb_addr; }
From: Heikki Krogerus heikki.krogerus@linux.intel.com
[ Upstream commit 038438a25c45d5ac996e95a22fa9e76ff3d1f8c7 ]
This patch adds the necessary PCI ID for Intel Raptor Lake-S devices.
Signed-off-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20220214141948.18637-1-heikki.krogerus@linux.intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/dwc3/dwc3-pci.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c index 1ecedbb1684c..06d0e88ec8af 100644 --- a/drivers/usb/dwc3/dwc3-pci.c +++ b/drivers/usb/dwc3/dwc3-pci.c @@ -43,6 +43,7 @@ #define PCI_DEVICE_ID_INTEL_ADLP 0x51ee #define PCI_DEVICE_ID_INTEL_ADLM 0x54ee #define PCI_DEVICE_ID_INTEL_ADLS 0x7ae1 +#define PCI_DEVICE_ID_INTEL_RPLS 0x7a61 #define PCI_DEVICE_ID_INTEL_TGL 0x9a15 #define PCI_DEVICE_ID_AMD_MR 0x163a
@@ -420,6 +421,9 @@ static const struct pci_device_id dwc3_pci_id_table[] = { { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_ADLS), (kernel_ulong_t) &dwc3_pci_intel_swnode, },
+ { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_RPLS), + (kernel_ulong_t) &dwc3_pci_intel_swnode, }, + { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_TGL), (kernel_ulong_t) &dwc3_pci_intel_swnode, },
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
[ Upstream commit 6f66db29e2415cbe8759c48584f9cae19b3c2651 ]
It appears that last minute change moved ACPI ID of Alder Lake-M to the INTC1055, which is already in the driver.
This ID on the other hand will be used elsewhere.
This reverts commit 258435a1c8187f559549e515d2f77fa0b57bcd27.
Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pinctrl/intel/pinctrl-tigerlake.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/pinctrl/intel/pinctrl-tigerlake.c b/drivers/pinctrl/intel/pinctrl-tigerlake.c index 0bcd19597e4a..3ddaeffc0415 100644 --- a/drivers/pinctrl/intel/pinctrl-tigerlake.c +++ b/drivers/pinctrl/intel/pinctrl-tigerlake.c @@ -749,7 +749,6 @@ static const struct acpi_device_id tgl_pinctrl_acpi_match[] = { { "INT34C5", (kernel_ulong_t)&tgllp_soc_data }, { "INT34C6", (kernel_ulong_t)&tglh_soc_data }, { "INTC1055", (kernel_ulong_t)&tgllp_soc_data }, - { "INTC1057", (kernel_ulong_t)&tgllp_soc_data }, { } }; MODULE_DEVICE_TABLE(acpi, tgl_pinctrl_acpi_match);
From: Wanpeng Li wanpengli@tencent.com
[ Upstream commit 4cb9a998b1ce25fad74a82f5a5c45a4ef40de337 ]
I saw the below splatting after the host suspended and resumed.
WARNING: CPU: 0 PID: 2943 at kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:5531 kvm_resume+0x2c/0x30 [kvm] CPU: 0 PID: 2943 Comm: step_after_susp Tainted: G W IOE 5.17.0-rc3+ #4 RIP: 0010:kvm_resume+0x2c/0x30 [kvm] Call Trace: <TASK> syscore_resume+0x90/0x340 suspend_devices_and_enter+0xaee/0xe90 pm_suspend.cold+0x36b/0x3c2 state_store+0x82/0xf0 kernfs_fop_write_iter+0x1b6/0x260 new_sync_write+0x258/0x370 vfs_write+0x33f/0x510 ksys_write+0xc9/0x160 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae
lockdep_is_held() can return -1 when lockdep is disabled which triggers this warning. Let's use lockdep_assert_not_held() which can detect incorrect calls while holding a lock and it also avoids false negatives when lockdep is disabled.
Signed-off-by: Wanpeng Li wanpengli@tencent.com Message-Id: 1644920142-81249-1-git-send-email-wanpengli@tencent.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- virt/kvm/kvm_main.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 71ddc7a8bc30..6ae9e04d0585 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5347,9 +5347,7 @@ static int kvm_suspend(void) static void kvm_resume(void) { if (kvm_usage_count) { -#ifdef CONFIG_LOCKDEP - WARN_ON(lockdep_is_held(&kvm_count_lock)); -#endif + lockdep_assert_not_held(&kvm_count_lock); hardware_enable_nolock(NULL); } }
From: Anton Romanov romanton@google.com
[ Upstream commit 3a55f729240a686aa8af00af436306c0cd532522 ]
If vcpu has tsc_always_catchup set each request updates pvclock data. KVM_HC_CLOCK_PAIRING consumers such as ptp_kvm_x86 rely on tsc read on host's side and do hypercall inside pvclock_read_retry loop leading to infinite loop in such situation.
v3: Removed warn Changed return code to KVM_EFAULT v2: Added warn
Signed-off-by: Anton Romanov romanton@google.com Message-Id: 20220216182653.506850-1-romanton@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/x86.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c6eb3e45e3d8..e8f495b9ae10 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8770,6 +8770,13 @@ static int kvm_pv_clock_pairing(struct kvm_vcpu *vcpu, gpa_t paddr, if (clock_type != KVM_CLOCK_PAIRING_WALLCLOCK) return -KVM_EOPNOTSUPP;
+ /* + * When tsc is in permanent catchup mode guests won't be able to use + * pvclock_read_retry loop to get consistent view of pvclock + */ + if (vcpu->arch.tsc_always_catchup) + return -KVM_EOPNOTSUPP; + if (!kvm_get_walltime_and_clockread(&ts, &cycle)) return -KVM_EOPNOTSUPP;
From: Jon Lin jon.lin@rock-chips.com
[ Upstream commit 9382df0a98aad5bbcd4d634790305a1d786ad224 ]
Get num-cs u32 from dts of_node property rather than u16.
Signed-off-by: Jon Lin jon.lin@rock-chips.com Link: https://lore.kernel.org/r/20220216014028.8123-2-jon.lin@rock-chips.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-rockchip.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/spi/spi-rockchip.c b/drivers/spi/spi-rockchip.c index 553b6b9d0222..4f65ba3dd19c 100644 --- a/drivers/spi/spi-rockchip.c +++ b/drivers/spi/spi-rockchip.c @@ -654,7 +654,7 @@ static int rockchip_spi_probe(struct platform_device *pdev) struct spi_controller *ctlr; struct resource *mem; struct device_node *np = pdev->dev.of_node; - u32 rsd_nsecs; + u32 rsd_nsecs, num_cs; bool slave_mode;
slave_mode = of_property_read_bool(np, "spi-slave"); @@ -764,8 +764,9 @@ static int rockchip_spi_probe(struct platform_device *pdev) * rk spi0 has two native cs, spi1..5 one cs only * if num-cs is missing in the dts, default to 1 */ - if (of_property_read_u16(np, "num-cs", &ctlr->num_chipselect)) - ctlr->num_chipselect = 1; + if (of_property_read_u32(np, "num-cs", &num_cs)) + num_cs = 1; + ctlr->num_chipselect = num_cs; ctlr->use_gpio_descriptors = true; } ctlr->dev.of_node = pdev->dev.of_node;
From: Jon Lin jon.lin@rock-chips.com
[ Upstream commit 80808768e41324d2e23de89972b5406c1020e6e4 ]
After slave abort, all DMA should be stopped, or it will affect the next transmission and maybe abort again.
Signed-off-by: Jon Lin jon.lin@rock-chips.com Link: https://lore.kernel.org/r/20220216014028.8123-3-jon.lin@rock-chips.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-rockchip.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/spi/spi-rockchip.c b/drivers/spi/spi-rockchip.c index 4f65ba3dd19c..c6a1bb09be05 100644 --- a/drivers/spi/spi-rockchip.c +++ b/drivers/spi/spi-rockchip.c @@ -585,6 +585,12 @@ static int rockchip_spi_slave_abort(struct spi_controller *ctlr) { struct rockchip_spi *rs = spi_controller_get_devdata(ctlr);
+ if (atomic_read(&rs->state) & RXDMA) + dmaengine_terminate_sync(ctlr->dma_rx); + if (atomic_read(&rs->state) & TXDMA) + dmaengine_terminate_sync(ctlr->dma_tx); + atomic_set(&rs->state, 0); + spi_enable_chip(rs, false); rs->slave_abort = true; spi_finalize_current_transfer(ctlr);
From: Maxime Ripard maxime@cerno.tech
[ Upstream commit e40945ab7c7f966d0c37b7bd7b0596497dfe228d ]
On bind we will register the HDMI codec device but we don't unregister it on unbind, leading to a device leakage. Unregister our device at unbind.
Signed-off-by: Maxime Ripard maxime@cerno.tech Reviewed-by: Javier Martinez Canillas javierm@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20220127111452.222002-1-maxime... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/vc4/vc4_hdmi.c | 8 ++++++++ drivers/gpu/drm/vc4/vc4_hdmi.h | 1 + 2 files changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c index 24f11c07bc3c..2f53ba54b81a 100644 --- a/drivers/gpu/drm/vc4/vc4_hdmi.c +++ b/drivers/gpu/drm/vc4/vc4_hdmi.c @@ -1522,6 +1522,7 @@ static int vc4_hdmi_audio_init(struct vc4_hdmi *vc4_hdmi) dev_err(dev, "Couldn't register the HDMI codec: %ld\n", PTR_ERR(codec_pdev)); return PTR_ERR(codec_pdev); } + vc4_hdmi->audio.codec_pdev = codec_pdev;
dai_link->cpus = &vc4_hdmi->audio.cpu; dai_link->codecs = &vc4_hdmi->audio.codec; @@ -1561,6 +1562,12 @@ static int vc4_hdmi_audio_init(struct vc4_hdmi *vc4_hdmi)
}
+static void vc4_hdmi_audio_exit(struct vc4_hdmi *vc4_hdmi) +{ + platform_device_unregister(vc4_hdmi->audio.codec_pdev); + vc4_hdmi->audio.codec_pdev = NULL; +} + static irqreturn_t vc4_hdmi_hpd_irq_thread(int irq, void *priv) { struct vc4_hdmi *vc4_hdmi = priv; @@ -2299,6 +2306,7 @@ static void vc4_hdmi_unbind(struct device *dev, struct device *master, kfree(vc4_hdmi->hdmi_regset.regs); kfree(vc4_hdmi->hd_regset.regs);
+ vc4_hdmi_audio_exit(vc4_hdmi); vc4_hdmi_cec_exit(vc4_hdmi); vc4_hdmi_hotplug_exit(vc4_hdmi); vc4_hdmi_connector_destroy(&vc4_hdmi->connector); diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.h b/drivers/gpu/drm/vc4/vc4_hdmi.h index 33e9f665ab8e..c0492da73683 100644 --- a/drivers/gpu/drm/vc4/vc4_hdmi.h +++ b/drivers/gpu/drm/vc4/vc4_hdmi.h @@ -113,6 +113,7 @@ struct vc4_hdmi_audio { struct snd_soc_dai_link_component platform; struct snd_dmaengine_dai_dma_data dma_data; struct hdmi_audio_infoframe infoframe; + struct platform_device *codec_pdev; bool streaming; };
From: Nikhil Gupta nikhil.gupta@nxp.com
[ Upstream commit 132507ed04ce0c5559be04dd378fec4f3bbc00e8 ]
elfcorehdr_addr is fixed address passed to Second kernel which may be conflicted with potential reserved memory in Second kernel,so fdt_reserve_elfcorehdr() ahead of fdt_init_reserved_mem() can relieve this situation.
Signed-off-by: Nikhil Gupta nikhil.gupta@nxp.com Signed-off-by: Rob Herring robh@kernel.org Link: https://lore.kernel.org/r/20220128042321.15228-1-nikhil.gupta@nxp.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/of/fdt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 7e868e5995b7..f66abb496ed1 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -644,8 +644,8 @@ void __init early_init_fdt_scan_reserved_mem(void) }
fdt_scan_reserved_mem(); - fdt_init_reserved_mem(); fdt_reserve_elfcorehdr(); + fdt_init_reserved_mem(); }
/**
From: Wanpeng Li wanpengli@tencent.com
[ Upstream commit ec756e40e271866f951d77c5e923d8deb6002b15 ]
Inspired by commit 3553ae5690a (x86/kvm: Don't use pvqspinlock code if only 1 vCPU), on a VM with only 1 vCPU, there is no need to enable pv tlb/ipi/sched_yield and we can save the memory for __pv_cpu_mask.
Signed-off-by: Wanpeng Li wanpengli@tencent.com Message-Id: 1645171838-2855-1-git-send-email-wanpengli@tencent.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/kvm.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 59abbdad7729..ff3db164e52c 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -462,19 +462,22 @@ static bool pv_tlb_flush_supported(void) { return (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) && !kvm_para_has_hint(KVM_HINTS_REALTIME) && - kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)); + kvm_para_has_feature(KVM_FEATURE_STEAL_TIME) && + (num_possible_cpus() != 1)); }
static bool pv_ipi_supported(void) { - return kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI); + return (kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI) && + (num_possible_cpus() != 1)); }
static bool pv_sched_yield_supported(void) { return (kvm_para_has_feature(KVM_FEATURE_PV_SCHED_YIELD) && !kvm_para_has_hint(KVM_HINTS_REALTIME) && - kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)); + kvm_para_has_feature(KVM_FEATURE_STEAL_TIME) && + (num_possible_cpus() != 1)); }
#define KVM_IPI_CLUSTER_SIZE (2 * BITS_PER_LONG)
From: Duoming Zhou duoming@zju.edu.cn
[ Upstream commit efe4186e6a1b54bf38b9e05450d43b0da1fd7739 ]
When a 6pack device is detaching, the sixpack_close() will act to cleanup necessary resources. Although del_timer_sync() in sixpack_close() won't return if there is an active timer, one could use mod_timer() in sp_xmit_on_air() to wake up timer again by calling userspace syscall such as ax25_sendmsg(), ax25_connect() and ax25_ioctl().
This unexpected waked handler, sp_xmit_on_air(), realizes nothing about the undergoing cleanup and may still call pty_write() to use driver layer resources that have already been released.
One of the possible race conditions is shown below:
(USE) | (FREE) ax25_sendmsg() | ax25_queue_xmit() | ... | sp_xmit() | sp_encaps() | sixpack_close() sp_xmit_on_air() | del_timer_sync(&sp->tx_t) mod_timer(&sp->tx_t,...) | ... | unregister_netdev() | ... (wait a while) | tty_release() | tty_release_struct() | release_tty() sp_xmit_on_air() | tty_kref_put(tty_struct) //FREE pty_write(tty_struct) //USE | ...
The corresponding fail log is shown below: =============================================================== BUG: KASAN: use-after-free in __run_timers.part.0+0x170/0x470 Write of size 8 at addr ffff88800a652ab8 by task swapper/2/0 ... Call Trace: ... queue_work_on+0x3f/0x50 pty_write+0xcd/0xe0pty_write+0xcd/0xe0 sp_xmit_on_air+0xb2/0x1f0 call_timer_fn+0x28/0x150 __run_timers.part.0+0x3c2/0x470 run_timer_softirq+0x3b/0x80 __do_softirq+0xf1/0x380 ...
This patch reorders the del_timer_sync() after the unregister_netdev() to avoid UAF bugs. Because the unregister_netdev() is well synchronized, it flushs out any pending queues, waits the refcount of net_device decreases to zero and removes net_device from kernel. There is not any running routines after executing unregister_netdev(). Therefore, we could not arouse timer from userspace again.
Signed-off-by: Duoming Zhou duoming@zju.edu.cn Reviewed-by: Lin Ma linma@zju.edu.cn Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/hamradio/6pack.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/hamradio/6pack.c b/drivers/net/hamradio/6pack.c index 8a19a06b505d..ff2bb3d80fac 100644 --- a/drivers/net/hamradio/6pack.c +++ b/drivers/net/hamradio/6pack.c @@ -668,11 +668,11 @@ static void sixpack_close(struct tty_struct *tty) */ netif_stop_queue(sp->dev);
+ unregister_netdev(sp->dev); + del_timer_sync(&sp->tx_t); del_timer_sync(&sp->resync_t);
- unregister_netdev(sp->dev); - /* Free all 6pack frame buffers after unreg. */ kfree(sp->rbuff); kfree(sp->xbuff);
From: suresh kumar suresh2514@gmail.com
[ Upstream commit 4224cfd7fb6523f7a9d1c8bb91bb5df1e38eb624 ]
When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed.
[ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280
crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92
crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER)
To prevent this scenario, we also make sure that the netdevice is present.
Signed-off-by: suresh kumar suresh2514@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/net-sysfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index d7f9ee830d34..9e5657f63245 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -213,7 +213,7 @@ static ssize_t speed_show(struct device *dev, if (!rtnl_trylock()) return restart_syscall();
- if (netif_running(netdev)) { + if (netif_running(netdev) && netif_device_present(netdev)) { struct ethtool_link_ksettings cmd;
if (!__ethtool_get_link_ksettings(netdev, &cmd))
From: Vikash Chandola vikash.chandola@linux.intel.com
[ Upstream commit 35f165f08950a876f1b95a61d79c93678fba2fd6 ]
Almost all fault/warning bits in pmbus status registers remain set even after fault/warning condition are removed. As per pmbus specification these faults must be cleared by user. Modify hwmon behavior to clear fault/warning bit after fetching data if fault/warning bit was set. This allows to get fresh data in next read.
Signed-off-by: Vikash Chandola vikash.chandola@linux.intel.com Link: https://lore.kernel.org/r/20220222131253.2426834-1-vikash.chandola@linux.int... Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/pmbus/pmbus_core.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/hwmon/pmbus/pmbus_core.c b/drivers/hwmon/pmbus/pmbus_core.c index 776ee2237be2..ac2fbee1ba9c 100644 --- a/drivers/hwmon/pmbus/pmbus_core.c +++ b/drivers/hwmon/pmbus/pmbus_core.c @@ -911,6 +911,11 @@ static int pmbus_get_boolean(struct i2c_client *client, struct pmbus_boolean *b, pmbus_update_sensor_data(client, s2);
regval = status & mask; + if (regval) { + ret = pmbus_write_byte_data(client, page, reg, regval); + if (ret) + goto unlock; + } if (s1 && s2) { s64 v1, v2;
From: Varun Prakash varun@chelsio.com
[ Upstream commit c2700d2886a87f83f31e0a301de1d2350b52c79b ]
As per NVMe/TCP specification (revision 1.0a, section 3.6.2.3) Maximum Host to Controller Data length (MAXH2CDATA): Specifies the maximum number of PDU-Data bytes per H2CData PDU in bytes. This value is a multiple of dwords and should be no less than 4,096.
Current code sets H2CData PDU data_length to r2t_length, it does not check MAXH2CDATA value. Fix this by setting H2CData PDU data_length to min(req->h2cdata_left, queue->maxh2cdata).
Also validate MAXH2CDATA value returned by target in ICResp PDU, if it is not a multiple of dword or if it is less than 4096 return -EINVAL from nvme_tcp_init_connection().
Signed-off-by: Varun Prakash varun@chelsio.com Reviewed-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/tcp.c | 63 +++++++++++++++++++++++++++++++--------- include/linux/nvme-tcp.h | 1 + 2 files changed, 50 insertions(+), 14 deletions(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 891a36d02e7c..65e00c64a588 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -44,6 +44,8 @@ struct nvme_tcp_request { u32 data_len; u32 pdu_len; u32 pdu_sent; + u32 h2cdata_left; + u32 h2cdata_offset; u16 ttag; __le16 status; struct list_head entry; @@ -95,6 +97,7 @@ struct nvme_tcp_queue { struct nvme_tcp_request *request;
int queue_size; + u32 maxh2cdata; size_t cmnd_capsule_len; struct nvme_tcp_ctrl *ctrl; unsigned long flags; @@ -572,23 +575,26 @@ static int nvme_tcp_handle_comp(struct nvme_tcp_queue *queue, return ret; }
-static void nvme_tcp_setup_h2c_data_pdu(struct nvme_tcp_request *req, - struct nvme_tcp_r2t_pdu *pdu) +static void nvme_tcp_setup_h2c_data_pdu(struct nvme_tcp_request *req) { struct nvme_tcp_data_pdu *data = req->pdu; struct nvme_tcp_queue *queue = req->queue; struct request *rq = blk_mq_rq_from_pdu(req); + u32 h2cdata_sent = req->pdu_len; u8 hdgst = nvme_tcp_hdgst_len(queue); u8 ddgst = nvme_tcp_ddgst_len(queue);
req->state = NVME_TCP_SEND_H2C_PDU; req->offset = 0; - req->pdu_len = le32_to_cpu(pdu->r2t_length); + req->pdu_len = min(req->h2cdata_left, queue->maxh2cdata); req->pdu_sent = 0; + req->h2cdata_left -= req->pdu_len; + req->h2cdata_offset += h2cdata_sent;
memset(data, 0, sizeof(*data)); data->hdr.type = nvme_tcp_h2c_data; - data->hdr.flags = NVME_TCP_F_DATA_LAST; + if (!req->h2cdata_left) + data->hdr.flags = NVME_TCP_F_DATA_LAST; if (queue->hdr_digest) data->hdr.flags |= NVME_TCP_F_HDGST; if (queue->data_digest) @@ -597,9 +603,9 @@ static void nvme_tcp_setup_h2c_data_pdu(struct nvme_tcp_request *req, data->hdr.pdo = data->hdr.hlen + hdgst; data->hdr.plen = cpu_to_le32(data->hdr.hlen + hdgst + req->pdu_len + ddgst); - data->ttag = pdu->ttag; + data->ttag = req->ttag; data->command_id = nvme_cid(rq); - data->data_offset = pdu->r2t_offset; + data->data_offset = cpu_to_le32(req->h2cdata_offset); data->data_length = cpu_to_le32(req->pdu_len); }
@@ -609,6 +615,7 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue *queue, struct nvme_tcp_request *req; struct request *rq; u32 r2t_length = le32_to_cpu(pdu->r2t_length); + u32 r2t_offset = le32_to_cpu(pdu->r2t_offset);
rq = nvme_find_rq(nvme_tcp_tagset(queue), pdu->command_id); if (!rq) { @@ -633,14 +640,19 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue *queue, return -EPROTO; }
- if (unlikely(le32_to_cpu(pdu->r2t_offset) < req->data_sent)) { + if (unlikely(r2t_offset < req->data_sent)) { dev_err(queue->ctrl->ctrl.device, "req %d unexpected r2t offset %u (expected %zu)\n", - rq->tag, le32_to_cpu(pdu->r2t_offset), req->data_sent); + rq->tag, r2t_offset, req->data_sent); return -EPROTO; }
- nvme_tcp_setup_h2c_data_pdu(req, pdu); + req->pdu_len = 0; + req->h2cdata_left = r2t_length; + req->h2cdata_offset = r2t_offset; + req->ttag = pdu->ttag; + + nvme_tcp_setup_h2c_data_pdu(req); nvme_tcp_queue_request(req, false, true);
return 0; @@ -928,6 +940,7 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req) { struct nvme_tcp_queue *queue = req->queue; int req_data_len = req->data_len; + u32 h2cdata_left = req->h2cdata_left;
while (true) { struct page *page = nvme_tcp_req_cur_page(req); @@ -972,7 +985,10 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req) req->state = NVME_TCP_SEND_DDGST; req->offset = 0; } else { - nvme_tcp_done_send_req(queue); + if (h2cdata_left) + nvme_tcp_setup_h2c_data_pdu(req); + else + nvme_tcp_done_send_req(queue); } return 1; } @@ -1030,9 +1046,14 @@ static int nvme_tcp_try_send_data_pdu(struct nvme_tcp_request *req) if (queue->hdr_digest && !req->offset) nvme_tcp_hdgst(queue->snd_hash, pdu, sizeof(*pdu));
- ret = kernel_sendpage(queue->sock, virt_to_page(pdu), - offset_in_page(pdu) + req->offset, len, - MSG_DONTWAIT | MSG_MORE | MSG_SENDPAGE_NOTLAST); + if (!req->h2cdata_left) + ret = kernel_sendpage(queue->sock, virt_to_page(pdu), + offset_in_page(pdu) + req->offset, len, + MSG_DONTWAIT | MSG_MORE | MSG_SENDPAGE_NOTLAST); + else + ret = sock_no_sendpage(queue->sock, virt_to_page(pdu), + offset_in_page(pdu) + req->offset, len, + MSG_DONTWAIT | MSG_MORE); if (unlikely(ret <= 0)) return ret;
@@ -1052,6 +1073,7 @@ static int nvme_tcp_try_send_ddgst(struct nvme_tcp_request *req) { struct nvme_tcp_queue *queue = req->queue; size_t offset = req->offset; + u32 h2cdata_left = req->h2cdata_left; int ret; struct msghdr msg = { .msg_flags = MSG_DONTWAIT }; struct kvec iov = { @@ -1069,7 +1091,10 @@ static int nvme_tcp_try_send_ddgst(struct nvme_tcp_request *req) return ret;
if (offset + ret == NVME_TCP_DIGEST_LENGTH) { - nvme_tcp_done_send_req(queue); + if (h2cdata_left) + nvme_tcp_setup_h2c_data_pdu(req); + else + nvme_tcp_done_send_req(queue); return 1; }
@@ -1261,6 +1286,7 @@ static int nvme_tcp_init_connection(struct nvme_tcp_queue *queue) struct msghdr msg = {}; struct kvec iov; bool ctrl_hdgst, ctrl_ddgst; + u32 maxh2cdata; int ret;
icreq = kzalloc(sizeof(*icreq), GFP_KERNEL); @@ -1344,6 +1370,14 @@ static int nvme_tcp_init_connection(struct nvme_tcp_queue *queue) goto free_icresp; }
+ maxh2cdata = le32_to_cpu(icresp->maxdata); + if ((maxh2cdata % 4) || (maxh2cdata < NVME_TCP_MIN_MAXH2CDATA)) { + pr_err("queue %d: invalid maxh2cdata returned %u\n", + nvme_tcp_queue_id(queue), maxh2cdata); + goto free_icresp; + } + queue->maxh2cdata = maxh2cdata; + ret = 0; free_icresp: kfree(icresp); @@ -2329,6 +2363,7 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns, req->data_sent = 0; req->pdu_len = 0; req->pdu_sent = 0; + req->h2cdata_left = 0; req->data_len = blk_rq_nr_phys_segments(rq) ? blk_rq_payload_bytes(rq) : 0; req->curr_bio = rq->bio; diff --git a/include/linux/nvme-tcp.h b/include/linux/nvme-tcp.h index 959e0bd9a913..75470159a194 100644 --- a/include/linux/nvme-tcp.h +++ b/include/linux/nvme-tcp.h @@ -12,6 +12,7 @@ #define NVME_TCP_DISC_PORT 8009 #define NVME_TCP_ADMIN_CCSZ SZ_8K #define NVME_TCP_DIGEST_LENGTH 4 +#define NVME_TCP_MIN_MAXH2CDATA 4096
enum nvme_tcp_pfv { NVME_TCP_PFV_1_0 = 0x0,
From: Alex Deucher alexander.deucher@amd.com
[ Upstream commit 3f1271b54edcc692da5a3663f2aa2a64781f9bc3 ]
There are enough VBIOS escapes without the proper workaround that some users still hit this. Microsoft never productized ATS on Windows so OEM platforms that were Windows-only didn't always validate ATS.
The advantages of ATS are not worth it compared to the potential instabilities on harvested boards. Disable ATS on all Navi10 and Navi14 boards.
Symptoms include:
amdgpu 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0007 address=0xffffc02000 flags=0x0000] AMD-Vi: Event logged [IO_PAGE_FAULT device=07:00.0 domain=0x0007 address=0xffffc02000 flags=0x0000] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=6047, emitted seq=6049 amdgpu 0000:07:00.0: amdgpu: GPU reset begin! amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110) [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110 amdgpu 0000:07:00.0: amdgpu: GPU reset(1) failed
Related commits:
e8946a53e2a6 ("PCI: Mark AMD Navi14 GPU ATS as broken") a2da5d8cc0b0 ("PCI: Mark AMD Raven iGPU ATS as broken in some platforms") 45beb31d3afb ("PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken") 5e89cd303e3a ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken") d28ca864c493 ("PCI: Mark AMD Stoney Radeon R7 GPU ATS as broken") 9b44b0b09dec ("PCI: Mark AMD Stoney GPU ATS as broken")
[bhelgaas: add symptoms and related commits] Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1760 Link: https://lore.kernel.org/r/20220222160801.841643-1-alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Acked-by: Christian König christian.koenig@amd.com Acked-by: Guchun Chen guchun.chen@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/quirks.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 20a932690738..db864bf634a3 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -5344,11 +5344,6 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, 0x0422, quirk_no_ext_tags); */ static void quirk_amd_harvest_no_ats(struct pci_dev *pdev) { - if ((pdev->device == 0x7312 && pdev->revision != 0x00) || - (pdev->device == 0x7340 && pdev->revision != 0xc5) || - (pdev->device == 0x7341 && pdev->revision != 0x00)) - return; - if (pdev->device == 0x15d8) { if (pdev->revision == 0xcf && pdev->subsystem_vendor == 0xea50 && @@ -5370,10 +5365,19 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_amd_harvest_no_ats); /* AMD Iceland dGPU */ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_amd_harvest_no_ats); /* AMD Navi10 dGPU */ +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7310, quirk_amd_harvest_no_ats); DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7318, quirk_amd_harvest_no_ats); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7319, quirk_amd_harvest_no_ats); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x731a, quirk_amd_harvest_no_ats); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x731b, quirk_amd_harvest_no_ats); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x731e, quirk_amd_harvest_no_ats); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x731f, quirk_amd_harvest_no_ats); /* AMD Navi14 dGPU */ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats); DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7341, quirk_amd_harvest_no_ats); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7347, quirk_amd_harvest_no_ats); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x734f, quirk_amd_harvest_no_ats); /* AMD Raven platform iGPU */ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, quirk_amd_harvest_no_ats); #endif /* CONFIG_PCI_ATS */
From: Shreeya Patel shreeya.patel@collabora.com
[ Upstream commit ae42f9288846353982e2eab181fb41e7fd8bf60f ]
We are racing the registering of .to_irq when probing the i2c driver. This results in random failure of touchscreen devices.
Following explains the race condition better.
[gpio driver] gpio driver registers gpio chip [gpio consumer] gpio is acquired [gpio consumer] gpiod_to_irq() fails with -ENXIO [gpio driver] gpio driver registers irqchip gpiod_to_irq works at this point, but -ENXIO is fatal
We could see the following errors in dmesg logs when gc->to_irq is NULL
[2.101857] i2c_hid i2c-FTS3528:00: HID over i2c has not been provided an Int IRQ [2.101953] i2c_hid: probe of i2c-FTS3528:00 failed with error -22
To avoid this situation, defer probing until to_irq is registered. Returning -EPROBE_DEFER would be the first step towards avoiding the failure of devices due to the race in registration of .to_irq. Final solution to this issue would be to avoid using gc irq members until they are fully initialized.
This issue has been reported many times in past and people have been using workarounds like changing the pinctrl_amd to built-in instead of loading it as a module or by adding a softdep for pinctrl_amd into the config file.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209413 Reviewed-by: Linus Walleij linus.walleij@linaro.org Reviewed-by: Andy Shevchenko andy.shevchenko@gmail.com Reported-by: kernel test robot lkp@intel.com Signed-off-by: Shreeya Patel shreeya.patel@collabora.com Signed-off-by: Bartosz Golaszewski brgl@bgdev.pl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpiolib.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index a1dca6dc03b4..dcb0dca651ac 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -3121,6 +3121,16 @@ int gpiod_to_irq(const struct gpio_desc *desc)
return retirq; } +#ifdef CONFIG_GPIOLIB_IRQCHIP + if (gc->irq.chip) { + /* + * Avoid race condition with other code, which tries to lookup + * an IRQ before the irqchip has been properly registered, + * i.e. while gpiochip is still being brought up. + */ + return -EPROBE_DEFER; + } +#endif return -ENXIO; } EXPORT_SYMBOL_GPL(gpiod_to_irq);
From: Guchun Chen guchun.chen@amd.com
[ Upstream commit e2b993302f40c4eb714ecf896dd9e1c5be7d4cd7 ]
vkms leverages common amdgpu framebuffer creation, and also as it does not support FB modifier, there is no need to check tiling flags when initing framebuffer when virtual display is enabled.
This can fix below calltrace:
amdgpu 0000:00:08.0: GFX9+ requires FB check based on format modifier WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu]
v2: check adev->enable_virtual_display instead as vkms can be enabled in bare metal as well.
Signed-off-by: Leslie Shi Yuliang.Shi@amd.com Signed-off-by: Guchun Chen guchun.chen@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c index dc50c05f23fc..5c08047adb59 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c @@ -1145,7 +1145,7 @@ int amdgpu_display_framebuffer_init(struct drm_device *dev, if (ret) return ret;
- if (!dev->mode_config.allow_fb_modifiers) { + if (!dev->mode_config.allow_fb_modifiers && !adev->enable_virtual_display) { drm_WARN_ONCE(dev, adev->family >= AMDGPU_FAMILY_AI, "GFX9+ requires FB check based on format modifier\n"); ret = check_tiling_flags_gfx6(rfb);
From: Marek Marczykowski-Górecki marmarek@invisiblethingslab.com
[ Upstream commit 0f4558ae91870692ce7f509c31c9d6ee721d8cdc ]
This reverts commit 1f2565780e9b7218cf92c7630130e82dcc0fe9c2.
The 'hotplug-status' node should not be removed as long as the vif device remains configured. Otherwise the xen-netback would wait for re-running the network script even if it was already called (in case of the frontent re-connecting). But also, it _should_ be removed when the vif device is destroyed (for example when unbinding the driver) - otherwise hotplug script would not configure the device whenever it re-appear.
Moving removal of the 'hotplug-status' node was a workaround for nothing calling network script after xen-netback module is reloaded. But when vif interface is re-created (on xen-netback unbind/bind for example), the script should be called, regardless of who does that - currently this case is not handled by the toolstack, and requires manual script call. Keeping hotplug-status=connected to skip the call is wrong and leads to not configured interface.
More discussion at https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe....
Signed-off-by: Marek Marczykowski-Górecki marmarek@invisiblethingslab.com Reviewed-by: Paul Durrant paul@xen.org Link: https://lore.kernel.org/r/20220222001817.2264967-1-marmarek@invisiblethingsl... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/xen-netback/xenbus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c index d24b7a7993aa..3fad58d22155 100644 --- a/drivers/net/xen-netback/xenbus.c +++ b/drivers/net/xen-netback/xenbus.c @@ -256,6 +256,7 @@ static void backend_disconnect(struct backend_info *be) unsigned int queue_index;
xen_unregister_watchers(vif); + xenbus_rm(XBT_NIL, be->dev->nodename, "hotplug-status"); #ifdef CONFIG_DEBUG_FS xenvif_debugfs_delif(vif); #endif /* CONFIG_DEBUG_FS */ @@ -675,7 +676,6 @@ static void hotplug_status_changed(struct xenbus_watch *watch,
/* Not interested in this watch anymore. */ unregister_hotplug_status_watch(be); - xenbus_rm(XBT_NIL, be->dev->nodename, "hotplug-status"); } kfree(str); }
From: Marek Marczykowski-Górecki marmarek@invisiblethingslab.com
[ Upstream commit e8240addd0a3919e0fd7436416afe9aa6429c484 ]
This reverts commit 2afeec08ab5c86ae21952151f726bfe184f6b23d.
The reasoning in the commit was wrong - the code expected to setup the watch even if 'hotplug-status' didn't exist. In fact, it relied on the watch being fired the first time - to check if maybe 'hotplug-status' is already set to 'connected'. Not registering a watch for non-existing path (which is the case if hotplug script hasn't been executed yet), made the backend not waiting for the hotplug script to execute. This in turns, made the netfront think the interface is fully operational, while in fact it was not (the vif interface on xen-netback side might not be configured yet).
This was a workaround for 'hotplug-status' erroneously being removed. But since that is reverted now, the workaround is not necessary either.
More discussion at https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe....
Signed-off-by: Marek Marczykowski-Górecki marmarek@invisiblethingslab.com Reviewed-by: Paul Durrant paul@xen.org Reviewed-by: Michael Brown mbrown@fensystems.co.uk Link: https://lore.kernel.org/r/20220222001817.2264967-2-marmarek@invisiblethingsl... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/xen-netback/xenbus.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c index 3fad58d22155..990360d75cb6 100644 --- a/drivers/net/xen-netback/xenbus.c +++ b/drivers/net/xen-netback/xenbus.c @@ -824,15 +824,11 @@ static void connect(struct backend_info *be) xenvif_carrier_on(be->vif);
unregister_hotplug_status_watch(be); - if (xenbus_exists(XBT_NIL, dev->nodename, "hotplug-status")) { - err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch, - NULL, hotplug_status_changed, - "%s/%s", dev->nodename, - "hotplug-status"); - if (err) - goto err; + err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch, NULL, + hotplug_status_changed, + "%s/%s", dev->nodename, "hotplug-status"); + if (!err) be->have_hotplug_status_watch = 1; - }
netif_tx_wake_all_queues(be->vif->dev);
From: Niels Dossche dossche.niels@gmail.com
[ Upstream commit 6c0d8833a605e195ae219b5042577ce52bf71fff ]
valid_lft, prefered_lft and tstamp are always accessed under the lock "lock" in other places. Reading these without taking the lock may result in inconsistencies regarding the calculation of the valid and preferred variables since decisions are taken on these fields for those variables.
Signed-off-by: Niels Dossche dossche.niels@gmail.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Niels Dossche niels.dossche@ugent.be Link: https://lore.kernel.org/r/20220223131954.6570-1-niels.dossche@ugent.be Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/addrconf.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 7c78e1215ae3..e92ca415756a 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -5002,6 +5002,7 @@ static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa, nla_put_s32(skb, IFA_TARGET_NETNSID, args->netnsid)) goto error;
+ spin_lock_bh(&ifa->lock); if (!((ifa->flags&IFA_F_PERMANENT) && (ifa->prefered_lft == INFINITY_LIFE_TIME))) { preferred = ifa->prefered_lft; @@ -5023,6 +5024,7 @@ static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa, preferred = INFINITY_LIFE_TIME; valid = INFINITY_LIFE_TIME; } + spin_unlock_bh(&ifa->lock);
if (!ipv6_addr_any(&ifa->peer_addr)) { if (nla_put_in6_addr(skb, IFA_LOCAL, &ifa->addr) < 0 ||
From: Sven Schnelle svens@linux.ibm.com
[ Upstream commit 7acf3a127bb7c65ff39099afd78960e77b2ca5de ]
Booting the kernel with 'trace_buf_size=1' give a warning at boot during the ftrace selftests:
[ 0.892809] Running postponed tracer tests: [ 0.892893] Testing tracer function: [ 0.901899] Callback from call_rcu_tasks_trace() invoked. [ 0.983829] Callback from call_rcu_tasks_rude() invoked. [ 1.072003] .. bad ring buffer .. corrupted trace buffer .. [ 1.091944] Callback from call_rcu_tasks() invoked. [ 1.097695] PASSED [ 1.097701] Testing dynamic ftrace: .. filter failed count=0 ..FAILED! [ 1.353474] ------------[ cut here ]------------ [ 1.353478] WARNING: CPU: 0 PID: 1 at kernel/trace/trace.c:1951 run_tracer_selftest+0x13c/0x1b0
Therefore enforce a minimum of 4096 bytes to make the selftest pass.
Link: https://lkml.kernel.org/r/20220214134456.1751749-1-svens@linux.ibm.com
Signed-off-by: Sven Schnelle svens@linux.ibm.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 24683115eade..5816ad79cce8 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1472,10 +1472,12 @@ static int __init set_buf_size(char *str) if (!str) return 0; buf_size = memparse(str, &str); - /* nr_entries can not be zero */ - if (buf_size == 0) - return 0; - trace_buf_size = buf_size; + /* + * nr_entries can not be zero and the startup + * tests require some buffer space. Therefore + * ensure we have at least 4096 bytes of buffer. + */ + trace_buf_size = max(4096UL, buf_size); return 1; } __setup("trace_buf_size=", set_buf_size);
From: Daniel Bristot de Oliveira bristot@kernel.org
[ Upstream commit dd990352f01ee9a6c6eee152e5d11c021caccfe4 ]
osnoise's runtime and period are in the microseconds scale, but it is currently sleeping in the millisecond's scale. This behavior roots in the usage of hwlat as the skeleton for osnoise.
Make osnoise to sleep in the microseconds scale. Also, move the sleep to a specialized function.
Link: https://lkml.kernel.org/r/302aa6c7bdf2d131719b22901905e9da122a11b2.164519733...
Cc: Ingo Molnar mingo@redhat.com Signed-off-by: Daniel Bristot de Oliveira bristot@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace_osnoise.c | 53 ++++++++++++++++++++++-------------- 1 file changed, 32 insertions(+), 21 deletions(-)
diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c index b58674e8644a..58c788b0ca27 100644 --- a/kernel/trace/trace_osnoise.c +++ b/kernel/trace/trace_osnoise.c @@ -1437,6 +1437,37 @@ static int run_osnoise(void) static struct cpumask osnoise_cpumask; static struct cpumask save_cpumask;
+/* + * osnoise_sleep - sleep until the next period + */ +static void osnoise_sleep(void) +{ + u64 interval; + ktime_t wake_time; + + mutex_lock(&interface_lock); + interval = osnoise_data.sample_period - osnoise_data.sample_runtime; + mutex_unlock(&interface_lock); + + /* + * differently from hwlat_detector, the osnoise tracer can run + * without a pause because preemption is on. + */ + if (!interval) { + /* Let synchronize_rcu_tasks() make progress */ + cond_resched_tasks_rcu_qs(); + return; + } + + wake_time = ktime_add_us(ktime_get(), interval); + __set_current_state(TASK_INTERRUPTIBLE); + + while (schedule_hrtimeout_range(&wake_time, 0, HRTIMER_MODE_ABS)) { + if (kthread_should_stop()) + break; + } +} + /* * osnoise_main - The osnoise detection kernel thread * @@ -1445,30 +1476,10 @@ static struct cpumask save_cpumask; */ static int osnoise_main(void *data) { - u64 interval;
while (!kthread_should_stop()) { - run_osnoise(); - - mutex_lock(&interface_lock); - interval = osnoise_data.sample_period - osnoise_data.sample_runtime; - mutex_unlock(&interface_lock); - - do_div(interval, USEC_PER_MSEC); - - /* - * differently from hwlat_detector, the osnoise tracer can run - * without a pause because preemption is on. - */ - if (interval < 1) { - /* Let synchronize_rcu_tasks() make progress */ - cond_resched_tasks_rcu_qs(); - continue; - } - - if (msleep_interruptible(interval)) - break; + osnoise_sleep(); }
return 0;
From: Christophe Leroy christophe.leroy@csgroup.eu
[ Upstream commit c5229a0bd47814770c895e94fbc97ad21819abfe ]
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is required to test direct tramp.
Link: https://lkml.kernel.org/r/bdc7e594e13b0891c1d61bc8d56c94b1890eaed7.164001796...
Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace_selftest.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c index afd937a46496..abcadbe933bb 100644 --- a/kernel/trace/trace_selftest.c +++ b/kernel/trace/trace_selftest.c @@ -784,9 +784,7 @@ static struct fgraph_ops fgraph_ops __initdata = { .retfunc = &trace_graph_return, };
-#if defined(CONFIG_DYNAMIC_FTRACE) && \ - defined(CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS) -#define TEST_DIRECT_TRAMP +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS noinline __noclone static void trace_direct_tramp(void) { } #endif
@@ -849,7 +847,7 @@ trace_selftest_startup_function_graph(struct tracer *trace, goto out; }
-#ifdef TEST_DIRECT_TRAMP +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS tracing_reset_online_cpus(&tr->array_buffer); set_graph_array(tr);
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
[ Upstream commit f39c58008dee7ab5fc94c3f1995a21e886801df0 ]
On the latest RHEL the test fails due to executable mapped at 256MB address
# ./map_fixed_noreplace mmap() @ 0x10000000-0x10050000 p=0xffffffffffffffff result=File exists 10000000-10010000 r-xp 00000000 fd:04 34905657 /root/rpmbuild/BUILD/kernel-5.14.0-56.el9/linux-5.14.0-56.el9.ppc64le/tools/testing/selftests/vm/map_fixed_noreplace 10010000-10020000 r--p 00000000 fd:04 34905657 /root/rpmbuild/BUILD/kernel-5.14.0-56.el9/linux-5.14.0-56.el9.ppc64le/tools/testing/selftests/vm/map_fixed_noreplace 10020000-10030000 rw-p 00010000 fd:04 34905657 /root/rpmbuild/BUILD/kernel-5.14.0-56.el9/linux-5.14.0-56.el9.ppc64le/tools/testing/selftests/vm/map_fixed_noreplace 10029b90000-10029bc0000 rw-p 00000000 00:00 0 [heap] 7fffbb510000-7fffbb750000 r-xp 00000000 fd:04 24534 /usr/lib64/libc.so.6 7fffbb750000-7fffbb760000 r--p 00230000 fd:04 24534 /usr/lib64/libc.so.6 7fffbb760000-7fffbb770000 rw-p 00240000 fd:04 24534 /usr/lib64/libc.so.6 7fffbb780000-7fffbb7a0000 r--p 00000000 00:00 0 [vvar] 7fffbb7a0000-7fffbb7b0000 r-xp 00000000 00:00 0 [vdso] 7fffbb7b0000-7fffbb800000 r-xp 00000000 fd:04 24514 /usr/lib64/ld64.so.2 7fffbb800000-7fffbb810000 r--p 00040000 fd:04 24514 /usr/lib64/ld64.so.2 7fffbb810000-7fffbb820000 rw-p 00050000 fd:04 24514 /usr/lib64/ld64.so.2 7fffd93f0000-7fffd9420000 rw-p 00000000 00:00 0 [stack] Error: couldn't map the space we need for the test
Fix this by finding a free address using mmap instead of hardcoding BASE_ADDRESS.
Link: https://lkml.kernel.org/r/20220217083417.373823-1-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Cc: Michael Ellerman mpe@ellerman.id.au Cc: Jann Horn jannh@google.com Cc: Shuah Khan shuah@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../selftests/vm/map_fixed_noreplace.c | 49 ++++++++++++++----- 1 file changed, 37 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/vm/map_fixed_noreplace.c b/tools/testing/selftests/vm/map_fixed_noreplace.c index d91bde511268..eed44322d1a6 100644 --- a/tools/testing/selftests/vm/map_fixed_noreplace.c +++ b/tools/testing/selftests/vm/map_fixed_noreplace.c @@ -17,9 +17,6 @@ #define MAP_FIXED_NOREPLACE 0x100000 #endif
-#define BASE_ADDRESS (256ul * 1024 * 1024) - - static void dump_maps(void) { char cmd[32]; @@ -28,18 +25,46 @@ static void dump_maps(void) system(cmd); }
+static unsigned long find_base_addr(unsigned long size) +{ + void *addr; + unsigned long flags; + + flags = MAP_PRIVATE | MAP_ANONYMOUS; + addr = mmap(NULL, size, PROT_NONE, flags, -1, 0); + if (addr == MAP_FAILED) { + printf("Error: couldn't map the space we need for the test\n"); + return 0; + } + + if (munmap(addr, size) != 0) { + printf("Error: couldn't map the space we need for the test\n"); + return 0; + } + return (unsigned long)addr; +} + int main(void) { + unsigned long base_addr; unsigned long flags, addr, size, page_size; char *p;
page_size = sysconf(_SC_PAGE_SIZE);
+ //let's find a base addr that is free before we start the tests + size = 5 * page_size; + base_addr = find_base_addr(size); + if (!base_addr) { + printf("Error: couldn't map the space we need for the test\n"); + return 1; + } + flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE;
// Check we can map all the areas we need below errno = 0; - addr = BASE_ADDRESS; + addr = base_addr; size = 5 * page_size; p = mmap((void *)addr, size, PROT_NONE, flags, -1, 0);
@@ -60,7 +85,7 @@ int main(void) printf("unmap() successful\n");
errno = 0; - addr = BASE_ADDRESS + page_size; + addr = base_addr + page_size; size = 3 * page_size; p = mmap((void *)addr, size, PROT_NONE, flags, -1, 0); printf("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p); @@ -80,7 +105,7 @@ int main(void) * +4 | free | new */ errno = 0; - addr = BASE_ADDRESS; + addr = base_addr; size = 5 * page_size; p = mmap((void *)addr, size, PROT_NONE, flags, -1, 0); printf("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p); @@ -101,7 +126,7 @@ int main(void) * +4 | free | */ errno = 0; - addr = BASE_ADDRESS + (2 * page_size); + addr = base_addr + (2 * page_size); size = page_size; p = mmap((void *)addr, size, PROT_NONE, flags, -1, 0); printf("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p); @@ -121,7 +146,7 @@ int main(void) * +4 | free | new */ errno = 0; - addr = BASE_ADDRESS + (3 * page_size); + addr = base_addr + (3 * page_size); size = 2 * page_size; p = mmap((void *)addr, size, PROT_NONE, flags, -1, 0); printf("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p); @@ -141,7 +166,7 @@ int main(void) * +4 | free | */ errno = 0; - addr = BASE_ADDRESS; + addr = base_addr; size = 2 * page_size; p = mmap((void *)addr, size, PROT_NONE, flags, -1, 0); printf("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p); @@ -161,7 +186,7 @@ int main(void) * +4 | free | */ errno = 0; - addr = BASE_ADDRESS; + addr = base_addr; size = page_size; p = mmap((void *)addr, size, PROT_NONE, flags, -1, 0); printf("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p); @@ -181,7 +206,7 @@ int main(void) * +4 | free | new */ errno = 0; - addr = BASE_ADDRESS + (4 * page_size); + addr = base_addr + (4 * page_size); size = page_size; p = mmap((void *)addr, size, PROT_NONE, flags, -1, 0); printf("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p); @@ -192,7 +217,7 @@ int main(void) return 1; }
- addr = BASE_ADDRESS; + addr = base_addr; size = 5 * page_size; if (munmap((void *)addr, size) != 0) { dump_maps();
From: Mike Kravetz mike.kravetz@oracle.com
[ Upstream commit fda153c89af344d21df281009a9d046cf587ea0f ]
Running the memfd script ./run_hugetlbfs_test.sh will often end in error as follows:
memfd-hugetlb: CREATE memfd-hugetlb: BASIC memfd-hugetlb: SEAL-WRITE memfd-hugetlb: SEAL-FUTURE-WRITE memfd-hugetlb: SEAL-SHRINK fallocate(ALLOC) failed: No space left on device ./run_hugetlbfs_test.sh: line 60: 166855 Aborted (core dumped) ./memfd_test hugetlbfs opening: ./mnt/memfd fuse: DONE
If no hugetlb pages have been preallocated, run_hugetlbfs_test.sh will allocate 'just enough' pages to run the test. In the SEAL-FUTURE-WRITE test the mfd_fail_write routine maps the file, but does not unmap. As a result, two hugetlb pages remain reserved for the mapping. When the fallocate call in the SEAL-SHRINK test attempts allocate all hugetlb pages, it is short by the two reserved pages.
Fix by making sure to unmap in mfd_fail_write.
Link: https://lkml.kernel.org/r/20220219004340.56478-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz mike.kravetz@oracle.com Cc: Joel Fernandes joel@joelfernandes.org Cc: Shuah Khan shuah@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/memfd/memfd_test.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c index 192a2899bae8..94df2692e6e4 100644 --- a/tools/testing/selftests/memfd/memfd_test.c +++ b/tools/testing/selftests/memfd/memfd_test.c @@ -455,6 +455,7 @@ static void mfd_fail_write(int fd) printf("mmap()+mprotect() didn't fail as expected\n"); abort(); } + munmap(p, mfd_def_size); }
/* verify PUNCH_HOLE fails */
From: Randy Dunlap rdunlap@infradead.org
commit 68453767131a5deec1e8f9ac92a9042f929e585d upstream.
When CONFIG_GENERIC_CPU_VULNERABILITIES is not set, references to spectre_v2_update_state() cause a build error, so provide an empty stub for that function when the Kconfig option is not set.
Fixes this build error:
arm-linux-gnueabi-ld: arch/arm/mm/proc-v7-bugs.o: in function `cpu_v7_bugs_init': proc-v7-bugs.c:(.text+0x52): undefined reference to `spectre_v2_update_state' arm-linux-gnueabi-ld: proc-v7-bugs.c:(.text+0x82): undefined reference to `spectre_v2_update_state'
Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround") Signed-off-by: Randy Dunlap rdunlap@infradead.org Reported-by: kernel test robot lkp@intel.com Cc: Russell King rmk+kernel@armlinux.org.uk Cc: Catalin Marinas catalin.marinas@arm.com Cc: linux-arm-kernel@lists.infradead.org Cc: patches@armlinux.org.uk Acked-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/include/asm/spectre.h | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/arch/arm/include/asm/spectre.h +++ b/arch/arm/include/asm/spectre.h @@ -25,7 +25,13 @@ enum { SPECTRE_V2_METHOD_LOOP8 = BIT(__SPECTRE_V2_METHOD_LOOP8), };
+#ifdef CONFIG_GENERIC_CPU_VULNERABILITIES void spectre_v2_update_state(unsigned int state, unsigned int methods); +#else +static inline void spectre_v2_update_state(unsigned int state, + unsigned int methods) +{} +#endif
int spectre_bhb_update_vectors(unsigned int method);
From: Miklos Szeredi mszeredi@redhat.com
commit a679a61520d8a7b0211a1da990404daf5cc80b72 upstream.
The fileattr API conversion broke lsattr on ntfs3g.
Previously the ioctl(... FS_IOC_GETFLAGS) returned an EINVAL error, but after the conversion the error returned by the fuse filesystem was not propagated back to the ioctl() system call, resulting in success being returned with bogus values.
Fix by checking for outarg.result in fuse_priv_ioctl(), just as generic ioctl code does.
Reported-by: Jean-Pierre André jean-pierre.andre@wanadoo.fr Fixes: 72227eac177d ("fuse: convert to fileattr") Cc: stable@vger.kernel.org # v5.13 Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/fuse/ioctl.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/fs/fuse/ioctl.c +++ b/fs/fuse/ioctl.c @@ -394,9 +394,12 @@ static int fuse_priv_ioctl(struct inode args.out_args[1].value = ptr;
err = fuse_simple_request(fm, &args); - if (!err && outarg.flags & FUSE_IOCTL_RETRY) - err = -EIO; - + if (!err) { + if (outarg.result < 0) + err = outarg.result; + else if (outarg.flags & FUSE_IOCTL_RETRY) + err = -EIO; + } return err; }
From: Miklos Szeredi mszeredi@redhat.com
commit 0c4bcfdecb1ac0967619ee7ff44871d93c08c909 upstream.
In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then imports the write buffer with fuse_get_user_pages(), which uses iov_iter_get_pages() to grab references to userspace pages instead of actually copying memory.
On the filesystem device side, these pages can then either be read to userspace (via fuse_dev_read()), or splice()d over into a pipe using fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops.
This is wrong because after fuse_dev_do_read() unlocks the FUSE request, the userspace filesystem can mark the request as completed, causing write() to return. At that point, the userspace filesystem should no longer have access to the pipe buffer.
Fix by copying pages coming from the user address space to new pipe buffers.
Reported-by: Jann Horn jannh@google.com Fixes: c3021629a0d8 ("fuse: support splice() reading from fuse device") Cc: stable@vger.kernel.org Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/fuse/dev.c | 12 +++++++++++- fs/fuse/file.c | 1 + fs/fuse/fuse_i.h | 1 + 3 files changed, 13 insertions(+), 1 deletion(-)
--- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -941,7 +941,17 @@ static int fuse_copy_page(struct fuse_co
while (count) { if (cs->write && cs->pipebufs && page) { - return fuse_ref_page(cs, page, offset, count); + /* + * Can't control lifetime of pipe buffers, so always + * copy user pages. + */ + if (cs->req->args->user_pages) { + err = fuse_copy_fill(cs); + if (err) + return err; + } else { + return fuse_ref_page(cs, page, offset, count); + } } else if (!cs->len) { if (cs->move_pages && page && offset == 0 && count == PAGE_SIZE) { --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1413,6 +1413,7 @@ static int fuse_get_user_pages(struct fu (PAGE_SIZE - ret) & (PAGE_SIZE - 1); }
+ ap->args.user_pages = true; if (write) ap->args.in_pages = true; else --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -256,6 +256,7 @@ struct fuse_args { bool nocreds:1; bool in_pages:1; bool out_pages:1; + bool user_pages:1; bool out_argvar:1; bool page_zeroing:1; bool page_replace:1;
From: Hans de Goede hdegoede@redhat.com
commit 8f4347081be32e67b0873827e0138ab0fdaaf450 upstream.
Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock into 2 locks in attempt to fix a lockdep reported issue with the locking order of the sta_hash_lock vs pxmitpriv->lock.
But in the end this turned out to not fully solve the sta_hash_lock issue so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible deadlock") was added to fix this in another way.
The original fix was kept as it was still seen as a good thing to have, but now it turns out that it creates a deadlock in access-point mode:
[Feb20 23:47] ====================================================== [ +0.074085] WARNING: possible circular locking dependency detected [ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E [ +0.064710] ------------------------------------------------------ [ +0.074075] ksoftirqd/3/29 is trying to acquire lock: [ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs] [ +0.114921] but task is already holding lock: [ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs] [ +0.116976] which lock already depends on the new lock.
[ +0.098037] the existing dependency chain (in reverse order) is: [ +0.089704] -> #1 (&psta->sleep_q.lock){+.-.}-{2:2}: [ +0.077232] _raw_spin_lock_bh+0x34/0x40 [ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs] [ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs] [ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs] [ +0.062755] dev_hard_start_xmit+0xf1/0x320 [ +0.056381] sch_direct_xmit+0x9e/0x360 [ +0.052212] __dev_queue_xmit+0xce4/0x1080 [ +0.055334] ip6_finish_output2+0x18f/0x6e0 [ +0.056378] ndisc_send_skb+0x2c8/0x870 [ +0.052209] ndisc_send_ns+0xd3/0x210 [ +0.050130] addrconf_dad_work+0x3df/0x5a0 [ +0.055338] process_one_work+0x274/0x5a0 [ +0.054296] worker_thread+0x52/0x3b0 [ +0.050124] kthread+0x16c/0x1a0 [ +0.044925] ret_from_fork+0x1f/0x30 [ +0.049092] -> #0 (&pxmitpriv->lock){+.-.}-{2:2}: [ +0.074101] __lock_acquire+0x10f5/0x1d80 [ +0.054298] lock_acquire+0xd7/0x300 [ +0.049088] _raw_spin_lock_bh+0x34/0x40 [ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs] [ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs] [ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs] [ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs] [ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs] [ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs] [ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110 [ +0.070966] __do_softirq+0x16f/0x50a [ +0.050134] __irq_exit_rcu+0xeb/0x140 [ +0.051172] irq_exit_rcu+0xa/0x20 [ +0.047006] common_interrupt+0xb8/0xd0 [ +0.052214] asm_common_interrupt+0x1e/0x40 [ +0.056381] finish_task_switch.isra.0+0x100/0x3a0 [ +0.063670] __schedule+0x3ad/0xd20 [ +0.048047] schedule+0x4e/0xc0 [ +0.043880] smpboot_thread_fn+0xc4/0x220 [ +0.054298] kthread+0x16c/0x1a0 [ +0.044922] ret_from_fork+0x1f/0x30 [ +0.049088] other info that might help us debug this:
[ +0.095950] Possible unsafe locking scenario:
[ +0.070952] CPU0 CPU1 [ +0.054282] ---- ---- [ +0.054285] lock(&psta->sleep_q.lock); [ +0.047004] lock(&pxmitpriv->lock); [ +0.074082] lock(&psta->sleep_q.lock); [ +0.077209] lock(&pxmitpriv->lock); [ +0.043873] *** DEADLOCK ***
[ +0.070950] 1 lock held by ksoftirqd/3/29: [ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
Analysis shows that in hindsight the splitting of the lock was not a good idea, so revert this to fix the access-point mode deadlock.
Note this is a straight-forward revert done with git revert, the commented out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the code before the reverted changes.
Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)") Cc: stable stable@vger.kernel.org Cc: Fabio Aiuto fabioaiuto83@gmail.com Signed-off-by: Hans de Goede hdegoede@redhat.com BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542 Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/staging/rtl8723bs/core/rtw_mlme_ext.c | 7 +++++-- drivers/staging/rtl8723bs/core/rtw_recv.c | 10 +++++++--- drivers/staging/rtl8723bs/core/rtw_sta_mgt.c | 22 ++++++++++------------ drivers/staging/rtl8723bs/core/rtw_xmit.c | 16 +++++++++------- drivers/staging/rtl8723bs/hal/rtl8723bs_xmit.c | 2 ++ 5 files changed, 33 insertions(+), 24 deletions(-)
--- a/drivers/staging/rtl8723bs/core/rtw_mlme_ext.c +++ b/drivers/staging/rtl8723bs/core/rtw_mlme_ext.c @@ -5907,6 +5907,7 @@ u8 chk_bmc_sleepq_hdl(struct adapter *pa struct sta_info *psta_bmc; struct list_head *xmitframe_plist, *xmitframe_phead, *tmp; struct xmit_frame *pxmitframe = NULL; + struct xmit_priv *pxmitpriv = &padapter->xmitpriv; struct sta_priv *pstapriv = &padapter->stapriv;
/* for BC/MC Frames */ @@ -5917,7 +5918,8 @@ u8 chk_bmc_sleepq_hdl(struct adapter *pa if ((pstapriv->tim_bitmap&BIT(0)) && (psta_bmc->sleepq_len > 0)) { msleep(10);/* 10ms, ATIM(HIQ) Windows */
- spin_lock_bh(&psta_bmc->sleep_q.lock); + /* spin_lock_bh(&psta_bmc->sleep_q.lock); */ + spin_lock_bh(&pxmitpriv->lock);
xmitframe_phead = get_list_head(&psta_bmc->sleep_q); list_for_each_safe(xmitframe_plist, tmp, xmitframe_phead) { @@ -5940,7 +5942,8 @@ u8 chk_bmc_sleepq_hdl(struct adapter *pa rtw_hal_xmitframe_enqueue(padapter, pxmitframe); }
- spin_unlock_bh(&psta_bmc->sleep_q.lock); + /* spin_unlock_bh(&psta_bmc->sleep_q.lock); */ + spin_unlock_bh(&pxmitpriv->lock);
/* check hi queue and bmc_sleepq */ rtw_chk_hi_queue_cmd(padapter); --- a/drivers/staging/rtl8723bs/core/rtw_recv.c +++ b/drivers/staging/rtl8723bs/core/rtw_recv.c @@ -957,8 +957,10 @@ static signed int validate_recv_ctrl_fra if ((psta->state&WIFI_SLEEP_STATE) && (pstapriv->sta_dz_bitmap&BIT(psta->aid))) { struct list_head *xmitframe_plist, *xmitframe_phead; struct xmit_frame *pxmitframe = NULL; + struct xmit_priv *pxmitpriv = &padapter->xmitpriv;
- spin_lock_bh(&psta->sleep_q.lock); + /* spin_lock_bh(&psta->sleep_q.lock); */ + spin_lock_bh(&pxmitpriv->lock);
xmitframe_phead = get_list_head(&psta->sleep_q); xmitframe_plist = get_next(xmitframe_phead); @@ -989,10 +991,12 @@ static signed int validate_recv_ctrl_fra update_beacon(padapter, WLAN_EID_TIM, NULL, true); }
- spin_unlock_bh(&psta->sleep_q.lock); + /* spin_unlock_bh(&psta->sleep_q.lock); */ + spin_unlock_bh(&pxmitpriv->lock);
} else { - spin_unlock_bh(&psta->sleep_q.lock); + /* spin_unlock_bh(&psta->sleep_q.lock); */ + spin_unlock_bh(&pxmitpriv->lock);
if (pstapriv->tim_bitmap&BIT(psta->aid)) { if (psta->sleepq_len == 0) { --- a/drivers/staging/rtl8723bs/core/rtw_sta_mgt.c +++ b/drivers/staging/rtl8723bs/core/rtw_sta_mgt.c @@ -293,48 +293,46 @@ u32 rtw_free_stainfo(struct adapter *pad
/* list_del_init(&psta->wakeup_list); */
- spin_lock_bh(&psta->sleep_q.lock); + spin_lock_bh(&pxmitpriv->lock); + rtw_free_xmitframe_queue(pxmitpriv, &psta->sleep_q); psta->sleepq_len = 0; - spin_unlock_bh(&psta->sleep_q.lock); - - spin_lock_bh(&pxmitpriv->lock);
/* vo */ - spin_lock_bh(&pstaxmitpriv->vo_q.sta_pending.lock); + /* spin_lock_bh(&(pxmitpriv->vo_pending.lock)); */ rtw_free_xmitframe_queue(pxmitpriv, &pstaxmitpriv->vo_q.sta_pending); list_del_init(&(pstaxmitpriv->vo_q.tx_pending)); phwxmit = pxmitpriv->hwxmits; phwxmit->accnt -= pstaxmitpriv->vo_q.qcnt; pstaxmitpriv->vo_q.qcnt = 0; - spin_unlock_bh(&pstaxmitpriv->vo_q.sta_pending.lock); + /* spin_unlock_bh(&(pxmitpriv->vo_pending.lock)); */
/* vi */ - spin_lock_bh(&pstaxmitpriv->vi_q.sta_pending.lock); + /* spin_lock_bh(&(pxmitpriv->vi_pending.lock)); */ rtw_free_xmitframe_queue(pxmitpriv, &pstaxmitpriv->vi_q.sta_pending); list_del_init(&(pstaxmitpriv->vi_q.tx_pending)); phwxmit = pxmitpriv->hwxmits+1; phwxmit->accnt -= pstaxmitpriv->vi_q.qcnt; pstaxmitpriv->vi_q.qcnt = 0; - spin_unlock_bh(&pstaxmitpriv->vi_q.sta_pending.lock); + /* spin_unlock_bh(&(pxmitpriv->vi_pending.lock)); */
/* be */ - spin_lock_bh(&pstaxmitpriv->be_q.sta_pending.lock); + /* spin_lock_bh(&(pxmitpriv->be_pending.lock)); */ rtw_free_xmitframe_queue(pxmitpriv, &pstaxmitpriv->be_q.sta_pending); list_del_init(&(pstaxmitpriv->be_q.tx_pending)); phwxmit = pxmitpriv->hwxmits+2; phwxmit->accnt -= pstaxmitpriv->be_q.qcnt; pstaxmitpriv->be_q.qcnt = 0; - spin_unlock_bh(&pstaxmitpriv->be_q.sta_pending.lock); + /* spin_unlock_bh(&(pxmitpriv->be_pending.lock)); */
/* bk */ - spin_lock_bh(&pstaxmitpriv->bk_q.sta_pending.lock); + /* spin_lock_bh(&(pxmitpriv->bk_pending.lock)); */ rtw_free_xmitframe_queue(pxmitpriv, &pstaxmitpriv->bk_q.sta_pending); list_del_init(&(pstaxmitpriv->bk_q.tx_pending)); phwxmit = pxmitpriv->hwxmits+3; phwxmit->accnt -= pstaxmitpriv->bk_q.qcnt; pstaxmitpriv->bk_q.qcnt = 0; - spin_unlock_bh(&pstaxmitpriv->bk_q.sta_pending.lock); + /* spin_unlock_bh(&(pxmitpriv->bk_pending.lock)); */
spin_unlock_bh(&pxmitpriv->lock);
--- a/drivers/staging/rtl8723bs/core/rtw_xmit.c +++ b/drivers/staging/rtl8723bs/core/rtw_xmit.c @@ -1734,12 +1734,15 @@ void rtw_free_xmitframe_queue(struct xmi struct list_head *plist, *phead, *tmp; struct xmit_frame *pxmitframe;
+ spin_lock_bh(&pframequeue->lock); + phead = get_list_head(pframequeue); list_for_each_safe(plist, tmp, phead) { pxmitframe = list_entry(plist, struct xmit_frame, list);
rtw_free_xmitframe(pxmitpriv, pxmitframe); } + spin_unlock_bh(&pframequeue->lock); }
s32 rtw_xmitframe_enqueue(struct adapter *padapter, struct xmit_frame *pxmitframe) @@ -1794,7 +1797,6 @@ s32 rtw_xmit_classifier(struct adapter * struct sta_info *psta; struct tx_servq *ptxservq; struct pkt_attrib *pattrib = &pxmitframe->attrib; - struct xmit_priv *xmit_priv = &padapter->xmitpriv; struct hw_xmit *phwxmits = padapter->xmitpriv.hwxmits; signed int res = _SUCCESS;
@@ -1812,14 +1814,12 @@ s32 rtw_xmit_classifier(struct adapter *
ptxservq = rtw_get_sta_pending(padapter, psta, pattrib->priority, (u8 *)(&ac_index));
- spin_lock_bh(&xmit_priv->lock); if (list_empty(&ptxservq->tx_pending)) list_add_tail(&ptxservq->tx_pending, get_list_head(phwxmits[ac_index].sta_queue));
list_add_tail(&pxmitframe->list, get_list_head(&ptxservq->sta_pending)); ptxservq->qcnt++; phwxmits[ac_index].accnt++; - spin_unlock_bh(&xmit_priv->lock);
exit:
@@ -2202,10 +2202,11 @@ void wakeup_sta_to_xmit(struct adapter * struct list_head *xmitframe_plist, *xmitframe_phead, *tmp; struct xmit_frame *pxmitframe = NULL; struct sta_priv *pstapriv = &padapter->stapriv; + struct xmit_priv *pxmitpriv = &padapter->xmitpriv;
psta_bmc = rtw_get_bcmc_stainfo(padapter);
- spin_lock_bh(&psta->sleep_q.lock); + spin_lock_bh(&pxmitpriv->lock);
xmitframe_phead = get_list_head(&psta->sleep_q); list_for_each_safe(xmitframe_plist, tmp, xmitframe_phead) { @@ -2306,7 +2307,7 @@ void wakeup_sta_to_xmit(struct adapter *
_exit:
- spin_unlock_bh(&psta->sleep_q.lock); + spin_unlock_bh(&pxmitpriv->lock);
if (update_mask) update_beacon(padapter, WLAN_EID_TIM, NULL, true); @@ -2318,8 +2319,9 @@ void xmit_delivery_enabled_frames(struct struct list_head *xmitframe_plist, *xmitframe_phead, *tmp; struct xmit_frame *pxmitframe = NULL; struct sta_priv *pstapriv = &padapter->stapriv; + struct xmit_priv *pxmitpriv = &padapter->xmitpriv;
- spin_lock_bh(&psta->sleep_q.lock); + spin_lock_bh(&pxmitpriv->lock);
xmitframe_phead = get_list_head(&psta->sleep_q); list_for_each_safe(xmitframe_plist, tmp, xmitframe_phead) { @@ -2372,7 +2374,7 @@ void xmit_delivery_enabled_frames(struct } }
- spin_unlock_bh(&psta->sleep_q.lock); + spin_unlock_bh(&pxmitpriv->lock); }
void enqueue_pending_xmitbuf(struct xmit_priv *pxmitpriv, struct xmit_buf *pxmitbuf) --- a/drivers/staging/rtl8723bs/hal/rtl8723bs_xmit.c +++ b/drivers/staging/rtl8723bs/hal/rtl8723bs_xmit.c @@ -507,7 +507,9 @@ s32 rtl8723bs_hal_xmit( rtw_issue_addbareq_cmd(padapter, pxmitframe); }
+ spin_lock_bh(&pxmitpriv->lock); err = rtw_xmitframe_enqueue(padapter, pxmitframe); + spin_unlock_bh(&pxmitpriv->lock); if (err != _SUCCESS) { rtw_free_xmitframe(pxmitpriv, pxmitframe);
From: Dan Carpenter dan.carpenter@oracle.com
commit fc7f750dc9d102c1ed7bbe4591f991e770c99033 upstream.
The netif_rx_ni() function frees the skb so we can't dereference it to save the skb->len.
Fixes: 61e121047645 ("staging: gdm7240: adding LTE USB driver") Cc: stable stable@vger.kernel.org Reported-by: kernel test robot lkp@intel.com Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Link: https://lore.kernel.org/r/20220228074331.GA13685@kili Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/staging/gdm724x/gdm_lte.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/gdm724x/gdm_lte.c b/drivers/staging/gdm724x/gdm_lte.c index 493ed4821515..0d8d8fed283d 100644 --- a/drivers/staging/gdm724x/gdm_lte.c +++ b/drivers/staging/gdm724x/gdm_lte.c @@ -76,14 +76,15 @@ static void tx_complete(void *arg)
static int gdm_lte_rx(struct sk_buff *skb, struct nic *nic, int nic_type) { - int ret; + int ret, len;
+ len = skb->len + ETH_HLEN; ret = netif_rx_ni(skb); if (ret == NET_RX_DROP) { nic->stats.rx_dropped++; } else { nic->stats.rx_packets++; - nic->stats.rx_bytes += skb->len + ETH_HLEN; + nic->stats.rx_bytes += len; }
return 0;
From: Robert Hancock robert.hancock@calian.com
commit 0bf476fc3624e3a72af4ba7340d430a91c18cd67 upstream.
There is an oddity in the way the RSR register flags propagate to the ISR register (and the actual interrupt output) on this hardware: it appears that RSR register bits only result in ISR being asserted if the interrupt was actually enabled at the time, so enabling interrupts with RSR bits already set doesn't trigger an interrupt to be raised. There was already a partial fix for this race in the macb_poll function where it checked for RSR bits being set and re-triggered NAPI receive. However, there was a still a race window between checking RSR and actually enabling interrupts, where a lost wakeup could happen. It's necessary to check again after enabling interrupts to see if RSR was set just prior to the interrupt being enabled, and re-trigger receive in that case.
This issue was noticed in a point-to-point UDP request-response protocol which periodically saw timeouts or abnormally high response times due to received packets not being processed in a timely fashion. In many applications, more packets arriving, including TCP retransmissions, would cause the original packet to be processed, thus masking the issue.
Fixes: 02f7a34f34e3 ("net: macb: Re-enable RX interrupt only when RX is done") Cc: stable@vger.kernel.org Co-developed-by: Scott McNutt scott.mcnutt@siriusxm.com Signed-off-by: Scott McNutt scott.mcnutt@siriusxm.com Signed-off-by: Robert Hancock robert.hancock@calian.com Tested-by: Claudiu Beznea claudiu.beznea@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/cadence/macb_main.c | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -1614,7 +1614,14 @@ static int macb_poll(struct napi_struct if (work_done < budget) { napi_complete_done(napi, work_done);
- /* Packets received while interrupts were disabled */ + /* RSR bits only seem to propagate to raise interrupts when + * interrupts are enabled at the time, so if bits are already + * set due to packets received while interrupts were disabled, + * they will not cause another interrupt to be generated when + * interrupts are re-enabled. + * Check for this case here. This has been seen to happen + * around 30% of the time under heavy network load. + */ status = macb_readl(bp, RSR); if (status) { if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE) @@ -1622,6 +1629,22 @@ static int macb_poll(struct napi_struct napi_reschedule(napi); } else { queue_writel(queue, IER, bp->rx_intr_mask); + + /* In rare cases, packets could have been received in + * the window between the check above and re-enabling + * interrupts. Therefore, a double-check is required + * to avoid losing a wakeup. This can potentially race + * with the interrupt handler doing the same actions + * if an interrupt is raised just after enabling them, + * but this should be harmless. + */ + status = macb_readl(bp, RSR); + if (unlikely(status)) { + queue_writel(queue, IDR, bp->rx_intr_mask); + if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE) + queue_writel(queue, ISR, MACB_BIT(RCOMP)); + napi_schedule(napi); + } } }
From: Jisheng Zhang jszhang@kernel.org
commit c80ee64a8020ef1a6a92109798080786829b8994 upstream.
The alternative mechanism needs runtime code patching, it can't work on XIP_KERNEL. And the errata workarounds are implemented via the alternative mechanism. So add !XIP_KERNEL dependency for alternative and erratas.
Signed-off-by: Jisheng Zhang jszhang@kernel.org Fixes: 44c922572952 ("RISC-V: enable XIP") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/Kconfig.erratas | 1 + arch/riscv/Kconfig.socs | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-)
--- a/arch/riscv/Kconfig.erratas +++ b/arch/riscv/Kconfig.erratas @@ -2,6 +2,7 @@ menu "CPU errata selection"
config RISCV_ERRATA_ALTERNATIVE bool "RISC-V alternative scheme" + depends on !XIP_KERNEL default y help This Kconfig allows the kernel to automatically patch the --- a/arch/riscv/Kconfig.socs +++ b/arch/riscv/Kconfig.socs @@ -14,8 +14,8 @@ config SOC_SIFIVE select CLK_SIFIVE select CLK_SIFIVE_PRCI select SIFIVE_PLIC - select RISCV_ERRATA_ALTERNATIVE - select ERRATA_SIFIVE + select RISCV_ERRATA_ALTERNATIVE if !XIP_KERNEL + select ERRATA_SIFIVE if !XIP_KERNEL help This enables support for SiFive SoC platform hardware.
From: Rong Chen rong.chen@amlogic.com
commit f0d2f15362f02444c5d7ffd5a5eb03e4aa54b685 upstream.
Currently meson_mmc_post_req() is called in meson_mmc_request() right after meson_mmc_start_cmd(). This could lead to DMA unmapping before the request is actually finished.
To fix, don't call meson_mmc_post_req() until meson_mmc_request_done().
Signed-off-by: Rong Chen rong.chen@amlogic.com Reviewed-by: Kevin Hilman khilman@baylibre.com Fixes: 79ed05e329c3 ("mmc: meson-gx: add support for descriptor chain mode") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220216124239.4007667-1-rong.chen@amlogic.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/meson-gx-mmc.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-)
--- a/drivers/mmc/host/meson-gx-mmc.c +++ b/drivers/mmc/host/meson-gx-mmc.c @@ -173,6 +173,8 @@ struct meson_host { int irq;
bool vqmmc_enabled; + bool needs_pre_post_req; + };
#define CMD_CFG_LENGTH_MASK GENMASK(8, 0) @@ -663,6 +665,8 @@ static void meson_mmc_request_done(struc struct meson_host *host = mmc_priv(mmc);
host->cmd = NULL; + if (host->needs_pre_post_req) + meson_mmc_post_req(mmc, mrq, 0); mmc_request_done(host->mmc, mrq); }
@@ -880,7 +884,7 @@ static int meson_mmc_validate_dram_acces static void meson_mmc_request(struct mmc_host *mmc, struct mmc_request *mrq) { struct meson_host *host = mmc_priv(mmc); - bool needs_pre_post_req = mrq->data && + host->needs_pre_post_req = mrq->data && !(mrq->data->host_cookie & SD_EMMC_PRE_REQ_DONE);
/* @@ -896,22 +900,19 @@ static void meson_mmc_request(struct mmc } }
- if (needs_pre_post_req) { + if (host->needs_pre_post_req) { meson_mmc_get_transfer_mode(mmc, mrq); if (!meson_mmc_desc_chain_mode(mrq->data)) - needs_pre_post_req = false; + host->needs_pre_post_req = false; }
- if (needs_pre_post_req) + if (host->needs_pre_post_req) meson_mmc_pre_req(mmc, mrq);
/* Stop execution */ writel(0, host->regs + SD_EMMC_START);
meson_mmc_start_cmd(mmc, mrq->sbc ?: mrq->cmd); - - if (needs_pre_post_req) - meson_mmc_post_req(mmc, mrq, 0); }
static void meson_mmc_read_resp(struct mmc_host *mmc, struct mmc_command *cmd)
From: Emil Renner Berthing kernel@esmil.dk
commit 0966d385830de3470b7131db8e86c0c5bc9c52dc upstream.
RISC-V can do PC-relative jumps with a 32bit range using the following two instructions:
auipc t0, imm20 ; t0 = PC + imm20 * 2^12 jalr ra, t0, imm12 ; ra = PC + 4, PC = t0 + imm12
Crucially both the 20bit immediate imm20 and the 12bit immediate imm12 are treated as two's-complement signed values. For this reason the immediates are usually calculated like this:
imm20 = (offset + 0x800) >> 12 imm12 = offset & 0xfff
..where offset is the signed offset from the auipc instruction. When the 11th bit of offset is 0 the addition of 0x800 doesn't change the top 20 bits and imm12 considered positive. When the 11th bit is 1 the carry of the addition by 0x800 means imm20 is one higher, but since imm12 is then considered negative the two's complement representation means it all cancels out nicely.
However, this addition by 0x800 (2^11) means an offset greater than or equal to 2^31 - 2^11 would overflow so imm20 is considered negative and result in a backwards jump. Similarly the lower range of offset is also moved down by 2^11 and hence the true 32bit range is
[-2^31 - 2^11, 2^31 - 2^11)
Signed-off-by: Emil Renner Berthing kernel@esmil.dk Fixes: e2c0cdfba7f6 ("RISC-V: User-facing API") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/kernel/module.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-)
--- a/arch/riscv/kernel/module.c +++ b/arch/riscv/kernel/module.c @@ -13,6 +13,19 @@ #include <linux/pgtable.h> #include <asm/sections.h>
+/* + * The auipc+jalr instruction pair can reach any PC-relative offset + * in the range [-2^31 - 2^11, 2^31 - 2^11) + */ +static bool riscv_insn_valid_32bit_offset(ptrdiff_t val) +{ +#ifdef CONFIG_32BIT + return true; +#else + return (-(1L << 31) - (1L << 11)) <= val && val < ((1L << 31) - (1L << 11)); +#endif +} + static int apply_r_riscv_32_rela(struct module *me, u32 *location, Elf_Addr v) { if (v != (u32)v) { @@ -95,7 +108,7 @@ static int apply_r_riscv_pcrel_hi20_rela ptrdiff_t offset = (void *)v - (void *)location; s32 hi20;
- if (offset != (s32)offset) { + if (!riscv_insn_valid_32bit_offset(offset)) { pr_err( "%s: target %016llx can not be addressed by the 32-bit offset from PC = %p\n", me->name, (long long)v, location); @@ -197,10 +210,9 @@ static int apply_r_riscv_call_plt_rela(s Elf_Addr v) { ptrdiff_t offset = (void *)v - (void *)location; - s32 fill_v = offset; u32 hi20, lo12;
- if (offset != fill_v) { + if (!riscv_insn_valid_32bit_offset(offset)) { /* Only emit the plt entry if offset over 32-bit range */ if (IS_ENABLED(CONFIG_MODULE_SECTIONS)) { offset = module_emit_plt_entry(me, v); @@ -224,10 +236,9 @@ static int apply_r_riscv_call_rela(struc Elf_Addr v) { ptrdiff_t offset = (void *)v - (void *)location; - s32 fill_v = offset; u32 hi20, lo12;
- if (offset != fill_v) { + if (!riscv_insn_valid_32bit_offset(offset)) { pr_err( "%s: target %016llx can not be addressed by the 32-bit offset from PC = %p\n", me->name, (long long)v, location);
From: Nicolas Saenz Julienne nsaenzju@redhat.com
commit caf4c86bf136845982c5103b2661751b40c474c0 upstream.
At the moment running osnoise on a nohz_full CPU or uncontested FIFO priority and a PREEMPT_RCU kernel might have the side effect of extending grace periods too much. This will entice RCU to force a context switch on the wayward CPU to end the grace period, all while introducing unwarranted noise into the tracer. This behaviour is unavoidable as overly extending grace periods might exhaust the system's memory.
This same exact problem is what extended quiescent states (EQS) were created for, conversely, rcu_momentary_dyntick_idle() emulates them by performing a zero duration EQS. So let's make use of it.
In the common case rcu_momentary_dyntick_idle() is fairly inexpensive: atomically incrementing a local per-CPU counter and doing a store. So it shouldn't affect osnoise's measurements (which has a 1us granularity), so we'll call it unanimously.
The uncommon case involve calling rcu_momentary_dyntick_idle() after having the osnoise process:
- Receive an expedited quiescent state IPI with preemption disabled or during an RCU critical section. (activates rdp->cpu_no_qs.b.exp code-path).
- Being preempted within in an RCU critical section and having the subsequent outermost rcu_read_unlock() called with interrupts disabled. (t->rcu_read_unlock_special.b.blocked code-path).
Neither of those are possible at the moment, and are unlikely to be in the future given the osnoise's loop design. On top of this, the noise generated by the situations described above is unavoidable, and if not exposed by rcu_momentary_dyntick_idle() will be eventually seen in subsequent rcu_read_unlock() calls or schedule operations.
Link: https://lkml.kernel.org/r/20220307180740.577607-1-nsaenzju@redhat.com
Cc: stable@vger.kernel.org Fixes: bce29ac9ce0b ("trace: Add osnoise tracer") Signed-off-by: Nicolas Saenz Julienne nsaenzju@redhat.com Acked-by: Paul E. McKenney paulmck@kernel.org Acked-by: Daniel Bristot de Oliveira bristot@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_osnoise.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
--- a/kernel/trace/trace_osnoise.c +++ b/kernel/trace/trace_osnoise.c @@ -1388,6 +1388,26 @@ static int run_osnoise(void) }
/* + * In some cases, notably when running on a nohz_full CPU with + * a stopped tick PREEMPT_RCU has no way to account for QSs. + * This will eventually cause unwarranted noise as PREEMPT_RCU + * will force preemption as the means of ending the current + * grace period. We avoid this problem by calling + * rcu_momentary_dyntick_idle(), which performs a zero duration + * EQS allowing PREEMPT_RCU to end the current grace period. + * This call shouldn't be wrapped inside an RCU critical + * section. + * + * Note that in non PREEMPT_RCU kernels QSs are handled through + * cond_resched() + */ + if (IS_ENABLED(CONFIG_PREEMPT_RCU)) { + local_irq_disable(); + rcu_momentary_dyntick_idle(); + local_irq_enable(); + } + + /* * For the non-preemptive kernel config: let threads runs, if * they so wish. */
From: Daniel Bristot de Oliveira bristot@kernel.org
commit f0cfe17bcc1dd2f0872966b554a148e888833ee9 upstream.
Nicolas reported that using:
# trace-cmd record -e all -M 10 -p osnoise --poll
Resulted in the following kernel warning:
------------[ cut here ]------------ WARNING: CPU: 0 PID: 1217 at kernel/tracepoint.c:404 tracepoint_probe_unregister+0x280/0x370 [...] CPU: 0 PID: 1217 Comm: trace-cmd Not tainted 5.17.0-rc6-next-20220307-nico+ #19 RIP: 0010:tracepoint_probe_unregister+0x280/0x370 [...] CR2: 00007ff919b29497 CR3: 0000000109da4005 CR4: 0000000000170ef0 Call Trace: <TASK> osnoise_workload_stop+0x36/0x90 tracing_set_tracer+0x108/0x260 tracing_set_trace_write+0x94/0xd0 ? __check_object_size.part.0+0x10a/0x150 ? selinux_file_permission+0x104/0x150 vfs_write+0xb5/0x290 ksys_write+0x5f/0xe0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7ff919a18127 [...] ---[ end trace 0000000000000000 ]---
The warning complains about an attempt to unregister an unregistered tracepoint.
This happens on trace-cmd because it first stops tracing, and then switches the tracer to nop. Which is equivalent to:
# cd /sys/kernel/tracing/ # echo osnoise > current_tracer # echo 0 > tracing_on # echo nop > current_tracer
The osnoise tracer stops the workload when no trace instance is actually collecting data. This can be caused both by disabling tracing or disabling the tracer itself.
To avoid unregistering events twice, use the existing trace_osnoise_callback_enabled variable to check if the events (and the workload) are actually active before trying to deactivate them.
Link: https://lore.kernel.org/all/c898d1911f7f9303b7e14726e7cc9678fbfb4a0e.camel@r... Link: https://lkml.kernel.org/r/938765e17d5a781c2df429a98f0b2e7cc317b022.164682391...
Cc: stable@vger.kernel.org Cc: Marcelo Tosatti mtosatti@redhat.com Fixes: 2fac8d6486d5 ("tracing/osnoise: Allow multiple instances of the same tracer") Reported-by: Nicolas Saenz Julienne nsaenzju@redhat.com Signed-off-by: Daniel Bristot de Oliveira bristot@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_osnoise.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/kernel/trace/trace_osnoise.c +++ b/kernel/trace/trace_osnoise.c @@ -2222,6 +2222,17 @@ static void osnoise_workload_stop(void) if (osnoise_has_registered_instances()) return;
+ /* + * If callbacks were already disabled in a previous stop + * call, there is no need to disable then again. + * + * For instance, this happens when tracing is stopped via: + * echo 0 > tracing_on + * echo nop > current_tracer. + */ + if (!trace_osnoise_callback_enabled) + return; + trace_osnoise_callback_enabled = false; /* * Make sure that ftrace_nmi_enter/exit() see
From: Pali Rohár pali@kernel.org
commit a1cc1697bb56cdf880ad4d17b79a39ef2c294bc9 upstream.
Legacy and old PCI I/O based cards do not support 32-bit I/O addressing.
Since commit 64f160e19e92 ("PCI: aardvark: Configure PCIe resources from 'ranges' DT property") kernel can set different PCIe address on CPU and different on the bus for the one A37xx address mapping without any firmware support in case the bus address does not conflict with other A37xx mapping.
So remap I/O space to the bus address 0x0 to enable support for old legacy I/O port based cards which have hardcoded I/O ports in low address space.
Note that DDR on A37xx is mapped to bus address 0x0. And mapping of I/O space can be set to address 0x0 too because MEM space and I/O space are separate and so do not conflict.
Remapping IO space on Turris Mox to different address is not possible to due bootloader bug.
Signed-off-by: Pali Rohár pali@kernel.org Reported-by: Arnd Bergmann arnd@arndb.de Fixes: 76f6386b25cc ("arm64: dts: marvell: Add Aardvark PCIe support for Armada 3700") Cc: stable@vger.kernel.org # 64f160e19e92 ("PCI: aardvark: Configure PCIe resources from 'ranges' DT property") Cc: stable@vger.kernel.org # 514ef1e62d65 ("arm64: dts: marvell: armada-37xx: Extend PCIe MEM space") Reviewed-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Gregory CLEMENT gregory.clement@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts | 7 ++++++- arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 2 +- 2 files changed, 7 insertions(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts +++ b/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts @@ -139,7 +139,9 @@ /* * U-Boot port for Turris Mox has a bug which always expects that "ranges" DT property * contains exactly 2 ranges with 3 (child) address cells, 2 (parent) address cells and - * 2 size cells and also expects that the second range starts at 16 MB offset. If these + * 2 size cells and also expects that the second range starts at 16 MB offset. Also it + * expects that first range uses same address for PCI (child) and CPU (parent) cells (so + * no remapping) and that this address is the lowest from all specified ranges. If these * conditions are not met then U-Boot crashes during loading kernel DTB file. PCIe address * space is 128 MB long, so the best split between MEM and IO is to use fixed 16 MB window * for IO and the rest 112 MB (64+32+16) for MEM, despite that maximal IO size is just 64 kB. @@ -148,6 +150,9 @@ * https://source.denx.de/u-boot/u-boot/-/commit/cb2ddb291ee6fcbddd6d8f4ff49089... * https://source.denx.de/u-boot/u-boot/-/commit/c64ac3b3185aeb3846297ad7391fc6... * https://source.denx.de/u-boot/u-boot/-/commit/4a82fca8e330157081fc132a591ebd... + * Bug related to requirement of same child and parent addresses for first range is fixed + * in U-Boot version 2022.04 by following commit: + * https://source.denx.de/u-boot/u-boot/-/commit/1fd54253bca7d43d046bba4853fe5f... */ #address-cells = <3>; #size-cells = <2>; --- a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi +++ b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi @@ -497,7 +497,7 @@ * (totaling 127 MiB) for MEM. */ ranges = <0x82000000 0 0xe8000000 0 0xe8000000 0 0x07f00000 /* Port 0 MEM */ - 0x81000000 0 0xefff0000 0 0xefff0000 0 0x00010000>; /* Port 0 IO */ + 0x81000000 0 0x00000000 0 0xefff0000 0 0x00010000>; /* Port 0 IO */ interrupt-map-mask = <0 0 0 7>; interrupt-map = <0 0 0 1 &pcie_intc 0>, <0 0 0 2 &pcie_intc 1>,
From: Catalin Marinas catalin.marinas@arm.com
commit 6e2edd6371a497a6350bb735534c9bda2a31f43d upstream.
Commit 18107f8a2df6 ("arm64: Support execute-only permissions with Enhanced PAN") re-introduced execute-only permissions when EPAN is available. When EPAN is not available, arch_filter_pgprot() is supposed to change a PAGE_EXECONLY permission into PAGE_READONLY_EXEC. However, if BTI or MTE are present, such check does not detect the execute-only pgprot in the presence of PTE_GP (BTI) or MT_NORMAL_TAGGED (MTE), allowing the user to request PROT_EXEC with PROT_BTI or PROT_MTE.
Remove the arch_filter_pgprot() function, change the default VM_EXEC permissions to PAGE_READONLY_EXEC and update the protection_map[] array at core_initcall() if EPAN is detected.
Signed-off-by: Catalin Marinas catalin.marinas@arm.com Fixes: 18107f8a2df6 ("arm64: Support execute-only permissions with Enhanced PAN") Cc: stable@vger.kernel.org # 5.13.x Acked-by: Will Deacon will@kernel.org Reviewed-by: Vladimir Murzin vladimir.murzin@arm.com Tested-by: Vladimir Murzin vladimir.murzin@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/Kconfig | 3 --- arch/arm64/include/asm/pgtable-prot.h | 4 ++-- arch/arm64/include/asm/pgtable.h | 11 ----------- arch/arm64/mm/mmap.c | 17 +++++++++++++++++ 4 files changed, 19 insertions(+), 16 deletions(-)
--- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1264,9 +1264,6 @@ config HW_PERF_EVENTS def_bool y depends on ARM_PMU
-config ARCH_HAS_FILTER_PGPROT - def_bool y - # Supported by clang >= 7.0 config CC_HAVE_SHADOW_CALL_STACK def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18) --- a/arch/arm64/include/asm/pgtable-prot.h +++ b/arch/arm64/include/asm/pgtable-prot.h @@ -92,7 +92,7 @@ extern bool arm64_use_ng_mappings; #define __P001 PAGE_READONLY #define __P010 PAGE_READONLY #define __P011 PAGE_READONLY -#define __P100 PAGE_EXECONLY +#define __P100 PAGE_READONLY_EXEC /* PAGE_EXECONLY if Enhanced PAN */ #define __P101 PAGE_READONLY_EXEC #define __P110 PAGE_READONLY_EXEC #define __P111 PAGE_READONLY_EXEC @@ -101,7 +101,7 @@ extern bool arm64_use_ng_mappings; #define __S001 PAGE_READONLY #define __S010 PAGE_SHARED #define __S011 PAGE_SHARED -#define __S100 PAGE_EXECONLY +#define __S100 PAGE_READONLY_EXEC /* PAGE_EXECONLY if Enhanced PAN */ #define __S101 PAGE_READONLY_EXEC #define __S110 PAGE_SHARED_EXEC #define __S111 PAGE_SHARED_EXEC --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -1017,17 +1017,6 @@ static inline bool arch_wants_old_prefau } #define arch_wants_old_prefaulted_pte arch_wants_old_prefaulted_pte
-static inline pgprot_t arch_filter_pgprot(pgprot_t prot) -{ - if (cpus_have_const_cap(ARM64_HAS_EPAN)) - return prot; - - if (pgprot_val(prot) != pgprot_val(PAGE_EXECONLY)) - return prot; - - return PAGE_READONLY_EXEC; -} - static inline bool pud_sect_supported(void) { return PAGE_SIZE == SZ_4K; --- a/arch/arm64/mm/mmap.c +++ b/arch/arm64/mm/mmap.c @@ -7,8 +7,10 @@
#include <linux/io.h> #include <linux/memblock.h> +#include <linux/mm.h> #include <linux/types.h>
+#include <asm/cpufeature.h> #include <asm/page.h>
/* @@ -38,3 +40,18 @@ int valid_mmap_phys_addr_range(unsigned { return !(((pfn << PAGE_SHIFT) + size) & ~PHYS_MASK); } + +static int __init adjust_protection_map(void) +{ + /* + * With Enhanced PAN we can honour the execute-only permissions as + * there is no PAN override with such mappings. + */ + if (cpus_have_const_cap(ARM64_HAS_EPAN)) { + protection_map[VM_EXEC] = PAGE_EXECONLY; + protection_map[VM_EXEC | VM_SHARED] = PAGE_EXECONLY; + } + + return 0; +} +arch_initcall(adjust_protection_map);
From: Paul Semel semelpaul@gmail.com
commit b859ebedd1e730bbda69142fca87af4e712649a1 upstream.
Fix `error: expected string literal in 'asm'`. This happens when compiling an ebpf object file that includes `net/net_namespace.h` from linux kernel headers.
Include trace: include/net/net_namespace.h:10 include/linux/workqueue.h:9 include/linux/timer.h:8 include/linux/debugobjects.h:6 include/linux/spinlock.h:90 include/linux/workqueue.h:9 arch/arm64/include/asm/spinlock.h:9 arch/arm64/include/generated/asm/qrwlock.h:1 include/asm-generic/qrwlock.h:14 arch/arm64/include/asm/processor.h:33 arch/arm64/include/asm/kasan.h:9 arch/arm64/include/asm/mte-kasan.h:45 arch/arm64/include/asm/mte-def.h:14
Signed-off-by: Paul Semel paul.semel@datadoghq.com Fixes: 2cb34276427a ("arm64: kasan: simplify and inline MTE functions") Cc: stable@vger.kernel.org # 5.12.x Link: https://lore.kernel.org/r/bacb5387-2992-97e4-0c48-1ed925905bee@gmail.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/mte-kasan.h | 1 + 1 file changed, 1 insertion(+)
--- a/arch/arm64/include/asm/mte-kasan.h +++ b/arch/arm64/include/asm/mte-kasan.h @@ -5,6 +5,7 @@ #ifndef __ASM_MTE_KASAN_H #define __ASM_MTE_KASAN_H
+#include <asm/compiler.h> #include <asm/mte-def.h>
#ifndef __ASSEMBLY__
From: Halil Pasic pasic@linux.ibm.com
commit aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13 upstream.
Unfortunately, we ended up merging an old version of the patch "fix info leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph (the swiotlb maintainer), he asked me to create an incremental fix (after I have pointed this out the mix up, and asked him for guidance). So here we go.
The main differences between what we got and what was agreed are: * swiotlb_sync_single_for_device is also required to do an extra bounce * We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters * The implantation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE must take precedence over DMA_ATTR_SKIP_CPU_SYNC
Thus this patch removes DMA_ATTR_OVERWRITE, and makes swiotlb_sync_single_for_device() bounce unconditionally (that is, also when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale data from the swiotlb buffer.
Let me note, that if the size used with dma_sync_* API is less than the size used with dma_[un]map_*, under certain circumstances we may still end up with swiotlb not being transparent. In that sense, this is no perfect fix either.
To get this bullet proof, we would have to bounce the entire mapping/bounce buffer. For that we would have to figure out the starting address, and the size of the mapping in swiotlb_sync_single_for_device(). While this does seem possible, there seems to be no firm consensus on how things are supposed to work.
Signed-off-by: Halil Pasic pasic@linux.ibm.com Fixes: ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE") Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/core-api/dma-attributes.rst | 8 -------- include/linux/dma-mapping.h | 8 -------- kernel/dma/swiotlb.c | 23 +++++++++++++++-------- 3 files changed, 15 insertions(+), 24 deletions(-)
--- a/Documentation/core-api/dma-attributes.rst +++ b/Documentation/core-api/dma-attributes.rst @@ -130,11 +130,3 @@ accesses to DMA buffers in both privileg subsystem that the buffer is fully accessible at the elevated privilege level (and ideally inaccessible or at least read-only at the lesser-privileged levels). - -DMA_ATTR_OVERWRITE ------------------- - -This is a hint to the DMA-mapping subsystem that the device is expected to -overwrite the entire mapped size, thus the caller does not require any of the -previous buffer contents to be preserved. This allows bounce-buffering -implementations to optimise DMA_FROM_DEVICE transfers. --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -62,14 +62,6 @@ #define DMA_ATTR_PRIVILEGED (1UL << 9)
/* - * This is a hint to the DMA-mapping subsystem that the device is expected - * to overwrite the entire mapped size, thus the caller does not require any - * of the previous buffer contents to be preserved. This allows - * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers. - */ -#define DMA_ATTR_OVERWRITE (1UL << 10) - -/* * A dma_addr_t can hold any valid DMA or bus address for the platform. It can * be given to a device to use as a DMA source or target. It is specific to a * given device and there may be a translation between the CPU physical address --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -581,10 +581,14 @@ phys_addr_t swiotlb_tbl_map_single(struc for (i = 0; i < nr_slots(alloc_size + offset); i++) mem->slots[index + i].orig_addr = slot_addr(orig_addr, i); tlb_addr = slot_addr(mem->start, index) + offset; - if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && - (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE || - dir == DMA_BIDIRECTIONAL)) - swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE); + /* + * When dir == DMA_FROM_DEVICE we could omit the copy from the orig + * to the tlb buffer, if we knew for sure the device will + * overwirte the entire current content. But we don't. Thus + * unconditional bounce may prevent leaking swiotlb content (i.e. + * kernel memory) to user-space. + */ + swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE); return tlb_addr; }
@@ -651,10 +655,13 @@ void swiotlb_tbl_unmap_single(struct dev void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr, size_t size, enum dma_data_direction dir) { - if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL) - swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE); - else - BUG_ON(dir != DMA_FROM_DEVICE); + /* + * Unconditional bounce is necessary to avoid corruption on + * sync_*_for_cpu or dma_ummap_* when the device didn't overwrite + * the whole lengt of the bounce buffer. + */ + swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE); + BUG_ON(!valid_dma_direction(dir)); }
void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
From: Michael S. Tsirkin mst@redhat.com
commit 838d6d3461db0fdbf33fc5f8a69c27b50b4a46da upstream.
virtio_finalize_features is only used internally within virtio. No reason to export it.
Signed-off-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Cornelia Huck cohuck@redhat.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/virtio/virtio.c | 3 +-- include/linux/virtio.h | 1 - 2 files changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -166,7 +166,7 @@ void virtio_add_status(struct virtio_dev } EXPORT_SYMBOL_GPL(virtio_add_status);
-int virtio_finalize_features(struct virtio_device *dev) +static int virtio_finalize_features(struct virtio_device *dev) { int ret = dev->config->finalize_features(dev); unsigned status; @@ -202,7 +202,6 @@ int virtio_finalize_features(struct virt } return 0; } -EXPORT_SYMBOL_GPL(virtio_finalize_features);
static int virtio_dev_probe(struct device *_d) { --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -133,7 +133,6 @@ bool is_virtio_device(struct device *dev void virtio_break_device(struct virtio_device *dev);
void virtio_config_changed(struct virtio_device *dev); -int virtio_finalize_features(struct virtio_device *dev); #ifdef CONFIG_PM_SLEEP int virtio_device_freeze(struct virtio_device *dev); int virtio_device_restore(struct virtio_device *dev);
From: Michael S. Tsirkin mst@redhat.com
commit 4fa59ede95195f267101a1b8916992cf3f245cdb upstream.
The feature negotiation was designed in a way that makes it possible for devices to know which config fields will be accessed by drivers.
This is broken since commit 404123c2db79 ("virtio: allow drivers to validate features") with fallout in at least block and net. We have a partial work-around in commit 2f9a174f918e ("virtio: write back F_VERSION_1 before validate") which at least lets devices find out which format should config space have, but this is a partial fix: guests should not access config space without acknowledging features since otherwise we'll never be able to change the config space format.
To fix, split finalize_features from virtio_finalize_features and call finalize_features with all feature bits before validation, and then - if validation changed any bits - once again after.
Since virtio_finalize_features no longer writes out features rename it to virtio_features_ok - since that is what it does: checks that features are ok with the device.
As a side effect, this also reduces the amount of hypervisor accesses - we now only acknowledge features once unless we are clearing any features when validating (which is uncommon).
IRC I think that this was more or less always the intent in the spec but unfortunately the way the spec is worded does not say this explicitly, I plan to address this at the spec level, too.
Acked-by: Jason Wang jasowang@redhat.com Cc: stable@vger.kernel.org Fixes: 404123c2db79 ("virtio: allow drivers to validate features") Fixes: 2f9a174f918e ("virtio: write back F_VERSION_1 before validate") Cc: "Halil Pasic" pasic@linux.ibm.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/virtio/virtio.c | 39 ++++++++++++++++++++++----------------- include/linux/virtio_config.h | 3 ++- 2 files changed, 24 insertions(+), 18 deletions(-)
--- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -166,14 +166,13 @@ void virtio_add_status(struct virtio_dev } EXPORT_SYMBOL_GPL(virtio_add_status);
-static int virtio_finalize_features(struct virtio_device *dev) +/* Do some validation, then set FEATURES_OK */ +static int virtio_features_ok(struct virtio_device *dev) { - int ret = dev->config->finalize_features(dev); unsigned status; + int ret;
might_sleep(); - if (ret) - return ret;
ret = arch_has_restricted_virtio_memory_access(); if (ret) { @@ -238,17 +237,6 @@ static int virtio_dev_probe(struct devic driver_features_legacy = driver_features; }
- /* - * Some devices detect legacy solely via F_VERSION_1. Write - * F_VERSION_1 to force LE config space accesses before FEATURES_OK for - * these when needed. - */ - if (drv->validate && !virtio_legacy_is_little_endian() - && device_features & BIT_ULL(VIRTIO_F_VERSION_1)) { - dev->features = BIT_ULL(VIRTIO_F_VERSION_1); - dev->config->finalize_features(dev); - } - if (device_features & (1ULL << VIRTIO_F_VERSION_1)) dev->features = driver_features & device_features; else @@ -259,13 +247,26 @@ static int virtio_dev_probe(struct devic if (device_features & (1ULL << i)) __virtio_set_bit(dev, i);
+ err = dev->config->finalize_features(dev); + if (err) + goto err; + if (drv->validate) { + u64 features = dev->features; + err = drv->validate(dev); if (err) goto err; + + /* Did validation change any features? Then write them again. */ + if (features != dev->features) { + err = dev->config->finalize_features(dev); + if (err) + goto err; + } }
- err = virtio_finalize_features(dev); + err = virtio_features_ok(dev); if (err) goto err;
@@ -489,7 +490,11 @@ int virtio_device_restore(struct virtio_ /* We have a driver! */ virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
- ret = virtio_finalize_features(dev); + ret = dev->config->finalize_features(dev); + if (ret) + goto err; + + ret = virtio_features_ok(dev); if (ret) goto err;
--- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -64,8 +64,9 @@ struct virtio_shm_region { * Returns the first 64 feature bits (all we currently need). * @finalize_features: confirm what device features we'll be using. * vdev: the virtio_device - * This gives the final feature bits for the device: it can change + * This sends the driver feature bits to the device: it can change * the dev->feature bits if it wants. + * Note: despite the name this can be called any number of times. * Returns 0 on success or error status * @bus_name: return the bus name associated with the device (optional) * vdev: the virtio_device
From: Dima Chumak dchumak@nvidia.com
commit 39bab83b119faac4bf7f07173a42ed35be95147e upstream.
Only prio 1 is supported for nic mode when there is no ignore flow level support in firmware. But for switchdev mode, which supports fixed number of statically pre-allocated prios, this restriction is not relevant so it can be relaxed.
Fixes: d671e109bd85 ("net/mlx5: Fix tc max supported prio for nic mode") Signed-off-by: Dima Chumak dchumak@nvidia.com Reviewed-by: Roi Dayan roid@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c | 3 --- 1 file changed, 3 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c @@ -121,9 +121,6 @@ u32 mlx5_chains_get_nf_ft_chain(struct m
u32 mlx5_chains_get_prio_range(struct mlx5_fs_chains *chains) { - if (!mlx5_chains_prios_supported(chains)) - return 1; - if (mlx5_chains_ignore_flow_level_supported(chains)) return UINT_MAX;
From: Russell King (Oracle) rmk+kernel@armlinux.org.uk
commit 6c7cb60bff7aec24b834343ff433125f469886a3 upstream.
When building for Thumb2, the vectors make use of a local label. Sadly, the Spectre BHB code also uses a local label with the same number which results in the Thumb2 reference pointing at the wrong place. Fix this by changing the number used for the Spectre BHB local label.
Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround") Tested-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/kernel/entry-armv.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm/kernel/entry-armv.S +++ b/arch/arm/kernel/entry-armv.S @@ -1040,9 +1040,9 @@ vector_bhb_loop8_\name:
@ bhb workaround mov r0, #8 -1: b . + 4 +3: b . + 4 subs r0, r0, #1 - bne 1b + bne 3b dsb isb b 2b
From: David Howells dhowells@redhat.com
commit c993ee0f9f81caf5767a50d1faeba39a0dc82af2 upstream.
In watch_queue_set_filter(), there are a couple of places where we check that the filter type value does not exceed what the type_filter bitmap can hold. One place calculates the number of bits by:
if (tf[i].type >= sizeof(wfilter->type_filter) * 8)
which is fine, but the second does:
if (tf[i].type >= sizeof(wfilter->type_filter) * BITS_PER_LONG)
which is not. This can lead to a couple of out-of-bounds writes due to a too-large type:
(1) __set_bit() on wfilter->type_filter (2) Writing more elements in wfilter->filters[] than we allocated.
Fix this by just using the proper WATCH_TYPE__NR instead, which is the number of types we actually know about.
The bug may cause an oops looking something like:
BUG: KASAN: slab-out-of-bounds in watch_queue_set_filter+0x659/0x740 Write of size 4 at addr ffff88800d2c66bc by task watch_queue_oob/611 ... Call Trace: <TASK> dump_stack_lvl+0x45/0x59 print_address_description.constprop.0+0x1f/0x150 ... kasan_report.cold+0x7f/0x11b ... watch_queue_set_filter+0x659/0x740 ... __x64_sys_ioctl+0x127/0x190 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae
Allocated by task 611: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 watch_queue_set_filter+0x23a/0x740 __x64_sys_ioctl+0x127/0x190 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at ffff88800d2c66a0 which belongs to the cache kmalloc-32 of size 32 The buggy address is located 28 bytes inside of 32-byte region [ffff88800d2c66a0, ffff88800d2c66c0)
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn jannh@google.com Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/watch_queue.h | 3 ++- kernel/watch_queue.c | 4 ++-- 2 files changed, 4 insertions(+), 3 deletions(-)
--- a/include/linux/watch_queue.h +++ b/include/linux/watch_queue.h @@ -28,7 +28,8 @@ struct watch_type_filter { struct watch_filter { union { struct rcu_head rcu; - unsigned long type_filter[2]; /* Bitmask of accepted types */ + /* Bitmask of accepted types */ + DECLARE_BITMAP(type_filter, WATCH_TYPE__NR); }; u32 nr_filters; /* Number of filters */ struct watch_type_filter filters[]; --- a/kernel/watch_queue.c +++ b/kernel/watch_queue.c @@ -320,7 +320,7 @@ long watch_queue_set_filter(struct pipe_ tf[i].info_mask & WATCH_INFO_LENGTH) goto err_filter; /* Ignore any unknown types */ - if (tf[i].type >= sizeof(wfilter->type_filter) * 8) + if (tf[i].type >= WATCH_TYPE__NR) continue; nr_filter++; } @@ -336,7 +336,7 @@ long watch_queue_set_filter(struct pipe_
q = wfilter->filters; for (i = 0; i < filter.nr_filters; i++) { - if (tf[i].type >= sizeof(wfilter->type_filter) * BITS_PER_LONG) + if (tf[i].type >= WATCH_TYPE__NR) continue;
q->type = tf[i].type;
From: David Howells dhowells@redhat.com
commit db8facfc9fafacefe8a835416a6b77c838088f8b upstream.
In free_pipe_info(), free the watchqueue state after clearing the pipe ring as each pipe ring descriptor has a release function, and in the case of a notification message, this is watch_queue_pipe_buf_release() which tries to mark the allocation bitmap that was previously released.
Fix this by moving the put of the pipe's ref on the watch queue to after the ring has been cleared. We still need to call watch_queue_clear() before doing that to make sure that the pipe is disconnected from any notification sources first.
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn jannh@google.com Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/pipe.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/fs/pipe.c +++ b/fs/pipe.c @@ -830,10 +830,8 @@ void free_pipe_info(struct pipe_inode_in int i;
#ifdef CONFIG_WATCH_QUEUE - if (pipe->watch_queue) { + if (pipe->watch_queue) watch_queue_clear(pipe->watch_queue); - put_watch_queue(pipe->watch_queue); - } #endif
(void) account_pipe_buffers(pipe->user, pipe->nr_accounted, 0); @@ -843,6 +841,10 @@ void free_pipe_info(struct pipe_inode_in if (buf->ops) pipe_buf_release(pipe, buf); } +#ifdef CONFIG_WATCH_QUEUE + if (pipe->watch_queue) + put_watch_queue(pipe->watch_queue); +#endif if (pipe->tmp_page) __free_page(pipe->tmp_page); kfree(pipe->bufs);
From: David Howells dhowells@redhat.com
commit c1853fbadcba1497f4907971e7107888e0714c81 upstream.
When a pipe ring descriptor points to a notification message, the refcount on the backing page is incremented by the generic get function, but the release function, which marks the bitmap, doesn't drop the page ref.
Fix this by calling generic_pipe_buf_release() at the end of watch_queue_pipe_buf_release().
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn jannh@google.com Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/watch_queue.c | 1 + 1 file changed, 1 insertion(+)
--- a/kernel/watch_queue.c +++ b/kernel/watch_queue.c @@ -54,6 +54,7 @@ static void watch_queue_pipe_buf_release bit += page->index;
set_bit(bit, wqueue->notes_bitmap); + generic_pipe_buf_release(pipe, buf); }
// No try_steal function => no stealing
From: David Howells dhowells@redhat.com
commit 96a4d8912b28451cd62825fd7caa0e66e091d938 upstream.
The pipe ring size must always be a power of 2 as the head and tail pointers are masked off by AND'ing with the size of the ring - 1. watch_queue_set_size(), however, lets you specify any number of notes between 1 and 511. This number is passed through to pipe_resize_ring() without checking/forcing its alignment.
Fix this by rounding the number of slots required up to the nearest power of two. The request is meant to guarantee that at least that many notifications can be generated before the queue is full, so rounding down isn't an option, but, alternatively, it may be better to give an error if we aren't allowed to allocate that much ring space.
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn jannh@google.com Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/watch_queue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/watch_queue.c +++ b/kernel/watch_queue.c @@ -244,7 +244,7 @@ long watch_queue_set_size(struct pipe_in goto error; }
- ret = pipe_resize_ring(pipe, nr_notes); + ret = pipe_resize_ring(pipe, roundup_pow_of_two(nr_notes)); if (ret < 0) goto error;
From: David Howells dhowells@redhat.com
commit 3b4c0371928c17af03e8397ac842346624017ce6 upstream.
Currently, watch_queue_set_size() sets the number of notes available in wqueue->nr_notes according to the number of notes allocated, but sets the size of the bitmap to the unrounded number of notes originally asked for.
Fix this by setting the bitmap size to the number of notes we're actually going to make available (ie. the number allocated).
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn jannh@google.com Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/watch_queue.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/watch_queue.c +++ b/kernel/watch_queue.c @@ -244,6 +244,7 @@ long watch_queue_set_size(struct pipe_in goto error; }
+ nr_notes = nr_pages * WATCH_QUEUE_NOTES_PER_PAGE; ret = pipe_resize_ring(pipe, roundup_pow_of_two(nr_notes)); if (ret < 0) goto error; @@ -269,7 +270,7 @@ long watch_queue_set_size(struct pipe_in wqueue->notes = pages; wqueue->notes_bitmap = bitmap; wqueue->nr_pages = nr_pages; - wqueue->nr_notes = nr_pages * WATCH_QUEUE_NOTES_PER_PAGE; + wqueue->nr_notes = nr_notes; return 0;
error_p:
From: David Howells dhowells@redhat.com
commit 7ea1a0124b6da246b5bc8c66cddaafd36acf3ecb upstream.
Free the watch_queue note allocation bitmap when the watch_queue is destroyed.
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn jannh@google.com Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/watch_queue.c | 1 + 1 file changed, 1 insertion(+)
--- a/kernel/watch_queue.c +++ b/kernel/watch_queue.c @@ -373,6 +373,7 @@ static void __put_watch_queue(struct kre
for (i = 0; i < wqueue->nr_pages; i++) __free_page(wqueue->notes[i]); + bitmap_free(wqueue->notes_bitmap);
wfilter = rcu_access_pointer(wqueue->filter); if (wfilter)
From: David Howells dhowells@redhat.com
commit 2ed147f015af2b48f41c6f0b6746aa9ea85c19f3 upstream.
There's nothing to synchronise post_one_notification() versus pipe_read(). Whilst posting is done under pipe->rd_wait.lock, the reader only takes pipe->mutex which cannot bar notification posting as that may need to be made from contexts that cannot sleep.
Fix this by setting pipe->head with a barrier in post_one_notification() and reading pipe->head with a barrier in pipe_read().
If that's not sufficient, the rd_wait.lock will need to be taken, possibly in a ->confirm() op so that it only applies to notifications. The lock would, however, have to be dropped before copy_page_to_iter() is invoked.
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn jannh@google.com Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/pipe.c | 3 ++- kernel/watch_queue.c | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-)
--- a/fs/pipe.c +++ b/fs/pipe.c @@ -252,7 +252,8 @@ pipe_read(struct kiocb *iocb, struct iov */ was_full = pipe_full(pipe->head, pipe->tail, pipe->max_usage); for (;;) { - unsigned int head = pipe->head; + /* Read ->head with a barrier vs post_one_notification() */ + unsigned int head = smp_load_acquire(&pipe->head); unsigned int tail = pipe->tail; unsigned int mask = pipe->ring_size - 1;
--- a/kernel/watch_queue.c +++ b/kernel/watch_queue.c @@ -113,7 +113,7 @@ static bool post_one_notification(struct buf->offset = offset; buf->len = len; buf->flags = PIPE_BUF_FLAG_WHOLE; - pipe->head = head + 1; + smp_store_release(&pipe->head, head + 1); /* vs pipe_read() */
if (!test_and_clear_bit(note, wqueue->notes_bitmap)) { spin_unlock_irq(&pipe->rd_wait.lock);
From: David Howells dhowells@redhat.com
commit 4edc0760412b0c4ecefc7e02cb855b310b122825 upstream.
watch_queue_clear() has a comment stating that setting ->defunct to true preventing new additions as well as preventing notifications. Whilst the latter is true, the first bit is superfluous since at the time this function is called, the pipe cannot be accessed to add new event sources.
Remove the "new additions" bit from the comment.
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn jannh@google.com Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/watch_queue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/watch_queue.c +++ b/kernel/watch_queue.c @@ -569,7 +569,7 @@ void watch_queue_clear(struct watch_queu rcu_read_lock(); spin_lock_bh(&wqueue->lock);
- /* Prevent new additions and prevent notifications from happening */ + /* Prevent new notifications from being stored. */ wqueue->defunct = true;
while (!hlist_empty(&wqueue->watches)) {
From: Ross Philipson ross.philipson@oracle.com
commit 7228918b34615ef6317edcd9a058a057bc54aa32 upstream.
As documented, the setup_indirect structure is nested inside the setup_data structures in the setup_data list. The code currently accesses the fields inside the setup_indirect structure but only the sizeof(struct setup_data) is being memremapped. No crash occurred but this is just due to how the area is remapped under the covers.
Properly memremap both the setup_data and setup_indirect structures in these cases before accessing them.
Fixes: b3c72fc9a78e ("x86/boot: Introduce setup_indirect") Signed-off-by: Ross Philipson ross.philipson@oracle.com Signed-off-by: Borislav Petkov bp@suse.de Reviewed-by: Daniel Kiper daniel.kiper@oracle.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1645668456-22036-2-git-send-email-ross.philipson@o... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/e820.c | 41 +++++++++++++++++------ arch/x86/kernel/kdebugfs.c | 35 +++++++++++++++----- arch/x86/kernel/ksysfs.c | 77 +++++++++++++++++++++++++++++++++++---------- arch/x86/kernel/setup.c | 34 +++++++++++++++---- arch/x86/mm/ioremap.c | 24 +++++++++++--- 5 files changed, 165 insertions(+), 46 deletions(-)
--- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -995,8 +995,10 @@ early_param("memmap", parse_memmap_opt); */ void __init e820__reserve_setup_data(void) { + struct setup_indirect *indirect; struct setup_data *data; - u64 pa_data; + u64 pa_data, pa_next; + u32 len;
pa_data = boot_params.hdr.setup_data; if (!pa_data) @@ -1004,6 +1006,14 @@ void __init e820__reserve_setup_data(voi
while (pa_data) { data = early_memremap(pa_data, sizeof(*data)); + if (!data) { + pr_warn("e820: failed to memremap setup_data entry\n"); + return; + } + + len = sizeof(*data); + pa_next = data->next; + e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
/* @@ -1015,18 +1025,27 @@ void __init e820__reserve_setup_data(voi sizeof(*data) + data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
- if (data->type == SETUP_INDIRECT && - ((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) { - e820__range_update(((struct setup_indirect *)data->data)->addr, - ((struct setup_indirect *)data->data)->len, - E820_TYPE_RAM, E820_TYPE_RESERVED_KERN); - e820__range_update_kexec(((struct setup_indirect *)data->data)->addr, - ((struct setup_indirect *)data->data)->len, - E820_TYPE_RAM, E820_TYPE_RESERVED_KERN); + if (data->type == SETUP_INDIRECT) { + len += data->len; + early_memunmap(data, sizeof(*data)); + data = early_memremap(pa_data, len); + if (!data) { + pr_warn("e820: failed to memremap indirect setup_data\n"); + return; + } + + indirect = (struct setup_indirect *)data->data; + + if (indirect->type != SETUP_INDIRECT) { + e820__range_update(indirect->addr, indirect->len, + E820_TYPE_RAM, E820_TYPE_RESERVED_KERN); + e820__range_update_kexec(indirect->addr, indirect->len, + E820_TYPE_RAM, E820_TYPE_RESERVED_KERN); + } }
- pa_data = data->next; - early_memunmap(data, sizeof(*data)); + pa_data = pa_next; + early_memunmap(data, len); }
e820__update_table(e820_table); --- a/arch/x86/kernel/kdebugfs.c +++ b/arch/x86/kernel/kdebugfs.c @@ -88,11 +88,13 @@ create_setup_data_node(struct dentry *pa
static int __init create_setup_data_nodes(struct dentry *parent) { + struct setup_indirect *indirect; struct setup_data_node *node; struct setup_data *data; - int error; + u64 pa_data, pa_next; struct dentry *d; - u64 pa_data; + int error; + u32 len; int no = 0;
d = debugfs_create_dir("setup_data", parent); @@ -112,12 +114,29 @@ static int __init create_setup_data_node error = -ENOMEM; goto err_dir; } + pa_next = data->next;
- if (data->type == SETUP_INDIRECT && - ((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) { - node->paddr = ((struct setup_indirect *)data->data)->addr; - node->type = ((struct setup_indirect *)data->data)->type; - node->len = ((struct setup_indirect *)data->data)->len; + if (data->type == SETUP_INDIRECT) { + len = sizeof(*data) + data->len; + memunmap(data); + data = memremap(pa_data, len, MEMREMAP_WB); + if (!data) { + kfree(node); + error = -ENOMEM; + goto err_dir; + } + + indirect = (struct setup_indirect *)data->data; + + if (indirect->type != SETUP_INDIRECT) { + node->paddr = indirect->addr; + node->type = indirect->type; + node->len = indirect->len; + } else { + node->paddr = pa_data; + node->type = data->type; + node->len = data->len; + } } else { node->paddr = pa_data; node->type = data->type; @@ -125,7 +144,7 @@ static int __init create_setup_data_node }
create_setup_data_node(d, no, node); - pa_data = data->next; + pa_data = pa_next;
memunmap(data); no++; --- a/arch/x86/kernel/ksysfs.c +++ b/arch/x86/kernel/ksysfs.c @@ -91,26 +91,41 @@ static int get_setup_data_paddr(int nr,
static int __init get_setup_data_size(int nr, size_t *size) { - int i = 0; + u64 pa_data = boot_params.hdr.setup_data, pa_next; + struct setup_indirect *indirect; struct setup_data *data; - u64 pa_data = boot_params.hdr.setup_data; + int i = 0; + u32 len;
while (pa_data) { data = memremap(pa_data, sizeof(*data), MEMREMAP_WB); if (!data) return -ENOMEM; + pa_next = data->next; + if (nr == i) { - if (data->type == SETUP_INDIRECT && - ((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) - *size = ((struct setup_indirect *)data->data)->len; - else + if (data->type == SETUP_INDIRECT) { + len = sizeof(*data) + data->len; + memunmap(data); + data = memremap(pa_data, len, MEMREMAP_WB); + if (!data) + return -ENOMEM; + + indirect = (struct setup_indirect *)data->data; + + if (indirect->type != SETUP_INDIRECT) + *size = indirect->len; + else + *size = data->len; + } else { *size = data->len; + }
memunmap(data); return 0; }
- pa_data = data->next; + pa_data = pa_next; memunmap(data); i++; } @@ -120,9 +135,11 @@ static int __init get_setup_data_size(in static ssize_t type_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { + struct setup_indirect *indirect; + struct setup_data *data; int nr, ret; u64 paddr; - struct setup_data *data; + u32 len;
ret = kobj_to_setup_data_nr(kobj, &nr); if (ret) @@ -135,10 +152,20 @@ static ssize_t type_show(struct kobject if (!data) return -ENOMEM;
- if (data->type == SETUP_INDIRECT) - ret = sprintf(buf, "0x%x\n", ((struct setup_indirect *)data->data)->type); - else + if (data->type == SETUP_INDIRECT) { + len = sizeof(*data) + data->len; + memunmap(data); + data = memremap(paddr, len, MEMREMAP_WB); + if (!data) + return -ENOMEM; + + indirect = (struct setup_indirect *)data->data; + + ret = sprintf(buf, "0x%x\n", indirect->type); + } else { ret = sprintf(buf, "0x%x\n", data->type); + } + memunmap(data); return ret; } @@ -149,9 +176,10 @@ static ssize_t setup_data_data_read(stru char *buf, loff_t off, size_t count) { + struct setup_indirect *indirect; + struct setup_data *data; int nr, ret = 0; u64 paddr, len; - struct setup_data *data; void *p;
ret = kobj_to_setup_data_nr(kobj, &nr); @@ -165,10 +193,27 @@ static ssize_t setup_data_data_read(stru if (!data) return -ENOMEM;
- if (data->type == SETUP_INDIRECT && - ((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) { - paddr = ((struct setup_indirect *)data->data)->addr; - len = ((struct setup_indirect *)data->data)->len; + if (data->type == SETUP_INDIRECT) { + len = sizeof(*data) + data->len; + memunmap(data); + data = memremap(paddr, len, MEMREMAP_WB); + if (!data) + return -ENOMEM; + + indirect = (struct setup_indirect *)data->data; + + if (indirect->type != SETUP_INDIRECT) { + paddr = indirect->addr; + len = indirect->len; + } else { + /* + * Even though this is technically undefined, return + * the data as though it is a normal setup_data struct. + * This will at least allow it to be inspected. + */ + paddr += sizeof(*data); + len = data->len; + } } else { paddr += sizeof(*data); len = data->len; --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -368,21 +368,41 @@ static void __init parse_setup_data(void
static void __init memblock_x86_reserve_range_setup_data(void) { + struct setup_indirect *indirect; struct setup_data *data; - u64 pa_data; + u64 pa_data, pa_next; + u32 len;
pa_data = boot_params.hdr.setup_data; while (pa_data) { data = early_memremap(pa_data, sizeof(*data)); + if (!data) { + pr_warn("setup: failed to memremap setup_data entry\n"); + return; + } + + len = sizeof(*data); + pa_next = data->next; + memblock_reserve(pa_data, sizeof(*data) + data->len);
- if (data->type == SETUP_INDIRECT && - ((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) - memblock_reserve(((struct setup_indirect *)data->data)->addr, - ((struct setup_indirect *)data->data)->len); + if (data->type == SETUP_INDIRECT) { + len += data->len; + early_memunmap(data, sizeof(*data)); + data = early_memremap(pa_data, len); + if (!data) { + pr_warn("setup: failed to memremap indirect setup_data\n"); + return; + } + + indirect = (struct setup_indirect *)data->data; + + if (indirect->type != SETUP_INDIRECT) + memblock_reserve(indirect->addr, indirect->len); + }
- pa_data = data->next; - early_memunmap(data, sizeof(*data)); + pa_data = pa_next; + early_memunmap(data, len); } }
--- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -615,6 +615,7 @@ static bool memremap_is_efi_data(resourc static bool memremap_is_setup_data(resource_size_t phys_addr, unsigned long size) { + struct setup_indirect *indirect; struct setup_data *data; u64 paddr, paddr_next;
@@ -627,6 +628,10 @@ static bool memremap_is_setup_data(resou
data = memremap(paddr, sizeof(*data), MEMREMAP_WB | MEMREMAP_DEC); + if (!data) { + pr_warn("failed to memremap setup_data entry\n"); + return false; + }
paddr_next = data->next; len = data->len; @@ -636,10 +641,21 @@ static bool memremap_is_setup_data(resou return true; }
- if (data->type == SETUP_INDIRECT && - ((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) { - paddr = ((struct setup_indirect *)data->data)->addr; - len = ((struct setup_indirect *)data->data)->len; + if (data->type == SETUP_INDIRECT) { + memunmap(data); + data = memremap(paddr, sizeof(*data) + len, + MEMREMAP_WB | MEMREMAP_DEC); + if (!data) { + pr_warn("failed to memremap indirect setup_data\n"); + return false; + } + + indirect = (struct setup_indirect *)data->data; + + if (indirect->type != SETUP_INDIRECT) { + paddr = indirect->addr; + len = indirect->len; + } }
memunmap(data);
From: Ross Philipson ross.philipson@oracle.com
commit 445c1470b6ef96440e7cfc42dfc160f5004fd149 upstream.
The x86 boot documentation describes the setup_indirect structures and how they are used. Only one of the two functions in ioremap.c that needed to be modified to be aware of the introduction of setup_indirect functionality was updated. Adds comparable support to the other function where it was missing.
Fixes: b3c72fc9a78e ("x86/boot: Introduce setup_indirect") Signed-off-by: Ross Philipson ross.philipson@oracle.com Signed-off-by: Borislav Petkov bp@suse.de Reviewed-by: Daniel Kiper daniel.kiper@oracle.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1645668456-22036-3-git-send-email-ross.philipson@o... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/mm/ioremap.c | 33 +++++++++++++++++++++++++++++++-- 1 file changed, 31 insertions(+), 2 deletions(-)
--- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -676,22 +676,51 @@ static bool memremap_is_setup_data(resou static bool __init early_memremap_is_setup_data(resource_size_t phys_addr, unsigned long size) { + struct setup_indirect *indirect; struct setup_data *data; u64 paddr, paddr_next;
paddr = boot_params.hdr.setup_data; while (paddr) { - unsigned int len; + unsigned int len, size;
if (phys_addr == paddr) return true;
data = early_memremap_decrypted(paddr, sizeof(*data)); + if (!data) { + pr_warn("failed to early memremap setup_data entry\n"); + return false; + } + + size = sizeof(*data);
paddr_next = data->next; len = data->len;
- early_memunmap(data, sizeof(*data)); + if ((phys_addr > paddr) && (phys_addr < (paddr + len))) { + early_memunmap(data, sizeof(*data)); + return true; + } + + if (data->type == SETUP_INDIRECT) { + size += len; + early_memunmap(data, sizeof(*data)); + data = early_memremap_decrypted(paddr, size); + if (!data) { + pr_warn("failed to early memremap indirect setup_data\n"); + return false; + } + + indirect = (struct setup_indirect *)data->data; + + if (indirect->type != SETUP_INDIRECT) { + paddr = indirect->addr; + len = indirect->len; + } + } + + early_memunmap(data, size);
if ((phys_addr > paddr) && (phys_addr < (paddr + len))) return true;
From: Peter Zijlstra peterz@infradead.org
commit 5adf349439d29f92467e864f728dfc23180f3ef9 upstream.
Ever since commit
4e6292114c74 ("x86/paravirt: Add new features for paravirt patching")
there is an ordering dependency between patching paravirt ops and patching alternatives, the module loader still violates this.
Fixes: 4e6292114c74 ("x86/paravirt: Add new features for paravirt patching") Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Borislav Petkov bp@suse.de Reviewed-by: Miroslav Benes mbenes@suse.cz Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220303112825.068773913@infradead.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/module.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/arch/x86/kernel/module.c +++ b/arch/x86/kernel/module.c @@ -273,6 +273,14 @@ int module_finalize(const Elf_Ehdr *hdr, retpolines = s; }
+ /* + * See alternative_instructions() for the ordering rules between the + * various patching types. + */ + if (para) { + void *pseg = (void *)para->sh_addr; + apply_paravirt(pseg, pseg + para->sh_size); + } if (retpolines) { void *rseg = (void *)retpolines->sh_addr; apply_retpolines(rseg, rseg + retpolines->sh_size); @@ -290,11 +298,6 @@ int module_finalize(const Elf_Ehdr *hdr, tseg, tseg + text->sh_size); }
- if (para) { - void *pseg = (void *)para->sh_addr; - apply_paravirt(pseg, pseg + para->sh_size); - } - /* make jump label nops */ jump_label_apply_nops(me);
From: Jarkko Sakkinen jarkko@kernel.org
commit 08999b2489b4c9b939d7483dbd03702ee4576d96 upstream.
There is a limited amount of SGX memory (EPC) on each system. When that memory is used up, SGX has its own swapping mechanism which is similar in concept but totally separate from the core mm/* code. Instead of swapping to disk, SGX swaps from EPC to normal RAM. That normal RAM comes from a shared memory pseudo-file and can itself be swapped by the core mm code. There is a hierarchy like this:
EPC <-> shmem <-> disk
After data is swapped back in from shmem to EPC, the shmem backing storage needs to be freed. Currently, the backing shmem is not freed. This effectively wastes the shmem while the enclave is running. The memory is recovered when the enclave is destroyed and the backing storage freed.
Sort this out by freeing memory with shmem_truncate_range(), as soon as a page is faulted back to the EPC. In addition, free the memory for PCMD pages as soon as all PCMD's in a page have been marked as unused by zeroing its contents.
Cc: stable@vger.kernel.org Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Reported-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Link: https://lkml.kernel.org/r/20220303223859.273187-1-jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/sgx/encl.c | 57 ++++++++++++++++++++++++++++++++++------- 1 file changed, 48 insertions(+), 9 deletions(-)
--- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -13,6 +13,30 @@ #include "sgx.h"
/* + * Calculate byte offset of a PCMD struct associated with an enclave page. PCMD's + * follow right after the EPC data in the backing storage. In addition to the + * visible enclave pages, there's one extra page slot for SECS, before PCMD + * structs. + */ +static inline pgoff_t sgx_encl_get_backing_page_pcmd_offset(struct sgx_encl *encl, + unsigned long page_index) +{ + pgoff_t epc_end_off = encl->size + sizeof(struct sgx_secs); + + return epc_end_off + page_index * sizeof(struct sgx_pcmd); +} + +/* + * Free a page from the backing storage in the given page index. + */ +static inline void sgx_encl_truncate_backing_page(struct sgx_encl *encl, unsigned long page_index) +{ + struct inode *inode = file_inode(encl->backing); + + shmem_truncate_range(inode, PFN_PHYS(page_index), PFN_PHYS(page_index) + PAGE_SIZE - 1); +} + +/* * ELDU: Load an EPC page as unblocked. For more info, see "OS Management of EPC * Pages" in the SDM. */ @@ -22,9 +46,11 @@ static int __sgx_encl_eldu(struct sgx_en { unsigned long va_offset = encl_page->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK; struct sgx_encl *encl = encl_page->encl; + pgoff_t page_index, page_pcmd_off; struct sgx_pageinfo pginfo; struct sgx_backing b; - pgoff_t page_index; + bool pcmd_page_empty; + u8 *pcmd_page; int ret;
if (secs_page) @@ -32,14 +58,16 @@ static int __sgx_encl_eldu(struct sgx_en else page_index = PFN_DOWN(encl->size);
+ page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index); + ret = sgx_encl_get_backing(encl, page_index, &b); if (ret) return ret;
pginfo.addr = encl_page->desc & PAGE_MASK; pginfo.contents = (unsigned long)kmap_atomic(b.contents); - pginfo.metadata = (unsigned long)kmap_atomic(b.pcmd) + - b.pcmd_offset; + pcmd_page = kmap_atomic(b.pcmd); + pginfo.metadata = (unsigned long)pcmd_page + b.pcmd_offset;
if (secs_page) pginfo.secs = (u64)sgx_get_epc_virt_addr(secs_page); @@ -55,11 +83,24 @@ static int __sgx_encl_eldu(struct sgx_en ret = -EFAULT; }
- kunmap_atomic((void *)(unsigned long)(pginfo.metadata - b.pcmd_offset)); + memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd)); + + /* + * The area for the PCMD in the page was zeroed above. Check if the + * whole page is now empty meaning that all PCMD's have been zeroed: + */ + pcmd_page_empty = !memchr_inv(pcmd_page, 0, PAGE_SIZE); + + kunmap_atomic(pcmd_page); kunmap_atomic((void *)(unsigned long)pginfo.contents);
sgx_encl_put_backing(&b, false);
+ sgx_encl_truncate_backing_page(encl, page_index); + + if (pcmd_page_empty) + sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off)); + return ret; }
@@ -579,7 +620,7 @@ static struct page *sgx_encl_get_backing int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index, struct sgx_backing *backing) { - pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >> 5); + pgoff_t page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index); struct page *contents; struct page *pcmd;
@@ -587,7 +628,7 @@ int sgx_encl_get_backing(struct sgx_encl if (IS_ERR(contents)) return PTR_ERR(contents);
- pcmd = sgx_encl_get_backing_page(encl, pcmd_index); + pcmd = sgx_encl_get_backing_page(encl, PFN_DOWN(page_pcmd_off)); if (IS_ERR(pcmd)) { put_page(contents); return PTR_ERR(pcmd); @@ -596,9 +637,7 @@ int sgx_encl_get_backing(struct sgx_encl backing->page_index = page_index; backing->contents = contents; backing->pcmd = pcmd; - backing->pcmd_offset = - (page_index & (PAGE_SIZE / sizeof(struct sgx_pcmd) - 1)) * - sizeof(struct sgx_pcmd); + backing->pcmd_offset = page_pcmd_off & (PAGE_SIZE - 1);
return 0; }
From: Li Huafei lihuafei1@huawei.com
commit a365a65f9ca1ceb9cf1ac29db4a4f51df7c507ad upstream.
Since kprobe_int3_handler() is called in do_int3(), probing do_int3() can cause a breakpoint recursion and crash the kernel. Therefore, do_int3() should be marked as NOKPROBE_SYMBOL.
Fixes: 21e28290b317 ("x86/traps: Split int3 handler up") Signed-off-by: Li Huafei lihuafei1@huawei.com Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Masami Hiramatsu mhiramat@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220310120915.63349-1-lihuafei1@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/traps.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -659,6 +659,7 @@ static bool do_int3(struct pt_regs *regs
return res == NOTIFY_STOP; } +NOKPROBE_SYMBOL(do_int3);
static void do_int3_user(struct pt_regs *regs) {
From: Thomas Zimmermann tzimmermann@suse.de
commit 3755d35ee1d2454b20b8a1e20d790e56201678a4 upstream.
As reported in [1], DRM_PANEL_EDP depends on DRM_DP_HELPER. Select the option to fix the build failure. The error message is shown below.
arm-linux-gnueabihf-ld: drivers/gpu/drm/panel/panel-edp.o: in function `panel_edp_probe': panel-edp.c:(.text+0xb74): undefined reference to `drm_panel_dp_aux_backlight' make[1]: *** [/builds/linux/Makefile:1222: vmlinux] Error 1
The issue has been reported before, when DisplayPort helpers were hidden behind the option CONFIG_DRM_KMS_HELPER. [2]
v2: * fix and expand commit description (Arnd)
Signed-off-by: Thomas Zimmermann tzimmermann@suse.de Fixes: 9d6366e743f3 ("drm: fb_helper: improve CONFIG_FB dependency") Reported-by: Naresh Kamboju naresh.kamboju@linaro.org Reported-by: Linux Kernel Functional Testing lkft@linaro.org Reviewed-by: Lyude Paul lyude@redhat.com Acked-by: Sam Ravnborg sam@ravnborg.org Link: https://lore.kernel.org/dri-devel/CA+G9fYvN0NyaVkRQmA1O6rX7H8PPaZrUAD7=RDy33... # [1] Link: https://lore.kernel.org/all/20211117062704.14671-1-rdunlap@infradead.org/ # [2] Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Lyude Paul lyude@redhat.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: dri-devel@lists.freedesktop.org Link: https://patchwork.freedesktop.org/patch/msgid/20220203093922.20754-1-tzimmer... Signed-off-by: Dave Airlie airlied@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/panel/Kconfig | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/panel/Kconfig +++ b/drivers/gpu/drm/panel/Kconfig @@ -95,6 +95,7 @@ config DRM_PANEL_EDP depends on PM select VIDEOMODE_HELPERS select DRM_DP_AUX_BUS + select DRM_DP_HELPER help DRM panel driver for dumb eDP panels that need at most a regulator and a GPIO to be powered up. Optionally a backlight can be attached so
From: Zhengjun Xing zhengjun.xing@linux.intel.com
commit 91c9923a473a694eb1c5c01ab778a77114969707 upstream.
This bug happened on hybrid systems when both cpu_core and cpu_atom have the same event name such as "UOPS_RETIRED.MS" while their event terms are different, then during perf stat, the event for cpu_atom will parse fail and then no output for cpu_atom.
UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/ UOPS_RETIRED.MS -> cpu_atom/period=0x1e8483,umask=0x1,event=0xc2/
It is because event terms in the "head" of parse_events_multi_pmu_add will be changed to event terms for cpu_core after parsing UOPS_RETIRED.MS for cpu_core, then when parsing the same event for cpu_atom, it still uses the event terms for cpu_core, but event terms for cpu_atom are different with cpu_core, the event parses for cpu_atom will fail. This patch fixes it, the event terms should be parsed from the original event.
This patch can work for the hybrid systems that have the same event in more than 2 PMUs. It also can work in non-hybrid systems.
Before:
# perf stat -v -e UOPS_RETIRED.MS -a sleep 1
Using CPUID GenuineIntel-6-97-1 UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/ Control descriptor is not initialized UOPS_RETIRED.MS: 2737845 16068518485 16068518485
Performance counter stats for 'system wide':
2,737,845 cpu_core/UOPS_RETIRED.MS/
1.002553850 seconds time elapsed
After:
# perf stat -v -e UOPS_RETIRED.MS -a sleep 1
Using CPUID GenuineIntel-6-97-1 UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/ UOPS_RETIRED.MS -> cpu_atom/period=0x1e8483,umask=0x1,event=0xc2/ Control descriptor is not initialized UOPS_RETIRED.MS: 1977555 16076950711 16076950711 UOPS_RETIRED.MS: 568684 8038694234 8038694234
Performance counter stats for 'system wide':
1,977,555 cpu_core/UOPS_RETIRED.MS/ 568,684 cpu_atom/UOPS_RETIRED.MS/
1.004758259 seconds time elapsed
Fixes: fb0811535e92c6c1 ("perf parse-events: Allow config on kernel PMU events") Reviewed-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Zhengjun Xing zhengjun.xing@linux.intel.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Alexander Shishkin alexander.shishkin@intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Ian Rogers irogers@google.com Cc: Ingo Molnar mingo@redhat.com Cc: Jiri Olsa jolsa@kernel.org Cc: Peter Zijlstra peterz@infradead.org Link: https://lore.kernel.org/r/20220307151627.30049-1-zhengjun.xing@linux.intel.c... Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/perf/util/parse-events.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -1648,6 +1648,7 @@ int parse_events_multi_pmu_add(struct pa { struct parse_events_term *term; struct list_head *list = NULL; + struct list_head *orig_head = NULL; struct perf_pmu *pmu = NULL; int ok = 0; char *config; @@ -1674,7 +1675,6 @@ int parse_events_multi_pmu_add(struct pa } list_add_tail(&term->list, head);
- /* Add it for all PMUs that support the alias */ list = malloc(sizeof(struct list_head)); if (!list) @@ -1687,13 +1687,15 @@ int parse_events_multi_pmu_add(struct pa
list_for_each_entry(alias, &pmu->aliases, list) { if (!strcasecmp(alias->name, str)) { + parse_events_copy_term_list(head, &orig_head); if (!parse_events_add_pmu(parse_state, list, - pmu->name, head, + pmu->name, orig_head, true, true)) { pr_debug("%s -> %s/%s/\n", str, pmu->name, alias->str); ok++; } + parse_events_terms__delete(orig_head); } } }
From: Filipe Manana fdmanana@suse.com
commit d96b34248c2f4ea8cd09286090f2f6f77102eaab upstream.
We don't allow send and balance/relocation to run in parallel in order to prevent send failing or silently producing some bad stream. This is because while send is using an extent (specially metadata) or about to read a metadata extent and expecting it belongs to a specific parent node, relocation can run, the transaction used for the relocation is committed and the extent gets reallocated while send is still using the extent, so it ends up with a different content than expected. This can result in just failing to read a metadata extent due to failure of the validation checks (parent transid, level, etc), failure to find a backreference for a data extent, and other unexpected failures. Besides reallocation, there's also a similar problem of an extent getting discarded when it's unpinned after the transaction used for block group relocation is committed.
The restriction between balance and send was added in commit 9e967495e0e0 ("Btrfs: prevent send failures and crashes due to concurrent relocation"), kernel 5.3, while the more general restriction between send and relocation was added in commit 1cea5cf0e664 ("btrfs: ensure relocation never runs while we have send operations running"), kernel 5.14.
Both send and relocation can be very long running operations. Relocation because it has to do a lot of IO and expensive backreference lookups in case there are many snapshots, and send due to read IO when operating on very large trees. This makes it inconvenient for users and tools to deal with scheduling both operations.
For zoned filesystem we also have automatic block group relocation, so send can fail with -EAGAIN when users least expect it or send can end up delaying the block group relocation for too long. In the future we might also get the automatic block group relocation for non zoned filesystems.
This change makes it possible for send and relocation to run in parallel. This is achieved the following way:
1) For all tree searches, send acquires a read lock on the commit root semaphore;
2) After each tree search, and before releasing the commit root semaphore, the leaf is cloned and placed in the search path (struct btrfs_path);
3) After releasing the commit root semaphore, the changed_cb() callback is invoked, which operates on the leaf and writes commands to the pipe (or file in case send/receive is not used with a pipe). It's important here to not hold a lock on the commit root semaphore, because if we did we could deadlock when sending and receiving to the same filesystem using a pipe - the send task blocks on the pipe because it's full, the receive task, which is the only consumer of the pipe, triggers a transaction commit when attempting to create a subvolume or reserve space for a write operation for example, but the transaction commit blocks trying to write lock the commit root semaphore, resulting in a deadlock;
4) Before moving to the next key, or advancing to the next change in case of an incremental send, check if a transaction used for relocation was committed (or is about to finish its commit). If so, release the search path(s) and restart the search, to where we were before, so that we don't operate on stale extent buffers. The search restarts are always possible because both the send and parent roots are RO, and no one can add, remove of update keys (change their offset) in RO trees - the only exception is deduplication, but that is still not allowed to run in parallel with send;
5) Periodically check if there is contention on the commit root semaphore, which means there is a transaction commit trying to write lock it, and release the semaphore and reschedule if there is contention, so as to avoid causing any significant delays to transaction commits.
This leaves some room for optimizations for send to have less path releases and re searching the trees when there's relocation running, but for now it's kept simple as it performs quite well (on very large trees with resulting send streams in the order of a few hundred gigabytes).
Test case btrfs/187, from fstests, stresses relocation, send and deduplication attempting to run in parallel, but without verifying if send succeeds and if it produces correct streams. A new test case will be added that exercises relocation happening in parallel with send and then checks that send succeeds and the resulting streams are correct.
A final note is that for now this still leaves the mutual exclusion between send operations and deduplication on files belonging to a root used by send operations. A solution for that will be slightly more complex but it will eventually be built on top of this change.
Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Anand Jain anand.jain@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/block-group.c | 9 - fs/btrfs/ctree.c | 98 ++++++++++--- fs/btrfs/ctree.h | 14 - fs/btrfs/disk-io.c | 4 fs/btrfs/relocation.c | 13 - fs/btrfs/send.c | 357 ++++++++++++++++++++++++++++++++++++++++++------- fs/btrfs/transaction.c | 4 7 files changed, 395 insertions(+), 104 deletions(-)
--- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1508,7 +1508,6 @@ void btrfs_reclaim_bgs_work(struct work_ container_of(work, struct btrfs_fs_info, reclaim_bgs_work); struct btrfs_block_group *bg; struct btrfs_space_info *space_info; - LIST_HEAD(again_list);
if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags)) return; @@ -1585,18 +1584,14 @@ void btrfs_reclaim_bgs_work(struct work_ div64_u64(zone_unusable * 100, bg->length)); trace_btrfs_reclaim_block_group(bg); ret = btrfs_relocate_chunk(fs_info, bg->start); - if (ret && ret != -EAGAIN) + if (ret) btrfs_err(fs_info, "error relocating chunk %llu", bg->start);
next: + btrfs_put_block_group(bg); spin_lock(&fs_info->unused_bgs_lock); - if (ret == -EAGAIN && list_empty(&bg->bg_list)) - list_add_tail(&bg->bg_list, &again_list); - else - btrfs_put_block_group(bg); } - list_splice_tail(&again_list, &fs_info->reclaim_bgs); spin_unlock(&fs_info->unused_bgs_lock); mutex_unlock(&fs_info->reclaim_bgs_lock); btrfs_exclop_finish(fs_info); --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1568,32 +1568,13 @@ static struct extent_buffer *btrfs_searc struct btrfs_path *p, int write_lock_level) { - struct btrfs_fs_info *fs_info = root->fs_info; struct extent_buffer *b; int root_lock = 0; int level = 0;
if (p->search_commit_root) { - /* - * The commit roots are read only so we always do read locks, - * and we always must hold the commit_root_sem when doing - * searches on them, the only exception is send where we don't - * want to block transaction commits for a long time, so - * we need to clone the commit root in order to avoid races - * with transaction commits that create a snapshot of one of - * the roots used by a send operation. - */ - if (p->need_commit_sem) { - down_read(&fs_info->commit_root_sem); - b = btrfs_clone_extent_buffer(root->commit_root); - up_read(&fs_info->commit_root_sem); - if (!b) - return ERR_PTR(-ENOMEM); - - } else { - b = root->commit_root; - atomic_inc(&b->refs); - } + b = root->commit_root; + atomic_inc(&b->refs); level = btrfs_header_level(b); /* * Ensure that all callers have set skip_locking when @@ -1659,6 +1640,42 @@ out: return b; }
+/* + * Replace the extent buffer at the lowest level of the path with a cloned + * version. The purpose is to be able to use it safely, after releasing the + * commit root semaphore, even if relocation is happening in parallel, the + * transaction used for relocation is committed and the extent buffer is + * reallocated in the next transaction. + * + * This is used in a context where the caller does not prevent transaction + * commits from happening, either by holding a transaction handle or holding + * some lock, while it's doing searches through a commit root. + * At the moment it's only used for send operations. + */ +static int finish_need_commit_sem_search(struct btrfs_path *path) +{ + const int i = path->lowest_level; + const int slot = path->slots[i]; + struct extent_buffer *lowest = path->nodes[i]; + struct extent_buffer *clone; + + ASSERT(path->need_commit_sem); + + if (!lowest) + return 0; + + lockdep_assert_held_read(&lowest->fs_info->commit_root_sem); + + clone = btrfs_clone_extent_buffer(lowest); + if (!clone) + return -ENOMEM; + + btrfs_release_path(path); + path->nodes[i] = clone; + path->slots[i] = slot; + + return 0; +}
/* * btrfs_search_slot - look for a key in a tree and perform necessary @@ -1695,6 +1712,7 @@ int btrfs_search_slot(struct btrfs_trans const struct btrfs_key *key, struct btrfs_path *p, int ins_len, int cow) { + struct btrfs_fs_info *fs_info = root->fs_info; struct extent_buffer *b; int slot; int ret; @@ -1736,6 +1754,11 @@ int btrfs_search_slot(struct btrfs_trans
min_write_lock_level = write_lock_level;
+ if (p->need_commit_sem) { + ASSERT(p->search_commit_root); + down_read(&fs_info->commit_root_sem); + } + again: prev_cmp = -1; b = btrfs_search_slot_get_root(root, p, write_lock_level); @@ -1930,6 +1953,16 @@ cow_done: done: if (ret < 0 && !p->skip_release_on_error) btrfs_release_path(p); + + if (p->need_commit_sem) { + int ret2; + + ret2 = finish_need_commit_sem_search(p); + up_read(&fs_info->commit_root_sem); + if (ret2) + ret = ret2; + } + return ret; } ALLOW_ERROR_INJECTION(btrfs_search_slot, ERRNO); @@ -4414,7 +4447,9 @@ int btrfs_next_old_leaf(struct btrfs_roo int level; struct extent_buffer *c; struct extent_buffer *next; + struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_key key; + bool need_commit_sem = false; u32 nritems; int ret; int i; @@ -4431,14 +4466,20 @@ again:
path->keep_locks = 1;
- if (time_seq) + if (time_seq) { ret = btrfs_search_old_slot(root, &key, path, time_seq); - else + } else { + if (path->need_commit_sem) { + path->need_commit_sem = 0; + need_commit_sem = true; + down_read(&fs_info->commit_root_sem); + } ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); + } path->keep_locks = 0;
if (ret < 0) - return ret; + goto done;
nritems = btrfs_header_nritems(path->nodes[0]); /* @@ -4561,6 +4602,15 @@ again: ret = 0; done: unlock_up(path, 0, 1, 0, NULL); + if (need_commit_sem) { + int ret2; + + path->need_commit_sem = 1; + ret2 = finish_need_commit_sem_search(path); + up_read(&fs_info->commit_root_sem); + if (ret2) + ret = ret2; + }
return ret; } --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -576,7 +576,6 @@ enum { /* * Indicate that relocation of a chunk has started, it's set per chunk * and is toggled between chunks. - * Set, tested and cleared while holding fs_info::send_reloc_lock. */ BTRFS_FS_RELOC_RUNNING,
@@ -676,6 +675,12 @@ struct btrfs_fs_info {
u64 generation; u64 last_trans_committed; + /* + * Generation of the last transaction used for block group relocation + * since the filesystem was last mounted (or 0 if none happened yet). + * Must be written and read while holding btrfs_fs_info::commit_root_sem. + */ + u64 last_reloc_trans; u64 avg_delayed_ref_runtime;
/* @@ -1006,13 +1011,6 @@ struct btrfs_fs_info {
struct crypto_shash *csum_shash;
- spinlock_t send_reloc_lock; - /* - * Number of send operations in progress. - * Updated while holding fs_info::send_reloc_lock. - */ - int send_in_progress; - /* Type of exclusive operation running, protected by super_lock */ enum btrfs_exclusive_operation exclusive_operation;
--- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2858,6 +2858,7 @@ static int __cold init_tree_roots(struct /* All successful */ fs_info->generation = generation; fs_info->last_trans_committed = generation; + fs_info->last_reloc_trans = 0;
/* Always begin writing backup roots after the one being used */ if (backup_index < 0) { @@ -2993,9 +2994,6 @@ void btrfs_init_fs_info(struct btrfs_fs_ spin_lock_init(&fs_info->swapfile_pins_lock); fs_info->swapfile_pins = RB_ROOT;
- spin_lock_init(&fs_info->send_reloc_lock); - fs_info->send_in_progress = 0; - fs_info->bg_reclaim_threshold = BTRFS_DEFAULT_RECLAIM_THRESH; INIT_WORK(&fs_info->reclaim_bgs_work, btrfs_reclaim_bgs_work); } --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -3858,25 +3858,14 @@ out: * 0 success * -EINPROGRESS operation is already in progress, that's probably a bug * -ECANCELED cancellation request was set before the operation started - * -EAGAIN can not start because there are ongoing send operations */ static int reloc_chunk_start(struct btrfs_fs_info *fs_info) { - spin_lock(&fs_info->send_reloc_lock); - if (fs_info->send_in_progress) { - btrfs_warn_rl(fs_info, -"cannot run relocation while send operations are in progress (%d in progress)", - fs_info->send_in_progress); - spin_unlock(&fs_info->send_reloc_lock); - return -EAGAIN; - } if (test_and_set_bit(BTRFS_FS_RELOC_RUNNING, &fs_info->flags)) { /* This should not happen */ - spin_unlock(&fs_info->send_reloc_lock); btrfs_err(fs_info, "reloc already running, cannot start"); return -EINPROGRESS; } - spin_unlock(&fs_info->send_reloc_lock);
if (atomic_read(&fs_info->reloc_cancel_req) > 0) { btrfs_info(fs_info, "chunk relocation canceled on start"); @@ -3898,9 +3887,7 @@ static void reloc_chunk_end(struct btrfs /* Requested after start, clear bit first so any waiters can continue */ if (atomic_read(&fs_info->reloc_cancel_req) > 0) btrfs_info(fs_info, "chunk relocation canceled during operation"); - spin_lock(&fs_info->send_reloc_lock); clear_and_wake_up_bit(BTRFS_FS_RELOC_RUNNING, &fs_info->flags); - spin_unlock(&fs_info->send_reloc_lock); atomic_set(&fs_info->reloc_cancel_req, 0); }
--- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -24,6 +24,7 @@ #include "transaction.h" #include "compression.h" #include "xattr.h" +#include "print-tree.h"
/* * Maximum number of references an extent can have in order for us to attempt to @@ -98,6 +99,15 @@ struct send_ctx { struct btrfs_key *cmp_key;
/* + * Keep track of the generation of the last transaction that was used + * for relocating a block group. This is periodically checked in order + * to detect if a relocation happened since the last check, so that we + * don't operate on stale extent buffers for nodes (level >= 1) or on + * stale disk_bytenr values of file extent items. + */ + u64 last_reloc_trans; + + /* * infos of the currently processed inode. In case of deleted inodes, * these are the values from the deleted inode. */ @@ -1427,6 +1437,26 @@ static int find_extent_clone(struct send if (ret < 0) goto out;
+ down_read(&fs_info->commit_root_sem); + if (fs_info->last_reloc_trans > sctx->last_reloc_trans) { + /* + * A transaction commit for a transaction in which block group + * relocation was done just happened. + * The disk_bytenr of the file extent item we processed is + * possibly stale, referring to the extent's location before + * relocation. So act as if we haven't found any clone sources + * and fallback to write commands, which will read the correct + * data from the new extent location. Otherwise we will fail + * below because we haven't found our own back reference or we + * could be getting incorrect sources in case the old extent + * was already reallocated after the relocation. + */ + up_read(&fs_info->commit_root_sem); + ret = -ENOENT; + goto out; + } + up_read(&fs_info->commit_root_sem); + if (!backref_ctx.found_itself) { /* found a bug in backref code? */ ret = -EIO; @@ -6601,6 +6631,50 @@ static int changed_cb(struct btrfs_path { int ret = 0;
+ /* + * We can not hold the commit root semaphore here. This is because in + * the case of sending and receiving to the same filesystem, using a + * pipe, could result in a deadlock: + * + * 1) The task running send blocks on the pipe because it's full; + * + * 2) The task running receive, which is the only consumer of the pipe, + * is waiting for a transaction commit (for example due to a space + * reservation when doing a write or triggering a transaction commit + * when creating a subvolume); + * + * 3) The transaction is waiting to write lock the commit root semaphore, + * but can not acquire it since it's being held at 1). + * + * Down this call chain we write to the pipe through kernel_write(). + * The same type of problem can also happen when sending to a file that + * is stored in the same filesystem - when reserving space for a write + * into the file, we can trigger a transaction commit. + * + * Our caller has supplied us with clones of leaves from the send and + * parent roots, so we're safe here from a concurrent relocation and + * further reallocation of metadata extents while we are here. Below we + * also assert that the leaves are clones. + */ + lockdep_assert_not_held(&sctx->send_root->fs_info->commit_root_sem); + + /* + * We always have a send root, so left_path is never NULL. We will not + * have a leaf when we have reached the end of the send root but have + * not yet reached the end of the parent root. + */ + if (left_path->nodes[0]) + ASSERT(test_bit(EXTENT_BUFFER_UNMAPPED, + &left_path->nodes[0]->bflags)); + /* + * When doing a full send we don't have a parent root, so right_path is + * NULL. When doing an incremental send, we may have reached the end of + * the parent root already, so we don't have a leaf at right_path. + */ + if (right_path && right_path->nodes[0]) + ASSERT(test_bit(EXTENT_BUFFER_UNMAPPED, + &right_path->nodes[0]->bflags)); + if (result == BTRFS_COMPARE_TREE_SAME) { if (key->type == BTRFS_INODE_REF_KEY || key->type == BTRFS_INODE_EXTREF_KEY) { @@ -6647,14 +6721,46 @@ out: return ret; }
+static int search_key_again(const struct send_ctx *sctx, + struct btrfs_root *root, + struct btrfs_path *path, + const struct btrfs_key *key) +{ + int ret; + + if (!path->need_commit_sem) + lockdep_assert_held_read(&root->fs_info->commit_root_sem); + + /* + * Roots used for send operations are readonly and no one can add, + * update or remove keys from them, so we should be able to find our + * key again. The only exception is deduplication, which can operate on + * readonly roots and add, update or remove keys to/from them - but at + * the moment we don't allow it to run in parallel with send. + */ + ret = btrfs_search_slot(NULL, root, key, path, 0, 0); + ASSERT(ret <= 0); + if (ret > 0) { + btrfs_print_tree(path->nodes[path->lowest_level], false); + btrfs_err(root->fs_info, +"send: key (%llu %u %llu) not found in %s root %llu, lowest_level %d, slot %d", + key->objectid, key->type, key->offset, + (root == sctx->parent_root ? "parent" : "send"), + root->root_key.objectid, path->lowest_level, + path->slots[path->lowest_level]); + return -EUCLEAN; + } + + return ret; +} + static int full_send_tree(struct send_ctx *sctx) { int ret; struct btrfs_root *send_root = sctx->send_root; struct btrfs_key key; + struct btrfs_fs_info *fs_info = send_root->fs_info; struct btrfs_path *path; - struct extent_buffer *eb; - int slot;
path = alloc_path_for_send(); if (!path) @@ -6665,6 +6771,10 @@ static int full_send_tree(struct send_ct key.type = BTRFS_INODE_ITEM_KEY; key.offset = 0;
+ down_read(&fs_info->commit_root_sem); + sctx->last_reloc_trans = fs_info->last_reloc_trans; + up_read(&fs_info->commit_root_sem); + ret = btrfs_search_slot_for_read(send_root, &key, path, 1, 0); if (ret < 0) goto out; @@ -6672,15 +6782,35 @@ static int full_send_tree(struct send_ct goto out_finish;
while (1) { - eb = path->nodes[0]; - slot = path->slots[0]; - btrfs_item_key_to_cpu(eb, &key, slot); + btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
ret = changed_cb(path, NULL, &key, BTRFS_COMPARE_TREE_NEW, sctx); if (ret < 0) goto out;
+ down_read(&fs_info->commit_root_sem); + if (fs_info->last_reloc_trans > sctx->last_reloc_trans) { + sctx->last_reloc_trans = fs_info->last_reloc_trans; + up_read(&fs_info->commit_root_sem); + /* + * A transaction used for relocating a block group was + * committed or is about to finish its commit. Release + * our path (leaf) and restart the search, so that we + * avoid operating on any file extent items that are + * stale, with a disk_bytenr that reflects a pre + * relocation value. This way we avoid as much as + * possible to fallback to regular writes when checking + * if we can clone file ranges. + */ + btrfs_release_path(path); + ret = search_key_again(sctx, send_root, path, &key); + if (ret < 0) + goto out; + } else { + up_read(&fs_info->commit_root_sem); + } + ret = btrfs_next_item(send_root, path); if (ret < 0) goto out; @@ -6698,6 +6828,20 @@ out: return ret; }
+static int replace_node_with_clone(struct btrfs_path *path, int level) +{ + struct extent_buffer *clone; + + clone = btrfs_clone_extent_buffer(path->nodes[level]); + if (!clone) + return -ENOMEM; + + free_extent_buffer(path->nodes[level]); + path->nodes[level] = clone; + + return 0; +} + static int tree_move_down(struct btrfs_path *path, int *level, u64 reada_min_gen) { struct extent_buffer *eb; @@ -6707,6 +6851,8 @@ static int tree_move_down(struct btrfs_p u64 reada_max; u64 reada_done = 0;
+ lockdep_assert_held_read(&parent->fs_info->commit_root_sem); + BUG_ON(*level == 0); eb = btrfs_read_node_slot(parent, slot); if (IS_ERR(eb)) @@ -6730,6 +6876,10 @@ static int tree_move_down(struct btrfs_p path->nodes[*level - 1] = eb; path->slots[*level - 1] = 0; (*level)--; + + if (*level == 0) + return replace_node_with_clone(path, 0); + return 0; }
@@ -6743,8 +6893,10 @@ static int tree_move_next_or_upnext(stru path->slots[*level]++;
while (path->slots[*level] >= nritems) { - if (*level == root_level) + if (*level == root_level) { + path->slots[*level] = nritems - 1; return -1; + }
/* move upnext */ path->slots[*level] = 0; @@ -6776,14 +6928,20 @@ static int tree_advance(struct btrfs_pat } else { ret = tree_move_down(path, level, reada_min_gen); } - if (ret >= 0) { - if (*level == 0) - btrfs_item_key_to_cpu(path->nodes[*level], key, - path->slots[*level]); - else - btrfs_node_key_to_cpu(path->nodes[*level], key, - path->slots[*level]); - } + + /* + * Even if we have reached the end of a tree, ret is -1, update the key + * anyway, so that in case we need to restart due to a block group + * relocation, we can assert that the last key of the root node still + * exists in the tree. + */ + if (*level == 0) + btrfs_item_key_to_cpu(path->nodes[*level], key, + path->slots[*level]); + else + btrfs_node_key_to_cpu(path->nodes[*level], key, + path->slots[*level]); + return ret; }
@@ -6813,6 +6971,97 @@ static int tree_compare_item(struct btrf }
/* + * A transaction used for relocating a block group was committed or is about to + * finish its commit. Release our paths and restart the search, so that we are + * not using stale extent buffers: + * + * 1) For levels > 0, we are only holding references of extent buffers, without + * any locks on them, which does not prevent them from having been relocated + * and reallocated after the last time we released the commit root semaphore. + * The exception are the root nodes, for which we always have a clone, see + * the comment at btrfs_compare_trees(); + * + * 2) For leaves, level 0, we are holding copies (clones) of extent buffers, so + * we are safe from the concurrent relocation and reallocation. However they + * can have file extent items with a pre relocation disk_bytenr value, so we + * restart the start from the current commit roots and clone the new leaves so + * that we get the post relocation disk_bytenr values. Not doing so, could + * make us clone the wrong data in case there are new extents using the old + * disk_bytenr that happen to be shared. + */ +static int restart_after_relocation(struct btrfs_path *left_path, + struct btrfs_path *right_path, + const struct btrfs_key *left_key, + const struct btrfs_key *right_key, + int left_level, + int right_level, + const struct send_ctx *sctx) +{ + int root_level; + int ret; + + lockdep_assert_held_read(&sctx->send_root->fs_info->commit_root_sem); + + btrfs_release_path(left_path); + btrfs_release_path(right_path); + + /* + * Since keys can not be added or removed to/from our roots because they + * are readonly and we do not allow deduplication to run in parallel + * (which can add, remove or change keys), the layout of the trees should + * not change. + */ + left_path->lowest_level = left_level; + ret = search_key_again(sctx, sctx->send_root, left_path, left_key); + if (ret < 0) + return ret; + + right_path->lowest_level = right_level; + ret = search_key_again(sctx, sctx->parent_root, right_path, right_key); + if (ret < 0) + return ret; + + /* + * If the lowest level nodes are leaves, clone them so that they can be + * safely used by changed_cb() while not under the protection of the + * commit root semaphore, even if relocation and reallocation happens in + * parallel. + */ + if (left_level == 0) { + ret = replace_node_with_clone(left_path, 0); + if (ret < 0) + return ret; + } + + if (right_level == 0) { + ret = replace_node_with_clone(right_path, 0); + if (ret < 0) + return ret; + } + + /* + * Now clone the root nodes (unless they happen to be the leaves we have + * already cloned). This is to protect against concurrent snapshotting of + * the send and parent roots (see the comment at btrfs_compare_trees()). + */ + root_level = btrfs_header_level(sctx->send_root->commit_root); + if (root_level > 0) { + ret = replace_node_with_clone(left_path, root_level); + if (ret < 0) + return ret; + } + + root_level = btrfs_header_level(sctx->parent_root->commit_root); + if (root_level > 0) { + ret = replace_node_with_clone(right_path, root_level); + if (ret < 0) + return ret; + } + + return 0; +} + +/* * This function compares two trees and calls the provided callback for * every changed/new/deleted item it finds. * If shared tree blocks are encountered, whole subtrees are skipped, making @@ -6840,10 +7089,10 @@ static int btrfs_compare_trees(struct bt int right_root_level; int left_level; int right_level; - int left_end_reached; - int right_end_reached; - int advance_left; - int advance_right; + int left_end_reached = 0; + int right_end_reached = 0; + int advance_left = 0; + int advance_right = 0; u64 left_blockptr; u64 right_blockptr; u64 left_gen; @@ -6911,12 +7160,18 @@ static int btrfs_compare_trees(struct bt down_read(&fs_info->commit_root_sem); left_level = btrfs_header_level(left_root->commit_root); left_root_level = left_level; + /* + * We clone the root node of the send and parent roots to prevent races + * with snapshot creation of these roots. Snapshot creation COWs the + * root node of a tree, so after the transaction is committed the old + * extent can be reallocated while this send operation is still ongoing. + * So we clone them, under the commit root semaphore, to be race free. + */ left_path->nodes[left_level] = btrfs_clone_extent_buffer(left_root->commit_root); if (!left_path->nodes[left_level]) { - up_read(&fs_info->commit_root_sem); ret = -ENOMEM; - goto out; + goto out_unlock; }
right_level = btrfs_header_level(right_root->commit_root); @@ -6924,9 +7179,8 @@ static int btrfs_compare_trees(struct bt right_path->nodes[right_level] = btrfs_clone_extent_buffer(right_root->commit_root); if (!right_path->nodes[right_level]) { - up_read(&fs_info->commit_root_sem); ret = -ENOMEM; - goto out; + goto out_unlock; } /* * Our right root is the parent root, while the left root is the "send" @@ -6936,7 +7190,6 @@ static int btrfs_compare_trees(struct bt * will need to read them at some point. */ reada_min_gen = btrfs_header_generation(right_root->commit_root); - up_read(&fs_info->commit_root_sem);
if (left_level == 0) btrfs_item_key_to_cpu(left_path->nodes[left_level], @@ -6951,11 +7204,26 @@ static int btrfs_compare_trees(struct bt btrfs_node_key_to_cpu(right_path->nodes[right_level], &right_key, right_path->slots[right_level]);
- left_end_reached = right_end_reached = 0; - advance_left = advance_right = 0; + sctx->last_reloc_trans = fs_info->last_reloc_trans;
while (1) { - cond_resched(); + if (need_resched() || + rwsem_is_contended(&fs_info->commit_root_sem)) { + up_read(&fs_info->commit_root_sem); + cond_resched(); + down_read(&fs_info->commit_root_sem); + } + + if (fs_info->last_reloc_trans > sctx->last_reloc_trans) { + ret = restart_after_relocation(left_path, right_path, + &left_key, &right_key, + left_level, right_level, + sctx); + if (ret < 0) + goto out_unlock; + sctx->last_reloc_trans = fs_info->last_reloc_trans; + } + if (advance_left && !left_end_reached) { ret = tree_advance(left_path, &left_level, left_root_level, @@ -6964,7 +7232,7 @@ static int btrfs_compare_trees(struct bt if (ret == -1) left_end_reached = ADVANCE; else if (ret < 0) - goto out; + goto out_unlock; advance_left = 0; } if (advance_right && !right_end_reached) { @@ -6975,54 +7243,55 @@ static int btrfs_compare_trees(struct bt if (ret == -1) right_end_reached = ADVANCE; else if (ret < 0) - goto out; + goto out_unlock; advance_right = 0; }
if (left_end_reached && right_end_reached) { ret = 0; - goto out; + goto out_unlock; } else if (left_end_reached) { if (right_level == 0) { + up_read(&fs_info->commit_root_sem); ret = changed_cb(left_path, right_path, &right_key, BTRFS_COMPARE_TREE_DELETED, sctx); if (ret < 0) goto out; + down_read(&fs_info->commit_root_sem); } advance_right = ADVANCE; continue; } else if (right_end_reached) { if (left_level == 0) { + up_read(&fs_info->commit_root_sem); ret = changed_cb(left_path, right_path, &left_key, BTRFS_COMPARE_TREE_NEW, sctx); if (ret < 0) goto out; + down_read(&fs_info->commit_root_sem); } advance_left = ADVANCE; continue; }
if (left_level == 0 && right_level == 0) { + up_read(&fs_info->commit_root_sem); cmp = btrfs_comp_cpu_keys(&left_key, &right_key); if (cmp < 0) { ret = changed_cb(left_path, right_path, &left_key, BTRFS_COMPARE_TREE_NEW, sctx); - if (ret < 0) - goto out; advance_left = ADVANCE; } else if (cmp > 0) { ret = changed_cb(left_path, right_path, &right_key, BTRFS_COMPARE_TREE_DELETED, sctx); - if (ret < 0) - goto out; advance_right = ADVANCE; } else { enum btrfs_compare_tree_result result; @@ -7036,11 +7305,13 @@ static int btrfs_compare_trees(struct bt result = BTRFS_COMPARE_TREE_SAME; ret = changed_cb(left_path, right_path, &left_key, result, sctx); - if (ret < 0) - goto out; advance_left = ADVANCE; advance_right = ADVANCE; } + + if (ret < 0) + goto out; + down_read(&fs_info->commit_root_sem); } else if (left_level == right_level) { cmp = btrfs_comp_cpu_keys(&left_key, &right_key); if (cmp < 0) { @@ -7080,6 +7351,8 @@ static int btrfs_compare_trees(struct bt } }
+out_unlock: + up_read(&fs_info->commit_root_sem); out: btrfs_free_path(left_path); btrfs_free_path(right_path); @@ -7429,21 +7702,7 @@ long btrfs_ioctl_send(struct file *mnt_f if (ret) goto out;
- spin_lock(&fs_info->send_reloc_lock); - if (test_bit(BTRFS_FS_RELOC_RUNNING, &fs_info->flags)) { - spin_unlock(&fs_info->send_reloc_lock); - btrfs_warn_rl(fs_info, - "cannot run send because a relocation operation is in progress"); - ret = -EAGAIN; - goto out; - } - fs_info->send_in_progress++; - spin_unlock(&fs_info->send_reloc_lock); - ret = send_subvol(sctx); - spin_lock(&fs_info->send_reloc_lock); - fs_info->send_in_progress--; - spin_unlock(&fs_info->send_reloc_lock); if (ret < 0) goto out;
--- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -163,6 +163,10 @@ static noinline void switch_commit_roots struct btrfs_caching_control *caching_ctl, *next;
down_write(&fs_info->commit_root_sem); + + if (test_bit(BTRFS_FS_RELOC_RUNNING, &fs_info->flags)) + fs_info->last_reloc_trans = trans->transid; + list_for_each_entry_safe(root, tmp, &cur_trans->switch_commits, dirty_list) { list_del_init(&root->dirty_list);
From: Niklas Cassel niklas.cassel@wdc.com
commit 74583f1b92cb3bbba1a3741cea237545c56f506c upstream.
Commit 67d96729a9e7 ("riscv: Update Canaan Kendryte K210 device tree") incorrectly removed two entries from the PLIC interrupt-controller node's interrupts-extended property.
The PLIC driver cannot know the mapping between hart contexts and hart ids, so this information has to be provided by device tree, as specified by the PLIC device tree binding.
The PLIC driver uses the interrupts-extended property, and initializes the hart context registers in the exact same order as provided by the interrupts-extended property.
In other words, if we don't specify the S-mode interrupts, the PLIC driver will simply initialize the hart0 S-mode hart context with the hart1 M-mode configuration. It is therefore essential to specify the S-mode IRQs even though the system itself will only ever be running in M-mode.
Re-add the S-mode interrupts, so that we get working IRQs on hart1 again.
Cc: stable@vger.kernel.org Fixes: 67d96729a9e7 ("riscv: Update Canaan Kendryte K210 device tree") Signed-off-by: Niklas Cassel niklas.cassel@wdc.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/boot/dts/canaan/k210.dtsi | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/riscv/boot/dts/canaan/k210.dtsi +++ b/arch/riscv/boot/dts/canaan/k210.dtsi @@ -113,7 +113,8 @@ compatible = "canaan,k210-plic", "sifive,plic-1.0.0"; reg = <0xC000000 0x4000000>; interrupt-controller; - interrupts-extended = <&cpu0_intc 11 &cpu1_intc 11>; + interrupts-extended = <&cpu0_intc 11>, <&cpu0_intc 9>, + <&cpu1_intc 11>, <&cpu1_intc 9>; riscv,ndev = <65>; };
On Mon, Mar 14, 2022 at 12:53:03PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.16.15 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 16 Mar 2022 11:27:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.15-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y and the diffstat can be found below.
thanks,
greg k-h
Tested rc1 against the Fedora build system (aarch64, armv7, ppc64le, s390x, x86_64), and boot tested x86_64. No regressions noted.
Tested-by: Justin M. Forbes jforbes@fedoraproject.org
On 3/14/22 4:53 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.16.15 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 16 Mar 2022 11:27:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.15-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels:
Tested-by: Florian Fainelli f.fainelli@gmail.com
On Mon, Mar 14, 2022 at 12:53:03PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.16.15 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 16 Mar 2022 11:27:22 +0000. Anything received after that time might be too late.
Build results: total: 155 pass: 155 fail: 0 Qemu test results: total: 488 pass: 488 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Mon, 14 Mar 2022 12:53:03 +0100, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.16.15 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 16 Mar 2022 11:27:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.15-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y and the diffstat can be found below.
thanks,
greg k-h
5.16.15-rc1 Successfully Compiled and booted on my Raspberry PI 4b (8g) (bcm2711)
Tested-by: Fox Chen foxhlchen@gmail.com
On Mon, Mar 14, 2022 at 12:53:03PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.16.15 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Hi Greg,
5.16.15-rc1 tested.
Run tested on: - Allwinner H6 (Tanix TX6) - Intel Tiger Lake x86_64 (nuc11 i7-1165G7)
In addition - build tested on: - Allwinner A64 - Allwinner H3 - Allwinner H5 - NXP iMX6 - NXP iMX8 - Qualcomm Dragonboard - Rockchip RK3288 - Rockchip RK3328 - Rockchip RK3399pro - Samsung Exynos
Tested-by: Rudi Heitbaum rudi@heitbaum.com -- Rudi
On Mon, 14 Mar 2022 at 17:43, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.16.15 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 16 Mar 2022 11:27:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.15-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 5.16.15-rc1 * git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git * git branch: linux-5.16.y * git commit: b2a408c85a229c023493aced7ac25f71e2f61107 * git describe: v5.16.14-122-gb2a408c85a22 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.16.y/build/v5.16....
## Test Regressions (compared to v5.16.14) No test regressions found.
## Metric Regressions (compared to v5.16.14) No metric regressions found.
## Test Fixes (compared to v5.16.14) No test fixes found.
## Metric Fixes (compared to v5.16.14) No metric fixes found.
## Test result summary total: 115372, pass: 96926, fail: 1741, skip: 15433, xfail: 1272
## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 296 total, 293 passed, 3 failed * arm64: 47 total, 47 passed, 0 failed * dragonboard-410c: 2 total, 2 passed, 0 failed * hi6220-hikey: 2 total, 2 passed, 0 failed * i386: 46 total, 42 passed, 4 failed * juno-r2: 2 total, 2 passed, 0 failed * mips: 41 total, 38 passed, 3 failed * parisc: 14 total, 14 passed, 0 failed * powerpc: 65 total, 50 passed, 15 failed * riscv: 32 total, 27 passed, 5 failed * s390: 26 total, 23 passed, 3 failed * sh: 26 total, 24 passed, 2 failed * sparc: 14 total, 14 passed, 0 failed * x15: 2 total, 2 passed, 0 failed * x86: 2 total, 2 passed, 0 failed * x86_64: 47 total, 47 passed, 0 failed
## Test suites summary * fwts * igt-gpu-tools * kselftest- * kselftest-android * kselftest-arm64 * kselftest-bpf * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-lkdtm * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * packetdrill * perf * perf/Zstd-perf.data-compression * prep-inline * rcutorture * ssuite * v4l2-compliance * vdso
-- Linaro LKFT https://lkft.linaro.org
On 3/14/22 4:53 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.16.15 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 16 Mar 2022 11:27:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.15-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On Mon, 14 Mar 2022 12:53:03 +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.16.15 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 16 Mar 2022 11:27:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.15-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.16: 10 builds: 10 pass, 0 fail 28 boots: 28 pass, 0 fail 122 tests: 122 pass, 0 fail
Linux version: 5.16.15-rc1-gb2a408c85a22 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On 14/03/22 18.53, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.16.15 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully cross-compiled for arm64 (bcm2711_defconfig, gcc 10.2.0) and powerpc (ps3_defconfig, gcc 11.2.0).
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
linux-stable-mirror@lists.linaro.org