This is the start of the stable review cycle for the 6.4.11 release. There are 206 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 15 Aug 2023 21:16:53 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.11-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.4.11-rc1
Masahiro Yamada masahiroy@kernel.org alpha: remove __init annotation from exported page_is_ram()
Simon Trimmer simont@opensource.cirrus.com ACPI: scan: Create platform device for CS35L56
David Xu xuwd1@hotmail.com platform/x86: serial-multi-instantiate: Auto detect IRQ resource for CSC3551
Vadim Pasternak vadimp@nvidia.com platform: mellanox: Fix order in exit flow
Vadim Pasternak vadimp@nvidia.com platform: mellanox: mlx-platform: Modify graceful shutdown callback and power down mask
Vadim Pasternak vadimp@nvidia.com platform: mellanox: mlx-platform: Fix signals polarity and latch mask
Vadim Pasternak vadimp@nvidia.com platform: mellanox: Change register offset addresses
Hans de Goede hdegoede@redhat.com platform/x86: lenovo-ymc: Only bind on machines with a convertible DMI chassis-type
Jean Delvare jdelvare@suse.de platform/x86: msi-ec: Fix the build
Nilesh Javali njavali@marvell.com scsi: qedf: Fix firmware halt over suspend and resume
Nilesh Javali njavali@marvell.com scsi: qedi: Fix firmware halt over suspend and resume
Karan Tilak Kumar kartilak@cisco.com scsi: fnic: Replace return codes in fnic_clean_pending_aborts()
Zhu Wang wangzhu9@huawei.com scsi: core: Fix possible memory leak if device_add() fails
Zhu Wang wangzhu9@huawei.com scsi: snic: Fix possible memory leak if device_add() fails
Alexandra Diupina adiupina@astralinux.ru scsi: 53c700: Check that command slot is not NULL
Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com scsi: ufs: renesas: Fix private allocation
Michael Kelley mikelley@microsoft.com scsi: storvsc: Fix handling of virtual Fibre Channel timeouts
Tony Battersby tonyb@cybernetics.com scsi: core: Fix legacy /proc parsing buffer overflow
Josef Bacik josef@toxicpanda.com btrfs: set cache_block_group_error if we find an error
Qu Wenruo wqu@suse.com btrfs: reject invalid reloc tree root keys with stack dump
Qu Wenruo wqu@suse.com btrfs: exit gracefully if reloc roots don't match
Christoph Hellwig hch@lst.de btrfs: properly clear end of the unreserved range in cow_file_range
Christoph Hellwig hch@lst.de btrfs: don't wait for writeback on clean pages in extent_write_cache_pages
Christoph Hellwig hch@lst.de btrfs: don't stop integrity writeback too early
Josef Bacik josef@toxicpanda.com btrfs: wait for actual caching progress during allocation
Bartosz Golaszewski bartosz.golaszewski@linaro.org gpio: sim: mark the GPIO chip as a one that can sleep
William Breathitt Gray william.gray@linaro.org gpio: ws16c48: Fix off-by-one error in WS16C48 resource region extent
Nick Child nnac123@linux.ibm.com ibmvnic: Ensure login failure recovery is safe from other resets
Nick Child nnac123@linux.ibm.com ibmvnic: Do partial reset on login failure
Nick Child nnac123@linux.ibm.com ibmvnic: Handle DMA unmapping of login buffs in release functions
Nick Child nnac123@linux.ibm.com ibmvnic: Unmap DMA login rsp buffer on send login fail
Nick Child nnac123@linux.ibm.com ibmvnic: Enforce stronger sanity checks on login response
Moshe Shemesh moshe@nvidia.com net/mlx5: Reload auxiliary devices in pci error handlers
Moshe Shemesh moshe@nvidia.com net/mlx5: Skip clock update work when device is in error state
Shay Drory shayd@nvidia.com net/mlx5: LAG, Check correct bucket when modifying LAG
Chris Mi cmi@nvidia.com net/mlx5e: Unoffload post act rule when handling FIB events
Daniel Jurgens danielj@nvidia.com net/mlx5: Allow 0 for total host VFs
Yevgeny Kliteynik kliteyn@nvidia.com net/mlx5: DR, Fix wrong allocation of modify hdr pattern
Jianbo Liu jianbol@nvidia.com net/mlx5e: TC, Fix internal port memory leak
Gal Pressman gal@nvidia.com net/mlx5e: Take RTNL lock when needed before calling xdp_set_features()
Zhang Jianhua chris.zjh@huawei.com dmaengine: owl-dma: Modify mismatched function name
Fenghua Yu fenghua.yu@intel.com dmaengine: idxd: Clear PRS disable flag when disabling IDXD device
Christophe JAILLET christophe.jaillet@wanadoo.fr dmaengine: mcf-edma: Fix a potential un-allocated memory access
Hao Chen chenhao418@huawei.com net: hns3: fix strscpy causing content truncation issue
Ido Schimmel idosch@nvidia.com nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID
Ido Schimmel idosch@nvidia.com nexthop: Make nexthop bucket dump more efficient
Ido Schimmel idosch@nvidia.com nexthop: Fix infinite nexthop dump when using maximum nexthop ID
Vladimir Oltean vladimir.oltean@nxp.com net: enetc: reimplement RFS/RSS memory clearing as PCI quirk
Yonglong Liu liuyonglong@huawei.com net: hns3: fix deadlock issue when externel_lb and reset are executed together
Jie Wang wangjie125@huawei.com net: hns3: add wait until mac link down
Jie Wang wangjie125@huawei.com net: hns3: refactor hclge_mac_link_status_wait for interface reuse
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: ocelot: call dsa_tag_8021q_unregister() under rtnl_lock() on driver remove
Li Yang leoyang.li@nxp.com net: phy: at803x: remove set/get wol callbacks for AR8032
Jonas Gorski jonas.gorski@bisdn.de net: marvell: prestera: fix handling IPv4 routes with nhid
Jakub Kicinski kuba@kernel.org net: tls: avoid discarding data on record close
Kalesh AP kalesh-anakkur.purayil@broadcom.com RDMA/bnxt_re: Fix error handling in probe failure path
Selvin Xavier selvin.xavier@broadcom.com RDMA/bnxt_re: Properly order ib_device_unalloc() to avoid UAF
Michael Guralnik michaelgur@nvidia.com RDMA/umem: Set iova in ODP flow
Felix Fietkau nbd@nbd.name wifi: cfg80211: fix sband iftype data lookup for AP_VLAN
Petr Tesarik petr.tesarik.ext@huawei.com wifi: brcm80211: handle params_v1 allocation failure
Daniel Stone daniels@collabora.com drm/rockchip: Don't spam logs in atomic check
Arnd Bergmann arnd@arndb.de drm/nouveau: remove unused tu102_gr_load() function
Pin-yen Lin treapking@chromium.org drm/bridge: it6505: Check power state with it6505->powered in IRQ handler
Mario Limonciello mario.limonciello@amd.com drm/amd/display: Don't show stack trace for missing eDP
Douglas Miller doug.miller@cornelisnetworks.com IB/hfi1: Fix possible panic during hotplug remove
Piotr Gardocki piotrx.gardocki@intel.com iavf: fix potential races for FDIR filters
Fedor Pchelkin pchelkin@ispras.ru drivers: vxlan: vnifilter: free percpu vni stats on error path
Andrew Kanner andrew.kanner@gmail.com drivers: net: prevent tun_build_skb() to exceed the packet size limit
Eric Dumazet edumazet@google.com dccp: fix data-race around dp->dccps_mss_cache
Ziyang Xuan william.xuanziyang@huawei.com bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves
Magnus Karlsson magnus.karlsson@intel.com xsk: fix refcount underflow in error path
Florian Westphal fw@strlen.de tunnels: fix kasan splat when generating ipv4 pmtu error
Eric Dumazet edumazet@google.com tcp: add missing family to tcp_set_ca_state() tracepoint
Vladimir Oltean vladimir.oltean@nxp.com PCI: move OF status = "disabled" detection to dev->match_driver
Gerd Bayer gbayer@linux.ibm.com net/smc: Use correct buffer sizes when switching between TCP and SMC
Gerd Bayer gbayer@linux.ibm.com net/smc: Fix setsockopt and sysctl to specify same buffer size again
Eric Dumazet edumazet@google.com net/packet: annotate data-races around tp->status
Nitya Sunkad nitya.sunkad@amd.com ionic: Add missing err handling for queue reconfig
Muhammad Husaini Zulkifli muhammad.husaini.zulkifli@intel.com igc: Add lock to safeguard global Qbv variables
Xiang Yang xiangyang3@huawei.com mptcp: fix the incorrect judgment for msk->cb_flags
Eric Dumazet edumazet@google.com macsec: use DEV_STATS_INC()
Nathan Chancellor nathan@kernel.org mISDN: Update parameter type of dsp_cmx_send()
Aleksa Savic savicaleksa83@gmail.com hwmon: (aquacomputer_d5next) Add selective 200ms delay after sending ctrl report
Xu Kuohai xukuohai@huawei.com bpf, sockmap: Fix bug that strp_done cannot be called
Xu Kuohai xukuohai@huawei.com bpf, sockmap: Fix map type error in sock_map_del_link
Andrew Kanner andrew.kanner@gmail.com net: core: remove unnecessary frame_sz check in bpf_xdp_adjust_tail()
Ido Schimmel idosch@nvidia.com selftests: forwarding: bridge_mdb: Make test more robust
Ido Schimmel idosch@nvidia.com selftests: forwarding: bridge_mdb: Fix failing test with old libnet
Ido Schimmel idosch@nvidia.com selftests: forwarding: bridge_mdb_max: Fix failing test with old libnet
Ido Schimmel idosch@nvidia.com selftests: forwarding: tc_flower: Relax success criterion
Ido Schimmel idosch@nvidia.com selftests: forwarding: tc_actions: Use ncat instead of nc
Ido Schimmel idosch@nvidia.com selftests: forwarding: Switch off timeout
Ido Schimmel idosch@nvidia.com selftests: forwarding: Skip test when no interfaces are specified
Ido Schimmel idosch@nvidia.com selftests: forwarding: hw_stats_l3_gre: Skip when using veth pairs
Ido Schimmel idosch@nvidia.com selftests: forwarding: ethtool_extended_state: Skip when using veth pairs
Ido Schimmel idosch@nvidia.com selftests: forwarding: ethtool: Skip when using veth pairs
Ido Schimmel idosch@nvidia.com selftests: forwarding: Add a helper to skip test when using veth pairs
Mark Brown broonie@kernel.org selftests/rseq: Fix build with undefined __weak
Xu Kuohai xukuohai@huawei.com selftests/bpf: fix a CI failure caused by vsock sockmap test
Minjie Du duminjie@vivo.com dmaengine: xilinx: xdma: Fix Judgment of the return value
Miquel Raynal miquel.raynal@bootlin.com dmaengine: xilinx: xdma: Fix typo
Raghavendra Rao Ananta rananta@google.com KVM: arm64: Fix hardware enable/disable flows for pKVM
Ido Schimmel idosch@nvidia.com selftests: forwarding: bridge_mdb: Check iproute2 version
Ido Schimmel idosch@nvidia.com selftests: forwarding: bridge_mdb_max: Check iproute2 version
Ido Schimmel idosch@nvidia.com selftests: forwarding: ethtool_mm: Skip when MAC Merge is not supported
Ido Schimmel idosch@nvidia.com selftests: forwarding: tc_tunnel_key: Make filters more specific
Neil Armstrong neil.armstrong@linaro.org interconnect: qcom: sm8550: add enable_mask for bcm nodes
Neil Armstrong neil.armstrong@linaro.org interconnect: qcom: sm8450: add enable_mask for bcm nodes
Neil Armstrong neil.armstrong@linaro.org interconnect: qcom: sa8775p: add enable_mask for bcm nodes
Mike Tipton mdtipton@codeaurora.org interconnect: qcom: Add support for mask-based BCMs
Matti Vaittinen mazziesaccount@gmail.com iio: light: bu27034: Fix scale format
Milan Zamazal mzamazal@redhat.com iio: core: Prevent invalid memory access when there is no parent
Alejandro Tafalla atafalla@dnyon.com iio: imu: lsm6dsx: Fix mount matrix retrieval
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_set_hash: mark set element as dead when deleting from packet path
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: adapt set backend to use GC transaction API
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: GC transaction API to avoid race with control plane
Florian Westphal fw@strlen.de netfilter: nf_tables: don't skip expired elements during walk
Karol Herbst kherbst@redhat.com drm/nouveau/disp: Revert a NULL check inside nouveau_connector_get_modes
Bjorn Helgaas bhelgaas@google.com Revert "PCI: mvebu: Mark driver as BROKEN"
Arnd Bergmann arnd@arndb.de x86: Move gds_ucode_mitigated() declaration to header
Arnd Bergmann arnd@arndb.de x86/speculation: Add cpu_show_gds() prototype
Jinghao Jia jinghao@linux.ibm.com x86/linkage: Fix typo of BUILD_VDSO in asm/linkage.h
Borislav Petkov (AMD) bp@alien8.de x86/sev: Do not try to parse for the CC blob on non-AMD hardware
Kirill A. Shutemov kirill.shutemov@linux.intel.com x86/mm: Fix VDSO and VVAR placement on 5-level paging machines
Cristian Ciocaltea cristian.ciocaltea@collabora.com x86/cpu/amd: Enable Zenbleed fix for AMD Custom APU 0405
Xin Li xin3.li@intel.com x86/vdso: Choose the right GDT_ENTRY_CPUNODE for 32-bit getcpu() on 64-bit kernel
Nick Desaulniers ndesaulniers@google.com x86/srso: Fix build breakage with the LLVM linker
RD Babiera rdbabiera@google.com usb: typec: altmodes/displayport: Signal hpd when configuring pin assignment
Badhri Jagan Sridharan badhri@google.com usb: typec: tcpm: Fix response to vsafe0V event
Prashanth K quic_prashk@quicinc.com usb: common: usb-conn-gpio: Prevent bailing out if initial role is none
Alan Stern stern@rowland.harvard.edu USB: Gadget: core: Help prevent panic during UVC unconfigure
Elson Roy Serrao quic_eserrao@quicinc.com usb: dwc3: Properly handle processing of pending events
Alan Stern stern@rowland.harvard.edu usb-storage: alauda: Fix uninit-value in alauda_check_media()
Mika Westerberg mika.westerberg@linux.intel.com thunderbolt: Fix memory leak in tb_handle_dp_bandwidth_request()
Ricky WU ricky_wu@realtek.com misc: rtsx: judge ASPM Mode to set PETXCFG Reg
Qi Zheng zhengqi.arch@bytedance.com binder: fix memory leak in binder_init()
Alvin Šipraga alsi@bang-olufsen.dk iio: adc: ina2xx: avoid NULL pointer dereference on OF device match
George Stark gnstark@sberdevices.ru iio: adc: meson: fix core clock enable/disable moment
Alisa Roman alisa.roman@analog.com iio: adc: ad7192: Fix ac excitation feature
Dan Carpenter dan.carpenter@linaro.org iio: frequency: admv1013: propagate errors from regulator_get_voltage()
Yiyuan Guo yguoaz@gmail.com iio: cros_ec: Fix the allocation size for cros_ec_command
Evan Quan evan.quan@amd.com drm/amd/pm: avoid unintentional shutdown due to temperature momentary fluctuation
Evan Quan evan.quan@amd.com drm/amd/pm: expose swctf threshold setting for legacy powerplay
Miaohe Lin linmiaohe@huawei.com mm: memory-failure: avoid false hwpoison page mapped error info
Miaohe Lin linmiaohe@huawei.com mm: memory-failure: fix potential unexpected return value from unpoison_memory()
Ayush Jain ayush.jain3@amd.com selftests: mm: ksm: fix incorrect evaluation of parameter
SeongJae Park sj@kernel.org mm/damon/core: initialize damo_filter->list from damos_new_filter()
Mike Kravetz mike.kravetz@oracle.com hugetlb: do not clear hugetlb dtor until allocating vmemmap
Karol Wachowski karol.wachowski@linux.intel.com accel/ivpu: Add set_pages_array_wc/uc for internal buffers
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix use-after-free of nilfs_root in dirtying inodes via iput
Lorenzo Stoakes lstoakes@gmail.com fs/proc/kcore: reinstate bounce buffer for KCORE_TEXT regions
Thomas Weißschuh linux@weissschuh.net cpufreq: amd-pstate: fix global sysfs attribute type
Colin Ian King colin.i.king@gmail.com radix tree test suite: fix incorrect allocation size for pthreads
Tao Ren rentao.bupt@gmail.com hwmon: (pmbus/bel-pfe) Enable PMBUS_SKIP_STATUS_CHECK for pfe1100
Andrew Yang andrew.yang@mediatek.com zsmalloc: fix races between modifications of fullness and isolated
Aleksa Sarai cyphar@cyphar.com io_uring: correct check for O_TMPFILE
Maulik Shah quic_mkshah@quicinc.com cpuidle: psci: Move enabling OSI mode after power domains creation
Maulik Shah quic_mkshah@quicinc.com cpuidle: dt_idle_genpd: Add helper function to remove genpd topology
Jarkko Sakkinen jarkko@kernel.org tpm_tis: Opt-in interrupts
Peter Ujfalusi peter.ujfalusi@linux.intel.com tpm: tpm_tis: Fix UPX-i11 DMI_MATCH condition
Mario Limonciello mario.limonciello@amd.com drm/amd: Disable S/G for APUs when 64GB or more host memory
Melissa Wen mwen@igalia.com drm/amd/display: check attr flag before set cursor degamma on DCN3+
Mario Limonciello mario.limonciello@amd.com drm/amd/display: Fix a regression on Polaris cards
Kenneth Feng kenneth.feng@amd.com drm/amd/pm: correct the pcie width for smu 13.0.0
Alex Deucher alexander.deucher@amd.com drm/amdgpu: fix possible UAF in amdgpu_cs_pass1()
Boris Brezillon boris.brezillon@collabora.com drm/shmem-helper: Reset vma->vm_ops before calling dma_buf_mmap()
Lyude Paul lyude@redhat.com drm/nouveau/nvkm/dp: Add workaround to fix DP 1.3+ DPCD issues
Karol Herbst kherbst@redhat.com drm/nouveau/gr: enable memory loads on helper invocation on all channels
August Wikerfors git@augustwikerfors.se nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM9B1 256G and 512G
Ming Lei ming.lei@redhat.com nvme-rdma: fix potential unbalanced freeze & unfreeze
Ming Lei ming.lei@redhat.com nvme-tcp: fix potential unbalanced freeze & unfreeze
Ming Lei ming.lei@redhat.com nvme: fix possible hang when removing a controller during error recovery
Nick Desaulniers ndesaulniers@google.com riscv: mm: fix 2 instances of -Wmissing-variable-declarations
Torsten Duwe duwe@suse.de riscv/kexec: handle R_RISCV_CALL_PLT relocation type
Andrea Parri parri.andrea@gmail.com riscv,mmio: Fix readX()-to-delay() ordering
Torsten Duwe duwe@suse.de riscv/kexec: load initrd high in available memory
Alexandre Ghiti alexghiti@rivosinc.com riscv: Start of DRAM should at least be aligned on PMD size for the direct mapping
Helge Deller deller@gmx.de parisc: Fix lightweight spinlock checks to not break futexes
Helge Deller deller@gmx.de io_uring/parisc: Adjust pgoff in io_uring mmap() for parisc
Christoph Hellwig hch@lst.de zram: take device and not only bvec offset into account
Hans de Goede hdegoede@redhat.com ACPI: resource: Add IRQ override quirk for PCSpecialist Elimina Pro 16 M
Hans de Goede hdegoede@redhat.com ACPI: resource: Honor MADT INT_SRC_OVR settings for IRQ1 on AMD Zen
Hans de Goede hdegoede@redhat.com ACPI: resource: Always use MADT override IRQ settings for all legacy non i8042 IRQs
Hans de Goede hdegoede@redhat.com ACPI: resource: revert "Remove "Zen" specific match and quirks"
Souradeep Chakrabarti schakrabarti@linux.microsoft.com net: mana: Fix MANA VF unload when hardware is unresponsive
Miquel Raynal miquel.raynal@bootlin.com dmaengine: xilinx: xdma: Fix interrupt vector setting
Ilpo Järvinen ilpo.jarvinen@linux.intel.com dmaengine: pl330: Return DMA_PAUSED when transaction is paused
Paolo Abeni pabeni@redhat.com mptcp: fix disconnect vs accept race
Paolo Abeni pabeni@redhat.com mptcp: avoid bogus reset on fallback close
Andrea Claudi aclaudi@redhat.com selftests: mptcp: join: fix 'implicit EP' test
Andrea Claudi aclaudi@redhat.com selftests: mptcp: join: fix 'delete and re-add' test
Maciej Żenczykowski maze@google.com ipv6: adjust ndisc_is_useropt() to also return true for PIO
Kunihiko Hayashi hayashi.kunihiko@socionext.com mmc: sdhci-f-sdh30: Replace with sdhci_pltfm
Sergei Antonov saproj@gmail.com mmc: moxart: read scr register without changing byte order
Jason A. Donenfeld Jason@zx2c4.com wireguard: allowedips: expand maximum node depth
Ido Schimmel idosch@nvidia.com selftests: forwarding: Set default IPv6 traceroute utility
Ping-Ke Shih pkshih@realtek.com wifi: rtw89: fix 8852AE disconnection caused by RX full flags
Keith Yeo keithyjy@gmail.com wifi: nl80211: fix integer overflow in nl80211_parse_mbssid_elems()
Paolo Bonzini pbonzini@redhat.com KVM: SEV: only access GHCB fields once
Paolo Bonzini pbonzini@redhat.com KVM: SEV: snapshot the GHCB before accessing it
Namjae Jeon linkinjeon@kernel.org ksmbd: fix wrong next length validation of ea buffer in smb2_set_ea()
Long Li leo.lilong@huawei.com ksmbd: validate command request size
Mario Limonciello mario.limonciello@amd.com tpm: Add a helper for checking hwrng enabled
Jonathan McDowell noodles@meta.com tpm/tpm_tis: Disable interrupts for Lenovo P620 devices
Mario Limonciello mario.limonciello@amd.com tpm: Disable RNG for all AMD fTPMs
Takashi Iwai tiwai@suse.de tpm/tpm_tis: Disable interrupts for TUXEDO InfinityBook S 15/17 Gen7
-------------
Diffstat:
Makefile | 4 +- arch/alpha/kernel/setup.c | 3 +- arch/arm64/kvm/arm.c | 15 +- arch/parisc/Kconfig.debug | 2 +- arch/parisc/include/asm/spinlock.h | 2 - arch/parisc/include/asm/spinlock_types.h | 6 + arch/parisc/kernel/sys_parisc.c | 15 +- arch/parisc/kernel/syscall.S | 23 +- arch/riscv/include/asm/mmio.h | 16 +- arch/riscv/include/asm/pgtable.h | 2 + arch/riscv/kernel/elf_kexec.c | 3 +- arch/riscv/mm/init.c | 16 +- arch/riscv/mm/kasan_init.c | 1 - arch/x86/boot/compressed/idt_64.c | 9 +- arch/x86/boot/compressed/sev.c | 37 ++- arch/x86/entry/vdso/vma.c | 4 +- arch/x86/include/asm/acpi.h | 2 + arch/x86/include/asm/linkage.h | 2 +- arch/x86/include/asm/processor.h | 2 + arch/x86/include/asm/segment.h | 2 +- arch/x86/kernel/acpi/boot.c | 4 + arch/x86/kernel/cpu/amd.c | 1 + arch/x86/kernel/vmlinux.lds.S | 12 +- arch/x86/kvm/svm/sev.c | 94 ++++---- arch/x86/kvm/svm/svm.h | 26 +++ arch/x86/kvm/x86.c | 2 - drivers/accel/ivpu/ivpu_gem.c | 8 + drivers/acpi/resource.c | 64 +++++ drivers/acpi/scan.c | 1 + drivers/android/binder.c | 1 + drivers/android/binder_alloc.c | 6 + drivers/android/binder_alloc.h | 1 + drivers/block/zram/zram_drv.c | 32 ++- drivers/char/tpm/tpm-chip.c | 83 ++----- drivers/char/tpm/tpm_crb.c | 30 +++ drivers/char/tpm/tpm_tis.c | 20 +- drivers/cpufreq/amd-pstate.c | 10 +- drivers/cpuidle/cpuidle-psci-domain.c | 39 ++-- drivers/cpuidle/dt_idle_genpd.c | 24 ++ drivers/cpuidle/dt_idle_genpd.h | 7 + drivers/dma/idxd/device.c | 4 +- drivers/dma/mcf-edma.c | 13 +- drivers/dma/owl-dma.c | 2 +- drivers/dma/pl330.c | 18 +- drivers/dma/xilinx/xdma.c | 6 +- drivers/gpio/gpio-sim.c | 1 + drivers/gpio/gpio-ws16c48.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 + drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26 +++ drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 5 +- .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 2 +- .../amd/display/dc/dce110/dce110_hw_sequencer.c | 3 +- drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c | 7 +- drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h | 2 + drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c | 48 ++++ .../drm/amd/pm/powerplay/hwmgr/hardwaremanager.c | 4 +- .../gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c | 2 + .../gpu/drm/amd/pm/powerplay/hwmgr/smu_helper.c | 27 +-- .../gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c | 10 + .../gpu/drm/amd/pm/powerplay/hwmgr/vega12_hwmgr.c | 4 + .../gpu/drm/amd/pm/powerplay/hwmgr/vega20_hwmgr.c | 4 + drivers/gpu/drm/amd/pm/powerplay/inc/hwmgr.h | 2 + drivers/gpu/drm/amd/pm/powerplay/inc/power_state.h | 1 + drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 34 +++ drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 2 + drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 9 +- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 9 +- .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 3 +- drivers/gpu/drm/bridge/ite-it6505.c | 4 +- drivers/gpu/drm/drm_gem_shmem_helper.c | 6 + drivers/gpu/drm/nouveau/nouveau_connector.c | 2 +- drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c | 48 +++- drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.h | 1 + drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk104.c | 4 +- drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110.c | 10 + drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110b.c | 1 + drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk208.c | 1 + drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgm107.c | 1 + drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c | 13 -- drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 17 +- drivers/hwmon/aquacomputer_d5next.c | 37 ++- drivers/hwmon/pmbus/bel-pfe.c | 16 +- drivers/iio/adc/ad7192.c | 16 +- drivers/iio/adc/ina2xx-adc.c | 9 +- drivers/iio/adc/meson_saradc.c | 23 +- .../common/cros_ec_sensors/cros_ec_sensors_core.c | 2 +- drivers/iio/frequency/admv1013.c | 5 +- drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c | 2 +- drivers/iio/industrialio-core.c | 5 +- drivers/iio/light/rohm-bu27034.c | 22 +- drivers/infiniband/core/umem.c | 3 +- drivers/infiniband/hw/bnxt_re/main.c | 4 +- drivers/infiniband/hw/hfi1/chip.c | 1 + drivers/interconnect/qcom/bcm-voter.c | 5 + drivers/interconnect/qcom/icc-rpmh.h | 2 + drivers/interconnect/qcom/sa8775p.c | 1 + drivers/interconnect/qcom/sm8450.c | 9 + drivers/interconnect/qcom/sm8550.c | 17 ++ drivers/isdn/mISDN/dsp.h | 2 +- drivers/isdn/mISDN/dsp_cmx.c | 2 +- drivers/isdn/mISDN/dsp_core.c | 2 +- drivers/misc/cardreader/rts5227.c | 2 +- drivers/misc/cardreader/rts5228.c | 18 -- drivers/misc/cardreader/rts5249.c | 3 +- drivers/misc/cardreader/rts5260.c | 18 -- drivers/misc/cardreader/rts5261.c | 18 -- drivers/misc/cardreader/rtsx_pcr.c | 5 +- drivers/mmc/host/moxart-mmc.c | 8 +- drivers/mmc/host/sdhci_f_sdh30.c | 60 +++-- drivers/net/bonding/bond_main.c | 4 +- drivers/net/dsa/ocelot/felix.c | 2 + drivers/net/ethernet/freescale/enetc/enetc_pf.c | 103 +++++--- drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c | 4 +- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 14 +- .../ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 4 +- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 24 +- drivers/net/ethernet/ibm/ibmvnic.c | 112 +++++++-- drivers/net/ethernet/intel/iavf/iavf_ethtool.c | 5 +- drivers/net/ethernet/intel/iavf/iavf_fdir.c | 11 +- drivers/net/ethernet/intel/igc/igc.h | 4 + drivers/net/ethernet/intel/igc/igc_main.c | 34 ++- .../ethernet/marvell/prestera/prestera_router.c | 14 +- .../ethernet/mellanox/mlx5/core/en/tc_tun_encap.c | 6 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 11 + drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 21 +- .../net/ethernet/mellanox/mlx5/core/lag/port_sel.c | 2 +- .../net/ethernet/mellanox/mlx5/core/lib/clock.c | 5 + drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 3 +- .../ethernet/mellanox/mlx5/core/steering/dr_ptrn.c | 2 +- drivers/net/ethernet/microsoft/mana/mana_en.c | 37 ++- drivers/net/ethernet/pensando/ionic/ionic_lif.c | 23 +- drivers/net/macsec.c | 28 +-- drivers/net/phy/at803x.c | 2 - drivers/net/tun.c | 2 +- drivers/net/vxlan/vxlan_vnifilter.c | 11 +- drivers/net/wireguard/allowedips.c | 8 +- drivers/net/wireguard/selftest/allowedips.c | 16 +- .../broadcom/brcm80211/brcmfmac/cfg80211.c | 5 + drivers/net/wireless/realtek/rtw89/mac.c | 2 +- drivers/nvme/host/core.c | 10 +- drivers/nvme/host/pci.c | 3 +- drivers/nvme/host/rdma.c | 3 +- drivers/nvme/host/tcp.c | 3 +- drivers/pci/bus.c | 4 +- drivers/pci/controller/Kconfig | 1 - drivers/pci/of.c | 5 - drivers/platform/x86/lenovo-ymc.c | 25 ++ drivers/platform/x86/mlx-platform.c | 23 +- drivers/platform/x86/msi-ec.c | 18 +- drivers/platform/x86/serial-multi-instantiate.c | 35 ++- drivers/scsi/53c700.c | 2 +- drivers/scsi/fnic/fnic.h | 2 +- drivers/scsi/fnic/fnic_scsi.c | 6 +- drivers/scsi/qedf/qedf_main.c | 18 ++ drivers/scsi/qedi/qedi_main.c | 18 ++ drivers/scsi/raid_class.c | 1 + drivers/scsi/scsi_proc.c | 30 +-- drivers/scsi/snic/snic_disc.c | 1 + drivers/scsi/storvsc_drv.c | 4 - drivers/thunderbolt/tb.c | 2 + drivers/ufs/host/ufs-renesas.c | 2 +- drivers/usb/common/usb-conn-gpio.c | 6 +- drivers/usb/dwc3/gadget.c | 9 +- drivers/usb/gadget/udc/core.c | 9 + drivers/usb/storage/alauda.c | 12 +- drivers/usb/typec/altmodes/displayport.c | 18 +- drivers/usb/typec/tcpm/tcpm.c | 7 + fs/btrfs/block-group.c | 17 +- fs/btrfs/block-group.h | 2 + fs/btrfs/disk-io.c | 3 +- fs/btrfs/extent-tree.c | 5 +- fs/btrfs/extent_io.c | 13 +- fs/btrfs/inode.c | 10 +- fs/btrfs/relocation.c | 45 +++- fs/btrfs/tree-checker.c | 14 ++ fs/nilfs2/inode.c | 8 + fs/nilfs2/segment.c | 2 + fs/nilfs2/the_nilfs.h | 2 + fs/proc/kcore.c | 30 ++- fs/smb/server/smb2misc.c | 10 +- fs/smb/server/smb2pdu.c | 9 +- include/linux/cpu.h | 2 + include/linux/skmsg.h | 1 + include/linux/tpm.h | 1 + include/net/cfg80211.h | 3 + include/net/netfilter/nf_tables.h | 64 ++++- include/trace/events/tcp.h | 5 +- io_uring/io_uring.c | 3 + io_uring/openclose.c | 6 +- mm/damon/core.c | 1 + mm/hugetlb.c | 75 ++++-- mm/memory-failure.c | 29 +-- mm/zsmalloc.c | 14 +- net/core/filter.c | 6 - net/core/skmsg.c | 10 +- net/core/sock_map.c | 10 +- net/dccp/output.c | 2 +- net/dccp/proto.c | 10 +- net/ipv4/ip_tunnel_core.c | 2 +- net/ipv4/nexthop.c | 28 +-- net/ipv6/ndisc.c | 3 +- net/mptcp/protocol.c | 4 +- net/mptcp/protocol.h | 1 - net/mptcp/subflow.c | 58 ++--- net/netfilter/nf_tables_api.c | 259 +++++++++++++++++++-- net/netfilter/nft_set_hash.c | 85 ++++--- net/netfilter/nft_set_pipapo.c | 66 ++++-- net/netfilter/nft_set_rbtree.c | 146 +++++++----- net/packet/af_packet.c | 16 +- net/smc/af_smc.c | 77 ++++-- net/smc/smc.h | 2 +- net/smc/smc_clc.c | 4 +- net/smc/smc_core.c | 25 +- net/smc/smc_sysctl.c | 10 +- net/tls/tls_device.c | 64 ++--- net/wireless/nl80211.c | 5 +- net/xdp/xsk.c | 1 + tools/testing/radix-tree/regression1.c | 2 +- .../selftests/bpf/prog_tests/sockmap_listen.c | 2 +- tools/testing/selftests/mm/ksm_tests.c | 1 + tools/testing/selftests/net/fib_nexthops.sh | 10 + .../testing/selftests/net/forwarding/bridge_mdb.sh | 59 ++--- .../selftests/net/forwarding/bridge_mdb_max.sh | 19 +- tools/testing/selftests/net/forwarding/ethtool.sh | 2 + .../net/forwarding/ethtool_extended_state.sh | 2 + .../testing/selftests/net/forwarding/ethtool_mm.sh | 18 +- .../selftests/net/forwarding/hw_stats_l3_gre.sh | 2 + .../net/forwarding/ip6_forward_instats_vrf.sh | 2 + tools/testing/selftests/net/forwarding/lib.sh | 17 ++ tools/testing/selftests/net/forwarding/settings | 1 + .../testing/selftests/net/forwarding/tc_actions.sh | 6 +- .../testing/selftests/net/forwarding/tc_flower.sh | 8 +- .../selftests/net/forwarding/tc_tunnel_key.sh | 9 +- tools/testing/selftests/net/mptcp/mptcp_join.sh | 6 +- tools/testing/selftests/rseq/Makefile | 4 +- tools/testing/selftests/rseq/rseq.c | 2 + 238 files changed, 2516 insertions(+), 1034 deletions(-)
From: Takashi Iwai tiwai@suse.de
commit 0b15afc9038146bb2009e7924b1ead2e919b2a56 upstream.
TUXEDO InfinityBook S 15/17 Gen7 suffers from an IRQ problem on tpm_tis like a few other laptops. Add an entry for the workaround.
Cc: stable@vger.kernel.org Fixes: e644b2f498d2 ("tpm, tpm_tis: Enable interrupt test") Link: https://bugzilla.suse.com/show_bug.cgi?id=1213645 Signed-off-by: Takashi Iwai tiwai@suse.de Acked-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/tpm_tis.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index cc42cf3de960..a98773ac2e55 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -162,6 +162,14 @@ static const struct dmi_system_id tpm_tis_dmi_table[] = { DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad L590"), }, }, + { + .callback = tpm_tis_disable_irq, + .ident = "TUXEDO InfinityBook S 15/17 Gen7", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"), + DMI_MATCH(DMI_PRODUCT_NAME, "TUXEDO InfinityBook S 15/17 Gen7"), + }, + }, { .callback = tpm_tis_disable_irq, .ident = "UPX-TGL",
From: Mario Limonciello mario.limonciello@amd.com
commit 554b841d470338a3b1d6335b14ee1cd0c8f5d754 upstream.
The TPM RNG functionality is not necessary for entropy when the CPU already supports the RDRAND instruction. The TPM RNG functionality was previously disabled on a subset of AMD fTPM series, but reports continue to show problems on some systems causing stutter root caused to TPM RNG functionality.
Expand disabling TPM RNG use for all AMD fTPMs whether they have versions that claim to have fixed or not. To accomplish this, move the detection into part of the TPM CRB registration and add a flag indicating that the TPM should opt-out of registration to hwrng.
Cc: stable@vger.kernel.org # 6.1.y+ Fixes: b006c439d58d ("hwrng: core - start hwrng kthread also for untrusted sources") Fixes: f1324bbc4011 ("tpm: disable hwrng for fTPM on some AMD designs") Reported-by: daniil.stas@posteo.net Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217719 Reported-by: bitlord0xff@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217212 Signed-off-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/tpm-chip.c | 68 +------------------------------------------- drivers/char/tpm/tpm_crb.c | 30 +++++++++++++++++++ include/linux/tpm.h | 1 3 files changed, 33 insertions(+), 66 deletions(-)
--- a/drivers/char/tpm/tpm-chip.c +++ b/drivers/char/tpm/tpm-chip.c @@ -510,70 +510,6 @@ static int tpm_add_legacy_sysfs(struct t return 0; }
-/* - * Some AMD fTPM versions may cause stutter - * https://www.amd.com/en/support/kb/faq/pa-410 - * - * Fixes are available in two series of fTPM firmware: - * 6.x.y.z series: 6.0.18.6 + - * 3.x.y.z series: 3.57.y.5 + - */ -#ifdef CONFIG_X86 -static bool tpm_amd_is_rng_defective(struct tpm_chip *chip) -{ - u32 val1, val2; - u64 version; - int ret; - - if (!(chip->flags & TPM_CHIP_FLAG_TPM2)) - return false; - - ret = tpm_request_locality(chip); - if (ret) - return false; - - ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val1, NULL); - if (ret) - goto release; - if (val1 != 0x414D4400U /* AMD */) { - ret = -ENODEV; - goto release; - } - ret = tpm2_get_tpm_pt(chip, TPM2_PT_FIRMWARE_VERSION_1, &val1, NULL); - if (ret) - goto release; - ret = tpm2_get_tpm_pt(chip, TPM2_PT_FIRMWARE_VERSION_2, &val2, NULL); - -release: - tpm_relinquish_locality(chip); - - if (ret) - return false; - - version = ((u64)val1 << 32) | val2; - if ((version >> 48) == 6) { - if (version >= 0x0006000000180006ULL) - return false; - } else if ((version >> 48) == 3) { - if (version >= 0x0003005700000005ULL) - return false; - } else { - return false; - } - - dev_warn(&chip->dev, - "AMD fTPM version 0x%llx causes system stutter; hwrng disabled\n", - version); - - return true; -} -#else -static inline bool tpm_amd_is_rng_defective(struct tpm_chip *chip) -{ - return false; -} -#endif /* CONFIG_X86 */ - static int tpm_hwrng_read(struct hwrng *rng, void *data, size_t max, bool wait) { struct tpm_chip *chip = container_of(rng, struct tpm_chip, hwrng); @@ -588,7 +524,7 @@ static int tpm_hwrng_read(struct hwrng * static int tpm_add_hwrng(struct tpm_chip *chip) { if (!IS_ENABLED(CONFIG_HW_RANDOM_TPM) || tpm_is_firmware_upgrade(chip) || - tpm_amd_is_rng_defective(chip)) + chip->flags & TPM_CHIP_FLAG_HWRNG_DISABLED) return 0;
snprintf(chip->hwrng_name, sizeof(chip->hwrng_name), @@ -719,7 +655,7 @@ void tpm_chip_unregister(struct tpm_chip { tpm_del_legacy_sysfs(chip); if (IS_ENABLED(CONFIG_HW_RANDOM_TPM) && !tpm_is_firmware_upgrade(chip) && - !tpm_amd_is_rng_defective(chip)) + !(chip->flags & TPM_CHIP_FLAG_HWRNG_DISABLED)) hwrng_unregister(&chip->hwrng); tpm_bios_log_teardown(chip); if (chip->flags & TPM_CHIP_FLAG_TPM2 && !tpm_is_firmware_upgrade(chip)) --- a/drivers/char/tpm/tpm_crb.c +++ b/drivers/char/tpm/tpm_crb.c @@ -463,6 +463,28 @@ static bool crb_req_canceled(struct tpm_ return (cancel & CRB_CANCEL_INVOKE) == CRB_CANCEL_INVOKE; }
+static int crb_check_flags(struct tpm_chip *chip) +{ + u32 val; + int ret; + + ret = crb_request_locality(chip, 0); + if (ret) + return ret; + + ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL); + if (ret) + goto release; + + if (val == 0x414D4400U /* AMD */) + chip->flags |= TPM_CHIP_FLAG_HWRNG_DISABLED; + +release: + crb_relinquish_locality(chip, 0); + + return ret; +} + static const struct tpm_class_ops tpm_crb = { .flags = TPM_OPS_AUTO_STARTUP, .status = crb_status, @@ -800,6 +822,14 @@ static int crb_acpi_add(struct acpi_devi chip->acpi_dev_handle = device->handle; chip->flags = TPM_CHIP_FLAG_TPM2;
+ rc = tpm_chip_bootstrap(chip); + if (rc) + goto out; + + rc = crb_check_flags(chip); + if (rc) + goto out; + rc = tpm_chip_register(chip);
out: --- a/include/linux/tpm.h +++ b/include/linux/tpm.h @@ -283,6 +283,7 @@ enum tpm_chip_flags { TPM_CHIP_FLAG_FIRMWARE_POWER_MANAGED = BIT(6), TPM_CHIP_FLAG_FIRMWARE_UPGRADE = BIT(7), TPM_CHIP_FLAG_SUSPENDED = BIT(8), + TPM_CHIP_FLAG_HWRNG_DISABLED = BIT(9), };
#define to_tpm_chip(d) container_of(d, struct tpm_chip, dev)
From: Jonathan McDowell noodles@meta.com
commit e117e7adc637e364b599dc766f1d740698e7e027 upstream.
The Lenovo ThinkStation P620 suffers from an irq storm issue like various other Lenovo machines, so add an entry for it to tpm_tis_dmi_table and force polling.
It is worth noting that 481c2d14627d (tpm,tpm_tis: Disable interrupts after 1000 unhandled IRQs) does not seem to fix the problem on this machine, but setting 'tpm_tis.interrupts=0' on the kernel command line does.
[jarkko@kernel.org: truncated the commit ID in the description to 12 characters] Cc: stable@vger.kernel.org # v6.4+ Fixes: e644b2f498d2 ("tpm, tpm_tis: Enable interrupt test") Signed-off-by: Jonathan McDowell noodles@meta.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/tpm_tis.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -164,6 +164,14 @@ static const struct dmi_system_id tpm_ti }, { .callback = tpm_tis_disable_irq, + .ident = "ThinkStation P620", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkStation P620"), + }, + }, + { + .callback = tpm_tis_disable_irq, .ident = "TUXEDO InfinityBook S 15/17 Gen7", .matches = { DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"),
From: Mario Limonciello mario.limonciello@amd.com
commit cacc6e22932f373a91d7be55a9b992dc77f4c59b upstream.
The same checks are repeated in three places to decide whether to use hwrng. Consolidate these into a helper.
Also this fixes a case that one of them was missing a check in the cleanup path.
Fixes: 554b841d4703 ("tpm: Disable RNG for all AMD fTPMs") Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/tpm-chip.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)
--- a/drivers/char/tpm/tpm-chip.c +++ b/drivers/char/tpm/tpm-chip.c @@ -521,10 +521,20 @@ static int tpm_hwrng_read(struct hwrng * return tpm_get_random(chip, data, max); }
+static bool tpm_is_hwrng_enabled(struct tpm_chip *chip) +{ + if (!IS_ENABLED(CONFIG_HW_RANDOM_TPM)) + return false; + if (tpm_is_firmware_upgrade(chip)) + return false; + if (chip->flags & TPM_CHIP_FLAG_HWRNG_DISABLED) + return false; + return true; +} + static int tpm_add_hwrng(struct tpm_chip *chip) { - if (!IS_ENABLED(CONFIG_HW_RANDOM_TPM) || tpm_is_firmware_upgrade(chip) || - chip->flags & TPM_CHIP_FLAG_HWRNG_DISABLED) + if (!tpm_is_hwrng_enabled(chip)) return 0;
snprintf(chip->hwrng_name, sizeof(chip->hwrng_name), @@ -629,7 +639,7 @@ int tpm_chip_register(struct tpm_chip *c return 0;
out_hwrng: - if (IS_ENABLED(CONFIG_HW_RANDOM_TPM) && !tpm_is_firmware_upgrade(chip)) + if (tpm_is_hwrng_enabled(chip)) hwrng_unregister(&chip->hwrng); out_ppi: tpm_bios_log_teardown(chip); @@ -654,8 +664,7 @@ EXPORT_SYMBOL_GPL(tpm_chip_register); void tpm_chip_unregister(struct tpm_chip *chip) { tpm_del_legacy_sysfs(chip); - if (IS_ENABLED(CONFIG_HW_RANDOM_TPM) && !tpm_is_firmware_upgrade(chip) && - !(chip->flags & TPM_CHIP_FLAG_HWRNG_DISABLED)) + if (tpm_is_hwrng_enabled(chip)) hwrng_unregister(&chip->hwrng); tpm_bios_log_teardown(chip); if (chip->flags & TPM_CHIP_FLAG_TPM2 && !tpm_is_firmware_upgrade(chip))
From: Long Li leo.lilong@huawei.com
commit 5aa4fda5aa9c2a5a7bac67b4a12b089ab81fee3c upstream.
In commit 2b9b8f3b68ed ("ksmbd: validate command payload size"), except for SMB2_OPLOCK_BREAK_HE command, the request size of other commands is not checked, it's not expected. Fix it by add check for request size of other commands.
Cc: stable@vger.kernel.org Fixes: 2b9b8f3b68ed ("ksmbd: validate command payload size") Acked-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Long Li leo.lilong@huawei.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/smb2misc.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/fs/smb/server/smb2misc.c +++ b/fs/smb/server/smb2misc.c @@ -380,13 +380,13 @@ int ksmbd_smb2_check_message(struct ksmb }
if (smb2_req_struct_sizes[command] != pdu->StructureSize2) { - if (command == SMB2_OPLOCK_BREAK_HE && - le16_to_cpu(pdu->StructureSize2) != OP_BREAK_STRUCT_SIZE_20 && - le16_to_cpu(pdu->StructureSize2) != OP_BREAK_STRUCT_SIZE_21) { + if (!(command == SMB2_OPLOCK_BREAK_HE && + (le16_to_cpu(pdu->StructureSize2) == OP_BREAK_STRUCT_SIZE_20 || + le16_to_cpu(pdu->StructureSize2) == OP_BREAK_STRUCT_SIZE_21))) { /* special case for SMB2.1 lease break message */ ksmbd_debug(SMB, - "Illegal request size %d for oplock break\n", - le16_to_cpu(pdu->StructureSize2)); + "Illegal request size %u for command %d\n", + le16_to_cpu(pdu->StructureSize2), command); return 1; } }
From: Namjae Jeon linkinjeon@kernel.org
commit 79ed288cef201f1f212dfb934bcaac75572fb8f6 upstream.
There are multiple smb2_ea_info buffers in FILE_FULL_EA_INFORMATION request from client. ksmbd find next smb2_ea_info using ->NextEntryOffset of current smb2_ea_info. ksmbd need to validate buffer length Before accessing the next ea. ksmbd should check buffer length using buf_len, not next variable. next is the start offset of current ea that got from previous ea.
Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21598 Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/smb2pdu.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/fs/smb/server/smb2pdu.c +++ b/fs/smb/server/smb2pdu.c @@ -2324,9 +2324,16 @@ next: break; buf_len -= next; eabuf = (struct smb2_ea_info *)((char *)eabuf + next); - if (next < (u32)eabuf->EaNameLength + le16_to_cpu(eabuf->EaValueLength)) + if (buf_len < sizeof(struct smb2_ea_info)) { + rc = -EINVAL; break; + }
+ if (buf_len < sizeof(struct smb2_ea_info) + eabuf->EaNameLength + + le16_to_cpu(eabuf->EaValueLength)) { + rc = -EINVAL; + break; + } } while (next != 0);
kfree(attr_name);
From: Paolo Bonzini pbonzini@redhat.com
commit 4e15a0ddc3ff40e8ea84032213976ecf774d7f77 upstream.
Validation of the GHCB is susceptible to time-of-check/time-of-use vulnerabilities. To avoid them, we would like to always snapshot the fields that are read in sev_es_validate_vmgexit(), and not use the GHCB anymore after it returns.
This means:
- invoking sev_es_sync_from_ghcb() before any GHCB access, including before sev_es_validate_vmgexit()
- snapshotting all fields including the valid bitmap and the sw_scratch field, which are currently not caching anywhere.
The valid bitmap is the first thing to be copied out of the GHCB; then, further accesses will use the copy in svm->sev_es.
Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/sev.c | 69 ++++++++++++++++++++++++------------------------- arch/x86/kvm/svm/svm.h | 26 ++++++++++++++++++ 2 files changed, 61 insertions(+), 34 deletions(-)
--- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -2414,15 +2414,18 @@ static void sev_es_sync_from_ghcb(struct */ memset(vcpu->arch.regs, 0, sizeof(vcpu->arch.regs));
- vcpu->arch.regs[VCPU_REGS_RAX] = ghcb_get_rax_if_valid(ghcb); - vcpu->arch.regs[VCPU_REGS_RBX] = ghcb_get_rbx_if_valid(ghcb); - vcpu->arch.regs[VCPU_REGS_RCX] = ghcb_get_rcx_if_valid(ghcb); - vcpu->arch.regs[VCPU_REGS_RDX] = ghcb_get_rdx_if_valid(ghcb); - vcpu->arch.regs[VCPU_REGS_RSI] = ghcb_get_rsi_if_valid(ghcb); + BUILD_BUG_ON(sizeof(svm->sev_es.valid_bitmap) != sizeof(ghcb->save.valid_bitmap)); + memcpy(&svm->sev_es.valid_bitmap, &ghcb->save.valid_bitmap, sizeof(ghcb->save.valid_bitmap));
- svm->vmcb->save.cpl = ghcb_get_cpl_if_valid(ghcb); + vcpu->arch.regs[VCPU_REGS_RAX] = kvm_ghcb_get_rax_if_valid(svm, ghcb); + vcpu->arch.regs[VCPU_REGS_RBX] = kvm_ghcb_get_rbx_if_valid(svm, ghcb); + vcpu->arch.regs[VCPU_REGS_RCX] = kvm_ghcb_get_rcx_if_valid(svm, ghcb); + vcpu->arch.regs[VCPU_REGS_RDX] = kvm_ghcb_get_rdx_if_valid(svm, ghcb); + vcpu->arch.regs[VCPU_REGS_RSI] = kvm_ghcb_get_rsi_if_valid(svm, ghcb);
- if (ghcb_xcr0_is_valid(ghcb)) { + svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb); + + if (kvm_ghcb_xcr0_is_valid(svm)) { vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb); kvm_update_cpuid_runtime(vcpu); } @@ -2433,6 +2436,7 @@ static void sev_es_sync_from_ghcb(struct control->exit_code_hi = upper_32_bits(exit_code); control->exit_info_1 = ghcb_get_sw_exit_info_1(ghcb); control->exit_info_2 = ghcb_get_sw_exit_info_2(ghcb); + svm->sev_es.sw_scratch = kvm_ghcb_get_sw_scratch_if_valid(svm, ghcb);
/* Clear the valid entries fields */ memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap)); @@ -2461,56 +2465,56 @@ static int sev_es_validate_vmgexit(struc
reason = GHCB_ERR_MISSING_INPUT;
- if (!ghcb_sw_exit_code_is_valid(ghcb) || - !ghcb_sw_exit_info_1_is_valid(ghcb) || - !ghcb_sw_exit_info_2_is_valid(ghcb)) + if (!kvm_ghcb_sw_exit_code_is_valid(svm) || + !kvm_ghcb_sw_exit_info_1_is_valid(svm) || + !kvm_ghcb_sw_exit_info_2_is_valid(svm)) goto vmgexit_err;
switch (ghcb_get_sw_exit_code(ghcb)) { case SVM_EXIT_READ_DR7: break; case SVM_EXIT_WRITE_DR7: - if (!ghcb_rax_is_valid(ghcb)) + if (!kvm_ghcb_rax_is_valid(svm)) goto vmgexit_err; break; case SVM_EXIT_RDTSC: break; case SVM_EXIT_RDPMC: - if (!ghcb_rcx_is_valid(ghcb)) + if (!kvm_ghcb_rcx_is_valid(svm)) goto vmgexit_err; break; case SVM_EXIT_CPUID: - if (!ghcb_rax_is_valid(ghcb) || - !ghcb_rcx_is_valid(ghcb)) + if (!kvm_ghcb_rax_is_valid(svm) || + !kvm_ghcb_rcx_is_valid(svm)) goto vmgexit_err; if (ghcb_get_rax(ghcb) == 0xd) - if (!ghcb_xcr0_is_valid(ghcb)) + if (!kvm_ghcb_xcr0_is_valid(svm)) goto vmgexit_err; break; case SVM_EXIT_INVD: break; case SVM_EXIT_IOIO: if (ghcb_get_sw_exit_info_1(ghcb) & SVM_IOIO_STR_MASK) { - if (!ghcb_sw_scratch_is_valid(ghcb)) + if (!kvm_ghcb_sw_scratch_is_valid(svm)) goto vmgexit_err; } else { if (!(ghcb_get_sw_exit_info_1(ghcb) & SVM_IOIO_TYPE_MASK)) - if (!ghcb_rax_is_valid(ghcb)) + if (!kvm_ghcb_rax_is_valid(svm)) goto vmgexit_err; } break; case SVM_EXIT_MSR: - if (!ghcb_rcx_is_valid(ghcb)) + if (!kvm_ghcb_rcx_is_valid(svm)) goto vmgexit_err; if (ghcb_get_sw_exit_info_1(ghcb)) { - if (!ghcb_rax_is_valid(ghcb) || - !ghcb_rdx_is_valid(ghcb)) + if (!kvm_ghcb_rax_is_valid(svm) || + !kvm_ghcb_rdx_is_valid(svm)) goto vmgexit_err; } break; case SVM_EXIT_VMMCALL: - if (!ghcb_rax_is_valid(ghcb) || - !ghcb_cpl_is_valid(ghcb)) + if (!kvm_ghcb_rax_is_valid(svm) || + !kvm_ghcb_cpl_is_valid(svm)) goto vmgexit_err; break; case SVM_EXIT_RDTSCP: @@ -2518,19 +2522,19 @@ static int sev_es_validate_vmgexit(struc case SVM_EXIT_WBINVD: break; case SVM_EXIT_MONITOR: - if (!ghcb_rax_is_valid(ghcb) || - !ghcb_rcx_is_valid(ghcb) || - !ghcb_rdx_is_valid(ghcb)) + if (!kvm_ghcb_rax_is_valid(svm) || + !kvm_ghcb_rcx_is_valid(svm) || + !kvm_ghcb_rdx_is_valid(svm)) goto vmgexit_err; break; case SVM_EXIT_MWAIT: - if (!ghcb_rax_is_valid(ghcb) || - !ghcb_rcx_is_valid(ghcb)) + if (!kvm_ghcb_rax_is_valid(svm) || + !kvm_ghcb_rcx_is_valid(svm)) goto vmgexit_err; break; case SVM_VMGEXIT_MMIO_READ: case SVM_VMGEXIT_MMIO_WRITE: - if (!ghcb_sw_scratch_is_valid(ghcb)) + if (!kvm_ghcb_sw_scratch_is_valid(svm)) goto vmgexit_err; break; case SVM_VMGEXIT_NMI_COMPLETE: @@ -2560,9 +2564,6 @@ vmgexit_err: dump_ghcb(svm); }
- /* Clear the valid entries fields */ - memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap)); - ghcb_set_sw_exit_info_1(ghcb, 2); ghcb_set_sw_exit_info_2(ghcb, reason);
@@ -2583,7 +2584,7 @@ void sev_es_unmap_ghcb(struct vcpu_svm * */ if (svm->sev_es.ghcb_sa_sync) { kvm_write_guest(svm->vcpu.kvm, - ghcb_get_sw_scratch(svm->sev_es.ghcb), + svm->sev_es.sw_scratch, svm->sev_es.ghcb_sa, svm->sev_es.ghcb_sa_len); svm->sev_es.ghcb_sa_sync = false; @@ -2634,7 +2635,7 @@ static int setup_vmgexit_scratch(struct u64 scratch_gpa_beg, scratch_gpa_end; void *scratch_va;
- scratch_gpa_beg = ghcb_get_sw_scratch(ghcb); + scratch_gpa_beg = svm->sev_es.sw_scratch; if (!scratch_gpa_beg) { pr_err("vmgexit: scratch gpa not provided\n"); goto e_scratch; @@ -2850,11 +2851,11 @@ int sev_handle_vmgexit(struct kvm_vcpu *
exit_code = ghcb_get_sw_exit_code(ghcb);
+ sev_es_sync_from_ghcb(svm); ret = sev_es_validate_vmgexit(svm); if (ret) return ret;
- sev_es_sync_from_ghcb(svm); ghcb_set_sw_exit_info_1(ghcb, 0); ghcb_set_sw_exit_info_2(ghcb, 0);
--- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -190,10 +190,12 @@ struct vcpu_sev_es_state { /* SEV-ES support */ struct sev_es_save_area *vmsa; struct ghcb *ghcb; + u8 valid_bitmap[16]; struct kvm_host_map ghcb_map; bool received_first_sipi;
/* SEV-ES scratch area support */ + u64 sw_scratch; void *ghcb_sa; u32 ghcb_sa_len; bool ghcb_sa_sync; @@ -745,4 +747,28 @@ void sev_es_unmap_ghcb(struct vcpu_svm * void __svm_sev_es_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted); void __svm_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted);
+#define DEFINE_KVM_GHCB_ACCESSORS(field) \ + static __always_inline bool kvm_ghcb_##field##_is_valid(const struct vcpu_svm *svm) \ + { \ + return test_bit(GHCB_BITMAP_IDX(field), \ + (unsigned long *)&svm->sev_es.valid_bitmap); \ + } \ + \ + static __always_inline u64 kvm_ghcb_get_##field##_if_valid(struct vcpu_svm *svm, struct ghcb *ghcb) \ + { \ + return kvm_ghcb_##field##_is_valid(svm) ? ghcb->save.field : 0; \ + } \ + +DEFINE_KVM_GHCB_ACCESSORS(cpl) +DEFINE_KVM_GHCB_ACCESSORS(rax) +DEFINE_KVM_GHCB_ACCESSORS(rcx) +DEFINE_KVM_GHCB_ACCESSORS(rdx) +DEFINE_KVM_GHCB_ACCESSORS(rbx) +DEFINE_KVM_GHCB_ACCESSORS(rsi) +DEFINE_KVM_GHCB_ACCESSORS(sw_exit_code) +DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_1) +DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_2) +DEFINE_KVM_GHCB_ACCESSORS(sw_scratch) +DEFINE_KVM_GHCB_ACCESSORS(xcr0) + #endif
From: Paolo Bonzini pbonzini@redhat.com
commit 7588dbcebcbf0193ab5b76987396d0254270b04a upstream.
A KVM guest using SEV-ES or SEV-SNP with multiple vCPUs can trigger a double fetch race condition vulnerability and invoke the VMGEXIT handler recursively.
sev_handle_vmgexit() maps the GHCB page using kvm_vcpu_map() and then fetches the exit code using ghcb_get_sw_exit_code(). Soon after, sev_es_validate_vmgexit() fetches the exit code again. Since the GHCB page is shared with the guest, the guest is able to quickly swap the values with another vCPU and hence bypass the validation. One vmexit code that can be rejected by sev_es_validate_vmgexit() is SVM_EXIT_VMGEXIT; if sev_handle_vmgexit() observes it in the second fetch, the call to svm_invoke_exit_handler() will invoke sev_handle_vmgexit() again recursively.
To avoid the race, always fetch the GHCB data from the places where sev_es_sync_from_ghcb stores it.
Exploiting recursions on linux kernel has been proven feasible in the past, but the impact is mitigated by stack guard pages (CONFIG_VMAP_STACK). Still, if an attacker manages to call the handler multiple times, they can theoretically trigger a stack overflow and cause a denial-of-service, or potentially guest-to-host escape in kernel configurations without stack guard pages.
Note that winning the race reliably in every iteration is very tricky due to the very tight window of the fetches; depending on the compiler settings, they are often consecutive because of optimization and inlining.
Tested by booting an SEV-ES RHEL9 guest.
Fixes: CVE-2023-4155 Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Cc: stable@vger.kernel.org Reported-by: Andy Nguyen theflow@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/sev.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-)
--- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -2442,9 +2442,15 @@ static void sev_es_sync_from_ghcb(struct memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap)); }
+static u64 kvm_ghcb_get_sw_exit_code(struct vmcb_control_area *control) +{ + return (((u64)control->exit_code_hi) << 32) | control->exit_code; +} + static int sev_es_validate_vmgexit(struct vcpu_svm *svm) { - struct kvm_vcpu *vcpu; + struct vmcb_control_area *control = &svm->vmcb->control; + struct kvm_vcpu *vcpu = &svm->vcpu; struct ghcb *ghcb; u64 exit_code; u64 reason; @@ -2455,7 +2461,7 @@ static int sev_es_validate_vmgexit(struc * Retrieve the exit code now even though it may not be marked valid * as it could help with debugging. */ - exit_code = ghcb_get_sw_exit_code(ghcb); + exit_code = kvm_ghcb_get_sw_exit_code(control);
/* Only GHCB Usage code 0 is supported */ if (ghcb->ghcb_usage) { @@ -2470,7 +2476,7 @@ static int sev_es_validate_vmgexit(struc !kvm_ghcb_sw_exit_info_2_is_valid(svm)) goto vmgexit_err;
- switch (ghcb_get_sw_exit_code(ghcb)) { + switch (exit_code) { case SVM_EXIT_READ_DR7: break; case SVM_EXIT_WRITE_DR7: @@ -2487,18 +2493,18 @@ static int sev_es_validate_vmgexit(struc if (!kvm_ghcb_rax_is_valid(svm) || !kvm_ghcb_rcx_is_valid(svm)) goto vmgexit_err; - if (ghcb_get_rax(ghcb) == 0xd) + if (vcpu->arch.regs[VCPU_REGS_RAX] == 0xd) if (!kvm_ghcb_xcr0_is_valid(svm)) goto vmgexit_err; break; case SVM_EXIT_INVD: break; case SVM_EXIT_IOIO: - if (ghcb_get_sw_exit_info_1(ghcb) & SVM_IOIO_STR_MASK) { + if (control->exit_info_1 & SVM_IOIO_STR_MASK) { if (!kvm_ghcb_sw_scratch_is_valid(svm)) goto vmgexit_err; } else { - if (!(ghcb_get_sw_exit_info_1(ghcb) & SVM_IOIO_TYPE_MASK)) + if (!(control->exit_info_1 & SVM_IOIO_TYPE_MASK)) if (!kvm_ghcb_rax_is_valid(svm)) goto vmgexit_err; } @@ -2506,7 +2512,7 @@ static int sev_es_validate_vmgexit(struc case SVM_EXIT_MSR: if (!kvm_ghcb_rcx_is_valid(svm)) goto vmgexit_err; - if (ghcb_get_sw_exit_info_1(ghcb)) { + if (control->exit_info_1) { if (!kvm_ghcb_rax_is_valid(svm) || !kvm_ghcb_rdx_is_valid(svm)) goto vmgexit_err; @@ -2550,8 +2556,6 @@ static int sev_es_validate_vmgexit(struc return 0;
vmgexit_err: - vcpu = &svm->vcpu; - if (reason == GHCB_ERR_INVALID_USAGE) { vcpu_unimpl(vcpu, "vmgexit: ghcb usage %#x is not valid\n", ghcb->ghcb_usage); @@ -2849,8 +2853,6 @@ int sev_handle_vmgexit(struct kvm_vcpu *
trace_kvm_vmgexit_enter(vcpu->vcpu_id, ghcb);
- exit_code = ghcb_get_sw_exit_code(ghcb); - sev_es_sync_from_ghcb(svm); ret = sev_es_validate_vmgexit(svm); if (ret) @@ -2859,6 +2861,7 @@ int sev_handle_vmgexit(struct kvm_vcpu * ghcb_set_sw_exit_info_1(ghcb, 0); ghcb_set_sw_exit_info_2(ghcb, 0);
+ exit_code = kvm_ghcb_get_sw_exit_code(control); switch (exit_code) { case SVM_VMGEXIT_MMIO_READ: ret = setup_vmgexit_scratch(svm, true, control->exit_info_2);
From: Keith Yeo keithyjy@gmail.com
commit 6311071a056272e1e761de8d0305e87cc566f734 upstream.
nl80211_parse_mbssid_elems() uses a u8 variable num_elems to count the number of MBSSID elements in the nested netlink attribute attrs, which can lead to an integer overflow if a user of the nl80211 interface specifies 256 or more elements in the corresponding attribute in userspace. The integer overflow can lead to a heap buffer overflow as num_elems determines the size of the trailing array in elems, and this array is thereafter written to for each element in attrs.
Note that this vulnerability only affects devices with the wiphy->mbssid_max_interfaces member set for the wireless physical device struct in the device driver, and can only be triggered by a process with CAP_NET_ADMIN capabilities.
Fix this by checking for a maximum of 255 elements in attrs.
Cc: stable@vger.kernel.org Fixes: dc1e3cb8da8b ("nl80211: MBSSID and EMA support in AP mode") Signed-off-by: Keith Yeo keithyjy@gmail.com Link: https://lore.kernel.org/r/20230731034719.77206-1-keithyjy@gmail.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/wireless/nl80211.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -5426,8 +5426,11 @@ nl80211_parse_mbssid_elems(struct wiphy if (!wiphy->mbssid_max_interfaces) return ERR_PTR(-EINVAL);
- nla_for_each_nested(nl_elems, attrs, rem_elems) + nla_for_each_nested(nl_elems, attrs, rem_elems) { + if (num_elems >= 255) + return ERR_PTR(-EINVAL); num_elems++; + }
elems = kzalloc(struct_size(elems, elem, num_elems), GFP_KERNEL); if (!elems)
From: Ping-Ke Shih pkshih@realtek.com
commit b74bb07cdab6859e1a3fc9fe7351052176322ddf upstream.
RX full flags are raised if certain types of RX FIFO are full, and then drop all following MPDU of AMPDU. In order to resume to receive MPDU when RX FIFO becomes available, we clear the register bits by the commit a0d99ebb3ecd ("wifi: rtw89: initialize DMA of CMAC"). But, 8852AE needs more settings to support this. To quickly fix disconnection problem, revert the behavior as before.
Fixes: a0d99ebb3ecd ("wifi: rtw89: initialize DMA of CMAC") Reported-by: Damian B bronecki.damian@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217710 Cc: Stable@vger.kernel.org Signed-off-by: Ping-Ke Shih pkshih@realtek.com Tested-by: Damian B bronecki.damian@gmail.com Link: https://lore.kernel.org/r/20230808005426.5327-1-pkshih@realtek.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/realtek/rtw89/mac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/wireless/realtek/rtw89/mac.c +++ b/drivers/net/wireless/realtek/rtw89/mac.c @@ -2484,7 +2484,7 @@ static int cmac_dma_init(struct rtw89_de u32 reg; int ret;
- if (chip_id != RTL8852A && chip_id != RTL8852B) + if (chip_id != RTL8852B) return 0;
ret = rtw89_mac_check_mac_en(rtwdev, mac_idx, RTW89_CMAC_SEL);
From: Ido Schimmel idosch@nvidia.com
commit 38f7c44d6e760a8513557e27340d61b820c91b8f upstream.
The test uses the 'TROUTE6' environment variable to encode the name of the IPv6 traceroute utility. By default (without a configuration file), this variable is not set, resulting in failures:
# ./ip6_forward_instats_vrf.sh TEST: ping6 [ OK ] TEST: Ip6InTooBigErrors [ OK ] TEST: Ip6InHdrErrors [FAIL] TEST: Ip6InAddrErrors [ OK ] TEST: Ip6InDiscards [ OK ]
Fix by setting a default utility name and skip the test if the utility is not present.
Fixes: 0857d6f8c759 ("ipv6: When forwarding count rx stats on the orig netdev") Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.uniz... Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20230808141503.4060661-6-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/forwarding/ip6_forward_instats_vrf.sh | 2 ++ tools/testing/selftests/net/forwarding/lib.sh | 1 + 2 files changed, 3 insertions(+)
--- a/tools/testing/selftests/net/forwarding/ip6_forward_instats_vrf.sh +++ b/tools/testing/selftests/net/forwarding/ip6_forward_instats_vrf.sh @@ -14,6 +14,8 @@ ALL_TESTS=" NUM_NETIFS=4 source lib.sh
+require_command $TROUTE6 + h1_create() { simple_if_init $h1 2001:1:1::2/64 --- a/tools/testing/selftests/net/forwarding/lib.sh +++ b/tools/testing/selftests/net/forwarding/lib.sh @@ -30,6 +30,7 @@ REQUIRE_MZ=${REQUIRE_MZ:=yes} REQUIRE_MTOOLS=${REQUIRE_MTOOLS:=no} STABLE_MAC_ADDRS=${STABLE_MAC_ADDRS:=no} TCPDUMP_EXTRA_FLAGS=${TCPDUMP_EXTRA_FLAGS:=} +TROUTE6=${TROUTE6:=traceroute6}
relative_path="${BASH_SOURCE%/*}" if [[ "$relative_path" == "${BASH_SOURCE}" ]]; then
From: Jason A. Donenfeld Jason@zx2c4.com
commit 46622219aae2b67813fe31a7b8cb7da5baff5c8a upstream.
In the allowedips self-test, nodes are inserted into the tree, but it generated an even amount of nodes, but for checking maximum node depth, there is of course the root node, which makes the total number necessarily odd. With two few nodes added, it never triggered the maximum depth check like it should have. So, add 129 nodes instead of 128 nodes, and do so with a more straightforward scheme, starting with all the bits set, and shifting over one each time. Then increase the maximum depth to 129, and choose a better name for that variable to make it clear that it represents depth as opposed to bits.
Cc: stable@vger.kernel.org Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Link: https://lore.kernel.org/r/20230807132146.2191597-2-Jason@zx2c4.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireguard/allowedips.c | 8 ++++---- drivers/net/wireguard/selftest/allowedips.c | 16 ++++++++++------ 2 files changed, 14 insertions(+), 10 deletions(-)
--- a/drivers/net/wireguard/allowedips.c +++ b/drivers/net/wireguard/allowedips.c @@ -6,7 +6,7 @@ #include "allowedips.h" #include "peer.h"
-enum { MAX_ALLOWEDIPS_BITS = 128 }; +enum { MAX_ALLOWEDIPS_DEPTH = 129 };
static struct kmem_cache *node_cache;
@@ -42,7 +42,7 @@ static void push_rcu(struct allowedips_n struct allowedips_node __rcu *p, unsigned int *len) { if (rcu_access_pointer(p)) { - if (WARN_ON(IS_ENABLED(DEBUG) && *len >= MAX_ALLOWEDIPS_BITS)) + if (WARN_ON(IS_ENABLED(DEBUG) && *len >= MAX_ALLOWEDIPS_DEPTH)) return; stack[(*len)++] = rcu_dereference_raw(p); } @@ -55,7 +55,7 @@ static void node_free_rcu(struct rcu_hea
static void root_free_rcu(struct rcu_head *rcu) { - struct allowedips_node *node, *stack[MAX_ALLOWEDIPS_BITS] = { + struct allowedips_node *node, *stack[MAX_ALLOWEDIPS_DEPTH] = { container_of(rcu, struct allowedips_node, rcu) }; unsigned int len = 1;
@@ -68,7 +68,7 @@ static void root_free_rcu(struct rcu_hea
static void root_remove_peer_lists(struct allowedips_node *root) { - struct allowedips_node *node, *stack[MAX_ALLOWEDIPS_BITS] = { root }; + struct allowedips_node *node, *stack[MAX_ALLOWEDIPS_DEPTH] = { root }; unsigned int len = 1;
while (len > 0 && (node = stack[--len])) { --- a/drivers/net/wireguard/selftest/allowedips.c +++ b/drivers/net/wireguard/selftest/allowedips.c @@ -593,16 +593,20 @@ bool __init wg_allowedips_selftest(void) wg_allowedips_remove_by_peer(&t, a, &mutex); test_negative(4, a, 192, 168, 0, 1);
- /* These will hit the WARN_ON(len >= MAX_ALLOWEDIPS_BITS) in free_node + /* These will hit the WARN_ON(len >= MAX_ALLOWEDIPS_DEPTH) in free_node * if something goes wrong. */ - for (i = 0; i < MAX_ALLOWEDIPS_BITS; ++i) { - part = cpu_to_be64(~(1LLU << (i % 64))); - memset(&ip, 0xff, 16); - memcpy((u8 *)&ip + (i < 64) * 8, &part, 8); + for (i = 0; i < 64; ++i) { + part = cpu_to_be64(~0LLU << i); + memset(&ip, 0xff, 8); + memcpy((u8 *)&ip + 8, &part, 8); + wg_allowedips_insert_v6(&t, &ip, 128, a, &mutex); + memcpy(&ip, &part, 8); + memset((u8 *)&ip + 8, 0, 8); wg_allowedips_insert_v6(&t, &ip, 128, a, &mutex); } - + memset(&ip, 0, 16); + wg_allowedips_insert_v6(&t, &ip, 128, a, &mutex); wg_allowedips_free(&t, &mutex);
wg_allowedips_init(&t);
From: Sergei Antonov saproj@gmail.com
commit d44263222134b5635932974c6177a5cba65a07e8 upstream.
Conversion from big-endian to native is done in a common function mmc_app_send_scr(). Converting in moxart_transfer_pio() is extra. Double conversion on a LE system returns an incorrect SCR value, leads to errors:
mmc0: unrecognised SCR structure version 8
Fixes: 1b66e94e6b99 ("mmc: moxart: Add MOXA ART SD/MMC driver") Signed-off-by: Sergei Antonov saproj@gmail.com Cc: Jonas Jensen jonas.jensen@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230627120549.2400325-1-saproj@gmail.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/moxart-mmc.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)
--- a/drivers/mmc/host/moxart-mmc.c +++ b/drivers/mmc/host/moxart-mmc.c @@ -338,13 +338,7 @@ static void moxart_transfer_pio(struct m return; } for (len = 0; len < remain && len < host->fifo_width;) { - /* SCR data must be read in big endian. */ - if (data->mrq->cmd->opcode == SD_APP_SEND_SCR) - *sgp = ioread32be(host->base + - REG_DATA_WINDOW); - else - *sgp = ioread32(host->base + - REG_DATA_WINDOW); + *sgp = ioread32(host->base + REG_DATA_WINDOW); sgp++; len += 4; }
From: Kunihiko Hayashi hayashi.kunihiko@socionext.com
commit 5def5c1c15bf22934ee227af85c1716762f3829f upstream.
Even if sdhci_pltfm_pmops is specified for PM, this driver doesn't apply sdhci_pltfm, so the structure is not correctly referenced in PM functions. This applies sdhci_pltfm to this driver to fix this issue.
- Call sdhci_pltfm_init() instead of sdhci_alloc_host() and other functions that covered by sdhci_pltfm. - Move ops and quirks to sdhci_pltfm_data - Replace sdhci_priv() with own private function sdhci_f_sdh30_priv().
Fixes: 87a507459f49 ("mmc: sdhci: host: add new f_sdh30") Signed-off-by: Kunihiko Hayashi hayashi.kunihiko@socionext.com Acked-by: Adrian Hunter adrian.hunter@intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230630004533.26644-1-hayashi.kunihiko@socionext.... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/sdhci_f_sdh30.c | 60 +++++++++++++++++---------------------- 1 file changed, 27 insertions(+), 33 deletions(-)
--- a/drivers/mmc/host/sdhci_f_sdh30.c +++ b/drivers/mmc/host/sdhci_f_sdh30.c @@ -29,9 +29,16 @@ struct f_sdhost_priv { bool enable_cmd_dat_delay; };
+static void *sdhci_f_sdhost_priv(struct sdhci_host *host) +{ + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); + + return sdhci_pltfm_priv(pltfm_host); +} + static void sdhci_f_sdh30_soft_voltage_switch(struct sdhci_host *host) { - struct f_sdhost_priv *priv = sdhci_priv(host); + struct f_sdhost_priv *priv = sdhci_f_sdhost_priv(host); u32 ctrl = 0;
usleep_range(2500, 3000); @@ -64,7 +71,7 @@ static unsigned int sdhci_f_sdh30_get_mi
static void sdhci_f_sdh30_reset(struct sdhci_host *host, u8 mask) { - struct f_sdhost_priv *priv = sdhci_priv(host); + struct f_sdhost_priv *priv = sdhci_f_sdhost_priv(host); u32 ctl;
if (sdhci_readw(host, SDHCI_CLOCK_CONTROL) == 0) @@ -95,30 +102,32 @@ static const struct sdhci_ops sdhci_f_sd .set_uhs_signaling = sdhci_set_uhs_signaling, };
+static const struct sdhci_pltfm_data sdhci_f_sdh30_pltfm_data = { + .ops = &sdhci_f_sdh30_ops, + .quirks = SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC + | SDHCI_QUIRK_INVERTED_WRITE_PROTECT, + .quirks2 = SDHCI_QUIRK2_SUPPORT_SINGLE + | SDHCI_QUIRK2_TUNING_WORK_AROUND, +}; + static int sdhci_f_sdh30_probe(struct platform_device *pdev) { struct sdhci_host *host; struct device *dev = &pdev->dev; - int irq, ctrl = 0, ret = 0; + int ctrl = 0, ret = 0; struct f_sdhost_priv *priv; + struct sdhci_pltfm_host *pltfm_host; u32 reg = 0;
- irq = platform_get_irq(pdev, 0); - if (irq < 0) - return irq; - - host = sdhci_alloc_host(dev, sizeof(struct f_sdhost_priv)); + host = sdhci_pltfm_init(pdev, &sdhci_f_sdh30_pltfm_data, + sizeof(struct f_sdhost_priv)); if (IS_ERR(host)) return PTR_ERR(host);
- priv = sdhci_priv(host); + pltfm_host = sdhci_priv(host); + priv = sdhci_pltfm_priv(pltfm_host); priv->dev = dev;
- host->quirks = SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC | - SDHCI_QUIRK_INVERTED_WRITE_PROTECT; - host->quirks2 = SDHCI_QUIRK2_SUPPORT_SINGLE | - SDHCI_QUIRK2_TUNING_WORK_AROUND; - priv->enable_cmd_dat_delay = device_property_read_bool(dev, "fujitsu,cmd-dat-delay-select");
@@ -126,18 +135,6 @@ static int sdhci_f_sdh30_probe(struct pl if (ret) goto err;
- platform_set_drvdata(pdev, host); - - host->hw_name = "f_sdh30"; - host->ops = &sdhci_f_sdh30_ops; - host->irq = irq; - - host->ioaddr = devm_platform_ioremap_resource(pdev, 0); - if (IS_ERR(host->ioaddr)) { - ret = PTR_ERR(host->ioaddr); - goto err; - } - if (dev_of_node(dev)) { sdhci_get_of_property(pdev);
@@ -204,24 +201,21 @@ err_rst: err_clk: clk_disable_unprepare(priv->clk_iface); err: - sdhci_free_host(host); + sdhci_pltfm_free(pdev); + return ret; }
static int sdhci_f_sdh30_remove(struct platform_device *pdev) { struct sdhci_host *host = platform_get_drvdata(pdev); - struct f_sdhost_priv *priv = sdhci_priv(host); - - sdhci_remove_host(host, readl(host->ioaddr + SDHCI_INT_STATUS) == - 0xffffffff); + struct f_sdhost_priv *priv = sdhci_f_sdhost_priv(host);
reset_control_assert(priv->rst); clk_disable_unprepare(priv->clk); clk_disable_unprepare(priv->clk_iface);
- sdhci_free_host(host); - platform_set_drvdata(pdev, NULL); + sdhci_pltfm_unregister(pdev);
return 0; }
From: Maciej Żenczykowski maze@google.com
commit 048c796beb6eb4fa3a5a647ee1c81f5c6f0f6a2a upstream.
The upcoming (and nearly finalized): https://datatracker.ietf.org/doc/draft-collink-6man-pio-pflag/ will update the IPv6 RA to include a new flag in the PIO field, which will serve as a hint to perform DHCPv6-PD.
As we don't want DHCPv6 related logic inside the kernel, this piece of information needs to be exposed to userspace. The simplest option is to simply expose the entire PIO through the already existing mechanism.
Even without this new flag, the already existing PIO R (router address) flag (from RFC6275) cannot AFAICT be handled entirely in kernel, and provides useful information that should be exposed to userspace (the router's global address, for use by Mobile IPv6).
Also cc'ing stable@ for inclusion in LTS, as while technically this is not quite a bugfix, and instead more of a feature, it is absolutely trivial and the alternative is manually cherrypicking into all Android Common Kernel trees - and I know Greg will ask for it to be sent in via LTS instead...
Cc: Jen Linkova furry@google.com Cc: Lorenzo Colitti lorenzo@google.com Cc: David Ahern dsahern@gmail.com Cc: YOSHIFUJI Hideaki / 吉藤英明 yoshfuji@linux-ipv6.org Cc: stable@vger.kernel.org Signed-off-by: Maciej Żenczykowski maze@google.com Link: https://lore.kernel.org/r/20230807102533.1147559-1-maze@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ndisc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -197,7 +197,8 @@ static struct nd_opt_hdr *ndisc_next_opt static inline int ndisc_is_useropt(const struct net_device *dev, struct nd_opt_hdr *opt) { - return opt->nd_opt_type == ND_OPT_RDNSS || + return opt->nd_opt_type == ND_OPT_PREFIX_INFO || + opt->nd_opt_type == ND_OPT_RDNSS || opt->nd_opt_type == ND_OPT_DNSSL || opt->nd_opt_type == ND_OPT_CAPTIVE_PORTAL || opt->nd_opt_type == ND_OPT_PREF64 ||
From: Andrea Claudi aclaudi@redhat.com
commit aaf2123a5cf46dbd97f84b6eee80269758064d93 upstream.
mptcp_join 'delete and re-add' test fails when using ip mptcp:
$ ./mptcp_join.sh -iI <snip> 002 delete and re-add before delete[ ok ] mptcp_info subflows=1 [ ok ] Error: argument "ADDRESS" is wrong: invalid for non-zero id address after delete[fail] got 2:2 subflows expected 1
This happens because endpoint delete includes an ip address while id is not 0, contrary to what is indicated in the ip mptcp man page:
"When used with the delete id operation, an IFADDR is only included when the ID is 0."
This fixes the issue using the $addr variable in pm_nl_del_endpoint() only when id is 0.
Fixes: 34aa6e3bccd8 ("selftests: mptcp: add ip mptcp wrappers") Cc: stable@vger.kernel.org Signed-off-by: Andrea Claudi aclaudi@redhat.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-1... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 1 + 1 file changed, 1 insertion(+)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -676,6 +676,7 @@ pm_nl_del_endpoint() local addr=$3
if [ $ip_mptcp -eq 1 ]; then + [ $id -ne 0 ] && addr='' ip -n $ns mptcp endpoint delete id $id $addr else ip netns exec $ns ./pm_nl_ctl del $id $addr
From: Andrea Claudi aclaudi@redhat.com
commit c8c101ae390a3e817369e94a6f12a1ddea420702 upstream.
mptcp_join 'implicit EP' test currently fails when using ip mptcp:
$ ./mptcp_join.sh -iI <snip> 001 implicit EP creation[fail] expected '10.0.2.2 10.0.2.2 id 1 implicit' found '10.0.2.2 id 1 rawflags 10 ' Error: too many addresses or duplicate one: -22. ID change is prevented[fail] expected '10.0.2.2 10.0.2.2 id 1 implicit' found '10.0.2.2 id 1 rawflags 10 ' modif is allowed[fail] expected '10.0.2.2 10.0.2.2 id 1 signal' found '10.0.2.2 id 1 signal '
This happens because of two reasons: - iproute v6.3.0 does not support the implicit flag, fixed with iproute2-next commit 3a2535a41854 ("mptcp: add support for implicit flag") - pm_nl_check_endpoint wrongly expects the ip address to be repeated two times in iproute output, and does not account for a final whitespace in it.
This fixes the issue trimming the whitespace in the output string and removing the double address in the expected string.
Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case") Cc: stable@vger.kernel.org Signed-off-by: Andrea Claudi aclaudi@redhat.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-2... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -767,10 +767,11 @@ pm_nl_check_endpoint() fi
if [ $ip_mptcp -eq 1 ]; then + # get line and trim trailing whitespace line=$(ip -n $ns mptcp endpoint show $id) + line="${line% }" # the dump order is: address id flags port dev - expected_line="$addr" - [ -n "$addr" ] && expected_line="$expected_line $addr" + [ -n "$addr" ] && expected_line="$addr" expected_line="$expected_line $id" [ -n "$_flags" ] && expected_line="$expected_line ${_flags//","/" "}" [ -n "$dev" ] && expected_line="$expected_line $dev"
From: Paolo Abeni pabeni@redhat.com
commit ff18f9ef30ee87740f741b964375d0cfb84e1ec2 upstream.
Since the blamed commit, the MPTCP protocol unconditionally sends TCP resets on all the subflows on disconnect().
That fits full-blown MPTCP sockets - to implement the fastclose mechanism - but causes unexpected corruption of the data stream, caught as sporadic self-tests failures.
Fixes: d21f83485518 ("mptcp: use fastclose on more edge scenarios") Cc: stable@vger.kernel.org Tested-by: Matthieu Baerts matthieu.baerts@tessares.net Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/419 Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-3... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2335,7 +2335,7 @@ static void __mptcp_close_ssk(struct soc
lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
- if (flags & MPTCP_CF_FASTCLOSE) { + if ((flags & MPTCP_CF_FASTCLOSE) && !__mptcp_check_fallback(msk)) { /* be sure to force the tcp_disconnect() path, * to generate the egress reset */
From: Paolo Abeni pabeni@redhat.com
commit 511b90e39250135a7f900f1c3afbce25543018a2 upstream.
Despite commit 0ad529d9fd2b ("mptcp: fix possible divide by zero in recvmsg()"), the mptcp protocol is still prone to a race between disconnect() (or shutdown) and accept.
The root cause is that the mentioned commit checks the msk-level flag, but mptcp_stream_accept() does acquire the msk-level lock, as it can rely directly on the first subflow lock.
As reported by Christoph than can lead to a race where an msk socket is accepted after that mptcp_subflow_queue_clean() releases the listener socket lock and just before it takes destructive actions leading to the following splat:
BUG: kernel NULL pointer dereference, address: 0000000000000012 PGD 5a4ca067 P4D 5a4ca067 PUD 37d4c067 PMD 0 Oops: 0000 [#1] PREEMPT SMP CPU: 2 PID: 10955 Comm: syz-executor.5 Not tainted 6.5.0-rc1-gdc7b257ee5dd #37 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 RIP: 0010:mptcp_stream_accept+0x1ee/0x2f0 include/net/inet_sock.h:330 Code: 0a 09 00 48 8b 1b 4c 39 e3 74 07 e8 bc 7c 7f fe eb a1 e8 b5 7c 7f fe 4c 8b 6c 24 08 eb 05 e8 a9 7c 7f fe 49 8b 85 d8 09 00 00 <0f> b6 40 12 88 44 24 07 0f b6 6c 24 07 bf 07 00 00 00 89 ee e8 89 RSP: 0018:ffffc90000d07dc0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff888037e8d020 RCX: ffff88803b093300 RDX: 0000000000000000 RSI: ffffffff833822c5 RDI: ffffffff8333896a RBP: 0000607f82031520 R08: ffff88803b093300 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000003e83 R12: ffff888037e8d020 R13: ffff888037e8c680 R14: ffff888009af7900 R15: ffff888009af6880 FS: 00007fc26d708640(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000012 CR3: 0000000066bc5001 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> do_accept+0x1ae/0x260 net/socket.c:1872 __sys_accept4+0x9b/0x110 net/socket.c:1913 __do_sys_accept4 net/socket.c:1954 [inline] __se_sys_accept4 net/socket.c:1951 [inline] __x64_sys_accept4+0x20/0x30 net/socket.c:1951 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Address the issue by temporary removing the pending request socket from the accept queue, so that racing accept() can't touch them.
After depleting the msk - the ssk still exists, as plain TCP sockets, re-insert them into the accept queue, so that later inet_csk_listen_stop() will complete the tcp socket disposal.
Fixes: 2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch cpaasch@apple.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/423 Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-4... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.h | 1 net/mptcp/subflow.c | 60 +++++++++++++++++++++++++-------------------------- 2 files changed, 30 insertions(+), 31 deletions(-)
--- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -320,7 +320,6 @@ struct mptcp_sock {
u32 setsockopt_seq; char ca_name[TCP_CA_NAME_MAX]; - struct mptcp_sock *dl_next; };
#define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock) --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1785,16 +1785,31 @@ static void subflow_state_change(struct void mptcp_subflow_queue_clean(struct sock *listener_sk, struct sock *listener_ssk) { struct request_sock_queue *queue = &inet_csk(listener_ssk)->icsk_accept_queue; - struct mptcp_sock *msk, *next, *head = NULL; - struct request_sock *req; - struct sock *sk; - - /* build a list of all unaccepted mptcp sockets */ + struct request_sock *req, *head, *tail; + struct mptcp_subflow_context *subflow; + struct sock *sk, *ssk; + + /* Due to lock dependencies no relevant lock can be acquired under rskq_lock. + * Splice the req list, so that accept() can not reach the pending ssk after + * the listener socket is released below. + */ spin_lock_bh(&queue->rskq_lock); - for (req = queue->rskq_accept_head; req; req = req->dl_next) { - struct mptcp_subflow_context *subflow; - struct sock *ssk = req->sk; + head = queue->rskq_accept_head; + tail = queue->rskq_accept_tail; + queue->rskq_accept_head = NULL; + queue->rskq_accept_tail = NULL; + spin_unlock_bh(&queue->rskq_lock); + + if (!head) + return; + + /* can't acquire the msk socket lock under the subflow one, + * or will cause ABBA deadlock + */ + release_sock(listener_ssk);
+ for (req = head; req; req = req->dl_next) { + ssk = req->sk; if (!sk_is_mptcp(ssk)) continue;
@@ -1802,32 +1817,10 @@ void mptcp_subflow_queue_clean(struct so if (!subflow || !subflow->conn) continue;
- /* skip if already in list */ sk = subflow->conn; - msk = mptcp_sk(sk); - if (msk->dl_next || msk == head) - continue; - sock_hold(sk); - msk->dl_next = head; - head = msk; - } - spin_unlock_bh(&queue->rskq_lock); - if (!head) - return; - - /* can't acquire the msk socket lock under the subflow one, - * or will cause ABBA deadlock - */ - release_sock(listener_ssk); - - for (msk = head; msk; msk = next) { - sk = (struct sock *)msk;
lock_sock_nested(sk, SINGLE_DEPTH_NESTING); - next = msk->dl_next; - msk->dl_next = NULL; - __mptcp_unaccepted_force_close(sk); release_sock(sk);
@@ -1851,6 +1844,13 @@ void mptcp_subflow_queue_clean(struct so
/* we are still under the listener msk socket lock */ lock_sock_nested(listener_ssk, SINGLE_DEPTH_NESTING); + + /* restore the listener queue, to let the TCP code clean it up */ + spin_lock_bh(&queue->rskq_lock); + WARN_ON_ONCE(queue->rskq_accept_head); + queue->rskq_accept_head = head; + queue->rskq_accept_tail = tail; + spin_unlock_bh(&queue->rskq_lock); }
static int subflow_ulp_init(struct sock *sk)
From: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
commit 8cda3ececf07d374774f6a13e5a94bc2dc04c26c upstream.
pl330_pause() does not set anything to indicate paused condition which causes pl330_tx_status() to return DMA_IN_PROGRESS. This breaks 8250 DMA flush after the fix in commit 57e9af7831dc ("serial: 8250_dma: Fix DMA Rx rearm race"). The function comment for pl330_pause() claims pause is supported but resume is not which is enough for 8250 DMA flush to work as long as DMA status reports DMA_PAUSED when appropriate.
Add PAUSED state for descriptor and mark BUSY descriptors with PAUSED in pl330_pause(). Return DMA_PAUSED from pl330_tx_status() when the descriptor is PAUSED.
Reported-by: Richard Tresidder rtresidd@electromag.com.au Tested-by: Richard Tresidder rtresidd@electromag.com.au Fixes: 88987d2c7534 ("dmaengine: pl330: add DMA_PAUSE feature") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-serial/f8a86ecd-64b1-573f-c2fa-59f541083f1a@el... Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/r/20230526105434.14959-1-ilpo.jarvinen@linux.intel.c... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/pl330.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
--- a/drivers/dma/pl330.c +++ b/drivers/dma/pl330.c @@ -404,6 +404,12 @@ enum desc_status { */ BUSY, /* + * Pause was called while descriptor was BUSY. Due to hardware + * limitations, only termination is possible for descriptors + * that have been paused. + */ + PAUSED, + /* * Sitting on the channel work_list but xfer done * by PL330 core */ @@ -2041,7 +2047,7 @@ static inline void fill_queue(struct dma list_for_each_entry(desc, &pch->work_list, node) {
/* If already submitted */ - if (desc->status == BUSY) + if (desc->status == BUSY || desc->status == PAUSED) continue;
ret = pl330_submit_req(pch->thread, desc); @@ -2326,6 +2332,7 @@ static int pl330_pause(struct dma_chan * { struct dma_pl330_chan *pch = to_pchan(chan); struct pl330_dmac *pl330 = pch->dmac; + struct dma_pl330_desc *desc; unsigned long flags;
pm_runtime_get_sync(pl330->ddma.dev); @@ -2335,6 +2342,10 @@ static int pl330_pause(struct dma_chan * _stop(pch->thread); spin_unlock(&pl330->lock);
+ list_for_each_entry(desc, &pch->work_list, node) { + if (desc->status == BUSY) + desc->status = PAUSED; + } spin_unlock_irqrestore(&pch->lock, flags); pm_runtime_mark_last_busy(pl330->ddma.dev); pm_runtime_put_autosuspend(pl330->ddma.dev); @@ -2425,7 +2436,7 @@ pl330_tx_status(struct dma_chan *chan, d else if (running && desc == running) transferred = pl330_get_current_xferred_count(pch, desc); - else if (desc->status == BUSY) + else if (desc->status == BUSY || desc->status == PAUSED) /* * Busy but not running means either just enqueued, * or finished and not yet marked done @@ -2442,6 +2453,9 @@ pl330_tx_status(struct dma_chan *chan, d case DONE: ret = DMA_COMPLETE; break; + case PAUSED: + ret = DMA_PAUSED; + break; case PREP: case BUSY: ret = DMA_IN_PROGRESS;
From: Miquel Raynal miquel.raynal@bootlin.com
commit 96891e90d1256b569b1c183e7c9a0cfc568fa3b0 upstream.
A couple of hardware registers need to be set to reflect which interrupts have been allocated to the device. Each register is 32-bit wide and can receive four 8-bit values. If we provide any other interrupt number than four, the irq_num variable will never be 0 within the while check and the while block will loop forever.
There is an easy way to prevent this: just break the for loop when we reach "irq_num == 0", which anyway means all interrupts have been processed.
Cc: stable@vger.kernel.org Fixes: 17ce252266c7 ("dmaengine: xilinx: xdma: Add xilinx xdma driver") Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Acked-by: Lizhi Hou lizhi.hou@amd.com Link: https://lore.kernel.org/r/20230731101442.792514-2-miquel.raynal@bootlin.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/xilinx/xdma.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/dma/xilinx/xdma.c b/drivers/dma/xilinx/xdma.c index ad5ff63354cf..5116188b9977 100644 --- a/drivers/dma/xilinx/xdma.c +++ b/drivers/dma/xilinx/xdma.c @@ -668,6 +668,8 @@ static int xdma_set_vector_reg(struct xdma_device *xdev, u32 vec_tbl_start, val |= irq_start << shift; irq_start++; irq_num--; + if (!irq_num) + break; }
/* write IRQ register */
From: Souradeep Chakrabarti schakrabarti@linux.microsoft.com
commit a7dfeda6fdeccab4c7c3dce9a72c4262b9530c80 upstream.
When unloading the MANA driver, mana_dealloc_queues() waits for the MANA hardware to complete any inflight packets and set the pending send count to zero. But if the hardware has failed, mana_dealloc_queues() could wait forever.
Fix this by adding a timeout to the wait. Set the timeout to 120 seconds, which is a somewhat arbitrary value that is more than long enough for functional hardware to complete any sends.
Cc: stable@vger.kernel.org Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Souradeep Chakrabarti schakrabarti@linux.microsoft.com Link: https://lore.kernel.org/r/1691576525-24271-1-git-send-email-schakrabarti@lin... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/microsoft/mana/mana_en.c | 37 +++++++++++++++++++++++--- 1 file changed, 33 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -8,6 +8,7 @@ #include <linux/ethtool.h> #include <linux/filter.h> #include <linux/mm.h> +#include <linux/pci.h>
#include <net/checksum.h> #include <net/ip6_checksum.h> @@ -2328,9 +2329,12 @@ int mana_attach(struct net_device *ndev) static int mana_dealloc_queues(struct net_device *ndev) { struct mana_port_context *apc = netdev_priv(ndev); + unsigned long timeout = jiffies + 120 * HZ; struct gdma_dev *gd = apc->ac->gdma_dev; struct mana_txq *txq; + struct sk_buff *skb; int i, err; + u32 tsleep;
if (apc->port_is_up) return -EINVAL; @@ -2346,15 +2350,40 @@ static int mana_dealloc_queues(struct ne * to false, but it doesn't matter since mana_start_xmit() drops any * new packets due to apc->port_is_up being false. * - * Drain all the in-flight TX packets + * Drain all the in-flight TX packets. + * A timeout of 120 seconds for all the queues is used. + * This will break the while loop when h/w is not responding. + * This value of 120 has been decided here considering max + * number of queues. */ + for (i = 0; i < apc->num_queues; i++) { txq = &apc->tx_qp[i].txq; - - while (atomic_read(&txq->pending_sends) > 0) - usleep_range(1000, 2000); + tsleep = 1000; + while (atomic_read(&txq->pending_sends) > 0 && + time_before(jiffies, timeout)) { + usleep_range(tsleep, tsleep + 1000); + tsleep <<= 1; + } + if (atomic_read(&txq->pending_sends)) { + err = pcie_flr(to_pci_dev(gd->gdma_context->dev)); + if (err) { + netdev_err(ndev, "flr failed %d with %d pkts pending in txq %u\n", + err, atomic_read(&txq->pending_sends), + txq->gdma_txq_id); + } + break; + } }
+ for (i = 0; i < apc->num_queues; i++) { + txq = &apc->tx_qp[i].txq; + while ((skb = skb_dequeue(&txq->pending_skbs))) { + mana_unmap_skb(skb, apc); + dev_kfree_skb_any(skb); + } + atomic_set(&txq->pending_sends, 0); + } /* We're 100% sure the queues can no longer be woken up, because * we're sure now mana_poll_tx_cq() can't be running. */
From: Hans de Goede hdegoede@redhat.com
commit 2d331a6ac4815e2e2fe5f2d80d908566e57797cc upstream.
Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") is causing keyboard problems for quite a log of AMD based laptop users, leading to many bug reports.
Revert this change for now, until we can come up with a better fix for the PS/2 IRQ trigger-type/polarity problems on some x86 laptops.
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2228891 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217718 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217731 Cc: All applicable stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/resource.c | 60 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 1dd8d5aebf67..0800a9d77558 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -470,6 +470,52 @@ static const struct dmi_system_id asus_laptop[] = { { } };
+static const struct dmi_system_id lenovo_laptop[] = { + { + .ident = "LENOVO IdeaPad Flex 5 14ALC7", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "82R9"), + }, + }, + { + .ident = "LENOVO IdeaPad Flex 5 16ALC7", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "82RA"), + }, + }, + { } +}; + +static const struct dmi_system_id tongfang_gm_rg[] = { + { + .ident = "TongFang GMxRGxx/XMG CORE 15 (M22)/TUXEDO Stellaris 15 Gen4 AMD", + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"), + }, + }, + { } +}; + +static const struct dmi_system_id maingear_laptop[] = { + { + .ident = "MAINGEAR Vector Pro 2 15", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"), + DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-15A3070T"), + } + }, + { + .ident = "MAINGEAR Vector Pro 2 17", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"), + DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-17A3070T"), + }, + }, + { } +}; + static const struct dmi_system_id lg_laptop[] = { { .ident = "LG Electronics 17U70P", @@ -493,6 +539,10 @@ struct irq_override_cmp { static const struct irq_override_cmp override_table[] = { { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, + { lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, + { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { lg_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, };
@@ -512,6 +562,16 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity, return entry->override; }
+#ifdef CONFIG_X86 + /* + * IRQ override isn't needed on modern AMD Zen systems and + * this override breaks active low IRQs on AMD Ryzen 6000 and + * newer systems. Skip it. + */ + if (boot_cpu_has(X86_FEATURE_ZEN)) + return false; +#endif + return true; }
From: Hans de Goede hdegoede@redhat.com
commit 9728ac221160c5ea111879125a7694bb81364720 upstream.
All the cases, were the DSDT IRQ settings should be used instead of the MADT override, are for IRQ 1 or 12, the PS/2 kbd resp. mouse IRQs.
Simplify things by always honering the override for other legacy IRQs (for non DMI quirked cases).
This allows removing the DMI quirks to honor the override for some non i8042 IRQs on some AMD ZEN based Lenovo models.
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Cc: All applicable stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/resource.c | 28 ++++++++-------------------- 1 file changed, 8 insertions(+), 20 deletions(-)
--- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -470,24 +470,6 @@ static const struct dmi_system_id asus_l { } };
-static const struct dmi_system_id lenovo_laptop[] = { - { - .ident = "LENOVO IdeaPad Flex 5 14ALC7", - .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), - DMI_MATCH(DMI_PRODUCT_NAME, "82R9"), - }, - }, - { - .ident = "LENOVO IdeaPad Flex 5 16ALC7", - .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), - DMI_MATCH(DMI_PRODUCT_NAME, "82RA"), - }, - }, - { } -}; - static const struct dmi_system_id tongfang_gm_rg[] = { { .ident = "TongFang GMxRGxx/XMG CORE 15 (M22)/TUXEDO Stellaris 15 Gen4 AMD", @@ -539,8 +521,6 @@ struct irq_override_cmp { static const struct irq_override_cmp override_table[] = { { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, - { lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, - { lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { lg_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, @@ -564,6 +544,14 @@ static bool acpi_dev_irq_override(u32 gs
#ifdef CONFIG_X86 /* + * Always use the MADT override info, except for the i8042 PS/2 ctrl + * IRQs (1 and 12). For these the DSDT IRQ settings should sometimes + * be used otherwise PS/2 keyboards / mice will not work. + */ + if (gsi != 1 && gsi != 12) + return true; + + /* * IRQ override isn't needed on modern AMD Zen systems and * this override breaks active low IRQs on AMD Ryzen 6000 and * newer systems. Skip it.
From: Hans de Goede hdegoede@redhat.com
commit c6a1fd910d1bf8a0e3db7aebb229e3c81bc305c4 upstream.
On AMD Zen acpi_dev_irq_override() by default prefers the DSDT IRQ 1 settings over the MADT settings.
This causes the keyboard to malfunction on some laptop models (see Links), all models from the Links have an INT_SRC_OVR MADT entry for IRQ 1.
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.kernel.org/show_bug.cgi?id=217336 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217394 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217406 Cc: All applicable stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/acpi.h | 2 ++ arch/x86/kernel/acpi/boot.c | 4 ++++ drivers/acpi/resource.c | 4 ++++ 3 files changed, 10 insertions(+)
diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h index 8eb74cf386db..2888c0ee4df0 100644 --- a/arch/x86/include/asm/acpi.h +++ b/arch/x86/include/asm/acpi.h @@ -15,6 +15,7 @@ #include <asm/mpspec.h> #include <asm/x86_init.h> #include <asm/cpufeature.h> +#include <asm/irq_vectors.h>
#ifdef CONFIG_ACPI_APEI # include <asm/pgtable_types.h> @@ -31,6 +32,7 @@ extern int acpi_skip_timer_override; extern int acpi_use_timer_override; extern int acpi_fix_pin2_polarity; extern int acpi_disable_cmcff; +extern bool acpi_int_src_ovr[NR_IRQS_LEGACY];
extern u8 acpi_sci_flags; extern u32 acpi_sci_override_gsi; diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c index 21b542a6866c..53369c57751e 100644 --- a/arch/x86/kernel/acpi/boot.c +++ b/arch/x86/kernel/acpi/boot.c @@ -52,6 +52,7 @@ int acpi_lapic; int acpi_ioapic; int acpi_strict; int acpi_disable_cmcff; +bool acpi_int_src_ovr[NR_IRQS_LEGACY];
/* ACPI SCI override configuration */ u8 acpi_sci_flags __initdata; @@ -588,6 +589,9 @@ acpi_parse_int_src_ovr(union acpi_subtable_headers * header,
acpi_table_print_madt_entry(&header->common);
+ if (intsrc->source_irq < NR_IRQS_LEGACY) + acpi_int_src_ovr[intsrc->source_irq] = true; + if (intsrc->source_irq == acpi_gbl_FADT.sci_interrupt) { acpi_sci_ioapic_setup(intsrc->source_irq, intsrc->inti_flags & ACPI_MADT_POLARITY_MASK, diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 380cda1e86f4..8e32dd5776f5 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -551,6 +551,10 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity, if (gsi != 1 && gsi != 12) return true;
+ /* If the override comes from an INT_SRC_OVR MADT entry, honor it. */ + if (acpi_int_src_ovr[gsi]) + return true; + /* * IRQ override isn't needed on modern AMD Zen systems and * this override breaks active low IRQs on AMD Ryzen 6000 and
From: Hans de Goede hdegoede@redhat.com
commit 56fec0051a69ace182ca3fba47be9c13038b4e3f upstream.
The PCSpecialist Elimina Pro 16 M laptop model is a Zen laptop which needs to use the MADT IRQ settings override and which does not have an INT_SRC_OVR entry for IRQ 1 in its MADT.
So this model needs a DMI quirk to enable the MADT IRQ settings override to fix its keyboard not working.
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.kernel.org/show_bug.cgi?id=217394#c18 Cc: All applicable stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/resource.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 8e32dd5776f5..a4d9f149b48d 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -498,6 +498,17 @@ static const struct dmi_system_id maingear_laptop[] = { { } };
+static const struct dmi_system_id pcspecialist_laptop[] = { + { + .ident = "PCSpecialist Elimina Pro 16 M", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "PCSpecialist"), + DMI_MATCH(DMI_PRODUCT_NAME, "Elimina Pro 16 M"), + }, + }, + { } +}; + static const struct dmi_system_id lg_laptop[] = { { .ident = "LG Electronics 17U70P", @@ -523,6 +534,7 @@ static const struct irq_override_cmp override_table[] = { { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, + { pcspecialist_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { lg_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, };
From: Christoph Hellwig hch@lst.de
commit 95848dcb9d676738411a8ff70a9704039f1b3982 upstream.
Commit af8b04c63708 ("zram: simplify bvec iteration in __zram_make_request") changed the bio iteration in zram to rely on the implicit capping to page boundaries in bio_for_each_segment. But it failed to care for the fact zram not only care about the page alignment of the bio payload, but also the page alignment into the device. For buffered I/O and swap those are the same, but for direct I/O or kernel internal I/O like XFS log buffer writes they can differ.
Fix this by open coding bio_for_each_segment and limiting the bvec len so that it never crosses over a page alignment boundary in the device in addition to the payload boundary already taken care of by bio_iter_iovec.
Cc: stable@vger.kernel.org Fixes: af8b04c63708 ("zram: simplify bvec iteration in __zram_make_request") Reported-by: Dusty Mabe dusty@dustymabe.com Signed-off-by: Christoph Hellwig hch@lst.de Acked-by: Sergey Senozhatsky senozhatsky@chromium.org Link: https://lore.kernel.org/r/20230805055537.147835-1-hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/zram/zram_drv.c | 32 ++++++++++++++++++++------------ 1 file changed, 20 insertions(+), 12 deletions(-)
--- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -1870,15 +1870,16 @@ static void zram_bio_discard(struct zram
static void zram_bio_read(struct zram *zram, struct bio *bio) { - struct bvec_iter iter; - struct bio_vec bv; - unsigned long start_time; + unsigned long start_time = bio_start_io_acct(bio); + struct bvec_iter iter = bio->bi_iter;
- start_time = bio_start_io_acct(bio); - bio_for_each_segment(bv, bio, iter) { + do { u32 index = iter.bi_sector >> SECTORS_PER_PAGE_SHIFT; u32 offset = (iter.bi_sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT; + struct bio_vec bv = bio_iter_iovec(bio, iter); + + bv.bv_len = min_t(u32, bv.bv_len, PAGE_SIZE - offset);
if (zram_bvec_read(zram, &bv, index, offset, bio) < 0) { atomic64_inc(&zram->stats.failed_reads); @@ -1890,22 +1891,26 @@ static void zram_bio_read(struct zram *z zram_slot_lock(zram, index); zram_accessed(zram, index); zram_slot_unlock(zram, index); - } + + bio_advance_iter_single(bio, &iter, bv.bv_len); + } while (iter.bi_size); + bio_end_io_acct(bio, start_time); bio_endio(bio); }
static void zram_bio_write(struct zram *zram, struct bio *bio) { - struct bvec_iter iter; - struct bio_vec bv; - unsigned long start_time; + unsigned long start_time = bio_start_io_acct(bio); + struct bvec_iter iter = bio->bi_iter;
- start_time = bio_start_io_acct(bio); - bio_for_each_segment(bv, bio, iter) { + do { u32 index = iter.bi_sector >> SECTORS_PER_PAGE_SHIFT; u32 offset = (iter.bi_sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT; + struct bio_vec bv = bio_iter_iovec(bio, iter); + + bv.bv_len = min_t(u32, bv.bv_len, PAGE_SIZE - offset);
if (zram_bvec_write(zram, &bv, index, offset, bio) < 0) { atomic64_inc(&zram->stats.failed_writes); @@ -1916,7 +1921,10 @@ static void zram_bio_write(struct zram * zram_slot_lock(zram, index); zram_accessed(zram, index); zram_slot_unlock(zram, index); - } + + bio_advance_iter_single(bio, &iter, bv.bv_len); + } while (iter.bi_size); + bio_end_io_acct(bio, start_time); bio_endio(bio); }
From: Helge Deller deller@gmx.de
commit 56675f8b9f9b15b024b8e3145fa289b004916ab7 upstream.
The changes from commit 32832a407a71 ("io_uring: Fix io_uring mmap() by using architecture-provided get_unmapped_area()") to the parisc implementation of get_unmapped_area() broke glibc's locale-gen executable when running on parisc.
This patch reverts those architecture-specific changes, and instead adjusts in io_uring_mmu_get_unmapped_area() the pgoff offset which is then given to parisc's get_unmapped_area() function. This is much cleaner than the previous approach, and we still will get a coherent addresss.
This patch has no effect on other architectures (SHM_COLOUR is only defined on parisc), and the liburing testcase stil passes on parisc.
Cc: stable@vger.kernel.org # 6.4 Signed-off-by: Helge Deller deller@gmx.de Reported-by: Christoph Biedl linux-kernel.bfrz@manchmal.in-ulm.de Fixes: 32832a407a71 ("io_uring: Fix io_uring mmap() by using architecture-provided get_unmapped_area()") Fixes: d808459b2e31 ("io_uring: Adjust mapping wrt architecture aliasing requirements") Link: https://lore.kernel.org/r/ZNEyGV0jyI8kOOfz@p100 Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/kernel/sys_parisc.c | 15 +++++---------- io_uring/io_uring.c | 3 +++ 2 files changed, 8 insertions(+), 10 deletions(-)
--- a/arch/parisc/kernel/sys_parisc.c +++ b/arch/parisc/kernel/sys_parisc.c @@ -26,17 +26,12 @@ #include <linux/compat.h>
/* - * Construct an artificial page offset for the mapping based on the virtual + * Construct an artificial page offset for the mapping based on the physical * address of the kernel file mapping variable. - * If filp is zero the calculated pgoff value aliases the memory of the given - * address. This is useful for io_uring where the mapping shall alias a kernel - * address and a userspace adress where both the kernel and the userspace - * access the same memory region. */ -#define GET_FILP_PGOFF(filp, addr) \ - ((filp ? (((unsigned long) filp->f_mapping) >> 8) \ - & ((SHM_COLOUR-1) >> PAGE_SHIFT) : 0UL) \ - + (addr >> PAGE_SHIFT)) +#define GET_FILP_PGOFF(filp) \ + (filp ? (((unsigned long) filp->f_mapping) >> 8) \ + & ((SHM_COLOUR-1) >> PAGE_SHIFT) : 0UL)
static unsigned long shared_align_offset(unsigned long filp_pgoff, unsigned long pgoff) @@ -116,7 +111,7 @@ static unsigned long arch_get_unmapped_a do_color_align = 0; if (filp || (flags & MAP_SHARED)) do_color_align = 1; - filp_pgoff = GET_FILP_PGOFF(filp, addr); + filp_pgoff = GET_FILP_PGOFF(filp);
if (flags & MAP_FIXED) { /* Even MAP_FIXED mappings must reside within TASK_SIZE */ --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -3466,6 +3466,8 @@ static unsigned long io_uring_mmu_get_un * - use the kernel virtual address of the shared io_uring context * (instead of the userspace-provided address, which has to be 0UL * anyway). + * - use the same pgoff which the get_unmapped_area() uses to + * calculate the page colouring. * For architectures without such aliasing requirements, the * architecture will return any suitable mapping because addr is 0. */ @@ -3474,6 +3476,7 @@ static unsigned long io_uring_mmu_get_un pgoff = 0; /* has been translated to ptr above */ #ifdef SHM_COLOUR addr = (uintptr_t) ptr; + pgoff = addr >> PAGE_SHIFT; #else addr = 0UL; #endif
Greg Kroah-Hartman wrote...
From: Helge Deller deller@gmx.de
commit 56675f8b9f9b15b024b8e3145fa289b004916ab7 upstream.
(...)
This patch has no effect on other architectures (SHM_COLOUR is only defined on parisc), and the liburing testcase stil passes on parisc.
Confirmed: Both issues reported to linux-parisc earlier are fixed now.
Christoph (very busy fighting time thieves)
From: Helge Deller deller@gmx.de
commit a0f4b7879f2e14986200747d1b545e5daac8c624 upstream.
The lightweight spinlock checks verify that a spinlock has either value 0 (spinlock locked) and that not any other bits than in __ARCH_SPIN_LOCK_UNLOCKED_VAL is set.
This breaks the current LWS code, which writes the address of the lock into the lock word to unlock it, which was an optimization to save one assembler instruction.
Fix it by making spinlock_types.h accessible for asm code, change the LWS spinlock-unlocking code to write __ARCH_SPIN_LOCK_UNLOCKED_VAL into the lock word, and add some missing lightweight spinlock checks to the LWS path. Finally, make the spinlock checks dependend on DEBUG_KERNEL.
Noticed-by: John David Anglin dave.anglin@bell.net Signed-off-by: Helge Deller deller@gmx.de Tested-by: John David Anglin dave.anglin@bell.net Cc: stable@vger.kernel.org # v6.4+ Fixes: 15e64ef6520e ("parisc: Add lightweight spinlock checks") Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/Kconfig.debug | 2 +- arch/parisc/include/asm/spinlock.h | 2 -- arch/parisc/include/asm/spinlock_types.h | 6 ++++++ arch/parisc/kernel/syscall.S | 23 ++++++++++++++++++++--- 4 files changed, 27 insertions(+), 6 deletions(-)
diff --git a/arch/parisc/Kconfig.debug b/arch/parisc/Kconfig.debug index 1401e4c5fe5f..bf2b21b96f0b 100644 --- a/arch/parisc/Kconfig.debug +++ b/arch/parisc/Kconfig.debug @@ -2,7 +2,7 @@ # config LIGHTWEIGHT_SPINLOCK_CHECK bool "Enable lightweight spinlock checks" - depends on SMP && !DEBUG_SPINLOCK + depends on DEBUG_KERNEL && SMP && !DEBUG_SPINLOCK default y help Add checks with low performance impact to the spinlock functions diff --git a/arch/parisc/include/asm/spinlock.h b/arch/parisc/include/asm/spinlock.h index edfcb9858bcb..0b326e52255e 100644 --- a/arch/parisc/include/asm/spinlock.h +++ b/arch/parisc/include/asm/spinlock.h @@ -7,8 +7,6 @@ #include <asm/processor.h> #include <asm/spinlock_types.h>
-#define SPINLOCK_BREAK_INSN 0x0000c006 /* break 6,6 */ - static inline void arch_spin_val_check(int lock_val) { if (IS_ENABLED(CONFIG_LIGHTWEIGHT_SPINLOCK_CHECK)) diff --git a/arch/parisc/include/asm/spinlock_types.h b/arch/parisc/include/asm/spinlock_types.h index d65934079ebd..efd06a897c6a 100644 --- a/arch/parisc/include/asm/spinlock_types.h +++ b/arch/parisc/include/asm/spinlock_types.h @@ -4,6 +4,10 @@
#define __ARCH_SPIN_LOCK_UNLOCKED_VAL 0x1a46
+#define SPINLOCK_BREAK_INSN 0x0000c006 /* break 6,6 */ + +#ifndef __ASSEMBLY__ + typedef struct { #ifdef CONFIG_PA20 volatile unsigned int slock; @@ -27,6 +31,8 @@ typedef struct { volatile unsigned int counter; } arch_rwlock_t;
+#endif /* __ASSEMBLY__ */ + #define __ARCH_RW_LOCK_UNLOCKED__ 0x01000000 #define __ARCH_RW_LOCK_UNLOCKED { .lock_mutex = __ARCH_SPIN_LOCK_UNLOCKED, \ .counter = __ARCH_RW_LOCK_UNLOCKED__ } diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S index 1373e5129868..1f51aa9c8230 100644 --- a/arch/parisc/kernel/syscall.S +++ b/arch/parisc/kernel/syscall.S @@ -39,6 +39,7 @@ registers). #include <asm/assembly.h> #include <asm/processor.h> #include <asm/cache.h> +#include <asm/spinlock_types.h>
#include <linux/linkage.h>
@@ -66,6 +67,16 @@ registers). stw \reg1, 0(%sr2,\reg2) .endm
+ /* raise exception if spinlock content is not zero or + * __ARCH_SPIN_LOCK_UNLOCKED_VAL */ + .macro spinlock_check spin_val,tmpreg +#ifdef CONFIG_LIGHTWEIGHT_SPINLOCK_CHECK + ldi __ARCH_SPIN_LOCK_UNLOCKED_VAL, \tmpreg + andcm,= \spin_val, \tmpreg, %r0 + .word SPINLOCK_BREAK_INSN +#endif + .endm + .text
.import syscall_exit,code @@ -508,7 +519,8 @@ lws_start:
lws_exit_noerror: lws_pagefault_enable %r1,%r21 - stw,ma %r20, 0(%sr2,%r20) + ldi __ARCH_SPIN_LOCK_UNLOCKED_VAL, %r21 + stw,ma %r21, 0(%sr2,%r20) ssm PSW_SM_I, %r0 b lws_exit copy %r0, %r21 @@ -521,7 +533,8 @@ lws_wouldblock:
lws_pagefault: lws_pagefault_enable %r1,%r21 - stw,ma %r20, 0(%sr2,%r20) + ldi __ARCH_SPIN_LOCK_UNLOCKED_VAL, %r21 + stw,ma %r21, 0(%sr2,%r20) ssm PSW_SM_I, %r0 ldo 3(%r0),%r28 b lws_exit @@ -619,6 +632,7 @@ lws_compare_and_swap:
/* Try to acquire the lock */ LDCW 0(%sr2,%r20), %r28 + spinlock_check %r28, %r21 comclr,<> %r0, %r28, %r0 b,n lws_wouldblock
@@ -772,6 +786,7 @@ cas2_lock_start:
/* Try to acquire the lock */ LDCW 0(%sr2,%r20), %r28 + spinlock_check %r28, %r21 comclr,<> %r0, %r28, %r0 b,n lws_wouldblock
@@ -1001,6 +1016,7 @@ atomic_xchg_start:
/* Try to acquire the lock */ LDCW 0(%sr2,%r20), %r28 + spinlock_check %r28, %r21 comclr,<> %r0, %r28, %r0 b,n lws_wouldblock
@@ -1199,6 +1215,7 @@ atomic_store_start:
/* Try to acquire the lock */ LDCW 0(%sr2,%r20), %r28 + spinlock_check %r28, %r21 comclr,<> %r0, %r28, %r0 b,n lws_wouldblock
@@ -1330,7 +1347,7 @@ ENTRY(lws_lock_start) /* lws locks */ .rept 256 /* Keep locks aligned at 16-bytes */ - .word 1 + .word __ARCH_SPIN_LOCK_UNLOCKED_VAL .word 0 .word 0 .word 0
From: Alexandre Ghiti alexghiti@rivosinc.com
commit c3bcc65d4d2e8292c435322cbc34c318d06b8b6c upstream.
So that we do not end up mapping the whole linear mapping using 4K pages, which is slow at boot time, and also very likely at runtime.
So make sure we align the start of DRAM on a PMD boundary.
Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com Reported-by: Song Shuai suagrfillet@gmail.com Fixes: 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping") Tested-by: Song Shuai suagrfillet@gmail.com Link: https://lore.kernel.org/r/20230704121837.248976-1-alexghiti@rivosinc.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/mm/init.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index 9ce504737d18..ad845c3aa9b2 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -214,8 +214,13 @@ static void __init setup_bootmem(void) memblock_reserve(vmlinux_start, vmlinux_end - vmlinux_start);
phys_ram_end = memblock_end_of_DRAM(); + + /* + * Make sure we align the start of the memory on a PMD boundary so that + * at worst, we map the linear mapping with PMD mappings. + */ if (!IS_ENABLED(CONFIG_XIP_KERNEL)) - phys_ram_base = memblock_start_of_DRAM(); + phys_ram_base = memblock_start_of_DRAM() & PMD_MASK;
/* * In 64-bit, any use of __va/__pa before this point is wrong as we
From: Torsten Duwe duwe@suse.de
commit 49af7a2cd5f678217b8b4f86a29411aebebf3e78 upstream.
When initrd is loaded low, the secondary kernel fails like this:
INITRD: 0xdc581000+0x00eef000 overlaps in-use memory region
This initrd load address corresponds to the _end symbol, but the reservation is aligned on PMD_SIZE, as explained by a comment in setup_bootmem().
It is technically possible to align the initrd load address accordingly, leaving a hole between the end of kernel and the initrd, but it is much simpler to allocate the initrd top-down.
Fixes: 838b3e28488f ("RISC-V: Load purgatory in kexec_file") Signed-off-by: Torsten Duwe duwe@suse.de Signed-off-by: Petr Tesarik petr.tesarik.ext@huawei.com Cc: stable@vger.kernel.org Reviewed-by: Conor Dooley conor.dooley@microchip.com Link: https://lore.kernel.org/all/67c8eb9eea25717c2c8208d9bfbfaa39e6e2a1c6.1690365... Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/kernel/elf_kexec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/riscv/kernel/elf_kexec.c +++ b/arch/riscv/kernel/elf_kexec.c @@ -281,7 +281,7 @@ static void *elf_kexec_load(struct kimag kbuf.buffer = initrd; kbuf.bufsz = kbuf.memsz = initrd_len; kbuf.buf_align = PAGE_SIZE; - kbuf.top_down = false; + kbuf.top_down = true; kbuf.mem = KEXEC_BUF_MEM_UNKNOWN; ret = kexec_add_buffer(&kbuf); if (ret)
From: Andrea Parri parri.andrea@gmail.com
commit 4eb2eb1b4c0eb07793c240744843498564a67b83 upstream.
Section 2.1 of the Platform Specification [1] states:
Unless otherwise specified by a given I/O device, I/O devices are on ordering channel 0 (i.e., they are point-to-point strongly ordered).
which is not sufficient to guarantee that a readX() by a hart completes before a subsequent delay() on the same hart (cf. memory-barriers.txt, "Kernel I/O barrier effects").
Set the I(nput) bit in __io_ar() to restore the ordering, align inline comments.
[1] https://github.com/riscv/riscv-platform-specs
Signed-off-by: Andrea Parri parri.andrea@gmail.com Link: https://lore.kernel.org/r/20230803042738.5937-1-parri.andrea@gmail.com Fixes: fab957c11efe ("RISC-V: Atomic and Locking Code") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/include/asm/mmio.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/arch/riscv/include/asm/mmio.h +++ b/arch/riscv/include/asm/mmio.h @@ -101,9 +101,9 @@ static inline u64 __raw_readq(const vola * Relaxed I/O memory access primitives. These follow the Device memory * ordering rules but do not guarantee any ordering relative to Normal memory * accesses. These are defined to order the indicated access (either a read or - * write) with all other I/O memory accesses. Since the platform specification - * defines that all I/O regions are strongly ordered on channel 2, no explicit - * fences are required to enforce this ordering. + * write) with all other I/O memory accesses to the same peripheral. Since the + * platform specification defines that all I/O regions are strongly ordered on + * channel 0, no explicit fences are required to enforce this ordering. */ /* FIXME: These are now the same as asm-generic */ #define __io_rbr() do {} while (0) @@ -125,14 +125,14 @@ static inline u64 __raw_readq(const vola #endif
/* - * I/O memory access primitives. Reads are ordered relative to any - * following Normal memory access. Writes are ordered relative to any prior - * Normal memory access. The memory barriers here are necessary as RISC-V + * I/O memory access primitives. Reads are ordered relative to any following + * Normal memory read and delay() loop. Writes are ordered relative to any + * prior Normal memory write. The memory barriers here are necessary as RISC-V * doesn't define any ordering between the memory space and the I/O space. */ #define __io_br() do {} while (0) -#define __io_ar(v) __asm__ __volatile__ ("fence i,r" : : : "memory") -#define __io_bw() __asm__ __volatile__ ("fence w,o" : : : "memory") +#define __io_ar(v) ({ __asm__ __volatile__ ("fence i,ir" : : : "memory"); }) +#define __io_bw() ({ __asm__ __volatile__ ("fence w,o" : : : "memory"); }) #define __io_aw() mmiowb_set_pending()
#define readb(c) ({ u8 __v; __io_br(); __v = readb_cpu(c); __io_ar(__v); __v; })
From: Torsten Duwe duwe@suse.de
commit d0b4f95a51038becce4bdab4789aa7ce59d4ea6e upstream.
R_RISCV_CALL has been deprecated and replaced by R_RISCV_CALL_PLT. See Enum 18-19 in Table 3. Relocation types here:
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.a...
It was deprecated in ("Deprecated R_RISCV_CALL, prefer R_RISCV_CALL_PLT"):
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/commit/a0dced85018d7a0e...
Recent tools (at least GNU binutils-2.40) already use R_RISCV_CALL_PLT. Kernels built with such binutils fail kexec_load_file(2) with:
kexec_image: Unknown rela relocation: 19 kexec_image: Error loading purgatory ret=-8
The binary code at the call site remains the same, so tell arch_kexec_apply_relocations_add() to handle _PLT alike.
Fixes: 838b3e28488f ("RISC-V: Load purgatory in kexec_file") Signed-off-by: Torsten Duwe duwe@suse.de Signed-off-by: Petr Tesarik petr.tesarik.ext@huawei.com Cc: Li Zhengyu lizhengyu3@huawei.com Cc: stable@vger.kernel.org Reviewed-by: Conor Dooley conor.dooley@microchip.com Link: https://lore.kernel.org/all/b046b164af8efd33bbdb7d4003273bdf9196a5b0.1690365... Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/kernel/elf_kexec.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/riscv/kernel/elf_kexec.c +++ b/arch/riscv/kernel/elf_kexec.c @@ -425,6 +425,7 @@ int arch_kexec_apply_relocations_add(str * sym, instead of searching the whole relsec. */ case R_RISCV_PCREL_HI20: + case R_RISCV_CALL_PLT: case R_RISCV_CALL: *(u64 *)loc = CLEAN_IMM(UITYPE, *(u64 *)loc) | ENCODE_UJTYPE_IMM(val - addr);
From: Nick Desaulniers ndesaulniers@google.com
commit d2402048bc8a206a56fde4bc41dd01336c7b5a21 upstream.
I'm looking to enable -Wmissing-variable-declarations behind W=1. 0day bot spotted the following instance in ARCH=riscv builds:
arch/riscv/mm/init.c:276:7: warning: no previous extern declaration for non-static variable 'trampoline_pg_dir' [-Wmissing-variable-declarations] 276 | pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss; | ^ arch/riscv/mm/init.c:276:1: note: declare 'static' if the variable is not intended to be used outside of this translation unit 276 | pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss; | ^ arch/riscv/mm/init.c:279:7: warning: no previous extern declaration for non-static variable 'early_pg_dir' [-Wmissing-variable-declarations] 279 | pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE); | ^ arch/riscv/mm/init.c:279:1: note: declare 'static' if the variable is not intended to be used outside of this translation unit 279 | pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE); | ^
These symbols are referenced by more than one translation unit, so make sure they're both declared and include the correct header for their declarations. Finally, sort the list of includes to help keep them tidy.
Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/llvm/202308081000.tTL1ElTr-lkp@intel.com/ Signed-off-by: Nick Desaulniers ndesaulniers@google.com Link: https://lore.kernel.org/r/20230808-riscv_static-v2-1-2a1e2d2c7a4f@google.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/include/asm/pgtable.h | 2 ++ arch/riscv/mm/init.c | 9 +++++---- arch/riscv/mm/kasan_init.c | 1 - 3 files changed, 7 insertions(+), 5 deletions(-)
--- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -188,6 +188,8 @@ extern struct pt_alloc_ops pt_ops __init #define PAGE_KERNEL_IO __pgprot(_PAGE_IOREMAP)
extern pgd_t swapper_pg_dir[]; +extern pgd_t trampoline_pg_dir[]; +extern pgd_t early_pg_dir[];
#ifdef CONFIG_TRANSPARENT_HUGEPAGE static inline int pmd_present(pmd_t pmd) --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -26,12 +26,13 @@ #include <linux/kfence.h>
#include <asm/fixmap.h> -#include <asm/tlbflush.h> -#include <asm/sections.h> -#include <asm/soc.h> #include <asm/io.h> -#include <asm/ptdump.h> #include <asm/numa.h> +#include <asm/pgtable.h> +#include <asm/ptdump.h> +#include <asm/sections.h> +#include <asm/soc.h> +#include <asm/tlbflush.h>
#include "../kernel/head.h"
--- a/arch/riscv/mm/kasan_init.c +++ b/arch/riscv/mm/kasan_init.c @@ -22,7 +22,6 @@ * region is not and then we have to go down to the PUD level. */
-extern pgd_t early_pg_dir[PTRS_PER_PGD]; pgd_t tmp_pg_dir[PTRS_PER_PGD] __page_aligned_bss; p4d_t tmp_p4d[PTRS_PER_P4D] __page_aligned_bss; pud_t tmp_pud[PTRS_PER_PUD] __page_aligned_bss;
From: Ming Lei ming.lei@redhat.com
commit 1b95e817916069ec45a7f259d088fd1c091a8cc6 upstream.
Error recovery can be interrupted by controller removal, then the controller is left as quiesced, and IO hang can be caused.
Fix the issue by unquiescing controller unconditionally when removing namespaces.
This way is reasonable and safe given forward progress can be made when removing namespaces.
Reviewed-by: Keith Busch kbusch@kernel.org Reviewed-by: Sagi Grimberg sagi@grimberg.me Reported-by: Chunguang Xu brookxu.cn@gmail.com Closes: https://lore.kernel.org/linux-nvme/cover.1685350577.git.chunguang.xu@shopee.... Cc: stable@vger.kernel.org Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/nvme/host/core.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -4728,6 +4728,12 @@ void nvme_remove_namespaces(struct nvme_ */ nvme_mpath_clear_ctrl_paths(ctrl);
+ /* + * Unquiesce io queues so any pending IO won't hang, especially + * those submitted from scan work + */ + nvme_unquiesce_io_queues(ctrl); + /* prevent racing with ns scanning */ flush_work(&ctrl->scan_work);
@@ -4737,10 +4743,8 @@ void nvme_remove_namespaces(struct nvme_ * removing the namespaces' disks; fail all the queues now to avoid * potentially having to clean up the failed sync later. */ - if (ctrl->state == NVME_CTRL_DEAD) { + if (ctrl->state == NVME_CTRL_DEAD) nvme_mark_namespaces_dead(ctrl); - nvme_unquiesce_io_queues(ctrl); - }
/* this is a no-op when called from the controller reset handler */ nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING_NOIO);
From: Ming Lei ming.lei@redhat.com
commit 99dc264014d5aed66ee37ddf136a38b5a2b1b529 upstream.
Move start_freeze into nvme_tcp_configure_io_queues(), and there is at least two benefits:
1) fix unbalanced freeze and unfreeze, since re-connection work may fail or be broken by removal
2) IO during error recovery can be failfast quickly because nvme fabrics unquiesces queues after teardown.
One side-effect is that !mpath request may timeout during connecting because of queue topo change, but that looks not one big deal:
1) same problem exists with current code base
2) compared with !mpath, mpath use case is dominant
Fixes: 2875b0aecabe ("nvme-tcp: fix controller reset hang during traffic") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei ming.lei@redhat.com Tested-by: Yi Zhang yi.zhang@redhat.com Reviewed-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/nvme/host/tcp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -1909,6 +1909,7 @@ static int nvme_tcp_configure_io_queues( goto out_cleanup_connect_q;
if (!new) { + nvme_start_freeze(ctrl); nvme_unquiesce_io_queues(ctrl); if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) { /* @@ -1917,6 +1918,7 @@ static int nvme_tcp_configure_io_queues( * to be safe. */ ret = -ENODEV; + nvme_unfreeze(ctrl); goto out_wait_freeze_timed_out; } blk_mq_update_nr_hw_queues(ctrl->tagset, @@ -2021,7 +2023,6 @@ static void nvme_tcp_teardown_io_queues( if (ctrl->queue_count <= 1) return; nvme_quiesce_admin_queue(ctrl); - nvme_start_freeze(ctrl); nvme_quiesce_io_queues(ctrl); nvme_sync_io_queues(ctrl); nvme_tcp_stop_io_queues(ctrl);
From: Ming Lei ming.lei@redhat.com
commit 29b434d1e49252b3ad56ad3197e47fafff5356a1 upstream.
Move start_freeze into nvme_rdma_configure_io_queues(), and there is at least two benefits:
1) fix unbalanced freeze and unfreeze, since re-connection work may fail or be broken by removal
2) IO during error recovery can be failfast quickly because nvme fabrics unquiesces queues after teardown.
One side-effect is that !mpath request may timeout during connecting because of queue topo change, but that looks not one big deal:
1) same problem exists with current code base
2) compared with !mpath, mpath use case is dominant
Fixes: 9f98772ba307 ("nvme-rdma: fix controller reset hang during traffic") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei ming.lei@redhat.com Tested-by: Yi Zhang yi.zhang@redhat.com Reviewed-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/nvme/host/rdma.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -918,6 +918,7 @@ static int nvme_rdma_configure_io_queues goto out_cleanup_tagset;
if (!new) { + nvme_start_freeze(&ctrl->ctrl); nvme_unquiesce_io_queues(&ctrl->ctrl); if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) { /* @@ -926,6 +927,7 @@ static int nvme_rdma_configure_io_queues * to be safe. */ ret = -ENODEV; + nvme_unfreeze(&ctrl->ctrl); goto out_wait_freeze_timed_out; } blk_mq_update_nr_hw_queues(ctrl->ctrl.tagset, @@ -975,7 +977,6 @@ static void nvme_rdma_teardown_io_queues bool remove) { if (ctrl->ctrl.queue_count > 1) { - nvme_start_freeze(&ctrl->ctrl); nvme_quiesce_io_queues(&ctrl->ctrl); nvme_sync_io_queues(&ctrl->ctrl); nvme_rdma_stop_io_queues(ctrl);
From: August Wikerfors git@augustwikerfors.se
commit 688b419c57c13637d95d7879e165fff3dec581eb upstream.
The Samsung PM9B1 512G SSD found in some Lenovo Yoga 7 14ARB7 laptop units reports eui as 0001000200030004 when resuming from s2idle, causing the device to be removed with this error in dmesg:
nvme nvme0: identifiers changed for nsid 1
To fix this, add a quirk to ignore namespace identifiers for this device.
Signed-off-by: August Wikerfors git@augustwikerfors.se Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/nvme/host/pci.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3391,7 +3391,8 @@ static const struct pci_device_id nvme_i { PCI_DEVICE(0x1d97, 0x2263), /* SPCC */ .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, { PCI_DEVICE(0x144d, 0xa80b), /* Samsung PM9B1 256G and 512G */ - .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, + .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES | + NVME_QUIRK_BOGUS_NID, }, { PCI_DEVICE(0x144d, 0xa809), /* Samsung MZALQ256HBJD 256G */ .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, { PCI_DEVICE(0x1cc4, 0x6303), /* UMIS RPJTJ512MGE1QDY 512G */
From: Karol Herbst kherbst@redhat.com
commit 1cb9e2ef66d53b020842b18762e30d0eb4384de8 upstream.
We have a lurking bug where Fragment Shader Helper Invocations can't load from memory. But this is actually required in OpenGL and is causing random hangs or failures in random shaders.
It is unknown how widespread this issue is, but shaders hitting this can end up with infinite loops.
We enable those only on all Kepler and newer GPUs where we use our own Firmware.
Nvidia's firmware provides a way to set a kernelspace controlled list of mmio registers in the gr space from push buffers via MME macros.
v2: drop code for gm200 and newer.
Cc: Ben Skeggs bskeggs@redhat.com Cc: David Airlie airlied@gmail.com Cc: nouveau@lists.freedesktop.org Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Karol Herbst kherbst@redhat.com Reviewed-by: Dave Airlie airlied@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20230622152017.2512101-1-kherb... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.h | 1 + drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk104.c | 4 +++- drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110.c | 10 ++++++++++ drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110b.c | 1 + drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk208.c | 1 + drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgm107.c | 1 + 6 files changed, 17 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.h +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.h @@ -117,6 +117,7 @@ void gk104_grctx_generate_r418800(struct
extern const struct gf100_grctx_func gk110_grctx; void gk110_grctx_generate_r419eb0(struct gf100_gr *); +void gk110_grctx_generate_r419f78(struct gf100_gr *);
extern const struct gf100_grctx_func gk110b_grctx; extern const struct gf100_grctx_func gk208_grctx; --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk104.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk104.c @@ -906,7 +906,9 @@ static void gk104_grctx_generate_r419f78(struct gf100_gr *gr) { struct nvkm_device *device = gr->base.engine.subdev.device; - nvkm_mask(device, 0x419f78, 0x00000001, 0x00000000); + + /* bit 3 set disables loads in fp helper invocations, we need it enabled */ + nvkm_mask(device, 0x419f78, 0x00000009, 0x00000000); }
void --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110.c @@ -820,6 +820,15 @@ gk110_grctx_generate_r419eb0(struct gf10 nvkm_mask(device, 0x419eb0, 0x00001000, 0x00001000); }
+void +gk110_grctx_generate_r419f78(struct gf100_gr *gr) +{ + struct nvkm_device *device = gr->base.engine.subdev.device; + + /* bit 3 set disables loads in fp helper invocations, we need it enabled */ + nvkm_mask(device, 0x419f78, 0x00000008, 0x00000000); +} + const struct gf100_grctx_func gk110_grctx = { .main = gf100_grctx_generate_main, @@ -854,4 +863,5 @@ gk110_grctx = { .gpc_tpc_nr = gk104_grctx_generate_gpc_tpc_nr, .r418800 = gk104_grctx_generate_r418800, .r419eb0 = gk110_grctx_generate_r419eb0, + .r419f78 = gk110_grctx_generate_r419f78, }; --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110b.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110b.c @@ -103,4 +103,5 @@ gk110b_grctx = { .gpc_tpc_nr = gk104_grctx_generate_gpc_tpc_nr, .r418800 = gk104_grctx_generate_r418800, .r419eb0 = gk110_grctx_generate_r419eb0, + .r419f78 = gk110_grctx_generate_r419f78, }; --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk208.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk208.c @@ -568,4 +568,5 @@ gk208_grctx = { .dist_skip_table = gf117_grctx_generate_dist_skip_table, .gpc_tpc_nr = gk104_grctx_generate_gpc_tpc_nr, .r418800 = gk104_grctx_generate_r418800, + .r419f78 = gk110_grctx_generate_r419f78, }; --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgm107.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgm107.c @@ -988,4 +988,5 @@ gm107_grctx = { .r406500 = gm107_grctx_generate_r406500, .gpc_tpc_nr = gk104_grctx_generate_gpc_tpc_nr, .r419e00 = gm107_grctx_generate_r419e00, + .r419f78 = gk110_grctx_generate_r419f78, };
From: Lyude Paul lyude@redhat.com
commit e4060dad253352382b20420d8ef98daab24dbc17 upstream.
Currently we use the drm_dp_dpcd_read_caps() helper in the DRM side of nouveau in order to read the DPCD of a DP connector, which makes sure we do the right thing and also check for extended DPCD caps. However, it turns out we're not currently doing this on the nvkm side since we don't have access to the drm_dp_aux structure there - which means that the DRM side of the driver and the NVKM side can end up with different DPCD capabilities for the same connector.
Ideally in order to fix this, we just want to use the drm_dp_read_dpcd_caps() helper in nouveau. That's not currently possible though, and is going to depend on having a bunch of the DP code moved out of nvkm and into the DRM side of things as part of the GSP enablement work.
Until then however, let's workaround this problem by porting a copy of drm_dp_read_dpcd_caps() into NVKM - which should fix this issue.
Signed-off-by: Lyude Paul lyude@redhat.com Reviewed-by: Karol Herbst kherbst@redhat.com Link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/211 Link: https://patchwork.freedesktop.org/patch/msgid/20230728225858.350581-1-lyude@... (cherry picked from commit cc4adf3a7323212f303bc9ff0f96346c44fcba06 in drm-misc-next) Cc: stable@vger.kernel.org # 6.3+ Signed-off-by: Karol Herbst kherbst@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c | 48 +++++++++++++++++++++++++- 1 file changed, 47 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c @@ -26,6 +26,8 @@ #include "head.h" #include "ior.h"
+#include <drm/display/drm_dp.h> + #include <subdev/bios.h> #include <subdev/bios/init.h> #include <subdev/gpio.h> @@ -634,6 +636,50 @@ nvkm_dp_enable_supported_link_rates(stru return outp->dp.rates != 0; }
+/* XXX: This is a big fat hack, and this is just drm_dp_read_dpcd_caps() + * converted to work inside nvkm. This is a temporary holdover until we start + * passing the drm_dp_aux device through NVKM + */ +static int +nvkm_dp_read_dpcd_caps(struct nvkm_outp *outp) +{ + struct nvkm_i2c_aux *aux = outp->dp.aux; + u8 dpcd_ext[DP_RECEIVER_CAP_SIZE]; + int ret; + + ret = nvkm_rdaux(aux, DPCD_RC00_DPCD_REV, outp->dp.dpcd, DP_RECEIVER_CAP_SIZE); + if (ret < 0) + return ret; + + /* + * Prior to DP1.3 the bit represented by + * DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT was reserved. + * If it is set DP_DPCD_REV at 0000h could be at a value less than + * the true capability of the panel. The only way to check is to + * then compare 0000h and 2200h. + */ + if (!(outp->dp.dpcd[DP_TRAINING_AUX_RD_INTERVAL] & + DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT)) + return 0; + + ret = nvkm_rdaux(aux, DP_DP13_DPCD_REV, dpcd_ext, sizeof(dpcd_ext)); + if (ret < 0) + return ret; + + if (outp->dp.dpcd[DP_DPCD_REV] > dpcd_ext[DP_DPCD_REV]) { + OUTP_DBG(outp, "Extended DPCD rev less than base DPCD rev (%d > %d)\n", + outp->dp.dpcd[DP_DPCD_REV], dpcd_ext[DP_DPCD_REV]); + return 0; + } + + if (!memcmp(outp->dp.dpcd, dpcd_ext, sizeof(dpcd_ext))) + return 0; + + memcpy(outp->dp.dpcd, dpcd_ext, sizeof(dpcd_ext)); + + return 0; +} + void nvkm_dp_enable(struct nvkm_outp *outp, bool auxpwr) { @@ -689,7 +735,7 @@ nvkm_dp_enable(struct nvkm_outp *outp, b memset(outp->dp.lttpr, 0x00, sizeof(outp->dp.lttpr)); }
- if (!nvkm_rdaux(aux, DPCD_RC00_DPCD_REV, outp->dp.dpcd, sizeof(outp->dp.dpcd))) { + if (!nvkm_dp_read_dpcd_caps(outp)) { const u8 rates[] = { 0x1e, 0x14, 0x0a, 0x06, 0 }; const u8 *rate; int rate_max;
From: Boris Brezillon boris.brezillon@collabora.com
commit 07dd476f6116966cb2006e25fdcf48f0715115ff upstream.
The dma-buf backend is supposed to provide its own vm_ops, but some implementation just have nothing special to do and leave vm_ops untouched, probably expecting this field to be zero initialized (this is the case with the system_heap implementation for instance). Let's reset vma->vm_ops to NULL to keep things working with these implementations.
Fixes: 26d3ac3cb04d ("drm/shmem-helpers: Redirect mmap for imported dma-buf") Cc: stable@vger.kernel.org Cc: Daniel Vetter daniel.vetter@ffwll.ch Reported-by: Roman Stratiienko r.stratiienko@gmail.com Signed-off-by: Boris Brezillon boris.brezillon@collabora.com Tested-by: Roman Stratiienko r.stratiienko@gmail.com Reviewed-by: Thomas Zimmermann tzimmermann@suse.de Link: https://patchwork.freedesktop.org/patch/msgid/20230724112610.60974-1-boris.b... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_gem_shmem_helper.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -623,7 +623,13 @@ int drm_gem_shmem_mmap(struct drm_gem_sh int ret;
if (obj->import_attach) { + /* Reset both vm_ops and vm_private_data, so we don't end up with + * vm_ops pointing to our implementation if the dma-buf backend + * doesn't set those fields. + */ vma->vm_private_data = NULL; + vma->vm_ops = NULL; + ret = dma_buf_mmap(obj->dma_buf, vma, 0);
/* Drop the reference drm_gem_mmap_obj() acquired.*/
From: Alex Deucher alexander.deucher@amd.com
commit 90e065677e0362a777b9db97ea21d43a39211399 upstream.
Since the gang_size check is outside of chunk parsing loop, we need to reset i before we free the chunk data.
Suggested by Ye Zhang (@VAR10CK) of Baidu Security.
Reviewed-by: Guchun Chen guchun.chen@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -291,7 +291,7 @@ static int amdgpu_cs_pass1(struct amdgpu
if (!p->gang_size) { ret = -EINVAL; - goto free_partial_kdata; + goto free_all_kdata; }
for (i = 0; i < p->gang_size; ++i) {
From: Kenneth Feng kenneth.feng@amd.com
commit bd60e2eafd8fb053948b6e23e8167baf7a159750 upstream.
correct the pcie width value in pp_dpm_pcie for smu 13.0.0
Signed-off-by: Kenneth Feng kenneth.feng@amd.com Reviewed-by: Harish Kasiviswanathan Harish.Kasiviswanathan@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c @@ -1030,7 +1030,6 @@ static int smu_v13_0_0_print_clk_levels( struct smu_13_0_dpm_context *dpm_context = smu_dpm->dpm_context; struct smu_13_0_dpm_table *single_dpm_table; struct smu_13_0_pcie_table *pcie_table; - const int link_width[] = {0, 1, 2, 4, 8, 12, 16}; uint32_t gen_speed, lane_width; int i, curr_freq, size = 0; int ret = 0; @@ -1145,7 +1144,7 @@ static int smu_v13_0_0_print_clk_levels( (pcie_table->pcie_lane[i] == 6) ? "x16" : "", pcie_table->clk_freq[i], (gen_speed == DECODE_GEN_SPEED(pcie_table->pcie_gen[i])) && - (lane_width == DECODE_LANE_WIDTH(link_width[pcie_table->pcie_lane[i]])) ? + (lane_width == DECODE_LANE_WIDTH(pcie_table->pcie_lane[i])) ? "*" : ""); break;
From: Mario Limonciello mario.limonciello@amd.com
commit 3bb575572bf498a9d39e9d1ca5c06cc3152928a1 upstream.
DCE products don't define a `remove_stream_from_ctx` like DCN ones do. This means that when compute_mst_dsc_configs_for_state() is called it always returns -EINVAL which causes MST to fail to setup.
Cc: stable@vger.kernel.org # 6.4.y Cc: Harry Wentland Harry.Wentland@amd.com Reported-by: Klaus.Kusche@computerix.info Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2671 Fixes: efa4c4df864e ("drm/amd/display: call remove_stream_from_ctx from res_pool funcs") Signed-off-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c index 9bc86deac9e8..b885c39bd16b 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c @@ -1320,7 +1320,7 @@ int compute_mst_dsc_configs_for_state(struct drm_atomic_state *state, if (computed_streams[i]) continue;
- if (!res_pool->funcs->remove_stream_from_ctx || + if (res_pool->funcs->remove_stream_from_ctx && res_pool->funcs->remove_stream_from_ctx(stream->ctx->dc, dc_state, stream) != DC_OK) return -EINVAL;
From: Melissa Wen mwen@igalia.com
commit 96b020e2163fb2197266b2f71b1007495206e6bb upstream.
Don't set predefined degamma curve to cursor plane if the cursor attribute flag is not set. Applying a degamma curve to the cursor by default breaks userspace expectation. Checking the flag before performing any color transformation prevents too dark cursor gamma in DCN3+ on many Linux desktop environment (KDE Plasma, GNOME, wlroots-based, etc.) as reported at: - https://gitlab.freedesktop.org/drm/amd/-/issues/1513
This is the same approach followed by DCN2 drivers where the issue is not present.
Fixes: 03f54d7d3448 ("drm/amd/display: Add DCN3 DPP") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1513 Signed-off-by: Melissa Wen mwen@igalia.com Reviewed-by: Harry Wentland harry.wentland@amd.com Tested-by: Alex Hung alex.hung@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c @@ -357,8 +357,11 @@ void dpp3_set_cursor_attributes( int cur_rom_en = 0;
if (color_format == CURSOR_MODE_COLOR_PRE_MULTIPLIED_ALPHA || - color_format == CURSOR_MODE_COLOR_UN_PRE_MULTIPLIED_ALPHA) - cur_rom_en = 1; + color_format == CURSOR_MODE_COLOR_UN_PRE_MULTIPLIED_ALPHA) { + if (cursor_attributes->attribute_flags.bits.ENABLE_CURSOR_DEGAMMA) { + cur_rom_en = 1; + } + }
REG_UPDATE_3(CURSOR0_CONTROL, CUR0_MODE, color_format,
From: Mario Limonciello mario.limonciello@amd.com
commit 08fffa74d9772d9538338be3f304006c94dde6f0 upstream.
Users report a white flickering screen on multiple systems that is tied to having 64GB or more memory. When S/G is enabled pages will get pinned to both VRAM carve out and system RAM leading to this.
Until it can be fixed properly, disable S/G when 64GB of memory or more is detected. This will force pages to be pinned into VRAM. This should fix white screen flickers but if VRAM pressure is encountered may lead to black screens. It's a trade-off for now.
Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)") Cc: Hamza Mahfooz Hamza.Mahfooz@amd.com Cc: Roman Li roman.li@amd.com Cc: stable@vger.kernel.org # 6.1.y: bf0207e172703 ("drm/amdgpu: add S/G display parameter") Cc: stable@vger.kernel.org # 6.4.y Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2735 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2354 Signed-off-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26 ++++++++++++++++++++++ drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 5 +--- 3 files changed, 29 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1246,6 +1246,7 @@ int amdgpu_device_gpu_recover(struct amd void amdgpu_device_pci_config_reset(struct amdgpu_device *adev); int amdgpu_device_pci_reset(struct amdgpu_device *adev); bool amdgpu_device_need_post(struct amdgpu_device *adev); +bool amdgpu_sg_display_supported(struct amdgpu_device *adev); bool amdgpu_device_pcie_dynamic_switching_supported(void); bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev); bool amdgpu_device_aspm_support_quirk(void); --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -1353,6 +1353,32 @@ bool amdgpu_device_need_post(struct amdg }
/* + * On APUs with >= 64GB white flickering has been observed w/ SG enabled. + * Disable S/G on such systems until we have a proper fix. + * https://gitlab.freedesktop.org/drm/amd/-/issues/2354 + * https://gitlab.freedesktop.org/drm/amd/-/issues/2735 + */ +bool amdgpu_sg_display_supported(struct amdgpu_device *adev) +{ + switch (amdgpu_sg_display) { + case -1: + break; + case 0: + return false; + case 1: + return true; + default: + return false; + } + if ((totalram_pages() << (PAGE_SHIFT - 10)) + + (adev->gmc.real_vram_size / 1024) >= 64000000) { + DRM_WARN("Disabling S/G due to >=64GB RAM\n"); + return false; + } + return true; +} + +/* * Intel hosts such as Raptor Lake and Sapphire Rapids don't support dynamic * speed switching. Until we have confirmation from Intel that a specific host * supports it, it's safer that we keep it disabled for all. --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -1630,9 +1630,8 @@ static int amdgpu_dm_init(struct amdgpu_ } break; } - if (init_data.flags.gpu_vm_support && - (amdgpu_sg_display == 0)) - init_data.flags.gpu_vm_support = false; + if (init_data.flags.gpu_vm_support) + init_data.flags.gpu_vm_support = amdgpu_sg_display_supported(adev);
if (init_data.flags.gpu_vm_support) adev->mode_info.gpu_vm_support = true;
From: Peter Ujfalusi peter.ujfalusi@linux.intel.com
commit 51e5e551af53259e0274b0cd4ff83d8351fb8c40 upstream.
The patch which made it to the kernel somehow changed the match condition from DMI_MATCH(DMI_PRODUCT_NAME, "UPX-TGL01") to DMI_MATCH(DMI_PRODUCT_VERSION, "UPX-TGL")
Revert back to the correct match condition to disable the interrupt mode on the board.
Cc: stable@vger.kernel.org # v6.4+ Fixes: edb13d7bb034 ("tpm: tpm_tis: Disable interrupts *only* for AEON UPX-i11") Link: https://lore.kernel.org/lkml/20230524085844.11580-1-peter.ujfalusi@linux.int... Signed-off-by: Peter Ujfalusi peter.ujfalusi@linux.intel.com Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/tpm_tis.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index ac4daaf294a3..3c0f68b9e44f 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -183,7 +183,7 @@ static const struct dmi_system_id tpm_tis_dmi_table[] = { .ident = "UPX-TGL", .matches = { DMI_MATCH(DMI_SYS_VENDOR, "AAEON"), - DMI_MATCH(DMI_PRODUCT_VERSION, "UPX-TGL"), + DMI_MATCH(DMI_PRODUCT_NAME, "UPX-TGL01"), }, }, {}
From: Jarkko Sakkinen jarkko@kernel.org
commit 6aaf663ee04a80b445f8f5abff53cb92cb583c88 upstream.
Cc: stable@vger.kernel.org # v6.4+ Link: https://lore.kernel.org/linux-integrity/CAHk-=whRVp4h8uWOX1YO+Y99+44u4s=XxMK... Fixes: e644b2f498d2 ("tpm, tpm_tis: Enable interrupt test") Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/tpm_tis.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index 3c0f68b9e44f..7fa3d91042b2 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -89,7 +89,7 @@ static inline void tpm_tis_iowrite32(u32 b, void __iomem *iobase, u32 addr) tpm_tis_flush(iobase); }
-static int interrupts = -1; +static int interrupts; module_param(interrupts, int, 0444); MODULE_PARM_DESC(interrupts, "Enable interrupts");
From: Maulik Shah quic_mkshah@quicinc.com
commit 9a8fa00dad3c7b260071f2f220cfb00505372c40 upstream.
Genpd parent and child domain topology created using dt_idle_pd_init_topology() needs to be removed during error cases.
Add new helper function dt_idle_pd_remove_topology() for same.
Cc: stable@vger.kernel.org Reviewed-by: Ulf Hanssson ulf.hansson@linaro.org Signed-off-by: Maulik Shah quic_mkshah@quicinc.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/cpuidle/dt_idle_genpd.c | 24 ++++++++++++++++++++++++ drivers/cpuidle/dt_idle_genpd.h | 7 +++++++ 2 files changed, 31 insertions(+)
--- a/drivers/cpuidle/dt_idle_genpd.c +++ b/drivers/cpuidle/dt_idle_genpd.c @@ -152,6 +152,30 @@ int dt_idle_pd_init_topology(struct devi return 0; }
+int dt_idle_pd_remove_topology(struct device_node *np) +{ + struct device_node *node; + struct of_phandle_args child, parent; + int ret; + + for_each_child_of_node(np, node) { + if (of_parse_phandle_with_args(node, "power-domains", + "#power-domain-cells", 0, &parent)) + continue; + + child.np = node; + child.args_count = 0; + ret = of_genpd_remove_subdomain(&parent, &child); + of_node_put(parent.np); + if (ret) { + of_node_put(node); + return ret; + } + } + + return 0; +} + struct device *dt_idle_attach_cpu(int cpu, const char *name) { struct device *dev; --- a/drivers/cpuidle/dt_idle_genpd.h +++ b/drivers/cpuidle/dt_idle_genpd.h @@ -14,6 +14,8 @@ struct generic_pm_domain *dt_idle_pd_all
int dt_idle_pd_init_topology(struct device_node *np);
+int dt_idle_pd_remove_topology(struct device_node *np); + struct device *dt_idle_attach_cpu(int cpu, const char *name);
void dt_idle_detach_cpu(struct device *dev); @@ -35,6 +37,11 @@ static inline int dt_idle_pd_init_topolo { return 0; } + +static inline int dt_idle_pd_remove_topology(struct device_node *np) +{ + return 0; +}
static inline struct device *dt_idle_attach_cpu(int cpu, const char *name) {
From: Maulik Shah quic_mkshah@quicinc.com
commit 12acb348fa4528a4203edf1cce7a3be2c9af2279 upstream.
A switch from OSI to PC mode is only possible if all CPUs other than the calling one are OFF, either through a call to CPU_OFF or not yet booted.
Currently OSI mode is enabled before power domains are created. In cases where CPUidle states are not using hierarchical CPU topology the bail out path tries to switch back to PC mode which gets denied by firmware since other CPUs are online at this point and creates inconsistent state as firmware is in OSI mode and Linux in PC mode.
This change moves enabling OSI mode after power domains are created, this would makes sure that hierarchical CPU topology is used before switching firmware to OSI mode.
Cc: stable@vger.kernel.org Fixes: 70c179b49870 ("cpuidle: psci: Allow PM domain to be initialized even if no OSI mode") Signed-off-by: Maulik Shah quic_mkshah@quicinc.com Reviewed-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/cpuidle/cpuidle-psci-domain.c | 39 +++++++++++----------------------- 1 file changed, 13 insertions(+), 26 deletions(-)
--- a/drivers/cpuidle/cpuidle-psci-domain.c +++ b/drivers/cpuidle/cpuidle-psci-domain.c @@ -120,20 +120,6 @@ static void psci_pd_remove(void) } }
-static bool psci_pd_try_set_osi_mode(void) -{ - int ret; - - if (!psci_has_osi_support()) - return false; - - ret = psci_set_osi_mode(true); - if (ret) - return false; - - return true; -} - static void psci_cpuidle_domain_sync_state(struct device *dev) { /* @@ -152,15 +138,12 @@ static int psci_cpuidle_domain_probe(str { struct device_node *np = pdev->dev.of_node; struct device_node *node; - bool use_osi; + bool use_osi = psci_has_osi_support(); int ret = 0, pd_count = 0;
if (!np) return -ENODEV;
- /* If OSI mode is supported, let's try to enable it. */ - use_osi = psci_pd_try_set_osi_mode(); - /* * Parse child nodes for the "#power-domain-cells" property and * initialize a genpd/genpd-of-provider pair when it's found. @@ -170,33 +153,37 @@ static int psci_cpuidle_domain_probe(str continue;
ret = psci_pd_init(node, use_osi); - if (ret) - goto put_node; + if (ret) { + of_node_put(node); + goto exit; + }
pd_count++; }
/* Bail out if not using the hierarchical CPU topology. */ if (!pd_count) - goto no_pd; + return 0;
/* Link genpd masters/subdomains to model the CPU topology. */ ret = dt_idle_pd_init_topology(np); if (ret) goto remove_pd;
+ /* let's try to enable OSI. */ + ret = psci_set_osi_mode(use_osi); + if (ret) + goto remove_pd; + pr_info("Initialized CPU PM domain topology using %s mode\n", use_osi ? "OSI" : "PC"); return 0;
-put_node: - of_node_put(node); remove_pd: + dt_idle_pd_remove_topology(np); psci_pd_remove(); +exit: pr_err("failed to create CPU PM domains ret=%d\n", ret); -no_pd: - if (use_osi) - psci_set_osi_mode(false); return ret; }
From: Aleksa Sarai cyphar@cyphar.com
commit 72dbde0f2afbe4af8e8595a89c650ae6b9d9c36f upstream.
O_TMPFILE is actually __O_TMPFILE|O_DIRECTORY. This means that the old check for whether RESOLVE_CACHED can be used would incorrectly think that O_DIRECTORY could not be used with RESOLVE_CACHED.
Cc: stable@vger.kernel.org # v5.12+ Fixes: 3a81fd02045c ("io_uring: enable LOOKUP_CACHED path resolution for filename lookups") Signed-off-by: Aleksa Sarai cyphar@cyphar.com Link: https://lore.kernel.org/r/20230807-resolve_cached-o_tmpfile-v3-1-e49323e1ef6... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/openclose.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/io_uring/openclose.c +++ b/io_uring/openclose.c @@ -35,9 +35,11 @@ static bool io_openat_force_async(struct { /* * Don't bother trying for O_TRUNC, O_CREAT, or O_TMPFILE open, - * it'll always -EAGAIN + * it'll always -EAGAIN. Note that we test for __O_TMPFILE because + * O_TMPFILE includes O_DIRECTORY, which isn't a flag we need to force + * async for. */ - return open->how.flags & (O_TRUNC | O_CREAT | O_TMPFILE); + return open->how.flags & (O_TRUNC | O_CREAT | __O_TMPFILE); }
static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
From: Andrew Yang andrew.yang@mediatek.com
commit 4b5d1e47b69426c0f7491d97d73ad0152d02d437 upstream.
We encountered many kernel exceptions of VM_BUG_ON(zspage->isolated == 0) in dec_zspage_isolation() and BUG_ON(!pages[1]) in zs_unmap_object() lately. This issue only occurs when migration and reclamation occur at the same time.
With our memory stress test, we can reproduce this issue several times a day. We have no idea why no one else encountered this issue. BTW, we switched to the new kernel version with this defect a few months ago.
Since fullness and isolated share the same unsigned int, modifications of them should be protected by the same lock.
[andrew.yang@mediatek.com: move comment] Link: https://lkml.kernel.org/r/20230727062910.6337-1-andrew.yang@mediatek.com Link: https://lkml.kernel.org/r/20230721063705.11455-1-andrew.yang@mediatek.com Fixes: c4549b871102 ("zsmalloc: remove zspage isolation for migration") Signed-off-by: Andrew Yang andrew.yang@mediatek.com Reviewed-by: Sergey Senozhatsky senozhatsky@chromium.org Cc: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Cc: Matthias Brugger matthias.bgg@gmail.com Cc: Minchan Kim minchan@kernel.org Cc: Sebastian Andrzej Siewior bigeasy@linutronix.de Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/zsmalloc.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
--- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -1977,6 +1977,7 @@ static void replace_sub_page(struct size
static bool zs_page_isolate(struct page *page, isolate_mode_t mode) { + struct zs_pool *pool; struct zspage *zspage;
/* @@ -1986,9 +1987,10 @@ static bool zs_page_isolate(struct page VM_BUG_ON_PAGE(PageIsolated(page), page);
zspage = get_zspage(page); - migrate_write_lock(zspage); + pool = zspage->pool; + spin_lock(&pool->lock); inc_zspage_isolation(zspage); - migrate_write_unlock(zspage); + spin_unlock(&pool->lock);
return true; } @@ -2054,12 +2056,12 @@ static int zs_page_migrate(struct page * kunmap_atomic(s_addr);
replace_sub_page(class, zspage, newpage, page); + dec_zspage_isolation(zspage); /* * Since we complete the data copy and set up new zspage structure, * it's okay to release the pool's lock. */ spin_unlock(&pool->lock); - dec_zspage_isolation(zspage); migrate_write_unlock(zspage);
get_page(newpage); @@ -2076,14 +2078,16 @@ static int zs_page_migrate(struct page *
static void zs_page_putback(struct page *page) { + struct zs_pool *pool; struct zspage *zspage;
VM_BUG_ON_PAGE(!PageIsolated(page), page);
zspage = get_zspage(page); - migrate_write_lock(zspage); + pool = zspage->pool; + spin_lock(&pool->lock); dec_zspage_isolation(zspage); - migrate_write_unlock(zspage); + spin_unlock(&pool->lock); }
static const struct movable_operations zsmalloc_mops = {
From: Tao Ren rentao.bupt@gmail.com
commit f38963b9cd0645a336cf30c5da2e89e34e34fec3 upstream.
Skip status check for both pfe1100 and pfe3000 because the communication error is also observed on pfe1100 devices.
Signed-off-by: Tao Ren rentao.bupt@gmail.com Fixes: 626bb2f3fb3c hwmon: (pmbus) add driver for BEL PFE1100 and PFE3000 Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230804221403.28931-1-rentao.bupt@gmail.com Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hwmon/pmbus/bel-pfe.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/drivers/hwmon/pmbus/bel-pfe.c +++ b/drivers/hwmon/pmbus/bel-pfe.c @@ -17,12 +17,13 @@ enum chips {pfe1100, pfe3000};
/* - * Disable status check for pfe3000 devices, because some devices report - * communication error (invalid command) for VOUT_MODE command (0x20) - * although correct VOUT_MODE (0x16) is returned: it leads to incorrect - * exponent in linear mode. + * Disable status check because some devices report communication error + * (invalid command) for VOUT_MODE command (0x20) although the correct + * VOUT_MODE (0x16) is returned: it leads to incorrect exponent in linear + * mode. + * This affects both pfe3000 and pfe1100. */ -static struct pmbus_platform_data pfe3000_plat_data = { +static struct pmbus_platform_data pfe_plat_data = { .flags = PMBUS_SKIP_STATUS_CHECK, };
@@ -94,16 +95,15 @@ static int pfe_pmbus_probe(struct i2c_cl int model;
model = (int)i2c_match_id(pfe_device_id, client)->driver_data; + client->dev.platform_data = &pfe_plat_data;
/* * PFE3000-12-069RA devices may not stay in page 0 during device * probe which leads to probe failure (read status word failed). * So let's set the device to page 0 at the beginning. */ - if (model == pfe3000) { - client->dev.platform_data = &pfe3000_plat_data; + if (model == pfe3000) i2c_smbus_write_byte_data(client, PMBUS_PAGE, 0); - }
return pmbus_do_probe(client, &pfe_driver_info[model]); }
From: Colin Ian King colin.i.king@gmail.com
commit cac7ea57a06016e4914848b707477fb07ee4ae1c upstream.
Currently the pthread allocation for each array item is based on the size of a pthread_t pointer and should be the size of the pthread_t structure, so the allocation is under-allocating the correct size. Fix this by using the size of each element in the pthreads array.
Static analysis cppcheck reported: tools/testing/radix-tree/regression1.c:180:2: warning: Size of pointer 'threads' used instead of size of its data. [pointerSize]
Link: https://lkml.kernel.org/r/20230727160930.632674-1-colin.i.king@gmail.com Fixes: 1366c37ed84b ("radix tree test harness") Signed-off-by: Colin Ian King colin.i.king@gmail.com Cc: Konstantin Khlebnikov koct9i@gmail.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/radix-tree/regression1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/radix-tree/regression1.c +++ b/tools/testing/radix-tree/regression1.c @@ -177,7 +177,7 @@ void regression1_test(void) nr_threads = 2; pthread_barrier_init(&worker_barrier, NULL, nr_threads);
- threads = malloc(nr_threads * sizeof(pthread_t *)); + threads = malloc(nr_threads * sizeof(*threads));
for (i = 0; i < nr_threads; i++) { arg = i;
From: Thomas Weißschuh linux@weissschuh.net
commit 5e720f8c8c9d959283c3908bbf32a91a01a86547 upstream.
In commit 3666062b87ec ("cpufreq: amd-pstate: move to use bus_get_dev_root()") the "amd_pstate" attributes where moved from a dedicated kobject to the cpu root kobject.
While the dedicated kobject expects to contain kobj_attributes the root kobject needs device_attributes.
As the changed arguments are not used by the callbacks it works most of the time. However CFI will detect this issue:
[ 4947.849350] CFI failure at dev_attr_show+0x24/0x60 (target: show_status+0x0/0x70; expected type: 0x8651b1de) ... [ 4947.849409] Call Trace: [ 4947.849410] <TASK> [ 4947.849411] ? __warn+0xcf/0x1c0 [ 4947.849414] ? dev_attr_show+0x24/0x60 [ 4947.849415] ? report_cfi_failure+0x4e/0x60 [ 4947.849417] ? handle_cfi_failure+0x14c/0x1d0 [ 4947.849419] ? __cfi_show_status+0x10/0x10 [ 4947.849420] ? handle_bug+0x4f/0x90 [ 4947.849421] ? exc_invalid_op+0x1a/0x60 [ 4947.849422] ? asm_exc_invalid_op+0x1a/0x20 [ 4947.849424] ? __cfi_show_status+0x10/0x10 [ 4947.849425] ? dev_attr_show+0x24/0x60 [ 4947.849426] sysfs_kf_seq_show+0xa6/0x110 [ 4947.849433] seq_read_iter+0x16c/0x4b0 [ 4947.849436] vfs_read+0x272/0x2d0 [ 4947.849438] ksys_read+0x72/0xe0 [ 4947.849439] do_syscall_64+0x76/0xb0 [ 4947.849440] ? do_user_addr_fault+0x252/0x650 [ 4947.849442] ? exc_page_fault+0x7a/0x1b0 [ 4947.849443] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Fixes: 3666062b87ec ("cpufreq: amd-pstate: move to use bus_get_dev_root()") Reported-by: Jannik Glückert jannik.glueckert@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217765 Link: https://lore.kernel.org/lkml/c7f1bf9b-b183-bf6e-1cbb-d43f72494083@gmail.com/ Cc: All applicable stable@vger.kernel.org Signed-off-by: Thomas Weißschuh linux@weissschuh.net Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/cpufreq/amd-pstate.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -986,8 +986,8 @@ static int amd_pstate_update_status(cons return 0; }
-static ssize_t show_status(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) +static ssize_t status_show(struct device *dev, + struct device_attribute *attr, char *buf) { ssize_t ret;
@@ -998,7 +998,7 @@ static ssize_t show_status(struct kobjec return ret; }
-static ssize_t store_status(struct kobject *a, struct kobj_attribute *b, +static ssize_t status_store(struct device *a, struct device_attribute *b, const char *buf, size_t count) { char *p = memchr(buf, '\n', count); @@ -1017,7 +1017,7 @@ cpufreq_freq_attr_ro(amd_pstate_lowest_n cpufreq_freq_attr_ro(amd_pstate_highest_perf); cpufreq_freq_attr_rw(energy_performance_preference); cpufreq_freq_attr_ro(energy_performance_available_preferences); -define_one_global_rw(status); +static DEVICE_ATTR_RW(status);
static struct freq_attr *amd_pstate_attr[] = { &amd_pstate_max_freq, @@ -1036,7 +1036,7 @@ static struct freq_attr *amd_pstate_epp_ };
static struct attribute *pstate_global_attributes[] = { - &status.attr, + &dev_attr_status.attr, NULL };
From: Lorenzo Stoakes lstoakes@gmail.com
commit 17457784004c84178798432a029ab20e14f728b1 upstream.
Some architectures do not populate the entire range categorised by KCORE_TEXT, so we must ensure that the kernel address we read from is valid.
Unfortunately there is no solution currently available to do so with a purely iterator solution so reinstate the bounce buffer in this instance so we can use copy_from_kernel_nofault() in order to avoid page faults when regions are unmapped.
This change partly reverts commit 2e1c0170771e ("fs/proc/kcore: avoid bounce buffer for ktext data"), reinstating the bounce buffer, but adapts the code to continue to use an iterator.
[lstoakes@gmail.com: correct comment to be strictly correct about reasoning] Link: https://lkml.kernel.org/r/525a3f14-74fa-4c22-9fca-9dab4de8a0c3@lucifer.local Link: https://lkml.kernel.org/r/20230731215021.70911-1-lstoakes@gmail.com Fixes: 2e1c0170771e ("fs/proc/kcore: avoid bounce buffer for ktext data") Signed-off-by: Lorenzo Stoakes lstoakes@gmail.com Reported-by: Jiri Olsa olsajiri@gmail.com Closes: https://lore.kernel.org/all/ZHc2fm+9daF6cgCE@krava Tested-by: Jiri Olsa jolsa@kernel.org Tested-by: Will Deacon will@kernel.org Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Ard Biesheuvel ardb@kernel.org Cc: Baoquan He bhe@redhat.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: David Hildenbrand david@redhat.com Cc: Jens Axboe axboe@kernel.dk Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Liu Shixin liushixin2@huawei.com Cc: Matthew Wilcox willy@infradead.org Cc: Mike Galbraith efault@gmx.de Cc: Thorsten Leemhuis regressions@leemhuis.info Cc: Uladzislau Rezki (Sony) urezki@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/proc/kcore.c | 30 +++++++++++++++++++++++++++--- 1 file changed, 27 insertions(+), 3 deletions(-)
diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index 9cb32e1a78a0..23fc24d16b31 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -309,6 +309,8 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter) { + struct file *file = iocb->ki_filp; + char *buf = file->private_data; loff_t *fpos = &iocb->ki_pos; size_t phdrs_offset, notes_offset, data_offset; size_t page_offline_frozen = 1; @@ -555,10 +557,21 @@ static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter) case KCORE_VMEMMAP: case KCORE_TEXT: /* - * We use _copy_to_iter() to bypass usermode hardening - * which would otherwise prevent this operation. + * Sadly we must use a bounce buffer here to be able to + * make use of copy_from_kernel_nofault(), as these + * memory regions might not always be mapped on all + * architectures. */ - if (_copy_to_iter((char *)start, tsz, iter) != tsz) { + if (copy_from_kernel_nofault(buf, (void *)start, tsz)) { + if (iov_iter_zero(tsz, iter) != tsz) { + ret = -EFAULT; + goto out; + } + /* + * We know the bounce buffer is safe to copy from, so + * use _copy_to_iter() directly. + */ + } else if (_copy_to_iter(buf, tsz, iter) != tsz) { ret = -EFAULT; goto out; } @@ -595,6 +608,10 @@ static int open_kcore(struct inode *inode, struct file *filp) if (ret) return ret;
+ filp->private_data = kmalloc(PAGE_SIZE, GFP_KERNEL); + if (!filp->private_data) + return -ENOMEM; + if (kcore_need_update) kcore_update_ram(); if (i_size_read(inode) != proc_root_kcore->size) { @@ -605,9 +622,16 @@ static int open_kcore(struct inode *inode, struct file *filp) return 0; }
+static int release_kcore(struct inode *inode, struct file *file) +{ + kfree(file->private_data); + return 0; +} + static const struct proc_ops kcore_proc_ops = { .proc_read_iter = read_kcore_iter, .proc_open = open_kcore, + .proc_release = release_kcore, .proc_lseek = default_llseek, };
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit f8654743a0e6909dc634cbfad6db6816f10f3399 upstream.
During unmount process of nilfs2, nothing holds nilfs_root structure after nilfs2 detaches its writer in nilfs_detach_log_writer(). Previously, nilfs_evict_inode() could cause use-after-free read for nilfs_root if inodes are left in "garbage_list" and released by nilfs_dispose_list at the end of nilfs_detach_log_writer(), and this bug was fixed by commit 9b5a04ac3ad9 ("nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()").
However, it turned out that there is another possibility of UAF in the call path where mark_inode_dirty_sync() is called from iput():
nilfs_detach_log_writer() nilfs_dispose_list() iput() mark_inode_dirty_sync() __mark_inode_dirty() nilfs_dirty_inode() __nilfs_mark_inode_dirty() nilfs_load_inode_block() --> causes UAF of nilfs_root struct
This can happen after commit 0ae45f63d4ef ("vfs: add support for a lazytime mount option"), which changed iput() to call mark_inode_dirty_sync() on its final reference if i_state has I_DIRTY_TIME flag and i_nlink is non-zero.
This issue appears after commit 28a65b49eb53 ("nilfs2: do not write dirty data after degenerating to read-only") when using the syzbot reproducer, but the issue has potentially existed before.
Fix this issue by adding a "purging flag" to the nilfs structure, setting that flag while disposing the "garbage_list" and checking it in __nilfs_mark_inode_dirty().
Unlike commit 9b5a04ac3ad9 ("nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()"), this patch does not rely on ns_writer to determine whether to skip operations, so as not to break recovery on mount. The nilfs_salvage_orphan_logs routine dirties the buffer of salvaged data before attaching the log writer, so changing __nilfs_mark_inode_dirty() to skip the operation when ns_writer is NULL will cause recovery write to fail. The purpose of using the cleanup-only flag is to allow for narrowing of such conditions.
Link: https://lkml.kernel.org/r/20230728191318.33047-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reported-by: syzbot+74db8b3087f293d3a13a@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/000000000000b4e906060113fd63@google.com Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option") Tested-by: Ryusuke Konishi konishi.ryusuke@gmail.com Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/inode.c | 8 ++++++++ fs/nilfs2/segment.c | 2 ++ fs/nilfs2/the_nilfs.h | 2 ++ 3 files changed, 12 insertions(+)
--- a/fs/nilfs2/inode.c +++ b/fs/nilfs2/inode.c @@ -1101,9 +1101,17 @@ int nilfs_set_file_dirty(struct inode *i
int __nilfs_mark_inode_dirty(struct inode *inode, int flags) { + struct the_nilfs *nilfs = inode->i_sb->s_fs_info; struct buffer_head *ibh; int err;
+ /* + * Do not dirty inodes after the log writer has been detached + * and its nilfs_root struct has been freed. + */ + if (unlikely(nilfs_purging(nilfs))) + return 0; + err = nilfs_load_inode_block(inode, &ibh); if (unlikely(err)) { nilfs_warn(inode->i_sb, --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -2845,6 +2845,7 @@ void nilfs_detach_log_writer(struct supe nilfs_segctor_destroy(nilfs->ns_writer); nilfs->ns_writer = NULL; } + set_nilfs_purging(nilfs);
/* Force to free the list of dirty files */ spin_lock(&nilfs->ns_inode_lock); @@ -2857,4 +2858,5 @@ void nilfs_detach_log_writer(struct supe up_write(&nilfs->ns_segctor_sem);
nilfs_dispose_list(nilfs, &garbage_list, 1); + clear_nilfs_purging(nilfs); } --- a/fs/nilfs2/the_nilfs.h +++ b/fs/nilfs2/the_nilfs.h @@ -29,6 +29,7 @@ enum { THE_NILFS_DISCONTINUED, /* 'next' pointer chain has broken */ THE_NILFS_GC_RUNNING, /* gc process is running */ THE_NILFS_SB_DIRTY, /* super block is dirty */ + THE_NILFS_PURGING, /* disposing dirty files for cleanup */ };
/** @@ -208,6 +209,7 @@ THE_NILFS_FNS(INIT, init) THE_NILFS_FNS(DISCONTINUED, discontinued) THE_NILFS_FNS(GC_RUNNING, gc_running) THE_NILFS_FNS(SB_DIRTY, sb_dirty) +THE_NILFS_FNS(PURGING, purging)
/* * Mount option operations
From: Karol Wachowski karol.wachowski@linux.intel.com
commit 1963546390ed8b649f529993a755eba0fdeb7aaa upstream.
Buffers mapped with pgprot_writecombined() are not correctly flushed. This triggers issues on VPU access using random memory content such as MMU translation faults, invalid context descriptors being fetched and can lead to VPU FW crashes.
Fixes: 647371a6609d ("accel/ivpu: Add GEM buffer object management") Cc: stable@vger.kernel.org # 6.3+ Signed-off-by: Karol Wachowski karol.wachowski@linux.intel.com Reviewed-by: Stanislaw Gruszka stanislaw.gruszka@linux.intel.com Signed-off-by: Stanislaw Gruszka stanislaw.gruszka@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20230802063735.3005291-1-stani... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/accel/ivpu/ivpu_gem.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/accel/ivpu/ivpu_gem.c b/drivers/accel/ivpu/ivpu_gem.c index 52b339aefadc..9967fcfa27ec 100644 --- a/drivers/accel/ivpu/ivpu_gem.c +++ b/drivers/accel/ivpu/ivpu_gem.c @@ -173,6 +173,9 @@ static void internal_free_pages_locked(struct ivpu_bo *bo) { unsigned int i, npages = bo->base.size >> PAGE_SHIFT;
+ if (ivpu_bo_cache_mode(bo) != DRM_IVPU_BO_CACHED) + set_pages_array_wb(bo->pages, bo->base.size >> PAGE_SHIFT); + for (i = 0; i < npages; i++) put_page(bo->pages[i]);
@@ -587,6 +590,11 @@ ivpu_bo_alloc_internal(struct ivpu_device *vdev, u64 vpu_addr, u64 size, u32 fla if (ivpu_bo_cache_mode(bo) != DRM_IVPU_BO_CACHED) drm_clflush_pages(bo->pages, bo->base.size >> PAGE_SHIFT);
+ if (bo->flags & DRM_IVPU_BO_WC) + set_pages_array_wc(bo->pages, bo->base.size >> PAGE_SHIFT); + else if (bo->flags & DRM_IVPU_BO_UNCACHED) + set_pages_array_uc(bo->pages, bo->base.size >> PAGE_SHIFT); + prot = ivpu_bo_pgprot(bo, PAGE_KERNEL); bo->kvaddr = vmap(bo->pages, bo->base.size >> PAGE_SHIFT, VM_MAP, prot); if (!bo->kvaddr) {
From: Mike Kravetz mike.kravetz@oracle.com
commit 32c877191e022b55fe3a374f3d7e9fb5741c514d upstream.
Patch series "Fix hugetlb free path race with memory errors".
In the discussion of Jiaqi Yan's series "Improve hugetlbfs read on HWPOISON hugepages" the race window was discovered. https://lore.kernel.org/linux-mm/20230616233447.GB7371@monkey/
Freeing a hugetlb page back to low level memory allocators is performed in two steps. 1) Under hugetlb lock, remove page from hugetlb lists and clear destructor 2) Outside lock, allocate vmemmap if necessary and call low level free Between these two steps, the hugetlb page will appear as a normal compound page. However, vmemmap for tail pages could be missing. If a memory error occurs at this time, we could try to update page flags non-existant page structs.
A much more detailed description is in the first patch.
The first patch addresses the race window. However, it adds a hugetlb_lock lock/unlock cycle to every vmemmap optimized hugetlb page free operation. This could lead to slowdowns if one is freeing a large number of hugetlb pages.
The second path optimizes the update_and_free_pages_bulk routine to only take the lock once in bulk operations.
The second patch is technically not a bug fix, but includes a Fixes tag and Cc stable to avoid a performance regression. It can be combined with the first, but was done separately make reviewing easier.
This patch (of 2):
Freeing a hugetlb page and releasing base pages back to the underlying allocator such as buddy or cma is performed in two steps: - remove_hugetlb_folio() is called to remove the folio from hugetlb lists, get a ref on the page and remove hugetlb destructor. This all must be done under the hugetlb lock. After this call, the page can be treated as a normal compound page or a collection of base size pages. - update_and_free_hugetlb_folio() is called to allocate vmemmap if needed and the free routine of the underlying allocator is called on the resulting page. We can not hold the hugetlb lock here.
One issue with this scheme is that a memory error could occur between these two steps. In this case, the memory error handling code treats the old hugetlb page as a normal compound page or collection of base pages. It will then try to SetPageHWPoison(page) on the page with an error. If the page with error is a tail page without vmemmap, a write error will occur when trying to set the flag.
Address this issue by modifying remove_hugetlb_folio() and update_and_free_hugetlb_folio() such that the hugetlb destructor is not cleared until after allocating vmemmap. Since clearing the destructor requires holding the hugetlb lock, the clearing is done in remove_hugetlb_folio() if the vmemmap is present. This saves a lock/unlock cycle. Otherwise, destructor is cleared in update_and_free_hugetlb_folio() after allocating vmemmap.
Note that this will leave hugetlb pages in a state where they are marked free (by hugetlb specific page flag) and have a ref count. This is not a normal state. The only code that would notice is the memory error code, and it is set up to retry in such a case.
A subsequent patch will create a routine to do bulk processing of vmemmap allocation. This will eliminate a lock/unlock cycle for each hugetlb page in the case where we are freeing a large number of pages.
Link: https://lkml.kernel.org/r/20230711220942.43706-1-mike.kravetz@oracle.com Link: https://lkml.kernel.org/r/20230711220942.43706-2-mike.kravetz@oracle.com Fixes: ad2fa3717b74 ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page") Signed-off-by: Mike Kravetz mike.kravetz@oracle.com Reviewed-by: Muchun Song songmuchun@bytedance.com Tested-by: Naoya Horiguchi naoya.horiguchi@nec.com Cc: Axel Rasmussen axelrasmussen@google.com Cc: James Houghton jthoughton@google.com Cc: Jiaqi Yan jiaqiyan@google.com Cc: Miaohe Lin linmiaohe@huawei.com Cc: Michal Hocko mhocko@suse.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/hugetlb.c | 75 ++++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 51 insertions(+), 24 deletions(-)
--- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1580,9 +1580,37 @@ static inline void destroy_compound_giga unsigned int order) { } #endif
+static inline void __clear_hugetlb_destructor(struct hstate *h, + struct folio *folio) +{ + lockdep_assert_held(&hugetlb_lock); + + /* + * Very subtle + * + * For non-gigantic pages set the destructor to the normal compound + * page dtor. This is needed in case someone takes an additional + * temporary ref to the page, and freeing is delayed until they drop + * their reference. + * + * For gigantic pages set the destructor to the null dtor. This + * destructor will never be called. Before freeing the gigantic + * page destroy_compound_gigantic_folio will turn the folio into a + * simple group of pages. After this the destructor does not + * apply. + * + */ + if (hstate_is_gigantic(h)) + folio_set_compound_dtor(folio, NULL_COMPOUND_DTOR); + else + folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR); +} + /* - * Remove hugetlb folio from lists, and update dtor so that the folio appears - * as just a compound page. + * Remove hugetlb folio from lists. + * If vmemmap exists for the folio, update dtor so that the folio appears + * as just a compound page. Otherwise, wait until after allocating vmemmap + * to update dtor. * * A reference is held on the folio, except in the case of demote. * @@ -1613,31 +1641,19 @@ static void __remove_hugetlb_folio(struc }
/* - * Very subtle - * - * For non-gigantic pages set the destructor to the normal compound - * page dtor. This is needed in case someone takes an additional - * temporary ref to the page, and freeing is delayed until they drop - * their reference. - * - * For gigantic pages set the destructor to the null dtor. This - * destructor will never be called. Before freeing the gigantic - * page destroy_compound_gigantic_folio will turn the folio into a - * simple group of pages. After this the destructor does not - * apply. - * - * This handles the case where more than one ref is held when and - * after update_and_free_hugetlb_folio is called. - * - * In the case of demote we do not ref count the page as it will soon - * be turned into a page of smaller size. + * We can only clear the hugetlb destructor after allocating vmemmap + * pages. Otherwise, someone (memory error handling) may try to write + * to tail struct pages. + */ + if (!folio_test_hugetlb_vmemmap_optimized(folio)) + __clear_hugetlb_destructor(h, folio); + + /* + * In the case of demote we do not ref count the page as it will soon + * be turned into a page of smaller size. */ if (!demote) folio_ref_unfreeze(folio, 1); - if (hstate_is_gigantic(h)) - folio_set_compound_dtor(folio, NULL_COMPOUND_DTOR); - else - folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR);
h->nr_huge_pages--; h->nr_huge_pages_node[nid]--; @@ -1706,6 +1722,7 @@ static void __update_and_free_hugetlb_fo { int i; struct page *subpage; + bool clear_dtor = folio_test_hugetlb_vmemmap_optimized(folio);
if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported()) return; @@ -1736,6 +1753,16 @@ static void __update_and_free_hugetlb_fo if (unlikely(folio_test_hwpoison(folio))) folio_clear_hugetlb_hwpoison(folio);
+ /* + * If vmemmap pages were allocated above, then we need to clear the + * hugetlb destructor under the hugetlb lock. + */ + if (clear_dtor) { + spin_lock_irq(&hugetlb_lock); + __clear_hugetlb_destructor(h, folio); + spin_unlock_irq(&hugetlb_lock); + } + for (i = 0; i < pages_per_huge_page(h); i++) { subpage = folio_page(folio, i); subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
From: SeongJae Park sj@kernel.org
commit 5f1fc67f2cb8d3035d3acd273b48b97835af8afd upstream.
damos_new_filter() is not initializing the list field of newly allocated filter object. However, DAMON sysfs interface and DAMON_RECLAIM are not initializing it after calling damos_new_filter(). As a result, accessing uninitialized memory is possible. Actually, adding multiple DAMOS filters via DAMON sysfs interface caused NULL pointer dereferencing. Initialize the field just after the allocation from damos_new_filter().
Link: https://lkml.kernel.org/r/20230729203733.38949-2-sj@kernel.org Fixes: 98def236f63c ("mm/damon/core: implement damos filter") Signed-off-by: SeongJae Park sj@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/damon/core.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/mm/damon/core.c b/mm/damon/core.c index 91cff7f2997e..eb9580942a5c 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -273,6 +273,7 @@ struct damos_filter *damos_new_filter(enum damos_filter_type type, return NULL; filter->type = type; filter->matching = matching; + INIT_LIST_HEAD(&filter->list); return filter; }
From: Ayush Jain ayush.jain3@amd.com
commit 65294de30cb8bc7659e445f7be2846af9ed35499 upstream.
A missing break in kms_tests leads to kselftest hang when the parameter -s is used.
In current code flow because of missing break in -s, -t parses args spilled from -s and as -t accepts only valid values as 0,1 so any arg in -s >1 or <0, gets in ksm_test failure
This went undetected since, before the addition of option -t, the next case -M would immediately break out of the switch statement but that is no longer the case
Add the missing break statement.
----Before---- ./ksm_tests -H -s 100 Invalid merge type
----After---- ./ksm_tests -H -s 100 Number of normal pages: 0 Number of huge pages: 50 Total size: 100 MiB Total time: 0.401732682 s Average speed: 248.922 MiB/s
Link: https://lkml.kernel.org/r/20230728163952.4634-1-ayush.jain3@amd.com Fixes: 07115fcc15b4 ("selftests/mm: add new selftests for KSM") Signed-off-by: Ayush Jain ayush.jain3@amd.com Reviewed-by: David Hildenbrand david@redhat.com Cc: Stefan Roesch shr@devkernel.io Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/mm/ksm_tests.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mm/ksm_tests.c b/tools/testing/selftests/mm/ksm_tests.c index 435acebdc325..380b691d3eb9 100644 --- a/tools/testing/selftests/mm/ksm_tests.c +++ b/tools/testing/selftests/mm/ksm_tests.c @@ -831,6 +831,7 @@ int main(int argc, char *argv[]) printf("Size must be greater than 0\n"); return KSFT_FAIL; } + break; case 't': { int tmp = atoi(optarg);
From: Miaohe Lin linmiaohe@huawei.com
commit f29623e4a599c295cc8f518c8e4bb7848581a14d upstream.
If unpoison_memory() fails to clear page hwpoisoned flag, return value ret is expected to be -EBUSY. But when get_hwpoison_page() returns 1 and fails to clear page hwpoisoned flag due to races, return value will be unexpected 1 leading to users being confused. And there's a code smell that the variable "ret" is used not only to save the return value of unpoison_memory(), but also the return value from get_hwpoison_page(). Make a further cleanup by using another auto-variable solely to save the return value of get_hwpoison_page() as suggested by Naoya.
Link: https://lkml.kernel.org/r/20230727115643.639741-3-linmiaohe@huawei.com Fixes: bf181c582588 ("mm/hwpoison: fix unpoison_memory()") Signed-off-by: Miaohe Lin linmiaohe@huawei.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Naoya Horiguchi naoya.horiguchi@nec.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory-failure.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-)
--- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -2469,7 +2469,7 @@ int unpoison_memory(unsigned long pfn) { struct folio *folio; struct page *p; - int ret = -EBUSY; + int ret = -EBUSY, ghp; unsigned long count = 1; bool huge = false; static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL, @@ -2517,29 +2517,28 @@ int unpoison_memory(unsigned long pfn) if (folio_test_slab(folio) || PageTable(&folio->page) || folio_test_reserved(folio)) goto unlock_mutex;
- ret = get_hwpoison_page(p, MF_UNPOISON); - if (!ret) { + ghp = get_hwpoison_page(p, MF_UNPOISON); + if (!ghp) { if (PageHuge(p)) { huge = true; count = folio_free_raw_hwp(folio, false); - if (count == 0) { - ret = -EBUSY; + if (count == 0) goto unlock_mutex; - } } ret = folio_test_clear_hwpoison(folio) ? 0 : -EBUSY; - } else if (ret < 0) { - if (ret == -EHWPOISON) { + } else if (ghp < 0) { + if (ghp == -EHWPOISON) { ret = put_page_back_buddy(p) ? 0 : -EBUSY; - } else + } else { + ret = ghp; unpoison_pr_info("Unpoison: failed to grab page %#lx\n", pfn, &unpoison_rs); + } } else { if (PageHuge(p)) { huge = true; count = folio_free_raw_hwp(folio, false); if (count == 0) { - ret = -EBUSY; folio_put(folio); goto unlock_mutex; }
From: Miaohe Lin linmiaohe@huawei.com
commit faeb2ff2c1c5cb60ce0da193580b256c941f99ca upstream.
folio->_mapcount is overloaded in SLAB, so folio_mapped() has to be done after folio_test_slab() is checked. Otherwise slab folio might be treated as a mapped folio leading to false 'Someone maps the hwpoison page' error info.
Link: https://lkml.kernel.org/r/20230727115643.639741-4-linmiaohe@huawei.com Fixes: 230ac719c500 ("mm/hwpoison: don't try to unpoison containment-failed pages") Signed-off-by: Miaohe Lin linmiaohe@huawei.com Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Acked-by: Naoya Horiguchi naoya.horiguchi@nec.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory-failure.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -2502,6 +2502,13 @@ int unpoison_memory(unsigned long pfn) goto unlock_mutex; }
+ if (folio_test_slab(folio) || PageTable(&folio->page) || folio_test_reserved(folio)) + goto unlock_mutex; + + /* + * Note that folio->_mapcount is overloaded in SLAB, so the simple test + * in folio_mapped() has to be done after folio_test_slab() is checked. + */ if (folio_mapped(folio)) { unpoison_pr_info("Unpoison: Someone maps the hwpoison page %#lx\n", pfn, &unpoison_rs); @@ -2514,9 +2521,6 @@ int unpoison_memory(unsigned long pfn) goto unlock_mutex; }
- if (folio_test_slab(folio) || PageTable(&folio->page) || folio_test_reserved(folio)) - goto unlock_mutex; - ghp = get_hwpoison_page(p, MF_UNPOISON); if (!ghp) { if (PageHuge(p)) {
From: Evan Quan evan.quan@amd.com
commit 064329c595da56eff6d7a7e7760660c726433139 upstream.
Preparation for coming optimization which eliminates the influence of GPU temperature momentary fluctuation.
Signed-off-by: Evan Quan evan.quan@amd.com Reviewed-by: Lijo Lazar lijo.lazar@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h | 2 ++ drivers/gpu/drm/amd/pm/powerplay/hwmgr/hardwaremanager.c | 4 +++- drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c | 2 ++ drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c | 10 ++++++++++ drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega12_hwmgr.c | 4 ++++ drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega20_hwmgr.c | 4 ++++ drivers/gpu/drm/amd/pm/powerplay/inc/power_state.h | 1 + 7 files changed, 26 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h @@ -89,6 +89,8 @@ struct amdgpu_dpm_thermal { int max_mem_crit_temp; /* memory max emergency(shutdown) temp */ int max_mem_emergency_temp; + /* SWCTF threshold */ + int sw_ctf_threshold; /* was last interrupt low to high or high to low */ bool high_to_low; /* interrupt source */ --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/hardwaremanager.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/hardwaremanager.c @@ -241,7 +241,8 @@ int phm_start_thermal_controller(struct TEMP_RANGE_MAX, TEMP_RANGE_MIN, TEMP_RANGE_MAX, - TEMP_RANGE_MAX}; + TEMP_RANGE_MAX, + 0}; struct amdgpu_device *adev = hwmgr->adev;
if (!hwmgr->not_vf) @@ -265,6 +266,7 @@ int phm_start_thermal_controller(struct adev->pm.dpm.thermal.min_mem_temp = range.mem_min; adev->pm.dpm.thermal.max_mem_crit_temp = range.mem_crit_max; adev->pm.dpm.thermal.max_mem_emergency_temp = range.mem_emergency_max; + adev->pm.dpm.thermal.sw_ctf_threshold = range.sw_ctf_threshold;
return ret; } --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c @@ -5432,6 +5432,8 @@ static int smu7_get_thermal_temperature_ thermal_data->max = data->thermal_temp_setting.temperature_shutdown * PP_TEMPERATURE_UNITS_PER_CENTIGRADES;
+ thermal_data->sw_ctf_threshold = thermal_data->max; + return 0; }
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c @@ -5241,6 +5241,9 @@ static int vega10_get_thermal_temperatur { struct vega10_hwmgr *data = hwmgr->backend; PPTable_t *pp_table = &(data->smc_state_table.pp_table); + struct phm_ppt_v2_information *pp_table_info = + (struct phm_ppt_v2_information *)(hwmgr->pptable); + struct phm_tdp_table *tdp_table = pp_table_info->tdp_table;
memcpy(thermal_data, &SMU7ThermalWithDelayPolicy[0], sizeof(struct PP_TemperatureRange));
@@ -5257,6 +5260,13 @@ static int vega10_get_thermal_temperatur thermal_data->mem_emergency_max = (pp_table->ThbmLimit + CTF_OFFSET_HBM)* PP_TEMPERATURE_UNITS_PER_CENTIGRADES;
+ if (tdp_table->usSoftwareShutdownTemp > pp_table->ThotspotLimit && + tdp_table->usSoftwareShutdownTemp < VEGA10_THERMAL_MAXIMUM_ALERT_TEMP) + thermal_data->sw_ctf_threshold = tdp_table->usSoftwareShutdownTemp; + else + thermal_data->sw_ctf_threshold = VEGA10_THERMAL_MAXIMUM_ALERT_TEMP; + thermal_data->sw_ctf_threshold *= PP_TEMPERATURE_UNITS_PER_CENTIGRADES; + return 0; }
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega12_hwmgr.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega12_hwmgr.c @@ -2763,6 +2763,8 @@ static int vega12_notify_cac_buffer_info static int vega12_get_thermal_temperature_range(struct pp_hwmgr *hwmgr, struct PP_TemperatureRange *thermal_data) { + struct phm_ppt_v3_information *pptable_information = + (struct phm_ppt_v3_information *)hwmgr->pptable; struct vega12_hwmgr *data = (struct vega12_hwmgr *)(hwmgr->backend); PPTable_t *pp_table = &(data->smc_state_table.pp_table); @@ -2781,6 +2783,8 @@ static int vega12_get_thermal_temperatur PP_TEMPERATURE_UNITS_PER_CENTIGRADES; thermal_data->mem_emergency_max = (pp_table->ThbmLimit + CTF_OFFSET_HBM)* PP_TEMPERATURE_UNITS_PER_CENTIGRADES; + thermal_data->sw_ctf_threshold = pptable_information->us_software_shutdown_temp * + PP_TEMPERATURE_UNITS_PER_CENTIGRADES;
return 0; } --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega20_hwmgr.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega20_hwmgr.c @@ -4206,6 +4206,8 @@ static int vega20_notify_cac_buffer_info static int vega20_get_thermal_temperature_range(struct pp_hwmgr *hwmgr, struct PP_TemperatureRange *thermal_data) { + struct phm_ppt_v3_information *pptable_information = + (struct phm_ppt_v3_information *)hwmgr->pptable; struct vega20_hwmgr *data = (struct vega20_hwmgr *)(hwmgr->backend); PPTable_t *pp_table = &(data->smc_state_table.pp_table); @@ -4224,6 +4226,8 @@ static int vega20_get_thermal_temperatur PP_TEMPERATURE_UNITS_PER_CENTIGRADES; thermal_data->mem_emergency_max = (pp_table->ThbmLimit + CTF_OFFSET_HBM)* PP_TEMPERATURE_UNITS_PER_CENTIGRADES; + thermal_data->sw_ctf_threshold = pptable_information->us_software_shutdown_temp * + PP_TEMPERATURE_UNITS_PER_CENTIGRADES;
return 0; } --- a/drivers/gpu/drm/amd/pm/powerplay/inc/power_state.h +++ b/drivers/gpu/drm/amd/pm/powerplay/inc/power_state.h @@ -131,6 +131,7 @@ struct PP_TemperatureRange { int mem_min; int mem_crit_max; int mem_emergency_max; + int sw_ctf_threshold; };
struct PP_StateValidationBlock {
From: Evan Quan evan.quan@amd.com
commit b75efe88b20c2be28b67e2821a794cc183e32374 upstream.
An intentional delay is added on soft ctf triggered. Then there will be a double check for the GPU temperature before taking further action. This can avoid unintended shutdown due to temperature momentary fluctuation.
Signed-off-by: Evan Quan evan.quan@amd.com Reviewed-by: Lijo Lazar lijo.lazar@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com [ Hand-modified because XCP support added to amdgpu.h in kernel 6.5 and is not necessary for this fix. ] Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1267 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2779 Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 + drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c | 48 ++++++++++++++++++++ drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu_helper.c | 27 +++-------- drivers/gpu/drm/amd/pm/powerplay/inc/hwmgr.h | 2 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 34 ++++++++++++++ drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 2 drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 9 --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 9 --- 8 files changed, 102 insertions(+), 32 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -282,6 +282,9 @@ extern int amdgpu_sg_display; #define AMDGPU_SMARTSHIFT_MAX_BIAS (100) #define AMDGPU_SMARTSHIFT_MIN_BIAS (-100)
+/* Extra time delay(in ms) to eliminate the influence of temperature momentary fluctuation */ +#define AMDGPU_SWCTF_EXTRA_DELAY 50 + struct amdgpu_device; struct amdgpu_irq_src; struct amdgpu_fpriv; --- a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c +++ b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c @@ -26,6 +26,7 @@ #include <linux/gfp.h> #include <linux/slab.h> #include <linux/firmware.h> +#include <linux/reboot.h> #include "amd_shared.h" #include "amd_powerplay.h" #include "power_state.h" @@ -91,6 +92,45 @@ static int pp_early_init(void *handle) return 0; }
+static void pp_swctf_delayed_work_handler(struct work_struct *work) +{ + struct pp_hwmgr *hwmgr = + container_of(work, struct pp_hwmgr, swctf_delayed_work.work); + struct amdgpu_device *adev = hwmgr->adev; + struct amdgpu_dpm_thermal *range = + &adev->pm.dpm.thermal; + uint32_t gpu_temperature, size; + int ret; + + /* + * If the hotspot/edge temperature is confirmed as below SW CTF setting point + * after the delay enforced, nothing will be done. + * Otherwise, a graceful shutdown will be performed to prevent further damage. + */ + if (range->sw_ctf_threshold && + hwmgr->hwmgr_func->read_sensor) { + ret = hwmgr->hwmgr_func->read_sensor(hwmgr, + AMDGPU_PP_SENSOR_HOTSPOT_TEMP, + &gpu_temperature, + &size); + /* + * For some legacy ASICs, hotspot temperature retrieving might be not + * supported. Check the edge temperature instead then. + */ + if (ret == -EOPNOTSUPP) + ret = hwmgr->hwmgr_func->read_sensor(hwmgr, + AMDGPU_PP_SENSOR_EDGE_TEMP, + &gpu_temperature, + &size); + if (!ret && gpu_temperature / 1000 < range->sw_ctf_threshold) + return; + } + + dev_emerg(adev->dev, "ERROR: GPU over temperature range(SW CTF) detected!\n"); + dev_emerg(adev->dev, "ERROR: System is going to shutdown due to GPU SW CTF!\n"); + orderly_poweroff(true); +} + static int pp_sw_init(void *handle) { struct amdgpu_device *adev = handle; @@ -101,6 +141,10 @@ static int pp_sw_init(void *handle)
pr_debug("powerplay sw init %s\n", ret ? "failed" : "successfully");
+ if (!ret) + INIT_DELAYED_WORK(&hwmgr->swctf_delayed_work, + pp_swctf_delayed_work_handler); + return ret; }
@@ -135,6 +179,8 @@ static int pp_hw_fini(void *handle) struct amdgpu_device *adev = handle; struct pp_hwmgr *hwmgr = adev->powerplay.pp_handle;
+ cancel_delayed_work_sync(&hwmgr->swctf_delayed_work); + hwmgr_hw_fini(hwmgr);
return 0; @@ -221,6 +267,8 @@ static int pp_suspend(void *handle) struct amdgpu_device *adev = handle; struct pp_hwmgr *hwmgr = adev->powerplay.pp_handle;
+ cancel_delayed_work_sync(&hwmgr->swctf_delayed_work); + return hwmgr_suspend(hwmgr); }
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu_helper.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu_helper.c @@ -603,21 +603,17 @@ int phm_irq_process(struct amdgpu_device struct amdgpu_irq_src *source, struct amdgpu_iv_entry *entry) { + struct pp_hwmgr *hwmgr = adev->powerplay.pp_handle; uint32_t client_id = entry->client_id; uint32_t src_id = entry->src_id;
if (client_id == AMDGPU_IRQ_CLIENTID_LEGACY) { if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_LOW_TO_HIGH) { - dev_emerg(adev->dev, "ERROR: GPU over temperature range(SW CTF) detected!\n"); - /* - * SW CTF just occurred. - * Try to do a graceful shutdown to prevent further damage. - */ - dev_emerg(adev->dev, "ERROR: System is going to shutdown due to GPU SW CTF!\n"); - orderly_poweroff(true); - } else if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_HIGH_TO_LOW) + schedule_delayed_work(&hwmgr->swctf_delayed_work, + msecs_to_jiffies(AMDGPU_SWCTF_EXTRA_DELAY)); + } else if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_HIGH_TO_LOW) { dev_emerg(adev->dev, "ERROR: GPU under temperature range detected!\n"); - else if (src_id == VISLANDS30_IV_SRCID_GPIO_19) { + } else if (src_id == VISLANDS30_IV_SRCID_GPIO_19) { dev_emerg(adev->dev, "ERROR: GPU HW Critical Temperature Fault(aka CTF) detected!\n"); /* * HW CTF just occurred. Shutdown to prevent further damage. @@ -626,15 +622,10 @@ int phm_irq_process(struct amdgpu_device orderly_poweroff(true); } } else if (client_id == SOC15_IH_CLIENTID_THM) { - if (src_id == 0) { - dev_emerg(adev->dev, "ERROR: GPU over temperature range(SW CTF) detected!\n"); - /* - * SW CTF just occurred. - * Try to do a graceful shutdown to prevent further damage. - */ - dev_emerg(adev->dev, "ERROR: System is going to shutdown due to GPU SW CTF!\n"); - orderly_poweroff(true); - } else + if (src_id == 0) + schedule_delayed_work(&hwmgr->swctf_delayed_work, + msecs_to_jiffies(AMDGPU_SWCTF_EXTRA_DELAY)); + else dev_emerg(adev->dev, "ERROR: GPU under temperature range detected!\n"); } else if (client_id == SOC15_IH_CLIENTID_ROM_SMUIO) { dev_emerg(adev->dev, "ERROR: GPU HW Critical Temperature Fault(aka CTF) detected!\n"); --- a/drivers/gpu/drm/amd/pm/powerplay/inc/hwmgr.h +++ b/drivers/gpu/drm/amd/pm/powerplay/inc/hwmgr.h @@ -811,6 +811,8 @@ struct pp_hwmgr { bool gfxoff_state_changed_by_workload; uint32_t pstate_sclk_peak; uint32_t pstate_mclk_peak; + + struct delayed_work swctf_delayed_work; };
int hwmgr_early_init(struct pp_hwmgr *hwmgr); --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -24,6 +24,7 @@
#include <linux/firmware.h> #include <linux/pci.h> +#include <linux/reboot.h>
#include "amdgpu.h" #include "amdgpu_smu.h" @@ -1070,6 +1071,34 @@ static void smu_interrupt_work_fn(struct smu->ppt_funcs->interrupt_work(smu); }
+static void smu_swctf_delayed_work_handler(struct work_struct *work) +{ + struct smu_context *smu = + container_of(work, struct smu_context, swctf_delayed_work.work); + struct smu_temperature_range *range = + &smu->thermal_range; + struct amdgpu_device *adev = smu->adev; + uint32_t hotspot_tmp, size; + + /* + * If the hotspot temperature is confirmed as below SW CTF setting point + * after the delay enforced, nothing will be done. + * Otherwise, a graceful shutdown will be performed to prevent further damage. + */ + if (range->software_shutdown_temp && + smu->ppt_funcs->read_sensor && + !smu->ppt_funcs->read_sensor(smu, + AMDGPU_PP_SENSOR_HOTSPOT_TEMP, + &hotspot_tmp, + &size) && + hotspot_tmp / 1000 < range->software_shutdown_temp) + return; + + dev_emerg(adev->dev, "ERROR: GPU over temperature range(SW CTF) detected!\n"); + dev_emerg(adev->dev, "ERROR: System is going to shutdown due to GPU SW CTF!\n"); + orderly_poweroff(true); +} + static int smu_sw_init(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle; @@ -1112,6 +1141,9 @@ static int smu_sw_init(void *handle) smu->smu_dpm.dpm_level = AMD_DPM_FORCED_LEVEL_AUTO; smu->smu_dpm.requested_dpm_level = AMD_DPM_FORCED_LEVEL_AUTO;
+ INIT_DELAYED_WORK(&smu->swctf_delayed_work, + smu_swctf_delayed_work_handler); + ret = smu_smc_table_sw_init(smu); if (ret) { dev_err(adev->dev, "Failed to sw init smc table!\n"); @@ -1592,6 +1624,8 @@ static int smu_smc_hw_cleanup(struct smu return ret; }
+ cancel_delayed_work_sync(&smu->swctf_delayed_work); + ret = smu_disable_dpms(smu); if (ret) { dev_err(adev->dev, "Fail to disable dpm features!\n"); --- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h +++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h @@ -573,6 +573,8 @@ struct smu_context u32 debug_param_reg; u32 debug_msg_reg; u32 debug_resp_reg; + + struct delayed_work swctf_delayed_work; };
struct i2c_adapter; --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c @@ -1412,13 +1412,8 @@ static int smu_v11_0_irq_process(struct if (client_id == SOC15_IH_CLIENTID_THM) { switch (src_id) { case THM_11_0__SRCID__THM_DIG_THERM_L2H: - dev_emerg(adev->dev, "ERROR: GPU over temperature range(SW CTF) detected!\n"); - /* - * SW CTF just occurred. - * Try to do a graceful shutdown to prevent further damage. - */ - dev_emerg(adev->dev, "ERROR: System is going to shutdown due to GPU SW CTF!\n"); - orderly_poweroff(true); + schedule_delayed_work(&smu->swctf_delayed_work, + msecs_to_jiffies(AMDGPU_SWCTF_EXTRA_DELAY)); break; case THM_11_0__SRCID__THM_DIG_THERM_H2L: dev_emerg(adev->dev, "ERROR: GPU under temperature range detected\n"); --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c @@ -1377,13 +1377,8 @@ static int smu_v13_0_irq_process(struct if (client_id == SOC15_IH_CLIENTID_THM) { switch (src_id) { case THM_11_0__SRCID__THM_DIG_THERM_L2H: - dev_emerg(adev->dev, "ERROR: GPU over temperature range(SW CTF) detected!\n"); - /* - * SW CTF just occurred. - * Try to do a graceful shutdown to prevent further damage. - */ - dev_emerg(adev->dev, "ERROR: System is going to shutdown due to GPU SW CTF!\n"); - orderly_poweroff(true); + schedule_delayed_work(&smu->swctf_delayed_work, + msecs_to_jiffies(AMDGPU_SWCTF_EXTRA_DELAY)); break; case THM_11_0__SRCID__THM_DIG_THERM_H2L: dev_emerg(adev->dev, "ERROR: GPU under temperature range detected\n");
From: Yiyuan Guo yguoaz@gmail.com
commit 8a4629055ef55177b5b63dab1ecce676bd8cccdd upstream.
The struct cros_ec_command contains several integer fields and a trailing array. An allocation size neglecting the integer fields can lead to buffer overrun.
Reviewed-by: Tzung-Bi Shih tzungbi@kernel.org Signed-off-by: Yiyuan Guo yguoaz@gmail.com Fixes: 974e6f02e27e ("iio: cros_ec_sensors_core: Add common functions for the ChromeOS EC Sensor Hub.") Link: https://lore.kernel.org/r/20230630143719.1513906-1-yguoaz@gmail.com Cc: Stable@vger.kerenl.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c +++ b/drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.c @@ -253,7 +253,7 @@ int cros_ec_sensors_core_init(struct pla platform_set_drvdata(pdev, indio_dev);
state->ec = ec->ec_dev; - state->msg = devm_kzalloc(&pdev->dev, + state->msg = devm_kzalloc(&pdev->dev, sizeof(*state->msg) + max((u16)sizeof(struct ec_params_motion_sense), state->ec->max_response), GFP_KERNEL); if (!state->msg)
From: Dan Carpenter dan.carpenter@linaro.org
commit 507397d19b5a296aa339f7a1bd16284f668a1906 upstream.
The regulator_get_voltage() function returns negative error codes. This function saves it to an unsigned int and then does some range checking and, since the error code falls outside the correct range, it returns -EINVAL.
Beyond the messiness, this is bad because the regulator_get_voltage() function can return -EPROBE_DEFER and it's important to propagate that back properly so it can be handled.
Fixes: da35a7b526d9 ("iio: frequency: admv1013: add support for ADMV1013") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Link: https://lore.kernel.org/r/ce75aac3-2aba-4435-8419-02e59fdd862b@moroto.mounta... Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/frequency/admv1013.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/iio/frequency/admv1013.c +++ b/drivers/iio/frequency/admv1013.c @@ -344,9 +344,12 @@ static int admv1013_update_quad_filters(
static int admv1013_update_mixer_vgate(struct admv1013_state *st) { - unsigned int vcm, mixer_vgate; + unsigned int mixer_vgate; + int vcm;
vcm = regulator_get_voltage(st->reg); + if (vcm < 0) + return vcm;
if (vcm < 1800000) mixer_vgate = (2389 * vcm / 1000000 + 8100) / 100;
From: Alisa Roman alisa.roman@analog.com
commit 6bc471b6c3aeaa7b95d1b86a1bb8d91a3c341fa5 upstream.
AC excitation enable feature exposed to user on AD7192, allowing a bit which should be 0 to be set. This feature is specific only to AD7195. AC excitation attribute moved accordingly.
In the AD7195 documentation, the AC excitation enable bit is on position 22 in the Configuration register. ACX macro changed to match correct register and bit.
Note that the fix tag is for the commit that moved the driver out of staging.
Fixes: b581f748cce0 ("staging: iio: adc: ad7192: move out of staging") Signed-off-by: Alisa Roman alisa.roman@analog.com Cc: stable@vger.kernel.org Reviewed-by: Nuno Sa nuno.sa@analog.com Link: https://lore.kernel.org/r/20230614155242.160296-1-alisa.roman@analog.com Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/adc/ad7192.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/drivers/iio/adc/ad7192.c +++ b/drivers/iio/adc/ad7192.c @@ -62,7 +62,6 @@ #define AD7192_MODE_STA_MASK BIT(20) /* Status Register transmission Mask */ #define AD7192_MODE_CLKSRC(x) (((x) & 0x3) << 18) /* Clock Source Select */ #define AD7192_MODE_SINC3 BIT(15) /* SINC3 Filter Select */ -#define AD7192_MODE_ACX BIT(14) /* AC excitation enable(AD7195 only)*/ #define AD7192_MODE_ENPAR BIT(13) /* Parity Enable */ #define AD7192_MODE_CLKDIV BIT(12) /* Clock divide by 2 (AD7190/2 only)*/ #define AD7192_MODE_SCYCLE BIT(11) /* Single cycle conversion */ @@ -91,6 +90,7 @@ /* Configuration Register Bit Designations (AD7192_REG_CONF) */
#define AD7192_CONF_CHOP BIT(23) /* CHOP enable */ +#define AD7192_CONF_ACX BIT(22) /* AC excitation enable(AD7195 only) */ #define AD7192_CONF_REFSEL BIT(20) /* REFIN1/REFIN2 Reference Select */ #define AD7192_CONF_CHAN(x) ((x) << 8) /* Channel select */ #define AD7192_CONF_CHAN_MASK (0x7FF << 8) /* Channel select mask */ @@ -472,7 +472,7 @@ static ssize_t ad7192_show_ac_excitation struct iio_dev *indio_dev = dev_to_iio_dev(dev); struct ad7192_state *st = iio_priv(indio_dev);
- return sysfs_emit(buf, "%d\n", !!(st->mode & AD7192_MODE_ACX)); + return sysfs_emit(buf, "%d\n", !!(st->conf & AD7192_CONF_ACX)); }
static ssize_t ad7192_show_bridge_switch(struct device *dev, @@ -513,13 +513,13 @@ static ssize_t ad7192_set(struct device
ad_sd_write_reg(&st->sd, AD7192_REG_GPOCON, 1, st->gpocon); break; - case AD7192_REG_MODE: + case AD7192_REG_CONF: if (val) - st->mode |= AD7192_MODE_ACX; + st->conf |= AD7192_CONF_ACX; else - st->mode &= ~AD7192_MODE_ACX; + st->conf &= ~AD7192_CONF_ACX;
- ad_sd_write_reg(&st->sd, AD7192_REG_MODE, 3, st->mode); + ad_sd_write_reg(&st->sd, AD7192_REG_CONF, 3, st->conf); break; default: ret = -EINVAL; @@ -579,12 +579,11 @@ static IIO_DEVICE_ATTR(bridge_switch_en,
static IIO_DEVICE_ATTR(ac_excitation_en, 0644, ad7192_show_ac_excitation, ad7192_set, - AD7192_REG_MODE); + AD7192_REG_CONF);
static struct attribute *ad7192_attributes[] = { &iio_dev_attr_filter_low_pass_3db_frequency_available.dev_attr.attr, &iio_dev_attr_bridge_switch_en.dev_attr.attr, - &iio_dev_attr_ac_excitation_en.dev_attr.attr, NULL };
@@ -595,6 +594,7 @@ static const struct attribute_group ad71 static struct attribute *ad7195_attributes[] = { &iio_dev_attr_filter_low_pass_3db_frequency_available.dev_attr.attr, &iio_dev_attr_bridge_switch_en.dev_attr.attr, + &iio_dev_attr_ac_excitation_en.dev_attr.attr, NULL };
From: George Stark gnstark@sberdevices.ru
commit 09738ccbc4148c62d6c8c4644ff4a099d57f49ad upstream.
Enable core clock at probe stage and disable it at remove stage. Core clock is responsible for turning on/off the entire SoC module so it should be on before the first module register is touched and be off at very last moment.
Fixes: 3adbf3427330 ("iio: adc: add a driver for the SAR ADC found in Amlogic Meson SoCs") Signed-off-by: George Stark gnstark@sberdevices.ru Link: https://lore.kernel.org/r/20230721102413.255726-2-gnstark@sberdevices.ru Cc: stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/adc/meson_saradc.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-)
--- a/drivers/iio/adc/meson_saradc.c +++ b/drivers/iio/adc/meson_saradc.c @@ -916,12 +916,6 @@ static int meson_sar_adc_hw_enable(struc goto err_vref; }
- ret = clk_prepare_enable(priv->core_clk); - if (ret) { - dev_err(dev, "failed to enable core clk\n"); - goto err_core_clk; - } - regval = FIELD_PREP(MESON_SAR_ADC_REG0_FIFO_CNT_IRQ_MASK, 1); regmap_update_bits(priv->regmap, MESON_SAR_ADC_REG0, MESON_SAR_ADC_REG0_FIFO_CNT_IRQ_MASK, regval); @@ -948,8 +942,6 @@ err_adc_clk: regmap_update_bits(priv->regmap, MESON_SAR_ADC_REG3, MESON_SAR_ADC_REG3_ADC_EN, 0); meson_sar_adc_set_bandgap(indio_dev, false); - clk_disable_unprepare(priv->core_clk); -err_core_clk: regulator_disable(priv->vref); err_vref: meson_sar_adc_unlock(indio_dev); @@ -977,8 +969,6 @@ static void meson_sar_adc_hw_disable(str
meson_sar_adc_set_bandgap(indio_dev, false);
- clk_disable_unprepare(priv->core_clk); - regulator_disable(priv->vref);
if (!ret) @@ -1211,7 +1201,7 @@ static int meson_sar_adc_probe(struct pl if (IS_ERR(priv->clkin)) return dev_err_probe(dev, PTR_ERR(priv->clkin), "failed to get clkin\n");
- priv->core_clk = devm_clk_get(dev, "core"); + priv->core_clk = devm_clk_get_enabled(dev, "core"); if (IS_ERR(priv->core_clk)) return dev_err_probe(dev, PTR_ERR(priv->core_clk), "failed to get core clk\n");
@@ -1294,15 +1284,26 @@ static int meson_sar_adc_remove(struct p static int meson_sar_adc_suspend(struct device *dev) { struct iio_dev *indio_dev = dev_get_drvdata(dev); + struct meson_sar_adc_priv *priv = iio_priv(indio_dev);
meson_sar_adc_hw_disable(indio_dev);
+ clk_disable_unprepare(priv->core_clk); + return 0; }
static int meson_sar_adc_resume(struct device *dev) { struct iio_dev *indio_dev = dev_get_drvdata(dev); + struct meson_sar_adc_priv *priv = iio_priv(indio_dev); + int ret; + + ret = clk_prepare_enable(priv->core_clk); + if (ret) { + dev_err(dev, "failed to enable core clk\n"); + return ret; + }
return meson_sar_adc_hw_enable(indio_dev); }
From: Alvin Šipraga alsi@bang-olufsen.dk
commit a41e19cc0d6b6a445a4133170b90271e4a2553dc upstream.
The affected lines were resulting in a NULL pointer dereference on our platform because the device tree contained the following list of compatible strings:
power-sensor@40 { compatible = "ti,ina232", "ti,ina231"; ... };
Since the driver doesn't declare a compatible string "ti,ina232", the OF matching succeeds on "ti,ina231". But the I2C device ID info is populated via the first compatible string, cf. modalias population in of_i2c_get_board_info(). Since there is no "ina232" entry in the legacy I2C device ID table either, the struct i2c_device_id *id pointer in the probe function is NULL.
Fix this by using the already populated type variable instead, which points to the proper driver data. Since the name is also wanted, add a generic one to the ina2xx_config table.
Signed-off-by: Alvin Šipraga alsi@bang-olufsen.dk Fixes: c43a102e67db ("iio: ina2xx: add support for TI INA2xx Power Monitors") Link: https://lore.kernel.org/r/20230619141239.2257392-1-alvin@pqrs.dk Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/adc/ina2xx-adc.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/drivers/iio/adc/ina2xx-adc.c +++ b/drivers/iio/adc/ina2xx-adc.c @@ -124,6 +124,7 @@ static const struct regmap_config ina2xx enum ina2xx_ids { ina219, ina226 };
struct ina2xx_config { + const char *name; u16 config_default; int calibration_value; int shunt_voltage_lsb; /* nV */ @@ -155,6 +156,7 @@ struct ina2xx_chip_info {
static const struct ina2xx_config ina2xx_config[] = { [ina219] = { + .name = "ina219", .config_default = INA219_CONFIG_DEFAULT, .calibration_value = 4096, .shunt_voltage_lsb = 10000, @@ -164,6 +166,7 @@ static const struct ina2xx_config ina2xx .chip_id = ina219, }, [ina226] = { + .name = "ina226", .config_default = INA226_CONFIG_DEFAULT, .calibration_value = 2048, .shunt_voltage_lsb = 2500, @@ -996,7 +999,7 @@ static int ina2xx_probe(struct i2c_clien /* Patch the current config register with default. */ val = chip->config->config_default;
- if (id->driver_data == ina226) { + if (type == ina226) { ina226_set_average(chip, INA226_DEFAULT_AVG, &val); ina226_set_int_time_vbus(chip, INA226_DEFAULT_IT, &val); ina226_set_int_time_vshunt(chip, INA226_DEFAULT_IT, &val); @@ -1015,7 +1018,7 @@ static int ina2xx_probe(struct i2c_clien }
indio_dev->modes = INDIO_DIRECT_MODE; - if (id->driver_data == ina226) { + if (type == ina226) { indio_dev->channels = ina226_channels; indio_dev->num_channels = ARRAY_SIZE(ina226_channels); indio_dev->info = &ina226_info; @@ -1024,7 +1027,7 @@ static int ina2xx_probe(struct i2c_clien indio_dev->num_channels = ARRAY_SIZE(ina219_channels); indio_dev->info = &ina219_info; } - indio_dev->name = id->name; + indio_dev->name = id ? id->name : chip->config->name;
ret = devm_iio_kfifo_buffer_setup(&client->dev, indio_dev, &ina2xx_setup_ops);
From: Qi Zheng zhengqi.arch@bytedance.com
commit adb9743d6a08778b78d62d16b4230346d3508986 upstream.
In binder_init(), the destruction of binder_alloc_shrinker_init() is not performed in the wrong path, which will cause memory leaks. So this commit introduces binder_alloc_shrinker_exit() and calls it in the wrong path to fix that.
Signed-off-by: Qi Zheng zhengqi.arch@bytedance.com Acked-by: Carlos Llamas cmllamas@google.com Fixes: f2517eb76f1f ("android: binder: Add global lru shrinker to binder") Cc: stable stable@kernel.org Link: https://lore.kernel.org/r/20230625154937.64316-1-qi.zheng@linux.dev Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/android/binder.c | 1 + drivers/android/binder_alloc.c | 6 ++++++ drivers/android/binder_alloc.h | 1 + 3 files changed, 8 insertions(+)
--- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -6610,6 +6610,7 @@ err_init_binder_device_failed:
err_alloc_device_names_failed: debugfs_remove_recursive(binder_debugfs_dir_entry_root); + binder_alloc_shrinker_exit();
return ret; } --- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -1087,6 +1087,12 @@ int binder_alloc_shrinker_init(void) return ret; }
+void binder_alloc_shrinker_exit(void) +{ + unregister_shrinker(&binder_shrinker); + list_lru_destroy(&binder_alloc_lru); +} + /** * check_buffer() - verify that buffer/offset is safe to access * @alloc: binder_alloc for this proc --- a/drivers/android/binder_alloc.h +++ b/drivers/android/binder_alloc.h @@ -129,6 +129,7 @@ extern struct binder_buffer *binder_allo int pid); extern void binder_alloc_init(struct binder_alloc *alloc); extern int binder_alloc_shrinker_init(void); +extern void binder_alloc_shrinker_exit(void); extern void binder_alloc_vma_close(struct binder_alloc *alloc); extern struct binder_buffer * binder_alloc_prepare_to_free(struct binder_alloc *alloc,
From: Ricky WU ricky_wu@realtek.com
commit 101bd907b4244a726980ee67f95ed9cafab6ff7a upstream.
ASPM Mode is ASPM_MODE_CFG need to judge the value of clkreq_0 to set HIGH or LOW, if the ASPM Mode is ASPM_MODE_REG always set to HIGH during the initialization.
Cc: stable@vger.kernel.org Signed-off-by: Ricky Wu ricky_wu@realtek.com Link: https://lore.kernel.org/r/52906c6836374c8cb068225954c5543a@realtek.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/misc/cardreader/rts5227.c | 2 +- drivers/misc/cardreader/rts5228.c | 18 ------------------ drivers/misc/cardreader/rts5249.c | 3 +-- drivers/misc/cardreader/rts5260.c | 18 ------------------ drivers/misc/cardreader/rts5261.c | 18 ------------------ drivers/misc/cardreader/rtsx_pcr.c | 5 ++++- 6 files changed, 6 insertions(+), 58 deletions(-)
--- a/drivers/misc/cardreader/rts5227.c +++ b/drivers/misc/cardreader/rts5227.c @@ -195,7 +195,7 @@ static int rts5227_extra_init_hw(struct } }
- if (option->force_clkreq_0) + if (option->force_clkreq_0 && pcr->aspm_mode == ASPM_MODE_CFG) rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, PETXCFG, FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_LOW); else --- a/drivers/misc/cardreader/rts5228.c +++ b/drivers/misc/cardreader/rts5228.c @@ -435,17 +435,10 @@ static void rts5228_init_from_cfg(struct option->ltr_enabled = false; } } - - if (rtsx_check_dev_flag(pcr, ASPM_L1_1_EN | ASPM_L1_2_EN - | PM_L1_1_EN | PM_L1_2_EN)) - option->force_clkreq_0 = false; - else - option->force_clkreq_0 = true; }
static int rts5228_extra_init_hw(struct rtsx_pcr *pcr) { - struct rtsx_cr_option *option = &pcr->option;
rtsx_pci_write_register(pcr, RTS5228_AUTOLOAD_CFG1, CD_RESUME_EN_MASK, CD_RESUME_EN_MASK); @@ -476,17 +469,6 @@ static int rts5228_extra_init_hw(struct else rtsx_pci_write_register(pcr, PETXCFG, 0x30, 0x00);
- /* - * If u_force_clkreq_0 is enabled, CLKREQ# PIN will be forced - * to drive low, and we forcibly request clock. - */ - if (option->force_clkreq_0) - rtsx_pci_write_register(pcr, PETXCFG, - FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_LOW); - else - rtsx_pci_write_register(pcr, PETXCFG, - FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_HIGH); - rtsx_pci_write_register(pcr, PWD_SUSPEND_EN, 0xFF, 0xFB);
if (pcr->rtd3_en) { --- a/drivers/misc/cardreader/rts5249.c +++ b/drivers/misc/cardreader/rts5249.c @@ -327,12 +327,11 @@ static int rts5249_extra_init_hw(struct } }
- /* * If u_force_clkreq_0 is enabled, CLKREQ# PIN will be forced * to drive low, and we forcibly request clock. */ - if (option->force_clkreq_0) + if (option->force_clkreq_0 && pcr->aspm_mode == ASPM_MODE_CFG) rtsx_pci_write_register(pcr, PETXCFG, FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_LOW); else --- a/drivers/misc/cardreader/rts5260.c +++ b/drivers/misc/cardreader/rts5260.c @@ -517,17 +517,10 @@ static void rts5260_init_from_cfg(struct option->ltr_enabled = false; } } - - if (rtsx_check_dev_flag(pcr, ASPM_L1_1_EN | ASPM_L1_2_EN - | PM_L1_1_EN | PM_L1_2_EN)) - option->force_clkreq_0 = false; - else - option->force_clkreq_0 = true; }
static int rts5260_extra_init_hw(struct rtsx_pcr *pcr) { - struct rtsx_cr_option *option = &pcr->option;
/* Set mcu_cnt to 7 to ensure data can be sampled properly */ rtsx_pci_write_register(pcr, 0xFC03, 0x7F, 0x07); @@ -546,17 +539,6 @@ static int rts5260_extra_init_hw(struct
rts5260_init_hw(pcr);
- /* - * If u_force_clkreq_0 is enabled, CLKREQ# PIN will be forced - * to drive low, and we forcibly request clock. - */ - if (option->force_clkreq_0) - rtsx_pci_write_register(pcr, PETXCFG, - FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_LOW); - else - rtsx_pci_write_register(pcr, PETXCFG, - FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_HIGH); - rtsx_pci_write_register(pcr, pcr->reg_pm_ctrl3, 0x10, 0x00);
return 0; --- a/drivers/misc/cardreader/rts5261.c +++ b/drivers/misc/cardreader/rts5261.c @@ -498,17 +498,10 @@ static void rts5261_init_from_cfg(struct option->ltr_enabled = false; } } - - if (rtsx_check_dev_flag(pcr, ASPM_L1_1_EN | ASPM_L1_2_EN - | PM_L1_1_EN | PM_L1_2_EN)) - option->force_clkreq_0 = false; - else - option->force_clkreq_0 = true; }
static int rts5261_extra_init_hw(struct rtsx_pcr *pcr) { - struct rtsx_cr_option *option = &pcr->option; u32 val;
rtsx_pci_write_register(pcr, RTS5261_AUTOLOAD_CFG1, @@ -554,17 +547,6 @@ static int rts5261_extra_init_hw(struct else rtsx_pci_write_register(pcr, PETXCFG, 0x30, 0x00);
- /* - * If u_force_clkreq_0 is enabled, CLKREQ# PIN will be forced - * to drive low, and we forcibly request clock. - */ - if (option->force_clkreq_0) - rtsx_pci_write_register(pcr, PETXCFG, - FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_LOW); - else - rtsx_pci_write_register(pcr, PETXCFG, - FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_HIGH); - rtsx_pci_write_register(pcr, PWD_SUSPEND_EN, 0xFF, 0xFB);
if (pcr->rtd3_en) { --- a/drivers/misc/cardreader/rtsx_pcr.c +++ b/drivers/misc/cardreader/rtsx_pcr.c @@ -1326,8 +1326,11 @@ static int rtsx_pci_init_hw(struct rtsx_ return err; }
- if (pcr->aspm_mode == ASPM_MODE_REG) + if (pcr->aspm_mode == ASPM_MODE_REG) { rtsx_pci_write_register(pcr, ASPM_FORCE_CTL, 0x30, 0x30); + rtsx_pci_write_register(pcr, PETXCFG, + FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_HIGH); + }
/* No CD interrupt if probing driver with card inserted. * So we need to initialize pcr->card_exist here.
From: Mika Westerberg mika.westerberg@linux.intel.com
commit 596a5123cc782d458b057eb3837e66535cd0befa upstream.
The memory allocated in tb_queue_dp_bandwidth_request() needs to be released once the request is handled to avoid leaking it.
Fixes: 6ce3563520be ("thunderbolt: Add support for DisplayPort bandwidth allocation mode") Cc: stable@vger.kernel.org Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/thunderbolt/tb.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/thunderbolt/tb.c +++ b/drivers/thunderbolt/tb.c @@ -1810,6 +1810,8 @@ unlock:
pm_runtime_mark_last_busy(&tb->dev); pm_runtime_put_autosuspend(&tb->dev); + + kfree(ev); }
static void tb_queue_dp_bandwidth_request(struct tb *tb, u64 route, u8 port)
From: Alan Stern stern@rowland.harvard.edu
commit a6ff6e7a9dd69364547751db0f626a10a6d628d2 upstream.
Syzbot got KMSAN to complain about access to an uninitialized value in the alauda subdriver of usb-storage:
BUG: KMSAN: uninit-value in alauda_transport+0x462/0x57f0 drivers/usb/storage/alauda.c:1137 CPU: 0 PID: 12279 Comm: usb-storage Not tainted 5.3.0-rc7+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x191/0x1f0 lib/dump_stack.c:113 kmsan_report+0x13a/0x2b0 mm/kmsan/kmsan_report.c:108 __msan_warning+0x73/0xe0 mm/kmsan/kmsan_instr.c:250 alauda_check_media+0x344/0x3310 drivers/usb/storage/alauda.c:460
The problem is that alauda_check_media() doesn't verify that its USB transfer succeeded before trying to use the received data. What should happen if the transfer fails isn't entirely clear, but a reasonably conservative approach is to pretend that no media is present.
A similar problem exists in a usb_stor_dbg() call in alauda_get_media_status(). In this case, when an error occurs the call is redundant, because usb_stor_ctrl_transfer() already will print a debugging message.
Finally, unrelated to the uninitialized memory access, is the fact that alauda_check_media() performs DMA to a buffer on the stack. Fortunately usb-storage provides a general purpose DMA-able buffer for uses like this. We'll use it instead.
Reported-and-tested-by: syzbot+e7d46eb426883fb97efd@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/0000000000007d25ff059457342d@google.com/T/ Suggested-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: Alan Stern stern@rowland.harvard.edu Fixes: e80b0fade09e ("[PATCH] USB Storage: add alauda support") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/693d5d5e-f09b-42d0-8ed9-1f96cd30bcce@rowland.harva... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/storage/alauda.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
--- a/drivers/usb/storage/alauda.c +++ b/drivers/usb/storage/alauda.c @@ -318,7 +318,8 @@ static int alauda_get_media_status(struc rc = usb_stor_ctrl_transfer(us, us->recv_ctrl_pipe, command, 0xc0, 0, 1, data, 2);
- usb_stor_dbg(us, "Media status %02X %02X\n", data[0], data[1]); + if (rc == USB_STOR_XFER_GOOD) + usb_stor_dbg(us, "Media status %02X %02X\n", data[0], data[1]);
return rc; } @@ -454,9 +455,14 @@ static int alauda_init_media(struct us_d static int alauda_check_media(struct us_data *us) { struct alauda_info *info = (struct alauda_info *) us->extra; - unsigned char status[2]; + unsigned char *status = us->iobuf; + int rc;
- alauda_get_media_status(us, status); + rc = alauda_get_media_status(us, status); + if (rc != USB_STOR_XFER_GOOD) { + status[0] = 0xF0; /* Pretend there's no media */ + status[1] = 0; + }
/* Check for no media or door open */ if ((status[0] & 0x80) || ((status[0] & 0x1F) == 0x10)
From: Elson Roy Serrao quic_eserrao@quicinc.com
commit 3ddaa6a274578e23745b7466346fc2650df8f959 upstream.
If dwc3 is runtime suspended we defer processing the event buffer until resume, by setting the pending_events flag. Set this flag before triggering resume to avoid race with the runtime resume callback.
While handling the pending events, in addition to checking the event buffer we also need to process it. Handle this by explicitly calling dwc3_thread_interrupt(). Also balance the runtime pm get() operation that triggered this processing.
Cc: stable@vger.kernel.org Fixes: fc8bb91bc83e ("usb: dwc3: implement runtime PM") Signed-off-by: Elson Roy Serrao quic_eserrao@quicinc.com Acked-by: Thinh Nguyen Thinh.Nguyen@synopsys.com Reviewed-by: Roger Quadros rogerq@kernel.org Link: https://lore.kernel.org/r/20230801192658.19275-1-quic_eserrao@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc3/gadget.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -4448,9 +4448,14 @@ static irqreturn_t dwc3_check_event_buf( u32 count;
if (pm_runtime_suspended(dwc->dev)) { + dwc->pending_events = true; + /* + * Trigger runtime resume. The get() function will be balanced + * after processing the pending events in dwc3_process_pending + * events(). + */ pm_runtime_get(dwc->dev); disable_irq_nosync(dwc->irq_gadget); - dwc->pending_events = true; return IRQ_HANDLED; }
@@ -4711,6 +4716,8 @@ void dwc3_gadget_process_pending_events( { if (dwc->pending_events) { dwc3_interrupt(dwc->irq_gadget, dwc->ev_buf); + dwc3_thread_interrupt(dwc->irq_gadget, dwc->ev_buf); + pm_runtime_put(dwc->dev); dwc->pending_events = false; enable_irq(dwc->irq_gadget); }
From: Alan Stern stern@rowland.harvard.edu
commit 65dadb2beeb7360232b09ebc4585b54475dfee06 upstream.
Avichal Rakesh reported a kernel panic that occurred when the UVC gadget driver was removed from a gadget's configuration. The panic involves a somewhat complicated interaction between the kernel driver and a userspace component (as described in the Link tag below), but the analysis did make one thing clear: The Gadget core should accomodate gadget drivers calling usb_gadget_deactivate() as part of their unbind procedure.
Currently this doesn't work. gadget_unbind_driver() calls driver->unbind() while holding the udc->connect_lock mutex, and usb_gadget_deactivate() attempts to acquire that mutex, which will result in a deadlock.
The simple fix is for gadget_unbind_driver() to release the mutex when invoking the ->unbind() callback. There is no particular reason for it to be holding the mutex at that time, and the mutex isn't held while the ->bind() callback is invoked. So we'll drop the mutex before performing the unbind callback and reacquire it afterward.
We'll also add a couple of comments to usb_gadget_activate() and usb_gadget_deactivate(). Because they run in process context they must not be called from a gadget driver's ->disconnect() callback, which (according to the kerneldoc for struct usb_gadget_driver in include/linux/usb/gadget.h) may run in interrupt context. This may help prevent similar bugs from arising in the future.
Reported-and-tested-by: Avichal Rakesh arakesh@google.com Signed-off-by: Alan Stern stern@rowland.harvard.edu Fixes: 286d9975a838 ("usb: gadget: udc: core: Prevent soft_connect_store() race") Link: https://lore.kernel.org/linux-usb/4d7aa3f4-22d9-9f5a-3d70-1bd7148ff4ba@googl... Cc: Badhri Jagan Sridharan badhri@google.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/48b2f1f1-0639-46bf-bbfc-98cb05a24914@rowland.harva... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/gadget/udc/core.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c index cd58f2a4e7f3..7d49d8a0b00c 100644 --- a/drivers/usb/gadget/udc/core.c +++ b/drivers/usb/gadget/udc/core.c @@ -822,6 +822,9 @@ EXPORT_SYMBOL_GPL(usb_gadget_disconnect); * usb_gadget_activate() is called. For example, user mode components may * need to be activated before the system can talk to hosts. * + * This routine may sleep; it must not be called in interrupt context + * (such as from within a gadget driver's disconnect() callback). + * * Returns zero on success, else negative errno. */ int usb_gadget_deactivate(struct usb_gadget *gadget) @@ -860,6 +863,8 @@ EXPORT_SYMBOL_GPL(usb_gadget_deactivate); * This routine activates gadget which was previously deactivated with * usb_gadget_deactivate() call. It calls usb_gadget_connect() if needed. * + * This routine may sleep; it must not be called in interrupt context. + * * Returns zero on success, else negative errno. */ int usb_gadget_activate(struct usb_gadget *gadget) @@ -1638,7 +1643,11 @@ static void gadget_unbind_driver(struct device *dev) usb_gadget_disable_async_callbacks(udc); if (gadget->irq) synchronize_irq(gadget->irq); + mutex_unlock(&udc->connect_lock); + udc->driver->unbind(gadget); + + mutex_lock(&udc->connect_lock); usb_gadget_udc_stop_locked(udc); mutex_unlock(&udc->connect_lock);
From: Prashanth K quic_prashk@quicinc.com
commit 8e21a620c7e6e00347ade1a6ed4967b359eada5a upstream.
Currently if we bootup a device without cable connected, then usb-conn-gpio won't call set_role() because last_role is same as current role. This happens since last_role gets initialised to zero during the probe.
To avoid this, add a new flag initial_detection into struct usb_conn_info, which prevents bailing out during initial detection.
Cc: stable@vger.kernel.org # 5.4 Fixes: 4602f3bff266 ("usb: common: add USB GPIO based connection detection driver") Signed-off-by: Prashanth K quic_prashk@quicinc.com Tested-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/1690880632-12588-1-git-send-email-quic_prashk@quic... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/common/usb-conn-gpio.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/usb/common/usb-conn-gpio.c +++ b/drivers/usb/common/usb-conn-gpio.c @@ -42,6 +42,7 @@ struct usb_conn_info {
struct power_supply_desc desc; struct power_supply *charger; + bool initial_detection; };
/* @@ -86,11 +87,13 @@ static void usb_conn_detect_cable(struct dev_dbg(info->dev, "role %s -> %s, gpios: id %d, vbus %d\n", usb_role_string(info->last_role), usb_role_string(role), id, vbus);
- if (info->last_role == role) { + if (!info->initial_detection && info->last_role == role) { dev_warn(info->dev, "repeated role: %s\n", usb_role_string(role)); return; }
+ info->initial_detection = false; + if (info->last_role == USB_ROLE_HOST && info->vbus) regulator_disable(info->vbus);
@@ -258,6 +261,7 @@ static int usb_conn_probe(struct platfor device_set_wakeup_capable(&pdev->dev, true);
/* Perform initial detection */ + info->initial_detection = true; usb_conn_queue_dwork(info, 0);
return 0;
From: Badhri Jagan Sridharan badhri@google.com
commit 4270d2b4845e820b274702bfc2a7140f69e4d19d upstream.
Do not transition to SNK_UNATTACHED state when receiving vsafe0v event while in SNK_HARD_RESET_WAIT_VBUS. Ignore VBUS off events as well as in some platforms VBUS off can be signalled more than once.
[143515.364753] Requesting mux state 1, usb-role 2, orientation 2 [143515.365520] pending state change SNK_HARD_RESET_SINK_OFF -> SNK_HARD_RESET_SINK_ON @ 650 ms [rev3 HARD_RESET] [143515.632281] CC1: 0 -> 0, CC2: 3 -> 0 [state SNK_HARD_RESET_SINK_OFF, polarity 1, disconnected] [143515.637214] VBUS on [143515.664985] VBUS off [143515.664992] state change SNK_HARD_RESET_SINK_OFF -> SNK_HARD_RESET_WAIT_VBUS [rev3 HARD_RESET] [143515.665564] VBUS VSAFE0V [143515.665566] state change SNK_HARD_RESET_WAIT_VBUS -> SNK_UNATTACHED [rev3 HARD_RESET]
Fixes: 28b43d3d746b ("usb: typec: tcpm: Introduce vsafe0v for vbus") Cc: stable@vger.kernel.org Signed-off-by: Badhri Jagan Sridharan badhri@google.com Acked-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20230712085722.1414743-1-badhri@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/tcpm/tcpm.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/usb/typec/tcpm/tcpm.c +++ b/drivers/usb/typec/tcpm/tcpm.c @@ -5348,6 +5348,10 @@ static void _tcpm_pd_vbus_off(struct tcp /* Do nothing, vbus drop expected */ break;
+ case SNK_HARD_RESET_WAIT_VBUS: + /* Do nothing, its OK to receive vbus off events */ + break; + default: if (port->pwr_role == TYPEC_SINK && port->attached) tcpm_set_state(port, SNK_UNATTACHED, tcpm_wait_for_discharge(port)); @@ -5394,6 +5398,9 @@ static void _tcpm_pd_vbus_vsafe0v(struct case SNK_DEBOUNCED: /*Do nothing, still waiting for VSAFE5V for connect */ break; + case SNK_HARD_RESET_WAIT_VBUS: + /* Do nothing, its OK to receive vbus off events */ + break; default: if (port->pwr_role == TYPEC_SINK && port->auto_vbus_discharge_enabled) tcpm_set_state(port, SNK_UNATTACHED, 0);
From: RD Babiera rdbabiera@google.com
commit 5a5ccd61cfd76156cb3e0373c300c509d05448ce upstream.
When connecting to some DisplayPort partners, the initial status update after entering DisplayPort Alt Mode notifies that the DFP_D/UFP_D is not in the connected state. This leads to sending a configure message that keeps the device in USB mode. The port partner then sets DFP_D/UFP_D to the connected state and HPD to high in the same Attention message. Currently, the HPD signal is dropped in order to handle configuration.
This patch saves changes to the HPD signal when the device chooses to configure during dp_altmode_status_update, and invokes sysfs_notify if necessary for HPD after configuring.
Fixes: 0e3bb7d6894d ("usb: typec: Add driver for DisplayPort alternate mode") Cc: stable@vger.kernel.org Signed-off-by: RD Babiera rdbabiera@google.com Acked-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20230726020903.1409072-1-rdbabiera@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/altmodes/displayport.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
--- a/drivers/usb/typec/altmodes/displayport.c +++ b/drivers/usb/typec/altmodes/displayport.c @@ -60,6 +60,7 @@ struct dp_altmode {
enum dp_state state; bool hpd; + bool pending_hpd;
struct mutex lock; /* device lock */ struct work_struct work; @@ -144,8 +145,13 @@ static int dp_altmode_status_update(stru dp->state = DP_STATE_EXIT; } else if (!(con & DP_CONF_CURRENTLY(dp->data.conf))) { ret = dp_altmode_configure(dp, con); - if (!ret) + if (!ret) { dp->state = DP_STATE_CONFIGURE; + if (dp->hpd != hpd) { + dp->hpd = hpd; + dp->pending_hpd = true; + } + } } else { if (dp->hpd != hpd) { drm_connector_oob_hotplug_event(dp->connector_fwnode); @@ -161,6 +167,16 @@ static int dp_altmode_configured(struct { sysfs_notify(&dp->alt->dev.kobj, "displayport", "configuration"); sysfs_notify(&dp->alt->dev.kobj, "displayport", "pin_assignment"); + /* + * If the DFP_D/UFP_D sends a change in HPD when first notifying the + * DisplayPort driver that it is connected, then we wait until + * configuration is complete to signal HPD. + */ + if (dp->pending_hpd) { + drm_connector_oob_hotplug_event(dp->connector_fwnode); + sysfs_notify(&dp->alt->dev.kobj, "displayport", "hpd"); + dp->pending_hpd = false; + }
return dp_altmode_notify(dp); }
From: Nick Desaulniers ndesaulniers@google.com
commit cbe8ded48b939b9d55d2c5589ab56caa7b530709 upstream.
The assertion added to verify the difference in bits set of the addresses of srso_untrain_ret_alias() and srso_safe_ret_alias() would fail to link in LLVM's ld.lld linker with the following error:
ld.lld: error: ./arch/x86/kernel/vmlinux.lds:210: at least one side of the expression must be absolute ld.lld: error: ./arch/x86/kernel/vmlinux.lds:211: at least one side of the expression must be absolute
Use ABSOLUTE to evaluate the expression referring to at least one of the symbols so that LLD can evaluate the linker script.
Also, add linker version info to the comment about XOR being unsupported in either ld.bfd or ld.lld until somewhat recently.
Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Closes: https://lore.kernel.org/llvm/CA+G9fYsdUeNu-gwbs0+T6XHi4hYYk=Y9725-wFhZ7gJMsp... Reported-by: Nathan Chancellor nathan@kernel.org Reported-by: Daniel Kolesa daniel@octaforge.org Reported-by: Naresh Kamboju naresh.kamboju@linaro.org Suggested-by: Sven Volkinsfeld thyrc@gmx.net Signed-off-by: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Link: https://github.com/ClangBuiltLinux/linux/issues/1907 Link: https://lore.kernel.org/r/20230809-gds-v1-1-eaac90b0cbcc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/vmlinux.lds.S | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -529,11 +529,17 @@ INIT_PER_CPU(irq_stack_backing_store);
#ifdef CONFIG_CPU_SRSO /* - * GNU ld cannot do XOR so do: (A | B) - (A & B) in order to compute the XOR + * GNU ld cannot do XOR until 2.41. + * https://sourceware.org/git/?p=binutils-gdb.git%3Ba=commit%3Bh=f6f78318fca803... + * + * LLVM lld cannot do XOR until lld-17. + * https://github.com/llvm/llvm-project/commit/fae96104d4378166cbe5c875ef8ed808... + * + * Instead do: (A | B) - (A & B) in order to compute the XOR * of the two function addresses: */ -. = ASSERT(((srso_untrain_ret_alias | srso_safe_ret_alias) - - (srso_untrain_ret_alias & srso_safe_ret_alias)) == ((1 << 2) | (1 << 8) | (1 << 14) | (1 << 20)), +. = ASSERT(((ABSOLUTE(srso_untrain_ret_alias) | srso_safe_ret_alias) - + (ABSOLUTE(srso_untrain_ret_alias) & srso_safe_ret_alias)) == ((1 << 2) | (1 << 8) | (1 << 14) | (1 << 20)), "SRSO function pair won't alias"); #endif
From: Xin Li xin3.li@intel.com
commit 39163d5479285a36522b6e8f9cc568cc4987db08 upstream.
The vDSO getcpu() reads CPU ID from the GDT_ENTRY_CPUNODE entry when the RDPID instruction is not available. And GDT_ENTRY_CPUNODE is defined as 28 on 32-bit Linux kernel and 15 on 64-bit. But the 32-bit getcpu() on 64-bit Linux kernel is compiled with 32-bit Linux kernel GDT_ENTRY_CPUNODE, i.e., 28, beyond the 64-bit Linux kernel GDT limit. Thus, it just fails _silently_.
When BUILD_VDSO32_64 is defined, choose the 64-bit Linux kernel GDT definitions to compile the 32-bit getcpu().
Fixes: 877cff5296faa6e ("x86/vdso: Fake 32bit VDSO build on 64bit compile for vgetcpu") Reported-by: kernel test robot yujie.liu@intel.com Reported-by: Shan Kang shan.kang@intel.com Signed-off-by: Xin Li xin3.li@intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/r/20230322061758.10639-1-xin3.li@intel.com Link: https://lore.kernel.org/oe-lkp/202303020903.b01fd1de-yujie.liu@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/segment.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h index 794f69625780..9d6411c65920 100644 --- a/arch/x86/include/asm/segment.h +++ b/arch/x86/include/asm/segment.h @@ -56,7 +56,7 @@
#define GDT_ENTRY_INVALID_SEG 0
-#ifdef CONFIG_X86_32 +#if defined(CONFIG_X86_32) && !defined(BUILD_VDSO32_64) /* * The layout of the per-CPU GDT under Linux: *
From: Cristian Ciocaltea cristian.ciocaltea@collabora.com
commit 6dbef74aeb090d6bee7d64ef3fa82ae6fa53f271 upstream.
Commit
522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix")
provided a fix for the Zen2 VZEROUPPER data corruption bug affecting a range of CPU models, but the AMD Custom APU 0405 found on SteamDeck was not listed, although it is clearly affected by the vulnerability.
Add this CPU variant to the Zenbleed erratum list, in order to unconditionally enable the fallback fix until a proper microcode update is available.
Fixes: 522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix") Signed-off-by: Cristian Ciocaltea cristian.ciocaltea@collabora.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230811203705.1699914-1-cristian.ciocaltea@collab... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/amd.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -73,6 +73,7 @@ static const int amd_erratum_1054[] = static const int amd_zenbleed[] = AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x17, 0x30, 0x0, 0x4f, 0xf), AMD_MODEL_RANGE(0x17, 0x60, 0x0, 0x7f, 0xf), + AMD_MODEL_RANGE(0x17, 0x90, 0x0, 0x91, 0xf), AMD_MODEL_RANGE(0x17, 0xa0, 0x0, 0xaf, 0xf));
static const int amd_div0[] =
From: Kirill A. Shutemov kirill.shutemov@linux.intel.com
commit 1b8b1aa90c9c0e825b181b98b8d9e249dc395470 upstream.
Yingcong has noticed that on the 5-level paging machine, VDSO and VVAR VMAs are placed above the 47-bit border:
8000001a9000-8000001ad000 r--p 00000000 00:00 0 [vvar] 8000001ad000-8000001af000 r-xp 00000000 00:00 0 [vdso]
This might confuse users who are not aware of 5-level paging and expect all userspace addresses to be under the 47-bit border.
So far problem has only been triggered with ASLR disabled, although it may also occur with ASLR enabled if the layout is randomized in a just right way.
The problem happens due to custom placement for the VMAs in the VDSO code: vdso_addr() tries to place them above the stack and checks the result against TASK_SIZE_MAX, which is wrong. TASK_SIZE_MAX is set to the 56-bit border on 5-level paging machines. Use DEFAULT_MAP_WINDOW instead.
Fixes: b569bab78d8d ("x86/mm: Prepare to expose larger address space to userspace") Reported-by: Yingcong Wu yingcong.wu@intel.com Signed-off-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20230803151609.22141-1-kirill.shutemov%40linux.i... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/entry/vdso/vma.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/entry/vdso/vma.c +++ b/arch/x86/entry/vdso/vma.c @@ -299,8 +299,8 @@ static unsigned long vdso_addr(unsigned
/* Round the lowest possible end address up to a PMD boundary. */ end = (start + len + PMD_SIZE - 1) & PMD_MASK; - if (end >= TASK_SIZE_MAX) - end = TASK_SIZE_MAX; + if (end >= DEFAULT_MAP_WINDOW) + end = DEFAULT_MAP_WINDOW; end -= len;
if (end > start) {
From: Borislav Petkov (AMD) bp@alien8.de
commit bee6cf1a80b54548a039e224c651bb15b644a480 upstream.
Tao Liu reported a boot hang on an Intel Atom machine due to an unmapped EFI config table. The reason being that the CC blob which contains the CPUID page for AMD SNP guests is parsed for before even checking whether the machine runs on AMD hardware.
Usually that's not a problem on !AMD hw - it simply won't find the CC blob's GUID and return. However, if any parts of the config table pointers array is not mapped, the kernel will #PF very early in the decompressor stage without any opportunity to recover.
Therefore, do a superficial CPUID check before poking for the CC blob. This will fix the current issue on real hardware. It would also work as a guest on a non-lying hypervisor.
For the lying hypervisor, the check is done again, *after* parsing the CC blob as the real CPUID page will be present then.
Clear the #VC handler in case SEV-{ES,SNP} hasn't been detected, as a precaution.
Fixes: c01fce9cef84 ("x86/compressed: Add SEV-SNP feature detection/setup") Reported-by: Tao Liu ltao@redhat.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Acked-by: Tom Lendacky thomas.lendacky@amd.com Tested-by: Tao Liu ltao@redhat.com Cc: stable@kernel.org Link: https://lore.kernel.org/r/20230601072043.24439-1-ltao@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/boot/compressed/idt_64.c | 9 ++++++++- arch/x86/boot/compressed/sev.c | 37 +++++++++++++++++++++++++++++++++++-- 2 files changed, 43 insertions(+), 3 deletions(-)
--- a/arch/x86/boot/compressed/idt_64.c +++ b/arch/x86/boot/compressed/idt_64.c @@ -63,7 +63,14 @@ void load_stage2_idt(void) set_idt_entry(X86_TRAP_PF, boot_page_fault);
#ifdef CONFIG_AMD_MEM_ENCRYPT - set_idt_entry(X86_TRAP_VC, boot_stage2_vc); + /* + * Clear the second stage #VC handler in case guest types + * needing #VC have not been detected. + */ + if (sev_status & BIT(1)) + set_idt_entry(X86_TRAP_VC, boot_stage2_vc); + else + set_idt_entry(X86_TRAP_VC, NULL); #endif
load_boot_idt(&boot_idt_desc); --- a/arch/x86/boot/compressed/sev.c +++ b/arch/x86/boot/compressed/sev.c @@ -353,12 +353,45 @@ void sev_enable(struct boot_params *bp) bp->cc_blob_address = 0;
/* + * Do an initial SEV capability check before snp_init() which + * loads the CPUID page and the same checks afterwards are done + * without the hypervisor and are trustworthy. + * + * If the HV fakes SEV support, the guest will crash'n'burn + * which is good enough. + */ + + /* Check for the SME/SEV support leaf */ + eax = 0x80000000; + ecx = 0; + native_cpuid(&eax, &ebx, &ecx, &edx); + if (eax < 0x8000001f) + return; + + /* + * Check for the SME/SEV feature: + * CPUID Fn8000_001F[EAX] + * - Bit 0 - Secure Memory Encryption support + * - Bit 1 - Secure Encrypted Virtualization support + * CPUID Fn8000_001F[EBX] + * - Bits 5:0 - Pagetable bit position used to indicate encryption + */ + eax = 0x8000001f; + ecx = 0; + native_cpuid(&eax, &ebx, &ecx, &edx); + /* Check whether SEV is supported */ + if (!(eax & BIT(1))) + return; + + /* * Setup/preliminary detection of SNP. This will be sanity-checked * against CPUID/MSR values later. */ snp = snp_init(bp);
- /* Check for the SME/SEV support leaf */ + /* Now repeat the checks with the SNP CPUID table. */ + + /* Recheck the SME/SEV support leaf */ eax = 0x80000000; ecx = 0; native_cpuid(&eax, &ebx, &ecx, &edx); @@ -366,7 +399,7 @@ void sev_enable(struct boot_params *bp) return;
/* - * Check for the SME/SEV feature: + * Recheck for the SME/SEV feature: * CPUID Fn8000_001F[EAX] * - Bit 0 - Secure Memory Encryption support * - Bit 1 - Secure Encrypted Virtualization support
From: Jinghao Jia jinghao@linux.ibm.com
commit 7324f74d39531262b8e362f228b46512e6bee632 upstream.
The BUILD_VDSO macro was incorrectly spelled as BULID_VDSO in asm/linkage.h. This causes the !defined(BULID_VDSO) directive to always evaluate to true.
Correct the spelling to BUILD_VDSO.
Fixes: bea75b33895f ("x86/Kconfig: Introduce function padding") Signed-off-by: Jinghao Jia jinghao@linux.ibm.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Thomas Gleixner tglx@linutronix.de Acked-by: Randy Dunlap rdunlap@infradead.org Cc: stable@kernel.org Link: https://lore.kernel.org/r/20230808182353.76218-1-jinghao@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/linkage.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/linkage.h b/arch/x86/include/asm/linkage.h index 0953aa32a324..97a3de7892d3 100644 --- a/arch/x86/include/asm/linkage.h +++ b/arch/x86/include/asm/linkage.h @@ -21,7 +21,7 @@ #define FUNCTION_PADDING #endif
-#if (CONFIG_FUNCTION_ALIGNMENT > 8) && !defined(__DISABLE_EXPORTS) && !defined(BULID_VDSO) +#if (CONFIG_FUNCTION_ALIGNMENT > 8) && !defined(__DISABLE_EXPORTS) && !defined(BUILD_VDSO) # define __FUNC_ALIGN __ALIGN; FUNCTION_PADDING #else # define __FUNC_ALIGN __ALIGN
From: Arnd Bergmann arnd@arndb.de
commit a57c27c7ad85c420b7de44c6ee56692d51709dda upstream.
The newly added function has two definitions but no prototypes:
drivers/base/cpu.c:605:16: error: no previous prototype for 'cpu_show_gds' [-Werror=missing-prototypes]
Add a declaration next to the other ones for this file to avoid the warning.
Fixes: 8974eb588283b ("x86/speculation: Add Gather Data Sampling mitigation") Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Tested-by: Daniel Sneddon daniel.sneddon@linux.intel.com Cc: stable@kernel.org Link: https://lore.kernel.org/all/20230809130530.1913368-1-arnd%40kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/cpu.h | 2 ++ 1 file changed, 2 insertions(+)
--- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -72,6 +72,8 @@ extern ssize_t cpu_show_retbleed(struct struct device_attribute *attr, char *buf); extern ssize_t cpu_show_spec_rstack_overflow(struct device *dev, struct device_attribute *attr, char *buf); +extern ssize_t cpu_show_gds(struct device *dev, + struct device_attribute *attr, char *buf);
extern __printf(4, 5) struct device *cpu_device_create(struct device *parent, void *drvdata,
From: Arnd Bergmann arnd@arndb.de
commit eb3515dc99c7c85f4170b50838136b2a193f8012 upstream.
The declaration got placed in the .c file of the caller, but that causes a warning for the definition:
arch/x86/kernel/cpu/bugs.c:682:6: error: no previous prototype for 'gds_ucode_mitigated' [-Werror=missing-prototypes]
Move it to a header where both sides can observe it instead.
Fixes: 81ac7e5d74174 ("KVM: Add GDS_NO support to KVM") Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Tested-by: Daniel Sneddon daniel.sneddon@linux.intel.com Cc: stable@kernel.org Link: https://lore.kernel.org/all/20230809130530.1913368-2-arnd%40kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/processor.h | 2 ++ arch/x86/kvm/x86.c | 2 -- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -732,4 +732,6 @@ bool arch_is_platform_page(u64 paddr); #define arch_is_platform_page arch_is_platform_page #endif
+extern bool gds_ucode_mitigated(void); + #endif /* _ASM_X86_PROCESSOR_H */ --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -314,8 +314,6 @@ u64 __read_mostly host_xcr0;
static struct kmem_cache *x86_emulator_cache;
-extern bool gds_ucode_mitigated(void); - /* * When called, it means the previous get/set msr reached an invalid msr. * Return true if we want to ignore/silent this failed msr access.
From: Bjorn Helgaas bhelgaas@google.com
commit 3bfc37d92687b4b19056998cebc02f94fbc81427 upstream.
b3574f579ece ("PCI: mvebu: Mark driver as BROKEN") made it impossible to enable the pci-mvebu driver. The driver does have known problems, but as Russell and Uwe reported, it does work in some configurations, so removing it broke some working setups.
Revert b3574f579ece so pci-mvebu is available.
Reported-by: Russell King (Oracle) linux@armlinux.org.uk Link: https://lore.kernel.org/r/ZMzicVQEyHyZzBOc@shell.armlinux.org.uk Reported-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Link: https://lore.kernel.org/r/20230804134622.pmbymxtzxj2yfhri@pengutronix.de Signed-off-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/Kconfig | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/pci/controller/Kconfig +++ b/drivers/pci/controller/Kconfig @@ -179,7 +179,6 @@ config PCI_MVEBU depends on MVEBU_MBUS depends on ARM depends on OF - depends on BROKEN select PCI_BRIDGE_EMUL help Add support for Marvell EBU PCIe controller. This PCIe controller
From: Karol Herbst kherbst@redhat.com
commit d5712cd22b9cf109fded1b7f178f4c1888c8b84b upstream.
The original commit adding that check tried to protect the kenrel against a potential invalid NULL pointer access.
However we call nouveau_connector_detect_depth once without a native_mode set on purpose for non LVDS connectors and this broke DP support in a few cases.
Cc: Olaf Skibbe news@kravcenko.com Cc: Lyude Paul lyude@redhat.com Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/238 Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/245 Fixes: 20a2ce87fbaf8 ("drm/nouveau/dp: check for NULL nv_connector->native_mode") Signed-off-by: Karol Herbst kherbst@redhat.com Reviewed-by: Lyude Paul lyude@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20230805101813.2603989-1-kherb... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/nouveau/nouveau_connector.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c @@ -967,7 +967,7 @@ nouveau_connector_get_modes(struct drm_c /* Determine display colour depth for everything except LVDS now, * DP requires this before mode_valid() is called. */ - if (connector->connector_type != DRM_MODE_CONNECTOR_LVDS && nv_connector->native_mode) + if (connector->connector_type != DRM_MODE_CONNECTOR_LVDS) nouveau_connector_detect_depth(connector);
/* Find the native mode if this is a digital panel, if we didn't
From: Florian Westphal fw@strlen.de
commit 24138933b97b055d486e8064b4a1721702442a9b upstream.
There is an asymmetry between commit/abort and preparation phase if the following conditions are met:
1. set is a verdict map ("1.2.3.4 : jump foo") 2. timeouts are enabled
In this case, following sequence is problematic:
1. element E in set S refers to chain C 2. userspace requests removal of set S 3. kernel does a set walk to decrement chain->use count for all elements from preparation phase 4. kernel does another set walk to remove elements from the commit phase (or another walk to do a chain->use increment for all elements from abort phase)
If E has already expired in 1), it will be ignored during list walk, so its use count won't have been changed.
Then, when set is culled, ->destroy callback will zap the element via nf_tables_set_elem_destroy(), but this function is only safe for elements that have been deactivated earlier from the preparation phase: lack of earlier deactivate removes the element but leaks the chain use count, which results in a WARN splat when the chain gets removed later, plus a leak of the nft_chain structure.
Update pipapo_get() not to skip expired elements, otherwise flush command reports bogus ENOENT errors.
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Fixes: 9d0982927e79 ("netfilter: nft_hash: add support for timeouts") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 4 ++++ net/netfilter/nft_set_hash.c | 2 -- net/netfilter/nft_set_pipapo.c | 18 ++++++++++++------ net/netfilter/nft_set_rbtree.c | 2 -- 4 files changed, 16 insertions(+), 10 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -5590,8 +5590,12 @@ static int nf_tables_dump_setelem(const const struct nft_set_iter *iter, struct nft_set_elem *elem) { + const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv); struct nft_set_dump_args *args;
+ if (nft_set_elem_expired(ext)) + return 0; + args = container_of(iter, struct nft_set_dump_args, iter); return nf_tables_fill_setelem(args->skb, set, elem); } --- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -278,8 +278,6 @@ static void nft_rhash_walk(const struct
if (iter->count < iter->skip) goto cont; - if (nft_set_elem_expired(&he->ext)) - goto cont; if (!nft_set_elem_active(&he->ext, iter->genmask)) goto cont;
--- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -566,8 +566,7 @@ next_match: goto out;
if (last) { - if (nft_set_elem_expired(&f->mt[b].e->ext) || - (genmask && + if ((genmask && !nft_set_elem_active(&f->mt[b].e->ext, genmask))) goto next_match;
@@ -601,8 +600,17 @@ out: static void *nft_pipapo_get(const struct net *net, const struct nft_set *set, const struct nft_set_elem *elem, unsigned int flags) { - return pipapo_get(net, set, (const u8 *)elem->key.val.data, - nft_genmask_cur(net)); + struct nft_pipapo_elem *ret; + + ret = pipapo_get(net, set, (const u8 *)elem->key.val.data, + nft_genmask_cur(net)); + if (IS_ERR(ret)) + return ret; + + if (nft_set_elem_expired(&ret->ext)) + return ERR_PTR(-ENOENT); + + return ret; }
/** @@ -2006,8 +2014,6 @@ static void nft_pipapo_walk(const struct goto cont;
e = f->mt[r].e; - if (nft_set_elem_expired(&e->ext)) - goto cont;
elem.priv = e;
--- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -552,8 +552,6 @@ static void nft_rbtree_walk(const struct
if (iter->count < iter->skip) goto cont; - if (nft_set_elem_expired(&rbe->ext)) - goto cont; if (!nft_set_elem_active(&rbe->ext, iter->genmask)) goto cont;
Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
From: Florian Westphal fw@strlen.de
commit 24138933b97b055d486e8064b4a1721702442a9b upstream.
Just FYI, this change is not correct.
There is an asymmetry between commit/abort and preparation phase if the following conditions are met:
- set is a verdict map ("1.2.3.4 : jump foo")
- timeouts are enabled
[..]
--- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -566,8 +566,7 @@ next_match: goto out; if (last) {
if (nft_set_elem_expired(&f->mt[b].e->ext) ||
(genmask &&
if ((genmask && !nft_set_elem_active(&f->mt[b].e->ext, genmask))) goto next_match;
This part is bonkers, it papers over the real issue and introduces another bug while at it (insertions for key K will fail if we have a key K that is already expired).
A patch to resolve it is queued on the mailing list and I'll make sure it gets passed to the net tree by this wednesday.
Sorry for the inconvenience, I hope this doesn't interefere with -stable release plans and this is leaves enough time for the fix to make it to -stable too.
On Mon, Aug 14, 2023 at 12:17:30AM +0200, Florian Westphal wrote:
Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
From: Florian Westphal fw@strlen.de
commit 24138933b97b055d486e8064b4a1721702442a9b upstream.
Just FYI, this change is not correct.
There is an asymmetry between commit/abort and preparation phase if the following conditions are met:
- set is a verdict map ("1.2.3.4 : jump foo")
- timeouts are enabled
[..]
--- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -566,8 +566,7 @@ next_match: goto out; if (last) {
if (nft_set_elem_expired(&f->mt[b].e->ext) ||
(genmask &&
if ((genmask && !nft_set_elem_active(&f->mt[b].e->ext, genmask))) goto next_match;
This part is bonkers, it papers over the real issue and introduces another bug while at it (insertions for key K will fail if we have a key K that is already expired).
A patch to resolve it is queued on the mailing list and I'll make sure it gets passed to the net tree by this wednesday.
Sorry for the inconvenience, I hope this doesn't interefere with -stable release plans and this is leaves enough time for the fix to make it to -stable too.
Is there an upstream fix for this yet? If so, I can pull it into the stable tree, or should I drop this one for now and wait for the real fix? It's your call.
thanks,
greg k-h
On Mon, Aug 14, 2023 at 05:14:48PM +0200, Greg Kroah-Hartman wrote:
On Mon, Aug 14, 2023 at 12:17:30AM +0200, Florian Westphal wrote:
Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
From: Florian Westphal fw@strlen.de
commit 24138933b97b055d486e8064b4a1721702442a9b upstream.
Just FYI, this change is not correct.
There is an asymmetry between commit/abort and preparation phase if the following conditions are met:
- set is a verdict map ("1.2.3.4 : jump foo")
- timeouts are enabled
[..]
--- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -566,8 +566,7 @@ next_match: goto out; if (last) {
if (nft_set_elem_expired(&f->mt[b].e->ext) ||
(genmask &&
if ((genmask && !nft_set_elem_active(&f->mt[b].e->ext, genmask))) goto next_match;
This part is bonkers, it papers over the real issue and introduces another bug while at it (insertions for key K will fail if we have a key K that is already expired).
A patch to resolve it is queued on the mailing list and I'll make sure it gets passed to the net tree by this wednesday.
Sorry for the inconvenience, I hope this doesn't interefere with -stable release plans and this is leaves enough time for the fix to make it to -stable too.
Is there an upstream fix for this yet? If so, I can pull it into the stable tree, or should I drop this one for now and wait for the real fix? It's your call.
I'd suggest: Drop it for 5.10, 5.15 and 6.1, because these versions are still missing the full series.
Keep it for 6.4 (this already have the full series with fixed) the incremental fix that is flying upstream will event amend this patch.
In summary:
- drop it for 5.10, 5.15 and 6.1 - keep it for 6.4
Thanks
On Mon, Aug 14, 2023 at 05:41:19PM +0200, Pablo Neira Ayuso wrote:
On Mon, Aug 14, 2023 at 05:14:48PM +0200, Greg Kroah-Hartman wrote:
On Mon, Aug 14, 2023 at 12:17:30AM +0200, Florian Westphal wrote:
Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
From: Florian Westphal fw@strlen.de
commit 24138933b97b055d486e8064b4a1721702442a9b upstream.
Just FYI, this change is not correct.
There is an asymmetry between commit/abort and preparation phase if the following conditions are met:
- set is a verdict map ("1.2.3.4 : jump foo")
- timeouts are enabled
[..]
--- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -566,8 +566,7 @@ next_match: goto out; if (last) {
if (nft_set_elem_expired(&f->mt[b].e->ext) ||
(genmask &&
if ((genmask && !nft_set_elem_active(&f->mt[b].e->ext, genmask))) goto next_match;
This part is bonkers, it papers over the real issue and introduces another bug while at it (insertions for key K will fail if we have a key K that is already expired).
A patch to resolve it is queued on the mailing list and I'll make sure it gets passed to the net tree by this wednesday.
Sorry for the inconvenience, I hope this doesn't interefere with -stable release plans and this is leaves enough time for the fix to make it to -stable too.
Is there an upstream fix for this yet? If so, I can pull it into the stable tree, or should I drop this one for now and wait for the real fix? It's your call.
I'd suggest: Drop it for 5.10, 5.15 and 6.1, because these versions are still missing the full series.
Keep it for 6.4 (this already have the full series with fixed) the incremental fix that is flying upstream will event amend this patch.
In summary:
- drop it for 5.10, 5.15 and 6.1
- keep it for 6.4
Ok, thanks, now dropped for those trees.
greg k-h
From: Pablo Neira Ayuso pablo@netfilter.org
commit 5f68718b34a531a556f2f50300ead2862278da26 upstream.
The set types rhashtable and rbtree use a GC worker to reclaim memory.
From system work queue, in periodic intervals, a scan of the table is
done.
The major caveat here is that the nft transaction mutex is not held. This causes a race between control plane and GC when they attempt to delete the same element.
We cannot grab the netlink mutex from the work queue, because the control plane has to wait for the GC work queue in case the set is to be removed, so we get following deadlock:
cpu 1 cpu2 GC work transaction comes in , lock nft mutex `acquire nft mutex // BLOCKS transaction asks to remove the set set destruction calls cancel_work_sync()
cancel_work_sync will now block forever, because it is waiting for the mutex the caller already owns.
This patch adds a new API that deals with garbage collection in two steps:
1) Lockless GC of expired elements sets on the NFT_SET_ELEM_DEAD_BIT so they are not visible via lookup. Annotate current GC sequence in the GC transaction. Enqueue GC transaction work as soon as it is full. If ruleset is updated, then GC transaction is aborted and retried later.
2) GC work grabs the mutex. If GC sequence has changed then this GC transaction lost race with control plane, abort it as it contains stale references to objects and let GC try again later. If the ruleset is intact, then this GC transaction deactivates and removes the elements and it uses call_rcu() to destroy elements.
Note that no elements are removed from GC lockless path, the _DEAD bit is set and pointers are collected. GC catchall does not remove the elements anymore too. There is a new set->dead flag that is set on to abort the GC transaction to deal with set->ops->destroy() path which removes the remaining elements in the set from commit_release, where no mutex is held.
To deal with GC when mutex is held, which allows safe deactivate and removal, add sync GC API which releases the set element object via call_rcu(). This is used by rbtree and pipapo backends which also perform garbage collection from control plane path.
Since element removal from sets can happen from control plane and element garbage collection/timeout, it is necessary to keep the set structure alive until all elements have been deactivated and destroyed.
We cannot do a cancel_work_sync or flush_work in nft_set_destroy because its called with the transaction mutex held, but the aforementioned async work queue might be blocked on the very mutex that nft_set_destroy() callchain is sitting on.
This gives us the choice of ABBA deadlock or UaF.
To avoid both, add set->refs refcount_t member. The GC API can then increment the set refcount and release it once the elements have been free'd.
Set backends are adapted to use the GC transaction API in a follow up patch entitled:
("netfilter: nf_tables: use gc transaction API in set backends")
This is joint work with Florian Westphal.
Fixes: cfed7e1b1f8e ("netfilter: nf_tables: add set garbage collection helpers") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/netfilter/nf_tables.h | 64 +++++++++ net/netfilter/nf_tables_api.c | 248 ++++++++++++++++++++++++++++++++++++-- 2 files changed, 300 insertions(+), 12 deletions(-)
--- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -512,6 +512,7 @@ struct nft_set_elem_expr { * * @list: table set list node * @bindings: list of set bindings + * @refs: internal refcounting for async set destruction * @table: table this set belongs to * @net: netnamespace this set belongs to * @name: name of the set @@ -541,6 +542,7 @@ struct nft_set_elem_expr { struct nft_set { struct list_head list; struct list_head bindings; + refcount_t refs; struct nft_table *table; possible_net_t net; char *name; @@ -562,7 +564,8 @@ struct nft_set { struct list_head pending_update; /* runtime data below here */ const struct nft_set_ops *ops ____cacheline_aligned; - u16 flags:14, + u16 flags:13, + dead:1, genmask:2; u8 klen; u8 dlen; @@ -1592,6 +1595,32 @@ static inline void nft_set_elem_clear_bu clear_bit(NFT_SET_ELEM_BUSY_BIT, word); }
+#define NFT_SET_ELEM_DEAD_MASK (1 << 3) + +#if defined(__LITTLE_ENDIAN_BITFIELD) +#define NFT_SET_ELEM_DEAD_BIT 3 +#elif defined(__BIG_ENDIAN_BITFIELD) +#define NFT_SET_ELEM_DEAD_BIT (BITS_PER_LONG - BITS_PER_BYTE + 3) +#else +#error +#endif + +static inline void nft_set_elem_dead(struct nft_set_ext *ext) +{ + unsigned long *word = (unsigned long *)ext; + + BUILD_BUG_ON(offsetof(struct nft_set_ext, genmask) != 0); + set_bit(NFT_SET_ELEM_DEAD_BIT, word); +} + +static inline int nft_set_elem_is_dead(const struct nft_set_ext *ext) +{ + unsigned long *word = (unsigned long *)ext; + + BUILD_BUG_ON(offsetof(struct nft_set_ext, genmask) != 0); + return test_bit(NFT_SET_ELEM_DEAD_BIT, word); +} + /** * struct nft_trans - nf_tables object update in transaction * @@ -1729,6 +1758,38 @@ struct nft_trans_flowtable { #define nft_trans_flowtable_flags(trans) \ (((struct nft_trans_flowtable *)trans->data)->flags)
+#define NFT_TRANS_GC_BATCHCOUNT 256 + +struct nft_trans_gc { + struct list_head list; + struct net *net; + struct nft_set *set; + u32 seq; + u8 count; + void *priv[NFT_TRANS_GC_BATCHCOUNT]; + struct rcu_head rcu; +}; + +struct nft_trans_gc *nft_trans_gc_alloc(struct nft_set *set, + unsigned int gc_seq, gfp_t gfp); +void nft_trans_gc_destroy(struct nft_trans_gc *trans); + +struct nft_trans_gc *nft_trans_gc_queue_async(struct nft_trans_gc *gc, + unsigned int gc_seq, gfp_t gfp); +void nft_trans_gc_queue_async_done(struct nft_trans_gc *gc); + +struct nft_trans_gc *nft_trans_gc_queue_sync(struct nft_trans_gc *gc, gfp_t gfp); +void nft_trans_gc_queue_sync_done(struct nft_trans_gc *trans); + +void nft_trans_gc_elem_add(struct nft_trans_gc *gc, void *priv); + +struct nft_trans_gc *nft_trans_gc_catchall(struct nft_trans_gc *gc, + unsigned int gc_seq); + +void nft_setelem_data_deactivate(const struct net *net, + const struct nft_set *set, + struct nft_set_elem *elem); + int __init nft_chain_filter_init(void); void nft_chain_filter_fini(void);
@@ -1755,6 +1816,7 @@ struct nftables_pernet { struct mutex commit_mutex; u64 table_handle; unsigned int base_seq; + unsigned int gc_seq; };
extern unsigned int nf_tables_net_id; --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -31,7 +31,9 @@ static LIST_HEAD(nf_tables_expressions); static LIST_HEAD(nf_tables_objects); static LIST_HEAD(nf_tables_flowtables); static LIST_HEAD(nf_tables_destroy_list); +static LIST_HEAD(nf_tables_gc_list); static DEFINE_SPINLOCK(nf_tables_destroy_list_lock); +static DEFINE_SPINLOCK(nf_tables_gc_list_lock);
enum { NFT_VALIDATE_SKIP = 0, @@ -120,6 +122,9 @@ static void nft_validate_state_update(st static void nf_tables_trans_destroy_work(struct work_struct *w); static DECLARE_WORK(trans_destroy_work, nf_tables_trans_destroy_work);
+static void nft_trans_gc_work(struct work_struct *work); +static DECLARE_WORK(trans_gc_work, nft_trans_gc_work); + static void nft_ctx_init(struct nft_ctx *ctx, struct net *net, const struct sk_buff *skb, @@ -581,10 +586,6 @@ static int nft_trans_set_add(const struc return __nft_trans_set_add(ctx, msg_type, set, NULL); }
-static void nft_setelem_data_deactivate(const struct net *net, - const struct nft_set *set, - struct nft_set_elem *elem); - static int nft_mapelem_deactivate(const struct nft_ctx *ctx, struct nft_set *set, const struct nft_set_iter *iter, @@ -5054,6 +5055,7 @@ static int nf_tables_newset(struct sk_bu
INIT_LIST_HEAD(&set->bindings); INIT_LIST_HEAD(&set->catchall_list); + refcount_set(&set->refs, 1); set->table = table; write_pnet(&set->net, net); set->ops = ops; @@ -5121,6 +5123,14 @@ static void nft_set_catchall_destroy(con } }
+static void nft_set_put(struct nft_set *set) +{ + if (refcount_dec_and_test(&set->refs)) { + kfree(set->name); + kvfree(set); + } +} + static void nft_set_destroy(const struct nft_ctx *ctx, struct nft_set *set) { int i; @@ -5133,8 +5143,7 @@ static void nft_set_destroy(const struct
set->ops->destroy(ctx, set); nft_set_catchall_destroy(ctx, set); - kfree(set->name); - kvfree(set); + nft_set_put(set); }
static int nf_tables_delset(struct sk_buff *skb, const struct nfnl_info *info, @@ -6255,7 +6264,8 @@ struct nft_set_ext *nft_set_catchall_loo list_for_each_entry_rcu(catchall, &set->catchall_list, list) { ext = nft_set_elem_ext(set, catchall->elem); if (nft_set_elem_active(ext, genmask) && - !nft_set_elem_expired(ext)) + !nft_set_elem_expired(ext) && + !nft_set_elem_is_dead(ext)) return ext; }
@@ -6907,9 +6917,9 @@ static void nft_setelem_data_activate(co nft_use_inc_restore(&(*nft_set_ext_obj(ext))->use); }
-static void nft_setelem_data_deactivate(const struct net *net, - const struct nft_set *set, - struct nft_set_elem *elem) +void nft_setelem_data_deactivate(const struct net *net, + const struct nft_set *set, + struct nft_set_elem *elem) { const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
@@ -9386,6 +9396,207 @@ void nft_chain_del(struct nft_chain *cha list_del_rcu(&chain->list); }
+static void nft_trans_gc_setelem_remove(struct nft_ctx *ctx, + struct nft_trans_gc *trans) +{ + void **priv = trans->priv; + unsigned int i; + + for (i = 0; i < trans->count; i++) { + struct nft_set_elem elem = { + .priv = priv[i], + }; + + nft_setelem_data_deactivate(ctx->net, trans->set, &elem); + nft_setelem_remove(ctx->net, trans->set, &elem); + } +} + +void nft_trans_gc_destroy(struct nft_trans_gc *trans) +{ + nft_set_put(trans->set); + put_net(trans->net); + kfree(trans); +} + +static void nft_trans_gc_trans_free(struct rcu_head *rcu) +{ + struct nft_set_elem elem = {}; + struct nft_trans_gc *trans; + struct nft_ctx ctx = {}; + unsigned int i; + + trans = container_of(rcu, struct nft_trans_gc, rcu); + ctx.net = read_pnet(&trans->set->net); + + for (i = 0; i < trans->count; i++) { + elem.priv = trans->priv[i]; + if (!nft_setelem_is_catchall(trans->set, &elem)) + atomic_dec(&trans->set->nelems); + + nf_tables_set_elem_destroy(&ctx, trans->set, elem.priv); + } + + nft_trans_gc_destroy(trans); +} + +static bool nft_trans_gc_work_done(struct nft_trans_gc *trans) +{ + struct nftables_pernet *nft_net; + struct nft_ctx ctx = {}; + + nft_net = nft_pernet(trans->net); + + mutex_lock(&nft_net->commit_mutex); + + /* Check for race with transaction, otherwise this batch refers to + * stale objects that might not be there anymore. Skip transaction if + * set has been destroyed from control plane transaction in case gc + * worker loses race. + */ + if (READ_ONCE(nft_net->gc_seq) != trans->seq || trans->set->dead) { + mutex_unlock(&nft_net->commit_mutex); + return false; + } + + ctx.net = trans->net; + ctx.table = trans->set->table; + + nft_trans_gc_setelem_remove(&ctx, trans); + mutex_unlock(&nft_net->commit_mutex); + + return true; +} + +static void nft_trans_gc_work(struct work_struct *work) +{ + struct nft_trans_gc *trans, *next; + LIST_HEAD(trans_gc_list); + + spin_lock(&nf_tables_destroy_list_lock); + list_splice_init(&nf_tables_gc_list, &trans_gc_list); + spin_unlock(&nf_tables_destroy_list_lock); + + list_for_each_entry_safe(trans, next, &trans_gc_list, list) { + list_del(&trans->list); + if (!nft_trans_gc_work_done(trans)) { + nft_trans_gc_destroy(trans); + continue; + } + call_rcu(&trans->rcu, nft_trans_gc_trans_free); + } +} + +struct nft_trans_gc *nft_trans_gc_alloc(struct nft_set *set, + unsigned int gc_seq, gfp_t gfp) +{ + struct net *net = read_pnet(&set->net); + struct nft_trans_gc *trans; + + trans = kzalloc(sizeof(*trans), gfp); + if (!trans) + return NULL; + + refcount_inc(&set->refs); + trans->set = set; + trans->net = get_net(net); + trans->seq = gc_seq; + + return trans; +} + +void nft_trans_gc_elem_add(struct nft_trans_gc *trans, void *priv) +{ + trans->priv[trans->count++] = priv; +} + +static void nft_trans_gc_queue_work(struct nft_trans_gc *trans) +{ + spin_lock(&nf_tables_gc_list_lock); + list_add_tail(&trans->list, &nf_tables_gc_list); + spin_unlock(&nf_tables_gc_list_lock); + + schedule_work(&trans_gc_work); +} + +static int nft_trans_gc_space(struct nft_trans_gc *trans) +{ + return NFT_TRANS_GC_BATCHCOUNT - trans->count; +} + +struct nft_trans_gc *nft_trans_gc_queue_async(struct nft_trans_gc *gc, + unsigned int gc_seq, gfp_t gfp) +{ + if (nft_trans_gc_space(gc)) + return gc; + + nft_trans_gc_queue_work(gc); + + return nft_trans_gc_alloc(gc->set, gc_seq, gfp); +} + +void nft_trans_gc_queue_async_done(struct nft_trans_gc *trans) +{ + if (trans->count == 0) { + nft_trans_gc_destroy(trans); + return; + } + + nft_trans_gc_queue_work(trans); +} + +struct nft_trans_gc *nft_trans_gc_queue_sync(struct nft_trans_gc *gc, gfp_t gfp) +{ + if (WARN_ON_ONCE(!lockdep_commit_lock_is_held(gc->net))) + return NULL; + + if (nft_trans_gc_space(gc)) + return gc; + + call_rcu(&gc->rcu, nft_trans_gc_trans_free); + + return nft_trans_gc_alloc(gc->set, 0, gfp); +} + +void nft_trans_gc_queue_sync_done(struct nft_trans_gc *trans) +{ + WARN_ON_ONCE(!lockdep_commit_lock_is_held(trans->net)); + + if (trans->count == 0) { + nft_trans_gc_destroy(trans); + return; + } + + call_rcu(&trans->rcu, nft_trans_gc_trans_free); +} + +struct nft_trans_gc *nft_trans_gc_catchall(struct nft_trans_gc *gc, + unsigned int gc_seq) +{ + struct nft_set_elem_catchall *catchall; + const struct nft_set *set = gc->set; + struct nft_set_ext *ext; + + list_for_each_entry_rcu(catchall, &set->catchall_list, list) { + ext = nft_set_elem_ext(set, catchall->elem); + + if (!nft_set_elem_expired(ext)) + continue; + if (nft_set_elem_is_dead(ext)) + goto dead_elem; + + nft_set_elem_dead(ext); +dead_elem: + gc = nft_trans_gc_queue_async(gc, gc_seq, GFP_ATOMIC); + if (!gc) + return NULL; + + nft_trans_gc_elem_add(gc, catchall->elem); + } + + return gc; +} + static void nf_tables_module_autoload_cleanup(struct net *net) { struct nftables_pernet *nft_net = nft_pernet(net); @@ -9548,11 +9759,11 @@ static int nf_tables_commit(struct net * { struct nftables_pernet *nft_net = nft_pernet(net); struct nft_trans *trans, *next; + unsigned int base_seq, gc_seq; LIST_HEAD(set_update_list); struct nft_trans_elem *te; struct nft_chain *chain; struct nft_table *table; - unsigned int base_seq; LIST_HEAD(adl); int err;
@@ -9629,6 +9840,10 @@ static int nf_tables_commit(struct net *
WRITE_ONCE(nft_net->base_seq, base_seq);
+ /* Bump gc counter, it becomes odd, this is the busy mark. */ + gc_seq = READ_ONCE(nft_net->gc_seq); + WRITE_ONCE(nft_net->gc_seq, ++gc_seq); + /* step 3. Start new generation, rules_gen_X now in use. */ net->nft.gencursor = nft_gencursor_next(net);
@@ -9733,6 +9948,7 @@ static int nf_tables_commit(struct net * break; case NFT_MSG_DELSET: case NFT_MSG_DESTROYSET: + nft_trans_set(trans)->dead = 1; list_del_rcu(&nft_trans_set(trans)->list); nf_tables_set_notify(&trans->ctx, nft_trans_set(trans), trans->msg_type, GFP_KERNEL); @@ -9835,6 +10051,8 @@ static int nf_tables_commit(struct net * nft_commit_notify(net, NETLINK_CB(skb).portid); nf_tables_gen_notify(net, skb, NFT_MSG_NEWGEN); nf_tables_commit_audit_log(&adl, nft_net->base_seq); + + WRITE_ONCE(nft_net->gc_seq, ++gc_seq); nf_tables_commit_release(net);
return 0; @@ -10884,6 +11102,7 @@ static int __net_init nf_tables_init_net INIT_LIST_HEAD(&nft_net->notify_list); mutex_init(&nft_net->commit_mutex); nft_net->base_seq = 1; + nft_net->gc_seq = 0;
return 0; } @@ -10912,10 +11131,16 @@ static void __net_exit nf_tables_exit_ne WARN_ON_ONCE(!list_empty(&nft_net->notify_list)); }
+static void nf_tables_exit_batch(struct list_head *net_exit_list) +{ + flush_work(&trans_gc_work); +} + static struct pernet_operations nf_tables_net_ops = { .init = nf_tables_init_net, .pre_exit = nf_tables_pre_exit_net, .exit = nf_tables_exit_net, + .exit_batch = nf_tables_exit_batch, .id = &nf_tables_net_id, .size = sizeof(struct nftables_pernet), }; @@ -10987,6 +11212,7 @@ static void __exit nf_tables_module_exit nft_chain_filter_fini(); nft_chain_route_fini(); unregister_pernet_subsys(&nf_tables_net_ops); + cancel_work_sync(&trans_gc_work); cancel_work_sync(&trans_destroy_work); rcu_barrier(); rhltable_destroy(&nft_objname_ht);
From: Pablo Neira Ayuso pablo@netfilter.org
commit f6c383b8c31a93752a52697f8430a71dcbc46adf upstream.
Use the GC transaction API to replace the old and buggy gc API and the busy mark approach.
No set elements are removed from async garbage collection anymore, instead the _DEAD bit is set on so the set element is not visible from lookup path anymore. Async GC enqueues transaction work that might be aborted and retried later.
rbtree and pipapo set backends does not set on the _DEAD bit from the sync GC path since this runs in control plane path where mutex is held. In this case, set elements are deactivated, removed and then released via RCU callback, sync GC never fails.
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Fixes: 9d0982927e79 ("netfilter: nft_hash: add support for timeouts") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 7 - net/netfilter/nft_set_hash.c | 77 +++++++++++++-------- net/netfilter/nft_set_pipapo.c | 48 ++++++++++--- net/netfilter/nft_set_rbtree.c | 146 ++++++++++++++++++++++++----------------- 4 files changed, 174 insertions(+), 104 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -6357,7 +6357,6 @@ static void nft_setelem_activate(struct
if (nft_setelem_is_catchall(set, elem)) { nft_set_elem_change_active(net, set, ext); - nft_set_elem_clear_busy(ext); } else { set->ops->activate(net, set, elem); } @@ -6372,8 +6371,7 @@ static int nft_setelem_catchall_deactiva
list_for_each_entry(catchall, &set->catchall_list, list) { ext = nft_set_elem_ext(set, catchall->elem); - if (!nft_is_active(net, ext) || - nft_set_elem_mark_busy(ext)) + if (!nft_is_active(net, ext)) continue;
kfree(elem->priv); @@ -7083,8 +7081,7 @@ static int nft_set_catchall_flush(const
list_for_each_entry_rcu(catchall, &set->catchall_list, list) { ext = nft_set_elem_ext(set, catchall->elem); - if (!nft_set_elem_active(ext, genmask) || - nft_set_elem_mark_busy(ext)) + if (!nft_set_elem_active(ext, genmask)) continue;
elem.priv = catchall->elem; --- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -59,6 +59,8 @@ static inline int nft_rhash_cmp(struct r
if (memcmp(nft_set_ext_key(&he->ext), x->key, x->set->klen)) return 1; + if (nft_set_elem_is_dead(&he->ext)) + return 1; if (nft_set_elem_expired(&he->ext)) return 1; if (!nft_set_elem_active(&he->ext, x->genmask)) @@ -188,7 +190,6 @@ static void nft_rhash_activate(const str struct nft_rhash_elem *he = elem->priv;
nft_set_elem_change_active(net, set, &he->ext); - nft_set_elem_clear_busy(&he->ext); }
static bool nft_rhash_flush(const struct net *net, @@ -196,12 +197,9 @@ static bool nft_rhash_flush(const struct { struct nft_rhash_elem *he = priv;
- if (!nft_set_elem_mark_busy(&he->ext) || - !nft_is_active(net, &he->ext)) { - nft_set_elem_change_active(net, set, &he->ext); - return true; - } - return false; + nft_set_elem_change_active(net, set, &he->ext); + + return true; }
static void *nft_rhash_deactivate(const struct net *net, @@ -218,9 +216,8 @@ static void *nft_rhash_deactivate(const
rcu_read_lock(); he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params); - if (he != NULL && - !nft_rhash_flush(net, set, he)) - he = NULL; + if (he) + nft_set_elem_change_active(net, set, &he->ext);
rcu_read_unlock();
@@ -312,25 +309,48 @@ static bool nft_rhash_expr_needs_gc_run(
static void nft_rhash_gc(struct work_struct *work) { + struct nftables_pernet *nft_net; struct nft_set *set; struct nft_rhash_elem *he; struct nft_rhash *priv; - struct nft_set_gc_batch *gcb = NULL; struct rhashtable_iter hti; + struct nft_trans_gc *gc; + struct net *net; + u32 gc_seq;
priv = container_of(work, struct nft_rhash, gc_work.work); set = nft_set_container_of(priv); + net = read_pnet(&set->net); + nft_net = nft_pernet(net); + gc_seq = READ_ONCE(nft_net->gc_seq); + + gc = nft_trans_gc_alloc(set, gc_seq, GFP_KERNEL); + if (!gc) + goto done;
rhashtable_walk_enter(&priv->ht, &hti); rhashtable_walk_start(&hti);
while ((he = rhashtable_walk_next(&hti))) { if (IS_ERR(he)) { - if (PTR_ERR(he) != -EAGAIN) - break; + if (PTR_ERR(he) != -EAGAIN) { + nft_trans_gc_destroy(gc); + gc = NULL; + goto try_later; + } continue; }
+ /* Ruleset has been updated, try later. */ + if (READ_ONCE(nft_net->gc_seq) != gc_seq) { + nft_trans_gc_destroy(gc); + gc = NULL; + goto try_later; + } + + if (nft_set_elem_is_dead(&he->ext)) + goto dead_elem; + if (nft_set_ext_exists(&he->ext, NFT_SET_EXT_EXPRESSIONS) && nft_rhash_expr_needs_gc_run(set, &he->ext)) goto needs_gc_run; @@ -338,26 +358,26 @@ static void nft_rhash_gc(struct work_str if (!nft_set_elem_expired(&he->ext)) continue; needs_gc_run: - if (nft_set_elem_mark_busy(&he->ext)) - continue; + nft_set_elem_dead(&he->ext); +dead_elem: + gc = nft_trans_gc_queue_async(gc, gc_seq, GFP_ATOMIC); + if (!gc) + goto try_later;
- gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC); - if (gcb == NULL) - break; - rhashtable_remove_fast(&priv->ht, &he->node, nft_rhash_params); - atomic_dec(&set->nelems); - nft_set_gc_batch_add(gcb, he); + nft_trans_gc_elem_add(gc, he); } + + gc = nft_trans_gc_catchall(gc, gc_seq); + +try_later: + /* catchall list iteration requires rcu read side lock. */ rhashtable_walk_stop(&hti); rhashtable_walk_exit(&hti);
- he = nft_set_catchall_gc(set); - if (he) { - gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC); - if (gcb) - nft_set_gc_batch_add(gcb, he); - } - nft_set_gc_batch_complete(gcb); + if (gc) + nft_trans_gc_queue_async_done(gc); + +done: queue_delayed_work(system_power_efficient_wq, &priv->gc_work, nft_set_gc_interval(set)); } @@ -420,7 +440,6 @@ static void nft_rhash_destroy(const stru };
cancel_delayed_work_sync(&priv->gc_work); - rcu_barrier(); rhashtable_free_and_destroy(&priv->ht, nft_rhash_elem_destroy, (void *)&rhash_ctx); } --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -1537,16 +1537,34 @@ static void pipapo_drop(struct nft_pipap } }
+static void nft_pipapo_gc_deactivate(struct net *net, struct nft_set *set, + struct nft_pipapo_elem *e) + +{ + struct nft_set_elem elem = { + .priv = e, + }; + + nft_setelem_data_deactivate(net, set, &elem); +} + /** * pipapo_gc() - Drop expired entries from set, destroy start and end elements * @set: nftables API set representation * @m: Matching data */ -static void pipapo_gc(const struct nft_set *set, struct nft_pipapo_match *m) +static void pipapo_gc(const struct nft_set *_set, struct nft_pipapo_match *m) { + struct nft_set *set = (struct nft_set *) _set; struct nft_pipapo *priv = nft_set_priv(set); + struct net *net = read_pnet(&set->net); int rules_f0, first_rule = 0; struct nft_pipapo_elem *e; + struct nft_trans_gc *gc; + + gc = nft_trans_gc_alloc(set, 0, GFP_KERNEL); + if (!gc) + return;
while ((rules_f0 = pipapo_rules_same_key(m->f, first_rule))) { union nft_pipapo_map_bucket rulemap[NFT_PIPAPO_MAX_FIELDS]; @@ -1570,13 +1588,20 @@ static void pipapo_gc(const struct nft_s f--; i--; e = f->mt[rulemap[i].to].e; - if (nft_set_elem_expired(&e->ext) && - !nft_set_elem_mark_busy(&e->ext)) { + + /* synchronous gc never fails, there is no need to set on + * NFT_SET_ELEM_DEAD_BIT. + */ + if (nft_set_elem_expired(&e->ext)) { priv->dirty = true; - pipapo_drop(m, rulemap);
- rcu_barrier(); - nft_set_elem_destroy(set, e, true); + gc = nft_trans_gc_queue_sync(gc, GFP_ATOMIC); + if (!gc) + break; + + nft_pipapo_gc_deactivate(net, set, e); + pipapo_drop(m, rulemap); + nft_trans_gc_elem_add(gc, e);
/* And check again current first rule, which is now the * first we haven't checked. @@ -1586,11 +1611,11 @@ static void pipapo_gc(const struct nft_s } }
- e = nft_set_catchall_gc(set); - if (e) - nft_set_elem_destroy(set, e, true); - - priv->last_gc = jiffies; + gc = nft_trans_gc_catchall(gc, 0); + if (gc) { + nft_trans_gc_queue_sync_done(gc); + priv->last_gc = jiffies; + } }
/** @@ -1715,7 +1740,6 @@ static void nft_pipapo_activate(const st return;
nft_set_elem_change_active(net, set, &e->ext); - nft_set_elem_clear_busy(&e->ext); }
/** --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -46,6 +46,12 @@ static int nft_rbtree_cmp(const struct n set->klen); }
+static bool nft_rbtree_elem_expired(const struct nft_rbtree_elem *rbe) +{ + return nft_set_elem_expired(&rbe->ext) || + nft_set_elem_is_dead(&rbe->ext); +} + static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set, const u32 *key, const struct nft_set_ext **ext, unsigned int seq) @@ -80,7 +86,7 @@ static bool __nft_rbtree_lookup(const st continue; }
- if (nft_set_elem_expired(&rbe->ext)) + if (nft_rbtree_elem_expired(rbe)) return false;
if (nft_rbtree_interval_end(rbe)) { @@ -98,7 +104,7 @@ static bool __nft_rbtree_lookup(const st
if (set->flags & NFT_SET_INTERVAL && interval != NULL && nft_set_elem_active(&interval->ext, genmask) && - !nft_set_elem_expired(&interval->ext) && + !nft_rbtree_elem_expired(interval) && nft_rbtree_interval_start(interval)) { *ext = &interval->ext; return true; @@ -215,6 +221,18 @@ static void *nft_rbtree_get(const struct return rbe; }
+static void nft_rbtree_gc_remove(struct net *net, struct nft_set *set, + struct nft_rbtree *priv, + struct nft_rbtree_elem *rbe) +{ + struct nft_set_elem elem = { + .priv = rbe, + }; + + nft_setelem_data_deactivate(net, set, &elem); + rb_erase(&rbe->node, &priv->root); +} + static int nft_rbtree_gc_elem(const struct nft_set *__set, struct nft_rbtree *priv, struct nft_rbtree_elem *rbe, @@ -222,11 +240,12 @@ static int nft_rbtree_gc_elem(const stru { struct nft_set *set = (struct nft_set *)__set; struct rb_node *prev = rb_prev(&rbe->node); + struct net *net = read_pnet(&set->net); struct nft_rbtree_elem *rbe_prev; - struct nft_set_gc_batch *gcb; + struct nft_trans_gc *gc;
- gcb = nft_set_gc_batch_check(set, NULL, GFP_ATOMIC); - if (!gcb) + gc = nft_trans_gc_alloc(set, 0, GFP_ATOMIC); + if (!gc) return -ENOMEM;
/* search for end interval coming before this element. @@ -244,17 +263,28 @@ static int nft_rbtree_gc_elem(const stru
if (prev) { rbe_prev = rb_entry(prev, struct nft_rbtree_elem, node); + nft_rbtree_gc_remove(net, set, priv, rbe_prev);
- rb_erase(&rbe_prev->node, &priv->root); - atomic_dec(&set->nelems); - nft_set_gc_batch_add(gcb, rbe_prev); + /* There is always room in this trans gc for this element, + * memory allocation never actually happens, hence, the warning + * splat in such case. No need to set NFT_SET_ELEM_DEAD_BIT, + * this is synchronous gc which never fails. + */ + gc = nft_trans_gc_queue_sync(gc, GFP_ATOMIC); + if (WARN_ON_ONCE(!gc)) + return -ENOMEM; + + nft_trans_gc_elem_add(gc, rbe_prev); }
- rb_erase(&rbe->node, &priv->root); - atomic_dec(&set->nelems); + nft_rbtree_gc_remove(net, set, priv, rbe); + gc = nft_trans_gc_queue_sync(gc, GFP_ATOMIC); + if (WARN_ON_ONCE(!gc)) + return -ENOMEM;
- nft_set_gc_batch_add(gcb, rbe); - nft_set_gc_batch_complete(gcb); + nft_trans_gc_elem_add(gc, rbe); + + nft_trans_gc_queue_sync_done(gc);
return 0; } @@ -482,7 +512,6 @@ static void nft_rbtree_activate(const st struct nft_rbtree_elem *rbe = elem->priv;
nft_set_elem_change_active(net, set, &rbe->ext); - nft_set_elem_clear_busy(&rbe->ext); }
static bool nft_rbtree_flush(const struct net *net, @@ -490,12 +519,9 @@ static bool nft_rbtree_flush(const struc { struct nft_rbtree_elem *rbe = priv;
- if (!nft_set_elem_mark_busy(&rbe->ext) || - !nft_is_active(net, &rbe->ext)) { - nft_set_elem_change_active(net, set, &rbe->ext); - return true; - } - return false; + nft_set_elem_change_active(net, set, &rbe->ext); + + return true; }
static void *nft_rbtree_deactivate(const struct net *net, @@ -570,26 +596,40 @@ cont:
static void nft_rbtree_gc(struct work_struct *work) { - struct nft_rbtree_elem *rbe, *rbe_end = NULL, *rbe_prev = NULL; - struct nft_set_gc_batch *gcb = NULL; + struct nft_rbtree_elem *rbe, *rbe_end = NULL; + struct nftables_pernet *nft_net; struct nft_rbtree *priv; + struct nft_trans_gc *gc; struct rb_node *node; struct nft_set *set; + unsigned int gc_seq; struct net *net; - u8 genmask;
priv = container_of(work, struct nft_rbtree, gc_work.work); set = nft_set_container_of(priv); net = read_pnet(&set->net); - genmask = nft_genmask_cur(net); + nft_net = nft_pernet(net); + gc_seq = READ_ONCE(nft_net->gc_seq); + + gc = nft_trans_gc_alloc(set, gc_seq, GFP_KERNEL); + if (!gc) + goto done;
write_lock_bh(&priv->lock); write_seqcount_begin(&priv->count); for (node = rb_first(&priv->root); node != NULL; node = rb_next(node)) { + + /* Ruleset has been updated, try later. */ + if (READ_ONCE(nft_net->gc_seq) != gc_seq) { + nft_trans_gc_destroy(gc); + gc = NULL; + goto try_later; + } + rbe = rb_entry(node, struct nft_rbtree_elem, node);
- if (!nft_set_elem_active(&rbe->ext, genmask)) - continue; + if (nft_set_elem_is_dead(&rbe->ext)) + goto dead_elem;
/* elements are reversed in the rbtree for historical reasons, * from highest to lowest value, that is why end element is @@ -602,46 +642,36 @@ static void nft_rbtree_gc(struct work_st if (!nft_set_elem_expired(&rbe->ext)) continue;
- if (nft_set_elem_mark_busy(&rbe->ext)) { - rbe_end = NULL; + nft_set_elem_dead(&rbe->ext); + + if (!rbe_end) continue; - }
- if (rbe_prev) { - rb_erase(&rbe_prev->node, &priv->root); - rbe_prev = NULL; - } - gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC); - if (!gcb) - break; + nft_set_elem_dead(&rbe_end->ext);
- atomic_dec(&set->nelems); - nft_set_gc_batch_add(gcb, rbe); - rbe_prev = rbe; - - if (rbe_end) { - atomic_dec(&set->nelems); - nft_set_gc_batch_add(gcb, rbe_end); - rb_erase(&rbe_end->node, &priv->root); - rbe_end = NULL; - } - node = rb_next(node); - if (!node) - break; + gc = nft_trans_gc_queue_async(gc, gc_seq, GFP_ATOMIC); + if (!gc) + goto try_later; + + nft_trans_gc_elem_add(gc, rbe_end); + rbe_end = NULL; +dead_elem: + gc = nft_trans_gc_queue_async(gc, gc_seq, GFP_ATOMIC); + if (!gc) + goto try_later; + + nft_trans_gc_elem_add(gc, rbe); } - if (rbe_prev) - rb_erase(&rbe_prev->node, &priv->root); + + gc = nft_trans_gc_catchall(gc, gc_seq); + +try_later: write_seqcount_end(&priv->count); write_unlock_bh(&priv->lock);
- rbe = nft_set_catchall_gc(set); - if (rbe) { - gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC); - if (gcb) - nft_set_gc_batch_add(gcb, rbe); - } - nft_set_gc_batch_complete(gcb); - + if (gc) + nft_trans_gc_queue_async_done(gc); +done: queue_delayed_work(system_power_efficient_wq, &priv->gc_work, nft_set_gc_interval(set)); }
From: Pablo Neira Ayuso pablo@netfilter.org
commit c92db3030492b8ad1d0faace7a93bbcf53850d0c upstream.
Set on the NFT_SET_ELEM_DEAD_BIT flag on this element, instead of performing element removal which might race with an ongoing transaction. Enable gc when dynamic flag is set on since dynset deletion requires garbage collection after this patch.
Fixes: d0a8d877da97 ("netfilter: nft_dynset: support for element deletion") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nft_set_hash.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -249,7 +249,9 @@ static bool nft_rhash_delete(const struc if (he == NULL) return false;
- return rhashtable_remove_fast(&priv->ht, &he->node, nft_rhash_params) == 0; + nft_set_elem_dead(&he->ext); + + return true; }
static void nft_rhash_walk(const struct nft_ctx *ctx, struct nft_set *set, @@ -412,7 +414,7 @@ static int nft_rhash_init(const struct n return err;
INIT_DEFERRABLE_WORK(&priv->gc_work, nft_rhash_gc); - if (set->flags & NFT_SET_TIMEOUT) + if (set->flags & (NFT_SET_TIMEOUT | NFT_SET_EVAL)) nft_rhash_gc_init(set);
return 0;
From: Alejandro Tafalla atafalla@dnyon.com
commit 6811694eb2f6b7a4e97be2029edc7dd6a39460f8 upstream.
The function lsm6dsx_get_acpi_mount_matrix should return an error when ACPI support is not enabled to allow executing iio_read_mount_matrix in the probe function.
Fixes: dc3d25f22b88 ("iio: imu: lsm6dsx: Add ACPI mount matrix retrieval") Signed-off-by: Alejandro Tafalla atafalla@dnyon.com Acked-by: Lorenzo Bianconi lorenzo@kernel.org Link: https://lore.kernel.org/r/20230714153132.27265-1-atafalla@dnyon.com Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c b/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c index 6a18b363cf73..b6e6b1df8a61 100644 --- a/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c +++ b/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c @@ -2687,7 +2687,7 @@ static int lsm6dsx_get_acpi_mount_matrix(struct device *dev, static int lsm6dsx_get_acpi_mount_matrix(struct device *dev, struct iio_mount_matrix *orientation) { - return false; + return -EOPNOTSUPP; }
#endif
From: Milan Zamazal mzamazal@redhat.com
commit b2a69969908fcaf68596dfc04369af0fe2e1d2f7 upstream.
Commit 813665564b3d ("iio: core: Convert to use firmware node handle instead of OF node") switched the kind of nodes to use for label retrieval in device registration. Probably an unwanted change in that commit was that if the device has no parent then NULL pointer is accessed. This is what happens in the stock IIO dummy driver when a new entry is created in configfs:
# mkdir /sys/kernel/config/iio/devices/dummy/foo BUG: kernel NULL pointer dereference, address: ... ... Call Trace: __iio_device_register iio_dummy_probe
Since there seems to be no reason to make a parent device of an IIO dummy device mandatory, let’s prevent the invalid memory access in __iio_device_register when the parent device is NULL. With this change, the IIO dummy driver works fine with configfs.
Fixes: 813665564b3d ("iio: core: Convert to use firmware node handle instead of OF node") Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Milan Zamazal mzamazal@redhat.com Link: https://lore.kernel.org/r/20230719083208.88149-1-mzamazal@redhat.com Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/industrialio-core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/iio/industrialio-core.c +++ b/drivers/iio/industrialio-core.c @@ -1888,7 +1888,7 @@ static const struct iio_buffer_setup_ops int __iio_device_register(struct iio_dev *indio_dev, struct module *this_mod) { struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev); - struct fwnode_handle *fwnode; + struct fwnode_handle *fwnode = NULL; int ret;
if (!indio_dev->info) @@ -1899,7 +1899,8 @@ int __iio_device_register(struct iio_dev /* If the calling driver did not initialize firmware node, do it here */ if (dev_fwnode(&indio_dev->dev)) fwnode = dev_fwnode(&indio_dev->dev); - else + /* The default dummy IIO device has no parent */ + else if (indio_dev->dev.parent) fwnode = dev_fwnode(indio_dev->dev.parent); device_set_node(&indio_dev->dev, fwnode);
From: Matti Vaittinen mazziesaccount@gmail.com
commit d47b9b84292706784482a661324bbc178153781f upstream.
The driver is expecting accuracy of NANOs for intensity scale in raw_write. The IIO core is however defaulting to MICROs. This leads the raw-write of smallest scales to never succeed as correct selector(s) are not found.
Fix this by implementing the .write_raw_get_fmt callback to use NANO accuracy for writes of IIO_CHAN_INFO_SCALE.
Signed-off-by: Matti Vaittinen mazziesaccount@gmail.com Fixes: e52afbd61039 ("iio: light: ROHM BU27034 Ambient Light Sensor") Link: https://lore.kernel.org/r/5369117315cf05b88cf0ccb87373fd77190f6ca2.168664842... Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/light/rohm-bu27034.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/iio/light/rohm-bu27034.c b/drivers/iio/light/rohm-bu27034.c index e63ef5789cde..bf3de853a811 100644 --- a/drivers/iio/light/rohm-bu27034.c +++ b/drivers/iio/light/rohm-bu27034.c @@ -575,7 +575,7 @@ static int bu27034_set_scale(struct bu27034_data *data, int chan, return -EINVAL;
if (chan == BU27034_CHAN_ALS) { - if (val == 0 && val2 == 1000) + if (val == 0 && val2 == 1000000) return 0;
return -EINVAL; @@ -587,7 +587,7 @@ static int bu27034_set_scale(struct bu27034_data *data, int chan, goto unlock_out;
ret = iio_gts_find_gain_sel_for_scale_using_time(&data->gts, time_sel, - val, val2 * 1000, &gain_sel); + val, val2, &gain_sel); if (ret) { /* * Could not support scale with given time. Need to change time. @@ -624,7 +624,7 @@ static int bu27034_set_scale(struct bu27034_data *data, int chan,
/* Can we provide requested scale with this time? */ ret = iio_gts_find_gain_sel_for_scale_using_time( - &data->gts, new_time_sel, val, val2 * 1000, + &data->gts, new_time_sel, val, val2, &gain_sel); if (ret) continue; @@ -1217,6 +1217,21 @@ static int bu27034_read_raw(struct iio_dev *idev, } }
+static int bu27034_write_raw_get_fmt(struct iio_dev *indio_dev, + struct iio_chan_spec const *chan, + long mask) +{ + + switch (mask) { + case IIO_CHAN_INFO_SCALE: + return IIO_VAL_INT_PLUS_NANO; + case IIO_CHAN_INFO_INT_TIME: + return IIO_VAL_INT_PLUS_MICRO; + default: + return -EINVAL; + } +} + static int bu27034_write_raw(struct iio_dev *idev, struct iio_chan_spec const *chan, int val, int val2, long mask) @@ -1267,6 +1282,7 @@ static int bu27034_read_avail(struct iio_dev *idev, static const struct iio_info bu27034_info = { .read_raw = &bu27034_read_raw, .write_raw = &bu27034_write_raw, + .write_raw_get_fmt = &bu27034_write_raw_get_fmt, .read_avail = &bu27034_read_avail, };
From: Mike Tipton mdtipton@codeaurora.org
commit d8630f050d3fd2079f8617dd6c00c6509109c755 upstream.
Some BCMs aren't directly associated with the data path (i.e. ACV) and therefore don't communicate using BW. Instead, they are simply enabled/disabled with a simple bit mask. Add support for these.
Origin commit retrieved from: https://git.codelinaro.org/clo/la/kernel/msm-5.15/-/commit/2d1573e0206998151...
Signed-off-by: Mike Tipton mdtipton@codeaurora.org [narmstrong: removed copyright change from original commit] Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Link: https://lore.kernel.org/r/20230619-topic-sm8550-upstream-interconnect-mask-v... Fixes: fafc114a468e ("interconnect: qcom: Add SM8450 interconnect provider driver") Signed-off-by: Georgi Djakov djakov@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/interconnect/qcom/bcm-voter.c | 5 +++++ drivers/interconnect/qcom/icc-rpmh.h | 2 ++ 2 files changed, 7 insertions(+)
--- a/drivers/interconnect/qcom/bcm-voter.c +++ b/drivers/interconnect/qcom/bcm-voter.c @@ -83,6 +83,11 @@ static void bcm_aggregate(struct qcom_ic
temp = agg_peak[bucket] * bcm->vote_scale; bcm->vote_y[bucket] = bcm_div(temp, bcm->aux_data.unit); + + if (bcm->enable_mask && (bcm->vote_x[bucket] || bcm->vote_y[bucket])) { + bcm->vote_x[bucket] = 0; + bcm->vote_y[bucket] = bcm->enable_mask; + } }
if (bcm->keepalive && bcm->vote_x[QCOM_ICC_BUCKET_AMC] == 0 && --- a/drivers/interconnect/qcom/icc-rpmh.h +++ b/drivers/interconnect/qcom/icc-rpmh.h @@ -81,6 +81,7 @@ struct qcom_icc_node { * @vote_x: aggregated threshold values, represents sum_bw when @type is bw bcm * @vote_y: aggregated threshold values, represents peak_bw when @type is bw bcm * @vote_scale: scaling factor for vote_x and vote_y + * @enable_mask: optional mask to send as vote instead of vote_x/vote_y * @dirty: flag used to indicate whether the bcm needs to be committed * @keepalive: flag used to indicate whether a keepalive is required * @aux_data: auxiliary data used when calculating threshold values and @@ -97,6 +98,7 @@ struct qcom_icc_bcm { u64 vote_x[QCOM_ICC_NUM_BUCKETS]; u64 vote_y[QCOM_ICC_NUM_BUCKETS]; u64 vote_scale; + u32 enable_mask; bool dirty; bool keepalive; struct bcm_db aux_data;
From: Neil Armstrong neil.armstrong@linaro.org
commit 3cb11fe244d516f757c1022cfa971528d525fe65 upstream.
Set the proper enable_mask the ACV node requiring such value to be used instead of a bandwidth when voting.
The masks was copied from the downstream implementation at [1].
[1] https://git.codelinaro.org/clo/la/kernel/msm-5.15/-/blob/kernel.lnx.5.15.r32...
Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Link: https://lore.kernel.org/r/20230619-topic-sm8550-upstream-interconnect-mask-v... Fixes: 3655a63f9661 ("interconnect: qcom: add a driver for sa8775p") Signed-off-by: Georgi Djakov djakov@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/interconnect/qcom/sa8775p.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/interconnect/qcom/sa8775p.c +++ b/drivers/interconnect/qcom/sa8775p.c @@ -1873,6 +1873,7 @@ static struct qcom_icc_node srvc_snoc =
static struct qcom_icc_bcm bcm_acv = { .name = "ACV", + .enable_mask = 0x8, .num_nodes = 1, .nodes = { &ebi }, };
From: Neil Armstrong neil.armstrong@linaro.org
commit be02db24cf840bc0fdfbecc78ad803619dd143e6 upstream.
Set the proper enable_mask to nodes requiring such value to be used instead of a bandwidth when voting.
The masks were copied from the downstream implementation at [1].
[1] https://git.codelinaro.org/clo/la/kernel/msm-5.10/-/blob/KERNEL.PLATFORM.1.0...
Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Link: https://lore.kernel.org/r/20230619-topic-sm8550-upstream-interconnect-mask-v... Fixes: fafc114a468e ("interconnect: qcom: Add SM8450 interconnect provider driver") Signed-off-by: Georgi Djakov djakov@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/interconnect/qcom/sm8450.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/drivers/interconnect/qcom/sm8450.c +++ b/drivers/interconnect/qcom/sm8450.c @@ -1337,6 +1337,7 @@ static struct qcom_icc_node qns_mem_noc_
static struct qcom_icc_bcm bcm_acv = { .name = "ACV", + .enable_mask = 0x8, .num_nodes = 1, .nodes = { &ebi }, }; @@ -1349,6 +1350,7 @@ static struct qcom_icc_bcm bcm_ce0 = {
static struct qcom_icc_bcm bcm_cn0 = { .name = "CN0", + .enable_mask = 0x1, .keepalive = true, .num_nodes = 55, .nodes = { &qnm_gemnoc_cnoc, &qnm_gemnoc_pcie, @@ -1383,6 +1385,7 @@ static struct qcom_icc_bcm bcm_cn0 = {
static struct qcom_icc_bcm bcm_co0 = { .name = "CO0", + .enable_mask = 0x1, .num_nodes = 2, .nodes = { &qxm_nsp, &qns_nsp_gemnoc }, }; @@ -1403,6 +1406,7 @@ static struct qcom_icc_bcm bcm_mm0 = {
static struct qcom_icc_bcm bcm_mm1 = { .name = "MM1", + .enable_mask = 0x1, .num_nodes = 12, .nodes = { &qnm_camnoc_hf, &qnm_camnoc_icp, &qnm_camnoc_sf, &qnm_mdp, @@ -1445,6 +1449,7 @@ static struct qcom_icc_bcm bcm_sh0 = {
static struct qcom_icc_bcm bcm_sh1 = { .name = "SH1", + .enable_mask = 0x1, .num_nodes = 7, .nodes = { &alm_gpu_tcu, &alm_sys_tcu, &qnm_nsp_gemnoc, &qnm_pcie, @@ -1461,6 +1466,7 @@ static struct qcom_icc_bcm bcm_sn0 = {
static struct qcom_icc_bcm bcm_sn1 = { .name = "SN1", + .enable_mask = 0x1, .num_nodes = 4, .nodes = { &qhm_gic, &qxm_pimem, &xm_gic, &qns_gemnoc_gc }, @@ -1492,6 +1498,7 @@ static struct qcom_icc_bcm bcm_sn7 = {
static struct qcom_icc_bcm bcm_acv_disp = { .name = "ACV", + .enable_mask = 0x1, .num_nodes = 1, .nodes = { &ebi_disp }, }; @@ -1510,6 +1517,7 @@ static struct qcom_icc_bcm bcm_mm0_disp
static struct qcom_icc_bcm bcm_mm1_disp = { .name = "MM1", + .enable_mask = 0x1, .num_nodes = 3, .nodes = { &qnm_mdp_disp, &qnm_rot_disp, &qns_mem_noc_sf_disp }, @@ -1523,6 +1531,7 @@ static struct qcom_icc_bcm bcm_sh0_disp
static struct qcom_icc_bcm bcm_sh1_disp = { .name = "SH1", + .enable_mask = 0x1, .num_nodes = 1, .nodes = { &qnm_pcie_disp }, };
From: Neil Armstrong neil.armstrong@linaro.org
commit 0dc82bd9e4627065dbc6ac8468296aa18f13c840 upstream.
Set the proper enable_mask to nodes requiring such value to be used instead of a bandwidth when voting.
The masks were copied from the downstream implementation at [1].
[1] https://git.codelinaro.org/clo/la/kernel/msm-5.15/-/blob/kernel.lnx.5.15.r1-...
Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Link: https://lore.kernel.org/r/20230619-topic-sm8550-upstream-interconnect-mask-v... Fixes: e6f0d6a30f73 ("interconnect: qcom: Add SM8550 interconnect provider driver") Signed-off-by: Georgi Djakov djakov@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/interconnect/qcom/sm8550.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/drivers/interconnect/qcom/sm8550.c b/drivers/interconnect/qcom/sm8550.c index d823ba988ef6..0864ed285375 100644 --- a/drivers/interconnect/qcom/sm8550.c +++ b/drivers/interconnect/qcom/sm8550.c @@ -1473,6 +1473,7 @@ static struct qcom_icc_node qns_mem_noc_sf_cam_ife_2 = {
static struct qcom_icc_bcm bcm_acv = { .name = "ACV", + .enable_mask = 0x8, .num_nodes = 1, .nodes = { &ebi }, }; @@ -1485,6 +1486,7 @@ static struct qcom_icc_bcm bcm_ce0 = {
static struct qcom_icc_bcm bcm_cn0 = { .name = "CN0", + .enable_mask = 0x1, .keepalive = true, .num_nodes = 54, .nodes = { &qsm_cfg, &qhs_ahb2phy0, @@ -1524,6 +1526,7 @@ static struct qcom_icc_bcm bcm_cn1 = {
static struct qcom_icc_bcm bcm_co0 = { .name = "CO0", + .enable_mask = 0x1, .num_nodes = 2, .nodes = { &qxm_nsp, &qns_nsp_gemnoc }, }; @@ -1549,6 +1552,7 @@ static struct qcom_icc_bcm bcm_mm0 = {
static struct qcom_icc_bcm bcm_mm1 = { .name = "MM1", + .enable_mask = 0x1, .num_nodes = 8, .nodes = { &qnm_camnoc_hf, &qnm_camnoc_icp, &qnm_camnoc_sf, &qnm_vapss_hcp, @@ -1589,6 +1593,7 @@ static struct qcom_icc_bcm bcm_sh0 = {
static struct qcom_icc_bcm bcm_sh1 = { .name = "SH1", + .enable_mask = 0x1, .num_nodes = 13, .nodes = { &alm_gpu_tcu, &alm_sys_tcu, &chm_apps, &qnm_gpu, @@ -1608,6 +1613,7 @@ static struct qcom_icc_bcm bcm_sn0 = {
static struct qcom_icc_bcm bcm_sn1 = { .name = "SN1", + .enable_mask = 0x1, .num_nodes = 3, .nodes = { &qhm_gic, &xm_gic, &qns_gemnoc_gc }, @@ -1633,6 +1639,7 @@ static struct qcom_icc_bcm bcm_sn7 = {
static struct qcom_icc_bcm bcm_acv_disp = { .name = "ACV", + .enable_mask = 0x1, .num_nodes = 1, .nodes = { &ebi_disp }, }; @@ -1657,12 +1664,14 @@ static struct qcom_icc_bcm bcm_sh0_disp = {
static struct qcom_icc_bcm bcm_sh1_disp = { .name = "SH1", + .enable_mask = 0x1, .num_nodes = 2, .nodes = { &qnm_mnoc_hf_disp, &qnm_pcie_disp }, };
static struct qcom_icc_bcm bcm_acv_cam_ife_0 = { .name = "ACV", + .enable_mask = 0x0, .num_nodes = 1, .nodes = { &ebi_cam_ife_0 }, }; @@ -1681,6 +1690,7 @@ static struct qcom_icc_bcm bcm_mm0_cam_ife_0 = {
static struct qcom_icc_bcm bcm_mm1_cam_ife_0 = { .name = "MM1", + .enable_mask = 0x1, .num_nodes = 4, .nodes = { &qnm_camnoc_hf_cam_ife_0, &qnm_camnoc_icp_cam_ife_0, &qnm_camnoc_sf_cam_ife_0, &qns_mem_noc_sf_cam_ife_0 }, @@ -1694,6 +1704,7 @@ static struct qcom_icc_bcm bcm_sh0_cam_ife_0 = {
static struct qcom_icc_bcm bcm_sh1_cam_ife_0 = { .name = "SH1", + .enable_mask = 0x1, .num_nodes = 3, .nodes = { &qnm_mnoc_hf_cam_ife_0, &qnm_mnoc_sf_cam_ife_0, &qnm_pcie_cam_ife_0 }, @@ -1701,6 +1712,7 @@ static struct qcom_icc_bcm bcm_sh1_cam_ife_0 = {
static struct qcom_icc_bcm bcm_acv_cam_ife_1 = { .name = "ACV", + .enable_mask = 0x0, .num_nodes = 1, .nodes = { &ebi_cam_ife_1 }, }; @@ -1719,6 +1731,7 @@ static struct qcom_icc_bcm bcm_mm0_cam_ife_1 = {
static struct qcom_icc_bcm bcm_mm1_cam_ife_1 = { .name = "MM1", + .enable_mask = 0x1, .num_nodes = 4, .nodes = { &qnm_camnoc_hf_cam_ife_1, &qnm_camnoc_icp_cam_ife_1, &qnm_camnoc_sf_cam_ife_1, &qns_mem_noc_sf_cam_ife_1 }, @@ -1732,6 +1745,7 @@ static struct qcom_icc_bcm bcm_sh0_cam_ife_1 = {
static struct qcom_icc_bcm bcm_sh1_cam_ife_1 = { .name = "SH1", + .enable_mask = 0x1, .num_nodes = 3, .nodes = { &qnm_mnoc_hf_cam_ife_1, &qnm_mnoc_sf_cam_ife_1, &qnm_pcie_cam_ife_1 }, @@ -1739,6 +1753,7 @@ static struct qcom_icc_bcm bcm_sh1_cam_ife_1 = {
static struct qcom_icc_bcm bcm_acv_cam_ife_2 = { .name = "ACV", + .enable_mask = 0x0, .num_nodes = 1, .nodes = { &ebi_cam_ife_2 }, }; @@ -1757,6 +1772,7 @@ static struct qcom_icc_bcm bcm_mm0_cam_ife_2 = {
static struct qcom_icc_bcm bcm_mm1_cam_ife_2 = { .name = "MM1", + .enable_mask = 0x1, .num_nodes = 4, .nodes = { &qnm_camnoc_hf_cam_ife_2, &qnm_camnoc_icp_cam_ife_2, &qnm_camnoc_sf_cam_ife_2, &qns_mem_noc_sf_cam_ife_2 }, @@ -1770,6 +1786,7 @@ static struct qcom_icc_bcm bcm_sh0_cam_ife_2 = {
static struct qcom_icc_bcm bcm_sh1_cam_ife_2 = { .name = "SH1", + .enable_mask = 0x1, .num_nodes = 3, .nodes = { &qnm_mnoc_hf_cam_ife_2, &qnm_mnoc_sf_cam_ife_2, &qnm_pcie_cam_ife_2 },
From: Ido Schimmel idosch@nvidia.com
commit 11604178fdc3404d46518e5332f1fe865d30c4a1 upstream.
The test installs filters that match on various IP fragments (e.g., no fragment, first fragment) and expects a certain amount of packets to hit each filter. This is problematic as the filters are not specific enough and can match IP packets (e.g., IGMP) generated by the stack, resulting in failures [1].
Fix by making the filters more specific and match on more fields in the IP header: Source IP, destination IP and protocol.
[1] # timeout set to 0 # selftests: net/forwarding: tc_tunnel_key.sh # TEST: tunnel_key nofrag (skip_hw) [FAIL] # packet smaller than MTU was not tunneled # INFO: Could not test offloaded functionality not ok 89 selftests: net/forwarding: tc_tunnel_key.sh # exit=1
Fixes: 533a89b1940f ("selftests: forwarding: add tunnel_key "nofrag" test case") Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.uniz... Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Acked-by: Davide Caratti dcaratti@redhat.com Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20230808141503.4060661-14-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/forwarding/tc_tunnel_key.sh | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/tc_tunnel_key.sh b/tools/testing/selftests/net/forwarding/tc_tunnel_key.sh index 5ac184d51809..5a5dd9034819 100755 --- a/tools/testing/selftests/net/forwarding/tc_tunnel_key.sh +++ b/tools/testing/selftests/net/forwarding/tc_tunnel_key.sh @@ -104,11 +104,14 @@ tunnel_key_nofrag_test() local i
tc filter add dev $swp1 ingress protocol ip pref 100 handle 100 \ - flower ip_flags nofrag action drop + flower src_ip 192.0.2.1 dst_ip 192.0.2.2 ip_proto udp \ + ip_flags nofrag action drop tc filter add dev $swp1 ingress protocol ip pref 101 handle 101 \ - flower ip_flags firstfrag action drop + flower src_ip 192.0.2.1 dst_ip 192.0.2.2 ip_proto udp \ + ip_flags firstfrag action drop tc filter add dev $swp1 ingress protocol ip pref 102 handle 102 \ - flower ip_flags nofirstfrag action drop + flower src_ip 192.0.2.1 dst_ip 192.0.2.2 ip_proto udp \ + ip_flags nofirstfrag action drop
# test 'nofrag' set tc filter add dev h1-et egress protocol all pref 1 handle 1 matchall $tcflags \
From: Ido Schimmel idosch@nvidia.com
commit 23fb886a1ced6f885ddd541cc86d45c415ce705c upstream.
MAC Merge cannot be tested with veth pairs, resulting in failures:
# ./ethtool_mm.sh [...] TEST: Manual configuration with verification: swp1 to swp2 [FAIL] Verification did not succeed
Fix by skipping the test when the interfaces do not support MAC Merge.
Fixes: e6991384ace5 ("selftests: forwarding: add a test for MAC Merge layer") Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.uniz... Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20230808141503.4060661-11-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- .../selftests/net/forwarding/ethtool_mm.sh | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/ethtool_mm.sh b/tools/testing/selftests/net/forwarding/ethtool_mm.sh index c580ad623848..39e736f30322 100755 --- a/tools/testing/selftests/net/forwarding/ethtool_mm.sh +++ b/tools/testing/selftests/net/forwarding/ethtool_mm.sh @@ -258,11 +258,6 @@ h2_destroy()
setup_prepare() { - check_ethtool_mm_support - check_tc_fp_support - require_command lldptool - bail_on_lldpad "autoconfigure the MAC Merge layer" "configure it manually" - h1=${NETIFS[p1]} h2=${NETIFS[p2]}
@@ -278,6 +273,19 @@ cleanup() h1_destroy }
+check_ethtool_mm_support +check_tc_fp_support +require_command lldptool +bail_on_lldpad "autoconfigure the MAC Merge layer" "configure it manually" + +for netif in ${NETIFS[@]}; do + ethtool --show-mm $netif 2>&1 &> /dev/null + if [[ $? -ne 0 ]]; then + echo "SKIP: $netif does not support MAC Merge" + exit $ksft_skip + fi +done + trap cleanup EXIT
setup_prepare
From: Ido Schimmel idosch@nvidia.com
commit 6bdf3d9765f4e0eebfd919e70acc65bce5daa9b9 upstream.
The selftest relies on iproute2 changes present in version 6.3, but the test does not check for it, resulting in errors:
# ./bridge_mdb_max.sh INFO: 802.1d tests TEST: cfg4: port: ngroups reporting [FAIL] Number of groups was null, now is null, but 5 expected TEST: ctl4: port: ngroups reporting [FAIL] Number of groups was null, now is null, but 5 expected TEST: cfg6: port: ngroups reporting [FAIL] Number of groups was null, now is null, but 5 expected [...]
Fix by skipping the test if iproute2 is too old.
Fixes: 3446dcd7df05 ("selftests: forwarding: bridge_mdb_max: Add a new selftest") Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/6b04b2ba-2372-6f6b-3ac8-b7cba1cfae83@alu.uniz... Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20230808141503.4060661-5-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/forwarding/bridge_mdb_max.sh | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb_max.sh b/tools/testing/selftests/net/forwarding/bridge_mdb_max.sh index ae255b662ba3..fa762b716288 100755 --- a/tools/testing/selftests/net/forwarding/bridge_mdb_max.sh +++ b/tools/testing/selftests/net/forwarding/bridge_mdb_max.sh @@ -1328,6 +1328,11 @@ test_8021qvs() switch_destroy }
+if ! bridge link help 2>&1 | grep -q "mcast_max_groups"; then + echo "SKIP: iproute2 too old, missing bridge "mcast_max_groups" support" + exit $ksft_skip +fi + trap cleanup EXIT
setup_prepare
From: Ido Schimmel idosch@nvidia.com
commit ab2eda04e2c2116910b9d77ccc82e727efa71d49 upstream.
The selftest relies on iproute2 changes present in version 6.3, but the test does not check for it, resulting in error:
# ./bridge_mdb.sh
INFO: # Host entries configuration tests TEST: Common host entries configuration tests (IPv4) [FAIL] Managed to add IPv4 host entry with a filter mode TEST: Common host entries configuration tests (IPv6) [FAIL] Managed to add IPv6 host entry with a filter mode TEST: Common host entries configuration tests (L2) [FAIL] Managed to add L2 host entry with a filter mode
INFO: # Port group entries configuration tests - (*, G) Command "replace" is unknown, try "bridge mdb help". [...]
Fix by skipping the test if iproute2 is too old.
Fixes: b6d00da08610 ("selftests: forwarding: Add bridge MDB test") Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/6b04b2ba-2372-6f6b-3ac8-b7cba1cfae83@alu.uniz... Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20230808141503.4060661-4-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/forwarding/bridge_mdb.sh | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb.sh b/tools/testing/selftests/net/forwarding/bridge_mdb.sh index ae3f9462a2b6..6f830b5f03c9 100755 --- a/tools/testing/selftests/net/forwarding/bridge_mdb.sh +++ b/tools/testing/selftests/net/forwarding/bridge_mdb.sh @@ -1206,6 +1206,11 @@ ctrl_test() ctrl_mldv2_is_in_test }
+if ! bridge mdb help 2>&1 | grep -q "replace"; then + echo "SKIP: iproute2 too old, missing bridge mdb replace support" + exit $ksft_skip +fi + trap cleanup EXIT
setup_prepare
From: Raghavendra Rao Ananta rananta@google.com
commit c718ca0e99401d80d2480c08e1b02cf5f7cd7033 upstream.
When running in protected mode, the hyp stub is disabled after pKVM is initialized, meaning the host cannot enable/disable the hyp at runtime. As such, kvm_arm_hardware_enabled is always 1 after initialization, and kvm_arch_hardware_enable() never enables the vgic maintenance irq or timer irqs.
Unconditionally enable/disable the vgic + timer irqs in the respective calls, instead relying on the percpu bookkeeping in the generic code to keep track of which cpus have the interrupts unmasked.
Fixes: 466d27e48d7c ("KVM: arm64: Simplify the CPUHP logic") Reported-by: Oliver Upton oliver.upton@linux.dev Suggested-by: Oliver Upton oliver.upton@linux.dev Signed-off-by: Raghavendra Rao Ananta rananta@google.com Link: https://lore.kernel.org/r/20230719175400.647154-1-rananta@google.com Acked-by: Marc Zyngier maz@kernel.org Signed-off-by: Oliver Upton oliver.upton@linux.dev Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/arm.c | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-)
--- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -1800,8 +1800,6 @@ static void _kvm_arch_hardware_enable(vo
int kvm_arch_hardware_enable(void) { - int was_enabled; - /* * Most calls to this function are made with migration * disabled, but not with preemption disabled. The former is @@ -1810,13 +1808,10 @@ int kvm_arch_hardware_enable(void) */ preempt_disable();
- was_enabled = __this_cpu_read(kvm_arm_hardware_enabled); _kvm_arch_hardware_enable(NULL);
- if (!was_enabled) { - kvm_vgic_cpu_up(); - kvm_timer_cpu_up(); - } + kvm_vgic_cpu_up(); + kvm_timer_cpu_up();
preempt_enable();
@@ -1833,10 +1828,8 @@ static void _kvm_arch_hardware_disable(v
void kvm_arch_hardware_disable(void) { - if (__this_cpu_read(kvm_arm_hardware_enabled)) { - kvm_timer_cpu_down(); - kvm_vgic_cpu_down(); - } + kvm_timer_cpu_down(); + kvm_vgic_cpu_down();
if (!is_protected_kvm_enabled()) _kvm_arch_hardware_disable(NULL);
From: Miquel Raynal miquel.raynal@bootlin.com
commit 422dbc66b7702ae797326d5480c3c9b6467053da upstream.
Probably a copy/paste error with the previous block, here we are actually managing C2H IRQs.
Fixes: 17ce252266c7 ("dmaengine: xilinx: xdma: Add xilinx xdma driver") Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/r/20230731101442.792514-3-miquel.raynal@bootlin.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/xilinx/xdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/dma/xilinx/xdma.c +++ b/drivers/dma/xilinx/xdma.c @@ -717,7 +717,7 @@ static int xdma_irq_init(struct xdma_dev ret = request_irq(irq, xdma_channel_isr, 0, "xdma-c2h-channel", &xdev->c2h_chans[j]); if (ret) { - xdma_err(xdev, "H2C channel%d request irq%d failed: %d", + xdma_err(xdev, "C2H channel%d request irq%d failed: %d", j, irq, ret); goto failed_init_c2h; }
From: Minjie Du duminjie@vivo.com
commit a68b48afc050a9456ed4ed19d8755e0f925b44e6 upstream.
Fix: make IS_ERR() judge the devm_ioremap_resource() function return.
Fixes: 17ce252266c7 ("dmaengine: xilinx: xdma: Add xilinx xdma driver") Signed-off-by: Minjie Du duminjie@vivo.com Acked-by: Michal Simek michal.simek@amd.com Link: https://lore.kernel.org/r/20230705113912.16247-1-duminjie@vivo.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/xilinx/xdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/dma/xilinx/xdma.c +++ b/drivers/dma/xilinx/xdma.c @@ -894,7 +894,7 @@ static int xdma_probe(struct platform_de }
reg_base = devm_ioremap_resource(&pdev->dev, res); - if (!reg_base) { + if (IS_ERR(reg_base)) { xdma_err(xdev, "ioremap failed"); goto failed; }
From: Xu Kuohai xukuohai@huawei.com
commit 90f0074cd9f9a24b7b6c4d5afffa676aee48c0e9 upstream.
BPF CI has reported the following failure:
Error: #200/79 sockmap_listen/sockmap VSOCK test_vsock_redir Error: #200/79 sockmap_listen/sockmap VSOCK test_vsock_redir ./test_progs:vsock_unix_redir_connectible:1506: egress: write: Transport endpoint is not connected vsock_unix_redir_connectible:FAIL:1506 ./test_progs:vsock_unix_redir_connectible:1506: ingress: write: Transport endpoint is not connected vsock_unix_redir_connectible:FAIL:1506 ./test_progs:vsock_unix_redir_connectible:1506: egress: write: Transport endpoint is not connected vsock_unix_redir_connectible:FAIL:1506 ./test_progs:vsock_unix_redir_connectible:1514: ingress: recv() err, errno=11 vsock_unix_redir_connectible:FAIL:1514 ./test_progs:vsock_unix_redir_connectible:1518: ingress: vsock socket map failed, a != b vsock_unix_redir_connectible:FAIL:1518 ./test_progs:vsock_unix_redir_connectible:1525: ingress: want pass count 1, have 0
It’s because the recv(... MSG_DONTWAIT) syscall in the test case is called before the queued work sk_psock_backlog() in the kernel finishes executing. So the data to be read is still queued in psock->ingress_skb and cannot be read by the user program. Therefore, the non-blocking recv() reads nothing and reports an EAGAIN error.
So replace recv(... MSG_DONTWAIT) with xrecv_nonblock(), which calls select() to wait for data to be readable or timeout before calls recv().
Fixes: d61bd8c1fd02 ("selftests/bpf: add a test case for vsock sockmap") Signed-off-by: Xu Kuohai xukuohai@huawei.com Link: https://lore.kernel.org/r/20230804073740.194770-4-xukuohai@huaweicloud.com Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c index b4f6f3a50ae5..ba35bcc66e7e 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c @@ -1432,7 +1432,7 @@ static void vsock_unix_redir_connectible(int sock_mapfd, int verd_mapfd, if (n < 1) goto out;
- n = recv(mode == REDIR_INGRESS ? u0 : u1, &b, sizeof(b), MSG_DONTWAIT); + n = xrecv_nonblock(mode == REDIR_INGRESS ? u0 : u1, &b, sizeof(b), 0); if (n < 0) FAIL("%s: recv() err, errno=%d", log_prefix, errno); if (n == 0)
From: Mark Brown broonie@kernel.org
commit d5ad9aae13dcced333c1a7816ff0a4fbbb052466 upstream.
Commit 3bcbc20942db ("selftests/rseq: Play nice with binaries statically linked against glibc 2.35+") which is now in Linus' tree introduced uses of __weak but did nothing to ensure that a definition is provided for it resulting in build failures for the rseq tests:
rseq.c:41:1: error: unknown type name '__weak' __weak ptrdiff_t __rseq_offset; ^ rseq.c:41:17: error: expected ';' after top level declarator __weak ptrdiff_t __rseq_offset; ^ ; rseq.c:42:1: error: unknown type name '__weak' __weak unsigned int __rseq_size; ^ rseq.c:43:1: error: unknown type name '__weak' __weak unsigned int __rseq_flags;
Fix this by using the definition from tools/include compiler.h.
Fixes: 3bcbc20942db ("selftests/rseq: Play nice with binaries statically linked against glibc 2.35+") Signed-off-by: Mark Brown broonie@kernel.org Message-Id: 20230804-kselftest-rseq-build-v1-1-015830b66aa9@kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/rseq/Makefile | 4 +++- tools/testing/selftests/rseq/rseq.c | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-)
--- a/tools/testing/selftests/rseq/Makefile +++ b/tools/testing/selftests/rseq/Makefile @@ -4,8 +4,10 @@ ifneq ($(shell $(CC) --version 2>&1 | he CLANG_FLAGS += -no-integrated-as endif
+top_srcdir = ../../../.. + CFLAGS += -O2 -Wall -g -I./ $(KHDR_INCLUDES) -L$(OUTPUT) -Wl,-rpath=./ \ - $(CLANG_FLAGS) + $(CLANG_FLAGS) -I$(top_srcdir)/tools/include LDLIBS += -lpthread -ldl
# Own dependencies because we only want to build against 1st prerequisite, but --- a/tools/testing/selftests/rseq/rseq.c +++ b/tools/testing/selftests/rseq/rseq.c @@ -31,6 +31,8 @@ #include <sys/auxv.h> #include <linux/auxvec.h>
+#include <linux/compiler.h> + #include "../kselftest.h" #include "rseq.h"
From: Ido Schimmel idosch@nvidia.com
commit 66e131861ab7bf754b50813216f5c6885cd32d63 upstream.
A handful of tests require physical loopbacks to be used instead of veth pairs. Add a helper that these tests will invoke in order to be skipped when executed with veth pairs.
Fixes: 64916b57c0b1 ("selftests: forwarding: Add speed and auto-negotiation test") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20230808141503.4060661-7-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/forwarding/lib.sh | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/tools/testing/selftests/net/forwarding/lib.sh +++ b/tools/testing/selftests/net/forwarding/lib.sh @@ -164,6 +164,17 @@ check_port_mab_support() fi }
+skip_on_veth() +{ + local kind=$(ip -j -d link show dev ${NETIFS[p1]} | + jq -r '.[].linkinfo.info_kind') + + if [[ $kind == veth ]]; then + echo "SKIP: Test cannot be run with veth pairs" + exit $ksft_skip + fi +} + if [[ "$(id -u)" -ne 0 ]]; then echo "SKIP: need root privileges" exit $ksft_skip
From: Ido Schimmel idosch@nvidia.com
commit 60a36e21915c31c0375d9427be9406aa8ce2ec34 upstream.
Auto-negotiation cannot be tested with veth pairs, resulting in failures:
# ./ethtool.sh TEST: force of same speed autoneg off [FAIL] error in configuration. swp1 speed Not autoneg off [...]
Fix by skipping the test when used with veth pairs.
Fixes: 64916b57c0b1 ("selftests: forwarding: Add speed and auto-negotiation test") Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.uniz... Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20230808141503.4060661-8-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/forwarding/ethtool.sh | 2 ++ 1 file changed, 2 insertions(+)
--- a/tools/testing/selftests/net/forwarding/ethtool.sh +++ b/tools/testing/selftests/net/forwarding/ethtool.sh @@ -286,6 +286,8 @@ different_speeds_autoneg_on() ethtool -s $h1 autoneg on }
+skip_on_veth + trap cleanup EXIT
setup_prepare
From: Ido Schimmel idosch@nvidia.com
commit b3d9305e60d121dac20a77b6847c4cf14a4c0001 upstream.
Ethtool extended state cannot be tested with veth pairs, resulting in failures:
# ./ethtool_extended_state.sh TEST: Autoneg, No partner detected [FAIL] Expected "Autoneg", got "Link detected: no" [...]
Fix by skipping the test when used with veth pairs.
Fixes: 7d10bcce98cd ("selftests: forwarding: Add tests for ethtool extended state") Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.uniz... Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20230808141503.4060661-9-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/forwarding/ethtool_extended_state.sh | 2 ++ 1 file changed, 2 insertions(+)
--- a/tools/testing/selftests/net/forwarding/ethtool_extended_state.sh +++ b/tools/testing/selftests/net/forwarding/ethtool_extended_state.sh @@ -108,6 +108,8 @@ no_cable() ip link set dev $swp3 down }
+skip_on_veth + setup_prepare
tests_run
From: Ido Schimmel idosch@nvidia.com
commit 9a711cde07c245a163d95eee5b42ed1871e73236 upstream.
Layer 3 hardware stats cannot be used when the underlying interfaces are veth pairs, resulting in failures:
# ./hw_stats_l3_gre.sh TEST: ping gre flat [ OK ] TEST: Test rx packets: [FAIL] Traffic not reflected in the counter: 0 -> 0 TEST: Test tx packets: [FAIL] Traffic not reflected in the counter: 0 -> 0
Fix by skipping the test when used with veth pairs.
Fixes: 813f97a26860 ("selftests: forwarding: Add a tunnel-based test for L3 HW stats") Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.uniz... Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20230808141503.4060661-10-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/forwarding/hw_stats_l3_gre.sh | 2 ++ 1 file changed, 2 insertions(+)
--- a/tools/testing/selftests/net/forwarding/hw_stats_l3_gre.sh +++ b/tools/testing/selftests/net/forwarding/hw_stats_l3_gre.sh @@ -99,6 +99,8 @@ test_stats_rx() test_stats g2a rx }
+skip_on_veth + trap cleanup EXIT
setup_prepare
From: Ido Schimmel idosch@nvidia.com
commit d72c83b1e4b4a36a38269c77a85ff52f95eb0d08 upstream.
As explained in [1], the forwarding selftests are meant to be run with either physical loopbacks or veth pairs. The interfaces are expected to be specified in a user-provided forwarding.config file or as command line arguments. By default, this file is not present and the tests fail:
# make -C tools/testing/selftests TARGETS=net/forwarding run_tests [...] TAP version 13 1..102 # timeout set to 45 # selftests: net/forwarding: bridge_igmp.sh # Command line is not complete. Try option "help" # Failed to create netif not ok 1 selftests: net/forwarding: bridge_igmp.sh # exit=1 [...]
Fix by skipping a test if interfaces are not provided either via the configuration file or command line arguments.
# make -C tools/testing/selftests TARGETS=net/forwarding run_tests [...] TAP version 13 1..102 # timeout set to 45 # selftests: net/forwarding: bridge_igmp.sh # SKIP: Cannot create interface. Name not specified ok 1 selftests: net/forwarding: bridge_igmp.sh # SKIP
[1] tools/testing/selftests/net/forwarding/README
Fixes: 81573b18f26d ("selftests/net/forwarding: add Makefile to install tests") Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/856d454e-f83c-20cf-e166-6dc06cbc1543@alu.uniz... Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20230808141503.4060661-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/forwarding/lib.sh | 5 +++++ 1 file changed, 5 insertions(+)
--- a/tools/testing/selftests/net/forwarding/lib.sh +++ b/tools/testing/selftests/net/forwarding/lib.sh @@ -237,6 +237,11 @@ create_netif_veth() for ((i = 1; i <= NUM_NETIFS; ++i)); do local j=$((i+1))
+ if [ -z ${NETIFS[p$i]} ]; then + echo "SKIP: Cannot create interface. Name not specified" + exit $ksft_skip + fi + ip link show dev ${NETIFS[p$i]} &> /dev/null if [[ $? -ne 0 ]]; then ip link add ${NETIFS[p$i]} type veth \
From: Ido Schimmel idosch@nvidia.com
commit 0529883ad102f6c04e19fb7018f31e1bda575bbe upstream.
The default timeout for selftests is 45 seconds, but it is not enough for forwarding selftests which can takes minutes to finish depending on the number of tests cases:
# make -C tools/testing/selftests TARGETS=net/forwarding run_tests TAP version 13 1..102 # timeout set to 45 # selftests: net/forwarding: bridge_igmp.sh # TEST: IGMPv2 report 239.10.10.10 [ OK ] # TEST: IGMPv2 leave 239.10.10.10 [ OK ] # TEST: IGMPv3 report 239.10.10.10 is_include [ OK ] # TEST: IGMPv3 report 239.10.10.10 include -> allow [ OK ] # not ok 1 selftests: net/forwarding: bridge_igmp.sh # TIMEOUT 45 seconds
Fix by switching off the timeout and setting it to 0. A similar change was done for BPF selftests in commit 6fc5916cc256 ("selftests: bpf: Switch off timeout").
Fixes: 81573b18f26d ("selftests/net/forwarding: add Makefile to install tests") Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/8d149f8c-818e-d141-a0ce-a6bae606bc22@alu.uniz... Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20230808141503.4060661-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/forwarding/settings | 1 + 1 file changed, 1 insertion(+) create mode 100644 tools/testing/selftests/net/forwarding/settings
--- /dev/null +++ b/tools/testing/selftests/net/forwarding/settings @@ -0,0 +1 @@ +timeout=0
From: Ido Schimmel idosch@nvidia.com
commit 5e8670610b93158ffacc3241f835454ff26a3469 upstream.
The test relies on 'nc' being the netcat version from the nmap project. While this seems to be the case on Fedora, it is not the case on Ubuntu, resulting in failures such as [1].
Fix by explicitly using the 'ncat' utility from the nmap project and the skip the test in case it is not installed.
[1] # timeout set to 0 # selftests: net/forwarding: tc_actions.sh # TEST: gact drop and ok (skip_hw) [ OK ] # TEST: mirred egress flower redirect (skip_hw) [ OK ] # TEST: mirred egress flower mirror (skip_hw) [ OK ] # TEST: mirred egress matchall mirror (skip_hw) [ OK ] # TEST: mirred_egress_to_ingress (skip_hw) [ OK ] # nc: invalid option -- '-' # usage: nc [-46CDdFhklNnrStUuvZz] [-I length] [-i interval] [-M ttl] # [-m minttl] [-O length] [-P proxy_username] [-p source_port] # [-q seconds] [-s sourceaddr] [-T keyword] [-V rtable] [-W recvlimit] # [-w timeout] [-X proxy_protocol] [-x proxy_address[:port]] # [destination] [port] # nc: invalid option -- '-' # usage: nc [-46CDdFhklNnrStUuvZz] [-I length] [-i interval] [-M ttl] # [-m minttl] [-O length] [-P proxy_username] [-p source_port] # [-q seconds] [-s sourceaddr] [-T keyword] [-V rtable] [-W recvlimit] # [-w timeout] [-X proxy_protocol] [-x proxy_address[:port]] # [destination] [port] # TEST: mirred_egress_to_ingress_tcp (skip_hw) [FAIL] # server output check failed # INFO: Could not test offloaded functionality not ok 80 selftests: net/forwarding: tc_actions.sh # exit=1
Fixes: ca22da2fbd69 ("act_mirred: use the backlog for nested calls to mirred ingress") Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.uniz... Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20230808141503.4060661-12-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/forwarding/tc_actions.sh | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/tools/testing/selftests/net/forwarding/tc_actions.sh +++ b/tools/testing/selftests/net/forwarding/tc_actions.sh @@ -9,6 +9,8 @@ NUM_NETIFS=4 source tc_common.sh source lib.sh
+require_command ncat + tcflags="skip_hw"
h1_create() @@ -220,9 +222,9 @@ mirred_egress_to_ingress_tcp_test() ip_proto icmp \ action drop
- ip vrf exec v$h1 nc --recv-only -w10 -l -p 12345 -o $mirred_e2i_tf2 & + ip vrf exec v$h1 ncat --recv-only -w10 -l -p 12345 -o $mirred_e2i_tf2 & local rpid=$! - ip vrf exec v$h1 nc -w1 --send-only 192.0.2.2 12345 <$mirred_e2i_tf1 + ip vrf exec v$h1 ncat -w1 --send-only 192.0.2.2 12345 <$mirred_e2i_tf1 wait -n $rpid cmp -s $mirred_e2i_tf1 $mirred_e2i_tf2 check_err $? "server output check failed"
From: Ido Schimmel idosch@nvidia.com
commit 9ee37e53e7687654b487fc94e82569377272a7a8 upstream.
The test checks that filters that match on source or destination MAC were only hit once. A host can send more than one packet with a given source or destination MAC, resulting in failures.
Fix by relaxing the success criterion and instead check that the filters were not hit zero times. Using tc_check_at_least_x_packets() is also an option, but it is not available in older kernels.
Fixes: 07e5c75184a1 ("selftests: forwarding: Introduce tc flower matching tests") Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.uniz... Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20230808141503.4060661-13-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/forwarding/tc_flower.sh | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/tools/testing/selftests/net/forwarding/tc_flower.sh +++ b/tools/testing/selftests/net/forwarding/tc_flower.sh @@ -52,8 +52,8 @@ match_dst_mac_test() tc_check_packets "dev $h2 ingress" 101 1 check_fail $? "Matched on a wrong filter"
- tc_check_packets "dev $h2 ingress" 102 1 - check_err $? "Did not match on correct filter" + tc_check_packets "dev $h2 ingress" 102 0 + check_fail $? "Did not match on correct filter"
tc filter del dev $h2 ingress protocol ip pref 1 handle 101 flower tc filter del dev $h2 ingress protocol ip pref 2 handle 102 flower @@ -78,8 +78,8 @@ match_src_mac_test() tc_check_packets "dev $h2 ingress" 101 1 check_fail $? "Matched on a wrong filter"
- tc_check_packets "dev $h2 ingress" 102 1 - check_err $? "Did not match on correct filter" + tc_check_packets "dev $h2 ingress" 102 0 + check_fail $? "Did not match on correct filter"
tc filter del dev $h2 ingress protocol ip pref 1 handle 101 flower tc filter del dev $h2 ingress protocol ip pref 2 handle 102 flower
From: Ido Schimmel idosch@nvidia.com
commit cb034948ac292da82cc0e6bc1340f81be36e117d upstream.
As explained in commit 8bcfb4ae4d97 ("selftests: forwarding: Fix failing tests with old libnet"), old versions of libnet (used by mausezahn) do not use the "SO_BINDTODEVICE" socket option. For IP unicast packets, this can be solved by prefixing mausezahn invocations with "ip vrf exec". However, IP multicast packets do not perform routing and simply egress the bound device, which does not exist in this case.
Fix by specifying the source and destination MAC of the packet which will cause mausezahn to use a packet socket instead of an IP socket.
Fixes: 3446dcd7df05 ("selftests: forwarding: bridge_mdb_max: Add a new selftest") Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.uniz... Signed-off-by: Ido Schimmel idosch@nvidia.com Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20230808141503.4060661-17-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- .../selftests/net/forwarding/bridge_mdb_max.sh | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb_max.sh b/tools/testing/selftests/net/forwarding/bridge_mdb_max.sh index fa762b716288..3da9d93ab36f 100755 --- a/tools/testing/selftests/net/forwarding/bridge_mdb_max.sh +++ b/tools/testing/selftests/net/forwarding/bridge_mdb_max.sh @@ -252,7 +252,8 @@ ctl4_entries_add() local IPs=$(seq -f 192.0.2.%g 1 $((n - 1))) local peer=$(locus_dev_peer $locus) local GRP=239.1.1.${grp} - $MZ $peer -c 1 -A 192.0.2.1 -B $GRP \ + local dmac=01:00:5e:01:01:$(printf "%02x" $grp) + $MZ $peer -a own -b $dmac -c 1 -A 192.0.2.1 -B $GRP \ -t ip proto=2,p=$(igmpv3_is_in_get $GRP $IPs) -q sleep 1
@@ -272,7 +273,8 @@ ctl4_entries_del()
local peer=$(locus_dev_peer $locus) local GRP=239.1.1.${grp} - $MZ $peer -c 1 -A 192.0.2.1 -B 224.0.0.2 \ + local dmac=01:00:5e:00:00:02 + $MZ $peer -a own -b $dmac -c 1 -A 192.0.2.1 -B 224.0.0.2 \ -t ip proto=2,p=$(igmpv2_leave_get $GRP) -q sleep 1 ! bridge mdb show dev br0 | grep -q $GRP @@ -289,8 +291,10 @@ ctl6_entries_add() local peer=$(locus_dev_peer $locus) local SIP=fe80::1 local GRP=ff0e::${grp} + local dmac=33:33:00:00:00:$(printf "%02x" $grp) local p=$(mldv2_is_in_get $SIP $GRP $IPs) - $MZ -6 $peer -c 1 -A $SIP -B $GRP -t ip hop=1,next=0,p="$p" -q + $MZ -6 $peer -a own -b $dmac -c 1 -A $SIP -B $GRP \ + -t ip hop=1,next=0,p="$p" -q sleep 1
local nn=$(bridge mdb show dev br0 | grep $GRP | wc -l) @@ -310,8 +314,10 @@ ctl6_entries_del() local peer=$(locus_dev_peer $locus) local SIP=fe80::1 local GRP=ff0e::${grp} + local dmac=33:33:00:00:00:$(printf "%02x" $grp) local p=$(mldv1_done_get $SIP $GRP) - $MZ -6 $peer -c 1 -A $SIP -B $GRP -t ip hop=1,next=0,p="$p" -q + $MZ -6 $peer -a own -b $dmac -c 1 -A $SIP -B $GRP \ + -t ip hop=1,next=0,p="$p" -q sleep 1 ! bridge mdb show dev br0 | grep -q $GRP }
From: Ido Schimmel idosch@nvidia.com
commit e98e195d90cc93b1bf2ad762c7c274a40dab7173 upstream.
As explained in commit 8bcfb4ae4d97 ("selftests: forwarding: Fix failing tests with old libnet"), old versions of libnet (used by mausezahn) do not use the "SO_BINDTODEVICE" socket option. For IP unicast packets, this can be solved by prefixing mausezahn invocations with "ip vrf exec". However, IP multicast packets do not perform routing and simply egress the bound device, which does not exist in this case.
Fix by specifying the source and destination MAC of the packet which will cause mausezahn to use a packet socket instead of an IP socket.
Fixes: b6d00da08610 ("selftests: forwarding: Add bridge MDB test") Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.uniz... Signed-off-by: Ido Schimmel idosch@nvidia.com Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20230808141503.4060661-16-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- .../selftests/net/forwarding/bridge_mdb.sh | 46 ++++++++++--------- 1 file changed, 24 insertions(+), 22 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb.sh b/tools/testing/selftests/net/forwarding/bridge_mdb.sh index 6f830b5f03c9..4853b8e4f8d3 100755 --- a/tools/testing/selftests/net/forwarding/bridge_mdb.sh +++ b/tools/testing/selftests/net/forwarding/bridge_mdb.sh @@ -850,6 +850,7 @@ cfg_test() __fwd_test_host_ip() { local grp=$1; shift + local dmac=$1; shift local src=$1; shift local mode=$1; shift local name @@ -872,27 +873,27 @@ __fwd_test_host_ip() # Packet should only be flooded to multicast router ports when there is # no matching MDB entry. The bridge is not configured as a multicast # router port. - $MZ $mode $h1.10 -c 1 -p 128 -A $src -B $grp -t udp -q + $MZ $mode $h1.10 -a own -b $dmac -c 1 -p 128 -A $src -B $grp -t udp -q tc_check_packets "dev br0 ingress" 1 0 check_err $? "Packet locally received after flood"
# Install a regular port group entry and expect the packet to not be # locally received. bridge mdb add dev br0 port $swp2 grp $grp temp vid 10 - $MZ $mode $h1.10 -c 1 -p 128 -A $src -B $grp -t udp -q + $MZ $mode $h1.10 -a own -b $dmac -c 1 -p 128 -A $src -B $grp -t udp -q tc_check_packets "dev br0 ingress" 1 0 check_err $? "Packet locally received after installing a regular entry"
# Add a host entry and expect the packet to be locally received. bridge mdb add dev br0 port br0 grp $grp temp vid 10 - $MZ $mode $h1.10 -c 1 -p 128 -A $src -B $grp -t udp -q + $MZ $mode $h1.10 -a own -b $dmac -c 1 -p 128 -A $src -B $grp -t udp -q tc_check_packets "dev br0 ingress" 1 1 check_err $? "Packet not locally received after adding a host entry"
# Remove the host entry and expect the packet to not be locally # received. bridge mdb del dev br0 port br0 grp $grp vid 10 - $MZ $mode $h1.10 -c 1 -p 128 -A $src -B $grp -t udp -q + $MZ $mode $h1.10 -a own -b $dmac -c 1 -p 128 -A $src -B $grp -t udp -q tc_check_packets "dev br0 ingress" 1 1 check_err $? "Packet locally received after removing a host entry"
@@ -905,8 +906,8 @@ __fwd_test_host_ip()
fwd_test_host_ip() { - __fwd_test_host_ip "239.1.1.1" "192.0.2.1" "-4" - __fwd_test_host_ip "ff0e::1" "2001:db8:1::1" "-6" + __fwd_test_host_ip "239.1.1.1" "01:00:5e:01:01:01" "192.0.2.1" "-4" + __fwd_test_host_ip "ff0e::1" "33:33:00:00:00:01" "2001:db8:1::1" "-6" }
fwd_test_host_l2() @@ -966,6 +967,7 @@ fwd_test_host() __fwd_test_port_ip() { local grp=$1; shift + local dmac=$1; shift local valid_src=$1; shift local invalid_src=$1; shift local mode=$1; shift @@ -999,43 +1001,43 @@ __fwd_test_port_ip() vlan_ethtype $eth_type vlan_id 10 dst_ip $grp \ src_ip $invalid_src action drop
- $MZ $mode $h1.10 -c 1 -p 128 -A $valid_src -B $grp -t udp -q + $MZ $mode $h1.10 -a own -b $dmac -c 1 -p 128 -A $valid_src -B $grp -t udp -q tc_check_packets "dev $h2 ingress" 1 0 check_err $? "Packet from valid source received on H2 before adding entry"
- $MZ $mode $h1.10 -c 1 -p 128 -A $invalid_src -B $grp -t udp -q + $MZ $mode $h1.10 -a own -b $dmac -c 1 -p 128 -A $invalid_src -B $grp -t udp -q tc_check_packets "dev $h2 ingress" 2 0 check_err $? "Packet from invalid source received on H2 before adding entry"
bridge mdb add dev br0 port $swp2 grp $grp vid 10 \ filter_mode $filter_mode source_list $src_list
- $MZ $mode $h1.10 -c 1 -p 128 -A $valid_src -B $grp -t udp -q + $MZ $mode $h1.10 -a own -b $dmac -c 1 -p 128 -A $valid_src -B $grp -t udp -q tc_check_packets "dev $h2 ingress" 1 1 check_err $? "Packet from valid source not received on H2 after adding entry"
- $MZ $mode $h1.10 -c 1 -p 128 -A $invalid_src -B $grp -t udp -q + $MZ $mode $h1.10 -a own -b $dmac -c 1 -p 128 -A $invalid_src -B $grp -t udp -q tc_check_packets "dev $h2 ingress" 2 0 check_err $? "Packet from invalid source received on H2 after adding entry"
bridge mdb replace dev br0 port $swp2 grp $grp vid 10 \ filter_mode exclude
- $MZ $mode $h1.10 -c 1 -p 128 -A $valid_src -B $grp -t udp -q + $MZ $mode $h1.10 -a own -b $dmac -c 1 -p 128 -A $valid_src -B $grp -t udp -q tc_check_packets "dev $h2 ingress" 1 2 check_err $? "Packet from valid source not received on H2 after allowing all sources"
- $MZ $mode $h1.10 -c 1 -p 128 -A $invalid_src -B $grp -t udp -q + $MZ $mode $h1.10 -a own -b $dmac -c 1 -p 128 -A $invalid_src -B $grp -t udp -q tc_check_packets "dev $h2 ingress" 2 1 check_err $? "Packet from invalid source not received on H2 after allowing all sources"
bridge mdb del dev br0 port $swp2 grp $grp vid 10
- $MZ $mode $h1.10 -c 1 -p 128 -A $valid_src -B $grp -t udp -q + $MZ $mode $h1.10 -a own -b $dmac -c 1 -p 128 -A $valid_src -B $grp -t udp -q tc_check_packets "dev $h2 ingress" 1 2 check_err $? "Packet from valid source received on H2 after deleting entry"
- $MZ $mode $h1.10 -c 1 -p 128 -A $invalid_src -B $grp -t udp -q + $MZ $mode $h1.10 -a own -b $dmac -c 1 -p 128 -A $invalid_src -B $grp -t udp -q tc_check_packets "dev $h2 ingress" 2 1 check_err $? "Packet from invalid source received on H2 after deleting entry"
@@ -1047,11 +1049,11 @@ __fwd_test_port_ip()
fwd_test_port_ip() { - __fwd_test_port_ip "239.1.1.1" "192.0.2.1" "192.0.2.2" "-4" "exclude" - __fwd_test_port_ip "ff0e::1" "2001:db8:1::1" "2001:db8:1::2" "-6" \ + __fwd_test_port_ip "239.1.1.1" "01:00:5e:01:01:01" "192.0.2.1" "192.0.2.2" "-4" "exclude" + __fwd_test_port_ip "ff0e::1" "33:33:00:00:00:01" "2001:db8:1::1" "2001:db8:1::2" "-6" \ "exclude" - __fwd_test_port_ip "239.1.1.1" "192.0.2.1" "192.0.2.2" "-4" "include" - __fwd_test_port_ip "ff0e::1" "2001:db8:1::1" "2001:db8:1::2" "-6" \ + __fwd_test_port_ip "239.1.1.1" "01:00:5e:01:01:01" "192.0.2.1" "192.0.2.2" "-4" "include" + __fwd_test_port_ip "ff0e::1" "33:33:00:00:00:01" "2001:db8:1::1" "2001:db8:1::2" "-6" \ "include" }
@@ -1127,7 +1129,7 @@ ctrl_igmpv3_is_in_test() filter_mode include source_list 192.0.2.1
# IS_IN ( 192.0.2.2 ) - $MZ $h1.10 -c 1 -A 192.0.2.1 -B 239.1.1.1 \ + $MZ $h1.10 -c 1 -a own -b 01:00:5e:01:01:01 -A 192.0.2.1 -B 239.1.1.1 \ -t ip proto=2,p=$(igmpv3_is_in_get 239.1.1.1 192.0.2.2) -q
bridge -d mdb show dev br0 vid 10 | grep 239.1.1.1 | grep -q 192.0.2.2 @@ -1140,7 +1142,7 @@ ctrl_igmpv3_is_in_test() filter_mode include source_list 192.0.2.1
# IS_IN ( 192.0.2.2 ) - $MZ $h1.10 -c 1 -A 192.0.2.1 -B 239.1.1.1 \ + $MZ $h1.10 -a own -b 01:00:5e:01:01:01 -c 1 -A 192.0.2.1 -B 239.1.1.1 \ -t ip proto=2,p=$(igmpv3_is_in_get 239.1.1.1 192.0.2.2) -q
bridge -d mdb show dev br0 vid 10 | grep 239.1.1.1 | grep -v "src" | \ @@ -1167,7 +1169,7 @@ ctrl_mldv2_is_in_test()
# IS_IN ( 2001:db8:1::2 ) local p=$(mldv2_is_in_get fe80::1 ff0e::1 2001:db8:1::2) - $MZ -6 $h1.10 -c 1 -A fe80::1 -B ff0e::1 \ + $MZ -6 $h1.10 -a own -b 33:33:00:00:00:01 -c 1 -A fe80::1 -B ff0e::1 \ -t ip hop=1,next=0,p="$p" -q
bridge -d mdb show dev br0 vid 10 | grep ff0e::1 | \ @@ -1181,7 +1183,7 @@ ctrl_mldv2_is_in_test() filter_mode include source_list 2001:db8:1::1
# IS_IN ( 2001:db8:1::2 ) - $MZ -6 $h1.10 -c 1 -A fe80::1 -B ff0e::1 \ + $MZ -6 $h1.10 -a own -b 33:33:00:00:00:01 -c 1 -A fe80::1 -B ff0e::1 \ -t ip hop=1,next=0,p="$p" -q
bridge -d mdb show dev br0 vid 10 | grep ff0e::1 | grep -v "src" | \
From: Ido Schimmel idosch@nvidia.com
commit 8b5ff37097775cdbd447442603957066dd2e4d02 upstream.
Some test cases check that the group timer is (or isn't) 0. Instead of grepping for "0.00" grep for " 0.00" as the former can also match "260.00" which is the default group membership interval.
Fixes: b6d00da08610 ("selftests: forwarding: Add bridge MDB test") Reported-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.uniz... Signed-off-by: Ido Schimmel idosch@nvidia.com Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20230808141503.4060661-18-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/forwarding/bridge_mdb.sh | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/tools/testing/selftests/net/forwarding/bridge_mdb.sh +++ b/tools/testing/selftests/net/forwarding/bridge_mdb.sh @@ -617,7 +617,7 @@ __cfg_test_port_ip_sg() grep -q "permanent" check_err $? "Entry not added as "permanent" when should" bridge -d -s mdb show dev br0 vid 10 | grep "$grp_key" | \ - grep -q "0.00" + grep -q " 0.00" check_err $? ""permanent" entry has a pending group timer" bridge mdb del dev br0 port $swp1 $grp_key vid 10
@@ -626,7 +626,7 @@ __cfg_test_port_ip_sg() grep -q "temp" check_err $? "Entry not added as "temp" when should" bridge -d -s mdb show dev br0 vid 10 | grep "$grp_key" | \ - grep -q "0.00" + grep -q " 0.00" check_fail $? ""temp" entry has an unpending group timer" bridge mdb del dev br0 port $swp1 $grp_key vid 10
@@ -659,7 +659,7 @@ __cfg_test_port_ip_sg() grep -q "permanent" check_err $? "Entry not marked as "permanent" after replace" bridge -d -s mdb show dev br0 vid 10 | grep "$grp_key" | \ - grep -q "0.00" + grep -q " 0.00" check_err $? "Entry has a pending group timer after replace"
bridge mdb replace dev br0 port $swp1 $grp_key vid 10 temp @@ -667,7 +667,7 @@ __cfg_test_port_ip_sg() grep -q "temp" check_err $? "Entry not marked as "temp" after replace" bridge -d -s mdb show dev br0 vid 10 | grep "$grp_key" | \ - grep -q "0.00" + grep -q " 0.00" check_fail $? "Entry has an unpending group timer after replace" bridge mdb del dev br0 port $swp1 $grp_key vid 10
From: Andrew Kanner andrew.kanner@gmail.com
commit d14eea09edf427fa36bd446f4a3271f99164202f upstream.
Syzkaller reported the following issue: ======================================= Too BIG xdp->frame_sz = 131072 WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121 ____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline] WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121 bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103 ... Call Trace: <TASK> bpf_prog_4add87e5301a4105+0x1a/0x1c __bpf_prog_run include/linux/filter.h:600 [inline] bpf_prog_run_xdp include/linux/filter.h:775 [inline] bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721 netif_receive_generic_xdp net/core/dev.c:4807 [inline] do_xdp_generic+0x35c/0x770 net/core/dev.c:4866 tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919 tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043 call_write_iter include/linux/fs.h:1871 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x650/0xe40 fs/read_write.c:584 ksys_write+0x12f/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
xdp->frame_sz > PAGE_SIZE check was introduced in commit c8741e2bfe87 ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size"). But Jesper Dangaard Brouer jbrouer@redhat.com noted that after introducing the xdp_init_buff() which all XDP driver use - it's safe to remove this check. The original intend was to catch cases where XDP drivers have not been updated to use xdp.frame_sz, but that is not longer a concern (since xdp_init_buff).
Running the initial syzkaller repro it was discovered that the contiguous physical memory allocation is used for both xdp paths in tun_get_user(), e.g. tun_build_skb() and tun_alloc_skb(). It was also stated by Jesper Dangaard Brouer jbrouer@redhat.com that XDP can work on higher order pages, as long as this is contiguous physical memory (e.g. a page).
Reported-and-tested-by: syzbot+f817490f5bd20541b90a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000774b9205f1d8a80d@google.com/T/ Link: https://syzkaller.appspot.com/bug?extid=f817490f5bd20541b90a Link: https://lore.kernel.org/all/20230725155403.796-1-andrew.kanner@gmail.com/T/ Fixes: 43b5169d8355 ("net, xdp: Introduce xdp_init_buff utility routine") Signed-off-by: Andrew Kanner andrew.kanner@gmail.com Acked-by: Jesper Dangaard Brouer hawk@kernel.org Acked-by: Jason Wang jasowang@redhat.com Link: https://lore.kernel.org/r/20230803190316.2380231-1-andrew.kanner@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/filter.c | 6 ------ 1 file changed, 6 deletions(-)
--- a/net/core/filter.c +++ b/net/core/filter.c @@ -4115,12 +4115,6 @@ BPF_CALL_2(bpf_xdp_adjust_tail, struct x if (unlikely(data_end > data_hard_end)) return -EINVAL;
- /* ALL drivers MUST init xdp->frame_sz, chicken check below */ - if (unlikely(xdp->frame_sz > PAGE_SIZE)) { - WARN_ONCE(1, "Too BIG xdp->frame_sz = %d\n", xdp->frame_sz); - return -EINVAL; - } - if (unlikely(data_end < xdp->data + ETH_HLEN)) return -EINVAL;
From: Xu Kuohai xukuohai@huawei.com
commit 7e96ec0e6605b69bb21bbf6c0ff9051e656ec2b1 upstream.
sock_map_del_link() operates on both SOCKMAP and SOCKHASH, although both types have member named "progs", the offset of "progs" member in these two types is different, so "progs" should be accessed with the real map type.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Xu Kuohai xukuohai@huawei.com Reviewed-by: John Fastabend john.fastabend@gmail.com Link: https://lore.kernel.org/r/20230804073740.194770-2-xukuohai@huaweicloud.com Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/sock_map.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -146,13 +146,13 @@ static void sock_map_del_link(struct soc list_for_each_entry_safe(link, tmp, &psock->link, list) { if (link->link_raw == link_raw) { struct bpf_map *map = link->map; - struct bpf_stab *stab = container_of(map, struct bpf_stab, - map); - if (psock->saved_data_ready && stab->progs.stream_parser) + struct sk_psock_progs *progs = sock_map_progs(map); + + if (psock->saved_data_ready && progs->stream_parser) strp_stop = true; - if (psock->saved_data_ready && stab->progs.stream_verdict) + if (psock->saved_data_ready && progs->stream_verdict) verdict_stop = true; - if (psock->saved_data_ready && stab->progs.skb_verdict) + if (psock->saved_data_ready && progs->skb_verdict) verdict_stop = true; list_del(&link->list); sk_psock_free_link(link);
From: Xu Kuohai xukuohai@huawei.com
commit 809e4dc71a0f2b8d2836035d98603694fff11d5d upstream.
strp_done is only called when psock->progs.stream_parser is not NULL, but stream_parser was set to NULL by sk_psock_stop_strp(), called by sk_psock_drop() earlier. So, strp_done can never be called.
Introduce SK_PSOCK_RX_ENABLED to mark whether there is strp on psock. Change the condition for calling strp_done from judging whether stream_parser is set to judging whether this flag is set. This flag is only set once when strp_init() succeeds, and will never be cleared later.
Fixes: c0d95d3380ee ("bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap") Signed-off-by: Xu Kuohai xukuohai@huawei.com Reviewed-by: John Fastabend john.fastabend@gmail.com Link: https://lore.kernel.org/r/20230804073740.194770-3-xukuohai@huaweicloud.com Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/skmsg.h | 1 + net/core/skmsg.c | 10 ++++++++-- 2 files changed, 9 insertions(+), 2 deletions(-)
--- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -62,6 +62,7 @@ struct sk_psock_progs {
enum sk_psock_state_bits { SK_PSOCK_TX_ENABLED, + SK_PSOCK_RX_STRP_ENABLED, };
struct sk_psock_link { --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -1120,13 +1120,19 @@ static void sk_psock_strp_data_ready(str
int sk_psock_init_strp(struct sock *sk, struct sk_psock *psock) { + int ret; + static const struct strp_callbacks cb = { .rcv_msg = sk_psock_strp_read, .read_sock_done = sk_psock_strp_read_done, .parse_msg = sk_psock_strp_parse, };
- return strp_init(&psock->strp, sk, &cb); + ret = strp_init(&psock->strp, sk, &cb); + if (!ret) + sk_psock_set_state(psock, SK_PSOCK_RX_STRP_ENABLED); + + return ret; }
void sk_psock_start_strp(struct sock *sk, struct sk_psock *psock) @@ -1154,7 +1160,7 @@ void sk_psock_stop_strp(struct sock *sk, static void sk_psock_done_strp(struct sk_psock *psock) { /* Parser has been stopped */ - if (psock->progs.stream_parser) + if (sk_psock_test_state(psock, SK_PSOCK_RX_STRP_ENABLED)) strp_done(&psock->strp); } #else
From: Aleksa Savic savicaleksa83@gmail.com
commit 56b930dcd88c2adc261410501c402c790980bdb5 upstream.
Add a 200ms delay after sending a ctrl report to Quadro, Octo, D5 Next and Aquaero to give them enough time to process the request and save the data to memory. Otherwise, under heavier userspace loads where multiple sysfs entries are usually set in quick succession, a new ctrl report could be requested from the device while it's still processing the previous one and fail with -EPIPE. The delay is only applied if two ctrl report operations are near each other in time.
Reported by a user on Github [1] and tested by both of us.
[1] https://github.com/aleksamagicka/aquacomputer_d5next-hwmon/issues/82
Fixes: 752b927951ea ("hwmon: (aquacomputer_d5next) Add support for Aquacomputer Octo") Signed-off-by: Aleksa Savic savicaleksa83@gmail.com Link: https://lore.kernel.org/r/20230807172004.456968-1-savicaleksa83@gmail.com Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hwmon/aquacomputer_d5next.c | 37 +++++++++++++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-)
--- a/drivers/hwmon/aquacomputer_d5next.c +++ b/drivers/hwmon/aquacomputer_d5next.c @@ -13,9 +13,11 @@
#include <linux/crc16.h> #include <linux/debugfs.h> +#include <linux/delay.h> #include <linux/hid.h> #include <linux/hwmon.h> #include <linux/jiffies.h> +#include <linux/ktime.h> #include <linux/module.h> #include <linux/mutex.h> #include <linux/seq_file.h> @@ -61,6 +63,8 @@ static const char *const aqc_device_name #define CTRL_REPORT_ID 0x03 #define AQUAERO_CTRL_REPORT_ID 0x0b
+#define CTRL_REPORT_DELAY 200 /* ms */ + /* The HID report that the official software always sends * after writing values, currently same for all devices */ @@ -496,6 +500,9 @@ struct aqc_data { int secondary_ctrl_report_size; u8 *secondary_ctrl_report;
+ ktime_t last_ctrl_report_op; + int ctrl_report_delay; /* Delay between two ctrl report operations, in ms */ + int buffer_size; u8 *buffer; int checksum_start; @@ -577,17 +584,35 @@ static int aqc_aquastreamxt_convert_fan_ return 0; }
+static void aqc_delay_ctrl_report(struct aqc_data *priv) +{ + /* + * If previous read or write is too close to this one, delay the current operation + * to give the device enough time to process the previous one. + */ + if (priv->ctrl_report_delay) { + s64 delta = ktime_ms_delta(ktime_get(), priv->last_ctrl_report_op); + + if (delta < priv->ctrl_report_delay) + msleep(priv->ctrl_report_delay - delta); + } +} + /* Expects the mutex to be locked */ static int aqc_get_ctrl_data(struct aqc_data *priv) { int ret;
+ aqc_delay_ctrl_report(priv); + memset(priv->buffer, 0x00, priv->buffer_size); ret = hid_hw_raw_request(priv->hdev, priv->ctrl_report_id, priv->buffer, priv->buffer_size, HID_FEATURE_REPORT, HID_REQ_GET_REPORT); if (ret < 0) ret = -ENODATA;
+ priv->last_ctrl_report_op = ktime_get(); + return ret; }
@@ -597,6 +622,8 @@ static int aqc_send_ctrl_data(struct aqc int ret; u16 checksum;
+ aqc_delay_ctrl_report(priv); + /* Checksum is not needed for Aquaero */ if (priv->kind != aquaero) { /* Init and xorout value for CRC-16/USB is 0xffff */ @@ -612,12 +639,16 @@ static int aqc_send_ctrl_data(struct aqc ret = hid_hw_raw_request(priv->hdev, priv->ctrl_report_id, priv->buffer, priv->buffer_size, HID_FEATURE_REPORT, HID_REQ_SET_REPORT); if (ret < 0) - return ret; + goto record_access_and_ret;
/* The official software sends this report after every change, so do it here as well */ ret = hid_hw_raw_request(priv->hdev, priv->secondary_ctrl_report_id, priv->secondary_ctrl_report, priv->secondary_ctrl_report_size, HID_FEATURE_REPORT, HID_REQ_SET_REPORT); + +record_access_and_ret: + priv->last_ctrl_report_op = ktime_get(); + return ret; }
@@ -1443,6 +1474,7 @@ static int aqc_probe(struct hid_device *
priv->buffer_size = AQUAERO_CTRL_REPORT_SIZE; priv->temp_ctrl_offset = AQUAERO_TEMP_CTRL_OFFSET; + priv->ctrl_report_delay = CTRL_REPORT_DELAY;
priv->temp_label = label_temp_sensors; priv->virtual_temp_label = label_virtual_temp_sensors; @@ -1466,6 +1498,7 @@ static int aqc_probe(struct hid_device * priv->temp_ctrl_offset = D5NEXT_TEMP_CTRL_OFFSET;
priv->buffer_size = D5NEXT_CTRL_REPORT_SIZE; + priv->ctrl_report_delay = CTRL_REPORT_DELAY;
priv->power_cycle_count_offset = D5NEXT_POWER_CYCLES;
@@ -1516,6 +1549,7 @@ static int aqc_probe(struct hid_device * priv->temp_ctrl_offset = OCTO_TEMP_CTRL_OFFSET;
priv->buffer_size = OCTO_CTRL_REPORT_SIZE; + priv->ctrl_report_delay = CTRL_REPORT_DELAY;
priv->power_cycle_count_offset = OCTO_POWER_CYCLES;
@@ -1543,6 +1577,7 @@ static int aqc_probe(struct hid_device * priv->temp_ctrl_offset = QUADRO_TEMP_CTRL_OFFSET;
priv->buffer_size = QUADRO_CTRL_REPORT_SIZE; + priv->ctrl_report_delay = CTRL_REPORT_DELAY;
priv->flow_pulses_ctrl_offset = QUADRO_FLOW_PULSES_CTRL_OFFSET; priv->power_cycle_count_offset = QUADRO_POWER_CYCLES;
From: Nathan Chancellor nathan@kernel.org
commit 1696ec8654016dad3b1baf6c024303e584400453 upstream.
When booting a kernel with CONFIG_MISDN_DSP=y and CONFIG_CFI_CLANG=y, there is a failure when dsp_cmx_send() is called indirectly from call_timer_fn():
[ 0.371412] CFI failure at call_timer_fn+0x2f/0x150 (target: dsp_cmx_send+0x0/0x530; expected type: 0x92ada1e9)
The function pointer prototype that call_timer_fn() expects is
void (*fn)(struct timer_list *)
whereas dsp_cmx_send() has a parameter type of 'void *', which causes the control flow integrity checks to fail because the parameter types do not match.
Change dsp_cmx_send()'s parameter type to be 'struct timer_list' to match the expected prototype. The argument is unused anyways, so this has no functional change, aside from avoiding the CFI failure.
Reported-by: kernel test robot oliver.sang@intel.com Closes: https://lore.kernel.org/oe-lkp/202308020936.58787e6c-oliver.sang@intel.com Signed-off-by: Nathan Chancellor nathan@kernel.org Reviewed-by: Sami Tolvanen samitolvanen@google.com Reviewed-by: Kees Cook keescook@chromium.org Fixes: e313ac12eb13 ("mISDN: Convert timers to use timer_setup()") Link: https://lore.kernel.org/r/20230802-fix-dsp_cmx_send-cfi-failure-v1-1-2f2e79b... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/isdn/mISDN/dsp.h | 2 +- drivers/isdn/mISDN/dsp_cmx.c | 2 +- drivers/isdn/mISDN/dsp_core.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/isdn/mISDN/dsp.h +++ b/drivers/isdn/mISDN/dsp.h @@ -247,7 +247,7 @@ extern void dsp_cmx_hardware(struct dsp_ extern int dsp_cmx_conf(struct dsp *dsp, u32 conf_id); extern void dsp_cmx_receive(struct dsp *dsp, struct sk_buff *skb); extern void dsp_cmx_hdlc(struct dsp *dsp, struct sk_buff *skb); -extern void dsp_cmx_send(void *arg); +extern void dsp_cmx_send(struct timer_list *arg); extern void dsp_cmx_transmit(struct dsp *dsp, struct sk_buff *skb); extern int dsp_cmx_del_conf_member(struct dsp *dsp); extern int dsp_cmx_del_conf(struct dsp_conf *conf); --- a/drivers/isdn/mISDN/dsp_cmx.c +++ b/drivers/isdn/mISDN/dsp_cmx.c @@ -1614,7 +1614,7 @@ static u16 dsp_count; /* last sample cou static int dsp_count_valid; /* if we have last sample count */
void -dsp_cmx_send(void *arg) +dsp_cmx_send(struct timer_list *arg) { struct dsp_conf *conf; struct dsp_conf_member *member; --- a/drivers/isdn/mISDN/dsp_core.c +++ b/drivers/isdn/mISDN/dsp_core.c @@ -1195,7 +1195,7 @@ static int __init dsp_init(void) }
/* set sample timer */ - timer_setup(&dsp_spl_tl, (void *)dsp_cmx_send, 0); + timer_setup(&dsp_spl_tl, dsp_cmx_send, 0); dsp_spl_tl.expires = jiffies + dsp_tics; dsp_spl_jiffies = dsp_spl_tl.expires; add_timer(&dsp_spl_tl);
From: Eric Dumazet edumazet@google.com
commit 32d0a49d36a2a306c2e47fe5659361e424f0ed3f upstream.
syzbot/KCSAN reported data-races in macsec whenever dev->stats fields are updated.
It appears all of these updates can happen from multiple cpus.
Adopt SMP safe DEV_STATS_INC() to update dev->stats fields.
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Cc: Sabrina Dubroca sd@queasysnail.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/macsec.c | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-)
--- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -743,7 +743,7 @@ static bool macsec_post_decrypt(struct s u64_stats_update_begin(&rxsc_stats->syncp); rxsc_stats->stats.InPktsLate++; u64_stats_update_end(&rxsc_stats->syncp); - secy->netdev->stats.rx_dropped++; + DEV_STATS_INC(secy->netdev, rx_dropped); return false; }
@@ -767,7 +767,7 @@ static bool macsec_post_decrypt(struct s rxsc_stats->stats.InPktsNotValid++; u64_stats_update_end(&rxsc_stats->syncp); this_cpu_inc(rx_sa->stats->InPktsNotValid); - secy->netdev->stats.rx_errors++; + DEV_STATS_INC(secy->netdev, rx_errors); return false; }
@@ -1069,7 +1069,7 @@ static enum rx_handler_result handle_not u64_stats_update_begin(&secy_stats->syncp); secy_stats->stats.InPktsNoTag++; u64_stats_update_end(&secy_stats->syncp); - macsec->secy.netdev->stats.rx_dropped++; + DEV_STATS_INC(macsec->secy.netdev, rx_dropped); continue; }
@@ -1179,7 +1179,7 @@ static rx_handler_result_t macsec_handle u64_stats_update_begin(&secy_stats->syncp); secy_stats->stats.InPktsBadTag++; u64_stats_update_end(&secy_stats->syncp); - secy->netdev->stats.rx_errors++; + DEV_STATS_INC(secy->netdev, rx_errors); goto drop_nosa; }
@@ -1196,7 +1196,7 @@ static rx_handler_result_t macsec_handle u64_stats_update_begin(&rxsc_stats->syncp); rxsc_stats->stats.InPktsNotUsingSA++; u64_stats_update_end(&rxsc_stats->syncp); - secy->netdev->stats.rx_errors++; + DEV_STATS_INC(secy->netdev, rx_errors); if (active_rx_sa) this_cpu_inc(active_rx_sa->stats->InPktsNotUsingSA); goto drop_nosa; @@ -1230,7 +1230,7 @@ static rx_handler_result_t macsec_handle u64_stats_update_begin(&rxsc_stats->syncp); rxsc_stats->stats.InPktsLate++; u64_stats_update_end(&rxsc_stats->syncp); - macsec->secy.netdev->stats.rx_dropped++; + DEV_STATS_INC(macsec->secy.netdev, rx_dropped); goto drop; } } @@ -1271,7 +1271,7 @@ deliver: if (ret == NET_RX_SUCCESS) count_rx(dev, len); else - macsec->secy.netdev->stats.rx_dropped++; + DEV_STATS_INC(macsec->secy.netdev, rx_dropped);
rcu_read_unlock();
@@ -1308,7 +1308,7 @@ nosci: u64_stats_update_begin(&secy_stats->syncp); secy_stats->stats.InPktsNoSCI++; u64_stats_update_end(&secy_stats->syncp); - macsec->secy.netdev->stats.rx_errors++; + DEV_STATS_INC(macsec->secy.netdev, rx_errors); continue; }
@@ -1327,7 +1327,7 @@ nosci: secy_stats->stats.InPktsUnknownSCI++; u64_stats_update_end(&secy_stats->syncp); } else { - macsec->secy.netdev->stats.rx_dropped++; + DEV_STATS_INC(macsec->secy.netdev, rx_dropped); } }
@@ -3422,7 +3422,7 @@ static netdev_tx_t macsec_start_xmit(str
if (!secy->operational) { kfree_skb(skb); - dev->stats.tx_dropped++; + DEV_STATS_INC(dev, tx_dropped); return NETDEV_TX_OK; }
@@ -3430,7 +3430,7 @@ static netdev_tx_t macsec_start_xmit(str skb = macsec_encrypt(skb, dev); if (IS_ERR(skb)) { if (PTR_ERR(skb) != -EINPROGRESS) - dev->stats.tx_dropped++; + DEV_STATS_INC(dev, tx_dropped); return NETDEV_TX_OK; }
@@ -3667,9 +3667,9 @@ static void macsec_get_stats64(struct ne
dev_fetch_sw_netstats(s, dev->tstats);
- s->rx_dropped = dev->stats.rx_dropped; - s->tx_dropped = dev->stats.tx_dropped; - s->rx_errors = dev->stats.rx_errors; + s->rx_dropped = atomic_long_read(&dev->stats.__rx_dropped); + s->tx_dropped = atomic_long_read(&dev->stats.__tx_dropped); + s->rx_errors = atomic_long_read(&dev->stats.__rx_errors); }
static int macsec_get_iflink(const struct net_device *dev)
From: Xiang Yang xiangyang3@huawei.com
commit 17ebf8a4c38b5481c29623f5e003fdf7583947f9 upstream.
Coccicheck reports the error below: net/mptcp/protocol.c:3330:15-28: ERROR: test of a variable/field address
Since the address of msk->cb_flags is used in __test_and_clear_bit, the address should not be NULL. The judgment for if (unlikely(msk->cb_flags)) will always be true, we should check the real value of msk->cb_flags here.
Fixes: 65a569b03ca8 ("mptcp: optimize release_cb for the common case") Signed-off-by: Xiang Yang xiangyang3@huawei.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Link: https://lore.kernel.org/r/20230803072438.1847500-1-xiangyang3@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -3321,7 +3321,7 @@ static void mptcp_release_cb(struct sock
if (__test_and_clear_bit(MPTCP_CLEAN_UNA, &msk->cb_flags)) __mptcp_clean_una_wakeup(sk); - if (unlikely(&msk->cb_flags)) { + if (unlikely(msk->cb_flags)) { /* be sure to set the current sk state before tacking actions * depending on sk_state, that is processing MPTCP_ERROR_REPORT */
From: Muhammad Husaini Zulkifli muhammad.husaini.zulkifli@intel.com
commit 06b412589eef780b792e73df131d35dc43cc4a49 upstream.
Access to shared variables through hrtimer requires locking in order to protect the variables because actions to write into these variables (oper_gate_closed, admin_gate_closed, and qbv_transition) might potentially occur simultaneously. This patch provides a locking mechanisms to avoid such scenarios.
Fixes: 175c241288c0 ("igc: Fix TX Hang issue when QBV Gate is closed") Suggested-by: Leon Romanovsky leon@kernel.org Signed-off-by: Muhammad Husaini Zulkifli muhammad.husaini.zulkifli@intel.com Tested-by: Naama Meir naamax.meir@linux.intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Link: https://lore.kernel.org/r/20230807205129.3129346-1-anthony.l.nguyen@intel.co... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/igc/igc.h | 4 +++ drivers/net/ethernet/intel/igc/igc_main.c | 34 ++++++++++++++++++++++++++++-- 2 files changed, 36 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/intel/igc/igc.h +++ b/drivers/net/ethernet/intel/igc/igc.h @@ -195,6 +195,10 @@ struct igc_adapter { u32 qbv_config_change_errors; bool qbv_transition; unsigned int qbv_count; + /* Access to oper_gate_closed, admin_gate_closed and qbv_transition + * are protected by the qbv_tx_lock. + */ + spinlock_t qbv_tx_lock;
/* OS defined structs */ struct pci_dev *pdev; --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -4799,6 +4799,7 @@ static int igc_sw_init(struct igc_adapte adapter->nfc_rule_count = 0;
spin_lock_init(&adapter->stats64_lock); + spin_lock_init(&adapter->qbv_tx_lock); /* Assume MSI-X interrupts, will be checked during IRQ allocation */ adapter->flags |= IGC_FLAG_HAS_MSIX;
@@ -6117,15 +6118,15 @@ static int igc_tsn_enable_launchtime(str return igc_tsn_offload_apply(adapter); }
-static int igc_tsn_clear_schedule(struct igc_adapter *adapter) +static int igc_qbv_clear_schedule(struct igc_adapter *adapter) { + unsigned long flags; int i;
adapter->base_time = 0; adapter->cycle_time = NSEC_PER_SEC; adapter->taprio_offload_enable = false; adapter->qbv_config_change_errors = 0; - adapter->qbv_transition = false; adapter->qbv_count = 0;
for (i = 0; i < adapter->num_tx_queues; i++) { @@ -6134,10 +6135,28 @@ static int igc_tsn_clear_schedule(struct ring->start_time = 0; ring->end_time = NSEC_PER_SEC; ring->max_sdu = 0; + } + + spin_lock_irqsave(&adapter->qbv_tx_lock, flags); + + adapter->qbv_transition = false; + + for (i = 0; i < adapter->num_tx_queues; i++) { + struct igc_ring *ring = adapter->tx_ring[i]; + ring->oper_gate_closed = false; ring->admin_gate_closed = false; }
+ spin_unlock_irqrestore(&adapter->qbv_tx_lock, flags); + + return 0; +} + +static int igc_tsn_clear_schedule(struct igc_adapter *adapter) +{ + igc_qbv_clear_schedule(adapter); + return 0; }
@@ -6148,6 +6167,7 @@ static int igc_save_qbv_schedule(struct struct igc_hw *hw = &adapter->hw; u32 start_time = 0, end_time = 0; struct timespec64 now; + unsigned long flags; size_t n; int i;
@@ -6215,6 +6235,8 @@ static int igc_save_qbv_schedule(struct start_time += e->interval; }
+ spin_lock_irqsave(&adapter->qbv_tx_lock, flags); + /* Check whether a queue gets configured. * If not, set the start and end time to be end time. */ @@ -6239,6 +6261,8 @@ static int igc_save_qbv_schedule(struct } }
+ spin_unlock_irqrestore(&adapter->qbv_tx_lock, flags); + for (i = 0; i < adapter->num_tx_queues; i++) { struct igc_ring *ring = adapter->tx_ring[i]; struct net_device *dev = adapter->netdev; @@ -6603,8 +6627,11 @@ static enum hrtimer_restart igc_qbv_sche { struct igc_adapter *adapter = container_of(timer, struct igc_adapter, hrtimer); + unsigned long flags; unsigned int i;
+ spin_lock_irqsave(&adapter->qbv_tx_lock, flags); + adapter->qbv_transition = true; for (i = 0; i < adapter->num_tx_queues; i++) { struct igc_ring *tx_ring = adapter->tx_ring[i]; @@ -6617,6 +6644,9 @@ static enum hrtimer_restart igc_qbv_sche } } adapter->qbv_transition = false; + + spin_unlock_irqrestore(&adapter->qbv_tx_lock, flags); + return HRTIMER_NORESTART; }
From: Nitya Sunkad nitya.sunkad@amd.com
commit 52417a95ff2d810dc31a68ae71102e741efea772 upstream.
ionic_start_queues_reconfig returns an error code if txrx_init fails. Handle this error code in the relevant places.
This fixes a corner case where the device could get left in a detached state if the CMB reconfig fails and the attempt to clean up the mess also fails. Note that calling netif_device_attach when the netdev is already attached does not lead to unexpected behavior.
Change goto name "errout" to "err_out" to maintain consistency across goto statements.
Fixes: 40bc471dc714 ("ionic: add tx/rx-push support with device Component Memory Buffers") Fixes: 6f7d6f0fd7a3 ("ionic: pull reset_queues into tx_timeout handler") Signed-off-by: Nitya Sunkad nitya.sunkad@amd.com Signed-off-by: Shannon Nelson shannon.nelson@amd.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/pensando/ionic/ionic_lif.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-)
--- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c @@ -1816,6 +1816,7 @@ static int ionic_change_mtu(struct net_d static void ionic_tx_timeout_work(struct work_struct *ws) { struct ionic_lif *lif = container_of(ws, struct ionic_lif, tx_timeout_work); + int err;
if (test_bit(IONIC_LIF_F_FW_RESET, lif->state)) return; @@ -1828,8 +1829,11 @@ static void ionic_tx_timeout_work(struct
mutex_lock(&lif->queue_lock); ionic_stop_queues_reconfig(lif); - ionic_start_queues_reconfig(lif); + err = ionic_start_queues_reconfig(lif); mutex_unlock(&lif->queue_lock); + + if (err) + dev_err(lif->ionic->dev, "%s: Restarting queues failed\n", __func__); }
static void ionic_tx_timeout(struct net_device *netdev, unsigned int txqueue) @@ -2799,17 +2803,22 @@ static int ionic_cmb_reconfig(struct ion if (err) { dev_err(lif->ionic->dev, "CMB restore failed: %d\n", err); - goto errout; + goto err_out; } }
- ionic_start_queues_reconfig(lif); - } else { - /* This was detached in ionic_stop_queues_reconfig() */ - netif_device_attach(lif->netdev); + err = ionic_start_queues_reconfig(lif); + if (err) { + dev_err(lif->ionic->dev, + "CMB reconfig failed: %d\n", err); + goto err_out; + } }
-errout: +err_out: + /* This was detached in ionic_stop_queues_reconfig() */ + netif_device_attach(lif->netdev); + return err; }
From: Eric Dumazet edumazet@google.com
commit 8a9896177784063d01068293caea3f74f6830ff6 upstream.
Another syzbot report [1] is about tp->status lockless reads from __packet_get_status()
[1] BUG: KCSAN: data-race in __packet_rcv_has_room / __packet_set_status
write to 0xffff888117d7c080 of 8 bytes by interrupt on cpu 0: __packet_set_status+0x78/0xa0 net/packet/af_packet.c:407 tpacket_rcv+0x18bb/0x1a60 net/packet/af_packet.c:2483 deliver_skb net/core/dev.c:2173 [inline] __netif_receive_skb_core+0x408/0x1e80 net/core/dev.c:5337 __netif_receive_skb_one_core net/core/dev.c:5491 [inline] __netif_receive_skb+0x57/0x1b0 net/core/dev.c:5607 process_backlog+0x21f/0x380 net/core/dev.c:5935 __napi_poll+0x60/0x3b0 net/core/dev.c:6498 napi_poll net/core/dev.c:6565 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6698 __do_softirq+0xc1/0x265 kernel/softirq.c:571 invoke_softirq kernel/softirq.c:445 [inline] __irq_exit_rcu+0x57/0xa0 kernel/softirq.c:650 sysvec_apic_timer_interrupt+0x6d/0x80 arch/x86/kernel/apic/apic.c:1106 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 smpboot_thread_fn+0x33c/0x4a0 kernel/smpboot.c:112 kthread+0x1d7/0x210 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
read to 0xffff888117d7c080 of 8 bytes by interrupt on cpu 1: __packet_get_status net/packet/af_packet.c:436 [inline] packet_lookup_frame net/packet/af_packet.c:524 [inline] __tpacket_has_room net/packet/af_packet.c:1255 [inline] __packet_rcv_has_room+0x3f9/0x450 net/packet/af_packet.c:1298 tpacket_rcv+0x275/0x1a60 net/packet/af_packet.c:2285 deliver_skb net/core/dev.c:2173 [inline] dev_queue_xmit_nit+0x38a/0x5e0 net/core/dev.c:2243 xmit_one net/core/dev.c:3574 [inline] dev_hard_start_xmit+0xcf/0x3f0 net/core/dev.c:3594 __dev_queue_xmit+0xefb/0x1d10 net/core/dev.c:4244 dev_queue_xmit include/linux/netdevice.h:3088 [inline] can_send+0x4eb/0x5d0 net/can/af_can.c:276 bcm_can_tx+0x314/0x410 net/can/bcm.c:302 bcm_tx_timeout_handler+0xdb/0x260 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0x217/0x700 kernel/time/hrtimer.c:1749 hrtimer_run_softirq+0xd6/0x120 kernel/time/hrtimer.c:1766 __do_softirq+0xc1/0x265 kernel/softirq.c:571 run_ksoftirqd+0x17/0x20 kernel/softirq.c:939 smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164 kthread+0x1d7/0x210 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
value changed: 0x0000000000000000 -> 0x0000000020000081
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 6.4.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Willem de Bruijn willemb@google.com Link: https://lore.kernel.org/r/20230803145600.2937518-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -401,18 +401,20 @@ static void __packet_set_status(struct p { union tpacket_uhdr h;
+ /* WRITE_ONCE() are paired with READ_ONCE() in __packet_get_status */ + h.raw = frame; switch (po->tp_version) { case TPACKET_V1: - h.h1->tp_status = status; + WRITE_ONCE(h.h1->tp_status, status); flush_dcache_page(pgv_to_page(&h.h1->tp_status)); break; case TPACKET_V2: - h.h2->tp_status = status; + WRITE_ONCE(h.h2->tp_status, status); flush_dcache_page(pgv_to_page(&h.h2->tp_status)); break; case TPACKET_V3: - h.h3->tp_status = status; + WRITE_ONCE(h.h3->tp_status, status); flush_dcache_page(pgv_to_page(&h.h3->tp_status)); break; default: @@ -429,17 +431,19 @@ static int __packet_get_status(const str
smp_rmb();
+ /* READ_ONCE() are paired with WRITE_ONCE() in __packet_set_status */ + h.raw = frame; switch (po->tp_version) { case TPACKET_V1: flush_dcache_page(pgv_to_page(&h.h1->tp_status)); - return h.h1->tp_status; + return READ_ONCE(h.h1->tp_status); case TPACKET_V2: flush_dcache_page(pgv_to_page(&h.h2->tp_status)); - return h.h2->tp_status; + return READ_ONCE(h.h2->tp_status); case TPACKET_V3: flush_dcache_page(pgv_to_page(&h.h3->tp_status)); - return h.h3->tp_status; + return READ_ONCE(h.h3->tp_status); default: WARN(1, "TPACKET version not supported.\n"); BUG();
From: Gerd Bayer gbayer@linux.ibm.com
commit 833bac7ec392bf75053c8a4fa4c36d4148dac77d upstream.
Commit 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and make them tunable") introduced the net.smc.rmem and net.smc.wmem sysctls to specify the size of buffers to be used for SMC type connections. This created a regression for users that specified the buffer size via setsockopt() as the effective buffer size was now doubled.
Re-introduce the division by 2 in the SMC buffer create code and level this out by duplicating the net.smc.[rw]mem values used for initializing sk_rcvbuf/sk_sndbuf at socket creation time. This gives users of both methods (setsockopt or sysctl) the effective buffer size that they expect.
Initialize net.smc.[rw]mem from its own constant of 64kB, respectively. Internal performance tests show that this value is a good compromise between throughput/latency and memory consumption. Also, this decouples it from any tuning that was done to net.ipv4.tcp_[rw]mem[1] before the module for SMC protocol was loaded. Check that no more than INT_MAX / 2 is assigned to net.smc.[rw]mem, in order to avoid any overflow condition when that is doubled for use in sk_sndbuf or sk_rcvbuf.
While at it, drop the confusing sk_buf_size variable from __smc_buf_create and name "compressed" buffer size variables more consistently.
Background:
Before the commit mentioned above, SMC's buffer allocator in __smc_buf_create() always used half of the sockets' sk_rcvbuf/sk_sndbuf value as initial value to search for appropriate buffers. If the search resorted to using a bigger buffer when all buffers of the specified size were busy, the duplicate of the used effective buffer size is stored back to sk_rcvbuf/sk_sndbuf.
When available, buffers of exactly the size that a user had specified as input to setsockopt() were used, despite setsockopt()'s documentation in "man 7 socket" talking of a mandatory duplication:
[...] SO_SNDBUF Sets or gets the maximum socket send buffer in bytes. The kernel doubles this value (to allow space for book‐ keeping overhead) when it is set using setsockopt(2), and this doubled value is returned by getsockopt(2). The default value is set by the /proc/sys/net/core/wmem_default file and the maximum allowed value is set by the /proc/sys/net/core/wmem_max file. The minimum (doubled) value for this option is 2048. [...]
Fixes: 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and make them tunable") Co-developed-by: Jan Karcher jaka@linux.ibm.com Signed-off-by: Jan Karcher jaka@linux.ibm.com Reviewed-by: Wenjia Zhang wenjia@linux.ibm.com Reviewed-by: Tony Lu tonylu@linux.alibaba.com Signed-off-by: Gerd Bayer gbayer@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/smc/af_smc.c | 4 ++-- net/smc/smc.h | 2 +- net/smc/smc_clc.c | 4 ++-- net/smc/smc_core.c | 25 ++++++++++++------------- net/smc/smc_sysctl.c | 10 ++++++++-- 5 files changed, 25 insertions(+), 20 deletions(-)
--- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -378,8 +378,8 @@ static struct sock *smc_sock_alloc(struc sk->sk_state = SMC_INIT; sk->sk_destruct = smc_destruct; sk->sk_protocol = protocol; - WRITE_ONCE(sk->sk_sndbuf, READ_ONCE(net->smc.sysctl_wmem)); - WRITE_ONCE(sk->sk_rcvbuf, READ_ONCE(net->smc.sysctl_rmem)); + WRITE_ONCE(sk->sk_sndbuf, 2 * READ_ONCE(net->smc.sysctl_wmem)); + WRITE_ONCE(sk->sk_rcvbuf, 2 * READ_ONCE(net->smc.sysctl_rmem)); smc = smc_sk(sk); INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work); INIT_WORK(&smc->connect_work, smc_connect_work); --- a/net/smc/smc.h +++ b/net/smc/smc.h @@ -161,7 +161,7 @@ struct smc_connection {
struct smc_buf_desc *sndbuf_desc; /* send buffer descriptor */ struct smc_buf_desc *rmb_desc; /* RMBE descriptor */ - int rmbe_size_short;/* compressed notation */ + int rmbe_size_comp; /* compressed notation */ int rmbe_update_limit; /* lower limit for consumer * cursor update --- a/net/smc/smc_clc.c +++ b/net/smc/smc_clc.c @@ -1007,7 +1007,7 @@ static int smc_clc_send_confirm_accept(s clc->d0.gid = conn->lgr->smcd->ops->get_local_gid(conn->lgr->smcd); clc->d0.token = conn->rmb_desc->token; - clc->d0.dmbe_size = conn->rmbe_size_short; + clc->d0.dmbe_size = conn->rmbe_size_comp; clc->d0.dmbe_idx = 0; memcpy(&clc->d0.linkid, conn->lgr->id, SMC_LGR_ID_SIZE); if (version == SMC_V1) { @@ -1050,7 +1050,7 @@ static int smc_clc_send_confirm_accept(s clc->r0.qp_mtu = min(link->path_mtu, link->peer_mtu); break; } - clc->r0.rmbe_size = conn->rmbe_size_short; + clc->r0.rmbe_size = conn->rmbe_size_comp; clc->r0.rmb_dma_addr = conn->rmb_desc->is_vm ? cpu_to_be64((uintptr_t)conn->rmb_desc->cpu_addr) : cpu_to_be64((u64)sg_dma_address --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -2309,31 +2309,30 @@ static int __smc_buf_create(struct smc_s struct smc_connection *conn = &smc->conn; struct smc_link_group *lgr = conn->lgr; struct list_head *buf_list; - int bufsize, bufsize_short; + int bufsize, bufsize_comp; struct rw_semaphore *lock; /* lock buffer list */ bool is_dgraded = false; - int sk_buf_size;
if (is_rmb) /* use socket recv buffer size (w/o overhead) as start value */ - sk_buf_size = smc->sk.sk_rcvbuf; + bufsize = smc->sk.sk_rcvbuf / 2; else /* use socket send buffer size (w/o overhead) as start value */ - sk_buf_size = smc->sk.sk_sndbuf; + bufsize = smc->sk.sk_sndbuf / 2;
- for (bufsize_short = smc_compress_bufsize(sk_buf_size, is_smcd, is_rmb); - bufsize_short >= 0; bufsize_short--) { + for (bufsize_comp = smc_compress_bufsize(bufsize, is_smcd, is_rmb); + bufsize_comp >= 0; bufsize_comp--) { if (is_rmb) { lock = &lgr->rmbs_lock; - buf_list = &lgr->rmbs[bufsize_short]; + buf_list = &lgr->rmbs[bufsize_comp]; } else { lock = &lgr->sndbufs_lock; - buf_list = &lgr->sndbufs[bufsize_short]; + buf_list = &lgr->sndbufs[bufsize_comp]; } - bufsize = smc_uncompress_bufsize(bufsize_short); + bufsize = smc_uncompress_bufsize(bufsize_comp);
/* check for reusable slot in the link group */ - buf_desc = smc_buf_get_slot(bufsize_short, lock, buf_list); + buf_desc = smc_buf_get_slot(bufsize_comp, lock, buf_list); if (buf_desc) { buf_desc->is_dma_need_sync = 0; SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, bufsize); @@ -2377,8 +2376,8 @@ static int __smc_buf_create(struct smc_s
if (is_rmb) { conn->rmb_desc = buf_desc; - conn->rmbe_size_short = bufsize_short; - smc->sk.sk_rcvbuf = bufsize; + conn->rmbe_size_comp = bufsize_comp; + smc->sk.sk_rcvbuf = bufsize * 2; atomic_set(&conn->bytes_to_rcv, 0); conn->rmbe_update_limit = smc_rmb_wnd_update_limit(buf_desc->len); @@ -2386,7 +2385,7 @@ static int __smc_buf_create(struct smc_s smc_ism_set_conn(conn); /* map RMB/smcd_dev to conn */ } else { conn->sndbuf_desc = buf_desc; - smc->sk.sk_sndbuf = bufsize; + smc->sk.sk_sndbuf = bufsize * 2; atomic_set(&conn->sndbuf_space, bufsize); } return 0; --- a/net/smc/smc_sysctl.c +++ b/net/smc/smc_sysctl.c @@ -21,6 +21,10 @@
static int min_sndbuf = SMC_BUF_MIN_SIZE; static int min_rcvbuf = SMC_BUF_MIN_SIZE; +static int max_sndbuf = INT_MAX / 2; +static int max_rcvbuf = INT_MAX / 2; +static const int net_smc_wmem_init = (64 * 1024); +static const int net_smc_rmem_init = (64 * 1024);
static struct ctl_table smc_table[] = { { @@ -53,6 +57,7 @@ static struct ctl_table smc_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = &min_sndbuf, + .extra2 = &max_sndbuf, }, { .procname = "rmem", @@ -61,6 +66,7 @@ static struct ctl_table smc_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = &min_rcvbuf, + .extra2 = &max_rcvbuf, }, { } }; @@ -88,8 +94,8 @@ int __net_init smc_sysctl_net_init(struc net->smc.sysctl_autocorking_size = SMC_AUTOCORKING_DEFAULT_SIZE; net->smc.sysctl_smcr_buf_type = SMCR_PHYS_CONT_BUFS; net->smc.sysctl_smcr_testlink_time = SMC_LLC_TESTLINK_DEFAULT_TIME; - WRITE_ONCE(net->smc.sysctl_wmem, READ_ONCE(net->ipv4.sysctl_tcp_wmem[1])); - WRITE_ONCE(net->smc.sysctl_rmem, READ_ONCE(net->ipv4.sysctl_tcp_rmem[1])); + WRITE_ONCE(net->smc.sysctl_wmem, net_smc_wmem_init); + WRITE_ONCE(net->smc.sysctl_rmem, net_smc_rmem_init);
return 0;
From: Gerd Bayer gbayer@linux.ibm.com
commit 30c3c4a4497c3765bf6b298f5072c8165aeaf7cc upstream.
Tuning of the effective buffer size through setsockopts was working for SMC traffic only but not for TCP fall-back connections even before commit 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and make them tunable"). That change made it apparent that TCP fall-back connections would use net.smc.[rw]mem as buffer size instead of net.ipv4_tcp_[rw]mem.
Amend the code that copies attributes between the (TCP) clcsock and the SMC socket and adjust buffer sizes appropriately: - Copy over sk_userlocks so that both sockets agree on whether tuning via setsockopt is active. - When falling back to TCP use sk_sndbuf or sk_rcvbuf as specified with setsockopt. Otherwise, use the sysctl value for TCP/IPv4. - Likewise, use either values from setsockopt or from sysctl for SMC (duplicated) on successful SMC connect.
In smc_tcp_listen_work() drop the explicit copy of buffer sizes as that is taken care of by the attribute copy.
Fixes: 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and make them tunable") Reviewed-by: Wenjia Zhang wenjia@linux.ibm.com Reviewed-by: Tony Lu tonylu@linux.alibaba.com Signed-off-by: Gerd Bayer gbayer@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/smc/af_smc.c | 73 ++++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 51 insertions(+), 22 deletions(-)
--- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -436,13 +436,60 @@ out: return rc; }
+/* copy only relevant settings and flags of SOL_SOCKET level from smc to + * clc socket (since smc is not called for these options from net/core) + */ + +#define SK_FLAGS_SMC_TO_CLC ((1UL << SOCK_URGINLINE) | \ + (1UL << SOCK_KEEPOPEN) | \ + (1UL << SOCK_LINGER) | \ + (1UL << SOCK_BROADCAST) | \ + (1UL << SOCK_TIMESTAMP) | \ + (1UL << SOCK_DBG) | \ + (1UL << SOCK_RCVTSTAMP) | \ + (1UL << SOCK_RCVTSTAMPNS) | \ + (1UL << SOCK_LOCALROUTE) | \ + (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE) | \ + (1UL << SOCK_RXQ_OVFL) | \ + (1UL << SOCK_WIFI_STATUS) | \ + (1UL << SOCK_NOFCS) | \ + (1UL << SOCK_FILTER_LOCKED) | \ + (1UL << SOCK_TSTAMP_NEW)) + +/* if set, use value set by setsockopt() - else use IPv4 or SMC sysctl value */ +static void smc_adjust_sock_bufsizes(struct sock *nsk, struct sock *osk, + unsigned long mask) +{ + struct net *nnet = sock_net(nsk); + + nsk->sk_userlocks = osk->sk_userlocks; + if (osk->sk_userlocks & SOCK_SNDBUF_LOCK) { + nsk->sk_sndbuf = osk->sk_sndbuf; + } else { + if (mask == SK_FLAGS_SMC_TO_CLC) + WRITE_ONCE(nsk->sk_sndbuf, + READ_ONCE(nnet->ipv4.sysctl_tcp_wmem[1])); + else + WRITE_ONCE(nsk->sk_sndbuf, + 2 * READ_ONCE(nnet->smc.sysctl_wmem)); + } + if (osk->sk_userlocks & SOCK_RCVBUF_LOCK) { + nsk->sk_rcvbuf = osk->sk_rcvbuf; + } else { + if (mask == SK_FLAGS_SMC_TO_CLC) + WRITE_ONCE(nsk->sk_rcvbuf, + READ_ONCE(nnet->ipv4.sysctl_tcp_rmem[1])); + else + WRITE_ONCE(nsk->sk_rcvbuf, + 2 * READ_ONCE(nnet->smc.sysctl_rmem)); + } +} + static void smc_copy_sock_settings(struct sock *nsk, struct sock *osk, unsigned long mask) { /* options we don't get control via setsockopt for */ nsk->sk_type = osk->sk_type; - nsk->sk_sndbuf = osk->sk_sndbuf; - nsk->sk_rcvbuf = osk->sk_rcvbuf; nsk->sk_sndtimeo = osk->sk_sndtimeo; nsk->sk_rcvtimeo = osk->sk_rcvtimeo; nsk->sk_mark = READ_ONCE(osk->sk_mark); @@ -453,26 +500,10 @@ static void smc_copy_sock_settings(struc
nsk->sk_flags &= ~mask; nsk->sk_flags |= osk->sk_flags & mask; + + smc_adjust_sock_bufsizes(nsk, osk, mask); }
-#define SK_FLAGS_SMC_TO_CLC ((1UL << SOCK_URGINLINE) | \ - (1UL << SOCK_KEEPOPEN) | \ - (1UL << SOCK_LINGER) | \ - (1UL << SOCK_BROADCAST) | \ - (1UL << SOCK_TIMESTAMP) | \ - (1UL << SOCK_DBG) | \ - (1UL << SOCK_RCVTSTAMP) | \ - (1UL << SOCK_RCVTSTAMPNS) | \ - (1UL << SOCK_LOCALROUTE) | \ - (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE) | \ - (1UL << SOCK_RXQ_OVFL) | \ - (1UL << SOCK_WIFI_STATUS) | \ - (1UL << SOCK_NOFCS) | \ - (1UL << SOCK_FILTER_LOCKED) | \ - (1UL << SOCK_TSTAMP_NEW)) -/* copy only relevant settings and flags of SOL_SOCKET level from smc to - * clc socket (since smc is not called for these options from net/core) - */ static void smc_copy_sock_settings_to_clc(struct smc_sock *smc) { smc_copy_sock_settings(smc->clcsock->sk, &smc->sk, SK_FLAGS_SMC_TO_CLC); @@ -2479,8 +2510,6 @@ static void smc_tcp_listen_work(struct w sock_hold(lsk); /* sock_put in smc_listen_work */ INIT_WORK(&new_smc->smc_listen_work, smc_listen_work); smc_copy_sock_settings_to_smc(new_smc); - new_smc->sk.sk_sndbuf = lsmc->sk.sk_sndbuf; - new_smc->sk.sk_rcvbuf = lsmc->sk.sk_rcvbuf; sock_hold(&new_smc->sk); /* sock_put in passive closing */ if (!queue_work(smc_hs_wq, &new_smc->smc_listen_work)) sock_put(&new_smc->sk);
From: Vladimir Oltean vladimir.oltean@nxp.com
commit 1a8c251cff2052b60009a070173308322e9600d3 upstream.
The blamed commit has broken probing on arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi when &enetc_port0 (PCI function 0) has status = "disabled".
Background: pci_scan_slot() has logic to say that if the function 0 of a device is absent, the entire device is absent and we can skip the other functions entirely. Traditionally, this has meant that pci_bus_read_dev_vendor_id() returns an error code for that function.
However, since the blamed commit, there is an extra confounding condition: function 0 of the device exists and has a valid vendor id, but it is disabled in the device tree. In that case, pci_scan_slot() would incorrectly skip the entire device instead of just that function.
In the case of NXP LS1028A, status = "disabled" does not mean that the PCI function's config space is not available for reading. It is, but the Ethernet port is just not functionally useful with a particular SerDes protocol configuration (0x9999) due to pinmuxing constraints of the Soc. So, pci_scan_slot() skips all other functions on the ENETC ECAM (enetc_port1, enetc_port2, enetc_mdio_pf3 etc) when just enetc_port0 had to not be probed.
There is an additional regression introduced by the change, caused by its fundamental premise. The enetc driver needs to run code for all PCI functions, regardless of whether they're enabled or not in the device tree. That is no longer possible if the driver's probe function is no longer called. But Rob recommends that we move the of_device_is_available() detection to dev->match_driver, and this makes the PCI fixups still run on all functions, while just probing drivers for those functions that are enabled. So, a separate change in the enetc driver will have to move the workarounds to a PCI fixup.
Fixes: 6fffbc7ae137 ("PCI: Honor firmware's device disabled status") Link: https://lore.kernel.org/netdev/CAL_JsqLsVYiPLx2kcHkDQ4t=hQVCR7NHziDwi9cCFUFh... Suggested-by: Rob Herring robh@kernel.org Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/bus.c | 4 +++- drivers/pci/of.c | 5 ----- 2 files changed, 3 insertions(+), 6 deletions(-)
--- a/drivers/pci/bus.c +++ b/drivers/pci/bus.c @@ -11,6 +11,7 @@ #include <linux/pci.h> #include <linux/errno.h> #include <linux/ioport.h> +#include <linux/of.h> #include <linux/proc_fs.h> #include <linux/slab.h>
@@ -332,6 +333,7 @@ void __weak pcibios_bus_add_device(struc */ void pci_bus_add_device(struct pci_dev *dev) { + struct device_node *dn = dev->dev.of_node; int retval;
/* @@ -344,7 +346,7 @@ void pci_bus_add_device(struct pci_dev * pci_proc_attach_device(dev); pci_bridge_d3_update(dev);
- dev->match_driver = true; + dev->match_driver = !dn || of_device_is_available(dn); retval = device_attach(&dev->dev); if (retval < 0 && retval != -EPROBE_DEFER) pci_warn(dev, "device attach failed (%d)\n", retval); --- a/drivers/pci/of.c +++ b/drivers/pci/of.c @@ -34,11 +34,6 @@ int pci_set_of_node(struct pci_dev *dev) if (!node) return 0;
- if (!of_device_is_available(node)) { - of_node_put(node); - return -ENODEV; - } - dev->dev.of_node = node; dev->dev.fwnode = &node->fwnode; return 0;
From: Eric Dumazet edumazet@google.com
commit 8a70ed9520c5fafaac91053cacdd44625c39e188 upstream.
Before this code is copied, add the missing family, as we did in commit 3dd344ea84e1 ("net: tracepoint: exposing sk_family in all tcp:tracepoints")
Fixes: 15fcdf6ae116 ("tcp: Add tracepoint for tcp_set_ca_state") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Ping Gan jacky_gam_2001@163.com Cc: Manjusaka me@manjusaka.me Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20230808084923.2239142-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/trace/events/tcp.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/include/trace/events/tcp.h +++ b/include/trace/events/tcp.h @@ -381,6 +381,7 @@ TRACE_EVENT(tcp_cong_state_set, __field(const void *, skaddr) __field(__u16, sport) __field(__u16, dport) + __field(__u16, family) __array(__u8, saddr, 4) __array(__u8, daddr, 4) __array(__u8, saddr_v6, 16) @@ -396,6 +397,7 @@ TRACE_EVENT(tcp_cong_state_set,
__entry->sport = ntohs(inet->inet_sport); __entry->dport = ntohs(inet->inet_dport); + __entry->family = sk->sk_family;
p32 = (__be32 *) __entry->saddr; *p32 = inet->inet_saddr; @@ -409,7 +411,8 @@ TRACE_EVENT(tcp_cong_state_set, __entry->cong_state = ca_state; ),
- TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c cong_state=%u", + TP_printk("family=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c cong_state=%u", + show_family_name(__entry->family), __entry->sport, __entry->dport, __entry->saddr, __entry->daddr, __entry->saddr_v6, __entry->daddr_v6,
From: Florian Westphal fw@strlen.de
commit 6a7ac3d20593865209dceb554d8b3f094c6bd940 upstream.
If we try to emit an icmp error in response to a nonliner skb, we get
BUG: KASAN: slab-out-of-bounds in ip_compute_csum+0x134/0x220 Read of size 4 at addr ffff88811c50db00 by task iperf3/1691 CPU: 2 PID: 1691 Comm: iperf3 Not tainted 6.5.0-rc3+ #309 [..] kasan_report+0x105/0x140 ip_compute_csum+0x134/0x220 iptunnel_pmtud_build_icmp+0x554/0x1020 skb_tunnel_check_pmtu+0x513/0xb80 vxlan_xmit_one+0x139e/0x2ef0 vxlan_xmit+0x1867/0x2760 dev_hard_start_xmit+0x1ee/0x4f0 br_dev_queue_push_xmit+0x4d1/0x660 [..]
ip_compute_csum() cannot deal with nonlinear skbs, so avoid it. After this change, splat is gone and iperf3 is no longer stuck.
Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets") Signed-off-by: Florian Westphal fw@strlen.de Link: https://lore.kernel.org/r/20230803152653.29535-2-fw@strlen.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/ip_tunnel_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv4/ip_tunnel_core.c +++ b/net/ipv4/ip_tunnel_core.c @@ -224,7 +224,7 @@ static int iptunnel_pmtud_build_icmp(str .un.frag.__unused = 0, .un.frag.mtu = htons(mtu), }; - icmph->checksum = ip_compute_csum(icmph, len); + icmph->checksum = csum_fold(skb_checksum(skb, 0, len, 0)); skb_reset_transport_header(skb);
niph = skb_push(skb, sizeof(*niph));
From: Magnus Karlsson magnus.karlsson@intel.com
commit 85c2c79a07302fe68a1ad5cc449458cc559e314d upstream.
Fix a refcount underflow problem reported by syzbot that can happen when a system is running out of memory. If xp_alloc_tx_descs() fails, and it can only fail due to not having enough memory, then the error path is triggered. In this error path, the refcount of the pool is decremented as it has incremented before. However, the reference to the pool in the socket was not nulled. This means that when the socket is closed later, the socket teardown logic will think that there is a pool attached to the socket and try to decrease the refcount again, leading to a refcount underflow.
I chose this fix as it involved adding just a single line. Another option would have been to move xp_get_pool() and the assignment of xs->pool to after the if-statement and using xs_umem->pool instead of xs->pool in the whole if-statement resulting in somewhat simpler code, but this would have led to much more churn in the code base perhaps making it harder to backport.
Fixes: ba3beec2ec1d ("xsk: Fix possible crash when multiple sockets are created") Reported-by: syzbot+8ada0057e69293a05fd4@syzkaller.appspotmail.com Signed-off-by: Magnus Karlsson magnus.karlsson@intel.com Link: https://lore.kernel.org/r/20230809142843.13944-1-magnus.karlsson@gmail.com Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/xdp/xsk.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -994,6 +994,7 @@ static int xsk_bind(struct socket *sock, err = xp_alloc_tx_descs(xs->pool, xs); if (err) { xp_put_pool(xs->pool); + xs->pool = NULL; sockfd_put(sock); goto out_unlock; }
From: Ziyang Xuan william.xuanziyang@huawei.com
commit 01f4fd27087078c90a0e22860d1dfa2cd0510791 upstream.
BUG_ON(!vlan_info) is triggered in unregister_vlan_dev() with following testcase:
# ip netns add ns1 # ip netns exec ns1 ip link add bond0 type bond mode 0 # ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2 # ip netns exec ns1 ip link set bond_slave_1 master bond0 # ip netns exec ns1 ip link add link bond_slave_1 name vlan10 type vlan id 10 protocol 802.1ad # ip netns exec ns1 ip link add link bond0 name bond0_vlan10 type vlan id 10 protocol 802.1ad # ip netns exec ns1 ip link set bond_slave_1 nomaster # ip netns del ns1
The logical analysis of the problem is as follows:
1. create ETH_P_8021AD protocol vlan10 for bond_slave_1: register_vlan_dev() vlan_vid_add() vlan_info_alloc() __vlan_vid_add() // add [ETH_P_8021AD, 10] vid to bond_slave_1
2. create ETH_P_8021AD protocol bond0_vlan10 for bond0: register_vlan_dev() vlan_vid_add() __vlan_vid_add() vlan_add_rx_filter_info() if (!vlan_hw_filter_capable(dev, proto)) // condition established because bond0 without NETIF_F_HW_VLAN_STAG_FILTER return 0;
if (netif_device_present(dev)) return dev->netdev_ops->ndo_vlan_rx_add_vid(dev, proto, vid); // will be never called // The slaves of bond0 will not refer to the [ETH_P_8021AD, 10] vid.
3. detach bond_slave_1 from bond0: __bond_release_one() vlan_vids_del_by_dev() list_for_each_entry(vid_info, &vlan_info->vid_list, list) vlan_vid_del(dev, vid_info->proto, vid_info->vid); // bond_slave_1 [ETH_P_8021AD, 10] vid will be deleted. // bond_slave_1->vlan_info will be assigned NULL.
4. delete vlan10 during delete ns1: default_device_exit_batch() dev->rtnl_link_ops->dellink() // unregister_vlan_dev() for vlan10 vlan_info = rtnl_dereference(real_dev->vlan_info); // real_dev of vlan10 is bond_slave_1 BUG_ON(!vlan_info); // bond_slave_1->vlan_info is NULL now, bug is triggered!!!
Add S-VLAN tag related features support to bond driver. So the bond driver will always propagate the VLAN info to its slaves.
Fixes: 8ad227ff89a7 ("net: vlan: add 802.1ad support") Suggested-by: Ido Schimmel idosch@idosch.org Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Reviewed-by: Ido Schimmel idosch@nvidia.com Link: https://lore.kernel.org/r/20230802114320.4156068-1-william.xuanziyang@huawei... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/bonding/bond_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -5898,7 +5898,9 @@ void bond_setup(struct net_device *bond_
bond_dev->hw_features = BOND_VLAN_FEATURES | NETIF_F_HW_VLAN_CTAG_RX | - NETIF_F_HW_VLAN_CTAG_FILTER; + NETIF_F_HW_VLAN_CTAG_FILTER | + NETIF_F_HW_VLAN_STAG_RX | + NETIF_F_HW_VLAN_STAG_FILTER;
bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL; bond_dev->features |= bond_dev->hw_features;
From: Eric Dumazet edumazet@google.com
commit a47e598fbd8617967e49d85c49c22f9fc642704c upstream.
dccp_sendmsg() reads dp->dccps_mss_cache before locking the socket. Same thing in do_dccp_getsockopt().
Add READ_ONCE()/WRITE_ONCE() annotations, and change dccp_sendmsg() to check again dccps_mss_cache after socket is locked.
Fixes: 7c657876b63c ("[DCCP]: Initial implementation") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20230803163021.2958262-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/dccp/output.c | 2 +- net/dccp/proto.c | 10 ++++++++-- 2 files changed, 9 insertions(+), 3 deletions(-)
--- a/net/dccp/output.c +++ b/net/dccp/output.c @@ -187,7 +187,7 @@ unsigned int dccp_sync_mss(struct sock *
/* And store cached results */ icsk->icsk_pmtu_cookie = pmtu; - dp->dccps_mss_cache = cur_mps; + WRITE_ONCE(dp->dccps_mss_cache, cur_mps);
return cur_mps; } --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -630,7 +630,7 @@ static int do_dccp_getsockopt(struct soc return dccp_getsockopt_service(sk, len, (__be32 __user *)optval, optlen); case DCCP_SOCKOPT_GET_CUR_MPS: - val = dp->dccps_mss_cache; + val = READ_ONCE(dp->dccps_mss_cache); break; case DCCP_SOCKOPT_AVAILABLE_CCIDS: return ccid_getsockopt_builtin_ccids(sk, len, optval, optlen); @@ -739,7 +739,7 @@ int dccp_sendmsg(struct sock *sk, struct
trace_dccp_probe(sk, len);
- if (len > dp->dccps_mss_cache) + if (len > READ_ONCE(dp->dccps_mss_cache)) return -EMSGSIZE;
lock_sock(sk); @@ -772,6 +772,12 @@ int dccp_sendmsg(struct sock *sk, struct goto out_discard; }
+ /* We need to check dccps_mss_cache after socket is locked. */ + if (len > dp->dccps_mss_cache) { + rc = -EMSGSIZE; + goto out_discard; + } + skb_reserve(skb, sk->sk_prot->max_header); rc = memcpy_from_msg(skb_put(skb, len), msg, len); if (rc != 0)
From: Andrew Kanner andrew.kanner@gmail.com
commit 59eeb232940515590de513b997539ef495faca9a upstream.
Using the syzkaller repro with reduced packet size it was discovered that XDP_PACKET_HEADROOM is not checked in tun_can_build_skb(), although pad may be incremented in tun_build_skb(). This may end up with exceeding the PAGE_SIZE limit in tun_build_skb().
Jason Wang jasowang@redhat.com proposed to count XDP_PACKET_HEADROOM always (e.g. without rcu_access_pointer(tun->xdp_prog)) in tun_can_build_skb() since there's a window during which XDP program might be attached between tun_can_build_skb() and tun_build_skb().
Fixes: 7df13219d757 ("tun: reserve extra headroom only when XDP is set") Link: https://syzkaller.appspot.com/bug?extid=f817490f5bd20541b90a Signed-off-by: Andrew Kanner andrew.kanner@gmail.com Link: https://lore.kernel.org/r/20230803185947.2379988-1-andrew.kanner@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/tun.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1594,7 +1594,7 @@ static bool tun_can_build_skb(struct tun if (zerocopy) return false;
- if (SKB_DATA_ALIGN(len + TUN_RX_PAD) + + if (SKB_DATA_ALIGN(len + TUN_RX_PAD + XDP_PACKET_HEADROOM) + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE) return false;
From: Fedor Pchelkin pchelkin@ispras.ru
commit b1c936e9af5dd08636d568736fc6075ed9d1d529 upstream.
In case rhashtable_lookup_insert_fast() fails inside vxlan_vni_add(), the allocated percpu vni stats are not freed on the error path.
Introduce vxlan_vni_free() which would work as a nice wrapper to free vxlan_vni_node resources properly.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 4095e0e1328a ("drivers: vxlan: vnifilter: per vni stats") Suggested-by: Ido Schimmel idosch@idosch.org Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/vxlan/vxlan_vnifilter.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
--- a/drivers/net/vxlan/vxlan_vnifilter.c +++ b/drivers/net/vxlan/vxlan_vnifilter.c @@ -713,6 +713,12 @@ static struct vxlan_vni_node *vxlan_vni_ return vninode; }
+static void vxlan_vni_free(struct vxlan_vni_node *vninode) +{ + free_percpu(vninode->stats); + kfree(vninode); +} + static int vxlan_vni_add(struct vxlan_dev *vxlan, struct vxlan_vni_group *vg, u32 vni, union vxlan_addr *group, @@ -740,7 +746,7 @@ static int vxlan_vni_add(struct vxlan_de &vninode->vnode, vxlan_vni_rht_params); if (err) { - kfree(vninode); + vxlan_vni_free(vninode); return err; }
@@ -763,8 +769,7 @@ static void vxlan_vni_node_rcu_free(stru struct vxlan_vni_node *v;
v = container_of(rcu, struct vxlan_vni_node, rcu); - free_percpu(v->stats); - kfree(v); + vxlan_vni_free(v); }
static int vxlan_vni_del(struct vxlan_dev *vxlan,
From: Piotr Gardocki piotrx.gardocki@intel.com
commit 0fb1d8eb234b6979d4981d2d385780dd7d8d9771 upstream.
Add fdir_fltr_lock locking in unprotected places.
The change in iavf_fdir_is_dup_fltr adds a spinlock around a loop which iterates over all filters and looks for a duplicate. The filter can be removed from list and freed from memory at the same time it's being compared. All other places where filters are deleted are already protected with spinlock.
The remaining changes protect adapter->fdir_active_fltr variable so now all its uses are under a spinlock.
Fixes: 527691bf0682 ("iavf: Support IPv4 Flow Director filters") Signed-off-by: Piotr Gardocki piotrx.gardocki@intel.com Tested-by: Rafal Romanowski rafal.romanowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20230807205011.3129224-1-anthony.l.nguyen@intel.co... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/iavf/iavf_ethtool.c | 5 ++++- drivers/net/ethernet/intel/iavf/iavf_fdir.c | 11 ++++++++--- 2 files changed, 12 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c +++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c @@ -1401,14 +1401,15 @@ static int iavf_add_fdir_ethtool(struct if (fsp->flow_type & FLOW_MAC_EXT) return -EINVAL;
+ spin_lock_bh(&adapter->fdir_fltr_lock); if (adapter->fdir_active_fltr >= IAVF_MAX_FDIR_FILTERS) { + spin_unlock_bh(&adapter->fdir_fltr_lock); dev_err(&adapter->pdev->dev, "Unable to add Flow Director filter because VF reached the limit of max allowed filters (%u)\n", IAVF_MAX_FDIR_FILTERS); return -ENOSPC; }
- spin_lock_bh(&adapter->fdir_fltr_lock); if (iavf_find_fdir_fltr_by_loc(adapter, fsp->location)) { dev_err(&adapter->pdev->dev, "Failed to add Flow Director filter, it already exists\n"); spin_unlock_bh(&adapter->fdir_fltr_lock); @@ -1781,7 +1782,9 @@ static int iavf_get_rxnfc(struct net_dev case ETHTOOL_GRXCLSRLCNT: if (!FDIR_FLTR_SUPPORT(adapter)) break; + spin_lock_bh(&adapter->fdir_fltr_lock); cmd->rule_cnt = adapter->fdir_active_fltr; + spin_unlock_bh(&adapter->fdir_fltr_lock); cmd->data = IAVF_MAX_FDIR_FILTERS; ret = 0; break; --- a/drivers/net/ethernet/intel/iavf/iavf_fdir.c +++ b/drivers/net/ethernet/intel/iavf/iavf_fdir.c @@ -722,7 +722,9 @@ void iavf_print_fdir_fltr(struct iavf_ad bool iavf_fdir_is_dup_fltr(struct iavf_adapter *adapter, struct iavf_fdir_fltr *fltr) { struct iavf_fdir_fltr *tmp; + bool ret = false;
+ spin_lock_bh(&adapter->fdir_fltr_lock); list_for_each_entry(tmp, &adapter->fdir_list_head, list) { if (tmp->flow_type != fltr->flow_type) continue; @@ -732,11 +734,14 @@ bool iavf_fdir_is_dup_fltr(struct iavf_a !memcmp(&tmp->ip_data, &fltr->ip_data, sizeof(fltr->ip_data)) && !memcmp(&tmp->ext_data, &fltr->ext_data, - sizeof(fltr->ext_data))) - return true; + sizeof(fltr->ext_data))) { + ret = true; + break; + } } + spin_unlock_bh(&adapter->fdir_fltr_lock);
- return false; + return ret; }
/**
From: Douglas Miller doug.miller@cornelisnetworks.com
commit 4fdfaef71fced490835145631a795497646f4555 upstream.
During hotplug remove it is possible that the update counters work might be pending, and may run after memory has been freed. Cancel the update counters work before freeing memory.
Fixes: 7724105686e7 ("IB/hfi1: add driver files") Signed-off-by: Douglas Miller doug.miller@cornelisnetworks.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@cornelisnetworks.com Link: https://lore.kernel.org/r/169099756100.3927190.15284930454106475280.stgit@aw... Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/infiniband/hw/hfi1/chip.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/infiniband/hw/hfi1/chip.c +++ b/drivers/infiniband/hw/hfi1/chip.c @@ -12307,6 +12307,7 @@ static void free_cntrs(struct hfi1_devda
if (dd->synth_stats_timer.function) del_timer_sync(&dd->synth_stats_timer); + cancel_work_sync(&dd->update_cntr_work); ppd = (struct hfi1_pportdata *)(dd + 1); for (i = 0; i < dd->num_pports; i++, ppd++) { kfree(ppd->cntrs);
From: Mario Limonciello mario.limonciello@amd.com
commit 7ad1dfc144cbf62702fd07838da8fd8a77921083 upstream.
Some systems are only connected by HDMI or DP, so warning related to missing eDP is unnecessary. Downgrade to debug instead.
Cc: Hamza Mahfooz hamza.mahfooz@amd.com Fixes: 6d9b6dceaa51 ("drm/amd/display: only warn once in dce110_edp_wait_for_hpd_ready()") Reported-by: Mastan.Katragadda@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c +++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c @@ -780,7 +780,8 @@ void dce110_edp_wait_for_hpd_ready( dal_gpio_destroy_irq(&hpd);
/* ensure that the panel is detected */ - ASSERT(edp_hpd_high); + if (!edp_hpd_high) + DC_LOG_DC("%s: wait timed out!\n", __func__); }
void dce110_edp_power_control(
From: Pin-yen Lin treapking@chromium.org
commit e9d699af3f65d62cf195f0e7a039400093ab2af2 upstream.
On system resume, the driver might call it6505_poweron directly if the runtime PM hasn't been enabled. In such case, pm_runtime_get_if_in_use will always return 0 because dev->power.runtime_status stays at RPM_SUSPENDED, and the IRQ will never be handled.
Use it6505->powered from the driver struct fixes this because it always gets updated when it6505_poweron is called.
Fixes: 5eb9a4314053 ("drm/bridge: it6505: Guard bridge power in IRQ handler") Signed-off-by: Pin-yen Lin treapking@chromium.org Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20230727100131.2338127-1-treap... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/bridge/ite-it6505.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/bridge/ite-it6505.c b/drivers/gpu/drm/bridge/ite-it6505.c index 504d51c42f79..aadb396508c5 100644 --- a/drivers/gpu/drm/bridge/ite-it6505.c +++ b/drivers/gpu/drm/bridge/ite-it6505.c @@ -2517,9 +2517,11 @@ static irqreturn_t it6505_int_threaded_handler(int unused, void *data) }; int int_status[3], i;
- if (it6505->enable_drv_hold || pm_runtime_get_if_in_use(dev) <= 0) + if (it6505->enable_drv_hold || !it6505->powered) return IRQ_HANDLED;
+ pm_runtime_get_sync(dev); + int_status[0] = it6505_read(it6505, INT_STATUS_01); int_status[1] = it6505_read(it6505, INT_STATUS_02); int_status[2] = it6505_read(it6505, INT_STATUS_03);
From: Arnd Bergmann arnd@arndb.de
commit 421dabcad1c69e02a41c0d601aefbc29ee3f5368 upstream.
tu102_gr_load() is completely unused and can be removed to address this warning:
drivers/gpu/drm/nouveau/dispnv50/disp.c:2517:1: error: no previous prototype for 'nv50_display_create'
Another patch was sent in the meantime to mark the function static but that would just cause a different warning about an unused function.
Fixes: 1cd97b5490c8 ("drm/nouveau/gr/tu102-: use sw_veid_bundle_init from firmware") Link: https://lore.kernel.org/all/CACO55tuaNOYphHyB9+ygi9AnXVuF49etsW7x2X5K5iEtFNA... Link: https://lore.kernel.org/all/20230417210310.2443152-1-arnd@kernel.org/ Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Karol Herbst kherbst@redhat.com Signed-off-by: Karol Herbst kherbst@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20230803143358.13563-1-arnd@ke... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c b/drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c index 3b6c8100a242..a7775aa18541 100644 --- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c @@ -206,19 +206,6 @@ tu102_gr_av_to_init_veid(struct nvkm_blob *blob, struct gf100_gr_pack **ppack) return gk20a_gr_av_to_init_(blob, 64, 0x00100000, ppack); }
-int -tu102_gr_load(struct gf100_gr *gr, int ver, const struct gf100_gr_fwif *fwif) -{ - int ret; - - ret = gm200_gr_load(gr, ver, fwif); - if (ret) - return ret; - - return gk20a_gr_load_net(gr, "gr/", "sw_veid_bundle_init", ver, tu102_gr_av_to_init_veid, - &gr->bundle_veid); -} - static const struct gf100_gr_fwif tu102_gr_fwif[] = { { 0, gm200_gr_load, &tu102_gr, &gp108_gr_fecs_acr, &gp108_gr_gpccs_acr },
From: Daniel Stone daniels@collabora.com
commit 43dae319b50fac075ad864f84501c703ef20eb2b upstream.
Userspace should not be able to trigger DRM_ERROR messages to spam the logs; especially not through atomic commit parameters which are completely legitimate for userspace to attempt.
Signed-off-by: Daniel Stone daniels@collabora.com Fixes: 7707f7227f09 ("drm/rockchip: Add support for afbc") Signed-off-by: Heiko Stuebner heiko@sntech.de Link: https://patchwork.freedesktop.org/patch/msgid/20230808104405.522493-1-daniel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-)
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c @@ -833,12 +833,12 @@ static int vop_plane_atomic_check(struct * need align with 2 pixel. */ if (fb->format->is_yuv && ((new_plane_state->src.x1 >> 16) % 2)) { - DRM_ERROR("Invalid Source: Yuv format not support odd xpos\n"); + DRM_DEBUG_KMS("Invalid Source: Yuv format not support odd xpos\n"); return -EINVAL; }
if (fb->format->is_yuv && new_plane_state->rotation & DRM_MODE_REFLECT_Y) { - DRM_ERROR("Invalid Source: Yuv format does not support this rotation\n"); + DRM_DEBUG_KMS("Invalid Source: Yuv format does not support this rotation\n"); return -EINVAL; }
@@ -846,7 +846,7 @@ static int vop_plane_atomic_check(struct struct vop *vop = to_vop(crtc);
if (!vop->data->afbc) { - DRM_ERROR("vop does not support AFBC\n"); + DRM_DEBUG_KMS("vop does not support AFBC\n"); return -EINVAL; }
@@ -855,15 +855,16 @@ static int vop_plane_atomic_check(struct return ret;
if (new_plane_state->src.x1 || new_plane_state->src.y1) { - DRM_ERROR("AFBC does not support offset display, xpos=%d, ypos=%d, offset=%d\n", - new_plane_state->src.x1, - new_plane_state->src.y1, fb->offsets[0]); + DRM_DEBUG_KMS("AFBC does not support offset display, " \ + "xpos=%d, ypos=%d, offset=%d\n", + new_plane_state->src.x1, new_plane_state->src.y1, + fb->offsets[0]); return -EINVAL; }
if (new_plane_state->rotation && new_plane_state->rotation != DRM_MODE_ROTATE_0) { - DRM_ERROR("No rotation support in AFBC, rotation=%d\n", - new_plane_state->rotation); + DRM_DEBUG_KMS("No rotation support in AFBC, rotation=%d\n", + new_plane_state->rotation); return -EINVAL; } }
From: Petr Tesarik petr.tesarik.ext@huawei.com
commit 07d698324110339b420deebab7a7805815340b4f upstream.
Return -ENOMEM from brcmf_run_escan() if kzalloc() fails for v1 params.
Fixes: 398ce273d6b1 ("wifi: brcmfmac: cfg80211: Add support for scan params v2") Signed-off-by: Petr Tesarik petr.tesarik.ext@huawei.com Link: https://lore.kernel.org/r/20230802163430.1656-1-petrtesarik@huaweicloud.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c index de8a2e27f49c..2a90bb24ba77 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c @@ -1456,6 +1456,10 @@ brcmf_run_escan(struct brcmf_cfg80211_info *cfg, struct brcmf_if *ifp, params_size -= BRCMF_SCAN_PARAMS_V2_FIXED_SIZE; params_size += BRCMF_SCAN_PARAMS_FIXED_SIZE; params_v1 = kzalloc(params_size, GFP_KERNEL); + if (!params_v1) { + err = -ENOMEM; + goto exit_params; + } params_v1->version = cpu_to_le32(BRCMF_ESCAN_REQ_VERSION); brcmf_scan_params_v2_to_v1(¶ms->params_v2_le, ¶ms_v1->params_le); kfree(params); @@ -1473,6 +1477,7 @@ brcmf_run_escan(struct brcmf_cfg80211_info *cfg, struct brcmf_if *ifp, bphy_err(drvr, "error (%d)\n", err); }
+exit_params: kfree(params); exit: return err;
From: Felix Fietkau nbd@nbd.name
commit 5fb9a9fb71a33be61d7d8e8ba4597bfb18d604d0 upstream.
AP_VLAN interfaces are virtual, so doesn't really exist as a type for capabilities. When passed in as a type, AP is the one that's really intended.
Fixes: c4cbaf7973a7 ("cfg80211: Add support for HE") Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20230622165919.46841-1-nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/cfg80211.h | 3 +++ 1 file changed, 3 insertions(+)
--- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -562,6 +562,9 @@ ieee80211_get_sband_iftype_data(const st if (WARN_ON(iftype >= NL80211_IFTYPE_MAX)) return NULL;
+ if (iftype == NL80211_IFTYPE_AP_VLAN) + iftype = NL80211_IFTYPE_AP; + for (i = 0; i < sband->n_iftype_data; i++) { const struct ieee80211_sband_iftype_data *data = &sband->iftype_data[i];
From: Michael Guralnik michaelgur@nvidia.com
commit 186b169cf1e4be85aa212a893ea783a543400979 upstream.
Fixing the ODP registration flow to set the iova correctly. The calculation in ib_umem_num_dma_blocks() function assumes the iova of the umem is set correctly.
When iova is not set, the calculation in ib_umem_num_dma_blocks() is equivalent to length/page_size, which is true only when memory is aligned. For unaligned memory, iova must be set for the ALIGN() in the ib_umem_num_dma_blocks() to take effect and return a correct value.
mlx5_ib uses ib_umem_num_dma_blocks() to decide the mkey size to use for the MR. Without this fix, when registering unaligned ODP MR, a wrong size mkey might be chosen and this might cause the UMR to fail.
UMR would fail over insufficient size to update the mkey translation: infiniband mlx5_0: dump_cqe:273:(pid 0): dump error cqe 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 0f 00 78 06 25 00 00 58 00 da ac d2 infiniband mlx5_0: mlx5_ib_post_send_wait:806:(pid 20311): reg umr failed (6) infiniband mlx5_0: pagefault_real_mr:661:(pid 20311): Failed to update mkey page tables
Fixes: f0093fb1a7cb ("RDMA/mlx5: Move mlx5_ib_cont_pages() to the creation of the mlx5_ib_mr") Fixes: a665aca89a41 ("RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks()") Signed-off-by: Artemy Kovalyov artemyko@nvidia.com Signed-off-by: Michael Guralnik michaelgur@nvidia.com Link: https://lore.kernel.org/r/3d4be7ca2155bf239dd8c00a2d25974a92c26ab8.168975734... Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/infiniband/core/umem.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -85,6 +85,8 @@ unsigned long ib_umem_find_best_pgsz(str dma_addr_t mask; int i;
+ umem->iova = va = virt; + if (umem->is_odp) { unsigned int page_size = BIT(to_ib_umem_odp(umem)->page_shift);
@@ -100,7 +102,6 @@ unsigned long ib_umem_find_best_pgsz(str */ pgsz_bitmap &= GENMASK(BITS_PER_LONG - 1, PAGE_SHIFT);
- umem->iova = va = virt; /* The best result is the smallest page size that results in the minimum * number of required pages. Compute the largest page size that could * work based on VA address bits that don't change.
From: Selvin Xavier selvin.xavier@broadcom.com
commit 5363fc488da579923edf6a2fdca3d3b651dd800b upstream.
ib_dealloc_device() should be called only after device cleanup. Fix the dealloc sequence.
Fixes: 6d758147c7b8 ("RDMA/bnxt_re: Use auxiliary driver interface") Link: https://lore.kernel.org/r/1691642677-21369-2-git-send-email-selvin.xavier@br... Signed-off-by: Selvin Xavier selvin.xavier@broadcom.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/infiniband/hw/bnxt_re/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/infiniband/hw/bnxt_re/main.c +++ b/drivers/infiniband/hw/bnxt_re/main.c @@ -1425,8 +1425,8 @@ static void bnxt_re_remove(struct auxili } bnxt_re_setup_cc(rdev, false); ib_unregister_device(&rdev->ibdev); - ib_dealloc_device(&rdev->ibdev); bnxt_re_dev_uninit(rdev); + ib_dealloc_device(&rdev->ibdev); skip_remove: mutex_unlock(&bnxt_re_mutex); }
From: Kalesh AP kalesh-anakkur.purayil@broadcom.com
commit 5ac8480ae4d01f0ca5dfd561884424046df2478a upstream.
During bnxt_re_dev_init(), when bnxt_re_setup_chip_ctx() fails unregister with L2 first before bailing out probe.
Fixes: ae8637e13185 ("RDMA/bnxt_re: Add chip context to identify 57500 series") Link: https://lore.kernel.org/r/1691642677-21369-3-git-send-email-selvin.xavier@br... Signed-off-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Signed-off-by: Selvin Xavier selvin.xavier@broadcom.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/infiniband/hw/bnxt_re/main.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/infiniband/hw/bnxt_re/main.c +++ b/drivers/infiniband/hw/bnxt_re/main.c @@ -1152,6 +1152,8 @@ static int bnxt_re_dev_init(struct bnxt_
rc = bnxt_re_setup_chip_ctx(rdev, wqe_mode); if (rc) { + bnxt_unregister_dev(rdev->en_dev); + clear_bit(BNXT_RE_FLAG_NETDEV_REGISTERED, &rdev->flags); ibdev_err(&rdev->ibdev, "Failed to get chip context\n"); return -EINVAL; }
From: Jakub Kicinski kuba@kernel.org
commit 6b47808f223c70ff564f9b363446d2a5fa1e05b2 upstream.
TLS records end with a 16B tag. For TLS device offload we only need to make space for this tag in the stream, the device will generate and replace it with the actual calculated tag.
Long time ago the code would just re-reference the head frag which mostly worked but was suboptimal because it prevented TCP from combining the record into a single skb frag. I'm not sure if it was correct as the first frag may be shorter than the tag.
The commit under fixes tried to replace that with using the page frag and if the allocation failed rolling back the data, if record was long enough. It achieves better fragment coalescing but is also buggy.
We don't roll back the iterator, so unless we're at the end of send we'll skip the data we designated as tag and start the next record as if the rollback never happened. There's also the possibility that the record was constructed with MSG_MORE and the data came from a different syscall and we already told the user space that we "got it".
Allocate a single dummy page and use it as fallback.
Found by code inspection, and proven by forcing allocation failures.
Fixes: e7b159a48ba6 ("net/tls: remove the record tail optimization") Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tls/tls_device.c | 64 ++++++++++++++++++++++++++------------------------- 1 file changed, 33 insertions(+), 31 deletions(-)
--- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -52,6 +52,8 @@ static LIST_HEAD(tls_device_list); static LIST_HEAD(tls_device_down_list); static DEFINE_SPINLOCK(tls_device_lock);
+static struct page *dummy_page; + static void tls_device_free_ctx(struct tls_context *ctx) { if (ctx->tx_conf == TLS_HW) { @@ -313,36 +315,33 @@ static int tls_push_record(struct sock * return tls_push_sg(sk, ctx, offload_ctx->sg_tx_data, 0, flags); }
-static int tls_device_record_close(struct sock *sk, - struct tls_context *ctx, - struct tls_record_info *record, - struct page_frag *pfrag, - unsigned char record_type) +static void tls_device_record_close(struct sock *sk, + struct tls_context *ctx, + struct tls_record_info *record, + struct page_frag *pfrag, + unsigned char record_type) { struct tls_prot_info *prot = &ctx->prot_info; - int ret; + struct page_frag dummy_tag_frag;
/* append tag * device will fill in the tag, we just need to append a placeholder * use socket memory to improve coalescing (re-using a single buffer * increases frag count) - * if we can't allocate memory now, steal some back from data + * if we can't allocate memory now use the dummy page */ - if (likely(skb_page_frag_refill(prot->tag_size, pfrag, - sk->sk_allocation))) { - ret = 0; - tls_append_frag(record, pfrag, prot->tag_size); - } else { - ret = prot->tag_size; - if (record->len <= prot->overhead_size) - return -ENOMEM; + if (unlikely(pfrag->size - pfrag->offset < prot->tag_size) && + !skb_page_frag_refill(prot->tag_size, pfrag, sk->sk_allocation)) { + dummy_tag_frag.page = dummy_page; + dummy_tag_frag.offset = 0; + pfrag = &dummy_tag_frag; } + tls_append_frag(record, pfrag, prot->tag_size);
/* fill prepend */ tls_fill_prepend(ctx, skb_frag_address(&record->frags[0]), record->len - prot->overhead_size, record_type); - return ret; }
static int tls_create_new_record(struct tls_offload_context_tx *offload_ctx, @@ -535,18 +534,8 @@ last_record:
if (done || record->len >= max_open_record_len || (record->num_frags >= MAX_SKB_FRAGS - 1)) { - rc = tls_device_record_close(sk, tls_ctx, record, - pfrag, record_type); - if (rc) { - if (rc > 0) { - size += rc; - } else { - size = orig_size; - destroy_record(record); - ctx->open_record = NULL; - break; - } - } + tls_device_record_close(sk, tls_ctx, record, + pfrag, record_type);
rc = tls_push_record(sk, tls_ctx, @@ -1466,14 +1455,26 @@ int __init tls_device_init(void) { int err;
- destruct_wq = alloc_workqueue("ktls_device_destruct", 0, 0); - if (!destruct_wq) + dummy_page = alloc_page(GFP_KERNEL); + if (!dummy_page) return -ENOMEM;
+ destruct_wq = alloc_workqueue("ktls_device_destruct", 0, 0); + if (!destruct_wq) { + err = -ENOMEM; + goto err_free_dummy; + } + err = register_netdevice_notifier(&tls_dev_notifier); if (err) - destroy_workqueue(destruct_wq); + goto err_destroy_wq;
+ return 0; + +err_destroy_wq: + destroy_workqueue(destruct_wq); +err_free_dummy: + put_page(dummy_page); return err; }
@@ -1482,4 +1483,5 @@ void __exit tls_device_cleanup(void) unregister_netdevice_notifier(&tls_dev_notifier); destroy_workqueue(destruct_wq); clean_acked_data_flush(); + put_page(dummy_page); }
From: Jonas Gorski jonas.gorski@bisdn.de
commit 2aa71b4b294ee2c3041d085404cea914be9b3225 upstream.
Fix handling IPv4 routes referencing a nexthop via its id by replacing calls to fib_info_nh() with fib_info_nhc().
Trying to add an IPv4 route referencing a nextop via nhid:
$ ip link set up swp5 $ ip a a 10.0.0.1/24 dev swp5 $ ip nexthop add dev swp5 id 20 via 10.0.0.2 $ ip route add 10.0.1.0/24 nhid 20
triggers warnings when trying to handle the route:
[ 528.805763] ------------[ cut here ]------------ [ 528.810437] WARNING: CPU: 3 PID: 53 at include/net/nexthop.h:468 __prestera_fi_is_direct+0x2c/0x68 [prestera] [ 528.820434] Modules linked in: prestera_pci act_gact act_police sch_ingress cls_u32 cls_flower prestera arm64_delta_tn48m_dn_led(O) arm64_delta_tn48m_dn_cpld(O) [last unloaded: prestera_pci] [ 528.837485] CPU: 3 PID: 53 Comm: kworker/u8:3 Tainted: G O 6.4.5 #1 [ 528.845178] Hardware name: delta,tn48m-dn (DT) [ 528.849641] Workqueue: prestera_ordered __prestera_router_fib_event_work [prestera] [ 528.857352] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 528.864347] pc : __prestera_fi_is_direct+0x2c/0x68 [prestera] [ 528.870135] lr : prestera_k_arb_fib_evt+0xb20/0xd50 [prestera] [ 528.876007] sp : ffff80000b20bc90 [ 528.879336] x29: ffff80000b20bc90 x28: 0000000000000000 x27: ffff0001374d3a48 [ 528.886510] x26: ffff000105604000 x25: ffff000134af8a28 x24: ffff0001374d3800 [ 528.893683] x23: ffff000101c89148 x22: ffff000101c89000 x21: ffff000101c89200 [ 528.900855] x20: ffff00013641fda0 x19: ffff800009d01088 x18: 0000000000000059 [ 528.908027] x17: 0000000000000277 x16: 0000000000000000 x15: 0000000000000000 [ 528.915198] x14: 0000000000000003 x13: 00000000000fe400 x12: 0000000000000000 [ 528.922371] x11: 0000000000000002 x10: 0000000000000aa0 x9 : ffff8000013d2020 [ 528.929543] x8 : 0000000000000018 x7 : 000000007b1703f8 x6 : 000000001ca72f86 [ 528.936715] x5 : 0000000033399ea7 x4 : 0000000000000000 x3 : ffff0001374d3acc [ 528.943886] x2 : 0000000000000000 x1 : ffff00010200de00 x0 : ffff000134ae3f80 [ 528.951058] Call trace: [ 528.953516] __prestera_fi_is_direct+0x2c/0x68 [prestera] [ 528.958952] __prestera_router_fib_event_work+0x100/0x158 [prestera] [ 528.965348] process_one_work+0x208/0x488 [ 528.969387] worker_thread+0x4c/0x430 [ 528.973068] kthread+0x120/0x138 [ 528.976313] ret_from_fork+0x10/0x20 [ 528.979909] ---[ end trace 0000000000000000 ]--- [ 528.984998] ------------[ cut here ]------------ [ 528.989645] WARNING: CPU: 3 PID: 53 at include/net/nexthop.h:468 __prestera_fi_is_direct+0x2c/0x68 [prestera] [ 528.999628] Modules linked in: prestera_pci act_gact act_police sch_ingress cls_u32 cls_flower prestera arm64_delta_tn48m_dn_led(O) arm64_delta_tn48m_dn_cpld(O) [last unloaded: prestera_pci] [ 529.016676] CPU: 3 PID: 53 Comm: kworker/u8:3 Tainted: G W O 6.4.5 #1 [ 529.024368] Hardware name: delta,tn48m-dn (DT) [ 529.028830] Workqueue: prestera_ordered __prestera_router_fib_event_work [prestera] [ 529.036539] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 529.043533] pc : __prestera_fi_is_direct+0x2c/0x68 [prestera] [ 529.049318] lr : __prestera_k_arb_fc_apply+0x280/0x2f8 [prestera] [ 529.055452] sp : ffff80000b20bc60 [ 529.058781] x29: ffff80000b20bc60 x28: 0000000000000000 x27: ffff0001374d3a48 [ 529.065953] x26: ffff000105604000 x25: ffff000134af8a28 x24: ffff0001374d3800 [ 529.073126] x23: ffff000101c89148 x22: ffff000101c89148 x21: ffff00013641fda0 [ 529.080299] x20: ffff000101c89000 x19: ffff000101c89020 x18: 0000000000000059 [ 529.087471] x17: 0000000000000277 x16: 0000000000000000 x15: 0000000000000000 [ 529.094642] x14: 0000000000000003 x13: 00000000000fe400 x12: 0000000000000000 [ 529.101814] x11: 0000000000000002 x10: 0000000000000aa0 x9 : ffff8000013cee80 [ 529.108985] x8 : 0000000000000018 x7 : 000000007b1703f8 x6 : 0000000000000018 [ 529.116157] x5 : 00000000d3497eb6 x4 : ffff000105604081 x3 : 000000008e979557 [ 529.123329] x2 : 0000000000000000 x1 : ffff00010200de00 x0 : ffff000134ae3f80 [ 529.130501] Call trace: [ 529.132958] __prestera_fi_is_direct+0x2c/0x68 [prestera] [ 529.138394] prestera_k_arb_fib_evt+0x6b8/0xd50 [prestera] [ 529.143918] __prestera_router_fib_event_work+0x100/0x158 [prestera] [ 529.150313] process_one_work+0x208/0x488 [ 529.154348] worker_thread+0x4c/0x430 [ 529.158030] kthread+0x120/0x138 [ 529.161274] ret_from_fork+0x10/0x20 [ 529.164867] ---[ end trace 0000000000000000 ]---
and results in a non offloaded route:
$ ip route 10.0.0.0/24 dev swp5 proto kernel scope link src 10.0.0.1 rt_trap 10.0.1.0/24 nhid 20 via 10.0.0.2 dev swp5 rt_trap
When creating a route referencing a nexthop via its ID, the nexthop will be stored in a separate nh pointer instead of the array of nexthops in the fib_info struct. This causes issues since fib_info_nh() only handles the nexthops array, but not the separate nh pointer, and will loudly WARN about it.
In contrast fib_info_nhc() handles both, but returns a fib_nh_common pointer instead of a fib_nh pointer. Luckily we only ever access fields from the fib_nh_common parts, so we can just replace all instances of fib_info_nh() with fib_info_nhc() and access the fields via their fib_nh_common names.
This allows handling IPv4 routes with an external nexthop, and they now get offloaded as expected:
$ ip route 10.0.0.0/24 dev swp5 proto kernel scope link src 10.0.0.1 rt_trap 10.0.1.0/24 nhid 20 via 10.0.0.2 dev swp5 offload rt_offload
Fixes: 396b80cb5cc8 ("net: marvell: prestera: Add neighbour cache accounting") Signed-off-by: Jonas Gorski jonas.gorski@bisdn.de Acked-by: Elad Nachman enachman@marvell.com Link: https://lore.kernel.org/r/20230804101220.247515-1-jonas.gorski@bisdn.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/marvell/prestera/prestera_router.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
--- a/drivers/net/ethernet/marvell/prestera/prestera_router.c +++ b/drivers/net/ethernet/marvell/prestera/prestera_router.c @@ -166,11 +166,11 @@ prestera_util_neigh2nc_key(struct preste
static bool __prestera_fi_is_direct(struct fib_info *fi) { - struct fib_nh *fib_nh; + struct fib_nh_common *fib_nhc;
if (fib_info_num_path(fi) == 1) { - fib_nh = fib_info_nh(fi, 0); - if (fib_nh->fib_nh_gw_family == AF_UNSPEC) + fib_nhc = fib_info_nhc(fi, 0); + if (fib_nhc->nhc_gw_family == AF_UNSPEC) return true; }
@@ -261,7 +261,7 @@ static bool __prestera_util_kern_n_is_reachable_v4(u32 tb_id, __be32 *addr, struct net_device *dev) { - struct fib_nh *fib_nh; + struct fib_nh_common *fib_nhc; struct fib_result res; bool reachable;
@@ -269,8 +269,8 @@ __prestera_util_kern_n_is_reachable_v4(u
if (!prestera_util_kern_get_route(&res, tb_id, addr)) if (prestera_fi_is_direct(res.fi)) { - fib_nh = fib_info_nh(res.fi, 0); - if (dev == fib_nh->fib_nh_dev) + fib_nhc = fib_info_nhc(res.fi, 0); + if (dev == fib_nhc->nhc_dev) reachable = true; }
@@ -324,7 +324,7 @@ prestera_kern_fib_info_nhc(struct fib_no if (info->family == AF_INET) { fen4_info = container_of(info, struct fib_entry_notifier_info, info); - return &fib_info_nh(fen4_info->fi, n)->nh_common; + return fib_info_nhc(fen4_info->fi, n); } else if (info->family == AF_INET6) { fen6_info = container_of(info, struct fib6_entry_notifier_info, info);
From: Li Yang leoyang.li@nxp.com
commit d7791cec2304aea22eb2ada944e4d467302f5bfe upstream.
Since the AR8032 part does not support wol, remove related callbacks from it.
Fixes: 5800091a2061 ("net: phy: at803x: add support for AR8032 PHY") Signed-off-by: Li Yang leoyang.li@nxp.com Cc: David Bauer mail@david-bauer.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/phy/at803x.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/net/phy/at803x.c +++ b/drivers/net/phy/at803x.c @@ -2086,8 +2086,6 @@ static struct phy_driver at803x_driver[] .flags = PHY_POLL_CABLE_TEST, .config_init = at803x_config_init, .link_change_notify = at803x_link_change_notify, - .set_wol = at803x_set_wol, - .get_wol = at803x_get_wol, .suspend = at803x_suspend, .resume = at803x_resume, /* PHY_BASIC_FEATURES */
From: Vladimir Oltean vladimir.oltean@nxp.com
commit a94c16a2fda010866b8858a386a8bfbeba4f72c5 upstream.
When the tagging protocol in current use is "ocelot-8021q" and we unbind the driver, we see this splat:
$ echo '0000:00:00.2' > /sys/bus/pci/drivers/fsl_enetc/unbind mscc_felix 0000:00:00.5 swp0: left promiscuous mode sja1105 spi2.0: Link is Down DSA: tree 1 torn down mscc_felix 0000:00:00.5 swp2: left promiscuous mode sja1105 spi2.2: Link is Down DSA: tree 3 torn down fsl_enetc 0000:00:00.2 eno2: left promiscuous mode mscc_felix 0000:00:00.5: Link is Down ------------[ cut here ]------------ RTNL: assertion failed at net/dsa/tag_8021q.c (409) WARNING: CPU: 1 PID: 329 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x12c/0x1a0 Modules linked in: CPU: 1 PID: 329 Comm: bash Not tainted 6.5.0-rc3+ #771 pc : dsa_tag_8021q_unregister+0x12c/0x1a0 lr : dsa_tag_8021q_unregister+0x12c/0x1a0 Call trace: dsa_tag_8021q_unregister+0x12c/0x1a0 felix_tag_8021q_teardown+0x130/0x150 felix_teardown+0x3c/0xd8 dsa_tree_teardown_switches+0xbc/0xe0 dsa_unregister_switch+0x168/0x260 felix_pci_remove+0x30/0x60 pci_device_remove+0x4c/0x100 device_release_driver_internal+0x188/0x288 device_links_unbind_consumers+0xfc/0x138 device_release_driver_internal+0xe0/0x288 device_driver_detach+0x24/0x38 unbind_store+0xd8/0x108 drv_attr_store+0x30/0x50 ---[ end trace 0000000000000000 ]--- ------------[ cut here ]------------ RTNL: assertion failed at net/8021q/vlan_core.c (376) WARNING: CPU: 1 PID: 329 at net/8021q/vlan_core.c:376 vlan_vid_del+0x1b8/0x1f0 CPU: 1 PID: 329 Comm: bash Tainted: G W 6.5.0-rc3+ #771 pc : vlan_vid_del+0x1b8/0x1f0 lr : vlan_vid_del+0x1b8/0x1f0 dsa_tag_8021q_unregister+0x8c/0x1a0 felix_tag_8021q_teardown+0x130/0x150 felix_teardown+0x3c/0xd8 dsa_tree_teardown_switches+0xbc/0xe0 dsa_unregister_switch+0x168/0x260 felix_pci_remove+0x30/0x60 pci_device_remove+0x4c/0x100 device_release_driver_internal+0x188/0x288 device_links_unbind_consumers+0xfc/0x138 device_release_driver_internal+0xe0/0x288 device_driver_detach+0x24/0x38 unbind_store+0xd8/0x108 drv_attr_store+0x30/0x50 DSA: tree 0 torn down
This was somewhat not so easy to spot, because "ocelot-8021q" is not the default tagging protocol, and thus, not everyone who tests the unbinding path may have switched to it beforehand. The default felix_tag_npi_teardown() does not require rtnl_lock() to be held.
Fixes: 7c83a7c539ab ("net: dsa: add a second tagger for Ocelot switches based on tag_8021q") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Link: https://lore.kernel.org/r/20230803134253.2711124-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/dsa/ocelot/felix.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/dsa/ocelot/felix.c +++ b/drivers/net/dsa/ocelot/felix.c @@ -1625,8 +1625,10 @@ static void felix_teardown(struct dsa_sw struct felix *felix = ocelot_to_felix(ocelot); struct dsa_port *dp;
+ rtnl_lock(); if (felix->tag_proto_ops) felix->tag_proto_ops->teardown(ds); + rtnl_unlock();
dsa_switch_for_each_available_port(dp, ds) ocelot_deinit_port(ocelot, dp->index);
From: Jie Wang wangjie125@huawei.com
commit 08469dacfad25428b66549716811807203744f4f upstream.
Some nic configurations could only be performed after link is down. So this patch refactor this API for reuse.
Signed-off-by: Jie Wang wangjie125@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Link: https://lore.kernel.org/r/20230807113452.474224-3-shaojijie@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -72,6 +72,8 @@ static void hclge_restore_hw_table(struc static void hclge_sync_promisc_mode(struct hclge_dev *hdev); static void hclge_sync_fd_table(struct hclge_dev *hdev); static void hclge_update_fec_stats(struct hclge_dev *hdev); +static int hclge_mac_link_status_wait(struct hclge_dev *hdev, int link_ret, + int wait_cnt);
static struct hnae3_ae_algo ae_algo;
@@ -7656,10 +7658,9 @@ static void hclge_phy_link_status_wait(s } while (++i < HCLGE_PHY_LINK_STATUS_NUM); }
-static int hclge_mac_link_status_wait(struct hclge_dev *hdev, int link_ret) +static int hclge_mac_link_status_wait(struct hclge_dev *hdev, int link_ret, + int wait_cnt) { -#define HCLGE_MAC_LINK_STATUS_NUM 100 - int link_status; int i = 0; int ret; @@ -7672,13 +7673,15 @@ static int hclge_mac_link_status_wait(st return 0;
msleep(HCLGE_LINK_STATUS_MS); - } while (++i < HCLGE_MAC_LINK_STATUS_NUM); + } while (++i < wait_cnt); return -EBUSY; }
static int hclge_mac_phy_link_status_wait(struct hclge_dev *hdev, bool en, bool is_phy) { +#define HCLGE_MAC_LINK_STATUS_NUM 100 + int link_ret;
link_ret = en ? HCLGE_LINK_STATUS_UP : HCLGE_LINK_STATUS_DOWN; @@ -7686,7 +7689,8 @@ static int hclge_mac_phy_link_status_wai if (is_phy) hclge_phy_link_status_wait(hdev, link_ret);
- return hclge_mac_link_status_wait(hdev, link_ret); + return hclge_mac_link_status_wait(hdev, link_ret, + HCLGE_MAC_LINK_STATUS_NUM); }
static int hclge_set_app_loopback(struct hclge_dev *hdev, bool en)
From: Jie Wang wangjie125@huawei.com
commit 6265e242f7b95f2c1195b42ec912b84ad161470e upstream.
In some configure flow of hns3 driver, for example, change mtu, it will disable MAC through firmware before configuration. But firmware disables MAC asynchronously. The rx traffic may be not stopped in this case.
So fixes it by waiting until mac link is down.
Fixes: a9775bb64aa7 ("net: hns3: fix set and get link ksettings issue") Signed-off-by: Jie Wang wangjie125@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Link: https://lore.kernel.org/r/20230807113452.474224-4-shaojijie@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -7569,6 +7569,8 @@ static void hclge_enable_fd(struct hnae3
static void hclge_cfg_mac_mode(struct hclge_dev *hdev, bool enable) { +#define HCLGE_LINK_STATUS_WAIT_CNT 3 + struct hclge_desc desc; struct hclge_config_mac_mode_cmd *req = (struct hclge_config_mac_mode_cmd *)desc.data; @@ -7593,9 +7595,15 @@ static void hclge_cfg_mac_mode(struct hc req->txrx_pad_fcs_loop_en = cpu_to_le32(loop_en);
ret = hclge_cmd_send(&hdev->hw, &desc, 1); - if (ret) + if (ret) { dev_err(&hdev->pdev->dev, "mac enable fail, ret =%d.\n", ret); + return; + } + + if (!enable) + hclge_mac_link_status_wait(hdev, HCLGE_LINK_STATUS_DOWN, + HCLGE_LINK_STATUS_WAIT_CNT); }
static int hclge_config_switch_param(struct hclge_dev *hdev, int vfid,
From: Yonglong Liu liuyonglong@huawei.com
commit ac6257a3ae5db5193b1f19c268e4f72d274ddb88 upstream.
When externel_lb and reset are executed together, a deadlock may occur: [ 3147.217009] INFO: task kworker/u321:0:7 blocked for more than 120 seconds. [ 3147.230483] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 3147.238999] task:kworker/u321:0 state:D stack: 0 pid: 7 ppid: 2 flags:0x00000008 [ 3147.248045] Workqueue: hclge hclge_service_task [hclge] [ 3147.253957] Call trace: [ 3147.257093] __switch_to+0x7c/0xbc [ 3147.261183] __schedule+0x338/0x6f0 [ 3147.265357] schedule+0x50/0xe0 [ 3147.269185] schedule_preempt_disabled+0x18/0x24 [ 3147.274488] __mutex_lock.constprop.0+0x1d4/0x5dc [ 3147.279880] __mutex_lock_slowpath+0x1c/0x30 [ 3147.284839] mutex_lock+0x50/0x60 [ 3147.288841] rtnl_lock+0x20/0x2c [ 3147.292759] hclge_reset_prepare+0x68/0x90 [hclge] [ 3147.298239] hclge_reset_subtask+0x88/0xe0 [hclge] [ 3147.303718] hclge_reset_service_task+0x84/0x120 [hclge] [ 3147.309718] hclge_service_task+0x2c/0x70 [hclge] [ 3147.315109] process_one_work+0x1d0/0x490 [ 3147.319805] worker_thread+0x158/0x3d0 [ 3147.324240] kthread+0x108/0x13c [ 3147.328154] ret_from_fork+0x10/0x18
In externel_lb process, the hns3 driver call napi_disable() first, then the reset happen, then the restore process of the externel_lb will fail, and will not call napi_enable(). When doing externel_lb again, napi_disable() will be double call, cause a deadlock of rtnl_lock().
This patch use the HNS3_NIC_STATE_DOWN state to protect the calling of napi_disable() and napi_enable() in externel_lb process, just as the usage in ndo_stop() and ndo_start().
Fixes: 04b6ba143521 ("net: hns3: add support for external loopback test") Signed-off-by: Yonglong Liu liuyonglong@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Link: https://lore.kernel.org/r/20230807113452.474224-5-shaojijie@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -5854,6 +5854,9 @@ void hns3_external_lb_prepare(struct net if (!if_running) return;
+ if (test_and_set_bit(HNS3_NIC_STATE_DOWN, &priv->state)) + return; + netif_carrier_off(ndev); netif_tx_disable(ndev);
@@ -5882,7 +5885,16 @@ void hns3_external_lb_restore(struct net if (!if_running) return;
- hns3_nic_reset_all_ring(priv->ae_handle); + if (hns3_nic_resetting(ndev)) + return; + + if (!test_bit(HNS3_NIC_STATE_DOWN, &priv->state)) + return; + + if (hns3_nic_reset_all_ring(priv->ae_handle)) + return; + + clear_bit(HNS3_NIC_STATE_DOWN, &priv->state);
for (i = 0; i < priv->vector_num; i++) hns3_vector_enable(&priv->tqp_vector[i]);
From: Vladimir Oltean vladimir.oltean@nxp.com
commit f0168042a21292d20007d24ab2e4fc32f79ebf11 upstream.
The workaround implemented in commit 3222b5b613db ("net: enetc: initialize RFS/RSS memories for unused ports too") is no longer effective after commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled status"). Thus, it has introduced a regression and we see AER errors being reported again:
$ ip link set sw2p0 up && dhclient -i sw2p0 && ip addr show sw2p0 fsl_enetc 0000:00:00.2 eno2: configuring for fixed/internal link mode fsl_enetc 0000:00:00.2 eno2: Link is Up - 2.5Gbps/Full - flow control rx/tx mscc_felix 0000:00:00.5 swp2: configuring for fixed/sgmii link mode mscc_felix 0000:00:00.5 swp2: Link is Up - 1Gbps/Full - flow control off sja1105 spi2.2 sw2p0: configuring for phy/rgmii-id link mode sja1105 spi2.2 sw2p0: Link is Up - 1Gbps/Full - flow control off pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0 pcieport 0000:00:1f.0: AER: can't find device of ID0000
Rob's suggestion is to reimplement the enetc driver workaround as a PCI fixup, and to modify the PCI core to run the fixups for all PCI functions. This change handles the first part.
We refactor the common code in enetc_psi_create() and enetc_psi_destroy(), and use the PCI fixup only for those functions for which enetc_pf_probe() won't get called. This avoids some work being done twice for the PFs which are enabled.
Fixes: 6fffbc7ae137 ("PCI: Honor firmware's device disabled status") Link: https://lore.kernel.org/netdev/CAL_JsqLsVYiPLx2kcHkDQ4t=hQVCR7NHziDwi9cCFUFh... Suggested-by: Rob Herring robh@kernel.org Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/freescale/enetc/enetc_pf.c | 103 +++++++++++++++++------- 1 file changed, 73 insertions(+), 30 deletions(-)
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c @@ -1222,50 +1222,81 @@ static int enetc_pf_register_with_ierb(s return enetc_ierb_register_pf(ierb_pdev, pdev); }
-static int enetc_pf_probe(struct pci_dev *pdev, - const struct pci_device_id *ent) +static struct enetc_si *enetc_psi_create(struct pci_dev *pdev) { - struct device_node *node = pdev->dev.of_node; - struct enetc_ndev_priv *priv; - struct net_device *ndev; struct enetc_si *si; - struct enetc_pf *pf; int err;
- err = enetc_pf_register_with_ierb(pdev); - if (err == -EPROBE_DEFER) - return err; - if (err) - dev_warn(&pdev->dev, - "Could not register with IERB driver: %pe, please update the device tree\n", - ERR_PTR(err)); - - err = enetc_pci_probe(pdev, KBUILD_MODNAME, sizeof(*pf)); - if (err) - return dev_err_probe(&pdev->dev, err, "PCI probing failed\n"); + err = enetc_pci_probe(pdev, KBUILD_MODNAME, sizeof(struct enetc_pf)); + if (err) { + dev_err_probe(&pdev->dev, err, "PCI probing failed\n"); + goto out; + }
si = pci_get_drvdata(pdev); if (!si->hw.port || !si->hw.global) { err = -ENODEV; dev_err(&pdev->dev, "could not map PF space, probing a VF?\n"); - goto err_map_pf_space; + goto out_pci_remove; }
err = enetc_setup_cbdr(&pdev->dev, &si->hw, ENETC_CBDR_DEFAULT_SIZE, &si->cbd_ring); if (err) - goto err_setup_cbdr; + goto out_pci_remove;
err = enetc_init_port_rfs_memory(si); if (err) { dev_err(&pdev->dev, "Failed to initialize RFS memory\n"); - goto err_init_port_rfs; + goto out_teardown_cbdr; }
err = enetc_init_port_rss_memory(si); if (err) { dev_err(&pdev->dev, "Failed to initialize RSS memory\n"); - goto err_init_port_rss; + goto out_teardown_cbdr; + } + + return si; + +out_teardown_cbdr: + enetc_teardown_cbdr(&si->cbd_ring); +out_pci_remove: + enetc_pci_remove(pdev); +out: + return ERR_PTR(err); +} + +static void enetc_psi_destroy(struct pci_dev *pdev) +{ + struct enetc_si *si = pci_get_drvdata(pdev); + + enetc_teardown_cbdr(&si->cbd_ring); + enetc_pci_remove(pdev); +} + +static int enetc_pf_probe(struct pci_dev *pdev, + const struct pci_device_id *ent) +{ + struct device_node *node = pdev->dev.of_node; + struct enetc_ndev_priv *priv; + struct net_device *ndev; + struct enetc_si *si; + struct enetc_pf *pf; + int err; + + err = enetc_pf_register_with_ierb(pdev); + if (err == -EPROBE_DEFER) + return err; + if (err) + dev_warn(&pdev->dev, + "Could not register with IERB driver: %pe, please update the device tree\n", + ERR_PTR(err)); + + si = enetc_psi_create(pdev); + if (IS_ERR(si)) { + err = PTR_ERR(si); + goto err_psi_create; }
if (node && !of_device_is_available(node)) { @@ -1353,15 +1384,10 @@ err_alloc_si_res: si->ndev = NULL; free_netdev(ndev); err_alloc_netdev: -err_init_port_rss: -err_init_port_rfs: err_device_disabled: err_setup_mac_addresses: - enetc_teardown_cbdr(&si->cbd_ring); -err_setup_cbdr: -err_map_pf_space: - enetc_pci_remove(pdev); - + enetc_psi_destroy(pdev); +err_psi_create: return err; }
@@ -1384,12 +1410,29 @@ static void enetc_pf_remove(struct pci_d enetc_free_msix(priv);
enetc_free_si_resources(priv); - enetc_teardown_cbdr(&si->cbd_ring);
free_netdev(si->ndev);
- enetc_pci_remove(pdev); + enetc_psi_destroy(pdev); +} + +static void enetc_fixup_clear_rss_rfs(struct pci_dev *pdev) +{ + struct device_node *node = pdev->dev.of_node; + struct enetc_si *si; + + /* Only apply quirk for disabled functions. For the ones + * that are enabled, enetc_pf_probe() will apply it. + */ + if (node && of_device_is_available(node)) + return; + + si = enetc_psi_create(pdev); + if (si) + enetc_psi_destroy(pdev); } +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_FREESCALE, ENETC_DEV_ID_PF, + enetc_fixup_clear_rss_rfs);
static const struct pci_device_id enetc_pf_id_table[] = { { PCI_DEVICE(PCI_VENDOR_ID_FREESCALE, ENETC_DEV_ID_PF) },
From: Ido Schimmel idosch@nvidia.com
commit 913f60cacda73ccac8eead94983e5884c03e04cd upstream.
A netlink dump callback can return a positive number to signal that more information needs to be dumped or zero to signal that the dump is complete. In the second case, the core netlink code will append the NLMSG_DONE message to the skb in order to indicate to user space that the dump is complete.
The nexthop dump callback always returns a positive number if nexthops were filled in the provided skb, even if the dump is complete. This means that a dump will span at least two recvmsg() calls as long as nexthops are present. In the last recvmsg() call the dump callback will not fill in any nexthops because the previous call indicated that the dump should restart from the last dumped nexthop ID plus one.
# ip nexthop add id 1 blackhole # strace -e sendto,recvmsg -s 5 ip nexthop sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOP, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691394315, nlmsg_pid=0}, {nh_family=AF_UNSPEC, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 36 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=36, nlmsg_type=RTM_NEWNEXTHOP, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394315, nlmsg_pid=343}, {nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}, [[{nla_len=8, nla_type=NHA_ID}, 1], {nla_len=4, nla_type=NHA_BLACKHOLE}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36 id 1 blackhole recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394315, nlmsg_pid=343}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20 +++ exited with 0 +++
This behavior is both inefficient and buggy. If the last nexthop to be dumped had the maximum ID of 0xffffffff, then the dump will restart from 0 (0xffffffff + 1) and never end:
# ip nexthop add id $((2**32-1)) blackhole # ip nexthop id 4294967295 blackhole id 4294967295 blackhole [...]
Fix by adjusting the dump callback to return zero when the dump is complete. After the fix only one recvmsg() call is made and the NLMSG_DONE message is appended to the RTM_NEWNEXTHOP response:
# ip nexthop add id $((2**32-1)) blackhole # strace -e sendto,recvmsg -s 5 ip nexthop sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOP, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691394080, nlmsg_pid=0}, {nh_family=AF_UNSPEC, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 56 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=36, nlmsg_type=RTM_NEWNEXTHOP, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394080, nlmsg_pid=342}, {nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}, [[{nla_len=8, nla_type=NHA_ID}, 4294967295], {nla_len=4, nla_type=NHA_BLACKHOLE}]], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394080, nlmsg_pid=342}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 56 id 4294967295 blackhole +++ exited with 0 +++
Note that if the NLMSG_DONE message cannot be appended because of size limitations, then another recvmsg() will be needed, but the core netlink code will not invoke the dump callback and simply reply with a NLMSG_DONE message since it knows that the callback previously returned zero.
Add a test that fails before the fix:
# ./fib_nexthops.sh -t basic [...] TEST: Maximum nexthop ID dump [FAIL] [...]
And passes after it:
# ./fib_nexthops.sh -t basic [...] TEST: Maximum nexthop ID dump [ OK ] [...]
Fixes: ab84be7e54fc ("net: Initial nexthop code") Reported-by: Petr Machata petrm@nvidia.com Closes: https://lore.kernel.org/netdev/87sf91enuf.fsf@nvidia.com/ Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Link: https://lore.kernel.org/r/20230808075233.3337922-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/nexthop.c | 6 +----- tools/testing/selftests/net/fib_nexthops.sh | 5 +++++ 2 files changed, 6 insertions(+), 5 deletions(-)
--- a/net/ipv4/nexthop.c +++ b/net/ipv4/nexthop.c @@ -3221,13 +3221,9 @@ static int rtm_dump_nexthop(struct sk_bu &rtm_dump_nexthop_cb, &filter); if (err < 0) { if (likely(skb->len)) - goto out; - goto out_err; + err = skb->len; }
-out: - err = skb->len; -out_err: cb->seq = net->nexthop.seq; nl_dump_check_consistent(cb, nlmsg_hdr(skb)); return err; --- a/tools/testing/selftests/net/fib_nexthops.sh +++ b/tools/testing/selftests/net/fib_nexthops.sh @@ -1981,6 +1981,11 @@ basic()
run_cmd "$IP link set dev lo up"
+ # Dump should not loop endlessly when maximum nexthop ID is configured. + run_cmd "$IP nexthop add id $((2**32-1)) blackhole" + run_cmd "timeout 5 $IP nexthop" + log_test $? 0 "Maximum nexthop ID dump" + # # groups #
From: Ido Schimmel idosch@nvidia.com
commit f10d3d9df49d9e6ee244fda6ca264f901a9c5d85 upstream.
rtm_dump_nexthop_bucket_nh() is used to dump nexthop buckets belonging to a specific resilient nexthop group. The function returns a positive return code (the skb length) upon both success and failure.
The above behavior is problematic. When a complete nexthop bucket dump is requested, the function that walks the different nexthops treats the non-zero return code as an error. This causes buckets belonging to different resilient nexthop groups to be dumped using different buffers even if they can all fit in the same buffer:
# ip link add name dummy1 up type dummy # ip nexthop add id 1 dev dummy1 # ip nexthop add id 10 group 1 type resilient buckets 1 # ip nexthop add id 20 group 1 type resilient buckets 1 # strace -e recvmsg -s 0 ip nexthop bucket [...] recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64 id 10 index 0 idle_time 10.27 nhid 1 [...] recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64 id 20 index 0 idle_time 6.44 nhid 1 [...]
Fix by only returning a non-zero return code when an error occurred and restarting the dump from the bucket index we failed to fill in. This allows buckets belonging to different resilient nexthop groups to be dumped using the same buffer:
# ip link add name dummy1 up type dummy # ip nexthop add id 1 dev dummy1 # ip nexthop add id 10 group 1 type resilient buckets 1 # ip nexthop add id 20 group 1 type resilient buckets 1 # strace -e recvmsg -s 0 ip nexthop bucket [...] recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128 id 10 index 0 idle_time 30.21 nhid 1 id 20 index 0 idle_time 26.7 nhid 1 [...]
While this change is more of a performance improvement change than an actual bug fix, it is a prerequisite for a subsequent patch that does fix a bug.
Fixes: 8a1bbabb034d ("nexthop: Add netlink handlers for bucket dump") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Link: https://lore.kernel.org/r/20230808075233.3337922-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/nexthop.c | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-)
--- a/net/ipv4/nexthop.c +++ b/net/ipv4/nexthop.c @@ -3363,25 +3363,19 @@ static int rtm_dump_nexthop_bucket_nh(st dd->filter.res_bucket_nh_id != nhge->nh->id) continue;
+ dd->ctx->bucket_index = bucket_index; err = nh_fill_res_bucket(skb, nh, bucket, bucket_index, RTM_NEWNEXTHOPBUCKET, portid, cb->nlh->nlmsg_seq, NLM_F_MULTI, cb->extack); - if (err < 0) { - if (likely(skb->len)) - goto out; - goto out_err; - } + if (err) + return err; }
dd->ctx->done_nh_idx = dd->ctx->nh.idx + 1; - bucket_index = 0; + dd->ctx->bucket_index = 0;
-out: - err = skb->len; -out_err: - dd->ctx->bucket_index = bucket_index; - return err; + return 0; }
static int rtm_dump_nexthop_bucket_cb(struct sk_buff *skb,
From: Ido Schimmel idosch@nvidia.com
commit 8743aeff5bc4dcb5b87b43765f48d5ac3ad7dd9f upstream.
A netlink dump callback can return a positive number to signal that more information needs to be dumped or zero to signal that the dump is complete. In the second case, the core netlink code will append the NLMSG_DONE message to the skb in order to indicate to user space that the dump is complete.
The nexthop bucket dump callback always returns a positive number if nexthop buckets were filled in the provided skb, even if the dump is complete. This means that a dump will span at least two recvmsg() calls as long as nexthop buckets are present. In the last recvmsg() call the dump callback will not fill in any nexthop buckets because the previous call indicated that the dump should restart from the last dumped nexthop ID plus one.
# ip link add name dummy1 up type dummy # ip nexthop add id 1 dev dummy1 # ip nexthop add id 10 group 1 type resilient buckets 2 # strace -e sendto,recvmsg -s 5 ip nexthop bucket sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396980, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 128 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128 id 10 index 0 idle_time 6.66 nhid 1 id 10 index 1 idle_time 6.66 nhid 1 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20 +++ exited with 0 +++
This behavior is both inefficient and buggy. If the last nexthop to be dumped had the maximum ID of 0xffffffff, then the dump will restart from 0 (0xffffffff + 1) and never end:
# ip link add name dummy1 up type dummy # ip nexthop add id 1 dev dummy1 # ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2 # ip nexthop bucket id 4294967295 index 0 idle_time 5.55 nhid 1 id 4294967295 index 1 idle_time 5.55 nhid 1 id 4294967295 index 0 idle_time 5.55 nhid 1 id 4294967295 index 1 idle_time 5.55 nhid 1 [...]
Fix by adjusting the dump callback to return zero when the dump is complete. After the fix only one recvmsg() call is made and the NLMSG_DONE message is appended to the RTM_NEWNEXTHOPBUCKET responses:
# ip link add name dummy1 up type dummy # ip nexthop add id 1 dev dummy1 # ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2 # strace -e sendto,recvmsg -s 5 ip nexthop bucket sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396737, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 148 recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 148 id 4294967295 index 0 idle_time 6.61 nhid 1 id 4294967295 index 1 idle_time 6.61 nhid 1 +++ exited with 0 +++
Note that if the NLMSG_DONE message cannot be appended because of size limitations, then another recvmsg() will be needed, but the core netlink code will not invoke the dump callback and simply reply with a NLMSG_DONE message since it knows that the callback previously returned zero.
Add a test that fails before the fix:
# ./fib_nexthops.sh -t basic_res [...] TEST: Maximum nexthop ID dump [FAIL] [...]
And passes after it:
# ./fib_nexthops.sh -t basic_res [...] TEST: Maximum nexthop ID dump [ OK ] [...]
Fixes: 8a1bbabb034d ("nexthop: Add netlink handlers for bucket dump") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Link: https://lore.kernel.org/r/20230808075233.3337922-4-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/nexthop.c | 6 +----- tools/testing/selftests/net/fib_nexthops.sh | 5 +++++ 2 files changed, 6 insertions(+), 5 deletions(-)
--- a/net/ipv4/nexthop.c +++ b/net/ipv4/nexthop.c @@ -3424,13 +3424,9 @@ static int rtm_dump_nexthop_bucket(struc
if (err < 0) { if (likely(skb->len)) - goto out; - goto out_err; + err = skb->len; }
-out: - err = skb->len; -out_err: cb->seq = net->nexthop.seq; nl_dump_check_consistent(cb, nlmsg_hdr(skb)); return err; --- a/tools/testing/selftests/net/fib_nexthops.sh +++ b/tools/testing/selftests/net/fib_nexthops.sh @@ -2206,6 +2206,11 @@ basic_res() run_cmd "$IP nexthop bucket list fdb" log_test $? 255 "Dump all nexthop buckets with invalid 'fdb' keyword"
+ # Dump should not loop endlessly when maximum nexthop ID is configured. + run_cmd "$IP nexthop add id $((2**32-1)) group 1/2 type resilient buckets 4" + run_cmd "timeout 5 $IP nexthop bucket" + log_test $? 0 "Maximum nexthop ID dump" + # # resilient nexthop buckets get requests #
From: Hao Chen chenhao418@huawei.com
commit 5e3d20617b055e725e785e0058426368269949f3 upstream.
hns3_dbg_fill_content()/hclge_dbg_fill_content() is aim to integrate some items to a string for content, and we add '\n' and '\0' in the last two bytes of content.
strscpy() will add '\0' in the last byte of destination buffer(one of items), it result in finishing content print ahead of schedule and some dump content truncation.
One Error log shows as below: cat mac_list/uc UC MAC_LIST:
Expected: UC MAC_LIST: FUNC_ID MAC_ADDR STATE pf 00:2b:19:05:03:00 ACTIVE
The destination buffer is length-bounded and not required to be NUL-terminated, so just change strscpy() to memcpy() to fix it.
Fixes: 1cf3d5567f27 ("net: hns3: fix strncpy() not using dest-buf length as length issue") Signed-off-by: Hao Chen chenhao418@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Link: https://lore.kernel.org/r/20230809020902.1941471-1-shaojijie@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c | 4 ++-- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c @@ -461,9 +461,9 @@ static void hns3_dbg_fill_content(char * if (result) { if (item_len < strlen(result[i])) break; - strscpy(pos, result[i], strlen(result[i])); + memcpy(pos, result[i], strlen(result[i])); } else { - strscpy(pos, items[i].name, strlen(items[i].name)); + memcpy(pos, items[i].name, strlen(items[i].name)); } pos += item_len; len -= item_len; --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c @@ -111,9 +111,9 @@ static void hclge_dbg_fill_content(char if (result) { if (item_len < strlen(result[i])) break; - strscpy(pos, result[i], strlen(result[i])); + memcpy(pos, result[i], strlen(result[i])); } else { - strscpy(pos, items[i].name, strlen(items[i].name)); + memcpy(pos, items[i].name, strlen(items[i].name)); } pos += item_len; len -= item_len;
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
commit 0a46781c89dece85386885a407244ca26e5c1c44 upstream.
When 'mcf_edma' is allocated, some space is allocated for a flexible array at the end of the struct. 'chans' item are allocated, that is to say 'pdata->dma_channels'.
Then, this number of item is stored in 'mcf_edma->n_chans'.
A few lines later, if 'mcf_edma->n_chans' is 0, then a default value of 64 is set.
This ends to no space allocated by devm_kzalloc() because chans was 0, but 64 items are read and/or written in some not allocated memory.
Change the logic to define a default value before allocating the memory.
Fixes: e7a3ff92eaf1 ("dmaengine: fsl-edma: add ColdFire mcf5441x edma support") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Link: https://lore.kernel.org/r/f55d914407c900828f6fad3ea5fa791a5f17b9a4.168517244... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/mcf-edma.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
--- a/drivers/dma/mcf-edma.c +++ b/drivers/dma/mcf-edma.c @@ -190,7 +190,13 @@ static int mcf_edma_probe(struct platfor return -EINVAL; }
- chans = pdata->dma_channels; + if (!pdata->dma_channels) { + dev_info(&pdev->dev, "setting default channel number to 64"); + chans = 64; + } else { + chans = pdata->dma_channels; + } + len = sizeof(*mcf_edma) + sizeof(*mcf_chan) * chans; mcf_edma = devm_kzalloc(&pdev->dev, len, GFP_KERNEL); if (!mcf_edma) @@ -202,11 +208,6 @@ static int mcf_edma_probe(struct platfor mcf_edma->drvdata = &mcf_data; mcf_edma->big_endian = 1;
- if (!mcf_edma->n_chans) { - dev_info(&pdev->dev, "setting default channel number to 64"); - mcf_edma->n_chans = 64; - } - mutex_init(&mcf_edma->fsl_edma_mutex);
mcf_edma->membase = devm_platform_ioremap_resource(pdev, 0);
From: Fenghua Yu fenghua.yu@intel.com
commit 863676fe1ac1b82fc9eb56c242e80acfbfc18b76 upstream.
Disabling IDXD device doesn't reset Page Request Service (PRS) disable flag to its initial value 0. This may cause user confusion because once PRS is disabled user will see PRS still remains the previous setting (i.e. disabled) via sysfs interface even after the device is disabled.
To eliminate user confusion, reset PRS disable flag to ensure that the PRS flag bit reflects correct state after the device is disabled.
Additionally, simplify the code by setting wq->flags to 0, which clears all flag bits, including any future additions.
Fixes: f2dc327131b5 ("dmaengine: idxd: add per wq PRS disable") Tested-by: Tony Zhu tony.zhu@intel.com Signed-off-by: Fenghua Yu fenghua.yu@intel.com Reviewed-by: Dave Jiang dave.jiang@intel.com Link: https://lore.kernel.org/r/20230712193505.3440752-1-fenghua.yu@intel.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/idxd/device.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 5abbcc61c528..9a15f0d12c79 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -384,9 +384,7 @@ static void idxd_wq_disable_cleanup(struct idxd_wq *wq) wq->threshold = 0; wq->priority = 0; wq->enqcmds_retries = IDXD_ENQCMDS_RETRIES; - clear_bit(WQ_FLAG_DEDICATED, &wq->flags); - clear_bit(WQ_FLAG_BLOCK_ON_FAULT, &wq->flags); - clear_bit(WQ_FLAG_ATS_DISABLE, &wq->flags); + wq->flags = 0; memset(wq->name, 0, WQ_NAME_SIZE); wq->max_xfer_bytes = WQ_DEFAULT_MAX_XFER; idxd_wq_set_max_batch_size(idxd->data->type, wq, WQ_DEFAULT_MAX_BATCH);
From: Zhang Jianhua chris.zjh@huawei.com
commit 74d7221c1f9c9f3a8c316a3557ca7dca8b99d14c upstream.
No functional modification involved.
drivers/dma/owl-dma.c:208: warning: expecting prototype for struct owl_dma_pchan. Prototype was for struct owl_dma_vchan instead HDRTEST usr/include/sound/asequencer.h
Fixes: 47e20577c24d ("dmaengine: Add Actions Semi Owl family S900 DMA driver") Signed-off-by: Zhang Jianhua chris.zjh@huawei.com Reviewed-by: Randy Dunlap rdunlap@infradead.org Link: https://lore.kernel.org/r/20230722153244.2086949-1-chris.zjh@huawei.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/owl-dma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/dma/owl-dma.c +++ b/drivers/dma/owl-dma.c @@ -192,7 +192,7 @@ struct owl_dma_pchan { };
/** - * struct owl_dma_pchan - Wrapper for DMA ENGINE channel + * struct owl_dma_vchan - Wrapper for DMA ENGINE channel * @vc: wrapped virtual channel * @pchan: the physical channel utilized by this channel * @txd: active transaction on this channel
From: Gal Pressman gal@nvidia.com
commit 72cc654970658e88a1cdea08f06b11c218efa4da upstream.
Hold RTNL lock when calling xdp_set_features() with a registered netdev, as the call triggers the netdev notifiers. This could happen when switching from uplink rep to nic profile for example.
This resolves the following call trace:
RTNL: assertion failed at net/core/dev.c (1953) WARNING: CPU: 6 PID: 112670 at net/core/dev.c:1953 call_netdevice_notifiers_info+0x7c/0x80 Modules linked in: sch_mqprio sch_mqprio_lib act_tunnel_key act_mirred act_skbedit cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress bonding ib_umad ip_gre rdma_ucm mlx5_vfio_pci ipip tunnel4 ip6_gre gre mlx5_ib vfio_pci vfio_pci_core vfio_iommu_type1 ib_uverbs vfio mlx5_core ib_ipoib geneve nf_tables ip6_tunnel tunnel6 iptable_raw openvswitch nsh rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: ib_uverbs] CPU: 6 PID: 112670 Comm: devlink Not tainted 6.4.0-rc7_for_upstream_min_debug_2023_06_28_17_02 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:call_netdevice_notifiers_info+0x7c/0x80 Code: 90 ff 80 3d 2d 6b f7 00 00 75 c5 ba a1 07 00 00 48 c7 c6 e4 ce 0b 82 48 c7 c7 c8 f4 04 82 c6 05 11 6b f7 00 01 e8 a4 7c 8e ff <0f> 0b eb a2 0f 1f 44 00 00 55 48 89 e5 41 54 48 83 e4 f0 48 83 ec RSP: 0018:ffff8882a21c3948 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff82e6f880 RCX: 0000000000000027 RDX: ffff88885f99b5c8 RSI: 0000000000000001 RDI: ffff88885f99b5c0 RBP: 0000000000000028 R08: ffff88887ffabaa8 R09: 0000000000000003 R10: ffff88887fecbac0 R11: ffff88887ff7bac0 R12: ffff8882a21c3968 R13: ffff88811c018940 R14: 0000000000000000 R15: ffff8881274401a0 FS: 00007fe141c81800(0000) GS:ffff88885f980000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f787c28b948 CR3: 000000014bcf3005 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x79/0x120 ? call_netdevice_notifiers_info+0x7c/0x80 ? report_bug+0x17c/0x190 ? handle_bug+0x3c/0x60 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? call_netdevice_notifiers_info+0x7c/0x80 ? call_netdevice_notifiers_info+0x7c/0x80 call_netdevice_notifiers+0x2e/0x50 mlx5e_set_xdp_feature+0x21/0x50 [mlx5_core] mlx5e_nic_init+0xf1/0x1a0 [mlx5_core] mlx5e_netdev_init_profile+0x76/0x110 [mlx5_core] mlx5e_netdev_attach_profile+0x1f/0x90 [mlx5_core] mlx5e_netdev_change_profile+0x92/0x160 [mlx5_core] mlx5e_netdev_attach_nic_profile+0x1b/0x30 [mlx5_core] mlx5e_vport_rep_unload+0xaa/0xc0 [mlx5_core] __esw_offloads_unload_rep+0x52/0x60 [mlx5_core] mlx5_esw_offloads_rep_unload+0x52/0x70 [mlx5_core] esw_offloads_unload_rep+0x34/0x70 [mlx5_core] esw_offloads_disable+0x2b/0x90 [mlx5_core] mlx5_eswitch_disable_locked+0x1b9/0x210 [mlx5_core] mlx5_devlink_eswitch_mode_set+0xf5/0x630 [mlx5_core] ? devlink_get_from_attrs_lock+0x9e/0x110 devlink_nl_cmd_eswitch_set_doit+0x60/0xe0 genl_family_rcv_msg_doit.isra.0+0xc2/0x110 genl_rcv_msg+0x17d/0x2b0 ? devlink_get_from_attrs_lock+0x110/0x110 ? devlink_nl_cmd_eswitch_get_doit+0x290/0x290 ? devlink_pernet_pre_exit+0xf0/0xf0 ? genl_family_rcv_msg_doit.isra.0+0x110/0x110 netlink_rcv_skb+0x54/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x1f6/0x2c0 netlink_sendmsg+0x232/0x4a0 sock_sendmsg+0x38/0x60 ? _copy_from_user+0x2a/0x60 __sys_sendto+0x110/0x160 ? __count_memcg_events+0x48/0x90 ? handle_mm_fault+0x161/0x260 ? do_user_addr_fault+0x278/0x6e0 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7fe141b1340a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007fff61d03de8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000afab00 RCX: 00007fe141b1340a RDX: 0000000000000038 RSI: 0000000000afab00 RDI: 0000000000000003 RBP: 0000000000afa910 R08: 00007fe141d80200 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001 </TASK>
Fixes: 4d5ab0ad964d ("net/mlx5e: take into account device reconfiguration for xdp_features flag") Signed-off-by: Gal Pressman gal@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 1c820119e438..c27df14df145 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -5266,6 +5266,7 @@ void mlx5e_destroy_q_counters(struct mlx5e_priv *priv) static int mlx5e_nic_init(struct mlx5_core_dev *mdev, struct net_device *netdev) { + const bool take_rtnl = netdev->reg_state == NETREG_REGISTERED; struct mlx5e_priv *priv = netdev_priv(netdev); struct mlx5e_flow_steering *fs; int err; @@ -5294,9 +5295,19 @@ static int mlx5e_nic_init(struct mlx5_core_dev *mdev, mlx5_core_err(mdev, "TLS initialization failed, %d\n", err);
mlx5e_health_create_reporters(priv); + + /* If netdev is already registered (e.g. move from uplink to nic profile), + * RTNL lock must be held before triggering netdev notifiers. + */ + if (take_rtnl) + rtnl_lock(); + /* update XDP supported features */ mlx5e_set_xdp_feature(netdev);
+ if (take_rtnl) + rtnl_unlock(); + return 0; }
From: Jianbo Liu jianbol@nvidia.com
commit ac5da544a3c2047cbfd715acd9cec8380d7fe5c6 upstream.
The flow rule can be splited, and the extra post_act rules are added to post_act table. It's possible to trigger memleak when the rule forwards packets from internal port and over tunnel, in the case that, for example, CT 'new' state offload is allowed. As int_port object is assigned to the flow attribute of post_act rule, and its refcnt is incremented by mlx5e_tc_int_port_get(), but mlx5e_tc_int_port_put() is not called, the refcnt is never decremented, then int_port is never freed.
The kmemleak reports the following error: unreferenced object 0xffff888128204b80 (size 64): comm "handler20", pid 50121, jiffies 4296973009 (age 642.932s) hex dump (first 32 bytes): 01 00 00 00 19 00 00 00 03 f0 00 00 04 00 00 00 ................ 98 77 67 41 81 88 ff ff 98 77 67 41 81 88 ff ff .wgA.....wgA.... backtrace: [<00000000e992680d>] kmalloc_trace+0x27/0x120 [<000000009e945a98>] mlx5e_tc_int_port_get+0x3f3/0xe20 [mlx5_core] [<0000000035a537f0>] mlx5e_tc_add_fdb_flow+0x473/0xcf0 [mlx5_core] [<0000000070c2cec6>] __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core] [<000000005cc84048>] mlx5e_configure_flower+0xd40/0x4c40 [mlx5_core] [<000000004f8a2031>] mlx5e_rep_indr_offload.isra.0+0x10e/0x1c0 [mlx5_core] [<000000007df797dc>] mlx5e_rep_indr_setup_tc_cb+0x90/0x130 [mlx5_core] [<0000000016c15cc3>] tc_setup_cb_add+0x1cf/0x410 [<00000000a63305b4>] fl_hw_replace_filter+0x38f/0x670 [cls_flower] [<000000008bc9e77c>] fl_change+0x1fd5/0x4430 [cls_flower] [<00000000e7f766e4>] tc_new_tfilter+0x867/0x2010 [<00000000e101c0ef>] rtnetlink_rcv_msg+0x6fc/0x9f0 [<00000000e1111d44>] netlink_rcv_skb+0x12c/0x360 [<0000000082dd6c8b>] netlink_unicast+0x438/0x710 [<00000000fc568f70>] netlink_sendmsg+0x794/0xc50 [<0000000016e92590>] sock_sendmsg+0xc5/0x190
So fix this by moving int_port cleanup code to the flow attribute free helper, which is used by all the attribute free cases.
Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions") Signed-off-by: Jianbo Liu jianbol@nvidia.com Reviewed-by: Vlad Buslov vladbu@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -1943,9 +1943,7 @@ static void mlx5e_tc_del_fdb_flow(struct { struct mlx5_eswitch *esw = priv->mdev->priv.eswitch; struct mlx5_flow_attr *attr = flow->attr; - struct mlx5_esw_flow_attr *esw_attr;
- esw_attr = attr->esw_attr; mlx5e_put_flow_tunnel_id(flow);
remove_unready_flow(flow); @@ -1966,12 +1964,6 @@ static void mlx5e_tc_del_fdb_flow(struct
mlx5_tc_ct_match_del(get_ct_priv(priv), &flow->attr->ct_attr);
- if (esw_attr->int_port) - mlx5e_tc_int_port_put(mlx5e_get_int_port_priv(priv), esw_attr->int_port); - - if (esw_attr->dest_int_port) - mlx5e_tc_int_port_put(mlx5e_get_int_port_priv(priv), esw_attr->dest_int_port); - if (flow_flag_test(flow, L3_TO_L2_DECAP)) mlx5e_detach_decap(priv, flow);
@@ -4250,6 +4242,7 @@ static void mlx5_free_flow_attr_actions(struct mlx5e_tc_flow *flow, struct mlx5_flow_attr *attr) { struct mlx5_core_dev *counter_dev = get_flow_counter_dev(flow); + struct mlx5_esw_flow_attr *esw_attr;
if (!attr) return; @@ -4267,6 +4260,18 @@ mlx5_free_flow_attr_actions(struct mlx5e mlx5e_tc_detach_mod_hdr(flow->priv, flow, attr); }
+ if (mlx5e_is_eswitch_flow(flow)) { + esw_attr = attr->esw_attr; + + if (esw_attr->int_port) + mlx5e_tc_int_port_put(mlx5e_get_int_port_priv(flow->priv), + esw_attr->int_port); + + if (esw_attr->dest_int_port) + mlx5e_tc_int_port_put(mlx5e_get_int_port_priv(flow->priv), + esw_attr->dest_int_port); + } + mlx5_tc_ct_delete_flow(get_ct_priv(flow->priv), attr);
free_branch_attr(flow, attr->branch_true);
From: Yevgeny Kliteynik kliteyn@nvidia.com
commit 8bfe1e19fb96d89fce14302e35cba0cd9f39d0a1 upstream.
Fixing wrong calculation of the modify hdr pattern size, where the previously calculated number would not be enough to accommodate the required number of actions.
Fixes: da5d0027d666 ("net/mlx5: DR, Add cache for modify header pattern") Signed-off-by: Yevgeny Kliteynik kliteyn@nvidia.com Reviewed-by: Erez Shitrit erezsh@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ptrn.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ptrn.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ptrn.c index d6947fe13d56..8ca534ef5d03 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ptrn.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ptrn.c @@ -82,7 +82,7 @@ dr_ptrn_alloc_pattern(struct mlx5dr_ptrn_mgr *mgr, u32 chunk_size; u32 index;
- chunk_size = ilog2(num_of_actions); + chunk_size = ilog2(roundup_pow_of_two(num_of_actions)); /* HW modify action index granularity is at least 64B */ chunk_size = max_t(u32, chunk_size, DR_CHUNK_SIZE_8);
From: Daniel Jurgens danielj@nvidia.com
commit 2dc2b3922d3c0f52d3a792d15dcacfbc4cc76b8f upstream.
When querying eswitch functions 0 is a valid number of host VFs. After introducing ARM SRIOV falling through to getting the max value from PCI results in using the total VFs allowed on the ARM for the host.
Fixes: 86eec50beaf3 ("net/mlx5: Support querying max VFs from device"); Signed-off-by: Daniel Jurgens danielj@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c @@ -264,8 +264,7 @@ static u16 mlx5_get_max_vfs(struct mlx5_ host_total_vfs = MLX5_GET(query_esw_functions_out, out, host_params_context.host_total_vfs); kvfree(out); - if (host_total_vfs) - return host_total_vfs; + return host_total_vfs; }
done:
From: Chris Mi cmi@nvidia.com
commit 6b5926eb1c034affff3fb44a98cb8c67153847d8 upstream.
If having the following tc rule on stack device:
filter parent ffff: protocol ip pref 3 flower chain 1 filter parent ffff: protocol ip pref 3 flower chain 1 handle 0x1 dst_mac 24:25:d0:e1:00:00 src_mac 02:25:d0:25:01:02 eth_type ipv4 ct_state +trk+new in_hw in_hw_count 1 action order 1: ct commit zone 0 pipe index 2 ref 1 bind 1 installed 3807 sec used 3779 sec firstused 3800 sec Action statistics: Sent 120 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed
action order 2: tunnel_key set src_ip 192.168.1.25 dst_ip 192.168.1.26 key_id 4 dst_port 4789 csum pipe index 3 ref 1 bind 1 installed 3807 sec used 3779 sec firstused 3800 sec Action statistics: Sent 120 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed
action order 3: mirred (Egress Redirect to device vxlan1) stolen index 9 ref 1 bind 1 installed 3807 sec used 3779 sec firstused 3800 sec Action statistics: Sent 120 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed
When handling FIB events, the rule in post act will not be deleted. And because the post act rule has packet reformat and modify header actions, also will hit the following syndromes:
mlx5_core 0000:08:00.0: mlx5_cmd_out_err:829:(pid 11613): DEALLOC_MODIFY_HEADER_CONTEXT(0x941) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x1ab444), err(-22) mlx5_core 0000:08:00.0: mlx5_cmd_out_err:829:(pid 11613): DEALLOC_PACKET_REFORMAT_CONTEXT(0x93e) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x179e84), err(-22)
Fix it by unoffloading post act rule when handling FIB events.
Fixes: 314e1105831b ("net/mlx5e: Add post act offload/unoffload API") Signed-off-by: Chris Mi cmi@nvidia.com Reviewed-by: Vlad Buslov vladbu@nvidia.com Reviewed-by: Roi Dayan roid@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c @@ -1461,10 +1461,12 @@ static void mlx5e_invalidate_encap(struc attr = mlx5e_tc_get_encap_attr(flow); esw_attr = attr->esw_attr;
- if (flow_flag_test(flow, SLOW)) + if (flow_flag_test(flow, SLOW)) { mlx5e_tc_unoffload_from_slow_path(esw, flow); - else + } else { mlx5e_tc_unoffload_fdb_rules(esw, flow, flow->attr); + mlx5e_tc_unoffload_flow_post_acts(flow); + }
mlx5e_tc_detach_mod_hdr(priv, flow, attr); attr->modify_hdr = NULL;
From: Shay Drory shayd@nvidia.com
commit 86ed7b773c01ba71617538b3b107c33fd9cf90b8 upstream.
Cited patch introduced buckets in hash mode, but missed to update the ports/bucket check when modifying LAG. Fix the check.
Fixes: 352899f384d4 ("net/mlx5: Lag, use buckets in hash mode") Signed-off-by: Shay Drory shayd@nvidia.com Reviewed-by: Maor Gottlieb maorg@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c @@ -574,7 +574,7 @@ static int __mlx5_lag_modify_definers_de for (i = 0; i < ldev->ports; i++) { for (j = 0; j < ldev->buckets; j++) { idx = i * ldev->buckets + j; - if (ldev->v2p_map[i] == ports[i]) + if (ldev->v2p_map[idx] == ports[idx]) continue;
dest.vport.vhca_id = MLX5_CAP_GEN(ldev->pf[ports[idx] - 1].dev,
From: Moshe Shemesh moshe@nvidia.com
commit d006207625657322ba8251b6e7e829f9659755dc upstream.
When device is in error state, marked by the flag MLX5_DEVICE_STATE_INTERNAL_ERROR, the HW and PCI may not be accessible and so clock update work should be skipped. Furthermore, such access through PCI in error state, after calling mlx5_pci_disable_device() can result in failing to recover from pci errors.
Fixes: ef9814deafd0 ("net/mlx5e: Add HW timestamping (TS) support") Reported-and-tested-by: Ganesh G R ganeshgr@linux.ibm.com Closes: https://lore.kernel.org/netdev/9bdb9b9d-140a-7a28-f0de-2e64e873c068@nvidia.c... Signed-off-by: Moshe Shemesh moshe@nvidia.com Reviewed-by: Aya Levin ayal@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c @@ -221,10 +221,15 @@ static void mlx5_timestamp_overflow(stru clock = container_of(timer, struct mlx5_clock, timer); mdev = container_of(clock, struct mlx5_core_dev, clock);
+ if (mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) + goto out; + write_seqlock_irqsave(&clock->lock, flags); timecounter_read(&timer->tc); mlx5_update_clock_info_page(mdev); write_sequnlock_irqrestore(&clock->lock, flags); + +out: schedule_delayed_work(&timer->overflow_work, timer->overflow_period); }
From: Moshe Shemesh moshe@nvidia.com
commit aab8e1a200b926147db51e3f82fd07bb9edf6a98 upstream.
Handling pci errors should fully teardown and load back auxiliary devices, same as done through mlx5 health recovery flow.
Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend") Signed-off-by: Moshe Shemesh moshe@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1845,7 +1845,7 @@ static pci_ers_result_t mlx5_pci_err_det
mlx5_enter_error_state(dev, false); mlx5_error_sw_reset(dev); - mlx5_unload_one(dev, true); + mlx5_unload_one(dev, false); mlx5_drain_health_wq(dev); mlx5_pci_disable_device(dev);
From: Nick Child nnac123@linux.ibm.com
commit db17ba719bceb52f0ae4ebca0e4c17d9a3bebf05 upstream.
Ensure that all offsets in a login response buffer are within the size of the allocated response buffer. Any offsets or lengths that surpass the allocation are likely the result of an incomplete response buffer. In these cases, a full reset is necessary.
When attempting to login, the ibmvnic device will allocate a response buffer and pass a reference to the VIOS. The VIOS will then send the ibmvnic device a LOGIN_RSP CRQ to signal that the buffer has been filled with data. If the ibmvnic device does not get a response in 20 seconds, the old buffer is freed and a new login request is sent. With 2 outstanding requests, any LOGIN_RSP CRQ's could be for the older login request. If this is the case then the login response buffer (which is for the newer login request) could be incomplete and contain invalid data. Therefore, we must enforce strict sanity checks on the response buffer values.
Testing has shown that the `off_rxadd_buff_size` value is filled in last by the VIOS and will be the smoking gun for these circumstances.
Until VIOS can implement a mechanism for tracking outstanding response buffers and a method for mapping a LOGIN_RSP CRQ to a particular login response buffer, the best ibmvnic can do in this situation is perform a full reset.
Fixes: dff515a3e71d ("ibmvnic: Harden device login requests") Signed-off-by: Nick Child nnac123@linux.ibm.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20230809221038.51296-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/ibm/ibmvnic.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
--- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -5396,6 +5396,7 @@ static int handle_login_rsp(union ibmvni int num_tx_pools; int num_rx_pools; u64 *size_array; + u32 rsp_len; int i;
/* CHECK: Test/set of login_pending does not need to be atomic @@ -5447,6 +5448,23 @@ static int handle_login_rsp(union ibmvni ibmvnic_reset(adapter, VNIC_RESET_FATAL); return -EIO; } + + rsp_len = be32_to_cpu(login_rsp->len); + if (be32_to_cpu(login->login_rsp_len) < rsp_len || + rsp_len <= be32_to_cpu(login_rsp->off_txsubm_subcrqs) || + rsp_len <= be32_to_cpu(login_rsp->off_rxadd_subcrqs) || + rsp_len <= be32_to_cpu(login_rsp->off_rxadd_buff_size) || + rsp_len <= be32_to_cpu(login_rsp->off_supp_tx_desc)) { + /* This can happen if a login request times out and there are + * 2 outstanding login requests sent, the LOGIN_RSP crq + * could have been for the older login request. So we are + * parsing the newer response buffer which may be incomplete + */ + dev_err(dev, "FATAL: Login rsp offsets/lengths invalid\n"); + ibmvnic_reset(adapter, VNIC_RESET_FATAL); + return -EIO; + } + size_array = (u64 *)((u8 *)(adapter->login_rsp_buf) + be32_to_cpu(adapter->login_rsp_buf->off_rxadd_buff_size)); /* variable buffer sizes are not supported, so just read the
From: Nick Child nnac123@linux.ibm.com
commit 411c565b4bc63e9584a8493882bd566e35a90588 upstream.
If the LOGIN CRQ fails to send then we must DMA unmap the response buffer. Previously, if the CRQ failed then the memory was freed without DMA unmapping.
Fixes: c98d9cc4170d ("ibmvnic: send_login should check for crq errors") Signed-off-by: Nick Child nnac123@linux.ibm.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20230809221038.51296-2-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/ibm/ibmvnic.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -4830,11 +4830,14 @@ static int send_login(struct ibmvnic_ada if (rc) { adapter->login_pending = false; netdev_err(adapter->netdev, "Failed to send login, rc=%d\n", rc); - goto buf_rsp_map_failed; + goto buf_send_failed; }
return 0;
+buf_send_failed: + dma_unmap_single(dev, rsp_buffer_token, rsp_buffer_size, + DMA_FROM_DEVICE); buf_rsp_map_failed: kfree(login_rsp_buffer); adapter->login_rsp_buf = NULL;
From: Nick Child nnac123@linux.ibm.com
commit d78a671eb8996af19d6311ecdee9790d2fa479f0 upstream.
Rather than leaving the DMA unmapping of the login buffers to the login response handler, move this work into the login release functions. Previously, these functions were only used for freeing the allocated buffers. This could lead to issues if there are more than one outstanding login buffer requests, which is possible if a login request times out.
If a login request times out, then there is another call to send login. The send login function makes a call to the login buffer release function. In the past, this freed the buffers but did not DMA unmap. Therefore, the VIOS could still write to the old login (now freed) buffer. It is for this reason that it is a good idea to leave the DMA unmap call to the login buffers release function.
Since the login buffer release functions now handle DMA unmapping, remove the duplicate DMA unmapping in handle_login_rsp().
Fixes: dff515a3e71d ("ibmvnic: Harden device login requests") Signed-off-by: Nick Child nnac123@linux.ibm.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20230809221038.51296-3-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/ibm/ibmvnic.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
--- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1588,12 +1588,22 @@ static int ibmvnic_login(struct net_devi
static void release_login_buffer(struct ibmvnic_adapter *adapter) { + if (!adapter->login_buf) + return; + + dma_unmap_single(&adapter->vdev->dev, adapter->login_buf_token, + adapter->login_buf_sz, DMA_TO_DEVICE); kfree(adapter->login_buf); adapter->login_buf = NULL; }
static void release_login_rsp_buffer(struct ibmvnic_adapter *adapter) { + if (!adapter->login_rsp_buf) + return; + + dma_unmap_single(&adapter->vdev->dev, adapter->login_rsp_buf_token, + adapter->login_rsp_buf_sz, DMA_FROM_DEVICE); kfree(adapter->login_rsp_buf); adapter->login_rsp_buf = NULL; } @@ -5411,11 +5421,6 @@ static int handle_login_rsp(union ibmvni } adapter->login_pending = false;
- dma_unmap_single(dev, adapter->login_buf_token, adapter->login_buf_sz, - DMA_TO_DEVICE); - dma_unmap_single(dev, adapter->login_rsp_buf_token, - adapter->login_rsp_buf_sz, DMA_FROM_DEVICE); - /* If the number of queues requested can't be allocated by the * server, the login response will return with code 1. We will need * to resend the login buffer with fewer queues requested.
From: Nick Child nnac123@linux.ibm.com
commit 23cc5f667453ca7645a24c8d21bf84dbf61107b2 upstream.
Perform a partial reset before sending a login request if any of the following are true: 1. If a previous request times out. This can be dangerous because the VIOS could still receive the old login request at any point after the timeout. Therefore, it is best to re-register the CRQ's and sub-CRQ's before retrying. 2. If the previous request returns an error that is not described in PAPR. PAPR provides procedures if the login returns with partial success or aborted return codes (section L.5.1) but other values do not have a defined procedure. Previously, these conditions just returned error from the login function rather than trying to resolve the issue. This can cause further issues since most callers of the login function are not prepared to handle an error when logging in. This improper cleanup can lead to the device being permanently DOWN'd. For example, if the VIOS believes that the device is already logged in then it will return INVALID_STATE (-7). If we never re-register CRQ's then it will always think that the device is already logged in. This leaves the device inoperable.
The partial reset involves freeing the sub-CRQs, freeing the CRQ then registering and initializing a new CRQ and sub-CRQs. This essentially restarts all communication with VIOS to allow for a fresh login attempt that will be unhindered by any previous failed attempts.
Fixes: dff515a3e71d ("ibmvnic: Harden device login requests") Signed-off-by: Nick Child nnac123@linux.ibm.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20230809221038.51296-4-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/ibm/ibmvnic.c | 46 ++++++++++++++++++++++++++++++++----- 1 file changed, 40 insertions(+), 6 deletions(-)
--- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -97,6 +97,8 @@ static int pending_scrq(struct ibmvnic_a static union sub_crq *ibmvnic_next_scrq(struct ibmvnic_adapter *, struct ibmvnic_sub_crq_queue *); static int ibmvnic_poll(struct napi_struct *napi, int data); +static int reset_sub_crq_queues(struct ibmvnic_adapter *adapter); +static inline void reinit_init_done(struct ibmvnic_adapter *adapter); static void send_query_map(struct ibmvnic_adapter *adapter); static int send_request_map(struct ibmvnic_adapter *, dma_addr_t, u32, u8); static int send_request_unmap(struct ibmvnic_adapter *, u8); @@ -1527,11 +1529,9 @@ static int ibmvnic_login(struct net_devi
if (!wait_for_completion_timeout(&adapter->init_done, timeout)) { - netdev_warn(netdev, "Login timed out, retrying...\n"); - retry = true; - adapter->init_done_rc = 0; - retry_count++; - continue; + netdev_warn(netdev, "Login timed out\n"); + adapter->login_pending = false; + goto partial_reset; }
if (adapter->init_done_rc == ABORTED) { @@ -1576,7 +1576,41 @@ static int ibmvnic_login(struct net_devi } else if (adapter->init_done_rc) { netdev_warn(netdev, "Adapter login failed, init_done_rc = %d\n", adapter->init_done_rc); - return -EIO; + +partial_reset: + /* adapter login failed, so free any CRQs or sub-CRQs + * and register again before attempting to login again. + * If we don't do this then the VIOS may think that + * we are already logged in and reject any subsequent + * attempts + */ + netdev_warn(netdev, + "Freeing and re-registering CRQs before attempting to login again\n"); + retry = true; + adapter->init_done_rc = 0; + retry_count++; + release_sub_crqs(adapter, true); + reinit_init_done(adapter); + release_crq_queue(adapter); + /* If we don't sleep here then we risk an unnecessary + * failover event from the VIOS. This is a known VIOS + * issue caused by a vnic device freeing and registering + * a CRQ too quickly. + */ + msleep(1500); + rc = init_crq_queue(adapter); + if (rc) { + netdev_err(netdev, "login recovery: init CRQ failed %d\n", + rc); + return -EIO; + } + + rc = ibmvnic_reset_init(adapter, false); + if (rc) { + netdev_err(netdev, "login recovery: Reset init failed %d\n", + rc); + return -EIO; + } } } while (retry);
From: Nick Child nnac123@linux.ibm.com
commit 6db541ae279bd4e76dbd939e5fbf298396166242 upstream.
If a login request fails, the recovery process should be protected against parallel resets. It is a known issue that freeing and registering CRQ's in quick succession can result in a failover CRQ from the VIOS. Processing a failover during login recovery is dangerous for two reasons: 1. This will result in two parallel initialization processes, this can cause serious issues during login. 2. It is possible that the failover CRQ is received but never executed. We get notified of a pending failover through a transport event CRQ. The reset is not performed until a INIT CRQ request is received. Previously, if CRQ init fails during login recovery, then the ibmvnic irq is freed and the login process returned error. If failover_pending is true (a transport event was received), then the ibmvnic device would never be able to process the reset since it cannot receive the CRQ_INIT request due to the irq being freed. This leaved the device in a inoperable state.
Therefore, the login failure recovery process must be hardened against these possible issues. Possible failovers (due to quick CRQ free and init) must be avoided and any issues during re-initialization should be dealt with instead of being propagated up the stack. This logic is similar to that of ibmvnic_probe().
Fixes: dff515a3e71d ("ibmvnic: Harden device login requests") Signed-off-by: Nick Child nnac123@linux.ibm.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20230809221038.51296-5-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/ibm/ibmvnic.c | 70 +++++++++++++++++++++++++------------ 1 file changed, 48 insertions(+), 22 deletions(-)
--- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -116,6 +116,7 @@ static void ibmvnic_tx_scrq_clean_buffer static void free_long_term_buff(struct ibmvnic_adapter *adapter, struct ibmvnic_long_term_buff *ltb); static void ibmvnic_disable_irqs(struct ibmvnic_adapter *adapter); +static void flush_reset_queue(struct ibmvnic_adapter *adapter);
struct ibmvnic_stat { char name[ETH_GSTRING_LEN]; @@ -1507,8 +1508,8 @@ static const char *adapter_state_to_stri
static int ibmvnic_login(struct net_device *netdev) { + unsigned long flags, timeout = msecs_to_jiffies(20000); struct ibmvnic_adapter *adapter = netdev_priv(netdev); - unsigned long timeout = msecs_to_jiffies(20000); int retry_count = 0; int retries = 10; bool retry; @@ -1573,6 +1574,7 @@ static int ibmvnic_login(struct net_devi "SCRQ irq initialization failed\n"); return rc; } + /* Default/timeout error handling, reset and start fresh */ } else if (adapter->init_done_rc) { netdev_warn(netdev, "Adapter login failed, init_done_rc = %d\n", adapter->init_done_rc); @@ -1588,29 +1590,53 @@ partial_reset: "Freeing and re-registering CRQs before attempting to login again\n"); retry = true; adapter->init_done_rc = 0; - retry_count++; release_sub_crqs(adapter, true); - reinit_init_done(adapter); - release_crq_queue(adapter); - /* If we don't sleep here then we risk an unnecessary - * failover event from the VIOS. This is a known VIOS - * issue caused by a vnic device freeing and registering - * a CRQ too quickly. + /* Much of this is similar logic as ibmvnic_probe(), + * we are essentially re-initializing communication + * with the server. We really should not run any + * resets/failovers here because this is already a form + * of reset and we do not want parallel resets occurring */ - msleep(1500); - rc = init_crq_queue(adapter); - if (rc) { - netdev_err(netdev, "login recovery: init CRQ failed %d\n", - rc); - return -EIO; - } - - rc = ibmvnic_reset_init(adapter, false); - if (rc) { - netdev_err(netdev, "login recovery: Reset init failed %d\n", - rc); - return -EIO; - } + do { + reinit_init_done(adapter); + /* Clear any failovers we got in the previous + * pass since we are re-initializing the CRQ + */ + adapter->failover_pending = false; + release_crq_queue(adapter); + /* If we don't sleep here then we risk an + * unnecessary failover event from the VIOS. + * This is a known VIOS issue caused by a vnic + * device freeing and registering a CRQ too + * quickly. + */ + msleep(1500); + /* Avoid any resets, since we are currently + * resetting. + */ + spin_lock_irqsave(&adapter->rwi_lock, flags); + flush_reset_queue(adapter); + spin_unlock_irqrestore(&adapter->rwi_lock, + flags); + + rc = init_crq_queue(adapter); + if (rc) { + netdev_err(netdev, "login recovery: init CRQ failed %d\n", + rc); + return -EIO; + } + + rc = ibmvnic_reset_init(adapter, false); + if (rc) + netdev_err(netdev, "login recovery: Reset init failed %d\n", + rc); + /* IBMVNIC_CRQ_INIT will return EAGAIN if it + * fails, since ibmvnic_reset_init will free + * irq's in failure, we won't be able to receive + * new CRQs so we need to keep trying. probe() + * handles this similarly. + */ + } while (rc == -EAGAIN && retry_count++ < retries); } } while (retry);
From: William Breathitt Gray william.gray@linaro.org
commit 33f83d13ded164cd49ce2a3bd2770115abc64e6f upstream.
The WinSystems WS16C48 I/O address region spans offsets 0x0 through 0xA, which is a total of 11 bytes. Fix the WS16C48_EXTENT define to the correct value of 11 so that access to necessary device registers is properly requested in the ws16c48_probe() callback by the devm_request_region() function call.
Fixes: 2c05a0f29f41 ("gpio: ws16c48: Implement and utilize register structures") Cc: stable@vger.kernel.org Cc: Paul Demetrotion pdemetrotion@winsystems.com Signed-off-by: William Breathitt Gray william.gray@linaro.org Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Reviewed-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpio-ws16c48.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpio/gpio-ws16c48.c +++ b/drivers/gpio/gpio-ws16c48.c @@ -18,7 +18,7 @@ #include <linux/spinlock.h> #include <linux/types.h>
-#define WS16C48_EXTENT 10 +#define WS16C48_EXTENT 11 #define MAX_NUM_WS16C48 max_num_isa_dev(WS16C48_EXTENT)
static unsigned int base[MAX_NUM_WS16C48];
From: Bartosz Golaszewski bartosz.golaszewski@linaro.org
commit 5a78d5db9c90c9dc84212f40a5f2687b7cafc8ec upstream.
Simulated chips use a mutex for synchronization in driver callbacks so they must not be called from interrupt context. Set the can_sleep field of the GPIO chip to true to force users to only use threaded irqs.
Fixes: cb8c474e79be ("gpio: sim: new testing module") Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Reviewed-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpio-sim.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpio/gpio-sim.c +++ b/drivers/gpio/gpio-sim.c @@ -429,6 +429,7 @@ static int gpio_sim_add_bank(struct fwno gc->set_config = gpio_sim_set_config; gc->to_irq = gpio_sim_to_irq; gc->free = gpio_sim_free; + gc->can_sleep = true;
ret = devm_gpiochip_add_data(dev, gc, chip); if (ret)
From: Josef Bacik josef@toxicpanda.com
commit fc1f91b9231a28fba333f931a031bf776bc6ef0e upstream.
Recently we've been having mysterious hangs while running generic/475 on the CI system. This turned out to be something like this:
Task 1 dmsetup suspend --nolockfs -> __dm_suspend -> dm_wait_for_completion -> dm_wait_for_bios_completion -> Unable to complete because of IO's on a plug in Task 2
Task 2 wb_workfn -> wb_writeback -> blk_start_plug -> writeback_sb_inodes -> Infinite loop unable to make an allocation
Task 3 cache_block_group ->read_extent_buffer_pages ->Waiting for IO to complete that can't be submitted because Task 1 suspended the DM device
The problem here is that we need Task 2 to be scheduled completely for the blk plug to flush. Normally this would happen, we normally wait for the block group caching to finish (Task 3), and this schedule would result in the block plug flushing.
However if there's enough free space available from the current caching to satisfy the allocation we won't actually wait for the caching to complete. This check however just checks that we have enough space, not that we can make the allocation. In this particular case we were trying to allocate 9MiB, and we had 10MiB of free space, but we didn't have 9MiB of contiguous space to allocate, and thus the allocation failed and we looped.
We specifically don't cycle through the FFE loop until we stop finding cached block groups because we don't want to allocate new block groups just because we're caching, so we short circuit the normal loop once we hit LOOP_CACHING_WAIT and we found a caching block group.
This is normally fine, except in this particular case where the caching thread can't make progress because the DM device has been suspended.
Fix this by not only waiting for free space to >= the amount of space we want to allocate, but also that we make some progress in caching from the time we start waiting. This will keep us from busy looping when the caching is taking a while but still theoretically has enough space for us to allocate from, and fixes this particular case by forcing us to actually sleep and wait for forward progress, which will flush the plug.
With this fix we're no longer hanging with generic/475.
CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Boris Burkov boris@bur.io Signed-off-by: Josef Bacik josef@toxicpanda.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/block-group.c | 17 +++++++++++++++-- fs/btrfs/block-group.h | 2 ++ 2 files changed, 17 insertions(+), 2 deletions(-)
--- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -441,13 +441,23 @@ void btrfs_wait_block_group_cache_progre u64 num_bytes) { struct btrfs_caching_control *caching_ctl; + int progress;
caching_ctl = btrfs_get_caching_control(cache); if (!caching_ctl) return;
+ /* + * We've already failed to allocate from this block group, so even if + * there's enough space in the block group it isn't contiguous enough to + * allow for an allocation, so wait for at least the next wakeup tick, + * or for the thing to be done. + */ + progress = atomic_read(&caching_ctl->progress); + wait_event(caching_ctl->wait, btrfs_block_group_done(cache) || - (cache->free_space_ctl->free_space >= num_bytes)); + (progress != atomic_read(&caching_ctl->progress) && + (cache->free_space_ctl->free_space >= num_bytes)));
btrfs_put_caching_control(caching_ctl); } @@ -802,8 +812,10 @@ next:
if (total_found > CACHING_CTL_WAKE_UP) { total_found = 0; - if (wakeup) + if (wakeup) { + atomic_inc(&caching_ctl->progress); wake_up(&caching_ctl->wait); + } } } path->slots[0]++; @@ -910,6 +922,7 @@ int btrfs_cache_block_group(struct btrfs init_waitqueue_head(&caching_ctl->wait); caching_ctl->block_group = cache; refcount_set(&caching_ctl->count, 2); + atomic_set(&caching_ctl->progress, 0); btrfs_init_work(&caching_ctl->work, caching_thread, NULL, NULL);
spin_lock(&cache->lock); --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -85,6 +85,8 @@ struct btrfs_caching_control { wait_queue_head_t wait; struct btrfs_work work; struct btrfs_block_group *block_group; + /* Track progress of caching during allocation. */ + atomic_t progress; refcount_t count; };
From: Christoph Hellwig hch@lst.de
commit effa24f689ce0948f68c754991a445a8d697d3a8 upstream.
extent_write_cache_pages stops writing pages as soon as nr_to_write hits zero. That is the right thing for opportunistic writeback, but incorrect for data integrity writeback, which needs to ensure that no dirty pages are left in the range. Thus only stop the writeback for WB_SYNC_NONE if nr_to_write hits 0.
This is a port of write_cache_pages changes in commit 05fe478dd04e ("mm: write_cache_pages integrity fix").
Note that I've only trigger the problem with other changes to the btrfs writeback code, but this condition seems worthwhile fixing anyway.
CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: David Sterba dsterba@suse.com [ updated comment ] Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent_io.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
--- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2181,11 +2181,12 @@ retry: }
/* - * the filesystem may choose to bump up nr_to_write. + * The filesystem may choose to bump up nr_to_write. * We have to make sure to honor the new nr_to_write - * at any time + * at any time. */ - nr_to_write_done = wbc->nr_to_write <= 0; + nr_to_write_done = (wbc->sync_mode == WB_SYNC_NONE && + wbc->nr_to_write <= 0); } folio_batch_release(&fbatch); cond_resched();
From: Christoph Hellwig hch@lst.de
commit 5c25699871112853f231e52d51c576d5c759a020 upstream.
__extent_writepage could have started on more pages than the one it was called for. This happens regularly for zoned file systems, and in theory could happen for compressed I/O if the worker thread was executed very quickly. For such pages extent_write_cache_pages waits for writeback to complete before moving on to the next page, which is highly inefficient as it blocks the flusher thread.
Port over the PageDirty check that was added to write_cache_pages in commit 515f4a037fb ("mm: write_cache_pages optimise page cleaning") to fix this.
CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent_io.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2345,6 +2345,12 @@ retry: continue; }
+ if (!folio_test_dirty(folio)) { + /* Someone wrote it for us. */ + folio_unlock(folio); + continue; + } + if (wbc->sync_mode != WB_SYNC_NONE) { if (folio_test_writeback(folio)) submit_write_bio(bio_ctrl, 0);
From: Christoph Hellwig hch@lst.de
commit 12b2d64e591652a2d97dd3afa2b062ca7a4ba352 upstream.
When the call to btrfs_reloc_clone_csums in cow_file_range returns an error, we jump to the out_unlock label with the extent_reserved variable set to false. The cleanup at the label will then call extent_clear_unlock_delalloc on the range from start to end. But we've already added cur_alloc_size to start before the jump, so there might no range be left from the newly incremented start to end. Move the check for 'start < end' so that it is reached by also for the !extent_reserved case.
CC: stable@vger.kernel.org # 6.1+ Fixes: a315e68f6e8b ("Btrfs: fix invalid attempt to free reserved space on failure to cow range") Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/inode.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1453,8 +1453,6 @@ out_unlock: clear_bits, page_ops); start += cur_alloc_size; - if (start >= end) - return ret; }
/* @@ -1463,9 +1461,11 @@ out_unlock: * space_info's bytes_may_use counter, reserved in * btrfs_check_data_free_space(). */ - extent_clear_unlock_delalloc(inode, start, end, locked_page, - clear_bits | EXTENT_CLEAR_DATA_RESV, - page_ops); + if (start < end) { + clear_bits |= EXTENT_CLEAR_DATA_RESV; + extent_clear_unlock_delalloc(inode, start, end, locked_page, + clear_bits, page_ops); + } return ret; }
From: Qu Wenruo wqu@suse.com
commit 05d7ce504545f7874529701664c90814ca645c5d upstream.
[BUG] Syzbot reported a crash that an ASSERT() got triggered inside prepare_to_merge().
[CAUSE] The root cause of the triggered ASSERT() is we can have a race between quota tree creation and relocation.
This leads us to create a duplicated quota tree in the btrfs_read_fs_root() path, and since it's treated as fs tree, it would have ROOT_SHAREABLE flag, causing us to create a reloc tree for it.
The bug itself is fixed by a dedicated patch for it, but this already taught us the ASSERT() is not something straightforward for developers.
[ENHANCEMENT] Instead of using an ASSERT(), let's handle it gracefully and output extra info about the mismatch reloc roots to help debug.
Also with the above ASSERT() removed, we can trigger ASSERT(0)s inside merge_reloc_roots() later. Also replace those ASSERT(0)s with WARN_ON()s.
CC: stable@vger.kernel.org # 5.15+ Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/relocation.c | 45 +++++++++++++++++++++++++++++++++++++-------- 1 file changed, 37 insertions(+), 8 deletions(-)
--- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1916,7 +1916,39 @@ again: err = PTR_ERR(root); break; } - ASSERT(root->reloc_root == reloc_root); + + if (unlikely(root->reloc_root != reloc_root)) { + if (root->reloc_root) { + btrfs_err(fs_info, +"reloc tree mismatch, root %lld has reloc root key (%lld %u %llu) gen %llu, expect reloc root key (%lld %u %llu) gen %llu", + root->root_key.objectid, + root->reloc_root->root_key.objectid, + root->reloc_root->root_key.type, + root->reloc_root->root_key.offset, + btrfs_root_generation( + &root->reloc_root->root_item), + reloc_root->root_key.objectid, + reloc_root->root_key.type, + reloc_root->root_key.offset, + btrfs_root_generation( + &reloc_root->root_item)); + } else { + btrfs_err(fs_info, +"reloc tree mismatch, root %lld has no reloc root, expect reloc root key (%lld %u %llu) gen %llu", + root->root_key.objectid, + reloc_root->root_key.objectid, + reloc_root->root_key.type, + reloc_root->root_key.offset, + btrfs_root_generation( + &reloc_root->root_item)); + } + list_add(&reloc_root->root_list, &reloc_roots); + btrfs_put_root(root); + btrfs_abort_transaction(trans, -EUCLEAN); + if (!err) + err = -EUCLEAN; + break; + }
/* * set reference count to 1, so btrfs_recover_relocation @@ -1989,7 +2021,7 @@ again: root = btrfs_get_fs_root(fs_info, reloc_root->root_key.offset, false); if (btrfs_root_refs(&reloc_root->root_item) > 0) { - if (IS_ERR(root)) { + if (WARN_ON(IS_ERR(root))) { /* * For recovery we read the fs roots on mount, * and if we didn't find the root then we marked @@ -1998,17 +2030,14 @@ again: * memory. However there's no reason we can't * handle the error properly here just in case. */ - ASSERT(0); ret = PTR_ERR(root); goto out; } - if (root->reloc_root != reloc_root) { + if (WARN_ON(root->reloc_root != reloc_root)) { /* - * This is actually impossible without something - * going really wrong (like weird race condition - * or cosmic rays). + * This can happen if on-disk metadata has some + * corruption, e.g. bad reloc tree key offset. */ - ASSERT(0); ret = -EINVAL; goto out; }
From: Qu Wenruo wqu@suse.com
commit 6ebcd021c92b8e4b904552e4d87283032100796d upstream.
[BUG] Syzbot reported a crash that an ASSERT() got triggered inside prepare_to_merge().
That ASSERT() makes sure the reloc tree is properly pointed back by its subvolume tree.
[CAUSE] After more debugging output, it turns out we had an invalid reloc tree:
BTRFS error (device loop1): reloc tree mismatch, root 8 has no reloc root, expect reloc root key (-8, 132, 8) gen 17
Note the above root key is (TREE_RELOC_OBJECTID, ROOT_ITEM, QUOTA_TREE_OBJECTID), meaning it's a reloc tree for quota tree.
But reloc trees can only exist for subvolumes, as for non-subvolume trees, we just COW the involved tree block, no need to create a reloc tree since those tree blocks won't be shared with other trees.
Only subvolumes tree can share tree blocks with other trees (thus they have BTRFS_ROOT_SHAREABLE flag).
Thus this new debug output proves my previous assumption that corrupted on-disk data can trigger that ASSERT().
[FIX] Besides the dedicated fix and the graceful exit, also let tree-checker to check such root keys, to make sure reloc trees can only exist for subvolumes.
CC: stable@vger.kernel.org # 5.15+ Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/disk-io.c | 3 ++- fs/btrfs/tree-checker.c | 14 ++++++++++++++ 2 files changed, 16 insertions(+), 1 deletion(-)
--- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1351,7 +1351,8 @@ static int btrfs_init_fs_root(struct btr btrfs_drew_lock_init(&root->snapshot_lock);
if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID && - !btrfs_is_data_reloc_root(root)) { + !btrfs_is_data_reloc_root(root) && + is_fstree(root->root_key.objectid)) { set_bit(BTRFS_ROOT_SHAREABLE, &root->state); btrfs_check_and_init_root_item(&root->root_item); } --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -446,6 +446,20 @@ static int check_root_key(struct extent_ btrfs_item_key_to_cpu(leaf, &item_key, slot); is_root_item = (item_key.type == BTRFS_ROOT_ITEM_KEY);
+ /* + * Bad rootid for reloc trees. + * + * Reloc trees are only for subvolume trees, other trees only need + * to be COWed to be relocated. + */ + if (unlikely(is_root_item && key->objectid == BTRFS_TREE_RELOC_OBJECTID && + !is_fstree(key->offset))) { + generic_err(leaf, slot, + "invalid reloc tree for root %lld, root id is not a subvolume tree", + key->offset); + return -EUCLEAN; + } + /* No such tree id */ if (unlikely(key->objectid == 0)) { if (is_root_item)
From: Josef Bacik josef@toxicpanda.com
commit 92fb94b69c6accf1e49fff699640fa0ce03dc910 upstream.
We set cache_block_group_error if btrfs_cache_block_group() returns an error, this is because we could end up not finding space to allocate and mistakenly return -ENOSPC, and which could then abort the transaction with the incorrect errno, and in the case of ENOSPC result in a WARN_ON() that will trip up tests like generic/475.
However there's the case where multiple threads can be racing, one thread gets the proper error, and the other thread doesn't actually call btrfs_cache_block_group(), it instead sees ->cached == BTRFS_CACHE_ERROR. Again the result is the same, we fail to allocate our space and return -ENOSPC. Instead we need to set cache_block_group_error to -EIO in this case to make sure that if we do not make our allocation we get the appropriate error returned back to the caller.
CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Josef Bacik josef@toxicpanda.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent-tree.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4318,8 +4318,11 @@ have_block_group: ret = 0; }
- if (unlikely(block_group->cached == BTRFS_CACHE_ERROR)) + if (unlikely(block_group->cached == BTRFS_CACHE_ERROR)) { + if (!cache_block_group_error) + cache_block_group_error = -EIO; goto loop; + }
if (!find_free_extent_check_size_class(ffe_ctl, block_group)) goto loop;
From: Tony Battersby tonyb@cybernetics.com
commit 9426d3cef5000824e5f24f80ed5f42fb935f2488 upstream.
(lightly modified commit message mostly by Linus Torvalds)
The parsing code for /proc/scsi/scsi is disgusting and broken. We should have just used 'sscanf()' or something simple like that, but the logic may actually predate our kernel sscanf library routine for all I know. It certainly predates both git and BK histories.
And we can't change it to be something sane like that now, because the string matching at the start is done case-insensitively, and the separator parsing between numbers isn't done at all, so *any* separator will work, including a possible terminating NUL character.
This interface is root-only, and entirely for legacy use, so there is absolutely no point in trying to tighten up the parsing. Because any separator has traditionally worked, it's entirely possible that people have used random characters rather than the suggested space.
So don't bother to try to pretty it up, and let's just make a minimal patch that can be back-ported and we can forget about this whole sorry thing for another two decades.
Just make it at least not read past the end of the supplied data.
Link: https://lore.kernel.org/linux-scsi/b570f5fe-cb7c-863a-6ed9-f6774c219b88@cybe... Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Martin K Petersen martin.petersen@oracle.com Cc: James Bottomley jejb@linux.ibm.com Cc: Willy Tarreau w@1wt.eu Cc: stable@kernel.org Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Tony Battersby tonyb@cybernetics.com Signed-off-by: Martin K Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/scsi_proc.c | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-)
--- a/drivers/scsi/scsi_proc.c +++ b/drivers/scsi/scsi_proc.c @@ -406,7 +406,7 @@ static ssize_t proc_scsi_write(struct fi size_t length, loff_t *ppos) { int host, channel, id, lun; - char *buffer, *p; + char *buffer, *end, *p; int err;
if (!buf || length > PAGE_SIZE) @@ -421,10 +421,14 @@ static ssize_t proc_scsi_write(struct fi goto out;
err = -EINVAL; - if (length < PAGE_SIZE) - buffer[length] = '\0'; - else if (buffer[PAGE_SIZE-1]) - goto out; + if (length < PAGE_SIZE) { + end = buffer + length; + *end = '\0'; + } else { + end = buffer + PAGE_SIZE - 1; + if (*end) + goto out; + }
/* * Usage: echo "scsi add-single-device 0 1 2 3" >/proc/scsi/scsi @@ -433,10 +437,10 @@ static ssize_t proc_scsi_write(struct fi if (!strncmp("scsi add-single-device", buffer, 22)) { p = buffer + 23;
- host = simple_strtoul(p, &p, 0); - channel = simple_strtoul(p + 1, &p, 0); - id = simple_strtoul(p + 1, &p, 0); - lun = simple_strtoul(p + 1, &p, 0); + host = (p < end) ? simple_strtoul(p, &p, 0) : 0; + channel = (p + 1 < end) ? simple_strtoul(p + 1, &p, 0) : 0; + id = (p + 1 < end) ? simple_strtoul(p + 1, &p, 0) : 0; + lun = (p + 1 < end) ? simple_strtoul(p + 1, &p, 0) : 0;
err = scsi_add_single_device(host, channel, id, lun);
@@ -447,10 +451,10 @@ static ssize_t proc_scsi_write(struct fi } else if (!strncmp("scsi remove-single-device", buffer, 25)) { p = buffer + 26;
- host = simple_strtoul(p, &p, 0); - channel = simple_strtoul(p + 1, &p, 0); - id = simple_strtoul(p + 1, &p, 0); - lun = simple_strtoul(p + 1, &p, 0); + host = (p < end) ? simple_strtoul(p, &p, 0) : 0; + channel = (p + 1 < end) ? simple_strtoul(p + 1, &p, 0) : 0; + id = (p + 1 < end) ? simple_strtoul(p + 1, &p, 0) : 0; + lun = (p + 1 < end) ? simple_strtoul(p + 1, &p, 0) : 0;
err = scsi_remove_single_device(host, channel, id, lun); }
From: Michael Kelley mikelley@microsoft.com
commit 175544ad48cbf56affeef2a679c6a4d4fb1e2881 upstream.
Hyper-V provides the ability to connect Fibre Channel LUNs to the host system and present them in a guest VM as a SCSI device. I/O to the vFC device is handled by the storvsc driver. The storvsc driver includes a partial integration with the FC transport implemented in the generic portion of the Linux SCSI subsystem so that FC attributes can be displayed in /sys. However, the partial integration means that some aspects of vFC don't work properly. Unfortunately, a full and correct integration isn't practical because of limitations in what Hyper-V provides to the guest.
In particular, in the context of Hyper-V storvsc, the FC transport timeout function fc_eh_timed_out() causes a kernel panic because it can't find the rport and dereferences a NULL pointer. The original patch that added the call from storvsc_eh_timed_out() to fc_eh_timed_out() is faulty in this regard.
In many cases a timeout is due to a transient condition, so the situation can be improved by just continuing to wait like with other I/O requests issued by storvsc, and avoiding the guaranteed panic. For a permanent failure, continuing to wait may result in a hung thread instead of a panic, which again may be better.
So fix the panic by removing the storvsc call to fc_eh_timed_out(). This allows storvsc to keep waiting for a response. The change has been tested by users who experienced a panic in fc_eh_timed_out() due to transient timeouts, and it solves their problem.
In the future we may want to deprecate the vFC functionality in storvsc since it can't be fully fixed. But it has current users for whom it is working well enough, so it should probably stay for a while longer.
Fixes: 3930d7309807 ("scsi: storvsc: use default I/O timeout handler for FC devices") Cc: stable@vger.kernel.org Signed-off-by: Michael Kelley mikelley@microsoft.com Link: https://lore.kernel.org/r/1690606764-79669-1-git-send-email-mikelley@microso... Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/storvsc_drv.c | 4 ---- 1 file changed, 4 deletions(-)
--- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -1672,10 +1672,6 @@ static int storvsc_host_reset_handler(st */ static enum scsi_timeout_action storvsc_eh_timed_out(struct scsi_cmnd *scmnd) { -#if IS_ENABLED(CONFIG_SCSI_FC_ATTRS) - if (scmnd->device->host->transportt == fc_transport_template) - return fc_eh_timed_out(scmnd); -#endif return SCSI_EH_RESET_TIMER; }
From: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com
commit b6d128f89a85771433a004e8656090ccbe1fb969 upstream.
Should use devm_kzalloc() for struct ufs_renesas_priv because the .initialized should be false as default.
Fixes: d69520288efd ("scsi: ufs: ufs-renesas: Add support for Renesas R-Car UFS controller") Signed-off-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Link: https://lore.kernel.org/r/20230803081812.1446282-1-yoshihiro.shimoda.uh@rene... Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ufs/host/ufs-renesas.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/ufs/host/ufs-renesas.c +++ b/drivers/ufs/host/ufs-renesas.c @@ -359,7 +359,7 @@ static int ufs_renesas_init(struct ufs_h { struct ufs_renesas_priv *priv;
- priv = devm_kmalloc(hba->dev, sizeof(*priv), GFP_KERNEL); + priv = devm_kzalloc(hba->dev, sizeof(*priv), GFP_KERNEL); if (!priv) return -ENOMEM; ufshcd_set_variant(hba, priv);
From: Alexandra Diupina adiupina@astralinux.ru
commit 8366d1f1249a0d0bba41d0bd1298d63e5d34c7f7 upstream.
Add a check for the command slot value to avoid dereferencing a NULL pointer.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Co-developed-by: Vladimir Telezhnikov vtelezhnikov@astralinux.ru Signed-off-by: Vladimir Telezhnikov vtelezhnikov@astralinux.ru Signed-off-by: Alexandra Diupina adiupina@astralinux.ru Link: https://lore.kernel.org/r/20230728123521.18293-1-adiupina@astralinux.ru Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/53c700.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/scsi/53c700.c +++ b/drivers/scsi/53c700.c @@ -1598,7 +1598,7 @@ NCR_700_intr(int irq, void *dev_id) printk("scsi%d (%d:%d) PHASE MISMATCH IN SEND MESSAGE %d remain, return %p[%04x], phase %s\n", host->host_no, pun, lun, count, (void *)temp, temp - hostdata->pScript, sbcl_to_string(NCR_700_readb(host, SBCL_REG))); #endif resume_offset = hostdata->pScript + Ent_SendMessagePhaseMismatch; - } else if(dsp >= to32bit(&slot->pSG[0].ins) && + } else if (slot && dsp >= to32bit(&slot->pSG[0].ins) && dsp <= to32bit(&slot->pSG[NCR_700_SG_SEGMENTS].ins)) { int data_transfer = NCR_700_readl(host, DBC_REG) & 0xffffff; int SGcount = (dsp - to32bit(&slot->pSG[0].ins))/sizeof(struct NCR_700_SG_List);
From: Zhu Wang wangzhu9@huawei.com
commit 41320b18a0e0dfb236dba4edb9be12dba1878156 upstream.
If device_add() returns error, the name allocated by dev_set_name() needs be freed. As the comment of device_add() says, put_device() should be used to give up the reference in the error path. So fix this by calling put_device(), then the name can be freed in kobject_cleanp().
Fixes: c8806b6c9e82 ("snic: driver for Cisco SCSI HBA") Signed-off-by: Zhu Wang wangzhu9@huawei.com Acked-by: Narsimhulu Musini nmusini@cisco.com Link: https://lore.kernel.org/r/20230801111421.63651-1-wangzhu9@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/snic/snic_disc.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/scsi/snic/snic_disc.c +++ b/drivers/scsi/snic/snic_disc.c @@ -303,6 +303,7 @@ snic_tgt_create(struct snic *snic, struc "Snic Tgt: device_add, with err = %d\n", ret);
+ put_device(&tgt->dev); put_device(&snic->shost->shost_gendev); spin_lock_irqsave(snic->shost->host_lock, flags); list_del(&tgt->list);
From: Zhu Wang wangzhu9@huawei.com
commit 04b5b5cb0136ce970333a9c6cec7e46adba1ea3a upstream.
If device_add() returns error, the name allocated by dev_set_name() needs be freed. As the comment of device_add() says, put_device() should be used to decrease the reference count in the error path. So fix this by calling put_device(), then the name can be freed in kobject_cleanp().
Fixes: ee959b00c335 ("SCSI: convert struct class_device to struct device") Signed-off-by: Zhu Wang wangzhu9@huawei.com Link: https://lore.kernel.org/r/20230803020230.226903-1-wangzhu9@huawei.com Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/raid_class.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/scsi/raid_class.c +++ b/drivers/scsi/raid_class.c @@ -248,6 +248,7 @@ int raid_component_add(struct raid_templ return 0;
err_out: + put_device(&rc->dev); list_del(&rc->node); rd->component_count--; put_device(component_dev);
From: Karan Tilak Kumar kartilak@cisco.com
commit 5a43b07a87835660f91d88a4db11abfea8c523b7 upstream.
fnic_clean_pending_aborts() was returning a non-zero value irrespective of failure or success. This caused the caller of this function to assume that the device reset had failed, even though it would succeed in most cases. As a consequence, a successful device reset would escalate to host reset.
Reviewed-by: Sesidhar Baddela sebaddel@cisco.com Tested-by: Karan Tilak Kumar kartilak@cisco.com Signed-off-by: Karan Tilak Kumar kartilak@cisco.com Link: https://lore.kernel.org/r/20230727193919.2519-1-kartilak@cisco.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/fnic/fnic.h | 2 +- drivers/scsi/fnic/fnic_scsi.c | 6 ++++-- 2 files changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/scsi/fnic/fnic.h +++ b/drivers/scsi/fnic/fnic.h @@ -27,7 +27,7 @@
#define DRV_NAME "fnic" #define DRV_DESCRIPTION "Cisco FCoE HBA Driver" -#define DRV_VERSION "1.6.0.54" +#define DRV_VERSION "1.6.0.55" #define PFX DRV_NAME ": " #define DFX DRV_NAME "%d: "
--- a/drivers/scsi/fnic/fnic_scsi.c +++ b/drivers/scsi/fnic/fnic_scsi.c @@ -2139,7 +2139,7 @@ static int fnic_clean_pending_aborts(str bool new_sc)
{ - int ret = SUCCESS; + int ret = 0; struct fnic_pending_aborts_iter_data iter_data = { .fnic = fnic, .lun_dev = lr_sc->device, @@ -2159,9 +2159,11 @@ static int fnic_clean_pending_aborts(str
/* walk again to check, if IOs are still pending in fw */ if (fnic_is_abts_pending(fnic, lr_sc)) - ret = FAILED; + ret = 1;
clean_pending_aborts_end: + FNIC_SCSI_DBG(KERN_INFO, fnic->lport->host, + "%s: exit status: %d\n", __func__, ret); return ret; }
From: Nilesh Javali njavali@marvell.com
commit 1516ee035df32115197cd93ae3619dba7b020986 upstream.
While performing certain power-off sequences, PCI drivers are called to suspend and resume their underlying devices through PCI PM (power management) interface. However the hardware does not support PCI PM suspend/resume operations so system wide suspend/resume leads to bad MFW (management firmware) state which causes various follow-up errors in driver when communicating with the device/firmware.
To fix this driver implements PCI PM suspend handler to indicate unsupported operation to the PCI subsystem explicitly, thus avoiding system to go into suspended/standby mode.
Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.") Signed-off-by: Nilesh Javali njavali@marvell.com Link: https://lore.kernel.org/r/20230807093725.46829-2-njavali@marvell.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/qedi/qedi_main.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
--- a/drivers/scsi/qedi/qedi_main.c +++ b/drivers/scsi/qedi/qedi_main.c @@ -69,6 +69,7 @@ static struct nvm_iscsi_block *qedi_get_ static void qedi_recovery_handler(struct work_struct *work); static void qedi_schedule_hw_err_handler(void *dev, enum qed_hw_err_type err_type); +static int qedi_suspend(struct pci_dev *pdev, pm_message_t state);
static int qedi_iscsi_event_cb(void *context, u8 fw_event_code, void *fw_handle) { @@ -2510,6 +2511,22 @@ static void qedi_shutdown(struct pci_dev __qedi_remove(pdev, QEDI_MODE_SHUTDOWN); }
+static int qedi_suspend(struct pci_dev *pdev, pm_message_t state) +{ + struct qedi_ctx *qedi; + + if (!pdev) { + QEDI_ERR(NULL, "pdev is NULL.\n"); + return -ENODEV; + } + + qedi = pci_get_drvdata(pdev); + + QEDI_ERR(&qedi->dbg_ctx, "%s: Device does not support suspend operation\n", __func__); + + return -EPERM; +} + static int __qedi_probe(struct pci_dev *pdev, int mode) { struct qedi_ctx *qedi; @@ -2868,6 +2885,7 @@ static struct pci_driver qedi_pci_driver .remove = qedi_remove, .shutdown = qedi_shutdown, .err_handler = &qedi_err_handler, + .suspend = qedi_suspend, };
static int __init qedi_init(void)
From: Nilesh Javali njavali@marvell.com
commit ef222f551e7c4e2008fc442ffc9edcd1a7fd8f63 upstream.
While performing certain power-off sequences, PCI drivers are called to suspend and resume their underlying devices through PCI PM (power management) interface. However the hardware does not support PCI PM suspend/resume operations so system wide suspend/resume leads to bad MFW (management firmware) state which causes various follow-up errors in driver when communicating with the device/firmware.
To fix this driver implements PCI PM suspend handler to indicate unsupported operation to the PCI subsystem explicitly, thus avoiding system to go into suspended/standby mode.
Fixes: 61d8658b4a43 ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.") Signed-off-by: Saurav Kashyap skashyap@marvell.com Signed-off-by: Nilesh Javali njavali@marvell.com Link: https://lore.kernel.org/r/20230807093725.46829-1-njavali@marvell.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/qedf/qedf_main.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
--- a/drivers/scsi/qedf/qedf_main.c +++ b/drivers/scsi/qedf/qedf_main.c @@ -31,6 +31,7 @@ static void qedf_remove(struct pci_dev * static void qedf_shutdown(struct pci_dev *pdev); static void qedf_schedule_recovery_handler(void *dev); static void qedf_recovery_handler(struct work_struct *work); +static int qedf_suspend(struct pci_dev *pdev, pm_message_t state);
/* * Driver module parameters. @@ -3271,6 +3272,7 @@ static struct pci_driver qedf_pci_driver .probe = qedf_probe, .remove = qedf_remove, .shutdown = qedf_shutdown, + .suspend = qedf_suspend, };
static int __qedf_probe(struct pci_dev *pdev, int mode) @@ -4000,6 +4002,22 @@ static void qedf_shutdown(struct pci_dev __qedf_remove(pdev, QEDF_MODE_NORMAL); }
+static int qedf_suspend(struct pci_dev *pdev, pm_message_t state) +{ + struct qedf_ctx *qedf; + + if (!pdev) { + QEDF_ERR(NULL, "pdev is NULL.\n"); + return -ENODEV; + } + + qedf = pci_get_drvdata(pdev); + + QEDF_ERR(&qedf->dbg_ctx, "%s: Device does not support suspend operation\n", __func__); + + return -EPERM; +} + /* * Recovery handler code */
From: Jean Delvare jdelvare@suse.de
commit 5a66d59b5ff537ddae84a1f175c3f8eb1140a562 upstream.
The msi-ec driver fails to build for me (gcc 7.5):
CC [M] drivers/platform/x86/msi-ec.o drivers/platform/x86/msi-ec.c:72:6: error: initializer element is not constant { SM_ECO_NAME, 0xc2 }, ^~~~~~~~~~~ drivers/platform/x86/msi-ec.c:72:6: note: (near initialization for ‘CONF0.shift_mode.modes[0].name’) drivers/platform/x86/msi-ec.c:73:6: error: initializer element is not constant { SM_COMFORT_NAME, 0xc1 }, ^~~~~~~~~~~~~~~ drivers/platform/x86/msi-ec.c:73:6: note: (near initialization for ‘CONF0.shift_mode.modes[1].name’) drivers/platform/x86/msi-ec.c:74:6: error: initializer element is not constant { SM_SPORT_NAME, 0xc0 }, ^~~~~~~~~~~~~ drivers/platform/x86/msi-ec.c:74:6: note: (near initialization for ‘CONF0.shift_mode.modes[2].name’) (...)
Don't try to be smart, just use defines for the constant strings. The compiler will recognize it's the same string and will store it only once in the data section anyway.
Signed-off-by: Jean Delvare jdelvare@suse.de Fixes: 392cacf2aa10 ("platform/x86: Add new msi-ec driver") Cc: stable@vger.kernel.org Cc: Nikita Kravets teackot@gmail.com Cc: Hans de Goede hdegoede@redhat.com Cc: Mark Gross markgross@kernel.org Link: https://lore.kernel.org/r/20230805101010.54d49e91@endymion.delvare Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/x86/msi-ec.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/drivers/platform/x86/msi-ec.c +++ b/drivers/platform/x86/msi-ec.c @@ -27,15 +27,15 @@ #include <linux/seq_file.h> #include <linux/string.h>
-static const char *const SM_ECO_NAME = "eco"; -static const char *const SM_COMFORT_NAME = "comfort"; -static const char *const SM_SPORT_NAME = "sport"; -static const char *const SM_TURBO_NAME = "turbo"; +#define SM_ECO_NAME "eco" +#define SM_COMFORT_NAME "comfort" +#define SM_SPORT_NAME "sport" +#define SM_TURBO_NAME "turbo"
-static const char *const FM_AUTO_NAME = "auto"; -static const char *const FM_SILENT_NAME = "silent"; -static const char *const FM_BASIC_NAME = "basic"; -static const char *const FM_ADVANCED_NAME = "advanced"; +#define FM_AUTO_NAME "auto" +#define FM_SILENT_NAME "silent" +#define FM_BASIC_NAME "basic" +#define FM_ADVANCED_NAME "advanced"
static const char * const ALLOWED_FW_0[] __initconst = { "14C1EMS1.012",
From: Hans de Goede hdegoede@redhat.com
commit 2b6aa6610dc9690f79d305ca938abfb799a4f766 upstream.
The lenovo-ymc driver is causing the keyboard + touchpad to stop working on some regular laptop models such as the Lenovo ThinkBook 13s G2 ITL 20V9.
The problem is that there are YMC WMI GUID methods in the ACPI tables of these laptops, despite them not being Yogas and lenovo-ymc loading causes libinput to see a SW_TABLET_MODE switch with state 1.
This in turn causes libinput to ignore events from the builtin keyboard and touchpad, since it filters those out for a Yoga in tablet mode.
Similar issues with false-positive SW_TABLET_MODE=1 reporting have been seen with the intel-hid driver.
Copy the intel-hid driver approach to fix this and only bind to the WMI device on machines where the DMI chassis-type indicates the machine is a convertible.
Add a 'force' module parameter to allow overriding the chassis-type check so that users can easily test if the YMC interface works on models which report an unexpected chassis-type.
Fixes: e82882cdd241 ("platform/x86: Add driver for Yoga Tablet Mode switch") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229373 Cc: André Apitzsch git@apitzsch.eu Cc: stable@vger.kernel.org Tested-by: Andrew Kallmeyer kallmeyeras@gmail.com Tested-by: Gergő Köteles soyer@irl.hu Signed-off-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20230812144818.383230-1-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/x86/lenovo-ymc.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
diff --git a/drivers/platform/x86/lenovo-ymc.c b/drivers/platform/x86/lenovo-ymc.c index 41676188b373..f360370d5002 100644 --- a/drivers/platform/x86/lenovo-ymc.c +++ b/drivers/platform/x86/lenovo-ymc.c @@ -24,6 +24,10 @@ static bool ec_trigger __read_mostly; module_param(ec_trigger, bool, 0444); MODULE_PARM_DESC(ec_trigger, "Enable EC triggering work-around to force emitting tablet mode events");
+static bool force; +module_param(force, bool, 0444); +MODULE_PARM_DESC(force, "Force loading on boards without a convertible DMI chassis-type"); + static const struct dmi_system_id ec_trigger_quirk_dmi_table[] = { { /* Lenovo Yoga 7 14ARB7 */ @@ -35,6 +39,20 @@ static const struct dmi_system_id ec_trigger_quirk_dmi_table[] = { { } };
+static const struct dmi_system_id allowed_chasis_types_dmi_table[] = { + { + .matches = { + DMI_EXACT_MATCH(DMI_CHASSIS_TYPE, "31" /* Convertible */), + }, + }, + { + .matches = { + DMI_EXACT_MATCH(DMI_CHASSIS_TYPE, "32" /* Detachable */), + }, + }, + { } +}; + struct lenovo_ymc_private { struct input_dev *input_dev; struct acpi_device *ec_acpi_dev; @@ -111,6 +129,13 @@ static int lenovo_ymc_probe(struct wmi_device *wdev, const void *ctx) struct input_dev *input_dev; int err;
+ if (!dmi_check_system(allowed_chasis_types_dmi_table)) { + if (force) + dev_info(&wdev->dev, "Force loading Lenovo YMC support\n"); + else + return -ENODEV; + } + ec_trigger |= dmi_check_system(ec_trigger_quirk_dmi_table);
priv = devm_kzalloc(&wdev->dev, sizeof(*priv), GFP_KERNEL);
From: Vadim Pasternak vadimp@nvidia.com
commit d66a8aab7dc36c975bbaa6aa74cf7445878e7c69 upstream.
Move debug register offsets to different location due to hardware changes.
Fixes: dd635e33b5c9 ("platform: mellanox: Introduce support of new Nvidia L1 switch") Signed-off-by: Vadim Pasternak vadimp@nvidia.com Reviewed-by: Michael Shych michaelsh@nvidia.com Link: https://lore.kernel.org/r/20230813083735.39090-5-vadimp@nvidia.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/x86/mlx-platform.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/platform/x86/mlx-platform.c b/drivers/platform/x86/mlx-platform.c index 240bc3174caf..7d33977d9c60 100644 --- a/drivers/platform/x86/mlx-platform.c +++ b/drivers/platform/x86/mlx-platform.c @@ -62,10 +62,6 @@ #define MLXPLAT_CPLD_LPC_REG_PWM_CONTROL_OFFSET 0x37 #define MLXPLAT_CPLD_LPC_REG_AGGR_OFFSET 0x3a #define MLXPLAT_CPLD_LPC_REG_AGGR_MASK_OFFSET 0x3b -#define MLXPLAT_CPLD_LPC_REG_DBG1_OFFSET 0x3c -#define MLXPLAT_CPLD_LPC_REG_DBG2_OFFSET 0x3d -#define MLXPLAT_CPLD_LPC_REG_DBG3_OFFSET 0x3e -#define MLXPLAT_CPLD_LPC_REG_DBG4_OFFSET 0x3f #define MLXPLAT_CPLD_LPC_REG_AGGRLO_OFFSET 0x40 #define MLXPLAT_CPLD_LPC_REG_AGGRLO_MASK_OFFSET 0x41 #define MLXPLAT_CPLD_LPC_REG_AGGRCO_OFFSET 0x42 @@ -126,6 +122,10 @@ #define MLXPLAT_CPLD_LPC_REG_LC_SD_EVENT_OFFSET 0xaa #define MLXPLAT_CPLD_LPC_REG_LC_SD_MASK_OFFSET 0xab #define MLXPLAT_CPLD_LPC_REG_LC_PWR_ON 0xb2 +#define MLXPLAT_CPLD_LPC_REG_DBG1_OFFSET 0xb6 +#define MLXPLAT_CPLD_LPC_REG_DBG2_OFFSET 0xb7 +#define MLXPLAT_CPLD_LPC_REG_DBG3_OFFSET 0xb8 +#define MLXPLAT_CPLD_LPC_REG_DBG4_OFFSET 0xb9 #define MLXPLAT_CPLD_LPC_REG_GP4_RO_OFFSET 0xc2 #define MLXPLAT_CPLD_LPC_REG_SPI_CHNL_SELECT 0xc3 #define MLXPLAT_CPLD_LPC_REG_WD_CLEAR_OFFSET 0xc7
From: Vadim Pasternak vadimp@nvidia.com
commit 3c91d7e8c64f75c63da3565d16d5780320bd5d76 upstream.
Change polarity of chassis health and power signals and fix latch reset mask for L1 switch.
Fixes: dd635e33b5c9 ("platform: mellanox: Introduce support of new Nvidia L1 switch") Signed-off-by: Vadim Pasternak vadimp@nvidia.com Reviewed-by: Michael Shych michaelsh@nvidia.com Link: https://lore.kernel.org/r/20230813083735.39090-3-vadimp@nvidia.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/x86/mlx-platform.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/platform/x86/mlx-platform.c b/drivers/platform/x86/mlx-platform.c index 5fb3348023a7..69256af04f05 100644 --- a/drivers/platform/x86/mlx-platform.c +++ b/drivers/platform/x86/mlx-platform.c @@ -237,7 +237,7 @@ #define MLXPLAT_CPLD_GWP_MASK GENMASK(0, 0) #define MLXPLAT_CPLD_EROT_MASK GENMASK(1, 0) #define MLXPLAT_CPLD_PWR_BUTTON_MASK BIT(0) -#define MLXPLAT_CPLD_LATCH_RST_MASK BIT(5) +#define MLXPLAT_CPLD_LATCH_RST_MASK BIT(6) #define MLXPLAT_CPLD_THERMAL1_PDB_MASK BIT(3) #define MLXPLAT_CPLD_THERMAL2_PDB_MASK BIT(4) #define MLXPLAT_CPLD_INTRUSION_MASK BIT(6) @@ -2475,7 +2475,7 @@ static struct mlxreg_core_item mlxplat_mlxcpld_l1_switch_events_items[] = { .reg = MLXPLAT_CPLD_LPC_REG_PWRB_OFFSET, .mask = MLXPLAT_CPLD_PWR_BUTTON_MASK, .count = ARRAY_SIZE(mlxplat_mlxcpld_l1_switch_pwr_events_items_data), - .inversed = 0, + .inversed = 1, .health = false, }, { @@ -2484,7 +2484,7 @@ static struct mlxreg_core_item mlxplat_mlxcpld_l1_switch_events_items[] = { .reg = MLXPLAT_CPLD_LPC_REG_BRD_OFFSET, .mask = MLXPLAT_CPLD_L1_CHA_HEALTH_MASK, .count = ARRAY_SIZE(mlxplat_mlxcpld_l1_switch_health_events_items_data), - .inversed = 0, + .inversed = 1, .health = false, .ind = 8, }, @@ -3677,7 +3677,7 @@ static struct mlxreg_core_data mlxplat_mlxcpld_default_ng_regs_io_data[] = { { .label = "latch_reset", .reg = MLXPLAT_CPLD_LPC_REG_GP1_OFFSET, - .mask = GENMASK(7, 0) & ~BIT(5), + .mask = GENMASK(7, 0) & ~BIT(6), .mode = 0200, }, {
From: Vadim Pasternak vadimp@nvidia.com
commit 9f8ccdb5088bd03062d9ad9c0f6abf600cbed8e8 upstream.
Use kernel_power_off() instead of kernel_halt() to pass through machine_power_off() -> pm_power_off(), otherwise axillary power does not go off.
Change "power down" bitmask.
Fixes: dd635e33b5c9 ("platform: mellanox: Introduce support of new Nvidia L1 switch") Signed-off-by: Vadim Pasternak vadimp@nvidia.com Reviewed-by: Michael Shych michaelsh@nvidia.com Link: https://lore.kernel.org/r/20230813083735.39090-4-vadimp@nvidia.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/x86/mlx-platform.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/platform/x86/mlx-platform.c +++ b/drivers/platform/x86/mlx-platform.c @@ -222,7 +222,7 @@ MLXPLAT_CPLD_AGGR_MASK_LC_SDWN) #define MLXPLAT_CPLD_LOW_AGGR_MASK_LOW 0xc1 #define MLXPLAT_CPLD_LOW_AGGR_MASK_ASIC2 BIT(2) -#define MLXPLAT_CPLD_LOW_AGGR_MASK_PWR_BUT BIT(4) +#define MLXPLAT_CPLD_LOW_AGGR_MASK_PWR_BUT GENMASK(5, 4) #define MLXPLAT_CPLD_LOW_AGGR_MASK_I2C BIT(6) #define MLXPLAT_CPLD_PSU_MASK GENMASK(1, 0) #define MLXPLAT_CPLD_PWR_MASK GENMASK(1, 0) @@ -2356,7 +2356,7 @@ mlxplat_mlxcpld_l1_switch_pwr_events_han u8 action) { dev_info(&mlxplat_dev->dev, "System shutdown due to short press of power button"); - kernel_halt(); + kernel_power_off(); return 0; }
From: Vadim Pasternak vadimp@nvidia.com
commit 8e3938cff0191c810b2abd827313c090fe09d166 upstream.
Fix exit flow order: call mlxplat_post_exit() after mlxplat_i2c_main_exit() in order to unregister main i2c driver before to "mlxplat" driver.
Fixes: 0170f616f496 ("platform: mellanox: Split initialization procedure") Signed-off-by: Vadim Pasternak vadimp@nvidia.com Reviewed-by: Michael Shych michaelsh@nvidia.com Link: https://lore.kernel.org/r/20230813083735.39090-2-vadimp@nvidia.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/x86/mlx-platform.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/platform/x86/mlx-platform.c +++ b/drivers/platform/x86/mlx-platform.c @@ -6238,8 +6238,6 @@ static void mlxplat_i2c_mux_topolgy_exit if (priv->pdev_mux[i]) platform_device_unregister(priv->pdev_mux[i]); } - - mlxplat_post_exit(); }
static int mlxplat_i2c_main_complition_notify(void *handle, int id) @@ -6369,6 +6367,7 @@ static void __exit mlxplat_exit(void) pm_power_off = NULL; mlxplat_pre_exit(priv); mlxplat_i2c_main_exit(priv); + mlxplat_post_exit(); } module_exit(mlxplat_exit);
From: David Xu xuwd1@hotmail.com
commit 676b7c5ecab36274442887ceadd6dee8248a244f upstream.
The current code assumes that the CSC3551(multiple cs35l41) always have its interrupt pin connected to GPIO thus the IRQ can be acquired with acpi_dev_gpio_irq_get. However on some newer laptop models this is no longer the case as they have the CSC3551's interrupt pin connected to APIC. This causes smi_i2c_probe to fail on these machines.
To support these machines, a new macro IRQ_RESOURCE_AUTO was introduced for cs35l41 smi_node, and smi_get_irq function was modified so it tries to get GPIO irq resource first and if failed, tries to get APIC irq resource for cs35l41.
This patch affects only the cs35l41's probing and brings no negative influence on machines that indeed have the cs35l41's interrupt pin connected to GPIO.
Signed-off-by: David Xu xuwd1@hotmail.com Link: https://lore.kernel.org/r/SY4P282MB18350CD8288687B87FFD2243E037A@SY4P282MB18... Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/x86/serial-multi-instantiate.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-)
--- a/drivers/platform/x86/serial-multi-instantiate.c +++ b/drivers/platform/x86/serial-multi-instantiate.c @@ -21,6 +21,7 @@ #define IRQ_RESOURCE_NONE 0 #define IRQ_RESOURCE_GPIO 1 #define IRQ_RESOURCE_APIC 2 +#define IRQ_RESOURCE_AUTO 3
enum smi_bus_type { SMI_I2C, @@ -52,6 +53,18 @@ static int smi_get_irq(struct platform_d int ret;
switch (inst->flags & IRQ_RESOURCE_TYPE) { + case IRQ_RESOURCE_AUTO: + ret = acpi_dev_gpio_irq_get(adev, inst->irq_idx); + if (ret > 0) { + dev_dbg(&pdev->dev, "Using gpio irq\n"); + break; + } + ret = platform_get_irq(pdev, inst->irq_idx); + if (ret > 0) { + dev_dbg(&pdev->dev, "Using platform irq\n"); + break; + } + break; case IRQ_RESOURCE_GPIO: ret = acpi_dev_gpio_irq_get(adev, inst->irq_idx); break; @@ -307,10 +320,10 @@ static const struct smi_node int3515_dat
static const struct smi_node cs35l41_hda = { .instances = { - { "cs35l41-hda", IRQ_RESOURCE_GPIO, 0 }, - { "cs35l41-hda", IRQ_RESOURCE_GPIO, 0 }, - { "cs35l41-hda", IRQ_RESOURCE_GPIO, 0 }, - { "cs35l41-hda", IRQ_RESOURCE_GPIO, 0 }, + { "cs35l41-hda", IRQ_RESOURCE_AUTO, 0 }, + { "cs35l41-hda", IRQ_RESOURCE_AUTO, 0 }, + { "cs35l41-hda", IRQ_RESOURCE_AUTO, 0 }, + { "cs35l41-hda", IRQ_RESOURCE_AUTO, 0 }, {} }, .bus_type = SMI_AUTO_DETECT,
From: Simon Trimmer simont@opensource.cirrus.com
commit 1cd0302be5645420f73090aee26fa787287e1096 upstream.
The ACPI device CSC3556 is a Cirrus Logic CS35L56 mono amplifier which is used in multiples, and can be connected either to I2C or SPI.
There will be multiple instances under the same Device() node. Add it to ignore_serial_bus_ids and handle it in the serial-multi-instantiate driver.
There can be a 5th I2cSerialBusV2, but this is an alias address and doesn't represent a real device. Ignore this by having a dummy 5th entry in the serial-multi-instantiate instance list with the name of a non-existent driver, on the same pattern as done for bsg2150.
Signed-off-by: Simon Trimmer simont@opensource.cirrus.com Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Acked-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Link: https://lore.kernel.org/r/20230728111345.7224-1-rf@opensource.cirrus.com Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/scan.c | 1 + drivers/platform/x86/serial-multi-instantiate.c | 14 ++++++++++++++ 2 files changed, 15 insertions(+)
--- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -1712,6 +1712,7 @@ static bool acpi_device_enumeration_by_p {"BSG1160", }, {"BSG2150", }, {"CSC3551", }, + {"CSC3556", }, {"INT33FE", }, {"INT3515", }, /* Non-conforming _HID for Cirrus Logic already released */ --- a/drivers/platform/x86/serial-multi-instantiate.c +++ b/drivers/platform/x86/serial-multi-instantiate.c @@ -329,6 +329,19 @@ static const struct smi_node cs35l41_hda .bus_type = SMI_AUTO_DETECT, };
+static const struct smi_node cs35l56_hda = { + .instances = { + { "cs35l56-hda", IRQ_RESOURCE_AUTO, 0 }, + { "cs35l56-hda", IRQ_RESOURCE_AUTO, 0 }, + { "cs35l56-hda", IRQ_RESOURCE_AUTO, 0 }, + { "cs35l56-hda", IRQ_RESOURCE_AUTO, 0 }, + /* a 5th entry is an alias address, not a real device */ + { "cs35l56-hda_dummy_dev" }, + {} + }, + .bus_type = SMI_AUTO_DETECT, +}; + /* * Note new device-ids must also be added to ignore_serial_bus_ids in * drivers/acpi/scan.c: acpi_device_enumeration_by_parent(). @@ -337,6 +350,7 @@ static const struct acpi_device_id smi_a { "BSG1160", (unsigned long)&bsg1160_data }, { "BSG2150", (unsigned long)&bsg2150_data }, { "CSC3551", (unsigned long)&cs35l41_hda }, + { "CSC3556", (unsigned long)&cs35l56_hda }, { "INT3515", (unsigned long)&int3515_data }, /* Non-conforming _HID for Cirrus Logic already released */ { "CLSA0100", (unsigned long)&cs35l41_hda },
From: Masahiro Yamada masahiroy@kernel.org
commit 6ccbd7fd474674654019a20177c943359469103a upstream.
EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization.
Commit c5a130325f13 ("ACPI/APEI: Add parameter check before error injection") exported page_is_ram(), hence the __init annotation should be removed.
This fixes the modpost warning in ARCH=alpha builds:
WARNING: modpost: vmlinux: page_is_ram: EXPORT_SYMBOL used for init symbol. Remove __init or EXPORT_SYMBOL.
Fixes: c5a130325f13 ("ACPI/APEI: Add parameter check before error injection") Signed-off-by: Masahiro Yamada masahiroy@kernel.org Reviewed-by: Randy Dunlap rdunlap@infradead.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/alpha/kernel/setup.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/arch/alpha/kernel/setup.c +++ b/arch/alpha/kernel/setup.c @@ -385,8 +385,7 @@ setup_memory(void *kernel_end) #endif /* CONFIG_BLK_DEV_INITRD */ }
-int __init -page_is_ram(unsigned long pfn) +int page_is_ram(unsigned long pfn) { struct memclust_struct * cluster; struct memdesc_struct * memdesc;
On Sun, Aug 13, 2023 at 11:16:10PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.11 release. There are 206 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully compiled and installed bindeb-pkgs on my computer (Acer Aspire E15, Intel Core i3 Haswell). No noticeable regressions.
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
On Sun, Aug 13, 2023 at 11:16:10PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.11 release. There are 206 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Tested-by: Conor Dooley conor.dooley@microchip.com
Thanks, Conor.
On Sun, 13 Aug 2023 23:16:10 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.11 release. There are 206 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 15 Aug 2023 21:16:53 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.11-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v6.4: 11 builds: 11 pass, 0 fail 28 boots: 28 pass, 0 fail 130 tests: 130 pass, 0 fail
Linux version: 6.4.11-rc1-g00ae646d9c95 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Thierry Reding treding@nvidia.com
Hello,
On Sun, 13 Aug 2023 23:16:10 +0200 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.4.11 release. There are 206 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 15 Aug 2023 21:16:53 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.11-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
This rc kernel passes DAMON functionality test[1] on my test machine. Attaching the test results summary below. Please note that I retrieved the kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park sj@kernel.org
[1] https://github.com/awslabs/damon-tests/tree/next/corr [2] 427a3a47257b ("Linux 6.4.11-rc1")
Thanks, SJ
[...]
---
[32m ok 1 selftests: damon: debugfs_attrs.sh ok 2 selftests: damon: debugfs_schemes.sh ok 3 selftests: damon: debugfs_target_ids.sh ok 4 selftests: damon: debugfs_empty_targets.sh ok 5 selftests: damon: debugfs_huge_count_read_write.sh ok 6 selftests: damon: debugfs_duplicate_context_creation.sh ok 7 selftests: damon: debugfs_rm_non_contexts.sh ok 8 selftests: damon: sysfs.sh ok 9 selftests: damon: sysfs_update_removed_scheme_dir.sh ok 10 selftests: damon: reclaim.sh ok 11 selftests: damon: lru_sort.sh ok 1 selftests: damon-tests: kunit.sh ok 2 selftests: damon-tests: huge_count_read_write.sh ok 3 selftests: damon-tests: buffer_overflow.sh ok 4 selftests: damon-tests: rm_contexts.sh ok 5 selftests: damon-tests: record_null_deref.sh ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh ok 8 selftests: damon-tests: damo_tests.sh ok 9 selftests: damon-tests: masim-record.sh ok 10 selftests: damon-tests: build_i386.sh ok 11 selftests: damon-tests: build_m68k.sh ok 12 selftests: damon-tests: build_arm64.sh ok 13 selftests: damon-tests: build_i386_idle_flag.sh ok 14 selftests: damon-tests: build_i386_highpte.sh ok 15 selftests: damon-tests: build_nomemcg.sh [33m [92mPASS [39m _remote_run_corr.sh SUCCESS
Hello!
We see this warning on x86 (real machine and Qemu) with Clang:
-----8<----- [ 1.364590] ------------[ cut here ]------------ [ 1.364685] missing return thunk: __ret+0x5/0x7e-__ret+0x0/0x7e: e9 f6 ff ff ff [ 1.364691] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:630 apply_returns+0x2c9/0x420 [ 1.366684] Modules linked in: [ 1.367685] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.11-rc1 #1 [ 1.368684] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.0b 07/27/2017 [ 1.369685] RIP: 0010:apply_returns+0x2c9/0x420 [ 1.370685] Code: ff ff 0f 0b e9 b5 fd ff ff c6 05 a8 97 fa 01 01 48 c7 c7 e6 b5 9f a7 4c 89 ee 4c 89 e2 b9 05 00 00 00 4d 89 e8 e8 57 c2 11 00 <0f> 0b e9 8d fd ff ff 4d 85 e4 0f 84 1f ff ff ff 48 c7 c7 bf 44 98 [ 1.371684] RSP: 0000:ffffffffa7c03e00 EFLAGS: 00010246 [ 1.372684] RAX: 8bcba40230adee00 RBX: ffffffffa8150ba4 RCX: ffffffffa7c72440 [ 1.373684] RDX: ffffffffa7c03c88 RSI: 00000000ffffdfff RDI: 0000000000000001 [ 1.374684] RBP: ffffffffa7c03ed8 R08: 0000000000001fff R09: ffffffffa7c726d0 [ 1.375684] R10: 0000000000005ffd R11: 0000000000000004 R12: ffffffffa7182140 [ 1.376684] R13: ffffffffa7182145 R14: ffffffffa8150b9c R15: ffffffffa71821f0 [ 1.377684] FS: 0000000000000000(0000) GS:ffff8adb1fc00000(0000) knlGS:0000000000000000 [ 1.378684] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.379684] CR2: ffff8ad8b7001000 CR3: 00000001f6640001 CR4: 00000000003706f0 [ 1.380684] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1.381684] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1.382684] Call Trace: [ 1.383685] <TASK> [ 1.384685] ? show_regs+0x61/0x70 [ 1.385685] ? __warn+0xce/0x1d0 [ 1.386684] ? apply_returns+0x2c9/0x420 [ 1.387685] ? report_bug+0x160/0x210 [ 1.388685] ? handle_bug+0x41/0x70 [ 1.389684] ? exc_invalid_op+0x1f/0x50 [ 1.390685] ? asm_exc_invalid_op+0x1f/0x30 [ 1.391684] ? srso_safe_ret+0x30/0x30 [ 1.392685] ? __ret+0x5/0x7e [ 1.393684] ? zen_untrain_ret+0x1/0x1 [ 1.394685] ? apply_returns+0x2c9/0x420 [ 1.395685] ? __ret+0x5/0x7e [ 1.396684] ? __ret+0x14/0x7e [ 1.397684] ? __ret+0xa/0x7e [ 1.398685] alternative_instructions+0x50/0x120 [ 1.399685] arch_cpu_finalize_init+0x30/0x60 [ 1.400684] start_kernel+0x30e/0x3e0 [ 1.401685] x86_64_start_reservations+0x28/0x30 [ 1.402684] x86_64_start_kernel+0xaf/0xc0 [ 1.403685] secondary_startup_64_no_verify+0x107/0x10b [ 1.404685] </TASK> [ 1.405685] ---[ end trace 0000000000000000 ]--- -----8>-----
Full log here of one of those instances: https://lkft.validation.linaro.org/scheduler/job/6664660#L671
There is a ClangBuiltLinux issue addressing this [1]. Nathan refers Peter's second patch [2] in his series fixes this problem.
This was introduced in v6.4.9 (which we did not review). Clang builds failed for v6.4.10 [3][4] and are now fixed, so this is the first time we're seeing this warning.
Rest of the report as follows:
## Build * kernel: 6.4.11-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-6.4.y * git commit: 427a3a47257b870ef6dce995a40bb7aca1bfc6ec * git describe: v6.4.10-207-g427a3a47257b * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.4.y/build/v6.4.10...
## Test Regressions (compared to v6.4.10) * x86, log-parser-boot - check-kernel-warning
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
## No metric regressions (compared to v6.4.10)
## Test Fixes (compared to v6.4.10) * x86_64, build - clang-17-allmodconfig - clang-17-lkftconfig - clang-17-lkftconfig-compat - clang-17-lkftconfig-kcsan - clang-17-lkftconfig-no-kselftest-frag - clang-17-x86_64_defconfig - clang-lkftconfig - clang-nightly-lkftconfig - clang-nightly-lkftconfig-kselftest - clang-nightly-x86_64_defconfig
## No metric fixes (compared to v6.4.10)
## Test result summary total: 172870, pass: 149103, fail: 3038, skip: 20556, xfail: 173
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 145 total, 144 passed, 1 failed * arm64: 54 total, 52 passed, 2 failed * i386: 41 total, 40 passed, 1 failed * mips: 30 total, 28 passed, 2 failed * parisc: 4 total, 4 passed, 0 failed * powerpc: 38 total, 36 passed, 2 failed * riscv: 26 total, 24 passed, 2 failed * s390: 16 total, 14 passed, 2 failed * sh: 14 total, 12 passed, 2 failed * sparc: 8 total, 8 passed, 0 failed * x86_64: 46 total, 45 passed, 1 failed
## Test suites summary * boot * kselftest-android * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-exec * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-filesystems-epoll * kselftest-firmware * kselftest-fpu * kselftest-ftrace * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-user_events * kselftest-vDSO * kselftest-vm * kselftest-watchdog * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-fsx * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-syscalls * ltp-tracing * network-basic-tests * perf * rcutorture * v4l2-compliance
Greetings!
Daniel Díaz daniel.diaz@linaro.org
[1] https://github.com/ClangBuiltLinux/linux/issues/1911 [2] https://lore.kernel.org/20230809072200.543939260@infradead.org/ [3] https://lore.kernel.org/stable/CA+G9fYuoajK0n7RNhSqm-ycO6Md3W4ah_Sc=b_KVAQwY... [4] https://github.com/ClangBuiltLinux/linux/issues/1907
On Sun, Aug 13, 2023 at 11:16:10PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.11 release. There are 206 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 15 Aug 2023 21:16:53 +0000. Anything received after that time might be too late.
Build results: total: 157 pass: 157 fail: 0 Qemu test results: total: 522 pass: 522 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Sun, Aug 13, 2023 at 11:16:10PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.11 release. There are 206 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 15 Aug 2023 21:16:53 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.11-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
thanks,
greg k-h
Tested rc1 against the Fedora build system (aarch64, ppc64le, s390x, x86_64), and boot tested x86_64. No regressions noted.
Tested-by: Justin M. Forbes jforbes@fedoraproject.org
On 8/13/23 2:16 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.11 release. There are 206 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 15 Aug 2023 21:16:53 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.11-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On 8/13/23 15:16, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.11 release. There are 206 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 15 Aug 2023 21:16:53 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.11-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
This is the start of the stable review cycle for the 6.4.11 release. There are 206 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 15 Aug 2023 21:16:53 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.11-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my x86_64 and ARM64 test systems. No errors or regressions.
Tested-by: Allen Pais apais@linux.microsoft.com
Thanks.
On 8/13/23 14:16, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.11 release. There are 206 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 15 Aug 2023 21:16:53 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.11-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
On Sun, Aug 13, 2023 at 11:16:10PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.4.11 release. There are 206 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 15 Aug 2023 21:16:53 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.11-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
For RCU, Tested-by: Joel Fernandes (Google) joel@joelfernandes.org
thanks,
- Joel
thanks,
greg k-h
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.4.11-rc1
Masahiro Yamada masahiroy@kernel.org alpha: remove __init annotation from exported page_is_ram()
Simon Trimmer simont@opensource.cirrus.com ACPI: scan: Create platform device for CS35L56
David Xu xuwd1@hotmail.com platform/x86: serial-multi-instantiate: Auto detect IRQ resource for CSC3551
Vadim Pasternak vadimp@nvidia.com platform: mellanox: Fix order in exit flow
Vadim Pasternak vadimp@nvidia.com platform: mellanox: mlx-platform: Modify graceful shutdown callback and power down mask
Vadim Pasternak vadimp@nvidia.com platform: mellanox: mlx-platform: Fix signals polarity and latch mask
Vadim Pasternak vadimp@nvidia.com platform: mellanox: Change register offset addresses
Hans de Goede hdegoede@redhat.com platform/x86: lenovo-ymc: Only bind on machines with a convertible DMI chassis-type
Jean Delvare jdelvare@suse.de platform/x86: msi-ec: Fix the build
Nilesh Javali njavali@marvell.com scsi: qedf: Fix firmware halt over suspend and resume
Nilesh Javali njavali@marvell.com scsi: qedi: Fix firmware halt over suspend and resume
Karan Tilak Kumar kartilak@cisco.com scsi: fnic: Replace return codes in fnic_clean_pending_aborts()
Zhu Wang wangzhu9@huawei.com scsi: core: Fix possible memory leak if device_add() fails
Zhu Wang wangzhu9@huawei.com scsi: snic: Fix possible memory leak if device_add() fails
Alexandra Diupina adiupina@astralinux.ru scsi: 53c700: Check that command slot is not NULL
Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com scsi: ufs: renesas: Fix private allocation
Michael Kelley mikelley@microsoft.com scsi: storvsc: Fix handling of virtual Fibre Channel timeouts
Tony Battersby tonyb@cybernetics.com scsi: core: Fix legacy /proc parsing buffer overflow
Josef Bacik josef@toxicpanda.com btrfs: set cache_block_group_error if we find an error
Qu Wenruo wqu@suse.com btrfs: reject invalid reloc tree root keys with stack dump
Qu Wenruo wqu@suse.com btrfs: exit gracefully if reloc roots don't match
Christoph Hellwig hch@lst.de btrfs: properly clear end of the unreserved range in cow_file_range
Christoph Hellwig hch@lst.de btrfs: don't wait for writeback on clean pages in extent_write_cache_pages
Christoph Hellwig hch@lst.de btrfs: don't stop integrity writeback too early
Josef Bacik josef@toxicpanda.com btrfs: wait for actual caching progress during allocation
Bartosz Golaszewski bartosz.golaszewski@linaro.org gpio: sim: mark the GPIO chip as a one that can sleep
William Breathitt Gray william.gray@linaro.org gpio: ws16c48: Fix off-by-one error in WS16C48 resource region extent
Nick Child nnac123@linux.ibm.com ibmvnic: Ensure login failure recovery is safe from other resets
Nick Child nnac123@linux.ibm.com ibmvnic: Do partial reset on login failure
Nick Child nnac123@linux.ibm.com ibmvnic: Handle DMA unmapping of login buffs in release functions
Nick Child nnac123@linux.ibm.com ibmvnic: Unmap DMA login rsp buffer on send login fail
Nick Child nnac123@linux.ibm.com ibmvnic: Enforce stronger sanity checks on login response
Moshe Shemesh moshe@nvidia.com net/mlx5: Reload auxiliary devices in pci error handlers
Moshe Shemesh moshe@nvidia.com net/mlx5: Skip clock update work when device is in error state
Shay Drory shayd@nvidia.com net/mlx5: LAG, Check correct bucket when modifying LAG
Chris Mi cmi@nvidia.com net/mlx5e: Unoffload post act rule when handling FIB events
Daniel Jurgens danielj@nvidia.com net/mlx5: Allow 0 for total host VFs
Yevgeny Kliteynik kliteyn@nvidia.com net/mlx5: DR, Fix wrong allocation of modify hdr pattern
Jianbo Liu jianbol@nvidia.com net/mlx5e: TC, Fix internal port memory leak
Gal Pressman gal@nvidia.com net/mlx5e: Take RTNL lock when needed before calling xdp_set_features()
Zhang Jianhua chris.zjh@huawei.com dmaengine: owl-dma: Modify mismatched function name
Fenghua Yu fenghua.yu@intel.com dmaengine: idxd: Clear PRS disable flag when disabling IDXD device
Christophe JAILLET christophe.jaillet@wanadoo.fr dmaengine: mcf-edma: Fix a potential un-allocated memory access
Hao Chen chenhao418@huawei.com net: hns3: fix strscpy causing content truncation issue
Ido Schimmel idosch@nvidia.com nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID
Ido Schimmel idosch@nvidia.com nexthop: Make nexthop bucket dump more efficient
Ido Schimmel idosch@nvidia.com nexthop: Fix infinite nexthop dump when using maximum nexthop ID
Vladimir Oltean vladimir.oltean@nxp.com net: enetc: reimplement RFS/RSS memory clearing as PCI quirk
Yonglong Liu liuyonglong@huawei.com net: hns3: fix deadlock issue when externel_lb and reset are executed together
Jie Wang wangjie125@huawei.com net: hns3: add wait until mac link down
Jie Wang wangjie125@huawei.com net: hns3: refactor hclge_mac_link_status_wait for interface reuse
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: ocelot: call dsa_tag_8021q_unregister() under rtnl_lock() on driver remove
Li Yang leoyang.li@nxp.com net: phy: at803x: remove set/get wol callbacks for AR8032
Jonas Gorski jonas.gorski@bisdn.de net: marvell: prestera: fix handling IPv4 routes with nhid
Jakub Kicinski kuba@kernel.org net: tls: avoid discarding data on record close
Kalesh AP kalesh-anakkur.purayil@broadcom.com RDMA/bnxt_re: Fix error handling in probe failure path
Selvin Xavier selvin.xavier@broadcom.com RDMA/bnxt_re: Properly order ib_device_unalloc() to avoid UAF
Michael Guralnik michaelgur@nvidia.com RDMA/umem: Set iova in ODP flow
Felix Fietkau nbd@nbd.name wifi: cfg80211: fix sband iftype data lookup for AP_VLAN
Petr Tesarik petr.tesarik.ext@huawei.com wifi: brcm80211: handle params_v1 allocation failure
Daniel Stone daniels@collabora.com drm/rockchip: Don't spam logs in atomic check
Arnd Bergmann arnd@arndb.de drm/nouveau: remove unused tu102_gr_load() function
Pin-yen Lin treapking@chromium.org drm/bridge: it6505: Check power state with it6505->powered in IRQ handler
Mario Limonciello mario.limonciello@amd.com drm/amd/display: Don't show stack trace for missing eDP
Douglas Miller doug.miller@cornelisnetworks.com IB/hfi1: Fix possible panic during hotplug remove
Piotr Gardocki piotrx.gardocki@intel.com iavf: fix potential races for FDIR filters
Fedor Pchelkin pchelkin@ispras.ru drivers: vxlan: vnifilter: free percpu vni stats on error path
Andrew Kanner andrew.kanner@gmail.com drivers: net: prevent tun_build_skb() to exceed the packet size limit
Eric Dumazet edumazet@google.com dccp: fix data-race around dp->dccps_mss_cache
Ziyang Xuan william.xuanziyang@huawei.com bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves
Magnus Karlsson magnus.karlsson@intel.com xsk: fix refcount underflow in error path
Florian Westphal fw@strlen.de tunnels: fix kasan splat when generating ipv4 pmtu error
Eric Dumazet edumazet@google.com tcp: add missing family to tcp_set_ca_state() tracepoint
Vladimir Oltean vladimir.oltean@nxp.com PCI: move OF status = "disabled" detection to dev->match_driver
Gerd Bayer gbayer@linux.ibm.com net/smc: Use correct buffer sizes when switching between TCP and SMC
Gerd Bayer gbayer@linux.ibm.com net/smc: Fix setsockopt and sysctl to specify same buffer size again
Eric Dumazet edumazet@google.com net/packet: annotate data-races around tp->status
Nitya Sunkad nitya.sunkad@amd.com ionic: Add missing err handling for queue reconfig
Muhammad Husaini Zulkifli muhammad.husaini.zulkifli@intel.com igc: Add lock to safeguard global Qbv variables
Xiang Yang xiangyang3@huawei.com mptcp: fix the incorrect judgment for msk->cb_flags
Eric Dumazet edumazet@google.com macsec: use DEV_STATS_INC()
Nathan Chancellor nathan@kernel.org mISDN: Update parameter type of dsp_cmx_send()
Aleksa Savic savicaleksa83@gmail.com hwmon: (aquacomputer_d5next) Add selective 200ms delay after sending ctrl report
Xu Kuohai xukuohai@huawei.com bpf, sockmap: Fix bug that strp_done cannot be called
Xu Kuohai xukuohai@huawei.com bpf, sockmap: Fix map type error in sock_map_del_link
Andrew Kanner andrew.kanner@gmail.com net: core: remove unnecessary frame_sz check in bpf_xdp_adjust_tail()
Ido Schimmel idosch@nvidia.com selftests: forwarding: bridge_mdb: Make test more robust
Ido Schimmel idosch@nvidia.com selftests: forwarding: bridge_mdb: Fix failing test with old libnet
Ido Schimmel idosch@nvidia.com selftests: forwarding: bridge_mdb_max: Fix failing test with old libnet
Ido Schimmel idosch@nvidia.com selftests: forwarding: tc_flower: Relax success criterion
Ido Schimmel idosch@nvidia.com selftests: forwarding: tc_actions: Use ncat instead of nc
Ido Schimmel idosch@nvidia.com selftests: forwarding: Switch off timeout
Ido Schimmel idosch@nvidia.com selftests: forwarding: Skip test when no interfaces are specified
Ido Schimmel idosch@nvidia.com selftests: forwarding: hw_stats_l3_gre: Skip when using veth pairs
Ido Schimmel idosch@nvidia.com selftests: forwarding: ethtool_extended_state: Skip when using veth pairs
Ido Schimmel idosch@nvidia.com selftests: forwarding: ethtool: Skip when using veth pairs
Ido Schimmel idosch@nvidia.com selftests: forwarding: Add a helper to skip test when using veth pairs
Mark Brown broonie@kernel.org selftests/rseq: Fix build with undefined __weak
Xu Kuohai xukuohai@huawei.com selftests/bpf: fix a CI failure caused by vsock sockmap test
Minjie Du duminjie@vivo.com dmaengine: xilinx: xdma: Fix Judgment of the return value
Miquel Raynal miquel.raynal@bootlin.com dmaengine: xilinx: xdma: Fix typo
Raghavendra Rao Ananta rananta@google.com KVM: arm64: Fix hardware enable/disable flows for pKVM
Ido Schimmel idosch@nvidia.com selftests: forwarding: bridge_mdb: Check iproute2 version
Ido Schimmel idosch@nvidia.com selftests: forwarding: bridge_mdb_max: Check iproute2 version
Ido Schimmel idosch@nvidia.com selftests: forwarding: ethtool_mm: Skip when MAC Merge is not supported
Ido Schimmel idosch@nvidia.com selftests: forwarding: tc_tunnel_key: Make filters more specific
Neil Armstrong neil.armstrong@linaro.org interconnect: qcom: sm8550: add enable_mask for bcm nodes
Neil Armstrong neil.armstrong@linaro.org interconnect: qcom: sm8450: add enable_mask for bcm nodes
Neil Armstrong neil.armstrong@linaro.org interconnect: qcom: sa8775p: add enable_mask for bcm nodes
Mike Tipton mdtipton@codeaurora.org interconnect: qcom: Add support for mask-based BCMs
Matti Vaittinen mazziesaccount@gmail.com iio: light: bu27034: Fix scale format
Milan Zamazal mzamazal@redhat.com iio: core: Prevent invalid memory access when there is no parent
Alejandro Tafalla atafalla@dnyon.com iio: imu: lsm6dsx: Fix mount matrix retrieval
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_set_hash: mark set element as dead when deleting from packet path
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: adapt set backend to use GC transaction API
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: GC transaction API to avoid race with control plane
Florian Westphal fw@strlen.de netfilter: nf_tables: don't skip expired elements during walk
Karol Herbst kherbst@redhat.com drm/nouveau/disp: Revert a NULL check inside nouveau_connector_get_modes
Bjorn Helgaas bhelgaas@google.com Revert "PCI: mvebu: Mark driver as BROKEN"
Arnd Bergmann arnd@arndb.de x86: Move gds_ucode_mitigated() declaration to header
Arnd Bergmann arnd@arndb.de x86/speculation: Add cpu_show_gds() prototype
Jinghao Jia jinghao@linux.ibm.com x86/linkage: Fix typo of BUILD_VDSO in asm/linkage.h
Borislav Petkov (AMD) bp@alien8.de x86/sev: Do not try to parse for the CC blob on non-AMD hardware
Kirill A. Shutemov kirill.shutemov@linux.intel.com x86/mm: Fix VDSO and VVAR placement on 5-level paging machines
Cristian Ciocaltea cristian.ciocaltea@collabora.com x86/cpu/amd: Enable Zenbleed fix for AMD Custom APU 0405
Xin Li xin3.li@intel.com x86/vdso: Choose the right GDT_ENTRY_CPUNODE for 32-bit getcpu() on 64-bit kernel
Nick Desaulniers ndesaulniers@google.com x86/srso: Fix build breakage with the LLVM linker
RD Babiera rdbabiera@google.com usb: typec: altmodes/displayport: Signal hpd when configuring pin assignment
Badhri Jagan Sridharan badhri@google.com usb: typec: tcpm: Fix response to vsafe0V event
Prashanth K quic_prashk@quicinc.com usb: common: usb-conn-gpio: Prevent bailing out if initial role is none
Alan Stern stern@rowland.harvard.edu USB: Gadget: core: Help prevent panic during UVC unconfigure
Elson Roy Serrao quic_eserrao@quicinc.com usb: dwc3: Properly handle processing of pending events
Alan Stern stern@rowland.harvard.edu usb-storage: alauda: Fix uninit-value in alauda_check_media()
Mika Westerberg mika.westerberg@linux.intel.com thunderbolt: Fix memory leak in tb_handle_dp_bandwidth_request()
Ricky WU ricky_wu@realtek.com misc: rtsx: judge ASPM Mode to set PETXCFG Reg
Qi Zheng zhengqi.arch@bytedance.com binder: fix memory leak in binder_init()
Alvin Šipraga alsi@bang-olufsen.dk iio: adc: ina2xx: avoid NULL pointer dereference on OF device match
George Stark gnstark@sberdevices.ru iio: adc: meson: fix core clock enable/disable moment
Alisa Roman alisa.roman@analog.com iio: adc: ad7192: Fix ac excitation feature
Dan Carpenter dan.carpenter@linaro.org iio: frequency: admv1013: propagate errors from regulator_get_voltage()
Yiyuan Guo yguoaz@gmail.com iio: cros_ec: Fix the allocation size for cros_ec_command
Evan Quan evan.quan@amd.com drm/amd/pm: avoid unintentional shutdown due to temperature momentary fluctuation
Evan Quan evan.quan@amd.com drm/amd/pm: expose swctf threshold setting for legacy powerplay
Miaohe Lin linmiaohe@huawei.com mm: memory-failure: avoid false hwpoison page mapped error info
Miaohe Lin linmiaohe@huawei.com mm: memory-failure: fix potential unexpected return value from unpoison_memory()
Ayush Jain ayush.jain3@amd.com selftests: mm: ksm: fix incorrect evaluation of parameter
SeongJae Park sj@kernel.org mm/damon/core: initialize damo_filter->list from damos_new_filter()
Mike Kravetz mike.kravetz@oracle.com hugetlb: do not clear hugetlb dtor until allocating vmemmap
Karol Wachowski karol.wachowski@linux.intel.com accel/ivpu: Add set_pages_array_wc/uc for internal buffers
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix use-after-free of nilfs_root in dirtying inodes via iput
Lorenzo Stoakes lstoakes@gmail.com fs/proc/kcore: reinstate bounce buffer for KCORE_TEXT regions
Thomas Weißschuh linux@weissschuh.net cpufreq: amd-pstate: fix global sysfs attribute type
Colin Ian King colin.i.king@gmail.com radix tree test suite: fix incorrect allocation size for pthreads
Tao Ren rentao.bupt@gmail.com hwmon: (pmbus/bel-pfe) Enable PMBUS_SKIP_STATUS_CHECK for pfe1100
Andrew Yang andrew.yang@mediatek.com zsmalloc: fix races between modifications of fullness and isolated
Aleksa Sarai cyphar@cyphar.com io_uring: correct check for O_TMPFILE
Maulik Shah quic_mkshah@quicinc.com cpuidle: psci: Move enabling OSI mode after power domains creation
Maulik Shah quic_mkshah@quicinc.com cpuidle: dt_idle_genpd: Add helper function to remove genpd topology
Jarkko Sakkinen jarkko@kernel.org tpm_tis: Opt-in interrupts
Peter Ujfalusi peter.ujfalusi@linux.intel.com tpm: tpm_tis: Fix UPX-i11 DMI_MATCH condition
Mario Limonciello mario.limonciello@amd.com drm/amd: Disable S/G for APUs when 64GB or more host memory
Melissa Wen mwen@igalia.com drm/amd/display: check attr flag before set cursor degamma on DCN3+
Mario Limonciello mario.limonciello@amd.com drm/amd/display: Fix a regression on Polaris cards
Kenneth Feng kenneth.feng@amd.com drm/amd/pm: correct the pcie width for smu 13.0.0
Alex Deucher alexander.deucher@amd.com drm/amdgpu: fix possible UAF in amdgpu_cs_pass1()
Boris Brezillon boris.brezillon@collabora.com drm/shmem-helper: Reset vma->vm_ops before calling dma_buf_mmap()
Lyude Paul lyude@redhat.com drm/nouveau/nvkm/dp: Add workaround to fix DP 1.3+ DPCD issues
Karol Herbst kherbst@redhat.com drm/nouveau/gr: enable memory loads on helper invocation on all channels
August Wikerfors git@augustwikerfors.se nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM9B1 256G and 512G
Ming Lei ming.lei@redhat.com nvme-rdma: fix potential unbalanced freeze & unfreeze
Ming Lei ming.lei@redhat.com nvme-tcp: fix potential unbalanced freeze & unfreeze
Ming Lei ming.lei@redhat.com nvme: fix possible hang when removing a controller during error recovery
Nick Desaulniers ndesaulniers@google.com riscv: mm: fix 2 instances of -Wmissing-variable-declarations
Torsten Duwe duwe@suse.de riscv/kexec: handle R_RISCV_CALL_PLT relocation type
Andrea Parri parri.andrea@gmail.com riscv,mmio: Fix readX()-to-delay() ordering
Torsten Duwe duwe@suse.de riscv/kexec: load initrd high in available memory
Alexandre Ghiti alexghiti@rivosinc.com riscv: Start of DRAM should at least be aligned on PMD size for the direct mapping
Helge Deller deller@gmx.de parisc: Fix lightweight spinlock checks to not break futexes
Helge Deller deller@gmx.de io_uring/parisc: Adjust pgoff in io_uring mmap() for parisc
Christoph Hellwig hch@lst.de zram: take device and not only bvec offset into account
Hans de Goede hdegoede@redhat.com ACPI: resource: Add IRQ override quirk for PCSpecialist Elimina Pro 16 M
Hans de Goede hdegoede@redhat.com ACPI: resource: Honor MADT INT_SRC_OVR settings for IRQ1 on AMD Zen
Hans de Goede hdegoede@redhat.com ACPI: resource: Always use MADT override IRQ settings for all legacy non i8042 IRQs
Hans de Goede hdegoede@redhat.com ACPI: resource: revert "Remove "Zen" specific match and quirks"
Souradeep Chakrabarti schakrabarti@linux.microsoft.com net: mana: Fix MANA VF unload when hardware is unresponsive
Miquel Raynal miquel.raynal@bootlin.com dmaengine: xilinx: xdma: Fix interrupt vector setting
Ilpo Järvinen ilpo.jarvinen@linux.intel.com dmaengine: pl330: Return DMA_PAUSED when transaction is paused
Paolo Abeni pabeni@redhat.com mptcp: fix disconnect vs accept race
Paolo Abeni pabeni@redhat.com mptcp: avoid bogus reset on fallback close
Andrea Claudi aclaudi@redhat.com selftests: mptcp: join: fix 'implicit EP' test
Andrea Claudi aclaudi@redhat.com selftests: mptcp: join: fix 'delete and re-add' test
Maciej Żenczykowski maze@google.com ipv6: adjust ndisc_is_useropt() to also return true for PIO
Kunihiko Hayashi hayashi.kunihiko@socionext.com mmc: sdhci-f-sdh30: Replace with sdhci_pltfm
Sergei Antonov saproj@gmail.com mmc: moxart: read scr register without changing byte order
Jason A. Donenfeld Jason@zx2c4.com wireguard: allowedips: expand maximum node depth
Ido Schimmel idosch@nvidia.com selftests: forwarding: Set default IPv6 traceroute utility
Ping-Ke Shih pkshih@realtek.com wifi: rtw89: fix 8852AE disconnection caused by RX full flags
Keith Yeo keithyjy@gmail.com wifi: nl80211: fix integer overflow in nl80211_parse_mbssid_elems()
Paolo Bonzini pbonzini@redhat.com KVM: SEV: only access GHCB fields once
Paolo Bonzini pbonzini@redhat.com KVM: SEV: snapshot the GHCB before accessing it
Namjae Jeon linkinjeon@kernel.org ksmbd: fix wrong next length validation of ea buffer in smb2_set_ea()
Long Li leo.lilong@huawei.com ksmbd: validate command request size
Mario Limonciello mario.limonciello@amd.com tpm: Add a helper for checking hwrng enabled
Jonathan McDowell noodles@meta.com tpm/tpm_tis: Disable interrupts for Lenovo P620 devices
Mario Limonciello mario.limonciello@amd.com tpm: Disable RNG for all AMD fTPMs
Takashi Iwai tiwai@suse.de tpm/tpm_tis: Disable interrupts for TUXEDO InfinityBook S 15/17 Gen7
Diffstat:
Makefile | 4 +- arch/alpha/kernel/setup.c | 3 +- arch/arm64/kvm/arm.c | 15 +- arch/parisc/Kconfig.debug | 2 +- arch/parisc/include/asm/spinlock.h | 2 - arch/parisc/include/asm/spinlock_types.h | 6 + arch/parisc/kernel/sys_parisc.c | 15 +- arch/parisc/kernel/syscall.S | 23 +- arch/riscv/include/asm/mmio.h | 16 +- arch/riscv/include/asm/pgtable.h | 2 + arch/riscv/kernel/elf_kexec.c | 3 +- arch/riscv/mm/init.c | 16 +- arch/riscv/mm/kasan_init.c | 1 - arch/x86/boot/compressed/idt_64.c | 9 +- arch/x86/boot/compressed/sev.c | 37 ++- arch/x86/entry/vdso/vma.c | 4 +- arch/x86/include/asm/acpi.h | 2 + arch/x86/include/asm/linkage.h | 2 +- arch/x86/include/asm/processor.h | 2 + arch/x86/include/asm/segment.h | 2 +- arch/x86/kernel/acpi/boot.c | 4 + arch/x86/kernel/cpu/amd.c | 1 + arch/x86/kernel/vmlinux.lds.S | 12 +- arch/x86/kvm/svm/sev.c | 94 ++++---- arch/x86/kvm/svm/svm.h | 26 +++ arch/x86/kvm/x86.c | 2 - drivers/accel/ivpu/ivpu_gem.c | 8 + drivers/acpi/resource.c | 64 +++++ drivers/acpi/scan.c | 1 + drivers/android/binder.c | 1 + drivers/android/binder_alloc.c | 6 + drivers/android/binder_alloc.h | 1 + drivers/block/zram/zram_drv.c | 32 ++- drivers/char/tpm/tpm-chip.c | 83 ++----- drivers/char/tpm/tpm_crb.c | 30 +++ drivers/char/tpm/tpm_tis.c | 20 +- drivers/cpufreq/amd-pstate.c | 10 +- drivers/cpuidle/cpuidle-psci-domain.c | 39 ++-- drivers/cpuidle/dt_idle_genpd.c | 24 ++ drivers/cpuidle/dt_idle_genpd.h | 7 + drivers/dma/idxd/device.c | 4 +- drivers/dma/mcf-edma.c | 13 +- drivers/dma/owl-dma.c | 2 +- drivers/dma/pl330.c | 18 +- drivers/dma/xilinx/xdma.c | 6 +- drivers/gpio/gpio-sim.c | 1 + drivers/gpio/gpio-ws16c48.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 + drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26 +++ drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 5 +- .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 2 +- .../amd/display/dc/dce110/dce110_hw_sequencer.c | 3 +- drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c | 7 +- drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h | 2 + drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c | 48 ++++ .../drm/amd/pm/powerplay/hwmgr/hardwaremanager.c | 4 +- .../gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c | 2 + .../gpu/drm/amd/pm/powerplay/hwmgr/smu_helper.c | 27 +-- .../gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c | 10 + .../gpu/drm/amd/pm/powerplay/hwmgr/vega12_hwmgr.c | 4 + .../gpu/drm/amd/pm/powerplay/hwmgr/vega20_hwmgr.c | 4 + drivers/gpu/drm/amd/pm/powerplay/inc/hwmgr.h | 2 + drivers/gpu/drm/amd/pm/powerplay/inc/power_state.h | 1 + drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 34 +++ drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 2 + drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 9 +- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 9 +- .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 3 +- drivers/gpu/drm/bridge/ite-it6505.c | 4 +- drivers/gpu/drm/drm_gem_shmem_helper.c | 6 + drivers/gpu/drm/nouveau/nouveau_connector.c | 2 +- drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c | 48 +++- drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.h | 1 + drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk104.c | 4 +- drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110.c | 10 + drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk110b.c | 1 + drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgk208.c | 1 + drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgm107.c | 1 + drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c | 13 -- drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 17 +- drivers/hwmon/aquacomputer_d5next.c | 37 ++- drivers/hwmon/pmbus/bel-pfe.c | 16 +- drivers/iio/adc/ad7192.c | 16 +- drivers/iio/adc/ina2xx-adc.c | 9 +- drivers/iio/adc/meson_saradc.c | 23 +- .../common/cros_ec_sensors/cros_ec_sensors_core.c | 2 +- drivers/iio/frequency/admv1013.c | 5 +- drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c | 2 +- drivers/iio/industrialio-core.c | 5 +- drivers/iio/light/rohm-bu27034.c | 22 +- drivers/infiniband/core/umem.c | 3 +- drivers/infiniband/hw/bnxt_re/main.c | 4 +- drivers/infiniband/hw/hfi1/chip.c | 1 + drivers/interconnect/qcom/bcm-voter.c | 5 + drivers/interconnect/qcom/icc-rpmh.h | 2 + drivers/interconnect/qcom/sa8775p.c | 1 + drivers/interconnect/qcom/sm8450.c | 9 + drivers/interconnect/qcom/sm8550.c | 17 ++ drivers/isdn/mISDN/dsp.h | 2 +- drivers/isdn/mISDN/dsp_cmx.c | 2 +- drivers/isdn/mISDN/dsp_core.c | 2 +- drivers/misc/cardreader/rts5227.c | 2 +- drivers/misc/cardreader/rts5228.c | 18 -- drivers/misc/cardreader/rts5249.c | 3 +- drivers/misc/cardreader/rts5260.c | 18 -- drivers/misc/cardreader/rts5261.c | 18 -- drivers/misc/cardreader/rtsx_pcr.c | 5 +- drivers/mmc/host/moxart-mmc.c | 8 +- drivers/mmc/host/sdhci_f_sdh30.c | 60 +++-- drivers/net/bonding/bond_main.c | 4 +- drivers/net/dsa/ocelot/felix.c | 2 + drivers/net/ethernet/freescale/enetc/enetc_pf.c | 103 +++++--- drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c | 4 +- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 14 +- .../ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 4 +- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 24 +- drivers/net/ethernet/ibm/ibmvnic.c | 112 +++++++-- drivers/net/ethernet/intel/iavf/iavf_ethtool.c | 5 +- drivers/net/ethernet/intel/iavf/iavf_fdir.c | 11 +- drivers/net/ethernet/intel/igc/igc.h | 4 + drivers/net/ethernet/intel/igc/igc_main.c | 34 ++- .../ethernet/marvell/prestera/prestera_router.c | 14 +- .../ethernet/mellanox/mlx5/core/en/tc_tun_encap.c | 6 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 11 + drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 21 +- .../net/ethernet/mellanox/mlx5/core/lag/port_sel.c | 2 +- .../net/ethernet/mellanox/mlx5/core/lib/clock.c | 5 + drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 3 +- .../ethernet/mellanox/mlx5/core/steering/dr_ptrn.c | 2 +- drivers/net/ethernet/microsoft/mana/mana_en.c | 37 ++- drivers/net/ethernet/pensando/ionic/ionic_lif.c | 23 +- drivers/net/macsec.c | 28 +-- drivers/net/phy/at803x.c | 2 - drivers/net/tun.c | 2 +- drivers/net/vxlan/vxlan_vnifilter.c | 11 +- drivers/net/wireguard/allowedips.c | 8 +- drivers/net/wireguard/selftest/allowedips.c | 16 +- .../broadcom/brcm80211/brcmfmac/cfg80211.c | 5 + drivers/net/wireless/realtek/rtw89/mac.c | 2 +- drivers/nvme/host/core.c | 10 +- drivers/nvme/host/pci.c | 3 +- drivers/nvme/host/rdma.c | 3 +- drivers/nvme/host/tcp.c | 3 +- drivers/pci/bus.c | 4 +- drivers/pci/controller/Kconfig | 1 - drivers/pci/of.c | 5 - drivers/platform/x86/lenovo-ymc.c | 25 ++ drivers/platform/x86/mlx-platform.c | 23 +- drivers/platform/x86/msi-ec.c | 18 +- drivers/platform/x86/serial-multi-instantiate.c | 35 ++- drivers/scsi/53c700.c | 2 +- drivers/scsi/fnic/fnic.h | 2 +- drivers/scsi/fnic/fnic_scsi.c | 6 +- drivers/scsi/qedf/qedf_main.c | 18 ++ drivers/scsi/qedi/qedi_main.c | 18 ++ drivers/scsi/raid_class.c | 1 + drivers/scsi/scsi_proc.c | 30 +-- drivers/scsi/snic/snic_disc.c | 1 + drivers/scsi/storvsc_drv.c | 4 - drivers/thunderbolt/tb.c | 2 + drivers/ufs/host/ufs-renesas.c | 2 +- drivers/usb/common/usb-conn-gpio.c | 6 +- drivers/usb/dwc3/gadget.c | 9 +- drivers/usb/gadget/udc/core.c | 9 + drivers/usb/storage/alauda.c | 12 +- drivers/usb/typec/altmodes/displayport.c | 18 +- drivers/usb/typec/tcpm/tcpm.c | 7 + fs/btrfs/block-group.c | 17 +- fs/btrfs/block-group.h | 2 + fs/btrfs/disk-io.c | 3 +- fs/btrfs/extent-tree.c | 5 +- fs/btrfs/extent_io.c | 13 +- fs/btrfs/inode.c | 10 +- fs/btrfs/relocation.c | 45 +++- fs/btrfs/tree-checker.c | 14 ++ fs/nilfs2/inode.c | 8 + fs/nilfs2/segment.c | 2 + fs/nilfs2/the_nilfs.h | 2 + fs/proc/kcore.c | 30 ++- fs/smb/server/smb2misc.c | 10 +- fs/smb/server/smb2pdu.c | 9 +- include/linux/cpu.h | 2 + include/linux/skmsg.h | 1 + include/linux/tpm.h | 1 + include/net/cfg80211.h | 3 + include/net/netfilter/nf_tables.h | 64 ++++- include/trace/events/tcp.h | 5 +- io_uring/io_uring.c | 3 + io_uring/openclose.c | 6 +- mm/damon/core.c | 1 + mm/hugetlb.c | 75 ++++-- mm/memory-failure.c | 29 +-- mm/zsmalloc.c | 14 +- net/core/filter.c | 6 - net/core/skmsg.c | 10 +- net/core/sock_map.c | 10 +- net/dccp/output.c | 2 +- net/dccp/proto.c | 10 +- net/ipv4/ip_tunnel_core.c | 2 +- net/ipv4/nexthop.c | 28 +-- net/ipv6/ndisc.c | 3 +- net/mptcp/protocol.c | 4 +- net/mptcp/protocol.h | 1 - net/mptcp/subflow.c | 58 ++--- net/netfilter/nf_tables_api.c | 259 +++++++++++++++++++-- net/netfilter/nft_set_hash.c | 85 ++++--- net/netfilter/nft_set_pipapo.c | 66 ++++-- net/netfilter/nft_set_rbtree.c | 146 +++++++----- net/packet/af_packet.c | 16 +- net/smc/af_smc.c | 77 ++++-- net/smc/smc.h | 2 +- net/smc/smc_clc.c | 4 +- net/smc/smc_core.c | 25 +- net/smc/smc_sysctl.c | 10 +- net/tls/tls_device.c | 64 ++--- net/wireless/nl80211.c | 5 +- net/xdp/xsk.c | 1 + tools/testing/radix-tree/regression1.c | 2 +- .../selftests/bpf/prog_tests/sockmap_listen.c | 2 +- tools/testing/selftests/mm/ksm_tests.c | 1 + tools/testing/selftests/net/fib_nexthops.sh | 10 + .../testing/selftests/net/forwarding/bridge_mdb.sh | 59 ++--- .../selftests/net/forwarding/bridge_mdb_max.sh | 19 +- tools/testing/selftests/net/forwarding/ethtool.sh | 2 + .../net/forwarding/ethtool_extended_state.sh | 2 + .../testing/selftests/net/forwarding/ethtool_mm.sh | 18 +- .../selftests/net/forwarding/hw_stats_l3_gre.sh | 2 + .../net/forwarding/ip6_forward_instats_vrf.sh | 2 + tools/testing/selftests/net/forwarding/lib.sh | 17 ++ tools/testing/selftests/net/forwarding/settings | 1 + .../testing/selftests/net/forwarding/tc_actions.sh | 6 +- .../testing/selftests/net/forwarding/tc_flower.sh | 8 +- .../selftests/net/forwarding/tc_tunnel_key.sh | 9 +- tools/testing/selftests/net/mptcp/mptcp_join.sh | 6 +- tools/testing/selftests/rseq/Makefile | 4 +- tools/testing/selftests/rseq/rseq.c | 2 + 238 files changed, 2516 insertions(+), 1034 deletions(-)
linux-stable-mirror@lists.linaro.org