This is the start of the stable review cycle for the 6.7.9 release. There are 162 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 06 Mar 2024 21:15:26 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.7.9-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.7.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.7.9-rc1
Danilo Krummrich dakr@redhat.com drm/nouveau: don't fini scheduler before entity flush
Geliang Tang tanggeliang@kylinos.cn selftests: mptcp: rm subflow with v4/v4mapped addr
Geliang Tang geliang.tang@linux.dev selftests: mptcp: add mptcp_lib_is_v6
Geliang Tang geliang.tang@linux.dev selftests: mptcp: update userspace pm test helpers
Geliang Tang geliang.tang@linux.dev selftests: mptcp: add chk_subflows_total helper
Geliang Tang geliang.tang@linux.dev selftests: mptcp: add evts_get_info helper
Pawan Gupta pawan.kumar.gupta@linux.intel.com KVM/VMX: Move VERW closer to VMentry for MDS mitigation
Pawan Gupta pawan.kumar.gupta@linux.intel.com KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
Pawan Gupta pawan.kumar.gupta@linux.intel.com x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
Pawan Gupta pawan.kumar.gupta@linux.intel.com x86/entry_32: Add VERW just before userspace transition
Pawan Gupta pawan.kumar.gupta@linux.intel.com x86/entry_64: Add VERW just before userspace transition
Ming Lei ming.lei@redhat.com block: define bvec_iter as __packed __aligned(4)
Bartosz Golaszewski bartosz.golaszewski@linaro.org gpio: fix resource unwinding order in error path
Andy Shevchenko andriy.shevchenko@linux.intel.com gpiolib: Fix the error path order in gpiochip_add_data_with_key()
Arturas Moskvinas arturas.moskvinas@gmail.com gpio: 74x164: Enable output pins after registers are reset
Nathan Lynch nathanl@linux.ibm.com powerpc/rtas: use correct function name for resetting TCE tables
Gaurav Batra gbatra@linux.vnet.ibm.com powerpc/pseries/iommu: IOMMU table is not initialized for kdump over SR-IOV
Fenghua Yu fenghua.yu@intel.com dmaengine: idxd: Ensure safe user copy of completion record
Fenghua Yu fenghua.yu@intel.com dmaengine: idxd: Remove shadow Event Log head stored in idxd
Dmitry Baryshkov dmitry.baryshkov@linaro.org phy: qcom-qmp-usb: fix v3 offsets data
Yang Yingliang yangyingliang@huawei.com phy: qcom: phy-qcom-m31: fix wrong pointer pass to PTR_ERR()
Alexander Stein alexander.stein@ew.tq-group.com phy: freescale: phy-fsl-imx8-mipi-dphy: Fix alias name to use dashes
Kory Maincent kory.maincent@bootlin.com dmaengine: dw-edma: eDMA: Add sync read before starting the DMA transfer in remote setup
Kory Maincent kory.maincent@bootlin.com dmaengine: dw-edma: HDMA: Add sync read before starting the DMA transfer in remote setup
Kory Maincent kory.maincent@bootlin.com dmaengine: dw-edma: Add HDMA remote interrupt configuration
Kory Maincent kory.maincent@bootlin.com dmaengine: dw-edma: HDMA_V0_REMOTEL_STOP_INT_EN typo fix
Kory Maincent kory.maincent@bootlin.com dmaengine: dw-edma: Fix wrong interrupt bit set for HDMA
Kory Maincent kory.maincent@bootlin.com dmaengine: dw-edma: Fix the ch_count hdma callback
Dan Carpenter dan.carpenter@linaro.org ASoC: cs35l56: fix reversed if statement in cs35l56_dspwait_asp1tx_put()
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Drop oob_skb ref before purging queue in GC.
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Fix task hung while purging oob_skb in GC.
NeilBrown neilb@suse.de NFS: Fix data corruption caused by congestion.
Peter Ujfalusi peter.ujfalusi@gmail.com mfd: twl6030-irq: Revert to use of_match_device()
Paolo Abeni pabeni@redhat.com mptcp: fix possible deadlock in subflow diag
Davide Caratti dcaratti@redhat.com mptcp: fix double-free on socket dismantle
Paolo Abeni pabeni@redhat.com mptcp: fix potential wake-up event loss
Paolo Abeni pabeni@redhat.com mptcp: fix snd_wnd initialization for passive socket
Geliang Tang tanggeliang@kylinos.cn selftests: mptcp: join: add ss mptcp support check
Paolo Abeni pabeni@redhat.com mptcp: push at DSS boundaries
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: avoid printing warning once on client side
Geliang Tang tanggeliang@kylinos.cn mptcp: map v4 address to v6 when destroying subflow
Paolo Bonzini pbonzini@redhat.com x86/cpu/intel: Detect TME keyid bits before setting MTRR mask registers
Paolo Bonzini pbonzini@redhat.com x86/cpu: Allow reducing x86_phys_bits during early_identify_cpu()
Jiri Bohac jbohac@suse.cz x86/e820: Don't reserve SETUP_RNG_SEED in e820
Byungchul Park byungchul@sk.com mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index
Aneesh Kumar K.V (IBM) aneesh.kumar@kernel.org mm/debug_vm_pgtable: fix BUG_ON with pud advanced test
Masami Hiramatsu (Google) mhiramat@kernel.org fprobe: Fix to allocate entry_data_size buffer with rethook instances
Bjorn Andersson quic_bjorande@quicinc.com pmdomain: qcom: rpmhpd: Fix enabled_corner aggregation
Cristian Marussi cristian.marussi@arm.com pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal
Tim Schumacher timschumi@gmx.de efivarfs: Request at most 512 bytes for variable names
Nicolin Chen nicolinc@nvidia.com iommufd: Fix protection fault in iommufd_test_syz_conv_iova
Nicolin Chen nicolinc@nvidia.com iommufd: Fix iopt_access_list_id overwrite bug
Nathan Chancellor nathan@kernel.org kbuild: Add -Wa,--fatal-warnings to as-instr invocation
Thomas Weißschuh linux@weissschuh.net power: supply: mm8013: select REGMAP_I2C
Samuel Holland samuel.holland@sifive.com riscv: Save/restore envcfg CSR during CPU suspend
Samuel Holland samuel.holland@sifive.com riscv: Fix enabling cbo.zero when running in M-mode
Zong Li zong.li@sifive.com riscv: add CALLER_ADDRx support
Nathan Chancellor nathan@kernel.org RISC-V: Drop invalid test from CONFIG_AS_HAS_OPTION_ARCH
Xiubo Li xiubli@redhat.com ceph: switch to corrected encoding of max_xattr_size in mdsmap
Elad Nachman enachman@marvell.com mmc: sdhci-xenon: fix PHY init clock stability
Elad Nachman enachman@marvell.com mmc: sdhci-xenon: add timeout for PHY init complete
Ivan Semenov ivan@semenov.dev mmc: core: Fix eMMC initialization with 1-bit bus connection
Christophe Kerello christophe.kerello@foss.st.com mmc: mmci: stm32: fix DMA API overlapping mappings warning
Curtis Klein curtis.klein@hpe.com dmaengine: fsl-qdma: init irq after reg initialization
Joy Zou joy.zou@nxp.com dmaengine: fsl-edma: correct calculation of 'nbytes' in multi-fifo scenario
Tadeusz Struk tstruk@gigaio.com dmaengine: ptdma: use consistent DMA masks
Ard Biesheuvel ardb@kernel.org crypto: arm64/neonbs - fix out-of-bounds access on short input
Peng Ma peng.ma@nxp.com dmaengine: fsl-qdma: fix SoC may hang on 16 byte unaligned read
Rob Clark robdclark@chromium.org soc: qcom: pmic_glink: Fix boot when QRTR=m
Ryan Lin tsung-hua.lin@amd.com drm/amd/display: Add monitor patch for specific eDP
Ma Jun Jun.Ma2@amd.com drm/amdgpu/pm: Fix the power1_min_cap value
Matthew Auld matthew.auld@intel.com drm/buddy: fix range bias
Alex Deucher alexander.deucher@amd.com Revert "drm/amd/pm: resolve reboot exception for si oland"
Filipe Manana fdmanana@suse.com btrfs: send: don't issue unnecessary zero writes for trailing hole
David Sterba dsterba@suse.com btrfs: dev-replace: properly validate device names
Filipe Manana fdmanana@suse.com btrfs: fix double free of anonymous device after snapshot creation failure
Johannes Berg johannes.berg@intel.com wifi: nl80211: reject iftype change with mesh ID change
Elad Nachman enachman@marvell.com mtd: rawnand: marvell: fix layouts
Nhat Pham nphamcs@gmail.com mm: cachestat: fix folio read-after-free in cache walk
Alexander Ofitserov oficerovas@altlinux.org gtp: fix use-after-free and null-ptr-deref in gtp_newlink()
Mickaël Salaün mic@digikod.net landlock: Fix asymmetric private inodes referring
Johan Hovold johan+linaro@kernel.org Bluetooth: hci_bcm4377: do not mark valid bd_addr as invalid
Willian Wang git@willian.wang ALSA: hda/realtek: Add special fixup for Lenovo 14IRP8
Eniac Zhang eniac-xw.zhang@hp.com ALSA: hda/realtek: fix mute/micmute LED For HP mt440
Hans Peter flurry123@gmx.ch ALSA: hda/realtek: Enable Mute LED on HP 840 G8 (MB 8AB8)
Gergo Koteles soyer@irl.hu ALSA: hda/realtek: tas2781: enable subwoofer volume control
Jay Ajit Mate jay.mate15@gmail.com ALSA: hda/realtek: Fix top speaker connection on Dell Inspiron 16 Plus 7630
Takashi Iwai tiwai@suse.de ALSA: ump: Fix the discard error code from snd_ump_legacy_open()
Takashi Sakamoto o-takashi@sakamocchi.jp ALSA: firewire-lib: fix to check cycle continuity
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp tomoyo: fix UAF write bug in tomoyo_write_control()
Saravana Kannan saravanak@google.com of: property: fw_devlink: Fix stupid bug in remote-endpoint parsing
Sid Pranjale sidpranjale127@protonmail.com drm/nouveau: keep DMA buffers required for suspend/resume
Filipe Manana fdmanana@suse.com btrfs: fix race between ordered extent completion and fiemap
Dimitris Vlachos dvlachos@ics.forth.gr riscv: Sparse-Memory/vmemmap out-of-bounds fix
Alexandre Ghiti alexghiti@rivosinc.com riscv: Fix pte_leaf_size() for NAPOT
Alexandre Ghiti alexghiti@rivosinc.com Revert "riscv: mm: support Svnapot in huge vmap"
Vadim Shakirov vadim.shakirov@syntacore.com drivers: perf: ctr_get_width function for legacy is not defined
Vadim Shakirov vadim.shakirov@syntacore.com drivers: perf: added capabilities for legacy PMU
Srinivasan Shanmugam srinivasan.shanmugam@amd.com drm/amd/display: Prevent potential buffer overflow in map_hw_resources
David Howells dhowells@redhat.com afs: Fix endless loop in directory parsing
Jiri Slaby (SUSE) jirislaby@kernel.org fbcon: always restore the old font data in fbcon_do_set_font()
Thierry Reding treding@nvidia.com drm/tegra: Remove existing framebuffer only if we support display
Conor Dooley conor@kernel.org RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs
Richard Fitzgerald rf@opensource.cirrus.com ASoC: soc-card: Fix missing locking in snd_soc_card_get_kcontrol()
Richard Fitzgerald rf@opensource.cirrus.com ASoC: cs35l56: Fix deadlock in ASP1 mixer register initialization
Richard Fitzgerald rf@opensource.cirrus.com ASoC: cs35l56: Fix misuse of wm_adsp 'part' string for silicon revision
Richard Fitzgerald rf@opensource.cirrus.com ASoC: cs35l56: Fix for initializing ASP1 mixer registers
Richard Fitzgerald rf@opensource.cirrus.com ASoC: cs35l56: Don't add the same register patch multiple times
Richard Fitzgerald rf@opensource.cirrus.com ASoC: cs35l56: cs35l56_component_remove() must clean up wm_adsp
Richard Fitzgerald rf@opensource.cirrus.com ASoC: cs35l56: cs35l56_component_remove() must clear cs35l56->component
Alexandre Ghiti alexghiti@rivosinc.com riscv: Fix build error if !CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
Yangyu Chen cyy@cyyself.name riscv: mm: fix NOCACHE_THEAD does not set bit[61] correctly
Mikko Perttunen mperttunen@nvidia.com gpu: host1x: Skip reset assert on Tegra186
Colin Ian King colin.i.king@gmail.com ASoC: qcom: Fix uninitialized pointer dmactl
Takashi Iwai tiwai@suse.de ALSA: Drop leftover snd-rtctimer stuff from Makefile
Richard Fitzgerald rf@opensource.cirrus.com ASoC: cs35l56: Must clear HALO_STATE before issuing SYSTEM_RESET
Hans de Goede hdegoede@redhat.com power: supply: bq27xxx-i2c: Do not free non existing IRQ
Arnd Bergmann arnd@arndb.de efi/capsule-loader: fix incorrect allocation size
Jisheng Zhang jszhang@kernel.org riscv: tlb: fix __p*d_free_tlb()
Sabrina Dubroca sd@queasysnail.net tls: fix use-after-free on failed backlog decryption
Sabrina Dubroca sd@queasysnail.net tls: separate no-async decryption request handling from async
Sabrina Dubroca sd@queasysnail.net tls: fix peeking with sync+async decryption
Sabrina Dubroca sd@queasysnail.net tls: decrement decrypt_pending if no async completion will be called
Lukasz Majewski lukma@denx.de net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames
Oleksij Rempel o.rempel@pengutronix.de igb: extend PTP timestamp adjustments to i211
Lin Ma linma@zju.edu.cn rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
Jakub Kicinski kuba@kernel.org tools: ynl: fix handling of multiple mcast groups
Florian Westphal fw@strlen.de netfilter: bridge: confirm multicast packets before passing them up the stack
Ignat Korchagin ignat@cloudflare.com netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
Zijun Hu quic_zijuhu@quicinc.com Bluetooth: qca: Fix triggering coredump implementation
Janaki Ramaiah Thota quic_janathot@quicinc.com Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT
Zijun Hu quic_zijuhu@quicinc.com Bluetooth: qca: Fix wrong event type for patch config command
Kai-Heng Feng kai.heng.feng@canonical.com Bluetooth: Enforce validation on max value of connection interval
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
Zijun Hu quic_zijuhu@quicinc.com Bluetooth: hci_event: Fix wrongly recorded wakeup BD_ADDR
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_sync: Fix accept_list when attempting to suspend
Ying Hsu yinghsu@chromium.org Bluetooth: Avoid potential use-after-free in hci_error_reset
Jonas Dreßler verdre@v0yd.nl Bluetooth: hci_sync: Check the correct flag before starting a scan
Jakub Raczynski j.raczynski@samsung.com stmmac: Clear variable when destroying workqueue
Justin Iurman justin.iurman@uliege.be uapi: in6: replace temporary label with rfc9486
Oleksij Rempel o.rempel@pengutronix.de net: lan78xx: fix "softirq work is pending" error
Javier Carrasco javier.carrasco.cruz@gmail.com net: usb: dm9601: fix wrong return value in dm9601_mdio_read
Jakub Kicinski kuba@kernel.org veth: try harder when allocating queue memory
Oleksij Rempel o.rempel@pengutronix.de lan78xx: enable auto speed configuration for LAN7850 if no EEPROM is detected
Eric Dumazet edumazet@google.com ipv6: fix potential "struct net" leak in inet6_rtm_getaddr()
Jakub Kicinski kuba@kernel.org net: veth: clear GRO when clearing XDP even when down
Doug Smythies dsmythies@telus.net cpufreq: intel_pstate: fix pstate limits enforcement for adjust_perf call back
Yunjian Wang wangyunjian@huawei.com tun: Fix xdp_rxq_info's queue_index when detaching
Vladimir Oltean vladimir.oltean@nxp.com net: dpaa: fman_memac: accept phy-interface-type = "10gbase-r" in the device tree
Jeremy Kerr jk@codeconstruct.com.au net: mctp: take ownership of skb in mctp_local_output
Florian Westphal fw@strlen.de net: ip_tunnel: prevent perpetual headroom growth
Florian Westphal fw@strlen.de netlink: add nla be16/32 types to minlen array
Ryosuke Yasuoka ryasuoka@redhat.com netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter
Théo Lebrun theo.lebrun@bootlin.com spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks
Théo Lebrun theo.lebrun@bootlin.com spi: cadence-qspi: fix pointer reference in runtime PM hooks
Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com ice: fix pin phase adjust updates on PF reset
Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com ice: fix dpll periodic work data updates on PF reset
Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com ice: fix dpll and dpll_pin data access on PF reset
Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com ice: fix dpll input pin phase_adjust value updates
Yochai Hagvi yochai.hagvi@intel.com ice: fix connection state of DPLL and out pin
Han Xu han.xu@nxp.com mtd: spinand: gigadevice: Fix the get ecc status issue
Josef Bacik josef@toxicpanda.com btrfs: fix deadlock with fiemap and extent locking
-------------
Diffstat:
Documentation/arch/x86/mds.rst | 34 +++- Makefile | 4 +- arch/arm64/crypto/aes-neonbs-glue.c | 11 ++ arch/powerpc/include/asm/rtas.h | 4 +- arch/powerpc/kernel/rtas.c | 9 +- arch/powerpc/platforms/pseries/iommu.c | 156 +++++++++++------ arch/riscv/Kconfig | 1 - arch/riscv/include/asm/csr.h | 2 + arch/riscv/include/asm/ftrace.h | 5 + arch/riscv/include/asm/hugetlb.h | 2 + arch/riscv/include/asm/pgalloc.h | 20 ++- arch/riscv/include/asm/pgtable-64.h | 2 +- arch/riscv/include/asm/pgtable.h | 6 +- arch/riscv/include/asm/suspend.h | 1 + arch/riscv/include/asm/vmalloc.h | 61 +------ arch/riscv/kernel/Makefile | 2 + arch/riscv/kernel/cpufeature.c | 17 +- arch/riscv/kernel/return_address.c | 48 +++++ arch/riscv/kernel/suspend.c | 4 + arch/riscv/mm/hugetlbpage.c | 2 + arch/x86/entry/entry_32.S | 3 + arch/x86/entry/entry_64.S | 11 ++ arch/x86/entry/entry_64_compat.S | 1 + arch/x86/include/asm/entry-common.h | 1 - arch/x86/include/asm/nospec-branch.h | 12 -- arch/x86/kernel/cpu/bugs.c | 15 +- arch/x86/kernel/cpu/common.c | 4 +- arch/x86/kernel/cpu/intel.c | 178 ++++++++++--------- arch/x86/kernel/e820.c | 8 +- arch/x86/kernel/nmi.c | 3 - arch/x86/kvm/vmx/run_flags.h | 7 +- arch/x86/kvm/vmx/vmenter.S | 9 +- arch/x86/kvm/vmx/vmx.c | 20 ++- drivers/bluetooth/btqca.c | 2 +- drivers/bluetooth/hci_bcm4377.c | 3 +- drivers/bluetooth/hci_qca.c | 22 ++- drivers/cpufreq/intel_pstate.c | 3 + drivers/dma/dw-edma/dw-edma-v0-core.c | 17 ++ drivers/dma/dw-edma/dw-hdma-v0-core.c | 39 +++-- drivers/dma/dw-edma/dw-hdma-v0-regs.h | 2 +- drivers/dma/fsl-edma-common.c | 2 +- drivers/dma/fsl-qdma.c | 25 +-- drivers/dma/idxd/cdev.c | 2 +- drivers/dma/idxd/debugfs.c | 2 +- drivers/dma/idxd/idxd.h | 1 - drivers/dma/idxd/init.c | 15 +- drivers/dma/idxd/irq.c | 3 +- drivers/dma/ptdma/ptdma-dmaengine.c | 2 - drivers/firmware/efi/capsule-loader.c | 2 +- drivers/gpio/gpio-74x164.c | 4 +- drivers/gpio/gpiolib.c | 12 +- .../drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 6 +- drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c | 5 + drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 29 +++ drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 9 +- drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 9 +- .../drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 9 +- .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 9 +- .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c | 9 +- drivers/gpu/drm/drm_buddy.c | 10 ++ drivers/gpu/drm/nouveau/nouveau_drm.c | 5 +- drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 4 +- drivers/gpu/drm/tegra/drm.c | 23 ++- drivers/gpu/host1x/dev.c | 15 +- drivers/gpu/host1x/dev.h | 6 + drivers/iommu/iommufd/io_pagetable.c | 9 +- drivers/iommu/iommufd/selftest.c | 27 ++- drivers/mfd/twl6030-irq.c | 10 +- drivers/mmc/core/mmc.c | 2 + drivers/mmc/host/mmci_stm32_sdmmc.c | 24 +++ drivers/mmc/host/sdhci-xenon-phy.c | 48 ++++- drivers/mtd/nand/raw/marvell_nand.c | 13 +- drivers/mtd/nand/spi/gigadevice.c | 6 +- drivers/net/ethernet/freescale/fman/fman_memac.c | 18 +- drivers/net/ethernet/intel/ice/ice_dpll.c | 91 ++++++++-- drivers/net/ethernet/intel/igb/igb_ptp.c | 5 +- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 +- drivers/net/gtp.c | 12 +- drivers/net/tun.c | 1 + drivers/net/usb/dm9601.c | 2 +- drivers/net/usb/lan78xx.c | 5 +- drivers/net/veth.c | 40 ++--- drivers/of/property.c | 2 +- drivers/perf/riscv_pmu.c | 18 +- drivers/perf/riscv_pmu_legacy.c | 10 +- drivers/phy/freescale/phy-fsl-imx8-mipi-dphy.c | 2 +- drivers/phy/qualcomm/phy-qcom-m31.c | 2 +- drivers/phy/qualcomm/phy-qcom-qmp-usb.c | 10 +- drivers/pmdomain/arm/scmi_perf_domain.c | 3 + drivers/pmdomain/qcom/rpmhpd.c | 7 +- drivers/power/supply/Kconfig | 1 + drivers/power/supply/bq27xxx_battery_i2c.c | 4 +- drivers/soc/qcom/pmic_glink.c | 21 +-- drivers/spi/spi-cadence-quadspi.c | 11 +- drivers/video/fbdev/core/fbcon.c | 8 +- fs/afs/dir.c | 4 +- fs/btrfs/dev-replace.c | 24 ++- fs/btrfs/disk-io.c | 22 +-- fs/btrfs/disk-io.h | 2 +- fs/btrfs/extent_io.c | 165 ++++++++++++++--- fs/btrfs/ioctl.c | 2 +- fs/btrfs/send.c | 17 +- fs/btrfs/transaction.c | 2 +- fs/ceph/mdsmap.c | 7 +- fs/ceph/mdsmap.h | 6 +- fs/efivarfs/vars.c | 17 +- fs/nfs/write.c | 4 +- include/linux/bvec.h | 2 +- include/linux/netfilter.h | 1 + include/net/mctp.h | 1 + include/sound/soc-card.h | 2 + include/uapi/linux/in6.h | 2 +- kernel/trace/fprobe.c | 14 +- lib/nlattr.c | 4 + mm/debug_vm_pgtable.c | 8 + mm/filemap.c | 51 +++--- mm/migrate.c | 8 + net/bluetooth/hci_core.c | 7 +- net/bluetooth/hci_event.c | 13 +- net/bluetooth/hci_sync.c | 7 +- net/bluetooth/l2cap_core.c | 8 +- net/bridge/br_netfilter_hooks.c | 96 ++++++++++ net/bridge/netfilter/nf_conntrack_bridge.c | 30 ++++ net/core/rtnetlink.c | 11 +- net/hsr/hsr_forward.c | 2 +- net/ipv4/ip_tunnel.c | 28 ++- net/ipv6/addrconf.c | 7 +- net/mctp/route.c | 10 +- net/mptcp/diag.c | 3 + net/mptcp/options.c | 2 +- net/mptcp/pm_userspace.c | 10 ++ net/mptcp/protocol.c | 52 +++++- net/mptcp/protocol.h | 21 +-- net/netfilter/nf_conntrack_core.c | 1 + net/netfilter/nft_compat.c | 20 +++ net/netlink/af_netlink.c | 2 +- net/tls/tls_sw.c | 40 +++-- net/unix/garbage.c | 21 +-- net/wireless/nl80211.c | 2 + scripts/Kconfig.include | 2 +- scripts/Makefile.compiler | 2 +- security/landlock/fs.c | 4 +- security/tomoyo/common.c | 3 +- sound/core/Makefile | 1 - sound/core/ump.c | 4 +- sound/firewire/amdtp-stream.c | 2 +- sound/pci/hda/patch_realtek.c | 33 +++- sound/soc/codecs/cs35l45.c | 2 +- sound/soc/codecs/cs35l56-shared.c | 8 +- sound/soc/codecs/cs35l56.c | 195 ++++++++++++++++++--- sound/soc/codecs/cs35l56.h | 1 + sound/soc/fsl/fsl_xcvr.c | 12 +- sound/soc/qcom/lpass-cdc-dma.c | 2 +- sound/soc/soc-card.c | 24 ++- tools/net/ynl/lib/ynl.c | 1 + tools/testing/selftests/net/mptcp/mptcp_connect.sh | 16 +- tools/testing/selftests/net/mptcp/mptcp_join.sh | 192 ++++++++++++-------- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 15 ++ tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 8 +- tools/testing/selftests/net/mptcp/userspace_pm.sh | 86 +++++---- 160 files changed, 1927 insertions(+), 829 deletions(-)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit b0ad381fa7690244802aed119b478b4bdafc31dd ]
While working on the patchset to remove extent locking I got a lockdep splat with fiemap and pagefaulting with my new extent lock replacement lock.
This deadlock exists with our normal code, we just don't have lockdep annotations with the extent locking so we've never noticed it.
Since we're copying the fiemap extent to user space on every iteration we have the chance of pagefaulting. Because we hold the extent lock for the entire range we could mkwrite into a range in the file that we have mmap'ed. This would deadlock with the following stack trace
[<0>] lock_extent+0x28d/0x2f0 [<0>] btrfs_page_mkwrite+0x273/0x8a0 [<0>] do_page_mkwrite+0x50/0xb0 [<0>] do_fault+0xc1/0x7b0 [<0>] __handle_mm_fault+0x2fa/0x460 [<0>] handle_mm_fault+0xa4/0x330 [<0>] do_user_addr_fault+0x1f4/0x800 [<0>] exc_page_fault+0x7c/0x1e0 [<0>] asm_exc_page_fault+0x26/0x30 [<0>] rep_movs_alternative+0x33/0x70 [<0>] _copy_to_user+0x49/0x70 [<0>] fiemap_fill_next_extent+0xc8/0x120 [<0>] emit_fiemap_extent+0x4d/0xa0 [<0>] extent_fiemap+0x7f8/0xad0 [<0>] btrfs_fiemap+0x49/0x80 [<0>] __x64_sys_ioctl+0x3e1/0xb50 [<0>] do_syscall_64+0x94/0x1a0 [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
I wrote an fstest to reproduce this deadlock without my replacement lock and verified that the deadlock exists with our existing locking.
To fix this simply don't take the extent lock for the entire duration of the fiemap. This is safe in general because we keep track of where we are when we're searching the tree, so if an ordered extent updates in the middle of our fiemap call we'll still emit the correct extents because we know what offset we were on before.
The only place we maintain the lock is searching delalloc. Since the delalloc stuff can change during writeback we want to lock the extent range so we have a consistent view of delalloc at the time we're checking to see if we need to set the delalloc flag.
With this patch applied we no longer deadlock with my testcase.
CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Josef Bacik josef@toxicpanda.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/extent_io.c | 62 ++++++++++++++++++++++++++++++++------------ 1 file changed, 45 insertions(+), 17 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 8f724c54fc8e9..197b41d02735b 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2645,16 +2645,34 @@ static int fiemap_process_hole(struct btrfs_inode *inode, * it beyond i_size. */ while (cur_offset < end && cur_offset < i_size) { + struct extent_state *cached_state = NULL; u64 delalloc_start; u64 delalloc_end; u64 prealloc_start; + u64 lockstart; + u64 lockend; u64 prealloc_len = 0; bool delalloc;
+ lockstart = round_down(cur_offset, inode->root->fs_info->sectorsize); + lockend = round_up(end, inode->root->fs_info->sectorsize); + + /* + * We are only locking for the delalloc range because that's the + * only thing that can change here. With fiemap we have a lock + * on the inode, so no buffered or direct writes can happen. + * + * However mmaps and normal page writeback will cause this to + * change arbitrarily. We have to lock the extent lock here to + * make sure that nobody messes with the tree while we're doing + * btrfs_find_delalloc_in_range. + */ + lock_extent(&inode->io_tree, lockstart, lockend, &cached_state); delalloc = btrfs_find_delalloc_in_range(inode, cur_offset, end, delalloc_cached_state, &delalloc_start, &delalloc_end); + unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); if (!delalloc) break;
@@ -2822,15 +2840,15 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, u64 start, u64 len) { const u64 ino = btrfs_ino(inode); - struct extent_state *cached_state = NULL; struct extent_state *delalloc_cached_state = NULL; struct btrfs_path *path; struct fiemap_cache cache = { 0 }; struct btrfs_backref_share_check_ctx *backref_ctx; u64 last_extent_end; u64 prev_extent_end; - u64 lockstart; - u64 lockend; + u64 range_start; + u64 range_end; + const u64 sectorsize = inode->root->fs_info->sectorsize; bool stopped = false; int ret;
@@ -2841,12 +2859,11 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, goto out; }
- lockstart = round_down(start, inode->root->fs_info->sectorsize); - lockend = round_up(start + len, inode->root->fs_info->sectorsize); - prev_extent_end = lockstart; + range_start = round_down(start, sectorsize); + range_end = round_up(start + len, sectorsize); + prev_extent_end = range_start;
btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED); - lock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
ret = fiemap_find_last_extent_offset(inode, path, &last_extent_end); if (ret < 0) @@ -2854,7 +2871,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, btrfs_release_path(path);
path->reada = READA_FORWARD; - ret = fiemap_search_slot(inode, path, lockstart); + ret = fiemap_search_slot(inode, path, range_start); if (ret < 0) { goto out_unlock; } else if (ret > 0) { @@ -2866,7 +2883,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, goto check_eof_delalloc; }
- while (prev_extent_end < lockend) { + while (prev_extent_end < range_end) { struct extent_buffer *leaf = path->nodes[0]; struct btrfs_file_extent_item *ei; struct btrfs_key key; @@ -2889,19 +2906,19 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, * The first iteration can leave us at an extent item that ends * before our range's start. Move to the next item. */ - if (extent_end <= lockstart) + if (extent_end <= range_start) goto next_item;
backref_ctx->curr_leaf_bytenr = leaf->start;
/* We have in implicit hole (NO_HOLES feature enabled). */ if (prev_extent_end < key.offset) { - const u64 range_end = min(key.offset, lockend) - 1; + const u64 hole_end = min(key.offset, range_end) - 1;
ret = fiemap_process_hole(inode, fieinfo, &cache, &delalloc_cached_state, backref_ctx, 0, 0, 0, - prev_extent_end, range_end); + prev_extent_end, hole_end); if (ret < 0) { goto out_unlock; } else if (ret > 0) { @@ -2911,7 +2928,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, }
/* We've reached the end of the fiemap range, stop. */ - if (key.offset >= lockend) { + if (key.offset >= range_end) { stopped = true; break; } @@ -3005,29 +3022,41 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, btrfs_free_path(path); path = NULL;
- if (!stopped && prev_extent_end < lockend) { + if (!stopped && prev_extent_end < range_end) { ret = fiemap_process_hole(inode, fieinfo, &cache, &delalloc_cached_state, backref_ctx, - 0, 0, 0, prev_extent_end, lockend - 1); + 0, 0, 0, prev_extent_end, range_end - 1); if (ret < 0) goto out_unlock; - prev_extent_end = lockend; + prev_extent_end = range_end; }
if (cache.cached && cache.offset + cache.len >= last_extent_end) { const u64 i_size = i_size_read(&inode->vfs_inode);
if (prev_extent_end < i_size) { + struct extent_state *cached_state = NULL; u64 delalloc_start; u64 delalloc_end; + u64 lockstart; + u64 lockend; bool delalloc;
+ lockstart = round_down(prev_extent_end, sectorsize); + lockend = round_up(i_size, sectorsize); + + /* + * See the comment in fiemap_process_hole as to why + * we're doing the locking here. + */ + lock_extent(&inode->io_tree, lockstart, lockend, &cached_state); delalloc = btrfs_find_delalloc_in_range(inode, prev_extent_end, i_size - 1, &delalloc_cached_state, &delalloc_start, &delalloc_end); + unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); if (!delalloc) cache.flags |= FIEMAP_EXTENT_LAST; } else { @@ -3038,7 +3067,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, ret = emit_last_fiemap_cache(fieinfo, &cache);
out_unlock: - unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED); out: free_extent_state(delalloc_cached_state);
On Mon, Mar 4, 2024 at 9:26 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
6.7-stable review patch. If anyone has any objections, please let me know.
It would be better to delay the backport of this patch (and the followup fix) to any stable release, because it introduced another regression for which there is a reviewed fix but it's not yet in Linus' tree:
https://lore.kernel.org/linux-btrfs/cover.1709202499.git.fdmanana@suse.com/
Thanks.
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit b0ad381fa7690244802aed119b478b4bdafc31dd ]
While working on the patchset to remove extent locking I got a lockdep splat with fiemap and pagefaulting with my new extent lock replacement lock.
This deadlock exists with our normal code, we just don't have lockdep annotations with the extent locking so we've never noticed it.
Since we're copying the fiemap extent to user space on every iteration we have the chance of pagefaulting. Because we hold the extent lock for the entire range we could mkwrite into a range in the file that we have mmap'ed. This would deadlock with the following stack trace
[<0>] lock_extent+0x28d/0x2f0 [<0>] btrfs_page_mkwrite+0x273/0x8a0 [<0>] do_page_mkwrite+0x50/0xb0 [<0>] do_fault+0xc1/0x7b0 [<0>] __handle_mm_fault+0x2fa/0x460 [<0>] handle_mm_fault+0xa4/0x330 [<0>] do_user_addr_fault+0x1f4/0x800 [<0>] exc_page_fault+0x7c/0x1e0 [<0>] asm_exc_page_fault+0x26/0x30 [<0>] rep_movs_alternative+0x33/0x70 [<0>] _copy_to_user+0x49/0x70 [<0>] fiemap_fill_next_extent+0xc8/0x120 [<0>] emit_fiemap_extent+0x4d/0xa0 [<0>] extent_fiemap+0x7f8/0xad0 [<0>] btrfs_fiemap+0x49/0x80 [<0>] __x64_sys_ioctl+0x3e1/0xb50 [<0>] do_syscall_64+0x94/0x1a0 [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
I wrote an fstest to reproduce this deadlock without my replacement lock and verified that the deadlock exists with our existing locking.
To fix this simply don't take the extent lock for the entire duration of the fiemap. This is safe in general because we keep track of where we are when we're searching the tree, so if an ordered extent updates in the middle of our fiemap call we'll still emit the correct extents because we know what offset we were on before.
The only place we maintain the lock is searching delalloc. Since the delalloc stuff can change during writeback we want to lock the extent range so we have a consistent view of delalloc at the time we're checking to see if we need to set the delalloc flag.
With this patch applied we no longer deadlock with my testcase.
CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Josef Bacik josef@toxicpanda.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org
fs/btrfs/extent_io.c | 62 ++++++++++++++++++++++++++++++++------------ 1 file changed, 45 insertions(+), 17 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 8f724c54fc8e9..197b41d02735b 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2645,16 +2645,34 @@ static int fiemap_process_hole(struct btrfs_inode *inode, * it beyond i_size. */ while (cur_offset < end && cur_offset < i_size) {
struct extent_state *cached_state = NULL; u64 delalloc_start; u64 delalloc_end; u64 prealloc_start;
u64 lockstart;
u64 lockend; u64 prealloc_len = 0; bool delalloc;
lockstart = round_down(cur_offset, inode->root->fs_info->sectorsize);
lockend = round_up(end, inode->root->fs_info->sectorsize);
/*
* We are only locking for the delalloc range because that's the
* only thing that can change here. With fiemap we have a lock
* on the inode, so no buffered or direct writes can happen.
*
* However mmaps and normal page writeback will cause this to
* change arbitrarily. We have to lock the extent lock here to
* make sure that nobody messes with the tree while we're doing
* btrfs_find_delalloc_in_range.
*/
lock_extent(&inode->io_tree, lockstart, lockend, &cached_state); delalloc = btrfs_find_delalloc_in_range(inode, cur_offset, end, delalloc_cached_state, &delalloc_start, &delalloc_end);
unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); if (!delalloc) break;
@@ -2822,15 +2840,15 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, u64 start, u64 len) { const u64 ino = btrfs_ino(inode);
struct extent_state *cached_state = NULL; struct extent_state *delalloc_cached_state = NULL; struct btrfs_path *path; struct fiemap_cache cache = { 0 }; struct btrfs_backref_share_check_ctx *backref_ctx; u64 last_extent_end; u64 prev_extent_end;
u64 lockstart;
u64 lockend;
u64 range_start;
u64 range_end;
const u64 sectorsize = inode->root->fs_info->sectorsize; bool stopped = false; int ret;
@@ -2841,12 +2859,11 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, goto out; }
lockstart = round_down(start, inode->root->fs_info->sectorsize);
lockend = round_up(start + len, inode->root->fs_info->sectorsize);
prev_extent_end = lockstart;
range_start = round_down(start, sectorsize);
range_end = round_up(start + len, sectorsize);
prev_extent_end = range_start; btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED);
lock_extent(&inode->io_tree, lockstart, lockend, &cached_state); ret = fiemap_find_last_extent_offset(inode, path, &last_extent_end); if (ret < 0)
@@ -2854,7 +2871,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, btrfs_release_path(path);
path->reada = READA_FORWARD;
ret = fiemap_search_slot(inode, path, lockstart);
ret = fiemap_search_slot(inode, path, range_start); if (ret < 0) { goto out_unlock; } else if (ret > 0) {
@@ -2866,7 +2883,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, goto check_eof_delalloc; }
while (prev_extent_end < lockend) {
while (prev_extent_end < range_end) { struct extent_buffer *leaf = path->nodes[0]; struct btrfs_file_extent_item *ei; struct btrfs_key key;
@@ -2889,19 +2906,19 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, * The first iteration can leave us at an extent item that ends * before our range's start. Move to the next item. */
if (extent_end <= lockstart)
if (extent_end <= range_start) goto next_item; backref_ctx->curr_leaf_bytenr = leaf->start; /* We have in implicit hole (NO_HOLES feature enabled). */ if (prev_extent_end < key.offset) {
const u64 range_end = min(key.offset, lockend) - 1;
const u64 hole_end = min(key.offset, range_end) - 1; ret = fiemap_process_hole(inode, fieinfo, &cache, &delalloc_cached_state, backref_ctx, 0, 0, 0,
prev_extent_end, range_end);
prev_extent_end, hole_end); if (ret < 0) { goto out_unlock; } else if (ret > 0) {
@@ -2911,7 +2928,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, }
/* We've reached the end of the fiemap range, stop. */
if (key.offset >= lockend) {
if (key.offset >= range_end) { stopped = true; break; }
@@ -3005,29 +3022,41 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, btrfs_free_path(path); path = NULL;
if (!stopped && prev_extent_end < lockend) {
if (!stopped && prev_extent_end < range_end) { ret = fiemap_process_hole(inode, fieinfo, &cache, &delalloc_cached_state, backref_ctx,
0, 0, 0, prev_extent_end, lockend - 1);
0, 0, 0, prev_extent_end, range_end - 1); if (ret < 0) goto out_unlock;
prev_extent_end = lockend;
prev_extent_end = range_end; } if (cache.cached && cache.offset + cache.len >= last_extent_end) { const u64 i_size = i_size_read(&inode->vfs_inode); if (prev_extent_end < i_size) {
struct extent_state *cached_state = NULL; u64 delalloc_start; u64 delalloc_end;
u64 lockstart;
u64 lockend; bool delalloc;
lockstart = round_down(prev_extent_end, sectorsize);
lockend = round_up(i_size, sectorsize);
/*
* See the comment in fiemap_process_hole as to why
* we're doing the locking here.
*/
lock_extent(&inode->io_tree, lockstart, lockend, &cached_state); delalloc = btrfs_find_delalloc_in_range(inode, prev_extent_end, i_size - 1, &delalloc_cached_state, &delalloc_start, &delalloc_end);
unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); if (!delalloc) cache.flags |= FIEMAP_EXTENT_LAST; } else {
@@ -3038,7 +3067,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, ret = emit_last_fiemap_cache(fieinfo, &cache);
out_unlock:
unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
out: free_extent_state(delalloc_cached_state); -- 2.43.0
On Wed, Mar 06, 2024 at 12:39:06PM +0000, Filipe Manana wrote:
On Mon, Mar 4, 2024 at 9:26 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
6.7-stable review patch. If anyone has any objections, please let me know.
It would be better to delay the backport of this patch (and the followup fix) to any stable release, because it introduced another regression for which there is a reviewed fix but it's not yet in Linus' tree:
https://lore.kernel.org/linux-btrfs/cover.1709202499.git.fdmanana@suse.com/
Ok, thanks, now dropped. Let us know when you want it added back.
greg k-h
On 06.03.24 13:39, Filipe Manana wrote:
On Mon, Mar 4, 2024 at 9:26 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
6.7-stable review patch. If anyone has any objections, please let me know.
It would be better to delay the backport of this patch (and the followup fix) to any stable release, because it introduced another regression for which there is a reviewed fix but it's not yet in Linus' tree:
https://lore.kernel.org/linux-btrfs/cover.1709202499.git.fdmanana@suse.com/
Those two missed 6.8 afaics. Will those be heading to mainline any time soon? And how fast afterwards will it be wise to backport them to 6.8? Will anyone ask Greg for that when the time has come? Same for backporting "btrfs: fix deadlock with fiemap and extent locking", the followup fix and the two fixed mentioned above for 6.6 and 6.7?
Cioa, Thorsten
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit b0ad381fa7690244802aed119b478b4bdafc31dd ]
While working on the patchset to remove extent locking I got a lockdep splat with fiemap and pagefaulting with my new extent lock replacement lock.
This deadlock exists with our normal code, we just don't have lockdep annotations with the extent locking so we've never noticed it.
Since we're copying the fiemap extent to user space on every iteration we have the chance of pagefaulting. Because we hold the extent lock for the entire range we could mkwrite into a range in the file that we have mmap'ed. This would deadlock with the following stack trace
[<0>] lock_extent+0x28d/0x2f0 [<0>] btrfs_page_mkwrite+0x273/0x8a0 [<0>] do_page_mkwrite+0x50/0xb0 [<0>] do_fault+0xc1/0x7b0 [<0>] __handle_mm_fault+0x2fa/0x460 [<0>] handle_mm_fault+0xa4/0x330 [<0>] do_user_addr_fault+0x1f4/0x800 [<0>] exc_page_fault+0x7c/0x1e0 [<0>] asm_exc_page_fault+0x26/0x30 [<0>] rep_movs_alternative+0x33/0x70 [<0>] _copy_to_user+0x49/0x70 [<0>] fiemap_fill_next_extent+0xc8/0x120 [<0>] emit_fiemap_extent+0x4d/0xa0 [<0>] extent_fiemap+0x7f8/0xad0 [<0>] btrfs_fiemap+0x49/0x80 [<0>] __x64_sys_ioctl+0x3e1/0xb50 [<0>] do_syscall_64+0x94/0x1a0 [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
I wrote an fstest to reproduce this deadlock without my replacement lock and verified that the deadlock exists with our existing locking.
To fix this simply don't take the extent lock for the entire duration of the fiemap. This is safe in general because we keep track of where we are when we're searching the tree, so if an ordered extent updates in the middle of our fiemap call we'll still emit the correct extents because we know what offset we were on before.
The only place we maintain the lock is searching delalloc. Since the delalloc stuff can change during writeback we want to lock the extent range so we have a consistent view of delalloc at the time we're checking to see if we need to set the delalloc flag.
With this patch applied we no longer deadlock with my testcase.
CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Josef Bacik josef@toxicpanda.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org
fs/btrfs/extent_io.c | 62 ++++++++++++++++++++++++++++++++------------ 1 file changed, 45 insertions(+), 17 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 8f724c54fc8e9..197b41d02735b 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2645,16 +2645,34 @@ static int fiemap_process_hole(struct btrfs_inode *inode, * it beyond i_size. */ while (cur_offset < end && cur_offset < i_size) {
struct extent_state *cached_state = NULL; u64 delalloc_start; u64 delalloc_end; u64 prealloc_start;
u64 lockstart;
u64 lockend; u64 prealloc_len = 0; bool delalloc;
lockstart = round_down(cur_offset, inode->root->fs_info->sectorsize);
lockend = round_up(end, inode->root->fs_info->sectorsize);
/*
* We are only locking for the delalloc range because that's the
* only thing that can change here. With fiemap we have a lock
* on the inode, so no buffered or direct writes can happen.
*
* However mmaps and normal page writeback will cause this to
* change arbitrarily. We have to lock the extent lock here to
* make sure that nobody messes with the tree while we're doing
* btrfs_find_delalloc_in_range.
*/
lock_extent(&inode->io_tree, lockstart, lockend, &cached_state); delalloc = btrfs_find_delalloc_in_range(inode, cur_offset, end, delalloc_cached_state, &delalloc_start, &delalloc_end);
unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); if (!delalloc) break;
@@ -2822,15 +2840,15 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, u64 start, u64 len) { const u64 ino = btrfs_ino(inode);
struct extent_state *cached_state = NULL; struct extent_state *delalloc_cached_state = NULL; struct btrfs_path *path; struct fiemap_cache cache = { 0 }; struct btrfs_backref_share_check_ctx *backref_ctx; u64 last_extent_end; u64 prev_extent_end;
u64 lockstart;
u64 lockend;
u64 range_start;
u64 range_end;
const u64 sectorsize = inode->root->fs_info->sectorsize; bool stopped = false; int ret;
@@ -2841,12 +2859,11 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, goto out; }
lockstart = round_down(start, inode->root->fs_info->sectorsize);
lockend = round_up(start + len, inode->root->fs_info->sectorsize);
prev_extent_end = lockstart;
range_start = round_down(start, sectorsize);
range_end = round_up(start + len, sectorsize);
prev_extent_end = range_start; btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED);
lock_extent(&inode->io_tree, lockstart, lockend, &cached_state); ret = fiemap_find_last_extent_offset(inode, path, &last_extent_end); if (ret < 0)
@@ -2854,7 +2871,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, btrfs_release_path(path);
path->reada = READA_FORWARD;
ret = fiemap_search_slot(inode, path, lockstart);
ret = fiemap_search_slot(inode, path, range_start); if (ret < 0) { goto out_unlock; } else if (ret > 0) {
@@ -2866,7 +2883,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, goto check_eof_delalloc; }
while (prev_extent_end < lockend) {
while (prev_extent_end < range_end) { struct extent_buffer *leaf = path->nodes[0]; struct btrfs_file_extent_item *ei; struct btrfs_key key;
@@ -2889,19 +2906,19 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, * The first iteration can leave us at an extent item that ends * before our range's start. Move to the next item. */
if (extent_end <= lockstart)
if (extent_end <= range_start) goto next_item; backref_ctx->curr_leaf_bytenr = leaf->start; /* We have in implicit hole (NO_HOLES feature enabled). */ if (prev_extent_end < key.offset) {
const u64 range_end = min(key.offset, lockend) - 1;
const u64 hole_end = min(key.offset, range_end) - 1; ret = fiemap_process_hole(inode, fieinfo, &cache, &delalloc_cached_state, backref_ctx, 0, 0, 0,
prev_extent_end, range_end);
prev_extent_end, hole_end); if (ret < 0) { goto out_unlock; } else if (ret > 0) {
@@ -2911,7 +2928,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, }
/* We've reached the end of the fiemap range, stop. */
if (key.offset >= lockend) {
if (key.offset >= range_end) { stopped = true; break; }
@@ -3005,29 +3022,41 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, btrfs_free_path(path); path = NULL;
if (!stopped && prev_extent_end < lockend) {
if (!stopped && prev_extent_end < range_end) { ret = fiemap_process_hole(inode, fieinfo, &cache, &delalloc_cached_state, backref_ctx,
0, 0, 0, prev_extent_end, lockend - 1);
0, 0, 0, prev_extent_end, range_end - 1); if (ret < 0) goto out_unlock;
prev_extent_end = lockend;
prev_extent_end = range_end; } if (cache.cached && cache.offset + cache.len >= last_extent_end) { const u64 i_size = i_size_read(&inode->vfs_inode); if (prev_extent_end < i_size) {
struct extent_state *cached_state = NULL; u64 delalloc_start; u64 delalloc_end;
u64 lockstart;
u64 lockend; bool delalloc;
lockstart = round_down(prev_extent_end, sectorsize);
lockend = round_up(i_size, sectorsize);
/*
* See the comment in fiemap_process_hole as to why
* we're doing the locking here.
*/
lock_extent(&inode->io_tree, lockstart, lockend, &cached_state); delalloc = btrfs_find_delalloc_in_range(inode, prev_extent_end, i_size - 1, &delalloc_cached_state, &delalloc_start, &delalloc_end);
unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); if (!delalloc) cache.flags |= FIEMAP_EXTENT_LAST; } else {
@@ -3038,7 +3067,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo, ret = emit_last_fiemap_cache(fieinfo, &cache);
out_unlock:
unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state); btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
out: free_extent_state(delalloc_cached_state); -- 2.43.0
On Mon, Mar 11, 2024 at 10:15:31AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
On 06.03.24 13:39, Filipe Manana wrote:
On Mon, Mar 4, 2024 at 9:26 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
6.7-stable review patch. If anyone has any objections, please let me know.
It would be better to delay the backport of this patch (and the followup fix) to any stable release, because it introduced another regression for which there is a reviewed fix but it's not yet in Linus' tree:
https://lore.kernel.org/linux-btrfs/cover.1709202499.git.fdmanana@suse.com/
Those two missed 6.8 afaics. Will those be heading to mainline any time soon?
Yes, in the 6.9 pull request.
And how fast afterwards will it be wise to backport them to 6.8? Will anyone ask Greg for that when the time has come?
The commits have stable tags and will be processed in the usual way.
On 11.03.24 19:41, David Sterba wrote:
On Mon, Mar 11, 2024 at 10:15:31AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
On 06.03.24 13:39, Filipe Manana wrote:
On Mon, Mar 4, 2024 at 9:26 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
6.7-stable review patch. If anyone has any objections, please let me know.
It would be better to delay the backport of this patch (and the followup fix) to any stable release, because it introduced another regression for which there is a reviewed fix but it's not yet in Linus' tree:
https://lore.kernel.org/linux-btrfs/cover.1709202499.git.fdmanana@suse.com/
Those two missed 6.8 afaics. Will those be heading to mainline any time soon?
Yes, in the 6.9 pull request.
Great!
And how fast afterwards will it be wise to backport them to 6.8? Will anyone ask Greg for that when the time has come?
The commits have stable tags and will be processed in the usual way.
I'm missing something. The first change from Filipe's series linked above has a fixes tag, but no stable tag afaics: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/commit/?h=fo...
So there is no guarantee that Greg will pick it up; and I assume if he does he only will do so after -rc1 (or later, if the CVE stuff continues to keep him busy). As Filipe wrote "can actually have serious consequences" this got me slightly worried. That's why I'm a PITA here, sorry -- but as I said, maybe I'm missing something.
The second of the patches has none of those tags, but well, from the patch descriptions it seems that is just a optimization, so that is likely not something to worry about: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/commit/?h=fo...
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page.
On Mon, Mar 11, 2024 at 7:23 PM Linux regression tracking (Thorsten Leemhuis) regressions@leemhuis.info wrote:
On 11.03.24 19:41, David Sterba wrote:
On Mon, Mar 11, 2024 at 10:15:31AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
On 06.03.24 13:39, Filipe Manana wrote:
On Mon, Mar 4, 2024 at 9:26 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
6.7-stable review patch. If anyone has any objections, please let me know.
It would be better to delay the backport of this patch (and the followup fix) to any stable release, because it introduced another regression for which there is a reviewed fix but it's not yet in Linus' tree:
https://lore.kernel.org/linux-btrfs/cover.1709202499.git.fdmanana@suse.com/
Those two missed 6.8 afaics. Will those be heading to mainline any time soon?
Yes, in the 6.9 pull request.
Great!
And how fast afterwards will it be wise to backport them to 6.8? Will anyone ask Greg for that when the time has come?
The commits have stable tags and will be processed in the usual way.
I'm missing something. The first change from Filipe's series linked above has a fixes tag, but no stable tag afaics: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/commit/?h=fo...
It has no stable tag because when I sent the patch there was yet no kernel release with the buggy commit, which landed in 6.8-rc6. Now it would make sense to add the stable tag because 6.8 was released yesterday and it's the first release with the buggy commit.
So there is no guarantee that Greg will pick it up; and I assume if he does he only will do so after -rc1 (or later, if the CVE stuff continues to keep him busy).
Don't worry, we are paying attention to that and we'll remind Greg if necessary.
As Filipe wrote "can actually have serious consequences" this got me slightly worried. That's why I'm a PITA here, sorry -- but as I said, maybe I'm missing something.
The second of the patches has none of those tags, but well, from the patch descriptions it seems that is just a optimization, so that is likely not something to worry about: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/commit/?h=fo...
Yes, it's an optimization. If it were a bug fix, I would have added a Fixes tag and would have described what the bug was.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page.
On 11.03.24 21:06, Filipe Manana wrote:
On Mon, Mar 11, 2024 at 7:23 PM Linux regression tracking (Thorsten Leemhuis) regressions@leemhuis.info wrote:
On 11.03.24 19:41, David Sterba wrote:
On Mon, Mar 11, 2024 at 10:15:31AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
On 06.03.24 13:39, Filipe Manana wrote:
On Mon, Mar 4, 2024 at 9:26 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
6.7-stable review patch. If anyone has any objections, please let me know.
It would be better to delay the backport of this patch (and the followup fix) to any stable release, because it introduced another regression for which there is a reviewed fix but it's not yet in Linus' tree: https://lore.kernel.org/linux-btrfs/cover.1709202499.git.fdmanana@suse.com/
Those two missed 6.8 afaics. Will those be heading to mainline any time soon?
Yes, in the 6.9 pull request.
Great!
And how fast afterwards will it be wise to backport them to 6.8? Will anyone ask Greg for that when the time has come?
The commits have stable tags and will be processed in the usual way.
I'm missing something. The first change from Filipe's series linked above has a fixes tag, but no stable tag afaics: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/commit/?h=fo...
It has no stable tag because when I sent the patch there was yet no kernel release with the buggy commit
Obviously, no need to explain, the discussion got unintentionally sideways after David mistakenly said "The commits have stable tags [...]". Happens, no worries.
So there is no guarantee that Greg will pick it up; and I assume if he does he only will do so after -rc1 (or later, if the CVE stuff continues to keep him busy).
Don't worry, we are paying attention to that and we'll remind Greg if necessary.
Thx. But well, for the record: I would really have liked if you or David would simply have just answered my earlier "And how fast afterwards will it be wise to backport them to 6.8?" question instead of avoiding it. Just knowing a rough estimate would have helped. And I guess Greg might have liked to know the answer, too. But whatever.
Side note: I'm here to "worry". It's not that I don't trust you or would ask Greg behind your back to pick the patches up. It's just that we are all humans[1]. And regression tracking is here to help with the flaws humans have: they miss things, they suddenly need to go to hospitals for a while, they become preoccupied with solving the next complicated and big bug of the month, or just forget something they wanted to do because something unexpected happens, like aliens landing in a unidentified flying object. And from all the regressions I see that are not handled well and the feedback I got it seems doing this work seems to be worth it.
Ciao, Thorsten
[1] at least I think so... https://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_you%27re_a_dog :-D
On Wed, Mar 13, 2024 at 1:42 PM Linux regression tracking (Thorsten Leemhuis) regressions@leemhuis.info wrote:
On 11.03.24 21:06, Filipe Manana wrote:
On Mon, Mar 11, 2024 at 7:23 PM Linux regression tracking (Thorsten Leemhuis) regressions@leemhuis.info wrote:
On 11.03.24 19:41, David Sterba wrote:
On Mon, Mar 11, 2024 at 10:15:31AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
On 06.03.24 13:39, Filipe Manana wrote:
On Mon, Mar 4, 2024 at 9:26 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote: > > 6.7-stable review patch. If anyone has any objections, please let me know. It would be better to delay the backport of this patch (and the followup fix) to any stable release, because it introduced another regression for which there is a reviewed fix but it's not yet in Linus' tree: https://lore.kernel.org/linux-btrfs/cover.1709202499.git.fdmanana@suse.com/
Those two missed 6.8 afaics. Will those be heading to mainline any time soon?
Yes, in the 6.9 pull request.
Great!
And how fast afterwards will it be wise to backport them to 6.8? Will anyone ask Greg for that when the time has come?
The commits have stable tags and will be processed in the usual way.
I'm missing something. The first change from Filipe's series linked above has a fixes tag, but no stable tag afaics: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/commit/?h=fo...
It has no stable tag because when I sent the patch there was yet no kernel release with the buggy commit
Obviously, no need to explain, the discussion got unintentionally sideways after David mistakenly said "The commits have stable tags [...]". Happens, no worries.
So there is no guarantee that Greg will pick it up; and I assume if he does he only will do so after -rc1 (or later, if the CVE stuff continues to keep him busy).
Don't worry, we are paying attention to that and we'll remind Greg if necessary.
Thx. But well, for the record: I would really have liked if you or David would simply have just answered my earlier "And how fast afterwards will it be wise to backport them to 6.8?" question instead of avoiding it.
If I avoided that question it's because I can't give an answer to it.
David is the maintainer who picks patches for Linus and does the pull requests. So the "how fast" depends on him and then how fast Linus merges and then how fast Greg and the stable people pick it.
Just knowing a rough estimate would have helped. And I guess Greg might have liked to know the answer, too. But whatever.
Side note: I'm here to "worry". It's not that I don't trust you or would ask Greg behind your back to pick the patches up. It's just that we are all humans[1]. And regression tracking is here to help with the flaws humans have: they miss things, they suddenly need to go to hospitals for a while, they become preoccupied with solving the next complicated and big bug of the month, or just forget something they wanted to do because something unexpected happens, like aliens landing in a unidentified flying object. And from all the regressions I see that are not handled well and the feedback I got it seems doing this work seems to be worth it.
Ciao, Thorsten
[1] at least I think so... https://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_you%27re_a_dog :-D
On Mon, Mar 11, 2024 at 08:23:23PM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
On 11.03.24 19:41, David Sterba wrote:
On Mon, Mar 11, 2024 at 10:15:31AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
On 06.03.24 13:39, Filipe Manana wrote:
On Mon, Mar 4, 2024 at 9:26 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
6.7-stable review patch. If anyone has any objections, please let me know.
It would be better to delay the backport of this patch (and the followup fix) to any stable release, because it introduced another regression for which there is a reviewed fix but it's not yet in Linus' tree:
https://lore.kernel.org/linux-btrfs/cover.1709202499.git.fdmanana@suse.com/
Those two missed 6.8 afaics. Will those be heading to mainline any time soon?
Yes, in the 6.9 pull request.
Great!
And how fast afterwards will it be wise to backport them to 6.8? Will anyone ask Greg for that when the time has come?
The commits have stable tags and will be processed in the usual way.
I'm missing something. The first change from Filipe's series linked above has a fixes tag, but no stable tag afaics: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/commit/?h=fo...
So there is no guarantee that Greg will pick it up; and I assume if he does he only will do so after -rc1 (or later, if the CVE stuff continues to keep him busy). As Filipe wrote "can actually have serious consequences" this got me slightly worried. That's why I'm a PITA here, sorry -- but as I said, maybe I'm missing something.
Well it's the timing, last week before a final release the branches don't receive any insignificant changes like reviewed-by or stable tags. The patch connection is also done by the Fixes tag and a missing CC:stable can be substituted by explicit requests for backport if needed.
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Han Xu han.xu@nxp.com
[ Upstream commit 59950610c0c00c7a06d8a75d2ee5d73dba4274cf ]
Some GigaDevice ecc_get_status functions use on-stack buffer for spi_mem_op causes spi_mem_check_op failing, fix the issue by using spinand scratchbuf.
Fixes: c40c7a990a46 ("mtd: spinand: Add support for GigaDevice GD5F1GQ4UExxG") Signed-off-by: Han Xu han.xu@nxp.com Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/20231108150701.593912-1-han.xu@nxp.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mtd/nand/spi/gigadevice.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/nand/spi/gigadevice.c b/drivers/mtd/nand/spi/gigadevice.c index 987710e09441a..6023cba748bb8 100644 --- a/drivers/mtd/nand/spi/gigadevice.c +++ b/drivers/mtd/nand/spi/gigadevice.c @@ -186,7 +186,7 @@ static int gd5fxgq4uexxg_ecc_get_status(struct spinand_device *spinand, { u8 status2; struct spi_mem_op op = SPINAND_GET_FEATURE_OP(GD5FXGQXXEXXG_REG_STATUS2, - &status2); + spinand->scratchbuf); int ret;
switch (status & STATUS_ECC_MASK) { @@ -207,6 +207,7 @@ static int gd5fxgq4uexxg_ecc_get_status(struct spinand_device *spinand, * report the maximum of 4 in this case */ /* bits sorted this way (3...0): ECCS1,ECCS0,ECCSE1,ECCSE0 */ + status2 = *(spinand->scratchbuf); return ((status & STATUS_ECC_MASK) >> 2) | ((status2 & STATUS_ECC_MASK) >> 4);
@@ -228,7 +229,7 @@ static int gd5fxgq5xexxg_ecc_get_status(struct spinand_device *spinand, { u8 status2; struct spi_mem_op op = SPINAND_GET_FEATURE_OP(GD5FXGQXXEXXG_REG_STATUS2, - &status2); + spinand->scratchbuf); int ret;
switch (status & STATUS_ECC_MASK) { @@ -248,6 +249,7 @@ static int gd5fxgq5xexxg_ecc_get_status(struct spinand_device *spinand, * 1 ... 4 bits are flipped (and corrected) */ /* bits sorted this way (1...0): ECCSE1, ECCSE0 */ + status2 = *(spinand->scratchbuf); return ((status2 & STATUS_ECC_MASK) >> 4) + 1;
case STATUS_ECC_UNCOR_ERROR:
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yochai Hagvi yochai.hagvi@intel.com
[ Upstream commit e8335ef57c6816d81b24173ba88cc9b3f043687f ]
Fix the connection state between source DPLL and output pin, updating the attribute 'state' of 'parent_device'. Previously, the connection state was broken, and didn't reflect the correct state.
When 'state_on_dpll_set' is called with the value 'DPLL_PIN_STATE_CONNECTED' (1), the output pin will switch to the given DPLL, and the state of the given DPLL will be set to connected. E.g.: --do pin-set --json '{"id":2, "parent-device":{"parent-id":1, "state": 1 }}' This command will connect DPLL device with id 1 to output pin with id 2.
When 'state_on_dpll_set' is called with the value 'DPLL_PIN_STATE_DISCONNECTED' (2) and the given DPLL is currently connected, then the output pin will be disabled. E.g: --do pin-set --json '{"id":2, "parent-device":{"parent-id":1, "state": 2 }}' This command will disable output pin with id 2 if DPLL device with ID 1 is connected to it; otherwise, the command is ignored.
Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu") Reviewed-by: Wojciech Drewek wojciech.drewek@intel.com Reviewed-by: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com Signed-off-by: Yochai Hagvi yochai.hagvi@intel.com Tested-by: Sunitha Mekala sunithax.d.mekala@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_dpll.c | 43 +++++++++++++++++------ 1 file changed, 32 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/ethernet/intel/ice/ice_dpll.c index 86b180cb32a02..0f836adc0e587 100644 --- a/drivers/net/ethernet/intel/ice/ice_dpll.c +++ b/drivers/net/ethernet/intel/ice/ice_dpll.c @@ -254,6 +254,7 @@ ice_dpll_output_frequency_get(const struct dpll_pin *pin, void *pin_priv, * ice_dpll_pin_enable - enable a pin on dplls * @hw: board private hw structure * @pin: pointer to a pin + * @dpll_idx: dpll index to connect to output pin * @pin_type: type of pin being enabled * @extack: error reporting * @@ -266,7 +267,7 @@ ice_dpll_output_frequency_get(const struct dpll_pin *pin, void *pin_priv, */ static int ice_dpll_pin_enable(struct ice_hw *hw, struct ice_dpll_pin *pin, - enum ice_dpll_pin_type pin_type, + u8 dpll_idx, enum ice_dpll_pin_type pin_type, struct netlink_ext_ack *extack) { u8 flags = 0; @@ -280,10 +281,12 @@ ice_dpll_pin_enable(struct ice_hw *hw, struct ice_dpll_pin *pin, ret = ice_aq_set_input_pin_cfg(hw, pin->idx, 0, flags, 0, 0); break; case ICE_DPLL_PIN_TYPE_OUTPUT: + flags = ICE_AQC_SET_CGU_OUT_CFG_UPDATE_SRC_SEL; if (pin->flags[0] & ICE_AQC_GET_CGU_OUT_CFG_ESYNC_EN) flags |= ICE_AQC_SET_CGU_OUT_CFG_ESYNC_EN; flags |= ICE_AQC_SET_CGU_OUT_CFG_OUT_EN; - ret = ice_aq_set_output_pin_cfg(hw, pin->idx, flags, 0, 0, 0); + ret = ice_aq_set_output_pin_cfg(hw, pin->idx, flags, dpll_idx, + 0, 0); break; default: return -EINVAL; @@ -398,14 +401,27 @@ ice_dpll_pin_state_update(struct ice_pf *pf, struct ice_dpll_pin *pin, break; case ICE_DPLL_PIN_TYPE_OUTPUT: ret = ice_aq_get_output_pin_cfg(&pf->hw, pin->idx, - &pin->flags[0], NULL, + &pin->flags[0], &parent, &pin->freq, NULL); if (ret) goto err; - if (ICE_AQC_SET_CGU_OUT_CFG_OUT_EN & pin->flags[0]) - pin->state[0] = DPLL_PIN_STATE_CONNECTED; - else - pin->state[0] = DPLL_PIN_STATE_DISCONNECTED; + + parent &= ICE_AQC_GET_CGU_OUT_CFG_DPLL_SRC_SEL; + if (ICE_AQC_SET_CGU_OUT_CFG_OUT_EN & pin->flags[0]) { + pin->state[pf->dplls.eec.dpll_idx] = + parent == pf->dplls.eec.dpll_idx ? + DPLL_PIN_STATE_CONNECTED : + DPLL_PIN_STATE_DISCONNECTED; + pin->state[pf->dplls.pps.dpll_idx] = + parent == pf->dplls.pps.dpll_idx ? + DPLL_PIN_STATE_CONNECTED : + DPLL_PIN_STATE_DISCONNECTED; + } else { + pin->state[pf->dplls.eec.dpll_idx] = + DPLL_PIN_STATE_DISCONNECTED; + pin->state[pf->dplls.pps.dpll_idx] = + DPLL_PIN_STATE_DISCONNECTED; + } break; case ICE_DPLL_PIN_TYPE_RCLK_INPUT: for (parent = 0; parent < pf->dplls.rclk.num_parents; @@ -595,7 +611,8 @@ ice_dpll_pin_state_set(const struct dpll_pin *pin, void *pin_priv,
mutex_lock(&pf->dplls.lock); if (enable) - ret = ice_dpll_pin_enable(&pf->hw, p, pin_type, extack); + ret = ice_dpll_pin_enable(&pf->hw, p, d->dpll_idx, pin_type, + extack); else ret = ice_dpll_pin_disable(&pf->hw, p, pin_type, extack); if (!ret) @@ -628,6 +645,11 @@ ice_dpll_output_state_set(const struct dpll_pin *pin, void *pin_priv, struct netlink_ext_ack *extack) { bool enable = state == DPLL_PIN_STATE_CONNECTED; + struct ice_dpll_pin *p = pin_priv; + struct ice_dpll *d = dpll_priv; + + if (!enable && p->state[d->dpll_idx] == DPLL_PIN_STATE_DISCONNECTED) + return 0;
return ice_dpll_pin_state_set(pin, pin_priv, dpll, dpll_priv, enable, extack, ICE_DPLL_PIN_TYPE_OUTPUT); @@ -694,10 +716,9 @@ ice_dpll_pin_state_get(const struct dpll_pin *pin, void *pin_priv, ret = ice_dpll_pin_state_update(pf, p, pin_type, extack); if (ret) goto unlock; - if (pin_type == ICE_DPLL_PIN_TYPE_INPUT) + if (pin_type == ICE_DPLL_PIN_TYPE_INPUT || + pin_type == ICE_DPLL_PIN_TYPE_OUTPUT) *state = p->state[d->dpll_idx]; - else if (pin_type == ICE_DPLL_PIN_TYPE_OUTPUT) - *state = p->state[0]; ret = 0; unlock: mutex_unlock(&pf->dplls.lock);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com
[ Upstream commit 3b14430c65b4f510b2a310ca4f18ed6ca7184b00 ]
The value of phase_adjust for input pin shall be updated in ice_dpll_pin_state_update(..). Fix by adding proper argument to the firmware query function call - a pin's struct field pointer where the phase_adjust value during driver runtime is stored.
Previously the phase_adjust used to misinform user about actual phase_adjust value. I.e., if phase_adjust was set to a non zero value and if driver was reloaded, the user would see the value equal 0, which is not correct - the actual value is equal to value set before driver reload.
Fixes: 90e1c90750d7 ("ice: dpll: implement phase related callbacks") Reviewed-by: Alan Brady alan.brady@intel.com Signed-off-by: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_dpll.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/ethernet/intel/ice/ice_dpll.c index 0f836adc0e587..10a469060d322 100644 --- a/drivers/net/ethernet/intel/ice/ice_dpll.c +++ b/drivers/net/ethernet/intel/ice/ice_dpll.c @@ -373,7 +373,7 @@ ice_dpll_pin_state_update(struct ice_pf *pf, struct ice_dpll_pin *pin, case ICE_DPLL_PIN_TYPE_INPUT: ret = ice_aq_get_input_pin_cfg(&pf->hw, pin->idx, NULL, NULL, NULL, &pin->flags[0], - &pin->freq, NULL); + &pin->freq, &pin->phase_adjust); if (ret) goto err; if (ICE_AQC_GET_CGU_IN_CFG_FLG2_INPUT_EN & pin->flags[0]) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com
[ Upstream commit fc7fd1a10a9d2d38378b42e9a508da4c68018453 ]
Do not allow to acquire data or alter configuration of dpll and pins through firmware if PF reset is in progress, this would cause confusing netlink extack errors as the firmware cannot respond or process the request properly during the reset time.
Return (-EBUSY) and extack error for the user who tries access/modify the config of dpll/pin through firmware during the reset time.
The PF reset and kernel access to dpll data are both asynchronous. It is not possible to guard all the possible reset paths with any determinictic approach. I.e., it is possible that reset starts after reset check is performed (or if the reset would be checked after mutex is locked), but at the same time it is not possible to wait for dpll mutex unlock in the reset flow. This is best effort solution to at least give a clue to the user what is happening in most of the cases, knowing that there are possible race conditions where the user could see a different error received from firmware due to reset unexpectedly starting.
Test by looping execution of below steps until netlink error appears: - perform PF reset $ echo 1 > /sys/class/net/<ice PF>/device/reset - i.e. try to alter/read dpll/pin config: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \ --dump pin-get
Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu") Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Signed-off-by: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_dpll.c | 38 +++++++++++++++++++++++ 1 file changed, 38 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/ethernet/intel/ice/ice_dpll.c index 10a469060d322..9c8be237c7e50 100644 --- a/drivers/net/ethernet/intel/ice/ice_dpll.c +++ b/drivers/net/ethernet/intel/ice/ice_dpll.c @@ -30,6 +30,26 @@ static const char * const pin_type_name[] = { [ICE_DPLL_PIN_TYPE_RCLK_INPUT] = "rclk-input", };
+/** + * ice_dpll_is_reset - check if reset is in progress + * @pf: private board structure + * @extack: error reporting + * + * If reset is in progress, fill extack with error. + * + * Return: + * * false - no reset in progress + * * true - reset in progress + */ +static bool ice_dpll_is_reset(struct ice_pf *pf, struct netlink_ext_ack *extack) +{ + if (ice_is_reset_in_progress(pf->state)) { + NL_SET_ERR_MSG(extack, "PF reset in progress"); + return true; + } + return false; +} + /** * ice_dpll_pin_freq_set - set pin's frequency * @pf: private board structure @@ -109,6 +129,9 @@ ice_dpll_frequency_set(const struct dpll_pin *pin, void *pin_priv, struct ice_pf *pf = d->pf; int ret;
+ if (ice_dpll_is_reset(pf, extack)) + return -EBUSY; + mutex_lock(&pf->dplls.lock); ret = ice_dpll_pin_freq_set(pf, p, pin_type, frequency, extack); mutex_unlock(&pf->dplls.lock); @@ -609,6 +632,9 @@ ice_dpll_pin_state_set(const struct dpll_pin *pin, void *pin_priv, struct ice_pf *pf = d->pf; int ret;
+ if (ice_dpll_is_reset(pf, extack)) + return -EBUSY; + mutex_lock(&pf->dplls.lock); if (enable) ret = ice_dpll_pin_enable(&pf->hw, p, d->dpll_idx, pin_type, @@ -712,6 +738,9 @@ ice_dpll_pin_state_get(const struct dpll_pin *pin, void *pin_priv, struct ice_pf *pf = d->pf; int ret;
+ if (ice_dpll_is_reset(pf, extack)) + return -EBUSY; + mutex_lock(&pf->dplls.lock); ret = ice_dpll_pin_state_update(pf, p, pin_type, extack); if (ret) @@ -836,6 +865,9 @@ ice_dpll_input_prio_set(const struct dpll_pin *pin, void *pin_priv, struct ice_pf *pf = d->pf; int ret;
+ if (ice_dpll_is_reset(pf, extack)) + return -EBUSY; + mutex_lock(&pf->dplls.lock); ret = ice_dpll_hw_input_prio_set(pf, d, p, prio, extack); mutex_unlock(&pf->dplls.lock); @@ -1115,6 +1147,9 @@ ice_dpll_rclk_state_on_pin_set(const struct dpll_pin *pin, void *pin_priv, int ret = -EINVAL; u32 hw_idx;
+ if (ice_dpll_is_reset(pf, extack)) + return -EBUSY; + mutex_lock(&pf->dplls.lock); hw_idx = parent->idx - pf->dplls.base_rclk_idx; if (hw_idx >= pf->dplls.num_inputs) @@ -1169,6 +1204,9 @@ ice_dpll_rclk_state_on_pin_get(const struct dpll_pin *pin, void *pin_priv, int ret = -EINVAL; u32 hw_idx;
+ if (ice_dpll_is_reset(pf, extack)) + return -EBUSY; + mutex_lock(&pf->dplls.lock); hw_idx = parent->idx - pf->dplls.base_rclk_idx; if (hw_idx >= pf->dplls.num_inputs)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com
[ Upstream commit 9a8385fe14bcb250a3889e744dc54e9c411d8400 ]
Do not allow dpll periodic work function to acquire data from firmware if PF reset is in progress. Acquiring data will cause dmesg errors as the firmware cannot respond or process the request properly during the reset time.
Test by looping execution of below step until dmesg error appears: - perform PF reset $ echo 1 > /sys/class/net/<ice PF>/device/reset
Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu") Reviewed-by: Igor Bagnucki igor.bagnucki@intel.com Signed-off-by: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_dpll.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/ethernet/intel/ice/ice_dpll.c index 9c8be237c7e50..bcb9b9c13aabc 100644 --- a/drivers/net/ethernet/intel/ice/ice_dpll.c +++ b/drivers/net/ethernet/intel/ice/ice_dpll.c @@ -1390,8 +1390,10 @@ static void ice_dpll_periodic_work(struct kthread_work *work) struct ice_pf *pf = container_of(d, struct ice_pf, dplls); struct ice_dpll *de = &pf->dplls.eec; struct ice_dpll *dp = &pf->dplls.pps; - int ret; + int ret = 0;
+ if (ice_is_reset_in_progress(pf->state)) + goto resched; mutex_lock(&pf->dplls.lock); ret = ice_dpll_update_state(pf, de, false); if (!ret) @@ -1411,6 +1413,7 @@ static void ice_dpll_periodic_work(struct kthread_work *work) ice_dpll_notify_changes(de); ice_dpll_notify_changes(dp);
+resched: /* Run twice a second or reschedule if update failed */ kthread_queue_delayed_work(d->kworker, &d->work, ret ? msecs_to_jiffies(10) :
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com
[ Upstream commit ee89921da471edcb4b1e67f5bbfedddf39749782 ]
Do not allow to set phase adjust value for a pin if PF reset is in progress, this would cause confusing netlink extack errors as the firmware cannot process the request properly during the reset time.
Return (-EBUSY) and report extack error for the user who tries configure pin phase adjust during the reset time.
Test by looping execution of below steps until netlink error appears: - perform PF reset $ echo 1 > /sys/class/net/<ice PF>/device/reset - change pin phase adjust value: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \ --do pin-set --json '{"id":0, "phase-adjust":1000}'
Fixes: 90e1c90750d7 ("ice: dpll: implement phase related callbacks") Reviewed-by: Igor Bagnucki igor.bagnucki@intel.com Signed-off-by: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_dpll.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/ethernet/intel/ice/ice_dpll.c index bcb9b9c13aabc..2b657d43c769d 100644 --- a/drivers/net/ethernet/intel/ice/ice_dpll.c +++ b/drivers/net/ethernet/intel/ice/ice_dpll.c @@ -988,6 +988,9 @@ ice_dpll_pin_phase_adjust_set(const struct dpll_pin *pin, void *pin_priv, u8 flag, flags_en = 0; int ret;
+ if (ice_dpll_is_reset(pf, extack)) + return -EBUSY; + mutex_lock(&pf->dplls.lock); switch (type) { case ICE_DPLL_PIN_TYPE_INPUT:
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Théo Lebrun theo.lebrun@bootlin.com
[ Upstream commit 32ce3bb57b6b402de2aec1012511e7ac4e7449dc ]
dev_get_drvdata() gets used to acquire the pointer to cqspi and the SPI controller. Neither embed the other; this lead to memory corruption.
On a given platform (Mobileye EyeQ5) the memory corruption is hidden inside cqspi->f_pdata. Also, this uninitialised memory is used as a mutex (ctlr->bus_lock_mutex) by spi_controller_suspend().
Fixes: 2087e85bb66e ("spi: cadence-quadspi: fix suspend-resume implementations") Reviewed-by: Dhruva Gole d-gole@ti.com Signed-off-by: Théo Lebrun theo.lebrun@bootlin.com Link: https://msgid.link/r/20240222-cdns-qspi-pm-fix-v4-1-6b6af8bcbf59@bootlin.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-cadence-quadspi.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/spi/spi-cadence-quadspi.c b/drivers/spi/spi-cadence-quadspi.c index f94e0d370d466..0d184d65dce76 100644 --- a/drivers/spi/spi-cadence-quadspi.c +++ b/drivers/spi/spi-cadence-quadspi.c @@ -1930,10 +1930,9 @@ static void cqspi_remove(struct platform_device *pdev) static int cqspi_suspend(struct device *dev) { struct cqspi_st *cqspi = dev_get_drvdata(dev); - struct spi_controller *host = dev_get_drvdata(dev); int ret;
- ret = spi_controller_suspend(host); + ret = spi_controller_suspend(cqspi->host); cqspi_controller_enable(cqspi, 0);
clk_disable_unprepare(cqspi->clk); @@ -1944,7 +1943,6 @@ static int cqspi_suspend(struct device *dev) static int cqspi_resume(struct device *dev) { struct cqspi_st *cqspi = dev_get_drvdata(dev); - struct spi_controller *host = dev_get_drvdata(dev);
clk_prepare_enable(cqspi->clk); cqspi_wait_idle(cqspi); @@ -1953,7 +1951,7 @@ static int cqspi_resume(struct device *dev) cqspi->current_cs = -1; cqspi->sclk = 0;
- return spi_controller_resume(host); + return spi_controller_resume(cqspi->host); }
static DEFINE_RUNTIME_DEV_PM_OPS(cqspi_dev_pm_ops, cqspi_suspend,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Théo Lebrun theo.lebrun@bootlin.com
[ Upstream commit 959043afe53ae80633e810416cee6076da6e91c6 ]
The ->runtime_suspend() and ->runtime_resume() callbacks are not expected to call spi_controller_suspend() and spi_controller_resume(). Remove calls to those in the cadence-qspi driver.
Those helpers have two roles currently: - They stop/start the queue, including dealing with the kworker. - They toggle the SPI controller SPI_CONTROLLER_SUSPENDED flag. It requires acquiring ctlr->bus_lock_mutex.
Step one is irrelevant because cadence-qspi is not queued. Step two however has two implications: - A deadlock occurs, because ->runtime_resume() is called in a context where the lock is already taken (in the ->exec_op() callback, where the usage count is incremented). - It would disallow all operations once the device is auto-suspended.
Here is a brief call tree highlighting the mutex deadlock:
spi_mem_exec_op() ... spi_mem_access_start() mutex_lock(&ctlr->bus_lock_mutex)
cqspi_exec_mem_op() pm_runtime_resume_and_get() cqspi_resume() spi_controller_resume() mutex_lock(&ctlr->bus_lock_mutex) ...
spi_mem_access_end() mutex_unlock(&ctlr->bus_lock_mutex) ...
Fixes: 0578a6dbfe75 ("spi: spi-cadence-quadspi: add runtime pm support") Signed-off-by: Théo Lebrun theo.lebrun@bootlin.com Link: https://msgid.link/r/20240222-cdns-qspi-pm-fix-v4-2-6b6af8bcbf59@bootlin.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-cadence-quadspi.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/drivers/spi/spi-cadence-quadspi.c b/drivers/spi/spi-cadence-quadspi.c index 0d184d65dce76..731775d34d393 100644 --- a/drivers/spi/spi-cadence-quadspi.c +++ b/drivers/spi/spi-cadence-quadspi.c @@ -1930,14 +1930,10 @@ static void cqspi_remove(struct platform_device *pdev) static int cqspi_suspend(struct device *dev) { struct cqspi_st *cqspi = dev_get_drvdata(dev); - int ret;
- ret = spi_controller_suspend(cqspi->host); cqspi_controller_enable(cqspi, 0); - clk_disable_unprepare(cqspi->clk); - - return ret; + return 0; }
static int cqspi_resume(struct device *dev) @@ -1950,8 +1946,7 @@ static int cqspi_resume(struct device *dev)
cqspi->current_cs = -1; cqspi->sclk = 0; - - return spi_controller_resume(cqspi->host); + return 0; }
static DEFINE_RUNTIME_DEV_PM_OPS(cqspi_dev_pm_ops, cqspi_suspend,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryosuke Yasuoka ryasuoka@redhat.com
[ Upstream commit 661779e1fcafe1b74b3f3fe8e980c1e207fea1fd ]
syzbot reported the following uninit-value access issue [1]:
netlink_to_full_skb() creates a new `skb` and puts the `skb->data` passed as a 1st arg of netlink_to_full_skb() onto new `skb`. The data size is specified as `len` and passed to skb_put_data(). This `len` is based on `skb->end` that is not data offset but buffer offset. The `skb->end` contains data and tailroom. Since the tailroom is not initialized when the new `skb` created, KMSAN detects uninitialized memory area when copying the data.
This patch resolved this issue by correct the len from `skb->end` to `skb->len`, which is the actual data offset.
BUG: KMSAN: kernel-infoleak-after-free in instrument_copy_to_user include/linux/instrumented.h:114 [inline] BUG: KMSAN: kernel-infoleak-after-free in copy_to_user_iter lib/iov_iter.c:24 [inline] BUG: KMSAN: kernel-infoleak-after-free in iterate_ubuf include/linux/iov_iter.h:29 [inline] BUG: KMSAN: kernel-infoleak-after-free in iterate_and_advance2 include/linux/iov_iter.h:245 [inline] BUG: KMSAN: kernel-infoleak-after-free in iterate_and_advance include/linux/iov_iter.h:271 [inline] BUG: KMSAN: kernel-infoleak-after-free in _copy_to_iter+0x364/0x2520 lib/iov_iter.c:186 instrument_copy_to_user include/linux/instrumented.h:114 [inline] copy_to_user_iter lib/iov_iter.c:24 [inline] iterate_ubuf include/linux/iov_iter.h:29 [inline] iterate_and_advance2 include/linux/iov_iter.h:245 [inline] iterate_and_advance include/linux/iov_iter.h:271 [inline] _copy_to_iter+0x364/0x2520 lib/iov_iter.c:186 copy_to_iter include/linux/uio.h:197 [inline] simple_copy_to_iter+0x68/0xa0 net/core/datagram.c:532 __skb_datagram_iter+0x123/0xdc0 net/core/datagram.c:420 skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:546 skb_copy_datagram_msg include/linux/skbuff.h:3960 [inline] packet_recvmsg+0xd9c/0x2000 net/packet/af_packet.c:3482 sock_recvmsg_nosec net/socket.c:1044 [inline] sock_recvmsg net/socket.c:1066 [inline] sock_read_iter+0x467/0x580 net/socket.c:1136 call_read_iter include/linux/fs.h:2014 [inline] new_sync_read fs/read_write.c:389 [inline] vfs_read+0x8f6/0xe00 fs/read_write.c:470 ksys_read+0x20f/0x4c0 fs/read_write.c:613 __do_sys_read fs/read_write.c:623 [inline] __se_sys_read fs/read_write.c:621 [inline] __x64_sys_read+0x93/0xd0 fs/read_write.c:621 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b
Uninit was stored to memory at: skb_put_data include/linux/skbuff.h:2622 [inline] netlink_to_full_skb net/netlink/af_netlink.c:181 [inline] __netlink_deliver_tap_skb net/netlink/af_netlink.c:298 [inline] __netlink_deliver_tap+0x5be/0xc90 net/netlink/af_netlink.c:325 netlink_deliver_tap net/netlink/af_netlink.c:338 [inline] netlink_deliver_tap_kernel net/netlink/af_netlink.c:347 [inline] netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x10f1/0x1250 net/netlink/af_netlink.c:1368 netlink_sendmsg+0x1238/0x13d0 net/netlink/af_netlink.c:1910 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638 __sys_sendmsg net/socket.c:2667 [inline] __do_sys_sendmsg net/socket.c:2676 [inline] __se_sys_sendmsg net/socket.c:2674 [inline] __x64_sys_sendmsg+0x307/0x490 net/socket.c:2674 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b
Uninit was created at: free_pages_prepare mm/page_alloc.c:1087 [inline] free_unref_page_prepare+0xb0/0xa40 mm/page_alloc.c:2347 free_unref_page_list+0xeb/0x1100 mm/page_alloc.c:2533 release_pages+0x23d3/0x2410 mm/swap.c:1042 free_pages_and_swap_cache+0xd9/0xf0 mm/swap_state.c:316 tlb_batch_pages_flush mm/mmu_gather.c:98 [inline] tlb_flush_mmu_free mm/mmu_gather.c:293 [inline] tlb_flush_mmu+0x6f5/0x980 mm/mmu_gather.c:300 tlb_finish_mmu+0x101/0x260 mm/mmu_gather.c:392 exit_mmap+0x49e/0xd30 mm/mmap.c:3321 __mmput+0x13f/0x530 kernel/fork.c:1349 mmput+0x8a/0xa0 kernel/fork.c:1371 exit_mm+0x1b8/0x360 kernel/exit.c:567 do_exit+0xd57/0x4080 kernel/exit.c:858 do_group_exit+0x2fd/0x390 kernel/exit.c:1021 __do_sys_exit_group kernel/exit.c:1032 [inline] __se_sys_exit_group kernel/exit.c:1030 [inline] __x64_sys_exit_group+0x3c/0x50 kernel/exit.c:1030 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b
Bytes 3852-3903 of 3904 are uninitialized Memory access of size 3904 starts at ffff88812ea1e000 Data copied to user address 0000000020003280
CPU: 1 PID: 5043 Comm: syz-executor297 Not tainted 6.7.0-rc5-syzkaller-00047-g5bd7ef53ffe5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
Fixes: 1853c9496460 ("netlink, mmap: transform mmap skb into full skb on taps") Reported-and-tested-by: syzbot+34ad5fab48f7bf510349@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=34ad5fab48f7bf510349 [1] Signed-off-by: Ryosuke Yasuoka ryasuoka@redhat.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240221074053.1794118-1-ryasuoka@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netlink/af_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index d9107b545d360..6ae782efb1ee3 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -167,7 +167,7 @@ static inline u32 netlink_group_mask(u32 group) static struct sk_buff *netlink_to_full_skb(const struct sk_buff *skb, gfp_t gfp_mask) { - unsigned int len = skb_end_offset(skb); + unsigned int len = skb->len; struct sk_buff *new;
new = alloc_skb(len, gfp_mask);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 9a0d18853c280f6a0ee99f91619f2442a17a323a ]
BUG: KMSAN: uninit-value in nla_validate_range_unsigned lib/nlattr.c:222 [inline] BUG: KMSAN: uninit-value in nla_validate_int_range lib/nlattr.c:336 [inline] BUG: KMSAN: uninit-value in validate_nla lib/nlattr.c:575 [inline] BUG: KMSAN: uninit-value in __nla_validate_parse+0x2e20/0x45c0 lib/nlattr.c:631 nla_validate_range_unsigned lib/nlattr.c:222 [inline] nla_validate_int_range lib/nlattr.c:336 [inline] validate_nla lib/nlattr.c:575 [inline] ...
The message in question matches this policy:
[NFTA_TARGET_REV] = NLA_POLICY_MAX(NLA_BE32, 255),
but because NLA_BE32 size in minlen array is 0, the validation code will read past the malformed (too small) attribute.
Note: Other attributes, e.g. BITFIELD32, SINT, UINT.. are also missing: those likely should be added too.
Reported-by: syzbot+3f497b07aa3baf2fb4d0@syzkaller.appspotmail.com Reported-by: xingwei lee xrivendell7@gmail.com Closes: https://lore.kernel.org/all/CABOYnLzFYHSnvTyS6zGa-udNX55+izqkOt2sB9WDqUcEGW6... Fixes: ecaf75ffd5f5 ("netlink: introduce bigendian integer types") Signed-off-by: Florian Westphal fw@strlen.de Link: https://lore.kernel.org/r/20240221172740.5092-1-fw@strlen.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- lib/nlattr.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/lib/nlattr.c b/lib/nlattr.c index dc15e7888fc1f..0319e811bb10a 100644 --- a/lib/nlattr.c +++ b/lib/nlattr.c @@ -30,6 +30,8 @@ static const u8 nla_attr_len[NLA_TYPE_MAX+1] = { [NLA_S16] = sizeof(s16), [NLA_S32] = sizeof(s32), [NLA_S64] = sizeof(s64), + [NLA_BE16] = sizeof(__be16), + [NLA_BE32] = sizeof(__be32), };
static const u8 nla_attr_minlen[NLA_TYPE_MAX+1] = { @@ -43,6 +45,8 @@ static const u8 nla_attr_minlen[NLA_TYPE_MAX+1] = { [NLA_S16] = sizeof(s16), [NLA_S32] = sizeof(s32), [NLA_S64] = sizeof(s64), + [NLA_BE16] = sizeof(__be16), + [NLA_BE32] = sizeof(__be32), };
/*
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 5ae1e9922bbdbaeb9cfbe91085ab75927488ac0f ]
syzkaller triggered following kasan splat: BUG: KASAN: use-after-free in __skb_flow_dissect+0x19d1/0x7a50 net/core/flow_dissector.c:1170 Read of size 1 at addr ffff88812fb4000e by task syz-executor183/5191 [..] kasan_report+0xda/0x110 mm/kasan/report.c:588 __skb_flow_dissect+0x19d1/0x7a50 net/core/flow_dissector.c:1170 skb_flow_dissect_flow_keys include/linux/skbuff.h:1514 [inline] ___skb_get_hash net/core/flow_dissector.c:1791 [inline] __skb_get_hash+0xc7/0x540 net/core/flow_dissector.c:1856 skb_get_hash include/linux/skbuff.h:1556 [inline] ip_tunnel_xmit+0x1855/0x33c0 net/ipv4/ip_tunnel.c:748 ipip_tunnel_xmit+0x3cc/0x4e0 net/ipv4/ipip.c:308 __netdev_start_xmit include/linux/netdevice.h:4940 [inline] netdev_start_xmit include/linux/netdevice.h:4954 [inline] xmit_one net/core/dev.c:3548 [inline] dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3564 __dev_queue_xmit+0x7c1/0x3d60 net/core/dev.c:4349 dev_queue_xmit include/linux/netdevice.h:3134 [inline] neigh_connected_output+0x42c/0x5d0 net/core/neighbour.c:1592 ... ip_finish_output2+0x833/0x2550 net/ipv4/ip_output.c:235 ip_finish_output+0x31/0x310 net/ipv4/ip_output.c:323 .. iptunnel_xmit+0x5b4/0x9b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1dbc/0x33c0 net/ipv4/ip_tunnel.c:831 ipgre_xmit+0x4a1/0x980 net/ipv4/ip_gre.c:665 __netdev_start_xmit include/linux/netdevice.h:4940 [inline] netdev_start_xmit include/linux/netdevice.h:4954 [inline] xmit_one net/core/dev.c:3548 [inline] dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3564 ...
The splat occurs because skb->data points past skb->head allocated area. This is because neigh layer does: __skb_pull(skb, skb_network_offset(skb));
... but skb_network_offset() returns a negative offset and __skb_pull() arg is unsigned. IOW, we skb->data gets "adjusted" by a huge value.
The negative value is returned because skb->head and skb->data distance is more than 64k and skb->network_header (u16) has wrapped around.
The bug is in the ip_tunnel infrastructure, which can cause dev->needed_headroom to increment ad infinitum.
The syzkaller reproducer consists of packets getting routed via a gre tunnel, and route of gre encapsulated packets pointing at another (ipip) tunnel. The ipip encapsulation finds gre0 as next output device.
This results in the following pattern:
1). First packet is to be sent out via gre0. Route lookup found an output device, ipip0.
2). ip_tunnel_xmit for gre0 bumps gre0->needed_headroom based on the future output device, rt.dev->needed_headroom (ipip0).
3). ip output / start_xmit moves skb on to ipip0. which runs the same code path again (xmit recursion).
4). Routing step for the post-gre0-encap packet finds gre0 as output device to use for ipip0 encapsulated packet.
tunl0->needed_headroom is then incremented based on the (already bumped) gre0 device headroom.
This repeats for every future packet:
gre0->needed_headroom gets inflated because previous packets' ipip0 step incremented rt->dev (gre0) headroom, and ipip0 incremented because gre0 needed_headroom was increased.
For each subsequent packet, gre/ipip0->needed_headroom grows until post-expand-head reallocations result in a skb->head/data distance of more than 64k.
Once that happens, skb->network_header (u16) wraps around when pskb_expand_head tries to make sure that skb_network_offset() is unchanged after the headroom expansion/reallocation.
After this skb_network_offset(skb) returns a different (and negative) result post headroom expansion.
The next trip to neigh layer (or anything else that would __skb_pull the network header) makes skb->data point to a memory location outside skb->head area.
v2: Cap the needed_headroom update to an arbitarily chosen upperlimit to prevent perpetual increase instead of dropping the headroom increment completely.
Reported-and-tested-by: syzbot+bfde3bef047a81b8fde6@syzkaller.appspotmail.com Closes: https://groups.google.com/g/syzkaller-bugs/c/fL9G6GtWskY/m/VKk_PR5FBAAJ Fixes: 243aad830e8a ("ip_gre: include route header_len in max_headroom calculation") Signed-off-by: Florian Westphal fw@strlen.de Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240220135606.4939-1-fw@strlen.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/ip_tunnel.c | 28 +++++++++++++++++++++------- 1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index beeae624c412d..2d29fce7c5606 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -554,6 +554,20 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb, return 0; }
+static void ip_tunnel_adj_headroom(struct net_device *dev, unsigned int headroom) +{ + /* we must cap headroom to some upperlimit, else pskb_expand_head + * will overflow header offsets in skb_headers_offset_update(). + */ + static const unsigned int max_allowed = 512; + + if (headroom > max_allowed) + headroom = max_allowed; + + if (headroom > READ_ONCE(dev->needed_headroom)) + WRITE_ONCE(dev->needed_headroom, headroom); +} + void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, u8 proto, int tunnel_hlen) { @@ -632,13 +646,13 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, }
headroom += LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len; - if (headroom > READ_ONCE(dev->needed_headroom)) - WRITE_ONCE(dev->needed_headroom, headroom); - - if (skb_cow_head(skb, READ_ONCE(dev->needed_headroom))) { + if (skb_cow_head(skb, headroom)) { ip_rt_put(rt); goto tx_dropped; } + + ip_tunnel_adj_headroom(dev, headroom); + iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, proto, tos, ttl, df, !net_eq(tunnel->net, dev_net(dev))); return; @@ -818,16 +832,16 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
max_headroom = LL_RESERVED_SPACE(rt->dst.dev) + sizeof(struct iphdr) + rt->dst.header_len + ip_encap_hlen(&tunnel->encap); - if (max_headroom > READ_ONCE(dev->needed_headroom)) - WRITE_ONCE(dev->needed_headroom, max_headroom);
- if (skb_cow_head(skb, READ_ONCE(dev->needed_headroom))) { + if (skb_cow_head(skb, max_headroom)) { ip_rt_put(rt); DEV_STATS_INC(dev, tx_dropped); kfree_skb(skb); return; }
+ ip_tunnel_adj_headroom(dev, max_headroom); + iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, protocol, tos, ttl, df, !net_eq(tunnel->net, dev_net(dev))); return;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeremy Kerr jk@codeconstruct.com.au
[ Upstream commit 3773d65ae5154ed7df404b050fd7387a36ab5ef3 ]
Currently, mctp_local_output only takes ownership of skb on success, and we may leak an skb if mctp_local_output fails in specific states; the skb ownership isn't transferred until the actual output routing occurs.
Instead, make mctp_local_output free the skb on all error paths up to the route action, so it always consumes the passed skb.
Fixes: 833ef3b91de6 ("mctp: Populate socket implementation") Signed-off-by: Jeremy Kerr jk@codeconstruct.com.au Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240220081053.1439104-1-jk@codeconstruct.com.au Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/mctp.h | 1 + net/mctp/route.c | 10 ++++++++-- 2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/include/net/mctp.h b/include/net/mctp.h index da86e106c91d5..2bff5f47ce82f 100644 --- a/include/net/mctp.h +++ b/include/net/mctp.h @@ -249,6 +249,7 @@ struct mctp_route { struct mctp_route *mctp_route_lookup(struct net *net, unsigned int dnet, mctp_eid_t daddr);
+/* always takes ownership of skb */ int mctp_local_output(struct sock *sk, struct mctp_route *rt, struct sk_buff *skb, mctp_eid_t daddr, u8 req_tag);
diff --git a/net/mctp/route.c b/net/mctp/route.c index 6218dcd07e184..ceee44ea09d97 100644 --- a/net/mctp/route.c +++ b/net/mctp/route.c @@ -888,7 +888,7 @@ int mctp_local_output(struct sock *sk, struct mctp_route *rt, dev = dev_get_by_index_rcu(sock_net(sk), cb->ifindex); if (!dev) { rcu_read_unlock(); - return rc; + goto out_free; } rt->dev = __mctp_dev_get(dev); rcu_read_unlock(); @@ -903,7 +903,8 @@ int mctp_local_output(struct sock *sk, struct mctp_route *rt, rt->mtu = 0;
} else { - return -EINVAL; + rc = -EINVAL; + goto out_free; }
spin_lock_irqsave(&rt->dev->addrs_lock, flags); @@ -966,12 +967,17 @@ int mctp_local_output(struct sock *sk, struct mctp_route *rt, rc = mctp_do_fragment_route(rt, skb, mtu, tag); }
+ /* route output functions consume the skb, even on error */ + skb = NULL; + out_release: if (!ext_rt) mctp_route_release(rt);
mctp_dev_put(tmp_rt.dev);
+out_free: + kfree_skb(skb); return rc; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 734f06db599f66d6a159c78abfdbadfea3b7d43b ]
Since commit 5d93cfcf7360 ("net: dpaa: Convert to phylink"), we support the "10gbase-r" phy-mode through a driver-based conversion of "xgmii", but we still don't actually support it when the device tree specifies "10gbase-r" proper.
This is because boards such as LS1046A-RDB do not define pcs-handle-names (for whatever reason) in the ethernet@f0000 device tree node, and the code enters through this code path:
err = of_property_match_string(mac_node, "pcs-handle-names", "xfi"); // code takes neither branch and falls through if (err >= 0) { (...) } else if (err != -EINVAL && err != -ENODATA) { goto _return_fm_mac_free; }
(...)
/* For compatibility, if pcs-handle-names is missing, we assume this * phy is the first one in pcsphy-handle */ err = of_property_match_string(mac_node, "pcs-handle-names", "sgmii"); if (err == -EINVAL || err == -ENODATA) pcs = memac_pcs_create(mac_node, 0); // code takes this branch else if (err < 0) goto _return_fm_mac_free; else pcs = memac_pcs_create(mac_node, err);
// A default PCS is created and saved in "pcs"
// This determination fails and mistakenly saves the default PCS // memac->sgmii_pcs instead of memac->xfi_pcs, because at this // stage, mac_dev->phy_if == PHY_INTERFACE_MODE_10GBASER. if (err && mac_dev->phy_if == PHY_INTERFACE_MODE_XGMII) memac->xfi_pcs = pcs; else memac->sgmii_pcs = pcs;
In other words, in the absence of pcs-handle-names, the default xfi_pcs assignment logic only works when in the device tree we have PHY_INTERFACE_MODE_XGMII.
By reversing the order between the fallback xfi_pcs assignment and the "xgmii" overwrite with "10gbase-r", we are able to support both values in the device tree, with identical behavior.
Currently, it is impossible to make the s/xgmii/10gbase-r/ device tree conversion, because it would break forward compatibility (new device tree with old kernel). The only way to modify existing device trees to phy-interface-mode = "10gbase-r" is to fix stable kernels to accept this value and handle it properly.
One reason why the conversion is desirable is because with pre-phylink kernels, the Aquantia PHY driver used to warn about the improper use of PHY_INTERFACE_MODE_XGMII [1]. It is best to have a single (latest) device tree that works with all supported stable kernel versions.
Note that the blamed commit does not constitute a regression per se. Older stable kernels like 6.1 still do not work with "10gbase-r", but for a different reason. That is a battle for another time.
[1] https://lore.kernel.org/netdev/20240214-ls1046-dts-use-10gbase-r-v1-1-8c2d68...
Fixes: 5d93cfcf7360 ("net: dpaa: Convert to phylink") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Reviewed-by: Sean Anderson sean.anderson@seco.com Acked-by: Madalin Bucur madalin.bucur@oss.nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/freescale/fman/fman_memac.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/freescale/fman/fman_memac.c b/drivers/net/ethernet/freescale/fman/fman_memac.c index 9ba15d3183d75..758535adc9ff5 100644 --- a/drivers/net/ethernet/freescale/fman/fman_memac.c +++ b/drivers/net/ethernet/freescale/fman/fman_memac.c @@ -1073,6 +1073,14 @@ int memac_initialization(struct mac_device *mac_dev, unsigned long capabilities; unsigned long *supported;
+ /* The internal connection to the serdes is XGMII, but this isn't + * really correct for the phy mode (which is the external connection). + * However, this is how all older device trees say that they want + * 10GBASE-R (aka XFI), so just convert it for them. + */ + if (mac_dev->phy_if == PHY_INTERFACE_MODE_XGMII) + mac_dev->phy_if = PHY_INTERFACE_MODE_10GBASER; + mac_dev->phylink_ops = &memac_mac_ops; mac_dev->set_promisc = memac_set_promiscuous; mac_dev->change_addr = memac_modify_mac_address; @@ -1139,7 +1147,7 @@ int memac_initialization(struct mac_device *mac_dev, * (and therefore that xfi_pcs cannot be set). If we are defaulting to * XGMII, assume this is for XFI. Otherwise, assume it is for SGMII. */ - if (err && mac_dev->phy_if == PHY_INTERFACE_MODE_XGMII) + if (err && mac_dev->phy_if == PHY_INTERFACE_MODE_10GBASER) memac->xfi_pcs = pcs; else memac->sgmii_pcs = pcs; @@ -1153,14 +1161,6 @@ int memac_initialization(struct mac_device *mac_dev, goto _return_fm_mac_free; }
- /* The internal connection to the serdes is XGMII, but this isn't - * really correct for the phy mode (which is the external connection). - * However, this is how all older device trees say that they want - * 10GBASE-R (aka XFI), so just convert it for them. - */ - if (mac_dev->phy_if == PHY_INTERFACE_MODE_XGMII) - mac_dev->phy_if = PHY_INTERFACE_MODE_10GBASER; - /* TODO: The following interface modes are supported by (some) hardware * but not by this driver: * - 1000BASE-KX
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yunjian Wang wangyunjian@huawei.com
[ Upstream commit 2a770cdc4382b457ca3d43d03f0f0064f905a0d0 ]
When a queue(tfile) is detached, we only update tfile's queue_index, but do not update xdp_rxq_info's queue_index. This patch fixes it.
Fixes: 8bf5c4ee1889 ("tun: setup xdp_rxq_info") Signed-off-by: Yunjian Wang wangyunjian@huawei.com Link: https://lore.kernel.org/r/1708398727-46308-1-git-send-email-wangyunjian@huaw... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/tun.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 4a4f8c8e79fa1..8f95a562b8d0c 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -653,6 +653,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean) tun->tfiles[tun->numqueues - 1]); ntfile = rtnl_dereference(tun->tfiles[index]); ntfile->queue_index = index; + ntfile->xdp_rxq.queue_index = index; rcu_assign_pointer(tun->tfiles[tun->numqueues - 1], NULL);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Doug Smythies dsmythies@telus.net
[ Upstream commit f0a0fc10abb062d122db5ac4ed42f6d1ca342649 ]
There is a loophole in pstate limit clamping for the intel_cpufreq CPU frequency scaling driver (intel_pstate in passive mode), schedutil CPU frequency scaling governor, HWP (HardWare Pstate) control enabled, when the adjust_perf call back path is used.
Fix it.
Fixes: a365ab6b9dfb cpufreq: intel_pstate: Implement the ->adjust_perf() callback Signed-off-by: Doug Smythies dsmythies@telus.net Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cpufreq/intel_pstate.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index f5c69fa230d9b..357e01a272743 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -2983,6 +2983,9 @@ static void intel_cpufreq_adjust_perf(unsigned int cpunum, if (min_pstate < cpu->min_perf_ratio) min_pstate = cpu->min_perf_ratio;
+ if (min_pstate > cpu->max_perf_ratio) + min_pstate = cpu->max_perf_ratio; + max_pstate = min(cap_pstate, cpu->max_perf_ratio); if (max_pstate < min_pstate) max_pstate = min_pstate;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit fe9f801355f0b47668419f30f1fac1cf4539e736 ]
veth sets NETIF_F_GRO automatically when XDP is enabled, because both features use the same NAPI machinery.
The logic to clear NETIF_F_GRO sits in veth_disable_xdp() which is called both on ndo_stop and when XDP is turned off. To avoid the flag from being cleared when the device is brought down, the clearing is skipped when IFF_UP is not set. Bringing the device down should indeed not modify its features.
Unfortunately, this means that clearing is also skipped when XDP is disabled _while_ the device is down. And there's nothing on the open path to bring the device features back into sync. IOW if user enables XDP, disables it and then brings the device up we'll end up with a stray GRO flag set but no NAPI instances.
We don't depend on the GRO flag on the datapath, so the datapath won't crash. We will crash (or hang), however, next time features are sync'ed (either by user via ethtool or peer changing its config). The GRO flag will go away, and veth will try to disable the NAPIs. But the open path never created them since XDP was off, the GRO flag was a stray. If NAPI was initialized before we'll hang in napi_disable(). If it never was we'll crash trying to stop uninitialized hrtimer.
Move the GRO flag updates to the XDP enable / disable paths, instead of mixing them with the ndo_open / ndo_close paths.
Fixes: d3256efd8e8b ("veth: allow enabling NAPI even without XDP") Reported-by: Thomas Gleixner tglx@linutronix.de Reported-by: syzbot+039399a9b96297ddedca@syzkaller.appspotmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Toke Høiland-Jørgensen toke@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/veth.c | 35 +++++++++++++++++------------------ 1 file changed, 17 insertions(+), 18 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 977861c46b1fe..bf4da315ae01c 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -1208,14 +1208,6 @@ static int veth_enable_xdp(struct net_device *dev) veth_disable_xdp_range(dev, 0, dev->real_num_rx_queues, true); return err; } - - if (!veth_gro_requested(dev)) { - /* user-space did not require GRO, but adding XDP - * is supposed to get GRO working - */ - dev->features |= NETIF_F_GRO; - netdev_features_change(dev); - } } }
@@ -1235,18 +1227,9 @@ static void veth_disable_xdp(struct net_device *dev) for (i = 0; i < dev->real_num_rx_queues; i++) rcu_assign_pointer(priv->rq[i].xdp_prog, NULL);
- if (!netif_running(dev) || !veth_gro_requested(dev)) { + if (!netif_running(dev) || !veth_gro_requested(dev)) veth_napi_del(dev);
- /* if user-space did not require GRO, since adding XDP - * enabled it, clear it now - */ - if (!veth_gro_requested(dev) && netif_running(dev)) { - dev->features &= ~NETIF_F_GRO; - netdev_features_change(dev); - } - } - veth_disable_xdp_range(dev, 0, dev->real_num_rx_queues, false); }
@@ -1654,6 +1637,14 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, }
if (!old_prog) { + if (!veth_gro_requested(dev)) { + /* user-space did not require GRO, but adding + * XDP is supposed to get GRO working + */ + dev->features |= NETIF_F_GRO; + netdev_features_change(dev); + } + peer->hw_features &= ~NETIF_F_GSO_SOFTWARE; peer->max_mtu = max_mtu; } @@ -1669,6 +1660,14 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, if (dev->flags & IFF_UP) veth_disable_xdp(dev);
+ /* if user-space did not require GRO, since adding XDP + * enabled it, clear it now + */ + if (!veth_gro_requested(dev)) { + dev->features &= ~NETIF_F_GRO; + netdev_features_change(dev); + } + if (peer) { peer->hw_features |= NETIF_F_GSO_SOFTWARE; peer->max_mtu = ETH_MAX_MTU;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 10bfd453da64a057bcfd1a49fb6b271c48653cdb ]
It seems that if userspace provides a correct IFA_TARGET_NETNSID value but no IFA_ADDRESS and IFA_LOCAL attributes, inet6_rtm_getaddr() returns -EINVAL with an elevated "struct net" refcount.
Fixes: 6ecf4c37eb3e ("ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Christian Brauner brauner@kernel.org Cc: David Ahern dsahern@kernel.org Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/addrconf.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 5a839c5fb1a5a..055230b669cf2 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -5509,9 +5509,10 @@ static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh, }
addr = extract_addr(tb[IFA_ADDRESS], tb[IFA_LOCAL], &peer); - if (!addr) - return -EINVAL; - + if (!addr) { + err = -EINVAL; + goto errout; + } ifm = nlmsg_data(nlh); if (ifm->ifa_index) dev = dev_get_by_index(tgt_net, ifm->ifa_index);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleksij Rempel o.rempel@pengutronix.de
[ Upstream commit 0e67899abfbfdea0c3c0ed3fd263ffc601c5c157 ]
Same as LAN7800, LAN7850 can be used without EEPROM. If EEPROM is not present or not flashed, LAN7850 will fail to sync the speed detected by the PHY with the MAC. In case link speed is 100Mbit, it will accidentally work, otherwise no data can be transferred.
Better way would be to implement link_up callback, or set auto speed configuration unconditionally. But this changes would be more intrusive. So, for now, set it only if no EEPROM is found.
Fixes: e69647a19c87 ("lan78xx: Set ASD in MAC_CR when EEE is enabled.") Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de Link: https://lore.kernel.org/r/20240222123839.2816561-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/lan78xx.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c index 5add4145d9fc1..2e6d423f3a016 100644 --- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -3035,7 +3035,8 @@ static int lan78xx_reset(struct lan78xx_net *dev) if (dev->chipid == ID_REV_CHIP_ID_7801_) buf &= ~MAC_CR_GMII_EN_;
- if (dev->chipid == ID_REV_CHIP_ID_7800_) { + if (dev->chipid == ID_REV_CHIP_ID_7800_ || + dev->chipid == ID_REV_CHIP_ID_7850_) { ret = lan78xx_read_raw_eeprom(dev, 0, 1, &sig); if (!ret && sig != EEPROM_INDICATOR) { /* Implies there is no external eeprom. Set mac speed */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 1ce7d306ea63f3e379557c79abd88052e0483813 ]
struct veth_rq is pretty large, 832B total without debug options enabled. Since commit under Fixes we try to pre-allocate enough queues for every possible CPU. Miao Wang reports that this may lead to order-5 allocations which will fail in production.
Let the allocation fallback to vmalloc() and try harder. These are the same flags we pass to netdev queue allocation.
Reported-and-tested-by: Miao Wang shankerwangmiao@gmail.com Fixes: 9d3684c24a52 ("veth: create by default nr_possible_cpus queues") Link: https://lore.kernel.org/all/5F52CAE2-2FB7-4712-95F1-3312FBBFA8DD@gmail.com/ Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240223235908.693010-1-kuba@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/veth.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c index bf4da315ae01c..a2e80278eb2f9 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -1461,7 +1461,8 @@ static int veth_alloc_queues(struct net_device *dev) struct veth_priv *priv = netdev_priv(dev); int i;
- priv->rq = kcalloc(dev->num_rx_queues, sizeof(*priv->rq), GFP_KERNEL_ACCOUNT); + priv->rq = kvcalloc(dev->num_rx_queues, sizeof(*priv->rq), + GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL); if (!priv->rq) return -ENOMEM;
@@ -1477,7 +1478,7 @@ static void veth_free_queues(struct net_device *dev) { struct veth_priv *priv = netdev_priv(dev);
- kfree(priv->rq); + kvfree(priv->rq); }
static int veth_dev_init(struct net_device *dev)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Javier Carrasco javier.carrasco.cruz@gmail.com
[ Upstream commit c68b2c9eba38ec3f60f4894b189090febf4d8d22 ]
The MII code does not check the return value of mdio_read (among others), and therefore no error code should be sent. A previous fix to the use of an uninitialized variable propagates negative error codes, that might lead to wrong operations by the MII library.
An example of such issues is the use of mii_nway_restart by the dm9601 driver. The mii_nway_restart function does not check the value returned by mdio_read, which in this case might be a negative number which could contain the exact bit the function checks (BMCR_ANENABLE = 0x1000).
Return zero in case of error, as it is common practice in users of mdio_read to avoid wrong uses of the return value.
Fixes: 8f8abb863fa5 ("net: usb: dm9601: fix uninitialized variable use in dm9601_mdio_read") Signed-off-by: Javier Carrasco javier.carrasco.cruz@gmail.com Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Peter Korsgaard peter@korsgaard.com Link: https://lore.kernel.org/r/20240225-dm9601_ret_err-v1-1-02c1d959ea59@gmail.co... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/dm9601.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/usb/dm9601.c b/drivers/net/usb/dm9601.c index 99ec1d4a972db..8b6d6a1b3c2ec 100644 --- a/drivers/net/usb/dm9601.c +++ b/drivers/net/usb/dm9601.c @@ -232,7 +232,7 @@ static int dm9601_mdio_read(struct net_device *netdev, int phy_id, int loc) err = dm_read_shared_word(dev, 1, loc, &res); if (err < 0) { netdev_err(dev->net, "MDIO read error: %d\n", err); - return err; + return 0; }
netdev_dbg(dev->net,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleksij Rempel o.rempel@pengutronix.de
[ Upstream commit e3d5d70cb483df8296dd44e9ae3b6355ef86494c ]
Disable BH around the call to napi_schedule() to avoid following error: NOHZ tick-stop error: local softirq work is pending, handler #08!!!
Fixes: ec4c7e12396b ("lan78xx: Introduce NAPI polling support") Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de Link: https://lore.kernel.org/r/20240226110820.2113584-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/lan78xx.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c index 2e6d423f3a016..a2dde84499fdd 100644 --- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -1501,7 +1501,9 @@ static int lan78xx_link_reset(struct lan78xx_net *dev)
lan78xx_rx_urb_submit_all(dev);
+ local_bh_disable(); napi_schedule(&dev->napi); + local_bh_enable(); }
return 0;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Justin Iurman justin.iurman@uliege.be
[ Upstream commit 6a2008641920a9c6fe1abbeb9acbec463215d505 ]
Not really a fix per se, but IPV6_TLV_IOAM is still tagged as "TEMPORARY IANA allocation for IOAM", while RFC 9486 is available for some time now. Just update the reference.
Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace") Signed-off-by: Justin Iurman justin.iurman@uliege.be Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240226124921.9097-1-justin.iurman@uliege.be Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/uapi/linux/in6.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h index c4c53a9ab9595..ff8d21f9e95b7 100644 --- a/include/uapi/linux/in6.h +++ b/include/uapi/linux/in6.h @@ -145,7 +145,7 @@ struct in6_flowlabel_req { #define IPV6_TLV_PADN 1 #define IPV6_TLV_ROUTERALERT 5 #define IPV6_TLV_CALIPSO 7 /* RFC 5570 */ -#define IPV6_TLV_IOAM 49 /* TEMPORARY IANA allocation for IOAM */ +#define IPV6_TLV_IOAM 49 /* RFC 9486 */ #define IPV6_TLV_JUMBO 194 #define IPV6_TLV_HAO 201 /* home address option */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Raczynski j.raczynski@samsung.com
[ Upstream commit 8af411bbba1f457c33734795f024d0ef26d0963f ]
Currently when suspending driver and stopping workqueue it is checked whether workqueue is not NULL and if so, it is destroyed. Function destroy_workqueue() does drain queue and does clear variable, but it does not set workqueue variable to NULL. This can cause kernel/module panic if code attempts to clear workqueue that was not initialized.
This scenario is possible when resuming suspended driver in stmmac_resume(), because there is no handling for failed stmmac_hw_setup(), which can fail and return if DMA engine has failed to initialize, and workqueue is initialized after DMA engine. Should DMA engine fail to initialize, resume will proceed normally, but interface won't work and TX queue will eventually timeout, causing 'Reset adapter' error. This then does destroy workqueue during reset process. And since workqueue is initialized after DMA engine and can be skipped, it will cause kernel/module panic.
To secure against this possible crash, set workqueue variable to NULL when destroying workqueue.
Log/backtrace from crash goes as follows: [88.031977]------------[ cut here ]------------ [88.031985]NETDEV WATCHDOG: eth0 (sxgmac): transmit queue 1 timed out [88.032017]WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x390/0x398 <Skipping backtrace for watchdog timeout> [88.032251]---[ end trace e70de432e4d5c2c0 ]--- [88.032282]sxgmac 16d88000.ethernet eth0: Reset adapter. [88.036359]------------[ cut here ]------------ [88.036519]Call trace: [88.036523] flush_workqueue+0x3e4/0x430 [88.036528] drain_workqueue+0xc4/0x160 [88.036533] destroy_workqueue+0x40/0x270 [88.036537] stmmac_fpe_stop_wq+0x4c/0x70 [88.036541] stmmac_release+0x278/0x280 [88.036546] __dev_close_many+0xcc/0x158 [88.036551] dev_close_many+0xbc/0x190 [88.036555] dev_close.part.0+0x70/0xc0 [88.036560] dev_close+0x24/0x30 [88.036564] stmmac_service_task+0x110/0x140 [88.036569] process_one_work+0x1d8/0x4a0 [88.036573] worker_thread+0x54/0x408 [88.036578] kthread+0x164/0x170 [88.036583] ret_from_fork+0x10/0x20 [88.036588]---[ end trace e70de432e4d5c2c1 ]--- [88.036597]Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
Fixes: 5a5586112b929 ("net: stmmac: support FPE link partner hand-shaking procedure") Signed-off-by: Jakub Raczynski j.raczynski@samsung.com Reviewed-by: Jiri Pirko jiri@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index e9a1b60ebb503..de4d769195174 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -3942,8 +3942,10 @@ static void stmmac_fpe_stop_wq(struct stmmac_priv *priv) { set_bit(__FPE_REMOVING, &priv->fpe_task_state);
- if (priv->fpe_wq) + if (priv->fpe_wq) { destroy_workqueue(priv->fpe_wq); + priv->fpe_wq = NULL; + }
netdev_info(priv->dev, "FPE workqueue stop"); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonas Dreßler verdre@v0yd.nl
[ Upstream commit 6b3899be24b16ff8ee0cb25f0bd59b01b15ba1d1 ]
There's a very confusing mistake in the code starting a HCI inquiry: We're calling hci_dev_test_flag() to test for HCI_INQUIRY, but hci_dev_test_flag() checks hdev->dev_flags instead of hdev->flags. HCI_INQUIRY is a bit that's set on hdev->flags, not on hdev->dev_flags though.
HCI_INQUIRY equals the integer 7, and in hdev->dev_flags, 7 means HCI_BONDABLE, so we were actually checking for HCI_BONDABLE here.
The mistake is only present in the synchronous code for starting an inquiry, not in the async one. Also devices are typically bondable while doing an inquiry, so that might be the reason why nobody noticed it so far.
Fixes: abfeea476c68 ("Bluetooth: hci_sync: Convert MGMT_OP_START_DISCOVERY") Signed-off-by: Jonas Dreßler verdre@v0yd.nl Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_sync.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c index 97284d9b2a2e3..39ccbb1be24c3 100644 --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -5629,7 +5629,7 @@ static int hci_inquiry_sync(struct hci_dev *hdev, u8 length)
bt_dev_dbg(hdev, "");
- if (hci_dev_test_flag(hdev, HCI_INQUIRY)) + if (test_bit(HCI_INQUIRY, &hdev->flags)) return 0;
hci_dev_lock(hdev);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ying Hsu yinghsu@chromium.org
[ Upstream commit 2449007d3f73b2842c9734f45f0aadb522daf592 ]
While handling the HCI_EV_HARDWARE_ERROR event, if the underlying BT controller is not responding, the GPIO reset mechanism would free the hci_dev and lead to a use-after-free in hci_error_reset.
Here's the call trace observed on a ChromeOS device with Intel AX201: queue_work_on+0x3e/0x6c __hci_cmd_sync_sk+0x2ee/0x4c0 [bluetooth HASH:3b4a6] ? init_wait_entry+0x31/0x31 __hci_cmd_sync+0x16/0x20 [bluetooth <HASH:3b4a 6>] hci_error_reset+0x4f/0xa4 [bluetooth <HASH:3b4a 6>] process_one_work+0x1d8/0x33f worker_thread+0x21b/0x373 kthread+0x13a/0x152 ? pr_cont_work+0x54/0x54 ? kthread_blkcg+0x31/0x31 ret_from_fork+0x1f/0x30
This patch holds the reference count on the hci_dev while processing a HCI_EV_HARDWARE_ERROR event to avoid potential crash.
Fixes: c7741d16a57c ("Bluetooth: Perform a power cycle when receiving hardware error event") Signed-off-by: Ying Hsu yinghsu@chromium.org Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_core.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 65601aa52e0d8..2821a42cefdc6 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -1049,6 +1049,7 @@ static void hci_error_reset(struct work_struct *work) { struct hci_dev *hdev = container_of(work, struct hci_dev, error_reset);
+ hci_dev_hold(hdev); BT_DBG("%s", hdev->name);
if (hdev->hw_error) @@ -1056,10 +1057,10 @@ static void hci_error_reset(struct work_struct *work) else bt_dev_err(hdev, "hardware error 0x%2.2x", hdev->hw_error_code);
- if (hci_dev_do_close(hdev)) - return; + if (!hci_dev_do_close(hdev)) + hci_dev_do_open(hdev);
- hci_dev_do_open(hdev); + hci_dev_put(hdev); }
void hci_uuids_clear(struct hci_dev *hdev)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit e5469adb2a7e930d96813316592302d9f8f1df4e ]
During suspend, only wakeable devices can be in acceptlist, so if the device was previously added it needs to be removed otherwise the device can end up waking up the system prematurely.
Fixes: 3b42055388c3 ("Bluetooth: hci_sync: Fix attempting to suspend with unfiltered passive scan") Signed-off-by: Clancy Shang clancy.shang@quectel.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Reviewed-by: Paul Menzel pmenzel@molgen.mpg.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_sync.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c index 39ccbb1be24c3..b90ee68bba1d6 100644 --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -2274,8 +2274,11 @@ static int hci_le_add_accept_list_sync(struct hci_dev *hdev,
/* During suspend, only wakeable devices can be in acceptlist */ if (hdev->suspended && - !(params->flags & HCI_CONN_FLAG_REMOTE_WAKEUP)) + !(params->flags & HCI_CONN_FLAG_REMOTE_WAKEUP)) { + hci_le_del_accept_list_sync(hdev, ¶ms->addr, + params->addr_type); return 0; + }
/* Select filter policy to accept all advertising */ if (*num_entries >= hdev->le_accept_list_size)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zijun Hu quic_zijuhu@quicinc.com
[ Upstream commit 61a5ab72edea7ebc3ad2c6beea29d966f528ebfb ]
hci_store_wake_reason() wrongly parses event HCI_Connection_Request as HCI_Connection_Complete and HCI_Connection_Complete as HCI_Connection_Request, so causes recording wakeup BD_ADDR error and potential stability issue, fix it by using the correct field.
Fixes: 2f20216c1d6f ("Bluetooth: Emit controller suspend and resume events") Signed-off-by: Zijun Hu quic_zijuhu@quicinc.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_event.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index ef8c3bed73617..22b22c264c2a5 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -7420,10 +7420,10 @@ static void hci_store_wake_reason(struct hci_dev *hdev, u8 event, * keep track of the bdaddr of the connection event that woke us up. */ if (event == HCI_EV_CONN_REQUEST) { - bacpy(&hdev->wake_addr, &conn_complete->bdaddr); + bacpy(&hdev->wake_addr, &conn_request->bdaddr); hdev->wake_addr_type = BDADDR_BREDR; } else if (event == HCI_EV_CONN_COMPLETE) { - bacpy(&hdev->wake_addr, &conn_request->bdaddr); + bacpy(&hdev->wake_addr, &conn_complete->bdaddr); hdev->wake_addr_type = BDADDR_BREDR; } else if (event == HCI_EV_LE_META) { struct hci_ev_le_meta *le_ev = (void *)skb->data;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 7e74aa53a68bf60f6019bd5d9a9a1406ec4d4865 ]
If we received HCI_EV_IO_CAPA_REQUEST while HCI_OP_READ_REMOTE_EXT_FEATURES is yet to be responded assume the remote does support SSP since otherwise this event shouldn't be generated.
Link: https://lore.kernel.org/linux-bluetooth/CABBYNZ+9UdG1cMZVmdtN3U2aS16AKMCyTAR... Fixes: c7f59461f5a7 ("Bluetooth: Fix a refcnt underflow problem for hci_conn") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_event.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index 22b22c264c2a5..613f2fd0bcc1e 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -5329,9 +5329,12 @@ static void hci_io_capa_request_evt(struct hci_dev *hdev, void *data, hci_dev_lock(hdev);
conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr); - if (!conn || !hci_conn_ssp_enabled(conn)) + if (!conn || !hci_dev_test_flag(hdev, HCI_SSP_ENABLED)) goto unlock;
+ /* Assume remote supports SSP since it has triggered this event */ + set_bit(HCI_CONN_SSP_ENABLED, &conn->flags); + hci_conn_hold(conn);
if (!hci_dev_test_flag(hdev, HCI_MGMT))
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kai-Heng Feng kai.heng.feng@canonical.com
[ Upstream commit e4b019515f950b4e6e5b74b2e1bb03a90cb33039 ]
Right now Linux BT stack cannot pass test case "GAP/CONN/CPUP/BV-05-C 'Connection Parameter Update Procedure Invalid Parameters Central Responder'" in Bluetooth Test Suite revision GAP.TS.p44. [0]
That was revoled by commit c49a8682fc5d ("Bluetooth: validate BLE connection interval updates"), but later got reverted due to devices like keyboards and mice may require low connection interval.
So only validate the max value connection interval to pass the Test Suite, and let devices to request low connection interval if needed.
[0] https://www.bluetooth.org/docman/handlers/DownloadDoc.ashx?doc_id=229869
Fixes: 68d19d7d9957 ("Revert "Bluetooth: validate BLE connection interval updates"") Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_event.c | 4 ++++ net/bluetooth/l2cap_core.c | 8 +++++++- 2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index 613f2fd0bcc1e..2a5f5a7d2412b 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -6797,6 +6797,10 @@ static void hci_le_remote_conn_param_req_evt(struct hci_dev *hdev, void *data, return send_conn_param_neg_reply(hdev, handle, HCI_ERROR_UNKNOWN_CONN_ID);
+ if (max > hcon->le_conn_max_interval) + return send_conn_param_neg_reply(hdev, handle, + HCI_ERROR_INVALID_LL_PARAMS); + if (hci_check_conn_params(min, max, latency, timeout)) return send_conn_param_neg_reply(hdev, handle, HCI_ERROR_INVALID_LL_PARAMS); diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 60298975d5c45..656f49b299d20 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -5613,7 +5613,13 @@ static inline int l2cap_conn_param_update_req(struct l2cap_conn *conn,
memset(&rsp, 0, sizeof(rsp));
- err = hci_check_conn_params(min, max, latency, to_multiplier); + if (max > hcon->le_conn_max_interval) { + BT_DBG("requested connection interval exceeds current bounds."); + err = -EINVAL; + } else { + err = hci_check_conn_params(min, max, latency, to_multiplier); + } + if (err) rsp.result = cpu_to_le16(L2CAP_CONN_PARAM_REJECTED); else
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zijun Hu quic_zijuhu@quicinc.com
[ Upstream commit c0dbc56077ae759f2dd602c7561480bc2b1b712c ]
Vendor-specific command patch config has HCI_Command_Complete event as response, but qca_send_patch_config_cmd() wrongly expects vendor-specific event for the command, fixed by using right event type.
Btmon log for the vendor-specific command are shown below: < HCI Command: Vendor (0x3f|0x0000) plen 5 28 01 00 00 00
HCI Event: Command Complete (0x0e) plen 5
Vendor (0x3f|0x0000) ncmd 1 Status: Success (0x00) 28
Fixes: 4fac8a7ac80b ("Bluetooth: btqca: sequential validation") Signed-off-by: Zijun Hu quic_zijuhu@quicinc.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btqca.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c index fdb0fae88d1c5..b40b32fa7f1c3 100644 --- a/drivers/bluetooth/btqca.c +++ b/drivers/bluetooth/btqca.c @@ -152,7 +152,7 @@ static int qca_send_patch_config_cmd(struct hci_dev *hdev) bt_dev_dbg(hdev, "QCA Patch config");
skb = __hci_cmd_sync_ev(hdev, EDL_PATCH_CMD_OPCODE, sizeof(cmd), - cmd, HCI_EV_VENDOR, HCI_INIT_TIMEOUT); + cmd, 0, HCI_INIT_TIMEOUT); if (IS_ERR(skb)) { err = PTR_ERR(skb); bt_dev_err(hdev, "Sending QCA Patch config failed (%d)", err);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Janaki Ramaiah Thota quic_janathot@quicinc.com
[ Upstream commit 7dcd3e014aa7faeeaf4047190b22d8a19a0db696 ]
BT adapter going into UNCONFIGURED state during BT turn ON when devicetree has no local-bd-address node.
Bluetooth will not work out of the box on such devices, to avoid this problem, added check to set HCI_QUIRK_USE_BDADDR_PROPERTY based on local-bd-address node entry.
When this quirk is not set, the public Bluetooth address read by host from controller though HCI Read BD Address command is considered as valid.
Fixes: e668eb1e1578 ("Bluetooth: hci_core: Don't stop BT if the BD address missing in dts") Signed-off-by: Janaki Ramaiah Thota quic_janathot@quicinc.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/hci_qca.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c index 35f74f209d1fc..cc6dabf6884d7 100644 --- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -7,6 +7,7 @@ * * Copyright (C) 2007 Texas Instruments, Inc. * Copyright (c) 2010, 2012, 2018 The Linux Foundation. All rights reserved. + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved. * * Acknowledgements: * This file is based on hci_ll.c, which was... @@ -1886,7 +1887,17 @@ static int qca_setup(struct hci_uart *hu) case QCA_WCN6750: case QCA_WCN6855: case QCA_WCN7850: - set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks); + + /* Set BDA quirk bit for reading BDA value from fwnode property + * only if that property exist in DT. + */ + if (fwnode_property_present(dev_fwnode(hdev->dev.parent), "local-bd-address")) { + set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks); + bt_dev_info(hdev, "setting quirk bit to read BDA from fwnode later"); + } else { + bt_dev_dbg(hdev, "local-bd-address` is not present in the devicetree so not setting quirk bit for BDA"); + } + hci_set_aosp_capable(hdev);
ret = qca_read_soc_version(hdev, &ver, soc_type);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zijun Hu quic_zijuhu@quicinc.com
[ Upstream commit 6abf9dd26bb1699c17d601b9a292577d01827c0e ]
hci_coredump_qca() uses __hci_cmd_sync() to send a vendor-specific command to trigger firmware coredump, but the command does not have any event as its sync response, so it is not suitable to use __hci_cmd_sync(), fixed by using __hci_cmd_send().
Fixes: 06d3fdfcdf5c ("Bluetooth: hci_qca: Add qcom devcoredump support") Signed-off-by: Zijun Hu quic_zijuhu@quicinc.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/hci_qca.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c index cc6dabf6884d7..a65de7309da4c 100644 --- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -1807,13 +1807,12 @@ static int qca_power_on(struct hci_dev *hdev)
static void hci_coredump_qca(struct hci_dev *hdev) { + int err; static const u8 param[] = { 0x26 }; - struct sk_buff *skb;
- skb = __hci_cmd_sync(hdev, 0xfc0c, 1, param, HCI_CMD_TIMEOUT); - if (IS_ERR(skb)) - bt_dev_err(hdev, "%s: trigger crash failed (%ld)", __func__, PTR_ERR(skb)); - kfree_skb(skb); + err = __hci_cmd_send(hdev, 0xfc0c, 1, param); + if (err < 0) + bt_dev_err(hdev, "%s: trigger crash failed (%d)", __func__, err); }
static int qca_setup(struct hci_uart *hu)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ignat Korchagin ignat@cloudflare.com
[ Upstream commit 7e0f122c65912740327e4c54472acaa5f85868cb ]
Commit d0009effa886 ("netfilter: nf_tables: validate NFPROTO_* family") added some validation of NFPROTO_* families in the nft_compat module, but it broke the ability to use legacy iptables modules in dual-stack nftables.
While with legacy iptables one had to independently manage IPv4 and IPv6 tables, with nftables it is possible to have dual-stack tables sharing the rules. Moreover, it was possible to use rules based on legacy iptables match/target modules in dual-stack nftables.
As an example, the program from [2] creates an INET dual-stack family table using an xt_bpf based rule, which looks like the following (the actual output was generated with a patched nft tool as the current nft tool does not parse dual stack tables with legacy match rules, so consider it for illustrative purposes only):
table inet testfw { chain input { type filter hook prerouting priority filter; policy accept; bytecode counter packets 0 bytes 0 accept } }
After d0009effa886 ("netfilter: nf_tables: validate NFPROTO_* family") we get EOPNOTSUPP for the above program.
Fix this by allowing NFPROTO_INET for nft_(match/target)_validate(), but also restrict the functions to classic iptables hooks.
Changes in v3: * clarify that upstream nft will not display such configuration properly and that the output was generated with a patched nft tool * remove example program from commit description and link to it instead * no code changes otherwise
Changes in v2: * restrict nft_(match/target)_validate() to classic iptables hooks * rewrite example program to use unmodified libnftnl
Fixes: d0009effa886 ("netfilter: nf_tables: validate NFPROTO_* family") Link: https://lore.kernel.org/all/Zc1PfoWN38UuFJRI@calendula/T/#mc947262582c90fec0... [1] Link: https://lore.kernel.org/all/20240220145509.53357-1-ignat@cloudflare.com/ [2] Reported-by: Jordan Griege jgriege@cloudflare.com Signed-off-by: Ignat Korchagin ignat@cloudflare.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_compat.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c index 1f9474fefe849..d3d11dede5450 100644 --- a/net/netfilter/nft_compat.c +++ b/net/netfilter/nft_compat.c @@ -359,10 +359,20 @@ static int nft_target_validate(const struct nft_ctx *ctx,
if (ctx->family != NFPROTO_IPV4 && ctx->family != NFPROTO_IPV6 && + ctx->family != NFPROTO_INET && ctx->family != NFPROTO_BRIDGE && ctx->family != NFPROTO_ARP) return -EOPNOTSUPP;
+ ret = nft_chain_validate_hooks(ctx->chain, + (1 << NF_INET_PRE_ROUTING) | + (1 << NF_INET_LOCAL_IN) | + (1 << NF_INET_FORWARD) | + (1 << NF_INET_LOCAL_OUT) | + (1 << NF_INET_POST_ROUTING)); + if (ret) + return ret; + if (nft_is_base_chain(ctx->chain)) { const struct nft_base_chain *basechain = nft_base_chain(ctx->chain); @@ -610,10 +620,20 @@ static int nft_match_validate(const struct nft_ctx *ctx,
if (ctx->family != NFPROTO_IPV4 && ctx->family != NFPROTO_IPV6 && + ctx->family != NFPROTO_INET && ctx->family != NFPROTO_BRIDGE && ctx->family != NFPROTO_ARP) return -EOPNOTSUPP;
+ ret = nft_chain_validate_hooks(ctx->chain, + (1 << NF_INET_PRE_ROUTING) | + (1 << NF_INET_LOCAL_IN) | + (1 << NF_INET_FORWARD) | + (1 << NF_INET_LOCAL_OUT) | + (1 << NF_INET_POST_ROUTING)); + if (ret) + return ret; + if (nft_is_base_chain(ctx->chain)) { const struct nft_base_chain *basechain = nft_base_chain(ctx->chain);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 62e7151ae3eb465e0ab52a20c941ff33bb6332e9 ]
conntrack nf_confirm logic cannot handle cloned skbs referencing the same nf_conn entry, which will happen for multicast (broadcast) frames on bridges.
Example: macvlan0 | br0 / \ ethX ethY
ethX (or Y) receives a L2 multicast or broadcast packet containing an IP packet, flow is not yet in conntrack table.
1. skb passes through bridge and fake-ip (br_netfilter)Prerouting. -> skb->_nfct now references a unconfirmed entry 2. skb is broad/mcast packet. bridge now passes clones out on each bridge interface. 3. skb gets passed up the stack. 4. In macvlan case, macvlan driver retains clone(s) of the mcast skb and schedules a work queue to send them out on the lower devices.
The clone skb->_nfct is not a copy, it is the same entry as the original skb. The macvlan rx handler then returns RX_HANDLER_PASS. 5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb.
The Macvlan broadcast worker and normal confirm path will race.
This race will not happen if step 2 already confirmed a clone. In that case later steps perform skb_clone() with skb->_nfct already confirmed (in hash table). This works fine.
But such confirmation won't happen when eb/ip/nftables rules dropped the packets before they reached the nf_confirm step in postrouting.
Pablo points out that nf_conntrack_bridge doesn't allow use of stateful nat, so we can safely discard the nf_conn entry and let inet call conntrack again.
This doesn't work for bridge netfilter: skb could have a nat transformation. Also bridge nf prevents re-invocation of inet prerouting via 'sabotage_in' hook.
Work around this problem by explicit confirmation of the entry at LOCAL_IN time, before upper layer has a chance to clone the unconfirmed entry.
The downside is that this disables NAT and conntrack helpers.
Alternative fix would be to add locking to all code parts that deal with unconfirmed packets, but even if that could be done in a sane way this opens up other problems, for example:
-m physdev --physdev-out eth0 -j SNAT --snat-to 1.2.3.4 -m physdev --physdev-out eth1 -j SNAT --snat-to 1.2.3.5
For multicast case, only one of such conflicting mappings will be created, conntrack only handles 1:1 NAT mappings.
Users should set create a setup that explicitly marks such traffic NOTRACK (conntrack bypass) to avoid this, but we cannot auto-bypass them, ruleset might have accept rules for untracked traffic already, so user-visible behaviour would change.
Suggested-by: Pablo Neira Ayuso pablo@netfilter.org Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217777 Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/netfilter.h | 1 + net/bridge/br_netfilter_hooks.c | 96 ++++++++++++++++++++++ net/bridge/netfilter/nf_conntrack_bridge.c | 30 +++++++ net/netfilter/nf_conntrack_core.c | 1 + 4 files changed, 128 insertions(+)
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index 80900d9109920..ce660d51549b4 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -474,6 +474,7 @@ struct nf_ct_hook { const struct sk_buff *); void (*attach)(struct sk_buff *nskb, const struct sk_buff *skb); void (*set_closing)(struct nf_conntrack *nfct); + int (*confirm)(struct sk_buff *skb); }; extern const struct nf_ct_hook __rcu *nf_ct_hook;
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c index ed17208907578..35e10c5a766d5 100644 --- a/net/bridge/br_netfilter_hooks.c +++ b/net/bridge/br_netfilter_hooks.c @@ -43,6 +43,10 @@ #include <linux/sysctl.h> #endif
+#if IS_ENABLED(CONFIG_NF_CONNTRACK) +#include <net/netfilter/nf_conntrack_core.h> +#endif + static unsigned int brnf_net_id __read_mostly;
struct brnf_net { @@ -553,6 +557,90 @@ static unsigned int br_nf_pre_routing(void *priv, return NF_STOLEN; }
+#if IS_ENABLED(CONFIG_NF_CONNTRACK) +/* conntracks' nf_confirm logic cannot handle cloned skbs referencing + * the same nf_conn entry, which will happen for multicast (broadcast) + * Frames on bridges. + * + * Example: + * macvlan0 + * br0 + * ethX ethY + * + * ethX (or Y) receives multicast or broadcast packet containing + * an IP packet, not yet in conntrack table. + * + * 1. skb passes through bridge and fake-ip (br_netfilter)Prerouting. + * -> skb->_nfct now references a unconfirmed entry + * 2. skb is broad/mcast packet. bridge now passes clones out on each bridge + * interface. + * 3. skb gets passed up the stack. + * 4. In macvlan case, macvlan driver retains clone(s) of the mcast skb + * and schedules a work queue to send them out on the lower devices. + * + * The clone skb->_nfct is not a copy, it is the same entry as the + * original skb. The macvlan rx handler then returns RX_HANDLER_PASS. + * 5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb. + * + * The Macvlan broadcast worker and normal confirm path will race. + * + * This race will not happen if step 2 already confirmed a clone. In that + * case later steps perform skb_clone() with skb->_nfct already confirmed (in + * hash table). This works fine. + * + * But such confirmation won't happen when eb/ip/nftables rules dropped the + * packets before they reached the nf_confirm step in postrouting. + * + * Work around this problem by explicit confirmation of the entry at + * LOCAL_IN time, before upper layer has a chance to clone the unconfirmed + * entry. + * + */ +static unsigned int br_nf_local_in(void *priv, + struct sk_buff *skb, + const struct nf_hook_state *state) +{ + struct nf_conntrack *nfct = skb_nfct(skb); + const struct nf_ct_hook *ct_hook; + struct nf_conn *ct; + int ret; + + if (!nfct || skb->pkt_type == PACKET_HOST) + return NF_ACCEPT; + + ct = container_of(nfct, struct nf_conn, ct_general); + if (likely(nf_ct_is_confirmed(ct))) + return NF_ACCEPT; + + WARN_ON_ONCE(skb_shared(skb)); + WARN_ON_ONCE(refcount_read(&nfct->use) != 1); + + /* We can't call nf_confirm here, it would create a dependency + * on nf_conntrack module. + */ + ct_hook = rcu_dereference(nf_ct_hook); + if (!ct_hook) { + skb->_nfct = 0ul; + nf_conntrack_put(nfct); + return NF_ACCEPT; + } + + nf_bridge_pull_encap_header(skb); + ret = ct_hook->confirm(skb); + switch (ret & NF_VERDICT_MASK) { + case NF_STOLEN: + return NF_STOLEN; + default: + nf_bridge_push_encap_header(skb); + break; + } + + ct = container_of(nfct, struct nf_conn, ct_general); + WARN_ON_ONCE(!nf_ct_is_confirmed(ct)); + + return ret; +} +#endif
/* PF_BRIDGE/FORWARD *************************************************/ static int br_nf_forward_finish(struct net *net, struct sock *sk, struct sk_buff *skb) @@ -964,6 +1052,14 @@ static const struct nf_hook_ops br_nf_ops[] = { .hooknum = NF_BR_PRE_ROUTING, .priority = NF_BR_PRI_BRNF, }, +#if IS_ENABLED(CONFIG_NF_CONNTRACK) + { + .hook = br_nf_local_in, + .pf = NFPROTO_BRIDGE, + .hooknum = NF_BR_LOCAL_IN, + .priority = NF_BR_PRI_LAST, + }, +#endif { .hook = br_nf_forward, .pf = NFPROTO_BRIDGE, diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c index abb090f94ed26..6f877e31709ba 100644 --- a/net/bridge/netfilter/nf_conntrack_bridge.c +++ b/net/bridge/netfilter/nf_conntrack_bridge.c @@ -291,6 +291,30 @@ static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb, return nf_conntrack_in(skb, &bridge_state); }
+static unsigned int nf_ct_bridge_in(void *priv, struct sk_buff *skb, + const struct nf_hook_state *state) +{ + enum ip_conntrack_info ctinfo; + struct nf_conn *ct; + + if (skb->pkt_type == PACKET_HOST) + return NF_ACCEPT; + + /* nf_conntrack_confirm() cannot handle concurrent clones, + * this happens for broad/multicast frames with e.g. macvlan on top + * of the bridge device. + */ + ct = nf_ct_get(skb, &ctinfo); + if (!ct || nf_ct_is_confirmed(ct) || nf_ct_is_template(ct)) + return NF_ACCEPT; + + /* let inet prerouting call conntrack again */ + skb->_nfct = 0; + nf_ct_put(ct); + + return NF_ACCEPT; +} + static void nf_ct_bridge_frag_save(struct sk_buff *skb, struct nf_bridge_frag_data *data) { @@ -385,6 +409,12 @@ static struct nf_hook_ops nf_ct_bridge_hook_ops[] __read_mostly = { .hooknum = NF_BR_PRE_ROUTING, .priority = NF_IP_PRI_CONNTRACK, }, + { + .hook = nf_ct_bridge_in, + .pf = NFPROTO_BRIDGE, + .hooknum = NF_BR_LOCAL_IN, + .priority = NF_IP_PRI_CONNTRACK_CONFIRM, + }, { .hook = nf_ct_bridge_post, .pf = NFPROTO_BRIDGE, diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 2e5f3864d353a..5b876fa7f9af9 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -2756,6 +2756,7 @@ static const struct nf_ct_hook nf_conntrack_hook = { .get_tuple_skb = nf_conntrack_get_tuple_skb, .attach = nf_conntrack_attach, .set_closing = nf_conntrack_set_closing, + .confirm = __nf_conntrack_confirm, };
void nf_conntrack_init_end(void)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit b6c65eb20ffa8e3bd89f551427dbeee2876d72ca ]
We never increment the group number iterator, so all groups get recorded into index 0 of the mcast_groups[] array.
As a result YNL can only handle using the last group. For example using the "netdev" sample on kernel with page pool commands results in:
$ ./samples/netdev YNL: Multicast group 'mgmt' not found
Most families have only one multicast group, so this hasn't been noticed. Plus perhaps developers usually test the last group which would have worked.
Fixes: 86878f14d71a ("tools: ynl: user space helpers") Reviewed-by: Donald Hunter donald.hunter@gmail.com Acked-by: Nicolas Dichtel nicolas.dichtel@6wind.com Link: https://lore.kernel.org/r/20240226214019.1255242-1-kuba@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/net/ynl/lib/ynl.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/net/ynl/lib/ynl.c b/tools/net/ynl/lib/ynl.c index 591f5f50ddaab..2aa19004fa0cf 100644 --- a/tools/net/ynl/lib/ynl.c +++ b/tools/net/ynl/lib/ynl.c @@ -519,6 +519,7 @@ ynl_get_family_info_mcast(struct ynl_sock *ys, const struct nlattr *mcasts) ys->mcast_groups[i].name[GENL_NAMSIZ - 1] = 0; } } + i++; }
return 0;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lin Ma linma@zju.edu.cn
[ Upstream commit 743ad091fb46e622f1b690385bb15e3cd3daf874 ]
In the commit d73ef2d69c0d ("rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length"), an adjustment was made to the old loop logic in the function `rtnl_bridge_setlink` to enable the loop to also check the length of the IFLA_BRIDGE_MODE attribute. However, this adjustment removed the `break` statement and led to an error logic of the flags writing back at the end of this function.
if (have_flags) memcpy(nla_data(attr), &flags, sizeof(flags)); // attr should point to IFLA_BRIDGE_FLAGS NLA !!!
Before the mentioned commit, the `attr` is granted to be IFLA_BRIDGE_FLAGS. However, this is not necessarily true fow now as the updated loop will let the attr point to the last NLA, even an invalid NLA which could cause overflow writes.
This patch introduces a new variable `br_flag` to save the NLA pointer that points to IFLA_BRIDGE_FLAGS and uses it to resolve the mentioned error logic.
Fixes: d73ef2d69c0d ("rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length") Signed-off-by: Lin Ma linma@zju.edu.cn Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20240227121128.608110-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/rtnetlink.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index bf4c3f65ad99c..8f9cd6b7926e3 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -5164,10 +5164,9 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, struct net *net = sock_net(skb->sk); struct ifinfomsg *ifm; struct net_device *dev; - struct nlattr *br_spec, *attr = NULL; + struct nlattr *br_spec, *attr, *br_flags_attr = NULL; int rem, err = -EOPNOTSUPP; u16 flags = 0; - bool have_flags = false;
if (nlmsg_len(nlh) < sizeof(*ifm)) return -EINVAL; @@ -5185,11 +5184,11 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, br_spec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC); if (br_spec) { nla_for_each_nested(attr, br_spec, rem) { - if (nla_type(attr) == IFLA_BRIDGE_FLAGS && !have_flags) { + if (nla_type(attr) == IFLA_BRIDGE_FLAGS && !br_flags_attr) { if (nla_len(attr) < sizeof(flags)) return -EINVAL;
- have_flags = true; + br_flags_attr = attr; flags = nla_get_u16(attr); }
@@ -5233,8 +5232,8 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, } }
- if (have_flags) - memcpy(nla_data(attr), &flags, sizeof(flags)); + if (br_flags_attr) + memcpy(nla_data(br_flags_attr), &flags, sizeof(flags)); out: return err; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleksij Rempel o.rempel@pengutronix.de
[ Upstream commit 0bb7b09392eb74b152719ae87b1ba5e4bf910ef0 ]
The i211 requires the same PTP timestamp adjustments as the i210, according to its datasheet. To ensure consistent timestamping across different platforms, this change extends the existing adjustments to include the i211.
The adjustment result are tested and comparable for i210 and i211 based systems.
Fixes: 3f544d2a4d5c ("igb: adjust PTP timestamps for Tx/Rx latency") Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de Reviewed-by: Jacob Keller jacob.e.keller@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Link: https://lore.kernel.org/r/20240227184942.362710-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igb/igb_ptp.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c index 319c544b9f04c..f945705561200 100644 --- a/drivers/net/ethernet/intel/igb/igb_ptp.c +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c @@ -957,7 +957,7 @@ static void igb_ptp_tx_hwtstamp(struct igb_adapter *adapter)
igb_ptp_systim_to_hwtstamp(adapter, &shhwtstamps, regval); /* adjust timestamp for the TX latency based on link speed */ - if (adapter->hw.mac.type == e1000_i210) { + if (hw->mac.type == e1000_i210 || hw->mac.type == e1000_i211) { switch (adapter->link_speed) { case SPEED_10: adjust = IGB_I210_TX_LATENCY_10; @@ -1003,6 +1003,7 @@ int igb_ptp_rx_pktstamp(struct igb_q_vector *q_vector, void *va, ktime_t *timestamp) { struct igb_adapter *adapter = q_vector->adapter; + struct e1000_hw *hw = &adapter->hw; struct skb_shared_hwtstamps ts; __le64 *regval = (__le64 *)va; int adjust = 0; @@ -1022,7 +1023,7 @@ int igb_ptp_rx_pktstamp(struct igb_q_vector *q_vector, void *va, igb_ptp_systim_to_hwtstamp(adapter, &ts, le64_to_cpu(regval[1]));
/* adjust timestamp for the RX latency based on link speed */ - if (adapter->hw.mac.type == e1000_i210) { + if (hw->mac.type == e1000_i210 || hw->mac.type == e1000_i211) { switch (adapter->link_speed) { case SPEED_10: adjust = IGB_I210_RX_LATENCY_10;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lukasz Majewski lukma@denx.de
[ Upstream commit 51dd4ee0372228ffb0f7709fa7aa0678d4199d06 ]
Current HSR implementation uses following supervisory frame (even for HSRv1 the HSR tag is not is not present):
00000000: 01 15 4e 00 01 2d XX YY ZZ 94 77 10 88 fb 00 01 00000010: 7e 1c 17 06 XX YY ZZ 94 77 10 1e 06 XX YY ZZ 94 00000020: 77 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 00 00 00 00 00 00 00 00
The current code adds extra two bytes (i.e. sizeof(struct hsr_sup_tlv)) when offset for skb_pull() is calculated. This is wrong, as both 'struct hsrv1_ethhdr_sp' and 'hsrv0_ethhdr_sp' already have 'struct hsr_sup_tag' defined in them, so there is no need for adding extra two bytes.
This code was working correctly as with no RedBox support, the check for HSR_TLV_EOT (0x00) was off by two bytes, which were corresponding to zeroed padded bytes for minimal packet size.
Fixes: eafaa88b3eb7 ("net: hsr: Add support for redbox supervision frames") Signed-off-by: Lukasz Majewski lukma@denx.de Reviewed-by: Jiri Pirko jiri@nvidia.com Link: https://lore.kernel.org/r/20240228085644.3618044-1-lukma@denx.de Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/hsr/hsr_forward.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/hsr/hsr_forward.c b/net/hsr/hsr_forward.c index 80cdc6f6b34c9..0323ab5023c69 100644 --- a/net/hsr/hsr_forward.c +++ b/net/hsr/hsr_forward.c @@ -83,7 +83,7 @@ static bool is_supervision_frame(struct hsr_priv *hsr, struct sk_buff *skb) return false;
/* Get next tlv */ - total_length += sizeof(struct hsr_sup_tlv) + hsr_sup_tag->tlv.HSR_TLV_length; + total_length += hsr_sup_tag->tlv.HSR_TLV_length; if (!pskb_may_pull(skb, total_length)) return false; skb_pull(skb, total_length);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit f7fa16d49837f947ee59492958f9e6f0e51d9a78 ]
With mixed sync/async decryption, or failures of crypto_aead_decrypt, we increment decrypt_pending but we never do the corresponding decrement since tls_decrypt_done will not be called. In this case, we should decrement decrypt_pending immediately to avoid getting stuck.
For example, the prequeue prequeue test gets stuck with mixed modes (one async decrypt + one sync decrypt).
Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Link: https://lore.kernel.org/r/c56d5fc35543891d5319f834f25622360e1bfbec.170913264... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index de96959336c48..9f23ba321efe6 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -289,6 +289,8 @@ static int tls_do_decryption(struct sock *sk, return 0;
ret = crypto_wait_req(ret, &ctx->async_wait); + } else if (darg->async) { + atomic_dec(&ctx->decrypt_pending); } darg->async = false;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit 6caaf104423d809b49a67ee6500191d063b40dc6 ]
If we peek from 2 records with a currently empty rx_list, and the first record is decrypted synchronously but the second record is decrypted async, the following happens: 1. decrypt record 1 (sync) 2. copy from record 1 to the userspace's msg 3. queue the decrypted record to rx_list for future read(!PEEK) 4. decrypt record 2 (async) 5. queue record 2 to rx_list 6. call process_rx_list to copy data from the 2nd record
We currently pass copied=0 as skip offset to process_rx_list, so we end up copying once again from the first record. We should skip over the data we've already copied.
Seen with selftest tls.12_aes_gcm.recv_peek_large_buf_mult_recs
Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Link: https://lore.kernel.org/r/1b132d2b2b99296bfde54e8a67672d90d6d16e71.170913264... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 9f23ba321efe6..1394fc44f3788 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1950,6 +1950,7 @@ int tls_sw_recvmsg(struct sock *sk, struct strp_msg *rxm; struct tls_msg *tlm; ssize_t copied = 0; + ssize_t peeked = 0; bool async = false; int target, err; bool is_kvec = iov_iter_is_kvec(&msg->msg_iter); @@ -2097,8 +2098,10 @@ int tls_sw_recvmsg(struct sock *sk, if (err < 0) goto put_on_rx_list_err;
- if (is_peek) + if (is_peek) { + peeked += chunk; goto put_on_rx_list; + }
if (partially_consumed) { rxm->offset += chunk; @@ -2137,8 +2140,8 @@ int tls_sw_recvmsg(struct sock *sk,
/* Drain records from the rx_list & copy if required */ if (is_peek || is_kvec) - err = process_rx_list(ctx, msg, &control, copied, - decrypted, is_peek, NULL); + err = process_rx_list(ctx, msg, &control, copied + peeked, + decrypted - peeked, is_peek, NULL); else err = process_rx_list(ctx, msg, &control, 0, async_copy_bytes, is_peek, NULL);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit 41532b785e9d79636b3815a64ddf6a096647d011 ]
If we're not doing async, the handling is much simpler. There's no reference counting, we just need to wait for the completion to wake us up and return its result.
We should preferably also use a separate crypto_wait. I'm not seeing a UAF as I did in the past, I think aec7961916f3 ("tls: fix race between async notify and socket close") took care of it.
This will make the next fix easier.
Signed-off-by: Sabrina Dubroca sd@queasysnail.net Link: https://lore.kernel.org/r/47bde5f649707610eaef9f0d679519966fc31061.170913264... Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 13114dc55430 ("tls: fix use-after-free on failed backlog decryption") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 1394fc44f3788..1fd37fe13ffd8 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -274,9 +274,15 @@ static int tls_do_decryption(struct sock *sk, DEBUG_NET_WARN_ON_ONCE(atomic_read(&ctx->decrypt_pending) < 1); atomic_inc(&ctx->decrypt_pending); } else { + DECLARE_CRYPTO_WAIT(wait); + aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG, - crypto_req_done, &ctx->async_wait); + crypto_req_done, &wait); + ret = crypto_aead_decrypt(aead_req); + if (ret == -EINPROGRESS || ret == -EBUSY) + ret = crypto_wait_req(ret, &wait); + return ret; }
ret = crypto_aead_decrypt(aead_req); @@ -285,10 +291,7 @@ static int tls_do_decryption(struct sock *sk, ret = ret ?: -EINPROGRESS; } if (ret == -EINPROGRESS) { - if (darg->async) - return 0; - - ret = crypto_wait_req(ret, &ctx->async_wait); + return 0; } else if (darg->async) { atomic_dec(&ctx->decrypt_pending); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit 13114dc5543069f7b97991e3b79937b6da05f5b0 ]
When the decrypt request goes to the backlog and crypto_aead_decrypt returns -EBUSY, tls_do_decryption will wait until all async decryptions have completed. If one of them fails, tls_do_decryption will return -EBADMSG and tls_decrypt_sg jumps to the error path, releasing all the pages. But the pages have been passed to the async callback, and have already been released by tls_decrypt_done.
The only true async case is when crypto_aead_decrypt returns -EINPROGRESS. With -EBUSY, we already waited so we can tell tls_sw_recvmsg that the data is available for immediate copy, but we need to notify tls_decrypt_sg (via the new ->async_done flag) that the memory has already been released.
Fixes: 859054147318 ("net: tls: handle backlogging of crypto requests") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Link: https://lore.kernel.org/r/4755dd8d9bebdefaa19ce1439b833d6199d4364c.170913264... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 24 +++++++++++++++++------- 1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 1fd37fe13ffd8..211f57164cb61 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -52,6 +52,7 @@ struct tls_decrypt_arg { struct_group(inargs, bool zc; bool async; + bool async_done; u8 tail; );
@@ -286,15 +287,18 @@ static int tls_do_decryption(struct sock *sk, }
ret = crypto_aead_decrypt(aead_req); + if (ret == -EINPROGRESS) + return 0; + if (ret == -EBUSY) { ret = tls_decrypt_async_wait(ctx); - ret = ret ?: -EINPROGRESS; - } - if (ret == -EINPROGRESS) { - return 0; - } else if (darg->async) { - atomic_dec(&ctx->decrypt_pending); + darg->async_done = true; + /* all completions have run, we're not doing async anymore */ + darg->async = false; + return ret; } + + atomic_dec(&ctx->decrypt_pending); darg->async = false;
return ret; @@ -1593,8 +1597,11 @@ static int tls_decrypt_sg(struct sock *sk, struct iov_iter *out_iov, /* Prepare and submit AEAD request */ err = tls_do_decryption(sk, sgin, sgout, dctx->iv, data_len + prot->tail_size, aead_req, darg); - if (err) + if (err) { + if (darg->async_done) + goto exit_free_skb; goto exit_free_pages; + }
darg->skb = clear_skb ?: tls_strp_msg(ctx); clear_skb = NULL; @@ -1606,6 +1613,9 @@ static int tls_decrypt_sg(struct sock *sk, struct iov_iter *out_iov, return err; }
+ if (unlikely(darg->async_done)) + return 0; + if (prot->tail_size) darg->tail = dctx->tail;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jisheng Zhang jszhang@kernel.org
[ Upstream commit 8246601a7d391ce8207408149d65732f28af81a1 ]
If non-leaf PTEs I.E pmd, pud or p4d is modified, a sfence.vma is a must for safe, imagine if an implementation caches the non-leaf translation in TLB, although I didn't meet this HW so far, but it's possible in theory.
Signed-off-by: Jisheng Zhang jszhang@kernel.org Fixes: c5e9b2c2ae82 ("riscv: Improve tlb_flush()") Link: https://lore.kernel.org/r/20231219175046.2496-2-jszhang@kernel.org Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/include/asm/pgalloc.h | 20 +++++++++++++++++--- 1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h index d169a4f41a2e7..c80bb9990d32e 100644 --- a/arch/riscv/include/asm/pgalloc.h +++ b/arch/riscv/include/asm/pgalloc.h @@ -95,7 +95,13 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud) __pud_free(mm, pud); }
-#define __pud_free_tlb(tlb, pud, addr) pud_free((tlb)->mm, pud) +#define __pud_free_tlb(tlb, pud, addr) \ +do { \ + if (pgtable_l4_enabled) { \ + pagetable_pud_dtor(virt_to_ptdesc(pud)); \ + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pud)); \ + } \ +} while (0)
#define p4d_alloc_one p4d_alloc_one static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr) @@ -124,7 +130,11 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d) __p4d_free(mm, p4d); }
-#define __p4d_free_tlb(tlb, p4d, addr) p4d_free((tlb)->mm, p4d) +#define __p4d_free_tlb(tlb, p4d, addr) \ +do { \ + if (pgtable_l5_enabled) \ + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(p4d)); \ +} while (0) #endif /* __PAGETABLE_PMD_FOLDED */
static inline void sync_kernel_mappings(pgd_t *pgd) @@ -149,7 +159,11 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
#ifndef __PAGETABLE_PMD_FOLDED
-#define __pmd_free_tlb(tlb, pmd, addr) pmd_free((tlb)->mm, pmd) +#define __pmd_free_tlb(tlb, pmd, addr) \ +do { \ + pagetable_pmd_dtor(virt_to_ptdesc(pmd)); \ + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \ +} while (0)
#endif /* __PAGETABLE_PMD_FOLDED */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit fccfa646ef3628097d59f7d9c1a3e84d4b6bb45e ]
gcc-14 notices that the allocation with sizeof(void) on 32-bit architectures is not enough for a 64-bit phys_addr_t:
drivers/firmware/efi/capsule-loader.c: In function 'efi_capsule_open': drivers/firmware/efi/capsule-loader.c:295:24: error: allocation of insufficient size '4' for type 'phys_addr_t' {aka 'long long unsigned int'} with size '8' [-Werror=alloc-size] 295 | cap_info->phys = kzalloc(sizeof(void *), GFP_KERNEL); | ^
Use the correct type instead here.
Fixes: f24c4d478013 ("efi/capsule-loader: Reinstate virtual capsule mapping") Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/capsule-loader.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/capsule-loader.c b/drivers/firmware/efi/capsule-loader.c index 3e8d4b51a8140..97bafb5f70389 100644 --- a/drivers/firmware/efi/capsule-loader.c +++ b/drivers/firmware/efi/capsule-loader.c @@ -292,7 +292,7 @@ static int efi_capsule_open(struct inode *inode, struct file *file) return -ENOMEM; }
- cap_info->phys = kzalloc(sizeof(void *), GFP_KERNEL); + cap_info->phys = kzalloc(sizeof(phys_addr_t), GFP_KERNEL); if (!cap_info->phys) { kfree(cap_info->pages); kfree(cap_info);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit 2df70149e73e79783bcbc7db4fa51ecef0e2022c ]
The bq27xxx i2c-client may not have an IRQ, in which case client->irq will be 0. bq27xxx_battery_i2c_probe() already has an if (client->irq) check wrapping the request_threaded_irq().
But bq27xxx_battery_i2c_remove() unconditionally calls free_irq(client->irq) leading to:
[ 190.310742] ------------[ cut here ]------------ [ 190.310843] Trying to free already-free IRQ 0 [ 190.310861] WARNING: CPU: 2 PID: 1304 at kernel/irq/manage.c:1893 free_irq+0x1b8/0x310
Followed by a backtrace when unbinding the driver. Add an if (client->irq) to bq27xxx_battery_i2c_remove() mirroring probe() to fix this.
Fixes: 444ff00734f3 ("power: supply: bq27xxx: Fix I2C IRQ race on remove") Signed-off-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20240215155133.70537-1-hdegoede@redhat.com Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/power/supply/bq27xxx_battery_i2c.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/power/supply/bq27xxx_battery_i2c.c b/drivers/power/supply/bq27xxx_battery_i2c.c index 9b5475590518f..886e0a8e2abd1 100644 --- a/drivers/power/supply/bq27xxx_battery_i2c.c +++ b/drivers/power/supply/bq27xxx_battery_i2c.c @@ -209,7 +209,9 @@ static void bq27xxx_battery_i2c_remove(struct i2c_client *client) { struct bq27xxx_device_info *di = i2c_get_clientdata(client);
- free_irq(client->irq, di); + if (client->irq) + free_irq(client->irq, di); + bq27xxx_battery_teardown(di);
mutex_lock(&battery_mutex);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit e33625c84b75e4f078d7f9bf58f01fe71ab99642 ]
The driver must write 0 to HALO_STATE before sending the SYSTEM_RESET command to the firmware.
HALO_STATE is in DSP memory, which is preserved across a soft reset. The SYSTEM_RESET command does not change the value of HALO_STATE. There is period of time while the CS35L56 is resetting, before the firmware has started to boot, where a read of HALO_STATE will return the value it had before the SYSTEM_RESET. If the driver does not clear HALO_STATE, this would return BOOT_DONE status even though the firmware has not booted.
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: 8a731fd37f8b ("ASoC: cs35l56: Move utility functions to shared file") Link: https://msgid.link/r/20240216140535.1434933-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs35l56-shared.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/sound/soc/codecs/cs35l56-shared.c b/sound/soc/codecs/cs35l56-shared.c index 953ba066bab1e..fc99bc92aeace 100644 --- a/sound/soc/codecs/cs35l56-shared.c +++ b/sound/soc/codecs/cs35l56-shared.c @@ -286,6 +286,7 @@ void cs35l56_wait_min_reset_pulse(void) EXPORT_SYMBOL_NS_GPL(cs35l56_wait_min_reset_pulse, SND_SOC_CS35L56_SHARED);
static const struct reg_sequence cs35l56_system_reset_seq[] = { + REG_SEQ0(CS35L56_DSP1_HALO_STATE, 0), REG_SEQ0(CS35L56_DSP_VIRTUAL1_MBOX_1, CS35L56_MBOX_CMD_SYSTEM_RESET), };
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
[ Upstream commit 4df49712eb54141be00a9312547436d55677f092 ]
We forgot to remove the line for snd-rtctimer from Makefile while dropping the functionality. Get rid of the stale line.
Fixes: 34ce71a96dcb ("ALSA: timer: remove legacy rtctimer") Link: https://lore.kernel.org/r/20240221092156.28695-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/core/Makefile | 1 - 1 file changed, 1 deletion(-)
diff --git a/sound/core/Makefile b/sound/core/Makefile index a6b444ee28326..f6526b3371375 100644 --- a/sound/core/Makefile +++ b/sound/core/Makefile @@ -32,7 +32,6 @@ snd-ump-objs := ump.o snd-ump-$(CONFIG_SND_UMP_LEGACY_RAWMIDI) += ump_convert.o snd-timer-objs := timer.o snd-hrtimer-objs := hrtimer.o -snd-rtctimer-objs := rtctimer.o snd-hwdep-objs := hwdep.o snd-seq-device-objs := seq_device.o
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Colin Ian King colin.i.king@gmail.com
[ Upstream commit 1382d8b55129875b2e07c4d2a7ebc790183769ee ]
In the case where __lpass_get_dmactl_handle is called and the driver id dai_id is invalid the pointer dmactl is not being assigned a value, and dmactl contains a garbage value since it has not been initialized and so the null check may not work. Fix this to initialize dmactl to NULL. One could argue that modern compilers will set this to zero, but it is useful to keep this initialized as per the same way in functions __lpass_platform_codec_intf_init and lpass_cdc_dma_daiops_hw_params.
Cleans up clang scan build warning: sound/soc/qcom/lpass-cdc-dma.c:275:7: warning: Branch condition evaluates to a garbage value [core.uninitialized.Branch]
Fixes: b81af585ea54 ("ASoC: qcom: Add lpass CPU driver for codec dma control") Signed-off-by: Colin Ian King colin.i.king@gmail.com Link: https://msgid.link/r/20240221134804.3475989-1-colin.i.king@gmail.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/qcom/lpass-cdc-dma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/qcom/lpass-cdc-dma.c b/sound/soc/qcom/lpass-cdc-dma.c index 48b03e60e3a3d..8106c586f68a4 100644 --- a/sound/soc/qcom/lpass-cdc-dma.c +++ b/sound/soc/qcom/lpass-cdc-dma.c @@ -259,7 +259,7 @@ static int lpass_cdc_dma_daiops_trigger(struct snd_pcm_substream *substream, int cmd, struct snd_soc_dai *dai) { struct snd_soc_pcm_runtime *soc_runtime = snd_soc_substream_to_rtd(substream); - struct lpaif_dmactl *dmactl; + struct lpaif_dmactl *dmactl = NULL; int ret = 0, id;
switch (cmd) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mikko Perttunen mperttunen@nvidia.com
[ Upstream commit 1fa8d07ae1a5fa4e87de42c338e8fc27f46d8bb6 ]
On Tegra186, secure world applications may need to access host1x during suspend/resume, and rely on the kernel to keep Host1x out of reset during the suspend cycle. As such, as a quirk, skip asserting Host1x's reset on Tegra186.
We don't need to keep the clocks enabled, as BPMP ensures the clock stays on while Host1x is being used. On newer SoC's, the reset line is inaccessible, so there is no need for the quirk.
Fixes: b7c00cdf6df5 ("gpu: host1x: Enable system suspend callbacks") Signed-off-by: Mikko Perttunen mperttunen@nvidia.com Reviewed-by: Jon Hunter jonathanh@nvidia.com Tested-by: Jon Hunter jonathanh@nvidia.com Signed-off-by: Thierry Reding treding@nvidia.com Link: https://patchwork.freedesktop.org/patch/msgid/20240222010517.1573931-1-cyndi... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/host1x/dev.c | 15 +++++++++------ drivers/gpu/host1x/dev.h | 6 ++++++ 2 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index 42fd504abbcda..89983d7d73ca1 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -169,6 +169,7 @@ static const struct host1x_info host1x06_info = { .num_sid_entries = ARRAY_SIZE(tegra186_sid_table), .sid_table = tegra186_sid_table, .reserve_vblank_syncpts = false, + .skip_reset_assert = true, };
static const struct host1x_sid_entry tegra194_sid_table[] = { @@ -680,13 +681,15 @@ static int __maybe_unused host1x_runtime_suspend(struct device *dev) host1x_intr_stop(host); host1x_syncpt_save(host);
- err = reset_control_bulk_assert(host->nresets, host->resets); - if (err) { - dev_err(dev, "failed to assert reset: %d\n", err); - goto resume_host1x; - } + if (!host->info->skip_reset_assert) { + err = reset_control_bulk_assert(host->nresets, host->resets); + if (err) { + dev_err(dev, "failed to assert reset: %d\n", err); + goto resume_host1x; + }
- usleep_range(1000, 2000); + usleep_range(1000, 2000); + }
clk_disable_unprepare(host->clk); reset_control_bulk_release(host->nresets, host->resets); diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h index c8e302de76257..925a118db23f5 100644 --- a/drivers/gpu/host1x/dev.h +++ b/drivers/gpu/host1x/dev.h @@ -116,6 +116,12 @@ struct host1x_info { * the display driver disables VBLANK increments. */ bool reserve_vblank_syncpts; + /* + * On Tegra186, secure world applications may require access to + * host1x during suspend/resume. To allow this, we need to leave + * host1x not in reset. + */ + bool skip_reset_assert; };
struct host1x {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yangyu Chen cyy@cyyself.name
[ Upstream commit c21f014818600ae017f97ee087e7c136b1916aa7 ]
Previous commit dbfbda3bd6bf ("riscv: mm: update T-Head memory type definitions") from patch [1] missed a `<` for bit shifting, result in bit(61) does not set in _PAGE_NOCACHE_THEAD and leaves bit(0) set instead. This patch get this fixed.
Link: https://lore.kernel.org/linux-riscv/20230912072510.2510-1-jszhang@kernel.org... [1] Fixes: dbfbda3bd6bf ("riscv: mm: update T-Head memory type definitions") Signed-off-by: Yangyu Chen cyy@cyyself.name Reviewed-by: Guo Ren guoren@kernel.org Reviewed-by: Jisheng Zhang jszhang@kernel.org Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Link: https://lore.kernel.org/r/tencent_E19FA1A095768063102E654C6FC858A32F06@qq.co... Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/include/asm/pgtable-64.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h index 9a2c780a11e95..783837bbd8783 100644 --- a/arch/riscv/include/asm/pgtable-64.h +++ b/arch/riscv/include/asm/pgtable-64.h @@ -136,7 +136,7 @@ enum napot_cont_order { * 10010 - IO Strongly-ordered, Non-cacheable, Non-bufferable, Shareable, Non-trustable */ #define _PAGE_PMA_THEAD ((1UL << 62) | (1UL << 61) | (1UL << 60)) -#define _PAGE_NOCACHE_THEAD ((1UL < 61) | (1UL << 60)) +#define _PAGE_NOCACHE_THEAD ((1UL << 61) | (1UL << 60)) #define _PAGE_IO_THEAD ((1UL << 63) | (1UL << 60)) #define _PAGE_MTMASK_THEAD (_PAGE_PMA_THEAD | _PAGE_IO_THEAD | (1UL << 59))
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Ghiti alexghiti@rivosinc.com
[ Upstream commit fc325b1a915f6d0c821bfcea21fb3f1354c4323b ]
The new riscv specific arch_hugetlb_migration_supported() must be guarded with a #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION to avoid the following build error:
In file included from include/linux/hugetlb.h:851, from kernel/fork.c:52:
arch/riscv/include/asm/hugetlb.h:15:42: error: static declaration of 'arch_hugetlb_migration_supported' follows non-static declaration
15 | #define arch_hugetlb_migration_supported arch_hugetlb_migration_supported | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/hugetlb.h:916:20: note: in expansion of macro 'arch_hugetlb_migration_supported' 916 | static inline bool arch_hugetlb_migration_supported(struct hstate *h) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/riscv/include/asm/hugetlb.h:14:6: note: previous declaration of 'arch_hugetlb_migration_supported' with type 'bool(struct hstate *)' {aka '_Bool(struct hstate *)'} 14 | bool arch_hugetlb_migration_supported(struct hstate *h); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202402110258.CV51JlEI-lkp@intel.com/ Fixes: ce68c035457b ("riscv: Fix arch_hugetlb_migration_supported() for NAPOT") Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com Link: https://lore.kernel.org/r/20240211083640.756583-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/include/asm/hugetlb.h | 2 ++ arch/riscv/mm/hugetlbpage.c | 2 ++ 2 files changed, 4 insertions(+)
diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h index 20f9c3ba23414..22deb7a2a6ec4 100644 --- a/arch/riscv/include/asm/hugetlb.h +++ b/arch/riscv/include/asm/hugetlb.h @@ -11,8 +11,10 @@ static inline void arch_clear_hugepage_flags(struct page *page) } #define arch_clear_hugepage_flags arch_clear_hugepage_flags
+#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION bool arch_hugetlb_migration_supported(struct hstate *h); #define arch_hugetlb_migration_supported arch_hugetlb_migration_supported +#endif
#ifdef CONFIG_RISCV_ISA_SVNAPOT #define __HAVE_ARCH_HUGE_PTE_CLEAR diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c index e7b69281875b2..fbe918801667d 100644 --- a/arch/riscv/mm/hugetlbpage.c +++ b/arch/riscv/mm/hugetlbpage.c @@ -426,10 +426,12 @@ bool __init arch_hugetlb_valid_size(unsigned long size) return __hugetlb_valid_size(size); }
+#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION bool arch_hugetlb_migration_supported(struct hstate *h) { return __hugetlb_valid_size(huge_page_size(h)); } +#endif
#ifdef CONFIG_CONTIG_ALLOC static __init int gigantic_pages_init(void)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit ae861c466ee57e15a29d97629e1c564e3f714a4f ]
The cs35l56->component pointer is used by the suspend-resume handling to know whether the driver is fully instantiated. This is to prevent it queuing dsp_work which would result in calling wm_adsp when the driver is not an instantiated ASoC component. So this pointer must be cleared by cs35l56_component_remove().
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: e49611252900 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56") Link: https://msgid.link/r/20240129162737.497-4-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Stable-dep-of: eba2eb2495f4 ("ASoC: soc-card: Fix missing locking in snd_soc_card_get_kcontrol()") Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs35l56.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/sound/soc/codecs/cs35l56.c b/sound/soc/codecs/cs35l56.c index 45b4de3eff94f..09944db4db30d 100644 --- a/sound/soc/codecs/cs35l56.c +++ b/sound/soc/codecs/cs35l56.c @@ -809,6 +809,8 @@ static void cs35l56_component_remove(struct snd_soc_component *component) struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(component);
cancel_work_sync(&cs35l56->dsp_work); + + cs35l56->component = NULL; }
static int cs35l56_set_bias_level(struct snd_soc_component *component,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit cd38ccbecdace1469b4e0cfb3ddeec72a3fad226 ]
cs35l56_component_remove() must call wm_adsp_power_down() and wm_adsp2_component_remove().
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: e49611252900 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56") Link: https://msgid.link/r/20240129162737.497-5-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Stable-dep-of: eba2eb2495f4 ("ASoC: soc-card: Fix missing locking in snd_soc_card_get_kcontrol()") Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs35l56.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/sound/soc/codecs/cs35l56.c b/sound/soc/codecs/cs35l56.c index 09944db4db30d..491da77112c34 100644 --- a/sound/soc/codecs/cs35l56.c +++ b/sound/soc/codecs/cs35l56.c @@ -810,6 +810,11 @@ static void cs35l56_component_remove(struct snd_soc_component *component)
cancel_work_sync(&cs35l56->dsp_work);
+ if (cs35l56->dsp.cs_dsp.booted) + wm_adsp_power_down(&cs35l56->dsp); + + wm_adsp2_component_remove(&cs35l56->dsp, component); + cs35l56->component = NULL; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit 07687cd0539f8185b6ba0c0afba8473517116d6a ]
Move the call to cs35l56_set_patch() earlier in cs35l56_init() so that it only adds the register patch on first-time initialization.
The call was after the post_soft_reset label, so every time this function was run to re-initialize the hardware after a reset it would call regmap_register_patch() and add the same reg_sequence again.
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: 898673b905b9 ("ASoC: cs35l56: Move shared data into a common data structure") Link: https://msgid.link/r/20240129162737.497-6-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Stable-dep-of: eba2eb2495f4 ("ASoC: soc-card: Fix missing locking in snd_soc_card_get_kcontrol()") Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs35l56.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/sound/soc/codecs/cs35l56.c b/sound/soc/codecs/cs35l56.c index 491da77112c34..ea5d2b2eb82a0 100644 --- a/sound/soc/codecs/cs35l56.c +++ b/sound/soc/codecs/cs35l56.c @@ -1159,6 +1159,10 @@ int cs35l56_init(struct cs35l56_private *cs35l56) if (ret < 0) return ret;
+ ret = cs35l56_set_patch(&cs35l56->base); + if (ret) + return ret; + /* Populate the DSP information with the revision and security state */ cs35l56->dsp.part = devm_kasprintf(cs35l56->base.dev, GFP_KERNEL, "cs35l56%s-%02x", cs35l56->base.secured ? "s" : "", cs35l56->base.rev); @@ -1197,10 +1201,6 @@ int cs35l56_init(struct cs35l56_private *cs35l56) if (ret) return ret;
- ret = cs35l56_set_patch(&cs35l56->base); - if (ret) - return ret; - /* Registers could be dirty after soft reset or SoundWire enumeration */ regcache_sync(cs35l56->base.regmap);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit 07f7d6e7a124d3e4de36771e2a4926d0e31c2258 ]
Defer initializing the state of the ASP1 mixer registers until the firmware has been downloaded and rebooted.
On a SoundWire system the ASP is free for use as a chip-to-chip interconnect. This can be either for the firmware on multiple CS35L56 to share reference audio; or as a bridge to another device. If it is a firmware interconnect it is owned by the firmware and the Linux driver should avoid writing the registers. However, if it is a bridge then Linux may take over and handle it as a normal codec-to-codec link. Even if the ASP is used as a firmware-firmware interconnect it is useful to have ALSA controls for the ASP mixer. They are at least useful for debugging.
CS35L56 is designed for SDCA and a generic SDCA driver would know nothing about these chip-specific registers. So if the ASP is being used on a SoundWire system the firmware sets up the ASP mixer registers. This means that we can't assume the default state of these registers. But we don't know the initial state that the firmware set them to until after the firmware has been downloaded and booted, which can take several seconds when downloading multiple amps.
DAPM normally reads the initial state of mux registers during probe() but this would mean blocking probe() for several seconds until the firmware has initialized them. To avoid this, the mixer muxes are set SND_SOC_NOPM to prevent DAPM trying to read the register state. Custom get/set callbacks are implemented for ALSA control access, and these can safely block waiting for the firmware download.
After the firmware download has completed, the state of the mux registers is known so a work job is queued to call snd_soc_dapm_mux_update_power() on each of the mux widgets.
Backport note: This won't apply cleanly to kernels older than v6.6.
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: e49611252900 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56") Link: https://msgid.link/r/20240129162737.497-11-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Stable-dep-of: eba2eb2495f4 ("ASoC: soc-card: Fix missing locking in snd_soc_card_get_kcontrol()") Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs35l56-shared.c | 7 +- sound/soc/codecs/cs35l56.c | 172 +++++++++++++++++++++++++++--- sound/soc/codecs/cs35l56.h | 1 + 3 files changed, 163 insertions(+), 17 deletions(-)
diff --git a/sound/soc/codecs/cs35l56-shared.c b/sound/soc/codecs/cs35l56-shared.c index fc99bc92aeace..2eb397724b4ba 100644 --- a/sound/soc/codecs/cs35l56-shared.c +++ b/sound/soc/codecs/cs35l56-shared.c @@ -34,10 +34,9 @@ static const struct reg_default cs35l56_reg_defaults[] = { { CS35L56_ASP1_FRAME_CONTROL5, 0x00020100 }, { CS35L56_ASP1_DATA_CONTROL1, 0x00000018 }, { CS35L56_ASP1_DATA_CONTROL5, 0x00000018 }, - { CS35L56_ASP1TX1_INPUT, 0x00000018 }, - { CS35L56_ASP1TX2_INPUT, 0x00000019 }, - { CS35L56_ASP1TX3_INPUT, 0x00000020 }, - { CS35L56_ASP1TX4_INPUT, 0x00000028 }, + + /* no defaults for ASP1TX mixer */ + { CS35L56_SWIRE_DP3_CH1_INPUT, 0x00000018 }, { CS35L56_SWIRE_DP3_CH2_INPUT, 0x00000019 }, { CS35L56_SWIRE_DP3_CH3_INPUT, 0x00000029 }, diff --git a/sound/soc/codecs/cs35l56.c b/sound/soc/codecs/cs35l56.c index ea5d2b2eb82a0..30f4b9e9cc94c 100644 --- a/sound/soc/codecs/cs35l56.c +++ b/sound/soc/codecs/cs35l56.c @@ -59,6 +59,135 @@ static int cs35l56_dspwait_put_volsw(struct snd_kcontrol *kcontrol, return snd_soc_put_volsw(kcontrol, ucontrol); }
+static const unsigned short cs35l56_asp1_mixer_regs[] = { + CS35L56_ASP1TX1_INPUT, CS35L56_ASP1TX2_INPUT, + CS35L56_ASP1TX3_INPUT, CS35L56_ASP1TX4_INPUT, +}; + +static const char * const cs35l56_asp1_mux_control_names[] = { + "ASP1 TX1 Source", "ASP1 TX2 Source", "ASP1 TX3 Source", "ASP1 TX4 Source" +}; + +static int cs35l56_dspwait_asp1tx_get(struct snd_kcontrol *kcontrol, + struct snd_ctl_elem_value *ucontrol) +{ + struct snd_soc_component *component = snd_soc_dapm_kcontrol_component(kcontrol); + struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(component); + struct soc_enum *e = (struct soc_enum *)kcontrol->private_value; + int index = e->shift_l; + unsigned int addr, val; + int ret; + + /* Wait for mux to be initialized */ + cs35l56_wait_dsp_ready(cs35l56); + flush_work(&cs35l56->mux_init_work); + + addr = cs35l56_asp1_mixer_regs[index]; + ret = regmap_read(cs35l56->base.regmap, addr, &val); + if (ret) + return ret; + + val &= CS35L56_ASP_TXn_SRC_MASK; + ucontrol->value.enumerated.item[0] = snd_soc_enum_val_to_item(e, val); + + return 0; +} + +static int cs35l56_dspwait_asp1tx_put(struct snd_kcontrol *kcontrol, + struct snd_ctl_elem_value *ucontrol) +{ + struct snd_soc_component *component = snd_soc_dapm_kcontrol_component(kcontrol); + struct snd_soc_dapm_context *dapm = snd_soc_dapm_kcontrol_dapm(kcontrol); + struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(component); + struct soc_enum *e = (struct soc_enum *)kcontrol->private_value; + int item = ucontrol->value.enumerated.item[0]; + int index = e->shift_l; + unsigned int addr, val; + bool changed; + int ret; + + /* Wait for mux to be initialized */ + cs35l56_wait_dsp_ready(cs35l56); + flush_work(&cs35l56->mux_init_work); + + addr = cs35l56_asp1_mixer_regs[index]; + val = snd_soc_enum_item_to_val(e, item); + + ret = regmap_update_bits_check(cs35l56->base.regmap, addr, + CS35L56_ASP_TXn_SRC_MASK, val, &changed); + if (!ret) + return ret; + + if (changed) + snd_soc_dapm_mux_update_power(dapm, kcontrol, item, e, NULL); + + return changed; +} + +static void cs35l56_mark_asp1_mixer_widgets_dirty(struct cs35l56_private *cs35l56) +{ + struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(cs35l56->component); + const char *prefix = cs35l56->component->name_prefix; + char full_name[SNDRV_CTL_ELEM_ID_NAME_MAXLEN]; + const char *name; + struct snd_kcontrol *kcontrol; + struct soc_enum *e; + unsigned int val[4]; + int i, item, ret; + + /* + * Resume so we can read the registers from silicon if the regmap + * cache has not yet been populated. + */ + ret = pm_runtime_resume_and_get(cs35l56->base.dev); + if (ret < 0) + return; + + ret = regmap_bulk_read(cs35l56->base.regmap, CS35L56_ASP1TX1_INPUT, + val, ARRAY_SIZE(val)); + + pm_runtime_mark_last_busy(cs35l56->base.dev); + pm_runtime_put_autosuspend(cs35l56->base.dev); + + if (ret) { + dev_err(cs35l56->base.dev, "Failed to read ASP1 mixer regs: %d\n", ret); + return; + } + + snd_soc_card_mutex_lock(dapm->card); + WARN_ON(!dapm->card->instantiated); + + for (i = 0; i < ARRAY_SIZE(cs35l56_asp1_mux_control_names); ++i) { + name = cs35l56_asp1_mux_control_names[i]; + + if (prefix) { + snprintf(full_name, sizeof(full_name), "%s %s", prefix, name); + name = full_name; + } + + kcontrol = snd_soc_card_get_kcontrol(dapm->card, name); + if (!kcontrol) { + dev_warn(cs35l56->base.dev, "Could not find control %s\n", name); + continue; + } + + e = (struct soc_enum *)kcontrol->private_value; + item = snd_soc_enum_val_to_item(e, val[i] & CS35L56_ASP_TXn_SRC_MASK); + snd_soc_dapm_mux_update_power(dapm, kcontrol, item, e, NULL); + } + + snd_soc_card_mutex_unlock(dapm->card); +} + +static void cs35l56_mux_init_work(struct work_struct *work) +{ + struct cs35l56_private *cs35l56 = container_of(work, + struct cs35l56_private, + mux_init_work); + + cs35l56_mark_asp1_mixer_widgets_dirty(cs35l56); +} + static DECLARE_TLV_DB_SCALE(vol_tlv, -10000, 25, 0);
static const struct snd_kcontrol_new cs35l56_controls[] = { @@ -77,40 +206,44 @@ static const struct snd_kcontrol_new cs35l56_controls[] = { };
static SOC_VALUE_ENUM_SINGLE_DECL(cs35l56_asp1tx1_enum, - CS35L56_ASP1TX1_INPUT, - 0, CS35L56_ASP_TXn_SRC_MASK, + SND_SOC_NOPM, + 0, 0, cs35l56_tx_input_texts, cs35l56_tx_input_values);
static const struct snd_kcontrol_new asp1_tx1_mux = - SOC_DAPM_ENUM("ASP1TX1 SRC", cs35l56_asp1tx1_enum); + SOC_DAPM_ENUM_EXT("ASP1TX1 SRC", cs35l56_asp1tx1_enum, + cs35l56_dspwait_asp1tx_get, cs35l56_dspwait_asp1tx_put);
static SOC_VALUE_ENUM_SINGLE_DECL(cs35l56_asp1tx2_enum, - CS35L56_ASP1TX2_INPUT, - 0, CS35L56_ASP_TXn_SRC_MASK, + SND_SOC_NOPM, + 1, 0, cs35l56_tx_input_texts, cs35l56_tx_input_values);
static const struct snd_kcontrol_new asp1_tx2_mux = - SOC_DAPM_ENUM("ASP1TX2 SRC", cs35l56_asp1tx2_enum); + SOC_DAPM_ENUM_EXT("ASP1TX2 SRC", cs35l56_asp1tx2_enum, + cs35l56_dspwait_asp1tx_get, cs35l56_dspwait_asp1tx_put);
static SOC_VALUE_ENUM_SINGLE_DECL(cs35l56_asp1tx3_enum, - CS35L56_ASP1TX3_INPUT, - 0, CS35L56_ASP_TXn_SRC_MASK, + SND_SOC_NOPM, + 2, 0, cs35l56_tx_input_texts, cs35l56_tx_input_values);
static const struct snd_kcontrol_new asp1_tx3_mux = - SOC_DAPM_ENUM("ASP1TX3 SRC", cs35l56_asp1tx3_enum); + SOC_DAPM_ENUM_EXT("ASP1TX3 SRC", cs35l56_asp1tx3_enum, + cs35l56_dspwait_asp1tx_get, cs35l56_dspwait_asp1tx_put);
static SOC_VALUE_ENUM_SINGLE_DECL(cs35l56_asp1tx4_enum, - CS35L56_ASP1TX4_INPUT, - 0, CS35L56_ASP_TXn_SRC_MASK, + SND_SOC_NOPM, + 3, 0, cs35l56_tx_input_texts, cs35l56_tx_input_values);
static const struct snd_kcontrol_new asp1_tx4_mux = - SOC_DAPM_ENUM("ASP1TX4 SRC", cs35l56_asp1tx4_enum); + SOC_DAPM_ENUM_EXT("ASP1TX4 SRC", cs35l56_asp1tx4_enum, + cs35l56_dspwait_asp1tx_get, cs35l56_dspwait_asp1tx_put);
static SOC_VALUE_ENUM_SINGLE_DECL(cs35l56_sdw1tx1_enum, CS35L56_SWIRE_DP3_CH1_INPUT, @@ -764,6 +897,15 @@ static void cs35l56_dsp_work(struct work_struct *work) else cs35l56_patch(cs35l56);
+ + /* + * Set starting value of ASP1 mux widgets. Updating a mux takes + * the DAPM mutex. Post this to a separate job so that DAPM + * power-up can wait for dsp_work to complete without deadlocking + * on the DAPM mutex. + */ + queue_work(cs35l56->dsp_wq, &cs35l56->mux_init_work); + pm_runtime_mark_last_busy(cs35l56->base.dev); pm_runtime_put_autosuspend(cs35l56->base.dev); } @@ -809,6 +951,7 @@ static void cs35l56_component_remove(struct snd_soc_component *component) struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(component);
cancel_work_sync(&cs35l56->dsp_work); + cancel_work_sync(&cs35l56->mux_init_work);
if (cs35l56->dsp.cs_dsp.booted) wm_adsp_power_down(&cs35l56->dsp); @@ -876,8 +1019,10 @@ int cs35l56_system_suspend(struct device *dev)
dev_dbg(dev, "system_suspend\n");
- if (cs35l56->component) + if (cs35l56->component) { flush_work(&cs35l56->dsp_work); + cancel_work_sync(&cs35l56->mux_init_work); + }
/* * The interrupt line is normally shared, but after we start suspending @@ -1028,6 +1173,7 @@ static int cs35l56_dsp_init(struct cs35l56_private *cs35l56) return -ENOMEM;
INIT_WORK(&cs35l56->dsp_work, cs35l56_dsp_work); + INIT_WORK(&cs35l56->mux_init_work, cs35l56_mux_init_work);
dsp = &cs35l56->dsp; cs35l56_init_cs_dsp(&cs35l56->base, &dsp->cs_dsp); diff --git a/sound/soc/codecs/cs35l56.h b/sound/soc/codecs/cs35l56.h index 8159c3e217d93..dc2fe4c91e67b 100644 --- a/sound/soc/codecs/cs35l56.h +++ b/sound/soc/codecs/cs35l56.h @@ -34,6 +34,7 @@ struct cs35l56_private { struct wm_adsp dsp; /* must be first member */ struct cs35l56_base base; struct work_struct dsp_work; + struct work_struct mux_init_work; struct workqueue_struct *dsp_wq; struct snd_soc_component *component; struct regulator_bulk_data supplies[CS35L56_NUM_BULK_SUPPLIES];
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit f6c967941c5d6fa526fdd64733a8d86bf2bfab31 ]
Put the silicon revision and secured flag in the wm_adsp fwf_name string instead of including them in the part string.
This changes the format of the firmware name string from
cs35l56[s]-rev-misc[-system_name]
to cs35l56-rev[-s]-misc[-system_name]
No firmware files have been published, so this doesn't cause a compatibility break.
Silicon revision and secured flag are included in the firmware filename to pick a firmware compatible with the part. These strings were being added to the part string, but that is a misuse of the string. The correct place for these is the fwf_name string, which is specifically intended to select between multiple firmware files for the same part.
Backport note: This won't apply to kernels older than v6.6.
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: 608f1b0dbdde ("ASoC: cs35l56: Move DSP part string generation so that it is done only once") Link: https://msgid.link/r/20240129162737.497-12-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Stable-dep-of: eba2eb2495f4 ("ASoC: soc-card: Fix missing locking in snd_soc_card_get_kcontrol()") Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs35l56.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/sound/soc/codecs/cs35l56.c b/sound/soc/codecs/cs35l56.c index 30f4b9e9cc94c..f05fab577f037 100644 --- a/sound/soc/codecs/cs35l56.c +++ b/sound/soc/codecs/cs35l56.c @@ -886,6 +886,18 @@ static void cs35l56_dsp_work(struct work_struct *work)
pm_runtime_get_sync(cs35l56->base.dev);
+ /* Populate fw file qualifier with the revision and security state */ + if (!cs35l56->dsp.fwf_name) { + cs35l56->dsp.fwf_name = kasprintf(GFP_KERNEL, "%02x%s-dsp1", + cs35l56->base.rev, + cs35l56->base.secured ? "-s" : ""); + if (!cs35l56->dsp.fwf_name) + goto err; + } + + dev_dbg(cs35l56->base.dev, "DSP fwf name: '%s' system name: '%s'\n", + cs35l56->dsp.fwf_name, cs35l56->dsp.system_name); + /* * When the device is running in secure mode the firmware files can * only contain insecure tunings and therefore we do not need to @@ -905,7 +917,7 @@ static void cs35l56_dsp_work(struct work_struct *work) * on the DAPM mutex. */ queue_work(cs35l56->dsp_wq, &cs35l56->mux_init_work); - +err: pm_runtime_mark_last_busy(cs35l56->base.dev); pm_runtime_put_autosuspend(cs35l56->base.dev); } @@ -958,6 +970,9 @@ static void cs35l56_component_remove(struct snd_soc_component *component)
wm_adsp2_component_remove(&cs35l56->dsp, component);
+ kfree(cs35l56->dsp.fwf_name); + cs35l56->dsp.fwf_name = NULL; + cs35l56->component = NULL; }
@@ -1309,12 +1324,6 @@ int cs35l56_init(struct cs35l56_private *cs35l56) if (ret) return ret;
- /* Populate the DSP information with the revision and security state */ - cs35l56->dsp.part = devm_kasprintf(cs35l56->base.dev, GFP_KERNEL, "cs35l56%s-%02x", - cs35l56->base.secured ? "s" : "", cs35l56->base.rev); - if (!cs35l56->dsp.part) - return -ENOMEM; - if (!cs35l56->base.reset_gpio) { dev_dbg(cs35l56->base.dev, "No reset gpio: using soft reset\n"); cs35l56->soft_resetting = true;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit c14f09f010cc569ae7e2f6ef02374f6bfef9917e ]
Rewrite the handling of ASP1 TX mixer mux initialization to prevent a deadlock during component_remove().
The firmware can overwrite the ASP1 TX mixer registers with system-specific settings. This is mainly for hardware that uses the ASP as a chip-to-chip link controlled by the firmware. Because of this the driver cannot know the starting state of the ASP1 mixer muxes until the firmware has been downloaded and rebooted.
The original workaround for this was to queue a work function from the dsp_work() job. This work then read the register values (populating the regmap cache the first time around) and then called snd_soc_dapm_mux_update_power(). The problem with this is that it was ultimately triggered by cs35l56_component_probe() queueing dsp_work, which meant that it would be running in parallel with the rest of the ASoC component and card initialization. To prevent accessing DAPM before it was fully initialized the work function took the card mutex. But this would deadlock if cs35l56_component_remove() was called before the work job had completed, because ASoC calls component_remove() with the card mutex held.
This new version removes the work function. Instead the regmap cache and DAPM mux widgets are initialized the first time any of the associated ALSA controls is read or written.
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: 07f7d6e7a124 ("ASoC: cs35l56: Fix for initializing ASP1 mixer registers") Link: https://lore.kernel.org/r/20240208123742.1278104-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Stable-dep-of: eba2eb2495f4 ("ASoC: soc-card: Fix missing locking in snd_soc_card_get_kcontrol()") Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs35l56.c | 153 +++++++++++++++++-------------------- sound/soc/codecs/cs35l56.h | 2 +- 2 files changed, 73 insertions(+), 82 deletions(-)
diff --git a/sound/soc/codecs/cs35l56.c b/sound/soc/codecs/cs35l56.c index f05fab577f037..aaeed4992d846 100644 --- a/sound/soc/codecs/cs35l56.c +++ b/sound/soc/codecs/cs35l56.c @@ -68,63 +68,7 @@ static const char * const cs35l56_asp1_mux_control_names[] = { "ASP1 TX1 Source", "ASP1 TX2 Source", "ASP1 TX3 Source", "ASP1 TX4 Source" };
-static int cs35l56_dspwait_asp1tx_get(struct snd_kcontrol *kcontrol, - struct snd_ctl_elem_value *ucontrol) -{ - struct snd_soc_component *component = snd_soc_dapm_kcontrol_component(kcontrol); - struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(component); - struct soc_enum *e = (struct soc_enum *)kcontrol->private_value; - int index = e->shift_l; - unsigned int addr, val; - int ret; - - /* Wait for mux to be initialized */ - cs35l56_wait_dsp_ready(cs35l56); - flush_work(&cs35l56->mux_init_work); - - addr = cs35l56_asp1_mixer_regs[index]; - ret = regmap_read(cs35l56->base.regmap, addr, &val); - if (ret) - return ret; - - val &= CS35L56_ASP_TXn_SRC_MASK; - ucontrol->value.enumerated.item[0] = snd_soc_enum_val_to_item(e, val); - - return 0; -} - -static int cs35l56_dspwait_asp1tx_put(struct snd_kcontrol *kcontrol, - struct snd_ctl_elem_value *ucontrol) -{ - struct snd_soc_component *component = snd_soc_dapm_kcontrol_component(kcontrol); - struct snd_soc_dapm_context *dapm = snd_soc_dapm_kcontrol_dapm(kcontrol); - struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(component); - struct soc_enum *e = (struct soc_enum *)kcontrol->private_value; - int item = ucontrol->value.enumerated.item[0]; - int index = e->shift_l; - unsigned int addr, val; - bool changed; - int ret; - - /* Wait for mux to be initialized */ - cs35l56_wait_dsp_ready(cs35l56); - flush_work(&cs35l56->mux_init_work); - - addr = cs35l56_asp1_mixer_regs[index]; - val = snd_soc_enum_item_to_val(e, item); - - ret = regmap_update_bits_check(cs35l56->base.regmap, addr, - CS35L56_ASP_TXn_SRC_MASK, val, &changed); - if (!ret) - return ret; - - if (changed) - snd_soc_dapm_mux_update_power(dapm, kcontrol, item, e, NULL); - - return changed; -} - -static void cs35l56_mark_asp1_mixer_widgets_dirty(struct cs35l56_private *cs35l56) +static int cs35l56_sync_asp1_mixer_widgets_with_firmware(struct cs35l56_private *cs35l56) { struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(cs35l56->component); const char *prefix = cs35l56->component->name_prefix; @@ -135,13 +79,19 @@ static void cs35l56_mark_asp1_mixer_widgets_dirty(struct cs35l56_private *cs35l5 unsigned int val[4]; int i, item, ret;
+ if (cs35l56->asp1_mixer_widgets_initialized) + return 0; + /* * Resume so we can read the registers from silicon if the regmap * cache has not yet been populated. */ ret = pm_runtime_resume_and_get(cs35l56->base.dev); if (ret < 0) - return; + return ret; + + /* Wait for firmware download and reboot */ + cs35l56_wait_dsp_ready(cs35l56);
ret = regmap_bulk_read(cs35l56->base.regmap, CS35L56_ASP1TX1_INPUT, val, ARRAY_SIZE(val)); @@ -151,12 +101,9 @@ static void cs35l56_mark_asp1_mixer_widgets_dirty(struct cs35l56_private *cs35l5
if (ret) { dev_err(cs35l56->base.dev, "Failed to read ASP1 mixer regs: %d\n", ret); - return; + return ret; }
- snd_soc_card_mutex_lock(dapm->card); - WARN_ON(!dapm->card->instantiated); - for (i = 0; i < ARRAY_SIZE(cs35l56_asp1_mux_control_names); ++i) { name = cs35l56_asp1_mux_control_names[i];
@@ -176,16 +123,65 @@ static void cs35l56_mark_asp1_mixer_widgets_dirty(struct cs35l56_private *cs35l5 snd_soc_dapm_mux_update_power(dapm, kcontrol, item, e, NULL); }
- snd_soc_card_mutex_unlock(dapm->card); + cs35l56->asp1_mixer_widgets_initialized = true; + + return 0; }
-static void cs35l56_mux_init_work(struct work_struct *work) +static int cs35l56_dspwait_asp1tx_get(struct snd_kcontrol *kcontrol, + struct snd_ctl_elem_value *ucontrol) { - struct cs35l56_private *cs35l56 = container_of(work, - struct cs35l56_private, - mux_init_work); + struct snd_soc_component *component = snd_soc_dapm_kcontrol_component(kcontrol); + struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(component); + struct soc_enum *e = (struct soc_enum *)kcontrol->private_value; + int index = e->shift_l; + unsigned int addr, val; + int ret;
- cs35l56_mark_asp1_mixer_widgets_dirty(cs35l56); + ret = cs35l56_sync_asp1_mixer_widgets_with_firmware(cs35l56); + if (ret) + return ret; + + addr = cs35l56_asp1_mixer_regs[index]; + ret = regmap_read(cs35l56->base.regmap, addr, &val); + if (ret) + return ret; + + val &= CS35L56_ASP_TXn_SRC_MASK; + ucontrol->value.enumerated.item[0] = snd_soc_enum_val_to_item(e, val); + + return 0; +} + +static int cs35l56_dspwait_asp1tx_put(struct snd_kcontrol *kcontrol, + struct snd_ctl_elem_value *ucontrol) +{ + struct snd_soc_component *component = snd_soc_dapm_kcontrol_component(kcontrol); + struct snd_soc_dapm_context *dapm = snd_soc_dapm_kcontrol_dapm(kcontrol); + struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(component); + struct soc_enum *e = (struct soc_enum *)kcontrol->private_value; + int item = ucontrol->value.enumerated.item[0]; + int index = e->shift_l; + unsigned int addr, val; + bool changed; + int ret; + + ret = cs35l56_sync_asp1_mixer_widgets_with_firmware(cs35l56); + if (ret) + return ret; + + addr = cs35l56_asp1_mixer_regs[index]; + val = snd_soc_enum_item_to_val(e, item); + + ret = regmap_update_bits_check(cs35l56->base.regmap, addr, + CS35L56_ASP_TXn_SRC_MASK, val, &changed); + if (!ret) + return ret; + + if (changed) + snd_soc_dapm_mux_update_power(dapm, kcontrol, item, e, NULL); + + return changed; }
static DECLARE_TLV_DB_SCALE(vol_tlv, -10000, 25, 0); @@ -909,14 +905,6 @@ static void cs35l56_dsp_work(struct work_struct *work) else cs35l56_patch(cs35l56);
- - /* - * Set starting value of ASP1 mux widgets. Updating a mux takes - * the DAPM mutex. Post this to a separate job so that DAPM - * power-up can wait for dsp_work to complete without deadlocking - * on the DAPM mutex. - */ - queue_work(cs35l56->dsp_wq, &cs35l56->mux_init_work); err: pm_runtime_mark_last_busy(cs35l56->base.dev); pm_runtime_put_autosuspend(cs35l56->base.dev); @@ -953,6 +941,13 @@ static int cs35l56_component_probe(struct snd_soc_component *component) debugfs_create_bool("can_hibernate", 0444, debugfs_root, &cs35l56->base.can_hibernate); debugfs_create_bool("fw_patched", 0444, debugfs_root, &cs35l56->base.fw_patched);
+ /* + * The widgets for the ASP1TX mixer can't be initialized + * until the firmware has been downloaded and rebooted. + */ + regcache_drop_region(cs35l56->base.regmap, CS35L56_ASP1TX1_INPUT, CS35L56_ASP1TX4_INPUT); + cs35l56->asp1_mixer_widgets_initialized = false; + queue_work(cs35l56->dsp_wq, &cs35l56->dsp_work);
return 0; @@ -963,7 +958,6 @@ static void cs35l56_component_remove(struct snd_soc_component *component) struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(component);
cancel_work_sync(&cs35l56->dsp_work); - cancel_work_sync(&cs35l56->mux_init_work);
if (cs35l56->dsp.cs_dsp.booted) wm_adsp_power_down(&cs35l56->dsp); @@ -1034,10 +1028,8 @@ int cs35l56_system_suspend(struct device *dev)
dev_dbg(dev, "system_suspend\n");
- if (cs35l56->component) { + if (cs35l56->component) flush_work(&cs35l56->dsp_work); - cancel_work_sync(&cs35l56->mux_init_work); - }
/* * The interrupt line is normally shared, but after we start suspending @@ -1188,7 +1180,6 @@ static int cs35l56_dsp_init(struct cs35l56_private *cs35l56) return -ENOMEM;
INIT_WORK(&cs35l56->dsp_work, cs35l56_dsp_work); - INIT_WORK(&cs35l56->mux_init_work, cs35l56_mux_init_work);
dsp = &cs35l56->dsp; cs35l56_init_cs_dsp(&cs35l56->base, &dsp->cs_dsp); diff --git a/sound/soc/codecs/cs35l56.h b/sound/soc/codecs/cs35l56.h index dc2fe4c91e67b..d9fbf568a1958 100644 --- a/sound/soc/codecs/cs35l56.h +++ b/sound/soc/codecs/cs35l56.h @@ -34,7 +34,6 @@ struct cs35l56_private { struct wm_adsp dsp; /* must be first member */ struct cs35l56_base base; struct work_struct dsp_work; - struct work_struct mux_init_work; struct workqueue_struct *dsp_wq; struct snd_soc_component *component; struct regulator_bulk_data supplies[CS35L56_NUM_BULK_SUPPLIES]; @@ -51,6 +50,7 @@ struct cs35l56_private { u8 asp_slot_count; bool tdm_mode; bool sysclk_set; + bool asp1_mixer_widgets_initialized; u8 old_sdw_clock_scale; };
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit eba2eb2495f47690400331c722868902784e59de ]
snd_soc_card_get_kcontrol() must be holding a read lock on card->controls_rwsem while walking the controls list.
Compare with snd_ctl_find_numid().
The existing function is renamed snd_soc_card_get_kcontrol_locked() so that it can be called from contexts that are already holding card->controls_rwsem (for example, control get/put functions).
There are few direct or indirect callers of snd_soc_card_get_kcontrol(), and most are safe. Three require changes, which have been included in this patch:
codecs/cs35l45.c: cs35l45_activate_ctl() is called from a control put() function so is changed to call snd_soc_card_get_kcontrol_locked().
codecs/cs35l56.c: cs35l56_sync_asp1_mixer_widgets_with_firmware() is called from control get()/put() functions so is changed to call snd_soc_card_get_kcontrol_locked().
fsl/fsl_xcvr.c: fsl_xcvr_activate_ctl() is called from three places, one of which already holds card->controls_rwsem: 1. fsl_xcvr_mode_put(), a control put function, which will already be holding card->controls_rwsem. 2. fsl_xcvr_startup(), a DAI startup function. 3. fsl_xcvr_shutdown(), a DAI shutdown function.
To fix this, fsl_xcvr_activate_ctl() has been changed to call snd_soc_card_get_kcontrol_locked() so that it is safe to call directly from fsl_xcvr_mode_put(). The fsl_xcvr_startup() and fsl_xcvr_shutdown() functions have been changed to take a read lock on card->controls_rsem() around calls to fsl_xcvr_activate_ctl(). While this is not very elegant, it keeps the change small, to avoid this patch creating a large collateral churn in fsl/fsl_xcvr.c.
Analysis of other callers of snd_soc_card_get_kcontrol() is that they do not need any changes, they are not holding card->controls_rwsem when they call snd_soc_card_get_kcontrol().
Direct callers of snd_soc_card_get_kcontrol(): fsl/fsl_spdif.c: fsl_spdif_dai_probe() - DAI probe function fsl/fsl_micfil.c: voice_detected_fn() - IRQ handler
Indirect callers via soc_component_notify_control(): codecs/cs42l43: cs42l43_mic_shutter() - IRQ handler codecs/cs42l43: cs42l43_spk_shutter() - IRQ handler codecs/ak4118.c: ak4118_irq_handler() - IRQ handler codecs/wm_adsp.c: wm_adsp_write_ctl() - not currently used
Indirect callers via snd_soc_limit_volume(): qcom/sc8280xp.c: sc8280xp_snd_init() - DAIlink init function ti/rx51.c: rx51_aic34_init() - DAI init function
I don't have hardware to test the fsl/*, qcom/sc828xp.c, ti/rx51.c and ak4118.c changes.
Backport note: The fsl/, qcom/, cs35l45, cs35l56 and cs42l43 callers were added since the Fixes commit so won't all be present on older kernels.
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: 209c6cdfd283 ("ASoC: soc-card: move snd_soc_card_get_kcontrol() to soc-card") Link: https://lore.kernel.org/r/20240221123710.690224-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/sound/soc-card.h | 2 ++ sound/soc/codecs/cs35l45.c | 2 +- sound/soc/codecs/cs35l56.c | 2 +- sound/soc/fsl/fsl_xcvr.c | 12 +++++++++++- sound/soc/soc-card.c | 24 ++++++++++++++++++++++-- 5 files changed, 37 insertions(+), 5 deletions(-)
diff --git a/include/sound/soc-card.h b/include/sound/soc-card.h index ecc02e955279f..1f4c39922d825 100644 --- a/include/sound/soc-card.h +++ b/include/sound/soc-card.h @@ -30,6 +30,8 @@ static inline void snd_soc_card_mutex_unlock(struct snd_soc_card *card)
struct snd_kcontrol *snd_soc_card_get_kcontrol(struct snd_soc_card *soc_card, const char *name); +struct snd_kcontrol *snd_soc_card_get_kcontrol_locked(struct snd_soc_card *soc_card, + const char *name); int snd_soc_card_jack_new(struct snd_soc_card *card, const char *id, int type, struct snd_soc_jack *jack); int snd_soc_card_jack_new_pins(struct snd_soc_card *card, const char *id, diff --git a/sound/soc/codecs/cs35l45.c b/sound/soc/codecs/cs35l45.c index 44c221745c3b2..2392c6effed85 100644 --- a/sound/soc/codecs/cs35l45.c +++ b/sound/soc/codecs/cs35l45.c @@ -184,7 +184,7 @@ static int cs35l45_activate_ctl(struct snd_soc_component *component, else snprintf(name, SNDRV_CTL_ELEM_ID_NAME_MAXLEN, "%s", ctl_name);
- kcontrol = snd_soc_card_get_kcontrol(component->card, name); + kcontrol = snd_soc_card_get_kcontrol_locked(component->card, name); if (!kcontrol) { dev_err(component->dev, "Can't find kcontrol %s\n", name); return -EINVAL; diff --git a/sound/soc/codecs/cs35l56.c b/sound/soc/codecs/cs35l56.c index aaeed4992d846..8e596f5ce8b2a 100644 --- a/sound/soc/codecs/cs35l56.c +++ b/sound/soc/codecs/cs35l56.c @@ -112,7 +112,7 @@ static int cs35l56_sync_asp1_mixer_widgets_with_firmware(struct cs35l56_private name = full_name; }
- kcontrol = snd_soc_card_get_kcontrol(dapm->card, name); + kcontrol = snd_soc_card_get_kcontrol_locked(dapm->card, name); if (!kcontrol) { dev_warn(cs35l56->base.dev, "Could not find control %s\n", name); continue; diff --git a/sound/soc/fsl/fsl_xcvr.c b/sound/soc/fsl/fsl_xcvr.c index f0fb33d719c25..c46f64557a7ff 100644 --- a/sound/soc/fsl/fsl_xcvr.c +++ b/sound/soc/fsl/fsl_xcvr.c @@ -174,7 +174,9 @@ static int fsl_xcvr_activate_ctl(struct snd_soc_dai *dai, const char *name, struct snd_kcontrol *kctl; bool enabled;
- kctl = snd_soc_card_get_kcontrol(card, name); + lockdep_assert_held(&card->snd_card->controls_rwsem); + + kctl = snd_soc_card_get_kcontrol_locked(card, name); if (kctl == NULL) return -ENOENT;
@@ -576,10 +578,14 @@ static int fsl_xcvr_startup(struct snd_pcm_substream *substream, xcvr->streams |= BIT(substream->stream);
if (!xcvr->soc_data->spdif_only) { + struct snd_soc_card *card = dai->component->card; + /* Disable XCVR controls if there is stream started */ + down_read(&card->snd_card->controls_rwsem); fsl_xcvr_activate_ctl(dai, fsl_xcvr_mode_kctl.name, false); fsl_xcvr_activate_ctl(dai, fsl_xcvr_arc_mode_kctl.name, false); fsl_xcvr_activate_ctl(dai, fsl_xcvr_earc_capds_kctl.name, false); + up_read(&card->snd_card->controls_rwsem); }
return 0; @@ -598,11 +604,15 @@ static void fsl_xcvr_shutdown(struct snd_pcm_substream *substream, /* Enable XCVR controls if there is no stream started */ if (!xcvr->streams) { if (!xcvr->soc_data->spdif_only) { + struct snd_soc_card *card = dai->component->card; + + down_read(&card->snd_card->controls_rwsem); fsl_xcvr_activate_ctl(dai, fsl_xcvr_mode_kctl.name, true); fsl_xcvr_activate_ctl(dai, fsl_xcvr_arc_mode_kctl.name, (xcvr->mode == FSL_XCVR_MODE_ARC)); fsl_xcvr_activate_ctl(dai, fsl_xcvr_earc_capds_kctl.name, (xcvr->mode == FSL_XCVR_MODE_EARC)); + up_read(&card->snd_card->controls_rwsem); } ret = regmap_update_bits(xcvr->regmap, FSL_XCVR_EXT_IER0, FSL_XCVR_IRQ_EARC_ALL, 0); diff --git a/sound/soc/soc-card.c b/sound/soc/soc-card.c index 285ab4c9c7168..8a2f163da6bc9 100644 --- a/sound/soc/soc-card.c +++ b/sound/soc/soc-card.c @@ -5,6 +5,9 @@ // Copyright (C) 2019 Renesas Electronics Corp. // Kuninori Morimoto kuninori.morimoto.gx@renesas.com // + +#include <linux/lockdep.h> +#include <linux/rwsem.h> #include <sound/soc.h> #include <sound/jack.h>
@@ -26,12 +29,15 @@ static inline int _soc_card_ret(struct snd_soc_card *card, return ret; }
-struct snd_kcontrol *snd_soc_card_get_kcontrol(struct snd_soc_card *soc_card, - const char *name) +struct snd_kcontrol *snd_soc_card_get_kcontrol_locked(struct snd_soc_card *soc_card, + const char *name) { struct snd_card *card = soc_card->snd_card; struct snd_kcontrol *kctl;
+ /* must be held read or write */ + lockdep_assert_held(&card->controls_rwsem); + if (unlikely(!name)) return NULL;
@@ -40,6 +46,20 @@ struct snd_kcontrol *snd_soc_card_get_kcontrol(struct snd_soc_card *soc_card, return kctl; return NULL; } +EXPORT_SYMBOL_GPL(snd_soc_card_get_kcontrol_locked); + +struct snd_kcontrol *snd_soc_card_get_kcontrol(struct snd_soc_card *soc_card, + const char *name) +{ + struct snd_card *card = soc_card->snd_card; + struct snd_kcontrol *kctl; + + down_read(&card->controls_rwsem); + kctl = snd_soc_card_get_kcontrol_locked(soc_card, name); + up_read(&card->controls_rwsem); + + return kctl; +} EXPORT_SYMBOL_GPL(snd_soc_card_get_kcontrol);
static int jack_new(struct snd_soc_card *card, const char *id, int type,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Conor Dooley conor@kernel.org
[ Upstream commit d82f32202e0df7bf40d4b67c8a4ff9cea32df4d9 ]
Before attempting to support the pre-ratification version of vector found on older T-Head CPUs, disallow "v" in riscv,isa on these platforms. The deprecated property has no clear way to communicate the specific version of vector that is supported and much of the vendor provided software puts "v" in the isa string. riscv,isa-extensions should be used instead. This should not be too much of a burden for these systems, as the vendor shipped devicetrees and firmware do not work with a mainline kernel and will require updating.
We can limit this restriction to only ignore v in riscv,isa on CPUs that report T-Head's vendor ID and a zero marchid. Newer T-Head CPUs that support the ratified version of vector should report non-zero marchid, according to Guo Ren [1].
Link: https://lore.kernel.org/linux-riscv/CAJF2gTRy5eK73=d6s7CVy9m9pB8p4rAoMHM3cZF... [1] Fixes: dc6667a4e7e3 ("riscv: Extending cpufeature.c to detect V-extension") Co-developed-by: Conor Dooley conor.dooley@microchip.com Signed-off-by: Conor Dooley conor.dooley@microchip.com Acked-by: Guo Ren guoren@kernel.org Link: https://lore.kernel.org/r/20240223-tidings-shabby-607f086cb4d7@spud Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/kernel/cpufeature.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c index b3785ffc15703..92a26f8b18450 100644 --- a/arch/riscv/kernel/cpufeature.c +++ b/arch/riscv/kernel/cpufeature.c @@ -22,6 +22,7 @@ #include <asm/hwprobe.h> #include <asm/patch.h> #include <asm/processor.h> +#include <asm/sbi.h> #include <asm/vector.h>
#include "copy-unaligned.h" @@ -401,6 +402,20 @@ static void __init riscv_fill_hwcap_from_isa_string(unsigned long *isa2hwcap) set_bit(RISCV_ISA_EXT_ZIHPM, isainfo->isa); }
+ /* + * "V" in ISA strings is ambiguous in practice: it should mean + * just the standard V-1.0 but vendors aren't well behaved. + * Many vendors with T-Head CPU cores which implement the 0.7.1 + * version of the vector specification put "v" into their DTs. + * CPU cores with the ratified spec will contain non-zero + * marchid. + */ + if (acpi_disabled && riscv_cached_mvendorid(cpu) == THEAD_VENDOR_ID && + riscv_cached_marchid(cpu) == 0x0) { + this_hwcap &= ~isa2hwcap[RISCV_ISA_EXT_v]; + clear_bit(RISCV_ISA_EXT_v, isainfo->isa); + } + /* * All "okay" hart should have same isa. Set HWCAP based on * common capabilities of every "okay" hart, in case they don't
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thierry Reding treding@nvidia.com
[ Upstream commit 86bf8cfda6d2a6720fa2e6e676c98f0882c9d3d7 ]
Tegra DRM doesn't support display on Tegra234 and later, so make sure not to remove any existing framebuffers in that case.
v2: - add comments explaining how this situation can come about - clear DRIVER_MODESET and DRIVER_ATOMIC feature bits
Fixes: 6848c291a54f ("drm/aperture: Convert drivers to aperture interfaces") Signed-off-by: Thierry Reding treding@nvidia.com Reviewed-by: Thomas Zimmermann tzimmermann@suse.de Reviewed-by: Javier Martinez Canillas javierm@redhat.com Signed-off-by: Robert Foss rfoss@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20240223150333.1401582-1-thier... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/tegra/drm.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index ff36171c8fb70..373bcd79257e0 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -1242,9 +1242,26 @@ static int host1x_drm_probe(struct host1x_device *dev)
drm_mode_config_reset(drm);
- err = drm_aperture_remove_framebuffers(&tegra_drm_driver); - if (err < 0) - goto hub; + /* + * Only take over from a potential firmware framebuffer if any CRTCs + * have been registered. This must not be a fatal error because there + * are other accelerators that are exposed via this driver. + * + * Another case where this happens is on Tegra234 where the display + * hardware is no longer part of the host1x complex, so this driver + * will not expose any modesetting features. + */ + if (drm->mode_config.num_crtc > 0) { + err = drm_aperture_remove_framebuffers(&tegra_drm_driver); + if (err < 0) + goto hub; + } else { + /* + * Indicate to userspace that this doesn't expose any display + * capabilities. + */ + drm->driver_features &= ~(DRIVER_MODESET | DRIVER_ATOMIC); + }
err = drm_dev_register(drm, 0); if (err < 0)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Slaby (SUSE) jirislaby@kernel.org
[ Upstream commit 00d6a284fcf3fad1b7e1b5bc3cd87cbfb60ce03f ]
Commit a5a923038d70 (fbdev: fbcon: Properly revert changes when vc_resize() failed) started restoring old font data upon failure (of vc_resize()). But it performs so only for user fonts. It means that the "system"/internal fonts are not restored at all. So in result, the very first call to fbcon_do_set_font() performs no restore at all upon failing vc_resize().
This can be reproduced by Syzkaller to crash the system on the next invocation of font_get(). It's rather hard to hit the allocation failure in vc_resize() on the first font_set(), but not impossible. Esp. if fault injection is used to aid the execution/failure. It was demonstrated by Sirius: BUG: unable to handle page fault for address: fffffffffffffff8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD cb7b067 P4D cb7b067 PUD cb7d067 PMD 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 8007 Comm: poc Not tainted 6.7.0-g9d1694dc91ce #20 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:fbcon_get_font+0x229/0x800 drivers/video/fbdev/core/fbcon.c:2286 Call Trace: <TASK> con_font_get drivers/tty/vt/vt.c:4558 [inline] con_font_op+0x1fc/0xf20 drivers/tty/vt/vt.c:4673 vt_k_ioctl drivers/tty/vt/vt_ioctl.c:474 [inline] vt_ioctl+0x632/0x2ec0 drivers/tty/vt/vt_ioctl.c:752 tty_ioctl+0x6f8/0x1570 drivers/tty/tty_io.c:2803 vfs_ioctl fs/ioctl.c:51 [inline] ...
So restore the font data in any case, not only for user fonts. Note the later 'if' is now protected by 'old_userfont' and not 'old_data' as the latter is always set now. (And it is supposed to be non-NULL. Otherwise we would see the bug above again.)
Signed-off-by: Jiri Slaby (SUSE) jirislaby@kernel.org Fixes: a5a923038d70 ("fbdev: fbcon: Properly revert changes when vc_resize() failed") Reported-and-tested-by: Ubisectech Sirius bugreport@ubisectech.com Cc: Ubisectech Sirius bugreport@ubisectech.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Helge Deller deller@gmx.de Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch Link: https://patchwork.freedesktop.org/patch/msgid/20240208114411.14604-1-jirisla... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/video/fbdev/core/fbcon.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c index 63af6ab034b5f..e6c45d585482f 100644 --- a/drivers/video/fbdev/core/fbcon.c +++ b/drivers/video/fbdev/core/fbcon.c @@ -2400,11 +2400,9 @@ static int fbcon_do_set_font(struct vc_data *vc, int w, int h, int charcount, struct fbcon_ops *ops = info->fbcon_par; struct fbcon_display *p = &fb_display[vc->vc_num]; int resize, ret, old_userfont, old_width, old_height, old_charcount; - char *old_data = NULL; + u8 *old_data = vc->vc_font.data;
resize = (w != vc->vc_font.width) || (h != vc->vc_font.height); - if (p->userfont) - old_data = vc->vc_font.data; vc->vc_font.data = (void *)(p->fontdata = data); old_userfont = p->userfont; if ((p->userfont = userfont)) @@ -2438,13 +2436,13 @@ static int fbcon_do_set_font(struct vc_data *vc, int w, int h, int charcount, update_screen(vc); }
- if (old_data && (--REFCOUNT(old_data) == 0)) + if (old_userfont && (--REFCOUNT(old_data) == 0)) kfree(old_data - FONT_EXTRA_WORDS * sizeof(int)); return 0;
err_out: p->fontdata = old_data; - vc->vc_font.data = (void *)old_data; + vc->vc_font.data = old_data;
if (userfont) { p->userfont = old_userfont;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
[ Upstream commit 5f7a07646655fb4108da527565dcdc80124b14c4 ]
If a directory has a block with only ".__afsXXXX" files in it (from uncompleted silly-rename), these .__afsXXXX files are skipped but without advancing the file position in the dir_context. This leads to afs_dir_iterate() repeating the block again and again.
Fix this by making the code that skips the .__afsXXXX file also manually advance the file position.
The symptoms are a soft lookup:
watchdog: BUG: soft lockup - CPU#3 stuck for 52s! [check:5737] ... RIP: 0010:afs_dir_iterate_block+0x39/0x1fd ... ? watchdog_timer_fn+0x1a6/0x213 ... ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? afs_dir_iterate_block+0x39/0x1fd afs_dir_iterate+0x10a/0x148 afs_readdir+0x30/0x4a iterate_dir+0x93/0xd3 __do_sys_getdents64+0x6b/0xd4
This is almost certainly the actual fix for:
https://bugzilla.kernel.org/show_bug.cgi?id=218496
Fixes: 57e9d49c5452 ("afs: Hide silly-rename files from userspace") Signed-off-by: David Howells dhowells@redhat.com Link: https://lore.kernel.org/r/786185.1708694102@warthog.procyon.org.uk Reviewed-by: Marc Dionne marc.dionne@auristor.com cc: Marc Dionne marc.dionne@auristor.com cc: Markus Suvanto markus.suvanto@gmail.com cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/afs/dir.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 9140780be5a43..c097da6e9c5b2 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -479,8 +479,10 @@ static int afs_dir_iterate_block(struct afs_vnode *dvnode, dire->u.name[0] == '.' && ctx->actor != afs_lookup_filldir && ctx->actor != afs_lookup_one_filldir && - memcmp(dire->u.name, ".__afs", 6) == 0) + memcmp(dire->u.name, ".__afs", 6) == 0) { + ctx->pos = blkoff + next * sizeof(union afs_xdr_dirent); continue; + }
/* found the next entry */ if (!dir_emit(ctx, dire->u.name, nlen,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Srinivasan Shanmugam srinivasan.shanmugam@amd.com
[ Upstream commit 0f8ca019544a252d1afb468ce840c6dcbac73af4 ]
Adds a check in the map_hw_resources function to prevent a potential buffer overflow. The function was accessing arrays using an index that could potentially be greater than the size of the arrays, leading to a buffer overflow.
Adds a check to ensure that the index is within the bounds of the arrays. If the index is out of bounds, an error message is printed and break it will continue execution with just ignoring extra data early to prevent the buffer overflow.
Reported by smatch: drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml2_wrapper.c:79 map_hw_resources() error: buffer overflow 'dml2->v20.scratch.dml_to_dc_pipe_mapping.disp_cfg_to_stream_id' 6 <= 7 drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml2_wrapper.c:81 map_hw_resources() error: buffer overflow 'dml2->v20.scratch.dml_to_dc_pipe_mapping.disp_cfg_to_plane_id' 6 <= 7
Fixes: 7966f319c66d ("drm/amd/display: Introduce DML2") Cc: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Cc: Roman Li roman.li@amd.com Cc: Qingqing Zhuo Qingqing.Zhuo@amd.com Cc: Aurabindo Pillai aurabindo.pillai@amd.com Cc: Tom Chung chiahsuan.chung@amd.com Signed-off-by: Srinivasan Shanmugam srinivasan.shanmugam@amd.com Suggested-by: Roman Li roman.li@amd.com Reviewed-by: Roman Li roman.li@amd.com Reviewed-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c index 8f231418870f2..c62b61ac45d27 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c @@ -76,6 +76,11 @@ static void map_hw_resources(struct dml2_context *dml2, in_out_display_cfg->hw.DLGRefClkFreqMHz = 50; } for (j = 0; j < mode_support_info->DPPPerSurface[i]; j++) { + if (i >= __DML2_WRAPPER_MAX_STREAMS_PLANES__) { + dml_print("DML::%s: Index out of bounds: i=%d, __DML2_WRAPPER_MAX_STREAMS_PLANES__=%d\n", + __func__, i, __DML2_WRAPPER_MAX_STREAMS_PLANES__); + break; + } dml2->v20.scratch.dml_to_dc_pipe_mapping.dml_pipe_idx_to_stream_id[num_pipes] = dml2->v20.scratch.dml_to_dc_pipe_mapping.disp_cfg_to_stream_id[i]; dml2->v20.scratch.dml_to_dc_pipe_mapping.dml_pipe_idx_to_stream_id_valid[num_pipes] = true; dml2->v20.scratch.dml_to_dc_pipe_mapping.dml_pipe_idx_to_plane_id[num_pipes] = dml2->v20.scratch.dml_to_dc_pipe_mapping.disp_cfg_to_plane_id[i];
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vadim Shakirov vadim.shakirov@syntacore.com
[ Upstream commit 65730fe8f4fb039683d76fa8ea7e8d18a53c6cc6 ]
Added the PERF_PMU_CAP_NO_INTERRUPT flag because the legacy pmu driver does not provide sampling capabilities
Added the PERF_PMU_CAP_NO_EXCLUDE flag because the legacy pmu driver does not provide the ability to disable counter incrementation in different privilege modes
Suggested-by: Atish Patra atishp@rivosinc.com Signed-off-by: Vadim Shakirov vadim.shakirov@syntacore.com Reviewed-by: Atish Patra atishp@rivosinc.com Fixes: 9b3e150e310e ("RISC-V: Add a simple platform driver for RISC-V legacy perf") Link: https://lore.kernel.org/r/20240227170002.188671-2-vadim.shakirov@syntacore.c... Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/perf/riscv_pmu_legacy.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/perf/riscv_pmu_legacy.c b/drivers/perf/riscv_pmu_legacy.c index 79fdd667922e8..a85fc9a15f039 100644 --- a/drivers/perf/riscv_pmu_legacy.c +++ b/drivers/perf/riscv_pmu_legacy.c @@ -117,6 +117,8 @@ static void pmu_legacy_init(struct riscv_pmu *pmu) pmu->event_mapped = pmu_legacy_event_mapped; pmu->event_unmapped = pmu_legacy_event_unmapped; pmu->csr_index = pmu_legacy_csr_index; + pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT; + pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
perf_pmu_register(&pmu->pmu, "cpu", PERF_TYPE_RAW); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vadim Shakirov vadim.shakirov@syntacore.com
[ Upstream commit 682dc133f83e0194796e6ea72eb642df1c03dfbe ]
With parameters CONFIG_RISCV_PMU_LEGACY=y and CONFIG_RISCV_PMU_SBI=n linux kernel crashes when you try perf record:
$ perf record ls [ 46.749286] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 46.750199] Oops [#1] [ 46.750342] Modules linked in: [ 46.750608] CPU: 0 PID: 107 Comm: perf-exec Not tainted 6.6.0 #2 [ 46.750906] Hardware name: riscv-virtio,qemu (DT) [ 46.751184] epc : 0x0 [ 46.751430] ra : arch_perf_update_userpage+0x54/0x13e [ 46.751680] epc : 0000000000000000 ra : ffffffff8072ee52 sp : ff2000000022b8f0 [ 46.751958] gp : ffffffff81505988 tp : ff6000000290d400 t0 : ff2000000022b9c0 [ 46.752229] t1 : 0000000000000001 t2 : 0000000000000003 s0 : ff2000000022b930 [ 46.752451] s1 : ff600000028fb000 a0 : 0000000000000000 a1 : ff600000028fb000 [ 46.752673] a2 : 0000000ae2751268 a3 : 00000000004fb708 a4 : 0000000000000004 [ 46.752895] a5 : 0000000000000000 a6 : 000000000017ffe3 a7 : 00000000000000d2 [ 46.753117] s2 : ff600000028fb000 s3 : 0000000ae2751268 s4 : 0000000000000000 [ 46.753338] s5 : ffffffff8153e290 s6 : ff600000863b9000 s7 : ff60000002961078 [ 46.753562] s8 : ff60000002961048 s9 : ff60000002961058 s10: 0000000000000001 [ 46.753783] s11: 0000000000000018 t3 : ffffffffffffffff t4 : ffffffffffffffff [ 46.754005] t5 : ff6000000292270c t6 : ff2000000022bb30 [ 46.754179] status: 0000000200000100 badaddr: 0000000000000000 cause: 000000000000000c [ 46.754653] Code: Unable to access instruction at 0xffffffffffffffec. [ 46.754939] ---[ end trace 0000000000000000 ]--- [ 46.755131] note: perf-exec[107] exited with irqs disabled [ 46.755546] note: perf-exec[107] exited with preempt_count 4
This happens because in the legacy case the ctr_get_width function was not defined, but it is used in arch_perf_update_userpage.
Also remove extra check in riscv_pmu_ctr_get_width_mask
Signed-off-by: Vadim Shakirov vadim.shakirov@syntacore.com Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Reviewed-by: Atish Patra atishp@rivosinc.com Fixes: cc4c07c89aad ("drivers: perf: Implement perf event mmap support in the SBI backend") Link: https://lore.kernel.org/r/20240227170002.188671-3-vadim.shakirov@syntacore.c... Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/perf/riscv_pmu.c | 18 +++++------------- drivers/perf/riscv_pmu_legacy.c | 8 +++++++- 2 files changed, 12 insertions(+), 14 deletions(-)
diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c index 0dda70e1ef90a..c78a6fd6c57f6 100644 --- a/drivers/perf/riscv_pmu.c +++ b/drivers/perf/riscv_pmu.c @@ -150,19 +150,11 @@ u64 riscv_pmu_ctr_get_width_mask(struct perf_event *event) struct riscv_pmu *rvpmu = to_riscv_pmu(event->pmu); struct hw_perf_event *hwc = &event->hw;
- if (!rvpmu->ctr_get_width) - /** - * If the pmu driver doesn't support counter width, set it to default - * maximum allowed by the specification. - */ - cwidth = 63; - else { - if (hwc->idx == -1) - /* Handle init case where idx is not initialized yet */ - cwidth = rvpmu->ctr_get_width(0); - else - cwidth = rvpmu->ctr_get_width(hwc->idx); - } + if (hwc->idx == -1) + /* Handle init case where idx is not initialized yet */ + cwidth = rvpmu->ctr_get_width(0); + else + cwidth = rvpmu->ctr_get_width(hwc->idx);
return GENMASK_ULL(cwidth, 0); } diff --git a/drivers/perf/riscv_pmu_legacy.c b/drivers/perf/riscv_pmu_legacy.c index a85fc9a15f039..fa0bccf4edf2e 100644 --- a/drivers/perf/riscv_pmu_legacy.c +++ b/drivers/perf/riscv_pmu_legacy.c @@ -37,6 +37,12 @@ static int pmu_legacy_event_map(struct perf_event *event, u64 *config) return pmu_legacy_ctr_get_idx(event); }
+/* cycle & instret are always 64 bit, one bit less according to SBI spec */ +static int pmu_legacy_ctr_get_width(int idx) +{ + return 63; +} + static u64 pmu_legacy_read_ctr(struct perf_event *event) { struct hw_perf_event *hwc = &event->hw; @@ -111,7 +117,7 @@ static void pmu_legacy_init(struct riscv_pmu *pmu) pmu->ctr_stop = NULL; pmu->event_map = pmu_legacy_event_map; pmu->ctr_get_idx = pmu_legacy_ctr_get_idx; - pmu->ctr_get_width = NULL; + pmu->ctr_get_width = pmu_legacy_ctr_get_width; pmu->ctr_clear_idx = NULL; pmu->ctr_read = pmu_legacy_read_ctr; pmu->event_mapped = pmu_legacy_event_mapped;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Ghiti alexghiti@rivosinc.com
[ Upstream commit 16ab4646c9057e0528b985ad772e3cb88c613db2 ]
This reverts commit ce173474cf19fe7fbe8f0fc74e3c81ec9c3d9807.
We cannot correctly deal with NAPOT mappings in vmalloc/vmap because if some part of a NAPOT mapping is unmapped, the remaining mapping is not updated accordingly. For example:
ptr = vmalloc_huge(64 * 1024, GFP_KERNEL); vunmap_range((unsigned long)(ptr + PAGE_SIZE), (unsigned long)(ptr + 64 * 1024));
leads to the following kernel page table dump:
0xffff8f8000ef0000-0xffff8f8000ef1000 0x00000001033c0000 4K PTE N .. .. D A G . . W R V
Meaning the first entry which was not unmapped still has the N bit set, which, if accessed first and cached in the TLB, could allow access to the unmapped range.
That's because the logic to break the NAPOT mapping does not exist and likely won't. Indeed, to break a NAPOT mapping, we first have to clear the whole mapping, flush the TLB and then set the new mapping ("break- before-make" equivalent). That works fine in userspace since we can handle any pagefault occurring on the remaining mapping but we can't handle a kernel pagefault on such mapping.
So fix this by reverting the commit that introduced the vmap/vmalloc support.
Fixes: ce173474cf19 ("riscv: mm: support Svnapot in huge vmap") Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com Link: https://lore.kernel.org/r/20240227205016.121901-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/include/asm/vmalloc.h | 61 +------------------------------- 1 file changed, 1 insertion(+), 60 deletions(-)
diff --git a/arch/riscv/include/asm/vmalloc.h b/arch/riscv/include/asm/vmalloc.h index 924d01b56c9a1..51f6dfe19745a 100644 --- a/arch/riscv/include/asm/vmalloc.h +++ b/arch/riscv/include/asm/vmalloc.h @@ -19,65 +19,6 @@ static inline bool arch_vmap_pmd_supported(pgprot_t prot) return true; }
-#ifdef CONFIG_RISCV_ISA_SVNAPOT -#include <linux/pgtable.h> +#endif
-#define arch_vmap_pte_range_map_size arch_vmap_pte_range_map_size -static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr, unsigned long end, - u64 pfn, unsigned int max_page_shift) -{ - unsigned long map_size = PAGE_SIZE; - unsigned long size, order; - - if (!has_svnapot()) - return map_size; - - for_each_napot_order_rev(order) { - if (napot_cont_shift(order) > max_page_shift) - continue; - - size = napot_cont_size(order); - if (end - addr < size) - continue; - - if (!IS_ALIGNED(addr, size)) - continue; - - if (!IS_ALIGNED(PFN_PHYS(pfn), size)) - continue; - - map_size = size; - break; - } - - return map_size; -} - -#define arch_vmap_pte_supported_shift arch_vmap_pte_supported_shift -static inline int arch_vmap_pte_supported_shift(unsigned long size) -{ - int shift = PAGE_SHIFT; - unsigned long order; - - if (!has_svnapot()) - return shift; - - WARN_ON_ONCE(size >= PMD_SIZE); - - for_each_napot_order_rev(order) { - if (napot_cont_size(order) > size) - continue; - - if (!IS_ALIGNED(size, napot_cont_size(order))) - continue; - - shift = napot_cont_shift(order); - break; - } - - return shift; -} - -#endif /* CONFIG_RISCV_ISA_SVNAPOT */ -#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ #endif /* _ASM_RISCV_VMALLOC_H */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Ghiti alexghiti@rivosinc.com
[ Upstream commit e0fe5ab4192c171c111976dbe90bbd37d3976be0 ]
pte_leaf_size() must be reimplemented to add support for NAPOT mappings.
Fixes: 82a1a1f3bfb6 ("riscv: mm: support Svnapot in hugetlb page") Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com Link: https://lore.kernel.org/r/20240227205016.121901-3-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/include/asm/pgtable.h | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index 74ffb2178f545..ec8468ddad4a0 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -439,6 +439,10 @@ static inline pte_t pte_mkhuge(pte_t pte) return pte; }
+#define pte_leaf_size(pte) (pte_napot(pte) ? \ + napot_cont_size(napot_cont_order(pte)) :\ + PAGE_SIZE) + #ifdef CONFIG_NUMA_BALANCING /* * See the comment in include/asm-generic/pgtable.h
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dimitris Vlachos dvlachos@ics.forth.gr
[ Upstream commit a11dd49dcb9376776193e15641f84fcc1e5980c9 ]
Offset vmemmap so that the first page of vmemmap will be mapped to the first page of physical memory in order to ensure that vmemmap’s bounds will be respected during pfn_to_page()/page_to_pfn() operations. The conversion macros will produce correct SV39/48/57 addresses for every possible/valid DRAM_BASE inside the physical memory limits.
v2:Address Alex's comments
Suggested-by: Alexandre Ghiti alexghiti@rivosinc.com Signed-off-by: Dimitris Vlachos dvlachos@ics.forth.gr Reported-by: Dimitris Vlachos dvlachos@ics.forth.gr Closes: https://lore.kernel.org/linux-riscv/20240202135030.42265-1-csd4492@csd.uoc.g... Fixes: d95f1a542c3d ("RISC-V: Implement sparsemem") Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Link: https://lore.kernel.org/r/20240229191723.32779-1-dvlachos@ics.forth.gr Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/include/asm/pgtable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index ec8468ddad4a0..76b131e7bbcad 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -84,7 +84,7 @@ * Define vmemmap for pfn_to_page & page_to_pfn calls. Needed if kernel * is configured with CONFIG_SPARSEMEM_VMEMMAP enabled. */ -#define vmemmap ((struct page *)VMEMMAP_START) +#define vmemmap ((struct page *)VMEMMAP_START - (phys_ram_base >> PAGE_SHIFT))
#define PCI_IO_SIZE SZ_16M #define PCI_IO_END VMEMMAP_START
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
[ Upstream commit a1a4a9ca77f143c00fce69c1239887ff8b813bec ]
For fiemap we recently stopped locking the target extent range for the whole duration of the fiemap call, in order to avoid a deadlock in a scenario where the fiemap buffer happens to be a memory mapped range of the same file. This use case is very unlikely to be useful in practice but it may be triggered by fuzz testing (syzbot, etc).
However by not locking the target extent range for the whole duration of the fiemap call we can race with an ordered extent. This happens like this:
1) The fiemap task finishes processing a file extent item that covers the file range [512K, 1M[, and that file extent item is the last item in the leaf currently being processed;
2) And ordered extent for the file range [768K, 2M[, in COW mode, completes (btrfs_finish_one_ordered()) and the file extent item covering the range [512K, 1M[ is trimmed to cover the range [512K, 768K[ and then a new file extent item for the range [768K, 2M[ is inserted in the inode's subvolume tree;
3) The fiemap task calls fiemap_next_leaf_item(), which then calls btrfs_next_leaf() to find the next leaf / item. This finds that the the next key following the one we previously processed (its type is BTRFS_EXTENT_DATA_KEY and its offset is 512K), is the key corresponding to the new file extent item inserted by the ordered extent, which has a type of BTRFS_EXTENT_DATA_KEY and an offset of 768K;
4) Later the fiemap code ends up at emit_fiemap_extent() and triggers the warning:
if (cache->offset + cache->len > offset) { WARN_ON(1); return -EINVAL; }
Since we get 1M > 768K, because the previously emitted entry for the old extent covering the file range [512K, 1M[ ends at an offset that is greater than the new extent's start offset (768K). This makes fiemap fail with -EINVAL besides triggering the warning that produces a stack trace like the following:
[1621.677651] ------------[ cut here ]------------ [1621.677656] WARNING: CPU: 1 PID: 204366 at fs/btrfs/extent_io.c:2492 emit_fiemap_extent+0x84/0x90 [btrfs] [1621.677899] Modules linked in: btrfs blake2b_generic (...) [1621.677951] CPU: 1 PID: 204366 Comm: pool Not tainted 6.8.0-rc5-btrfs-next-151+ #1 [1621.677954] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [1621.677956] RIP: 0010:emit_fiemap_extent+0x84/0x90 [btrfs] [1621.678033] Code: 2b 4c 89 63 (...) [1621.678035] RSP: 0018:ffffab16089ffd20 EFLAGS: 00010206 [1621.678037] RAX: 00000000004fa000 RBX: ffffab16089ffe08 RCX: 0000000000009000 [1621.678039] RDX: 00000000004f9000 RSI: 00000000004f1000 RDI: ffffab16089ffe90 [1621.678040] RBP: 00000000004f9000 R08: 0000000000001000 R09: 0000000000000000 [1621.678041] R10: 0000000000000000 R11: 0000000000001000 R12: 0000000041d78000 [1621.678043] R13: 0000000000001000 R14: 0000000000000000 R15: ffff9434f0b17850 [1621.678044] FS: 00007fa6e20006c0(0000) GS:ffff943bdfa40000(0000) knlGS:0000000000000000 [1621.678046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1621.678048] CR2: 00007fa6b0801000 CR3: 000000012d404002 CR4: 0000000000370ef0 [1621.678053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1621.678055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1621.678056] Call Trace: [1621.678074] <TASK> [1621.678076] ? __warn+0x80/0x130 [1621.678082] ? emit_fiemap_extent+0x84/0x90 [btrfs] [1621.678159] ? report_bug+0x1f4/0x200 [1621.678164] ? handle_bug+0x42/0x70 [1621.678167] ? exc_invalid_op+0x14/0x70 [1621.678170] ? asm_exc_invalid_op+0x16/0x20 [1621.678178] ? emit_fiemap_extent+0x84/0x90 [btrfs] [1621.678253] extent_fiemap+0x766/0xa30 [btrfs] [1621.678339] btrfs_fiemap+0x45/0x80 [btrfs] [1621.678420] do_vfs_ioctl+0x1e4/0x870 [1621.678431] __x64_sys_ioctl+0x6a/0xc0 [1621.678434] do_syscall_64+0x52/0x120 [1621.678445] entry_SYSCALL_64_after_hwframe+0x6e/0x76
There's also another case where before calling btrfs_next_leaf() we are processing a hole or a prealloc extent and we had several delalloc ranges within that hole or prealloc extent. In that case if the ordered extents complete before we find the next key, we may end up finding an extent item with an offset smaller than (or equals to) the offset in cache->offset.
So fix this by changing emit_fiemap_extent() to address these three scenarios like this:
1) For the first case, steps listed above, adjust the length of the previously cached extent so that it does not overlap with the current extent, emit the previous one and cache the current file extent item;
2) For the second case where he had a hole or prealloc extent with multiple delalloc ranges inside the hole or prealloc extent's range, and the current file extent item has an offset that matches the offset in the fiemap cache, just discard what we have in the fiemap cache and assign the current file extent item to the cache, since it's more up to date;
3) For the third case where he had a hole or prealloc extent with multiple delalloc ranges inside the hole or prealloc extent's range and the offset of the file extent item we just found is smaller than what we have in the cache, just skip the current file extent item if its range end at or behind the cached extent's end, because we may have emitted (to the fiemap user space buffer) delalloc ranges that overlap with the current file extent item's range. If the file extent item's range goes beyond the end offset of the cached extent, just emit the cached extent and cache a subrange of the file extent item, that goes from the end offset of the cached extent to the end offset of the file extent item.
Dealing with those cases in those ways makes everything consistent by reflecting the current state of file extent items in the btree and without emitting extents that have overlapping ranges (which would be confusing and violating expectations).
This issue could be triggered often with test case generic/561, and was also hit and reported by Wang Yugui.
Reported-by: Wang Yugui wangyugui@e16-tech.com Link: https://lore.kernel.org/linux-btrfs/20240223104619.701F.409509F4@e16-tech.co... Fixes: b0ad381fa769 ("btrfs: fix deadlock with fiemap and extent locking") Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/extent_io.c | 103 ++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 96 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 197b41d02735b..3f795f105a645 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2436,6 +2436,7 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo, struct fiemap_cache *cache, u64 offset, u64 phys, u64 len, u32 flags) { + u64 cache_end; int ret = 0;
/* Set at the end of extent_fiemap(). */ @@ -2445,15 +2446,102 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo, goto assign;
/* - * Sanity check, extent_fiemap() should have ensured that new - * fiemap extent won't overlap with cached one. - * Not recoverable. + * When iterating the extents of the inode, at extent_fiemap(), we may + * find an extent that starts at an offset behind the end offset of the + * previous extent we processed. This happens if fiemap is called + * without FIEMAP_FLAG_SYNC and there are ordered extents completing + * while we call btrfs_next_leaf() (through fiemap_next_leaf_item()). * - * NOTE: Physical address can overlap, due to compression + * For example we are in leaf X processing its last item, which is the + * file extent item for file range [512K, 1M[, and after + * btrfs_next_leaf() releases the path, there's an ordered extent that + * completes for the file range [768K, 2M[, and that results in trimming + * the file extent item so that it now corresponds to the file range + * [512K, 768K[ and a new file extent item is inserted for the file + * range [768K, 2M[, which may end up as the last item of leaf X or as + * the first item of the next leaf - in either case btrfs_next_leaf() + * will leave us with a path pointing to the new extent item, for the + * file range [768K, 2M[, since that's the first key that follows the + * last one we processed. So in order not to report overlapping extents + * to user space, we trim the length of the previously cached extent and + * emit it. + * + * Upon calling btrfs_next_leaf() we may also find an extent with an + * offset smaller than or equals to cache->offset, and this happens + * when we had a hole or prealloc extent with several delalloc ranges in + * it, but after btrfs_next_leaf() released the path, delalloc was + * flushed and the resulting ordered extents were completed, so we can + * now have found a file extent item for an offset that is smaller than + * or equals to what we have in cache->offset. We deal with this as + * described below. */ - if (cache->offset + cache->len > offset) { - WARN_ON(1); - return -EINVAL; + cache_end = cache->offset + cache->len; + if (cache_end > offset) { + if (offset == cache->offset) { + /* + * We cached a dealloc range (found in the io tree) for + * a hole or prealloc extent and we have now found a + * file extent item for the same offset. What we have + * now is more recent and up to date, so discard what + * we had in the cache and use what we have just found. + */ + goto assign; + } else if (offset > cache->offset) { + /* + * The extent range we previously found ends after the + * offset of the file extent item we found and that + * offset falls somewhere in the middle of that previous + * extent range. So adjust the range we previously found + * to end at the offset of the file extent item we have + * just found, since this extent is more up to date. + * Emit that adjusted range and cache the file extent + * item we have just found. This corresponds to the case + * where a previously found file extent item was split + * due to an ordered extent completing. + */ + cache->len = offset - cache->offset; + goto emit; + } else { + const u64 range_end = offset + len; + + /* + * The offset of the file extent item we have just found + * is behind the cached offset. This means we were + * processing a hole or prealloc extent for which we + * have found delalloc ranges (in the io tree), so what + * we have in the cache is the last delalloc range we + * found while the file extent item we found can be + * either for a whole delalloc range we previously + * emmitted or only a part of that range. + * + * We have two cases here: + * + * 1) The file extent item's range ends at or behind the + * cached extent's end. In this case just ignore the + * current file extent item because we don't want to + * overlap with previous ranges that may have been + * emmitted already; + * + * 2) The file extent item starts behind the currently + * cached extent but its end offset goes beyond the + * end offset of the cached extent. We don't want to + * overlap with a previous range that may have been + * emmitted already, so we emit the currently cached + * extent and then partially store the current file + * extent item's range in the cache, for the subrange + * going the cached extent's end to the end of the + * file extent item. + */ + if (range_end <= cache_end) + return 0; + + if (!(flags & (FIEMAP_EXTENT_ENCODED | FIEMAP_EXTENT_DELALLOC))) + phys += cache_end - offset; + + offset = cache_end; + len = range_end - cache_end; + goto emit; + } }
/* @@ -2473,6 +2561,7 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo, return 0; }
+emit: /* Not mergeable, need to submit cached one */ ret = fiemap_fill_next_extent(fieinfo, cache->offset, cache->phys, cache->len, cache->flags);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sid Pranjale sidpranjale127@protonmail.com
[ Upstream commit f6ecfdad359a01c7fd8a3bcfde3ef0acdf107e6e ]
Nouveau deallocates a few buffers post GPU init which are required for GPU suspend/resume to function correctly. This is likely not as big an issue on systems where the NVGPU is the only GPU, but on multi-GPU set ups it leads to a regression where the kernel module errors and results in a system-wide rendering freeze.
This commit addresses that regression by moving the two buffers required for suspend and resume to be deallocated at driver unload instead of post init.
Fixes: 042b5f83841fb ("drm/nouveau: fix several DMA buffer leaks") Signed-off-by: Sid Pranjale sidpranjale127@protonmail.com Signed-off-by: Dave Airlie airlied@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c index a41735ab60683..d66fc3570642b 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c @@ -1054,8 +1054,6 @@ r535_gsp_postinit(struct nvkm_gsp *gsp) /* Release the DMA buffers that were needed only for boot and init */ nvkm_gsp_mem_dtor(gsp, &gsp->boot.fw); nvkm_gsp_mem_dtor(gsp, &gsp->libos); - nvkm_gsp_mem_dtor(gsp, &gsp->rmargs); - nvkm_gsp_mem_dtor(gsp, &gsp->wpr_meta);
return ret; } @@ -2163,6 +2161,8 @@ r535_gsp_dtor(struct nvkm_gsp *gsp)
r535_gsp_dtor_fws(gsp);
+ nvkm_gsp_mem_dtor(gsp, &gsp->rmargs); + nvkm_gsp_mem_dtor(gsp, &gsp->wpr_meta); nvkm_gsp_mem_dtor(gsp, &gsp->shm.mem); nvkm_gsp_mem_dtor(gsp, &gsp->loginit); nvkm_gsp_mem_dtor(gsp, &gsp->logintr);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Saravana Kannan saravanak@google.com
[ Upstream commit 7cb50f6c9fbaa1c0b80100b8971bf13db5d75d06 ]
Introduced a stupid bug in commit 782bfd03c3ae ("of: property: Improve finding the supplier of a remote-endpoint property") due to a last minute incorrect edit of "index !=0" into "!index". This patch fixes it to be "index > 0" to match the comment right next to it.
Reported-by: Luca Ceresoli luca.ceresoli@bootlin.com Link: https://lore.kernel.org/lkml/20240223171849.10f9901d@booty/ Fixes: 782bfd03c3ae ("of: property: Improve finding the supplier of a remote-endpoint property") Signed-off-by: Saravana Kannan saravanak@google.com Reviewed-by: Herve Codina herve.codina@bootlin.com Reviewed-by: Luca Ceresoli luca.ceresoli@bootlin.com Tested-by: Luca Ceresoli luca.ceresoli@bootlin.com Link: https://lore.kernel.org/r/20240224052436.3552333-1-saravanak@google.com Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/of/property.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/of/property.c b/drivers/of/property.c index 885ee040ee2b3..15d1156843256 100644 --- a/drivers/of/property.c +++ b/drivers/of/property.c @@ -1303,7 +1303,7 @@ static struct device_node *parse_remote_endpoint(struct device_node *np, int index) { /* Return NULL for index > 0 to signify end of remote-endpoints. */ - if (!index || strcmp(prop_name, "remote-endpoint")) + if (index > 0 || strcmp(prop_name, "remote-endpoint")) return NULL;
return of_graph_get_remote_port_parent(np);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit 2f03fc340cac9ea1dc63cbf8c93dd2eb0f227815 upstream.
Since tomoyo_write_control() updates head->write_buf when write() of long lines is requested, we need to fetch head->write_buf after head->io_sem is held. Otherwise, concurrent write() requests can cause use-after-free-write and double-free problems.
Reported-by: Sam Sun samsun1006219@gmail.com Closes: https://lkml.kernel.org/r/CAEkJfYNDspuGxYx5kym8Lvp--D36CMDUErg4rxfWFJuPbbji8... Fixes: bd03a3e4c9a9 ("TOMOYO: Add policy namespace support.") Cc: stable@vger.kernel.org # Linux 3.1+ Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/tomoyo/common.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/security/tomoyo/common.c +++ b/security/tomoyo/common.c @@ -2649,13 +2649,14 @@ ssize_t tomoyo_write_control(struct tomo { int error = buffer_len; size_t avail_len = buffer_len; - char *cp0 = head->write_buf; + char *cp0; int idx;
if (!head->write) return -EINVAL; if (mutex_lock_interruptible(&head->io_sem)) return -EINTR; + cp0 = head->write_buf; head->read_user_buf_avail = 0; idx = tomoyo_read_lock(); /* Read a line and dispatch it to the policy handler. */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Sakamoto o-takashi@sakamocchi.jp
commit 77ce96543b03f437c6b45f286d8110db2b6622a3 upstream.
The local helper function to compare the given pair of cycle count evaluates them. If the left value is less than the right value, the function returns negative value.
If the safe cycle is less than the current cycle, it is the case of cycle lost. However, it is not currently handled properly.
This commit fixes the bug.
Cc: stable@vger.kernel.org Fixes: 705794c53b00 ("ALSA: firewire-lib: check cycle continuity") Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp Link: https://lore.kernel.org/r/20240218033026.72577-1-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/firewire/amdtp-stream.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/firewire/amdtp-stream.c +++ b/sound/firewire/amdtp-stream.c @@ -951,7 +951,7 @@ static int generate_tx_packet_descs(stru // to the reason. unsigned int safe_cycle = increment_ohci_cycle_count(next_cycle, IR_JUMBO_PAYLOAD_MAX_SKIP_CYCLES); - lost = (compare_ohci_cycle_count(safe_cycle, cycle) > 0); + lost = (compare_ohci_cycle_count(safe_cycle, cycle) < 0); } if (lost) { dev_err(&s->unit->device, "Detect discontinuity of cycle: %d %d\n",
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 49cbb7b7d36ec3ba73ce1daf7ae1d71d435453b8 upstream.
snd_ump_legacy_open() didn't return the error code properly even if it couldn't open. Fix it.
Fixes: 0b5288f5fe63 ("ALSA: ump: Add legacy raw MIDI support") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240220150843.28630-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/core/ump.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/sound/core/ump.c +++ b/sound/core/ump.c @@ -985,7 +985,7 @@ static int snd_ump_legacy_open(struct sn struct snd_ump_endpoint *ump = substream->rmidi->private_data; int dir = substream->stream; int group = ump->legacy_mapping[substream->number]; - int err; + int err = 0;
mutex_lock(&ump->open_mutex); if (ump->legacy_substreams[dir][group]) { @@ -1009,7 +1009,7 @@ static int snd_ump_legacy_open(struct sn spin_unlock_irq(&ump->legacy_locks[dir]); unlock: mutex_unlock(&ump->open_mutex); - return 0; + return err; }
static int snd_ump_legacy_close(struct snd_rawmidi_substream *substream)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jay Ajit Mate jay.mate15@gmail.com
commit 89a0dff6105e06067bdc57595982dbf6d6dd4959 upstream.
The Dell Inspiron 16 Plus 7630, similar to its predecessors (7620 models), experiences an issue with unconnected top speakers. Since the controller remains unchanged, this commit addresses the problem by correctly connecting the speakers on NID 0X17 to the DAC on NIC 0x03.
Signed-off-by: Jay Ajit Mate jay.mate15@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240219100404.9573-1-jay.mate15@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9733,6 +9733,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x1028, 0x0c1c, "Dell Precision 3540", ALC236_FIXUP_DELL_DUAL_CODECS), SND_PCI_QUIRK(0x1028, 0x0c1d, "Dell Precision 3440", ALC236_FIXUP_DELL_DUAL_CODECS), SND_PCI_QUIRK(0x1028, 0x0c1e, "Dell Precision 3540", ALC236_FIXUP_DELL_DUAL_CODECS), + SND_PCI_QUIRK(0x1028, 0x0c28, "Dell Inspiron 16 Plus 7630", ALC295_FIXUP_DELL_INSPIRON_TOP_SPEAKERS), SND_PCI_QUIRK(0x1028, 0x0c4d, "Dell", ALC287_FIXUP_CS35L41_I2C_4), SND_PCI_QUIRK(0x1028, 0x0cbd, "Dell Oasis 13 CS MTL-U", ALC289_FIXUP_DELL_CS35L41_SPI_2), SND_PCI_QUIRK(0x1028, 0x0cbe, "Dell Oasis 13 2-IN-1 MTL-U", ALC289_FIXUP_DELL_CS35L41_SPI_2),
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gergo Koteles soyer@irl.hu
commit c1947ce61ff4cd4de2fe5f72423abedb6dc83011 upstream.
The volume of subwoofer channels is always at maximum with the ALC269_FIXUP_THINKPAD_ACPI chain.
Use ALC285_FIXUP_THINKPAD_HEADSET_JACK to align it to the master volume.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208555#c827
Fixes: 3babae915f4c ("ALSA: hda/tas2781: Add tas2781 HDA driver") Cc: stable@vger.kernel.org Signed-off-by: Gergo Koteles soyer@irl.hu Link: https://lore.kernel.org/r/7ffae10ebba58601d25fe2ff8381a6ae3a926e62.170868781... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9578,7 +9578,7 @@ static const struct hda_fixup alc269_fix .type = HDA_FIXUP_FUNC, .v.func = tas2781_fixup_i2c, .chained = true, - .chain_id = ALC269_FIXUP_THINKPAD_ACPI, + .chain_id = ALC285_FIXUP_THINKPAD_HEADSET_JACK, }, [ALC245_FIXUP_HP_MUTE_LED_COEFBIT] = { .type = HDA_FIXUP_FUNC,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans Peter flurry123@gmx.ch
commit 1fdf4e8be7059e7784fec11d30cd32784f0bdc83 upstream.
On my EliteBook 840 G8 Notebook PC (ProdId 5S7R6EC#ABD; built 2022 for german market) the Mute LED is always on. The mute button itself works as expected. alsa-info.sh shows a different subsystem-id 0x8ab9 for Realtek ALC285 Codec, thus the existing quirks for HP 840 G8 don't work. Therefore, add a new quirk for this type of EliteBook.
Signed-off-by: Hans Peter flurry123@gmx.ch Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240219164518.4099-1-flurry123@gmx.ch Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9915,6 +9915,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x103c, 0x8aa3, "HP ProBook 450 G9 (MB 8AA1)", ALC236_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8aa8, "HP EliteBook 640 G9 (MB 8AA6)", ALC236_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8aab, "HP EliteBook 650 G9 (MB 8AA9)", ALC236_FIXUP_HP_GPIO_LED), + SND_PCI_QUIRK(0x103c, 0x8ab9, "HP EliteBook 840 G8 (MB 8AB8)", ALC285_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8abb, "HP ZBook Firefly 14 G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8ad1, "HP EliteBook 840 14 inch G9 Notebook PC", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8ad2, "HP EliteBook 860 16 inch G9 Notebook PC", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eniac Zhang eniac-xw.zhang@hp.com
commit 67c3d7717efbd46092f217b1f811df1b205cce06 upstream.
The HP mt440 Thin Client uses an ALC236 codec and needs the ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF quirk to make the mute and micmute LEDs work.
There are two variants of the USB-C PD chip on this device. Each uses a different BIOS and board ID, hence the two entries.
Signed-off-by: Eniac Zhang eniac-xw.zhang@hp.com Signed-off-by: Alexandru Gagniuc alexandru.gagniuc@hp.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240220175812.782687-1-alexandru.gagniuc@hp.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9890,6 +9890,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x103c, 0x8973, "HP EliteBook 860 G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8974, "HP EliteBook 840 Aero G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8975, "HP EliteBook x360 840 Aero G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), + SND_PCI_QUIRK(0x103c, 0x897d, "HP mt440 Mobile Thin Client U74", ALC236_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8981, "HP Elite Dragonfly G3", ALC245_FIXUP_CS35L41_SPI_4), SND_PCI_QUIRK(0x103c, 0x898e, "HP EliteBook 835 G9", ALC287_FIXUP_CS35L41_I2C_2), SND_PCI_QUIRK(0x103c, 0x898f, "HP EliteBook 835 G9", ALC287_FIXUP_CS35L41_I2C_2), @@ -9921,6 +9922,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x103c, 0x8ad2, "HP EliteBook 860 16 inch G9 Notebook PC", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8b0f, "HP Elite mt645 G7 Mobile Thin Client U81", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF), SND_PCI_QUIRK(0x103c, 0x8b2f, "HP 255 15.6 inch G10 Notebook PC", ALC236_FIXUP_HP_MUTE_LED_COEFBIT2), + SND_PCI_QUIRK(0x103c, 0x8b3f, "HP mt440 Mobile Thin Client U91", ALC236_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8b42, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8b43, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8b44, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willian Wang git@willian.wang
commit 0ac32a396e4f41e88df76ce2282423188a2d2ed0 upstream.
Lenovo Slim/Yoga Pro 9 14IRP8 requires a special fixup because there is a collision of its PCI SSID (17aa:3802) with Lenovo Yoga DuetITL 2021 codec SSID.
Fixes: 3babae915f4c ("ALSA: hda/tas2781: Add tas2781 HDA driver") Link: https://bugzilla.kernel.org/show_bug.cgi?id=208555 Link: https://lore.kernel.org/all/d5b42e483566a3815d229270abd668131a0d9f3a.camel@i... Cc: stable@vger.kernel.org Signed-off-by: Willian Wang git@willian.wang Reviewed-by: Gergo Koteles soyer@irl.hu Link: https://lore.kernel.org/r/170879111795.8.6687687359006700715.273812184@willi... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -7438,6 +7438,7 @@ enum { ALC287_FIXUP_LEGION_15IMHG05_AUTOMUTE, ALC287_FIXUP_YOGA7_14ITL_SPEAKERS, ALC298_FIXUP_LENOVO_C940_DUET7, + ALC287_FIXUP_LENOVO_14IRP8_DUETITL, ALC287_FIXUP_13S_GEN2_SPEAKERS, ALC256_FIXUP_SET_COEF_DEFAULTS, ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE, @@ -7488,6 +7489,26 @@ static void alc298_fixup_lenovo_c940_due __snd_hda_apply_fixup(codec, id, action, 0); }
+/* A special fixup for Lenovo Slim/Yoga Pro 9 14IRP8 and Yoga DuetITL 2021; + * 14IRP8 PCI SSID will mistakenly be matched with the DuetITL codec SSID, + * so we need to apply a different fixup in this case. The only DuetITL codec + * SSID reported so far is the 17aa:3802 while the 14IRP8 has the 17aa:38be + * and 17aa:38bf. If it weren't for the PCI SSID, the 14IRP8 models would + * have matched correctly by their codecs. + */ +static void alc287_fixup_lenovo_14irp8_duetitl(struct hda_codec *codec, + const struct hda_fixup *fix, + int action) +{ + int id; + + if (codec->core.subsystem_id == 0x17aa3802) + id = ALC287_FIXUP_YOGA7_14ITL_SPEAKERS; /* DuetITL */ + else + id = ALC287_FIXUP_TAS2781_I2C; /* 14IRP8 */ + __snd_hda_apply_fixup(codec, id, action, 0); +} + static const struct hda_fixup alc269_fixups[] = { [ALC269_FIXUP_GPIO2] = { .type = HDA_FIXUP_FUNC, @@ -9372,6 +9393,10 @@ static const struct hda_fixup alc269_fix .type = HDA_FIXUP_FUNC, .v.func = alc298_fixup_lenovo_c940_duet7, }, + [ALC287_FIXUP_LENOVO_14IRP8_DUETITL] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc287_fixup_lenovo_14irp8_duetitl, + }, [ALC287_FIXUP_13S_GEN2_SPEAKERS] = { .type = HDA_FIXUP_VERBS, .v.verbs = (const struct hda_verb[]) { @@ -10239,7 +10264,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x17aa, 0x31af, "ThinkCentre Station", ALC623_FIXUP_LENOVO_THINKSTATION_P340), SND_PCI_QUIRK(0x17aa, 0x334b, "Lenovo ThinkCentre M70 Gen5", ALC283_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x17aa, 0x3801, "Lenovo Yoga9 14IAP7", ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN), - SND_PCI_QUIRK(0x17aa, 0x3802, "Lenovo Yoga DuetITL 2021", ALC287_FIXUP_YOGA7_14ITL_SPEAKERS), + SND_PCI_QUIRK(0x17aa, 0x3802, "Lenovo Yoga Pro 9 14IRP8 / DuetITL 2021", ALC287_FIXUP_LENOVO_14IRP8_DUETITL), SND_PCI_QUIRK(0x17aa, 0x3813, "Legion 7i 15IMHG05", ALC287_FIXUP_LEGION_15IMHG05_SPEAKERS), SND_PCI_QUIRK(0x17aa, 0x3818, "Lenovo C940 / Yoga Duet 7", ALC298_FIXUP_LENOVO_C940_DUET7), SND_PCI_QUIRK(0x17aa, 0x3819, "Lenovo 13s Gen2 ITL", ALC287_FIXUP_13S_GEN2_SPEAKERS),
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit c17d2a7b216e168c3ba62d93482179c01b369ac7 upstream.
A recent commit restored the original (and still documented) semantics for the HCI_QUIRK_USE_BDADDR_PROPERTY quirk so that the device address is considered invalid unless an address is provided by firmware.
This specifically means that this flag must only be set for devices with invalid addresses, but the Broadcom BCM4377 driver has so far been setting this flag unconditionally.
Fortunately the driver already checks for invalid addresses during setup and sets the HCI_QUIRK_INVALID_BDADDR flag, which can simply be replaced with HCI_QUIRK_USE_BDADDR_PROPERTY to indicate that the default address is invalid but can be overridden by firmware (long term, this should probably just always be allowed).
Fixes: 6945795bc81a ("Bluetooth: fix use-bdaddr-property quirk") Cc: stable@vger.kernel.org # 6.5 Reported-by: Felix Zhang mrman@mrman314.tech Link: https://lore.kernel.org/r/77419ffacc5b4875e920e038332575a2a5bff29f.camel@mrm... Signed-off-by: Johan Hovold johan+linaro@kernel.org Reported-by: Felix Zhang mrman@mrman314.tech Reviewed-by: Neal Gompa neal@gompa.dev Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/bluetooth/hci_bcm4377.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/bluetooth/hci_bcm4377.c +++ b/drivers/bluetooth/hci_bcm4377.c @@ -1417,7 +1417,7 @@ static int bcm4377_check_bdaddr(struct b
bda = (struct hci_rp_read_bd_addr *)skb->data; if (!bcm4377_is_valid_bdaddr(bcm4377, &bda->bdaddr)) - set_bit(HCI_QUIRK_INVALID_BDADDR, &bcm4377->hdev->quirks); + set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &bcm4377->hdev->quirks);
kfree_skb(skb); return 0; @@ -2368,7 +2368,6 @@ static int bcm4377_probe(struct pci_dev hdev->set_bdaddr = bcm4377_hci_set_bdaddr; hdev->setup = bcm4377_hci_setup;
- set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks); if (bcm4377->hw->broken_mws_transport_config) set_bit(HCI_QUIRK_BROKEN_MWS_TRANSPORT_CONFIG, &hdev->quirks); if (bcm4377->hw->broken_ext_scan)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mickaël Salaün mic@digikod.net
commit d9818b3e906a0ee1ab02ea79e74a2f755fc5461a upstream.
When linking or renaming a file, if only one of the source or destination directory is backed by an S_PRIVATE inode, then the related set of layer masks would be used as uninitialized by is_access_to_paths_allowed(). This would result to indeterministic access for one side instead of always being allowed.
This bug could only be triggered with a mounted filesystem containing both S_PRIVATE and !S_PRIVATE inodes, which doesn't seem possible.
The collect_domain_accesses() calls return early if is_nouser_or_private() returns false, which means that the directory's superblock has SB_NOUSER or its inode has S_PRIVATE. Because rename or link actions are only allowed on the same mounted filesystem, the superblock is always the same for both source and destination directories. However, it might be possible in theory to have an S_PRIVATE parent source inode with an !S_PRIVATE parent destination inode, or vice versa.
To make sure this case is not an issue, explicitly initialized both set of layer masks to 0, which means to allow all actions on the related side. If at least on side has !S_PRIVATE, then collect_domain_accesses() and is_access_to_paths_allowed() check for the required access rights.
Cc: Arnd Bergmann arnd@arndb.de Cc: Christian Brauner brauner@kernel.org Cc: Günther Noack gnoack@google.com Cc: Jann Horn jannh@google.com Cc: Shervin Oloumi enlightened@chromium.org Cc: stable@vger.kernel.org Fixes: b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER") Link: https://lore.kernel.org/r/20240219190345.2928627-1-mic@digikod.net Signed-off-by: Mickaël Salaün mic@digikod.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/landlock/fs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/security/landlock/fs.c +++ b/security/landlock/fs.c @@ -737,8 +737,8 @@ static int current_check_refer_path(stru bool allow_parent1, allow_parent2; access_mask_t access_request_parent1, access_request_parent2; struct path mnt_dir; - layer_mask_t layer_masks_parent1[LANDLOCK_NUM_ACCESS_FS], - layer_masks_parent2[LANDLOCK_NUM_ACCESS_FS]; + layer_mask_t layer_masks_parent1[LANDLOCK_NUM_ACCESS_FS] = {}, + layer_masks_parent2[LANDLOCK_NUM_ACCESS_FS] = {};
if (!dom) return 0;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Ofitserov oficerovas@altlinux.org
commit 616d82c3cfa2a2146dd7e3ae47bda7e877ee549e upstream.
The gtp_link_ops operations structure for the subsystem must be registered after registering the gtp_net_ops pernet operations structure.
Syzkaller hit 'general protection fault in gtp_genl_dump_pdp' bug:
[ 1010.702740] gtp: GTP module unloaded [ 1010.715877] general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] SMP KASAN NOPTI [ 1010.715888] KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] [ 1010.715895] CPU: 1 PID: 128616 Comm: a.out Not tainted 6.8.0-rc6-std-def-alt1 #1 [ 1010.715899] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-alt1 04/01/2014 [ 1010.715908] RIP: 0010:gtp_newlink+0x4d7/0x9c0 [gtp] [ 1010.715915] Code: 80 3c 02 00 0f 85 41 04 00 00 48 8b bb d8 05 00 00 e8 ed f6 ff ff 48 89 c2 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 4f 04 00 00 4c 89 e2 4c 8b 6d 00 48 b8 00 00 00 [ 1010.715920] RSP: 0018:ffff888020fbf180 EFLAGS: 00010203 [ 1010.715929] RAX: dffffc0000000000 RBX: ffff88800399c000 RCX: 0000000000000000 [ 1010.715933] RDX: 0000000000000001 RSI: ffffffff84805280 RDI: 0000000000000282 [ 1010.715938] RBP: 000000000000000d R08: 0000000000000001 R09: 0000000000000000 [ 1010.715942] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88800399cc80 [ 1010.715947] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000400 [ 1010.715953] FS: 00007fd1509ab5c0(0000) GS:ffff88805b300000(0000) knlGS:0000000000000000 [ 1010.715958] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1010.715962] CR2: 0000000000000000 CR3: 000000001c07a000 CR4: 0000000000750ee0 [ 1010.715968] PKRU: 55555554 [ 1010.715972] Call Trace: [ 1010.715985] ? __die_body.cold+0x1a/0x1f [ 1010.715995] ? die_addr+0x43/0x70 [ 1010.716002] ? exc_general_protection+0x199/0x2f0 [ 1010.716016] ? asm_exc_general_protection+0x1e/0x30 [ 1010.716026] ? gtp_newlink+0x4d7/0x9c0 [gtp] [ 1010.716034] ? gtp_net_exit+0x150/0x150 [gtp] [ 1010.716042] __rtnl_newlink+0x1063/0x1700 [ 1010.716051] ? rtnl_setlink+0x3c0/0x3c0 [ 1010.716063] ? is_bpf_text_address+0xc0/0x1f0 [ 1010.716070] ? kernel_text_address.part.0+0xbb/0xd0 [ 1010.716076] ? __kernel_text_address+0x56/0xa0 [ 1010.716084] ? unwind_get_return_address+0x5a/0xa0 [ 1010.716091] ? create_prof_cpu_mask+0x30/0x30 [ 1010.716098] ? arch_stack_walk+0x9e/0xf0 [ 1010.716106] ? stack_trace_save+0x91/0xd0 [ 1010.716113] ? stack_trace_consume_entry+0x170/0x170 [ 1010.716121] ? __lock_acquire+0x15c5/0x5380 [ 1010.716139] ? mark_held_locks+0x9e/0xe0 [ 1010.716148] ? kmem_cache_alloc_trace+0x35f/0x3c0 [ 1010.716155] ? __rtnl_newlink+0x1700/0x1700 [ 1010.716160] rtnl_newlink+0x69/0xa0 [ 1010.716166] rtnetlink_rcv_msg+0x43b/0xc50 [ 1010.716172] ? rtnl_fdb_dump+0x9f0/0x9f0 [ 1010.716179] ? lock_acquire+0x1fe/0x560 [ 1010.716188] ? netlink_deliver_tap+0x12f/0xd50 [ 1010.716196] netlink_rcv_skb+0x14d/0x440 [ 1010.716202] ? rtnl_fdb_dump+0x9f0/0x9f0 [ 1010.716208] ? netlink_ack+0xab0/0xab0 [ 1010.716213] ? netlink_deliver_tap+0x202/0xd50 [ 1010.716220] ? netlink_deliver_tap+0x218/0xd50 [ 1010.716226] ? __virt_addr_valid+0x30b/0x590 [ 1010.716233] netlink_unicast+0x54b/0x800 [ 1010.716240] ? netlink_attachskb+0x870/0x870 [ 1010.716248] ? __check_object_size+0x2de/0x3b0 [ 1010.716254] netlink_sendmsg+0x938/0xe40 [ 1010.716261] ? netlink_unicast+0x800/0x800 [ 1010.716269] ? __import_iovec+0x292/0x510 [ 1010.716276] ? netlink_unicast+0x800/0x800 [ 1010.716284] __sock_sendmsg+0x159/0x190 [ 1010.716290] ____sys_sendmsg+0x712/0x880 [ 1010.716297] ? sock_write_iter+0x3d0/0x3d0 [ 1010.716304] ? __ia32_sys_recvmmsg+0x270/0x270 [ 1010.716309] ? lock_acquire+0x1fe/0x560 [ 1010.716315] ? drain_array_locked+0x90/0x90 [ 1010.716324] ___sys_sendmsg+0xf8/0x170 [ 1010.716331] ? sendmsg_copy_msghdr+0x170/0x170 [ 1010.716337] ? lockdep_init_map_type+0x2c7/0x860 [ 1010.716343] ? lockdep_hardirqs_on_prepare+0x430/0x430 [ 1010.716350] ? debug_mutex_init+0x33/0x70 [ 1010.716360] ? percpu_counter_add_batch+0x8b/0x140 [ 1010.716367] ? lock_acquire+0x1fe/0x560 [ 1010.716373] ? find_held_lock+0x2c/0x110 [ 1010.716384] ? __fd_install+0x1b6/0x6f0 [ 1010.716389] ? lock_downgrade+0x810/0x810 [ 1010.716396] ? __fget_light+0x222/0x290 [ 1010.716403] __sys_sendmsg+0xea/0x1b0 [ 1010.716409] ? __sys_sendmsg_sock+0x40/0x40 [ 1010.716419] ? lockdep_hardirqs_on_prepare+0x2b3/0x430 [ 1010.716425] ? syscall_enter_from_user_mode+0x1d/0x60 [ 1010.716432] do_syscall_64+0x30/0x40 [ 1010.716438] entry_SYSCALL_64_after_hwframe+0x62/0xc7 [ 1010.716444] RIP: 0033:0x7fd1508cbd49 [ 1010.716452] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ef 70 0d 00 f7 d8 64 89 01 48 [ 1010.716456] RSP: 002b:00007fff18872348 EFLAGS: 00000202 ORIG_RAX: 000000000000002e [ 1010.716463] RAX: ffffffffffffffda RBX: 000055f72bf0eac0 RCX: 00007fd1508cbd49 [ 1010.716468] RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000006 [ 1010.716473] RBP: 00007fff18872360 R08: 00007fff18872360 R09: 00007fff18872360 [ 1010.716478] R10: 00007fff18872360 R11: 0000000000000202 R12: 000055f72bf0e1b0 [ 1010.716482] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 1010.716491] Modules linked in: gtp(+) udp_tunnel ib_core uinput af_packet rfkill qrtr joydev hid_generic usbhid hid kvm_intel iTCO_wdt intel_pmc_bxt iTCO_vendor_support kvm snd_hda_codec_generic ledtrig_audio irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_intel nls_utf8 snd_intel_dspcfg nls_cp866 psmouse aesni_intel vfat crypto_simd fat cryptd glue_helper snd_hda_codec pcspkr snd_hda_core i2c_i801 snd_hwdep i2c_smbus xhci_pci snd_pcm lpc_ich xhci_pci_renesas xhci_hcd qemu_fw_cfg tiny_power_button button sch_fq_codel vboxvideo drm_vram_helper drm_ttm_helper ttm vboxsf vboxguest snd_seq_midi snd_seq_midi_event snd_seq snd_rawmidi snd_seq_device snd_timer snd soundcore msr fuse efi_pstore dm_mod ip_tables x_tables autofs4 virtio_gpu virtio_dma_buf drm_kms_helper cec rc_core drm virtio_rng virtio_scsi rng_core virtio_balloon virtio_blk virtio_net virtio_console net_failover failover ahci libahci libata evdev scsi_mod input_leds serio_raw virtio_pci intel_agp [ 1010.716674] virtio_ring intel_gtt virtio [last unloaded: gtp] [ 1010.716693] ---[ end trace 04990a4ce61e174b ]---
Cc: stable@vger.kernel.org Signed-off-by: Alexander Ofitserov oficerovas@altlinux.org Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Reviewed-by: Jiri Pirko jiri@nvidia.com Link: https://lore.kernel.org/r/20240228114703.465107-1-oficerovas@altlinux.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/gtp.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
--- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -1903,26 +1903,26 @@ static int __init gtp_init(void)
get_random_bytes(>p_h_initval, sizeof(gtp_h_initval));
- err = rtnl_link_register(>p_link_ops); + err = register_pernet_subsys(>p_net_ops); if (err < 0) goto error_out;
- err = register_pernet_subsys(>p_net_ops); + err = rtnl_link_register(>p_link_ops); if (err < 0) - goto unreg_rtnl_link; + goto unreg_pernet_subsys;
err = genl_register_family(>p_genl_family); if (err < 0) - goto unreg_pernet_subsys; + goto unreg_rtnl_link;
pr_info("GTP module loaded (pdp ctx size %zd bytes)\n", sizeof(struct pdp_ctx)); return 0;
-unreg_pernet_subsys: - unregister_pernet_subsys(>p_net_ops); unreg_rtnl_link: rtnl_link_unregister(>p_link_ops); +unreg_pernet_subsys: + unregister_pernet_subsys(>p_net_ops); error_out: pr_err("error loading GTP module loaded\n"); return err;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nhat Pham nphamcs@gmail.com
commit 3a75cb05d53f4a6823a32deb078de1366954a804 upstream.
In cachestat, we access the folio from the page cache's xarray to compute its page offset, and check for its dirty and writeback flags. However, we do not hold a reference to the folio before performing these actions, which means the folio can concurrently be released and reused as another folio/page/slab.
Get around this altogether by just using xarray's existing machinery for the folio page offsets and dirty/writeback states.
This changes behavior for tmpfs files to now always report zeroes in their dirty and writeback counters. This is okay as tmpfs doesn't follow conventional writeback cache behavior: its pages get "cleaned" during swapout, after which they're no longer resident etc.
Link: https://lkml.kernel.org/r/20240220153409.GA216065@cmpxchg.org Fixes: cf264e1329fb ("cachestat: implement cachestat syscall") Reported-by: Jann Horn jannh@google.com Suggested-by: Matthew Wilcox willy@infradead.org Signed-off-by: Nhat Pham nphamcs@gmail.com Signed-off-by: Johannes Weiner hannes@cmpxchg.org Tested-by: Jann Horn jannh@google.com Cc: stable@vger.kernel.org [6.4+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/filemap.c | 51 ++++++++++++++++++++++++++------------------------- 1 file changed, 26 insertions(+), 25 deletions(-)
--- a/mm/filemap.c +++ b/mm/filemap.c @@ -4108,28 +4108,40 @@ static void filemap_cachestat(struct add
rcu_read_lock(); xas_for_each(&xas, folio, last_index) { + int order; unsigned long nr_pages; pgoff_t folio_first_index, folio_last_index;
+ /* + * Don't deref the folio. It is not pinned, and might + * get freed (and reused) underneath us. + * + * We *could* pin it, but that would be expensive for + * what should be a fast and lightweight syscall. + * + * Instead, derive all information of interest from + * the rcu-protected xarray. + */ + if (xas_retry(&xas, folio)) continue;
+ order = xa_get_order(xas.xa, xas.xa_index); + nr_pages = 1 << order; + folio_first_index = round_down(xas.xa_index, 1 << order); + folio_last_index = folio_first_index + nr_pages - 1; + + /* Folios might straddle the range boundaries, only count covered pages */ + if (folio_first_index < first_index) + nr_pages -= first_index - folio_first_index; + + if (folio_last_index > last_index) + nr_pages -= folio_last_index - last_index; + if (xa_is_value(folio)) { /* page is evicted */ void *shadow = (void *)folio; bool workingset; /* not used */ - int order = xa_get_order(xas.xa, xas.xa_index); - - nr_pages = 1 << order; - folio_first_index = round_down(xas.xa_index, 1 << order); - folio_last_index = folio_first_index + nr_pages - 1; - - /* Folios might straddle the range boundaries, only count covered pages */ - if (folio_first_index < first_index) - nr_pages -= first_index - folio_first_index; - - if (folio_last_index > last_index) - nr_pages -= folio_last_index - last_index;
cs->nr_evicted += nr_pages;
@@ -4147,24 +4159,13 @@ static void filemap_cachestat(struct add goto resched; }
- nr_pages = folio_nr_pages(folio); - folio_first_index = folio_pgoff(folio); - folio_last_index = folio_first_index + nr_pages - 1; - - /* Folios might straddle the range boundaries, only count covered pages */ - if (folio_first_index < first_index) - nr_pages -= first_index - folio_first_index; - - if (folio_last_index > last_index) - nr_pages -= folio_last_index - last_index; - /* page is in cache */ cs->nr_cache += nr_pages;
- if (folio_test_dirty(folio)) + if (xas_get_mark(&xas, PAGECACHE_TAG_DIRTY)) cs->nr_dirty += nr_pages;
- if (folio_test_writeback(folio)) + if (xas_get_mark(&xas, PAGECACHE_TAG_WRITEBACK)) cs->nr_writeback += nr_pages;
resched:
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Elad Nachman enachman@marvell.com
commit e6a30d0c48a1e8a68f1cc413bee65302ab03ddfb upstream.
The check in nand_base.c, nand_scan_tail() : has the following code: (ecc->steps * ecc->size != mtd->writesize) which fails for some NAND chips. Remove ECC entries in this driver which are not integral multiplications, and adjust the number of chunks for entries which fails the above calculation so it will calculate correctly (this was previously done automatically before the check and was removed in a later commit).
Fixes: 68c18dae6888 ("mtd: rawnand: marvell: add missing layouts") Cc: stable@vger.kernel.org Signed-off-by: Elad Nachman enachman@marvell.com Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/marvell_nand.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-)
--- a/drivers/mtd/nand/raw/marvell_nand.c +++ b/drivers/mtd/nand/raw/marvell_nand.c @@ -290,16 +290,13 @@ static const struct marvell_hw_ecc_layou MARVELL_LAYOUT( 2048, 512, 4, 1, 1, 2048, 32, 30, 0, 0, 0), MARVELL_LAYOUT( 2048, 512, 8, 2, 1, 1024, 0, 30,1024,32, 30), MARVELL_LAYOUT( 2048, 512, 8, 2, 1, 1024, 0, 30,1024,64, 30), - MARVELL_LAYOUT( 2048, 512, 12, 3, 2, 704, 0, 30,640, 0, 30), - MARVELL_LAYOUT( 2048, 512, 16, 5, 4, 512, 0, 30, 0, 32, 30), + MARVELL_LAYOUT( 2048, 512, 16, 4, 4, 512, 0, 30, 0, 32, 30), MARVELL_LAYOUT( 4096, 512, 4, 2, 2, 2048, 32, 30, 0, 0, 0), - MARVELL_LAYOUT( 4096, 512, 8, 5, 4, 1024, 0, 30, 0, 64, 30), - MARVELL_LAYOUT( 4096, 512, 12, 6, 5, 704, 0, 30,576, 32, 30), - MARVELL_LAYOUT( 4096, 512, 16, 9, 8, 512, 0, 30, 0, 32, 30), + MARVELL_LAYOUT( 4096, 512, 8, 4, 4, 1024, 0, 30, 0, 64, 30), + MARVELL_LAYOUT( 4096, 512, 16, 8, 8, 512, 0, 30, 0, 32, 30), MARVELL_LAYOUT( 8192, 512, 4, 4, 4, 2048, 0, 30, 0, 0, 0), - MARVELL_LAYOUT( 8192, 512, 8, 9, 8, 1024, 0, 30, 0, 160, 30), - MARVELL_LAYOUT( 8192, 512, 12, 12, 11, 704, 0, 30,448, 64, 30), - MARVELL_LAYOUT( 8192, 512, 16, 17, 16, 512, 0, 30, 0, 32, 30), + MARVELL_LAYOUT( 8192, 512, 8, 8, 8, 1024, 0, 30, 0, 160, 30), + MARVELL_LAYOUT( 8192, 512, 16, 16, 16, 512, 0, 30, 0, 32, 30), };
/**
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
commit f78c1375339a291cba492a70eaf12ec501d28a8e upstream.
It's currently possible to change the mesh ID when the interface isn't yet in mesh mode, at the same time as changing it into mesh mode. This leads to an overwrite of data in the wdev->u union for the interface type it currently has, causing cfg80211_change_iface() to do wrong things when switching.
We could probably allow setting an interface to mesh while setting the mesh ID at the same time by doing a different order of operations here, but realistically there's no userspace that's going to do this, so just disallow changes in iftype when setting mesh ID.
Cc: stable@vger.kernel.org Fixes: 29cbe68c516a ("cfg80211/mac80211: add mesh join/leave commands") Reported-by: syzbot+dd4779978217b1973180@syzkaller.appspotmail.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/wireless/nl80211.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -4185,6 +4185,8 @@ static int nl80211_set_interface(struct
if (ntype != NL80211_IFTYPE_MESH_POINT) return -EINVAL; + if (otype != NL80211_IFTYPE_MESH_POINT) + return -EINVAL; if (netif_running(dev)) return -EBUSY;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit e2b54eaf28df0c978626c9736b94f003b523b451 upstream.
When creating a snapshot we may do a double free of an anonymous device in case there's an error committing the transaction. The second free may result in freeing an anonymous device number that was allocated by some other subsystem in the kernel or another btrfs filesystem.
The steps that lead to this:
1) At ioctl.c:create_snapshot() we allocate an anonymous device number and assign it to pending_snapshot->anon_dev;
2) Then we call btrfs_commit_transaction() and end up at transaction.c:create_pending_snapshot();
3) There we call btrfs_get_new_fs_root() and pass it the anonymous device number stored in pending_snapshot->anon_dev;
4) btrfs_get_new_fs_root() frees that anonymous device number because btrfs_lookup_fs_root() returned a root - someone else did a lookup of the new root already, which could some task doing backref walking;
5) After that some error happens in the transaction commit path, and at ioctl.c:create_snapshot() we jump to the 'fail' label, and after that we free again the same anonymous device number, which in the meanwhile may have been reallocated somewhere else, because pending_snapshot->anon_dev still has the same value as in step 1.
Recently syzbot ran into this and reported the following trace:
------------[ cut here ]------------ ida_free called for id=51 which is not allocated. WARNING: CPU: 1 PID: 31038 at lib/idr.c:525 ida_free+0x370/0x420 lib/idr.c:525 Modules linked in: CPU: 1 PID: 31038 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-00410-gc02197fc9076 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 RIP: 0010:ida_free+0x370/0x420 lib/idr.c:525 Code: 10 42 80 3c 28 (...) RSP: 0018:ffffc90015a67300 EFLAGS: 00010246 RAX: be5130472f5dd000 RBX: 0000000000000033 RCX: 0000000000040000 RDX: ffffc90009a7a000 RSI: 000000000003ffff RDI: 0000000000040000 RBP: ffffc90015a673f0 R08: ffffffff81577992 R09: 1ffff92002b4cdb4 R10: dffffc0000000000 R11: fffff52002b4cdb5 R12: 0000000000000246 R13: dffffc0000000000 R14: ffffffff8e256b80 R15: 0000000000000246 FS: 00007fca3f4b46c0(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f167a17b978 CR3: 000000001ed26000 CR4: 0000000000350ef0 Call Trace: <TASK> btrfs_get_root_ref+0xa48/0xaf0 fs/btrfs/disk-io.c:1346 create_pending_snapshot+0xff2/0x2bc0 fs/btrfs/transaction.c:1837 create_pending_snapshots+0x195/0x1d0 fs/btrfs/transaction.c:1931 btrfs_commit_transaction+0xf1c/0x3740 fs/btrfs/transaction.c:2404 create_snapshot+0x507/0x880 fs/btrfs/ioctl.c:848 btrfs_mksubvol+0x5d0/0x750 fs/btrfs/ioctl.c:998 btrfs_mksnapshot+0xb5/0xf0 fs/btrfs/ioctl.c:1044 __btrfs_ioctl_snap_create+0x387/0x4b0 fs/btrfs/ioctl.c:1306 btrfs_ioctl_snap_create_v2+0x1ca/0x400 fs/btrfs/ioctl.c:1393 btrfs_ioctl+0xa74/0xd40 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl+0xfe/0x170 fs/ioctl.c:857 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7fca3e67dda9 Code: 28 00 00 00 (...) RSP: 002b:00007fca3f4b40c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fca3e7abf80 RCX: 00007fca3e67dda9 RDX: 00000000200005c0 RSI: 0000000050009417 RDI: 0000000000000003 RBP: 00007fca3e6ca47a R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007fca3e7abf80 R15: 00007fff6bf95658 </TASK>
Where we get an explicit message where we attempt to free an anonymous device number that is not currently allocated. It happens in a different code path from the example below, at btrfs_get_root_ref(), so this change may not fix the case triggered by syzbot.
To fix at least the code path from the example above, change btrfs_get_root_ref() and its callers to receive a dev_t pointer argument for the anonymous device number, so that in case it frees the number, it also resets it to 0, so that up in the call chain we don't attempt to do the double free.
CC: stable@vger.kernel.org # 5.10+ Link: https://lore.kernel.org/linux-btrfs/000000000000f673a1061202f630@google.com/ Fixes: e03ee2fe873e ("btrfs: do not ASSERT() if the newly created subvolume already got read") Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/disk-io.c | 22 +++++++++++----------- fs/btrfs/disk-io.h | 2 +- fs/btrfs/ioctl.c | 2 +- fs/btrfs/transaction.c | 2 +- 4 files changed, 14 insertions(+), 14 deletions(-)
--- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1286,12 +1286,12 @@ void btrfs_free_fs_info(struct btrfs_fs_ * * @objectid: root id * @anon_dev: preallocated anonymous block device number for new roots, - * pass 0 for new allocation. + * pass NULL for a new allocation. * @check_ref: whether to check root item references, If true, return -ENOENT * for orphan roots */ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info, - u64 objectid, dev_t anon_dev, + u64 objectid, dev_t *anon_dev, bool check_ref) { struct btrfs_root *root; @@ -1321,9 +1321,9 @@ again: * that common but still possible. In that case, we just need * to free the anon_dev. */ - if (unlikely(anon_dev)) { - free_anon_bdev(anon_dev); - anon_dev = 0; + if (unlikely(anon_dev && *anon_dev)) { + free_anon_bdev(*anon_dev); + *anon_dev = 0; }
if (check_ref && btrfs_root_refs(&root->root_item) == 0) { @@ -1345,7 +1345,7 @@ again: goto fail; }
- ret = btrfs_init_fs_root(root, anon_dev); + ret = btrfs_init_fs_root(root, anon_dev ? *anon_dev : 0); if (ret) goto fail;
@@ -1381,7 +1381,7 @@ fail: * root's anon_dev to 0 to avoid a double free, once by btrfs_put_root() * and once again by our caller. */ - if (anon_dev) + if (anon_dev && *anon_dev) root->anon_dev = 0; btrfs_put_root(root); return ERR_PTR(ret); @@ -1397,7 +1397,7 @@ fail: struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, u64 objectid, bool check_ref) { - return btrfs_get_root_ref(fs_info, objectid, 0, check_ref); + return btrfs_get_root_ref(fs_info, objectid, NULL, check_ref); }
/* @@ -1405,11 +1405,11 @@ struct btrfs_root *btrfs_get_fs_root(str * the anonymous block device id * * @objectid: tree objectid - * @anon_dev: if zero, allocate a new anonymous block device or use the - * parameter value + * @anon_dev: if NULL, allocate a new anonymous block device or use the + * parameter value if not NULL */ struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info, - u64 objectid, dev_t anon_dev) + u64 objectid, dev_t *anon_dev) { return btrfs_get_root_ref(fs_info, objectid, anon_dev, true); } --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -64,7 +64,7 @@ void btrfs_free_fs_roots(struct btrfs_fs struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, u64 objectid, bool check_ref); struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info, - u64 objectid, dev_t anon_dev); + u64 objectid, dev_t *anon_dev); struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info, struct btrfs_path *path, u64 objectid); --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -721,7 +721,7 @@ static noinline int create_subvol(struct free_extent_buffer(leaf); leaf = NULL;
- new_root = btrfs_get_new_fs_root(fs_info, objectid, anon_dev); + new_root = btrfs_get_new_fs_root(fs_info, objectid, &anon_dev); if (IS_ERR(new_root)) { ret = PTR_ERR(new_root); btrfs_abort_transaction(trans, ret); --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1834,7 +1834,7 @@ static noinline int create_pending_snaps }
key.offset = (u64)-1; - pending->snap = btrfs_get_new_fs_root(fs_info, objectid, pending->anon_dev); + pending->snap = btrfs_get_new_fs_root(fs_info, objectid, &pending->anon_dev); if (IS_ERR(pending->snap)) { ret = PTR_ERR(pending->snap); pending->snap = NULL;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Sterba dsterba@suse.com
commit 9845664b9ee47ce7ee7ea93caf47d39a9d4552c4 upstream.
There's a syzbot report that device name buffers passed to device replace are not properly checked for string termination which could lead to a read out of bounds in getname_kernel().
Add a helper that validates both source and target device name buffers. For devid as the source initialize the buffer to empty string in case something tries to read it later.
This was originally analyzed and fixed in a different way by Edward Adam Davis (see links).
Link: https://lore.kernel.org/linux-btrfs/000000000000d1a1d1060cc9c5e7@google.com/ Link: https://lore.kernel.org/linux-btrfs/tencent_44CA0665C9836EF9EEC80CB9E7E206DF... CC: stable@vger.kernel.org # 4.19+ CC: Edward Adam Davis eadavis@qq.com Reported-and-tested-by: syzbot+33f23b49ac24f986c9e8@syzkaller.appspotmail.com Reviewed-by: Boris Burkov boris@bur.io Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/dev-replace.c | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-)
--- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -727,6 +727,23 @@ leave: return ret; }
+static int btrfs_check_replace_dev_names(struct btrfs_ioctl_dev_replace_args *args) +{ + if (args->start.srcdevid == 0) { + if (memchr(args->start.srcdev_name, 0, + sizeof(args->start.srcdev_name)) == NULL) + return -ENAMETOOLONG; + } else { + args->start.srcdev_name[0] = 0; + } + + if (memchr(args->start.tgtdev_name, 0, + sizeof(args->start.tgtdev_name)) == NULL) + return -ENAMETOOLONG; + + return 0; +} + int btrfs_dev_replace_by_ioctl(struct btrfs_fs_info *fs_info, struct btrfs_ioctl_dev_replace_args *args) { @@ -739,10 +756,9 @@ int btrfs_dev_replace_by_ioctl(struct bt default: return -EINVAL; } - - if ((args->start.srcdevid == 0 && args->start.srcdev_name[0] == '\0') || - args->start.tgtdev_name[0] == '\0') - return -EINVAL; + ret = btrfs_check_replace_dev_names(args); + if (ret < 0) + return ret;
ret = btrfs_dev_replace_start(fs_info, args->start.tgtdev_name, args->start.srcdevid,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit 5897710b28cabab04ea6c7547f27b7989de646ae upstream.
If we have a sparse file with a trailing hole (from the last extent's end to i_size) and then create an extent in the file that ends before the file's i_size, then when doing an incremental send we will issue a write full of zeroes for the range that starts immediately after the new extent ends up to i_size. While this isn't incorrect because the file ends up with exactly the same data, it unnecessarily results in using extra space at the destination with one or more extents full of zeroes instead of having a hole. In same cases this results in using megabytes or even gigabytes of unnecessary space.
Example, reproducer:
$ cat test.sh #!/bin/bash
DEV=/dev/sdh MNT=/mnt/sdh
mkfs.btrfs -f $DEV mount $DEV $MNT
# Create 1G sparse file. xfs_io -f -c "truncate 1G" $MNT/foobar
# Create base snapshot. btrfs subvolume snapshot -r $MNT $MNT/mysnap1
# Create send stream (full send) for the base snapshot. btrfs send -f /tmp/1.snap $MNT/mysnap1
# Now write one extent at the beginning of the file and one somewhere # in the middle, leaving a gap between the end of this second extent # and the file's size. xfs_io -c "pwrite -S 0xab 0 128K" \ -c "pwrite -S 0xcd 512M 128K" \ $MNT/foobar
# Now create a second snapshot which is going to be used for an # incremental send operation. btrfs subvolume snapshot -r $MNT $MNT/mysnap2
# Create send stream (incremental send) for the second snapshot. btrfs send -p $MNT/mysnap1 -f /tmp/2.snap $MNT/mysnap2
# Now recreate the filesystem by receiving both send streams and # verify we get the same content that the original filesystem had # and file foobar has only two extents with a size of 128K each. umount $MNT mkfs.btrfs -f $DEV mount $DEV $MNT
btrfs receive -f /tmp/1.snap $MNT btrfs receive -f /tmp/2.snap $MNT
echo -e "\nFile fiemap in the second snapshot:" # Should have: # # 128K extent at file range [0, 128K[ # hole at file range [128K, 512M[ # 128K extent file range [512M, 512M + 128K[ # hole at file range [512M + 128K, 1G[ xfs_io -r -c "fiemap -v" $MNT/mysnap2/foobar
# File should be using 256K of data (two 128K extents). echo -e "\nSpace used by the file: $(du -h $MNT/mysnap2/foobar | cut -f 1)"
umount $MNT
Running the test, we can see with fiemap that we get an extent for the range [512M, 1G[, while in the source filesystem we have an extent for the range [512M, 512M + 128K[ and a hole for the rest of the file (the range [512M + 128K, 1G[):
$ ./test.sh (...) File fiemap in the second snapshot: /mnt/sdh/mysnap2/foobar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..255]: 26624..26879 256 0x0 1: [256..1048575]: hole 1048320 2: [1048576..2097151]: 2156544..3205119 1048576 0x1
Space used by the file: 513M
This happens because once we finish processing an inode, at finish_inode_if_needed(), we always issue a hole (write operations full of zeros) if there's a gap between the end of the last processed extent and the file's size, even if that range is already a hole in the parent snapshot. Fix this by issuing the hole only if the range is not already a hole.
After this change, running the test above, we get the expected layout:
$ ./test.sh (...) File fiemap in the second snapshot: /mnt/sdh/mysnap2/foobar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..255]: 26624..26879 256 0x0 1: [256..1048575]: hole 1048320 2: [1048576..1048831]: 26880..27135 256 0x1 3: [1048832..2097151]: hole 1048320
Space used by the file: 256K
A test case for fstests will follow soon.
CC: stable@vger.kernel.org # 6.1+ Reported-by: Dorai Ashok S A dash.btrfs@inix.me Link: https://lore.kernel.org/linux-btrfs/c0bf7818-9c45-46a8-b3d3-513230d0c86e@ini... Reviewed-by: Sweet Tea Dorminy sweettea-kernel@dorminy.me Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/send.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
--- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -6705,11 +6705,20 @@ static int finish_inode_if_needed(struct if (ret) goto out; } - if (sctx->cur_inode_last_extent < - sctx->cur_inode_size) { - ret = send_hole(sctx, sctx->cur_inode_size); - if (ret) + if (sctx->cur_inode_last_extent < sctx->cur_inode_size) { + ret = range_is_hole_in_parent(sctx, + sctx->cur_inode_last_extent, + sctx->cur_inode_size); + if (ret < 0) { goto out; + } else if (ret == 0) { + ret = send_hole(sctx, sctx->cur_inode_size); + if (ret < 0) + goto out; + } else { + /* Range is already a hole, skip. */ + ret = 0; + } } } if (need_truncate) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
commit 955558030954b9637b41c97b730f9b38c92ac488 upstream.
This reverts commit e490d60a2f76bff636c68ce4fe34c1b6c34bbd86.
This causes hangs on SI when DC is enabled and errors on driver reboot and power off cycles.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3216 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2755 Reviewed-by: Yang Wang kevinyang.wang@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+)
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c @@ -6925,6 +6925,23 @@ static int si_dpm_enable(struct amdgpu_d return 0; }
+static int si_set_temperature_range(struct amdgpu_device *adev) +{ + int ret; + + ret = si_thermal_enable_alert(adev, false); + if (ret) + return ret; + ret = si_thermal_set_temperature_range(adev, R600_TEMP_RANGE_MIN, R600_TEMP_RANGE_MAX); + if (ret) + return ret; + ret = si_thermal_enable_alert(adev, true); + if (ret) + return ret; + + return ret; +} + static void si_dpm_disable(struct amdgpu_device *adev) { struct rv7xx_power_info *pi = rv770_get_pi(adev); @@ -7608,6 +7625,18 @@ static int si_dpm_process_interrupt(stru
static int si_dpm_late_init(void *handle) { + int ret; + struct amdgpu_device *adev = (struct amdgpu_device *)handle; + + if (!adev->pm.dpm_enabled) + return 0; + + ret = si_set_temperature_range(adev); + if (ret) + return ret; +#if 0 //TODO ? + si_dpm_powergate_uvd(adev, true); +#endif return 0; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Auld matthew.auld@intel.com
commit f41900e4a6ef019d64a70394b0e0c3bd048d4ec8 upstream.
There is a corner case here where start/end is after/before the block range we are currently checking. If so we need to be sure that splitting the block will eventually give use the block size we need. To do that we should adjust the block range to account for the start/end, and only continue with the split if the size/alignment will fit the requested size. Not doing so can result in leaving split blocks unmerged when it eventually fails.
Fixes: afea229fe102 ("drm: improve drm_buddy_alloc function") Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Arunpravin Paneer Selvam Arunpravin.PaneerSelvam@amd.com Cc: Christian König christian.koenig@amd.com Cc: stable@vger.kernel.org # v5.18+ Reviewed-by: Arunpravin Paneer Selvam Arunpravin.PaneerSelvam@amd.com Link: https://patchwork.freedesktop.org/patch/msgid/20240219121851.25774-4-matthew... Signed-off-by: Christian König christian.koenig@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_buddy.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -332,6 +332,7 @@ alloc_range_bias(struct drm_buddy *mm, u64 start, u64 end, unsigned int order) { + u64 req_size = mm->chunk_size << order; struct drm_buddy_block *block; struct drm_buddy_block *buddy; LIST_HEAD(dfs); @@ -367,6 +368,15 @@ alloc_range_bias(struct drm_buddy *mm, if (drm_buddy_block_is_allocated(block)) continue;
+ if (block_start < start || block_end > end) { + u64 adjusted_start = max(block_start, start); + u64 adjusted_end = min(block_end, end); + + if (round_down(adjusted_end + 1, req_size) <= + round_up(adjusted_start, req_size)) + continue; + } + if (contains(start, end, block_start, block_end) && order == drm_buddy_block_order(block)) { /*
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ma Jun Jun.Ma2@amd.com
commit 7968e9748fbbd7ae49770d9f8a8231d8bce2aebb upstream.
It's unreasonable to use 0 as the power1_min_cap when OD is disabled. So, use the same lower limit as the value used when OD is enabled.
Fixes: 1958946858a6 ("drm/amd/pm: Support for getting power1_cap_min value") Signed-off-by: Ma Jun Jun.Ma2@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Acked-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 9 ++++----- drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 9 ++++----- drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 9 ++++----- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 9 ++++----- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c | 9 ++++----- 5 files changed, 20 insertions(+), 25 deletions(-)
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c @@ -1303,13 +1303,12 @@ static int arcturus_get_power_limit(stru if (default_power_limit) *default_power_limit = power_limit;
- if (smu->od_enabled) { + if (smu->od_enabled) od_percent_upper = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]); - od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]); - } else { + else od_percent_upper = 0; - od_percent_lower = 100; - } + + od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d (default power: %d)\n", od_percent_upper, od_percent_lower, power_limit); --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c @@ -2357,13 +2357,12 @@ static int navi10_get_power_limit(struct *default_power_limit = power_limit;
if (smu->od_enabled && - navi10_od_feature_is_supported(od_settings, SMU_11_0_ODCAP_POWER_LIMIT)) { + navi10_od_feature_is_supported(od_settings, SMU_11_0_ODCAP_POWER_LIMIT)) od_percent_upper = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]); - od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]); - } else { + else od_percent_upper = 0; - od_percent_lower = 100; - } + + od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d (default power: %d)\n", od_percent_upper, od_percent_lower, power_limit); --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c @@ -640,13 +640,12 @@ static int sienna_cichlid_get_power_limi if (default_power_limit) *default_power_limit = power_limit;
- if (smu->od_enabled) { + if (smu->od_enabled) od_percent_upper = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_7_ODSETTING_POWERPERCENTAGE]); - od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_7_ODSETTING_POWERPERCENTAGE]); - } else { + else od_percent_upper = 0; - od_percent_lower = 100; - } + + od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d (default power: %d)\n", od_percent_upper, od_percent_lower, power_limit); --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c @@ -2364,13 +2364,12 @@ static int smu_v13_0_0_get_power_limit(s if (default_power_limit) *default_power_limit = power_limit;
- if (smu->od_enabled) { + if (smu->od_enabled) od_percent_upper = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_13_0_0_ODSETTING_POWERPERCENTAGE]); - od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_13_0_0_ODSETTING_POWERPERCENTAGE]); - } else { + else od_percent_upper = 0; - od_percent_lower = 100; - } + + od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_13_0_0_ODSETTING_POWERPERCENTAGE]);
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d (default power: %d)\n", od_percent_upper, od_percent_lower, power_limit); --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c @@ -2328,13 +2328,12 @@ static int smu_v13_0_7_get_power_limit(s if (default_power_limit) *default_power_limit = power_limit;
- if (smu->od_enabled) { + if (smu->od_enabled) od_percent_upper = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_13_0_7_ODSETTING_POWERPERCENTAGE]); - od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_13_0_7_ODSETTING_POWERPERCENTAGE]); - } else { + else od_percent_upper = 0; - od_percent_lower = 100; - } + + od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_13_0_7_ODSETTING_POWERPERCENTAGE]);
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d (default power: %d)\n", od_percent_upper, od_percent_lower, power_limit);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryan Lin tsung-hua.lin@amd.com
commit b7cdccc6a849568775f738b1e233f751a8fed013 upstream.
[WHY] Some eDP panels' ext caps don't write initial values. The value of dpcd_addr (0x317) can be random and the backlight control interface will be incorrect.
[HOW] Add new panel patches to remove sink ext caps.
Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org # 6.5.x Cc: Tsung-hua Lin tsung-hua.lin@amd.com Cc: Chris Chi moukong.chi@amd.com Reviewed-by: Wayne Lin wayne.lin@amd.com Acked-by: Alex Hung alex.hung@amd.com Signed-off-by: Ryan Lin tsung-hua.lin@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c @@ -66,6 +66,8 @@ static void apply_edid_quirks(struct edi /* Workaround for some monitors that do not clear DPCD 0x317 if FreeSync is unsupported */ case drm_edid_encode_panel_id('A', 'U', 'O', 0xA7AB): case drm_edid_encode_panel_id('A', 'U', 'O', 0xE69B): + case drm_edid_encode_panel_id('B', 'O', 'E', 0x092A): + case drm_edid_encode_panel_id('L', 'G', 'D', 0x06D1): DRM_DEBUG_DRIVER("Clearing DPCD 0x317 on monitor with panel id %X\n", panel_id); edid_caps->panel_patch.remove_sink_ext_caps = true; break; @@ -119,6 +121,8 @@ enum dc_edid_status dm_helpers_parse_edi
edid_caps->edid_hdmi = connector->display_info.is_hdmi;
+ apply_edid_quirks(edid_buf, edid_caps); + sad_count = drm_edid_to_sad((struct edid *) edid->raw_edid, &sads); if (sad_count <= 0) return result; @@ -145,8 +149,6 @@ enum dc_edid_status dm_helpers_parse_edi else edid_caps->speaker_flags = DEFAULT_SPEAKER_LOCATION;
- apply_edid_quirks(edid_buf, edid_caps); - kfree(sads); kfree(sadb);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Clark robdclark@chromium.org
commit f79ee78767ca60e7a2c89eacd2dbdf237d97e838 upstream.
We need to bail out before adding/removing devices if we are going to -EPROBE_DEFER. Otherwise boot can get stuck in a probe deferral loop due to a long-standing issue in driver core (see commit fbc35b45f9f6 ("Add documentation on meaning of -EPROBE_DEFER")).
Deregistering the altmode child device can potentially also trigger bugs in the DRM bridge implementation, which does not expect bridges to go away.
[DB: slightly fixed commit message by adding the word 'commit'] Suggested-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Rob Clark robdclark@chromium.org Link: https://lore.kernel.org/r/20231213210644.8702-1-robdclark@gmail.com [ johan: rebase on 6.8-rc4, amend commit message and mention DRM ] Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver") Cc: stable@vger.kernel.org # 6.3 Cc: Bjorn Andersson andersson@kernel.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Bjorn Andersson andersson@kernel.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20240217150228.5788-5-johan+li... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/soc/qcom/pmic_glink.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-)
--- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -268,10 +268,17 @@ static int pmic_glink_probe(struct platf else pg->client_mask = PMIC_GLINK_CLIENT_DEFAULT;
+ pg->pdr = pdr_handle_alloc(pmic_glink_pdr_callback, pg); + if (IS_ERR(pg->pdr)) { + ret = dev_err_probe(&pdev->dev, PTR_ERR(pg->pdr), + "failed to initialize pdr\n"); + return ret; + } + if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_UCSI)) { ret = pmic_glink_add_aux_device(pg, &pg->ucsi_aux, "ucsi"); if (ret) - return ret; + goto out_release_pdr_handle; } if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_ALTMODE)) { ret = pmic_glink_add_aux_device(pg, &pg->altmode_aux, "altmode"); @@ -284,17 +291,11 @@ static int pmic_glink_probe(struct platf goto out_release_altmode_aux; }
- pg->pdr = pdr_handle_alloc(pmic_glink_pdr_callback, pg); - if (IS_ERR(pg->pdr)) { - ret = dev_err_probe(&pdev->dev, PTR_ERR(pg->pdr), "failed to initialize pdr\n"); - goto out_release_aux_devices; - } - service = pdr_add_lookup(pg->pdr, "tms/servreg", "msm/adsp/charger_pd"); if (IS_ERR(service)) { ret = dev_err_probe(&pdev->dev, PTR_ERR(service), "failed adding pdr lookup for charger_pd\n"); - goto out_release_pdr_handle; + goto out_release_aux_devices; }
mutex_lock(&__pmic_glink_lock); @@ -303,8 +304,6 @@ static int pmic_glink_probe(struct platf
return 0;
-out_release_pdr_handle: - pdr_handle_release(pg->pdr); out_release_aux_devices: if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_BATT)) pmic_glink_del_aux_device(pg, &pg->ps_aux); @@ -314,6 +313,8 @@ out_release_altmode_aux: out_release_ucsi_aux: if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_UCSI)) pmic_glink_del_aux_device(pg, &pg->ucsi_aux); +out_release_pdr_handle: + pdr_handle_release(pg->pdr);
return ret; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peng Ma peng.ma@nxp.com
commit 9d739bccf261dd93ec1babf82f5c5d71dd4caa3e upstream.
There is chip (ls1028a) errata:
The SoC may hang on 16 byte unaligned read transactions by QDMA.
Unaligned read transactions initiated by QDMA may stall in the NOC (Network On-Chip), causing a deadlock condition. Stalled transactions will trigger completion timeouts in PCIe controller.
Workaround: Enable prefetch by setting the source descriptor prefetchable bit ( SD[PF] = 1 ).
Implement this workaround.
Cc: stable@vger.kernel.org Fixes: b092529e0aa0 ("dmaengine: fsl-qdma: Add qDMA controller driver for Layerscape SoCs") Signed-off-by: Peng Ma peng.ma@nxp.com Signed-off-by: Frank Li Frank.Li@nxp.com Link: https://lore.kernel.org/r/20240201215007.439503-1-Frank.Li@nxp.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/fsl-qdma.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/dma/fsl-qdma.c +++ b/drivers/dma/fsl-qdma.c @@ -109,6 +109,7 @@ #define FSL_QDMA_CMD_WTHROTL_OFFSET 20 #define FSL_QDMA_CMD_DSEN_OFFSET 19 #define FSL_QDMA_CMD_LWC_OFFSET 16 +#define FSL_QDMA_CMD_PF BIT(17)
/* Field definition for Descriptor status */ #define QDMA_CCDF_STATUS_RTE BIT(5) @@ -384,7 +385,8 @@ static void fsl_qdma_comp_fill_memcpy(st qdma_csgf_set_f(csgf_dest, len); /* Descriptor Buffer */ cmd = cpu_to_le32(FSL_QDMA_CMD_RWTTYPE << - FSL_QDMA_CMD_RWTTYPE_OFFSET); + FSL_QDMA_CMD_RWTTYPE_OFFSET) | + FSL_QDMA_CMD_PF; sdf->data = QDMA_SDDF_CMD(cmd);
cmd = cpu_to_le32(FSL_QDMA_CMD_RWTTYPE <<
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ard Biesheuvel ardb@kernel.org
commit 1c0cf6d19690141002889d72622b90fc01562ce4 upstream.
The bit-sliced implementation of AES-CTR operates on blocks of 128 bytes, and will fall back to the plain NEON version for tail blocks or inputs that are shorter than 128 bytes to begin with.
It will call straight into the plain NEON asm helper, which performs all memory accesses in granules of 16 bytes (the size of a NEON register). For this reason, the associated plain NEON glue code will copy inputs shorter than 16 bytes into a temporary buffer, given that this is a rare occurrence and it is not worth the effort to work around this in the asm code.
The fallback from the bit-sliced NEON version fails to take this into account, potentially resulting in out-of-bounds accesses. So clone the same workaround, and use a temp buffer for short in/outputs.
Fixes: fc074e130051 ("crypto: arm64/aes-neonbs-ctr - fallback to plain NEON for final chunk") Cc: stable@vger.kernel.org Reported-by: syzbot+f1ceaa1a09ab891e1934@syzkaller.appspotmail.com Reviewed-by: Eric Biggers ebiggers@google.com Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/crypto/aes-neonbs-glue.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/arch/arm64/crypto/aes-neonbs-glue.c +++ b/arch/arm64/crypto/aes-neonbs-glue.c @@ -227,8 +227,19 @@ static int ctr_encrypt(struct skcipher_r src += blocks * AES_BLOCK_SIZE; } if (nbytes && walk.nbytes == walk.total) { + u8 buf[AES_BLOCK_SIZE]; + u8 *d = dst; + + if (unlikely(nbytes < AES_BLOCK_SIZE)) + src = dst = memcpy(buf + sizeof(buf) - nbytes, + src, nbytes); + neon_aes_ctr_encrypt(dst, src, ctx->enc, ctx->key.rounds, nbytes, walk.iv); + + if (unlikely(nbytes < AES_BLOCK_SIZE)) + memcpy(d, dst, nbytes); + nbytes = 0; } kernel_neon_end();
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tadeusz Struk tstruk@gigaio.com
commit df2515a17914ecfc2a0594509deaf7fcb8d191ac upstream.
The PTDMA driver sets DMA masks in two different places for the same device inconsistently. First call is in pt_pci_probe(), where it uses 48bit mask. The second call is in pt_dmaengine_register(), where it uses a 64bit mask. Using 64bit dma mask causes IO_PAGE_FAULT errors on DMA transfers between main memory and other devices. Without the extra call it works fine. Additionally the second call doesn't check the return value so it can silently fail. Remove the superfluous dma_set_mask() call and only use 48bit mask.
Cc: stable@vger.kernel.org Fixes: b0b4a6b10577 ("dmaengine: ptdma: register PTDMA controller as a DMA resource") Reviewed-by: Basavaraj Natikar Basavaraj.Natikar@amd.com Signed-off-by: Tadeusz Struk tstruk@gigaio.com Link: https://lore.kernel.org/r/20240222163053.13842-1-tstruk@gigaio.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/ptdma/ptdma-dmaengine.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/dma/ptdma/ptdma-dmaengine.c +++ b/drivers/dma/ptdma/ptdma-dmaengine.c @@ -385,8 +385,6 @@ int pt_dmaengine_register(struct pt_devi chan->vc.desc_free = pt_do_cleanup; vchan_init(&chan->vc, dma_dev);
- dma_set_mask_and_coherent(pt->dev, DMA_BIT_MASK(64)); - ret = dma_async_device_register(dma_dev); if (ret) goto err_reg;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Joy Zou joy.zou@nxp.com
commit 9ba17defd9edd87970b701085402bc8ecc3a11d4 upstream.
The 'nbytes' should be equivalent to burst * width in audio multi-fifo setups. Given that the FIFO width is fixed at 32 bits, adjusts the burst size for multi-fifo configurations to match the slave maxburst in the configuration.
Cc: stable@vger.kernel.org Fixes: 72f5801a4e2b ("dmaengine: fsl-edma: integrate v3 support") Signed-off-by: Joy Zou joy.zou@nxp.com Signed-off-by: Frank Li Frank.Li@nxp.com Link: https://lore.kernel.org/r/20240131163318.360315-1-Frank.Li@nxp.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/fsl-edma-common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/dma/fsl-edma-common.c +++ b/drivers/dma/fsl-edma-common.c @@ -503,7 +503,7 @@ void fsl_edma_fill_tcd(struct fsl_edma_c if (fsl_chan->is_multi_fifo) { /* set mloff to support multiple fifo */ burst = cfg->direction == DMA_DEV_TO_MEM ? - cfg->src_addr_width : cfg->dst_addr_width; + cfg->src_maxburst : cfg->dst_maxburst; nbytes |= EDMA_V3_TCD_NBYTES_MLOFF(-(burst * 4)); /* enable DMLOE/SMLOE */ if (cfg->direction == DMA_MEM_TO_DEV) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Curtis Klein curtis.klein@hpe.com
commit 87a39071e0b639f45e05d296cc0538eef44ec0bd upstream.
Initialize the qDMA irqs after the registers are configured so that interrupts that may have been pending from a primary kernel don't get processed by the irq handler before it is ready to and cause panic with the following trace:
Call trace: fsl_qdma_queue_handler+0xf8/0x3e8 __handle_irq_event_percpu+0x78/0x2b0 handle_irq_event_percpu+0x1c/0x68 handle_irq_event+0x44/0x78 handle_fasteoi_irq+0xc8/0x178 generic_handle_irq+0x24/0x38 __handle_domain_irq+0x90/0x100 gic_handle_irq+0x5c/0xb8 el1_irq+0xb8/0x180 _raw_spin_unlock_irqrestore+0x14/0x40 __setup_irq+0x4bc/0x798 request_threaded_irq+0xd8/0x190 devm_request_threaded_irq+0x74/0xe8 fsl_qdma_probe+0x4d4/0xca8 platform_drv_probe+0x50/0xa0 really_probe+0xe0/0x3f8 driver_probe_device+0x64/0x130 device_driver_attach+0x6c/0x78 __driver_attach+0xbc/0x158 bus_for_each_dev+0x5c/0x98 driver_attach+0x20/0x28 bus_add_driver+0x158/0x220 driver_register+0x60/0x110 __platform_driver_register+0x44/0x50 fsl_qdma_driver_init+0x18/0x20 do_one_initcall+0x48/0x258 kernel_init_freeable+0x1a4/0x23c kernel_init+0x10/0xf8 ret_from_fork+0x10/0x18
Cc: stable@vger.kernel.org Fixes: b092529e0aa0 ("dmaengine: fsl-qdma: Add qDMA controller driver for Layerscape SoCs") Signed-off-by: Curtis Klein curtis.klein@hpe.com Signed-off-by: Yi Zhao yi.zhao@nxp.com Signed-off-by: Frank Li Frank.Li@nxp.com Link: https://lore.kernel.org/r/20240201220406.440145-1-Frank.Li@nxp.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/fsl-qdma.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-)
--- a/drivers/dma/fsl-qdma.c +++ b/drivers/dma/fsl-qdma.c @@ -1199,10 +1199,6 @@ static int fsl_qdma_probe(struct platfor if (!fsl_qdma->queue) return -ENOMEM;
- ret = fsl_qdma_irq_init(pdev, fsl_qdma); - if (ret) - return ret; - fsl_qdma->irq_base = platform_get_irq_byname(pdev, "qdma-queue0"); if (fsl_qdma->irq_base < 0) return fsl_qdma->irq_base; @@ -1241,16 +1237,19 @@ static int fsl_qdma_probe(struct platfor
platform_set_drvdata(pdev, fsl_qdma);
- ret = dma_async_device_register(&fsl_qdma->dma_dev); + ret = fsl_qdma_reg_init(fsl_qdma); if (ret) { - dev_err(&pdev->dev, - "Can't register NXP Layerscape qDMA engine.\n"); + dev_err(&pdev->dev, "Can't Initialize the qDMA engine.\n"); return ret; }
- ret = fsl_qdma_reg_init(fsl_qdma); + ret = fsl_qdma_irq_init(pdev, fsl_qdma); + if (ret) + return ret; + + ret = dma_async_device_register(&fsl_qdma->dma_dev); if (ret) { - dev_err(&pdev->dev, "Can't Initialize the qDMA engine.\n"); + dev_err(&pdev->dev, "Can't register NXP Layerscape qDMA engine.\n"); return ret; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe Kerello christophe.kerello@foss.st.com
commit 6b1ba3f9040be5efc4396d86c9752cdc564730be upstream.
Turning on CONFIG_DMA_API_DEBUG_SG results in the following warning:
DMA-API: mmci-pl18x 48220000.mmc: cacheline tracking EEXIST, overlapping mappings aren't supported WARNING: CPU: 1 PID: 51 at kernel/dma/debug.c:568 add_dma_entry+0x234/0x2f4 Modules linked in: CPU: 1 PID: 51 Comm: kworker/1:2 Not tainted 6.1.28 #1 Hardware name: STMicroelectronics STM32MP257F-EV1 Evaluation Board (DT) Workqueue: events_freezable mmc_rescan Call trace: add_dma_entry+0x234/0x2f4 debug_dma_map_sg+0x198/0x350 __dma_map_sg_attrs+0xa0/0x110 dma_map_sg_attrs+0x10/0x2c sdmmc_idma_prep_data+0x80/0xc0 mmci_prep_data+0x38/0x84 mmci_start_data+0x108/0x2dc mmci_request+0xe4/0x190 __mmc_start_request+0x68/0x140 mmc_start_request+0x94/0xc0 mmc_wait_for_req+0x70/0x100 mmc_send_tuning+0x108/0x1ac sdmmc_execute_tuning+0x14c/0x210 mmc_execute_tuning+0x48/0xec mmc_sd_init_uhs_card.part.0+0x208/0x464 mmc_sd_init_card+0x318/0x89c mmc_attach_sd+0xe4/0x180 mmc_rescan+0x244/0x320
DMA API debug brings to light leaking dma-mappings as dma_map_sg and dma_unmap_sg are not correctly balanced.
If an error occurs in mmci_cmd_irq function, only mmci_dma_error function is called and as this API is not managed on stm32 variant, dma_unmap_sg is never called in this error path.
Signed-off-by: Christophe Kerello christophe.kerello@foss.st.com Fixes: 46b723dd867d ("mmc: mmci: add stm32 sdmmc variant") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240207143951.938144-1-christophe.kerello@foss.st... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/mmci_stm32_sdmmc.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
--- a/drivers/mmc/host/mmci_stm32_sdmmc.c +++ b/drivers/mmc/host/mmci_stm32_sdmmc.c @@ -225,6 +225,8 @@ static int sdmmc_idma_start(struct mmci_ struct scatterlist *sg; int i;
+ host->dma_in_progress = true; + if (!host->variant->dma_lli || data->sg_len == 1 || idma->use_bounce_buffer) { u32 dma_addr; @@ -263,9 +265,30 @@ static int sdmmc_idma_start(struct mmci_ return 0; }
+static void sdmmc_idma_error(struct mmci_host *host) +{ + struct mmc_data *data = host->data; + struct sdmmc_idma *idma = host->dma_priv; + + if (!dma_inprogress(host)) + return; + + writel_relaxed(0, host->base + MMCI_STM32_IDMACTRLR); + host->dma_in_progress = false; + data->host_cookie = 0; + + if (!idma->use_bounce_buffer) + dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len, + mmc_get_dma_dir(data)); +} + static void sdmmc_idma_finalize(struct mmci_host *host, struct mmc_data *data) { + if (!dma_inprogress(host)) + return; + writel_relaxed(0, host->base + MMCI_STM32_IDMACTRLR); + host->dma_in_progress = false;
if (!data->host_cookie) sdmmc_idma_unprep_data(host, data, 0); @@ -676,6 +699,7 @@ static struct mmci_host_ops sdmmc_varian .dma_setup = sdmmc_idma_setup, .dma_start = sdmmc_idma_start, .dma_finalize = sdmmc_idma_finalize, + .dma_error = sdmmc_idma_error, .set_clkreg = mmci_sdmmc_set_clkreg, .set_pwrreg = mmci_sdmmc_set_pwrreg, .busy_complete = sdmmc_busy_complete,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Semenov ivan@semenov.dev
commit ff3206d2186d84e4f77e1378ba1d225633f17b9b upstream.
Initializing an eMMC that's connected via a 1-bit bus is current failing, if the HW (DT) informs that 4-bit bus is supported. In fact this is a regression, as we were earlier capable of falling back to 1-bit mode, when switching to 4/8-bit bus failed. Therefore, let's restore the behaviour.
Log for Samsung eMMC 5.1 chip connected via 1bit bus (only D0 pin) Before patch: [134509.044225] mmc0: switch to bus width 4 failed [134509.044509] mmc0: new high speed MMC card at address 0001 [134509.054594] mmcblk0: mmc0:0001 BGUF4R 29.1 GiB [134509.281602] mmc0: switch to bus width 4 failed [134509.282638] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [134509.282657] Buffer I/O error on dev mmcblk0, logical block 0, async page read [134509.284598] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [134509.284602] Buffer I/O error on dev mmcblk0, logical block 0, async page read [134509.284609] ldm_validate_partition_table(): Disk read failed. [134509.286495] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [134509.286500] Buffer I/O error on dev mmcblk0, logical block 0, async page read [134509.288303] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [134509.288308] Buffer I/O error on dev mmcblk0, logical block 0, async page read [134509.289540] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [134509.289544] Buffer I/O error on dev mmcblk0, logical block 0, async page read [134509.289553] mmcblk0: unable to read partition table [134509.289728] mmcblk0boot0: mmc0:0001 BGUF4R 31.9 MiB [134509.290283] mmcblk0boot1: mmc0:0001 BGUF4R 31.9 MiB [134509.294577] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2 [134509.295835] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [134509.295841] Buffer I/O error on dev mmcblk0, logical block 0, async page read
After patch:
[134551.089613] mmc0: switch to bus width 4 failed [134551.090377] mmc0: new high speed MMC card at address 0001 [134551.102271] mmcblk0: mmc0:0001 BGUF4R 29.1 GiB [134551.113365] mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20 p21 [134551.114262] mmcblk0boot0: mmc0:0001 BGUF4R 31.9 MiB [134551.114925] mmcblk0boot1: mmc0:0001 BGUF4R 31.9 MiB
Fixes: 577fb13199b1 ("mmc: rework selection of bus speed mode") Cc: stable@vger.kernel.org Signed-off-by: Ivan Semenov ivan@semenov.dev Link: https://lore.kernel.org/r/20240206172845.34316-1-ivan@semenov.dev Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/core/mmc.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/mmc/core/mmc.c +++ b/drivers/mmc/core/mmc.c @@ -1006,10 +1006,12 @@ static int mmc_select_bus_width(struct m static unsigned ext_csd_bits[] = { EXT_CSD_BUS_WIDTH_8, EXT_CSD_BUS_WIDTH_4, + EXT_CSD_BUS_WIDTH_1, }; static unsigned bus_widths[] = { MMC_BUS_WIDTH_8, MMC_BUS_WIDTH_4, + MMC_BUS_WIDTH_1, }; struct mmc_host *host = card->host; unsigned idx, bus_width = 0;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Elad Nachman enachman@marvell.com
commit 09e23823ae9a3e2d5d20f2e1efe0d6e48cef9129 upstream.
AC5X spec says PHY init complete bit must be polled until zero. We see cases in which timeout can take longer than the standard calculation on AC5X, which is expected following the spec comment above. According to the spec, we must wait as long as it takes for that bit to toggle on AC5X. Cap that with 100 delay loops so we won't get stuck forever.
Fixes: 06c8b667ff5b ("mmc: sdhci-xenon: Add support to PHYs of Marvell Xenon SDHC") Acked-by: Adrian Hunter adrian.hunter@intel.com Cc: stable@vger.kernel.org Signed-off-by: Elad Nachman enachman@marvell.com Link: https://lore.kernel.org/r/20240222191714.1216470-3-enachman@marvell.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/sdhci-xenon-phy.c | 29 ++++++++++++++++++++--------- 1 file changed, 20 insertions(+), 9 deletions(-)
--- a/drivers/mmc/host/sdhci-xenon-phy.c +++ b/drivers/mmc/host/sdhci-xenon-phy.c @@ -109,6 +109,8 @@ #define XENON_EMMC_PHY_LOGIC_TIMING_ADJUST (XENON_EMMC_PHY_REG_BASE + 0x18) #define XENON_LOGIC_TIMING_VALUE 0x00AA8977
+#define XENON_MAX_PHY_TIMEOUT_LOOPS 100 + /* * List offset of PHY registers and some special register values * in eMMC PHY 5.0 or eMMC PHY 5.1 @@ -259,18 +261,27 @@ static int xenon_emmc_phy_init(struct sd /* get the wait time */ wait /= clock; wait++; - /* wait for host eMMC PHY init completes */ - udelay(wait);
- reg = sdhci_readl(host, phy_regs->timing_adj); - reg &= XENON_PHY_INITIALIZAION; - if (reg) { + /* + * AC5X spec says bit must be polled until zero. + * We see cases in which timeout can take longer + * than the standard calculation on AC5X, which is + * expected following the spec comment above. + * According to the spec, we must wait as long as + * it takes for that bit to toggle on AC5X. + * Cap that with 100 delay loops so we won't get + * stuck here forever: + */ + + ret = read_poll_timeout(sdhci_readl, reg, + !(reg & XENON_PHY_INITIALIZAION), + wait, XENON_MAX_PHY_TIMEOUT_LOOPS * wait, + false, host, phy_regs->timing_adj); + if (ret) dev_err(mmc_dev(host->mmc), "eMMC PHY init cannot complete after %d us\n", - wait); - return -ETIMEDOUT; - } + wait * XENON_MAX_PHY_TIMEOUT_LOOPS);
- return 0; + return ret; }
#define ARMADA_3700_SOC_PAD_1_8V 0x1
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Elad Nachman enachman@marvell.com
commit 8e9f25a290ae0016353c9ea13314c95fb3207812 upstream.
Each time SD/mmc phy is initialized, at times, in some of the attempts, phy fails to completes its initialization which results into timeout error. Per the HW spec, it is a pre-requisite to ensure a stable SD clock before a phy initialization is attempted.
Fixes: 06c8b667ff5b ("mmc: sdhci-xenon: Add support to PHYs of Marvell Xenon SDHC") Acked-by: Adrian Hunter adrian.hunter@intel.com Cc: stable@vger.kernel.org Signed-off-by: Elad Nachman enachman@marvell.com Link: https://lore.kernel.org/r/20240222200930.1277665-1-enachman@marvell.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/sdhci-xenon-phy.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
--- a/drivers/mmc/host/sdhci-xenon-phy.c +++ b/drivers/mmc/host/sdhci-xenon-phy.c @@ -11,6 +11,7 @@ #include <linux/slab.h> #include <linux/delay.h> #include <linux/ktime.h> +#include <linux/iopoll.h> #include <linux/of_address.h>
#include "sdhci-pltfm.h" @@ -218,6 +219,19 @@ static int xenon_alloc_emmc_phy(struct s return 0; }
+static int xenon_check_stability_internal_clk(struct sdhci_host *host) +{ + u32 reg; + int err; + + err = read_poll_timeout(sdhci_readw, reg, reg & SDHCI_CLOCK_INT_STABLE, + 1100, 20000, false, host, SDHCI_CLOCK_CONTROL); + if (err) + dev_err(mmc_dev(host->mmc), "phy_init: Internal clock never stabilized.\n"); + + return err; +} + /* * eMMC 5.0/5.1 PHY init/re-init. * eMMC PHY init should be executed after: @@ -234,6 +248,11 @@ static int xenon_emmc_phy_init(struct sd struct xenon_priv *priv = sdhci_pltfm_priv(pltfm_host); struct xenon_emmc_phy_regs *phy_regs = priv->emmc_phy_regs;
+ int ret = xenon_check_stability_internal_clk(host); + + if (ret) + return ret; + reg = sdhci_readl(host, phy_regs->timing_adj); reg |= XENON_PHY_INITIALIZAION; sdhci_writel(host, reg, phy_regs->timing_adj);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiubo Li xiubli@redhat.com
commit 51d31149a88b5c5a8d2d33f06df93f6187a25b4c upstream.
The addition of bal_rank_mask with encoding version 17 was merged into ceph.git in Oct 2022 and made it into v18.2.0 release normally. A few months later, the much delayed addition of max_xattr_size got merged, also with encoding version 17, placed before bal_rank_mask in the encoding -- but it didn't make v18.2.0 release.
The way this ended up being resolved on the MDS side is that bal_rank_mask will continue to be encoded in version 17 while max_xattr_size is now encoded in version 18. This does mean that older kernels will misdecode version 17, but this is also true for v18.2.0 and v18.2.1 clients in userspace.
The best we can do is backport this adjustment -- see ceph.git commit 78abfeaff27fee343fb664db633de5b221699a73 for details.
[ idryomov: changelog ]
Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/64440 Fixes: d93231a6bc8a ("ceph: prevent a client from exceeding the MDS maximum xattr size") Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Patrick Donnelly pdonnell@ibm.com Reviewed-by: Venky Shankar vshankar@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ceph/mdsmap.c | 7 ++++--- fs/ceph/mdsmap.h | 6 +++++- 2 files changed, 9 insertions(+), 4 deletions(-)
--- a/fs/ceph/mdsmap.c +++ b/fs/ceph/mdsmap.c @@ -380,10 +380,11 @@ struct ceph_mdsmap *ceph_mdsmap_decode(s ceph_decode_skip_8(p, end, bad_ext); /* required_client_features */ ceph_decode_skip_set(p, end, 64, bad_ext); + /* bal_rank_mask */ + ceph_decode_skip_string(p, end, bad_ext); + } + if (mdsmap_ev >= 18) { ceph_decode_64_safe(p, end, m->m_max_xattr_size, bad_ext); - } else { - /* This forces the usage of the (sync) SETXATTR Op */ - m->m_max_xattr_size = 0; } bad_ext: doutc(cl, "m_enabled: %d, m_damaged: %d, m_num_laggy: %d\n", --- a/fs/ceph/mdsmap.h +++ b/fs/ceph/mdsmap.h @@ -27,7 +27,11 @@ struct ceph_mdsmap { u32 m_session_timeout; /* seconds */ u32 m_session_autoclose; /* seconds */ u64 m_max_file_size; - u64 m_max_xattr_size; /* maximum size for xattrs blob */ + /* + * maximum size for xattrs blob. + * Zeroed by default to force the usage of the (sync) SETXATTR Op. + */ + u64 m_max_xattr_size; u32 m_max_mds; /* expected up:active mds number */ u32 m_num_active_mds; /* actual up:active mds number */ u32 possible_max_rank; /* possible max rank index */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Chancellor nathan@kernel.org
commit 3aff0c459e77ac0fb1c4d6884433467f797f7357 upstream.
Commit e4bb020f3dbb ("riscv: detect assembler support for .option arch") added two tests, one for a valid value to '.option arch' that should succeed and one for an invalid value that is expected to fail to make sure that support for '.option arch' is properly detected because Clang does not error when '.option arch' is not supported:
$ clang --target=riscv64-linux-gnu -Werror -x assembler -c -o /dev/null <(echo '.option arch, +m') /dev/fd/63:1:9: warning: unknown option, expected 'push', 'pop', 'rvc', 'norvc', 'relax' or 'norelax' .option arch, +m ^ $ echo $? 0
Unfortunately, the invalid test started being accepted by Clang after the linked llvm-project change, which causes CONFIG_AS_HAS_OPTION_ARCH and configurations that depend on it to be silently disabled, even though those versions do support '.option arch'.
The invalid test can be avoided altogether by using '-Wa,--fatal-warnings', which will turn all assembler warnings into errors, like '-Werror' does for the compiler:
$ clang --target=riscv64-linux-gnu -Werror -Wa,--fatal-warnings -x assembler -c -o /dev/null <(echo '.option arch, +m') /dev/fd/63:1:9: error: unknown option, expected 'push', 'pop', 'rvc', 'norvc', 'relax' or 'norelax' .option arch, +m ^ $ echo $? 1
The as-instr macros have been updated to make use of this flag, so remove the invalid test, which allows CONFIG_AS_HAS_OPTION_ARCH to work for all compiler versions.
Cc: stable@vger.kernel.org Fixes: e4bb020f3dbb ("riscv: detect assembler support for .option arch") Link: https://github.com/llvm/llvm-project/commit/3ac9fe69f70a2b3541266daedbaaa7dc... Reported-by: Eric Biggers ebiggers@kernel.org Closes: https://lore.kernel.org/r/20240121011341.GA97368@sol.localdomain/ Signed-off-by: Nathan Chancellor nathan@kernel.org Tested-by: Eric Biggers ebiggers@google.com Tested-by: Andy Chiu andybnac@gmail.com Reviewed-by: Andy Chiu andybnac@gmail.com Tested-by: Conor Dooley conor.dooley@microchip.com Reviewed-by: Conor Dooley conor.dooley@microchip.com Acked-by: Masahiro Yamada masahiroy@kernel.org Link: https://lore.kernel.org/r/20240125-fix-riscv-option-arch-llvm-18-v1-2-390ac9... Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/Kconfig | 1 - 1 file changed, 1 deletion(-)
--- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -294,7 +294,6 @@ config AS_HAS_OPTION_ARCH # https://reviews.llvm.org/D123515 def_bool y depends on $(as-instr, .option arch$(comma) +m) - depends on !$(as-instr, .option arch$(comma) -i)
source "arch/riscv/Kconfig.socs" source "arch/riscv/Kconfig.errata"
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zong Li zong.li@sifive.com
commit 680341382da56bd192ebfa4e58eaf4fec2e5bca7 upstream.
CALLER_ADDRx returns caller's address at specified level, they are used for several tracers. These macros eventually use __builtin_return_address(n) to get the caller's address if arch doesn't define their own implementation.
In RISC-V, __builtin_return_address(n) only works when n == 0, we need to walk the stack frame to get the caller's address at specified level.
data.level started from 'level + 3' due to the call flow of getting caller's address in RISC-V implementation. If we don't have additional three iteration, the level is corresponding to follows:
callsite -> return_address -> arch_stack_walk -> walk_stackframe | | | | level 3 level 2 level 1 level 0
Fixes: 10626c32e382 ("riscv/ftrace: Add basic support") Cc: stable@vger.kernel.org Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Signed-off-by: Zong Li zong.li@sifive.com Link: https://lore.kernel.org/r/20240202015102.26251-1-zong.li@sifive.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/include/asm/ftrace.h | 5 +++ arch/riscv/kernel/Makefile | 2 + arch/riscv/kernel/return_address.c | 48 +++++++++++++++++++++++++++++++++++++ 3 files changed, 55 insertions(+) create mode 100644 arch/riscv/kernel/return_address.c
--- a/arch/riscv/include/asm/ftrace.h +++ b/arch/riscv/include/asm/ftrace.h @@ -25,6 +25,11 @@
#define ARCH_SUPPORTS_FTRACE_OPS 1 #ifndef __ASSEMBLY__ + +extern void *return_address(unsigned int level); + +#define ftrace_return_address(n) return_address(n) + void MCOUNT_NAME(void); static inline unsigned long ftrace_call_adjust(unsigned long addr) { --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -7,6 +7,7 @@ ifdef CONFIG_FTRACE CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_patch.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_sbi.o = $(CC_FLAGS_FTRACE) +CFLAGS_REMOVE_return_address.o = $(CC_FLAGS_FTRACE) endif CFLAGS_syscall_table.o += $(call cc-option,-Wno-override-init,) CFLAGS_compat_syscall_table.o += $(call cc-option,-Wno-override-init,) @@ -46,6 +47,7 @@ obj-y += irq.o obj-y += process.o obj-y += ptrace.o obj-y += reset.o +obj-y += return_address.o obj-y += setup.o obj-y += signal.o obj-y += syscall_table.o --- /dev/null +++ b/arch/riscv/kernel/return_address.c @@ -0,0 +1,48 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * This code come from arch/arm64/kernel/return_address.c + * + * Copyright (C) 2023 SiFive. + */ + +#include <linux/export.h> +#include <linux/kprobes.h> +#include <linux/stacktrace.h> + +struct return_address_data { + unsigned int level; + void *addr; +}; + +static bool save_return_addr(void *d, unsigned long pc) +{ + struct return_address_data *data = d; + + if (!data->level) { + data->addr = (void *)pc; + return false; + } + + --data->level; + + return true; +} +NOKPROBE_SYMBOL(save_return_addr); + +noinline void *return_address(unsigned int level) +{ + struct return_address_data data; + + data.level = level + 3; + data.addr = NULL; + + arch_stack_walk(save_return_addr, &data, current, NULL); + + if (!data.level) + return data.addr; + else + return NULL; + +} +EXPORT_SYMBOL_GPL(return_address); +NOKPROBE_SYMBOL(return_address);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Samuel Holland samuel.holland@sifive.com
commit 3fb3f7164edc467450e650dca51dbe4823315a56 upstream.
When the kernel is running in M-mode, the CBZE bit must be set in the menvcfg CSR, not in senvcfg.
Cc: stable@vger.kernel.org Fixes: 43c16d51a19b ("RISC-V: Enable cbo.zero in usermode") Reviewed-by: Andrew Jones ajones@ventanamicro.com Signed-off-by: Samuel Holland samuel.holland@sifive.com Reviewed-by: Conor Dooley conor.dooley@microchip.com Link: https://lore.kernel.org/r/20240228065559.3434837-2-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/include/asm/csr.h | 2 ++ arch/riscv/kernel/cpufeature.c | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-)
--- a/arch/riscv/include/asm/csr.h +++ b/arch/riscv/include/asm/csr.h @@ -415,6 +415,7 @@ # define CSR_STATUS CSR_MSTATUS # define CSR_IE CSR_MIE # define CSR_TVEC CSR_MTVEC +# define CSR_ENVCFG CSR_MENVCFG # define CSR_SCRATCH CSR_MSCRATCH # define CSR_EPC CSR_MEPC # define CSR_CAUSE CSR_MCAUSE @@ -439,6 +440,7 @@ # define CSR_STATUS CSR_SSTATUS # define CSR_IE CSR_SIE # define CSR_TVEC CSR_STVEC +# define CSR_ENVCFG CSR_SENVCFG # define CSR_SCRATCH CSR_SSCRATCH # define CSR_EPC CSR_SEPC # define CSR_CAUSE CSR_SCAUSE --- a/arch/riscv/kernel/cpufeature.c +++ b/arch/riscv/kernel/cpufeature.c @@ -740,7 +740,7 @@ arch_initcall(check_unaligned_access_all void riscv_user_isa_enable(void) { if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICBOZ)) - csr_set(CSR_SENVCFG, ENVCFG_CBZE); + csr_set(CSR_ENVCFG, ENVCFG_CBZE); }
#ifdef CONFIG_RISCV_ALTERNATIVE
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Samuel Holland samuel.holland@sifive.com
commit 05ab803d1ad8ac505ade77c6bd3f86b1b4ea0dc4 upstream.
The value of the [ms]envcfg CSR is lost when entering a nonretentive idle state, so the CSR must be rewritten when resuming the CPU.
Cc: stable@vger.kernel.org # v6.7+ Fixes: 43c16d51a19b ("RISC-V: Enable cbo.zero in usermode") Signed-off-by: Samuel Holland samuel.holland@sifive.com Reviewed-by: Conor Dooley conor.dooley@microchip.com Reviewed-by: Andrew Jones ajones@ventanamicro.com Link: https://lore.kernel.org/r/20240228065559.3434837-4-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/include/asm/suspend.h | 1 + arch/riscv/kernel/suspend.c | 4 ++++ 2 files changed, 5 insertions(+)
--- a/arch/riscv/include/asm/suspend.h +++ b/arch/riscv/include/asm/suspend.h @@ -14,6 +14,7 @@ struct suspend_context { struct pt_regs regs; /* Saved and restored by high-level functions */ unsigned long scratch; + unsigned long envcfg; unsigned long tvec; unsigned long ie; #ifdef CONFIG_MMU --- a/arch/riscv/kernel/suspend.c +++ b/arch/riscv/kernel/suspend.c @@ -11,6 +11,8 @@ void suspend_save_csrs(struct suspend_context *context) { context->scratch = csr_read(CSR_SCRATCH); + if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG)) + context->envcfg = csr_read(CSR_ENVCFG); context->tvec = csr_read(CSR_TVEC); context->ie = csr_read(CSR_IE);
@@ -32,6 +34,8 @@ void suspend_save_csrs(struct suspend_co void suspend_restore_csrs(struct suspend_context *context) { csr_write(CSR_SCRATCH, context->scratch); + if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG)) + csr_write(CSR_ENVCFG, context->envcfg); csr_write(CSR_TVEC, context->tvec); csr_write(CSR_IE, context->ie);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Weißschuh linux@weissschuh.net
commit 30d5297862410418bb8f8b4c0a87fa55c3063dd7 upstream.
The driver uses regmap APIs so it should make sure they are available.
Fixes: c75f4bf6800b ("power: supply: Introduce MM8013 fuel gauge driver") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh linux@weissschuh.net Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Link: https://lore.kernel.org/r/20240204-mm8013-regmap-v1-1-7cc6b619b7d3@weissschu... Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/power/supply/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/power/supply/Kconfig b/drivers/power/supply/Kconfig index f21cb05815ec..3e31375491d5 100644 --- a/drivers/power/supply/Kconfig +++ b/drivers/power/supply/Kconfig @@ -978,6 +978,7 @@ config CHARGER_QCOM_SMB2 config FUEL_GAUGE_MM8013 tristate "Mitsumi MM8013 fuel gauge driver" depends on I2C + select REGMAP_I2C help Say Y here to enable the Mitsumi MM8013 fuel gauge driver. It enables the monitoring of many battery parameters, including
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Chancellor nathan@kernel.org
commit 0ee695a471a750cad4fff22286d91e038b1ef62f upstream.
Certain assembler instruction tests may only induce warnings from the assembler on an unsupported instruction or option, which causes as-instr to succeed when it was expected to fail. Some tests workaround this limitation by additionally testing that invalid input fails as expected. However, this is fragile if the assembler is changed to accept the invalid input, as it will cause the instruction/option to be unavailable like it was unsupported even when it is.
Use '-Wa,--fatal-warnings' in the as-instr macro to turn these warnings into hard errors, which avoids this fragility and makes tests more robust and well formed.
Cc: stable@vger.kernel.org Suggested-by: Eric Biggers ebiggers@kernel.org Signed-off-by: Nathan Chancellor nathan@kernel.org Tested-by: Eric Biggers ebiggers@google.com Tested-by: Andy Chiu andybnac@gmail.com Reviewed-by: Andy Chiu andybnac@gmail.com Tested-by: Conor Dooley conor.dooley@microchip.com Reviewed-by: Conor Dooley conor.dooley@microchip.com Acked-by: Masahiro Yamada masahiroy@kernel.org Link: https://lore.kernel.org/r/20240125-fix-riscv-option-arch-llvm-18-v1-1-390ac9... Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- scripts/Kconfig.include | 2 +- scripts/Makefile.compiler | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/scripts/Kconfig.include +++ b/scripts/Kconfig.include @@ -33,7 +33,7 @@ ld-option = $(success,$(LD) -v $(1))
# $(as-instr,<instr>) # Return y if the assembler supports <instr>, n otherwise -as-instr = $(success,printf "%b\n" "$(1)" | $(CC) $(CLANG_FLAGS) -c -x assembler-with-cpp -o /dev/null -) +as-instr = $(success,printf "%b\n" "$(1)" | $(CC) $(CLANG_FLAGS) -Wa$(comma)--fatal-warnings -c -x assembler-with-cpp -o /dev/null -)
# check if $(CC) and $(LD) exist $(error-if,$(failure,command -v $(CC)),C compiler '$(CC)' not found) --- a/scripts/Makefile.compiler +++ b/scripts/Makefile.compiler @@ -38,7 +38,7 @@ as-option = $(call try-run,\ # Usage: aflags-y += $(call as-instr,instr,option1,option2)
as-instr = $(call try-run,\ - printf "%b\n" "$(1)" | $(CC) -Werror $(CLANG_FLAGS) $(KBUILD_AFLAGS) -c -x assembler-with-cpp -o "$$TMP" -,$(2),$(3)) + printf "%b\n" "$(1)" | $(CC) -Werror $(CLANG_FLAGS) $(KBUILD_AFLAGS) -Wa$(comma)--fatal-warnings -c -x assembler-with-cpp -o "$$TMP" -,$(2),$(3))
# __cc-option # Usage: MY_CFLAGS += $(call __cc-option,$(CC),$(MY_CFLAGS),-march=winchip-c6,-march=i586)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nicolin Chen nicolinc@nvidia.com
commit aeb004c0cd6958e910123a1607634401009c9539 upstream.
Syzkaller reported the following WARN_ON: WARNING: CPU: 1 PID: 4738 at drivers/iommu/iommufd/io_pagetable.c:1360
Call Trace: iommufd_access_change_ioas+0x2fe/0x4e0 iommufd_access_destroy_object+0x50/0xb0 iommufd_object_remove+0x2a3/0x490 iommufd_object_destroy_user iommufd_access_destroy+0x71/0xb0 iommufd_test_staccess_release+0x89/0xd0 __fput+0x272/0xb50 __fput_sync+0x4b/0x60 __do_sys_close __se_sys_close __x64_sys_close+0x8b/0x110 do_syscall_x64
The mismatch between the access pointer in the list and the passed-in pointer is resulting from an overwrite of access->iopt_access_list_id, in iopt_add_access(). Called from iommufd_access_change_ioas() when xa_alloc() succeeds but iopt_calculate_iova_alignment() fails.
Add a new_id in iopt_add_access() and only update iopt_access_list_id when returning successfully.
Cc: stable@vger.kernel.org Fixes: 9227da7816dd ("iommufd: Add iommufd_access_change_ioas(_id) helpers") Link: https://lore.kernel.org/r/2dda7acb25b8562ec5f1310de828ef5da9ef509c.170863662... Reported-by: Jason Gunthorpe jgg@nvidia.com Suggested-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iommu/iommufd/io_pagetable.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/drivers/iommu/iommufd/io_pagetable.c +++ b/drivers/iommu/iommufd/io_pagetable.c @@ -1330,20 +1330,23 @@ out_unlock:
int iopt_add_access(struct io_pagetable *iopt, struct iommufd_access *access) { + u32 new_id; int rc;
down_write(&iopt->domains_rwsem); down_write(&iopt->iova_rwsem); - rc = xa_alloc(&iopt->access_list, &access->iopt_access_list_id, access, - xa_limit_16b, GFP_KERNEL_ACCOUNT); + rc = xa_alloc(&iopt->access_list, &new_id, access, xa_limit_16b, + GFP_KERNEL_ACCOUNT); + if (rc) goto out_unlock;
rc = iopt_calculate_iova_alignment(iopt); if (rc) { - xa_erase(&iopt->access_list, access->iopt_access_list_id); + xa_erase(&iopt->access_list, new_id); goto out_unlock; } + access->iopt_access_list_id = new_id;
out_unlock: up_write(&iopt->iova_rwsem);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nicolin Chen nicolinc@nvidia.com
commit cf7c2789822db8b5efa34f5ebcf1621bc0008d48 upstream.
Syzkaller reported the following bug:
general protection fault, probably for non-canonical address 0xdffffc0000000038: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x00000000000001c0-0x00000000000001c7] Call Trace: lock_acquire lock_acquire+0x1ce/0x4f0 down_read+0x93/0x4a0 iommufd_test_syz_conv_iova+0x56/0x1f0 iommufd_test_access_rw.isra.0+0x2ec/0x390 iommufd_test+0x1058/0x1e30 iommufd_fops_ioctl+0x381/0x510 vfs_ioctl __do_sys_ioctl __se_sys_ioctl __x64_sys_ioctl+0x170/0x1e0 do_syscall_x64 do_syscall_64+0x71/0x140
This is because the new iommufd_access_change_ioas() sets access->ioas to NULL during its process, so the lock might be gone in a concurrent racing context.
Fix this by doing the same access->ioas sanity as iommufd_access_rw() and iommufd_access_pin_pages() functions do.
Cc: stable@vger.kernel.org Fixes: 9227da7816dd ("iommufd: Add iommufd_access_change_ioas(_id) helpers") Link: https://lore.kernel.org/r/3f1932acaf1dd494d404c04364d73ce8f57f3e5e.170863662... Reported-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iommu/iommufd/selftest.c | 27 +++++++++++++++++++++------ 1 file changed, 21 insertions(+), 6 deletions(-)
--- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -48,8 +48,8 @@ enum { * In syzkaller mode the 64 bit IOVA is converted into an nth area and offset * value. This has a much smaller randomization space and syzkaller can hit it. */ -static unsigned long iommufd_test_syz_conv_iova(struct io_pagetable *iopt, - u64 *iova) +static unsigned long __iommufd_test_syz_conv_iova(struct io_pagetable *iopt, + u64 *iova) { struct syz_layout { __u32 nth_area; @@ -73,6 +73,21 @@ static unsigned long iommufd_test_syz_co return 0; }
+static unsigned long iommufd_test_syz_conv_iova(struct iommufd_access *access, + u64 *iova) +{ + unsigned long ret; + + mutex_lock(&access->ioas_lock); + if (!access->ioas) { + mutex_unlock(&access->ioas_lock); + return 0; + } + ret = __iommufd_test_syz_conv_iova(&access->ioas->iopt, iova); + mutex_unlock(&access->ioas_lock); + return ret; +} + void iommufd_test_syz_conv_iova_id(struct iommufd_ucmd *ucmd, unsigned int ioas_id, u64 *iova, u32 *flags) { @@ -85,7 +100,7 @@ void iommufd_test_syz_conv_iova_id(struc ioas = iommufd_get_ioas(ucmd->ictx, ioas_id); if (IS_ERR(ioas)) return; - *iova = iommufd_test_syz_conv_iova(&ioas->iopt, iova); + *iova = __iommufd_test_syz_conv_iova(&ioas->iopt, iova); iommufd_put_object(ucmd->ictx, &ioas->obj); }
@@ -1045,7 +1060,7 @@ static int iommufd_test_access_pages(str }
if (flags & MOCK_FLAGS_ACCESS_SYZ) - iova = iommufd_test_syz_conv_iova(&staccess->access->ioas->iopt, + iova = iommufd_test_syz_conv_iova(staccess->access, &cmd->access_pages.iova);
npages = (ALIGN(iova + length, PAGE_SIZE) - @@ -1147,8 +1162,8 @@ static int iommufd_test_access_rw(struct }
if (flags & MOCK_FLAGS_ACCESS_SYZ) - iova = iommufd_test_syz_conv_iova(&staccess->access->ioas->iopt, - &cmd->access_rw.iova); + iova = iommufd_test_syz_conv_iova(staccess->access, + &cmd->access_rw.iova);
rc = iommufd_access_rw(staccess->access, iova, tmp, length, flags); if (rc)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tim Schumacher timschumi@gmx.de
commit f45812cc23fb74bef62d4eb8a69fe7218f4b9f2a upstream.
Work around a quirk in a few old (2011-ish) UEFI implementations, where a call to `GetNextVariableName` with a buffer size larger than 512 bytes will always return EFI_INVALID_PARAMETER.
There is some lore around EFI variable names being up to 1024 bytes in size, but this has no basis in the UEFI specification, and the upper bounds are typically platform specific, and apply to the entire variable (name plus payload).
Given that Linux does not permit creating files with names longer than NAME_MAX (255) bytes, 512 bytes (== 256 UTF-16 characters) is a reasonable limit.
Cc: stable@vger.kernel.org # 6.1+ Signed-off-by: Tim Schumacher timschumi@gmx.de Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/efivarfs/vars.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
--- a/fs/efivarfs/vars.c +++ b/fs/efivarfs/vars.c @@ -372,7 +372,7 @@ static void dup_variable_bug(efi_char16_ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *), void *data, bool duplicates, struct list_head *head) { - unsigned long variable_name_size = 1024; + unsigned long variable_name_size = 512; efi_char16_t *variable_name; efi_status_t status; efi_guid_t vendor_guid; @@ -389,12 +389,13 @@ int efivar_init(int (*func)(efi_char16_t goto free;
/* - * Per EFI spec, the maximum storage allocated for both - * the variable name and variable data is 1024 bytes. + * A small set of old UEFI implementations reject sizes + * above a certain threshold, the lowest seen in the wild + * is 512. */
do { - variable_name_size = 1024; + variable_name_size = 512;
status = efivar_get_next_variable(&variable_name_size, variable_name, @@ -431,9 +432,13 @@ int efivar_init(int (*func)(efi_char16_t break; case EFI_NOT_FOUND: break; + case EFI_BUFFER_TOO_SMALL: + pr_warn("efivars: Variable name size exceeds maximum (%lu > 512)\n", + variable_name_size); + status = EFI_NOT_FOUND; + break; default: - printk(KERN_WARNING "efivars: get_next_variable: status=%lx\n", - status); + pr_warn("efivars: get_next_variable: status=%lx\n", status); status = EFI_NOT_FOUND; break; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cristian Marussi cristian.marussi@arm.com
commit eb5555d422d0fc325e1574a7353d3c616f82d8b5 upstream.
On unloading of the scmi_perf_domain module got the below splat, when in the DT provided to the system under test the '#power-domain-cells' property was missing. Indeed, this particular setup causes the probe to bail out early without giving any error, which leads to the ->remove() callback gets to run too, but without all the expected initialized structures in place.
Add a check and bail out early on remove too.
Call trace: scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain] scmi_dev_remove+0x28/0x40 [scmi_core] device_remove+0x54/0x90 device_release_driver_internal+0x1dc/0x240 driver_detach+0x58/0xa8 bus_remove_driver+0x78/0x108 driver_unregister+0x38/0x70 scmi_driver_unregister+0x28/0x180 [scmi_core] scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain] __arm64_sys_delete_module+0x1a8/0x2c0 invoke_syscall+0x50/0x128 el0_svc_common.constprop.0+0x48/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x34/0xb8 el0t_64_sync_handler+0x100/0x130 el0t_64_sync+0x190/0x198 Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80) ---[ end trace 0000000000000000 ]---
Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain") Signed-off-by: Cristian Marussi cristian.marussi@arm.com Reviewed-by: Sudeep Holla sudeep.holla@arm.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pmdomain/arm/scmi_perf_domain.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/pmdomain/arm/scmi_perf_domain.c b/drivers/pmdomain/arm/scmi_perf_domain.c index 709bbc448fad..d7ef46ccd9b8 100644 --- a/drivers/pmdomain/arm/scmi_perf_domain.c +++ b/drivers/pmdomain/arm/scmi_perf_domain.c @@ -159,6 +159,9 @@ static void scmi_perf_domain_remove(struct scmi_device *sdev) struct genpd_onecell_data *scmi_pd_data = dev_get_drvdata(dev); int i;
+ if (!scmi_pd_data) + return; + of_genpd_del_provider(dev->of_node);
for (i = 0; i < scmi_pd_data->num_domains; i++)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bjorn Andersson quic_bjorande@quicinc.com
commit 2a93c6cbd5a703d44c414a3c3945a87ce11430ba upstream.
Commit 'e3e56c050ab6 ("soc: qcom: rpmhpd: Make power_on actually enable the domain")' aimed to make sure that a power-domain that is being enabled without any particular performance-state requested will at least turn the rail on, to avoid filling DeviceTree with otherwise unnecessary required-opps properties.
But in the event that aggregation happens on a disabled power-domain, with an enabled peer without performance-state, both the local and peer corner are 0. The peer's enabled_corner is not considered, with the result that the underlying (shared) resource is disabled.
One case where this can be observed is when the display stack keeps mmcx enabled (but without a particular performance-state vote) in order to access registers and sync_state happens in the rpmhpd driver. As mmcx_ao is flushed the state of the peer (mmcx) is not considered and mmcx_ao ends up turning off "mmcx.lvl" underneath mmcx. This has been observed several times, but has been painted over in DeviceTree by adding an explicit vote for the lowest non-disabled performance-state.
Fixes: e3e56c050ab6 ("soc: qcom: rpmhpd: Make power_on actually enable the domain") Reported-by: Johan Hovold johan@kernel.org Closes: https://lore.kernel.org/linux-arm-msm/ZdMwZa98L23mu3u6@hovoldconsulting.com/ Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Tested-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Reviewed-by: Stephen Boyd swboyd@chromium.org Tested-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20240226-rpmhpd-enable-corner-fix-v1-1-68c004cec48... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pmdomain/qcom/rpmhpd.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/drivers/pmdomain/qcom/rpmhpd.c +++ b/drivers/pmdomain/qcom/rpmhpd.c @@ -692,6 +692,7 @@ static int rpmhpd_aggregate_corner(struc unsigned int active_corner, sleep_corner; unsigned int this_active_corner = 0, this_sleep_corner = 0; unsigned int peer_active_corner = 0, peer_sleep_corner = 0; + unsigned int peer_enabled_corner;
if (pd->state_synced) { to_active_sleep(pd, corner, &this_active_corner, &this_sleep_corner); @@ -701,9 +702,11 @@ static int rpmhpd_aggregate_corner(struc this_sleep_corner = pd->level_count - 1; }
- if (peer && peer->enabled) - to_active_sleep(peer, peer->corner, &peer_active_corner, + if (peer && peer->enabled) { + peer_enabled_corner = max(peer->corner, peer->enable_corner); + to_active_sleep(peer, peer_enabled_corner, &peer_active_corner, &peer_sleep_corner); + }
active_corner = max(this_active_corner, peer_active_corner);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu (Google) mhiramat@kernel.org
commit 6572786006fa96ad2c35bb31757f1f861298093b upstream.
Fix to allocate fprobe::entry_data_size buffer with rethook instances. If fprobe doesn't allocate entry_data_size buffer for each rethook instance, fprobe entry handler can cause a buffer overrun when storing entry data in entry handler.
Link: https://lore.kernel.org/all/170920576727.107552.638161246679734051.stgit@dev...
Reported-by: Jiri Olsa olsajiri@gmail.com Closes: https://lore.kernel.org/all/Zd9eBn2FTQzYyg7L@krava/ Fixes: 4bbd93455659 ("kprobes: kretprobe scalability improvement") Cc: stable@vger.kernel.org Tested-by: Jiri Olsa olsajiri@gmail.com Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/fprobe.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c index 6cd2a4e3afb8..9ff018245840 100644 --- a/kernel/trace/fprobe.c +++ b/kernel/trace/fprobe.c @@ -189,9 +189,6 @@ static int fprobe_init_rethook(struct fprobe *fp, int num) { int size;
- if (num <= 0) - return -EINVAL; - if (!fp->exit_handler) { fp->rethook = NULL; return 0; @@ -199,15 +196,16 @@ static int fprobe_init_rethook(struct fprobe *fp, int num)
/* Initialize rethook if needed */ if (fp->nr_maxactive) - size = fp->nr_maxactive; + num = fp->nr_maxactive; else - size = num * num_possible_cpus() * 2; - if (size <= 0) + num *= num_possible_cpus() * 2; + if (num <= 0) return -EINVAL;
+ size = sizeof(struct fprobe_rethook_node) + fp->entry_data_size; + /* Initialize rethook */ - fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler, - sizeof(struct fprobe_rethook_node), size); + fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler, size, num); if (IS_ERR(fp->rethook)) return PTR_ERR(fp->rethook);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aneesh Kumar K.V (IBM) aneesh.kumar@kernel.org
commit 720da1e593b85a550593b415bf1d79a053133451 upstream.
Architectures like powerpc add debug checks to ensure we find only devmap PUD pte entries. These debug checks are only done with CONFIG_DEBUG_VM. This patch marks the ptes used for PUD advanced test devmap pte entries so that we don't hit on debug checks on architecture like ppc64 as below.
WARNING: CPU: 2 PID: 1 at arch/powerpc/mm/book3s64/radix_pgtable.c:1382 radix__pud_hugepage_update+0x38/0x138 .... NIP [c0000000000a7004] radix__pud_hugepage_update+0x38/0x138 LR [c0000000000a77a8] radix__pudp_huge_get_and_clear+0x28/0x60 Call Trace: [c000000004a2f950] [c000000004a2f9a0] 0xc000000004a2f9a0 (unreliable) [c000000004a2f980] [000d34c100000000] 0xd34c100000000 [c000000004a2f9a0] [c00000000206ba98] pud_advanced_tests+0x118/0x334 [c000000004a2fa40] [c00000000206db34] debug_vm_pgtable+0xcbc/0x1c48 [c000000004a2fc10] [c00000000000fd28] do_one_initcall+0x60/0x388
Also
kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:202! ....
NIP [c000000000096510] pudp_huge_get_and_clear_full+0x98/0x174 LR [c00000000206bb34] pud_advanced_tests+0x1b4/0x334 Call Trace: [c000000004a2f950] [000d34c100000000] 0xd34c100000000 (unreliable) [c000000004a2f9a0] [c00000000206bb34] pud_advanced_tests+0x1b4/0x334 [c000000004a2fa40] [c00000000206db34] debug_vm_pgtable+0xcbc/0x1c48 [c000000004a2fc10] [c00000000000fd28] do_one_initcall+0x60/0x388
Link: https://lkml.kernel.org/r/20240129060022.68044-1-aneesh.kumar@kernel.org Fixes: 27af67f35631 ("powerpc/book3s64/mm: enable transparent pud hugepage") Signed-off-by: Aneesh Kumar K.V (IBM) aneesh.kumar@kernel.org Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Michael Ellerman mpe@ellerman.id.au Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/debug_vm_pgtable.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -362,6 +362,12 @@ static void __init pud_advanced_tests(st vaddr &= HPAGE_PUD_MASK;
pud = pfn_pud(args->pud_pfn, args->page_prot); + /* + * Some architectures have debug checks to make sure + * huge pud mapping are only found with devmap entries + * For now test with only devmap entries. + */ + pud = pud_mkdevmap(pud); set_pud_at(args->mm, vaddr, args->pudp, pud); flush_dcache_page(page); pudp_set_wrprotect(args->mm, vaddr, args->pudp); @@ -374,6 +380,7 @@ static void __init pud_advanced_tests(st WARN_ON(!pud_none(pud)); #endif /* __PAGETABLE_PMD_FOLDED */ pud = pfn_pud(args->pud_pfn, args->page_prot); + pud = pud_mkdevmap(pud); pud = pud_wrprotect(pud); pud = pud_mkclean(pud); set_pud_at(args->mm, vaddr, args->pudp, pud); @@ -391,6 +398,7 @@ static void __init pud_advanced_tests(st #endif /* __PAGETABLE_PMD_FOLDED */
pud = pfn_pud(args->pud_pfn, args->page_prot); + pud = pud_mkdevmap(pud); pud = pud_mkyoung(pud); set_pud_at(args->mm, vaddr, args->pudp, pud); flush_dcache_page(page);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Byungchul Park byungchul@sk.com
commit 2774f256e7c0219e2b0a0894af1c76bdabc4f974 upstream.
With numa balancing on, when a numa system is running where a numa node doesn't have its local memory so it has no managed zones, the following oops has been observed. It's because wakeup_kswapd() is called with a wrong zone index, -1. Fixed it by checking the index before calling wakeup_kswapd().
BUG: unable to handle page fault for address: 00000000000033f3 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 895 Comm: masim Not tainted 6.6.0-dirty #255 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:wakeup_kswapd (./linux/mm/vmscan.c:7812) Code: (omitted) RSP: 0000:ffffc90004257d58 EFLAGS: 00010286 RAX: ffffffffffffffff RBX: ffff88883fff0480 RCX: 0000000000000003 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88883fff0480 RBP: ffffffffffffffff R08: ff0003ffffffffff R09: ffffffffffffffff R10: ffff888106c95540 R11: 0000000055555554 R12: 0000000000000003 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88883fff0940 FS: 00007fc4b8124740(0000) GS:ffff888827c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000033f3 CR3: 000000026cc08004 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace:
<TASK> ? __die ? page_fault_oops ? __pte_offset_map_lock ? exc_page_fault ? asm_exc_page_fault ? wakeup_kswapd migrate_misplaced_page __handle_mm_fault handle_mm_fault do_user_addr_fault exc_page_fault asm_exc_page_fault RIP: 0033:0x55b897ba0808 Code: (omitted) RSP: 002b:00007ffeefa821a0 EFLAGS: 00010287 RAX: 000055b89983acd0 RBX: 00007ffeefa823f8 RCX: 000055b89983acd0 RDX: 00007fc2f8122010 RSI: 0000000000020000 RDI: 000055b89983acd0 RBP: 00007ffeefa821a0 R08: 0000000000000037 R09: 0000000000000075 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 R13: 00007ffeefa82410 R14: 000055b897ba5dd8 R15: 00007fc4b8340000 </TASK>
Link: https://lkml.kernel.org/r/20240216111502.79759-1-byungchul@sk.com Signed-off-by: Byungchul Park byungchul@sk.com Reported-by: Hyeongtak Ji hyeongtak.ji@sk.com Fixes: c574bbe917036 ("NUMA balancing: optimize page placement for memory tiering system") Reviewed-by: Oscar Salvador osalvador@suse.de Cc: Baolin Wang baolin.wang@linux.alibaba.com Cc: "Huang, Ying" ying.huang@intel.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/migrate.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/mm/migrate.c +++ b/mm/migrate.c @@ -2517,6 +2517,14 @@ static int numamigrate_isolate_folio(pg_ if (managed_zone(pgdat->node_zones + z)) break; } + + /* + * If there are no managed zones, it should not proceed + * further. + */ + if (z < 0) + return 0; + wakeup_kswapd(pgdat->node_zones + z, 0, folio_order(folio), ZONE_MOVABLE); return 0;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Bohac jbohac@suse.cz
commit 7fd817c906503b6813ea3b41f5fdf4192449a707 upstream.
SETUP_RNG_SEED in setup_data is supplied by kexec and should not be reserved in the e820 map.
Doing so reserves 16 bytes of RAM when booting with kexec. (16 bytes because data->len is zeroed by parse_setup_data so only sizeof(setup_data) is reserved.)
When kexec is used repeatedly, each boot adds two entries in the kexec-provided e820 map as the 16-byte range splits a larger range of usable memory. Eventually all of the 128 available entries get used up. The next split will result in losing usable memory as the new entries cannot be added to the e820 map.
Fixes: 68b8e9713c8e ("x86/setup: Use rng seeds from setup_data") Signed-off-by: Jiri Bohac jbohac@suse.cz Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Cc: stable@kernel.org Link: https://lore.kernel.org/r/ZbmOjKnARGiaYBd5@dwarf.suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/e820.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -1017,10 +1017,12 @@ void __init e820__reserve_setup_data(voi e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
/* - * SETUP_EFI and SETUP_IMA are supplied by kexec and do not need - * to be reserved. + * SETUP_EFI, SETUP_IMA and SETUP_RNG_SEED are supplied by + * kexec and do not need to be reserved. */ - if (data->type != SETUP_EFI && data->type != SETUP_IMA) + if (data->type != SETUP_EFI && + data->type != SETUP_IMA && + data->type != SETUP_RNG_SEED) e820__range_update_kexec(pa_data, sizeof(*data) + data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Bonzini pbonzini@redhat.com
commit 9a458198eba98b7207669a166e64d04b04cb651b upstream.
In commit fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach"), the initialization of c->x86_phys_bits was moved after this_cpu->c_early_init(c). This is incorrect because early_init_amd() expected to be able to reduce the value according to the contents of CPUID leaf 0x8000001f.
Fortunately, the bug was negated by init_amd()'s call to early_init_amd(), which does reduce x86_phys_bits in the end. However, this is very late in the boot process and, most notably, the wrong value is used for x86_phys_bits when setting up MTRRs.
To fix this, call get_cpu_address_sizes() as soon as X86_FEATURE_CPUID is set/cleared, and c->extended_cpuid_level is retrieved.
Fixes: fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach") Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20240131230902.1867092-2-pbonzini%40redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/common.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1596,6 +1596,7 @@ static void __init early_identify_cpu(st get_cpu_vendor(c); get_cpu_cap(c); setup_force_cpu_cap(X86_FEATURE_CPUID); + get_cpu_address_sizes(c); cpu_parse_early_param();
if (this_cpu->c_early_init) @@ -1608,10 +1609,9 @@ static void __init early_identify_cpu(st this_cpu->c_bsp_init(c); } else { setup_clear_cpu_cap(X86_FEATURE_CPUID); + get_cpu_address_sizes(c); }
- get_cpu_address_sizes(c); - setup_force_cpu_cap(X86_FEATURE_ALWAYS);
cpu_set_bug_bits(c);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Bonzini pbonzini@redhat.com
commit 6890cb1ace350b4386c8aee1343dc3b3ddd214da upstream.
MKTME repurposes the high bit of physical address to key id for encryption key and, even though MAXPHYADDR in CPUID[0x80000008] remains the same, the valid bits in the MTRR mask register are based on the reduced number of physical address bits.
detect_tme() in arch/x86/kernel/cpu/intel.c detects TME and subtracts it from the total usable physical bits, but it is called too late. Move the call to early_init_intel() so that it is called in setup_arch(), before MTRRs are setup.
This fixes boot on TDX-enabled systems, which until now only worked with "disable_mtrr_cleanup". Without the patch, the values written to the MTRRs mask registers were 52-bit wide (e.g. 0x000fffff_80000800) and the writes failed; with the patch, the values are 46-bit wide, which matches the reduced MAXPHYADDR that is shown in /proc/cpuinfo.
Reported-by: Zixi Chen zixchen@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20240131230902.1867092-3-pbonzini%40redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/intel.c | 178 ++++++++++++++++++++++---------------------- 1 file changed, 91 insertions(+), 87 deletions(-)
--- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -184,6 +184,90 @@ static bool bad_spectre_microcode(struct return false; }
+#define MSR_IA32_TME_ACTIVATE 0x982 + +/* Helpers to access TME_ACTIVATE MSR */ +#define TME_ACTIVATE_LOCKED(x) (x & 0x1) +#define TME_ACTIVATE_ENABLED(x) (x & 0x2) + +#define TME_ACTIVATE_POLICY(x) ((x >> 4) & 0xf) /* Bits 7:4 */ +#define TME_ACTIVATE_POLICY_AES_XTS_128 0 + +#define TME_ACTIVATE_KEYID_BITS(x) ((x >> 32) & 0xf) /* Bits 35:32 */ + +#define TME_ACTIVATE_CRYPTO_ALGS(x) ((x >> 48) & 0xffff) /* Bits 63:48 */ +#define TME_ACTIVATE_CRYPTO_AES_XTS_128 1 + +/* Values for mktme_status (SW only construct) */ +#define MKTME_ENABLED 0 +#define MKTME_DISABLED 1 +#define MKTME_UNINITIALIZED 2 +static int mktme_status = MKTME_UNINITIALIZED; + +static void detect_tme_early(struct cpuinfo_x86 *c) +{ + u64 tme_activate, tme_policy, tme_crypto_algs; + int keyid_bits = 0, nr_keyids = 0; + static u64 tme_activate_cpu0 = 0; + + rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate); + + if (mktme_status != MKTME_UNINITIALIZED) { + if (tme_activate != tme_activate_cpu0) { + /* Broken BIOS? */ + pr_err_once("x86/tme: configuration is inconsistent between CPUs\n"); + pr_err_once("x86/tme: MKTME is not usable\n"); + mktme_status = MKTME_DISABLED; + + /* Proceed. We may need to exclude bits from x86_phys_bits. */ + } + } else { + tme_activate_cpu0 = tme_activate; + } + + if (!TME_ACTIVATE_LOCKED(tme_activate) || !TME_ACTIVATE_ENABLED(tme_activate)) { + pr_info_once("x86/tme: not enabled by BIOS\n"); + mktme_status = MKTME_DISABLED; + return; + } + + if (mktme_status != MKTME_UNINITIALIZED) + goto detect_keyid_bits; + + pr_info("x86/tme: enabled by BIOS\n"); + + tme_policy = TME_ACTIVATE_POLICY(tme_activate); + if (tme_policy != TME_ACTIVATE_POLICY_AES_XTS_128) + pr_warn("x86/tme: Unknown policy is active: %#llx\n", tme_policy); + + tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate); + if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS_128)) { + pr_err("x86/mktme: No known encryption algorithm is supported: %#llx\n", + tme_crypto_algs); + mktme_status = MKTME_DISABLED; + } +detect_keyid_bits: + keyid_bits = TME_ACTIVATE_KEYID_BITS(tme_activate); + nr_keyids = (1UL << keyid_bits) - 1; + if (nr_keyids) { + pr_info_once("x86/mktme: enabled by BIOS\n"); + pr_info_once("x86/mktme: %d KeyIDs available\n", nr_keyids); + } else { + pr_info_once("x86/mktme: disabled by BIOS\n"); + } + + if (mktme_status == MKTME_UNINITIALIZED) { + /* MKTME is usable */ + mktme_status = MKTME_ENABLED; + } + + /* + * KeyID bits effectively lower the number of physical address + * bits. Update cpuinfo_x86::x86_phys_bits accordingly. + */ + c->x86_phys_bits -= keyid_bits; +} + static void early_init_intel(struct cpuinfo_x86 *c) { u64 misc_enable; @@ -322,6 +406,13 @@ static void early_init_intel(struct cpui */ if (detect_extended_topology_early(c) < 0) detect_ht_early(c); + + /* + * Adjust the number of physical bits early because it affects the + * valid bits of the MTRR mask registers. + */ + if (cpu_has(c, X86_FEATURE_TME)) + detect_tme_early(c); }
static void bsp_init_intel(struct cpuinfo_x86 *c) @@ -482,90 +573,6 @@ static void srat_detect_node(struct cpui #endif }
-#define MSR_IA32_TME_ACTIVATE 0x982 - -/* Helpers to access TME_ACTIVATE MSR */ -#define TME_ACTIVATE_LOCKED(x) (x & 0x1) -#define TME_ACTIVATE_ENABLED(x) (x & 0x2) - -#define TME_ACTIVATE_POLICY(x) ((x >> 4) & 0xf) /* Bits 7:4 */ -#define TME_ACTIVATE_POLICY_AES_XTS_128 0 - -#define TME_ACTIVATE_KEYID_BITS(x) ((x >> 32) & 0xf) /* Bits 35:32 */ - -#define TME_ACTIVATE_CRYPTO_ALGS(x) ((x >> 48) & 0xffff) /* Bits 63:48 */ -#define TME_ACTIVATE_CRYPTO_AES_XTS_128 1 - -/* Values for mktme_status (SW only construct) */ -#define MKTME_ENABLED 0 -#define MKTME_DISABLED 1 -#define MKTME_UNINITIALIZED 2 -static int mktme_status = MKTME_UNINITIALIZED; - -static void detect_tme(struct cpuinfo_x86 *c) -{ - u64 tme_activate, tme_policy, tme_crypto_algs; - int keyid_bits = 0, nr_keyids = 0; - static u64 tme_activate_cpu0 = 0; - - rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate); - - if (mktme_status != MKTME_UNINITIALIZED) { - if (tme_activate != tme_activate_cpu0) { - /* Broken BIOS? */ - pr_err_once("x86/tme: configuration is inconsistent between CPUs\n"); - pr_err_once("x86/tme: MKTME is not usable\n"); - mktme_status = MKTME_DISABLED; - - /* Proceed. We may need to exclude bits from x86_phys_bits. */ - } - } else { - tme_activate_cpu0 = tme_activate; - } - - if (!TME_ACTIVATE_LOCKED(tme_activate) || !TME_ACTIVATE_ENABLED(tme_activate)) { - pr_info_once("x86/tme: not enabled by BIOS\n"); - mktme_status = MKTME_DISABLED; - return; - } - - if (mktme_status != MKTME_UNINITIALIZED) - goto detect_keyid_bits; - - pr_info("x86/tme: enabled by BIOS\n"); - - tme_policy = TME_ACTIVATE_POLICY(tme_activate); - if (tme_policy != TME_ACTIVATE_POLICY_AES_XTS_128) - pr_warn("x86/tme: Unknown policy is active: %#llx\n", tme_policy); - - tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate); - if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS_128)) { - pr_err("x86/mktme: No known encryption algorithm is supported: %#llx\n", - tme_crypto_algs); - mktme_status = MKTME_DISABLED; - } -detect_keyid_bits: - keyid_bits = TME_ACTIVATE_KEYID_BITS(tme_activate); - nr_keyids = (1UL << keyid_bits) - 1; - if (nr_keyids) { - pr_info_once("x86/mktme: enabled by BIOS\n"); - pr_info_once("x86/mktme: %d KeyIDs available\n", nr_keyids); - } else { - pr_info_once("x86/mktme: disabled by BIOS\n"); - } - - if (mktme_status == MKTME_UNINITIALIZED) { - /* MKTME is usable */ - mktme_status = MKTME_ENABLED; - } - - /* - * KeyID bits effectively lower the number of physical address - * bits. Update cpuinfo_x86::x86_phys_bits accordingly. - */ - c->x86_phys_bits -= keyid_bits; -} - static void init_cpuid_fault(struct cpuinfo_x86 *c) { u64 msr; @@ -702,9 +709,6 @@ static void init_intel(struct cpuinfo_x8
init_ia32_feat_ctl(c);
- if (cpu_has(c, X86_FEATURE_TME)) - detect_tme(c); - init_intel_misc_features(c);
split_lock_init();
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang tanggeliang@kylinos.cn
commit 535d620ea5ff1a033dc64ee3d912acadc7470619 upstream.
Address family of server side mismatches with that of client side, like in "userspace pm add & remove address" test:
userspace_pm_add_addr $ns1 10.0.2.1 10 userspace_pm_rm_sf $ns1 "::ffff:10.0.2.1" $SUB_ESTABLISHED
That's because on the server side, the family is set to AF_INET6 and the v4 address is mapped in a v6 one.
This patch fixes this issue. In mptcp_pm_nl_subflow_destroy_doit(), before checking local address family with remote address family, map an IPv4 address to an IPv6 address if the pair is a v4-mapped address.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387 Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Mat Martineau martineau@kernel.org Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-1-162... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_userspace.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/net/mptcp/pm_userspace.c +++ b/net/mptcp/pm_userspace.c @@ -495,6 +495,16 @@ int mptcp_pm_nl_subflow_destroy_doit(str goto destroy_err; }
+#if IS_ENABLED(CONFIG_MPTCP_IPV6) + if (addr_l.family == AF_INET && ipv6_addr_v4mapped(&addr_r.addr6)) { + ipv6_addr_set_v4mapped(addr_l.addr.s_addr, &addr_l.addr6); + addr_l.family = AF_INET6; + } + if (addr_r.family == AF_INET && ipv6_addr_v4mapped(&addr_l.addr6)) { + ipv6_addr_set_v4mapped(addr_r.addr.s_addr, &addr_r.addr6); + addr_r.family = AF_INET6; + } +#endif if (addr_l.family != addr_r.family) { GENL_SET_ERR_MSG(info, "address families do not match"); err = -EINVAL;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 5b49c41ac8f27aa3a63a1712b1f54f91015c18f2 upstream.
After the 'Fixes' commit mentioned below, the client side might print the following warning once when a subflow is fully established at the reception of any valid additional ack:
MPTCP: bogus mpc option on established client sk
That's a normal situation, and no warning should be printed for that. We can then skip the check when the label is used.
Fixes: e4a0fa47e816 ("mptcp: corner case locking for rx path fields initialization") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-3-162... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/options.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -981,10 +981,10 @@ static bool check_fully_established(stru if (mp_opt->deny_join_id0) WRITE_ONCE(msk->pm.remote_deny_join_id0, true);
-set_fully_established: if (unlikely(!READ_ONCE(msk->pm.server_side))) pr_warn_once("bogus mpc option on established client sk");
+set_fully_established: mptcp_data_lock((struct sock *)msk); __mptcp_subflow_fully_established(msk, subflow, mp_opt); mptcp_data_unlock((struct sock *)msk);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit b9cd26f640a308ea314ad23532de9a8592cd09d2 upstream.
when inserting not contiguous data in the subflow write queue, the protocol creates a new skb and prevent the TCP stack from merging it later with already queued skbs by setting the EOR marker.
Still no push flag is explicitly set at the end of previous GSO packet, making the aggregation on the receiver side sub-optimal - and packetdrill self-tests less predictable.
Explicitly mark the end of not contiguous DSS with the push flag.
Fixes: 6d0060f600ad ("mptcp: Write MPTCP DSS headers to outgoing data packets") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-4-162... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1274,6 +1274,7 @@ static int mptcp_sendmsg_frag(struct soc mpext = mptcp_get_ext(skb); if (!mptcp_skb_can_collapse_to(data_seq, skb, mpext)) { TCP_SKB_CB(skb)->eor = 1; + tcp_mark_push(tcp_sk(ssk), skb); goto alloc_skb; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang tanggeliang@kylinos.cn
commit 9480f388a2ef54fba911d9325372abd69a328601 upstream.
Commands 'ss -M' are used in script mptcp_join.sh to display only MPTCP sockets. So it must be checked if ss tool supports MPTCP in this script.
Fixes: e274f7154008 ("selftests: mptcp: add subflow limits test-cases") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-7-162... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 5 +++++ 1 file changed, 5 insertions(+)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -159,6 +159,11 @@ check_tools() exit $ksft_skip fi
+ if ! ss -h | grep -q MPTCP; then + echo "SKIP: ss tool does not support MPTCP" + exit $ksft_skip + fi + # Use the legacy version if available to support old kernel versions if iptables-legacy -V &> /dev/null; then iptables="iptables-legacy"
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit adf1bb78dab55e36d4d557aa2fb446ebcfe9e5ce upstream.
Such value should be inherited from the first subflow, but passive sockets always used 'rsk_rcv_wnd'.
Fixes: 6f8a612a33e4 ("mptcp: keep track of advertised windows right edge") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-5-162... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -3225,7 +3225,7 @@ struct sock *mptcp_sk_clone_init(const s msk->write_seq = subflow_req->idsn + 1; msk->snd_nxt = msk->write_seq; msk->snd_una = msk->write_seq; - msk->wnd_end = msk->snd_nxt + req->rsk_rcv_wnd; + msk->wnd_end = msk->snd_nxt + tcp_sk(ssk)->snd_wnd; msk->setsockopt_seq = mptcp_sk(sk)->setsockopt_seq; mptcp_init_sched(msk, mptcp_sk(sk)->sched);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit b111d8fbd2cbc63e05f3adfbbe0d4df655dfcc5b upstream.
After the blamed commit below, the send buffer auto-tuning can happen after that the mptcp_propagate_sndbuf() completes - via the delegated action infrastructure.
We must check for write space even after such change or we risk missing the wake-up event.
Fixes: 8005184fd1ca ("mptcp: refactor sndbuf auto-tuning") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-6-162... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.h | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-)
--- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -790,6 +790,16 @@ static inline bool mptcp_data_fin_enable READ_ONCE(msk->write_seq) == READ_ONCE(msk->snd_nxt); }
+static inline void mptcp_write_space(struct sock *sk) +{ + if (sk_stream_is_writeable(sk)) { + /* pairs with memory barrier in mptcp_poll */ + smp_mb(); + if (test_and_clear_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags)) + sk_stream_write_space(sk); + } +} + static inline void __mptcp_sync_sndbuf(struct sock *sk) { struct mptcp_subflow_context *subflow; @@ -808,6 +818,7 @@ static inline void __mptcp_sync_sndbuf(s
/* the msk max wmem limit is <nr_subflows> * tcp wmem[2] */ WRITE_ONCE(sk->sk_sndbuf, new_sndbuf); + mptcp_write_space(sk); }
/* The called held both the msk socket and the subflow socket locks, @@ -838,16 +849,6 @@ static inline void mptcp_propagate_sndbu local_bh_enable(); }
-static inline void mptcp_write_space(struct sock *sk) -{ - if (sk_stream_is_writeable(sk)) { - /* pairs with memory barrier in mptcp_poll */ - smp_mb(); - if (test_and_clear_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags)) - sk_stream_write_space(sk); - } -} - void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags);
#define MPTCP_TOKEN_MAX_RETRIES 4
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Davide Caratti dcaratti@redhat.com
commit 10048689def7e40a4405acda16fdc6477d4ecc5c upstream.
when MPTCP server accepts an incoming connection, it clones its listener socket. However, the pointer to 'inet_opt' for the new socket has the same value as the original one: as a consequence, on program exit it's possible to observe the following splat:
BUG: KASAN: double-free in inet_sock_destruct+0x54f/0x8b0 Free of addr ffff888485950880 by task swapper/25/0
CPU: 25 PID: 0 Comm: swapper/25 Kdump: loaded Not tainted 6.8.0-rc1+ #609 Hardware name: Supermicro SYS-6027R-72RF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0 07/26/2013 Call Trace: <IRQ> dump_stack_lvl+0x32/0x50 print_report+0xca/0x620 kasan_report_invalid_free+0x64/0x90 __kasan_slab_free+0x1aa/0x1f0 kfree+0xed/0x2e0 inet_sock_destruct+0x54f/0x8b0 __sk_destruct+0x48/0x5b0 rcu_do_batch+0x34e/0xd90 rcu_core+0x559/0xac0 __do_softirq+0x183/0x5a4 irq_exit_rcu+0x12d/0x170 sysvec_apic_timer_interrupt+0x6b/0x80 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 RIP: 0010:cpuidle_enter_state+0x175/0x300 Code: 30 00 0f 84 1f 01 00 00 83 e8 01 83 f8 ff 75 e5 48 83 c4 18 44 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc fb 45 85 ed <0f> 89 60 ff ff ff 48 c1 e5 06 48 c7 43 18 00 00 00 00 48 83 44 2b RSP: 0018:ffff888481cf7d90 EFLAGS: 00000202 RAX: 0000000000000000 RBX: ffff88887facddc8 RCX: 0000000000000000 RDX: 1ffff1110ff588b1 RSI: 0000000000000019 RDI: ffff88887fac4588 RBP: 0000000000000004 R08: 0000000000000002 R09: 0000000000043080 R10: 0009b02ea273363f R11: ffff88887fabf42b R12: ffffffff932592e0 R13: 0000000000000004 R14: 0000000000000000 R15: 00000022c880ec80 cpuidle_enter+0x4a/0xa0 do_idle+0x310/0x410 cpu_startup_entry+0x51/0x60 start_secondary+0x211/0x270 secondary_startup_64_no_verify+0x184/0x18b </TASK>
Allocated by task 6853: kasan_save_stack+0x1c/0x40 kasan_save_track+0x10/0x30 __kasan_kmalloc+0xa6/0xb0 __kmalloc+0x1eb/0x450 cipso_v4_sock_setattr+0x96/0x360 netlbl_sock_setattr+0x132/0x1f0 selinux_netlbl_socket_post_create+0x6c/0x110 selinux_socket_post_create+0x37b/0x7f0 security_socket_post_create+0x63/0xb0 __sock_create+0x305/0x450 __sys_socket_create.part.23+0xbd/0x130 __sys_socket+0x37/0xb0 __x64_sys_socket+0x6f/0xb0 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x6e/0x76
Freed by task 6858: kasan_save_stack+0x1c/0x40 kasan_save_track+0x10/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x12c/0x1f0 kfree+0xed/0x2e0 inet_sock_destruct+0x54f/0x8b0 __sk_destruct+0x48/0x5b0 subflow_ulp_release+0x1f0/0x250 tcp_cleanup_ulp+0x6e/0x110 tcp_v4_destroy_sock+0x5a/0x3a0 inet_csk_destroy_sock+0x135/0x390 tcp_fin+0x416/0x5c0 tcp_data_queue+0x1bc8/0x4310 tcp_rcv_state_process+0x15a3/0x47b0 tcp_v4_do_rcv+0x2c1/0x990 tcp_v4_rcv+0x41fb/0x5ed0 ip_protocol_deliver_rcu+0x6d/0x9f0 ip_local_deliver_finish+0x278/0x360 ip_local_deliver+0x182/0x2c0 ip_rcv+0xb5/0x1c0 __netif_receive_skb_one_core+0x16e/0x1b0 process_backlog+0x1e3/0x650 __napi_poll+0xa6/0x500 net_rx_action+0x740/0xbb0 __do_softirq+0x183/0x5a4
The buggy address belongs to the object at ffff888485950880 which belongs to the cache kmalloc-64 of size 64 The buggy address is located 0 bytes inside of 64-byte region [ffff888485950880, ffff8884859508c0)
The buggy address belongs to the physical page: page:0000000056d1e95e refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888485950700 pfn:0x485950 flags: 0x57ffffc0000800(slab|node=1|zone=2|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 0057ffffc0000800 ffff88810004c640 ffffea00121b8ac0 dead000000000006 raw: ffff888485950700 0000000000200019 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff888485950780: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff888485950800: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff888485950880: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
^ ffff888485950900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff888485950980: 00 00 00 00 00 01 fc fc fc fc fc fc fc fc fc fc
Something similar (a refcount underflow) happens with CALIPSO/IPv6. Fix this by duplicating IP / IPv6 options after clone, so that ip{,6}_sock_destruct() doesn't end up freeing the same memory area twice.
Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections") Cc: stable@vger.kernel.org Signed-off-by: Davide Caratti dcaratti@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-8-162... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -3192,8 +3192,50 @@ static struct ipv6_pinfo *mptcp_inet6_sk
return (struct ipv6_pinfo *)(((u8 *)sk) + offset); } + +static void mptcp_copy_ip6_options(struct sock *newsk, const struct sock *sk) +{ + const struct ipv6_pinfo *np = inet6_sk(sk); + struct ipv6_txoptions *opt; + struct ipv6_pinfo *newnp; + + newnp = inet6_sk(newsk); + + rcu_read_lock(); + opt = rcu_dereference(np->opt); + if (opt) { + opt = ipv6_dup_options(newsk, opt); + if (!opt) + net_warn_ratelimited("%s: Failed to copy ip6 options\n", __func__); + } + RCU_INIT_POINTER(newnp->opt, opt); + rcu_read_unlock(); +} #endif
+static void mptcp_copy_ip_options(struct sock *newsk, const struct sock *sk) +{ + struct ip_options_rcu *inet_opt, *newopt = NULL; + const struct inet_sock *inet = inet_sk(sk); + struct inet_sock *newinet; + + newinet = inet_sk(newsk); + + rcu_read_lock(); + inet_opt = rcu_dereference(inet->inet_opt); + if (inet_opt) { + newopt = sock_kmalloc(newsk, sizeof(*inet_opt) + + inet_opt->opt.optlen, GFP_ATOMIC); + if (newopt) + memcpy(newopt, inet_opt, sizeof(*inet_opt) + + inet_opt->opt.optlen); + else + net_warn_ratelimited("%s: Failed to copy ip options\n", __func__); + } + RCU_INIT_POINTER(newinet->inet_opt, newopt); + rcu_read_unlock(); +} + struct sock *mptcp_sk_clone_init(const struct sock *sk, const struct mptcp_options_received *mp_opt, struct sock *ssk, @@ -3214,6 +3256,13 @@ struct sock *mptcp_sk_clone_init(const s
__mptcp_init_sock(nsk);
+#if IS_ENABLED(CONFIG_MPTCP_IPV6) + if (nsk->sk_family == AF_INET6) + mptcp_copy_ip6_options(nsk, sk); + else +#endif + mptcp_copy_ip_options(nsk, sk); + msk = mptcp_sk(nsk); msk->local_key = subflow_req->local_key; msk->token = subflow_req->token;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit d6a9608af9a75d13243d217f6ce1e30e57d56ffe upstream.
Syzbot and Eric reported a lockdep splat in the subflow diag:
WARNING: possible circular locking dependency detected 6.8.0-rc4-syzkaller-00212-g40b9385dd8e6 #0 Not tainted
syz-executor.2/24141 is trying to acquire lock: ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline] ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137
but task is already holding lock: ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_diag_dump_icsk+0x39f/0x1f80 net/ipv4/inet_diag.c:1038
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&h->lhash2[i].lock){+.+.}-{2:2}: lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __inet_hash+0x335/0xbe0 net/ipv4/inet_hashtables.c:743 inet_csk_listen_start+0x23a/0x320 net/ipv4/inet_connection_sock.c:1261 __inet_listen_sk+0x2a2/0x770 net/ipv4/af_inet.c:217 inet_listen+0xa3/0x110 net/ipv4/af_inet.c:239 rds_tcp_listen_init+0x3fd/0x5a0 net/rds/tcp_listen.c:316 rds_tcp_init_net+0x141/0x320 net/rds/tcp.c:577 ops_init+0x352/0x610 net/core/net_namespace.c:136 __register_pernet_operations net/core/net_namespace.c:1214 [inline] register_pernet_operations+0x2cb/0x660 net/core/net_namespace.c:1283 register_pernet_device+0x33/0x80 net/core/net_namespace.c:1370 rds_tcp_init+0x62/0xd0 net/rds/tcp.c:735 do_one_initcall+0x238/0x830 init/main.c:1236 do_initcall_level+0x157/0x210 init/main.c:1298 do_initcalls+0x3f/0x80 init/main.c:1314 kernel_init_freeable+0x42f/0x5d0 init/main.c:1551 kernel_init+0x1d/0x2a0 init/main.c:1441 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
-> #0 (k-sk_lock-AF_INET6){+.+.}-{0:0}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 lock_sock_fast include/net/sock.h:1723 [inline] subflow_get_info+0x166/0xd20 net/mptcp/diag.c:28 tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline] tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137 inet_sk_diag_fill+0x10ed/0x1e00 net/ipv4/inet_diag.c:345 inet_diag_dump_icsk+0x55b/0x1f80 net/ipv4/inet_diag.c:1061 __inet_diag_dump+0x211/0x3a0 net/ipv4/inet_diag.c:1263 inet_diag_dump_compat+0x1c1/0x2d0 net/ipv4/inet_diag.c:1371 netlink_dump+0x59b/0xc80 net/netlink/af_netlink.c:2264 __netlink_dump_start+0x5df/0x790 net/netlink/af_netlink.c:2370 netlink_dump_start include/linux/netlink.h:338 [inline] inet_diag_rcv_msg_compat+0x209/0x4c0 net/ipv4/inet_diag.c:1405 sock_diag_rcv_msg+0xe7/0x410 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367 netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667 do_syscall_64+0xf9/0x240 entry_SYSCALL_64_after_hwframe+0x6f/0x77
As noted by Eric we can break the lock dependency chain avoid dumping any extended info for the mptcp subflow listener: nothing actually useful is presented there.
Fixes: b8adb69a7d29 ("mptcp: fix lockless access in subflow ULP diag") Cc: stable@vger.kernel.org Reported-by: Eric Dumazet edumazet@google.com Closes: https://lore.kernel.org/netdev/CANn89iJ=Oecw6OZDwmSYc9HJKQ_G32uN11L+oUcMu+TO... Suggested-by: Eric Dumazet edumazet@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-9-162... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/diag.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/mptcp/diag.c +++ b/net/mptcp/diag.c @@ -21,6 +21,9 @@ static int subflow_get_info(struct sock bool slow; int err;
+ if (inet_sk_state_load(sk) == TCP_LISTEN) + return 0; + start = nla_nest_start_noflag(skb, INET_ULP_INFO_MPTCP); if (!start) return -EMSGSIZE;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Ujfalusi peter.ujfalusi@gmail.com
commit 7a29fa05aeca2c16193f00a883c56ffc7c25b6c5 upstream.
The core twl chip is probed via i2c and the dev->driver->of_match_table is NULL, causing the driver to fail to probe.
This partially reverts:
commit 1e0c866887f4 ("mfd: Use device_get_match_data() in a bunch of drivers")
Fixes: 1e0c866887f4 ("mfd: Use device_get_match_data() in a bunch of drivers") Signed-off-by: Peter Ujfalusi peter.ujfalusi@gmail.com Link: https://lore.kernel.org/r/20231029114843.15553-1-peter.ujfalusi@gmail.com Signed-off-by: Lee Jones lee@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mfd/twl6030-irq.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/mfd/twl6030-irq.c +++ b/drivers/mfd/twl6030-irq.c @@ -24,10 +24,10 @@ #include <linux/kthread.h> #include <linux/mfd/twl.h> #include <linux/platform_device.h> -#include <linux/property.h> #include <linux/suspend.h> #include <linux/of.h> #include <linux/irqdomain.h> +#include <linux/of_device.h>
#include "twl-core.h"
@@ -368,10 +368,10 @@ int twl6030_init_irq(struct device *dev, int nr_irqs; int status; u8 mask[3]; - const int *irq_tbl; + const struct of_device_id *of_id;
- irq_tbl = device_get_match_data(dev); - if (!irq_tbl) { + of_id = of_match_device(twl6030_of_match, dev); + if (!of_id || !of_id->data) { dev_err(dev, "Unknown TWL device model\n"); return -EINVAL; } @@ -409,7 +409,7 @@ int twl6030_init_irq(struct device *dev,
twl6030_irq->pm_nb.notifier_call = twl6030_irq_pm_notifier; atomic_set(&twl6030_irq->wakeirqs, 0); - twl6030_irq->irq_mapping_tbl = irq_tbl; + twl6030_irq->irq_mapping_tbl = of_id->data;
twl6030_irq->irq_domain = irq_domain_add_linear(node, nr_irqs,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
when AOP_WRITEPAGE_ACTIVATE is returned (as NFS does when it detects congestion) it is important that the folio is redirtied. nfs_writepage_locked() doesn't do this, so files can become corrupted as writes can be lost.
Note that this is not needed in v6.8 as AOP_WRITEPAGE_ACTIVATE cannot be returned. It is needed for kernels v5.18..v6.7. Prior to 6.3 the patch is different as it needs to mention "page", not "folio".
Reported-and-tested-by: Jacek Tomaka Jacek.Tomaka@poczta.fm Fixes: 6df25e58532b ("nfs: remove reliance on bdi congestion") Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfs/write.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -668,8 +668,10 @@ static int nfs_writepage_locked(struct f int err;
if (wbc->sync_mode == WB_SYNC_NONE && - NFS_SERVER(inode)->write_congested) + NFS_SERVER(inode)->write_congested) { + folio_redirty_for_writepage(wbc, folio); return AOP_WRITEPAGE_ACTIVATE; + }
nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE); nfs_pageio_init_write(&pgio, inode, 0, false,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
commit 25236c91b5ab4a26a56ba2e79b8060cf4e047839 upstream.
syzbot reported a task hung; at the same time, GC was looping infinitely in list_for_each_entry_safe() for OOB skb. [0]
syzbot demonstrated that the list_for_each_entry_safe() was not actually safe in this case.
A single skb could have references for multiple sockets. If we free such a skb in the list_for_each_entry_safe(), the current and next sockets could be unlinked in a single iteration.
unix_notinflight() uses list_del_init() to unlink the socket, so the prefetched next socket forms a loop itself and list_for_each_entry_safe() never stops.
Here, we must use while() and make sure we always fetch the first socket.
[0]: Sending NMI from CPU 0 to CPUs 1: NMI backtrace for cpu 1 CPU: 1 PID: 5065 Comm: syz-executor236 Not tainted 6.8.0-rc3-syzkaller-00136-g1f719a2f3fa6 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:26 [inline] RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline] RIP: 0010:__sanitizer_cov_trace_pc+0xd/0x60 kernel/kcov.c:207 Code: cc cc cc cc 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 65 48 8b 14 25 40 c2 03 00 <65> 8b 05 b4 7c 78 7e a9 00 01 ff 00 48 8b 34 24 74 0f f6 c4 01 74 RSP: 0018:ffffc900033efa58 EFLAGS: 00000283 RAX: ffff88807b077800 RBX: ffff88807b077800 RCX: 1ffffffff27b1189 RDX: ffff88802a5a3b80 RSI: ffffffff8968488d RDI: ffff88807b077f70 RBP: ffffc900033efbb0 R08: 0000000000000001 R09: fffffbfff27a900c R10: ffffffff93d48067 R11: ffffffff8ae000eb R12: ffff88807b077800 R13: dffffc0000000000 R14: ffff88807b077e40 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000564f4fc1e3a8 CR3: 000000000d57a000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <NMI> </NMI> <TASK> unix_gc+0x563/0x13b0 net/unix/garbage.c:319 unix_release_sock+0xa93/0xf80 net/unix/af_unix.c:683 unix_release+0x91/0xf0 net/unix/af_unix.c:1064 __sock_release+0xb0/0x270 net/socket.c:659 sock_close+0x1c/0x30 net/socket.c:1421 __fput+0x270/0xb80 fs/file_table.c:376 task_work_run+0x14f/0x250 kernel/task_work.c:180 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xa8a/0x2ad0 kernel/exit.c:871 do_group_exit+0xd4/0x2a0 kernel/exit.c:1020 __do_sys_exit_group kernel/exit.c:1031 [inline] __se_sys_exit_group kernel/exit.c:1029 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1029 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xd5/0x270 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7f9d6cbdac09 Code: Unable to access opcode bytes at 0x7f9d6cbdabdf. RSP: 002b:00007fff5952feb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9d6cbdac09 RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000 RBP: 00007f9d6cc552b0 R08: ffffffffffffffb8 R09: 0000000000000006 R10: 0000000000000006 R11: 0000000000000246 R12: 00007f9d6cc552b0 R13: 0000000000000000 R14: 00007f9d6cc55d00 R15: 00007f9d6cbabe70 </TASK>
Reported-by: syzbot+4fa4a2d1f5a5ee06f006@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4fa4a2d1f5a5ee06f006 Fixes: 1279f9d9dec2 ("af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240209220453.96053-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/unix/garbage.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
--- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -315,10 +315,11 @@ void unix_gc(void) __skb_queue_purge(&hitlist);
#if IS_ENABLED(CONFIG_AF_UNIX_OOB) - list_for_each_entry_safe(u, next, &gc_candidates, link) { - struct sk_buff *skb = u->oob_skb; + while (!list_empty(&gc_candidates)) { + u = list_entry(gc_candidates.next, struct unix_sock, link); + if (u->oob_skb) { + struct sk_buff *skb = u->oob_skb;
- if (skb) { u->oob_skb = NULL; kfree_skb(skb); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
commit aa82ac51d63328714645c827775d64dbfd9941f3 upstream.
syzbot reported another task hung in __unix_gc(). [0]
The current while loop assumes that all of the left candidates have oob_skb and calling kfree_skb(oob_skb) releases the remaining candidates.
However, I missed a case that oob_skb has self-referencing fd and another fd and the latter sk is placed before the former in the candidate list. Then, the while loop never proceeds, resulting the task hung.
__unix_gc() has the same loop just before purging the collected skb, so we can call kfree_skb(oob_skb) there and let __skb_queue_purge() release all inflight sockets.
[0]: Sending NMI from CPU 0 to CPUs 1: NMI backtrace for cpu 1 CPU: 1 PID: 2784 Comm: kworker/u4:8 Not tainted 6.8.0-rc4-syzkaller-01028-g71b605d32017 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Workqueue: events_unbound __unix_gc RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x70 kernel/kcov.c:200 Code: 89 fb e8 23 00 00 00 48 8b 3d 84 f5 1a 0c 48 89 de 5b e9 43 26 57 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <f3> 0f 1e fa 48 8b 04 24 65 48 8b 0d 90 52 70 7e 65 8b 15 91 52 70 RSP: 0018:ffffc9000a17fa78 EFLAGS: 00000287 RAX: ffffffff8a0a6108 RBX: ffff88802b6c2640 RCX: ffff88802c0b3b80 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000 RBP: ffffc9000a17fbf0 R08: ffffffff89383f1d R09: 1ffff1100ee5ff84 R10: dffffc0000000000 R11: ffffed100ee5ff85 R12: 1ffff110056d84ee R13: ffffc9000a17fae0 R14: 0000000000000000 R15: ffffffff8f47b840 FS: 0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffef5687ff8 CR3: 0000000029b34000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <NMI> </NMI> <TASK> __unix_gc+0xe69/0xf40 net/unix/garbage.c:343 process_one_work kernel/workqueue.c:2633 [inline] process_scheduled_works+0x913/0x1420 kernel/workqueue.c:2706 worker_thread+0xa5f/0x1000 kernel/workqueue.c:2787 kthread+0x2ef/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242 </TASK>
Reported-and-tested-by: syzbot+ecab4d36f920c3574bf9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ecab4d36f920c3574bf9 Fixes: 25236c91b5ab ("af_unix: Fix task hung while purging oob_skb in GC.") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/unix/garbage.c | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-)
--- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -284,9 +284,17 @@ void unix_gc(void) * which are creating the cycle(s). */ skb_queue_head_init(&hitlist); - list_for_each_entry(u, &gc_candidates, link) + list_for_each_entry(u, &gc_candidates, link) { scan_children(&u->sk, inc_inflight, &hitlist);
+#if IS_ENABLED(CONFIG_AF_UNIX_OOB) + if (u->oob_skb) { + kfree_skb(u->oob_skb); + u->oob_skb = NULL; + } +#endif + } + /* not_cycle_list contains those sockets which do not make up a * cycle. Restore these to the inflight list. */ @@ -314,18 +322,6 @@ void unix_gc(void) /* Here we are. Hitlist is filled. Die. */ __skb_queue_purge(&hitlist);
-#if IS_ENABLED(CONFIG_AF_UNIX_OOB) - while (!list_empty(&gc_candidates)) { - u = list_entry(gc_candidates.next, struct unix_sock, link); - if (u->oob_skb) { - struct sk_buff *skb = u->oob_skb; - - u->oob_skb = NULL; - kfree_skb(skb); - } - } -#endif - spin_lock(&unix_gc_lock);
/* There could be io_uring registered files, just push them back to
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
commit 4703b014f28bf7a2e56d1da238ee95ef6c5ce76b upstream.
It looks like the "!" character was added accidentally. The regmap_update_bits_check() function is normally going to succeed. This means the rest of the function is unreachable and we don't handle the situation where "changed" is true correctly.
Fixes: 07f7d6e7a124 ("ASoC: cs35l56: Fix for initializing ASP1 mixer registers") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: Richard Fitzgerald rf@opensource.cirrus.com Link: https://lore.kernel.org/r/0c254c07-d1c0-4a5c-a22b-7e135cab032c@moroto.mounta... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/codecs/cs35l56.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/soc/codecs/cs35l56.c +++ b/sound/soc/codecs/cs35l56.c @@ -175,7 +175,7 @@ static int cs35l56_dspwait_asp1tx_put(st
ret = regmap_update_bits_check(cs35l56->base.regmap, addr, CS35L56_ASP_TXn_SRC_MASK, val, &changed); - if (!ret) + if (ret) return ret;
if (changed)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kory Maincent kory.maincent@bootlin.com
[ Upstream commit cd665bfc757c71e9b7e0abff0f362d8abd38a805 ]
The current check of ch_en enabled to know the maximum number of available hardware channels is wrong as it check the number of ch_en register set but all of them are unset at probe. This register is set at the dw_hdma_v0_core_start function which is run lately before a DMA transfer.
The HDMA IP have no way to know the number of hardware channels available like the eDMA IP, then let set it to maximum channels and let the platform set the right number of channels.
Fixes: e74c39573d35 ("dmaengine: dw-edma: Add support for native HDMA") Acked-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Reviewed-by: Serge Semin fancer.lancer@gmail.com Signed-off-by: Kory Maincent kory.maincent@bootlin.com Link: https://lore.kernel.org/r/20240129-b4-feature_hdma_mainline-v7-1-8e8c1acb7a4... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/dw-edma/dw-hdma-v0-core.c | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/dma/dw-edma/dw-hdma-v0-core.c b/drivers/dma/dw-edma/dw-hdma-v0-core.c index 00b735a0202ab..1f4cb7db54756 100644 --- a/drivers/dma/dw-edma/dw-hdma-v0-core.c +++ b/drivers/dma/dw-edma/dw-hdma-v0-core.c @@ -65,18 +65,12 @@ static void dw_hdma_v0_core_off(struct dw_edma *dw)
static u16 dw_hdma_v0_core_ch_count(struct dw_edma *dw, enum dw_edma_dir dir) { - u32 num_ch = 0; - int id; - - for (id = 0; id < HDMA_V0_MAX_NR_CH; id++) { - if (GET_CH_32(dw, id, dir, ch_en) & BIT(0)) - num_ch++; - } - - if (num_ch > HDMA_V0_MAX_NR_CH) - num_ch = HDMA_V0_MAX_NR_CH; - - return (u16)num_ch; + /* + * The HDMA IP have no way to know the number of hardware channels + * available, we set it to maximum channels and let the platform + * set the right number of channels. + */ + return HDMA_V0_MAX_NR_CH; }
static enum dma_status dw_hdma_v0_core_ch_status(struct dw_edma_chan *chan)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kory Maincent kory.maincent@bootlin.com
[ Upstream commit 7b52ba8616e978bf4f38f207f11a8176517244d0 ]
Instead of setting HDMA_V0_LOCAL_ABORT_INT_EN bit, HDMA_V0_LOCAL_STOP_INT_EN bit got set twice, due to which the abort interrupt is not getting generated for HDMA. Fix it by setting the correct interrupt enable bit.
Fixes: e74c39573d35 ("dmaengine: dw-edma: Add support for native HDMA") Reviewed-by: Serge Semin fancer.lancer@gmail.com Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Kory Maincent kory.maincent@bootlin.com Link: https://lore.kernel.org/r/20240129-b4-feature_hdma_mainline-v7-2-8e8c1acb7a4... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/dw-edma/dw-hdma-v0-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma/dw-edma/dw-hdma-v0-core.c b/drivers/dma/dw-edma/dw-hdma-v0-core.c index 1f4cb7db54756..108f9127aaaaf 100644 --- a/drivers/dma/dw-edma/dw-hdma-v0-core.c +++ b/drivers/dma/dw-edma/dw-hdma-v0-core.c @@ -236,7 +236,7 @@ static void dw_hdma_v0_core_start(struct dw_edma_chunk *chunk, bool first) /* Interrupt enable&unmask - done, abort */ tmp = GET_CH_32(dw, chan->dir, chan->id, int_setup) | HDMA_V0_STOP_INT_MASK | HDMA_V0_ABORT_INT_MASK | - HDMA_V0_LOCAL_STOP_INT_EN | HDMA_V0_LOCAL_STOP_INT_EN; + HDMA_V0_LOCAL_STOP_INT_EN | HDMA_V0_LOCAL_ABORT_INT_EN; SET_CH_32(dw, chan->dir, chan->id, int_setup, tmp); /* Channel control */ SET_CH_32(dw, chan->dir, chan->id, control1, HDMA_V0_LINKLIST_EN);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kory Maincent kory.maincent@bootlin.com
[ Upstream commit 930a8a015dcfde4b8906351ff081066dc277748c ]
Fix "HDMA_V0_REMOTEL_STOP_INT_EN" typo error
Fixes: e74c39573d35 ("dmaengine: dw-edma: Add support for native HDMA") Reviewed-by: Serge Semin fancer.lancer@gmail.com Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Kory Maincent kory.maincent@bootlin.com Link: https://lore.kernel.org/r/20240129-b4-feature_hdma_mainline-v7-3-8e8c1acb7a4... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/dw-edma/dw-hdma-v0-regs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma/dw-edma/dw-hdma-v0-regs.h b/drivers/dma/dw-edma/dw-hdma-v0-regs.h index a974abdf8aaf5..eab5fd7177e54 100644 --- a/drivers/dma/dw-edma/dw-hdma-v0-regs.h +++ b/drivers/dma/dw-edma/dw-hdma-v0-regs.h @@ -15,7 +15,7 @@ #define HDMA_V0_LOCAL_ABORT_INT_EN BIT(6) #define HDMA_V0_REMOTE_ABORT_INT_EN BIT(5) #define HDMA_V0_LOCAL_STOP_INT_EN BIT(4) -#define HDMA_V0_REMOTEL_STOP_INT_EN BIT(3) +#define HDMA_V0_REMOTE_STOP_INT_EN BIT(3) #define HDMA_V0_ABORT_INT_MASK BIT(2) #define HDMA_V0_STOP_INT_MASK BIT(0) #define HDMA_V0_LINKLIST_EN BIT(0)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kory Maincent kory.maincent@bootlin.com
[ Upstream commit e2f6a5789051ee9c632f27a12d0f01f0cbf78aac ]
Only the local interruption was configured, remote interrupt was left behind. This patch fix it by setting stop and abort remote interrupts when the DW_EDMA_CHIP_LOCAL flag is not set.
Fixes: e74c39573d35 ("dmaengine: dw-edma: Add support for native HDMA") Signed-off-by: Kory Maincent kory.maincent@bootlin.com Reviewed-by: Serge Semin fancer.lancer@gmail.com Acked-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Link: https://lore.kernel.org/r/20240129-b4-feature_hdma_mainline-v7-4-8e8c1acb7a4... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/dw-edma/dw-hdma-v0-core.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/dma/dw-edma/dw-hdma-v0-core.c b/drivers/dma/dw-edma/dw-hdma-v0-core.c index 108f9127aaaaf..04b0bcb6ded97 100644 --- a/drivers/dma/dw-edma/dw-hdma-v0-core.c +++ b/drivers/dma/dw-edma/dw-hdma-v0-core.c @@ -237,6 +237,8 @@ static void dw_hdma_v0_core_start(struct dw_edma_chunk *chunk, bool first) tmp = GET_CH_32(dw, chan->dir, chan->id, int_setup) | HDMA_V0_STOP_INT_MASK | HDMA_V0_ABORT_INT_MASK | HDMA_V0_LOCAL_STOP_INT_EN | HDMA_V0_LOCAL_ABORT_INT_EN; + if (!(dw->chip->flags & DW_EDMA_CHIP_LOCAL)) + tmp |= HDMA_V0_REMOTE_STOP_INT_EN | HDMA_V0_REMOTE_ABORT_INT_EN; SET_CH_32(dw, chan->dir, chan->id, int_setup, tmp); /* Channel control */ SET_CH_32(dw, chan->dir, chan->id, control1, HDMA_V0_LINKLIST_EN);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kory Maincent kory.maincent@bootlin.com
[ Upstream commit 712a92a48158e02155b4b6b21e03a817f78c9b7e ]
The Linked list element and pointer are not stored in the same memory as the HDMA controller register. If the doorbell register is toggled before the full write of the linked list a race condition error will occur. In remote setup we can only use a readl to the memory to assure the full write has occurred.
Fixes: e74c39573d35 ("dmaengine: dw-edma: Add support for native HDMA") Reviewed-by: Serge Semin fancer.lancer@gmail.com Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Kory Maincent kory.maincent@bootlin.com Link: https://lore.kernel.org/r/20240129-b4-feature_hdma_mainline-v7-5-8e8c1acb7a4... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/dw-edma/dw-hdma-v0-core.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/drivers/dma/dw-edma/dw-hdma-v0-core.c b/drivers/dma/dw-edma/dw-hdma-v0-core.c index 04b0bcb6ded97..10e8f0715114f 100644 --- a/drivers/dma/dw-edma/dw-hdma-v0-core.c +++ b/drivers/dma/dw-edma/dw-hdma-v0-core.c @@ -222,6 +222,20 @@ static void dw_hdma_v0_core_write_chunk(struct dw_edma_chunk *chunk) dw_hdma_v0_write_ll_link(chunk, i, control, chunk->ll_region.paddr); }
+static void dw_hdma_v0_sync_ll_data(struct dw_edma_chunk *chunk) +{ + /* + * In case of remote HDMA engine setup, the DW PCIe RP/EP internal + * configuration registers and application memory are normally accessed + * over different buses. Ensure LL-data reaches the memory before the + * doorbell register is toggled by issuing the dummy-read from the remote + * LL memory in a hope that the MRd TLP will return only after the + * last MWr TLP is completed + */ + if (!(chunk->chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL)) + readl(chunk->ll_region.vaddr.io); +} + static void dw_hdma_v0_core_start(struct dw_edma_chunk *chunk, bool first) { struct dw_edma_chan *chan = chunk->chan; @@ -252,6 +266,9 @@ static void dw_hdma_v0_core_start(struct dw_edma_chunk *chunk, bool first) /* Set consumer cycle */ SET_CH_32(dw, chan->dir, chan->id, cycle_sync, HDMA_V0_CONSUMER_CYCLE_STAT | HDMA_V0_CONSUMER_CYCLE_BIT); + + dw_hdma_v0_sync_ll_data(chunk); + /* Doorbell */ SET_CH_32(dw, chan->dir, chan->id, doorbell, HDMA_V0_DOORBELL_START); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kory Maincent kory.maincent@bootlin.com
[ Upstream commit bbcc1c83f343e580c3aa1f2a8593343bf7b55bba ]
The Linked list element and pointer are not stored in the same memory as the eDMA controller register. If the doorbell register is toggled before the full write of the linked list a race condition error will occur. In remote setup we can only use a readl to the memory to assure the full write has occurred.
Fixes: 7e4b8a4fbe2c ("dmaengine: Add Synopsys eDMA IP version 0 support") Reviewed-by: Serge Semin fancer.lancer@gmail.com Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Kory Maincent kory.maincent@bootlin.com Link: https://lore.kernel.org/r/20240129-b4-feature_hdma_mainline-v7-6-8e8c1acb7a4... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/dw-edma/dw-edma-v0-core.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/drivers/dma/dw-edma/dw-edma-v0-core.c b/drivers/dma/dw-edma/dw-edma-v0-core.c index b38786f0ad799..b75fdaffad9a4 100644 --- a/drivers/dma/dw-edma/dw-edma-v0-core.c +++ b/drivers/dma/dw-edma/dw-edma-v0-core.c @@ -346,6 +346,20 @@ static void dw_edma_v0_core_write_chunk(struct dw_edma_chunk *chunk) dw_edma_v0_write_ll_link(chunk, i, control, chunk->ll_region.paddr); }
+static void dw_edma_v0_sync_ll_data(struct dw_edma_chunk *chunk) +{ + /* + * In case of remote eDMA engine setup, the DW PCIe RP/EP internal + * configuration registers and application memory are normally accessed + * over different buses. Ensure LL-data reaches the memory before the + * doorbell register is toggled by issuing the dummy-read from the remote + * LL memory in a hope that the MRd TLP will return only after the + * last MWr TLP is completed + */ + if (!(chunk->chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL)) + readl(chunk->ll_region.vaddr.io); +} + static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first) { struct dw_edma_chan *chan = chunk->chan; @@ -412,6 +426,9 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first) SET_CH_32(dw, chan->dir, chan->id, llp.msb, upper_32_bits(chunk->ll_region.paddr)); } + + dw_edma_v0_sync_ll_data(chunk); + /* Doorbell */ SET_RW_32(dw, chan->dir, doorbell, FIELD_PREP(EDMA_V0_DOORBELL_CH_MASK, chan->id));
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Stein alexander.stein@ew.tq-group.com
[ Upstream commit 7936378cb6d87073163130e1e1fc1e5f76a597cf ]
Devicetree spec lists only dashes as valid characters for alias names. Table 3.2: Valid characters for alias names, Devicee Specification, Release v0.4
Signed-off-by: Alexander Stein alexander.stein@ew.tq-group.com Fixes: 3fbae284887de ("phy: freescale: phy-fsl-imx8-mipi-dphy: Add i.MX8qxp LVDS PHY mode support") Link: https://lore.kernel.org/r/20240110093343.468810-1-alexander.stein@ew.tq-grou... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/phy/freescale/phy-fsl-imx8-mipi-dphy.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/phy/freescale/phy-fsl-imx8-mipi-dphy.c b/drivers/phy/freescale/phy-fsl-imx8-mipi-dphy.c index e625b32889bfc..0928a526e2ab3 100644 --- a/drivers/phy/freescale/phy-fsl-imx8-mipi-dphy.c +++ b/drivers/phy/freescale/phy-fsl-imx8-mipi-dphy.c @@ -706,7 +706,7 @@ static int mixel_dphy_probe(struct platform_device *pdev) return ret; }
- priv->id = of_alias_get_id(np, "mipi_dphy"); + priv->id = of_alias_get_id(np, "mipi-dphy"); if (priv->id < 0) { dev_err(dev, "Failed to get phy node alias id: %d\n", priv->id);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yang Yingliang yangyingliang@huawei.com
[ Upstream commit 95055beb067cb30f626fb10f7019737ca7681df0 ]
It should be 'qphy->vreg' passed to PTR_ERR() when devm_regulator_get() fails.
Fixes: 08e49af50701 ("phy: qcom: Introduce M31 USB PHY driver") Signed-off-by: Yang Yingliang yangyingliang@huawei.com Reviewed-by: Varadarajan Narayanan quic_varada@quicinc.com Link: https://lore.kernel.org/r/20230824091345.1072650-1-yangyingliang@huawei.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/phy/qualcomm/phy-qcom-m31.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/phy/qualcomm/phy-qcom-m31.c b/drivers/phy/qualcomm/phy-qcom-m31.c index c2590579190a9..03fb0d4b75d74 100644 --- a/drivers/phy/qualcomm/phy-qcom-m31.c +++ b/drivers/phy/qualcomm/phy-qcom-m31.c @@ -299,7 +299,7 @@ static int m31usb_phy_probe(struct platform_device *pdev)
qphy->vreg = devm_regulator_get(dev, "vdda-phy"); if (IS_ERR(qphy->vreg)) - return dev_err_probe(dev, PTR_ERR(qphy->phy), + return dev_err_probe(dev, PTR_ERR(qphy->vreg), "failed to get vreg\n");
phy_set_drvdata(qphy->phy, qphy);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Baryshkov dmitry.baryshkov@linaro.org
[ Upstream commit d4c08d8b23b22807c712208cd05cb047e92e7672 ]
The MSM8996 platform has registers setup different to the rest of QMP v3 USB platforms. It has PCS region at 0x600 and no PCS_MISC region, while other platforms have PCS region at 0x800 and PCS_MISC at 0x600. This results in the malfunctioning USB host on some of the platforms. The commit f74c35b630d4 ("phy: qcom-qmp-usb: fix register offsets for ipq8074/ipq6018") fixed the issue for IPQ platforms, but missed the SDM845 which has the same register layout.
To simplify future platform addition and to make the driver more future proof, rename qmp_usb_offsets_v3 to qmp_usb_offsets_v3_msm8996 (to mark its peculiarity), rename qmp_usb_offsets_ipq8074 to qmp_usb_offsets_v3 and use it for SDM845 platform.
Fixes: 2be22aae6b18 ("phy: qcom-qmp-usb: populate offsets configuration") Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Link: https://lore.kernel.org/r/20240213133824.2218916-1-dmitry.baryshkov@linaro.o... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/phy/qualcomm/phy-qcom-qmp-usb.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/phy/qualcomm/phy-qcom-qmp-usb.c b/drivers/phy/qualcomm/phy-qcom-qmp-usb.c index a3719719e2e0f..365f5d85847b8 100644 --- a/drivers/phy/qualcomm/phy-qcom-qmp-usb.c +++ b/drivers/phy/qualcomm/phy-qcom-qmp-usb.c @@ -1276,7 +1276,7 @@ static const char * const qmp_phy_vreg_l[] = { "vdda-phy", "vdda-pll", };
-static const struct qmp_usb_offsets qmp_usb_offsets_ipq8074 = { +static const struct qmp_usb_offsets qmp_usb_offsets_v3 = { .serdes = 0, .pcs = 0x800, .pcs_misc = 0x600, @@ -1292,7 +1292,7 @@ static const struct qmp_usb_offsets qmp_usb_offsets_ipq9574 = { .rx = 0x400, };
-static const struct qmp_usb_offsets qmp_usb_offsets_v3 = { +static const struct qmp_usb_offsets qmp_usb_offsets_v3_msm8996 = { .serdes = 0, .pcs = 0x600, .tx = 0x200, @@ -1328,7 +1328,7 @@ static const struct qmp_usb_offsets qmp_usb_offsets_v5 = { static const struct qmp_phy_cfg ipq6018_usb3phy_cfg = { .lanes = 1,
- .offsets = &qmp_usb_offsets_ipq8074, + .offsets = &qmp_usb_offsets_v3,
.serdes_tbl = ipq9574_usb3_serdes_tbl, .serdes_tbl_num = ARRAY_SIZE(ipq9574_usb3_serdes_tbl), @@ -1346,7 +1346,7 @@ static const struct qmp_phy_cfg ipq6018_usb3phy_cfg = { static const struct qmp_phy_cfg ipq8074_usb3phy_cfg = { .lanes = 1,
- .offsets = &qmp_usb_offsets_ipq8074, + .offsets = &qmp_usb_offsets_v3,
.serdes_tbl = ipq8074_usb3_serdes_tbl, .serdes_tbl_num = ARRAY_SIZE(ipq8074_usb3_serdes_tbl), @@ -1382,7 +1382,7 @@ static const struct qmp_phy_cfg ipq9574_usb3phy_cfg = { static const struct qmp_phy_cfg msm8996_usb3phy_cfg = { .lanes = 1,
- .offsets = &qmp_usb_offsets_v3, + .offsets = &qmp_usb_offsets_v3_msm8996,
.serdes_tbl = msm8996_usb3_serdes_tbl, .serdes_tbl_num = ARRAY_SIZE(msm8996_usb3_serdes_tbl),
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fenghua Yu fenghua.yu@intel.com
[ Upstream commit ecec7c9f29a7114a3e23a14020b1149ea7dffb4f ]
head is defined in idxd->evl as a shadow of head in the EVLSTATUS register. There are two issues related to the shadow head:
1. Mismatch between the shadow head and the state of the EVLSTATUS register: If Event Log is supported, upon completion of the Enable Device command, the Event Log head in the variable idxd->evl->head should be cleared to match the state of the EVLSTATUS register. But the variable is not reset currently, leading mismatch between the variable and the register state. The mismatch causes incorrect processing of Event Log entries.
2. Unnecessary shadow head definition: The shadow head is unnecessary as head can be read directly from the EVLSTATUS register. Reading head from the register incurs no additional cost because event log head and tail are always read together and tail is already read directly from the register as required by hardware.
Remove the shadow Event Log head stored in idxd->evl to address the mentioned issues.
Fixes: 244da66cda35 ("dmaengine: idxd: setup event log configuration") Signed-off-by: Fenghua Yu fenghua.yu@intel.com Reviewed-by: Dave Jiang dave.jiang@intel.com Link: https://lore.kernel.org/r/20240215024931.1739621-1-fenghua.yu@intel.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/idxd/cdev.c | 2 +- drivers/dma/idxd/debugfs.c | 2 +- drivers/dma/idxd/idxd.h | 1 - drivers/dma/idxd/irq.c | 3 +-- 4 files changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c index 0423655f5a880..3dd25a9a04f18 100644 --- a/drivers/dma/idxd/cdev.c +++ b/drivers/dma/idxd/cdev.c @@ -345,7 +345,7 @@ static void idxd_cdev_evl_drain_pasid(struct idxd_wq *wq, u32 pasid) spin_lock(&evl->lock); status.bits = ioread64(idxd->reg_base + IDXD_EVLSTATUS_OFFSET); t = status.tail; - h = evl->head; + h = status.head; size = evl->size;
while (h != t) { diff --git a/drivers/dma/idxd/debugfs.c b/drivers/dma/idxd/debugfs.c index 9cfbd9b14c4c4..f3f25ee676f30 100644 --- a/drivers/dma/idxd/debugfs.c +++ b/drivers/dma/idxd/debugfs.c @@ -68,9 +68,9 @@ static int debugfs_evl_show(struct seq_file *s, void *d)
spin_lock(&evl->lock);
- h = evl->head; evl_status.bits = ioread64(idxd->reg_base + IDXD_EVLSTATUS_OFFSET); t = evl_status.tail; + h = evl_status.head; evl_size = evl->size;
seq_printf(s, "Event Log head %u tail %u interrupt pending %u\n\n", diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 1e89c80a07fc2..96062ae39f9a2 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -290,7 +290,6 @@ struct idxd_evl { unsigned int log_size; /* The number of entries in the event log. */ u16 size; - u16 head; unsigned long *bmap; bool batch_fail[IDXD_MAX_BATCH_IDENT]; }; diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c index 2183d7f9cdbdd..3bdfc1797f620 100644 --- a/drivers/dma/idxd/irq.c +++ b/drivers/dma/idxd/irq.c @@ -367,9 +367,9 @@ static void process_evl_entries(struct idxd_device *idxd) /* Clear interrupt pending bit */ iowrite32(evl_status.bits_upper32, idxd->reg_base + IDXD_EVLSTATUS_OFFSET + sizeof(u32)); - h = evl->head; evl_status.bits = ioread64(idxd->reg_base + IDXD_EVLSTATUS_OFFSET); t = evl_status.tail; + h = evl_status.head; size = idxd->evl->size;
while (h != t) { @@ -378,7 +378,6 @@ static void process_evl_entries(struct idxd_device *idxd) h = (h + 1) % size; }
- evl->head = h; evl_status.head = h; iowrite32(evl_status.bits_lower32, idxd->reg_base + IDXD_EVLSTATUS_OFFSET); spin_unlock(&evl->lock);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fenghua Yu fenghua.yu@intel.com
[ Upstream commit d3ea125df37dc37972d581b74a5d3785c3f283ab ]
If CONFIG_HARDENED_USERCOPY is enabled, copying completion record from event log cache to user triggers a kernel bug.
[ 1987.159822] usercopy: Kernel memory exposure attempt detected from SLUB object 'dsa0' (offset 74, size 31)! [ 1987.170845] ------------[ cut here ]------------ [ 1987.176086] kernel BUG at mm/usercopy.c:102! [ 1987.180946] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 1987.186866] CPU: 17 PID: 528 Comm: kworker/17:1 Not tainted 6.8.0-rc2+ #5 [ 1987.194537] Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023 [ 1987.206405] Workqueue: wq0.0 idxd_evl_fault_work [idxd] [ 1987.212338] RIP: 0010:usercopy_abort+0x72/0x90 [ 1987.217381] Code: 58 65 9c 50 48 c7 c2 17 85 61 9c 57 48 c7 c7 98 fd 6b 9c 48 0f 44 d6 48 c7 c6 b3 08 62 9c 4c 89 d1 49 0f 44 f3 e8 1e 2e d5 ff <0f> 0b 49 c7 c1 9e 42 61 9c 4c 89 cf 4d 89 c8 eb a9 66 66 2e 0f 1f [ 1987.238505] RSP: 0018:ff62f5cf20607d60 EFLAGS: 00010246 [ 1987.244423] RAX: 000000000000005f RBX: 000000000000001f RCX: 0000000000000000 [ 1987.252480] RDX: 0000000000000000 RSI: ffffffff9c61429e RDI: 00000000ffffffff [ 1987.260538] RBP: ff62f5cf20607d78 R08: ff2a6a89ef3fffe8 R09: 00000000fffeffff [ 1987.268595] R10: ff2a6a89eed00000 R11: 0000000000000003 R12: ff2a66934849c89a [ 1987.276652] R13: 0000000000000001 R14: ff2a66934849c8b9 R15: ff2a66934849c899 [ 1987.284710] FS: 0000000000000000(0000) GS:ff2a66b22fe40000(0000) knlGS:0000000000000000 [ 1987.293850] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1987.300355] CR2: 00007fe291a37000 CR3: 000000010fbd4005 CR4: 0000000000f71ef0 [ 1987.308413] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1987.316470] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 1987.324527] PKRU: 55555554 [ 1987.327622] Call Trace: [ 1987.330424] <TASK> [ 1987.332826] ? show_regs+0x6e/0x80 [ 1987.336703] ? die+0x3c/0xa0 [ 1987.339988] ? do_trap+0xd4/0xf0 [ 1987.343662] ? do_error_trap+0x75/0xa0 [ 1987.347922] ? usercopy_abort+0x72/0x90 [ 1987.352277] ? exc_invalid_op+0x57/0x80 [ 1987.356634] ? usercopy_abort+0x72/0x90 [ 1987.360988] ? asm_exc_invalid_op+0x1f/0x30 [ 1987.365734] ? usercopy_abort+0x72/0x90 [ 1987.370088] __check_heap_object+0xb7/0xd0 [ 1987.374739] __check_object_size+0x175/0x2d0 [ 1987.379588] idxd_copy_cr+0xa9/0x130 [idxd] [ 1987.384341] idxd_evl_fault_work+0x127/0x390 [idxd] [ 1987.389878] process_one_work+0x13e/0x300 [ 1987.394435] ? __pfx_worker_thread+0x10/0x10 [ 1987.399284] worker_thread+0x2f7/0x420 [ 1987.403544] ? _raw_spin_unlock_irqrestore+0x2b/0x50 [ 1987.409171] ? __pfx_worker_thread+0x10/0x10 [ 1987.414019] kthread+0x107/0x140 [ 1987.417693] ? __pfx_kthread+0x10/0x10 [ 1987.421954] ret_from_fork+0x3d/0x60 [ 1987.426019] ? __pfx_kthread+0x10/0x10 [ 1987.430281] ret_from_fork_asm+0x1b/0x30 [ 1987.434744] </TASK>
The issue arises because event log cache is created using kmem_cache_create() which is not suitable for user copy.
Fix the issue by creating event log cache with kmem_cache_create_usercopy(), ensuring safe user copy.
Fixes: c2f156bf168f ("dmaengine: idxd: create kmem cache for event log fault items") Reported-by: Tony Zhu tony.zhu@intel.com Tested-by: Tony Zhu tony.zhu@intel.com Signed-off-by: Fenghua Yu fenghua.yu@intel.com Reviewed-by: Lijun Pan lijun.pan@intel.com Reviewed-by: Dave Jiang dave.jiang@intel.com Link: https://lore.kernel.org/r/20240209191412.1050270-1-fenghua.yu@intel.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/idxd/init.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 0eb1c827a215f..d09a8553ea71d 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -342,7 +342,9 @@ static void idxd_cleanup_internals(struct idxd_device *idxd) static int idxd_init_evl(struct idxd_device *idxd) { struct device *dev = &idxd->pdev->dev; + unsigned int evl_cache_size; struct idxd_evl *evl; + const char *idxd_name;
if (idxd->hw.gen_cap.evl_support == 0) return 0; @@ -354,9 +356,16 @@ static int idxd_init_evl(struct idxd_device *idxd) spin_lock_init(&evl->lock); evl->size = IDXD_EVL_SIZE_MIN;
- idxd->evl_cache = kmem_cache_create(dev_name(idxd_confdev(idxd)), - sizeof(struct idxd_evl_fault) + evl_ent_size(idxd), - 0, 0, NULL); + idxd_name = dev_name(idxd_confdev(idxd)); + evl_cache_size = sizeof(struct idxd_evl_fault) + evl_ent_size(idxd); + /* + * Since completion record in evl_cache will be copied to user + * when handling completion record page fault, need to create + * the cache suitable for user copy. + */ + idxd->evl_cache = kmem_cache_create_usercopy(idxd_name, evl_cache_size, + 0, 0, 0, evl_cache_size, + NULL); if (!idxd->evl_cache) { kfree(evl); return -ENOMEM;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gaurav Batra gbatra@linux.vnet.ibm.com
[ Upstream commit 09a3c1e46142199adcee372a420b024b4fc61051 ]
When kdump kernel tries to copy dump data over SR-IOV, LPAR panics due to NULL pointer exception:
Kernel attempted to read user page (0) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000000 Faulting instruction address: 0xc000000020847ad4 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: mlx5_core(+) vmx_crypto pseries_wdt papr_scm libnvdimm mlxfw tls psample sunrpc fuse overlay squashfs loop CPU: 12 PID: 315 Comm: systemd-udevd Not tainted 6.4.0-Test102+ #12 Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_008) hv:phyp pSeries NIP: c000000020847ad4 LR: c00000002083b2dc CTR: 00000000006cd18c REGS: c000000029162ca0 TRAP: 0300 Not tainted (6.4.0-Test102+) MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 48288244 XER: 00000008 CFAR: c00000002083b2d8 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1 ... NIP _find_next_zero_bit+0x24/0x110 LR bitmap_find_next_zero_area_off+0x5c/0xe0 Call Trace: dev_printk_emit+0x38/0x48 (unreliable) iommu_area_alloc+0xc4/0x180 iommu_range_alloc+0x1e8/0x580 iommu_alloc+0x60/0x130 iommu_alloc_coherent+0x158/0x2b0 dma_iommu_alloc_coherent+0x3c/0x50 dma_alloc_attrs+0x170/0x1f0 mlx5_cmd_init+0xc0/0x760 [mlx5_core] mlx5_function_setup+0xf0/0x510 [mlx5_core] mlx5_init_one+0x84/0x210 [mlx5_core] probe_one+0x118/0x2c0 [mlx5_core] local_pci_probe+0x68/0x110 pci_call_probe+0x68/0x200 pci_device_probe+0xbc/0x1a0 really_probe+0x104/0x540 __driver_probe_device+0xb4/0x230 driver_probe_device+0x54/0x130 __driver_attach+0x158/0x2b0 bus_for_each_dev+0xa8/0x130 driver_attach+0x34/0x50 bus_add_driver+0x16c/0x300 driver_register+0xa4/0x1b0 __pci_register_driver+0x68/0x80 mlx5_init+0xb8/0x100 [mlx5_core] do_one_initcall+0x60/0x300 do_init_module+0x7c/0x2b0
At the time of LPAR dump, before kexec hands over control to kdump kernel, DDWs (Dynamic DMA Windows) are scanned and added to the FDT. For the SR-IOV case, default DMA window "ibm,dma-window" is removed from the FDT and DDW added, for the device.
Now, kexec hands over control to the kdump kernel.
When the kdump kernel initializes, PCI busses are scanned and IOMMU group/tables created, in pci_dma_bus_setup_pSeriesLP(). For the SR-IOV case, there is no "ibm,dma-window". The original commit: b1fc44eaa9ba, fixes the path where memory is pre-mapped (direct mapped) to the DDW. When TCEs are direct mapped, there is no need to initialize IOMMU tables.
iommu_table_setparms_lpar() only considers "ibm,dma-window" property when initiallizing IOMMU table. In the scenario where TCEs are dynamically allocated for SR-IOV, newly created IOMMU table is not initialized. Later, when the device driver tries to enter TCEs for the SR-IOV device, NULL pointer execption is thrown from iommu_area_alloc().
The fix is to initialize the IOMMU table with DDW property stored in the FDT. There are 2 points to remember:
1. For the dedicated adapter, kdump kernel would encounter both default and DDW in FDT. In this case, DDW property is used to initialize the IOMMU table.
2. A DDW could be direct or dynamic mapped. kdump kernel would initialize IOMMU table and mark the existing DDW as "dynamic". This works fine since, at the time of table initialization, iommu_table_clear() makes some space in the DDW, for some predefined number of TCEs which are needed for kdump to succeed.
Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window") Signed-off-by: Gaurav Batra gbatra@linux.vnet.ibm.com Reviewed-by: Brian King brking@linux.vnet.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20240125203017.61014-1-gbatra@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/platforms/pseries/iommu.c | 156 +++++++++++++++++-------- 1 file changed, 105 insertions(+), 51 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index 496e16c588aaa..e8c4129697b14 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -574,29 +574,6 @@ static void iommu_table_setparms(struct pci_controller *phb,
struct iommu_table_ops iommu_table_lpar_multi_ops;
-/* - * iommu_table_setparms_lpar - * - * Function: On pSeries LPAR systems, return TCE table info, given a pci bus. - */ -static void iommu_table_setparms_lpar(struct pci_controller *phb, - struct device_node *dn, - struct iommu_table *tbl, - struct iommu_table_group *table_group, - const __be32 *dma_window) -{ - unsigned long offset, size, liobn; - - of_parse_dma_window(dn, dma_window, &liobn, &offset, &size); - - iommu_table_setparms_common(tbl, phb->bus->number, liobn, offset, size, IOMMU_PAGE_SHIFT_4K, NULL, - &iommu_table_lpar_multi_ops); - - - table_group->tce32_start = offset; - table_group->tce32_size = size; -} - struct iommu_table_ops iommu_table_pseries_ops = { .set = tce_build_pSeries, .clear = tce_free_pSeries, @@ -724,26 +701,71 @@ struct iommu_table_ops iommu_table_lpar_multi_ops = { * dynamic 64bit DMA window, walking up the device tree. */ static struct device_node *pci_dma_find(struct device_node *dn, - const __be32 **dma_window) + struct dynamic_dma_window_prop *prop) { - const __be32 *dw = NULL; + const __be32 *default_prop = NULL; + const __be32 *ddw_prop = NULL; + struct device_node *rdn = NULL; + bool default_win = false, ddw_win = false;
for ( ; dn && PCI_DN(dn); dn = dn->parent) { - dw = of_get_property(dn, "ibm,dma-window", NULL); - if (dw) { - if (dma_window) - *dma_window = dw; - return dn; + default_prop = of_get_property(dn, "ibm,dma-window", NULL); + if (default_prop) { + rdn = dn; + default_win = true; + } + ddw_prop = of_get_property(dn, DIRECT64_PROPNAME, NULL); + if (ddw_prop) { + rdn = dn; + ddw_win = true; + break; + } + ddw_prop = of_get_property(dn, DMA64_PROPNAME, NULL); + if (ddw_prop) { + rdn = dn; + ddw_win = true; + break; } - dw = of_get_property(dn, DIRECT64_PROPNAME, NULL); - if (dw) - return dn; - dw = of_get_property(dn, DMA64_PROPNAME, NULL); - if (dw) - return dn; + + /* At least found default window, which is the case for normal boot */ + if (default_win) + break; }
- return NULL; + /* For PCI devices there will always be a DMA window, either on the device + * or parent bus + */ + WARN_ON(!(default_win | ddw_win)); + + /* caller doesn't want to get DMA window property */ + if (!prop) + return rdn; + + /* parse DMA window property. During normal system boot, only default + * DMA window is passed in OF. But, for kdump, a dedicated adapter might + * have both default and DDW in FDT. In this scenario, DDW takes precedence + * over default window. + */ + if (ddw_win) { + struct dynamic_dma_window_prop *p; + + p = (struct dynamic_dma_window_prop *)ddw_prop; + prop->liobn = p->liobn; + prop->dma_base = p->dma_base; + prop->tce_shift = p->tce_shift; + prop->window_shift = p->window_shift; + } else if (default_win) { + unsigned long offset, size, liobn; + + of_parse_dma_window(rdn, default_prop, &liobn, &offset, &size); + + prop->liobn = cpu_to_be32((u32)liobn); + prop->dma_base = cpu_to_be64(offset); + prop->tce_shift = cpu_to_be32(IOMMU_PAGE_SHIFT_4K); + prop->window_shift = cpu_to_be32(order_base_2(size)); + } + + return rdn; }
static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus) @@ -751,17 +773,20 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus) struct iommu_table *tbl; struct device_node *dn, *pdn; struct pci_dn *ppci; - const __be32 *dma_window = NULL; + struct dynamic_dma_window_prop prop;
dn = pci_bus_to_OF_node(bus);
pr_debug("pci_dma_bus_setup_pSeriesLP: setting up bus %pOF\n", dn);
- pdn = pci_dma_find(dn, &dma_window); + pdn = pci_dma_find(dn, &prop);
- if (dma_window == NULL) - pr_debug(" no ibm,dma-window property !\n"); + /* In PPC architecture, there will always be DMA window on bus or one of the + * parent bus. During reboot, there will be ibm,dma-window property to + * define DMA window. For kdump, there will at least be default window or DDW + * or both. + */
ppci = PCI_DN(pdn);
@@ -771,13 +796,24 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus) if (!ppci->table_group) { ppci->table_group = iommu_pseries_alloc_group(ppci->phb->node); tbl = ppci->table_group->tables[0]; - if (dma_window) { - iommu_table_setparms_lpar(ppci->phb, pdn, tbl, - ppci->table_group, dma_window);
- if (!iommu_init_table(tbl, ppci->phb->node, 0, 0)) - panic("Failed to initialize iommu table"); - } + iommu_table_setparms_common(tbl, ppci->phb->bus->number, + be32_to_cpu(prop.liobn), + be64_to_cpu(prop.dma_base), + 1ULL << be32_to_cpu(prop.window_shift), + be32_to_cpu(prop.tce_shift), NULL, + &iommu_table_lpar_multi_ops); + + /* Only for normal boot with default window. Doesn't matter even + * if we set these with DDW which is 64bit during kdump, since + * these will not be used during kdump. + */ + ppci->table_group->tce32_start = be64_to_cpu(prop.dma_base); + ppci->table_group->tce32_size = 1 << be32_to_cpu(prop.window_shift); + + if (!iommu_init_table(tbl, ppci->phb->node, 0, 0)) + panic("Failed to initialize iommu table"); + iommu_register_group(ppci->table_group, pci_domain_nr(bus), 0); pr_debug(" created table: %p\n", ppci->table_group); @@ -968,6 +1004,12 @@ static void find_existing_ddw_windows_named(const char *name) continue; }
+ /* If at the time of system initialization, there are DDWs in OF, + * it means this is during kexec. DDW could be direct or dynamic. + * We will just mark DDWs as "dynamic" since this is kdump path, + * no need to worry about perforance. ddw_list_new_entry() will + * set window->direct = false. + */ window = ddw_list_new_entry(pdn, dma64); if (!window) { of_node_put(pdn); @@ -1524,8 +1566,8 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev) { struct device_node *pdn, *dn; struct iommu_table *tbl; - const __be32 *dma_window = NULL; struct pci_dn *pci; + struct dynamic_dma_window_prop prop;
pr_debug("pci_dma_dev_setup_pSeriesLP: %s\n", pci_name(dev));
@@ -1538,7 +1580,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev) dn = pci_device_to_OF_node(dev); pr_debug(" node is %pOF\n", dn);
- pdn = pci_dma_find(dn, &dma_window); + pdn = pci_dma_find(dn, &prop); if (!pdn || !PCI_DN(pdn)) { printk(KERN_WARNING "pci_dma_dev_setup_pSeriesLP: " "no DMA window found for pci dev=%s dn=%pOF\n", @@ -1551,8 +1593,20 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev) if (!pci->table_group) { pci->table_group = iommu_pseries_alloc_group(pci->phb->node); tbl = pci->table_group->tables[0]; - iommu_table_setparms_lpar(pci->phb, pdn, tbl, - pci->table_group, dma_window); + + iommu_table_setparms_common(tbl, pci->phb->bus->number, + be32_to_cpu(prop.liobn), + be64_to_cpu(prop.dma_base), + 1ULL << be32_to_cpu(prop.window_shift), + be32_to_cpu(prop.tce_shift), NULL, + &iommu_table_lpar_multi_ops); + + /* Only for normal boot with default window. Doesn't matter even + * if we set these with DDW which is 64bit during kdump, since + * these will not be used during kdump. + */ + pci->table_group->tce32_start = be64_to_cpu(prop.dma_base); + pci->table_group->tce32_size = 1 << be32_to_cpu(prop.window_shift);
iommu_init_table(tbl, pci->phb->node, 0, 0); iommu_register_group(pci->table_group,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Lynch nathanl@linux.ibm.com
[ Upstream commit fad87dbd48156ab940538f052f1820f4b6ed2819 ]
The PAPR spec spells the function name as
"ibm,reset-pe-dma-windows"
but in practice firmware uses the singular form:
"ibm,reset-pe-dma-window"
in the device tree. Since we have the wrong spelling in the RTAS function table, reverse lookups (token -> name) fail and warn:
unexpected failed lookup for token 86 WARNING: CPU: 1 PID: 545 at arch/powerpc/kernel/rtas.c:659 __do_enter_rtas_trace+0x2a4/0x2b4 CPU: 1 PID: 545 Comm: systemd-udevd Not tainted 6.8.0-rc4 #30 Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NL1060_028) hv:phyp pSeries NIP [c0000000000417f0] __do_enter_rtas_trace+0x2a4/0x2b4 LR [c0000000000417ec] __do_enter_rtas_trace+0x2a0/0x2b4 Call Trace: __do_enter_rtas_trace+0x2a0/0x2b4 (unreliable) rtas_call+0x1f8/0x3e0 enable_ddw.constprop.0+0x4d0/0xc84 dma_iommu_dma_supported+0xe8/0x24c dma_set_mask+0x5c/0xd8 mlx5_pci_init.constprop.0+0xf0/0x46c [mlx5_core] probe_one+0xfc/0x32c [mlx5_core] local_pci_probe+0x68/0x12c pci_call_probe+0x68/0x1ec pci_device_probe+0xbc/0x1a8 really_probe+0x104/0x570 __driver_probe_device+0xb8/0x224 driver_probe_device+0x54/0x130 __driver_attach+0x158/0x2b0 bus_for_each_dev+0xa8/0x120 driver_attach+0x34/0x48 bus_add_driver+0x174/0x304 driver_register+0x8c/0x1c4 __pci_register_driver+0x68/0x7c mlx5_init+0xb8/0x118 [mlx5_core] do_one_initcall+0x60/0x388 do_init_module+0x7c/0x2a4 init_module_from_file+0xb4/0x108 idempotent_init_module+0x184/0x34c sys_finit_module+0x90/0x114
And oopses are possible when lockdep is enabled or the RTAS tracepoints are active, since those paths dereference the result of the lookup.
Use the correct spelling to match firmware's behavior, adjusting the related constants to match.
Signed-off-by: Nathan Lynch nathanl@linux.ibm.com Fixes: 8252b88294d2 ("powerpc/rtas: improve function information lookups") Reported-by: Gaurav Batra gbatra@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20240222-rtas-fix-ibm-reset-pe-dma-window-v1-1-7aaf235ac6... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/rtas.h | 4 ++-- arch/powerpc/kernel/rtas.c | 9 +++++++-- 2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h index c697c3c746946..33024a2874a69 100644 --- a/arch/powerpc/include/asm/rtas.h +++ b/arch/powerpc/include/asm/rtas.h @@ -68,7 +68,7 @@ enum rtas_function_index { RTAS_FNIDX__IBM_READ_SLOT_RESET_STATE, RTAS_FNIDX__IBM_READ_SLOT_RESET_STATE2, RTAS_FNIDX__IBM_REMOVE_PE_DMA_WINDOW, - RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOWS, + RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOW, RTAS_FNIDX__IBM_SCAN_LOG_DUMP, RTAS_FNIDX__IBM_SET_DYNAMIC_INDICATOR, RTAS_FNIDX__IBM_SET_EEH_OPTION, @@ -163,7 +163,7 @@ typedef struct { #define RTAS_FN_IBM_READ_SLOT_RESET_STATE rtas_fn_handle(RTAS_FNIDX__IBM_READ_SLOT_RESET_STATE) #define RTAS_FN_IBM_READ_SLOT_RESET_STATE2 rtas_fn_handle(RTAS_FNIDX__IBM_READ_SLOT_RESET_STATE2) #define RTAS_FN_IBM_REMOVE_PE_DMA_WINDOW rtas_fn_handle(RTAS_FNIDX__IBM_REMOVE_PE_DMA_WINDOW) -#define RTAS_FN_IBM_RESET_PE_DMA_WINDOWS rtas_fn_handle(RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOWS) +#define RTAS_FN_IBM_RESET_PE_DMA_WINDOW rtas_fn_handle(RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOW) #define RTAS_FN_IBM_SCAN_LOG_DUMP rtas_fn_handle(RTAS_FNIDX__IBM_SCAN_LOG_DUMP) #define RTAS_FN_IBM_SET_DYNAMIC_INDICATOR rtas_fn_handle(RTAS_FNIDX__IBM_SET_DYNAMIC_INDICATOR) #define RTAS_FN_IBM_SET_EEH_OPTION rtas_fn_handle(RTAS_FNIDX__IBM_SET_EEH_OPTION) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 87d65bdd3ecae..46b9476d75824 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -310,8 +310,13 @@ static struct rtas_function rtas_function_table[] __ro_after_init = { [RTAS_FNIDX__IBM_REMOVE_PE_DMA_WINDOW] = { .name = "ibm,remove-pe-dma-window", }, - [RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOWS] = { - .name = "ibm,reset-pe-dma-windows", + [RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOW] = { + /* + * Note: PAPR+ v2.13 7.3.31.4.1 spells this as + * "ibm,reset-pe-dma-windows" (plural), but RTAS + * implementations use the singular form in practice. + */ + .name = "ibm,reset-pe-dma-window", }, [RTAS_FNIDX__IBM_SCAN_LOG_DUMP] = { .name = "ibm,scan-log-dump",
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arturas Moskvinas arturas.moskvinas@gmail.com
[ Upstream commit 530b1dbd97846b110ea8a94c7cc903eca21786e5 ]
Chip outputs are enabled[1] before actual reset is performed[2] which might cause pin output value to flip flop if previous pin value was set to 1. Fix that behavior by making sure chip is fully reset before all outputs are enabled.
Flip-flop can be noticed when module is removed and inserted again and one of the pins was changed to 1 before removal. 100 microsecond flipping is noticeable on oscilloscope (100khz SPI bus).
For a properly reset chip - output is enabled around 100 microseconds (on 100khz SPI bus) later during probing process hence should be irrelevant behavioral change.
Fixes: 7ebc194d0fd4 (gpio: 74x164: Introduce 'enable-gpios' property) Link: https://elixir.bootlin.com/linux/v6.7.4/source/drivers/gpio/gpio-74x164.c#L1... [1] Link: https://elixir.bootlin.com/linux/v6.7.4/source/drivers/gpio/gpio-74x164.c#L1... [2] Signed-off-by: Arturas Moskvinas arturas.moskvinas@gmail.com Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-74x164.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpio/gpio-74x164.c b/drivers/gpio/gpio-74x164.c index e00c333105170..753e7be039e4d 100644 --- a/drivers/gpio/gpio-74x164.c +++ b/drivers/gpio/gpio-74x164.c @@ -127,8 +127,6 @@ static int gen_74x164_probe(struct spi_device *spi) if (IS_ERR(chip->gpiod_oe)) return PTR_ERR(chip->gpiod_oe);
- gpiod_set_value_cansleep(chip->gpiod_oe, 1); - spi_set_drvdata(spi, chip);
chip->gpio_chip.label = spi->modalias; @@ -153,6 +151,8 @@ static int gen_74x164_probe(struct spi_device *spi) goto exit_destroy; }
+ gpiod_set_value_cansleep(chip->gpiod_oe, 1); + ret = gpiochip_add_data(&chip->gpio_chip, chip); if (!ret) return 0;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
[ Upstream commit e4aec4daa8c009057b5e063db1b7322252c92dc8 ]
After shuffling the code, error path wasn't updated correctly. Fix it here.
Fixes: 2f4133bb5f14 ("gpiolib: No need to call gpiochip_remove_pin_ranges() twice") Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpiolib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 15de124d5b402..e8dc706fd7979 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -1003,11 +1003,11 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, gpiochip_irqchip_free_valid_mask(gc); err_remove_acpi_chip: acpi_gpiochip_remove(gc); + gpiochip_remove_pin_ranges(gc); err_remove_of_chip: gpiochip_free_hogs(gc); of_gpiochip_remove(gc); err_free_gpiochip_mask: - gpiochip_remove_pin_ranges(gc); gpiochip_free_valid_mask(gc); if (gdev->dev.release) { /* release() has been registered by gpiochip_setup_dev() */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bartosz Golaszewski bartosz.golaszewski@linaro.org
[ Upstream commit ec5c54a9d3c4f9c15e647b049fea401ee5258696 ]
Hogs are added *after* ACPI so should be removed *before* in error path.
Fixes: a411e81e61df ("gpiolib: add hogs support for machine code") Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpiolib.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index e8dc706fd7979..1d033106cf396 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -972,11 +972,11 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
ret = gpiochip_irqchip_init_valid_mask(gc); if (ret) - goto err_remove_acpi_chip; + goto err_free_hogs;
ret = gpiochip_irqchip_init_hw(gc); if (ret) - goto err_remove_acpi_chip; + goto err_remove_irqchip_mask;
ret = gpiochip_add_irqchip(gc, lock_key, request_key); if (ret) @@ -1001,11 +1001,11 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, gpiochip_irqchip_remove(gc); err_remove_irqchip_mask: gpiochip_irqchip_free_valid_mask(gc); -err_remove_acpi_chip: +err_free_hogs: + gpiochip_free_hogs(gc); acpi_gpiochip_remove(gc); gpiochip_remove_pin_ranges(gc); err_remove_of_chip: - gpiochip_free_hogs(gc); of_gpiochip_remove(gc); err_free_gpiochip_mask: gpiochip_free_valid_mask(gc);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ming Lei ming.lei@redhat.com
[ Upstream commit 7838b4656110d950afdd92a081cc0f33e23e0ea8 ]
In commit 19416123ab3e ("block: define 'struct bvec_iter' as packed"), what we need is to save the 4byte padding, and avoid `bio` to spread on one extra cache line.
It is enough to define it as '__packed __aligned(4)', as '__packed' alone means byte aligned, and can cause compiler to generate horrible code on architectures that don't support unaligned access in case that bvec_iter is embedded in other structures.
Cc: Mikulas Patocka mpatocka@redhat.com Suggested-by: Linus Torvalds torvalds@linux-foundation.org Fixes: 19416123ab3e ("block: define 'struct bvec_iter' as packed") Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/bvec.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/bvec.h b/include/linux/bvec.h index 555aae5448ae4..bd1e361b351c5 100644 --- a/include/linux/bvec.h +++ b/include/linux/bvec.h @@ -83,7 +83,7 @@ struct bvec_iter {
unsigned int bi_bvec_done; /* number of bytes completed in current bvec */ -} __packed; +} __packed __aligned(4);
struct bvec_iter_all { struct bio_vec bv;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pawan Gupta pawan.kumar.gupta@linux.intel.com
commit 3c7501722e6b31a6e56edd23cea5e77dbb9ffd1a upstream.
Mitigation for MDS is to use VERW instruction to clear any secrets in CPU Buffers. Any memory accesses after VERW execution can still remain in CPU buffers. It is safer to execute VERW late in return to user path to minimize the window in which kernel data can end up in CPU buffers. There are not many kernel secrets to be had after SWITCH_TO_USER_CR3.
Add support for deploying VERW mitigation after user register state is restored. This helps minimize the chances of kernel data ending up into CPU buffers after executing VERW.
Note that the mitigation at the new location is not yet enabled.
Corner case not handled ======================= Interrupts returning to kernel don't clear CPUs buffers since the exit-to-user path is expected to do that anyways. But, there could be a case when an NMI is generated in kernel after the exit-to-user path has cleared the buffers. This case is not handled and NMI returning to kernel don't clear CPU buffers because:
1. It is rare to get an NMI after VERW, but before returning to userspace. 2. For an unprivileged user, there is no known way to make that NMI less rare or target it. 3. It would take a large number of these precisely-timed NMIs to mount an actual attack. There's presumably not enough bandwidth. 4. The NMI in question occurs after a VERW, i.e. when user state is restored and most interesting data is already scrubbed. Whats left is only the data that NMI touches, and that may or may not be of any interest.
[ pawan: resolved conflict for hunk swapgs_restore_regs_and_return_to_usermode in backport ]
Suggested-by: Dave Hansen dave.hansen@intel.com Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Link: https://lore.kernel.org/all/20240213-delay-verw-v8-2-a6216d83edb7%40linux.in... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/entry/entry_64.S | 11 +++++++++++ arch/x86/entry/entry_64_compat.S | 1 + 2 files changed, 12 insertions(+)
--- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -161,6 +161,7 @@ syscall_return_via_sysret: SYM_INNER_LABEL(entry_SYSRETQ_unsafe_stack, SYM_L_GLOBAL) ANNOTATE_NOENDBR swapgs + CLEAR_CPU_BUFFERS sysretq SYM_INNER_LABEL(entry_SYSRETQ_end, SYM_L_GLOBAL) ANNOTATE_NOENDBR @@ -601,6 +602,7 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_ /* Restore RDI. */ popq %rdi swapgs + CLEAR_CPU_BUFFERS jmp .Lnative_iret
@@ -712,6 +714,8 @@ native_irq_return_ldt: */ popq %rax /* Restore user RAX */
+ CLEAR_CPU_BUFFERS + /* * RSP now points to an ordinary IRET frame, except that the page * is read-only and RSP[31:16] are preloaded with the userspace @@ -1439,6 +1443,12 @@ nmi_restore: movq $0, 5*8(%rsp) /* clear "NMI executing" */
/* + * Skip CLEAR_CPU_BUFFERS here, since it only helps in rare cases like + * NMI in kernel after user state is restored. For an unprivileged user + * these conditions are hard to meet. + */ + + /* * iretq reads the "iret" frame and exits the NMI stack in a * single instruction. We are returning to kernel mode, so this * cannot result in a fault. Similarly, we don't need to worry @@ -1455,6 +1465,7 @@ SYM_CODE_START(entry_SYSCALL32_ignore) UNWIND_HINT_END_OF_STACK ENDBR mov $-ENOSYS, %eax + CLEAR_CPU_BUFFERS sysretl SYM_CODE_END(entry_SYSCALL32_ignore)
--- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -270,6 +270,7 @@ SYM_INNER_LABEL(entry_SYSRETL_compat_uns xorl %r9d, %r9d xorl %r10d, %r10d swapgs + CLEAR_CPU_BUFFERS sysretl SYM_INNER_LABEL(entry_SYSRETL_compat_end, SYM_L_GLOBAL) ANNOTATE_NOENDBR
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pawan Gupta pawan.kumar.gupta@linux.intel.com
commit a0e2dab44d22b913b4c228c8b52b2a104434b0b3 upstream.
As done for entry_64, add support for executing VERW late in exit to user path for 32-bit mode.
Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Link: https://lore.kernel.org/all/20240213-delay-verw-v8-3-a6216d83edb7%40linux.in... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/entry/entry_32.S | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -885,6 +885,7 @@ SYM_FUNC_START(entry_SYSENTER_32) BUG_IF_WRONG_CR3 no_user_check=1 popfl popl %eax + CLEAR_CPU_BUFFERS
/* * Return back to the vDSO, which will pop ecx and edx. @@ -954,6 +955,7 @@ restore_all_switch_stack:
/* Restore user state */ RESTORE_REGS pop=4 # skip orig_eax/error_code + CLEAR_CPU_BUFFERS .Lirq_return: /* * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization @@ -1146,6 +1148,7 @@ SYM_CODE_START(asm_exc_nmi)
/* Not on SYSENTER stack. */ call exc_nmi + CLEAR_CPU_BUFFERS jmp .Lnmi_return
.Lnmi_from_sysenter_stack:
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pawan Gupta pawan.kumar.gupta@linux.intel.com
commit 6613d82e617dd7eb8b0c40b2fe3acea655b1d611 upstream.
The VERW mitigation at exit-to-user is enabled via a static branch mds_user_clear. This static branch is never toggled after boot, and can be safely replaced with an ALTERNATIVE() which is convenient to use in asm.
Switch to ALTERNATIVE() to use the VERW mitigation late in exit-to-user path. Also remove the now redundant VERW in exc_nmi() and arch_exit_to_user_mode().
Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Link: https://lore.kernel.org/all/20240213-delay-verw-v8-4-a6216d83edb7%40linux.in... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/arch/x86/mds.rst | 36 +++++++++++++++++++++++++---------- arch/x86/include/asm/entry-common.h | 1 arch/x86/include/asm/nospec-branch.h | 12 ----------- arch/x86/kernel/cpu/bugs.c | 15 +++++--------- arch/x86/kernel/nmi.c | 3 -- arch/x86/kvm/vmx/vmx.c | 2 - 6 files changed, 33 insertions(+), 36 deletions(-)
--- a/Documentation/arch/x86/mds.rst +++ b/Documentation/arch/x86/mds.rst @@ -95,6 +95,9 @@ The kernel provides a function to invoke
mds_clear_cpu_buffers()
+Also macro CLEAR_CPU_BUFFERS can be used in ASM late in exit-to-user path. +Other than CFLAGS.ZF, this macro doesn't clobber any registers. + The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state (idle) transitions.
@@ -138,17 +141,30 @@ Mitigation points
When transitioning from kernel to user space the CPU buffers are flushed on affected CPUs when the mitigation is not disabled on the kernel - command line. The migitation is enabled through the static key - mds_user_clear. + command line. The mitigation is enabled through the feature flag + X86_FEATURE_CLEAR_CPU_BUF.
- The mitigation is invoked in prepare_exit_to_usermode() which covers - all but one of the kernel to user space transitions. The exception - is when we return from a Non Maskable Interrupt (NMI), which is - handled directly in do_nmi(). - - (The reason that NMI is special is that prepare_exit_to_usermode() can - enable IRQs. In NMI context, NMIs are blocked, and we don't want to - enable IRQs with NMIs blocked.) + The mitigation is invoked just before transitioning to userspace after + user registers are restored. This is done to minimize the window in + which kernel data could be accessed after VERW e.g. via an NMI after + VERW. + + **Corner case not handled** + Interrupts returning to kernel don't clear CPUs buffers since the + exit-to-user path is expected to do that anyways. But, there could be + a case when an NMI is generated in kernel after the exit-to-user path + has cleared the buffers. This case is not handled and NMI returning to + kernel don't clear CPU buffers because: + + 1. It is rare to get an NMI after VERW, but before returning to userspace. + 2. For an unprivileged user, there is no known way to make that NMI + less rare or target it. + 3. It would take a large number of these precisely-timed NMIs to mount + an actual attack. There's presumably not enough bandwidth. + 4. The NMI in question occurs after a VERW, i.e. when user state is + restored and most interesting data is already scrubbed. Whats left + is only the data that NMI touches, and that may or may not be of + any interest.
2. C-State transition --- a/arch/x86/include/asm/entry-common.h +++ b/arch/x86/include/asm/entry-common.h @@ -91,7 +91,6 @@ static inline void arch_exit_to_user_mod
static __always_inline void arch_exit_to_user_mode(void) { - mds_user_clear_cpu_buffers(); amd_clear_divider(); } #define arch_exit_to_user_mode arch_exit_to_user_mode --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -540,7 +540,6 @@ DECLARE_STATIC_KEY_FALSE(switch_to_cond_ DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ibpb); DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
-DECLARE_STATIC_KEY_FALSE(mds_user_clear); DECLARE_STATIC_KEY_FALSE(mds_idle_clear);
DECLARE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush); @@ -575,17 +574,6 @@ static __always_inline void mds_clear_cp }
/** - * mds_user_clear_cpu_buffers - Mitigation for MDS and TAA vulnerability - * - * Clear CPU buffers if the corresponding static key is enabled - */ -static __always_inline void mds_user_clear_cpu_buffers(void) -{ - if (static_branch_likely(&mds_user_clear)) - mds_clear_cpu_buffers(); -} - -/** * mds_idle_clear_cpu_buffers - Mitigation for MDS vulnerability * * Clear CPU buffers if the corresponding static key is enabled --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -111,9 +111,6 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_i /* Control unconditional IBPB in switch_mm() */ DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
-/* Control MDS CPU buffer clear before returning to user space */ -DEFINE_STATIC_KEY_FALSE(mds_user_clear); -EXPORT_SYMBOL_GPL(mds_user_clear); /* Control MDS CPU buffer clear before idling (halt, mwait) */ DEFINE_STATIC_KEY_FALSE(mds_idle_clear); EXPORT_SYMBOL_GPL(mds_idle_clear); @@ -252,7 +249,7 @@ static void __init mds_select_mitigation if (!boot_cpu_has(X86_FEATURE_MD_CLEAR)) mds_mitigation = MDS_MITIGATION_VMWERV;
- static_branch_enable(&mds_user_clear); + setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
if (!boot_cpu_has(X86_BUG_MSBDS_ONLY) && (mds_nosmt || cpu_mitigations_auto_nosmt())) @@ -356,7 +353,7 @@ static void __init taa_select_mitigation * For guests that can't determine whether the correct microcode is * present on host, enable the mitigation for UCODE_NEEDED as well. */ - static_branch_enable(&mds_user_clear); + setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
if (taa_nosmt || cpu_mitigations_auto_nosmt()) cpu_smt_disable(false); @@ -424,7 +421,7 @@ static void __init mmio_select_mitigatio */ if (boot_cpu_has_bug(X86_BUG_MDS) || (boot_cpu_has_bug(X86_BUG_TAA) && boot_cpu_has(X86_FEATURE_RTM))) - static_branch_enable(&mds_user_clear); + setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF); else static_branch_enable(&mmio_stale_data_clear);
@@ -484,12 +481,12 @@ static void __init md_clear_update_mitig if (cpu_mitigations_off()) return;
- if (!static_key_enabled(&mds_user_clear)) + if (!boot_cpu_has(X86_FEATURE_CLEAR_CPU_BUF)) goto out;
/* - * mds_user_clear is now enabled. Update MDS, TAA and MMIO Stale Data - * mitigation, if necessary. + * X86_FEATURE_CLEAR_CPU_BUF is now enabled. Update MDS, TAA and MMIO + * Stale Data mitigation, if necessary. */ if (mds_mitigation == MDS_MITIGATION_OFF && boot_cpu_has_bug(X86_BUG_MDS)) { --- a/arch/x86/kernel/nmi.c +++ b/arch/x86/kernel/nmi.c @@ -563,9 +563,6 @@ nmi_restart: } if (this_cpu_dec_return(nmi_state)) goto nmi_restart; - - if (user_mode(regs)) - mds_user_clear_cpu_buffers(); }
#if IS_ENABLED(CONFIG_KVM_INTEL) --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7229,7 +7229,7 @@ static noinstr void vmx_vcpu_enter_exit( /* L1D Flush includes CPU buffer clear to mitigate MDS */ if (static_branch_unlikely(&vmx_l1d_should_flush)) vmx_l1d_flush(vcpu); - else if (static_branch_unlikely(&mds_user_clear)) + else if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF)) mds_clear_cpu_buffers(); else if (static_branch_unlikely(&mmio_stale_data_clear) && kvm_arch_has_assigned_device(vcpu->kvm))
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pawan Gupta pawan.kumar.gupta@linux.intel.com
From: Sean Christopherson seanjc@google.com
commit 706a189dcf74d3b3f955e9384785e726ed6c7c80 upstream.
Use EFLAGS.CF instead of EFLAGS.ZF to track whether to use VMRESUME versus VMLAUNCH. Freeing up EFLAGS.ZF will allow doing VERW, which clobbers ZF, for MDS mitigations as late as possible without needing to duplicate VERW for both paths.
Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Reviewed-by: Nikolay Borisov nik.borisov@suse.com Link: https://lore.kernel.org/all/20240213-delay-verw-v8-5-a6216d83edb7%40linux.in... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/run_flags.h | 7 +++++-- arch/x86/kvm/vmx/vmenter.S | 6 +++--- 2 files changed, 8 insertions(+), 5 deletions(-)
--- a/arch/x86/kvm/vmx/run_flags.h +++ b/arch/x86/kvm/vmx/run_flags.h @@ -2,7 +2,10 @@ #ifndef __KVM_X86_VMX_RUN_FLAGS_H #define __KVM_X86_VMX_RUN_FLAGS_H
-#define VMX_RUN_VMRESUME (1 << 0) -#define VMX_RUN_SAVE_SPEC_CTRL (1 << 1) +#define VMX_RUN_VMRESUME_SHIFT 0 +#define VMX_RUN_SAVE_SPEC_CTRL_SHIFT 1 + +#define VMX_RUN_VMRESUME BIT(VMX_RUN_VMRESUME_SHIFT) +#define VMX_RUN_SAVE_SPEC_CTRL BIT(VMX_RUN_SAVE_SPEC_CTRL_SHIFT)
#endif /* __KVM_X86_VMX_RUN_FLAGS_H */ --- a/arch/x86/kvm/vmx/vmenter.S +++ b/arch/x86/kvm/vmx/vmenter.S @@ -139,7 +139,7 @@ SYM_FUNC_START(__vmx_vcpu_run) mov (%_ASM_SP), %_ASM_AX
/* Check if vmlaunch or vmresume is needed */ - test $VMX_RUN_VMRESUME, %ebx + bt $VMX_RUN_VMRESUME_SHIFT, %ebx
/* Load guest registers. Don't clobber flags. */ mov VCPU_RCX(%_ASM_AX), %_ASM_CX @@ -161,8 +161,8 @@ SYM_FUNC_START(__vmx_vcpu_run) /* Load guest RAX. This kills the @regs pointer! */ mov VCPU_RAX(%_ASM_AX), %_ASM_AX
- /* Check EFLAGS.ZF from 'test VMX_RUN_VMRESUME' above */ - jz .Lvmlaunch + /* Check EFLAGS.CF from the VMX_RUN_VMRESUME bit test above. */ + jnc .Lvmlaunch
/* * After a successful VMRESUME/VMLAUNCH, control flow "magically"
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pawan Gupta pawan.kumar.gupta@linux.intel.com
commit 43fb862de8f628c5db5e96831c915b9aebf62d33 upstream.
During VMentry VERW is executed to mitigate MDS. After VERW, any memory access like register push onto stack may put host data in MDS affected CPU buffers. A guest can then use MDS to sample host data.
Although likelihood of secrets surviving in registers at current VERW callsite is less, but it can't be ruled out. Harden the MDS mitigation by moving the VERW mitigation late in VMentry path.
Note that VERW for MMIO Stale Data mitigation is unchanged because of the complexity of per-guest conditional VERW which is not easy to handle that late in asm with no GPRs available. If the CPU is also affected by MDS, VERW is unconditionally executed late in asm regardless of guest having MMIO access.
Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Acked-by: Sean Christopherson seanjc@google.com Link: https://lore.kernel.org/all/20240213-delay-verw-v8-6-a6216d83edb7%40linux.in... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/vmenter.S | 3 +++ arch/x86/kvm/vmx/vmx.c | 20 ++++++++++++++++---- 2 files changed, 19 insertions(+), 4 deletions(-)
--- a/arch/x86/kvm/vmx/vmenter.S +++ b/arch/x86/kvm/vmx/vmenter.S @@ -161,6 +161,9 @@ SYM_FUNC_START(__vmx_vcpu_run) /* Load guest RAX. This kills the @regs pointer! */ mov VCPU_RAX(%_ASM_AX), %_ASM_AX
+ /* Clobbers EFLAGS.ZF */ + CLEAR_CPU_BUFFERS + /* Check EFLAGS.CF from the VMX_RUN_VMRESUME bit test above. */ jnc .Lvmlaunch
--- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -387,7 +387,16 @@ static __always_inline void vmx_enable_f
static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx) { - vmx->disable_fb_clear = (host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) && + /* + * Disable VERW's behavior of clearing CPU buffers for the guest if the + * CPU isn't affected by MDS/TAA, and the host hasn't forcefully enabled + * the mitigation. Disabling the clearing behavior provides a + * performance boost for guests that aren't aware that manually clearing + * CPU buffers is unnecessary, at the cost of MSR accesses on VM-Entry + * and VM-Exit. + */ + vmx->disable_fb_clear = !cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF) && + (host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) && !boot_cpu_has_bug(X86_BUG_MDS) && !boot_cpu_has_bug(X86_BUG_TAA);
@@ -7226,11 +7235,14 @@ static noinstr void vmx_vcpu_enter_exit(
guest_state_enter_irqoff();
- /* L1D Flush includes CPU buffer clear to mitigate MDS */ + /* + * L1D Flush includes CPU buffer clear to mitigate MDS, but VERW + * mitigation for MDS is done late in VMentry and is still + * executed in spite of L1D Flush. This is because an extra VERW + * should not matter much after the big hammer L1D Flush. + */ if (static_branch_unlikely(&vmx_l1d_should_flush)) vmx_l1d_flush(vcpu); - else if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF)) - mds_clear_cpu_buffers(); else if (static_branch_unlikely(&mmio_stale_data_clear) && kvm_arch_has_assigned_device(vcpu->kvm)) mds_clear_cpu_buffers();
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang geliang.tang@suse.com
commit 06848c0f341ee3f9226ed01e519c72e4d2b6f001 upstream.
This patch adds a new helper get_info_value(), using 'sed' command to parse the value of the given item name in the line with the given keyword, to make chk_mptcp_info() and pedit_action_pkts() more readable.
Also add another helper evts_get_info() to use get_info_value() to parse the output of 'pm_nl_ctl events' command, to make all the userspace pm selftests more readable, both in mptcp_join.sh and userspace_pm.sh.
Reviewed-by: Matthieu Baerts matttbe@kernel.org Signed-off-by: Geliang Tang geliang.tang@suse.com Signed-off-by: Mat Martineau martineau@kernel.org Link: https://lore.kernel.org/r/20231128-send-net-next-2023107-v4-2-8d6b94150f6b@k... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 19 ++-- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 10 ++ tools/testing/selftests/net/mptcp/userspace_pm.sh | 86 +++++++++------------- 3 files changed, 57 insertions(+), 58 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -1845,10 +1845,8 @@ chk_mptcp_info()
print_check "mptcp_info ${info1:0:8}=$exp1:$exp2"
- cnt1=$(ss -N $ns1 -inmHM | grep "$info1:" | - sed -n 's/.*('"$info1"':)([[:digit:]]*).*$/\2/p;q') - cnt2=$(ss -N $ns2 -inmHM | grep "$info2:" | - sed -n 's/.*('"$info2"':)([[:digit:]]*).*$/\2/p;q') + cnt1=$(ss -N $ns1 -inmHM | mptcp_lib_get_info_value "$info1" "$info1") + cnt2=$(ss -N $ns2 -inmHM | mptcp_lib_get_info_value "$info2" "$info2") # 'ss' only display active connections and counters that are not 0. [ -z "$cnt1" ] && cnt1=0 [ -z "$cnt2" ] && cnt2=0 @@ -2824,13 +2822,13 @@ verify_listener_events() return fi
- type=$(grep "type:$e_type," $evt | sed -n 's/.*(type:)([[:digit:]]*).*$/\2/p;q') - family=$(grep "type:$e_type," $evt | sed -n 's/.*(family:)([[:digit:]]*).*$/\2/p;q') - sport=$(grep "type:$e_type," $evt | sed -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q') + type=$(mptcp_lib_evts_get_info type "$evt" "$e_type") + family=$(mptcp_lib_evts_get_info family "$evt" "$e_type") + sport=$(mptcp_lib_evts_get_info sport "$evt" "$e_type") if [ $family ] && [ $family = $AF_INET6 ]; then - saddr=$(grep "type:$e_type," $evt | sed -n 's/.*(saddr6:)([0-9a-f:.]*).*$/\2/p;q') + saddr=$(mptcp_lib_evts_get_info saddr6 "$evt" "$e_type") else - saddr=$(grep "type:$e_type," $evt | sed -n 's/.*(saddr4:)([0-9.]*).*$/\2/p;q') + saddr=$(mptcp_lib_evts_get_info saddr4 "$evt" "$e_type") fi
if [ $type ] && [ $type = $e_type ] && @@ -3225,8 +3223,7 @@ fastclose_tests() pedit_action_pkts() { tc -n $ns2 -j -s action show action pedit index 100 | \ - grep "packets" | \ - sed 's/.*"packets":([0-9]+),.*/\1/' + mptcp_lib_get_info_value "packets" packets }
fail_tests() --- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh @@ -208,6 +208,16 @@ mptcp_lib_result_print_all_tap() { done }
+# get the value of keyword $1 in the line marked by keyword $2 +mptcp_lib_get_info_value() { + grep "${2}" | sed -n 's/.*('"${1}"':)([0-9a-f:.]*).*$/\2/p;q' +} + +# $1: info name ; $2: evts_ns ; $3: event type +mptcp_lib_evts_get_info() { + mptcp_lib_get_info_value "${1}" "^type:${3:-1}," < "${2}" +} + # $1: PID mptcp_lib_kill_wait() { [ "${1}" -eq 0 ] && return 0 --- a/tools/testing/selftests/net/mptcp/userspace_pm.sh +++ b/tools/testing/selftests/net/mptcp/userspace_pm.sh @@ -238,14 +238,11 @@ make_connection() local server_token local server_serverside
- client_token=$(sed --unbuffered -n 's/.*(token:)([[:digit:]]*).*$/\2/p;q' "$client_evts") - client_port=$(sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q' "$client_evts") - client_serverside=$(sed --unbuffered -n 's/.*(server_side:)([[:digit:]]*).*$/\2/p;q'\ - "$client_evts") - server_token=$(grep "type:1," "$server_evts" | - sed --unbuffered -n 's/.*(token:)([[:digit:]]*).*$/\2/p;q') - server_serverside=$(grep "type:1," "$server_evts" | - sed --unbuffered -n 's/.*(server_side:)([[:digit:]]*).*$/\2/p;q') + client_token=$(mptcp_lib_evts_get_info token "$client_evts") + client_port=$(mptcp_lib_evts_get_info sport "$client_evts") + client_serverside=$(mptcp_lib_evts_get_info server_side "$client_evts") + server_token=$(mptcp_lib_evts_get_info token "$server_evts") + server_serverside=$(mptcp_lib_evts_get_info server_side "$server_evts")
print_test "Established IP${is_v6} MPTCP Connection ns2 => ns1" if [ "$client_token" != "" ] && [ "$server_token" != "" ] && [ "$client_serverside" = 0 ] && @@ -331,16 +328,16 @@ verify_announce_event() local dport local id
- type=$(sed --unbuffered -n 's/.*(type:)([[:digit:]]*).*$/\2/p;q' "$evt") - token=$(sed --unbuffered -n 's/.*(token:)([[:digit:]]*).*$/\2/p;q' "$evt") + type=$(mptcp_lib_evts_get_info type "$evt" $e_type) + token=$(mptcp_lib_evts_get_info token "$evt" $e_type) if [ "$e_af" = "v6" ] then - addr=$(sed --unbuffered -n 's/.*(daddr6:)([0-9a-f:.]*).*$/\2/p;q' "$evt") + addr=$(mptcp_lib_evts_get_info daddr6 "$evt" $e_type) else - addr=$(sed --unbuffered -n 's/.*(daddr4:)([0-9.]*).*$/\2/p;q' "$evt") + addr=$(mptcp_lib_evts_get_info daddr4 "$evt" $e_type) fi - dport=$(sed --unbuffered -n 's/.*(dport:)([[:digit:]]*).*$/\2/p;q' "$evt") - id=$(sed --unbuffered -n 's/.*(rem_id:)([[:digit:]]*).*$/\2/p;q' "$evt") + dport=$(mptcp_lib_evts_get_info dport "$evt" $e_type) + id=$(mptcp_lib_evts_get_info rem_id "$evt" $e_type)
check_expected "type" "token" "addr" "dport" "id" } @@ -358,7 +355,7 @@ test_announce() $client_addr_id dev ns2eth1 > /dev/null 2>&1
local type - type=$(sed --unbuffered -n 's/.*(type:)([[:digit:]]*).*$/\2/p;q' "$server_evts") + type=$(mptcp_lib_evts_get_info type "$server_evts") print_test "ADD_ADDR 10.0.2.2 (ns2) => ns1, invalid token" if [ "$type" = "" ] then @@ -437,9 +434,9 @@ verify_remove_event() local token local id
- type=$(sed --unbuffered -n 's/.*(type:)([[:digit:]]*).*$/\2/p;q' "$evt") - token=$(sed --unbuffered -n 's/.*(token:)([[:digit:]]*).*$/\2/p;q' "$evt") - id=$(sed --unbuffered -n 's/.*(rem_id:)([[:digit:]]*).*$/\2/p;q' "$evt") + type=$(mptcp_lib_evts_get_info type "$evt" $e_type) + token=$(mptcp_lib_evts_get_info token "$evt" $e_type) + id=$(mptcp_lib_evts_get_info rem_id "$evt" $e_type)
check_expected "type" "token" "id" } @@ -457,7 +454,7 @@ test_remove() $client_addr_id > /dev/null 2>&1 print_test "RM_ADDR id:${client_addr_id} ns2 => ns1, invalid token" local type - type=$(sed --unbuffered -n 's/.*(type:)([[:digit:]]*).*$/\2/p;q' "$server_evts") + type=$(mptcp_lib_evts_get_info type "$server_evts") if [ "$type" = "" ] then test_pass @@ -470,7 +467,7 @@ test_remove() ip netns exec "$ns2" ./pm_nl_ctl rem token "$client4_token" id\ $invalid_id > /dev/null 2>&1 print_test "RM_ADDR id:${invalid_id} ns2 => ns1, invalid id" - type=$(sed --unbuffered -n 's/.*(type:)([[:digit:]]*).*$/\2/p;q' "$server_evts") + type=$(mptcp_lib_evts_get_info type "$server_evts") if [ "$type" = "" ] then test_pass @@ -574,19 +571,19 @@ verify_subflow_events() fi fi
- type=$(sed --unbuffered -n 's/.*(type:)([[:digit:]]*).*$/\2/p;q' "$evt") - token=$(sed --unbuffered -n 's/.*(token:)([[:digit:]]*).*$/\2/p;q' "$evt") - family=$(sed --unbuffered -n 's/.*(family:)([[:digit:]]*).*$/\2/p;q' "$evt") - dport=$(sed --unbuffered -n 's/.*(dport:)([[:digit:]]*).*$/\2/p;q' "$evt") - locid=$(sed --unbuffered -n 's/.*(loc_id:)([[:digit:]]*).*$/\2/p;q' "$evt") - remid=$(sed --unbuffered -n 's/.*(rem_id:)([[:digit:]]*).*$/\2/p;q' "$evt") + type=$(mptcp_lib_evts_get_info type "$evt" $e_type) + token=$(mptcp_lib_evts_get_info token "$evt" $e_type) + family=$(mptcp_lib_evts_get_info family "$evt" $e_type) + dport=$(mptcp_lib_evts_get_info dport "$evt" $e_type) + locid=$(mptcp_lib_evts_get_info loc_id "$evt" $e_type) + remid=$(mptcp_lib_evts_get_info rem_id "$evt" $e_type) if [ "$family" = "$AF_INET6" ] then - saddr=$(sed --unbuffered -n 's/.*(saddr6:)([0-9a-f:.]*).*$/\2/p;q' "$evt") - daddr=$(sed --unbuffered -n 's/.*(daddr6:)([0-9a-f:.]*).*$/\2/p;q' "$evt") + saddr=$(mptcp_lib_evts_get_info saddr6 "$evt" $e_type) + daddr=$(mptcp_lib_evts_get_info daddr6 "$evt" $e_type) else - saddr=$(sed --unbuffered -n 's/.*(saddr4:)([0-9.]*).*$/\2/p;q' "$evt") - daddr=$(sed --unbuffered -n 's/.*(daddr4:)([0-9.]*).*$/\2/p;q' "$evt") + saddr=$(mptcp_lib_evts_get_info saddr4 "$evt" $e_type) + daddr=$(mptcp_lib_evts_get_info daddr4 "$evt" $e_type) fi
check_expected "type" "token" "daddr" "dport" "family" "saddr" "locid" "remid" @@ -621,7 +618,7 @@ test_subflows() mptcp_lib_kill_wait $listener_pid
local sport - sport=$(sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q' "$server_evts") + sport=$(mptcp_lib_evts_get_info sport "$server_evts" $SUB_ESTABLISHED)
# DESTROY_SUBFLOW from server to client machine :>"$server_evts" @@ -659,7 +656,7 @@ test_subflows() # Delete the listener from the client ns, if one was created mptcp_lib_kill_wait $listener_pid
- sport=$(sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q' "$server_evts") + sport=$(mptcp_lib_evts_get_info sport "$server_evts" $SUB_ESTABLISHED)
# DESTROY_SUBFLOW6 from server to client machine :>"$server_evts" @@ -698,7 +695,7 @@ test_subflows() # Delete the listener from the client ns, if one was created mptcp_lib_kill_wait $listener_pid
- sport=$(sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q' "$server_evts") + sport=$(mptcp_lib_evts_get_info sport "$server_evts" $SUB_ESTABLISHED)
# DESTROY_SUBFLOW from server to client machine :>"$server_evts" @@ -736,7 +733,7 @@ test_subflows() # Delete the listener from the server ns, if one was created mptcp_lib_kill_wait $listener_pid
- sport=$(sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q' "$client_evts") + sport=$(mptcp_lib_evts_get_info sport "$client_evts" $SUB_ESTABLISHED)
# DESTROY_SUBFLOW from client to server machine :>"$client_evts" @@ -775,7 +772,7 @@ test_subflows() # Delete the listener from the server ns, if one was created mptcp_lib_kill_wait $listener_pid
- sport=$(sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q' "$client_evts") + sport=$(mptcp_lib_evts_get_info sport "$client_evts" $SUB_ESTABLISHED)
# DESTROY_SUBFLOW6 from client to server machine :>"$client_evts" @@ -812,7 +809,7 @@ test_subflows() # Delete the listener from the server ns, if one was created mptcp_lib_kill_wait $listener_pid
- sport=$(sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q' "$client_evts") + sport=$(mptcp_lib_evts_get_info sport "$client_evts" $SUB_ESTABLISHED)
# DESTROY_SUBFLOW from client to server machine :>"$client_evts" @@ -858,7 +855,7 @@ test_subflows_v4_v6_mix() # Delete the listener from the server ns, if one was created mptcp_lib_kill_wait $listener_pid
- sport=$(sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q' "$client_evts") + sport=$(mptcp_lib_evts_get_info sport "$client_evts" $SUB_ESTABLISHED)
# DESTROY_SUBFLOW from client to server machine :>"$client_evts" @@ -926,18 +923,13 @@ verify_listener_events() print_test "CLOSE_LISTENER $e_saddr:$e_sport" fi
- type=$(grep "type:$e_type," $evt | - sed --unbuffered -n 's/.*(type:)([[:digit:]]*).*$/\2/p;q') - family=$(grep "type:$e_type," $evt | - sed --unbuffered -n 's/.*(family:)([[:digit:]]*).*$/\2/p;q') - sport=$(grep "type:$e_type," $evt | - sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q') + type=$(mptcp_lib_evts_get_info type $evt $e_type) + family=$(mptcp_lib_evts_get_info family $evt $e_type) + sport=$(mptcp_lib_evts_get_info sport $evt $e_type) if [ $family ] && [ $family = $AF_INET6 ]; then - saddr=$(grep "type:$e_type," $evt | - sed --unbuffered -n 's/.*(saddr6:)([0-9a-f:.]*).*$/\2/p;q') + saddr=$(mptcp_lib_evts_get_info saddr6 $evt $e_type) else - saddr=$(grep "type:$e_type," $evt | - sed --unbuffered -n 's/.*(saddr4:)([0-9.]*).*$/\2/p;q') + saddr=$(mptcp_lib_evts_get_info saddr4 $evt $e_type) fi
check_expected "type" "family" "saddr" "sport"
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang geliang.tang@suse.com
commit 80775412882e273b8ef62124fae861cde8e6fb3d upstream.
This patch adds a new helper chk_subflows_total(), in it use the newly added counter mptcpi_subflows_total to get the "correct" amount of subflows, including the initial one.
To be compatible with old 'ss' or kernel versions not supporting this counter, get the total subflows by listing TCP connections that are MPTCP subflows:
ss -ti state state established state syn-sent state syn-recv | grep -c tcp-ulp-mptcp.
Reviewed-by: Matthieu Baerts matttbe@kernel.org Signed-off-by: Geliang Tang geliang.tang@suse.com Signed-off-by: Mat Martineau martineau@kernel.org Link: https://lore.kernel.org/r/20231128-send-net-next-2023107-v4-3-8d6b94150f6b@k... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 42 +++++++++++++++++++++++- 1 file changed, 41 insertions(+), 1 deletion(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -1843,7 +1843,7 @@ chk_mptcp_info() local cnt2 local dump_stats
- print_check "mptcp_info ${info1:0:8}=$exp1:$exp2" + print_check "mptcp_info ${info1:0:15}=$exp1:$exp2"
cnt1=$(ss -N $ns1 -inmHM | mptcp_lib_get_info_value "$info1" "$info1") cnt2=$(ss -N $ns2 -inmHM | mptcp_lib_get_info_value "$info2" "$info2") @@ -1864,6 +1864,42 @@ chk_mptcp_info() fi }
+# $1: subflows in ns1 ; $2: subflows in ns2 +# number of all subflows, including the initial subflow. +chk_subflows_total() +{ + local cnt1 + local cnt2 + local info="subflows_total" + local dump_stats + + # if subflows_total counter is supported, use it: + if [ -n "$(ss -N $ns1 -inmHM | mptcp_lib_get_info_value $info $info)" ]; then + chk_mptcp_info $info $1 $info $2 + return + fi + + print_check "$info $1:$2" + + # if not, count the TCP connections that are in fact MPTCP subflows + cnt1=$(ss -N $ns1 -ti state established state syn-sent state syn-recv | + grep -c tcp-ulp-mptcp) + cnt2=$(ss -N $ns2 -ti state established state syn-sent state syn-recv | + grep -c tcp-ulp-mptcp) + + if [ "$1" != "$cnt1" ] || [ "$2" != "$cnt2" ]; then + fail_test "got subflows $cnt1:$cnt2 expected $1:$2" + dump_stats=1 + else + print_ok + fi + + if [ "$dump_stats" = 1 ]; then + ss -N $ns1 -ti + ss -N $ns2 -ti + fi +} + chk_link_usage() { local ns=$1 @@ -3407,10 +3443,12 @@ userspace_tests() chk_join_nr 1 1 1 chk_add_nr 1 1 chk_mptcp_info subflows 1 subflows 1 + chk_subflows_total 2 2 chk_mptcp_info add_addr_signal 1 add_addr_accepted 1 userspace_pm_rm_sf_addr_ns1 10.0.2.1 10 chk_rm_nr 1 1 invert chk_mptcp_info subflows 0 subflows 0 + chk_subflows_total 1 1 kill_events_pids mptcp_lib_kill_wait $tests_pid fi @@ -3427,9 +3465,11 @@ userspace_tests() userspace_pm_add_sf 10.0.3.2 20 chk_join_nr 1 1 1 chk_mptcp_info subflows 1 subflows 1 + chk_subflows_total 2 2 userspace_pm_rm_sf_addr_ns2 10.0.3.2 20 chk_rm_nr 1 1 chk_mptcp_info subflows 0 subflows 0 + chk_subflows_total 1 1 kill_events_pids mptcp_lib_kill_wait $tests_pid fi
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang geliang.tang@suse.com
commit 757c828ce94905a2975873d5e90a376c701b2b90 upstream.
This patch adds a new argument namespace to userspace_pm_add_addr() and userspace_pm_add_sf() to make these two helper more versatile.
Add two more versatile helpers for userspace pm remove subflow or address: userspace_pm_rm_addr() and userspace_pm_rm_sf(). The original test helpers userspace_pm_rm_sf_addr_ns1() and userspace_pm_rm_sf_addr_ns2() can be replaced by these new helpers.
Reviewed-by: Matthieu Baerts matttbe@kernel.org Signed-off-by: Geliang Tang geliang.tang@suse.com Signed-off-by: Mat Martineau martineau@kernel.org Link: https://lore.kernel.org/r/20231128-send-net-next-2023107-v4-4-8d6b94150f6b@k... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 100 +++++++++++------------- 1 file changed, 49 insertions(+), 51 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -2824,6 +2824,7 @@ backup_tests() fi }
+SUB_ESTABLISHED=10 # MPTCP_EVENT_SUB_ESTABLISHED LISTENER_CREATED=15 #MPTCP_EVENT_LISTENER_CREATED LISTENER_CLOSED=16 #MPTCP_EVENT_LISTENER_CLOSED
@@ -3284,75 +3285,70 @@ fail_tests() fi }
+# $1: ns ; $2: addr ; $3: id userspace_pm_add_addr() { - local addr=$1 - local id=$2 + local evts=$evts_ns1 local tk
- tk=$(grep "type:1," "$evts_ns1" | - sed -n 's/.*(token:)([[:digit:]]*).*$/\2/p;q') - ip netns exec $ns1 ./pm_nl_ctl ann $addr token $tk id $id + [ "$1" == "$ns2" ] && evts=$evts_ns2 + tk=$(mptcp_lib_evts_get_info token "$evts") + + ip netns exec $1 ./pm_nl_ctl ann $2 token $tk id $3 sleep 1 }
-userspace_pm_rm_sf_addr_ns1() +# $1: ns ; $2: id +userspace_pm_rm_addr() { - local addr=$1 - local id=$2 - local tk sp da dp - local cnt_addr cnt_sf - - tk=$(grep "type:1," "$evts_ns1" | - sed -n 's/.*(token:)([[:digit:]]*).*$/\2/p;q') - sp=$(grep "type:10" "$evts_ns1" | - sed -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q') - da=$(grep "type:10" "$evts_ns1" | - sed -n 's/.*(daddr6:)([0-9a-f:.]*).*$/\2/p;q') - dp=$(grep "type:10" "$evts_ns1" | - sed -n 's/.*(dport:)([[:digit:]]*).*$/\2/p;q') - cnt_addr=$(rm_addr_count ${ns1}) - cnt_sf=$(rm_sf_count ${ns1}) - ip netns exec $ns1 ./pm_nl_ctl rem token $tk id $id - ip netns exec $ns1 ./pm_nl_ctl dsf lip "::ffff:$addr" \ - lport $sp rip $da rport $dp token $tk - wait_rm_addr $ns1 "${cnt_addr}" - wait_rm_sf $ns1 "${cnt_sf}" + local evts=$evts_ns1 + local tk + local cnt + + [ "$1" == "$ns2" ] && evts=$evts_ns2 + tk=$(mptcp_lib_evts_get_info token "$evts") + + cnt=$(rm_addr_count ${1}) + ip netns exec $1 ./pm_nl_ctl rem token $tk id $2 + wait_rm_addr $1 "${cnt}" }
+# $1: ns ; $2: addr ; $3: id userspace_pm_add_sf() { - local addr=$1 - local id=$2 + local evts=$evts_ns1 local tk da dp
- tk=$(sed -n 's/.*(token:)([[:digit:]]*).*$/\2/p;q' "$evts_ns2") - da=$(sed -n 's/.*(daddr4:)([0-9.]*).*$/\2/p;q' "$evts_ns2") - dp=$(sed -n 's/.*(dport:)([[:digit:]]*).*$/\2/p;q' "$evts_ns2") - ip netns exec $ns2 ./pm_nl_ctl csf lip $addr lid $id \ + [ "$1" == "$ns2" ] && evts=$evts_ns2 + tk=$(mptcp_lib_evts_get_info token "$evts") + da=$(mptcp_lib_evts_get_info daddr4 "$evts") + dp=$(mptcp_lib_evts_get_info dport "$evts") + + ip netns exec $1 ./pm_nl_ctl csf lip $2 lid $3 \ rip $da rport $dp token $tk sleep 1 }
-userspace_pm_rm_sf_addr_ns2() +# $1: ns ; $2: addr $3: event type +userspace_pm_rm_sf() { - local addr=$1 - local id=$2 + local evts=$evts_ns1 + local t=${3:-1} + local ip=4 local tk da dp sp - local cnt_addr cnt_sf + local cnt + + [ "$1" == "$ns2" ] && evts=$evts_ns2 + if is_v6 $2; then ip=6; fi + tk=$(mptcp_lib_evts_get_info token "$evts") + da=$(mptcp_lib_evts_get_info "daddr$ip" "$evts" $t) + dp=$(mptcp_lib_evts_get_info dport "$evts" $t) + sp=$(mptcp_lib_evts_get_info sport "$evts" $t)
- tk=$(sed -n 's/.*(token:)([[:digit:]]*).*$/\2/p;q' "$evts_ns2") - da=$(sed -n 's/.*(daddr4:)([0-9.]*).*$/\2/p;q' "$evts_ns2") - dp=$(sed -n 's/.*(dport:)([[:digit:]]*).*$/\2/p;q' "$evts_ns2") - sp=$(grep "type:10" "$evts_ns2" | - sed -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q') - cnt_addr=$(rm_addr_count ${ns2}) - cnt_sf=$(rm_sf_count ${ns2}) - ip netns exec $ns2 ./pm_nl_ctl rem token $tk id $id - ip netns exec $ns2 ./pm_nl_ctl dsf lip $addr lport $sp \ + cnt=$(rm_sf_count ${1}) + ip netns exec $1 ./pm_nl_ctl dsf lip $2 lport $sp \ rip $da rport $dp token $tk - wait_rm_addr $ns2 "${cnt_addr}" - wait_rm_sf $ns2 "${cnt_sf}" + wait_rm_sf $1 "${cnt}" }
userspace_tests() @@ -3439,13 +3435,14 @@ userspace_tests() run_tests $ns1 $ns2 10.0.1.1 & local tests_pid=$! wait_mpj $ns1 - userspace_pm_add_addr 10.0.2.1 10 + userspace_pm_add_addr $ns1 10.0.2.1 10 chk_join_nr 1 1 1 chk_add_nr 1 1 chk_mptcp_info subflows 1 subflows 1 chk_subflows_total 2 2 chk_mptcp_info add_addr_signal 1 add_addr_accepted 1 - userspace_pm_rm_sf_addr_ns1 10.0.2.1 10 + userspace_pm_rm_addr $ns1 10 + userspace_pm_rm_sf $ns1 "::ffff:10.0.2.1" $SUB_ESTABLISHED chk_rm_nr 1 1 invert chk_mptcp_info subflows 0 subflows 0 chk_subflows_total 1 1 @@ -3462,11 +3459,12 @@ userspace_tests() run_tests $ns1 $ns2 10.0.1.1 & local tests_pid=$! wait_mpj $ns2 - userspace_pm_add_sf 10.0.3.2 20 + userspace_pm_add_sf $ns2 10.0.3.2 20 chk_join_nr 1 1 1 chk_mptcp_info subflows 1 subflows 1 chk_subflows_total 2 2 - userspace_pm_rm_sf_addr_ns2 10.0.3.2 20 + userspace_pm_rm_addr $ns2 20 + userspace_pm_rm_sf $ns2 10.0.3.2 $SUB_ESTABLISHED chk_rm_nr 1 1 chk_mptcp_info subflows 0 subflows 0 chk_subflows_total 1 1
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang geliang.tang@suse.com
commit b850f2c7dd85ecd14a333685c4ffd23f12665e94 upstream.
To avoid duplicated code in different MPTCP selftests, we can add and use helpers defined in mptcp_lib.sh.
is_v6() helper is defined in mptcp_connect.sh, mptcp_join.sh and mptcp_sockopt.sh, so export it into mptcp_lib.sh and rename it as mptcp_lib_is_v6(). Use this new helper in all scripts.
Reviewed-by: Matthieu Baerts matttbe@kernel.org Signed-off-by: Geliang Tang geliang.tang@suse.com Signed-off-by: Mat Martineau martineau@kernel.org Link: https://lore.kernel.org/r/20231128-send-net-next-2023107-v4-10-8d6b94150f6b@... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 16 +++++----------- tools/testing/selftests/net/mptcp/mptcp_join.sh | 14 ++++---------- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 5 +++++ tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 8 +------- 4 files changed, 15 insertions(+), 28 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh @@ -310,12 +310,6 @@ check_mptcp_disabled() return 0 }
-# $1: IP address -is_v6() -{ - [ -z "${1##*:*}" ] -} - do_ping() { local listener_ns="$1" @@ -324,7 +318,7 @@ do_ping() local ping_args="-q -c 1" local rc=0
- if is_v6 "${connect_addr}"; then + if mptcp_lib_is_v6 "${connect_addr}"; then $ipv6 || return 0 ping_args="${ping_args} -6" fi @@ -620,12 +614,12 @@ run_tests_lo() fi
# skip if we don't want v6 - if ! $ipv6 && is_v6 "${connect_addr}"; then + if ! $ipv6 && mptcp_lib_is_v6 "${connect_addr}"; then return 0 fi
local local_addr - if is_v6 "${connect_addr}"; then + if mptcp_lib_is_v6 "${connect_addr}"; then local_addr="::" else local_addr="0.0.0.0" @@ -693,7 +687,7 @@ run_test_transparent() TEST_GROUP="${msg}"
# skip if we don't want v6 - if ! $ipv6 && is_v6 "${connect_addr}"; then + if ! $ipv6 && mptcp_lib_is_v6 "${connect_addr}"; then return 0 fi
@@ -726,7 +720,7 @@ EOF fi
local local_addr - if is_v6 "${connect_addr}"; then + if mptcp_lib_is_v6 "${connect_addr}"; then local_addr="::" r6flag="-6" else --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -592,12 +592,6 @@ link_failure() done }
-# $1: IP address -is_v6() -{ - [ -z "${1##*:*}" ] -} - # $1: ns, $2: port wait_local_port_listen() { @@ -877,7 +871,7 @@ pm_nl_set_endpoint() local id=10 while [ $add_nr_ns1 -gt 0 ]; do local addr - if is_v6 "${connect_addr}"; then + if mptcp_lib_is_v6 "${connect_addr}"; then addr="dead:beef:$counter::1" else addr="10.0.$counter.1" @@ -929,7 +923,7 @@ pm_nl_set_endpoint() local id=20 while [ $add_nr_ns2 -gt 0 ]; do local addr - if is_v6 "${connect_addr}"; then + if mptcp_lib_is_v6 "${connect_addr}"; then addr="dead:beef:$counter::2" else addr="10.0.$counter.2" @@ -971,7 +965,7 @@ pm_nl_set_endpoint() pm_nl_flush_endpoint ${connector_ns} elif [ $rm_nr_ns2 -eq 9 ]; then local addr - if is_v6 "${connect_addr}"; then + if mptcp_lib_is_v6 "${connect_addr}"; then addr="dead:beef:1::2" else addr="10.0.1.2" @@ -3339,7 +3333,7 @@ userspace_pm_rm_sf() local cnt
[ "$1" == "$ns2" ] && evts=$evts_ns2 - if is_v6 $2; then ip=6; fi + if mptcp_lib_is_v6 $2; then ip=6; fi tk=$(mptcp_lib_evts_get_info token "$evts") da=$(mptcp_lib_evts_get_info "daddr$ip" "$evts" $t) dp=$(mptcp_lib_evts_get_info dport "$evts" $t) --- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh @@ -227,6 +227,11 @@ mptcp_lib_kill_wait() { wait "${1}" 2>/dev/null }
+# $1: IP address +mptcp_lib_is_v6() { + [ -z "${1##*:*}" ] +} + # $1: ns, $2: MIB counter mptcp_lib_get_counter() { local ns="${1}" --- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh @@ -161,12 +161,6 @@ check_transfer() return 0 }
-# $1: IP address -is_v6() -{ - [ -z "${1##*:*}" ] -} - do_transfer() { local listener_ns="$1" @@ -183,7 +177,7 @@ do_transfer() local mptcp_connect="./mptcp_connect -r 20"
local local_addr ip - if is_v6 "${connect_addr}"; then + if mptcp_lib_is_v6 "${connect_addr}"; then local_addr="::" ip=ipv6 else
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang tanggeliang@kylinos.cn
commit 7092dbee23282b6fcf1313fc64e2b92649ee16e8 upstream.
Now both a v4 address and a v4-mapped address are supported when destroying a userspace pm subflow, this patch adds a second subflow to "userspace pm add & remove address" test, and two subflows could be removed two different ways, one with the v4mapped and one with v4.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387 Fixes: 48d73f609dcc ("selftests: mptcp: update userspace pm addr tests") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Mat Martineau martineau@kernel.org Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-2-162... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 28 +++++++++++++----------- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 4 +-- 2 files changed, 18 insertions(+), 14 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -3328,16 +3328,17 @@ userspace_pm_rm_sf() { local evts=$evts_ns1 local t=${3:-1} - local ip=4 + local ip local tk da dp sp local cnt
[ "$1" == "$ns2" ] && evts=$evts_ns2 - if mptcp_lib_is_v6 $2; then ip=6; fi + [ -n "$(mptcp_lib_evts_get_info "saddr4" "$evts" $t)" ] && ip=4 + [ -n "$(mptcp_lib_evts_get_info "saddr6" "$evts" $t)" ] && ip=6 tk=$(mptcp_lib_evts_get_info token "$evts") - da=$(mptcp_lib_evts_get_info "daddr$ip" "$evts" $t) - dp=$(mptcp_lib_evts_get_info dport "$evts" $t) - sp=$(mptcp_lib_evts_get_info sport "$evts" $t) + da=$(mptcp_lib_evts_get_info "daddr$ip" "$evts" $t $2) + dp=$(mptcp_lib_evts_get_info dport "$evts" $t $2) + sp=$(mptcp_lib_evts_get_info sport "$evts" $t $2)
cnt=$(rm_sf_count ${1}) ip netns exec $1 ./pm_nl_ctl dsf lip $2 lport $sp \ @@ -3424,20 +3425,23 @@ userspace_tests() if reset_with_events "userspace pm add & remove address" && continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then set_userspace_pm $ns1 - pm_nl_set_limits $ns2 1 1 + pm_nl_set_limits $ns2 2 2 speed=5 \ run_tests $ns1 $ns2 10.0.1.1 & local tests_pid=$! wait_mpj $ns1 userspace_pm_add_addr $ns1 10.0.2.1 10 - chk_join_nr 1 1 1 - chk_add_nr 1 1 - chk_mptcp_info subflows 1 subflows 1 - chk_subflows_total 2 2 - chk_mptcp_info add_addr_signal 1 add_addr_accepted 1 + userspace_pm_add_addr $ns1 10.0.3.1 20 + chk_join_nr 2 2 2 + chk_add_nr 2 2 + chk_mptcp_info subflows 2 subflows 2 + chk_subflows_total 3 3 + chk_mptcp_info add_addr_signal 2 add_addr_accepted 2 userspace_pm_rm_addr $ns1 10 userspace_pm_rm_sf $ns1 "::ffff:10.0.2.1" $SUB_ESTABLISHED - chk_rm_nr 1 1 invert + userspace_pm_rm_addr $ns1 20 + userspace_pm_rm_sf $ns1 10.0.3.1 $SUB_ESTABLISHED + chk_rm_nr 2 2 invert chk_mptcp_info subflows 0 subflows 0 chk_subflows_total 1 1 kill_events_pids --- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh @@ -213,9 +213,9 @@ mptcp_lib_get_info_value() { grep "${2}" | sed -n 's/.*('"${1}"':)([0-9a-f:.]*).*$/\2/p;q' }
-# $1: info name ; $2: evts_ns ; $3: event type +# $1: info name ; $2: evts_ns ; [$3: event type; [$4: addr]] mptcp_lib_evts_get_info() { - mptcp_lib_get_info_value "${1}" "^type:${3:-1}," < "${2}" + grep "${4:-}" "${2}" | mptcp_lib_get_info_value "${1}" "^type:${3:-1}," }
# $1: PID
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Danilo Krummrich dakr@redhat.com
This bug is present in v6.7 only, since the scheduler design has been re-worked in v6.8.
Client scheduler entities must be flushed before an associated GPU scheduler is teared down. Otherwise the entitiy might still hold a pointer to the scheduler's runqueue which is freed at scheduler tear down already.
[ 305.224293] ================================================================== [ 305.224297] BUG: KASAN: slab-use-after-free in drm_sched_entity_flush+0x6c4/0x7b0 [gpu_sched] [ 305.224310] Read of size 8 at addr ffff8881440a8f48 by task rmmod/4436
[ 305.224317] CPU: 10 PID: 4436 Comm: rmmod Tainted: G U 6.7.6-100.fc38.x86_64+debug #1 [ 305.224321] Hardware name: Dell Inc. Precision 7550/01PXFR, BIOS 1.27.0 11/08/2023 [ 305.224324] Call Trace: [ 305.224327] <TASK> [ 305.224329] dump_stack_lvl+0x76/0xd0 [ 305.224336] print_report+0xcf/0x670 [ 305.224342] ? drm_sched_entity_flush+0x6c4/0x7b0 [gpu_sched] [ 305.224352] ? __virt_addr_valid+0x215/0x410 [ 305.224359] ? drm_sched_entity_flush+0x6c4/0x7b0 [gpu_sched] [ 305.224368] kasan_report+0xa6/0xe0 [ 305.224373] ? drm_sched_entity_flush+0x6c4/0x7b0 [gpu_sched] [ 305.224385] drm_sched_entity_flush+0x6c4/0x7b0 [gpu_sched] [ 305.224395] ? __pfx_drm_sched_entity_flush+0x10/0x10 [gpu_sched] [ 305.224406] ? rcu_is_watching+0x15/0xb0 [ 305.224413] drm_sched_entity_destroy+0x17/0x20 [gpu_sched] [ 305.224422] nouveau_cli_fini+0x6c/0x120 [nouveau] [ 305.224658] nouveau_drm_device_fini+0x2ac/0x490 [nouveau] [ 305.224871] nouveau_drm_remove+0x18e/0x220 [nouveau] [ 305.225082] ? __pfx_nouveau_drm_remove+0x10/0x10 [nouveau] [ 305.225290] ? rcu_is_watching+0x15/0xb0 [ 305.225295] ? _raw_spin_unlock_irqrestore+0x66/0x80 [ 305.225299] ? trace_hardirqs_on+0x16/0x100 [ 305.225304] ? _raw_spin_unlock_irqrestore+0x4f/0x80 [ 305.225310] pci_device_remove+0xa3/0x1d0 [ 305.225316] device_release_driver_internal+0x379/0x540 [ 305.225322] driver_detach+0xc5/0x180 [ 305.225327] bus_remove_driver+0x11e/0x2a0 [ 305.225333] pci_unregister_driver+0x2a/0x250 [ 305.225339] nouveau_drm_exit+0x1f/0x970 [nouveau] [ 305.225548] __do_sys_delete_module+0x350/0x580 [ 305.225554] ? __pfx___do_sys_delete_module+0x10/0x10 [ 305.225562] ? syscall_enter_from_user_mode+0x26/0x90 [ 305.225567] ? rcu_is_watching+0x15/0xb0 [ 305.225571] ? syscall_enter_from_user_mode+0x26/0x90 [ 305.225575] ? trace_hardirqs_on+0x16/0x100 [ 305.225580] do_syscall_64+0x61/0xe0 [ 305.225584] ? rcu_is_watching+0x15/0xb0 [ 305.225587] ? syscall_exit_to_user_mode+0x1f/0x50 [ 305.225592] ? trace_hardirqs_on_prepare+0xe3/0x100 [ 305.225596] ? do_syscall_64+0x70/0xe0 [ 305.225600] ? trace_hardirqs_on_prepare+0xe3/0x100 [ 305.225604] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 305.225609] RIP: 0033:0x7f6148f3592b [ 305.225650] Code: 73 01 c3 48 8b 0d dd 04 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ad 04 0c 00 f7 d8 64 89 01 48 [ 305.225653] RSP: 002b:00007ffe89986f08 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 305.225659] RAX: ffffffffffffffda RBX: 000055cbb036e900 RCX: 00007f6148f3592b [ 305.225662] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055cbb036e968 [ 305.225664] RBP: 00007ffe89986f30 R08: 1999999999999999 R09: 0000000000000000 [ 305.225667] R10: 00007f6148fa6ac0 R11: 0000000000000206 R12: 0000000000000000 [ 305.225670] R13: 00007ffe89987190 R14: 000055cbb036e900 R15: 0000000000000000 [ 305.225678] </TASK>
[ 305.225683] Allocated by task 484: [ 305.225685] kasan_save_stack+0x33/0x60 [ 305.225690] kasan_set_track+0x25/0x30 [ 305.225693] __kasan_kmalloc+0x8f/0xa0 [ 305.225696] drm_sched_init+0x3c7/0xce0 [gpu_sched] [ 305.225705] nouveau_sched_init+0xd2/0x110 [nouveau] [ 305.225913] nouveau_drm_device_init+0x130/0x3290 [nouveau] [ 305.226121] nouveau_drm_probe+0x1ab/0x6b0 [nouveau] [ 305.226329] local_pci_probe+0xda/0x190 [ 305.226333] pci_device_probe+0x23a/0x780 [ 305.226337] really_probe+0x3df/0xb80 [ 305.226341] __driver_probe_device+0x18c/0x450 [ 305.226345] driver_probe_device+0x4a/0x120 [ 305.226348] __driver_attach+0x1e5/0x4a0 [ 305.226351] bus_for_each_dev+0x106/0x190 [ 305.226355] bus_add_driver+0x2a1/0x570 [ 305.226358] driver_register+0x134/0x460 [ 305.226361] do_one_initcall+0xd3/0x430 [ 305.226366] do_init_module+0x238/0x770 [ 305.226370] load_module+0x5581/0x6f10 [ 305.226374] __do_sys_init_module+0x1f2/0x220 [ 305.226377] do_syscall_64+0x61/0xe0 [ 305.226381] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 305.226387] Freed by task 4436: [ 305.226389] kasan_save_stack+0x33/0x60 [ 305.226392] kasan_set_track+0x25/0x30 [ 305.226396] kasan_save_free_info+0x2b/0x50 [ 305.226399] __kasan_slab_free+0x10b/0x1a0 [ 305.226402] slab_free_freelist_hook+0x12b/0x1e0 [ 305.226406] __kmem_cache_free+0xd4/0x1d0 [ 305.226410] drm_sched_fini+0x178/0x320 [gpu_sched] [ 305.226418] nouveau_drm_device_fini+0x2a0/0x490 [nouveau] [ 305.226624] nouveau_drm_remove+0x18e/0x220 [nouveau] [ 305.226832] pci_device_remove+0xa3/0x1d0 [ 305.226836] device_release_driver_internal+0x379/0x540 [ 305.226840] driver_detach+0xc5/0x180 [ 305.226843] bus_remove_driver+0x11e/0x2a0 [ 305.226847] pci_unregister_driver+0x2a/0x250 [ 305.226850] nouveau_drm_exit+0x1f/0x970 [nouveau] [ 305.227056] __do_sys_delete_module+0x350/0x580 [ 305.227060] do_syscall_64+0x61/0xe0 [ 305.227064] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 305.227070] The buggy address belongs to the object at ffff8881440a8f00 which belongs to the cache kmalloc-128 of size 128 [ 305.227073] The buggy address is located 72 bytes inside of freed 128-byte region [ffff8881440a8f00, ffff8881440a8f80)
[ 305.227078] The buggy address belongs to the physical page: [ 305.227081] page:00000000627efa0a refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1440a8 [ 305.227085] head:00000000627efa0a order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 305.227088] flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff) [ 305.227093] page_type: 0xffffffff() [ 305.227097] raw: 0017ffffc0000840 ffff8881000428c0 ffffea0005b33500 dead000000000002 [ 305.227100] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 [ 305.227102] page dumped because: kasan: bad access detected
[ 305.227106] Memory state around the buggy address: [ 305.227109] ffff8881440a8e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 305.227112] ffff8881440a8e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 305.227114] >ffff8881440a8f00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 305.227117] ^ [ 305.227120] ffff8881440a8f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 305.227122] ffff8881440a9000: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc [ 305.227125] ==================================================================
Cc: stable@vger.kernel.org # v6.7 only Reported-by: Karol Herbst kherbst@redhat.com Closes: https://gist.githubusercontent.com/karolherbst/a20eb0f937a06ed6aabe2ac2ca3d1... Fixes: b88baab82871 ("drm/nouveau: implement new VM_BIND uAPI") Signed-off-by: Danilo Krummrich dakr@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/nouveau/nouveau_drm.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c @@ -708,10 +708,11 @@ nouveau_drm_device_fini(struct drm_devic } mutex_unlock(&drm->clients_lock);
- nouveau_sched_fini(drm); - nouveau_cli_fini(&drm->client); nouveau_cli_fini(&drm->master); + + nouveau_sched_fini(drm); + nvif_parent_dtor(&drm->parent); mutex_destroy(&drm->clients_lock); kfree(drm);
Hello,
On Mon, 4 Mar 2024 21:21:05 +0000 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.7.9 release. There are 162 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 06 Mar 2024 21:15:26 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.7.9-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.7.y and the diffstat can be found below.
This rc kernel passes DAMON functionality test[1] on my test machine. Attaching the test results summary below. Please note that I retrieved the kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park sj@kernel.org
[1] https://github.com/awslabs/damon-tests/tree/next/corr [2] a8588364073d ("Linux 6.7.9-rc1")
Thanks, SJ
[...]
---
ok 1 selftests: damon: debugfs_attrs.sh ok 2 selftests: damon: debugfs_schemes.sh ok 3 selftests: damon: debugfs_target_ids.sh ok 4 selftests: damon: debugfs_empty_targets.sh ok 5 selftests: damon: debugfs_huge_count_read_write.sh ok 6 selftests: damon: debugfs_duplicate_context_creation.sh ok 7 selftests: damon: debugfs_rm_non_contexts.sh ok 8 selftests: damon: sysfs.sh ok 9 selftests: damon: sysfs_update_removed_scheme_dir.sh ok 10 selftests: damon: reclaim.sh ok 11 selftests: damon: lru_sort.sh ok 1 selftests: damon-tests: kunit.sh ok 2 selftests: damon-tests: huge_count_read_write.sh ok 3 selftests: damon-tests: buffer_overflow.sh ok 4 selftests: damon-tests: rm_contexts.sh ok 5 selftests: damon-tests: record_null_deref.sh ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh ok 8 selftests: damon-tests: damo_tests.sh ok 9 selftests: damon-tests: masim-record.sh ok 10 selftests: damon-tests: build_i386.sh ok 11 selftests: damon-tests: build_arm64.sh ok 12 selftests: damon-tests: build_m68k.sh ok 13 selftests: damon-tests: build_i386_idle_flag.sh ok 14 selftests: damon-tests: build_i386_highpte.sh ok 15 selftests: damon-tests: build_nomemcg.sh [33m [92mPASS [39m
Den mån 4 mars 2024 kl 23:48 skrev SeongJae Park sj@kernel.org:
Hello,
On Mon, 4 Mar 2024 21:21:05 +0000 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.7.9 release. There are 162 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 06 Mar 2024 21:15:26 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.7.9-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.7.y and the diffstat can be found below.
This rc kernel passes DAMON functionality test[1] on my test machine. Attaching the test results summary below. Please note that I retrieved the kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park sj@kernel.org
[1] https://github.com/awslabs/damon-tests/tree/next/corr [2] a8588364073d ("Linux 6.7.9-rc1")
Thanks, SJ
[...]
ok 1 selftests: damon: debugfs_attrs.sh ok 2 selftests: damon: debugfs_schemes.sh ok 3 selftests: damon: debugfs_target_ids.sh ok 4 selftests: damon: debugfs_empty_targets.sh ok 5 selftests: damon: debugfs_huge_count_read_write.sh ok 6 selftests: damon: debugfs_duplicate_context_creation.sh ok 7 selftests: damon: debugfs_rm_non_contexts.sh ok 8 selftests: damon: sysfs.sh ok 9 selftests: damon: sysfs_update_removed_scheme_dir.sh ok 10 selftests: damon: reclaim.sh ok 11 selftests: damon: lru_sort.sh ok 1 selftests: damon-tests: kunit.sh ok 2 selftests: damon-tests: huge_count_read_write.sh ok 3 selftests: damon-tests: buffer_overflow.sh ok 4 selftests: damon-tests: rm_contexts.sh ok 5 selftests: damon-tests: record_null_deref.sh ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh ok 8 selftests: damon-tests: damo_tests.sh ok 9 selftests: damon-tests: masim-record.sh ok 10 selftests: damon-tests: build_i386.sh ok 11 selftests: damon-tests: build_arm64.sh ok 12 selftests: damon-tests: build_m68k.sh ok 13 selftests: damon-tests: build_i386_idle_flag.sh ok 14 selftests: damon-tests: build_i386_highpte.sh ok 15 selftests: damon-tests: build_nomemcg.sh [33m [92mPASS [39m
Works fine on my Arch Linux desktop with model name : AMD Ryzen 5 5600 6-Core Processor after the Arch Linux manual intervention for new mkinitcpio settings and version in Arch
Tested by: Luna Jernberg droidbittin@gmail.com
On Tue, Mar 05, 2024 at 12:05:33AM +0100, Luna Jernberg wrote:
Works fine on my Arch Linux desktop with model name : AMD Ryzen 5 5600 6-Core Processor after the Arch Linux manual intervention for new mkinitcpio settings and version in Arch
Tested by: Luna Jernberg droidbittin@gmail.com
Nit, this should be "Tested-by:" without the ' ', otherwise our tools will not pick this up.
thanks,
greg k-h
Tested-by: Luna Jernberg droidbittin@gmail.com
Den ons 6 mars 2024 kl 15:52 skrev Greg Kroah-Hartman gregkh@linuxfoundation.org:
On Tue, Mar 05, 2024 at 12:05:33AM +0100, Luna Jernberg wrote:
Works fine on my Arch Linux desktop with model name : AMD Ryzen 5 5600 6-Core Processor after the Arch Linux manual intervention for new mkinitcpio settings and version in Arch
Tested by: Luna Jernberg droidbittin@gmail.com
Nit, this should be "Tested-by:" without the ' ', otherwise our tools will not pick this up.
thanks,
greg k-h
Den ons 6 mars 2024 kl 15:52 skrev Greg Kroah-Hartman gregkh@linuxfoundation.org:
On Tue, Mar 05, 2024 at 12:05:33AM +0100, Luna Jernberg wrote:
Works fine on my Arch Linux desktop with model name : AMD Ryzen 5 5600 6-Core Processor after the Arch Linux manual intervention for new mkinitcpio settings and version in Arch
Tested by: Luna Jernberg droidbittin@gmail.com
Nit, this should be "Tested-by:" without the ' ', otherwise our tools will not pick this up.
thanks,
greg k-h
Alright resent the emails with the right tag format sorry for the trouble
On 3/4/24 1:21 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.7.9 release. There are 162 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 06 Mar 2024 21:15:26 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.7.9-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.7.y and the diffstat can be found below.
thanks,
greg k-h
The build fails on RISC-V RV64 with:
arch/riscv/kernel/suspend.c: In function ‘suspend_save_csrs’: arch/riscv/kernel/suspend.c:14:66: error: ‘RISCV_ISA_EXT_XLINUXENVCFG’ undeclared (first use in this function); did you mean ‘RISCV_ISA_EXT_ZIFENCEI’? 14 | if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG)) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ | RISCV_ISA_EXT_ZIFENCEI arch/riscv/kernel/suspend.c:14:66: note: each undeclared identifier is reported only once for each function it appears in arch/riscv/kernel/suspend.c: In function ‘suspend_restore_csrs’: arch/riscv/kernel/suspend.c:37:66: error: ‘RISCV_ISA_EXT_XLINUXENVCFG’ undeclared (first use in this function); did you mean ‘RISCV_ISA_EXT_ZIFENCEI’? 37 | if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG)) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ | RISCV_ISA_EXT_ZIFENCEI make[4]: *** [scripts/Makefile.build:243: arch/riscv/kernel/suspend.o] Error 1 make[3]: *** [scripts/Makefile.build:480: arch/riscv/kernel] Error 2 make[2]: *** [scripts/Makefile.build:480: arch/riscv] Error 2
The patch "riscv: Save/restore envcfg CSR during CPU suspend" (commit 64e54f78d9f2dc30ac399a632922bb1fe036778a) requires patch "riscv: Add a custom ISA extension for the [ms]envcfg CSR" (upstream commit 4774848fef6041716a4883217eb75f6b10eb183b).
On Mon, Mar 04, 2024 at 03:31:00PM -0800, Ron Economos wrote:
On 3/4/24 1:21 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.7.9 release. There are 162 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 06 Mar 2024 21:15:26 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.7.9-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.7.y and the diffstat can be found below.
thanks,
greg k-h
The build fails on RISC-V RV64 with:
arch/riscv/kernel/suspend.c: In function ‘suspend_save_csrs’: arch/riscv/kernel/suspend.c:14:66: error: ‘RISCV_ISA_EXT_XLINUXENVCFG’ undeclared (first use in this function); did you mean ‘RISCV_ISA_EXT_ZIFENCEI’? 14 | if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG)) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ | RISCV_ISA_EXT_ZIFENCEI arch/riscv/kernel/suspend.c:14:66: note: each undeclared identifier is reported only once for each function it appears in arch/riscv/kernel/suspend.c: In function ‘suspend_restore_csrs’: arch/riscv/kernel/suspend.c:37:66: error: ‘RISCV_ISA_EXT_XLINUXENVCFG’ undeclared (first use in this function); did you mean ‘RISCV_ISA_EXT_ZIFENCEI’? 37 | if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG)) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ | RISCV_ISA_EXT_ZIFENCEI make[4]: *** [scripts/Makefile.build:243: arch/riscv/kernel/suspend.o] Error 1 make[3]: *** [scripts/Makefile.build:480: arch/riscv/kernel] Error 2 make[2]: *** [scripts/Makefile.build:480: arch/riscv] Error 2
The patch "riscv: Save/restore envcfg CSR during CPU suspend" (commit 64e54f78d9f2dc30ac399a632922bb1fe036778a) requires patch "riscv: Add a custom ISA extension for the [ms]envcfg CSR" (upstream commit 4774848fef6041716a4883217eb75f6b10eb183b).
Ah, that wasn't obvious, I've applied that pre-requsite patch now, needed to be done by-hand. I'll push out a -rc2 in a bit with this fix in it, thanks for testing!
greg k-h
Den tis 5 mars 2024 kl 08:41 skrev Greg Kroah-Hartman gregkh@linuxfoundation.org:
On Mon, Mar 04, 2024 at 03:31:00PM -0800, Ron Economos wrote:
On 3/4/24 1:21 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.7.9 release. There are 162 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 06 Mar 2024 21:15:26 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.7.9-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.7.y and the diffstat can be found below.
thanks,
greg k-h
The build fails on RISC-V RV64 with:
arch/riscv/kernel/suspend.c: In function ‘suspend_save_csrs’: arch/riscv/kernel/suspend.c:14:66: error: ‘RISCV_ISA_EXT_XLINUXENVCFG’ undeclared (first use in this function); did you mean ‘RISCV_ISA_EXT_ZIFENCEI’? 14 | if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG)) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ | RISCV_ISA_EXT_ZIFENCEI arch/riscv/kernel/suspend.c:14:66: note: each undeclared identifier is reported only once for each function it appears in arch/riscv/kernel/suspend.c: In function ‘suspend_restore_csrs’: arch/riscv/kernel/suspend.c:37:66: error: ‘RISCV_ISA_EXT_XLINUXENVCFG’ undeclared (first use in this function); did you mean ‘RISCV_ISA_EXT_ZIFENCEI’? 37 | if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG)) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ | RISCV_ISA_EXT_ZIFENCEI make[4]: *** [scripts/Makefile.build:243: arch/riscv/kernel/suspend.o] Error 1 make[3]: *** [scripts/Makefile.build:480: arch/riscv/kernel] Error 2 make[2]: *** [scripts/Makefile.build:480: arch/riscv] Error 2
The patch "riscv: Save/restore envcfg CSR during CPU suspend" (commit 64e54f78d9f2dc30ac399a632922bb1fe036778a) requires patch "riscv: Add a custom ISA extension for the [ms]envcfg CSR" (upstream commit 4774848fef6041716a4883217eb75f6b10eb183b).
Ah, that wasn't obvious, I've applied that pre-requsite patch now, needed to be done by-hand. I'll push out a -rc2 in a bit with this fix in it, thanks for testing!
greg k-h
Alright thanks for the info will test RC2 later today then
On Mon, Mar 04, 2024 at 09:21:05PM +0000, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.7.9 release. There are 162 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully compiled and installed the kernel on my computer (Acer Aspire E15, Intel Core i3 Haswell). No noticeable regressions.
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
On 3/4/24 14:21, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.7.9 release. There are 162 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 06 Mar 2024 21:15:26 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.7.9-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.7.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
linux-stable-mirror@lists.linaro.org