This is the start of the stable review cycle for the 6.16.8 release. There are 189 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 19 Sep 2025 12:32:53 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.16.8-rc1....
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.16.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.16.8-rc1
Johan Hovold johan@kernel.org phy: ti-pipe3: fix device leak at unbind
Johan Hovold johan@kernel.org phy: ti: omap-usb2: fix device leak at unbind
Johan Hovold johan@kernel.org phy: tegra: xusb: fix device and OF node leak at probe
Stephan Gerhold stephan.gerhold@linaro.org phy: qcom: qmp-pcie: Fix PHY initialization when powered down by firmware
Miaoqian Lin linmq006@gmail.com dmaengine: dw: dmamux: Fix device reference leak in rzn1_dmamux_route_allocate
Stephan Gerhold stephan.gerhold@linaro.org dmaengine: qcom: bam_dma: Fix DT error handling for num-channels/ees
Takashi Iwai tiwai@suse.de usb: gadget: midi2: Fix MIDI2 IN EP max packet size
Takashi Iwai tiwai@suse.de usb: gadget: midi2: Fix missing UMP group attributes initialization
RD Babiera rdbabiera@google.com usb: typec: tcpm: properly deliver cable vdms to altmode drivers
Alan Stern stern@rowland.harvard.edu USB: gadget: dummy-hcd: Fix locking bug in RT-enabled kernels
Mathias Nyman mathias.nyman@linux.intel.com xhci: fix memory leak regression when freeing xhci vdev devices depth first
Mathias Nyman mathias.nyman@linux.intel.com xhci: dbc: Fix full DbC transfer ring after several reconnects
Mathias Nyman mathias.nyman@linux.intel.com xhci: dbc: decouple endpoint allocation from initialization
Yuezhang Mo Yuezhang.Mo@sony.com erofs: fix runtime warning on truncate_folio_batch_exceptionals()
Andreas Kemnade akemnade@kernel.org regulator: sy7636a: fix lifecycle of power good gpio
Anders Roxell anders.roxell@linaro.org dmaengine: ti: edma: Fix memory allocation size for queue_priority_map
Gao Xiang xiang@kernel.org erofs: fix invalid algorithm for encoded extents
Gao Xiang xiang@kernel.org erofs: unify meta buffers in z_erofs_fill_inode()
Gao Xiang xiang@kernel.org erofs: remove need_kmap in erofs_read_metabuf()
Gao Xiang xiang@kernel.org erofs: get rid of {get,put}_page() for ztailpacking data
Dan Carpenter dan.carpenter@linaro.org dmaengine: idxd: Fix double free in idxd_setup_wqs()
Yi Sun yi.sun@intel.com dmaengine: idxd: Fix refcount underflow on module unload
Yi Sun yi.sun@intel.com dmaengine: idxd: Remove improper idxd_free
Pengyu Luo mitltlatltl@gmail.com phy: qualcomm: phy-qcom-eusb2-repeater: fix override properties
Hangbin Liu liuhangbin@gmail.com hsr: hold rcu and dev lock for hsr_get_port_ndev
Hangbin Liu liuhangbin@gmail.com hsr: use hsr_for_each_port_rtnl in hsr_port_get_hsr
Hangbin Liu liuhangbin@gmail.com hsr: use rtnl lock when iterating over ports
Florian Westphal fw@strlen.de netfilter: nf_tables: restart set lookup on base_seq change
Florian Westphal fw@strlen.de netfilter: nf_tables: make nft_set_do_lookup available unconditionally
Florian Westphal fw@strlen.de netfilter: nf_tables: place base_seq in struct net
Phil Sutter phil@nwl.cc netfilter: nf_tables: Reintroduce shortened deletion notifications
Florian Westphal fw@strlen.de netfilter: nft_set_rbtree: continue traversal if element is inactive
Florian Westphal fw@strlen.de netfilter: nft_set_pipapo: don't check genbit from packetpath lookups
Florian Westphal fw@strlen.de netfilter: nft_set_pipapo: don't return bogus extension pointer
Florian Westphal fw@strlen.de netfilter: nft_set_pipapo: merge pipapo_get/lookup
Florian Westphal fw@strlen.de netfilter: nft_set: remove one argument from lookup and update functions
Florian Westphal fw@strlen.de netfilter: nft_set_pipapo: remove unused arguments
Florian Westphal fw@strlen.de netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation
Anssi Hannula anssi.hannula@bitwise.fi can: xilinx_can: xcan_write_frame(): fix use-after-free of transmitted SKB
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp can: j1939: j1939_local_ecu_get(): undo increment when j1939_local_ecu_get() fails
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp can: j1939: j1939_sk_bind(): call j1939_priv_put() immediately when j1939_local_ecu_get() failed
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp can: j1939: implement NETDEV_UNREGISTER notification handler
Davide Caratti dcaratti@redhat.com selftests: can: enable CONFIG_CAN_VCAN as a module
Stanislav Fomichev sdf@fomichev.me macsec: sync features on RTM_NEWLINK
Carolina Jubran cjubran@nvidia.com net: dev_ioctl: take ops lock in hwtstamp lower paths
Alex Deucher alexander.deucher@amd.com drm/amd/display: use udelay rather than fsleep
Michal Wajdeczko michal.wajdeczko@intel.com drm/xe/configfs: Don't touch survivability_mode on fini
Michal Schmidt mschmidt@redhat.com i40e: fix IRQ freeing in i40e_vsi_request_irq_msix error path
Kohei Enju enjuk@amazon.com igb: fix link test skipping when interface is admin down
Tianyu Xu tianyxu@cisco.com igb: Fix NULL pointer dereference in ethtool loopback test
Alex Tran alex.t.tran@gmail.com docs: networking: can: change bcm_msg_head frames member to support flexible array
Antoine Tenart atenart@kernel.org tunnels: reset the GSO metadata before reusing the skb
Petr Machata petrm@nvidia.com net: bridge: Bounce invalid boolopts
Jonas Gorski jonas.gorski@gmail.com net: dsa: b53: fix ageing time for BCM53101
Alok Tiwari alok.a.tiwari@oracle.com genetlink: fix genl_bind() invoking bind() after -EPERM
Klaus Kudielka klaus.kudielka@gmail.com PCI: mvebu: Fix use of for_each_of_range() iterator
Miaoqing Pan miaoqing.pan@oss.qualcomm.com wifi: ath12k: fix WMI TLV header misalignment
Sriram R quic_srirrama@quicinc.com wifi: ath12k: Add support to enqueue management frame at MLD level
Sarika Sharma quic_sarishar@quicinc.com wifi: ath12k: add link support for multi-link in arsta
Miaoqing Pan miaoqing.pan@oss.qualcomm.com wifi: ath12k: Fix missing station power save configuration
Vladimir Oltean vladimir.oltean@nxp.com net: phy: transfer phy_config_inband() locking responsibility to phylink
Vladimir Oltean vladimir.oltean@nxp.com net: phylink: add lock for serializing concurrent pl->phydev writes with resolver
Stefan Wahren wahrenst@gmx.net net: fec: Fix possible NPD in fec_enet_phy_reset_after_clk_enable()
Chia-I Wu olvaffe@gmail.com drm/panthor: validate group queue count
Christophe JAILLET christophe.jaillet@wanadoo.fr mtd: rawnand: nuvoton: Fix an error handling path in ma35_nand_chips_init()
Fabio Porcedda fabio.porcedda@gmail.com USB: serial: option: add Telit Cinterion LE910C4-WWX new compositions
Fabio Porcedda fabio.porcedda@gmail.com USB: serial: option: add Telit Cinterion FN990A w/audio compositions
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org dt-bindings: serial: brcm,bcm7271-uart: Constrain clocks
Hugo Villeneuve hvilleneuve@dimonoff.com serial: sc16is7xx: fix bug in flow control levels init
Fabian Vogt fvogt@suse.de tty: hvc_console: Call hvc_kick in hvc_write unconditionally
Paolo Abeni pabeni@redhat.com Revert "net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups"
Antheas Kapenekakis lkml@antheas.dev Input: xpad - add support for Flydigi Apex 5
Christoffer Sandberg cs@tuxedo.de Input: i8042 - add TUXEDO InfinityBook Pro Gen10 AMD to i8042 quirk table
Jeff LaBundy jeff@labundy.com Input: iqs7222 - avoid enabling unused interrupts
K Prateek Nayak kprateek.nayak@amd.com x86/cpu/topology: Always try cpu_parse_topology_ext() on AMD/Hygon
Reinette Chatre reinette.chatre@intel.com fs/resctrl: Eliminate false positive lockdep warning when reading SNC counters
Xiongfeng Wang wangxiongfeng2@huawei.com hrtimers: Unconditionally update target CPU base after offline timer migration
Pratap Nirujogi pratap.nirujogi@amd.com drm/amd/amdgpu: Declare isp firmware binary file
Mario Limonciello (AMD) mario.limonciello@amd.com drm/amd/display: Drop dm_prepare_suspend() and dm_complete()
Mario Limonciello mario.limonciello@amd.com drm/amd/display: Destroy cached state in complete() callback
Quanmin Yan yanquanmin1@huawei.com mm/damon/reclaim: avoid divide-by-zero in damon_reclaim_apply_parameters()
Stanislav Fort stanislav.fort@aisle.com mm/damon/sysfs: fix use-after-free in state_show()
Santhosh Kumar K s-k6@ti.com mtd: spinand: winbond: Fix oob_layout for W25N01JW
Miquel Raynal miquel.raynal@bootlin.com mtd: spinand: winbond: Enable high-speed modes on w25n0xjw
Miquel Raynal miquel.raynal@bootlin.com mtd: spinand: Add a ->configure_chip() hook
Max Kellermann max.kellermann@ionos.com ceph: fix crash after fscrypt_encrypt_pagecache_blocks() error
Max Kellermann max.kellermann@ionos.com ceph: always call ceph_shift_unused_folios_left()
Alex Markuze amarkuze@redhat.com ceph: fix race condition where r_parent becomes stale before sending message
Alex Markuze amarkuze@redhat.com ceph: fix race condition validating r_parent before applying state
Ilya Dryomov idryomov@gmail.com libceph: fix invalid accesses to ceph_connection_v1_info
Chen Ridong chenridong@huawei.com kernfs: Fix UAF in polling when open file is released
Qu Wenruo wqu@suse.com btrfs: fix corruption reading compressed range when block size is smaller than page size
Boris Burkov boris@bur.io btrfs: use readahead_expand() on compressed extents
Fangzhi Zuo Jerry.Zuo@amd.com drm/amd/display: Disable DPCD Probe Quirk
Imre Deak imre.deak@intel.com drm/dp: Add an EDID quirk for the DPCD register access probe
Imre Deak imre.deak@intel.com drm/edid: Add support for quirks visible to DRM core and drivers
Imre Deak imre.deak@intel.com drm/edid: Define the quirks in an enum list
Geoffrey McRae geoffrey.mcrae@amd.com drm/amd/display: remove oem i2c adapter on finish
Ovidiu Bunea ovidiu.bunea@amd.com drm/amd/display: Correct sequences and delays for DCN35 PG & RCG
David Rosca david.rosca@amd.com drm/amdgpu/vcn4: Fix IB parsing with multiple engine info packages
David Rosca david.rosca@amd.com drm/amdgpu/vcn: Allow limiting ctx to instance 0 for AV1 at any time
Alex Deucher alexander.deucher@amd.com drm/amdgpu: fix a memory leak in fence cleanup when unloading
Thomas Hellström thomas.hellstrom@linux.intel.com drm/xe: Block exec and rebind worker while evicting for suspend / hibernate
Thomas Hellström thomas.hellstrom@linux.intel.com drm/xe: Allow the pm notifier to continue on failure
Thomas Hellström thomas.hellstrom@linux.intel.com drm/xe: Attempt to bring bos back to VRAM after eviction
Jani Nikula jani.nikula@intel.com drm/i915/power: fix size for for_each_set_bit() in abox iteration
Johan Hovold johan@kernel.org drm/mediatek: fix potential OF node use-after-free
Quanmin Yan yanquanmin1@huawei.com mm/damon/lru_sort: avoid divide-by-zero in damon_lru_sort_apply_parameters()
Sang-Heon Jeon ekffu200098@gmail.com mm/damon/core: set quota->charged_from to jiffies at first charge window
Kyle Meyer kyle.meyer@hpe.com mm/memory-failure: fix redundant updates for already poisoned pages
Miaohe Lin linmiaohe@huawei.com mm/memory-failure: fix VM_BUG_ON_PAGE(PagePoisoned(page)) when unpoison memory
Uladzislau Rezki (Sony) urezki@gmail.com mm/vmalloc, mm/kasan: respect gfp mask in kasan_populate_vmalloc()
Wei Yang richard.weiyang@gmail.com mm/khugepaged: fix the address passed to notifier on testing young
Jeongjun Park aha310510@gmail.com mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()
Miklos Szeredi mszeredi@redhat.com fuse: prevent overflow in copy_file_range return value
Miklos Szeredi mszeredi@redhat.com fuse: check if copy_file_range() returns larger than requested size
Amir Goldstein amir73il@gmail.com fuse: do not allow mapping a non-regular backing file
Christophe Kerello christophe.kerello@foss.st.com mtd: rawnand: stm32_fmc2: fix ECC overwrite
Christophe Kerello christophe.kerello@foss.st.com mtd: rawnand: stm32_fmc2: avoid overlapping mappings on ECC buffer
Alexander Sverdlin alexander.sverdlin@gmail.com mtd: nand: raw: atmel: Respect tAR, tCLR in read setup timing
Paulo Alcantara pc@manguebit.org smb: client: fix data loss due to broken rename(2)
Paulo Alcantara pc@manguebit.org smb: client: fix compound alignment with encryption
Breno Leitao leitao@debian.org s390: kexec: initialize kexec_buf struct
Johannes Berg johannes.berg@intel.com wifi: iwlwifi: fix 130/1030 configs
Rafael J. Wysocki rafael.j.wysocki@intel.com PM: hibernate: Restrict GFP mask in hibernation_snapshot()
Rafael J. Wysocki rafael.j.wysocki@intel.com PM: EM: Add function for registering a PD without capacity update
Oleksij Rempel o.rempel@pengutronix.de net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups
Jiawen Wu jiawenwu@trustnetic.com net: libwx: fix to enable RSS
Jonas Jelonek jelonek.jonas@gmail.com i2c: rtl9300: remove broken SMBus Quick operation support
Jonas Jelonek jelonek.jonas@gmail.com i2c: rtl9300: ensure data length is within supported range
Chiasheng Lee chiasheng.lee@linux.intel.com i2c: i801: Hide Intel Birch Stream SoC TCO WDT
Omar Sandoval osandov@fb.com btrfs: fix subvolume deletion lockup caused by inodes xarray race
Boris Burkov boris@bur.io btrfs: fix squota compressed stats leak
Mark Tinguely mark.tinguely@oracle.com ocfs2: fix recursive semaphore deadlock in fiemap call
Matthieu Baerts (NGI0) matttbe@kernel.org netlink: specs: mptcp: fix if-idx attribute type
Matthieu Baerts (NGI0) matttbe@kernel.org doc: mptcp: net.mptcp.pm_type is deprecated
Krister Johansen kjlx@templeofstupid.com mptcp: sockopt: make sync_socket_options propagate SOCK_KEEPOPEN
Breno Leitao leitao@debian.org arm64: kexec: initialize kexec_buf struct in load_other_segments()
Nathan Chancellor nathan@kernel.org compiler-clang.h: define __SANITIZE_*__ macros only when undefined
Trond Myklebust trond.myklebust@hammerspace.com Revert "SUNRPC: Don't allow waiting for exiting tasks"
Jonas Jelonek jelonek.jonas@gmail.com i2c: rtl9300: fix channel number bound check
Salah Triki salah.triki@gmail.com EDAC/altera: Delete an inappropriate dma_free_coherent() call
wangzijie wangzijie1@honor.com proc: fix type confusion in pde_set_flags()
Kuniyuki Iwashima kuniyu@google.com tcp_bpf: Call sk_msg_free() when tcp_bpf_send_verdict() fails to allocate psock->cork.
Peilin Ye yepeilin@google.com bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()
KaFai Wan kafai.wan@linux.dev bpf: Allow fall back to interpreter for programs with stack size <= 512
Kumar Kartikeya Dwivedi memxor@gmail.com rqspinlock: Choose trylock fallback for NMI waiters
Maciej Fijalkowski maciej.fijalkowski@intel.com xsk: Fix immature cq descriptor production
Daniel Borkmann daniel@iogearbox.net bpf: Fix out-of-bounds dynptr write in bpf_crypto_crypt
Mario Limonciello (AMD) superm1@kernel.org cpufreq/amd-pstate: Fix a regression leading to EPP 0 after resume
Thomas Richter tmricht@linux.ibm.com s390/cpum_cf: Deny all sampling events by counter PMU
Thomas Richter tmricht@linux.ibm.com s390/pai: Deny all events not handled by this PMU
Gautham R. Shenoy gautham.shenoy@amd.com cpufreq/amd-pstate: Fix setting of CPPC.min_perf in active mode for performance governor
Jesper Dangaard Brouer hawk@kernel.org bpf, cpumap: Disable page_pool direct xdp_return need larger scope
Pu Lehui pulehui@huawei.com tracing: Silence warning when chunk allocation fails in trace_pid_write
Jonathan Curley jcurley@purestorage.com NFSv4/flexfiles: Fix layout merge mirror check.
Trond Myklebust trond.myklebust@hammerspace.com NFS: nfs_invalidate_folio() must observe the offset and size arguments
Trond Myklebust trond.myklebust@hammerspace.com NFSv4.2: Serialise O_DIRECT i/o and copy range
Trond Myklebust trond.myklebust@hammerspace.com NFSv4.2: Serialise O_DIRECT i/o and clone range
Trond Myklebust trond.myklebust@hammerspace.com NFSv4.2: Serialise O_DIRECT i/o and fallocate()
Trond Myklebust trond.myklebust@hammerspace.com NFS: Serialise O_DIRECT i/o and truncate()
Wang Liang wangliang74@huawei.com tracing/osnoise: Fix null-ptr-deref in bitmap_parselist()
Vladimir Riabchun ferr.lambarginio@gmail.com ftrace/samples: Fix function size computation
Scott Mayhew smayhew@redhat.com nfs/localio: restore creds before releasing pageio data
Luo Gengkun luogengkun@huaweicloud.com tracing: Fix tracing_marker may trigger page fault during preempt_disable
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: Clear the NFS_CAP_XATTR flag if not supported by the server
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: Clear NFS_CAP_OPEN_XOR and NFS_CAP_DELEGTIME if not supported
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: Clear the NFS_CAP_FS_LOCATIONS flag if it is not set
Guenter Roeck linux@roeck-us.net trace/fgraph: Fix error handling
Xiao Ni xni@redhat.com md: keep recovery_cp in mdp_superblock_s
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: Don't clear capabilities that won't be reset
Justin Worrell jworrell@gmail.com SUNRPC: call xs_sock_process_cmsg for all cmsg
Tigran Mkrtchyan tigran.mkrtchyan@desy.de flexfiles/pNFS: fix NULL checks on result of ff_layout_choose_ds_for_read
Alex Deucher alexander.deucher@amd.com Revert "drm/amdgpu: Add more checks to PSP mailbox"
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: ISO: Fix getname not returning broadcast fields
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_conn: Fix running bis_cleanup for hci_conn->type PA_LINK
Lu Baolu baolu.lu@linux.intel.com iommu/vt-d: Make iotlb_sync_map a static property of dmar_domain
Jason Gunthorpe jgg@ziepe.ca iommu/vt-d: Split paging_domain_compatible()
Jason Gunthorpe jgg@ziepe.ca iommu/vt-d: Create unique domain ops for each stage
Jason Gunthorpe jgg@ziepe.ca iommu/vt-d: Split intel_iommu_domain_alloc_paging_flags()
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_conn: Fix not cleaning up Broadcaster/Broadcast Source
Dan Carpenter dan.carpenter@linaro.org irqchip/mvebu-gicp: Fix an IS_ERR() vs NULL check in probe()
Kan Liang kan.liang@linux.intel.com perf: Fix the POLL_HUP delivery breakage
Baochen Qiang baochen.qiang@oss.qualcomm.com dma-debug: don't enforce dma mapping check on noncoherent allocations
Amir Goldstein amir73il@gmail.com fhandle: use more consistent rules for decoding file handle from userns
Edward Adam Davis eadavis@qq.com fuse: Block access to folio overlimit
Christian Brauner brauner@kernel.org coredump: don't pointlessly check and spew warnings
Christoph Hellwig hch@lst.de block: don't silently ignore metadata for sync read/write
Christoph Hellwig hch@lst.de fs: add a FMODE_ flag to indicate IOCB_HAS_METADATA availability
-------------
Diffstat:
 .../bindings/serial/brcm,bcm7271-uart.yaml | 2 +-
 Documentation/netlink/specs/mptcp_pm.yaml | 2 +-
 Documentation/networking/can.rst | 2 +-
 Documentation/networking/mptcp.rst | 8 +-
 Makefile | 4 +-
 arch/arm64/kernel/machine_kexec_file.c | 2 +-
 arch/s390/kernel/kexec_elf.c | 2 +-
 arch/s390/kernel/kexec_image.c | 2 +-
 arch/s390/kernel/machine_kexec_file.c | 6 +-
 arch/s390/kernel/perf_cpum_cf.c | 4 +-
 arch/s390/kernel/perf_pai_crypto.c | 4 +-
 arch/s390/kernel/perf_pai_ext.c | 2 +-
 arch/x86/kernel/cpu/topology_amd.c | 25 +-
 block/fops.c | 13 +-
 drivers/cpufreq/amd-pstate.c | 19 +-
 drivers/cpufreq/intel_pstate.c | 4 +-
 drivers/dma/dw/rzn1-dmamux.c | 15 +-
 drivers/dma/idxd/init.c | 39 +--
 drivers/dma/qcom/bam_dma.c | 8 +-
 drivers/dma/ti/edma.c | 4 +-
 drivers/edac/altera_edac.c | 1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 11 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 2 -
 drivers/gpu/drm/amd/amdgpu/isp_v4_1_1.c | 2 +
 drivers/gpu/drm/amd/amdgpu/psp_v10_0.c | 4 +-
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 31 +--
 drivers/gpu/drm/amd/amdgpu/psp_v11_0_8.c | 25 +-
 drivers/gpu/drm/amd/amdgpu/psp_v12_0.c | 18 +-
 drivers/gpu/drm/amd/amdgpu/psp_v13_0.c | 25 +-
 drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.c | 25 +-
 drivers/gpu/drm/amd/amdgpu/psp_v14_0.c | 25 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 12 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 60 +++--
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 106 +++++----
 .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 1 +
 drivers/gpu/drm/amd/display/dc/dc.h | 1 +
 .../gpu/drm/amd/display/dc/dccg/dcn35/dcn35_dccg.c | 74 +++---
 .../drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c | 2 +-
 .../drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c | 115 ++-------
 .../gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c | 3 -
 .../drm/amd/display/dc/hwss/dcn351/dcn351_init.c | 3 -
 drivers/gpu/drm/amd/display/dc/inc/hw/pg_cntl.h | 1 +
 .../drm/amd/display/dc/pg/dcn35/dcn35_pg_cntl.c | 78 +++---
 drivers/gpu/drm/display/drm_dp_helper.c | 42 +++-
 drivers/gpu/drm/drm_edid.c | 232 +++++++++---------
 drivers/gpu/drm/i915/display/intel_display_power.c | 6 +-
 drivers/gpu/drm/mediatek/mtk_drm_drv.c | 11 +-
 drivers/gpu/drm/panthor/panthor_drv.c | 2 +-
 drivers/gpu/drm/xe/tests/xe_bo.c | 2 +-
 drivers/gpu/drm/xe/tests/xe_dma_buf.c | 10 +-
 drivers/gpu/drm/xe/xe_bo.c | 16 +-
 drivers/gpu/drm/xe/xe_bo.h | 2 +-
 drivers/gpu/drm/xe/xe_device_types.h | 6 +
 drivers/gpu/drm/xe/xe_dma_buf.c | 2 +-
 drivers/gpu/drm/xe/xe_exec.c | 9 +
 drivers/gpu/drm/xe/xe_pm.c | 42 +++-
 drivers/gpu/drm/xe/xe_survivability_mode.c | 3 +-
 drivers/gpu/drm/xe/xe_vm.c | 42 +++-
 drivers/gpu/drm/xe/xe_vm.h | 2 +
 drivers/gpu/drm/xe/xe_vm_types.h | 5 +
 drivers/i2c/busses/i2c-i801.c | 2 +-
 drivers/i2c/busses/i2c-rtl9300.c | 22 +-
 drivers/input/joystick/xpad.c | 2 +
 drivers/input/misc/iqs7222.c | 3 +
 drivers/input/serio/i8042-acpipnpio.h | 14 ++
 drivers/iommu/intel/cache.c | 5 +-
 drivers/iommu/intel/iommu.c | 263 ++++++++++++++-------
 drivers/iommu/intel/iommu.h | 12 +
 drivers/iommu/intel/nested.c | 4 +-
 drivers/iommu/intel/svm.c | 1 -
 drivers/irqchip/irq-mvebu-gicp.c | 2 +-
 drivers/md/md.c | 6 +-
 drivers/mtd/nand/raw/atmel/nand-controller.c | 16 +-
 .../mtd/nand/raw/nuvoton-ma35d1-nand-controller.c | 4 +-
 drivers/mtd/nand/raw/stm32_fmc2_nand.c | 46 ++--
 drivers/mtd/nand/spi/core.c | 18 +-
 drivers/mtd/nand/spi/winbond.c | 80 ++++++-
 drivers/net/can/xilinx_can.c | 16 +-
 drivers/net/dsa/b53/b53_common.c | 17 +-
 drivers/net/ethernet/freescale/fec_main.c | 3 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 5 +-
 drivers/net/ethernet/intel/igb/igb_main.c | 3 +-
 drivers/net/ethernet/ti/icssg/icssg_prueth.c | 20 +-
 drivers/net/ethernet/wangxun/libwx/wx_hw.c | 4 -
 drivers/net/macsec.c | 1 +
 drivers/net/phy/phy.c | 12 +-
 drivers/net/phy/phylink.c | 28 ++-
 drivers/net/wireless/ath/ath12k/core.h | 1 +
 drivers/net/wireless/ath/ath12k/dp_mon.c | 22 +-
 drivers/net/wireless/ath/ath12k/dp_rx.c | 11 +-
 drivers/net/wireless/ath/ath12k/hw.c | 55 +++++
 drivers/net/wireless/ath/ath12k/hw.h | 2 +
 drivers/net/wireless/ath/ath12k/mac.c | 127 +++++-----
 drivers/net/wireless/ath/ath12k/peer.c | 2 +-
 drivers/net/wireless/ath/ath12k/peer.h | 28 +++
 drivers/net/wireless/ath/ath12k/wmi.c | 58 ++++-
 drivers/net/wireless/ath/ath12k/wmi.h | 16 +-
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c | 26 +-
 drivers/pci/controller/pci-mvebu.c | 21 +-
 drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c | 4 +-
 drivers/phy/qualcomm/phy-qcom-qmp-pcie.c | 25 +-
 drivers/phy/tegra/xusb-tegra210.c | 6 +-
 drivers/phy/ti/phy-omap-usb2.c | 13 +
 drivers/phy/ti/phy-ti-pipe3.c | 13 +
 drivers/regulator/sy7636a-regulator.c | 7 +-
 drivers/tty/hvc/hvc_console.c | 6 +-
 drivers/tty/serial/sc16is7xx.c | 14 +-
 drivers/usb/gadget/function/f_midi2.c | 11 +-
 drivers/usb/gadget/udc/dummy_hcd.c | 8 +-
 drivers/usb/host/xhci-dbgcap.c | 94 +++++---
 drivers/usb/host/xhci-mem.c | 2 +-
 drivers/usb/serial/option.c | 17 ++
 drivers/usb/typec/tcpm/tcpm.c | 12 +-
 fs/btrfs/extent_io.c | 73 +++++-
 fs/btrfs/inode.c | 12 +-
 fs/btrfs/qgroup.c | 6 +-
 fs/ceph/addr.c | 9 +-
 fs/ceph/debugfs.c | 14 +-
 fs/ceph/dir.c | 17 +-
 fs/ceph/file.c | 24 +-
 fs/ceph/inode.c | 88 +++++--
 fs/ceph/mds_client.c | 172 ++++++++------
 fs/ceph/mds_client.h | 18 +-
 fs/coredump.c | 4 +
 fs/erofs/data.c | 8 +-
 fs/erofs/fileio.c | 2 +-
 fs/erofs/fscache.c | 2 +-
 fs/erofs/inode.c | 8 +-
 fs/erofs/internal.h | 2 +-
 fs/erofs/super.c | 16 +-
 fs/erofs/zdata.c | 17 +-
 fs/erofs/zmap.c | 98 ++++----
 fs/exec.c | 2 +-
 fs/fhandle.c | 8 +
 fs/fuse/dev.c | 2 +-
 fs/fuse/file.c | 5 +-
 fs/fuse/passthrough.c | 5 +
 fs/kernfs/file.c | 58 +++--
 fs/nfs/client.c | 2 +
 fs/nfs/file.c | 7 +-
 fs/nfs/flexfilelayout/flexfilelayout.c | 21 +-
 fs/nfs/inode.c | 4 +-
 fs/nfs/internal.h | 10 +
 fs/nfs/io.c | 13 +-
 fs/nfs/localio.c | 12 +-
 fs/nfs/nfs42proc.c | 2 +
 fs/nfs/nfs4file.c | 2 +
 fs/nfs/nfs4proc.c | 7 +-
 fs/nfs/write.c | 1 +
 fs/ocfs2/extent_map.c | 10 +-
 fs/proc/generic.c | 3 +-
 fs/resctrl/ctrlmondata.c | 2 +-
 fs/resctrl/internal.h | 4 +-
 fs/resctrl/monitor.c | 6 +-
 fs/smb/client/cifsglob.h | 13 +-
 fs/smb/client/file.c | 18 +-
 fs/smb/client/inode.c | 86 +++++--
 fs/smb/client/smb2glob.h | 3 +-
 fs/smb/client/smb2inode.c | 204 ++++++++++++----
 fs/smb/client/smb2ops.c | 32 ++-
 fs/smb/client/smb2proto.h | 3 +
 fs/smb/client/trace.h | 9 +-
 include/drm/display/drm_dp_helper.h | 6 +
 include/drm/drm_connector.h | 4 +-
 include/drm/drm_edid.h | 8 +
 include/linux/compiler-clang.h | 31 ++-
 include/linux/energy_model.h | 10 +
 include/linux/fs.h | 3 +-
 include/linux/kasan.h | 6 +-
 include/linux/mtd/spinand.h | 8 +
 include/net/netfilter/nf_tables.h | 11 +-
 include/net/netfilter/nf_tables_core.h | 49 ++--
 include/net/netns/nftables.h | 1 +
 include/uapi/linux/raid/md_p.h | 2 +-
 io_uring/rw.c | 3 +
 kernel/bpf/Makefile | 1 +
 kernel/bpf/core.c | 16 +-
 kernel/bpf/cpumap.c | 4 +-
 kernel/bpf/crypto.c | 2 +-
 kernel/bpf/helpers.c | 7 +-
 kernel/bpf/rqspinlock.c | 2 +-
 kernel/dma/debug.c | 48 +++-
 kernel/dma/debug.h | 20 ++
 kernel/dma/mapping.c | 4 +-
 kernel/events/core.c | 1 +
 kernel/power/energy_model.c | 29 ++-
 kernel/power/hibernate.c | 1 +
 kernel/time/hrtimer.c | 11 +-
 kernel/trace/fgraph.c | 3 +-
 kernel/trace/trace.c | 10 +-
 kernel/trace/trace_osnoise.c | 3 +
 mm/damon/core.c | 4 +
 mm/damon/lru_sort.c | 5 +
 mm/damon/reclaim.c | 5 +
 mm/damon/sysfs.c | 14 +-
 mm/hugetlb.c | 9 +-
 mm/kasan/shadow.c | 31 ++-
 mm/khugepaged.c | 4 +-
 mm/memory-failure.c | 20 +-
 mm/vmalloc.c | 8 +-
 net/bluetooth/hci_conn.c | 14 +-
 net/bluetooth/hci_event.c | 7 +-
 net/bluetooth/iso.c | 2 +-
 net/bridge/br.c | 7 +
 net/can/j1939/bus.c | 5 +-
 net/can/j1939/j1939-priv.h | 1 +
 net/can/j1939/main.c | 3 +
 net/can/j1939/socket.c | 52 ++++
 net/ceph/messenger.c | 7 +-
 net/core/dev_ioctl.c | 22 +-
 net/hsr/hsr_device.c | 28 ++-
 net/hsr/hsr_main.c | 4 +-
 net/hsr/hsr_main.h | 3 +
 net/ipv4/ip_tunnel_core.c | 6 +
 net/ipv4/tcp_bpf.c | 5 +-
 net/mptcp/sockopt.c | 11 +-
 net/netfilter/nf_tables_api.c | 123 ++++++----
 net/netfilter/nft_dynset.c | 5 +-
 net/netfilter/nft_lookup.c | 67 ++++--
 net/netfilter/nft_objref.c | 5 +-
 net/netfilter/nft_set_bitmap.c | 14 +-
 net/netfilter/nft_set_hash.c | 54 ++---
 net/netfilter/nft_set_pipapo.c | 211 ++++++-----------
 net/netfilter/nft_set_pipapo_avx2.c | 29 +--
 net/netfilter/nft_set_rbtree.c | 46 ++--
 net/netlink/genetlink.c | 3 +
 net/sunrpc/sched.c | 2 -
 net/sunrpc/xprtsock.c | 6 +-
 net/xdp/xsk.c | 113 +++++++--
 net/xdp/xsk_queue.h | 12 +
 samples/ftrace/ftrace-direct-modify.c | 2 +-
 tools/testing/selftests/net/can/config | 3 +
 234 files changed, 3151 insertions(+), 1711 deletions(-)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
[ Upstream commit d072148a8631f102de60ed5a3a827e85d09d24f0 ]
Currently the kernel will happily route io_uring requests with metadata to file operations that don't support it. Add a FMODE_ flag to guard that.
Fixes: 4de2ce04c862 ("fs: introduce IOCB_HAS_METADATA for metadata")
Signed-off-by: Christoph Hellwig hch@lst.de
Link: https://lore.kernel.org/20250819082517.2038819-2-hch@lst.de
Signed-off-by: Christian Brauner brauner@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 block/fops.c       | 3 +++
 include/linux/fs.h | 3 ++-
 io_uring/rw.c      | 3 +++
 3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/block/fops.c b/block/fops.c
index 1309861d4c2c4..e7bfe65c57f22 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -7,6 +7,7 @@
 #include <linux/init.h>
 #include <linux/mm.h>
 #include <linux/blkdev.h>
+#include <linux/blk-integrity.h>
 #include <linux/buffer_head.h>
 #include <linux/mpage.h>
 #include <linux/uio.h>
@@ -672,6 +673,8 @@ static int blkdev_open(struct inode *inode, struct file *filp)
 
 	if (bdev_can_atomic_write(bdev))
 		filp->f_mode |= FMODE_CAN_ATOMIC_WRITE;
+	if (blk_get_integrity(bdev->bd_disk))
+		filp->f_mode |= FMODE_HAS_METADATA;
 
 	ret = bdev_open(bdev, mode, filp->private_data, NULL, filp);
 	if (ret)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 040c0036320fd..d6716ff498a7a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -149,7 +149,8 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* Expect random access pattern */
 #define FMODE_RANDOM		((__force fmode_t)(1 << 12))
 
-/* FMODE_* bit 13 */
+/* Supports IOCB_HAS_METADATA */
+#define FMODE_HAS_METADATA	((__force fmode_t)(1 << 13))
 
 /* File is opened with O_PATH; almost nothing can be done with it */
 #define FMODE_PATH		((__force fmode_t)(1 << 14))
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 52a5b950b2e5e..af5a54b5db123 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -886,6 +886,9 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type)
 	if (req->flags & REQ_F_HAS_METADATA) {
 		struct io_async_rw *io = req->async_data;
 
+		if (!(file->f_mode & FMODE_HAS_METADATA))
+			return -EINVAL;
+
 		/*
 		 * We have a union of meta fields with wpq used for buffered-io
 		 * in io_async_rw, so fail it here.
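As an aside for driver writers, the new bit follows the usual FMODE_ capability pattern of being set at open time. A minimal sketch of how a non-block driver could opt in, assuming a hypothetical foo driver with a supports_pi flag (only FMODE_HAS_METADATA itself comes from this patch):

static int foo_open(struct inode *inode, struct file *filp)
{
	struct foo_dev *fdev = foo_dev_from_inode(inode); /* hypothetical */

	/*
	 * Advertise metadata support only when the device can actually
	 * service a user metadata buffer; io_uring now fails
	 * REQ_F_HAS_METADATA requests with -EINVAL on files that lack
	 * this bit.
	 */
	if (fdev->supports_pi)
		filp->f_mode |= FMODE_HAS_METADATA;
	return 0;
}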
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
[ Upstream commit 2729a60bbfb9215997f25372ebe9b7964f038296 ]
The block fops don't try to handle metadata for synchronous requests, probably because the completion handler looks at dio->iocb which is not valid for synchronous requests.
But silently ignoring metadata (or warning in case of __blkdev_direct_IO_simple) is a really bad idea as that can cause silent data corruption if a user ever shows up.
Instead simply handle metadata for synchronous requests as the completion handler can simply check for bio_integrity() as the block layer default integrity will already be freed at this point, and thus bio_integrity() will only return true for user mapped integrity.
Fixes: 3d8b5a22d404 ("block: add support to pass user meta buffer")
Signed-off-by: Christoph Hellwig hch@lst.de
Link: https://lore.kernel.org/20250819082517.2038819-3-hch@lst.de
Reviewed-by: Martin K. Petersen martin.petersen@oracle.com
Signed-off-by: Christian Brauner brauner@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 block/fops.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/block/fops.c b/block/fops.c
index e7bfe65c57f22..d62fbefb2e671 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -55,7 +55,6 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 	struct bio bio;
 	ssize_t ret;
 
-	WARN_ON_ONCE(iocb->ki_flags & IOCB_HAS_METADATA);
 	if (nr_pages <= DIO_INLINE_BIO_VECS)
 		vecs = inline_vecs;
 	else {
@@ -132,7 +131,7 @@ static void blkdev_bio_end_io(struct bio *bio)
 	if (bio->bi_status && !dio->bio.bi_status)
 		dio->bio.bi_status = bio->bi_status;
 
-	if (!is_sync && (dio->iocb->ki_flags & IOCB_HAS_METADATA))
+	if (bio_integrity(bio))
 		bio_integrity_unmap_user(bio);
 
 	if (atomic_dec_and_test(&dio->ref)) {
@@ -234,7 +233,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 			}
 			bio->bi_opf |= REQ_NOWAIT;
 		}
-		if (!is_sync && (iocb->ki_flags & IOCB_HAS_METADATA)) {
+		if (iocb->ki_flags & IOCB_HAS_METADATA) {
 			ret = bio_integrity_map_iter(bio, iocb->private);
 			if (unlikely(ret))
 				goto fail;
@@ -302,7 +301,7 @@ static void blkdev_bio_end_io_async(struct bio *bio)
 		ret = blk_status_to_errno(bio->bi_status);
 	}
 
-	if (iocb->ki_flags & IOCB_HAS_METADATA)
+	if (bio_integrity(bio))
 		bio_integrity_unmap_user(bio);
 
 	iocb->ki_complete(iocb, ret);
@@ -423,7 +422,8 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	}
 
 	nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
-	if (likely(nr_pages <= BIO_MAX_VECS)) {
+	if (likely(nr_pages <= BIO_MAX_VECS &&
+		   !(iocb->ki_flags & IOCB_HAS_METADATA))) {
 		if (is_sync_kiocb(iocb))
 			return __blkdev_direct_IO_simple(iocb, iter, bdev,
 							 nr_pages);
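The invariant the completion handlers now rely on can be distilled into the following sketch (pattern only, not the upstream code):

static void foo_bio_end_io(struct bio *bio)
{
	/*
	 * Any integrity payload generated by the block layer itself has
	 * already been freed by the time the bio completes, so a non-NULL
	 * bio_integrity(bio) here can only mean user-mapped metadata that
	 * still needs unmapping - for sync and async requests alike.
	 */
	if (bio_integrity(bio))
		bio_integrity_unmap_user(bio);

	bio_put(bio);
}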
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Brauner brauner@kernel.org
[ Upstream commit be1e0283021ec73c2eb92839db9a471a068709d9 ]
When a read happens it doesn't make sense to perform checks on the input. Skip them.
Whether a fixes tag is warranted is a bit of a gray area here but I'll add one for the socket validation part I added recently.
Link: https://lore.kernel.org/20250821-moosbedeckt-denunziant-7908663f3563@brauner
Fixes: 16195d2c7dd2 ("coredump: validate socket name as it is written")
Reported-by: Brad Spengler brad.spengler@opensrcsec.com
Signed-off-by: Christian Brauner brauner@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/coredump.c | 4 ++++
 fs/exec.c     | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/coredump.c b/fs/coredump.c
index f217ebf2b3b68..012915262d11b 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -1263,11 +1263,15 @@ static int proc_dostring_coredump(const struct ctl_table *table, int write,
 	ssize_t retval;
 	char old_core_pattern[CORENAME_MAX_SIZE];
 
+	if (!write)
+		return proc_dostring(table, write, buffer, lenp, ppos);
+
 	retval = strscpy(old_core_pattern, core_pattern, CORENAME_MAX_SIZE);
 
 	error = proc_dostring(table, write, buffer, lenp, ppos);
 	if (error)
 		return error;
+
 	if (!check_coredump_socket()) {
 		strscpy(core_pattern, old_core_pattern, retval + 1);
 		return -EINVAL;
diff --git a/fs/exec.c b/fs/exec.c
index ba400aafd6406..551e1cc5bf1e3 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -2048,7 +2048,7 @@ static int proc_dointvec_minmax_coredump(const struct ctl_table *table, int write,
 {
 	int error = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 
-	if (!error)
+	if (!error && write)
 		validate_coredump_safety();
 	return error;
 }
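The resulting shape is the common sysctl idiom of validating only on writes. A simplified sketch of the pattern (foo_validate() is hypothetical; proc_dostring() is the real generic handler):

static int foo_dostring_checked(const struct ctl_table *table, int write,
				void *buffer, size_t *lenp, loff_t *ppos)
{
	int error;

	/* Reads only copy the current value out; nothing to validate. */
	if (!write)
		return proc_dostring(table, write, buffer, lenp, ppos);

	error = proc_dostring(table, write, buffer, lenp, ppos);
	if (!error && !foo_validate())
		error = -EINVAL;
	return error;
}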
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Edward Adam Davis eadavis@qq.com
[ Upstream commit 9d81ba6d49a7457784f0b6a71046818b86ec7e44 ]
syz reported a slab-out-of-bounds Write in fuse_dev_do_write.
When the number of bytes to be retrieved is truncated to the upper limit imposed by fc->max_pages and there is an offset, the out-of-bounds write is triggered.
Add a loop termination condition to prevent overruns.
Fixes: 3568a9569326 ("fuse: support large folios for retrieves")
Reported-by: syzbot+2d215d165f9354b9c4ea@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2d215d165f9354b9c4ea
Tested-by: syzbot+2d215d165f9354b9c4ea@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis eadavis@qq.com
Reviewed-by: Joanne Koong joannelkoong@gmail.com
Signed-off-by: Miklos Szeredi mszeredi@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/fuse/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index e80cd8f2c049f..5150aa25e64be 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1893,7 +1893,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
 
 	index = outarg->offset >> PAGE_SHIFT;
 
-	while (num) {
+	while (num && ap->num_folios < num_pages) {
 		struct folio *folio;
 		unsigned int folio_offset;
 		unsigned int nr_bytes;
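A worked example of the overrun, with illustrative numbers not taken from the report: assume PAGE_SIZE is 4096 and fc->max_pages is 256, so num is capped at 1 MiB, and let outarg->offset leave an in-page offset of 512. Then:

	folios needed = DIV_ROUND_UP(num + offset, PAGE_SIZE)
	              = DIV_ROUND_UP(1048576 + 512, 4096) = 257
	num_pages     = min(257, fc->max_pages)            = 256

After filling 256 folios only 1048064 of the 1048576 bytes are consumed, so 512 bytes of num remain and the old while (num) loop would index one slot past the num_pages-entry folio array; the added ap->num_folios < num_pages condition stops exactly at the array bound.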
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit bb585591ebf00fb1f6a1fdd1ea96b5848bd9112d ]
Commit 620c266f39493 ("fhandle: relax open_by_handle_at() permission checks") relaxed the conditions for decoding a file handle from a non-init userns.
The conditions are that the decoded dentry is accessible from the user-provided mountfd (or from the fs root) and that all the ancestors along the path have a valid id mapping in the userns.
These conditions are intentionally more strict than the condition that the decoded dentry should be "lookable" by path from the mountfd.
For example, the path /home/amir/dir/subdir is lookable by path from unpriv userns of user amir, because /home perms is 755, but the owner of /home does not have a valid id mapping in unpriv userns of user amir.
The current code did not check that the decoded dentry itself has a valid id mapping in the userns. There is no security risk in that, because that final open still performs the needed permission checks, but this is inconsistent with the checks performed on the ancestors, so the behavior can be a bit confusing.
Add the check for the decoded dentry itself, so that the entire path, including the last component has a valid id mapping in the userns.
Fixes: 620c266f39493 ("fhandle: relax open_by_handle_at() permission checks")
Signed-off-by: Amir Goldstein amir73il@gmail.com
Link: https://lore.kernel.org/20250827194309.1259650-1-amir73il@gmail.com
Signed-off-by: Christian Brauner brauner@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/fhandle.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/fs/fhandle.c b/fs/fhandle.c
index e21ec857f2abc..52c72896e1c16 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -202,6 +202,14 @@ static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
 	if (!ctx->flags)
 		return 1;
 
+	/*
+	 * Verify that the decoded dentry itself has a valid id mapping.
+	 * In case the decoded dentry is the mountfd root itself, this
+	 * verifies that the mountfd inode itself has a valid id mapping.
+	 */
+	if (!privileged_wrt_inode_uidgid(user_ns, idmap, d_inode(dentry)))
+		return 0;
+
 	/*
 	 * It's racy as we're not taking rename_lock but we're able to ignore
 	 * permissions and we just need an approximation whether we were able
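Putting the pieces together, the acceptance rule now covers the whole path. A condensed sketch of the logic after this patch (heavily simplified; the real vfs_dentry_acceptable() also checks connectivity to the mountfd root and handles disconnected dentries):

	/* New check: the last component itself must be mapped. */
	if (!privileged_wrt_inode_uidgid(user_ns, idmap, d_inode(dentry)))
		return 0;

	/* Existing rule: so must every ancestor up to the root. */
	for (d = dget(dentry); !IS_ROOT(d); ) {
		struct dentry *parent = dget_parent(d);

		dput(d);
		if (!privileged_wrt_inode_uidgid(user_ns, idmap,
						 d_inode(parent))) {
			dput(parent);
			return 0;
		}
		d = parent;
	}
	dput(d);
	return 1;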
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baochen Qiang baochen.qiang@oss.qualcomm.com
[ Upstream commit 7e2368a21741e2db542330b32aa6fdd8908e7cff ]
As discussed in [1], there is no need to enforce dma mapping check on noncoherent allocations, a simple test on the returned CPU address is good enough.
Add a new pair of debug helpers and use them for noncoherent alloc/free to fix this issue.
Fixes: efa70f2fdc84 ("dma-mapping: add a new dma_alloc_pages API")
Link: https://lore.kernel.org/all/ff6c1fe6-820f-4e58-8395-df06aa91706c@oss.qualcom... # 1
Signed-off-by: Baochen Qiang baochen.qiang@oss.qualcomm.com
Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com
Link: https://lore.kernel.org/r/20250828-dma-debug-fix-noncoherent-dma-check-v1-1-...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/dma/debug.c   | 48 +++++++++++++++++++++++++++++++++++++++++-
 kernel/dma/debug.h   | 20 ++++++++++++++++++
 kernel/dma/mapping.c |  4 ++--
 3 files changed, 69 insertions(+), 3 deletions(-)
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index e43c6de2bce4e..b82399437db03 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -39,6 +39,7 @@ enum {
 	dma_debug_sg,
 	dma_debug_coherent,
 	dma_debug_resource,
+	dma_debug_noncoherent,
 };
 
 enum map_err_types {
@@ -141,6 +142,7 @@ static const char *type2name[] = {
 	[dma_debug_sg] = "scatter-gather",
 	[dma_debug_coherent] = "coherent",
 	[dma_debug_resource] = "resource",
+	[dma_debug_noncoherent] = "noncoherent",
 };
 
 static const char *dir2name[] = {
@@ -993,7 +995,8 @@ static void check_unmap(struct dma_debug_entry *ref)
 			   "[mapped as %s] [unmapped as %s]\n",
 			   ref->dev_addr, ref->size,
 			   type2name[entry->type], type2name[ref->type]);
-	} else if (entry->type == dma_debug_coherent &&
+	} else if ((entry->type == dma_debug_coherent ||
+		    entry->type == dma_debug_noncoherent) &&
 		   ref->paddr != entry->paddr) {
 		err_printk(ref->dev, entry, "device driver frees "
 			   "DMA memory with different CPU address "
@@ -1581,6 +1584,49 @@ void debug_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 	}
 }
 
+void debug_dma_alloc_pages(struct device *dev, struct page *page,
+			   size_t size, int direction,
+			   dma_addr_t dma_addr,
+			   unsigned long attrs)
+{
+	struct dma_debug_entry *entry;
+
+	if (unlikely(dma_debug_disabled()))
+		return;
+
+	entry = dma_entry_alloc();
+	if (!entry)
+		return;
+
+	entry->type = dma_debug_noncoherent;
+	entry->dev = dev;
+	entry->paddr = page_to_phys(page);
+	entry->size = size;
+	entry->dev_addr = dma_addr;
+	entry->direction = direction;
+
+	add_dma_entry(entry, attrs);
+}
+
+void debug_dma_free_pages(struct device *dev, struct page *page,
+			  size_t size, int direction,
+			  dma_addr_t dma_addr)
+{
+	struct dma_debug_entry ref = {
+		.type = dma_debug_noncoherent,
+		.dev = dev,
+		.paddr = page_to_phys(page),
+		.dev_addr = dma_addr,
+		.size = size,
+		.direction = direction,
+	};
+
+	if (unlikely(dma_debug_disabled()))
+		return;
+
+	check_unmap(&ref);
+}
+
 static int __init dma_debug_driver_setup(char *str)
 {
 	int i;
diff --git a/kernel/dma/debug.h b/kernel/dma/debug.h
index f525197d3cae6..48757ca13f314 100644
--- a/kernel/dma/debug.h
+++ b/kernel/dma/debug.h
@@ -54,6 +54,13 @@ extern void debug_dma_sync_sg_for_cpu(struct device *dev,
 extern void debug_dma_sync_sg_for_device(struct device *dev,
 					 struct scatterlist *sg,
 					 int nelems, int direction);
+extern void debug_dma_alloc_pages(struct device *dev, struct page *page,
+				  size_t size, int direction,
+				  dma_addr_t dma_addr,
+				  unsigned long attrs);
+extern void debug_dma_free_pages(struct device *dev, struct page *page,
+				 size_t size, int direction,
+				 dma_addr_t dma_addr);
 #else /* CONFIG_DMA_API_DEBUG */
 static inline void debug_dma_map_page(struct device *dev, struct page *page,
 				      size_t offset, size_t size,
@@ -126,5 +133,18 @@ static inline void debug_dma_sync_sg_for_device(struct device *dev,
 						int nelems, int direction)
 {
 }
+
+static inline void debug_dma_alloc_pages(struct device *dev, struct page *page,
+					 size_t size, int direction,
+					 dma_addr_t dma_addr,
+					 unsigned long attrs)
+{
+}
+
+static inline void debug_dma_free_pages(struct device *dev, struct page *page,
+					size_t size, int direction,
+					dma_addr_t dma_addr)
+{
+}
 #endif /* CONFIG_DMA_API_DEBUG */
 #endif /* _KERNEL_DMA_DEBUG_H */
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 107e4a4d251df..56de28a3b1799 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -712,7 +712,7 @@ struct page *dma_alloc_pages(struct device *dev, size_t size,
 	if (page) {
 		trace_dma_alloc_pages(dev, page_to_virt(page), *dma_handle,
 				      size, dir, gfp, 0);
-		debug_dma_map_page(dev, page, 0, size, dir, *dma_handle, 0);
+		debug_dma_alloc_pages(dev, page, size, dir, *dma_handle, 0);
 	} else {
 		trace_dma_alloc_pages(dev, NULL, 0, size, dir, gfp, 0);
 	}
@@ -738,7 +738,7 @@ void dma_free_pages(struct device *dev, size_t size, struct page *page,
 		dma_addr_t dma_handle, enum dma_data_direction dir)
 {
 	trace_dma_free_pages(dev, page_to_virt(page), dma_handle, size, dir, 0);
-	debug_dma_unmap_page(dev, dma_handle, size, dir);
+	debug_dma_free_pages(dev, page, size, dir, dma_handle);
 	__dma_free_pages(dev, size, page, dma_handle, dir);
 }
 EXPORT_SYMBOL_GPL(dma_free_pages);
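For reference, the call pattern being tracked is the non-coherent page allocator. A hedged usage sketch (device setup omitted; the SZ_64K size and bidirectional direction are arbitrary choices for illustration):

	struct page *page;
	dma_addr_t handle;

	/* Creates a dma_debug_noncoherent entry when DMA_API_DEBUG is on. */
	page = dma_alloc_pages(dev, SZ_64K, &handle, DMA_BIDIRECTIONAL,
			       GFP_KERNEL);
	if (!page)
		return -ENOMEM;

	/* ... program the device, sync as needed ... */

	/* Must pass back the same page/size/direction; check_unmap() now
	 * matches on the CPU address rather than on a streaming mapping. */
	dma_free_pages(dev, SZ_64K, page, handle, DMA_BIDIRECTIONAL);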
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kan Liang kan.liang@linux.intel.com
[ Upstream commit 18dbcbfabfffc4a5d3ea10290c5ad27f22b0d240 ]
The event_limit can be set by the PERF_EVENT_IOC_REFRESH to limit the number of events. When the event_limit reaches 0, the POLL_HUP signal should be sent. But it's missed.
The corresponding counter should be stopped when the event_limit reaches 0. It was implemented in the ARCH-specific code. However, since the commit 9734e25fbf5a ("perf: Fix the throttle logic for a group"), all the ARCH-specific code has been moved to the generic code. The code to handle the event_limit was lost.
Add the event->pmu->stop(event, 0); back.
Fixes: 9734e25fbf5a ("perf: Fix the throttle logic for a group")
Closes: https://lore.kernel.org/lkml/aICYAqM5EQUlTqtX@li-2b55cdcc-350b-11b2-a85c-a78...
Reported-by: Sumanth Korikkar sumanthk@linux.ibm.com
Signed-off-by: Kan Liang kan.liang@linux.intel.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Tested-by: Sumanth Korikkar sumanthk@linux.ibm.com
Link: https://lkml.kernel.org/r/20250811182644.1305952-1-kan.liang@linux.intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/events/core.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 872122e074e5f..820127536e62b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10330,6 +10330,7 @@ static int __perf_event_overflow(struct perf_event *event,
 		ret = 1;
 		event->pending_kill = POLL_HUP;
 		perf_event_disable_inatomic(event);
+		event->pmu->stop(event, 0);
 	}
 
 	if (event->attr.sigtrap) {
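For context, the limit is armed from userspace roughly like this; a sketch assuming fd comes from perf_event_open() and a SIGIO handler is already installed:

	/* Allow exactly one more overflow, then expect POLL_HUP. */
	fcntl(fd, F_SETOWN, getpid());
	fcntl(fd, F_SETFL, O_ASYNC);
	ioctl(fd, PERF_EVENT_IOC_REFRESH, 1);

	/*
	 * The workload runs; on the final overflow the kernel now both
	 * delivers the signal with POLL_HUP and stops the counter via
	 * event->pmu->stop(), as the old arch-specific code did.
	 */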
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit c8bb0f00a4886b24d933ffaabcdc09bf9a370dca ]
ioremap() never returns error pointers, it returns NULL on error. Fix the check to match.
Fixes: 3c3d7dbab2c7 ("irqchip/mvebu-gicp: Clear pending interrupts on init")
Signed-off-by: Dan Carpenter dan.carpenter@linaro.org
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Link: https://lore.kernel.org/all/aKRGcgMeaXm2TMIC@stanley.mountain
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/irqchip/irq-mvebu-gicp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-mvebu-gicp.c b/drivers/irqchip/irq-mvebu-gicp.c
index 54833717f8a70..667bde3c651ff 100644
--- a/drivers/irqchip/irq-mvebu-gicp.c
+++ b/drivers/irqchip/irq-mvebu-gicp.c
@@ -238,7 +238,7 @@ static int mvebu_gicp_probe(struct platform_device *pdev)
 	}
 
 	base = ioremap(gicp->res->start, resource_size(gicp->res));
-	if (IS_ERR(base)) {
+	if (!base) {
 		dev_err(&pdev->dev, "ioremap() failed. Unable to clear pending interrupts.\n");
 	} else {
 		for (i = 0; i < 64; i++)
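The underlying rule, as a short reminder of the two error conventions (sketch, not taken from the patch):

	void __iomem *base;

	base = ioremap(res->start, resource_size(res));	/* NULL on error */
	if (!base)
		return -ENOMEM;

	base = devm_ioremap_resource(dev, res);		/* ERR_PTR() on error */
	if (IS_ERR(base))
		return PTR_ERR(base);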
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 3ba486c5f3ce2c22ffd29c0103404cdbe21912b3 ]
This fixes Broadcaster/Broadcast Source not sending HCI_OP_LE_TERM_BIG because HCI_CONN_PER_ADV was not being set.
Fixes: a7bcffc673de ("Bluetooth: Add PA_LINK to distinguish BIG sync and PA sync connections")
Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/bluetooth/hci_conn.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
index ad5574e9a93ee..dc4f23ceff2a6 100644
--- a/net/bluetooth/hci_conn.c
+++ b/net/bluetooth/hci_conn.c
@@ -2274,7 +2274,7 @@ struct hci_conn *hci_connect_bis(struct hci_dev *hdev, bdaddr_t *dst,
 	 * the start periodic advertising and create BIG commands have
 	 * been queued
 	 */
-	hci_conn_hash_list_state(hdev, bis_mark_per_adv, PA_LINK,
+	hci_conn_hash_list_state(hdev, bis_mark_per_adv, BIS_LINK,
 				 BT_BOUND, &data);
 
 	/* Queue start periodic advertising and create BIG */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Gunthorpe jgg@nvidia.com
[ Upstream commit b9434ba97c44f5744ea537adfd1f9f3fe102681c ]
Create stage-specific functions that check the stage-specific conditions for whether each stage can be supported.
Have intel_iommu_domain_alloc_paging_flags() call both stages in sequence until one does not return EOPNOTSUPP and prefer to use the first stage if available and suitable for the requested flags.
Move second stage only operations like nested_parent and dirty_tracking into the second stage function for clarity.
Move initialization of the iommu_domain members into paging_domain_alloc().
Drop initialization of domain->owner as the callers all do it.
Reviewed-by: Kevin Tian kevin.tian@intel.com
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Link: https://lore.kernel.org/r/4-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20250714045028.958850-7-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon will@kernel.org
Stable-dep-of: cee686775f9c ("iommu/vt-d: Make iotlb_sync_map a static property of dmar_domain")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/iommu/intel/iommu.c | 100 +++++++++++++++++++++---------------
 1 file changed, 58 insertions(+), 42 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c239e280e43d9..f9f16d9bbf0bc 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3299,10 +3299,15 @@ static struct dmar_domain *paging_domain_alloc(struct device *dev, bool first_stage)
 	spin_lock_init(&domain->lock);
 	spin_lock_init(&domain->cache_lock);
 	xa_init(&domain->iommu_array);
+	INIT_LIST_HEAD(&domain->s1_domains);
+	spin_lock_init(&domain->s1_lock);
 
 	domain->nid = dev_to_node(dev);
 	domain->use_first_level = first_stage;
 
+	domain->domain.type = IOMMU_DOMAIN_UNMANAGED;
+	domain->domain.ops = intel_iommu_ops.default_domain_ops;
+
 	/* calculate the address width */
 	addr_width = agaw_to_width(iommu->agaw);
 	if (addr_width > cap_mgaw(iommu->cap))
@@ -3344,62 +3349,73 @@ static struct dmar_domain *paging_domain_alloc(struct device *dev, bool first_stage)
 }
 
 static struct iommu_domain *
-intel_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
-				      const struct iommu_user_data *user_data)
+intel_iommu_domain_alloc_first_stage(struct device *dev,
+				     struct intel_iommu *iommu, u32 flags)
+{
+	struct dmar_domain *dmar_domain;
+
+	if (flags & ~IOMMU_HWPT_ALLOC_PASID)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	/* Only SL is available in legacy mode */
+	if (!sm_supported(iommu) || !ecap_flts(iommu->ecap))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	dmar_domain = paging_domain_alloc(dev, true);
+	if (IS_ERR(dmar_domain))
+		return ERR_CAST(dmar_domain);
+	return &dmar_domain->domain;
+}
+
+static struct iommu_domain *
+intel_iommu_domain_alloc_second_stage(struct device *dev,
+				      struct intel_iommu *iommu, u32 flags)
 {
-	struct device_domain_info *info = dev_iommu_priv_get(dev);
-	bool dirty_tracking = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
-	bool nested_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
-	struct intel_iommu *iommu = info->iommu;
 	struct dmar_domain *dmar_domain;
-	struct iommu_domain *domain;
-	bool first_stage;
 
 	if (flags &
 	    (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
 	       IOMMU_HWPT_ALLOC_PASID)))
 		return ERR_PTR(-EOPNOTSUPP);
-	if (nested_parent && !nested_supported(iommu))
-		return ERR_PTR(-EOPNOTSUPP);
-	if (user_data || (dirty_tracking && !ssads_supported(iommu)))
+
+	if (((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) &&
+	     !nested_supported(iommu)) ||
+	    ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) &&
+	     !ssads_supported(iommu)))
 		return ERR_PTR(-EOPNOTSUPP);
 
-	/*
-	 * Always allocate the guest compatible page table unless
-	 * IOMMU_HWPT_ALLOC_NEST_PARENT or IOMMU_HWPT_ALLOC_DIRTY_TRACKING
-	 * is specified.
-	 */
-	if (nested_parent || dirty_tracking) {
-		if (!sm_supported(iommu) || !ecap_slts(iommu->ecap))
-			return ERR_PTR(-EOPNOTSUPP);
-		first_stage = false;
-	} else {
-		first_stage = first_level_by_default(iommu);
-	}
+	/* Legacy mode always supports second stage */
+	if (sm_supported(iommu) && !ecap_slts(iommu->ecap))
+		return ERR_PTR(-EOPNOTSUPP);
 
-	dmar_domain = paging_domain_alloc(dev, first_stage);
+	dmar_domain = paging_domain_alloc(dev, false);
 	if (IS_ERR(dmar_domain))
 		return ERR_CAST(dmar_domain);
-	domain = &dmar_domain->domain;
-	domain->type = IOMMU_DOMAIN_UNMANAGED;
-	domain->owner = &intel_iommu_ops;
-	domain->ops = intel_iommu_ops.default_domain_ops;
-
-	if (nested_parent) {
-		dmar_domain->nested_parent = true;
-		INIT_LIST_HEAD(&dmar_domain->s1_domains);
-		spin_lock_init(&dmar_domain->s1_lock);
-	}
 
-	if (dirty_tracking) {
-		if (dmar_domain->use_first_level) {
-			iommu_domain_free(domain);
-			return ERR_PTR(-EOPNOTSUPP);
-		}
-		domain->dirty_ops = &intel_dirty_ops;
-	}
+	dmar_domain->nested_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
 
-	return domain;
+	if (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING)
+		dmar_domain->domain.dirty_ops = &intel_dirty_ops;
+
+	return &dmar_domain->domain;
+}
+
+static struct iommu_domain *
+intel_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
+				      const struct iommu_user_data *user_data)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+	struct iommu_domain *domain;
+
+	if (user_data)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	/* Prefer first stage if possible by default. */
+	domain = intel_iommu_domain_alloc_first_stage(dev, iommu, flags);
+	if (domain != ERR_PTR(-EOPNOTSUPP))
+		return domain;
+	return intel_iommu_domain_alloc_second_stage(dev, iommu, flags);
 }
 
 static void intel_iommu_domain_free(struct iommu_domain *domain)
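The try-first-then-fall-back dispatch above is a generic pattern worth noting: only -EOPNOTSUPP falls through to the next backend, while success and every other error return immediately. A sketch with hypothetical names:

	domain = try_first_stage(dev, flags);
	if (domain != ERR_PTR(-EOPNOTSUPP))
		return domain;	/* success, or a real error such as -ENOMEM */
	return try_second_stage(dev, flags);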
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Gunthorpe jgg@nvidia.com
[ Upstream commit b33125296b5047115469b8a3b74c0fdbf4976548 ]
Use the domain ops pointer to tell what kind of domain it is instead of the internal use_first_level indication. This also protects against wrongly using a SVA/nested/IDENTITY/BLOCKED domain type in places they should not be.
The only remaining uses of use_first_level outside the paging domain are in paging_domain_compatible() and intel_iommu_enforce_cache_coherency().
Thus, remove the useless sets of use_first_level in intel_svm_domain_alloc() and intel_iommu_domain_alloc_nested(). None of the unique ops for these domain types ever reference it on their call chains.
Add a WARN_ON() check in domain_context_mapping_one() as it only works with second stage.
This is preparation for iommupt which will have different ops for each of the stages.
Reviewed-by: Kevin Tian kevin.tian@intel.com
Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Link: https://lore.kernel.org/r/5-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20250714045028.958850-8-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon will@kernel.org
Stable-dep-of: cee686775f9c ("iommu/vt-d: Make iotlb_sync_map a static property of dmar_domain")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/iommu/intel/cache.c  |  5 +--
 drivers/iommu/intel/iommu.c  | 60 +++++++++++++++++++++++++-----------
 drivers/iommu/intel/iommu.h  | 12 ++++++++
 drivers/iommu/intel/nested.c |  4 +--
 drivers/iommu/intel/svm.c    |  1 -
 5 files changed, 58 insertions(+), 24 deletions(-)
diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
index c8b79de84d3fb..071f78e67fcba 100644
--- a/drivers/iommu/intel/cache.c
+++ b/drivers/iommu/intel/cache.c
@@ -370,7 +370,7 @@ static void cache_tag_flush_iotlb(struct dmar_domain *domain, struct cache_tag *tag,
 	struct intel_iommu *iommu = tag->iommu;
 	u64 type = DMA_TLB_PSI_FLUSH;
 
-	if (domain->use_first_level) {
+	if (intel_domain_is_fs_paging(domain)) {
 		qi_batch_add_piotlb(iommu, tag->domain_id, tag->pasid, addr,
 				    pages, ih, domain->qi_batch);
 		return;
@@ -529,7 +529,8 @@ void cache_tag_flush_range_np(struct dmar_domain *domain, unsigned long start,
 			qi_batch_flush_descs(iommu, domain->qi_batch);
 		iommu = tag->iommu;
 
-		if (!cap_caching_mode(iommu->cap) || domain->use_first_level) {
+		if (!cap_caching_mode(iommu->cap) ||
+		    intel_domain_is_fs_paging(domain)) {
 			iommu_flush_write_buffer(iommu);
 			continue;
 		}
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index f9f16d9bbf0bc..207f87eeb47a2 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1479,6 +1479,9 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 	struct context_entry *context;
 	int ret;
 
+	if (WARN_ON(!intel_domain_is_ss_paging(domain)))
+		return -EINVAL;
+
 	pr_debug("Set context mapping for %02x:%02x.%d\n",
 		bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
 
@@ -1798,7 +1801,7 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
 static bool domain_need_iotlb_sync_map(struct dmar_domain *domain,
 				       struct intel_iommu *iommu)
 {
-	if (cap_caching_mode(iommu->cap) && !domain->use_first_level)
+	if (cap_caching_mode(iommu->cap) && intel_domain_is_ss_paging(domain))
 		return true;
 
 	if (rwbf_quirk || cap_rwbf(iommu->cap))
@@ -1830,12 +1833,14 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
 
 	if (!sm_supported(iommu))
 		ret = domain_context_mapping(domain, dev);
-	else if (domain->use_first_level)
+	else if (intel_domain_is_fs_paging(domain))
 		ret = domain_setup_first_level(iommu, domain, dev,
 					       IOMMU_NO_PASID, NULL);
-	else
+	else if (intel_domain_is_ss_paging(domain))
 		ret = domain_setup_second_level(iommu, domain, dev,
 						IOMMU_NO_PASID, NULL);
+	else if (WARN_ON(true))
+		ret = -EINVAL;
 
 	if (ret)
 		goto out_block_translation;
@@ -3306,7 +3311,6 @@ static struct dmar_domain *paging_domain_alloc(struct device *dev, bool first_stage)
 	domain->use_first_level = first_stage;
 
 	domain->domain.type = IOMMU_DOMAIN_UNMANAGED;
-	domain->domain.ops = intel_iommu_ops.default_domain_ops;
 
 	/* calculate the address width */
 	addr_width = agaw_to_width(iommu->agaw);
@@ -3364,6 +3368,8 @@ intel_iommu_domain_alloc_first_stage(struct device *dev,
 	dmar_domain = paging_domain_alloc(dev, true);
 	if (IS_ERR(dmar_domain))
 		return ERR_CAST(dmar_domain);
+
+	dmar_domain->domain.ops = &intel_fs_paging_domain_ops;
 	return &dmar_domain->domain;
 }
 
@@ -3392,6 +3398,7 @@ intel_iommu_domain_alloc_second_stage(struct device *dev,
 	if (IS_ERR(dmar_domain))
 		return ERR_CAST(dmar_domain);
 
+	dmar_domain->domain.ops = &intel_ss_paging_domain_ops;
 	dmar_domain->nested_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
 
 	if (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING)
@@ -4110,12 +4117,15 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain,
 	if (ret)
 		goto out_remove_dev_pasid;
 
-	if (dmar_domain->use_first_level)
+	if (intel_domain_is_fs_paging(dmar_domain))
 		ret = domain_setup_first_level(iommu, dmar_domain,
 					       dev, pasid, old);
-	else
+	else if (intel_domain_is_ss_paging(dmar_domain))
 		ret = domain_setup_second_level(iommu, dmar_domain,
 						dev, pasid, old);
+	else if (WARN_ON(true))
+		ret = -EINVAL;
+
 	if (ret)
 		goto out_unwind_iopf;
 
@@ -4390,6 +4400,32 @@ static struct iommu_domain identity_domain = {
 	},
 };
 
+const struct iommu_domain_ops intel_fs_paging_domain_ops = {
+	.attach_dev = intel_iommu_attach_device,
+	.set_dev_pasid = intel_iommu_set_dev_pasid,
+	.map_pages = intel_iommu_map_pages,
+	.unmap_pages = intel_iommu_unmap_pages,
+	.iotlb_sync_map = intel_iommu_iotlb_sync_map,
+	.flush_iotlb_all = intel_flush_iotlb_all,
+	.iotlb_sync = intel_iommu_tlb_sync,
+	.iova_to_phys = intel_iommu_iova_to_phys,
+	.free = intel_iommu_domain_free,
+	.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
+};
+
+const struct iommu_domain_ops intel_ss_paging_domain_ops = {
+	.attach_dev = intel_iommu_attach_device,
+	.set_dev_pasid = intel_iommu_set_dev_pasid,
+	.map_pages = intel_iommu_map_pages,
+	.unmap_pages = intel_iommu_unmap_pages,
+	.iotlb_sync_map = intel_iommu_iotlb_sync_map,
+	.flush_iotlb_all = intel_flush_iotlb_all,
+	.iotlb_sync = intel_iommu_tlb_sync,
+	.iova_to_phys = intel_iommu_iova_to_phys,
+	.free = intel_iommu_domain_free,
+	.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
+};
+
 const struct iommu_ops intel_iommu_ops = {
 	.blocked_domain = &blocking_domain,
 	.release_domain = &blocking_domain,
@@ -4407,18 +4443,6 @@ const struct iommu_ops intel_iommu_ops = {
 	.is_attach_deferred = intel_iommu_is_attach_deferred,
 	.def_domain_type = device_def_domain_type,
 	.page_response = intel_iommu_page_response,
-	.default_domain_ops = &(const struct iommu_domain_ops) {
-		.attach_dev = intel_iommu_attach_device,
-		.set_dev_pasid = intel_iommu_set_dev_pasid,
-		.map_pages = intel_iommu_map_pages,
-		.unmap_pages = intel_iommu_unmap_pages,
-		.iotlb_sync_map = intel_iommu_iotlb_sync_map,
-		.flush_iotlb_all = intel_flush_iotlb_all,
-		.iotlb_sync = intel_iommu_tlb_sync,
-		.iova_to_phys = intel_iommu_iova_to_phys,
-		.free = intel_iommu_domain_free,
-		.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
-	}
 };
 
 static void quirk_iommu_igfx(struct pci_dev *dev)
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 61f42802fe9e9..c699ed8810f23 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1381,6 +1381,18 @@ struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
 					 u8 devfn, int alloc);
extern const struct iommu_ops intel_iommu_ops; +extern const struct iommu_domain_ops intel_fs_paging_domain_ops; +extern const struct iommu_domain_ops intel_ss_paging_domain_ops; + +static inline bool intel_domain_is_fs_paging(struct dmar_domain *domain) +{ + return domain->domain.ops == &intel_fs_paging_domain_ops; +} + +static inline bool intel_domain_is_ss_paging(struct dmar_domain *domain) +{ + return domain->domain.ops == &intel_ss_paging_domain_ops; +}
#ifdef CONFIG_INTEL_IOMMU extern int intel_iommu_sm; diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c index fc312f649f9ef..1b6ad9c900a5a 100644 --- a/drivers/iommu/intel/nested.c +++ b/drivers/iommu/intel/nested.c @@ -216,8 +216,7 @@ intel_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent, /* Must be nested domain */ if (user_data->type != IOMMU_HWPT_DATA_VTD_S1) return ERR_PTR(-EOPNOTSUPP); - if (parent->ops != intel_iommu_ops.default_domain_ops || - !s2_domain->nested_parent) + if (!intel_domain_is_ss_paging(s2_domain) || !s2_domain->nested_parent) return ERR_PTR(-EINVAL);
ret = iommu_copy_struct_from_user(&vtd, user_data, @@ -229,7 +228,6 @@ intel_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent, if (!domain) return ERR_PTR(-ENOMEM);
- domain->use_first_level = true; domain->s2_domain = s2_domain; domain->s1_cfg = vtd; domain->domain.ops = &intel_nested_domain_ops; diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c index f3da596410b5e..3994521f6ea48 100644 --- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -214,7 +214,6 @@ struct iommu_domain *intel_svm_domain_alloc(struct device *dev, return ERR_PTR(-ENOMEM);
domain->domain.ops = &intel_svm_domain_ops; - domain->use_first_level = true; INIT_LIST_HEAD(&domain->dev_pasids); INIT_LIST_HEAD(&domain->cache_tags); spin_lock_init(&domain->cache_lock);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Gunthorpe jgg@nvidia.com
[ Upstream commit 85cfaacc99377a63e47412eeef66eff77197acea ]
Add first/second stage specific functions that follow the same pattern as intel_iommu_domain_alloc_first/second_stage() uses for computing EOPNOTSUPP. This makes the code easier to understand: if we could not create a domain with the given parameters on this IOMMU instance, then we are certainly not compatible with it.
Check superpage support directly against the per-stage cap bits and the pgsize_bitmap.
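For example, the second-stage variant checks each large page size directly against the capability register (sketch condensed from the hunk below):

  unsigned int sslps = cap_super_page_val(iommu->cap);

  if (!(sslps & BIT(0)) && (dmar_domain->domain.pgsize_bitmap & SZ_2M))
          return -EINVAL; /* 2M pages requested but unsupported */
  if (!(sslps & BIT(1)) && (dmar_domain->domain.pgsize_bitmap & SZ_1G))
          return -EINVAL; /* 1G pages requested but unsupported */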
Add a note that the force_snooping is read without locking. The locking needs to cover the compatible check and the add of the device to the list.
Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/7-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20250714045028.958850-10-baolu.lu@linux.intel.com Signed-off-by: Will Deacon will@kernel.org Stable-dep-of: cee686775f9c ("iommu/vt-d: Make iotlb_sync_map a static property of dmar_domain") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/intel/iommu.c | 66 ++++++++++++++++++++++++++++++------- 1 file changed, 54 insertions(+), 12 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 207f87eeb47a2..a718f0bc14cdf 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -3434,33 +3434,75 @@ static void intel_iommu_domain_free(struct iommu_domain *domain) domain_exit(dmar_domain); }
+static int paging_domain_compatible_first_stage(struct dmar_domain *dmar_domain, + struct intel_iommu *iommu) +{ + if (WARN_ON(dmar_domain->domain.dirty_ops || + dmar_domain->nested_parent)) + return -EINVAL; + + /* Only SL is available in legacy mode */ + if (!sm_supported(iommu) || !ecap_flts(iommu->ecap)) + return -EINVAL; + + /* Same page size support */ + if (!cap_fl1gp_support(iommu->cap) && + (dmar_domain->domain.pgsize_bitmap & SZ_1G)) + return -EINVAL; + return 0; +} + +static int +paging_domain_compatible_second_stage(struct dmar_domain *dmar_domain, + struct intel_iommu *iommu) +{ + unsigned int sslps = cap_super_page_val(iommu->cap); + + if (dmar_domain->domain.dirty_ops && !ssads_supported(iommu)) + return -EINVAL; + if (dmar_domain->nested_parent && !nested_supported(iommu)) + return -EINVAL; + + /* Legacy mode always supports second stage */ + if (sm_supported(iommu) && !ecap_slts(iommu->ecap)) + return -EINVAL; + + /* Same page size support */ + if (!(sslps & BIT(0)) && (dmar_domain->domain.pgsize_bitmap & SZ_2M)) + return -EINVAL; + if (!(sslps & BIT(1)) && (dmar_domain->domain.pgsize_bitmap & SZ_1G)) + return -EINVAL; + return 0; +} + int paging_domain_compatible(struct iommu_domain *domain, struct device *dev) { struct device_domain_info *info = dev_iommu_priv_get(dev); struct dmar_domain *dmar_domain = to_dmar_domain(domain); struct intel_iommu *iommu = info->iommu; + int ret = -EINVAL; int addr_width;
- if (WARN_ON_ONCE(!(domain->type & __IOMMU_DOMAIN_PAGING))) - return -EPERM; + if (intel_domain_is_fs_paging(dmar_domain)) + ret = paging_domain_compatible_first_stage(dmar_domain, iommu); + else if (intel_domain_is_ss_paging(dmar_domain)) + ret = paging_domain_compatible_second_stage(dmar_domain, iommu); + else if (WARN_ON(true)) + ret = -EINVAL; + if (ret) + return ret;
+ /* + * FIXME this is locked wrong, it needs to be under the + * dmar_domain->lock + */ if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap)) return -EINVAL;
- if (domain->dirty_ops && !ssads_supported(iommu)) - return -EINVAL; - if (dmar_domain->iommu_coherency != iommu_paging_structure_coherency(iommu)) return -EINVAL;
- if (dmar_domain->iommu_superpage != - iommu_superpage_capability(iommu, dmar_domain->use_first_level)) - return -EINVAL; - - if (dmar_domain->use_first_level && - (!sm_supported(iommu) || !ecap_flts(iommu->ecap))) - return -EINVAL;
/* check if this iommu agaw is sufficient for max mapped address */ addr_width = agaw_to_width(iommu->agaw);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lu Baolu baolu.lu@linux.intel.com
[ Upstream commit cee686775f9cd4eae31f3c1f7ec24b2048082667 ]
Commit 12724ce3fe1a ("iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes") dynamically set iotlb_sync_map. This causes synchronization issues due to lack of locking on map and attach paths, racing iommufd userspace operations.
Invalidation changes must precede device attachment to ensure all flushes complete before hardware walks page tables, preventing coherence issues.
Make domain->iotlb_sync_map static, set once during domain allocation. If an IOMMU requires iotlb_sync_map but the domain lacks it, attach is rejected. This won't reduce domain sharing: RWBF and shadowing page table caching are legacy uses with legacy hardware. Mixed configs (some IOMMUs in caching mode, others not) are unlikely in real-world scenarios.
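The decision is then made once at allocation time, e.g. for second-stage domains (sketch condensed from the hunk below):

  /* Set once at domain allocation, never changed at attach time */
  if (rwbf_required(iommu) || cap_caching_mode(iommu->cap))
          dmar_domain->iotlb_sync_map = true;

and the paging_domain_compatible_*() helpers reject any attach where the IOMMU would need iotlb_sync_map but the domain was allocated without it.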
Fixes: 12724ce3fe1a ("iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes") Suggested-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20250721051657.1695788-1-baolu.lu@linux.intel.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/intel/iommu.c | 43 +++++++++++++++++++++++++------------ 1 file changed, 29 insertions(+), 14 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index a718f0bc14cdf..34dd175a331dc 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -57,6 +57,8 @@ static void __init check_tylersburg_isoch(void); static int rwbf_quirk;
+#define rwbf_required(iommu) (rwbf_quirk || cap_rwbf((iommu)->cap)) + /* * set to 1 to panic kernel if can't successfully enable VT-d * (used when kernel is launched w/ TXT) @@ -1798,18 +1800,6 @@ static int domain_setup_first_level(struct intel_iommu *iommu, (pgd_t *)pgd, flags, old); }
-static bool domain_need_iotlb_sync_map(struct dmar_domain *domain, - struct intel_iommu *iommu) -{ - if (cap_caching_mode(iommu->cap) && intel_domain_is_ss_paging(domain)) - return true; - - if (rwbf_quirk || cap_rwbf(iommu->cap)) - return true; - - return false; -} - static int dmar_domain_attach_device(struct dmar_domain *domain, struct device *dev) { @@ -1849,8 +1839,6 @@ static int dmar_domain_attach_device(struct dmar_domain *domain, if (ret) goto out_block_translation;
- domain->iotlb_sync_map |= domain_need_iotlb_sync_map(domain, iommu); - return 0;
out_block_translation: @@ -3370,6 +3358,14 @@ intel_iommu_domain_alloc_first_stage(struct device *dev, return ERR_CAST(dmar_domain);
dmar_domain->domain.ops = &intel_fs_paging_domain_ops; + /* + * iotlb sync for map is only needed for legacy implementations that + * explicitly require flushing internal write buffers to ensure memory + * coherence. + */ + if (rwbf_required(iommu)) + dmar_domain->iotlb_sync_map = true; + return &dmar_domain->domain; }
@@ -3404,6 +3400,14 @@ intel_iommu_domain_alloc_second_stage(struct device *dev, if (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) dmar_domain->domain.dirty_ops = &intel_dirty_ops;
+ /* + * Besides the internal write buffer flush, the caching mode used for + * legacy nested translation (which utilizes shadowing page tables) + * also requires iotlb sync on map. + */ + if (rwbf_required(iommu) || cap_caching_mode(iommu->cap)) + dmar_domain->iotlb_sync_map = true; + return &dmar_domain->domain; }
@@ -3449,6 +3453,11 @@ static int paging_domain_compatible_first_stage(struct dmar_domain *dmar_domain, if (!cap_fl1gp_support(iommu->cap) && (dmar_domain->domain.pgsize_bitmap & SZ_1G)) return -EINVAL; + + /* iotlb sync on map requirement */ + if ((rwbf_required(iommu)) && !dmar_domain->iotlb_sync_map) + return -EINVAL; + return 0; }
@@ -3472,6 +3481,12 @@ paging_domain_compatible_second_stage(struct dmar_domain *dmar_domain, return -EINVAL; if (!(sslps & BIT(1)) && (dmar_domain->domain.pgsize_bitmap & SZ_1G)) return -EINVAL; + + /* iotlb sync on map requirement */ + if ((rwbf_required(iommu) || cap_caching_mode(iommu->cap)) && + !dmar_domain->iotlb_sync_map) + return -EINVAL; + return 0; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit d36349ea73d805bb72cbc24ab90cb1da4ad5c379 ]
Connections of type PA_LINK shall be considered temporary, serving only to track the lifetime of the PA Sync setup. Once the BIG Sync is established and connections are created with BIS_LINK, the existing PA_LINK connection shall no longer use bis_cleanup; otherwise it terminates the PA Sync, which shall instead be left to the BIS_LINK connection.
Fixes: a7bcffc673de ("Bluetooth: Add PA_LINK to distinguish BIG sync and PA sync connections") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_conn.c | 12 +++++++++++- net/bluetooth/hci_event.c | 7 ++++++- 2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c index dc4f23ceff2a6..ce17e489c67c3 100644 --- a/net/bluetooth/hci_conn.c +++ b/net/bluetooth/hci_conn.c @@ -829,7 +829,17 @@ static void bis_cleanup(struct hci_conn *conn) /* Check if ISO connection is a BIS and terminate advertising * set and BIG if there are no other connections using it. */ - bis = hci_conn_hash_lookup_big(hdev, conn->iso_qos.bcast.big); + bis = hci_conn_hash_lookup_big_state(hdev, + conn->iso_qos.bcast.big, + BT_CONNECTED, + HCI_ROLE_MASTER); + if (bis) + return; + + bis = hci_conn_hash_lookup_big_state(hdev, + conn->iso_qos.bcast.big, + BT_CONNECT, + HCI_ROLE_MASTER); if (bis) return;
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index 0ffdbe249f5d3..090c7ffa51525 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -6973,9 +6973,14 @@ static void hci_le_big_sync_established_evt(struct hci_dev *hdev, void *data, continue; }
- if (ev->status != 0x42) + if (ev->status != 0x42) { /* Mark PA sync as established */ set_bit(HCI_CONN_PA_SYNC, &bis->flags); + /* Reset cleanup callback of PA Sync so it doesn't + * terminate the sync when deleting the connection. + */ + conn->cleanup = NULL; + }
bis->sync_handle = conn->sync_handle; bis->iso_qos.bcast.big = ev->handle;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit aee29c18a38d479c2f058c9b6a39b0527cf81d10 ]
getname shall return the iso_bc fields for both BIS_LINK and PA_LINK, since the likes of bluetoothd use getpeername to retrieve the SID both when enumerating broadcasters and when synchronizing.
Fixes: a7bcffc673de ("Bluetooth: Add PA_LINK to distinguish BIG sync and PA sync connections") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/iso.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c index 14a4215352d5f..c21566e1494a9 100644 --- a/net/bluetooth/iso.c +++ b/net/bluetooth/iso.c @@ -1347,7 +1347,7 @@ static int iso_sock_getname(struct socket *sock, struct sockaddr *addr, bacpy(&sa->iso_bdaddr, &iso_pi(sk)->dst); sa->iso_bdaddr_type = iso_pi(sk)->dst_type;
- if (hcon && hcon->type == BIS_LINK) { + if (hcon && (hcon->type == BIS_LINK || hcon->type == PA_LINK)) { sa->iso_bc->bc_sid = iso_pi(sk)->bc_sid; sa->iso_bc->bc_num_bis = iso_pi(sk)->bc_num_bis; memcpy(sa->iso_bc->bc_bis, iso_pi(sk)->bc_bis,
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
This reverts commit 165a69a87d6bde85cac2c051fa6da611ca4524f6 which is commit 8345a71fc54b28e4d13a759c45ce2664d8540d28 upstream.
This commit is not applicable to stable kernels and results in the driver failing to load on some chips on kernel 6.16.x. Revert it from 6.16.x.
Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org # 6.16.x Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 ---- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 11 ----------- drivers/gpu/drm/amd/amdgpu/psp_v10_0.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 31 ++++++++++++------------------- drivers/gpu/drm/amd/amdgpu/psp_v11_0_8.c | 25 ++++++++++--------------- drivers/gpu/drm/amd/amdgpu/psp_v12_0.c | 18 +++++++----------- drivers/gpu/drm/amd/amdgpu/psp_v13_0.c | 25 ++++++++++--------------- drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.c | 25 ++++++++++--------------- drivers/gpu/drm/amd/amdgpu/psp_v14_0.c | 25 ++++++++++--------------- 9 files changed, 61 insertions(+), 107 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -596,10 +596,6 @@ int psp_wait_for(struct psp_context *psp udelay(1); }
- dev_err(adev->dev, - "psp reg (0x%x) wait timed out, mask: %x, read: %x exp: %x", - reg_index, mask, val, reg_val); - return -ETIME; }
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h @@ -51,17 +51,6 @@ #define C2PMSG_CMD_SPI_GET_ROM_IMAGE_ADDR_HI 0x10 #define C2PMSG_CMD_SPI_GET_FLASH_IMAGE 0x11
-/* Command register bit 31 set to indicate readiness */ -#define MBOX_TOS_READY_FLAG (GFX_FLAG_RESPONSE) -#define MBOX_TOS_READY_MASK (GFX_CMD_RESPONSE_MASK | GFX_CMD_STATUS_MASK) - -/* Values to check for a successful GFX_CMD response wait. Check against - * both status bits and response state - helps to detect a command failure - * or other unexpected cases like a device drop reading all 0xFFs - */ -#define MBOX_TOS_RESP_FLAG (GFX_FLAG_RESPONSE) -#define MBOX_TOS_RESP_MASK (GFX_CMD_RESPONSE_MASK | GFX_CMD_STATUS_MASK) - extern const struct attribute_group amdgpu_flash_attr_group;
enum psp_shared_mem_size { --- a/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c @@ -94,7 +94,7 @@ static int psp_v10_0_ring_create(struct
/* Wait for response flag (bit 31) in C2PMSG_64 */ ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + 0x80000000, 0x8000FFFF, false);
return ret; } @@ -115,7 +115,7 @@ static int psp_v10_0_ring_stop(struct ps
/* Wait for response flag (bit 31) in C2PMSG_64 */ ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + 0x80000000, 0x80000000, false);
return ret; } --- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c @@ -277,13 +277,11 @@ static int psp_v11_0_ring_stop(struct ps
/* Wait for response flag (bit 31) */ if (amdgpu_sriov_vf(adev)) - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_101), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_101), + 0x80000000, 0x80000000, false); else - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), + 0x80000000, 0x80000000, false);
return ret; } @@ -319,15 +317,13 @@ static int psp_v11_0_ring_create(struct mdelay(20);
/* Wait for response flag (bit 31) in C2PMSG_101 */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_101), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_101), + 0x80000000, 0x8000FFFF, false);
} else { /* Wait for sOS ready for ring creation */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), - MBOX_TOS_READY_FLAG, MBOX_TOS_READY_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), + 0x80000000, 0x80000000, false); if (ret) { DRM_ERROR("Failed to wait for sOS ready for ring creation\n"); return ret; @@ -351,9 +347,8 @@ static int psp_v11_0_ring_create(struct mdelay(20);
/* Wait for response flag (bit 31) in C2PMSG_64 */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), + 0x80000000, 0x8000FFFF, false); }
return ret; @@ -386,8 +381,7 @@ static int psp_v11_0_mode1_reset(struct
offset = SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64);
- ret = psp_wait_for(psp, offset, MBOX_TOS_READY_FLAG, - MBOX_TOS_READY_MASK, false); + ret = psp_wait_for(psp, offset, 0x80000000, 0x8000FFFF, false);
if (ret) { DRM_INFO("psp is not working correctly before mode1 reset!\n"); @@ -401,8 +395,7 @@ static int psp_v11_0_mode1_reset(struct
offset = SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_33);
- ret = psp_wait_for(psp, offset, MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, - false); + ret = psp_wait_for(psp, offset, 0x80000000, 0x80000000, false);
if (ret) { DRM_INFO("psp mode 1 reset failed!\n"); --- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0_8.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0_8.c @@ -41,9 +41,8 @@ static int psp_v11_0_8_ring_stop(struct /* there might be handshake issue with hardware which needs delay */ mdelay(20); /* Wait for response flag (bit 31) */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_101), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_101), + 0x80000000, 0x80000000, false); } else { /* Write the ring destroy command*/ WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_64, @@ -51,9 +50,8 @@ static int psp_v11_0_8_ring_stop(struct /* there might be handshake issue with hardware which needs delay */ mdelay(20); /* Wait for response flag (bit 31) */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), + 0x80000000, 0x80000000, false); }
return ret; @@ -89,15 +87,13 @@ static int psp_v11_0_8_ring_create(struc mdelay(20);
/* Wait for response flag (bit 31) in C2PMSG_101 */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_101), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_101), + 0x80000000, 0x8000FFFF, false);
} else { /* Wait for sOS ready for ring creation */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), - MBOX_TOS_READY_FLAG, MBOX_TOS_READY_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), + 0x80000000, 0x80000000, false); if (ret) { DRM_ERROR("Failed to wait for trust OS ready for ring creation\n"); return ret; @@ -121,9 +117,8 @@ static int psp_v11_0_8_ring_create(struc mdelay(20);
/* Wait for response flag (bit 31) in C2PMSG_64 */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), + 0x80000000, 0x8000FFFF, false); }
return ret; --- a/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c @@ -163,7 +163,7 @@ static int psp_v12_0_ring_create(struct
/* Wait for response flag (bit 31) in C2PMSG_64 */ ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + 0x80000000, 0x8000FFFF, false);
return ret; } @@ -184,13 +184,11 @@ static int psp_v12_0_ring_stop(struct ps
/* Wait for response flag (bit 31) */ if (amdgpu_sriov_vf(adev)) - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_101), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_101), + 0x80000000, 0x80000000, false); else - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64), + 0x80000000, 0x80000000, false);
return ret; } @@ -221,8 +219,7 @@ static int psp_v12_0_mode1_reset(struct
offset = SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64);
- ret = psp_wait_for(psp, offset, MBOX_TOS_READY_FLAG, - MBOX_TOS_READY_MASK, false); + ret = psp_wait_for(psp, offset, 0x80000000, 0x8000FFFF, false);
if (ret) { DRM_INFO("psp is not working correctly before mode1 reset!\n"); @@ -236,8 +233,7 @@ static int psp_v12_0_mode1_reset(struct
offset = SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_33);
- ret = psp_wait_for(psp, offset, MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, - false); + ret = psp_wait_for(psp, offset, 0x80000000, 0x80000000, false);
if (ret) { DRM_INFO("psp mode 1 reset failed!\n"); --- a/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c @@ -384,9 +384,8 @@ static int psp_v13_0_ring_stop(struct ps /* there might be handshake issue with hardware which needs delay */ mdelay(20); /* Wait for response flag (bit 31) */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_101), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_101), + 0x80000000, 0x80000000, false); } else { /* Write the ring destroy command*/ WREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_64, @@ -394,9 +393,8 @@ static int psp_v13_0_ring_stop(struct ps /* there might be handshake issue with hardware which needs delay */ mdelay(20); /* Wait for response flag (bit 31) */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_64), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_64), + 0x80000000, 0x80000000, false); }
return ret; @@ -432,15 +430,13 @@ static int psp_v13_0_ring_create(struct mdelay(20);
/* Wait for response flag (bit 31) in C2PMSG_101 */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_101), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_101), + 0x80000000, 0x8000FFFF, false);
} else { /* Wait for sOS ready for ring creation */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_64), - MBOX_TOS_READY_FLAG, MBOX_TOS_READY_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_64), + 0x80000000, 0x80000000, false); if (ret) { DRM_ERROR("Failed to wait for trust OS ready for ring creation\n"); return ret; @@ -464,9 +460,8 @@ static int psp_v13_0_ring_create(struct mdelay(20);
/* Wait for response flag (bit 31) in C2PMSG_64 */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_64), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_64), + 0x80000000, 0x8000FFFF, false); }
return ret; --- a/drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.c @@ -204,9 +204,8 @@ static int psp_v13_0_4_ring_stop(struct /* there might be handshake issue with hardware which needs delay */ mdelay(20); /* Wait for response flag (bit 31) */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_101), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_101), + 0x80000000, 0x80000000, false); } else { /* Write the ring destroy command*/ WREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_64, @@ -214,9 +213,8 @@ static int psp_v13_0_4_ring_stop(struct /* there might be handshake issue with hardware which needs delay */ mdelay(20); /* Wait for response flag (bit 31) */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_64), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_64), + 0x80000000, 0x80000000, false); }
return ret; @@ -252,15 +250,13 @@ static int psp_v13_0_4_ring_create(struc mdelay(20);
/* Wait for response flag (bit 31) in C2PMSG_101 */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_101), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_101), + 0x80000000, 0x8000FFFF, false);
} else { /* Wait for sOS ready for ring creation */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_64), - MBOX_TOS_READY_FLAG, MBOX_TOS_READY_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_64), + 0x80000000, 0x80000000, false); if (ret) { DRM_ERROR("Failed to wait for trust OS ready for ring creation\n"); return ret; @@ -284,9 +280,8 @@ static int psp_v13_0_4_ring_create(struc mdelay(20);
/* Wait for response flag (bit 31) in C2PMSG_64 */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_64), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, regMP0_SMN_C2PMSG_64), + 0x80000000, 0x8000FFFF, false); }
return ret; --- a/drivers/gpu/drm/amd/amdgpu/psp_v14_0.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v14_0.c @@ -250,9 +250,8 @@ static int psp_v14_0_ring_stop(struct ps /* there might be handshake issue with hardware which needs delay */ mdelay(20); /* Wait for response flag (bit 31) */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, regMPASP_SMN_C2PMSG_101), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, regMPASP_SMN_C2PMSG_101), + 0x80000000, 0x80000000, false); } else { /* Write the ring destroy command*/ WREG32_SOC15(MP0, 0, regMPASP_SMN_C2PMSG_64, @@ -260,9 +259,8 @@ static int psp_v14_0_ring_stop(struct ps /* there might be handshake issue with hardware which needs delay */ mdelay(20); /* Wait for response flag (bit 31) */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, regMPASP_SMN_C2PMSG_64), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, regMPASP_SMN_C2PMSG_64), + 0x80000000, 0x80000000, false); }
return ret; @@ -298,15 +296,13 @@ static int psp_v14_0_ring_create(struct mdelay(20);
/* Wait for response flag (bit 31) in C2PMSG_101 */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, regMPASP_SMN_C2PMSG_101), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, regMPASP_SMN_C2PMSG_101), + 0x80000000, 0x8000FFFF, false);
} else { /* Wait for sOS ready for ring creation */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, regMPASP_SMN_C2PMSG_64), - MBOX_TOS_READY_FLAG, MBOX_TOS_READY_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, regMPASP_SMN_C2PMSG_64), + 0x80000000, 0x80000000, false); if (ret) { DRM_ERROR("Failed to wait for trust OS ready for ring creation\n"); return ret; @@ -330,9 +326,8 @@ static int psp_v14_0_ring_create(struct mdelay(20);
/* Wait for response flag (bit 31) in C2PMSG_64 */ - ret = psp_wait_for( - psp, SOC15_REG_OFFSET(MP0, 0, regMPASP_SMN_C2PMSG_64), - MBOX_TOS_RESP_FLAG, MBOX_TOS_RESP_MASK, false); + ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, regMPASP_SMN_C2PMSG_64), + 0x80000000, 0x8000FFFF, false); }
return ret;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tigran Mkrtchyan tigran.mkrtchyan@desy.de
[ Upstream commit 5a46d2339a5ae268ede53a221f20433d8ea4f2f9 ]
Recent commit f06bedfa62d5 ("pNFS/flexfiles: don't attempt pnfs on fatal DS errors") has changed the error return type of ff_layout_choose_ds_for_read() from NULL to an error pointer. However, not all code paths have been updated to match the change. Thus, some non-NULL checks will accept error pointers as a valid return value.
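To illustrate why mixing the two conventions is dangerous, here is a minimal user-space model of the kernel's ERR_PTR()/IS_ERR() scheme (a hypothetical standalone demo, not part of the patch):

  #include <stdio.h>

  #define MAX_ERRNO 4095

  static inline void *ERR_PTR(long error) { return (void *)error; }
  static inline int IS_ERR(const void *ptr)
  {
          return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
  }

  int main(void)
  {
          void *ds = ERR_PTR(-22); /* e.g. -EINVAL */

          if (ds) /* old-style NULL check: passes, error missed */
                  printf("NULL check wrongly treats %p as valid\n", ds);
          if (IS_ERR(ds)) /* required once ERR_PTR is returned */
                  printf("IS_ERR() correctly flags the error\n");
          return 0;
  }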
Reported-by: Dan Carpenter dan.carpenter@linaro.org Suggested-by: Dan Carpenter dan.carpenter@linaro.org Fixes: f06bedfa62d5 ("pNFS/flexfiles: don't attempt pnfs on fatal DS errors") Signed-off-by: Tigran Mkrtchyan tigran.mkrtchyan@desy.de Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/flexfilelayout/flexfilelayout.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c index 8dc921d835388..f8ab7b4e09e7e 100644 --- a/fs/nfs/flexfilelayout/flexfilelayout.c +++ b/fs/nfs/flexfilelayout/flexfilelayout.c @@ -773,8 +773,11 @@ ff_layout_choose_ds_for_read(struct pnfs_layout_segment *lseg, continue;
if (check_device && - nfs4_test_deviceid_unavailable(&mirror->mirror_ds->id_node)) + nfs4_test_deviceid_unavailable(&mirror->mirror_ds->id_node)) { + // reinitialize the error state in case if this is the last iteration + ds = ERR_PTR(-EINVAL); continue; + }
*best_idx = idx; break; @@ -804,7 +807,7 @@ ff_layout_choose_best_ds_for_read(struct pnfs_layout_segment *lseg, struct nfs4_pnfs_ds *ds;
ds = ff_layout_choose_valid_ds_for_read(lseg, start_idx, best_idx); - if (ds) + if (!IS_ERR(ds)) return ds; return ff_layout_choose_any_ds_for_read(lseg, start_idx, best_idx); } @@ -818,7 +821,7 @@ ff_layout_get_ds_for_read(struct nfs_pageio_descriptor *pgio,
ds = ff_layout_choose_best_ds_for_read(lseg, pgio->pg_mirror_idx, best_idx); - if (ds || !pgio->pg_mirror_idx) + if (!IS_ERR(ds) || !pgio->pg_mirror_idx) return ds; return ff_layout_choose_best_ds_for_read(lseg, 0, best_idx); } @@ -868,7 +871,7 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio, req->wb_nio = 0;
ds = ff_layout_get_ds_for_read(pgio, &ds_idx); - if (!ds) { + if (IS_ERR(ds)) { if (!ff_layout_no_fallback_to_mds(pgio->pg_lseg)) goto out_mds; pnfs_generic_pg_cleanup(pgio); @@ -1072,11 +1075,13 @@ static void ff_layout_resend_pnfs_read(struct nfs_pgio_header *hdr) { u32 idx = hdr->pgio_mirror_idx + 1; u32 new_idx = 0; + struct nfs4_pnfs_ds *ds;
- if (ff_layout_choose_any_ds_for_read(hdr->lseg, idx, &new_idx)) - ff_layout_send_layouterror(hdr->lseg); - else + ds = ff_layout_choose_any_ds_for_read(hdr->lseg, idx, &new_idx); + if (IS_ERR(ds)) pnfs_error_mark_layout_for_return(hdr->inode, hdr->lseg); + else + ff_layout_send_layouterror(hdr->lseg); pnfs_read_resend_pnfs(hdr, new_idx); }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Justin Worrell jworrell@gmail.com
[ Upstream commit 9559d2fffd4f9b892165eed48198a0e5cb8504e6 ]
xs_sock_recv_cmsg was failing to call xs_sock_process_cmsg for any cmsg type other than TLS_RECORD_TYPE_ALERT (TLS_RECORD_TYPE_DATA and other values were not handled). Based on my reading of the previous commit (cc5d5908: sunrpc: fix client side handling of tls alerts), it looks like only iov_iter_revert should be conditional on TLS_RECORD_TYPE_ALERT, while other cmsg types should still be passed to xs_sock_process_cmsg. On my machine, I was unable to connect (over mtls) to an NFS share hosted on FreeBSD. With this patch applied, I am able to mount the share again.
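The corrected control flow (condensed from the hunk below) makes only the revert conditional on the record type:

  ret = sock_recvmsg(sock, &msg, flags);
  if (ret > 0) {
          if (tls_get_record_type(sock->sk, &u.cmsg) ==
              TLS_RECORD_TYPE_ALERT)
                  iov_iter_revert(&msg.msg_iter, ret);
          ret = xs_sock_process_cmsg(sock, &msg, msg_flags, &u.cmsg,
                                     -EAGAIN);
  }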
Fixes: cc5d59081fa2 ("sunrpc: fix client side handling of tls alerts") Signed-off-by: Justin Worrell jworrell@gmail.com Reviewed-and-tested-by: Scott Mayhew smayhew@redhat.com Link: https://lore.kernel.org/r/20250904211038.12874-3-jworrell@gmail.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/xprtsock.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index c5f7bbf5775ff..3aa987e7f0724 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -407,9 +407,9 @@ xs_sock_recv_cmsg(struct socket *sock, unsigned int *msg_flags, int flags) iov_iter_kvec(&msg.msg_iter, ITER_DEST, &alert_kvec, 1, alert_kvec.iov_len); ret = sock_recvmsg(sock, &msg, flags); - if (ret > 0 && - tls_get_record_type(sock->sk, &u.cmsg) == TLS_RECORD_TYPE_ALERT) { - iov_iter_revert(&msg.msg_iter, ret); + if (ret > 0) { + if (tls_get_record_type(sock->sk, &u.cmsg) == TLS_RECORD_TYPE_ALERT) + iov_iter_revert(&msg.msg_iter, ret); ret = xs_sock_process_cmsg(sock, &msg, msg_flags, &u.cmsg, -EAGAIN); }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 31f1a960ad1a14def94fa0b8c25d62b4c032813f ]
Don't clear the capabilities that are not going to get reset by the call to _nfs4_server_capabilities().
Reported-by: Scott Haiden scott.b.haiden@gmail.com Fixes: b01f21cacde9 ("NFS: Fix the setting of capabilities when automounting a new filesystem") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4proc.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 7e203857f4668..fc86c75372b94 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -4082,7 +4082,6 @@ int nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *fhandle) }; int err;
- nfs_server_set_init_caps(server); do { err = nfs4_handle_exception(server, _nfs4_server_capabilities(server, fhandle),
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiao Ni xni@redhat.com
[ Upstream commit c27973211ffcdf0a092eec265d5993e64b89adaf ]
Commit 907a99c314a5 ("md: rename recovery_cp to resync_offset") replaced recovery_cp with resync_offset in mdp_superblock_s, which is defined in md_p.h. Since md_p.h is also used by userspace, the rename breaks the mdadm build. This patch reverts that part of the change.
Fixes: 907a99c314a5 ("md: rename recovery_cp to resync_offset") Signed-off-by: Xiao Ni xni@redhat.com Link: https://lore.kernel.org/linux-raid/20250815040028.18085-1-xni@redhat.com Signed-off-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/md.c | 6 +++--- include/uapi/linux/raid/md_p.h | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c index 3f355bb85797f..0f41573fa9f5e 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -1406,7 +1406,7 @@ static int super_90_validate(struct mddev *mddev, struct md_rdev *freshest, stru else { if (sb->events_hi == sb->cp_events_hi && sb->events_lo == sb->cp_events_lo) { - mddev->resync_offset = sb->resync_offset; + mddev->resync_offset = sb->recovery_cp; } else mddev->resync_offset = 0; } @@ -1534,13 +1534,13 @@ static void super_90_sync(struct mddev *mddev, struct md_rdev *rdev) mddev->minor_version = sb->minor_version; if (mddev->in_sync) { - sb->resync_offset = mddev->resync_offset; + sb->recovery_cp = mddev->resync_offset; sb->cp_events_hi = (mddev->events>>32); sb->cp_events_lo = (u32)mddev->events; if (mddev->resync_offset == MaxSector) sb->state = (1<< MD_SB_CLEAN); } else - sb->resync_offset = 0; + sb->recovery_cp = 0;
sb->layout = mddev->layout; sb->chunk_size = mddev->chunk_sectors << 9; diff --git a/include/uapi/linux/raid/md_p.h b/include/uapi/linux/raid/md_p.h index b139462872775..ac74133a47688 100644 --- a/include/uapi/linux/raid/md_p.h +++ b/include/uapi/linux/raid/md_p.h @@ -173,7 +173,7 @@ typedef struct mdp_superblock_s { #else #error unspecified endianness #endif - __u32 resync_offset; /* 11 resync checkpoint sector count */ + __u32 recovery_cp; /* 11 resync checkpoint sector count */ /* There are only valid for minor_version > 90 */ __u64 reshape_position; /* 12,13 next address in array-space for reshape */ __u32 new_level; /* 14 new level we are reshaping to */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guenter Roeck linux@roeck-us.net
[ Upstream commit ab1396af7595e7d49a3850481b24d7fe7cbdfd31 ]
Commit edede7a6dcd7 ("trace/fgraph: Fix the warning caused by missing unregister notifier") added a call to unregister the PM notifier if register_ftrace_graph() failed. It does so unconditionally. However, the PM notifier is only registered with the first call to register_ftrace_graph(). If the first registration was successful and a subsequent registration failed, the notifier is now unregistered even if ftrace graphs are still registered.
Fix the problem by only unregistering the PM notifier during error handling if there are no active fgraph registrations.
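The error path then mirrors the registration side, which only registers the notifier for the first user (sketch of the resulting cleanup, condensed from the hunk below):

  ftrace_graph_active--;
  gops->saved_func = NULL;
  fgraph_lru_release_index(i);
  /* The notifier is shared by all fgraph users; only drop it when
   * this failed registration was the last remaining one.
   */
  if (!ftrace_graph_active)
          unregister_pm_notifier(&ftrace_suspend_notifier);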
Fixes: edede7a6dcd7 ("trace/fgraph: Fix the warning caused by missing unregister notifier") Closes: https://lore.kernel.org/all/63b0ba5a-a928-438e-84f9-93028dd72e54@roeck-us.ne... Cc: Ye Weihua yeweihua4@huawei.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Link: https://lore.kernel.org/20250906050618.2634078-1-linux@roeck-us.net Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/fgraph.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c index dac2d58f39490..db40ec5cc9d73 100644 --- a/kernel/trace/fgraph.c +++ b/kernel/trace/fgraph.c @@ -1393,7 +1393,8 @@ int register_ftrace_graph(struct fgraph_ops *gops) ftrace_graph_active--; gops->saved_func = NULL; fgraph_lru_release_index(i); - unregister_pm_notifier(&ftrace_suspend_notifier); + if (!ftrace_graph_active) + unregister_pm_notifier(&ftrace_suspend_notifier); } return ret; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit dd5a8621b886b02f8341c5d4ea68eb2c552ebd3e ]
_nfs4_server_capabilities() is expected to clear any flags that are not supported by the server.
Fixes: 8a59bb93b7e3 ("NFSv4 store server support for fs_location attribute") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4proc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index fc86c75372b94..ccd97dcf115f9 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -4007,8 +4007,9 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f res.attr_bitmask[2]; } memcpy(server->attr_bitmask, res.attr_bitmask, sizeof(server->attr_bitmask)); - server->caps &= ~(NFS_CAP_ACLS | NFS_CAP_HARDLINKS | - NFS_CAP_SYMLINKS| NFS_CAP_SECURITY_LABEL); + server->caps &= + ~(NFS_CAP_ACLS | NFS_CAP_HARDLINKS | NFS_CAP_SYMLINKS | + NFS_CAP_SECURITY_LABEL | NFS_CAP_FS_LOCATIONS); server->fattr_valid = NFS_ATTR_FATTR_V4; if (res.attr_bitmask[0] & FATTR4_WORD0_ACL && res.acl_bitmask & ACL4_SUPPORT_ALLOW_ACL)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit b3ac33436030bce37ecb3dcae581ecfaad28078c ]
_nfs4_server_capabilities() should clear capabilities that are not supported by the server.
Fixes: d2a00cceb93a ("NFSv4: Detect support for OPEN4_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4proc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index ccd97dcf115f9..8d492e3b21631 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -4009,7 +4009,8 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f memcpy(server->attr_bitmask, res.attr_bitmask, sizeof(server->attr_bitmask)); server->caps &= ~(NFS_CAP_ACLS | NFS_CAP_HARDLINKS | NFS_CAP_SYMLINKS | - NFS_CAP_SECURITY_LABEL | NFS_CAP_FS_LOCATIONS); + NFS_CAP_SECURITY_LABEL | NFS_CAP_FS_LOCATIONS | + NFS_CAP_OPEN_XOR | NFS_CAP_DELEGTIME); server->fattr_valid = NFS_ATTR_FATTR_V4; if (res.attr_bitmask[0] & FATTR4_WORD0_ACL && res.acl_bitmask & ACL4_SUPPORT_ALLOW_ACL)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 4fb2b677fc1f70ee642c0beecc3cabf226ef5707 ]
nfs_server_set_fsinfo() shouldn't assume that NFS_CAP_XATTR is unset on entry to the function.
Fixes: b78ef845c35d ("NFSv4.2: query the server for extended attribute support") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/client.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c index 3bcf5c204578c..97bd9d2a4b0cd 100644 --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -890,6 +890,8 @@ static void nfs_server_set_fsinfo(struct nfs_server *server,
if (fsinfo->xattr_support) server->caps |= NFS_CAP_XATTR; + else + server->caps &= ~NFS_CAP_XATTR; #endif }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luo Gengkun luogengkun@huaweicloud.com
[ Upstream commit 3d62ab32df065e4a7797204a918f6489ddb8a237 ]
Both tracing_mark_write and tracing_mark_raw_write call __copy_from_user_inatomic while preemption is disabled. In some cases, however, __copy_from_user_inatomic may trigger a page fault and subtly end up calling schedule(). If the task is then migrated to another CPU, the following warning is triggered:

  if (RB_WARN_ON(cpu_buffer, !local_read(&cpu_buffer->committing)))
An example can illustrate this issue:
  process flow                                  CPU
  ---------------------------------------------------
  tracing_mark_raw_write():                     cpu:0
    ...
    ring_buffer_lock_reserve():                 cpu:0
      ...
      cpu = raw_smp_processor_id()              cpu:0
      cpu_buffer = buffer->buffers[cpu]         cpu:0
      ...
    __copy_from_user_inatomic():                cpu:0
      ...
      # page fault
      do_mem_abort():                           cpu:0
        ...
        # call schedule
        schedule()                              cpu:0
        ...
    # the task is migrated to cpu1
    __buffer_unlock_commit():                   cpu:1
      ...
      ring_buffer_unlock_commit():              cpu:1
        ...
        cpu = raw_smp_processor_id()            cpu:1
        cpu_buffer = buffer->buffers[cpu]       cpu:1
As shown above, the task reads the CPU id twice and the two values differ, so the commit runs against a different per-CPU buffer than the one that was reserved.
To fix this problem, use copy_from_user_nofault instead of __copy_from_user_inatomic, as the former performs an access_ok check and disables page faults while copying.
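Condensed from the hunk below, one of the two call sites becomes:

  /* Fails fast with a nonzero return instead of sleeping in the
   * fault handler while the ring-buffer reservation is held.
   */
  len = copy_from_user_nofault(&entry->buf, ubuf, cnt);
  if (len) {
          memcpy(&entry->buf, FAULTED_STR, FAULTED_SIZE);
          cnt = FAULTED_SIZE;
  }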
Link: https://lore.kernel.org/20250819105152.2766363-1-luogengkun@huaweicloud.com Fixes: 656c7f0d2d2b ("tracing: Replace kmap with copy_from_user() in trace_marker writing") Signed-off-by: Luo Gengkun luogengkun@huaweicloud.com Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index b91fa02cc54a6..9329ac1667551 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -7264,7 +7264,7 @@ static ssize_t write_marker_to_buffer(struct trace_array *tr, const char __user entry = ring_buffer_event_data(event); entry->ip = ip;
- len = __copy_from_user_inatomic(&entry->buf, ubuf, cnt); + len = copy_from_user_nofault(&entry->buf, ubuf, cnt); if (len) { memcpy(&entry->buf, FAULTED_STR, FAULTED_SIZE); cnt = FAULTED_SIZE; @@ -7361,7 +7361,7 @@ static ssize_t write_raw_marker_to_buffer(struct trace_array *tr,
entry = ring_buffer_event_data(event);
- len = __copy_from_user_inatomic(&entry->id, ubuf, cnt); + len = copy_from_user_nofault(&entry->id, ubuf, cnt); if (len) { entry->id = -1; memcpy(&entry->buf, FAULTED_STR, FAULTED_SIZE);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Scott Mayhew smayhew@redhat.com
[ Upstream commit 992203a1fba51b025c60ec0c8b0d9223343dea95 ]
Otherwise, if the nfsd filecache code releases the nfsd_file immediately, it can trigger the BUG_ON(cred == current->cred) in __put_cred() when it puts the nfsd_file->nf_file->f_cred.
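The fix reorders the read path (and analogously the write path) so the borrowed credentials are dropped before the nfsd_file can be released (condensed from the hunk below):

  status = filp->f_op->read_iter(&iocb->kiocb, &iter);

  /* Restore the task's own credentials first, so a subsequent
   * __put_cred() on nf_file->f_cred never sees current->cred.
   */
  revert_creds(save_cred);

  if (status != -EIOCBQUEUED) {
          nfs_local_read_done(iocb, status);
          nfs_local_pgio_release(iocb);
  }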
Fixes: b9f5dd57f4a5 ("nfs/localio: use dedicated workqueues for filesystem read and write") Signed-off-by: Scott Mayhew smayhew@redhat.com Reviewed-by: Mike Snitzer snitzer@kernel.org Link: https://lore.kernel.org/r/20250807164938.2395136-1-smayhew@redhat.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/localio.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c index 510d0a16cfe91..e2213ef18baed 100644 --- a/fs/nfs/localio.c +++ b/fs/nfs/localio.c @@ -453,12 +453,13 @@ static void nfs_local_call_read(struct work_struct *work) nfs_local_iter_init(&iter, iocb, READ);
status = filp->f_op->read_iter(&iocb->kiocb, &iter); + + revert_creds(save_cred); + if (status != -EIOCBQUEUED) { nfs_local_read_done(iocb, status); nfs_local_pgio_release(iocb); } - - revert_creds(save_cred); }
static int @@ -649,14 +650,15 @@ static void nfs_local_call_write(struct work_struct *work) file_start_write(filp); status = filp->f_op->write_iter(&iocb->kiocb, &iter); file_end_write(filp); + + revert_creds(save_cred); + current->flags = old_flags; + if (status != -EIOCBQUEUED) { nfs_local_write_done(iocb, status); nfs_local_vfs_getattr(iocb); nfs_local_pgio_release(iocb); } - - revert_creds(save_cred); - current->flags = old_flags; }
static int
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Riabchun ferr.lambarginio@gmail.com
[ Upstream commit 80d03a40837a9b26750a25122b906c052cc846c9 ]
In the my_tramp1 function, the .size directive was placed above the ASM_RET instruction, leading to a wrong function size.
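Since ".size sym, .-sym" measures from the symbol to the current location, it must follow the last instruction of the function. A minimal sketch of the correct ordering (a reduced example, not the sample's full trampoline):

  asm (
  "     .type my_tramp1, @function\n"
  "     .globl my_tramp1\n"
  "my_tramp1:\n"
  /* ...function body... */
  "     ret\n"                            /* ASM_RET: last instruction */
  "     .size my_tramp1, .-my_tramp1\n"   /* size computed after it */
  );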
Link: https://lore.kernel.org/aK3d7vxNcO52kEmg@vova-pc Fixes: 9d907f1ae80b ("samples/ftrace: Fix asm function ELF annotations") Signed-off-by: Vladimir Riabchun ferr.lambarginio@gmail.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- samples/ftrace/ftrace-direct-modify.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c index cfea7a38befb0..da3a9f2091f55 100644 --- a/samples/ftrace/ftrace-direct-modify.c +++ b/samples/ftrace/ftrace-direct-modify.c @@ -75,8 +75,8 @@ asm ( CALL_DEPTH_ACCOUNT " call my_direct_func1\n" " leave\n" -" .size my_tramp1, .-my_tramp1\n" ASM_RET +" .size my_tramp1, .-my_tramp1\n"
" .type my_tramp2, @function\n" " .globl my_tramp2\n"
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wang Liang wangliang74@huawei.com
[ Upstream commit c1628c00c4351dd0727ef7f670694f68d9e663d8 ]
A crash was observed with the following output:
  BUG: kernel NULL pointer dereference, address: 0000000000000010
  Oops: Oops: 0000 [#1] SMP NOPTI
  CPU: 2 UID: 0 PID: 92 Comm: osnoise_cpus Not tainted 6.17.0-rc4-00201-gd69eb204c255 #138 PREEMPT(voluntary)
  RIP: 0010:bitmap_parselist+0x53/0x3e0
  Call Trace:
   <TASK>
   osnoise_cpus_write+0x7a/0x190
   vfs_write+0xf8/0x410
   ? do_sys_openat2+0x88/0xd0
   ksys_write+0x60/0xd0
   do_syscall_64+0xa4/0x260
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
   </TASK>
This issue can be reproduced by below code:
  fd = open("/sys/kernel/debug/tracing/osnoise/cpus", O_WRONLY);
  write(fd, "0-2", 0);
When the user passes 'count=0' to osnoise_cpus_write(), kmalloc() returns ZERO_SIZE_PTR (16) and cpulist_parse() treats it as a normal pointer, which triggers the NULL pointer dereference. Add a check for the 'count' parameter.
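Condensed from the hunk below, the guard rejects the zero-length write before the allocation:

  /* kmalloc(0) returns ZERO_SIZE_PTR ((void *)16), which is
   * non-NULL, so a zero-length write must be caught here.
   */
  if (count < 1)
          return 0;

  buf = kmalloc(count, GFP_KERNEL);
  if (!buf)
          return -ENOMEM;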
Cc: mhiramat@kernel.org Cc: mathieu.desnoyers@efficios.com Cc: tglozar@redhat.com Link: https://lore.kernel.org/20250906035610.3880282-1-wangliang74@huawei.com Fixes: 17f89102fe23 ("tracing/osnoise: Allow arbitrarily long CPU string") Signed-off-by: Wang Liang wangliang74@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace_osnoise.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c index fd259da0aa645..337bc0eb5d71b 100644 --- a/kernel/trace/trace_osnoise.c +++ b/kernel/trace/trace_osnoise.c @@ -2322,6 +2322,9 @@ osnoise_cpus_write(struct file *filp, const char __user *ubuf, size_t count, int running, err; char *buf __free(kfree) = NULL;
+ if (count < 1) + return 0; + buf = kmalloc(count, GFP_KERNEL); if (!buf) return -ENOMEM;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 9eb90f435415c7da4800974ed943e39b5578ee7f ]
Ensure that all O_DIRECT reads and writes are complete, and prevent the initiation of new i/o until the setattr operation that will truncate the file is complete.
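The mechanism is the nfs_file_block_o_direct() helper this patch introduces (shown in the internal.h hunk below); it must be called with i_rwsem held exclusively:

  static inline void nfs_file_block_o_direct(struct nfs_inode *nfsi)
  {
          if (test_bit(NFS_INO_ODIRECT, &nfsi->flags)) {
                  /* Switch back to buffered mode and wait for all
                   * in-flight O_DIRECT i/o to drain.
                   */
                  clear_bit(NFS_INO_ODIRECT, &nfsi->flags);
                  inode_dio_wait(&nfsi->vfs_inode);
          }
  }

The subsequent patches in this series reuse the same helper ahead of fallocate, clone and copy.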
Fixes: a5864c999de6 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/inode.c | 4 +++- fs/nfs/internal.h | 10 ++++++++++ fs/nfs/io.c | 13 ++----------- 3 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index a2fa6bc4d74e3..a32cc45425e28 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -761,8 +761,10 @@ nfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry, trace_nfs_setattr_enter(inode);
/* Write all dirty data */ - if (S_ISREG(inode->i_mode)) + if (S_ISREG(inode->i_mode)) { + nfs_file_block_o_direct(NFS_I(inode)); nfs_sync_inode(inode); + }
fattr = nfs_alloc_fattr_with_label(NFS_SERVER(inode)); if (fattr == NULL) { diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 9dcbc33964922..0ef0fc6aba3b3 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -531,6 +531,16 @@ static inline bool nfs_file_io_is_buffered(struct nfs_inode *nfsi) return test_bit(NFS_INO_ODIRECT, &nfsi->flags) == 0; }
+/* Must be called with exclusively locked inode->i_rwsem */ +static inline void nfs_file_block_o_direct(struct nfs_inode *nfsi) +{ + if (test_bit(NFS_INO_ODIRECT, &nfsi->flags)) { + clear_bit(NFS_INO_ODIRECT, &nfsi->flags); + inode_dio_wait(&nfsi->vfs_inode); + } +} + + /* namespace.c */ #define NFS_PATH_CANONICAL 1 extern char *nfs_path(char **p, struct dentry *dentry, diff --git a/fs/nfs/io.c b/fs/nfs/io.c index 3388faf2acb9f..d275b0a250bf3 100644 --- a/fs/nfs/io.c +++ b/fs/nfs/io.c @@ -14,15 +14,6 @@
#include "internal.h"
-/* Call with exclusively locked inode->i_rwsem */ -static void nfs_block_o_direct(struct nfs_inode *nfsi, struct inode *inode) -{ - if (test_bit(NFS_INO_ODIRECT, &nfsi->flags)) { - clear_bit(NFS_INO_ODIRECT, &nfsi->flags); - inode_dio_wait(inode); - } -} - /** * nfs_start_io_read - declare the file is being used for buffered reads * @inode: file inode @@ -57,7 +48,7 @@ nfs_start_io_read(struct inode *inode) err = down_write_killable(&inode->i_rwsem); if (err) return err; - nfs_block_o_direct(nfsi, inode); + nfs_file_block_o_direct(nfsi); downgrade_write(&inode->i_rwsem);
return 0; @@ -90,7 +81,7 @@ nfs_start_io_write(struct inode *inode)
err = down_write_killable(&inode->i_rwsem); if (!err) - nfs_block_o_direct(NFS_I(inode), inode); + nfs_file_block_o_direct(NFS_I(inode)); return err; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit b93128f29733af5d427a335978a19884c2c230e2 ]
Ensure that all O_DIRECT reads and writes complete before calling fallocate so that we don't race w.r.t. attribute updates.
Fixes: 99f237832243 ("NFSv4.2: Always flush out writes in nfs42_proc_fallocate()") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs42proc.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index 01c01f45358b7..3774d5b64ba0e 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -114,6 +114,7 @@ static int nfs42_proc_fallocate(struct rpc_message *msg, struct file *filep, exception.inode = inode; exception.state = lock->open_context->state;
+ nfs_file_block_o_direct(NFS_I(inode)); err = nfs_sync_inode(inode); if (err) goto out;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit c80ebeba1198eac8811ab0dba36ecc13d51e4438 ]
Ensure that all O_DIRECT reads and writes complete before cloning a file range, so that both the source and destination are up to date.
Fixes: a5864c999de6 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4file.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index 5e9d66f3466c8..1fa69a0b33ab1 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -291,9 +291,11 @@ static loff_t nfs42_remap_file_range(struct file *src_file, loff_t src_off,
/* flush all pending writes on both src and dst so that server * has the latest data */ + nfs_file_block_o_direct(NFS_I(src_inode)); ret = nfs_sync_inode(src_inode); if (ret) goto out_unlock; + nfs_file_block_o_direct(NFS_I(dst_inode)); ret = nfs_sync_inode(dst_inode); if (ret) goto out_unlock;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit ca247c89900ae90207f4d321e260cd93b7c7d104 ]
Ensure that all O_DIRECT reads and writes complete before copying a file range, so that the destination is up to date.
Fixes: a5864c999de6 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs42proc.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index 3774d5b64ba0e..48ee3d5d89c4a 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -431,6 +431,7 @@ static ssize_t _nfs42_proc_copy(struct file *src, return status; }
+ nfs_file_block_o_direct(NFS_I(dst_inode)); status = nfs_sync_inode(dst_inode); if (status) return status;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit b7b8574225e9d2b5f1fb5483886ab797892f43b5 ]
If we're truncating part of the folio, then we need to write out the data on the part that is not covered by the cancellation.
Fixes: d47992f86b30 ("mm: change invalidatepage prototype to accept length") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/file.c | 7 ++++--- fs/nfs/write.c | 1 + 2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 033feeab8c346..a16a619fb8c33 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -437,10 +437,11 @@ static void nfs_invalidate_folio(struct folio *folio, size_t offset, dfprintk(PAGECACHE, "NFS: invalidate_folio(%lu, %zu, %zu)\n", folio->index, offset, length);
- if (offset != 0 || length < folio_size(folio)) - return; /* Cancel any unstarted writes on this page */ - nfs_wb_folio_cancel(inode, folio); + if (offset != 0 || length < folio_size(folio)) + nfs_wb_folio(inode, folio); + else + nfs_wb_folio_cancel(inode, folio); folio_wait_private_2(folio); /* [DEPRECATED] */ trace_nfs_invalidate_folio(inode, folio_pos(folio) + offset, length); } diff --git a/fs/nfs/write.c b/fs/nfs/write.c index ff29335ed8599..08fd1c0d45ec2 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -2045,6 +2045,7 @@ int nfs_wb_folio_cancel(struct inode *inode, struct folio *folio) * release it */ nfs_inode_remove_request(req); nfs_unlock_and_release_request(req); + folio_cancel_dirty(folio); }
return ret;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonathan Curley jcurley@purestorage.com
[ Upstream commit dd2fa82473453661d12723c46c9f43d9876a7efd ]
A typo in ff_lseg_match_mirrors makes the mirror comparison ineffective. This results in a merge happening all the time. Merging all the time is problematic because it marks lsegs invalid. Marking lsegs invalid causes all outstanding IO to get restarted with EAGAIN and connections to get closed.
Closing connections constantly triggers race conditions in the RDMA implementation...
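To illustrate the typo's effect, here is a minimal sketch (names taken from the hunk below; simplified, not the full function). With both pointers derived from l1, every comparison ran against the same mirror array and could never report a mismatch, so the segments always appeared mergeable:

	/* before the fix: both sides come from l1 */
	const struct nfs4_ff_layout_segment *fl1 = FF_LAYOUT_LSEG(l1);
	const struct nfs4_ff_layout_segment *fl2 = FF_LAYOUT_LSEG(l1);	/* should be l2 */

	/* with fl1 and fl2 aliasing the same segment, this test (and
	 * every later field check) is trivially false, so a mismatch
	 * is never detected and the lsegs always "match" */
	if (fl1->mirror_array_cnt != fl2->mirror_array_cnt)
		return false;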
Fixes: 660d1eb22301c ("pNFS/flexfile: Don't merge layout segments if the mirrors don't match") Signed-off-by: Jonathan Curley jcurley@purestorage.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/flexfilelayout/flexfilelayout.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c index f8ab7b4e09e7e..9edb5f9b0c4e4 100644 --- a/fs/nfs/flexfilelayout/flexfilelayout.c +++ b/fs/nfs/flexfilelayout/flexfilelayout.c @@ -293,7 +293,7 @@ ff_lseg_match_mirrors(struct pnfs_layout_segment *l1, struct pnfs_layout_segment *l2) { const struct nfs4_ff_layout_segment *fl1 = FF_LAYOUT_LSEG(l1); - const struct nfs4_ff_layout_segment *fl2 = FF_LAYOUT_LSEG(l1); + const struct nfs4_ff_layout_segment *fl2 = FF_LAYOUT_LSEG(l2); u32 i;
if (fl1->mirror_array_cnt != fl2->mirror_array_cnt)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pu Lehui pulehui@huawei.com
[ Upstream commit cd4453c5e983cf1fd5757e9acb915adb1e4602b6 ]
Syzkaller triggered a fault injection warning:
WARNING: CPU: 1 PID: 12326 at tracepoint_add_func+0xbfc/0xeb0
Modules linked in:
CPU: 1 UID: 0 PID: 12326 Comm: syz.6.10325 Tainted: G U 6.14.0-rc5-syzkaller #0
Tainted: [U]=USER
Hardware name: Google Compute Engine/Google Compute Engine
RIP: 0010:tracepoint_add_func+0xbfc/0xeb0 kernel/tracepoint.c:294
Code: 09 fe ff 90 0f 0b 90 0f b6 74 24 43 31 ff 41 bc ea ff ff ff
RSP: 0018:ffffc9000414fb48 EFLAGS: 00010283
RAX: 00000000000012a1 RBX: ffffffff8e240ae0 RCX: ffffc90014b78000
RDX: 0000000000080000 RSI: ffffffff81bbd78b RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffffffffef
R13: 0000000000000000 R14: dffffc0000000000 R15: ffffffff81c264f0
FS:  00007f27217f66c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2e80dff8 CR3: 00000000268f8000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 tracepoint_probe_register_prio+0xc0/0x110 kernel/tracepoint.c:464
 register_trace_prio_sched_switch include/trace/events/sched.h:222 [inline]
 register_pid_events kernel/trace/trace_events.c:2354 [inline]
 event_pid_write.isra.0+0x439/0x7a0 kernel/trace/trace_events.c:2425
 vfs_write+0x24c/0x1150 fs/read_write.c:677
 ksys_write+0x12b/0x250 fs/read_write.c:731
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
We can reproduce the warning by following the steps below:

1. echo 8 >> set_event_notrace_pid. Let tr->filtered_pids own one pid and register the sched_switch tracepoint.
2. echo ' ' >> set_event_pid, and perform fault injection during chunk allocation of trace_pid_list_alloc. Let pid_list have no pid and be assigned to tr->filtered_pids.
3. echo ' ' >> set_event_pid. Let pid_list be NULL and be assigned to tr->filtered_pids.
4. echo 9 >> set_event_pid, which will trigger the double-register sched_switch tracepoint warning.
The reason is that syzkaller injects a fault into the chunk allocation in trace_pid_list_alloc, causing a failure in trace_pid_list_set, which may trigger a double registration of the same tracepoint. This only occurs when the system is about to crash, but to suppress this warning, let's add failure handling logic to trace_pid_list_set.
Link: https://lore.kernel.org/20250908024658.2390398-1-pulehui@huaweicloud.com Fixes: 8d6e90983ade ("tracing: Create a sparse bitmask for pid filtering") Reported-by: syzbot+161412ccaeff20ce4dde@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67cb890e.050a0220.d8275.022e.GAE@google.com Signed-off-by: Pu Lehui pulehui@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 9329ac1667551..56f6cebdb2299 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -846,7 +846,10 @@ int trace_pid_write(struct trace_pid_list *filtered_pids, /* copy the current bits to the new max */ ret = trace_pid_list_first(filtered_pids, &pid); while (!ret) { - trace_pid_list_set(pid_list, pid); + ret = trace_pid_list_set(pid_list, pid); + if (ret < 0) + goto out; + ret = trace_pid_list_next(filtered_pids, pid + 1, &pid); nr_pids++; } @@ -883,6 +886,7 @@ int trace_pid_write(struct trace_pid_list *filtered_pids, trace_parser_clear(&parser); ret = 0; } + out: trace_parser_put(&parser);
if (ret < 0) {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jesper Dangaard Brouer hawk@kernel.org
[ Upstream commit 2b986b9e917bc88f81aa1ed386af63b26c983f1d ]
When running an XDP bpf_prog on a remote CPU in the cpumap code, we must disable the direct return optimization that xdp_return can perform for mem_type page_pool. This optimization assumes code is still executing under the RX-NAPI of the original receiving CPU, which isn't true on the remote CPU.
The cpumap code already disabled this via helpers xdp_set_return_frame_no_direct() and xdp_clear_return_frame_no_direct(), but the scope didn't include xdp_do_flush().
When doing XDP_REDIRECT towards e.g. devmap, this causes the function bq_xmit_all() to run with the direct return optimization enabled. This can lead to hard-to-find bugs. The issue only happens when bq_xmit_all() cannot ndo_xdp_xmit all frames and then frees them via xdp_return_frame_rx_napi().
Fix by expanding scope to include xdp_do_flush(). This was found by Dragos Tatulea.
Fixes: 11941f8a8536 ("bpf: cpumap: Implement generic cpumap") Reported-by: Dragos Tatulea dtatulea@nvidia.com Reported-by: Chris Arges carges@cloudflare.com Signed-off-by: Jesper Dangaard Brouer hawk@kernel.org Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Tested-by: Chris Arges carges@cloudflare.com Link: https://patch.msgid.link/175519587755.3008742.1088294435150406835.stgit@fire... Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/cpumap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index 67e8a2fc1a99d..cfcf7ed57ca0d 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -186,7 +186,6 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu, struct xdp_buff xdp; int i, nframes = 0;
- xdp_set_return_frame_no_direct(); xdp.rxq = &rxq;
for (i = 0; i < n; i++) { @@ -231,7 +230,6 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu, } }
- xdp_clear_return_frame_no_direct(); stats->pass += nframes;
return nframes; @@ -255,6 +253,7 @@ static void cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
rcu_read_lock(); bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); + xdp_set_return_frame_no_direct();
ret->xdp_n = cpu_map_bpf_prog_run_xdp(rcpu, frames, ret->xdp_n, stats); if (unlikely(ret->skb_n)) @@ -264,6 +263,7 @@ static void cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames, if (stats->redirect) xdp_do_flush();
+ xdp_clear_return_frame_no_direct(); bpf_net_ctx_clear(bpf_net_ctx); rcu_read_unlock();
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gautham R. Shenoy gautham.shenoy@amd.com
[ Upstream commit 220abf77e7c2835cc63ea8cd7158cf83952640af ]
In the "active" mode of the amd-pstate driver with performance governor, the CPPC.min_perf is expected to be the nominal_perf.
However after commit a9b9b4c2a4cd ("cpufreq/amd-pstate: Drop min and max cached frequencies"), this is not the case when the governor is switched from performance to powersave and back to performance, and the CPPC.min_perf will be equal to the scaling_min_freq that was set for the powersave governor.
This is because prior to commit a9b9b4c2a4cd ("cpufreq/amd-pstate: Drop min and max cached frequencies"), amd_pstate_epp_update_limit() would unconditionally call amd_pstate_update_min_max_limit() and the latter function would enforce the CPPC.min_perf constraint when the governor is performance.
However, after the aforementioned commit, amd_pstate_update_min_max_limit() is called by amd_pstate_epp_update_limit() only when either of the scaling_{min/max}_freq values differs from the cached cpudata->{min/max}_limit_freq value, which wouldn't have changed on a governor transition from powersave to performance, thus missing out on enforcing the CPPC.min_perf constraint for the performance governor.
Fix this by invoking amd_pstate_update_min_max_limit() not only when the {min/max} limits have changed from the cached values, but also when the policy itself has changed.
Fixes: a9b9b4c2a4cd ("cpufreq/amd-pstate: Drop min and max cached frequencies") Signed-off-by: Gautham R. Shenoy gautham.shenoy@amd.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com Link: https://lore.kernel.org/r/20250821042638.356-1-gautham.shenoy@amd.com Signed-off-by: Mario Limonciello (AMD) superm1@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cpufreq/amd-pstate.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index f3477ab377425..bbb8e18a6e2b9 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -1547,13 +1547,15 @@ static void amd_pstate_epp_cpu_exit(struct cpufreq_policy *policy) pr_debug("CPU %d exiting\n", policy->cpu); }
-static int amd_pstate_epp_update_limit(struct cpufreq_policy *policy) +static int amd_pstate_epp_update_limit(struct cpufreq_policy *policy, bool policy_change) { struct amd_cpudata *cpudata = policy->driver_data; union perf_cached perf; u8 epp;
- if (policy->min != cpudata->min_limit_freq || policy->max != cpudata->max_limit_freq) + if (policy_change || + policy->min != cpudata->min_limit_freq || + policy->max != cpudata->max_limit_freq) amd_pstate_update_min_max_limit(policy);
if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) @@ -1577,7 +1579,7 @@ static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy)
cpudata->policy = policy->policy;
- ret = amd_pstate_epp_update_limit(policy); + ret = amd_pstate_epp_update_limit(policy, true); if (ret) return ret;
@@ -1651,7 +1653,7 @@ static int amd_pstate_epp_resume(struct cpufreq_policy *policy) int ret;
/* enable amd pstate from suspend state*/ - ret = amd_pstate_epp_update_limit(policy); + ret = amd_pstate_epp_update_limit(policy, false); if (ret) return ret;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Richter tmricht@linux.ibm.com
[ Upstream commit 85941afd2c404247e583c827fae0a45da1c1d92c ]
Each PAI PMU device driver returns -EINVAL when an event is out of its accepted range. This return value aborts the search for an alternative PMU device driver to handle this event. Change the return value to -ENOENT, which is used to try other PMUs instead. This makes the PMUs more robust when the sequence of PMU device driver initialization changes (at boot time) or when modules are used.
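As background, here is a hedged sketch of why -ENOENT (rather than -EINVAL) keeps the search going: the perf core probes registered PMUs roughly like the loop below (simplified pseudologic modeled on perf_init_event() in kernel/events/core.c, not the literal upstream code).

	/* simplified: probe PMUs until one accepts the event */
	list_for_each_entry(pmu, &pmus, entry) {
		int err = pmu->event_init(event);

		if (!err)
			return pmu;		/* this PMU takes the event */
		if (err != -ENOENT)
			return ERR_PTR(err);	/* hard error such as -EINVAL: abort the search */
		/* -ENOENT: event is not for this PMU, try the next one */
	}
	return ERR_PTR(-ENOENT);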
Fixes: 39d62336f5c12 ("s390/pai: add support for cryptography counters") Acked-by: Sumanth Korikkar sumanthk@linux.ibm.com Signed-off-by: Thomas Richter tmricht@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kernel/perf_pai_crypto.c | 4 ++-- arch/s390/kernel/perf_pai_ext.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/s390/kernel/perf_pai_crypto.c b/arch/s390/kernel/perf_pai_crypto.c index 63875270941bc..01cc6493367a4 100644 --- a/arch/s390/kernel/perf_pai_crypto.c +++ b/arch/s390/kernel/perf_pai_crypto.c @@ -286,10 +286,10 @@ static int paicrypt_event_init(struct perf_event *event) /* PAI crypto PMU registered as PERF_TYPE_RAW, check event type */ if (a->type != PERF_TYPE_RAW && event->pmu->type != a->type) return -ENOENT; - /* PAI crypto event must be in valid range */ + /* PAI crypto event must be in valid range, try others if not */ if (a->config < PAI_CRYPTO_BASE || a->config > PAI_CRYPTO_BASE + paicrypt_cnt) - return -EINVAL; + return -ENOENT; /* Allow only CRYPTO_ALL for sampling */ if (a->sample_period && a->config != PAI_CRYPTO_BASE) return -EINVAL; diff --git a/arch/s390/kernel/perf_pai_ext.c b/arch/s390/kernel/perf_pai_ext.c index fd14d5ebccbca..d65a9730753c5 100644 --- a/arch/s390/kernel/perf_pai_ext.c +++ b/arch/s390/kernel/perf_pai_ext.c @@ -266,7 +266,7 @@ static int paiext_event_valid(struct perf_event *event) event->hw.config_base = offsetof(struct paiext_cb, acc); return 0; } - return -EINVAL; + return -ENOENT; }
/* Might be called on different CPU than the one the event is intended for. */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Richter tmricht@linux.ibm.com
[ Upstream commit ce971233242b5391d99442271f3ca096fb49818d ]
Deny all sampling events in the CPUMF counter facility device driver and return -ENOENT. This return value is used to try other PMUs. Up to now, events of type PERF_TYPE_HARDWARE were not tested for sampling and later returned -EOPNOTSUPP, which ends the search for alternative PMUs. Change that behavior and try other PMUs instead.
Fixes: 613a41b0d16e ("s390/cpum_cf: Reject request for sampling in event initialization") Acked-by: Sumanth Korikkar sumanthk@linux.ibm.com Signed-off-by: Thomas Richter tmricht@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kernel/perf_cpum_cf.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c index 6a262e198e35e..952cc8d103693 100644 --- a/arch/s390/kernel/perf_cpum_cf.c +++ b/arch/s390/kernel/perf_cpum_cf.c @@ -761,8 +761,6 @@ static int __hw_perf_event_init(struct perf_event *event, unsigned int type) break;
case PERF_TYPE_HARDWARE: - if (is_sampling_event(event)) /* No sampling support */ - return -ENOENT; ev = attr->config; if (!attr->exclude_user && attr->exclude_kernel) { /* @@ -860,6 +858,8 @@ static int cpumf_pmu_event_init(struct perf_event *event) unsigned int type = event->attr.type; int err = -ENOENT;
+ if (is_sampling_event(event)) /* No sampling support */ + return err; if (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_RAW) err = __hw_perf_event_init(event, type); else if (event->pmu->type == type)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello (AMD) superm1@kernel.org
[ Upstream commit ba3319e5905710abe495b11a1aaf03ebb51d62e2 ]
During the suspend sequence the cached CPPC request is destroyed with the expectation that it's restored during resume. This assumption broke when the separate cached EPP variable was removed, and then it was broken again by commit 608a76b65288 ("cpufreq/amd-pstate: Add support for the "Requested CPU Min frequency" BIOS option"), which explicitly set it to zero during suspend.
Remove the invalidation and set the value during the suspend call to update limits so that the cached variable can be used to restore on resume.
Fixes: 608a76b65288 ("cpufreq/amd-pstate: Add support for the "Requested CPU Min frequency" BIOS option") Fixes: b7a41156588a ("cpufreq/amd-pstate: Invalidate cppc_req_cached during suspend") Reported-by: goldens goldenspinach.rhbugzilla@gmail.com Closes: https://community.frame.work/t/increased-power-usage-after-resuming-from-sus... Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2391221 Tested-by: goldens goldenspinach.rhbugzilla@gmail.com Tested-by: Willian Wang kernel@willian.wang Reported-by: Vincent Mauirn vincent.maurin.fr@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219981 Tested-by: Alex De Lorenzo kernel@alexdelorenzo.dev Reviewed-by: Gautham R. Shenoy gautham.shenoy@amd.com Link: https://lore.kernel.org/r/20250826052747.2240670-1-superm1@kernel.org Signed-off-by: Mario Limonciello (AMD) superm1@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cpufreq/amd-pstate.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index bbb8e18a6e2b9..e9aaf72502e51 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -1621,13 +1621,14 @@ static int amd_pstate_suspend(struct cpufreq_policy *policy) * min_perf value across kexec reboots. If this CPU is just resumed back without kexec, * the limits, epp and desired perf will get reset to the cached values in cpudata struct */ - ret = amd_pstate_update_perf(policy, perf.bios_min_perf, 0U, 0U, 0U, false); + ret = amd_pstate_update_perf(policy, perf.bios_min_perf, + FIELD_GET(AMD_CPPC_DES_PERF_MASK, cpudata->cppc_req_cached), + FIELD_GET(AMD_CPPC_MAX_PERF_MASK, cpudata->cppc_req_cached), + FIELD_GET(AMD_CPPC_EPP_PERF_MASK, cpudata->cppc_req_cached), + false); if (ret) return ret;
- /* invalidate to ensure it's rewritten during resume */ - cpudata->cppc_req_cached = 0; - /* set this flag to avoid setting core offline*/ cpudata->suspended = true;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
[ Upstream commit f9bb6ffa7f5ad0f8ee0f53fc4a10655872ee4a14 ]
Stanislav reported that in bpf_crypto_crypt() the destination dynptr's size is not validated to be at least as large as the source dynptr's size before calling into the crypto backend with 'len = src_len'. This can result in an OOB write when the destination is smaller than the source.
Concretely, in the mentioned function, psrc and pdst are both linear buffers fetched from each dynptr:
	psrc = __bpf_dynptr_data(src, src_len);
	[...]
	pdst = __bpf_dynptr_data_rw(dst, dst_len);
	[...]
	err = decrypt ? ctx->type->decrypt(ctx->tfm, psrc, pdst, src_len, piv)
		      : ctx->type->encrypt(ctx->tfm, psrc, pdst, src_len, piv);
The crypto backend expects pdst to be large enough to hold src_len bytes of output. Add an additional src_len > dst_len check and bail out if that is the case. Note that these kfuncs are accessible under root privileges only.
Fixes: 3e1c6f35409f ("bpf: make common crypto API for TC/XDP programs") Reported-by: Stanislav Fort disclosure@aisle.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Cc: Vadim Fedorenko vadim.fedorenko@linux.dev Reviewed-by: Vadim Fedorenko vadim.fedorenko@linux.dev Link: https://lore.kernel.org/r/20250829143657.318524-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/crypto.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/crypto.c b/kernel/bpf/crypto.c index 94854cd9c4cc3..83c4d9943084b 100644 --- a/kernel/bpf/crypto.c +++ b/kernel/bpf/crypto.c @@ -278,7 +278,7 @@ static int bpf_crypto_crypt(const struct bpf_crypto_ctx *ctx, siv_len = siv ? __bpf_dynptr_size(siv) : 0; src_len = __bpf_dynptr_size(src); dst_len = __bpf_dynptr_size(dst); - if (!src_len || !dst_len) + if (!src_len || !dst_len || src_len > dst_len) return -EINVAL;
if (siv_len != ctx->siv_len)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit 30f241fcf52aaaef7ac16e66530faa11be78a865 ]
Eryk reported an issue that I have put under the Closes: tag, related to umem addrs being prematurely produced onto the pool's completion queue. Let us make the skb's destructor responsible for producing all addrs that a given skb used.
The commit in the Fixes tag introduced the buggy behavior; it was not broken from day 1, but rather when xsk multi-buffer support got introduced.
In order to mitigate the performance impact as much as possible, mimic the linear and frag parts within the skb by storing the first address from the XSK descriptor at sk_buff::destructor_arg. For fragments, store them at ::cb via a list. The nodes that go onto the list are allocated via kmem_cache. xsk_destruct_skb() consumes the address stored at ::destructor_arg and optionally walks the list from ::cb if the count of descriptors associated with this particular skb is bigger than 1.
The previous approach, where the whole array for storing UMEM addresses from XSK descriptors was pre-allocated during first fragment processing, yielded too big a performance regression for 64b traffic. With the current approach the impact is much reduced in my tests, and for jumbo frames I observed traffic being slower by at most 9%.
Magnus suggested special-casing this way of processing for XDP_SHARED_UMEM, so we would identify this during bind and set different hooks for the 'backpressure mechanism' on the CQ and for the skb destructor; but given that the results looked promising on my side, I decided to have a single data path for XSK generic Tx. I suppose other auxiliary stuff would have to land as well in order to make that work.
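To summarize the bookkeeping scheme in one place (types and names as introduced by the patch below; a condensed sketch, not a substitute for the diff):

	struct xsk_addr_node {			/* one node per extra fragment */
		u64 addr;
		struct list_head addr_node;
	};

	struct xsk_addr_head {			/* lives in skb->cb */
		u32 num_descs;			/* descriptors backing this skb */
		struct list_head addrs_list;	/* addrs of fragments 2..N */
	};

	#define XSKCB(skb) ((struct xsk_addr_head *)((skb)->cb))

	/* The first descriptor's address rides in the shared info:
	 *	skb_shinfo(skb)->destructor_arg = (void *)(uintptr_t)addr;
	 * xsk_destruct_skb() then produces destructor_arg onto the
	 * completion queue first and, when num_descs > 1, walks
	 * addrs_list for the remaining addresses. */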
Fixes: b7f72a30e9ac ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path") Reported-by: Eryk Kubanski e.kubanski@partner.samsung.com Closes: https://lore.kernel.org/netdev/20250530103456.53564-1-e.kubanski@partner.sam... Acked-by: Stanislav Fomichev sdf@fomichev.me Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Jason Xing kerneljasonxing@gmail.com Reviewed-by: Jason Xing kerneljasonxing@gmail.com Link: https://lore.kernel.org/r/20250904194907.2342177-1-maciej.fijalkowski@intel.... Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/xdp/xsk.c | 113 ++++++++++++++++++++++++++++++++++++++------ net/xdp/xsk_queue.h | 12 +++++ 2 files changed, 111 insertions(+), 14 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 72c000c0ae5f5..de331541fdb38 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -36,6 +36,20 @@ #define TX_BATCH_SIZE 32 #define MAX_PER_SOCKET_BUDGET (TX_BATCH_SIZE)
+struct xsk_addr_node { + u64 addr; + struct list_head addr_node; +}; + +struct xsk_addr_head { + u32 num_descs; + struct list_head addrs_list; +}; + +static struct kmem_cache *xsk_tx_generic_cache; + +#define XSKCB(skb) ((struct xsk_addr_head *)((skb)->cb)) + void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool) { if (pool->cached_need_wakeup & XDP_WAKEUP_RX) @@ -528,24 +542,43 @@ static int xsk_wakeup(struct xdp_sock *xs, u8 flags) return dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id, flags); }
-static int xsk_cq_reserve_addr_locked(struct xsk_buff_pool *pool, u64 addr) +static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool) { unsigned long flags; int ret;
spin_lock_irqsave(&pool->cq_lock, flags); - ret = xskq_prod_reserve_addr(pool->cq, addr); + ret = xskq_prod_reserve(pool->cq); spin_unlock_irqrestore(&pool->cq_lock, flags);
return ret; }
-static void xsk_cq_submit_locked(struct xsk_buff_pool *pool, u32 n) +static void xsk_cq_submit_addr_locked(struct xsk_buff_pool *pool, + struct sk_buff *skb) { + struct xsk_addr_node *pos, *tmp; + u32 descs_processed = 0; unsigned long flags; + u32 idx;
spin_lock_irqsave(&pool->cq_lock, flags); - xskq_prod_submit_n(pool->cq, n); + idx = xskq_get_prod(pool->cq); + + xskq_prod_write_addr(pool->cq, idx, + (u64)(uintptr_t)skb_shinfo(skb)->destructor_arg); + descs_processed++; + + if (unlikely(XSKCB(skb)->num_descs > 1)) { + list_for_each_entry_safe(pos, tmp, &XSKCB(skb)->addrs_list, addr_node) { + xskq_prod_write_addr(pool->cq, idx + descs_processed, + pos->addr); + descs_processed++; + list_del(&pos->addr_node); + kmem_cache_free(xsk_tx_generic_cache, pos); + } + } + xskq_prod_submit_n(pool->cq, descs_processed); spin_unlock_irqrestore(&pool->cq_lock, flags); }
@@ -558,9 +591,14 @@ static void xsk_cq_cancel_locked(struct xsk_buff_pool *pool, u32 n) spin_unlock_irqrestore(&pool->cq_lock, flags); }
+static void xsk_inc_num_desc(struct sk_buff *skb) +{ + XSKCB(skb)->num_descs++; +} + static u32 xsk_get_num_desc(struct sk_buff *skb) { - return skb ? (long)skb_shinfo(skb)->destructor_arg : 0; + return XSKCB(skb)->num_descs; }
static void xsk_destruct_skb(struct sk_buff *skb) @@ -572,23 +610,33 @@ static void xsk_destruct_skb(struct sk_buff *skb) *compl->tx_timestamp = ktime_get_tai_fast_ns(); }
- xsk_cq_submit_locked(xdp_sk(skb->sk)->pool, xsk_get_num_desc(skb)); + xsk_cq_submit_addr_locked(xdp_sk(skb->sk)->pool, skb); sock_wfree(skb); }
-static void xsk_set_destructor_arg(struct sk_buff *skb) +static void xsk_set_destructor_arg(struct sk_buff *skb, u64 addr) { - long num = xsk_get_num_desc(xdp_sk(skb->sk)->skb) + 1; - - skb_shinfo(skb)->destructor_arg = (void *)num; + BUILD_BUG_ON(sizeof(struct xsk_addr_head) > sizeof(skb->cb)); + INIT_LIST_HEAD(&XSKCB(skb)->addrs_list); + XSKCB(skb)->num_descs = 0; + skb_shinfo(skb)->destructor_arg = (void *)(uintptr_t)addr; }
static void xsk_consume_skb(struct sk_buff *skb) { struct xdp_sock *xs = xdp_sk(skb->sk); + u32 num_descs = xsk_get_num_desc(skb); + struct xsk_addr_node *pos, *tmp; + + if (unlikely(num_descs > 1)) { + list_for_each_entry_safe(pos, tmp, &XSKCB(skb)->addrs_list, addr_node) { + list_del(&pos->addr_node); + kmem_cache_free(xsk_tx_generic_cache, pos); + } + }
skb->destructor = sock_wfree; - xsk_cq_cancel_locked(xs->pool, xsk_get_num_desc(skb)); + xsk_cq_cancel_locked(xs->pool, num_descs); /* Free skb without triggering the perf drop trace */ consume_skb(skb); xs->skb = NULL; @@ -605,6 +653,7 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, { struct xsk_buff_pool *pool = xs->pool; u32 hr, len, ts, offset, copy, copied; + struct xsk_addr_node *xsk_addr; struct sk_buff *skb = xs->skb; struct page *page; void *buffer; @@ -619,6 +668,19 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, return ERR_PTR(err);
skb_reserve(skb, hr); + + xsk_set_destructor_arg(skb, desc->addr); + } else { + xsk_addr = kmem_cache_zalloc(xsk_tx_generic_cache, GFP_KERNEL); + if (!xsk_addr) + return ERR_PTR(-ENOMEM); + + /* in case of -EOVERFLOW that could happen below, + * xsk_consume_skb() will release this node as whole skb + * would be dropped, which implies freeing all list elements + */ + xsk_addr->addr = desc->addr; + list_add_tail(&xsk_addr->addr_node, &XSKCB(skb)->addrs_list); }
addr = desc->addr; @@ -690,8 +752,11 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, err = skb_store_bits(skb, 0, buffer, len); if (unlikely(err)) goto free_err; + + xsk_set_destructor_arg(skb, desc->addr); } else { int nr_frags = skb_shinfo(skb)->nr_frags; + struct xsk_addr_node *xsk_addr; struct page *page; u8 *vaddr;
@@ -706,12 +771,22 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, goto free_err; }
+ xsk_addr = kmem_cache_zalloc(xsk_tx_generic_cache, GFP_KERNEL); + if (!xsk_addr) { + __free_page(page); + err = -ENOMEM; + goto free_err; + } + vaddr = kmap_local_page(page); memcpy(vaddr, buffer, len); kunmap_local(vaddr);
skb_add_rx_frag(skb, nr_frags, page, 0, len, PAGE_SIZE); refcount_add(PAGE_SIZE, &xs->sk.sk_wmem_alloc); + + xsk_addr->addr = desc->addr; + list_add_tail(&xsk_addr->addr_node, &XSKCB(skb)->addrs_list); }
if (first_frag && desc->options & XDP_TX_METADATA) { @@ -755,7 +830,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, skb->mark = READ_ONCE(xs->sk.sk_mark); skb->destructor = xsk_destruct_skb; xsk_tx_metadata_to_compl(meta, &skb_shinfo(skb)->xsk_meta); - xsk_set_destructor_arg(skb); + xsk_inc_num_desc(skb);
return skb;
@@ -765,7 +840,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
if (err == -EOVERFLOW) { /* Drop the packet */ - xsk_set_destructor_arg(xs->skb); + xsk_inc_num_desc(xs->skb); xsk_drop_skb(xs->skb); xskq_cons_release(xs->tx); } else { @@ -807,7 +882,7 @@ static int __xsk_generic_xmit(struct sock *sk) * if there is space in it. This avoids having to implement * any buffering in the Tx path. */ - err = xsk_cq_reserve_addr_locked(xs->pool, desc.addr); + err = xsk_cq_reserve_locked(xs->pool); if (err) { err = -EAGAIN; goto out; @@ -1795,8 +1870,18 @@ static int __init xsk_init(void) if (err) goto out_pernet;
+ xsk_tx_generic_cache = kmem_cache_create("xsk_generic_xmit_cache", + sizeof(struct xsk_addr_node), + 0, SLAB_HWCACHE_ALIGN, NULL); + if (!xsk_tx_generic_cache) { + err = -ENOMEM; + goto out_unreg_notif; + } + return 0;
+out_unreg_notif: + unregister_netdevice_notifier(&xsk_netdev_notifier); out_pernet: unregister_pernet_subsys(&xsk_net_ops); out_sk: diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index 46d87e961ad6d..f16f390370dc4 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -344,6 +344,11 @@ static inline u32 xskq_cons_present_entries(struct xsk_queue *q)
/* Functions for producers */
+static inline u32 xskq_get_prod(struct xsk_queue *q) +{ + return READ_ONCE(q->ring->producer); +} + static inline u32 xskq_prod_nb_free(struct xsk_queue *q, u32 max) { u32 free_entries = q->nentries - (q->cached_prod - q->cached_cons); @@ -390,6 +395,13 @@ static inline int xskq_prod_reserve_addr(struct xsk_queue *q, u64 addr) return 0; }
+static inline void xskq_prod_write_addr(struct xsk_queue *q, u32 idx, u64 addr) +{ + struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring; + + ring->desc[idx & q->ring_mask] = addr; +} + static inline void xskq_prod_write_addr_batch(struct xsk_queue *q, struct xdp_desc *descs, u32 nb_entries) {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kumar Kartikeya Dwivedi memxor@gmail.com
[ Upstream commit 0d80e7f951be1bdd08d328fd87694be0d6e8aaa8 ]
Currently, out of all 3 types of waiters in the rqspinlock slow path (i.e., pending bit waiter, wait queue head waiter, and wait queue non-head waiter), only the pending bit waiter and wait queue head waiters apply deadlock checks and a timeout on their waiting loop. The assumption here was that the wait queue head's forward progress would be sufficient to identify cases where the lock owner or pending bit waiter is stuck, and non-head waiters relying on the head waiter would prove to be sufficient for their own forward progress.
However, the head waiter itself can be preempted by a non-head waiter for the same lock (AA) or a different lock (ABBA) in a manner that impedes its forward progress. In such a case, non-head waiters not performing deadlock and timeout checks becomes insufficient, and the system can enter a state of lockup.
This is typically not a concern with non-NMI lock acquisitions, as lock holders which run in different contexts (IRQ, non-IRQ) use "irqsave" variants of the lock APIs, which naturally excludes such lock holders from preempting one another on the same CPU.
It might seem likely that a similar case may occur for rqspinlock when programs are attached to contention tracepoints (begin, end); however, these tracepoints either precede the enqueue into the wait queue or succeed it, and therefore cannot be used to preempt a head waiter's waiting loop.
We must still be careful against nested kprobe and fentry programs that may attach to the middle of the head's waiting loop to stall forward progress and invoke another rqspinlock acquisition that proceeds as a non-head waiter. To this end, drop CC_FLAGS_FTRACE from the rqspinlock.o object file.
For now, this issue is resolved by falling back to a repeated trylock on the lock word from NMI context, while performing the deadlock checks to break out early in case forward progress is impossible, and use the timeout as a final fallback.
A more involved fix to terminate the queue when such a condition occurs will be made as a follow up. A selftest to stress this aspect of nested NMI/non-NMI locking attempts will be added in a subsequent patch to the bpf-next tree when this fix lands and trees are synchronized.
Reported-by: Josef Bacik josef@toxicpanda.com Fixes: 164c246571e9 ("rqspinlock: Protect waiters in queue from stalls") Signed-off-by: Kumar Kartikeya Dwivedi memxor@gmail.com Link: https://lore.kernel.org/r/20250909184959.3509085-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/Makefile | 1 + kernel/bpf/rqspinlock.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 3a335c50e6e3c..12ec926ed7114 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -62,3 +62,4 @@ CFLAGS_REMOVE_bpf_lru_list.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_queue_stack_maps.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_lpm_trie.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_ringbuf.o = $(CC_FLAGS_FTRACE) +CFLAGS_REMOVE_rqspinlock.o = $(CC_FLAGS_FTRACE) diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c index 338305c8852cf..804e619f1e006 100644 --- a/kernel/bpf/rqspinlock.c +++ b/kernel/bpf/rqspinlock.c @@ -471,7 +471,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val) * any MCS node. This is not the most elegant solution, but is * simple enough. */ - if (unlikely(idx >= _Q_MAX_NODES)) { + if (unlikely(idx >= _Q_MAX_NODES || in_nmi())) { lockevent_inc(lock_no_node); RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT); while (!queued_spin_trylock(lock)) {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: KaFai Wan kafai.wan@linux.dev
[ Upstream commit df0cb5cb50bd54d3cd4d0d83417ceec6a66404aa ]
OpenWRT users reported a regression on ARMv6 devices after updating to the latest HEAD, where the tcpdump filter:
tcpdump "not ether host 3c37121a2b3c and not ether host 184ecbca2a3a \
	and not ether host 14130b4d3f47 and not ether host f0f61cf440b7 \
	and not ether host a84b4dedf471 and not ether host d022be17e1d7 \
	and not ether host 5c497967208b and not ether host 706655784d5b"
fails with warning: "Kernel filter failed: No error information" when using config:

# CONFIG_BPF_JIT_ALWAYS_ON is not set
CONFIG_BPF_JIT_DEFAULT_ON=y
The issue arises because commits:

1. "bpf: Fix array bounds error with may_goto" changed default runtime to __bpf_prog_ret0_warn when jit_requested = 1
2. "bpf: Avoid __bpf_prog_ret0_warn when jit fails" returns error when jit_requested = 1 but jit fails
This change restores interpreter fallback capability for BPF programs with stack size <= 512 bytes when jit fails.
Reported-by: Felix Fietkau nbd@nbd.name Closes: https://lore.kernel.org/bpf/2e267b4b-0540-45d8-9310-e127bf95fc63@nbd.name/ Fixes: 6ebc5030e0c5 ("bpf: Fix array bounds error with may_goto") Signed-off-by: KaFai Wan kafai.wan@linux.dev Acked-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20250909144614.2991253-1-kafai.wan@linux.dev Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/core.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index d966e971893ab..829f0792d8d83 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2354,8 +2354,7 @@ static unsigned int __bpf_prog_ret0_warn(const void *ctx, const struct bpf_insn *insn) { /* If this handler ever gets executed, then BPF_JIT_ALWAYS_ON - * is not working properly, or interpreter is being used when - * prog->jit_requested is not 0, so warn about it! + * is not working properly, so warn about it! */ WARN_ON_ONCE(1); return 0; @@ -2456,8 +2455,9 @@ static int bpf_check_tail_call(const struct bpf_prog *fp) return ret; }
-static void bpf_prog_select_func(struct bpf_prog *fp) +static bool bpf_prog_select_interpreter(struct bpf_prog *fp) { + bool select_interpreter = false; #ifndef CONFIG_BPF_JIT_ALWAYS_ON u32 stack_depth = max_t(u32, fp->aux->stack_depth, 1); u32 idx = (round_up(stack_depth, 32) / 32) - 1; @@ -2466,15 +2466,16 @@ static void bpf_prog_select_func(struct bpf_prog *fp) * But for non-JITed programs, we don't need bpf_func, so no bounds * check needed. */ - if (!fp->jit_requested && - !WARN_ON_ONCE(idx >= ARRAY_SIZE(interpreters))) { + if (idx < ARRAY_SIZE(interpreters)) { fp->bpf_func = interpreters[idx]; + select_interpreter = true; } else { fp->bpf_func = __bpf_prog_ret0_warn; } #else fp->bpf_func = __bpf_prog_ret0_warn; #endif + return select_interpreter; }
/** @@ -2493,7 +2494,7 @@ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err) /* In case of BPF to BPF calls, verifier did all the prep * work with regards to JITing, etc. */ - bool jit_needed = fp->jit_requested; + bool jit_needed = false;
if (fp->bpf_func) goto finalize; @@ -2502,7 +2503,8 @@ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err) bpf_prog_has_kfunc_call(fp)) jit_needed = true;
- bpf_prog_select_func(fp); + if (!bpf_prog_select_interpreter(fp)) + jit_needed = true;
/* eBPF JITs can rewrite the program in case constant * blinding is active. However, in case of error during
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peilin Ye yepeilin@google.com
[ Upstream commit 6d78b4473cdb08b74662355a9e8510bde09c511e ]
Currently, calling bpf_map_kmalloc_node() from __bpf_async_init() can cause various locking issues; see the following stack trace (edited for style) as one example:
...
[10.011566] do_raw_spin_lock.cold
[10.011570] try_to_wake_up                 (5) double-acquiring the same
[10.011575] kick_pool                          rq_lock, causing a hardlockup
[10.011579] __queue_work
[10.011582] queue_work_on
[10.011585] kernfs_notify
[10.011589] cgroup_file_notify
[10.011593] try_charge_memcg               (4) memcg accounting raises an
[10.011597] obj_cgroup_charge_pages            MEMCG_MAX event
[10.011599] obj_cgroup_charge_account
[10.011600] __memcg_slab_post_alloc_hook
[10.011603] __kmalloc_node_noprof
...
[10.011611] bpf_map_kmalloc_node
[10.011612] __bpf_async_init
[10.011615] bpf_timer_init                 (3) BPF calls bpf_timer_init()
[10.011617] bpf_prog_xxxxxxxxxxxxxxxx_fcg_runnable
[10.011619] bpf__sched_ext_ops_runnable
[10.011620] enqueue_task_scx               (2) BPF runs with rq_lock held
[10.011622] enqueue_task
[10.011626] ttwu_do_activate
[10.011629] sched_ttwu_pending             (1) grabs rq_lock
...
The above was reproduced on bpf-next (b338cf849ec8) by modifying ./tools/sched_ext/scx_flatcg.bpf.c to call bpf_timer_init() during ops.runnable(), and hacking the memcg accounting code a bit to make a bpf_timer_init() call more likely to raise an MEMCG_MAX event.
We have also run into other similar variants (both internally and on bpf-next), including double-acquiring cgroup_file_kn_lock, the same worker_pool::lock, etc.
As suggested by Shakeel, fix this by using __GFP_HIGH instead of GFP_ATOMIC in __bpf_async_init(), so that e.g. if try_charge_memcg() raises an MEMCG_MAX event, we call __memcg_memory_event() with @allow_spinning=false and avoid calling cgroup_file_notify() there.
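For reference, a short sketch of how these GFP flags relate in recent kernels (per include/linux/gfp_types.h; shown only to unpack the parenthetical above, not part of the patch):

	#define GFP_ATOMIC	(__GFP_HIGH | __GFP_KSWAPD_RECLAIM)
	/* __GFP_RECLAIM is (___GFP_DIRECT_RECLAIM | ___GFP_KSWAPD_RECLAIM),
	 * so GFP_ATOMIC & ~__GFP_RECLAIM leaves exactly __GFP_HIGH:
	 * the allocation may dip into atomic reserves, but all reclaim
	 * bits are cleared; that is the signal the mm patch mentioned
	 * below uses to avoid the spinning/notification paths. */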
Depends on mm patch "memcg: skip cgroup_file_notify if spinning is not allowed": https://lore.kernel.org/bpf/20250905201606.66198-1-shakeel.butt@linux.dev/
v0 approach (s/bpf_map_kmalloc_node/bpf_mem_alloc/):
https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@google.com/
v1 approach:
https://lore.kernel.org/bpf/20250905234547.862249-1-yepeilin@google.com/
Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.") Suggested-by: Shakeel Butt shakeel.butt@linux.dev Signed-off-by: Peilin Ye yepeilin@google.com Link: https://lore.kernel.org/r/20250909095222.2121438-1-yepeilin@google.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/helpers.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index fdf8737542ac4..3abbdebb2d9ef 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1277,8 +1277,11 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u goto out; }
- /* allocate hrtimer via map_kmalloc to use memcg accounting */ - cb = bpf_map_kmalloc_node(map, size, GFP_ATOMIC, map->numa_node); + /* Allocate via bpf_map_kmalloc_node() for memcg accounting. Until + * kmalloc_nolock() is available, avoid locking issues by using + * __GFP_HIGH (GFP_ATOMIC & ~__GFP_RECLAIM). + */ + cb = bpf_map_kmalloc_node(map, size, __GFP_HIGH, map->numa_node); if (!cb) { ret = -ENOMEM; goto out;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@google.com
[ Upstream commit a3967baad4d533dc254c31e0d221e51c8d223d58 ]
syzbot reported the splat below. [0]
The repro does the following:
1. Load a sk_msg prog that calls bpf_msg_cork_bytes(msg, cork_bytes)
2. Attach the prog to a SOCKMAP
3. Add a socket to the SOCKMAP
4. Activate fault injection
5. Send data less than cork_bytes
At 5., the data is carried over to the next sendmsg() as it is smaller than the cork_bytes specified by bpf_msg_cork_bytes().
Then, tcp_bpf_send_verdict() tries to allocate psock->cork to hold the data, but this fails silently due to fault injection + __GFP_NOWARN.
If the allocation fails, we need to revert the sk->sk_forward_alloc change done by sk_msg_alloc().
Let's call sk_msg_free() when tcp_bpf_send_verdict fails to allocate psock->cork.
The "*copied" value also needs to be updated so that a proper error can be returned to the caller, sendmsg, when the allocation of psock->cork fails. Nothing has been corked at that point, so this patch simply sets "*copied" to 0.
[0]:
WARNING: net/ipv4/af_inet.c:156 at inet_sock_destruct+0x623/0x730 net/ipv4/af_inet.c:156, CPU#1: syz-executor/5983
Modules linked in:
CPU: 1 UID: 0 PID: 5983 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:inet_sock_destruct+0x623/0x730 net/ipv4/af_inet.c:156
Code: 0f 0b 90 e9 62 fe ff ff e8 7a db b5 f7 90 0f 0b 90 e9 95 fe ff ff e8 6c db b5 f7 90 0f 0b 90 e9 bb fe ff ff e8 5e db b5 f7 90 <0f> 0b 90 e9 e1 fe ff ff 89 f9 80 e1 07 80 c1 03 38 c1 0f 8c 9f fc
RSP: 0018:ffffc90000a08b48 EFLAGS: 00010246
RAX: ffffffff8a09d0b2 RBX: dffffc0000000000 RCX: ffff888024a23c80
RDX: 0000000000000100 RSI: 0000000000000fff RDI: 0000000000000000
RBP: 0000000000000fff R08: ffff88807e07c627 R09: 1ffff1100fc0f8c4
R10: dffffc0000000000 R11: ffffed100fc0f8c5 R12: ffff88807e07c380
R13: dffffc0000000000 R14: ffff88807e07c60c R15: 1ffff1100fc0f872
FS:  00005555604c4500(0000) GS:ffff888125af1000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555604df5c8 CR3: 0000000032b06000 CR4: 00000000003526f0
Call Trace:
 <IRQ>
 __sk_destruct+0x86/0x660 net/core/sock.c:2339
 rcu_do_batch kernel/rcu/tree.c:2605 [inline]
 rcu_core+0xca8/0x1770 kernel/rcu/tree.c:2861
 handle_softirqs+0x286/0x870 kernel/softirq.c:579
 __do_softirq kernel/softirq.c:613 [inline]
 invoke_softirq kernel/softirq.c:453 [inline]
 __irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:680
 irq_exit_rcu+0x9/0x30 kernel/softirq.c:696
 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1052 [inline]
 sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1052
 </IRQ>
Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data") Reported-by: syzbot+4cabd1d2fa917a456db8@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68c0b6b5.050a0220.3c6139.0013.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima kuniyu@google.com Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Link: https://patch.msgid.link/20250909232623.4151337-1-kuniyu@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_bpf.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index ba581785adb4b..a268e1595b22a 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -408,8 +408,11 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock, if (!psock->cork) { psock->cork = kzalloc(sizeof(*psock->cork), GFP_ATOMIC | __GFP_NOWARN); - if (!psock->cork) + if (!psock->cork) { + sk_msg_free(sk, msg); + *copied = 0; return -ENOMEM; + } } memcpy(psock->cork, msg, sizeof(*msg)); return 0;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: wangzijie wangzijie1@honor.com
[ Upstream commit 0ce9398aa0830f15f92bbed73853f9861c3e74ff ]
Commit 2ce3d282bd50 ("proc: fix missing pde_set_flags() for net proc files") missed a key part in the definition of proc_dir_entry:
	union {
		const struct proc_ops *proc_ops;
		const struct file_operations *proc_dir_ops;
	};
So dereferencing ->proc_ops on the assumption that it is a proc_ops structure results in type confusion, and makes the NULL check for 'proc_ops' not work for proc directories.
Add !S_ISDIR(dp->mode) test before calling pde_set_flags() to fix it.
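For illustration, a hedged sketch of the confusion (condensed shapes and illustrative names; the real code lives in fs/proc/generic.c and include/linux/proc_fs.h):

	/* condensed, illustrative layout */
	struct pde_slim {
		umode_t mode;
		union {
			const struct proc_ops *proc_ops;		/* regular files */
			const struct file_operations *proc_dir_ops;	/* directories */
		};
	};

	static void pde_set_flags_sketch(struct pde_slim *dp)
	{
		/* For a directory the live union member is proc_dir_ops,
		 * so this reads a file_operations pointer as if it were
		 * a proc_ops pointer: the NULL check tests the wrong
		 * thing and the proc_flags dereference is type-confused. */
		if (dp->proc_ops && (dp->proc_ops->proc_flags & PROC_ENTRY_PERMANENT))
			/* ... would set PDE flags here ... */;
	}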
Link: https://lkml.kernel.org/r/20250904135715.3972782-1-wangzijie1@honor.com Fixes: 2ce3d282bd50 ("proc: fix missing pde_set_flags() for net proc files") Signed-off-by: wangzijie wangzijie1@honor.com Reported-by: Brad Spengler spender@grsecurity.net Closes: https://lore.kernel.org/all/20250903065758.3678537-1-wangzijie1@honor.com/ Cc: Alexey Dobriyan adobriyan@gmail.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Christian Brauner brauner@kernel.org Cc: Jiri Slaby jirislaby@kernel.org Cc: Stefano Brivio sbrivio@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/proc/generic.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/proc/generic.c b/fs/proc/generic.c index 409bc1d11eca3..8e1e48760ffe0 100644 --- a/fs/proc/generic.c +++ b/fs/proc/generic.c @@ -390,7 +390,8 @@ struct proc_dir_entry *proc_register(struct proc_dir_entry *dir, if (proc_alloc_inum(&dp->low_ino)) goto out_free_entry;
- pde_set_flags(dp); + if (!S_ISDIR(dp->mode)) + pde_set_flags(dp);
write_lock(&proc_subdir_lock); dp->parent = dir;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Salah Triki salah.triki@gmail.com
commit ff2a66d21fd2364ed9396d151115eec59612b200 upstream.
dma_free_coherent() must only be called if the corresponding dma_alloc_coherent() call has succeeded. Calling it when the allocation fails leads to undefined behavior.
Delete the wrong call.
[ bp: Massage commit message. ]
Fixes: 71bcada88b0f3 ("edac: altera: Add Altera SDRAM EDAC support") Signed-off-by: Salah Triki salah.triki@gmail.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Acked-by: Dinh Nguyen dinguyen@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/aIrfzzqh4IzYtDVC@pc Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/edac/altera_edac.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/edac/altera_edac.c +++ b/drivers/edac/altera_edac.c @@ -128,7 +128,6 @@ static ssize_t altr_sdr_mc_err_inject_wr
ptemp = dma_alloc_coherent(mci->pdev, 16, &dma_handle, GFP_KERNEL); if (!ptemp) { - dma_free_coherent(mci->pdev, 16, ptemp, dma_handle); edac_printk(KERN_ERR, EDAC_MC, "Inject: Buffer Allocation error\n"); return -ENOMEM;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonas Jelonek jelonek.jonas@gmail.com
commit cd6c956fbc13156bcbcca084b46a8380caebc2a8 upstream.
Fix the current check for number of channels (child nodes in the device tree). Before, this was:
if (device_get_child_node_count(dev) >= RTL9300_I2C_MUX_NCHAN)
RTL9300_I2C_MUX_NCHAN gives the maximum number of channels, so checking with '>=' isn't correct: it rejects a device tree that uses exactly the maximum number of channels. Thus, fix it to:
if (device_get_child_node_count(dev) > RTL9300_I2C_MUX_NCHAN)
The issue occurred on a TP-Link TL-ST1008F v2.0 device (8 SFP+ ports), and the fix was tested there.
Fixes: c366be720235 ("i2c: Add driver for the RTL9300 I2C controller") Cc: stable@vger.kernel.org # v6.13+ Signed-off-by: Jonas Jelonek jelonek.jonas@gmail.com Tested-by: Sven Eckelmann sven@narfation.org Reviewed-by: Chris Packham chris.packham@alliedtelesis.co.nz Tested-by: Chris Packham chris.packham@alliedtelesis.co.nz # On RTL9302C based board Tested-by: Markus Stockhausen markus.stockhausen@gmx.de Signed-off-by: Andi Shyti andi.shyti@kernel.org Link: https://lore.kernel.org/r/20250831100457.3114-2-jelonek.jonas@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i2c/busses/i2c-rtl9300.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/i2c/busses/i2c-rtl9300.c +++ b/drivers/i2c/busses/i2c-rtl9300.c @@ -353,7 +353,7 @@ static int rtl9300_i2c_probe(struct plat
platform_set_drvdata(pdev, i2c);
- if (device_get_child_node_count(dev) >= RTL9300_I2C_MUX_NCHAN) + if (device_get_child_node_count(dev) > RTL9300_I2C_MUX_NCHAN) return dev_err_probe(dev, -EINVAL, "Too many channels\n");
device_for_each_child_node(dev, child) {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
commit 199cd9e8d14bc14bdbd1fa3031ce26dac9781507 upstream.
This reverts commit 14e41b16e8cb677bb440dca2edba8b041646c742.
This patch breaks the LTP acct02 test, so let's revert and look for a better solution.
Reported-by: Mark Brown broonie@kernel.org Reported-by: Harshvardhan Jha harshvardhan.j.jha@oracle.com Link: https://lore.kernel.org/linux-nfs/7d4d57b0-39a3-49f1-8ada-60364743e3b4@siren... Cc: stable@vger.kernel.org # 6.15.x Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sunrpc/sched.c | 2 -- 1 file changed, 2 deletions(-)
--- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -276,8 +276,6 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue
static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode) { - if (unlikely(current->flags & PF_EXITING)) - return -EINTR; schedule(); if (signal_pending_state(mode, current)) return -ERESTARTSYS;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Chancellor nathan@kernel.org
commit 3fac212fe489aa0dbe8d80a42a7809840ca7b0f9 upstream.
Clang 22 recently added support for defining __SANITIZE__ macros similar to GCC [1], which causes warnings (or errors with CONFIG_WERROR=y or W=e) with the existing defines that the kernel creates to emulate this behavior with existing clang versions.
In file included from <built-in>:3:
In file included from include/linux/compiler_types.h:171:
include/linux/compiler-clang.h:37:9: error: '__SANITIZE_THREAD__' macro redefined [-Werror,-Wmacro-redefined]
   37 | #define __SANITIZE_THREAD__
      |         ^
<built-in>:352:9: note: previous definition is here
  352 | #define __SANITIZE_THREAD__ 1
      |         ^
Refactor compiler-clang.h to only define the sanitizer macros when they are undefined and adjust the rest of the code to use these macros for checking if the sanitizers are enabled, clearing up the warnings and allowing the kernel to easily drop these defines when the minimum supported version of LLVM for building the kernel becomes 22.0.0 or newer.
Link: https://lkml.kernel.org/r/20250902-clang-update-sanitize-defines-v1-1-cf3702... Link: https://github.com/llvm/llvm-project/commit/568c23bbd3303518c5056d7f03444dae... [1] Signed-off-by: Nathan Chancellor nathan@kernel.org Reviewed-by: Justin Stitt justinstitt@google.com Cc: Alexander Potapenko glider@google.com Cc: Andrey Konovalov andreyknvl@gmail.com Cc: Andrey Ryabinin ryabinin.a.a@gmail.com Cc: Bill Wendling morbo@google.com Cc: Dmitriy Vyukov dvyukov@google.com Cc: Marco Elver elver@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/compiler-clang.h | 29 ++++++++++++++++++++++++----- 1 file changed, 24 insertions(+), 5 deletions(-)
--- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -18,23 +18,42 @@ #define KASAN_ABI_VERSION 5
/* + * Clang 22 added preprocessor macros to match GCC, in hopes of eventually + * dropping __has_feature support for sanitizers: + * https://github.com/llvm/llvm-project/commit/568c23bbd3303518c5056d7f03444dae... + * Create these macros for older versions of clang so that it is easy to clean + * up once the minimum supported version of LLVM for building the kernel always + * creates these macros. + * * Note: Checking __has_feature(*_sanitizer) is only true if the feature is * enabled. Therefore it is not required to additionally check defined(CONFIG_*) * to avoid adding redundant attributes in other configurations. */ +#if __has_feature(address_sanitizer) && !defined(__SANITIZE_ADDRESS__) +#define __SANITIZE_ADDRESS__ +#endif +#if __has_feature(hwaddress_sanitizer) && !defined(__SANITIZE_HWADDRESS__) +#define __SANITIZE_HWADDRESS__ +#endif +#if __has_feature(thread_sanitizer) && !defined(__SANITIZE_THREAD__) +#define __SANITIZE_THREAD__ +#endif
-#if __has_feature(address_sanitizer) || __has_feature(hwaddress_sanitizer) -/* Emulate GCC's __SANITIZE_ADDRESS__ flag */ +/* + * Treat __SANITIZE_HWADDRESS__ the same as __SANITIZE_ADDRESS__ in the kernel. + */ +#ifdef __SANITIZE_HWADDRESS__ #define __SANITIZE_ADDRESS__ +#endif + +#ifdef __SANITIZE_ADDRESS__ #define __no_sanitize_address \ __attribute__((no_sanitize("address", "hwaddress"))) #else #define __no_sanitize_address #endif
-#if __has_feature(thread_sanitizer) -/* emulate gcc's __SANITIZE_THREAD__ flag */ -#define __SANITIZE_THREAD__ +#ifdef __SANITIZE_THREAD__ #define __no_sanitize_thread \ __attribute__((no_sanitize("thread"))) #else
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Breno Leitao leitao@debian.org
commit 04d3cd43700a2d0fe4bfb1012a8ec7f2e34a3507 upstream.
Patch series "kexec: Fix invalid field access".
The kexec_buf structure was previously declared without initialization. Commit bf454ec31add ("kexec_file: allow to place kexec_buf randomly") added a field that is always read but not consistently populated by all architectures. This uninitialized field will contain garbage.
This also triggers a UBSAN warning when the uninitialized data is accessed:
------------[ cut here ]------------
UBSAN: invalid-load in ./include/linux/kexec.h:210:10
load of value 252 is not a valid value for type '_Bool'
Zero-initializing kexec_buf at declaration ensures all fields are cleanly set, preventing future instances of uninitialized memory being used.
An initial fix was already landed for arm64[0], and this patchset fixes the problem on the remaining arm64 code and on riscv, as raised by Mark.
Discussions about this problem could be found at[1][2].
This patch (of 3):
The kexec_buf structure was previously declared without initialization. Commit bf454ec31add ("kexec_file: allow to place kexec_buf randomly") added a field that is always read but not consistently populated by all architectures. This uninitialized field will contain garbage.
This also triggers a UBSAN warning when the uninitialized data is accessed:
------------[ cut here ]------------
UBSAN: invalid-load in ./include/linux/kexec.h:210:10
load of value 252 is not a valid value for type '_Bool'
Zero-initializing kexec_buf at declaration ensures all fields are cleanly set, preventing future instances of uninitialized memory being used.
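For reference, a minimal sketch of the C semantics relied on here, using a toy struct rather than the real kexec_buf definition: an automatic variable without an initializer has indeterminate contents, while the empty initializer (a GNU C extension, standardized in C23) zeroes every member, including fields a given architecture never assigns:

  /* Toy stand-in for kexec_buf; .random models the field added by
   * commit bf454ec31add that not every architecture populates. */
  struct buf_like {
          void *image;
          unsigned long mem;
          _Bool random;
  };

  void example(void)
  {
          struct buf_like garbage;    /* .random holds stack garbage */
          struct buf_like clean = {}; /* every field is zero */

          (void)garbage;
          (void)clean;
  }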
Link: https://lkml.kernel.org/r/20250827-kbuf_all-v1-0-1df9882bb01a@debian.org Link: https://lkml.kernel.org/r/20250827-kbuf_all-v1-1-1df9882bb01a@debian.org Link: https://lore.kernel.org/all/20250826180742.f2471131255ec1c43683ea07@linux-fo... [0] Link: https://lore.kernel.org/all/oninomspajhxp4omtdapxnckxydbk2nzmrix7rggmpukpnza... [1] Link: https://lore.kernel.org/all/20250826-akpm-v1-1-3c831f0e3799@debian.org/ [2] Fixes: bf454ec31add ("kexec_file: allow to place kexec_buf randomly") Signed-off-by: Breno Leitao leitao@debian.org Acked-by: Baoquan He bhe@redhat.com Cc: Albert Ou aou@eecs.berkeley.edu Cc: Alexander Gordeev agordeev@linux.ibm.com Cc: Alexandre Ghiti alex@ghiti.fr Cc: Catalin Marinas catalin.marinas@arm.com Cc: Christian Borntraeger borntraeger@linux.ibm.com Cc: Coiby Xu coxu@redhat.com Cc: Heiko Carstens hca@linux.ibm.com Cc: Palmer Dabbelt palmer@dabbelt.com Cc: Paul Walmsley paul.walmsley@sifive.com Cc: Sven Schnelle svens@linux.ibm.com Cc: Vasily Gorbik gor@linux.ibm.com Cc: Will Deacon will@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/machine_kexec_file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c index af1ca875c52c..410060ebd86d 100644 --- a/arch/arm64/kernel/machine_kexec_file.c +++ b/arch/arm64/kernel/machine_kexec_file.c @@ -94,7 +94,7 @@ int load_other_segments(struct kimage *image, char *initrd, unsigned long initrd_len, char *cmdline) { - struct kexec_buf kbuf; + struct kexec_buf kbuf = {}; void *dtb = NULL; unsigned long initrd_load_addr = 0, dtb_len, orig_segments = image->nr_segments;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krister Johansen kjlx@templeofstupid.com
commit 648de37416b301f046f62f1b65715c7fa8ebaa67 upstream.
Users reported a scenario where MPTCP connections that were configured with SO_KEEPALIVE prior to connect would fail to enable their keepalives if MPTCP fell back to TCP mode.
After investigating, this affects keepalives for any connection where sync_socket_options is called on a socket that is in the closed or listening state. Joins are handled properly. For connects, sync_socket_options is called when the socket is still in the closed state. The tcp_set_keepalive() function does not act on sockets that are closed or listening, hence keepalive is not immediately enabled. Since the SOCK_KEEPOPEN flag is absent, it is not enabled later in the connect sequence via tcp_finish_connect. Setting the keepalive via sockopt after connect does work, but would not address any subsequently created flows.
Fortunately, the fix here is straightforward: set SOCK_KEEPOPEN on the subflow when calling sync_socket_options.
The fix was validated both by using tcpdump to observe keepalive packets not being sent before the fix, and being sent after the fix. It was also possible to observe via ss that the keepalive timer was not enabled on these sockets before the fix, but was enabled afterwards.
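A minimal userspace sketch of the reported scenario (hypothetical helper; it assumes only that IPPROTO_MPTCP is 262, its value on Linux): keepalive is requested while the socket is still closed, so sync_socket_options() runs in exactly the window described above if the connection later falls back to plain TCP:

  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <unistd.h>

  #ifndef IPPROTO_MPTCP
  #define IPPROTO_MPTCP 262
  #endif

  static int mptcp_connect_keepalive(const struct sockaddr_in *dst)
  {
          int one = 1;
          int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);

          if (fd < 0)
                  return -1;
          /* Enable keepalive before connect(), on a still-closed socket. */
          if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &one, sizeof(one)) < 0 ||
              connect(fd, (const struct sockaddr *)dst, sizeof(*dst)) < 0) {
                  close(fd);
                  return -1;
          }
          return fd;
  }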
Fixes: 1b3e7ede1365 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY") Cc: stable@vger.kernel.org Signed-off-by: Krister Johansen kjlx@templeofstupid.com Reviewed-by: Geliang Tang geliang@kernel.org Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/aL8dYfPZrwedCIh9@templeofstupid.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/sockopt.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
--- a/net/mptcp/sockopt.c +++ b/net/mptcp/sockopt.c @@ -1508,13 +1508,12 @@ static void sync_socket_options(struct m { static const unsigned int tx_rx_locks = SOCK_RCVBUF_LOCK | SOCK_SNDBUF_LOCK; struct sock *sk = (struct sock *)msk; + bool keep_open;
- if (ssk->sk_prot->keepalive) { - if (sock_flag(sk, SOCK_KEEPOPEN)) - ssk->sk_prot->keepalive(ssk, 1); - else - ssk->sk_prot->keepalive(ssk, 0); - } + keep_open = sock_flag(sk, SOCK_KEEPOPEN); + if (ssk->sk_prot->keepalive) + ssk->sk_prot->keepalive(ssk, keep_open); + sock_valbool_flag(ssk, SOCK_KEEPOPEN, keep_open);
ssk->sk_priority = sk->sk_priority; ssk->sk_bound_dev_if = sk->sk_bound_dev_if;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 6f021e95d0828edc8ed104a294594c2f9569383a upstream.
The net.mptcp.pm_type sysctl knob has been deprecated in v6.15; net.mptcp.path_manager should be used instead.
Adapt the section about path managers to suggest using the new sysctl knob instead of the deprecated one.
Fixes: 595c26d122d1 ("mptcp: sysctl: set path manager by name") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20250908-net-mptcp-misc-fixes-6-17-rc5-v1-2-5f2168a... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/networking/mptcp.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/Documentation/networking/mptcp.rst +++ b/Documentation/networking/mptcp.rst @@ -60,10 +60,10 @@ address announcements. Typically, it is and the server side that announces additional addresses via the ``ADD_ADDR`` and ``REMOVE_ADDR`` options.
-Path managers are controlled by the ``net.mptcp.pm_type`` sysctl knob -- see -mptcp-sysctl.rst. There are two types: the in-kernel one (type ``0``) where the -same rules are applied for all the connections (see: ``ip mptcp``) ; and the -userspace one (type ``1``), controlled by a userspace daemon (i.e. `mptcpd +Path managers are controlled by the ``net.mptcp.path_manager`` sysctl knob -- +see mptcp-sysctl.rst. There are two types: the in-kernel one (``kernel``) where +the same rules are applied for all the connections (see: ``ip mptcp``) ; and the +userspace one (``userspace``), controlled by a userspace daemon (i.e. `mptcpd https://mptcpd.mptcp.dev/`_) where different rules can be applied for each connection. The path managers can be controlled via a Netlink API; see netlink_spec/mptcp_pm.rst.
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 7094b84863e5832cb1cd9c4b9d648904775b6bd9 upstream.
This attribute is used as a signed number in the code in pm_netlink.c:
nla_put_s32(skb, MPTCP_ATTR_IF_IDX, ssk->sk_bound_dev_if))
The specs should then reflect that. Note that other 'if-idx' attributes from the same .yaml file use a signed number as well.
Fixes: bc8aeb2045e2 ("Documentation: netlink: add a YAML spec for mptcp") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20250908-net-mptcp-misc-fixes-6-17-rc5-v1-1-5f2168a... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/netlink/specs/mptcp_pm.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/Documentation/netlink/specs/mptcp_pm.yaml +++ b/Documentation/netlink/specs/mptcp_pm.yaml @@ -256,7 +256,7 @@ attribute-sets: type: u32 - name: if-idx - type: u32 + type: s32 - name: reset-reason type: u32
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Tinguely mark.tinguely@oracle.com
commit 04100f775c2ea501927f508f17ad824ad1f23c8d upstream.
syzbot detected an OCFS2 hang due to a recursive semaphore on a FS_IOC_FIEMAP of the extent list on a specially crafted mmap file.
context_switch kernel/sched/core.c:5357 [inline]
__schedule+0x1798/0x4cc0 kernel/sched/core.c:6961
__schedule_loop kernel/sched/core.c:7043 [inline]
schedule+0x165/0x360 kernel/sched/core.c:7058
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7115
rwsem_down_write_slowpath+0x872/0xfe0 kernel/locking/rwsem.c:1185
__down_write_common kernel/locking/rwsem.c:1317 [inline]
__down_write kernel/locking/rwsem.c:1326 [inline]
down_write+0x1ab/0x1f0 kernel/locking/rwsem.c:1591
ocfs2_page_mkwrite+0x2ff/0xc40 fs/ocfs2/mmap.c:142
do_page_mkwrite+0x14d/0x310 mm/memory.c:3361
wp_page_shared mm/memory.c:3762 [inline]
do_wp_page+0x268d/0x5800 mm/memory.c:3981
handle_pte_fault mm/memory.c:6068 [inline]
__handle_mm_fault+0x1033/0x5440 mm/memory.c:6195
handle_mm_fault+0x40a/0x8e0 mm/memory.c:6364
do_user_addr_fault+0x764/0x1390 arch/x86/mm/fault.c:1387
handle_page_fault arch/x86/mm/fault.c:1476 [inline]
exc_page_fault+0x76/0xf0 arch/x86/mm/fault.c:1532
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
RIP: 0010:copy_user_generic arch/x86/include/asm/uaccess_64.h:126 [inline]
RIP: 0010:raw_copy_to_user arch/x86/include/asm/uaccess_64.h:147 [inline]
RIP: 0010:_inline_copy_to_user include/linux/uaccess.h:197 [inline]
RIP: 0010:_copy_to_user+0x85/0xb0 lib/usercopy.c:26
Code: e8 00 bc f7 fc 4d 39 fc 72 3d 4d 39 ec 77 38 e8 91 b9 f7 fc 4c 89 f7 89 de e8 47 25 5b fd 0f 01 cb 4c 89 ff 48 89 d9 4c 89 f6 <f3> a4 0f 1f 00 48 89 cb 0f 01 ca 48 89 d8 5b 41 5c 41 5d 41 5e 41
RSP: 0018:ffffc9000403f950 EFLAGS: 00050256
RAX: ffffffff84c7f101 RBX: 0000000000000038 RCX: 0000000000000038
RDX: 0000000000000000 RSI: ffffc9000403f9e0 RDI: 0000200000000060
RBP: ffffc9000403fa90 R08: ffffc9000403fa17 R09: 1ffff92000807f42
R10: dffffc0000000000 R11: fffff52000807f43 R12: 0000200000000098
R13: 00007ffffffff000 R14: ffffc9000403f9e0 R15: 0000200000000060
copy_to_user include/linux/uaccess.h:225 [inline]
fiemap_fill_next_extent+0x1c0/0x390 fs/ioctl.c:145
ocfs2_fiemap+0x888/0xc90 fs/ocfs2/extent_map.c:806
ioctl_fiemap fs/ioctl.c:220 [inline]
do_vfs_ioctl+0x1173/0x1430 fs/ioctl.c:532
__do_sys_ioctl fs/ioctl.c:596 [inline]
__se_sys_ioctl+0x82/0x170 fs/ioctl.c:584
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5f13850fd9
RSP: 002b:00007ffe3b3518b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000200000000000 RCX: 00007f5f13850fd9
RDX: 0000200000000040 RSI: 00000000c020660b RDI: 0000000000000004
RBP: 6165627472616568 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe3b3518f0
R13: 00007ffe3b351b18 R14: 431bde82d7b634db R15: 00007f5f1389a03b
ocfs2_fiemap() takes a read lock of the ip_alloc_sem semaphore (since v2.6.22-527-g7307de80510a) and calls fiemap_fill_next_extent() to read the extent list of this running mmap executable. The user-supplied buffer that holds the fiemap information page faults, calling ocfs2_page_mkwrite(), which will take a write lock (since v2.6.27-38-g00dc417fa3e7) of the same semaphore. This recursive semaphore acquisition holds filesystem locks and causes a hang of the filesystem.
The ip_alloc_sem protects the inode extent list and size. Release the read semaphore before calling fiemap_fill_next_extent() in ocfs2_fiemap() and ocfs2_fiemap_inline(). This does an unnecessary semaphore lock/unlock on the last extent but simplifies the error path.
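To see the shape of the hang, a userspace analogue using a pthread rwlock (a sketch only: POSIX leaves recursive read-then-write acquisition undefined, whereas the kernel rwsem in the trace above reliably blocks):

  #include <pthread.h>

  static pthread_rwlock_t alloc_sem = PTHREAD_RWLOCK_INITIALIZER;

  void fiemap_analogue(void)
  {
          pthread_rwlock_rdlock(&alloc_sem);  /* ocfs2_fiemap(): read side */
          /* copy_to_user() faults on the mmap'ed buffer, and the fault
           * handler runs in the same task... */
          pthread_rwlock_wrlock(&alloc_sem);  /* ocfs2_page_mkwrite(): write
                                               * side of the same lock -> hang */
          pthread_rwlock_unlock(&alloc_sem);
  }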
Link: https://lkml.kernel.org/r/61d1a62b-2631-4f12-81e2-cd689914360b@oracle.com Fixes: 00dc417fa3e7 ("ocfs2: fiemap support") Signed-off-by: Mark Tinguely mark.tinguely@oracle.com Reported-by: syzbot+541dcc6ee768f77103e7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=541dcc6ee768f77103e7 Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Cc: Mark Fasheh mark@fasheh.com Cc: Joel Becker jlbec@evilplan.org Cc: Junxiao Bi junxiao.bi@oracle.com Cc: Changwei Ge gechangwei@live.cn Cc: Jun Piao piaojun@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ocfs2/extent_map.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/fs/ocfs2/extent_map.c +++ b/fs/ocfs2/extent_map.c @@ -706,6 +706,8 @@ out: * it not only handles the fiemap for inlined files, but also deals * with the fast symlink, cause they have no difference for extent * mapping per se. + * + * Must be called with ip_alloc_sem semaphore held. */ static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh, struct fiemap_extent_info *fieinfo, @@ -717,6 +719,7 @@ static int ocfs2_fiemap_inline(struct in u64 phys; u32 flags = FIEMAP_EXTENT_DATA_INLINE|FIEMAP_EXTENT_LAST; struct ocfs2_inode_info *oi = OCFS2_I(inode); + lockdep_assert_held_read(&oi->ip_alloc_sem);
di = (struct ocfs2_dinode *)di_bh->b_data; if (ocfs2_inode_is_fast_symlink(inode)) @@ -732,8 +735,11 @@ static int ocfs2_fiemap_inline(struct in phys += offsetof(struct ocfs2_dinode, id2.i_data.id_data);
+ /* Release the ip_alloc_sem to prevent deadlock on page fault */ + up_read(&OCFS2_I(inode)->ip_alloc_sem); ret = fiemap_fill_next_extent(fieinfo, 0, phys, id_count, flags); + down_read(&OCFS2_I(inode)->ip_alloc_sem); if (ret < 0) return ret; } @@ -802,9 +808,11 @@ int ocfs2_fiemap(struct inode *inode, st len_bytes = (u64)le16_to_cpu(rec.e_leaf_clusters) << osb->s_clustersize_bits; phys_bytes = le64_to_cpu(rec.e_blkno) << osb->sb->s_blocksize_bits; virt_bytes = (u64)le32_to_cpu(rec.e_cpos) << osb->s_clustersize_bits; - + /* Release the ip_alloc_sem to prevent deadlock on page fault */ + up_read(&OCFS2_I(inode)->ip_alloc_sem); ret = fiemap_fill_next_extent(fieinfo, virt_bytes, phys_bytes, len_bytes, fe_flags); + down_read(&OCFS2_I(inode)->ip_alloc_sem); if (ret) break;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Burkov boris@bur.io
commit de134cb54c3a67644ff95b1c9bffe545e752c912 upstream.
The following workload on a squota enabled fs:
btrfs subvol create mnt/subvol
# ensure subvol extents get accounted
sync
btrfs qgroup create 1/1 mnt
btrfs qgroup assign mnt/subvol 1/1 mnt
btrfs qgroup delete mnt/subvol

# make the cleaner thread run
btrfs filesystem sync mnt
sleep 1
btrfs filesystem sync mnt
btrfs qgroup destroy 1/1 mnt
will fail with EBUSY. The reason is that 1/1 does the quick accounting when we assign subvol to it, gaining its exclusive usage as excl and excl_cmpr. But then when we delete subvol, the decrement happens via record_squota_delta() which does not update excl_cmpr, as squotas does not make any distinction between compressed and normal extents. Thus, we increment excl_cmpr but never decrement it, and are unable to delete 1/1. The two possible fixes are to make squota always mirror excl and excl_cmpr or to make the fast accounting separately track the plain and cmpr numbers. The latter felt cleaner to me so that is what I opted for.
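As a toy model of the counter leak (plain C, not btrfs code; 16 KiB is an arbitrary example for the subvolume's exclusive usage):

  #include <stdio.h>

  struct qgroup { long excl, excl_cmpr; };

  int main(void)
  {
          struct qgroup g = { 0, 0 };
          long subvol_excl = 16384;

          /* qgroup assign: the old fast path mirrored excl into both */
          g.excl      += subvol_excl;
          g.excl_cmpr += subvol_excl;

          /* subvol delete: record_squota_delta() only decrements excl */
          g.excl -= subvol_excl;

          printf("excl=%ld excl_cmpr=%ld -> qgroup destroy %s\n",
                 g.excl, g.excl_cmpr,
                 g.excl_cmpr ? "fails with EBUSY" : "succeeds");
          return 0;
  }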
Fixes: 1e0e9d5771c3 ("btrfs: add helper for recording simple quota deltas") CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Boris Burkov boris@bur.io Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/qgroup.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1483,6 +1483,7 @@ static int __qgroup_excl_accounting(stru struct btrfs_qgroup *qgroup; LIST_HEAD(qgroup_list); u64 num_bytes = src->excl; + u64 num_bytes_cmpr = src->excl_cmpr; int ret = 0;
qgroup = find_qgroup_rb(fs_info, ref_root); @@ -1494,11 +1495,12 @@ static int __qgroup_excl_accounting(stru struct btrfs_qgroup_list *glist;
qgroup->rfer += sign * num_bytes; - qgroup->rfer_cmpr += sign * num_bytes; + qgroup->rfer_cmpr += sign * num_bytes_cmpr;
WARN_ON(sign < 0 && qgroup->excl < num_bytes); + WARN_ON(sign < 0 && qgroup->excl_cmpr < num_bytes_cmpr); qgroup->excl += sign * num_bytes; - qgroup->excl_cmpr += sign * num_bytes; + qgroup->excl_cmpr += sign * num_bytes_cmpr;
if (sign > 0) qgroup_rsv_add_by_qgroup(fs_info, qgroup, src);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Omar Sandoval osandov@fb.com
commit f6a6c280059c4ddc23e12e3de1b01098e240036f upstream.
There is a race condition between inode eviction and inode caching that can cause a live struct btrfs_inode to be missing from the root->inodes xarray. Specifically, there is a window during evict() between the inode being unhashed and deleted from the xarray. If btrfs_iget() is called for the same inode in that window, it will be recreated and inserted into the xarray, but then eviction will delete the new entry, leaving nothing in the xarray:
Thread 1                                Thread 2
---------------------------------------------------------------
evict()
  remove_inode_hash()
                                        btrfs_iget_path()
                                          btrfs_iget_locked()
                                          btrfs_read_locked_inode()
                                            btrfs_add_inode_to_root()
  destroy_inode()
    btrfs_destroy_inode()
      btrfs_del_inode_from_root()
        __xa_erase
In turn, this can cause issues for subvolume deletion. Specifically, if an inode is in this lost state, and all other inodes are evicted, then btrfs_del_inode_from_root() will call btrfs_add_dead_root() prematurely. If the lost inode has a delayed_node attached to it, then when btrfs_clean_one_deleted_snapshot() calls btrfs_kill_all_delayed_nodes(), it will loop forever because the delayed_nodes xarray will never become empty (unless memory pressure forces the inode out). We saw this manifest as soft lockups in production.
Fix it by only deleting the xarray entry if it matches the given inode (using __xa_cmpxchg()).
Fixes: 310b2f5d5a94 ("btrfs: use an xarray to track open inodes in a root") Cc: stable@vger.kernel.org # 6.11+ Reviewed-by: Josef Bacik josef@toxicpanda.com Reviewed-by: Filipe Manana fdmanana@suse.com Co-authored-by: Leo Martins loemra.dev@gmail.com Signed-off-by: Leo Martins loemra.dev@gmail.com Signed-off-by: Omar Sandoval osandov@fb.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/inode.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5685,7 +5685,17 @@ static void btrfs_del_inode_from_root(st bool empty = false;
xa_lock(&root->inodes); - entry = __xa_erase(&root->inodes, btrfs_ino(inode)); + /* + * This btrfs_inode is being freed and has already been unhashed at this + * point. It's possible that another btrfs_inode has already been + * allocated for the same inode and inserted itself into the root, so + * don't delete it in that case. + * + * Note that this shouldn't need to allocate memory, so the gfp flags + * don't really matter. + */ + entry = __xa_cmpxchg(&root->inodes, btrfs_ino(inode), inode, NULL, + GFP_ATOMIC); if (entry == inode) empty = xa_empty(&root->inodes); xa_unlock(&root->inodes);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chiasheng Lee chiasheng.lee@linux.intel.com
commit 664596bd98bb251dd417dfd3f9b615b661e1e44a upstream.
Hide the Intel Birch Stream SoC TCO WDT feature since it was removed.
On platforms with a PCH TCO WDT, this redundant device can produce errors like this:
[ 28.144542] sysfs: cannot create duplicate filename '/bus/platform/devices/iTCO_wdt'
Fixes: 8c56f9ef25a3 ("i2c: i801: Add support for Intel Birch Stream SoC") Link: https://bugzilla.kernel.org/show_bug.cgi?id=220320 Signed-off-by: Chiasheng Lee chiasheng.lee@linux.intel.com Cc: stable@vger.kernel.org # v6.7+ Reviewed-by: Mika Westerberg mika.westerberg@linux.intel.com Reviewed-by: Jarkko Nikula jarkko.nikula@linux.intel.com Signed-off-by: Andi Shyti andi.shyti@kernel.org Link: https://lore.kernel.org/r/20250901125943.916522-1-chiasheng.lee@linux.intel.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i2c/busses/i2c-i801.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/i2c/busses/i2c-i801.c +++ b/drivers/i2c/busses/i2c-i801.c @@ -1052,7 +1052,7 @@ static const struct pci_device_id i801_i { PCI_DEVICE_DATA(INTEL, METEOR_LAKE_P_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) }, { PCI_DEVICE_DATA(INTEL, METEOR_LAKE_SOC_S_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) }, { PCI_DEVICE_DATA(INTEL, METEOR_LAKE_PCH_S_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) }, - { PCI_DEVICE_DATA(INTEL, BIRCH_STREAM_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) }, + { PCI_DEVICE_DATA(INTEL, BIRCH_STREAM_SMBUS, FEATURES_ICH5) }, { PCI_DEVICE_DATA(INTEL, ARROW_LAKE_H_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) }, { PCI_DEVICE_DATA(INTEL, PANTHER_LAKE_H_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) }, { PCI_DEVICE_DATA(INTEL, PANTHER_LAKE_P_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) },
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonas Jelonek jelonek.jonas@gmail.com
commit 06418cb5a1a542a003fdb4ad8e76ea542d57cfba upstream.
Add an explicit check for the xfer length to 'rtl9300_i2c_config_xfer' to ensure the data length is within the supported range. In particular, a data length of 0 is not supported by the hardware and causes unintended or destructive behaviour.
This limitation becomes obvious when looking at the register documentation [1]. Four bits are reserved for DATA_WIDTH, and the value of these 4 bits is interpreted as N + 1, allowing a data length range of 1 <= len <= 16.
Affected by this is the SMBus Quick operation, which works with a data length of 0. Passing 0 as the length causes an underflow of the value due to:
(len - 1) & 0xf
and effectively specifies a transfer length of 16 via the registers. This causes a 16-byte write operation instead of a Quick Write. For example, this soft-bricks SFP modules without a write-protected EEPROM by overwriting some of their initial bytes.
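The underflow is plain unsigned C arithmetic and can be checked standalone (not driver code):

  #include <stdio.h>

  int main(void)
  {
          unsigned int len = 0;
          unsigned int field = (len - 1) & 0xf; /* wraps to 15 */

          /* The hardware reads DATA_WIDTH as N + 1 bytes. */
          printf("DATA_WIDTH=%u -> %u-byte transfer\n", field, field + 1);
          return 0;
  }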
For completeness, also add a quirk for the zero length.
[1] https://svanheule.net/realtek/longan/register/i2c_mst1_ctrl2
Fixes: c366be720235 ("i2c: Add driver for the RTL9300 I2C controller") Cc: stable@vger.kernel.org # v6.13+ Signed-off-by: Jonas Jelonek jelonek.jonas@gmail.com Tested-by: Sven Eckelmann sven@narfation.org Reviewed-by: Chris Packham chris.packham@alliedtelesis.co.nz Tested-by: Chris Packham chris.packham@alliedtelesis.co.nz # On RTL9302C based board Tested-by: Markus Stockhausen markus.stockhausen@gmx.de Signed-off-by: Andi Shyti andi.shyti@kernel.org Link: https://lore.kernel.org/r/20250831100457.3114-3-jelonek.jonas@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i2c/busses/i2c-rtl9300.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/i2c/busses/i2c-rtl9300.c +++ b/drivers/i2c/busses/i2c-rtl9300.c @@ -99,6 +99,9 @@ static int rtl9300_i2c_config_xfer(struc { u32 val, mask;
+ if (len < 1 || len > 16) + return -EINVAL; + val = chan->bus_freq << RTL9300_I2C_MST_CTRL2_SCL_FREQ_OFS; mask = RTL9300_I2C_MST_CTRL2_SCL_FREQ_MASK;
@@ -323,7 +326,7 @@ static const struct i2c_algorithm rtl930 };
static struct i2c_adapter_quirks rtl9300_i2c_quirks = { - .flags = I2C_AQ_NO_CLK_STRETCH, + .flags = I2C_AQ_NO_CLK_STRETCH | I2C_AQ_NO_ZERO_LEN, .max_read_len = 16, .max_write_len = 16, };
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonas Jelonek jelonek.jonas@gmail.com
commit ede965fd555ac2536cf651893a998dbfd8e57b86 upstream.
Remove the SMBus Quick operation from this driver because it is not natively supported by the hardware and is wrongly implemented in the driver.
The I2C controllers in Realtek RTL9300 and RTL9310 are SMBus-compliant, but there doesn't seem to be native support for the SMBus Quick operation. It is not explicitly mentioned in the documentation, but looking at the registers which configure an SMBus transaction, one can see that the data length cannot be set to 0. This suggests that the hardware doesn't allow any SMBus message without data bytes (except for those it issues on its own; see SMBus Block Read).
The current implementation of the SMBus Quick operation passes a length of 0 (which is actually invalid). Before the fix of a bug in a previous commit, this led to a read operation of 16 bytes from any register (the one of a former transaction, or any other value).
This caused issues like soft-bricked SFP modules after a simple probe with i2cdetect, which uses Quick by default. When run against SFP modules whose EEPROM isn't write-protected, some of the initial bytes are overwritten because a 16-byte write operation is executed instead of a Quick Write. (This temporarily soft-bricked one of my DAC cables.)
Because the SMBus Quick operation is obviously not supported on these controllers (a length of 0 cannot be set, even when no register address is set), remove it instead of claiming there is support. There also shouldn't be any kind of emulated 'Quick' which just performs another kind of operation in the background. Otherwise, specific issues occur in the case of a 'Quick' Write, which actually writes unknown data to an unknown register.
Fixes: c366be720235 ("i2c: Add driver for the RTL9300 I2C controller") Cc: stable@vger.kernel.org # v6.13+ Signed-off-by: Jonas Jelonek jelonek.jonas@gmail.com Tested-by: Sven Eckelmann sven@narfation.org Reviewed-by: Chris Packham chris.packham@alliedtelesis.co.nz Tested-by: Chris Packham chris.packham@alliedtelesis.co.nz # On RTL9302C based board Tested-by: Markus Stockhausen markus.stockhausen@gmx.de Signed-off-by: Andi Shyti andi.shyti@kernel.org Link: https://lore.kernel.org/r/20250831100457.3114-4-jelonek.jonas@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i2c/busses/i2c-rtl9300.c | 15 +++------------ 1 file changed, 3 insertions(+), 12 deletions(-)
diff --git a/drivers/i2c/busses/i2c-rtl9300.c b/drivers/i2c/busses/i2c-rtl9300.c index 2b3e80aa1bdf..9e1f71fed0fe 100644 --- a/drivers/i2c/busses/i2c-rtl9300.c +++ b/drivers/i2c/busses/i2c-rtl9300.c @@ -225,15 +225,6 @@ static int rtl9300_i2c_smbus_xfer(struct i2c_adapter *adap, u16 addr, unsigned s }
switch (size) { - case I2C_SMBUS_QUICK: - ret = rtl9300_i2c_config_xfer(i2c, chan, addr, 0); - if (ret) - goto out_unlock; - ret = rtl9300_i2c_reg_addr_set(i2c, 0, 0); - if (ret) - goto out_unlock; - break; - case I2C_SMBUS_BYTE: if (read_write == I2C_SMBUS_WRITE) { ret = rtl9300_i2c_config_xfer(i2c, chan, addr, 0); @@ -315,9 +306,9 @@ static int rtl9300_i2c_smbus_xfer(struct i2c_adapter *adap, u16 addr, unsigned s
static u32 rtl9300_i2c_func(struct i2c_adapter *a) { - return I2C_FUNC_SMBUS_QUICK | I2C_FUNC_SMBUS_BYTE | - I2C_FUNC_SMBUS_BYTE_DATA | I2C_FUNC_SMBUS_WORD_DATA | - I2C_FUNC_SMBUS_BLOCK_DATA; + return I2C_FUNC_SMBUS_BYTE | I2C_FUNC_SMBUS_BYTE_DATA | + I2C_FUNC_SMBUS_WORD_DATA | I2C_FUNC_SMBUS_BLOCK_DATA | + I2C_FUNC_SMBUS_I2C_BLOCK; }
static const struct i2c_algorithm rtl9300_i2c_algo = {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiawen Wu jiawenwu@trustnetic.com
commit 157cf360c4a8751f7f511a71cc3a283b5d27f889 upstream.
Currently, when SRIOV is enabled, a PF with multiple queues receives all packets on queue 0. This is caused by an incorrect flag check, which prevents RSS from being enabled.
In fact, RSS is supported for the functions when SRIOV is enabled. Remove the flag check to fix it.
Fixes: c52d4b898901 ("net: libwx: Redesign flow when sriov is enabled") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu jiawenwu@trustnetic.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/A3B7449A08A044D0+20250904024322.87145-1-jiawenwu@tr... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/wangxun/libwx/wx_hw.c | 4 ---- 1 file changed, 4 deletions(-)
--- a/drivers/net/ethernet/wangxun/libwx/wx_hw.c +++ b/drivers/net/ethernet/wangxun/libwx/wx_hw.c @@ -2071,10 +2071,6 @@ static void wx_setup_mrqc(struct wx *wx) { u32 rss_field = 0;
- /* VT, and RSS do not coexist at the same time */ - if (test_bit(WX_FLAG_VMDQ_ENABLED, wx->flags)) - return; - /* Disable indicating checksum in descriptor, enables RSS hash */ wr32m(wx, WX_PSR_CTL, WX_PSR_CTL_PCSD, WX_PSR_CTL_PCSD);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleksij Rempel o.rempel@pengutronix.de
commit 5537a4679403423e0b49c95b619983a4583d69c5 upstream.
Drop phylink_{suspend,resume}() from ax88772 PM callbacks.
MDIO bus accesses have their own runtime-PM handling and will try to wake the device if it is suspended. Such wake attempts must not happen from PM callbacks while the device PM lock is held. Since phylink {sus|re}sume may trigger MDIO, it must not be called in PM context.
No extra phylink PM handling is required for this driver:
- .ndo_open/.ndo_stop control the phylink start/stop lifecycle.
- ethtool/phylib entry points run in process context, not PM.
- phylink MAC ops program the MAC on link changes after resume.
Fixes: e0bffe3e6894 ("net: asix: ax88772: migrate to phylink") Reported-by: Hubert Wiśniewski hubert.wisniewski.25632@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de Tested-by: Hubert Wiśniewski hubert.wisniewski.25632@gmail.com Tested-by: Xu Yang xu.yang_2@nxp.com Link: https://patch.msgid.link/20250908112619.2900723-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/asix_devices.c | 13 ------------- 1 file changed, 13 deletions(-)
--- a/drivers/net/usb/asix_devices.c +++ b/drivers/net/usb/asix_devices.c @@ -607,15 +607,8 @@ static const struct net_device_ops ax887
static void ax88772_suspend(struct usbnet *dev) { - struct asix_common_private *priv = dev->driver_priv; u16 medium;
- if (netif_running(dev->net)) { - rtnl_lock(); - phylink_suspend(priv->phylink, false); - rtnl_unlock(); - } - /* Stop MAC operation */ medium = asix_read_medium_status(dev, 1); medium &= ~AX_MEDIUM_RE; @@ -644,12 +637,6 @@ static void ax88772_resume(struct usbnet for (i = 0; i < 3; i++) if (!priv->reset(dev, 1)) break; - - if (netif_running(dev->net)) { - rtnl_lock(); - phylink_resume(priv->phylink); - rtnl_unlock(); - } }
static int asix_resume(struct usb_interface *intf)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit e0423541477dfb684fbc6e6b5386054bc650f264 upstream.
The intel_pstate driver manages CPU capacity changes itself and it does not need an update of the capacity of all CPUs in the system to be carried out after registering a PD.
Moreover, in some configurations (for instance, an SMT-capable hybrid x86 system booted with nosmt in the kernel command line) the em_check_capacity_update() call at the end of em_dev_register_perf_domain() always fails and reschedules itself to run once again in 1 s, so effectively it runs in vain every 1 s forever.
To address this, introduce a new variant of em_dev_register_perf_domain(), called em_dev_register_pd_no_update(), that does not invoke em_check_capacity_update(), and make intel_pstate use it instead of the original.
Fixes: 7b010f9b9061 ("cpufreq: intel_pstate: EAS support for hybrid platforms") Closes: https://lore.kernel.org/linux-pm/40212796-734c-4140-8a85-854f72b8144d@panix.... Reported-by: Kenneth R. Crudup kenny@panix.com Tested-by: Kenneth R. Crudup kenny@panix.com Cc: 6.16+ stable@vger.kernel.org # 6.16+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/cpufreq/intel_pstate.c | 4 ++-- include/linux/energy_model.h | 10 ++++++++++ kernel/power/energy_model.c | 29 +++++++++++++++++++++++++---- 3 files changed, 37 insertions(+), 6 deletions(-)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index f366d35c5840..0d5d283a5429 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -1034,8 +1034,8 @@ static bool hybrid_register_perf_domain(unsigned int cpu) if (!cpu_dev) return false;
- if (em_dev_register_perf_domain(cpu_dev, HYBRID_EM_STATE_COUNT, &cb, - cpumask_of(cpu), false)) + if (em_dev_register_pd_no_update(cpu_dev, HYBRID_EM_STATE_COUNT, &cb, + cpumask_of(cpu), false)) return false;
cpudata->pd_registered = true; diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 7fa1eb3cc823..61d50571ad88 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -171,6 +171,9 @@ int em_dev_update_perf_domain(struct device *dev, int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, const struct em_data_callback *cb, const cpumask_t *cpus, bool microwatts); +int em_dev_register_pd_no_update(struct device *dev, unsigned int nr_states, + const struct em_data_callback *cb, + const cpumask_t *cpus, bool microwatts); void em_dev_unregister_perf_domain(struct device *dev); struct em_perf_table *em_table_alloc(struct em_perf_domain *pd); void em_table_free(struct em_perf_table *table); @@ -350,6 +353,13 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, { return -EINVAL; } +static inline +int em_dev_register_pd_no_update(struct device *dev, unsigned int nr_states, + const struct em_data_callback *cb, + const cpumask_t *cpus, bool microwatts) +{ + return -EINVAL; +} static inline void em_dev_unregister_perf_domain(struct device *dev) { } diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index ea7995a25780..8df55397414a 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -552,6 +552,30 @@ EXPORT_SYMBOL_GPL(em_cpu_get); int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, const struct em_data_callback *cb, const cpumask_t *cpus, bool microwatts) +{ + int ret = em_dev_register_pd_no_update(dev, nr_states, cb, cpus, microwatts); + + if (_is_cpu_device(dev)) + em_check_capacity_update(); + + return ret; +} +EXPORT_SYMBOL_GPL(em_dev_register_perf_domain); + +/** + * em_dev_register_pd_no_update() - Register a perf domain for a device + * @dev : Device to register the PD for + * @nr_states : Number of performance states in the new PD + * @cb : Callback functions for populating the energy model + * @cpus : CPUs to include in the new PD (mandatory if @dev is a CPU device) + * @microwatts : Whether or not the power values in the EM will be in uW + * + * Like em_dev_register_perf_domain(), but does not trigger a CPU capacity + * update after registering the PD, even if @dev is a CPU device. + */ +int em_dev_register_pd_no_update(struct device *dev, unsigned int nr_states, + const struct em_data_callback *cb, + const cpumask_t *cpus, bool microwatts) { struct em_perf_table *em_table; unsigned long cap, prev_cap = 0; @@ -636,12 +660,9 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, unlock: mutex_unlock(&em_pd_mutex);
- if (_is_cpu_device(dev)) - em_check_capacity_update(); - return ret; } -EXPORT_SYMBOL_GPL(em_dev_register_perf_domain); +EXPORT_SYMBOL_GPL(em_dev_register_pd_no_update);
/** * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit 449c9c02537a146ac97ef962327a221e21c9cab3 upstream.
Commit 12ffc3b1513e ("PM: Restrict swap use to later in the suspend sequence") incorrectly removed a pm_restrict_gfp_mask() call from hibernation_snapshot(), so memory allocations involving swap are no longer prevented from being carried out in this code path, which may lead to serious breakage.
The symptoms of such breakage have become visible after adding a shrink_shmem_memory() call to hibernation_snapshot() in commit 2640e819474f ("PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()") which caused this problem to be much more likely to manifest itself.
However, since commit 2640e819474f was initially present in the DRM tree that did not include commit 12ffc3b1513e, the symptoms of this issue were not visible until merge commit 260f6f4fda93 ("Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel") that exposed it through an entirely reasonable merge conflict resolution.
Fixes: 12ffc3b1513e ("PM: Restrict swap use to later in the suspend sequence") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220555 Reported-by: Todd Brandt todd.e.brandt@linux.intel.com Tested-by: Todd Brandt todd.e.brandt@linux.intel.com Cc: 6.16+ stable@vger.kernel.org # 6.16+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Reviewed-by: Mario Limonciello (AMD) superm1@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/power/hibernate.c | 1 + 1 file changed, 1 insertion(+)
--- a/kernel/power/hibernate.c +++ b/kernel/power/hibernate.c @@ -423,6 +423,7 @@ int hibernation_snapshot(int platform_mo }
console_suspend_all(); + pm_restrict_gfp_mask();
error = dpm_suspend(PMSG_FREEZE);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
commit 2682e7a317504a9d81cbb397249d4299e84dfadd upstream.
The 130/1030 devices are really derivatives of 6030, with some small differences not pertaining to the MAC, so they must use the 6030 MAC config.
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220472 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220517 Fixes: 35ac275ebe0c ("wifi: iwlwifi: cfg: finish config split") Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://patch.msgid.link/20250909121728.8e4911f12528.I3aa7194012a4b584fbd5dd... Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/intel/iwlwifi/pcie/drv.c | 26 +++++++++---------- 1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/drv.c b/drivers/net/wireless/intel/iwlwifi/pcie/drv.c index f9e2095d6490..7e56e4ff7642 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/drv.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/drv.c @@ -124,13 +124,13 @@ VISIBLE_IF_IWLWIFI_KUNIT const struct pci_device_id iwl_hw_card_ids[] = { {IWL_PCI_DEVICE(0x0082, 0x1304, iwl6005_mac_cfg)},/* low 5GHz active */ {IWL_PCI_DEVICE(0x0082, 0x1305, iwl6005_mac_cfg)},/* high 5GHz active */
-/* 6x30 Series */ - {IWL_PCI_DEVICE(0x008A, 0x5305, iwl1000_mac_cfg)}, - {IWL_PCI_DEVICE(0x008A, 0x5307, iwl1000_mac_cfg)}, - {IWL_PCI_DEVICE(0x008A, 0x5325, iwl1000_mac_cfg)}, - {IWL_PCI_DEVICE(0x008A, 0x5327, iwl1000_mac_cfg)}, - {IWL_PCI_DEVICE(0x008B, 0x5315, iwl1000_mac_cfg)}, - {IWL_PCI_DEVICE(0x008B, 0x5317, iwl1000_mac_cfg)}, +/* 1030/6x30 Series */ + {IWL_PCI_DEVICE(0x008A, 0x5305, iwl6030_mac_cfg)}, + {IWL_PCI_DEVICE(0x008A, 0x5307, iwl6030_mac_cfg)}, + {IWL_PCI_DEVICE(0x008A, 0x5325, iwl6030_mac_cfg)}, + {IWL_PCI_DEVICE(0x008A, 0x5327, iwl6030_mac_cfg)}, + {IWL_PCI_DEVICE(0x008B, 0x5315, iwl6030_mac_cfg)}, + {IWL_PCI_DEVICE(0x008B, 0x5317, iwl6030_mac_cfg)}, {IWL_PCI_DEVICE(0x0090, 0x5211, iwl6030_mac_cfg)}, {IWL_PCI_DEVICE(0x0090, 0x5215, iwl6030_mac_cfg)}, {IWL_PCI_DEVICE(0x0090, 0x5216, iwl6030_mac_cfg)}, @@ -181,12 +181,12 @@ VISIBLE_IF_IWLWIFI_KUNIT const struct pci_device_id iwl_hw_card_ids[] = { {IWL_PCI_DEVICE(0x08AE, 0x1027, iwl1000_mac_cfg)},
/* 130 Series WiFi */ - {IWL_PCI_DEVICE(0x0896, 0x5005, iwl1000_mac_cfg)}, - {IWL_PCI_DEVICE(0x0896, 0x5007, iwl1000_mac_cfg)}, - {IWL_PCI_DEVICE(0x0897, 0x5015, iwl1000_mac_cfg)}, - {IWL_PCI_DEVICE(0x0897, 0x5017, iwl1000_mac_cfg)}, - {IWL_PCI_DEVICE(0x0896, 0x5025, iwl1000_mac_cfg)}, - {IWL_PCI_DEVICE(0x0896, 0x5027, iwl1000_mac_cfg)}, + {IWL_PCI_DEVICE(0x0896, 0x5005, iwl6030_mac_cfg)}, + {IWL_PCI_DEVICE(0x0896, 0x5007, iwl6030_mac_cfg)}, + {IWL_PCI_DEVICE(0x0897, 0x5015, iwl6030_mac_cfg)}, + {IWL_PCI_DEVICE(0x0897, 0x5017, iwl6030_mac_cfg)}, + {IWL_PCI_DEVICE(0x0896, 0x5025, iwl6030_mac_cfg)}, + {IWL_PCI_DEVICE(0x0896, 0x5027, iwl6030_mac_cfg)},
/* 2x00 Series */ {IWL_PCI_DEVICE(0x0890, 0x4022, iwl2000_mac_cfg)},
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Breno Leitao leitao@debian.org
commit e67f0bd05519012eaabaae68618ffc4ed30ab680 upstream.
The kexec_buf structure was previously declared without initialization. Commit bf454ec31add ("kexec_file: allow to place kexec_buf randomly") added a field that is always read but not consistently populated by all architectures. This uninitialized field will contain garbage.
This also triggers a UBSAN warning when the uninitialized data is accessed:
------------[ cut here ]------------
UBSAN: invalid-load in ./include/linux/kexec.h:210:10
load of value 252 is not a valid value for type '_Bool'
Zero-initializing kexec_buf at declaration ensures all fields are cleanly set, preventing future instances of uninitialized memory being used.
Link: https://lkml.kernel.org/r/20250827-kbuf_all-v1-3-1df9882bb01a@debian.org Fixes: bf454ec31add ("kexec_file: allow to place kexec_buf randomly") Signed-off-by: Breno Leitao leitao@debian.org Cc: Albert Ou aou@eecs.berkeley.edu Cc: Alexander Gordeev agordeev@linux.ibm.com Cc: Alexandre Ghiti alex@ghiti.fr Cc: Baoquan He bhe@redhat.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Christian Borntraeger borntraeger@linux.ibm.com Cc: Coiby Xu coxu@redhat.com Cc: Heiko Carstens hca@linux.ibm.com Cc: Palmer Dabbelt palmer@dabbelt.com Cc: Paul Walmsley paul.walmsley@sifive.com Cc: Sven Schnelle svens@linux.ibm.com Cc: Vasily Gorbik gor@linux.ibm.com Cc: Will Deacon will@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/kexec_elf.c | 2 +- arch/s390/kernel/kexec_image.c | 2 +- arch/s390/kernel/machine_kexec_file.c | 6 +++--- 3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/s390/kernel/kexec_elf.c b/arch/s390/kernel/kexec_elf.c index 4d364de43799..143e34a4eca5 100644 --- a/arch/s390/kernel/kexec_elf.c +++ b/arch/s390/kernel/kexec_elf.c @@ -16,7 +16,7 @@ static int kexec_file_add_kernel_elf(struct kimage *image, struct s390_load_data *data) { - struct kexec_buf buf; + struct kexec_buf buf = {}; const Elf_Ehdr *ehdr; const Elf_Phdr *phdr; Elf_Addr entry; diff --git a/arch/s390/kernel/kexec_image.c b/arch/s390/kernel/kexec_image.c index a32ce8bea745..9a439175723c 100644 --- a/arch/s390/kernel/kexec_image.c +++ b/arch/s390/kernel/kexec_image.c @@ -16,7 +16,7 @@ static int kexec_file_add_kernel_image(struct kimage *image, struct s390_load_data *data) { - struct kexec_buf buf; + struct kexec_buf buf = {};
buf.image = image;
diff --git a/arch/s390/kernel/machine_kexec_file.c b/arch/s390/kernel/machine_kexec_file.c index c2bac14dd668..a36d7311c668 100644 --- a/arch/s390/kernel/machine_kexec_file.c +++ b/arch/s390/kernel/machine_kexec_file.c @@ -129,7 +129,7 @@ static int kexec_file_update_purgatory(struct kimage *image, static int kexec_file_add_purgatory(struct kimage *image, struct s390_load_data *data) { - struct kexec_buf buf; + struct kexec_buf buf = {}; int ret;
buf.image = image; @@ -152,7 +152,7 @@ static int kexec_file_add_purgatory(struct kimage *image, static int kexec_file_add_initrd(struct kimage *image, struct s390_load_data *data) { - struct kexec_buf buf; + struct kexec_buf buf = {}; int ret;
buf.image = image; @@ -184,7 +184,7 @@ static int kexec_file_add_ipl_report(struct kimage *image, { __u32 *lc_ipl_parmblock_ptr; unsigned int len, ncerts; - struct kexec_buf buf; + struct kexec_buf buf = {}; unsigned long addr; void *ptr, *end; int ret;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.org
commit 90f7c100d2dd99d5cd5be950d553edd2647e6cc8 upstream.
The encryption layer can't handle the padding iovs, so flatten the compound request into a single buffer with required padding to prevent the server from dropping the connection when finding unaligned compound requests.
Fixes: bc925c1216f0 ("smb: client: improve compound padding in encryption") Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.org Reviewed-by: David Howells dhowells@redhat.com Cc: linux-cifs@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/smb2ops.c | 28 +++++++++++++++++++++++++--- 1 file changed, 25 insertions(+), 3 deletions(-)
--- a/fs/smb/client/smb2ops.c +++ b/fs/smb/client/smb2ops.c @@ -2640,13 +2640,35 @@ smb2_set_next_command(struct cifs_tcon * }
/* SMB headers in a compound are 8 byte aligned. */ - if (!IS_ALIGNED(len, 8)) { - num_padding = 8 - (len & 7); + if (IS_ALIGNED(len, 8)) + goto out; + + num_padding = 8 - (len & 7); + if (smb3_encryption_required(tcon)) { + int i; + + /* + * Flatten request into a single buffer with required padding as + * the encryption layer can't handle the padding iovs. + */ + for (i = 1; i < rqst->rq_nvec; i++) { + memcpy(rqst->rq_iov[0].iov_base + + rqst->rq_iov[0].iov_len, + rqst->rq_iov[i].iov_base, + rqst->rq_iov[i].iov_len); + rqst->rq_iov[0].iov_len += rqst->rq_iov[i].iov_len; + } + memset(rqst->rq_iov[0].iov_base + rqst->rq_iov[0].iov_len, + 0, num_padding); + rqst->rq_iov[0].iov_len += num_padding; + rqst->rq_nvec = 1; + } else { rqst->rq_iov[rqst->rq_nvec].iov_base = smb2_padding; rqst->rq_iov[rqst->rq_nvec].iov_len = num_padding; rqst->rq_nvec++; - len += num_padding; } + len += num_padding; +out: shdr->NextCommand = cpu_to_le32(len); }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.org
commit c5ea3065586d790ea5193a679b85585173d59866 upstream.
Rename of open files in SMB2+ has been broken for a very long time, resulting in data loss as the CIFS client would fail the rename(2) call with -ENOENT and then remove the target file.
Fix this by implementing ->rename_pending_delete() for SMB2+, which will rename busy files to random filenames (i.e. silly rename) during unlink(2) or rename(2), and then mark them as delete-on-close.
In addition, introduce a FIND_WR_NO_PENDING_DELETE flag to prevent open(2) from reusing open handles that have been marked as delete pending. Handle it in cifs_get_readable_path() as well.
Reported-by: Jean-Baptiste Denis jbdenis@pasteur.fr Closes: https://marc.info/?i=16aeb380-30d4-4551-9134-4e7d1dc833c0@pasteur.fr Reviewed-by: David Howells dhowells@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.org Cc: Frank Sorenson sorenson@redhat.com Cc: Olga Kornievskaia okorniev@redhat.com Cc: Benjamin Coddington bcodding@redhat.com Cc: Scott Mayhew smayhew@redhat.com Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/cifsglob.h | 13 ++ fs/smb/client/file.c | 18 +++- fs/smb/client/inode.c | 86 +++++++++++++++---- fs/smb/client/smb2glob.h | 3 fs/smb/client/smb2inode.c | 204 ++++++++++++++++++++++++++++++++++++---------- fs/smb/client/smb2ops.c | 4 fs/smb/client/smb2proto.h | 3 fs/smb/client/trace.h | 9 -- 8 files changed, 268 insertions(+), 72 deletions(-)
--- a/fs/smb/client/cifsglob.h +++ b/fs/smb/client/cifsglob.h @@ -87,7 +87,7 @@ #define SMB_INTERFACE_POLL_INTERVAL 600
/* maximum number of PDUs in one compound */ -#define MAX_COMPOUND 7 +#define MAX_COMPOUND 10
/* * Default number of credits to keep available for SMB3. @@ -1877,9 +1877,12 @@ static inline bool is_replayable_error(i
/* cifs_get_writable_file() flags */ -#define FIND_WR_ANY 0 -#define FIND_WR_FSUID_ONLY 1 -#define FIND_WR_WITH_DELETE 2 +enum cifs_writable_file_flags { + FIND_WR_ANY = 0U, + FIND_WR_FSUID_ONLY = (1U << 0), + FIND_WR_WITH_DELETE = (1U << 1), + FIND_WR_NO_PENDING_DELETE = (1U << 2), +};
#define MID_FREE 0 #define MID_REQUEST_ALLOCATED 1 @@ -2339,6 +2342,8 @@ struct smb2_compound_vars { struct kvec qi_iov; struct kvec io_iov[SMB2_IOCTL_IOV_SIZE]; struct kvec si_iov[SMB2_SET_INFO_IOV_SIZE]; + struct kvec unlink_iov[SMB2_SET_INFO_IOV_SIZE]; + struct kvec rename_iov[SMB2_SET_INFO_IOV_SIZE]; struct kvec close_iov; struct smb2_file_rename_info_hdr rename_info; struct smb2_file_link_info_hdr link_info; --- a/fs/smb/client/file.c +++ b/fs/smb/client/file.c @@ -998,7 +998,10 @@ int cifs_open(struct inode *inode, struc
/* Get the cached handle as SMB2 close is deferred */ if (OPEN_FMODE(file->f_flags) & FMODE_WRITE) { - rc = cifs_get_writable_path(tcon, full_path, FIND_WR_FSUID_ONLY, &cfile); + rc = cifs_get_writable_path(tcon, full_path, + FIND_WR_FSUID_ONLY | + FIND_WR_NO_PENDING_DELETE, + &cfile); } else { rc = cifs_get_readable_path(tcon, full_path, &cfile); } @@ -2530,6 +2533,9 @@ refind_writable: continue; if (with_delete && !(open_file->fid.access & DELETE)) continue; + if ((flags & FIND_WR_NO_PENDING_DELETE) && + open_file->status_file_deleted) + continue; if (OPEN_FMODE(open_file->f_flags) & FMODE_WRITE) { if (!open_file->invalidHandle) { /* found a good writable file */ @@ -2647,6 +2653,16 @@ cifs_get_readable_path(struct cifs_tcon spin_unlock(&tcon->open_file_lock); free_dentry_path(page); *ret_file = find_readable_file(cinode, 0); + if (*ret_file) { + spin_lock(&cinode->open_file_lock); + if ((*ret_file)->status_file_deleted) { + spin_unlock(&cinode->open_file_lock); + cifsFileInfo_put(*ret_file); + *ret_file = NULL; + } else { + spin_unlock(&cinode->open_file_lock); + } + } return *ret_file ? 0 : -ENOENT; }
--- a/fs/smb/client/inode.c +++ b/fs/smb/client/inode.c @@ -1931,7 +1931,7 @@ cifs_drop_nlink(struct inode *inode) * but will return the EACCES to the caller. Note that the VFS does not call * unlink on negative dentries currently. */ -int cifs_unlink(struct inode *dir, struct dentry *dentry) +static int __cifs_unlink(struct inode *dir, struct dentry *dentry, bool sillyrename) { int rc = 0; unsigned int xid; @@ -2003,7 +2003,11 @@ retry_std_delete: goto psx_del_no_retry; }
- rc = server->ops->unlink(xid, tcon, full_path, cifs_sb, dentry); + if (sillyrename || (server->vals->protocol_id > SMB10_PROT_ID && + d_is_positive(dentry) && d_count(dentry) > 2)) + rc = -EBUSY; + else + rc = server->ops->unlink(xid, tcon, full_path, cifs_sb, dentry);
psx_del_no_retry: if (!rc) { @@ -2071,6 +2075,11 @@ unlink_out: return rc; }
+int cifs_unlink(struct inode *dir, struct dentry *dentry) +{ + return __cifs_unlink(dir, dentry, false); +} + static int cifs_mkdir_qinfo(struct inode *parent, struct dentry *dentry, umode_t mode, const char *full_path, struct cifs_sb_info *cifs_sb, @@ -2358,14 +2367,16 @@ int cifs_rmdir(struct inode *inode, stru rc = server->ops->rmdir(xid, tcon, full_path, cifs_sb); cifs_put_tlink(tlink);
+ cifsInode = CIFS_I(d_inode(direntry)); + if (!rc) { + set_bit(CIFS_INO_DELETE_PENDING, &cifsInode->flags); spin_lock(&d_inode(direntry)->i_lock); i_size_write(d_inode(direntry), 0); clear_nlink(d_inode(direntry)); spin_unlock(&d_inode(direntry)->i_lock); }
- cifsInode = CIFS_I(d_inode(direntry)); /* force revalidate to go get info when needed */ cifsInode->time = 0;
@@ -2458,8 +2469,11 @@ cifs_do_rename(const unsigned int xid, s } #endif /* CONFIG_CIFS_ALLOW_INSECURE_LEGACY */ do_rename_exit: - if (rc == 0) + if (rc == 0) { d_move(from_dentry, to_dentry); + /* Force a new lookup */ + d_drop(from_dentry); + } cifs_put_tlink(tlink); return rc; } @@ -2470,6 +2484,7 @@ cifs_rename2(struct mnt_idmap *idmap, st struct dentry *target_dentry, unsigned int flags) { const char *from_name, *to_name; + struct TCP_Server_Info *server; void *page1, *page2; struct cifs_sb_info *cifs_sb; struct tcon_link *tlink; @@ -2505,6 +2520,7 @@ cifs_rename2(struct mnt_idmap *idmap, st if (IS_ERR(tlink)) return PTR_ERR(tlink); tcon = tlink_tcon(tlink); + server = tcon->ses->server;
page1 = alloc_dentry_path(); page2 = alloc_dentry_path(); @@ -2591,19 +2607,53 @@ cifs_rename2(struct mnt_idmap *idmap, st
unlink_target: #endif /* CONFIG_CIFS_ALLOW_INSECURE_LEGACY */ - - /* Try unlinking the target dentry if it's not negative */ - if (d_really_is_positive(target_dentry) && (rc == -EACCES || rc == -EEXIST)) { - if (d_is_dir(target_dentry)) - tmprc = cifs_rmdir(target_dir, target_dentry); - else - tmprc = cifs_unlink(target_dir, target_dentry); - if (tmprc) - goto cifs_rename_exit; - rc = cifs_do_rename(xid, source_dentry, from_name, - target_dentry, to_name); - if (!rc) - rehash = false; + if (d_really_is_positive(target_dentry)) { + if (!rc) { + struct inode *inode = d_inode(target_dentry); + /* + * Samba and ksmbd servers allow renaming a target + * directory that is open, so make sure to update + * ->i_nlink and then mark it as delete pending. + */ + if (S_ISDIR(inode->i_mode)) { + drop_cached_dir_by_name(xid, tcon, to_name, cifs_sb); + spin_lock(&inode->i_lock); + i_size_write(inode, 0); + clear_nlink(inode); + spin_unlock(&inode->i_lock); + set_bit(CIFS_INO_DELETE_PENDING, &CIFS_I(inode)->flags); + CIFS_I(inode)->time = 0; /* force reval */ + inode_set_ctime_current(inode); + inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode)); + } + } else if (rc == -EACCES || rc == -EEXIST) { + /* + * Rename failed, possibly due to a busy target. + * Retry it by unliking the target first. + */ + if (d_is_dir(target_dentry)) { + tmprc = cifs_rmdir(target_dir, target_dentry); + } else { + tmprc = __cifs_unlink(target_dir, target_dentry, + server->vals->protocol_id > SMB10_PROT_ID); + } + if (tmprc) { + /* + * Some servers will return STATUS_ACCESS_DENIED + * or STATUS_DIRECTORY_NOT_EMPTY when failing to + * rename a non-empty directory. Make sure to + * propagate the appropriate error back to + * userspace. + */ + if (tmprc == -EEXIST || tmprc == -ENOTEMPTY) + rc = tmprc; + goto cifs_rename_exit; + } + rc = cifs_do_rename(xid, source_dentry, from_name, + target_dentry, to_name); + if (!rc) + rehash = false; + } }
/* force revalidate to go get info when needed */ @@ -2629,6 +2679,8 @@ cifs_dentry_needs_reval(struct dentry *d struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb); struct cached_fid *cfid = NULL;
+ if (test_bit(CIFS_INO_DELETE_PENDING, &cifs_i->flags)) + return false; if (cifs_i->time == 0) return true;
--- a/fs/smb/client/smb2glob.h +++ b/fs/smb/client/smb2glob.h @@ -30,10 +30,9 @@ enum smb2_compound_ops { SMB2_OP_QUERY_DIR, SMB2_OP_MKDIR, SMB2_OP_RENAME, - SMB2_OP_DELETE, SMB2_OP_HARDLINK, SMB2_OP_SET_EOF, - SMB2_OP_RMDIR, + SMB2_OP_UNLINK, SMB2_OP_POSIX_QUERY_INFO, SMB2_OP_SET_REPARSE, SMB2_OP_GET_REPARSE, --- a/fs/smb/client/smb2inode.c +++ b/fs/smb/client/smb2inode.c @@ -346,9 +346,6 @@ replay_again: trace_smb3_posix_query_info_compound_enter(xid, tcon->tid, ses->Suid, full_path); break; - case SMB2_OP_DELETE: - trace_smb3_delete_enter(xid, tcon->tid, ses->Suid, full_path); - break; case SMB2_OP_MKDIR: /* * Directories are created through parameters in the @@ -356,23 +353,40 @@ replay_again: */ trace_smb3_mkdir_enter(xid, tcon->tid, ses->Suid, full_path); break; - case SMB2_OP_RMDIR: - rqst[num_rqst].rq_iov = &vars->si_iov[0]; + case SMB2_OP_UNLINK: + rqst[num_rqst].rq_iov = vars->unlink_iov; rqst[num_rqst].rq_nvec = 1;
size[0] = 1; /* sizeof __u8 See MS-FSCC section 2.4.11 */ data[0] = &delete_pending[0];
- rc = SMB2_set_info_init(tcon, server, - &rqst[num_rqst], COMPOUND_FID, - COMPOUND_FID, current->tgid, - FILE_DISPOSITION_INFORMATION, - SMB2_O_INFO_FILE, 0, data, size); - if (rc) + if (cfile) { + rc = SMB2_set_info_init(tcon, server, + &rqst[num_rqst], + cfile->fid.persistent_fid, + cfile->fid.volatile_fid, + current->tgid, + FILE_DISPOSITION_INFORMATION, + SMB2_O_INFO_FILE, 0, + data, size); + } else { + rc = SMB2_set_info_init(tcon, server, + &rqst[num_rqst], + COMPOUND_FID, + COMPOUND_FID, + current->tgid, + FILE_DISPOSITION_INFORMATION, + SMB2_O_INFO_FILE, 0, + data, size); + } + if (!rc && (!cfile || num_rqst > 1)) { + smb2_set_next_command(tcon, &rqst[num_rqst]); + smb2_set_related(&rqst[num_rqst]); + } else if (rc) { goto finished; - smb2_set_next_command(tcon, &rqst[num_rqst]); - smb2_set_related(&rqst[num_rqst++]); - trace_smb3_rmdir_enter(xid, tcon->tid, ses->Suid, full_path); + } + num_rqst++; + trace_smb3_unlink_enter(xid, tcon->tid, ses->Suid, full_path); break; case SMB2_OP_SET_EOF: rqst[num_rqst].rq_iov = &vars->si_iov[0]; @@ -442,7 +456,7 @@ replay_again: ses->Suid, full_path); break; case SMB2_OP_RENAME: - rqst[num_rqst].rq_iov = &vars->si_iov[0]; + rqst[num_rqst].rq_iov = vars->rename_iov; rqst[num_rqst].rq_nvec = 2;
len = in_iov[i].iov_len; @@ -732,19 +746,6 @@ finished: trace_smb3_posix_query_info_compound_done(xid, tcon->tid, ses->Suid); break; - case SMB2_OP_DELETE: - if (rc) - trace_smb3_delete_err(xid, tcon->tid, ses->Suid, rc); - else { - /* - * If dentry (hence, inode) is NULL, lease break is going to - * take care of degrading leases on handles for deleted files. - */ - if (inode) - cifs_mark_open_handles_for_deleted_file(inode, full_path); - trace_smb3_delete_done(xid, tcon->tid, ses->Suid); - } - break; case SMB2_OP_MKDIR: if (rc) trace_smb3_mkdir_err(xid, tcon->tid, ses->Suid, rc); @@ -765,11 +766,11 @@ finished: trace_smb3_rename_done(xid, tcon->tid, ses->Suid); SMB2_set_info_free(&rqst[num_rqst++]); break; - case SMB2_OP_RMDIR: - if (rc) - trace_smb3_rmdir_err(xid, tcon->tid, ses->Suid, rc); + case SMB2_OP_UNLINK: + if (!rc) + trace_smb3_unlink_done(xid, tcon->tid, ses->Suid); else - trace_smb3_rmdir_done(xid, tcon->tid, ses->Suid); + trace_smb3_unlink_err(xid, tcon->tid, ses->Suid, rc); SMB2_set_info_free(&rqst[num_rqst++]); break; case SMB2_OP_SET_EOF: @@ -1165,7 +1166,7 @@ smb2_rmdir(const unsigned int xid, struc FILE_OPEN, CREATE_NOT_FILE, ACL_NO_MODE); return smb2_compound_op(xid, tcon, cifs_sb, name, &oparms, NULL, - &(int){SMB2_OP_RMDIR}, 1, + &(int){SMB2_OP_UNLINK}, 1, NULL, NULL, NULL, NULL); }
@@ -1174,20 +1175,29 @@ smb2_unlink(const unsigned int xid, stru struct cifs_sb_info *cifs_sb, struct dentry *dentry) { struct cifs_open_parms oparms; + struct inode *inode = NULL; + int rc;
- oparms = CIFS_OPARMS(cifs_sb, tcon, name, - DELETE, FILE_OPEN, - CREATE_DELETE_ON_CLOSE | OPEN_REPARSE_POINT, - ACL_NO_MODE); - int rc = smb2_compound_op(xid, tcon, cifs_sb, name, &oparms, - NULL, &(int){SMB2_OP_DELETE}, 1, - NULL, NULL, NULL, dentry); + if (dentry) + inode = d_inode(dentry); + + oparms = CIFS_OPARMS(cifs_sb, tcon, name, DELETE, + FILE_OPEN, OPEN_REPARSE_POINT, ACL_NO_MODE); + rc = smb2_compound_op(xid, tcon, cifs_sb, name, &oparms, + NULL, &(int){SMB2_OP_UNLINK}, + 1, NULL, NULL, NULL, dentry); if (rc == -EINVAL) { cifs_dbg(FYI, "invalid lease key, resending request without lease"); rc = smb2_compound_op(xid, tcon, cifs_sb, name, &oparms, - NULL, &(int){SMB2_OP_DELETE}, 1, - NULL, NULL, NULL, NULL); + NULL, &(int){SMB2_OP_UNLINK}, + 1, NULL, NULL, NULL, NULL); } + /* + * If dentry (hence, inode) is NULL, lease break is going to + * take care of degrading leases on handles for deleted files. + */ + if (!rc && inode) + cifs_mark_open_handles_for_deleted_file(inode, name); return rc; }
@@ -1441,3 +1451,113 @@ out: cifs_free_open_info(&data); return rc; } + +static inline __le16 *utf16_smb2_path(struct cifs_sb_info *cifs_sb, + const char *name, size_t namelen) +{ + int len; + + if (*name == '\\' || + (cifs_sb_master_tlink(cifs_sb) && + cifs_sb_master_tcon(cifs_sb)->posix_extensions && *name == '/')) + name++; + return cifs_strndup_to_utf16(name, namelen, &len, + cifs_sb->local_nls, + cifs_remap(cifs_sb)); +} + +int smb2_rename_pending_delete(const char *full_path, + struct dentry *dentry, + const unsigned int xid) +{ + struct cifs_sb_info *cifs_sb = CIFS_SB(d_inode(dentry)->i_sb); + struct cifsInodeInfo *cinode = CIFS_I(d_inode(dentry)); + __le16 *utf16_path __free(kfree) = NULL; + __u32 co = file_create_options(dentry); + int cmds[] = { + SMB2_OP_SET_INFO, + SMB2_OP_RENAME, + SMB2_OP_UNLINK, + }; + const int num_cmds = ARRAY_SIZE(cmds); + char *to_name __free(kfree) = NULL; + __u32 attrs = cinode->cifsAttrs; + struct cifs_open_parms oparms; + static atomic_t sillycounter; + struct cifsFileInfo *cfile; + struct tcon_link *tlink; + struct cifs_tcon *tcon; + struct kvec iov[2]; + const char *ppath; + void *page; + size_t len; + int rc; + + tlink = cifs_sb_tlink(cifs_sb); + if (IS_ERR(tlink)) + return PTR_ERR(tlink); + tcon = tlink_tcon(tlink); + + page = alloc_dentry_path(); + + ppath = build_path_from_dentry(dentry->d_parent, page); + if (IS_ERR(ppath)) { + rc = PTR_ERR(ppath); + goto out; + } + + len = strlen(ppath) + strlen("/.__smb1234") + 1; + to_name = kmalloc(len, GFP_KERNEL); + if (!to_name) { + rc = -ENOMEM; + goto out; + } + + scnprintf(to_name, len, "%s%c.__smb%04X", ppath, CIFS_DIR_SEP(cifs_sb), + atomic_inc_return(&sillycounter) & 0xffff); + + utf16_path = utf16_smb2_path(cifs_sb, to_name, len); + if (!utf16_path) { + rc = -ENOMEM; + goto out; + } + + drop_cached_dir_by_name(xid, tcon, full_path, cifs_sb); + oparms = CIFS_OPARMS(cifs_sb, tcon, full_path, + DELETE | FILE_WRITE_ATTRIBUTES, + FILE_OPEN, co, ACL_NO_MODE); + + attrs &= ~ATTR_READONLY; + if (!attrs) + attrs = ATTR_NORMAL; + if (d_inode(dentry)->i_nlink <= 1) + attrs |= ATTR_HIDDEN; + iov[0].iov_base = &(FILE_BASIC_INFO) { + .Attributes = cpu_to_le32(attrs), + }; + iov[0].iov_len = sizeof(FILE_BASIC_INFO); + iov[1].iov_base = utf16_path; + iov[1].iov_len = sizeof(*utf16_path) * UniStrlen((wchar_t *)utf16_path); + + cifs_get_writable_path(tcon, full_path, FIND_WR_WITH_DELETE, &cfile); + rc = smb2_compound_op(xid, tcon, cifs_sb, full_path, &oparms, iov, + cmds, num_cmds, cfile, NULL, NULL, dentry); + if (rc == -EINVAL) { + cifs_dbg(FYI, "invalid lease key, resending request without lease\n"); + cifs_get_writable_path(tcon, full_path, + FIND_WR_WITH_DELETE, &cfile); + rc = smb2_compound_op(xid, tcon, cifs_sb, full_path, &oparms, iov, + cmds, num_cmds, cfile, NULL, NULL, NULL); + } + if (!rc) { + set_bit(CIFS_INO_DELETE_PENDING, &cinode->flags); + } else { + cifs_tcon_dbg(FYI, "%s: failed to rename '%s' to '%s': %d\n", + __func__, full_path, to_name, rc); + rc = -EIO; + } +out: + cifs_put_tlink(tlink); + free_dentry_path(page); + return rc; +} --- a/fs/smb/client/smb2ops.c +++ b/fs/smb/client/smb2ops.c @@ -5399,6 +5399,7 @@ struct smb_version_operations smb20_oper .llseek = smb3_llseek, .is_status_io_timeout = smb2_is_status_io_timeout, .is_network_name_deleted = smb2_is_network_name_deleted, + .rename_pending_delete = smb2_rename_pending_delete, }; #endif /* CIFS_ALLOW_INSECURE_LEGACY */
@@ -5504,6 +5505,7 @@ struct smb_version_operations smb21_oper .llseek = smb3_llseek, .is_status_io_timeout = smb2_is_status_io_timeout, .is_network_name_deleted = smb2_is_network_name_deleted, + .rename_pending_delete = smb2_rename_pending_delete, };
struct smb_version_operations smb30_operations = { @@ -5620,6 +5622,7 @@ struct smb_version_operations smb30_oper .llseek = smb3_llseek, .is_status_io_timeout = smb2_is_status_io_timeout, .is_network_name_deleted = smb2_is_network_name_deleted, + .rename_pending_delete = smb2_rename_pending_delete, };
struct smb_version_operations smb311_operations = { @@ -5736,6 +5739,7 @@ struct smb_version_operations smb311_ope .llseek = smb3_llseek, .is_status_io_timeout = smb2_is_status_io_timeout, .is_network_name_deleted = smb2_is_network_name_deleted, + .rename_pending_delete = smb2_rename_pending_delete, };
#ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY --- a/fs/smb/client/smb2proto.h +++ b/fs/smb/client/smb2proto.h @@ -320,5 +320,8 @@ int smb2_create_reparse_symlink(const un int smb2_make_nfs_node(unsigned int xid, struct inode *inode, struct dentry *dentry, struct cifs_tcon *tcon, const char *full_path, umode_t mode, dev_t dev); +int smb2_rename_pending_delete(const char *full_path, + struct dentry *dentry, + const unsigned int xid);
#endif /* _SMB2PROTO_H */ --- a/fs/smb/client/trace.h +++ b/fs/smb/client/trace.h @@ -669,13 +669,12 @@ DEFINE_SMB3_INF_COMPOUND_ENTER_EVENT(que DEFINE_SMB3_INF_COMPOUND_ENTER_EVENT(posix_query_info_compound_enter); DEFINE_SMB3_INF_COMPOUND_ENTER_EVENT(hardlink_enter); DEFINE_SMB3_INF_COMPOUND_ENTER_EVENT(rename_enter); -DEFINE_SMB3_INF_COMPOUND_ENTER_EVENT(rmdir_enter); +DEFINE_SMB3_INF_COMPOUND_ENTER_EVENT(unlink_enter); DEFINE_SMB3_INF_COMPOUND_ENTER_EVENT(set_eof_enter); DEFINE_SMB3_INF_COMPOUND_ENTER_EVENT(set_info_compound_enter); DEFINE_SMB3_INF_COMPOUND_ENTER_EVENT(set_reparse_compound_enter); DEFINE_SMB3_INF_COMPOUND_ENTER_EVENT(get_reparse_compound_enter); DEFINE_SMB3_INF_COMPOUND_ENTER_EVENT(query_wsl_ea_compound_enter); -DEFINE_SMB3_INF_COMPOUND_ENTER_EVENT(delete_enter); DEFINE_SMB3_INF_COMPOUND_ENTER_EVENT(mkdir_enter); DEFINE_SMB3_INF_COMPOUND_ENTER_EVENT(tdis_enter); DEFINE_SMB3_INF_COMPOUND_ENTER_EVENT(mknod_enter); @@ -710,13 +709,12 @@ DEFINE_SMB3_INF_COMPOUND_DONE_EVENT(quer DEFINE_SMB3_INF_COMPOUND_DONE_EVENT(posix_query_info_compound_done); DEFINE_SMB3_INF_COMPOUND_DONE_EVENT(hardlink_done); DEFINE_SMB3_INF_COMPOUND_DONE_EVENT(rename_done); -DEFINE_SMB3_INF_COMPOUND_DONE_EVENT(rmdir_done); +DEFINE_SMB3_INF_COMPOUND_DONE_EVENT(unlink_done); DEFINE_SMB3_INF_COMPOUND_DONE_EVENT(set_eof_done); DEFINE_SMB3_INF_COMPOUND_DONE_EVENT(set_info_compound_done); DEFINE_SMB3_INF_COMPOUND_DONE_EVENT(set_reparse_compound_done); DEFINE_SMB3_INF_COMPOUND_DONE_EVENT(get_reparse_compound_done); DEFINE_SMB3_INF_COMPOUND_DONE_EVENT(query_wsl_ea_compound_done); -DEFINE_SMB3_INF_COMPOUND_DONE_EVENT(delete_done); DEFINE_SMB3_INF_COMPOUND_DONE_EVENT(mkdir_done); DEFINE_SMB3_INF_COMPOUND_DONE_EVENT(tdis_done); DEFINE_SMB3_INF_COMPOUND_DONE_EVENT(mknod_done); @@ -756,14 +754,13 @@ DEFINE_SMB3_INF_COMPOUND_ERR_EVENT(query DEFINE_SMB3_INF_COMPOUND_ERR_EVENT(posix_query_info_compound_err); DEFINE_SMB3_INF_COMPOUND_ERR_EVENT(hardlink_err); DEFINE_SMB3_INF_COMPOUND_ERR_EVENT(rename_err); -DEFINE_SMB3_INF_COMPOUND_ERR_EVENT(rmdir_err); +DEFINE_SMB3_INF_COMPOUND_ERR_EVENT(unlink_err); DEFINE_SMB3_INF_COMPOUND_ERR_EVENT(set_eof_err); DEFINE_SMB3_INF_COMPOUND_ERR_EVENT(set_info_compound_err); DEFINE_SMB3_INF_COMPOUND_ERR_EVENT(set_reparse_compound_err); DEFINE_SMB3_INF_COMPOUND_ERR_EVENT(get_reparse_compound_err); DEFINE_SMB3_INF_COMPOUND_ERR_EVENT(query_wsl_ea_compound_err); DEFINE_SMB3_INF_COMPOUND_ERR_EVENT(mkdir_err); -DEFINE_SMB3_INF_COMPOUND_ERR_EVENT(delete_err); DEFINE_SMB3_INF_COMPOUND_ERR_EVENT(tdis_err); DEFINE_SMB3_INF_COMPOUND_ERR_EVENT(mknod_err);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Sverdlin alexander.sverdlin@siemens.com
commit fd779eac2d659668be4d3dbdac0710afd5d6db12 upstream.
Having a setup time of 0 violates tAR and tCLR of some chips; for instance, the TOSHIBA TC58NVG2S3ETAI0 cannot be detected successfully (the first ID byte is read duplicated, i.e. 98 98 dc 90 15 76 14 03 instead of 98 dc 90 15 76 ...).
Atmel Application Notes postulated 1 cycle NRD_SETUP without explanation [1], but it looks more appropriate to just calculate setup time properly.
[1] Link: https://ww1.microchip.com/downloads/aemDocuments/documents/MPU32/Application...
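For a concrete feel of the change, here is a minimal userspace sketch of the new cycle calculation; the MCK frequency and the tAR/tCLR minimums are assumed values for illustration, not taken from the patch or a datasheet:

#include <stdio.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
	unsigned long mckperiodps = 7519;  /* assumed: MCK = 133 MHz */
	unsigned long tAR_min = 10000;     /* assumed: 10 ns */
	unsigned long tCLR_min = 10000;    /* assumed: 10 ns */
	unsigned long timeps = tAR_min > tCLR_min ? tAR_min : tCLR_min;

	/* NRD_SETUP = max(tAR, tCLR), rounded up to whole MCK cycles */
	printf("NRD_SETUP = %lu cycles\n", DIV_ROUND_UP(timeps, mckperiodps));
	return 0; /* prints 2; the old hardcoded 0 violates both timings */
}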
Cc: stable@vger.kernel.org Fixes: f9ce2eddf176 ("mtd: nand: atmel: Add ->setup_data_interface() hooks") Signed-off-by: Alexander Sverdlin alexander.sverdlin@siemens.com Tested-by: Alexander Dahl ada@thorsis.com Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/atmel/nand-controller.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-)
--- a/drivers/mtd/nand/raw/atmel/nand-controller.c +++ b/drivers/mtd/nand/raw/atmel/nand-controller.c @@ -1378,13 +1378,23 @@ static int atmel_smc_nand_prepare_smccon return ret;
/* + * Read setup timing depends on the operation done on the NAND: + * + * NRD_SETUP = max(tAR, tCLR) + */ + timeps = max(conf->timings.sdr.tAR_min, conf->timings.sdr.tCLR_min); + ncycles = DIV_ROUND_UP(timeps, mckperiodps); + totalcycles += ncycles; + ret = atmel_smc_cs_conf_set_setup(smcconf, ATMEL_SMC_NRD_SHIFT, ncycles); + if (ret) + return ret; + + /* * The read cycle timing is directly matching tRC, but is also * dependent on the setup and hold timings we calculated earlier, * which gives: * - * NRD_CYCLE = max(tRC, NRD_PULSE + NRD_HOLD) - * - * NRD_SETUP is always 0. + * NRD_CYCLE = max(tRC, NRD_SETUP + NRD_PULSE + NRD_HOLD) */ ncycles = DIV_ROUND_UP(conf->timings.sdr.tRC_min, mckperiodps); ncycles = max(totalcycles, ncycles);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe Kerello christophe.kerello@foss.st.com
commit 513c40e59d5a414ab763a9c84797534b5e8c208d upstream.
Avoid below overlapping mappings by using a contiguous non-cacheable buffer.
[ 4.077708] DMA-API: stm32_fmc2_nfc 48810000.nand-controller: cacheline tracking EEXIST, overlapping mappings aren't supported [ 4.089103] WARNING: CPU: 1 PID: 44 at kernel/dma/debug.c:568 add_dma_entry+0x23c/0x300 [ 4.097071] Modules linked in: [ 4.100101] CPU: 1 PID: 44 Comm: kworker/u4:2 Not tainted 6.1.82 #1 [ 4.106346] Hardware name: STMicroelectronics STM32MP257F VALID1 SNOR / MB1704 (LPDDR4 Power discrete) + MB1703 + MB1708 (SNOR MB1730) (DT) [ 4.118824] Workqueue: events_unbound deferred_probe_work_func [ 4.124674] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 4.131624] pc : add_dma_entry+0x23c/0x300 [ 4.135658] lr : add_dma_entry+0x23c/0x300 [ 4.139792] sp : ffff800009dbb490 [ 4.143016] x29: ffff800009dbb4a0 x28: 0000000004008022 x27: ffff8000098a6000 [ 4.150174] x26: 0000000000000000 x25: ffff8000099e7000 x24: ffff8000099e7de8 [ 4.157231] x23: 00000000ffffffff x22: 0000000000000000 x21: ffff8000098a6a20 [ 4.164388] x20: ffff000080964180 x19: ffff800009819ba0 x18: 0000000000000006 [ 4.171545] x17: 6361727420656e69 x16: 6c6568636163203a x15: 72656c6c6f72746e [ 4.178602] x14: 6f632d646e616e2e x13: ffff800009832f58 x12: 00000000000004ec [ 4.185759] x11: 00000000000001a4 x10: ffff80000988af58 x9 : ffff800009832f58 [ 4.192916] x8 : 00000000ffffefff x7 : ffff80000988af58 x6 : 80000000fffff000 [ 4.199972] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000 [ 4.207128] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000812d2c40 [ 4.214185] Call trace: [ 4.216605] add_dma_entry+0x23c/0x300 [ 4.220338] debug_dma_map_sg+0x198/0x350 [ 4.224373] __dma_map_sg_attrs+0xa0/0x110 [ 4.228411] dma_map_sg_attrs+0x10/0x2c [ 4.232247] stm32_fmc2_nfc_xfer.isra.0+0x1c8/0x3fc [ 4.237088] stm32_fmc2_nfc_seq_read_page+0xc8/0x174 [ 4.242127] nand_read_oob+0x1d4/0x8e0 [ 4.245861] mtd_read_oob_std+0x58/0x84 [ 4.249596] mtd_read_oob+0x90/0x150 [ 4.253231] mtd_read+0x68/0xac
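For context, the warning above comes from dma-debug's cacheline tracking. A minimal sketch of the offending pattern (hypothetical code, not the driver's): two streaming mappings over slices of one kmalloc'ed buffer can share a cacheline, which dma-debug rejects. Allocating the area once with dmam_alloc_coherent(), as this patch does, avoids streaming mappings for the ECC buffer altogether:

/* Problematic (sketch): slices of one buffer may share a cacheline */
u8 *buf = kmalloc(64, GFP_KERNEL);
dma_addr_t lo = dma_map_single(dev, buf, 32, DMA_FROM_DEVICE);
dma_addr_t hi = dma_map_single(dev, buf + 32, 32, DMA_FROM_DEVICE);
/* dma-debug: "cacheline tracking EEXIST, overlapping mappings ..." */

/* The fix: one coherent, non-cacheable allocation with a fixed
 * bus address, never passed through the streaming API */
dma_addr_t ecc_addr;
u8 *ecc_buf = dmam_alloc_coherent(dev, FMC2_MAX_ECC_BUF_LEN,
				  &ecc_addr, GFP_KERNEL);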
Signed-off-by: Christophe Kerello christophe.kerello@foss.st.com Cc: stable@vger.kernel.org Fixes: 2cd457f328c1 ("mtd: rawnand: stm32_fmc2: add STM32 FMC2 NAND flash controller driver") Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/stm32_fmc2_nand.c | 28 +++++++++------------------- 1 file changed, 9 insertions(+), 19 deletions(-)
--- a/drivers/mtd/nand/raw/stm32_fmc2_nand.c +++ b/drivers/mtd/nand/raw/stm32_fmc2_nand.c @@ -272,6 +272,7 @@ struct stm32_fmc2_nfc { struct sg_table dma_data_sg; struct sg_table dma_ecc_sg; u8 *ecc_buf; + dma_addr_t dma_ecc_addr; int dma_ecc_len; u32 tx_dma_max_burst; u32 rx_dma_max_burst; @@ -902,17 +903,10 @@ static int stm32_fmc2_nfc_xfer(struct na
if (!write_data && !raw) { /* Configure DMA ECC status */ - p = nfc->ecc_buf; for_each_sg(nfc->dma_ecc_sg.sgl, sg, eccsteps, s) { - sg_set_buf(sg, p, nfc->dma_ecc_len); - p += nfc->dma_ecc_len; - } - - ret = dma_map_sg(nfc->dev, nfc->dma_ecc_sg.sgl, - eccsteps, dma_data_dir); - if (!ret) { - ret = -EIO; - goto err_unmap_data; + sg_dma_address(sg) = nfc->dma_ecc_addr + + s * nfc->dma_ecc_len; + sg_dma_len(sg) = nfc->dma_ecc_len; }
desc_ecc = dmaengine_prep_slave_sg(nfc->dma_ecc_ch, @@ -921,7 +915,7 @@ static int stm32_fmc2_nfc_xfer(struct na DMA_PREP_INTERRUPT); if (!desc_ecc) { ret = -ENOMEM; - goto err_unmap_ecc; + goto err_unmap_data; }
reinit_completion(&nfc->dma_ecc_complete); @@ -929,7 +923,7 @@ static int stm32_fmc2_nfc_xfer(struct na desc_ecc->callback_param = &nfc->dma_ecc_complete; ret = dma_submit_error(dmaengine_submit(desc_ecc)); if (ret) - goto err_unmap_ecc; + goto err_unmap_data;
dma_async_issue_pending(nfc->dma_ecc_ch); } @@ -949,7 +943,7 @@ static int stm32_fmc2_nfc_xfer(struct na if (!write_data && !raw) dmaengine_terminate_all(nfc->dma_ecc_ch); ret = -ETIMEDOUT; - goto err_unmap_ecc; + goto err_unmap_data; }
/* Wait DMA data transfer completion */ @@ -969,11 +963,6 @@ static int stm32_fmc2_nfc_xfer(struct na } }
-err_unmap_ecc: - if (!write_data && !raw) - dma_unmap_sg(nfc->dev, nfc->dma_ecc_sg.sgl, - eccsteps, dma_data_dir); - err_unmap_data: dma_unmap_sg(nfc->dev, nfc->dma_data_sg.sgl, eccsteps, dma_data_dir);
@@ -1610,7 +1599,8 @@ static int stm32_fmc2_nfc_dma_setup(stru return ret;
/* Allocate a buffer to store ECC status registers */ - nfc->ecc_buf = devm_kzalloc(nfc->dev, FMC2_MAX_ECC_BUF_LEN, GFP_KERNEL); + nfc->ecc_buf = dmam_alloc_coherent(nfc->dev, FMC2_MAX_ECC_BUF_LEN, + &nfc->dma_ecc_addr, GFP_KERNEL); if (!nfc->ecc_buf) return -ENOMEM;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe Kerello christophe.kerello@foss.st.com
commit 811c0da4542df3c065f6cb843ced68780e27bb44 upstream.
If an OOB write is requested during a data write, the ECC is currently lost. Avoid this issue by writing only in the free spare area. This issue has been seen with a YAFFS2 file system.
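In outline, the fix queries the OOB layout for its free region and writes only there (condensed from the diff below; mtd_ooblayout_free() is the standard MTD helper for this):

struct mtd_oob_region oob_free;

/* Bytes of the spare area not reserved for ECC */
mtd_ooblayout_free(mtd, 0, &oob_free);

/* Write only the free bytes, leaving controller-managed ECC intact */
ret = nand_change_write_column_op(chip,
				  mtd->writesize + oob_free.offset,
				  chip->oob_poi + oob_free.offset,
				  oob_free.length, false);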
Signed-off-by: Christophe Kerello christophe.kerello@foss.st.com Cc: stable@vger.kernel.org Fixes: 2cd457f328c1 ("mtd: rawnand: stm32_fmc2: add STM32 FMC2 NAND flash controller driver") Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/stm32_fmc2_nand.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)
--- a/drivers/mtd/nand/raw/stm32_fmc2_nand.c +++ b/drivers/mtd/nand/raw/stm32_fmc2_nand.c @@ -985,9 +985,21 @@ static int stm32_fmc2_nfc_seq_write(stru
/* Write oob */ if (oob_required) { - ret = nand_change_write_column_op(chip, mtd->writesize, - chip->oob_poi, mtd->oobsize, - false); + unsigned int offset_in_page = mtd->writesize; + const void *buf = chip->oob_poi; + unsigned int len = mtd->oobsize; + + if (!raw) { + struct mtd_oob_region oob_free; + + mtd_ooblayout_free(mtd, 0, &oob_free); + offset_in_page += oob_free.offset; + buf += oob_free.offset; + len = oob_free.length; + } + + ret = nand_change_write_column_op(chip, offset_in_page, + buf, len, false); if (ret) return ret; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
commit e9c8da670e749f7dedc53e3af54a87b041918092 upstream.
We do not support passthrough operations other than read/write on regular files, so allowing non-regular backing files makes no sense.
Fixes: efad7153bf93 ("fuse: allow O_PATH fd for FUSE_DEV_IOC_BACKING_OPEN") Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Bernd Schubert bschubert@ddn.com Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/fuse/passthrough.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/fs/fuse/passthrough.c +++ b/fs/fuse/passthrough.c @@ -237,6 +237,11 @@ int fuse_backing_open(struct fuse_conn * if (!file) goto out;
+ /* read/write/splice/mmap passthrough only relevant for regular files */ + res = d_is_dir(file->f_path.dentry) ? -EISDIR : -EINVAL; + if (!d_is_reg(file->f_path.dentry)) + goto out_fput; + backing_sb = file_inode(file)->i_sb; res = -ELOOP; if (backing_sb->s_stack_depth >= fc->max_stack_depth)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miklos Szeredi mszeredi@redhat.com
commit e5203209b3935041dac541bc5b37efb44220cc0b upstream.
Just like write(), copy_file_range() should check if the return value is less or equal to the requested number of bytes.
Reported-by: Chunsheng Luo luochunsheng@ustc.edu Closes: https://lore.kernel.org/all/20250807062425.694-1-luochunsheng@ustc.edu/ Fixes: 88bc7d5097a1 ("fuse: add support for copy_file_range()") Cc: stable@vger.kernel.org # v4.20 Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/fuse/file.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -3079,6 +3079,9 @@ static ssize_t __fuse_copy_file_range(st fc->no_copy_file_range = 1; err = -EOPNOTSUPP; } + if (!err && outarg.size > len) + err = -EIO; + if (err) goto out;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miklos Szeredi mszeredi@redhat.com
commit 1e08938c3694f707bb165535df352ac97a8c75c9 upstream.
The FUSE protocol uses struct fuse_write_out to convey the return value of copy_file_range, which is restricted to uint32_t. But the COPY_FILE_RANGE interface supports 64-bit copy sizes.
Currently the number of bytes copied is silently truncated to 32-bit, which may result in poor performance or even failure to copy in case of truncation to zero.
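A tiny userspace demonstration of the truncation on a 64-bit host (illustrative only; fuse_write_out.size is the u32 wire field named above):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	size_t len = 1ULL << 32;       /* a 4 GiB copy request */
	uint32_t wire = (uint32_t)len; /* what fits in fuse_write_out.size */

	/* prints: requested 4294967296, on the wire 0 */
	printf("requested %zu, on the wire %u\n", len, wire);
	return 0;
}

The min_t(size_t, len, UINT_MAX & PAGE_MASK) clamp in the fix keeps the request both representable in 32 bits and page-aligned.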
Reported-by: Florian Weimer fweimer@redhat.com Closes: https://lore.kernel.org/all/lhuh5ynl8z5.fsf@oldenburg.str.redhat.com/ Fixes: 88bc7d5097a1 ("fuse: add support for copy_file_range()") Cc: stable@vger.kernel.org # v4.20 Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/fuse/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -3013,7 +3013,7 @@ static ssize_t __fuse_copy_file_range(st .nodeid_out = ff_out->nodeid, .fh_out = ff_out->fh, .off_out = pos_out, - .len = len, + .len = min_t(size_t, len, UINT_MAX & PAGE_MASK), .flags = flags }; struct fuse_write_out outarg;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeongjun Park aha310510@gmail.com
commit 21cc2b5c5062a256ae9064442d37ebbc23f5aef7 upstream.
When restoring a reservation for an anonymous page, we need to check whether we are freeing a surplus page. However, __unmap_hugepage_range() causes a data race because it reads h->surplus_huge_pages without the protection of hugetlb_lock.
In addition, adjust_reservation is a boolean variable that indicates whether the reservation for the anonymous page in each folio should be restored, so it should be reinitialized to false on each round of the loop. However, it is only initialized once, where it is defined.
This means that once adjust_reservation is set to true within the loop, reservations for anonymous pages will be restored unconditionally in all subsequent rounds, regardless of the folio's state.
To fix this, we need to add the missing hugetlb_lock, unlock the page_table_lock earlier so that we don't lock the hugetlb_lock inside the page_table_lock lock, and initialize adjust_reservation to false on each round within the loop.
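Schematically, the second bug is the classic stale-flag pattern (illustrative sketch, not hugetlb code; the helper names are made up):

bool adjust_reservation = false;  /* initialized once, before the loop */

for (addr = start; addr < end; addr += sz) {
	adjust_reservation = false; /* the fix: reset on every round */
	if (reservation_should_be_restored(addr)) /* hypothetical */
		adjust_reservation = true;
	if (adjust_reservation) /* otherwise sticks at true forever */
		restore_reservation(addr);        /* hypothetical */
}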
Link: https://lkml.kernel.org/r/20250823182115.1193563-1-aha310510@gmail.com Fixes: df7a6d1f6405 ("mm/hugetlb: restore the reservation if needed") Signed-off-by: Jeongjun Park aha310510@gmail.com Reported-by: syzbot+417aeb05fd190f3a6da9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=417aeb05fd190f3a6da9 Reviewed-by: Sidhartha Kumar sidhartha.kumar@oracle.com Cc: Breno Leitao leitao@debian.org Cc: David Hildenbrand david@redhat.com Cc: Muchun Song muchun.song@linux.dev Cc: Oscar Salvador osalvador@suse.de Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/hugetlb.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5855,7 +5855,7 @@ void __unmap_hugepage_range(struct mmu_g spinlock_t *ptl; struct hstate *h = hstate_vma(vma); unsigned long sz = huge_page_size(h); - bool adjust_reservation = false; + bool adjust_reservation; unsigned long last_addr_mask; bool force_flush = false;
@@ -5948,6 +5948,7 @@ void __unmap_hugepage_range(struct mmu_g sz); hugetlb_count_sub(pages_per_huge_page(h), mm); hugetlb_remove_rmap(folio); + spin_unlock(ptl);
/* * Restore the reservation for anonymous page, otherwise the @@ -5955,14 +5956,16 @@ void __unmap_hugepage_range(struct mmu_g * If there we are freeing a surplus, do not set the restore * reservation bit. */ + adjust_reservation = false; + + spin_lock_irq(&hugetlb_lock); if (!h->surplus_huge_pages && __vma_private_lock(vma) && folio_test_anon(folio)) { folio_set_hugetlb_restore_reserve(folio); /* Reservation to be adjusted after the spin lock */ adjust_reservation = true; } - - spin_unlock(ptl); + spin_unlock_irq(&hugetlb_lock);
/* * Adjust the reservation for the region that will have the
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wei Yang richard.weiyang@gmail.com
commit 394bfac1c7f7b701c2c93834c5761b9c9ceeebcf upstream.
Commit 8ee53820edfd ("thp: mmu_notifier_test_young") introduced mmu_notifier_test_young(), but we are passing the wrong address. In xxx_scan_pmd(), the actual iteration address is "_address", not "address". The variable appears to have been misused from the very beginning.
Change it to the right one.
[akpm@linux-foundation.org fix whitespace, per everyone] Link: https://lkml.kernel.org/r/20250822063318.11644-1-richard.weiyang@gmail.com Fixes: 8ee53820edfd ("thp: mmu_notifier_test_young") Signed-off-by: Wei Yang richard.weiyang@gmail.com Reviewed-by: Dev Jain dev.jain@arm.com Reviewed-by: Zi Yan ziy@nvidia.com Acked-by: David Hildenbrand david@redhat.com Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com Cc: Baolin Wang baolin.wang@linux.alibaba.com Cc: Liam R. Howlett Liam.Howlett@oracle.com Cc: Nico Pache npache@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Barry Song baohua@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/khugepaged.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1400,8 +1400,8 @@ static int hpage_collapse_scan_pmd(struc */ if (cc->is_khugepaged && (pte_young(pteval) || folio_test_young(folio) || - folio_test_referenced(folio) || mmu_notifier_test_young(vma->vm_mm, - address))) + folio_test_referenced(folio) || + mmu_notifier_test_young(vma->vm_mm, _address))) referenced++; } if (!writable) {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Uladzislau Rezki (Sony) urezki@gmail.com
commit 79357cd06d41d0f5a11b17d7c86176e395d10ef2 upstream.
kasan_populate_vmalloc() and its helpers ignore the caller's gfp_mask and always allocate memory using the hardcoded GFP_KERNEL flag. This makes them inconsistent with vmalloc(), which was recently extended to support GFP_NOFS and GFP_NOIO allocations.
Page table allocations performed during shadow population also ignore the external gfp_mask. To preserve the intended semantics of GFP_NOFS and GFP_NOIO, wrap the apply_to_page_range() calls into the appropriate memalloc scope.
xfs calls vmalloc with GFP_NOFS, so this bug could lead to deadlock.
There was a report here https://lkml.kernel.org/r/686ea951.050a0220.385921.0016.GAE@google.com
This patch: - Extends kasan_populate_vmalloc() and helpers to take gfp_mask; - Passes gfp_mask down to alloc_pages_bulk() and __get_free_page(); - Enforces GFP_NOFS/NOIO semantics with memalloc_*_save()/restore() around apply_to_page_range(); - Updates vmalloc.c and percpu allocator call sites accordingly.
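The scope enforcement, annotated (condensed from the diff below): the page-table allocations deep inside apply_to_page_range() take no gfp argument, so the reclaim restriction has to be imposed on the whole task via the memalloc scope API:

unsigned int flags;

/* GFP_NOFS clears __GFP_FS; GFP_NOIO clears __GFP_FS and __GFP_IO */
if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
	flags = memalloc_nofs_save();
else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)
	flags = memalloc_noio_save();

ret = apply_to_page_range(&init_mm, start, nr_pages * PAGE_SIZE,
			  kasan_populate_vmalloc_pte, &data);

if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
	memalloc_nofs_restore(flags);
else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)
	memalloc_noio_restore(flags);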
Link: https://lkml.kernel.org/r/20250831121058.92971-1-urezki@gmail.com Fixes: 451769ebb7e7 ("mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc") Signed-off-by: Uladzislau Rezki (Sony) urezki@gmail.com Reported-by: syzbot+3470c9ffee63e4abafeb@syzkaller.appspotmail.com Reviewed-by: Andrey Ryabinin ryabinin.a.a@gmail.com Cc: Baoquan He bhe@redhat.com Cc: Michal Hocko mhocko@kernel.org Cc: Alexander Potapenko glider@google.com Cc: Andrey Konovalov andreyknvl@gmail.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Vincenzo Frascino vincenzo.frascino@arm.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/kasan.h | 6 +++--- mm/kasan/shadow.c | 31 ++++++++++++++++++++++++------- mm/vmalloc.c | 8 ++++---- 3 files changed, 31 insertions(+), 14 deletions(-)
--- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -562,7 +562,7 @@ static inline void kasan_init_hw_tags(vo #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
void kasan_populate_early_vm_area_shadow(void *start, unsigned long size); -int kasan_populate_vmalloc(unsigned long addr, unsigned long size); +int kasan_populate_vmalloc(unsigned long addr, unsigned long size, gfp_t gfp_mask); void kasan_release_vmalloc(unsigned long start, unsigned long end, unsigned long free_region_start, unsigned long free_region_end, @@ -574,7 +574,7 @@ static inline void kasan_populate_early_ unsigned long size) { } static inline int kasan_populate_vmalloc(unsigned long start, - unsigned long size) + unsigned long size, gfp_t gfp_mask) { return 0; } @@ -610,7 +610,7 @@ static __always_inline void kasan_poison static inline void kasan_populate_early_vm_area_shadow(void *start, unsigned long size) { } static inline int kasan_populate_vmalloc(unsigned long start, - unsigned long size) + unsigned long size, gfp_t gfp_mask) { return 0; } --- a/mm/kasan/shadow.c +++ b/mm/kasan/shadow.c @@ -335,13 +335,13 @@ static void ___free_pages_bulk(struct pa } }
-static int ___alloc_pages_bulk(struct page **pages, int nr_pages) +static int ___alloc_pages_bulk(struct page **pages, int nr_pages, gfp_t gfp_mask) { unsigned long nr_populated, nr_total = nr_pages; struct page **page_array = pages;
while (nr_pages) { - nr_populated = alloc_pages_bulk(GFP_KERNEL, nr_pages, pages); + nr_populated = alloc_pages_bulk(gfp_mask, nr_pages, pages); if (!nr_populated) { ___free_pages_bulk(page_array, nr_total - nr_pages); return -ENOMEM; @@ -353,25 +353,42 @@ static int ___alloc_pages_bulk(struct pa return 0; }
-static int __kasan_populate_vmalloc(unsigned long start, unsigned long end) +static int __kasan_populate_vmalloc(unsigned long start, unsigned long end, gfp_t gfp_mask) { unsigned long nr_pages, nr_total = PFN_UP(end - start); struct vmalloc_populate_data data; + unsigned int flags; int ret = 0;
- data.pages = (struct page **)__get_free_page(GFP_KERNEL | __GFP_ZERO); + data.pages = (struct page **)__get_free_page(gfp_mask | __GFP_ZERO); if (!data.pages) return -ENOMEM;
while (nr_total) { nr_pages = min(nr_total, PAGE_SIZE / sizeof(data.pages[0])); - ret = ___alloc_pages_bulk(data.pages, nr_pages); + ret = ___alloc_pages_bulk(data.pages, nr_pages, gfp_mask); if (ret) break;
data.start = start; + + /* + * page tables allocations ignore external gfp mask, enforce it + * by the scope API + */ + if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO) + flags = memalloc_nofs_save(); + else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0) + flags = memalloc_noio_save(); + ret = apply_to_page_range(&init_mm, start, nr_pages * PAGE_SIZE, kasan_populate_vmalloc_pte, &data); + + if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO) + memalloc_nofs_restore(flags); + else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0) + memalloc_noio_restore(flags); + ___free_pages_bulk(data.pages, nr_pages); if (ret) break; @@ -385,7 +402,7 @@ static int __kasan_populate_vmalloc(unsi return ret; }
-int kasan_populate_vmalloc(unsigned long addr, unsigned long size) +int kasan_populate_vmalloc(unsigned long addr, unsigned long size, gfp_t gfp_mask) { unsigned long shadow_start, shadow_end; int ret; @@ -414,7 +431,7 @@ int kasan_populate_vmalloc(unsigned long shadow_start = PAGE_ALIGN_DOWN(shadow_start); shadow_end = PAGE_ALIGN(shadow_end);
- ret = __kasan_populate_vmalloc(shadow_start, shadow_end); + ret = __kasan_populate_vmalloc(shadow_start, shadow_end, gfp_mask); if (ret) return ret;
--- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2026,6 +2026,8 @@ static struct vmap_area *alloc_vmap_area if (unlikely(!vmap_initialized)) return ERR_PTR(-EBUSY);
+ /* Only reclaim behaviour flags are relevant. */ + gfp_mask = gfp_mask & GFP_RECLAIM_MASK; might_sleep();
/* @@ -2038,8 +2040,6 @@ static struct vmap_area *alloc_vmap_area */ va = node_alloc(size, align, vstart, vend, &addr, &vn_id); if (!va) { - gfp_mask = gfp_mask & GFP_RECLAIM_MASK; - va = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node); if (unlikely(!va)) return ERR_PTR(-ENOMEM); @@ -2089,7 +2089,7 @@ retry: BUG_ON(va->va_start < vstart); BUG_ON(va->va_end > vend);
- ret = kasan_populate_vmalloc(addr, size); + ret = kasan_populate_vmalloc(addr, size, gfp_mask); if (ret) { free_vmap_area(va); return ERR_PTR(ret); @@ -4826,7 +4826,7 @@ retry:
/* populate the kasan shadow space */ for (area = 0; area < nr_vms; area++) { - if (kasan_populate_vmalloc(vas[area]->va_start, sizes[area])) + if (kasan_populate_vmalloc(vas[area]->va_start, sizes[area], GFP_KERNEL)) goto err_free_shadow; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miaohe Lin linmiaohe@huawei.com
commit d613f53c83ec47089c4e25859d5e8e0359f6f8da upstream.
When I did memory failure tests, below panic occurs:
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(page)) kernel BUG at include/linux/page-flags.h:616! Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 PID: 720 Comm: bash Not tainted 6.10.0-rc1-00195-g148743902568 #40 RIP: 0010:unpoison_memory+0x2f3/0x590 RSP: 0018:ffffa57fc8787d60 EFLAGS: 00000246 RAX: 0000000000000037 RBX: 0000000000000009 RCX: ffff9be25fcdc9c8 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff9be25fcdc9c0 RBP: 0000000000300000 R08: ffffffffb4956f88 R09: 0000000000009ffb R10: 0000000000000284 R11: ffffffffb4926fa0 R12: ffffe6b00c000000 R13: ffff9bdb453dfd00 R14: 0000000000000000 R15: fffffffffffffffe FS: 00007f08f04e4740(0000) GS:ffff9be25fcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000564787a30410 CR3: 000000010d4e2000 CR4: 00000000000006f0 Call Trace: <TASK> unpoison_memory+0x2f3/0x590 simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110 debugfs_attr_write+0x42/0x60 full_proxy_write+0x5b/0x80 vfs_write+0xd5/0x540 ksys_write+0x64/0xe0 do_syscall_64+0xb9/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f08f0314887 RSP: 002b:00007ffece710078 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f08f0314887 RDX: 0000000000000009 RSI: 0000564787a30410 RDI: 0000000000000001 RBP: 0000564787a30410 R08: 000000000000fefe R09: 000000007fffffff R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009 R13: 00007f08f041b780 R14: 00007f08f0417600 R15: 00007f08f0416a00 </TASK> Modules linked in: hwpoison_inject ---[ end trace 0000000000000000 ]--- RIP: 0010:unpoison_memory+0x2f3/0x590 RSP: 0018:ffffa57fc8787d60 EFLAGS: 00000246 RAX: 0000000000000037 RBX: 0000000000000009 RCX: ffff9be25fcdc9c8 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff9be25fcdc9c0 RBP: 0000000000300000 R08: ffffffffb4956f88 R09: 0000000000009ffb R10: 0000000000000284 R11: ffffffffb4926fa0 R12: ffffe6b00c000000 R13: ffff9bdb453dfd00 R14: 0000000000000000 R15: fffffffffffffffe FS: 00007f08f04e4740(0000) GS:ffff9be25fcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000564787a30410 CR3: 000000010d4e2000 CR4: 00000000000006f0 Kernel panic - not syncing: Fatal exception Kernel Offset: 0x31c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception ]---
The root cause is that unpoison_memory() tries to check the PG_HWPoison flag of an uninitialized page, so VM_BUG_ON_PAGE(PagePoisoned(page)) is triggered. This can be reproduced by the steps below:
1. Offline memory block:
echo offline > /sys/devices/system/memory/memory12/state
2. Get offlined memory pfn:
page-types -b n -rlN
3. Write pfn to unpoison-pfn:
echo <pfn> > /sys/kernel/debug/hwpoison/unpoison-pfn
This scenario can be identified by pfn_to_online_page() returning NULL. And ZONE_DEVICE pages are never expected, so we can simply fail if pfn_to_online_page() == NULL to fix the bug.
Link: https://lkml.kernel.org/r/20250828024618.1744895-1-linmiaohe@huawei.com Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") Signed-off-by: Miaohe Lin linmiaohe@huawei.com Suggested-by: David Hildenbrand david@redhat.com Acked-by: David Hildenbrand david@redhat.com Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory-failure.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -2576,10 +2576,9 @@ int unpoison_memory(unsigned long pfn) static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);
- if (!pfn_valid(pfn)) - return -ENXIO; - - p = pfn_to_page(pfn); + p = pfn_to_online_page(pfn); + if (!p) + return -EIO; folio = page_folio(p);
mutex_lock(&mf_mutex);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kyle Meyer kyle.meyer@hpe.com
commit 3be306cccdccede13e3cefd0c14e430cc2b7c9c7 upstream.
Duplicate memory errors can be reported by multiple sources.
Passing an already poisoned page to action_result() causes issues:
* The amount of hardware corrupted memory is incorrectly updated. * Per NUMA node MF stats are incorrectly updated. * Redundant "already poisoned" messages are printed.
Avoid those issues by:
* Skipping hardware corrupted memory updates for already poisoned pages. * Skipping per NUMA node MF stats updates for already poisoned pages. * Dropping redundant "already poisoned" messages.
Make MF_MSG_ALREADY_POISONED consistent with other action_page_types and make calls to action_result() consistent for already poisoned normal pages and huge pages.
Link: https://lkml.kernel.org/r/aLCiHMy12Ck3ouwC@hpe.com Fixes: b8b9488d50b7 ("mm/memory-failure: improve memory failure action_result messages") Signed-off-by: Kyle Meyer kyle.meyer@hpe.com Reviewed-by: Jiaqi Yan jiaqiyan@google.com Acked-by: David Hildenbrand david@redhat.com Reviewed-by: Jane Chu jane.chu@oracle.com Acked-by: Miaohe Lin linmiaohe@huawei.com Cc: Borislav Betkov bp@alien8.de Cc: Kyle Meyer kyle.meyer@hpe.com Cc: Liam Howlett liam.howlett@oracle.com Cc: Lorenzo Stoakes lorenzo.stoakes@oracle.com Cc: "Luck, Tony" tony.luck@intel.com Cc: Michal Hocko mhocko@suse.com Cc: Mike Rapoport rppt@kernel.org Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: Oscar Salvador osalvador@suse.de Cc: Russ Anderson russ.anderson@hpe.com Cc: Suren Baghdasaryan surenb@google.com Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory-failure.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
--- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -950,7 +950,7 @@ static const char * const action_page_ty [MF_MSG_BUDDY] = "free buddy page", [MF_MSG_DAX] = "dax page", [MF_MSG_UNSPLIT_THP] = "unsplit thp", - [MF_MSG_ALREADY_POISONED] = "already poisoned", + [MF_MSG_ALREADY_POISONED] = "already poisoned page", [MF_MSG_UNKNOWN] = "unknown page", };
@@ -1343,9 +1343,10 @@ static int action_result(unsigned long p { trace_memory_failure_event(pfn, type, result);
- num_poisoned_pages_inc(pfn); - - update_per_node_mf_stats(pfn, result); + if (type != MF_MSG_ALREADY_POISONED) { + num_poisoned_pages_inc(pfn); + update_per_node_mf_stats(pfn, result); + }
pr_err("%#lx: recovery action for %s: %s\n", pfn, action_page_types[type], action_name[result]); @@ -2088,12 +2089,11 @@ retry: *hugetlb = 0; return 0; } else if (res == -EHWPOISON) { - pr_err("%#lx: already hardware poisoned\n", pfn); if (flags & MF_ACTION_REQUIRED) { folio = page_folio(p); res = kill_accessing_process(current, folio_pfn(folio), flags); - action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED); } + action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED); return res; } else if (res == -EBUSY) { if (!(flags & MF_NO_RETRY)) { @@ -2279,7 +2279,6 @@ try_again: goto unlock_mutex;
if (TestSetPageHWPoison(p)) { - pr_err("%#lx: already hardware poisoned\n", pfn); res = -EHWPOISON; if (flags & MF_ACTION_REQUIRED) res = kill_accessing_process(current, pfn, flags);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sang-Heon Jeon ekffu200098@gmail.com
commit ce652aac9c90a96c6536681d17518efb1f660fb8 upstream.
The kernel initializes the "jiffies" timer to 5 minutes below zero, as shown in include/linux/jiffies.h:
/* * Have the 32 bit jiffies value wrap 5 minutes after boot * so jiffies wrap bugs show up earlier. */ #define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))
And the jiffies comparison helper functions cast the unsigned values to signed to cover wraparound:
#define time_after_eq(a,b) \ (typecheck(unsigned long, a) && \ typecheck(unsigned long, b) && \ ((long)((a) - (b)) >= 0))
When quota->charged_from is initialized to 0, time_after_eq() can incorrectly return FALSE even after reset_interval has elapsed. This occurs when (jiffies - reset_interval) produces a value with MSB=1, which is interpreted as negative in signed arithmetic.
This issue primarily affects 32-bit systems because: On 64-bit systems, MSB=1 values occur only after ~292 million years from boot (assuming HZ=1000), which is practically impossible.
On 32-bit systems, MSB=1 values occur during the first 5 minutes after boot, and during the second half of every jiffies wraparound cycle, starting from day 25 (assuming HZ=1000).
When the above unexpected FALSE return from time_after_eq() occurs, the charging window will not reset. The user impact depends on the esz value at that time.
If esz is 0, the scheme ignores the configured quotas and runs without any limits.
If esz is not 0, the scheme stops working once the quota is exhausted, and remains stopped until the charging window finally resets.
So, set quota->charged_from to jiffies in damos_adjust_quota() when it is considered the first charge window. With this change, we avoid the unexpected FALSE return from time_after_eq().
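The failure mode is easy to reproduce in plain C; this sketch mirrors the kernel definitions quoted above on a system with 32-bit unsigned long (the interval values are assumed, HZ=1000):

#include <stdio.h>
#include <stdint.h>

#define HZ 1000
#define INITIAL_JIFFIES ((uint32_t)(-300 * HZ))    /* boot: -5 minutes */
#define time_after_eq(a, b) ((int32_t)((a) - (b)) >= 0)

int main(void)
{
	uint32_t jiffies = INITIAL_JIFFIES; /* shortly after boot, MSB=1 */
	uint32_t charged_from = 0;          /* never initialized */
	uint32_t reset_interval = 1000;     /* 1 s */

	/* prints 0: the window never "expires", so it never resets */
	printf("%d\n", time_after_eq(jiffies,
				     charged_from + reset_interval));
	return 0;
}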
Link: https://lkml.kernel.org/r/20250822025057.1740854-1-ekffu200098@gmail.com Fixes: 2b8a248d5873 ("mm/damon/schemes: implement size quota for schemes application speed control") # 5.16 Signed-off-by: Sang-Heon Jeon ekffu200098@gmail.com Reviewed-by: SeongJae Park sj@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/damon/core.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -2050,6 +2050,10 @@ static void damos_adjust_quota(struct da if (!quota->ms && !quota->sz && list_empty(&quota->goals)) return;
+ /* First charge window */ + if (!quota->total_charged_sz && !quota->charged_from) + quota->charged_from = jiffies; + /* New charge window starts */ if (time_after_eq(jiffies, quota->charged_from + msecs_to_jiffies(quota->reset_interval))) {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quanmin Yan yanquanmin1@huawei.com
commit 711f19dfd783ffb37ca4324388b9c4cb87e71363 upstream.
Patch series "mm/damon: avoid divide-by-zero in DAMON module's parameters application".
DAMON's RECLAIM and LRU_SORT modules perform no validation on user-configured parameters during application, which may lead to division-by-zero errors.
Avoid the divide-by-zero by adding validation checks when DAMON modules attempt to apply the parameters.
This patch (of 2):
During the calculation of 'hot_thres' and 'cold_thres', either 'sample_interval' or 'aggr_interval' is used as the divisor, which may lead to division-by-zero errors. Fix it by directly returning -EINVAL when such a case occurs. Additionally, since 'aggr_interval' is already required to be set no smaller than 'sample_interval' in damon_set_attrs(), only the case where 'sample_interval' is zero needs to be checked.
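Schematically, the division being guarded looks like this (simplified; the real expression lives in mm/damon/lru_sort.c and uses damon_max_nr_accesses()):

/* max_nr_accesses = aggr_interval / sample_interval; a user-supplied
 * sample_interval of 0 would divide by zero here, hence the new check */
if (!attrs->sample_interval)
	return -EINVAL;

max_nr_accesses = attrs->aggr_interval / attrs->sample_interval;
hot_thres = max_nr_accesses * hot_thres_access_freq / 1000;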
Link: https://lkml.kernel.org/r/20250827115858.1186261-2-yanquanmin1@huawei.com Fixes: 40e983cca927 ("mm/damon: introduce DAMON-based LRU-lists Sorting") Signed-off-by: Quanmin Yan yanquanmin1@huawei.com Reviewed-by: SeongJae Park sj@kernel.org Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: ze zuo zuoze1@huawei.com Cc: stable@vger.kernel.org [6.0+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/damon/lru_sort.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/mm/damon/lru_sort.c +++ b/mm/damon/lru_sort.c @@ -198,6 +198,11 @@ static int damon_lru_sort_apply_paramete if (err) return err;
+ if (!damon_lru_sort_mon_attrs.sample_interval) { + err = -EINVAL; + goto out; + } + err = damon_set_attrs(ctx, &damon_lru_sort_mon_attrs); if (err) goto out;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan@kernel.org
commit 4de37a48b6b58faaded9eb765047cf0d8785ea18 upstream.
The for_each_child_of_node() helper drops the reference it takes to each node as it iterates over children and an explicit of_node_put() is only needed when exiting the loop early.
Drop the recently introduced bogus additional reference count decrement at each iteration that could potentially lead to a use-after-free.
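For reference, the canonical pattern (a generic sketch, not the mediatek code; the predicates are made up): the iterator itself drops each child's reference as it advances, so the caller only owns a reference on early exit:

struct device_node *child;

for_each_child_of_node(parent, child) {
	if (!is_interesting(child))  /* hypothetical predicate */
		continue;            /* no of_node_put(): iterator drops it */

	if (is_the_one(child)) {     /* hypothetical predicate */
		/* early exit: we still hold the reference */
		of_node_put(child);
		break;
	}
}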
Fixes: 1f403699c40f ("drm/mediatek: Fix device/node reference count leaks in mtk_drm_get_all_drm_priv") Cc: Ma Ke make24@iscas.ac.cn Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Reviewed-by: CK Hu ck.hu@mediatek.com Reviewed-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Link: https://patchwork.kernel.org/project/dri-devel/patch/20250829090345.21075-2-... Signed-off-by: Chun-Kuang Hu chunkuang.hu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/mediatek/mtk_drm_drv.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c @@ -388,11 +388,11 @@ static bool mtk_drm_get_all_drm_priv(str
of_id = of_match_node(mtk_drm_of_ids, node); if (!of_id) - goto next_put_node; + continue;
pdev = of_find_device_by_node(node); if (!pdev) - goto next_put_node; + continue;
drm_dev = device_find_child(&pdev->dev, NULL, mtk_drm_match); if (!drm_dev) @@ -418,11 +418,10 @@ next_put_device_drm_dev: next_put_device_pdev_dev: put_device(&pdev->dev);
-next_put_node: - of_node_put(node); - - if (cnt == MAX_CRTC) + if (cnt == MAX_CRTC) { + of_node_put(node); break; + } }
if (drm_priv->data->mmsys_dev_num == cnt) {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jani Nikula jani.nikula@intel.com
commit cfa7b7659757f8d0fc4914429efa90d0d2577dd7 upstream.
for_each_set_bit() expects size to be in bits, not bytes. The abox mask iteration uses bytes, but it works by coincidence, because the local variable holding the mask is unsigned long, and the mask only ever has bit 2 as the highest bit. Using a smaller type could lead to subtle and very hard to track bugs.
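A toy userspace re-implementation makes the pitfall visible (illustrative only; the kernel's for_each_set_bit() has the same size-in-bits contract):

#include <stdio.h>

#define for_each_set_bit(b, addr, size) \
	for ((b) = 0; (b) < (size); (b)++) \
		if ((*(addr) >> (b)) & 1)

int main(void)
{
	unsigned long mask = 0x107; /* bits 0, 1, 2 and 8 set */
	unsigned int i;

	/* buggy: size in bytes (8), so bit 8 is silently skipped */
	for_each_set_bit(i, &mask, sizeof(mask))
		printf("byte-sized walk: bit %u\n", i);

	/* fixed: size in bits, all four bits visited */
	for_each_set_bit(i, &mask, sizeof(mask) * 8)
		printf("bit-sized walk:  bit %u\n", i);
	return 0;
}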
Fixes: 62afef2811e4 ("drm/i915/rkl: RKL uses ABOX0 for pixel transfers") Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v5.9+ Reviewed-by: Matt Roper matthew.d.roper@intel.com Link: https://lore.kernel.org/r/20250905104149.1144751-1-jani.nikula@intel.com Signed-off-by: Jani Nikula jani.nikula@intel.com (cherry picked from commit 7ea3baa6efe4bb93d11e1c0e6528b1468d7debf6) Signed-off-by: Tvrtko Ursulin tursulin@ursulin.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_display_power.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_display_power.c +++ b/drivers/gpu/drm/i915/display/intel_display_power.c @@ -1169,7 +1169,7 @@ static void icl_mbus_init(struct intel_d if (DISPLAY_VER(display) == 12) abox_regs |= BIT(0);
- for_each_set_bit(i, &abox_regs, sizeof(abox_regs)) + for_each_set_bit(i, &abox_regs, BITS_PER_TYPE(abox_regs)) intel_de_rmw(display, MBUS_ABOX_CTL(i), mask, val); }
@@ -1630,11 +1630,11 @@ static void tgl_bw_buddy_init(struct int if (table[config].page_mask == 0) { drm_dbg_kms(display->drm, "Unknown memory configuration; disabling address buddy logic.\n"); - for_each_set_bit(i, &abox_mask, sizeof(abox_mask)) + for_each_set_bit(i, &abox_mask, BITS_PER_TYPE(abox_mask)) intel_de_write(display, BW_BUDDY_CTL(i), BW_BUDDY_DISABLE); } else { - for_each_set_bit(i, &abox_mask, sizeof(abox_mask)) { + for_each_set_bit(i, &abox_mask, BITS_PER_TYPE(abox_mask)) { intel_de_write(display, BW_BUDDY_PAGE_MASK(i), table[config].page_mask);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Hellström thomas.hellstrom@linux.intel.com
commit 5c87fee3c96ce898ad681552404a66c7605193c0 upstream.
VRAM+TT bos that are evicted from VRAM to TT may remain in TT even after a revalidation following eviction or suspend.
This manifests itself as applications becoming sluggish after buffer objects get evicted or after a resume from suspend or hibernation.
If the bo supports placement in both VRAM and TT, and we are on DGFX, mark the TT placement as fallback. This means that it is tried only after VRAM + eviction.
This flaw has probably been present since the xe module was upstreamed, but use the Fixes: commit below, where backporting is likely to be simple. For earlier versions we need to open-code the fallback algorithm in the driver.
v2: - Remove check for dgfx. (Matthew Auld) - Update the xe_dma_buf kunit test for the new strategy (CI) - Allow dma-buf to pin in current placement (CI) - Make xe_bo_validate() for pinned bos a NOP.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5995 Fixes: a78a8da51b36 ("drm/ttm: replace busy placement with flags v6") Cc: Matthew Brost matthew.brost@intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: stable@vger.kernel.org # v6.9+ Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Matthew Auld matthew.auld@intel.com Link: https://lore.kernel.org/r/20250904160715.2613-2-thomas.hellstrom@linux.intel... (cherry picked from commit cb3d7b3b46b799c96b54f8e8fe36794a55a77f0b) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/xe/tests/xe_bo.c | 2 +- drivers/gpu/drm/xe/tests/xe_dma_buf.c | 10 +--------- drivers/gpu/drm/xe/xe_bo.c | 16 ++++++++++++---- drivers/gpu/drm/xe/xe_bo.h | 2 +- drivers/gpu/drm/xe/xe_dma_buf.c | 2 +- 5 files changed, 16 insertions(+), 16 deletions(-)
--- a/drivers/gpu/drm/xe/tests/xe_bo.c +++ b/drivers/gpu/drm/xe/tests/xe_bo.c @@ -236,7 +236,7 @@ static int evict_test_run_tile(struct xe }
xe_bo_lock(external, false); - err = xe_bo_pin_external(external); + err = xe_bo_pin_external(external, false); xe_bo_unlock(external); if (err) { KUNIT_FAIL(test, "external bo pin err=%pe\n", --- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c +++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c @@ -89,15 +89,7 @@ static void check_residency(struct kunit return; }
- /* - * If on different devices, the exporter is kept in system if - * possible, saving a migration step as the transfer is just - * likely as fast from system memory. - */ - if (params->mem_mask & XE_BO_FLAG_SYSTEM) - KUNIT_EXPECT_TRUE(test, xe_bo_is_mem_type(exported, XE_PL_TT)); - else - KUNIT_EXPECT_TRUE(test, xe_bo_is_mem_type(exported, mem_type)); + KUNIT_EXPECT_TRUE(test, xe_bo_is_mem_type(exported, mem_type));
if (params->force_different_devices) KUNIT_EXPECT_TRUE(test, xe_bo_is_mem_type(imported, XE_PL_TT)); --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -184,6 +184,8 @@ static void try_add_system(struct xe_dev
bo->placements[*c] = (struct ttm_place) { .mem_type = XE_PL_TT, + .flags = (bo_flags & XE_BO_FLAG_VRAM_MASK) ? + TTM_PL_FLAG_FALLBACK : 0, }; *c += 1; } @@ -2266,6 +2268,7 @@ uint64_t vram_region_gpu_offset(struct t /** * xe_bo_pin_external - pin an external BO * @bo: buffer object to be pinned + * @in_place: Pin in current placement, don't attempt to migrate. * * Pin an external (not tied to a VM, can be exported via dma-buf / prime FD) * BO. Unique call compared to xe_bo_pin as this function has it own set of @@ -2273,7 +2276,7 @@ uint64_t vram_region_gpu_offset(struct t * * Returns 0 for success, negative error code otherwise. */ -int xe_bo_pin_external(struct xe_bo *bo) +int xe_bo_pin_external(struct xe_bo *bo, bool in_place) { struct xe_device *xe = xe_bo_device(bo); int err; @@ -2282,9 +2285,11 @@ int xe_bo_pin_external(struct xe_bo *bo) xe_assert(xe, xe_bo_is_user(bo));
if (!xe_bo_is_pinned(bo)) { - err = xe_bo_validate(bo, NULL, false); - if (err) - return err; + if (!in_place) { + err = xe_bo_validate(bo, NULL, false); + if (err) + return err; + }
spin_lock(&xe->pinned.lock); list_add_tail(&bo->pinned_link, &xe->pinned.late.external); @@ -2437,6 +2442,9 @@ int xe_bo_validate(struct xe_bo *bo, str }; int ret;
+ if (xe_bo_is_pinned(bo)) + return 0; + if (vm) { lockdep_assert_held(&vm->lock); xe_vm_assert_held(vm); --- a/drivers/gpu/drm/xe/xe_bo.h +++ b/drivers/gpu/drm/xe/xe_bo.h @@ -201,7 +201,7 @@ static inline void xe_bo_unlock_vm_held( } }
-int xe_bo_pin_external(struct xe_bo *bo); +int xe_bo_pin_external(struct xe_bo *bo, bool in_place); int xe_bo_pin(struct xe_bo *bo); void xe_bo_unpin_external(struct xe_bo *bo); void xe_bo_unpin(struct xe_bo *bo); --- a/drivers/gpu/drm/xe/xe_dma_buf.c +++ b/drivers/gpu/drm/xe/xe_dma_buf.c @@ -72,7 +72,7 @@ static int xe_dma_buf_pin(struct dma_buf return ret; }
- ret = xe_bo_pin_external(bo); + ret = xe_bo_pin_external(bo, true); xe_assert(xe, !ret);
return 0;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Hellström thomas.hellstrom@linux.intel.com
commit d84820309ed34cc412ce76ecfa9471dae7d7d144 upstream.
Its actions are opportunistic anyway and will be completed on device suspend.
Marking as a fix to simplify backporting of the fix that follows in the series.
v2:
- Keep the runtime pm reference over suspend / hibernate and document
  why. (Matt Auld, Rodrigo Vivi)
Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
Cc: Matthew Auld matthew.auld@intel.com
Cc: Rodrigo Vivi rodrigo.vivi@intel.com
Cc: stable@vger.kernel.org # v6.16+
Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com
Reviewed-by: Matthew Auld matthew.auld@intel.com
Link: https://lore.kernel.org/r/20250904160715.2613-3-thomas.hellstrom@linux.intel...
(cherry picked from commit ebd546fdffddfcaeab08afdd68ec93052c8fa740)
Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/xe/xe_pm.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)
--- a/drivers/gpu/drm/xe/xe_pm.c +++ b/drivers/gpu/drm/xe/xe_pm.c @@ -296,17 +296,17 @@ static int xe_pm_notifier_callback(struc case PM_SUSPEND_PREPARE: xe_pm_runtime_get(xe); err = xe_bo_evict_all_user(xe); - if (err) { + if (err) drm_dbg(&xe->drm, "Notifier evict user failed (%d)\n", err); - xe_pm_runtime_put(xe); - break; - }
err = xe_bo_notifier_prepare_all_pinned(xe); - if (err) { + if (err) drm_dbg(&xe->drm, "Notifier prepare pin failed (%d)\n", err); - xe_pm_runtime_put(xe); - } + /* + * Keep the runtime pm reference until post hibernation / post suspend to + * avoid a runtime suspend interfering with evicted objects or backup + * allocations. + */ break; case PM_POST_HIBERNATION: case PM_POST_SUSPEND: @@ -315,9 +315,6 @@ static int xe_pm_notifier_callback(struc break; }
- if (err) - return NOTIFY_BAD; - return NOTIFY_DONE; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Hellström thomas.hellstrom@linux.intel.com
commit eb5723a75104605b7d2207a7d598e314166fbef4 upstream.
When the xe pm_notifier evicts for suspend / hibernate, there might be racing tasks trying to re-validate. This can lead to suspend taking excessive time or getting stuck in a live-lock. This behaviour becomes much worse with the fix that actually makes re-validation bring back bos to VRAM rather than letting them remain in TT.
Prevent that by having exec and the rebind worker wait for a completion that is set to block by the pm_notifier before suspend and is signaled by the pm_notifier after resume / wakeup.
It's probably still possible to craft malicious applications that block suspending. More work is pending to fix that.
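A minimal sketch of the gating pattern described above (the wrapper name is hypothetical; the completion and the wait call match the diff below):

static int exec_begin(struct xe_device *xe)
{
	int err;

	/*
	 * pm_block is reinitialized (blocking) before suspend and
	 * complete_all()'d after resume. Interruptible, so on task
	 * freezing this returns -ERESTARTSYS and the IOCTL is rerun.
	 */
	err = wait_for_completion_interruptible(&xe->pm_block);
	if (err)
		return err;

	/* ... validate buffers and submit as usual ... */
	return 0;
}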
v3:
- Avoid wait_for_completion() in the kernel worker since it could
  potentially cause work item flushes from freezable processes to wait
  forever. Instead terminate the rebind workers if needed and re-launch
  at resume. (Matt Auld)
v4:
- Fix some bad naming and leftover debug printouts.
- Fix kerneldoc.
- Use drmm_mutex_init() for the xe->rebind_resume_lock (Matt Auld).
- Rework the interface of xe_vm_rebind_resume_worker (Matt Auld).
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288
Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
Cc: Matthew Auld matthew.auld@intel.com
Cc: Rodrigo Vivi rodrigo.vivi@intel.com
Cc: stable@vger.kernel.org # v6.16+
Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com
Reviewed-by: Matthew Auld matthew.auld@intel.com
Link: https://lore.kernel.org/r/20250904160715.2613-4-thomas.hellstrom@linux.intel...
(cherry picked from commit 599334572a5a99111015fbbd5152ce4dedc2f8b7)
Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/xe/xe_device_types.h |  6 +++++
 drivers/gpu/drm/xe/xe_exec.c         |  9 +++++++
 drivers/gpu/drm/xe/xe_pm.c           | 25 ++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.c           | 42 ++++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_vm.h           |  2 +
 drivers/gpu/drm/xe/xe_vm_types.h     |  5 ++++
 6 files changed, 88 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/xe/xe_device_types.h +++ b/drivers/gpu/drm/xe/xe_device_types.h @@ -529,6 +529,12 @@ struct xe_device {
/** @pm_notifier: Our PM notifier to perform actions in response to various PM events. */ struct notifier_block pm_notifier; + /** @pm_block: Completion to block validating tasks on suspend / hibernate prepare */ + struct completion pm_block; + /** @rebind_resume_list: List of wq items to kick on resume. */ + struct list_head rebind_resume_list; + /** @rebind_resume_lock: Lock to protect the rebind_resume_list */ + struct mutex rebind_resume_lock;
/** @pmt: Support the PMT driver callback interface */ struct { --- a/drivers/gpu/drm/xe/xe_exec.c +++ b/drivers/gpu/drm/xe/xe_exec.c @@ -237,6 +237,15 @@ retry: goto err_unlock_list; }
+ /* + * It's OK to block interruptible here with the vm lock held, since + * on task freezing during suspend / hibernate, the call will + * return -ERESTARTSYS and the IOCTL will be rerun. + */ + err = wait_for_completion_interruptible(&xe->pm_block); + if (err) + goto err_unlock_list; + vm_exec.vm = &vm->gpuvm; vm_exec.flags = DRM_EXEC_INTERRUPTIBLE_WAIT; if (xe_vm_in_lr_mode(vm)) { --- a/drivers/gpu/drm/xe/xe_pm.c +++ b/drivers/gpu/drm/xe/xe_pm.c @@ -23,6 +23,7 @@ #include "xe_pcode.h" #include "xe_pxp.h" #include "xe_trace.h" +#include "xe_vm.h" #include "xe_wa.h"
/** @@ -285,6 +286,19 @@ static u32 vram_threshold_value(struct x return DEFAULT_VRAM_THRESHOLD; }
+static void xe_pm_wake_rebind_workers(struct xe_device *xe) +{ + struct xe_vm *vm, *next; + + mutex_lock(&xe->rebind_resume_lock); + list_for_each_entry_safe(vm, next, &xe->rebind_resume_list, + preempt.pm_activate_link) { + list_del_init(&vm->preempt.pm_activate_link); + xe_vm_resume_rebind_worker(vm); + } + mutex_unlock(&xe->rebind_resume_lock); +} + static int xe_pm_notifier_callback(struct notifier_block *nb, unsigned long action, void *data) { @@ -294,6 +308,7 @@ static int xe_pm_notifier_callback(struc switch (action) { case PM_HIBERNATION_PREPARE: case PM_SUSPEND_PREPARE: + reinit_completion(&xe->pm_block); xe_pm_runtime_get(xe); err = xe_bo_evict_all_user(xe); if (err) @@ -310,6 +325,8 @@ static int xe_pm_notifier_callback(struc break; case PM_POST_HIBERNATION: case PM_POST_SUSPEND: + complete_all(&xe->pm_block); + xe_pm_wake_rebind_workers(xe); xe_bo_notifier_unprepare_all_pinned(xe); xe_pm_runtime_put(xe); break; @@ -336,6 +353,14 @@ int xe_pm_init(struct xe_device *xe) if (err) return err;
+ err = drmm_mutex_init(&xe->drm, &xe->rebind_resume_lock); + if (err) + goto err_unregister; + + init_completion(&xe->pm_block); + complete_all(&xe->pm_block); + INIT_LIST_HEAD(&xe->rebind_resume_list); + /* For now suspend/resume is only allowed with GuC */ if (!xe_device_uc_enabled(xe)) return 0; --- a/drivers/gpu/drm/xe/xe_vm.c +++ b/drivers/gpu/drm/xe/xe_vm.c @@ -393,6 +393,9 @@ static int xe_gpuvm_validate(struct drm_ list_move_tail(&gpuva_to_vma(gpuva)->combined_links.rebind, &vm->rebind_list);
+ if (!try_wait_for_completion(&vm->xe->pm_block)) + return -EAGAIN; + ret = xe_bo_validate(gem_to_xe_bo(vm_bo->obj), vm, false); if (ret) return ret; @@ -479,6 +482,33 @@ static int xe_preempt_work_begin(struct return xe_vm_validate_rebind(vm, exec, vm->preempt.num_exec_queues); }
+static bool vm_suspend_rebind_worker(struct xe_vm *vm) +{ + struct xe_device *xe = vm->xe; + bool ret = false; + + mutex_lock(&xe->rebind_resume_lock); + if (!try_wait_for_completion(&vm->xe->pm_block)) { + ret = true; + list_move_tail(&vm->preempt.pm_activate_link, &xe->rebind_resume_list); + } + mutex_unlock(&xe->rebind_resume_lock); + + return ret; +} + +/** + * xe_vm_resume_rebind_worker() - Resume the rebind worker. + * @vm: The vm whose preempt worker to resume. + * + * Resume a preempt worker that was previously suspended by + * vm_suspend_rebind_worker(). + */ +void xe_vm_resume_rebind_worker(struct xe_vm *vm) +{ + queue_work(vm->xe->ordered_wq, &vm->preempt.rebind_work); +} + static void preempt_rebind_work_func(struct work_struct *w) { struct xe_vm *vm = container_of(w, struct xe_vm, preempt.rebind_work); @@ -502,6 +532,11 @@ static void preempt_rebind_work_func(str }
retry: + if (!try_wait_for_completion(&vm->xe->pm_block) && vm_suspend_rebind_worker(vm)) { + up_write(&vm->lock); + return; + } + if (xe_vm_userptr_check_repin(vm)) { err = xe_vm_userptr_pin(vm); if (err) @@ -1686,6 +1721,7 @@ struct xe_vm *xe_vm_create(struct xe_dev if (flags & XE_VM_FLAG_LR_MODE) { INIT_WORK(&vm->preempt.rebind_work, preempt_rebind_work_func); xe_pm_runtime_get_noresume(xe); + INIT_LIST_HEAD(&vm->preempt.pm_activate_link); }
if (flags & XE_VM_FLAG_FAULT_MODE) { @@ -1867,8 +1903,12 @@ void xe_vm_close_and_put(struct xe_vm *v xe_assert(xe, !vm->preempt.num_exec_queues);
xe_vm_close(vm); - if (xe_vm_in_preempt_fence_mode(vm)) + if (xe_vm_in_preempt_fence_mode(vm)) { + mutex_lock(&xe->rebind_resume_lock); + list_del_init(&vm->preempt.pm_activate_link); + mutex_unlock(&xe->rebind_resume_lock); flush_work(&vm->preempt.rebind_work); + } if (xe_vm_in_fault_mode(vm)) xe_svm_close(vm);
--- a/drivers/gpu/drm/xe/xe_vm.h +++ b/drivers/gpu/drm/xe/xe_vm.h @@ -268,6 +268,8 @@ struct dma_fence *xe_vm_bind_kernel_bo(s struct xe_exec_queue *q, u64 addr, enum xe_cache_level cache_lvl);
+void xe_vm_resume_rebind_worker(struct xe_vm *vm); + /** * xe_vm_resv() - Return's the vm's reservation object * @vm: The vm --- a/drivers/gpu/drm/xe/xe_vm_types.h +++ b/drivers/gpu/drm/xe/xe_vm_types.h @@ -286,6 +286,11 @@ struct xe_vm { * BOs */ struct work_struct rebind_work; + /** + * @preempt.pm_activate_link: Link to list of rebind workers to be + * kicked on resume. + */ + struct list_head pm_activate_link; } preempt;
/** @um: unified memory state */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
commit 7838fb5f119191403560eca2e23613380c0e425e upstream.
Commit b61badd20b44 ("drm/amdgpu: fix usage slab after free") reordered when amdgpu_fence_driver_sw_fini() was called. After that patch, amdgpu_fence_driver_sw_fini() effectively became a no-op, as the sched entities were never freed because the ring pointers were already set to NULL. Remove the NULL setting.
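To illustrate (an assumed shape of the fini loop, not the exact upstream code): once amdgpu_ring_fini() ran first and cleared the pointers, a cleanup loop like the following finds nothing to free:

	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
		struct amdgpu_ring *ring = adev->rings[i];

		if (!ring)	/* already NULLed by amdgpu_ring_fini() */
			continue;
		/* free the ring's fence-driver resources / sched entities */
	}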
Reported-by: Lin.Cao lincao12@amd.com
Cc: Vitaly Prosyak vitaly.prosyak@amd.com
Cc: Christian König christian.koenig@amd.com
Fixes: b61badd20b44 ("drm/amdgpu: fix usage slab after free")
Reviewed-by: Christian König christian.koenig@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit a525fa37aac36c4591cc8b07ae8957862415fbd5)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 2 --
 1 file changed, 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c @@ -389,8 +389,6 @@ void amdgpu_ring_fini(struct amdgpu_ring dma_fence_put(ring->vmid_wait); ring->vmid_wait = NULL; ring->me = 0; - - ring->adev->rings[ring->idx] = NULL; }
/**
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Rosca david.rosca@amd.com
commit 3318f2d20ce48849855df5e190813826d0bc3653 upstream.
There is no reason to require this to happen on first submitted IB only. We need to wait for the queue to be idle, but it can be done at any time (including when there are multiple video sessions active).
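The idle wait itself is a simple last-fence wait; this sketch mirrors the hunk below (passing ~0ull to request the entity's most recent fence is taken from the diff, not independently verified):

	struct dma_fence *fence;

	fence = amdgpu_ctx_get_fence(p->ctx, job->base.entity, ~0ull);
	if (fence) {
		dma_fence_wait(fence, false);	/* uninterruptible wait for idle */
		dma_fence_put(fence);
	}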
Signed-off-by: David Rosca david.rosca@amd.com
Reviewed-by: Leo Liu leo.liu@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit 8908fdce0634a623404e9923ed2f536101a39db5)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 12 ++++++++----
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 12 ++++++++----
 2 files changed, 16 insertions(+), 8 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c @@ -1875,15 +1875,19 @@ static int vcn_v3_0_limit_sched(struct a struct amdgpu_job *job) { struct drm_gpu_scheduler **scheds; - - /* The create msg must be in the first IB submitted */ - if (atomic_read(&job->base.entity->fence_seq)) - return -EINVAL; + struct dma_fence *fence;
/* if VCN0 is harvested, we can't support AV1 */ if (p->adev->vcn.harvest_config & AMDGPU_VCN_HARVEST_VCN0) return -EINVAL;
+ /* wait for all jobs to finish before switching to instance 0 */ + fence = amdgpu_ctx_get_fence(p->ctx, job->base.entity, ~0ull); + if (fence) { + dma_fence_wait(fence, false); + dma_fence_put(fence); + } + scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_DEC] [AMDGPU_RING_PRIO_DEFAULT].sched; drm_sched_entity_modify_sched(job->base.entity, scheds, 1); --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c @@ -1807,15 +1807,19 @@ static int vcn_v4_0_limit_sched(struct a struct amdgpu_job *job) { struct drm_gpu_scheduler **scheds; - - /* The create msg must be in the first IB submitted */ - if (atomic_read(&job->base.entity->fence_seq)) - return -EINVAL; + struct dma_fence *fence;
/* if VCN0 is harvested, we can't support AV1 */ if (p->adev->vcn.harvest_config & AMDGPU_VCN_HARVEST_VCN0) return -EINVAL;
+ /* wait for all jobs to finish before switching to instance 0 */ + fence = amdgpu_ctx_get_fence(p->ctx, job->base.entity, ~0ull); + if (fence) { + dma_fence_wait(fence, false); + dma_fence_put(fence); + } + scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_ENC] [AMDGPU_RING_PRIO_0].sched; drm_sched_entity_modify_sched(job->base.entity, scheds, 1);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Rosca david.rosca@amd.com
commit 2b10cb58d7a3fd621ec9b2ba765a092e562ef998 upstream.
There can be multiple engine info packages in one IB and the first one may be a common engine, not decode/encode. We need to parse the entire IB instead of stopping after finding the first engine info.
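The resulting scan walks length-prefixed packages to the end of the IB; a condensed sketch (find_ib_param() and handle_engine_info() are placeholder names for the helpers in the diff below, which additionally keeps scanning past non-matching encode packages):

	int idx = 0;

	while ((idx = find_ib_param(ib, RADEON_VCN_ENGINE_INFO, idx)) >= 0) {
		u32 type = ib->ptr[idx + 2];	/* RADEON_VCN_ENGINE_TYPE */

		if (type == RADEON_VCN_ENGINE_TYPE_DECODE ||
		    type == RADEON_VCN_ENGINE_TYPE_ENCODE)
			return handle_engine_info(p, job, ib, idx);

		idx += ib->ptr[idx] / 4;	/* package length is in dwords */
	}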
Signed-off-by: David Rosca david.rosca@amd.com
Reviewed-by: Leo Liu leo.liu@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit dc8f9f0f45166a6b37864e7a031c726981d6e5fc)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 52 +++++++++++++---------------------
 1 file changed, 21 insertions(+), 31 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c @@ -1910,22 +1910,16 @@ out:
#define RADEON_VCN_ENGINE_TYPE_ENCODE (0x00000002) #define RADEON_VCN_ENGINE_TYPE_DECODE (0x00000003) - #define RADEON_VCN_ENGINE_INFO (0x30000001) -#define RADEON_VCN_ENGINE_INFO_MAX_OFFSET 16 - #define RENCODE_ENCODE_STANDARD_AV1 2 #define RENCODE_IB_PARAM_SESSION_INIT 0x00000003 -#define RENCODE_IB_PARAM_SESSION_INIT_MAX_OFFSET 64
-/* return the offset in ib if id is found, -1 otherwise - * to speed up the searching we only search upto max_offset - */ -static int vcn_v4_0_enc_find_ib_param(struct amdgpu_ib *ib, uint32_t id, int max_offset) +/* return the offset in ib if id is found, -1 otherwise */ +static int vcn_v4_0_enc_find_ib_param(struct amdgpu_ib *ib, uint32_t id, int start) { int i;
- for (i = 0; i < ib->length_dw && i < max_offset && ib->ptr[i] >= 8; i += ib->ptr[i]/4) { + for (i = start; i < ib->length_dw && ib->ptr[i] >= 8; i += ib->ptr[i] / 4) { if (ib->ptr[i + 1] == id) return i; } @@ -1940,33 +1934,29 @@ static int vcn_v4_0_ring_patch_cs_in_pla struct amdgpu_vcn_decode_buffer *decode_buffer; uint64_t addr; uint32_t val; - int idx; + int idx = 0, sidx;
/* The first instance can decode anything */ if (!ring->me) return 0;
- /* RADEON_VCN_ENGINE_INFO is at the top of ib block */ - idx = vcn_v4_0_enc_find_ib_param(ib, RADEON_VCN_ENGINE_INFO, - RADEON_VCN_ENGINE_INFO_MAX_OFFSET); - if (idx < 0) /* engine info is missing */ - return 0; - - val = amdgpu_ib_get_value(ib, idx + 2); /* RADEON_VCN_ENGINE_TYPE */ - if (val == RADEON_VCN_ENGINE_TYPE_DECODE) { - decode_buffer = (struct amdgpu_vcn_decode_buffer *)&ib->ptr[idx + 6]; - - if (!(decode_buffer->valid_buf_flag & 0x1)) - return 0; - - addr = ((u64)decode_buffer->msg_buffer_address_hi) << 32 | - decode_buffer->msg_buffer_address_lo; - return vcn_v4_0_dec_msg(p, job, addr); - } else if (val == RADEON_VCN_ENGINE_TYPE_ENCODE) { - idx = vcn_v4_0_enc_find_ib_param(ib, RENCODE_IB_PARAM_SESSION_INIT, - RENCODE_IB_PARAM_SESSION_INIT_MAX_OFFSET); - if (idx >= 0 && ib->ptr[idx + 2] == RENCODE_ENCODE_STANDARD_AV1) - return vcn_v4_0_limit_sched(p, job); + while ((idx = vcn_v4_0_enc_find_ib_param(ib, RADEON_VCN_ENGINE_INFO, idx)) >= 0) { + val = amdgpu_ib_get_value(ib, idx + 2); /* RADEON_VCN_ENGINE_TYPE */ + if (val == RADEON_VCN_ENGINE_TYPE_DECODE) { + decode_buffer = (struct amdgpu_vcn_decode_buffer *)&ib->ptr[idx + 6]; + + if (!(decode_buffer->valid_buf_flag & 0x1)) + return 0; + + addr = ((u64)decode_buffer->msg_buffer_address_hi) << 32 | + decode_buffer->msg_buffer_address_lo; + return vcn_v4_0_dec_msg(p, job, addr); + } else if (val == RADEON_VCN_ENGINE_TYPE_ENCODE) { + sidx = vcn_v4_0_enc_find_ib_param(ib, RENCODE_IB_PARAM_SESSION_INIT, idx); + if (sidx >= 0 && ib->ptr[sidx + 2] == RENCODE_ENCODE_STANDARD_AV1) + return vcn_v4_0_limit_sched(p, job); + } + idx += ib->ptr[idx] / 4; } return 0; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ovidiu Bunea ovidiu.bunea@amd.com
commit 70f0b051f82d0234ade2f6753f72a2610048db3b upstream.
[why] The current PG & RCG programming in the driver has some gaps and incorrect sequences.
[how] Added delays after ungating clocks to allow ramp-up, increased polling to allow more time for power-up, and removed the incorrect sequences.
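The ramp-up delay amounts to a fixed wait after flipping the root-gate bit; a sketch using one instance (REG_UPDATE is the existing DC register macro; the register, field, and 10us value are as in the diff below):

	/* Ungate the DPP root clock, then give it time to ramp. */
	REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK0_ROOT_GATE_DISABLE, 1);
	udelay(10);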
Cc: Mario Limonciello mario.limonciello@amd.com
Cc: Alex Deucher alexander.deucher@amd.com
Reviewed-by: Charlene Liu charlene.liu@amd.com
Signed-off-by: Ovidiu Bunea ovidiu.bunea@amd.com
Signed-off-by: Wayne Lin wayne.lin@amd.com
Tested-by: Dan Wheeler daniel.wheeler@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit 1bde5584e297921f45911ae874b0175dce5ed4b5)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/amd/display/dc/dc.h                      |   1
 drivers/gpu/drm/amd/display/dc/dccg/dcn35/dcn35_dccg.c   |  74 +++++----
 drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c  | 115 ++------------
 drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c   |   3
 drivers/gpu/drm/amd/display/dc/hwss/dcn351/dcn351_init.c |   3
 drivers/gpu/drm/amd/display/dc/inc/hw/pg_cntl.h          |   1
 drivers/gpu/drm/amd/display/dc/pg/dcn35/dcn35_pg_cntl.c  |  78 ++++++----
 7 files changed, 111 insertions(+), 164 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/dc.h +++ b/drivers/gpu/drm/amd/display/dc/dc.h @@ -1095,6 +1095,7 @@ struct dc_debug_options { bool enable_hblank_borrow; bool force_subvp_df_throttle; uint32_t acpi_transition_bitmasks[MAX_PIPES]; + bool enable_pg_cntl_debug_logs; };
--- a/drivers/gpu/drm/amd/display/dc/dccg/dcn35/dcn35_dccg.c +++ b/drivers/gpu/drm/amd/display/dc/dccg/dcn35/dcn35_dccg.c @@ -133,30 +133,34 @@ enum dsc_clk_source { };
-static void dccg35_set_dsc_clk_rcg(struct dccg *dccg, int inst, bool enable) +static void dccg35_set_dsc_clk_rcg(struct dccg *dccg, int inst, bool allow_rcg) { struct dcn_dccg *dccg_dcn = TO_DCN_DCCG(dccg);
- if (!dccg->ctx->dc->debug.root_clock_optimization.bits.dsc && enable) + if (!dccg->ctx->dc->debug.root_clock_optimization.bits.dsc && allow_rcg) return;
switch (inst) { case 0: - REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DSCCLK0_ROOT_GATE_DISABLE, enable ? 0 : 1); + REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DSCCLK0_ROOT_GATE_DISABLE, allow_rcg ? 0 : 1); break; case 1: - REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DSCCLK1_ROOT_GATE_DISABLE, enable ? 0 : 1); + REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DSCCLK1_ROOT_GATE_DISABLE, allow_rcg ? 0 : 1); break; case 2: - REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DSCCLK2_ROOT_GATE_DISABLE, enable ? 0 : 1); + REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DSCCLK2_ROOT_GATE_DISABLE, allow_rcg ? 0 : 1); break; case 3: - REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DSCCLK3_ROOT_GATE_DISABLE, enable ? 0 : 1); + REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DSCCLK3_ROOT_GATE_DISABLE, allow_rcg ? 0 : 1); break; default: BREAK_TO_DEBUGGER(); return; } + + /* Wait for clock to ramp */ + if (!allow_rcg) + udelay(10); }
static void dccg35_set_symclk32_se_rcg( @@ -385,35 +389,34 @@ static void dccg35_set_dtbclk_p_rcg(stru } }
-static void dccg35_set_dppclk_rcg(struct dccg *dccg, - int inst, bool enable) +static void dccg35_set_dppclk_rcg(struct dccg *dccg, int inst, bool allow_rcg) { - struct dcn_dccg *dccg_dcn = TO_DCN_DCCG(dccg);
- - if (!dccg->ctx->dc->debug.root_clock_optimization.bits.dpp && enable) + if (!dccg->ctx->dc->debug.root_clock_optimization.bits.dpp && allow_rcg) return;
switch (inst) { case 0: - REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK0_ROOT_GATE_DISABLE, enable ? 0 : 1); + REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK0_ROOT_GATE_DISABLE, allow_rcg ? 0 : 1); break; case 1: - REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK1_ROOT_GATE_DISABLE, enable ? 0 : 1); + REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK1_ROOT_GATE_DISABLE, allow_rcg ? 0 : 1); break; case 2: - REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK2_ROOT_GATE_DISABLE, enable ? 0 : 1); + REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK2_ROOT_GATE_DISABLE, allow_rcg ? 0 : 1); break; case 3: - REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK3_ROOT_GATE_DISABLE, enable ? 0 : 1); + REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK3_ROOT_GATE_DISABLE, allow_rcg ? 0 : 1); break; default: BREAK_TO_DEBUGGER(); break; } - //DC_LOG_DEBUG("%s: inst(%d) DPPCLK rcg_disable: %d\n", __func__, inst, enable ? 0 : 1);
+ /* Wait for clock to ramp */ + if (!allow_rcg) + udelay(10); }
static void dccg35_set_dpstreamclk_rcg( @@ -1177,32 +1180,34 @@ static void dccg35_update_dpp_dto(struct }
static void dccg35_set_dppclk_root_clock_gating(struct dccg *dccg, - uint32_t dpp_inst, uint32_t enable) + uint32_t dpp_inst, uint32_t disallow_rcg) { struct dcn_dccg *dccg_dcn = TO_DCN_DCCG(dccg);
- if (!dccg->ctx->dc->debug.root_clock_optimization.bits.dpp) + if (!dccg->ctx->dc->debug.root_clock_optimization.bits.dpp && !disallow_rcg) return;
switch (dpp_inst) { case 0: - REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK0_ROOT_GATE_DISABLE, enable); + REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK0_ROOT_GATE_DISABLE, disallow_rcg); break; case 1: - REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK1_ROOT_GATE_DISABLE, enable); + REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK1_ROOT_GATE_DISABLE, disallow_rcg); break; case 2: - REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK2_ROOT_GATE_DISABLE, enable); + REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK2_ROOT_GATE_DISABLE, disallow_rcg); break; case 3: - REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK3_ROOT_GATE_DISABLE, enable); + REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DPPCLK3_ROOT_GATE_DISABLE, disallow_rcg); break; default: break; } - //DC_LOG_DEBUG("%s: dpp_inst(%d) rcg: %d\n", __func__, dpp_inst, enable);
+ /* Wait for clock to ramp */ + if (disallow_rcg) + udelay(10); }
static void dccg35_get_pixel_rate_div( @@ -1782,8 +1787,7 @@ static void dccg35_enable_dscclk(struct //Disable DTO switch (inst) { case 0: - if (dccg->ctx->dc->debug.root_clock_optimization.bits.dsc) - REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DSCCLK0_ROOT_GATE_DISABLE, 1); + REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DSCCLK0_ROOT_GATE_DISABLE, 1);
REG_UPDATE_2(DSCCLK0_DTO_PARAM, DSCCLK0_DTO_PHASE, 0, @@ -1791,8 +1795,7 @@ static void dccg35_enable_dscclk(struct REG_UPDATE(DSCCLK_DTO_CTRL, DSCCLK0_EN, 1); break; case 1: - if (dccg->ctx->dc->debug.root_clock_optimization.bits.dsc) - REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DSCCLK1_ROOT_GATE_DISABLE, 1); + REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DSCCLK1_ROOT_GATE_DISABLE, 1);
REG_UPDATE_2(DSCCLK1_DTO_PARAM, DSCCLK1_DTO_PHASE, 0, @@ -1800,8 +1803,7 @@ static void dccg35_enable_dscclk(struct REG_UPDATE(DSCCLK_DTO_CTRL, DSCCLK1_EN, 1); break; case 2: - if (dccg->ctx->dc->debug.root_clock_optimization.bits.dsc) - REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DSCCLK2_ROOT_GATE_DISABLE, 1); + REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DSCCLK2_ROOT_GATE_DISABLE, 1);
REG_UPDATE_2(DSCCLK2_DTO_PARAM, DSCCLK2_DTO_PHASE, 0, @@ -1809,8 +1811,7 @@ static void dccg35_enable_dscclk(struct REG_UPDATE(DSCCLK_DTO_CTRL, DSCCLK2_EN, 1); break; case 3: - if (dccg->ctx->dc->debug.root_clock_optimization.bits.dsc) - REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DSCCLK3_ROOT_GATE_DISABLE, 1); + REG_UPDATE(DCCG_GATE_DISABLE_CNTL6, DSCCLK3_ROOT_GATE_DISABLE, 1);
REG_UPDATE_2(DSCCLK3_DTO_PARAM, DSCCLK3_DTO_PHASE, 0, @@ -1821,6 +1822,9 @@ static void dccg35_enable_dscclk(struct BREAK_TO_DEBUGGER(); return; } + + /* Wait for clock to ramp */ + udelay(10); }
static void dccg35_disable_dscclk(struct dccg *dccg, @@ -1864,6 +1868,9 @@ static void dccg35_disable_dscclk(struct default: return; } + + /* Wait for clock ramp */ + udelay(10); }
static void dccg35_enable_symclk_se(struct dccg *dccg, uint32_t stream_enc_inst, uint32_t link_enc_inst) @@ -2349,10 +2356,7 @@ static void dccg35_disable_symclk_se_cb(
void dccg35_root_gate_disable_control(struct dccg *dccg, uint32_t pipe_idx, uint32_t disable_clock_gating) { - - if (dccg->ctx->dc->debug.root_clock_optimization.bits.dpp) { - dccg35_set_dppclk_root_clock_gating(dccg, pipe_idx, disable_clock_gating); - } + dccg35_set_dppclk_root_clock_gating(dccg, pipe_idx, disable_clock_gating); }
static const struct dccg_funcs dccg35_funcs_new = { --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c @@ -113,6 +113,14 @@ static void enable_memory_low_power(stru } #endif
+static void print_pg_status(struct dc *dc, const char *debug_func, const char *debug_log) +{ + if (dc->debug.enable_pg_cntl_debug_logs && dc->res_pool->pg_cntl) { + if (dc->res_pool->pg_cntl->funcs->print_pg_status) + dc->res_pool->pg_cntl->funcs->print_pg_status(dc->res_pool->pg_cntl, debug_func, debug_log); + } +} + void dcn35_set_dmu_fgcg(struct dce_hwseq *hws, bool enable) { REG_UPDATE_3(DMU_CLK_CNTL, @@ -137,6 +145,8 @@ void dcn35_init_hw(struct dc *dc) uint32_t user_level = MAX_BACKLIGHT_LEVEL; int i;
+ print_pg_status(dc, __func__, ": start"); + if (dc->clk_mgr && dc->clk_mgr->funcs->init_clocks) dc->clk_mgr->funcs->init_clocks(dc->clk_mgr);
@@ -200,10 +210,7 @@ void dcn35_init_hw(struct dc *dc)
/* we want to turn off all dp displays before doing detection */ dc->link_srv->blank_all_dp_displays(dc); -/* - if (hws->funcs.enable_power_gating_plane) - hws->funcs.enable_power_gating_plane(dc->hwseq, true); -*/ + if (res_pool->hubbub && res_pool->hubbub->funcs->dchubbub_init) res_pool->hubbub->funcs->dchubbub_init(dc->res_pool->hubbub); /* If taking control over from VBIOS, we may want to optimize our first @@ -236,6 +243,8 @@ void dcn35_init_hw(struct dc *dc) }
hws->funcs.init_pipes(dc, dc->current_state); + print_pg_status(dc, __func__, ": after init_pipes"); + if (dc->res_pool->hubbub->funcs->allow_self_refresh_control && !dc->res_pool->hubbub->ctx->dc->debug.disable_stutter) dc->res_pool->hubbub->funcs->allow_self_refresh_control(dc->res_pool->hubbub, @@ -312,6 +321,7 @@ void dcn35_init_hw(struct dc *dc) if (dc->res_pool->pg_cntl->funcs->init_pg_status) dc->res_pool->pg_cntl->funcs->init_pg_status(dc->res_pool->pg_cntl); } + print_pg_status(dc, __func__, ": after init_pg_status"); }
static void update_dsc_on_stream(struct pipe_ctx *pipe_ctx, bool enable) @@ -500,97 +510,6 @@ void dcn35_physymclk_root_clock_control( } }
-void dcn35_dsc_pg_control( - struct dce_hwseq *hws, - unsigned int dsc_inst, - bool power_on) -{ - uint32_t power_gate = power_on ? 0 : 1; - uint32_t pwr_status = power_on ? 0 : 2; - uint32_t org_ip_request_cntl = 0; - - if (hws->ctx->dc->debug.disable_dsc_power_gate) - return; - if (hws->ctx->dc->debug.ignore_pg) - return; - REG_GET(DC_IP_REQUEST_CNTL, IP_REQUEST_EN, &org_ip_request_cntl); - if (org_ip_request_cntl == 0) - REG_SET(DC_IP_REQUEST_CNTL, 0, IP_REQUEST_EN, 1); - - switch (dsc_inst) { - case 0: /* DSC0 */ - REG_UPDATE(DOMAIN16_PG_CONFIG, - DOMAIN_POWER_GATE, power_gate); - - REG_WAIT(DOMAIN16_PG_STATUS, - DOMAIN_PGFSM_PWR_STATUS, pwr_status, - 1, 1000); - break; - case 1: /* DSC1 */ - REG_UPDATE(DOMAIN17_PG_CONFIG, - DOMAIN_POWER_GATE, power_gate); - - REG_WAIT(DOMAIN17_PG_STATUS, - DOMAIN_PGFSM_PWR_STATUS, pwr_status, - 1, 1000); - break; - case 2: /* DSC2 */ - REG_UPDATE(DOMAIN18_PG_CONFIG, - DOMAIN_POWER_GATE, power_gate); - - REG_WAIT(DOMAIN18_PG_STATUS, - DOMAIN_PGFSM_PWR_STATUS, pwr_status, - 1, 1000); - break; - case 3: /* DSC3 */ - REG_UPDATE(DOMAIN19_PG_CONFIG, - DOMAIN_POWER_GATE, power_gate); - - REG_WAIT(DOMAIN19_PG_STATUS, - DOMAIN_PGFSM_PWR_STATUS, pwr_status, - 1, 1000); - break; - default: - BREAK_TO_DEBUGGER(); - break; - } - - if (org_ip_request_cntl == 0) - REG_SET(DC_IP_REQUEST_CNTL, 0, IP_REQUEST_EN, 0); -} - -void dcn35_enable_power_gating_plane(struct dce_hwseq *hws, bool enable) -{ - bool force_on = true; /* disable power gating */ - uint32_t org_ip_request_cntl = 0; - - if (hws->ctx->dc->debug.disable_hubp_power_gate) - return; - if (hws->ctx->dc->debug.ignore_pg) - return; - REG_GET(DC_IP_REQUEST_CNTL, IP_REQUEST_EN, &org_ip_request_cntl); - if (org_ip_request_cntl == 0) - REG_SET(DC_IP_REQUEST_CNTL, 0, IP_REQUEST_EN, 1); - /* DCHUBP0/1/2/3/4/5 */ - REG_UPDATE(DOMAIN0_PG_CONFIG, DOMAIN_POWER_FORCEON, force_on); - REG_UPDATE(DOMAIN2_PG_CONFIG, DOMAIN_POWER_FORCEON, force_on); - /* DPP0/1/2/3/4/5 */ - REG_UPDATE(DOMAIN1_PG_CONFIG, DOMAIN_POWER_FORCEON, force_on); - REG_UPDATE(DOMAIN3_PG_CONFIG, DOMAIN_POWER_FORCEON, force_on); - - force_on = true; /* disable power gating */ - if (enable && !hws->ctx->dc->debug.disable_dsc_power_gate) - force_on = false; - - /* DCS0/1/2/3/4 */ - REG_UPDATE(DOMAIN16_PG_CONFIG, DOMAIN_POWER_FORCEON, force_on); - REG_UPDATE(DOMAIN17_PG_CONFIG, DOMAIN_POWER_FORCEON, force_on); - REG_UPDATE(DOMAIN18_PG_CONFIG, DOMAIN_POWER_FORCEON, force_on); - REG_UPDATE(DOMAIN19_PG_CONFIG, DOMAIN_POWER_FORCEON, force_on); - - -} - /* In headless boot cases, DIG may be turned * on which causes HW/SW discrepancies. * To avoid this, power down hardware on boot @@ -1453,6 +1372,8 @@ void dcn35_prepare_bandwidth( }
dcn20_prepare_bandwidth(dc, context); + + print_pg_status(dc, __func__, ": after rcg and power up"); }
void dcn35_optimize_bandwidth( @@ -1461,6 +1382,8 @@ void dcn35_optimize_bandwidth( { struct pg_block_update pg_update_state;
+ print_pg_status(dc, __func__, ": before rcg and power up"); + dcn20_optimize_bandwidth(dc, context);
if (dc->hwss.calc_blocks_to_gate) { @@ -1472,6 +1395,8 @@ void dcn35_optimize_bandwidth( if (dc->hwss.root_clock_control) dc->hwss.root_clock_control(dc, &pg_update_state, false); } + + print_pg_status(dc, __func__, ": after rcg and power up"); }
void dcn35_set_drr(struct pipe_ctx **pipe_ctx, --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c @@ -115,7 +115,6 @@ static const struct hw_sequencer_funcs d .exit_optimized_pwr_state = dcn21_exit_optimized_pwr_state, .update_visual_confirm_color = dcn10_update_visual_confirm_color, .apply_idle_power_optimizations = dcn35_apply_idle_power_optimizations, - .update_dsc_pg = dcn32_update_dsc_pg, .calc_blocks_to_gate = dcn35_calc_blocks_to_gate, .calc_blocks_to_ungate = dcn35_calc_blocks_to_ungate, .hw_block_power_up = dcn35_hw_block_power_up, @@ -150,7 +149,6 @@ static const struct hwseq_private_funcs .plane_atomic_disable = dcn35_plane_atomic_disable, //.plane_atomic_disable = dcn20_plane_atomic_disable,/*todo*/ //.hubp_pg_control = dcn35_hubp_pg_control, - .enable_power_gating_plane = dcn35_enable_power_gating_plane, .dpp_root_clock_control = dcn35_dpp_root_clock_control, .dpstream_root_clock_control = dcn35_dpstream_root_clock_control, .physymclk_root_clock_control = dcn35_physymclk_root_clock_control, @@ -165,7 +163,6 @@ static const struct hwseq_private_funcs .calculate_dccg_k1_k2_values = dcn32_calculate_dccg_k1_k2_values, .resync_fifo_dccg_dio = dcn314_resync_fifo_dccg_dio, .is_dp_dig_pixel_rate_div_policy = dcn35_is_dp_dig_pixel_rate_div_policy, - .dsc_pg_control = dcn35_dsc_pg_control, .dsc_pg_status = dcn32_dsc_pg_status, .enable_plane = dcn35_enable_plane, .wait_for_pipe_update_if_needed = dcn10_wait_for_pipe_update_if_needed, --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn351/dcn351_init.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn351/dcn351_init.c @@ -114,7 +114,6 @@ static const struct hw_sequencer_funcs d .exit_optimized_pwr_state = dcn21_exit_optimized_pwr_state, .update_visual_confirm_color = dcn10_update_visual_confirm_color, .apply_idle_power_optimizations = dcn35_apply_idle_power_optimizations, - .update_dsc_pg = dcn32_update_dsc_pg, .calc_blocks_to_gate = dcn351_calc_blocks_to_gate, .calc_blocks_to_ungate = dcn351_calc_blocks_to_ungate, .hw_block_power_up = dcn351_hw_block_power_up, @@ -145,7 +144,6 @@ static const struct hwseq_private_funcs .plane_atomic_disable = dcn35_plane_atomic_disable, //.plane_atomic_disable = dcn20_plane_atomic_disable,/*todo*/ //.hubp_pg_control = dcn35_hubp_pg_control, - .enable_power_gating_plane = dcn35_enable_power_gating_plane, .dpp_root_clock_control = dcn35_dpp_root_clock_control, .dpstream_root_clock_control = dcn35_dpstream_root_clock_control, .physymclk_root_clock_control = dcn35_physymclk_root_clock_control, @@ -159,7 +157,6 @@ static const struct hwseq_private_funcs .setup_hpo_hw_control = dcn35_setup_hpo_hw_control, .calculate_dccg_k1_k2_values = dcn32_calculate_dccg_k1_k2_values, .is_dp_dig_pixel_rate_div_policy = dcn35_is_dp_dig_pixel_rate_div_policy, - .dsc_pg_control = dcn35_dsc_pg_control, .dsc_pg_status = dcn32_dsc_pg_status, .enable_plane = dcn35_enable_plane, .wait_for_pipe_update_if_needed = dcn10_wait_for_pipe_update_if_needed, --- a/drivers/gpu/drm/amd/display/dc/inc/hw/pg_cntl.h +++ b/drivers/gpu/drm/amd/display/dc/inc/hw/pg_cntl.h @@ -47,6 +47,7 @@ struct pg_cntl_funcs { void (*optc_pg_control)(struct pg_cntl *pg_cntl, unsigned int optc_inst, bool power_on); void (*dwb_pg_control)(struct pg_cntl *pg_cntl, bool power_on); void (*init_pg_status)(struct pg_cntl *pg_cntl); + void (*print_pg_status)(struct pg_cntl *pg_cntl, const char *debug_func, const char *debug_log); };
#endif //__DC_PG_CNTL_H__ --- a/drivers/gpu/drm/amd/display/dc/pg/dcn35/dcn35_pg_cntl.c +++ b/drivers/gpu/drm/amd/display/dc/pg/dcn35/dcn35_pg_cntl.c @@ -79,16 +79,12 @@ void pg_cntl35_dsc_pg_control(struct pg_ uint32_t power_gate = power_on ? 0 : 1; uint32_t pwr_status = power_on ? 0 : 2; uint32_t org_ip_request_cntl = 0; - bool block_enabled; - - /*need to enable dscclk regardless DSC_PG*/ - if (pg_cntl->ctx->dc->res_pool->dccg->funcs->enable_dsc && power_on) - pg_cntl->ctx->dc->res_pool->dccg->funcs->enable_dsc( - pg_cntl->ctx->dc->res_pool->dccg, dsc_inst); + bool block_enabled = false; + bool skip_pg = pg_cntl->ctx->dc->debug.ignore_pg || + pg_cntl->ctx->dc->debug.disable_dsc_power_gate || + pg_cntl->ctx->dc->idle_optimizations_allowed;
- if (pg_cntl->ctx->dc->debug.ignore_pg || - pg_cntl->ctx->dc->debug.disable_dsc_power_gate || - pg_cntl->ctx->dc->idle_optimizations_allowed) + if (skip_pg && !power_on) return;
block_enabled = pg_cntl35_dsc_pg_status(pg_cntl, dsc_inst); @@ -111,7 +107,7 @@ void pg_cntl35_dsc_pg_control(struct pg_
REG_WAIT(DOMAIN16_PG_STATUS, DOMAIN_PGFSM_PWR_STATUS, pwr_status, - 1, 1000); + 1, 10000); break; case 1: /* DSC1 */ REG_UPDATE(DOMAIN17_PG_CONFIG, @@ -119,7 +115,7 @@ void pg_cntl35_dsc_pg_control(struct pg_
REG_WAIT(DOMAIN17_PG_STATUS, DOMAIN_PGFSM_PWR_STATUS, pwr_status, - 1, 1000); + 1, 10000); break; case 2: /* DSC2 */ REG_UPDATE(DOMAIN18_PG_CONFIG, @@ -127,7 +123,7 @@ void pg_cntl35_dsc_pg_control(struct pg_
REG_WAIT(DOMAIN18_PG_STATUS, DOMAIN_PGFSM_PWR_STATUS, pwr_status, - 1, 1000); + 1, 10000); break; case 3: /* DSC3 */ REG_UPDATE(DOMAIN19_PG_CONFIG, @@ -135,7 +131,7 @@ void pg_cntl35_dsc_pg_control(struct pg_
REG_WAIT(DOMAIN19_PG_STATUS, DOMAIN_PGFSM_PWR_STATUS, pwr_status, - 1, 1000); + 1, 10000); break; default: BREAK_TO_DEBUGGER(); @@ -144,12 +140,6 @@ void pg_cntl35_dsc_pg_control(struct pg_
if (dsc_inst < MAX_PIPES) pg_cntl->pg_pipe_res_enable[PG_DSC][dsc_inst] = power_on; - - if (pg_cntl->ctx->dc->res_pool->dccg->funcs->disable_dsc && !power_on) { - /*this is to disable dscclk*/ - pg_cntl->ctx->dc->res_pool->dccg->funcs->disable_dsc( - pg_cntl->ctx->dc->res_pool->dccg, dsc_inst); - } }
static bool pg_cntl35_hubp_dpp_pg_status(struct pg_cntl *pg_cntl, unsigned int hubp_dpp_inst) @@ -189,11 +179,12 @@ void pg_cntl35_hubp_dpp_pg_control(struc uint32_t pwr_status = power_on ? 0 : 2; uint32_t org_ip_request_cntl; bool block_enabled; + bool skip_pg = pg_cntl->ctx->dc->debug.ignore_pg || + pg_cntl->ctx->dc->debug.disable_hubp_power_gate || + pg_cntl->ctx->dc->debug.disable_dpp_power_gate || + pg_cntl->ctx->dc->idle_optimizations_allowed;
- if (pg_cntl->ctx->dc->debug.ignore_pg || - pg_cntl->ctx->dc->debug.disable_hubp_power_gate || - pg_cntl->ctx->dc->debug.disable_dpp_power_gate || - pg_cntl->ctx->dc->idle_optimizations_allowed) + if (skip_pg && !power_on) return;
block_enabled = pg_cntl35_hubp_dpp_pg_status(pg_cntl, hubp_dpp_inst); @@ -213,22 +204,22 @@ void pg_cntl35_hubp_dpp_pg_control(struc case 0: /* DPP0 & HUBP0 */ REG_UPDATE(DOMAIN0_PG_CONFIG, DOMAIN_POWER_GATE, power_gate); - REG_WAIT(DOMAIN0_PG_STATUS, DOMAIN_PGFSM_PWR_STATUS, pwr_status, 1, 1000); + REG_WAIT(DOMAIN0_PG_STATUS, DOMAIN_PGFSM_PWR_STATUS, pwr_status, 1, 10000); break; case 1: /* DPP1 & HUBP1 */ REG_UPDATE(DOMAIN1_PG_CONFIG, DOMAIN_POWER_GATE, power_gate); - REG_WAIT(DOMAIN1_PG_STATUS, DOMAIN_PGFSM_PWR_STATUS, pwr_status, 1, 1000); + REG_WAIT(DOMAIN1_PG_STATUS, DOMAIN_PGFSM_PWR_STATUS, pwr_status, 1, 10000); break; case 2: /* DPP2 & HUBP2 */ REG_UPDATE(DOMAIN2_PG_CONFIG, DOMAIN_POWER_GATE, power_gate); - REG_WAIT(DOMAIN2_PG_STATUS, DOMAIN_PGFSM_PWR_STATUS, pwr_status, 1, 1000); + REG_WAIT(DOMAIN2_PG_STATUS, DOMAIN_PGFSM_PWR_STATUS, pwr_status, 1, 10000); break; case 3: /* DPP3 & HUBP3 */ REG_UPDATE(DOMAIN3_PG_CONFIG, DOMAIN_POWER_GATE, power_gate); - REG_WAIT(DOMAIN3_PG_STATUS, DOMAIN_PGFSM_PWR_STATUS, pwr_status, 1, 1000); + REG_WAIT(DOMAIN3_PG_STATUS, DOMAIN_PGFSM_PWR_STATUS, pwr_status, 1, 10000); break; default: BREAK_TO_DEBUGGER(); @@ -501,6 +492,36 @@ void pg_cntl35_init_pg_status(struct pg_ pg_cntl->pg_res_enable[PG_DWB] = block_enabled; }
+static void pg_cntl35_print_pg_status(struct pg_cntl *pg_cntl, const char *debug_func, const char *debug_log) +{ + int i = 0; + bool block_enabled = false; + + DC_LOG_DEBUG("%s: %s", debug_func, debug_log); + + DC_LOG_DEBUG("PG_CNTL status:\n"); + + block_enabled = pg_cntl35_io_clk_status(pg_cntl); + DC_LOG_DEBUG("ONO0=%d (DCCG, DIO, DCIO)\n", block_enabled ? 1 : 0); + + block_enabled = pg_cntl35_mem_status(pg_cntl); + DC_LOG_DEBUG("ONO1=%d (DCHUBBUB, DCHVM, DCHUBBUBMEM)\n", block_enabled ? 1 : 0); + + block_enabled = pg_cntl35_plane_otg_status(pg_cntl); + DC_LOG_DEBUG("ONO2=%d (MPC, OPP, OPTC, DWB)\n", block_enabled ? 1 : 0); + + block_enabled = pg_cntl35_hpo_pg_status(pg_cntl); + DC_LOG_DEBUG("ONO3=%d (HPO)\n", block_enabled ? 1 : 0); + + for (i = 0; i < pg_cntl->ctx->dc->res_pool->pipe_count; i++) { + block_enabled = pg_cntl35_hubp_dpp_pg_status(pg_cntl, i); + DC_LOG_DEBUG("ONO%d=%d (DCHUBP%d, DPP%d)\n", 4 + i * 2, block_enabled ? 1 : 0, i, i); + + block_enabled = pg_cntl35_dsc_pg_status(pg_cntl, i); + DC_LOG_DEBUG("ONO%d=%d (DSC%d)\n", 5 + i * 2, block_enabled ? 1 : 0, i); + } +} + static const struct pg_cntl_funcs pg_cntl35_funcs = { .init_pg_status = pg_cntl35_init_pg_status, .dsc_pg_control = pg_cntl35_dsc_pg_control, @@ -511,7 +532,8 @@ static const struct pg_cntl_funcs pg_cnt .mpcc_pg_control = pg_cntl35_mpcc_pg_control, .opp_pg_control = pg_cntl35_opp_pg_control, .optc_pg_control = pg_cntl35_optc_pg_control, - .dwb_pg_control = pg_cntl35_dwb_pg_control + .dwb_pg_control = pg_cntl35_dwb_pg_control, + .print_pg_status = pg_cntl35_print_pg_status };
struct pg_cntl *pg_cntl35_create(
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geoffrey McRae geoffrey.mcrae@amd.com
commit 1dfd2864a1c4909147663e5a27c055f50f7c2796 upstream.
Fixes a bug where unbinding of the GPU would leave the oem i2c adapter registered, resulting in a null pointer dereference when applications try to access the now-invalid device.
Fixes: 3d5470c97314 ("drm/amd/display/dm: add support for OEM i2c bus")
Cc: Harry Wentland harry.wentland@amd.com
Reviewed-by: Alex Deucher alexander.deucher@amd.com
Signed-off-by: Geoffrey McRae geoffrey.mcrae@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit 89923fb7ead4fdd37b78dd49962d9bb5892403e6)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -2910,6 +2910,17 @@ static int dm_oem_i2c_hw_init(struct amd return 0; }
+static void dm_oem_i2c_hw_fini(struct amdgpu_device *adev) +{ + struct amdgpu_display_manager *dm = &adev->dm; + + if (dm->oem_i2c) { + i2c_del_adapter(&dm->oem_i2c->base); + kfree(dm->oem_i2c); + dm->oem_i2c = NULL; + } +} + /** * dm_hw_init() - Initialize DC device * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. @@ -2960,7 +2971,7 @@ static int dm_hw_fini(struct amdgpu_ip_b { struct amdgpu_device *adev = ip_block->adev;
- kfree(adev->dm.oem_i2c); + dm_oem_i2c_hw_fini(adev);
amdgpu_dm_hpd_fini(adev);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Imre Deak imre.deak@intel.com
commit 5281cbe0b55a1ff9c6c29361540016873bdc506e upstream.
An enum list is better suited to defining a quirk list, so do that. This makes looking up a quirk more robust and also allows separating quirks internal to the EDID parser from global quirks which can be queried outside of the EDID parser (added as a follow-up).
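The shape of the change, reduced to a standalone example (illustrative names only, not the real quirk list):

enum example_quirk {
	QUIRK_FORCE_6BPC,	/* one enum value per quirk... */
	QUIRK_NON_DESKTOP,
};

/* ...the table stores BIT(quirk) masks, and lookups test a single bit: */
static bool has_quirk(u32 quirks, enum example_quirk q)
{
	return quirks & BIT(q);
}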
Suggested-by: Jani Nikula jani.nikula@linux.intel.com
Reviewed-by: Jani Nikula jani.nikula@intel.com
Signed-off-by: Imre Deak imre.deak@intel.com
Link: https://lore.kernel.org/r/20250605082850.65136-3-imre.deak@intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/drm_edid.c | 218 +++++++++++++++++++++++----------------------
 1 file changed, 112 insertions(+), 106 deletions(-)
--- a/drivers/gpu/drm/drm_edid.c +++ b/drivers/gpu/drm/drm_edid.c @@ -66,34 +66,36 @@ static int oui(u8 first, u8 second, u8 t * on as many displays as possible). */
-/* First detailed mode wrong, use largest 60Hz mode */ -#define EDID_QUIRK_PREFER_LARGE_60 (1 << 0) -/* Reported 135MHz pixel clock is too high, needs adjustment */ -#define EDID_QUIRK_135_CLOCK_TOO_HIGH (1 << 1) -/* Prefer the largest mode at 75 Hz */ -#define EDID_QUIRK_PREFER_LARGE_75 (1 << 2) -/* Detail timing is in cm not mm */ -#define EDID_QUIRK_DETAILED_IN_CM (1 << 3) -/* Detailed timing descriptors have bogus size values, so just take the - * maximum size and use that. - */ -#define EDID_QUIRK_DETAILED_USE_MAXIMUM_SIZE (1 << 4) -/* use +hsync +vsync for detailed mode */ -#define EDID_QUIRK_DETAILED_SYNC_PP (1 << 6) -/* Force reduced-blanking timings for detailed modes */ -#define EDID_QUIRK_FORCE_REDUCED_BLANKING (1 << 7) -/* Force 8bpc */ -#define EDID_QUIRK_FORCE_8BPC (1 << 8) -/* Force 12bpc */ -#define EDID_QUIRK_FORCE_12BPC (1 << 9) -/* Force 6bpc */ -#define EDID_QUIRK_FORCE_6BPC (1 << 10) -/* Force 10bpc */ -#define EDID_QUIRK_FORCE_10BPC (1 << 11) -/* Non desktop display (i.e. HMD) */ -#define EDID_QUIRK_NON_DESKTOP (1 << 12) -/* Cap the DSC target bitrate to 15bpp */ -#define EDID_QUIRK_CAP_DSC_15BPP (1 << 13) +enum drm_edid_internal_quirk { + /* First detailed mode wrong, use largest 60Hz mode */ + EDID_QUIRK_PREFER_LARGE_60, + /* Reported 135MHz pixel clock is too high, needs adjustment */ + EDID_QUIRK_135_CLOCK_TOO_HIGH, + /* Prefer the largest mode at 75 Hz */ + EDID_QUIRK_PREFER_LARGE_75, + /* Detail timing is in cm not mm */ + EDID_QUIRK_DETAILED_IN_CM, + /* Detailed timing descriptors have bogus size values, so just take the + * maximum size and use that. + */ + EDID_QUIRK_DETAILED_USE_MAXIMUM_SIZE, + /* use +hsync +vsync for detailed mode */ + EDID_QUIRK_DETAILED_SYNC_PP, + /* Force reduced-blanking timings for detailed modes */ + EDID_QUIRK_FORCE_REDUCED_BLANKING, + /* Force 8bpc */ + EDID_QUIRK_FORCE_8BPC, + /* Force 12bpc */ + EDID_QUIRK_FORCE_12BPC, + /* Force 6bpc */ + EDID_QUIRK_FORCE_6BPC, + /* Force 10bpc */ + EDID_QUIRK_FORCE_10BPC, + /* Non desktop display (i.e. HMD) */ + EDID_QUIRK_NON_DESKTOP, + /* Cap the DSC target bitrate to 15bpp */ + EDID_QUIRK_CAP_DSC_15BPP, +};
#define MICROSOFT_IEEE_OUI 0xca125c
@@ -128,124 +130,124 @@ static const struct edid_quirk { u32 quirks; } edid_quirk_list[] = { /* Acer AL1706 */ - EDID_QUIRK('A', 'C', 'R', 44358, EDID_QUIRK_PREFER_LARGE_60), + EDID_QUIRK('A', 'C', 'R', 44358, BIT(EDID_QUIRK_PREFER_LARGE_60)), /* Acer F51 */ - EDID_QUIRK('A', 'P', 'I', 0x7602, EDID_QUIRK_PREFER_LARGE_60), + EDID_QUIRK('A', 'P', 'I', 0x7602, BIT(EDID_QUIRK_PREFER_LARGE_60)),
/* AEO model 0 reports 8 bpc, but is a 6 bpc panel */ - EDID_QUIRK('A', 'E', 'O', 0, EDID_QUIRK_FORCE_6BPC), + EDID_QUIRK('A', 'E', 'O', 0, BIT(EDID_QUIRK_FORCE_6BPC)),
/* BenQ GW2765 */ - EDID_QUIRK('B', 'N', 'Q', 0x78d6, EDID_QUIRK_FORCE_8BPC), + EDID_QUIRK('B', 'N', 'Q', 0x78d6, BIT(EDID_QUIRK_FORCE_8BPC)),
/* BOE model on HP Pavilion 15-n233sl reports 8 bpc, but is a 6 bpc panel */ - EDID_QUIRK('B', 'O', 'E', 0x78b, EDID_QUIRK_FORCE_6BPC), + EDID_QUIRK('B', 'O', 'E', 0x78b, BIT(EDID_QUIRK_FORCE_6BPC)),
/* CPT panel of Asus UX303LA reports 8 bpc, but is a 6 bpc panel */ - EDID_QUIRK('C', 'P', 'T', 0x17df, EDID_QUIRK_FORCE_6BPC), + EDID_QUIRK('C', 'P', 'T', 0x17df, BIT(EDID_QUIRK_FORCE_6BPC)),
/* SDC panel of Lenovo B50-80 reports 8 bpc, but is a 6 bpc panel */ - EDID_QUIRK('S', 'D', 'C', 0x3652, EDID_QUIRK_FORCE_6BPC), + EDID_QUIRK('S', 'D', 'C', 0x3652, BIT(EDID_QUIRK_FORCE_6BPC)),
/* BOE model 0x0771 reports 8 bpc, but is a 6 bpc panel */ - EDID_QUIRK('B', 'O', 'E', 0x0771, EDID_QUIRK_FORCE_6BPC), + EDID_QUIRK('B', 'O', 'E', 0x0771, BIT(EDID_QUIRK_FORCE_6BPC)),
/* Belinea 10 15 55 */ - EDID_QUIRK('M', 'A', 'X', 1516, EDID_QUIRK_PREFER_LARGE_60), - EDID_QUIRK('M', 'A', 'X', 0x77e, EDID_QUIRK_PREFER_LARGE_60), + EDID_QUIRK('M', 'A', 'X', 1516, BIT(EDID_QUIRK_PREFER_LARGE_60)), + EDID_QUIRK('M', 'A', 'X', 0x77e, BIT(EDID_QUIRK_PREFER_LARGE_60)),
/* Envision Peripherals, Inc. EN-7100e */ - EDID_QUIRK('E', 'P', 'I', 59264, EDID_QUIRK_135_CLOCK_TOO_HIGH), + EDID_QUIRK('E', 'P', 'I', 59264, BIT(EDID_QUIRK_135_CLOCK_TOO_HIGH)), /* Envision EN2028 */ - EDID_QUIRK('E', 'P', 'I', 8232, EDID_QUIRK_PREFER_LARGE_60), + EDID_QUIRK('E', 'P', 'I', 8232, BIT(EDID_QUIRK_PREFER_LARGE_60)),
/* Funai Electronics PM36B */ - EDID_QUIRK('F', 'C', 'M', 13600, EDID_QUIRK_PREFER_LARGE_75 | - EDID_QUIRK_DETAILED_IN_CM), + EDID_QUIRK('F', 'C', 'M', 13600, BIT(EDID_QUIRK_PREFER_LARGE_75) | + BIT(EDID_QUIRK_DETAILED_IN_CM)),
/* LG 27GP950 */ - EDID_QUIRK('G', 'S', 'M', 0x5bbf, EDID_QUIRK_CAP_DSC_15BPP), + EDID_QUIRK('G', 'S', 'M', 0x5bbf, BIT(EDID_QUIRK_CAP_DSC_15BPP)),
/* LG 27GN950 */ - EDID_QUIRK('G', 'S', 'M', 0x5b9a, EDID_QUIRK_CAP_DSC_15BPP), + EDID_QUIRK('G', 'S', 'M', 0x5b9a, BIT(EDID_QUIRK_CAP_DSC_15BPP)),
/* LGD panel of HP zBook 17 G2, eDP 10 bpc, but reports unknown bpc */ - EDID_QUIRK('L', 'G', 'D', 764, EDID_QUIRK_FORCE_10BPC), + EDID_QUIRK('L', 'G', 'D', 764, BIT(EDID_QUIRK_FORCE_10BPC)),
/* LG Philips LCD LP154W01-A5 */ - EDID_QUIRK('L', 'P', 'L', 0, EDID_QUIRK_DETAILED_USE_MAXIMUM_SIZE), - EDID_QUIRK('L', 'P', 'L', 0x2a00, EDID_QUIRK_DETAILED_USE_MAXIMUM_SIZE), + EDID_QUIRK('L', 'P', 'L', 0, BIT(EDID_QUIRK_DETAILED_USE_MAXIMUM_SIZE)), + EDID_QUIRK('L', 'P', 'L', 0x2a00, BIT(EDID_QUIRK_DETAILED_USE_MAXIMUM_SIZE)),
/* Samsung SyncMaster 205BW. Note: irony */ - EDID_QUIRK('S', 'A', 'M', 541, EDID_QUIRK_DETAILED_SYNC_PP), + EDID_QUIRK('S', 'A', 'M', 541, BIT(EDID_QUIRK_DETAILED_SYNC_PP)), /* Samsung SyncMaster 22[5-6]BW */ - EDID_QUIRK('S', 'A', 'M', 596, EDID_QUIRK_PREFER_LARGE_60), - EDID_QUIRK('S', 'A', 'M', 638, EDID_QUIRK_PREFER_LARGE_60), + EDID_QUIRK('S', 'A', 'M', 596, BIT(EDID_QUIRK_PREFER_LARGE_60)), + EDID_QUIRK('S', 'A', 'M', 638, BIT(EDID_QUIRK_PREFER_LARGE_60)),
/* Sony PVM-2541A does up to 12 bpc, but only reports max 8 bpc */ - EDID_QUIRK('S', 'N', 'Y', 0x2541, EDID_QUIRK_FORCE_12BPC), + EDID_QUIRK('S', 'N', 'Y', 0x2541, BIT(EDID_QUIRK_FORCE_12BPC)),
/* ViewSonic VA2026w */ - EDID_QUIRK('V', 'S', 'C', 5020, EDID_QUIRK_FORCE_REDUCED_BLANKING), + EDID_QUIRK('V', 'S', 'C', 5020, BIT(EDID_QUIRK_FORCE_REDUCED_BLANKING)),
/* Medion MD 30217 PG */ - EDID_QUIRK('M', 'E', 'D', 0x7b8, EDID_QUIRK_PREFER_LARGE_75), + EDID_QUIRK('M', 'E', 'D', 0x7b8, BIT(EDID_QUIRK_PREFER_LARGE_75)),
/* Lenovo G50 */ - EDID_QUIRK('S', 'D', 'C', 18514, EDID_QUIRK_FORCE_6BPC), + EDID_QUIRK('S', 'D', 'C', 18514, BIT(EDID_QUIRK_FORCE_6BPC)),
/* Panel in Samsung NP700G7A-S01PL notebook reports 6bpc */ - EDID_QUIRK('S', 'E', 'C', 0xd033, EDID_QUIRK_FORCE_8BPC), + EDID_QUIRK('S', 'E', 'C', 0xd033, BIT(EDID_QUIRK_FORCE_8BPC)),
/* Rotel RSX-1058 forwards sink's EDID but only does HDMI 1.1*/ - EDID_QUIRK('E', 'T', 'R', 13896, EDID_QUIRK_FORCE_8BPC), + EDID_QUIRK('E', 'T', 'R', 13896, BIT(EDID_QUIRK_FORCE_8BPC)),
/* Valve Index Headset */ - EDID_QUIRK('V', 'L', 'V', 0x91a8, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('V', 'L', 'V', 0x91b0, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('V', 'L', 'V', 0x91b1, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('V', 'L', 'V', 0x91b2, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('V', 'L', 'V', 0x91b3, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('V', 'L', 'V', 0x91b4, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('V', 'L', 'V', 0x91b5, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('V', 'L', 'V', 0x91b6, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('V', 'L', 'V', 0x91b7, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('V', 'L', 'V', 0x91b8, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('V', 'L', 'V', 0x91b9, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('V', 'L', 'V', 0x91ba, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('V', 'L', 'V', 0x91bb, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('V', 'L', 'V', 0x91bc, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('V', 'L', 'V', 0x91bd, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('V', 'L', 'V', 0x91be, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('V', 'L', 'V', 0x91bf, EDID_QUIRK_NON_DESKTOP), + EDID_QUIRK('V', 'L', 'V', 0x91a8, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('V', 'L', 'V', 0x91b0, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('V', 'L', 'V', 0x91b1, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('V', 'L', 'V', 0x91b2, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('V', 'L', 'V', 0x91b3, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('V', 'L', 'V', 0x91b4, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('V', 'L', 'V', 0x91b5, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('V', 'L', 'V', 0x91b6, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('V', 'L', 'V', 0x91b7, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('V', 'L', 'V', 0x91b8, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('V', 'L', 'V', 0x91b9, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('V', 'L', 'V', 0x91ba, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('V', 'L', 'V', 0x91bb, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('V', 'L', 'V', 0x91bc, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('V', 'L', 'V', 0x91bd, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('V', 'L', 'V', 0x91be, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('V', 'L', 'V', 0x91bf, BIT(EDID_QUIRK_NON_DESKTOP)),
/* HTC Vive and Vive Pro VR Headsets */ - EDID_QUIRK('H', 'V', 'R', 0xaa01, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('H', 'V', 'R', 0xaa02, EDID_QUIRK_NON_DESKTOP), + EDID_QUIRK('H', 'V', 'R', 0xaa01, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('H', 'V', 'R', 0xaa02, BIT(EDID_QUIRK_NON_DESKTOP)),
/* Oculus Rift DK1, DK2, CV1 and Rift S VR Headsets */ - EDID_QUIRK('O', 'V', 'R', 0x0001, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('O', 'V', 'R', 0x0003, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('O', 'V', 'R', 0x0004, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('O', 'V', 'R', 0x0012, EDID_QUIRK_NON_DESKTOP), + EDID_QUIRK('O', 'V', 'R', 0x0001, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('O', 'V', 'R', 0x0003, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('O', 'V', 'R', 0x0004, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('O', 'V', 'R', 0x0012, BIT(EDID_QUIRK_NON_DESKTOP)),
/* Windows Mixed Reality Headsets */ - EDID_QUIRK('A', 'C', 'R', 0x7fce, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('L', 'E', 'N', 0x0408, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('F', 'U', 'J', 0x1970, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('D', 'E', 'L', 0x7fce, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('S', 'E', 'C', 0x144a, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('A', 'U', 'S', 0xc102, EDID_QUIRK_NON_DESKTOP), + EDID_QUIRK('A', 'C', 'R', 0x7fce, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('L', 'E', 'N', 0x0408, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('F', 'U', 'J', 0x1970, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('D', 'E', 'L', 0x7fce, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('S', 'E', 'C', 0x144a, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('A', 'U', 'S', 0xc102, BIT(EDID_QUIRK_NON_DESKTOP)),
/* Sony PlayStation VR Headset */ - EDID_QUIRK('S', 'N', 'Y', 0x0704, EDID_QUIRK_NON_DESKTOP), + EDID_QUIRK('S', 'N', 'Y', 0x0704, BIT(EDID_QUIRK_NON_DESKTOP)),
/* Sensics VR Headsets */ - EDID_QUIRK('S', 'E', 'N', 0x1019, EDID_QUIRK_NON_DESKTOP), + EDID_QUIRK('S', 'E', 'N', 0x1019, BIT(EDID_QUIRK_NON_DESKTOP)),
/* OSVR HDK and HDK2 VR Headsets */ - EDID_QUIRK('S', 'V', 'R', 0x1019, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('A', 'U', 'O', 0x1111, EDID_QUIRK_NON_DESKTOP), + EDID_QUIRK('S', 'V', 'R', 0x1019, BIT(EDID_QUIRK_NON_DESKTOP)), + EDID_QUIRK('A', 'U', 'O', 0x1111, BIT(EDID_QUIRK_NON_DESKTOP)), };
/* @@ -2951,6 +2953,12 @@ static u32 edid_get_quirks(const struct return 0; }
+static bool drm_edid_has_internal_quirk(struct drm_connector *connector, + enum drm_edid_internal_quirk quirk) +{ + return connector->display_info.quirks & BIT(quirk); +} + #define MODE_SIZE(m) ((m)->hdisplay * (m)->vdisplay) #define MODE_REFRESH_DIFF(c,t) (abs((c) - (t)))
@@ -2960,7 +2968,6 @@ static u32 edid_get_quirks(const struct */ static void edid_fixup_preferred(struct drm_connector *connector) { - const struct drm_display_info *info = &connector->display_info; struct drm_display_mode *t, *cur_mode, *preferred_mode; int target_refresh = 0; int cur_vrefresh, preferred_vrefresh; @@ -2968,9 +2975,9 @@ static void edid_fixup_preferred(struct if (list_empty(&connector->probed_modes)) return;
- if (info->quirks & EDID_QUIRK_PREFER_LARGE_60) + if (drm_edid_has_internal_quirk(connector, EDID_QUIRK_PREFER_LARGE_60)) target_refresh = 60; - if (info->quirks & EDID_QUIRK_PREFER_LARGE_75) + if (drm_edid_has_internal_quirk(connector, EDID_QUIRK_PREFER_LARGE_75)) target_refresh = 75;
preferred_mode = list_first_entry(&connector->probed_modes, @@ -3474,7 +3481,6 @@ static struct drm_display_mode *drm_mode const struct drm_edid *drm_edid, const struct detailed_timing *timing) { - const struct drm_display_info *info = &connector->display_info; struct drm_device *dev = connector->dev; struct drm_display_mode *mode; const struct detailed_pixel_timing *pt = &timing->data.pixel_data; @@ -3508,7 +3514,7 @@ static struct drm_display_mode *drm_mode return NULL; }
- if (info->quirks & EDID_QUIRK_FORCE_REDUCED_BLANKING) { + if (drm_edid_has_internal_quirk(connector, EDID_QUIRK_FORCE_REDUCED_BLANKING)) { mode = drm_cvt_mode(dev, hactive, vactive, 60, true, false, false); if (!mode) return NULL; @@ -3520,7 +3526,7 @@ static struct drm_display_mode *drm_mode if (!mode) return NULL;
- if (info->quirks & EDID_QUIRK_135_CLOCK_TOO_HIGH) + if (drm_edid_has_internal_quirk(connector, EDID_QUIRK_135_CLOCK_TOO_HIGH)) mode->clock = 1088 * 10; else mode->clock = le16_to_cpu(timing->pixel_clock) * 10; @@ -3551,7 +3557,7 @@ static struct drm_display_mode *drm_mode
drm_mode_do_interlace_quirk(mode, pt);
- if (info->quirks & EDID_QUIRK_DETAILED_SYNC_PP) { + if (drm_edid_has_internal_quirk(connector, EDID_QUIRK_DETAILED_SYNC_PP)) { mode->flags |= DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_PVSYNC; } else { mode->flags |= (pt->misc & DRM_EDID_PT_HSYNC_POSITIVE) ? @@ -3564,12 +3570,12 @@ set_size: mode->width_mm = pt->width_mm_lo | (pt->width_height_mm_hi & 0xf0) << 4; mode->height_mm = pt->height_mm_lo | (pt->width_height_mm_hi & 0xf) << 8;
- if (info->quirks & EDID_QUIRK_DETAILED_IN_CM) { + if (drm_edid_has_internal_quirk(connector, EDID_QUIRK_DETAILED_IN_CM)) { mode->width_mm *= 10; mode->height_mm *= 10; }
- if (info->quirks & EDID_QUIRK_DETAILED_USE_MAXIMUM_SIZE) { + if (drm_edid_has_internal_quirk(connector, EDID_QUIRK_DETAILED_USE_MAXIMUM_SIZE)) { mode->width_mm = drm_edid->edid->width_cm * 10; mode->height_mm = drm_edid->edid->height_cm * 10; } @@ -6734,26 +6740,26 @@ static void update_display_info(struct d drm_update_mso(connector, drm_edid);
out: - if (info->quirks & EDID_QUIRK_NON_DESKTOP) { + if (drm_edid_has_internal_quirk(connector, EDID_QUIRK_NON_DESKTOP)) { drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s] Non-desktop display%s\n", connector->base.id, connector->name, info->non_desktop ? " (redundant quirk)" : ""); info->non_desktop = true; }
- if (info->quirks & EDID_QUIRK_CAP_DSC_15BPP) + if (drm_edid_has_internal_quirk(connector, EDID_QUIRK_CAP_DSC_15BPP)) info->max_dsc_bpp = 15;
- if (info->quirks & EDID_QUIRK_FORCE_6BPC) + if (drm_edid_has_internal_quirk(connector, EDID_QUIRK_FORCE_6BPC)) info->bpc = 6;
- if (info->quirks & EDID_QUIRK_FORCE_8BPC) + if (drm_edid_has_internal_quirk(connector, EDID_QUIRK_FORCE_8BPC)) info->bpc = 8;
- if (info->quirks & EDID_QUIRK_FORCE_10BPC) + if (drm_edid_has_internal_quirk(connector, EDID_QUIRK_FORCE_10BPC)) info->bpc = 10;
- if (info->quirks & EDID_QUIRK_FORCE_12BPC) + if (drm_edid_has_internal_quirk(connector, EDID_QUIRK_FORCE_12BPC)) info->bpc = 12;
/* Depends on info->cea_rev set by drm_parse_cea_ext() above */ @@ -6918,7 +6924,6 @@ static int add_displayid_detailed_modes( static int _drm_edid_connector_add_modes(struct drm_connector *connector, const struct drm_edid *drm_edid) { - const struct drm_display_info *info = &connector->display_info; int num_modes = 0;
if (!drm_edid) @@ -6948,7 +6953,8 @@ static int _drm_edid_connector_add_modes if (drm_edid->edid->features & DRM_EDID_FEATURE_CONTINUOUS_FREQ) num_modes += add_inferred_modes(connector, drm_edid);
- if (info->quirks & (EDID_QUIRK_PREFER_LARGE_60 | EDID_QUIRK_PREFER_LARGE_75)) + if (drm_edid_has_internal_quirk(connector, EDID_QUIRK_PREFER_LARGE_60) || + drm_edid_has_internal_quirk(connector, EDID_QUIRK_PREFER_LARGE_75)) edid_fixup_preferred(connector);
return num_modes;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Imre Deak imre.deak@intel.com
commit 0b4aa85e8981198e23a68d50ee3c490ccd7f8311 upstream.
Add support for EDID based quirks which can be queried outside of the EDID parser itself by DRM core and drivers. There are at least two such quirks applicable to all drivers: the DPCD register access probe quirk and the 128b/132b DPRX Lane Count Conversion quirk (see 3.5.2.16.3 in the v2.1a DP Standard). The latter quirk applies to panels with specific EDID panel names; support for defining a quirk this way will be added as a follow-up.
v2: Reset global_quirks in drm_reset_display_info().
v3: (Jani)
- Use one list for both the global and internal quirks.
- Drop change for panel name specific quirks.
- Add comment about the way quirks should be queried.
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Jani Nikula jani.nikula@linux.intel.com Reviewed-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Imre Deak imre.deak@intel.com Link: https://lore.kernel.org/r/20250605082850.65136-4-imre.deak@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_edid.c | 8 +++++++- include/drm/drm_connector.h | 4 +++- include/drm/drm_edid.h | 5 +++++ 3 files changed, 15 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/drm_edid.c +++ b/drivers/gpu/drm/drm_edid.c @@ -68,7 +68,7 @@ static int oui(u8 first, u8 second, u8 t
enum drm_edid_internal_quirk { /* First detailed mode wrong, use largest 60Hz mode */ - EDID_QUIRK_PREFER_LARGE_60, + EDID_QUIRK_PREFER_LARGE_60 = DRM_EDID_QUIRK_NUM, /* Reported 135MHz pixel clock is too high, needs adjustment */ EDID_QUIRK_135_CLOCK_TOO_HIGH, /* Prefer the largest mode at 75 Hz */ @@ -2959,6 +2959,12 @@ static bool drm_edid_has_internal_quirk( return connector->display_info.quirks & BIT(quirk); }
+bool drm_edid_has_quirk(struct drm_connector *connector, enum drm_edid_quirk quirk) +{ + return connector->display_info.quirks & BIT(quirk); +} +EXPORT_SYMBOL(drm_edid_has_quirk); + #define MODE_SIZE(m) ((m)->hdisplay * (m)->vdisplay) #define MODE_REFRESH_DIFF(c,t) (abs((c) - (t)))
--- a/include/drm/drm_connector.h +++ b/include/drm/drm_connector.h @@ -843,7 +843,9 @@ struct drm_display_info { int vics_len;
/** - * @quirks: EDID based quirks. Internal to EDID parsing. + * @quirks: EDID based quirks. DRM core and drivers can query the + * @drm_edid_quirk quirks using drm_edid_has_quirk(), the rest of + * the quirks also tracked here are internal to EDID parsing. */ u32 quirks;
--- a/include/drm/drm_edid.h +++ b/include/drm/drm_edid.h @@ -109,6 +109,10 @@ struct detailed_data_string { #define DRM_EDID_CVT_FLAGS_STANDARD_BLANKING (1 << 3) #define DRM_EDID_CVT_FLAGS_REDUCED_BLANKING (1 << 4)
+enum drm_edid_quirk { + DRM_EDID_QUIRK_NUM, +}; + struct detailed_data_monitor_range { u8 min_vfreq; u8 max_vfreq; @@ -476,5 +480,6 @@ void drm_edid_print_product_id(struct dr u32 drm_edid_get_panel_id(const struct drm_edid *drm_edid); bool drm_edid_match(const struct drm_edid *drm_edid, const struct drm_edid_ident *ident); +bool drm_edid_has_quirk(struct drm_connector *connector, enum drm_edid_quirk quirk);
#endif /* __DRM_EDID_H__ */
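As an illustration of the intended usage, here is a hypothetical driver-side sketch (not part of this patch; it uses the DRM_EDID_QUIRK_DP_DPCD_PROBE value that the next patch in this series adds):

/*
 * Hypothetical sketch: after the EDID has been parsed into
 * display_info, a driver can gate a workaround on a public quirk.
 * Public quirks occupy bits [0, DRM_EDID_QUIRK_NUM) of the shared
 * display_info.quirks word; the internal parser quirks continue from
 * DRM_EDID_QUIRK_NUM upwards, which is why one u32 can hold both.
 */
static void example_apply_edid_quirks(struct drm_connector *connector)
{
	if (drm_edid_has_quirk(connector, DRM_EDID_QUIRK_DP_DPCD_PROBE))
		; /* apply the driver-specific workaround here */
}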
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Imre Deak imre.deak@intel.com
commit b87ed522b3643f096ef183ed0ccf2d2b90ddd513 upstream.
Reading DPCD registers has side effects and some of these can cause a problem, for instance during link training. Based on this, it's better to avoid the probing quirk done before each DPCD register read, limiting it to the monitor which requires it. Add an EDID quirk for this. Leave the quirk enabled by default, allowing it to be disabled after the monitor is detected.
v2: Fix lockdep wrt. drm_dp_aux::hw_mutex when calling drm_dp_dpcd_set_probe_quirk() with a dependent lock already held.
v3: Add a helper for determining if DPCD probing is needed. (Jani)
v4:
- s/drm_dp_dpcd_set_probe_quirk/drm_dp_dpcd_set_probe (Jani)
- Fix documentation of drm_dp_dpcd_set_probe().
- Add comment at the end of internal quirk entries.
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Jani Nikula jani.nikula@linux.intel.com Reviewed-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Imre Deak imre.deak@intel.com Link: https://lore.kernel.org/r/20250609125556.109538-1-imre.deak@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/display/drm_dp_helper.c | 42 ++++++++++++++++++++++---------- drivers/gpu/drm/drm_edid.c | 8 ++++++ include/drm/display/drm_dp_helper.h | 6 ++++ include/drm/drm_edid.h | 3 ++ 4 files changed, 46 insertions(+), 13 deletions(-)
--- a/drivers/gpu/drm/display/drm_dp_helper.c +++ b/drivers/gpu/drm/display/drm_dp_helper.c @@ -692,6 +692,34 @@ void drm_dp_dpcd_set_powered(struct drm_ EXPORT_SYMBOL(drm_dp_dpcd_set_powered);
/** + * drm_dp_dpcd_set_probe() - Set whether a probing before DPCD access is done + * @aux: DisplayPort AUX channel + * @enable: Enable the probing if required + */ +void drm_dp_dpcd_set_probe(struct drm_dp_aux *aux, bool enable) +{ + WRITE_ONCE(aux->dpcd_probe_disabled, !enable); +} +EXPORT_SYMBOL(drm_dp_dpcd_set_probe); + +static bool dpcd_access_needs_probe(struct drm_dp_aux *aux) +{ + /* + * HP ZR24w corrupts the first DPCD access after entering power save + * mode. Eg. on a read, the entire buffer will be filled with the same + * byte. Do a throw away read to avoid corrupting anything we care + * about. Afterwards things will work correctly until the monitor + * gets woken up and subsequently re-enters power save mode. + * + * The user pressing any button on the monitor is enough to wake it + * up, so there is no particularly good place to do the workaround. + * We just have to do it before any DPCD access and hope that the + * monitor doesn't power down exactly after the throw away read. + */ + return !aux->is_remote && !READ_ONCE(aux->dpcd_probe_disabled); +} + +/** * drm_dp_dpcd_read() - read a series of bytes from the DPCD * @aux: DisplayPort AUX channel (SST or MST) * @offset: address of the (first) register to read @@ -712,19 +740,7 @@ ssize_t drm_dp_dpcd_read(struct drm_dp_a { int ret;
- /* - * HP ZR24w corrupts the first DPCD access after entering power save - * mode. Eg. on a read, the entire buffer will be filled with the same - * byte. Do a throw away read to avoid corrupting anything we care - * about. Afterwards things will work correctly until the monitor - * gets woken up and subsequently re-enters power save mode. - * - * The user pressing any button on the monitor is enough to wake it - * up, so there is no particularly good place to do the workaround. - * We just have to do it before any DPCD access and hope that the - * monitor doesn't power down exactly after the throw away read. - */ - if (!aux->is_remote) { + if (dpcd_access_needs_probe(aux)) { ret = drm_dp_dpcd_probe(aux, DP_TRAINING_PATTERN_SET); if (ret < 0) return ret; --- a/drivers/gpu/drm/drm_edid.c +++ b/drivers/gpu/drm/drm_edid.c @@ -248,6 +248,14 @@ static const struct edid_quirk { /* OSVR HDK and HDK2 VR Headsets */ EDID_QUIRK('S', 'V', 'R', 0x1019, BIT(EDID_QUIRK_NON_DESKTOP)), EDID_QUIRK('A', 'U', 'O', 0x1111, BIT(EDID_QUIRK_NON_DESKTOP)), + + /* + * @drm_edid_internal_quirk entries end here, following with the + * @drm_edid_quirk entries. + */ + + /* HP ZR24w DP AUX DPCD access requires probing to prevent corruption. */ + EDID_QUIRK('H', 'W', 'P', 0x2869, BIT(DRM_EDID_QUIRK_DP_DPCD_PROBE)), };
/* --- a/include/drm/display/drm_dp_helper.h +++ b/include/drm/display/drm_dp_helper.h @@ -523,10 +523,16 @@ struct drm_dp_aux { * @no_zero_sized: If the hw can't use zero sized transfers (NVIDIA) */ bool no_zero_sized; + + /** + * @dpcd_probe_disabled: If probing before a DPCD access is disabled. + */ + bool dpcd_probe_disabled; };
int drm_dp_dpcd_probe(struct drm_dp_aux *aux, unsigned int offset); void drm_dp_dpcd_set_powered(struct drm_dp_aux *aux, bool powered); +void drm_dp_dpcd_set_probe(struct drm_dp_aux *aux, bool enable); ssize_t drm_dp_dpcd_read(struct drm_dp_aux *aux, unsigned int offset, void *buffer, size_t size); ssize_t drm_dp_dpcd_write(struct drm_dp_aux *aux, unsigned int offset, --- a/include/drm/drm_edid.h +++ b/include/drm/drm_edid.h @@ -110,6 +110,9 @@ struct detailed_data_string { #define DRM_EDID_CVT_FLAGS_REDUCED_BLANKING (1 << 4)
enum drm_edid_quirk { + /* Do a dummy read before DPCD accesses, to prevent corruption. */ + DRM_EDID_QUIRK_DP_DPCD_PROBE, + DRM_EDID_QUIRK_NUM, };
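For illustration, a hedged sketch of how a driver could consume this pair of APIs (the function and variable names here are assumptions; the amdgpu patch further below simply disables probing for its native AUX unconditionally):

/*
 * Hypothetical sketch: keep the throw-away DPCD probe only when the
 * detected monitor's EDID carries the quirk. "connector" and "aux"
 * are assumed driver-specific handles.
 */
static void example_sync_dpcd_probe(struct drm_connector *connector,
				    struct drm_dp_aux *aux)
{
	drm_dp_dpcd_set_probe(aux,
			      drm_edid_has_quirk(connector,
						 DRM_EDID_QUIRK_DP_DPCD_PROBE));
}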
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fangzhi Zuo Jerry.Zuo@amd.com
commit f5c32370dba668c171c73684f489a3ea0b9503c5 upstream.
Disable the DPCD probe quirk for native AUX.
Signed-off-by: Fangzhi Zuo Jerry.Zuo@amd.com Reviewed-by: Imre Deak imre.deak@intel.com Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4500 Reviewed-by: Mario Limonciello mario.limonciello@amd.com Link: https://lore.kernel.org/r/20250904191351.746707-1-Jerry.Zuo@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit c5f4fb40584ee591da9fa090c6f265d11cbb1acf) Cc: stable@vger.kernel.org # 6.16.y: 5281cbe0b55a Cc: stable@vger.kernel.org # 6.16.y: 0b4aa85e8981 Cc: stable@vger.kernel.org # 6.16.y: b87ed522b364 Cc: stable@vger.kernel.org # 6.16.y Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -809,6 +809,7 @@ void amdgpu_dm_initialize_dp_connector(s
 	drm_dp_aux_init(&aconnector->dm_dp_aux.aux);
 	drm_dp_cec_register_connector(&aconnector->dm_dp_aux.aux,
 				      &aconnector->base);
+	drm_dp_dpcd_set_probe(&aconnector->dm_dp_aux.aux, false);
 
 	if (aconnector->base.connector_type == DRM_MODE_CONNECTOR_eDP)
 		return;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Burkov boris@bur.io
[ Upstream commit 9e9ff875e4174be939371667d2cc81244e31232f ]
We recently received a report of poor performance doing sequential buffered reads of a file with compressed extents. With bs=128k, a naive sequential dd ran as fast on a compressed file as on an uncompressed one (1.2GB/s on my reproducing system), while with bs<32k this performance tanked to ~300MB/s.
i.e., slow:
dd if=some-compressed-file of=/dev/null bs=4k count=X
vs fast:
dd if=some-compressed-file of=/dev/null bs=128k count=Y
The cause of this slowness is overhead to do with looking up extent_maps to enable readahead pre-caching on compressed extents (add_ra_bio_pages()), as well as some overhead in the generic VFS readahead code we hit more in the slow case. Notably, the main difference between the two read sizes is that in the large sized request case, we call btrfs_readahead() relatively rarely while in the smaller request we call it for every compressed extent. So the fast case stays in the btrfs readahead loop:
	while ((folio = readahead_folio(rac)) != NULL)
		btrfs_do_readpage(folio, &em_cached, &bio_ctrl, &prev_em_start);
where the slower one breaks out of that loop every time. This results in calling add_ra_bio_pages a lot, doing lots of extent_map lookups, extent_map locking, etc.
This happens because although add_ra_bio_pages() does add the appropriate un-compressed file pages to the cache, it does not communicate back to the ractl in any way. To solve this, we should be using readahead_expand() to signal to readahead to expand the readahead window.
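In code terms the idea is roughly the following, a condensed sketch of the btrfs_readahead_expand() helper added by the diff below (no new behavior beyond what the patch introduces):

	/*
	 * Condensed sketch of the helper below: when the extent map for
	 * the current position is compressed, grow the readahead window
	 * so it covers the extent's whole uncompressed (ram_bytes) range.
	 */
	const u64 ra_pos = readahead_pos(ractl);
	const u64 ra_end = ra_pos + readahead_length(ractl);
	const u64 em_end = em->start + em->ram_bytes;

	if (em_end > ra_end)
		readahead_expand(ractl, ra_pos, em_end - ra_pos);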
This change passes the readahead_control into the btrfs_bio_ctrl and in the case of compressed reads sets the expansion to the size of the extent_map we already looked up anyway. It skips the subpage case as that one already doesn't do add_ra_bio_pages().
With this change, whether we use bs=4k or bs=128k, btrfs expands the readahead window up to the largest compressed extent we have seen so far (in the trivial example: 128k) and the call stacks of the two modes look identical. Notably, we barely call add_ra_bio_pages at all. And the performance becomes identical as well. So this change certainly "fixes" this performance problem.
Of course, it does seem to beg a few questions:
1. Will this waste too much page cache with a too large ra window?
2. Will this somehow cause bugs prevented by the more thoughtful checking in add_ra_bio_pages?
3. Should we delete add_ra_bio_pages?
My stabs at some answers:
1. Hard to say. See attempts at generic performance testing below. Is there a "readahead_shrink" we should be using? Should we expand more slowly, by half the remaining em size each time?
2. I don't think so. Since the new behavior is indistinguishable from reading the file with a larger read size passed in, I don't see why one would be safe but not the other.
3. Probably! I tested that and it was fine in fstests, and it seems like the pages would get re-used just as well in the readahead case. However, it is possible some reads that use page cache but not btrfs_readahead() could suffer. I will investigate this further as a follow up.
I tested the performance implications of this change in 3 ways (using compress-force=zstd:3 for compression):
Directly test the affected workload of small sequential reads on a compressed file (improved from ~250MB/s to ~1.2GB/s)
==========for-next==========
dd /mnt/lol/non-cmpr 4k
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.02983 s, 712 MB/s

dd /mnt/lol/non-cmpr 128k
32768+0 records in
32768+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 5.92403 s, 725 MB/s

dd /mnt/lol/cmpr 4k
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 17.8832 s, 240 MB/s

dd /mnt/lol/cmpr 128k
32768+0 records in
32768+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.71001 s, 1.2 GB/s
==========ra-expand==========
dd /mnt/lol/non-cmpr 4k
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.09001 s, 705 MB/s

dd /mnt/lol/non-cmpr 128k
32768+0 records in
32768+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.07664 s, 707 MB/s

dd /mnt/lol/cmpr 4k
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.79531 s, 1.1 GB/s

dd /mnt/lol/cmpr 128k
32768+0 records in
32768+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.69533 s, 1.2 GB/s
Built the linux kernel from clean (no change)
Ran fsperf. Mostly neutral results with some improvements and regressions here and there.
Reported-by: Dimitrios Apostolou jimis@gmx.net Link: https://lore.kernel.org/linux-btrfs/34601559-6c16-6ccc-1793-20a97ca0dbba@gmx... Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Boris Burkov boris@bur.io Signed-off-by: David Sterba dsterba@suse.com Stable-dep-of: 9786531399a6 ("btrfs: fix corruption reading compressed range when block size is smaller than page size") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent_io.c | 35 ++++++++++++++++++++++++++++++++++- 1 file changed, 34 insertions(+), 1 deletion(-)
--- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -110,6 +110,7 @@ struct btrfs_bio_ctrl { * This is to avoid touching ranges covered by compression/inline. */ unsigned long submit_bitmap; + struct readahead_control *ractl; };
static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl) @@ -882,6 +883,25 @@ static struct extent_map *get_extent_map
return em; } + +static void btrfs_readahead_expand(struct readahead_control *ractl, + const struct extent_map *em) +{ + const u64 ra_pos = readahead_pos(ractl); + const u64 ra_end = ra_pos + readahead_length(ractl); + const u64 em_end = em->start + em->ram_bytes; + + /* No expansion for holes and inline extents. */ + if (em->disk_bytenr > EXTENT_MAP_LAST_BYTE) + return; + + ASSERT(em_end >= ra_pos, + "extent_map %llu %llu ends before current readahead position %llu", + em->start, em->len, ra_pos); + if (em_end > ra_end) + readahead_expand(ractl, ra_pos, em_end - ra_pos); +} + /* * basic readpage implementation. Locked extent state structs are inserted * into the tree that are removed when the IO is done (by the end_io @@ -945,6 +965,16 @@ static int btrfs_do_readpage(struct foli
compress_type = btrfs_extent_map_compression(em);
+ /* + * Only expand readahead for extents which are already creating + * the pages anyway in add_ra_bio_pages, which is compressed + * extents in the non subpage case. + */ + if (bio_ctrl->ractl && + !btrfs_is_subpage(fs_info, folio) && + compress_type != BTRFS_COMPRESS_NONE) + btrfs_readahead_expand(bio_ctrl->ractl, em); + if (compress_type != BTRFS_COMPRESS_NONE) disk_bytenr = em->disk_bytenr; else @@ -2550,7 +2580,10 @@ int btrfs_writepages(struct address_spac
void btrfs_readahead(struct readahead_control *rac) { - struct btrfs_bio_ctrl bio_ctrl = { .opf = REQ_OP_READ | REQ_RAHEAD }; + struct btrfs_bio_ctrl bio_ctrl = { + .opf = REQ_OP_READ | REQ_RAHEAD, + .ractl = rac + }; struct folio *folio; struct btrfs_inode *inode = BTRFS_I(rac->mapping->host); const u64 start = readahead_pos(rac);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
[ Upstream commit 9786531399a679fc2f4630d2c0a186205282ab2f ]
[BUG]
With 64K page size (aarch64 with 64K page size config) and 4K btrfs block size, the following workload can easily lead to a corrupted read:
mkfs.btrfs -f -s 4k $dev > /dev/null
mount -o compress $dev $mnt
xfs_io -f -c "pwrite -S 0xff 0 64k" $mnt/base > /dev/null
echo "correct result:"
od -Ad -t x1 $mnt/base
xfs_io -f -c "reflink $mnt/base 32k 0 32k" \
       -c "reflink $mnt/base 0 32k 32k" \
       -c "pwrite -S 0xff 60k 4k" $mnt/new > /dev/null
echo "incorrect result:"
od -Ad -t x1 $mnt/new
umount $mnt
This shows the following result:
correct result:
0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0065536
incorrect result:
0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0032768 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0061440 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0065536
Notice the zero in the range [32K, 60K), which is incorrect.
[CAUSE]
With extra trace printk, it shows the following events during od (some unrelated info, such as CPU and context, removed):
od-3457 btrfs_do_readpage: enter r/i=5/258 folio=0(65536) prev_em_start=0000000000000000
The "r/i" is indicating the root and inode number. In our case the file "new" is using ino 258 from fs tree (root 5).
Here notice the @prev_em_start pointer is NULL. This means the btrfs_do_readpage() is called from btrfs_read_folio(), not from btrfs_readahead().
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=0 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=4096 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=8192 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=12288 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=16384 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=20480 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=24576 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=28672 got em start=0 len=32768
These above 32K blocks will be read from the first half of the compressed data extent.
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=32768 got em start=32768 len=32768
Note that here there is no btrfs_submit_compressed_read() call, which is incorrect. Although both extent maps at 0 and 32K point to the same compressed data, their offsets are different and thus they cannot be merged into the same read.
So this means the compressed data read merge check is doing something wrong.
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=36864 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=40960 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=45056 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=49152 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=53248 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=57344 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=61440 skip uptodate
od-3457 btrfs_submit_compressed_read: cb orig_bio: file off=0 len=61440
The function btrfs_submit_compressed_read() is only called at the end of folio read. The compressed bio will only have an extent map of range [0, 32K), but the original bio passed in is for the whole 64K folio.
This will cause the decompression part to only fill the first 32K, leaving the rest untouched (aka, filled with zero).
This incorrect compressed read merge leads to the above data corruption.
Similar problems have happened in the past; commit 808f80b46790 ("Btrfs: update fix for read corruption of compressed and shared extents") did pretty much the same fix for readahead.

But that was back in 2015, when btrfs only supported the bs (block size) == ps (page size) case. This meant btrfs_do_readpage() only needed to handle a folio which contains exactly one block.
Only btrfs_readahead() can lead to a read covering multiple blocks. Thus only btrfs_readahead() passes a non-NULL @prev_em_start pointer.
With the v5.15 kernel, btrfs introduced bs < ps support. This breaks the above assumption that a folio can only contain one block.
Now btrfs_read_folio() can also read multiple blocks in one go. But btrfs_read_folio() doesn't pass a @prev_em_start pointer, thus the existing bio force submission check will never be triggered.
In theory, this can also happen for btrfs with large folios, but since large folio support is still experimental we don't need to bother with it; only bs < ps support is affected for now.
[FIX]
Instead of passing @prev_em_start to do the proper compressed extent check, introduce one new member, btrfs_bio_ctrl::last_em_start, so that the existing bio force submission logic will always be triggered.
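The resulting rule, excerpted from the hunk below for readability:

	/*
	 * Excerpt from the hunk below: with last_em_start now tracked in
	 * the bio_ctrl for both read paths, a compressed read is
	 * force-submitted whenever the current block comes from a
	 * different extent map than the previous one (U64_MAX means no
	 * progress has been made yet).
	 */
	if (compress_type != BTRFS_COMPRESS_NONE &&
	    bio_ctrl->last_em_start != U64_MAX &&
	    bio_ctrl->last_em_start != em->start)
		force_bio_submit = true;

	bio_ctrl->last_em_start = em->start;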
CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent_io.c | 40 ++++++++++++++++++++++++++++++---------- 1 file changed, 30 insertions(+), 10 deletions(-)
--- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -111,6 +111,24 @@ struct btrfs_bio_ctrl { */ unsigned long submit_bitmap; struct readahead_control *ractl; + + /* + * The start offset of the last used extent map by a read operation. + * + * This is for proper compressed read merge. + * U64_MAX means we are starting the read and have made no progress yet. + * + * The current btrfs_bio_is_contig() only uses disk_bytenr as + * the condition to check if the read can be merged with previous + * bio, which is not correct. E.g. two file extents pointing to the + * same extent but with different offset. + * + * So here we need to do extra checks to only merge reads that are + * covered by the same extent map. + * Just extent_map::start will be enough, as they are unique + * inside the same inode. + */ + u64 last_em_start; };
static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl) @@ -910,7 +928,7 @@ static void btrfs_readahead_expand(struc * return 0 on success, otherwise return error */ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached, - struct btrfs_bio_ctrl *bio_ctrl, u64 *prev_em_start) + struct btrfs_bio_ctrl *bio_ctrl) { struct inode *inode = folio->mapping->host; struct btrfs_fs_info *fs_info = inode_to_fs_info(inode); @@ -1020,12 +1038,11 @@ static int btrfs_do_readpage(struct foli * non-optimal behavior (submitting 2 bios for the same extent). */ if (compress_type != BTRFS_COMPRESS_NONE && - prev_em_start && *prev_em_start != (u64)-1 && - *prev_em_start != em->start) + bio_ctrl->last_em_start != U64_MAX && + bio_ctrl->last_em_start != em->start) force_bio_submit = true;
- if (prev_em_start) - *prev_em_start = em->start; + bio_ctrl->last_em_start = em->start;
btrfs_free_extent_map(em); em = NULL; @@ -1239,12 +1256,15 @@ int btrfs_read_folio(struct file *file, const u64 start = folio_pos(folio); const u64 end = start + folio_size(folio) - 1; struct extent_state *cached_state = NULL; - struct btrfs_bio_ctrl bio_ctrl = { .opf = REQ_OP_READ }; + struct btrfs_bio_ctrl bio_ctrl = { + .opf = REQ_OP_READ, + .last_em_start = U64_MAX, + }; struct extent_map *em_cached = NULL; int ret;
lock_extents_for_read(inode, start, end, &cached_state); - ret = btrfs_do_readpage(folio, &em_cached, &bio_ctrl, NULL); + ret = btrfs_do_readpage(folio, &em_cached, &bio_ctrl); btrfs_unlock_extent(&inode->io_tree, start, end, &cached_state);
btrfs_free_extent_map(em_cached); @@ -2582,7 +2602,8 @@ void btrfs_readahead(struct readahead_co { struct btrfs_bio_ctrl bio_ctrl = { .opf = REQ_OP_READ | REQ_RAHEAD, - .ractl = rac + .ractl = rac, + .last_em_start = U64_MAX, }; struct folio *folio; struct btrfs_inode *inode = BTRFS_I(rac->mapping->host); @@ -2590,12 +2611,11 @@ void btrfs_readahead(struct readahead_co const u64 end = start + readahead_length(rac) - 1; struct extent_state *cached_state = NULL; struct extent_map *em_cached = NULL; - u64 prev_em_start = (u64)-1;
lock_extents_for_read(inode, start, end, &cached_state);
while ((folio = readahead_folio(rac)) != NULL) - btrfs_do_readpage(folio, &em_cached, &bio_ctrl, &prev_em_start); + btrfs_do_readpage(folio, &em_cached, &bio_ctrl);
btrfs_unlock_extent(&inode->io_tree, start, end, &cached_state);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chen Ridong chenridong@huawei.com
commit 3c9ba2777d6c86025e1ba4186dc5cd930e40ec5f upstream.
A use-after-free (UAF) vulnerability was identified in the PSI (Pressure Stall Information) monitoring mechanism:
BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
Read of size 8 at addr ffff3de3d50bd308 by task systemd/1
 psi_trigger_poll+0x3c/0x140
 cgroup_pressure_poll+0x70/0xa0
 cgroup_file_poll+0x8c/0x100
 kernfs_fop_poll+0x11c/0x1c0
 ep_item_poll.isra.0+0x188/0x2c0
Allocated by task 1:
 cgroup_file_open+0x88/0x388
 kernfs_fop_open+0x73c/0xaf0
 do_dentry_open+0x5fc/0x1200
 vfs_open+0xa0/0x3f0
 do_open+0x7e8/0xd08
 path_openat+0x2fc/0x6b0
 do_filp_open+0x174/0x368
Freed by task 8462:
 cgroup_file_release+0x130/0x1f8
 kernfs_drain_open_files+0x17c/0x440
 kernfs_drain+0x2dc/0x360
 kernfs_show+0x1b8/0x288
 cgroup_file_show+0x150/0x268
 cgroup_pressure_write+0x1dc/0x340
 cgroup_file_write+0x274/0x548
Reproduction Steps:
1. Open test/cpu.pressure and establish epoll monitoring
2. Disable monitoring: echo 0 > test/cgroup.pressure
3. Re-enable monitoring: echo 1 > test/cgroup.pressure
The race condition occurs because:
1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
   - Releases PSI triggers via cgroup_file_release()
   - Frees of->priv through kernfs_drain_open_files()
2. While epoll still holds a reference to the file and continues polling
3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv
epolling                        disable/enable cgroup.pressure
fd=open(cpu.pressure)
while(1)
...
epoll_wait
kernfs_fop_poll
kernfs_get_active = true        echo 0 > cgroup.pressure
...                             cgroup_file_show
                                kernfs_show
                                // inactive kn
                                kernfs_drain_open_files
                                cft->release(of);
                                kfree(ctx);
                                ...
kernfs_get_active = false
                                echo 1 > cgroup.pressure
                                kernfs_show
                                kernfs_activate_one(kn);
kernfs_fop_poll
kernfs_get_active = true
cgroup_file_poll
psi_trigger_poll // UAF
...
end: close(fd)
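A hedged reproduction sketch of the polling side (the cgroup path and the PSI trigger string are assumptions based on the steps above; run the two echo commands from another shell while this loops):

#include <fcntl.h>
#include <stdio.h>
#include <sys/epoll.h>

int main(void)
{
	int fd = open("/sys/fs/cgroup/test/cpu.pressure", O_RDWR);
	int ep = epoll_create1(0);
	struct epoll_event ev = { .events = EPOLLPRI };

	if (fd < 0 || ep < 0)
		return 1;

	/* Arm a PSI trigger so poll has something to report. */
	dprintf(fd, "some 150000 1000000");

	ev.data.fd = fd;
	epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);

	/*
	 * While this loops, toggle monitoring from another shell:
	 *   echo 0 > /sys/fs/cgroup/test/cgroup.pressure
	 *   echo 1 > /sys/fs/cgroup/test/cgroup.pressure
	 */
	for (;;) {
		struct epoll_event out;

		epoll_wait(ep, &out, 1, 1000);
	}
}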
To address this issue, introduce kernfs_get_active_of() for kernfs open files to obtain active references. This function will fail if the open file has been released. Replace kernfs_get_active() with kernfs_get_active_of() to prevent further operations on released file descriptors.
Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface") Cc: stable stable@kernel.org Reported-by: Zhang Zhaotian zhangzhaotian@huawei.com Signed-off-by: Chen Ridong chenridong@huawei.com Acked-by: Tejun Heo tj@kernel.org Link: https://lore.kernel.org/r/20250822070715.1565236-2-chenridong@huaweicloud.co... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/kernfs/file.c | 58 ++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 38 insertions(+), 20 deletions(-)
--- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -70,6 +70,24 @@ static struct kernfs_open_node *of_on(st !list_empty(&of->list)); }
+/* Get active reference to kernfs node for an open file */ +static struct kernfs_open_file *kernfs_get_active_of(struct kernfs_open_file *of) +{ + /* Skip if file was already released */ + if (unlikely(of->released)) + return NULL; + + if (!kernfs_get_active(of->kn)) + return NULL; + + return of; +} + +static void kernfs_put_active_of(struct kernfs_open_file *of) +{ + return kernfs_put_active(of->kn); +} + /** * kernfs_deref_open_node_locked - Get kernfs_open_node corresponding to @kn * @@ -139,7 +157,7 @@ static void kernfs_seq_stop_active(struc
if (ops->seq_stop) ops->seq_stop(sf, v); - kernfs_put_active(of->kn); + kernfs_put_active_of(of); }
static void *kernfs_seq_start(struct seq_file *sf, loff_t *ppos) @@ -152,7 +170,7 @@ static void *kernfs_seq_start(struct seq * the ops aren't called concurrently for the same open file. */ mutex_lock(&of->mutex); - if (!kernfs_get_active(of->kn)) + if (!kernfs_get_active_of(of)) return ERR_PTR(-ENODEV);
ops = kernfs_ops(of->kn); @@ -238,7 +256,7 @@ static ssize_t kernfs_file_read_iter(str * the ops aren't called concurrently for the same open file. */ mutex_lock(&of->mutex); - if (!kernfs_get_active(of->kn)) { + if (!kernfs_get_active_of(of)) { len = -ENODEV; mutex_unlock(&of->mutex); goto out_free; @@ -252,7 +270,7 @@ static ssize_t kernfs_file_read_iter(str else len = -EINVAL;
- kernfs_put_active(of->kn); + kernfs_put_active_of(of); mutex_unlock(&of->mutex);
if (len < 0) @@ -323,7 +341,7 @@ static ssize_t kernfs_fop_write_iter(str * the ops aren't called concurrently for the same open file. */ mutex_lock(&of->mutex); - if (!kernfs_get_active(of->kn)) { + if (!kernfs_get_active_of(of)) { mutex_unlock(&of->mutex); len = -ENODEV; goto out_free; @@ -335,7 +353,7 @@ static ssize_t kernfs_fop_write_iter(str else len = -EINVAL;
- kernfs_put_active(of->kn); + kernfs_put_active_of(of); mutex_unlock(&of->mutex);
if (len > 0) @@ -357,13 +375,13 @@ static void kernfs_vma_open(struct vm_ar if (!of->vm_ops) return;
- if (!kernfs_get_active(of->kn)) + if (!kernfs_get_active_of(of)) return;
if (of->vm_ops->open) of->vm_ops->open(vma);
- kernfs_put_active(of->kn); + kernfs_put_active_of(of); }
static vm_fault_t kernfs_vma_fault(struct vm_fault *vmf) @@ -375,14 +393,14 @@ static vm_fault_t kernfs_vma_fault(struc if (!of->vm_ops) return VM_FAULT_SIGBUS;
- if (!kernfs_get_active(of->kn)) + if (!kernfs_get_active_of(of)) return VM_FAULT_SIGBUS;
ret = VM_FAULT_SIGBUS; if (of->vm_ops->fault) ret = of->vm_ops->fault(vmf);
- kernfs_put_active(of->kn); + kernfs_put_active_of(of); return ret; }
@@ -395,7 +413,7 @@ static vm_fault_t kernfs_vma_page_mkwrit if (!of->vm_ops) return VM_FAULT_SIGBUS;
- if (!kernfs_get_active(of->kn)) + if (!kernfs_get_active_of(of)) return VM_FAULT_SIGBUS;
ret = 0; @@ -404,7 +422,7 @@ static vm_fault_t kernfs_vma_page_mkwrit else file_update_time(file);
- kernfs_put_active(of->kn); + kernfs_put_active_of(of); return ret; }
@@ -418,14 +436,14 @@ static int kernfs_vma_access(struct vm_a if (!of->vm_ops) return -EINVAL;
- if (!kernfs_get_active(of->kn)) + if (!kernfs_get_active_of(of)) return -EINVAL;
ret = -EINVAL; if (of->vm_ops->access) ret = of->vm_ops->access(vma, addr, buf, len, write);
- kernfs_put_active(of->kn); + kernfs_put_active_of(of); return ret; }
@@ -455,7 +473,7 @@ static int kernfs_fop_mmap(struct file * mutex_lock(&of->mutex);
rc = -ENODEV; - if (!kernfs_get_active(of->kn)) + if (!kernfs_get_active_of(of)) goto out_unlock;
ops = kernfs_ops(of->kn); @@ -490,7 +508,7 @@ static int kernfs_fop_mmap(struct file * } vma->vm_ops = &kernfs_vm_ops; out_put: - kernfs_put_active(of->kn); + kernfs_put_active_of(of); out_unlock: mutex_unlock(&of->mutex);
@@ -852,7 +870,7 @@ static __poll_t kernfs_fop_poll(struct f struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry); __poll_t ret;
- if (!kernfs_get_active(kn)) + if (!kernfs_get_active_of(of)) return DEFAULT_POLLMASK|EPOLLERR|EPOLLPRI;
if (kn->attr.ops->poll) @@ -860,7 +878,7 @@ static __poll_t kernfs_fop_poll(struct f else ret = kernfs_generic_poll(of, wait);
- kernfs_put_active(kn); + kernfs_put_active_of(of); return ret; }
@@ -875,7 +893,7 @@ static loff_t kernfs_fop_llseek(struct f * the ops aren't called concurrently for the same open file. */ mutex_lock(&of->mutex); - if (!kernfs_get_active(of->kn)) { + if (!kernfs_get_active_of(of)) { mutex_unlock(&of->mutex); return -ENODEV; } @@ -886,7 +904,7 @@ static loff_t kernfs_fop_llseek(struct f else ret = generic_file_llseek(file, offset, whence);
- kernfs_put_active(of->kn); + kernfs_put_active_of(of); mutex_unlock(&of->mutex); return ret; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilya Dryomov idryomov@gmail.com
commit cdbc9836c7afadad68f374791738f118263c5371 upstream.
There is a place where generic code in messenger.c reads from, and another place where it writes to, the con->v1 union member without checking that the union member is active (i.e. that msgr1 is in use).
On 64-bit systems, con->v1.auth_retry overlaps with con->v2.out_iter, so such a read is almost guaranteed to return a bogus value instead of 0 when msgr2 is in use. This ends up being fairly benign because the side effect is just the invalidation of the authorizer and successive fetching of new tickets.
con->v1.connect_seq overlaps with con->v2.conn_bufs and the fact that it's being written to can cause more serious consequences, but luckily it's not something that happens often.
Cc: stable@vger.kernel.org Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)") Signed-off-by: Ilya Dryomov idryomov@gmail.com Reviewed-by: Viacheslav Dubeyko Slava.Dubeyko@ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ceph/messenger.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1524,7 +1524,7 @@ static void con_fault_finish(struct ceph
 	 * in case we faulted due to authentication, invalidate our
 	 * current tickets so that we can get new ones.
 	 */
-	if (con->v1.auth_retry) {
+	if (!ceph_msgr2(from_msgr(con->msgr)) && con->v1.auth_retry) {
 		dout("auth_retry %d, invalidating\n", con->v1.auth_retry);
 		if (con->ops->invalidate_authorizer)
 			con->ops->invalidate_authorizer(con);
@@ -1714,9 +1714,10 @@ static void clear_standby(struct ceph_co
 {
 	/* come back from STANDBY? */
 	if (con->state == CEPH_CON_S_STANDBY) {
-		dout("clear_standby %p and ++connect_seq\n", con);
+		dout("clear_standby %p\n", con);
 		con->state = CEPH_CON_S_PREOPEN;
-		con->v1.connect_seq++;
+		if (!ceph_msgr2(from_msgr(con->msgr)))
+			con->v1.connect_seq++;
 		WARN_ON(ceph_con_flag_test(con, CEPH_CON_F_WRITE_PENDING));
 		WARN_ON(ceph_con_flag_test(con, CEPH_CON_F_KEEPALIVE_PENDING));
 	}
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Markuze amarkuze@redhat.com
commit 15f519e9f883b316d86e2bb6b767a023aafd9d83 upstream.
Add validation to ensure the cached parent directory inode matches the directory info in MDS replies. This prevents client-side race conditions where concurrent operations (e.g. rename) cause r_parent to become stale between request initiation and reply processing, which could lead to applying state changes to incorrect directory inodes.
[ idryomov: folded a kerneldoc fixup and a follow-up fix from Alex to move CEPH_CAP_PIN reference when r_parent is updated:
When the parent directory lock is not held, req->r_parent can become stale and is updated to point to the correct inode. However, the associated CEPH_CAP_PIN reference was not being adjusted. The CEPH_CAP_PIN is a reference on an inode that is tracked for accounting purposes, and moving this pin is important to keep the accounting balanced. When the pin was not moved from the old parent to the new one, it created two problems: the reference on the old, stale parent was never released, causing a reference leak, and a reference for the new parent was never acquired, creating the risk of a reference underflow later in ceph_mdsc_release_request(). This patch corrects the logic by releasing the pin from the old parent and acquiring it for the new parent when r_parent is switched, ensuring the reference accounting stays balanced. ]
Cc: stable@vger.kernel.org Signed-off-by: Alex Markuze amarkuze@redhat.com Reviewed-by: Viacheslav Dubeyko Slava.Dubeyko@ibm.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ceph/debugfs.c | 14 +--- fs/ceph/dir.c | 17 ++--- fs/ceph/file.c | 24 ++----- fs/ceph/inode.c | 7 -- fs/ceph/mds_client.c | 172 +++++++++++++++++++++++++++++++-------------------- fs/ceph/mds_client.h | 18 ++++- 6 files changed, 145 insertions(+), 107 deletions(-)
--- a/fs/ceph/debugfs.c +++ b/fs/ceph/debugfs.c @@ -55,8 +55,6 @@ static int mdsc_show(struct seq_file *s, struct ceph_mds_client *mdsc = fsc->mdsc; struct ceph_mds_request *req; struct rb_node *rp; - int pathlen = 0; - u64 pathbase; char *path;
mutex_lock(&mdsc->mutex); @@ -81,8 +79,8 @@ static int mdsc_show(struct seq_file *s, if (req->r_inode) { seq_printf(s, " #%llx", ceph_ino(req->r_inode)); } else if (req->r_dentry) { - path = ceph_mdsc_build_path(mdsc, req->r_dentry, &pathlen, - &pathbase, 0); + struct ceph_path_info path_info; + path = ceph_mdsc_build_path(mdsc, req->r_dentry, &path_info, 0); if (IS_ERR(path)) path = NULL; spin_lock(&req->r_dentry->d_lock); @@ -91,7 +89,7 @@ static int mdsc_show(struct seq_file *s, req->r_dentry, path ? path : ""); spin_unlock(&req->r_dentry->d_lock); - ceph_mdsc_free_path(path, pathlen); + ceph_mdsc_free_path_info(&path_info); } else if (req->r_path1) { seq_printf(s, " #%llx/%s", req->r_ino1.ino, req->r_path1); @@ -100,8 +98,8 @@ static int mdsc_show(struct seq_file *s, }
if (req->r_old_dentry) { - path = ceph_mdsc_build_path(mdsc, req->r_old_dentry, &pathlen, - &pathbase, 0); + struct ceph_path_info path_info; + path = ceph_mdsc_build_path(mdsc, req->r_old_dentry, &path_info, 0); if (IS_ERR(path)) path = NULL; spin_lock(&req->r_old_dentry->d_lock); @@ -111,7 +109,7 @@ static int mdsc_show(struct seq_file *s, req->r_old_dentry, path ? path : ""); spin_unlock(&req->r_old_dentry->d_lock); - ceph_mdsc_free_path(path, pathlen); + ceph_mdsc_free_path_info(&path_info); } else if (req->r_path2 && req->r_op != CEPH_MDS_OP_SYMLINK) { if (req->r_ino2.ino) seq_printf(s, " #%llx/%s", req->r_ino2.ino, --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -1272,10 +1272,8 @@ static void ceph_async_unlink_cb(struct
/* If op failed, mark everyone involved for errors */ if (result) { - int pathlen = 0; - u64 base = 0; - char *path = ceph_mdsc_build_path(mdsc, dentry, &pathlen, - &base, 0); + struct ceph_path_info path_info = {0}; + char *path = ceph_mdsc_build_path(mdsc, dentry, &path_info, 0);
/* mark error on parent + clear complete */ mapping_set_error(req->r_parent->i_mapping, result); @@ -1289,8 +1287,8 @@ static void ceph_async_unlink_cb(struct mapping_set_error(req->r_old_inode->i_mapping, result);
pr_warn_client(cl, "failure path=(%llx)%s result=%d!\n", - base, IS_ERR(path) ? "<<bad>>" : path, result); - ceph_mdsc_free_path(path, pathlen); + path_info.vino.ino, IS_ERR(path) ? "<<bad>>" : path, result); + ceph_mdsc_free_path_info(&path_info); } out: iput(req->r_old_inode); @@ -1348,8 +1346,6 @@ static int ceph_unlink(struct inode *dir int err = -EROFS; int op; char *path; - int pathlen; - u64 pathbase;
if (ceph_snap(dir) == CEPH_SNAPDIR) { /* rmdir .snap/foo is RMSNAP */ @@ -1368,14 +1364,15 @@ static int ceph_unlink(struct inode *dir if (!dn) { try_async = false; } else { - path = ceph_mdsc_build_path(mdsc, dn, &pathlen, &pathbase, 0); + struct ceph_path_info path_info; + path = ceph_mdsc_build_path(mdsc, dn, &path_info, 0); if (IS_ERR(path)) { try_async = false; err = 0; } else { err = ceph_mds_check_access(mdsc, path, MAY_WRITE); } - ceph_mdsc_free_path(path, pathlen); + ceph_mdsc_free_path_info(&path_info); dput(dn);
/* For none EACCES cases will let the MDS do the mds auth check */ --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -368,8 +368,6 @@ int ceph_open(struct inode *inode, struc int flags, fmode, wanted; struct dentry *dentry; char *path; - int pathlen; - u64 pathbase; bool do_sync = false; int mask = MAY_READ;
@@ -399,14 +397,15 @@ int ceph_open(struct inode *inode, struc if (!dentry) { do_sync = true; } else { - path = ceph_mdsc_build_path(mdsc, dentry, &pathlen, &pathbase, 0); + struct ceph_path_info path_info; + path = ceph_mdsc_build_path(mdsc, dentry, &path_info, 0); if (IS_ERR(path)) { do_sync = true; err = 0; } else { err = ceph_mds_check_access(mdsc, path, mask); } - ceph_mdsc_free_path(path, pathlen); + ceph_mdsc_free_path_info(&path_info); dput(dentry);
/* For none EACCES cases will let the MDS do the mds auth check */ @@ -614,15 +613,13 @@ static void ceph_async_create_cb(struct mapping_set_error(req->r_parent->i_mapping, result);
if (result) { - int pathlen = 0; - u64 base = 0; - char *path = ceph_mdsc_build_path(mdsc, req->r_dentry, &pathlen, - &base, 0); + struct ceph_path_info path_info = {0}; + char *path = ceph_mdsc_build_path(mdsc, req->r_dentry, &path_info, 0);
pr_warn_client(cl, "async create failure path=(%llx)%s result=%d!\n", - base, IS_ERR(path) ? "<<bad>>" : path, result); - ceph_mdsc_free_path(path, pathlen); + path_info.vino.ino, IS_ERR(path) ? "<<bad>>" : path, result); + ceph_mdsc_free_path_info(&path_info);
ceph_dir_clear_complete(req->r_parent); if (!d_unhashed(dentry)) @@ -791,8 +788,6 @@ int ceph_atomic_open(struct inode *dir, int mask; int err; char *path; - int pathlen; - u64 pathbase;
doutc(cl, "%p %llx.%llx dentry %p '%pd' %s flags %d mode 0%o\n", dir, ceph_vinop(dir), dentry, dentry, @@ -814,7 +809,8 @@ int ceph_atomic_open(struct inode *dir, if (!dn) { try_async = false; } else { - path = ceph_mdsc_build_path(mdsc, dn, &pathlen, &pathbase, 0); + struct ceph_path_info path_info; + path = ceph_mdsc_build_path(mdsc, dn, &path_info, 0); if (IS_ERR(path)) { try_async = false; err = 0; @@ -826,7 +822,7 @@ int ceph_atomic_open(struct inode *dir, mask |= MAY_WRITE; err = ceph_mds_check_access(mdsc, path, mask); } - ceph_mdsc_free_path(path, pathlen); + ceph_mdsc_free_path_info(&path_info); dput(dn);
/* For none EACCES cases will let the MDS do the mds auth check */ --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -2488,22 +2488,21 @@ int __ceph_setattr(struct mnt_idmap *idm int truncate_retry = 20; /* The RMW will take around 50ms */ struct dentry *dentry; char *path; - int pathlen; - u64 pathbase; bool do_sync = false;
dentry = d_find_alias(inode); if (!dentry) { do_sync = true; } else { - path = ceph_mdsc_build_path(mdsc, dentry, &pathlen, &pathbase, 0); + struct ceph_path_info path_info; + path = ceph_mdsc_build_path(mdsc, dentry, &path_info, 0); if (IS_ERR(path)) { do_sync = true; err = 0; } else { err = ceph_mds_check_access(mdsc, path, MAY_WRITE); } - ceph_mdsc_free_path(path, pathlen); + ceph_mdsc_free_path_info(&path_info); dput(dentry);
/* For none EACCES cases will let the MDS do the mds auth check */ --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -2681,8 +2681,7 @@ static u8 *get_fscrypt_altname(const str * ceph_mdsc_build_path - build a path string to a given dentry * @mdsc: mds client * @dentry: dentry to which path should be built - * @plen: returned length of string - * @pbase: returned base inode number + * @path_info: output path, length, base ino+snap, and freepath ownership flag * @for_wire: is this path going to be sent to the MDS? * * Build a string that represents the path to the dentry. This is mostly called @@ -2700,7 +2699,7 @@ static u8 *get_fscrypt_altname(const str * foo/.snap/bar -> foo//bar */ char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc, struct dentry *dentry, - int *plen, u64 *pbase, int for_wire) + struct ceph_path_info *path_info, int for_wire) { struct ceph_client *cl = mdsc->fsc->client; struct dentry *cur; @@ -2810,16 +2809,28 @@ retry: return ERR_PTR(-ENAMETOOLONG); }
- *pbase = base; - *plen = PATH_MAX - 1 - pos; + /* Initialize the output structure */ + memset(path_info, 0, sizeof(*path_info)); + + path_info->vino.ino = base; + path_info->pathlen = PATH_MAX - 1 - pos; + path_info->path = path + pos; + path_info->freepath = true; + + /* Set snap from dentry if available */ + if (d_inode(dentry)) + path_info->vino.snap = ceph_snap(d_inode(dentry)); + else + path_info->vino.snap = CEPH_NOSNAP; + doutc(cl, "on %p %d built %llx '%.*s'\n", dentry, d_count(dentry), - base, *plen, path + pos); + base, PATH_MAX - 1 - pos, path + pos); return path + pos; }
static int build_dentry_path(struct ceph_mds_client *mdsc, struct dentry *dentry, - struct inode *dir, const char **ppath, int *ppathlen, - u64 *pino, bool *pfreepath, bool parent_locked) + struct inode *dir, struct ceph_path_info *path_info, + bool parent_locked) { char *path;
@@ -2828,41 +2839,47 @@ static int build_dentry_path(struct ceph dir = d_inode_rcu(dentry->d_parent); if (dir && parent_locked && ceph_snap(dir) == CEPH_NOSNAP && !IS_ENCRYPTED(dir)) { - *pino = ceph_ino(dir); + path_info->vino.ino = ceph_ino(dir); + path_info->vino.snap = ceph_snap(dir); rcu_read_unlock(); - *ppath = dentry->d_name.name; - *ppathlen = dentry->d_name.len; + path_info->path = dentry->d_name.name; + path_info->pathlen = dentry->d_name.len; + path_info->freepath = false; return 0; } rcu_read_unlock(); - path = ceph_mdsc_build_path(mdsc, dentry, ppathlen, pino, 1); + path = ceph_mdsc_build_path(mdsc, dentry, path_info, 1); if (IS_ERR(path)) return PTR_ERR(path); - *ppath = path; - *pfreepath = true; + /* + * ceph_mdsc_build_path already fills path_info, including snap handling. + */ return 0; }
-static int build_inode_path(struct inode *inode, - const char **ppath, int *ppathlen, u64 *pino, - bool *pfreepath) +static int build_inode_path(struct inode *inode, struct ceph_path_info *path_info) { struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb); struct dentry *dentry; char *path;
if (ceph_snap(inode) == CEPH_NOSNAP) { - *pino = ceph_ino(inode); - *ppathlen = 0; + path_info->vino.ino = ceph_ino(inode); + path_info->vino.snap = ceph_snap(inode); + path_info->pathlen = 0; + path_info->freepath = false; return 0; } dentry = d_find_alias(inode); - path = ceph_mdsc_build_path(mdsc, dentry, ppathlen, pino, 1); + path = ceph_mdsc_build_path(mdsc, dentry, path_info, 1); dput(dentry); if (IS_ERR(path)) return PTR_ERR(path); - *ppath = path; - *pfreepath = true; + /* + * ceph_mdsc_build_path already fills path_info, including snap from dentry. + * Override with inode's snap since that's what this function is for. + */ + path_info->vino.snap = ceph_snap(inode); return 0; }
@@ -2872,26 +2889,32 @@ static int build_inode_path(struct inode */ static int set_request_path_attr(struct ceph_mds_client *mdsc, struct inode *rinode, struct dentry *rdentry, struct inode *rdiri, - const char *rpath, u64 rino, const char **ppath, - int *pathlen, u64 *ino, bool *freepath, + const char *rpath, u64 rino, + struct ceph_path_info *path_info, bool parent_locked) { struct ceph_client *cl = mdsc->fsc->client; int r = 0;
+ /* Initialize the output structure */ + memset(path_info, 0, sizeof(*path_info)); + if (rinode) { - r = build_inode_path(rinode, ppath, pathlen, ino, freepath); + r = build_inode_path(rinode, path_info); doutc(cl, " inode %p %llx.%llx\n", rinode, ceph_ino(rinode), ceph_snap(rinode)); } else if (rdentry) { - r = build_dentry_path(mdsc, rdentry, rdiri, ppath, pathlen, ino, - freepath, parent_locked); - doutc(cl, " dentry %p %llx/%.*s\n", rdentry, *ino, *pathlen, *ppath); + r = build_dentry_path(mdsc, rdentry, rdiri, path_info, parent_locked); + doutc(cl, " dentry %p %llx/%.*s\n", rdentry, path_info->vino.ino, + path_info->pathlen, path_info->path); } else if (rpath || rino) { - *ino = rino; - *ppath = rpath; - *pathlen = rpath ? strlen(rpath) : 0; - doutc(cl, " path %.*s\n", *pathlen, rpath); + path_info->vino.ino = rino; + path_info->vino.snap = CEPH_NOSNAP; + path_info->path = rpath; + path_info->pathlen = rpath ? strlen(rpath) : 0; + path_info->freepath = false; + + doutc(cl, " path %.*s\n", path_info->pathlen, rpath); }
return r; @@ -2968,11 +2991,8 @@ static struct ceph_msg *create_request_m struct ceph_client *cl = mdsc->fsc->client; struct ceph_msg *msg; struct ceph_mds_request_head_legacy *lhead; - const char *path1 = NULL; - const char *path2 = NULL; - u64 ino1 = 0, ino2 = 0; - int pathlen1 = 0, pathlen2 = 0; - bool freepath1 = false, freepath2 = false; + struct ceph_path_info path_info1 = {0}; + struct ceph_path_info path_info2 = {0}; struct dentry *old_dentry = NULL; int len; u16 releases; @@ -2982,25 +3002,49 @@ static struct ceph_msg *create_request_m u16 request_head_version = mds_supported_head_version(session); kuid_t caller_fsuid = req->r_cred->fsuid; kgid_t caller_fsgid = req->r_cred->fsgid; + bool parent_locked = test_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry, - req->r_parent, req->r_path1, req->r_ino1.ino, - &path1, &pathlen1, &ino1, &freepath1, - test_bit(CEPH_MDS_R_PARENT_LOCKED, - &req->r_req_flags)); + req->r_parent, req->r_path1, req->r_ino1.ino, + &path_info1, parent_locked); if (ret < 0) { msg = ERR_PTR(ret); goto out; }
+ /* + * When the parent directory's i_rwsem is *not* locked, req->r_parent may + * have become stale (e.g. after a concurrent rename) between the time the + * dentry was looked up and now. If we detect that the stored r_parent + * does not match the inode number we just encoded for the request, switch + * to the correct inode so that the MDS receives a valid parent reference. + */ + if (!parent_locked && req->r_parent && path_info1.vino.ino && + ceph_ino(req->r_parent) != path_info1.vino.ino) { + struct inode *old_parent = req->r_parent; + struct inode *correct_dir = ceph_get_inode(mdsc->fsc->sb, path_info1.vino, NULL); + if (!IS_ERR(correct_dir)) { + WARN_ONCE(1, "ceph: r_parent mismatch (had %llx wanted %llx) - updating\n", + ceph_ino(old_parent), path_info1.vino.ino); + /* + * Transfer CEPH_CAP_PIN from the old parent to the new one. + * The pin was taken earlier in ceph_mdsc_submit_request(). + */ + ceph_put_cap_refs(ceph_inode(old_parent), CEPH_CAP_PIN); + iput(old_parent); + req->r_parent = correct_dir; + ceph_get_cap_refs(ceph_inode(req->r_parent), CEPH_CAP_PIN); + } + } + /* If r_old_dentry is set, then assume that its parent is locked */ if (req->r_old_dentry && !(req->r_old_dentry->d_flags & DCACHE_DISCONNECTED)) old_dentry = req->r_old_dentry; ret = set_request_path_attr(mdsc, NULL, old_dentry, - req->r_old_dentry_dir, - req->r_path2, req->r_ino2.ino, - &path2, &pathlen2, &ino2, &freepath2, true); + req->r_old_dentry_dir, + req->r_path2, req->r_ino2.ino, + &path_info2, true); if (ret < 0) { msg = ERR_PTR(ret); goto out_free1; @@ -3031,7 +3075,7 @@ static struct ceph_msg *create_request_m
/* filepaths */ len += 2 * (1 + sizeof(u32) + sizeof(u64)); - len += pathlen1 + pathlen2; + len += path_info1.pathlen + path_info2.pathlen;
/* cap releases */ len += sizeof(struct ceph_mds_request_release) * @@ -3039,9 +3083,9 @@ static struct ceph_msg *create_request_m !!req->r_old_inode_drop + !!req->r_old_dentry_drop);
if (req->r_dentry_drop) - len += pathlen1; + len += path_info1.pathlen; if (req->r_old_dentry_drop) - len += pathlen2; + len += path_info2.pathlen;
/* MClientRequest tail */
@@ -3154,8 +3198,8 @@ static struct ceph_msg *create_request_m lhead->ino = cpu_to_le64(req->r_deleg_ino); lhead->args = req->r_args;
- ceph_encode_filepath(&p, end, ino1, path1); - ceph_encode_filepath(&p, end, ino2, path2); + ceph_encode_filepath(&p, end, path_info1.vino.ino, path_info1.path); + ceph_encode_filepath(&p, end, path_info2.vino.ino, path_info2.path);
/* make note of release offset, in case we need to replay */ req->r_request_release_offset = p - msg->front.iov_base; @@ -3218,11 +3262,9 @@ static struct ceph_msg *create_request_m msg->hdr.data_off = cpu_to_le16(0);
out_free2: - if (freepath2) - ceph_mdsc_free_path((char *)path2, pathlen2); + ceph_mdsc_free_path_info(&path_info2); out_free1: - if (freepath1) - ceph_mdsc_free_path((char *)path1, pathlen1); + ceph_mdsc_free_path_info(&path_info1); out: return msg; out_err: @@ -4579,24 +4621,20 @@ static int reconnect_caps_cb(struct inod struct ceph_pagelist *pagelist = recon_state->pagelist; struct dentry *dentry; struct ceph_cap *cap; - char *path; - int pathlen = 0, err; - u64 pathbase; + struct ceph_path_info path_info = {0}; + int err; u64 snap_follows;
dentry = d_find_primary(inode); if (dentry) { /* set pathbase to parent dir when msg_version >= 2 */ - path = ceph_mdsc_build_path(mdsc, dentry, &pathlen, &pathbase, + char *path = ceph_mdsc_build_path(mdsc, dentry, &path_info, recon_state->msg_version >= 2); dput(dentry); if (IS_ERR(path)) { err = PTR_ERR(path); goto out_err; } - } else { - path = NULL; - pathbase = 0; }
spin_lock(&ci->i_ceph_lock); @@ -4629,7 +4667,7 @@ static int reconnect_caps_cb(struct inod rec.v2.wanted = cpu_to_le32(__ceph_caps_wanted(ci)); rec.v2.issued = cpu_to_le32(cap->issued); rec.v2.snaprealm = cpu_to_le64(ci->i_snap_realm->ino); - rec.v2.pathbase = cpu_to_le64(pathbase); + rec.v2.pathbase = cpu_to_le64(path_info.vino.ino); rec.v2.flock_len = (__force __le32) ((ci->i_ceph_flags & CEPH_I_ERROR_FILELOCK) ? 0 : 1); } else { @@ -4644,7 +4682,7 @@ static int reconnect_caps_cb(struct inod ts = inode_get_atime(inode); ceph_encode_timespec64(&rec.v1.atime, &ts); rec.v1.snaprealm = cpu_to_le64(ci->i_snap_realm->ino); - rec.v1.pathbase = cpu_to_le64(pathbase); + rec.v1.pathbase = cpu_to_le64(path_info.vino.ino); }
if (list_empty(&ci->i_cap_snaps)) { @@ -4706,7 +4744,7 @@ encode_again: sizeof(struct ceph_filelock); rec.v2.flock_len = cpu_to_le32(struct_len);
- struct_len += sizeof(u32) + pathlen + sizeof(rec.v2); + struct_len += sizeof(u32) + path_info.pathlen + sizeof(rec.v2);
if (struct_v >= 2) struct_len += sizeof(u64); /* snap_follows */ @@ -4730,7 +4768,7 @@ encode_again: ceph_pagelist_encode_8(pagelist, 1); ceph_pagelist_encode_32(pagelist, struct_len); } - ceph_pagelist_encode_string(pagelist, path, pathlen); + ceph_pagelist_encode_string(pagelist, (char *)path_info.path, path_info.pathlen); ceph_pagelist_append(pagelist, &rec, sizeof(rec.v2)); ceph_locks_to_pagelist(flocks, pagelist, num_fcntl_locks, num_flock_locks); @@ -4741,17 +4779,17 @@ out_freeflocks: } else { err = ceph_pagelist_reserve(pagelist, sizeof(u64) + sizeof(u32) + - pathlen + sizeof(rec.v1)); + path_info.pathlen + sizeof(rec.v1)); if (err) goto out_err;
ceph_pagelist_encode_64(pagelist, ceph_ino(inode)); - ceph_pagelist_encode_string(pagelist, path, pathlen); + ceph_pagelist_encode_string(pagelist, (char *)path_info.path, path_info.pathlen); ceph_pagelist_append(pagelist, &rec, sizeof(rec.v1)); }
out_err: - ceph_mdsc_free_path(path, pathlen); + ceph_mdsc_free_path_info(&path_info); if (!err) recon_state->nr_caps++; return err; --- a/fs/ceph/mds_client.h +++ b/fs/ceph/mds_client.h @@ -617,14 +617,24 @@ extern int ceph_mds_check_access(struct
extern void ceph_mdsc_pre_umount(struct ceph_mds_client *mdsc);
-static inline void ceph_mdsc_free_path(char *path, int len) +/* + * Structure to group path-related output parameters for build_*_path functions + */ +struct ceph_path_info { + const char *path; + int pathlen; + struct ceph_vino vino; + bool freepath; +}; + +static inline void ceph_mdsc_free_path_info(const struct ceph_path_info *path_info) { - if (!IS_ERR_OR_NULL(path)) - __putname(path - (PATH_MAX - 1 - len)); + if (path_info && path_info->freepath && !IS_ERR_OR_NULL(path_info->path)) + __putname((char *)path_info->path - (PATH_MAX - 1 - path_info->pathlen)); }
extern char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc, - struct dentry *dentry, int *plen, u64 *base, + struct dentry *dentry, struct ceph_path_info *path_info, int for_wire);
extern void __ceph_mdsc_drop_dentry_lease(struct dentry *dentry);
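With the new API, one ceph_mdsc_build_path() call pairs with exactly one ceph_mdsc_free_path_info() call. A minimal usage sketch (hypothetical caller; only the helpers declared above are assumed):

  struct ceph_path_info path_info = {0};
  char *path;

  path = ceph_mdsc_build_path(mdsc, dentry, &path_info, 1);
  if (IS_ERR(path))
          return PTR_ERR(path);

  /* encode using path_info.path, path_info.pathlen and
   * path_info.vino.ino here */

  ceph_mdsc_free_path_info(&path_info); /* no-op unless freepath was set */
  return 0;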
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Markuze amarkuze@redhat.com
commit bec324f33d1ed346394b2eee25bf6dbf3511f727 upstream.
When the parent directory's i_rwsem is not locked, req->r_parent may become stale due to concurrent operations (e.g. rename) between dentry lookup and message creation. Validate that r_parent matches the encoded parent inode and update to the correct inode if a mismatch is detected.
[ idryomov: folded a follow-up fix from Alex to drop extra reference from ceph_get_reply_dir() in ceph_fill_trace():
ceph_get_reply_dir() may return a different, referenced inode when r_parent is stale and the parent directory lock is not held. ceph_fill_trace() used that inode but failed to drop the reference when it differed from req->r_parent, leaking an inode reference.
Keep the directory inode in a local variable and iput() it at function end if it does not match req->r_parent. ]
Cc: stable@vger.kernel.org Signed-off-by: Alex Markuze amarkuze@redhat.com Reviewed-by: Viacheslav Dubeyko Slava.Dubeyko@ibm.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ceph/inode.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 69 insertions(+), 12 deletions(-)
--- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -55,6 +55,52 @@ static int ceph_set_ino_cb(struct inode return 0; }
+/* + * Check if the parent inode matches the vino from directory reply info + */ +static inline bool ceph_vino_matches_parent(struct inode *parent, + struct ceph_vino vino) +{ + return ceph_ino(parent) == vino.ino && ceph_snap(parent) == vino.snap; +} + +/* + * Validate that the directory inode referenced by @req->r_parent matches the + * inode number and snapshot id contained in the reply's directory record. If + * they do not match – which can theoretically happen if the parent dentry was + * moved between the time the request was issued and the reply arrived – fall + * back to looking up the correct inode in the inode cache. + * + * A reference is *always* returned. Callers that receive a different inode + * than the original @parent are responsible for dropping the extra reference + * once the reply has been processed. + */ +static struct inode *ceph_get_reply_dir(struct super_block *sb, + struct inode *parent, + struct ceph_mds_reply_info_parsed *rinfo) +{ + struct ceph_vino vino; + + if (unlikely(!rinfo->diri.in)) + return parent; /* nothing to compare against */ + + /* If we didn't have a cached parent inode to begin with, just bail out. */ + if (!parent) + return NULL; + + vino.ino = le64_to_cpu(rinfo->diri.in->ino); + vino.snap = le64_to_cpu(rinfo->diri.in->snapid); + + if (likely(ceph_vino_matches_parent(parent, vino))) + return parent; /* matches – use the original reference */ + + /* Mismatch – this should be rare. Emit a WARN and obtain the correct inode. */ + WARN_ONCE(1, "ceph: reply dir mismatch (parent valid %llx.%llx reply %llx.%llx)\n", + ceph_ino(parent), ceph_snap(parent), vino.ino, vino.snap); + + return ceph_get_inode(sb, vino, NULL); +} + /** * ceph_new_inode - allocate a new inode in advance of an expected create * @dir: parent directory for new inode @@ -1523,6 +1569,7 @@ int ceph_fill_trace(struct super_block * struct ceph_vino tvino, dvino; struct ceph_fs_client *fsc = ceph_sb_to_fs_client(sb); struct ceph_client *cl = fsc->client; + struct inode *parent_dir = NULL; int err = 0;
doutc(cl, "%p is_dentry %d is_target %d\n", req, @@ -1536,10 +1583,17 @@ int ceph_fill_trace(struct super_block * }
if (rinfo->head->is_dentry) { - struct inode *dir = req->r_parent; - - if (dir) { - err = ceph_fill_inode(dir, NULL, &rinfo->diri, + /* + * r_parent may be stale, in cases when R_PARENT_LOCKED is not set, + * so we need to get the correct inode + */ + parent_dir = ceph_get_reply_dir(sb, req->r_parent, rinfo); + if (unlikely(IS_ERR(parent_dir))) { + err = PTR_ERR(parent_dir); + goto done; + } + if (parent_dir) { + err = ceph_fill_inode(parent_dir, NULL, &rinfo->diri, rinfo->dirfrag, session, -1, &req->r_caps_reservation); if (err < 0) @@ -1548,14 +1602,14 @@ int ceph_fill_trace(struct super_block * WARN_ON_ONCE(1); }
- if (dir && req->r_op == CEPH_MDS_OP_LOOKUPNAME && + if (parent_dir && req->r_op == CEPH_MDS_OP_LOOKUPNAME && test_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags) && !test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags)) { bool is_nokey = false; struct qstr dname; struct dentry *dn, *parent; struct fscrypt_str oname = FSTR_INIT(NULL, 0); - struct ceph_fname fname = { .dir = dir, + struct ceph_fname fname = { .dir = parent_dir, .name = rinfo->dname, .ctext = rinfo->altname, .name_len = rinfo->dname_len, @@ -1564,10 +1618,10 @@ int ceph_fill_trace(struct super_block * BUG_ON(!rinfo->head->is_target); BUG_ON(req->r_dentry);
- parent = d_find_any_alias(dir); + parent = d_find_any_alias(parent_dir); BUG_ON(!parent);
- err = ceph_fname_alloc_buffer(dir, &oname); + err = ceph_fname_alloc_buffer(parent_dir, &oname); if (err < 0) { dput(parent); goto done; @@ -1576,7 +1630,7 @@ int ceph_fill_trace(struct super_block * err = ceph_fname_to_usr(&fname, NULL, &oname, &is_nokey); if (err < 0) { dput(parent); - ceph_fname_free_buffer(dir, &oname); + ceph_fname_free_buffer(parent_dir, &oname); goto done; } dname.name = oname.name; @@ -1595,7 +1649,7 @@ retry_lookup: dname.len, dname.name, dn); if (!dn) { dput(parent); - ceph_fname_free_buffer(dir, &oname); + ceph_fname_free_buffer(parent_dir, &oname); err = -ENOMEM; goto done; } @@ -1610,12 +1664,12 @@ retry_lookup: ceph_snap(d_inode(dn)) != tvino.snap)) { doutc(cl, " dn %p points to wrong inode %p\n", dn, d_inode(dn)); - ceph_dir_clear_ordered(dir); + ceph_dir_clear_ordered(parent_dir); d_delete(dn); dput(dn); goto retry_lookup; } - ceph_fname_free_buffer(dir, &oname); + ceph_fname_free_buffer(parent_dir, &oname);
req->r_dentry = dn; dput(parent); @@ -1794,6 +1848,9 @@ retry_lookup: &dvino, ptvino); } done: + /* Drop extra ref from ceph_get_reply_dir() if it returned a new inode */ + if (unlikely(!IS_ERR_OR_NULL(parent_dir) && parent_dir != req->r_parent)) + iput(parent_dir); doutc(cl, "done err=%d\n", err); return err; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Max Kellermann max.kellermann@ionos.com
commit cce7c15faaac79b532a07ed6ab8332280ad83762 upstream.
The function ceph_process_folio_batch() sets folio_batch entries to NULL, which is an illegal state. Before folio_batch_release() crashes due to this API violation, the function ceph_shift_unused_folios_left() is supposed to remove those NULLs from the array.
However, since commit ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method"), this shifting doesn't happen anymore because the "for" loop got moved to ceph_process_folio_batch(), and now the `i` variable that remains in ceph_writepages_start() doesn't get incremented anymore, making the shifting effectively unreachable much of the time.
Later, commit 1551ec61dc55 ("ceph: introduce ceph_submit_write() method") added more preconditions for doing the shift, replacing the `i` check (with something that is still just as broken):
- if ceph_process_folio_batch() fails, shifting never happens
- if ceph_move_dirty_page_in_page_array() was never called (because ceph_process_folio_batch() has returned early for any of various reasons), shifting never happens
- if `processed_in_fbatch` is zero (because ceph_process_folio_batch() has returned early for some of the reasons mentioned above or because ceph_move_dirty_page_in_page_array() has failed), shifting never happens
Since those two commits, any problem in ceph_process_folio_batch() could crash the kernel, e.g. this way:
BUG: kernel NULL pointer dereference, address: 0000000000000034
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: Oops: 0002 [#1] SMP NOPTI
CPU: 172 UID: 0 PID: 2342707 Comm: kworker/u778:8 Not tainted 6.15.10-cm4all1-es #714 NONE
Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.10 12/08/2023
Workqueue: writeback wb_workfn (flush-ceph-1)
RIP: 0010:folios_put_refs+0x85/0x140
Code: 83 c5 01 39 e8 7e 76 48 63 c5 49 8b 5c c4 08 b8 01 00 00 00 4d 85 ed 74 05 41 8b 44 ad 00 48 8b 15 b0 >
RSP: 0018:ffffb880af8db778 EFLAGS: 00010207
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000003
RDX: ffffe377cc3b0000 RSI: 0000000000000000 RDI: ffffb880af8db8c0
RBP: 0000000000000000 R08: 000000000000007d R09: 000000000102b86f
R10: 0000000000000001 R11: 00000000000000ac R12: ffffb880af8db8c0
R13: 0000000000000000 R14: 0000000000000000 R15: ffff9bd262c97000
FS:  0000000000000000(0000) GS:ffff9c8efc303000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000034 CR3: 0000000160958004 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ceph_writepages_start+0xeb9/0x1410
The crash can be reproduced easily by changing the ceph_check_page_before_write() return value to `-E2BIG`.
(Interestingly, the crash happens only if `huge_zero_folio` has already been allocated; without `huge_zero_folio`, is_huge_zero_folio(NULL) returns true and folios_put_refs() skips NULL entries instead of dereferencing them. That makes reproducing the bug somewhat unreliable. See https://lore.kernel.org/20250826231626.218675-1-max.kellermann@ionos.com for a discussion of this detail.)
My suggestion is to move the ceph_shift_unused_folios_left() to right after ceph_process_folio_batch() to ensure it always gets called to fix up the illegal folio_batch state.
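Conceptually, the fixup only has to compact the batch in place; a minimal sketch of what ceph_shift_unused_folios_left() needs to do (field names from struct folio_batch; not the driver's exact code):

  static void shift_unused_folios_left(struct folio_batch *fbatch)
  {
          unsigned int i, j = 0;

          /* Drop the NULL entries left behind by processing so they
           * never reach folios_put_refs() via folio_batch_release(). */
          for (i = 0; i < folio_batch_count(fbatch); i++) {
                  if (fbatch->folios[i])
                          fbatch->folios[j++] = fbatch->folios[i];
          }
          fbatch->nr = j;
  }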
Cc: stable@vger.kernel.org Fixes: ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method") Link: https://lore.kernel.org/ceph-devel/aK4v548CId5GIKG1@swift.blarg.de/ Signed-off-by: Max Kellermann max.kellermann@ionos.com Reviewed-by: Viacheslav Dubeyko Slava.Dubeyko@ibm.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ceph/addr.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -1687,6 +1687,7 @@ get_more_pages:
process_folio_batch: rc = ceph_process_folio_batch(mapping, wbc, &ceph_wbc); + ceph_shift_unused_folios_left(&ceph_wbc.fbatch); if (rc) goto release_folios;
@@ -1695,8 +1696,6 @@ process_folio_batch: goto release_folios;
if (ceph_wbc.processed_in_fbatch) { - ceph_shift_unused_folios_left(&ceph_wbc.fbatch); - if (folio_batch_count(&ceph_wbc.fbatch) == 0 && ceph_wbc.locked_pages < ceph_wbc.max_pages) { doutc(cl, "reached end fbatch, trying for more\n");
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Max Kellermann max.kellermann@ionos.com
commit 249e0a47cdb46bb9eae65511c569044bd8698d7d upstream.
The function move_dirty_folio_in_page_array() was created by commit ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method") by moving code from ceph_writepages_start() to this function.
This new function is supposed to return an error code which is checked by the caller (now ceph_process_folio_batch()), and on error, the caller invokes redirty_page_for_writepage() and then breaks from the loop.
However, the refactoring went wrong: by accident, the function always returns 0 (= success) because it first NULLs the pointer and then returns PTR_ERR(NULL), which is always 0. This means errors are silently ignored, leaving NULL entries in the page array, which may later crash the kernel.
The simple solution is to call PTR_ERR() before clearing the pointer.
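The pitfall is easy to demonstrate outside the kernel; a self-contained illustration with simplified stand-ins for the error-pointer macros (not the kernel's exact definitions):

  #include <stdio.h>

  #define EINVAL 22
  #define ERR_PTR(e) ((void *)(long)(e))
  #define PTR_ERR(p) ((long)(p))

  int main(void)
  {
          void *page = ERR_PTR(-EINVAL); /* pretend the allocation failed */
          long err;

          /* Fixed order: capture the error code before clearing the slot. */
          err = PTR_ERR(page);
          page = NULL;
          printf("fixed order: %ld\n", err);           /* prints -22 */

          /* Buggy order: clear the slot first; PTR_ERR(NULL) is 0. */
          page = ERR_PTR(-EINVAL);
          page = NULL;
          printf("buggy order: %ld\n", PTR_ERR(page)); /* prints 0 */

          return 0;
  }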
Cc: stable@vger.kernel.org Fixes: ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method") Link: https://lore.kernel.org/ceph-devel/aK4v548CId5GIKG1@swift.blarg.de/ Signed-off-by: Max Kellermann max.kellermann@ionos.com Reviewed-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ceph/addr.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index 8bc66b45dade..322ed268f14a 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -1264,7 +1264,9 @@ static inline int move_dirty_folio_in_page_array(struct address_space *mapping, 0, gfp_flags); if (IS_ERR(pages[index])) { - if (PTR_ERR(pages[index]) == -EINVAL) { + int err = PTR_ERR(pages[index]); + + if (err == -EINVAL) { pr_err_client(cl, "inode->i_blkbits=%hhu\n", inode->i_blkbits); } @@ -1273,7 +1275,7 @@ static inline int move_dirty_folio_in_page_array(struct address_space *mapping, BUG_ON(ceph_wbc->locked_pages == 0);
pages[index] = NULL; - return PTR_ERR(pages[index]); + return err; } } else { pages[index] = &folio->page;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miquel Raynal miquel.raynal@bootlin.com
[ Upstream commit da55809ebb45d1d80b7a388ffef841ed683e1a6f ]
There is already a manufacturer hook, which is manufacturer specific but not chip specific. We no longer have access to the actual NAND identity at this stage, so let's add a per-chip configuration hook to align the chip configuration (if any) with the core's settings.
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Stable-dep-of: 4550d33e1811 ("mtd: spinand: winbond: Fix oob_layout for W25N01JW") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/spi/core.c | 16 ++++++++++++++-- include/linux/mtd/spinand.h | 7 +++++++ 2 files changed, 21 insertions(+), 2 deletions(-)
--- a/drivers/mtd/nand/spi/core.c +++ b/drivers/mtd/nand/spi/core.c @@ -1253,8 +1253,19 @@ static int spinand_id_detect(struct spin
static int spinand_manufacturer_init(struct spinand_device *spinand) { - if (spinand->manufacturer->ops->init) - return spinand->manufacturer->ops->init(spinand); + int ret; + + if (spinand->manufacturer->ops->init) { + ret = spinand->manufacturer->ops->init(spinand); + if (ret) + return ret; + } + + if (spinand->configure_chip) { + ret = spinand->configure_chip(spinand); + if (ret) + return ret; + }
return 0; } @@ -1349,6 +1360,7 @@ int spinand_match_and_init(struct spinan spinand->flags = table[i].flags; spinand->id.len = 1 + table[i].devid.len; spinand->select_target = table[i].select_target; + spinand->configure_chip = table[i].configure_chip; spinand->set_cont_read = table[i].set_cont_read; spinand->fact_otp = &table[i].fact_otp; spinand->user_otp = &table[i].user_otp; --- a/include/linux/mtd/spinand.h +++ b/include/linux/mtd/spinand.h @@ -484,6 +484,7 @@ struct spinand_user_otp { * @op_variants.update_cache: variants of the update-cache operation * @select_target: function used to select a target/die. Required only for * multi-die chips + * @configure_chip: Align the chip configuration with the core settings * @set_cont_read: enable/disable continuous cached reads * @fact_otp: SPI NAND factory OTP info. * @user_otp: SPI NAND user OTP info. @@ -507,6 +508,7 @@ struct spinand_info { } op_variants; int (*select_target)(struct spinand_device *spinand, unsigned int target); + int (*configure_chip)(struct spinand_device *spinand); int (*set_cont_read)(struct spinand_device *spinand, bool enable); struct spinand_fact_otp fact_otp; @@ -539,6 +541,9 @@ struct spinand_info { #define SPINAND_SELECT_TARGET(__func) \ .select_target = __func
+#define SPINAND_CONFIGURE_CHIP(__configure_chip) \ + .configure_chip = __configure_chip + #define SPINAND_CONT_READ(__set_cont_read) \ .set_cont_read = __set_cont_read
@@ -607,6 +612,7 @@ struct spinand_dirmap { * passed in spi_mem_op be DMA-able, so we can't based the bufs on * the stack * @manufacturer: SPI NAND manufacturer information + * @configure_chip: Align the chip configuration with the core settings * @cont_read_possible: Field filled by the core once the whole system * configuration is known to tell whether continuous reads are * suitable to use or not in general with this chip/configuration. @@ -647,6 +653,7 @@ struct spinand_device { const struct spinand_manufacturer *manufacturer; void *priv;
+ int (*configure_chip)(struct spinand_device *spinand); bool cont_read_possible; int (*set_cont_read)(struct spinand_device *spinand, bool enable);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miquel Raynal miquel.raynal@bootlin.com
[ Upstream commit f1a91175faaab02a45d1ceb313a315a5bfeb5416 ]
w25n0xjw chips have a high-speed capability hidden in a configuration register. Once enabled, dual/quad SDR reads may be performed at a much higher frequency.
Implement the new ->configure_chip() hook for this purpose and configure the SR4 register accordingly.
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Stable-dep-of: 4550d33e1811 ("mtd: spinand: winbond: Fix oob_layout for W25N01JW") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/spi/core.c | 2 - drivers/mtd/nand/spi/winbond.c | 45 +++++++++++++++++++++++++++++++++++++++-- include/linux/mtd/spinand.h | 1 3 files changed, 45 insertions(+), 3 deletions(-)
--- a/drivers/mtd/nand/spi/core.c +++ b/drivers/mtd/nand/spi/core.c @@ -20,7 +20,7 @@ #include <linux/spi/spi.h> #include <linux/spi/spi-mem.h>
-static int spinand_read_reg_op(struct spinand_device *spinand, u8 reg, u8 *val) +int spinand_read_reg_op(struct spinand_device *spinand, u8 reg, u8 *val) { struct spi_mem_op op = SPINAND_GET_FEATURE_1S_1S_1S_OP(reg, spinand->scratchbuf); --- a/drivers/mtd/nand/spi/winbond.c +++ b/drivers/mtd/nand/spi/winbond.c @@ -18,6 +18,9 @@
#define W25N04KV_STATUS_ECC_5_8_BITFLIPS (3 << 4)
+#define W25N0XJW_SR4 0xD0 +#define W25N0XJW_SR4_HS BIT(2) + /* * "X2" in the core is equivalent to "dual output" in the datasheets, * "X4" in the core is equivalent to "quad output" in the datasheets. @@ -42,10 +45,12 @@ static SPINAND_OP_VARIANTS(update_cache_ static SPINAND_OP_VARIANTS(read_cache_dual_quad_dtr_variants, SPINAND_PAGE_READ_FROM_CACHE_1S_4D_4D_OP(0, 8, NULL, 0, 80 * HZ_PER_MHZ), SPINAND_PAGE_READ_FROM_CACHE_1S_1D_4D_OP(0, 2, NULL, 0, 80 * HZ_PER_MHZ), + SPINAND_PAGE_READ_FROM_CACHE_1S_4S_4S_OP(0, 4, NULL, 0, 0), SPINAND_PAGE_READ_FROM_CACHE_1S_4S_4S_OP(0, 2, NULL, 0, 104 * HZ_PER_MHZ), SPINAND_PAGE_READ_FROM_CACHE_1S_1S_4S_OP(0, 1, NULL, 0), SPINAND_PAGE_READ_FROM_CACHE_1S_2D_2D_OP(0, 4, NULL, 0, 80 * HZ_PER_MHZ), SPINAND_PAGE_READ_FROM_CACHE_1S_1D_2D_OP(0, 2, NULL, 0, 80 * HZ_PER_MHZ), + SPINAND_PAGE_READ_FROM_CACHE_1S_2S_2S_OP(0, 2, NULL, 0, 0), SPINAND_PAGE_READ_FROM_CACHE_1S_2S_2S_OP(0, 1, NULL, 0, 104 * HZ_PER_MHZ), SPINAND_PAGE_READ_FROM_CACHE_1S_1S_2S_OP(0, 1, NULL, 0), SPINAND_PAGE_READ_FROM_CACHE_1S_1D_1D_OP(0, 2, NULL, 0, 80 * HZ_PER_MHZ), @@ -230,6 +235,40 @@ static int w25n02kv_ecc_get_status(struc return -EINVAL; }
+static int w25n0xjw_hs_cfg(struct spinand_device *spinand) +{ + const struct spi_mem_op *op; + bool hs; + u8 sr4; + int ret; + + op = spinand->op_templates.read_cache; + if (op->cmd.dtr || op->addr.dtr || op->dummy.dtr || op->data.dtr) + hs = false; + else if (op->cmd.buswidth == 1 && op->addr.buswidth == 1 && + op->dummy.buswidth == 1 && op->data.buswidth == 1) + hs = false; + else if (!op->max_freq) + hs = true; + else + hs = false; + + ret = spinand_read_reg_op(spinand, W25N0XJW_SR4, &sr4); + if (ret) + return ret; + + if (hs) + sr4 |= W25N0XJW_SR4_HS; + else + sr4 &= ~W25N0XJW_SR4_HS; + + ret = spinand_write_reg_op(spinand, W25N0XJW_SR4, sr4); + if (ret) + return ret; + + return 0; +} + static const struct spinand_info winbond_spinand_table[] = { /* 512M-bit densities */ SPINAND_INFO("W25N512GW", /* 1.8V */ @@ -268,7 +307,8 @@ static const struct spinand_info winbond &write_cache_variants, &update_cache_variants), 0, - SPINAND_ECCINFO(&w25m02gv_ooblayout, NULL)), + SPINAND_ECCINFO(&w25m02gv_ooblayout, NULL), + SPINAND_CONFIGURE_CHIP(w25n0xjw_hs_cfg)), SPINAND_INFO("W25N01KV", /* 3.3V */ SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0xae, 0x21), NAND_MEMORG(1, 2048, 96, 64, 1024, 20, 1, 1, 1), @@ -324,7 +364,8 @@ static const struct spinand_info winbond &write_cache_variants, &update_cache_variants), 0, - SPINAND_ECCINFO(&w25m02gv_ooblayout, NULL)), + SPINAND_ECCINFO(&w25m02gv_ooblayout, NULL), + SPINAND_CONFIGURE_CHIP(w25n0xjw_hs_cfg)), SPINAND_INFO("W25N02KV", /* 3.3V */ SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0xaa, 0x22), NAND_MEMORG(1, 2048, 128, 64, 2048, 40, 1, 1, 1), --- a/include/linux/mtd/spinand.h +++ b/include/linux/mtd/spinand.h @@ -730,6 +730,7 @@ int spinand_match_and_init(struct spinan enum spinand_readid_method rdid_method);
int spinand_upd_cfg(struct spinand_device *spinand, u8 mask, u8 val); +int spinand_read_reg_op(struct spinand_device *spinand, u8 reg, u8 *val); int spinand_write_reg_op(struct spinand_device *spinand, u8 reg, u8 val); int spinand_select_target(struct spinand_device *spinand, unsigned int target);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Santhosh Kumar K s-k6@ti.com
[ Upstream commit 4550d33e18112a11a740424c4eec063cd58e918c ]
Fix the W25N01JW's oob_layout according to the datasheet [1]
[1] https://www.winbond.com/hq/product/code-storage-flash-memory/qspinand-flash/...
Fixes: 6a804fb72de5 ("mtd: spinand: winbond: add support for serial NAND flash") Cc: Sridharan S N quic_sridsn@quicinc.com Cc: stable@vger.kernel.org Signed-off-by: Santhosh Kumar K s-k6@ti.com Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/spi/winbond.c | 37 ++++++++++++++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-)
--- a/drivers/mtd/nand/spi/winbond.c +++ b/drivers/mtd/nand/spi/winbond.c @@ -162,6 +162,36 @@ static const struct mtd_ooblayout_ops w2 .free = w25n02kv_ooblayout_free, };
+static int w25n01jw_ooblayout_ecc(struct mtd_info *mtd, int section, + struct mtd_oob_region *region) +{ + if (section > 3) + return -ERANGE; + + region->offset = (16 * section) + 12; + region->length = 4; + + return 0; +} + +static int w25n01jw_ooblayout_free(struct mtd_info *mtd, int section, + struct mtd_oob_region *region) +{ + if (section > 3) + return -ERANGE; + + region->offset = (16 * section); + region->length = 12; + + /* Extract BBM */ + if (!section) { + region->offset += 2; + region->length -= 2; + } + + return 0; +} + static int w35n01jw_ooblayout_ecc(struct mtd_info *mtd, int section, struct mtd_oob_region *region) { @@ -192,6 +222,11 @@ static int w35n01jw_ooblayout_free(struc return 0; }
+static const struct mtd_ooblayout_ops w25n01jw_ooblayout = { + .ecc = w25n01jw_ooblayout_ecc, + .free = w25n01jw_ooblayout_free, +}; + static const struct mtd_ooblayout_ops w35n01jw_ooblayout = { .ecc = w35n01jw_ooblayout_ecc, .free = w35n01jw_ooblayout_free, @@ -307,7 +342,7 @@ static const struct spinand_info winbond &write_cache_variants, &update_cache_variants), 0, - SPINAND_ECCINFO(&w25m02gv_ooblayout, NULL), + SPINAND_ECCINFO(&w25n01jw_ooblayout, NULL), SPINAND_CONFIGURE_CHIP(w25n0xjw_hs_cfg)), SPINAND_INFO("W25N01KV", /* 3.3V */ SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0xae, 0x21),
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stanislav Fort stanislav.fort@aisle.com
commit 3260a3f0828e06f5f13fac69fb1999a6d60d9cff upstream.
state_show() reads kdamond->damon_ctx without holding damon_sysfs_lock. This allows a use-after-free race:
CPU 0                                   CPU 1
-----                                   -----
state_show()                            damon_sysfs_turn_damon_on()
ctx = kdamond->damon_ctx;               mutex_lock(&damon_sysfs_lock);
                                        damon_destroy_ctx(kdamond->damon_ctx);
                                        kdamond->damon_ctx = NULL;
                                        mutex_unlock(&damon_sysfs_lock);
damon_is_running(ctx); /* ctx is freed */
mutex_lock(&ctx->kdamond_lock); /* UAF */
(The race can also occur with damon_sysfs_kdamonds_rm_dirs() and damon_sysfs_kdamond_release(), which free or replace the context under damon_sysfs_lock.)
Fix by taking damon_sysfs_lock before dereferencing the context, mirroring the locking used in pid_show().
The bug has existed since state_show() first accessed kdamond->damon_ctx.
Link: https://lkml.kernel.org/r/20250905101046.2288-1-disclosure@aisle.com Fixes: a61ea561c871 ("mm/damon/sysfs: link DAMON for virtual address spaces monitoring") Signed-off-by: Stanislav Fort disclosure@aisle.com Reported-by: Stanislav Fort disclosure@aisle.com Reviewed-by: SeongJae Park sj@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: SeongJae Park sj@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/damon/sysfs.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
--- a/mm/damon/sysfs.c +++ b/mm/damon/sysfs.c @@ -1243,14 +1243,18 @@ static ssize_t state_show(struct kobject { struct damon_sysfs_kdamond *kdamond = container_of(kobj, struct damon_sysfs_kdamond, kobj); - struct damon_ctx *ctx = kdamond->damon_ctx; - bool running; + struct damon_ctx *ctx; + bool running = false;
- if (!ctx) - running = false; - else + if (!mutex_trylock(&damon_sysfs_lock)) + return -EBUSY; + + ctx = kdamond->damon_ctx; + if (ctx) running = damon_sysfs_ctx_running(ctx);
+ mutex_unlock(&damon_sysfs_lock); + return sysfs_emit(buf, "%s\n", running ? damon_sysfs_cmd_strs[DAMON_SYSFS_CMD_ON] : damon_sysfs_cmd_strs[DAMON_SYSFS_CMD_OFF]);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quanmin Yan yanquanmin1@huawei.com
commit e6b543ca9806d7bced863f43020e016ee996c057 upstream.
When creating a new scheme of DAMON_RECLAIM, the calculation of 'min_age_region' uses 'aggr_interval' as the divisor, which may lead to division-by-zero errors. Fix it by directly returning -EINVAL when such a case occurs.
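The problematic computation has roughly the following shape (simplified; the exact expression and the variable names are assumptions based on the commit message and the diff below):

  /* Reject a zero interval before converting the minimum region age
   * (in microseconds) into a number of aggregation intervals. */
  if (!damon_reclaim_mon_attrs.aggr_interval)
          return -EINVAL;

  pattern.min_age_region = damon_reclaim_min_age /
          damon_reclaim_mon_attrs.aggr_interval;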
Link: https://lkml.kernel.org/r/20250827115858.1186261-3-yanquanmin1@huawei.com Fixes: f5a79d7c0c87 ("mm/damon: introduce struct damos_access_pattern") Signed-off-by: Quanmin Yan yanquanmin1@huawei.com Reviewed-by: SeongJae Park sj@kernel.org Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: ze zuo zuoze1@huawei.com Cc: stable@vger.kernel.org [6.1+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: SeongJae Park sj@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/damon/reclaim.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/mm/damon/reclaim.c +++ b/mm/damon/reclaim.c @@ -194,6 +194,11 @@ static int damon_reclaim_apply_parameter if (err) return err;
+ if (!damon_reclaim_mon_attrs.aggr_interval) { + err = -EINVAL; + goto out; + } + err = damon_set_attrs(ctx, &damon_reclaim_mon_attrs); if (err) goto out;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
[ Upstream commit 45cc102f8e6520ac07637c7e09f61dcc3772e125 ]
[Why] When the suspend sequence has been aborted after prepare() but before suspend(), the resume() callback never gets called. The PM core will call complete() when this happens. As the state has been cached in prepare(), it needs to be destroyed in complete() if it's still around.
[How] Create a helper for destroying cached state and call it both in resume() and complete() callbacks. If resume has been called the state will be destroyed and it's a no-op for complete(). If resume hasn't been called (such as an aborted suspend) then destroy the state in complete().
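The key property is that the helper is idempotent, so calling it from both resume() and complete() is safe; the core of the pattern (condensed from the diff below, not the full implementation):

  static void dm_destroy_cached_state(struct amdgpu_device *adev)
  {
          struct amdgpu_display_manager *dm = &adev->dm;

          if (!dm->cached_state)
                  return; /* already destroyed by resume(), or never cached */

          /* ... release the duplicated dc states and resume the
           * cached atomic state ... */

          dm->cached_state = NULL;
  }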
Fixes: 50e0bae34fa6 ("drm/amd/display: Add and use new dm_prepare_suspend() callback") Reviewed-by: Alex Hung alex.hung@amd.com Link: https://lore.kernel.org/r/20250602014432.3538345-4-superm1@kernel.org Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Stable-dep-of: 60f71f0db7b1 ("drm/amd/display: Drop dm_prepare_suspend() and dm_complete()") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 100 +++++++++++++--------- 1 file changed, 60 insertions(+), 40 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -3084,6 +3084,64 @@ static int dm_cache_state(struct amdgpu_ return adev->dm.cached_state ? 0 : r; }
+static void dm_destroy_cached_state(struct amdgpu_device *adev) +{ + struct amdgpu_display_manager *dm = &adev->dm; + struct drm_device *ddev = adev_to_drm(adev); + struct dm_plane_state *dm_new_plane_state; + struct drm_plane_state *new_plane_state; + struct dm_crtc_state *dm_new_crtc_state; + struct drm_crtc_state *new_crtc_state; + struct drm_plane *plane; + struct drm_crtc *crtc; + int i; + + if (!dm->cached_state) + return; + + /* Force mode set in atomic commit */ + for_each_new_crtc_in_state(dm->cached_state, crtc, new_crtc_state, i) { + new_crtc_state->active_changed = true; + dm_new_crtc_state = to_dm_crtc_state(new_crtc_state); + reset_freesync_config_for_crtc(dm_new_crtc_state); + } + + /* + * atomic_check is expected to create the dc states. We need to release + * them here, since they were duplicated as part of the suspend + * procedure. + */ + for_each_new_crtc_in_state(dm->cached_state, crtc, new_crtc_state, i) { + dm_new_crtc_state = to_dm_crtc_state(new_crtc_state); + if (dm_new_crtc_state->stream) { + WARN_ON(kref_read(&dm_new_crtc_state->stream->refcount) > 1); + dc_stream_release(dm_new_crtc_state->stream); + dm_new_crtc_state->stream = NULL; + } + dm_new_crtc_state->base.color_mgmt_changed = true; + } + + for_each_new_plane_in_state(dm->cached_state, plane, new_plane_state, i) { + dm_new_plane_state = to_dm_plane_state(new_plane_state); + if (dm_new_plane_state->dc_state) { + WARN_ON(kref_read(&dm_new_plane_state->dc_state->refcount) > 1); + dc_plane_state_release(dm_new_plane_state->dc_state); + dm_new_plane_state->dc_state = NULL; + } + } + + drm_atomic_helper_resume(ddev, dm->cached_state); + + dm->cached_state = NULL; +} + +static void dm_complete(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + + dm_destroy_cached_state(adev); +} + static int dm_prepare_suspend(struct amdgpu_ip_block *ip_block) { struct amdgpu_device *adev = ip_block->adev; @@ -3317,12 +3375,6 @@ static int dm_resume(struct amdgpu_ip_bl struct amdgpu_dm_connector *aconnector; struct drm_connector *connector; struct drm_connector_list_iter iter; - struct drm_crtc *crtc; - struct drm_crtc_state *new_crtc_state; - struct dm_crtc_state *dm_new_crtc_state; - struct drm_plane *plane; - struct drm_plane_state *new_plane_state; - struct dm_plane_state *dm_new_plane_state; struct dm_atomic_state *dm_state = to_dm_atomic_state(dm->atomic_obj.state); enum dc_connection_type new_connection_type = dc_connection_none; struct dc_state *dc_state; @@ -3481,40 +3533,7 @@ static int dm_resume(struct amdgpu_ip_bl } drm_connector_list_iter_end(&iter);
- /* Force mode set in atomic commit */ - for_each_new_crtc_in_state(dm->cached_state, crtc, new_crtc_state, i) { - new_crtc_state->active_changed = true; - dm_new_crtc_state = to_dm_crtc_state(new_crtc_state); - reset_freesync_config_for_crtc(dm_new_crtc_state); - } - - /* - * atomic_check is expected to create the dc states. We need to release - * them here, since they were duplicated as part of the suspend - * procedure. - */ - for_each_new_crtc_in_state(dm->cached_state, crtc, new_crtc_state, i) { - dm_new_crtc_state = to_dm_crtc_state(new_crtc_state); - if (dm_new_crtc_state->stream) { - WARN_ON(kref_read(&dm_new_crtc_state->stream->refcount) > 1); - dc_stream_release(dm_new_crtc_state->stream); - dm_new_crtc_state->stream = NULL; - } - dm_new_crtc_state->base.color_mgmt_changed = true; - } - - for_each_new_plane_in_state(dm->cached_state, plane, new_plane_state, i) { - dm_new_plane_state = to_dm_plane_state(new_plane_state); - if (dm_new_plane_state->dc_state) { - WARN_ON(kref_read(&dm_new_plane_state->dc_state->refcount) > 1); - dc_plane_state_release(dm_new_plane_state->dc_state); - dm_new_plane_state->dc_state = NULL; - } - } - - drm_atomic_helper_resume(ddev, dm->cached_state); - - dm->cached_state = NULL; + dm_destroy_cached_state(adev);
/* Do mst topology probing after resuming cached state*/ drm_connector_list_iter_begin(ddev, &iter); @@ -3563,6 +3582,7 @@ static const struct amd_ip_funcs amdgpu_ .prepare_suspend = dm_prepare_suspend, .suspend = dm_suspend, .resume = dm_resume, + .complete = dm_complete, .is_idle = dm_is_idle, .wait_for_idle = dm_wait_for_idle, .check_soft_reset = dm_check_soft_reset,
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Mario Limonciello (AMD)" mario.limonciello@amd.com
[ Upstream commit 60f71f0db7b12f303789ef59949e38ee5838ee8b ]
[Why] dm_prepare_suspend() was added in commit 50e0bae34fa6b ("drm/amd/display: Add and use new dm_prepare_suspend() callback") to allow display to turn off earlier in the suspend sequence.
This caused a regression that HDMI audio sometimes didn't work properly after resume unless audio was playing during suspend.
[How] Drop the dm_prepare_suspend() callback. All code in it will still run during dm_suspend(). Also drop the unnecessary dm_complete() callback. dm_complete() was used for a failed prepare and also for any case of successful resume. The code in it already runs in dm_resume().
This change will introduce more time that the display is turned on during suspend sequence. The compositor can turn it off sooner if desired.
Cc: Harry Wentland harry.wentland@amd.com Reported-by: Przemysław Kopa prz.kopa@gmail.com Closes: https://lore.kernel.org/amd-gfx/1cea0d56-7739-4ad9-bf8e-c9330faea2bb@kernel.... Reported-by: Kalvin hikaph+oss@gmail.com Closes: https://github.com/alsa-project/alsa-lib/issues/465 Closes: https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/4809 Fixes: 50e0bae34fa6b ("drm/amd/display: Add and use new dm_prepare_suspend() callback") Signed-off-by: Mario Limonciello mario.limonciello@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 2fd653b9bb5aacec5d4c421ab290905898fe85a2) Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 21 --------------------- 1 file changed, 21 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -3135,25 +3135,6 @@ static void dm_destroy_cached_state(stru dm->cached_state = NULL; }
-static void dm_complete(struct amdgpu_ip_block *ip_block) -{ - struct amdgpu_device *adev = ip_block->adev; - - dm_destroy_cached_state(adev); -} - -static int dm_prepare_suspend(struct amdgpu_ip_block *ip_block) -{ - struct amdgpu_device *adev = ip_block->adev; - - if (amdgpu_in_reset(adev)) - return 0; - - WARN_ON(adev->dm.cached_state); - - return dm_cache_state(adev); -} - static int dm_suspend(struct amdgpu_ip_block *ip_block) { struct amdgpu_device *adev = ip_block->adev; @@ -3579,10 +3560,8 @@ static const struct amd_ip_funcs amdgpu_ .early_fini = amdgpu_dm_early_fini, .hw_init = dm_hw_init, .hw_fini = dm_hw_fini, - .prepare_suspend = dm_prepare_suspend, .suspend = dm_suspend, .resume = dm_resume, - .complete = dm_complete, .is_idle = dm_is_idle, .wait_for_idle = dm_wait_for_idle, .check_soft_reset = dm_check_soft_reset,
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pratap Nirujogi pratap.nirujogi@amd.com
commit 857ccfc19f9be1269716f3d681650c1bd149a656 upstream.
Declare the isp firmware file isp_4_1_1.bin required by the isp4.1.1 device.
Suggested-by: Alexey Zagorodnikov xglooom@gmail.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Pratap Nirujogi pratap.nirujogi@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit d97b74a833eba1f4f69f67198fd98ef036c0e5f9) Cc: stable@vger.kernel.org [ Adjust context ] Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/isp_v4_1_1.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/gpu/drm/amd/amdgpu/isp_v4_1_1.c +++ b/drivers/gpu/drm/amd/amdgpu/isp_v4_1_1.c @@ -29,6 +29,8 @@ #include "amdgpu.h" #include "isp_v4_1_1.h"
+MODULE_FIRMWARE("amdgpu/isp_4_1_1.bin"); + static const unsigned int isp_4_1_1_int_srcid[MAX_ISP411_INT_SRC] = { ISP_4_1__SRCID__ISP_RINGBUFFER_WPT9, ISP_4_1__SRCID__ISP_RINGBUFFER_WPT10,
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiongfeng Wang wangxiongfeng2@huawei.com
commit e895f8e29119c8c966ea794af9e9100b10becb88 upstream.
When testing softirq based hrtimers on an ARM32 board, with high resolution mode and NOHZ inactive, softirq based hrtimers fail to expire after being moved away from an offline CPU:
CPU0                                    CPU1
                                        hrtimer_start(..., HRTIMER_MODE_SOFT);
cpu_down(CPU1)
...
hrtimers_cpu_dying()
  // Migrate timers to CPU0
  smp_call_function_single(CPU0, retrigger_next_event);
retrigger_next_event()
  if (!highres && !nohz)
    return;
As retrigger_next_event() is a NOOP when both high resolution timers and NOHZ are inactive CPU0's hrtimer_cpu_base::softirq_expires_next is not updated and the migrated softirq timers never expire unless there is a softirq based hrtimer queued on CPU0 later.
Fix this by removing the hrtimer_hres_active() and tick_nohz_active() check in retrigger_next_event(), which enforces a full update of the CPU base. As this is not a fast path the extra cost does not matter.
[ tglx: Massaged change log ]
Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") Co-developed-by: Frederic Weisbecker frederic@kernel.org Signed-off-by: Frederic Weisbecker frederic@kernel.org Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/all/20250805081025.54235-1-wangxiongfeng2@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/time/hrtimer.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-)
--- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -787,10 +787,10 @@ static void retrigger_next_event(void *a * of the next expiring timer is enough. The return from the SMP * function call will take care of the reprogramming in case the * CPU was in a NOHZ idle sleep. + * + * In periodic low resolution mode, the next softirq expiration + * must also be updated. */ - if (!hrtimer_hres_active(base) && !tick_nohz_active) - return; - raw_spin_lock(&base->lock); hrtimer_update_base(base); if (hrtimer_hres_active(base)) @@ -2295,11 +2295,6 @@ int hrtimers_cpu_dying(unsigned int dyin &new_base->clock_base[i]); }
- /* - * The migration might have changed the first expiring softirq - * timer on this CPU. Update it. - */ - __hrtimer_get_next_event(new_base, HRTIMER_ACTIVE_SOFT); /* Tell the other CPU to retrigger the next event */ smp_call_function_single(ncpu, retrigger_next_event, NULL, 0);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Reinette Chatre reinette.chatre@intel.com
commit d2e1b84c5141ff2ad465279acfc3cf943c960b78 upstream.
Running resctrl_tests on an SNC-2 system with lockdep debugging enabled triggers several warnings with following trace:
WARNING: CPU: 0 PID: 1914 at kernel/cpu.c:528 lockdep_assert_cpus_held
...
Call Trace:
 __mon_event_count
 ? __lock_acquire
 ? __pfx___mon_event_count
 mon_event_count
 ? __pfx_smp_mon_event_count
 smp_mon_event_count
 smp_call_on_cpu_callback
get_cpu_cacheinfo_level() called from __mon_event_count() requires CPU hotplug lock to be held. The hotplug lock is indeed held during this time, as confirmed by the lockdep_assert_cpus_held() within mon_event_read() that calls mon_event_count() via IPI, but the lockdep tracking is not able to follow the IPI.
Fresh CPU cache information via get_cpu_cacheinfo_level() from __mon_event_count() was added to support the fix for an issue where resctrl inappropriately maintained links to L3 cache information that would become stale when the associated CPU goes offline.
Keep the cacheinfo ID in struct rdt_mon_domain to ensure that resctrl does not maintain stale cache information while CPUs can go offline. Return to using a pointer to the L3 cache information (struct cacheinfo) in struct rmid_read, rmid_read::ci. Initialize rmid_read::ci before the IPI where it is used. CPU hotplug lock is held across rmid_read::ci initialization and use to ensure that it points to accurate cache information.
Fixes: 594902c986e2 ("x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem") Signed-off-by: Reinette Chatre reinette.chatre@intel.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/resctrl/ctrlmondata.c | 2 +- fs/resctrl/internal.h | 4 ++-- fs/resctrl/monitor.c | 6 ++---- 3 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c index d98e0d2de09f..3c39cfacb251 100644 --- a/fs/resctrl/ctrlmondata.c +++ b/fs/resctrl/ctrlmondata.c @@ -625,11 +625,11 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg) */ list_for_each_entry(d, &r->mon_domains, hdr.list) { if (d->ci_id == domid) { - rr.ci_id = d->ci_id; cpu = cpumask_any(&d->hdr.cpu_mask); ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE); if (!ci) continue; + rr.ci = ci; mon_event_read(&rr, r, NULL, rdtgrp, &ci->shared_cpu_map, evtid, false); goto checkresult; diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h index 0a1eedba2b03..9a8cf6f11151 100644 --- a/fs/resctrl/internal.h +++ b/fs/resctrl/internal.h @@ -98,7 +98,7 @@ struct mon_data { * domains in @r sharing L3 @ci.id * @evtid: Which monitor event to read. * @first: Initialize MBM counter when true. - * @ci_id: Cacheinfo id for L3. Only set when @d is NULL. Used when summing domains. + * @ci: Cacheinfo for L3. Only set when @d is NULL. Used when summing domains. * @err: Error encountered when reading counter. * @val: Returned value of event counter. If @rgrp is a parent resource group, * @val includes the sum of event counts from its child resource groups. @@ -112,7 +112,7 @@ struct rmid_read { struct rdt_mon_domain *d; enum resctrl_event_id evtid; bool first; - unsigned int ci_id; + struct cacheinfo *ci; int err; u64 val; void *arch_mon_ctx; diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c index f5637855c3ac..7326c28a7908 100644 --- a/fs/resctrl/monitor.c +++ b/fs/resctrl/monitor.c @@ -361,7 +361,6 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr) { int cpu = smp_processor_id(); struct rdt_mon_domain *d; - struct cacheinfo *ci; struct mbm_state *m; int err, ret; u64 tval = 0; @@ -389,8 +388,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr) }
/* Summing domains that share a cache, must be on a CPU for that cache. */ - ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE); - if (!ci || ci->id != rr->ci_id) + if (!cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map)) return -EINVAL;
/* @@ -402,7 +400,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr) */ ret = -EINVAL; list_for_each_entry(d, &rr->r->mon_domains, hdr.list) { - if (d->ci_id != rr->ci_id) + if (d->ci_id != rr->ci->id) continue; err = resctrl_arch_rmid_read(rr->r, d, closid, rmid, rr->evtid, &tval, rr->arch_mon_ctx);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: K Prateek Nayak kprateek.nayak@amd.com
commit cba4262a19afae21665ee242b3404bcede5a94d7 upstream.
Support for parsing the topology on AMD/Hygon processors using CPUID leaf 0xb was added in
3986a0a805e6 ("x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available").
In an effort to keep all the topology parsing bits in one place, this commit also introduced a pseudo dependency on the TOPOEXT feature to parse the CPUID leaf 0xb.
The TOPOEXT feature (CPUID 0x80000001 ECX[22]) advertises support for the Cache Properties leaf 0x8000001d and the CPUID leaf 0x8000001e EAX for "Extended APIC ID"; support for leaf 0xb, however, was introduced alongside x2APIC support, not only on AMD [1] but also historically on x86 [2].
Similar to 0xb, the support for extended CPU topology leaf 0x80000026 too does not depend on the TOPOEXT feature.
The support for these leaves is expected to be confirmed by ensuring
leaf <= {extended_}cpuid_level
and then parsing the level 0 of the respective leaf to confirm EBX[15:0] (LogProcAtThisLevel) is non-zero as stated in the definition of "CPUID_Fn0000000B_EAX_x00 [Extended Topology Enumeration] (Core::X86::Cpuid::ExtTopEnumEax0)" in Processor Programming Reference (PPR) for AMD Family 19h Model 01h Rev B1 Vol1 [3] Sec. 2.1.15.1 "CPUID Instruction Functions".
This has not been a problem on baremetal platforms since support for TOPOEXT (Fam 0x15 and later) predates the support for CPUID leaf 0xb (Fam 0x17[Zen2] and later). For AMD guests on QEMU, however, the "x2apic" feature can be enabled independent of the "topoext" feature, and QEMU then expects the topology and the initial APIC ID to be parsed using CPUID leaf 0xb (especially when the number of cores > 255), which is populated independent of the "topoext" feature flag.
Unconditionally call cpu_parse_topology_ext() on AMD and Hygon processors to first parse the topology using the XTOPOLOGY leaves (0x80000026 / 0xb) before using the TOPOEXT leaf (0x8000001e).
While at it, break down the single large comment in parse_topology_amd() to better highlight the purpose of each CPUID leaf.
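The support check described above can also be demonstrated from userspace; a self-contained sketch using GCC's <cpuid.h> (illustrative only; the kernel uses its own cpuid helpers):

  #include <cpuid.h>
  #include <stdbool.h>
  #include <stdio.h>

  static bool has_ext_topology_leaf_0b(void)
  {
          unsigned int eax, ebx, ecx, edx;

          /* Returns 0 if leaf 0xb is above the maximum supported level. */
          if (!__get_cpuid_count(0x0b, 0, &eax, &ebx, &ecx, &edx))
                  return false;

          /* EBX[15:0] (LogProcAtThisLevel) must be non-zero at level 0. */
          return (ebx & 0xffff) != 0;
  }

  int main(void)
  {
          printf("CPUID leaf 0xb usable: %s\n",
                 has_ext_topology_leaf_0b() ? "yes" : "no");
          return 0;
  }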
Fixes: 3986a0a805e6 ("x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available") Suggested-by: Naveen N Rao (AMD) naveen@kernel.org Signed-off-by: K Prateek Nayak kprateek.nayak@amd.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Cc: stable@vger.kernel.org # Only v6.9 and above; depends on x86 topology rewrite Link: https://lore.kernel.org/lkml/1529686927-7665-1-git-send-email-suravee.suthik... [1] Link: https://lore.kernel.org/lkml/20080818181435.523309000@linux-os.sc.intel.com/ [2] Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 [3] Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/topology_amd.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-)
--- a/arch/x86/kernel/cpu/topology_amd.c +++ b/arch/x86/kernel/cpu/topology_amd.c @@ -175,27 +175,30 @@ static void topoext_fixup(struct topo_sc
static void parse_topology_amd(struct topo_scan *tscan) { - bool has_topoext = false; - /* - * If the extended topology leaf 0x8000_001e is available - * try to get SMT, CORE, TILE, and DIE shifts from extended + * Try to get SMT, CORE, TILE, and DIE shifts from extended * CPUID leaf 0x8000_0026 on supported processors first. If * extended CPUID leaf 0x8000_0026 is not supported, try to - * get SMT and CORE shift from leaf 0xb first, then try to - * get the CORE shift from leaf 0x8000_0008. + * get SMT and CORE shift from leaf 0xb. If either leaf is + * available, cpu_parse_topology_ext() will return true. */ - if (cpu_feature_enabled(X86_FEATURE_TOPOEXT)) - has_topoext = cpu_parse_topology_ext(tscan); + bool has_xtopology = cpu_parse_topology_ext(tscan);
if (cpu_feature_enabled(X86_FEATURE_AMD_HTR_CORES)) tscan->c->topo.cpu_type = cpuid_ebx(0x80000026);
- if (!has_topoext && !parse_8000_0008(tscan)) + /* + * If XTOPOLOGY leaves (0x26/0xb) are not available, try to + * get the CORE shift from leaf 0x8000_0008 first. + */ + if (!has_xtopology && !parse_8000_0008(tscan)) return;
- /* Prefer leaf 0x8000001e if available */ - if (parse_8000_001e(tscan, has_topoext)) + /* + * Prefer leaf 0x8000001e if available to get the SMT shift and + * the initial APIC ID if XTOPOLOGY leaves are not available. + */ + if (parse_8000_001e(tscan, has_xtopology)) return;
/* Try the NODEID MSR */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff LaBundy jeff@labundy.com
commit c9ddc41cdd522f2db5d492eda3df8994d928be34 upstream.
If a proximity event node is defined so as to specify the wake-up properties of the touch surface, the proximity event interrupt is enabled unconditionally. This may result in unwanted interrupts.
Solve this problem by enabling the interrupt only if the event is mapped to a key or switch code.
Signed-off-by: Jeff LaBundy jeff@labundy.com Link: https://lore.kernel.org/r/aKJxxgEWpNaNcUaW@nixie71 Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/input/misc/iqs7222.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/input/misc/iqs7222.c +++ b/drivers/input/misc/iqs7222.c @@ -2427,6 +2427,9 @@ static int iqs7222_parse_chan(struct iqs if (error) return error;
+ if (!iqs7222->kp_type[chan_index][i]) + continue; + if (!dev_desc->event_offset) continue;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoffer Sandberg cs@tuxedo.de
commit 1939a9fcb80353dd8b111aa1e79c691afbde08b4 upstream.
The device occasionally wakes up from suspend with missing input on the internal keyboard. Setting the quirks appears to fix the issue for this device as well.
Signed-off-by: Christoffer Sandberg cs@tuxedo.de Signed-off-by: Werner Sembach wse@tuxedocomputers.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250826142646.13516-1-wse@tuxedocomputers.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/input/serio/i8042-acpipnpio.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
--- a/drivers/input/serio/i8042-acpipnpio.h +++ b/drivers/input/serio/i8042-acpipnpio.h @@ -1155,6 +1155,20 @@ static const struct dmi_system_id i8042_ .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) }, + { + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "XxHP4NAx"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | + SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + }, + { + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "XxKK4NAx_XxSP4NAx"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | + SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + }, /* * A lot of modern Clevo barebones have touchpad and/or keyboard issues * after suspend fixable with the forcenorestore quirk.
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Antheas Kapenekakis lkml@antheas.dev
commit 47ddf62b43eced739b56422cd55e86b9b4bb7dac upstream.
Add Flydigi Apex 5 to the list of recognized controllers.
Reported-by: Brandon Lin brandon@emergence.ltd Reported-by: Sergey Belozyorcev belozyorcev@ya.ru Signed-off-by: Antheas Kapenekakis lkml@antheas.dev Link: https://lore.kernel.org/r/20250903165114.2987905-1-lkml@antheas.dev Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/input/joystick/xpad.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/input/joystick/xpad.c +++ b/drivers/input/joystick/xpad.c @@ -422,6 +422,7 @@ static const struct xpad_device { { 0x3537, 0x1010, "GameSir G7 SE", 0, XTYPE_XBOXONE }, { 0x366c, 0x0005, "ByoWave Proteus Controller", MAP_SHARE_BUTTON, XTYPE_XBOXONE, FLAG_DELAY_INIT }, { 0x3767, 0x0101, "Fanatec Speedster 3 Forceshock Wheel", 0, XTYPE_XBOX }, + { 0x37d7, 0x2501, "Flydigi Apex 5", 0, XTYPE_XBOX360 }, { 0x413d, 0x2104, "Black Shark Green Ghost Gamepad", 0, XTYPE_XBOX360 }, { 0xffff, 0xffff, "Chinese-made Xbox Controller", 0, XTYPE_XBOX }, { 0x0000, 0x0000, "Generic X-Box pad", 0, XTYPE_UNKNOWN } @@ -578,6 +579,7 @@ static const struct usb_device_id xpad_t XPAD_XBOX360_VENDOR(0x3537), /* GameSir Controllers */ XPAD_XBOXONE_VENDOR(0x3537), /* GameSir Controllers */ XPAD_XBOXONE_VENDOR(0x366c), /* ByoWave controllers */ + XPAD_XBOX360_VENDOR(0x37d7), /* Flydigi Controllers */ XPAD_XBOX360_VENDOR(0x413d), /* Black Shark Green Ghost Controller */ { } };
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit 63a796558bc22ec699e4193d5c75534757ddf2e6 upstream.
This reverts commit 5537a4679403 ("net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups"); it breaks operation of the asix ethernet USB dongle after a system suspend-resume cycle.
Link: https://lore.kernel.org/all/b5ea8296-f981-445d-a09a-2f389d7f6fdd@samsung.com... Fixes: 5537a4679403 ("net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups") Reported-by: Marek Szyprowski m.szyprowski@samsung.com Acked-by: Jakub Kicinski kuba@kernel.org Link: https://patch.msgid.link/2945b9dbadb8ee1fee058b19554a5cb14f1763c1.1757601118... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/asix_devices.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
--- a/drivers/net/usb/asix_devices.c +++ b/drivers/net/usb/asix_devices.c @@ -607,8 +607,15 @@ static const struct net_device_ops ax887
static void ax88772_suspend(struct usbnet *dev) { + struct asix_common_private *priv = dev->driver_priv; u16 medium;
+ if (netif_running(dev->net)) { + rtnl_lock(); + phylink_suspend(priv->phylink, false); + rtnl_unlock(); + } + /* Stop MAC operation */ medium = asix_read_medium_status(dev, 1); medium &= ~AX_MEDIUM_RE; @@ -637,6 +644,12 @@ static void ax88772_resume(struct usbnet for (i = 0; i < 3; i++) if (!priv->reset(dev, 1)) break; + + if (netif_running(dev->net)) { + rtnl_lock(); + phylink_resume(priv->phylink); + rtnl_unlock(); + } }
static int asix_resume(struct usb_interface *intf)
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fabian Vogt fvogt@suse.de
commit cfd956dcb101aa3d25bac321fae923323a47c607 upstream.
After hvc_write completes, call hvc_kick also in the case the output buffer has been drained, to ensure tty_wakeup gets called.
This fixes an issue where functions waiting for the buffer to drain occasionally got stuck.
Cc: stable stable@kernel.org Closes: https://bugzilla.opensuse.org/show_bug.cgi?id=1230062 Signed-off-by: Fabian Vogt fvogt@suse.de Link: https://lore.kernel.org/r/2011735.PYKUYFuaPT@fvogt-thinkpad Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/hvc/hvc_console.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/tty/hvc/hvc_console.c +++ b/drivers/tty/hvc/hvc_console.c @@ -543,10 +543,10 @@ static ssize_t hvc_write(struct tty_stru }
/* - * Racy, but harmless, kick thread if there is still pending data. + * Kick thread to flush if there's still pending data + * or to wakeup the write queue. */ - if (hp->n_outbuf) - hvc_kick(); + hvc_kick();
return written; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
commit 535fd4c98452c87537a40610abba45daf5761ec6 upstream.
When trying to set MCR[2], XON1 is incorrectly accessed instead. And when writing to the TCR register to configure flow control levels, we are incorrectly writing to the MSR register. The default value of $00 is then used for TCR, which means that selectable trigger levels in FCR are used in place of TCR.
TCR/TLR access requires EFR[4] (enable enhanced functions) and MCR[2] to be set. EFR[4] is already set in probe().
MCR access requires LCR[7] to be zero.
Since LCR is set to $BF when trying to set MCR[2], XON1 is incorrectly accessed instead because MCR shares the same address space as XON1.
Since MCR[2] is unmodified and still zero, when writing to TCR we are in fact writing to MSR because TCR/TLR registers share the same address space as MSR/SPR.
Fix by first removing useless reconfiguration of EFR[4] (enable enhanced functions), as it is already enabled in sc16is7xx_probe() since commit 43c51bb573aa ("sc16is7xx: make sure device is in suspend once probed"). Now LCR is $00, which means that MCR access is enabled.
Also remove regcache_cache_bypass() calls since we no longer access the enhanced registers set, and TCR is already declared as volatile (in fact by declaring MSR as volatile, which shares the same address).
Finally disable access to TCR/TLR registers after modifying them by clearing MCR[2].
Note: the comment about "... and internal clock div" is wrong and can be ignored/removed, as access to the internal clock divider registers (DLL/DLH) is permitted only when LCR[7] is logic 1, not when enhanced features are enabled. And DLL/DLH access is not needed in sc16is7xx_startup().
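The resulting access sequence can be sketched as below. This is a stand-alone illustration with stub accessors, not the driver's regmap API; register offsets follow the SC16IS7xx datasheet layout, and the TCR value is an example only:

#include <stdint.h>
#include <stdio.h>

#define LCR 0x03	/* LCR[7] must be 0 for MCR access; LCR == 0xBF maps XON1 over MCR */
#define MCR 0x04	/* MCR[2] maps TCR/TLR over MSR/SPR (needs EFR[4], set in probe()) */
#define TCR 0x06	/* same address as MSR unless MCR[2] is set */

static void reg_write(uint8_t reg, uint8_t val)
{
	printf("write reg 0x%02x = 0x%02x\n", reg, val);	/* stub for bus I/O */
}

int main(void)
{
	reg_write(LCR, 0x00);	/* general register set: MCR is reachable */
	reg_write(MCR, 1 << 2);	/* MCR[2] = 1: enable TCR/TLR access */
	reg_write(TCR, 0x2f);	/* really hits TCR now, not MSR */
	reg_write(MCR, 0);	/* MCR[2] = 0: disable TCR/TLR access again */
	return 0;
}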
Fixes: dfeae619d781 ("serial: sc16is7xx")
Cc: stable@vger.kernel.org
Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com
Link: https://lore.kernel.org/r/20250731124451.1108864-1-hugo@hugovil.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/tty/serial/sc16is7xx.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -1177,17 +1177,6 @@ static int sc16is7xx_startup(struct uart
 	sc16is7xx_port_write(port, SC16IS7XX_FCR_REG,
 			     SC16IS7XX_FCR_FIFO_BIT);
 
-	/* Enable EFR */
-	sc16is7xx_port_write(port, SC16IS7XX_LCR_REG,
-			     SC16IS7XX_LCR_CONF_MODE_B);
-
-	regcache_cache_bypass(one->regmap, true);
-
-	/* Enable write access to enhanced features and internal clock div */
-	sc16is7xx_port_update(port, SC16IS7XX_EFR_REG,
-			      SC16IS7XX_EFR_ENABLE_BIT,
-			      SC16IS7XX_EFR_ENABLE_BIT);
-
 	/* Enable TCR/TLR */
 	sc16is7xx_port_update(port, SC16IS7XX_MCR_REG,
 			      SC16IS7XX_MCR_TCRTLR_BIT,
@@ -1199,7 +1188,8 @@ static int sc16is7xx_startup(struct uart
 			     SC16IS7XX_TCR_RX_RESUME(24) |
 			     SC16IS7XX_TCR_RX_HALT(48));
 
-	regcache_cache_bypass(one->regmap, false);
+	/* Disable TCR/TLR access */
+	sc16is7xx_port_update(port, SC16IS7XX_MCR_REG, SC16IS7XX_MCR_TCRTLR_BIT, 0);
 
 	/* Now, initialize the UART */
 	sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, SC16IS7XX_LCR_WORD_LEN_8);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit ee047e1d85d73496541c54bd4f432c9464e13e65 upstream.
Lists should have fixed constraints, because a binding must be specific with respect to the hardware; thus add the missing constraint on the number of clocks.
Cc: stable stable@kernel.org
Fixes: 88a499cd70d4 ("dt-bindings: Add support for the Broadcom UART driver")
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
Link: https://lore.kernel.org/r/20250812121630.67072-2-krzysztof.kozlowski@linaro....
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/devicetree/bindings/serial/brcm,bcm7271-uart.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/Documentation/devicetree/bindings/serial/brcm,bcm7271-uart.yaml
+++ b/Documentation/devicetree/bindings/serial/brcm,bcm7271-uart.yaml
@@ -41,7 +41,7 @@ properties:
       - const: dma_intr2
 
   clocks:
-    minItems: 1
+    maxItems: 1
 
   clock-names:
     const: sw_baud
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fabio Porcedda fabio.porcedda@gmail.com
commit cba70aff623b104085ab5613fedd21f6ea19095a upstream.
Add the following Telit Cinterion FN990A w/audio compositions:
0x1077: tty (diag) + adb + rmnet + audio + tty (AT/NMEA) + tty (AT) + tty (AT) + tty (AT)
T: Bus=01 Lev=01 Prnt=01 Port=09 Cnt=01 Dev#= 8 Spd=480 MxCh= 0
D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1077 Rev=05.04
S: Manufacturer=Telit Wireless Solutions
S: Product=FN990
S: SerialNumber=67e04c35
C: #Ifs=10 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E: Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
E: Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 3 Alt= 0 #EPs= 0 Cls=01(audio) Sub=01 Prot=20 Driver=snd-usb-audio
I: If#= 4 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E: Ad=03(O) Atr=0d(Isoc) MxPS= 68 Ivl=1ms
I: If#= 5 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E: Ad=84(I) Atr=0d(Isoc) MxPS= 68 Ivl=1ms
I: If#= 6 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 7 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=88(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8a(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 9 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8b(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8c(I) Atr=03(Int.) MxPS= 10 Ivl=32ms

0x1078: tty (diag) + adb + MBIM + audio + tty (AT/NMEA) + tty (AT) + tty (AT) + tty (AT)
T: Bus=01 Lev=01 Prnt=01 Port=09 Cnt=01 Dev#= 21 Spd=480 MxCh= 0
D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1078 Rev=05.04
S: Manufacturer=Telit Wireless Solutions
S: Product=FN990
S: SerialNumber=67e04c35
C: #Ifs=11 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#=10 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8b(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8c(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 2 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim
E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=32ms
I: If#= 3 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
E: Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 4 Alt= 0 #EPs= 0 Cls=01(audio) Sub=01 Prot=20 Driver=snd-usb-audio
I: If#= 5 Alt= 0 #EPs= 0 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
I: If#= 6 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E: Ad=84(I) Atr=0d(Isoc) MxPS= 68 Ivl=1ms
I: If#= 7 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=88(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 9 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8a(I) Atr=03(Int.) MxPS= 10 Ivl=32ms

0x1079: RNDIS + tty (diag) + adb + audio + tty (AT/NMEA) + tty (AT) + tty (AT) + tty (AT)
T: Bus=01 Lev=01 Prnt=01 Port=09 Cnt=01 Dev#= 23 Spd=480 MxCh= 0
D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1079 Rev=05.04
S: Manufacturer=Telit Wireless Solutions
S: Product=FN990
S: SerialNumber=67e04c35
C: #Ifs=11 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=ef(misc ) Sub=04 Prot=01 Driver=rndis_host
E: Ad=81(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
I: If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host
E: Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#=10 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8b(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8c(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 4 Alt= 0 #EPs= 0 Cls=01(audio) Sub=01 Prot=20 Driver=snd-usb-audio
I: If#= 5 Alt= 0 #EPs= 0 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
I: If#= 6 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E: Ad=84(I) Atr=0d(Isoc) MxPS= 68 Ivl=1ms
I: If#= 7 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=88(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 9 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8a(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
Cc: stable@vger.kernel.org
Signed-off-by: Fabio Porcedda fabio.porcedda@gmail.com
Signed-off-by: Johan Hovold johan@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/serial/option.c | 6 ++++++
 1 file changed, 6 insertions(+)
--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -1369,6 +1369,12 @@ static const struct usb_device_id option
 	  .driver_info = NCTRL(0) | RSVD(1) },
 	{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1075, 0xff),	/* Telit FN990A (PCIe) */
 	  .driver_info = RSVD(0) },
+	{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1077, 0xff),	/* Telit FN990A (rmnet + audio) */
+	  .driver_info = NCTRL(0) | RSVD(1) | RSVD(2) },
+	{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1078, 0xff),	/* Telit FN990A (MBIM + audio) */
+	  .driver_info = NCTRL(0) | RSVD(1) },
+	{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1079, 0xff),	/* Telit FN990A (RNDIS + audio) */
+	  .driver_info = NCTRL(2) | RSVD(3) },
 	{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1080, 0xff),	/* Telit FE990A (rmnet) */
 	  .driver_info = NCTRL(0) | RSVD(1) | RSVD(2) },
 	{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1081, 0xff),	/* Telit FE990A (MBIM) */
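To decode the .driver_info masks above: RSVD() tells the option driver to leave an interface alone (adb, rmnet, RNDIS data, ...) and NCTRL() marks a port without modem-control lines (diag). A small stand-alone sketch; the macro bodies mirror drivers/usb/serial/option.c at the time of writing, so verify them against your tree:

#include <stdio.h>

#define BIT(n)		(1u << (n))
#define RSVD(ifnum)	((BIT(ifnum) & 0xff) << 0)	/* do not bind this interface */
#define NCTRL(ifnum)	((BIT(ifnum) & 0xff) << 8)	/* no modem-control lines */

int main(void)
{
	/* 0x1077: NCTRL(0) for the diag tty, RSVD(1)/RSVD(2) for adb + rmnet */
	unsigned long info = NCTRL(0) | RSVD(1) | RSVD(2);

	for (int i = 0; i < 8; i++)
		printf("if%d:%s%s\n", i,
		       info & RSVD(i) ? " reserved" : "",
		       info & NCTRL(i) ? " no-mctrl" : "");
	return 0;
}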
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fabio Porcedda fabio.porcedda@gmail.com
commit a5a261bea9bf8444300d1067b4a73bedee5b5227 upstream.
Add the following Telit Cinterion LE910C4-WWX new compositions:
0x1034: tty (AT) + tty (AT) + rmnet
T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 8 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1034 Rev=00.00
S: Manufacturer=Telit
S: Product=LE910C4-WWX
S: SerialNumber=93f617e7
C: #Ifs= 3 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

0x1036: tty (AT) + tty (AT)
T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 10 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1036 Rev=00.00
S: Manufacturer=Telit
S: Product=LE910C4-WWX
S: SerialNumber=93f617e7
C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

0x1037: tty (diag) + tty (Telit custom) + tty (AT) + tty (AT) + rmnet
T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 15 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1037 Rev=00.00
S: Manufacturer=Telit
S: Product=LE910C4-WWX
S: SerialNumber=93f617e7
C: #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

0x1038: tty (Telit custom) + tty (AT) + tty (AT) + rmnet
T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 9 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1038 Rev=00.00
S: Manufacturer=Telit
S: Product=LE910C4-WWX
S: SerialNumber=93f617e7
C: #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=84(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=86(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

0x103b: tty (diag) + tty (Telit custom) + tty (AT) + tty (AT)
T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 10 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=103b Rev=00.00
S: Manufacturer=Telit
S: Product=LE910C4-WWX
S: SerialNumber=93f617e7
C: #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

0x103c: tty (Telit custom) + tty (AT) + tty (AT)
T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 11 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=103c Rev=00.00
S: Manufacturer=Telit
S: Product=LE910C4-WWX
S: SerialNumber=93f617e7
C: #Ifs= 3 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=84(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
Cc: stable@vger.kernel.org
Signed-off-by: Fabio Porcedda fabio.porcedda@gmail.com
Signed-off-by: Johan Hovold johan@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/serial/option.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -1322,7 +1322,18 @@ static const struct usb_device_id option
 	  .driver_info = NCTRL(0) | RSVD(3) },
 	{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1033, 0xff),	/* Telit LE910C1-EUX (ECM) */
 	  .driver_info = NCTRL(0) },
+	{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1034, 0xff),	/* Telit LE910C4-WWX (rmnet) */
+	  .driver_info = RSVD(2) },
 	{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1035, 0xff) },	/* Telit LE910C4-WWX (ECM) */
+	{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1036, 0xff) },	/* Telit LE910C4-WWX */
+	{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1037, 0xff),	/* Telit LE910C4-WWX (rmnet) */
+	  .driver_info = NCTRL(0) | NCTRL(1) | RSVD(4) },
+	{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1038, 0xff),	/* Telit LE910C4-WWX (rmnet) */
+	  .driver_info = NCTRL(0) | RSVD(3) },
+	{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x103b, 0xff),	/* Telit LE910C4-WWX */
+	  .driver_info = NCTRL(0) | NCTRL(1) },
+	{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x103c, 0xff),	/* Telit LE910C4-WWX */
+	  .driver_info = NCTRL(0) },
 	{ USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE922_USBCFG0),
 	  .driver_info = RSVD(0) | RSVD(1) | NCTRL(2) | RSVD(3) },
 	{ USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE922_USBCFG1),
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 1eae113dd5ff5192cfd3e11b6ab7b96193b42c01 ]
If a ma35_nand_chip_init() call fails, then a reference to 'nand_np' still needs to be released.
Use for_each_child_of_node_scoped() to fix the issue.
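The scoped iterator ties the reference count to the scope of the loop variable, so every exit path drops it. A userspace sketch of the same idea built on the GCC/Clang cleanup attribute; node_get() and node_put_cleanup() are illustrative stand-ins for of_node_get()/of_node_put(), not kernel APIs:

#include <stdio.h>

struct node { int refcount; };

static struct node *node_get(struct node *n) { n->refcount++; return n; }

static void node_put_cleanup(struct node **n)
{
	/* runs automatically when the variable goes out of scope */
	if (*n)
		(*n)->refcount--;
}

#define scoped_node __attribute__((cleanup(node_put_cleanup))) struct node *

static int chip_init(struct node *n) { (void)n; return -1; /* simulate failure */ }

int main(void)
{
	struct node child = { 0 };
	{
		scoped_node n = node_get(&child);

		if (chip_init(n))
			return 1;	/* early exit: the reference is still dropped */
	}
	printf("refcount=%d\n", child.refcount);	/* not reached on failure */
	return 0;
}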
Fixes: 5abb5d414d55 ("mtd: rawnand: nuvoton: add new driver for the Nuvoton MA35 SoC")
Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/mtd/nand/raw/nuvoton-ma35d1-nand-controller.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/nand/raw/nuvoton-ma35d1-nand-controller.c b/drivers/mtd/nand/raw/nuvoton-ma35d1-nand-controller.c
index c23b537948d5e..1a285cd8fad62 100644
--- a/drivers/mtd/nand/raw/nuvoton-ma35d1-nand-controller.c
+++ b/drivers/mtd/nand/raw/nuvoton-ma35d1-nand-controller.c
@@ -935,10 +935,10 @@ static void ma35_chips_cleanup(struct ma35_nand_info *nand)
 
 static int ma35_nand_chips_init(struct device *dev, struct ma35_nand_info *nand)
 {
-	struct device_node *np = dev->of_node, *nand_np;
+	struct device_node *np = dev->of_node;
 	int ret;
 
-	for_each_child_of_node(np, nand_np) {
+	for_each_child_of_node_scoped(np, nand_np) {
 		ret = ma35_nand_chip_init(dev, nand, nand_np);
 		if (ret) {
 			ma35_chips_cleanup(nand);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chia-I Wu olvaffe@gmail.com
[ Upstream commit a00f2015acdbd8a4b3d2382eaeebe11db1925fad ]
A panthor group can have at most MAX_CS_PER_CSG panthor queues.
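The fix follows the usual pattern of clamping a user-controlled count before it sizes allocations and loops; a minimal stand-alone sketch (names are illustrative, not the panthor UAPI):

#include <errno.h>
#include <stdlib.h>

#define MAX_QUEUES 32	/* stand-in for MAX_CS_PER_CSG */

struct queue_create { int priority; };

static int group_create(unsigned int count)
{
	struct queue_create *args;

	if (!count || count > MAX_QUEUES)
		return -EINVAL;	/* reject zero and oversized requests */

	args = calloc(count, sizeof(*args));
	if (!args)
		return -ENOMEM;

	/* ... set up the queues ... */
	free(args);
	return 0;
}

int main(void)
{
	return group_create(33) == -EINVAL ? 0 : 1;
}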
Fixes: 4bdca11507928 ("drm/panthor: Add the driver frontend block")
Signed-off-by: Chia-I Wu olvaffe@gmail.com
Reviewed-by: Boris Brezillon boris.brezillon@collabora.com # v1
Reviewed-by: Steven Price steven.price@arm.com
Signed-off-by: Steven Price steven.price@arm.com
Link: https://lore.kernel.org/r/20250903192133.288477-1-olvaffe@gmail.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/panthor/panthor_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 6200cad22563a..0f4ab9e5ef95c 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1093,7 +1093,7 @@ static int panthor_ioctl_group_create(struct drm_device *ddev, void *data,
 	struct drm_panthor_queue_create *queue_args;
 	int ret;
 
-	if (!args->queues.count)
+	if (!args->queues.count || args->queues.count > MAX_CS_PER_CSG)
 		return -EINVAL;
 
 	ret = PANTHOR_UOBJ_GET_ARRAY(queue_args, &args->queues);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Wahren wahrenst@gmx.net
[ Upstream commit 03e79de4608bdd48ad6eec272e196124cefaf798 ]
The function of_phy_find_device() may return NULL, so check the pointer before dereferencing phy_dev.
Fixes: 64a632da538a ("net: fec: Fix phy_device lookup for phy_reset_after_clk_enable()")
Signed-off-by: Stefan Wahren wahrenst@gmx.net
Cc: Christoph Niedermaier cniedermaier@dh-electronics.com
Cc: Richard Leitner richard.leitner@skidata.com
Reviewed-by: Simon Horman horms@kernel.org
Reviewed-by: Wei Fang wei.fang@nxp.com
Link: https://patch.msgid.link/20250904091334.53965-1-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/freescale/fec_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 651b73163b6ee..5f15f42070c53 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2358,7 +2358,8 @@ static void fec_enet_phy_reset_after_clk_enable(struct net_device *ndev)
 		 */
 		phy_dev = of_phy_find_device(fep->phy_node);
 		phy_reset_after_clk_enable(phy_dev);
-		put_device(&phy_dev->mdio.dev);
+		if (phy_dev)
+			put_device(&phy_dev->mdio.dev);
 	}
 }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 0ba5b2f2c381dbec9ed9e4ab3ae5d3e667de0dc3 ]
Currently phylink_resolve() protects itself against concurrent phylink_bringup_phy() or phylink_disconnect_phy() calls which modify pl->phydev by relying on pl->state_mutex.
The problem is that in phylink_resolve(), pl->state_mutex is in a lock inversion state with pl->phydev->lock. So pl->phydev->lock needs to be acquired prior to pl->state_mutex. But that requires dereferencing pl->phydev in the first place, and without pl->state_mutex, that is racy.
Hence the reason for the extra lock. Currently it is redundant, but it will serve a functional purpose once mutex_lock(&phy->lock) is moved outside of the mutex_lock(&pl->state_mutex) section.
Another alternative considered would have been to let phylink_resolve() acquire the rtnl_mutex, which is also held when phylink_bringup_phy() and phylink_disconnect_phy() are called. But since phylink_disconnect_phy() runs under rtnl_lock(), it would deadlock with phylink_resolve() when calling flush_work(&pl->resolve). Additionally, it would have been undesirable because it would have unnecessarily blocked many other call paths as well in the entire kernel, so the smaller-scoped lock was preferred.
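A stand-alone userspace sketch of the ordering this patch (together with the follow-up) establishes: an outer mutex pins the pointer so the pointed-to object's lock can be taken before the state lock. Names are illustrative, not the phylink code:

#include <pthread.h>
#include <stddef.h>

struct phy { pthread_mutex_t lock; };

static struct phy a_phy = { PTHREAD_MUTEX_INITIALIZER };
static pthread_mutex_t ptr_lock = PTHREAD_MUTEX_INITIALIZER;	/* phydev_mutex */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;	/* state_mutex */
static struct phy *current_phy = &a_phy;			/* pl->phydev */

static void resolve(void)
{
	struct phy *phy;

	pthread_mutex_lock(&ptr_lock);	/* pin the pointer... */
	phy = current_phy;		/* ...so this read cannot race */
	if (phy)
		pthread_mutex_lock(&phy->lock);	/* object lock first */
	pthread_mutex_lock(&state_lock);	/* state lock last */

	/* ... resolve link state ... */

	pthread_mutex_unlock(&state_lock);
	if (phy)
		pthread_mutex_unlock(&phy->lock);
	pthread_mutex_unlock(&ptr_lock);
}

int main(void)
{
	resolve();
	return 0;
}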
Link: https://lore.kernel.org/netdev/aLb6puGVzR29GpPx@shell.armlinux.org.uk/
Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com
Reviewed-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk
Link: https://patch.msgid.link/20250904125238.193990-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Stable-dep-of: e2a10daba849 ("net: phy: transfer phy_config_inband() locking responsibility to phylink")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/phy/phylink.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 0faa3d97e06b9..08cbb31e6dbc1 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -67,6 +67,8 @@ struct phylink {
 	struct timer_list link_poll;
 
 	struct mutex state_mutex;
+	/* Serialize updates to pl->phydev with phylink_resolve() */
+	struct mutex phydev_mutex;
 	struct phylink_link_state phy_state;
 	unsigned int phy_ib_mode;
 	struct work_struct resolve;
@@ -1568,8 +1570,11 @@ static void phylink_resolve(struct work_struct *w)
 	struct phylink_link_state link_state;
 	bool mac_config = false;
 	bool retrigger = false;
+	struct phy_device *phy;
 	bool cur_link_state;
 
+	mutex_lock(&pl->phydev_mutex);
+	phy = pl->phydev;
 	mutex_lock(&pl->state_mutex);
 	cur_link_state = phylink_link_is_up(pl);
 
@@ -1603,11 +1608,11 @@ static void phylink_resolve(struct work_struct *w)
 		/* If we have a phy, the "up" state is the union of both the
 		 * PHY and the MAC
 		 */
-		if (pl->phydev)
+		if (phy)
 			link_state.link &= pl->phy_state.link;
 
 		/* Only update if the PHY link is up */
-		if (pl->phydev && pl->phy_state.link) {
+		if (phy && pl->phy_state.link) {
 			/* If the interface has changed, force a link down
 			 * event if the link isn't already down, and re-resolve.
 			 */
@@ -1671,6 +1676,7 @@ static void phylink_resolve(struct work_struct *w)
 			queue_work(system_power_efficient_wq, &pl->resolve);
 	}
 	mutex_unlock(&pl->state_mutex);
+	mutex_unlock(&pl->phydev_mutex);
 }
 
 static void phylink_run_resolve(struct phylink *pl)
@@ -1806,6 +1812,7 @@ struct phylink *phylink_create(struct phylink_config *config,
 	if (!pl)
 		return ERR_PTR(-ENOMEM);
 
+	mutex_init(&pl->phydev_mutex);
 	mutex_init(&pl->state_mutex);
 	INIT_WORK(&pl->resolve, phylink_resolve);
 
@@ -2066,6 +2073,7 @@ static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy,
 		    dev_name(&phy->mdio.dev), phy->drv->name, irq_str);
 	kfree(irq_str);
 
+	mutex_lock(&pl->phydev_mutex);
 	mutex_lock(&phy->lock);
 	mutex_lock(&pl->state_mutex);
 	pl->phydev = phy;
@@ -2111,6 +2119,7 @@ static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy,
 
 	mutex_unlock(&pl->state_mutex);
 	mutex_unlock(&phy->lock);
+	mutex_unlock(&pl->phydev_mutex);
 
 	phylink_dbg(pl,
 		    "phy: %s setting supported %*pb advertising %*pb\n",
@@ -2289,6 +2298,7 @@ void phylink_disconnect_phy(struct phylink *pl)
 
 	ASSERT_RTNL();
 
+	mutex_lock(&pl->phydev_mutex);
 	phy = pl->phydev;
 	if (phy) {
 		mutex_lock(&phy->lock);
@@ -2298,8 +2308,11 @@ void phylink_disconnect_phy(struct phylink *pl)
 		pl->mac_tx_clk_stop = false;
 		mutex_unlock(&pl->state_mutex);
 		mutex_unlock(&phy->lock);
-		flush_work(&pl->resolve);
+	}
+	mutex_unlock(&pl->phydev_mutex);
 
+	if (phy) {
+		flush_work(&pl->resolve);
 		phy_disconnect(phy);
 	}
 }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit e2a10daba84968f6b5777d150985fd7d6abc9c84 ]
Problem description
===================
Lockdep reports a possible circular locking dependency (AB/BA) between &pl->state_mutex and &phy->lock, as follows.
phylink_resolve()			// acquires &pl->state_mutex
-> phylink_major_config()
   -> phy_config_inband()		// acquires &pl->phydev->lock
whereas all the other call sites acquire &pl->state_mutex and &pl->phydev->lock in the reverse order: everywhere else, &pl->phydev->lock is acquired at the top level, and &pl->state_mutex at the lower level. A clear example is phylink_bringup_phy().
The outlier is the newly introduced phy_config_inband() and the existing lock order is the correct one. To understand why it cannot be the other way around, it is sufficient to consider phylink_phy_change(), phylink's callback from the PHY device's phy->phy_link_change() virtual method, invoked by the PHY state machine.
phy_link_up() and phy_link_down(), the (indirect) callers of phylink_phy_change(), are called with &phydev->lock acquired. Then phylink_phy_change() acquires its own &pl->state_mutex, to serialize changes made to its pl->phy_state and pl->link_config. So all other instances of &pl->state_mutex and &phydev->lock must be consistent with this order.
Problem impact
==============
I think the kernel runs a serious deadlock risk if an existing phylink_resolve() thread, which results in a phy_config_inband() call, is concurrent with a phy_link_up() or phy_link_down() call, which will deadlock on &pl->state_mutex in phylink_phy_change(). Practically speaking, the impact may be limited by the slow speed of the medium auto-negotiation protocol, which makes it unlikely for the current state to still be unresolved when a new one is detected, but I think the problem is there. Nonetheless, the problem was discovered using lockdep.
Proposed solution
=================
Practically speaking, the phy_config_inband() requirement of having phydev->lock acquired must transfer to the caller (phylink is the only caller). There, it must bubble up until immediately before &pl->state_mutex is acquired, for the cases where that takes place.
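In miniature, the transfer looks like this: the callee stops locking and instead asserts that its caller already holds the lock. A stand-alone userspace sketch with assert() standing in for lockdep_assert_held() (illustrative only):

#include <assert.h>
#include <pthread.h>

static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;
static int dev_lock_held;	/* toy bookkeeping; lockdep tracks this for real */

static int config_inband(int modes)
{
	assert(dev_lock_held);	/* caller must hold dev_lock */
	(void)modes;
	/* ... program the hardware ... */
	return 0;
}

int main(void)
{
	pthread_mutex_lock(&dev_lock);
	dev_lock_held = 1;
	config_inband(1);	/* locking responsibility is the caller's */
	dev_lock_held = 0;
	pthread_mutex_unlock(&dev_lock);
	return 0;
}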
Solution details, considerations, notes
=======================================
This is the phy_config_inband() call graph:
sfp_upstream_ops :: connect_phy()
 |
 v
phylink_sfp_connect_phy()
 |
 v
phylink_sfp_config_phy()
 |
 |     sfp_upstream_ops :: module_insert()
 |     |
 |     v
 |     phylink_sfp_module_insert()
 |     |
 |     |     sfp_upstream_ops :: module_start()
 |     |     |
 |     |     v
 |     |     phylink_sfp_module_start()
 |     |     |
 |     v     v
 |     phylink_sfp_config_optical()   phylink_start()
 |     |                              |   phylink_resume()
 v     v                              |   |
phylink_sfp_set_config()              |   |
 |                                    v   v
 v                    phylink_mac_initial_config()
 |                    |     phylink_resolve()
 |                    |     |     phylink_ethtool_ksettings_set()
 v                    v     v     |
phylink_major_config() <----------+
 |
 v
phy_config_inband()
phylink_major_config() caller #1, phylink_mac_initial_config(), does not acquire &pl->state_mutex nor do its callers. It must acquire &pl->phydev->lock prior to calling phylink_major_config().
phylink_major_config() caller #2, phylink_resolve() acquires &pl->state_mutex, thus also needs to acquire &pl->phydev->lock.
phylink_major_config() caller #3, phylink_ethtool_ksettings_set(), is completely uninteresting, because it only calls phylink_major_config() if pl->phydev is NULL (otherwise it calls phy_ethtool_ksettings_set()). We need to change nothing there.
Other solutions
===============
The lock inversion between &pl->state_mutex and &pl->phydev->lock has occurred at least once before, as seen in commit c718af2d00a3 ("net: phylink: fix ethtool -A with attached PHYs"). The solution there was to simply not call phy_set_asym_pause() under the &pl->state_mutex. That cannot be extended to our case though, where the phy_config_inband() call is much deeper inside the &pl->state_mutex section.
Fixes: 5fd0f1a02e75 ("net: phylink: add negotiation of in-band capabilities")
Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com
Reviewed-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk
Link: https://patch.msgid.link/20250904125238.193990-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/phy/phy.c     | 12 ++++--------
 drivers/net/phy/phylink.c |  9 +++++++++
 2 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 13df28445f020..c02da57a4da5e 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1065,23 +1065,19 @@ EXPORT_SYMBOL_GPL(phy_inband_caps);
  */
 int phy_config_inband(struct phy_device *phydev, unsigned int modes)
 {
-	int err;
+	lockdep_assert_held(&phydev->lock);
 
 	if (!!(modes & LINK_INBAND_DISABLE) +
 	    !!(modes & LINK_INBAND_ENABLE) +
 	    !!(modes & LINK_INBAND_BYPASS) != 1)
 		return -EINVAL;
 
-	mutex_lock(&phydev->lock);
 	if (!phydev->drv)
-		err = -EIO;
+		return -EIO;
 	else if (!phydev->drv->config_inband)
-		err = -EOPNOTSUPP;
-	else
-		err = phydev->drv->config_inband(phydev, modes);
-	mutex_unlock(&phydev->lock);
+		return -EOPNOTSUPP;
 
-	return err;
+	return phydev->drv->config_inband(phydev, modes);
 }
 EXPORT_SYMBOL(phy_config_inband);
 
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 08cbb31e6dbc1..229a503d601ee 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1411,6 +1411,7 @@ static void phylink_get_fixed_state(struct phylink *pl,
 static void phylink_mac_initial_config(struct phylink *pl, bool force_restart)
 {
 	struct phylink_link_state link_state;
+	struct phy_device *phy = pl->phydev;
 
 	switch (pl->req_link_an_mode) {
 	case MLO_AN_PHY:
@@ -1434,7 +1435,11 @@ static void phylink_mac_initial_config(struct phylink *pl, bool force_restart)
 	link_state.link = false;
 
 	phylink_apply_manual_flow(pl, &link_state);
+	if (phy)
+		mutex_lock(&phy->lock);
 	phylink_major_config(pl, force_restart, &link_state);
+	if (phy)
+		mutex_unlock(&phy->lock);
 }
 
 static const char *phylink_pause_to_str(int pause)
@@ -1575,6 +1580,8 @@ static void phylink_resolve(struct work_struct *w)
 
 	mutex_lock(&pl->phydev_mutex);
 	phy = pl->phydev;
+	if (phy)
+		mutex_lock(&phy->lock);
 	mutex_lock(&pl->state_mutex);
 	cur_link_state = phylink_link_is_up(pl);
 
@@ -1676,6 +1683,8 @@ static void phylink_resolve(struct work_struct *w)
 			queue_work(system_power_efficient_wq, &pl->resolve);
 	}
 	mutex_unlock(&pl->state_mutex);
+	if (phy)
+		mutex_unlock(&phy->lock);
 	mutex_unlock(&pl->phydev_mutex);
 }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miaoqing Pan miaoqing.pan@oss.qualcomm.com
[ Upstream commit 4b66d18918f8e4d85e51974a9e3ce9abad5c7c3d ]
Commit afbab6e4e88d ("wifi: ath12k: modify ath12k_mac_op_bss_info_changed() for MLO") replaced the bss_info_changed() callback with vif_cfg_changed() and link_info_changed() to support Multi-Link Operation (MLO). As a result, the station power save configuration is no longer correctly applied in ath12k_mac_bss_info_changed().
Move the handling of 'BSS_CHANGED_PS' into ath12k_mac_op_vif_cfg_changed() to align with the updated callback structure introduced for MLO, ensuring proper power-save behavior for station interfaces.
Tested-on: WCN7850 hw2.0 PCI WLAN.IOE_HMT.1.1-00011-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1
Fixes: afbab6e4e88d ("wifi: ath12k: modify ath12k_mac_op_bss_info_changed() for MLO")
Signed-off-by: Miaoqing Pan miaoqing.pan@oss.qualcomm.com
Reviewed-by: Baochen Qiang baochen.qiang@oss.qualcomm.com
Link: https://patch.msgid.link/20250908015025.1301398-1-miaoqing.pan@oss.qualcomm....
Signed-off-by: Jeff Johnson jeff.johnson@oss.qualcomm.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/wireless/ath/ath12k/mac.c | 122 ++++++++++++++------------
 1 file changed, 67 insertions(+), 55 deletions(-)
diff --git a/drivers/net/wireless/ath/ath12k/mac.c b/drivers/net/wireless/ath/ath12k/mac.c
index a885dd168a372..87944e4467050 100644
--- a/drivers/net/wireless/ath/ath12k/mac.c
+++ b/drivers/net/wireless/ath/ath12k/mac.c
@@ -3650,12 +3650,68 @@ static int ath12k_mac_fils_discovery(struct ath12k_link_vif *arvif,
 	return ret;
 }
 
+static void ath12k_mac_vif_setup_ps(struct ath12k_link_vif *arvif)
+{
+	struct ath12k *ar = arvif->ar;
+	struct ieee80211_vif *vif = arvif->ahvif->vif;
+	struct ieee80211_conf *conf = &ath12k_ar_to_hw(ar)->conf;
+	enum wmi_sta_powersave_param param;
+	struct ieee80211_bss_conf *info;
+	enum wmi_sta_ps_mode psmode;
+	int ret;
+	int timeout;
+	bool enable_ps;
+
+	lockdep_assert_wiphy(ath12k_ar_to_hw(ar)->wiphy);
+
+	if (vif->type != NL80211_IFTYPE_STATION)
+		return;
+
+	enable_ps = arvif->ahvif->ps;
+	if (enable_ps) {
+		psmode = WMI_STA_PS_MODE_ENABLED;
+		param = WMI_STA_PS_PARAM_INACTIVITY_TIME;
+
+		timeout = conf->dynamic_ps_timeout;
+		if (timeout == 0) {
+			info = ath12k_mac_get_link_bss_conf(arvif);
+			if (!info) {
+				ath12k_warn(ar->ab, "unable to access bss link conf in setup ps for vif %pM link %u\n",
+					    vif->addr, arvif->link_id);
+				return;
+			}
+
+			/* firmware doesn't like 0 */
+			timeout = ieee80211_tu_to_usec(info->beacon_int) / 1000;
+		}
+
+		ret = ath12k_wmi_set_sta_ps_param(ar, arvif->vdev_id, param,
+						  timeout);
+		if (ret) {
+			ath12k_warn(ar->ab, "failed to set inactivity time for vdev %d: %i\n",
+				    arvif->vdev_id, ret);
+			return;
+		}
+	} else {
+		psmode = WMI_STA_PS_MODE_DISABLED;
+	}
+
+	ath12k_dbg(ar->ab, ATH12K_DBG_MAC, "mac vdev %d psmode %s\n",
+		   arvif->vdev_id, psmode ? "enable" : "disable");
+
+	ret = ath12k_wmi_pdev_set_ps_mode(ar, arvif->vdev_id, psmode);
+	if (ret)
+		ath12k_warn(ar->ab, "failed to set sta power save mode %d for vdev %d: %d\n",
+			    psmode, arvif->vdev_id, ret);
+}
+
 static void ath12k_mac_op_vif_cfg_changed(struct ieee80211_hw *hw,
 					  struct ieee80211_vif *vif,
 					  u64 changed)
 {
 	struct ath12k_vif *ahvif = ath12k_vif_to_ahvif(vif);
 	unsigned long links = ahvif->links_map;
+	struct ieee80211_vif_cfg *vif_cfg;
 	struct ieee80211_bss_conf *info;
 	struct ath12k_link_vif *arvif;
 	struct ieee80211_sta *sta;
@@ -3719,61 +3775,24 @@ static void ath12k_mac_op_vif_cfg_changed(struct ieee80211_hw *hw,
 			}
 		}
 	}
-}
-
-static void ath12k_mac_vif_setup_ps(struct ath12k_link_vif *arvif)
-{
-	struct ath12k *ar = arvif->ar;
-	struct ieee80211_vif *vif = arvif->ahvif->vif;
-	struct ieee80211_conf *conf = &ath12k_ar_to_hw(ar)->conf;
-	enum wmi_sta_powersave_param param;
-	struct ieee80211_bss_conf *info;
-	enum wmi_sta_ps_mode psmode;
-	int ret;
-	int timeout;
-	bool enable_ps;
 
-	lockdep_assert_wiphy(ath12k_ar_to_hw(ar)->wiphy);
+	if (changed & BSS_CHANGED_PS) {
+		links = ahvif->links_map;
+		vif_cfg = &vif->cfg;
 
-	if (vif->type != NL80211_IFTYPE_STATION)
-		return;
+		for_each_set_bit(link_id, &links, IEEE80211_MLD_MAX_NUM_LINKS) {
+			arvif = wiphy_dereference(hw->wiphy, ahvif->link[link_id]);
+			if (!arvif || !arvif->ar)
+				continue;
 
-	enable_ps = arvif->ahvif->ps;
-	if (enable_ps) {
-		psmode = WMI_STA_PS_MODE_ENABLED;
-		param = WMI_STA_PS_PARAM_INACTIVITY_TIME;
+			ar = arvif->ar;
 
-		timeout = conf->dynamic_ps_timeout;
-		if (timeout == 0) {
-			info = ath12k_mac_get_link_bss_conf(arvif);
-			if (!info) {
-				ath12k_warn(ar->ab, "unable to access bss link conf in setup ps for vif %pM link %u\n",
-					    vif->addr, arvif->link_id);
-				return;
+			if (ar->ab->hw_params->supports_sta_ps) {
+				ahvif->ps = vif_cfg->ps;
+				ath12k_mac_vif_setup_ps(arvif);
 			}
-
-			/* firmware doesn't like 0 */
-			timeout = ieee80211_tu_to_usec(info->beacon_int) / 1000;
 		}
-
-		ret = ath12k_wmi_set_sta_ps_param(ar, arvif->vdev_id, param,
-						  timeout);
-		if (ret) {
-			ath12k_warn(ar->ab, "failed to set inactivity time for vdev %d: %i\n",
-				    arvif->vdev_id, ret);
-			return;
-		}
-	} else {
-		psmode = WMI_STA_PS_MODE_DISABLED;
 	}
-
-	ath12k_dbg(ar->ab, ATH12K_DBG_MAC, "mac vdev %d psmode %s\n",
-		   arvif->vdev_id, psmode ? "enable" : "disable");
-
-	ret = ath12k_wmi_pdev_set_ps_mode(ar, arvif->vdev_id, psmode);
-	if (ret)
-		ath12k_warn(ar->ab, "failed to set sta power save mode %d for vdev %d: %d\n",
-			    psmode, arvif->vdev_id, ret);
 }
 
 static bool ath12k_mac_supports_station_tpc(struct ath12k *ar,
@@ -3795,7 +3814,6 @@ static void ath12k_mac_bss_info_changed(struct ath12k *ar,
 {
 	struct ath12k_vif *ahvif = arvif->ahvif;
 	struct ieee80211_vif *vif = ath12k_ahvif_to_vif(ahvif);
-	struct ieee80211_vif_cfg *vif_cfg = &vif->cfg;
 	struct cfg80211_chan_def def;
 	u32 param_id, param_value;
 	enum nl80211_band band;
@@ -4069,12 +4087,6 @@ static void ath12k_mac_bss_info_changed(struct ath12k *ar,
 	}
 
 	ath12k_mac_fils_discovery(arvif, info);
-
-	if (changed & BSS_CHANGED_PS &&
-	    ar->ab->hw_params->supports_sta_ps) {
-		ahvif->ps = vif_cfg->ps;
-		ath12k_mac_vif_setup_ps(arvif);
-	}
 }
 
 static struct ath12k_vif_cache *ath12k_ahvif_get_link_cache(struct ath12k_vif *ahvif,
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sarika Sharma quic_sarishar@quicinc.com
[ Upstream commit 3b8aa249d0fce93590888a6ed3d22b458091ecb9 ]
Currently, statistics in arsta are updated on the deflink for both non-ML and multi-link (ML) stations. Link statistics are not updated for multi-link operation (MLO).
Hence, add support to correctly obtain the link ID if the peer is ML, fetch the arsta from the appropriate link ID, and update the statistics in the corresponding arsta.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1
Signed-off-by: Sarika Sharma quic_sarishar@quicinc.com
Reviewed-by: Vasanthakumar Thiagarajan vasanthakumar.thiagarajan@oss.qualcomm.com
Link: https://patch.msgid.link/20250701105927.803342-3-quic_sarishar@quicinc.com
Signed-off-by: Jeff Johnson jeff.johnson@oss.qualcomm.com
Stable-dep-of: 82e2be57d544 ("wifi: ath12k: fix WMI TLV header misalignment")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/wireless/ath/ath12k/dp_mon.c | 22 ++++++++++++++------
 drivers/net/wireless/ath/ath12k/dp_rx.c  | 11 +++++-----
 drivers/net/wireless/ath/ath12k/peer.h   | 26 ++++++++++++++++++++++++
 3 files changed, 48 insertions(+), 11 deletions(-)
diff --git a/drivers/net/wireless/ath/ath12k/dp_mon.c b/drivers/net/wireless/ath/ath12k/dp_mon.c
index 91f4e3aff74c3..6a0915a0c7aae 100644
--- a/drivers/net/wireless/ath/ath12k/dp_mon.c
+++ b/drivers/net/wireless/ath/ath12k/dp_mon.c
@@ -3610,7 +3610,6 @@ ath12k_dp_mon_rx_update_user_stats(struct ath12k *ar,
 				   struct hal_rx_mon_ppdu_info *ppdu_info,
 				   u32 uid)
 {
-	struct ath12k_sta *ahsta;
 	struct ath12k_link_sta *arsta;
 	struct ath12k_rx_peer_stats *rx_stats = NULL;
 	struct hal_rx_user_status *user_stats = &ppdu_info->userstats[uid];
@@ -3628,8 +3627,13 @@ ath12k_dp_mon_rx_update_user_stats(struct ath12k *ar,
 		return;
 	}
 
-	ahsta = ath12k_sta_to_ahsta(peer->sta);
-	arsta = &ahsta->deflink;
+	arsta = ath12k_peer_get_link_sta(ar->ab, peer);
+	if (!arsta) {
+		ath12k_warn(ar->ab, "link sta not found on peer %pM id %d\n",
+			    peer->addr, peer->peer_id);
+		return;
+	}
+
 	arsta->rssi_comb = ppdu_info->rssi_comb;
 	ewma_avg_rssi_add(&arsta->avg_rssi, ppdu_info->rssi_comb);
 	rx_stats = arsta->rx_stats;
@@ -3742,7 +3746,6 @@ int ath12k_dp_mon_srng_process(struct ath12k *ar, int *budget,
 	struct dp_srng *mon_dst_ring;
 	struct hal_srng *srng;
 	struct dp_rxdma_mon_ring *buf_ring;
-	struct ath12k_sta *ahsta = NULL;
 	struct ath12k_link_sta *arsta;
 	struct ath12k_peer *peer;
 	struct sk_buff_head skb_list;
@@ -3868,8 +3871,15 @@ int ath12k_dp_mon_srng_process(struct ath12k *ar, int *budget,
 		}
 
 		if (ppdu_info->reception_type == HAL_RX_RECEPTION_TYPE_SU) {
-			ahsta = ath12k_sta_to_ahsta(peer->sta);
-			arsta = &ahsta->deflink;
+			arsta = ath12k_peer_get_link_sta(ar->ab, peer);
+			if (!arsta) {
+				ath12k_warn(ar->ab, "link sta not found on peer %pM id %d\n",
+					    peer->addr, peer->peer_id);
+				spin_unlock_bh(&ab->base_lock);
+				rcu_read_unlock();
+				dev_kfree_skb_any(skb);
+				continue;
+			}
 			ath12k_dp_mon_rx_update_peer_su_stats(ar, arsta,
 							      ppdu_info);
 		} else if ((ppdu_info->fc_valid) &&
diff --git a/drivers/net/wireless/ath/ath12k/dp_rx.c b/drivers/net/wireless/ath/ath12k/dp_rx.c
index bd95dc88f9b21..e9137ffeb5ab4 100644
--- a/drivers/net/wireless/ath/ath12k/dp_rx.c
+++ b/drivers/net/wireless/ath/ath12k/dp_rx.c
@@ -1418,8 +1418,6 @@ ath12k_update_per_peer_tx_stats(struct ath12k *ar,
 {
 	struct ath12k_base *ab = ar->ab;
 	struct ath12k_peer *peer;
-	struct ieee80211_sta *sta;
-	struct ath12k_sta *ahsta;
 	struct ath12k_link_sta *arsta;
 	struct htt_ppdu_stats_user_rate *user_rate;
 	struct ath12k_per_peer_tx_stats *peer_stats = &ar->peer_tx_stats;
@@ -1500,9 +1498,12 @@ ath12k_update_per_peer_tx_stats(struct ath12k *ar,
 		return;
 	}
 
-	sta = peer->sta;
-	ahsta = ath12k_sta_to_ahsta(sta);
-	arsta = &ahsta->deflink;
+	arsta = ath12k_peer_get_link_sta(ab, peer);
+	if (!arsta) {
+		spin_unlock_bh(&ab->base_lock);
+		rcu_read_unlock();
+		return;
+	}
 
 	memset(&arsta->txrate, 0, sizeof(arsta->txrate));
 
diff --git a/drivers/net/wireless/ath/ath12k/peer.h b/drivers/net/wireless/ath/ath12k/peer.h
index f3a5e054d2b55..92c4988df2f16 100644
--- a/drivers/net/wireless/ath/ath12k/peer.h
+++ b/drivers/net/wireless/ath/ath12k/peer.h
@@ -91,5 +91,31 @@ struct ath12k_peer *ath12k_peer_find_by_ast(struct ath12k_base *ab, int ast_hash
 int ath12k_peer_ml_create(struct ath12k_hw *ah, struct ieee80211_sta *sta);
 int ath12k_peer_ml_delete(struct ath12k_hw *ah, struct ieee80211_sta *sta);
 int ath12k_peer_mlo_link_peers_delete(struct ath12k_vif *ahvif, struct ath12k_sta *ahsta);
+static inline
+struct ath12k_link_sta *ath12k_peer_get_link_sta(struct ath12k_base *ab,
+						 struct ath12k_peer *peer)
+{
+	struct ath12k_sta *ahsta;
+	struct ath12k_link_sta *arsta;
+
+	if (!peer->sta)
+		return NULL;
+
+	ahsta = ath12k_sta_to_ahsta(peer->sta);
+	if (peer->ml_id & ATH12K_PEER_ML_ID_VALID) {
+		if (!(ahsta->links_map & BIT(peer->link_id))) {
+			ath12k_warn(ab, "peer %pM id %d link_id %d can't found in STA link_map 0x%x\n",
+				    peer->addr, peer->peer_id, peer->link_id,
+				    ahsta->links_map);
+			return NULL;
+		}
+		arsta = rcu_dereference(ahsta->link[peer->link_id]);
+		if (!arsta)
+			return NULL;
+	} else {
+		arsta = &ahsta->deflink;
+	}
+	return arsta;
+}
 
 #endif /* _PEER_H_ */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sriram R quic_srirrama@quicinc.com
[ Upstream commit 9d2abd4162fca8a1eb46f664268dffad35c8ad20 ]
A multi-link client can use any link for transmissions. It can decide to put one link in power save mode for longer periods while listening on the other links as per MLD listen interval. Unicast management frames sent to that link station might get dropped if that link station is in power save mode or inactive. In such cases, firmware can take decision on which link to use.
Allow the firmware to decide which link a management frame should be sent on by filling the hardware link ID with the maximum value of u32; with no specific link pinned, the firmware treats such management frames as link agnostic. For QCN devices, all action frames are marked as link agnostic. For WCN devices configured as an AP, all frames other than probe response, authentication, association response, re-association response and ADDBA response frames are marked as link agnostic; configured as a station, all frames other than probe request, authentication, de-authentication and ADDBA response frames are marked as link agnostic.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
Signed-off-by: Sriram R quic_srirrama@quicinc.com
Co-developed-by: Roopni Devanathan quic_rdevanat@quicinc.com
Signed-off-by: Roopni Devanathan quic_rdevanat@quicinc.com
Reviewed-by: Vasanthakumar Thiagarajan vasanthakumar.thiagarajan@oss.qualcomm.com
Link: https://patch.msgid.link/20250711091704.3704379-1-quic_rdevanat@quicinc.com
Signed-off-by: Jeff Johnson jeff.johnson@oss.qualcomm.com
Stable-dep-of: 82e2be57d544 ("wifi: ath12k: fix WMI TLV header misalignment")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/wireless/ath/ath12k/core.h |  1 +
 drivers/net/wireless/ath/ath12k/hw.c   | 55 +++++++++++++++++++++++++
 drivers/net/wireless/ath/ath12k/hw.h   |  2 +
 drivers/net/wireless/ath/ath12k/mac.c  |  5 ++-
 drivers/net/wireless/ath/ath12k/peer.c |  2 +-
 drivers/net/wireless/ath/ath12k/peer.h |  2 +
 drivers/net/wireless/ath/ath12k/wmi.c  | 56 ++++++++++++++++++++++++--
 drivers/net/wireless/ath/ath12k/wmi.h  | 16 +++++++-
 8 files changed, 131 insertions(+), 8 deletions(-)
diff --git a/drivers/net/wireless/ath/ath12k/core.h b/drivers/net/wireless/ath/ath12k/core.h
index 4bd286296da79..cebdf62ce3db9 100644
--- a/drivers/net/wireless/ath/ath12k/core.h
+++ b/drivers/net/wireless/ath/ath12k/core.h
@@ -116,6 +116,7 @@ static inline u64 ath12k_le32hilo_to_u64(__le32 hi, __le32 lo)
 enum ath12k_skb_flags {
 	ATH12K_SKB_HW_80211_ENCAP = BIT(0),
 	ATH12K_SKB_CIPHER_SET = BIT(1),
+	ATH12K_SKB_MLO_STA = BIT(2),
 };
 
 struct ath12k_skb_cb {
diff --git a/drivers/net/wireless/ath/ath12k/hw.c b/drivers/net/wireless/ath/ath12k/hw.c
index ec77ad498b33a..6791ae1d64e50 100644
--- a/drivers/net/wireless/ath/ath12k/hw.c
+++ b/drivers/net/wireless/ath/ath12k/hw.c
@@ -14,6 +14,7 @@
 #include "hw.h"
 #include "mhi.h"
 #include "dp_rx.h"
+#include "peer.h"
 
 static const guid_t wcn7850_uuid = GUID_INIT(0xf634f534, 0x6147, 0x11ec,
 					     0x90, 0xd6, 0x02, 0x42,
@@ -49,6 +50,12 @@ static bool ath12k_dp_srng_is_comp_ring_qcn9274(int ring_num)
 	return false;
 }
 
+static bool ath12k_is_frame_link_agnostic_qcn9274(struct ath12k_link_vif *arvif,
+						  struct ieee80211_mgmt *mgmt)
+{
+	return ieee80211_is_action(mgmt->frame_control);
+}
+
 static int ath12k_hw_mac_id_to_pdev_id_wcn7850(const struct ath12k_hw_params *hw,
 					       int mac_id)
 {
@@ -74,6 +81,52 @@ static bool ath12k_dp_srng_is_comp_ring_wcn7850(int ring_num)
 	return false;
 }
 
+static bool ath12k_is_addba_resp_action_code(struct ieee80211_mgmt *mgmt)
+{
+	if (!ieee80211_is_action(mgmt->frame_control))
+		return false;
+
+	if (mgmt->u.action.category != WLAN_CATEGORY_BACK)
+		return false;
+
+	if (mgmt->u.action.u.addba_resp.action_code != WLAN_ACTION_ADDBA_RESP)
+		return false;
+
+	return true;
+}
+
+static bool ath12k_is_frame_link_agnostic_wcn7850(struct ath12k_link_vif *arvif,
+						  struct ieee80211_mgmt *mgmt)
+{
+	struct ieee80211_vif *vif = ath12k_ahvif_to_vif(arvif->ahvif);
+	struct ath12k_hw *ah = ath12k_ar_to_ah(arvif->ar);
+	struct ath12k_base *ab = arvif->ar->ab;
+	__le16 fc = mgmt->frame_control;
+
+	spin_lock_bh(&ab->base_lock);
+	if (!ath12k_peer_find_by_addr(ab, mgmt->da) &&
+	    !ath12k_peer_ml_find(ah, mgmt->da)) {
+		spin_unlock_bh(&ab->base_lock);
+		return false;
+	}
+	spin_unlock_bh(&ab->base_lock);
+
+	if (vif->type == NL80211_IFTYPE_STATION)
+		return arvif->is_up &&
+		       (vif->valid_links == vif->active_links) &&
+		       !ieee80211_is_probe_req(fc) &&
+		       !ieee80211_is_auth(fc) &&
+		       !ieee80211_is_deauth(fc) &&
+		       !ath12k_is_addba_resp_action_code(mgmt);
+
+	if (vif->type == NL80211_IFTYPE_AP)
+		return !(ieee80211_is_probe_resp(fc) || ieee80211_is_auth(fc) ||
+			 ieee80211_is_assoc_resp(fc) || ieee80211_is_reassoc_resp(fc) ||
+			 ath12k_is_addba_resp_action_code(mgmt));
+
+	return false;
+}
+
 static const struct ath12k_hw_ops qcn9274_ops = {
 	.get_hw_mac_from_pdev_id = ath12k_hw_qcn9274_mac_from_pdev_id,
 	.mac_id_to_pdev_id = ath12k_hw_mac_id_to_pdev_id_qcn9274,
@@ -81,6 +134,7 @@ static const struct ath12k_hw_ops qcn9274_ops = {
 	.rxdma_ring_sel_config = ath12k_dp_rxdma_ring_sel_config_qcn9274,
 	.get_ring_selector = ath12k_hw_get_ring_selector_qcn9274,
 	.dp_srng_is_tx_comp_ring = ath12k_dp_srng_is_comp_ring_qcn9274,
+	.is_frame_link_agnostic = ath12k_is_frame_link_agnostic_qcn9274,
 };
 
 static const struct ath12k_hw_ops wcn7850_ops = {
@@ -90,6 +144,7 @@ static const struct ath12k_hw_ops wcn7850_ops = {
 	.rxdma_ring_sel_config = ath12k_dp_rxdma_ring_sel_config_wcn7850,
 	.get_ring_selector = ath12k_hw_get_ring_selector_wcn7850,
 	.dp_srng_is_tx_comp_ring = ath12k_dp_srng_is_comp_ring_wcn7850,
+	.is_frame_link_agnostic = ath12k_is_frame_link_agnostic_wcn7850,
 };
 
 #define ATH12K_TX_RING_MASK_0 0x1
diff --git a/drivers/net/wireless/ath/ath12k/hw.h b/drivers/net/wireless/ath/ath12k/hw.h
index 0a75bc5abfa24..9c69dd5a22afa 100644
--- a/drivers/net/wireless/ath/ath12k/hw.h
+++ b/drivers/net/wireless/ath/ath12k/hw.h
@@ -246,6 +246,8 @@ struct ath12k_hw_ops {
 	int (*rxdma_ring_sel_config)(struct ath12k_base *ab);
 	u8 (*get_ring_selector)(struct sk_buff *skb);
 	bool (*dp_srng_is_tx_comp_ring)(int ring_num);
+	bool (*is_frame_link_agnostic)(struct ath12k_link_vif *arvif,
+				       struct ieee80211_mgmt *mgmt);
 };
 
 static inline
diff --git a/drivers/net/wireless/ath/ath12k/mac.c b/drivers/net/wireless/ath/ath12k/mac.c
index 87944e4467050..708dc3dd4347a 100644
--- a/drivers/net/wireless/ath/ath12k/mac.c
+++ b/drivers/net/wireless/ath/ath12k/mac.c
@@ -7685,7 +7685,7 @@ static int ath12k_mac_mgmt_tx_wmi(struct ath12k *ar, struct ath12k_link_vif *arv
 
 	skb_cb->paddr = paddr;
 
-	ret = ath12k_wmi_mgmt_send(ar, arvif->vdev_id, buf_id, skb);
+	ret = ath12k_wmi_mgmt_send(arvif, buf_id, skb);
 	if (ret) {
 		ath12k_warn(ar->ab, "failed to send mgmt frame: %d\n", ret);
 		goto err_unmap_buf;
@@ -7997,6 +7997,9 @@ static void ath12k_mac_op_tx(struct ieee80211_hw *hw,
 
 		skb_cb->flags |= ATH12K_SKB_HW_80211_ENCAP;
 	} else if (ieee80211_is_mgmt(hdr->frame_control)) {
+		if (sta && sta->mlo)
+			skb_cb->flags |= ATH12K_SKB_MLO_STA;
+
 		ret = ath12k_mac_mgmt_tx(ar, skb, is_prb_rsp);
 		if (ret) {
 			ath12k_warn(ar->ab, "failed to queue management frame %d\n",
diff --git a/drivers/net/wireless/ath/ath12k/peer.c b/drivers/net/wireless/ath/ath12k/peer.c
index ec7236bbccc0f..eb7aeff014903 100644
--- a/drivers/net/wireless/ath/ath12k/peer.c
+++ b/drivers/net/wireless/ath/ath12k/peer.c
@@ -8,7 +8,7 @@
 #include "peer.h"
 #include "debug.h"
 
-static struct ath12k_ml_peer *ath12k_peer_ml_find(struct ath12k_hw *ah, const u8 *addr)
+struct ath12k_ml_peer *ath12k_peer_ml_find(struct ath12k_hw *ah, const u8 *addr)
 {
 	struct ath12k_ml_peer *ml_peer;
 
diff --git a/drivers/net/wireless/ath/ath12k/peer.h b/drivers/net/wireless/ath/ath12k/peer.h
index 92c4988df2f16..44afc0b7dd53e 100644
--- a/drivers/net/wireless/ath/ath12k/peer.h
+++ b/drivers/net/wireless/ath/ath12k/peer.h
@@ -91,6 +91,8 @@ struct ath12k_peer *ath12k_peer_find_by_ast(struct ath12k_base *ab, int ast_hash
 int ath12k_peer_ml_create(struct ath12k_hw *ah, struct ieee80211_sta *sta);
 int ath12k_peer_ml_delete(struct ath12k_hw *ah, struct ieee80211_sta *sta);
 int ath12k_peer_mlo_link_peers_delete(struct ath12k_vif *ahvif, struct ath12k_sta *ahsta);
+struct ath12k_ml_peer *ath12k_peer_ml_find(struct ath12k_hw *ah,
+					   const u8 *addr);
 static inline
 struct ath12k_link_sta *ath12k_peer_get_link_sta(struct ath12k_base *ab,
 						 struct ath12k_peer *peer)
diff --git a/drivers/net/wireless/ath/ath12k/wmi.c b/drivers/net/wireless/ath/ath12k/wmi.c
index eac5d48cade66..d333f40408feb 100644
--- a/drivers/net/wireless/ath/ath12k/wmi.c
+++ b/drivers/net/wireless/ath/ath12k/wmi.c
@@ -782,20 +782,46 @@ struct sk_buff *ath12k_wmi_alloc_skb(struct ath12k_wmi_base *wmi_ab, u32 len)
 	return skb;
 }
 
-int ath12k_wmi_mgmt_send(struct ath12k *ar, u32 vdev_id, u32 buf_id,
+int ath12k_wmi_mgmt_send(struct ath12k_link_vif *arvif, u32 buf_id,
 			 struct sk_buff *frame)
 {
+	struct ath12k *ar = arvif->ar;
 	struct ath12k_wmi_pdev *wmi = ar->wmi;
 	struct wmi_mgmt_send_cmd *cmd;
 	struct ieee80211_tx_info *info = IEEE80211_SKB_CB(frame);
-	struct wmi_tlv *frame_tlv;
+	struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)frame->data;
+	struct ieee80211_vif *vif = ath12k_ahvif_to_vif(arvif->ahvif);
+	int cmd_len = sizeof(struct ath12k_wmi_mgmt_send_tx_params);
+	struct ieee80211_mgmt *mgmt = (struct ieee80211_mgmt *)hdr;
+	struct ath12k_wmi_mlo_mgmt_send_params *ml_params;
+	struct ath12k_base *ab = ar->ab;
+	struct wmi_tlv *frame_tlv, *tlv;
+	struct ath12k_skb_cb *skb_cb;
+	u32 buf_len, buf_len_aligned;
+	u32 vdev_id = arvif->vdev_id;
+	bool link_agnostic = false;
 	struct sk_buff *skb;
-	u32 buf_len;
 	int ret, len;
+	void *ptr;
 
 	buf_len = min_t(int, frame->len, WMI_MGMT_SEND_DOWNLD_LEN);
 
-	len = sizeof(*cmd) + sizeof(*frame_tlv) + roundup(buf_len, 4);
+	buf_len_aligned = roundup(buf_len, sizeof(u32));
+
+	len = sizeof(*cmd) + sizeof(*frame_tlv) + buf_len_aligned;
+
+	if (ieee80211_vif_is_mld(vif)) {
+		skb_cb = ATH12K_SKB_CB(frame);
+		if ((skb_cb->flags & ATH12K_SKB_MLO_STA) &&
+		    ab->hw_params->hw_ops->is_frame_link_agnostic &&
+		    ab->hw_params->hw_ops->is_frame_link_agnostic(arvif, mgmt)) {
+			len += cmd_len + TLV_HDR_SIZE + sizeof(*ml_params);
+			ath12k_generic_dbg(ATH12K_DBG_MGMT,
+					   "Sending Mgmt Frame fc 0x%0x as link agnostic",
+					   mgmt->frame_control);
+			link_agnostic = true;
+		}
+	}
 
 	skb = ath12k_wmi_alloc_skb(wmi->wmi_ab, len);
 	if (!skb)
@@ -818,6 +844,28 @@ int ath12k_wmi_mgmt_send(struct ath12k *ar, u32 vdev_id, u32 buf_id,
 
 	memcpy(frame_tlv->value, frame->data, buf_len);
 
+	if (!link_agnostic)
+		goto send;
+
+	ptr = skb->data + sizeof(*cmd) + sizeof(*frame_tlv) + buf_len_aligned;
+
+	tlv = ptr;
+
+	/* Tx params not used currently */
+	tlv->header = ath12k_wmi_tlv_cmd_hdr(WMI_TAG_TX_SEND_PARAMS, cmd_len);
+	ptr += cmd_len;
+
+	tlv = ptr;
+	tlv->header = ath12k_wmi_tlv_hdr(WMI_TAG_ARRAY_STRUCT, sizeof(*ml_params));
+	ptr += TLV_HDR_SIZE;
+
+	ml_params = ptr;
+	ml_params->tlv_header = ath12k_wmi_tlv_cmd_hdr(WMI_TAG_MLO_TX_SEND_PARAMS,
+						       sizeof(*ml_params));
+
+	ml_params->hw_link_id = cpu_to_le32(WMI_MGMT_LINK_AGNOSTIC_ID);
+
+send:
 	ret = ath12k_wmi_cmd_send(wmi, skb, WMI_MGMT_TX_SEND_CMDID);
 	if (ret) {
 		ath12k_warn(ar->ab,
diff --git a/drivers/net/wireless/ath/ath12k/wmi.h b/drivers/net/wireless/ath/ath12k/wmi.h
index 8627154f1680f..6dbcedcf08175 100644
--- a/drivers/net/wireless/ath/ath12k/wmi.h
+++ b/drivers/net/wireless/ath/ath12k/wmi.h
@@ -3963,6 +3963,7 @@ struct wmi_scan_chan_list_cmd {
 } __packed;
 
 #define WMI_MGMT_SEND_DOWNLD_LEN	64
+#define WMI_MGMT_LINK_AGNOSTIC_ID	0xFFFFFFFF
 
 #define WMI_TX_PARAMS_DWORD0_POWER	GENMASK(7, 0)
 #define WMI_TX_PARAMS_DWORD0_MCS_MASK	GENMASK(19, 8)
@@ -3988,7 +3989,18 @@ struct wmi_mgmt_send_cmd {
 
 	/* This TLV is followed by struct wmi_mgmt_frame */
 
-	/* Followed by struct wmi_mgmt_send_params */
+	/* Followed by struct ath12k_wmi_mlo_mgmt_send_params */
+} __packed;
+
+struct ath12k_wmi_mlo_mgmt_send_params {
+	__le32 tlv_header;
+	__le32 hw_link_id;
+} __packed;
+
+struct ath12k_wmi_mgmt_send_tx_params {
+	__le32 tlv_header;
+	__le32 tx_param_dword0;
+	__le32 tx_param_dword1;
 } __packed;
 
 struct wmi_sta_powersave_mode_cmd {
@@ -6183,7 +6195,7 @@ void ath12k_wmi_init_wcn7850(struct ath12k_base *ab,
 int ath12k_wmi_cmd_send(struct ath12k_wmi_pdev *wmi, struct sk_buff *skb,
 			u32 cmd_id);
 struct sk_buff *ath12k_wmi_alloc_skb(struct ath12k_wmi_base *wmi_sc, u32 len);
-int ath12k_wmi_mgmt_send(struct ath12k *ar, u32 vdev_id, u32 buf_id,
+int ath12k_wmi_mgmt_send(struct ath12k_link_vif *arvif, u32 buf_id,
 			 struct sk_buff *frame);
 int ath12k_wmi_p2p_go_bcn_ie(struct ath12k *ar, u32 vdev_id,
 			     const u8 *p2p_ie);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miaoqing Pan miaoqing.pan@oss.qualcomm.com
[ Upstream commit 82e2be57d544ff9ad4696c85600827b39be8ce9e ]
When buf_len is not 4-byte aligned in ath12k_wmi_mgmt_send(), the firmware asserts and triggers a recovery. The following error messages are observed:
ath12k_pci 0004:01:00.0: failed to submit WMI_MGMT_TX_SEND_CMDID cmd
ath12k_pci 0004:01:00.0: failed to send mgmt frame: -108
ath12k_pci 0004:01:00.0: failed to tx mgmt frame, vdev_id 0 :-108
ath12k_pci 0004:01:00.0: waiting recovery start...
This issue was observed when running 'iw wlanx set power_save off/on' in MLO station mode, which triggers the sending of an SMPS action frame with a length of 27 bytes to the AP. To resolve the misalignment, use buf_len_aligned instead of buf_len when constructing the WMI TLV header.
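For illustration, a minimal standalone sketch of the alignment in question (ROUNDUP() is a userspace stand-in for the kernel's roundup(); the 27-byte length is the SMPS frame from the report above):

  #include <stdio.h>

  /* Userspace stand-in for the kernel's roundup(); assumes y > 0. */
  #define ROUNDUP(x, y) ((((x) + (y) - 1) / (y)) * (y))

  int main(void)
  {
          unsigned int buf_len = 27;      /* SMPS action frame length */
          unsigned int buf_len_aligned = ROUNDUP(buf_len, 4);

          /* The TLV header must advertise the aligned length (28) even
           * though only buf_len (27) bytes of frame data are copied in.
           */
          printf("buf_len=%u buf_len_aligned=%u\n", buf_len, buf_len_aligned);
          return 0;
  }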
Tested-on: WCN7850 hw2.0 PCI WLAN.IOE_HMT.1.1-00011-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1
Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices") Signed-off-by: Miaoqing Pan miaoqing.pan@oss.qualcomm.com Reviewed-by: Baochen Qiang baochen.qiang@oss.qualcomm.com Link: https://patch.msgid.link/20250908015139.1301437-1-miaoqing.pan@oss.qualcomm.... Signed-off-by: Jeff Johnson jeff.johnson@oss.qualcomm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath12k/wmi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath12k/wmi.c b/drivers/net/wireless/ath/ath12k/wmi.c index d333f40408feb..d740326079e1d 100644 --- a/drivers/net/wireless/ath/ath12k/wmi.c +++ b/drivers/net/wireless/ath/ath12k/wmi.c @@ -840,7 +840,7 @@ int ath12k_wmi_mgmt_send(struct ath12k_link_vif *arvif, u32 buf_id, cmd->tx_params_valid = 0;
frame_tlv = (struct wmi_tlv *)(skb->data + sizeof(*cmd)); - frame_tlv->header = ath12k_wmi_tlv_hdr(WMI_TAG_ARRAY_BYTE, buf_len); + frame_tlv->header = ath12k_wmi_tlv_hdr(WMI_TAG_ARRAY_BYTE, buf_len_aligned);
memcpy(frame_tlv->value, frame->data, buf_len);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Klaus Kudielka klaus.kudielka@gmail.com
[ Upstream commit b816265396daf1beb915e0ffbfd7f3906c2bf4a4 ]
5da3d94a23c6 ("PCI: mvebu: Use for_each_of_range() iterator for parsing "ranges"") simplified code by using the for_each_of_range() iterator, but it broke PCI enumeration on Turris Omnia (and probably other mvebu targets).
Issue #1:
To determine range.flags, of_pci_range_parser_one() uses bus->get_flags(), which resolves to of_bus_pci_get_flags(), which already returns an IORESOURCE bit field, and NOT the original flags from the "ranges" resource.
Then mvebu_get_tgt_attr() attempts the very same conversion again. Remove the misinterpretation of range.flags in mvebu_get_tgt_attr(), to restore the intended behavior.
Issue #2:
The driver needs target and attributes, which are encoded in the raw address values of the "/soc/pcie/ranges" resource. According to of_pci_range_parser_one(), the raw values are stored in range.bus_addr and range.parent_bus_addr, respectively. range.cpu_addr is a translated version of range.parent_bus_addr, and not relevant here.
Use the correct range structure member, to extract target and attributes. This restores the intended behavior.
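As an illustration of the bit layout being decoded, a standalone sketch (the raw parent bus address value below is hypothetical; the shifts mirror the patch):

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
          /* Hypothetical raw parent bus address: target in bits 63:56,
           * attributes in bits 55:48, as decoded by mvebu_get_tgt_attr().
           */
          uint64_t parent_bus_addr = 0x04e8000000000000ULL;

          unsigned int tgt  = (parent_bus_addr >> 56) & 0xFF;
          unsigned int attr = (parent_bus_addr >> 48) & 0xFF;

          printf("tgt=0x%02x attr=0x%02x\n", tgt, attr); /* tgt=0x04 attr=0xe8 */
          return 0;
  }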
Fixes: 5da3d94a23c6 ("PCI: mvebu: Use for_each_of_range() iterator for parsing "ranges"") Reported-by: Jan Palus jpalus@fastmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220479 Signed-off-by: Klaus Kudielka klaus.kudielka@gmail.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Tested-by: Tony Dinh mibodhi@gmail.com Tested-by: Jan Palus jpalus@fastmail.com Link: https://patch.msgid.link/20250907102303.29735-1-klaus.kudielka@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/controller/pci-mvebu.c | 21 ++++----------------- 1 file changed, 4 insertions(+), 17 deletions(-)
diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c index a4a2bac4f4b27..2f8d0223c1a6d 100644 --- a/drivers/pci/controller/pci-mvebu.c +++ b/drivers/pci/controller/pci-mvebu.c @@ -1168,12 +1168,6 @@ static void __iomem *mvebu_pcie_map_registers(struct platform_device *pdev, return devm_ioremap_resource(&pdev->dev, &port->regs); }
-#define DT_FLAGS_TO_TYPE(flags) (((flags) >> 24) & 0x03) -#define DT_TYPE_IO 0x1 -#define DT_TYPE_MEM32 0x2 -#define DT_CPUADDR_TO_TARGET(cpuaddr) (((cpuaddr) >> 56) & 0xFF) -#define DT_CPUADDR_TO_ATTR(cpuaddr) (((cpuaddr) >> 48) & 0xFF) - static int mvebu_get_tgt_attr(struct device_node *np, int devfn, unsigned long type, unsigned int *tgt, @@ -1189,19 +1183,12 @@ static int mvebu_get_tgt_attr(struct device_node *np, int devfn, return -EINVAL;
for_each_of_range(&parser, &range) { - unsigned long rtype; u32 slot = upper_32_bits(range.bus_addr);
- if (DT_FLAGS_TO_TYPE(range.flags) == DT_TYPE_IO) - rtype = IORESOURCE_IO; - else if (DT_FLAGS_TO_TYPE(range.flags) == DT_TYPE_MEM32) - rtype = IORESOURCE_MEM; - else - continue; - - if (slot == PCI_SLOT(devfn) && type == rtype) { - *tgt = DT_CPUADDR_TO_TARGET(range.cpu_addr); - *attr = DT_CPUADDR_TO_ATTR(range.cpu_addr); + if (slot == PCI_SLOT(devfn) && + type == (range.flags & IORESOURCE_TYPE_BITS)) { + *tgt = (range.parent_bus_addr >> 56) & 0xFF; + *attr = (range.parent_bus_addr >> 48) & 0xFF; return 0; } }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alok Tiwari alok.a.tiwari@oracle.com
[ Upstream commit 1dbfb0363224f6da56f6655d596dc5097308d6f5 ]
Per-family bind/unbind callbacks were introduced to allow families to track multicast group consumer presence, e.g. to start or stop producing events depending on listeners.
However, in genl_bind() the bind() callback was invoked even if capability checks failed and ret was set to -EPERM. This means that callbacks could run on behalf of unauthorized callers while the syscall still returned failure to user space.
Fix this by only invoking bind() after the "if (ret) break;" check, i.e. after the permission checks have succeeded.
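Abbreviated sketch of the fixed loop body (permission_check_failed is a placeholder for the real capability checks visible in the diff below):

  if (permission_check_failed)    /* placeholder for the real checks */
          ret = -EPERM;

  if (ret)
          break;  /* added: bind() must never run once a check failed */

  if (family->bind)
          family->bind(i);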
Fixes: 3de21a8990d3 ("genetlink: Add per family bind/unbind callbacks") Signed-off-by: Alok Tiwari alok.a.tiwari@oracle.com Link: https://patch.msgid.link/20250905135731.3026965-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netlink/genetlink.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c index 104732d345434..978c129c60950 100644 --- a/net/netlink/genetlink.c +++ b/net/netlink/genetlink.c @@ -1836,6 +1836,9 @@ static int genl_bind(struct net *net, int group) !ns_capable(net->user_ns, CAP_SYS_ADMIN)) ret = -EPERM;
+ if (ret) + break; + if (family->bind) family->bind(i);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonas Gorski jonas.gorski@gmail.com
[ Upstream commit 674b34c4c770551e916ae707829c7faea4782d3a ]
For some reason Broadcom decided that BCM53101 uses 0.5s increments for the ageing time register, but kept the field width the same [1]. Due to this, the actual ageing time was always half of what was configured.
Fix this by adapting the limits and value calculation for BCM53101.
So far it looks like this is the only chip with the increased tick speed:
$ grep -l -r "Specifies the aging time in 0.5 seconds" cdk/PKG/chip | sort cdk/PKG/chip/bcm53101/bcm53101_a0_defs.h
$ grep -l -r "Specifies the aging time in seconds" cdk/PKG/chip | sort cdk/PKG/chip/bcm53010/bcm53010_a0_defs.h cdk/PKG/chip/bcm53020/bcm53020_a0_defs.h cdk/PKG/chip/bcm53084/bcm53084_a0_defs.h cdk/PKG/chip/bcm53115/bcm53115_a0_defs.h cdk/PKG/chip/bcm53118/bcm53118_a0_defs.h cdk/PKG/chip/bcm53125/bcm53125_a0_defs.h cdk/PKG/chip/bcm53128/bcm53128_a0_defs.h cdk/PKG/chip/bcm53134/bcm53134_a0_defs.h cdk/PKG/chip/bcm53242/bcm53242_a0_defs.h cdk/PKG/chip/bcm53262/bcm53262_a0_defs.h cdk/PKG/chip/bcm53280/bcm53280_a0_defs.h cdk/PKG/chip/bcm53280/bcm53280_b0_defs.h cdk/PKG/chip/bcm53600/bcm53600_a0_defs.h cdk/PKG/chip/bcm89500/bcm89500_a0_defs.h
[1] https://github.com/Broadcom/OpenMDK/blob/a5d3fc9b12af3eeb68f2ca0ce7ec4056cd1...
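As a worked example of the adapted value calculation above (standalone re-creation of the kernel's DIV_ROUND_CLOSEST() for unsigned values; the 300 s ageing time is hypothetical):

  #include <stdio.h>

  /* Userspace stand-in for the kernel's DIV_ROUND_CLOSEST(), unsigned only. */
  #define DIV_ROUND_CLOSEST(x, d) (((x) + (d) / 2) / (d))

  int main(void)
  {
          unsigned int msecs = 300000;    /* hypothetical 300 s ageing time */

          /* BCM53101 counts 0.5 s ticks, everything else counts 1 s ticks */
          printf("BCM53101: %u\n", DIV_ROUND_CLOSEST(msecs, 500));  /* 600 */
          printf("others:   %u\n", DIV_ROUND_CLOSEST(msecs, 1000)); /* 300 */
          return 0;
  }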
Fixes: e39d14a760c0 ("net: dsa: b53: implement setting ageing time") Signed-off-by: Jonas Gorski jonas.gorski@gmail.com Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Link: https://patch.msgid.link/20250905124507.59186-1-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/b53/b53_common.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c index d15d912690c40..073d20241a4c9 100644 --- a/drivers/net/dsa/b53/b53_common.c +++ b/drivers/net/dsa/b53/b53_common.c @@ -1229,9 +1229,15 @@ static int b53_setup(struct dsa_switch *ds) */ ds->untag_vlan_aware_bridge_pvid = true;
- /* Ageing time is set in seconds */ - ds->ageing_time_min = 1 * 1000; - ds->ageing_time_max = AGE_TIME_MAX * 1000; + if (dev->chip_id == BCM53101_DEVICE_ID) { + /* BCM53101 uses 0.5 second increments */ + ds->ageing_time_min = 1 * 500; + ds->ageing_time_max = AGE_TIME_MAX * 500; + } else { + /* Everything else uses 1 second increments */ + ds->ageing_time_min = 1 * 1000; + ds->ageing_time_max = AGE_TIME_MAX * 1000; + }
ret = b53_reset_switch(dev); if (ret) { @@ -2448,7 +2454,10 @@ int b53_set_ageing_time(struct dsa_switch *ds, unsigned int msecs) else reg = B53_AGING_TIME_CONTROL;
- atc = DIV_ROUND_CLOSEST(msecs, 1000); + if (dev->chip_id == BCM53101_DEVICE_ID) + atc = DIV_ROUND_CLOSEST(msecs, 500); + else + atc = DIV_ROUND_CLOSEST(msecs, 1000);
if (!is5325(dev) && !is5365(dev)) atc |= AGE_CHANGE;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Petr Machata petrm@nvidia.com
[ Upstream commit 8625f5748fea960d2af4f3c3e9891ee8f6f80906 ]
The bridge driver currently tolerates options that it does not recognize. Instead, it should bounce them.
Fixes: a428afe82f98 ("net: bridge: add support for user-controlled bool options") Signed-off-by: Petr Machata petrm@nvidia.com Reviewed-by: Ido Schimmel idosch@nvidia.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://patch.msgid.link/e6fdca3b5a8d54183fbda075daffef38bdd7ddce.1757070067... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/br.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/net/bridge/br.c b/net/bridge/br.c index 0adeafe11a365..ad2d8f59fc7bc 100644 --- a/net/bridge/br.c +++ b/net/bridge/br.c @@ -324,6 +324,13 @@ int br_boolopt_multi_toggle(struct net_bridge *br, int err = 0; int opt_id;
+ opt_id = find_next_bit(&bitmap, BITS_PER_LONG, BR_BOOLOPT_MAX); + if (opt_id != BITS_PER_LONG) { + NL_SET_ERR_MSG_FMT_MOD(extack, "Unknown boolean option %d", + opt_id); + return -EINVAL; + } + for_each_set_bit(opt_id, &bitmap, BR_BOOLOPT_MAX) { bool on = !!(bm->optval & BIT(opt_id));
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Antoine Tenart atenart@kernel.org
[ Upstream commit e3c674db356c4303804b2415e7c2b11776cdd8c3 ]
If a GSO skb is sent through a Geneve tunnel and if Geneve options are added, the split GSO skb might not fit in the MTU anymore and an ICMP frag needed packet can be generated. In such a case the ICMP packet might go through the segmentation logic (and be dropped) later if it reaches a path where the GSO status is checked and segmentation is required.
This is especially true when an OvS bridge is used with a Geneve tunnel attached to it. The following set of actions could lead to the ICMP packet being wrongfully segmented:
1. An skb is constructed by the TCP layer (e.g. gso_type SKB_GSO_TCPV4, segs >= 2).
2. The skb hits the OvS bridge where Geneve options are added by an OvS action before being sent through the tunnel.
3. When the skb is transmitted in the tunnel, the split skb does not fit in the MTU anymore and iptunnel_pmtud_build_icmp is called to generate an ICMP fragmentation needed packet. This is done by reusing the original (GSO!) skb. The GSO metadata is not cleared.
4. The ICMP packet being sent back hits the OvS bridge again and because skb_is_gso returns true, it goes through queue_gso_packets...
5. ...where __skb_gso_segment is called. The skb is then dropped.
6. Note that in the above example on re-transmission the skb won't be a GSO one as it would be segmented (len > MSS) and the ICMP packet should go through.
Fix this by resetting the GSO information before reusing an skb in iptunnel_pmtud_build_icmp and iptunnel_pmtud_build_icmpv6.
Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets") Reported-by: Adrian Moreno amorenoz@redhat.com Signed-off-by: Antoine Tenart atenart@kernel.org Reviewed-by: Stefano Brivio sbrivio@redhat.com Link: https://patch.msgid.link/20250904125351.159740-1-atenart@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/ip_tunnel_core.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c index f65d2f7273813..8392d304a72eb 100644 --- a/net/ipv4/ip_tunnel_core.c +++ b/net/ipv4/ip_tunnel_core.c @@ -204,6 +204,9 @@ static int iptunnel_pmtud_build_icmp(struct sk_buff *skb, int mtu) if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct iphdr))) return -EINVAL;
+ if (skb_is_gso(skb)) + skb_gso_reset(skb); + skb_copy_bits(skb, skb_mac_offset(skb), &eh, ETH_HLEN); pskb_pull(skb, ETH_HLEN); skb_reset_network_header(skb); @@ -298,6 +301,9 @@ static int iptunnel_pmtud_build_icmpv6(struct sk_buff *skb, int mtu) if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct ipv6hdr))) return -EINVAL;
+ if (skb_is_gso(skb)) + skb_gso_reset(skb); + skb_copy_bits(skb, skb_mac_offset(skb), &eh, ETH_HLEN); pskb_pull(skb, ETH_HLEN); skb_reset_network_header(skb);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Tran alex.t.tran@gmail.com
[ Upstream commit 641427d5bf90af0625081bf27555418b101274cd ]
The documentation of the 'bcm_msg_head' struct does not match how it is defined in 'bcm.h'. Change the frames member to a flexible array, matching the definition in the header file.
See commit 94dfc73e7cf4 ("treewide: uapi: Replace zero-length arrays with flexible-array members")
Signed-off-by: Alex Tran alex.t.tran@gmail.com Acked-by: Oliver Hartkopp socketcan@hartkopp.net Link: https://patch.msgid.link/20250904031709.1426895-1-alex.t.tran@gmail.com Fixes: 94dfc73e7cf4 ("treewide: uapi: Replace zero-length arrays with flexible-array members") Link: https://bugzilla.kernel.org/show_bug.cgi?id=217783 Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/networking/can.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/networking/can.rst b/Documentation/networking/can.rst index b018ce3463926..515a3876f58cf 100644 --- a/Documentation/networking/can.rst +++ b/Documentation/networking/can.rst @@ -742,7 +742,7 @@ The broadcast manager sends responses to user space in the same form: struct timeval ival1, ival2; /* count and subsequent interval */ canid_t can_id; /* unique can_id for task */ __u32 nframes; /* number of can_frames following */ - struct can_frame frames[0]; + struct can_frame frames[]; };
The aligned payload 'frames' uses the same basic CAN frame structure defined
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tianyu Xu tianyxu@cisco.com
[ Upstream commit 75871a525a596ff4d16c4aebc0018f8d0923c9b1 ]
The igb driver currently causes a NULL pointer dereference when executing the ethtool loopback test. This occurs because there is no associated q_vector for the test ring when it is set up, as interrupts are typically not added to the test rings.
Since commit 5ef44b3cb43b removed the napi_id assignment in __xdp_rxq_info_reg(), there is no longer a need to pass a napi_id to it. Therefore, simply use 0 as the last parameter.
Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support") Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Reviewed-by: Joe Damato joe@dama.to Signed-off-by: Tianyu Xu tianyxu@cisco.com Reviewed-by: Paul Menzel pmenzel@molgen.mpg.de Tested-by: Rinitha S sx.rinitha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igb/igb_main.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index b76a154e635e0..d87438bef6fba 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -4451,8 +4451,7 @@ int igb_setup_rx_resources(struct igb_ring *rx_ring) if (xdp_rxq_info_is_reg(&rx_ring->xdp_rxq)) xdp_rxq_info_unreg(&rx_ring->xdp_rxq); res = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev, - rx_ring->queue_index, - rx_ring->q_vector->napi.napi_id); + rx_ring->queue_index, 0); if (res < 0) { dev_err(dev, "Failed to register xdp_rxq index %u\n", rx_ring->queue_index);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kohei Enju enjuk@amazon.com
[ Upstream commit d709f178abca22a4d3642513df29afe4323a594b ]
The igb driver incorrectly skips the link test when the network interface is admin down (if_running == false), causing the test to always report PASS regardless of the actual physical link state.
This behavior is inconsistent with other drivers (e.g. i40e, ice, ixgbe, etc.) which correctly test the physical link state regardless of admin state. Remove the if_running check to ensure link test always reflects the physical link state.
Fixes: 8d420a1b3ea6 ("igb: correct link test not being run when link is down") Signed-off-by: Kohei Enju enjuk@amazon.com Reviewed-by: Paul Menzel pmenzel@molgen.mpg.de Tested-by: Rinitha S sx.rinitha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igb/igb_ethtool.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c index ca6ccbc139548..6412c84e2d17d 100644 --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c @@ -2081,11 +2081,8 @@ static void igb_diag_test(struct net_device *netdev, } else { dev_info(&adapter->pdev->dev, "online testing starting\n");
- /* PHY is powered down when interface is down */ - if (if_running && igb_link_test(adapter, &data[TEST_LINK])) + if (igb_link_test(adapter, &data[TEST_LINK])) eth_test->flags |= ETH_TEST_FL_FAILED; - else - data[TEST_LINK] = 0;
/* Online tests aren't run; pass by default */ data[TEST_REG] = 0;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Schmidt mschmidt@redhat.com
[ Upstream commit 915470e1b44e71d1dd07ee067276f003c3521ee3 ]
If request_irq() in i40e_vsi_request_irq_msix() fails in an iteration later than the first, the error path wants to free the IRQs requested so far. However, it uses the wrong dev_id argument for free_irq(), so it does not free the IRQs correctly and instead triggers the warning:
Trying to free already-free IRQ 173
WARNING: CPU: 25 PID: 1091 at kernel/irq/manage.c:1829 __free_irq+0x192/0x2c0
Modules linked in: i40e(+) [...]
CPU: 25 UID: 0 PID: 1091 Comm: NetworkManager Not tainted 6.17.0-rc1+ #1 PREEMPT(lazy)
Hardware name: [...]
RIP: 0010:__free_irq+0x192/0x2c0
[...]
Call Trace:
 <TASK>
 free_irq+0x32/0x70
 i40e_vsi_request_irq_msix.cold+0x63/0x8b [i40e]
 i40e_vsi_request_irq+0x79/0x80 [i40e]
 i40e_vsi_open+0x21f/0x2f0 [i40e]
 i40e_open+0x63/0x130 [i40e]
 __dev_open+0xfc/0x210
 __dev_change_flags+0x1fc/0x240
 netif_change_flags+0x27/0x70
 do_setlink.isra.0+0x341/0xc70
 rtnl_newlink+0x468/0x860
 rtnetlink_rcv_msg+0x375/0x450
 netlink_rcv_skb+0x5c/0x110
 netlink_unicast+0x288/0x3c0
 netlink_sendmsg+0x20d/0x430
 ____sys_sendmsg+0x3a2/0x3d0
 ___sys_sendmsg+0x99/0xe0
 __sys_sendmsg+0x8a/0xf0
 do_syscall_64+0x82/0x2c0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
 [...]
 </TASK>
---[ end trace 0000000000000000 ]---
Use the same dev_id for free_irq() as for request_irq().
I tested this by inserting code that fails intentionally.
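The underlying rule is that free_irq() must be given the exact dev_id cookie that was registered with request_irq(). A standalone sketch of the pointer mix-up (hypothetical struct, just comparing the two expressions):

  #include <stdio.h>

  struct q_vector { int idx; };

  int main(void)
  {
          struct q_vector qv = { 0 };
          struct q_vector *q_vectors[1] = { &qv };

          /* request_irq() stored the array element's value as dev_id: */
          void *registered = q_vectors[0];

          printf("fixed arg matches: %d\n", registered == (void *)q_vectors[0]);  /* 1 */
          printf("buggy arg matches: %d\n", registered == (void *)&q_vectors[0]); /* 0 */
          return 0;
  }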
Fixes: 493fb30011b3 ("i40e: Move q_vectors from pointer to array to array of pointers") Signed-off-by: Michal Schmidt mschmidt@redhat.com Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Reviewed-by: Subbaraya Sundeep sbhatta@marvell.com Tested-by: Rinitha S sx.rinitha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index f1c9e575703ea..26dcdceae741e 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -4182,7 +4182,7 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename) irq_num = pf->msix_entries[base + vector].vector; irq_set_affinity_notifier(irq_num, NULL); irq_update_affinity_hint(irq_num, NULL); - free_irq(irq_num, &vsi->q_vectors[vector]); + free_irq(irq_num, vsi->q_vectors[vector]); } return err; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Wajdeczko michal.wajdeczko@intel.com
[ Upstream commit 7934fdc25ad642ab3dbc16d734ab58638520ea60 ]
This is a user-controlled configfs attribute; we should not modify it outside of the configfs attr.store() implementation.
Fixes: bc417e54e24b ("drm/xe: Enable configfs support for survivability mode") Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com Cc: Lucas De Marchi lucas.demarchi@intel.com Cc: Riana Tauro riana.tauro@intel.com Reviewed-by: Stuart Summers stuart.summers@intel.com Reviewed-by: Lucas De Marchi lucas.demarchi@intel.com Link: https://lore.kernel.org/r/20250904103521.7130-1-michal.wajdeczko@intel.com (cherry picked from commit 079a5c83dbd23db7a6eed8f558cf75e264d8a17b) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_survivability_mode.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.c b/drivers/gpu/drm/xe/xe_survivability_mode.c index 1f710b3fc599b..5ae3d70e45167 100644 --- a/drivers/gpu/drm/xe/xe_survivability_mode.c +++ b/drivers/gpu/drm/xe/xe_survivability_mode.c @@ -40,6 +40,8 @@ * * # echo 1 > /sys/kernel/config/xe/0000:03:00.0/survivability_mode * + * It is the responsibility of the user to clear the mode once firmware flash is complete. + * * Refer :ref:`xe_configfs` for more details on how to use configfs * * Survivability mode is indicated by the below admin-only readable sysfs which provides additional @@ -146,7 +148,6 @@ static void xe_survivability_mode_fini(void *arg) struct pci_dev *pdev = to_pci_dev(xe->drm.dev); struct device *dev = &pdev->dev;
- xe_configfs_clear_survivability_mode(pdev); sysfs_remove_file(&dev->kobj, &dev_attr_survivability_mode.attr); }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
[ Upstream commit 1d66c3f2b8c0b5c51f3f4fe29b362c9851190c5a ]
This function can be called from an atomic context, so we can't use fsleep().
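fsleep() may sleep, which is forbidden in atomic context, while udelay() busy-waits. For context, the expression being waited on works out to one frame period in microseconds, since pix_clk_100hz is the pixel clock in units of 100 Hz. A standalone sketch with hypothetical 1080p timing values:

  #include <stdio.h>

  int main(void)
  {
          /* Hypothetical 1080p60 timing */
          unsigned int v_total = 1125, h_total = 2200;
          unsigned int pix_clk_100hz = 1485000;   /* 148.5 MHz in 100 Hz units */

          /* h_total * 10000 / pix_clk_100hz = microseconds per scanline */
          unsigned int us_per_line = h_total * 10000u / pix_clk_100hz;

          /* One frame period, i.e. the length of the busy-wait (~15.8 ms
           * here; integer truncation makes it slightly short of 16.7 ms) */
          printf("delay: %u us\n", v_total * us_per_line);
          return 0;
  }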
Fixes: 01f60348d8fb ("drm/amd/display: Fix 'failed to blank crtc!'") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4549 Cc: Wen Chen Wen.Chen3@amd.com Cc: Fangzhi Zuo jerry.zuo@amd.com Cc: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Cc: Harry Wentland harry.wentland@amd.com Reviewed-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 27e4dc2c0543fd1808cc52bd888ee1e0533c4a2e) Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c index cdb8685ae7d71..454e362ff096a 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c @@ -955,7 +955,7 @@ enum dc_status dcn20_enable_stream_timing( return DC_ERROR_UNEXPECTED; }
- fsleep(stream->timing.v_total * (stream->timing.h_total * 10000u / stream->timing.pix_clk_100hz)); + udelay(stream->timing.v_total * (stream->timing.h_total * 10000u / stream->timing.pix_clk_100hz));
params.vertical_total_min = stream->adjust.v_total_min; params.vertical_total_max = stream->adjust.v_total_max;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Carolina Jubran cjubran@nvidia.com
[ Upstream commit 686cab5a18e443e1d5f2abb17bed45837836425f ]
ndo hwtstamp callbacks are expected to run under the per-device ops lock. Make the lower get/set paths consistent with the rest of the ndo invocations.
Kernel log:
WARNING: CPU: 13 PID: 51364 at ./include/net/netdev_lock.h:70 __netdev_update_features+0x4bd/0xe60
...
RIP: 0010:__netdev_update_features+0x4bd/0xe60
...
Call Trace:
 <TASK>
 netdev_update_features+0x1f/0x60
 mlx5_hwtstamp_set+0x181/0x290 [mlx5_core]
 mlx5e_hwtstamp_set+0x19/0x30 [mlx5_core]
 dev_set_hwtstamp_phylib+0x9f/0x220
 dev_set_hwtstamp_phylib+0x9f/0x220
 dev_set_hwtstamp+0x13d/0x240
 dev_ioctl+0x12f/0x4b0
 sock_ioctl+0x171/0x370
 __x64_sys_ioctl+0x3f7/0x900
 ? __sys_setsockopt+0x69/0xb0
 do_syscall_64+0x6f/0x2e0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
 ...
 </TASK>
....
---[ end trace 0000000000000000 ]---
Note that the mlx5_hwtstamp_set and mlx5e_hwtstamp_set functions shown in the trace come from an in progress patch converting the legacy ioctl to ndo_hwtstamp_get/set and are not present in mainline.
Fixes: ffb7ed19ac0a ("net: hold netdev instance lock during ioctl operations") Signed-off-by: Carolina Jubran cjubran@nvidia.com Reviewed-by: Cosmin Ratiu cratiu@nvidia.com Reviewed-by: Dragos Tatulea dtatulea@nvidia.com Link: https://patch.msgid.link/20250907080821.2353388-1-cjubran@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/dev_ioctl.c | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c index 616479e714663..9447065d01afb 100644 --- a/net/core/dev_ioctl.c +++ b/net/core/dev_ioctl.c @@ -464,8 +464,15 @@ int generic_hwtstamp_get_lower(struct net_device *dev, if (!netif_device_present(dev)) return -ENODEV;
- if (ops->ndo_hwtstamp_get) - return dev_get_hwtstamp_phylib(dev, kernel_cfg); + if (ops->ndo_hwtstamp_get) { + int err; + + netdev_lock_ops(dev); + err = dev_get_hwtstamp_phylib(dev, kernel_cfg); + netdev_unlock_ops(dev); + + return err; + }
/* Legacy path: unconverted lower driver */ return generic_hwtstamp_ioctl_lower(dev, SIOCGHWTSTAMP, kernel_cfg); @@ -481,8 +488,15 @@ int generic_hwtstamp_set_lower(struct net_device *dev, if (!netif_device_present(dev)) return -ENODEV;
- if (ops->ndo_hwtstamp_set) - return dev_set_hwtstamp_phylib(dev, kernel_cfg, extack); + if (ops->ndo_hwtstamp_set) { + int err; + + netdev_lock_ops(dev); + err = dev_set_hwtstamp_phylib(dev, kernel_cfg, extack); + netdev_unlock_ops(dev); + + return err; + }
/* Legacy path: unconverted lower driver */ return generic_hwtstamp_ioctl_lower(dev, SIOCSHWTSTAMP, kernel_cfg);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stanislav Fomichev sdf@fomichev.me
[ Upstream commit 0f82c3ba66c6b2e3cde0f255156a753b108ee9dc ]
Syzkaller managed to lock the lower device via ETHTOOL_SFEATURES:
netdev_lock include/linux/netdevice.h:2761 [inline]
netdev_lock_ops include/net/netdev_lock.h:42 [inline]
netdev_sync_lower_features net/core/dev.c:10649 [inline]
__netdev_update_features+0xcb1/0x1be0 net/core/dev.c:10819
netdev_update_features+0x6d/0xe0 net/core/dev.c:10876
macsec_notify+0x2f5/0x660 drivers/net/macsec.c:4533
notifier_call_chain+0x1b3/0x3e0 kernel/notifier.c:85
call_netdevice_notifiers_extack net/core/dev.c:2267 [inline]
call_netdevice_notifiers net/core/dev.c:2281 [inline]
netdev_features_change+0x85/0xc0 net/core/dev.c:1570
__dev_ethtool net/ethtool/ioctl.c:3469 [inline]
dev_ethtool+0x1536/0x19b0 net/ethtool/ioctl.c:3502
dev_ioctl+0x392/0x1150 net/core/dev_ioctl.c:759
It happens because lower features are out of sync with the upper:
__dev_ethtool (real_dev)
  netdev_lock_ops(real_dev)
  ETHTOOL_SFEATURES
    __netdev_features_change
      netdev_sync_upper_features
        disable LRO on the lower
        if (old_features != dev->features)
          netdev_features_change
            fires NETDEV_FEAT_CHANGE
        macsec_notify NETDEV_FEAT_CHANGE
          netdev_update_features (for each macsec dev)
            netdev_sync_lower_features
              if (upper_features != lower_features)
                netdev_lock_ops(lower) # lower == real_dev stuck
                ...
netdev_unlock_ops(real_dev)
Per commit af5f54b0ef9e ("net: Lock lower level devices when updating features"), we elide the lock/unlock when the upper and lower features are synced. Make sure the lower (real_dev) has the proper features after the macsec link has been created, so that we never hit the situation where the upper flags need to be synced to the lower.
Reported-by: syzbot+7e0f89fb6cae5d002de0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7e0f89fb6cae5d002de0 Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations") Signed-off-by: Stanislav Fomichev sdf@fomichev.me Reviewed-by: Sabrina Dubroca sd@queasysnail.net Link: https://patch.msgid.link/20250908173614.3358264-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/macsec.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index 01329fe7451a1..0eca96eeed58a 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -4286,6 +4286,7 @@ static int macsec_newlink(struct net_device *dev, if (err < 0) goto del_dev;
+ netdev_update_features(dev); netif_stacked_transfer_operstate(real_dev, dev); linkwatch_fire_event(dev);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Davide Caratti dcaratti@redhat.com
[ Upstream commit d013ebc3499fd87cb9dee1dafd0c58aeb05c27c1 ]
A proper kernel configuration for running kselftest can be obtained with:
$ yes | make kselftest-merge
The build of the 'vcan' driver is currently missing, while the other required knobs are already there because of net/link_netns.py [1]. Add a config file in selftests/net/can to store the minimum set of kconfig knobs needed for the CAN selftests.
[1] https://patch.msgid.link/20250219125039.18024-14-shaw.leon@gmail.com
Fixes: 77442ffa83e8 ("selftests: can: Import tst-filter from can-tests") Reviewed-by: Vincent Mailhol mailhol@kernel.org Signed-off-by: Davide Caratti dcaratti@redhat.com Link: https://patch.msgid.link/fa4c0ea262ec529f25e5f5aa9269d84764c67321.1757516009... Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/can/config | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 tools/testing/selftests/net/can/config
diff --git a/tools/testing/selftests/net/can/config b/tools/testing/selftests/net/can/config new file mode 100644 index 0000000000000..188f797966709 --- /dev/null +++ b/tools/testing/selftests/net/can/config @@ -0,0 +1,3 @@ +CONFIG_CAN=m +CONFIG_CAN_DEV=m +CONFIG_CAN_VCAN=m
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit 7fcbe5b2c6a4b5407bf2241fdb71e0a390f6ab9a ]
syzbot is reporting
unregister_netdevice: waiting for vcan0 to become free. Usage count = 2
problem, because the j1939 protocol did not have a NETDEV_UNREGISTER notification handler for undoing the changes made by j1939_sk_bind().
Commit 25fe97cb7620 ("can: j1939: move j1939_priv_put() into sk_destruct callback") expects that a call to j1939_priv_put() can be unconditionally delayed until j1939_sk_sock_destruct() is called. But we need to call j1939_priv_put() against an extra ref held by j1939_sk_bind() call (as a part of undoing changes made by j1939_sk_bind()) as soon as NETDEV_UNREGISTER notification fires (i.e. before j1939_sk_sock_destruct() is called via j1939_sk_release()). Otherwise, the extra ref on "struct j1939_priv" held by j1939_sk_bind() call prevents "struct net_device" from dropping the usage count to 1; making it impossible for unregister_netdevice() to continue.
Reported-by: syzbot syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84 Tested-by: syzbot syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Fixes: 25fe97cb7620 ("can: j1939: move j1939_priv_put() into sk_destruct callback") Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Tested-by: Oleksij Rempel o.rempel@pengutronix.de Acked-by: Oleksij Rempel o.rempel@pengutronix.de Link: https://patch.msgid.link/ac9db9a4-6c30-416e-8b94-96e6559d55b2@I-love.SAKURA.... [mkl: remove space in front of label] Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/can/j1939/j1939-priv.h | 1 + net/can/j1939/main.c | 3 +++ net/can/j1939/socket.c | 49 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 53 insertions(+)
diff --git a/net/can/j1939/j1939-priv.h b/net/can/j1939/j1939-priv.h index 31a93cae5111b..81f58924b4acd 100644 --- a/net/can/j1939/j1939-priv.h +++ b/net/can/j1939/j1939-priv.h @@ -212,6 +212,7 @@ void j1939_priv_get(struct j1939_priv *priv);
/* notify/alert all j1939 sockets bound to ifindex */ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv); +void j1939_sk_netdev_event_unregister(struct j1939_priv *priv); int j1939_cancel_active_session(struct j1939_priv *priv, struct sock *sk); void j1939_tp_init(struct j1939_priv *priv);
diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c index 7e8a20f2fc42b..3706a872ecafd 100644 --- a/net/can/j1939/main.c +++ b/net/can/j1939/main.c @@ -377,6 +377,9 @@ static int j1939_netdev_notify(struct notifier_block *nb, j1939_sk_netdev_event_netdown(priv); j1939_ecu_unmap_all(priv); break; + case NETDEV_UNREGISTER: + j1939_sk_netdev_event_unregister(priv); + break; }
j1939_priv_put(priv); diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c index 6fefe7a687611..b3a45aa70cf2f 100644 --- a/net/can/j1939/socket.c +++ b/net/can/j1939/socket.c @@ -1299,6 +1299,55 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv) read_unlock_bh(&priv->j1939_socks_lock); }
+void j1939_sk_netdev_event_unregister(struct j1939_priv *priv) +{ + struct sock *sk; + struct j1939_sock *jsk; + bool wait_rcu = false; + +rescan: /* The caller is holding a ref on this "priv" via j1939_priv_get_by_ndev(). */ + read_lock_bh(&priv->j1939_socks_lock); + list_for_each_entry(jsk, &priv->j1939_socks, list) { + /* Skip if j1939_jsk_add() is not called on this socket. */ + if (!(jsk->state & J1939_SOCK_BOUND)) + continue; + sk = &jsk->sk; + sock_hold(sk); + read_unlock_bh(&priv->j1939_socks_lock); + /* Check if j1939_jsk_del() is not yet called on this socket after holding + * socket's lock, for both j1939_sk_bind() and j1939_sk_release() call + * j1939_jsk_del() with socket's lock held. + */ + lock_sock(sk); + if (jsk->state & J1939_SOCK_BOUND) { + /* Neither j1939_sk_bind() nor j1939_sk_release() called j1939_jsk_del(). + * Make this socket no longer bound, by pretending as if j1939_sk_bind() + * dropped old references but did not get new references. + */ + j1939_jsk_del(priv, jsk); + j1939_local_ecu_put(priv, jsk->addr.src_name, jsk->addr.sa); + j1939_netdev_stop(priv); + /* Call j1939_priv_put() now and prevent j1939_sk_sock_destruct() from + * calling the corresponding j1939_priv_put(). + * + * j1939_sk_sock_destruct() is supposed to call j1939_priv_put() after + * an RCU grace period. But since the caller is holding a ref on this + * "priv", we can defer synchronize_rcu() until immediately before + * the caller calls j1939_priv_put(). + */ + j1939_priv_put(priv); + jsk->priv = NULL; + wait_rcu = true; + } + release_sock(sk); + sock_put(sk); + goto rescan; + } + read_unlock_bh(&priv->j1939_socks_lock); + if (wait_rcu) + synchronize_rcu(); +} + static int j1939_sk_no_ioctlcmd(struct socket *sock, unsigned int cmd, unsigned long arg) {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit f214744c8a27c3c1da6b538c232da22cd027530e ]
Commit 25fe97cb7620 ("can: j1939: move j1939_priv_put() into sk_destruct callback") expects that a call to j1939_priv_put() can be unconditionally delayed until j1939_sk_sock_destruct() is called. But a refcount leak will happen when j1939_sk_bind() is called again after the j1939_local_ecu_get() call from a previous j1939_sk_bind() returned an error. We need to call j1939_priv_put() before j1939_sk_bind() returns an error.
Fixes: 25fe97cb7620 ("can: j1939: move j1939_priv_put() into sk_destruct callback") Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Tested-by: Oleksij Rempel o.rempel@pengutronix.de Acked-by: Oleksij Rempel o.rempel@pengutronix.de Link: https://patch.msgid.link/4f49a1bc-a528-42ad-86c0-187268ab6535@I-love.SAKURA.... Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/can/j1939/socket.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c index b3a45aa70cf2f..785b883a1319d 100644 --- a/net/can/j1939/socket.c +++ b/net/can/j1939/socket.c @@ -520,6 +520,9 @@ static int j1939_sk_bind(struct socket *sock, struct sockaddr *uaddr, int len) ret = j1939_local_ecu_get(priv, jsk->addr.src_name, jsk->addr.sa); if (ret) { j1939_netdev_stop(priv); + jsk->priv = NULL; + synchronize_rcu(); + j1939_priv_put(priv); goto out_release_sock; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit 06e02da29f6f1a45fc07bd60c7eaf172dc21e334 ]
j1939_sk_bind() and j1939_sk_release() call j1939_local_ecu_put() only when J1939_SOCK_BOUND is already set, but the error handling path of j1939_sk_bind() will not set J1939_SOCK_BOUND when j1939_local_ecu_get() fails. Therefore, j1939_local_ecu_get() needs to undo the priv->ents[sa].nusers++ increment itself when it returns an error.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Tested-by: Oleksij Rempel o.rempel@pengutronix.de Acked-by: Oleksij Rempel o.rempel@pengutronix.de Link: https://patch.msgid.link/e7f80046-4ff7-4ce2-8ad8-7c3c678a42c9@I-love.SAKURA.... Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/can/j1939/bus.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/can/j1939/bus.c b/net/can/j1939/bus.c index 39844f14eed86..797719cb227ec 100644 --- a/net/can/j1939/bus.c +++ b/net/can/j1939/bus.c @@ -290,8 +290,11 @@ int j1939_local_ecu_get(struct j1939_priv *priv, name_t name, u8 sa) if (!ecu) ecu = j1939_ecu_create_locked(priv, name); err = PTR_ERR_OR_ZERO(ecu); - if (err) + if (err) { + if (j1939_address_is_unicast(sa)) + priv->ents[sa].nusers--; goto done; + }
ecu->nusers++; /* TODO: do we care if ecu->addr != sa? */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anssi Hannula anssi.hannula@bitwise.fi
[ Upstream commit ef79f00be72bd81d2e1e6f060d83cf7e425deee4 ]
can_put_echo_skb() takes ownership of the SKB and it may be freed during or after the call.
However, xcan_write_frame() in xilinx_can keeps using the SKB after the call.
Fix that by only calling can_put_echo_skb() after the code is done touching the SKB.
The tx_lock is held for the entire xcan_write_frame() execution and also on the can_get_echo_skb() side so the order of operations does not matter.
An earlier fix, commit 3d3c817c3a40 ("can: xilinx_can: Fix usage of skb memory"), did not move the can_put_echo_skb() call far enough.
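The ownership rule being restored, as a sketch (write_frame_registers() is a hypothetical stand-in for the register writes that read the SKB):

  /* Sketch only: every access to the SKB must precede the echo call. */
  write_frame_registers(priv, skb);       /* last reads of skb->data */

  can_put_echo_skb(skb, ndev, idx, 0);    /* ownership transferred; the SKB
                                           * may be freed at any point after
                                           * this call */
  priv->tx_head++;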
Signed-off-by: Anssi Hannula anssi.hannula@bitwise.fi Fixes: 1598efe57b3e ("can: xilinx_can: refactor code in preparation for CAN FD support") Link: https://patch.msgid.link/20250822095002.168389-1-anssi.hannula@bitwise.fi [mkl: add "commit" in front of sha1 in patch description] [mkl: fix indention] Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/xilinx_can.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c index 3f2e378199abb..5abe4af61655c 100644 --- a/drivers/net/can/xilinx_can.c +++ b/drivers/net/can/xilinx_can.c @@ -690,14 +690,6 @@ static void xcan_write_frame(struct net_device *ndev, struct sk_buff *skb, dlc |= XCAN_DLCR_EDL_MASK; }
- if (!(priv->devtype.flags & XCAN_FLAG_TX_MAILBOXES) && - (priv->devtype.flags & XCAN_FLAG_TXFEMP)) - can_put_echo_skb(skb, ndev, priv->tx_head % priv->tx_max, 0); - else - can_put_echo_skb(skb, ndev, 0, 0); - - priv->tx_head++; - priv->write_reg(priv, XCAN_FRAME_ID_OFFSET(frame_offset), id); /* If the CAN frame is RTR frame this write triggers transmission * (not on CAN FD) @@ -730,6 +722,14 @@ static void xcan_write_frame(struct net_device *ndev, struct sk_buff *skb, data[1]); } } + + if (!(priv->devtype.flags & XCAN_FLAG_TX_MAILBOXES) && + (priv->devtype.flags & XCAN_FLAG_TXFEMP)) + can_put_echo_skb(skb, ndev, priv->tx_head % priv->tx_max, 0); + else + can_put_echo_skb(skb, ndev, 0, 0); + + priv->tx_head++; }
/**
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 5e13f2c491a4100d208e77e92fe577fe3dbad6c2 ]
Running the new 'set_flush_add_atomic_bitmap' test case for nftables.git with CONFIG_PROVE_RCU_LIST=y yields:
net/netfilter/nft_set_bitmap.c:231 RCU-list traversed in non-reader section!!
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by nft/4008:
 #0: ffff888147f79cd8 (&nft_net->commit_mutex){+.+.}-{4:4}, at: nf_tables_valid_genid+0x2f/0xd0
lockdep_rcu_suspicious+0x116/0x160
nft_bitmap_walk+0x22d/0x240
nf_tables_delsetelem+0x1010/0x1a00
..
This is a false positive: the list cannot be altered while the transaction mutex is held, so pass the relevant lockdep expression to the iterator.
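Generic shape of the annotation (a sketch; the optional last argument of list_for_each_entry_rcu() tells lockdep why lockless traversal is safe):

  /* Walking an RCU-protected list under a mutex instead of rcu_read_lock():
   * writers serialize on update_mutex, so the traversal cannot race.
   */
  list_for_each_entry_rcu(pos, &head, node,
                          lockdep_is_held(&update_mutex)) {
          /* use pos */
  }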
The Fixes tag is intentionally wrong; there is no point in picking this up if the earlier false-positive fixups were not applied.
Fixes: 28b7a6b84c0a ("netfilter: nf_tables: avoid false-positive lockdep splats in set walker") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_set_bitmap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nft_set_bitmap.c b/net/netfilter/nft_set_bitmap.c index 12390d2e994fc..8a435cc0e98c4 100644 --- a/net/netfilter/nft_set_bitmap.c +++ b/net/netfilter/nft_set_bitmap.c @@ -221,7 +221,8 @@ static void nft_bitmap_walk(const struct nft_ctx *ctx, const struct nft_bitmap *priv = nft_set_priv(set); struct nft_bitmap_elem *be;
- list_for_each_entry_rcu(be, &priv->list, head) { + list_for_each_entry_rcu(be, &priv->list, head, + lockdep_is_held(&nft_pernet(ctx->net)->commit_mutex)) { if (iter->count < iter->skip) goto cont;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 7792c1e03054440c60d4bce0c06a31c134601997 ]
They are not used anymore, so remove them.
Signed-off-by: Florian Westphal fw@strlen.de Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: c4eaca2e1052 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups") Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_set_pipapo.c | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index 9e4e25f2458f9..9ac48e6b4332c 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -502,8 +502,6 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
/** * pipapo_get() - Get matching element reference given key data - * @net: Network namespace - * @set: nftables API set representation * @m: storage containing active/existing elements * @data: Key data to be matched against existing elements * @genmask: If set, check that element is active in given genmask @@ -516,9 +514,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, * * Return: pointer to &struct nft_pipapo_elem on match, error pointer otherwise. */ -static struct nft_pipapo_elem *pipapo_get(const struct net *net, - const struct nft_set *set, - const struct nft_pipapo_match *m, +static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m, const u8 *data, u8 genmask, u64 tstamp, gfp_t gfp) { @@ -615,7 +611,7 @@ nft_pipapo_get(const struct net *net, const struct nft_set *set, struct nft_pipapo_match *m = rcu_dereference(priv->match); struct nft_pipapo_elem *e;
- e = pipapo_get(net, set, m, (const u8 *)elem->key.val.data, + e = pipapo_get(m, (const u8 *)elem->key.val.data, nft_genmask_cur(net), get_jiffies_64(), GFP_ATOMIC); if (IS_ERR(e)) @@ -1344,7 +1340,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set, else end = start;
- dup = pipapo_get(net, set, m, start, genmask, tstamp, GFP_KERNEL); + dup = pipapo_get(m, start, genmask, tstamp, GFP_KERNEL); if (!IS_ERR(dup)) { /* Check if we already have the same exact entry */ const struct nft_data *dup_key, *dup_end; @@ -1366,7 +1362,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
if (PTR_ERR(dup) == -ENOENT) { /* Look for partially overlapping entries */ - dup = pipapo_get(net, set, m, end, nft_genmask_next(net), tstamp, + dup = pipapo_get(m, end, nft_genmask_next(net), tstamp, GFP_KERNEL); }
@@ -1913,7 +1909,7 @@ nft_pipapo_deactivate(const struct net *net, const struct nft_set *set, if (!m) return NULL;
- e = pipapo_get(net, set, m, (const u8 *)elem->key.val.data, + e = pipapo_get(m, (const u8 *)elem->key.val.data, nft_genmask_next(net), nft_net_tstamp(net), GFP_KERNEL); if (IS_ERR(e)) return NULL;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 17a20e09f086f2c574ac87f3cf6e14c4377f65f6 ]
Return the extension pointer instead of passing it as a function argument to be filled in by the callee.
As-is, whenever false is returned, the extension pointer is not used.
For all set types, when true is returned, the extension pointer was set to the matching element.
Only exception: nft_set_bitmap doesn't support extensions. Return a pointer to a static const empty element extension container.
return false -> return NULL
return true  -> return the element's extension pointer.
This saves one function argument.
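A standalone sketch of the conversion pattern (illustrative names; NULL now encodes what false did before):

  #include <stdio.h>
  #include <string.h>

  struct ext { int val; };

  /* Before: status return plus out-parameter. */
  static int lookup_old(const char *key, const struct ext **extp)
  {
          static const struct ext e = { 42 };

          if (strcmp(key, "hit") != 0)
                  return 0;       /* out-parameter left untouched */
          *extp = &e;
          return 1;
  }

  /* After: the pointer itself carries both results. */
  static const struct ext *lookup_new(const char *key)
  {
          static const struct ext e = { 42 };

          return strcmp(key, "hit") ? NULL : &e;
  }

  int main(void)
  {
          const struct ext *ext;

          if (lookup_old("hit", &ext))
                  printf("old: %d\n", ext->val);

          ext = lookup_new("hit");
          printf("new: %d\n", ext ? ext->val : -1);
          return 0;
  }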
Signed-off-by: Florian Westphal fw@strlen.de Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: c4eaca2e1052 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netfilter/nf_tables.h | 10 ++--- include/net/netfilter/nf_tables_core.h | 47 ++++++++++++---------- net/netfilter/nft_dynset.c | 5 ++- net/netfilter/nft_lookup.c | 27 ++++++------- net/netfilter/nft_objref.c | 5 +-- net/netfilter/nft_set_bitmap.c | 11 ++++-- net/netfilter/nft_set_hash.c | 54 ++++++++++++-------------- net/netfilter/nft_set_pipapo.c | 19 +++++---- net/netfilter/nft_set_pipapo_avx2.c | 25 ++++++------ net/netfilter/nft_set_rbtree.c | 40 +++++++++---------- 10 files changed, 126 insertions(+), 117 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 5e49619ae49c6..cf65703e221fa 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -459,19 +459,17 @@ struct nft_set_ext; * control plane functions. */ struct nft_set_ops { - bool (*lookup)(const struct net *net, + const struct nft_set_ext * (*lookup)(const struct net *net, const struct nft_set *set, - const u32 *key, - const struct nft_set_ext **ext); - bool (*update)(struct nft_set *set, + const u32 *key); + const struct nft_set_ext * (*update)(struct nft_set *set, const u32 *key, struct nft_elem_priv * (*new)(struct nft_set *, const struct nft_expr *, struct nft_regs *), const struct nft_expr *expr, - struct nft_regs *regs, - const struct nft_set_ext **ext); + struct nft_regs *regs); bool (*delete)(const struct nft_set *set, const u32 *key);
diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h index 03b6165756fc5..6a52fb97b8443 100644 --- a/include/net/netfilter/nf_tables_core.h +++ b/include/net/netfilter/nf_tables_core.h @@ -94,34 +94,41 @@ extern const struct nft_set_type nft_set_pipapo_type; extern const struct nft_set_type nft_set_pipapo_avx2_type;
#ifdef CONFIG_MITIGATION_RETPOLINE -bool nft_rhash_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext); -bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext); -bool nft_bitmap_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext); -bool nft_hash_lookup_fast(const struct net *net, - const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext); -bool nft_hash_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext); -bool nft_set_do_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext); +const struct nft_set_ext * +nft_rhash_lookup(const struct net *net, const struct nft_set *set, + const u32 *key); +const struct nft_set_ext * +nft_rbtree_lookup(const struct net *net, const struct nft_set *set, + const u32 *key); +const struct nft_set_ext * +nft_bitmap_lookup(const struct net *net, const struct nft_set *set, + const u32 *key); +const struct nft_set_ext * +nft_hash_lookup_fast(const struct net *net, const struct nft_set *set, + const u32 *key); +const struct nft_set_ext * +nft_hash_lookup(const struct net *net, const struct nft_set *set, + const u32 *key); +const struct nft_set_ext * +nft_set_do_lookup(const struct net *net, const struct nft_set *set, + const u32 *key); #else -static inline bool +static inline const struct nft_set_ext * nft_set_do_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) + const u32 *key) { - return set->ops->lookup(net, set, key, ext); + return set->ops->lookup(net, set, key); } #endif
/* called from nft_pipapo_avx2.c */ -bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext); +const struct nft_set_ext * +nft_pipapo_lookup(const struct net *net, const struct nft_set *set, + const u32 *key); /* called from nft_set_pipapo.c */ -bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext); +const struct nft_set_ext * +nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, + const u32 *key);
void nft_counter_init_seqcount(void);
diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c index 88922e0e8e837..e24493d9e7761 100644 --- a/net/netfilter/nft_dynset.c +++ b/net/netfilter/nft_dynset.c @@ -91,8 +91,9 @@ void nft_dynset_eval(const struct nft_expr *expr, return; }
- if (set->ops->update(set, ®s->data[priv->sreg_key], nft_dynset_new, - expr, regs, &ext)) { + ext = set->ops->update(set, ®s->data[priv->sreg_key], nft_dynset_new, + expr, regs); + if (ext) { if (priv->op == NFT_DYNSET_OP_UPDATE && nft_set_ext_exists(ext, NFT_SET_EXT_TIMEOUT) && READ_ONCE(nft_set_ext_timeout(ext)->timeout) != 0) { diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c index 63ef832b8aa71..40c602ffbcba7 100644 --- a/net/netfilter/nft_lookup.c +++ b/net/netfilter/nft_lookup.c @@ -25,32 +25,33 @@ struct nft_lookup { };
#ifdef CONFIG_MITIGATION_RETPOLINE -bool nft_set_do_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +const struct nft_set_ext * +nft_set_do_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { if (set->ops == &nft_set_hash_fast_type.ops) - return nft_hash_lookup_fast(net, set, key, ext); + return nft_hash_lookup_fast(net, set, key); if (set->ops == &nft_set_hash_type.ops) - return nft_hash_lookup(net, set, key, ext); + return nft_hash_lookup(net, set, key);
if (set->ops == &nft_set_rhash_type.ops) - return nft_rhash_lookup(net, set, key, ext); + return nft_rhash_lookup(net, set, key);
if (set->ops == &nft_set_bitmap_type.ops) - return nft_bitmap_lookup(net, set, key, ext); + return nft_bitmap_lookup(net, set, key);
if (set->ops == &nft_set_pipapo_type.ops) - return nft_pipapo_lookup(net, set, key, ext); + return nft_pipapo_lookup(net, set, key); #if defined(CONFIG_X86_64) && !defined(CONFIG_UML) if (set->ops == &nft_set_pipapo_avx2_type.ops) - return nft_pipapo_avx2_lookup(net, set, key, ext); + return nft_pipapo_avx2_lookup(net, set, key); #endif
if (set->ops == &nft_set_rbtree_type.ops) - return nft_rbtree_lookup(net, set, key, ext); + return nft_rbtree_lookup(net, set, key);
WARN_ON_ONCE(1); - return set->ops->lookup(net, set, key, ext); + return set->ops->lookup(net, set, key); } EXPORT_SYMBOL_GPL(nft_set_do_lookup); #endif @@ -61,12 +62,12 @@ void nft_lookup_eval(const struct nft_expr *expr, { const struct nft_lookup *priv = nft_expr_priv(expr); const struct nft_set *set = priv->set; - const struct nft_set_ext *ext = NULL; const struct net *net = nft_net(pkt); + const struct nft_set_ext *ext; bool found;
- found = nft_set_do_lookup(net, set, ®s->data[priv->sreg], &ext) ^ - priv->invert; + ext = nft_set_do_lookup(net, set, ®s->data[priv->sreg]); + found = !!ext ^ priv->invert; if (!found) { ext = nft_set_catchall_lookup(net, set); if (!ext) { diff --git a/net/netfilter/nft_objref.c b/net/netfilter/nft_objref.c index 09da7a3f9f967..8ee66a86c3bc7 100644 --- a/net/netfilter/nft_objref.c +++ b/net/netfilter/nft_objref.c @@ -111,10 +111,9 @@ void nft_objref_map_eval(const struct nft_expr *expr, struct net *net = nft_net(pkt); const struct nft_set_ext *ext; struct nft_object *obj; - bool found;
- found = nft_set_do_lookup(net, set, ®s->data[priv->sreg], &ext); - if (!found) { + ext = nft_set_do_lookup(net, set, ®s->data[priv->sreg]); + if (!ext) { ext = nft_set_catchall_lookup(net, set); if (!ext) { regs->verdict.code = NFT_BREAK; diff --git a/net/netfilter/nft_set_bitmap.c b/net/netfilter/nft_set_bitmap.c index 8a435cc0e98c4..8d3f040a904a2 100644 --- a/net/netfilter/nft_set_bitmap.c +++ b/net/netfilter/nft_set_bitmap.c @@ -75,16 +75,21 @@ nft_bitmap_active(const u8 *bitmap, u32 idx, u32 off, u8 genmask) }
INDIRECT_CALLABLE_SCOPE -bool nft_bitmap_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +const struct nft_set_ext * +nft_bitmap_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { const struct nft_bitmap *priv = nft_set_priv(set); + static const struct nft_set_ext found; u8 genmask = nft_genmask_cur(net); u32 idx, off;
nft_bitmap_location(set, key, &idx, &off);
- return nft_bitmap_active(priv->bitmap, idx, off, genmask); + if (nft_bitmap_active(priv->bitmap, idx, off, genmask)) + return &found; + + return NULL; }
static struct nft_bitmap_elem * diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c index abb0c8ec63719..9903c737c9f0a 100644 --- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -81,8 +81,9 @@ static const struct rhashtable_params nft_rhash_params = { };
INDIRECT_CALLABLE_SCOPE -bool nft_rhash_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +const struct nft_set_ext * +nft_rhash_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { struct nft_rhash *priv = nft_set_priv(set); const struct nft_rhash_elem *he; @@ -95,9 +96,9 @@ bool nft_rhash_lookup(const struct net *net, const struct nft_set *set,
he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params); if (he != NULL) - *ext = &he->ext; + return &he->ext;
- return !!he; + return NULL; }
static struct nft_elem_priv * @@ -120,14 +121,11 @@ nft_rhash_get(const struct net *net, const struct nft_set *set, return ERR_PTR(-ENOENT); }
-static bool nft_rhash_update(struct nft_set *set, const u32 *key, - struct nft_elem_priv * - (*new)(struct nft_set *, - const struct nft_expr *, - struct nft_regs *regs), - const struct nft_expr *expr, - struct nft_regs *regs, - const struct nft_set_ext **ext) +static const struct nft_set_ext * +nft_rhash_update(struct nft_set *set, const u32 *key, + struct nft_elem_priv *(*new)(struct nft_set *, const struct nft_expr *, + struct nft_regs *regs), + const struct nft_expr *expr, struct nft_regs *regs) { struct nft_rhash *priv = nft_set_priv(set); struct nft_rhash_elem *he, *prev; @@ -161,14 +159,13 @@ static bool nft_rhash_update(struct nft_set *set, const u32 *key, }
out: - *ext = &he->ext; - return true; + return &he->ext;
err2: nft_set_elem_destroy(set, &he->priv, true); atomic_dec(&set->nelems); err1: - return false; + return NULL; }
static int nft_rhash_insert(const struct net *net, const struct nft_set *set, @@ -507,8 +504,9 @@ struct nft_hash_elem { };
INDIRECT_CALLABLE_SCOPE -bool nft_hash_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +const struct nft_set_ext * +nft_hash_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { struct nft_hash *priv = nft_set_priv(set); u8 genmask = nft_genmask_cur(net); @@ -519,12 +517,10 @@ bool nft_hash_lookup(const struct net *net, const struct nft_set *set, hash = reciprocal_scale(hash, priv->buckets); hlist_for_each_entry_rcu(he, &priv->table[hash], node) { if (!memcmp(nft_set_ext_key(&he->ext), key, set->klen) && - nft_set_elem_active(&he->ext, genmask)) { - *ext = &he->ext; - return true; - } + nft_set_elem_active(&he->ext, genmask)) + return &he->ext; } - return false; + return NULL; }
static struct nft_elem_priv * @@ -547,9 +543,9 @@ nft_hash_get(const struct net *net, const struct nft_set *set, }
INDIRECT_CALLABLE_SCOPE -bool nft_hash_lookup_fast(const struct net *net, - const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +const struct nft_set_ext * +nft_hash_lookup_fast(const struct net *net, const struct nft_set *set, + const u32 *key) { struct nft_hash *priv = nft_set_priv(set); u8 genmask = nft_genmask_cur(net); @@ -562,12 +558,10 @@ bool nft_hash_lookup_fast(const struct net *net, hlist_for_each_entry_rcu(he, &priv->table[hash], node) { k2 = *(u32 *)nft_set_ext_key(&he->ext)->data; if (k1 == k2 && - nft_set_elem_active(&he->ext, genmask)) { - *ext = &he->ext; - return true; - } + nft_set_elem_active(&he->ext, genmask)) + return &he->ext; } - return false; + return NULL; }
static u32 nft_jhash(const struct nft_set *set, const struct nft_hash *priv, diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index 9ac48e6b4332c..a844b33fa6002 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -407,8 +407,9 @@ int pipapo_refill(unsigned long *map, unsigned int len, unsigned int rules, * * Return: true on match, false otherwise. */ -bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +const struct nft_set_ext * +nft_pipapo_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { struct nft_pipapo *priv = nft_set_priv(set); struct nft_pipapo_scratch *scratch; @@ -465,13 +466,15 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, scratch->map_index = map_index; local_bh_enable();
- return false; + return NULL; }
if (last) { - *ext = &f->mt[b].e->ext; - if (unlikely(nft_set_elem_expired(*ext) || - !nft_set_elem_active(*ext, genmask))) + const struct nft_set_ext *ext; + + ext = &f->mt[b].e->ext; + if (unlikely(nft_set_elem_expired(ext) || + !nft_set_elem_active(ext, genmask))) goto next_match;
/* Last field: we're just returning the key without @@ -482,7 +485,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, scratch->map_index = map_index; local_bh_enable();
- return true; + return ext; }
/* Swap bitmap indices: res_map is the initial bitmap for the @@ -497,7 +500,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
out: local_bh_enable(); - return false; + return NULL; }
/** diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c index be7c16c79f711..6c441e2dc8af3 100644 --- a/net/netfilter/nft_set_pipapo_avx2.c +++ b/net/netfilter/nft_set_pipapo_avx2.c @@ -1146,8 +1146,9 @@ static inline void pipapo_resmap_init_avx2(const struct nft_pipapo_match *m, uns * * Return: true on match, false otherwise. */ -bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +const struct nft_set_ext * +nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { struct nft_pipapo *priv = nft_set_priv(set); struct nft_pipapo_scratch *scratch; @@ -1155,17 +1156,18 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, const struct nft_pipapo_match *m; const struct nft_pipapo_field *f; const u8 *rp = (const u8 *)key; + const struct nft_set_ext *ext; unsigned long *res, *fill; bool map_index; - int i, ret = 0; + int i;
local_bh_disable();
if (unlikely(!irq_fpu_usable())) { - bool fallback_res = nft_pipapo_lookup(net, set, key, ext); + ext = nft_pipapo_lookup(net, set, key);
local_bh_enable(); - return fallback_res; + return ext; }
m = rcu_dereference(priv->match); @@ -1182,7 +1184,7 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, if (unlikely(!scratch)) { kernel_fpu_end(); local_bh_enable(); - return false; + return NULL; }
map_index = scratch->map_index; @@ -1197,6 +1199,7 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, next_match: nft_pipapo_for_each_field(f, i, m) { bool last = i == m->field_count - 1, first = !i; + int ret = 0;
#define NFT_SET_PIPAPO_AVX2_LOOKUP(b, n) \ (ret = nft_pipapo_avx2_lookup_##b##b_##n(res, fill, f, \ @@ -1244,10 +1247,10 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, goto out;
if (last) { - *ext = &f->mt[ret].e->ext; - if (unlikely(nft_set_elem_expired(*ext) || - !nft_set_elem_active(*ext, genmask))) { - ret = 0; + ext = &f->mt[ret].e->ext; + if (unlikely(nft_set_elem_expired(ext) || + !nft_set_elem_active(ext, genmask))) { + ext = NULL; goto next_match; }
@@ -1264,5 +1267,5 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, kernel_fpu_end(); local_bh_enable();
- return ret >= 0; + return ext; } diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c index 2e8ef16ff191d..938a257c069e2 100644 --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -52,9 +52,9 @@ static bool nft_rbtree_elem_expired(const struct nft_rbtree_elem *rbe) return nft_set_elem_expired(&rbe->ext); }
-static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext, - unsigned int seq) +static const struct nft_set_ext * +__nft_rbtree_lookup(const struct net *net, const struct nft_set *set, + const u32 *key, unsigned int seq) { struct nft_rbtree *priv = nft_set_priv(set); const struct nft_rbtree_elem *rbe, *interval = NULL; @@ -65,7 +65,7 @@ static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set parent = rcu_dereference_raw(priv->root.rb_node); while (parent != NULL) { if (read_seqcount_retry(&priv->count, seq)) - return false; + return NULL;
rbe = rb_entry(parent, struct nft_rbtree_elem, node);
@@ -87,50 +87,48 @@ static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set }
if (nft_rbtree_elem_expired(rbe)) - return false; + return NULL;
if (nft_rbtree_interval_end(rbe)) { if (nft_set_is_anonymous(set)) - return false; + return NULL; parent = rcu_dereference_raw(parent->rb_left); interval = NULL; continue; }
- *ext = &rbe->ext; - return true; + return &rbe->ext; } }
if (set->flags & NFT_SET_INTERVAL && interval != NULL && nft_set_elem_active(&interval->ext, genmask) && !nft_rbtree_elem_expired(interval) && - nft_rbtree_interval_start(interval)) { - *ext = &interval->ext; - return true; - } + nft_rbtree_interval_start(interval)) + return &interval->ext;
- return false; + return NULL; }
INDIRECT_CALLABLE_SCOPE -bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +const struct nft_set_ext * +nft_rbtree_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { struct nft_rbtree *priv = nft_set_priv(set); unsigned int seq = read_seqcount_begin(&priv->count); - bool ret; + const struct nft_set_ext *ext;
- ret = __nft_rbtree_lookup(net, set, key, ext, seq); - if (ret || !read_seqcount_retry(&priv->count, seq)) - return ret; + ext = __nft_rbtree_lookup(net, set, key, seq); + if (ext || !read_seqcount_retry(&priv->count, seq)) + return ext;
read_lock_bh(&priv->lock); seq = read_seqcount_begin(&priv->count); - ret = __nft_rbtree_lookup(net, set, key, ext, seq); + ext = __nft_rbtree_lookup(net, set, key, seq); read_unlock_bh(&priv->lock);
- return ret; + return ext; }
static bool __nft_rbtree_get(const struct net *net, const struct nft_set *set,
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit d8d871a35ca9ee4881d34995444ed1cb826d01db ]
The matching algorithm has been implemented three times:
1. data path lookup, generic version
2. data path lookup, avx2 version
3. control plane lookup
Merge 1 and 3 by refactoring pipapo_get as a common helper, then make nft_pipapo_lookup and nft_pipapo_get both call the common helper.
Aside from the code savings this has the benefit that we no longer allocate temporary scratch maps for each control plane get and insertion operation.
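In condensed form, the control-plane difference looks like this (a sketch of the before/after shape for review, not the literal hunks; the real code is in the diff below):

    /* before: every control-plane get/insert allocated fresh bitmaps */
    res_map  = kmalloc_array(m->bsize_max, sizeof(*res_map), gfp);
    fill_map = kcalloc(m->bsize_max, sizeof(*res_map), gfp);

    /* after: the merged pipapo_get() runs under local_bh_disable() and
     * reuses the per-cpu scratch maps the datapath already maintains
     */
    local_bh_disable();
    scratch = *raw_cpu_ptr(m->scratch);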
Signed-off-by: Florian Westphal fw@strlen.de
Reviewed-by: Stefano Brivio sbrivio@redhat.com
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org
Stable-dep-of: c4eaca2e1052 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/netfilter/nft_set_pipapo.c | 188 ++++++++++-----------------------
 1 file changed, 58 insertions(+), 130 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index a844b33fa6002..1a19649c28511 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -397,35 +397,36 @@ int pipapo_refill(unsigned long *map, unsigned int len, unsigned int rules, }
/** - * nft_pipapo_lookup() - Lookup function - * @net: Network namespace - * @set: nftables API set representation - * @key: nftables API element representation containing key data - * @ext: nftables API extension pointer, filled with matching reference + * pipapo_get() - Get matching element reference given key data + * @m: storage containing the set elements + * @data: Key data to be matched against existing elements + * @genmask: If set, check that element is active in given genmask + * @tstamp: timestamp to check for expired elements * * For more details, see DOC: Theory of Operation. * - * Return: true on match, false otherwise. + * This is the main lookup function. It matches key data against either + * the working match set or the uncommitted copy, depending on what the + * caller passed to us. + * nft_pipapo_get (lookup from userspace/control plane) and nft_pipapo_lookup + * (datapath lookup) pass the active copy. + * The insertion path will pass the uncommitted working copy. + * + * Return: pointer to &struct nft_pipapo_elem on match, NULL otherwise. */ -const struct nft_set_ext * -nft_pipapo_lookup(const struct net *net, const struct nft_set *set, - const u32 *key) +static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m, + const u8 *data, u8 genmask, + u64 tstamp) { - struct nft_pipapo *priv = nft_set_priv(set); struct nft_pipapo_scratch *scratch; unsigned long *res_map, *fill_map; - u8 genmask = nft_genmask_cur(net); - const struct nft_pipapo_match *m; const struct nft_pipapo_field *f; - const u8 *rp = (const u8 *)key; bool map_index; int i;
local_bh_disable();
- m = rcu_dereference(priv->match); - - if (unlikely(!m || !*raw_cpu_ptr(m->scratch))) + if (unlikely(!raw_cpu_ptr(m->scratch))) goto out;
scratch = *raw_cpu_ptr(m->scratch); @@ -445,12 +446,12 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set, * packet bytes value, then AND bucket value */ if (likely(f->bb == 8)) - pipapo_and_field_buckets_8bit(f, res_map, rp); + pipapo_and_field_buckets_8bit(f, res_map, data); else - pipapo_and_field_buckets_4bit(f, res_map, rp); + pipapo_and_field_buckets_4bit(f, res_map, data); NFT_PIPAPO_GROUP_BITS_ARE_8_OR_4;
- rp += f->groups / NFT_PIPAPO_GROUPS_PER_BYTE(f); + data += f->groups / NFT_PIPAPO_GROUPS_PER_BYTE(f);
/* Now populate the bitmap for the next field, unless this is * the last field, in which case return the matched 'ext' @@ -470,11 +471,11 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set, }
if (last) { - const struct nft_set_ext *ext; + struct nft_pipapo_elem *e;
- ext = &f->mt[b].e->ext; - if (unlikely(nft_set_elem_expired(ext) || - !nft_set_elem_active(ext, genmask))) + e = f->mt[b].e; + if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) || + !nft_set_elem_active(&e->ext, genmask))) goto next_match;
/* Last field: we're just returning the key without @@ -484,8 +485,7 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set, */ scratch->map_index = map_index; local_bh_enable(); - - return ext; + return e; }
/* Swap bitmap indices: res_map is the initial bitmap for the @@ -495,7 +495,7 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set, map_index = !map_index; swap(res_map, fill_map);
- rp += NFT_PIPAPO_GROUPS_PADDING(f); + data += NFT_PIPAPO_GROUPS_PADDING(f); }
out: @@ -504,99 +504,29 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set, }
/** - * pipapo_get() - Get matching element reference given key data - * @m: storage containing active/existing elements - * @data: Key data to be matched against existing elements - * @genmask: If set, check that element is active in given genmask - * @tstamp: timestamp to check for expired elements - * @gfp: the type of memory to allocate (see kmalloc). + * nft_pipapo_lookup() - Dataplane frontend for main lookup function + * @net: Network namespace + * @set: nftables API set representation + * @key: pointer to nft registers containing key data * - * This is essentially the same as the lookup function, except that it matches - * key data against the uncommitted copy and doesn't use preallocated maps for - * bitmap results. + * This function is called from the data path. It will search for + * an element matching the given key in the current active copy. * - * Return: pointer to &struct nft_pipapo_elem on match, error pointer otherwise. + * Return: nftables API extension pointer or NULL if no match. */ -static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m, - const u8 *data, u8 genmask, - u64 tstamp, gfp_t gfp) +const struct nft_set_ext * +nft_pipapo_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { - struct nft_pipapo_elem *ret = ERR_PTR(-ENOENT); - unsigned long *res_map, *fill_map = NULL; - const struct nft_pipapo_field *f; - int i; - - if (m->bsize_max == 0) - return ret; - - res_map = kmalloc_array(m->bsize_max, sizeof(*res_map), gfp); - if (!res_map) { - ret = ERR_PTR(-ENOMEM); - goto out; - } - - fill_map = kcalloc(m->bsize_max, sizeof(*res_map), gfp); - if (!fill_map) { - ret = ERR_PTR(-ENOMEM); - goto out; - } - - pipapo_resmap_init(m, res_map); - - nft_pipapo_for_each_field(f, i, m) { - bool last = i == m->field_count - 1; - int b; - - /* For each bit group: select lookup table bucket depending on - * packet bytes value, then AND bucket value - */ - if (f->bb == 8) - pipapo_and_field_buckets_8bit(f, res_map, data); - else if (f->bb == 4) - pipapo_and_field_buckets_4bit(f, res_map, data); - else - BUG(); - - data += f->groups / NFT_PIPAPO_GROUPS_PER_BYTE(f); - - /* Now populate the bitmap for the next field, unless this is - * the last field, in which case return the matched 'ext' - * pointer if any. - * - * Now res_map contains the matching bitmap, and fill_map is the - * bitmap for the next field. - */ -next_match: - b = pipapo_refill(res_map, f->bsize, f->rules, fill_map, f->mt, - last); - if (b < 0) - goto out; - - if (last) { - if (__nft_set_elem_expired(&f->mt[b].e->ext, tstamp)) - goto next_match; - if ((genmask && - !nft_set_elem_active(&f->mt[b].e->ext, genmask))) - goto next_match; - - ret = f->mt[b].e; - goto out; - } - - data += NFT_PIPAPO_GROUPS_PADDING(f); + struct nft_pipapo *priv = nft_set_priv(set); + u8 genmask = nft_genmask_cur(net); + const struct nft_pipapo_match *m; + const struct nft_pipapo_elem *e;
- /* Swap bitmap indices: fill_map will be the initial bitmap for - * the next field (i.e. the new res_map), and res_map is - * guaranteed to be all-zeroes at this point, ready to be filled - * according to the next mapping table. - */ - swap(res_map, fill_map); - } + m = rcu_dereference(priv->match); + e = pipapo_get(m, (const u8 *)key, genmask, get_jiffies_64());
-out: - kfree(fill_map); - kfree(res_map); - return ret; + return e ? &e->ext : NULL; }
/** @@ -605,6 +535,11 @@ static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m, * @set: nftables API set representation * @elem: nftables API element representation containing key data * @flags: Unused + * + * This function is called from the control plane path under + * RCU read lock. + * + * Return: set element private pointer or ERR_PTR(-ENOENT). */ static struct nft_elem_priv * nft_pipapo_get(const struct net *net, const struct nft_set *set, @@ -615,10 +550,9 @@ nft_pipapo_get(const struct net *net, const struct nft_set *set, struct nft_pipapo_elem *e;
e = pipapo_get(m, (const u8 *)elem->key.val.data, - nft_genmask_cur(net), get_jiffies_64(), - GFP_ATOMIC); - if (IS_ERR(e)) - return ERR_CAST(e); + nft_genmask_cur(net), get_jiffies_64()); + if (!e) + return ERR_PTR(-ENOENT);
return &e->priv; } @@ -1343,8 +1277,8 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set, else end = start;
- dup = pipapo_get(m, start, genmask, tstamp, GFP_KERNEL); - if (!IS_ERR(dup)) { + dup = pipapo_get(m, start, genmask, tstamp); + if (dup) { /* Check if we already have the same exact entry */ const struct nft_data *dup_key, *dup_end;
@@ -1363,15 +1297,9 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set, return -ENOTEMPTY; }
- if (PTR_ERR(dup) == -ENOENT) { - /* Look for partially overlapping entries */ - dup = pipapo_get(m, end, nft_genmask_next(net), tstamp, - GFP_KERNEL); - } - - if (PTR_ERR(dup) != -ENOENT) { - if (IS_ERR(dup)) - return PTR_ERR(dup); + /* Look for partially overlapping entries */ + dup = pipapo_get(m, end, nft_genmask_next(net), tstamp); + if (dup) { *elem_priv = &dup->priv; return -ENOTEMPTY; } @@ -1913,8 +1841,8 @@ nft_pipapo_deactivate(const struct net *net, const struct nft_set *set, return NULL;
e = pipapo_get(m, (const u8 *)elem->key.val.data, - nft_genmask_next(net), nft_net_tstamp(net), GFP_KERNEL); - if (IS_ERR(e)) + nft_genmask_next(net), nft_net_tstamp(net)); + if (!e) return NULL;
nft_set_elem_change_active(net, set, &e->ext);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit c8a7c2c608180f3b4e51dc958b3861242dcdd76d ]
Dan Carpenter says: Commit 17a20e09f086 ("netfilter: nft_set: remove one argument from lookup and update functions") [..] leads to the following Smatch static checker warning:
net/netfilter/nft_set_pipapo_avx2.c:1269 nft_pipapo_avx2_lookup() error: uninitialized symbol 'ext'.
Fix this by initializing ext to NULL and setting it only once we've found a match.
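Reduced to a minimal pattern (illustrative sketch only, with a hypothetical function name, not the kernel code):

    /* Start from NULL and assign only on a confirmed match, so every
     * early exit path returns a well-defined "no match" value.
     */
    static const struct nft_set_ext *lookup_pattern_sketch(void)
    {
            const struct nft_set_ext *ext = NULL;

            /* ... matching loop; only a verified match does:
             *         ext = &e->ext;
             */
            return ext;     /* NULL means no match, never garbage */
    }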
Fixes: 17a20e09f086 ("netfilter: nft_set: remove one argument from lookup and update functions")
Reported-by: Dan Carpenter dan.carpenter@linaro.org
Closes: https://lore.kernel.org/netfilter-devel/aJBzc3V5wk-yPOnH@stanley.mountain/
Signed-off-by: Florian Westphal fw@strlen.de
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org
Stable-dep-of: c4eaca2e1052 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/netfilter/nft_set_pipapo_avx2.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c index 6c441e2dc8af3..2155c7f345c21 100644 --- a/net/netfilter/nft_set_pipapo_avx2.c +++ b/net/netfilter/nft_set_pipapo_avx2.c @@ -1151,12 +1151,12 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, const u32 *key) { struct nft_pipapo *priv = nft_set_priv(set); + const struct nft_set_ext *ext = NULL; struct nft_pipapo_scratch *scratch; u8 genmask = nft_genmask_cur(net); const struct nft_pipapo_match *m; const struct nft_pipapo_field *f; const u8 *rp = (const u8 *)key; - const struct nft_set_ext *ext; unsigned long *res, *fill; bool map_index; int i; @@ -1247,13 +1247,13 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, goto out;
if (last) { - ext = &f->mt[ret].e->ext; - if (unlikely(nft_set_elem_expired(ext) || - !nft_set_elem_active(ext, genmask))) { - ext = NULL; + const struct nft_set_ext *e = &f->mt[ret].e->ext; + + if (unlikely(nft_set_elem_expired(e) || + !nft_set_elem_active(e, genmask))) goto next_match; - }
+ ext = e; goto out; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit c4eaca2e1052adfd67bed0a36a9d4b8e515666e4 ]
The pipapo set type is special in that it has two copies of its data structure: one live copy containing only valid elements and one on-demand clone used during transactions, where adds/deletes happen.
This clone is not visible to the datapath.
This is unlike all other set types in nftables, which all link new elements into their live hlist/tree.
For those sets, the lookup functions must skip the new elements while the transaction is ongoing to ensure consistency.
As the clone is shallow, removal does have an effect on the packet path: once the transaction enters the commit phase the 'gencursor' bit that determines which elements are active and which elements should be ignored (because they are no longer valid) is flipped.
This causes the datapath lookup to ignore these elements if they are found during lookup.
This opens up a small race window where pipapo has an inconsistent view of the dataset from when the transaction-cpu flipped the genbit until the transaction-cpu calls nft_pipapo_commit() to swap live/clone pointers:
cpu0                                    cpu1
 has added new elements to clone
 has marked elements as being
 inactive in new generation
                                        perform lookup in the set
 enters commit phase:
I) increments the genbit
                                        A) observes new genbit
 removes elements from the clone so
 they won't be found anymore
                                        B) lookup in datastructure
                                           can't see new elements yet,
                                           but old elements are ignored
                                        -> Only matches elements that
                                           were not changed in the
                                           transaction
II) calls nft_pipapo_commit(), clone
    and live pointers are swapped.
                                        C) New nft_lookup happening now
                                           will find matching elements.
Consider a packet P matching range r1-r2:
cpu0 processes the following transaction:
1. remove r1-r2
2. add r1-r3
P is contained in both ranges. Therefore, cpu1 should always find a match for P. Due to the above race, this is not the case:
cpu1 does find r1-r2, but then ignores it due to the genbit indicating the range has been removed.
At the same time, r1-r3 is not visible yet, because it can only be found in the clone.
The situation persists for all lookups until after cpu0 hits II).
The fix is easy: Don't check the genbit from pipapo lookup functions. This is possible because unlike the other set types, the new elements are not reachable from the live copy of the dataset.
The clone/live pointer swap is enough to avoid matching on old elements while at the same time all new elements are exposed in one go.
After this change, step B above returns a match in r1-r2. This is fine: r1-r2 only becomes truly invalid the moment it is freed. This happens after a synchronize_rcu() call, and the rcu read lock is held via netfilter hook traversal (nf_hook_slow()).
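The datapath frontend change itself is a one-argument swap (condensed from the diff below):

    /* before */
    e = pipapo_get(m, (const u8 *)key, nft_genmask_cur(net), get_jiffies_64());

    /* after: accept both generations; consistency comes from the
     * clone/live pointer swap in nft_pipapo_commit()
     */
    e = pipapo_get(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());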
Cc: Stefano Brivio sbrivio@redhat.com
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Florian Westphal fw@strlen.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/netfilter/nft_set_pipapo.c      | 20 ++++++++++++++++++--
 net/netfilter/nft_set_pipapo_avx2.c |  4 +---
 2 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index 1a19649c28511..fa6741b3205a6 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -511,6 +511,23 @@ static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m, * * This function is called from the data path. It will search for * an element matching the given key in the current active copy. + * Unlike other set types, this uses NFT_GENMASK_ANY instead of + * nft_genmask_cur(). + * + * This is because new (future) elements are not reachable from + * priv->match, they get added to priv->clone instead. + * When the commit phase flips the generation bitmask, the + * 'now old' entries are skipped but without the 'now current' + * elements becoming visible. Using nft_genmask_cur() thus creates + * inconsistent state: matching old entries get skipped but the + * newly matching entries are unreachable. + * + * GENMASK will still find the 'now old' entries which ensures consistent + * priv->match view. + * + * nft_pipapo_commit swaps ->clone and ->match shortly after the + * genbit flip. As ->clone doesn't contain the old entries in the first + * place, lookup will only find the now-current ones. * * Return: nftables API extension pointer or NULL if no match. */ @@ -519,12 +536,11 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set, const u32 *key) { struct nft_pipapo *priv = nft_set_priv(set); - u8 genmask = nft_genmask_cur(net); const struct nft_pipapo_match *m; const struct nft_pipapo_elem *e;
m = rcu_dereference(priv->match); - e = pipapo_get(m, (const u8 *)key, genmask, get_jiffies_64()); + e = pipapo_get(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
return e ? &e->ext : NULL; } diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c index 2155c7f345c21..39e356c9687a9 100644 --- a/net/netfilter/nft_set_pipapo_avx2.c +++ b/net/netfilter/nft_set_pipapo_avx2.c @@ -1153,7 +1153,6 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, struct nft_pipapo *priv = nft_set_priv(set); const struct nft_set_ext *ext = NULL; struct nft_pipapo_scratch *scratch; - u8 genmask = nft_genmask_cur(net); const struct nft_pipapo_match *m; const struct nft_pipapo_field *f; const u8 *rp = (const u8 *)key; @@ -1249,8 +1248,7 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, if (last) { const struct nft_set_ext *e = &f->mt[ret].e->ext;
- if (unlikely(nft_set_elem_expired(e) || - !nft_set_elem_active(e, genmask))) + if (unlikely(nft_set_elem_expired(e))) goto next_match;
ext = e;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit a60f7bf4a1524d8896b76ba89623080aebf44272 ]
When the rbtree lookup function finds a match in the rbtree, it sets the range start interval to a potentially inactive element.
Then, after tree lookup, if the matching element is inactive, it returns NULL and suppresses a matching result.
This is wrong and leads to false negative matches when a transaction has already entered the commit phase.
cpu0                                    cpu1
 has added new elements to clone
 has marked elements as being
 inactive in new generation
                                        perform lookup in the set
 enters commit phase:
I) increments the genbit
                                        A) observes new genbit
                                        B) finds matching range
                                        C) returns no match: found range
                                           invalid in new generation
II) removes old elements from the tree
                                        C) New nft_lookup happening now
                                           will find matching element,
                                           because it is no longer
                                           obscured by old, inactive one.
Consider a packet P matching range r1-r2:
cpu0 processes the following transaction:
1. remove r1-r2
2. add r1-r3
P is contained in both ranges. Therefore, cpu1 should always find a match for P. Due to the above race, this is not the case:
cpu1 does find r1-r2, but then ignores it due to the genbit indicating the range has been removed. It does NOT test for further matches.
The situation persists for all lookups until after cpu0 hits II) after which r1-r3 range start node is tested for the first time.
Move the "interval start is valid" check ahead so that tree traversal continues if the starting interval is not valid in this generation.
Thanks to Stefan Hanreich for providing an initial reproducer for this bug.
Reported-by: Stefan Hanreich s.hanreich@proxmox.com
Fixes: c1eda3c6394f ("netfilter: nft_rbtree: ignore inactive matching element with no descendants")
Signed-off-by: Florian Westphal fw@strlen.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/netfilter/nft_set_rbtree.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c index 938a257c069e2..b1f04168ec937 100644 --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -77,7 +77,9 @@ __nft_rbtree_lookup(const struct net *net, const struct nft_set *set, nft_rbtree_interval_end(rbe) && nft_rbtree_interval_start(interval)) continue; - interval = rbe; + if (nft_set_elem_active(&rbe->ext, genmask) && + !nft_rbtree_elem_expired(rbe)) + interval = rbe; } else if (d > 0) parent = rcu_dereference_raw(parent->rb_right); else { @@ -102,8 +104,6 @@ __nft_rbtree_lookup(const struct net *net, const struct nft_set *set, }
if (set->flags & NFT_SET_INTERVAL && interval != NULL && - nft_set_elem_active(&interval->ext, genmask) && - !nft_rbtree_elem_expired(interval) && nft_rbtree_interval_start(interval)) return &interval->ext;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit a1050dd071682d2c9d8d6d5c96119f8f401b62f0 ]
Restore commit 28339b21a365 ("netfilter: nf_tables: do not send complete notification of deletions") and fix it:
- Avoid upfront modification of 'event' variable so the conditionals
  become effective.
- Always include NFTA_OBJ_TYPE attribute in object notifications, user
  space requires it for proper deserialisation.
- Catch DESTROY events, too.
Signed-off-by: Phil Sutter phil@nwl.cc
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org
Stable-dep-of: b2f742c846ca ("netfilter: nf_tables: restart set lookup on base_seq change")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/netfilter/nf_tables_api.c | 67 ++++++++++++++++++++++++++---------
 1 file changed, 50 insertions(+), 17 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 0e86434ca13b0..3a443765d7e90 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -1153,9 +1153,9 @@ static int nf_tables_fill_table_info(struct sk_buff *skb, struct net *net, { struct nlmsghdr *nlh;
- event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event); - nlh = nfnl_msg_put(skb, portid, seq, event, flags, family, - NFNETLINK_V0, nft_base_seq(net)); + nlh = nfnl_msg_put(skb, portid, seq, + nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), + flags, family, NFNETLINK_V0, nft_base_seq(net)); if (!nlh) goto nla_put_failure;
@@ -1165,6 +1165,12 @@ static int nf_tables_fill_table_info(struct sk_buff *skb, struct net *net, NFTA_TABLE_PAD)) goto nla_put_failure;
+ if (event == NFT_MSG_DELTABLE || + event == NFT_MSG_DESTROYTABLE) { + nlmsg_end(skb, nlh); + return 0; + } + if (nla_put_be32(skb, NFTA_TABLE_FLAGS, htonl(table->flags & NFT_TABLE_F_MASK))) goto nla_put_failure; @@ -2022,9 +2028,9 @@ static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net, { struct nlmsghdr *nlh;
- event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event); - nlh = nfnl_msg_put(skb, portid, seq, event, flags, family, - NFNETLINK_V0, nft_base_seq(net)); + nlh = nfnl_msg_put(skb, portid, seq, + nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), + flags, family, NFNETLINK_V0, nft_base_seq(net)); if (!nlh) goto nla_put_failure;
@@ -2034,6 +2040,13 @@ static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net, NFTA_CHAIN_PAD)) goto nla_put_failure;
+ if (!hook_list && + (event == NFT_MSG_DELCHAIN || + event == NFT_MSG_DESTROYCHAIN)) { + nlmsg_end(skb, nlh); + return 0; + } + if (nft_is_base_chain(chain)) { const struct nft_base_chain *basechain = nft_base_chain(chain); struct nft_stats __percpu *stats; @@ -4871,9 +4884,10 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx, u32 seq = ctx->seq; int i;
- event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event); - nlh = nfnl_msg_put(skb, portid, seq, event, flags, ctx->family, - NFNETLINK_V0, nft_base_seq(ctx->net)); + nlh = nfnl_msg_put(skb, portid, seq, + nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), + flags, ctx->family, NFNETLINK_V0, + nft_base_seq(ctx->net)); if (!nlh) goto nla_put_failure;
@@ -4885,6 +4899,12 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx, NFTA_SET_PAD)) goto nla_put_failure;
+ if (event == NFT_MSG_DELSET || + event == NFT_MSG_DESTROYSET) { + nlmsg_end(skb, nlh); + return 0; + } + if (set->flags != 0) if (nla_put_be32(skb, NFTA_SET_FLAGS, htonl(set->flags))) goto nla_put_failure; @@ -8359,20 +8379,26 @@ static int nf_tables_fill_obj_info(struct sk_buff *skb, struct net *net, { struct nlmsghdr *nlh;
- event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event); - nlh = nfnl_msg_put(skb, portid, seq, event, flags, family, - NFNETLINK_V0, nft_base_seq(net)); + nlh = nfnl_msg_put(skb, portid, seq, + nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), + flags, family, NFNETLINK_V0, nft_base_seq(net)); if (!nlh) goto nla_put_failure;
if (nla_put_string(skb, NFTA_OBJ_TABLE, table->name) || nla_put_string(skb, NFTA_OBJ_NAME, obj->key.name) || + nla_put_be32(skb, NFTA_OBJ_TYPE, htonl(obj->ops->type->type)) || nla_put_be64(skb, NFTA_OBJ_HANDLE, cpu_to_be64(obj->handle), NFTA_OBJ_PAD)) goto nla_put_failure;
- if (nla_put_be32(skb, NFTA_OBJ_TYPE, htonl(obj->ops->type->type)) || - nla_put_be32(skb, NFTA_OBJ_USE, htonl(obj->use)) || + if (event == NFT_MSG_DELOBJ || + event == NFT_MSG_DESTROYOBJ) { + nlmsg_end(skb, nlh); + return 0; + } + + if (nla_put_be32(skb, NFTA_OBJ_USE, htonl(obj->use)) || nft_object_dump(skb, NFTA_OBJ_DATA, obj, reset)) goto nla_put_failure;
@@ -9413,9 +9439,9 @@ static int nf_tables_fill_flowtable_info(struct sk_buff *skb, struct net *net, struct nft_hook *hook; struct nlmsghdr *nlh;
- event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event); - nlh = nfnl_msg_put(skb, portid, seq, event, flags, family, - NFNETLINK_V0, nft_base_seq(net)); + nlh = nfnl_msg_put(skb, portid, seq, + nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), + flags, family, NFNETLINK_V0, nft_base_seq(net)); if (!nlh) goto nla_put_failure;
@@ -9425,6 +9451,13 @@ static int nf_tables_fill_flowtable_info(struct sk_buff *skb, struct net *net, NFTA_FLOWTABLE_PAD)) goto nla_put_failure;
+ if (!hook_list && + (event == NFT_MSG_DELFLOWTABLE || + event == NFT_MSG_DESTROYFLOWTABLE)) { + nlmsg_end(skb, nlh); + return 0; + } + if (nla_put_be32(skb, NFTA_FLOWTABLE_USE, htonl(flowtable->use)) || nla_put_be32(skb, NFTA_FLOWTABLE_FLAGS, htonl(flowtable->data.flags))) goto nla_put_failure;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 64102d9bbc3d41dac5188b8fba75b1344c438970 ]
This will soon be read from the packet path at around the same time as the gencursor.
Both gencursor and base_seq get incremented almost at the same time, so it makes sense to place them in the same structure.
This doesn't increase struct net size on 64bit due to padding.
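The resulting per-netns structure is tiny (as in the diff below); on 64-bit the u8 gencursor sits in the padding that follows base_seq:

    struct netns_nftables {
            unsigned int    base_seq;
            u8              gencursor;
    };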
Signed-off-by: Florian Westphal fw@strlen.de
Stable-dep-of: b2f742c846ca ("netfilter: nf_tables: restart set lookup on base_seq change")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/net/netfilter/nf_tables.h |  1 -
 include/net/netns/nftables.h      |  1 +
 net/netfilter/nf_tables_api.c     | 65 ++++++++++++++++---------------
 3 files changed, 34 insertions(+), 33 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index cf65703e221fa..16daeac2ac555 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -1916,7 +1916,6 @@ struct nftables_pernet { struct mutex commit_mutex; u64 table_handle; u64 tstamp; - unsigned int base_seq; unsigned int gc_seq; u8 validate_state; struct work_struct destroy_work; diff --git a/include/net/netns/nftables.h b/include/net/netns/nftables.h index cc8060c017d5f..99dd166c5d07c 100644 --- a/include/net/netns/nftables.h +++ b/include/net/netns/nftables.h @@ -3,6 +3,7 @@ #define _NETNS_NFTABLES_H_
struct netns_nftables { + unsigned int base_seq; u8 gencursor; };
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 3a443765d7e90..5ea7a015504bb 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -1131,11 +1131,14 @@ nf_tables_chain_type_lookup(struct net *net, const struct nlattr *nla, return ERR_PTR(-ENOENT); }
-static __be16 nft_base_seq(const struct net *net) +static unsigned int nft_base_seq(const struct net *net) { - struct nftables_pernet *nft_net = nft_pernet(net); + return READ_ONCE(net->nft.base_seq); +}
- return htons(nft_net->base_seq & 0xffff); +static __be16 nft_base_seq_be16(const struct net *net) +{ + return htons(nft_base_seq(net) & 0xffff); }
static const struct nla_policy nft_table_policy[NFTA_TABLE_MAX + 1] = { @@ -1155,7 +1158,7 @@ static int nf_tables_fill_table_info(struct sk_buff *skb, struct net *net,
nlh = nfnl_msg_put(skb, portid, seq, nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), - flags, family, NFNETLINK_V0, nft_base_seq(net)); + flags, family, NFNETLINK_V0, nft_base_seq_be16(net)); if (!nlh) goto nla_put_failure;
@@ -1248,7 +1251,7 @@ static int nf_tables_dump_tables(struct sk_buff *skb,
rcu_read_lock(); nft_net = nft_pernet(net); - cb->seq = READ_ONCE(nft_net->base_seq); + cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) { if (family != NFPROTO_UNSPEC && family != table->family) @@ -2030,7 +2033,7 @@ static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net,
nlh = nfnl_msg_put(skb, portid, seq, nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), - flags, family, NFNETLINK_V0, nft_base_seq(net)); + flags, family, NFNETLINK_V0, nft_base_seq_be16(net)); if (!nlh) goto nla_put_failure;
@@ -2133,7 +2136,7 @@ static int nf_tables_dump_chains(struct sk_buff *skb,
rcu_read_lock(); nft_net = nft_pernet(net); - cb->seq = READ_ONCE(nft_net->base_seq); + cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) { if (family != NFPROTO_UNSPEC && family != table->family) @@ -3671,7 +3674,7 @@ static int nf_tables_fill_rule_info(struct sk_buff *skb, struct net *net, u16 type = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event);
nlh = nfnl_msg_put(skb, portid, seq, type, flags, family, NFNETLINK_V0, - nft_base_seq(net)); + nft_base_seq_be16(net)); if (!nlh) goto nla_put_failure;
@@ -3839,7 +3842,7 @@ static int nf_tables_dump_rules(struct sk_buff *skb,
rcu_read_lock(); nft_net = nft_pernet(net); - cb->seq = READ_ONCE(nft_net->base_seq); + cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) { if (family != NFPROTO_UNSPEC && family != table->family) @@ -4050,7 +4053,7 @@ static int nf_tables_getrule_reset(struct sk_buff *skb, buf = kasprintf(GFP_ATOMIC, "%.*s:%u", nla_len(nla[NFTA_RULE_TABLE]), (char *)nla_data(nla[NFTA_RULE_TABLE]), - nft_net->base_seq); + nft_base_seq(net)); audit_log_nfcfg(buf, info->nfmsg->nfgen_family, 1, AUDIT_NFT_OP_RULE_RESET, GFP_ATOMIC); kfree(buf); @@ -4887,7 +4890,7 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx, nlh = nfnl_msg_put(skb, portid, seq, nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), flags, ctx->family, NFNETLINK_V0, - nft_base_seq(ctx->net)); + nft_base_seq_be16(ctx->net)); if (!nlh) goto nla_put_failure;
@@ -5032,7 +5035,7 @@ static int nf_tables_dump_sets(struct sk_buff *skb, struct netlink_callback *cb)
rcu_read_lock(); nft_net = nft_pernet(net); - cb->seq = READ_ONCE(nft_net->base_seq); + cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) { if (ctx->family != NFPROTO_UNSPEC && @@ -6209,7 +6212,7 @@ static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb)
rcu_read_lock(); nft_net = nft_pernet(net); - cb->seq = READ_ONCE(nft_net->base_seq); + cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) { if (dump_ctx->ctx.family != NFPROTO_UNSPEC && @@ -6238,7 +6241,7 @@ static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb) seq = cb->nlh->nlmsg_seq;
nlh = nfnl_msg_put(skb, portid, seq, event, NLM_F_MULTI, - table->family, NFNETLINK_V0, nft_base_seq(net)); + table->family, NFNETLINK_V0, nft_base_seq_be16(net)); if (!nlh) goto nla_put_failure;
@@ -6331,7 +6334,7 @@ static int nf_tables_fill_setelem_info(struct sk_buff *skb,
event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event); nlh = nfnl_msg_put(skb, portid, seq, event, flags, ctx->family, - NFNETLINK_V0, nft_base_seq(ctx->net)); + NFNETLINK_V0, nft_base_seq_be16(ctx->net)); if (!nlh) goto nla_put_failure;
@@ -6630,7 +6633,7 @@ static int nf_tables_getsetelem_reset(struct sk_buff *skb, } nelems++; } - audit_log_nft_set_reset(dump_ctx.ctx.table, nft_net->base_seq, nelems); + audit_log_nft_set_reset(dump_ctx.ctx.table, nft_base_seq(info->net), nelems);
out_unlock: rcu_read_unlock(); @@ -8381,7 +8384,7 @@ static int nf_tables_fill_obj_info(struct sk_buff *skb, struct net *net,
nlh = nfnl_msg_put(skb, portid, seq, nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), - flags, family, NFNETLINK_V0, nft_base_seq(net)); + flags, family, NFNETLINK_V0, nft_base_seq_be16(net)); if (!nlh) goto nla_put_failure;
@@ -8446,7 +8449,7 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb)
rcu_read_lock(); nft_net = nft_pernet(net); - cb->seq = READ_ONCE(nft_net->base_seq); + cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) { if (family != NFPROTO_UNSPEC && family != table->family) @@ -8480,7 +8483,7 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) idx++; } if (ctx->reset && entries) - audit_log_obj_reset(table, nft_net->base_seq, entries); + audit_log_obj_reset(table, nft_base_seq(net), entries); if (rc < 0) break; } @@ -8649,7 +8652,7 @@ static int nf_tables_getobj_reset(struct sk_buff *skb, buf = kasprintf(GFP_ATOMIC, "%.*s:%u", nla_len(nla[NFTA_OBJ_TABLE]), (char *)nla_data(nla[NFTA_OBJ_TABLE]), - nft_net->base_seq); + nft_base_seq(net)); audit_log_nfcfg(buf, info->nfmsg->nfgen_family, 1, AUDIT_NFT_OP_OBJ_RESET, GFP_ATOMIC); kfree(buf); @@ -8754,9 +8757,8 @@ void nft_obj_notify(struct net *net, const struct nft_table *table, struct nft_object *obj, u32 portid, u32 seq, int event, u16 flags, int family, int report, gfp_t gfp) { - struct nftables_pernet *nft_net = nft_pernet(net); char *buf = kasprintf(gfp, "%s:%u", - table->name, nft_net->base_seq); + table->name, nft_base_seq(net));
audit_log_nfcfg(buf, family, @@ -9441,7 +9443,7 @@ static int nf_tables_fill_flowtable_info(struct sk_buff *skb, struct net *net,
nlh = nfnl_msg_put(skb, portid, seq, nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), - flags, family, NFNETLINK_V0, nft_base_seq(net)); + flags, family, NFNETLINK_V0, nft_base_seq_be16(net)); if (!nlh) goto nla_put_failure;
@@ -9510,7 +9512,7 @@ static int nf_tables_dump_flowtable(struct sk_buff *skb,
rcu_read_lock(); nft_net = nft_pernet(net); - cb->seq = READ_ONCE(nft_net->base_seq); + cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) { if (family != NFPROTO_UNSPEC && family != table->family) @@ -9695,17 +9697,16 @@ static void nf_tables_flowtable_destroy(struct nft_flowtable *flowtable) static int nf_tables_fill_gen_info(struct sk_buff *skb, struct net *net, u32 portid, u32 seq) { - struct nftables_pernet *nft_net = nft_pernet(net); struct nlmsghdr *nlh; char buf[TASK_COMM_LEN]; int event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, NFT_MSG_NEWGEN);
nlh = nfnl_msg_put(skb, portid, seq, event, 0, AF_UNSPEC, - NFNETLINK_V0, nft_base_seq(net)); + NFNETLINK_V0, nft_base_seq_be16(net)); if (!nlh) goto nla_put_failure;
- if (nla_put_be32(skb, NFTA_GEN_ID, htonl(nft_net->base_seq)) || + if (nla_put_be32(skb, NFTA_GEN_ID, htonl(nft_base_seq(net))) || nla_put_be32(skb, NFTA_GEN_PROC_PID, htonl(task_pid_nr(current))) || nla_put_string(skb, NFTA_GEN_PROC_NAME, get_task_comm(buf, current))) goto nla_put_failure; @@ -10966,11 +10967,11 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) * Bump generation counter, invalidate any dump in progress. * Cannot fail after this point. */ - base_seq = READ_ONCE(nft_net->base_seq); + base_seq = nft_base_seq(net); while (++base_seq == 0) ;
- WRITE_ONCE(nft_net->base_seq, base_seq); + WRITE_ONCE(net->nft.base_seq, base_seq);
gc_seq = nft_gc_seq_begin(nft_net);
@@ -11179,7 +11180,7 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
nft_commit_notify(net, NETLINK_CB(skb).portid); nf_tables_gen_notify(net, skb, NFT_MSG_NEWGEN); - nf_tables_commit_audit_log(&adl, nft_net->base_seq); + nf_tables_commit_audit_log(&adl, nft_base_seq(net));
nft_gc_seq_end(nft_net, gc_seq); nft_net->validate_state = NFT_VALIDATE_SKIP; @@ -11504,7 +11505,7 @@ static bool nf_tables_valid_genid(struct net *net, u32 genid) mutex_lock(&nft_net->commit_mutex); nft_net->tstamp = get_jiffies_64();
- genid_ok = genid == 0 || nft_net->base_seq == genid; + genid_ok = genid == 0 || nft_base_seq(net) == genid; if (!genid_ok) mutex_unlock(&nft_net->commit_mutex);
@@ -12141,7 +12142,7 @@ static int __net_init nf_tables_init_net(struct net *net) INIT_LIST_HEAD(&nft_net->module_list); INIT_LIST_HEAD(&nft_net->notify_list); mutex_init(&nft_net->commit_mutex); - nft_net->base_seq = 1; + net->nft.base_seq = 1; nft_net->gc_seq = 0; nft_net->validate_state = NFT_VALIDATE_SKIP; INIT_WORK(&nft_net->destroy_work, nf_tables_trans_destroy_work);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 11fe5a82e53ac3581a80c88e0e35fb8a80e15f48 ]
This function was added for retpoline mitigation and is replaced by a static inline helper if mitigations are not enabled.
Enable this helper function unconditionally so the next patch can add a lookup restart mechanism to fix possible false negatives while transactions are in progress.
Adding lookup restarts in nft_lookup_eval doesn't work, as nft_objref would then need the same copy-pasted loop.
This patch is separate to ease review of the actual bug fix.
Suggested-by: Pablo Neira Ayuso pablo@netfilter.org
Signed-off-by: Florian Westphal fw@strlen.de
Stable-dep-of: b2f742c846ca ("netfilter: nf_tables: restart set lookup on base_seq change")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/net/netfilter/nf_tables_core.h | 10 ++--------
 net/netfilter/nft_lookup.c             | 17 ++++++++++++-----
 2 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h index 6a52fb97b8443..04699eac5b524 100644 --- a/include/net/netfilter/nf_tables_core.h +++ b/include/net/netfilter/nf_tables_core.h @@ -109,17 +109,11 @@ nft_hash_lookup_fast(const struct net *net, const struct nft_set *set, const struct nft_set_ext * nft_hash_lookup(const struct net *net, const struct nft_set *set, const u32 *key); +#endif + const struct nft_set_ext * nft_set_do_lookup(const struct net *net, const struct nft_set *set, const u32 *key); -#else -static inline const struct nft_set_ext * -nft_set_do_lookup(const struct net *net, const struct nft_set *set, - const u32 *key) -{ - return set->ops->lookup(net, set, key); -} -#endif
/* called from nft_pipapo_avx2.c */ const struct nft_set_ext * diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c index 40c602ffbcba7..2c6909bf1b407 100644 --- a/net/netfilter/nft_lookup.c +++ b/net/netfilter/nft_lookup.c @@ -24,11 +24,11 @@ struct nft_lookup { struct nft_set_binding binding; };
-#ifdef CONFIG_MITIGATION_RETPOLINE -const struct nft_set_ext * -nft_set_do_lookup(const struct net *net, const struct nft_set *set, - const u32 *key) +static const struct nft_set_ext * +__nft_set_do_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { +#ifdef CONFIG_MITIGATION_RETPOLINE if (set->ops == &nft_set_hash_fast_type.ops) return nft_hash_lookup_fast(net, set, key); if (set->ops == &nft_set_hash_type.ops) @@ -51,10 +51,17 @@ nft_set_do_lookup(const struct net *net, const struct nft_set *set, return nft_rbtree_lookup(net, set, key);
WARN_ON_ONCE(1); +#endif return set->ops->lookup(net, set, key); } + +const struct nft_set_ext * +nft_set_do_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) +{ + return __nft_set_do_lookup(net, set, key); +} EXPORT_SYMBOL_GPL(nft_set_do_lookup); -#endif
void nft_lookup_eval(const struct nft_expr *expr, struct nft_regs *regs,
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit b2f742c846cab9afc5953a5d8f17b54922dcc723 ]
The hash, hash_fast, rhash and bitwise sets may indicate no result even though a matching element exists, during a short time window while another cpu is finalizing the transaction.
This happens when the hash lookup/bitwise lookup function has picked up the old genbit, right before it was toggled by nf_tables_commit(), but the committing cpu then managed to unlink the matching old element from the hash table:
cpu0                                    cpu1
 has added new elements to clone
 has marked elements as being
 inactive in new generation
                                        perform lookup in the set
 enters commit phase:
                                        A) observes old genbit
 increments base_seq
I) increments the genbit
II) removes old element from the set
                                        B) finds matching element
                                        C) returns no match: found element
                                           is not valid in old generation
Next lookup observes new genbit and finds matching e2.
Consider a packet P matching elements e1 and e2.
cpu0 processes the following transaction:
1. remove e1
2. add e2, which has the same key as e1
P matches both e1 and e2. Therefore, cpu1 should always find a match for P. Due to the above race, this is not the case:
cpu1 observed the old genbit. e2 will not be considered once it is found. The element e1 is not found anymore if cpu0 managed to unlink it from the hlist before cpu1 found it during list traversal.
The situation only occurs for a brief time period; lookups happening after I) observe the new genbit and return e2.
This problem exists in all set types except nft_set_pipapo, so fix it once in nft_lookup rather than each set ops individually.
Sample the base sequence counter, which gets incremented right before the genbit is changed.
Then, if no match is found, retry the lookup if the base sequence was altered in between.
If the base sequence hasn't changed:
- No update took place: no-match result is expected. This is the common
  case.
or:
- nf_tables_commit() hasn't progressed to genbit update yet. Old
  elements were still visible and nomatch result is expected.
or:
- nf_tables_commit updated the genbit: We picked up the new base_seq,
  so the lookup function also picked up the new genbit, no-match result
  is expected.
If the old genbit was observed, then nft_lookup also picked up the old base_seq: nft_lookup_should_retry() returns true and relookup is performed in the new generation.
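Condensed, the retry loop added to nft_set_do_lookup() is (full version in the diff below):

    do {
            base_seq = nft_base_seq(net);   /* smp_load_acquire() */

            ext = __nft_set_do_lookup(net, set, key);
            if (ext)
                    break;
    } while (nft_lookup_should_retry(net, base_seq));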
This problem was added when the unconditional synchronize_rcu() call that followed the current/next generation bit toggle was removed.
Thanks to Pablo Neira Ayuso for reviewing an earlier version of this patchset, for suggesting re-use of existing base_seq and placement of the restart loop in nft_set_do_lookup().
Fixes: 0cbc06b3faba ("netfilter: nf_tables: remove synchronize_rcu in commit phase")
Signed-off-by: Florian Westphal fw@strlen.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/netfilter/nf_tables_api.c |  3 ++-
 net/netfilter/nft_lookup.c    | 31 ++++++++++++++++++++++++++++++-
 2 files changed, 32 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 5ea7a015504bb..cde63e5f18d8f 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -10971,7 +10971,8 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) while (++base_seq == 0) ;
- WRITE_ONCE(net->nft.base_seq, base_seq); + /* pairs with smp_load_acquire in nft_lookup_eval */ + smp_store_release(&net->nft.base_seq, base_seq);
gc_seq = nft_gc_seq_begin(nft_net);
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c index 2c6909bf1b407..58c5b14889c47 100644 --- a/net/netfilter/nft_lookup.c +++ b/net/netfilter/nft_lookup.c @@ -55,11 +55,40 @@ __nft_set_do_lookup(const struct net *net, const struct nft_set *set, return set->ops->lookup(net, set, key); }
+static unsigned int nft_base_seq(const struct net *net) +{ + /* pairs with smp_store_release() in nf_tables_commit() */ + return smp_load_acquire(&net->nft.base_seq); +} + +static bool nft_lookup_should_retry(const struct net *net, unsigned int seq) +{ + return unlikely(seq != nft_base_seq(net)); +} + const struct nft_set_ext * nft_set_do_lookup(const struct net *net, const struct nft_set *set, const u32 *key) { - return __nft_set_do_lookup(net, set, key); + const struct nft_set_ext *ext; + unsigned int base_seq; + + do { + base_seq = nft_base_seq(net); + + ext = __nft_set_do_lookup(net, set, key); + if (ext) + break; + /* No match? There is a small chance that lookup was + * performed in the old generation, but nf_tables_commit() + * already unlinked a (matching) element. + * + * We need to repeat the lookup to make sure that we didn't + * miss a matching element in the new generation. + */ + } while (nft_lookup_should_retry(net, base_seq)); + + return ext; } EXPORT_SYMBOL_GPL(nft_set_do_lookup);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 8884c693991333ae065830554b9b0c96590b1bb2 ]
hsr_for_each_port is called in many places without holding the RCU read lock; this may trigger warnings on debug kernels. Most of the callers actually hold the rtnl lock. So add a new helper hsr_for_each_port_rtnl to allow callers in suitable contexts to iterate ports safely without explicit RCU locking.
This patch only fixes the callers that hold the rtnl lock. Other callers will be fixed in later patches.
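The new macro itself is not visible in the truncated hunk at the end of this mail; based on the description above it is presumably the rtnl-checked variant of the existing RCU iterator (a sketch under that assumption, relying on the optional lockdep-condition argument of list_for_each_entry_rcu()):

    /* Sketch (assumed, not quoted from the patch): iterate the port
     * list like hsr_for_each_port(), but satisfy the RCU list-debug
     * check by asserting the rtnl lock instead of the RCU read lock.
     */
    #define hsr_for_each_port_rtnl(hsr, port) \
            list_for_each_entry_rcu((port), &(hsr)->ports, port_list, \
                                    lockdep_rtnl_is_held())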
Fixes: c5a759117210 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.") Signed-off-by: Hangbin Liu liuhangbin@gmail.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20250905091533.377443-2-liuhangbin@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/hsr/hsr_device.c | 18 +++++++++--------- net/hsr/hsr_main.c | 2 +- net/hsr/hsr_main.h | 3 +++ 3 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c index 88657255fec12..bce7b4061ce08 100644 --- a/net/hsr/hsr_device.c +++ b/net/hsr/hsr_device.c @@ -49,7 +49,7 @@ static bool hsr_check_carrier(struct hsr_port *master)
ASSERT_RTNL();
- hsr_for_each_port(master->hsr, port) { + hsr_for_each_port_rtnl(master->hsr, port) { if (port->type != HSR_PT_MASTER && is_slave_up(port->dev)) { netif_carrier_on(master->dev); return true; @@ -105,7 +105,7 @@ int hsr_get_max_mtu(struct hsr_priv *hsr) struct hsr_port *port;
mtu_max = ETH_DATA_LEN; - hsr_for_each_port(hsr, port) + hsr_for_each_port_rtnl(hsr, port) if (port->type != HSR_PT_MASTER) mtu_max = min(port->dev->mtu, mtu_max);
@@ -139,7 +139,7 @@ static int hsr_dev_open(struct net_device *dev)
hsr = netdev_priv(dev);
- hsr_for_each_port(hsr, port) { + hsr_for_each_port_rtnl(hsr, port) { if (port->type == HSR_PT_MASTER) continue; switch (port->type) { @@ -172,7 +172,7 @@ static int hsr_dev_close(struct net_device *dev) struct hsr_priv *hsr;
hsr = netdev_priv(dev); - hsr_for_each_port(hsr, port) { + hsr_for_each_port_rtnl(hsr, port) { if (port->type == HSR_PT_MASTER) continue; switch (port->type) { @@ -205,7 +205,7 @@ static netdev_features_t hsr_features_recompute(struct hsr_priv *hsr, * may become enabled. */ features &= ~NETIF_F_ONE_FOR_ALL; - hsr_for_each_port(hsr, port) + hsr_for_each_port_rtnl(hsr, port) features = netdev_increment_features(features, port->dev->features, mask); @@ -484,7 +484,7 @@ static void hsr_set_rx_mode(struct net_device *dev)
hsr = netdev_priv(dev);
- hsr_for_each_port(hsr, port) { + hsr_for_each_port_rtnl(hsr, port) { if (port->type == HSR_PT_MASTER) continue; switch (port->type) { @@ -506,7 +506,7 @@ static void hsr_change_rx_flags(struct net_device *dev, int change)
hsr = netdev_priv(dev);
- hsr_for_each_port(hsr, port) { + hsr_for_each_port_rtnl(hsr, port) { if (port->type == HSR_PT_MASTER) continue; switch (port->type) { @@ -534,7 +534,7 @@ static int hsr_ndo_vlan_rx_add_vid(struct net_device *dev,
hsr = netdev_priv(dev);
- hsr_for_each_port(hsr, port) { + hsr_for_each_port_rtnl(hsr, port) { if (port->type == HSR_PT_MASTER || port->type == HSR_PT_INTERLINK) continue; @@ -580,7 +580,7 @@ static int hsr_ndo_vlan_rx_kill_vid(struct net_device *dev,
hsr = netdev_priv(dev);
- hsr_for_each_port(hsr, port) { + hsr_for_each_port_rtnl(hsr, port) { switch (port->type) { case HSR_PT_SLAVE_A: case HSR_PT_SLAVE_B: diff --git a/net/hsr/hsr_main.c b/net/hsr/hsr_main.c index 192893c3f2ec7..ac1eb1db1a52b 100644 --- a/net/hsr/hsr_main.c +++ b/net/hsr/hsr_main.c @@ -22,7 +22,7 @@ static bool hsr_slave_empty(struct hsr_priv *hsr) { struct hsr_port *port;
- hsr_for_each_port(hsr, port) + hsr_for_each_port_rtnl(hsr, port) if (port->type != HSR_PT_MASTER) return false; return true; diff --git a/net/hsr/hsr_main.h b/net/hsr/hsr_main.h index 135ec5fce0196..33b0d2460c9bc 100644 --- a/net/hsr/hsr_main.h +++ b/net/hsr/hsr_main.h @@ -224,6 +224,9 @@ struct hsr_priv { #define hsr_for_each_port(hsr, port) \ list_for_each_entry_rcu((port), &(hsr)->ports, port_list)
+#define hsr_for_each_port_rtnl(hsr, port) \ + list_for_each_entry_rcu((port), &(hsr)->ports, port_list, lockdep_rtnl_is_held()) + struct hsr_port *hsr_port_get_hsr(struct hsr_priv *hsr, enum hsr_port_type pt);
/* Caller must ensure skb is a valid HSR frame */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 393c841fe4333cdd856d0ca37b066d72746cfaa6 ]
hsr_port_get_hsr() iterates over ports using hsr_for_each_port(), but many of its callers do not hold the required RCU lock.
Switch to hsr_for_each_port_rtnl(), since most callers already hold the rtnl lock. After review, all callers are covered by either the rtnl lock or the RCU lock, except hsr_dev_xmit(). Fix this by adding an RCU read lock there.
Fixes: c5a759117210 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.") Signed-off-by: Hangbin Liu liuhangbin@gmail.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20250905091533.377443-3-liuhangbin@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/hsr/hsr_device.c | 3 +++ net/hsr/hsr_main.c | 2 +- 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c index bce7b4061ce08..702da1f9aaa90 100644 --- a/net/hsr/hsr_device.c +++ b/net/hsr/hsr_device.c @@ -226,6 +226,7 @@ static netdev_tx_t hsr_dev_xmit(struct sk_buff *skb, struct net_device *dev) struct hsr_priv *hsr = netdev_priv(dev); struct hsr_port *master;
+ rcu_read_lock(); master = hsr_port_get_hsr(hsr, HSR_PT_MASTER); if (master) { skb->dev = master->dev; @@ -238,6 +239,8 @@ static netdev_tx_t hsr_dev_xmit(struct sk_buff *skb, struct net_device *dev) dev_core_stats_tx_dropped_inc(dev); dev_kfree_skb_any(skb); } + rcu_read_unlock(); + return NETDEV_TX_OK; }
diff --git a/net/hsr/hsr_main.c b/net/hsr/hsr_main.c index ac1eb1db1a52b..bc94b07101d80 100644 --- a/net/hsr/hsr_main.c +++ b/net/hsr/hsr_main.c @@ -134,7 +134,7 @@ struct hsr_port *hsr_port_get_hsr(struct hsr_priv *hsr, enum hsr_port_type pt) { struct hsr_port *port;
- hsr_for_each_port(hsr, port) + hsr_for_each_port_rtnl(hsr, port) if (port->type == pt) return port; return NULL;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 847748fc66d08a89135a74e29362a66ba4e3ab15 ]
hsr_get_port_ndev() calls hsr_for_each_port(), which needs to be called with the RCU read lock held. In addition, before returning the port device, we need to take a reference on it to avoid a use-after-free in the caller.
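With this change, callers own a reference on the returned device. A hedged usage sketch of the new contract (the hsr_dev variable is assumed for illustration):

	struct net_device *port_dev;

	port_dev = hsr_get_port_ndev(hsr_dev, HSR_PT_SLAVE_A);
	if (port_dev) {
		/* ... use port_dev ... */
		dev_put(port_dev);	/* drop the reference taken by hsr_get_port_ndev() */
	}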
Suggested-by: Paolo Abeni pabeni@redhat.com Fixes: 9c10dd8eed74 ("net: hsr: Create and export hsr_get_port_ndev()") Signed-off-by: Hangbin Liu liuhangbin@gmail.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20250905091533.377443-4-liuhangbin@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ti/icssg/icssg_prueth.c | 20 ++++++++++++++------ net/hsr/hsr_device.c | 7 ++++++- 2 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.c b/drivers/net/ethernet/ti/icssg/icssg_prueth.c index f436d7cf565a1..1a9cc8206430b 100644 --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.c +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.c @@ -691,7 +691,7 @@ static void icssg_prueth_hsr_fdb_add_del(struct prueth_emac *emac,
static int icssg_prueth_hsr_add_mcast(struct net_device *ndev, const u8 *addr) { - struct net_device *real_dev; + struct net_device *real_dev, *port_dev; struct prueth_emac *emac; u8 vlan_id, i;
@@ -700,11 +700,15 @@ static int icssg_prueth_hsr_add_mcast(struct net_device *ndev, const u8 *addr)
if (is_hsr_master(real_dev)) { for (i = HSR_PT_SLAVE_A; i < HSR_PT_INTERLINK; i++) { - emac = netdev_priv(hsr_get_port_ndev(real_dev, i)); - if (!emac) + port_dev = hsr_get_port_ndev(real_dev, i); + emac = netdev_priv(port_dev); + if (!emac) { + dev_put(port_dev); return -EINVAL; + } icssg_prueth_hsr_fdb_add_del(emac, addr, vlan_id, true); + dev_put(port_dev); } } else { emac = netdev_priv(real_dev); @@ -716,7 +720,7 @@ static int icssg_prueth_hsr_add_mcast(struct net_device *ndev, const u8 *addr)
static int icssg_prueth_hsr_del_mcast(struct net_device *ndev, const u8 *addr) { - struct net_device *real_dev; + struct net_device *real_dev, *port_dev; struct prueth_emac *emac; u8 vlan_id, i;
@@ -725,11 +729,15 @@ static int icssg_prueth_hsr_del_mcast(struct net_device *ndev, const u8 *addr)
if (is_hsr_master(real_dev)) { for (i = HSR_PT_SLAVE_A; i < HSR_PT_INTERLINK; i++) { - emac = netdev_priv(hsr_get_port_ndev(real_dev, i)); - if (!emac) + port_dev = hsr_get_port_ndev(real_dev, i); + emac = netdev_priv(port_dev); + if (!emac) { + dev_put(port_dev); return -EINVAL; + } icssg_prueth_hsr_fdb_add_del(emac, addr, vlan_id, false); + dev_put(port_dev); } } else { emac = netdev_priv(real_dev); diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c index 702da1f9aaa90..fbbc3ccf9df64 100644 --- a/net/hsr/hsr_device.c +++ b/net/hsr/hsr_device.c @@ -675,9 +675,14 @@ struct net_device *hsr_get_port_ndev(struct net_device *ndev, struct hsr_priv *hsr = netdev_priv(ndev); struct hsr_port *port;
+ rcu_read_lock(); hsr_for_each_port(hsr, port) - if (port->type == pt) + if (port->type == pt) { + dev_hold(port->dev); + rcu_read_unlock(); return port->dev; + } + rcu_read_unlock(); return NULL; } EXPORT_SYMBOL(hsr_get_port_ndev);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pengyu Luo mitltlatltl@gmail.com
[ Upstream commit 942e47ab228c7dd27c2ae043c17e7aab2028082c ]
property "qcom,tune-usb2-preem" is for EUSB2_TUNE_USB2_PREEM property "qcom,tune-usb2-amplitude" is for EUSB2_TUNE_IUSB2
The downstream correspondence is as follows: EUSB2_TUNE_USB2_PREEM: Tx pre-emphasis tuning EUSB2_TUNE_IUSB2: HS trasmit amplitude EUSB2_TUNE_SQUELCH_U: Squelch detection threshold EUSB2_TUNE_HSDISC: HS disconnect threshold EUSB2_TUNE_EUSB_SLEW: slew rate
Fixes: 31bc94de7602 ("phy: qualcomm: phy-qcom-eusb2-repeater: Don't zero-out registers") Signed-off-by: Pengyu Luo mitltlatltl@gmail.com Reviewed-by: Konrad Dybcio konrad.dybcio@oss.qualcomm.com Reviewed-by: Luca Weiss luca.weiss@fairphone.com Link: https://lore.kernel.org/r/20250812093957.32235-1-mitltlatltl@gmail.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c b/drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c index d7493c2294ef2..3709fba42ebd8 100644 --- a/drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c +++ b/drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c @@ -127,13 +127,13 @@ static int eusb2_repeater_init(struct phy *phy) rptr->cfg->init_tbl[i].value);
/* Override registers from devicetree values */ - if (!of_property_read_u8(np, "qcom,tune-usb2-amplitude", &val)) + if (!of_property_read_u8(np, "qcom,tune-usb2-preem", &val)) regmap_write(regmap, base + EUSB2_TUNE_USB2_PREEM, val);
if (!of_property_read_u8(np, "qcom,tune-usb2-disc-thres", &val)) regmap_write(regmap, base + EUSB2_TUNE_HSDISC, val);
- if (!of_property_read_u8(np, "qcom,tune-usb2-preem", &val)) + if (!of_property_read_u8(np, "qcom,tune-usb2-amplitude", &val)) regmap_write(regmap, base + EUSB2_TUNE_IUSB2, val);
/* Wait for status OK */
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yi Sun yi.sun@intel.com
[ Upstream commit f41c538881eec4dcf5961a242097d447f848cda6 ]
The call to idxd_free() introduces a duplicate put_device() leading to a reference count underflow:

refcount_t: underflow; use-after-free.
WARNING: CPU: 15 PID: 4428 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
...
Call Trace:
<TASK>
idxd_remove+0xe4/0x120 [idxd]
pci_device_remove+0x3f/0xb0
device_release_driver_internal+0x197/0x200
driver_detach+0x48/0x90
bus_remove_driver+0x74/0xf0
pci_unregister_driver+0x2e/0xb0
idxd_exit_module+0x34/0x7a0 [idxd]
__do_sys_delete_module.constprop.0+0x183/0x280
do_syscall_64+0x54/0xd70
entry_SYSCALL_64_after_hwframe+0x76/0x7e
idxd_unregister_devices(), which is invoked at the very beginning of idxd_remove(), already takes care of the necessary put_device() through the following call path:

	idxd_unregister_devices() -> device_unregister() -> put_device()
In addition, when CONFIG_DEBUG_KOBJECT_RELEASE is enabled, put_device() may trigger asynchronous cleanup via schedule_delayed_work(). If idxd_free() is called immediately after, it can result in a use-after-free.
Remove the improper idxd_free() to avoid both the refcount underflow and potential memory corruption during module unload.
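A simplified illustration (not the driver code itself) of why the extra put underflows the refcount:

	device_unregister(idxd_confdev(idxd));	/* via idxd_unregister_devices(): drops a reference */
	...
	put_device(idxd_confdev(idxd));		/* drops the probe's own reference */
	idxd_free(idxd);			/* ends up putting the same device again: underflow */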
Fixes: d5449ff1b04d ("dmaengine: idxd: Add missing idxd cleanup to fix memory leak in remove call") Signed-off-by: Yi Sun yi.sun@intel.com Tested-by: Shuai Xue xueshuai@linux.alibaba.com Reviewed-by: Dave Jiang dave.jiang@intel.com Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com
Link: https://lore.kernel.org/r/20250729150313.1934101-2-yi.sun@intel.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/idxd/init.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 80355d03004db..40cc9c070081f 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -1295,7 +1295,6 @@ static void idxd_remove(struct pci_dev *pdev) idxd_cleanup(idxd); pci_iounmap(pdev, idxd->reg_base); put_device(idxd_confdev(idxd)); - idxd_free(idxd); pci_disable_device(pdev); }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yi Sun yi.sun@intel.com
[ Upstream commit b7cb9a034305d52222433fad10c3de10204f29e7 ]
A recent refactor introduced a misplaced put_device() call, resulting in a reference count underflow during module unload.
There is no need to add additional put_device() calls for idxd groups, engines, or workqueues. Although the commit claims: "Note, this also fixes the missing put_device() for idxd groups, engines, and wqs.", it appears no such omission actually existed. The required cleanup is already handled by the call chain:

	idxd_unregister_devices() -> device_unregister() -> put_device()
Extend idxd_cleanup() to handle the remaining necessary cleanup and remove idxd_cleanup_internals(), which duplicates deallocation logic for idxd, engines, groups, and workqueues. Memory management is also properly handled through the Linux device model.
Fixes: a409e919ca32 ("dmaengine: idxd: Refactor remove call with idxd_cleanup() helper") Signed-off-by: Yi Sun yi.sun@intel.com Tested-by: Shuai Xue xueshuai@linux.alibaba.com Reviewed-by: Dave Jiang dave.jiang@intel.com Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com
Link: https://lore.kernel.org/r/20250729150313.1934101-3-yi.sun@intel.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/idxd/init.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 40cc9c070081f..40f4bf4467638 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -1292,7 +1292,10 @@ static void idxd_remove(struct pci_dev *pdev) device_unregister(idxd_confdev(idxd)); idxd_shutdown(pdev); idxd_device_remove_debugfs(idxd); - idxd_cleanup(idxd); + perfmon_pmu_remove(idxd); + idxd_cleanup_interrupts(idxd); + if (device_pasid_enabled(idxd)) + idxd_disable_system_pasid(idxd); pci_iounmap(pdev, idxd->reg_base); put_device(idxd_confdev(idxd)); pci_disable_device(pdev);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit 39aaa337449e71a41d4813be0226a722827ba606 ]
The cleanup in idxd_setup_wqs() has had a couple of bugs because the error handling is a bit subtle. It's simpler to just re-write it in a cleaner way. The issues here are:
1) If "idxd->max_wqs" is <= 0 then we call put_device(conf_dev) when "conf_dev" hasn't been initialized. 2) If kzalloc_node() fails then again "conf_dev" is invalid. It's either uninitialized or it points to the "conf_dev" from the previous iteration so it leads to a double free.
It's better to free partial loop iterations within the loop so that the unwinding at the end only has to handle whole loop iterations (see the sketch below). I also renamed the labels to describe what the goto does rather than where the goto was located.
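In general form, the pattern looks like this (a hedged sketch with hypothetical alloc_obj()/setup_obj()/free_obj() helpers, not the idxd code itself):

	for (i = 0; i < n; i++) {
		obj = alloc_obj();
		if (!obj)
			goto err_unwind;	/* nothing partial to free */

		if (setup_obj(obj) < 0) {
			free_obj(obj);		/* free the partial iteration here */
			goto err_unwind;
		}
		objs[i] = obj;
	}
	return 0;

	err_unwind:
		while (--i >= 0)
			free_obj(objs[i]);	/* only whole iterations remain */
		return -ENOMEM;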
Fixes: 3fd2f4bc010c ("dmaengine: idxd: fix memory leak in error handling path of idxd_setup_wqs") Reported-by: Colin Ian King colin.i.king@gmail.com Closes: https://lore.kernel.org/all/20250811095836.1642093-1-colin.i.king@gmail.com/ Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: Dave Jiang dave.jiang@intel.com Link: https://lore.kernel.org/r/aJnJW3iYTDDCj9sk@stanley.mountain Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/idxd/init.c | 33 +++++++++++++++++---------------- 1 file changed, 17 insertions(+), 16 deletions(-)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 40f4bf4467638..b559b0e18809e 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -189,27 +189,30 @@ static int idxd_setup_wqs(struct idxd_device *idxd) idxd->wq_enable_map = bitmap_zalloc_node(idxd->max_wqs, GFP_KERNEL, dev_to_node(dev)); if (!idxd->wq_enable_map) { rc = -ENOMEM; - goto err_bitmap; + goto err_free_wqs; }
for (i = 0; i < idxd->max_wqs; i++) { wq = kzalloc_node(sizeof(*wq), GFP_KERNEL, dev_to_node(dev)); if (!wq) { rc = -ENOMEM; - goto err; + goto err_unwind; }
idxd_dev_set_type(&wq->idxd_dev, IDXD_DEV_WQ); conf_dev = wq_confdev(wq); wq->id = i; wq->idxd = idxd; - device_initialize(wq_confdev(wq)); + device_initialize(conf_dev); conf_dev->parent = idxd_confdev(idxd); conf_dev->bus = &dsa_bus_type; conf_dev->type = &idxd_wq_device_type; rc = dev_set_name(conf_dev, "wq%d.%d", idxd->id, wq->id); - if (rc < 0) - goto err; + if (rc < 0) { + put_device(conf_dev); + kfree(wq); + goto err_unwind; + }
mutex_init(&wq->wq_lock); init_waitqueue_head(&wq->err_queue); @@ -220,15 +223,20 @@ static int idxd_setup_wqs(struct idxd_device *idxd) wq->enqcmds_retries = IDXD_ENQCMDS_RETRIES; wq->wqcfg = kzalloc_node(idxd->wqcfg_size, GFP_KERNEL, dev_to_node(dev)); if (!wq->wqcfg) { + put_device(conf_dev); + kfree(wq); rc = -ENOMEM; - goto err; + goto err_unwind; }
if (idxd->hw.wq_cap.op_config) { wq->opcap_bmap = bitmap_zalloc(IDXD_MAX_OPCAP_BITS, GFP_KERNEL); if (!wq->opcap_bmap) { + kfree(wq->wqcfg); + put_device(conf_dev); + kfree(wq); rc = -ENOMEM; - goto err_opcap_bmap; + goto err_unwind; } bitmap_copy(wq->opcap_bmap, idxd->opcap_bmap, IDXD_MAX_OPCAP_BITS); } @@ -239,13 +247,7 @@ static int idxd_setup_wqs(struct idxd_device *idxd)
return 0;
-err_opcap_bmap: - kfree(wq->wqcfg); - -err: - put_device(conf_dev); - kfree(wq); - +err_unwind: while (--i >= 0) { wq = idxd->wqs[i]; if (idxd->hw.wq_cap.op_config) @@ -254,11 +256,10 @@ static int idxd_setup_wqs(struct idxd_device *idxd) conf_dev = wq_confdev(wq); put_device(conf_dev); kfree(wq); - } bitmap_free(idxd->wq_enable_map);
-err_bitmap: +err_free_wqs: kfree(idxd->wqs);
return rc;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gao Xiang hsiangkao@linux.alibaba.com
[ Upstream commit 96debe8c27ee2494bbd78abf3744745a84a745f1 ]
The compressed data for the ztailpacking feature is fetched from the metadata inode (e.g., bd_inode), which is folio-based.
Therefore, the folio interface should be used instead.
Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20250626085459.339830-1-hsiangkao@linux.alibaba.co... Stable-dep-of: 131897c65e2b ("erofs: fix invalid algorithm for encoded extents") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/erofs/zdata.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c index 9bb53f00c2c62..33c61f3b667c3 100644 --- a/fs/erofs/zdata.c +++ b/fs/erofs/zdata.c @@ -805,6 +805,7 @@ static int z_erofs_pcluster_begin(struct z_erofs_frontend *fe) struct erofs_map_blocks *map = &fe->map; struct super_block *sb = fe->inode->i_sb; struct z_erofs_pcluster *pcl = NULL; + void *ptr; int ret;
DBG_BUGON(fe->pcl); @@ -854,15 +855,13 @@ static int z_erofs_pcluster_begin(struct z_erofs_frontend *fe) /* bind cache first when cached decompression is preferred */ z_erofs_bind_cache(fe); } else { - void *mptr; - - mptr = erofs_read_metabuf(&map->buf, sb, map->m_pa, false); - if (IS_ERR(mptr)) { - ret = PTR_ERR(mptr); + ptr = erofs_read_metabuf(&map->buf, sb, map->m_pa, false); + if (IS_ERR(ptr)) { + ret = PTR_ERR(ptr); erofs_err(sb, "failed to get inline data %d", ret); return ret; } - get_page(map->buf.page); + folio_get(page_folio(map->buf.page)); WRITE_ONCE(fe->pcl->compressed_bvecs[0].page, map->buf.page); fe->pcl->pageofs_in = map->m_pa & ~PAGE_MASK; fe->mode = Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE; @@ -1325,9 +1324,8 @@ static int z_erofs_decompress_pcluster(struct z_erofs_backend *be, int err)
/* must handle all compressed pages before actual file pages */ if (pcl->from_meta) { - page = pcl->compressed_bvecs[0].page; + folio_put(page_folio(pcl->compressed_bvecs[0].page)); WRITE_ONCE(pcl->compressed_bvecs[0].page, NULL); - put_page(page); } else { /* managed folios are still left in compressed_bvecs[] */ for (i = 0; i < pclusterpages; ++i) {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gao Xiang hsiangkao@linux.alibaba.com
[ Upstream commit 5e744cb61536bb4e37caca9c5e84feef638782be ]
- need_kmap is always true except for a ztailpacking case; thus, just open-code that one;
- The upcoming metadata compression will add a new boolean, so simplify this first.
Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu chao@kernel.org Link: https://lore.kernel.org/r/20250714090907.4095645-1-hsiangkao@linux.alibaba.c... Stable-dep-of: 131897c65e2b ("erofs: fix invalid algorithm for encoded extents") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/erofs/data.c | 8 ++++---- fs/erofs/fileio.c | 2 +- fs/erofs/fscache.c | 2 +- fs/erofs/inode.c | 8 ++++---- fs/erofs/internal.h | 2 +- fs/erofs/super.c | 4 ++-- fs/erofs/zdata.c | 5 +++-- fs/erofs/zmap.c | 12 ++++++------ 8 files changed, 22 insertions(+), 21 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c index 16e4a6bd9b973..dd7d86809c188 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -65,10 +65,10 @@ void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb) }
void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb, - erofs_off_t offset, bool need_kmap) + erofs_off_t offset) { erofs_init_metabuf(buf, sb); - return erofs_bread(buf, offset, need_kmap); + return erofs_bread(buf, offset, true); }
int erofs_map_blocks(struct inode *inode, struct erofs_map_blocks *map) @@ -118,7 +118,7 @@ int erofs_map_blocks(struct inode *inode, struct erofs_map_blocks *map) pos = ALIGN(erofs_iloc(inode) + vi->inode_isize + vi->xattr_isize, unit) + unit * chunknr;
- idx = erofs_read_metabuf(&buf, sb, pos, true); + idx = erofs_read_metabuf(&buf, sb, pos); if (IS_ERR(idx)) { err = PTR_ERR(idx); goto out; @@ -299,7 +299,7 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length, struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
iomap->type = IOMAP_INLINE; - ptr = erofs_read_metabuf(&buf, sb, mdev.m_pa, true); + ptr = erofs_read_metabuf(&buf, sb, mdev.m_pa); if (IS_ERR(ptr)) return PTR_ERR(ptr); iomap->inline_data = ptr; diff --git a/fs/erofs/fileio.c b/fs/erofs/fileio.c index 91781718199e2..3ee082476c8c5 100644 --- a/fs/erofs/fileio.c +++ b/fs/erofs/fileio.c @@ -115,7 +115,7 @@ static int erofs_fileio_scan_folio(struct erofs_fileio *io, struct folio *folio) void *src;
src = erofs_read_metabuf(&buf, inode->i_sb, - map->m_pa + ofs, true); + map->m_pa + ofs); if (IS_ERR(src)) { err = PTR_ERR(src); break; diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c index 34517ca9df915..9a8ee646e51d9 100644 --- a/fs/erofs/fscache.c +++ b/fs/erofs/fscache.c @@ -274,7 +274,7 @@ static int erofs_fscache_data_read_slice(struct erofs_fscache_rq *req) size_t size = map.m_llen; void *src;
- src = erofs_read_metabuf(&buf, sb, map.m_pa, true); + src = erofs_read_metabuf(&buf, sb, map.m_pa); if (IS_ERR(src)) return PTR_ERR(src);
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c index a0ae0b4f7b012..47215c5e33855 100644 --- a/fs/erofs/inode.c +++ b/fs/erofs/inode.c @@ -39,10 +39,10 @@ static int erofs_read_inode(struct inode *inode) void *ptr; int err = 0;
- ptr = erofs_read_metabuf(&buf, sb, erofs_pos(sb, blkaddr), true); + ptr = erofs_read_metabuf(&buf, sb, erofs_pos(sb, blkaddr)); if (IS_ERR(ptr)) { err = PTR_ERR(ptr); - erofs_err(sb, "failed to get inode (nid: %llu) page, err %d", + erofs_err(sb, "failed to read inode meta block (nid: %llu): %d", vi->nid, err); goto err_out; } @@ -78,10 +78,10 @@ static int erofs_read_inode(struct inode *inode)
memcpy(&copied, dic, gotten); ptr = erofs_read_metabuf(&buf, sb, - erofs_pos(sb, blkaddr + 1), true); + erofs_pos(sb, blkaddr + 1)); if (IS_ERR(ptr)) { err = PTR_ERR(ptr); - erofs_err(sb, "failed to get inode payload block (nid: %llu), err %d", + erofs_err(sb, "failed to read inode payload block (nid: %llu): %d", vi->nid, err); goto err_out; } diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 06b867d2fc3b7..a7699114f6fe6 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -385,7 +385,7 @@ void erofs_put_metabuf(struct erofs_buf *buf); void *erofs_bread(struct erofs_buf *buf, erofs_off_t offset, bool need_kmap); void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb); void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb, - erofs_off_t offset, bool need_kmap); + erofs_off_t offset); int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *dev); int erofs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, u64 start, u64 len); diff --git a/fs/erofs/super.c b/fs/erofs/super.c index cad87e4d66943..7cc74ef4be031 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -141,7 +141,7 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb, struct erofs_deviceslot *dis; struct file *file;
- dis = erofs_read_metabuf(buf, sb, *pos, true); + dis = erofs_read_metabuf(buf, sb, *pos); if (IS_ERR(dis)) return PTR_ERR(dis);
@@ -268,7 +268,7 @@ static int erofs_read_superblock(struct super_block *sb) void *data; int ret;
- data = erofs_read_metabuf(&buf, sb, 0, true); + data = erofs_read_metabuf(&buf, sb, 0); if (IS_ERR(data)) { erofs_err(sb, "cannot read erofs superblock"); return PTR_ERR(data); diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c index 33c61f3b667c3..e8f30eee29b44 100644 --- a/fs/erofs/zdata.c +++ b/fs/erofs/zdata.c @@ -855,10 +855,11 @@ static int z_erofs_pcluster_begin(struct z_erofs_frontend *fe) /* bind cache first when cached decompression is preferred */ z_erofs_bind_cache(fe); } else { - ptr = erofs_read_metabuf(&map->buf, sb, map->m_pa, false); + erofs_init_metabuf(&map->buf, sb); + ptr = erofs_bread(&map->buf, map->m_pa, false); if (IS_ERR(ptr)) { ret = PTR_ERR(ptr); - erofs_err(sb, "failed to get inline data %d", ret); + erofs_err(sb, "failed to get inline folio %d", ret); return ret; } folio_get(page_folio(map->buf.page)); diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c index f1a15ff22147b..301afd9be4e30 100644 --- a/fs/erofs/zmap.c +++ b/fs/erofs/zmap.c @@ -31,7 +31,7 @@ static int z_erofs_load_full_lcluster(struct z_erofs_maprecorder *m, struct z_erofs_lcluster_index *di; unsigned int advise;
- di = erofs_read_metabuf(&m->map->buf, inode->i_sb, pos, true); + di = erofs_read_metabuf(&m->map->buf, inode->i_sb, pos); if (IS_ERR(di)) return PTR_ERR(di); m->lcn = lcn; @@ -146,7 +146,7 @@ static int z_erofs_load_compact_lcluster(struct z_erofs_maprecorder *m, else return -EOPNOTSUPP;
- in = erofs_read_metabuf(&m->map->buf, m->inode->i_sb, pos, true); + in = erofs_read_metabuf(&m->map->buf, m->inode->i_sb, pos); if (IS_ERR(in)) return PTR_ERR(in);
@@ -551,7 +551,7 @@ static int z_erofs_map_blocks_ext(struct inode *inode, map->m_flags = 0; if (recsz <= offsetof(struct z_erofs_extent, pstart_hi)) { if (recsz <= offsetof(struct z_erofs_extent, pstart_lo)) { - ext = erofs_read_metabuf(&map->buf, sb, pos, true); + ext = erofs_read_metabuf(&map->buf, sb, pos); if (IS_ERR(ext)) return PTR_ERR(ext); pa = le64_to_cpu(*(__le64 *)ext); @@ -564,7 +564,7 @@ static int z_erofs_map_blocks_ext(struct inode *inode, }
for (; lstart <= map->m_la; lstart += 1 << vi->z_lclusterbits) { - ext = erofs_read_metabuf(&map->buf, sb, pos, true); + ext = erofs_read_metabuf(&map->buf, sb, pos); if (IS_ERR(ext)) return PTR_ERR(ext); map->m_plen = le32_to_cpu(ext->plen); @@ -584,7 +584,7 @@ static int z_erofs_map_blocks_ext(struct inode *inode, for (l = 0, r = vi->z_extents; l < r; ) { mid = l + (r - l) / 2; ext = erofs_read_metabuf(&map->buf, sb, - pos + mid * recsz, true); + pos + mid * recsz); if (IS_ERR(ext)) return PTR_ERR(ext);
@@ -667,7 +667,7 @@ static int z_erofs_fill_inode_lazy(struct inode *inode) goto out_unlock;
pos = ALIGN(erofs_iloc(inode) + vi->inode_isize + vi->xattr_isize, 8); - h = erofs_read_metabuf(&buf, sb, pos, true); + h = erofs_read_metabuf(&buf, sb, pos); if (IS_ERR(h)) { err = PTR_ERR(h); goto out_unlock;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gao Xiang hsiangkao@linux.alibaba.com
[ Upstream commit df50848bcd9f17e4e60e6d5823d0e8fe8982bbab ]
There is no need to keep additional local metabufs since we already have one in `struct erofs_map_blocks`.
This was actually a leftover when applying meta buffers to zmap operations, see commit 09c543798c3c ("erofs: use meta buffers for zmap operations").
Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20250716064152.3537457-1-hsiangkao@linux.alibaba.c... Stable-dep-of: 131897c65e2b ("erofs: fix invalid algorithm for encoded extents") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/erofs/zmap.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-)
diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c index 301afd9be4e30..cd33a5d29406c 100644 --- a/fs/erofs/zmap.c +++ b/fs/erofs/zmap.c @@ -641,13 +641,12 @@ static int z_erofs_map_blocks_ext(struct inode *inode, return 0; }
-static int z_erofs_fill_inode_lazy(struct inode *inode) +static int z_erofs_fill_inode(struct inode *inode, struct erofs_map_blocks *map) { struct erofs_inode *const vi = EROFS_I(inode); struct super_block *const sb = inode->i_sb; int err, headnr; erofs_off_t pos; - struct erofs_buf buf = __EROFS_BUF_INITIALIZER; struct z_erofs_map_header *h;
if (test_bit(EROFS_I_Z_INITED_BIT, &vi->flags)) { @@ -667,7 +666,7 @@ static int z_erofs_fill_inode_lazy(struct inode *inode) goto out_unlock;
pos = ALIGN(erofs_iloc(inode) + vi->inode_isize + vi->xattr_isize, 8); - h = erofs_read_metabuf(&buf, sb, pos); + h = erofs_read_metabuf(&map->buf, sb, pos); if (IS_ERR(h)) { err = PTR_ERR(h); goto out_unlock; @@ -705,7 +704,7 @@ static int z_erofs_fill_inode_lazy(struct inode *inode) erofs_err(sb, "unknown HEAD%u format %u for nid %llu, please upgrade kernel", headnr + 1, vi->z_algorithmtype[headnr], vi->nid); err = -EOPNOTSUPP; - goto out_put_metabuf; + goto out_unlock; }
if (!erofs_sb_has_big_pcluster(EROFS_SB(sb)) && @@ -714,7 +713,7 @@ static int z_erofs_fill_inode_lazy(struct inode *inode) erofs_err(sb, "per-inode big pcluster without sb feature for nid %llu", vi->nid); err = -EFSCORRUPTED; - goto out_put_metabuf; + goto out_unlock; } if (vi->datalayout == EROFS_INODE_COMPRESSED_COMPACT && !(vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1) ^ @@ -722,27 +721,25 @@ static int z_erofs_fill_inode_lazy(struct inode *inode) erofs_err(sb, "big pcluster head1/2 of compact indexes should be consistent for nid %llu", vi->nid); err = -EFSCORRUPTED; - goto out_put_metabuf; + goto out_unlock; }
if (vi->z_idata_size || (vi->z_advise & Z_EROFS_ADVISE_FRAGMENT_PCLUSTER)) { - struct erofs_map_blocks map = { + struct erofs_map_blocks tm = { .buf = __EROFS_BUF_INITIALIZER };
- err = z_erofs_map_blocks_fo(inode, &map, + err = z_erofs_map_blocks_fo(inode, &tm, EROFS_GET_BLOCKS_FINDTAIL); - erofs_put_metabuf(&map.buf); + erofs_put_metabuf(&tm.buf); if (err < 0) - goto out_put_metabuf; + goto out_unlock; } done: /* paired with smp_mb() at the beginning of the function */ smp_mb(); set_bit(EROFS_I_Z_INITED_BIT, &vi->flags); -out_put_metabuf: - erofs_put_metabuf(&buf); out_unlock: clear_and_wake_up_bit(EROFS_I_BL_Z_BIT, &vi->flags); return err; @@ -760,7 +757,7 @@ int z_erofs_map_blocks_iter(struct inode *inode, struct erofs_map_blocks *map, map->m_la = inode->i_size; map->m_flags = 0; } else { - err = z_erofs_fill_inode_lazy(inode); + err = z_erofs_fill_inode(inode, map); if (!err) { if (vi->datalayout == EROFS_INODE_COMPRESSED_FULL && (vi->z_advise & Z_EROFS_ADVISE_EXTENTS))
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gao Xiang hsiangkao@linux.alibaba.com
[ Upstream commit 131897c65e2b86cf14bec7379f44aa8fbb407526 ]
The current algorithm sanity checks do not properly apply to new encoded extents.
Unify the algorithm check with Z_EROFS_COMPRESSION(_RUNTIME)_MAX and ensure consistency with sbi->available_compr_algs.
Reported-and-tested-by: syzbot+5a398eb460ddaa6f242f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/68a8bd20.050a0220.37038e.005a.GAE@google.com Fixes: 1d191b4ca51d ("erofs: implement encoded extent metadata") Thanks-to: Edward Adam Davis eadavis@qq.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/erofs/zmap.c | 67 +++++++++++++++++++++++++++---------------------- 1 file changed, 37 insertions(+), 30 deletions(-)
diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c index cd33a5d29406c..14d01474ad9dd 100644 --- a/fs/erofs/zmap.c +++ b/fs/erofs/zmap.c @@ -403,10 +403,10 @@ static int z_erofs_map_blocks_fo(struct inode *inode, .inode = inode, .map = map, }; - int err = 0; - unsigned int endoff, afmt; + unsigned int endoff; unsigned long initial_lcn; unsigned long long ofs, end; + int err;
ofs = flags & EROFS_GET_BLOCKS_FINDTAIL ? inode->i_size - 1 : map->m_la; if (fragment && !(flags & EROFS_GET_BLOCKS_FINDTAIL) && @@ -502,20 +502,15 @@ static int z_erofs_map_blocks_fo(struct inode *inode, err = -EFSCORRUPTED; goto unmap_out; } - afmt = vi->z_advise & Z_EROFS_ADVISE_INTERLACED_PCLUSTER ? - Z_EROFS_COMPRESSION_INTERLACED : - Z_EROFS_COMPRESSION_SHIFTED; + if (vi->z_advise & Z_EROFS_ADVISE_INTERLACED_PCLUSTER) + map->m_algorithmformat = Z_EROFS_COMPRESSION_INTERLACED; + else + map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED; + } else if (m.headtype == Z_EROFS_LCLUSTER_TYPE_HEAD2) { + map->m_algorithmformat = vi->z_algorithmtype[1]; } else { - afmt = m.headtype == Z_EROFS_LCLUSTER_TYPE_HEAD2 ? - vi->z_algorithmtype[1] : vi->z_algorithmtype[0]; - if (!(EROFS_I_SB(inode)->available_compr_algs & (1 << afmt))) { - erofs_err(sb, "inconsistent algorithmtype %u for nid %llu", - afmt, vi->nid); - err = -EFSCORRUPTED; - goto unmap_out; - } + map->m_algorithmformat = vi->z_algorithmtype[0]; } - map->m_algorithmformat = afmt;
if ((flags & EROFS_GET_BLOCKS_FIEMAP) || ((flags & EROFS_GET_BLOCKS_READMORE) && @@ -645,9 +640,9 @@ static int z_erofs_fill_inode(struct inode *inode, struct erofs_map_blocks *map) { struct erofs_inode *const vi = EROFS_I(inode); struct super_block *const sb = inode->i_sb; - int err, headnr; - erofs_off_t pos; struct z_erofs_map_header *h; + erofs_off_t pos; + int err = 0;
if (test_bit(EROFS_I_Z_INITED_BIT, &vi->flags)) { /* @@ -661,7 +656,6 @@ static int z_erofs_fill_inode(struct inode *inode, struct erofs_map_blocks *map) if (wait_on_bit_lock(&vi->flags, EROFS_I_BL_Z_BIT, TASK_KILLABLE)) return -ERESTARTSYS;
- err = 0; if (test_bit(EROFS_I_Z_INITED_BIT, &vi->flags)) goto out_unlock;
@@ -698,15 +692,6 @@ static int z_erofs_fill_inode(struct inode *inode, struct erofs_map_blocks *map) else if (vi->z_advise & Z_EROFS_ADVISE_INLINE_PCLUSTER) vi->z_idata_size = le16_to_cpu(h->h_idata_size);
- headnr = 0; - if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX || - vi->z_algorithmtype[++headnr] >= Z_EROFS_COMPRESSION_MAX) { - erofs_err(sb, "unknown HEAD%u format %u for nid %llu, please upgrade kernel", - headnr + 1, vi->z_algorithmtype[headnr], vi->nid); - err = -EOPNOTSUPP; - goto out_unlock; - } - if (!erofs_sb_has_big_pcluster(EROFS_SB(sb)) && vi->z_advise & (Z_EROFS_ADVISE_BIG_PCLUSTER_1 | Z_EROFS_ADVISE_BIG_PCLUSTER_2)) { @@ -745,6 +730,30 @@ static int z_erofs_fill_inode(struct inode *inode, struct erofs_map_blocks *map) return err; }
+static int z_erofs_map_sanity_check(struct inode *inode, + struct erofs_map_blocks *map) +{ + struct erofs_sb_info *sbi = EROFS_I_SB(inode); + + if (!(map->m_flags & EROFS_MAP_ENCODED)) + return 0; + if (unlikely(map->m_algorithmformat >= Z_EROFS_COMPRESSION_RUNTIME_MAX)) { + erofs_err(inode->i_sb, "unknown algorithm %d @ pos %llu for nid %llu, please upgrade kernel", + map->m_algorithmformat, map->m_la, EROFS_I(inode)->nid); + return -EOPNOTSUPP; + } + if (unlikely(map->m_algorithmformat < Z_EROFS_COMPRESSION_MAX && + !(sbi->available_compr_algs & (1 << map->m_algorithmformat)))) { + erofs_err(inode->i_sb, "inconsistent algorithmtype %u for nid %llu", + map->m_algorithmformat, EROFS_I(inode)->nid); + return -EFSCORRUPTED; + } + if (unlikely(map->m_plen > Z_EROFS_PCLUSTER_MAX_SIZE || + map->m_llen > Z_EROFS_PCLUSTER_MAX_DSIZE)) + return -EOPNOTSUPP; + return 0; +} + int z_erofs_map_blocks_iter(struct inode *inode, struct erofs_map_blocks *map, int flags) { @@ -765,10 +774,8 @@ int z_erofs_map_blocks_iter(struct inode *inode, struct erofs_map_blocks *map, else err = z_erofs_map_blocks_fo(inode, map, flags); } - if (!err && (map->m_flags & EROFS_MAP_ENCODED) && - unlikely(map->m_plen > Z_EROFS_PCLUSTER_MAX_SIZE || - map->m_llen > Z_EROFS_PCLUSTER_MAX_DSIZE)) - err = -EOPNOTSUPP; + if (!err) + err = z_erofs_map_sanity_check(inode, map); if (err) map->m_llen = 0; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anders Roxell anders.roxell@linaro.org
[ Upstream commit e63419dbf2ceb083c1651852209c7f048089ac0f ]
Fix a critical memory allocation bug in edma_setup_from_hw() where queue_priority_map was allocated with insufficient memory. The code declared queue_priority_map as s8 (*)[2] (pointer to array of 2 s8), but allocated memory using sizeof(s8) instead of the correct size.
This caused out-of-bounds memory writes when accessing:

	queue_priority_map[i][0] = i;
	queue_priority_map[i][1] = i;
The bug manifested as kernel crashes with "Oops - undefined instruction" on ARM platforms (BeagleBoard-X15) during EDMA driver probe, as the memory corruption triggered kernel hardening features in Clang-built kernels.
Change the allocation to use sizeof(*queue_priority_map) which automatically gets the correct size for the 2D array structure.
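A hedged illustration of the size difference (the BUILD_BUG_ON lines are added here for exposition only):

	s8 (*queue_priority_map)[2];

	BUILD_BUG_ON(sizeof(s8) != 1);			/* old element size: 1 byte */
	BUILD_BUG_ON(sizeof(*queue_priority_map) != 2);	/* one [2] row: 2 bytes */

	/* allocates num_tc + 1 rows of two s8 entries each */
	queue_priority_map = devm_kcalloc(dev, ecc->num_tc + 1,
					  sizeof(*queue_priority_map), GFP_KERNEL);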
Fixes: 2b6b3b742019 ("ARM/dmaengine: edma: Merge the two drivers under drivers/dma/") Signed-off-by: Anders Roxell anders.roxell@linaro.org Link: https://lore.kernel.org/r/20250830094953.3038012-1-anders.roxell@linaro.org Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/ti/edma.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/dma/ti/edma.c b/drivers/dma/ti/edma.c index 3ed406f08c442..552be71db6c47 100644 --- a/drivers/dma/ti/edma.c +++ b/drivers/dma/ti/edma.c @@ -2064,8 +2064,8 @@ static int edma_setup_from_hw(struct device *dev, struct edma_soc_info *pdata, * priority. So Q0 is the highest priority queue and the last queue has * the lowest priority. */ - queue_priority_map = devm_kcalloc(dev, ecc->num_tc + 1, sizeof(s8), - GFP_KERNEL); + queue_priority_map = devm_kcalloc(dev, ecc->num_tc + 1, + sizeof(*queue_priority_map), GFP_KERNEL); if (!queue_priority_map) return -ENOMEM;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Kemnade akemnade@kernel.org
[ Upstream commit c05d0b32eebadc8be6e53196e99c64cf2bed1d99 ]
Attach the power good gpio to the regulator device devres instead of the parent device to fix problems if probe is run multiple times (rmmod/insmod or some deferral).
Fixes: 8c485bedfb785 ("regulator: sy7636a: Initial commit") Signed-off-by: Andreas Kemnade akemnade@kernel.org Reviewed-by: Alistair Francis alistair@alistair23.me Reviewed-by: Peng Fan peng.fan@nxp.com Message-ID: 20250906-sy7636-rsrc-v1-2-e2886a9763a7@kernel.org Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/sy7636a-regulator.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/regulator/sy7636a-regulator.c b/drivers/regulator/sy7636a-regulator.c index d1e7ba1fb3e1a..27e3d939b7bb9 100644 --- a/drivers/regulator/sy7636a-regulator.c +++ b/drivers/regulator/sy7636a-regulator.c @@ -83,9 +83,11 @@ static int sy7636a_regulator_probe(struct platform_device *pdev) if (!regmap) return -EPROBE_DEFER;
- gdp = devm_gpiod_get(pdev->dev.parent, "epd-pwr-good", GPIOD_IN); + device_set_of_node_from_dev(&pdev->dev, pdev->dev.parent); + + gdp = devm_gpiod_get(&pdev->dev, "epd-pwr-good", GPIOD_IN); if (IS_ERR(gdp)) { - dev_err(pdev->dev.parent, "Power good GPIO fault %ld\n", PTR_ERR(gdp)); + dev_err(&pdev->dev, "Power good GPIO fault %ld\n", PTR_ERR(gdp)); return PTR_ERR(gdp); }
@@ -105,7 +107,6 @@ static int sy7636a_regulator_probe(struct platform_device *pdev) }
config.dev = &pdev->dev; - config.dev->of_node = pdev->dev.parent->of_node; config.regmap = regmap;
rdev = devm_regulator_register(&pdev->dev, &desc, &config);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuezhang Mo Yuezhang.Mo@sony.com
[ Upstream commit 181993bb0d626cf88cc803f4356ce5c5abe86278 ]
Commit 0e2f80afcfa6 ("fs/dax: ensure all pages are idle prior to filesystem unmount") introduced the WARN_ON_ONCE to capture whether the filesystem has removed all DAX entries or not, and applied the fix to xfs and ext4.
Apply the missed fix on erofs to fix the runtime warning:
[ 5.266254] ------------[ cut here ]------------
[ 5.266274] WARNING: CPU: 6 PID: 3109 at mm/truncate.c:89 truncate_folio_batch_exceptionals+0xff/0x260
[ 5.266294] Modules linked in:
[ 5.266999] CPU: 6 UID: 0 PID: 3109 Comm: umount Tainted: G S 6.16.0+ #6 PREEMPT(voluntary)
[ 5.267012] Tainted: [S]=CPU_OUT_OF_SPEC
[ 5.267017] Hardware name: Dell Inc. OptiPlex 5000/05WXFV, BIOS 1.5.1 08/24/2022
[ 5.267024] RIP: 0010:truncate_folio_batch_exceptionals+0xff/0x260
[ 5.267076] Code: 00 00 41 39 df 7f 11 eb 78 83 c3 01 49 83 c4 08 41 39 df 74 6c 48 63 f3 48 83 fe 1f 0f 83 3c 01 00 00 43 f6 44 26 08 01 74 df <0f> 0b 4a 8b 34 22 4c 89 ef 48 89 55 90 e8 ff 54 1f 00 48 8b 55 90
[ 5.267083] RSP: 0018:ffffc900013f36c8 EFLAGS: 00010202
[ 5.267095] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 5.267101] RDX: ffffc900013f3790 RSI: 0000000000000000 RDI: ffff8882a1407898
[ 5.267108] RBP: ffffc900013f3740 R08: 0000000000000000 R09: 0000000000000000
[ 5.267113] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 5.267119] R13: ffff8882a1407ab8 R14: ffffc900013f3888 R15: 0000000000000001
[ 5.267125] FS: 00007aaa8b437800(0000) GS:ffff88850025b000(0000) knlGS:0000000000000000
[ 5.267132] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.267138] CR2: 00007aaa8b3aac10 CR3: 000000024f764000 CR4: 0000000000f52ef0
[ 5.267144] PKRU: 55555554
[ 5.267150] Call Trace:
[ 5.267154] <TASK>
[ 5.267181] truncate_inode_pages_range+0x118/0x5e0
[ 5.267193] ? save_trace+0x54/0x390
[ 5.267296] truncate_inode_pages_final+0x43/0x60
[ 5.267309] evict+0x2a4/0x2c0
[ 5.267339] dispose_list+0x39/0x80
[ 5.267352] evict_inodes+0x150/0x1b0
[ 5.267376] generic_shutdown_super+0x41/0x180
[ 5.267390] kill_block_super+0x1b/0x50
[ 5.267402] erofs_kill_sb+0x81/0x90 [erofs]
[ 5.267436] deactivate_locked_super+0x32/0xb0
[ 5.267450] deactivate_super+0x46/0x60
[ 5.267460] cleanup_mnt+0xc3/0x170
[ 5.267475] __cleanup_mnt+0x12/0x20
[ 5.267485] task_work_run+0x5d/0xb0
[ 5.267499] exit_to_user_mode_loop+0x144/0x170
[ 5.267512] do_syscall_64+0x2b9/0x7c0
[ 5.267523] ? __lock_acquire+0x665/0x2ce0
[ 5.267535] ? __lock_acquire+0x665/0x2ce0
[ 5.267560] ? lock_acquire+0xcd/0x300
[ 5.267573] ? find_held_lock+0x31/0x90
[ 5.267582] ? mntput_no_expire+0x97/0x4e0
[ 5.267606] ? mntput_no_expire+0xa1/0x4e0
[ 5.267625] ? mntput+0x24/0x50
[ 5.267634] ? path_put+0x1e/0x30
[ 5.267647] ? do_faccessat+0x120/0x2f0
[ 5.267677] ? do_syscall_64+0x1a2/0x7c0
[ 5.267686] ? from_kgid_munged+0x17/0x30
[ 5.267703] ? from_kuid_munged+0x13/0x30
[ 5.267711] ? __do_sys_getuid+0x3d/0x50
[ 5.267724] ? do_syscall_64+0x1a2/0x7c0
[ 5.267732] ? irqentry_exit+0x77/0xb0
[ 5.267743] ? clear_bhb_loop+0x30/0x80
[ 5.267752] ? clear_bhb_loop+0x30/0x80
[ 5.267765] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 5.267772] RIP: 0033:0x7aaa8b32a9fb
[ 5.267781] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 83 0d 00 f7 d8
[ 5.267787] RSP: 002b:00007ffd7c4c9468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 5.267796] RAX: 0000000000000000 RBX: 00005a61592a8b00 RCX: 00007aaa8b32a9fb
[ 5.267802] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005a61592b2080
[ 5.267806] RBP: 00007ffd7c4c9540 R08: 00007aaa8b403b20 R09: 0000000000000020
[ 5.267812] R10: 0000000000000001 R11: 0000000000000246 R12: 00005a61592a8c00
[ 5.267817] R13: 0000000000000000 R14: 00005a61592b2080 R15: 00005a61592a8f10
[ 5.267849] </TASK>
[ 5.267854] irq event stamp: 4721
[ 5.267859] hardirqs last enabled at (4727): [<ffffffff814abf50>] __up_console_sem+0x90/0xa0
[ 5.267873] hardirqs last disabled at (4732): [<ffffffff814abf35>] __up_console_sem+0x75/0xa0
[ 5.267884] softirqs last enabled at (3044): [<ffffffff8132adb3>] kernel_fpu_end+0x53/0x70
[ 5.267895] softirqs last disabled at (3042): [<ffffffff8132b5f4>] kernel_fpu_begin_mask+0xc4/0x120
[ 5.267905] ---[ end trace 0000000000000000 ]---
Fixes: bde708f1a65d ("fs/dax: always remove DAX page-cache entries when breaking layouts") Signed-off-by: Yuezhang Mo Yuezhang.Mo@sony.com Reviewed-by: Friendy Su friendy.su@sony.com Reviewed-by: Daniel Palmer daniel.palmer@sony.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/erofs/super.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 7cc74ef4be031..06c8981eea7f8 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -999,10 +999,22 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root) return 0; }
+static void erofs_evict_inode(struct inode *inode) +{ +#ifdef CONFIG_FS_DAX + if (IS_DAX(inode)) + dax_break_layout_final(inode); +#endif + + truncate_inode_pages_final(&inode->i_data); + clear_inode(inode); +} + const struct super_operations erofs_sops = { .put_super = erofs_put_super, .alloc_inode = erofs_alloc_inode, .free_inode = erofs_free_inode, + .evict_inode = erofs_evict_inode, .statfs = erofs_statfs, .show_options = erofs_show_options, };
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathias Nyman mathias.nyman@linux.intel.com
commit 220a0ffde02f962c13bc752b01aa570b8c65a37b upstream.
Decouple allocation of the endpoint ring buffer from initialization of the buffer, and initialization of the endpoint context parts from the rest of the contexts.
It allows driver to clear up and reinitialize endpoint rings after disconnect without reallocating everything.
This is a prerequisite for the next patch that prevents the transfer ring from filling up with cancelled (no-op) TRBs if a debug cable is reconnected several times without transferring anything.
Cc: stable@vger.kernel.org Fixes: dfba2174dc42 ("usb: xhci: Add DbC support in xHCI driver") Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20250902105306.877476-2-mathias.nyman@linux.intel.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-dbgcap.c | 71 ++++++++++++++++++++++++++--------------- 1 file changed, 46 insertions(+), 25 deletions(-)
--- a/drivers/usb/host/xhci-dbgcap.c +++ b/drivers/usb/host/xhci-dbgcap.c @@ -101,13 +101,34 @@ static u32 xhci_dbc_populate_strings(str return string_length; }
+static void xhci_dbc_init_ep_contexts(struct xhci_dbc *dbc) +{ + struct xhci_ep_ctx *ep_ctx; + unsigned int max_burst; + dma_addr_t deq; + + max_burst = DBC_CTRL_MAXBURST(readl(&dbc->regs->control)); + + /* Populate bulk out endpoint context: */ + ep_ctx = dbc_bulkout_ctx(dbc); + deq = dbc_bulkout_enq(dbc); + ep_ctx->ep_info = 0; + ep_ctx->ep_info2 = dbc_epctx_info2(BULK_OUT_EP, 1024, max_burst); + ep_ctx->deq = cpu_to_le64(deq | dbc->ring_out->cycle_state); + + /* Populate bulk in endpoint context: */ + ep_ctx = dbc_bulkin_ctx(dbc); + deq = dbc_bulkin_enq(dbc); + ep_ctx->ep_info = 0; + ep_ctx->ep_info2 = dbc_epctx_info2(BULK_IN_EP, 1024, max_burst); + ep_ctx->deq = cpu_to_le64(deq | dbc->ring_in->cycle_state); +} + static void xhci_dbc_init_contexts(struct xhci_dbc *dbc, u32 string_length) { struct dbc_info_context *info; - struct xhci_ep_ctx *ep_ctx; u32 dev_info; - dma_addr_t deq, dma; - unsigned int max_burst; + dma_addr_t dma;
if (!dbc) return; @@ -121,20 +142,8 @@ static void xhci_dbc_init_contexts(struc info->serial = cpu_to_le64(dma + DBC_MAX_STRING_LENGTH * 3); info->length = cpu_to_le32(string_length);
- /* Populate bulk out endpoint context: */ - ep_ctx = dbc_bulkout_ctx(dbc); - max_burst = DBC_CTRL_MAXBURST(readl(&dbc->regs->control)); - deq = dbc_bulkout_enq(dbc); - ep_ctx->ep_info = 0; - ep_ctx->ep_info2 = dbc_epctx_info2(BULK_OUT_EP, 1024, max_burst); - ep_ctx->deq = cpu_to_le64(deq | dbc->ring_out->cycle_state); - - /* Populate bulk in endpoint context: */ - ep_ctx = dbc_bulkin_ctx(dbc); - deq = dbc_bulkin_enq(dbc); - ep_ctx->ep_info = 0; - ep_ctx->ep_info2 = dbc_epctx_info2(BULK_IN_EP, 1024, max_burst); - ep_ctx->deq = cpu_to_le64(deq | dbc->ring_in->cycle_state); + /* Populate bulk in and out endpoint contexts: */ + xhci_dbc_init_ep_contexts(dbc);
/* Set DbC context and info registers: */ lo_hi_writeq(dbc->ctx->dma, &dbc->regs->dccp); @@ -436,6 +445,23 @@ dbc_alloc_ctx(struct device *dev, gfp_t return ctx; }
+static void xhci_dbc_ring_init(struct xhci_ring *ring) +{ + struct xhci_segment *seg = ring->first_seg; + + /* clear all trbs on ring in case of old ring */ + memset(seg->trbs, 0, TRB_SEGMENT_SIZE); + + /* Only event ring does not use link TRB */ + if (ring->type != TYPE_EVENT) { + union xhci_trb *trb = &seg->trbs[TRBS_PER_SEGMENT - 1]; + + trb->link.segment_ptr = cpu_to_le64(ring->first_seg->dma); + trb->link.control = cpu_to_le32(LINK_TOGGLE | TRB_TYPE(TRB_LINK)); + } + xhci_initialize_ring_info(ring); +} + static struct xhci_ring * xhci_dbc_ring_alloc(struct device *dev, enum xhci_ring_type type, gfp_t flags) { @@ -464,15 +490,10 @@ xhci_dbc_ring_alloc(struct device *dev,
seg->dma = dma;
- /* Only event ring does not use link TRB */ - if (type != TYPE_EVENT) { - union xhci_trb *trb = &seg->trbs[TRBS_PER_SEGMENT - 1]; - - trb->link.segment_ptr = cpu_to_le64(dma); - trb->link.control = cpu_to_le32(LINK_TOGGLE | TRB_TYPE(TRB_LINK)); - } INIT_LIST_HEAD(&ring->td_list); - xhci_initialize_ring_info(ring); + + xhci_dbc_ring_init(ring); + return ring; dma_fail: kfree(seg);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathias Nyman mathias.nyman@linux.intel.com
commit a5c98e8b1398534ae1feb6e95e2d3ee5215538ed upstream.
Pending requests will be flushed on disconnect, and the corresponding TRBs will be turned into No-op TRBs, which are ignored by the xHC controller once it starts processing the ring.
If the USB debug cable repeatedly disconnects before the ring is started, then the ring will eventually be filled with No-op TRBs. No new transfers can be queued when the ring is full, and the driver will print the following error message:
"xhci_hcd 0000:00:14.0: failed to queue trbs"
This is a normal case for 'in' transfers, where TRBs are always enqueued in advance, ready to take on incoming data. If no data arrives and the device is disconnected, then the ring dequeue will remain at the beginning of the ring while enqueue points to the first free TRB after the last cancelled No-op TRB.

Solve this by reinitializing the rings when the debug cable disconnects and DbC is leaving the configured state. Clear the whole ring buffer and set enqueue and dequeue to the beginning of the ring, and set the cycle bit to its initial state.
Cc: stable@vger.kernel.org Fixes: dfba2174dc42 ("usb: xhci: Add DbC support in xHCI driver") Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20250902105306.877476-3-mathias.nyman@linux.intel.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-dbgcap.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-)
--- a/drivers/usb/host/xhci-dbgcap.c +++ b/drivers/usb/host/xhci-dbgcap.c @@ -462,6 +462,25 @@ static void xhci_dbc_ring_init(struct xh xhci_initialize_ring_info(ring); }
+static int xhci_dbc_reinit_ep_rings(struct xhci_dbc *dbc) +{ + struct xhci_ring *in_ring = dbc->eps[BULK_IN].ring; + struct xhci_ring *out_ring = dbc->eps[BULK_OUT].ring; + + if (!in_ring || !out_ring || !dbc->ctx) { + dev_warn(dbc->dev, "Can't re-init unallocated endpoints\n"); + return -ENODEV; + } + + xhci_dbc_ring_init(in_ring); + xhci_dbc_ring_init(out_ring); + + /* set ep context enqueue, dequeue, and cycle to initial values */ + xhci_dbc_init_ep_contexts(dbc); + + return 0; +} + static struct xhci_ring * xhci_dbc_ring_alloc(struct device *dev, enum xhci_ring_type type, gfp_t flags) { @@ -885,7 +904,7 @@ static enum evtreturn xhci_dbc_do_handle dev_info(dbc->dev, "DbC cable unplugged\n"); dbc->state = DS_ENABLED; xhci_dbc_flush_requests(dbc); - + xhci_dbc_reinit_ep_rings(dbc); return EVT_DISC; }
@@ -895,7 +914,7 @@ static enum evtreturn xhci_dbc_do_handle writel(portsc, &dbc->regs->portsc); dbc->state = DS_ENABLED; xhci_dbc_flush_requests(dbc); - + xhci_dbc_reinit_ep_rings(dbc); return EVT_DISC; }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathias Nyman mathias.nyman@linux.intel.com
commit edcbe06453ddfde21f6aa763f7cab655f26133cc upstream.
A suspend-resume cycle test revealed a memory leak in 6.17-rc3.

It turns out the slot_id race fix changes accidentally end up calling xhci_free_virt_device() with an incorrect vdev parameter. The vdev variable was reused for temporary purposes right before calling xhci_free_virt_device().
Fix this by passing the correct vdev parameter.
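A condensed sketch (illustrative only) of the pitfall in the depth-first free helper:

	struct xhci_virt_device *vdev = xhci->devs[slot_id];	/* device meant to be freed */
	...
	/* vdev gets reused as a temporary while walking child devices */
	xhci_free_virt_device(xhci, xhci->devs[slot_id], slot_id);	/* re-read rather than trusting vdev */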
The slot_id race fix that caused this regression was targeted for stable, so this needs to be applied there as well.
Fixes: 2eb03376151b ("usb: xhci: Fix slot_id resource race conflict") Reported-by: David Wang 00107082@163.com Closes: https://lore.kernel.org/linux-usb/20250829181354.4450-1-00107082@163.com Suggested-by: Michal Pecio michal.pecio@gmail.com Suggested-by: David Wang 00107082@163.com Cc: stable@vger.kernel.org Tested-by: David Wang 00107082@163.com Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20250902105306.877476-4-mathias.nyman@linux.intel.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-mem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/usb/host/xhci-mem.c +++ b/drivers/usb/host/xhci-mem.c @@ -962,7 +962,7 @@ static void xhci_free_virt_devices_depth out: /* we are now at a leaf device */ xhci_debugfs_remove_slot(xhci, slot_id); - xhci_free_virt_device(xhci, vdev, slot_id); + xhci_free_virt_device(xhci, xhci->devs[slot_id], slot_id); }
int xhci_alloc_virt_device(struct xhci_hcd *xhci, int slot_id,
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alan Stern stern@rowland.harvard.edu
commit 8d63c83d8eb922f6c316320f50c82fa88d099bea upstream.
Yunseong Kim and the syzbot fuzzer both reported a problem in RT-enabled kernels caused by the way dummy-hcd mixes interrupt management and spin-locking. The pattern was:
	local_irq_save(flags);
	spin_lock(&dum->lock);
	...
	spin_unlock(&dum->lock);
	...				// calls usb_gadget_giveback_request()
	local_irq_restore(flags);
The code was written this way because usb_gadget_giveback_request() needs to be called with interrupts disabled and the private lock not held.
While this pattern works fine in non-RT kernels, it's not good when RT is enabled. RT kernels handle spinlocks much like mutexes; in particular, spin_lock() may sleep. But sleeping is not allowed while local interrupts are disabled.
To fix the problem, rewrite the code to conform to the pattern used elsewhere in dummy-hcd and other UDC drivers:
	spin_lock_irqsave(&dum->lock, flags);
	...
	spin_unlock(&dum->lock);
	usb_gadget_giveback_request(...);
	spin_lock(&dum->lock);
	...
	spin_unlock_irqrestore(&dum->lock, flags);
This approach satisfies the RT requirements.
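As a sketch of what that pattern looks like in a complete function (illustrative only, not the dummy-hcd source), the private lock is dropped around the completion callback while the irqsave/irqrestore pair still brackets the whole critical section:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_lock);

/* stand-in for usb_gadget_giveback_request(): must be called
 * without the private lock held */
static void demo_giveback(void)
{
}

static void demo_complete(void)
{
	unsigned long flags;

	spin_lock_irqsave(&demo_lock, flags);
	/* ... update the request queue under the lock ... */
	spin_unlock(&demo_lock);
	demo_giveback();	/* callback runs with the lock dropped */
	spin_lock(&demo_lock);
	/* ... finish bookkeeping under the lock ... */
	spin_unlock_irqrestore(&demo_lock, flags);
}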
Signed-off-by: Alan Stern stern@rowland.harvard.edu
Cc: stable stable@kernel.org
Fixes: b4dbda1a22d2 ("USB: dummy-hcd: disable interrupts during req->complete")
Reported-by: Yunseong Kim ysk@kzalloc.com
Closes: https://lore.kernel.org/linux-usb/5b337389-73b9-4ee4-a83e-7e82bf5af87a@kzalloc.com/
Reported-by: syzbot+8baacc4139f12fa77909@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-usb/68ac2411.050a0220.37038e.0087.GAE@google.com/
Tested-by: syzbot+8baacc4139f12fa77909@syzkaller.appspotmail.com
CC: Sebastian Andrzej Siewior bigeasy@linutronix.de
CC: stable@vger.kernel.org
Reviewed-by: Sebastian Andrzej Siewior bigeasy@linutronix.de
Link: https://lore.kernel.org/r/bb192ae2-4eee-48ee-981f-3efdbbd0d8f0@rowland.harva...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/gadget/udc/dummy_hcd.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/usb/gadget/udc/dummy_hcd.c
+++ b/drivers/usb/gadget/udc/dummy_hcd.c
@@ -765,8 +765,7 @@ static int dummy_dequeue(struct usb_ep *
 	if (!dum->driver)
 		return -ESHUTDOWN;
 
-	local_irq_save(flags);
-	spin_lock(&dum->lock);
+	spin_lock_irqsave(&dum->lock, flags);
 	list_for_each_entry(iter, &ep->queue, queue) {
 		if (&iter->req != _req)
 			continue;
@@ -776,15 +775,16 @@ static int dummy_dequeue(struct usb_ep *
 		retval = 0;
 		break;
 	}
-	spin_unlock(&dum->lock);
 
 	if (retval == 0) {
 		dev_dbg(udc_dev(dum),
 			"dequeued req %p from %s, len %d buf %p\n",
 			req, _ep->name, _req->length, _req->buf);
+		spin_unlock(&dum->lock);
 		usb_gadget_giveback_request(_ep, _req);
+		spin_lock(&dum->lock);
 	}
-	local_irq_restore(flags);
+	spin_unlock_irqrestore(&dum->lock, flags);
 	return retval;
 }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: RD Babiera rdbabiera@google.com
commit f34bfcc77b18375a87091c289c2eb53c249787b4 upstream.
tcpm_handle_vdm_request delivers messages to the partner altmode or the cable altmode depending on the SVDM response type, which is incorrect. The partner or cable should be chosen based on the received message type instead.
Also add this filter to ADEV_NOTIFY_USB_AND_QUEUE_VDM, which is used when the Enter Mode command is responded to by a NAK on SOP or SOP' and when the Exit Mode command is responded to by an ACK on SOP.
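As an illustrative sketch of the routing rule, with hypothetical stand-in names rather than the tcpm API, the dispatch keys off where the message was received, not off the response type that will be sent:

enum sop { SOP, SOP_PRIME };

static void deliver_to_cable(void)   { /* SOP': cable plug altmode */ }
static void deliver_to_partner(void) { /* SOP: port partner altmode */ }

static void deliver_vdm(enum sop rx_sop_type)
{
	if (rx_sop_type == SOP_PRIME)
		deliver_to_cable();
	else
		deliver_to_partner();
}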
Fixes: 7e7877c55eb1 ("usb: typec: tcpm: add alt mode enter/exit/vdm support for sop'")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera rdbabiera@google.com
Reviewed-by: Badhri Jagan Sridharan badhri@google.com
Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com
Link: https://lore.kernel.org/r/20250821203759.1720841-2-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/typec/tcpm/tcpm.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -2426,17 +2426,21 @@ static void tcpm_handle_vdm_request(stru
 	case ADEV_NONE:
 		break;
 	case ADEV_NOTIFY_USB_AND_QUEUE_VDM:
-		WARN_ON(typec_altmode_notify(adev, TYPEC_STATE_USB, NULL));
-		typec_altmode_vdm(adev, p[0], &p[1], cnt);
+		if (rx_sop_type == TCPC_TX_SOP_PRIME) {
+			typec_cable_altmode_vdm(adev, TYPEC_PLUG_SOP_P, p[0], &p[1], cnt);
+		} else {
+			WARN_ON(typec_altmode_notify(adev, TYPEC_STATE_USB, NULL));
+			typec_altmode_vdm(adev, p[0], &p[1], cnt);
+		}
 		break;
 	case ADEV_QUEUE_VDM:
-		if (response_tx_sop_type == TCPC_TX_SOP_PRIME)
+		if (rx_sop_type == TCPC_TX_SOP_PRIME)
 			typec_cable_altmode_vdm(adev, TYPEC_PLUG_SOP_P, p[0], &p[1], cnt);
 		else
 			typec_altmode_vdm(adev, p[0], &p[1], cnt);
 		break;
 	case ADEV_QUEUE_VDM_SEND_EXIT_MODE_ON_FAIL:
-		if (response_tx_sop_type == TCPC_TX_SOP_PRIME) {
+		if (rx_sop_type == TCPC_TX_SOP_PRIME) {
 			if (typec_cable_altmode_vdm(adev, TYPEC_PLUG_SOP_P, p[0], &p[1], cnt)) {
 				int svdm_version = typec_get_cable_svdm_version(
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 21d8525d2e061cde034277d518411b02eac764e2 upstream.
The gadget card driver forgot to call snd_ump_update_group_attrs() after adding FBs, which leaves the UMP group attributes uninitialized. As a result, opening a legacy rawmidi device fails with -ENODEV because the group appears inactive.
This patch adds the missing call to address the behavior above.
Fixes: 8b645922b223 ("usb: gadget: Add support for USB MIDI 2.0 function driver")
Cc: stable stable@kernel.org
Signed-off-by: Takashi Iwai tiwai@suse.de
Link: https://lore.kernel.org/r/20250904153932.13589-1-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/gadget/function/f_midi2.c | 1 +
 1 file changed, 1 insertion(+)
--- a/drivers/usb/gadget/function/f_midi2.c
+++ b/drivers/usb/gadget/function/f_midi2.c
@@ -1599,6 +1599,7 @@ static int f_midi2_create_card(struct f_
 			strscpy(fb->info.name, ump_fb_name(b),
 				sizeof(fb->info.name));
 		}
+		snd_ump_update_group_attrs(ump);
 	}
 
 	for (i = 0; i < midi2->num_eps; i++) {
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 116e79c679a1530cf833d0ff3007061d7a716bd9 upstream.
The EP-IN of MIDI2 (altset 1) wasn't initialized in f_midi2_create_usb_configs(), as it's an INT EP, unlike the other BULK EPs. This leaves its max packet size unchanged no matter which speed is used, resulting in very slow access. Since the wMaxPacketSize values set there are also legitimate for INT EPs, initialize the MIDI2 EP-IN there as well to achieve equivalent speed.
Fixes: 8b645922b223 ("usb: gadget: Add support for USB MIDI 2.0 function driver")
Cc: stable stable@kernel.org
Signed-off-by: Takashi Iwai tiwai@suse.de
Link: https://lore.kernel.org/r/20250905133240.20966-1-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/gadget/function/f_midi2.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/drivers/usb/gadget/function/f_midi2.c
+++ b/drivers/usb/gadget/function/f_midi2.c
@@ -1737,9 +1737,12 @@ static int f_midi2_create_usb_configs(st
 	case USB_SPEED_HIGH:
 		midi2_midi1_ep_out_desc.wMaxPacketSize = cpu_to_le16(512);
 		midi2_midi1_ep_in_desc.wMaxPacketSize = cpu_to_le16(512);
-		for (i = 0; i < midi2->num_eps; i++)
+		for (i = 0; i < midi2->num_eps; i++) {
 			midi2_midi2_ep_out_desc[i].wMaxPacketSize =
 				cpu_to_le16(512);
+			midi2_midi2_ep_in_desc[i].wMaxPacketSize =
+				cpu_to_le16(512);
+		}
 		fallthrough;
 	case USB_SPEED_FULL:
 		midi1_in_eps = midi2_midi1_ep_in_descs;
@@ -1748,9 +1751,12 @@ static int f_midi2_create_usb_configs(st
 	case USB_SPEED_SUPER:
 		midi2_midi1_ep_out_desc.wMaxPacketSize = cpu_to_le16(1024);
 		midi2_midi1_ep_in_desc.wMaxPacketSize = cpu_to_le16(1024);
-		for (i = 0; i < midi2->num_eps; i++)
+		for (i = 0; i < midi2->num_eps; i++) {
 			midi2_midi2_ep_out_desc[i].wMaxPacketSize =
 				cpu_to_le16(1024);
+			midi2_midi2_ep_in_desc[i].wMaxPacketSize =
+				cpu_to_le16(1024);
+		}
 		midi1_in_eps = midi2_midi1_ep_in_ss_descs;
 		midi1_out_eps = midi2_midi1_ep_out_ss_descs;
 		break;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephan Gerhold stephan.gerhold@linaro.org
commit 5068b5254812433e841a40886e695633148d362d upstream.
When we don't have a clock specified in the device tree, we have no way to ensure the BAM is on. This is often the case for remotely-controlled or remotely-powered BAM instances. In this case, we need to read num-channels from the DT to have all the necessary information to complete probing.
However, at the moment invalid device trees without clock and without num-channels still continue probing, because the error handling is missing return statements. The driver will then later try to read the number of channels from the registers. This is unsafe, because it relies on boot firmware and lucky timing to succeed. Unfortunately, the lack of proper error handling here has been abused for several Qualcomm SoCs upstream, causing early boot crashes in several situations [1, 2].
Avoid these early crashes by erroring out when any of the required DT properties are missing. Note that this will break some of the existing DTs upstream (mainly BAM instances related to the crypto engine). However, clearly these DTs have never been tested properly, since the error in the kernel log was just ignored. It's safer to disable the crypto engine for these broken DTBs.
[1]: https://lore.kernel.org/r/CY01EKQVWE36.B9X5TDXAREPF@fairphone.com/ [2]: https://lore.kernel.org/r/20230626145959.646747-1-krzysztof.kozlowski@linaro...
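A minimal sketch of the resulting probe-time validation, simplified from the fix (the helper name is illustrative): a missing required property must fail probe instead of falling through to unsafe register reads.

#include <linux/of.h>
#include <linux/platform_device.h>

static int read_required_u32(struct platform_device *pdev,
			     const char *prop, u32 *val)
{
	int ret = of_property_read_u32(pdev->dev.of_node, prop, val);

	if (ret) {
		dev_err(&pdev->dev, "%s unspecified in dt\n", prop);
		return ret;	/* the return the old code was missing */
	}
	return 0;
}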
Cc: stable@vger.kernel.org
Fixes: 48d163b1aa6e ("dmaengine: qcom: bam_dma: get num-channels and num-ees from dt")
Signed-off-by: Stephan Gerhold stephan.gerhold@linaro.org
Reviewed-by: Konrad Dybcio konrad.dybcio@oss.qualcomm.com
Link: https://lore.kernel.org/r/20250212-bam-dma-fixes-v1-8-f560889e65d8@linaro.or...
Signed-off-by: Vinod Koul vkoul@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/dma/qcom/bam_dma.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/dma/qcom/bam_dma.c
+++ b/drivers/dma/qcom/bam_dma.c
@@ -1283,13 +1283,17 @@ static int bam_dma_probe(struct platform
 	if (!bdev->bamclk) {
 		ret = of_property_read_u32(pdev->dev.of_node, "num-channels",
 					   &bdev->num_channels);
-		if (ret)
+		if (ret) {
 			dev_err(bdev->dev, "num-channels unspecified in dt\n");
+			return ret;
+		}
 
 		ret = of_property_read_u32(pdev->dev.of_node, "qcom,num-ees",
 					   &bdev->num_ees);
-		if (ret)
+		if (ret) {
 			dev_err(bdev->dev, "num-ees unspecified in dt\n");
+			return ret;
+		}
 	}
 
 	ret = clk_prepare_enable(bdev->bamclk);
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miaoqian Lin linmq006@gmail.com
commit aa2e1e4563d3ab689ffa86ca1412ecbf9fd3b308 upstream.
The reference taken by of_find_device_by_node() must be released when not needed anymore. Add missing put_device() call to fix device reference leaks.
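A minimal sketch of the pattern, with illustrative names rather than the rzn1-dmamux code: the reference taken by of_find_device_by_node() is dropped on every exit path, success and error alike.

#include <linux/err.h>
#include <linux/of_platform.h>
#include <linux/slab.h>

static void *route_allocate(struct device_node *np)
{
	struct platform_device *pdev;
	void *map;
	int ret;

	pdev = of_find_device_by_node(np);	/* takes a device reference */
	if (!pdev)
		return ERR_PTR(-ENODEV);

	map = kzalloc(sizeof(u32), GFP_KERNEL);
	if (!map) {
		ret = -ENOMEM;
		goto put_device;
	}

	put_device(&pdev->dev);		/* reference only needed for setup */
	return map;

put_device:
	put_device(&pdev->dev);
	return ERR_PTR(ret);
}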
Fixes: 134d9c52fca2 ("dmaengine: dw: dmamux: Introduce RZN1 DMA router support")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin linmq006@gmail.com
Reviewed-by: Miquel Raynal miquel.raynal@bootlin.com
Link: https://lore.kernel.org/r/20250902090358.2423285-1-linmq006@gmail.com
Signed-off-by: Vinod Koul vkoul@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/dma/dw/rzn1-dmamux.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)
--- a/drivers/dma/dw/rzn1-dmamux.c
+++ b/drivers/dma/dw/rzn1-dmamux.c
@@ -48,12 +48,16 @@ static void *rzn1_dmamux_route_allocate(
 	u32 mask;
 	int ret;
 
-	if (dma_spec->args_count != RNZ1_DMAMUX_NCELLS)
-		return ERR_PTR(-EINVAL);
+	if (dma_spec->args_count != RNZ1_DMAMUX_NCELLS) {
+		ret = -EINVAL;
+		goto put_device;
+	}
 
 	map = kzalloc(sizeof(*map), GFP_KERNEL);
-	if (!map)
-		return ERR_PTR(-ENOMEM);
+	if (!map) {
+		ret = -ENOMEM;
+		goto put_device;
+	}
 
 	chan = dma_spec->args[0];
 	map->req_idx = dma_spec->args[4];
@@ -94,12 +98,15 @@ static void *rzn1_dmamux_route_allocate(
 	if (ret)
 		goto clear_bitmap;
 
+	put_device(&pdev->dev);
 	return map;
 
clear_bitmap:
 	clear_bit(map->req_idx, dmamux->used_chans);
free_map:
 	kfree(map);
+put_device:
+	put_device(&pdev->dev);
 
 	return ERR_PTR(ret);
 }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephan Gerhold stephan.gerhold@linaro.org
commit 6cb8c1f957f674ca20b7d7c96b1f1bb11b83b679 upstream.
Commit 0cc22f5a861c ("phy: qcom: qmp-pcie: Add PHY register retention support") added support for using the "no_csr" reset to skip configuration of the PHY if the init sequence was already applied by the boot firmware. The expectation is that the PHY is only turned on/off by using the "no_csr" reset, instead of powering it down and re-programming it after a full reset.
The boot firmware on X1E does not fully conform to this expectation: If the PCIe3 link fails to come up (e.g. because no PCIe card is inserted), the firmware powers down the PHY using the QPHY_PCS_POWER_DOWN_CONTROL register. The QPHY_START_CTRL register is kept as-is, so the driver assumes the PHY is already initialized and skips the configuration/power up sequence. The PHY won't come up again without clearing the QPHY_PCS_POWER_DOWN_CONTROL, so eventually initialization fails:
  qcom-qmp-pcie-phy 1be0000.phy: phy initialization timed-out
  phy phy-1be0000.phy.0: phy poweron failed --> -110
  qcom-pcie 1bd0000.pcie: cannot initialize host
  qcom-pcie 1bd0000.pcie: probe with driver qcom-pcie failed with error -110
This can be reliably reproduced on the X1E CRD, QCP and Devkit when no card is inserted for PCIe3.
Fix this by checking the QPHY_PCS_POWER_DOWN_CONTROL register in addition to QPHY_START_CTRL. If the PHY is powered down with the register, it doesn't conform to the expectations for using the "no_csr" reset, so we fully re-initialize with the normal reset sequence.
Also check the register more carefully to ensure all of the bits we expect are actually set. A simple !!(readl()) is not enough, because the PHY might be only partially set up with some of the expected bits set.
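The stricter check boils down to a helper like the following sketch (mirroring the qphy_checkbits() added in the diff below): it requires every expected bit, rather than treating any non-zero register value as "PHY initialized".

#include <linux/io.h>
#include <linux/types.h>

static bool check_bits(const void __iomem *base, u32 offset, u32 val)
{
	/* true only if *all* bits in val are set; a bare !!readl()
	 * would wrongly accept a partially configured PHY */
	return (readl(base + offset) & val) == val;
}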
Cc: stable@vger.kernel.org
Fixes: 0cc22f5a861c ("phy: qcom: qmp-pcie: Add PHY register retention support")
Signed-off-by: Stephan Gerhold stephan.gerhold@linaro.org
Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@oss.qualcomm.com
Link: https://lore.kernel.org/r/20250821-phy-qcom-qmp-pcie-nocsr-fix-v3-1-4898db0c...
Signed-off-by: Vinod Koul vkoul@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/phy/qualcomm/phy-qcom-qmp-pcie.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)
--- a/drivers/phy/qualcomm/phy-qcom-qmp-pcie.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp-pcie.c
@@ -3064,6 +3064,14 @@ struct qmp_pcie {
 	struct clk_fixed_rate aux_clk_fixed;
 };
 
+static bool qphy_checkbits(const void __iomem *base, u32 offset, u32 val)
+{
+	u32 reg;
+
+	reg = readl(base + offset);
+	return (reg & val) == val;
+}
+
 static inline void qphy_setbits(void __iomem *base, u32 offset, u32 val)
 {
 	u32 reg;
@@ -4332,16 +4340,21 @@ static int qmp_pcie_init(struct phy *phy
 	struct qmp_pcie *qmp = phy_get_drvdata(phy);
 	const struct qmp_phy_cfg *cfg = qmp->cfg;
 	void __iomem *pcs = qmp->pcs;
-	bool phy_initialized = !!(readl(pcs + cfg->regs[QPHY_START_CTRL]));
 	int ret;
 
-	qmp->skip_init = qmp->nocsr_reset && phy_initialized;
 	/*
-	 * We need to check the existence of init sequences in two cases:
-	 * 1. The PHY doesn't support no_csr reset.
-	 * 2. The PHY supports no_csr reset but isn't initialized by bootloader.
-	 * As we can't skip init in these two cases.
+	 * We can skip PHY initialization if all of the following conditions
+	 * are met:
+	 * 1. The PHY supports the nocsr_reset that preserves the PHY config.
+	 * 2. The PHY was started (and not powered down again) by the
+	 *    bootloader, with all of the expected bits set correctly.
+	 * In this case, we can continue without having the init sequence
+	 * defined in the driver.
 	 */
+	qmp->skip_init = qmp->nocsr_reset &&
+		qphy_checkbits(pcs, cfg->regs[QPHY_START_CTRL], SERDES_START | PCS_START) &&
+		qphy_checkbits(pcs, cfg->regs[QPHY_PCS_POWER_DOWN_CONTROL], cfg->pwrdn_ctrl);
+
 	if (!qmp->skip_init && !cfg->tbls.serdes_num) {
 		dev_err(qmp->dev, "Init sequence not available\n");
 		return -ENODATA;
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan@kernel.org
commit bca065733afd1e3a89a02f05ffe14e966cd5f78e upstream.
Make sure to drop the references taken to the PMC OF node and device by of_parse_phandle() and of_find_device_by_node() during probe.
Note that holding a reference to the PMC device does not prevent the PMC regmap from going away (e.g. if the PMC driver is unbound), so there is no need to keep the reference.
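A hedged sketch of the resulting lookup (the property name and helper are illustrative, not the exact tegra210 code): of_parse_phandle() takes a reference on the OF node, of_find_device_by_node() on the device, and both are dropped as soon as they are no longer needed.

#include <linux/of.h>
#include <linux/of_platform.h>
#include <linux/regmap.h>

static struct regmap *lookup_pmc_regmap(struct device *dev)
{
	struct device_node *np;
	struct platform_device *pdev;
	struct regmap *regmap;

	np = of_parse_phandle(dev->of_node, "nvidia,pmc", 0);
	if (!np)
		return NULL;

	pdev = of_find_device_by_node(np);
	of_node_put(np);		/* node reference no longer needed */
	if (!pdev)
		return NULL;

	regmap = dev_get_regmap(&pdev->dev, "usb_sleepwalk");
	put_device(&pdev->dev);		/* regmap lifetime isn't tied to this ref */

	return regmap;
}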
Fixes: 2d1021487273 ("phy: tegra: xusb: Add wake/sleepwalk for Tegra210")
Cc: stable@vger.kernel.org # 5.14
Cc: JC Kuo jckuo@nvidia.com
Signed-off-by: Johan Hovold johan@kernel.org
Reviewed-by: Neil Armstrong neil.armstrong@linaro.org
Link: https://lore.kernel.org/r/20250724131206.2211-2-johan@kernel.org
Signed-off-by: Vinod Koul vkoul@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/phy/tegra/xusb-tegra210.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/phy/tegra/xusb-tegra210.c
+++ b/drivers/phy/tegra/xusb-tegra210.c
@@ -3164,18 +3164,22 @@ tegra210_xusb_padctl_probe(struct device
 	}
 
 	pdev = of_find_device_by_node(np);
+	of_node_put(np);
 	if (!pdev) {
 		dev_warn(dev, "PMC device is not available\n");
 		goto out;
 	}
 
-	if (!platform_get_drvdata(pdev))
+	if (!platform_get_drvdata(pdev)) {
+		put_device(&pdev->dev);
 		return ERR_PTR(-EPROBE_DEFER);
+	}
 
 	padctl->regmap = dev_get_regmap(&pdev->dev, "usb_sleepwalk");
 	if (!padctl->regmap)
 		dev_info(dev, "failed to find PMC regmap\n");
 
+	put_device(&pdev->dev);
 out:
 	return &padctl->base;
 }
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan@kernel.org
commit 64961557efa1b98f375c0579779e7eeda1a02c42 upstream.
Make sure to drop the reference to the control device taken by of_find_device_by_node() during probe when the driver is unbound.
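A minimal sketch of the devm pattern used by the fix (function names are illustrative): registering a put_device() action ties the reference's lifetime to the driver binding, so it is dropped automatically on unbind or on a later probe failure.

#include <linux/device.h>
#include <linux/of_platform.h>

static void sketch_put_device(void *_dev)
{
	put_device(_dev);
}

static int sketch_bind_control_dev(struct platform_device *pdev,
				   struct device_node *control_node)
{
	struct platform_device *control_pdev;

	control_pdev = of_find_device_by_node(control_node);
	if (!control_pdev)
		return -EINVAL;

	/* runs sketch_put_device() on unbind, or immediately on failure */
	return devm_add_action_or_reset(&pdev->dev, sketch_put_device,
					&control_pdev->dev);
}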
Fixes: 478b6c7436c2 ("usb: phy: omap-usb2: Don't use omap_get_control_dev()")
Cc: stable@vger.kernel.org # 3.13
Cc: Roger Quadros rogerq@kernel.org
Signed-off-by: Johan Hovold johan@kernel.org
Link: https://lore.kernel.org/r/20250724131206.2211-3-johan@kernel.org
Signed-off-by: Vinod Koul vkoul@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/phy/ti/phy-omap-usb2.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
--- a/drivers/phy/ti/phy-omap-usb2.c
+++ b/drivers/phy/ti/phy-omap-usb2.c
@@ -363,6 +363,13 @@ static void omap_usb2_init_errata(struct
 		phy->flags |= OMAP_USB2_DISABLE_CHRG_DET;
 }
 
+static void omap_usb2_put_device(void *_dev)
+{
+	struct device *dev = _dev;
+
+	put_device(dev);
+}
+
 static int omap_usb2_probe(struct platform_device *pdev)
 {
 	struct omap_usb *phy;
@@ -373,6 +380,7 @@ static int omap_usb2_probe(struct platfo
 	struct device_node *control_node;
 	struct platform_device *control_pdev;
 	const struct usb_phy_data *phy_data;
+	int ret;
 
 	phy_data = device_get_match_data(&pdev->dev);
 	if (!phy_data)
@@ -423,6 +431,11 @@ static int omap_usb2_probe(struct platfo
 			return -EINVAL;
 		}
 		phy->control_dev = &control_pdev->dev;
+
+		ret = devm_add_action_or_reset(&pdev->dev, omap_usb2_put_device,
+					       phy->control_dev);
+		if (ret)
+			return ret;
 	} else {
 		if (of_property_read_u32_index(node,
 					       "syscon-phy-power", 1,
6.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan@kernel.org
commit e19bcea99749ce8e8f1d359f68ae03210694ad56 upstream.
Make sure to drop the reference to the control device taken by of_find_device_by_node() during probe when the driver is unbound.
Fixes: 918ee0d21ba4 ("usb: phy: omap-usb3: Don't use omap_get_control_dev()")
Cc: stable@vger.kernel.org # 3.13
Cc: Roger Quadros rogerq@kernel.org
Signed-off-by: Johan Hovold johan@kernel.org
Link: https://lore.kernel.org/r/20250724131206.2211-4-johan@kernel.org
Signed-off-by: Vinod Koul vkoul@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/phy/ti/phy-ti-pipe3.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
--- a/drivers/phy/ti/phy-ti-pipe3.c
+++ b/drivers/phy/ti/phy-ti-pipe3.c
@@ -667,12 +667,20 @@ static int ti_pipe3_get_clk(struct ti_pi
 	return 0;
 }
 
+static void ti_pipe3_put_device(void *_dev)
+{
+	struct device *dev = _dev;
+
+	put_device(dev);
+}
+
 static int ti_pipe3_get_sysctrl(struct ti_pipe3 *phy)
 {
 	struct device *dev = phy->dev;
 	struct device_node *node = dev->of_node;
 	struct device_node *control_node;
 	struct platform_device *control_pdev;
+	int ret;
 
 	phy->phy_power_syscon = syscon_regmap_lookup_by_phandle(node,
 							"syscon-phy-power");
@@ -704,6 +712,11 @@ static int ti_pipe3_get_sysctrl(struct t
 		}
 
 		phy->control_dev = &control_pdev->dev;
+
+		ret = devm_add_action_or_reset(dev, ti_pipe3_put_device,
+					       phy->control_dev);
+		if (ret)
+			return ret;
 	}
 
 	if (phy->mode == PIPE3_MODE_PCIE) {
The kernel, bpf tool, perf tool, and kselftest build fine for v6.16.8-rc1 on x86 and arm64 Azure VMs.
Tested-by: Hardik Garg hargar@linux.microsoft.com
Thanks, Hardik
On Wed Sep 17, 2025 at 2:31 PM CEST, Greg Kroah-Hartman wrote:
Thanks! Built on all Alpine architectures & booted on x86_64.
Tested-By: Achill Gilgenast achill@achill.org
On Wed, 17 Sep 2025 14:31:50 +0200, Greg Kroah-Hartman wrote:
All tests passing for Tegra ...
Test results for stable-v6.16:
    10 builds:	10 pass, 0 fail
    28 boots:	28 pass, 0 fail
   120 tests:	120 pass, 0 fail

Linux version:	6.16.8-rc1-gfb25a6be32b3
Boards tested:	tegra124-jetson-tk1, tegra186-p2771-0000,
		tegra186-p3509-0000+p3636-0001, tegra194-p2972-0000,
		tegra194-p3509-0000+p3668-0000, tegra20-ventana,
		tegra210-p2371-2180, tegra210-p3450-0000,
		tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
[2025-09-17 14:31] Greg Kroah-Hartman:
Hi Greg,
both vanilla 6.16.8-rc1 and 6.16.8-rc1 with the additional patch "netfilter: nft_set_pipapo: fix null deref for empty set" from the stable queue (https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree...) compile fine for x86_64 using GCC 15.2.1+r22+gc4e96a094636 and binutils 2.45+r29+g2b2e51a31ec7. The resulting kernels booted and ran without any discernible issues on various physical machines of mine (Ivy Bridge, Haswell, Kaby Lake, Coffee Lake), on a Zen 2 VM, and on a bunch of Kaby Lake VMs.
Regards Pascal