This is the start of the stable review cycle for the 6.12.48 release. There are 140 patches in this series, all of which will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 19 Sep 2025 12:32:53 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.48-rc1...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.12.48-rc1
Alex Deucher alexander.deucher@amd.com drm/amdgpu: fix a memory leak in fence cleanup when unloading
Jani Nikula jani.nikula@intel.com drm/i915/power: fix size for for_each_set_bit() in abox iteration
Buday Csaba buday.csaba@prolan.hu net: mdiobus: release reset_gpio in mdiobus_unregister_device()
K Prateek Nayak kprateek.nayak@amd.com x86/cpu/topology: Always try cpu_parse_topology_ext() on AMD/Hygon
Johan Hovold johan@kernel.org phy: ti-pipe3: fix device leak at unbind
Johan Hovold johan@kernel.org phy: ti: omap-usb2: fix device leak at unbind
Johan Hovold johan@kernel.org phy: tegra: xusb: fix device and OF node leak at probe
Miaoqian Lin linmq006@gmail.com dmaengine: dw: dmamux: Fix device reference leak in rzn1_dmamux_route_allocate
Stephan Gerhold stephan.gerhold@linaro.org dmaengine: qcom: bam_dma: Fix DT error handling for num-channels/ees
Takashi Iwai tiwai@suse.de usb: gadget: midi2: Fix MIDI2 IN EP max packet size
Takashi Iwai tiwai@suse.de usb: gadget: midi2: Fix missing UMP group attributes initialization
RD Babiera rdbabiera@google.com usb: typec: tcpm: properly deliver cable vdms to altmode drivers
Alan Stern stern@rowland.harvard.edu USB: gadget: dummy-hcd: Fix locking bug in RT-enabled kernels
Mathias Nyman mathias.nyman@linux.intel.com xhci: fix memory leak regression when freeing xhci vdev devices depth first
Palmer Dabbelt palmer@rivosinc.com RISC-V: Remove unnecessary include from compat.h
Andreas Kemnade akemnade@kernel.org regulator: sy7636a: fix lifecycle of power good gpio
Anders Roxell anders.roxell@linaro.org dmaengine: ti: edma: Fix memory allocation size for queue_priority_map
Dan Carpenter dan.carpenter@linaro.org dmaengine: idxd: Fix double free in idxd_setup_wqs()
Yi Sun yi.sun@intel.com dmaengine: idxd: Fix refcount underflow on module unload
Yi Sun yi.sun@intel.com dmaengine: idxd: Remove improper idxd_free
Pengyu Luo mitltlatltl@gmail.com phy: qualcomm: phy-qcom-eusb2-repeater: fix override properties
Hangbin Liu liuhangbin@gmail.com hsr: use hsr_for_each_port_rtnl in hsr_port_get_hsr
Hangbin Liu liuhangbin@gmail.com hsr: use rtnl lock when iterating over ports
Murali Karicheri m-karicheri2@ti.com net: hsr: Add VLAN CTAG filter support
Florian Westphal fw@strlen.de netfilter: nf_tables: restart set lookup on base_seq change
Florian Westphal fw@strlen.de netfilter: nf_tables: make nft_set_do_lookup available unconditionally
Florian Westphal fw@strlen.de netfilter: nf_tables: place base_seq in struct net
Phil Sutter phil@nwl.cc netfilter: nf_tables: Reintroduce shortened deletion notifications
Florian Westphal fw@strlen.de netfilter: nft_set_rbtree: continue traversal if element is inactive
Florian Westphal fw@strlen.de netfilter: nft_set_pipapo: don't check genbit from packetpath lookups
Florian Westphal fw@strlen.de netfilter: nft_set_pipapo: don't return bogus extension pointer
Florian Westphal fw@strlen.de netfilter: nft_set_pipapo: merge pipapo_get/lookup
Florian Westphal fw@strlen.de netfilter: nft_set: remove one argument from lookup and update functions
Florian Westphal fw@strlen.de netfilter: nft_set_pipapo: remove unused arguments
Anssi Hannula anssi.hannula@bitwise.fi can: xilinx_can: xcan_write_frame(): fix use-after-free of transmitted SKB
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp can: j1939: j1939_local_ecu_get(): undo increment when j1939_local_ecu_get() fails
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp can: j1939: j1939_sk_bind(): call j1939_priv_put() immediately when j1939_local_ecu_get() failed
Alex Deucher alexander.deucher@amd.com drm/amd/display: use udelay rather than fsleep
Michal Schmidt mschmidt@redhat.com i40e: fix IRQ freeing in i40e_vsi_request_irq_msix error path
Kohei Enju enjuk@amazon.com igb: fix link test skipping when interface is admin down
Alex Tran alex.t.tran@gmail.com docs: networking: can: change bcm_msg_head frames member to support flexible array
Antoine Tenart atenart@kernel.org tunnels: reset the GSO metadata before reusing the skb
Petr Machata petrm@nvidia.com net: bridge: Bounce invalid boolopts
Alok Tiwari alok.a.tiwari@oracle.com genetlink: fix genl_bind() invoking bind() after -EPERM
Stefan Wahren wahrenst@gmx.net net: fec: Fix possible NPD in fec_enet_phy_reset_after_clk_enable()
Chia-I Wu olvaffe@gmail.com drm/panthor: validate group queue count
Linus Torvalds torvalds@linux-foundation.org Disable SLUB_TINY for build testing
Fabio Porcedda fabio.porcedda@gmail.com USB: serial: option: add Telit Cinterion LE910C4-WWX new compositions
Fabio Porcedda fabio.porcedda@gmail.com USB: serial: option: add Telit Cinterion FN990A w/audio compositions
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org dt-bindings: serial: brcm,bcm7271-uart: Constrain clocks
Hugo Villeneuve hvilleneuve@dimonoff.com serial: sc16is7xx: fix bug in flow control levels init
Fabian Vogt fvogt@suse.de tty: hvc_console: Call hvc_kick in hvc_write unconditionally
Paolo Abeni pabeni@redhat.com Revert "net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups"
Christoffer Sandberg cs@tuxedo.de Input: i8042 - add TUXEDO InfinityBook Pro Gen10 AMD to i8042 quirk table
Jeff LaBundy jeff@labundy.com Input: iqs7222 - avoid enabling unused interrupts
Xiongfeng Wang wangxiongfeng2@huawei.com hrtimers: Unconditionally update target CPU base after offline timer migration
Qu Wenruo wqu@suse.com btrfs: fix corruption reading compressed range when block size is smaller than page size
Boris Burkov boris@bur.io btrfs: use readahead_expand() on compressed extents
Santhosh Kumar K s-k6@ti.com mtd: spinand: winbond: Fix oob_layout for W25N01JW
Jeongjun Park aha310510@gmail.com mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()
Quanmin Yan yanquanmin1@huawei.com mm/damon/reclaim: avoid divide-by-zero in damon_reclaim_apply_parameters()
Stanislav Fort stanislav.fort@aisle.com mm/damon/sysfs: fix use-after-free in state_show()
Alex Markuze amarkuze@redhat.com ceph: fix race condition where r_parent becomes stale before sending message
Alex Markuze amarkuze@redhat.com ceph: fix race condition validating r_parent before applying state
Ilya Dryomov idryomov@gmail.com libceph: fix invalid accesses to ceph_connection_v1_info
Chen Ridong chenridong@huawei.com kernfs: Fix UAF in polling when open file is released
Matthieu Baerts (NGI0) matttbe@kernel.org netlink: specs: mptcp: fix if-idx attribute type
Jakub Kicinski kuba@kernel.org netlink: specs: mptcp: replace underscores with dashes in names
Matthieu Baerts (NGI0) matttbe@kernel.org netlink: specs: mptcp: clearly mention attributes
Matthieu Baerts (NGI0) matttbe@kernel.org netlink: specs: mptcp: add missing 'server-side' attr
David Rosca david.rosca@amd.com drm/amdgpu/vcn4: Fix IB parsing with multiple engine info packages
David Rosca david.rosca@amd.com drm/amdgpu/vcn: Allow limiting ctx to instance 0 for AV1 at any time
Thomas Hellström thomas.hellstrom@linux.intel.com drm/xe: Attempt to bring bos back to VRAM after eviction
Johan Hovold johan@kernel.org drm/mediatek: fix potential OF node use-after-free
Quanmin Yan yanquanmin1@huawei.com mm/damon/lru_sort: avoid divide-by-zero in damon_lru_sort_apply_parameters()
Sang-Heon Jeon ekffu200098@gmail.com mm/damon/core: set quota->charged_from to jiffies at first charge window
Kyle Meyer kyle.meyer@hpe.com mm/memory-failure: fix redundant updates for already poisoned pages
Miaohe Lin linmiaohe@huawei.com mm/memory-failure: fix VM_BUG_ON_PAGE(PagePoisoned(page)) when unpoison memory
Wei Yang richard.weiyang@gmail.com mm/khugepaged: fix the address passed to notifier on testing young
Miklos Szeredi mszeredi@redhat.com fuse: prevent overflow in copy_file_range return value
Miklos Szeredi mszeredi@redhat.com fuse: check if copy_file_range() returns larger than requested size
Amir Goldstein amir73il@gmail.com fuse: do not allow mapping a non-regular backing file
Christophe Kerello christophe.kerello@foss.st.com mtd: rawnand: stm32_fmc2: fix ECC overwrite
Christophe Kerello christophe.kerello@foss.st.com mtd: rawnand: stm32_fmc2: avoid overlapping mappings on ECC buffer
Alexander Sverdlin alexander.sverdlin@siemens.com mtd: nand: raw: atmel: Respect tAR, tCLR in read setup timing
Oleksij Rempel o.rempel@pengutronix.de net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups
Chiasheng Lee chiasheng.lee@linux.intel.com i2c: i801: Hide Intel Birch Stream SoC TCO WDT
Omar Sandoval osandov@fb.com btrfs: fix subvolume deletion lockup caused by inodes xarray race
Boris Burkov boris@bur.io btrfs: fix squota compressed stats leak
Mark Tinguely mark.tinguely@oracle.com ocfs2: fix recursive semaphore deadlock in fiemap call
Krister Johansen kjlx@templeofstupid.com mptcp: sockopt: make sync_socket_options propagate SOCK_KEEPOPEN
Nathan Chancellor nathan@kernel.org compiler-clang.h: define __SANITIZE_*__ macros only when undefined
Trond Myklebust trond.myklebust@hammerspace.com Revert "SUNRPC: Don't allow waiting for exiting tasks"
Salah Triki salah.triki@gmail.com EDAC/altera: Delete an inappropriate dma_free_coherent() call
wangzijie wangzijie1@honor.com proc: fix type confusion in pde_set_flags()
Kuniyuki Iwashima kuniyu@google.com tcp_bpf: Call sk_msg_free() when tcp_bpf_send_verdict() fails to allocate psock->cork.
Peilin Ye yepeilin@google.com bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()
KaFai Wan kafai.wan@linux.dev bpf: Allow fall back to interpreter for programs with stack size <= 512
Daniel Borkmann daniel@iogearbox.net bpf: Fix out-of-bounds dynptr write in bpf_crypto_crypt
Thomas Richter tmricht@linux.ibm.com s390/cpum_cf: Deny all sampling events by counter PMU
Thomas Richter tmricht@linux.ibm.com s390/pai: Deny all events not handled by this PMU
Pu Lehui pulehui@huawei.com tracing: Silence warning when chunk allocation fails in trace_pid_write
Jonathan Curley jcurley@purestorage.com NFSv4/flexfiles: Fix layout merge mirror check.
Trond Myklebust trond.myklebust@hammerspace.com NFS: nfs_invalidate_folio() must observe the offset and size arguments
Trond Myklebust trond.myklebust@hammerspace.com NFSv4.2: Serialise O_DIRECT i/o and copy range
Trond Myklebust trond.myklebust@hammerspace.com NFSv4.2: Serialise O_DIRECT i/o and clone range
Trond Myklebust trond.myklebust@hammerspace.com NFSv4.2: Serialise O_DIRECT i/o and fallocate()
Trond Myklebust trond.myklebust@hammerspace.com NFS: Serialise O_DIRECT i/o and truncate()
Max Kellermann max.kellermann@ionos.com fs/nfs/io: make nfs_start_io_*() killable
Vladimir Riabchun ferr.lambarginio@gmail.com ftrace/samples: Fix function size computation
Scott Mayhew smayhew@redhat.com nfs/localio: restore creds before releasing pageio data
Mike Snitzer snitzer@kernel.org nfs/localio: add direct IO enablement with sync and async IO support
Mike Snitzer snitzer@kernel.org nfs/localio: remove extra indirect nfs_to call to check {read,write}_iter
Luo Gengkun luogengkun@huaweicloud.com tracing: Fix tracing_marker may trigger page fault during preempt_disable
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: Clear the NFS_CAP_XATTR flag if not supported by the server
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: Clear NFS_CAP_OPEN_XOR and NFS_CAP_DELEGTIME if not supported
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: Clear the NFS_CAP_FS_LOCATIONS flag if it is not set
Guenter Roeck linux@roeck-us.net trace/fgraph: Fix error handling
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: Don't clear capabilities that won't be reset
Justin Worrell jworrell@gmail.com SUNRPC: call xs_sock_process_cmsg for all cmsg
Tigran Mkrtchyan tigran.mkrtchyan@desy.de flexfiles/pNFS: fix NULL checks on result of ff_layout_choose_ds_for_read
David Rosca david.rosca@amd.com drm/amdgpu: Add back JPEG to video caps for carrizo and newer
Takashi Iwai tiwai@suse.de ALSA: hda/realtek: Fix built-in mic assignment on ASUS VivoBook X515UA
Aurabindo Pillai aurabindo.pillai@amd.com Revert "drm/amd/display: Optimize cursor position updates"
Srinivasan Shanmugam srinivasan.shanmugam@amd.com drm/amd/display: Fix error pointers in amdgpu_dm_crtc_mem_type_changed
Umesh Nerlige Ramappa umesh.nerlige.ramappa@intel.com drm/i915/pmu: Fix zero delta busyness issue
Theodore Ts'o tytso@mit.edu ext4: introduce linear search for dentries
Huan Yang link@vivo.com Revert "udmabuf: fix vmap_udmabuf error page set"
Maurizio Lombardi mlombard@redhat.com nvme-pci: skip nvme_write_sq_db on empty rqlist
Fedor Pchelkin pchelkin@ispras.ru dma-debug: fix physical address calculation for struct dma_debug_entry
Sean Anderson sean.anderson@linux.dev dma-mapping: fix swapped dir/flags arguments to trace_dma_alloc_sgt_err
Harry Yoo harry.yoo@oracle.com mm: introduce and use {pgd,p4d}_populate_kernel()
Yevgeny Kliteynik kliteyn@nvidia.com net/mlx5: HWS, change error flow on matcher disconnect
Yeoreum Yun yeoreum.yun@arm.com kunit: kasan_test: disable fortify string checker on kasan_strings() test
Baochen Qiang baochen.qiang@oss.qualcomm.com dma-debug: don't enforce dma mapping check on noncoherent allocations
Sean Anderson sean.anderson@linux.dev dma-mapping: trace more error paths
Sean Anderson sean.anderson@linux.dev dma-mapping: use trace_dma_alloc for dma_alloc* instead of using trace_dma_map
Sean Anderson sean.anderson@linux.dev dma-mapping: trace dma_alloc/free direction
Christoph Hellwig hch@lst.de dma-debug: store a phys_addr_t in struct dma_debug_entry
Amir Goldstein amir73il@gmail.com fhandle: use more consistent rules for decoding file handle from userns
-------------
Diffstat:
 .../bindings/serial/brcm,bcm7271-uart.yaml | 2 +-
 Documentation/filesystems/nfs/localio.rst | 13 ++
 Documentation/netlink/specs/mptcp_pm.yaml | 52 ++---
 Documentation/networking/can.rst | 2 +-
 Makefile | 4 +-
 arch/riscv/include/asm/compat.h | 1 -
 arch/s390/kernel/perf_cpum_cf.c | 4 +-
 arch/s390/kernel/perf_pai_crypto.c | 4 +-
 arch/s390/kernel/perf_pai_ext.c | 2 +-
 arch/x86/kernel/cpu/topology_amd.c | 25 +--
 drivers/dma-buf/Kconfig | 1 -
 drivers/dma-buf/udmabuf.c | 22 +--
 drivers/dma/dw/rzn1-dmamux.c | 15 +-
 drivers/dma/idxd/init.c | 39 ++--
 drivers/dma/qcom/bam_dma.c | 8 +-
 drivers/dma/ti/edma.c | 4 +-
 drivers/edac/altera_edac.c | 1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 3 -
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 12 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 60 +++---
 drivers/gpu/drm/amd/amdgpu/vi.c | 7 +
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 5 +
 .../gpu/drm/amd/display/dc/dpp/dcn10/dcn10_dpp.c | 7 +-
 .../drm/amd/display/dc/dpp/dcn401/dcn401_dpp_cm.c | 6 +-
 .../gpu/drm/amd/display/dc/hubp/dcn20/dcn20_hubp.c | 8 +-
 .../drm/amd/display/dc/hubp/dcn401/dcn401_hubp.c | 10 +-
 .../drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c | 2 +-
 drivers/gpu/drm/i915/display/intel_display_power.c | 6 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 16 ++
 drivers/gpu/drm/mediatek/mtk_drm_drv.c | 11 +-
 drivers/gpu/drm/panthor/panthor_drv.c | 2 +-
 drivers/gpu/drm/xe/tests/xe_bo.c | 2 +-
 drivers/gpu/drm/xe/tests/xe_dma_buf.c | 10 +-
 drivers/gpu/drm/xe/xe_bo.c | 16 +-
 drivers/gpu/drm/xe/xe_bo.h | 2 +-
 drivers/gpu/drm/xe/xe_dma_buf.c | 2 +-
 drivers/i2c/busses/i2c-i801.c | 2 +-
 drivers/input/misc/iqs7222.c | 3 +
 drivers/input/serio/i8042-acpipnpio.h | 14 ++
 drivers/mtd/nand/raw/atmel/nand-controller.c | 16 +-
 drivers/mtd/nand/raw/stm32_fmc2_nand.c | 46 ++---
 drivers/mtd/nand/spi/winbond.c | 37 +++-
 drivers/net/can/xilinx_can.c | 16 +-
 drivers/net/ethernet/freescale/fec_main.c | 3 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 5 +-
 .../mlx5/core/steering/hws/mlx5hws_matcher.c | 24 +--
 drivers/net/phy/mdio_bus.c | 4 +-
 drivers/nvme/host/pci.c | 3 +
 drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c | 4 +-
 drivers/phy/tegra/xusb-tegra210.c | 6 +-
 drivers/phy/ti/phy-omap-usb2.c | 13 ++
 drivers/phy/ti/phy-ti-pipe3.c | 13 ++
 drivers/regulator/sy7636a-regulator.c | 7 +-
 drivers/tty/hvc/hvc_console.c | 6 +-
 drivers/tty/serial/sc16is7xx.c | 14 +-
 drivers/usb/gadget/function/f_midi2.c | 11 +-
 drivers/usb/gadget/udc/dummy_hcd.c | 8 +-
 drivers/usb/host/xhci-mem.c | 2 +-
 drivers/usb/serial/option.c | 17 ++
 drivers/usb/typec/tcpm/tcpm.c | 12 +-
 fs/btrfs/extent_io.c | 72 ++++++-
 fs/btrfs/inode.c | 12 +-
 fs/btrfs/qgroup.c | 6 +-
 fs/ceph/debugfs.c | 14 +-
 fs/ceph/dir.c | 17 +-
 fs/ceph/file.c | 24 +--
 fs/ceph/inode.c | 88 +++++++--
 fs/ceph/mds_client.c | 172 ++++++++++-------
 fs/ceph/mds_client.h | 18 +-
 fs/ext4/namei.c | 14 +-
 fs/fhandle.c | 8 +
 fs/fuse/file.c | 5 +-
 fs/fuse/passthrough.c | 5 +
 fs/kernfs/file.c | 58 ++++--
 fs/nfs/client.c | 2 +
 fs/nfs/direct.c | 22 ++-
 fs/nfs/file.c | 21 +-
 fs/nfs/flexfilelayout/flexfilelayout.c | 21 +-
 fs/nfs/inode.c | 4 +-
 fs/nfs/internal.h | 17 +-
 fs/nfs/io.c | 55 ++++--
 fs/nfs/localio.c | 125 +++++++++---
 fs/nfs/nfs42proc.c | 2 +
 fs/nfs/nfs4file.c | 2 +
 fs/nfs/nfs4proc.c | 7 +-
 fs/nfs/write.c | 1 +
 fs/ocfs2/extent_map.c | 10 +-
 fs/proc/generic.c | 3 +-
 include/linux/compiler-clang.h | 31 ++-
 include/linux/fs.h | 10 +-
 include/linux/nfs_xdr.h | 1 +
 include/linux/pgalloc.h | 29 +++
 include/linux/pgtable.h | 13 +-
 include/net/netfilter/nf_tables.h | 11 +-
 include/net/netfilter/nf_tables_core.h | 49 ++---
 include/net/netns/nftables.h | 1 +
 include/trace/events/dma.h | 153 ++++++++++++++-
 include/uapi/linux/mptcp_pm.h | 50 ++---
 kernel/bpf/core.c | 16 +-
 kernel/bpf/crypto.c | 2 +-
 kernel/bpf/helpers.c | 7 +-
 kernel/dma/debug.c | 135 ++++++++-----
 kernel/dma/debug.h | 20 ++
 kernel/dma/mapping.c | 41 ++--
 kernel/time/hrtimer.c | 11 +-
 kernel/trace/fgraph.c | 3 +-
 kernel/trace/trace.c | 10 +-
 mm/Kconfig | 2 +-
 mm/damon/core.c | 4 +
 mm/damon/lru_sort.c | 5 +
 mm/damon/reclaim.c | 5 +
 mm/damon/sysfs.c | 14 +-
 mm/hugetlb.c | 9 +-
 mm/kasan/init.c | 12 +-
 mm/kasan/kasan_test_c.c | 1 +
 mm/khugepaged.c | 4 +-
 mm/memory-failure.c | 20 +-
 mm/percpu.c | 6 +-
 mm/sparse-vmemmap.c | 6 +-
 net/bridge/br.c | 7 +
 net/can/j1939/bus.c | 5 +-
 net/can/j1939/socket.c | 3 +
 net/ceph/messenger.c | 7 +-
 net/hsr/hsr_device.c | 97 +++++++++-
 net/hsr/hsr_main.c | 4 +-
 net/hsr/hsr_main.h | 3 +
 net/ipv4/ip_tunnel_core.c | 6 +
 net/ipv4/tcp_bpf.c | 5 +-
 net/mptcp/sockopt.c | 11 +-
 net/netfilter/nf_tables_api.c | 123 +++++++-----
 net/netfilter/nft_dynset.c | 5 +-
 net/netfilter/nft_lookup.c | 67 +++++--
 net/netfilter/nft_objref.c | 5 +-
 net/netfilter/nft_set_bitmap.c | 11 +-
 net/netfilter/nft_set_hash.c | 54 +++---
 net/netfilter/nft_set_pipapo.c | 211 ++++++++-------------
 net/netfilter/nft_set_pipapo_avx2.c | 29 +--
 net/netfilter/nft_set_rbtree.c | 46 +++--
 net/netlink/genetlink.c | 3 +
 net/sunrpc/sched.c | 2 -
 net/sunrpc/xprtsock.c | 6 +-
 samples/ftrace/ftrace-direct-modify.c | 2 +-
 sound/pci/hda/patch_realtek.c | 1 +
 144 files changed, 1887 insertions(+), 986 deletions(-)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
[ Upstream commit bb585591ebf00fb1f6a1fdd1ea96b5848bd9112d ]
Commit 620c266f39493 ("fhandle: relax open_by_handle_at() permission checks") relaxed the conditions for decoding a file handle from a non-init userns.
The conditions are that the decoded dentry is accessible from the user-provided mountfd (or from the fs root) and that all the ancestors along the path have a valid id mapping in the userns.
These conditions are intentionally more strict than the condition that the decoded dentry should be "lookable" by path from the mountfd.
For example, the path /home/amir/dir/subdir is lookable by path from the unprivileged userns of user amir, because /home has 755 permissions, but the owner of /home does not have a valid id mapping in that userns.
The current code did not check that the decoded dentry itself has a valid id mapping in the userns. There is no security risk in that, because the final open still performs the needed permission checks, but it is inconsistent with the checks performed on the ancestors, so the behavior can be a bit confusing.
Add the check for the decoded dentry itself, so that the entire path, including the last component has a valid id mapping in the userns.
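As an illustration only (not part of the patch), "has a valid id mapping" reduces to the existing vfsuid/vfsgid helpers; this is roughly the logic that the privileged_wrt_inode_uidgid() call used below already wraps:

	/* Sketch: an inode's ids are "mapped" in a userns when both its
	 * uid and gid translate to valid ids in that namespace. */
	static bool inode_ids_mapped(struct user_namespace *user_ns,
				     struct mnt_idmap *idmap,
				     const struct inode *inode)
	{
		return vfsuid_has_mapping(user_ns, i_uid_into_vfsuid(idmap, inode)) &&
		       vfsgid_has_mapping(user_ns, i_gid_into_vfsgid(idmap, inode));
	}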
Fixes: 620c266f39493 ("fhandle: relax open_by_handle_at() permission checks")
Signed-off-by: Amir Goldstein amir73il@gmail.com
Link: https://lore.kernel.org/20250827194309.1259650-1-amir73il@gmail.com
Signed-off-by: Christian Brauner brauner@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/fhandle.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/fs/fhandle.c b/fs/fhandle.c
index 82df28d45cd70..ff90f8203015e 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -176,6 +176,14 @@ static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
 	if (!ctx->flags)
 		return 1;
 
+	/*
+	 * Verify that the decoded dentry itself has a valid id mapping.
+	 * In case the decoded dentry is the mountfd root itself, this
+	 * verifies that the mountfd inode itself has a valid id mapping.
+	 */
+	if (!privileged_wrt_inode_uidgid(user_ns, idmap, d_inode(dentry)))
+		return 0;
+
 	/*
 	 * It's racy as we're not taking rename_lock but we're able to ignore
 	 * permissions and we just need an approximation whether we were able
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
[ Upstream commit 9d4f645a1fd49eea70a21e8671d358ebe1c08d02 ]
dma-debug goes to great length to split incoming physical addresses into a PFN and offset to store them in struct dma_debug_entry, just to recombine those for all meaningful uses. Just store a phys_addr_t instead.
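The equivalence being exploited, as a sketch (variable names are illustrative; note that the offset handling at one call site is corrected later in this same series by "dma-debug: fix physical address calculation for struct dma_debug_entry"):

	/* Old representation: a (pfn, offset) pair, recombined on use. */
	phys_addr_t old_form = PFN_PHYS(page_to_pfn(page)) + offset;
	/* New representation: a single phys_addr_t. */
	phys_addr_t new_form = page_to_phys(page) + offset;
	/* old_form == new_form; PAGE_SHIFT and offset_in_page() recover
	 * the pieces on the few paths that still want them. */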
Signed-off-by: Christoph Hellwig hch@lst.de
Stable-dep-of: 7e2368a21741 ("dma-debug: don't enforce dma mapping check on noncoherent allocations")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/dma/debug.c | 79 ++++++++++++++++------------------------------
 1 file changed, 28 insertions(+), 51 deletions(-)
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index f6f0387761d05..4e3692afdf0d2 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -59,8 +59,7 @@ enum map_err_types {
  * @direction: enum dma_data_direction
  * @sg_call_ents: 'nents' from dma_map_sg
  * @sg_mapped_ents: 'mapped_ents' from dma_map_sg
- * @pfn: page frame of the start address
- * @offset: offset of mapping relative to pfn
+ * @paddr: physical start address of the mapping
  * @map_err_type: track whether dma_mapping_error() was checked
  * @stack_len: number of backtrace entries in @stack_entries
  * @stack_entries: stack of backtrace history
@@ -74,8 +73,7 @@ struct dma_debug_entry {
 	int		 direction;
 	int		 sg_call_ents;
 	int		 sg_mapped_ents;
-	unsigned long	 pfn;
-	size_t		 offset;
+	phys_addr_t	 paddr;
 	enum map_err_types  map_err_type;
 #ifdef CONFIG_STACKTRACE
 	unsigned int	stack_len;
@@ -389,14 +387,6 @@ static void hash_bucket_del(struct dma_debug_entry *entry)
 	list_del(&entry->list);
 }
 
-static unsigned long long phys_addr(struct dma_debug_entry *entry)
-{
-	if (entry->type == dma_debug_resource)
-		return __pfn_to_phys(entry->pfn) + entry->offset;
-
-	return page_to_phys(pfn_to_page(entry->pfn)) + entry->offset;
-}
-
 /*
  * For each mapping (initial cacheline in the case of
  * dma_alloc_coherent/dma_map_page, initial cacheline in each page of a
@@ -428,8 +418,8 @@ static DEFINE_SPINLOCK(radix_lock);
 
 static phys_addr_t to_cacheline_number(struct dma_debug_entry *entry)
 {
-	return (entry->pfn << CACHELINE_PER_PAGE_SHIFT) +
-		(entry->offset >> L1_CACHE_SHIFT);
+	return ((entry->paddr >> PAGE_SHIFT) << CACHELINE_PER_PAGE_SHIFT) +
+		(offset_in_page(entry->paddr) >> L1_CACHE_SHIFT);
 }
 
 static int active_cacheline_read_overlap(phys_addr_t cln)
@@ -538,11 +528,11 @@ void debug_dma_dump_mappings(struct device *dev)
 		if (!dev || dev == entry->dev) {
 			cln = to_cacheline_number(entry);
 			dev_info(entry->dev,
-				 "%s idx %d P=%llx N=%lx D=%llx L=%llx cln=%pa %s %s\n",
+				 "%s idx %d P=%pa D=%llx L=%llx cln=%pa %s %s\n",
 				 type2name[entry->type], idx,
-				 phys_addr(entry), entry->pfn,
-				 entry->dev_addr, entry->size,
-				 &cln, dir2name[entry->direction],
+				 &entry->paddr, entry->dev_addr,
+				 entry->size, &cln,
+				 dir2name[entry->direction],
 				 maperr2str[entry->map_err_type]);
 		}
 	}
@@ -569,13 +559,13 @@ static int dump_show(struct seq_file *seq, void *v)
 		list_for_each_entry(entry, &bucket->list, list) {
 			cln = to_cacheline_number(entry);
 			seq_printf(seq,
-				   "%s %s %s idx %d P=%llx N=%lx D=%llx L=%llx cln=%pa %s %s\n",
+				   "%s %s %s idx %d P=%pa D=%llx L=%llx cln=%pa %s %s\n",
 				   dev_driver_string(entry->dev),
 				   dev_name(entry->dev),
 				   type2name[entry->type], idx,
-				   phys_addr(entry), entry->pfn,
-				   entry->dev_addr, entry->size,
-				   &cln, dir2name[entry->direction],
+				   &entry->paddr, entry->dev_addr,
+				   entry->size, &cln,
+				   dir2name[entry->direction],
 				   maperr2str[entry->map_err_type]);
 		}
 		spin_unlock_irqrestore(&bucket->lock, flags);
@@ -1003,16 +993,16 @@ static void check_unmap(struct dma_debug_entry *ref)
 			   "[mapped as %s] [unmapped as %s]\n",
 			   ref->dev_addr, ref->size,
 			   type2name[entry->type], type2name[ref->type]);
-	} else if ((entry->type == dma_debug_coherent) &&
-		   (phys_addr(ref) != phys_addr(entry))) {
+	} else if (entry->type == dma_debug_coherent &&
+		   ref->paddr != entry->paddr) {
 		err_printk(ref->dev, entry, "device driver frees "
 			   "DMA memory with different CPU address "
 			   "[device address=0x%016llx] [size=%llu bytes] "
-			   "[cpu alloc address=0x%016llx] "
-			   "[cpu free address=0x%016llx]",
+			   "[cpu alloc address=0x%pa] "
+			   "[cpu free address=0x%pa]",
 			   ref->dev_addr, ref->size,
-			   phys_addr(entry),
-			   phys_addr(ref));
+			   &entry->paddr,
+			   &ref->paddr);
 	}
 
 	if (ref->sg_call_ents && ref->type == dma_debug_sg &&
@@ -1231,8 +1221,7 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
 
 	entry->dev       = dev;
 	entry->type      = dma_debug_single;
-	entry->pfn	 = page_to_pfn(page);
-	entry->offset	 = offset;
+	entry->paddr	 = page_to_phys(page);
 	entry->dev_addr  = dma_addr;
 	entry->size      = size;
 	entry->direction = direction;
@@ -1327,8 +1316,7 @@ void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
 
 		entry->type           = dma_debug_sg;
 		entry->dev            = dev;
-		entry->pfn	      = page_to_pfn(sg_page(s));
-		entry->offset	      = s->offset;
+		entry->paddr	      = sg_phys(s);
 		entry->size           = sg_dma_len(s);
 		entry->dev_addr       = sg_dma_address(s);
 		entry->direction      = direction;
@@ -1374,8 +1362,7 @@ void debug_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
 		struct dma_debug_entry ref = {
 			.type           = dma_debug_sg,
 			.dev            = dev,
-			.pfn		= page_to_pfn(sg_page(s)),
-			.offset		= s->offset,
+			.paddr		= sg_phys(s),
 			.dev_addr       = sg_dma_address(s),
 			.size           = sg_dma_len(s),
 			.direction      = dir,
@@ -1414,16 +1401,12 @@ void debug_dma_alloc_coherent(struct device *dev, size_t size,
 
 	entry->type      = dma_debug_coherent;
 	entry->dev       = dev;
-	entry->offset	 = offset_in_page(virt);
+	entry->paddr	 = page_to_phys((is_vmalloc_addr(virt) ?
+			vmalloc_to_page(virt) : virt_to_page(virt)));
 	entry->size      = size;
 	entry->dev_addr  = dma_addr;
 	entry->direction = DMA_BIDIRECTIONAL;
 
-	if (is_vmalloc_addr(virt))
-		entry->pfn = vmalloc_to_pfn(virt);
-	else
-		entry->pfn = page_to_pfn(virt_to_page(virt));
-
 	add_dma_entry(entry, attrs);
 }
 
@@ -1433,7 +1416,6 @@ void debug_dma_free_coherent(struct device *dev, size_t size,
 	struct dma_debug_entry ref = {
 		.type           = dma_debug_coherent,
 		.dev            = dev,
-		.offset		= offset_in_page(virt),
 		.dev_addr       = dma_addr,
 		.size           = size,
 		.direction      = DMA_BIDIRECTIONAL,
@@ -1443,10 +1425,8 @@ void debug_dma_free_coherent(struct device *dev, size_t size,
 	if (!is_vmalloc_addr(virt) && !virt_addr_valid(virt))
 		return;
 
-	if (is_vmalloc_addr(virt))
-		ref.pfn = vmalloc_to_pfn(virt);
-	else
-		ref.pfn = page_to_pfn(virt_to_page(virt));
+	ref.paddr = page_to_phys((is_vmalloc_addr(virt) ?
+			vmalloc_to_page(virt) : virt_to_page(virt)));
 
 	if (unlikely(dma_debug_disabled()))
 		return;
@@ -1469,8 +1449,7 @@ void debug_dma_map_resource(struct device *dev, phys_addr_t addr, size_t size,
 
 	entry->type		= dma_debug_resource;
 	entry->dev		= dev;
-	entry->pfn		= PHYS_PFN(addr);
-	entry->offset		= offset_in_page(addr);
+	entry->paddr		= addr;
 	entry->size		= size;
 	entry->dev_addr		= dma_addr;
 	entry->direction	= direction;
@@ -1547,8 +1526,7 @@ void debug_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 		struct dma_debug_entry ref = {
 			.type           = dma_debug_sg,
 			.dev            = dev,
-			.pfn		= page_to_pfn(sg_page(s)),
-			.offset		= s->offset,
+			.paddr		= sg_phys(s),
 			.dev_addr       = sg_dma_address(s),
 			.size           = sg_dma_len(s),
 			.direction      = direction,
@@ -1579,8 +1557,7 @@ void debug_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 		struct dma_debug_entry ref = {
 			.type           = dma_debug_sg,
 			.dev            = dev,
-			.pfn		= page_to_pfn(sg_page(s)),
-			.offset		= s->offset,
+			.paddr		= sg_phys(sg),
 			.dev_addr       = sg_dma_address(s),
 			.size           = sg_dma_len(s),
 			.direction      = direction,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Anderson sean.anderson@linux.dev
[ Upstream commit 3afff779a725cba914e6caba360b696ae6f90249 ]
In preparation for using these tracepoints in a few more places, trace the DMA direction as well. For coherent allocations this is always bidirectional.
Signed-off-by: Sean Anderson sean.anderson@linux.dev
Reviewed-by: Steven Rostedt (Google) rostedt@goodmis.org
Signed-off-by: Christoph Hellwig hch@lst.de
Stable-dep-of: 7e2368a21741 ("dma-debug: don't enforce dma mapping check on noncoherent allocations")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/trace/events/dma.h | 18 ++++++++++++------
 kernel/dma/mapping.c | 6 ++++--
 2 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index b0f41265191c3..012729cc178f0 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -116,8 +116,9 @@ DEFINE_EVENT(dma_unmap, dma_unmap_resource,
 
 TRACE_EVENT(dma_alloc,
 	TP_PROTO(struct device *dev, void *virt_addr, dma_addr_t dma_addr,
-		 size_t size, gfp_t flags, unsigned long attrs),
-	TP_ARGS(dev, virt_addr, dma_addr, size, flags, attrs),
+		 size_t size, enum dma_data_direction dir, gfp_t flags,
+		 unsigned long attrs),
+	TP_ARGS(dev, virt_addr, dma_addr, size, dir, flags, attrs),
 
 	TP_STRUCT__entry(
 		__string(device, dev_name(dev))
@@ -125,6 +126,7 @@ TRACE_EVENT(dma_alloc,
 		__field(u64, dma_addr)
 		__field(size_t, size)
 		__field(gfp_t, flags)
+		__field(enum dma_data_direction, dir)
 		__field(unsigned long, attrs)
 	),
 
@@ -137,8 +139,9 @@ TRACE_EVENT(dma_alloc,
 		__entry->attrs = attrs;
 	),
 
-	TP_printk("%s dma_addr=%llx size=%zu virt_addr=%p flags=%s attrs=%s",
+	TP_printk("%s dir=%s dma_addr=%llx size=%zu virt_addr=%p flags=%s attrs=%s",
 		  __get_str(device),
+		  decode_dma_data_direction(__entry->dir),
 		  __entry->dma_addr,
 		  __entry->size,
 		  __entry->virt_addr,
@@ -148,14 +151,15 @@ TRACE_EVENT(dma_alloc,
 
 TRACE_EVENT(dma_free,
 	TP_PROTO(struct device *dev, void *virt_addr, dma_addr_t dma_addr,
-		 size_t size, unsigned long attrs),
-	TP_ARGS(dev, virt_addr, dma_addr, size, attrs),
+		 size_t size, enum dma_data_direction dir, unsigned long attrs),
+	TP_ARGS(dev, virt_addr, dma_addr, size, dir, attrs),
 
 	TP_STRUCT__entry(
 		__string(device, dev_name(dev))
 		__field(void *, virt_addr)
 		__field(u64, dma_addr)
 		__field(size_t, size)
+		__field(enum dma_data_direction, dir)
 		__field(unsigned long, attrs)
 	),
 
@@ -164,11 +168,13 @@ TRACE_EVENT(dma_free,
 		__entry->virt_addr = virt_addr;
 		__entry->dma_addr = dma_addr;
 		__entry->size = size;
+		__entry->dir = dir;
 		__entry->attrs = attrs;
 	),
 
-	TP_printk("%s dma_addr=%llx size=%zu virt_addr=%p attrs=%s",
+	TP_printk("%s dir=%s dma_addr=%llx size=%zu virt_addr=%p attrs=%s",
 		  __get_str(device),
+		  decode_dma_data_direction(__entry->dir),
 		  __entry->dma_addr,
 		  __entry->size,
 		  __entry->virt_addr,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 74d453ec750a1..9720f3c157d9f 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -619,7 +619,8 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
 	else
 		return NULL;
 
-	trace_dma_alloc(dev, cpu_addr, *dma_handle, size, flag, attrs);
+	trace_dma_alloc(dev, cpu_addr, *dma_handle, size, DMA_BIDIRECTIONAL,
+			flag, attrs);
 	debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr, attrs);
 	return cpu_addr;
 }
@@ -644,7 +645,8 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
 	if (!cpu_addr)
 		return;
 
-	trace_dma_free(dev, cpu_addr, dma_handle, size, attrs);
+	trace_dma_free(dev, cpu_addr, dma_handle, size, DMA_BIDIRECTIONAL,
+		       attrs);
 	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
 	if (dma_alloc_direct(dev, ops))
 		dma_direct_free(dev, size, cpu_addr, dma_handle, attrs);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Anderson sean.anderson@linux.dev
[ Upstream commit c4484ab86ee00f2d9236e2851621ea02c105f4cc ]
In some cases, we use trace_dma_map to trace dma_alloc* functions. This generally follows dma_debug. However, this does not record all of the relevant information for allocations, such as GFP flags. Create new dma_alloc tracepoints for these functions. Note that while dma_alloc_noncontiguous may allocate discontiguous pages (from the CPU's point of view), the device will only see one contiguous mapping. Therefore, we just need to trace dma_addr and size.
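A hedged usage sketch of that last point (dev and size are assumed to exist; this is not from the patch): after a successful dma_alloc_noncontiguous(), sgt->nents is 1, so one (dma_addr, size) pair fully describes the device-visible buffer:

	struct sg_table *sgt = dma_alloc_noncontiguous(dev, size,
						       DMA_FROM_DEVICE,
						       GFP_KERNEL, 0);
	if (sgt)	/* single DMA segment despite scattered pages */
		pr_debug("dma_addr=%pad size=%zu\n",
			 &sg_dma_address(sgt->sgl), size);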
Signed-off-by: Sean Anderson sean.anderson@linux.dev
Reviewed-by: Steven Rostedt (Google) rostedt@goodmis.org
Signed-off-by: Christoph Hellwig hch@lst.de
Stable-dep-of: 7e2368a21741 ("dma-debug: don't enforce dma mapping check on noncoherent allocations")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/trace/events/dma.h | 99 +++++++++++++++++++++++++++++++++++++-
 kernel/dma/mapping.c | 10 ++--
 2 files changed, 102 insertions(+), 7 deletions(-)
diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index 012729cc178f0..45cc0ca8287fe 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -114,7 +114,7 @@ DEFINE_EVENT(dma_unmap, dma_unmap_resource,
 	     enum dma_data_direction dir, unsigned long attrs),
 	TP_ARGS(dev, addr, size, dir, attrs));
 
-TRACE_EVENT(dma_alloc,
+DECLARE_EVENT_CLASS(dma_alloc_class,
 	TP_PROTO(struct device *dev, void *virt_addr, dma_addr_t dma_addr,
 		 size_t size, enum dma_data_direction dir, gfp_t flags,
 		 unsigned long attrs),
@@ -149,7 +149,58 @@ TRACE_EVENT(dma_alloc,
 		  decode_dma_attrs(__entry->attrs))
 );
 
-TRACE_EVENT(dma_free,
+#define DEFINE_ALLOC_EVENT(name) \
+DEFINE_EVENT(dma_alloc_class, name, \
+	TP_PROTO(struct device *dev, void *virt_addr, dma_addr_t dma_addr, \
+		 size_t size, enum dma_data_direction dir, gfp_t flags, \
+		 unsigned long attrs), \
+	TP_ARGS(dev, virt_addr, dma_addr, size, dir, flags, attrs))
+
+DEFINE_ALLOC_EVENT(dma_alloc);
+DEFINE_ALLOC_EVENT(dma_alloc_pages);
+
+TRACE_EVENT(dma_alloc_sgt,
+	TP_PROTO(struct device *dev, struct sg_table *sgt, size_t size,
+		 enum dma_data_direction dir, gfp_t flags, unsigned long attrs),
+	TP_ARGS(dev, sgt, size, dir, flags, attrs),
+
+	TP_STRUCT__entry(
+		__string(device, dev_name(dev))
+		__dynamic_array(u64, phys_addrs, sgt->orig_nents)
+		__field(u64, dma_addr)
+		__field(size_t, size)
+		__field(enum dma_data_direction, dir)
+		__field(gfp_t, flags)
+		__field(unsigned long, attrs)
+	),
+
+	TP_fast_assign(
+		struct scatterlist *sg;
+		int i;
+
+		__assign_str(device);
+		for_each_sg(sgt->sgl, sg, sgt->orig_nents, i)
+			((u64 *)__get_dynamic_array(phys_addrs))[i] = sg_phys(sg);
+		__entry->dma_addr = sg_dma_address(sgt->sgl);
+		__entry->size = size;
+		__entry->dir = dir;
+		__entry->flags = flags;
+		__entry->attrs = attrs;
+	),
+
+	TP_printk("%s dir=%s dma_addr=%llx size=%zu phys_addrs=%s flags=%s attrs=%s",
+		  __get_str(device),
+		  decode_dma_data_direction(__entry->dir),
+		  __entry->dma_addr,
+		  __entry->size,
+		  __print_array(__get_dynamic_array(phys_addrs),
+				__get_dynamic_array_len(phys_addrs) /
+					sizeof(u64), sizeof(u64)),
+		  show_gfp_flags(__entry->flags),
+		  decode_dma_attrs(__entry->attrs))
+);
+
+DECLARE_EVENT_CLASS(dma_free_class,
 	TP_PROTO(struct device *dev, void *virt_addr, dma_addr_t dma_addr,
 		 size_t size, enum dma_data_direction dir, unsigned long attrs),
 	TP_ARGS(dev, virt_addr, dma_addr, size, dir, attrs),
@@ -181,6 +232,50 @@ TRACE_EVENT(dma_free,
 		  decode_dma_attrs(__entry->attrs))
 );
 
+#define DEFINE_FREE_EVENT(name) \
+DEFINE_EVENT(dma_free_class, name, \
+	TP_PROTO(struct device *dev, void *virt_addr, dma_addr_t dma_addr, \
+		 size_t size, enum dma_data_direction dir, unsigned long attrs), \
+	TP_ARGS(dev, virt_addr, dma_addr, size, dir, attrs))
+
+DEFINE_FREE_EVENT(dma_free);
+DEFINE_FREE_EVENT(dma_free_pages);
+
+TRACE_EVENT(dma_free_sgt,
+	TP_PROTO(struct device *dev, struct sg_table *sgt, size_t size,
+		 enum dma_data_direction dir),
+	TP_ARGS(dev, sgt, size, dir),
+
+	TP_STRUCT__entry(
+		__string(device, dev_name(dev))
+		__dynamic_array(u64, phys_addrs, sgt->orig_nents)
+		__field(u64, dma_addr)
+		__field(size_t, size)
+		__field(enum dma_data_direction, dir)
+	),
+
+	TP_fast_assign(
+		struct scatterlist *sg;
+		int i;
+
+		__assign_str(device);
+		for_each_sg(sgt->sgl, sg, sgt->orig_nents, i)
+			((u64 *)__get_dynamic_array(phys_addrs))[i] = sg_phys(sg);
+		__entry->dma_addr = sg_dma_address(sgt->sgl);
+		__entry->size = size;
+		__entry->dir = dir;
+	),
+
+	TP_printk("%s dir=%s dma_addr=%llx size=%zu phys_addrs=%s",
+		  __get_str(device),
+		  decode_dma_data_direction(__entry->dir),
+		  __entry->dma_addr,
+		  __entry->size,
+		  __print_array(__get_dynamic_array(phys_addrs),
+				__get_dynamic_array_len(phys_addrs) /
+					sizeof(u64), sizeof(u64)))
+);
+
 TRACE_EVENT(dma_map_sg,
 	TP_PROTO(struct device *dev, struct scatterlist *sgl, int nents,
 		 int ents, enum dma_data_direction dir, unsigned long attrs),
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 9720f3c157d9f..690aeda8bd7da 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -685,8 +685,8 @@ struct page *dma_alloc_pages(struct device *dev, size_t size,
 	struct page *page = __dma_alloc_pages(dev, size, dma_handle, dir, gfp);
 
 	if (page) {
-		trace_dma_map_page(dev, page_to_phys(page), *dma_handle, size,
-				   dir, 0);
+		trace_dma_alloc_pages(dev, page_to_virt(page), *dma_handle,
+				      size, dir, gfp, 0);
 		debug_dma_map_page(dev, page, 0, size, dir, *dma_handle, 0);
 	}
 	return page;
@@ -710,7 +710,7 @@ static void __dma_free_pages(struct device *dev, size_t size, struct page *page,
 void dma_free_pages(struct device *dev, size_t size, struct page *page,
 		dma_addr_t dma_handle, enum dma_data_direction dir)
 {
-	trace_dma_unmap_page(dev, dma_handle, size, dir, 0);
+	trace_dma_free_pages(dev, page_to_virt(page), dma_handle, size, dir, 0);
 	debug_dma_unmap_page(dev, dma_handle, size, dir);
 	__dma_free_pages(dev, size, page, dma_handle, dir);
 }
@@ -770,7 +770,7 @@ struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size,
 
 	if (sgt) {
 		sgt->nents = 1;
-		trace_dma_map_sg(dev, sgt->sgl, sgt->orig_nents, 1, dir, attrs);
+		trace_dma_alloc_sgt(dev, sgt, size, dir, gfp, attrs);
 		debug_dma_map_sg(dev, sgt->sgl, sgt->orig_nents, 1, dir, attrs);
 	}
 	return sgt;
@@ -789,7 +789,7 @@ static void free_single_sgt(struct device *dev, size_t size,
 void dma_free_noncontiguous(struct device *dev, size_t size,
 		struct sg_table *sgt, enum dma_data_direction dir)
 {
-	trace_dma_unmap_sg(dev, sgt->sgl, sgt->orig_nents, dir, 0);
+	trace_dma_free_sgt(dev, sgt, size, dir);
 	debug_dma_unmap_sg(dev, sgt->sgl, sgt->orig_nents, dir);
 
 	if (use_dma_iommu(dev))
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Anderson sean.anderson@linux.dev
[ Upstream commit 68b6dbf1f441c4eba3b8511728a41cf9b01dca35 ]
It can be surprising to the user if DMA functions are only traced on success. On failure, it can be unclear what the source of the problem is. Fix this by tracing all functions even when they fail. Cases where we BUG/WARN are skipped, since those should be sufficiently noisy already.
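The resulting pattern, condensed from the dma_alloc_pages() hunk below (sketch only):

	page = __dma_alloc_pages(dev, size, dma_handle, dir, gfp);
	if (page)	/* success: full record including the CPU address */
		trace_dma_alloc_pages(dev, page_to_virt(page), *dma_handle,
				      size, dir, gfp, 0);
	else		/* failure: same event, NULL/0 placeholders */
		trace_dma_alloc_pages(dev, NULL, 0, size, dir, gfp, 0);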
Signed-off-by: Sean Anderson sean.anderson@linux.dev
Reviewed-by: Steven Rostedt (Google) rostedt@goodmis.org
Signed-off-by: Christoph Hellwig hch@lst.de
Stable-dep-of: 7e2368a21741 ("dma-debug: don't enforce dma mapping check on noncoherent allocations")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/trace/events/dma.h | 36 ++++++++++++++++++++++++++++++++++++
 kernel/dma/mapping.c | 25 ++++++++++++++++++-------
 2 files changed, 54 insertions(+), 7 deletions(-)
diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index 45cc0ca8287fe..63b55ccc4f00c 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -158,6 +158,7 @@ DEFINE_EVENT(dma_alloc_class, name, \
 
 DEFINE_ALLOC_EVENT(dma_alloc);
 DEFINE_ALLOC_EVENT(dma_alloc_pages);
+DEFINE_ALLOC_EVENT(dma_alloc_sgt_err);
 
 TRACE_EVENT(dma_alloc_sgt,
 	TP_PROTO(struct device *dev, struct sg_table *sgt, size_t size,
@@ -322,6 +323,41 @@ TRACE_EVENT(dma_map_sg,
 		  decode_dma_attrs(__entry->attrs))
 );
 
+TRACE_EVENT(dma_map_sg_err,
+	TP_PROTO(struct device *dev, struct scatterlist *sgl, int nents,
+		 int err, enum dma_data_direction dir, unsigned long attrs),
+	TP_ARGS(dev, sgl, nents, err, dir, attrs),
+
+	TP_STRUCT__entry(
+		__string(device, dev_name(dev))
+		__dynamic_array(u64, phys_addrs, nents)
+		__field(int, err)
+		__field(enum dma_data_direction, dir)
+		__field(unsigned long, attrs)
+	),
+
+	TP_fast_assign(
+		struct scatterlist *sg;
+		int i;
+
+		__assign_str(device);
+		for_each_sg(sgl, sg, nents, i)
+			((u64 *)__get_dynamic_array(phys_addrs))[i] = sg_phys(sg);
+		__entry->err = err;
+		__entry->dir = dir;
+		__entry->attrs = attrs;
+	),
+
+	TP_printk("%s dir=%s dma_addrs=%s err=%d attrs=%s",
+		  __get_str(device),
+		  decode_dma_data_direction(__entry->dir),
+		  __print_array(__get_dynamic_array(phys_addrs),
+				__get_dynamic_array_len(phys_addrs) /
+					sizeof(u64), sizeof(u64)),
+		  __entry->err,
+		  decode_dma_attrs(__entry->attrs))
+);
+
 TRACE_EVENT(dma_unmap_sg,
 	TP_PROTO(struct device *dev, struct scatterlist *sgl, int nents,
 		 enum dma_data_direction dir, unsigned long attrs),
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 690aeda8bd7da..32dcf8492bbcd 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -223,6 +223,7 @@ static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 		debug_dma_map_sg(dev, sg, nents, ents, dir, attrs);
 	} else if (WARN_ON_ONCE(ents != -EINVAL && ents != -ENOMEM &&
 				ents != -EIO && ents != -EREMOTEIO)) {
+		trace_dma_map_sg_err(dev, sg, nents, ents, dir, attrs);
 		return -EIO;
 	}
 
@@ -604,20 +605,26 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
 	if (WARN_ON_ONCE(flag & __GFP_COMP))
 		return NULL;
 
-	if (dma_alloc_from_dev_coherent(dev, size, dma_handle, &cpu_addr))
+	if (dma_alloc_from_dev_coherent(dev, size, dma_handle, &cpu_addr)) {
+		trace_dma_alloc(dev, cpu_addr, *dma_handle, size,
+				DMA_BIDIRECTIONAL, flag, attrs);
 		return cpu_addr;
+	}
 
 	/* let the implementation decide on the zone to allocate from: */
 	flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
 
-	if (dma_alloc_direct(dev, ops))
+	if (dma_alloc_direct(dev, ops)) {
 		cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
-	else if (use_dma_iommu(dev))
+	} else if (use_dma_iommu(dev)) {
 		cpu_addr = iommu_dma_alloc(dev, size, dma_handle, flag, attrs);
-	else if (ops->alloc)
+	} else if (ops->alloc) {
 		cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
-	else
+	} else {
+		trace_dma_alloc(dev, NULL, 0, size, DMA_BIDIRECTIONAL, flag,
+				attrs);
 		return NULL;
+	}
 
 	trace_dma_alloc(dev, cpu_addr, *dma_handle, size, DMA_BIDIRECTIONAL,
 			flag, attrs);
@@ -642,11 +649,11 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
 	 */
 	WARN_ON(irqs_disabled());
 
+	trace_dma_free(dev, cpu_addr, dma_handle, size, DMA_BIDIRECTIONAL,
+		       attrs);
 	if (!cpu_addr)
 		return;
 
-	trace_dma_free(dev, cpu_addr, dma_handle, size, DMA_BIDIRECTIONAL,
-		       attrs);
 	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
 	if (dma_alloc_direct(dev, ops))
 		dma_direct_free(dev, size, cpu_addr, dma_handle, attrs);
@@ -688,6 +695,8 @@ struct page *dma_alloc_pages(struct device *dev, size_t size,
 		trace_dma_alloc_pages(dev, page_to_virt(page), *dma_handle,
 				      size, dir, gfp, 0);
 		debug_dma_map_page(dev, page, 0, size, dir, *dma_handle, 0);
+	} else {
+		trace_dma_alloc_pages(dev, NULL, 0, size, dir, gfp, 0);
 	}
 	return page;
 }
@@ -772,6 +781,8 @@ struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size,
 		sgt->nents = 1;
 		trace_dma_alloc_sgt(dev, sgt, size, dir, gfp, attrs);
 		debug_dma_map_sg(dev, sgt->sgl, sgt->orig_nents, 1, dir, attrs);
+	} else {
+		trace_dma_alloc_sgt_err(dev, NULL, 0, size, gfp, dir, attrs);
 	}
 	return sgt;
 }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baochen Qiang baochen.qiang@oss.qualcomm.com
[ Upstream commit 7e2368a21741e2db542330b32aa6fdd8908e7cff ]
As discussed in [1], there is no need to enforce the dma mapping check on noncoherent allocations; a simple test on the returned CPU address is good enough.
Add a new pair of debug helpers and use them for noncoherent alloc/free to fix this issue.
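In driver terms: unlike a dma_map_page() mapping, a noncoherent allocation has no DMA_MAPPING_ERROR cookie to check; success is simply a non-NULL return. A hedged usage sketch (dev and size are assumed to exist):

	dma_addr_t dma_handle;
	struct page *page = dma_alloc_pages(dev, size, &dma_handle,
					    DMA_FROM_DEVICE, GFP_KERNEL);
	if (!page)
		return -ENOMEM;	/* no dma_mapping_error() check needed */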
Fixes: efa70f2fdc84 ("dma-mapping: add a new dma_alloc_pages API")
Link: https://lore.kernel.org/all/ff6c1fe6-820f-4e58-8395-df06aa91706c@oss.qualcom... # 1
Signed-off-by: Baochen Qiang baochen.qiang@oss.qualcomm.com
Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com
Link: https://lore.kernel.org/r/20250828-dma-debug-fix-noncoherent-dma-check-v1-1-...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/dma/debug.c | 48 +++++++++++++++++++++++++++++++++++++++++++-
 kernel/dma/debug.h | 20 ++++++++++++++++++
 kernel/dma/mapping.c | 4 ++--
 3 files changed, 69 insertions(+), 3 deletions(-)
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 4e3692afdf0d2..0221023e1120d 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -39,6 +39,7 @@ enum {
 	dma_debug_sg,
 	dma_debug_coherent,
 	dma_debug_resource,
+	dma_debug_noncoherent,
 };
 
 enum map_err_types {
@@ -141,6 +142,7 @@ static const char *type2name[] = {
 	[dma_debug_sg] = "scatter-gather",
 	[dma_debug_coherent] = "coherent",
 	[dma_debug_resource] = "resource",
+	[dma_debug_noncoherent] = "noncoherent",
 };
 
 static const char *dir2name[] = {
@@ -993,7 +995,8 @@ static void check_unmap(struct dma_debug_entry *ref)
 			   "[mapped as %s] [unmapped as %s]\n",
 			   ref->dev_addr, ref->size,
 			   type2name[entry->type], type2name[ref->type]);
-	} else if (entry->type == dma_debug_coherent &&
+	} else if ((entry->type == dma_debug_coherent ||
+		    entry->type == dma_debug_noncoherent) &&
 		   ref->paddr != entry->paddr) {
 		err_printk(ref->dev, entry, "device driver frees "
 			   "DMA memory with different CPU address "
@@ -1573,6 +1576,49 @@ void debug_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 	}
 }
 
+void debug_dma_alloc_pages(struct device *dev, struct page *page,
+			   size_t size, int direction,
+			   dma_addr_t dma_addr,
+			   unsigned long attrs)
+{
+	struct dma_debug_entry *entry;
+
+	if (unlikely(dma_debug_disabled()))
+		return;
+
+	entry = dma_entry_alloc();
+	if (!entry)
+		return;
+
+	entry->type      = dma_debug_noncoherent;
+	entry->dev       = dev;
+	entry->paddr	 = page_to_phys(page);
+	entry->size      = size;
+	entry->dev_addr  = dma_addr;
+	entry->direction = direction;
+
+	add_dma_entry(entry, attrs);
+}
+
+void debug_dma_free_pages(struct device *dev, struct page *page,
+			  size_t size, int direction,
+			  dma_addr_t dma_addr)
+{
+	struct dma_debug_entry ref = {
+		.type           = dma_debug_noncoherent,
+		.dev            = dev,
+		.paddr		= page_to_phys(page),
+		.dev_addr       = dma_addr,
+		.size           = size,
+		.direction      = direction,
+	};
+
+	if (unlikely(dma_debug_disabled()))
+		return;
+
+	check_unmap(&ref);
+}
+
 static int __init dma_debug_driver_setup(char *str)
 {
 	int i;
diff --git a/kernel/dma/debug.h b/kernel/dma/debug.h
index f525197d3cae6..48757ca13f314 100644
--- a/kernel/dma/debug.h
+++ b/kernel/dma/debug.h
@@ -54,6 +54,13 @@ extern void debug_dma_sync_sg_for_cpu(struct device *dev,
 extern void debug_dma_sync_sg_for_device(struct device *dev,
 					 struct scatterlist *sg,
 					 int nelems, int direction);
+extern void debug_dma_alloc_pages(struct device *dev, struct page *page,
+				  size_t size, int direction,
+				  dma_addr_t dma_addr,
+				  unsigned long attrs);
+extern void debug_dma_free_pages(struct device *dev, struct page *page,
+				 size_t size, int direction,
+				 dma_addr_t dma_addr);
 #else /* CONFIG_DMA_API_DEBUG */
 static inline void debug_dma_map_page(struct device *dev, struct page *page,
 				      size_t offset, size_t size,
@@ -126,5 +133,18 @@ static inline void debug_dma_sync_sg_for_device(struct device *dev,
 						int nelems, int direction)
 {
 }
+
+static inline void debug_dma_alloc_pages(struct device *dev, struct page *page,
+					 size_t size, int direction,
+					 dma_addr_t dma_addr,
+					 unsigned long attrs)
+{
+}
+
+static inline void debug_dma_free_pages(struct device *dev, struct page *page,
+					size_t size, int direction,
+					dma_addr_t dma_addr)
+{
+}
 #endif /* CONFIG_DMA_API_DEBUG */
 #endif /* _KERNEL_DMA_DEBUG_H */
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 32dcf8492bbcd..c12c62ad8a6bf 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -694,7 +694,7 @@ struct page *dma_alloc_pages(struct device *dev, size_t size,
 	if (page) {
 		trace_dma_alloc_pages(dev, page_to_virt(page), *dma_handle,
 				      size, dir, gfp, 0);
-		debug_dma_map_page(dev, page, 0, size, dir, *dma_handle, 0);
+		debug_dma_alloc_pages(dev, page, size, dir, *dma_handle, 0);
 	} else {
 		trace_dma_alloc_pages(dev, NULL, 0, size, dir, gfp, 0);
 	}
@@ -720,7 +720,7 @@ void dma_free_pages(struct device *dev, size_t size, struct page *page,
 		dma_addr_t dma_handle, enum dma_data_direction dir)
 {
 	trace_dma_free_pages(dev, page_to_virt(page), dma_handle, size, dir, 0);
-	debug_dma_unmap_page(dev, dma_handle, size, dir);
+	debug_dma_free_pages(dev, page, size, dir, dma_handle);
 	__dma_free_pages(dev, size, page, dma_handle, dir);
 }
 EXPORT_SYMBOL_GPL(dma_free_pages);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yeoreum Yun yeoreum.yun@arm.com
commit 7a19afee6fb39df63ddea7ce78976d8c521178c6 upstream.
Similar to commit 09c6304e38e4 ("kasan: test: fix compatibility with FORTIFY_SOURCE"), the kernel is panicking in kasan_strings().
This is because `src` and `ptr` are not hidden from the optimizer; hiding them is what disables the runtime fortify string checker.
Call trace:
 __fortify_panic+0x10/0x20 (P)
 kasan_strings+0x980/0x9b0
 kunit_try_run_case+0x68/0x190
 kunit_generic_run_threadfn_adapter+0x34/0x68
 kthread+0x1c4/0x228
 ret_from_fork+0x10/0x20
Code: d503233f a9bf7bfd 910003fd 9424b243 (d4210000)
---[ end trace 0000000000000000 ]---
note: kunit_try_catch[128] exited with irqs disabled
note: kunit_try_catch[128] exited with preempt_count 1
# kasan_strings: try faulted: last
** replaying previous printk message **
# kasan_strings: try faulted: last line seen mm/kasan/kasan_test_c.c:1600
# kasan_strings: internal error occurred preventing test case from running: -4
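For context, OPTIMIZER_HIDE_VAR() is roughly the following empty asm from include/linux/compiler.h (quoted from memory, so treat as a sketch); it makes the compiler lose track of the variable's provenance so FORTIFY_SOURCE cannot evaluate the access size at compile time:

	#define OPTIMIZER_HIDE_VAR(var)					\
		__asm__ ("" : "=r" (var) : "0" (var))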
Link: https://lkml.kernel.org/r/20250801120236.2962642-1-yeoreum.yun@arm.com
Fixes: 73228c7ecc5e ("KASAN: port KASAN Tests to KUnit")
Signed-off-by: Yeoreum Yun yeoreum.yun@arm.com
Cc: Alexander Potapenko glider@google.com
Cc: Andrey Konovalov andreyknvl@gmail.com
Cc: Andrey Ryabinin ryabinin.a.a@gmail.com
Cc: Dmitriy Vyukov dvyukov@google.com
Cc: Vincenzo Frascino vincenzo.frascino@arm.com
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Yeoreum Yun yeoreum.yun@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 mm/kasan/kasan_test_c.c | 1 +
 1 file changed, 1 insertion(+)
--- a/mm/kasan/kasan_test_c.c
+++ b/mm/kasan/kasan_test_c.c
@@ -1500,6 +1500,7 @@ static void kasan_memchr(struct kunit *t
 
 	ptr = kmalloc(size, GFP_KERNEL | __GFP_ZERO);
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
+	OPTIMIZER_HIDE_VAR(ptr);
 
 	OPTIMIZER_HIDE_VAR(ptr);
 	OPTIMIZER_HIDE_VAR(size);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yevgeny Kliteynik kliteyn@nvidia.com
commit 1ce840c7a659aa53a31ef49f0271b4fd0dc10296 upstream.
Currently, when a firmware failure occurs during the matcher disconnect flow, the error flow of the function reconnects the matcher and returns an error; the calling function then continues and eventually frees the matcher that is being disconnected. This leaves a freed matcher on the matchers list, which in turn leads to a use-after-free and an eventual crash.
This patch fixes that by not trying to reconnect the matcher when an FW command fails during disconnect.
Note that we're dealing here with a FW error that we can't overcome. This might lead to a bad steering state (e.g. a wrong connection between matchers), and will also lead to resource leakage, as is the case with any other error handling during resource destruction.
However, the goal here is to allow the driver to continue and not crash the machine with use-after-free error.
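Condensed shape of the fix, taken from one of the hunks below (sketch; the deleted rollback branch is what used to re-insert the matcher that the caller is about to free):

	ret = mlx5hws_table_ft_set_default_next_ft(tbl, prev_ft_id);
	if (ret) {
		mlx5hws_err(tbl->ctx, "Fatal error, failed to restore matcher ft default miss\n");
		return ret;	/* was: goto matcher_reconnect */
	}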
Signed-off-by: Yevgeny Kliteynik kliteyn@nvidia.com
Signed-off-by: Itamar Gozlan igozlan@nvidia.com
Reviewed-by: Mark Bloch mbloch@nvidia.com
Signed-off-by: Tariq Toukan tariqt@nvidia.com
Link: https://patch.msgid.link/20250102181415.1477316-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Jan Alexander Preissler akendo@akendo.eu
Signed-off-by: Sujana Subramaniam sujana.subramaniam@sap.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws_matcher.c | 24 +++-------
 1 file changed, 8 insertions(+), 16 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws_matcher.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws_matcher.c
@@ -165,14 +165,14 @@ static int hws_matcher_disconnect(struct
 					       next->match_ste.rtc_0_id,
 					       next->match_ste.rtc_1_id);
 		if (ret) {
-			mlx5hws_err(tbl->ctx, "Failed to disconnect matcher\n");
-			goto matcher_reconnect;
+			mlx5hws_err(tbl->ctx, "Fatal error, failed to disconnect matcher\n");
+			return ret;
 		}
 	} else {
 		ret = mlx5hws_table_connect_to_miss_table(tbl, tbl->default_miss.miss_tbl);
 		if (ret) {
-			mlx5hws_err(tbl->ctx, "Failed to disconnect last matcher\n");
-			goto matcher_reconnect;
+			mlx5hws_err(tbl->ctx, "Fatal error, failed to disconnect last matcher\n");
+			return ret;
 		}
 	}
 
@@ -180,27 +180,19 @@
 	if (prev_ft_id == tbl->ft_id) {
 		ret = mlx5hws_table_update_connected_miss_tables(tbl);
 		if (ret) {
-			mlx5hws_err(tbl->ctx, "Fatal error, failed to update connected miss table\n");
-			goto matcher_reconnect;
+			mlx5hws_err(tbl->ctx,
+				    "Fatal error, failed to update connected miss table\n");
+			return ret;
 		}
 	}
 
 	ret = mlx5hws_table_ft_set_default_next_ft(tbl, prev_ft_id);
 	if (ret) {
 		mlx5hws_err(tbl->ctx, "Fatal error, failed to restore matcher ft default miss\n");
-		goto matcher_reconnect;
+		return ret;
 	}
 
 	return 0;
-
-matcher_reconnect:
-	if (list_empty(&tbl->matchers_list) || !prev)
-		list_add(&matcher->list_node, &tbl->matchers_list);
-	else
-		/* insert after prev matcher */
-		list_add(&matcher->list_node, &prev->list_node);
-
-	return ret;
 }
 
 static void hws_matcher_set_rtc_attr_sz(struct mlx5hws_matcher *matcher,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Harry Yoo harry.yoo@oracle.com
commit f2d2f9598ebb0158a3fe17cda0106d7752e654a2 upstream.
Introduce and use {pgd,p4d}_populate_kernel() in core MM code when populating PGD and P4D entries for the kernel address space. These helpers ensure proper synchronization of page tables when updating the kernel portion of top-level page tables.
Until now, the kernel has relied on each architecture to handle synchronization of top-level page tables in an ad-hoc manner. For example, see commit 9b861528a801 ("x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes").
However, this approach has proven fragile for the following reasons:
1) It is easy to forget to perform the necessary page table synchronization when introducing new changes. For instance, commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps") overlooked the need to synchronize page tables for the vmemmap area.
2) It is also easy to overlook that the vmemmap and direct mapping areas must not be accessed before explicit page table synchronization. For example, commit 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges") caused crashes by accessing the vmemmap area before calling sync_global_pgds().
To address this, as suggested by Dave Hansen, introduce _kernel() variants of the page table population helpers, which invoke architecture-specific hooks to properly synchronize page tables. These are introduced in a new header file, include/linux/pgalloc.h, so they can be called from common code.
They reuse existing infrastructure for vmalloc and ioremap. Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK, and the actual synchronization is performed by arch_sync_kernel_mappings().
This change currently targets only x86_64, so only PGD and P4D level helpers are introduced. Currently, these helpers are no-ops since no architecture sets PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK.
In theory, PUD and PMD level helpers can be added later if needed by other architectures. For now, 32-bit architectures (x86-32 and arm) only handle PGTBL_PMD_MODIFIED, so p*d_populate_kernel() will never affect them unless we introduce a PMD level helper.
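As a rough sketch of the opt-in mechanism described above (the mask bits and the hook name are from this patch; the body is a placeholder, not any architecture's actual implementation), an architecture that needs this synchronization would do something like:

    /* In the architecture's page table headers (illustrative): */
    #define ARCH_PAGE_TABLE_SYNC_MASK (PGTBL_PGD_MODIFIED | PGTBL_P4D_MODIFIED)

    /* The hook that p*d_populate_kernel() then invokes: */
    void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
    {
            /* Propagate init_mm's top-level kernel entries to all
             * per-process page tables covering [start, end]. */
    }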
[harry.yoo@oracle.com: fix KASAN build error due to p*d_populate_kernel()] Link: https://lkml.kernel.org/r/20250822020727.202749-1-harry.yoo@oracle.com Link: https://lkml.kernel.org/r/20250818020206.4517-3-harry.yoo@oracle.com Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges") Signed-off-by: Harry Yoo harry.yoo@oracle.com Suggested-by: Dave Hansen dave.hansen@linux.intel.com Acked-by: Kiryl Shutsemau kas@kernel.org Reviewed-by: Mike Rapoport (Microsoft) rppt@kernel.org Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com Acked-by: David Hildenbrand david@redhat.com Cc: Alexander Potapenko glider@google.com Cc: Alistair Popple apopple@nvidia.com Cc: Andrey Konovalov andreyknvl@gmail.com Cc: Andrey Ryabinin ryabinin.a.a@gmail.com Cc: Andy Lutomirski luto@kernel.org Cc: "Aneesh Kumar K.V" aneesh.kumar@linux.ibm.com Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Ard Biesheuvel ardb@kernel.org Cc: Arnd Bergmann arnd@arndb.de Cc: bibo mao maobibo@loongson.cn Cc: Borislav Betkov bp@alien8.de Cc: Christoph Lameter (Ampere) cl@gentwo.org Cc: Dennis Zhou dennis@kernel.org Cc: Dev Jain dev.jain@arm.com Cc: Dmitriy Vyukov dvyukov@google.com Cc: Gwan-gyeong Mun gwan-gyeong.mun@intel.com Cc: Ingo Molnar mingo@redhat.com Cc: Jane Chu jane.chu@oracle.com Cc: Joao Martins joao.m.martins@oracle.com Cc: Joerg Roedel joro@8bytes.org Cc: John Hubbard jhubbard@nvidia.com Cc: Kevin Brodsky kevin.brodsky@arm.com Cc: Liam Howlett liam.howlett@oracle.com Cc: Michal Hocko mhocko@suse.com Cc: Oscar Salvador osalvador@suse.de Cc: Peter Xu peterx@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Qi Zheng zhengqi.arch@bytedance.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Suren Baghdasaryan surenb@google.com Cc: Tejun Heo tj@kernel.org Cc: Thomas Gleinxer tglx@linutronix.de Cc: Thomas Huth thuth@redhat.com Cc: "Uladzislau Rezki (Sony)" urezki@gmail.com Cc: Vincenzo Frascino vincenzo.frascino@arm.com Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org [ Adjust context ] Signed-off-by: Harry Yoo harry.yoo@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/pgalloc.h | 29 +++++++++++++++++++++++++++++ include/linux/pgtable.h | 13 +++++++------ mm/kasan/init.c | 12 ++++++------ mm/percpu.c | 6 +++--- mm/sparse-vmemmap.c | 6 +++--- 5 files changed, 48 insertions(+), 18 deletions(-) create mode 100644 include/linux/pgalloc.h
--- /dev/null +++ b/include/linux/pgalloc.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_PGALLOC_H +#define _LINUX_PGALLOC_H + +#include <linux/pgtable.h> +#include <asm/pgalloc.h> + +/* + * {pgd,p4d}_populate_kernel() are defined as macros to allow + * compile-time optimization based on the configured page table levels. + * Without this, linking may fail because callers (e.g., KASAN) may rely + * on calls to these functions being optimized away when passing symbols + * that exist only for certain page table levels. + */ +#define pgd_populate_kernel(addr, pgd, p4d) \ + do { \ + pgd_populate(&init_mm, pgd, p4d); \ + if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED) \ + arch_sync_kernel_mappings(addr, addr); \ + } while (0) + +#define p4d_populate_kernel(addr, p4d, pud) \ + do { \ + p4d_populate(&init_mm, p4d, pud); \ + if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_P4D_MODIFIED) \ + arch_sync_kernel_mappings(addr, addr); \ + } while (0) + +#endif /* _LINUX_PGALLOC_H */ --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1699,8 +1699,8 @@ static inline int pmd_protnone(pmd_t pmd
/* * Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values - * and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings() - * needs to be called. + * and let generic vmalloc, ioremap and page table update code know when + * arch_sync_kernel_mappings() needs to be called. */ #ifndef ARCH_PAGE_TABLE_SYNC_MASK #define ARCH_PAGE_TABLE_SYNC_MASK 0 @@ -1833,10 +1833,11 @@ static inline bool arch_has_pfn_modify_c /* * Page Table Modification bits for pgtbl_mod_mask. * - * These are used by the p?d_alloc_track*() set of functions an in the generic - * vmalloc/ioremap code to track at which page-table levels entries have been - * modified. Based on that the code can better decide when vmalloc and ioremap - * mapping changes need to be synchronized to other page-tables in the system. + * These are used by the p?d_alloc_track*() and p*d_populate_kernel() + * functions in the generic vmalloc, ioremap and page table update code + * to track at which page-table levels entries have been modified. + * Based on that the code can better decide when page table changes need + * to be synchronized to other page-tables in the system. */ #define __PGTBL_PGD_MODIFIED 0 #define __PGTBL_P4D_MODIFIED 1 --- a/mm/kasan/init.c +++ b/mm/kasan/init.c @@ -13,9 +13,9 @@ #include <linux/mm.h> #include <linux/pfn.h> #include <linux/slab.h> +#include <linux/pgalloc.h>
#include <asm/page.h> -#include <asm/pgalloc.h>
#include "kasan.h"
@@ -203,7 +203,7 @@ static int __ref zero_p4d_populate(pgd_t pud_t *pud; pmd_t *pmd;
- p4d_populate(&init_mm, p4d, + p4d_populate_kernel(addr, p4d, lm_alias(kasan_early_shadow_pud)); pud = pud_offset(p4d, addr); pud_populate(&init_mm, pud, @@ -224,7 +224,7 @@ static int __ref zero_p4d_populate(pgd_t } else { p = early_alloc(PAGE_SIZE, NUMA_NO_NODE); pud_init(p); - p4d_populate(&init_mm, p4d, p); + p4d_populate_kernel(addr, p4d, p); } } zero_pud_populate(p4d, addr, next); @@ -263,10 +263,10 @@ int __ref kasan_populate_early_shadow(co * puds,pmds, so pgd_populate(), pud_populate() * is noops. */ - pgd_populate(&init_mm, pgd, + pgd_populate_kernel(addr, pgd, lm_alias(kasan_early_shadow_p4d)); p4d = p4d_offset(pgd, addr); - p4d_populate(&init_mm, p4d, + p4d_populate_kernel(addr, p4d, lm_alias(kasan_early_shadow_pud)); pud = pud_offset(p4d, addr); pud_populate(&init_mm, pud, @@ -285,7 +285,7 @@ int __ref kasan_populate_early_shadow(co if (!p) return -ENOMEM; } else { - pgd_populate(&init_mm, pgd, + pgd_populate_kernel(addr, pgd, early_alloc(PAGE_SIZE, NUMA_NO_NODE)); } } --- a/mm/percpu.c +++ b/mm/percpu.c @@ -3129,7 +3129,7 @@ out_free: #endif /* BUILD_EMBED_FIRST_CHUNK */
#ifdef BUILD_PAGE_FIRST_CHUNK -#include <asm/pgalloc.h> +#include <linux/pgalloc.h>
#ifndef P4D_TABLE_SIZE #define P4D_TABLE_SIZE PAGE_SIZE @@ -3157,7 +3157,7 @@ void __init __weak pcpu_populate_pte(uns p4d = memblock_alloc(P4D_TABLE_SIZE, P4D_TABLE_SIZE); if (!p4d) goto err_alloc; - pgd_populate(&init_mm, pgd, p4d); + pgd_populate_kernel(addr, pgd, p4d); }
p4d = p4d_offset(pgd, addr); @@ -3165,7 +3165,7 @@ void __init __weak pcpu_populate_pte(uns pud = memblock_alloc(PUD_TABLE_SIZE, PUD_TABLE_SIZE); if (!pud) goto err_alloc; - p4d_populate(&init_mm, p4d, pud); + p4d_populate_kernel(addr, p4d, pud); }
pud = pud_offset(p4d, addr); --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -27,9 +27,9 @@ #include <linux/spinlock.h> #include <linux/vmalloc.h> #include <linux/sched.h> +#include <linux/pgalloc.h>
#include <asm/dma.h> -#include <asm/pgalloc.h>
/* * Allocate a block of memory to be used to back the virtual memory map @@ -230,7 +230,7 @@ p4d_t * __meminit vmemmap_p4d_populate(p if (!p) return NULL; pud_init(p); - p4d_populate(&init_mm, p4d, p); + p4d_populate_kernel(addr, p4d, p); } return p4d; } @@ -242,7 +242,7 @@ pgd_t * __meminit vmemmap_pgd_populate(u void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node); if (!p) return NULL; - pgd_populate(&init_mm, pgd, p); + pgd_populate_kernel(addr, pgd, p); } return pgd; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Anderson sean.anderson@linux.dev
[ Upstream commit d5bbfbad58ec0ccd187282f0e171bc763efa6828 ]
trace_dma_alloc_sgt_err was called with the dir and flags arguments swapped. Fix this.
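For context, the two call variants (taken from the hunk below) show why the compiler stayed silent: both transposed parameters are plain integer types, so the swap type-checks cleanly.

    /* Before: gfp and dir in each other's positions. */
    trace_dma_alloc_sgt_err(dev, NULL, 0, size, gfp, dir, attrs);
    /* After: arguments in the order the trace event expects. */
    trace_dma_alloc_sgt_err(dev, NULL, 0, size, dir, gfp, attrs);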
Fixes: 68b6dbf1f441 ("dma-mapping: trace more error paths") Signed-off-by: Sean Anderson sean.anderson@linux.dev Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202410302243.1wnTlPk3-lkp@intel.com/ Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/dma/mapping.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index c12c62ad8a6bf..c7cc4e33ec6e0 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -782,7 +782,7 @@ struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size, trace_dma_alloc_sgt(dev, sgt, size, dir, gfp, attrs); debug_dma_map_sg(dev, sgt->sgl, sgt->orig_nents, 1, dir, attrs); } else { - trace_dma_alloc_sgt_err(dev, NULL, 0, size, gfp, dir, attrs); + trace_dma_alloc_sgt_err(dev, NULL, 0, size, dir, gfp, attrs); } return sgt; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fedor Pchelkin pchelkin@ispras.ru
[ Upstream commit aef7ee7649e02f7fc0d2e5e532f352496976dcb1 ]
The offset into the page should also be considered when calculating the physical address for a struct dma_debug_entry. page_to_phys() just shifts the page frame number PAGE_SHIFT bits to the left, so the offset part is zero-filled.
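A minimal illustration of the difference, using the names from debug_dma_map_page():

    /* page_to_phys() alone yields a page-aligned address: */
    entry->paddr = page_to_phys(page);          /* low PAGE_SHIFT bits are 0 */
    /* For a mapping that starts mid-page, the real start address is: */
    entry->paddr = page_to_phys(page) + offset; /* what the fix records */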
An example (wrong) debug assertion failure with CONFIG_DMA_API_DEBUG enabled which is observed during systemd boot process after recent dma-debug changes:
DMA-API: e1000 0000:00:03.0: cacheline tracking EEXIST, overlapping mappings aren't supported
WARNING: CPU: 4 PID: 941 at kernel/dma/debug.c:596 add_dma_entry
CPU: 4 UID: 0 PID: 941 Comm: ip Not tainted 6.12.0+ #288
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:add_dma_entry kernel/dma/debug.c:596
Call Trace:
 <TASK>
 debug_dma_map_page kernel/dma/debug.c:1236
 dma_map_page_attrs kernel/dma/mapping.c:179
 e1000_alloc_rx_buffers drivers/net/ethernet/intel/e1000/e1000_main.c:4616
 ...
Found by Linux Verification Center (linuxtesting.org).
Fixes: 9d4f645a1fd4 ("dma-debug: store a phys_addr_t in struct dma_debug_entry") Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru [hch: added a little helper to clean up the code] Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/dma/debug.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c index 0221023e1120d..39972e834e7a1 100644 --- a/kernel/dma/debug.c +++ b/kernel/dma/debug.c @@ -1224,7 +1224,7 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
entry->dev = dev; entry->type = dma_debug_single; - entry->paddr = page_to_phys(page); + entry->paddr = page_to_phys(page) + offset; entry->dev_addr = dma_addr; entry->size = size; entry->direction = direction; @@ -1382,6 +1382,18 @@ void debug_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, } }
+static phys_addr_t virt_to_paddr(void *virt) +{ + struct page *page; + + if (is_vmalloc_addr(virt)) + page = vmalloc_to_page(virt); + else + page = virt_to_page(virt); + + return page_to_phys(page) + offset_in_page(virt); +} + void debug_dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t dma_addr, void *virt, unsigned long attrs) @@ -1404,8 +1416,7 @@ void debug_dma_alloc_coherent(struct device *dev, size_t size,
entry->type = dma_debug_coherent; entry->dev = dev; - entry->paddr = page_to_phys((is_vmalloc_addr(virt) ? - vmalloc_to_page(virt) : virt_to_page(virt))); + entry->paddr = virt_to_paddr(virt); entry->size = size; entry->dev_addr = dma_addr; entry->direction = DMA_BIDIRECTIONAL; @@ -1428,8 +1439,7 @@ void debug_dma_free_coherent(struct device *dev, size_t size, if (!is_vmalloc_addr(virt) && !virt_addr_valid(virt)) return;
- ref.paddr = page_to_phys((is_vmalloc_addr(virt) ? - vmalloc_to_page(virt) : virt_to_page(virt))); + ref.paddr = virt_to_paddr(virt);
if (unlikely(dma_debug_disabled())) return;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maurizio Lombardi mlombard@redhat.com
[ Upstream commit 288ff0d10beb069355036355d5f7612579dc869c ]
nvme_submit_cmds() should check the rqlist before calling nvme_write_sq_db(); if the list is empty, it must return immediately.
Fixes: beadf0088501 ("nvme-pci: reverse request order in nvme_queue_rqs") Signed-off-by: Maurizio Lombardi mlombard@redhat.com Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 2bddc9f60fecc..fdc9f1df0578b 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -989,6 +989,9 @@ static void nvme_submit_cmds(struct nvme_queue *nvmeq, struct rq_list *rqlist) { struct request *req;
+ if (rq_list_empty(rqlist)) + return; + spin_lock(&nvmeq->sq_lock); while ((req = rq_list_pop(rqlist))) { struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huan Yang link@vivo.com
[ Upstream commit ceb7b62eaaaacfcf87473bd2e99ac73a758620cb ]
This reverts commit 18d7de823b7150344d242c3677e65d68c5271b04.
We cannot use vmap_pfn() in vmap_udmabuf() as it would fail the pfn_valid() check in vmap_pfn_apply(). This is because vmap_pfn() is intended to be used for mapping non-struct-page memory such as PCIe BARs. Since udmabuf mostly works with pages/folios backed by shmem/hugetlbfs/THP, vmap_pfn() is not the right tool or API to invoke for implementing vmap.
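A brief contrast of the two vmalloc-space APIs involved, as a sketch (signatures as upstream; error handling omitted):

    /* Rejects page-backed (pfn_valid()) PFNs; meant for PCIe BARs etc.: */
    vaddr = vmap_pfn(pfns, ubuf->pagecount, PAGE_KERNEL);

    /* Maps struct-page-backed memory, which udmabuf actually has: */
    vaddr = vm_map_ram(pages, ubuf->pagecount, -1); /* -1: any NUMA node */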
Signed-off-by: Huan Yang link@vivo.com Suggested-by: Vivek Kasireddy vivek.kasireddy@intel.com Reported-by: Bingbu Cao bingbu.cao@linux.intel.com Closes: https://lore.kernel.org/dri-devel/eb7e0137-3508-4287-98c4-816c5fd98e10@vivo.... Acked-by: Vivek Kasireddy vivek.kasireddy@intel.com Signed-off-by: Vivek Kasireddy vivek.kasireddy@intel.com Link: https://lore.kernel.org/r/20250428073831.19942-2-link@vivo.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma-buf/Kconfig | 1 - drivers/dma-buf/udmabuf.c | 22 +++++++--------------- 2 files changed, 7 insertions(+), 16 deletions(-)
diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig index fee04fdb08220..b46eb8a552d7b 100644 --- a/drivers/dma-buf/Kconfig +++ b/drivers/dma-buf/Kconfig @@ -36,7 +36,6 @@ config UDMABUF depends on DMA_SHARED_BUFFER depends on MEMFD_CREATE || COMPILE_TEST depends on MMU - select VMAP_PFN help A driver to let userspace turn memfd regions into dma-bufs. Qemu can use this to create host dmabufs for guest framebuffers. diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c index 959f690b12260..0e127a9109e75 100644 --- a/drivers/dma-buf/udmabuf.c +++ b/drivers/dma-buf/udmabuf.c @@ -74,29 +74,21 @@ static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma) static int vmap_udmabuf(struct dma_buf *buf, struct iosys_map *map) { struct udmabuf *ubuf = buf->priv; - unsigned long *pfns; + struct page **pages; void *vaddr; pgoff_t pg;
dma_resv_assert_held(buf->resv);
- /** - * HVO may free tail pages, so just use pfn to map each folio - * into vmalloc area. - */ - pfns = kvmalloc_array(ubuf->pagecount, sizeof(*pfns), GFP_KERNEL); - if (!pfns) + pages = kvmalloc_array(ubuf->pagecount, sizeof(*pages), GFP_KERNEL); + if (!pages) return -ENOMEM;
- for (pg = 0; pg < ubuf->pagecount; pg++) { - unsigned long pfn = folio_pfn(ubuf->folios[pg]); - - pfn += ubuf->offsets[pg] >> PAGE_SHIFT; - pfns[pg] = pfn; - } + for (pg = 0; pg < ubuf->pagecount; pg++) + pages[pg] = &ubuf->folios[pg]->page;
- vaddr = vmap_pfn(pfns, ubuf->pagecount, PAGE_KERNEL); - kvfree(pfns); + vaddr = vm_map_ram(pages, ubuf->pagecount, -1); + kvfree(pages); if (!vaddr) return -EINVAL;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
[ Upstream commit 9e28059d56649a7212d5b3f8751ec021154ba3dd ]
This patch addresses an issue where some files in case-insensitive directories become inaccessible due to a change in how the kernel function utf8_casefold() generates case-folded strings, introduced by commit 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points").
There are good reasons why this change should be made; it's actually quite stupid that Unicode seems to think that the characters ❤ and ❤️ should be casefolded. Unfortunately, because of the backwards compatibility issue, this commit was reverted in 231825b2e1ff.
This problem is addressed by instituting a brute-force linear fallback if a lookup fails in a case-folded directory. This does result in a performance hit when looking up files affected by the change in how the kernel treats ignorable Unicode characters, or when attempting to look up non-existent file names, so the fallback can be disabled by setting an encoding flag if, in the future, the system administrator or the manufacturer of a mobile handset or tablet can be sure that there was no opportunity for a kernel to insert file names with incompatible encodings.
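A hypothetical opt-out, sketched with the flag this patch introduces (where a filesystem would set it is a policy decision not shown here):

    /* At mount time, a filesystem certain that no incompatible names
     * exist on disk could set: */
    sb->s_encoding_flags |= SB_ENC_NO_COMPAT_FALLBACK_FL;

    /* after which sb_no_casefold_compat_fallback(sb) is true and the
     * linear directory-scan fallback is skipped. */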
Fixes: 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points") Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Gabriel Krisman Bertazi krisman@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/namei.c | 14 ++++++++++---- include/linux/fs.h | 10 +++++++++- 2 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 286f8fcb74cc9..29c7c0b8295fb 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -1462,7 +1462,8 @@ static bool ext4_match(struct inode *parent, * sure cf_name was properly initialized before * considering the calculated hash. */ - if (IS_ENCRYPTED(parent) && fname->cf_name.name && + if (sb_no_casefold_compat_fallback(parent->i_sb) && + IS_ENCRYPTED(parent) && fname->cf_name.name && (fname->hinfo.hash != EXT4_DIRENT_HASH(de) || fname->hinfo.minor_hash != EXT4_DIRENT_MINOR_HASH(de))) return false; @@ -1595,10 +1596,15 @@ static struct buffer_head *__ext4_find_entry(struct inode *dir, * return. Otherwise, fall back to doing a search the * old fashioned way. */ - if (!IS_ERR(ret) || PTR_ERR(ret) != ERR_BAD_DX_DIR) + if (IS_ERR(ret) && PTR_ERR(ret) == ERR_BAD_DX_DIR) + dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, " + "falling back\n")); + else if (!sb_no_casefold_compat_fallback(dir->i_sb) && + *res_dir == NULL && IS_CASEFOLDED(dir)) + dxtrace(printk(KERN_DEBUG "ext4_find_entry: casefold " + "failed, falling back\n")); + else goto cleanup_and_exit; - dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, " - "falling back\n")); ret = NULL; } nblocks = dir->i_size >> EXT4_BLOCK_SIZE_BITS(sb); diff --git a/include/linux/fs.h b/include/linux/fs.h index a6de8d93838d1..37a01c9d96583 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1197,11 +1197,19 @@ extern int send_sigurg(struct file *file); #define SB_NOUSER BIT(31)
/* These flags relate to encoding and casefolding */ -#define SB_ENC_STRICT_MODE_FL (1 << 0) +#define SB_ENC_STRICT_MODE_FL (1 << 0) +#define SB_ENC_NO_COMPAT_FALLBACK_FL (1 << 1)
#define sb_has_strict_encoding(sb) \ (sb->s_encoding_flags & SB_ENC_STRICT_MODE_FL)
+#if IS_ENABLED(CONFIG_UNICODE) +#define sb_no_casefold_compat_fallback(sb) \ + (sb->s_encoding_flags & SB_ENC_NO_COMPAT_FALLBACK_FL) +#else +#define sb_no_casefold_compat_fallback(sb) (1) +#endif + /* * Umount options */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Umesh Nerlige Ramappa umesh.nerlige.ramappa@intel.com
[ Upstream commit cb5fab2afd906307876d79537ef0329033c40dd3 ]
When running igt@gem_exec_balancer@individual for multiple iterations, it is seen that the delta busyness returned by PMU is 0. The issue stems from a combination of 2 implementation specific details:
1) gt_park is throttling __update_guc_busyness_stats() so that it does not hog PCI bandwidth for some use cases. (Ref: 59bcdb564b3ba)
2) busyness implementation always returns monotonically increasing counters. (Ref: cf907f6d29421)
If an application queries an engine while it is active, engine->stats.guc.running is set to true. Following that, if all PM wakerefs are released, the GT is parked. At this point the throttling of __update_guc_busyness_stats() may result in a missed update to the running state of the engine (due to (1) above). This means subsequent calls to guc_engine_busyness() will think that the engine is still running and will keep updating the cached counter (stats->total). This results in an inflated cached counter.
Later, when the application runs a workload and queries for busyness, we return the cached value, since it is larger than the actual value (due to (2) above).
All subsequent queries will return the same large (inflated) value, so the application sees a delta busyness of zero.
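The interaction can be sketched as follows (illustrative pseudologic of the monotonicity rule in (2), not the driver's verbatim code):

    /* stats->total was inflated while the stale "running" state kept
     * accumulating busyness after the GT was parked. */
    total = sample_engine_busyness(engine);  /* illustrative helper */
    if (total < stats->total)
            total = stats->total;            /* never report a decrease */
    stats->total = total;                    /* inflated value wins forever */
    return total;                            /* => PMU sees a delta of 0 */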
Fix the issue by resetting the running state of engines each time intel_guc_busyness_park() is called.
v2: (Rodrigo) - Use the correct tag in commit message - Drop the redundant wakeref check in guc_engine_busyness() and update commit message
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13366 Fixes: cf907f6d2942 ("i915/guc: Ensure busyness counter increases motonically") Signed-off-by: Umesh Nerlige Ramappa umesh.nerlige.ramappa@intel.com Reviewed-by: Rodrigo Vivi rodrigo.vivi@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20250123193839.2394694-1-umesh... (cherry picked from commit 431b742e2bfc9f6dd713f261629741980996d001) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index b48373b166779..355a21eb48443 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1469,6 +1469,19 @@ static void __reset_guc_busyness_stats(struct intel_guc *guc) spin_unlock_irqrestore(&guc->timestamp.lock, flags); }
+static void __update_guc_busyness_running_state(struct intel_guc *guc) +{ + struct intel_gt *gt = guc_to_gt(guc); + struct intel_engine_cs *engine; + enum intel_engine_id id; + unsigned long flags; + + spin_lock_irqsave(&guc->timestamp.lock, flags); + for_each_engine(engine, gt, id) + engine->stats.guc.running = false; + spin_unlock_irqrestore(&guc->timestamp.lock, flags); +} + static void __update_guc_busyness_stats(struct intel_guc *guc) { struct intel_gt *gt = guc_to_gt(guc); @@ -1619,6 +1632,9 @@ void intel_guc_busyness_park(struct intel_gt *gt) if (!guc_submission_initialized(guc)) return;
+ /* Assume no engines are running and set running state to false */ + __update_guc_busyness_running_state(guc); + /* * There is a race with suspend flow where the worker runs after suspend * and causes an unclaimed register access warning. Cancel the worker
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Srinivasan Shanmugam srinivasan.shanmugam@amd.com
[ Upstream commit da29abe71e164f10917ea6da02f5d9c192ccdeb7 ]
The function amdgpu_dm_crtc_mem_type_changed was dereferencing pointers returned by drm_atomic_get_plane_state without checking for errors. This could lead to undefined behavior if the function returns an error pointer.
This commit adds checks using IS_ERR to ensure that new_plane_state and old_plane_state are valid before dereferencing them.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:11486 amdgpu_dm_crtc_mem_type_changed() error: 'new_plane_state' dereferencing possible ERR_PTR()
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c
 11475 static bool amdgpu_dm_crtc_mem_type_changed(struct drm_device *dev,
 11476                                             struct drm_atomic_state *state,
 11477                                             struct drm_crtc_state *crtc_state)
 11478 {
 11479         struct drm_plane *plane;
 11480         struct drm_plane_state *new_plane_state, *old_plane_state;
 11481
 11482         drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask) {
 11483                 new_plane_state = drm_atomic_get_plane_state(state, plane);
 11484                 old_plane_state = drm_atomic_get_plane_state(state, plane);
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
                                         These functions can fail.
 11485
 11486 -->             if (old_plane_state->fb && new_plane_state->fb &&
 11487                     get_mem_type(old_plane_state->fb) != get_mem_type(new_plane_state->fb))
 11488                         return true;
 11489         }
 11490
 11491         return false;
 11492 }
Fixes: 4caacd1671b7 ("drm/amd/display: Do not elevate mem_type change to full update") Cc: Leo Li sunpeng.li@amd.com Cc: Tom Chung chiahsuan.chung@amd.com Cc: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Cc: Roman Li roman.li@amd.com Cc: Alex Hung alex.hung@amd.com Cc: Aurabindo Pillai aurabindo.pillai@amd.com Cc: Harry Wentland harry.wentland@amd.com Cc: Hamza Mahfooz hamza.mahfooz@amd.com Reported-by: Dan Carpenter dan.carpenter@linaro.org Signed-off-by: Srinivasan Shanmugam srinivasan.shanmugam@amd.com Reviewed-by: Roman Li roman.li@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 9763752cf5cde..b585c321d3454 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -11483,6 +11483,11 @@ static bool amdgpu_dm_crtc_mem_type_changed(struct drm_device *dev, new_plane_state = drm_atomic_get_plane_state(state, plane); old_plane_state = drm_atomic_get_plane_state(state, plane);
+ if (IS_ERR(new_plane_state) || IS_ERR(old_plane_state)) { + DRM_ERROR("Failed to get plane state for plane %s\n", plane->name); + return false; + } + if (old_plane_state->fb && new_plane_state->fb && get_mem_type(old_plane_state->fb) != get_mem_type(new_plane_state->fb)) return true;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aurabindo Pillai aurabindo.pillai@amd.com
[ Upstream commit a5d258a00b41143d9c64880eed35799d093c4782 ]
This reverts commit 88c7c56d07c108ed4de319c8dba44aa4b8a38dd1.
SW and HW state do not always match, which in some cases causes the cursor to be disabled.
Signed-off-by: Aurabindo Pillai aurabindo.pillai@amd.com Reviewed-by: Leo Li sunpeng.li@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/dpp/dcn10/dcn10_dpp.c | 7 +++---- .../gpu/drm/amd/display/dc/dpp/dcn401/dcn401_dpp_cm.c | 6 ++---- drivers/gpu/drm/amd/display/dc/hubp/dcn20/dcn20_hubp.c | 8 +++----- .../gpu/drm/amd/display/dc/hubp/dcn401/dcn401_hubp.c | 10 ++++------ 4 files changed, 12 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dpp/dcn10/dcn10_dpp.c b/drivers/gpu/drm/amd/display/dc/dpp/dcn10/dcn10_dpp.c index 01480a04f85ef..9a3be1dd352b6 100644 --- a/drivers/gpu/drm/amd/display/dc/dpp/dcn10/dcn10_dpp.c +++ b/drivers/gpu/drm/amd/display/dc/dpp/dcn10/dcn10_dpp.c @@ -483,11 +483,10 @@ void dpp1_set_cursor_position( if (src_y_offset + cursor_height <= 0) cur_en = 0; /* not visible beyond top edge*/
- if (dpp_base->pos.cur0_ctl.bits.cur0_enable != cur_en) { - REG_UPDATE(CURSOR0_CONTROL, CUR0_ENABLE, cur_en); + REG_UPDATE(CURSOR0_CONTROL, + CUR0_ENABLE, cur_en);
- dpp_base->pos.cur0_ctl.bits.cur0_enable = cur_en; - } + dpp_base->pos.cur0_ctl.bits.cur0_enable = cur_en; }
void dpp1_cnv_set_optional_cursor_attributes( diff --git a/drivers/gpu/drm/amd/display/dc/dpp/dcn401/dcn401_dpp_cm.c b/drivers/gpu/drm/amd/display/dc/dpp/dcn401/dcn401_dpp_cm.c index 712aff7e17f7a..92b34fe47f740 100644 --- a/drivers/gpu/drm/amd/display/dc/dpp/dcn401/dcn401_dpp_cm.c +++ b/drivers/gpu/drm/amd/display/dc/dpp/dcn401/dcn401_dpp_cm.c @@ -155,11 +155,9 @@ void dpp401_set_cursor_position( struct dcn401_dpp *dpp = TO_DCN401_DPP(dpp_base); uint32_t cur_en = pos->enable ? 1 : 0;
- if (dpp_base->pos.cur0_ctl.bits.cur0_enable != cur_en) { - REG_UPDATE(CURSOR0_CONTROL, CUR0_ENABLE, cur_en); + REG_UPDATE(CURSOR0_CONTROL, CUR0_ENABLE, cur_en);
- dpp_base->pos.cur0_ctl.bits.cur0_enable = cur_en; - } + dpp_base->pos.cur0_ctl.bits.cur0_enable = cur_en; }
void dpp401_set_optional_cursor_attributes( diff --git a/drivers/gpu/drm/amd/display/dc/hubp/dcn20/dcn20_hubp.c b/drivers/gpu/drm/amd/display/dc/hubp/dcn20/dcn20_hubp.c index c74ee2d50a699..b405fa22f87a9 100644 --- a/drivers/gpu/drm/amd/display/dc/hubp/dcn20/dcn20_hubp.c +++ b/drivers/gpu/drm/amd/display/dc/hubp/dcn20/dcn20_hubp.c @@ -1044,13 +1044,11 @@ void hubp2_cursor_set_position( if (src_y_offset + cursor_height <= 0) cur_en = 0; /* not visible beyond top edge*/
- if (hubp->pos.cur_ctl.bits.cur_enable != cur_en) { - if (cur_en && REG_READ(CURSOR_SURFACE_ADDRESS) == 0) - hubp->funcs->set_cursor_attributes(hubp, &hubp->curs_attr); + if (cur_en && REG_READ(CURSOR_SURFACE_ADDRESS) == 0) + hubp->funcs->set_cursor_attributes(hubp, &hubp->curs_attr);
- REG_UPDATE(CURSOR_CONTROL, + REG_UPDATE(CURSOR_CONTROL, CURSOR_ENABLE, cur_en); - }
REG_SET_2(CURSOR_POSITION, 0, CURSOR_X_POSITION, pos->x, diff --git a/drivers/gpu/drm/amd/display/dc/hubp/dcn401/dcn401_hubp.c b/drivers/gpu/drm/amd/display/dc/hubp/dcn401/dcn401_hubp.c index 7013c124efcff..2d52100510f05 100644 --- a/drivers/gpu/drm/amd/display/dc/hubp/dcn401/dcn401_hubp.c +++ b/drivers/gpu/drm/amd/display/dc/hubp/dcn401/dcn401_hubp.c @@ -718,13 +718,11 @@ void hubp401_cursor_set_position( dc_fixpt_from_int(dst_x_offset), param->h_scale_ratio));
- if (hubp->pos.cur_ctl.bits.cur_enable != cur_en) { - if (cur_en && REG_READ(CURSOR_SURFACE_ADDRESS) == 0) - hubp->funcs->set_cursor_attributes(hubp, &hubp->curs_attr); + if (cur_en && REG_READ(CURSOR_SURFACE_ADDRESS) == 0) + hubp->funcs->set_cursor_attributes(hubp, &hubp->curs_attr);
- REG_UPDATE(CURSOR_CONTROL, - CURSOR_ENABLE, cur_en); - } + REG_UPDATE(CURSOR_CONTROL, + CURSOR_ENABLE, cur_en);
REG_SET_2(CURSOR_POSITION, 0, CURSOR_X_POSITION, x_pos,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
[ Upstream commit 829ee558f3527fd602c6e2e9f270959d1de09fe0 ]
The ASUS VivoBook X515UA with PCI SSID 1043:106f picked up a default quirk via the pin table that applies ALC256_FIXUP_ASUS_MIC, but this leaves a bogus built-in mic pin 0x13 enabled. That was no big problem while pin 0x13 was assigned as the secondary mic, but the recent fix sorted the entries, so this bogus pin now appears as the primary input and breaks things.
For fixing the bug, put the right quirk entry for this device pointing to ALC256_FIXUP_ASUS_MIC_NO_PRESENCE.
Fixes: 3b4309546b48 ("ALSA: hda: Fix headset detection failure due to unstable sort") Link: https://bugzilla.kernel.org/show_bug.cgi?id=219897 Link: https://patch.msgid.link/20250324153233.21195-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index a28a59926adad..eb0e404c17841 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10857,6 +10857,7 @@ static const struct hda_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1043, 0x103f, "ASUS TX300", ALC282_FIXUP_ASUS_TX300), SND_PCI_QUIRK(0x1043, 0x1054, "ASUS G614FH/FM/FP", ALC287_FIXUP_CS35L41_I2C_2), SND_PCI_QUIRK(0x1043, 0x106d, "Asus K53BE", ALC269_FIXUP_LIMIT_INT_MIC_BOOST), + SND_PCI_QUIRK(0x1043, 0x106f, "ASUS VivoBook X515UA", ALC256_FIXUP_ASUS_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1043, 0x1074, "ASUS G614PH/PM/PP", ALC287_FIXUP_CS35L41_I2C_2), SND_PCI_QUIRK(0x1043, 0x10a1, "ASUS UX391UA", ALC294_FIXUP_ASUS_SPK), SND_PCI_QUIRK(0x1043, 0x10a4, "ASUS TP3407SA", ALC287_FIXUP_TAS2781_I2C),
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Rosca david.rosca@amd.com
[ Upstream commit 2036be31741b00f030530381643a8b35a5a42b5c ]
JPEG is unsupported only on Vega, so it should not have been removed from the Carrizo video caps.
Fixes: 0a6e7b06bdbe ("drm/amdgpu: Remove JPEG from vega and carrizo video caps") Signed-off-by: David Rosca david.rosca@amd.com Reviewed-by: Leo Liu leo.liu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 0f4dfe86fe922c37bcec99dce80a15b4d5d4726d) Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/vi.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c index 48ab93e715c82..6e4f9c6108f60 100644 --- a/drivers/gpu/drm/amd/amdgpu/vi.c +++ b/drivers/gpu/drm/amd/amdgpu/vi.c @@ -239,6 +239,13 @@ static const struct amdgpu_video_codec_info cz_video_codecs_decode_array[] = .max_pixels_per_frame = 4096 * 4096, .max_level = 186, }, + { + .codec_type = AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_JPEG, + .max_width = 4096, + .max_height = 4096, + .max_pixels_per_frame = 4096 * 4096, + .max_level = 0, + }, };
static const struct amdgpu_video_codecs cz_video_codecs_decode =
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tigran Mkrtchyan tigran.mkrtchyan@desy.de
[ Upstream commit 5a46d2339a5ae268ede53a221f20433d8ea4f2f9 ]
Recent commit f06bedfa62d5 ("pNFS/flexfiles: don't attempt pnfs on fatal DS errors") has changed the error return type of ff_layout_choose_ds_for_read() from NULL to an error pointer. However, not all code paths have been updated to match the change. Thus, some non-NULL checks will accept error pointers as a valid return value.
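The hazard in miniature (a sketch; use() is an illustrative stand-in for the surrounding logic):

    ds = ff_layout_choose_ds_for_read(lseg, start_idx, best_idx);

    if (ds)          /* wrong since the change: ERR_PTR() is non-NULL */
            use(ds);

    if (!IS_ERR(ds)) /* right: rejects error pointers too */
            use(ds);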
Reported-by: Dan Carpenter dan.carpenter@linaro.org Suggested-by: Dan Carpenter dan.carpenter@linaro.org Fixes: f06bedfa62d5 ("pNFS/flexfiles: don't attempt pnfs on fatal DS errors") Signed-off-by: Tigran Mkrtchyan tigran.mkrtchyan@desy.de Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/flexfilelayout/flexfilelayout.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c index b685e763ef11b..69496aab9583e 100644 --- a/fs/nfs/flexfilelayout/flexfilelayout.c +++ b/fs/nfs/flexfilelayout/flexfilelayout.c @@ -772,8 +772,11 @@ ff_layout_choose_ds_for_read(struct pnfs_layout_segment *lseg, continue;
if (check_device && - nfs4_test_deviceid_unavailable(&mirror->mirror_ds->id_node)) + nfs4_test_deviceid_unavailable(&mirror->mirror_ds->id_node)) { + // reinitialize the error state in case if this is the last iteration + ds = ERR_PTR(-EINVAL); continue; + }
*best_idx = idx; break; @@ -803,7 +806,7 @@ ff_layout_choose_best_ds_for_read(struct pnfs_layout_segment *lseg, struct nfs4_pnfs_ds *ds;
ds = ff_layout_choose_valid_ds_for_read(lseg, start_idx, best_idx); - if (ds) + if (!IS_ERR(ds)) return ds; return ff_layout_choose_any_ds_for_read(lseg, start_idx, best_idx); } @@ -817,7 +820,7 @@ ff_layout_get_ds_for_read(struct nfs_pageio_descriptor *pgio,
ds = ff_layout_choose_best_ds_for_read(lseg, pgio->pg_mirror_idx, best_idx); - if (ds || !pgio->pg_mirror_idx) + if (!IS_ERR(ds) || !pgio->pg_mirror_idx) return ds; return ff_layout_choose_best_ds_for_read(lseg, 0, best_idx); } @@ -867,7 +870,7 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio, req->wb_nio = 0;
ds = ff_layout_get_ds_for_read(pgio, &ds_idx); - if (!ds) { + if (IS_ERR(ds)) { if (!ff_layout_no_fallback_to_mds(pgio->pg_lseg)) goto out_mds; pnfs_generic_pg_cleanup(pgio); @@ -1071,11 +1074,13 @@ static void ff_layout_resend_pnfs_read(struct nfs_pgio_header *hdr) { u32 idx = hdr->pgio_mirror_idx + 1; u32 new_idx = 0; + struct nfs4_pnfs_ds *ds;
- if (ff_layout_choose_any_ds_for_read(hdr->lseg, idx, &new_idx)) - ff_layout_send_layouterror(hdr->lseg); - else + ds = ff_layout_choose_any_ds_for_read(hdr->lseg, idx, &new_idx); + if (IS_ERR(ds)) pnfs_error_mark_layout_for_return(hdr->inode, hdr->lseg); + else + ff_layout_send_layouterror(hdr->lseg); pnfs_read_resend_pnfs(hdr, new_idx); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Justin Worrell jworrell@gmail.com
[ Upstream commit 9559d2fffd4f9b892165eed48198a0e5cb8504e6 ]
xs_sock_recv_cmsg was failing to call xs_sock_process_cmsg for any cmsg type other than TLS_RECORD_TYPE_ALERT (TLS_RECORD_TYPE_DATA and other values were not handled). Based on my reading of the previous commit (cc5d5908: sunrpc: fix client side handling of tls alerts), it looks like only iov_iter_revert should be conditional on TLS_RECORD_TYPE_ALERT, while other cmsg types should still call xs_sock_process_cmsg. On my machine, I was unable to connect (over mtls) to an NFS share hosted on FreeBSD. With this patch applied, I am able to mount the share again.
Fixes: cc5d59081fa2 ("sunrpc: fix client side handling of tls alerts") Signed-off-by: Justin Worrell jworrell@gmail.com Reviewed-and-tested-by: Scott Mayhew smayhew@redhat.com Link: https://lore.kernel.org/r/20250904211038.12874-3-jworrell@gmail.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/xprtsock.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 92cec227215ae..b78f1aae9e806 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -407,9 +407,9 @@ xs_sock_recv_cmsg(struct socket *sock, unsigned int *msg_flags, int flags) iov_iter_kvec(&msg.msg_iter, ITER_DEST, &alert_kvec, 1, alert_kvec.iov_len); ret = sock_recvmsg(sock, &msg, flags); - if (ret > 0 && - tls_get_record_type(sock->sk, &u.cmsg) == TLS_RECORD_TYPE_ALERT) { - iov_iter_revert(&msg.msg_iter, ret); + if (ret > 0) { + if (tls_get_record_type(sock->sk, &u.cmsg) == TLS_RECORD_TYPE_ALERT) + iov_iter_revert(&msg.msg_iter, ret); ret = xs_sock_process_cmsg(sock, &msg, msg_flags, &u.cmsg, -EAGAIN); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 31f1a960ad1a14def94fa0b8c25d62b4c032813f ]
Don't clear the capabilities that are not going to get reset by the call to _nfs4_server_capabilities().
Reported-by: Scott Haiden scott.b.haiden@gmail.com Fixes: b01f21cacde9 ("NFS: Fix the setting of capabilities when automounting a new filesystem") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4proc.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index e6b7cbc06c9c8..3ac8ecad2e53a 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -4064,7 +4064,6 @@ int nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *fhandle) }; int err;
- nfs_server_set_init_caps(server); do { err = nfs4_handle_exception(server, _nfs4_server_capabilities(server, fhandle),
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guenter Roeck linux@roeck-us.net
[ Upstream commit ab1396af7595e7d49a3850481b24d7fe7cbdfd31 ]
Commit edede7a6dcd7 ("trace/fgraph: Fix the warning caused by missing unregister notifier") added a call to unregister the PM notifier if register_ftrace_graph() failed. It does so unconditionally. However, the PM notifier is only registered with the first call to register_ftrace_graph(). If the first registration was successful and a subsequent registration failed, the notifier is now unregistered even if ftrace graphs are still registered.
Fix the problem by only unregistering the PM notifier during error handling if there are no active fgraph registrations.
Fixes: edede7a6dcd7 ("trace/fgraph: Fix the warning caused by missing unregister notifier") Closes: https://lore.kernel.org/all/63b0ba5a-a928-438e-84f9-93028dd72e54@roeck-us.ne... Cc: Ye Weihua yeweihua4@huawei.com Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Link: https://lore.kernel.org/20250906050618.2634078-1-linux@roeck-us.net Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/fgraph.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c index 2eed8bc672f91..988a4c4ba97bc 100644 --- a/kernel/trace/fgraph.c +++ b/kernel/trace/fgraph.c @@ -1316,7 +1316,8 @@ int register_ftrace_graph(struct fgraph_ops *gops) ftrace_graph_active--; gops->saved_func = NULL; fgraph_lru_release_index(i); - unregister_pm_notifier(&ftrace_suspend_notifier); + if (!ftrace_graph_active) + unregister_pm_notifier(&ftrace_suspend_notifier); } return ret; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit dd5a8621b886b02f8341c5d4ea68eb2c552ebd3e ]
_nfs4_server_capabilities() is expected to clear any flags that are not supported by the server.
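The pattern behind this and the neighboring capability fixes, as a hedged sketch (the mask and helper names are illustrative, not from the patch):

    /* Re-probing must first clear every capability bit it owns... */
    server->caps &= ~PROBED_CAPS_MASK;
    /* ...then set back only what the server actually advertised: */
    if (server_supports(&res, NFS_CAP_FS_LOCATIONS))
            server->caps |= NFS_CAP_FS_LOCATIONS;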
Fixes: 8a59bb93b7e3 ("NFSv4 store server support for fs_location attribute") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4proc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 3ac8ecad2e53a..7a1a6c68d7324 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -3989,8 +3989,9 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f res.attr_bitmask[2]; } memcpy(server->attr_bitmask, res.attr_bitmask, sizeof(server->attr_bitmask)); - server->caps &= ~(NFS_CAP_ACLS | NFS_CAP_HARDLINKS | - NFS_CAP_SYMLINKS| NFS_CAP_SECURITY_LABEL); + server->caps &= + ~(NFS_CAP_ACLS | NFS_CAP_HARDLINKS | NFS_CAP_SYMLINKS | + NFS_CAP_SECURITY_LABEL | NFS_CAP_FS_LOCATIONS); server->fattr_valid = NFS_ATTR_FATTR_V4; if (res.attr_bitmask[0] & FATTR4_WORD0_ACL && res.acl_bitmask & ACL4_SUPPORT_ALLOW_ACL)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit b3ac33436030bce37ecb3dcae581ecfaad28078c ]
_nfs4_server_capabilities() should clear capabilities that are not supported by the server.
Fixes: d2a00cceb93a ("NFSv4: Detect support for OPEN4_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4proc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 7a1a6c68d7324..ea92483d5e71e 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -3991,7 +3991,8 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f memcpy(server->attr_bitmask, res.attr_bitmask, sizeof(server->attr_bitmask)); server->caps &= ~(NFS_CAP_ACLS | NFS_CAP_HARDLINKS | NFS_CAP_SYMLINKS | - NFS_CAP_SECURITY_LABEL | NFS_CAP_FS_LOCATIONS); + NFS_CAP_SECURITY_LABEL | NFS_CAP_FS_LOCATIONS | + NFS_CAP_OPEN_XOR | NFS_CAP_DELEGTIME); server->fattr_valid = NFS_ATTR_FATTR_V4; if (res.attr_bitmask[0] & FATTR4_WORD0_ACL && res.acl_bitmask & ACL4_SUPPORT_ALLOW_ACL)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 4fb2b677fc1f70ee642c0beecc3cabf226ef5707 ]
nfs_server_set_fsinfo() shouldn't assume that NFS_CAP_XATTR is unset on entry to the function.
Fixes: b78ef845c35d ("NFSv4.2: query the server for extended attribute support") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/client.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c index 17edc124d03f2..035474f3fb8f3 100644 --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -881,6 +881,8 @@ static void nfs_server_set_fsinfo(struct nfs_server *server,
if (fsinfo->xattr_support) server->caps |= NFS_CAP_XATTR; + else + server->caps &= ~NFS_CAP_XATTR; #endif }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luo Gengkun luogengkun@huaweicloud.com
[ Upstream commit 3d62ab32df065e4a7797204a918f6489ddb8a237 ]
Both tracing_mark_write and tracing_mark_raw_write call __copy_from_user_inatomic with preemption disabled. But in some cases, __copy_from_user_inatomic may trigger a page fault and end up calling schedule(). If the task is then migrated to another CPU, the following warning is triggered: if (RB_WARN_ON(cpu_buffer, !local_read(&cpu_buffer->committing)))
An example can illustrate this issue:
process flow                                    CPU
---------------------------------------------------------------------
tracing_mark_raw_write():                       cpu:0
   ...
   ring_buffer_lock_reserve():                  cpu:0
      ...
      cpu = raw_smp_processor_id()              cpu:0
      cpu_buffer = buffer->buffers[cpu]         cpu:0
      ...
   ...
   __copy_from_user_inatomic():                 cpu:0
      ...
      # page fault
      do_mem_abort():                           cpu:0
         ...
         # Call schedule
         schedule()                             cpu:0
         ...
   # the task is migrated to cpu1
   __buffer_unlock_commit():                    cpu:1
      ...
      ring_buffer_unlock_commit():              cpu:1
         ...
         cpu = raw_smp_processor_id()           cpu:1
         cpu_buffer = buffer->buffers[cpu]      cpu:1
As shown above, the process reads the CPU id twice and the two return values differ.
Fix this problem by using copy_from_user_nofault() instead of __copy_from_user_inatomic(), as the former performs an access_ok() check and disables page faults before copying.
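The shape of copy_from_user_nofault(), paraphrased as a sketch (not the verbatim kernel source), shows why it can never schedule:

    long copy_from_user_nofault(void *dst, const void __user *src, size_t size)
    {
            long ret = -EFAULT;

            if (!access_ok(src, size))      /* the check the old call lacked */
                    return ret;

            pagefault_disable();            /* faults now fail fast... */
            ret = __copy_from_user_inatomic(dst, src, size);
            pagefault_enable();             /* ...so schedule() is never reached */

            return ret ? -EFAULT : 0;
    }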
Link: https://lore.kernel.org/20250819105152.2766363-1-luogengkun@huaweicloud.com Fixes: 656c7f0d2d2b ("tracing: Replace kmap with copy_from_user() in trace_marker writing") Signed-off-by: Luo Gengkun luogengkun@huaweicloud.com Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index ba3358eef34ba..87a43c0f90764 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -6949,7 +6949,7 @@ tracing_mark_write(struct file *filp, const char __user *ubuf, entry = ring_buffer_event_data(event); entry->ip = _THIS_IP_;
- len = __copy_from_user_inatomic(&entry->buf, ubuf, cnt); + len = copy_from_user_nofault(&entry->buf, ubuf, cnt); if (len) { memcpy(&entry->buf, FAULTED_STR, FAULTED_SIZE); cnt = FAULTED_SIZE; @@ -7020,7 +7020,7 @@ tracing_mark_raw_write(struct file *filp, const char __user *ubuf,
entry = ring_buffer_event_data(event);
- len = __copy_from_user_inatomic(&entry->id, ubuf, cnt); + len = copy_from_user_nofault(&entry->id, ubuf, cnt); if (len) { entry->id = -1; memcpy(&entry->buf, FAULTED_STR, FAULTED_SIZE);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mike Snitzer snitzer@kernel.org
[ Upstream commit 0978e5b85fc0867f53c5f4e5b7d2a5536a623e16 ]
Push the read_iter and write_iter availability checks down to nfs_do_local_read and nfs_do_local_write respectively.
This eliminates a redundant nfs_to->nfsd_file_file() call.
Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Stable-dep-of: 992203a1fba5 ("nfs/localio: restore creds before releasing pageio data") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/localio.c | 32 +++++++++++++++++++------------- 1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c index 21b2b38fae9f3..ab305dfc71269 100644 --- a/fs/nfs/localio.c +++ b/fs/nfs/localio.c @@ -274,7 +274,7 @@ nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
static struct nfs_local_kiocb * nfs_local_iocb_alloc(struct nfs_pgio_header *hdr, - struct nfsd_file *localio, gfp_t flags) + struct file *file, gfp_t flags) { struct nfs_local_kiocb *iocb;
@@ -287,9 +287,8 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr, kfree(iocb); return NULL; } - init_sync_kiocb(&iocb->kiocb, nfs_to->nfsd_file_file(localio)); + init_sync_kiocb(&iocb->kiocb, file); iocb->kiocb.ki_pos = hdr->args.offset; - iocb->localio = localio; iocb->hdr = hdr; iocb->kiocb.ki_flags &= ~IOCB_APPEND; return iocb; @@ -396,13 +395,19 @@ nfs_do_local_read(struct nfs_pgio_header *hdr, const struct rpc_call_ops *call_ops) { struct nfs_local_kiocb *iocb; + struct file *file = nfs_to->nfsd_file_file(localio); + + /* Don't support filesystems without read_iter */ + if (!file->f_op->read_iter) + return -EAGAIN;
dprintk("%s: vfs_read count=%u pos=%llu\n", __func__, hdr->args.count, hdr->args.offset);
- iocb = nfs_local_iocb_alloc(hdr, localio, GFP_KERNEL); + iocb = nfs_local_iocb_alloc(hdr, file, GFP_KERNEL); if (iocb == NULL) return -ENOMEM; + iocb->localio = localio;
nfs_local_pgio_init(hdr, call_ops); hdr->res.eof = false; @@ -570,14 +575,20 @@ nfs_do_local_write(struct nfs_pgio_header *hdr, const struct rpc_call_ops *call_ops) { struct nfs_local_kiocb *iocb; + struct file *file = nfs_to->nfsd_file_file(localio); + + /* Don't support filesystems without write_iter */ + if (!file->f_op->write_iter) + return -EAGAIN;
dprintk("%s: vfs_write count=%u pos=%llu %s\n", __func__, hdr->args.count, hdr->args.offset, (hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");
- iocb = nfs_local_iocb_alloc(hdr, localio, GFP_NOIO); + iocb = nfs_local_iocb_alloc(hdr, file, GFP_NOIO); if (iocb == NULL) return -ENOMEM; + iocb->localio = localio;
switch (hdr->args.stable) { default: @@ -603,16 +614,9 @@ int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio, const struct rpc_call_ops *call_ops) { int status = 0; - struct file *filp = nfs_to->nfsd_file_file(localio);
if (!hdr->args.count) return 0; - /* Don't support filesystems without read_iter/write_iter */ - if (!filp->f_op->read_iter || !filp->f_op->write_iter) { - nfs_local_disable(clp); - status = -EAGAIN; - goto out; - }
switch (hdr->rw_mode) { case FMODE_READ: @@ -626,8 +630,10 @@ int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio, hdr->rw_mode); status = -EINVAL; } -out: + if (status != 0) { + if (status == -EAGAIN) + nfs_local_disable(clp); nfs_to_nfsd_file_put_local(localio); hdr->task.tk_status = status; nfs_local_hdr_release(hdr, call_ops);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mike Snitzer snitzer@kernel.org
[ Upstream commit 3feec68563dda59517f83d19123aa287a1dfd068 ]
This commit simply adds the required O_DIRECT plumbing. It doesn't address the fact that NFS doesn't ensure all writes are page aligned (nor device logical block size aligned as required by O_DIRECT).
Because NFS will read-modify-write for IO that isn't aligned, LOCALIO will not use O_DIRECT semantics by default if/when an application requests the use of O_DIRECT. Allow the use of O_DIRECT semantics by:

1: Adding a flag to the nfs_pgio_header struct to allow the NFS O_DIRECT layer to signal that O_DIRECT was used by the application

2: Adding a 'localio_O_DIRECT_semantics' NFS module parameter that when enabled will cause LOCALIO to use O_DIRECT semantics (this may cause IO to fail if applications do not properly align their IO).
This commit is derived from code developed by Weston Andros Adamson.
Signed-off-by: Mike Snitzer snitzer@kernel.org Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Anna Schumaker anna.schumaker@oracle.com Stable-dep-of: 992203a1fba5 ("nfs/localio: restore creds before releasing pageio data") Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/filesystems/nfs/localio.rst | 13 ++++ fs/nfs/direct.c | 1 + fs/nfs/localio.c | 93 ++++++++++++++++++++--- include/linux/nfs_xdr.h | 1 + 4 files changed, 98 insertions(+), 10 deletions(-)
diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst index bd1967e2eab32..20fc901a08f4d 100644 --- a/Documentation/filesystems/nfs/localio.rst +++ b/Documentation/filesystems/nfs/localio.rst @@ -306,6 +306,19 @@ is issuing IO to the underlying local filesystem that it is sharing with the NFS server. See: fs/nfs/localio.c:nfs_local_doio() and fs/nfs/localio.c:nfs_local_commit().
+With normal NFS that makes use of RPC to issue IO to the server, if an +application uses O_DIRECT the NFS client will bypass the pagecache but +the NFS server will not. Because the NFS server's use of buffered IO +affords applications to be less precise with their alignment when +issuing IO to the NFS client. LOCALIO can be configured to use O_DIRECT +semantics by setting the 'localio_O_DIRECT_semantics' nfs module +parameter to Y, e.g.: + + echo Y > /sys/module/nfs/parameters/localio_O_DIRECT_semantics + +Once enabled, it will cause LOCALIO to use O_DIRECT semantics (this may +cause IO to fail if applications do not properly align their IO). + Security ========
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index c1f1b826888c9..f159cfc125adc 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -320,6 +320,7 @@ static void nfs_read_sync_pgio_error(struct list_head *head, int error) static void nfs_direct_pgio_init(struct nfs_pgio_header *hdr) { get_dreq(hdr->dreq); + set_bit(NFS_IOHDR_ODIRECT, &hdr->flags); }
static const struct nfs_pgio_completion_ops nfs_direct_read_completion_ops = { diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c index ab305dfc71269..8fb145124e93b 100644 --- a/fs/nfs/localio.c +++ b/fs/nfs/localio.c @@ -35,6 +35,7 @@ struct nfs_local_kiocb { struct bio_vec *bvec; struct nfs_pgio_header *hdr; struct work_struct work; + void (*aio_complete_work)(struct work_struct *); struct nfsd_file *localio; };
@@ -50,6 +51,11 @@ static void nfs_local_fsync_work(struct work_struct *work); static bool localio_enabled __read_mostly = true; module_param(localio_enabled, bool, 0644);
+static bool localio_O_DIRECT_semantics __read_mostly = false; +module_param(localio_O_DIRECT_semantics, bool, 0644); +MODULE_PARM_DESC(localio_O_DIRECT_semantics, + "LOCALIO will use O_DIRECT semantics to filesystem."); + static inline bool nfs_client_is_local(const struct nfs_client *clp) { return !!test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags); @@ -287,10 +293,19 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr, kfree(iocb); return NULL; } - init_sync_kiocb(&iocb->kiocb, file); + + if (localio_O_DIRECT_semantics && + test_bit(NFS_IOHDR_ODIRECT, &hdr->flags)) { + iocb->kiocb.ki_filp = file; + iocb->kiocb.ki_flags = IOCB_DIRECT; + } else + init_sync_kiocb(&iocb->kiocb, file); + iocb->kiocb.ki_pos = hdr->args.offset; iocb->hdr = hdr; iocb->kiocb.ki_flags &= ~IOCB_APPEND; + iocb->aio_complete_work = NULL; + return iocb; }
@@ -345,6 +360,18 @@ nfs_local_pgio_release(struct nfs_local_kiocb *iocb) nfs_local_hdr_release(hdr, hdr->task.tk_ops); }
+/* + * Complete the I/O from iocb->kiocb.ki_complete() + * + * Note that this function can be called from a bottom half context, + * hence we need to queue the rpc_call_done() etc to a workqueue + */ +static inline void nfs_local_pgio_aio_complete(struct nfs_local_kiocb *iocb) +{ + INIT_WORK(&iocb->work, iocb->aio_complete_work); + queue_work(nfsiod_workqueue, &iocb->work); +} + static void nfs_local_read_done(struct nfs_local_kiocb *iocb, long status) { @@ -367,6 +394,23 @@ nfs_local_read_done(struct nfs_local_kiocb *iocb, long status) status > 0 ? status : 0, hdr->res.eof); }
+static void nfs_local_read_aio_complete_work(struct work_struct *work) +{ + struct nfs_local_kiocb *iocb = + container_of(work, struct nfs_local_kiocb, work); + + nfs_local_pgio_release(iocb); +} + +static void nfs_local_read_aio_complete(struct kiocb *kiocb, long ret) +{ + struct nfs_local_kiocb *iocb = + container_of(kiocb, struct nfs_local_kiocb, kiocb); + + nfs_local_read_done(iocb, ret); + nfs_local_pgio_aio_complete(iocb); /* Calls nfs_local_read_aio_complete_work */ +} + static void nfs_local_call_read(struct work_struct *work) { struct nfs_local_kiocb *iocb = @@ -381,10 +425,10 @@ static void nfs_local_call_read(struct work_struct *work) nfs_local_iter_init(&iter, iocb, READ);
status = filp->f_op->read_iter(&iocb->kiocb, &iter); - WARN_ON_ONCE(status == -EIOCBQUEUED); - - nfs_local_read_done(iocb, status); - nfs_local_pgio_release(iocb); + if (status != -EIOCBQUEUED) { + nfs_local_read_done(iocb, status); + nfs_local_pgio_release(iocb); + }
revert_creds(save_cred); } @@ -412,6 +456,11 @@ nfs_do_local_read(struct nfs_pgio_header *hdr, nfs_local_pgio_init(hdr, call_ops); hdr->res.eof = false;
+ if (iocb->kiocb.ki_flags & IOCB_DIRECT) { + iocb->kiocb.ki_complete = nfs_local_read_aio_complete; + iocb->aio_complete_work = nfs_local_read_aio_complete_work; + } + INIT_WORK(&iocb->work, nfs_local_call_read); queue_work(nfslocaliod_workqueue, &iocb->work);
@@ -541,6 +590,24 @@ nfs_local_write_done(struct nfs_local_kiocb *iocb, long status) nfs_local_pgio_done(hdr, status); }
+static void nfs_local_write_aio_complete_work(struct work_struct *work) +{ + struct nfs_local_kiocb *iocb = + container_of(work, struct nfs_local_kiocb, work); + + nfs_local_vfs_getattr(iocb); + nfs_local_pgio_release(iocb); +} + +static void nfs_local_write_aio_complete(struct kiocb *kiocb, long ret) +{ + struct nfs_local_kiocb *iocb = + container_of(kiocb, struct nfs_local_kiocb, kiocb); + + nfs_local_write_done(iocb, ret); + nfs_local_pgio_aio_complete(iocb); /* Calls nfs_local_write_aio_complete_work */ +} + static void nfs_local_call_write(struct work_struct *work) { struct nfs_local_kiocb *iocb = @@ -559,11 +626,11 @@ static void nfs_local_call_write(struct work_struct *work) file_start_write(filp); status = filp->f_op->write_iter(&iocb->kiocb, &iter); file_end_write(filp); - WARN_ON_ONCE(status == -EIOCBQUEUED); - - nfs_local_write_done(iocb, status); - nfs_local_vfs_getattr(iocb); - nfs_local_pgio_release(iocb); + if (status != -EIOCBQUEUED) { + nfs_local_write_done(iocb, status); + nfs_local_vfs_getattr(iocb); + nfs_local_pgio_release(iocb); + }
revert_creds(save_cred); current->flags = old_flags; @@ -599,10 +666,16 @@ nfs_do_local_write(struct nfs_pgio_header *hdr, case NFS_FILE_SYNC: iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC; } + nfs_local_pgio_init(hdr, call_ops);
nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
+ if (iocb->kiocb.ki_flags & IOCB_DIRECT) { + iocb->kiocb.ki_complete = nfs_local_write_aio_complete; + iocb->aio_complete_work = nfs_local_write_aio_complete_work; + } + INIT_WORK(&iocb->work, nfs_local_call_write); queue_work(nfslocaliod_workqueue, &iocb->work);
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index 12d8e47bc5a38..b48d94f099657 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -1637,6 +1637,7 @@ enum { NFS_IOHDR_RESEND_PNFS, NFS_IOHDR_RESEND_MDS, NFS_IOHDR_UNSTABLE_WRITES, + NFS_IOHDR_ODIRECT, };
struct nfs_io_completion;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Scott Mayhew smayhew@redhat.com
[ Upstream commit 992203a1fba51b025c60ec0c8b0d9223343dea95 ]
Otherwise, if the nfsd filecache code releases the nfsd_file immediately, it can trigger the BUG_ON(cred == current->cred) in __put_cred() when it puts the nfsd_file->nf_file->f_cred.
Fixes: b9f5dd57f4a5 ("nfs/localio: use dedicated workqueues for filesystem read and write") Signed-off-by: Scott Mayhew smayhew@redhat.com Reviewed-by: Mike Snitzer snitzer@kernel.org Link: https://lore.kernel.org/r/20250807164938.2395136-1-smayhew@redhat.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/localio.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c index 8fb145124e93b..82a053304ad59 100644 --- a/fs/nfs/localio.c +++ b/fs/nfs/localio.c @@ -425,12 +425,13 @@ static void nfs_local_call_read(struct work_struct *work) nfs_local_iter_init(&iter, iocb, READ);
status = filp->f_op->read_iter(&iocb->kiocb, &iter); + + revert_creds(save_cred); + if (status != -EIOCBQUEUED) { nfs_local_read_done(iocb, status); nfs_local_pgio_release(iocb); } - - revert_creds(save_cred); }
static int @@ -626,14 +627,15 @@ static void nfs_local_call_write(struct work_struct *work) file_start_write(filp); status = filp->f_op->write_iter(&iocb->kiocb, &iter); file_end_write(filp); + + revert_creds(save_cred); + current->flags = old_flags; + if (status != -EIOCBQUEUED) { nfs_local_write_done(iocb, status); nfs_local_vfs_getattr(iocb); nfs_local_pgio_release(iocb); } - - revert_creds(save_cred); - current->flags = old_flags; }
static int
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Riabchun ferr.lambarginio@gmail.com
[ Upstream commit 80d03a40837a9b26750a25122b906c052cc846c9 ]
In the my_tramp1 function, the .size directive was placed above the ASM_RET instruction, leading to a wrong function size.
Link: https://lore.kernel.org/aK3d7vxNcO52kEmg@vova-pc Fixes: 9d907f1ae80b ("samples/ftrace: Fix asm function ELF annotations") Signed-off-by: Vladimir Riabchun ferr.lambarginio@gmail.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- samples/ftrace/ftrace-direct-modify.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c index 81220390851a3..328c6e60f024b 100644 --- a/samples/ftrace/ftrace-direct-modify.c +++ b/samples/ftrace/ftrace-direct-modify.c @@ -75,8 +75,8 @@ asm ( CALL_DEPTH_ACCOUNT " call my_direct_func1\n" " leave\n" -" .size my_tramp1, .-my_tramp1\n" ASM_RET +" .size my_tramp1, .-my_tramp1\n"
" .type my_tramp2, @function\n" " .globl my_tramp2\n"
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Max Kellermann max.kellermann@ionos.com
[ Upstream commit 38a125b31504f91bf6fdd3cfc3a3e9a721e6c97a ]
This allows killing processes that wait for a lock when one process is stuck waiting for the NFS server. This aims to complete the coverage of NFS operations being killable, like nfs_direct_wait() does, for example.
Signed-off-by: Max Kellermann max.kellermann@ionos.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Stable-dep-of: 9eb90f435415 ("NFS: Serialise O_DIRECT i/o and truncate()") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/direct.c | 21 ++++++++++++++++++--- fs/nfs/file.c | 14 +++++++++++--- fs/nfs/internal.h | 7 ++++--- fs/nfs/io.c | 44 +++++++++++++++++++++++++++++++++----------- 4 files changed, 66 insertions(+), 20 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index f159cfc125adc..f32f8d7c9122b 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -472,8 +472,16 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter, if (user_backed_iter(iter)) dreq->flags = NFS_ODIRECT_SHOULD_DIRTY;
- if (!swap) - nfs_start_io_direct(inode); + if (!swap) { + result = nfs_start_io_direct(inode); + if (result) { + /* release the reference that would usually be + * consumed by nfs_direct_read_schedule_iovec() + */ + nfs_direct_req_release(dreq); + goto out_release; + } + }
NFS_I(inode)->read_io += count; requested = nfs_direct_read_schedule_iovec(dreq, iter, iocb->ki_pos); @@ -1031,7 +1039,14 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, requested = nfs_direct_write_schedule_iovec(dreq, iter, pos, FLUSH_STABLE); } else { - nfs_start_io_direct(inode); + result = nfs_start_io_direct(inode); + if (result) { + /* release the reference that would usually be + * consumed by nfs_direct_write_schedule_iovec() + */ + nfs_direct_req_release(dreq); + goto out_release; + }
requested = nfs_direct_write_schedule_iovec(dreq, iter, pos, FLUSH_COND_STABLE); diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 153d25d4b810c..033feeab8c346 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -167,7 +167,10 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to) iocb->ki_filp, iov_iter_count(to), (unsigned long) iocb->ki_pos);
- nfs_start_io_read(inode); + result = nfs_start_io_read(inode); + if (result) + return result; + result = nfs_revalidate_mapping(inode, iocb->ki_filp->f_mapping); if (!result) { result = generic_file_read_iter(iocb, to); @@ -188,7 +191,10 @@ nfs_file_splice_read(struct file *in, loff_t *ppos, struct pipe_inode_info *pipe
dprintk("NFS: splice_read(%pD2, %zu@%llu)\n", in, len, *ppos);
- nfs_start_io_read(inode); + result = nfs_start_io_read(inode); + if (result) + return result; + result = nfs_revalidate_mapping(inode, in->f_mapping); if (!result) { result = filemap_splice_read(in, ppos, pipe, len, flags); @@ -669,7 +675,9 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) nfs_clear_invalid_mapping(file->f_mapping);
since = filemap_sample_wb_err(file->f_mapping); - nfs_start_io_write(inode); + error = nfs_start_io_write(inode); + if (error) + return error; result = generic_write_checks(iocb, from); if (result > 0) result = generic_perform_write(iocb, from); diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 882d804089add..4860270f1be04 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -6,6 +6,7 @@ #include "nfs4_fs.h" #include <linux/fs_context.h> #include <linux/security.h> +#include <linux/compiler_attributes.h> #include <linux/crc32.h> #include <linux/sunrpc/addr.h> #include <linux/nfs_page.h> @@ -516,11 +517,11 @@ extern const struct netfs_request_ops nfs_netfs_ops; #endif
/* io.c */ -extern void nfs_start_io_read(struct inode *inode); +extern __must_check int nfs_start_io_read(struct inode *inode); extern void nfs_end_io_read(struct inode *inode); -extern void nfs_start_io_write(struct inode *inode); +extern __must_check int nfs_start_io_write(struct inode *inode); extern void nfs_end_io_write(struct inode *inode); -extern void nfs_start_io_direct(struct inode *inode); +extern __must_check int nfs_start_io_direct(struct inode *inode); extern void nfs_end_io_direct(struct inode *inode);
static inline bool nfs_file_io_is_buffered(struct nfs_inode *nfsi) diff --git a/fs/nfs/io.c b/fs/nfs/io.c index b5551ed8f648b..3388faf2acb9f 100644 --- a/fs/nfs/io.c +++ b/fs/nfs/io.c @@ -39,19 +39,28 @@ static void nfs_block_o_direct(struct nfs_inode *nfsi, struct inode *inode) * Note that buffered writes and truncates both take a write lock on * inode->i_rwsem, meaning that those are serialised w.r.t. the reads. */ -void +int nfs_start_io_read(struct inode *inode) { struct nfs_inode *nfsi = NFS_I(inode); + int err; + /* Be an optimist! */ - down_read(&inode->i_rwsem); + err = down_read_killable(&inode->i_rwsem); + if (err) + return err; if (test_bit(NFS_INO_ODIRECT, &nfsi->flags) == 0) - return; + return 0; up_read(&inode->i_rwsem); + /* Slow path.... */ - down_write(&inode->i_rwsem); + err = down_write_killable(&inode->i_rwsem); + if (err) + return err; nfs_block_o_direct(nfsi, inode); downgrade_write(&inode->i_rwsem); + + return 0; }
/** @@ -74,11 +83,15 @@ nfs_end_io_read(struct inode *inode) * Declare that a buffered read operation is about to start, and ensure * that we block all direct I/O. */ -void +int nfs_start_io_write(struct inode *inode) { - down_write(&inode->i_rwsem); - nfs_block_o_direct(NFS_I(inode), inode); + int err; + + err = down_write_killable(&inode->i_rwsem); + if (!err) + nfs_block_o_direct(NFS_I(inode), inode); + return err; }
/** @@ -119,19 +132,28 @@ static void nfs_block_buffered(struct nfs_inode *nfsi, struct inode *inode) * Note that buffered writes and truncates both take a write lock on * inode->i_rwsem, meaning that those are serialised w.r.t. O_DIRECT. */ -void +int nfs_start_io_direct(struct inode *inode) { struct nfs_inode *nfsi = NFS_I(inode); + int err; + /* Be an optimist! */ - down_read(&inode->i_rwsem); + err = down_read_killable(&inode->i_rwsem); + if (err) + return err; if (test_bit(NFS_INO_ODIRECT, &nfsi->flags) != 0) - return; + return 0; up_read(&inode->i_rwsem); + /* Slow path.... */ - down_write(&inode->i_rwsem); + err = down_write_killable(&inode->i_rwsem); + if (err) + return err; nfs_block_buffered(nfsi, inode); downgrade_write(&inode->i_rwsem); + + return 0; }
/**
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 9eb90f435415c7da4800974ed943e39b5578ee7f ]
Ensure that all O_DIRECT reads and writes are complete, and prevent the initiation of new i/o until the setattr operation that will truncate the file is complete.
Fixes: a5864c999de6 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/inode.c | 4 +++- fs/nfs/internal.h | 10 ++++++++++ fs/nfs/io.c | 13 ++----------- 3 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 8827cb00f86d5..5bf5fb5ddd34c 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -761,8 +761,10 @@ nfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry, trace_nfs_setattr_enter(inode);
/* Write all dirty data */ - if (S_ISREG(inode->i_mode)) + if (S_ISREG(inode->i_mode)) { + nfs_file_block_o_direct(NFS_I(inode)); nfs_sync_inode(inode); + }
fattr = nfs_alloc_fattr_with_label(NFS_SERVER(inode)); if (fattr == NULL) { diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 4860270f1be04..456b423402814 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -529,6 +529,16 @@ static inline bool nfs_file_io_is_buffered(struct nfs_inode *nfsi) return test_bit(NFS_INO_ODIRECT, &nfsi->flags) == 0; }
+/* Must be called with exclusively locked inode->i_rwsem */ +static inline void nfs_file_block_o_direct(struct nfs_inode *nfsi) +{ + if (test_bit(NFS_INO_ODIRECT, &nfsi->flags)) { + clear_bit(NFS_INO_ODIRECT, &nfsi->flags); + inode_dio_wait(&nfsi->vfs_inode); + } +} + + /* namespace.c */ #define NFS_PATH_CANONICAL 1 extern char *nfs_path(char **p, struct dentry *dentry, diff --git a/fs/nfs/io.c b/fs/nfs/io.c index 3388faf2acb9f..d275b0a250bf3 100644 --- a/fs/nfs/io.c +++ b/fs/nfs/io.c @@ -14,15 +14,6 @@
#include "internal.h"
-/* Call with exclusively locked inode->i_rwsem */ -static void nfs_block_o_direct(struct nfs_inode *nfsi, struct inode *inode) -{ - if (test_bit(NFS_INO_ODIRECT, &nfsi->flags)) { - clear_bit(NFS_INO_ODIRECT, &nfsi->flags); - inode_dio_wait(inode); - } -} - /** * nfs_start_io_read - declare the file is being used for buffered reads * @inode: file inode @@ -57,7 +48,7 @@ nfs_start_io_read(struct inode *inode) err = down_write_killable(&inode->i_rwsem); if (err) return err; - nfs_block_o_direct(nfsi, inode); + nfs_file_block_o_direct(nfsi); downgrade_write(&inode->i_rwsem);
return 0; @@ -90,7 +81,7 @@ nfs_start_io_write(struct inode *inode)
err = down_write_killable(&inode->i_rwsem); if (!err) - nfs_block_o_direct(NFS_I(inode), inode); + nfs_file_block_o_direct(NFS_I(inode)); return err; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit b93128f29733af5d427a335978a19884c2c230e2 ]
Ensure that all O_DIRECT reads and writes complete before calling fallocate so that we don't race w.r.t. attribute updates.
Fixes: 99f237832243 ("NFSv4.2: Always flush out writes in nfs42_proc_fallocate()") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs42proc.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index 9f0d69e652644..66fe885fc19a1 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -112,6 +112,7 @@ static int nfs42_proc_fallocate(struct rpc_message *msg, struct file *filep, exception.inode = inode; exception.state = lock->open_context->state;
+ nfs_file_block_o_direct(NFS_I(inode)); err = nfs_sync_inode(inode); if (err) goto out;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit c80ebeba1198eac8811ab0dba36ecc13d51e4438 ]
Ensure that all O_DIRECT reads and writes complete before cloning a file range, so that both the source and destination are up to date.
Fixes: a5864c999de6 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4file.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index 1cd9652f3c280..453d08a9c4b4d 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -283,9 +283,11 @@ static loff_t nfs42_remap_file_range(struct file *src_file, loff_t src_off,
/* flush all pending writes on both src and dst so that server * has the latest data */ + nfs_file_block_o_direct(NFS_I(src_inode)); ret = nfs_sync_inode(src_inode); if (ret) goto out_unlock; + nfs_file_block_o_direct(NFS_I(dst_inode)); ret = nfs_sync_inode(dst_inode); if (ret) goto out_unlock;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit ca247c89900ae90207f4d321e260cd93b7c7d104 ]
Ensure that all O_DIRECT reads and writes complete before copying a file range, so that the destination is up to date.
Fixes: a5864c999de6 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs42proc.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index 66fe885fc19a1..582cf8a469560 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -356,6 +356,7 @@ static ssize_t _nfs42_proc_copy(struct file *src, return status; }
+ nfs_file_block_o_direct(NFS_I(dst_inode)); status = nfs_sync_inode(dst_inode); if (status) return status;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit b7b8574225e9d2b5f1fb5483886ab797892f43b5 ]
If we're truncating part of the folio, then we need to write out the data on the part that is not covered by the cancellation.
Fixes: d47992f86b30 ("mm: change invalidatepage prototype to accept length") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/file.c | 7 ++++--- fs/nfs/write.c | 1 + 2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 033feeab8c346..a16a619fb8c33 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -437,10 +437,11 @@ static void nfs_invalidate_folio(struct folio *folio, size_t offset, dfprintk(PAGECACHE, "NFS: invalidate_folio(%lu, %zu, %zu)\n", folio->index, offset, length);
- if (offset != 0 || length < folio_size(folio)) - return; /* Cancel any unstarted writes on this page */ - nfs_wb_folio_cancel(inode, folio); + if (offset != 0 || length < folio_size(folio)) + nfs_wb_folio(inode, folio); + else + nfs_wb_folio_cancel(inode, folio); folio_wait_private_2(folio); /* [DEPRECATED] */ trace_nfs_invalidate_folio(inode, folio_pos(folio) + offset, length); } diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 2b6b3542405c3..fd86546fafd3f 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -2058,6 +2058,7 @@ int nfs_wb_folio_cancel(struct inode *inode, struct folio *folio) * release it */ nfs_inode_remove_request(req); nfs_unlock_and_release_request(req); + folio_cancel_dirty(folio); }
return ret;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonathan Curley jcurley@purestorage.com
[ Upstream commit dd2fa82473453661d12723c46c9f43d9876a7efd ]
A typo in ff_lseg_match_mirrors makes the comparison ineffective: fl2 is taken from l1, so the function compares a segment's mirrors against itself and the check always succeeds. This results in merging happening all the time. Merging happening all the time is problematic because it marks lsegs invalid. Marking lsegs invalid causes all outstanding IO to get restarted with EAGAIN and connections to get closed.
Closing connections constantly triggers race conditions in the RDMA implementation...
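The bug class is easy to demonstrate standalone (a hedged sketch; the struct and names below are invented stand-ins, not the real pNFS layout types): comparing an object against itself makes every match trivially succeed.

    #include <stdbool.h>
    #include <stdio.h>

    struct seg { int mirror_cnt; };

    static bool match(const struct seg *s1, const struct seg *s2)
    {
            const struct seg *a = s1;
            const struct seg *b = s1; /* typo: should be s2 */

            return a->mirror_cnt == b->mirror_cnt; /* always true */
    }

    int main(void)
    {
            struct seg x = { 2 }, y = { 4 };

            printf("%d\n", match(&x, &y)); /* prints 1 despite differing counts */
            return 0;
    }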
Fixes: 660d1eb22301c ("pNFS/flexfile: Don't merge layout segments if the mirrors don't match") Signed-off-by: Jonathan Curley jcurley@purestorage.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/flexfilelayout/flexfilelayout.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c index 69496aab9583e..6469846971966 100644 --- a/fs/nfs/flexfilelayout/flexfilelayout.c +++ b/fs/nfs/flexfilelayout/flexfilelayout.c @@ -292,7 +292,7 @@ ff_lseg_match_mirrors(struct pnfs_layout_segment *l1, struct pnfs_layout_segment *l2) { const struct nfs4_ff_layout_segment *fl1 = FF_LAYOUT_LSEG(l1); - const struct nfs4_ff_layout_segment *fl2 = FF_LAYOUT_LSEG(l1); + const struct nfs4_ff_layout_segment *fl2 = FF_LAYOUT_LSEG(l2); u32 i;
if (fl1->mirror_array_cnt != fl2->mirror_array_cnt)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pu Lehui pulehui@huawei.com
[ Upstream commit cd4453c5e983cf1fd5757e9acb915adb1e4602b6 ]
Syzkaller triggered a fault injection warning:
WARNING: CPU: 1 PID: 12326 at tracepoint_add_func+0xbfc/0xeb0 Modules linked in: CPU: 1 UID: 0 PID: 12326 Comm: syz.6.10325 Tainted: G U 6.14.0-rc5-syzkaller #0 Tainted: [U]=USER Hardware name: Google Compute Engine/Google Compute Engine RIP: 0010:tracepoint_add_func+0xbfc/0xeb0 kernel/tracepoint.c:294 Code: 09 fe ff 90 0f 0b 90 0f b6 74 24 43 31 ff 41 bc ea ff ff ff RSP: 0018:ffffc9000414fb48 EFLAGS: 00010283 RAX: 00000000000012a1 RBX: ffffffff8e240ae0 RCX: ffffc90014b78000 RDX: 0000000000080000 RSI: ffffffff81bbd78b RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffffffffef R13: 0000000000000000 R14: dffffc0000000000 R15: ffffffff81c264f0 FS: 00007f27217f66c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2e80dff8 CR3: 00000000268f8000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tracepoint_probe_register_prio+0xc0/0x110 kernel/tracepoint.c:464 register_trace_prio_sched_switch include/trace/events/sched.h:222 [inline] register_pid_events kernel/trace/trace_events.c:2354 [inline] event_pid_write.isra.0+0x439/0x7a0 kernel/trace/trace_events.c:2425 vfs_write+0x24c/0x1150 fs/read_write.c:677 ksys_write+0x12b/0x250 fs/read_write.c:731 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
We can reproduce the warning with the following steps:
1. echo 8 >> set_event_notrace_pid. Now tr->filtered_pids owns one pid and the sched_switch tracepoint is registered.
2. echo ' ' >> set_event_pid, and perform fault injection during the chunk allocation in trace_pid_list_alloc. This leaves pid_list with no pid, and it is assigned to tr->filtered_pids.
3. echo ' ' >> set_event_pid. Now pid_list is NULL and is assigned to tr->filtered_pids.
4. echo 9 >> set_event_pid, which triggers the warning about double-registering the sched_switch tracepoint.
The reason is that syzkaller injects a fault into the chunk allocation in trace_pid_list_alloc, causing a failure in trace_pid_list_set, which may trigger a double registration of the same tracepoint. This only occurs when the system is about to crash, but to suppress this warning, let's add failure handling logic to trace_pid_list_set.
Link: https://lore.kernel.org/20250908024658.2390398-1-pulehui@huaweicloud.com Fixes: 8d6e90983ade ("tracing: Create a sparse bitmask for pid filtering") Reported-by: syzbot+161412ccaeff20ce4dde@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67cb890e.050a0220.d8275.022e.GAE@google.com Signed-off-by: Pu Lehui pulehui@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 87a43c0f90764..91e6bf1b101a7 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -787,7 +787,10 @@ int trace_pid_write(struct trace_pid_list *filtered_pids, /* copy the current bits to the new max */ ret = trace_pid_list_first(filtered_pids, &pid); while (!ret) { - trace_pid_list_set(pid_list, pid); + ret = trace_pid_list_set(pid_list, pid); + if (ret < 0) + goto out; + ret = trace_pid_list_next(filtered_pids, pid + 1, &pid); nr_pids++; } @@ -824,6 +827,7 @@ int trace_pid_write(struct trace_pid_list *filtered_pids, trace_parser_clear(&parser); ret = 0; } + out: trace_parser_put(&parser);
if (ret < 0) {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Richter tmricht@linux.ibm.com
[ Upstream commit 85941afd2c404247e583c827fae0a45da1c1d92c ]
Each PAI PMU device driver returns -EINVAL when an event is out of its accepted range. This return value aborts the search for an alternative PMU device driver to handle this event. Change the return value to -ENOENT. This return value is used to try other PMUs instead. This makes the PMUs more robust when the sequence of PMU device driver initialization changes (at boot time) or when modules are used.
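As a rough standalone sketch of the convention this fix relies on (the handler table and thresholds are invented for illustration, not the actual perf core code): -ENOENT lets the search move on to the next PMU, while any other error ends it.

    #include <errno.h>
    #include <stdio.h>

    static int pmu_a_init(int ev) { return ev < 100 ? 0 : -ENOENT; }
    static int pmu_b_init(int ev) { return ev < 200 ? 0 : -EINVAL; }

    static int (*pmus[])(int) = { pmu_a_init, pmu_b_init };

    static int event_init(int ev)
    {
            for (unsigned int i = 0; i < sizeof(pmus) / sizeof(pmus[0]); i++) {
                    int err = pmus[i](ev);

                    if (err != -ENOENT)
                            return err; /* success or hard failure stops the search */
            }
            return -ENOENT; /* no PMU claimed the event */
    }

    int main(void)
    {
            printf("%d\n", event_init(150)); /* pmu_a declines, pmu_b accepts: 0 */
            printf("%d\n", event_init(250)); /* pmu_b rejects hard: -EINVAL (-22) */
            return 0;
    }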
Fixes: 39d62336f5c12 ("s390/pai: add support for cryptography counters") Acked-by: Sumanth Korikkar sumanthk@linux.ibm.com Signed-off-by: Thomas Richter tmricht@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kernel/perf_pai_crypto.c | 4 ++-- arch/s390/kernel/perf_pai_ext.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/s390/kernel/perf_pai_crypto.c b/arch/s390/kernel/perf_pai_crypto.c index 10725f5a6f0fd..11200880a96c1 100644 --- a/arch/s390/kernel/perf_pai_crypto.c +++ b/arch/s390/kernel/perf_pai_crypto.c @@ -286,10 +286,10 @@ static int paicrypt_event_init(struct perf_event *event) /* PAI crypto PMU registered as PERF_TYPE_RAW, check event type */ if (a->type != PERF_TYPE_RAW && event->pmu->type != a->type) return -ENOENT; - /* PAI crypto event must be in valid range */ + /* PAI crypto event must be in valid range, try others if not */ if (a->config < PAI_CRYPTO_BASE || a->config > PAI_CRYPTO_BASE + paicrypt_cnt) - return -EINVAL; + return -ENOENT; /* Allow only CRYPTO_ALL for sampling */ if (a->sample_period && a->config != PAI_CRYPTO_BASE) return -EINVAL; diff --git a/arch/s390/kernel/perf_pai_ext.c b/arch/s390/kernel/perf_pai_ext.c index a8f0bad99cf04..28398e313b58d 100644 --- a/arch/s390/kernel/perf_pai_ext.c +++ b/arch/s390/kernel/perf_pai_ext.c @@ -266,7 +266,7 @@ static int paiext_event_valid(struct perf_event *event) event->hw.config_base = offsetof(struct paiext_cb, acc); return 0; } - return -EINVAL; + return -ENOENT; }
/* Might be called on different CPU than the one the event is intended for. */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Richter tmricht@linux.ibm.com
[ Upstream commit ce971233242b5391d99442271f3ca096fb49818d ]
Deny all sampling events in the CPUMF counter facility device driver and return -ENOENT. This return value is used to try other PMUs. Up to now, events of type PERF_TYPE_HARDWARE were not tested for sampling and later returned -EOPNOTSUPP. This ends the search for alternative PMUs. Change that behavior and try other PMUs instead.
Fixes: 613a41b0d16e ("s390/cpum_cf: Reject request for sampling in event initialization") Acked-by: Sumanth Korikkar sumanthk@linux.ibm.com Signed-off-by: Thomas Richter tmricht@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kernel/perf_cpum_cf.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c index 6d6b057b562fd..b017db3344cb5 100644 --- a/arch/s390/kernel/perf_cpum_cf.c +++ b/arch/s390/kernel/perf_cpum_cf.c @@ -761,8 +761,6 @@ static int __hw_perf_event_init(struct perf_event *event, unsigned int type) break;
case PERF_TYPE_HARDWARE: - if (is_sampling_event(event)) /* No sampling support */ - return -ENOENT; ev = attr->config; if (!attr->exclude_user && attr->exclude_kernel) { /* @@ -860,6 +858,8 @@ static int cpumf_pmu_event_init(struct perf_event *event) unsigned int type = event->attr.type; int err = -ENOENT;
+ if (is_sampling_event(event)) /* No sampling support */ + return err; if (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_RAW) err = __hw_perf_event_init(event, type); else if (event->pmu->type == type)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
[ Upstream commit f9bb6ffa7f5ad0f8ee0f53fc4a10655872ee4a14 ]
Stanislav reported that in bpf_crypto_crypt() the destination dynptr's size is not validated to be at least as large as the source dynptr's size before calling into the crypto backend with 'len = src_len'. This can result in an OOB write when the destination is smaller than the source.
Concretely, in mentioned function, psrc and pdst are both linear buffers fetched from each dynptr:
psrc = __bpf_dynptr_data(src, src_len); [...] pdst = __bpf_dynptr_data_rw(dst, dst_len); [...] err = decrypt ? ctx->type->decrypt(ctx->tfm, psrc, pdst, src_len, piv) : ctx->type->encrypt(ctx->tfm, psrc, pdst, src_len, piv);
The crypto backend expects pdst to be large enough to hold the src_len bytes that will be written. Add an additional src_len > dst_len check and bail out if that is the case. Note that these kfuncs are accessible under root privileges only.
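A minimal userspace analogue of the added check (a hedged sketch; the memcpy is a stand-in for the cipher writing src_len bytes, not the actual crypto backend):

    #include <errno.h>
    #include <stddef.h>
    #include <string.h>

    static int crypt_copy(void *dst, size_t dst_len,
                          const void *src, size_t src_len)
    {
            /* mirrors the fix: src_len bytes are about to be written to dst */
            if (!src_len || !dst_len || src_len > dst_len)
                    return -EINVAL;

            memcpy(dst, src, src_len); /* stand-in for ctx->type->encrypt() */
            return 0;
    }

    int main(void)
    {
            char src[8] = "payload", dst[4];

            /* would have written 8 bytes into a 4-byte buffer: rejected */
            return crypt_copy(dst, sizeof(dst), src, sizeof(src)) == -EINVAL ? 0 : 1;
    }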
Fixes: 3e1c6f35409f ("bpf: make common crypto API for TC/XDP programs") Reported-by: Stanislav Fort disclosure@aisle.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Cc: Vadim Fedorenko vadim.fedorenko@linux.dev Reviewed-by: Vadim Fedorenko vadim.fedorenko@linux.dev Link: https://lore.kernel.org/r/20250829143657.318524-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/crypto.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/crypto.c b/kernel/bpf/crypto.c index 94854cd9c4cc3..83c4d9943084b 100644 --- a/kernel/bpf/crypto.c +++ b/kernel/bpf/crypto.c @@ -278,7 +278,7 @@ static int bpf_crypto_crypt(const struct bpf_crypto_ctx *ctx, siv_len = siv ? __bpf_dynptr_size(siv) : 0; src_len = __bpf_dynptr_size(src); dst_len = __bpf_dynptr_size(dst); - if (!src_len || !dst_len) + if (!src_len || !dst_len || src_len > dst_len) return -EINVAL;
if (siv_len != ctx->siv_len)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: KaFai Wan kafai.wan@linux.dev
[ Upstream commit df0cb5cb50bd54d3cd4d0d83417ceec6a66404aa ]
OpenWRT users reported a regression on ARMv6 devices after updating to the latest HEAD, where the tcpdump filter:
tcpdump "not ether host 3c37121a2b3c and not ether host 184ecbca2a3a \ and not ether host 14130b4d3f47 and not ether host f0f61cf440b7 \ and not ether host a84b4dedf471 and not ether host d022be17e1d7 \ and not ether host 5c497967208b and not ether host 706655784d5b"
fails with warning: "Kernel filter failed: No error information" when using config: # CONFIG_BPF_JIT_ALWAYS_ON is not set CONFIG_BPF_JIT_DEFAULT_ON=y
The issue arises because of two commits:
1. "bpf: Fix array bounds error with may_goto" changed the default runtime to __bpf_prog_ret0_warn when jit_requested = 1
2. "bpf: Avoid __bpf_prog_ret0_warn when jit fails" returns an error when jit_requested = 1 but the JIT fails
This change restores interpreter fallback capability for BPF programs with stack size <= 512 bytes when jit fails.
Reported-by: Felix Fietkau nbd@nbd.name Closes: https://lore.kernel.org/bpf/2e267b4b-0540-45d8-9310-e127bf95fc63@nbd.name/ Fixes: 6ebc5030e0c5 ("bpf: Fix array bounds error with may_goto") Signed-off-by: KaFai Wan kafai.wan@linux.dev Acked-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20250909144614.2991253-1-kafai.wan@linux.dev Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/core.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 6f91e3a123e55..9380e0fd5e4af 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2299,8 +2299,7 @@ static unsigned int __bpf_prog_ret0_warn(const void *ctx, const struct bpf_insn *insn) { /* If this handler ever gets executed, then BPF_JIT_ALWAYS_ON - * is not working properly, or interpreter is being used when - * prog->jit_requested is not 0, so warn about it! + * is not working properly, so warn about it! */ WARN_ON_ONCE(1); return 0; @@ -2401,8 +2400,9 @@ static int bpf_check_tail_call(const struct bpf_prog *fp) return ret; }
-static void bpf_prog_select_func(struct bpf_prog *fp) +static bool bpf_prog_select_interpreter(struct bpf_prog *fp) { + bool select_interpreter = false; #ifndef CONFIG_BPF_JIT_ALWAYS_ON u32 stack_depth = max_t(u32, fp->aux->stack_depth, 1); u32 idx = (round_up(stack_depth, 32) / 32) - 1; @@ -2411,15 +2411,16 @@ static void bpf_prog_select_func(struct bpf_prog *fp) * But for non-JITed programs, we don't need bpf_func, so no bounds * check needed. */ - if (!fp->jit_requested && - !WARN_ON_ONCE(idx >= ARRAY_SIZE(interpreters))) { + if (idx < ARRAY_SIZE(interpreters)) { fp->bpf_func = interpreters[idx]; + select_interpreter = true; } else { fp->bpf_func = __bpf_prog_ret0_warn; } #else fp->bpf_func = __bpf_prog_ret0_warn; #endif + return select_interpreter; }
/** @@ -2438,7 +2439,7 @@ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err) /* In case of BPF to BPF calls, verifier did all the prep * work with regards to JITing, etc. */ - bool jit_needed = fp->jit_requested; + bool jit_needed = false;
if (fp->bpf_func) goto finalize; @@ -2447,7 +2448,8 @@ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err) bpf_prog_has_kfunc_call(fp)) jit_needed = true;
- bpf_prog_select_func(fp); + if (!bpf_prog_select_interpreter(fp)) + jit_needed = true;
/* eBPF JITs can rewrite the program in case constant * blinding is active. However, in case of error during
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peilin Ye yepeilin@google.com
[ Upstream commit 6d78b4473cdb08b74662355a9e8510bde09c511e ]
Currently, calling bpf_map_kmalloc_node() from __bpf_async_init() can cause various locking issues; see the following stack trace (edited for style) as one example:
... [10.011566] do_raw_spin_lock.cold [10.011570] try_to_wake_up (5) double-acquiring the same [10.011575] kick_pool rq_lock, causing a hardlockup [10.011579] __queue_work [10.011582] queue_work_on [10.011585] kernfs_notify [10.011589] cgroup_file_notify [10.011593] try_charge_memcg (4) memcg accounting raises an [10.011597] obj_cgroup_charge_pages MEMCG_MAX event [10.011599] obj_cgroup_charge_account [10.011600] __memcg_slab_post_alloc_hook [10.011603] __kmalloc_node_noprof ... [10.011611] bpf_map_kmalloc_node [10.011612] __bpf_async_init [10.011615] bpf_timer_init (3) BPF calls bpf_timer_init() [10.011617] bpf_prog_xxxxxxxxxxxxxxxx_fcg_runnable [10.011619] bpf__sched_ext_ops_runnable [10.011620] enqueue_task_scx (2) BPF runs with rq_lock held [10.011622] enqueue_task [10.011626] ttwu_do_activate [10.011629] sched_ttwu_pending (1) grabs rq_lock ...
The above was reproduced on bpf-next (b338cf849ec8) by modifying ./tools/sched_ext/scx_flatcg.bpf.c to call bpf_timer_init() during ops.runnable(), and hacking the memcg accounting code a bit to make a bpf_timer_init() call more likely to raise an MEMCG_MAX event.
We have also run into other similar variants (both internally and on bpf-next), including double-acquiring cgroup_file_kn_lock, the same worker_pool::lock, etc.
As suggested by Shakeel, fix this by using __GFP_HIGH instead of GFP_ATOMIC in __bpf_async_init(), so that e.g. if try_charge_memcg() raises an MEMCG_MAX event, we call __memcg_memory_event() with @allow_spinning=false and avoid calling cgroup_file_notify() there.
Depends on mm patch "memcg: skip cgroup_file_notify if spinning is not allowed": https://lore.kernel.org/bpf/20250905201606.66198-1-shakeel.butt@linux.dev/
v0 approach: s/bpf_map_kmalloc_node/bpf_mem_alloc/ https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@google.com/ v1 approach: https://lore.kernel.org/bpf/20250905234547.862249-1-yepeilin@google.com/
Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.") Suggested-by: Shakeel Butt shakeel.butt@linux.dev Signed-off-by: Peilin Ye yepeilin@google.com Link: https://lore.kernel.org/r/20250909095222.2121438-1-yepeilin@google.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/helpers.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index be4429463599f..a0bf39b7359aa 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1276,8 +1276,11 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u goto out; }
- /* allocate hrtimer via map_kmalloc to use memcg accounting */ - cb = bpf_map_kmalloc_node(map, size, GFP_ATOMIC, map->numa_node); + /* Allocate via bpf_map_kmalloc_node() for memcg accounting. Until + * kmalloc_nolock() is available, avoid locking issues by using + * __GFP_HIGH (GFP_ATOMIC & ~__GFP_RECLAIM). + */ + cb = bpf_map_kmalloc_node(map, size, __GFP_HIGH, map->numa_node); if (!cb) { ret = -ENOMEM; goto out;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@google.com
[ Upstream commit a3967baad4d533dc254c31e0d221e51c8d223d58 ]
syzbot reported the splat below. [0]
The repro does the following:
1. Load a sk_msg prog that calls bpf_msg_cork_bytes(msg, cork_bytes) 2. Attach the prog to a SOCKMAP 3. Add a socket to the SOCKMAP 4. Activate fault injection 5. Send data less than cork_bytes
At 5., the data is carried over to the next sendmsg() as it is smaller than the cork_bytes specified by bpf_msg_cork_bytes().
Then, tcp_bpf_send_verdict() tries to allocate psock->cork to hold the data, but this fails silently due to fault injection + __GFP_NOWARN.
If the allocation fails, we need to revert the sk->sk_forward_alloc change done by sk_msg_alloc().
Let's call sk_msg_free() when tcp_bpf_send_verdict fails to allocate psock->cork.
The "*copied" also needs to be updated such that a proper error can be returned to the caller, sendmsg. It fails to allocate psock->cork. Nothing has been corked so far, so this patch simply sets "*copied" to 0.
[0]: WARNING: net/ipv4/af_inet.c:156 at inet_sock_destruct+0x623/0x730 net/ipv4/af_inet.c:156, CPU#1: syz-executor/5983 Modules linked in: CPU: 1 UID: 0 PID: 5983 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025 RIP: 0010:inet_sock_destruct+0x623/0x730 net/ipv4/af_inet.c:156 Code: 0f 0b 90 e9 62 fe ff ff e8 7a db b5 f7 90 0f 0b 90 e9 95 fe ff ff e8 6c db b5 f7 90 0f 0b 90 e9 bb fe ff ff e8 5e db b5 f7 90 <0f> 0b 90 e9 e1 fe ff ff 89 f9 80 e1 07 80 c1 03 38 c1 0f 8c 9f fc RSP: 0018:ffffc90000a08b48 EFLAGS: 00010246 RAX: ffffffff8a09d0b2 RBX: dffffc0000000000 RCX: ffff888024a23c80 RDX: 0000000000000100 RSI: 0000000000000fff RDI: 0000000000000000 RBP: 0000000000000fff R08: ffff88807e07c627 R09: 1ffff1100fc0f8c4 R10: dffffc0000000000 R11: ffffed100fc0f8c5 R12: ffff88807e07c380 R13: dffffc0000000000 R14: ffff88807e07c60c R15: 1ffff1100fc0f872 FS: 00005555604c4500(0000) GS:ffff888125af1000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005555604df5c8 CR3: 0000000032b06000 CR4: 00000000003526f0 Call Trace: <IRQ> __sk_destruct+0x86/0x660 net/core/sock.c:2339 rcu_do_batch kernel/rcu/tree.c:2605 [inline] rcu_core+0xca8/0x1770 kernel/rcu/tree.c:2861 handle_softirqs+0x286/0x870 kernel/softirq.c:579 __do_softirq kernel/softirq.c:613 [inline] invoke_softirq kernel/softirq.c:453 [inline] __irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:680 irq_exit_rcu+0x9/0x30 kernel/softirq.c:696 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1052 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1052 </IRQ>
Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data") Reported-by: syzbot+4cabd1d2fa917a456db8@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68c0b6b5.050a0220.3c6139.0013.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima kuniyu@google.com Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Link: https://patch.msgid.link/20250909232623.4151337-1-kuniyu@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_bpf.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 22e8a2af5dd8b..8372ca512a755 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -408,8 +408,11 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock, if (!psock->cork) { psock->cork = kzalloc(sizeof(*psock->cork), GFP_ATOMIC | __GFP_NOWARN); - if (!psock->cork) + if (!psock->cork) { + sk_msg_free(sk, msg); + *copied = 0; return -ENOMEM; + } } memcpy(psock->cork, msg, sizeof(*msg)); return 0;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: wangzijie wangzijie1@honor.com
[ Upstream commit 0ce9398aa0830f15f92bbed73853f9861c3e74ff ]
Commit 2ce3d282bd50 ("proc: fix missing pde_set_flags() for net proc files") missed a key part in the definition of proc_dir_entry:
union { const struct proc_ops *proc_ops; const struct file_operations *proc_dir_ops; };
So dereferencing ->proc_ops, assuming it is always a proc_ops structure, results in type confusion, and the NULL check on 'proc_ops' does not work for proc directories.
Add a !S_ISDIR(dp->mode) test before calling pde_set_flags() to fix it.
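A small standalone sketch of the hazard (the struct below is a simplified stand-in for proc_dir_entry, not the real definition): once the directory member of the union is stored, the proc_ops view of the same bytes is non-NULL but meaningless, so the mode has to be checked first.

    #include <stdio.h>

    struct file_operations { int x; };
    struct proc_ops { int y; };

    struct entry {
            union {
                    const struct proc_ops *proc_ops;
                    const struct file_operations *proc_dir_ops;
            };
            unsigned int is_dir;
    };

    int main(void)
    {
            static const struct file_operations dir_ops = { 0 };
            struct entry e = { .proc_dir_ops = &dir_ops, .is_dir = 1 };

            /* Type confusion: e.proc_ops is non-NULL, but it does not point
             * to a struct proc_ops at all, so guard on the mode first. */
            if (!e.is_dir && e.proc_ops)
                    puts("safe to treat as proc_ops");
            else
                    puts("directory: the proc_ops view is meaningless");
            return 0;
    }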
Link: https://lkml.kernel.org/r/20250904135715.3972782-1-wangzijie1@honor.com Fixes: 2ce3d282bd50 ("proc: fix missing pde_set_flags() for net proc files") Signed-off-by: wangzijie wangzijie1@honor.com Reported-by: Brad Spengler spender@grsecurity.net Closes: https://lore.kernel.org/all/20250903065758.3678537-1-wangzijie1@honor.com/ Cc: Alexey Dobriyan adobriyan@gmail.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Christian Brauner brauner@kernel.org Cc: Jiri Slaby jirislaby@kernel.org Cc: Stefano Brivio sbrivio@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/proc/generic.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/proc/generic.c b/fs/proc/generic.c index a87a9404e0d0c..eb49beff69bcb 100644 --- a/fs/proc/generic.c +++ b/fs/proc/generic.c @@ -388,7 +388,8 @@ struct proc_dir_entry *proc_register(struct proc_dir_entry *dir, if (proc_alloc_inum(&dp->low_ino)) goto out_free_entry;
- pde_set_flags(dp); + if (!S_ISDIR(dp->mode)) + pde_set_flags(dp);
write_lock(&proc_subdir_lock); dp->parent = dir;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Salah Triki salah.triki@gmail.com
commit ff2a66d21fd2364ed9396d151115eec59612b200 upstream.
dma_free_coherent() must only be called if the corresponding dma_alloc_coherent() call has succeeded. Calling it when the allocation fails leads to undefined behavior.
Delete the wrong call.
[ bp: Massage commit message. ]
Fixes: 71bcada88b0f3 ("edac: altera: Add Altera SDRAM EDAC support") Signed-off-by: Salah Triki salah.triki@gmail.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Acked-by: Dinh Nguyen dinguyen@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/aIrfzzqh4IzYtDVC@pc Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/edac/altera_edac.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/edac/altera_edac.c +++ b/drivers/edac/altera_edac.c @@ -128,7 +128,6 @@ static ssize_t altr_sdr_mc_err_inject_wr
ptemp = dma_alloc_coherent(mci->pdev, 16, &dma_handle, GFP_KERNEL); if (!ptemp) { - dma_free_coherent(mci->pdev, 16, ptemp, dma_handle); edac_printk(KERN_ERR, EDAC_MC, "Inject: Buffer Allocation error\n"); return -ENOMEM;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
commit 199cd9e8d14bc14bdbd1fa3031ce26dac9781507 upstream.
This reverts commit 14e41b16e8cb677bb440dca2edba8b041646c742.
This patch breaks the LTP acct02 test, so let's revert and look for a better solution.
Reported-by: Mark Brown broonie@kernel.org Reported-by: Harshvardhan Jha harshvardhan.j.jha@oracle.com Link: https://lore.kernel.org/linux-nfs/7d4d57b0-39a3-49f1-8ada-60364743e3b4@siren... Cc: stable@vger.kernel.org # 6.15.x Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sunrpc/sched.c | 2 -- 1 file changed, 2 deletions(-)
--- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -276,8 +276,6 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue
static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode) { - if (unlikely(current->flags & PF_EXITING)) - return -EINTR; schedule(); if (signal_pending_state(mode, current)) return -ERESTARTSYS;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Chancellor nathan@kernel.org
commit 3fac212fe489aa0dbe8d80a42a7809840ca7b0f9 upstream.
Clang 22 recently added support for defining __SANITIZE__ macros similar to GCC [1], which causes warnings (or errors with CONFIG_WERROR=y or W=e) with the existing defines that the kernel creates to emulate this behavior with existing clang versions.
In file included from <built-in>:3: In file included from include/linux/compiler_types.h:171: include/linux/compiler-clang.h:37:9: error: '__SANITIZE_THREAD__' macro redefined [-Werror,-Wmacro-redefined] 37 | #define __SANITIZE_THREAD__ | ^ <built-in>:352:9: note: previous definition is here 352 | #define __SANITIZE_THREAD__ 1 | ^
Refactor compiler-clang.h to only define the sanitizer macros when they are undefined and adjust the rest of the code to use these macros for checking if the sanitizers are enabled, clearing up the warnings and allowing the kernel to easily drop these defines when the minimum supported version of LLVM for building the kernel becomes 22.0.0 or newer.
Link: https://lkml.kernel.org/r/20250902-clang-update-sanitize-defines-v1-1-cf3702... Link: https://github.com/llvm/llvm-project/commit/568c23bbd3303518c5056d7f03444dae... [1] Signed-off-by: Nathan Chancellor nathan@kernel.org Reviewed-by: Justin Stitt justinstitt@google.com Cc: Alexander Potapenko glider@google.com Cc: Andrey Konovalov andreyknvl@gmail.com Cc: Andrey Ryabinin ryabinin.a.a@gmail.com Cc: Bill Wendling morbo@google.com Cc: Dmitriy Vyukov dvyukov@google.com Cc: Marco Elver elver@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/compiler-clang.h | 29 ++++++++++++++++++++++++----- 1 file changed, 24 insertions(+), 5 deletions(-)
--- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -18,23 +18,42 @@ #define KASAN_ABI_VERSION 5
/* + * Clang 22 added preprocessor macros to match GCC, in hopes of eventually + * dropping __has_feature support for sanitizers: + * https://github.com/llvm/llvm-project/commit/568c23bbd3303518c5056d7f03444dae... + * Create these macros for older versions of clang so that it is easy to clean + * up once the minimum supported version of LLVM for building the kernel always + * creates these macros. + * * Note: Checking __has_feature(*_sanitizer) is only true if the feature is * enabled. Therefore it is not required to additionally check defined(CONFIG_*) * to avoid adding redundant attributes in other configurations. */ +#if __has_feature(address_sanitizer) && !defined(__SANITIZE_ADDRESS__) +#define __SANITIZE_ADDRESS__ +#endif +#if __has_feature(hwaddress_sanitizer) && !defined(__SANITIZE_HWADDRESS__) +#define __SANITIZE_HWADDRESS__ +#endif +#if __has_feature(thread_sanitizer) && !defined(__SANITIZE_THREAD__) +#define __SANITIZE_THREAD__ +#endif
-#if __has_feature(address_sanitizer) || __has_feature(hwaddress_sanitizer) -/* Emulate GCC's __SANITIZE_ADDRESS__ flag */ +/* + * Treat __SANITIZE_HWADDRESS__ the same as __SANITIZE_ADDRESS__ in the kernel. + */ +#ifdef __SANITIZE_HWADDRESS__ #define __SANITIZE_ADDRESS__ +#endif + +#ifdef __SANITIZE_ADDRESS__ #define __no_sanitize_address \ __attribute__((no_sanitize("address", "hwaddress"))) #else #define __no_sanitize_address #endif
-#if __has_feature(thread_sanitizer) -/* emulate gcc's __SANITIZE_THREAD__ flag */ -#define __SANITIZE_THREAD__ +#ifdef __SANITIZE_THREAD__ #define __no_sanitize_thread \ __attribute__((no_sanitize("thread"))) #else
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krister Johansen kjlx@templeofstupid.com
commit 648de37416b301f046f62f1b65715c7fa8ebaa67 upstream.
Users reported a scenario where MPTCP connections that were configured with SO_KEEPALIVE prior to connect would fail to enable their keepalives if MPTCP fell back to TCP mode.
After investigating, this affects keepalives for any connection where sync_socket_options is called on a socket that is in the closed or listening state. Joins are handled properly. For connects, sync_socket_options is called when the socket is still in the closed state. The tcp_set_keepalive() function does not act on sockets that are closed or listening, hence keepalive is not immediately enabled. Since the SOCK_KEEPOPEN flag is absent, it is not enabled later in the connect sequence via tcp_finish_connect. Setting the keepalive via sockopt after connect does work, but would not address any subsequently created flows.
Fortunately, the fix here is straightforward: set SOCK_KEEPOPEN on the subflow when calling sync_socket_options.
The fix was validated both by using tcpdump to observe keepalive packets not being sent before the fix, and being sent after the fix. It was also possible to observe via ss that the keepalive timer was not enabled on these sockets before the fix, but was enabled afterwards.
Fixes: 1b3e7ede1365 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY") Cc: stable@vger.kernel.org Signed-off-by: Krister Johansen kjlx@templeofstupid.com Reviewed-by: Geliang Tang geliang@kernel.org Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/aL8dYfPZrwedCIh9@templeofstupid.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/sockopt.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
--- a/net/mptcp/sockopt.c +++ b/net/mptcp/sockopt.c @@ -1508,13 +1508,12 @@ static void sync_socket_options(struct m { static const unsigned int tx_rx_locks = SOCK_RCVBUF_LOCK | SOCK_SNDBUF_LOCK; struct sock *sk = (struct sock *)msk; + bool keep_open;
- if (ssk->sk_prot->keepalive) { - if (sock_flag(sk, SOCK_KEEPOPEN)) - ssk->sk_prot->keepalive(ssk, 1); - else - ssk->sk_prot->keepalive(ssk, 0); - } + keep_open = sock_flag(sk, SOCK_KEEPOPEN); + if (ssk->sk_prot->keepalive) + ssk->sk_prot->keepalive(ssk, keep_open); + sock_valbool_flag(ssk, SOCK_KEEPOPEN, keep_open);
ssk->sk_priority = sk->sk_priority; ssk->sk_bound_dev_if = sk->sk_bound_dev_if;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Tinguely mark.tinguely@oracle.com
commit 04100f775c2ea501927f508f17ad824ad1f23c8d upstream.
syzbot detected an OCFS2 hang due to a recursive semaphore on a FS_IOC_FIEMAP of the extent list on a specially crafted mmap file.
context_switch kernel/sched/core.c:5357 [inline] __schedule+0x1798/0x4cc0 kernel/sched/core.c:6961 __schedule_loop kernel/sched/core.c:7043 [inline] schedule+0x165/0x360 kernel/sched/core.c:7058 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7115 rwsem_down_write_slowpath+0x872/0xfe0 kernel/locking/rwsem.c:1185 __down_write_common kernel/locking/rwsem.c:1317 [inline] __down_write kernel/locking/rwsem.c:1326 [inline] down_write+0x1ab/0x1f0 kernel/locking/rwsem.c:1591 ocfs2_page_mkwrite+0x2ff/0xc40 fs/ocfs2/mmap.c:142 do_page_mkwrite+0x14d/0x310 mm/memory.c:3361 wp_page_shared mm/memory.c:3762 [inline] do_wp_page+0x268d/0x5800 mm/memory.c:3981 handle_pte_fault mm/memory.c:6068 [inline] __handle_mm_fault+0x1033/0x5440 mm/memory.c:6195 handle_mm_fault+0x40a/0x8e0 mm/memory.c:6364 do_user_addr_fault+0x764/0x1390 arch/x86/mm/fault.c:1387 handle_page_fault arch/x86/mm/fault.c:1476 [inline] exc_page_fault+0x76/0xf0 arch/x86/mm/fault.c:1532 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623 RIP: 0010:copy_user_generic arch/x86/include/asm/uaccess_64.h:126 [inline] RIP: 0010:raw_copy_to_user arch/x86/include/asm/uaccess_64.h:147 [inline] RIP: 0010:_inline_copy_to_user include/linux/uaccess.h:197 [inline] RIP: 0010:_copy_to_user+0x85/0xb0 lib/usercopy.c:26 Code: e8 00 bc f7 fc 4d 39 fc 72 3d 4d 39 ec 77 38 e8 91 b9 f7 fc 4c 89 f7 89 de e8 47 25 5b fd 0f 01 cb 4c 89 ff 48 89 d9 4c 89 f6 <f3> a4 0f 1f 00 48 89 cb 0f 01 ca 48 89 d8 5b 41 5c 41 5d 41 5e 41 RSP: 0018:ffffc9000403f950 EFLAGS: 00050256 RAX: ffffffff84c7f101 RBX: 0000000000000038 RCX: 0000000000000038 RDX: 0000000000000000 RSI: ffffc9000403f9e0 RDI: 0000200000000060 RBP: ffffc9000403fa90 R08: ffffc9000403fa17 R09: 1ffff92000807f42 R10: dffffc0000000000 R11: fffff52000807f43 R12: 0000200000000098 R13: 00007ffffffff000 R14: ffffc9000403f9e0 R15: 0000200000000060 copy_to_user include/linux/uaccess.h:225 [inline] fiemap_fill_next_extent+0x1c0/0x390 fs/ioctl.c:145 ocfs2_fiemap+0x888/0xc90 fs/ocfs2/extent_map.c:806 ioctl_fiemap fs/ioctl.c:220 [inline] do_vfs_ioctl+0x1173/0x1430 fs/ioctl.c:532 __do_sys_ioctl fs/ioctl.c:596 [inline] __se_sys_ioctl+0x82/0x170 fs/ioctl.c:584 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f5f13850fd9 RSP: 002b:00007ffe3b3518b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000200000000000 RCX: 00007f5f13850fd9 RDX: 0000200000000040 RSI: 00000000c020660b RDI: 0000000000000004 RBP: 6165627472616568 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe3b3518f0 R13: 00007ffe3b351b18 R14: 431bde82d7b634db R15: 00007f5f1389a03b
ocfs2_fiemap() takes a read lock on the ip_alloc_sem semaphore (since v2.6.22-527-g7307de80510a) and calls fiemap_fill_next_extent() to read the extent list of this running mmap executable. The user-supplied buffer that holds the fiemap information page faults, calling ocfs2_page_mkwrite(), which takes a write lock (since v2.6.27-38-g00dc417fa3e7) on the same semaphore. This recursive semaphore acquisition holds filesystem locks and causes a hang of the filesystem.
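A minimal userspace analogue of that recursive-lock hazard (a hedged sketch using a pthread rwlock as a stand-in for ip_alloc_sem; trywrlock is used so the example terminates instead of hanging):

    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>

    int main(void)
    {
            pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;

            pthread_rwlock_rdlock(&lock); /* like down_read(&ip_alloc_sem) */

            /* A blocking write lock here would deadlock this thread, which
             * is what the page-fault path did via ocfs2_page_mkwrite(). */
            if (pthread_rwlock_trywrlock(&lock) == EBUSY)
                    puts("write lock unavailable while the read lock is held");

            pthread_rwlock_unlock(&lock);
            return 0;
    }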
The ip_alloc_sem protects the inode extent list and size. Release the read semaphore before calling fiemap_fill_next_extent() in ocfs2_fiemap() and ocfs2_fiemap_inline(). This does an unnecessary semaphore lock/unlock on the last extent but simplifies the error path.
Link: https://lkml.kernel.org/r/61d1a62b-2631-4f12-81e2-cd689914360b@oracle.com Fixes: 00dc417fa3e7 ("ocfs2: fiemap support") Signed-off-by: Mark Tinguely mark.tinguely@oracle.com Reported-by: syzbot+541dcc6ee768f77103e7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=541dcc6ee768f77103e7 Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Cc: Mark Fasheh mark@fasheh.com Cc: Joel Becker jlbec@evilplan.org Cc: Junxiao Bi junxiao.bi@oracle.com Cc: Changwei Ge gechangwei@live.cn Cc: Jun Piao piaojun@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ocfs2/extent_map.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/fs/ocfs2/extent_map.c +++ b/fs/ocfs2/extent_map.c @@ -696,6 +696,8 @@ out: * it not only handles the fiemap for inlined files, but also deals * with the fast symlink, cause they have no difference for extent * mapping per se. + * + * Must be called with ip_alloc_sem semaphore held. */ static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh, struct fiemap_extent_info *fieinfo, @@ -707,6 +709,7 @@ static int ocfs2_fiemap_inline(struct in u64 phys; u32 flags = FIEMAP_EXTENT_DATA_INLINE|FIEMAP_EXTENT_LAST; struct ocfs2_inode_info *oi = OCFS2_I(inode); + lockdep_assert_held_read(&oi->ip_alloc_sem);
di = (struct ocfs2_dinode *)di_bh->b_data; if (ocfs2_inode_is_fast_symlink(inode)) @@ -722,8 +725,11 @@ static int ocfs2_fiemap_inline(struct in phys += offsetof(struct ocfs2_dinode, id2.i_data.id_data);
+ /* Release the ip_alloc_sem to prevent deadlock on page fault */ + up_read(&OCFS2_I(inode)->ip_alloc_sem); ret = fiemap_fill_next_extent(fieinfo, 0, phys, id_count, flags); + down_read(&OCFS2_I(inode)->ip_alloc_sem); if (ret < 0) return ret; } @@ -792,9 +798,11 @@ int ocfs2_fiemap(struct inode *inode, st len_bytes = (u64)le16_to_cpu(rec.e_leaf_clusters) << osb->s_clustersize_bits; phys_bytes = le64_to_cpu(rec.e_blkno) << osb->sb->s_blocksize_bits; virt_bytes = (u64)le32_to_cpu(rec.e_cpos) << osb->s_clustersize_bits; - + /* Release the ip_alloc_sem to prevent deadlock on page fault */ + up_read(&OCFS2_I(inode)->ip_alloc_sem); ret = fiemap_fill_next_extent(fieinfo, virt_bytes, phys_bytes, len_bytes, fe_flags); + down_read(&OCFS2_I(inode)->ip_alloc_sem); if (ret) break;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Burkov boris@bur.io
commit de134cb54c3a67644ff95b1c9bffe545e752c912 upstream.
The following workload on a squota enabled fs:
btrfs subvol create mnt/subvol
# ensure subvol extents get accounted
sync
btrfs qgroup create 1/1 mnt
btrfs qgroup assign mnt/subvol 1/1 mnt
btrfs qgroup delete mnt/subvol
# make the cleaner thread run
btrfs filesystem sync mnt
sleep 1
btrfs filesystem sync mnt
btrfs qgroup destroy 1/1 mnt
will fail with EBUSY. The reason is that 1/1 does the quick accounting when we assign subvol to it, gaining its exclusive usage as excl and excl_cmpr. But then when we delete subvol, the decrement happens via record_squota_delta() which does not update excl_cmpr, as squotas does not make any distinction between compressed and normal extents. Thus, we increment excl_cmpr but never decrement it, and are unable to delete 1/1. The two possible fixes are to make squota always mirror excl and excl_cmpr or to make the fast accounting separately track the plain and cmpr numbers. The latter felt cleaner to me so that is what I opted for.
Fixes: 1e0e9d5771c3 ("btrfs: add helper for recording simple quota deltas") CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Boris Burkov boris@bur.io Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/qgroup.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1501,6 +1501,7 @@ static int __qgroup_excl_accounting(stru struct btrfs_qgroup *qgroup; LIST_HEAD(qgroup_list); u64 num_bytes = src->excl; + u64 num_bytes_cmpr = src->excl_cmpr; int ret = 0;
qgroup = find_qgroup_rb(fs_info, ref_root); @@ -1512,11 +1513,12 @@ static int __qgroup_excl_accounting(stru struct btrfs_qgroup_list *glist;
qgroup->rfer += sign * num_bytes; - qgroup->rfer_cmpr += sign * num_bytes; + qgroup->rfer_cmpr += sign * num_bytes_cmpr;
WARN_ON(sign < 0 && qgroup->excl < num_bytes); + WARN_ON(sign < 0 && qgroup->excl_cmpr < num_bytes_cmpr); qgroup->excl += sign * num_bytes; - qgroup->excl_cmpr += sign * num_bytes; + qgroup->excl_cmpr += sign * num_bytes_cmpr;
if (sign > 0) qgroup_rsv_add_by_qgroup(fs_info, qgroup, src);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Omar Sandoval osandov@fb.com
commit f6a6c280059c4ddc23e12e3de1b01098e240036f upstream.
There is a race condition between inode eviction and inode caching that can cause a live struct btrfs_inode to be missing from the root->inodes xarray. Specifically, there is a window during evict() between the inode being unhashed and deleted from the xarray. If btrfs_iget() is called for the same inode in that window, it will be recreated and inserted into the xarray, but then eviction will delete the new entry, leaving nothing in the xarray:
Thread 1                          Thread 2
---------------------------------------------------------------
evict()
  remove_inode_hash()
                                  btrfs_iget_path()
                                    btrfs_iget_locked()
                                    btrfs_read_locked_inode()
                                      btrfs_add_inode_to_root()
  destroy_inode()
    btrfs_destroy_inode()
      btrfs_del_inode_from_root()
        __xa_erase
In turn, this can cause issues for subvolume deletion. Specifically, if an inode is in this lost state, and all other inodes are evicted, then btrfs_del_inode_from_root() will call btrfs_add_dead_root() prematurely. If the lost inode has a delayed_node attached to it, then when btrfs_clean_one_deleted_snapshot() calls btrfs_kill_all_delayed_nodes(), it will loop forever because the delayed_nodes xarray will never become empty (unless memory pressure forces the inode out). We saw this manifest as soft lockups in production.
Fix it by only deleting the xarray entry if it matches the given inode (using __xa_cmpxchg()).
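For reference, xa_cmpxchg() (and its locked __xa_cmpxchg() variant) stores a new value at an index only if the current entry matches the expected old value, returning the previous entry; passing NULL as the new value turns it into a conditional erase. A generic usage sketch, not the btrfs code:

        /* erase xa[index] only if it still holds 'old'; otherwise leave
         * whatever another thread stored there untouched */
        entry = xa_cmpxchg(&xa, index, old, NULL, GFP_ATOMIC);
        if (entry == old)
                /* we removed our own entry */;
        else
                /* someone re-populated the slot; nothing to undo */;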
Fixes: 310b2f5d5a94 ("btrfs: use an xarray to track open inodes in a root") Cc: stable@vger.kernel.org # 6.11+ Reviewed-by: Josef Bacik josef@toxicpanda.com Reviewed-by: Filipe Manana fdmanana@suse.com Co-authored-by: Leo Martins loemra.dev@gmail.com Signed-off-by: Leo Martins loemra.dev@gmail.com Signed-off-by: Omar Sandoval osandov@fb.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/inode.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5634,7 +5634,17 @@ static void btrfs_del_inode_from_root(st bool empty = false;
xa_lock(&root->inodes); - entry = __xa_erase(&root->inodes, btrfs_ino(inode)); + /* + * This btrfs_inode is being freed and has already been unhashed at this + * point. It's possible that another btrfs_inode has already been + * allocated for the same inode and inserted itself into the root, so + * don't delete it in that case. + * + * Note that this shouldn't need to allocate memory, so the gfp flags + * don't really matter. + */ + entry = __xa_cmpxchg(&root->inodes, btrfs_ino(inode), inode, NULL, + GFP_ATOMIC); if (entry == inode) empty = xa_empty(&root->inodes); xa_unlock(&root->inodes);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chiasheng Lee chiasheng.lee@linux.intel.com
commit 664596bd98bb251dd417dfd3f9b615b661e1e44a upstream.
Hide the Intel Birch Stream SoC TCO WDT feature since it was removed.
On platforms with a PCH TCO WDT, this redundant device triggers errors like this:
[ 28.144542] sysfs: cannot create duplicate filename '/bus/platform/devices/iTCO_wdt'
Fixes: 8c56f9ef25a3 ("i2c: i801: Add support for Intel Birch Stream SoC") Link: https://bugzilla.kernel.org/show_bug.cgi?id=220320 Signed-off-by: Chiasheng Lee chiasheng.lee@linux.intel.com Cc: stable@vger.kernel.org # v6.7+ Reviewed-by: Mika Westerberg mika.westerberg@linux.intel.com Reviewed-by: Jarkko Nikula jarkko.nikula@linux.intel.com Signed-off-by: Andi Shyti andi.shyti@kernel.org Link: https://lore.kernel.org/r/20250901125943.916522-1-chiasheng.lee@linux.intel.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i2c/busses/i2c-i801.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/i2c/busses/i2c-i801.c +++ b/drivers/i2c/busses/i2c-i801.c @@ -1057,7 +1057,7 @@ static const struct pci_device_id i801_i { PCI_DEVICE_DATA(INTEL, METEOR_LAKE_P_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) }, { PCI_DEVICE_DATA(INTEL, METEOR_LAKE_SOC_S_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) }, { PCI_DEVICE_DATA(INTEL, METEOR_LAKE_PCH_S_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) }, - { PCI_DEVICE_DATA(INTEL, BIRCH_STREAM_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) }, + { PCI_DEVICE_DATA(INTEL, BIRCH_STREAM_SMBUS, FEATURES_ICH5) }, { PCI_DEVICE_DATA(INTEL, ARROW_LAKE_H_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) }, { PCI_DEVICE_DATA(INTEL, PANTHER_LAKE_H_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) }, { PCI_DEVICE_DATA(INTEL, PANTHER_LAKE_P_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) },
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleksij Rempel o.rempel@pengutronix.de
commit 5537a4679403423e0b49c95b619983a4583d69c5 upstream.
Drop phylink_{suspend,resume}() from ax88772 PM callbacks.
MDIO bus accesses have their own runtime-PM handling and will try to wake the device if it is suspended. Such wake attempts must not happen from PM callbacks while the device PM lock is held. Since phylink {sus|re}sume may trigger MDIO, it must not be called in PM context.
No extra phylink PM handling is required for this driver:
- .ndo_open/.ndo_stop control the phylink start/stop lifecycle.
- ethtool/phylib entry points run in process context, not PM.
- phylink MAC ops program the MAC on link changes after resume.
Fixes: e0bffe3e6894 ("net: asix: ax88772: migrate to phylink") Reported-by: Hubert Wiśniewski hubert.wisniewski.25632@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de Tested-by: Hubert Wiśniewski hubert.wisniewski.25632@gmail.com Tested-by: Xu Yang xu.yang_2@nxp.com Link: https://patch.msgid.link/20250908112619.2900723-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/asix_devices.c | 13 ------------- 1 file changed, 13 deletions(-)
--- a/drivers/net/usb/asix_devices.c +++ b/drivers/net/usb/asix_devices.c @@ -607,15 +607,8 @@ static const struct net_device_ops ax887
static void ax88772_suspend(struct usbnet *dev) { - struct asix_common_private *priv = dev->driver_priv; u16 medium;
- if (netif_running(dev->net)) { - rtnl_lock(); - phylink_suspend(priv->phylink, false); - rtnl_unlock(); - } - /* Stop MAC operation */ medium = asix_read_medium_status(dev, 1); medium &= ~AX_MEDIUM_RE; @@ -644,12 +637,6 @@ static void ax88772_resume(struct usbnet for (i = 0; i < 3; i++) if (!priv->reset(dev, 1)) break; - - if (netif_running(dev->net)) { - rtnl_lock(); - phylink_resume(priv->phylink); - rtnl_unlock(); - } }
static int asix_resume(struct usb_interface *intf)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Sverdlin alexander.sverdlin@siemens.com
commit fd779eac2d659668be4d3dbdac0710afd5d6db12 upstream.
Having setup time 0 violates tAR, tCLR of some chips, for instance TOSHIBA TC58NVG2S3ETAI0 cannot be detected successfully (first ID byte being read duplicated, i.e. 98 98 dc 90 15 76 14 03 instead of 98 dc 90 15 76 ...).
Atmel Application Notes postulated 1 cycle NRD_SETUP without explanation [1], but it looks more appropriate to just calculate setup time properly.
[1] Link: https://ww1.microchip.com/downloads/aemDocuments/documents/MPU32/Application...
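As a rough worked example with hypothetical numbers (not taken from the commit): assume tAR = tCLR = 10 ns and a master clock period of 7519 ps (~133 MHz). The new calculation then gives:

        /* hypothetical values, for illustration only */
        timeps  = max(10000, 10000);          /* max(tAR, tCLR), in ps */
        ncycles = DIV_ROUND_UP(timeps, 7519); /* = 2 NRD_SETUP cycles  */

where the old code would have programmed NRD_SETUP = 0 and violated tAR/tCLR.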
Cc: stable@vger.kernel.org Fixes: f9ce2eddf176 ("mtd: nand: atmel: Add ->setup_data_interface() hooks") Signed-off-by: Alexander Sverdlin alexander.sverdlin@siemens.com Tested-by: Alexander Dahl ada@thorsis.com Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/atmel/nand-controller.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-)
--- a/drivers/mtd/nand/raw/atmel/nand-controller.c +++ b/drivers/mtd/nand/raw/atmel/nand-controller.c @@ -1378,13 +1378,23 @@ static int atmel_smc_nand_prepare_smccon return ret;
/* + * Read setup timing depends on the operation done on the NAND: + * + * NRD_SETUP = max(tAR, tCLR) + */ + timeps = max(conf->timings.sdr.tAR_min, conf->timings.sdr.tCLR_min); + ncycles = DIV_ROUND_UP(timeps, mckperiodps); + totalcycles += ncycles; + ret = atmel_smc_cs_conf_set_setup(smcconf, ATMEL_SMC_NRD_SHIFT, ncycles); + if (ret) + return ret; + + /* * The read cycle timing is directly matching tRC, but is also * dependent on the setup and hold timings we calculated earlier, * which gives: * - * NRD_CYCLE = max(tRC, NRD_PULSE + NRD_HOLD) - * - * NRD_SETUP is always 0. + * NRD_CYCLE = max(tRC, NRD_SETUP + NRD_PULSE + NRD_HOLD) */ ncycles = DIV_ROUND_UP(conf->timings.sdr.tRC_min, mckperiodps); ncycles = max(totalcycles, ncycles);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe Kerello christophe.kerello@foss.st.com
commit 513c40e59d5a414ab763a9c84797534b5e8c208d upstream.
Avoid below overlapping mappings by using a contiguous non-cacheable buffer.
[ 4.077708] DMA-API: stm32_fmc2_nfc 48810000.nand-controller: cacheline tracking EEXIST, overlapping mappings aren't supported
[ 4.089103] WARNING: CPU: 1 PID: 44 at kernel/dma/debug.c:568 add_dma_entry+0x23c/0x300
[ 4.097071] Modules linked in:
[ 4.100101] CPU: 1 PID: 44 Comm: kworker/u4:2 Not tainted 6.1.82 #1
[ 4.106346] Hardware name: STMicroelectronics STM32MP257F VALID1 SNOR / MB1704 (LPDDR4 Power discrete) + MB1703 + MB1708 (SNOR MB1730) (DT)
[ 4.118824] Workqueue: events_unbound deferred_probe_work_func
[ 4.124674] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 4.131624] pc : add_dma_entry+0x23c/0x300
[ 4.135658] lr : add_dma_entry+0x23c/0x300
[ 4.139792] sp : ffff800009dbb490
[ 4.143016] x29: ffff800009dbb4a0 x28: 0000000004008022 x27: ffff8000098a6000
[ 4.150174] x26: 0000000000000000 x25: ffff8000099e7000 x24: ffff8000099e7de8
[ 4.157231] x23: 00000000ffffffff x22: 0000000000000000 x21: ffff8000098a6a20
[ 4.164388] x20: ffff000080964180 x19: ffff800009819ba0 x18: 0000000000000006
[ 4.171545] x17: 6361727420656e69 x16: 6c6568636163203a x15: 72656c6c6f72746e
[ 4.178602] x14: 6f632d646e616e2e x13: ffff800009832f58 x12: 00000000000004ec
[ 4.185759] x11: 00000000000001a4 x10: ffff80000988af58 x9 : ffff800009832f58
[ 4.192916] x8 : 00000000ffffefff x7 : ffff80000988af58 x6 : 80000000fffff000
[ 4.199972] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000
[ 4.207128] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000812d2c40
[ 4.214185] Call trace:
[ 4.216605]  add_dma_entry+0x23c/0x300
[ 4.220338]  debug_dma_map_sg+0x198/0x350
[ 4.224373]  __dma_map_sg_attrs+0xa0/0x110
[ 4.228411]  dma_map_sg_attrs+0x10/0x2c
[ 4.232247]  stm32_fmc2_nfc_xfer.isra.0+0x1c8/0x3fc
[ 4.237088]  stm32_fmc2_nfc_seq_read_page+0xc8/0x174
[ 4.242127]  nand_read_oob+0x1d4/0x8e0
[ 4.245861]  mtd_read_oob_std+0x58/0x84
[ 4.249596]  mtd_read_oob+0x90/0x150
[ 4.253231]  mtd_read+0x68/0xac
Signed-off-by: Christophe Kerello christophe.kerello@foss.st.com Cc: stable@vger.kernel.org Fixes: 2cd457f328c1 ("mtd: rawnand: stm32_fmc2: add STM32 FMC2 NAND flash controller driver") Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/stm32_fmc2_nand.c | 28 +++++++++------------------- 1 file changed, 9 insertions(+), 19 deletions(-)
--- a/drivers/mtd/nand/raw/stm32_fmc2_nand.c +++ b/drivers/mtd/nand/raw/stm32_fmc2_nand.c @@ -272,6 +272,7 @@ struct stm32_fmc2_nfc { struct sg_table dma_data_sg; struct sg_table dma_ecc_sg; u8 *ecc_buf; + dma_addr_t dma_ecc_addr; int dma_ecc_len; u32 tx_dma_max_burst; u32 rx_dma_max_burst; @@ -902,17 +903,10 @@ static int stm32_fmc2_nfc_xfer(struct na
if (!write_data && !raw) { /* Configure DMA ECC status */ - p = nfc->ecc_buf; for_each_sg(nfc->dma_ecc_sg.sgl, sg, eccsteps, s) { - sg_set_buf(sg, p, nfc->dma_ecc_len); - p += nfc->dma_ecc_len; - } - - ret = dma_map_sg(nfc->dev, nfc->dma_ecc_sg.sgl, - eccsteps, dma_data_dir); - if (!ret) { - ret = -EIO; - goto err_unmap_data; + sg_dma_address(sg) = nfc->dma_ecc_addr + + s * nfc->dma_ecc_len; + sg_dma_len(sg) = nfc->dma_ecc_len; }
desc_ecc = dmaengine_prep_slave_sg(nfc->dma_ecc_ch, @@ -921,7 +915,7 @@ static int stm32_fmc2_nfc_xfer(struct na DMA_PREP_INTERRUPT); if (!desc_ecc) { ret = -ENOMEM; - goto err_unmap_ecc; + goto err_unmap_data; }
reinit_completion(&nfc->dma_ecc_complete); @@ -929,7 +923,7 @@ static int stm32_fmc2_nfc_xfer(struct na desc_ecc->callback_param = &nfc->dma_ecc_complete; ret = dma_submit_error(dmaengine_submit(desc_ecc)); if (ret) - goto err_unmap_ecc; + goto err_unmap_data;
dma_async_issue_pending(nfc->dma_ecc_ch); } @@ -949,7 +943,7 @@ static int stm32_fmc2_nfc_xfer(struct na if (!write_data && !raw) dmaengine_terminate_all(nfc->dma_ecc_ch); ret = -ETIMEDOUT; - goto err_unmap_ecc; + goto err_unmap_data; }
/* Wait DMA data transfer completion */ @@ -969,11 +963,6 @@ static int stm32_fmc2_nfc_xfer(struct na } }
-err_unmap_ecc: - if (!write_data && !raw) - dma_unmap_sg(nfc->dev, nfc->dma_ecc_sg.sgl, - eccsteps, dma_data_dir); - err_unmap_data: dma_unmap_sg(nfc->dev, nfc->dma_data_sg.sgl, eccsteps, dma_data_dir);
@@ -1610,7 +1599,8 @@ static int stm32_fmc2_nfc_dma_setup(stru return ret;
/* Allocate a buffer to store ECC status registers */ - nfc->ecc_buf = devm_kzalloc(nfc->dev, FMC2_MAX_ECC_BUF_LEN, GFP_KERNEL); + nfc->ecc_buf = dmam_alloc_coherent(nfc->dev, FMC2_MAX_ECC_BUF_LEN, + &nfc->dma_ecc_addr, GFP_KERNEL); if (!nfc->ecc_buf) return -ENOMEM;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe Kerello christophe.kerello@foss.st.com
commit 811c0da4542df3c065f6cb843ced68780e27bb44 upstream.
In case OOB write is requested during a data write, ECC is currently lost. Avoid this issue by only writing in the free spare area. This issue has been seen with a YAFFS2 file system.
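The idea, sketched in simplified form (the actual hunk is below): ask the ECC layout for the free OOB region and clamp the out-of-band write to it, so the controller-generated ECC bytes are never overwritten.

        struct mtd_oob_region oob_free;

        mtd_ooblayout_free(mtd, 0, &oob_free); /* first free OOB section */
        offset_in_page = mtd->writesize + oob_free.offset;
        buf = chip->oob_poi + oob_free.offset; /* skip the ECC bytes */
        len = oob_free.length;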
Signed-off-by: Christophe Kerello christophe.kerello@foss.st.com Cc: stable@vger.kernel.org Fixes: 2cd457f328c1 ("mtd: rawnand: stm32_fmc2: add STM32 FMC2 NAND flash controller driver") Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/stm32_fmc2_nand.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)
--- a/drivers/mtd/nand/raw/stm32_fmc2_nand.c +++ b/drivers/mtd/nand/raw/stm32_fmc2_nand.c @@ -985,9 +985,21 @@ static int stm32_fmc2_nfc_seq_write(stru
/* Write oob */ if (oob_required) { - ret = nand_change_write_column_op(chip, mtd->writesize, - chip->oob_poi, mtd->oobsize, - false); + unsigned int offset_in_page = mtd->writesize; + const void *buf = chip->oob_poi; + unsigned int len = mtd->oobsize; + + if (!raw) { + struct mtd_oob_region oob_free; + + mtd_ooblayout_free(mtd, 0, &oob_free); + offset_in_page += oob_free.offset; + buf += oob_free.offset; + len = oob_free.length; + } + + ret = nand_change_write_column_op(chip, offset_in_page, + buf, len, false); if (ret) return ret; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amir Goldstein amir73il@gmail.com
commit e9c8da670e749f7dedc53e3af54a87b041918092 upstream.
We do not support passthrough operations other than read/write on regular files, so allowing non-regular backing files makes no sense.
Fixes: efad7153bf93 ("fuse: allow O_PATH fd for FUSE_DEV_IOC_BACKING_OPEN") Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein amir73il@gmail.com Reviewed-by: Bernd Schubert bschubert@ddn.com Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/fuse/passthrough.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/fs/fuse/passthrough.c +++ b/fs/fuse/passthrough.c @@ -233,6 +233,11 @@ int fuse_backing_open(struct fuse_conn * if (!file) goto out;
+ /* read/write/splice/mmap passthrough only relevant for regular files */ + res = d_is_dir(file->f_path.dentry) ? -EISDIR : -EINVAL; + if (!d_is_reg(file->f_path.dentry)) + goto out_fput; + backing_sb = file_inode(file)->i_sb; res = -ELOOP; if (backing_sb->s_stack_depth >= fc->max_stack_depth)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miklos Szeredi mszeredi@redhat.com
commit e5203209b3935041dac541bc5b37efb44220cc0b upstream.
Just like write(), copy_file_range() should check if the return value is less than or equal to the requested number of bytes.
Reported-by: Chunsheng Luo luochunsheng@ustc.edu Closes: https://lore.kernel.org/all/20250807062425.694-1-luochunsheng@ustc.edu/ Fixes: 88bc7d5097a1 ("fuse: add support for copy_file_range()") Cc: stable@vger.kernel.org # v4.20 Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/fuse/file.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -3295,6 +3295,9 @@ static ssize_t __fuse_copy_file_range(st fc->no_copy_file_range = 1; err = -EOPNOTSUPP; } + if (!err && outarg.size > len) + err = -EIO; + if (err) goto out;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miklos Szeredi mszeredi@redhat.com
commit 1e08938c3694f707bb165535df352ac97a8c75c9 upstream.
The FUSE protocol uses struct fuse_write_out to convey the return value of copy_file_range, which is restricted to uint32_t. But the COPY_FILE_RANGE interface supports 64-bit copy sizes.
Currently the number of bytes copied is silently truncated to 32-bit, which may result in poor performance or even failure to copy in case of truncation to zero.
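To see the truncation concretely (illustrative arithmetic, not code from the patch): the copied byte count travels back in a 32-bit field, so a copy of exactly 4 GiB reports as zero.

        /* illustration: a 64-bit count squeezed through a u32 field */
        uint64_t copied   = 4ULL << 30;       /* 4 GiB actually copied */
        uint32_t reported = (uint32_t)copied; /* wraps to 0            */

The hunk below instead caps the requested length at a page-aligned UINT_MAX, so the reply can always represent the amount copied.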
Reported-by: Florian Weimer fweimer@redhat.com Closes: https://lore.kernel.org/all/lhuh5ynl8z5.fsf@oldenburg.str.redhat.com/ Fixes: 88bc7d5097a1 ("fuse: add support for copy_file_range()") Cc: stable@vger.kernel.org # v4.20 Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/fuse/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -3229,7 +3229,7 @@ static ssize_t __fuse_copy_file_range(st .nodeid_out = ff_out->nodeid, .fh_out = ff_out->fh, .off_out = pos_out, - .len = len, + .len = min_t(size_t, len, UINT_MAX & PAGE_MASK), .flags = flags }; struct fuse_write_out outarg;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wei Yang richard.weiyang@gmail.com
commit 394bfac1c7f7b701c2c93834c5761b9c9ceeebcf upstream.
Commit 8ee53820edfd ("thp: mmu_notifier_test_young") introduced mmu_notifier_test_young(), but we are passing the wrong address. In xxx_scan_pmd(), the actual iteration address is "_address", not "address". We seem to have misused the variable from the very beginning.
Change it to the right one.
[akpm@linux-foundation.org fix whitespace, per everyone] Link: https://lkml.kernel.org/r/20250822063318.11644-1-richard.weiyang@gmail.com Fixes: 8ee53820edfd ("thp: mmu_notifier_test_young") Signed-off-by: Wei Yang richard.weiyang@gmail.com Reviewed-by: Dev Jain dev.jain@arm.com Reviewed-by: Zi Yan ziy@nvidia.com Acked-by: David Hildenbrand david@redhat.com Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com Cc: Baolin Wang baolin.wang@linux.alibaba.com Cc: Liam R. Howlett Liam.Howlett@oracle.com Cc: Nico Pache npache@redhat.com Cc: Ryan Roberts ryan.roberts@arm.com Cc: Barry Song baohua@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/khugepaged.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1403,8 +1403,8 @@ static int hpage_collapse_scan_pmd(struc */ if (cc->is_khugepaged && (pte_young(pteval) || folio_test_young(folio) || - folio_test_referenced(folio) || mmu_notifier_test_young(vma->vm_mm, - address))) + folio_test_referenced(folio) || + mmu_notifier_test_young(vma->vm_mm, _address))) referenced++; } if (!writable) {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miaohe Lin linmiaohe@huawei.com
commit d613f53c83ec47089c4e25859d5e8e0359f6f8da upstream.
When I did memory failure tests, below panic occurs:
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(page))
kernel BUG at include/linux/page-flags.h:616!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 PID: 720 Comm: bash Not tainted 6.10.0-rc1-00195-g148743902568 #40
RIP: 0010:unpoison_memory+0x2f3/0x590
RSP: 0018:ffffa57fc8787d60 EFLAGS: 00000246
RAX: 0000000000000037 RBX: 0000000000000009 RCX: ffff9be25fcdc9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff9be25fcdc9c0
RBP: 0000000000300000 R08: ffffffffb4956f88 R09: 0000000000009ffb
R10: 0000000000000284 R11: ffffffffb4926fa0 R12: ffffe6b00c000000
R13: ffff9bdb453dfd00 R14: 0000000000000000 R15: fffffffffffffffe
FS: 00007f08f04e4740(0000) GS:ffff9be25fcc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000564787a30410 CR3: 000000010d4e2000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 unpoison_memory+0x2f3/0x590
 simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
 debugfs_attr_write+0x42/0x60
 full_proxy_write+0x5b/0x80
 vfs_write+0xd5/0x540
 ksys_write+0x64/0xe0
 do_syscall_64+0xb9/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f08f0314887
RSP: 002b:00007ffece710078 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f08f0314887
RDX: 0000000000000009 RSI: 0000564787a30410 RDI: 0000000000000001
RBP: 0000564787a30410 R08: 000000000000fefe R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009
R13: 00007f08f041b780 R14: 00007f08f0417600 R15: 00007f08f0416a00
 </TASK>
Modules linked in: hwpoison_inject
---[ end trace 0000000000000000 ]---
RIP: 0010:unpoison_memory+0x2f3/0x590
RSP: 0018:ffffa57fc8787d60 EFLAGS: 00000246
RAX: 0000000000000037 RBX: 0000000000000009 RCX: ffff9be25fcdc9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff9be25fcdc9c0
RBP: 0000000000300000 R08: ffffffffb4956f88 R09: 0000000000009ffb
R10: 0000000000000284 R11: ffffffffb4926fa0 R12: ffffe6b00c000000
R13: ffff9bdb453dfd00 R14: 0000000000000000 R15: fffffffffffffffe
FS: 00007f08f04e4740(0000) GS:ffff9be25fcc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000564787a30410 CR3: 000000010d4e2000 CR4: 00000000000006f0
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x31c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---
The root cause is that unpoison_memory() tries to check the PG_HWPoison flags of an uninitialized page. So VM_BUG_ON_PAGE(PagePoisoned(page)) is triggered. This can be reproduced by below steps:
1. Offline memory block:
   echo offline > /sys/devices/system/memory/memory12/state

2. Get offlined memory pfn:
   page-types -b n -rlN

3. Write pfn to unpoison-pfn:
   echo <pfn> > /sys/kernel/debug/hwpoison/unpoison-pfn
This scenario can be identified by pfn_to_online_page() returning NULL. And ZONE_DEVICE pages are never expected, so we can simply fail if pfn_to_online_page() == NULL to fix the bug.
Link: https://lkml.kernel.org/r/20250828024618.1744895-1-linmiaohe@huawei.com Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") Signed-off-by: Miaohe Lin linmiaohe@huawei.com Suggested-by: David Hildenbrand david@redhat.com Acked-by: David Hildenbrand david@redhat.com Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory-failure.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -2570,10 +2570,9 @@ int unpoison_memory(unsigned long pfn) static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);
- if (!pfn_valid(pfn)) - return -ENXIO; - - p = pfn_to_page(pfn); + p = pfn_to_online_page(pfn); + if (!p) + return -EIO; folio = page_folio(p);
mutex_lock(&mf_mutex);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kyle Meyer kyle.meyer@hpe.com
commit 3be306cccdccede13e3cefd0c14e430cc2b7c9c7 upstream.
Duplicate memory errors can be reported by multiple sources.
Passing an already poisoned page to action_result() causes issues:
* The amount of hardware corrupted memory is incorrectly updated.
* Per NUMA node MF stats are incorrectly updated.
* Redundant "already poisoned" messages are printed.
Avoid those issues by:
* Skipping hardware corrupted memory updates for already poisoned pages.
* Skipping per NUMA node MF stats updates for already poisoned pages.
* Dropping redundant "already poisoned" messages.
Make MF_MSG_ALREADY_POISONED consistent with other action_page_types and make calls to action_result() consistent for already poisoned normal pages and huge pages.
Link: https://lkml.kernel.org/r/aLCiHMy12Ck3ouwC@hpe.com Fixes: b8b9488d50b7 ("mm/memory-failure: improve memory failure action_result messages") Signed-off-by: Kyle Meyer kyle.meyer@hpe.com Reviewed-by: Jiaqi Yan jiaqiyan@google.com Acked-by: David Hildenbrand david@redhat.com Reviewed-by: Jane Chu jane.chu@oracle.com Acked-by: Miaohe Lin linmiaohe@huawei.com Cc: Borislav Betkov bp@alien8.de Cc: Kyle Meyer kyle.meyer@hpe.com Cc: Liam Howlett liam.howlett@oracle.com Cc: Lorenzo Stoakes lorenzo.stoakes@oracle.com Cc: "Luck, Tony" tony.luck@intel.com Cc: Michal Hocko mhocko@suse.com Cc: Mike Rapoport rppt@kernel.org Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: Oscar Salvador osalvador@suse.de Cc: Russ Anderson russ.anderson@hpe.com Cc: Suren Baghdasaryan surenb@google.com Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory-failure.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
--- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -948,7 +948,7 @@ static const char * const action_page_ty [MF_MSG_BUDDY] = "free buddy page", [MF_MSG_DAX] = "dax page", [MF_MSG_UNSPLIT_THP] = "unsplit thp", - [MF_MSG_ALREADY_POISONED] = "already poisoned", + [MF_MSG_ALREADY_POISONED] = "already poisoned page", [MF_MSG_UNKNOWN] = "unknown page", };
@@ -1341,9 +1341,10 @@ static int action_result(unsigned long p { trace_memory_failure_event(pfn, type, result);
- num_poisoned_pages_inc(pfn); - - update_per_node_mf_stats(pfn, result); + if (type != MF_MSG_ALREADY_POISONED) { + num_poisoned_pages_inc(pfn); + update_per_node_mf_stats(pfn, result); + }
pr_err("%#lx: recovery action for %s: %s\n", pfn, action_page_types[type], action_name[result]); @@ -2086,12 +2087,11 @@ retry: *hugetlb = 0; return 0; } else if (res == -EHWPOISON) { - pr_err("%#lx: already hardware poisoned\n", pfn); if (flags & MF_ACTION_REQUIRED) { folio = page_folio(p); res = kill_accessing_process(current, folio_pfn(folio), flags); - action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED); } + action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED); return res; } else if (res == -EBUSY) { if (!(flags & MF_NO_RETRY)) { @@ -2273,7 +2273,6 @@ try_again: goto unlock_mutex;
if (TestSetPageHWPoison(p)) { - pr_err("%#lx: already hardware poisoned\n", pfn); res = -EHWPOISON; if (flags & MF_ACTION_REQUIRED) res = kill_accessing_process(current, pfn, flags);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sang-Heon Jeon ekffu200098@gmail.com
commit ce652aac9c90a96c6536681d17518efb1f660fb8 upstream.
The kernel initializes the "jiffies" timer as 5 minutes below zero, as shown in include/linux/jiffies.h:
/*
 * Have the 32 bit jiffies value wrap 5 minutes after boot
 * so jiffies wrap bugs show up earlier.
 */
#define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))
And the jiffies comparison helper functions cast the unsigned values to signed to cover wraparound:
#define time_after_eq(a,b) \
	(typecheck(unsigned long, a) && \
	 typecheck(unsigned long, b) && \
	 ((long)((a) - (b)) >= 0))
When quota->charged_from is initialized to 0, time_after_eq() can incorrectly return FALSE even after reset_interval has elapsed. This occurs when (jiffies - reset_interval) produces a value with MSB=1, which is interpreted as negative in signed arithmetic.
This issue primarily affects 32-bit systems because:

On 64-bit systems: MSB=1 values occur only after ~292 million years from boot (assuming HZ=1000), which is practically impossible.

On 32-bit systems: MSB=1 values occur during the first 5 minutes after boot, and during the second half of every jiffies wraparound cycle, starting from day 25 (assuming HZ=1000).
When the above unexpected FALSE return from time_after_eq() occurs, the charging window will not reset. The user impact depends on the esz value at that time.
If esz is 0, scheme ignores configured quotas and runs without any limits.
If esz is not 0, scheme stops working once the quota is exhausted. It remains until the charging window finally resets.
So, set quota->charged_from to jiffies in damos_adjust_quota() when it is considered the first charge window. With this change, we avoid the unexpected FALSE return from time_after_eq().
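A small demonstration of the failing comparison on a 32-bit system (hypothetical reset_interval; HZ=1000):

        /* right after boot: jiffies = INITIAL_JIFFIES = 0xFFFB6C20 */
        unsigned long a = 0xFFFB6C20UL;               /* jiffies          */
        unsigned long b = 0 + msecs_to_jiffies(1000); /* charged_from = 0 */
        /* time_after_eq(a, b) computes (long)(a - b) >= 0;
         * a - b = 0xFFFB6838 is negative as a signed 32-bit value, so
         * the comparison returns FALSE although far more than one
         * reset_interval has elapsed since charged_from. */

Seeding charged_from with the current jiffies keeps the subtraction small, so the signed comparison behaves as intended.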
Link: https://lkml.kernel.org/r/20250822025057.1740854-1-ekffu200098@gmail.com Fixes: 2b8a248d5873 ("mm/damon/schemes: implement size quota for schemes application speed control") # 5.16 Signed-off-by: Sang-Heon Jeon ekffu200098@gmail.com Reviewed-by: SeongJae Park sj@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/damon/core.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -1596,6 +1596,10 @@ static void damos_adjust_quota(struct da if (!quota->ms && !quota->sz && list_empty("a->goals)) return;
+ /* First charge window */ + if (!quota->total_charged_sz && !quota->charged_from) + quota->charged_from = jiffies; + /* New charge window starts */ if (time_after_eq(jiffies, quota->charged_from + msecs_to_jiffies(quota->reset_interval))) {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quanmin Yan yanquanmin1@huawei.com
commit 711f19dfd783ffb37ca4324388b9c4cb87e71363 upstream.
Patch series "mm/damon: avoid divide-by-zero in DAMON module's parameters application".
DAMON's RECLAIM and LRU_SORT modules perform no validation on user-configured parameters during application, which may lead to division-by-zero errors.
Avoid the divide-by-zero by adding validation checks when DAMON modules attempt to apply the parameters.
This patch (of 2):
During the calculation of 'hot_thres' and 'cold_thres', either 'sample_interval' or 'aggr_interval' is used as the divisor, which may lead to division-by-zero errors. Fix it by directly returning -EINVAL when such a case occurs. Additionally, since 'aggr_interval' is already required to be set no smaller than 'sample_interval' in damon_set_attrs(), only the case where 'sample_interval' is zero needs to be checked.
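Roughly, the thresholds are derived from a per-aggregation-window sample count, which is where the division lives (a simplified sketch of the relationship, not the exact lru_sort code):

        /* samples per aggregation window; divides by sample_interval */
        unsigned int max_nr_accesses =
                attrs->aggr_interval / attrs->sample_interval;
        /* hot_thres scales the sample count by an access-frequency permille */
        unsigned int hot_thres =
                max_nr_accesses * hot_thres_access_freq / 1000;

With sample_interval == 0 the first expression divides by zero, hence the new -EINVAL check before the attributes are applied.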
Link: https://lkml.kernel.org/r/20250827115858.1186261-2-yanquanmin1@huawei.com Fixes: 40e983cca927 ("mm/damon: introduce DAMON-based LRU-lists Sorting") Signed-off-by: Quanmin Yan yanquanmin1@huawei.com Reviewed-by: SeongJae Park sj@kernel.org Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: ze zuo zuoze1@huawei.com Cc: stable@vger.kernel.org [6.0+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/damon/lru_sort.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/mm/damon/lru_sort.c +++ b/mm/damon/lru_sort.c @@ -198,6 +198,11 @@ static int damon_lru_sort_apply_paramete if (err) return err;
+ if (!damon_lru_sort_mon_attrs.sample_interval) { + err = -EINVAL; + goto out; + } + err = damon_set_attrs(ctx, &damon_lru_sort_mon_attrs); if (err) goto out;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan@kernel.org
commit 4de37a48b6b58faaded9eb765047cf0d8785ea18 upstream.
The for_each_child_of_node() helper drops the reference it takes to each node as it iterates over children, and an explicit of_node_put() is only needed when exiting the loop early.
Drop the recently introduced bogus additional reference count decrement at each iteration that could potentially lead to a use-after-free.
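For context, the helper's reference semantics, sketched as a generic pattern (found_what_we_need() is a hypothetical predicate, not the mediatek code):

        struct device_node *child;

        for_each_child_of_node(parent, child) {
                if (found_what_we_need(child)) {
                        /* early exit: we still hold a reference on
                         * 'child', so it must be dropped explicitly */
                        of_node_put(child);
                        break;
                }
                /* no of_node_put() here: the iterator drops the
                 * reference when it advances to the next child */
        }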
Fixes: 1f403699c40f ("drm/mediatek: Fix device/node reference count leaks in mtk_drm_get_all_drm_priv") Cc: Ma Ke make24@iscas.ac.cn Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Reviewed-by: CK Hu ck.hu@mediatek.com Reviewed-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Link: https://patchwork.kernel.org/project/dri-devel/patch/20250829090345.21075-2-... Signed-off-by: Chun-Kuang Hu chunkuang.hu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/mediatek/mtk_drm_drv.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c @@ -381,11 +381,11 @@ static bool mtk_drm_get_all_drm_priv(str
of_id = of_match_node(mtk_drm_of_ids, node); if (!of_id) - goto next_put_node; + continue;
pdev = of_find_device_by_node(node); if (!pdev) - goto next_put_node; + continue;
drm_dev = device_find_child(&pdev->dev, NULL, mtk_drm_match); if (!drm_dev) @@ -411,11 +411,10 @@ next_put_device_drm_dev: next_put_device_pdev_dev: put_device(&pdev->dev);
-next_put_node: - of_node_put(node); - - if (cnt == MAX_CRTC) + if (cnt == MAX_CRTC) { + of_node_put(node); break; + } }
if (drm_priv->data->mmsys_dev_num == cnt) {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Hellström thomas.hellstrom@linux.intel.com
commit 5c87fee3c96ce898ad681552404a66c7605193c0 upstream.
VRAM+TT bos that are evicted from VRAM to TT may remain in TT also after a revalidation following eviction or suspend.
This manifests itself as applications becoming sluggish after buffer objects get evicted or after a resume from suspend or hibernation.
If the bo supports placement in both VRAM and TT, and we are on DGFX, mark the TT placement as fallback. This means that it is tried only after VRAM + eviction.
This flaw has probably been present since the xe module was upstreamed, but use a Fixes: commit below where backporting is likely to be simple. For earlier versions we need to open-code the fallback algorithm in the driver.
v2:
- Remove check for dgfx. (Matthew Auld)
- Update the xe_dma_buf kunit test for the new strategy (CI)
- Allow dma-buf to pin in current placement (CI)
- Make xe_bo_validate() for pinned bos a NOP.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5995 Fixes: a78a8da51b36 ("drm/ttm: replace busy placement with flags v6") Cc: Matthew Brost matthew.brost@intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: stable@vger.kernel.org # v6.9+ Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Matthew Auld matthew.auld@intel.com Link: https://lore.kernel.org/r/20250904160715.2613-2-thomas.hellstrom@linux.intel... (cherry picked from commit cb3d7b3b46b799c96b54f8e8fe36794a55a77f0b) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/xe/tests/xe_bo.c | 2 +- drivers/gpu/drm/xe/tests/xe_dma_buf.c | 10 +--------- drivers/gpu/drm/xe/xe_bo.c | 16 ++++++++++++---- drivers/gpu/drm/xe/xe_bo.h | 2 +- drivers/gpu/drm/xe/xe_dma_buf.c | 2 +- 5 files changed, 16 insertions(+), 16 deletions(-)
--- a/drivers/gpu/drm/xe/tests/xe_bo.c +++ b/drivers/gpu/drm/xe/tests/xe_bo.c @@ -222,7 +222,7 @@ static int evict_test_run_tile(struct xe }
xe_bo_lock(external, false); - err = xe_bo_pin_external(external); + err = xe_bo_pin_external(external, false); xe_bo_unlock(external); if (err) { KUNIT_FAIL(test, "external bo pin err=%pe\n", --- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c +++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c @@ -89,15 +89,7 @@ static void check_residency(struct kunit return; }
- /* - * If on different devices, the exporter is kept in system if - * possible, saving a migration step as the transfer is just - * likely as fast from system memory. - */ - if (params->mem_mask & XE_BO_FLAG_SYSTEM) - KUNIT_EXPECT_TRUE(test, xe_bo_is_mem_type(exported, XE_PL_TT)); - else - KUNIT_EXPECT_TRUE(test, xe_bo_is_mem_type(exported, mem_type)); + KUNIT_EXPECT_TRUE(test, xe_bo_is_mem_type(exported, mem_type));
if (params->force_different_devices) KUNIT_EXPECT_TRUE(test, xe_bo_is_mem_type(imported, XE_PL_TT)); --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -157,6 +157,8 @@ static void try_add_system(struct xe_dev
bo->placements[*c] = (struct ttm_place) { .mem_type = XE_PL_TT, + .flags = (bo_flags & XE_BO_FLAG_VRAM_MASK) ? + TTM_PL_FLAG_FALLBACK : 0, }; *c += 1; } @@ -1743,6 +1745,7 @@ uint64_t vram_region_gpu_offset(struct t /** * xe_bo_pin_external - pin an external BO * @bo: buffer object to be pinned + * @in_place: Pin in current placement, don't attempt to migrate. * * Pin an external (not tied to a VM, can be exported via dma-buf / prime FD) * BO. Unique call compared to xe_bo_pin as this function has it own set of @@ -1750,7 +1753,7 @@ uint64_t vram_region_gpu_offset(struct t * * Returns 0 for success, negative error code otherwise. */ -int xe_bo_pin_external(struct xe_bo *bo) +int xe_bo_pin_external(struct xe_bo *bo, bool in_place) { struct xe_device *xe = xe_bo_device(bo); int err; @@ -1759,9 +1762,11 @@ int xe_bo_pin_external(struct xe_bo *bo) xe_assert(xe, xe_bo_is_user(bo));
if (!xe_bo_is_pinned(bo)) { - err = xe_bo_validate(bo, NULL, false); - if (err) - return err; + if (!in_place) { + err = xe_bo_validate(bo, NULL, false); + if (err) + return err; + }
if (xe_bo_is_vram(bo)) { spin_lock(&xe->pinned.lock); @@ -1913,6 +1918,9 @@ int xe_bo_validate(struct xe_bo *bo, str .no_wait_gpu = false, };
+ if (xe_bo_is_pinned(bo)) + return 0; + if (vm) { lockdep_assert_held(&vm->lock); xe_vm_assert_held(vm); --- a/drivers/gpu/drm/xe/xe_bo.h +++ b/drivers/gpu/drm/xe/xe_bo.h @@ -173,7 +173,7 @@ static inline void xe_bo_unlock_vm_held( } }
-int xe_bo_pin_external(struct xe_bo *bo); +int xe_bo_pin_external(struct xe_bo *bo, bool in_place); int xe_bo_pin(struct xe_bo *bo); void xe_bo_unpin_external(struct xe_bo *bo); void xe_bo_unpin(struct xe_bo *bo); --- a/drivers/gpu/drm/xe/xe_dma_buf.c +++ b/drivers/gpu/drm/xe/xe_dma_buf.c @@ -72,7 +72,7 @@ static int xe_dma_buf_pin(struct dma_buf return ret; }
- ret = xe_bo_pin_external(bo); + ret = xe_bo_pin_external(bo, true); xe_assert(xe, !ret);
return 0;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Rosca david.rosca@amd.com
commit 3318f2d20ce48849855df5e190813826d0bc3653 upstream.
There is no reason to require this to happen on the first submitted IB only. We need to wait for the queue to be idle, but that can be done at any time (including when there are multiple video sessions active).
Signed-off-by: David Rosca david.rosca@amd.com Reviewed-by: Leo Liu leo.liu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 8908fdce0634a623404e9923ed2f536101a39db5) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 12 ++++++++---- drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 12 ++++++++---- 2 files changed, 16 insertions(+), 8 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c @@ -1813,15 +1813,19 @@ static int vcn_v3_0_limit_sched(struct a struct amdgpu_job *job) { struct drm_gpu_scheduler **scheds; - - /* The create msg must be in the first IB submitted */ - if (atomic_read(&job->base.entity->fence_seq)) - return -EINVAL; + struct dma_fence *fence;
/* if VCN0 is harvested, we can't support AV1 */ if (p->adev->vcn.harvest_config & AMDGPU_VCN_HARVEST_VCN0) return -EINVAL;
+ /* wait for all jobs to finish before switching to instance 0 */ + fence = amdgpu_ctx_get_fence(p->ctx, job->base.entity, ~0ull); + if (fence) { + dma_fence_wait(fence, false); + dma_fence_put(fence); + } + scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_DEC] [AMDGPU_RING_PRIO_DEFAULT].sched; drm_sched_entity_modify_sched(job->base.entity, scheds, 1); --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c @@ -1737,15 +1737,19 @@ static int vcn_v4_0_limit_sched(struct a struct amdgpu_job *job) { struct drm_gpu_scheduler **scheds; - - /* The create msg must be in the first IB submitted */ - if (atomic_read(&job->base.entity->fence_seq)) - return -EINVAL; + struct dma_fence *fence;
/* if VCN0 is harvested, we can't support AV1 */ if (p->adev->vcn.harvest_config & AMDGPU_VCN_HARVEST_VCN0) return -EINVAL;
+ /* wait for all jobs to finish before switching to instance 0 */ + fence = amdgpu_ctx_get_fence(p->ctx, job->base.entity, ~0ull); + if (fence) { + dma_fence_wait(fence, false); + dma_fence_put(fence); + } + scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_ENC] [AMDGPU_RING_PRIO_0].sched; drm_sched_entity_modify_sched(job->base.entity, scheds, 1);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Rosca david.rosca@amd.com
commit 2b10cb58d7a3fd621ec9b2ba765a092e562ef998 upstream.
There can be multiple engine info packages in one IB, and the first one may be a common engine, not decode/encode. We need to parse the entire IB instead of stopping after finding the first engine info.
Signed-off-by: David Rosca david.rosca@amd.com Reviewed-by: Leo Liu leo.liu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit dc8f9f0f45166a6b37864e7a031c726981d6e5fc) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 52 +++++++++++++--------------------- 1 file changed, 21 insertions(+), 31 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c @@ -1840,22 +1840,16 @@ out:
#define RADEON_VCN_ENGINE_TYPE_ENCODE (0x00000002) #define RADEON_VCN_ENGINE_TYPE_DECODE (0x00000003) - #define RADEON_VCN_ENGINE_INFO (0x30000001) -#define RADEON_VCN_ENGINE_INFO_MAX_OFFSET 16 - #define RENCODE_ENCODE_STANDARD_AV1 2 #define RENCODE_IB_PARAM_SESSION_INIT 0x00000003 -#define RENCODE_IB_PARAM_SESSION_INIT_MAX_OFFSET 64
-/* return the offset in ib if id is found, -1 otherwise - * to speed up the searching we only search upto max_offset - */ -static int vcn_v4_0_enc_find_ib_param(struct amdgpu_ib *ib, uint32_t id, int max_offset) +/* return the offset in ib if id is found, -1 otherwise */ +static int vcn_v4_0_enc_find_ib_param(struct amdgpu_ib *ib, uint32_t id, int start) { int i;
- for (i = 0; i < ib->length_dw && i < max_offset && ib->ptr[i] >= 8; i += ib->ptr[i]/4) { + for (i = start; i < ib->length_dw && ib->ptr[i] >= 8; i += ib->ptr[i] / 4) { if (ib->ptr[i + 1] == id) return i; } @@ -1870,33 +1864,29 @@ static int vcn_v4_0_ring_patch_cs_in_pla struct amdgpu_vcn_decode_buffer *decode_buffer; uint64_t addr; uint32_t val; - int idx; + int idx = 0, sidx;
/* The first instance can decode anything */ if (!ring->me) return 0;
- /* RADEON_VCN_ENGINE_INFO is at the top of ib block */ - idx = vcn_v4_0_enc_find_ib_param(ib, RADEON_VCN_ENGINE_INFO, - RADEON_VCN_ENGINE_INFO_MAX_OFFSET); - if (idx < 0) /* engine info is missing */ - return 0; - - val = amdgpu_ib_get_value(ib, idx + 2); /* RADEON_VCN_ENGINE_TYPE */ - if (val == RADEON_VCN_ENGINE_TYPE_DECODE) { - decode_buffer = (struct amdgpu_vcn_decode_buffer *)&ib->ptr[idx + 6]; - - if (!(decode_buffer->valid_buf_flag & 0x1)) - return 0; - - addr = ((u64)decode_buffer->msg_buffer_address_hi) << 32 | - decode_buffer->msg_buffer_address_lo; - return vcn_v4_0_dec_msg(p, job, addr); - } else if (val == RADEON_VCN_ENGINE_TYPE_ENCODE) { - idx = vcn_v4_0_enc_find_ib_param(ib, RENCODE_IB_PARAM_SESSION_INIT, - RENCODE_IB_PARAM_SESSION_INIT_MAX_OFFSET); - if (idx >= 0 && ib->ptr[idx + 2] == RENCODE_ENCODE_STANDARD_AV1) - return vcn_v4_0_limit_sched(p, job); + while ((idx = vcn_v4_0_enc_find_ib_param(ib, RADEON_VCN_ENGINE_INFO, idx)) >= 0) { + val = amdgpu_ib_get_value(ib, idx + 2); /* RADEON_VCN_ENGINE_TYPE */ + if (val == RADEON_VCN_ENGINE_TYPE_DECODE) { + decode_buffer = (struct amdgpu_vcn_decode_buffer *)&ib->ptr[idx + 6]; + + if (!(decode_buffer->valid_buf_flag & 0x1)) + return 0; + + addr = ((u64)decode_buffer->msg_buffer_address_hi) << 32 | + decode_buffer->msg_buffer_address_lo; + return vcn_v4_0_dec_msg(p, job, addr); + } else if (val == RADEON_VCN_ENGINE_TYPE_ENCODE) { + sidx = vcn_v4_0_enc_find_ib_param(ib, RENCODE_IB_PARAM_SESSION_INIT, idx); + if (sidx >= 0 && ib->ptr[sidx + 2] == RENCODE_ENCODE_STANDARD_AV1) + return vcn_v4_0_limit_sched(p, job); + } + idx += ib->ptr[idx] / 4; } return 0; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Matthieu Baerts (NGI0)" matttbe@kernel.org
[ Upstream commit 6b830c6a023ff6e8fe05dbe47a9e5cd276df09ee ]
This attribute is added with the 'created' and 'established' events, but the documentation didn't mention it.
The documentation in the UAPI header has been auto-generated by:
./tools/net/ynl/ynl-regen.sh
Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20241221-net-mptcp-netlink-specs-pm-doc-fixes-v2-1-... Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 7094b84863e5 ("netlink: specs: mptcp: fix if-idx attribute type") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/netlink/specs/mptcp_pm.yaml | 6 ++++-- include/uapi/linux/mptcp_pm.h | 11 ++++++----- 2 files changed, 10 insertions(+), 7 deletions(-)
--- a/Documentation/netlink/specs/mptcp_pm.yaml +++ b/Documentation/netlink/specs/mptcp_pm.yaml @@ -23,7 +23,8 @@ definitions: - name: created doc: - token, family, saddr4 | saddr6, daddr4 | daddr6, sport, dport + token, family, saddr4 | saddr6, daddr4 | daddr6, sport, dport, + server-side A new MPTCP connection has been created. It is the good time to allocate memory and send ADD_ADDR if needed. Depending on the traffic-patterns it can take a long time until the @@ -31,7 +32,8 @@ definitions: - name: established doc: - token, family, saddr4 | saddr6, daddr4 | daddr6, sport, dport + token, family, saddr4 | saddr6, daddr4 | daddr6, sport, dport, + server-side A MPTCP connection is established (can start new subflows). - name: closed --- a/include/uapi/linux/mptcp_pm.h +++ b/include/uapi/linux/mptcp_pm.h @@ -13,12 +13,13 @@ * enum mptcp_event_type * @MPTCP_EVENT_UNSPEC: unused event * @MPTCP_EVENT_CREATED: token, family, saddr4 | saddr6, daddr4 | daddr6, - * sport, dport A new MPTCP connection has been created. It is the good time - * to allocate memory and send ADD_ADDR if needed. Depending on the - * traffic-patterns it can take a long time until the MPTCP_EVENT_ESTABLISHED - * is sent. + * sport, dport, server-side A new MPTCP connection has been created. It is + * the good time to allocate memory and send ADD_ADDR if needed. Depending on + * the traffic-patterns it can take a long time until the + * MPTCP_EVENT_ESTABLISHED is sent. * @MPTCP_EVENT_ESTABLISHED: token, family, saddr4 | saddr6, daddr4 | daddr6, - * sport, dport A MPTCP connection is established (can start new subflows). + * sport, dport, server-side A MPTCP connection is established (can start new + * subflows). * @MPTCP_EVENT_CLOSED: token A MPTCP connection has stopped. * @MPTCP_EVENT_ANNOUNCED: token, rem_id, family, daddr4 | daddr6 [, dport] A * new address has been announced by the peer.
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Matthieu Baerts (NGI0)" matttbe@kernel.org
[ Upstream commit bea87657b5ee8e6f18af2833ee4b88212ef52d28 ]
The rendered version of the MPTCP events [1] looked strange, because the whole content of the 'doc' was displayed in the same block.
It was then not clear that the first words, which did not even end with a period, were the attributes that are defined when such events are emitted. These attributes have now been moved to the end, prefixed by 'Attributes:' and ended with a period. Note that '>-' has been added after 'doc:' to allow ':' in the text below.
The documentation in the UAPI header has been auto-generated by:
./tools/net/ynl/ynl-regen.sh
Link: https://docs.kernel.org/networking/netlink_spec/mptcp_pm.html#event-type [1] Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20241221-net-mptcp-netlink-specs-pm-doc-fixes-v2-2-... Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 7094b84863e5 ("netlink: specs: mptcp: fix if-idx attribute type") Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/netlink/specs/mptcp_pm.yaml | 50 ++++++++++++++-------------- include/uapi/linux/mptcp_pm.h | 53 +++++++++++++++--------------- 2 files changed, 52 insertions(+), 51 deletions(-)
--- a/Documentation/netlink/specs/mptcp_pm.yaml
+++ b/Documentation/netlink/specs/mptcp_pm.yaml
@@ -22,67 +22,67 @@ definitions:
         doc: unused event
       -
         name: created
-        doc:
-          token, family, saddr4 | saddr6, daddr4 | daddr6, sport, dport,
-          server-side
+        doc: >-
+          A new MPTCP connection has been created. It is the good time to
+          allocate memory and send ADD_ADDR if needed. Depending on the
+          traffic-patterns it can take a long time until the
+          MPTCP_EVENT_ESTABLISHED is sent.
+
+          Attributes: token, family, saddr4 | saddr6, daddr4 | daddr6, sport,
+          dport, server-side.
       -
         name: established
-        doc:
-          token, family, saddr4 | saddr6, daddr4 | daddr6, sport, dport,
-          server-side
+        doc: >-
+          A MPTCP connection is established (can start new subflows).
+
+          Attributes: token, family, saddr4 | saddr6, daddr4 | daddr6, sport,
+          dport, server-side.
       -
         name: closed
-        doc:
-          token
+        doc: >-
+          A MPTCP connection has stopped.
+
+          Attribute: token.
       -
         name: announced
         value: 6
-        doc:
-          token, rem_id, family, daddr4 | daddr6 [, dport]
+        doc: >-
+          A new address has been announced by the peer.
+
+          Attributes: token, rem_id, family, daddr4 | daddr6 [, dport].
       -
         name: removed
-        doc:
-          token, rem_id
+        doc: >-
+          An address has been lost by the peer.
+
+          Attributes: token, rem_id.
       -
         name: sub-established
        value: 10
-        doc:
-          token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 | daddr6, sport,
-          dport, backup, if_idx [, error]
+        doc: >-
+          A new subflow has been established. 'error' should not be set.
+
+          Attributes: token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 |
+          daddr6, sport, dport, backup, if_idx [, error].
       -
         name: sub-closed
-        doc:
-          token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 | daddr6, sport,
-          dport, backup, if_idx [, error]
+        doc: >-
+          A subflow has been closed. An error (copy of sk_err) could be set if
+          an error has been detected for this subflow.
+
+          Attributes: token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 |
+          daddr6, sport, dport, backup, if_idx [, error].
       -
         name: sub-priority
         value: 13
-        doc:
-          token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 | daddr6, sport,
-          dport, backup, if_idx [, error]
+        doc: >-
+          The priority of a subflow has changed. 'error' should not be set.
+
+          Attributes: token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 |
+          daddr6, sport, dport, backup, if_idx [, error].
       -
         name: listener-created
         value: 15
-        doc:
-          family, sport, saddr4 | saddr6
+        doc: >-
+          A new PM listener is created.
+
+          Attributes: family, sport, saddr4 | saddr6.
       -
         name: listener-closed
-        doc:
-          family, sport, saddr4 | saddr6
+        doc: >-
+          A PM listener is closed.
+
+          Attributes: family, sport, saddr4 | saddr6.
 
 attribute-sets:
   -
--- a/include/uapi/linux/mptcp_pm.h
+++ b/include/uapi/linux/mptcp_pm.h
@@ -12,32 +12,33 @@
 /**
  * enum mptcp_event_type
  * @MPTCP_EVENT_UNSPEC: unused event
- * @MPTCP_EVENT_CREATED: token, family, saddr4 | saddr6, daddr4 | daddr6,
- * sport, dport, server-side A new MPTCP connection has been created. It is
- * the good time to allocate memory and send ADD_ADDR if needed. Depending on
- * the traffic-patterns it can take a long time until the
- * MPTCP_EVENT_ESTABLISHED is sent.
- * @MPTCP_EVENT_ESTABLISHED: token, family, saddr4 | saddr6, daddr4 | daddr6,
- * sport, dport, server-side A MPTCP connection is established (can start new
- * subflows).
- * @MPTCP_EVENT_CLOSED: token A MPTCP connection has stopped.
- * @MPTCP_EVENT_ANNOUNCED: token, rem_id, family, daddr4 | daddr6 [, dport] A
- * new address has been announced by the peer.
- * @MPTCP_EVENT_REMOVED: token, rem_id An address has been lost by the peer.
- * @MPTCP_EVENT_SUB_ESTABLISHED: token, family, loc_id, rem_id, saddr4 |
- * saddr6, daddr4 | daddr6, sport, dport, backup, if_idx [, error] A new
- * subflow has been established. 'error' should not be set.
- * @MPTCP_EVENT_SUB_CLOSED: token, family, loc_id, rem_id, saddr4 | saddr6,
- * daddr4 | daddr6, sport, dport, backup, if_idx [, error] A subflow has been
- * closed. An error (copy of sk_err) could be set if an error has been
- * detected for this subflow.
- * @MPTCP_EVENT_SUB_PRIORITY: token, family, loc_id, rem_id, saddr4 | saddr6,
- * daddr4 | daddr6, sport, dport, backup, if_idx [, error] The priority of a
- * subflow has changed. 'error' should not be set.
- * @MPTCP_EVENT_LISTENER_CREATED: family, sport, saddr4 | saddr6 A new PM
- * listener is created.
- * @MPTCP_EVENT_LISTENER_CLOSED: family, sport, saddr4 | saddr6 A PM listener
- * is closed.
+ * @MPTCP_EVENT_CREATED: A new MPTCP connection has been created. It is the
+ * good time to allocate memory and send ADD_ADDR if needed. Depending on the
+ * traffic-patterns it can take a long time until the MPTCP_EVENT_ESTABLISHED
+ * is sent. Attributes: token, family, saddr4 | saddr6, daddr4 | daddr6,
+ * sport, dport, server-side.
+ * @MPTCP_EVENT_ESTABLISHED: A MPTCP connection is established (can start new
+ * subflows). Attributes: token, family, saddr4 | saddr6, daddr4 | daddr6,
+ * sport, dport, server-side.
+ * @MPTCP_EVENT_CLOSED: A MPTCP connection has stopped. Attribute: token.
+ * @MPTCP_EVENT_ANNOUNCED: A new address has been announced by the peer.
+ * Attributes: token, rem_id, family, daddr4 | daddr6 [, dport].
+ * @MPTCP_EVENT_REMOVED: An address has been lost by the peer. Attributes:
+ * token, rem_id.
+ * @MPTCP_EVENT_SUB_ESTABLISHED: A new subflow has been established. 'error'
+ * should not be set. Attributes: token, family, loc_id, rem_id, saddr4 |
+ * saddr6, daddr4 | daddr6, sport, dport, backup, if_idx [, error].
+ * @MPTCP_EVENT_SUB_CLOSED: A subflow has been closed. An error (copy of
+ * sk_err) could be set if an error has been detected for this subflow.
+ * Attributes: token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 |
+ * daddr6, sport, dport, backup, if_idx [, error].
+ * @MPTCP_EVENT_SUB_PRIORITY: The priority of a subflow has changed. 'error'
+ * should not be set. Attributes: token, family, loc_id, rem_id, saddr4 |
+ * saddr6, daddr4 | daddr6, sport, dport, backup, if_idx [, error].
+ * @MPTCP_EVENT_LISTENER_CREATED: A new PM listener is created. Attributes:
+ * family, sport, saddr4 | saddr6.
+ * @MPTCP_EVENT_LISTENER_CLOSED: A PM listener is closed. Attributes: family,
+ * sport, saddr4 | saddr6.
  */
 enum mptcp_event_type {
 	MPTCP_EVENT_UNSPEC,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 9e6dd4c256d0774701637b958ba682eff4991277 ]
We're trying to add a strict regexp for the name format in the spec. Underscores will not be allowed; dashes should be used instead. This makes no difference to C (codegen, if used, replaces special chars in names) but it gives more uniform naming in Python.
Fixes: bc8aeb2045e2 ("Documentation: netlink: add a YAML spec for mptcp")
Reviewed-by: Davide Caratti dcaratti@redhat.com
Reviewed-by: Donald Hunter donald.hunter@gmail.com
Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org
Link: https://patch.msgid.link/20250624211002.3475021-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski kuba@kernel.org
Stable-dep-of: 7094b84863e5 ("netlink: specs: mptcp: fix if-idx attribute type")
Signed-off-by: Sasha Levin sashal@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/netlink/specs/mptcp_pm.yaml | 8 ++++----
 include/uapi/linux/mptcp_pm.h             | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)
--- a/Documentation/netlink/specs/mptcp_pm.yaml
+++ b/Documentation/netlink/specs/mptcp_pm.yaml
@@ -57,21 +57,21 @@ definitions:
         doc: >-
           A new subflow has been established. 'error' should not be set.
 
           Attributes: token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 |
-          daddr6, sport, dport, backup, if_idx [, error].
+          daddr6, sport, dport, backup, if-idx [, error].
       -
         name: sub-closed
         doc: >-
           A subflow has been closed. An error (copy of sk_err) could be set if
           an error has been detected for this subflow.
 
           Attributes: token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 |
-          daddr6, sport, dport, backup, if_idx [, error].
+          daddr6, sport, dport, backup, if-idx [, error].
       -
         name: sub-priority
         value: 13
         doc: >-
           The priority of a subflow has changed. 'error' should not be set.
 
           Attributes: token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 |
-          daddr6, sport, dport, backup, if_idx [, error].
+          daddr6, sport, dport, backup, if-idx [, error].
       -
         name: listener-created
         value: 15
@@ -255,7 +255,7 @@ attribute-sets:
         name: timeout
         type: u32
       -
-        name: if_idx
+        name: if-idx
         type: u32
       -
         name: reset-reason
--- a/include/uapi/linux/mptcp_pm.h
+++ b/include/uapi/linux/mptcp_pm.h
@@ -27,14 +27,14 @@
  * token, rem_id.
  * @MPTCP_EVENT_SUB_ESTABLISHED: A new subflow has been established. 'error'
  * should not be set. Attributes: token, family, loc_id, rem_id, saddr4 |
- * saddr6, daddr4 | daddr6, sport, dport, backup, if_idx [, error].
+ * saddr6, daddr4 | daddr6, sport, dport, backup, if-idx [, error].
  * @MPTCP_EVENT_SUB_CLOSED: A subflow has been closed. An error (copy of
  * sk_err) could be set if an error has been detected for this subflow.
  * Attributes: token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 |
- * daddr6, sport, dport, backup, if_idx [, error].
+ * daddr6, sport, dport, backup, if-idx [, error].
  * @MPTCP_EVENT_SUB_PRIORITY: The priority of a subflow has changed. 'error'
  * should not be set. Attributes: token, family, loc_id, rem_id, saddr4 |
- * saddr6, daddr4 | daddr6, sport, dport, backup, if_idx [, error].
+ * saddr6, daddr4 | daddr6, sport, dport, backup, if-idx [, error].
  * @MPTCP_EVENT_LISTENER_CREATED: A new PM listener is created. Attributes:
  * family, sport, saddr4 | saddr6.
  * @MPTCP_EVENT_LISTENER_CLOSED: A PM listener is closed. Attributes: family,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Matthieu Baerts (NGI0)" matttbe@kernel.org
[ Upstream commit 7094b84863e5832cb1cd9c4b9d648904775b6bd9 ]
This attribute is used as a signed number in the code in pm_netlink.c:
nla_put_s32(skb, MPTCP_ATTR_IF_IDX, ssk->sk_bound_dev_if))
The specs should then reflect that. Note that other 'if-idx' attributes from the same .yaml file use a signed number as well.
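For illustration only (this is not part of the patch): with the spec fixed, a netlink consumer would read the value as signed, matching the nla_put_s32() above. The helper below is a hypothetical sketch; only MPTCP_ATTR_IF_IDX and nla_get_s32() are real identifiers.

	/* Hypothetical sketch: read if-idx as s32, consistent with the fix. */
	static int mptcp_event_get_ifindex(struct nlattr **tb)
	{
		if (!tb[MPTCP_ATTR_IF_IDX])
			return 0;
		/* sk_bound_dev_if is a signed int, hence s32 rather than u32 */
		return nla_get_s32(tb[MPTCP_ATTR_IF_IDX]);
	}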
Fixes: bc8aeb2045e2 ("Documentation: netlink: add a YAML spec for mptcp")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang geliang@kernel.org
Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
Link: https://patch.msgid.link/20250908-net-mptcp-misc-fixes-6-17-rc5-v1-1-5f2168a...
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/netlink/specs/mptcp_pm.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/Documentation/netlink/specs/mptcp_pm.yaml
+++ b/Documentation/netlink/specs/mptcp_pm.yaml
@@ -256,7 +256,7 @@ attribute-sets:
         type: u32
       -
         name: if-idx
-        type: u32
+        type: s32
       -
         name: reset-reason
         type: u32
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chen Ridong chenridong@huawei.com
commit 3c9ba2777d6c86025e1ba4186dc5cd930e40ec5f upstream.
A use-after-free (UAF) vulnerability was identified in the PSI (Pressure Stall Information) monitoring mechanism:
BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
Read of size 8 at addr ffff3de3d50bd308 by task systemd/1

 psi_trigger_poll+0x3c/0x140
 cgroup_pressure_poll+0x70/0xa0
 cgroup_file_poll+0x8c/0x100
 kernfs_fop_poll+0x11c/0x1c0
 ep_item_poll.isra.0+0x188/0x2c0

Allocated by task 1:
 cgroup_file_open+0x88/0x388
 kernfs_fop_open+0x73c/0xaf0
 do_dentry_open+0x5fc/0x1200
 vfs_open+0xa0/0x3f0
 do_open+0x7e8/0xd08
 path_openat+0x2fc/0x6b0
 do_filp_open+0x174/0x368

Freed by task 8462:
 cgroup_file_release+0x130/0x1f8
 kernfs_drain_open_files+0x17c/0x440
 kernfs_drain+0x2dc/0x360
 kernfs_show+0x1b8/0x288
 cgroup_file_show+0x150/0x268
 cgroup_pressure_write+0x1dc/0x340
 cgroup_file_write+0x274/0x548
Reproduction Steps:
1. Open test/cpu.pressure and establish epoll monitoring
2. Disable monitoring: echo 0 > test/cgroup.pressure
3. Re-enable monitoring: echo 1 > test/cgroup.pressure
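A minimal sketch of that reproducer in C (illustrative only; it assumes a cgroup v2 hierarchy with a 'test' cgroup and omits error handling):

	#include <fcntl.h>
	#include <string.h>
	#include <sys/epoll.h>
	#include <unistd.h>

	int main(void)
	{
		int fd = open("/sys/fs/cgroup/test/cpu.pressure", O_RDWR);
		int ep = epoll_create1(0);
		struct epoll_event ev = { .events = EPOLLPRI }, out;

		/* step 1: arm a PSI trigger and monitor it via epoll */
		write(fd, "some 150000 1000000", strlen("some 150000 1000000"));
		epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);

		/* steps 2 and 3 run from a shell in the meantime:
		 *   echo 0 > /sys/fs/cgroup/test/cgroup.pressure
		 *   echo 1 > /sys/fs/cgroup/test/cgroup.pressure
		 */
		epoll_wait(ep, &out, 1, -1);	/* polling here can hit the UAF */
		close(fd);
		return 0;
	}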
The race condition occurs because:
1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
   - Releases PSI triggers via cgroup_file_release()
   - Frees of->priv through kernfs_drain_open_files()
2. While epoll still holds reference to the file and continues polling
3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv
epolling                          disable/enable cgroup.pressure
fd=open(cpu.pressure)
while(1)
...
epoll_wait
kernfs_fop_poll
kernfs_get_active = true          echo 0 > cgroup.pressure
                                  ...
                                  cgroup_file_show
                                  kernfs_show
                                  // inactive kn
                                  kernfs_drain_open_files
                                  cft->release(of);
                                  kfree(ctx);
                                  ...
kernfs_get_active = false
                                  echo 1 > cgroup.pressure
                                  kernfs_show
                                  kernfs_activate_one(kn);
kernfs_fop_poll
kernfs_get_active = true
cgroup_file_poll
psi_trigger_poll // UAF
...
end: close(fd)
To address this issue, introduce kernfs_get_active_of() for kernfs open files to obtain active references. This function will fail if the open file has been released. Replace kernfs_get_active() with kernfs_get_active_of() to prevent further operations on released file descriptors.
Fixes: 34f26a15611a ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
Cc: stable stable@kernel.org
Reported-by: Zhang Zhaotian zhangzhaotian@huawei.com
Signed-off-by: Chen Ridong chenridong@huawei.com
Acked-by: Tejun Heo tj@kernel.org
Link: https://lore.kernel.org/r/20250822070715.1565236-2-chenridong@huaweicloud.co...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/kernfs/file.c | 58 ++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 38 insertions(+), 20 deletions(-)
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -70,6 +70,24 @@ static struct kernfs_open_node *of_on(st
 					 !list_empty(&of->list));
 }
 
+/* Get active reference to kernfs node for an open file */
+static struct kernfs_open_file *kernfs_get_active_of(struct kernfs_open_file *of)
+{
+	/* Skip if file was already released */
+	if (unlikely(of->released))
+		return NULL;
+
+	if (!kernfs_get_active(of->kn))
+		return NULL;
+
+	return of;
+}
+
+static void kernfs_put_active_of(struct kernfs_open_file *of)
+{
+	return kernfs_put_active(of->kn);
+}
+
 /**
  * kernfs_deref_open_node_locked - Get kernfs_open_node corresponding to @kn
  *
@@ -139,7 +157,7 @@ static void kernfs_seq_stop_active(struc
 
 	if (ops->seq_stop)
 		ops->seq_stop(sf, v);
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 }
 
 static void *kernfs_seq_start(struct seq_file *sf, loff_t *ppos)
@@ -152,7 +170,7 @@ static void *kernfs_seq_start(struct seq
 	 * the ops aren't called concurrently for the same open file.
 	 */
 	mutex_lock(&of->mutex);
-	if (!kernfs_get_active(of->kn))
+	if (!kernfs_get_active_of(of))
 		return ERR_PTR(-ENODEV);
 
 	ops = kernfs_ops(of->kn);
@@ -238,7 +256,7 @@ static ssize_t kernfs_file_read_iter(str
 	 * the ops aren't called concurrently for the same open file.
 	 */
 	mutex_lock(&of->mutex);
-	if (!kernfs_get_active(of->kn)) {
+	if (!kernfs_get_active_of(of)) {
 		len = -ENODEV;
 		mutex_unlock(&of->mutex);
 		goto out_free;
@@ -252,7 +270,7 @@ static ssize_t kernfs_file_read_iter(str
 	else
 		len = -EINVAL;
 
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 	mutex_unlock(&of->mutex);
 
 	if (len < 0)
@@ -323,7 +341,7 @@ static ssize_t kernfs_fop_write_iter(str
 	 * the ops aren't called concurrently for the same open file.
 	 */
 	mutex_lock(&of->mutex);
-	if (!kernfs_get_active(of->kn)) {
+	if (!kernfs_get_active_of(of)) {
 		mutex_unlock(&of->mutex);
 		len = -ENODEV;
 		goto out_free;
@@ -335,7 +353,7 @@ static ssize_t kernfs_fop_write_iter(str
 	else
 		len = -EINVAL;
 
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 	mutex_unlock(&of->mutex);
 
 	if (len > 0)
@@ -357,13 +375,13 @@ static void kernfs_vma_open(struct vm_ar
 	if (!of->vm_ops)
 		return;
 
-	if (!kernfs_get_active(of->kn))
+	if (!kernfs_get_active_of(of))
 		return;
 
 	if (of->vm_ops->open)
 		of->vm_ops->open(vma);
 
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 }
 
 static vm_fault_t kernfs_vma_fault(struct vm_fault *vmf)
@@ -375,14 +393,14 @@ static vm_fault_t kernfs_vma_fault(struc
 	if (!of->vm_ops)
 		return VM_FAULT_SIGBUS;
 
-	if (!kernfs_get_active(of->kn))
+	if (!kernfs_get_active_of(of))
 		return VM_FAULT_SIGBUS;
 
 	ret = VM_FAULT_SIGBUS;
 	if (of->vm_ops->fault)
 		ret = of->vm_ops->fault(vmf);
 
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 	return ret;
 }
 
@@ -395,7 +413,7 @@ static vm_fault_t kernfs_vma_page_mkwrit
 	if (!of->vm_ops)
 		return VM_FAULT_SIGBUS;
 
-	if (!kernfs_get_active(of->kn))
+	if (!kernfs_get_active_of(of))
 		return VM_FAULT_SIGBUS;
 
 	ret = 0;
@@ -404,7 +422,7 @@ static vm_fault_t kernfs_vma_page_mkwrit
 	else
 		file_update_time(file);
 
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 	return ret;
 }
 
@@ -418,14 +436,14 @@ static int kernfs_vma_access(struct vm_a
 	if (!of->vm_ops)
 		return -EINVAL;
 
-	if (!kernfs_get_active(of->kn))
+	if (!kernfs_get_active_of(of))
 		return -EINVAL;
 
 	ret = -EINVAL;
 	if (of->vm_ops->access)
 		ret = of->vm_ops->access(vma, addr, buf, len, write);
 
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 	return ret;
 }
 
@@ -455,7 +473,7 @@ static int kernfs_fop_mmap(struct file *
 	mutex_lock(&of->mutex);
 
 	rc = -ENODEV;
-	if (!kernfs_get_active(of->kn))
+	if (!kernfs_get_active_of(of))
 		goto out_unlock;
 
 	ops = kernfs_ops(of->kn);
@@ -490,7 +508,7 @@ static int kernfs_fop_mmap(struct file *
 	}
 	vma->vm_ops = &kernfs_vm_ops;
 out_put:
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 out_unlock:
 	mutex_unlock(&of->mutex);
 
@@ -852,7 +870,7 @@ static __poll_t kernfs_fop_poll(struct f
 	struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
 	__poll_t ret;
 
-	if (!kernfs_get_active(kn))
+	if (!kernfs_get_active_of(of))
 		return DEFAULT_POLLMASK|EPOLLERR|EPOLLPRI;
 
 	if (kn->attr.ops->poll)
@@ -860,7 +878,7 @@ static __poll_t kernfs_fop_poll(struct f
 	else
 		ret = kernfs_generic_poll(of, wait);
 
-	kernfs_put_active(kn);
+	kernfs_put_active_of(of);
 	return ret;
 }
 
@@ -875,7 +893,7 @@ static loff_t kernfs_fop_llseek(struct f
 	 * the ops aren't called concurrently for the same open file.
 	 */
 	mutex_lock(&of->mutex);
-	if (!kernfs_get_active(of->kn)) {
+	if (!kernfs_get_active_of(of)) {
 		mutex_unlock(&of->mutex);
 		return -ENODEV;
 	}
@@ -886,7 +904,7 @@ static loff_t kernfs_fop_llseek(struct f
 	else
 		ret = generic_file_llseek(file, offset, whence);
 
-	kernfs_put_active(of->kn);
+	kernfs_put_active_of(of);
 	mutex_unlock(&of->mutex);
 	return ret;
 }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilya Dryomov idryomov@gmail.com
commit cdbc9836c7afadad68f374791738f118263c5371 upstream.
There is one place where generic code in messenger.c reads from, and another place where it writes to, the con->v1 union member without checking that that union member is active (i.e. that msgr1 is in use).
On 64-bit systems, con->v1.auth_retry overlaps with con->v2.out_iter, so such a read is almost guaranteed to return a bogus value instead of 0 when msgr2 is in use. This ends up being fairly benign because the side effect is just the invalidation of the authorizer and successive fetching of new tickets.
con->v1.connect_seq overlaps with con->v2.conn_bufs and the fact that it's being written to can cause more serious consequences, but luckily it's not something that happens often.
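To make the aliasing concrete, here is a reduced userspace sketch (the field names merely mirror the description above; the real struct ceph_connection layout is more involved):

	#include <stdio.h>

	struct conn {
		union {
			struct { unsigned long auth_retry; } v1; /* msgr1 state */
			struct { void *out_iter; } v2;           /* msgr2 state */
		};
	};

	int main(void)
	{
		/* msgr2 active: v2 is the live union member */
		struct conn c = { .v2.out_iter = (void *)0xdeadbeefUL };

		/* Reading the inactive v1 member sees v2's bytes, so this
		 * prints 0xdeadbeef instead of the 0 a real auth_retry
		 * would hold. */
		printf("auth_retry = %#lx\n", c.v1.auth_retry);
		return 0;
	}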
Cc: stable@vger.kernel.org
Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)")
Signed-off-by: Ilya Dryomov idryomov@gmail.com
Reviewed-by: Viacheslav Dubeyko Slava.Dubeyko@ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ceph/messenger.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1524,7 +1524,7 @@ static void con_fault_finish(struct ceph
 	 * in case we faulted due to authentication, invalidate our
 	 * current tickets so that we can get new ones.
 	 */
-	if (con->v1.auth_retry) {
+	if (!ceph_msgr2(from_msgr(con->msgr)) && con->v1.auth_retry) {
 		dout("auth_retry %d, invalidating\n", con->v1.auth_retry);
 		if (con->ops->invalidate_authorizer)
 			con->ops->invalidate_authorizer(con);
@@ -1714,9 +1714,10 @@ static void clear_standby(struct ceph_co
 {
 	/* come back from STANDBY? */
 	if (con->state == CEPH_CON_S_STANDBY) {
-		dout("clear_standby %p and ++connect_seq\n", con);
+		dout("clear_standby %p\n", con);
 		con->state = CEPH_CON_S_PREOPEN;
-		con->v1.connect_seq++;
+		if (!ceph_msgr2(from_msgr(con->msgr)))
+			con->v1.connect_seq++;
 		WARN_ON(ceph_con_flag_test(con, CEPH_CON_F_WRITE_PENDING));
 		WARN_ON(ceph_con_flag_test(con, CEPH_CON_F_KEEPALIVE_PENDING));
 	}
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Markuze amarkuze@redhat.com
commit 15f519e9f883b316d86e2bb6b767a023aafd9d83 upstream.
Add validation to ensure the cached parent directory inode matches the directory info in MDS replies. This prevents client-side race conditions where concurrent operations (e.g. rename) cause r_parent to become stale between request initiation and reply processing, which could lead to applying state changes to incorrect directory inodes.
[ idryomov: folded a kerneldoc fixup and a follow-up fix from Alex to move CEPH_CAP_PIN reference when r_parent is updated:
When the parent directory lock is not held, req->r_parent can become stale and is updated to point to the correct inode. However, the associated CEPH_CAP_PIN reference was not being adjusted. The CEPH_CAP_PIN is a reference on an inode that is tracked for accounting purposes. Moving this pin is important to keep the accounting balanced.

When the pin was not moved from the old parent to the new one, it created two problems: The reference on the old, stale parent was never released, causing a reference leak. A reference for the new parent was never acquired, creating the risk of a reference underflow later in ceph_mdsc_release_request().

This patch corrects the logic by releasing the pin from the old parent and acquiring it for the new parent when r_parent is switched. This ensures reference accounting stays balanced. ]
Cc: stable@vger.kernel.org
Signed-off-by: Alex Markuze amarkuze@redhat.com
Reviewed-by: Viacheslav Dubeyko Slava.Dubeyko@ibm.com
Signed-off-by: Ilya Dryomov idryomov@gmail.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ceph/debugfs.c    |  14 +---
 fs/ceph/dir.c        |  17 ++---
 fs/ceph/file.c       |  24 ++-----
 fs/ceph/inode.c      |   7 --
 fs/ceph/mds_client.c | 172 +++++++++++++++++++++++++++++++--------------------
 fs/ceph/mds_client.h |  18 ++++-
 6 files changed, 145 insertions(+), 107 deletions(-)
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -55,8 +55,6 @@ static int mdsc_show(struct seq_file *s,
 	struct ceph_mds_client *mdsc = fsc->mdsc;
 	struct ceph_mds_request *req;
 	struct rb_node *rp;
-	int pathlen = 0;
-	u64 pathbase;
 	char *path;
 
 	mutex_lock(&mdsc->mutex);
@@ -81,8 +79,8 @@ static int mdsc_show(struct seq_file *s,
 		if (req->r_inode) {
 			seq_printf(s, " #%llx", ceph_ino(req->r_inode));
 		} else if (req->r_dentry) {
-			path = ceph_mdsc_build_path(mdsc, req->r_dentry, &pathlen,
-						    &pathbase, 0);
+			struct ceph_path_info path_info;
+			path = ceph_mdsc_build_path(mdsc, req->r_dentry, &path_info, 0);
 			if (IS_ERR(path))
 				path = NULL;
 			spin_lock(&req->r_dentry->d_lock);
@@ -91,7 +89,7 @@ static int mdsc_show(struct seq_file *s,
 				   req->r_dentry, path ? path : "");
 			spin_unlock(&req->r_dentry->d_lock);
-			ceph_mdsc_free_path(path, pathlen);
+			ceph_mdsc_free_path_info(&path_info);
 		} else if (req->r_path1) {
 			seq_printf(s, " #%llx/%s", req->r_ino1.ino,
 				   req->r_path1);
@@ -100,8 +98,8 @@ static int mdsc_show(struct seq_file *s,
 		}
 
 		if (req->r_old_dentry) {
-			path = ceph_mdsc_build_path(mdsc, req->r_old_dentry, &pathlen,
-						    &pathbase, 0);
+			struct ceph_path_info path_info;
+			path = ceph_mdsc_build_path(mdsc, req->r_old_dentry, &path_info, 0);
 			if (IS_ERR(path))
 				path = NULL;
 			spin_lock(&req->r_old_dentry->d_lock);
@@ -111,7 +109,7 @@ static int mdsc_show(struct seq_file *s,
 				   req->r_old_dentry, path ? path : "");
 			spin_unlock(&req->r_old_dentry->d_lock);
-			ceph_mdsc_free_path(path, pathlen);
+			ceph_mdsc_free_path_info(&path_info);
 		} else if (req->r_path2 && req->r_op != CEPH_MDS_OP_SYMLINK) {
 			if (req->r_ino2.ino)
 				seq_printf(s, " #%llx/%s", req->r_ino2.ino,
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1263,10 +1263,8 @@ static void ceph_async_unlink_cb(struct
 
 	/* If op failed, mark everyone involved for errors */
 	if (result) {
-		int pathlen = 0;
-		u64 base = 0;
-		char *path = ceph_mdsc_build_path(mdsc, dentry, &pathlen,
-						  &base, 0);
+		struct ceph_path_info path_info = {0};
+		char *path = ceph_mdsc_build_path(mdsc, dentry, &path_info, 0);
 
 		/* mark error on parent + clear complete */
 		mapping_set_error(req->r_parent->i_mapping, result);
@@ -1280,8 +1278,8 @@ static void ceph_async_unlink_cb(struct
 			mapping_set_error(req->r_old_inode->i_mapping, result);
 
 		pr_warn_client(cl, "failure path=(%llx)%s result=%d!\n",
-			       base, IS_ERR(path) ? "<<bad>>" : path, result);
-		ceph_mdsc_free_path(path, pathlen);
+			       path_info.vino.ino, IS_ERR(path) ? "<<bad>>" : path, result);
+		ceph_mdsc_free_path_info(&path_info);
 	}
 out:
 	iput(req->r_old_inode);
@@ -1339,8 +1337,6 @@ static int ceph_unlink(struct inode *dir
 	int err = -EROFS;
 	int op;
 	char *path;
-	int pathlen;
-	u64 pathbase;
 
 	if (ceph_snap(dir) == CEPH_SNAPDIR) {
 		/* rmdir .snap/foo is RMSNAP */
@@ -1359,14 +1355,15 @@ static int ceph_unlink(struct inode *dir
 		if (!dn) {
 			try_async = false;
 		} else {
-			path = ceph_mdsc_build_path(mdsc, dn, &pathlen, &pathbase, 0);
+			struct ceph_path_info path_info;
+			path = ceph_mdsc_build_path(mdsc, dn, &path_info, 0);
 			if (IS_ERR(path)) {
 				try_async = false;
 				err = 0;
 			} else {
 				err = ceph_mds_check_access(mdsc, path, MAY_WRITE);
 			}
-			ceph_mdsc_free_path(path, pathlen);
+			ceph_mdsc_free_path_info(&path_info);
 			dput(dn);
 
 			/* For none EACCES cases will let the MDS do the mds auth check */
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -368,8 +368,6 @@ int ceph_open(struct inode *inode, struc
 	int flags, fmode, wanted;
 	struct dentry *dentry;
 	char *path;
-	int pathlen;
-	u64 pathbase;
 	bool do_sync = false;
 	int mask = MAY_READ;
 
@@ -399,14 +397,15 @@ int ceph_open(struct inode *inode, struc
 		if (!dentry) {
 			do_sync = true;
 		} else {
-			path = ceph_mdsc_build_path(mdsc, dentry, &pathlen, &pathbase, 0);
+			struct ceph_path_info path_info;
+			path = ceph_mdsc_build_path(mdsc, dentry, &path_info, 0);
 			if (IS_ERR(path)) {
 				do_sync = true;
 				err = 0;
 			} else {
 				err = ceph_mds_check_access(mdsc, path, mask);
 			}
-			ceph_mdsc_free_path(path, pathlen);
+			ceph_mdsc_free_path_info(&path_info);
 			dput(dentry);
 
 			/* For none EACCES cases will let the MDS do the mds auth check */
@@ -614,15 +613,13 @@ static void ceph_async_create_cb(struct
 		mapping_set_error(req->r_parent->i_mapping, result);
 
 	if (result) {
-		int pathlen = 0;
-		u64 base = 0;
-		char *path = ceph_mdsc_build_path(mdsc, req->r_dentry, &pathlen,
-						  &base, 0);
+		struct ceph_path_info path_info = {0};
+		char *path = ceph_mdsc_build_path(mdsc, req->r_dentry, &path_info, 0);
 
 		pr_warn_client(cl,
 			"async create failure path=(%llx)%s result=%d!\n",
-			base, IS_ERR(path) ? "<<bad>>" : path, result);
-		ceph_mdsc_free_path(path, pathlen);
+			path_info.vino.ino, IS_ERR(path) ? "<<bad>>" : path, result);
+		ceph_mdsc_free_path_info(&path_info);
 
 		ceph_dir_clear_complete(req->r_parent);
 		if (!d_unhashed(dentry))
@@ -791,8 +788,6 @@ int ceph_atomic_open(struct inode *dir,
 	int mask;
 	int err;
 	char *path;
-	int pathlen;
-	u64 pathbase;
 
 	doutc(cl, "%p %llx.%llx dentry %p '%pd' %s flags %d mode 0%o\n",
 	      dir, ceph_vinop(dir), dentry, dentry,
@@ -814,7 +809,8 @@ int ceph_atomic_open(struct inode *dir,
 		if (!dn) {
 			try_async = false;
 		} else {
-			path = ceph_mdsc_build_path(mdsc, dn, &pathlen, &pathbase, 0);
+			struct ceph_path_info path_info;
+			path = ceph_mdsc_build_path(mdsc, dn, &path_info, 0);
 			if (IS_ERR(path)) {
 				try_async = false;
 				err = 0;
@@ -826,7 +822,7 @@ int ceph_atomic_open(struct inode *dir,
 				mask |= MAY_WRITE;
 				err = ceph_mds_check_access(mdsc, path, mask);
 			}
-			ceph_mdsc_free_path(path, pathlen);
+			ceph_mdsc_free_path_info(&path_info);
 			dput(dn);
 
 			/* For none EACCES cases will let the MDS do the mds auth check */
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2483,22 +2483,21 @@ int __ceph_setattr(struct mnt_idmap *idm
 	int truncate_retry = 20; /* The RMW will take around 50ms */
 	struct dentry *dentry;
 	char *path;
-	int pathlen;
-	u64 pathbase;
 	bool do_sync = false;
 
 	dentry = d_find_alias(inode);
 	if (!dentry) {
 		do_sync = true;
 	} else {
-		path = ceph_mdsc_build_path(mdsc, dentry, &pathlen, &pathbase, 0);
+		struct ceph_path_info path_info;
+		path = ceph_mdsc_build_path(mdsc, dentry, &path_info, 0);
 		if (IS_ERR(path)) {
 			do_sync = true;
 			err = 0;
 		} else {
 			err = ceph_mds_check_access(mdsc, path, MAY_WRITE);
 		}
-		ceph_mdsc_free_path(path, pathlen);
+		ceph_mdsc_free_path_info(&path_info);
 		dput(dentry);
 
 		/* For none EACCES cases will let the MDS do the mds auth check */
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2686,8 +2686,7 @@ static u8 *get_fscrypt_altname(const str
  * ceph_mdsc_build_path - build a path string to a given dentry
  * @mdsc: mds client
  * @dentry: dentry to which path should be built
- * @plen: returned length of string
- * @pbase: returned base inode number
+ * @path_info: output path, length, base ino+snap, and freepath ownership flag
  * @for_wire: is this path going to be sent to the MDS?
  *
  * Build a string that represents the path to the dentry. This is mostly called
@@ -2705,7 +2704,7 @@ static u8 *get_fscrypt_altname(const str
  * foo/.snap/bar -> foo//bar
  */
 char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc, struct dentry *dentry,
-			   int *plen, u64 *pbase, int for_wire)
+			   struct ceph_path_info *path_info, int for_wire)
 {
 	struct ceph_client *cl = mdsc->fsc->client;
 	struct dentry *cur;
@@ -2815,16 +2814,28 @@ retry:
 		return ERR_PTR(-ENAMETOOLONG);
 	}
 
-	*pbase = base;
-	*plen = PATH_MAX - 1 - pos;
+	/* Initialize the output structure */
+	memset(path_info, 0, sizeof(*path_info));
+
+	path_info->vino.ino = base;
+	path_info->pathlen = PATH_MAX - 1 - pos;
+	path_info->path = path + pos;
+	path_info->freepath = true;
+
+	/* Set snap from dentry if available */
+	if (d_inode(dentry))
+		path_info->vino.snap = ceph_snap(d_inode(dentry));
+	else
+		path_info->vino.snap = CEPH_NOSNAP;
+
 	doutc(cl, "on %p %d built %llx '%.*s'\n", dentry, d_count(dentry),
-	      base, *plen, path + pos);
+	      base, PATH_MAX - 1 - pos, path + pos);
 	return path + pos;
 }
 
 static int build_dentry_path(struct ceph_mds_client *mdsc, struct dentry *dentry,
-			     struct inode *dir, const char **ppath, int *ppathlen,
-			     u64 *pino, bool *pfreepath, bool parent_locked)
+			     struct inode *dir, struct ceph_path_info *path_info,
+			     bool parent_locked)
 {
 	char *path;
 
@@ -2833,41 +2844,47 @@ static int build_dentry_path(struct ceph
 		dir = d_inode_rcu(dentry->d_parent);
 	if (dir && parent_locked && ceph_snap(dir) == CEPH_NOSNAP &&
 	    !IS_ENCRYPTED(dir)) {
-		*pino = ceph_ino(dir);
+		path_info->vino.ino = ceph_ino(dir);
+		path_info->vino.snap = ceph_snap(dir);
 		rcu_read_unlock();
-		*ppath = dentry->d_name.name;
-		*ppathlen = dentry->d_name.len;
+		path_info->path = dentry->d_name.name;
+		path_info->pathlen = dentry->d_name.len;
+		path_info->freepath = false;
 		return 0;
 	}
 	rcu_read_unlock();
-	path = ceph_mdsc_build_path(mdsc, dentry, ppathlen, pino, 1);
+	path = ceph_mdsc_build_path(mdsc, dentry, path_info, 1);
 	if (IS_ERR(path))
 		return PTR_ERR(path);
-	*ppath = path;
-	*pfreepath = true;
+	/*
+	 * ceph_mdsc_build_path already fills path_info, including snap handling.
+	 */
 	return 0;
 }
 
-static int build_inode_path(struct inode *inode,
-			    const char **ppath, int *ppathlen, u64 *pino,
-			    bool *pfreepath)
+static int build_inode_path(struct inode *inode, struct ceph_path_info *path_info)
 {
 	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
 	struct dentry *dentry;
 	char *path;
 
 	if (ceph_snap(inode) == CEPH_NOSNAP) {
-		*pino = ceph_ino(inode);
-		*ppathlen = 0;
+		path_info->vino.ino = ceph_ino(inode);
+		path_info->vino.snap = ceph_snap(inode);
+		path_info->pathlen = 0;
+		path_info->freepath = false;
 		return 0;
 	}
 	dentry = d_find_alias(inode);
-	path = ceph_mdsc_build_path(mdsc, dentry, ppathlen, pino, 1);
+	path = ceph_mdsc_build_path(mdsc, dentry, path_info, 1);
 	dput(dentry);
 	if (IS_ERR(path))
 		return PTR_ERR(path);
-	*ppath = path;
-	*pfreepath = true;
+	/*
+	 * ceph_mdsc_build_path already fills path_info, including snap from dentry.
+	 * Override with inode's snap since that's what this function is for.
+	 */
+	path_info->vino.snap = ceph_snap(inode);
 	return 0;
 }
 
@@ -2877,26 +2894,32 @@ static int build_inode_path(struct inode
  */
 static int set_request_path_attr(struct ceph_mds_client *mdsc, struct inode *rinode,
 				 struct dentry *rdentry, struct inode *rdiri,
-				 const char *rpath, u64 rino, const char **ppath,
-				 int *pathlen, u64 *ino, bool *freepath,
+				 const char *rpath, u64 rino,
+				 struct ceph_path_info *path_info,
 				 bool parent_locked)
 {
 	struct ceph_client *cl = mdsc->fsc->client;
 	int r = 0;
 
+	/* Initialize the output structure */
+	memset(path_info, 0, sizeof(*path_info));
+
 	if (rinode) {
-		r = build_inode_path(rinode, ppath, pathlen, ino, freepath);
+		r = build_inode_path(rinode, path_info);
 		doutc(cl, " inode %p %llx.%llx\n", rinode, ceph_ino(rinode),
 		      ceph_snap(rinode));
 	} else if (rdentry) {
-		r = build_dentry_path(mdsc, rdentry, rdiri, ppath, pathlen, ino,
-				      freepath, parent_locked);
-		doutc(cl, " dentry %p %llx/%.*s\n", rdentry, *ino, *pathlen, *ppath);
+		r = build_dentry_path(mdsc, rdentry, rdiri, path_info, parent_locked);
+		doutc(cl, " dentry %p %llx/%.*s\n", rdentry, path_info->vino.ino,
+		      path_info->pathlen, path_info->path);
 	} else if (rpath || rino) {
-		*ino = rino;
-		*ppath = rpath;
-		*pathlen = rpath ? strlen(rpath) : 0;
-		doutc(cl, " path %.*s\n", *pathlen, rpath);
+		path_info->vino.ino = rino;
+		path_info->vino.snap = CEPH_NOSNAP;
+		path_info->path = rpath;
+		path_info->pathlen = rpath ? strlen(rpath) : 0;
+		path_info->freepath = false;
+
+		doutc(cl, " path %.*s\n", path_info->pathlen, rpath);
 	}
 
 	return r;
@@ -2973,11 +2996,8 @@ static struct ceph_msg *create_request_m
 	struct ceph_client *cl = mdsc->fsc->client;
 	struct ceph_msg *msg;
 	struct ceph_mds_request_head_legacy *lhead;
-	const char *path1 = NULL;
-	const char *path2 = NULL;
-	u64 ino1 = 0, ino2 = 0;
-	int pathlen1 = 0, pathlen2 = 0;
-	bool freepath1 = false, freepath2 = false;
+	struct ceph_path_info path_info1 = {0};
+	struct ceph_path_info path_info2 = {0};
 	struct dentry *old_dentry = NULL;
 	int len;
 	u16 releases;
@@ -2987,25 +3007,49 @@ static struct ceph_msg *create_request_m
 	u16 request_head_version = mds_supported_head_version(session);
 	kuid_t caller_fsuid = req->r_cred->fsuid;
 	kgid_t caller_fsgid = req->r_cred->fsgid;
+	bool parent_locked = test_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
 
 	ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
-			      req->r_parent, req->r_path1, req->r_ino1.ino,
-			      &path1, &pathlen1, &ino1, &freepath1,
-			      test_bit(CEPH_MDS_R_PARENT_LOCKED,
-					&req->r_req_flags));
+				    req->r_parent, req->r_path1, req->r_ino1.ino,
+				    &path_info1, parent_locked);
 	if (ret < 0) {
 		msg = ERR_PTR(ret);
 		goto out;
 	}
 
+	/*
+	 * When the parent directory's i_rwsem is *not* locked, req->r_parent may
+	 * have become stale (e.g. after a concurrent rename) between the time the
+	 * dentry was looked up and now.  If we detect that the stored r_parent
+	 * does not match the inode number we just encoded for the request, switch
+	 * to the correct inode so that the MDS receives a valid parent reference.
+	 */
+	if (!parent_locked && req->r_parent && path_info1.vino.ino &&
+	    ceph_ino(req->r_parent) != path_info1.vino.ino) {
+		struct inode *old_parent = req->r_parent;
+		struct inode *correct_dir = ceph_get_inode(mdsc->fsc->sb, path_info1.vino, NULL);
+		if (!IS_ERR(correct_dir)) {
+			WARN_ONCE(1, "ceph: r_parent mismatch (had %llx wanted %llx) - updating\n",
+				  ceph_ino(old_parent), path_info1.vino.ino);
+			/*
+			 * Transfer CEPH_CAP_PIN from the old parent to the new one.
+			 * The pin was taken earlier in ceph_mdsc_submit_request().
+			 */
+			ceph_put_cap_refs(ceph_inode(old_parent), CEPH_CAP_PIN);
+			iput(old_parent);
+			req->r_parent = correct_dir;
+			ceph_get_cap_refs(ceph_inode(req->r_parent), CEPH_CAP_PIN);
+		}
+	}
+
 	/* If r_old_dentry is set, then assume that its parent is locked */
 	if (req->r_old_dentry &&
 	    !(req->r_old_dentry->d_flags & DCACHE_DISCONNECTED))
 		old_dentry = req->r_old_dentry;
 	ret = set_request_path_attr(mdsc, NULL, old_dentry,
-			      req->r_old_dentry_dir,
-			      req->r_path2, req->r_ino2.ino,
-			      &path2, &pathlen2, &ino2, &freepath2, true);
+				    req->r_old_dentry_dir,
+				    req->r_path2, req->r_ino2.ino,
+				    &path_info2, true);
 	if (ret < 0) {
 		msg = ERR_PTR(ret);
 		goto out_free1;
@@ -3036,7 +3080,7 @@ static struct ceph_msg *create_request_m
 
 	/* filepaths */
 	len += 2 * (1 + sizeof(u32) + sizeof(u64));
-	len += pathlen1 + pathlen2;
+	len += path_info1.pathlen + path_info2.pathlen;
 
 	/* cap releases */
 	len += sizeof(struct ceph_mds_request_release) *
@@ -3044,9 +3088,9 @@ static struct ceph_msg *create_request_m
 		 !!req->r_old_inode_drop + !!req->r_old_dentry_drop);
 
 	if (req->r_dentry_drop)
-		len += pathlen1;
+		len += path_info1.pathlen;
 	if (req->r_old_dentry_drop)
-		len += pathlen2;
+		len += path_info2.pathlen;
 
 	/* MClientRequest tail */
 
@@ -3159,8 +3203,8 @@ static struct ceph_msg *create_request_m
 	lhead->ino = cpu_to_le64(req->r_deleg_ino);
 	lhead->args = req->r_args;
 
-	ceph_encode_filepath(&p, end, ino1, path1);
-	ceph_encode_filepath(&p, end, ino2, path2);
+	ceph_encode_filepath(&p, end, path_info1.vino.ino, path_info1.path);
+	ceph_encode_filepath(&p, end, path_info2.vino.ino, path_info2.path);
 
 	/* make note of release offset, in case we need to replay */
 	req->r_request_release_offset = p - msg->front.iov_base;
@@ -3223,11 +3267,9 @@ static struct ceph_msg *create_request_m
 	msg->hdr.data_off = cpu_to_le16(0);
 
 out_free2:
-	if (freepath2)
-		ceph_mdsc_free_path((char *)path2, pathlen2);
+	ceph_mdsc_free_path_info(&path_info2);
 out_free1:
-	if (freepath1)
-		ceph_mdsc_free_path((char *)path1, pathlen1);
+	ceph_mdsc_free_path_info(&path_info1);
 out:
 	return msg;
 out_err:
@@ -4584,24 +4626,20 @@ static int reconnect_caps_cb(struct inod
 	struct ceph_pagelist *pagelist = recon_state->pagelist;
 	struct dentry *dentry;
 	struct ceph_cap *cap;
-	char *path;
-	int pathlen = 0, err;
-	u64 pathbase;
+	struct ceph_path_info path_info = {0};
+	int err;
 	u64 snap_follows;
 
 	dentry = d_find_primary(inode);
 	if (dentry) {
 		/* set pathbase to parent dir when msg_version >= 2 */
-		path = ceph_mdsc_build_path(mdsc, dentry, &pathlen, &pathbase,
+		char *path = ceph_mdsc_build_path(mdsc, dentry, &path_info,
 					    recon_state->msg_version >= 2);
 		dput(dentry);
 		if (IS_ERR(path)) {
 			err = PTR_ERR(path);
 			goto out_err;
 		}
-	} else {
-		path = NULL;
-		pathbase = 0;
 	}
 
 	spin_lock(&ci->i_ceph_lock);
@@ -4634,7 +4672,7 @@ static int reconnect_caps_cb(struct inod
 		rec.v2.wanted = cpu_to_le32(__ceph_caps_wanted(ci));
 		rec.v2.issued = cpu_to_le32(cap->issued);
 		rec.v2.snaprealm = cpu_to_le64(ci->i_snap_realm->ino);
-		rec.v2.pathbase = cpu_to_le64(pathbase);
+		rec.v2.pathbase = cpu_to_le64(path_info.vino.ino);
 		rec.v2.flock_len = (__force __le32)
 			((ci->i_ceph_flags & CEPH_I_ERROR_FILELOCK) ? 0 : 1);
 	} else {
@@ -4649,7 +4687,7 @@ static int reconnect_caps_cb(struct inod
 		ts = inode_get_atime(inode);
 		ceph_encode_timespec64(&rec.v1.atime, &ts);
 		rec.v1.snaprealm = cpu_to_le64(ci->i_snap_realm->ino);
-		rec.v1.pathbase = cpu_to_le64(pathbase);
+		rec.v1.pathbase = cpu_to_le64(path_info.vino.ino);
 	}
 
 	if (list_empty(&ci->i_cap_snaps)) {
@@ -4711,7 +4749,7 @@ encode_again:
 			    sizeof(struct ceph_filelock);
 		rec.v2.flock_len = cpu_to_le32(struct_len);
 
-		struct_len += sizeof(u32) + pathlen + sizeof(rec.v2);
+		struct_len += sizeof(u32) + path_info.pathlen + sizeof(rec.v2);
 
 		if (struct_v >= 2)
 			struct_len += sizeof(u64); /* snap_follows */
@@ -4735,7 +4773,7 @@ encode_again:
 			ceph_pagelist_encode_8(pagelist, 1);
 			ceph_pagelist_encode_32(pagelist, struct_len);
 		}
-		ceph_pagelist_encode_string(pagelist, path, pathlen);
+		ceph_pagelist_encode_string(pagelist, (char *)path_info.path, path_info.pathlen);
 		ceph_pagelist_append(pagelist, &rec, sizeof(rec.v2));
 		ceph_locks_to_pagelist(flocks, pagelist,
 				       num_fcntl_locks, num_flock_locks);
@@ -4746,17 +4784,17 @@ out_freeflocks:
 	} else {
 		err = ceph_pagelist_reserve(pagelist,
 					    sizeof(u64) + sizeof(u32) +
-					    pathlen + sizeof(rec.v1));
+					    path_info.pathlen + sizeof(rec.v1));
 		if (err)
 			goto out_err;
 
 		ceph_pagelist_encode_64(pagelist, ceph_ino(inode));
-		ceph_pagelist_encode_string(pagelist, path, pathlen);
+		ceph_pagelist_encode_string(pagelist, (char *)path_info.path, path_info.pathlen);
 		ceph_pagelist_append(pagelist, &rec, sizeof(rec.v1));
 	}
 
out_err:
-	ceph_mdsc_free_path(path, pathlen);
+	ceph_mdsc_free_path_info(&path_info);
 	if (!err)
 		recon_state->nr_caps++;
 	return err;
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -612,14 +612,24 @@ extern int ceph_mds_check_access(struct
 
 extern void ceph_mdsc_pre_umount(struct ceph_mds_client *mdsc);
 
-static inline void ceph_mdsc_free_path(char *path, int len)
+/*
+ * Structure to group path-related output parameters for build_*_path functions
+ */
+struct ceph_path_info {
+	const char *path;
+	int pathlen;
+	struct ceph_vino vino;
+	bool freepath;
+};
+
+static inline void ceph_mdsc_free_path_info(const struct ceph_path_info *path_info)
 {
-	if (!IS_ERR_OR_NULL(path))
-		__putname(path - (PATH_MAX - 1 - len));
+	if (path_info && path_info->freepath && !IS_ERR_OR_NULL(path_info->path))
+		__putname((char *)path_info->path - (PATH_MAX - 1 - path_info->pathlen));
 }
 
 extern char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc,
-				  struct dentry *dentry, int *plen, u64 *base,
+				  struct dentry *dentry, struct ceph_path_info *path_info,
 				  int for_wire);
extern void __ceph_mdsc_drop_dentry_lease(struct dentry *dentry);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Markuze amarkuze@redhat.com
commit bec324f33d1ed346394b2eee25bf6dbf3511f727 upstream.
When the parent directory's i_rwsem is not locked, req->r_parent may become stale due to concurrent operations (e.g. rename) between dentry lookup and message creation. Validate that r_parent matches the encoded parent inode and update to the correct inode if a mismatch is detected.
[ idryomov: folded a follow-up fix from Alex to drop extra reference from ceph_get_reply_dir() in ceph_fill_trace():
ceph_get_reply_dir() may return a different, referenced inode when r_parent is stale and the parent directory lock is not held. ceph_fill_trace() used that inode but failed to drop the reference when it differed from req->r_parent, leaking an inode reference.
Keep the directory inode in a local variable and iput() it at function end if it does not match req->r_parent. ]
Cc: stable@vger.kernel.org
Signed-off-by: Alex Markuze amarkuze@redhat.com
Reviewed-by: Viacheslav Dubeyko Slava.Dubeyko@ibm.com
Signed-off-by: Ilya Dryomov idryomov@gmail.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ceph/inode.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 69 insertions(+), 12 deletions(-)
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -55,6 +55,52 @@ static int ceph_set_ino_cb(struct inode
 	return 0;
 }
 
+/*
+ * Check if the parent inode matches the vino from directory reply info
+ */
+static inline bool ceph_vino_matches_parent(struct inode *parent,
+					    struct ceph_vino vino)
+{
+	return ceph_ino(parent) == vino.ino && ceph_snap(parent) == vino.snap;
+}
+
+/*
+ * Validate that the directory inode referenced by @req->r_parent matches the
+ * inode number and snapshot id contained in the reply's directory record.  If
+ * they do not match – which can theoretically happen if the parent dentry was
+ * moved between the time the request was issued and the reply arrived – fall
+ * back to looking up the correct inode in the inode cache.
+ *
+ * A reference is *always* returned.  Callers that receive a different inode
+ * than the original @parent are responsible for dropping the extra reference
+ * once the reply has been processed.
+ */
+static struct inode *ceph_get_reply_dir(struct super_block *sb,
+					struct inode *parent,
+					struct ceph_mds_reply_info_parsed *rinfo)
+{
+	struct ceph_vino vino;
+
+	if (unlikely(!rinfo->diri.in))
+		return parent; /* nothing to compare against */
+
+	/* If we didn't have a cached parent inode to begin with, just bail out. */
+	if (!parent)
+		return NULL;
+
+	vino.ino = le64_to_cpu(rinfo->diri.in->ino);
+	vino.snap = le64_to_cpu(rinfo->diri.in->snapid);
+
+	if (likely(ceph_vino_matches_parent(parent, vino)))
+		return parent; /* matches – use the original reference */
+
+	/* Mismatch – this should be rare.  Emit a WARN and obtain the correct inode. */
+	WARN_ONCE(1, "ceph: reply dir mismatch (parent valid %llx.%llx reply %llx.%llx)\n",
+		  ceph_ino(parent), ceph_snap(parent), vino.ino, vino.snap);
+
+	return ceph_get_inode(sb, vino, NULL);
+}
+
 /**
  * ceph_new_inode - allocate a new inode in advance of an expected create
  * @dir: parent directory for new inode
@@ -1523,6 +1569,7 @@ int ceph_fill_trace(struct super_block *
 	struct ceph_vino tvino, dvino;
 	struct ceph_fs_client *fsc = ceph_sb_to_fs_client(sb);
 	struct ceph_client *cl = fsc->client;
+	struct inode *parent_dir = NULL;
 	int err = 0;
 
 	doutc(cl, "%p is_dentry %d is_target %d\n", req,
@@ -1536,10 +1583,17 @@ int ceph_fill_trace(struct super_block *
 	}
 
 	if (rinfo->head->is_dentry) {
-		struct inode *dir = req->r_parent;
-
-		if (dir) {
-			err = ceph_fill_inode(dir, NULL, &rinfo->diri,
+		/*
+		 * r_parent may be stale, in cases when R_PARENT_LOCKED is not set,
+		 * so we need to get the correct inode
+		 */
+		parent_dir = ceph_get_reply_dir(sb, req->r_parent, rinfo);
+		if (unlikely(IS_ERR(parent_dir))) {
+			err = PTR_ERR(parent_dir);
+			goto done;
+		}
+		if (parent_dir) {
+			err = ceph_fill_inode(parent_dir, NULL, &rinfo->diri,
 					      rinfo->dirfrag, session, -1,
 					      &req->r_caps_reservation);
 			if (err < 0)
@@ -1548,14 +1602,14 @@ int ceph_fill_trace(struct super_block *
 			WARN_ON_ONCE(1);
 		}
 
-		if (dir && req->r_op == CEPH_MDS_OP_LOOKUPNAME &&
+		if (parent_dir && req->r_op == CEPH_MDS_OP_LOOKUPNAME &&
 		    test_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags) &&
 		    !test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags)) {
 			bool is_nokey = false;
 			struct qstr dname;
 			struct dentry *dn, *parent;
 			struct fscrypt_str oname = FSTR_INIT(NULL, 0);
-			struct ceph_fname fname = { .dir	= dir,
+			struct ceph_fname fname = { .dir	= parent_dir,
 						    .name	= rinfo->dname,
 						    .ctext	= rinfo->altname,
 						    .name_len	= rinfo->dname_len,
@@ -1564,10 +1618,10 @@ int ceph_fill_trace(struct super_block *
 			BUG_ON(!rinfo->head->is_target);
 			BUG_ON(req->r_dentry);
 
-			parent = d_find_any_alias(dir);
+			parent = d_find_any_alias(parent_dir);
 			BUG_ON(!parent);
 
-			err = ceph_fname_alloc_buffer(dir, &oname);
+			err = ceph_fname_alloc_buffer(parent_dir, &oname);
 			if (err < 0) {
 				dput(parent);
 				goto done;
@@ -1576,7 +1630,7 @@ int ceph_fill_trace(struct super_block *
 			err = ceph_fname_to_usr(&fname, NULL, &oname, &is_nokey);
 			if (err < 0) {
 				dput(parent);
-				ceph_fname_free_buffer(dir, &oname);
+				ceph_fname_free_buffer(parent_dir, &oname);
 				goto done;
 			}
 			dname.name = oname.name;
@@ -1595,7 +1649,7 @@ retry_lookup:
 				 dname.len, dname.name, dn);
 			if (!dn) {
 				dput(parent);
-				ceph_fname_free_buffer(dir, &oname);
+				ceph_fname_free_buffer(parent_dir, &oname);
 				err = -ENOMEM;
 				goto done;
 			}
@@ -1610,12 +1664,12 @@ retry_lookup:
 				   ceph_snap(d_inode(dn)) != tvino.snap)) {
 				doutc(cl, " dn %p points to wrong inode %p\n",
 				      dn, d_inode(dn));
-				ceph_dir_clear_ordered(dir);
+				ceph_dir_clear_ordered(parent_dir);
 				d_delete(dn);
 				dput(dn);
 				goto retry_lookup;
 			}
-			ceph_fname_free_buffer(dir, &oname);
+			ceph_fname_free_buffer(parent_dir, &oname);
 
 			req->r_dentry = dn;
 			dput(parent);
@@ -1794,6 +1848,9 @@ retry_lookup:
 					    &dvino, ptvino);
 	}
 done:
+	/* Drop extra ref from ceph_get_reply_dir() if it returned a new inode */
+	if (unlikely(!IS_ERR_OR_NULL(parent_dir) && parent_dir != req->r_parent))
+		iput(parent_dir);
 	doutc(cl, "done err=%d\n", err);
 	return err;
 }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stanislav Fort stanislav.fort@aisle.com
commit 3260a3f0828e06f5f13fac69fb1999a6d60d9cff upstream.
state_show() reads kdamond->damon_ctx without holding damon_sysfs_lock. This allows a use-after-free race:
CPU 0                                   CPU 1
-----                                   -----
state_show()                            damon_sysfs_turn_damon_on()
ctx = kdamond->damon_ctx;               mutex_lock(&damon_sysfs_lock);
                                        damon_destroy_ctx(kdamond->damon_ctx);
                                        kdamond->damon_ctx = NULL;
                                        mutex_unlock(&damon_sysfs_lock);
damon_is_running(ctx); /* ctx is freed */
mutex_lock(&ctx->kdamond_lock); /* UAF */
(The race can also occur with damon_sysfs_kdamonds_rm_dirs() and damon_sysfs_kdamond_release(), which free or replace the context under damon_sysfs_lock.)
Fix by taking damon_sysfs_lock before dereferencing the context, mirroring the locking used in pid_show().
The bug has existed since state_show() first accessed kdamond->damon_ctx.
Link: https://lkml.kernel.org/r/20250905101046.2288-1-disclosure@aisle.com
Fixes: a61ea561c871 ("mm/damon/sysfs: link DAMON for virtual address spaces monitoring")
Signed-off-by: Stanislav Fort disclosure@aisle.com
Reported-by: Stanislav Fort disclosure@aisle.com
Reviewed-by: SeongJae Park sj@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: SeongJae Park sj@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 mm/damon/sysfs.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)
--- a/mm/damon/sysfs.c
+++ b/mm/damon/sysfs.c
@@ -1067,14 +1067,18 @@ static ssize_t state_show(struct kobject
 {
 	struct damon_sysfs_kdamond *kdamond = container_of(kobj,
 			struct damon_sysfs_kdamond, kobj);
-	struct damon_ctx *ctx = kdamond->damon_ctx;
-	bool running;
+	struct damon_ctx *ctx;
+	bool running = false;
 
-	if (!ctx)
-		running = false;
-	else
+	if (!mutex_trylock(&damon_sysfs_lock))
+		return -EBUSY;
+
+	ctx = kdamond->damon_ctx;
+	if (ctx)
 		running = damon_sysfs_ctx_running(ctx);
 
+	mutex_unlock(&damon_sysfs_lock);
+
 	return sysfs_emit(buf, "%s\n", running ?
 			damon_sysfs_cmd_strs[DAMON_SYSFS_CMD_ON] :
 			damon_sysfs_cmd_strs[DAMON_SYSFS_CMD_OFF]);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quanmin Yan yanquanmin1@huawei.com
commit e6b543ca9806d7bced863f43020e016ee996c057 upstream.
When creating a new scheme of DAMON_RECLAIM, the calculation of 'min_age_region' uses 'aggr_interval' as the divisor, which may lead to division-by-zero errors. Fix it by directly returning -EINVAL when such a case occurs.
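For reference, the expression being guarded looks roughly like this in damon_reclaim_new_scheme() (paraphrased, not the exact code in this tree):

	/* translate the min-age threshold (in us) into a number of
	 * aggregation intervals; aggr_interval == 0 here would be a
	 * division by zero, hence the -EINVAL guard added below */
	pattern.min_age_region = min_age /
				 damon_reclaim_mon_attrs.aggr_interval;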
Link: https://lkml.kernel.org/r/20250827115858.1186261-3-yanquanmin1@huawei.com
Fixes: f5a79d7c0c87 ("mm/damon: introduce struct damos_access_pattern")
Signed-off-by: Quanmin Yan yanquanmin1@huawei.com
Reviewed-by: SeongJae Park sj@kernel.org
Cc: Kefeng Wang wangkefeng.wang@huawei.com
Cc: ze zuo zuoze1@huawei.com
Cc: stable@vger.kernel.org	[6.1+]
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: SeongJae Park sj@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 mm/damon/reclaim.c | 5 +++++
 1 file changed, 5 insertions(+)
--- a/mm/damon/reclaim.c
+++ b/mm/damon/reclaim.c
@@ -194,6 +194,11 @@ static int damon_reclaim_apply_parameter
 	if (err)
 		return err;
 
+	if (!damon_reclaim_mon_attrs.aggr_interval) {
+		err = -EINVAL;
+		goto out;
+	}
+
 	err = damon_set_attrs(ctx, &damon_reclaim_mon_attrs);
 	if (err)
 		goto out;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeongjun Park aha310510@gmail.com
[ Upstream commit 21cc2b5c5062a256ae9064442d37ebbc23f5aef7 ]
When restoring a reservation for an anonymous page, we need to check whether we are freeing a surplus page. However, __unmap_hugepage_range() causes a data race because it reads h->surplus_huge_pages without the protection of hugetlb_lock.

In addition, adjust_reservation is a boolean variable that indicates whether the reservation for the anonymous page in each folio should be restored, so it should be reset to false on each round of the loop. However, it is only ever initialized once, at its definition.
This means that once adjust_reservation is set to true even once within the loop, reservations for anonymous pages will be restored unconditionally in all subsequent rounds, regardless of the folio's state.
To fix this, take the missing hugetlb_lock, unlock the page_table_lock earlier so that we don't nest the hugetlb_lock inside the page_table_lock, and initialize adjust_reservation to false on each round within the loop.
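In outline, the fixed loop body does the following (a condensed sketch of the hunk below, not literal code):

	adjust_reservation = false;	/* reset on every iteration */
	spin_unlock(ptl);		/* don't nest hugetlb_lock inside ptl */

	spin_lock_irq(&hugetlb_lock);	/* protects h->surplus_huge_pages */
	if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
	    folio_test_anon(page_folio(page))) {
		folio_set_hugetlb_restore_reserve(page_folio(page));
		adjust_reservation = true;
	}
	spin_unlock_irq(&hugetlb_lock);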
Link: https://lkml.kernel.org/r/20250823182115.1193563-1-aha310510@gmail.com
Fixes: df7a6d1f6405 ("mm/hugetlb: restore the reservation if needed")
Signed-off-by: Jeongjun Park aha310510@gmail.com
Reported-by: syzbot+417aeb05fd190f3a6da9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=417aeb05fd190f3a6da9
Reviewed-by: Sidhartha Kumar sidhartha.kumar@oracle.com
Cc: Breno Leitao leitao@debian.org
Cc: David Hildenbrand david@redhat.com
Cc: Muchun Song muchun.song@linux.dev
Cc: Oscar Salvador osalvador@suse.de
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
[ Page vs folio differences ]
Signed-off-by: Sasha Levin sashal@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 mm/hugetlb.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5512,7 +5512,7 @@ void __unmap_hugepage_range(struct mmu_g
 	struct page *page;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
-	bool adjust_reservation = false;
+	bool adjust_reservation;
 	unsigned long last_addr_mask;
 	bool force_flush = false;
 
@@ -5604,6 +5604,7 @@ void __unmap_hugepage_range(struct mmu_g
 				sz);
 		hugetlb_count_sub(pages_per_huge_page(h), mm);
 		hugetlb_remove_rmap(page_folio(page));
+		spin_unlock(ptl);
 
 		/*
 		 * Restore the reservation for anonymous page, otherwise the
@@ -5611,14 +5612,16 @@ void __unmap_hugepage_range(struct mmu_g
 		 * If there we are freeing a surplus, do not set the restore
 		 * reservation bit.
 		 */
+		adjust_reservation = false;
+
+		spin_lock_irq(&hugetlb_lock);
 		if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
 		    folio_test_anon(page_folio(page))) {
 			folio_set_hugetlb_restore_reserve(page_folio(page));
 			/* Reservation to be adjusted after the spin lock */
 			adjust_reservation = true;
 		}
-
-		spin_unlock(ptl);
+		spin_unlock_irq(&hugetlb_lock);
 
 		/*
 		 * Adjust the reservation for the region that will have the
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Santhosh Kumar K s-k6@ti.com
[ Upstream commit 4550d33e18112a11a740424c4eec063cd58e918c ]
Fix the W25N01JW's oob_layout according to the datasheet [1].
[1] https://www.winbond.com/hq/product/code-storage-flash-memory/qspinand-flash/...
Fixes: 6a804fb72de5 ("mtd: spinand: winbond: add support for serial NAND flash")
Cc: Sridharan S N quic_sridsn@quicinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Santhosh Kumar K s-k6@ti.com
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com
[ Adjust context ]
Signed-off-by: Sasha Levin sashal@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/mtd/nand/spi/winbond.c | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)
--- a/drivers/mtd/nand/spi/winbond.c
+++ b/drivers/mtd/nand/spi/winbond.c
@@ -122,6 +122,41 @@ static const struct mtd_ooblayout_ops w2
 	.free = w25n02kv_ooblayout_free,
 };
 
+static int w25n01jw_ooblayout_ecc(struct mtd_info *mtd, int section,
+				  struct mtd_oob_region *region)
+{
+	if (section > 3)
+		return -ERANGE;
+
+	region->offset = (16 * section) + 12;
+	region->length = 4;
+
+	return 0;
+}
+
+static int w25n01jw_ooblayout_free(struct mtd_info *mtd, int section,
+				   struct mtd_oob_region *region)
+{
+	if (section > 3)
+		return -ERANGE;
+
+	region->offset = (16 * section);
+	region->length = 12;
+
+	/* Extract BBM */
+	if (!section) {
+		region->offset += 2;
+		region->length -= 2;
+	}
+
+	return 0;
+}
+
+static const struct mtd_ooblayout_ops w25n01jw_ooblayout = {
+	.ecc = w25n01jw_ooblayout_ecc,
+	.free = w25n01jw_ooblayout_free,
+};
+
 static int w25n02kv_ecc_get_status(struct spinand_device *spinand,
 				   u8 status)
 {
@@ -206,7 +241,7 @@ static const struct spinand_info winbond
 				   &write_cache_variants,
 				   &update_cache_variants),
 		     0,
-		     SPINAND_ECCINFO(&w25m02gv_ooblayout, NULL)),
+		     SPINAND_ECCINFO(&w25n01jw_ooblayout, NULL)),
 	SPINAND_INFO("W25N02JWZEIF",
 		     SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0xbf, 0x22),
 		     NAND_MEMORG(1, 2048, 64, 64, 1024, 20, 1, 2, 1),
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Burkov boris@bur.io
[ Upstream commit 9e9ff875e4174be939371667d2cc81244e31232f ]
We recently received a report of poor performance doing sequential buffered reads of a file with compressed extents. With bs=128k, a naive sequential dd ran as fast on a compressed file as on an uncompressed one (1.2GB/s on my reproducing system), while with bs<32k this performance tanked to ~300MB/s.
i.e., slow:
dd if=some-compressed-file of=/dev/null bs=4k count=X
vs fast:
dd if=some-compressed-file of=/dev/null bs=128k count=Y
The cause of this slowness is overhead to do with looking up extent_maps to enable readahead pre-caching on compressed extents (add_ra_bio_pages()), as well as some overhead in the generic VFS readahead code we hit more in the slow case. Notably, the main difference between the two read sizes is that in the large sized request case, we call btrfs_readahead() relatively rarely while in the smaller request we call it for every compressed extent. So the fast case stays in the btrfs readahead loop:
    while ((folio = readahead_folio(rac)) != NULL)
	    btrfs_do_readpage(folio, &em_cached, &bio_ctrl, &prev_em_start);
where the slower one breaks out of that loop every time. This results in calling add_ra_bio_pages a lot, doing lots of extent_map lookups, extent_map locking, etc.
This happens because although add_ra_bio_pages() does add the appropriate un-compressed file pages to the cache, it does not communicate back to the ractl in any way. To solve this, we should be using readahead_expand() to signal to readahead to expand the readahead window.
This change passes the readahead_control into the btrfs_bio_ctrl and in the case of compressed reads sets the expansion to the size of the extent_map we already looked up anyway. It skips the subpage case as that one already doesn't do add_ra_bio_pages().
With this change, whether we use bs=4k or bs=128k, btrfs expands the readahead window up to the largest compressed extent we have seen so far (in the trivial example: 128k) and the call stacks of the two modes look identical. Notably, we barely call add_ra_bio_pages at all. And the performance becomes identical as well. So this change certainly "fixes" this performance problem.
Of course, it does seem to beg a few questions:
1. Will this waste too much page cache with a too large ra window?
2. Will this somehow cause bugs prevented by the more thoughtful checking
   in add_ra_bio_pages?
3. Should we delete add_ra_bio_pages?
My stabs at some answers:
1. Hard to say. See attempts at generic performance testing below. Is there
   a "readahead_shrink" we should be using? Should we expand more slowly, by
   half the remaining em size each time?
2. I don't think so. Since the new behavior is indistinguishable from reading
   the file with a larger read size passed in, I don't see why one would be
   safe but not the other.
3. Probably! I tested that and it was fine in fstests, and it seems like the
   pages would get re-used just as well in the readahead case. However, it is
   possible some reads that use page cache but not btrfs_readahead() could
   suffer. I will investigate this further as a follow up.
I tested the performance implications of this change in 3 ways (using compress-force=zstd:3 for compression):
Directly test the affected workload of small sequential reads on a compressed file (improved from ~250MB/s to ~1.2GB/s)
==========for-next==========
dd /mnt/lol/non-cmpr 4k
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.02983 s, 712 MB/s
dd /mnt/lol/non-cmpr 128k
32768+0 records in
32768+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 5.92403 s, 725 MB/s
dd /mnt/lol/cmpr 4k
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 17.8832 s, 240 MB/s
dd /mnt/lol/cmpr 128k
32768+0 records in
32768+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.71001 s, 1.2 GB/s

==========ra-expand==========
dd /mnt/lol/non-cmpr 4k
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.09001 s, 705 MB/s
dd /mnt/lol/non-cmpr 128k
32768+0 records in
32768+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.07664 s, 707 MB/s
dd /mnt/lol/cmpr 4k
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.79531 s, 1.1 GB/s
dd /mnt/lol/cmpr 128k
32768+0 records in
32768+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.69533 s, 1.2 GB/s
Built the linux kernel from clean (no change)
Ran fsperf. Mostly neutral results with some improvements and regressions here and there.
Reported-by: Dimitrios Apostolou jimis@gmx.net
Link: https://lore.kernel.org/linux-btrfs/34601559-6c16-6ccc-1793-20a97ca0dbba@gmx...
Reviewed-by: Filipe Manana fdmanana@suse.com
Signed-off-by: Boris Burkov boris@bur.io
Signed-off-by: David Sterba dsterba@suse.com
[ Assert doesn't take a format string ]
Signed-off-by: Sasha Levin sashal@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/btrfs/extent_io.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -108,6 +108,7 @@ struct btrfs_bio_ctrl {
 	 * This is to avoid touching ranges covered by compression/inline.
 	 */
 	unsigned long submit_bitmap;
+	struct readahead_control *ractl;
 };
 
 static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
@@ -929,6 +930,23 @@ static struct extent_map *get_extent_map
 
 	return em;
 }
+
+static void btrfs_readahead_expand(struct readahead_control *ractl,
+				   const struct extent_map *em)
+{
+	const u64 ra_pos = readahead_pos(ractl);
+	const u64 ra_end = ra_pos + readahead_length(ractl);
+	const u64 em_end = em->start + em->ram_bytes;
+
+	/* No expansion for holes and inline extents. */
+	if (em->disk_bytenr > EXTENT_MAP_LAST_BYTE)
+		return;
+
+	ASSERT(em_end >= ra_pos);
+	if (em_end > ra_end)
+		readahead_expand(ractl, ra_pos, em_end - ra_pos);
+}
+
 /*
  * basic readpage implementation. Locked extent state structs are inserted
  * into the tree that are removed when the IO is done (by the end_io
@@ -994,6 +1012,17 @@ static int btrfs_do_readpage(struct foli
 
 		iosize = min(extent_map_end(em) - cur, end - cur + 1);
 		iosize = ALIGN(iosize, blocksize);
+
+		/*
+		 * Only expand readahead for extents which are already creating
+		 * the pages anyway in add_ra_bio_pages, which is compressed
+		 * extents in the non subpage case.
+		 */
+		if (bio_ctrl->ractl &&
+		    !btrfs_is_subpage(fs_info, folio->mapping) &&
+		    compress_type != BTRFS_COMPRESS_NONE)
+			btrfs_readahead_expand(bio_ctrl->ractl, em);
+
 		if (compress_type != BTRFS_COMPRESS_NONE)
 			disk_bytenr = em->disk_bytenr;
 		else
@@ -2360,7 +2389,10 @@ int btrfs_writepages(struct address_spac
 
 void btrfs_readahead(struct readahead_control *rac)
 {
-	struct btrfs_bio_ctrl bio_ctrl = { .opf = REQ_OP_READ | REQ_RAHEAD };
+	struct btrfs_bio_ctrl bio_ctrl = {
+		.opf = REQ_OP_READ | REQ_RAHEAD,
+		.ractl = rac
+	};
 	struct folio *folio;
 	struct btrfs_inode *inode = BTRFS_I(rac->mapping->host);
 	const u64 start = readahead_pos(rac);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
[ Upstream commit 9786531399a679fc2f4630d2c0a186205282ab2f ]
[BUG]
With 64K page size (aarch64 with 64K page size config) and 4K btrfs block size, the following workload can easily lead to a corrupted read:
  mkfs.btrfs -f -s 4k $dev > /dev/null
  mount -o compress $dev $mnt
  xfs_io -f -c "pwrite -S 0xff 0 64k" $mnt/base > /dev/null
  echo "correct result:"
  od -Ad -t x1 $mnt/base
  xfs_io -f -c "reflink $mnt/base 32k 0 32k" \
         -c "reflink $mnt/base 0 32k 32k" \
         -c "pwrite -S 0xff 60k 4k" $mnt/new > /dev/null
  echo "incorrect result:"
  od -Ad -t x1 $mnt/new
  umount $mnt
This shows the following result:
  correct result:
  0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  *
  0065536
  incorrect result:
  0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  *
  0032768 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  0061440 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  *
  0065536
Notice the zero in the range [32K, 60K), which is incorrect.
[CAUSE]
With extra trace printk, it shows the following events during od: (some unrelated info removed like CPU and context)
od-3457 btrfs_do_readpage: enter r/i=5/258 folio=0(65536) prev_em_start=0000000000000000
The "r/i" indicates the root and inode number. In our case the file "new" uses ino 258 from the fs tree (root 5).
Here notice the @prev_em_start pointer is NULL. This means the btrfs_do_readpage() is called from btrfs_read_folio(), not from btrfs_readahead().
  od-3457  btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=0 got em start=0 len=32768
  od-3457  btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=4096 got em start=0 len=32768
  od-3457  btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=8192 got em start=0 len=32768
  od-3457  btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=12288 got em start=0 len=32768
  od-3457  btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=16384 got em start=0 len=32768
  od-3457  btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=20480 got em start=0 len=32768
  od-3457  btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=24576 got em start=0 len=32768
  od-3457  btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=28672 got em start=0 len=32768
The above 32K of blocks will be read from the first half of the compressed data extent.
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=32768 got em start=32768 len=32768
Note that here there is no btrfs_submit_compressed_read() call, which is incorrect: although both extent maps at 0 and 32K point to the same compressed data, their offsets are different and thus they cannot be merged into the same read.
So this means the compressed data read merge check is doing something wrong.
  od-3457  btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=36864 got em start=32768 len=32768
  od-3457  btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=40960 got em start=32768 len=32768
  od-3457  btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=45056 got em start=32768 len=32768
  od-3457  btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=49152 got em start=32768 len=32768
  od-3457  btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=53248 got em start=32768 len=32768
  od-3457  btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=57344 got em start=32768 len=32768
  od-3457  btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=61440 skip uptodate
  od-3457  btrfs_submit_compressed_read: cb orig_bio: file off=0 len=61440
The function btrfs_submit_compressed_read() is only called at the end of folio read. The compressed bio will only have an extent map of range [0, 32K), but the original bio passed in is for the whole 64K folio.
This will cause the decompression part to only fill the first 32K, leaving the rest untouched (aka, filled with zero).
This incorrect compressed read merge leads to the above data corruption.
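Put differently, read merging has to be keyed on the extent map itself, not just on the disk bytenr. A minimal sketch of the intended rule (simplified from the fix below; last_em_start tracks the start of the last extent map used by this read, with U64_MAX meaning "no progress yet"):

  if (compress_type != BTRFS_COMPRESS_NONE &&
      last_em_start != U64_MAX &&
      last_em_start != em->start)
          force_bio_submit = true;   /* different compressed extent map: start a new bio */
  last_em_start = em->start;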
A similar problem happened in the past; commit 808f80b46790 ("Btrfs: update fix for read corruption of compressed and shared extents") does pretty much the same fix for readahead.
But that was back in 2015, when btrfs only supported the bs (block size) == ps (page size) case. This meant btrfs_do_readpage() only needed to handle a folio which contains exactly one block.
Back then, only btrfs_readahead() could lead to a read covering multiple blocks, thus only btrfs_readahead() passed a non-NULL @prev_em_start pointer.
With the v5.15 kernel, btrfs introduced bs < ps support. This broke the above assumption that a folio can only contain one block.
Now btrfs_read_folio() can also read multiple blocks in one go. But btrfs_read_folio() doesn't pass a @prev_em_start pointer, thus the existing bio force submission check will never be triggered.
In theory, this can also happen for btrfs with large folios, but since large folio support is still experimental we don't need to bother with it, thus only bs < ps setups are affected for now.
[FIX]
Instead of passing @prev_em_start to do the proper compressed extent check, introduce one new member, btrfs_bio_ctrl::last_em_start, so that the existing bio force submission logic will always be triggered.
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana fdmanana@suse.com
Signed-off-by: Qu Wenruo wqu@suse.com
Signed-off-by: David Sterba dsterba@suse.com
[ Adjust context ]
Signed-off-by: Sasha Levin sashal@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/btrfs/extent_io.c | 40 ++++++++++++++++++++++++++++++----------
 1 file changed, 30 insertions(+), 10 deletions(-)
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -109,6 +109,24 @@ struct btrfs_bio_ctrl {
      */
     unsigned long submit_bitmap;
     struct readahead_control *ractl;
+
+    /*
+     * The start offset of the last used extent map by a read operation.
+     *
+     * This is for proper compressed read merge.
+     * U64_MAX means we are starting the read and have made no progress yet.
+     *
+     * The current btrfs_bio_is_contig() only uses disk_bytenr as
+     * the condition to check if the read can be merged with previous
+     * bio, which is not correct. E.g. two file extents pointing to the
+     * same extent but with different offset.
+     *
+     * So here we need to do extra checks to only merge reads that are
+     * covered by the same extent map.
+     * Just extent_map::start will be enough, as they are unique
+     * inside the same inode.
+     */
+    u64 last_em_start;
 };

 static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
@@ -955,7 +973,7 @@ static void btrfs_readahead_expand(struc
  * return 0 on success, otherwise return error
  */
 static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
-                             struct btrfs_bio_ctrl *bio_ctrl, u64 *prev_em_start)
+                             struct btrfs_bio_ctrl *bio_ctrl)
 {
     struct inode *inode = folio->mapping->host;
     struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
@@ -1066,12 +1084,11 @@ static int btrfs_do_readpage(struct foli
      * non-optimal behavior (submitting 2 bios for the same extent).
      */
     if (compress_type != BTRFS_COMPRESS_NONE &&
-        prev_em_start && *prev_em_start != (u64)-1 &&
-        *prev_em_start != em->start)
+        bio_ctrl->last_em_start != U64_MAX &&
+        bio_ctrl->last_em_start != em->start)
         force_bio_submit = true;

-    if (prev_em_start)
-        *prev_em_start = em->start;
+    bio_ctrl->last_em_start = em->start;

     free_extent_map(em);
     em = NULL;
@@ -1115,12 +1132,15 @@ int btrfs_read_folio(struct file *file,
     const u64 start = folio_pos(folio);
     const u64 end = start + folio_size(folio) - 1;
     struct extent_state *cached_state = NULL;
-    struct btrfs_bio_ctrl bio_ctrl = { .opf = REQ_OP_READ };
+    struct btrfs_bio_ctrl bio_ctrl = {
+        .opf = REQ_OP_READ,
+        .last_em_start = U64_MAX,
+    };
     struct extent_map *em_cached = NULL;
     int ret;

     btrfs_lock_and_flush_ordered_range(inode, start, end, &cached_state);
-    ret = btrfs_do_readpage(folio, &em_cached, &bio_ctrl, NULL);
+    ret = btrfs_do_readpage(folio, &em_cached, &bio_ctrl);
     unlock_extent(&inode->io_tree, start, end, &cached_state);

     free_extent_map(em_cached);
@@ -2391,7 +2411,8 @@ void btrfs_readahead(struct readahead_co
 {
     struct btrfs_bio_ctrl bio_ctrl = {
         .opf = REQ_OP_READ | REQ_RAHEAD,
-        .ractl = rac
+        .ractl = rac,
+        .last_em_start = U64_MAX,
     };
     struct folio *folio;
     struct btrfs_inode *inode = BTRFS_I(rac->mapping->host);
@@ -2399,12 +2420,11 @@ void btrfs_readahead(struct readahead_co
     const u64 end = start + readahead_length(rac) - 1;
     struct extent_state *cached_state = NULL;
     struct extent_map *em_cached = NULL;
-    u64 prev_em_start = (u64)-1;

     btrfs_lock_and_flush_ordered_range(inode, start, end, &cached_state);

     while ((folio = readahead_folio(rac)) != NULL)
-        btrfs_do_readpage(folio, &em_cached, &bio_ctrl, &prev_em_start);
+        btrfs_do_readpage(folio, &em_cached, &bio_ctrl);
unlock_extent(&inode->io_tree, start, end, &cached_state);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiongfeng Wang wangxiongfeng2@huawei.com
commit e895f8e29119c8c966ea794af9e9100b10becb88 upstream.
When testing softirq based hrtimers on an ARM32 board, with high resolution mode and NOHZ inactive, softirq based hrtimers fail to expire after being moved away from an offline CPU:
CPU0                                    CPU1
                                        hrtimer_start(..., HRTIMER_MODE_SOFT);
cpu_down(CPU1)
  ...
  hrtimers_cpu_dying()
    // Migrate timers to CPU0
    smp_call_function_single(CPU0, retrigger_next_event);

retrigger_next_event()
  if (!highres && !nohz)
    return;
As retrigger_next_event() is a NOOP when both high resolution timers and NOHZ are inactive, CPU0's hrtimer_cpu_base::softirq_expires_next is not updated and the migrated softirq timers never expire unless there is a softirq-based hrtimer queued on CPU0 later.
Fix this by removing the hrtimer_hres_active() and tick_nohz_active() check in retrigger_next_event(), which enforces a full update of the CPU base. As this is not a fast path the extra cost does not matter.
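For reference, the offending early return looked roughly like this (abridged sketch derived from the code removed in the diff below):

  static void retrigger_next_event(void *arg)
  {
          struct hrtimer_cpu_base *base = this_cpu_ptr(&hrtimer_bases);

          /* In periodic low resolution mode this returned before
           * softirq_expires_next could be updated for the timers
           * just migrated from the dead CPU. */
          if (!hrtimer_hres_active(base) && !tick_nohz_active)
                  return;
          /* ... reprogramming under base->lock ... */
  }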
[ tglx: Massaged change log ]
Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Co-developed-by: Frederic Weisbecker frederic@kernel.org
Signed-off-by: Frederic Weisbecker frederic@kernel.org
Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Link: https://lore.kernel.org/all/20250805081025.54235-1-wangxiongfeng2@huawei.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/time/hrtimer.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -798,10 +798,10 @@ static void retrigger_next_event(void *a
      * of the next expiring timer is enough. The return from the SMP
      * function call will take care of the reprogramming in case the
      * CPU was in a NOHZ idle sleep.
+     *
+     * In periodic low resolution mode, the next softirq expiration
+     * must also be updated.
      */
-    if (!hrtimer_hres_active(base) && !tick_nohz_active)
-        return;
-
     raw_spin_lock(&base->lock);
     hrtimer_update_base(base);
     if (hrtimer_hres_active(base))
@@ -2286,11 +2286,6 @@ int hrtimers_cpu_dying(unsigned int dyin
                          &new_base->clock_base[i]);
     }

-    /*
-     * The migration might have changed the first expiring softirq
-     * timer on this CPU. Update it.
-     */
-    __hrtimer_get_next_event(new_base, HRTIMER_ACTIVE_SOFT);
     /* Tell the other CPU to retrigger the next event */
     smp_call_function_single(ncpu, retrigger_next_event, NULL, 0);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff LaBundy jeff@labundy.com
commit c9ddc41cdd522f2db5d492eda3df8994d928be34 upstream.
If a proximity event node is defined so as to specify the wake-up properties of the touch surface, the proximity event interrupt is enabled unconditionally. This may result in unwanted interrupts.
Solve this problem by enabling the interrupt only if the event is mapped to a key or switch code.
Signed-off-by: Jeff LaBundy jeff@labundy.com
Link: https://lore.kernel.org/r/aKJxxgEWpNaNcUaW@nixie71
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/input/misc/iqs7222.c | 3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/input/misc/iqs7222.c
+++ b/drivers/input/misc/iqs7222.c
@@ -2430,6 +2430,9 @@ static int iqs7222_parse_chan(struct iqs
         if (error)
             return error;

+        if (!iqs7222->kp_type[chan_index][i])
+            continue;
+
         if (!dev_desc->event_offset)
             continue;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoffer Sandberg cs@tuxedo.de
commit 1939a9fcb80353dd8b111aa1e79c691afbde08b4 upstream.
The device occasionally wakes up from suspend with missing input on the internal keyboard. Setting the quirks appears to fix the issue for this device as well.
Signed-off-by: Christoffer Sandberg cs@tuxedo.de
Signed-off-by: Werner Sembach wse@tuxedocomputers.com
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250826142646.13516-1-wse@tuxedocomputers.com
Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/input/serio/i8042-acpipnpio.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

--- a/drivers/input/serio/i8042-acpipnpio.h
+++ b/drivers/input/serio/i8042-acpipnpio.h
@@ -1155,6 +1155,20 @@ static const struct dmi_system_id i8042_
         .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS |
                     SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP)
     },
+    {
+        .matches = {
+            DMI_MATCH(DMI_BOARD_NAME, "XxHP4NAx"),
+        },
+        .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS |
+                    SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP)
+    },
+    {
+        .matches = {
+            DMI_MATCH(DMI_BOARD_NAME, "XxKK4NAx_XxSP4NAx"),
+        },
+        .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS |
+                    SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP)
+    },
     /*
      * A lot of modern Clevo barebones have touchpad and/or keyboard issues
      * after suspend fixable with the forcenorestore quirk.
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit 63a796558bc22ec699e4193d5c75534757ddf2e6 upstream.
This reverts commit 5537a4679403 ("net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups"); it breaks operation of the ASIX Ethernet USB dongle after a system suspend-resume cycle.
Link: https://lore.kernel.org/all/b5ea8296-f981-445d-a09a-2f389d7f6fdd@samsung.com...
Fixes: 5537a4679403 ("net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups")
Reported-by: Marek Szyprowski m.szyprowski@samsung.com
Acked-by: Jakub Kicinski kuba@kernel.org
Link: https://patch.msgid.link/2945b9dbadb8ee1fee058b19554a5cb14f1763c1.1757601118...
Signed-off-by: Paolo Abeni pabeni@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/usb/asix_devices.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

--- a/drivers/net/usb/asix_devices.c
+++ b/drivers/net/usb/asix_devices.c
@@ -607,8 +607,15 @@ static const struct net_device_ops ax887

 static void ax88772_suspend(struct usbnet *dev)
 {
+    struct asix_common_private *priv = dev->driver_priv;
     u16 medium;

+    if (netif_running(dev->net)) {
+        rtnl_lock();
+        phylink_suspend(priv->phylink, false);
+        rtnl_unlock();
+    }
+
     /* Stop MAC operation */
     medium = asix_read_medium_status(dev, 1);
     medium &= ~AX_MEDIUM_RE;
@@ -637,6 +644,12 @@ static void ax88772_resume(struct usbnet
         for (i = 0; i < 3; i++)
             if (!priv->reset(dev, 1))
                 break;
+
+    if (netif_running(dev->net)) {
+        rtnl_lock();
+        phylink_resume(priv->phylink);
+        rtnl_unlock();
+    }
 }

 static int asix_resume(struct usb_interface *intf)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fabian Vogt fvogt@suse.de
commit cfd956dcb101aa3d25bac321fae923323a47c607 upstream.
After hvc_write completes, call hvc_kick also in the case the output buffer has been drained, to ensure tty_wakeup gets called.
This fixes functions that wait for a drained buffer occasionally getting stuck.
Cc: stable stable@kernel.org
Closes: https://bugzilla.opensuse.org/show_bug.cgi?id=1230062
Signed-off-by: Fabian Vogt fvogt@suse.de
Link: https://lore.kernel.org/r/2011735.PYKUYFuaPT@fvogt-thinkpad
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/tty/hvc/hvc_console.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -543,10 +543,10 @@ static ssize_t hvc_write(struct tty_stru
     }

     /*
-     * Racy, but harmless, kick thread if there is still pending data.
+     * Kick thread to flush if there's still pending data
+     * or to wakeup the write queue.
      */
-    if (hp->n_outbuf)
-        hvc_kick();
+    hvc_kick();

     return written;
 }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
commit 535fd4c98452c87537a40610abba45daf5761ec6 upstream.
When trying to set MCR[2], XON1 is incorrectly accessed instead. And when writing to the TCR register to configure flow control levels, we are incorrectly writing to the MSR register. The default value of $00 is then used for TCR, which means that selectable trigger levels in FCR are used in place of TCR.
TCR/TLR access requires EFR[4] (enable enhanced functions) and MCR[2] to be set. EFR[4] is already set in probe().
MCR access requires LCR[7] to be zero.
Since LCR is set to $BF when trying to set MCR[2], XON1 is incorrectly accessed instead because MCR shares the same address space as XON1.
Since MCR[2] is unmodified and still zero, when writing to TCR we are in fact writing to MSR because TCR/TLR registers share the same address space as MSR/SPR.
Fix by first removing useless reconfiguration of EFR[4] (enable enhanced functions), as it is already enabled in sc16is7xx_probe() since commit 43c51bb573aa ("sc16is7xx: make sure device is in suspend once probed"). Now LCR is $00, which means that MCR access is enabled.
Also remove regcache_cache_bypass() calls since we no longer access the enhanced registers set, and TCR is already declared as volatile (in fact by declaring MSR as volatile, which shares the same address).
Finally disable access to TCR/TLR registers after modifying them by clearing MCR[2].
Note: the comment about "... and internal clock div" is wrong and can be ignored/removed, as access to the internal clock divider registers (DLL/DLH) is permitted only when LCR[7] is logic 1, not when enhanced features are enabled. And DLL/DLH access is not needed in sc16is7xx_startup().
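Putting the gating rules together, the corrected TCR write sequence looks roughly like the sketch below. It is built from the driver helpers used in the diff that follows; the SC16IS7XX_TCR_REG target of the write is an assumption based on the datasheet register naming:

  /* LCR[7] is 0 and EFR[4] was already set in probe(), so MCR is accessible. */
  sc16is7xx_port_update(port, SC16IS7XX_MCR_REG,
                        SC16IS7XX_MCR_TCRTLR_BIT,
                        SC16IS7XX_MCR_TCRTLR_BIT);    /* open the TCR/TLR window */

  sc16is7xx_port_write(port, SC16IS7XX_TCR_REG,       /* really hits TCR now */
                       SC16IS7XX_TCR_RX_RESUME(24) |
                       SC16IS7XX_TCR_RX_HALT(48));

  sc16is7xx_port_update(port, SC16IS7XX_MCR_REG,
                        SC16IS7XX_MCR_TCRTLR_BIT, 0); /* close the window again */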
Fixes: dfeae619d781 ("serial: sc16is7xx")
Cc: stable@vger.kernel.org
Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com
Link: https://lore.kernel.org/r/20250731124451.1108864-1-hugo@hugovil.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/tty/serial/sc16is7xx.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -1161,17 +1161,6 @@ static int sc16is7xx_startup(struct uart
     sc16is7xx_port_write(port, SC16IS7XX_FCR_REG, SC16IS7XX_FCR_FIFO_BIT);

-    /* Enable EFR */
-    sc16is7xx_port_write(port, SC16IS7XX_LCR_REG,
-                 SC16IS7XX_LCR_CONF_MODE_B);
-
-    regcache_cache_bypass(one->regmap, true);
-
-    /* Enable write access to enhanced features and internal clock div */
-    sc16is7xx_port_update(port, SC16IS7XX_EFR_REG,
-                  SC16IS7XX_EFR_ENABLE_BIT,
-                  SC16IS7XX_EFR_ENABLE_BIT);
-
     /* Enable TCR/TLR */
     sc16is7xx_port_update(port, SC16IS7XX_MCR_REG,
                   SC16IS7XX_MCR_TCRTLR_BIT,
@@ -1183,7 +1172,8 @@ static int sc16is7xx_startup(struct uart
                   SC16IS7XX_TCR_RX_RESUME(24) |
                   SC16IS7XX_TCR_RX_HALT(48));

-    regcache_cache_bypass(one->regmap, false);
+    /* Disable TCR/TLR access */
+    sc16is7xx_port_update(port, SC16IS7XX_MCR_REG, SC16IS7XX_MCR_TCRTLR_BIT, 0);

     /* Now, initialize the UART */
     sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, SC16IS7XX_LCR_WORD_LEN_8);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit ee047e1d85d73496541c54bd4f432c9464e13e65 upstream.
Lists should have fixed constraints, because a binding must be specific with respect to the hardware; thus add the missing constraint on the number of clocks.
Cc: stable stable@kernel.org
Fixes: 88a499cd70d4 ("dt-bindings: Add support for the Broadcom UART driver")
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
Link: https://lore.kernel.org/r/20250812121630.67072-2-krzysztof.kozlowski@linaro....
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/devicetree/bindings/serial/brcm,bcm7271-uart.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/Documentation/devicetree/bindings/serial/brcm,bcm7271-uart.yaml
+++ b/Documentation/devicetree/bindings/serial/brcm,bcm7271-uart.yaml
@@ -41,7 +41,7 @@ properties:
       - const: dma_intr2

   clocks:
-    minItems: 1
+    maxItems: 1

   clock-names:
     const: sw_baud
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fabio Porcedda fabio.porcedda@gmail.com
commit cba70aff623b104085ab5613fedd21f6ea19095a upstream.
Add the following Telit Cinterion FN990A w/audio compositions:
0x1077: tty (diag) + adb + rmnet + audio + tty (AT/NMEA) + tty (AT) + tty (AT) + tty (AT)

T: Bus=01 Lev=01 Prnt=01 Port=09 Cnt=01 Dev#= 8 Spd=480 MxCh= 0
D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1077 Rev=05.04
S: Manufacturer=Telit Wireless Solutions
S: Product=FN990
S: SerialNumber=67e04c35
C: #Ifs=10 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E: Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
E: Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 3 Alt= 0 #EPs= 0 Cls=01(audio) Sub=01 Prot=20 Driver=snd-usb-audio
I: If#= 4 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E: Ad=03(O) Atr=0d(Isoc) MxPS= 68 Ivl=1ms
I: If#= 5 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E: Ad=84(I) Atr=0d(Isoc) MxPS= 68 Ivl=1ms
I: If#= 6 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 7 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=88(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8a(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 9 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8b(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8c(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
0x1078: tty (diag) + adb + MBIM + audio + tty (AT/NMEA) + tty (AT) + tty (AT) + tty (AT)

T: Bus=01 Lev=01 Prnt=01 Port=09 Cnt=01 Dev#= 21 Spd=480 MxCh= 0
D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1078 Rev=05.04
S: Manufacturer=Telit Wireless Solutions
S: Product=FN990
S: SerialNumber=67e04c35
C: #Ifs=11 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#=10 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8b(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8c(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 2 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim
E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=32ms
I: If#= 3 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
E: Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 4 Alt= 0 #EPs= 0 Cls=01(audio) Sub=01 Prot=20 Driver=snd-usb-audio
I: If#= 5 Alt= 0 #EPs= 0 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
I: If#= 6 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E: Ad=84(I) Atr=0d(Isoc) MxPS= 68 Ivl=1ms
I: If#= 7 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=88(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 9 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8a(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
0x1079: RNDIS + tty (diag) + adb + audio + tty (AT/NMEA) + tty (AT) + tty (AT) + tty (AT)

T: Bus=01 Lev=01 Prnt=01 Port=09 Cnt=01 Dev#= 23 Spd=480 MxCh= 0
D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1079 Rev=05.04
S: Manufacturer=Telit Wireless Solutions
S: Product=FN990
S: SerialNumber=67e04c35
C: #Ifs=11 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=ef(misc ) Sub=04 Prot=01 Driver=rndis_host
E: Ad=81(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
I: If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host
E: Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#=10 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8b(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8c(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 4 Alt= 0 #EPs= 0 Cls=01(audio) Sub=01 Prot=20 Driver=snd-usb-audio
I: If#= 5 Alt= 0 #EPs= 0 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
I: If#= 6 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E: Ad=84(I) Atr=0d(Isoc) MxPS= 68 Ivl=1ms
I: If#= 7 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=88(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 9 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=8a(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
Cc: stable@vger.kernel.org
Signed-off-by: Fabio Porcedda fabio.porcedda@gmail.com
Signed-off-by: Johan Hovold johan@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/serial/option.c | 6 ++++++
 1 file changed, 6 insertions(+)

--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -1369,6 +1369,12 @@ static const struct usb_device_id option
       .driver_info = NCTRL(0) | RSVD(1) },
     { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1075, 0xff),    /* Telit FN990A (PCIe) */
       .driver_info = RSVD(0) },
+    { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1077, 0xff),    /* Telit FN990A (rmnet + audio) */
+      .driver_info = NCTRL(0) | RSVD(1) | RSVD(2) },
+    { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1078, 0xff),    /* Telit FN990A (MBIM + audio) */
+      .driver_info = NCTRL(0) | RSVD(1) },
+    { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1079, 0xff),    /* Telit FN990A (RNDIS + audio) */
+      .driver_info = NCTRL(2) | RSVD(3) },
     { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1080, 0xff),    /* Telit FE990A (rmnet) */
       .driver_info = NCTRL(0) | RSVD(1) | RSVD(2) },
     { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1081, 0xff),    /* Telit FE990A (MBIM) */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fabio Porcedda fabio.porcedda@gmail.com
commit a5a261bea9bf8444300d1067b4a73bedee5b5227 upstream.
Add the following Telit Cinterion LE910C4-WWX new compositions:
0x1034: tty (AT) + tty (AT) + rmnet

T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 8 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1034 Rev=00.00
S: Manufacturer=Telit
S: Product=LE910C4-WWX
S: SerialNumber=93f617e7
C: #Ifs= 3 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
0x1036: tty (AT) + tty (AT)

T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 10 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1036 Rev=00.00
S: Manufacturer=Telit
S: Product=LE910C4-WWX
S: SerialNumber=93f617e7
C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
0x1037: tty (diag) + tty (Telit custom) + tty (AT) + tty (AT) + rmnet

T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 15 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1037 Rev=00.00
S: Manufacturer=Telit
S: Product=LE910C4-WWX
S: SerialNumber=93f617e7
C: #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
0x1038: tty (Telit custom) + tty (AT) + tty (AT) + rmnet

T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 9 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=1038 Rev=00.00
S: Manufacturer=Telit
S: Product=LE910C4-WWX
S: SerialNumber=93f617e7
C: #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=84(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=86(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
0x103b: tty (diag) + tty (Telit custom) + tty (AT) + tty (AT)

T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 10 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=103b Rev=00.00
S: Manufacturer=Telit
S: Product=LE910C4-WWX
S: SerialNumber=93f617e7
C: #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
0x103c: tty (Telit custom) + tty (AT) + tty (AT)

T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 11 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=103c Rev=00.00
S: Manufacturer=Telit
S: Product=LE910C4-WWX
S: SerialNumber=93f617e7
C: #Ifs= 3 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=84(I) Atr=03(Int.) MxPS= 64 Ivl=2ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
Cc: stable@vger.kernel.org
Signed-off-by: Fabio Porcedda fabio.porcedda@gmail.com
Signed-off-by: Johan Hovold johan@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/serial/option.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -1322,7 +1322,18 @@ static const struct usb_device_id option
       .driver_info = NCTRL(0) | RSVD(3) },
     { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1033, 0xff),    /* Telit LE910C1-EUX (ECM) */
       .driver_info = NCTRL(0) },
+    { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1034, 0xff),    /* Telit LE910C4-WWX (rmnet) */
+      .driver_info = RSVD(2) },
     { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1035, 0xff) },  /* Telit LE910C4-WWX (ECM) */
+    { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1036, 0xff) },  /* Telit LE910C4-WWX */
+    { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1037, 0xff),    /* Telit LE910C4-WWX (rmnet) */
+      .driver_info = NCTRL(0) | NCTRL(1) | RSVD(4) },
+    { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1038, 0xff),    /* Telit LE910C4-WWX (rmnet) */
+      .driver_info = NCTRL(0) | RSVD(3) },
+    { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x103b, 0xff),    /* Telit LE910C4-WWX */
+      .driver_info = NCTRL(0) | NCTRL(1) },
+    { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x103c, 0xff),    /* Telit LE910C4-WWX */
+      .driver_info = NCTRL(0) },
     { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE922_USBCFG0),
       .driver_info = RSVD(0) | RSVD(1) | NCTRL(2) | RSVD(3) },
     { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE922_USBCFG1),
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Torvalds torvalds@linux-foundation.org
[ Upstream commit 6f110a5e4f9977c31ce76fefbfef6fd4eab6bfb7 ]
... and don't error out so hard on missing module descriptions.
Before commit 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") we used to warn about missing module descriptions, but only when building with extra warnings (i.e. 'W=1').
After that commit the warning became an unconditional hard error.
And it turns out not all modules have been converted despite the claims to the contrary. As reported by Damian Tometzki, the slub KUnit test didn't have a module description, and apparently nobody ever really noticed.
The reason nobody noticed seems to be that the slub KUnit tests get disabled by SLUB_TINY, which also ends up disabling a lot of other code, both in tests and in slub itself. And so anybody doing full build tests didn't actually see this failure.
So let's disable SLUB_TINY for build-only tests, since it clearly ends up limiting build coverage. Also turn the missing module descriptions error back into a warning, but let's keep it around for non-'W=1' builds.
Reported-by: Damian Tometzki damian@riscv-rocks.de
Link: https://lore.kernel.org/all/01070196099fd059-e8463438-7b1b-4ec8-816d-173874b...
Cc: Masahiro Yamada masahiroy@kernel.org
Cc: Jeff Johnson jeff.johnson@oss.qualcomm.com
Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()")
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 mm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 33fa51d608dc5..59c36bb9ce6b0 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -244,7 +244,7 @@ config SLUB

 config SLUB_TINY
     bool "Configure for minimal memory footprint"
-    depends on EXPERT
+    depends on EXPERT && !COMPILE_TEST
     select SLAB_MERGE_DEFAULT
     help
       Configures the slab allocator in a way to achieve minimal memory
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chia-I Wu olvaffe@gmail.com
[ Upstream commit a00f2015acdbd8a4b3d2382eaeebe11db1925fad ]
A panthor group can have at most MAX_CS_PER_CSG panthor queues.
Fixes: 4bdca11507928 ("drm/panthor: Add the driver frontend block")
Signed-off-by: Chia-I Wu olvaffe@gmail.com
Reviewed-by: Boris Brezillon boris.brezillon@collabora.com # v1
Reviewed-by: Steven Price steven.price@arm.com
Signed-off-by: Steven Price steven.price@arm.com
Link: https://lore.kernel.org/r/20250903192133.288477-1-olvaffe@gmail.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/panthor/panthor_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index c520f156e2d73..03eb7d52209a2 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1023,7 +1023,7 @@ static int panthor_ioctl_group_create(struct drm_device *ddev, void *data,
     struct drm_panthor_queue_create *queue_args;
     int ret;

-    if (!args->queues.count)
+    if (!args->queues.count || args->queues.count > MAX_CS_PER_CSG)
         return -EINVAL;

     ret = PANTHOR_UOBJ_GET_ARRAY(queue_args, &args->queues);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Wahren wahrenst@gmx.net
[ Upstream commit 03e79de4608bdd48ad6eec272e196124cefaf798 ]
The function of_phy_find_device() may return NULL, so we need to take care before dereferencing phy_dev.
Fixes: 64a632da538a ("net: fec: Fix phy_device lookup for phy_reset_after_clk_enable()")
Signed-off-by: Stefan Wahren wahrenst@gmx.net
Cc: Christoph Niedermaier cniedermaier@dh-electronics.com
Cc: Richard Leitner richard.leitner@skidata.com
Reviewed-by: Simon Horman horms@kernel.org
Reviewed-by: Wei Fang wei.fang@nxp.com
Link: https://patch.msgid.link/20250904091334.53965-1-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/freescale/fec_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index a1cc338cf20f3..0bd814251d56e 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2356,7 +2356,8 @@ static void fec_enet_phy_reset_after_clk_enable(struct net_device *ndev)
          */
         phy_dev = of_phy_find_device(fep->phy_node);
         phy_reset_after_clk_enable(phy_dev);
-        put_device(&phy_dev->mdio.dev);
+        if (phy_dev)
+            put_device(&phy_dev->mdio.dev);
     }
 }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alok Tiwari alok.a.tiwari@oracle.com
[ Upstream commit 1dbfb0363224f6da56f6655d596dc5097308d6f5 ]
Per family bind/unbind callbacks were introduced to allow families to track multicast group consumer presence, e.g. to start or stop producing events depending on listeners.
However, in genl_bind() the bind() callback was invoked even if capability checks failed and ret was set to -EPERM. This means that callbacks could run on behalf of unauthorized callers while the syscall still returned failure to user space.
Fix this by only invoking bind() after the "if (ret) break;" check, i.e. after the permission checks have succeeded.
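For context, a family opts into these notifications via the callbacks in its struct genl_family. A hypothetical consumer might look like the sketch below (names invented; callback signature as introduced by the commit referenced in the Fixes tag):

  static void my_family_bind(int mcgrp)
  {
          /* With this fix, only reached after genl_bind()'s
           * permission checks have succeeded. */
          if (mcgrp == MY_MCGRP_EVENTS)
                  my_family_start_events();
  }

  static struct genl_family my_family = {
          .name = "my_family",
          .bind = my_family_bind,
          /* ... ops, policy, module, etc. ... */
  };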
Fixes: 3de21a8990d3 ("genetlink: Add per family bind/unbind callbacks")
Signed-off-by: Alok Tiwari alok.a.tiwari@oracle.com
Link: https://patch.msgid.link/20250905135731.3026965-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/netlink/genetlink.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 07ad65774fe29..3327d84518141 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -1836,6 +1836,9 @@ static int genl_bind(struct net *net, int group)
             !ns_capable(net->user_ns, CAP_SYS_ADMIN))
             ret = -EPERM;

+        if (ret)
+            break;
+
         if (family->bind)
             family->bind(i);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Petr Machata petrm@nvidia.com
[ Upstream commit 8625f5748fea960d2af4f3c3e9891ee8f6f80906 ]
The bridge driver currently tolerates options that it does not recognize. Instead, it should bounce them.
Fixes: a428afe82f98 ("net: bridge: add support for user-controlled bool options")
Signed-off-by: Petr Machata petrm@nvidia.com
Reviewed-by: Ido Schimmel idosch@nvidia.com
Acked-by: Nikolay Aleksandrov razor@blackwall.org
Link: https://patch.msgid.link/e6fdca3b5a8d54183fbda075daffef38bdd7ddce.1757070067...
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/bridge/br.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/bridge/br.c b/net/bridge/br.c
index 2cab878e0a39c..ed08717541fe7 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -312,6 +312,13 @@ int br_boolopt_multi_toggle(struct net_bridge *br,
     int err = 0;
     int opt_id;

+    opt_id = find_next_bit(&bitmap, BITS_PER_LONG, BR_BOOLOPT_MAX);
+    if (opt_id != BITS_PER_LONG) {
+        NL_SET_ERR_MSG_FMT_MOD(extack, "Unknown boolean option %d",
+                       opt_id);
+        return -EINVAL;
+    }
+
     for_each_set_bit(opt_id, &bitmap, BR_BOOLOPT_MAX) {
         bool on = !!(bm->optval & BIT(opt_id));
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Antoine Tenart atenart@kernel.org
[ Upstream commit e3c674db356c4303804b2415e7c2b11776cdd8c3 ]
If a GSO skb is sent through a Geneve tunnel and Geneve options are added, the split GSO skb might not fit in the MTU anymore and an ICMP frag needed packet can be generated. In such a case the ICMP packet might go through the segmentation logic (and be dropped) later if it reaches a path where the GSO status is checked and segmentation is required.
This is especially true when an OvS bridge is used with a Geneve tunnel attached to it. The following set of actions could lead to the ICMP packet being wrongfully segmented:
1. An skb is constructed by the TCP layer (e.g. gso_type SKB_GSO_TCPV4, segs >= 2).
2. The skb hits the OvS bridge where Geneve options are added by an OvS action before being sent through the tunnel.
3. When the skb is xmited in the tunnel, the split skb does not fit anymore in the MTU and iptunnel_pmtud_build_icmp is called to generate an ICMP fragmentation needed packet. This is done by reusing the original (GSO!) skb. The GSO metadata is not cleared.
4. The ICMP packet being sent back hits the OvS bridge again and because skb_is_gso returns true, it goes through queue_gso_packets...
5. ...where __skb_gso_segment is called. The skb is then dropped.
6. Note that in the above example on re-transmission the skb won't be a GSO one as it would be segmented (len > MSS) and the ICMP packet should go through.
Fix this by resetting the GSO information before reusing an skb in iptunnel_pmtud_build_icmp and iptunnel_pmtud_build_icmpv6.
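For the curious: skb_gso_reset() clears the shared-info GSO fields so that skb_is_gso() stops returning true for the reused skb. Its effect is roughly the following (a sketch of what the helper does, not a verbatim copy):

  /* Approximate effect of skb_gso_reset(skb): */
  skb_shinfo(skb)->gso_size = 0;   /* skb_is_gso() tests this field */
  skb_shinfo(skb)->gso_segs = 0;
  skb_shinfo(skb)->gso_type = 0;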
Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets")
Reported-by: Adrian Moreno amorenoz@redhat.com
Signed-off-by: Antoine Tenart atenart@kernel.org
Reviewed-by: Stefano Brivio sbrivio@redhat.com
Link: https://patch.msgid.link/20250904125351.159740-1-atenart@kernel.org
Signed-off-by: Paolo Abeni pabeni@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/ipv4/ip_tunnel_core.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index f65d2f7273813..8392d304a72eb 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -204,6 +204,9 @@ static int iptunnel_pmtud_build_icmp(struct sk_buff *skb, int mtu)
     if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct iphdr)))
         return -EINVAL;

+    if (skb_is_gso(skb))
+        skb_gso_reset(skb);
+
     skb_copy_bits(skb, skb_mac_offset(skb), &eh, ETH_HLEN);
     pskb_pull(skb, ETH_HLEN);
     skb_reset_network_header(skb);
@@ -298,6 +301,9 @@ static int iptunnel_pmtud_build_icmpv6(struct sk_buff *skb, int mtu)
     if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct ipv6hdr)))
         return -EINVAL;

+    if (skb_is_gso(skb))
+        skb_gso_reset(skb);
+
     skb_copy_bits(skb, skb_mac_offset(skb), &eh, ETH_HLEN);
     pskb_pull(skb, ETH_HLEN);
     skb_reset_network_header(skb);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Tran alex.t.tran@gmail.com
[ Upstream commit 641427d5bf90af0625081bf27555418b101274cd ]
The documentation of the 'bcm_msg_head' struct does not match how it is defined in 'bcm.h'. Change the frames member to a flexible array, matching the definition in the header file.
See commit 94dfc73e7cf4 ("treewide: uapi: Replace zero-length arrays with flexible-array members")
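As a reminder of what the flexible-array form implies for user space: the member contributes no size of its own, so the frames must be allocated right behind the header. A minimal, hypothetical allocation sketch:

  #include <stdlib.h>
  #include <linux/can.h>
  #include <linux/can/bcm.h>

  /* Allocate a bcm_msg_head followed by n CAN frames in one block;
   * sizeof(*msg) does not include the flexible "frames" member. */
  static struct bcm_msg_head *bcm_msg_alloc(unsigned int n)
  {
          struct bcm_msg_head *msg;

          msg = calloc(1, sizeof(*msg) + n * sizeof(struct can_frame));
          if (msg)
                  msg->nframes = n;
          return msg;
  }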
Signed-off-by: Alex Tran alex.t.tran@gmail.com
Acked-by: Oliver Hartkopp socketcan@hartkopp.net
Link: https://patch.msgid.link/20250904031709.1426895-1-alex.t.tran@gmail.com
Fixes: 94dfc73e7cf4 ("treewide: uapi: Replace zero-length arrays with flexible-array members")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217783
Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 Documentation/networking/can.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/networking/can.rst b/Documentation/networking/can.rst
index 62519d38c58ba..58cc609e8669b 100644
--- a/Documentation/networking/can.rst
+++ b/Documentation/networking/can.rst
@@ -742,7 +742,7 @@ The broadcast manager sends responses to user space in the same form:
             struct timeval ival1, ival2;    /* count and subsequent interval */
             canid_t can_id;                 /* unique can_id for task */
             __u32 nframes;                  /* number of can_frames following */
-            struct can_frame frames[0];
+            struct can_frame frames[];
     };
The aligned payload 'frames' uses the same basic CAN frame structure defined
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kohei Enju enjuk@amazon.com
[ Upstream commit d709f178abca22a4d3642513df29afe4323a594b ]
The igb driver incorrectly skips the link test when the network interface is admin down (if_running == false), causing the test to always report PASS regardless of the actual physical link state.
This behavior is inconsistent with other drivers (e.g. i40e, ice, ixgbe, etc.) which correctly test the physical link state regardless of admin state. Remove the if_running check to ensure link test always reflects the physical link state.
Fixes: 8d420a1b3ea6 ("igb: correct link test not being run when link is down")
Signed-off-by: Kohei Enju enjuk@amazon.com
Reviewed-by: Paul Menzel pmenzel@molgen.mpg.de
Tested-by: Rinitha S sx.rinitha@intel.com (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index ca6ccbc139548..6412c84e2d17d 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2081,11 +2081,8 @@ static void igb_diag_test(struct net_device *netdev,
     } else {
         dev_info(&adapter->pdev->dev, "online testing starting\n");

-        /* PHY is powered down when interface is down */
-        if (if_running && igb_link_test(adapter, &data[TEST_LINK]))
+        if (igb_link_test(adapter, &data[TEST_LINK]))
             eth_test->flags |= ETH_TEST_FL_FAILED;
-        else
-            data[TEST_LINK] = 0;

         /* Online tests aren't run; pass by default */
         data[TEST_REG] = 0;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Schmidt mschmidt@redhat.com
[ Upstream commit 915470e1b44e71d1dd07ee067276f003c3521ee3 ]
If request_irq() in i40e_vsi_request_irq_msix() fails in an iteration later than the first, the error path wants to free the IRQs requested so far. However, it uses the wrong dev_id argument for free_irq(), so it does not free the IRQs correctly and instead triggers the warning:
Trying to free already-free IRQ 173
WARNING: CPU: 25 PID: 1091 at kernel/irq/manage.c:1829 __free_irq+0x192/0x2c0
Modules linked in: i40e(+) [...]
CPU: 25 UID: 0 PID: 1091 Comm: NetworkManager Not tainted 6.17.0-rc1+ #1 PREEMPT(lazy)
Hardware name: [...]
RIP: 0010:__free_irq+0x192/0x2c0
[...]
Call Trace:
 <TASK>
 free_irq+0x32/0x70
 i40e_vsi_request_irq_msix.cold+0x63/0x8b [i40e]
 i40e_vsi_request_irq+0x79/0x80 [i40e]
 i40e_vsi_open+0x21f/0x2f0 [i40e]
 i40e_open+0x63/0x130 [i40e]
 __dev_open+0xfc/0x210
 __dev_change_flags+0x1fc/0x240
 netif_change_flags+0x27/0x70
 do_setlink.isra.0+0x341/0xc70
 rtnl_newlink+0x468/0x860
 rtnetlink_rcv_msg+0x375/0x450
 netlink_rcv_skb+0x5c/0x110
 netlink_unicast+0x288/0x3c0
 netlink_sendmsg+0x20d/0x430
 ____sys_sendmsg+0x3a2/0x3d0
 ___sys_sendmsg+0x99/0xe0
 __sys_sendmsg+0x8a/0xf0
 do_syscall_64+0x82/0x2c0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
 [...]
 </TASK>
---[ end trace 0000000000000000 ]---
Use the same dev_id for free_irq() as for request_irq().
I tested this with inserting code to fail intentionally.
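The general pattern, sketched with invented names ("md_*" is hypothetical, not the i40e code): free_irq() must receive the exact dev_id cookie that request_irq() was given, otherwise the kernel cannot find the matching irqaction and warns as above.

  static int md_request_vectors(struct md_dev *md, int n)
  {
          int vector, err;

          for (vector = 0; vector < n; vector++) {
                  err = request_irq(md->irq[vector], md_irq_handler, 0,
                                    "md", md->q_vectors[vector]);
                  if (err)
                          goto unwind;
          }
          return 0;

  unwind:
          while (--vector >= 0)   /* same dev_id cookie as request_irq() above */
                  free_irq(md->irq[vector], md->q_vectors[vector]);
          return err;
  }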
Fixes: 493fb30011b3 ("i40e: Move q_vectors from pointer to array to array of pointers")
Signed-off-by: Michal Schmidt mschmidt@redhat.com
Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com
Reviewed-by: Subbaraya Sundeep sbhatta@marvell.com
Tested-by: Rinitha S sx.rinitha@intel.com (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 55fb362eb5081..037c1a0cbd6a8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -4206,7 +4206,7 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename)
         irq_num = pf->msix_entries[base + vector].vector;
         irq_set_affinity_notifier(irq_num, NULL);
         irq_update_affinity_hint(irq_num, NULL);
-        free_irq(irq_num, &vsi->q_vectors[vector]);
+        free_irq(irq_num, vsi->q_vectors[vector]);
     }
     return err;
 }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
[ Upstream commit 1d66c3f2b8c0b5c51f3f4fe29b362c9851190c5a ]
This function can be called from an atomic context so we can't use fsleep().
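As a rule of thumb (see Documentation/timers/timers-howto.rst), sleeping primitives such as fsleep() may schedule and are therefore forbidden in atomic context, while udelay() busy-waits. A sketch of the rule:

  #include <linux/delay.h>

  /* Sketch only: pick a delay primitive based on execution context. */
  static void wait_usecs(unsigned long us, bool in_atomic_context)
  {
          if (in_atomic_context)
                  udelay(us);   /* busy-wait; safe under spinlocks / IRQs off */
          else
                  fsleep(us);   /* may sleep; process context only */
  }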
Fixes: 01f60348d8fb ("drm/amd/display: Fix 'failed to blank crtc!'")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4549
Cc: Wen Chen Wen.Chen3@amd.com
Cc: Fangzhi Zuo jerry.zuo@amd.com
Cc: Nicholas Kazlauskas nicholas.kazlauskas@amd.com
Cc: Harry Wentland harry.wentland@amd.com
Reviewed-by: Harry Wentland harry.wentland@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit 27e4dc2c0543fd1808cc52bd888ee1e0533c4a2e)
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c
index 08fc2a2c399f6..d96f52a551940 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c
@@ -945,7 +945,7 @@ enum dc_status dcn20_enable_stream_timing(
         return DC_ERROR_UNEXPECTED;
     }

-    fsleep(stream->timing.v_total * (stream->timing.h_total * 10000u / stream->timing.pix_clk_100hz));
+    udelay(stream->timing.v_total * (stream->timing.h_total * 10000u / stream->timing.pix_clk_100hz));

     params.vertical_total_min = stream->adjust.v_total_min;
     params.vertical_total_max = stream->adjust.v_total_max;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit f214744c8a27c3c1da6b538c232da22cd027530e ]
Commit 25fe97cb7620 ("can: j1939: move j1939_priv_put() into sk_destruct callback") expects that a call to j1939_priv_put() can be unconditionally delayed until j1939_sk_sock_destruct() is called. But a refcount leak will happen when j1939_sk_bind() is called again after j1939_local_ecu_get() from a previous j1939_sk_bind() call returned an error. We need to call j1939_priv_put() before j1939_sk_bind() returns an error.
Fixes: 25fe97cb7620 ("can: j1939: move j1939_priv_put() into sk_destruct callback")
Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
Tested-by: Oleksij Rempel o.rempel@pengutronix.de
Acked-by: Oleksij Rempel o.rempel@pengutronix.de
Link: https://patch.msgid.link/4f49a1bc-a528-42ad-86c0-187268ab6535@I-love.SAKURA....
Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/can/j1939/socket.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c
index 17226b2341d03..75c2b9b233901 100644
--- a/net/can/j1939/socket.c
+++ b/net/can/j1939/socket.c
@@ -520,6 +520,9 @@ static int j1939_sk_bind(struct socket *sock, struct sockaddr *uaddr, int len)
     ret = j1939_local_ecu_get(priv, jsk->addr.src_name, jsk->addr.sa);
     if (ret) {
         j1939_netdev_stop(priv);
+        jsk->priv = NULL;
+        synchronize_rcu();
+        j1939_priv_put(priv);
         goto out_release_sock;
     }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit 06e02da29f6f1a45fc07bd60c7eaf172dc21e334 ]
j1939_sk_bind() and j1939_sk_release() call j1939_local_ecu_put() only when J1939_SOCK_BOUND has already been set, but the error handling path of j1939_sk_bind() does not set J1939_SOCK_BOUND when j1939_local_ecu_get() fails. Therefore, j1939_local_ecu_get() needs to undo priv->ents[sa].nusers++ itself when it returns an error.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
Tested-by: Oleksij Rempel o.rempel@pengutronix.de
Acked-by: Oleksij Rempel o.rempel@pengutronix.de
Link: https://patch.msgid.link/e7f80046-4ff7-4ce2-8ad8-7c3c678a42c9@I-love.SAKURA....
Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/can/j1939/bus.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/can/j1939/bus.c b/net/can/j1939/bus.c
index 4866879016021..e0b966c2517cf 100644
--- a/net/can/j1939/bus.c
+++ b/net/can/j1939/bus.c
@@ -290,8 +290,11 @@ int j1939_local_ecu_get(struct j1939_priv *priv, name_t name, u8 sa)
     if (!ecu)
         ecu = j1939_ecu_create_locked(priv, name);
     err = PTR_ERR_OR_ZERO(ecu);
-    if (err)
+    if (err) {
+        if (j1939_address_is_unicast(sa))
+            priv->ents[sa].nusers--;
         goto done;
+    }

     ecu->nusers++;
     /* TODO: do we care if ecu->addr != sa? */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anssi Hannula anssi.hannula@bitwise.fi
[ Upstream commit ef79f00be72bd81d2e1e6f060d83cf7e425deee4 ]
can_put_echo_skb() takes ownership of the SKB and it may be freed during or after the call.
However, xilinx_can xcan_write_frame() keeps using SKB after the call.
Fix that by only calling can_put_echo_skb() after the code is done touching the SKB.
The tx_lock is held for the entire xcan_write_frame() execution and also on the can_get_echo_skb() side, so the order of operations does not matter.
An earlier fix commit 3d3c817c3a40 ("can: xilinx_can: Fix usage of skb memory") did not move the can_put_echo_skb() call far enough.
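The ownership rule at play, as a condensed sketch (write_frame_registers() is an illustrative stand-in for the register writes):

        /* 1. program ID/DLC/DATA registers -- these read skb memory */
        write_frame_registers(priv, skb, frame_offset);

        /* 2. hand the skb over last; can_put_echo_skb() may free it */
        can_put_echo_skb(skb, ndev, 0, 0);
        priv->tx_head++;

        /* 3. no skb access is allowed past the echo call */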
Signed-off-by: Anssi Hannula anssi.hannula@bitwise.fi Fixes: 1598efe57b3e ("can: xilinx_can: refactor code in preparation for CAN FD support") Link: https://patch.msgid.link/20250822095002.168389-1-anssi.hannula@bitwise.fi [mkl: add "commit" in front of sha1 in patch description] [mkl: fix indention] Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/xilinx_can.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c index 436c0e4b0344c..91382225f1140 100644 --- a/drivers/net/can/xilinx_can.c +++ b/drivers/net/can/xilinx_can.c @@ -690,14 +690,6 @@ static void xcan_write_frame(struct net_device *ndev, struct sk_buff *skb, dlc |= XCAN_DLCR_EDL_MASK; }
- if (!(priv->devtype.flags & XCAN_FLAG_TX_MAILBOXES) && - (priv->devtype.flags & XCAN_FLAG_TXFEMP)) - can_put_echo_skb(skb, ndev, priv->tx_head % priv->tx_max, 0); - else - can_put_echo_skb(skb, ndev, 0, 0); - - priv->tx_head++; - priv->write_reg(priv, XCAN_FRAME_ID_OFFSET(frame_offset), id); /* If the CAN frame is RTR frame this write triggers transmission * (not on CAN FD) @@ -730,6 +722,14 @@ static void xcan_write_frame(struct net_device *ndev, struct sk_buff *skb, data[1]); } } + + if (!(priv->devtype.flags & XCAN_FLAG_TX_MAILBOXES) && + (priv->devtype.flags & XCAN_FLAG_TXFEMP)) + can_put_echo_skb(skb, ndev, priv->tx_head % priv->tx_max, 0); + else + can_put_echo_skb(skb, ndev, 0, 0); + + priv->tx_head++; }
/**
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 7792c1e03054440c60d4bce0c06a31c134601997 ]
They are not used anymore, so remove them.
Signed-off-by: Florian Westphal fw@strlen.de Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: c4eaca2e1052 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups") Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_set_pipapo.c | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index 9e4e25f2458f9..9ac48e6b4332c 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -502,8 +502,6 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
/** * pipapo_get() - Get matching element reference given key data - * @net: Network namespace - * @set: nftables API set representation * @m: storage containing active/existing elements * @data: Key data to be matched against existing elements * @genmask: If set, check that element is active in given genmask @@ -516,9 +514,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, * * Return: pointer to &struct nft_pipapo_elem on match, error pointer otherwise. */ -static struct nft_pipapo_elem *pipapo_get(const struct net *net, - const struct nft_set *set, - const struct nft_pipapo_match *m, +static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m, const u8 *data, u8 genmask, u64 tstamp, gfp_t gfp) { @@ -615,7 +611,7 @@ nft_pipapo_get(const struct net *net, const struct nft_set *set, struct nft_pipapo_match *m = rcu_dereference(priv->match); struct nft_pipapo_elem *e;
- e = pipapo_get(net, set, m, (const u8 *)elem->key.val.data, + e = pipapo_get(m, (const u8 *)elem->key.val.data, nft_genmask_cur(net), get_jiffies_64(), GFP_ATOMIC); if (IS_ERR(e)) @@ -1344,7 +1340,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set, else end = start;
- dup = pipapo_get(net, set, m, start, genmask, tstamp, GFP_KERNEL); + dup = pipapo_get(m, start, genmask, tstamp, GFP_KERNEL); if (!IS_ERR(dup)) { /* Check if we already have the same exact entry */ const struct nft_data *dup_key, *dup_end; @@ -1366,7 +1362,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
if (PTR_ERR(dup) == -ENOENT) { /* Look for partially overlapping entries */ - dup = pipapo_get(net, set, m, end, nft_genmask_next(net), tstamp, + dup = pipapo_get(m, end, nft_genmask_next(net), tstamp, GFP_KERNEL); }
@@ -1913,7 +1909,7 @@ nft_pipapo_deactivate(const struct net *net, const struct nft_set *set, if (!m) return NULL;
- e = pipapo_get(net, set, m, (const u8 *)elem->key.val.data, + e = pipapo_get(m, (const u8 *)elem->key.val.data, nft_genmask_next(net), nft_net_tstamp(net), GFP_KERNEL); if (IS_ERR(e)) return NULL;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 17a20e09f086f2c574ac87f3cf6e14c4377f65f6 ]
Return the extension pointer instead of passing it as a function argument to be filled in by the callee.
As-is, whenever false is returned, the extension pointer is not used.
For all set types, when true is returned, the extension pointer was set to the matching element.
Only exception: nft_set_bitmap doesn't support extensions. Return a pointer to a static const empty element extension container.
return false -> return NULL
return true -> return the elements' extension pointer.
This saves one function argument.
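At the call sites the conversion looks like this (condensed, not the literal hunks; use() stands in for the surrounding logic):

        /* before: boolean result plus out-parameter */
        const struct nft_set_ext *ext = NULL;

        if (set->ops->lookup(net, set, key, &ext))
                use(ext);

        /* after: one return value encodes both match and extension */
        ext = set->ops->lookup(net, set, key);
        if (ext)
                use(ext);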
Signed-off-by: Florian Westphal fw@strlen.de Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: c4eaca2e1052 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netfilter/nf_tables.h | 10 ++--- include/net/netfilter/nf_tables_core.h | 47 ++++++++++++---------- net/netfilter/nft_dynset.c | 5 ++- net/netfilter/nft_lookup.c | 27 ++++++------- net/netfilter/nft_objref.c | 5 +-- net/netfilter/nft_set_bitmap.c | 11 ++++-- net/netfilter/nft_set_hash.c | 54 ++++++++++++-------------- net/netfilter/nft_set_pipapo.c | 19 +++++---- net/netfilter/nft_set_pipapo_avx2.c | 25 ++++++------ net/netfilter/nft_set_rbtree.c | 40 +++++++++---------- 10 files changed, 126 insertions(+), 117 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 757abcb54d117..bad0c6f7ed53d 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -459,19 +459,17 @@ struct nft_set_ext; * control plane functions. */ struct nft_set_ops { - bool (*lookup)(const struct net *net, + const struct nft_set_ext * (*lookup)(const struct net *net, const struct nft_set *set, - const u32 *key, - const struct nft_set_ext **ext); - bool (*update)(struct nft_set *set, + const u32 *key); + const struct nft_set_ext * (*update)(struct nft_set *set, const u32 *key, struct nft_elem_priv * (*new)(struct nft_set *, const struct nft_expr *, struct nft_regs *), const struct nft_expr *expr, - struct nft_regs *regs, - const struct nft_set_ext **ext); + struct nft_regs *regs); bool (*delete)(const struct nft_set *set, const u32 *key);
diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h index 03b6165756fc5..6a52fb97b8443 100644 --- a/include/net/netfilter/nf_tables_core.h +++ b/include/net/netfilter/nf_tables_core.h @@ -94,34 +94,41 @@ extern const struct nft_set_type nft_set_pipapo_type; extern const struct nft_set_type nft_set_pipapo_avx2_type;
#ifdef CONFIG_MITIGATION_RETPOLINE -bool nft_rhash_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext); -bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext); -bool nft_bitmap_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext); -bool nft_hash_lookup_fast(const struct net *net, - const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext); -bool nft_hash_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext); -bool nft_set_do_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext); +const struct nft_set_ext * +nft_rhash_lookup(const struct net *net, const struct nft_set *set, + const u32 *key); +const struct nft_set_ext * +nft_rbtree_lookup(const struct net *net, const struct nft_set *set, + const u32 *key); +const struct nft_set_ext * +nft_bitmap_lookup(const struct net *net, const struct nft_set *set, + const u32 *key); +const struct nft_set_ext * +nft_hash_lookup_fast(const struct net *net, const struct nft_set *set, + const u32 *key); +const struct nft_set_ext * +nft_hash_lookup(const struct net *net, const struct nft_set *set, + const u32 *key); +const struct nft_set_ext * +nft_set_do_lookup(const struct net *net, const struct nft_set *set, + const u32 *key); #else -static inline bool +static inline const struct nft_set_ext * nft_set_do_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) + const u32 *key) { - return set->ops->lookup(net, set, key, ext); + return set->ops->lookup(net, set, key); } #endif
/* called from nft_pipapo_avx2.c */ -bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext); +const struct nft_set_ext * +nft_pipapo_lookup(const struct net *net, const struct nft_set *set, + const u32 *key); /* called from nft_set_pipapo.c */ -bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext); +const struct nft_set_ext * +nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, + const u32 *key);
void nft_counter_init_seqcount(void);
diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c index 88922e0e8e837..e24493d9e7761 100644 --- a/net/netfilter/nft_dynset.c +++ b/net/netfilter/nft_dynset.c @@ -91,8 +91,9 @@ void nft_dynset_eval(const struct nft_expr *expr, return; }
- if (set->ops->update(set, ®s->data[priv->sreg_key], nft_dynset_new, - expr, regs, &ext)) { + ext = set->ops->update(set, ®s->data[priv->sreg_key], nft_dynset_new, + expr, regs); + if (ext) { if (priv->op == NFT_DYNSET_OP_UPDATE && nft_set_ext_exists(ext, NFT_SET_EXT_TIMEOUT) && READ_ONCE(nft_set_ext_timeout(ext)->timeout) != 0) { diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c index 63ef832b8aa71..40c602ffbcba7 100644 --- a/net/netfilter/nft_lookup.c +++ b/net/netfilter/nft_lookup.c @@ -25,32 +25,33 @@ struct nft_lookup { };
#ifdef CONFIG_MITIGATION_RETPOLINE -bool nft_set_do_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +const struct nft_set_ext * +nft_set_do_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { if (set->ops == &nft_set_hash_fast_type.ops) - return nft_hash_lookup_fast(net, set, key, ext); + return nft_hash_lookup_fast(net, set, key); if (set->ops == &nft_set_hash_type.ops) - return nft_hash_lookup(net, set, key, ext); + return nft_hash_lookup(net, set, key);
if (set->ops == &nft_set_rhash_type.ops) - return nft_rhash_lookup(net, set, key, ext); + return nft_rhash_lookup(net, set, key);
if (set->ops == &nft_set_bitmap_type.ops) - return nft_bitmap_lookup(net, set, key, ext); + return nft_bitmap_lookup(net, set, key);
if (set->ops == &nft_set_pipapo_type.ops) - return nft_pipapo_lookup(net, set, key, ext); + return nft_pipapo_lookup(net, set, key); #if defined(CONFIG_X86_64) && !defined(CONFIG_UML) if (set->ops == &nft_set_pipapo_avx2_type.ops) - return nft_pipapo_avx2_lookup(net, set, key, ext); + return nft_pipapo_avx2_lookup(net, set, key); #endif
if (set->ops == &nft_set_rbtree_type.ops) - return nft_rbtree_lookup(net, set, key, ext); + return nft_rbtree_lookup(net, set, key);
WARN_ON_ONCE(1); - return set->ops->lookup(net, set, key, ext); + return set->ops->lookup(net, set, key); } EXPORT_SYMBOL_GPL(nft_set_do_lookup); #endif @@ -61,12 +62,12 @@ void nft_lookup_eval(const struct nft_expr *expr, { const struct nft_lookup *priv = nft_expr_priv(expr); const struct nft_set *set = priv->set; - const struct nft_set_ext *ext = NULL; const struct net *net = nft_net(pkt); + const struct nft_set_ext *ext; bool found;
- found = nft_set_do_lookup(net, set, ®s->data[priv->sreg], &ext) ^ - priv->invert; + ext = nft_set_do_lookup(net, set, ®s->data[priv->sreg]); + found = !!ext ^ priv->invert; if (!found) { ext = nft_set_catchall_lookup(net, set); if (!ext) { diff --git a/net/netfilter/nft_objref.c b/net/netfilter/nft_objref.c index 09da7a3f9f967..8ee66a86c3bc7 100644 --- a/net/netfilter/nft_objref.c +++ b/net/netfilter/nft_objref.c @@ -111,10 +111,9 @@ void nft_objref_map_eval(const struct nft_expr *expr, struct net *net = nft_net(pkt); const struct nft_set_ext *ext; struct nft_object *obj; - bool found;
- found = nft_set_do_lookup(net, set, ®s->data[priv->sreg], &ext); - if (!found) { + ext = nft_set_do_lookup(net, set, ®s->data[priv->sreg]); + if (!ext) { ext = nft_set_catchall_lookup(net, set); if (!ext) { regs->verdict.code = NFT_BREAK; diff --git a/net/netfilter/nft_set_bitmap.c b/net/netfilter/nft_set_bitmap.c index 1caa04619dc6d..b4765fb92d727 100644 --- a/net/netfilter/nft_set_bitmap.c +++ b/net/netfilter/nft_set_bitmap.c @@ -75,16 +75,21 @@ nft_bitmap_active(const u8 *bitmap, u32 idx, u32 off, u8 genmask) }
INDIRECT_CALLABLE_SCOPE -bool nft_bitmap_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +const struct nft_set_ext * +nft_bitmap_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { const struct nft_bitmap *priv = nft_set_priv(set); + static const struct nft_set_ext found; u8 genmask = nft_genmask_cur(net); u32 idx, off;
nft_bitmap_location(set, key, &idx, &off);
- return nft_bitmap_active(priv->bitmap, idx, off, genmask); + if (nft_bitmap_active(priv->bitmap, idx, off, genmask)) + return &found; + + return NULL; }
static struct nft_bitmap_elem * diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c index 4b3452dff2ec0..900eddb93dcc8 100644 --- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -81,8 +81,9 @@ static const struct rhashtable_params nft_rhash_params = { };
INDIRECT_CALLABLE_SCOPE -bool nft_rhash_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +const struct nft_set_ext * +nft_rhash_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { struct nft_rhash *priv = nft_set_priv(set); const struct nft_rhash_elem *he; @@ -95,9 +96,9 @@ bool nft_rhash_lookup(const struct net *net, const struct nft_set *set,
he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params); if (he != NULL) - *ext = &he->ext; + return &he->ext;
- return !!he; + return NULL; }
static struct nft_elem_priv * @@ -120,14 +121,11 @@ nft_rhash_get(const struct net *net, const struct nft_set *set, return ERR_PTR(-ENOENT); }
-static bool nft_rhash_update(struct nft_set *set, const u32 *key, - struct nft_elem_priv * - (*new)(struct nft_set *, - const struct nft_expr *, - struct nft_regs *regs), - const struct nft_expr *expr, - struct nft_regs *regs, - const struct nft_set_ext **ext) +static const struct nft_set_ext * +nft_rhash_update(struct nft_set *set, const u32 *key, + struct nft_elem_priv *(*new)(struct nft_set *, const struct nft_expr *, + struct nft_regs *regs), + const struct nft_expr *expr, struct nft_regs *regs) { struct nft_rhash *priv = nft_set_priv(set); struct nft_rhash_elem *he, *prev; @@ -161,14 +159,13 @@ static bool nft_rhash_update(struct nft_set *set, const u32 *key, }
out: - *ext = &he->ext; - return true; + return &he->ext;
err2: nft_set_elem_destroy(set, &he->priv, true); atomic_dec(&set->nelems); err1: - return false; + return NULL; }
static int nft_rhash_insert(const struct net *net, const struct nft_set *set, @@ -507,8 +504,9 @@ struct nft_hash_elem { };
INDIRECT_CALLABLE_SCOPE -bool nft_hash_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +const struct nft_set_ext * +nft_hash_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { struct nft_hash *priv = nft_set_priv(set); u8 genmask = nft_genmask_cur(net); @@ -519,12 +517,10 @@ bool nft_hash_lookup(const struct net *net, const struct nft_set *set, hash = reciprocal_scale(hash, priv->buckets); hlist_for_each_entry_rcu(he, &priv->table[hash], node) { if (!memcmp(nft_set_ext_key(&he->ext), key, set->klen) && - nft_set_elem_active(&he->ext, genmask)) { - *ext = &he->ext; - return true; - } + nft_set_elem_active(&he->ext, genmask)) + return &he->ext; } - return false; + return NULL; }
static struct nft_elem_priv * @@ -547,9 +543,9 @@ nft_hash_get(const struct net *net, const struct nft_set *set, }
INDIRECT_CALLABLE_SCOPE -bool nft_hash_lookup_fast(const struct net *net, - const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +const struct nft_set_ext * +nft_hash_lookup_fast(const struct net *net, const struct nft_set *set, + const u32 *key) { struct nft_hash *priv = nft_set_priv(set); u8 genmask = nft_genmask_cur(net); @@ -562,12 +558,10 @@ bool nft_hash_lookup_fast(const struct net *net, hlist_for_each_entry_rcu(he, &priv->table[hash], node) { k2 = *(u32 *)nft_set_ext_key(&he->ext)->data; if (k1 == k2 && - nft_set_elem_active(&he->ext, genmask)) { - *ext = &he->ext; - return true; - } + nft_set_elem_active(&he->ext, genmask)) + return &he->ext; } - return false; + return NULL; }
static u32 nft_jhash(const struct nft_set *set, const struct nft_hash *priv, diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index 9ac48e6b4332c..a844b33fa6002 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -407,8 +407,9 @@ int pipapo_refill(unsigned long *map, unsigned int len, unsigned int rules, * * Return: true on match, false otherwise. */ -bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +const struct nft_set_ext * +nft_pipapo_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { struct nft_pipapo *priv = nft_set_priv(set); struct nft_pipapo_scratch *scratch; @@ -465,13 +466,15 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, scratch->map_index = map_index; local_bh_enable();
- return false; + return NULL; }
if (last) { - *ext = &f->mt[b].e->ext; - if (unlikely(nft_set_elem_expired(*ext) || - !nft_set_elem_active(*ext, genmask))) + const struct nft_set_ext *ext; + + ext = &f->mt[b].e->ext; + if (unlikely(nft_set_elem_expired(ext) || + !nft_set_elem_active(ext, genmask))) goto next_match;
/* Last field: we're just returning the key without @@ -482,7 +485,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, scratch->map_index = map_index; local_bh_enable();
- return true; + return ext; }
/* Swap bitmap indices: res_map is the initial bitmap for the @@ -497,7 +500,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
out: local_bh_enable(); - return false; + return NULL; }
/** diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c index be7c16c79f711..6c441e2dc8af3 100644 --- a/net/netfilter/nft_set_pipapo_avx2.c +++ b/net/netfilter/nft_set_pipapo_avx2.c @@ -1146,8 +1146,9 @@ static inline void pipapo_resmap_init_avx2(const struct nft_pipapo_match *m, uns * * Return: true on match, false otherwise. */ -bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +const struct nft_set_ext * +nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { struct nft_pipapo *priv = nft_set_priv(set); struct nft_pipapo_scratch *scratch; @@ -1155,17 +1156,18 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, const struct nft_pipapo_match *m; const struct nft_pipapo_field *f; const u8 *rp = (const u8 *)key; + const struct nft_set_ext *ext; unsigned long *res, *fill; bool map_index; - int i, ret = 0; + int i;
local_bh_disable();
if (unlikely(!irq_fpu_usable())) { - bool fallback_res = nft_pipapo_lookup(net, set, key, ext); + ext = nft_pipapo_lookup(net, set, key);
local_bh_enable(); - return fallback_res; + return ext; }
m = rcu_dereference(priv->match); @@ -1182,7 +1184,7 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, if (unlikely(!scratch)) { kernel_fpu_end(); local_bh_enable(); - return false; + return NULL; }
map_index = scratch->map_index; @@ -1197,6 +1199,7 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, next_match: nft_pipapo_for_each_field(f, i, m) { bool last = i == m->field_count - 1, first = !i; + int ret = 0;
#define NFT_SET_PIPAPO_AVX2_LOOKUP(b, n) \ (ret = nft_pipapo_avx2_lookup_##b##b_##n(res, fill, f, \ @@ -1244,10 +1247,10 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, goto out;
if (last) { - *ext = &f->mt[ret].e->ext; - if (unlikely(nft_set_elem_expired(*ext) || - !nft_set_elem_active(*ext, genmask))) { - ret = 0; + ext = &f->mt[ret].e->ext; + if (unlikely(nft_set_elem_expired(ext) || + !nft_set_elem_active(ext, genmask))) { + ext = NULL; goto next_match; }
@@ -1264,5 +1267,5 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, kernel_fpu_end(); local_bh_enable();
- return ret >= 0; + return ext; } diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c index 2e8ef16ff191d..938a257c069e2 100644 --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -52,9 +52,9 @@ static bool nft_rbtree_elem_expired(const struct nft_rbtree_elem *rbe) return nft_set_elem_expired(&rbe->ext); }
-static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext, - unsigned int seq) +static const struct nft_set_ext * +__nft_rbtree_lookup(const struct net *net, const struct nft_set *set, + const u32 *key, unsigned int seq) { struct nft_rbtree *priv = nft_set_priv(set); const struct nft_rbtree_elem *rbe, *interval = NULL; @@ -65,7 +65,7 @@ static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set parent = rcu_dereference_raw(priv->root.rb_node); while (parent != NULL) { if (read_seqcount_retry(&priv->count, seq)) - return false; + return NULL;
rbe = rb_entry(parent, struct nft_rbtree_elem, node);
@@ -87,50 +87,48 @@ static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set }
if (nft_rbtree_elem_expired(rbe)) - return false; + return NULL;
if (nft_rbtree_interval_end(rbe)) { if (nft_set_is_anonymous(set)) - return false; + return NULL; parent = rcu_dereference_raw(parent->rb_left); interval = NULL; continue; }
- *ext = &rbe->ext; - return true; + return &rbe->ext; } }
if (set->flags & NFT_SET_INTERVAL && interval != NULL && nft_set_elem_active(&interval->ext, genmask) && !nft_rbtree_elem_expired(interval) && - nft_rbtree_interval_start(interval)) { - *ext = &interval->ext; - return true; - } + nft_rbtree_interval_start(interval)) + return &interval->ext;
- return false; + return NULL; }
INDIRECT_CALLABLE_SCOPE -bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +const struct nft_set_ext * +nft_rbtree_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { struct nft_rbtree *priv = nft_set_priv(set); unsigned int seq = read_seqcount_begin(&priv->count); - bool ret; + const struct nft_set_ext *ext;
- ret = __nft_rbtree_lookup(net, set, key, ext, seq); - if (ret || !read_seqcount_retry(&priv->count, seq)) - return ret; + ext = __nft_rbtree_lookup(net, set, key, seq); + if (ext || !read_seqcount_retry(&priv->count, seq)) + return ext;
read_lock_bh(&priv->lock); seq = read_seqcount_begin(&priv->count); - ret = __nft_rbtree_lookup(net, set, key, ext, seq); + ext = __nft_rbtree_lookup(net, set, key, seq); read_unlock_bh(&priv->lock);
- return ret; + return ext; }
static bool __nft_rbtree_get(const struct net *net, const struct nft_set *set,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit d8d871a35ca9ee4881d34995444ed1cb826d01db ]
The matching algorithm has been implemented thrice:
1. data path lookup, generic version
2. data path lookup, avx2 version
3. control plane lookup
Merge 1 and 3 by refactoring pipapo_get as a common helper, then make nft_pipapo_lookup and nft_pipapo_get both call the common helper.
Aside from the code savings this has the benefit that we no longer allocate temporary scratch maps for each control plane get and insertion operation.
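The resulting call structure, abbreviated (the doc comment in the hunks below spells out which copy each caller passes):

        nft_pipapo_lookup()  (datapath)      -> pipapo_get(active copy)
        nft_pipapo_get()     (control plane) -> pipapo_get(active copy)
        nft_pipapo_insert()  (dup detection) -> pipapo_get(uncommitted clone)

Since pipapo_get() now runs on the per-cpu scratch maps, the kmalloc_array()/kcalloc() of temporary result bitmaps, and with them the gfp_t argument, drop out of the get and insert paths.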
Signed-off-by: Florian Westphal fw@strlen.de Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: c4eaca2e1052 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups") Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_set_pipapo.c | 188 ++++++++++----------------------- 1 file changed, 58 insertions(+), 130 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index a844b33fa6002..1a19649c28511 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -397,35 +397,36 @@ int pipapo_refill(unsigned long *map, unsigned int len, unsigned int rules, }
/** - * nft_pipapo_lookup() - Lookup function - * @net: Network namespace - * @set: nftables API set representation - * @key: nftables API element representation containing key data - * @ext: nftables API extension pointer, filled with matching reference + * pipapo_get() - Get matching element reference given key data + * @m: storage containing the set elements + * @data: Key data to be matched against existing elements + * @genmask: If set, check that element is active in given genmask + * @tstamp: timestamp to check for expired elements * * For more details, see DOC: Theory of Operation. * - * Return: true on match, false otherwise. + * This is the main lookup function. It matches key data against either + * the working match set or the uncommitted copy, depending on what the + * caller passed to us. + * nft_pipapo_get (lookup from userspace/control plane) and nft_pipapo_lookup + * (datapath lookup) pass the active copy. + * The insertion path will pass the uncommitted working copy. + * + * Return: pointer to &struct nft_pipapo_elem on match, NULL otherwise. */ -const struct nft_set_ext * -nft_pipapo_lookup(const struct net *net, const struct nft_set *set, - const u32 *key) +static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m, + const u8 *data, u8 genmask, + u64 tstamp) { - struct nft_pipapo *priv = nft_set_priv(set); struct nft_pipapo_scratch *scratch; unsigned long *res_map, *fill_map; - u8 genmask = nft_genmask_cur(net); - const struct nft_pipapo_match *m; const struct nft_pipapo_field *f; - const u8 *rp = (const u8 *)key; bool map_index; int i;
local_bh_disable();
- m = rcu_dereference(priv->match); - - if (unlikely(!m || !*raw_cpu_ptr(m->scratch))) + if (unlikely(!raw_cpu_ptr(m->scratch))) goto out;
scratch = *raw_cpu_ptr(m->scratch); @@ -445,12 +446,12 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set, * packet bytes value, then AND bucket value */ if (likely(f->bb == 8)) - pipapo_and_field_buckets_8bit(f, res_map, rp); + pipapo_and_field_buckets_8bit(f, res_map, data); else - pipapo_and_field_buckets_4bit(f, res_map, rp); + pipapo_and_field_buckets_4bit(f, res_map, data); NFT_PIPAPO_GROUP_BITS_ARE_8_OR_4;
- rp += f->groups / NFT_PIPAPO_GROUPS_PER_BYTE(f); + data += f->groups / NFT_PIPAPO_GROUPS_PER_BYTE(f);
/* Now populate the bitmap for the next field, unless this is * the last field, in which case return the matched 'ext' @@ -470,11 +471,11 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set, }
if (last) { - const struct nft_set_ext *ext; + struct nft_pipapo_elem *e;
- ext = &f->mt[b].e->ext; - if (unlikely(nft_set_elem_expired(ext) || - !nft_set_elem_active(ext, genmask))) + e = f->mt[b].e; + if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) || + !nft_set_elem_active(&e->ext, genmask))) goto next_match;
/* Last field: we're just returning the key without @@ -484,8 +485,7 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set, */ scratch->map_index = map_index; local_bh_enable(); - - return ext; + return e; }
/* Swap bitmap indices: res_map is the initial bitmap for the @@ -495,7 +495,7 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set, map_index = !map_index; swap(res_map, fill_map);
- rp += NFT_PIPAPO_GROUPS_PADDING(f); + data += NFT_PIPAPO_GROUPS_PADDING(f); }
out: @@ -504,99 +504,29 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set, }
/** - * pipapo_get() - Get matching element reference given key data - * @m: storage containing active/existing elements - * @data: Key data to be matched against existing elements - * @genmask: If set, check that element is active in given genmask - * @tstamp: timestamp to check for expired elements - * @gfp: the type of memory to allocate (see kmalloc). + * nft_pipapo_lookup() - Dataplane fronted for main lookup function + * @net: Network namespace + * @set: nftables API set representation + * @key: pointer to nft registers containing key data * - * This is essentially the same as the lookup function, except that it matches - * key data against the uncommitted copy and doesn't use preallocated maps for - * bitmap results. + * This function is called from the data path. It will search for + * an element matching the given key in the current active copy. * - * Return: pointer to &struct nft_pipapo_elem on match, error pointer otherwise. + * Return: ntables API extension pointer or NULL if no match. */ -static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m, - const u8 *data, u8 genmask, - u64 tstamp, gfp_t gfp) +const struct nft_set_ext * +nft_pipapo_lookup(const struct net *net, const struct nft_set *set, + const u32 *key) { - struct nft_pipapo_elem *ret = ERR_PTR(-ENOENT); - unsigned long *res_map, *fill_map = NULL; - const struct nft_pipapo_field *f; - int i; - - if (m->bsize_max == 0) - return ret; - - res_map = kmalloc_array(m->bsize_max, sizeof(*res_map), gfp); - if (!res_map) { - ret = ERR_PTR(-ENOMEM); - goto out; - } - - fill_map = kcalloc(m->bsize_max, sizeof(*res_map), gfp); - if (!fill_map) { - ret = ERR_PTR(-ENOMEM); - goto out; - } - - pipapo_resmap_init(m, res_map); - - nft_pipapo_for_each_field(f, i, m) { - bool last = i == m->field_count - 1; - int b; - - /* For each bit group: select lookup table bucket depending on - * packet bytes value, then AND bucket value - */ - if (f->bb == 8) - pipapo_and_field_buckets_8bit(f, res_map, data); - else if (f->bb == 4) - pipapo_and_field_buckets_4bit(f, res_map, data); - else - BUG(); - - data += f->groups / NFT_PIPAPO_GROUPS_PER_BYTE(f); - - /* Now populate the bitmap for the next field, unless this is - * the last field, in which case return the matched 'ext' - * pointer if any. - * - * Now res_map contains the matching bitmap, and fill_map is the - * bitmap for the next field. - */ -next_match: - b = pipapo_refill(res_map, f->bsize, f->rules, fill_map, f->mt, - last); - if (b < 0) - goto out; - - if (last) { - if (__nft_set_elem_expired(&f->mt[b].e->ext, tstamp)) - goto next_match; - if ((genmask && - !nft_set_elem_active(&f->mt[b].e->ext, genmask))) - goto next_match; - - ret = f->mt[b].e; - goto out; - } - - data += NFT_PIPAPO_GROUPS_PADDING(f); + struct nft_pipapo *priv = nft_set_priv(set); + u8 genmask = nft_genmask_cur(net); + const struct nft_pipapo_match *m; + const struct nft_pipapo_elem *e;
- /* Swap bitmap indices: fill_map will be the initial bitmap for - * the next field (i.e. the new res_map), and res_map is - * guaranteed to be all-zeroes at this point, ready to be filled - * according to the next mapping table. - */ - swap(res_map, fill_map); - } + m = rcu_dereference(priv->match); + e = pipapo_get(m, (const u8 *)key, genmask, get_jiffies_64());
-out: - kfree(fill_map); - kfree(res_map); - return ret; + return e ? &e->ext : NULL; }
/** @@ -605,6 +535,11 @@ static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m, * @set: nftables API set representation * @elem: nftables API element representation containing key data * @flags: Unused + * + * This function is called from the control plane path under + * RCU read lock. + * + * Return: set element private pointer or ERR_PTR(-ENOENT). */ static struct nft_elem_priv * nft_pipapo_get(const struct net *net, const struct nft_set *set, @@ -615,10 +550,9 @@ nft_pipapo_get(const struct net *net, const struct nft_set *set, struct nft_pipapo_elem *e;
e = pipapo_get(m, (const u8 *)elem->key.val.data, - nft_genmask_cur(net), get_jiffies_64(), - GFP_ATOMIC); - if (IS_ERR(e)) - return ERR_CAST(e); + nft_genmask_cur(net), get_jiffies_64()); + if (!e) + return ERR_PTR(-ENOENT);
return &e->priv; } @@ -1343,8 +1277,8 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set, else end = start;
- dup = pipapo_get(m, start, genmask, tstamp, GFP_KERNEL); - if (!IS_ERR(dup)) { + dup = pipapo_get(m, start, genmask, tstamp); + if (dup) { /* Check if we already have the same exact entry */ const struct nft_data *dup_key, *dup_end;
@@ -1363,15 +1297,9 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set, return -ENOTEMPTY; }
- if (PTR_ERR(dup) == -ENOENT) { - /* Look for partially overlapping entries */ - dup = pipapo_get(m, end, nft_genmask_next(net), tstamp, - GFP_KERNEL); - } - - if (PTR_ERR(dup) != -ENOENT) { - if (IS_ERR(dup)) - return PTR_ERR(dup); + /* Look for partially overlapping entries */ + dup = pipapo_get(m, end, nft_genmask_next(net), tstamp); + if (dup) { *elem_priv = &dup->priv; return -ENOTEMPTY; } @@ -1913,8 +1841,8 @@ nft_pipapo_deactivate(const struct net *net, const struct nft_set *set, return NULL;
e = pipapo_get(m, (const u8 *)elem->key.val.data, - nft_genmask_next(net), nft_net_tstamp(net), GFP_KERNEL); - if (IS_ERR(e)) + nft_genmask_next(net), nft_net_tstamp(net)); + if (!e) return NULL;
nft_set_elem_change_active(net, set, &e->ext);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit c8a7c2c608180f3b4e51dc958b3861242dcdd76d ]
Dan Carpenter says: Commit 17a20e09f086 ("netfilter: nft_set: remove one argument from lookup and update functions") [..] leads to the following Smatch static checker warning:
net/netfilter/nft_set_pipapo_avx2.c:1269 nft_pipapo_avx2_lookup() error: uninitialized symbol 'ext'.
Fix this by initializing ext to NULL and setting it only once we've found a match.
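The warning class, reduced to a toy example (not the kernel function itself):

        static const int *find_first(const int *tbl, int n, int key)
        {
                const int *ext = NULL;          /* the fix: start at NULL */
                int i;

                for (i = 0; i < n; i++) {
                        if (tbl[i] == key) {
                                ext = &tbl[i];  /* set only on a real match */
                                break;
                        }
                }
                return ext;                     /* NULL cleanly means "no match" */
        }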
Fixes: 17a20e09f086 ("netfilter: nft_set: remove one argument from lookup and update functions") Reported-by: Dan Carpenter dan.carpenter@linaro.org Closes: https://lore.kernel.org/netfilter-devel/aJBzc3V5wk-yPOnH@stanley.mountain/ Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: c4eaca2e1052 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups") Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_set_pipapo_avx2.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c index 6c441e2dc8af3..2155c7f345c21 100644 --- a/net/netfilter/nft_set_pipapo_avx2.c +++ b/net/netfilter/nft_set_pipapo_avx2.c @@ -1151,12 +1151,12 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, const u32 *key) { struct nft_pipapo *priv = nft_set_priv(set); + const struct nft_set_ext *ext = NULL; struct nft_pipapo_scratch *scratch; u8 genmask = nft_genmask_cur(net); const struct nft_pipapo_match *m; const struct nft_pipapo_field *f; const u8 *rp = (const u8 *)key; - const struct nft_set_ext *ext; unsigned long *res, *fill; bool map_index; int i; @@ -1247,13 +1247,13 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, goto out;
if (last) { - ext = &f->mt[ret].e->ext; - if (unlikely(nft_set_elem_expired(ext) || - !nft_set_elem_active(ext, genmask))) { - ext = NULL; + const struct nft_set_ext *e = &f->mt[ret].e->ext; + + if (unlikely(nft_set_elem_expired(e) || + !nft_set_elem_active(e, genmask))) goto next_match; - }
+ ext = e; goto out; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit c4eaca2e1052adfd67bed0a36a9d4b8e515666e4 ]
The pipapo set type is special in that it has two copies of its datastructure: one live copy containing only valid elements and one on-demand clone used during transactions, where adds/deletes happen.
This clone is not visible to the datapath.
This is unlike all other set types in nftables, those all link new elements into their live hlist/tree.
For those sets, the lookup functions must skip the new elements while the transaction is ongoing to ensure consistency.
As the clone is shallow, removal does have an effect on the packet path: once the transaction enters the commit phase the 'gencursor' bit that determines which elements are active and which elements should be ignored (because they are no longer valid) is flipped.
This causes the datapath lookup to ignore these elements if they are found during lookup.
This opens up a small race window where pipapo has an inconsistent view of the dataset from when the transaction-cpu flipped the genbit until the transaction-cpu calls nft_pipapo_commit() to swap live/clone pointers:
cpu0                                    cpu1
 has added new elements to clone
 has marked elements as being
 inactive in new generation
                                        perform lookup in the set
 enters commit phase:
I) increments the genbit
                                        A) observes new genbit
 removes elements from the clone so
 they won't be found anymore
                                        B) lookup in datastructure
                                           can't see new elements yet,
                                           but old elements are ignored
                                           -> Only matches elements that
                                              were not changed in the
                                              transaction
II) calls nft_pipapo_commit(), clone
    and live pointers are swapped.
                                        C) New nft_lookup happening now
                                           will find matching elements.
Consider a packet matching range r1-r2:
cpu0 processes following transaction:
1. remove r1-r2
2. add r1-r3
P is contained in both ranges. Therefore, cpu1 should always find a match for P. Due to above race, this is not the case:
cpu1 does find r1-r2, but then ignores it due to the genbit indicating the range has been removed.
At the same time, r1-r3 is not visible yet, because it can only be found in the clone.
The situation persists for all lookups until after cpu0 hits II).
The fix is easy: Don't check the genbit from pipapo lookup functions. This is possible because unlike the other set types, the new elements are not reachable from the live copy of the dataset.
The clone/live pointer swap is enough to avoid matching on old elements while at the same time all new elements are exposed in one go.
After this change, step B above returns a match in r1-r2. This is fine: r1-r2 only becomes truly invalid the moment it gets freed. This happens after a synchronize_rcu() call and rcu read lock is held via netfilter hook traversal (nf_hook_slow()).
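The lifetime argument in the last paragraph is the usual RCU publish/retire pattern (sketch, not literal kernel code):

        /* datapath, under rcu_read_lock() via nf_hook_slow(): */
        m = rcu_dereference(priv->match);
        e = pipapo_get(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
        /* e stays valid for the whole read-side critical section */

        /* commit path: */
        rcu_assign_pointer(priv->match, new_match);     /* swap live/clone */
        synchronize_rcu();                              /* wait out readers */
        /* only now may elements of the old copy be freed */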
Cc: Stefano Brivio sbrivio@redhat.com Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_set_pipapo.c | 20 ++++++++++++++++++-- net/netfilter/nft_set_pipapo_avx2.c | 4 +--- 2 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index 1a19649c28511..fa6741b3205a6 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -511,6 +511,23 @@ static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m, * * This function is called from the data path. It will search for * an element matching the given key in the current active copy. + * Unlike other set types, this uses NFT_GENMASK_ANY instead of + * nft_genmask_cur(). + * + * This is because new (future) elements are not reachable from + * priv->match, they get added to priv->clone instead. + * When the commit phase flips the generation bitmask, the + * 'now old' entries are skipped but without the 'now current' + * elements becoming visible. Using nft_genmask_cur() thus creates + * inconsistent state: matching old entries get skipped but thew + * newly matching entries are unreachable. + * + * GENMASK will still find the 'now old' entries which ensures consistent + * priv->match view. + * + * nft_pipapo_commit swaps ->clone and ->match shortly after the + * genbit flip. As ->clone doesn't contain the old entries in the first + * place, lookup will only find the now-current ones. * * Return: ntables API extension pointer or NULL if no match. */ @@ -519,12 +536,11 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set, const u32 *key) { struct nft_pipapo *priv = nft_set_priv(set); - u8 genmask = nft_genmask_cur(net); const struct nft_pipapo_match *m; const struct nft_pipapo_elem *e;
m = rcu_dereference(priv->match); - e = pipapo_get(m, (const u8 *)key, genmask, get_jiffies_64()); + e = pipapo_get(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
return e ? &e->ext : NULL; } diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c index 2155c7f345c21..39e356c9687a9 100644 --- a/net/netfilter/nft_set_pipapo_avx2.c +++ b/net/netfilter/nft_set_pipapo_avx2.c @@ -1153,7 +1153,6 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, struct nft_pipapo *priv = nft_set_priv(set); const struct nft_set_ext *ext = NULL; struct nft_pipapo_scratch *scratch; - u8 genmask = nft_genmask_cur(net); const struct nft_pipapo_match *m; const struct nft_pipapo_field *f; const u8 *rp = (const u8 *)key; @@ -1249,8 +1248,7 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, if (last) { const struct nft_set_ext *e = &f->mt[ret].e->ext;
- if (unlikely(nft_set_elem_expired(e) || - !nft_set_elem_active(e, genmask))) + if (unlikely(nft_set_elem_expired(e))) goto next_match;
ext = e;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit a60f7bf4a1524d8896b76ba89623080aebf44272 ]
When the rbtree lookup function finds a match in the rbtree, it sets the range start interval to a potentially inactive element.
Then, after tree lookup, if the matching element is inactive, it returns NULL and suppresses a matching result.
This is wrong and leads to false negative matches when a transaction has already entered the commit phase.
cpu0                                    cpu1
 has added new elements to clone
 has marked elements as being
 inactive in new generation
                                        perform lookup in the set
 enters commit phase:
I) increments the genbit
                                        A) observes new genbit
                                        B) finds matching range
                                        C) returns no match: found range
                                           invalid in new generation
II) removes old elements from the tree
                                        C) New nft_lookup happening now
                                           will find matching element,
                                           because it is no longer
                                           obscured by old, inactive one.
Consider a packet matching range r1-r2:
cpu0 processes following transaction:
1. remove r1-r2
2. add r1-r3
P is contained in both ranges. Therefore, cpu1 should always find a match for P. Due to above race, this is not the case:
cpu1 does find r1-r2, but then ignores it due to the genbit indicating the range has been removed. It does NOT test for further matches.
The situation persists for all lookups until after cpu0 hits II), after which the r1-r3 range start node is tested for the first time.
Move the "interval start is valid" check ahead so that tree traversal continues if the starting interval is not valid in this generation.
Thanks to Stefan Hanreich for providing an initial reproducer for this bug.
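Condensed, the traversal now validates candidates while walking instead of once after the walk (simplified from the hunks below):

        /* before: remember any start node, validate at the very end */
        interval = rbe;

        /* after: remember only nodes that can actually match, so an
         * inactive start node no longer shadows a valid one
         */
        if (nft_set_elem_active(&rbe->ext, genmask) &&
            !nft_rbtree_elem_expired(rbe))
                interval = rbe;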
Reported-by: Stefan Hanreich s.hanreich@proxmox.com Fixes: c1eda3c6394f ("netfilter: nft_rbtree: ignore inactive matching element with no descendants") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_set_rbtree.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c index 938a257c069e2..b1f04168ec937 100644 --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -77,7 +77,9 @@ __nft_rbtree_lookup(const struct net *net, const struct nft_set *set, nft_rbtree_interval_end(rbe) && nft_rbtree_interval_start(interval)) continue; - interval = rbe; + if (nft_set_elem_active(&rbe->ext, genmask) && + !nft_rbtree_elem_expired(rbe)) + interval = rbe; } else if (d > 0) parent = rcu_dereference_raw(parent->rb_right); else { @@ -102,8 +104,6 @@ __nft_rbtree_lookup(const struct net *net, const struct nft_set *set, }
if (set->flags & NFT_SET_INTERVAL && interval != NULL && - nft_set_elem_active(&interval->ext, genmask) && - !nft_rbtree_elem_expired(interval) && nft_rbtree_interval_start(interval)) return &interval->ext;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit a1050dd071682d2c9d8d6d5c96119f8f401b62f0 ]
Restore commit 28339b21a365 ("netfilter: nf_tables: do not send complete notification of deletions") and fix it:
- Avoid upfront modification of 'event' variable so the conditionals become effective (see the sketch below).
- Always include NFTA_OBJ_TYPE attribute in object notifications, user space requires it for proper deserialisation.
- Catch DESTROY events, too.
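The first item is the subtle one: nfnl_msg_type() folds the subsystem id into the high byte (subsys << 8 | msg_type), so once 'event' is converted up front, later comparisons against the plain NFT_MSG_* constants can never match. Roughly (shortened_notification() is an illustrative stand-in):

        event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event);
        /* event is now (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_DELTABLE */

        if (event == NFT_MSG_DELTABLE)  /* never true after the fold */
                return shortened_notification(skb, nlh);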
Signed-off-by: Phil Sutter phil@nwl.cc Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: b2f742c846ca ("netfilter: nf_tables: restart set lookup on base_seq change") Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 67 ++++++++++++++++++++++++++--------- 1 file changed, 50 insertions(+), 17 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 3743e4249dc8c..4430bfa34a993 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -1017,9 +1017,9 @@ static int nf_tables_fill_table_info(struct sk_buff *skb, struct net *net, { struct nlmsghdr *nlh;
- event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event); - nlh = nfnl_msg_put(skb, portid, seq, event, flags, family, - NFNETLINK_V0, nft_base_seq(net)); + nlh = nfnl_msg_put(skb, portid, seq, + nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), + flags, family, NFNETLINK_V0, nft_base_seq(net)); if (!nlh) goto nla_put_failure;
@@ -1029,6 +1029,12 @@ static int nf_tables_fill_table_info(struct sk_buff *skb, struct net *net, NFTA_TABLE_PAD)) goto nla_put_failure;
+ if (event == NFT_MSG_DELTABLE || + event == NFT_MSG_DESTROYTABLE) { + nlmsg_end(skb, nlh); + return 0; + } + if (nla_put_be32(skb, NFTA_TABLE_FLAGS, htonl(table->flags & NFT_TABLE_F_MASK))) goto nla_put_failure; @@ -1872,9 +1878,9 @@ static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net, { struct nlmsghdr *nlh;
- event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event); - nlh = nfnl_msg_put(skb, portid, seq, event, flags, family, - NFNETLINK_V0, nft_base_seq(net)); + nlh = nfnl_msg_put(skb, portid, seq, + nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), + flags, family, NFNETLINK_V0, nft_base_seq(net)); if (!nlh) goto nla_put_failure;
@@ -1884,6 +1890,13 @@ static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net, NFTA_CHAIN_PAD)) goto nla_put_failure;
+ if (!hook_list && + (event == NFT_MSG_DELCHAIN || + event == NFT_MSG_DESTROYCHAIN)) { + nlmsg_end(skb, nlh); + return 0; + } + if (nft_is_base_chain(chain)) { const struct nft_base_chain *basechain = nft_base_chain(chain); struct nft_stats __percpu *stats; @@ -4654,9 +4667,10 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx, u32 seq = ctx->seq; int i;
- event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event); - nlh = nfnl_msg_put(skb, portid, seq, event, flags, ctx->family, - NFNETLINK_V0, nft_base_seq(ctx->net)); + nlh = nfnl_msg_put(skb, portid, seq, + nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), + flags, ctx->family, NFNETLINK_V0, + nft_base_seq(ctx->net)); if (!nlh) goto nla_put_failure;
@@ -4668,6 +4682,12 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx, NFTA_SET_PAD)) goto nla_put_failure;
+ if (event == NFT_MSG_DELSET || + event == NFT_MSG_DESTROYSET) { + nlmsg_end(skb, nlh); + return 0; + } + if (set->flags != 0) if (nla_put_be32(skb, NFTA_SET_FLAGS, htonl(set->flags))) goto nla_put_failure; @@ -7990,20 +8010,26 @@ static int nf_tables_fill_obj_info(struct sk_buff *skb, struct net *net, { struct nlmsghdr *nlh;
- event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event); - nlh = nfnl_msg_put(skb, portid, seq, event, flags, family, - NFNETLINK_V0, nft_base_seq(net)); + nlh = nfnl_msg_put(skb, portid, seq, + nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), + flags, family, NFNETLINK_V0, nft_base_seq(net)); if (!nlh) goto nla_put_failure;
if (nla_put_string(skb, NFTA_OBJ_TABLE, table->name) || nla_put_string(skb, NFTA_OBJ_NAME, obj->key.name) || + nla_put_be32(skb, NFTA_OBJ_TYPE, htonl(obj->ops->type->type)) || nla_put_be64(skb, NFTA_OBJ_HANDLE, cpu_to_be64(obj->handle), NFTA_OBJ_PAD)) goto nla_put_failure;
- if (nla_put_be32(skb, NFTA_OBJ_TYPE, htonl(obj->ops->type->type)) || - nla_put_be32(skb, NFTA_OBJ_USE, htonl(obj->use)) || + if (event == NFT_MSG_DELOBJ || + event == NFT_MSG_DESTROYOBJ) { + nlmsg_end(skb, nlh); + return 0; + } + + if (nla_put_be32(skb, NFTA_OBJ_USE, htonl(obj->use)) || nft_object_dump(skb, NFTA_OBJ_DATA, obj, reset)) goto nla_put_failure;
@@ -9008,9 +9034,9 @@ static int nf_tables_fill_flowtable_info(struct sk_buff *skb, struct net *net, struct nft_hook *hook; struct nlmsghdr *nlh;
- event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event); - nlh = nfnl_msg_put(skb, portid, seq, event, flags, family, - NFNETLINK_V0, nft_base_seq(net)); + nlh = nfnl_msg_put(skb, portid, seq, + nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), + flags, family, NFNETLINK_V0, nft_base_seq(net)); if (!nlh) goto nla_put_failure;
@@ -9020,6 +9046,13 @@ static int nf_tables_fill_flowtable_info(struct sk_buff *skb, struct net *net, NFTA_FLOWTABLE_PAD)) goto nla_put_failure;
+ if (!hook_list && + (event == NFT_MSG_DELFLOWTABLE || + event == NFT_MSG_DESTROYFLOWTABLE)) { + nlmsg_end(skb, nlh); + return 0; + } + if (nla_put_be32(skb, NFTA_FLOWTABLE_USE, htonl(flowtable->use)) || nla_put_be32(skb, NFTA_FLOWTABLE_FLAGS, htonl(flowtable->data.flags))) goto nla_put_failure;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 64102d9bbc3d41dac5188b8fba75b1344c438970 ]
This will soon be read from the packet path at around the same time as the gencursor.
Both gencursor and base_seq get incremented almost at the same time, so it makes sense to place them in the same structure.
This doesn't increase struct net size on 64bit due to padding.
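For illustration, the consumer this prepares for (the later "restart set lookup on base_seq change" patch in this series reads both fields on the packet path; snippet is a sketch, not the literal code):

        /* packet path: both loads now touch the same cache line */
        unsigned int seq = READ_ONCE(net->nft.base_seq);
        u8 genbit = READ_ONCE(net->nft.gencursor);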
Signed-off-by: Florian Westphal fw@strlen.de Stable-dep-of: b2f742c846ca ("netfilter: nf_tables: restart set lookup on base_seq change") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netfilter/nf_tables.h | 1 - include/net/netns/nftables.h | 1 + net/netfilter/nf_tables_api.c | 65 ++++++++++++++++--------------- 3 files changed, 34 insertions(+), 33 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index bad0c6f7ed53d..ee550229d4ffa 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -1909,7 +1909,6 @@ struct nftables_pernet { struct mutex commit_mutex; u64 table_handle; u64 tstamp; - unsigned int base_seq; unsigned int gc_seq; u8 validate_state; struct work_struct destroy_work; diff --git a/include/net/netns/nftables.h b/include/net/netns/nftables.h index cc8060c017d5f..99dd166c5d07c 100644 --- a/include/net/netns/nftables.h +++ b/include/net/netns/nftables.h @@ -3,6 +3,7 @@ #define _NETNS_NFTABLES_H_
struct netns_nftables { + unsigned int base_seq; u8 gencursor; };
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 4430bfa34a993..9422b54ab2c25 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -995,11 +995,14 @@ nf_tables_chain_type_lookup(struct net *net, const struct nlattr *nla, return ERR_PTR(-ENOENT); }
-static __be16 nft_base_seq(const struct net *net) +static unsigned int nft_base_seq(const struct net *net) { - struct nftables_pernet *nft_net = nft_pernet(net); + return READ_ONCE(net->nft.base_seq); +}
- return htons(nft_net->base_seq & 0xffff); +static __be16 nft_base_seq_be16(const struct net *net) +{ + return htons(nft_base_seq(net) & 0xffff); }
static const struct nla_policy nft_table_policy[NFTA_TABLE_MAX + 1] = { @@ -1019,7 +1022,7 @@ static int nf_tables_fill_table_info(struct sk_buff *skb, struct net *net,
nlh = nfnl_msg_put(skb, portid, seq, nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), - flags, family, NFNETLINK_V0, nft_base_seq(net)); + flags, family, NFNETLINK_V0, nft_base_seq_be16(net)); if (!nlh) goto nla_put_failure;
@@ -1112,7 +1115,7 @@ static int nf_tables_dump_tables(struct sk_buff *skb,
rcu_read_lock(); nft_net = nft_pernet(net); - cb->seq = READ_ONCE(nft_net->base_seq); + cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) { if (family != NFPROTO_UNSPEC && family != table->family) @@ -1880,7 +1883,7 @@ static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net,
nlh = nfnl_msg_put(skb, portid, seq, nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), - flags, family, NFNETLINK_V0, nft_base_seq(net)); + flags, family, NFNETLINK_V0, nft_base_seq_be16(net)); if (!nlh) goto nla_put_failure;
@@ -1983,7 +1986,7 @@ static int nf_tables_dump_chains(struct sk_buff *skb,
rcu_read_lock(); nft_net = nft_pernet(net); - cb->seq = READ_ONCE(nft_net->base_seq); + cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) { if (family != NFPROTO_UNSPEC && family != table->family) @@ -3480,7 +3483,7 @@ static int nf_tables_fill_rule_info(struct sk_buff *skb, struct net *net, u16 type = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event);
nlh = nfnl_msg_put(skb, portid, seq, type, flags, family, NFNETLINK_V0, - nft_base_seq(net)); + nft_base_seq_be16(net)); if (!nlh) goto nla_put_failure;
@@ -3648,7 +3651,7 @@ static int nf_tables_dump_rules(struct sk_buff *skb,
rcu_read_lock(); nft_net = nft_pernet(net); - cb->seq = READ_ONCE(nft_net->base_seq); + cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) { if (family != NFPROTO_UNSPEC && family != table->family) @@ -3859,7 +3862,7 @@ static int nf_tables_getrule_reset(struct sk_buff *skb, buf = kasprintf(GFP_ATOMIC, "%.*s:%u", nla_len(nla[NFTA_RULE_TABLE]), (char *)nla_data(nla[NFTA_RULE_TABLE]), - nft_net->base_seq); + nft_base_seq(net)); audit_log_nfcfg(buf, info->nfmsg->nfgen_family, 1, AUDIT_NFT_OP_RULE_RESET, GFP_ATOMIC); kfree(buf); @@ -4670,7 +4673,7 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx, nlh = nfnl_msg_put(skb, portid, seq, nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), flags, ctx->family, NFNETLINK_V0, - nft_base_seq(ctx->net)); + nft_base_seq_be16(ctx->net)); if (!nlh) goto nla_put_failure;
@@ -4812,7 +4815,7 @@ static int nf_tables_dump_sets(struct sk_buff *skb, struct netlink_callback *cb)
rcu_read_lock(); nft_net = nft_pernet(net); - cb->seq = READ_ONCE(nft_net->base_seq); + cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) { if (ctx->family != NFPROTO_UNSPEC && @@ -5988,7 +5991,7 @@ static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb)
rcu_read_lock(); nft_net = nft_pernet(net); - cb->seq = READ_ONCE(nft_net->base_seq); + cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) { if (dump_ctx->ctx.family != NFPROTO_UNSPEC && @@ -6017,7 +6020,7 @@ static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb) seq = cb->nlh->nlmsg_seq;
nlh = nfnl_msg_put(skb, portid, seq, event, NLM_F_MULTI, - table->family, NFNETLINK_V0, nft_base_seq(net)); + table->family, NFNETLINK_V0, nft_base_seq_be16(net)); if (!nlh) goto nla_put_failure;
@@ -6110,7 +6113,7 @@ static int nf_tables_fill_setelem_info(struct sk_buff *skb,
event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event); nlh = nfnl_msg_put(skb, portid, seq, event, flags, ctx->family, - NFNETLINK_V0, nft_base_seq(ctx->net)); + NFNETLINK_V0, nft_base_seq_be16(ctx->net)); if (!nlh) goto nla_put_failure;
@@ -6409,7 +6412,7 @@ static int nf_tables_getsetelem_reset(struct sk_buff *skb, } nelems++; } - audit_log_nft_set_reset(dump_ctx.ctx.table, nft_net->base_seq, nelems); + audit_log_nft_set_reset(dump_ctx.ctx.table, nft_base_seq(info->net), nelems);
out_unlock: rcu_read_unlock(); @@ -8012,7 +8015,7 @@ static int nf_tables_fill_obj_info(struct sk_buff *skb, struct net *net,
nlh = nfnl_msg_put(skb, portid, seq, nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), - flags, family, NFNETLINK_V0, nft_base_seq(net)); + flags, family, NFNETLINK_V0, nft_base_seq_be16(net)); if (!nlh) goto nla_put_failure;
@@ -8077,7 +8080,7 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb)
rcu_read_lock(); nft_net = nft_pernet(net); - cb->seq = READ_ONCE(nft_net->base_seq); + cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) { if (family != NFPROTO_UNSPEC && family != table->family) @@ -8111,7 +8114,7 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) idx++; } if (ctx->reset && entries) - audit_log_obj_reset(table, nft_net->base_seq, entries); + audit_log_obj_reset(table, nft_base_seq(net), entries); if (rc < 0) break; } @@ -8280,7 +8283,7 @@ static int nf_tables_getobj_reset(struct sk_buff *skb, buf = kasprintf(GFP_ATOMIC, "%.*s:%u", nla_len(nla[NFTA_OBJ_TABLE]), (char *)nla_data(nla[NFTA_OBJ_TABLE]), - nft_net->base_seq); + nft_base_seq(net)); audit_log_nfcfg(buf, info->nfmsg->nfgen_family, 1, AUDIT_NFT_OP_OBJ_RESET, GFP_ATOMIC); kfree(buf); @@ -8385,9 +8388,8 @@ void nft_obj_notify(struct net *net, const struct nft_table *table, struct nft_object *obj, u32 portid, u32 seq, int event, u16 flags, int family, int report, gfp_t gfp) { - struct nftables_pernet *nft_net = nft_pernet(net); char *buf = kasprintf(gfp, "%s:%u", - table->name, nft_net->base_seq); + table->name, nft_base_seq(net));
audit_log_nfcfg(buf, family, @@ -9036,7 +9038,7 @@ static int nf_tables_fill_flowtable_info(struct sk_buff *skb, struct net *net,
nlh = nfnl_msg_put(skb, portid, seq, nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event), - flags, family, NFNETLINK_V0, nft_base_seq(net)); + flags, family, NFNETLINK_V0, nft_base_seq_be16(net)); if (!nlh) goto nla_put_failure;
@@ -9104,7 +9106,7 @@ static int nf_tables_dump_flowtable(struct sk_buff *skb,
rcu_read_lock(); nft_net = nft_pernet(net); - cb->seq = READ_ONCE(nft_net->base_seq); + cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) { if (family != NFPROTO_UNSPEC && family != table->family) @@ -9289,17 +9291,16 @@ static void nf_tables_flowtable_destroy(struct nft_flowtable *flowtable) static int nf_tables_fill_gen_info(struct sk_buff *skb, struct net *net, u32 portid, u32 seq) { - struct nftables_pernet *nft_net = nft_pernet(net); struct nlmsghdr *nlh; char buf[TASK_COMM_LEN]; int event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, NFT_MSG_NEWGEN);
nlh = nfnl_msg_put(skb, portid, seq, event, 0, AF_UNSPEC, - NFNETLINK_V0, nft_base_seq(net)); + NFNETLINK_V0, nft_base_seq_be16(net)); if (!nlh) goto nla_put_failure;
- if (nla_put_be32(skb, NFTA_GEN_ID, htonl(nft_net->base_seq)) || + if (nla_put_be32(skb, NFTA_GEN_ID, htonl(nft_base_seq(net))) || nla_put_be32(skb, NFTA_GEN_PROC_PID, htonl(task_pid_nr(current))) || nla_put_string(skb, NFTA_GEN_PROC_NAME, get_task_comm(buf, current))) goto nla_put_failure; @@ -10462,11 +10463,11 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) * Bump generation counter, invalidate any dump in progress. * Cannot fail after this point. */ - base_seq = READ_ONCE(nft_net->base_seq); + base_seq = nft_base_seq(net); while (++base_seq == 0) ;
- WRITE_ONCE(nft_net->base_seq, base_seq); + WRITE_ONCE(net->nft.base_seq, base_seq);
gc_seq = nft_gc_seq_begin(nft_net);
@@ -10698,7 +10699,7 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
nft_commit_notify(net, NETLINK_CB(skb).portid); nf_tables_gen_notify(net, skb, NFT_MSG_NEWGEN); - nf_tables_commit_audit_log(&adl, nft_net->base_seq); + nf_tables_commit_audit_log(&adl, nft_base_seq(net));
nft_gc_seq_end(nft_net, gc_seq); nft_net->validate_state = NFT_VALIDATE_SKIP; @@ -11032,7 +11033,7 @@ static bool nf_tables_valid_genid(struct net *net, u32 genid) mutex_lock(&nft_net->commit_mutex); nft_net->tstamp = get_jiffies_64();
- genid_ok = genid == 0 || nft_net->base_seq == genid; + genid_ok = genid == 0 || nft_base_seq(net) == genid; if (!genid_ok) mutex_unlock(&nft_net->commit_mutex);
@@ -11710,7 +11711,7 @@ static int __net_init nf_tables_init_net(struct net *net) INIT_LIST_HEAD(&nft_net->module_list); INIT_LIST_HEAD(&nft_net->notify_list); mutex_init(&nft_net->commit_mutex); - nft_net->base_seq = 1; + net->nft.base_seq = 1; nft_net->gc_seq = 0; nft_net->validate_state = NFT_VALIDATE_SKIP; INIT_WORK(&nft_net->destroy_work, nf_tables_trans_destroy_work);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 11fe5a82e53ac3581a80c88e0e35fb8a80e15f48 ]
This function was added for retpoline mitigation and is replaced by a static inline helper if mitigations are not enabled.
Enable this helper function unconditionally so next patch can add a lookup restart mechanism to fix possible false negatives while transactions are in progress.
Adding lookup restarts in nft_lookup_eval doesn't work as nft_objref would then need the same copypaste loop.
This patch is separate to ease review of the actual bug fix.
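For readers unfamiliar with the retpoline helper, a condensed sketch of what it does (names taken from the nft_lookup.c hunk below, where the full version lives): with retpolines enabled, comparing set->ops against the known set backend ops turns the indirect ->lookup() call into cheap direct calls on the packet path.

	static const struct nft_set_ext *
	__nft_set_do_lookup(const struct net *net, const struct nft_set *set,
			    const u32 *key)
	{
	#ifdef CONFIG_MITIGATION_RETPOLINE
		/* direct calls avoid the retpoline thunk */
		if (set->ops == &nft_set_hash_fast_type.ops)
			return nft_hash_lookup_fast(net, set, key);
		/* ... one comparison per set backend ... */
	#endif
		return set->ops->lookup(net, set, key);	/* indirect fallback */
	}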
Suggested-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Florian Westphal fw@strlen.de Stable-dep-of: b2f742c846ca ("netfilter: nf_tables: restart set lookup on base_seq change") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netfilter/nf_tables_core.h | 10 ++-------- net/netfilter/nft_lookup.c | 17 ++++++++++++----- 2 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index 6a52fb97b8443..04699eac5b524 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -109,17 +109,11 @@ nft_hash_lookup_fast(const struct net *net, const struct nft_set *set,
 const struct nft_set_ext *
 nft_hash_lookup(const struct net *net, const struct nft_set *set,
 		const u32 *key);
+#endif
+
 const struct nft_set_ext *
 nft_set_do_lookup(const struct net *net, const struct nft_set *set,
 		  const u32 *key);
-#else
-static inline const struct nft_set_ext *
-nft_set_do_lookup(const struct net *net, const struct nft_set *set,
-		  const u32 *key)
-{
-	return set->ops->lookup(net, set, key);
-}
-#endif
 
 /* called from nft_pipapo_avx2.c */
 const struct nft_set_ext *
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index 40c602ffbcba7..2c6909bf1b407 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -24,11 +24,11 @@ struct nft_lookup {
 	struct nft_set_binding		binding;
 };
 
-#ifdef CONFIG_MITIGATION_RETPOLINE
-const struct nft_set_ext *
-nft_set_do_lookup(const struct net *net, const struct nft_set *set,
-		  const u32 *key)
+static const struct nft_set_ext *
+__nft_set_do_lookup(const struct net *net, const struct nft_set *set,
+		    const u32 *key)
 {
+#ifdef CONFIG_MITIGATION_RETPOLINE
 	if (set->ops == &nft_set_hash_fast_type.ops)
 		return nft_hash_lookup_fast(net, set, key);
 	if (set->ops == &nft_set_hash_type.ops)
@@ -51,10 +51,17 @@ nft_set_do_lookup(const struct net *net, const struct nft_set *set,
 		return nft_rbtree_lookup(net, set, key);
 
 	WARN_ON_ONCE(1);
+#endif
 	return set->ops->lookup(net, set, key);
 }
+
+const struct nft_set_ext *
+nft_set_do_lookup(const struct net *net, const struct nft_set *set,
+		  const u32 *key)
+{
+	return __nft_set_do_lookup(net, set, key);
+}
 EXPORT_SYMBOL_GPL(nft_set_do_lookup);
-#endif
 
 void nft_lookup_eval(const struct nft_expr *expr,
 		     struct nft_regs *regs,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit b2f742c846cab9afc5953a5d8f17b54922dcc723 ]
The hash, hash_fast, rhash and bitwise sets may indicate no result even though a matching element exists, during a short time window while another cpu is finalizing the transaction.
This happens when the hash lookup/bitwise lookup function has picked up the old genbit, right before it was toggled by nf_tables_commit(), but then the same cpu managed to unlink the matching old element from the hash table:
cpu0                                  cpu1
                                      has added new elements to clone
                                      has marked elements as being
                                      inactive in new generation
perform lookup in the set
                                      enters commit phase:
A) observes old genbit
                                      increments base_seq
                                      I) increments the genbit
                                      II) removes old element from the set
B) finds matching element
C) returns no match: found element
   is not valid in old generation
Next lookup observes new genbit and finds matching e2.
Consider a packet P matching elements e1 and e2.

cpu0 processes the following transaction:
1. remove e1
2. add e2, which has the same key as e1

P matches both e1 and e2. Therefore, cpu1 should always find a match for P. Due to the above race, this is not the case:
cpu1 observed the old genbit. e2 will not be considered once it is found. The element e1 is not found anymore if cpu0 managed to unlink it from the hlist before cpu1 found it during list traversal.
The situation only occurs for a brief time period; lookups happening after I) observe the new genbit and return e2.
This problem exists in all set types except nft_set_pipapo, so fix it once in nft_lookup rather than each set ops individually.
Sample the base sequence counter, which gets incremented right before the genbit is changed.
Then, if no match is found, retry the lookup if the base sequence was altered in between.
If the base sequence hasn't changed:
 - No update took place: no-match result is expected.
   This is the common case.
or:
 - nf_tables_commit() hasn't progressed to the genbit update yet.
   Old elements were still visible and a no-match result is expected,
or:
 - nf_tables_commit() updated the genbit: we picked up the new
   base_seq, so the lookup function also picked up the new genbit;
   a no-match result is expected.
If the old genbit was observed, then nft_lookup also picked up the old base_seq: nft_lookup_should_retry() returns true and relookup is performed in the new generation.
This problem was added when the unconditional synchronize_rcu() call that followed the current/next generation bit toggle was removed.
Thanks to Pablo Neira Ayuso for reviewing an earlier version of this patchset, for suggesting re-use of existing base_seq and placement of the restart loop in nft_set_do_lookup().
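In short, this follows the classic sequence-counter retry scheme; a condensed sketch of the acquire/release pairing (the full version is in the hunks below, local declarations elided):

	/* writer, nf_tables_commit(): publish the new base_seq */
	smp_store_release(&net->nft.base_seq, base_seq);

	/* reader, nft_set_do_lookup(): retry a no-match result if a
	 * commit raced with this lookup */
	do {
		seq = smp_load_acquire(&net->nft.base_seq);
		ext = __nft_set_do_lookup(net, set, key);
		if (ext)
			break;
	} while (seq != smp_load_acquire(&net->nft.base_seq));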
Fixes: 0cbc06b3faba ("netfilter: nf_tables: remove synchronize_rcu in commit phase") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 3 ++- net/netfilter/nft_lookup.c | 31 ++++++++++++++++++++++++++++++- 2 files changed, 32 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 9422b54ab2c25..3028d388b2933 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -10467,7 +10467,8 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 	while (++base_seq == 0)
 		;
 
-	WRITE_ONCE(net->nft.base_seq, base_seq);
+	/* pairs with smp_load_acquire in nft_lookup_eval */
+	smp_store_release(&net->nft.base_seq, base_seq);
 
 	gc_seq = nft_gc_seq_begin(nft_net);
 
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index 2c6909bf1b407..58c5b14889c47 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -55,11 +55,40 @@ __nft_set_do_lookup(const struct net *net, const struct nft_set *set,
 	return set->ops->lookup(net, set, key);
 }
 
+static unsigned int nft_base_seq(const struct net *net)
+{
+	/* pairs with smp_store_release() in nf_tables_commit() */
+	return smp_load_acquire(&net->nft.base_seq);
+}
+
+static bool nft_lookup_should_retry(const struct net *net, unsigned int seq)
+{
+	return unlikely(seq != nft_base_seq(net));
+}
+
 const struct nft_set_ext *
 nft_set_do_lookup(const struct net *net, const struct nft_set *set,
 		  const u32 *key)
 {
-	return __nft_set_do_lookup(net, set, key);
+	const struct nft_set_ext *ext;
+	unsigned int base_seq;
+
+	do {
+		base_seq = nft_base_seq(net);
+
+		ext = __nft_set_do_lookup(net, set, key);
+		if (ext)
+			break;
+		/* No match? There is a small chance that lookup was
+		 * performed in the old generation, but nf_tables_commit()
+		 * already unlinked a (matching) element.
+		 *
+		 * We need to repeat the lookup to make sure that we didn't
+		 * miss a matching element in the new generation.
+		 */
+	} while (nft_lookup_should_retry(net, base_seq));
+
+	return ext;
 }
 EXPORT_SYMBOL_GPL(nft_set_do_lookup);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Murali Karicheri m-karicheri2@ti.com
[ Upstream commit 1a8a63a5305e95519de6f941922dfcd8179f82e5 ]
This patch adds support for VLAN CTAG based filtering at the slave devices. The slave ethernet device may be capable of filtering ethernet packets based on the VLAN ID. This requires that when a VLAN interface is created over an HSR/PRP interface, the VID information is passed to the associated slave ethernet devices so that they update their hardware filters to filter ethernet frames based on the VID. This patch adds the required functions to propagate the VID information to the slave devices.
Signed-off-by: Murali Karicheri m-karicheri2@ti.com Signed-off-by: MD Danish Anwar danishanwar@ti.com Reviewed-by: Jiri Pirko jiri@nvidia.com Link: https://patch.msgid.link/20241106091710.3308519-3-danishanwar@ti.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 8884c6939913 ("hsr: use rtnl lock when iterating over ports") Signed-off-by: Sasha Levin sashal@kernel.org --- net/hsr/hsr_device.c | 80 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 79 insertions(+), 1 deletion(-)
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c index 9d0754b3642fd..9f1106bdd4f09 100644 --- a/net/hsr/hsr_device.c +++ b/net/hsr/hsr_device.c @@ -522,6 +522,77 @@ static void hsr_change_rx_flags(struct net_device *dev, int change) } }
+static int hsr_ndo_vlan_rx_add_vid(struct net_device *dev, + __be16 proto, u16 vid) +{ + bool is_slave_a_added = false; + bool is_slave_b_added = false; + struct hsr_port *port; + struct hsr_priv *hsr; + int ret = 0; + + hsr = netdev_priv(dev); + + hsr_for_each_port(hsr, port) { + if (port->type == HSR_PT_MASTER || + port->type == HSR_PT_INTERLINK) + continue; + + ret = vlan_vid_add(port->dev, proto, vid); + switch (port->type) { + case HSR_PT_SLAVE_A: + if (ret) { + /* clean up Slave-B */ + netdev_err(dev, "add vid failed for Slave-A\n"); + if (is_slave_b_added) + vlan_vid_del(port->dev, proto, vid); + return ret; + } + + is_slave_a_added = true; + break; + + case HSR_PT_SLAVE_B: + if (ret) { + /* clean up Slave-A */ + netdev_err(dev, "add vid failed for Slave-B\n"); + if (is_slave_a_added) + vlan_vid_del(port->dev, proto, vid); + return ret; + } + + is_slave_b_added = true; + break; + default: + break; + } + } + + return 0; +} + +static int hsr_ndo_vlan_rx_kill_vid(struct net_device *dev, + __be16 proto, u16 vid) +{ + struct hsr_port *port; + struct hsr_priv *hsr; + + hsr = netdev_priv(dev); + + hsr_for_each_port(hsr, port) { + switch (port->type) { + case HSR_PT_SLAVE_A: + case HSR_PT_SLAVE_B: + vlan_vid_del(port->dev, proto, vid); + break; + default: + break; + } + } + + return 0; +} + static const struct net_device_ops hsr_device_ops = { .ndo_change_mtu = hsr_dev_change_mtu, .ndo_open = hsr_dev_open, @@ -530,6 +601,8 @@ static const struct net_device_ops hsr_device_ops = { .ndo_change_rx_flags = hsr_change_rx_flags, .ndo_fix_features = hsr_fix_features, .ndo_set_rx_mode = hsr_set_rx_mode, + .ndo_vlan_rx_add_vid = hsr_ndo_vlan_rx_add_vid, + .ndo_vlan_rx_kill_vid = hsr_ndo_vlan_rx_kill_vid, };
static const struct device_type hsr_type = { @@ -578,7 +651,8 @@ void hsr_dev_setup(struct net_device *dev)
dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA | NETIF_F_GSO_MASK | NETIF_F_HW_CSUM | - NETIF_F_HW_VLAN_CTAG_TX; + NETIF_F_HW_VLAN_CTAG_TX | + NETIF_F_HW_VLAN_CTAG_FILTER;
dev->features = dev->hw_features;
@@ -661,6 +735,10 @@ int hsr_dev_finalize(struct net_device *hsr_dev, struct net_device *slave[2], (slave[1]->features & NETIF_F_HW_HSR_FWD)) hsr->fwd_offloaded = true;
+ if ((slave[0]->features & NETIF_F_HW_VLAN_CTAG_FILTER) && + (slave[1]->features & NETIF_F_HW_VLAN_CTAG_FILTER)) + hsr_dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER; + res = register_netdevice(hsr_dev); if (res) goto err_unregister;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 8884c693991333ae065830554b9b0c96590b1bb2 ]
hsr_for_each_port is called in many places without holding the RCU read lock, which may trigger warnings on debug kernels. Most of the callers actually hold the rtnl lock instead. So add a new helper hsr_for_each_port_rtnl to allow callers in suitable contexts to iterate the ports safely without explicit RCU locking.
This patch only fixes the callers that hold the rtnl lock. The remaining callers will be fixed in later patches.
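For illustration, the intended calling convention looks roughly like this (a sketch; the helper relies on the optional lockdep-condition argument of list_for_each_entry_rcu(), as shown in the hsr_main.h hunk below):

	ASSERT_RTNL();
	hsr_for_each_port_rtnl(hsr, port) {
		if (port->type == HSR_PT_MASTER)
			continue;
		/* safe: RTNL prevents concurrent list mutation, so no
		 * rcu_read_lock() is needed here */
	}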
Fixes: c5a759117210 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.") Signed-off-by: Hangbin Liu liuhangbin@gmail.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20250905091533.377443-2-liuhangbin@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/hsr/hsr_device.c | 18 +++++++++--------- net/hsr/hsr_main.c | 2 +- net/hsr/hsr_main.h | 3 +++ 3 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c index 9f1106bdd4f09..56b49d477bec0 100644 --- a/net/hsr/hsr_device.c +++ b/net/hsr/hsr_device.c @@ -49,7 +49,7 @@ static bool hsr_check_carrier(struct hsr_port *master)
ASSERT_RTNL();
- hsr_for_each_port(master->hsr, port) { + hsr_for_each_port_rtnl(master->hsr, port) { if (port->type != HSR_PT_MASTER && is_slave_up(port->dev)) { netif_carrier_on(master->dev); return true; @@ -105,7 +105,7 @@ int hsr_get_max_mtu(struct hsr_priv *hsr) struct hsr_port *port;
mtu_max = ETH_DATA_LEN; - hsr_for_each_port(hsr, port) + hsr_for_each_port_rtnl(hsr, port) if (port->type != HSR_PT_MASTER) mtu_max = min(port->dev->mtu, mtu_max);
@@ -139,7 +139,7 @@ static int hsr_dev_open(struct net_device *dev)
hsr = netdev_priv(dev);
- hsr_for_each_port(hsr, port) { + hsr_for_each_port_rtnl(hsr, port) { if (port->type == HSR_PT_MASTER) continue; switch (port->type) { @@ -172,7 +172,7 @@ static int hsr_dev_close(struct net_device *dev) struct hsr_priv *hsr;
hsr = netdev_priv(dev); - hsr_for_each_port(hsr, port) { + hsr_for_each_port_rtnl(hsr, port) { if (port->type == HSR_PT_MASTER) continue; switch (port->type) { @@ -205,7 +205,7 @@ static netdev_features_t hsr_features_recompute(struct hsr_priv *hsr, * may become enabled. */ features &= ~NETIF_F_ONE_FOR_ALL; - hsr_for_each_port(hsr, port) + hsr_for_each_port_rtnl(hsr, port) features = netdev_increment_features(features, port->dev->features, mask); @@ -483,7 +483,7 @@ static void hsr_set_rx_mode(struct net_device *dev)
hsr = netdev_priv(dev);
- hsr_for_each_port(hsr, port) { + hsr_for_each_port_rtnl(hsr, port) { if (port->type == HSR_PT_MASTER) continue; switch (port->type) { @@ -505,7 +505,7 @@ static void hsr_change_rx_flags(struct net_device *dev, int change)
hsr = netdev_priv(dev);
- hsr_for_each_port(hsr, port) { + hsr_for_each_port_rtnl(hsr, port) { if (port->type == HSR_PT_MASTER) continue; switch (port->type) { @@ -533,7 +533,7 @@ static int hsr_ndo_vlan_rx_add_vid(struct net_device *dev,
hsr = netdev_priv(dev);
- hsr_for_each_port(hsr, port) { + hsr_for_each_port_rtnl(hsr, port) { if (port->type == HSR_PT_MASTER || port->type == HSR_PT_INTERLINK) continue; @@ -579,7 +579,7 @@ static int hsr_ndo_vlan_rx_kill_vid(struct net_device *dev,
hsr = netdev_priv(dev);
- hsr_for_each_port(hsr, port) { + hsr_for_each_port_rtnl(hsr, port) { switch (port->type) { case HSR_PT_SLAVE_A: case HSR_PT_SLAVE_B: diff --git a/net/hsr/hsr_main.c b/net/hsr/hsr_main.c index d7ae32473c41a..e1afa54b3fba8 100644 --- a/net/hsr/hsr_main.c +++ b/net/hsr/hsr_main.c @@ -22,7 +22,7 @@ static bool hsr_slave_empty(struct hsr_priv *hsr) { struct hsr_port *port;
- hsr_for_each_port(hsr, port) + hsr_for_each_port_rtnl(hsr, port) if (port->type != HSR_PT_MASTER) return false; return true; diff --git a/net/hsr/hsr_main.h b/net/hsr/hsr_main.h index e26244456f639..f066c9c401c60 100644 --- a/net/hsr/hsr_main.h +++ b/net/hsr/hsr_main.h @@ -231,6 +231,9 @@ struct hsr_priv { #define hsr_for_each_port(hsr, port) \ list_for_each_entry_rcu((port), &(hsr)->ports, port_list)
+#define hsr_for_each_port_rtnl(hsr, port) \ + list_for_each_entry_rcu((port), &(hsr)->ports, port_list, lockdep_rtnl_is_held()) + struct hsr_port *hsr_port_get_hsr(struct hsr_priv *hsr, enum hsr_port_type pt);
/* Caller must ensure skb is a valid HSR frame */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 393c841fe4333cdd856d0ca37b066d72746cfaa6 ]
hsr_port_get_hsr() iterates over ports using hsr_for_each_port(), but many of its callers do not hold the required RCU lock.
Switch to hsr_for_each_port_rtnl(), since most callers already hold the rtnl lock. After review, all callers are covered by either the rtnl lock or the RCU lock, except hsr_dev_xmit(). Fix this by adding an RCU read lock there.
Fixes: c5a759117210 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.") Signed-off-by: Hangbin Liu liuhangbin@gmail.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20250905091533.377443-3-liuhangbin@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/hsr/hsr_device.c | 3 +++ net/hsr/hsr_main.c | 2 +- 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
index 56b49d477bec0..d2ae9fbed9e30 100644
--- a/net/hsr/hsr_device.c
+++ b/net/hsr/hsr_device.c
@@ -226,6 +226,7 @@ static netdev_tx_t hsr_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct hsr_priv *hsr = netdev_priv(dev);
 	struct hsr_port *master;
 
+	rcu_read_lock();
 	master = hsr_port_get_hsr(hsr, HSR_PT_MASTER);
 	if (master) {
 		skb->dev = master->dev;
@@ -238,6 +239,8 @@ static netdev_tx_t hsr_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 		dev_core_stats_tx_dropped_inc(dev);
 		dev_kfree_skb_any(skb);
 	}
+	rcu_read_unlock();
+
 	return NETDEV_TX_OK;
 }
 
diff --git a/net/hsr/hsr_main.c b/net/hsr/hsr_main.c
index e1afa54b3fba8..9e4ce1ccc0229 100644
--- a/net/hsr/hsr_main.c
+++ b/net/hsr/hsr_main.c
@@ -125,7 +125,7 @@ struct hsr_port *hsr_port_get_hsr(struct hsr_priv *hsr, enum hsr_port_type pt)
 {
 	struct hsr_port *port;
 
-	hsr_for_each_port(hsr, port)
+	hsr_for_each_port_rtnl(hsr, port)
 		if (port->type == pt)
 			return port;
 	return NULL;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pengyu Luo mitltlatltl@gmail.com
[ Upstream commit 942e47ab228c7dd27c2ae043c17e7aab2028082c ]
property "qcom,tune-usb2-preem" is for EUSB2_TUNE_USB2_PREEM property "qcom,tune-usb2-amplitude" is for EUSB2_TUNE_IUSB2
The downstream correspondence is as follows: EUSB2_TUNE_USB2_PREEM: Tx pre-emphasis tuning EUSB2_TUNE_IUSB2: HS trasmit amplitude EUSB2_TUNE_SQUELCH_U: Squelch detection threshold EUSB2_TUNE_HSDISC: HS disconnect threshold EUSB2_TUNE_EUSB_SLEW: slew rate
Fixes: 31bc94de7602 ("phy: qualcomm: phy-qcom-eusb2-repeater: Don't zero-out registers") Signed-off-by: Pengyu Luo mitltlatltl@gmail.com Reviewed-by: Konrad Dybcio konrad.dybcio@oss.qualcomm.com Reviewed-by: Luca Weiss luca.weiss@fairphone.com Link: https://lore.kernel.org/r/20250812093957.32235-1-mitltlatltl@gmail.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c b/drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c
index 163950e16dbe1..c173c6244d9e5 100644
--- a/drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c
+++ b/drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c
@@ -127,13 +127,13 @@ static int eusb2_repeater_init(struct phy *phy)
 			     rptr->cfg->init_tbl[i].value);
 
 	/* Override registers from devicetree values */
-	if (!of_property_read_u8(np, "qcom,tune-usb2-amplitude", &val))
+	if (!of_property_read_u8(np, "qcom,tune-usb2-preem", &val))
 		regmap_write(regmap, base + EUSB2_TUNE_USB2_PREEM, val);
 
 	if (!of_property_read_u8(np, "qcom,tune-usb2-disc-thres", &val))
 		regmap_write(regmap, base + EUSB2_TUNE_HSDISC, val);
 
-	if (!of_property_read_u8(np, "qcom,tune-usb2-preem", &val))
+	if (!of_property_read_u8(np, "qcom,tune-usb2-amplitude", &val))
 		regmap_write(regmap, base + EUSB2_TUNE_IUSB2, val);
 
 	/* Wait for status OK */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yi Sun yi.sun@intel.com
[ Upstream commit f41c538881eec4dcf5961a242097d447f848cda6 ]
The call to idxd_free() introduces a duplicate put_device() leading to a reference count underflow:

refcount_t: underflow; use-after-free.
WARNING: CPU: 15 PID: 4428 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
...
Call Trace:
 <TASK>
 idxd_remove+0xe4/0x120 [idxd]
 pci_device_remove+0x3f/0xb0
 device_release_driver_internal+0x197/0x200
 driver_detach+0x48/0x90
 bus_remove_driver+0x74/0xf0
 pci_unregister_driver+0x2e/0xb0
 idxd_exit_module+0x34/0x7a0 [idxd]
 __do_sys_delete_module.constprop.0+0x183/0x280
 do_syscall_64+0x54/0xd70
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
idxd_unregister_devices(), which is invoked at the very beginning of idxd_remove(), already takes care of the necessary put_device() through the following call path:
    idxd_unregister_devices() -> device_unregister() -> put_device()
In addition, when CONFIG_DEBUG_KOBJECT_RELEASE is enabled, put_device() may trigger asynchronous cleanup via schedule_delayed_work(). If idxd_free() is called immediately after, it can result in a use-after-free.
Remove the improper idxd_free() to avoid both the refcount underflow and potential memory corruption during module unload.
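For reference, the final reference drop already happens inside the core helper (from drivers/base/core.c, abridged):

	void device_unregister(struct device *dev)
	{
		device_del(dev);
		put_device(dev);	/* drops the registration reference */
	}

so any additional put_device() on the same device must correspond to a reference the driver itself took.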
Fixes: d5449ff1b04d ("dmaengine: idxd: Add missing idxd cleanup to fix memory leak in remove call") Signed-off-by: Yi Sun yi.sun@intel.com Tested-by: Shuai Xue xueshuai@linux.alibaba.com Reviewed-by: Dave Jiang dave.jiang@intel.com Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com
Link: https://lore.kernel.org/r/20250729150313.1934101-2-yi.sun@intel.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/idxd/init.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 18997f80bdc97..d6308adc4eeb9 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -921,7 +921,6 @@ static void idxd_remove(struct pci_dev *pdev)
 	idxd_cleanup(idxd);
 	pci_iounmap(pdev, idxd->reg_base);
 	put_device(idxd_confdev(idxd));
-	idxd_free(idxd);
 	pci_disable_device(pdev);
 }
 
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yi Sun yi.sun@intel.com
[ Upstream commit b7cb9a034305d52222433fad10c3de10204f29e7 ]
A recent refactor introduced a misplaced put_device() call, resulting in a reference count underflow during module unload.
There is no need to add additional put_device() calls for idxd groups, engines, or workqueues. Although the commit claims: "Note, this also fixes the missing put_device() for idxd groups, engines, and wqs."
It appears no such omission actually existed. The required cleanup is already handled by the call chain: idxd_unregister_devices() -> device_unregister() -> put_device()
Extend idxd_cleanup() to handle the remaining necessary cleanup and remove idxd_cleanup_internals(), which duplicates deallocation logic for idxd, engines, groups, and workqueues. Memory management is also properly handled through the Linux device model.
Fixes: a409e919ca32 ("dmaengine: idxd: Refactor remove call with idxd_cleanup() helper") Signed-off-by: Yi Sun yi.sun@intel.com Tested-by: Shuai Xue xueshuai@linux.alibaba.com Reviewed-by: Dave Jiang dave.jiang@intel.com Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com
Link: https://lore.kernel.org/r/20250729150313.1934101-3-yi.sun@intel.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/idxd/init.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index d6308adc4eeb9..a7344aac8dd98 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -918,7 +918,10 @@ static void idxd_remove(struct pci_dev *pdev)
 	device_unregister(idxd_confdev(idxd));
 	idxd_shutdown(pdev);
 	idxd_device_remove_debugfs(idxd);
-	idxd_cleanup(idxd);
+	perfmon_pmu_remove(idxd);
+	idxd_cleanup_interrupts(idxd);
+	if (device_pasid_enabled(idxd))
+		idxd_disable_system_pasid(idxd);
 	pci_iounmap(pdev, idxd->reg_base);
 	put_device(idxd_confdev(idxd));
 	pci_disable_device(pdev);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit 39aaa337449e71a41d4813be0226a722827ba606 ]
The cleanup in idxd_setup_wqs() has had a couple of bugs because the error handling is a bit subtle. It's simpler to just rewrite it in a cleaner way. The issues here are:
1) If "idxd->max_wqs" is <= 0 then we call put_device(conf_dev) when "conf_dev" hasn't been initialized. 2) If kzalloc_node() fails then again "conf_dev" is invalid. It's either uninitialized or it points to the "conf_dev" from the previous iteration so it leads to a double free.
It's better to free partial loop iterations within the loop, so that the unwinding at the end only has to handle whole loop iterations. I also renamed the labels to describe what the goto does rather than where the goto was located.
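As an illustration of that scheme (alloc_obj/setup_obj/destroy_obj are made-up names, not from the patch):

	for (i = 0; i < n; i++) {
		obj = alloc_obj();
		if (!obj) {
			rc = -ENOMEM;
			goto err_unwind;
		}
		rc = setup_obj(obj);
		if (rc) {
			destroy_obj(obj);	/* partial iteration: undo here */
			goto err_unwind;
		}
		objs[i] = obj;
	}
	return 0;

	err_unwind:
		while (--i >= 0)
			destroy_obj(objs[i]);	/* whole iterations only */
		return rc;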
Fixes: 3fd2f4bc010c ("dmaengine: idxd: fix memory leak in error handling path of idxd_setup_wqs") Reported-by: Colin Ian King colin.i.king@gmail.com Closes: https://lore.kernel.org/all/20250811095836.1642093-1-colin.i.king@gmail.com/ Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: Dave Jiang dave.jiang@intel.com Link: https://lore.kernel.org/r/aJnJW3iYTDDCj9sk@stanley.mountain Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/idxd/init.c | 33 +++++++++++++++++---------------- 1 file changed, 17 insertions(+), 16 deletions(-)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index a7344aac8dd98..74a83203181d6 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -187,27 +187,30 @@ static int idxd_setup_wqs(struct idxd_device *idxd) idxd->wq_enable_map = bitmap_zalloc_node(idxd->max_wqs, GFP_KERNEL, dev_to_node(dev)); if (!idxd->wq_enable_map) { rc = -ENOMEM; - goto err_bitmap; + goto err_free_wqs; }
for (i = 0; i < idxd->max_wqs; i++) { wq = kzalloc_node(sizeof(*wq), GFP_KERNEL, dev_to_node(dev)); if (!wq) { rc = -ENOMEM; - goto err; + goto err_unwind; }
idxd_dev_set_type(&wq->idxd_dev, IDXD_DEV_WQ); conf_dev = wq_confdev(wq); wq->id = i; wq->idxd = idxd; - device_initialize(wq_confdev(wq)); + device_initialize(conf_dev); conf_dev->parent = idxd_confdev(idxd); conf_dev->bus = &dsa_bus_type; conf_dev->type = &idxd_wq_device_type; rc = dev_set_name(conf_dev, "wq%d.%d", idxd->id, wq->id); - if (rc < 0) - goto err; + if (rc < 0) { + put_device(conf_dev); + kfree(wq); + goto err_unwind; + }
mutex_init(&wq->wq_lock); init_waitqueue_head(&wq->err_queue); @@ -218,15 +221,20 @@ static int idxd_setup_wqs(struct idxd_device *idxd) wq->enqcmds_retries = IDXD_ENQCMDS_RETRIES; wq->wqcfg = kzalloc_node(idxd->wqcfg_size, GFP_KERNEL, dev_to_node(dev)); if (!wq->wqcfg) { + put_device(conf_dev); + kfree(wq); rc = -ENOMEM; - goto err; + goto err_unwind; }
if (idxd->hw.wq_cap.op_config) { wq->opcap_bmap = bitmap_zalloc(IDXD_MAX_OPCAP_BITS, GFP_KERNEL); if (!wq->opcap_bmap) { + kfree(wq->wqcfg); + put_device(conf_dev); + kfree(wq); rc = -ENOMEM; - goto err_opcap_bmap; + goto err_unwind; } bitmap_copy(wq->opcap_bmap, idxd->opcap_bmap, IDXD_MAX_OPCAP_BITS); } @@ -237,13 +245,7 @@ static int idxd_setup_wqs(struct idxd_device *idxd)
return 0;
-err_opcap_bmap: - kfree(wq->wqcfg); - -err: - put_device(conf_dev); - kfree(wq); - +err_unwind: while (--i >= 0) { wq = idxd->wqs[i]; if (idxd->hw.wq_cap.op_config) @@ -252,11 +254,10 @@ static int idxd_setup_wqs(struct idxd_device *idxd) conf_dev = wq_confdev(wq); put_device(conf_dev); kfree(wq); - } bitmap_free(idxd->wq_enable_map);
-err_bitmap: +err_free_wqs: kfree(idxd->wqs);
return rc;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anders Roxell anders.roxell@linaro.org
[ Upstream commit e63419dbf2ceb083c1651852209c7f048089ac0f ]
Fix a critical memory allocation bug in edma_setup_from_hw() where queue_priority_map was allocated with insufficient memory. The code declared queue_priority_map as s8 (*)[2] (pointer to array of 2 s8), but allocated memory using sizeof(s8) instead of the correct size.
This caused out-of-bounds memory writes when accessing:

	queue_priority_map[i][0] = i;
	queue_priority_map[i][1] = i;
The bug manifested as kernel crashes with "Oops - undefined instruction" on ARM platforms (BeagleBoard-X15) during EDMA driver probe, as the memory corruption tripped kernel hardening features enabled with Clang.
Change the allocation to use sizeof(*queue_priority_map) which automatically gets the correct size for the 2D array structure.
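The difference, side by side (taken from the hunk below):

	s8 (*queue_priority_map)[2];

	/* before: allocates only (num_tc + 1) * 1 byte */
	queue_priority_map = devm_kcalloc(dev, ecc->num_tc + 1,
					  sizeof(s8), GFP_KERNEL);

	/* after: allocates (num_tc + 1) * sizeof(s8[2]) bytes */
	queue_priority_map = devm_kcalloc(dev, ecc->num_tc + 1,
					  sizeof(*queue_priority_map), GFP_KERNEL);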
Fixes: 2b6b3b742019 ("ARM/dmaengine: edma: Merge the two drivers under drivers/dma/") Signed-off-by: Anders Roxell anders.roxell@linaro.org Link: https://lore.kernel.org/r/20250830094953.3038012-1-anders.roxell@linaro.org Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/ti/edma.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/dma/ti/edma.c b/drivers/dma/ti/edma.c
index 7f861fb07cb83..6333426b4c96c 100644
--- a/drivers/dma/ti/edma.c
+++ b/drivers/dma/ti/edma.c
@@ -2063,8 +2063,8 @@ static int edma_setup_from_hw(struct device *dev, struct edma_soc_info *pdata,
 	 * priority. So Q0 is the highest priority queue and the last queue has
 	 * the lowest priority.
 	 */
-	queue_priority_map = devm_kcalloc(dev, ecc->num_tc + 1, sizeof(s8),
-					  GFP_KERNEL);
+	queue_priority_map = devm_kcalloc(dev, ecc->num_tc + 1,
+					  sizeof(*queue_priority_map), GFP_KERNEL);
 	if (!queue_priority_map)
 		return -ENOMEM;
 
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Kemnade akemnade@kernel.org
[ Upstream commit c05d0b32eebadc8be6e53196e99c64cf2bed1d99 ]
Attach the power good gpio to the regulator device devres instead of the parent device to fix problems if probe is run multiple times (rmmod/insmod or some deferral).
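A short sketch of the ownership rule being applied (device names from the patch below): devres attached to the parent is only released when the parent itself unbinds, so a re-probe of this driver would find the GPIO still claimed.

	/* released when THIS driver unbinds - correct for probe */
	gdp = devm_gpiod_get(&pdev->dev, "epd-pwr-good", GPIOD_IN);

	/* released only when the PARENT goes away - leaks across
	 * rmmod/insmod or deferred-probe cycles of this driver */
	gdp = devm_gpiod_get(pdev->dev.parent, "epd-pwr-good", GPIOD_IN);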
Fixes: 8c485bedfb785 ("regulator: sy7636a: Initial commit") Signed-off-by: Andreas Kemnade akemnade@kernel.org Reviewed-by: Alistair Francis alistair@alistair23.me Reviewed-by: Peng Fan peng.fan@nxp.com Message-ID: 20250906-sy7636-rsrc-v1-2-e2886a9763a7@kernel.org Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/sy7636a-regulator.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/regulator/sy7636a-regulator.c b/drivers/regulator/sy7636a-regulator.c
index d1e7ba1fb3e1a..27e3d939b7bb9 100644
--- a/drivers/regulator/sy7636a-regulator.c
+++ b/drivers/regulator/sy7636a-regulator.c
@@ -83,9 +83,11 @@ static int sy7636a_regulator_probe(struct platform_device *pdev)
 	if (!regmap)
 		return -EPROBE_DEFER;
 
-	gdp = devm_gpiod_get(pdev->dev.parent, "epd-pwr-good", GPIOD_IN);
+	device_set_of_node_from_dev(&pdev->dev, pdev->dev.parent);
+
+	gdp = devm_gpiod_get(&pdev->dev, "epd-pwr-good", GPIOD_IN);
 	if (IS_ERR(gdp)) {
-		dev_err(pdev->dev.parent, "Power good GPIO fault %ld\n", PTR_ERR(gdp));
+		dev_err(&pdev->dev, "Power good GPIO fault %ld\n", PTR_ERR(gdp));
 		return PTR_ERR(gdp);
 	}
 
@@ -105,7 +107,6 @@ static int sy7636a_regulator_probe(struct platform_device *pdev)
 	}
 
 	config.dev = &pdev->dev;
-	config.dev->of_node = pdev->dev.parent->of_node;
 	config.regmap = regmap;
 
 	rdev = devm_regulator_register(&pdev->dev, &desc, &config);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Palmer Dabbelt palmer@rivosinc.com
[ Upstream commit 8d4f1e05ff821a5d59116ab8c3a30fcae81d8597 ]
Without this I get a bunch of build errors like
In file included from ./include/linux/sched/task_stack.h:12,
                 from ./arch/riscv/include/asm/compat.h:12,
                 from ./arch/riscv/include/asm/pgtable.h:115,
                 from ./include/linux/pgtable.h:6,
                 from ./include/linux/mm.h:30,
                 from arch/riscv/kernel/asm-offsets.c:8:
./include/linux/kasan.h:50:37: error: ‘MAX_PTRS_PER_PTE’ undeclared here (not in a function); did you mean ‘PTRS_PER_PTE’?
   50 | extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
      |                                     ^~~~~~~~~~~~~~~~
      |                                     PTRS_PER_PTE
./include/linux/kasan.h:51:8: error: unknown type name ‘pmd_t’; did you mean ‘pgd_t’?
   51 | extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
      |        ^~~~~
      |        pgd_t
./include/linux/kasan.h:51:37: error: ‘MAX_PTRS_PER_PMD’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’?
   51 | extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
      |                                     ^~~~~~~~~~~~~~~~
      |                                     PTRS_PER_PGD
./include/linux/kasan.h:52:8: error: unknown type name ‘pud_t’; did you mean ‘pgd_t’?
   52 | extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
      |        ^~~~~
      |        pgd_t
./include/linux/kasan.h:52:37: error: ‘MAX_PTRS_PER_PUD’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’?
   52 | extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
      |                                     ^~~~~~~~~~~~~~~~
      |                                     PTRS_PER_PGD
./include/linux/kasan.h:53:8: error: unknown type name ‘p4d_t’; did you mean ‘pgd_t’?
   53 | extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
      |        ^~~~~
      |        pgd_t
./include/linux/kasan.h:53:37: error: ‘MAX_PTRS_PER_P4D’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’?
   53 | extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
      |                                     ^~~~~~~~~~~~~~~~
      |                                     PTRS_PER_PGD
Link: https://lore.kernel.org/r/20241126143250.29708-1-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/include/asm/compat.h | 1 - 1 file changed, 1 deletion(-)
diff --git a/arch/riscv/include/asm/compat.h b/arch/riscv/include/asm/compat.h
index aa103530a5c83..6081327e55f5b 100644
--- a/arch/riscv/include/asm/compat.h
+++ b/arch/riscv/include/asm/compat.h
@@ -9,7 +9,6 @@
  */
 #include <linux/types.h>
 #include <linux/sched.h>
-#include <linux/sched/task_stack.h>
 #include <asm-generic/compat.h>
 
 static inline int is_compat_task(void)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathias Nyman mathias.nyman@linux.intel.com
commit edcbe06453ddfde21f6aa763f7cab655f26133cc upstream.
A suspend-resume cycle test revealed a memory leak in 6.17-rc3.

It turns out the slot_id race fix change accidentally ends up calling xhci_free_virt_device() with an incorrect vdev parameter: the vdev variable was reused for temporary purposes right before the call.
Fix this by passing the correct vdev parameter.
The slot_id race fix that caused this regression was targeted for stable, so this needs to be applied there as well.
Fixes: 2eb03376151b ("usb: xhci: Fix slot_id resource race conflict") Reported-by: David Wang 00107082@163.com Closes: https://lore.kernel.org/linux-usb/20250829181354.4450-1-00107082@163.com Suggested-by: Michal Pecio michal.pecio@gmail.com Suggested-by: David Wang 00107082@163.com Cc: stable@vger.kernel.org Tested-by: David Wang 00107082@163.com Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20250902105306.877476-4-mathias.nyman@linux.intel.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-mem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -939,7 +939,7 @@ static void xhci_free_virt_devices_depth
 out:
 	/* we are now at a leaf device */
 	xhci_debugfs_remove_slot(xhci, slot_id);
-	xhci_free_virt_device(xhci, vdev, slot_id);
+	xhci_free_virt_device(xhci, xhci->devs[slot_id], slot_id);
 }
 
 int xhci_alloc_virt_device(struct xhci_hcd *xhci, int slot_id,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alan Stern stern@rowland.harvard.edu
commit 8d63c83d8eb922f6c316320f50c82fa88d099bea upstream.
Yunseong Kim and the syzbot fuzzer both reported a problem in RT-enabled kernels caused by the way dummy-hcd mixes interrupt management and spin-locking. The pattern was:
	local_irq_save(flags);
	spin_lock(&dum->lock);
	...
	spin_unlock(&dum->lock);
	...			// calls usb_gadget_giveback_request()
	local_irq_restore(flags);
The code was written this way because usb_gadget_giveback_request() needs to be called with interrupts disabled and the private lock not held.
While this pattern works fine in non-RT kernels, it's not good when RT is enabled. RT kernels handle spinlocks much like mutexes; in particular, spin_lock() may sleep. But sleeping is not allowed while local interrupts are disabled.
To fix the problem, rewrite the code to conform to the pattern used elsewhere in dummy-hcd and other UDC drivers:
	spin_lock_irqsave(&dum->lock, flags);
	...
	spin_unlock(&dum->lock);
	usb_gadget_giveback_request(...);
	spin_lock(&dum->lock);
	...
	spin_unlock_irqrestore(&dum->lock, flags);
This approach satisfies the RT requirements.
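The key difference, as a sketch: on PREEMPT_RT a spinlock_t is a sleeping lock and spin_lock_irqsave() does not disable hard interrupts, so the rewritten sequence never sleeps with interrupts off.

	/* not RT-safe: spin_lock() may sleep on RT, but sleeping is
	 * forbidden while hard interrupts are disabled */
	local_irq_save(flags);
	spin_lock(&dum->lock);

	/* RT-safe: a sleeping-lock acquisition on RT, the classic
	 * irqsave locking on !RT */
	spin_lock_irqsave(&dum->lock, flags);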
Signed-off-by: Alan Stern stern@rowland.harvard.edu Cc: stable stable@kernel.org Fixes: b4dbda1a22d2 ("USB: dummy-hcd: disable interrupts during req->complete") Reported-by: Yunseong Kim ysk@kzalloc.com Closes: https://lore.kernel.org/linux-usb/5b337389-73b9-4ee4-a83e-7e82bf5af87a@kzalloc.com/ Reported-by: syzbot+8baacc4139f12fa77909@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/68ac2411.050a0220.37038e.0087.GAE@google.com/ Tested-by: syzbot+8baacc4139f12fa77909@syzkaller.appspotmail.com CC: Sebastian Andrzej Siewior bigeasy@linutronix.de CC: stable@vger.kernel.org Reviewed-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Link: https://lore.kernel.org/r/bb192ae2-4eee-48ee-981f-3efdbbd0d8f0@rowland.harva... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/gadget/udc/dummy_hcd.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/usb/gadget/udc/dummy_hcd.c
+++ b/drivers/usb/gadget/udc/dummy_hcd.c
@@ -764,8 +764,7 @@ static int dummy_dequeue(struct usb_ep *
 	if (!dum->driver)
 		return -ESHUTDOWN;
 
-	local_irq_save(flags);
-	spin_lock(&dum->lock);
+	spin_lock_irqsave(&dum->lock, flags);
 	list_for_each_entry(iter, &ep->queue, queue) {
 		if (&iter->req != _req)
 			continue;
@@ -775,15 +774,16 @@ static int dummy_dequeue(struct usb_ep *
 		retval = 0;
 		break;
 	}
-	spin_unlock(&dum->lock);
 
 	if (retval == 0) {
 		dev_dbg(udc_dev(dum),
			"dequeued req %p from %s, len %d buf %p\n",
			req, _ep->name, _req->length, _req->buf);
+		spin_unlock(&dum->lock);
 		usb_gadget_giveback_request(_ep, _req);
+		spin_lock(&dum->lock);
 	}
-	local_irq_restore(flags);
+	spin_unlock_irqrestore(&dum->lock, flags);
 	return retval;
 }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: RD Babiera rdbabiera@google.com
commit f34bfcc77b18375a87091c289c2eb53c249787b4 upstream.
tcpm_handle_vdm_request delivers messages to the partner altmode or the cable altmode depending on the SVDM response type, which is incorrect. The partner or cable should be chosen based on the received message type instead.
Also add this filter to ADEV_NOTIFY_USB_AND_QUEUE_VDM, which is used when the Enter Mode command is responded to by a NAK on SOP or SOP' and when the Exit Mode command is responded to by an ACK on SOP.
Fixes: 7e7877c55eb1 ("usb: typec: tcpm: add alt mode enter/exit/vdm support for sop'") Cc: stable@vger.kernel.org Signed-off-by: RD Babiera rdbabiera@google.com Reviewed-by: Badhri Jagan Sridharan badhri@google.com Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20250821203759.1720841-2-rdbabiera@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/tcpm/tcpm.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -2375,17 +2375,21 @@ static void tcpm_handle_vdm_request(stru
 	case ADEV_NONE:
 		break;
 	case ADEV_NOTIFY_USB_AND_QUEUE_VDM:
-		WARN_ON(typec_altmode_notify(adev, TYPEC_STATE_USB, NULL));
-		typec_altmode_vdm(adev, p[0], &p[1], cnt);
+		if (rx_sop_type == TCPC_TX_SOP_PRIME) {
+			typec_cable_altmode_vdm(adev, TYPEC_PLUG_SOP_P, p[0], &p[1], cnt);
+		} else {
+			WARN_ON(typec_altmode_notify(adev, TYPEC_STATE_USB, NULL));
+			typec_altmode_vdm(adev, p[0], &p[1], cnt);
+		}
 		break;
 	case ADEV_QUEUE_VDM:
-		if (response_tx_sop_type == TCPC_TX_SOP_PRIME)
+		if (rx_sop_type == TCPC_TX_SOP_PRIME)
 			typec_cable_altmode_vdm(adev, TYPEC_PLUG_SOP_P, p[0], &p[1], cnt);
 		else
 			typec_altmode_vdm(adev, p[0], &p[1], cnt);
 		break;
 	case ADEV_QUEUE_VDM_SEND_EXIT_MODE_ON_FAIL:
-		if (response_tx_sop_type == TCPC_TX_SOP_PRIME) {
+		if (rx_sop_type == TCPC_TX_SOP_PRIME) {
 			if (typec_cable_altmode_vdm(adev, TYPEC_PLUG_SOP_P,
						    p[0], &p[1], cnt)) {
 				int svdm_version = typec_get_cable_svdm_version(
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 21d8525d2e061cde034277d518411b02eac764e2 upstream.
The gadget card driver forgot to call snd_ump_update_group_attrs() after adding FBs, which leaves the UMP group attributes uninitialized. As a result, a -ENODEV error is returned when opening a legacy rawmidi device, as the corresponding group appears inactive.
This patch adds the missing call to address the behavior above.
Fixes: 8b645922b223 ("usb: gadget: Add support for USB MIDI 2.0 function driver") Cc: stable stable@kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Link: https://lore.kernel.org/r/20250904153932.13589-1-tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/gadget/function/f_midi2.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/usb/gadget/function/f_midi2.c
+++ b/drivers/usb/gadget/function/f_midi2.c
@@ -1601,6 +1601,7 @@ static int f_midi2_create_card(struct f_
 			strscpy(fb->info.name, ump_fb_name(b),
 				sizeof(fb->info.name));
 		}
+		snd_ump_update_group_attrs(ump);
 	}
 
 	for (i = 0; i < midi2->num_eps; i++) {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 116e79c679a1530cf833d0ff3007061d7a716bd9 upstream.
The EP-IN of MIDI2 (altset 1) wasn't initialized in f_midi2_create_usb_configs(), as it's an INT EP unlike the other BULK EPs. But this leaves the max packet size unchanged no matter which speed is used, resulting in very slow access. The wMaxPacketSize values set there look legit for INT EPs as well, so let's initialize the MIDI2 EP-IN there too, to achieve the equivalent speed.
Fixes: 8b645922b223 ("usb: gadget: Add support for USB MIDI 2.0 function driver") Cc: stable stable@kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Link: https://lore.kernel.org/r/20250905133240.20966-1-tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/gadget/function/f_midi2.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/drivers/usb/gadget/function/f_midi2.c
+++ b/drivers/usb/gadget/function/f_midi2.c
@@ -1739,9 +1739,12 @@ static int f_midi2_create_usb_configs(st
 	case USB_SPEED_HIGH:
 		midi2_midi1_ep_out_desc.wMaxPacketSize = cpu_to_le16(512);
 		midi2_midi1_ep_in_desc.wMaxPacketSize = cpu_to_le16(512);
-		for (i = 0; i < midi2->num_eps; i++)
+		for (i = 0; i < midi2->num_eps; i++) {
 			midi2_midi2_ep_out_desc[i].wMaxPacketSize =
							cpu_to_le16(512);
+			midi2_midi2_ep_in_desc[i].wMaxPacketSize =
+							cpu_to_le16(512);
+		}
 		fallthrough;
 	case USB_SPEED_FULL:
 		midi1_in_eps = midi2_midi1_ep_in_descs;
@@ -1750,9 +1753,12 @@ static int f_midi2_create_usb_configs(st
 	case USB_SPEED_SUPER:
 		midi2_midi1_ep_out_desc.wMaxPacketSize = cpu_to_le16(1024);
 		midi2_midi1_ep_in_desc.wMaxPacketSize = cpu_to_le16(1024);
-		for (i = 0; i < midi2->num_eps; i++)
+		for (i = 0; i < midi2->num_eps; i++) {
 			midi2_midi2_ep_out_desc[i].wMaxPacketSize =
							cpu_to_le16(1024);
+			midi2_midi2_ep_in_desc[i].wMaxPacketSize =
+							cpu_to_le16(1024);
+		}
 		midi1_in_eps = midi2_midi1_ep_in_ss_descs;
 		midi1_out_eps = midi2_midi1_ep_out_ss_descs;
 		break;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephan Gerhold stephan.gerhold@linaro.org
commit 5068b5254812433e841a40886e695633148d362d upstream.
When we don't have a clock specified in the device tree, we have no way to ensure the BAM is on. This is often the case for remotely-controlled or remotely-powered BAM instances. In this case, we need to read num-channels from the DT to have all the necessary information to complete probing.
However, at the moment invalid device trees without clock and without num-channels still continue probing, because the error handling is missing return statements. The driver will then later try to read the number of channels from the registers. This is unsafe, because it relies on boot firmware and lucky timing to succeed. Unfortunately, the lack of proper error handling here has been abused for several Qualcomm SoCs upstream, causing early boot crashes in several situations [1, 2].
Avoid these early crashes by erroring out when any of the required DT properties are missing. Note that this will break some of the existing DTs upstream (mainly BAM instances related to the crypto engine). However, clearly these DTs have never been tested properly, since the error in the kernel log was just ignored. It's safer to disable the crypto engine for these broken DTBs.
[1]: https://lore.kernel.org/r/CY01EKQVWE36.B9X5TDXAREPF@fairphone.com/ [2]: https://lore.kernel.org/r/20230626145959.646747-1-krzysztof.kozlowski@linaro...
Cc: stable@vger.kernel.org Fixes: 48d163b1aa6e ("dmaengine: qcom: bam_dma: get num-channels and num-ees from dt") Signed-off-by: Stephan Gerhold stephan.gerhold@linaro.org Reviewed-by: Konrad Dybcio konrad.dybcio@oss.qualcomm.com Link: https://lore.kernel.org/r/20250212-bam-dma-fixes-v1-8-f560889e65d8@linaro.or... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/qcom/bam_dma.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/dma/qcom/bam_dma.c
+++ b/drivers/dma/qcom/bam_dma.c
@@ -1283,13 +1283,17 @@ static int bam_dma_probe(struct platform
 	if (!bdev->bamclk) {
 		ret = of_property_read_u32(pdev->dev.of_node, "num-channels",
 					   &bdev->num_channels);
-		if (ret)
+		if (ret) {
 			dev_err(bdev->dev, "num-channels unspecified in dt\n");
+			return ret;
+		}
 
 		ret = of_property_read_u32(pdev->dev.of_node, "qcom,num-ees",
 					   &bdev->num_ees);
-		if (ret)
+		if (ret) {
 			dev_err(bdev->dev, "num-ees unspecified in dt\n");
+			return ret;
+		}
 	}
 
 	ret = clk_prepare_enable(bdev->bamclk);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miaoqian Lin linmq006@gmail.com
commit aa2e1e4563d3ab689ffa86ca1412ecbf9fd3b308 upstream.
The reference taken by of_find_device_by_node() must be released when it is no longer needed. Add the missing put_device() call to fix the device reference leak.
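The general pattern, as a minimal sketch:

	struct platform_device *pdev;

	pdev = of_find_device_by_node(np);	/* takes a reference */
	if (!pdev)
		return ERR_PTR(-ENODEV);

	/* ... use pdev ... */

	put_device(&pdev->dev);			/* every path must drop it */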
Fixes: 134d9c52fca2 ("dmaengine: dw: dmamux: Introduce RZN1 DMA router support") Cc: stable@vger.kernel.org Signed-off-by: Miaoqian Lin linmq006@gmail.com Reviewed-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/r/20250902090358.2423285-1-linmq006@gmail.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/dw/rzn1-dmamux.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
--- a/drivers/dma/dw/rzn1-dmamux.c
+++ b/drivers/dma/dw/rzn1-dmamux.c
@@ -48,12 +48,16 @@ static void *rzn1_dmamux_route_allocate(
 	u32 mask;
 	int ret;
 
-	if (dma_spec->args_count != RNZ1_DMAMUX_NCELLS)
-		return ERR_PTR(-EINVAL);
+	if (dma_spec->args_count != RNZ1_DMAMUX_NCELLS) {
+		ret = -EINVAL;
+		goto put_device;
+	}
 
 	map = kzalloc(sizeof(*map), GFP_KERNEL);
-	if (!map)
-		return ERR_PTR(-ENOMEM);
+	if (!map) {
+		ret = -ENOMEM;
+		goto put_device;
+	}
 
 	chan = dma_spec->args[0];
 	map->req_idx = dma_spec->args[4];
@@ -94,12 +98,15 @@ static void *rzn1_dmamux_route_allocate(
 	if (ret)
 		goto clear_bitmap;
 
+	put_device(&pdev->dev);
 	return map;
 
 clear_bitmap:
 	clear_bit(map->req_idx, dmamux->used_chans);
 free_map:
 	kfree(map);
+put_device:
+	put_device(&pdev->dev);
 
 	return ERR_PTR(ret);
 }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan@kernel.org
commit bca065733afd1e3a89a02f05ffe14e966cd5f78e upstream.
Make sure to drop the references taken to the PMC OF node and device by of_parse_phandle() and of_find_device_by_node() during probe.
Note that holding a reference to the PMC device does not prevent the PMC regmap from going away (e.g. if the PMC driver is unbound), so there is no need to keep the reference.
Fixes: 2d1021487273 ("phy: tegra: xusb: Add wake/sleepwalk for Tegra210") Cc: stable@vger.kernel.org # 5.14 Cc: JC Kuo jckuo@nvidia.com Signed-off-by: Johan Hovold johan@kernel.org Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Link: https://lore.kernel.org/r/20250724131206.2211-2-johan@kernel.org Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/phy/tegra/xusb-tegra210.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/phy/tegra/xusb-tegra210.c
+++ b/drivers/phy/tegra/xusb-tegra210.c
@@ -3164,18 +3164,22 @@ tegra210_xusb_padctl_probe(struct device
 	}
 
 	pdev = of_find_device_by_node(np);
+	of_node_put(np);
 	if (!pdev) {
 		dev_warn(dev, "PMC device is not available\n");
 		goto out;
 	}
 
-	if (!platform_get_drvdata(pdev))
+	if (!platform_get_drvdata(pdev)) {
+		put_device(&pdev->dev);
 		return ERR_PTR(-EPROBE_DEFER);
+	}
 
 	padctl->regmap = dev_get_regmap(&pdev->dev, "usb_sleepwalk");
 	if (!padctl->regmap)
 		dev_info(dev, "failed to find PMC regmap\n");
 
+	put_device(&pdev->dev);
 out:
 	return &padctl->base;
 }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan@kernel.org
commit 64961557efa1b98f375c0579779e7eeda1a02c42 upstream.
Make sure to drop the reference to the control device taken by of_find_device_by_node() during probe when the driver is unbound.
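devm_add_action_or_reset() fits this kind of fix well: it registers a release action that runs on unbind and, if the registration itself fails, runs the action immediately, so no separate error path is needed. A minimal sketch (my_put_device is a stand-in for the helper added below):

	static void my_put_device(void *data)
	{
		put_device(data);
	}

	ret = devm_add_action_or_reset(dev, my_put_device,
				       &control_pdev->dev);
	if (ret)
		return ret;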
Fixes: 478b6c7436c2 ("usb: phy: omap-usb2: Don't use omap_get_control_dev()") Cc: stable@vger.kernel.org # 3.13 Cc: Roger Quadros rogerq@kernel.org Signed-off-by: Johan Hovold johan@kernel.org Link: https://lore.kernel.org/r/20250724131206.2211-3-johan@kernel.org Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/phy/ti/phy-omap-usb2.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
--- a/drivers/phy/ti/phy-omap-usb2.c
+++ b/drivers/phy/ti/phy-omap-usb2.c
@@ -363,6 +363,13 @@ static void omap_usb2_init_errata(struct
 	phy->flags |= OMAP_USB2_DISABLE_CHRG_DET;
 }

+static void omap_usb2_put_device(void *_dev)
+{
+	struct device *dev = _dev;
+
+	put_device(dev);
+}
+
 static int omap_usb2_probe(struct platform_device *pdev)
 {
 	struct omap_usb *phy;
@@ -373,6 +380,7 @@ static int omap_usb2_probe(struct platfo
 	struct device_node *control_node;
 	struct platform_device *control_pdev;
 	const struct usb_phy_data *phy_data;
+	int ret;

 	phy_data = device_get_match_data(&pdev->dev);
 	if (!phy_data)
@@ -423,6 +431,11 @@ static int omap_usb2_probe(struct platfo
 			return -EINVAL;
 		}
 		phy->control_dev = &control_pdev->dev;
+
+		ret = devm_add_action_or_reset(&pdev->dev, omap_usb2_put_device,
+					       phy->control_dev);
+		if (ret)
+			return ret;
 	} else {
 		if (of_property_read_u32_index(node, "syscon-phy-power", 1,
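This patch and the phy-ti-pipe3 one that follows apply the same devm pattern; a minimal sketch, assuming a hypothetical probe that has already looked up control_dev:

	/*
	 * Sketch of the devm pattern, not the driver code: registering
	 * put_device() as a devm action ties the reference's lifetime to
	 * the device binding, so it is dropped automatically on unbind,
	 * and immediately (via the _or_reset variant) if registration
	 * itself fails.
	 */
	#include <linux/device.h>
	#include <linux/platform_device.h>

	static void sketch_put_device(void *data)
	{
		put_device(data);
	}

	static int sketch_bind_ref(struct platform_device *pdev,
				   struct device *control_dev)
	{
		return devm_add_action_or_reset(&pdev->dev, sketch_put_device,
						control_dev);
	}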
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan@kernel.org
commit e19bcea99749ce8e8f1d359f68ae03210694ad56 upstream.
Make sure to drop the reference to the control device taken by of_find_device_by_node() during probe when the driver is unbound.
Fixes: 918ee0d21ba4 ("usb: phy: omap-usb3: Don't use omap_get_control_dev()")
Cc: stable@vger.kernel.org # 3.13
Cc: Roger Quadros rogerq@kernel.org
Signed-off-by: Johan Hovold johan@kernel.org
Link: https://lore.kernel.org/r/20250724131206.2211-4-johan@kernel.org
Signed-off-by: Vinod Koul vkoul@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/phy/ti/phy-ti-pipe3.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
--- a/drivers/phy/ti/phy-ti-pipe3.c
+++ b/drivers/phy/ti/phy-ti-pipe3.c
@@ -667,12 +667,20 @@ static int ti_pipe3_get_clk(struct ti_pi
 	return 0;
 }

+static void ti_pipe3_put_device(void *_dev)
+{
+	struct device *dev = _dev;
+
+	put_device(dev);
+}
+
 static int ti_pipe3_get_sysctrl(struct ti_pipe3 *phy)
 {
 	struct device *dev = phy->dev;
 	struct device_node *node = dev->of_node;
 	struct device_node *control_node;
 	struct platform_device *control_pdev;
+	int ret;

 	phy->phy_power_syscon = syscon_regmap_lookup_by_phandle(node,
 							"syscon-phy-power");
@@ -704,6 +712,11 @@ static int ti_pipe3_get_sysctrl(struct t
 		}

 		phy->control_dev = &control_pdev->dev;
+
+		ret = devm_add_action_or_reset(dev, ti_pipe3_put_device,
+					       phy->control_dev);
+		if (ret)
+			return ret;
 	}

 	if (phy->mode == PIPE3_MODE_PCIE) {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: K Prateek Nayak kprateek.nayak@amd.com
commit cba4262a19afae21665ee242b3404bcede5a94d7 upstream.
Support for parsing the topology on AMD/Hygon processors using CPUID leaf 0xb was added in
3986a0a805e6 ("x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available").
In an effort to keep all the topology parsing bits in one place, this commit also introduced a pseudo dependency on the TOPOEXT feature to parse the CPUID leaf 0xb.
The TOPOEXT feature (CPUID 0x80000001 ECX[22]) advertises support for the Cache Properties leaf 0x8000001d and the CPUID leaf 0x8000001e EAX for "Extended APIC ID"; however, support for 0xb was introduced alongside the x2APIC support, not only on AMD [1] but also historically on x86 [2].
Similar to 0xb, the support for extended CPU topology leaf 0x80000026 too does not depend on the TOPOEXT feature.
The support for these leaves is expected to be confirmed by ensuring
leaf <= {extended_}cpuid_level
and then parsing the level 0 of the respective leaf to confirm EBX[15:0] (LogProcAtThisLevel) is non-zero as stated in the definition of "CPUID_Fn0000000B_EAX_x00 [Extended Topology Enumeration] (Core::X86::Cpuid::ExtTopEnumEax0)" in Processor Programming Reference (PPR) for AMD Family 19h Model 01h Rev B1 Vol1 [3] Sec. 2.1.15.1 "CPUID Instruction Functions".
This has not been a problem on baremetal platforms since support for TOPOEXT (Fam 0x15 and later) predates the support for CPUID leaf 0xb (Fam 0x17[Zen2] and later). For AMD guests on QEMU, however, the "x2apic" feature can be enabled independently of the "topoext" feature, and QEMU expects the topology and the initial APIC ID to be parsed using CPUID leaf 0xb (especially when the number of cores > 255), which is populated independently of the "topoext" feature flag.
Unconditionally call cpu_parse_topology_ext() on AMD and Hygon processors to first parse the topology using the XTOPOLOGY leaves (0x80000026 / 0xb) before using the TOPOEXT leaf (0x8000001e).
While at it, break down the single large comment in parse_topology_amd() to better highlight the purpose of each CPUID leaf.
Fixes: 3986a0a805e6 ("x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available")
Suggested-by: Naveen N Rao (AMD) naveen@kernel.org
Signed-off-by: K Prateek Nayak kprateek.nayak@amd.com
Signed-off-by: Borislav Petkov (AMD) bp@alien8.de
Cc: stable@vger.kernel.org # Only v6.9 and above; depends on x86 topology rewrite
Link: https://lore.kernel.org/lkml/1529686927-7665-1-git-send-email-suravee.suthik... [1]
Link: https://lore.kernel.org/lkml/20080818181435.523309000@linux-os.sc.intel.com/ [2]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 [3]
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kernel/cpu/topology_amd.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)
--- a/arch/x86/kernel/cpu/topology_amd.c
+++ b/arch/x86/kernel/cpu/topology_amd.c
@@ -174,24 +174,27 @@ static void topoext_fixup(struct topo_sc

 static void parse_topology_amd(struct topo_scan *tscan)
 {
-	bool has_topoext = false;
-
 	/*
-	 * If the extended topology leaf 0x8000_001e is available
-	 * try to get SMT, CORE, TILE, and DIE shifts from extended
+	 * Try to get SMT, CORE, TILE, and DIE shifts from extended
 	 * CPUID leaf 0x8000_0026 on supported processors first. If
 	 * extended CPUID leaf 0x8000_0026 is not supported, try to
-	 * get SMT and CORE shift from leaf 0xb first, then try to
-	 * get the CORE shift from leaf 0x8000_0008.
+	 * get SMT and CORE shift from leaf 0xb. If either leaf is
+	 * available, cpu_parse_topology_ext() will return true.
 	 */
-	if (cpu_feature_enabled(X86_FEATURE_TOPOEXT))
-		has_topoext = cpu_parse_topology_ext(tscan);
+	bool has_xtopology = cpu_parse_topology_ext(tscan);

-	if (!has_topoext && !parse_8000_0008(tscan))
+	/*
+	 * If XTOPOLOGY leaves (0x26/0xb) are not available, try to
+	 * get the CORE shift from leaf 0x8000_0008 first.
+	 */
+	if (!has_xtopology && !parse_8000_0008(tscan))
 		return;

-	/* Prefer leaf 0x8000001e if available */
-	if (parse_8000_001e(tscan, has_topoext))
+	/*
+	 * Prefer leaf 0x8000001e if available to get the SMT shift and
+	 * the initial APIC ID if XTOPOLOGY leaves are not available.
+	 */
+	if (parse_8000_001e(tscan, has_xtopology))
 		return;

 	/* Try the NODEID MSR */
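The availability check the commit message describes can be sketched in user space; a minimal example, assuming GCC's <cpuid.h> intrinsics (not the kernel's parser):

	/*
	 * Leaf 0xb is usable when it is within the reported cpuid level
	 * and level 0 reports a non-zero LogProcAtThisLevel (EBX[15:0]).
	 */
	#include <cpuid.h>
	#include <stdbool.h>
	#include <stdio.h>

	static bool leaf_0xb_usable(void)
	{
		unsigned int eax, ebx, ecx, edx;

		/* Highest supported standard leaf comes from leaf 0, EAX. */
		if (__get_cpuid(0, &eax, &ebx, &ecx, &edx) == 0 || eax < 0xb)
			return false;

		/* Level 0 of leaf 0xb: ECX selects the level. */
		__cpuid_count(0xb, 0, eax, ebx, ecx, edx);

		/* EBX[15:0] (LogProcAtThisLevel) must be non-zero. */
		return (ebx & 0xffff) != 0;
	}

	int main(void)
	{
		printf("CPUID leaf 0xb usable: %s\n",
		       leaf_0xb_usable() ? "yes" : "no");
		return 0;
	}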
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Buday Csaba buday.csaba@prolan.hu
commit 8ea25274ebaf2f6be8be374633b2ed8348ec0e70 upstream.
reset_gpio is claimed in mdiobus_register_device(), but it is not released in mdiobus_unregister_device(). It is instead only released when the whole MDIO bus is unregistered. When a device uses the reset_gpio property, it becomes impossible to unregister it and register it again, because the GPIO remains claimed. This patch resolves that issue.
Fixes: bafbdd527d56 ("phylib: Add device reset GPIO support") # see notes
Reviewed-by: Andrew Lunn andrew@lunn.ch
Cc: Csókás Bence csokas.bence@prolan.hu
[ csokas.bence: Resolve rebase conflict and clarify msg ]
Signed-off-by: Buday Csaba buday.csaba@prolan.hu
Link: https://patch.msgid.link/20250807135449.254254-2-csokas.bence@prolan.hu
Signed-off-by: Paolo Abeni pabeni@redhat.com
[ csokas.bence: Use the v1 patch on top of 6.12, as specified in notes ]
Signed-off-by: Bence Csókás csokas.bence@prolan.hu
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/phy/mdio_bus.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -97,6 +97,7 @@ int mdiobus_unregister_device(struct mdi
 	if (mdiodev->bus->mdio_map[mdiodev->addr] != mdiodev)
 		return -EINVAL;

+	gpiod_put(mdiodev->reset_gpio);
 	reset_control_put(mdiodev->reset_ctrl);

 	mdiodev->bus->mdio_map[mdiodev->addr] = NULL;
@@ -814,9 +815,6 @@ void mdiobus_unregister(struct mii_bus *
 		if (!mdiodev)
 			continue;

-		if (mdiodev->reset_gpio)
-			gpiod_put(mdiodev->reset_gpio);
-
 		mdiodev->device_remove(mdiodev);
 		mdiodev->device_free(mdiodev);
 	}
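Reduced to the pairing it restores, the unregister path now looks like this sketch (assumed context, not the full function; it relies on gpiod_put() and reset_control_put() both tolerating NULL):

	/*
	 * Whatever mdiobus_register_device() claimed for this device is
	 * released here, per device, rather than when the whole bus goes
	 * away. Both put helpers accept NULL, so no guards are needed
	 * for devices without a reset line.
	 */
	#include <linux/gpio/consumer.h>
	#include <linux/mdio.h>
	#include <linux/reset.h>

	static void sketch_release_resets(struct mdio_device *mdiodev)
	{
		gpiod_put(mdiodev->reset_gpio);		/* NULL-safe */
		reset_control_put(mdiodev->reset_ctrl);	/* NULL-safe */
	}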
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jani Nikula jani.nikula@intel.com
commit cfa7b7659757f8d0fc4914429efa90d0d2577dd7 upstream.
for_each_set_bit() expects size to be in bits, not bytes. The abox mask iteration passes bytes, but it works by coincidence: the local variable holding the mask is unsigned long, and the mask only ever has bit 2 as its highest set bit. Using a smaller type could lead to subtle and very hard to track down bugs.
Fixes: 62afef2811e4 ("drm/i915/rkl: RKL uses ABOX0 for pixel transfers")
Cc: Ville Syrjälä ville.syrjala@linux.intel.com
Cc: Matt Roper matthew.d.roper@intel.com
Cc: stable@vger.kernel.org # v5.9+
Reviewed-by: Matt Roper matthew.d.roper@intel.com
Link: https://lore.kernel.org/r/20250905104149.1144751-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula jani.nikula@intel.com
(cherry picked from commit 7ea3baa6efe4bb93d11e1c0e6528b1468d7debf6)
Signed-off-by: Tvrtko Ursulin tursulin@ursulin.net
[ adapted struct intel_display *display parameters to struct drm_i915_private *dev_priv ]
Signed-off-by: Sasha Levin sashal@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/i915/display/intel_display_power.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_display_power.c
+++ b/drivers/gpu/drm/i915/display/intel_display_power.c
@@ -1150,7 +1150,7 @@ static void icl_mbus_init(struct drm_i91
 	if (DISPLAY_VER(dev_priv) == 12)
 		abox_regs |= BIT(0);

-	for_each_set_bit(i, &abox_regs, sizeof(abox_regs))
+	for_each_set_bit(i, &abox_regs, BITS_PER_TYPE(abox_regs))
 		intel_de_rmw(dev_priv, MBUS_ABOX_CTL(i), mask, val);
 }

@@ -1603,11 +1603,11 @@ static void tgl_bw_buddy_init(struct drm
 	if (table[config].page_mask == 0) {
 		drm_dbg(&dev_priv->drm,
 			"Unknown memory configuration; disabling address buddy logic.\n");
-		for_each_set_bit(i, &abox_mask, sizeof(abox_mask))
+		for_each_set_bit(i, &abox_mask, BITS_PER_TYPE(abox_mask))
 			intel_de_write(dev_priv, BW_BUDDY_CTL(i),
 				       BW_BUDDY_DISABLE);
 	} else {
-		for_each_set_bit(i, &abox_mask, sizeof(abox_mask)) {
+		for_each_set_bit(i, &abox_mask, BITS_PER_TYPE(abox_mask)) {
 			intel_de_write(dev_priv, BW_BUDDY_PAGE_MASK(i),
 				       table[config].page_mask);
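The bug class is easy to demonstrate in isolation; a kernel-context sketch (not the display code) contrasting the two bounds:

	/*
	 * for_each_set_bit() takes its size argument in bits. With an
	 * unsigned long mask, sizeof() yields 8 on 64-bit builds, so
	 * only bits 0..7 would ever be visited; BITS_PER_TYPE() yields
	 * BITS_PER_LONG and covers the whole mask.
	 */
	#include <linux/bitops.h>
	#include <linux/bits.h>

	static unsigned int count_bits_sketch(unsigned long mask)
	{
		unsigned int bit, n = 0;

		for_each_set_bit(bit, &mask, sizeof(mask))	/* wrong: byte count */
			n++;					/* misses bits 8..63 */

		n = 0;
		for_each_set_bit(bit, &mask, BITS_PER_TYPE(mask)) /* right: bit count */
			n++;

		return n;
	}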
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
commit 7838fb5f119191403560eca2e23613380c0e425e upstream.
Commit b61badd20b44 ("drm/amdgpu: fix usage slab after free") reordered when amdgpu_fence_driver_sw_fini() was called. After that patch, amdgpu_fence_driver_sw_fini() effectively became a no-op, as the sched entities were never freed because the ring pointers were already set to NULL. Remove the NULL setting.
Reported-by: Lin.Cao lincao12@amd.com
Cc: Vitaly Prosyak vitaly.prosyak@amd.com
Cc: Christian König christian.koenig@amd.com
Fixes: b61badd20b44 ("drm/amdgpu: fix usage slab after free")
Reviewed-by: Christian König christian.koenig@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit a525fa37aac36c4591cc8b07ae8957862415fbd5)
Cc: stable@vger.kernel.org
[ Adapt to conditional check ]
Signed-off-by: Sasha Levin sashal@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 3 ---
 1 file changed, 3 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -400,9 +400,6 @@ void amdgpu_ring_fini(struct amdgpu_ring
 	dma_fence_put(ring->vmid_wait);
 	ring->vmid_wait = NULL;
 	ring->me = 0;
-
-	if (!ring->is_mes_queue)
-		ring->adev->rings[ring->idx] = NULL;
 }

 /**
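Schematically, the interaction the message describes looks like the model below (a hypothetical sketch, not amdgpu code): clearing the slots before the teardown walk turns the walk into a no-op.

	#include <stdlib.h>

	#define NUM_RINGS 4

	struct ring { void *payload; };

	/* The teardown walk skips NULL slots, so anything a cleared
	 * slot still owned is leaked. */
	static void fence_sw_fini_model(struct ring *rings[NUM_RINGS])
	{
		for (int i = 0; i < NUM_RINGS; i++) {
			if (!rings[i])
				continue;
			free(rings[i]->payload);
			rings[i]->payload = NULL;
		}
	}

	int main(void)
	{
		struct ring r = { .payload = malloc(16) };
		struct ring *rings[NUM_RINGS] = { &r };

		rings[0] = NULL;		/* slot cleared before teardown */
		fence_sw_fini_model(rings);	/* walk sees nothing: payload leaks */
		return 0;
	}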
-----Original Message-----
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
Sent: Wednesday, September 17, 2025 8:35 AM
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org; patches@lists.linux.dev; cao, lin lin.cao@amd.com; Prosyak, Vitaly Vitaly.Prosyak@amd.com; Koenig, Christian Christian.Koenig@amd.com; Deucher, Alexander Alexander.Deucher@amd.com; Sasha Levin sashal@kernel.org
Subject: [PATCH 6.12 140/140] drm/amdgpu: fix a memory leak in fence cleanup when unloading
> 6.12-stable review patch.  If anyone has any objections, please let me know.
>
> [...]
>
> Fixes: b61badd20b44 ("drm/amdgpu: fix usage slab after free")
Does 6.12 contain b61badd20b44 or a backport of it? If not, then this patch is not applicable.
Alex
On Wed, Sep 17, 2025 at 02:36:32PM +0000, Deucher, Alexander wrote:
> > [...]
>
> Does 6.12 contain b61badd20b44 or a backport of it?  If not, then this
> patch is not applicable.
Yes, it is in 6.12.4
thanks,
greg k-h
The kernel, bpf tool, perf tool, and kselftest build fine for v6.12.48-rc1 on x86 and arm64 Azure VMs.
Tested-by: Hardik Garg hargar@linux.microsoft.com
Thanks, Hardik
On Wed, 17 Sep 2025 14:32:52 +0200, Greg Kroah-Hartman wrote:
> [...]
All tests passing for Tegra ...
Test results for stable-v6.12:
    10 builds:	10 pass, 0 fail
    28 boots:	28 pass, 0 fail
    120 tests:	120 pass, 0 fail

Linux version:	6.12.48-rc1-g6281f90c1450
Boards tested:	tegra124-jetson-tk1, tegra186-p2771-0000,
		tegra186-p3509-0000+p3636-0001, tegra194-p2972-0000,
		tegra194-p3509-0000+p3668-0000, tegra20-ventana,
		tegra210-p2371-2180, tegra210-p3450-0000,
		tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon