This is the start of the stable review cycle for the 6.12.11 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 23 Jan 2025 17:45:02 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.11-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.12.11-rc1
Ryan Lee ryan.lee@canonical.com apparmor: allocate xmatch for nullpdb inside aa_alloc_null
Wayne Lin Wayne.Lin@amd.com drm/amd/display: Validate mdoe under MST LCT=1 case as well
Nicholas Susanto Nicholas.Susanto@amd.com Revert "drm/amd/display: Enable urgent latency adjustments for DCN35"
Leo Li sunpeng.li@amd.com drm/amd/display: Do not wait for PSR disable on vbl enable
Tom Chung chiahsuan.chung@amd.com drm/amd/display: Disable replay and psr while VRR is enabled
Tom Chung chiahsuan.chung@amd.com drm/amd/display: Fix PSR-SU not support but still call the amdgpu_dm_psr_enable
Christian König christian.koenig@amd.com drm/amdgpu: always sync the GFX pipe on ctx switch
Kenneth Feng kenneth.feng@amd.com drm/amdgpu: disable gfxoff with the compute workload on gfx12
Gui Chengming Jack.Gui@amd.com drm/amdgpu: fix fw attestation for MP0_14_0_{2/3}
Alex Deucher alexander.deucher@amd.com drm/amdgpu/smu13: update powersave optimizations
Ashutosh Dixit ashutosh.dixit@intel.com drm/xe/oa: Add missing VISACTL mux registers
Matthew Brost matthew.brost@intel.com drm/xe: Mark ComputeCS read mode as UC on iGPU
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915/fb: Relax clear color alignment to 64 bytes
Xin Li (Intel) xin@zytor.com x86/fred: Fix the FRED RSP0 MSR out of sync with its per-CPU cache
Frederic Weisbecker frederic@kernel.org timers/migration: Enforce group initialization visibility to tree walkers
Frederic Weisbecker frederic@kernel.org timers/migration: Fix another race between hotplug and idle entry/exit
Koichiro Den koichiro.den@canonical.com hrtimers: Handle CPU state correctly on hotplug
Tomas Krcka krckatom@amazon.de irqchip/gic-v3-its: Don't enable interrupts in its_irq_set_vcpu_affinity()
Yogesh Lal quic_ylal@quicinc.com irqchip/gic-v3: Handle CPU_PM_ENTER_FAILED correctly
Joe Hattori joe@pf.is.s.u-tokyo.ac.jp irqchip: Plug a OF node reference leak in platform_irqchip_probe()
Steven Rostedt rostedt@goodmis.org tracing: gfp: Fix the GFP enum values shown for user space tracing tools
Donet Tom donettom@linux.ibm.com mm: vmscan : pgdemote vmstat is not getting updated when MGLRU is enabled.
Ryan Roberts ryan.roberts@arm.com mm: clear uffd-wp PTE/PMD state on mremap()
Leo Li sunpeng.li@amd.com drm/amd/display: Do not elevate mem_type change to full update
Ryan Roberts ryan.roberts@arm.com selftests/mm: set allocated memory to non-zero content in cow test
Guo Weikang guoweikang.kernel@gmail.com mm/kmemleak: fix percpu memory leak detection failure
Xiaolei Wang xiaolei.wang@windriver.com pmdomain: imx8mp-blk-ctrl: add missing loop break condition
Suren Baghdasaryan surenb@google.com tools: fix atomic_set() definition to set the value correctly
Sean Anderson sean.anderson@linux.dev gpio: xilinx: Convert gpio_lock to raw spinlock
Rik van Riel riel@surriel.com fs/proc: fix softlockup in __read_vmcore (part 2)
Marco Nelissen marco.nelissen@gmail.com filemap: avoid truncating 64-bit offset to 32 bits
Paul Fertser fercerpav@gmail.com net/ncsi: fix locking in Get MAC Address handling
Takashi Iwai tiwai@suse.de drm/nouveau/disp: Fix missing backlight control on Macbook 5,1
Dave Airlie airlied@redhat.com nouveau/fence: handle cross device fences properly
Stefano Garzarella sgarzare@redhat.com vsock: prevent null-ptr-deref in vsock_*[has_data|has_space]
Stefano Garzarella sgarzare@redhat.com vsock: reset socket state when de-assigning the transport
Stefano Garzarella sgarzare@redhat.com vsock/virtio: cancel close work in the destructor
Stefano Garzarella sgarzare@redhat.com vsock/virtio: discard packets if the transport changes
Stefano Garzarella sgarzare@redhat.com vsock/bpf: return early if transport is not assigned
Heiner Kallweit hkallweit1@gmail.com net: ethernet: xgbe: re-add aneg to supported features in PHY quirks
Paolo Abeni pabeni@redhat.com selftests: mptcp: avoid spurious errors on disconnect
Paolo Abeni pabeni@redhat.com mptcp: fix spurious wake-up on under memory pressure
Paolo Abeni pabeni@redhat.com mptcp: be sure to send ack when mptcp-level window re-opens
Tomi Valkeinen tomi.valkeinen+renesas@ideasonboard.com i2c: atr: Fix client detach
Kairui Song kasong@tencent.com zram: fix potential UAF of zram table
Luke D. Jones luke@ljones.dev ALSA: hda/realtek: fixup ASUS H7606W
Luke D. Jones luke@ljones.dev ALSA: hda/realtek: fixup ASUS GA605W
Stefan Binding sbinding@opensource.cirrus.com ALSA: hda/realtek: Add support for Ayaneo System using CS35L41 HDA
Juergen Gross jgross@suse.com x86/asm: Make serialize() always_inline
Peter Zijlstra peterz@infradead.org sched/fair: Fix update_cfs_group() vs DELAY_DEQUEUE
Peter Zijlstra peterz@infradead.org sched/fair: Fix EEVDF entity placement bug causing scheduling lag
Luis Chamberlain mcgrof@kernel.org nvmet: propagate npwg topology
Tejun Heo tj@kernel.org sched_ext: Fix dsq_local_on selftest
Hongguang Gao hongguang.gao@broadcom.com RDMA/bnxt_re: Fix to export port num to ib_query_qp
David Vernet void@manifault.com scx: Fix maximal BPF selftest prog
Ihor Solodrai ihor.solodrai@pm.me selftests/sched_ext: fix build after renames in sched_ext API
Oleg Nesterov oleg@redhat.com poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll()
Lizhi Xu lizhi.xu@windriver.com afs: Fix merge preference rule failure condition
Marco Nelissen marco.nelissen@gmail.com iomap: avoid avoid truncating 64-bit offset to 32 bits
Henry Huang henry.hj@antgroup.com sched_ext: keep running prev when prev->scx.slice != 0
Hans de Goede hdegoede@redhat.com ACPI: resource: acpi_dev_irq_override(): Check DMI match last
Srinivas Pandruvada srinivas.pandruvada@linux.intel.com platform/x86: ISST: Add Clearwater Forest to support list
Srinivas Pandruvada srinivas.pandruvada@linux.intel.com platform/x86/intel: power-domains: Add Clearwater Forest support
Jakub Kicinski kuba@kernel.org selftests: tc-testing: reduce rshift value
Koichiro Den koichiro.den@canonical.com gpio: sim: lock up configfs that an instantiated device depends on
Koichiro Den koichiro.den@canonical.com gpio: virtuser: lock up configfs that an instantiated device depends on
Manivannan Sadhasivam manivannan.sadhasivam@linaro.org scsi: ufs: core: Honor runtime/system PM levels if set by host controller drivers
Max Kellermann max.kellermann@ionos.com cachefiles: Parse the "secctx" immediately
David Howells dhowells@redhat.com netfs: Fix non-contiguous donation between completed reads
David Howells dhowells@redhat.com kheaders: Ignore silly-rename files
Zhang Kunbo zhangkunbo@huawei.com fs: fix missing declaration of init_files
Brahmajit Das brahmajit.xyz@gmail.com fs/qnx6: Fix building with GCC 15
Leo Stone leocstone@gmail.com hfs: Sanity check the root record
Lizhi Xu lizhi.xu@windriver.com mac802154: check local interfaces before deleting sdata list
Paulo Alcantara pc@manguebit.com smb: client: fix double free of TCP_Server_Info::hostname
David Lechner dlechner@baylibre.com hwmon: (ltc2991) Fix mixed signed/unsigned in DIV_ROUND_CLOSEST
Wolfram Sang wsa+renesas@sang-engineering.com i2c: testunit: on errors, repeat NACK until STOP
Wolfram Sang wsa+renesas@sang-engineering.com i2c: rcar: fix NACK handling when being a target
Wolfram Sang wsa+renesas@sang-engineering.com i2c: mux: demux-pinctrl: check initial mux selection, too
Pratyush Yadav pratyush@kernel.org Revert "mtd: spi-nor: core: replace dummy buswidth from addr to data"
David Lechner dlechner@baylibre.com hwmon: (tmp513) Fix division of negative numbers
Chenyuan Yang chenyuan0y@gmail.com platform/x86: lenovo-yoga-tab2-pro-1380-fastcharger: fix serdev race
Chenyuan Yang chenyuan0y@gmail.com platform/x86: dell-uart-backlight: fix serdev race
Joe Hattori joe@pf.is.s.u-tokyo.ac.jp i2c: core: fix reference leak in i2c_register_adapter()
MD Danish Anwar danishanwar@ti.com soc: ti: pruss: Fix pruss APIs
Claudiu Beznea claudiu.beznea.uj@bp.renesas.com reset: rzg2l-usbphy-ctrl: Assign proper of node to the allocated device
Maíra Canal mcanal@igalia.com drm/v3d: Ensure job pointer is set to NULL after job completion
Ian Forbes ian.forbes@broadcom.com drm/vmwgfx: Add new keep_resv BO param
Ian Forbes ian.forbes@broadcom.com drm/vmwgfx: Unreserve BO on error
Yu-Chun Lin eleanor15x@gmail.com drm/tests: helpers: Fix compiler warning
Jakub Kicinski kuba@kernel.org netdev: avoid CFI problems with sock priv helpers
Leon Romanovsky leon@kernel.org net/mlx5e: Always start IPsec sequence number from 1
Leon Romanovsky leon@kernel.org net/mlx5e: Rely on reqid in IPsec tunnel mode
Leon Romanovsky leon@kernel.org net/mlx5e: Fix inversion dependency warning while enabling IPsec tunnel
Mark Zhang markzhang@nvidia.com net/mlx5: Clear port select structure when fail to create
Chris Mi cmi@nvidia.com net/mlx5: SF, Fix add port error handling
Yishai Hadas yishaih@nvidia.com net/mlx5: Fix a lockdep warning as part of the write combining test
Patrisious Haddad phaddad@nvidia.com net/mlx5: Fix RDMA TX steering prio
Pavel Begunkov asml.silence@gmail.com net: make page_pool_ref_netmem work with net iovs
Kevin Groeneveld kgroeneveld@lenbrook.com net: fec: handle page_pool_dev_alloc_pages error
Sean Anderson sean.anderson@linux.dev net: xilinx: axienet: Fix IRQ coalescing packet count overflow
Dan Carpenter dan.carpenter@linaro.org nfp: bpf: prevent integer overflow in nfp_bpf_event_output()
Viresh Kumar viresh.kumar@linaro.org cpufreq: Move endif to the end of Kconfig file
Kuniyuki Iwashima kuniyu@amazon.com pfcp: Destroy device along with udp socket's netns dismantle.
Kuniyuki Iwashima kuniyu@amazon.com gtp: Destroy device along with udp socket's netns dismantle.
Kuniyuki Iwashima kuniyu@amazon.com gtp: Use for_each_netdev_rcu() in gtp_genl_dump_pdp().
Qu Wenruo wqu@suse.com btrfs: add the missing error handling inside get_canonical_dev_path
Rafael J. Wysocki rafael.j.wysocki@intel.com cpuidle: teo: Update documentation after previous changes
Karol Kolacinski karol.kolacinski@intel.com ice: Add correct PHY lane assignment
Sergey Temerkhanov sergey.temerkhanov@intel.com ice: Use ice_adapter for PTP shared data instead of auxdev
Sergey Temerkhanov sergey.temerkhanov@intel.com ice: Add ice_get_ctrl_ptp() wrapper to simplify the code
Sergey Temerkhanov sergey.temerkhanov@intel.com ice: Introduce ice_get_phy_model() wrapper
Karol Kolacinski karol.kolacinski@intel.com ice: Fix ETH56G FC-FEC Rx offset value
Karol Kolacinski karol.kolacinski@intel.com ice: Fix quad registers read on E825
Karol Kolacinski karol.kolacinski@intel.com ice: Fix E825 initialization
Artem Chernyshev artem.chernyshev@red-soft.ru pktgen: Avoid out-of-bounds access in get_imix_entries
Ilya Maximets i.maximets@ovn.org openvswitch: fix lockup on tx to unregistering netdev with carrier
Paul Barker paul.barker.ct@bp.renesas.com net: ravb: Fix max TX frame size for RZ/V2M
Jakub Kicinski kuba@kernel.org eth: bnxt: always recalculate features after XDP clearing, fix null-deref
Michal Luczaj mhal@rbox.co bpf: Fix bpf_sk_select_reuseport() memory leak
Sudheer Kumar Doredla s-doredla@ti.com net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()
Ard Biesheuvel ardb@kernel.org efi/zboot: Limit compression options to GZIP and ZSTD
-------------
Diffstat:
Makefile | 4 +- arch/x86/include/asm/special_insns.h | 2 +- arch/x86/kernel/fred.c | 8 +- drivers/acpi/resource.c | 6 +- drivers/block/zram/zram_drv.c | 1 + drivers/cpufreq/Kconfig | 4 +- drivers/cpuidle/governors/teo.c | 91 +++---- drivers/firmware/efi/Kconfig | 4 - drivers/firmware/efi/libstub/Makefile.zboot | 18 +- drivers/gpio/gpio-sim.c | 48 +++- drivers/gpio/gpio-virtuser.c | 49 +++- drivers/gpio/gpio-xilinx.c | 32 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 +- drivers/gpu/drm/amd/amdgpu/amdgpu_fw_attestation.c | 4 + drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 4 +- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 41 ++- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c | 25 +- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c | 4 +- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.h | 2 +- .../drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c | 2 +- .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 14 +- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c | 35 ++- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.h | 3 +- .../gpu/drm/amd/display/dc/dml/dcn35/dcn35_fpu.c | 4 +- .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 11 +- drivers/gpu/drm/i915/display/intel_fb.c | 2 +- drivers/gpu/drm/nouveau/nouveau_fence.c | 6 +- drivers/gpu/drm/nouveau/nvkm/engine/disp/mcp77.c | 1 + drivers/gpu/drm/tests/drm_kunit_helpers.c | 3 +- drivers/gpu/drm/v3d/v3d_irq.c | 4 + drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 3 +- drivers/gpu/drm/vmwgfx/vmwgfx_bo.h | 3 +- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 7 +- drivers/gpu/drm/vmwgfx/vmwgfx_gem.c | 1 + drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 20 +- drivers/gpu/drm/vmwgfx/vmwgfx_shader.c | 7 +- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 5 +- drivers/gpu/drm/xe/xe_hw_engine.c | 2 +- drivers/gpu/drm/xe/xe_oa.c | 1 + drivers/hwmon/ltc2991.c | 2 +- drivers/hwmon/tmp513.c | 7 +- drivers/i2c/busses/i2c-rcar.c | 20 +- drivers/i2c/i2c-atr.c | 2 +- drivers/i2c/i2c-core-base.c | 1 + drivers/i2c/i2c-slave-testunit.c | 19 +- drivers/i2c/muxes/i2c-demux-pinctrl.c | 4 +- drivers/infiniband/hw/bnxt_re/ib_verbs.c | 1 + drivers/infiniband/hw/bnxt_re/ib_verbs.h | 4 + drivers/infiniband/hw/bnxt_re/qplib_fp.c | 1 + drivers/infiniband/hw/bnxt_re/qplib_fp.h | 1 + drivers/irqchip/irq-gic-v3-its.c | 2 +- drivers/irqchip/irq-gic-v3.c | 2 +- drivers/irqchip/irqchip.c | 4 +- drivers/mtd/spi-nor/core.c | 2 +- drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c | 19 +- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 25 +- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 +- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 7 - drivers/net/ethernet/freescale/fec_main.c | 19 +- drivers/net/ethernet/intel/ice/ice.h | 5 + drivers/net/ethernet/intel/ice/ice_adapter.c | 6 + drivers/net/ethernet/intel/ice/ice_adapter.h | 22 +- drivers/net/ethernet/intel/ice/ice_adminq_cmd.h | 1 + drivers/net/ethernet/intel/ice/ice_common.c | 51 ++++ drivers/net/ethernet/intel/ice/ice_common.h | 1 + drivers/net/ethernet/intel/ice/ice_main.c | 6 +- drivers/net/ethernet/intel/ice/ice_ptp.c | 165 +++++++----- drivers/net/ethernet/intel/ice/ice_ptp.h | 9 +- drivers/net/ethernet/intel/ice/ice_ptp_consts.h | 2 +- drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 285 +++++++++++---------- drivers/net/ethernet/intel/ice/ice_ptp_hw.h | 5 + drivers/net/ethernet/intel/ice/ice_type.h | 2 - .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 22 +- .../mellanox/mlx5/core/en_accel/ipsec_fs.c | 12 +- .../mellanox/mlx5/core/en_accel/ipsec_offload.c | 11 +- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 1 + .../net/ethernet/mellanox/mlx5/core/lag/port_sel.c | 4 +- .../net/ethernet/mellanox/mlx5/core/sf/devlink.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/wc.c | 24 +- drivers/net/ethernet/netronome/nfp/bpf/offload.c | 3 +- drivers/net/ethernet/renesas/ravb_main.c | 1 + drivers/net/ethernet/ti/cpsw_ale.c | 14 +- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 6 + drivers/net/gtp.c | 26 +- drivers/net/pfcp.c | 15 +- drivers/nvme/target/io-cmd-bdev.c | 2 +- drivers/platform/x86/dell/dell-uart-backlight.c | 5 +- .../x86/intel/speed_select_if/isst_if_common.c | 1 + drivers/platform/x86/intel/tpmi_power_domains.c | 1 + .../x86/lenovo-yoga-tab2-pro-1380-fastcharger.c | 5 +- drivers/pmdomain/imx/imx8mp-blk-ctrl.c | 2 +- drivers/reset/reset-rzg2l-usbphy-ctrl.c | 1 + drivers/ufs/core/ufshcd.c | 9 +- fs/afs/addr_prefs.c | 6 +- fs/btrfs/volumes.c | 4 + fs/cachefiles/daemon.c | 14 +- fs/cachefiles/internal.h | 3 +- fs/cachefiles/security.c | 6 +- fs/file.c | 1 + fs/hfs/super.c | 4 +- fs/iomap/buffered-io.c | 2 +- fs/netfs/read_collect.c | 9 +- fs/proc/vmcore.c | 2 + fs/qnx6/inode.c | 11 +- fs/smb/client/connect.c | 3 +- include/linux/hrtimer.h | 1 + include/linux/poll.h | 10 +- include/linux/pruss_driver.h | 12 +- include/linux/userfaultfd_k.h | 12 + include/net/page_pool/helpers.h | 2 +- include/trace/events/mmflags.h | 63 +++++ kernel/cpu.c | 2 +- kernel/gen_kheaders.sh | 1 + kernel/sched/ext.c | 11 +- kernel/sched/fair.c | 151 ++--------- kernel/time/hrtimer.c | 11 +- kernel/time/timer_migration.c | 43 +++- mm/filemap.c | 2 +- mm/huge_memory.c | 12 + mm/hugetlb.c | 14 +- mm/kmemleak.c | 2 +- mm/mremap.c | 32 ++- mm/vmscan.c | 3 + net/core/filter.c | 30 ++- net/core/netdev-genl-gen.c | 14 +- net/core/pktgen.c | 6 +- net/mac802154/iface.c | 4 + net/mptcp/options.c | 6 +- net/mptcp/protocol.h | 9 +- net/ncsi/internal.h | 2 + net/ncsi/ncsi-manage.c | 16 +- net/ncsi/ncsi-rsp.c | 19 +- net/openvswitch/actions.c | 4 +- net/vmw_vsock/af_vsock.c | 18 ++ net/vmw_vsock/virtio_transport_common.c | 38 ++- net/vmw_vsock/vsock_bpf.c | 9 + security/apparmor/policy.c | 1 + sound/pci/hda/patch_realtek.c | 3 + tools/net/ynl/ynl-gen-c.py | 16 +- tools/testing/selftests/mm/cow.c | 8 +- tools/testing/selftests/net/mptcp/mptcp_connect.c | 43 +++- .../selftests/sched_ext/ddsp_bogus_dsq_fail.bpf.c | 2 +- .../selftests/sched_ext/ddsp_vtimelocal_fail.bpf.c | 4 +- .../testing/selftests/sched_ext/dsp_local_on.bpf.c | 7 +- tools/testing/selftests/sched_ext/dsp_local_on.c | 5 +- .../selftests/sched_ext/enq_select_cpu_fails.bpf.c | 2 +- tools/testing/selftests/sched_ext/exit.bpf.c | 4 +- tools/testing/selftests/sched_ext/maximal.bpf.c | 8 +- .../selftests/sched_ext/select_cpu_dfl.bpf.c | 2 +- .../sched_ext/select_cpu_dfl_nodispatch.bpf.c | 2 +- .../selftests/sched_ext/select_cpu_dispatch.bpf.c | 2 +- .../sched_ext/select_cpu_dispatch_bad_dsq.bpf.c | 2 +- .../sched_ext/select_cpu_dispatch_dbl_dsp.bpf.c | 4 +- .../selftests/sched_ext/select_cpu_vtime.bpf.c | 8 +- .../tc-testing/tc-tests/filters/flow.json | 4 +- tools/testing/shared/linux/maple_tree.h | 2 +- tools/testing/vma/linux/atomic.h | 2 +- 157 files changed, 1345 insertions(+), 766 deletions(-)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ard Biesheuvel ardb@kernel.org
commit 0b2c29fb68f8bf3e87a9d88404aa6fdd486223e5 upstream.
For historical reasons, the legacy decompressor code on various architectures supports 7 different compression types for the compressed kernel image.
EFI zboot is not a compression library museum, and so the options can be limited to what is likely to be useful in practice:
- GZIP is tried and tested, and is still one of the fastest at decompression time, although the compression ratio is not very high; moreover, Fedora is already shipping EFI zboot kernels for arm64 that use GZIP, and QEMU implements direct support for it when booting a kernel without firmware loaded;
- ZSTD has a very high compression ratio (although not the highest), and is almost as fast as GZIP at decompression time.
Reducing the number of options makes it less of a hassle for other consumers of the EFI zboot format (such as QEMU today, and kexec in the future) to support it transparently without having to carry 7 different decompression libraries.
Acked-by: Gerd Hoffmann kraxel@redhat.com Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/efi/Kconfig | 4 ---- drivers/firmware/efi/libstub/Makefile.zboot | 18 ++++++------------ 2 files changed, 6 insertions(+), 16 deletions(-)
--- a/drivers/firmware/efi/Kconfig +++ b/drivers/firmware/efi/Kconfig @@ -76,10 +76,6 @@ config EFI_ZBOOT bool "Enable the generic EFI decompressor" depends on EFI_GENERIC_STUB && !ARM select HAVE_KERNEL_GZIP - select HAVE_KERNEL_LZ4 - select HAVE_KERNEL_LZMA - select HAVE_KERNEL_LZO - select HAVE_KERNEL_XZ select HAVE_KERNEL_ZSTD help Create the bootable image as an EFI application that carries the --- a/drivers/firmware/efi/libstub/Makefile.zboot +++ b/drivers/firmware/efi/libstub/Makefile.zboot @@ -12,22 +12,16 @@ quiet_cmd_copy_and_pad = PAD $@ $(obj)/vmlinux.bin: $(obj)/$(EFI_ZBOOT_PAYLOAD) FORCE $(call if_changed,copy_and_pad)
-comp-type-$(CONFIG_KERNEL_GZIP) := gzip -comp-type-$(CONFIG_KERNEL_LZ4) := lz4 -comp-type-$(CONFIG_KERNEL_LZMA) := lzma -comp-type-$(CONFIG_KERNEL_LZO) := lzo -comp-type-$(CONFIG_KERNEL_XZ) := xzkern -comp-type-$(CONFIG_KERNEL_ZSTD) := zstd22 - # in GZIP, the appended le32 carrying the uncompressed size is part of the # format, but in other cases, we just append it at the end for convenience, # causing the original tools to complain when checking image integrity. -# So disregard it when calculating the payload size in the zimage header. -zboot-method-y := $(comp-type-y)_with_size -zboot-size-len-y := 4 +comp-type-y := gzip +zboot-method-y := gzip +zboot-size-len-y := 0
-zboot-method-$(CONFIG_KERNEL_GZIP) := gzip -zboot-size-len-$(CONFIG_KERNEL_GZIP) := 0 +comp-type-$(CONFIG_KERNEL_ZSTD) := zstd +zboot-method-$(CONFIG_KERNEL_ZSTD) := zstd22_with_size +zboot-size-len-$(CONFIG_KERNEL_ZSTD) := 4
$(obj)/vmlinuz: $(obj)/vmlinux.bin FORCE $(call if_changed,$(zboot-method-y))
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sudheer Kumar Doredla s-doredla@ti.com
[ Upstream commit 03d120f27d050336f7e7d21879891542c4741f81 ]
CPSW ALE has 75-bit ALE entries stored across three 32-bit words. The cpsw_ale_get_field() and cpsw_ale_set_field() functions support ALE field entries spanning up to two words at the most.
The cpsw_ale_get_field() and cpsw_ale_set_field() functions work as expected when ALE field spanned across word1 and word2, but fails when ALE field spanned across word2 and word3.
For example, while reading the ALE field spanned across word2 and word3 (i.e. bits 62 to 64), the word3 data shifted to an incorrect position due to the index becoming zero while flipping. The same issue occurred when setting an ALE entry.
This issue has not been seen in practice but will be an issue in the future if the driver supports accessing ALE fields spanning word2 and word3
Fix the methods to handle getting/setting fields spanning up to two words.
Fixes: b685f1a58956 ("net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field()") Signed-off-by: Sudheer Kumar Doredla s-doredla@ti.com Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Roger Quadros rogerq@kernel.org Reviewed-by: Siddharth Vadapalli s-vadapalli@ti.com Link: https://patch.msgid.link/20250108172433.311694-1-s-doredla@ti.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ti/cpsw_ale.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c index 8d02d2b214293..dc5e247ca5d1a 100644 --- a/drivers/net/ethernet/ti/cpsw_ale.c +++ b/drivers/net/ethernet/ti/cpsw_ale.c @@ -127,15 +127,15 @@ struct cpsw_ale_dev_id {
static inline int cpsw_ale_get_field(u32 *ale_entry, u32 start, u32 bits) { - int idx, idx2; + int idx, idx2, index; u32 hi_val = 0;
idx = start / 32; idx2 = (start + bits - 1) / 32; /* Check if bits to be fetched exceed a word */ if (idx != idx2) { - idx2 = 2 - idx2; /* flip */ - hi_val = ale_entry[idx2] << ((idx2 * 32) - start); + index = 2 - idx2; /* flip */ + hi_val = ale_entry[index] << ((idx2 * 32) - start); } start -= idx * 32; idx = 2 - idx; /* flip */ @@ -145,16 +145,16 @@ static inline int cpsw_ale_get_field(u32 *ale_entry, u32 start, u32 bits) static inline void cpsw_ale_set_field(u32 *ale_entry, u32 start, u32 bits, u32 value) { - int idx, idx2; + int idx, idx2, index;
value &= BITMASK(bits); idx = start / 32; idx2 = (start + bits - 1) / 32; /* Check if bits to be set exceed a word */ if (idx != idx2) { - idx2 = 2 - idx2; /* flip */ - ale_entry[idx2] &= ~(BITMASK(bits + start - (idx2 * 32))); - ale_entry[idx2] |= (value >> ((idx2 * 32) - start)); + index = 2 - idx2; /* flip */ + ale_entry[index] &= ~(BITMASK(bits + start - (idx2 * 32))); + ale_entry[index] |= (value >> ((idx2 * 32) - start)); } start -= idx * 32; idx = 2 - idx; /* flip */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Luczaj mhal@rbox.co
[ Upstream commit b3af60928ab9129befa65e6df0310d27300942bf ]
As pointed out in the original comment, lookup in sockmap can return a TCP ESTABLISHED socket. Such TCP socket may have had SO_ATTACH_REUSEPORT_EBPF set before it was ESTABLISHED. In other words, a non-NULL sk_reuseport_cb does not imply a non-refcounted socket.
Drop sk's reference in both error paths.
unreferenced object 0xffff888101911800 (size 2048): comm "test_progs", pid 44109, jiffies 4297131437 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 80 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 9336483b): __kmalloc_noprof+0x3bf/0x560 __reuseport_alloc+0x1d/0x40 reuseport_alloc+0xca/0x150 reuseport_attach_prog+0x87/0x140 sk_reuseport_attach_bpf+0xc8/0x100 sk_setsockopt+0x1181/0x1990 do_sock_setsockopt+0x12b/0x160 __sys_setsockopt+0x7b/0xc0 __x64_sys_setsockopt+0x1b/0x30 do_syscall_64+0x93/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: 64d85290d79c ("bpf: Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH") Signed-off-by: Michal Luczaj mhal@rbox.co Reviewed-by: Martin KaFai Lau martin.lau@kernel.org Link: https://patch.msgid.link/20250110-reuseport-memleak-v1-1-fa1ddab0adfe@rbox.c... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/filter.c | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c index 54a53fae9e98f..46da488ff0703 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -11263,6 +11263,7 @@ BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern, bool is_sockarray = map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY; struct sock_reuseport *reuse; struct sock *selected_sk; + int err;
selected_sk = map->ops->map_lookup_elem(map, key); if (!selected_sk) @@ -11270,10 +11271,6 @@ BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern,
reuse = rcu_dereference(selected_sk->sk_reuseport_cb); if (!reuse) { - /* Lookup in sock_map can return TCP ESTABLISHED sockets. */ - if (sk_is_refcounted(selected_sk)) - sock_put(selected_sk); - /* reuseport_array has only sk with non NULL sk_reuseport_cb. * The only (!reuse) case here is - the sk has already been * unhashed (e.g. by close()), so treat it as -ENOENT. @@ -11281,24 +11278,33 @@ BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern, * Other maps (e.g. sock_map) do not provide this guarantee and * the sk may never be in the reuseport group to begin with. */ - return is_sockarray ? -ENOENT : -EINVAL; + err = is_sockarray ? -ENOENT : -EINVAL; + goto error; }
if (unlikely(reuse->reuseport_id != reuse_kern->reuseport_id)) { struct sock *sk = reuse_kern->sk;
- if (sk->sk_protocol != selected_sk->sk_protocol) - return -EPROTOTYPE; - else if (sk->sk_family != selected_sk->sk_family) - return -EAFNOSUPPORT; - - /* Catch all. Likely bound to a different sockaddr. */ - return -EBADFD; + if (sk->sk_protocol != selected_sk->sk_protocol) { + err = -EPROTOTYPE; + } else if (sk->sk_family != selected_sk->sk_family) { + err = -EAFNOSUPPORT; + } else { + /* Catch all. Likely bound to a different sockaddr. */ + err = -EBADFD; + } + goto error; }
reuse_kern->selected_sk = selected_sk;
return 0; +error: + /* Lookup in sock_map can return TCP ESTABLISHED sockets. */ + if (sk_is_refcounted(selected_sk)) + sock_put(selected_sk); + + return err; }
static const struct bpf_func_proto sk_select_reuseport_proto = {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit f0aa6a37a3dbb40b272df5fc6db93c114688adcd ]
Recalculate features when XDP is detached.
Before: # ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp # ip li set dev eth0 xdp off # ethtool -k eth0 | grep gro rx-gro-hw: off [requested on]
After: # ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp # ip li set dev eth0 xdp off # ethtool -k eth0 | grep gro rx-gro-hw: on
The fact that HW-GRO doesn't get re-enabled automatically is just a minor annoyance. The real issue is that the features will randomly come back during another reconfiguration which just happens to invoke netdev_update_features(). The driver doesn't handle reconfiguring two things at a time very robustly.
Starting with commit 98ba1d931f61 ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()") we only reconfigure the RSS hash table if the "effective" number of Rx rings has changed. If HW-GRO is enabled "effective" number of rings is 2x what user sees. So if we are in the bad state, with HW-GRO re-enablement "pending" after XDP off, and we lower the rings by / 2 - the HW-GRO rings doing 2x and the ethtool -L doing / 2 may cancel each other out, and the:
if (old_rx_rings != bp->hw_resc.resv_rx_rings &&
condition in __bnxt_reserve_rings() will be false. The RSS map won't get updated, and we'll crash with:
BUG: kernel NULL pointer dereference, address: 0000000000000168 RIP: 0010:__bnxt_hwrm_vnic_set_rss+0x13a/0x1a0 bnxt_hwrm_vnic_rss_cfg_p5+0x47/0x180 __bnxt_setup_vnic_p5+0x58/0x110 bnxt_init_nic+0xb72/0xf50 __bnxt_open_nic+0x40d/0xab0 bnxt_open_nic+0x2b/0x60 ethtool_set_channels+0x18c/0x1d0
As we try to access a freed ring.
The issue is present since XDP support was added, really, but prior to commit 98ba1d931f61 ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()") it wasn't causing major issues.
Fixes: 1054aee82321 ("bnxt_en: Use NETIF_F_GRO_HW.") Fixes: 98ba1d931f61 ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()") Reviewed-by: Michael Chan michael.chan@broadcom.com Reviewed-by: Somnath Kotur somnath.kotur@broadcom.com Link: https://patch.msgid.link/20250109043057.2888953-1-kuba@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 25 +++++++++++++++---- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 +- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 7 ------ 3 files changed, 21 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index c255445e97f3c..603e9c968c44b 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -4558,7 +4558,7 @@ void bnxt_set_ring_params(struct bnxt *bp) /* Changing allocation mode of RX rings. * TODO: Update when extending xdp_rxq_info to support allocation modes. */ -int bnxt_set_rx_skb_mode(struct bnxt *bp, bool page_mode) +static void __bnxt_set_rx_skb_mode(struct bnxt *bp, bool page_mode) { struct net_device *dev = bp->dev;
@@ -4579,15 +4579,30 @@ int bnxt_set_rx_skb_mode(struct bnxt *bp, bool page_mode) bp->rx_skb_func = bnxt_rx_page_skb; } bp->rx_dir = DMA_BIDIRECTIONAL; - /* Disable LRO or GRO_HW */ - netdev_update_features(dev); } else { dev->max_mtu = bp->max_mtu; bp->flags &= ~BNXT_FLAG_RX_PAGE_MODE; bp->rx_dir = DMA_FROM_DEVICE; bp->rx_skb_func = bnxt_rx_skb; } - return 0; +} + +void bnxt_set_rx_skb_mode(struct bnxt *bp, bool page_mode) +{ + __bnxt_set_rx_skb_mode(bp, page_mode); + + if (!page_mode) { + int rx, tx; + + bnxt_get_max_rings(bp, &rx, &tx, true); + if (rx > 1) { + bp->flags &= ~BNXT_FLAG_NO_AGG_RINGS; + bp->dev->hw_features |= NETIF_F_LRO; + } + } + + /* Update LRO and GRO_HW availability */ + netdev_update_features(bp->dev); }
static void bnxt_free_vnic_attributes(struct bnxt *bp) @@ -15909,7 +15924,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) if (bp->max_fltr < BNXT_MAX_FLTR) bp->max_fltr = BNXT_MAX_FLTR; bnxt_init_l2_fltr_tbl(bp); - bnxt_set_rx_skb_mode(bp, false); + __bnxt_set_rx_skb_mode(bp, false); bnxt_set_tpa_flags(bp); bnxt_set_ring_params(bp); bnxt_rdma_aux_device_init(bp); diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 9e05704d94450..bee645f58d0bd 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -2796,7 +2796,7 @@ void bnxt_reuse_rx_data(struct bnxt_rx_ring_info *rxr, u16 cons, void *data); u32 bnxt_fw_health_readl(struct bnxt *bp, int reg_idx); void bnxt_set_tpa_flags(struct bnxt *bp); void bnxt_set_ring_params(struct bnxt *); -int bnxt_set_rx_skb_mode(struct bnxt *bp, bool page_mode); +void bnxt_set_rx_skb_mode(struct bnxt *bp, bool page_mode); void bnxt_insert_usr_fltr(struct bnxt *bp, struct bnxt_filter_base *fltr); void bnxt_del_one_usr_fltr(struct bnxt *bp, struct bnxt_filter_base *fltr); int bnxt_hwrm_func_drv_rgtr(struct bnxt *bp, unsigned long *bmap, diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index f88b641533fcc..dc51dce209d5f 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -422,15 +422,8 @@ static int bnxt_xdp_set(struct bnxt *bp, struct bpf_prog *prog) bnxt_set_rx_skb_mode(bp, true); xdp_features_set_redirect_target(dev, true); } else { - int rx, tx; - xdp_features_clear_redirect_target(dev); bnxt_set_rx_skb_mode(bp, false); - bnxt_get_max_rings(bp, &rx, &tx, true); - if (rx > 1) { - bp->flags &= ~BNXT_FLAG_NO_AGG_RINGS; - bp->dev->hw_features |= NETIF_F_LRO; - } } bp->tx_nr_rings_xdp = tx_xdp; bp->tx_nr_rings = bp->tx_nr_rings_per_tc * tc + tx_xdp;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Barker paul.barker.ct@bp.renesas.com
[ Upstream commit e7e441a4100e4bc90b52f80494a28a9667993975 ]
When tx_max_frame_size was added to struct ravb_hw_info, no value was set in ravb_rzv2m_hw_info so the default value of zero was used.
The maximum MTU is set by subtracting from tx_max_frame_size to allow space for headers and frame checksums. As ndev->max_mtu is unsigned, this subtraction wraps around leading to a ridiculously large positive value that is obviously incorrect.
Before tx_max_frame_size was introduced, the maximum MTU was based on rx_max_frame_size. So, we can restore the correct maximum MTU by copying the rx_max_frame_size value into tx_max_frame_size for RZ/V2M.
Fixes: 1d63864299ca ("net: ravb: Fix maximum TX frame size for GbEth devices") Signed-off-by: Paul Barker paul.barker.ct@bp.renesas.com Reviewed-by: Niklas Söderlund niklas.soderlund+renesas@ragnatech.se Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://patch.msgid.link/20250109113706.1409149-1-paul.barker.ct@bp.renesas.... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/renesas/ravb_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index 907af4651c553..6f6b0566c65bc 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -2756,6 +2756,7 @@ static const struct ravb_hw_info ravb_rzv2m_hw_info = { .net_features = NETIF_F_RXCSUM, .stats_len = ARRAY_SIZE(ravb_gstrings_stats), .tccr_mask = TCCR_TSRQ0 | TCCR_TSRQ1 | TCCR_TSRQ2 | TCCR_TSRQ3, + .tx_max_frame_size = SZ_2K, .rx_max_frame_size = SZ_2K, .rx_buffer_size = SZ_2K + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilya Maximets i.maximets@ovn.org
[ Upstream commit 47e55e4b410f7d552e43011baa5be1aab4093990 ]
Commit in a fixes tag attempted to fix the issue in the following sequence of calls:
do_output -> ovs_vport_send -> dev_queue_xmit -> __dev_queue_xmit -> netdev_core_pick_tx -> skb_tx_hash
When device is unregistering, the 'dev->real_num_tx_queues' goes to zero and the 'while (unlikely(hash >= qcount))' loop inside the 'skb_tx_hash' becomes infinite, locking up the core forever.
But unfortunately, checking just the carrier status is not enough to fix the issue, because some devices may still be in unregistering state while reporting carrier status OK.
One example of such device is a net/dummy. It sets carrier ON on start, but it doesn't implement .ndo_stop to set the carrier off. And it makes sense, because dummy doesn't really have a carrier. Therefore, while this device is unregistering, it's still easy to hit the infinite loop in the skb_tx_hash() from the OVS datapath. There might be other drivers that do the same, but dummy by itself is important for the OVS ecosystem, because it is frequently used as a packet sink for tcpdump while debugging OVS deployments. And when the issue is hit, the only way to recover is to reboot.
Fix that by also checking if the device is running. The running state is handled by the net core during unregistering, so it covers unregistering case better, and we don't really need to send packets to devices that are not running anyway.
While only checking the running state might be enough, the carrier check is preserved. The running and the carrier states seem disjoined throughout the code and different drivers. And other core functions like __dev_direct_xmit() check both before attempting to transmit a packet. So, it seems safer to check both flags in OVS as well.
Fixes: 066b86787fa3 ("net: openvswitch: fix race on port output") Reported-by: Friedrich Weber f.weber@proxmox.com Closes: https://mail.openvswitch.org/pipermail/ovs-discuss/2025-January/053423.html Signed-off-by: Ilya Maximets i.maximets@ovn.org Tested-by: Friedrich Weber f.weber@proxmox.com Reviewed-by: Aaron Conole aconole@redhat.com Link: https://patch.msgid.link/20250109122225.4034688-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/openvswitch/actions.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 16e2600146844..704c858cf2093 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -934,7 +934,9 @@ static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port, { struct vport *vport = ovs_vport_rcu(dp, out_port);
- if (likely(vport && netif_carrier_ok(vport->dev))) { + if (likely(vport && + netif_running(vport->dev) && + netif_carrier_ok(vport->dev))) { u16 mru = OVS_CB(skb)->mru; u32 cutlen = OVS_CB(skb)->cutlen;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Artem Chernyshev artem.chernyshev@red-soft.ru
[ Upstream commit 76201b5979768500bca362871db66d77cb4c225e ]
Passing a sufficient amount of imix entries leads to invalid access to the pkt_dev->imix_entries array because of the incorrect boundary check.
UBSAN: array-index-out-of-bounds in net/core/pktgen.c:874:24 index 20 is out of range for type 'imix_pkt [20]' CPU: 2 PID: 1210 Comm: bash Not tainted 6.10.0-rc1 #121 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Call Trace: <TASK> dump_stack_lvl lib/dump_stack.c:117 __ubsan_handle_out_of_bounds lib/ubsan.c:429 get_imix_entries net/core/pktgen.c:874 pktgen_if_write net/core/pktgen.c:1063 pde_write fs/proc/inode.c:334 proc_reg_write fs/proc/inode.c:346 vfs_write fs/read_write.c:593 ksys_write fs/read_write.c:644 do_syscall_64 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe arch/x86/entry/entry_64.S:130
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 52a62f8603f9 ("pktgen: Parse internet mix (imix) input") Signed-off-by: Artem Chernyshev artem.chernyshev@red-soft.ru [ fp: allow to fill the array completely; minor changelog cleanup ] Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/pktgen.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/core/pktgen.c b/net/core/pktgen.c index 34f68ef74b8f2..b6db4910359bb 100644 --- a/net/core/pktgen.c +++ b/net/core/pktgen.c @@ -851,6 +851,9 @@ static ssize_t get_imix_entries(const char __user *buffer, unsigned long weight; unsigned long size;
+ if (pkt_dev->n_imix_entries >= MAX_IMIX_ENTRIES) + return -E2BIG; + len = num_arg(&buffer[i], max_digits, &size); if (len < 0) return len; @@ -880,9 +883,6 @@ static ssize_t get_imix_entries(const char __user *buffer,
i++; pkt_dev->n_imix_entries++; - - if (pkt_dev->n_imix_entries > MAX_IMIX_ENTRIES) - return -E2BIG; } while (c == ' ');
return i;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Karol Kolacinski karol.kolacinski@intel.com
[ Upstream commit d79c304c76e9b30ff5527afc176b5c4f9f0374b6 ]
Current implementation checks revision of all PHYs on all PFs, which is incorrect and may result in initialization failure. Check only the revision of the current PHY.
Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products") Reviewed-by: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com Signed-off-by: Karol Kolacinski karol.kolacinski@intel.com Signed-off-by: Grzegorz Nitka grzegorz.nitka@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 22 +++++++++------------ 1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c index 3816e45b6ab44..f6816c2f71438 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c +++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c @@ -2664,14 +2664,15 @@ static bool ice_is_muxed_topo(struct ice_hw *hw) }
/** - * ice_ptp_init_phy_e825c - initialize PHY parameters + * ice_ptp_init_phy_e825 - initialize PHY parameters * @hw: pointer to the HW struct */ -static void ice_ptp_init_phy_e825c(struct ice_hw *hw) +static void ice_ptp_init_phy_e825(struct ice_hw *hw) { struct ice_ptp_hw *ptp = &hw->ptp; struct ice_eth56g_params *params; - u8 phy; + u32 phy_rev; + int err;
ptp->phy_model = ICE_PHY_ETH56G; params = &ptp->phy.eth56g; @@ -2685,15 +2686,10 @@ static void ice_ptp_init_phy_e825c(struct ice_hw *hw) ptp->num_lports = params->num_phys * ptp->ports_per_phy;
ice_sb_access_ena_eth56g(hw, true); - for (phy = 0; phy < params->num_phys; phy++) { - u32 phy_rev; - int err; - - err = ice_read_phy_eth56g(hw, phy, PHY_REG_REVISION, &phy_rev); - if (err || phy_rev != PHY_REVISION_ETH56G) { - ptp->phy_model = ICE_PHY_UNSUP; - return; - } + err = ice_read_phy_eth56g(hw, hw->pf_id, PHY_REG_REVISION, &phy_rev); + if (err || phy_rev != PHY_REVISION_ETH56G) { + ptp->phy_model = ICE_PHY_UNSUP; + return; }
ptp->is_2x50g_muxed_topo = ice_is_muxed_topo(hw); @@ -5395,7 +5391,7 @@ void ice_ptp_init_hw(struct ice_hw *hw) else if (ice_is_e810(hw)) ice_ptp_init_phy_e810(ptp); else if (ice_is_e825c(hw)) - ice_ptp_init_phy_e825c(hw); + ice_ptp_init_phy_e825(hw); else ptp->phy_model = ICE_PHY_UNSUP; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Karol Kolacinski karol.kolacinski@intel.com
[ Upstream commit dc26548d729e5f732197d2b210fb77c745b01495 ]
Quad registers are read/written incorrectly. E825 devices always use quad 0 address and differentiate between the PHYs by changing SBQ destination device (phy_0 or phy_0_peer).
Add helpers for reading/writing PTP registers shared per quad and use correct quad address and SBQ destination device based on port.
Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products") Reviewed-by: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com Signed-off-by: Karol Kolacinski karol.kolacinski@intel.com Signed-off-by: Grzegorz Nitka grzegorz.nitka@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 219 ++++++++++++-------- drivers/net/ethernet/intel/ice/ice_type.h | 1 - 2 files changed, 133 insertions(+), 87 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c index f6816c2f71438..64c36620a02f3 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c +++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c @@ -877,31 +877,46 @@ static void ice_ptp_exec_tmr_cmd(struct ice_hw *hw) * The following functions operate on devices with the ETH 56G PHY. */
+/** + * ice_ptp_get_dest_dev_e825 - get destination PHY for given port number + * @hw: pointer to the HW struct + * @port: destination port + * + * Return: destination sideband queue PHY device. + */ +static enum ice_sbq_msg_dev ice_ptp_get_dest_dev_e825(struct ice_hw *hw, + u8 port) +{ + /* On a single complex E825, PHY 0 is always destination device phy_0 + * and PHY 1 is phy_0_peer. + */ + if (port >= hw->ptp.ports_per_phy) + return eth56g_phy_1; + else + return eth56g_phy_0; +} + /** * ice_write_phy_eth56g - Write a PHY port register * @hw: pointer to the HW struct - * @phy_idx: PHY index + * @port: destination port * @addr: PHY register address * @val: Value to write * * Return: 0 on success, other error codes when failed to write to PHY */ -static int ice_write_phy_eth56g(struct ice_hw *hw, u8 phy_idx, u32 addr, - u32 val) +static int ice_write_phy_eth56g(struct ice_hw *hw, u8 port, u32 addr, u32 val) { - struct ice_sbq_msg_input phy_msg; + struct ice_sbq_msg_input msg = { + .dest_dev = ice_ptp_get_dest_dev_e825(hw, port), + .opcode = ice_sbq_msg_wr, + .msg_addr_low = lower_16_bits(addr), + .msg_addr_high = upper_16_bits(addr), + .data = val + }; int err;
- phy_msg.opcode = ice_sbq_msg_wr; - - phy_msg.msg_addr_low = lower_16_bits(addr); - phy_msg.msg_addr_high = upper_16_bits(addr); - - phy_msg.data = val; - phy_msg.dest_dev = hw->ptp.phy.eth56g.phy_addr[phy_idx]; - - err = ice_sbq_rw_reg(hw, &phy_msg, ICE_AQ_FLAG_RD); - + err = ice_sbq_rw_reg(hw, &msg, ICE_AQ_FLAG_RD); if (err) ice_debug(hw, ICE_DBG_PTP, "PTP failed to send msg to phy %d\n", err); @@ -912,41 +927,36 @@ static int ice_write_phy_eth56g(struct ice_hw *hw, u8 phy_idx, u32 addr, /** * ice_read_phy_eth56g - Read a PHY port register * @hw: pointer to the HW struct - * @phy_idx: PHY index + * @port: destination port * @addr: PHY register address * @val: Value to write * * Return: 0 on success, other error codes when failed to read from PHY */ -static int ice_read_phy_eth56g(struct ice_hw *hw, u8 phy_idx, u32 addr, - u32 *val) +static int ice_read_phy_eth56g(struct ice_hw *hw, u8 port, u32 addr, u32 *val) { - struct ice_sbq_msg_input phy_msg; + struct ice_sbq_msg_input msg = { + .dest_dev = ice_ptp_get_dest_dev_e825(hw, port), + .opcode = ice_sbq_msg_rd, + .msg_addr_low = lower_16_bits(addr), + .msg_addr_high = upper_16_bits(addr) + }; int err;
- phy_msg.opcode = ice_sbq_msg_rd; - - phy_msg.msg_addr_low = lower_16_bits(addr); - phy_msg.msg_addr_high = upper_16_bits(addr); - - phy_msg.data = 0; - phy_msg.dest_dev = hw->ptp.phy.eth56g.phy_addr[phy_idx]; - - err = ice_sbq_rw_reg(hw, &phy_msg, ICE_AQ_FLAG_RD); - if (err) { + err = ice_sbq_rw_reg(hw, &msg, ICE_AQ_FLAG_RD); + if (err) ice_debug(hw, ICE_DBG_PTP, "PTP failed to send msg to phy %d\n", err); - return err; - } - - *val = phy_msg.data; + else + *val = msg.data;
- return 0; + return err; }
/** * ice_phy_res_address_eth56g - Calculate a PHY port register address - * @port: Port number to be written + * @hw: pointer to the HW struct + * @lane: Lane number to be written * @res_type: resource type (register/memory) * @offset: Offset from PHY port register base * @addr: The result address @@ -955,17 +965,19 @@ static int ice_read_phy_eth56g(struct ice_hw *hw, u8 phy_idx, u32 addr, * * %0 - success * * %EINVAL - invalid port number or resource type */ -static int ice_phy_res_address_eth56g(u8 port, enum eth56g_res_type res_type, - u32 offset, u32 *addr) +static int ice_phy_res_address_eth56g(struct ice_hw *hw, u8 lane, + enum eth56g_res_type res_type, + u32 offset, + u32 *addr) { - u8 lane = port % ICE_PORTS_PER_QUAD; - u8 phy = ICE_GET_QUAD_NUM(port); - if (res_type >= NUM_ETH56G_PHY_RES) return -EINVAL;
- *addr = eth56g_phy_res[res_type].base[phy] + + /* Lanes 4..7 are in fact 0..3 on a second PHY */ + lane %= hw->ptp.ports_per_phy; + *addr = eth56g_phy_res[res_type].base[0] + lane * eth56g_phy_res[res_type].step + offset; + return 0; }
@@ -985,19 +997,17 @@ static int ice_phy_res_address_eth56g(u8 port, enum eth56g_res_type res_type, static int ice_write_port_eth56g(struct ice_hw *hw, u8 port, u32 offset, u32 val, enum eth56g_res_type res_type) { - u8 phy_port = port % hw->ptp.ports_per_phy; - u8 phy_idx = port / hw->ptp.ports_per_phy; u32 addr; int err;
if (port >= hw->ptp.num_lports) return -EINVAL;
- err = ice_phy_res_address_eth56g(phy_port, res_type, offset, &addr); + err = ice_phy_res_address_eth56g(hw, port, res_type, offset, &addr); if (err) return err;
- return ice_write_phy_eth56g(hw, phy_idx, addr, val); + return ice_write_phy_eth56g(hw, port, addr, val); }
/** @@ -1016,19 +1026,17 @@ static int ice_write_port_eth56g(struct ice_hw *hw, u8 port, u32 offset, static int ice_read_port_eth56g(struct ice_hw *hw, u8 port, u32 offset, u32 *val, enum eth56g_res_type res_type) { - u8 phy_port = port % hw->ptp.ports_per_phy; - u8 phy_idx = port / hw->ptp.ports_per_phy; u32 addr; int err;
if (port >= hw->ptp.num_lports) return -EINVAL;
- err = ice_phy_res_address_eth56g(phy_port, res_type, offset, &addr); + err = ice_phy_res_address_eth56g(hw, port, res_type, offset, &addr); if (err) return err;
- return ice_read_phy_eth56g(hw, phy_idx, addr, val); + return ice_read_phy_eth56g(hw, port, addr, val); }
/** @@ -1177,6 +1185,56 @@ static int ice_write_port_mem_eth56g(struct ice_hw *hw, u8 port, u16 offset, return ice_write_port_eth56g(hw, port, offset, val, ETH56G_PHY_MEM_PTP); }
+/** + * ice_write_quad_ptp_reg_eth56g - Write a PHY quad register + * @hw: pointer to the HW struct + * @offset: PHY register offset + * @port: Port number + * @val: Value to write + * + * Return: + * * %0 - success + * * %EIO - invalid port number or resource type + * * %other - failed to write to PHY + */ +static int ice_write_quad_ptp_reg_eth56g(struct ice_hw *hw, u8 port, + u32 offset, u32 val) +{ + u32 addr; + + if (port >= hw->ptp.num_lports) + return -EIO; + + addr = eth56g_phy_res[ETH56G_PHY_REG_PTP].base[0] + offset; + + return ice_write_phy_eth56g(hw, port, addr, val); +} + +/** + * ice_read_quad_ptp_reg_eth56g - Read a PHY quad register + * @hw: pointer to the HW struct + * @offset: PHY register offset + * @port: Port number + * @val: Value to read + * + * Return: + * * %0 - success + * * %EIO - invalid port number or resource type + * * %other - failed to read from PHY + */ +static int ice_read_quad_ptp_reg_eth56g(struct ice_hw *hw, u8 port, + u32 offset, u32 *val) +{ + u32 addr; + + if (port >= hw->ptp.num_lports) + return -EIO; + + addr = eth56g_phy_res[ETH56G_PHY_REG_PTP].base[0] + offset; + + return ice_read_phy_eth56g(hw, port, addr, val); +} + /** * ice_is_64b_phy_reg_eth56g - Check if this is a 64bit PHY register * @low_addr: the low address to check @@ -1896,7 +1954,6 @@ ice_phy_get_speed_eth56g(struct ice_link_status *li) */ static int ice_phy_cfg_parpcs_eth56g(struct ice_hw *hw, u8 port) { - u8 port_blk = port & ~(ICE_PORTS_PER_QUAD - 1); u32 val; int err;
@@ -1911,8 +1968,8 @@ static int ice_phy_cfg_parpcs_eth56g(struct ice_hw *hw, u8 port) switch (ice_phy_get_speed_eth56g(&hw->port_info->phy.link_info)) { case ICE_ETH56G_LNK_SPD_1G: case ICE_ETH56G_LNK_SPD_2_5G: - err = ice_read_ptp_reg_eth56g(hw, port_blk, - PHY_GPCS_CONFIG_REG0, &val); + err = ice_read_quad_ptp_reg_eth56g(hw, port, + PHY_GPCS_CONFIG_REG0, &val); if (err) { ice_debug(hw, ICE_DBG_PTP, "Failed to read PHY_GPCS_CONFIG_REG0, status: %d", err); @@ -1923,8 +1980,8 @@ static int ice_phy_cfg_parpcs_eth56g(struct ice_hw *hw, u8 port) val |= FIELD_PREP(PHY_GPCS_CONFIG_REG0_TX_THR_M, ICE_ETH56G_NOMINAL_TX_THRESH);
- err = ice_write_ptp_reg_eth56g(hw, port_blk, - PHY_GPCS_CONFIG_REG0, val); + err = ice_write_quad_ptp_reg_eth56g(hw, port, + PHY_GPCS_CONFIG_REG0, val); if (err) { ice_debug(hw, ICE_DBG_PTP, "Failed to write PHY_GPCS_CONFIG_REG0, status: %d", err); @@ -1965,50 +2022,47 @@ static int ice_phy_cfg_parpcs_eth56g(struct ice_hw *hw, u8 port) */ int ice_phy_cfg_ptp_1step_eth56g(struct ice_hw *hw, u8 port) { - u8 port_blk = port & ~(ICE_PORTS_PER_QUAD - 1); - u8 blk_port = port & (ICE_PORTS_PER_QUAD - 1); + u8 quad_lane = port % ICE_PORTS_PER_QUAD; + u32 addr, val, peer_delay; bool enable, sfd_ena; - u32 val, peer_delay; int err;
enable = hw->ptp.phy.eth56g.onestep_ena; peer_delay = hw->ptp.phy.eth56g.peer_delay; sfd_ena = hw->ptp.phy.eth56g.sfd_ena;
- /* PHY_PTP_1STEP_CONFIG */ - err = ice_read_ptp_reg_eth56g(hw, port_blk, PHY_PTP_1STEP_CONFIG, &val); + addr = PHY_PTP_1STEP_CONFIG; + err = ice_read_quad_ptp_reg_eth56g(hw, port, addr, &val); if (err) return err;
if (enable) - val |= blk_port; + val |= BIT(quad_lane); else - val &= ~blk_port; + val &= ~BIT(quad_lane);
val &= ~(PHY_PTP_1STEP_T1S_UP64_M | PHY_PTP_1STEP_T1S_DELTA_M);
- err = ice_write_ptp_reg_eth56g(hw, port_blk, PHY_PTP_1STEP_CONFIG, val); + err = ice_write_quad_ptp_reg_eth56g(hw, port, addr, val); if (err) return err;
- /* PHY_PTP_1STEP_PEER_DELAY */ + addr = PHY_PTP_1STEP_PEER_DELAY(quad_lane); val = FIELD_PREP(PHY_PTP_1STEP_PD_DELAY_M, peer_delay); if (peer_delay) val |= PHY_PTP_1STEP_PD_ADD_PD_M; val |= PHY_PTP_1STEP_PD_DLY_V_M; - err = ice_write_ptp_reg_eth56g(hw, port_blk, - PHY_PTP_1STEP_PEER_DELAY(blk_port), val); + err = ice_write_quad_ptp_reg_eth56g(hw, port, addr, val); if (err) return err;
val &= ~PHY_PTP_1STEP_PD_DLY_V_M; - err = ice_write_ptp_reg_eth56g(hw, port_blk, - PHY_PTP_1STEP_PEER_DELAY(blk_port), val); + err = ice_write_quad_ptp_reg_eth56g(hw, port, addr, val); if (err) return err;
- /* PHY_MAC_XIF_MODE */ - err = ice_read_mac_reg_eth56g(hw, port, PHY_MAC_XIF_MODE, &val); + addr = PHY_MAC_XIF_MODE; + err = ice_read_mac_reg_eth56g(hw, port, addr, &val); if (err) return err;
@@ -2028,7 +2082,7 @@ int ice_phy_cfg_ptp_1step_eth56g(struct ice_hw *hw, u8 port) FIELD_PREP(PHY_MAC_XIF_TS_BIN_MODE_M, enable) | FIELD_PREP(PHY_MAC_XIF_TS_SFD_ENA_M, sfd_ena);
- return ice_write_mac_reg_eth56g(hw, port, PHY_MAC_XIF_MODE, val); + return ice_write_mac_reg_eth56g(hw, port, addr, val); }
/** @@ -2070,21 +2124,22 @@ static u32 ice_ptp_calc_bitslip_eth56g(struct ice_hw *hw, u8 port, u32 bs, bool fc, bool rs, enum ice_eth56g_link_spd spd) { - u8 port_offset = port & (ICE_PORTS_PER_QUAD - 1); - u8 port_blk = port & ~(ICE_PORTS_PER_QUAD - 1); u32 bitslip; int err;
if (!bs || rs) return 0;
- if (spd == ICE_ETH56G_LNK_SPD_1G || spd == ICE_ETH56G_LNK_SPD_2_5G) + if (spd == ICE_ETH56G_LNK_SPD_1G || spd == ICE_ETH56G_LNK_SPD_2_5G) { err = ice_read_gpcs_reg_eth56g(hw, port, PHY_GPCS_BITSLIP, &bitslip); - else - err = ice_read_ptp_reg_eth56g(hw, port_blk, - PHY_REG_SD_BIT_SLIP(port_offset), - &bitslip); + } else { + u8 quad_lane = port % ICE_PORTS_PER_QUAD; + u32 addr; + + addr = PHY_REG_SD_BIT_SLIP(quad_lane); + err = ice_read_quad_ptp_reg_eth56g(hw, port, addr, &bitslip); + } if (err) return 0;
@@ -2679,8 +2734,6 @@ static void ice_ptp_init_phy_e825(struct ice_hw *hw) params->onestep_ena = false; params->peer_delay = 0; params->sfd_ena = false; - params->phy_addr[0] = eth56g_phy_0; - params->phy_addr[1] = eth56g_phy_1; params->num_phys = 2; ptp->ports_per_phy = 4; ptp->num_lports = params->num_phys * ptp->ports_per_phy; @@ -2711,10 +2764,9 @@ static void ice_fill_phy_msg_e82x(struct ice_hw *hw, struct ice_sbq_msg_input *msg, u8 port, u16 offset) { - int phy_port, phy, quadtype; + int phy_port, quadtype;
phy_port = port % hw->ptp.ports_per_phy; - phy = port / hw->ptp.ports_per_phy; quadtype = ICE_GET_QUAD_NUM(port) % ICE_GET_QUAD_NUM(hw->ptp.ports_per_phy);
@@ -2726,12 +2778,7 @@ static void ice_fill_phy_msg_e82x(struct ice_hw *hw, msg->msg_addr_high = P_Q1_H(P_4_BASE + offset, phy_port); }
- if (phy == 0) - msg->dest_dev = rmn_0; - else if (phy == 1) - msg->dest_dev = rmn_1; - else - msg->dest_dev = rmn_2; + msg->dest_dev = rmn_0; }
/** diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h index 45768796691fe..479227bdff75e 100644 --- a/drivers/net/ethernet/intel/ice/ice_type.h +++ b/drivers/net/ethernet/intel/ice/ice_type.h @@ -850,7 +850,6 @@ struct ice_mbx_data {
struct ice_eth56g_params { u8 num_phys; - u8 phy_addr[2]; bool onestep_ena; bool sfd_ena; u32 peer_delay;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Karol Kolacinski karol.kolacinski@intel.com
[ Upstream commit 2e60560f1ec9b722f9c6699ec5d966f1732d14dd ]
Fix ETH56G FC-FEC incorrect Rx offset value by changing it from -255.96 to -469.26 ns.
Those values are derived from HW spec and reflect internal delays. Hex value is a fixed point representation in Q23.9 format.
Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products") Reviewed-by: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com Signed-off-by: Karol Kolacinski karol.kolacinski@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_ptp_consts.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_consts.h b/drivers/net/ethernet/intel/ice/ice_ptp_consts.h index 3005dd252a102..bdb1020147d1c 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp_consts.h +++ b/drivers/net/ethernet/intel/ice/ice_ptp_consts.h @@ -131,7 +131,7 @@ struct ice_eth56g_mac_reg_cfg eth56g_mac_cfg[NUM_ICE_ETH56G_LNK_SPD] = { .rx_offset = { .serdes = 0xffffeb27, /* -10.42424 */ .no_fec = 0xffffcccd, /* -25.6 */ - .fc = 0xfffe0014, /* -255.96 */ + .fc = 0xfffc557b, /* -469.26 */ .sfd = 0x4a4, /* 2.32 */ .bs_ds = 0x32 /* 0.0969697 */ }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sergey Temerkhanov sergey.temerkhanov@intel.com
[ Upstream commit 5e0776451d89eefe66b19e010e48ece1cca07e58 ]
Introduce ice_get_phy_model() to improve code readability
Signed-off-by: Sergey Temerkhanov sergey.temerkhanov@intel.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Simon Horman horms@kernel.org Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Stable-dep-of: 258f5f905815 ("ice: Add correct PHY lane assignment") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice.h | 5 +++++ drivers/net/ethernet/intel/ice/ice_ptp.c | 19 +++++++++--------- drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 22 ++++++++++----------- 3 files changed, 25 insertions(+), 21 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index d6f80da30decf..558cda577191d 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -1047,5 +1047,10 @@ static inline void ice_clear_rdma_cap(struct ice_pf *pf) clear_bit(ICE_FLAG_RDMA_ENA, pf->flags); }
+static inline enum ice_phy_model ice_get_phy_model(const struct ice_hw *hw) +{ + return hw->ptp.phy_model; +} + extern const struct xdp_metadata_ops ice_xdp_md_ops; #endif /* _ICE_H_ */ diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c index ef2e858f49bb0..d534c8e571cee 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp.c +++ b/drivers/net/ethernet/intel/ice/ice_ptp.c @@ -1363,7 +1363,7 @@ ice_ptp_port_phy_stop(struct ice_ptp_port *ptp_port)
mutex_lock(&ptp_port->ps_lock);
- switch (hw->ptp.phy_model) { + switch (ice_get_phy_model(hw)) { case ICE_PHY_ETH56G: err = ice_stop_phy_timer_eth56g(hw, port, true); break; @@ -1409,7 +1409,7 @@ ice_ptp_port_phy_restart(struct ice_ptp_port *ptp_port)
mutex_lock(&ptp_port->ps_lock);
- switch (hw->ptp.phy_model) { + switch (ice_get_phy_model(hw)) { case ICE_PHY_ETH56G: err = ice_start_phy_timer_eth56g(hw, port); break; @@ -1480,8 +1480,7 @@ void ice_ptp_link_change(struct ice_pf *pf, u8 port, bool linkup) /* Skip HW writes if reset is in progress */ if (pf->hw.reset_ongoing) return; - - switch (hw->ptp.phy_model) { + switch (ice_get_phy_model(hw)) { case ICE_PHY_E810: /* Do not reconfigure E810 PHY */ return; @@ -1514,7 +1513,7 @@ static int ice_ptp_cfg_phy_interrupt(struct ice_pf *pf, bool ena, u32 threshold)
ice_ptp_reset_ts_memory(hw);
- switch (hw->ptp.phy_model) { + switch (ice_get_phy_model(hw)) { case ICE_PHY_ETH56G: { int port;
@@ -1553,7 +1552,7 @@ static int ice_ptp_cfg_phy_interrupt(struct ice_pf *pf, bool ena, u32 threshold) case ICE_PHY_UNSUP: default: dev_warn(dev, "%s: Unexpected PHY model %d\n", __func__, - hw->ptp.phy_model); + ice_get_phy_model(hw)); return -EOPNOTSUPP; } } @@ -2059,7 +2058,7 @@ ice_ptp_settime64(struct ptp_clock_info *info, const struct timespec64 *ts) /* For Vernier mode on E82X, we need to recalibrate after new settime. * Start with marking timestamps as invalid. */ - if (hw->ptp.phy_model == ICE_PHY_E82X) { + if (ice_get_phy_model(hw) == ICE_PHY_E82X) { err = ice_ptp_clear_phy_offset_ready_e82x(hw); if (err) dev_warn(ice_pf_to_dev(pf), "Failed to mark timestamps as invalid before settime\n"); @@ -2083,7 +2082,7 @@ ice_ptp_settime64(struct ptp_clock_info *info, const struct timespec64 *ts) ice_ptp_enable_all_clkout(pf);
/* Recalibrate and re-enable timestamp blocks for E822/E823 */ - if (hw->ptp.phy_model == ICE_PHY_E82X) + if (ice_get_phy_model(hw) == ICE_PHY_E82X) ice_ptp_restart_all_phy(pf); exit: if (err) { @@ -3209,7 +3208,7 @@ static int ice_ptp_init_port(struct ice_pf *pf, struct ice_ptp_port *ptp_port)
mutex_init(&ptp_port->ps_lock);
- switch (hw->ptp.phy_model) { + switch (ice_get_phy_model(hw)) { case ICE_PHY_ETH56G: return ice_ptp_init_tx_eth56g(pf, &ptp_port->tx, ptp_port->port_num); @@ -3307,7 +3306,7 @@ static void ice_ptp_remove_auxbus_device(struct ice_pf *pf) */ static void ice_ptp_init_tx_interrupt_mode(struct ice_pf *pf) { - switch (pf->hw.ptp.phy_model) { + switch (ice_get_phy_model(&pf->hw)) { case ICE_PHY_E82X: /* E822 based PHY has the clock owner process the interrupt * for all ports. diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c index 64c36620a02f3..28bd4815ebcf3 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c +++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c @@ -804,7 +804,7 @@ static u32 ice_ptp_tmr_cmd_to_port_reg(struct ice_hw *hw, /* Certain hardware families share the same register values for the * port register and source timer register. */ - switch (hw->ptp.phy_model) { + switch (ice_get_phy_model(hw)) { case ICE_PHY_E810: return ice_ptp_tmr_cmd_to_src_reg(hw, cmd) & TS_CMD_MASK_E810; default: @@ -5461,7 +5461,7 @@ void ice_ptp_init_hw(struct ice_hw *hw) static int ice_ptp_write_port_cmd(struct ice_hw *hw, u8 port, enum ice_ptp_tmr_cmd cmd) { - switch (hw->ptp.phy_model) { + switch (ice_get_phy_model(hw)) { case ICE_PHY_ETH56G: return ice_ptp_write_port_cmd_eth56g(hw, port, cmd); case ICE_PHY_E82X: @@ -5526,7 +5526,7 @@ static int ice_ptp_port_cmd(struct ice_hw *hw, enum ice_ptp_tmr_cmd cmd) u32 port;
/* PHY models which can program all ports simultaneously */ - switch (hw->ptp.phy_model) { + switch (ice_get_phy_model(hw)) { case ICE_PHY_E810: return ice_ptp_port_cmd_e810(hw, cmd); default: @@ -5605,7 +5605,7 @@ int ice_ptp_init_time(struct ice_hw *hw, u64 time)
/* PHY timers */ /* Fill Rx and Tx ports and send msg to PHY */ - switch (hw->ptp.phy_model) { + switch (ice_get_phy_model(hw)) { case ICE_PHY_ETH56G: err = ice_ptp_prep_phy_time_eth56g(hw, (u32)(time & 0xFFFFFFFF)); @@ -5651,7 +5651,7 @@ int ice_ptp_write_incval(struct ice_hw *hw, u64 incval) wr32(hw, GLTSYN_SHADJ_L(tmr_idx), lower_32_bits(incval)); wr32(hw, GLTSYN_SHADJ_H(tmr_idx), upper_32_bits(incval));
- switch (hw->ptp.phy_model) { + switch (ice_get_phy_model(hw)) { case ICE_PHY_ETH56G: err = ice_ptp_prep_phy_incval_eth56g(hw, incval); break; @@ -5720,7 +5720,7 @@ int ice_ptp_adj_clock(struct ice_hw *hw, s32 adj) wr32(hw, GLTSYN_SHADJ_L(tmr_idx), 0); wr32(hw, GLTSYN_SHADJ_H(tmr_idx), adj);
- switch (hw->ptp.phy_model) { + switch (ice_get_phy_model(hw)) { case ICE_PHY_ETH56G: err = ice_ptp_prep_phy_adj_eth56g(hw, adj); break; @@ -5753,7 +5753,7 @@ int ice_ptp_adj_clock(struct ice_hw *hw, s32 adj) */ int ice_read_phy_tstamp(struct ice_hw *hw, u8 block, u8 idx, u64 *tstamp) { - switch (hw->ptp.phy_model) { + switch (ice_get_phy_model(hw)) { case ICE_PHY_ETH56G: return ice_read_ptp_tstamp_eth56g(hw, block, idx, tstamp); case ICE_PHY_E810: @@ -5783,7 +5783,7 @@ int ice_read_phy_tstamp(struct ice_hw *hw, u8 block, u8 idx, u64 *tstamp) */ int ice_clear_phy_tstamp(struct ice_hw *hw, u8 block, u8 idx) { - switch (hw->ptp.phy_model) { + switch (ice_get_phy_model(hw)) { case ICE_PHY_ETH56G: return ice_clear_ptp_tstamp_eth56g(hw, block, idx); case ICE_PHY_E810: @@ -5846,7 +5846,7 @@ static int ice_get_pf_c827_idx(struct ice_hw *hw, u8 *idx) */ void ice_ptp_reset_ts_memory(struct ice_hw *hw) { - switch (hw->ptp.phy_model) { + switch (ice_get_phy_model(hw)) { case ICE_PHY_ETH56G: ice_ptp_reset_ts_memory_eth56g(hw); break; @@ -5875,7 +5875,7 @@ int ice_ptp_init_phc(struct ice_hw *hw) /* Clear event err indications for auxiliary pins */ (void)rd32(hw, GLTSYN_STAT(src_idx));
- switch (hw->ptp.phy_model) { + switch (ice_get_phy_model(hw)) { case ICE_PHY_ETH56G: return ice_ptp_init_phc_eth56g(hw); case ICE_PHY_E810: @@ -5900,7 +5900,7 @@ int ice_ptp_init_phc(struct ice_hw *hw) */ int ice_get_phy_tx_tstamp_ready(struct ice_hw *hw, u8 block, u64 *tstamp_ready) { - switch (hw->ptp.phy_model) { + switch (ice_get_phy_model(hw)) { case ICE_PHY_ETH56G: return ice_get_phy_tx_tstamp_ready_eth56g(hw, block, tstamp_ready);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sergey Temerkhanov sergey.temerkhanov@intel.com
[ Upstream commit 97ed20a01f5b96e8738b53f56ae84b06953a2853 ]
Add ice_get_ctrl_ptp() wrapper to simplify the PTP support code in the functions that do not use ctrl_pf directly. Add the control PF pointer to struct ice_adapter Rearrange fields in struct ice_adapter
Signed-off-by: Sergey Temerkhanov sergey.temerkhanov@intel.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Simon Horman horms@kernel.org Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Stable-dep-of: 258f5f905815 ("ice: Add correct PHY lane assignment") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_adapter.h | 5 ++++- drivers/net/ethernet/intel/ice/ice_ptp.c | 12 ++++++++++++ 2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_adapter.h b/drivers/net/ethernet/intel/ice/ice_adapter.h index 9d11014ec02ff..eb7cac01c2427 100644 --- a/drivers/net/ethernet/intel/ice/ice_adapter.h +++ b/drivers/net/ethernet/intel/ice/ice_adapter.h @@ -8,18 +8,21 @@ #include <linux/refcount_types.h>
struct pci_dev; +struct ice_pf;
/** * struct ice_adapter - PCI adapter resources shared across PFs * @ptp_gltsyn_time_lock: Spinlock protecting access to the GLTSYN_TIME * register of the PTP clock. * @refcount: Reference count. struct ice_pf objects hold the references. + * @ctrl_pf: Control PF of the adapter */ struct ice_adapter { + refcount_t refcount; /* For access to the GLTSYN_TIME register */ spinlock_t ptp_gltsyn_time_lock;
- refcount_t refcount; + struct ice_pf *ctrl_pf; };
struct ice_adapter *ice_adapter_get(const struct pci_dev *pdev); diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c index d534c8e571cee..2fb73ab41bc20 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp.c +++ b/drivers/net/ethernet/intel/ice/ice_ptp.c @@ -16,6 +16,18 @@ static const struct ptp_pin_desc ice_pin_desc_e810t[] = { { "U.FL2", UFL2, PTP_PF_NONE, 2, { 0, } }, };
+static struct ice_pf *ice_get_ctrl_pf(struct ice_pf *pf) +{ + return !pf->adapter ? NULL : pf->adapter->ctrl_pf; +} + +static __maybe_unused struct ice_ptp *ice_get_ctrl_ptp(struct ice_pf *pf) +{ + struct ice_pf *ctrl_pf = ice_get_ctrl_pf(pf); + + return !ctrl_pf ? NULL : &ctrl_pf->ptp; +} + /** * ice_get_sma_config_e810t * @hw: pointer to the hw struct
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sergey Temerkhanov sergey.temerkhanov@intel.com
[ Upstream commit e800654e85b5b27966fc6493201f5f8cf658beb6 ]
Use struct ice_adapter to hold shared PTP data and control PTP related actions instead of auxbus. This allows significant code simplification and faster access to the container fields used in the PTP support code.
Move the PTP port list to the ice_adapter container to simplify the code and avoid race conditions which could occur due to the synchronous nature of the initialization/access and certain memory saving can be achieved by moving PTP data into the ice_adapter itself.
Signed-off-by: Sergey Temerkhanov sergey.temerkhanov@intel.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Simon Horman horms@kernel.org Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Stable-dep-of: 258f5f905815 ("ice: Add correct PHY lane assignment") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_adapter.c | 6 + drivers/net/ethernet/intel/ice/ice_adapter.h | 17 +++ drivers/net/ethernet/intel/ice/ice_ptp.c | 115 ++++++++++++------- drivers/net/ethernet/intel/ice/ice_ptp.h | 5 +- drivers/net/ethernet/intel/ice/ice_ptp_hw.h | 5 + 5 files changed, 105 insertions(+), 43 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_adapter.c b/drivers/net/ethernet/intel/ice/ice_adapter.c index ad84d8ad49a63..f3e195974a8ef 100644 --- a/drivers/net/ethernet/intel/ice/ice_adapter.c +++ b/drivers/net/ethernet/intel/ice/ice_adapter.c @@ -40,11 +40,17 @@ static struct ice_adapter *ice_adapter_new(void) spin_lock_init(&adapter->ptp_gltsyn_time_lock); refcount_set(&adapter->refcount, 1);
+ mutex_init(&adapter->ports.lock); + INIT_LIST_HEAD(&adapter->ports.ports); + return adapter; }
static void ice_adapter_free(struct ice_adapter *adapter) { + WARN_ON(!list_empty(&adapter->ports.ports)); + mutex_destroy(&adapter->ports.lock); + kfree(adapter); }
diff --git a/drivers/net/ethernet/intel/ice/ice_adapter.h b/drivers/net/ethernet/intel/ice/ice_adapter.h index eb7cac01c2427..e233225848b38 100644 --- a/drivers/net/ethernet/intel/ice/ice_adapter.h +++ b/drivers/net/ethernet/intel/ice/ice_adapter.h @@ -4,18 +4,34 @@ #ifndef _ICE_ADAPTER_H_ #define _ICE_ADAPTER_H_
+#include <linux/types.h> #include <linux/spinlock_types.h> #include <linux/refcount_types.h>
struct pci_dev; struct ice_pf;
+/** + * struct ice_port_list - data used to store the list of adapter ports + * + * This structure contains data used to maintain a list of adapter ports + * + * @ports: list of ports + * @lock: protect access to the ports list + */ +struct ice_port_list { + struct list_head ports; + /* To synchronize the ports list operations */ + struct mutex lock; +}; + /** * struct ice_adapter - PCI adapter resources shared across PFs * @ptp_gltsyn_time_lock: Spinlock protecting access to the GLTSYN_TIME * register of the PTP clock. * @refcount: Reference count. struct ice_pf objects hold the references. * @ctrl_pf: Control PF of the adapter + * @ports: Ports list */ struct ice_adapter { refcount_t refcount; @@ -23,6 +39,7 @@ struct ice_adapter { spinlock_t ptp_gltsyn_time_lock;
struct ice_pf *ctrl_pf; + struct ice_port_list ports; };
struct ice_adapter *ice_adapter_get(const struct pci_dev *pdev); diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c index 2fb73ab41bc20..0b3dd87a331c7 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp.c +++ b/drivers/net/ethernet/intel/ice/ice_ptp.c @@ -21,7 +21,7 @@ static struct ice_pf *ice_get_ctrl_pf(struct ice_pf *pf) return !pf->adapter ? NULL : pf->adapter->ctrl_pf; }
-static __maybe_unused struct ice_ptp *ice_get_ctrl_ptp(struct ice_pf *pf) +static struct ice_ptp *ice_get_ctrl_ptp(struct ice_pf *pf) { struct ice_pf *ctrl_pf = ice_get_ctrl_pf(pf);
@@ -812,8 +812,8 @@ static enum ice_tx_tstamp_work ice_ptp_tx_tstamp_owner(struct ice_pf *pf) struct ice_ptp_port *port; unsigned int i;
- mutex_lock(&pf->ptp.ports_owner.lock); - list_for_each_entry(port, &pf->ptp.ports_owner.ports, list_member) { + mutex_lock(&pf->adapter->ports.lock); + list_for_each_entry(port, &pf->adapter->ports.ports, list_node) { struct ice_ptp_tx *tx = &port->tx;
if (!tx || !tx->init) @@ -821,7 +821,7 @@ static enum ice_tx_tstamp_work ice_ptp_tx_tstamp_owner(struct ice_pf *pf)
ice_ptp_process_tx_tstamp(tx); } - mutex_unlock(&pf->ptp.ports_owner.lock); + mutex_unlock(&pf->adapter->ports.lock);
for (i = 0; i < ICE_GET_QUAD_NUM(pf->hw.ptp.num_lports); i++) { u64 tstamp_ready; @@ -986,7 +986,7 @@ ice_ptp_flush_all_tx_tracker(struct ice_pf *pf) { struct ice_ptp_port *port;
- list_for_each_entry(port, &pf->ptp.ports_owner.ports, list_member) + list_for_each_entry(port, &pf->adapter->ports.ports, list_node) ice_ptp_flush_tx_tracker(ptp_port_to_pf(port), &port->tx); }
@@ -1586,10 +1586,10 @@ static void ice_ptp_restart_all_phy(struct ice_pf *pf) { struct list_head *entry;
- list_for_each(entry, &pf->ptp.ports_owner.ports) { + list_for_each(entry, &pf->adapter->ports.ports) { struct ice_ptp_port *port = list_entry(entry, struct ice_ptp_port, - list_member); + list_node);
if (port->link_up) ice_ptp_port_phy_restart(port); @@ -2906,6 +2906,50 @@ void ice_ptp_rebuild(struct ice_pf *pf, enum ice_reset_req reset_type) dev_err(ice_pf_to_dev(pf), "PTP reset failed %d\n", err); }
+static bool ice_is_primary(struct ice_hw *hw) +{ + return ice_is_e825c(hw) && ice_is_dual(hw) ? + !!(hw->dev_caps.nac_topo.mode & ICE_NAC_TOPO_PRIMARY_M) : true; +} + +static int ice_ptp_setup_adapter(struct ice_pf *pf) +{ + if (!ice_pf_src_tmr_owned(pf) || !ice_is_primary(&pf->hw)) + return -EPERM; + + pf->adapter->ctrl_pf = pf; + + return 0; +} + +static int ice_ptp_setup_pf(struct ice_pf *pf) +{ + struct ice_ptp *ctrl_ptp = ice_get_ctrl_ptp(pf); + struct ice_ptp *ptp = &pf->ptp; + + if (WARN_ON(!ctrl_ptp) || ice_get_phy_model(&pf->hw) == ICE_PHY_UNSUP) + return -ENODEV; + + INIT_LIST_HEAD(&ptp->port.list_node); + mutex_lock(&pf->adapter->ports.lock); + + list_add(&ptp->port.list_node, + &pf->adapter->ports.ports); + mutex_unlock(&pf->adapter->ports.lock); + + return 0; +} + +static void ice_ptp_cleanup_pf(struct ice_pf *pf) +{ + struct ice_ptp *ptp = &pf->ptp; + + if (ice_get_phy_model(&pf->hw) != ICE_PHY_UNSUP) { + mutex_lock(&pf->adapter->ports.lock); + list_del(&ptp->port.list_node); + mutex_unlock(&pf->adapter->ports.lock); + } +} /** * ice_ptp_aux_dev_to_aux_pf - Get auxiliary PF handle for the auxiliary device * @aux_dev: auxiliary device to get the auxiliary PF for @@ -2957,9 +3001,9 @@ static int ice_ptp_auxbus_probe(struct auxiliary_device *aux_dev, if (WARN_ON(!owner_pf)) return -ENODEV;
- INIT_LIST_HEAD(&aux_pf->ptp.port.list_member); + INIT_LIST_HEAD(&aux_pf->ptp.port.list_node); mutex_lock(&owner_pf->ptp.ports_owner.lock); - list_add(&aux_pf->ptp.port.list_member, + list_add(&aux_pf->ptp.port.list_node, &owner_pf->ptp.ports_owner.ports); mutex_unlock(&owner_pf->ptp.ports_owner.lock);
@@ -2976,7 +3020,7 @@ static void ice_ptp_auxbus_remove(struct auxiliary_device *aux_dev) struct ice_pf *aux_pf = ice_ptp_aux_dev_to_aux_pf(aux_dev);
mutex_lock(&owner_pf->ptp.ports_owner.lock); - list_del(&aux_pf->ptp.port.list_member); + list_del(&aux_pf->ptp.port.list_node); mutex_unlock(&owner_pf->ptp.ports_owner.lock); }
@@ -3036,7 +3080,7 @@ ice_ptp_auxbus_create_id_table(struct ice_pf *pf, const char *name) * ice_ptp_register_auxbus_driver - Register PTP auxiliary bus driver * @pf: Board private structure */ -static int ice_ptp_register_auxbus_driver(struct ice_pf *pf) +static int __always_unused ice_ptp_register_auxbus_driver(struct ice_pf *pf) { struct auxiliary_driver *aux_driver; struct ice_ptp *ptp; @@ -3079,7 +3123,7 @@ static int ice_ptp_register_auxbus_driver(struct ice_pf *pf) * ice_ptp_unregister_auxbus_driver - Unregister PTP auxiliary bus driver * @pf: Board private structure */ -static void ice_ptp_unregister_auxbus_driver(struct ice_pf *pf) +static void __always_unused ice_ptp_unregister_auxbus_driver(struct ice_pf *pf) { struct auxiliary_driver *aux_driver = &pf->ptp.ports_owner.aux_driver;
@@ -3098,15 +3142,12 @@ static void ice_ptp_unregister_auxbus_driver(struct ice_pf *pf) */ int ice_ptp_clock_index(struct ice_pf *pf) { - struct auxiliary_device *aux_dev; - struct ice_pf *owner_pf; + struct ice_ptp *ctrl_ptp = ice_get_ctrl_ptp(pf); struct ptp_clock *clock;
- aux_dev = &pf->ptp.port.aux_dev; - owner_pf = ice_ptp_aux_dev_to_owner_pf(aux_dev); - if (!owner_pf) + if (!ctrl_ptp) return -1; - clock = owner_pf->ptp.clock; + clock = ctrl_ptp->clock;
return clock ? ptp_clock_index(clock) : -1; } @@ -3166,15 +3207,7 @@ static int ice_ptp_init_owner(struct ice_pf *pf) if (err) goto err_clk;
- err = ice_ptp_register_auxbus_driver(pf); - if (err) { - dev_err(ice_pf_to_dev(pf), "Failed to register PTP auxbus driver"); - goto err_aux; - } - return 0; -err_aux: - ptp_clock_unregister(pf->ptp.clock); err_clk: pf->ptp.clock = NULL; err_exit: @@ -3250,7 +3283,7 @@ static void ice_ptp_release_auxbus_device(struct device *dev) * ice_ptp_create_auxbus_device - Create PTP auxiliary bus device * @pf: Board private structure */ -static int ice_ptp_create_auxbus_device(struct ice_pf *pf) +static __always_unused int ice_ptp_create_auxbus_device(struct ice_pf *pf) { struct auxiliary_device *aux_dev; struct ice_ptp *ptp; @@ -3297,7 +3330,7 @@ static int ice_ptp_create_auxbus_device(struct ice_pf *pf) * ice_ptp_remove_auxbus_device - Remove PTP auxiliary bus device * @pf: Board private structure */ -static void ice_ptp_remove_auxbus_device(struct ice_pf *pf) +static __always_unused void ice_ptp_remove_auxbus_device(struct ice_pf *pf) { struct auxiliary_device *aux_dev = &pf->ptp.port.aux_dev;
@@ -3361,19 +3394,26 @@ void ice_ptp_init(struct ice_pf *pf) /* If this function owns the clock hardware, it must allocate and * configure the PTP clock device to represent it. */ - if (ice_pf_src_tmr_owned(pf)) { + if (ice_pf_src_tmr_owned(pf) && ice_is_primary(hw)) { + err = ice_ptp_setup_adapter(pf); + if (err) + goto err_exit; err = ice_ptp_init_owner(pf); if (err) - goto err; + goto err_exit; }
+ err = ice_ptp_setup_pf(pf); + if (err) + goto err_exit; + ptp->port.port_num = hw->pf_id; if (ice_is_e825c(hw) && hw->ptp.is_2x50g_muxed_topo) ptp->port.port_num = hw->pf_id * 2;
err = ice_ptp_init_port(pf, &ptp->port); if (err) - goto err; + goto err_exit;
/* Start the PHY timestamping block */ ice_ptp_reset_phy_timestamping(pf); @@ -3381,20 +3421,16 @@ void ice_ptp_init(struct ice_pf *pf) /* Configure initial Tx interrupt settings */ ice_ptp_cfg_tx_interrupt(pf);
- err = ice_ptp_create_auxbus_device(pf); - if (err) - goto err; - ptp->state = ICE_PTP_READY;
err = ice_ptp_init_work(pf, ptp); if (err) - goto err; + goto err_exit;
dev_info(ice_pf_to_dev(pf), "PTP init successful\n"); return;
-err: +err_exit: /* If we registered a PTP clock, release it */ if (pf->ptp.clock) { ptp_clock_unregister(ptp->clock); @@ -3421,7 +3457,7 @@ void ice_ptp_release(struct ice_pf *pf) /* Disable timestamping for both Tx and Rx */ ice_ptp_disable_timestamp_mode(pf);
- ice_ptp_remove_auxbus_device(pf); + ice_ptp_cleanup_pf(pf);
ice_ptp_release_tx_tracker(pf, &pf->ptp.port.tx);
@@ -3436,9 +3472,6 @@ void ice_ptp_release(struct ice_pf *pf) pf->ptp.kworker = NULL; }
- if (ice_pf_src_tmr_owned(pf)) - ice_ptp_unregister_auxbus_driver(pf); - if (!pf->ptp.clock) return;
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.h b/drivers/net/ethernet/intel/ice/ice_ptp.h index 2db2257a0fb2f..5615efc42c624 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp.h +++ b/drivers/net/ethernet/intel/ice/ice_ptp.h @@ -169,7 +169,7 @@ struct ice_ptp_tx { * ready for PTP functionality. It is used to track the port initialization * and determine when the port's PHY offset is valid. * - * @list_member: list member structure of auxiliary device + * @list_node: list member structure * @tx: Tx timestamp tracking for this port * @aux_dev: auxiliary device associated with this port * @ov_work: delayed work task for tracking when PHY offset is valid @@ -179,7 +179,7 @@ struct ice_ptp_tx { * @port_num: the port number this structure represents */ struct ice_ptp_port { - struct list_head list_member; + struct list_head list_node; struct ice_ptp_tx tx; struct auxiliary_device aux_dev; struct kthread_delayed_work ov_work; @@ -205,6 +205,7 @@ enum ice_ptp_tx_interrupt { * @ports: list of porst handled by this port owner * @lock: protect access to ports list */ + struct ice_ptp_port_owner { struct auxiliary_driver aux_driver; struct list_head ports; diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h index 4c8b845713442..3499062218b59 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h +++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h @@ -452,6 +452,11 @@ static inline u64 ice_get_base_incval(struct ice_hw *hw) } }
+static inline bool ice_is_dual(struct ice_hw *hw) +{ + return !!(hw->dev_caps.nac_topo.mode & ICE_NAC_TOPO_DUAL_M); +} + #define PFTSYN_SEM_BYTES 4
#define ICE_PTP_CLOCK_INDEX_0 0x00
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Karol Kolacinski karol.kolacinski@intel.com
[ Upstream commit 258f5f905815979f15d5151d2ea4f20d8e057fe1 ]
Driver always naively assumes, that for PTP purposes, PHY lane to configure is corresponding to PF ID.
This is not true for some port configurations, e.g.: - 2x50G per quad, where lanes used are 0 and 2 on each quad, but PF IDs are 0 and 1 - 100G per quad on 2 quads, where lanes used are 0 and 4, but PF IDs are 0 and 1
Use correct PHY lane assignment by getting and parsing port options. This is read from the NVM by the FW and provided to the driver with the indication of active port split.
Remove ice_is_muxed_topo(), which is no longer needed.
Fixes: 4409ea1726cb ("ice: Adjust PTP init for 2x50G E825C devices") Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Arkadiusz Kubalewski Arkadiusz.kubalewski@intel.com Signed-off-by: Karol Kolacinski karol.kolacinski@intel.com Signed-off-by: Grzegorz Nitka grzegorz.nitka@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/intel/ice/ice_adminq_cmd.h | 1 + drivers/net/ethernet/intel/ice/ice_common.c | 51 +++++++++++++++++++ drivers/net/ethernet/intel/ice/ice_common.h | 1 + drivers/net/ethernet/intel/ice/ice_main.c | 6 +-- drivers/net/ethernet/intel/ice/ice_ptp.c | 23 ++++----- drivers/net/ethernet/intel/ice/ice_ptp.h | 4 +- drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 26 +--------- drivers/net/ethernet/intel/ice/ice_type.h | 1 - 8 files changed, 68 insertions(+), 45 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h index 79a6edd0be0ec..80f3dfd271243 100644 --- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h +++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h @@ -1648,6 +1648,7 @@ struct ice_aqc_get_port_options_elem { #define ICE_AQC_PORT_OPT_MAX_LANE_25G 5 #define ICE_AQC_PORT_OPT_MAX_LANE_50G 6 #define ICE_AQC_PORT_OPT_MAX_LANE_100G 7 +#define ICE_AQC_PORT_OPT_MAX_LANE_200G 8
u8 global_scid[2]; u8 phy_scid[2]; diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c index f1324e25b2af1..068a467de1d56 100644 --- a/drivers/net/ethernet/intel/ice/ice_common.c +++ b/drivers/net/ethernet/intel/ice/ice_common.c @@ -4074,6 +4074,57 @@ ice_aq_set_port_option(struct ice_hw *hw, u8 lport, u8 lport_valid, return ice_aq_send_cmd(hw, &desc, NULL, 0, NULL); }
+/** + * ice_get_phy_lane_number - Get PHY lane number for current adapter + * @hw: pointer to the hw struct + * + * Return: PHY lane number on success, negative error code otherwise. + */ +int ice_get_phy_lane_number(struct ice_hw *hw) +{ + struct ice_aqc_get_port_options_elem *options; + unsigned int lport = 0; + unsigned int lane; + int err; + + options = kcalloc(ICE_AQC_PORT_OPT_MAX, sizeof(*options), GFP_KERNEL); + if (!options) + return -ENOMEM; + + for (lane = 0; lane < ICE_MAX_PORT_PER_PCI_DEV; lane++) { + u8 options_count = ICE_AQC_PORT_OPT_MAX; + u8 speed, active_idx, pending_idx; + bool active_valid, pending_valid; + + err = ice_aq_get_port_options(hw, options, &options_count, lane, + true, &active_idx, &active_valid, + &pending_idx, &pending_valid); + if (err) + goto err; + + if (!active_valid) + continue; + + speed = options[active_idx].max_lane_speed; + /* If we don't get speed for this lane, it's unoccupied */ + if (speed > ICE_AQC_PORT_OPT_MAX_LANE_200G) + continue; + + if (hw->pf_id == lport) { + kfree(options); + return lane; + } + + lport++; + } + + /* PHY lane not found */ + err = -ENXIO; +err: + kfree(options); + return err; +} + /** * ice_aq_sff_eeprom * @hw: pointer to the HW struct diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h index 27208a60cece5..fe6f88cfd9486 100644 --- a/drivers/net/ethernet/intel/ice/ice_common.h +++ b/drivers/net/ethernet/intel/ice/ice_common.h @@ -193,6 +193,7 @@ ice_aq_get_port_options(struct ice_hw *hw, int ice_aq_set_port_option(struct ice_hw *hw, u8 lport, u8 lport_valid, u8 new_option); +int ice_get_phy_lane_number(struct ice_hw *hw); int ice_aq_sff_eeprom(struct ice_hw *hw, u16 lport, u8 bus_addr, u16 mem_addr, u8 page, u8 set_page, u8 *data, u8 length, diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 8f2e758c39427..45eefe22fb5b7 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -1144,7 +1144,7 @@ ice_link_event(struct ice_pf *pf, struct ice_port_info *pi, bool link_up, if (link_up == old_link && link_speed == old_link_speed) return 0;
- ice_ptp_link_change(pf, pf->hw.pf_id, link_up); + ice_ptp_link_change(pf, link_up);
if (ice_is_dcb_active(pf)) { if (test_bit(ICE_FLAG_DCB_ENA, pf->flags)) @@ -6744,7 +6744,7 @@ static int ice_up_complete(struct ice_vsi *vsi) ice_print_link_msg(vsi, true); netif_tx_start_all_queues(vsi->netdev); netif_carrier_on(vsi->netdev); - ice_ptp_link_change(pf, pf->hw.pf_id, true); + ice_ptp_link_change(pf, true); }
/* Perform an initial read of the statistics registers now to @@ -7214,7 +7214,7 @@ int ice_down(struct ice_vsi *vsi)
if (vsi->netdev) { vlan_err = ice_vsi_del_vlan_zero(vsi); - ice_ptp_link_change(vsi->back, vsi->back->hw.pf_id, false); + ice_ptp_link_change(vsi->back, false); netif_carrier_off(vsi->netdev); netif_tx_disable(vsi->netdev); } diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c index 0b3dd87a331c7..7c6f81beaee46 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp.c +++ b/drivers/net/ethernet/intel/ice/ice_ptp.c @@ -1466,10 +1466,9 @@ ice_ptp_port_phy_restart(struct ice_ptp_port *ptp_port) /** * ice_ptp_link_change - Reconfigure PTP after link status change * @pf: Board private structure - * @port: Port for which the PHY start is set * @linkup: Link is up or down */ -void ice_ptp_link_change(struct ice_pf *pf, u8 port, bool linkup) +void ice_ptp_link_change(struct ice_pf *pf, bool linkup) { struct ice_ptp_port *ptp_port; struct ice_hw *hw = &pf->hw; @@ -1477,14 +1476,7 @@ void ice_ptp_link_change(struct ice_pf *pf, u8 port, bool linkup) if (pf->ptp.state != ICE_PTP_READY) return;
- if (WARN_ON_ONCE(port >= hw->ptp.num_lports)) - return; - ptp_port = &pf->ptp.port; - if (ice_is_e825c(hw) && hw->ptp.is_2x50g_muxed_topo) - port *= 2; - if (WARN_ON_ONCE(ptp_port->port_num != port)) - return;
/* Update cached link status for this port immediately */ ptp_port->link_up = linkup; @@ -3383,10 +3375,17 @@ void ice_ptp_init(struct ice_pf *pf) { struct ice_ptp *ptp = &pf->ptp; struct ice_hw *hw = &pf->hw; - int err; + int lane_num, err;
ptp->state = ICE_PTP_INITIALIZING;
+ lane_num = ice_get_phy_lane_number(hw); + if (lane_num < 0) { + err = lane_num; + goto err_exit; + } + + ptp->port.port_num = (u8)lane_num; ice_ptp_init_hw(hw);
ice_ptp_init_tx_interrupt_mode(pf); @@ -3407,10 +3406,6 @@ void ice_ptp_init(struct ice_pf *pf) if (err) goto err_exit;
- ptp->port.port_num = hw->pf_id; - if (ice_is_e825c(hw) && hw->ptp.is_2x50g_muxed_topo) - ptp->port.port_num = hw->pf_id * 2; - err = ice_ptp_init_port(pf, &ptp->port); if (err) goto err_exit; diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.h b/drivers/net/ethernet/intel/ice/ice_ptp.h index 5615efc42c624..f1cfa6aa4e76b 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp.h +++ b/drivers/net/ethernet/intel/ice/ice_ptp.h @@ -332,7 +332,7 @@ void ice_ptp_prepare_for_reset(struct ice_pf *pf, enum ice_reset_req reset_type); void ice_ptp_init(struct ice_pf *pf); void ice_ptp_release(struct ice_pf *pf); -void ice_ptp_link_change(struct ice_pf *pf, u8 port, bool linkup); +void ice_ptp_link_change(struct ice_pf *pf, bool linkup); #else /* IS_ENABLED(CONFIG_PTP_1588_CLOCK) */ static inline int ice_ptp_set_ts_config(struct ice_pf *pf, struct ifreq *ifr) { @@ -380,7 +380,7 @@ static inline void ice_ptp_prepare_for_reset(struct ice_pf *pf, } static inline void ice_ptp_init(struct ice_pf *pf) { } static inline void ice_ptp_release(struct ice_pf *pf) { } -static inline void ice_ptp_link_change(struct ice_pf *pf, u8 port, bool linkup) +static inline void ice_ptp_link_change(struct ice_pf *pf, bool linkup) { }
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c index 28bd4815ebcf3..7190fde16c868 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c +++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c @@ -2698,26 +2698,6 @@ static int ice_get_phy_tx_tstamp_ready_eth56g(struct ice_hw *hw, u8 port, return 0; }
-/** - * ice_is_muxed_topo - detect breakout 2x50G topology for E825C - * @hw: pointer to the HW struct - * - * Return: true if it's 2x50 breakout topology, false otherwise - */ -static bool ice_is_muxed_topo(struct ice_hw *hw) -{ - u8 link_topo; - bool mux; - u32 val; - - val = rd32(hw, GLGEN_SWITCH_MODE_CONFIG); - mux = FIELD_GET(GLGEN_SWITCH_MODE_CONFIG_25X4_QUAD_M, val); - val = rd32(hw, GLGEN_MAC_LINK_TOPO); - link_topo = FIELD_GET(GLGEN_MAC_LINK_TOPO_LINK_TOPO_M, val); - - return (mux && link_topo == ICE_LINK_TOPO_UP_TO_2_LINKS); -} - /** * ice_ptp_init_phy_e825 - initialize PHY parameters * @hw: pointer to the HW struct @@ -2740,12 +2720,8 @@ static void ice_ptp_init_phy_e825(struct ice_hw *hw)
ice_sb_access_ena_eth56g(hw, true); err = ice_read_phy_eth56g(hw, hw->pf_id, PHY_REG_REVISION, &phy_rev); - if (err || phy_rev != PHY_REVISION_ETH56G) { + if (err || phy_rev != PHY_REVISION_ETH56G) ptp->phy_model = ICE_PHY_UNSUP; - return; - } - - ptp->is_2x50g_muxed_topo = ice_is_muxed_topo(hw); }
/* E822 family functions diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h index 479227bdff75e..609f31e0dfded 100644 --- a/drivers/net/ethernet/intel/ice/ice_type.h +++ b/drivers/net/ethernet/intel/ice/ice_type.h @@ -880,7 +880,6 @@ struct ice_ptp_hw { union ice_phy_params phy; u8 num_lports; u8 ports_per_phy; - bool is_2x50g_muxed_topo; };
/* Port hardware description */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
[ Upstream commit 5a597a19a2148d1c5cd987907a60c042ab0f62d5 ]
After previous changes, the description of the teo governor in the documentation comment does not match the code any more, so update it as appropriate.
Fixes: 449914398083 ("cpuidle: teo: Remove recent intercepts metric") Fixes: 2662342079f5 ("cpuidle: teo: Gather statistics regarding whether or not to stop the tick") Fixes: 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases") Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Reviewed-by: Christian Loehle christian.loehle@arm.com Link: https://patch.msgid.link/6120335.lOV4Wx5bFT@rjwysocki.net [ rjw: Corrected 3 typos found by Christian ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cpuidle/governors/teo.c | 91 +++++++++++++++++---------------- 1 file changed, 48 insertions(+), 43 deletions(-)
diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index f2992f92d8db8..173ddcac540ad 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -10,25 +10,27 @@ * DOC: teo-description * * The idea of this governor is based on the observation that on many systems - * timer events are two or more orders of magnitude more frequent than any - * other interrupts, so they are likely to be the most significant cause of CPU - * wakeups from idle states. Moreover, information about what happened in the - * (relatively recent) past can be used to estimate whether or not the deepest - * idle state with target residency within the (known) time till the closest - * timer event, referred to as the sleep length, is likely to be suitable for - * the upcoming CPU idle period and, if not, then which of the shallower idle - * states to choose instead of it. + * timer interrupts are two or more orders of magnitude more frequent than any + * other interrupt types, so they are likely to dominate CPU wakeup patterns. + * Moreover, in principle, the time when the next timer event is going to occur + * can be determined at the idle state selection time, although doing that may + * be costly, so it can be regarded as the most reliable source of information + * for idle state selection. * - * Of course, non-timer wakeup sources are more important in some use cases - * which can be covered by taking a few most recent idle time intervals of the - * CPU into account. However, even in that context it is not necessary to - * consider idle duration values greater than the sleep length, because the - * closest timer will ultimately wake up the CPU anyway unless it is woken up - * earlier. + * Of course, non-timer wakeup sources are more important in some use cases, + * but even then it is generally unnecessary to consider idle duration values + * greater than the time time till the next timer event, referred as the sleep + * length in what follows, because the closest timer will ultimately wake up the + * CPU anyway unless it is woken up earlier. * - * Thus this governor estimates whether or not the prospective idle duration of - * a CPU is likely to be significantly shorter than the sleep length and selects - * an idle state for it accordingly. + * However, since obtaining the sleep length may be costly, the governor first + * checks if it can select a shallow idle state using wakeup pattern information + * from recent times, in which case it can do without knowing the sleep length + * at all. For this purpose, it counts CPU wakeup events and looks for an idle + * state whose target residency has not exceeded the idle duration (measured + * after wakeup) in the majority of relevant recent cases. If the target + * residency of that state is small enough, it may be used right away and the + * sleep length need not be determined. * * The computations carried out by this governor are based on using bins whose * boundaries are aligned with the target residency parameter values of the CPU @@ -39,7 +41,11 @@ * idle state 2, the third bin spans from the target residency of idle state 2 * up to, but not including, the target residency of idle state 3 and so on. * The last bin spans from the target residency of the deepest idle state - * supplied by the driver to infinity. + * supplied by the driver to the scheduler tick period length or to infinity if + * the tick period length is less than the target residency of that state. In + * the latter case, the governor also counts events with the measured idle + * duration between the tick period length and the target residency of the + * deepest idle state. * * Two metrics called "hits" and "intercepts" are associated with each bin. * They are updated every time before selecting an idle state for the given CPU @@ -49,47 +55,46 @@ * sleep length and the idle duration measured after CPU wakeup fall into the * same bin (that is, the CPU appears to wake up "on time" relative to the sleep * length). In turn, the "intercepts" metric reflects the relative frequency of - * situations in which the measured idle duration is so much shorter than the - * sleep length that the bin it falls into corresponds to an idle state - * shallower than the one whose bin is fallen into by the sleep length (these - * situations are referred to as "intercepts" below). + * non-timer wakeup events for which the measured idle duration falls into a bin + * that corresponds to an idle state shallower than the one whose bin is fallen + * into by the sleep length (these events are also referred to as "intercepts" + * below). * * In order to select an idle state for a CPU, the governor takes the following * steps (modulo the possible latency constraint that must be taken into account * too): * - * 1. Find the deepest CPU idle state whose target residency does not exceed - * the current sleep length (the candidate idle state) and compute 2 sums as - * follows: + * 1. Find the deepest enabled CPU idle state (the candidate idle state) and + * compute 2 sums as follows: * - * - The sum of the "hits" and "intercepts" metrics for the candidate state - * and all of the deeper idle states (it represents the cases in which the - * CPU was idle long enough to avoid being intercepted if the sleep length - * had been equal to the current one). + * - The sum of the "hits" metric for all of the idle states shallower than + * the candidate one (it represents the cases in which the CPU was likely + * woken up by a timer). * - * - The sum of the "intercepts" metrics for all of the idle states shallower - * than the candidate one (it represents the cases in which the CPU was not - * idle long enough to avoid being intercepted if the sleep length had been - * equal to the current one). + * - The sum of the "intercepts" metric for all of the idle states shallower + * than the candidate one (it represents the cases in which the CPU was + * likely woken up by a non-timer wakeup source). * - * 2. If the second sum is greater than the first one the CPU is likely to wake - * up early, so look for an alternative idle state to select. + * 2. If the second sum computed in step 1 is greater than a half of the sum of + * both metrics for the candidate state bin and all subsequent bins(if any), + * a shallower idle state is likely to be more suitable, so look for it. * - * - Traverse the idle states shallower than the candidate one in the + * - Traverse the enabled idle states shallower than the candidate one in the * descending order. * * - For each of them compute the sum of the "intercepts" metrics over all * of the idle states between it and the candidate one (including the * former and excluding the latter). * - * - If each of these sums that needs to be taken into account (because the - * check related to it has indicated that the CPU is likely to wake up - * early) is greater than a half of the corresponding sum computed in step - * 1 (which means that the target residency of the state in question had - * not exceeded the idle duration in over a half of the relevant cases), - * select the given idle state instead of the candidate one. + * - If this sum is greater than a half of the second sum computed in step 1, + * use the given idle state as the new candidate one. * - * 3. By default, select the candidate state. + * 3. If the current candidate state is state 0 or its target residency is short + * enough, return it and prevent the scheduler tick from being stopped. + * + * 4. Obtain the sleep length value and check if it is below the target + * residency of the current candidate state, in which case a new shallower + * candidate state needs to be found, so look for it. */
#include <linux/cpuidle.h>
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
[ Upstream commit fe4de594f7a2e9bc49407de60fbd20809fad4192 ]
Inside function get_canonical_dev_path(), we call d_path() to get the final device path.
But d_path() can return error, and in that case the next strscpy() call will trigger an invalid memory access.
Add back the missing error handling for d_path().
Reported-by: Boris Burkov boris@bur.io Fixes: 7e06de7c83a7 ("btrfs: canonicalize the device path before adding it") Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/volumes.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 0c4d14c59ebec..395b8b880ce78 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -797,6 +797,10 @@ static int get_canonical_dev_path(const char *dev_path, char *canonical) if (ret) goto out; resolved_path = d_path(&path, path_buf, PATH_MAX); + if (IS_ERR(resolved_path)) { + ret = PTR_ERR(resolved_path); + goto out; + } ret = strscpy(canonical, resolved_path, PATH_MAX); out: kfree(path_buf);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 46841c7053e6d25fb33e0534ef023833bf03e382 ]
gtp_newlink() links the gtp device to a list in dev_net(dev).
However, even after the gtp device is moved to another netns, it stays on the list but should be invisible.
Let's use for_each_netdev_rcu() for netdev traversal in gtp_genl_dump_pdp().
Note that gtp_dev_list is no longer used under RCU, so list helpers are converted to the non-RCU variant.
Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Reported-by: Xiao Liang shaw.leon@gmail.com Closes: https://lore.kernel.org/netdev/CABAhCOQdBL6h9M2C+kd+bGivRJ9Q72JUxW+-gur0nub_... Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/gtp.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c index 70f981887518a..819ce4b756d23 100644 --- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -1527,7 +1527,7 @@ static int gtp_newlink(struct net *src_net, struct net_device *dev, }
gn = net_generic(dev_net(dev), gtp_net_id); - list_add_rcu(>p->list, &gn->gtp_dev_list); + list_add(>p->list, &gn->gtp_dev_list); dev->priv_destructor = gtp_destructor;
netdev_dbg(dev, "registered new GTP interface\n"); @@ -1553,7 +1553,7 @@ static void gtp_dellink(struct net_device *dev, struct list_head *head) hlist_for_each_entry_safe(pctx, next, >p->tid_hash[i], hlist_tid) pdp_context_delete(pctx);
- list_del_rcu(>p->list); + list_del(>p->list); unregister_netdevice_queue(dev, head); }
@@ -2279,16 +2279,19 @@ static int gtp_genl_dump_pdp(struct sk_buff *skb, struct gtp_dev *last_gtp = (struct gtp_dev *)cb->args[2], *gtp; int i, j, bucket = cb->args[0], skip = cb->args[1]; struct net *net = sock_net(skb->sk); + struct net_device *dev; struct pdp_ctx *pctx; - struct gtp_net *gn; - - gn = net_generic(net, gtp_net_id);
if (cb->args[4]) return 0;
rcu_read_lock(); - list_for_each_entry_rcu(gtp, &gn->gtp_dev_list, list) { + for_each_netdev_rcu(net, dev) { + if (dev->rtnl_link_ops != >p_link_ops) + continue; + + gtp = netdev_priv(dev); + if (last_gtp && last_gtp != gtp) continue; else @@ -2483,9 +2486,9 @@ static void __net_exit gtp_net_exit_batch_rtnl(struct list_head *net_list,
list_for_each_entry(net, net_list, exit_list) { struct gtp_net *gn = net_generic(net, gtp_net_id); - struct gtp_dev *gtp; + struct gtp_dev *gtp, *gtp_next;
- list_for_each_entry(gtp, &gn->gtp_dev_list, list) + list_for_each_entry_safe(gtp, gtp_next, &gn->gtp_dev_list, list) gtp_dellink(gtp->dev, dev_to_kill); } }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit eb28fd76c0a08a47b470677c6cef9dd1c60e92d1 ]
gtp_newlink() links the device to a list in dev_net(dev) instead of src_net, where a udp tunnel socket is created.
Even when src_net is removed, the device stays alive on dev_net(dev). Then, removing src_net triggers the splat below. [0]
In this example, gtp0 is created in ns2, and the udp socket is created in ns1.
ip netns add ns1 ip netns add ns2 ip -n ns1 link add netns ns2 name gtp0 type gtp role sgsn ip netns del ns1
Let's link the device to the socket's netns instead.
Now, gtp_net_exit_batch_rtnl() needs another netdev iteration to remove all gtp devices in the netns.
[0]: ref_tracker: net notrefcnt@000000003d6e7d05 has 1/2 users at sk_alloc (./include/net/net_namespace.h:345 net/core/sock.c:2236) inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252) __sock_create (net/socket.c:1558) udp_sock_create4 (net/ipv4/udp_tunnel_core.c:18) gtp_create_sock (./include/net/udp_tunnel.h:59 drivers/net/gtp.c:1423) gtp_create_sockets (drivers/net/gtp.c:1447) gtp_newlink (drivers/net/gtp.c:1507) rtnl_newlink (net/core/rtnetlink.c:3786 net/core/rtnetlink.c:3897 net/core/rtnetlink.c:4012) rtnetlink_rcv_msg (net/core/rtnetlink.c:6922) netlink_rcv_skb (net/netlink/af_netlink.c:2542) netlink_unicast (net/netlink/af_netlink.c:1321 net/netlink/af_netlink.c:1347) netlink_sendmsg (net/netlink/af_netlink.c:1891) ____sys_sendmsg (net/socket.c:711 net/socket.c:726 net/socket.c:2583) ___sys_sendmsg (net/socket.c:2639) __sys_sendmsg (net/socket.c:2669) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
WARNING: CPU: 1 PID: 60 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179) Modules linked in: CPU: 1 UID: 0 PID: 60 Comm: kworker/u16:2 Not tainted 6.13.0-rc5-00147-g4c1224501e9d #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: netns cleanup_net RIP: 0010:ref_tracker_dir_exit (lib/ref_tracker.c:179) Code: 00 00 00 fc ff df 4d 8b 26 49 bd 00 01 00 00 00 00 ad de 4c 39 f5 0f 85 df 00 00 00 48 8b 74 24 08 48 89 df e8 a5 cc 12 02 90 <0f> 0b 90 48 8d 6b 44 be 04 00 00 00 48 89 ef e8 80 de 67 ff 48 89 RSP: 0018:ff11000009a07b60 EFLAGS: 00010286 RAX: 0000000000002bd3 RBX: ff1100000f4e1aa0 RCX: 1ffffffff0e40ac6 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8423ee3c RBP: ff1100000f4e1af0 R08: 0000000000000001 R09: fffffbfff0e395ae R10: 0000000000000001 R11: 0000000000036001 R12: ff1100000f4e1af0 R13: dead000000000100 R14: ff1100000f4e1af0 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ff1100006ce80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f9b2464bd98 CR3: 0000000005286005 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __warn (kernel/panic.c:748) ? ref_tracker_dir_exit (lib/ref_tracker.c:179) ? report_bug (lib/bug.c:201 lib/bug.c:219) ? handle_bug (arch/x86/kernel/traps.c:285) ? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1)) ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) ? ref_tracker_dir_exit (lib/ref_tracker.c:179) ? __pfx_ref_tracker_dir_exit (lib/ref_tracker.c:158) ? kfree (mm/slub.c:4613 mm/slub.c:4761) net_free (net/core/net_namespace.c:476 net/core/net_namespace.c:467) cleanup_net (net/core/net_namespace.c:664 (discriminator 3)) process_one_work (kernel/workqueue.c:3229) worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391) kthread (kernel/kthread.c:389) ret_from_fork (arch/x86/kernel/process.c:147) ret_from_fork_asm (arch/x86/entry/entry_64.S:257) </TASK>
Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Reported-by: Xiao Liang shaw.leon@gmail.com Closes: https://lore.kernel.org/netdev/20250104125732.17335-1-shaw.leon@gmail.com/ Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/gtp.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c index 819ce4b756d23..47406ce990161 100644 --- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -1526,7 +1526,7 @@ static int gtp_newlink(struct net *src_net, struct net_device *dev, goto out_encap; }
- gn = net_generic(dev_net(dev), gtp_net_id); + gn = net_generic(src_net, gtp_net_id); list_add(>p->list, &gn->gtp_dev_list); dev->priv_destructor = gtp_destructor;
@@ -2487,6 +2487,11 @@ static void __net_exit gtp_net_exit_batch_rtnl(struct list_head *net_list, list_for_each_entry(net, net_list, exit_list) { struct gtp_net *gn = net_generic(net, gtp_net_id); struct gtp_dev *gtp, *gtp_next; + struct net_device *dev; + + for_each_netdev(net, dev) + if (dev->rtnl_link_ops == >p_link_ops) + gtp_dellink(dev, dev_to_kill);
list_for_each_entry_safe(gtp, gtp_next, &gn->gtp_dev_list, list) gtp_dellink(gtp->dev, dev_to_kill);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit ffc90e9ca61b0f619326a1417ff32efd6cc71ed2 ]
pfcp_newlink() links the device to a list in dev_net(dev) instead of net, where a udp tunnel socket is created.
Even when net is removed, the device stays alive on dev_net(dev). Then, removing net triggers the splat below. [0]
In this example, pfcp0 is created in ns2, but the udp socket is created in ns1.
ip netns add ns1 ip netns add ns2 ip -n ns1 link add netns ns2 name pfcp0 type pfcp ip netns del ns1
Let's link the device to the socket's netns instead.
Now, pfcp_net_exit() needs another netdev iteration to remove all pfcp devices in the netns.
pfcp_dev_list is not used under RCU, so the list API is converted to the non-RCU variant.
pfcp_net_exit() can be converted to .exit_batch_rtnl() in net-next.
[0]: ref_tracker: net notrefcnt@00000000128b34dc has 1/1 users at sk_alloc (./include/net/net_namespace.h:345 net/core/sock.c:2236) inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252) __sock_create (net/socket.c:1558) udp_sock_create4 (net/ipv4/udp_tunnel_core.c:18) pfcp_create_sock (drivers/net/pfcp.c:168) pfcp_newlink (drivers/net/pfcp.c:182 drivers/net/pfcp.c:197) rtnl_newlink (net/core/rtnetlink.c:3786 net/core/rtnetlink.c:3897 net/core/rtnetlink.c:4012) rtnetlink_rcv_msg (net/core/rtnetlink.c:6922) netlink_rcv_skb (net/netlink/af_netlink.c:2542) netlink_unicast (net/netlink/af_netlink.c:1321 net/netlink/af_netlink.c:1347) netlink_sendmsg (net/netlink/af_netlink.c:1891) ____sys_sendmsg (net/socket.c:711 net/socket.c:726 net/socket.c:2583) ___sys_sendmsg (net/socket.c:2639) __sys_sendmsg (net/socket.c:2669) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
WARNING: CPU: 1 PID: 11 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179) Modules linked in: CPU: 1 UID: 0 PID: 11 Comm: kworker/u16:0 Not tainted 6.13.0-rc5-00147-g4c1224501e9d #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: netns cleanup_net RIP: 0010:ref_tracker_dir_exit (lib/ref_tracker.c:179) Code: 00 00 00 fc ff df 4d 8b 26 49 bd 00 01 00 00 00 00 ad de 4c 39 f5 0f 85 df 00 00 00 48 8b 74 24 08 48 89 df e8 a5 cc 12 02 90 <0f> 0b 90 48 8d 6b 44 be 04 00 00 00 48 89 ef e8 80 de 67 ff 48 89 RSP: 0018:ff11000007f3fb60 EFLAGS: 00010286 RAX: 00000000000020ef RBX: ff1100000d6481e0 RCX: 1ffffffff0e40d82 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8423ee3c RBP: ff1100000d648230 R08: 0000000000000001 R09: fffffbfff0e395af R10: 0000000000000001 R11: 0000000000000000 R12: ff1100000d648230 R13: dead000000000100 R14: ff1100000d648230 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ff1100006ce80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005620e1363990 CR3: 000000000eeb2002 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __warn (kernel/panic.c:748) ? ref_tracker_dir_exit (lib/ref_tracker.c:179) ? report_bug (lib/bug.c:201 lib/bug.c:219) ? handle_bug (arch/x86/kernel/traps.c:285) ? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1)) ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) ? ref_tracker_dir_exit (lib/ref_tracker.c:179) ? __pfx_ref_tracker_dir_exit (lib/ref_tracker.c:158) ? kfree (mm/slub.c:4613 mm/slub.c:4761) net_free (net/core/net_namespace.c:476 net/core/net_namespace.c:467) cleanup_net (net/core/net_namespace.c:664 (discriminator 3)) process_one_work (kernel/workqueue.c:3229) worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391) kthread (kernel/kthread.c:389) ret_from_fork (arch/x86/kernel/process.c:147) ret_from_fork_asm (arch/x86/entry/entry_64.S:257) </TASK>
Fixes: 76c8764ef36a ("pfcp: add PFCP module") Reported-by: Xiao Liang shaw.leon@gmail.com Closes: https://lore.kernel.org/netdev/20250104125732.17335-1-shaw.leon@gmail.com/ Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/pfcp.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/net/pfcp.c b/drivers/net/pfcp.c index 69434fd13f961..68d0d9e92a220 100644 --- a/drivers/net/pfcp.c +++ b/drivers/net/pfcp.c @@ -206,8 +206,8 @@ static int pfcp_newlink(struct net *net, struct net_device *dev, goto exit_del_pfcp_sock; }
- pn = net_generic(dev_net(dev), pfcp_net_id); - list_add_rcu(&pfcp->list, &pn->pfcp_dev_list); + pn = net_generic(net, pfcp_net_id); + list_add(&pfcp->list, &pn->pfcp_dev_list);
netdev_dbg(dev, "registered new PFCP interface\n");
@@ -224,7 +224,7 @@ static void pfcp_dellink(struct net_device *dev, struct list_head *head) { struct pfcp_dev *pfcp = netdev_priv(dev);
- list_del_rcu(&pfcp->list); + list_del(&pfcp->list); unregister_netdevice_queue(dev, head); }
@@ -247,11 +247,16 @@ static int __net_init pfcp_net_init(struct net *net) static void __net_exit pfcp_net_exit(struct net *net) { struct pfcp_net *pn = net_generic(net, pfcp_net_id); - struct pfcp_dev *pfcp; + struct pfcp_dev *pfcp, *pfcp_next; + struct net_device *dev; LIST_HEAD(list);
rtnl_lock(); - list_for_each_entry(pfcp, &pn->pfcp_dev_list, list) + for_each_netdev(net, dev) + if (dev->rtnl_link_ops == &pfcp_link_ops) + pfcp_dellink(dev, &list); + + list_for_each_entry_safe(pfcp, pfcp_next, &pn->pfcp_dev_list, list) pfcp_dellink(pfcp->dev, &list);
unregister_netdevice_many(&list);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Viresh Kumar viresh.kumar@linaro.org
[ Upstream commit 7e265fc04690d449a40b413b0348b15c748cea6f ]
It is possible to enable few cpufreq drivers, without the framework being enabled. This happened due to a bug while moving the entries earlier. Fix it.
Fixes: 7ee1378736f0 ("cpufreq: Move CPPC configs to common Kconfig and add RISC-V") Signed-off-by: Viresh Kumar viresh.kumar@linaro.org Reviewed-by: Pierre Gondois pierre.gondois@arm.com Reviewed-by: Sunil V L sunilvl@ventanamicro.com Reviewed-by: Sudeep Holla sudeep.holla@arm.com Link: https://patch.msgid.link/84ac7a8fa72a8fe20487bb0a350a758bce060965.1736488384... Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cpufreq/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig index 2561b215432a8..588ab1cc6d557 100644 --- a/drivers/cpufreq/Kconfig +++ b/drivers/cpufreq/Kconfig @@ -311,8 +311,6 @@ config QORIQ_CPUFREQ This adds the CPUFreq driver support for Freescale QorIQ SoCs which are capable of changing the CPU's frequency dynamically.
-endif - config ACPI_CPPC_CPUFREQ tristate "CPUFreq driver based on the ACPI CPPC spec" depends on ACPI_PROCESSOR @@ -341,4 +339,6 @@ config ACPI_CPPC_CPUFREQ_FIE
If in doubt, say N.
+endif + endmenu
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit 16ebb6f5b6295c9688749862a39a4889c56227f8 ]
The "sizeof(struct cmsg_bpf_event) + pkt_size + data_size" math could potentially have an integer wrapping bug on 32bit systems. Check for this and return an error.
Fixes: 9816dd35ecec ("nfp: bpf: perf event output helpers support") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Link: https://patch.msgid.link/6074805b-e78d-4b8a-bf05-e929b5377c28@stanley.mounta... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/netronome/nfp/bpf/offload.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c index 9d97cd281f18e..c03558adda91e 100644 --- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c +++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c @@ -458,7 +458,8 @@ int nfp_bpf_event_output(struct nfp_app_bpf *bpf, const void *data, map_id_full = be64_to_cpu(cbe->map_ptr); map_id = map_id_full;
- if (len < sizeof(struct cmsg_bpf_event) + pkt_size + data_size) + if (size_add(pkt_size, data_size) > INT_MAX || + len < sizeof(struct cmsg_bpf_event) + pkt_size + data_size) return -EINVAL; if (cbe->hdr.ver != NFP_CCM_ABI_VERSION) return -EINVAL;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Anderson sean.anderson@linux.dev
[ Upstream commit c17ff476f53afb30f90bb3c2af77de069c81a622 ]
If coalesce_count is greater than 255 it will not fit in the register and will overflow. This can be reproduced by running
# ethtool -C ethX rx-frames 256
which will result in a timeout of 0us instead. Fix this by checking for invalid values and reporting an error.
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Sean Anderson sean.anderson@linux.dev Reviewed-by: Shannon Nelson shannon.nelson@amd.com Reviewed-by: Radhey Shyam Pandey radhey.shyam.pandey@amd.com Link: https://patch.msgid.link/20250113163001.2335235-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c index 1fcbcaa85ebdb..de10a2d08c428 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c @@ -2056,6 +2056,12 @@ axienet_ethtools_set_coalesce(struct net_device *ndev, return -EBUSY; }
+ if (ecoalesce->rx_max_coalesced_frames > 255 || + ecoalesce->tx_max_coalesced_frames > 255) { + NL_SET_ERR_MSG(extack, "frames must be less than 256"); + return -EINVAL; + } + if (ecoalesce->rx_max_coalesced_frames) lp->coalesce_count_rx = ecoalesce->rx_max_coalesced_frames; if (ecoalesce->rx_coalesce_usecs)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kevin Groeneveld kgroeneveld@lenbrook.com
[ Upstream commit 001ba0902046cb6c352494df610718c0763e77a5 ]
The fec_enet_update_cbd function calls page_pool_dev_alloc_pages but did not handle the case when it returned NULL. There was a WARN_ON(!new_page) but it would still proceed to use the NULL pointer and then crash.
This case does seem somewhat rare but when the system is under memory pressure it can happen. One case where I can duplicate this with some frequency is when writing over a smbd share to a SATA HDD attached to an imx6q.
Setting /proc/sys/vm/min_free_kbytes to higher values also seems to solve the problem for my test case. But it still seems wrong that the fec driver ignores the memory allocation error and can crash.
This commit handles the allocation error by dropping the current packet.
Fixes: 95698ff6177b5 ("net: fec: using page pool to manage RX buffers") Signed-off-by: Kevin Groeneveld kgroeneveld@lenbrook.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Reviewed-by: Wei Fang wei.fang@nxp.com Link: https://patch.msgid.link/20250113154846.1765414-1-kgroeneveld@lenbrook.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/freescale/fec_main.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 9d9fcec41488e..49d1748e0c043 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -1591,19 +1591,22 @@ static void fec_enet_tx(struct net_device *ndev, int budget) fec_enet_tx_queue(ndev, i, budget); }
-static void fec_enet_update_cbd(struct fec_enet_priv_rx_q *rxq, +static int fec_enet_update_cbd(struct fec_enet_priv_rx_q *rxq, struct bufdesc *bdp, int index) { struct page *new_page; dma_addr_t phys_addr;
new_page = page_pool_dev_alloc_pages(rxq->page_pool); - WARN_ON(!new_page); - rxq->rx_skb_info[index].page = new_page; + if (unlikely(!new_page)) + return -ENOMEM;
+ rxq->rx_skb_info[index].page = new_page; rxq->rx_skb_info[index].offset = FEC_ENET_XDP_HEADROOM; phys_addr = page_pool_get_dma_addr(new_page) + FEC_ENET_XDP_HEADROOM; bdp->cbd_bufaddr = cpu_to_fec32(phys_addr); + + return 0; }
static u32 @@ -1698,6 +1701,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id) int cpu = smp_processor_id(); struct xdp_buff xdp; struct page *page; + __fec32 cbd_bufaddr; u32 sub_len = 4;
#if !defined(CONFIG_M5272) @@ -1766,12 +1770,17 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
index = fec_enet_get_bd_index(bdp, &rxq->bd); page = rxq->rx_skb_info[index].page; + cbd_bufaddr = bdp->cbd_bufaddr; + if (fec_enet_update_cbd(rxq, bdp, index)) { + ndev->stats.rx_dropped++; + goto rx_processing_done; + } + dma_sync_single_for_cpu(&fep->pdev->dev, - fec32_to_cpu(bdp->cbd_bufaddr), + fec32_to_cpu(cbd_bufaddr), pkt_len, DMA_FROM_DEVICE); prefetch(page_address(page)); - fec_enet_update_cbd(rxq, bdp, index);
if (xdp_prog) { xdp_buff_clear_frags_flag(&xdp);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Begunkov asml.silence@gmail.com
[ Upstream commit cbc16bceea784210d585a42ac9f8f10ce62b300e ]
page_pool_ref_netmem() should work with either netmem representation, but currently it casts to a page with netmem_to_page(), which will fail with net iovs. Use netmem_get_pp_ref_count_ref() instead.
Fixes: 8ab79ed50cf1 ("page_pool: devmem support") Signed-off-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: David Wei dw@davidwei.uk Link: https://lore.kernel.org/20250108220644.3528845-2-dw@davidwei.uk Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/page_pool/helpers.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h index 793e6fd78bc5c..60a5347922bec 100644 --- a/include/net/page_pool/helpers.h +++ b/include/net/page_pool/helpers.h @@ -294,7 +294,7 @@ static inline long page_pool_unref_page(struct page *page, long nr)
static inline void page_pool_ref_netmem(netmem_ref netmem) { - atomic_long_inc(&netmem_to_page(netmem)->pp_ref_count); + atomic_long_inc(netmem_get_pp_ref_count_ref(netmem)); }
static inline void page_pool_ref_page(struct page *page)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Patrisious Haddad phaddad@nvidia.com
[ Upstream commit c08d3e62b2e73e14da318a1d20b52d0486a28ee0 ]
User added steering rules at RDMA_TX were being added to the first prio, which is the counters prio. Fix that so that they are correctly added to the BYPASS_PRIO instead.
Fixes: 24670b1a3166 ("net/mlx5: Add support for RDMA TX steering") Signed-off-by: Patrisious Haddad phaddad@nvidia.com Reviewed-by: Mark Bloch mbloch@nvidia.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 2eabfcc247c6a..0ce999706d412 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -2709,6 +2709,7 @@ struct mlx5_flow_namespace *mlx5_get_flow_namespace(struct mlx5_core_dev *dev, break; case MLX5_FLOW_NAMESPACE_RDMA_TX: root_ns = steering->rdma_tx_root_ns; + prio = RDMA_TX_BYPASS_PRIO; break; case MLX5_FLOW_NAMESPACE_RDMA_RX_COUNTERS: root_ns = steering->rdma_rx_root_ns;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yishai Hadas yishaih@nvidia.com
[ Upstream commit 1b10a519a45704d4b06ebd9245b272d145752c18 ]
Fix a lockdep warning [1] observed during the write combining test.
The warning indicates a potential nested lock scenario that could lead to a deadlock.
However, this is a false positive alarm because the SF lock and its parent lock are distinct ones.
The lockdep confusion arises because the locks belong to the same object class (i.e., struct mlx5_core_dev).
To resolve this, the code has been refactored to avoid taking both locks. Instead, only the parent lock is acquired.
[1] raw_ethernet_bw/2118 is trying to acquire lock: [ 213.619032] ffff88811dd75e08 (&dev->wc_state_lock){+.+.}-{3:3}, at: mlx5_wc_support_get+0x18c/0x210 [mlx5_core] [ 213.620270] [ 213.620270] but task is already holding lock: [ 213.620943] ffff88810b585e08 (&dev->wc_state_lock){+.+.}-{3:3}, at: mlx5_wc_support_get+0x10c/0x210 [mlx5_core] [ 213.622045] [ 213.622045] other info that might help us debug this: [ 213.622778] Possible unsafe locking scenario: [ 213.622778] [ 213.623465] CPU0 [ 213.623815] ---- [ 213.624148] lock(&dev->wc_state_lock); [ 213.624615] lock(&dev->wc_state_lock); [ 213.625071] [ 213.625071] *** DEADLOCK *** [ 213.625071] [ 213.625805] May be due to missing lock nesting notation [ 213.625805] [ 213.626522] 4 locks held by raw_ethernet_bw/2118: [ 213.627019] #0: ffff88813f80d578 (&uverbs_dev->disassociate_srcu){.+.+}-{0:0}, at: ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs] [ 213.628088] #1: ffff88810fb23930 (&file->hw_destroy_rwsem){.+.+}-{3:3}, at: ib_init_ucontext+0x2d/0xf0 [ib_uverbs] [ 213.629094] #2: ffff88810fb23878 (&file->ucontext_lock){+.+.}-{3:3}, at: ib_init_ucontext+0x49/0xf0 [ib_uverbs] [ 213.630106] #3: ffff88810b585e08 (&dev->wc_state_lock){+.+.}-{3:3}, at: mlx5_wc_support_get+0x10c/0x210 [mlx5_core] [ 213.631185] [ 213.631185] stack backtrace: [ 213.631718] CPU: 1 UID: 0 PID: 2118 Comm: raw_ethernet_bw Not tainted 6.12.0-rc7_internal_net_next_mlx5_89a0ad0 #1 [ 213.632722] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 213.633785] Call Trace: [ 213.634099] [ 213.634393] dump_stack_lvl+0x7e/0xc0 [ 213.634806] print_deadlock_bug+0x278/0x3c0 [ 213.635265] __lock_acquire+0x15f4/0x2c40 [ 213.635712] lock_acquire+0xcd/0x2d0 [ 213.636120] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core] [ 213.636722] ? mlx5_ib_enable_lb+0x24/0xa0 [mlx5_ib] [ 213.637277] __mutex_lock+0x81/0xda0 [ 213.637697] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core] [ 213.638305] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core] [ 213.638902] ? rcu_read_lock_sched_held+0x3f/0x70 [ 213.639400] ? mlx5_wc_support_get+0x18c/0x210 [mlx5_core] [ 213.640016] mlx5_wc_support_get+0x18c/0x210 [mlx5_core] [ 213.640615] set_ucontext_resp+0x68/0x2b0 [mlx5_ib] [ 213.641144] ? debug_mutex_init+0x33/0x40 [ 213.641586] mlx5_ib_alloc_ucontext+0x18e/0x7b0 [mlx5_ib] [ 213.642145] ib_init_ucontext+0xa0/0xf0 [ib_uverbs] [ 213.642679] ib_uverbs_handler_UVERBS_METHOD_GET_CONTEXT+0x95/0xc0 [ib_uverbs] [ 213.643426] ? _copy_from_user+0x46/0x80 [ 213.643878] ib_uverbs_cmd_verbs+0xa6b/0xc80 [ib_uverbs] [ 213.644426] ? ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x130/0x130 [ib_uverbs] [ 213.645213] ? __lock_acquire+0xa99/0x2c40 [ 213.645675] ? lock_acquire+0xcd/0x2d0 [ 213.646101] ? ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs] [ 213.646625] ? reacquire_held_locks+0xcf/0x1f0 [ 213.647102] ? do_user_addr_fault+0x45d/0x770 [ 213.647586] ib_uverbs_ioctl+0xe0/0x170 [ib_uverbs] [ 213.648102] ? ib_uverbs_ioctl+0xc4/0x170 [ib_uverbs] [ 213.648632] __x64_sys_ioctl+0x4d3/0xaa0 [ 213.649060] ? do_user_addr_fault+0x4a8/0x770 [ 213.649528] do_syscall_64+0x6d/0x140 [ 213.649947] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 213.650478] RIP: 0033:0x7fa179b0737b [ 213.650893] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d 2a 0f 00 f7 d8 64 89 01 48 [ 213.652619] RSP: 002b:00007ffd2e6d46e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 213.653390] RAX: ffffffffffffffda RBX: 00007ffd2e6d47f8 RCX: 00007fa179b0737b [ 213.654084] RDX: 00007ffd2e6d47e0 RSI: 00000000c0181b01 RDI: 0000000000000003 [ 213.654767] RBP: 00007ffd2e6d47c0 R08: 00007fa1799be010 R09: 0000000000000002 [ 213.655453] R10: 00007ffd2e6d4960 R11: 0000000000000246 R12: 00007ffd2e6d487c [ 213.656170] R13: 0000000000000027 R14: 0000000000000001 R15: 00007ffd2e6d4f70
Fixes: d98995b4bf98 ("net/mlx5: Reimplement write combining test") Signed-off-by: Yishai Hadas yishaih@nvidia.com Reviewed-by: Michael Guralnik michaelgur@nvidia.com Reviewed-by: Larysa Zaremba larysa.zaremba@intel.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/wc.c | 24 ++++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wc.c b/drivers/net/ethernet/mellanox/mlx5/core/wc.c index 1bed75eca97db..740b719e7072d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/wc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/wc.c @@ -382,6 +382,7 @@ static void mlx5_core_test_wc(struct mlx5_core_dev *mdev)
bool mlx5_wc_support_get(struct mlx5_core_dev *mdev) { + struct mutex *wc_state_lock = &mdev->wc_state_lock; struct mlx5_core_dev *parent = NULL;
if (!MLX5_CAP_GEN(mdev, bf)) { @@ -400,32 +401,31 @@ bool mlx5_wc_support_get(struct mlx5_core_dev *mdev) */ goto out;
- mutex_lock(&mdev->wc_state_lock); - - if (mdev->wc_state != MLX5_WC_STATE_UNINITIALIZED) - goto unlock; - #ifdef CONFIG_MLX5_SF - if (mlx5_core_is_sf(mdev)) + if (mlx5_core_is_sf(mdev)) { parent = mdev->priv.parent_mdev; + wc_state_lock = &parent->wc_state_lock; + } #endif
- if (parent) { - mutex_lock(&parent->wc_state_lock); + mutex_lock(wc_state_lock);
+ if (mdev->wc_state != MLX5_WC_STATE_UNINITIALIZED) + goto unlock; + + if (parent) { mlx5_core_test_wc(parent);
mlx5_core_dbg(mdev, "parent set wc_state=%d\n", parent->wc_state); mdev->wc_state = parent->wc_state;
- mutex_unlock(&parent->wc_state_lock); + } else { + mlx5_core_test_wc(mdev); }
- mlx5_core_test_wc(mdev); - unlock: - mutex_unlock(&mdev->wc_state_lock); + mutex_unlock(wc_state_lock); out: mlx5_core_dbg(mdev, "wc_state=%d\n", mdev->wc_state);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chris Mi cmi@nvidia.com
[ Upstream commit 2011a2a18ef00b5b8e4b753acbe6451a8c5f2260 ]
If failed to add SF, error handling doesn't delete the SF from the SF table. But the hw resources are deleted. So when unload driver, hw resources will be deleted again. Firmware will report syndrome 0x68def3 which means "SF is not allocated can not deallocate".
Fix it by delete SF from SF table if failed to add SF.
Fixes: 2597ee190b4e ("net/mlx5: Call mlx5_sf_id_erase() once in mlx5_sf_dealloc()") Signed-off-by: Chris Mi cmi@nvidia.com Reviewed-by: Shay Drori shayd@nvidia.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c index a96be98be032f..b96909fbeb12d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c @@ -257,6 +257,7 @@ static int mlx5_sf_add(struct mlx5_core_dev *dev, struct mlx5_sf_table *table, return 0;
esw_err: + mlx5_sf_function_id_erase(table, sf); mlx5_sf_free(table, sf); return err; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Zhang markzhang@nvidia.com
[ Upstream commit 5641e82cb55b4ecbc6366a499300917d2f3e6790 ]
Clear the port select structure on error so no stale values left after definers are destroyed. That's because the mlx5_lag_destroy_definers() always try to destroy all lag definers in the tt_map, so in the flow below lag definers get double-destroyed and cause kernel crash:
mlx5_lag_port_sel_create() mlx5_lag_create_definers() mlx5_lag_create_definer() <- Failed on tt 1 mlx5_lag_destroy_definers() <- definers[tt=0] gets destroyed mlx5_lag_port_sel_create() mlx5_lag_create_definers() mlx5_lag_create_definer() <- Failed on tt 0 mlx5_lag_destroy_definers() <- definers[tt=0] gets double-destroyed
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 Mem abort info: ESR = 0x0000000096000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault Data abort info: ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 64k pages, 48-bit VAs, pgdp=0000000112ce2e00 [0000000000000008] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Modules linked in: iptable_raw bonding ip_gre ip6_gre gre ip6_tunnel tunnel6 geneve ip6_udp_tunnel udp_tunnel ipip tunnel4 ip_tunnel rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_fwctl(OE) fwctl(OE) mlx5_core(OE) mlxdevm(OE) ib_core(OE) mlxfw(OE) memtrack(OE) mlx_compat(OE) openvswitch nsh nf_conncount psample xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc netconsole overlay efi_pstore sch_fq_codel zram ip_tables crct10dif_ce qemu_fw_cfg fuse ipv6 crc_ccitt [last unloaded: mlx_compat(OE)] CPU: 3 UID: 0 PID: 217 Comm: kworker/u53:2 Tainted: G OE 6.11.0+ #2 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core] lr : mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core] sp : ffff800085fafb00 x29: ffff800085fafb00 x28: ffff0000da0c8000 x27: 0000000000000000 x26: ffff0000da0c8000 x25: ffff0000da0c8000 x24: ffff0000da0c8000 x23: ffff0000c31f81a0 x22: 0400000000000000 x21: ffff0000da0c8000 x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff8b0c9350 x14: 0000000000000000 x13: ffff800081390d18 x12: ffff800081dc3cc0 x11: 0000000000000001 x10: 0000000000000b10 x9 : ffff80007ab7304c x8 : ffff0000d00711f0 x7 : 0000000000000004 x6 : 0000000000000190 x5 : ffff00027edb3010 x4 : 0000000000000000 x3 : 0000000000000000 x2 : ffff0000d39b8000 x1 : ffff0000d39b8000 x0 : 0400000000000000 Call trace: mlx5_del_flow_rules+0x24/0x2c0 [mlx5_core] mlx5_lag_destroy_definer+0x54/0x100 [mlx5_core] mlx5_lag_destroy_definers+0xa0/0x108 [mlx5_core] mlx5_lag_port_sel_create+0x2d4/0x6f8 [mlx5_core] mlx5_activate_lag+0x60c/0x6f8 [mlx5_core] mlx5_do_bond_work+0x284/0x5c8 [mlx5_core] process_one_work+0x170/0x3e0 worker_thread+0x2d8/0x3e0 kthread+0x11c/0x128 ret_from_fork+0x10/0x20 Code: a9025bf5 aa0003f6 a90363f7 f90023f9 (f9400400) ---[ end trace 0000000000000000 ]---
Fixes: dc48516ec7d3 ("net/mlx5: Lag, add support to create definers for LAG") Signed-off-by: Mark Zhang markzhang@nvidia.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Reviewed-by: Mark Bloch mbloch@nvidia.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c index ab2717012b79b..39e80704b1c42 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c @@ -530,7 +530,7 @@ int mlx5_lag_port_sel_create(struct mlx5_lag *ldev, set_tt_map(port_sel, hash_type); err = mlx5_lag_create_definers(ldev, hash_type, ports); if (err) - return err; + goto clear_port_sel;
if (port_sel->tunnel) { err = mlx5_lag_create_inner_ttc_table(ldev); @@ -549,6 +549,8 @@ int mlx5_lag_port_sel_create(struct mlx5_lag *ldev, mlx5_destroy_ttc_table(port_sel->inner.ttc); destroy_definers: mlx5_lag_destroy_definers(ldev); +clear_port_sel: + memset(port_sel, 0, sizeof(*port_sel)); return err; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Leon Romanovsky leonro@nvidia.com
[ Upstream commit 2c3688090f8a1f085230aa839cc63e4a7b977df0 ]
Attempt to enable IPsec packet offload in tunnel mode in debug kernel generates the following kernel panic, which is happening due to two issues: 1. In SA add section, the should be _bh() variant when marking SA mode. 2. There is not needed flush_workqueue in SA delete routine. It is not needed as at this stage as it is removed from SADB and the running work will be canceled later in SA free.
===================================================== WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected 6.12.0+ #4 Not tainted ----------------------------------------------------- charon/1337 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire: ffff88810f365020 (&xa->xa_lock#24){+.+.}-{3:3}, at: mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core]
and this task is already holding: ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30 which would create a new lock dependency: (&x->lock){+.-.}-{3:3} -> (&xa->xa_lock#24){+.+.}-{3:3}
but this new dependency connects a SOFTIRQ-irq-safe lock: (&x->lock){+.-.}-{3:3}
... which became SOFTIRQ-irq-safe at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 xfrm_timer_handler+0x91/0xd70 __hrtimer_run_queues+0x1dd/0xa60 hrtimer_run_softirq+0x146/0x2e0 handle_softirqs+0x266/0x860 irq_exit_rcu+0x115/0x1a0 sysvec_apic_timer_interrupt+0x6e/0x90 asm_sysvec_apic_timer_interrupt+0x16/0x20 default_idle+0x13/0x20 default_idle_call+0x67/0xa0 do_idle+0x2da/0x320 cpu_startup_entry+0x50/0x60 start_secondary+0x213/0x2a0 common_startup_64+0x129/0x138
to a SOFTIRQ-irq-unsafe lock: (&xa->xa_lock#24){+.+.}-{3:3}
... which became SOFTIRQ-irq-unsafe at: ... lock_acquire+0x1be/0x520 _raw_spin_lock+0x2c/0x40 xa_set_mark+0x70/0x110 mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core] xfrm_dev_state_add+0x3bb/0xd70 xfrm_add_sa+0x2451/0x4a90 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1 ---- ---- lock(&xa->xa_lock#24); local_irq_disable(); lock(&x->lock); lock(&xa->xa_lock#24); <Interrupt> lock(&x->lock);
*** DEADLOCK ***
2 locks held by charon/1337: #0: ffffffff87f8f858 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{4:4}, at: xfrm_netlink_rcv+0x5e/0x90 #1: ffff88813e0f0d48 (&x->lock){+.-.}-{3:3}, at: xfrm_state_delete+0x16/0x30
the dependencies between SOFTIRQ-irq-safe lock and the holding lock: -> (&x->lock){+.-.}-{3:3} ops: 29 { HARDIRQ-ON-W at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 xfrm_alloc_spi+0xc0/0xe60 xfrm_alloc_userspi+0x5f6/0xbc0 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 IN-SOFTIRQ-W at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 xfrm_timer_handler+0x91/0xd70 __hrtimer_run_queues+0x1dd/0xa60 hrtimer_run_softirq+0x146/0x2e0 handle_softirqs+0x266/0x860 irq_exit_rcu+0x115/0x1a0 sysvec_apic_timer_interrupt+0x6e/0x90 asm_sysvec_apic_timer_interrupt+0x16/0x20 default_idle+0x13/0x20 default_idle_call+0x67/0xa0 do_idle+0x2da/0x320 cpu_startup_entry+0x50/0x60 start_secondary+0x213/0x2a0 common_startup_64+0x129/0x138 INITIAL USE at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 xfrm_alloc_spi+0xc0/0xe60 xfrm_alloc_userspi+0x5f6/0xbc0 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 } ... key at: [<ffffffff87f9cd20>] __key.18+0x0/0x40
the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock: -> (&xa->xa_lock#24){+.+.}-{3:3} ops: 9 { HARDIRQ-ON-W at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core] xfrm_dev_state_add+0x3bb/0xd70 xfrm_add_sa+0x2451/0x4a90 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 SOFTIRQ-ON-W at: lock_acquire+0x1be/0x520 _raw_spin_lock+0x2c/0x40 xa_set_mark+0x70/0x110 mlx5e_xfrm_add_state+0xe48/0x2290 [mlx5_core] xfrm_dev_state_add+0x3bb/0xd70 xfrm_add_sa+0x2451/0x4a90 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 INITIAL USE at: lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 mlx5e_xfrm_add_state+0xc5b/0x2290 [mlx5_core] xfrm_dev_state_add+0x3bb/0xd70 xfrm_add_sa+0x2451/0x4a90 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 } ... key at: [<ffffffffa078ff60>] __key.48+0x0/0xfffffffffff210a0 [mlx5_core] ... acquired at: __lock_acquire+0x30a0/0x5040 lock_acquire+0x1be/0x520 _raw_spin_lock_bh+0x34/0x40 mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core] xfrm_dev_state_delete+0x90/0x160 __xfrm_state_delete+0x662/0xae0 xfrm_state_delete+0x1e/0x30 xfrm_del_sa+0x1c2/0x340 xfrm_user_rcv_msg+0x493/0x880 netlink_rcv_skb+0x12e/0x380 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 netlink_sendmsg+0x745/0xbe0 __sock_sendmsg+0xc5/0x190 __sys_sendto+0x1fe/0x2c0 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53
stack backtrace: CPU: 7 UID: 0 PID: 1337 Comm: charon Not tainted 6.12.0+ #4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x74/0xd0 check_irq_usage+0x12e8/0x1d90 ? print_shortest_lock_dependencies_backwards+0x1b0/0x1b0 ? check_chain_key+0x1bb/0x4c0 ? __lockdep_reset_lock+0x180/0x180 ? check_path.constprop.0+0x24/0x50 ? mark_lock+0x108/0x2fb0 ? print_circular_bug+0x9b0/0x9b0 ? mark_lock+0x108/0x2fb0 ? print_usage_bug.part.0+0x670/0x670 ? check_prev_add+0x1c4/0x2310 check_prev_add+0x1c4/0x2310 __lock_acquire+0x30a0/0x5040 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? lockdep_set_lock_cmp_fn+0x190/0x190 lock_acquire+0x1be/0x520 ? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? __xfrm_state_delete+0x5f0/0xae0 ? lock_downgrade+0x6b0/0x6b0 _raw_spin_lock_bh+0x34/0x40 ? mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core] mlx5e_xfrm_del_state+0xca/0x1e0 [mlx5_core] xfrm_dev_state_delete+0x90/0x160 __xfrm_state_delete+0x662/0xae0 xfrm_state_delete+0x1e/0x30 xfrm_del_sa+0x1c2/0x340 ? xfrm_get_sa+0x250/0x250 ? check_chain_key+0x1bb/0x4c0 xfrm_user_rcv_msg+0x493/0x880 ? copy_sec_ctx+0x270/0x270 ? check_chain_key+0x1bb/0x4c0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? lockdep_set_lock_cmp_fn+0x190/0x190 netlink_rcv_skb+0x12e/0x380 ? copy_sec_ctx+0x270/0x270 ? netlink_ack+0xd90/0xd90 ? netlink_deliver_tap+0xcd/0xb60 xfrm_netlink_rcv+0x6d/0x90 netlink_unicast+0x42f/0x740 ? netlink_attachskb+0x730/0x730 ? lock_acquire+0x1be/0x520 netlink_sendmsg+0x745/0xbe0 ? netlink_unicast+0x740/0x740 ? __might_fault+0xbb/0x170 ? netlink_unicast+0x740/0x740 __sock_sendmsg+0xc5/0x190 ? fdget+0x163/0x1d0 __sys_sendto+0x1fe/0x2c0 ? __x64_sys_getpeername+0xb0/0xb0 ? do_user_addr_fault+0x856/0xe30 ? lock_acquire+0x1be/0x520 ? __task_pid_nr_ns+0x117/0x410 ? lock_downgrade+0x6b0/0x6b0 __x64_sys_sendto+0xdc/0x1b0 ? lockdep_hardirqs_on_prepare+0x284/0x400 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f7d31291ba4 Code: 7d e8 89 4d d4 e8 4c 42 f7 ff 44 8b 4d d0 4c 8b 45 c8 89 c3 44 8b 55 d4 8b 7d e8 b8 2c 00 00 00 48 8b 55 d8 48 8b 75 e0 0f 05 <48> 3d 00 f0 ff ff 77 34 89 df 48 89 45 e8 e8 99 42 f7 ff 48 8b 45 RSP: 002b:00007f7d2ccd94f0 EFLAGS: 00000297 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7d31291ba4 RDX: 0000000000000028 RSI: 00007f7d2ccd96a0 RDI: 000000000000000a RBP: 00007f7d2ccd9530 R08: 00007f7d2ccd9598 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000028 R13: 00007f7d2ccd9598 R14: 00007f7d2ccd96a0 R15: 00000000000000e1 </TASK>
Fixes: 4c24272b4e2b ("net/mlx5e: Listen to ARP events to update IPsec L2 headers in tunnel mode") Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index ca92e518be766..21857474ad83f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -768,9 +768,12 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x, MLX5_IPSEC_RESCHED);
if (x->xso.type == XFRM_DEV_OFFLOAD_PACKET && - x->props.mode == XFRM_MODE_TUNNEL) - xa_set_mark(&ipsec->sadb, sa_entry->ipsec_obj_id, - MLX5E_IPSEC_TUNNEL_SA); + x->props.mode == XFRM_MODE_TUNNEL) { + xa_lock_bh(&ipsec->sadb); + __xa_set_mark(&ipsec->sadb, sa_entry->ipsec_obj_id, + MLX5E_IPSEC_TUNNEL_SA); + xa_unlock_bh(&ipsec->sadb); + }
out: x->xso.offload_handle = (unsigned long)sa_entry; @@ -797,7 +800,6 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x, static void mlx5e_xfrm_del_state(struct xfrm_state *x) { struct mlx5e_ipsec_sa_entry *sa_entry = to_ipsec_sa_entry(x); - struct mlx5_accel_esp_xfrm_attrs *attrs = &sa_entry->attrs; struct mlx5e_ipsec *ipsec = sa_entry->ipsec; struct mlx5e_ipsec_sa_entry *old;
@@ -806,12 +808,6 @@ static void mlx5e_xfrm_del_state(struct xfrm_state *x)
old = xa_erase_bh(&ipsec->sadb, sa_entry->ipsec_obj_id); WARN_ON(old != sa_entry); - - if (attrs->mode == XFRM_MODE_TUNNEL && - attrs->type == XFRM_DEV_OFFLOAD_PACKET) - /* Make sure that no ARP requests are running in parallel */ - flush_workqueue(ipsec->wq); - }
static void mlx5e_xfrm_free_state(struct xfrm_state *x)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Leon Romanovsky leonro@nvidia.com
[ Upstream commit 25f23524dfa227959beb3b2c2c0f38e0222f4cfa ]
All packet offloads SAs have reqid in it to make sure they have corresponding policy. While it is not strictly needed for transparent mode, it is extremely important in tunnel mode. In that mode, policy and SAs have different match criteria.
Policy catches the whole subnet addresses, and SA catches the tunnel gateways addresses. The source address of such tunnel is not known during egress packet traversal in flow steering as it is added only after successful encryption.
As reqid is required for packet offload and it is unique for every SA, we can safely rely on it only.
The output below shows the configured egress policy and SA by strongswan:
[leonro@vm ~]$ sudo ip x s src 192.169.101.2 dst 192.169.101.1 proto esp spi 0xc88b7652 reqid 1 mode tunnel replay-window 0 flag af-unspec esn aead rfc4106(gcm(aes)) 0xe406a01083986e14d116488549094710e9c57bc6 128 anti-replay esn context: seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0 replay_window 1, bitmap-length 1 00000000 crypto offload parameters: dev eth2 dir out mode packet
[leonro@064 ~]$ sudo ip x p src 192.170.0.0/16 dst 192.170.0.0/16 dir out priority 383615 ptype main tmpl src 192.169.101.2 dst 192.169.101.1 proto esp spi 0xc88b7652 reqid 1 mode tunnel crypto offload parameters: dev eth2 mode packet
Fixes: b3beba1fb404 ("net/mlx5e: Allow policies with reqid 0, to support IKE policy holes") Signed-off-by: Leon Romanovsky leonro@nvidia.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c index e51b03d4c717f..57861d34d46f8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c @@ -1718,23 +1718,21 @@ static int tx_add_rule(struct mlx5e_ipsec_sa_entry *sa_entry) goto err_alloc; }
- if (attrs->family == AF_INET) - setup_fte_addr4(spec, &attrs->saddr.a4, &attrs->daddr.a4); - else - setup_fte_addr6(spec, attrs->saddr.a6, attrs->daddr.a6); - setup_fte_no_frags(spec); setup_fte_upper_proto_match(spec, &attrs->upspec);
switch (attrs->type) { case XFRM_DEV_OFFLOAD_CRYPTO: + if (attrs->family == AF_INET) + setup_fte_addr4(spec, &attrs->saddr.a4, &attrs->daddr.a4); + else + setup_fte_addr6(spec, attrs->saddr.a6, attrs->daddr.a6); setup_fte_spi(spec, attrs->spi, false); setup_fte_esp(spec); setup_fte_reg_a(spec); break; case XFRM_DEV_OFFLOAD_PACKET: - if (attrs->reqid) - setup_fte_reg_c4(spec, attrs->reqid); + setup_fte_reg_c4(spec, attrs->reqid); err = setup_pkt_reformat(ipsec, attrs, &flow_act); if (err) goto err_pkt_reformat;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Leon Romanovsky leonro@nvidia.com
[ Upstream commit 7f95b0247764acd739d949ff247db4b76138e55a ]
According to RFC4303, section "3.3.3. Sequence Number Generation", the first packet sent using a given SA will contain a sequence number of 1.
This is applicable to both ESN and non-ESN mode, which was not covered in commit mentioned in Fixes line.
Fixes: 3d42c8cc67a8 ("net/mlx5e: Ensure that IPsec sequence packet number starts from 1") Signed-off-by: Leon Romanovsky leonro@nvidia.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 6 ++++++ .../mellanox/mlx5/core/en_accel/ipsec_offload.c | 11 ++++++++--- 2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index 21857474ad83f..1baf8933a07cb 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -724,6 +724,12 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x, /* check esn */ if (x->props.flags & XFRM_STATE_ESN) mlx5e_ipsec_update_esn_state(sa_entry); + else + /* According to RFC4303, section "3.3.3. Sequence Number Generation", + * the first packet sent using a given SA will contain a sequence + * number of 1. + */ + sa_entry->esn_state.esn = 1;
mlx5e_ipsec_build_accel_xfrm_attrs(sa_entry, &sa_entry->attrs);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c index 53cfa39188cb0..820debf3fbbf2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c @@ -91,8 +91,9 @@ u32 mlx5_ipsec_device_caps(struct mlx5_core_dev *mdev) EXPORT_SYMBOL_GPL(mlx5_ipsec_device_caps);
static void mlx5e_ipsec_packet_setup(void *obj, u32 pdn, - struct mlx5_accel_esp_xfrm_attrs *attrs) + struct mlx5e_ipsec_sa_entry *sa_entry) { + struct mlx5_accel_esp_xfrm_attrs *attrs = &sa_entry->attrs; void *aso_ctx;
aso_ctx = MLX5_ADDR_OF(ipsec_obj, obj, ipsec_aso); @@ -120,8 +121,12 @@ static void mlx5e_ipsec_packet_setup(void *obj, u32 pdn, * active. */ MLX5_SET(ipsec_obj, obj, aso_return_reg, MLX5_IPSEC_ASO_REG_C_4_5); - if (attrs->dir == XFRM_DEV_OFFLOAD_OUT) + if (attrs->dir == XFRM_DEV_OFFLOAD_OUT) { MLX5_SET(ipsec_aso, aso_ctx, mode, MLX5_IPSEC_ASO_INC_SN); + if (!attrs->replay_esn.trigger) + MLX5_SET(ipsec_aso, aso_ctx, mode_parameter, + sa_entry->esn_state.esn); + }
if (attrs->lft.hard_packet_limit != XFRM_INF) { MLX5_SET(ipsec_aso, aso_ctx, remove_flow_pkt_cnt, @@ -175,7 +180,7 @@ static int mlx5_create_ipsec_obj(struct mlx5e_ipsec_sa_entry *sa_entry)
res = &mdev->mlx5e_res.hw_objs; if (attrs->type == XFRM_DEV_OFFLOAD_PACKET) - mlx5e_ipsec_packet_setup(obj, res->pdn, attrs); + mlx5e_ipsec_packet_setup(obj, res->pdn, sa_entry);
err = mlx5_cmd_exec(mdev, in, sizeof(in), out, sizeof(out)); if (!err)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit a50da36562cd62b41de9bef08edbb3e8af00f118 ]
Li Li reports that casting away callback type may cause issues for CFI. Let's generate a small wrapper for each callback, to make sure compiler sees the anticipated types.
Reported-by: Li Li dualli@chromium.org Link: https://lore.kernel.org/CANBPYPjQVqmzZ4J=rVQX87a9iuwmaetULwbK_5_3YWk2eGzkaA@... Fixes: 170aafe35cb9 ("netdev: support binding dma-buf to netdevice") Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Mina Almasry almasrymina@google.com Link: https://patch.msgid.link/20250115161436.648646-1-kuba@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/netdev-genl-gen.c | 14 ++++++++++++-- tools/net/ynl/ynl-gen-c.py | 16 +++++++++++++--- 2 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c index b28424ae06d5f..8614988fc67b9 100644 --- a/net/core/netdev-genl-gen.c +++ b/net/core/netdev-genl-gen.c @@ -178,6 +178,16 @@ static const struct genl_multicast_group netdev_nl_mcgrps[] = { [NETDEV_NLGRP_PAGE_POOL] = { "page-pool", }, };
+static void __netdev_nl_sock_priv_init(void *priv) +{ + netdev_nl_sock_priv_init(priv); +} + +static void __netdev_nl_sock_priv_destroy(void *priv) +{ + netdev_nl_sock_priv_destroy(priv); +} + struct genl_family netdev_nl_family __ro_after_init = { .name = NETDEV_FAMILY_NAME, .version = NETDEV_FAMILY_VERSION, @@ -189,6 +199,6 @@ struct genl_family netdev_nl_family __ro_after_init = { .mcgrps = netdev_nl_mcgrps, .n_mcgrps = ARRAY_SIZE(netdev_nl_mcgrps), .sock_priv_size = sizeof(struct list_head), - .sock_priv_init = (void *)netdev_nl_sock_priv_init, - .sock_priv_destroy = (void *)netdev_nl_sock_priv_destroy, + .sock_priv_init = __netdev_nl_sock_priv_init, + .sock_priv_destroy = __netdev_nl_sock_priv_destroy, }; diff --git a/tools/net/ynl/ynl-gen-c.py b/tools/net/ynl/ynl-gen-c.py index 717530bc9c52e..463f1394ab971 100755 --- a/tools/net/ynl/ynl-gen-c.py +++ b/tools/net/ynl/ynl-gen-c.py @@ -2361,6 +2361,17 @@ def print_kernel_family_struct_src(family, cw): if not kernel_can_gen_family_struct(family): return
+ if 'sock-priv' in family.kernel_family: + # Generate "trampolines" to make CFI happy + cw.write_func("static void", f"__{family.c_name}_nl_sock_priv_init", + [f"{family.c_name}_nl_sock_priv_init(priv);"], + ["void *priv"]) + cw.nl() + cw.write_func("static void", f"__{family.c_name}_nl_sock_priv_destroy", + [f"{family.c_name}_nl_sock_priv_destroy(priv);"], + ["void *priv"]) + cw.nl() + cw.block_start(f"struct genl_family {family.ident_name}_nl_family __ro_after_init =") cw.p('.name\t\t= ' + family.fam_key + ',') cw.p('.version\t= ' + family.ver_key + ',') @@ -2378,9 +2389,8 @@ def print_kernel_family_struct_src(family, cw): cw.p(f'.n_mcgrps\t= ARRAY_SIZE({family.c_name}_nl_mcgrps),') if 'sock-priv' in family.kernel_family: cw.p(f'.sock_priv_size\t= sizeof({family.kernel_family["sock-priv"]}),') - # Force cast here, actual helpers take pointer to the real type. - cw.p(f'.sock_priv_init\t= (void *){family.c_name}_nl_sock_priv_init,') - cw.p(f'.sock_priv_destroy = (void *){family.c_name}_nl_sock_priv_destroy,') + cw.p(f'.sock_priv_init\t= __{family.c_name}_nl_sock_priv_init,') + cw.p(f'.sock_priv_destroy = __{family.c_name}_nl_sock_priv_destroy,') cw.block_end(';')
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu-Chun Lin eleanor15x@gmail.com
[ Upstream commit b9097e4c8bf3934e4e07e6f9b88741957fef351e ]
Delete one line break to make the format correct, resolving the following warning during a W=1 build:
drivers/gpu/drm/tests/drm_kunit_helpers.c:324: warning: bad line: for a KUnit test
Fixes: caa714f86699 ("drm/tests: helpers: Add helper for drm_display_mode_from_cea_vic()") Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202501032001.O6WY1VCW-lkp@intel.com/ Reviewed-by: Kuan-Wei Chiu visitorckw@gmail.com Tested-by: Kuan-Wei Chiu visitorckw@gmail.com Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Yu-Chun Lin eleanor15x@gmail.com Link: https://patchwork.freedesktop.org/patch/msgid/20250104165134.1695864-1-elean... Signed-off-by: Maxime Ripard mripard@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/tests/drm_kunit_helpers.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/tests/drm_kunit_helpers.c b/drivers/gpu/drm/tests/drm_kunit_helpers.c index 04a6b8cc62ac6..3c0b7824c0be3 100644 --- a/drivers/gpu/drm/tests/drm_kunit_helpers.c +++ b/drivers/gpu/drm/tests/drm_kunit_helpers.c @@ -320,8 +320,7 @@ static void kunit_action_drm_mode_destroy(void *ptr) }
/** - * drm_kunit_display_mode_from_cea_vic() - return a mode for CEA VIC - for a KUnit test + * drm_kunit_display_mode_from_cea_vic() - return a mode for CEA VIC for a KUnit test * @test: The test context object * @dev: DRM device * @video_code: CEA VIC of the mode
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ian Forbes ian.forbes@broadcom.com
[ Upstream commit cb343ded122e0bf41e4b2a9f89386296451be109 ]
Unlock BOs in reverse order. Add an acquire context so that lockdep doesn't complain.
Fixes: d6667f0ddf46 ("drm/vmwgfx: Fix handling of dumb buffers") Signed-off-by: Ian Forbes ian.forbes@broadcom.com Signed-off-by: Zack Rusin zack.rusin@broadcom.com Link: https://patchwork.freedesktop.org/patch/msgid/20241210195535.2074918-1-ian.f... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c index 10d596cb4b402..5f99f7437ae61 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c @@ -750,6 +750,7 @@ vmw_du_cursor_plane_atomic_update(struct drm_plane *plane, struct vmw_plane_state *old_vps = vmw_plane_state_to_vps(old_state); struct vmw_bo *old_bo = NULL; struct vmw_bo *new_bo = NULL; + struct ww_acquire_ctx ctx; s32 hotspot_x, hotspot_y; int ret;
@@ -769,9 +770,11 @@ vmw_du_cursor_plane_atomic_update(struct drm_plane *plane, if (du->cursor_surface) du->cursor_age = du->cursor_surface->snooper.age;
+ ww_acquire_init(&ctx, &reservation_ww_class); + if (!vmw_user_object_is_null(&old_vps->uo)) { old_bo = vmw_user_object_buffer(&old_vps->uo); - ret = ttm_bo_reserve(&old_bo->tbo, false, false, NULL); + ret = ttm_bo_reserve(&old_bo->tbo, false, false, &ctx); if (ret != 0) return; } @@ -779,9 +782,14 @@ vmw_du_cursor_plane_atomic_update(struct drm_plane *plane, if (!vmw_user_object_is_null(&vps->uo)) { new_bo = vmw_user_object_buffer(&vps->uo); if (old_bo != new_bo) { - ret = ttm_bo_reserve(&new_bo->tbo, false, false, NULL); - if (ret != 0) + ret = ttm_bo_reserve(&new_bo->tbo, false, false, &ctx); + if (ret != 0) { + if (old_bo) { + ttm_bo_unreserve(&old_bo->tbo); + ww_acquire_fini(&ctx); + } return; + } } else { new_bo = NULL; } @@ -803,10 +811,12 @@ vmw_du_cursor_plane_atomic_update(struct drm_plane *plane, hotspot_x, hotspot_y); }
- if (old_bo) - ttm_bo_unreserve(&old_bo->tbo); if (new_bo) ttm_bo_unreserve(&new_bo->tbo); + if (old_bo) + ttm_bo_unreserve(&old_bo->tbo); + + ww_acquire_fini(&ctx);
du->cursor_x = new_state->crtc_x + du->set_gui_x; du->cursor_y = new_state->crtc_y + du->set_gui_y;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ian Forbes ian.forbes@broadcom.com
[ Upstream commit b7d40627813799870e72729c6fc979a8a40d9ba6 ]
Adds a new BO param that keeps the reservation locked after creation. This removes the need to re-reserve the BO after creation which is a waste of cycles.
This also fixes a bug in vmw_prime_import_sg_table where the imported reservation is unlocked twice.
Signed-off-by: Ian Forbes ian.forbes@broadcom.com Fixes: b32233acceff ("drm/vmwgfx: Fix prime import/export") Reviewed-by: Zack Rusin zack.rusin@broadcom.com Signed-off-by: Zack Rusin zack.rusin@broadcom.com Link: https://patchwork.freedesktop.org/patch/msgid/20250110185335.15301-1-ian.for... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 3 ++- drivers/gpu/drm/vmwgfx/vmwgfx_bo.h | 3 ++- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 7 ++----- drivers/gpu/drm/vmwgfx/vmwgfx_gem.c | 1 + drivers/gpu/drm/vmwgfx/vmwgfx_shader.c | 7 ++----- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 5 ++--- 6 files changed, 11 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c index a0e433fbcba67..183cda50094cb 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c @@ -443,7 +443,8 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
if (params->pin) ttm_bo_pin(&vmw_bo->tbo); - ttm_bo_unreserve(&vmw_bo->tbo); + if (!params->keep_resv) + ttm_bo_unreserve(&vmw_bo->tbo);
return 0; } diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h index 43b5439ec9f76..c21ba7ff77368 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h @@ -56,8 +56,9 @@ struct vmw_bo_params { u32 domain; u32 busy_domain; enum ttm_bo_type bo_type; - size_t size; bool pin; + bool keep_resv; + size_t size; struct dma_resv *resv; struct sg_table *sg; }; diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c index 2825dd3149ed5..2e84e1029732d 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -401,7 +401,8 @@ static int vmw_dummy_query_bo_create(struct vmw_private *dev_priv) .busy_domain = VMW_BO_DOMAIN_SYS, .bo_type = ttm_bo_type_kernel, .size = PAGE_SIZE, - .pin = true + .pin = true, + .keep_resv = true, };
/* @@ -413,10 +414,6 @@ static int vmw_dummy_query_bo_create(struct vmw_private *dev_priv) if (unlikely(ret != 0)) return ret;
- ret = ttm_bo_reserve(&vbo->tbo, false, true, NULL); - BUG_ON(ret != 0); - vmw_bo_pin_reserved(vbo, true); - ret = ttm_bo_kmap(&vbo->tbo, 0, 1, &map); if (likely(ret == 0)) { result = ttm_kmap_obj_virtual(&map, &dummy); diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c index b9857f37ca1ac..ed5015ced3920 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gem.c @@ -206,6 +206,7 @@ struct drm_gem_object *vmw_prime_import_sg_table(struct drm_device *dev, .bo_type = ttm_bo_type_sg, .size = attach->dmabuf->size, .pin = false, + .keep_resv = true, .resv = attach->dmabuf->resv, .sg = table,
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c index a01ca3226d0af..7fb1c88bcc475 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c @@ -896,7 +896,8 @@ int vmw_compat_shader_add(struct vmw_private *dev_priv, .busy_domain = VMW_BO_DOMAIN_SYS, .bo_type = ttm_bo_type_device, .size = size, - .pin = true + .pin = true, + .keep_resv = true, };
if (!vmw_shader_id_ok(user_key, shader_type)) @@ -906,10 +907,6 @@ int vmw_compat_shader_add(struct vmw_private *dev_priv, if (unlikely(ret != 0)) goto out;
- ret = ttm_bo_reserve(&buf->tbo, false, true, NULL); - if (unlikely(ret != 0)) - goto no_reserve; - /* Map and copy shader bytecode. */ ret = ttm_bo_kmap(&buf->tbo, 0, PFN_UP(size), &map); if (unlikely(ret != 0)) { diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c index 621d98b376bbb..5553892d7c3e0 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c @@ -572,15 +572,14 @@ int vmw_bo_create_and_populate(struct vmw_private *dev_priv, .busy_domain = domain, .bo_type = ttm_bo_type_kernel, .size = bo_size, - .pin = true + .pin = true, + .keep_resv = true, };
ret = vmw_bo_create(dev_priv, &bo_params, &vbo); if (unlikely(ret != 0)) return ret;
- ret = ttm_bo_reserve(&vbo->tbo, false, true, NULL); - BUG_ON(ret != 0); ret = vmw_ttm_populate(vbo->tbo.bdev, vbo->tbo.ttm, &ctx); if (likely(ret == 0)) { struct vmw_ttm_tt *vmw_tt =
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maíra Canal mcanal@igalia.com
[ Upstream commit e4b5ccd392b92300a2b341705cc4805681094e49 ]
After a job completes, the corresponding pointer in the device must be set to NULL. Failing to do so triggers a warning when unloading the driver, as it appears the job is still active. To prevent this, assign the job pointer to NULL after completing the job, indicating the job has finished.
Fixes: 14d1d1908696 ("drm/v3d: Remove the bad signaled() implementation.") Signed-off-by: Maíra Canal mcanal@igalia.com Reviewed-by: Jose Maria Casanova Crespo jmcasanova@igalia.com Link: https://patchwork.freedesktop.org/patch/msgid/20250113154741.67520-1-mcanal@... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/v3d/v3d_irq.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c index 20bf33702c3c4..da203045df9be 100644 --- a/drivers/gpu/drm/v3d/v3d_irq.c +++ b/drivers/gpu/drm/v3d/v3d_irq.c @@ -108,6 +108,7 @@ v3d_irq(int irq, void *arg) v3d_job_update_stats(&v3d->bin_job->base, V3D_BIN); trace_v3d_bcl_irq(&v3d->drm, fence->seqno); dma_fence_signal(&fence->base); + v3d->bin_job = NULL; status = IRQ_HANDLED; }
@@ -118,6 +119,7 @@ v3d_irq(int irq, void *arg) v3d_job_update_stats(&v3d->render_job->base, V3D_RENDER); trace_v3d_rcl_irq(&v3d->drm, fence->seqno); dma_fence_signal(&fence->base); + v3d->render_job = NULL; status = IRQ_HANDLED; }
@@ -128,6 +130,7 @@ v3d_irq(int irq, void *arg) v3d_job_update_stats(&v3d->csd_job->base, V3D_CSD); trace_v3d_csd_irq(&v3d->drm, fence->seqno); dma_fence_signal(&fence->base); + v3d->csd_job = NULL; status = IRQ_HANDLED; }
@@ -165,6 +168,7 @@ v3d_hub_irq(int irq, void *arg) v3d_job_update_stats(&v3d->tfu_job->base, V3D_TFU); trace_v3d_tfu_irq(&v3d->drm, fence->seqno); dma_fence_signal(&fence->base); + v3d->tfu_job = NULL; status = IRQ_HANDLED; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Claudiu Beznea claudiu.beznea.uj@bp.renesas.com
[ Upstream commit 1f8af9712413f456849fdf3f3a782cbe099476d7 ]
The platform device named "rzg2l-usb-vbus-regulator", allocated by the rzg2l-usbphy-ctrl driver, is used to instantiate a regulator driver. This regulator driver is associated with a device tree (DT) node, which is a child of the rzg2l-usbphy-ctrl DT node. The regulator's DT node allows consumer nodes to reference the regulator and configure the regulator as needed.
Starting with commit cd7a38c40b23 ("regulator: core: do not silently ignore provided init_data") the struct regulator_dev::dev::of_node is no longer populated using of_node_get(config->of_node) if the regulator does not provide init_data. Since the rzg2l-usb-vbus-regulator does not provide init_data, this behaviour causes the of_find_regulator_by_node() function to fails, resulting in errors when attempting to request the regulator.
To fix this issue, call device_set_of_node_from_dev() for the "rzg2l-usb-vbus-regulator" platform device.
Fixes: 84fbd6198766 ("regulator: Add Renesas RZ/G2L USB VBUS regulator driver") Signed-off-by: Claudiu Beznea claudiu.beznea.uj@bp.renesas.com Reviewed-by: Biju Das biju.das.jz@bp.renesas.com Link: https://lore.kernel.org/r/20241119085554.1035881-1-claudiu.beznea.uj@bp.rene... Signed-off-by: Philipp Zabel p.zabel@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/reset/reset-rzg2l-usbphy-ctrl.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/reset/reset-rzg2l-usbphy-ctrl.c b/drivers/reset/reset-rzg2l-usbphy-ctrl.c index 1cd157f4f03b4..4e2ac1f0060c0 100644 --- a/drivers/reset/reset-rzg2l-usbphy-ctrl.c +++ b/drivers/reset/reset-rzg2l-usbphy-ctrl.c @@ -176,6 +176,7 @@ static int rzg2l_usbphy_ctrl_probe(struct platform_device *pdev) vdev->dev.parent = dev; priv->vdev = vdev;
+ device_set_of_node_from_dev(&vdev->dev, dev); error = platform_device_add(vdev); if (error) goto err_device_put;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: MD Danish Anwar danishanwar@ti.com
[ Upstream commit 202580b60229345dc2637099f10c8a8857c1fdc2 ]
PRUSS APIs in pruss_driver.h produce lots of compilation errors when CONFIG_TI_PRUSS is not set.
The errors and warnings, warning: returning 'void *' from a function with return type 'int' makes integer from pointer without a cast [-Wint-conversion] error: expected identifier or '(' before '{' token
Fix these warnings and errors by fixing the return type of pruss APIs as well as removing the misplaced semicolon from pruss_cfg_xfr_enable()
Fixes: 0211cc1e4fbb ("soc: ti: pruss: Add helper functions to set GPI mode, MII_RT_event and XFR") Signed-off-by: MD Danish Anwar danishanwar@ti.com Reviewed-by: Roger Quadros rogerq@kernel.org Link: https://lore.kernel.org/r/20241220100508.1554309-2-danishanwar@ti.com Signed-off-by: Nishanth Menon nm@ti.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/pruss_driver.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/include/linux/pruss_driver.h b/include/linux/pruss_driver.h index c9a31c567e85b..2e18fef1a2e10 100644 --- a/include/linux/pruss_driver.h +++ b/include/linux/pruss_driver.h @@ -144,32 +144,32 @@ static inline int pruss_release_mem_region(struct pruss *pruss, static inline int pruss_cfg_get_gpmux(struct pruss *pruss, enum pruss_pru_id pru_id, u8 *mux) { - return ERR_PTR(-EOPNOTSUPP); + return -EOPNOTSUPP; }
static inline int pruss_cfg_set_gpmux(struct pruss *pruss, enum pruss_pru_id pru_id, u8 mux) { - return ERR_PTR(-EOPNOTSUPP); + return -EOPNOTSUPP; }
static inline int pruss_cfg_gpimode(struct pruss *pruss, enum pruss_pru_id pru_id, enum pruss_gpi_mode mode) { - return ERR_PTR(-EOPNOTSUPP); + return -EOPNOTSUPP; }
static inline int pruss_cfg_miirt_enable(struct pruss *pruss, bool enable) { - return ERR_PTR(-EOPNOTSUPP); + return -EOPNOTSUPP; }
static inline int pruss_cfg_xfr_enable(struct pruss *pruss, enum pru_type pru_type, - bool enable); + bool enable) { - return ERR_PTR(-EOPNOTSUPP); + return -EOPNOTSUPP; }
#endif /* CONFIG_TI_PRUSS */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Joe Hattori joe@pf.is.s.u-tokyo.ac.jp
[ Upstream commit 3f8c4f5e9a57868fa107016c81165686d23325f2 ]
The reference count of the device incremented in device_initialize() is not decremented when device_add() fails. Add a put_device() call before returning from the function.
This bug was found by an experimental static analysis tool that I am developing.
Fixes: 60f68597024d ("i2c: core: Setup i2c_adapter runtime-pm before calling device_add()") Signed-off-by: Joe Hattori joe@pf.is.s.u-tokyo.ac.jp Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/i2c-core-base.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/i2c/i2c-core-base.c b/drivers/i2c/i2c-core-base.c index 7c810893bfa33..75d30861ffe21 100644 --- a/drivers/i2c/i2c-core-base.c +++ b/drivers/i2c/i2c-core-base.c @@ -1562,6 +1562,7 @@ static int i2c_register_adapter(struct i2c_adapter *adap) res = device_add(&adap->dev); if (res) { pr_err("adapter '%s': can't register device (%d)\n", adap->name, res); + put_device(&adap->dev); goto out_list; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chenyuan Yang chenyuan0y@gmail.com
[ Upstream commit 1b2128aa2d45ab20b22548dcf4b48906298ca7fd ]
The dell_uart_bl_serdev_probe() function calls devm_serdev_device_open() before setting the client ops via serdev_device_set_client_ops(). This ordering can trigger a NULL pointer dereference in the serdev controller's receive_buf handler, as it assumes serdev->ops is valid when SERPORT_ACTIVE is set.
This is similar to the issue fixed in commit 5e700b384ec1 ("platform/chrome: cros_ec_uart: properly fix race condition") where devm_serdev_device_open() was called before fully initializing the device.
Fix the race by ensuring client ops are set before enabling the port via devm_serdev_device_open().
Note, serdev_device_set_baudrate() and serdev_device_set_flow_control() calls should be after the devm_serdev_device_open() call.
Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver") Signed-off-by: Chenyuan Yang chenyuan0y@gmail.com Reviewed-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20250111180118.2274516-1-chenyuan0y@gmail.com Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/dell/dell-uart-backlight.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/platform/x86/dell/dell-uart-backlight.c b/drivers/platform/x86/dell/dell-uart-backlight.c index 3995f90add456..c45bc332af7a0 100644 --- a/drivers/platform/x86/dell/dell-uart-backlight.c +++ b/drivers/platform/x86/dell/dell-uart-backlight.c @@ -283,6 +283,9 @@ static int dell_uart_bl_serdev_probe(struct serdev_device *serdev) init_waitqueue_head(&dell_bl->wait_queue); dell_bl->dev = dev;
+ serdev_device_set_drvdata(serdev, dell_bl); + serdev_device_set_client_ops(serdev, &dell_uart_bl_serdev_ops); + ret = devm_serdev_device_open(dev, serdev); if (ret) return dev_err_probe(dev, ret, "opening UART device\n"); @@ -290,8 +293,6 @@ static int dell_uart_bl_serdev_probe(struct serdev_device *serdev) /* 9600 bps, no flow control, these are the default but set them to be sure */ serdev_device_set_baudrate(serdev, 9600); serdev_device_set_flow_control(serdev, false); - serdev_device_set_drvdata(serdev, dell_bl); - serdev_device_set_client_ops(serdev, &dell_uart_bl_serdev_ops);
get_version[0] = DELL_SOF(GET_CMD_LEN); get_version[1] = CMD_GET_VERSION;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chenyuan Yang chenyuan0y@gmail.com
[ Upstream commit 59616a91e5e74833b2008b56c66879857c616006 ]
The yt2_1380_fc_serdev_probe() function calls devm_serdev_device_open() before setting the client ops via serdev_device_set_client_ops(). This ordering can trigger a NULL pointer dereference in the serdev controller's receive_buf handler, as it assumes serdev->ops is valid when SERPORT_ACTIVE is set.
This is similar to the issue fixed in commit 5e700b384ec1 ("platform/chrome: cros_ec_uart: properly fix race condition") where devm_serdev_device_open() was called before fully initializing the device.
Fix the race by ensuring client ops are set before enabling the port via devm_serdev_device_open().
Note, serdev_device_set_baudrate() and serdev_device_set_flow_control() calls should be after the devm_serdev_device_open() call.
Fixes: b2ed33e8d486 ("platform/x86: Add lenovo-yoga-tab2-pro-1380-fastcharger driver") Signed-off-by: Chenyuan Yang chenyuan0y@gmail.com Reviewed-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20250111180951.2277757-1-chenyuan0y@gmail.com Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/lenovo-yoga-tab2-pro-1380-fastcharger.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/platform/x86/lenovo-yoga-tab2-pro-1380-fastcharger.c b/drivers/platform/x86/lenovo-yoga-tab2-pro-1380-fastcharger.c index d525bdc8ca9b3..32d9b6009c422 100644 --- a/drivers/platform/x86/lenovo-yoga-tab2-pro-1380-fastcharger.c +++ b/drivers/platform/x86/lenovo-yoga-tab2-pro-1380-fastcharger.c @@ -199,14 +199,15 @@ static int yt2_1380_fc_serdev_probe(struct serdev_device *serdev) if (ret) return ret;
+ serdev_device_set_drvdata(serdev, fc); + serdev_device_set_client_ops(serdev, &yt2_1380_fc_serdev_ops); + ret = devm_serdev_device_open(dev, serdev); if (ret) return dev_err_probe(dev, ret, "opening UART device\n");
serdev_device_set_baudrate(serdev, 600); serdev_device_set_flow_control(serdev, false); - serdev_device_set_drvdata(serdev, fc); - serdev_device_set_client_ops(serdev, &yt2_1380_fc_serdev_ops);
ret = devm_extcon_register_notifier_all(dev, fc->extcon, &fc->nb); if (ret)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Lechner dlechner@baylibre.com
[ Upstream commit e2c68cea431d65292b592c9f8446c918d45fcf78 ]
Fix several issues with division of negative numbers in the tmp513 driver.
The docs on the DIV_ROUND_CLOSEST macro explain that dividing a negative value by an unsigned type is undefined behavior. The driver was doing this in several places, i.e. data->shunt_uohms has type of u32. The actual "undefined" behavior is that it converts both values to unsigned before doing the division, for example:
int ret = DIV_ROUND_CLOSEST(-100, 3U);
results in ret == 1431655732 instead of -33.
Furthermore the MILLI macro has a type of unsigned long. Multiplying a signed long by an unsigned long results in an unsigned long.
So, we need to cast both MILLI and data data->shunt_uohms to long when using the DIV_ROUND_CLOSEST macro.
Fixes: f07f9d2467f4 ("hwmon: (tmp513) Use SI constants from units.h") Fixes: 59dfa75e5d82 ("hwmon: Add driver for Texas Instruments TMP512/513 sensor chips.") Signed-off-by: David Lechner dlechner@baylibre.com Link: https://lore.kernel.org/r/20250114-fix-si-prefix-macro-sign-bugs-v1-1-696fd8... [groeck: Drop some continuation lines] Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/tmp513.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/hwmon/tmp513.c b/drivers/hwmon/tmp513.c index 1c2cb12071b80..5acbfd7d088dd 100644 --- a/drivers/hwmon/tmp513.c +++ b/drivers/hwmon/tmp513.c @@ -207,7 +207,8 @@ static int tmp51x_get_value(struct tmp51x_data *data, u8 reg, u8 pos, *val = sign_extend32(regval, reg == TMP51X_SHUNT_CURRENT_RESULT ? 16 - tmp51x_get_pga_shift(data) : 15); - *val = DIV_ROUND_CLOSEST(*val * 10 * MILLI, data->shunt_uohms); + *val = DIV_ROUND_CLOSEST(*val * 10 * (long)MILLI, (long)data->shunt_uohms); + break; case TMP51X_BUS_VOLTAGE_RESULT: case TMP51X_BUS_VOLTAGE_H_LIMIT: @@ -223,7 +224,7 @@ static int tmp51x_get_value(struct tmp51x_data *data, u8 reg, u8 pos, case TMP51X_BUS_CURRENT_RESULT: // Current = (ShuntVoltage * CalibrationRegister) / 4096 *val = sign_extend32(regval, 15) * (long)data->curr_lsb_ua; - *val = DIV_ROUND_CLOSEST(*val, MILLI); + *val = DIV_ROUND_CLOSEST(*val, (long)MILLI); break; case TMP51X_LOCAL_TEMP_RESULT: case TMP51X_REMOTE_TEMP_RESULT_1: @@ -263,7 +264,7 @@ static int tmp51x_set_value(struct tmp51x_data *data, u8 reg, long val) * The user enter current value and we convert it to * voltage. 1lsb = 10uV */ - val = DIV_ROUND_CLOSEST(val * data->shunt_uohms, 10 * MILLI); + val = DIV_ROUND_CLOSEST(val * (long)data->shunt_uohms, 10 * (long)MILLI); max_val = U16_MAX >> tmp51x_get_pga_shift(data); regval = clamp_val(val, -max_val, max_val); break;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pratyush Yadav pratyush@kernel.org
[ Upstream commit d15638bf76ad47874ecb5dc386f0945fc0b2a875 ]
This reverts commit 98d1fb94ce75f39febd456d6d3cbbe58b6678795.
The commit uses data nbits instead of addr nbits for dummy phase. This causes a regression for all boards where spi-tx-bus-width is smaller than spi-rx-bus-width. It is a common pattern for boards to have spi-tx-bus-width == 1 and spi-rx-bus-width > 1. The regression causes all reads with a dummy phase to become unavailable for such boards, leading to a usually slower 0-dummy-cycle read being selected.
Most controllers' supports_op hooks call spi_mem_default_supports_op(). In spi_mem_default_supports_op(), spi_mem_check_buswidth() is called to check if the buswidths for the op can actually be supported by the board's wiring. This wiring information comes from (among other things) the spi-{tx,rx}-bus-width DT properties. Based on these properties, SPI_TX_* or SPI_RX_* flags are set by of_spi_parse_dt(). spi_mem_check_buswidth() then uses these flags to make the decision whether an op can be supported by the board's wiring (in a way, indirectly checking against spi-{rx,tx}-bus-width).
Now the tricky bit here is that spi_mem_check_buswidth() does:
if (op->dummy.nbytes && spi_check_buswidth_req(mem, op->dummy.buswidth, true)) return false;
The true argument to spi_check_buswidth_req() means the op is treated as a TX op. For a board that has say 1-bit TX and 4-bit RX, a 4-bit dummy TX is considered as unsupported, and the op gets rejected.
The commit being reverted uses the data buswidth for dummy buswidth. So for reads, the RX buswidth gets used for the dummy phase, uncovering this issue. In reality, a dummy phase is neither RX nor TX. As the name suggests, these are just dummy cycles that send or receive no data, and thus don't really need to have any buswidth at all.
Ideally, dummy phases should not be checked against the board's wiring capabilities at all, and should only be sanity-checked for having a sane buswidth value. Since we are now at rc7 and such a change might introduce many unexpected bugs, revert the commit for now. It can be sent out later along with the spi_mem_check_buswidth() fix.
Fixes: 98d1fb94ce75 ("mtd: spi-nor: core: replace dummy buswidth from addr to data") Reported-by: Alexander Stein alexander.stein@ew.tq-group.com Closes: https://lore.kernel.org/linux-mtd/3342163.44csPzL39Z@steina-w/ Tested-by: Alexander Stein alexander.stein@ew.tq-group.com Reviewed-by: Tudor Ambarus tudor.ambarus@linaro.org Signed-off-by: Pratyush Yadav pratyush@kernel.org Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mtd/spi-nor/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c index 8c57df44c40fe..9d6e85bf227b9 100644 --- a/drivers/mtd/spi-nor/core.c +++ b/drivers/mtd/spi-nor/core.c @@ -89,7 +89,7 @@ void spi_nor_spimem_setup_op(const struct spi_nor *nor, op->addr.buswidth = spi_nor_get_protocol_addr_nbits(proto);
if (op->dummy.nbytes) - op->dummy.buswidth = spi_nor_get_protocol_data_nbits(proto); + op->dummy.buswidth = spi_nor_get_protocol_addr_nbits(proto);
if (op->data.nbytes) op->data.buswidth = spi_nor_get_protocol_data_nbits(proto);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wolfram Sang wsa+renesas@sang-engineering.com
[ Upstream commit ca89f73394daf92779ddaa37b42956f4953f3941 ]
When misconfigured, the initial setup of the current mux channel can fail, too. It must be checked as well.
Fixes: 50a5ba876908 ("i2c: mux: demux-pinctrl: add driver") Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/muxes/i2c-demux-pinctrl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/i2c/muxes/i2c-demux-pinctrl.c b/drivers/i2c/muxes/i2c-demux-pinctrl.c index 7e2686b606c04..cec7f3447e193 100644 --- a/drivers/i2c/muxes/i2c-demux-pinctrl.c +++ b/drivers/i2c/muxes/i2c-demux-pinctrl.c @@ -261,7 +261,9 @@ static int i2c_demux_pinctrl_probe(struct platform_device *pdev) pm_runtime_no_callbacks(&pdev->dev);
/* switch to first parent as active master */ - i2c_demux_activate_master(priv, 0); + err = i2c_demux_activate_master(priv, 0); + if (err) + goto err_rollback;
err = device_create_file(&pdev->dev, &dev_attr_available_masters); if (err)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wolfram Sang wsa+renesas@sang-engineering.com
[ Upstream commit 093f70c134f70e4632b295240f07d2b50b74e247 ]
When this controller is a target, the NACK handling had two issues. First, the return value from the backend was not checked on the initial WRITE_REQUESTED. So, the driver missed to send a NACK in this case. Also, the NACK always arrives one byte late on the bus, even in the WRITE_RECEIVED case. This seems to be a HW issue. We should then not rely on the backend to correctly NACK the superfluous byte as well. Fix both issues by introducing a flag which gets set whenever the backend requests a NACK and keep sending it until we get a STOP condition.
Fixes: de20d1857dd6 ("i2c: rcar: add slave support") Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-rcar.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/drivers/i2c/busses/i2c-rcar.c b/drivers/i2c/busses/i2c-rcar.c index 9267df38c2d0a..3991224148214 100644 --- a/drivers/i2c/busses/i2c-rcar.c +++ b/drivers/i2c/busses/i2c-rcar.c @@ -130,6 +130,8 @@ #define ID_P_PM_BLOCKED BIT(31) #define ID_P_MASK GENMASK(31, 27)
+#define ID_SLAVE_NACK BIT(0) + enum rcar_i2c_type { I2C_RCAR_GEN1, I2C_RCAR_GEN2, @@ -166,6 +168,7 @@ struct rcar_i2c_priv { int irq;
struct i2c_client *host_notify_client; + u8 slave_flags; };
#define rcar_i2c_priv_to_dev(p) ((p)->adap.dev.parent) @@ -655,6 +658,7 @@ static bool rcar_i2c_slave_irq(struct rcar_i2c_priv *priv) { u32 ssr_raw, ssr_filtered; u8 value; + int ret;
ssr_raw = rcar_i2c_read(priv, ICSSR) & 0xff; ssr_filtered = ssr_raw & rcar_i2c_read(priv, ICSIER); @@ -670,7 +674,10 @@ static bool rcar_i2c_slave_irq(struct rcar_i2c_priv *priv) rcar_i2c_write(priv, ICRXTX, value); rcar_i2c_write(priv, ICSIER, SDE | SSR | SAR); } else { - i2c_slave_event(priv->slave, I2C_SLAVE_WRITE_REQUESTED, &value); + ret = i2c_slave_event(priv->slave, I2C_SLAVE_WRITE_REQUESTED, &value); + if (ret) + priv->slave_flags |= ID_SLAVE_NACK; + rcar_i2c_read(priv, ICRXTX); /* dummy read */ rcar_i2c_write(priv, ICSIER, SDR | SSR | SAR); } @@ -683,18 +690,21 @@ static bool rcar_i2c_slave_irq(struct rcar_i2c_priv *priv) if (ssr_filtered & SSR) { i2c_slave_event(priv->slave, I2C_SLAVE_STOP, &value); rcar_i2c_write(priv, ICSCR, SIE | SDBS); /* clear our NACK */ + priv->slave_flags &= ~ID_SLAVE_NACK; rcar_i2c_write(priv, ICSIER, SAR); rcar_i2c_write(priv, ICSSR, ~SSR & 0xff); }
/* master wants to write to us */ if (ssr_filtered & SDR) { - int ret; - value = rcar_i2c_read(priv, ICRXTX); ret = i2c_slave_event(priv->slave, I2C_SLAVE_WRITE_RECEIVED, &value); - /* Send NACK in case of error */ - rcar_i2c_write(priv, ICSCR, SIE | SDBS | (ret < 0 ? FNA : 0)); + if (ret) + priv->slave_flags |= ID_SLAVE_NACK; + + /* Send NACK in case of error, but it will come 1 byte late :( */ + rcar_i2c_write(priv, ICSCR, SIE | SDBS | + (priv->slave_flags & ID_SLAVE_NACK ? FNA : 0)); rcar_i2c_write(priv, ICSSR, ~SDR & 0xff); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wolfram Sang wsa+renesas@sang-engineering.com
[ Upstream commit 6ad30f7890423341f4b79329af1f9b9bb3cdec03 ]
This backend requests a NACK from the controller driver when it detects an error. If that request gets ignored from some reason, subsequent accesses will wrongly be handled OK. To fix this, an error now changes the state machine, so the backend will report NACK until a STOP condition has been detected. This make the driver more robust against controllers which will sadly apply the NACK not to the current byte but the next one.
Fixes: a8335c64c5f0 ("i2c: add slave testunit driver") Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/i2c-slave-testunit.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/drivers/i2c/i2c-slave-testunit.c b/drivers/i2c/i2c-slave-testunit.c index 9fe3150378e86..7ae0c7902f670 100644 --- a/drivers/i2c/i2c-slave-testunit.c +++ b/drivers/i2c/i2c-slave-testunit.c @@ -38,6 +38,7 @@ enum testunit_regs {
enum testunit_flags { TU_FLAG_IN_PROCESS, + TU_FLAG_NACK, };
struct testunit_data { @@ -90,8 +91,10 @@ static int i2c_slave_testunit_slave_cb(struct i2c_client *client,
switch (event) { case I2C_SLAVE_WRITE_REQUESTED: - if (test_bit(TU_FLAG_IN_PROCESS, &tu->flags)) - return -EBUSY; + if (test_bit(TU_FLAG_IN_PROCESS | TU_FLAG_NACK, &tu->flags)) { + ret = -EBUSY; + break; + }
memset(tu->regs, 0, TU_NUM_REGS); tu->reg_idx = 0; @@ -99,8 +102,10 @@ static int i2c_slave_testunit_slave_cb(struct i2c_client *client, break;
case I2C_SLAVE_WRITE_RECEIVED: - if (test_bit(TU_FLAG_IN_PROCESS, &tu->flags)) - return -EBUSY; + if (test_bit(TU_FLAG_IN_PROCESS | TU_FLAG_NACK, &tu->flags)) { + ret = -EBUSY; + break; + }
if (tu->reg_idx < TU_NUM_REGS) tu->regs[tu->reg_idx] = *val; @@ -129,6 +134,8 @@ static int i2c_slave_testunit_slave_cb(struct i2c_client *client, * here because we still need them in the workqueue! */ tu->reg_idx = 0; + + clear_bit(TU_FLAG_NACK, &tu->flags); break;
case I2C_SLAVE_READ_PROCESSED: @@ -151,6 +158,10 @@ static int i2c_slave_testunit_slave_cb(struct i2c_client *client, break; }
+ /* If an error occurred somewhen, we NACK everything until next STOP */ + if (ret) + set_bit(TU_FLAG_NACK, &tu->flags); + return ret; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Lechner dlechner@baylibre.com
[ Upstream commit e9b24deb84863c5a77dda5be57b6cb5bf4127b85 ]
Fix use of DIV_ROUND_CLOSEST where a possibly negative value is divided by an unsigned type by casting the unsigned type to the signed type of the same size (st->r_sense_uohm[channel] has type of u32).
The docs on the DIV_ROUND_CLOSEST macro explain that dividing a negative value by an unsigned type is undefined behavior. The actual behavior is that it converts both values to unsigned before doing the division, for example:
int ret = DIV_ROUND_CLOSEST(-100, 3U);
results in ret == 1431655732 instead of -33.
Fixes: 2b9ea4262ae9 ("hwmon: Add driver for ltc2991") Signed-off-by: David Lechner dlechner@baylibre.com Link: https://lore.kernel.org/r/20250115-hwmon-ltc2991-fix-div-round-closest-v1-1-... Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/ltc2991.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwmon/ltc2991.c b/drivers/hwmon/ltc2991.c index 7ca139e4b6aff..6d5d4cb846daf 100644 --- a/drivers/hwmon/ltc2991.c +++ b/drivers/hwmon/ltc2991.c @@ -125,7 +125,7 @@ static int ltc2991_get_curr(struct ltc2991_state *st, u32 reg, int channel,
/* Vx-Vy, 19.075uV/LSB */ *val = DIV_ROUND_CLOSEST(sign_extend32(reg_val, 14) * 19075, - st->r_sense_uohm[channel]); + (s32)st->r_sense_uohm[channel]);
return 0; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
[ Upstream commit fa2f9906a7b333ba757a7dbae0713d8a5396186e ]
When shutting down the server in cifs_put_tcp_session(), cifsd thread might be reconnecting to multiple DFS targets before it realizes it should exit the loop, so @server->hostname can't be freed as long as cifsd thread isn't done. Otherwise the following can happen:
RIP: 0010:__slab_free+0x223/0x3c0 Code: 5e 41 5f c3 cc cc cc cc 4c 89 de 4c 89 cf 44 89 44 24 08 4c 89 1c 24 e8 fb cf 8e 00 44 8b 44 24 08 4c 8b 1c 24 e9 5f fe ff ff <0f> 0b 41 f7 45 08 00 0d 21 00 0f 85 2d ff ff ff e9 1f ff ff ff 80 RSP: 0018:ffffb26180dbfd08 EFLAGS: 00010246 RAX: ffff8ea34728e510 RBX: ffff8ea34728e500 RCX: 0000000000800068 RDX: 0000000000800068 RSI: 0000000000000000 RDI: ffff8ea340042400 RBP: ffffe112041ca380 R08: 0000000000000001 R09: 0000000000000000 R10: 6170732e31303000 R11: 70726f632e786563 R12: ffff8ea34728e500 R13: ffff8ea340042400 R14: ffff8ea34728e500 R15: 0000000000800068 FS: 0000000000000000(0000) GS:ffff8ea66fd80000(0000) 000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffc25376080 CR3: 000000012a2ba001 CR4: PKRU: 55555554 Call Trace: <TASK> ? show_trace_log_lvl+0x1c4/0x2df ? show_trace_log_lvl+0x1c4/0x2df ? __reconnect_target_unlocked+0x3e/0x160 [cifs] ? __die_body.cold+0x8/0xd ? die+0x2b/0x50 ? do_trap+0xce/0x120 ? __slab_free+0x223/0x3c0 ? do_error_trap+0x65/0x80 ? __slab_free+0x223/0x3c0 ? exc_invalid_op+0x4e/0x70 ? __slab_free+0x223/0x3c0 ? asm_exc_invalid_op+0x16/0x20 ? __slab_free+0x223/0x3c0 ? extract_hostname+0x5c/0xa0 [cifs] ? extract_hostname+0x5c/0xa0 [cifs] ? __kmalloc+0x4b/0x140 __reconnect_target_unlocked+0x3e/0x160 [cifs] reconnect_dfs_server+0x145/0x430 [cifs] cifs_handle_standard+0x1ad/0x1d0 [cifs] cifs_demultiplex_thread+0x592/0x730 [cifs] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs] kthread+0xdd/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK>
Fixes: 7be3248f3139 ("cifs: To match file servers, make sure the server hostname matches") Reported-by: Jay Shin jaeshin@redhat.com Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/connect.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index fe40152b915d8..fb51cdf552061 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -1044,6 +1044,7 @@ clean_demultiplex_info(struct TCP_Server_Info *server) /* Release netns reference for this server. */ put_net(cifs_net_ns(server)); kfree(server->leaf_fullpath); + kfree(server->hostname); kfree(server);
length = atomic_dec_return(&tcpSesAllocCount); @@ -1670,8 +1671,6 @@ cifs_put_tcp_session(struct TCP_Server_Info *server, int from_reconnect) kfree_sensitive(server->session_key.response); server->session_key.response = NULL; server->session_key.len = 0; - kfree(server->hostname); - server->hostname = NULL;
task = xchg(&server->tsk, NULL); if (task)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lizhi Xu lizhi.xu@windriver.com
[ Upstream commit eb09fbeb48709fe66c0d708aed81e910a577a30a ]
syzkaller reported a corrupted list in ieee802154_if_remove. [1]
Remove an IEEE 802.15.4 network interface after unregister an IEEE 802.15.4 hardware device from the system.
CPU0 CPU1 ==== ==== genl_family_rcv_msg_doit ieee802154_unregister_hw ieee802154_del_iface ieee802154_remove_interfaces rdev_del_virtual_intf_deprecated list_del(&sdata->list) ieee802154_if_remove list_del_rcu
The net device has been unregistered, since the rcu grace period, unregistration must be run before ieee802154_if_remove.
To avoid this issue, add a check for local->interfaces before deleting sdata list.
[1] kernel BUG at lib/list_debug.c:58! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 0 UID: 0 PID: 6277 Comm: syz-executor157 Not tainted 6.12.0-rc6-syzkaller-00005-g557329bcecc2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 RIP: 0010:__list_del_entry_valid_or_report+0xf4/0x140 lib/list_debug.c:56 Code: e8 a1 7e 00 07 90 0f 0b 48 c7 c7 e0 37 60 8c 4c 89 fe e8 8f 7e 00 07 90 0f 0b 48 c7 c7 40 38 60 8c 4c 89 fe e8 7d 7e 00 07 90 <0f> 0b 48 c7 c7 a0 38 60 8c 4c 89 fe e8 6b 7e 00 07 90 0f 0b 48 c7 RSP: 0018:ffffc9000490f3d0 EFLAGS: 00010246 RAX: 000000000000004e RBX: dead000000000122 RCX: d211eee56bb28d00 RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000 RBP: ffff88805b278dd8 R08: ffffffff8174a12c R09: 1ffffffff2852f0d R10: dffffc0000000000 R11: fffffbfff2852f0e R12: dffffc0000000000 R13: dffffc0000000000 R14: dead000000000100 R15: ffff88805b278cc0 FS: 0000555572f94380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000056262e4a3000 CR3: 0000000078496000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __list_del_entry_valid include/linux/list.h:124 [inline] __list_del_entry include/linux/list.h:215 [inline] list_del_rcu include/linux/rculist.h:157 [inline] ieee802154_if_remove+0x86/0x1e0 net/mac802154/iface.c:687 rdev_del_virtual_intf_deprecated net/ieee802154/rdev-ops.h:24 [inline] ieee802154_del_iface+0x2c0/0x5c0 net/ieee802154/nl-phy.c:323 genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline] genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline] genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2551 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219 netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline] netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1357 netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:729 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:744 ____sys_sendmsg+0x52a/0x7e0 net/socket.c:2607 ___sys_sendmsg net/socket.c:2661 [inline] __sys_sendmsg+0x292/0x380 net/socket.c:2690 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Reported-and-tested-by: syzbot+985f827280dc3a6e7e92@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=985f827280dc3a6e7e92 Signed-off-by: Lizhi Xu lizhi.xu@windriver.com Reviewed-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/20241113095129.1457225-1-lizhi.xu@windriver.com Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac802154/iface.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/mac802154/iface.c b/net/mac802154/iface.c index c0e2da5072bea..9e4631fade90c 100644 --- a/net/mac802154/iface.c +++ b/net/mac802154/iface.c @@ -684,6 +684,10 @@ void ieee802154_if_remove(struct ieee802154_sub_if_data *sdata) ASSERT_RTNL();
mutex_lock(&sdata->local->iflist_mtx); + if (list_empty(&sdata->local->interfaces)) { + mutex_unlock(&sdata->local->iflist_mtx); + return; + } list_del_rcu(&sdata->list); mutex_unlock(&sdata->local->iflist_mtx);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Leo Stone leocstone@gmail.com
[ Upstream commit b905bafdea21a75d75a96855edd9e0b6051eee30 ]
In the syzbot reproducer, the hfs_cat_rec for the root dir has type HFS_CDR_FIL after being read with hfs_bnode_read() in hfs_super_fill(). This indicates it should be used as an hfs_cat_file, which is 102 bytes. Only the first 70 bytes of that struct are initialized, however, because the entrylength passed into hfs_bnode_read() is still the length of a directory record. This causes uninitialized values to be used later on, when the hfs_cat_rec union is treated as the larger hfs_cat_file struct.
Add a check to make sure the retrieved record has the correct type for the root directory (HFS_CDR_DIR), and make sure we load the correct number of bytes for a directory record.
Reported-by: syzbot+2db3c7526ba68f4ea776@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=2db3c7526ba68f4ea776 Tested-by: syzbot+2db3c7526ba68f4ea776@syzkaller.appspotmail.com Tested-by: Leo Stone leocstone@gmail.com Signed-off-by: Leo Stone leocstone@gmail.com Link: https://lore.kernel.org/r/20241201051420.77858-1-leocstone@gmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/hfs/super.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/hfs/super.c b/fs/hfs/super.c index eeac99765f0d6..cf13b5cc10848 100644 --- a/fs/hfs/super.c +++ b/fs/hfs/super.c @@ -419,11 +419,13 @@ static int hfs_fill_super(struct super_block *sb, void *data, int silent) goto bail_no_root; res = hfs_cat_find_brec(sb, HFS_ROOT_CNID, &fd); if (!res) { - if (fd.entrylength > sizeof(rec) || fd.entrylength < 0) { + if (fd.entrylength != sizeof(rec.dir)) { res = -EIO; goto bail_hfs_find; } hfs_bnode_read(fd.bnode, &rec, fd.entryoffset, fd.entrylength); + if (rec.type != HFS_CDR_DIR) + res = -EIO; } if (res) goto bail_hfs_find;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Brahmajit Das brahmajit.xyz@gmail.com
[ Upstream commit 989e0cdc0f18a594b25cabc60426d29659aeaf58 ]
qnx6_checkroot() had been using weirdly spelled initializer - it needed to initialize 3-element arrays of char and it used NUL-padded 3-character string literals (i.e. 4-element initializers, with completely pointless zeroes at the end).
That had been spotted by gcc-15[*]; prior to that gcc quietly dropped the 4th element of initializers.
However, none of that had been needed in the first place - all this array is used for is checking that the first directory entry in root directory is "." and the second - "..". The check had been expressed as a loop, using that match_root[] array. Since there is no chance that we ever want to extend that list of entries, the entire thing is much too fancy for its own good; what we need is just a couple of explicit memcmp() and that's it.
[*]: fs/qnx6/inode.c: In function ‘qnx6_checkroot’: fs/qnx6/inode.c:182:41: error: initializer-string for array of ‘char’ is too long [-Werror=unterminated-string-initialization] 182 | static char match_root[2][3] = {".\0\0", "..\0"}; | ^~~~~~~ fs/qnx6/inode.c:182:50: error: initializer-string for array of ‘char’ is too long [-Werror=unterminated-string-initialization] 182 | static char match_root[2][3] = {".\0\0", "..\0"}; | ^~~~~~
Signed-off-by: Brahmajit Das brahmajit.xyz@gmail.com Link: https://lore.kernel.org/r/20241004195132.1393968-1-brahmajit.xyz@gmail.com Acked-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/qnx6/inode.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/fs/qnx6/inode.c b/fs/qnx6/inode.c index 85925ec0051a9..3310d1ad4d0e9 100644 --- a/fs/qnx6/inode.c +++ b/fs/qnx6/inode.c @@ -179,8 +179,7 @@ static int qnx6_statfs(struct dentry *dentry, struct kstatfs *buf) */ static const char *qnx6_checkroot(struct super_block *s) { - static char match_root[2][3] = {".\0\0", "..\0"}; - int i, error = 0; + int error = 0; struct qnx6_dir_entry *dir_entry; struct inode *root = d_inode(s->s_root); struct address_space *mapping = root->i_mapping; @@ -189,11 +188,9 @@ static const char *qnx6_checkroot(struct super_block *s) if (IS_ERR(folio)) return "error reading root directory"; dir_entry = kmap_local_folio(folio, 0); - for (i = 0; i < 2; i++) { - /* maximum 3 bytes - due to match_root limitation */ - if (strncmp(dir_entry[i].de_fname, match_root[i], 3)) - error = 1; - } + if (memcmp(dir_entry[0].de_fname, ".", 2) || + memcmp(dir_entry[1].de_fname, "..", 3)) + error = 1; folio_release_kmap(folio, dir_entry); if (error) return "error reading root directory.";
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Kunbo zhangkunbo@huawei.com
[ Upstream commit 2b2fc0be98a828cf33a88a28e9745e8599fb05cf ]
fs/file.c should include include/linux/init_task.h for declaration of init_files. This fixes the sparse warning:
fs/file.c:501:21: warning: symbol 'init_files' was not declared. Should it be static?
Signed-off-by: Zhang Kunbo zhangkunbo@huawei.com Link: https://lore.kernel.org/r/20241217071836.2634868-1-zhangkunbo@huawei.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/file.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/file.c b/fs/file.c index eb093e7369720..4cb952541dd03 100644 --- a/fs/file.c +++ b/fs/file.c @@ -21,6 +21,7 @@ #include <linux/rcupdate.h> #include <linux/close_range.h> #include <net/sock.h> +#include <linux/init_task.h>
#include "internal.h"
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
[ Upstream commit 973b710b8821c3401ad7a25360c89e94b26884ac ]
Tell tar to ignore silly-rename files (".__afs*" and ".nfs*") when building the header archive. These occur when a file that is open is unlinked locally, but hasn't yet been closed. Such files are visible to the user via the getdents() syscall and so programs may want to do things with them.
During the kernel build, such files may be made during the processing of header files and the cleanup may get deferred by fput() which may result in tar seeing these files when it reads the directory, but they may have disappeared by the time it tries to open them, causing tar to fail with an error. Further, we don't want to include them in the tarball if they still exist.
With CONFIG_HEADERS_INSTALL=y, something like the following may be seen:
find: './kernel/.tmp_cpio_dir/include/dt-bindings/reset/.__afs2080': No such file or directory tar: ./include/linux/greybus/.__afs3C95: File removed before we read it
The find warning doesn't seem to cause a problem.
Fix this by telling tar when called from in gen_kheaders.sh to exclude such files. This only affects afs and nfs; cifs uses the Windows Hidden attribute to prevent the file from being seen.
Signed-off-by: David Howells dhowells@redhat.com Link: https://lore.kernel.org/r/20241213135013.2964079-2-dhowells@redhat.com cc: Masahiro Yamada masahiroy@kernel.org cc: Marc Dionne marc.dionne@auristor.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-kernel@vger.kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/gen_kheaders.sh | 1 + 1 file changed, 1 insertion(+)
diff --git a/kernel/gen_kheaders.sh b/kernel/gen_kheaders.sh index 383fd43ac6122..7e1340da5acae 100755 --- a/kernel/gen_kheaders.sh +++ b/kernel/gen_kheaders.sh @@ -89,6 +89,7 @@ find $cpio_dir -type f -print0 |
# Create archive and try to normalize metadata for reproducibility. tar "${KBUILD_BUILD_TIMESTAMP:+--mtime=$KBUILD_BUILD_TIMESTAMP}" \ + --exclude=".__afs*" --exclude=".nfs*" \ --owner=0 --group=0 --sort=name --numeric-owner --mode=u=rw,go=r,a+X \ -I $XZ -cf $tarfile -C $cpio_dir/ . > /dev/null
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
[ Upstream commit c8b90d40d5bba8e6fba457b8a7c10d3c0d467e37 ]
When a read subrequest finishes, if it doesn't have sufficient coverage to complete the folio(s) covering either side of it, it will donate the excess coverage to the adjacent subrequests on either side, offloading responsibility for unlocking the folio(s) covered to them.
Now, preference is given to donating down to a lower file offset over donating up because that check is done first - but there's no check that the lower subreq is actually contiguous, and so we can end up donating incorrectly.
The scenario seen[1] is that an 8MiB readahead request spanning four 2MiB folios is split into eight 1MiB subreqs (numbered 1 through 8). These terminate in the order 1,6,2,5,3,7,4,8. What happens is:
- 1 donates to 2 - 6 donates to 5 - 2 completes, unlocking the first folio (with 1). - 5 completes, unlocking the third folio (with 6). - 3 donates to 4 - 7 donates to 4 incorrectly - 4 completes, unlocking the second folio (with 3), but can't use the excess from 7. - 8 donates to 4, also incorrectly.
Fix this by preventing downward donation if the subreqs are not contiguous (in the example above, 7 donates to 4 across the gap left by 5 and 6).
Reported-by: Shyam Prasad N nspmangalore@gmail.com Closes: https://lore.kernel.org/r/CANT5p=qBwjBm-D8soFVVtswGEfmMtQXVW83=TNfUtvyHeFQZB... Signed-off-by: David Howells dhowells@redhat.com Link: https://lore.kernel.org/r/526707.1733224486@warthog.procyon.org.uk/ [1] Link: https://lore.kernel.org/r/20241213135013.2964079-3-dhowells@redhat.com cc: Steve French sfrench@samba.org cc: Paulo Alcantara pc@manguebit.com cc: Jeff Layton jlayton@kernel.org cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/netfs/read_collect.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c index e70eb4ea21c03..a44132c986538 100644 --- a/fs/netfs/read_collect.c +++ b/fs/netfs/read_collect.c @@ -249,16 +249,17 @@ static bool netfs_consume_read_data(struct netfs_io_subrequest *subreq, bool was
/* Deal with the trickiest case: that this subreq is in the middle of a * folio, not touching either edge, but finishes first. In such a - * case, we donate to the previous subreq, if there is one, so that the - * donation is only handled when that completes - and remove this - * subreq from the list. + * case, we donate to the previous subreq, if there is one and if it is + * contiguous, so that the donation is only handled when that completes + * - and remove this subreq from the list. * * If the previous subreq finished first, we will have acquired their * donation and should be able to unlock folios and/or donate nextwards. */ if (!subreq->consumed && !prev_donated && - !list_is_first(&subreq->rreq_link, &rreq->subrequests)) { + !list_is_first(&subreq->rreq_link, &rreq->subrequests) && + subreq->start == prev->start + prev->len) { prev = list_prev_entry(subreq, rreq_link); WRITE_ONCE(prev->next_donated, prev->next_donated + subreq->len); subreq->start += subreq->len;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Max Kellermann max.kellermann@ionos.com
[ Upstream commit e5a8b6446c0d370716f193771ccacf3260a57534 ]
Instead of storing an opaque string, call security_secctx_to_secid() right in the "secctx" command handler and store only the numeric "secid". This eliminates an unnecessary string allocation and allows the daemon to receive errors when writing the "secctx" command instead of postponing the error to the "bind" command handler. For example, if the kernel was built without `CONFIG_SECURITY`, "bind" will return `EOPNOTSUPP`, but the daemon doesn't know why. With this patch, the "secctx" will instead return `EOPNOTSUPP` which is the right context for this error.
This patch adds a boolean flag `have_secid` because I'm not sure if we can safely assume that zero is the special secid value for "not set". This appears to be true for SELinux, Smack and AppArmor, but since this attribute is not documented, I'm unable to derive a stable guarantee for that.
Signed-off-by: Max Kellermann max.kellermann@ionos.com Signed-off-by: David Howells dhowells@redhat.com Link: https://lore.kernel.org/r/20241209141554.638708-1-max.kellermann@ionos.com/ Link: https://lore.kernel.org/r/20241213135013.2964079-6-dhowells@redhat.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cachefiles/daemon.c | 14 +++++++------- fs/cachefiles/internal.h | 3 ++- fs/cachefiles/security.c | 6 +++--- 3 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c index 89b11336a8369..1806bff8e59bc 100644 --- a/fs/cachefiles/daemon.c +++ b/fs/cachefiles/daemon.c @@ -15,6 +15,7 @@ #include <linux/namei.h> #include <linux/poll.h> #include <linux/mount.h> +#include <linux/security.h> #include <linux/statfs.h> #include <linux/ctype.h> #include <linux/string.h> @@ -576,7 +577,7 @@ static int cachefiles_daemon_dir(struct cachefiles_cache *cache, char *args) */ static int cachefiles_daemon_secctx(struct cachefiles_cache *cache, char *args) { - char *secctx; + int err;
_enter(",%s", args);
@@ -585,16 +586,16 @@ static int cachefiles_daemon_secctx(struct cachefiles_cache *cache, char *args) return -EINVAL; }
- if (cache->secctx) { + if (cache->have_secid) { pr_err("Second security context specified\n"); return -EINVAL; }
- secctx = kstrdup(args, GFP_KERNEL); - if (!secctx) - return -ENOMEM; + err = security_secctx_to_secid(args, strlen(args), &cache->secid); + if (err) + return err;
- cache->secctx = secctx; + cache->have_secid = true; return 0; }
@@ -820,7 +821,6 @@ static void cachefiles_daemon_unbind(struct cachefiles_cache *cache) put_cred(cache->cache_cred);
kfree(cache->rootdirname); - kfree(cache->secctx); kfree(cache->tag);
_leave(""); diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 7b99bd98de75b..38c236e38cef8 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -122,7 +122,6 @@ struct cachefiles_cache { #define CACHEFILES_STATE_CHANGED 3 /* T if state changed (poll trigger) */ #define CACHEFILES_ONDEMAND_MODE 4 /* T if in on-demand read mode */ char *rootdirname; /* name of cache root directory */ - char *secctx; /* LSM security context */ char *tag; /* cache binding tag */ refcount_t unbind_pincount;/* refcount to do daemon unbind */ struct xarray reqs; /* xarray of pending on-demand requests */ @@ -130,6 +129,8 @@ struct cachefiles_cache { struct xarray ondemand_ids; /* xarray for ondemand_id allocation */ u32 ondemand_id_next; u32 msg_id_next; + u32 secid; /* LSM security id */ + bool have_secid; /* whether "secid" was set */ };
static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache) diff --git a/fs/cachefiles/security.c b/fs/cachefiles/security.c index fe777164f1d89..fc6611886b3b5 100644 --- a/fs/cachefiles/security.c +++ b/fs/cachefiles/security.c @@ -18,7 +18,7 @@ int cachefiles_get_security_ID(struct cachefiles_cache *cache) struct cred *new; int ret;
- _enter("{%s}", cache->secctx); + _enter("{%u}", cache->have_secid ? cache->secid : 0);
new = prepare_kernel_cred(current); if (!new) { @@ -26,8 +26,8 @@ int cachefiles_get_security_ID(struct cachefiles_cache *cache) goto error; }
- if (cache->secctx) { - ret = set_security_override_from_ctx(new, cache->secctx); + if (cache->have_secid) { + ret = set_security_override(new, cache->secid); if (ret < 0) { put_cred(new); pr_err("Security denies permission to nominate security context: error %d\n",
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org
[ Upstream commit bb9850704c043e48c86cc9df90ee102e8a338229 ]
Otherwise, the default levels will override the levels set by the host controller drivers.
Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Link: https://lore.kernel.org/r/20241219-ufs-qcom-suspend-fix-v3-2-63c4b95a70b9@li... Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ufs/core/ufshcd.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index 05b936ad353be..6cc9e61cca07d 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -10589,14 +10589,17 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq) }
/* - * Set the default power management level for runtime and system PM. + * Set the default power management level for runtime and system PM if + * not set by the host controller drivers. * Default power saving mode is to keep UFS link in Hibern8 state * and UFS device in sleep state. */ - hba->rpm_lvl = ufs_get_desired_pm_lvl_for_dev_link_state( + if (!hba->rpm_lvl) + hba->rpm_lvl = ufs_get_desired_pm_lvl_for_dev_link_state( UFS_SLEEP_PWR_MODE, UIC_LINK_HIBERN8_STATE); - hba->spm_lvl = ufs_get_desired_pm_lvl_for_dev_link_state( + if (!hba->spm_lvl) + hba->spm_lvl = ufs_get_desired_pm_lvl_for_dev_link_state( UFS_SLEEP_PWR_MODE, UIC_LINK_HIBERN8_STATE);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Koichiro Den koichiro.den@canonical.com
[ Upstream commit c7c434c1dba955005f5161dae73f09c0a922cfa7 ]
Once a virtuser device is instantiated and actively used, allowing rmdir for its configfs serves no purpose and can be confusing. Userspace interacts with the virtual consumer at arbitrary times, meaning it depends on its existence.
Make the subsystem itself depend on the configfs entry for a virtuser device while it is in active use.
Signed-off-by: Koichiro Den koichiro.den@canonical.com Link: https://lore.kernel.org/r/20250103141829.430662-4-koichiro.den@canonical.com Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-virtuser.c | 47 ++++++++++++++++++++++++++++++------ 1 file changed, 40 insertions(+), 7 deletions(-)
diff --git a/drivers/gpio/gpio-virtuser.c b/drivers/gpio/gpio-virtuser.c index d6244f0d3bc75..e89f299f21400 100644 --- a/drivers/gpio/gpio-virtuser.c +++ b/drivers/gpio/gpio-virtuser.c @@ -1546,6 +1546,30 @@ gpio_virtuser_device_deactivate(struct gpio_virtuser_device *dev) dev->pdev = NULL; }
+static void +gpio_virtuser_device_lockup_configfs(struct gpio_virtuser_device *dev, bool lock) +{ + struct configfs_subsystem *subsys = dev->group.cg_subsys; + struct gpio_virtuser_lookup_entry *entry; + struct gpio_virtuser_lookup *lookup; + + /* + * The device only needs to depend on leaf lookup entries. This is + * sufficient to lock up all the configfs entries that the + * instantiated, alive device depends on. + */ + list_for_each_entry(lookup, &dev->lookup_list, siblings) { + list_for_each_entry(entry, &lookup->entry_list, siblings) { + if (lock) + WARN_ON(configfs_depend_item_unlocked( + subsys, &entry->group.cg_item)); + else + configfs_undepend_item_unlocked( + &entry->group.cg_item); + } + } +} + static ssize_t gpio_virtuser_device_config_live_store(struct config_item *item, const char *page, size_t count) @@ -1558,15 +1582,24 @@ gpio_virtuser_device_config_live_store(struct config_item *item, if (ret) return ret;
- guard(mutex)(&dev->lock); + if (live) + gpio_virtuser_device_lockup_configfs(dev, true);
- if (live == gpio_virtuser_device_is_live(dev)) - return -EPERM; + scoped_guard(mutex, &dev->lock) { + if (live == gpio_virtuser_device_is_live(dev)) + ret = -EPERM; + else if (live) + ret = gpio_virtuser_device_activate(dev); + else + gpio_virtuser_device_deactivate(dev); + }
- if (live) - ret = gpio_virtuser_device_activate(dev); - else - gpio_virtuser_device_deactivate(dev); + /* + * Undepend is required only if device disablement (live == 0) + * succeeds or if device enablement (live == 1) fails. + */ + if (live == !!ret) + gpio_virtuser_device_lockup_configfs(dev, false);
return ret ?: count; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Koichiro Den koichiro.den@canonical.com
[ Upstream commit 8bd76b3d3f3af7ac2898b6a27ad90c444fec418f ]
Once a sim device is instantiated and actively used, allowing rmdir for its configfs serves no purpose and can be confusing. Effectively, arbitrary users start depending on its existence.
Make the subsystem itself depend on the configfs entry for a sim device while it is in active use.
Signed-off-by: Koichiro Den koichiro.den@canonical.com Link: https://lore.kernel.org/r/20250103141829.430662-5-koichiro.den@canonical.com Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-sim.c | 48 +++++++++++++++++++++++++++++++++++------ 1 file changed, 41 insertions(+), 7 deletions(-)
diff --git a/drivers/gpio/gpio-sim.c b/drivers/gpio/gpio-sim.c index dcca1d7f173e5..deedacdeb2395 100644 --- a/drivers/gpio/gpio-sim.c +++ b/drivers/gpio/gpio-sim.c @@ -1030,6 +1030,30 @@ static void gpio_sim_device_deactivate(struct gpio_sim_device *dev) dev->pdev = NULL; }
+static void +gpio_sim_device_lockup_configfs(struct gpio_sim_device *dev, bool lock) +{ + struct configfs_subsystem *subsys = dev->group.cg_subsys; + struct gpio_sim_bank *bank; + struct gpio_sim_line *line; + + /* + * The device only needs to depend on leaf line entries. This is + * sufficient to lock up all the configfs entries that the + * instantiated, alive device depends on. + */ + list_for_each_entry(bank, &dev->bank_list, siblings) { + list_for_each_entry(line, &bank->line_list, siblings) { + if (lock) + WARN_ON(configfs_depend_item_unlocked( + subsys, &line->group.cg_item)); + else + configfs_undepend_item_unlocked( + &line->group.cg_item); + } + } +} + static ssize_t gpio_sim_device_config_live_store(struct config_item *item, const char *page, size_t count) @@ -1042,14 +1066,24 @@ gpio_sim_device_config_live_store(struct config_item *item, if (ret) return ret;
- guard(mutex)(&dev->lock); + if (live) + gpio_sim_device_lockup_configfs(dev, true);
- if (live == gpio_sim_device_is_live(dev)) - ret = -EPERM; - else if (live) - ret = gpio_sim_device_activate(dev); - else - gpio_sim_device_deactivate(dev); + scoped_guard(mutex, &dev->lock) { + if (live == gpio_sim_device_is_live(dev)) + ret = -EPERM; + else if (live) + ret = gpio_sim_device_activate(dev); + else + gpio_sim_device_deactivate(dev); + } + + /* + * Undepend is required only if device disablement (live == 0) + * succeeds or if device enablement (live == 1) fails. + */ + if (live == !!ret) + gpio_sim_device_lockup_configfs(dev, false);
return ret ?: count; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit e95274dfe86490ec2a5633035c24b2de6722841f ]
After previous change rshift >= 32 is no longer allowed. Modify the test to use 31, the test doesn't seem to send any traffic so the exact value shouldn't matter.
Reviewed-by: Eric Dumazet edumazet@google.com Link: https://patch.msgid.link/20250103182458.1213486-1-kuba@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/tc-testing/tc-tests/filters/flow.json | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/flow.json b/tools/testing/selftests/tc-testing/tc-tests/filters/flow.json index 58189327f6444..383fbda07245c 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/filters/flow.json +++ b/tools/testing/selftests/tc-testing/tc-tests/filters/flow.json @@ -78,10 +78,10 @@ "setup": [ "$TC qdisc add dev $DEV1 ingress" ], - "cmdUnderTest": "$TC filter add dev $DEV1 parent ffff: handle 1 prio 1 protocol ip flow map key dst rshift 0xff", + "cmdUnderTest": "$TC filter add dev $DEV1 parent ffff: handle 1 prio 1 protocol ip flow map key dst rshift 0x1f", "expExitCode": "0", "verifyCmd": "$TC filter get dev $DEV1 parent ffff: handle 1 protocol ip prio 1 flow", - "matchPattern": "filter parent ffff: protocol ip pref 1 flow chain [0-9]+ handle 0x1 map keys dst rshift 255 baseclass", + "matchPattern": "filter parent ffff: protocol ip pref 1 flow chain [0-9]+ handle 0x1 map keys dst rshift 31 baseclass", "matchCount": "1", "teardown": [ "$TC qdisc del dev $DEV1 ingress"
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com
[ Upstream commit bee9a0838fd223823e5a6d85c055ab1691dc738e ]
Add Clearwater Forest support (INTEL_ATOM_DARKMONT_X) to tpmi_cpu_ids to support domaid id mappings.
Signed-off-by: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Link: https://lore.kernel.org/r/20250103155255.1488139-1-srinivas.pandruvada@linux... Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/intel/tpmi_power_domains.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/platform/x86/intel/tpmi_power_domains.c b/drivers/platform/x86/intel/tpmi_power_domains.c index 0609a8320f7ec..12fb0943b5dc3 100644 --- a/drivers/platform/x86/intel/tpmi_power_domains.c +++ b/drivers/platform/x86/intel/tpmi_power_domains.c @@ -81,6 +81,7 @@ static const struct x86_cpu_id tpmi_cpu_ids[] = { X86_MATCH_VFM(INTEL_GRANITERAPIDS_X, NULL), X86_MATCH_VFM(INTEL_ATOM_CRESTMONT_X, NULL), X86_MATCH_VFM(INTEL_ATOM_CRESTMONT, NULL), + X86_MATCH_VFM(INTEL_ATOM_DARKMONT_X, NULL), X86_MATCH_VFM(INTEL_GRANITERAPIDS_D, NULL), X86_MATCH_VFM(INTEL_PANTHERCOVE_X, NULL), {}
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com
[ Upstream commit cc1ff7bc1bb378e7c46992c977b605e97d908801 ]
Add Clearwater Forest (INTEL_ATOM_DARKMONT_X) to SST support list by adding to isst_cpu_ids.
Signed-off-by: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Link: https://lore.kernel.org/r/20250103155255.1488139-2-srinivas.pandruvada@linux... Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/intel/speed_select_if/isst_if_common.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/platform/x86/intel/speed_select_if/isst_if_common.c b/drivers/platform/x86/intel/speed_select_if/isst_if_common.c index 1e46e30dae966..dbcd3087aaa4b 100644 --- a/drivers/platform/x86/intel/speed_select_if/isst_if_common.c +++ b/drivers/platform/x86/intel/speed_select_if/isst_if_common.c @@ -804,6 +804,7 @@ EXPORT_SYMBOL_GPL(isst_if_cdev_unregister); static const struct x86_cpu_id isst_cpu_ids[] = { X86_MATCH_VFM(INTEL_ATOM_CRESTMONT, SST_HPM_SUPPORTED), X86_MATCH_VFM(INTEL_ATOM_CRESTMONT_X, SST_HPM_SUPPORTED), + X86_MATCH_VFM(INTEL_ATOM_DARKMONT_X, SST_HPM_SUPPORTED), X86_MATCH_VFM(INTEL_EMERALDRAPIDS_X, 0), X86_MATCH_VFM(INTEL_GRANITERAPIDS_D, SST_HPM_SUPPORTED), X86_MATCH_VFM(INTEL_GRANITERAPIDS_X, SST_HPM_SUPPORTED),
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit cd4a7b2e6a2437a5502910c08128ea3bad55a80b ]
acpi_dev_irq_override() gets called approx. 30 times during boot (15 legacy IRQs * 2 override_table entries). Of these 30 calls at max 1 will match the non DMI checks done by acpi_dev_irq_override(). The dmi_check_system() check is by far the most expensive check done by acpi_dev_irq_override(), make this call the last check done by acpi_dev_irq_override() so that it will be called at max 1 time instead of 30 times.
Signed-off-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com Link: https://patch.msgid.link/20241228165253.42584-1-hdegoede@redhat.com [ rjw: Subject edit ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/resource.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index d27a3bf96f80d..90aaec923889c 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -689,11 +689,11 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity, for (i = 0; i < ARRAY_SIZE(override_table); i++) { const struct irq_override_cmp *entry = &override_table[i];
- if (dmi_check_system(entry->system) && - entry->irq == gsi && + if (entry->irq == gsi && entry->triggering == triggering && entry->polarity == polarity && - entry->shareable == shareable) + entry->shareable == shareable && + dmi_check_system(entry->system)) return entry->override; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Henry Huang henry.hj@antgroup.com
[ Upstream commit 30dd3b13f9de612ef7328ccffcf1a07d0d40ab51 ]
When %SCX_OPS_ENQ_LAST is set and prev->scx.slice != 0, @prev will be dispacthed into the local DSQ in put_prev_task_scx(). However, pick_task_scx() is executed before put_prev_task_scx(), so it will not pick @prev. Set %SCX_RQ_BAL_KEEP in balance_one() to ensure that pick_task_scx() can pick @prev.
Signed-off-by: Henry Huang henry.hj@antgroup.com Acked-by: Andrea Righi arighi@nvidia.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/ext.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c index f928a67a07d29..4c4681cb9337b 100644 --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -2630,6 +2630,7 @@ static int balance_one(struct rq *rq, struct task_struct *prev) { struct scx_dsp_ctx *dspc = this_cpu_ptr(scx_dsp_ctx); bool prev_on_scx = prev->sched_class == &ext_sched_class; + bool prev_on_rq = prev->scx.flags & SCX_TASK_QUEUED; int nr_loops = SCX_DSP_MAX_LOOPS;
lockdep_assert_rq_held(rq); @@ -2662,8 +2663,7 @@ static int balance_one(struct rq *rq, struct task_struct *prev) * See scx_ops_disable_workfn() for the explanation on the * bypassing test. */ - if ((prev->scx.flags & SCX_TASK_QUEUED) && - prev->scx.slice && !scx_rq_bypassing(rq)) { + if (prev_on_rq && prev->scx.slice && !scx_rq_bypassing(rq)) { rq->scx.flags |= SCX_RQ_BAL_KEEP; goto has_tasks; } @@ -2696,6 +2696,10 @@ static int balance_one(struct rq *rq, struct task_struct *prev)
flush_dispatch_buf(rq);
+ if (prev_on_rq && prev->scx.slice) { + rq->scx.flags |= SCX_RQ_BAL_KEEP; + goto has_tasks; + } if (rq->scx.local_dsq.nr) goto has_tasks; if (consume_global_dsq(rq)) @@ -2721,8 +2725,7 @@ static int balance_one(struct rq *rq, struct task_struct *prev) * Didn't find another task to run. Keep running @prev unless * %SCX_OPS_ENQ_LAST is in effect. */ - if ((prev->scx.flags & SCX_TASK_QUEUED) && - (!static_branch_unlikely(&scx_ops_enq_last) || + if (prev_on_rq && (!static_branch_unlikely(&scx_ops_enq_last) || scx_rq_bypassing(rq))) { rq->scx.flags |= SCX_RQ_BAL_KEEP; goto has_tasks;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marco Nelissen marco.nelissen@gmail.com
[ Upstream commit c13094b894de289514d84b8db56d1f2931a0bade ]
on 32-bit kernels, iomap_write_delalloc_scan() was inadvertently using a 32-bit position due to folio_next_index() returning an unsigned long. This could lead to an infinite loop when writing to an xfs filesystem.
Signed-off-by: Marco Nelissen marco.nelissen@gmail.com Link: https://lore.kernel.org/r/20250109041253.2494374-1-marco.nelissen@gmail.com Reviewed-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/iomap/buffered-io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 25d1ede6bb0eb..1bad460275ebe 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1138,7 +1138,7 @@ static void iomap_write_delalloc_scan(struct inode *inode, start_byte, end_byte, iomap, punch);
/* move offset to start of next folio in range */ - start_byte = folio_next_index(folio) << PAGE_SHIFT; + start_byte = folio_pos(folio) + folio_size(folio); folio_unlock(folio); folio_put(folio); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lizhi Xu lizhi.xu@windriver.com
[ Upstream commit 17a4fde81d3a7478d97d15304a6d61094a10c2e3 ]
syzbot reported a lock held when returning to userspace[1]. This is because if argc is less than 0 and the function returns directly, the held inode lock is not released.
Fix this by store the error in ret and jump to done to clean up instead of returning directly.
[dh: Modified Lizhi Xu's original patch to make it honour the error code from afs_split_string()]
[1] WARNING: lock held when returning to user space! 6.13.0-rc3-syzkaller-00209-g499551201b5f #0 Not tainted ------------------------------------------------ syz-executor133/5823 is leaving the kernel with locks still held! 1 lock held by syz-executor133/5823: #0: ffff888071cffc00 (&sb->s_type->i_mutex_key#9){++++}-{4:4}, at: inode_lock include/linux/fs.h:818 [inline] #0: ffff888071cffc00 (&sb->s_type->i_mutex_key#9){++++}-{4:4}, at: afs_proc_addr_prefs_write+0x2bb/0x14e0 fs/afs/addr_prefs.c:388
Reported-by: syzbot+76f33569875eb708e575@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=76f33569875eb708e575 Signed-off-by: Lizhi Xu lizhi.xu@windriver.com Signed-off-by: David Howells dhowells@redhat.com Link: https://lore.kernel.org/r/20241226012616.2348907-1-lizhi.xu@windriver.com/ Link: https://lore.kernel.org/r/529850.1736261552@warthog.procyon.org.uk Tested-by: syzbot+76f33569875eb708e575@syzkaller.appspotmail.com cc: Marc Dionne marc.dionne@auristor.com cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/afs/addr_prefs.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/afs/addr_prefs.c b/fs/afs/addr_prefs.c index a189ff8a5034e..c0384201b8feb 100644 --- a/fs/afs/addr_prefs.c +++ b/fs/afs/addr_prefs.c @@ -413,8 +413,10 @@ int afs_proc_addr_prefs_write(struct file *file, char *buf, size_t size)
do { argc = afs_split_string(&buf, argv, ARRAY_SIZE(argv)); - if (argc < 0) - return argc; + if (argc < 0) { + ret = argc; + goto done; + } if (argc < 2) goto inval;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleg Nesterov oleg@redhat.com
[ Upstream commit cacd9ae4bf801ff4125d8961bb9a3ba955e51680 ]
As the comment above waitqueue_active() explains, it can only be used if both waker and waiter have mb()'s that pair with each other. However __pollwait() is broken in this respect.
This is not pipe-specific, but let's look at pipe_poll() for example:
poll_wait(...); // -> __pollwait() -> add_wait_queue()
LOAD(pipe->head); LOAD(pipe->head);
In theory these LOAD()'s can leak into the critical section inside add_wait_queue() and can happen before list_add(entry, wq_head), in this case pipe_poll() can race with wakeup_pipe_readers/writers which do
smp_mb(); if (waitqueue_active(wq_head)) wake_up_interruptible(wq_head);
There are more __pollwait()-like functions (grep init_poll_funcptr), and it seems that at least ep_ptable_queue_proc() has the same problem, so the patch adds smp_mb() into poll_wait().
Link: https://lore.kernel.org/all/20250102163320.GA17691@redhat.com/ Signed-off-by: Oleg Nesterov oleg@redhat.com Link: https://lore.kernel.org/r/20250107162717.GA18922@redhat.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/poll.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/include/linux/poll.h b/include/linux/poll.h index d1ea4f3714a84..fc641b50f1298 100644 --- a/include/linux/poll.h +++ b/include/linux/poll.h @@ -41,8 +41,16 @@ typedef struct poll_table_struct {
static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p) { - if (p && p->_qproc && wait_address) + if (p && p->_qproc && wait_address) { p->_qproc(filp, wait_address, p); + /* + * This memory barrier is paired in the wq_has_sleeper(). + * See the comment above prepare_to_wait(), we need to + * ensure that subsequent tests in this thread can't be + * reordered with __add_wait_queue() in _qproc() paths. + */ + smp_mb(); + } }
/*
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ihor Solodrai ihor.solodrai@pm.me
[ Upstream commit ef7009decc30eb2515a64253791d61b72229c119 ]
The selftests are falining to build on current tip of bpf-next and sched_ext [1]. This has broken BPF CI [2] after merge from upstream.
Use appropriate function names in the selftests according to the recent changes in the sched_ext API [3].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=... [2] https://github.com/kernel-patches/bpf/actions/runs/11959327258/job/333409237... [3] https://lore.kernel.org/all/20241109194853.580310-1-tj@kernel.org/
Signed-off-by: Ihor Solodrai ihor.solodrai@pm.me Acked-by: Andrea Righi arighi@nvidia.com Acked-by: David Vernet void@manifault.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../testing/selftests/sched_ext/ddsp_bogus_dsq_fail.bpf.c | 2 +- .../selftests/sched_ext/ddsp_vtimelocal_fail.bpf.c | 4 ++-- tools/testing/selftests/sched_ext/dsp_local_on.bpf.c | 2 +- .../selftests/sched_ext/enq_select_cpu_fails.bpf.c | 2 +- tools/testing/selftests/sched_ext/exit.bpf.c | 4 ++-- tools/testing/selftests/sched_ext/maximal.bpf.c | 4 ++-- tools/testing/selftests/sched_ext/select_cpu_dfl.bpf.c | 2 +- .../selftests/sched_ext/select_cpu_dfl_nodispatch.bpf.c | 2 +- .../testing/selftests/sched_ext/select_cpu_dispatch.bpf.c | 2 +- .../selftests/sched_ext/select_cpu_dispatch_bad_dsq.bpf.c | 2 +- .../selftests/sched_ext/select_cpu_dispatch_dbl_dsp.bpf.c | 4 ++-- tools/testing/selftests/sched_ext/select_cpu_vtime.bpf.c | 8 ++++---- 12 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/tools/testing/selftests/sched_ext/ddsp_bogus_dsq_fail.bpf.c b/tools/testing/selftests/sched_ext/ddsp_bogus_dsq_fail.bpf.c index 37d9bf6fb7458..6f4c3f5a1c5d9 100644 --- a/tools/testing/selftests/sched_ext/ddsp_bogus_dsq_fail.bpf.c +++ b/tools/testing/selftests/sched_ext/ddsp_bogus_dsq_fail.bpf.c @@ -20,7 +20,7 @@ s32 BPF_STRUCT_OPS(ddsp_bogus_dsq_fail_select_cpu, struct task_struct *p, * If we dispatch to a bogus DSQ that will fall back to the * builtin global DSQ, we fail gracefully. */ - scx_bpf_dispatch_vtime(p, 0xcafef00d, SCX_SLICE_DFL, + scx_bpf_dsq_insert_vtime(p, 0xcafef00d, SCX_SLICE_DFL, p->scx.dsq_vtime, 0); return cpu; } diff --git a/tools/testing/selftests/sched_ext/ddsp_vtimelocal_fail.bpf.c b/tools/testing/selftests/sched_ext/ddsp_vtimelocal_fail.bpf.c index dffc97d9cdf14..e4a55027778fd 100644 --- a/tools/testing/selftests/sched_ext/ddsp_vtimelocal_fail.bpf.c +++ b/tools/testing/selftests/sched_ext/ddsp_vtimelocal_fail.bpf.c @@ -17,8 +17,8 @@ s32 BPF_STRUCT_OPS(ddsp_vtimelocal_fail_select_cpu, struct task_struct *p,
if (cpu >= 0) { /* Shouldn't be allowed to vtime dispatch to a builtin DSQ. */ - scx_bpf_dispatch_vtime(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, - p->scx.dsq_vtime, 0); + scx_bpf_dsq_insert_vtime(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, + p->scx.dsq_vtime, 0); return cpu; }
diff --git a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c index 6a7db1502c29e..6325bf76f47ee 100644 --- a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c +++ b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c @@ -45,7 +45,7 @@ void BPF_STRUCT_OPS(dsp_local_on_dispatch, s32 cpu, struct task_struct *prev)
target = bpf_get_prandom_u32() % nr_cpus;
- scx_bpf_dispatch(p, SCX_DSQ_LOCAL_ON | target, SCX_SLICE_DFL, 0); + scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL_ON | target, SCX_SLICE_DFL, 0); bpf_task_release(p); }
diff --git a/tools/testing/selftests/sched_ext/enq_select_cpu_fails.bpf.c b/tools/testing/selftests/sched_ext/enq_select_cpu_fails.bpf.c index 1efb50d61040a..a7cf868d5e311 100644 --- a/tools/testing/selftests/sched_ext/enq_select_cpu_fails.bpf.c +++ b/tools/testing/selftests/sched_ext/enq_select_cpu_fails.bpf.c @@ -31,7 +31,7 @@ void BPF_STRUCT_OPS(enq_select_cpu_fails_enqueue, struct task_struct *p, /* Can only call from ops.select_cpu() */ scx_bpf_select_cpu_dfl(p, 0, 0, &found);
- scx_bpf_dispatch(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags); + scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags); }
SEC(".struct_ops.link") diff --git a/tools/testing/selftests/sched_ext/exit.bpf.c b/tools/testing/selftests/sched_ext/exit.bpf.c index d75d4faf07f6d..4bc36182d3ffc 100644 --- a/tools/testing/selftests/sched_ext/exit.bpf.c +++ b/tools/testing/selftests/sched_ext/exit.bpf.c @@ -33,7 +33,7 @@ void BPF_STRUCT_OPS(exit_enqueue, struct task_struct *p, u64 enq_flags) if (exit_point == EXIT_ENQUEUE) EXIT_CLEANLY();
- scx_bpf_dispatch(p, DSQ_ID, SCX_SLICE_DFL, enq_flags); + scx_bpf_dsq_insert(p, DSQ_ID, SCX_SLICE_DFL, enq_flags); }
void BPF_STRUCT_OPS(exit_dispatch, s32 cpu, struct task_struct *p) @@ -41,7 +41,7 @@ void BPF_STRUCT_OPS(exit_dispatch, s32 cpu, struct task_struct *p) if (exit_point == EXIT_DISPATCH) EXIT_CLEANLY();
- scx_bpf_consume(DSQ_ID); + scx_bpf_dsq_move_to_local(DSQ_ID); }
void BPF_STRUCT_OPS(exit_enable, struct task_struct *p) diff --git a/tools/testing/selftests/sched_ext/maximal.bpf.c b/tools/testing/selftests/sched_ext/maximal.bpf.c index 4d4cd8d966dba..4c005fa718103 100644 --- a/tools/testing/selftests/sched_ext/maximal.bpf.c +++ b/tools/testing/selftests/sched_ext/maximal.bpf.c @@ -20,7 +20,7 @@ s32 BPF_STRUCT_OPS(maximal_select_cpu, struct task_struct *p, s32 prev_cpu,
void BPF_STRUCT_OPS(maximal_enqueue, struct task_struct *p, u64 enq_flags) { - scx_bpf_dispatch(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags); + scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags); }
void BPF_STRUCT_OPS(maximal_dequeue, struct task_struct *p, u64 deq_flags) @@ -28,7 +28,7 @@ void BPF_STRUCT_OPS(maximal_dequeue, struct task_struct *p, u64 deq_flags)
void BPF_STRUCT_OPS(maximal_dispatch, s32 cpu, struct task_struct *prev) { - scx_bpf_consume(SCX_DSQ_GLOBAL); + scx_bpf_dsq_move_to_local(SCX_DSQ_GLOBAL); }
void BPF_STRUCT_OPS(maximal_runnable, struct task_struct *p, u64 enq_flags) diff --git a/tools/testing/selftests/sched_ext/select_cpu_dfl.bpf.c b/tools/testing/selftests/sched_ext/select_cpu_dfl.bpf.c index f171ac4709706..13d0f5be788d1 100644 --- a/tools/testing/selftests/sched_ext/select_cpu_dfl.bpf.c +++ b/tools/testing/selftests/sched_ext/select_cpu_dfl.bpf.c @@ -30,7 +30,7 @@ void BPF_STRUCT_OPS(select_cpu_dfl_enqueue, struct task_struct *p, } scx_bpf_put_idle_cpumask(idle_mask);
- scx_bpf_dispatch(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags); + scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags); }
SEC(".struct_ops.link") diff --git a/tools/testing/selftests/sched_ext/select_cpu_dfl_nodispatch.bpf.c b/tools/testing/selftests/sched_ext/select_cpu_dfl_nodispatch.bpf.c index 9efdbb7da9288..815f1d5d61ac4 100644 --- a/tools/testing/selftests/sched_ext/select_cpu_dfl_nodispatch.bpf.c +++ b/tools/testing/selftests/sched_ext/select_cpu_dfl_nodispatch.bpf.c @@ -67,7 +67,7 @@ void BPF_STRUCT_OPS(select_cpu_dfl_nodispatch_enqueue, struct task_struct *p, saw_local = true; }
- scx_bpf_dispatch(p, dsq_id, SCX_SLICE_DFL, enq_flags); + scx_bpf_dsq_insert(p, dsq_id, SCX_SLICE_DFL, enq_flags); }
s32 BPF_STRUCT_OPS(select_cpu_dfl_nodispatch_init_task, diff --git a/tools/testing/selftests/sched_ext/select_cpu_dispatch.bpf.c b/tools/testing/selftests/sched_ext/select_cpu_dispatch.bpf.c index 59bfc4f36167a..4bb99699e9209 100644 --- a/tools/testing/selftests/sched_ext/select_cpu_dispatch.bpf.c +++ b/tools/testing/selftests/sched_ext/select_cpu_dispatch.bpf.c @@ -29,7 +29,7 @@ s32 BPF_STRUCT_OPS(select_cpu_dispatch_select_cpu, struct task_struct *p, cpu = prev_cpu;
dispatch: - scx_bpf_dispatch(p, dsq_id, SCX_SLICE_DFL, 0); + scx_bpf_dsq_insert(p, dsq_id, SCX_SLICE_DFL, 0); return cpu; }
diff --git a/tools/testing/selftests/sched_ext/select_cpu_dispatch_bad_dsq.bpf.c b/tools/testing/selftests/sched_ext/select_cpu_dispatch_bad_dsq.bpf.c index 3bbd5fcdfb18e..2a75de11b2cfd 100644 --- a/tools/testing/selftests/sched_ext/select_cpu_dispatch_bad_dsq.bpf.c +++ b/tools/testing/selftests/sched_ext/select_cpu_dispatch_bad_dsq.bpf.c @@ -18,7 +18,7 @@ s32 BPF_STRUCT_OPS(select_cpu_dispatch_bad_dsq_select_cpu, struct task_struct *p s32 prev_cpu, u64 wake_flags) { /* Dispatching to a random DSQ should fail. */ - scx_bpf_dispatch(p, 0xcafef00d, SCX_SLICE_DFL, 0); + scx_bpf_dsq_insert(p, 0xcafef00d, SCX_SLICE_DFL, 0);
return prev_cpu; } diff --git a/tools/testing/selftests/sched_ext/select_cpu_dispatch_dbl_dsp.bpf.c b/tools/testing/selftests/sched_ext/select_cpu_dispatch_dbl_dsp.bpf.c index 0fda57fe0ecfa..99d075695c974 100644 --- a/tools/testing/selftests/sched_ext/select_cpu_dispatch_dbl_dsp.bpf.c +++ b/tools/testing/selftests/sched_ext/select_cpu_dispatch_dbl_dsp.bpf.c @@ -18,8 +18,8 @@ s32 BPF_STRUCT_OPS(select_cpu_dispatch_dbl_dsp_select_cpu, struct task_struct *p s32 prev_cpu, u64 wake_flags) { /* Dispatching twice in a row is disallowed. */ - scx_bpf_dispatch(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, 0); - scx_bpf_dispatch(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, 0); + scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, 0); + scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, 0);
return prev_cpu; } diff --git a/tools/testing/selftests/sched_ext/select_cpu_vtime.bpf.c b/tools/testing/selftests/sched_ext/select_cpu_vtime.bpf.c index e6c67bcf5e6e3..bfcb96cd4954b 100644 --- a/tools/testing/selftests/sched_ext/select_cpu_vtime.bpf.c +++ b/tools/testing/selftests/sched_ext/select_cpu_vtime.bpf.c @@ -2,8 +2,8 @@ /* * A scheduler that validates that enqueue flags are properly stored and * applied at dispatch time when a task is directly dispatched from - * ops.select_cpu(). We validate this by using scx_bpf_dispatch_vtime(), and - * making the test a very basic vtime scheduler. + * ops.select_cpu(). We validate this by using scx_bpf_dsq_insert_vtime(), + * and making the test a very basic vtime scheduler. * * Copyright (c) 2024 Meta Platforms, Inc. and affiliates. * Copyright (c) 2024 David Vernet dvernet@meta.com @@ -47,13 +47,13 @@ s32 BPF_STRUCT_OPS(select_cpu_vtime_select_cpu, struct task_struct *p, cpu = prev_cpu; scx_bpf_test_and_clear_cpu_idle(cpu); ddsp: - scx_bpf_dispatch_vtime(p, VTIME_DSQ, SCX_SLICE_DFL, task_vtime(p), 0); + scx_bpf_dsq_insert_vtime(p, VTIME_DSQ, SCX_SLICE_DFL, task_vtime(p), 0); return cpu; }
void BPF_STRUCT_OPS(select_cpu_vtime_dispatch, s32 cpu, struct task_struct *p) { - if (scx_bpf_consume(VTIME_DSQ)) + if (scx_bpf_dsq_move_to_local(VTIME_DSQ)) consumed = true; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Vernet void@manifault.com
[ Upstream commit b8f614207b0d5e4abd6df8d5cb3cc11f009d1d93 ]
maximal.bpf.c is still dispatching to and consuming from SCX_DSQ_GLOBAL. Let's have it use its own DSQ to avoid any runtime errors.
Signed-off-by: David Vernet void@manifault.com Tested-by: Andrea Righi arighi@nvidia.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/sched_ext/maximal.bpf.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/sched_ext/maximal.bpf.c b/tools/testing/selftests/sched_ext/maximal.bpf.c index 4c005fa718103..430f5e13bf554 100644 --- a/tools/testing/selftests/sched_ext/maximal.bpf.c +++ b/tools/testing/selftests/sched_ext/maximal.bpf.c @@ -12,6 +12,8 @@
char _license[] SEC("license") = "GPL";
+#define DSQ_ID 0 + s32 BPF_STRUCT_OPS(maximal_select_cpu, struct task_struct *p, s32 prev_cpu, u64 wake_flags) { @@ -20,7 +22,7 @@ s32 BPF_STRUCT_OPS(maximal_select_cpu, struct task_struct *p, s32 prev_cpu,
void BPF_STRUCT_OPS(maximal_enqueue, struct task_struct *p, u64 enq_flags) { - scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags); + scx_bpf_dsq_insert(p, DSQ_ID, SCX_SLICE_DFL, enq_flags); }
void BPF_STRUCT_OPS(maximal_dequeue, struct task_struct *p, u64 deq_flags) @@ -28,7 +30,7 @@ void BPF_STRUCT_OPS(maximal_dequeue, struct task_struct *p, u64 deq_flags)
void BPF_STRUCT_OPS(maximal_dispatch, s32 cpu, struct task_struct *prev) { - scx_bpf_dsq_move_to_local(SCX_DSQ_GLOBAL); + scx_bpf_dsq_move_to_local(DSQ_ID); }
void BPF_STRUCT_OPS(maximal_runnable, struct task_struct *p, u64 enq_flags) @@ -123,7 +125,7 @@ void BPF_STRUCT_OPS(maximal_cgroup_set_weight, struct cgroup *cgrp, u32 weight)
s32 BPF_STRUCT_OPS_SLEEPABLE(maximal_init) { - return 0; + return scx_bpf_create_dsq(DSQ_ID, -1); }
void BPF_STRUCT_OPS(maximal_exit, struct scx_exit_info *info)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hongguang Gao hongguang.gao@broadcom.com
[ Upstream commit 34db8ec931b84d1426423f263b1927539e73b397 ]
Current driver implementation doesn't populate the port_num field in query_qp. Adding the code to convert internal firmware port id to ibv defined port number and export it.
Reviewed-by: Saravanan Vajravel saravanan.vajravel@broadcom.com Reviewed-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Signed-off-by: Hongguang Gao hongguang.gao@broadcom.com Signed-off-by: Selvin Xavier selvin.xavier@broadcom.com Link: https://patch.msgid.link/20241211083931.968831-5-kalesh-anakkur.purayil@broa... Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/bnxt_re/ib_verbs.c | 1 + drivers/infiniband/hw/bnxt_re/ib_verbs.h | 4 ++++ drivers/infiniband/hw/bnxt_re/qplib_fp.c | 1 + drivers/infiniband/hw/bnxt_re/qplib_fp.h | 1 + 4 files changed, 7 insertions(+)
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c index b20cffcc3e7d2..14e434ff51ede 100644 --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c @@ -2269,6 +2269,7 @@ int bnxt_re_query_qp(struct ib_qp *ib_qp, struct ib_qp_attr *qp_attr, qp_attr->retry_cnt = qplib_qp->retry_cnt; qp_attr->rnr_retry = qplib_qp->rnr_retry; qp_attr->min_rnr_timer = qplib_qp->min_rnr_timer; + qp_attr->port_num = __to_ib_port_num(qplib_qp->port_id); qp_attr->rq_psn = qplib_qp->rq.psn; qp_attr->max_rd_atomic = qplib_qp->max_rd_atomic; qp_attr->sq_psn = qplib_qp->sq.psn; diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.h b/drivers/infiniband/hw/bnxt_re/ib_verbs.h index b789e47ec97a8..9cd8f770d1b27 100644 --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.h +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.h @@ -264,6 +264,10 @@ void bnxt_re_dealloc_ucontext(struct ib_ucontext *context); int bnxt_re_mmap(struct ib_ucontext *context, struct vm_area_struct *vma); void bnxt_re_mmap_free(struct rdma_user_mmap_entry *rdma_entry);
+static inline u32 __to_ib_port_num(u16 port_id) +{ + return (u32)port_id + 1; +}
unsigned long bnxt_re_lock_cqs(struct bnxt_re_qp *qp); void bnxt_re_unlock_cqs(struct bnxt_re_qp *qp, unsigned long flags); diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.c b/drivers/infiniband/hw/bnxt_re/qplib_fp.c index 828e2f9808012..613b5fc70e13e 100644 --- a/drivers/infiniband/hw/bnxt_re/qplib_fp.c +++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.c @@ -1479,6 +1479,7 @@ int bnxt_qplib_query_qp(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp) qp->dest_qpn = le32_to_cpu(sb->dest_qp_id); memcpy(qp->smac, sb->src_mac, 6); qp->vlan_id = le16_to_cpu(sb->vlan_pcp_vlan_dei_vlan_id); + qp->port_id = le16_to_cpu(sb->port_id); bail: dma_free_coherent(&rcfw->pdev->dev, sbuf.size, sbuf.sb, sbuf.dma_addr); diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.h b/drivers/infiniband/hw/bnxt_re/qplib_fp.h index d8c71c024613b..6f02954eb1429 100644 --- a/drivers/infiniband/hw/bnxt_re/qplib_fp.h +++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.h @@ -298,6 +298,7 @@ struct bnxt_qplib_qp { u32 dest_qpn; u8 smac[6]; u16 vlan_id; + u16 port_id; u8 nw_type; struct bnxt_qplib_ah ah;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tejun Heo tj@kernel.org
[ Upstream commit ce2b93fc1dfa1c82f2576aa571731c4e5dcc8dd7 ]
The dsp_local_on selftest expects the scheduler to fail by trying to schedule an e.g. CPU-affine task to the wrong CPU. However, this isn't guaranteed to happen in the 1 second window that the test is running. Besides, it's odd to have this particular exception path tested when there are no other tests that verify that the interface is working at all - e.g. the test would pass if dsp_local_on interface is completely broken and fails on any attempt.
Flip the test so that it verifies that the feature works. While at it, fix a typo in the info message.
Signed-off-by: Tejun Heo tj@kernel.org Reported-by: Ihor Solodrai ihor.solodrai@pm.me Link: http://lkml.kernel.org/r/Z1n9v7Z6iNJ-wKmq@slm.duckdns.org Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/sched_ext/dsp_local_on.bpf.c | 5 ++++- tools/testing/selftests/sched_ext/dsp_local_on.c | 5 +++-- 2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c index 6325bf76f47ee..fbda6bf546712 100644 --- a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c +++ b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c @@ -43,7 +43,10 @@ void BPF_STRUCT_OPS(dsp_local_on_dispatch, s32 cpu, struct task_struct *prev) if (!p) return;
- target = bpf_get_prandom_u32() % nr_cpus; + if (p->nr_cpus_allowed == nr_cpus) + target = bpf_get_prandom_u32() % nr_cpus; + else + target = scx_bpf_task_cpu(p);
scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL_ON | target, SCX_SLICE_DFL, 0); bpf_task_release(p); diff --git a/tools/testing/selftests/sched_ext/dsp_local_on.c b/tools/testing/selftests/sched_ext/dsp_local_on.c index 472851b568548..0ff27e57fe430 100644 --- a/tools/testing/selftests/sched_ext/dsp_local_on.c +++ b/tools/testing/selftests/sched_ext/dsp_local_on.c @@ -34,9 +34,10 @@ static enum scx_test_status run(void *ctx) /* Just sleeping is fine, plenty of scheduling events happening */ sleep(1);
- SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_ERROR)); bpf_link__destroy(link);
+ SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_UNREG)); + return SCX_TEST_PASS; }
@@ -50,7 +51,7 @@ static void cleanup(void *ctx) struct scx_test dsp_local_on = { .name = "dsp_local_on", .description = "Verify we can directly dispatch tasks to a local DSQs " - "from osp.dispatch()", + "from ops.dispatch()", .setup = setup, .run = run, .cleanup = cleanup,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luis Chamberlain mcgrof@kernel.org
[ Upstream commit b579d6fdc3a9149bb4d2b3133cc0767130ed13e6 ]
Ensure we propagate npwg to the target as well instead of assuming its the same logical blocks per physical block.
This ensures devices with large IUs information properly propagated on the target.
Signed-off-by: Luis Chamberlain mcgrof@kernel.org Reviewed-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/io-cmd-bdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c index 0bda83d0fc3e0..eaf31c823cbe8 100644 --- a/drivers/nvme/target/io-cmd-bdev.c +++ b/drivers/nvme/target/io-cmd-bdev.c @@ -36,7 +36,7 @@ void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id) */ id->nsfeat |= 1 << 4; /* NPWG = Namespace Preferred Write Granularity. 0's based */ - id->npwg = lpp0b; + id->npwg = to0based(bdev_io_min(bdev) / bdev_logical_block_size(bdev)); /* NPWA = Namespace Preferred Write Alignment. 0's based */ id->npwa = id->npwg; /* NPDG = Namespace Preferred Deallocate Granularity. 0's based */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra peterz@infradead.org
[ Upstream commit 6d71a9c6160479899ee744d2c6d6602a191deb1f ]
I noticed this in my traces today:
turbostat-1222 [006] d..2. 311.935649: reweight_entity: (ffff888108f13e00-ffff88885ef38440-6) { weight: 1048576 avg_vruntime: 3184159639071 vruntime: 3184159640194 (-1123) deadline: 3184162621107 } -> { weight: 2 avg_vruntime: 3184177463330 vruntime: 3184748414495 (-570951165) deadline: 4747605329439 } turbostat-1222 [006] d..2. 311.935651: reweight_entity: (ffff888108f13e00-ffff88885ef38440-6) { weight: 2 avg_vruntime: 3184177463330 vruntime: 3184748414495 (-570951165) deadline: 4747605329439 } -> { weight: 1048576 avg_vruntime: 3184176414812 vruntime: 3184177464419 (-1049607) deadline: 3184180445332 }
Which is a weight transition: 1048576 -> 2 -> 1048576.
One would expect the lag to shoot out *AND* come back, notably:
-1123*1048576/2 = -588775424 -588775424*2/1048576 = -1123
Except the trace shows it is all off. Worse, subsequent cycles shoot it out further and further.
This made me have a very hard look at reweight_entity(), and specifically the ->on_rq case, which is more prominent with DELAY_DEQUEUE.
And indeed, it is all sorts of broken. While the computation of the new lag is correct, the computation for the new vruntime, using the new lag is broken for it does not consider the logic set out in place_entity().
With the below patch, I now see things like:
migration/12-55 [012] d..3. 309.006650: reweight_entity: (ffff8881e0e6f600-ffff88885f235f40-12) { weight: 977582 avg_vruntime: 4860513347366 vruntime: 4860513347908 (-542) deadline: 4860516552475 } -> { weight: 2 avg_vruntime: 4860528915984 vruntime: 4860793840706 (-264924722) deadline: 6427157349203 } migration/14-62 [014] d..3. 309.006698: reweight_entity: (ffff8881e0e6cc00-ffff88885f3b5f40-15) { weight: 2 avg_vruntime: 4874472992283 vruntime: 4939833828823 (-65360836540) deadline: 6316614641111 } -> { weight: 967149 avg_vruntime: 4874217684324 vruntime: 4874217688559 (-4235) deadline: 4874220535650 }
Which isn't perfect yet, but much closer.
Reported-by: Doug Smythies dsmythies@telus.net Reported-by: Ingo Molnar mingo@kernel.org Tested-by: Ingo Molnar mingo@kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Fixes: eab03c23c2a1 ("sched/eevdf: Fix vruntime adjustment on reweight") Link: https://lore.kernel.org/r/20250109105959.GA2981@noisy.programming.kicks-ass.... Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/fair.c | 145 ++++++-------------------------------------- 1 file changed, 18 insertions(+), 127 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1ca96c99872f0..bf8153af96879 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -689,21 +689,16 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq) * * XXX could add max_slice to the augmented data to track this. */ -static s64 entity_lag(u64 avruntime, struct sched_entity *se) +static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se) { s64 vlag, limit;
- vlag = avruntime - se->vruntime; - limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se); - - return clamp(vlag, -limit, limit); -} - -static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se) -{ SCHED_WARN_ON(!se->on_rq);
- se->vlag = entity_lag(avg_vruntime(cfs_rq), se); + vlag = avg_vruntime(cfs_rq) - se->vruntime; + limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se); + + se->vlag = clamp(vlag, -limit, limit); }
/* @@ -3774,137 +3769,32 @@ static inline void dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) { } #endif
-static void reweight_eevdf(struct sched_entity *se, u64 avruntime, - unsigned long weight) -{ - unsigned long old_weight = se->load.weight; - s64 vlag, vslice; - - /* - * VRUNTIME - * -------- - * - * COROLLARY #1: The virtual runtime of the entity needs to be - * adjusted if re-weight at !0-lag point. - * - * Proof: For contradiction assume this is not true, so we can - * re-weight without changing vruntime at !0-lag point. - * - * Weight VRuntime Avg-VRuntime - * before w v V - * after w' v' V' - * - * Since lag needs to be preserved through re-weight: - * - * lag = (V - v)*w = (V'- v')*w', where v = v' - * ==> V' = (V - v)*w/w' + v (1) - * - * Let W be the total weight of the entities before reweight, - * since V' is the new weighted average of entities: - * - * V' = (WV + w'v - wv) / (W + w' - w) (2) - * - * by using (1) & (2) we obtain: - * - * (WV + w'v - wv) / (W + w' - w) = (V - v)*w/w' + v - * ==> (WV-Wv+Wv+w'v-wv)/(W+w'-w) = (V - v)*w/w' + v - * ==> (WV - Wv)/(W + w' - w) + v = (V - v)*w/w' + v - * ==> (V - v)*W/(W + w' - w) = (V - v)*w/w' (3) - * - * Since we are doing at !0-lag point which means V != v, we - * can simplify (3): - * - * ==> W / (W + w' - w) = w / w' - * ==> Ww' = Ww + ww' - ww - * ==> W * (w' - w) = w * (w' - w) - * ==> W = w (re-weight indicates w' != w) - * - * So the cfs_rq contains only one entity, hence vruntime of - * the entity @v should always equal to the cfs_rq's weighted - * average vruntime @V, which means we will always re-weight - * at 0-lag point, thus breach assumption. Proof completed. - * - * - * COROLLARY #2: Re-weight does NOT affect weighted average - * vruntime of all the entities. - * - * Proof: According to corollary #1, Eq. (1) should be: - * - * (V - v)*w = (V' - v')*w' - * ==> v' = V' - (V - v)*w/w' (4) - * - * According to the weighted average formula, we have: - * - * V' = (WV - wv + w'v') / (W - w + w') - * = (WV - wv + w'(V' - (V - v)w/w')) / (W - w + w') - * = (WV - wv + w'V' - Vw + wv) / (W - w + w') - * = (WV + w'V' - Vw) / (W - w + w') - * - * ==> V'*(W - w + w') = WV + w'V' - Vw - * ==> V' * (W - w) = (W - w) * V (5) - * - * If the entity is the only one in the cfs_rq, then reweight - * always occurs at 0-lag point, so V won't change. Or else - * there are other entities, hence W != w, then Eq. (5) turns - * into V' = V. So V won't change in either case, proof done. - * - * - * So according to corollary #1 & #2, the effect of re-weight - * on vruntime should be: - * - * v' = V' - (V - v) * w / w' (4) - * = V - (V - v) * w / w' - * = V - vl * w / w' - * = V - vl' - */ - if (avruntime != se->vruntime) { - vlag = entity_lag(avruntime, se); - vlag = div_s64(vlag * old_weight, weight); - se->vruntime = avruntime - vlag; - } - - /* - * DEADLINE - * -------- - * - * When the weight changes, the virtual time slope changes and - * we should adjust the relative virtual deadline accordingly. - * - * d' = v' + (d - v)*w/w' - * = V' - (V - v)*w/w' + (d - v)*w/w' - * = V - (V - v)*w/w' + (d - v)*w/w' - * = V + (d - V)*w/w' - */ - vslice = (s64)(se->deadline - avruntime); - vslice = div_s64(vslice * old_weight, weight); - se->deadline = avruntime + vslice; -} +static void place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags);
static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, unsigned long weight) { bool curr = cfs_rq->curr == se; - u64 avruntime;
if (se->on_rq) { /* commit outstanding execution time */ update_curr(cfs_rq); - avruntime = avg_vruntime(cfs_rq); + update_entity_lag(cfs_rq, se); + se->deadline -= se->vruntime; + se->rel_deadline = 1; if (!curr) __dequeue_entity(cfs_rq, se); update_load_sub(&cfs_rq->load, se->load.weight); } dequeue_load_avg(cfs_rq, se);
- if (se->on_rq) { - reweight_eevdf(se, avruntime, weight); - } else { - /* - * Because we keep se->vlag = V - v_i, while: lag_i = w_i*(V - v_i), - * we need to scale se->vlag when w_i changes. - */ - se->vlag = div_s64(se->vlag * se->load.weight, weight); - } + /* + * Because we keep se->vlag = V - v_i, while: lag_i = w_i*(V - v_i), + * we need to scale se->vlag when w_i changes. + */ + se->vlag = div_s64(se->vlag * se->load.weight, weight); + if (se->rel_deadline) + se->deadline = div_s64(se->deadline * se->load.weight, weight);
update_load_set(&se->load, weight);
@@ -3919,6 +3809,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, enqueue_load_avg(cfs_rq, se); if (se->on_rq) { update_load_add(&cfs_rq->load, se->load.weight); + place_entity(cfs_rq, se, 0); if (!curr) __enqueue_entity(cfs_rq, se);
@@ -5359,7 +5250,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
se->vruntime = vruntime - lag;
- if (sched_feat(PLACE_REL_DEADLINE) && se->rel_deadline) { + if (se->rel_deadline) { se->deadline += se->vruntime; se->rel_deadline = 0; return;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra peterz@infradead.org
[ Upstream commit 66951e4860d3c688bfa550ea4a19635b57e00eca ]
Normally dequeue_entities() will continue to dequeue an empty group entity; except DELAY_DEQUEUE changes things -- it retains empty entities such that they might continue to compete and burn off some lag.
However, doing this results in update_cfs_group() re-computing the cgroup weight 'slice' for an empty group, which it (rightly) figures isn't much at all. This in turn means that the delayed entity is not competing at the expected weight. Worse, the very low weight causes its lag to be inflated, which combined with avg_vruntime() using scale_load_down(), leads to artifacts.
As such, don't adjust the weight for empty group entities and let them compete at their original weight.
Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue") Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20250110115720.GA17405@noisy.programming.kicks-ass... Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/fair.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index bf8153af96879..7f8677ce83f9f 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3956,7 +3956,11 @@ static void update_cfs_group(struct sched_entity *se) struct cfs_rq *gcfs_rq = group_cfs_rq(se); long shares;
- if (!gcfs_rq) + /* + * When a group becomes empty, preserve its weight. This matters for + * DELAY_DEQUEUE. + */ + if (!gcfs_rq || !gcfs_rq->load.weight) return;
if (throttled_hierarchy(gcfs_rq))
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juergen Gross jgross@suse.com
[ Upstream commit ae02ae16b76160f0aeeae2c5fb9b15226d00a4ef ]
In order to allow serialize() to be used from noinstr code, make it __always_inline.
Fixes: 0ef8047b737d ("x86/static-call: provide a way to do very early static-call updates") Closes: https://lore.kernel.org/oe-kbuild-all/202412181756.aJvzih2K-lkp@intel.com/ Reported-by: kernel test robot lkp@intel.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Link: https://lore.kernel.org/r/20241218100918.22167-1-jgross@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/special_insns.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index aec6e2d3aa1d5..98bfc097389c4 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -217,7 +217,7 @@ static inline int write_user_shstk_64(u64 __user *addr, u64 val)
#define nop() asm volatile ("nop")
-static inline void serialize(void) +static __always_inline void serialize(void) { /* Instruction opcode for SERIALIZE; supported in binutils >= 2.35. */ asm volatile(".byte 0xf, 0x1, 0xe8" ::: "memory");
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Binding sbinding@opensource.cirrus.com
commit de5afaddd5a7af6b9c48900741b410ca03e453ae upstream.
Add support for Ayaneo Portable Game System.
System use 2 CS35L41 Amps with HDA, using Internal boost, with I2C
Signed-off-by: Stefan Binding sbinding@opensource.cirrus.com Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250109165455.645810-1-sbinding@opensource.cirrus.... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10979,6 +10979,7 @@ static const struct hda_quirk alc269_fix SND_PCI_QUIRK(0x1d72, 0x1901, "RedmiBook 14", ALC256_FIXUP_ASUS_HEADSET_MIC), SND_PCI_QUIRK(0x1d72, 0x1945, "Redmi G", ALC256_FIXUP_ASUS_HEADSET_MIC), SND_PCI_QUIRK(0x1d72, 0x1947, "RedmiBook Air", ALC255_FIXUP_XIAOMI_HEADSET_MIC), + SND_PCI_QUIRK(0x1f66, 0x0105, "Ayaneo Portable Game Player", ALC287_FIXUP_CS35L41_I2C_2), SND_PCI_QUIRK(0x2782, 0x0214, "VAIO VJFE-CL", ALC269_FIXUP_LIMIT_INT_MIC_BOOST), SND_PCI_QUIRK(0x2782, 0x0228, "Infinix ZERO BOOK 13", ALC269VB_FIXUP_INFINIX_ZERO_BOOK_13), SND_PCI_QUIRK(0x2782, 0x0232, "CHUWI CoreBook XPro", ALC269VB_FIXUP_CHUWI_COREBOOK_XPRO),
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luke D. Jones luke@ljones.dev
commit f67b1ef261f4370c48e3a7d19907de136be2d5bd upstream.
The GA605W laptop has almost the exact same codec setup as the GA403 and so the same quirks apply to it.
Signed-off-by: Luke D. Jones luke@ljones.dev Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250111022754.177551-1-luke@ljones.dev Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10625,6 +10625,7 @@ static const struct hda_quirk alc269_fix SND_PCI_QUIRK(0x1043, 0x1e1f, "ASUS Vivobook 15 X1504VAP", ALC2XX_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1043, 0x1e51, "ASUS Zephyrus M15", ALC294_FIXUP_ASUS_GU502_PINS), SND_PCI_QUIRK(0x1043, 0x1e5e, "ASUS ROG Strix G513", ALC294_FIXUP_ASUS_G513_PINS), + SND_PCI_QUIRK(0x1043, 0x1e83, "ASUS GA605W", ALC285_FIXUP_ASUS_GU605_SPI_SPEAKER2_TO_DAC1), SND_PCI_QUIRK(0x1043, 0x1e8e, "ASUS Zephyrus G15", ALC289_FIXUP_ASUS_GA401), SND_PCI_QUIRK(0x1043, 0x1eb3, "ASUS Ally RCLA72", ALC287_FIXUP_TAS2781_I2C), SND_PCI_QUIRK(0x1043, 0x1ed3, "ASUS HN7306W", ALC287_FIXUP_CS35L41_I2C_2),
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luke D. Jones luke@ljones.dev
commit 44a48b26639e591e53f6f72000c16576ce107274 upstream.
The H7606W laptop has almost the exact same codec setup as the GA403 and so the same quirks apply to it.
Signed-off-by: Luke D. Jones luke@ljones.dev Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250111022754.177551-2-luke@ljones.dev Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10625,6 +10625,7 @@ static const struct hda_quirk alc269_fix SND_PCI_QUIRK(0x1043, 0x1e1f, "ASUS Vivobook 15 X1504VAP", ALC2XX_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1043, 0x1e51, "ASUS Zephyrus M15", ALC294_FIXUP_ASUS_GU502_PINS), SND_PCI_QUIRK(0x1043, 0x1e5e, "ASUS ROG Strix G513", ALC294_FIXUP_ASUS_G513_PINS), + SND_PCI_QUIRK(0x1043, 0x1e63, "ASUS H7606W", ALC285_FIXUP_ASUS_GU605_SPI_SPEAKER2_TO_DAC1), SND_PCI_QUIRK(0x1043, 0x1e83, "ASUS GA605W", ALC285_FIXUP_ASUS_GU605_SPI_SPEAKER2_TO_DAC1), SND_PCI_QUIRK(0x1043, 0x1e8e, "ASUS Zephyrus G15", ALC289_FIXUP_ASUS_GA401), SND_PCI_QUIRK(0x1043, 0x1eb3, "ASUS Ally RCLA72", ALC287_FIXUP_TAS2781_I2C),
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kairui Song kasong@tencent.com
commit 212fe1c0df4a150fb6298db2cfff267ceaba5402 upstream.
If zram_meta_alloc failed early, it frees allocated zram->table without setting it NULL. Which will potentially cause zram_meta_free to access the table if user reset an failed and uninitialized device.
Link: https://lkml.kernel.org/r/20250107065446.86928-1-ryncsn@gmail.com Fixes: 74363ec674cb ("zram: fix uninitialized ZRAM not releasing backing device") Signed-off-by: Kairui Song kasong@tencent.com Reviewed-by: Sergey Senozhatsky senozhatsky@chromium.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/zram/zram_drv.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -1349,6 +1349,7 @@ static bool zram_meta_alloc(struct zram zram->mem_pool = zs_create_pool(zram->disk->disk_name); if (!zram->mem_pool) { vfree(zram->table); + zram->table = NULL; return false; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tomi Valkeinen tomi.valkeinen+renesas@ideasonboard.com
commit cefc479cbb50399dec0c8e996f3539c48a1ee9dd upstream.
i2c-atr catches the BUS_NOTIFY_DEL_DEVICE event on the bus and removes the translation by calling i2c_atr_detach_client().
However, BUS_NOTIFY_DEL_DEVICE happens when the device is about to be removed from this bus, i.e. before removal, and thus before calling .remove() on the driver. If the driver happens to do any i2c transactions in its remove(), they will fail.
Fix this by catching BUS_NOTIFY_REMOVED_DEVICE instead, thus removing the translation only after the device is actually removed.
Fixes: a076a860acae ("media: i2c: add I2C Address Translator (ATR) support") Cc: stable@vger.kernel.org Signed-off-by: Tomi Valkeinen tomi.valkeinen+renesas@ideasonboard.com Reviewed-by: Luca Ceresoli luca.ceresoli@bootlin.com Reviewed-by: Romain Gantois romain.gantois@bootlin.com Tested-by: Romain Gantois romain.gantois@bootlin.com Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i2c/i2c-atr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/i2c/i2c-atr.c +++ b/drivers/i2c/i2c-atr.c @@ -412,7 +412,7 @@ static int i2c_atr_bus_notifier_call(str dev_name(dev), ret); break;
- case BUS_NOTIFY_DEL_DEVICE: + case BUS_NOTIFY_REMOVED_DEVICE: i2c_atr_detach_client(client->adapter, client); break;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit 2ca06a2f65310aeef30bb69b7405437a14766e4d upstream.
mptcp_cleanup_rbuf() is responsible to send acks when the user-space reads enough data to update the receive windows significantly.
It tries hard to avoid acquiring the subflow sockets locks by checking conditions similar to the ones implemented at the TCP level.
To avoid too much code duplication - the MPTCP protocol can't reuse the TCP helpers as part of the relevant status is maintained into the msk socket - and multiple costly window size computation, mptcp_cleanup_rbuf uses a rough estimate for the most recently advertised window size: the MPTCP receive free space, as recorded as at last-ack time.
Unfortunately the above does not allow mptcp_cleanup_rbuf() to detect a zero to non-zero win change in some corner cases, skipping the tcp_cleanup_rbuf call and leaving the peer stuck.
After commit ea66758c1795 ("tcp: allow MPTCP to update the announced window"), MPTCP has actually cheap access to the announced window value. Use it in mptcp_cleanup_rbuf() for a more accurate ack generation.
Fixes: e3859603ba13 ("mptcp: better msk receive window updates") Cc: stable@vger.kernel.org Reported-by: Jakub Kicinski kuba@kernel.org Closes: https://lore.kernel.org/20250107131845.5e5de3c5@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-1-0d986ee7b... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/options.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -607,7 +607,6 @@ static bool mptcp_established_options_ds } opts->ext_copy.use_ack = 1; opts->suboptions = OPTION_MPTCP_DSS; - WRITE_ONCE(msk->old_wspace, __mptcp_space((struct sock *)msk));
/* Add kind/length/subtype/flag overhead if mapping is not populated */ if (dss_size == 0) @@ -1288,7 +1287,7 @@ static void mptcp_set_rwin(struct tcp_so } MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDCONFLICT); } - return; + goto update_wspace; }
if (rcv_wnd_new != rcv_wnd_old) { @@ -1313,6 +1312,9 @@ raise_win: th->window = htons(new_win); MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDSHARED); } + +update_wspace: + WRITE_ONCE(msk->old_wspace, tp->rcv_wnd); }
__sum16 __mptcp_make_csum(u64 data_seq, u32 subflow_seq, u16 data_len, __wsum sum)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit e226d9259dc4f5d2c19e6682ad1356fa97cf38f4 upstream.
The wake-up condition currently implemented by mptcp_epollin_ready() is wrong, as it could mark the MPTCP socket as readable even when no data are present and the system is under memory pressure.
Explicitly check for some data being available in the receive queue.
Fixes: 5684ab1a0eff ("mptcp: give rcvlowat some love") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-2-0d986ee7b... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -760,10 +760,15 @@ static inline u64 mptcp_data_avail(const
static inline bool mptcp_epollin_ready(const struct sock *sk) { + u64 data_avail = mptcp_data_avail(mptcp_sk(sk)); + + if (!data_avail) + return false; + /* mptcp doesn't have to deal with small skbs in the receive queue, - * at it can always coalesce them + * as it can always coalesce them */ - return (mptcp_data_avail(mptcp_sk(sk)) >= sk->sk_rcvlowat) || + return (data_avail >= sk->sk_rcvlowat) || (mem_cgroup_sockets_enabled && sk->sk_memcg && mem_cgroup_under_socket_pressure(sk->sk_memcg)) || READ_ONCE(tcp_memory_pressure);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit 218cc166321fb3cc8786677ffe0d09a78778a910 upstream.
The disconnect test-case generates spurious errors:
INFO: disconnect INFO: extra options: -I 3 -i /tmp/tmp.r43niviyoI 01 ns1 MPTCP -> ns1 (10.0.1.1:10000 ) MPTCP (duration 140ms) [FAIL] file received by server does not match (in, out): Unexpected revents: POLLERR/POLLNVAL(19) -rw-r--r-- 1 root root 10028676 Jan 10 10:47 /tmp/tmp.r43niviyoI.disconnect Trailing bytes are: ��\����R���!8��u2��5N% -rw------- 1 root root 9992290 Jan 10 10:47 /tmp/tmp.Os4UbnWbI1 Trailing bytes are: ��\����R���!8��u2��5N% 02 ns1 MPTCP -> ns1 (dead:beef:1::1:10001) MPTCP (duration 206ms) [ OK ] 03 ns1 MPTCP -> ns1 (dead:beef:1::1:10002) TCP (duration 31ms) [ OK ] 04 ns1 TCP -> ns1 (dead:beef:1::1:10003) MPTCP (duration 26ms) [ OK ] [FAIL] Tests of the full disconnection have failed Time: 2 seconds
The root cause is actually in the user-space bits: the test program currently disconnects as soon as all the pending data has been spooled, generating an FASTCLOSE. If such option reaches the peer before the latter has reached the closed status, the msk socket will report an error to the user-space, as per protocol specification, causing the above failure.
Address the issue explicitly waiting for all the relevant sockets to reach a closed status before performing the disconnect.
Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-3-0d986ee7b... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_connect.c | 43 ++++++++++++++++------ 1 file changed, 32 insertions(+), 11 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c @@ -25,6 +25,8 @@ #include <sys/types.h> #include <sys/mman.h>
+#include <arpa/inet.h> + #include <netdb.h> #include <netinet/in.h>
@@ -1211,23 +1213,42 @@ static void parse_setsock_options(const exit(1); }
-void xdisconnect(int fd, int addrlen) +void xdisconnect(int fd) { - struct sockaddr_storage empty; + socklen_t addrlen = sizeof(struct sockaddr_storage); + struct sockaddr_storage addr, empty; int msec_sleep = 10; - int queued = 1; - int i; + void *raw_addr; + int i, cmdlen; + char cmd[128]; + + /* get the local address and convert it to string */ + if (getsockname(fd, (struct sockaddr *)&addr, &addrlen) < 0) + xerror("getsockname"); + + if (addr.ss_family == AF_INET) + raw_addr = &(((struct sockaddr_in *)&addr)->sin_addr); + else if (addr.ss_family == AF_INET6) + raw_addr = &(((struct sockaddr_in6 *)&addr)->sin6_addr); + else + xerror("bad family"); + + strcpy(cmd, "ss -M | grep -q "); + cmdlen = strlen(cmd); + if (!inet_ntop(addr.ss_family, raw_addr, &cmd[cmdlen], + sizeof(cmd) - cmdlen)) + xerror("inet_ntop");
shutdown(fd, SHUT_WR);
- /* while until the pending data is completely flushed, the later + /* + * wait until the pending data is completely flushed and all + * the MPTCP sockets reached the closed status. * disconnect will bypass/ignore/drop any pending data. */ for (i = 0; ; i += msec_sleep) { - if (ioctl(fd, SIOCOUTQ, &queued) < 0) - xerror("can't query out socket queue: %d", errno); - - if (!queued) + /* closed socket are not listed by 'ss' */ + if (system(cmd) != 0) break;
if (i > poll_timeout) @@ -1281,9 +1302,9 @@ again: return ret;
if (cfg_truncate > 0) { - xdisconnect(fd, peer->ai_addrlen); + xdisconnect(fd); } else if (--cfg_repeat > 0) { - xdisconnect(fd, peer->ai_addrlen); + xdisconnect(fd);
/* the socket could be unblocking at this point, we need the * connect to be blocking
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiner Kallweit hkallweit1@gmail.com
commit 6be7aca91009865d8c2b73589270224a6b6e67ab upstream.
In 4.19, before the switch to linkmode bitmaps, PHY_GBIT_FEATURES included feature bits for aneg and TP/MII ports.
SUPPORTED_TP | \ SUPPORTED_MII)
SUPPORTED_10baseT_Full)
SUPPORTED_100baseT_Full)
SUPPORTED_1000baseT_Full)
PHY_100BT_FEATURES | \ PHY_DEFAULT_FEATURES)
PHY_1000BT_FEATURES)
Referenced commit expanded PHY_GBIT_FEATURES, silently removing PHY_DEFAULT_FEATURES. The removed part can be re-added by using the new PHY_GBIT_FEATURES definition. Not clear to me is why nobody seems to have noticed this issue.
I stumbled across this when checking what it takes to make phy_10_100_features_array et al private to phylib.
Fixes: d0939c26c53a ("net: ethernet: xgbe: expand PHY_GBIT_FEAUTRES") Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Link: https://patch.msgid.link/46521973-7738-4157-9f5e-0bb6f694acba@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c | 19 ++----------------- 1 file changed, 2 insertions(+), 17 deletions(-)
--- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c @@ -923,7 +923,6 @@ static void xgbe_phy_free_phy_device(str
static bool xgbe_phy_finisar_phy_quirks(struct xgbe_prv_data *pdata) { - __ETHTOOL_DECLARE_LINK_MODE_MASK(supported) = { 0, }; struct xgbe_phy_data *phy_data = pdata->phy_data; unsigned int phy_id = phy_data->phydev->phy_id;
@@ -945,14 +944,7 @@ static bool xgbe_phy_finisar_phy_quirks( phy_write(phy_data->phydev, 0x04, 0x0d01); phy_write(phy_data->phydev, 0x00, 0x9140);
- linkmode_set_bit_array(phy_10_100_features_array, - ARRAY_SIZE(phy_10_100_features_array), - supported); - linkmode_set_bit_array(phy_gbit_features_array, - ARRAY_SIZE(phy_gbit_features_array), - supported); - - linkmode_copy(phy_data->phydev->supported, supported); + linkmode_copy(phy_data->phydev->supported, PHY_GBIT_FEATURES);
phy_support_asym_pause(phy_data->phydev);
@@ -964,7 +956,6 @@ static bool xgbe_phy_finisar_phy_quirks(
static bool xgbe_phy_belfuse_phy_quirks(struct xgbe_prv_data *pdata) { - __ETHTOOL_DECLARE_LINK_MODE_MASK(supported) = { 0, }; struct xgbe_phy_data *phy_data = pdata->phy_data; struct xgbe_sfp_eeprom *sfp_eeprom = &phy_data->sfp_eeprom; unsigned int phy_id = phy_data->phydev->phy_id; @@ -1028,13 +1019,7 @@ static bool xgbe_phy_belfuse_phy_quirks( reg = phy_read(phy_data->phydev, 0x00); phy_write(phy_data->phydev, 0x00, reg & ~0x00800);
- linkmode_set_bit_array(phy_10_100_features_array, - ARRAY_SIZE(phy_10_100_features_array), - supported); - linkmode_set_bit_array(phy_gbit_features_array, - ARRAY_SIZE(phy_gbit_features_array), - supported); - linkmode_copy(phy_data->phydev->supported, supported); + linkmode_copy(phy_data->phydev->supported, PHY_GBIT_FEATURES); phy_support_asym_pause(phy_data->phydev);
netif_dbg(pdata, drv, pdata->netdev,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefano Garzarella sgarzare@redhat.com
commit f6abafcd32f9cfc4b1a2f820ecea70773e26d423 upstream.
Some of the core functions can only be called if the transport has been assigned.
As Michal reported, a socket might have the transport at NULL, for example after a failed connect(), causing the following trace:
BUG: kernel NULL pointer dereference, address: 00000000000000a0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 12faf8067 P4D 12faf8067 PUD 113670067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 15 UID: 0 PID: 1198 Comm: a.out Not tainted 6.13.0-rc2+ RIP: 0010:vsock_connectible_has_data+0x1f/0x40 Call Trace: vsock_bpf_recvmsg+0xca/0x5e0 sock_recvmsg+0xb9/0xc0 __sys_recvfrom+0xb3/0x130 __x64_sys_recvfrom+0x20/0x30 do_syscall_64+0x93/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e
So we need to check the `vsk->transport` in vsock_bpf_recvmsg(), especially for connected sockets (stream/seqpacket) as we already do in __vsock_connectible_recvmsg().
Fixes: 634f1a7110b4 ("vsock: support sockmap") Cc: stable@vger.kernel.org Reported-by: Michal Luczaj mhal@rbox.co Closes: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/ Tested-by: Michal Luczaj mhal@rbox.co Reported-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/ Tested-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com Reviewed-by: Hyunwoo Kim v4bel@theori.io Acked-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Luigi Leonardi leonardi@redhat.com Signed-off-by: Stefano Garzarella sgarzare@redhat.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/vmw_vsock/vsock_bpf.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/net/vmw_vsock/vsock_bpf.c +++ b/net/vmw_vsock/vsock_bpf.c @@ -77,6 +77,7 @@ static int vsock_bpf_recvmsg(struct sock size_t len, int flags, int *addr_len) { struct sk_psock *psock; + struct vsock_sock *vsk; int copied;
psock = sk_psock_get(sk); @@ -84,6 +85,13 @@ static int vsock_bpf_recvmsg(struct sock return __vsock_recvmsg(sk, msg, len, flags);
lock_sock(sk); + vsk = vsock_sk(sk); + + if (!vsk->transport) { + copied = -ENODEV; + goto out; + } + if (vsock_has_data(sk, psock) && sk_psock_queue_empty(psock)) { release_sock(sk); sk_psock_put(sk, psock); @@ -108,6 +116,7 @@ static int vsock_bpf_recvmsg(struct sock copied = sk_msg_recvmsg(sk, psock, msg, len, flags); }
+out: release_sock(sk); sk_psock_put(sk, psock);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefano Garzarella sgarzare@redhat.com
commit 2cb7c756f605ec02ffe562fb26828e4bcc5fdfc1 upstream.
If the socket has been de-assigned or assigned to another transport, we must discard any packets received because they are not expected and would cause issues when we access vsk->transport.
A possible scenario is described by Hyunwoo Kim in the attached link, where after a first connect() interrupted by a signal, and a second connect() failed, we can find `vsk->transport` at NULL, leading to a NULL pointer dereference.
Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Cc: stable@vger.kernel.org Reported-by: Hyunwoo Kim v4bel@theori.io Reported-by: Wongi Lee qwerty@theori.io Closes: https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/ Signed-off-by: Stefano Garzarella sgarzare@redhat.com Reviewed-by: Hyunwoo Kim v4bel@theori.io Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/vmw_vsock/virtio_transport_common.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -1628,8 +1628,11 @@ void virtio_transport_recv_pkt(struct vi
lock_sock(sk);
- /* Check if sk has been closed before lock_sock */ - if (sock_flag(sk, SOCK_DONE)) { + /* Check if sk has been closed or assigned to another transport before + * lock_sock (note: listener sockets are not assigned to any transport) + */ + if (sock_flag(sk, SOCK_DONE) || + (sk->sk_state != TCP_LISTEN && vsk->transport != &t->transport)) { (void)virtio_transport_reset_no_sock(t, skb); release_sock(sk); sock_put(sk);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefano Garzarella sgarzare@redhat.com
commit df137da9d6d166e87e40980e36eb8e0bc90483ef upstream.
During virtio_transport_release() we can schedule a delayed work to perform the closing of the socket before destruction.
The destructor is called either when the socket is really destroyed (reference counter to zero), or it can also be called when we are de-assigning the transport.
In the former case, we are sure the delayed work has completed, because it holds a reference until it completes, so the destructor will definitely be called after the delayed work is finished. But in the latter case, the destructor is called by AF_VSOCK core, just after the release(), so there may still be delayed work scheduled.
Refactor the code, moving the code to delete the close work already in the do_close() to a new function. Invoke it during destruction to make sure we don't leave any pending work.
Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Cc: stable@vger.kernel.org Reported-by: Hyunwoo Kim v4bel@theori.io Closes: https://lore.kernel.org/netdev/Z37Sh+utS+iV3+eb@v4bel-B760M-AORUS-ELITE-AX/ Signed-off-by: Stefano Garzarella sgarzare@redhat.com Reviewed-by: Luigi Leonardi leonardi@redhat.com Tested-by: Hyunwoo Kim v4bel@theori.io Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/vmw_vsock/virtio_transport_common.c | 29 +++++++++++++++++++++-------- 1 file changed, 21 insertions(+), 8 deletions(-)
--- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -26,6 +26,9 @@ /* Threshold for detecting small packets to copy */ #define GOOD_COPY_LEN 128
+static void virtio_transport_cancel_close_work(struct vsock_sock *vsk, + bool cancel_timeout); + static const struct virtio_transport * virtio_transport_get_ops(struct vsock_sock *vsk) { @@ -1109,6 +1112,8 @@ void virtio_transport_destruct(struct vs { struct virtio_vsock_sock *vvs = vsk->trans;
+ virtio_transport_cancel_close_work(vsk, true); + kfree(vvs); vsk->trans = NULL; } @@ -1204,17 +1209,11 @@ static void virtio_transport_wait_close( } }
-static void virtio_transport_do_close(struct vsock_sock *vsk, - bool cancel_timeout) +static void virtio_transport_cancel_close_work(struct vsock_sock *vsk, + bool cancel_timeout) { struct sock *sk = sk_vsock(vsk);
- sock_set_flag(sk, SOCK_DONE); - vsk->peer_shutdown = SHUTDOWN_MASK; - if (vsock_stream_has_data(vsk) <= 0) - sk->sk_state = TCP_CLOSING; - sk->sk_state_change(sk); - if (vsk->close_work_scheduled && (!cancel_timeout || cancel_delayed_work(&vsk->close_work))) { vsk->close_work_scheduled = false; @@ -1226,6 +1225,20 @@ static void virtio_transport_do_close(st } }
+static void virtio_transport_do_close(struct vsock_sock *vsk, + bool cancel_timeout) +{ + struct sock *sk = sk_vsock(vsk); + + sock_set_flag(sk, SOCK_DONE); + vsk->peer_shutdown = SHUTDOWN_MASK; + if (vsock_stream_has_data(vsk) <= 0) + sk->sk_state = TCP_CLOSING; + sk->sk_state_change(sk); + + virtio_transport_cancel_close_work(vsk, cancel_timeout); +} + static void virtio_transport_close_timeout(struct work_struct *work) { struct vsock_sock *vsk =
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefano Garzarella sgarzare@redhat.com
commit a24009bc9be60242651a21702609381b5092459e upstream.
Transport's release() and destruct() are called when de-assigning the vsock transport. These callbacks can touch some socket state like sock flags, sk_state, and peer_shutdown.
Since we are reassigning the socket to a new transport during vsock_connect(), let's reset these fields to have a clean state with the new transport.
Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella sgarzare@redhat.com Reviewed-by: Luigi Leonardi leonardi@redhat.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/vmw_vsock/af_vsock.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -491,6 +491,15 @@ int vsock_assign_transport(struct vsock_ */ vsk->transport->release(vsk); vsock_deassign_transport(vsk); + + /* transport's release() and destruct() can touch some socket + * state, since we are reassigning the socket to a new transport + * during vsock_connect(), let's reset these fields to have a + * clean state. + */ + sock_reset_flag(sk, SOCK_DONE); + sk->sk_state = TCP_CLOSE; + vsk->peer_shutdown = 0; }
/* We increase the module refcnt to prevent the transport unloading
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefano Garzarella sgarzare@redhat.com
commit 91751e248256efc111e52e15115840c35d85abaf upstream.
Recent reports have shown how we sometimes call vsock_*_has_data() when a vsock socket has been de-assigned from a transport (see attached links), but we shouldn't.
Previous commits should have solved the real problems, but we may have more in the future, so to avoid null-ptr-deref, we can return 0 (no space, no data available) but with a warning.
This way the code should continue to run in a nearly consistent state and have a warning that allows us to debug future problems.
Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/netdev/Z2K%2FI4nlHdfMRTZC@v4bel-B760M-AORUS-ELITE-AX... Link: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/ Link: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/ Co-developed-by: Hyunwoo Kim v4bel@theori.io Signed-off-by: Hyunwoo Kim v4bel@theori.io Co-developed-by: Wongi Lee qwerty@theori.io Signed-off-by: Wongi Lee qwerty@theori.io Signed-off-by: Stefano Garzarella sgarzare@redhat.com Reviewed-by: Luigi Leonardi leonardi@redhat.com Reviewed-by: Hyunwoo Kim v4bel@theori.io Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/vmw_vsock/af_vsock.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -879,6 +879,9 @@ EXPORT_SYMBOL_GPL(vsock_create_connected
s64 vsock_stream_has_data(struct vsock_sock *vsk) { + if (WARN_ON(!vsk->transport)) + return 0; + return vsk->transport->stream_has_data(vsk); } EXPORT_SYMBOL_GPL(vsock_stream_has_data); @@ -887,6 +890,9 @@ s64 vsock_connectible_has_data(struct vs { struct sock *sk = sk_vsock(vsk);
+ if (WARN_ON(!vsk->transport)) + return 0; + if (sk->sk_type == SOCK_SEQPACKET) return vsk->transport->seqpacket_has_data(vsk); else @@ -896,6 +902,9 @@ EXPORT_SYMBOL_GPL(vsock_connectible_has_
s64 vsock_stream_has_space(struct vsock_sock *vsk) { + if (WARN_ON(!vsk->transport)) + return 0; + return vsk->transport->stream_has_space(vsk); } EXPORT_SYMBOL_GPL(vsock_stream_has_space);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Airlie airlied@redhat.com
commit 1f9910b41c857a892b83801feebdc7bdf38c5985 upstream.
The fence sync logic doesn't handle a fence sync across devices as it tries to write to a channel offset from one device into the fence bo from a different device, which won't work so well.
This patch fixes that to avoid using the sync path in the case where the fences come from different nouveau drm devices.
This works fine on a single device as the fence bo is shared across the devices, and mapped into each channels vma space, the channel offsets are therefore okay to pass between sides, so one channel can sync on the seqnos from the other by using the offset into it's vma.
Signed-off-by: Dave Airlie airlied@redhat.com Cc: stable@vger.kernel.org Reviewed-by: Ben Skeggs bskeggs@nvidia.com [ Fix compilation issue; remove version log from commit messsage. - Danilo ] Signed-off-by: Danilo Krummrich dakr@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20250109005553.623947-1-airlie... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/nouveau/nouveau_fence.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c @@ -387,11 +387,13 @@ nouveau_fence_sync(struct nouveau_bo *nv if (f) { struct nouveau_channel *prev; bool must_wait = true; + bool local;
rcu_read_lock(); prev = rcu_dereference(f->channel); - if (prev && (prev == chan || - fctx->sync(f, prev, chan) == 0)) + local = prev && prev->cli->drm == chan->cli->drm; + if (local && (prev == chan || + fctx->sync(f, prev, chan) == 0)) must_wait = false; rcu_read_unlock(); if (!must_wait)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 35243fc777566ccb3370e175cf591fea0f81f68c upstream.
Macbook 5,1 with MCP79 lost its backlight control since the recent change for supporting GSP-RM; it rewrote the whole nv50 backlight control code and each display engine is supposed to have an entry for IOR bl callback, but it didn't cover mcp77.
This patch adds the missing bl entry initialization for mcp77 display engine to recover the backlight control.
Fixes: 2274ce7e3681 ("drm/nouveau/disp: add output backlight control methods") Cc: stable@vger.kernel.org Link: https://bugzilla.suse.com/show_bug.cgi?id=1223838 Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Danilo Krummrich dakr@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20250102114944.11499-1-tiwai@s... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/nouveau/nvkm/engine/disp/mcp77.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/mcp77.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/mcp77.c @@ -31,6 +31,7 @@ mcp77_sor = { .state = g94_sor_state, .power = nv50_sor_power, .clock = nv50_sor_clock, + .bl = &nv50_sor_bl, .hdmi = &g84_sor_hdmi, .dp = &g94_sor_dp, };
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Fertser fercerpav@gmail.com
commit 9e2bbab94b88295dcc57c7580393c9ee08d7314d upstream.
Obtaining RTNL lock in a response handler is not allowed since it runs in an atomic softirq context. Postpone setting the MAC address by adding a dedicated step to the configuration FSM.
Fixes: 790071347a0a ("net/ncsi: change from ndo_set_mac_address to dev_set_mac_address") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20241129-potin-revert-ncsi-set-mac-addr-v1-1-94ea2cb... Signed-off-by: Paul Fertser fercerpav@gmail.com Tested-by: Potin Lai potin.lai.pt@gmail.com Link: https://patch.msgid.link/20250109145054.30925-1-fercerpav@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ncsi/internal.h | 2 ++ net/ncsi/ncsi-manage.c | 16 ++++++++++++++-- net/ncsi/ncsi-rsp.c | 19 ++++++------------- 3 files changed, 22 insertions(+), 15 deletions(-)
--- a/net/ncsi/internal.h +++ b/net/ncsi/internal.h @@ -289,6 +289,7 @@ enum { ncsi_dev_state_config_sp = 0x0301, ncsi_dev_state_config_cis, ncsi_dev_state_config_oem_gma, + ncsi_dev_state_config_apply_mac, ncsi_dev_state_config_clear_vids, ncsi_dev_state_config_svf, ncsi_dev_state_config_ev, @@ -322,6 +323,7 @@ struct ncsi_dev_priv { #define NCSI_DEV_RESHUFFLE 4 #define NCSI_DEV_RESET 8 /* Reset state of NC */ unsigned int gma_flag; /* OEM GMA flag */ + struct sockaddr pending_mac; /* MAC address received from GMA */ spinlock_t lock; /* Protect the NCSI device */ unsigned int package_probe_id;/* Current ID during probe */ unsigned int package_num; /* Number of packages */ --- a/net/ncsi/ncsi-manage.c +++ b/net/ncsi/ncsi-manage.c @@ -1038,7 +1038,7 @@ static void ncsi_configure_channel(struc : ncsi_dev_state_config_clear_vids; break; case ncsi_dev_state_config_oem_gma: - nd->state = ncsi_dev_state_config_clear_vids; + nd->state = ncsi_dev_state_config_apply_mac;
nca.package = np->id; nca.channel = nc->id; @@ -1050,10 +1050,22 @@ static void ncsi_configure_channel(struc nca.type = NCSI_PKT_CMD_OEM; ret = ncsi_gma_handler(&nca, nc->version.mf_id); } - if (ret < 0) + if (ret < 0) { + nd->state = ncsi_dev_state_config_clear_vids; schedule_work(&ndp->work); + }
break; + case ncsi_dev_state_config_apply_mac: + rtnl_lock(); + ret = dev_set_mac_address(dev, &ndp->pending_mac, NULL); + rtnl_unlock(); + if (ret < 0) + netdev_warn(dev, "NCSI: 'Writing MAC address to device failed\n"); + + nd->state = ncsi_dev_state_config_clear_vids; + + fallthrough; case ncsi_dev_state_config_clear_vids: case ncsi_dev_state_config_svf: case ncsi_dev_state_config_ev: --- a/net/ncsi/ncsi-rsp.c +++ b/net/ncsi/ncsi-rsp.c @@ -628,16 +628,14 @@ static int ncsi_rsp_handler_snfc(struct static int ncsi_rsp_handler_oem_gma(struct ncsi_request *nr, int mfr_id) { struct ncsi_dev_priv *ndp = nr->ndp; + struct sockaddr *saddr = &ndp->pending_mac; struct net_device *ndev = ndp->ndev.dev; struct ncsi_rsp_oem_pkt *rsp; - struct sockaddr saddr; u32 mac_addr_off = 0; - int ret = 0;
/* Get the response header */ rsp = (struct ncsi_rsp_oem_pkt *)skb_network_header(nr->rsp);
- saddr.sa_family = ndev->type; ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE; if (mfr_id == NCSI_OEM_MFR_BCM_ID) mac_addr_off = BCM_MAC_ADDR_OFFSET; @@ -646,22 +644,17 @@ static int ncsi_rsp_handler_oem_gma(stru else if (mfr_id == NCSI_OEM_MFR_INTEL_ID) mac_addr_off = INTEL_MAC_ADDR_OFFSET;
- memcpy(saddr.sa_data, &rsp->data[mac_addr_off], ETH_ALEN); + saddr->sa_family = ndev->type; + memcpy(saddr->sa_data, &rsp->data[mac_addr_off], ETH_ALEN); if (mfr_id == NCSI_OEM_MFR_BCM_ID || mfr_id == NCSI_OEM_MFR_INTEL_ID) - eth_addr_inc((u8 *)saddr.sa_data); - if (!is_valid_ether_addr((const u8 *)saddr.sa_data)) + eth_addr_inc((u8 *)saddr->sa_data); + if (!is_valid_ether_addr((const u8 *)saddr->sa_data)) return -ENXIO;
/* Set the flag for GMA command which should only be called once */ ndp->gma_flag = 1;
- rtnl_lock(); - ret = dev_set_mac_address(ndev, &saddr, NULL); - rtnl_unlock(); - if (ret < 0) - netdev_warn(ndev, "NCSI: 'Writing mac address to device failed\n"); - - return ret; + return 0; }
/* Response handler for Mellanox card */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marco Nelissen marco.nelissen@gmail.com
commit f505e6c91e7a22d10316665a86d79f84d9f0ba76 upstream.
On 32-bit kernels, folio_seek_hole_data() was inadvertently truncating a 64-bit value to 32 bits, leading to a possible infinite loop when writing to an xfs filesystem.
Link: https://lkml.kernel.org/r/20250102190540.1356838-1-marco.nelissen@gmail.com Fixes: 54fa39ac2e00 ("iomap: use mapping_seek_hole_data") Signed-off-by: Marco Nelissen marco.nelissen@gmail.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/filemap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/filemap.c +++ b/mm/filemap.c @@ -3004,7 +3004,7 @@ static inline loff_t folio_seek_hole_dat if (ops->is_partially_uptodate(folio, offset, bsz) == seek_data) break; - start = (start + bsz) & ~(bsz - 1); + start = (start + bsz) & ~((u64)bsz - 1); offset += bsz; } while (offset < folio_size(folio)); unlock:
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rik van Riel riel@surriel.com
commit cbc5dde0a461240046e8a41c43d7c3b76d5db952 upstream.
Since commit 5cbcb62dddf5 ("fs/proc: fix softlockup in __read_vmcore") the number of softlockups in __read_vmcore at kdump time have gone down, but they still happen sometimes.
In a memory constrained environment like the kdump image, a softlockup is not just a harmless message, but it can interfere with things like RCU freeing memory, causing the crashdump to get stuck.
The second loop in __read_vmcore has a lot more opportunities for natural sleep points, like scheduling out while waiting for a data write to happen, but apparently that is not always enough.
Add a cond_resched() to the second loop in __read_vmcore to (hopefully) get rid of the softlockups.
Link: https://lkml.kernel.org/r/20250110102821.2a37581b@fangorn Fixes: 5cbcb62dddf5 ("fs/proc: fix softlockup in __read_vmcore") Signed-off-by: Rik van Riel riel@surriel.com Reported-by: Breno Leitao leitao@debian.org Cc: Baoquan He bhe@redhat.com Cc: Dave Young dyoung@redhat.com Cc: Vivek Goyal vgoyal@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/proc/vmcore.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -404,6 +404,8 @@ static ssize_t __read_vmcore(struct iov_ if (!iov_iter_count(iter)) return acc; } + + cond_resched(); }
return acc;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Anderson sean.anderson@linux.dev
commit 9860370c2172704b6b4f0075a0c2a29fd84af96a upstream.
irq_chip functions may be called in raw spinlock context. Therefore, we must also use a raw spinlock for our own internal locking.
This fixes the following lockdep splat:
[ 5.349336] ============================= [ 5.353349] [ BUG: Invalid wait context ] [ 5.357361] 6.13.0-rc5+ #69 Tainted: G W [ 5.363031] ----------------------------- [ 5.367045] kworker/u17:1/44 is trying to lock: [ 5.371587] ffffff88018b02c0 (&chip->gpio_lock){....}-{3:3}, at: xgpio_irq_unmask (drivers/gpio/gpio-xilinx.c:433 (discriminator 8)) [ 5.380079] other info that might help us debug this: [ 5.385138] context-{5:5} [ 5.387762] 5 locks held by kworker/u17:1/44: [ 5.392123] #0: ffffff8800014958 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work (kernel/workqueue.c:3204) [ 5.402260] #1: ffffffc082fcbdd8 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work (kernel/workqueue.c:3205) [ 5.411528] #2: ffffff880172c900 (&dev->mutex){....}-{4:4}, at: __device_attach (drivers/base/dd.c:1006) [ 5.419929] #3: ffffff88039c8268 (request_class#2){+.+.}-{4:4}, at: __setup_irq (kernel/irq/internals.h:156 kernel/irq/manage.c:1596) [ 5.428331] #4: ffffff88039c80c8 (lock_class#2){....}-{2:2}, at: __setup_irq (kernel/irq/manage.c:1614) [ 5.436472] stack backtrace: [ 5.439359] CPU: 2 UID: 0 PID: 44 Comm: kworker/u17:1 Tainted: G W 6.13.0-rc5+ #69 [ 5.448690] Tainted: [W]=WARN [ 5.451656] Hardware name: xlnx,zynqmp (DT) [ 5.455845] Workqueue: events_unbound deferred_probe_work_func [ 5.461699] Call trace: [ 5.464147] show_stack+0x18/0x24 C [ 5.467821] dump_stack_lvl (lib/dump_stack.c:123) [ 5.471501] dump_stack (lib/dump_stack.c:130) [ 5.474824] __lock_acquire (kernel/locking/lockdep.c:4828 kernel/locking/lockdep.c:4898 kernel/locking/lockdep.c:5176) [ 5.478758] lock_acquire (arch/arm64/include/asm/percpu.h:40 kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5851 kernel/locking/lockdep.c:5814) [ 5.482429] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162) [ 5.486797] xgpio_irq_unmask (drivers/gpio/gpio-xilinx.c:433 (discriminator 8)) [ 5.490737] irq_enable (kernel/irq/internals.h:236 kernel/irq/chip.c:170 kernel/irq/chip.c:439 kernel/irq/chip.c:432 kernel/irq/chip.c:345) [ 5.494060] __irq_startup (kernel/irq/internals.h:241 kernel/irq/chip.c:180 kernel/irq/chip.c:250) [ 5.497645] irq_startup (kernel/irq/chip.c:270) [ 5.501143] __setup_irq (kernel/irq/manage.c:1807) [ 5.504728] request_threaded_irq (kernel/irq/manage.c:2208)
Fixes: a32c7caea292 ("gpio: gpio-xilinx: Add interrupt support") Signed-off-by: Sean Anderson sean.anderson@linux.dev Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250110163354.2012654-1-sean.anderson@linux.dev Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpio-xilinx.c | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-)
--- a/drivers/gpio/gpio-xilinx.c +++ b/drivers/gpio/gpio-xilinx.c @@ -65,7 +65,7 @@ struct xgpio_instance { DECLARE_BITMAP(state, 64); DECLARE_BITMAP(last_irq_read, 64); DECLARE_BITMAP(dir, 64); - spinlock_t gpio_lock; /* For serializing operations */ + raw_spinlock_t gpio_lock; /* For serializing operations */ int irq; DECLARE_BITMAP(enable, 64); DECLARE_BITMAP(rising_edge, 64); @@ -179,14 +179,14 @@ static void xgpio_set(struct gpio_chip * struct xgpio_instance *chip = gpiochip_get_data(gc); int bit = xgpio_to_bit(chip, gpio);
- spin_lock_irqsave(&chip->gpio_lock, flags); + raw_spin_lock_irqsave(&chip->gpio_lock, flags);
/* Write to GPIO signal and set its direction to output */ __assign_bit(bit, chip->state, val);
xgpio_write_ch(chip, XGPIO_DATA_OFFSET, bit, chip->state);
- spin_unlock_irqrestore(&chip->gpio_lock, flags); + raw_spin_unlock_irqrestore(&chip->gpio_lock, flags); }
/** @@ -210,7 +210,7 @@ static void xgpio_set_multiple(struct gp bitmap_remap(hw_mask, mask, chip->sw_map, chip->hw_map, 64); bitmap_remap(hw_bits, bits, chip->sw_map, chip->hw_map, 64);
- spin_lock_irqsave(&chip->gpio_lock, flags); + raw_spin_lock_irqsave(&chip->gpio_lock, flags);
bitmap_replace(state, chip->state, hw_bits, hw_mask, 64);
@@ -218,7 +218,7 @@ static void xgpio_set_multiple(struct gp
bitmap_copy(chip->state, state, 64);
- spin_unlock_irqrestore(&chip->gpio_lock, flags); + raw_spin_unlock_irqrestore(&chip->gpio_lock, flags); }
/** @@ -236,13 +236,13 @@ static int xgpio_dir_in(struct gpio_chip struct xgpio_instance *chip = gpiochip_get_data(gc); int bit = xgpio_to_bit(chip, gpio);
- spin_lock_irqsave(&chip->gpio_lock, flags); + raw_spin_lock_irqsave(&chip->gpio_lock, flags);
/* Set the GPIO bit in shadow register and set direction as input */ __set_bit(bit, chip->dir); xgpio_write_ch(chip, XGPIO_TRI_OFFSET, bit, chip->dir);
- spin_unlock_irqrestore(&chip->gpio_lock, flags); + raw_spin_unlock_irqrestore(&chip->gpio_lock, flags);
return 0; } @@ -265,7 +265,7 @@ static int xgpio_dir_out(struct gpio_chi struct xgpio_instance *chip = gpiochip_get_data(gc); int bit = xgpio_to_bit(chip, gpio);
- spin_lock_irqsave(&chip->gpio_lock, flags); + raw_spin_lock_irqsave(&chip->gpio_lock, flags);
/* Write state of GPIO signal */ __assign_bit(bit, chip->state, val); @@ -275,7 +275,7 @@ static int xgpio_dir_out(struct gpio_chi __clear_bit(bit, chip->dir); xgpio_write_ch(chip, XGPIO_TRI_OFFSET, bit, chip->dir);
- spin_unlock_irqrestore(&chip->gpio_lock, flags); + raw_spin_unlock_irqrestore(&chip->gpio_lock, flags);
return 0; } @@ -398,7 +398,7 @@ static void xgpio_irq_mask(struct irq_da int bit = xgpio_to_bit(chip, irq_offset); u32 mask = BIT(bit / 32), temp;
- spin_lock_irqsave(&chip->gpio_lock, flags); + raw_spin_lock_irqsave(&chip->gpio_lock, flags);
__clear_bit(bit, chip->enable);
@@ -408,7 +408,7 @@ static void xgpio_irq_mask(struct irq_da temp &= ~mask; xgpio_writereg(chip->regs + XGPIO_IPIER_OFFSET, temp); } - spin_unlock_irqrestore(&chip->gpio_lock, flags); + raw_spin_unlock_irqrestore(&chip->gpio_lock, flags);
gpiochip_disable_irq(&chip->gc, irq_offset); } @@ -428,7 +428,7 @@ static void xgpio_irq_unmask(struct irq_
gpiochip_enable_irq(&chip->gc, irq_offset);
- spin_lock_irqsave(&chip->gpio_lock, flags); + raw_spin_lock_irqsave(&chip->gpio_lock, flags);
__set_bit(bit, chip->enable);
@@ -447,7 +447,7 @@ static void xgpio_irq_unmask(struct irq_ xgpio_writereg(chip->regs + XGPIO_IPIER_OFFSET, val); }
- spin_unlock_irqrestore(&chip->gpio_lock, flags); + raw_spin_unlock_irqrestore(&chip->gpio_lock, flags); }
/** @@ -512,7 +512,7 @@ static void xgpio_irqhandler(struct irq_
chained_irq_enter(irqchip, desc);
- spin_lock(&chip->gpio_lock); + raw_spin_lock(&chip->gpio_lock);
xgpio_read_ch_all(chip, XGPIO_DATA_OFFSET, all);
@@ -529,7 +529,7 @@ static void xgpio_irqhandler(struct irq_ bitmap_copy(chip->last_irq_read, all, 64); bitmap_or(all, rising, falling, 64);
- spin_unlock(&chip->gpio_lock); + raw_spin_unlock(&chip->gpio_lock);
dev_dbg(gc->parent, "IRQ rising %*pb falling %*pb\n", 64, rising, 64, falling);
@@ -620,7 +620,7 @@ static int xgpio_probe(struct platform_d bitmap_set(chip->hw_map, 0, width[0]); bitmap_set(chip->hw_map, 32, width[1]);
- spin_lock_init(&chip->gpio_lock); + raw_spin_lock_init(&chip->gpio_lock);
chip->gc.base = -1; chip->gc.ngpio = bitmap_weight(chip->hw_map, 64);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Suren Baghdasaryan surenb@google.com
commit 4bbb6df62c54e6a2c1fcce4908df768f0cfa1e91 upstream.
Currently vma test is failing because of the new vma_assert_attached() assertion. The check is failing because previous refcount_set() inside vma_mark_attached() is a NoOp. Fix the definition of atomic_set() to correctly set the value of the atomic.
Link: https://lkml.kernel.org/r/20241227222220.1726384-1-surenb@google.com Fixes: 9325b8b5a1cb ("tools: add skeleton code for userland testing of VMA logic") Signed-off-by: Suren Baghdasaryan surenb@google.com Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com Cc: Jann Horn jannh@google.com Cc: Liam R. Howlett Liam.Howlett@Oracle.com Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/shared/linux/maple_tree.h | 2 +- tools/testing/vma/linux/atomic.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/tools/testing/shared/linux/maple_tree.h +++ b/tools/testing/shared/linux/maple_tree.h @@ -2,6 +2,6 @@ #define atomic_t int32_t #define atomic_inc(x) uatomic_inc(x) #define atomic_read(x) uatomic_read(x) -#define atomic_set(x, y) do {} while (0) +#define atomic_set(x, y) uatomic_set(x, y) #define U8_MAX UCHAR_MAX #include "../../../../include/linux/maple_tree.h" --- a/tools/testing/vma/linux/atomic.h +++ b/tools/testing/vma/linux/atomic.h @@ -6,7 +6,7 @@ #define atomic_t int32_t #define atomic_inc(x) uatomic_inc(x) #define atomic_read(x) uatomic_read(x) -#define atomic_set(x, y) do {} while (0) +#define atomic_set(x, y) uatomic_set(x, y) #define U8_MAX UCHAR_MAX
#endif /* _LINUX_ATOMIC_H */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiaolei Wang xiaolei.wang@windriver.com
commit 726efa92e02b460811e8bc6990dd742f03b645ea upstream.
Currently imx8mp_blk_ctrl_remove() will continue the for loop until an out-of-bounds exception occurs.
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : dev_pm_domain_detach+0x8/0x48 lr : imx8mp_blk_ctrl_shutdown+0x58/0x90 sp : ffffffc084f8bbf0 x29: ffffffc084f8bbf0 x28: ffffff80daf32ac0 x27: 0000000000000000 x26: ffffffc081658d78 x25: 0000000000000001 x24: ffffffc08201b028 x23: ffffff80d0db9490 x22: ffffffc082340a78 x21: 00000000000005b0 x20: ffffff80d19bc180 x19: 000000000000000a x18: ffffffffffffffff x17: ffffffc080a39e08 x16: ffffffc080a39c98 x15: 4f435f464f006c72 x14: 0000000000000004 x13: ffffff80d0172110 x12: 0000000000000000 x11: ffffff80d0537740 x10: ffffff80d05376c0 x9 : ffffffc0808ed2d8 x8 : ffffffc084f8bab0 x7 : 0000000000000000 x6 : 0000000000000000 x5 : ffffff80d19b9420 x4 : fffffffe03466e60 x3 : 0000000080800077 x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000000000 Call trace: dev_pm_domain_detach+0x8/0x48 platform_shutdown+0x2c/0x48 device_shutdown+0x158/0x268 kernel_restart_prepare+0x40/0x58 kernel_kexec+0x58/0xe8 __do_sys_reboot+0x198/0x258 __arm64_sys_reboot+0x2c/0x40 invoke_syscall+0x5c/0x138 el0_svc_common.constprop.0+0x48/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x38/0xc8 el0t_64_sync_handler+0x120/0x130 el0t_64_sync+0x190/0x198 Code: 8128c2d0 ffffffc0 aa1e03e9 d503201f
Fixes: 556f5cf9568a ("soc: imx: add i.MX8MP HSIO blk-ctrl") Cc: stable@vger.kernel.org Signed-off-by: Xiaolei Wang xiaolei.wang@windriver.com Reviewed-by: Lucas Stach l.stach@pengutronix.de Reviewed-by: Fabio Estevam festevam@gmail.com Reviewed-by: Frank Li Frank.Li@nxp.com Link: https://lore.kernel.org/r/20250115014118.4086729-1-xiaolei.wang@windriver.co... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pmdomain/imx/imx8mp-blk-ctrl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pmdomain/imx/imx8mp-blk-ctrl.c +++ b/drivers/pmdomain/imx/imx8mp-blk-ctrl.c @@ -770,7 +770,7 @@ static void imx8mp_blk_ctrl_remove(struc
of_genpd_del_provider(pdev->dev.of_node);
- for (i = 0; bc->onecell_data.num_domains; i++) { + for (i = 0; i < bc->onecell_data.num_domains; i++) { struct imx8mp_blk_ctrl_domain *domain = &bc->domains[i];
pm_genpd_remove(&domain->genpd);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guo Weikang guoweikang.kernel@gmail.com
commit 76d5d4c53e68719c018691b19a961e78524a155c upstream.
kmemleak_alloc_percpu gives an incorrect min_count parameter, causing percpu memory to be considered a gray object.
Link: https://lkml.kernel.org/r/20241227092311.3572500-1-guoweikang.kernel@gmail.c... Fixes: 8c8685928910 ("mm/kmemleak: use IS_ERR_PCPU() for pointer in the percpu address space") Signed-off-by: Guo Weikang guoweikang.kernel@gmail.com Acked-by: Uros Bizjak ubizjak@gmail.com Acked-by: Catalin Marinas catalin.marinas@arm.com Cc: Guo Weikang guoweikang.kernel@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/kmemleak.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -1071,7 +1071,7 @@ void __ref kmemleak_alloc_percpu(const v pr_debug("%s(0x%px, %zu)\n", __func__, ptr, size);
if (kmemleak_enabled && ptr && !IS_ERR_PCPU(ptr)) - create_object_percpu((__force unsigned long)ptr, size, 0, gfp); + create_object_percpu((__force unsigned long)ptr, size, 1, gfp); } EXPORT_SYMBOL_GPL(kmemleak_alloc_percpu);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryan Roberts ryan.roberts@arm.com
commit a32bf5bb7933fde6f39747499f8ec232b5b5400f upstream.
After commit b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp"), cow test cases involving swapping out THPs via madvise(MADV_PAGEOUT) started to be skipped due to the subsequent check via pagemap determining that the memory was not actually swapped out. Logs similar to this were emitted:
...
# [RUN] Basic COW after fork() ... with swapped-out, PTE-mapped THP (16 kB) ok 2 # SKIP MADV_PAGEOUT did not work, is swap enabled? # [RUN] Basic COW after fork() ... with single PTE of swapped-out THP (16 kB) ok 3 # SKIP MADV_PAGEOUT did not work, is swap enabled? # [RUN] Basic COW after fork() ... with swapped-out, PTE-mapped THP (32 kB) ok 4 # SKIP MADV_PAGEOUT did not work, is swap enabled?
...
The commit in question introduces the behaviour of scanning THPs and if their content is predominantly zero, it splits them and replaces the pages which are wholly zero with the zero page. These cow test cases were getting caught up in this.
So let's avoid that by filling the contents of all allocated memory with a non-zero value. With this in place, the tests are passing again.
Link: https://lkml.kernel.org/r/20250107142555.1870101-1-ryan.roberts@arm.com Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp") Signed-off-by: Ryan Roberts ryan.roberts@arm.com Acked-by: David Hildenbrand david@redhat.com Cc: Usama Arif usamaarif642@gmail.com Cc: Yu Zhao yuzhao@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/mm/cow.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/tools/testing/selftests/mm/cow.c +++ b/tools/testing/selftests/mm/cow.c @@ -758,7 +758,7 @@ static void do_run_with_base_page(test_f }
/* Populate a base page. */ - memset(mem, 0, pagesize); + memset(mem, 1, pagesize);
if (swapout) { madvise(mem, pagesize, MADV_PAGEOUT); @@ -824,12 +824,12 @@ static void do_run_with_thp(test_fn fn, * Try to populate a THP. Touch the first sub-page and test if * we get the last sub-page populated automatically. */ - mem[0] = 0; + mem[0] = 1; if (!pagemap_is_populated(pagemap_fd, mem + thpsize - pagesize)) { ksft_test_result_skip("Did not get a THP populated\n"); goto munmap; } - memset(mem, 0, thpsize); + memset(mem, 1, thpsize);
size = thpsize; switch (thp_run) { @@ -1012,7 +1012,7 @@ static void run_with_hugetlb(test_fn fn, }
/* Populate an huge page. */ - memset(mem, 0, hugetlbsize); + memset(mem, 1, hugetlbsize);
/* * We need a total of two hugetlb pages to handle COW/unsharing
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Leo Li sunpeng.li@amd.com
commit 35ca53b7b0f0ffd16c6675fd76abac9409cf83e0 upstream.
[Why]
There should not be any need to revalidate bandwidth on memory placement change, since the fb is expected to be pinned to DCN-accessable memory before scanout. For APU it's DRAM, and DGPU, it's VRAM. However, async flips + memory type change needs to be rejected.
[How]
Do not set lock_and_validation_needed on mem_type change. Instead, reject an async_flip request if the crtc's buffer(s) changed mem_type.
This may fix stuttering/corruption experienced with PSR SU and PSR1 panels, if the compositor allocates fbs in both VRAM carveout and GTT and flips between them.
Fixes: a7c0cad0dc06 ("drm/amd/display: ensure async flips are only accepted for fast updates") Reviewed-by: Tom Chung chiahsuan.chung@amd.com Signed-off-by: Leo Li sunpeng.li@amd.com Signed-off-by: Tom Chung chiahsuan.chung@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 4caacd1671b7a013ad04cd8b6398f002540bdd4d) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 29 +++++++++++++++++----- 1 file changed, 23 insertions(+), 6 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -11370,6 +11370,25 @@ static int dm_crtc_get_cursor_mode(struc return 0; }
+static bool amdgpu_dm_crtc_mem_type_changed(struct drm_device *dev, + struct drm_atomic_state *state, + struct drm_crtc_state *crtc_state) +{ + struct drm_plane *plane; + struct drm_plane_state *new_plane_state, *old_plane_state; + + drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask) { + new_plane_state = drm_atomic_get_plane_state(state, plane); + old_plane_state = drm_atomic_get_plane_state(state, plane); + + if (old_plane_state->fb && new_plane_state->fb && + get_mem_type(old_plane_state->fb) != get_mem_type(new_plane_state->fb)) + return true; + } + + return false; +} + /** * amdgpu_dm_atomic_check() - Atomic check implementation for AMDgpu DM. * @@ -11567,10 +11586,6 @@ static int amdgpu_dm_atomic_check(struct
/* Remove exiting planes if they are modified */ for_each_oldnew_plane_in_descending_zpos(state, plane, old_plane_state, new_plane_state) { - if (old_plane_state->fb && new_plane_state->fb && - get_mem_type(old_plane_state->fb) != - get_mem_type(new_plane_state->fb)) - lock_and_validation_needed = true;
ret = dm_update_plane_state(dc, state, plane, old_plane_state, @@ -11865,9 +11880,11 @@ static int amdgpu_dm_atomic_check(struct
/* * Only allow async flips for fast updates that don't change - * the FB pitch, the DCC state, rotation, etc. + * the FB pitch, the DCC state, rotation, mem_type, etc. */ - if (new_crtc_state->async_flip && lock_and_validation_needed) { + if (new_crtc_state->async_flip && + (lock_and_validation_needed || + amdgpu_dm_crtc_mem_type_changed(dev, state, new_crtc_state))) { drm_dbg_atomic(crtc->dev, "[CRTC:%d:%s] async flips are only supported for fast updates\n", crtc->base.id, crtc->name);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryan Roberts ryan.roberts@arm.com
commit 0cef0bb836e3cfe00f08f9606c72abd72fe78ca3 upstream.
When mremap()ing a memory region previously registered with userfaultfd as write-protected but without UFFD_FEATURE_EVENT_REMAP, an inconsistency in flag clearing leads to a mismatch between the vma flags (which have uffd-wp cleared) and the pte/pmd flags (which do not have uffd-wp cleared). This mismatch causes a subsequent mprotect(PROT_WRITE) to trigger a warning in page_table_check_pte_flags() due to setting the pte to writable while uffd-wp is still set.
Fix this by always explicitly clearing the uffd-wp pte/pmd flags on any such mremap() so that the values are consistent with the existing clearing of VM_UFFD_WP. Be careful to clear the logical flag regardless of its physical form; a PTE bit, a swap PTE bit, or a PTE marker. Cover PTE, huge PMD and hugetlb paths.
Link: https://lkml.kernel.org/r/20250107144755.1871363-2-ryan.roberts@arm.com Co-developed-by: Mikołaj Lenczewski miko.lenczewski@arm.com Signed-off-by: Mikołaj Lenczewski miko.lenczewski@arm.com Signed-off-by: Ryan Roberts ryan.roberts@arm.com Closes: https://lore.kernel.org/linux-mm/810b44a8-d2ae-4107-b665-5a42eae2d948@arm.co... Fixes: 63b2d4174c4a ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl") Cc: David Hildenbrand david@redhat.com Cc: Jann Horn jannh@google.com Cc: Liam R. Howlett Liam.Howlett@Oracle.com Cc: Lorenzo Stoakes lorenzo.stoakes@oracle.com Cc: Mark Rutland mark.rutland@arm.com Cc: Muchun Song muchun.song@linux.dev Cc: Peter Xu peterx@redhat.com Cc: Shuah Khan shuah@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/userfaultfd_k.h | 12 ++++++++++++ mm/huge_memory.c | 12 ++++++++++++ mm/hugetlb.c | 14 +++++++++++++- mm/mremap.c | 32 +++++++++++++++++++++++++++++++- 4 files changed, 68 insertions(+), 2 deletions(-)
--- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -247,6 +247,13 @@ static inline bool vma_can_userfault(str vma_is_shmem(vma); }
+static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct *vma) +{ + struct userfaultfd_ctx *uffd_ctx = vma->vm_userfaultfd_ctx.ctx; + + return uffd_ctx && (uffd_ctx->features & UFFD_FEATURE_EVENT_REMAP) == 0; +} + extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *); extern void dup_userfaultfd_complete(struct list_head *); void dup_userfaultfd_fail(struct list_head *); @@ -401,6 +408,11 @@ static inline bool userfaultfd_wp_async( { return false; } + +static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct *vma) +{ + return false; +}
#endif /* CONFIG_USERFAULTFD */
--- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2132,6 +2132,16 @@ static pmd_t move_soft_dirty_pmd(pmd_t p return pmd; }
+static pmd_t clear_uffd_wp_pmd(pmd_t pmd) +{ + if (pmd_present(pmd)) + pmd = pmd_clear_uffd_wp(pmd); + else if (is_swap_pmd(pmd)) + pmd = pmd_swp_clear_uffd_wp(pmd); + + return pmd; +} + bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd) { @@ -2170,6 +2180,8 @@ bool move_huge_pmd(struct vm_area_struct pgtable_trans_huge_deposit(mm, new_pmd, pgtable); } pmd = move_soft_dirty_pmd(pmd); + if (vma_has_uffd_without_event_remap(vma)) + pmd = clear_uffd_wp_pmd(pmd); set_pmd_at(mm, new_addr, new_pmd, pmd); if (force_flush) flush_pmd_tlb_range(vma, old_addr, old_addr + PMD_SIZE); --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5395,6 +5395,7 @@ static void move_huge_pte(struct vm_area unsigned long new_addr, pte_t *src_pte, pte_t *dst_pte, unsigned long sz) { + bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma); struct hstate *h = hstate_vma(vma); struct mm_struct *mm = vma->vm_mm; spinlock_t *src_ptl, *dst_ptl; @@ -5411,7 +5412,18 @@ static void move_huge_pte(struct vm_area spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
pte = huge_ptep_get_and_clear(mm, old_addr, src_pte); - set_huge_pte_at(mm, new_addr, dst_pte, pte, sz); + + if (need_clear_uffd_wp && pte_marker_uffd_wp(pte)) + huge_pte_clear(mm, new_addr, dst_pte, sz); + else { + if (need_clear_uffd_wp) { + if (pte_present(pte)) + pte = huge_pte_clear_uffd_wp(pte); + else if (is_swap_pte(pte)) + pte = pte_swp_clear_uffd_wp(pte); + } + set_huge_pte_at(mm, new_addr, dst_pte, pte, sz); + }
if (src_ptl != dst_ptl) spin_unlock(src_ptl); --- a/mm/mremap.c +++ b/mm/mremap.c @@ -138,6 +138,7 @@ static int move_ptes(struct vm_area_stru struct vm_area_struct *new_vma, pmd_t *new_pmd, unsigned long new_addr, bool need_rmap_locks) { + bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma); struct mm_struct *mm = vma->vm_mm; pte_t *old_pte, *new_pte, pte; spinlock_t *old_ptl, *new_ptl; @@ -207,7 +208,18 @@ static int move_ptes(struct vm_area_stru force_flush = true; pte = move_pte(pte, old_addr, new_addr); pte = move_soft_dirty_pte(pte); - set_pte_at(mm, new_addr, new_pte, pte); + + if (need_clear_uffd_wp && pte_marker_uffd_wp(pte)) + pte_clear(mm, new_addr, new_pte); + else { + if (need_clear_uffd_wp) { + if (pte_present(pte)) + pte = pte_clear_uffd_wp(pte); + else if (is_swap_pte(pte)) + pte = pte_swp_clear_uffd_wp(pte); + } + set_pte_at(mm, new_addr, new_pte, pte); + } }
arch_leave_lazy_mmu_mode(); @@ -269,6 +281,15 @@ static bool move_normal_pmd(struct vm_ar if (WARN_ON_ONCE(!pmd_none(*new_pmd))) return false;
+ /* If this pmd belongs to a uffd vma with remap events disabled, we need + * to ensure that the uffd-wp state is cleared from all pgtables. This + * means recursing into lower page tables in move_page_tables(), and we + * can reuse the existing code if we simply treat the entry as "not + * moved". + */ + if (vma_has_uffd_without_event_remap(vma)) + return false; + /* * We don't have to worry about the ordering of src and dst * ptlocks because exclusive mmap_lock prevents deadlock. @@ -324,6 +345,15 @@ static bool move_normal_pud(struct vm_ar if (WARN_ON_ONCE(!pud_none(*new_pud))) return false;
+ /* If this pud belongs to a uffd vma with remap events disabled, we need + * to ensure that the uffd-wp state is cleared from all pgtables. This + * means recursing into lower page tables in move_page_tables(), and we + * can reuse the existing code if we simply treat the entry as "not + * moved". + */ + if (vma_has_uffd_without_event_remap(vma)) + return false; + /* * We don't have to worry about the ordering of src and dst * ptlocks because exclusive mmap_lock prevents deadlock.
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Donet Tom donettom@linux.ibm.com
commit bd3d56ffa2c450364acf02663ba88996da37079d upstream.
When MGLRU is enabled, the pgdemote_kswapd, pgdemote_direct, and pgdemote_khugepaged stats in vmstat are not being updated.
Commit f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations") moved the pgdemote vmstat update from demote_folio_list() to shrink_inactive_list(), which is in the normal LRU path. As a result, the pgdemote stats are updated correctly for the normal LRU but not for MGLRU.
To address this, we have added the pgdemote stat update in the evict_folios() function, which is in the MGLRU path. With this patch, the pgdemote stats will now be updated correctly when MGLRU is enabled.
Without this patch vmstat output when MGLRU is enabled ====================================================== pgdemote_kswapd 0 pgdemote_direct 0 pgdemote_khugepaged 0
With this patch vmstat output when MGLRU is enabled =================================================== pgdemote_kswapd 43234 pgdemote_direct 4691 pgdemote_khugepaged 0
Link: https://lkml.kernel.org/r/20250109060540.451261-1-donettom@linux.ibm.com Fixes: f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations") Signed-off-by: Donet Tom donettom@linux.ibm.com Acked-by: Yu Zhao yuzhao@google.com Tested-by: Li Zhijian lizhijian@fujitsu.com Reviewed-by: Li Zhijian lizhijian@fujitsu.com Cc: Aneesh Kumar K.V (Arm) aneesh.kumar@kernel.org Cc: David Rientjes rientjes@google.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Kaiyang Zhao kaiyang2@cs.cmu.edu Cc: Michal Hocko mhocko@kernel.org Cc: Muchun Song muchun.song@linux.dev Cc: Ritesh Harjani (IBM) ritesh.list@gmail.com Cc: Roman Gushchin roman.gushchin@linux.dev Cc: Shakeel Butt shakeel.butt@linux.dev Cc: Wei Xu weixugc@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/vmscan.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4637,6 +4637,9 @@ retry: reset_batch_size(walk); }
+ __mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(), + stat.nr_demoted); + item = PGSTEAL_KSWAPD + reclaimer_offset(); if (!cgroup_reclaim(sc)) __count_vm_events(item, reclaimed);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt rostedt@goodmis.org
commit 60295b944ff6805e677c48ae4178532b207d43be upstream.
Tracing tools like perf and trace-cmd read the /sys/kernel/tracing/events/*/*/format files to know how to parse the data and also how to print it. For the "print fmt" portion of that file, if anything uses an enum that is not exported to the tracing system, user space will not be able to parse it.
The GFP flags use to be defines, and defines get translated in the print fmt sections. But now they are converted to use enums, which is not.
The mm_page_alloc trace event format use to have:
print fmt: "page=%p pfn=0x%lx order=%d migratetype=%d gfp_flags=%s", REC->pfn != -1UL ? (((struct page *)vmemmap_base) + (REC->pfn)) : ((void *)0), REC->pfn != -1UL ? REC->pfn : 0, REC->order, REC->migratetype, (REC->gfp_flags) ? __print_flags(REC->gfp_flags, "|", {( unsigned long)(((((((( gfp_t)(0x400u|0x800u)) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x100000u)) | (( gfp_t)0x02u)) | (( gfp_t)0x08u) | (( gfp_t)0)) | (( gfp_t)0x40000u) | (( gfp_t)0x80000u) | (( gfp_t)0x2000u)) & ~(( gfp_t)(0x400u|0x800u))) | (( gfp_t)0x400u)), "GFP_TRANSHUGE"}, {( unsigned long)((((((( gfp_t)(0x400u|0x800u)) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x100000u)) | (( gfp_t)0x02u)) | (( gfp_t)0x08u) | (( gfp_t)0)) ...
Where the GFP values are shown and not their names. But after the GFP flags were converted to use enums, it has:
print fmt: "page=%p pfn=0x%lx order=%d migratetype=%d gfp_flags=%s", REC->pfn != -1UL ? (vmemmap + (REC->pfn)) : ((void *)0), REC->pfn != -1UL ? REC->pfn : 0, REC->order, REC->migratetype, (REC->gfp_flags) ? __print_flags(REC->gfp_flags, "|", {( unsigned long)(((((((( gfp_t)(((((1UL))) << (___GFP_DIRECT_RECLAIM_BIT))|((((1UL))) << (___GFP_KSWAPD_RECLAIM_BIT)))) | (( gfp_t)((((1UL))) << (___GFP_IO_BIT))) | (( gfp_t)((((1UL))) << (___GFP_FS_BIT))) | (( gfp_t)((((1UL))) << (___GFP_HARDWALL_BIT)))) | (( gfp_t)((((1UL))) << (___GFP_HIGHMEM_BIT)))) | (( gfp_t)((((1UL))) << (___GFP_MOVABLE_BIT))) | (( gfp_t)0)) | (( gfp_t)((((1UL))) << (___GFP_COMP_BIT))) ...
Where the enums names like ___GFP_KSWAPD_RECLAIM_BIT are shown and not their values. User space has no way to convert these names to their values and the output will fail to parse. What is shown is now:
mm_page_alloc: page=0xffffffff981685f3 pfn=0x1d1ac1 order=0 migratetype=1 gfp_flags=0x140cca
The TRACE_DEFINE_ENUM() macro was created to handle enums in the print fmt files. This causes them to be replaced at boot up with the numbers, so that user space tooling can parse it. By using this macro, the output is back to the human readable:
mm_page_alloc: page=0xffffffff981685f3 pfn=0x122233 order=0 migratetype=1 gfp_flags=GFP_HIGHUSER_MOVABLE|__GFP_COMP
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Veronika Molnarova vmolnaro@redhat.com Cc: Suren Baghdasaryan surenb@google.com Cc: Linus Torvalds torvalds@linux-foundation.org Link: https://lore.kernel.org/20250116214438.749504792@goodmis.org Reported-by: Michael Petlan mpetlan@redhat.com Closes: https://lore.kernel.org/all/87be5f7c-1a0-dad-daa0-54e342efaea7@redhat.com/ Fixes: 772dd0342727c ("mm: enumerate all gfp flags") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/trace/events/mmflags.h | 63 ++++++++++++++++++++++++++++++++++ 1 file changed, 63 insertions(+)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index bb8a59c6caa2..d36c857dd249 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -13,6 +13,69 @@ * Thus most bits set go first. */
+/* These define the values that are enums (the bits) */ +#define TRACE_GFP_FLAGS_GENERAL \ + TRACE_GFP_EM(DMA) \ + TRACE_GFP_EM(HIGHMEM) \ + TRACE_GFP_EM(DMA32) \ + TRACE_GFP_EM(MOVABLE) \ + TRACE_GFP_EM(RECLAIMABLE) \ + TRACE_GFP_EM(HIGH) \ + TRACE_GFP_EM(IO) \ + TRACE_GFP_EM(FS) \ + TRACE_GFP_EM(ZERO) \ + TRACE_GFP_EM(DIRECT_RECLAIM) \ + TRACE_GFP_EM(KSWAPD_RECLAIM) \ + TRACE_GFP_EM(WRITE) \ + TRACE_GFP_EM(NOWARN) \ + TRACE_GFP_EM(RETRY_MAYFAIL) \ + TRACE_GFP_EM(NOFAIL) \ + TRACE_GFP_EM(NORETRY) \ + TRACE_GFP_EM(MEMALLOC) \ + TRACE_GFP_EM(COMP) \ + TRACE_GFP_EM(NOMEMALLOC) \ + TRACE_GFP_EM(HARDWALL) \ + TRACE_GFP_EM(THISNODE) \ + TRACE_GFP_EM(ACCOUNT) \ + TRACE_GFP_EM(ZEROTAGS) + +#ifdef CONFIG_KASAN_HW_TAGS +# define TRACE_GFP_FLAGS_KASAN \ + TRACE_GFP_EM(SKIP_ZERO) \ + TRACE_GFP_EM(SKIP_KASAN) +#else +# define TRACE_GFP_FLAGS_KASAN +#endif + +#ifdef CONFIG_LOCKDEP +# define TRACE_GFP_FLAGS_LOCKDEP \ + TRACE_GFP_EM(NOLOCKDEP) +#else +# define TRACE_GFP_FLAGS_LOCKDEP +#endif + +#ifdef CONFIG_SLAB_OBJ_EXT +# define TRACE_GFP_FLAGS_SLAB \ + TRACE_GFP_EM(NO_OBJ_EXT) +#else +# define TRACE_GFP_FLAGS_SLAB +#endif + +#define TRACE_GFP_FLAGS \ + TRACE_GFP_FLAGS_GENERAL \ + TRACE_GFP_FLAGS_KASAN \ + TRACE_GFP_FLAGS_LOCKDEP \ + TRACE_GFP_FLAGS_SLAB + +#undef TRACE_GFP_EM +#define TRACE_GFP_EM(a) TRACE_DEFINE_ENUM(___GFP_##a##_BIT); + +TRACE_GFP_FLAGS + +/* Just in case these are ever used */ +TRACE_DEFINE_ENUM(___GFP_UNUSED_BIT); +TRACE_DEFINE_ENUM(___GFP_LAST_BIT); + #define gfpflag_string(flag) {(__force unsigned long)flag, #flag}
#define __def_gfpflag_names \
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Joe Hattori joe@pf.is.s.u-tokyo.ac.jp
commit 9322d1915f9d976ee48c09d800fbd5169bc2ddcc upstream.
platform_irqchip_probe() leaks a OF node when irq_init_cb() fails. Fix it by declaring par_np with the __free(device_node) cleanup construct.
This bug was found by an experimental static analysis tool that I am developing.
Fixes: f8410e626569 ("irqchip: Add IRQCHIP_PLATFORM_DRIVER_BEGIN/END and IRQCHIP_MATCH helper macros") Signed-off-by: Joe Hattori joe@pf.is.s.u-tokyo.ac.jp Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241215033945.3414223-1-joe@pf.is.s.u-tokyo.ac.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/irqchip/irqchip.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/irqchip/irqchip.c +++ b/drivers/irqchip/irqchip.c @@ -35,11 +35,10 @@ void __init irqchip_init(void) int platform_irqchip_probe(struct platform_device *pdev) { struct device_node *np = pdev->dev.of_node; - struct device_node *par_np = of_irq_find_parent(np); + struct device_node *par_np __free(device_node) = of_irq_find_parent(np); of_irq_init_cb_t irq_init_cb = of_device_get_match_data(&pdev->dev);
if (!irq_init_cb) { - of_node_put(par_np); return -EINVAL; }
@@ -55,7 +54,6 @@ int platform_irqchip_probe(struct platfo * interrupt controller can check for specific domains as necessary. */ if (par_np && !irq_find_matching_host(par_np, DOMAIN_BUS_ANY)) { - of_node_put(par_np); return -EPROBE_DEFER; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yogesh Lal quic_ylal@quicinc.com
commit 0d62a49ab55c99e8deb4593b8d9f923de1ab5c18 upstream.
When a CPU attempts to enter low power mode, it disables the redistributor and Group 1 interrupts and reinitializes the system registers upon wakeup.
If the transition into low power mode fails, then the CPU_PM framework invokes the PM notifier callback with CPU_PM_ENTER_FAILED to allow the drivers to undo the state changes.
The GIC V3 driver ignores CPU_PM_ENTER_FAILED, which leaves the GIC in disabled state.
Handle CPU_PM_ENTER_FAILED in the same way as CPU_PM_EXIT to restore normal operation.
[ tglx: Massage change log, add Fixes tag ]
Fixes: 3708d52fc6bb ("irqchip: gic-v3: Implement CPU PM notifier") Signed-off-by: Yogesh Lal quic_ylal@quicinc.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241220093907.2747601-1-quic_ylal@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/irqchip/irq-gic-v3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -1522,7 +1522,7 @@ static int gic_retrigger(struct irq_data static int gic_cpu_pm_notifier(struct notifier_block *self, unsigned long cmd, void *v) { - if (cmd == CPU_PM_EXIT) { + if (cmd == CPU_PM_EXIT || cmd == CPU_PM_ENTER_FAILED) { if (gic_dist_security_disabled()) gic_enable_redist(true); gic_cpu_sys_reg_enable();
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tomas Krcka krckatom@amazon.de
commit 35cb2c6ce7da545f3b5cb1e6473ad7c3a6f08310 upstream.
The following call-chain leads to enabling interrupts in a nested interrupt disabled section:
irq_set_vcpu_affinity() irq_get_desc_lock() raw_spin_lock_irqsave() <--- Disable interrupts its_irq_set_vcpu_affinity() guard(raw_spinlock_irq) <--- Enables interrupts when leaving the guard() irq_put_desc_unlock() <--- Warns because interrupts are enabled
This was broken in commit b97e8a2f7130, which replaced the original raw_spin_[un]lock() pair with guard(raw_spinlock_irq).
Fix the issue by using guard(raw_spinlock).
[ tglx: Massaged change log ]
Fixes: b97e8a2f7130 ("irqchip/gic-v3-its: Fix potential race condition in its_vlpi_prop_update()") Signed-off-by: Tomas Krcka krckatom@amazon.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241230150825.62894-1-krckatom@amazon.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/irqchip/irq-gic-v3-its.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -1961,7 +1961,7 @@ static int its_irq_set_vcpu_affinity(str if (!is_v4(its_dev->its)) return -EINVAL;
- guard(raw_spinlock_irq)(&its_dev->event_map.vlpi_lock); + guard(raw_spinlock)(&its_dev->event_map.vlpi_lock);
/* Unmap request? */ if (!info)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Koichiro Den koichiro.den@canonical.com
commit 2f8dea1692eef2b7ba6a256246ed82c365fdc686 upstream.
Consider a scenario where a CPU transitions from CPUHP_ONLINE to halfway through a CPU hotunplug down to CPUHP_HRTIMERS_PREPARE, and then back to CPUHP_ONLINE:
Since hrtimers_prepare_cpu() does not run, cpu_base.hres_active remains set to 1 throughout. However, during a CPU unplug operation, the tick and the clockevents are shut down at CPUHP_AP_TICK_DYING. On return to the online state, for instance CFS incorrectly assumes that the hrtick is already active, and the chance of the clockevent device to transition to oneshot mode is also lost forever for the CPU, unless it goes back to a lower state than CPUHP_HRTIMERS_PREPARE once.
This round-trip reveals another issue; cpu_base.online is not set to 1 after the transition, which appears as a WARN_ON_ONCE in enqueue_hrtimer().
Aside of that, the bulk of the per CPU state is not reset either, which means there are dangling pointers in the worst case.
Address this by adding a corresponding startup() callback, which resets the stale per CPU state and sets the online flag.
[ tglx: Make the new callback unconditionally available, remove the online modification in the prepare() callback and clear the remaining state in the starting callback instead of the prepare callback ]
Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") Signed-off-by: Koichiro Den koichiro.den@canonical.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241220134421.3809834-1-koichiro.den@canonical.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/hrtimer.h | 1 + kernel/cpu.c | 2 +- kernel/time/hrtimer.c | 11 ++++++++++- 3 files changed, 12 insertions(+), 2 deletions(-)
--- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -379,6 +379,7 @@ extern void __init hrtimers_init(void); extern void sysrq_timer_list_show(void);
int hrtimers_prepare_cpu(unsigned int cpu); +int hrtimers_cpu_starting(unsigned int cpu); #ifdef CONFIG_HOTPLUG_CPU int hrtimers_cpu_dying(unsigned int cpu); #else --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -2179,7 +2179,7 @@ static struct cpuhp_step cpuhp_hp_states }, [CPUHP_AP_HRTIMERS_DYING] = { .name = "hrtimers:dying", - .startup.single = NULL, + .startup.single = hrtimers_cpu_starting, .teardown.single = hrtimers_cpu_dying, }, [CPUHP_AP_TICK_DYING] = { --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -2156,6 +2156,15 @@ int hrtimers_prepare_cpu(unsigned int cp }
cpu_base->cpu = cpu; + hrtimer_cpu_base_init_expiry_lock(cpu_base); + return 0; +} + +int hrtimers_cpu_starting(unsigned int cpu) +{ + struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases); + + /* Clear out any left over state from a CPU down operation */ cpu_base->active_bases = 0; cpu_base->hres_active = 0; cpu_base->hang_detected = 0; @@ -2164,7 +2173,6 @@ int hrtimers_prepare_cpu(unsigned int cp cpu_base->expires_next = KTIME_MAX; cpu_base->softirq_expires_next = KTIME_MAX; cpu_base->online = 1; - hrtimer_cpu_base_init_expiry_lock(cpu_base); return 0; }
@@ -2240,6 +2248,7 @@ int hrtimers_cpu_dying(unsigned int dyin void __init hrtimers_init(void) { hrtimers_prepare_cpu(smp_processor_id()); + hrtimers_cpu_starting(smp_processor_id()); open_softirq(HRTIMER_SOFTIRQ, hrtimer_run_softirq); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frederic Weisbecker frederic@kernel.org
commit b729cc1ec21a5899b7879ccfbe1786664928d597 upstream.
Commit 10a0e6f3d3db ("timers/migration: Move hierarchy setup into cpuhotplug prepare callback") fixed a race between idle exit and CPU hotplug up leading to a wrong "0" value migrator assigned to the top level. However there is still a situation that remains unhandled:
[GRP0:0] migrator = TMIGR_NONE active = NONE groupmask = 0 / \ \ 0 1 2..7 idle idle idle
0) The system is fully idle.
[GRP0:0] migrator = CPU 0 active = CPU 0 groupmask = 0 / \ \ 0 1 2..7 active idle idle
1) CPU 0 is activating. It has done the cmpxchg on the top's ->migr_state but it hasn't yet returned to __walk_groups().
[GRP0:0] migrator = CPU 0 active = CPU 0, CPU 1 groupmask = 0 / \ \ 0 1 2..7 active active idle
2) CPU 1 is activating. CPU 0 stays the migrator (still stuck in __walk_groups(), delayed by #VMEXIT for example).
[GRP1:0] migrator = TMIGR_NONE active = NONE groupmask = 0 / \ [GRP0:0] [GRP0:1] migrator = CPU 0 migrator = TMIGR_NONE active = CPU 0, CPU1 active = NONE groupmask = 2 groupmask = 1 / \ \ 0 1 2..7 8 active active idle !online
3) CPU 8 is preparing to boot. CPUHP_TMIGR_PREPARE is being ran by CPU 1 which has created the GRP0:1 and the new top GRP1:0 connected to GRP0:1 and GRP0:0. The groupmask of GRP0:0 is now 2. CPU 1 hasn't yet propagated its activation up to GRP1:0.
[GRP1:0] migrator = 0 (!!!) active = NONE groupmask = 0 / \ [GRP0:0] [GRP0:1] migrator = CPU 0 migrator = TMIGR_NONE active = CPU 0, CPU1 active = NONE groupmask = 2 groupmask = 1 / \ \ 0 1 2..7 8 active active idle !online
4) CPU 0 finally resumed after its #VMEXIT. It's in __walk_groups() returning from tmigr_cpu_active(). The new top GRP1:0 is visible and fetched but the freshly updated groupmask of GRP0:0 may not be visible due to lack of ordering! As a result tmigr_active_up() is called to GRP0:0 with a child's groupmask of "0". This buggy "0" groupmask then becomes the migrator for GRP1:0 forever. As a result, timers on a fully idle system get ignored.
One possible fix would be to define TMIGR_NONE as "0" so that such a race would have no effect. And after all TMIGR_NONE doesn't need to be anything else. However this would leave an uncomfortable state machine where gears happen not to break by chance but are vulnerable to future modifications.
Keep TMIGR_NONE as is instead and pre-initialize to "1" the groupmask of any newly created top level. This groupmask is guaranteed to be visible upon fetching the corresponding group for the 1st time:
_ By the upcoming CPU thanks to CPU hotplug synchronization between the control CPU (BP) and the booting one (AP).
_ By the control CPU since the groupmask and parent pointers are initialized locally.
_ By all CPUs belonging to the same group than the control CPU because they must wait for it to ever become idle before needing to walk to the new top. The cmpcxhg() on ->migr_state then makes sure its groupmask is visible.
With this pre-initialization, it is guaranteed that if a future top level is linked to an old one, it is walked through with a valid groupmask.
Fixes: 10a0e6f3d3db ("timers/migration: Move hierarchy setup into cpuhotplug prepare callback") Signed-off-by: Frederic Weisbecker frederic@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20250114231507.21672-2-frederic@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/time/timer_migration.c | 29 ++++++++++++++++++++++++++++- 1 file changed, 28 insertions(+), 1 deletion(-)
diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c index 8d57f7686bb0..c8a8ea2e5b98 100644 --- a/kernel/time/timer_migration.c +++ b/kernel/time/timer_migration.c @@ -1487,6 +1487,21 @@ static void tmigr_init_group(struct tmigr_group *group, unsigned int lvl, s.seq = 0; atomic_set(&group->migr_state, s.state);
+ /* + * If this is a new top-level, prepare its groupmask in advance. + * This avoids accidents where yet another new top-level is + * created in the future and made visible before the current groupmask. + */ + if (list_empty(&tmigr_level_list[lvl])) { + group->groupmask = BIT(0); + /* + * The previous top level has prepared its groupmask already, + * simply account it as the first child. + */ + if (lvl > 0) + group->num_children = 1; + } + timerqueue_init_head(&group->events); timerqueue_init(&group->groupevt.nextevt); group->groupevt.nextevt.expires = KTIME_MAX; @@ -1550,8 +1565,20 @@ static void tmigr_connect_child_parent(struct tmigr_group *child, raw_spin_lock_irq(&child->lock); raw_spin_lock_nested(&parent->lock, SINGLE_DEPTH_NESTING);
+ if (activate) { + /* + * @child is the old top and @parent the new one. In this + * case groupmask is pre-initialized and @child already + * accounted, along with its new sibling corresponding to the + * CPU going up. + */ + WARN_ON_ONCE(child->groupmask != BIT(0) || parent->num_children != 2); + } else { + /* Adding @child for the CPU going up to @parent. */ + child->groupmask = BIT(parent->num_children++); + } + child->parent = parent; - child->groupmask = BIT(parent->num_children++);
raw_spin_unlock(&parent->lock); raw_spin_unlock_irq(&child->lock);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frederic Weisbecker frederic@kernel.org
commit de3ced72a79280fefd680e5e101d8b9f03cfa1d7 upstream.
Commit 2522c84db513 ("timers/migration: Fix another race between hotplug and idle entry/exit") fixed yet another race between idle exit and CPU hotplug up leading to a wrong "0" value migrator assigned to the top level. However there is yet another situation that remains unhandled:
[GRP0:0] migrator = TMIGR_NONE active = NONE groupmask = 1 / \ \ 0 1 2..7 idle idle idle
0) The system is fully idle.
[GRP0:0] migrator = CPU 0 active = CPU 0 groupmask = 1 / \ \ 0 1 2..7 active idle idle
1) CPU 0 is activating. It has done the cmpxchg on the top's ->migr_state but it hasn't yet returned to __walk_groups().
[GRP0:0] migrator = CPU 0 active = CPU 0, CPU 1 groupmask = 1 / \ \ 0 1 2..7 active active idle
2) CPU 1 is activating. CPU 0 stays the migrator (still stuck in __walk_groups(), delayed by #VMEXIT for example).
[GRP1:0] migrator = TMIGR_NONE active = NONE groupmask = 1 / \ [GRP0:0] [GRP0:1] migrator = CPU 0 migrator = TMIGR_NONE active = CPU 0, CPU1 active = NONE groupmask = 1 groupmask = 2 / \ \ 0 1 2..7 8 active active idle !online
3) CPU 8 is preparing to boot. CPUHP_TMIGR_PREPARE is being ran by CPU 1 which has created the GRP0:1 and the new top GRP1:0 connected to GRP0:1 and GRP0:0. CPU 1 hasn't yet propagated its activation up to GRP1:0.
[GRP1:0] migrator = GRP0:0 active = GRP0:0 groupmask = 1 / \ [GRP0:0] [GRP0:1] migrator = CPU 0 migrator = TMIGR_NONE active = CPU 0, CPU1 active = NONE groupmask = 1 groupmask = 2 / \ \ 0 1 2..7 8 active active idle !online
4) CPU 0 finally resumed after its #VMEXIT. It's in __walk_groups() returning from tmigr_cpu_active(). The new top GRP1:0 is visible and fetched and the pre-initialized groupmask of GRP0:0 is also visible. As a result tmigr_active_up() is called to GRP1:0 with GRP0:0 as active and migrator. CPU 0 is returning to __walk_groups() but suffers again a #VMEXIT.
[GRP1:0] migrator = GRP0:0 active = GRP0:0 groupmask = 1 / \ [GRP0:0] [GRP0:1] migrator = CPU 0 migrator = TMIGR_NONE active = CPU 0, CPU1 active = NONE groupmask = 1 groupmask = 2 / \ \ 0 1 2..7 8 active active idle !online
5) CPU 1 propagates its activation of GRP0:0 to GRP1:0. This has no effect since CPU 0 did it already.
[GRP1:0] migrator = GRP0:0 active = GRP0:0, GRP0:1 groupmask = 1 / \ [GRP0:0] [GRP0:1] migrator = CPU 0 migrator = CPU 8 active = CPU 0, CPU1 active = CPU 8 groupmask = 1 groupmask = 2 / \ \ \ 0 1 2..7 8 active active idle active
6) CPU 1 links CPU 8 to its group. CPU 8 boots and goes through CPUHP_AP_TMIGR_ONLINE which propagates activation.
[GRP2:0] migrator = TMIGR_NONE active = NONE groupmask = 1 / \ [GRP1:0] [GRP1:1] migrator = GRP0:0 migrator = TMIGR_NONE active = GRP0:0, GRP0:1 active = NONE groupmask = 1 groupmask = 2 / \ [GRP0:0] [GRP0:1] [GRP0:2] migrator = CPU 0 migrator = CPU 8 migrator = TMIGR_NONE active = CPU 0, CPU1 active = CPU 8 active = NONE groupmask = 1 groupmask = 2 groupmask = 0 / \ \ \ 0 1 2..7 8 64 active active idle active !online
7) CPU 64 is booting. CPUHP_TMIGR_PREPARE is being ran by CPU 1 which has created the GRP1:1, GRP0:2 and the new top GRP2:0 connected to GRP1:1 and GRP1:0. CPU 1 hasn't yet propagated its activation up to GRP2:0.
[GRP2:0] migrator = 0 (!!!) active = NONE groupmask = 1 / \ [GRP1:0] [GRP1:1] migrator = GRP0:0 migrator = TMIGR_NONE active = GRP0:0, GRP0:1 active = NONE groupmask = 1 groupmask = 2 / \ [GRP0:0] [GRP0:1] [GRP0:2] migrator = CPU 0 migrator = CPU 8 migrator = TMIGR_NONE active = CPU 0, CPU1 active = CPU 8 active = NONE groupmask = 1 groupmask = 2 groupmask = 0 / \ \ \ 0 1 2..7 8 64 active active idle active !online
8) CPU 0 finally resumed after its #VMEXIT. It's in __walk_groups() returning from tmigr_cpu_active(). The new top GRP2:0 is visible and fetched but the pre-initialized groupmask of GRP1:0 is not because no ordering made its initialization visible. As a result tmigr_active_up() may be called to GRP2:0 with a "0" child's groumask. Leaving the timers ignored for ever when the system is fully idle.
The race is highly theoretical and perhaps impossible in practice but the groupmask of the child is not the only concern here as the whole initialization of the child is not guaranteed to be visible to any tree walker racing against hotplug (idle entry/exit, remote handling, etc...). Although the current code layout seem to be resilient to such hazards, this doesn't tell much about the future.
Fix this with enforcing address dependency between group initialization and the write/read to the group's parent's pointer. Fortunately that doesn't involve any barrier addition in the fast paths.
Fixes: 10a0e6f3d3db ("timers/migration: Move hierarchy setup into cpuhotplug prepare callback") Signed-off-by: Frederic Weisbecker frederic@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20250114231507.21672-3-frederic@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/time/timer_migration.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c index c8a8ea2e5b98..371a62a749aa 100644 --- a/kernel/time/timer_migration.c +++ b/kernel/time/timer_migration.c @@ -534,8 +534,13 @@ static void __walk_groups(up_f up, struct tmigr_walk *data, break;
child = group; - group = group->parent; + /* + * Pairs with the store release on group connection + * to make sure group initialization is visible. + */ + group = READ_ONCE(group->parent); data->childmask = child->groupmask; + WARN_ON_ONCE(!data->childmask); } while (group); }
@@ -1578,7 +1583,12 @@ static void tmigr_connect_child_parent(struct tmigr_group *child, child->groupmask = BIT(parent->num_children++); }
- child->parent = parent; + /* + * Make sure parent initialization is visible before publishing it to a + * racing CPU entering/exiting idle. This RELEASE barrier enforces an + * address dependency that pairs with the READ_ONCE() in __walk_groups(). + */ + smp_store_release(&child->parent, parent);
raw_spin_unlock(&parent->lock); raw_spin_unlock_irq(&child->lock);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Li (Intel) xin@zytor.com
commit de31b3cd706347044e1a57d68c3a683d58e8cca4 upstream.
The FRED RSP0 MSR is only used for delivering events when running userspace. Linux leverages this property to reduce expensive MSR writes and optimize context switches. The kernel only writes the MSR when about to run userspace *and* when the MSR has actually changed since the last time userspace ran.
This optimization is implemented by maintaining a per-CPU cache of FRED RSP0 and then checking that against the value for the top of current task stack before running userspace.
However cpu_init_fred_exceptions() writes the MSR without updating the per-CPU cache. This means that the kernel might return to userspace with MSR_IA32_FRED_RSP0==0 when it needed to point to the top of current task stack. This would induce a double fault (#DF), which is bad.
A context switch after cpu_init_fred_exceptions() can paper over the issue since it updates the cached value. That evidently happens most of the time explaining how this bug got through.
Fix the bug through resynchronizing the FRED RSP0 MSR with its per-CPU cache in cpu_init_fred_exceptions().
Fixes: fe85ee391966 ("x86/entry: Set FRED RSP0 on return to userspace instead of context switch") Signed-off-by: Xin Li (Intel) xin@zytor.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Acked-by: Dave Hansen dave.hansen@linux.intel.com Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250110174639.1250829-1-xin%40zytor.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/fred.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fred.c b/arch/x86/kernel/fred.c index 8d32c3f48abc..5e2cd1004980 100644 --- a/arch/x86/kernel/fred.c +++ b/arch/x86/kernel/fred.c @@ -50,7 +50,13 @@ void cpu_init_fred_exceptions(void) FRED_CONFIG_ENTRYPOINT(asm_fred_entrypoint_user));
wrmsrl(MSR_IA32_FRED_STKLVLS, 0); - wrmsrl(MSR_IA32_FRED_RSP0, 0); + + /* + * Ater a CPU offline/online cycle, the FRED RSP0 MSR should be + * resynchronized with its per-CPU cache. + */ + wrmsrl(MSR_IA32_FRED_RSP0, __this_cpu_read(fred_rsp0)); + wrmsrl(MSR_IA32_FRED_RSP1, 0); wrmsrl(MSR_IA32_FRED_RSP2, 0); wrmsrl(MSR_IA32_FRED_RSP3, 0);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit 1a5401ec3018c101c456cdbda2eaef9482db6786 upstream.
Mesa changed its clear color alignment from 4k to 64 bytes without informing the kernel side about the change. This is now likely to cause framebuffer creation to fail.
The only thing we do with the clear color buffer in i915 is: 1. map a single page 2. read out bytes 16-23 from said page 3. unmap the page
So the only requirement we really have is that those 8 bytes are all contained within one page. Thus we can deal with the Mesa regression by reducing the alignment requiment from 4k to the same 64 bytes in the kernel. We could even go as low as 32 bytes, but IIRC 64 bytes is the hardware requirement on the 3D engine side so matching that seems sensible.
Note that the Mesa alignment chages were partially undone so the regression itself was already fixed on userspace side.
Cc: stable@vger.kernel.org Cc: Sagar Ghuge sagar.ghuge@intel.com Cc: Nanley Chery nanley.g.chery@intel.com Reported-by: Xi Ruoyao xry111@xry111.site Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13057 Closes: https://lore.kernel.org/all/45a5bba8de009347262d86a4acb27169d9ae0d9f.camel@x... Link: https://gitlab.freedesktop.org/mesa/mesa/-/commit/17f97a69c13832a6c1b0b3aad4... Link: https://gitlab.freedesktop.org/mesa/mesa/-/commit/888f63cf1baf34bc95e847a30a... Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20241129065014.8363-2-ville.sy... Tested-by: Xi Ruoyao xry111@xry111.site Reviewed-by: José Roberto de Souza jose.souza@intel.com (cherry picked from commit ed3a892e5e3d6b3f6eeb76db7c92a968aeb52f3d) Signed-off-by: Tvrtko Ursulin tursulin@ursulin.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_fb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -1613,7 +1613,7 @@ int intel_fill_fb_info(struct drm_i915_p * arithmetic related to alignment and offset calculation. */ if (is_gen12_ccs_cc_plane(&fb->base, i)) { - if (IS_ALIGNED(fb->base.offsets[i], PAGE_SIZE)) + if (IS_ALIGNED(fb->base.offsets[i], 64)) continue; else return -EINVAL;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Brost matthew.brost@intel.com
commit b1231ff7ea0689d04040a44864c265bc11612fa8 upstream.
RING_CMD_CCTL read index should be UC on iGPU parts due to L3 caching structure. Having this as WB blocks ULLS from being enabled. Change to UC to unblock ULLS on iGPU.
v2: - Drop internal communications commnet, bspec is updated
Cc: Balasubramani Vivekanandan balasubramani.vivekanandan@intel.com Cc: Michal Mrozek michal.mrozek@intel.com Cc: Paulo Zanoni paulo.r.zanoni@intel.com Cc: José Roberto de Souza jose.souza@intel.com Cc: stable@vger.kernel.org Fixes: 328e089bfb37 ("drm/xe: Leverage ComputeCS read L3 caching") Signed-off-by: Matthew Brost matthew.brost@intel.com Acked-by: Michal Mrozek michal.mrozek@intel.com Reviewed-by: Stuart Summers stuart.summers@intel.com Reviewed-by: Matt Roper matthew.d.roper@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20250114002507.114087-1-matthe... (cherry picked from commit 758debf35b9cda5450e40996991a6e4b222899bd) Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/xe/xe_hw_engine.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/xe/xe_hw_engine.c +++ b/drivers/gpu/drm/xe/xe_hw_engine.c @@ -417,7 +417,7 @@ hw_engine_setup_default_state(struct xe_ * Bspec: 72161 */ const u8 mocs_write_idx = gt->mocs.uc_index; - const u8 mocs_read_idx = hwe->class == XE_ENGINE_CLASS_COMPUTE && + const u8 mocs_read_idx = hwe->class == XE_ENGINE_CLASS_COMPUTE && IS_DGFX(xe) && (GRAPHICS_VER(xe) >= 20 || xe->info.platform == XE_PVC) ? gt->mocs.wb_index : gt->mocs.uc_index; u32 ring_cmd_cctl_val = REG_FIELD_PREP(CMD_CCTL_WRITE_OVERRIDE_MASK, mocs_write_idx) |
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ashutosh Dixit ashutosh.dixit@intel.com
commit 79a21fc921d7aafaf69d00b4938435b81bf66022 upstream.
Add missing VISACTL mux registers required for some OA config's (e.g. RenderPipeCtrl).
Fixes: cdf02fe1a94a ("drm/xe/oa/uapi: Add/remove OA config perf ops") Cc: stable@vger.kernel.org Signed-off-by: Ashutosh Dixit ashutosh.dixit@intel.com Reviewed-by: Umesh Nerlige Ramappa umesh.nerlige.ramappa@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20250111021539.2920346-1-ashut... (cherry picked from commit c26f22dac3449d8a687237cdfc59a6445eb8f75a) Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/xe/xe_oa.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/xe/xe_oa.c +++ b/drivers/gpu/drm/xe/xe_oa.c @@ -1980,6 +1980,7 @@ static const struct xe_mmio_range xe2_oa { .start = 0x5194, .end = 0x5194 }, /* SYS_MEM_LAT_MEASURE_MERTF_GRP_3D */ { .start = 0x8704, .end = 0x8704 }, /* LMEM_LAT_MEASURE_MCFG_GRP */ { .start = 0xB1BC, .end = 0xB1BC }, /* L3_BANK_LAT_MEASURE_LBCF_GFX */ + { .start = 0xD0E0, .end = 0xD0F4 }, /* VISACTL */ { .start = 0xE18C, .end = 0xE18C }, /* SAMPLER_MODE */ { .start = 0xE590, .end = 0xE590 }, /* TDL_LSC_LAT_MEASURE_TDL_GFX */ { .start = 0x13000, .end = 0x137FC }, /* PES_0_PESL0 - PES_63_UPPER_PESL3 */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
commit 11510e67d0bd956878ab4ffa03c45766788092c1 upstream.
Only apply when compute profile is selected. This is the only supported configuration. Selecting other profiles can lead to performane degradations.
Reviewed-by: Kenneth Feng kenneth.feng@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit d477e39532d725b1cdb3c8005c689c74ffbf3b94) Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c @@ -2549,11 +2549,12 @@ static int smu_v13_0_0_set_power_profile &backend_workload_mask);
/* Add optimizations for SMU13.0.0/10. Reuse the power saving profile */ - if ((amdgpu_ip_version(smu->adev, MP1_HWIP, 0) == IP_VERSION(13, 0, 0) && - ((smu->adev->pm.fw_version == 0x004e6601) || - (smu->adev->pm.fw_version >= 0x004e7300))) || - (amdgpu_ip_version(smu->adev, MP1_HWIP, 0) == IP_VERSION(13, 0, 10) && - smu->adev->pm.fw_version >= 0x00504500)) { + if ((workload_mask & (1 << PP_SMC_POWER_PROFILE_COMPUTE)) && + ((amdgpu_ip_version(smu->adev, MP1_HWIP, 0) == IP_VERSION(13, 0, 0) && + ((smu->adev->pm.fw_version == 0x004e6601) || + (smu->adev->pm.fw_version >= 0x004e7300))) || + (amdgpu_ip_version(smu->adev, MP1_HWIP, 0) == IP_VERSION(13, 0, 10) && + smu->adev->pm.fw_version >= 0x00504500))) { workload_type = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_WORKLOAD, PP_SMC_POWER_PROFILE_POWERSAVING);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gui Chengming Jack.Gui@amd.com
commit bd275e6cfc972329d39c6406a3c6d2ba2aba7db6 upstream.
FW attestation was disabled on MP0_14_0_{2/3}.
V2: Move check into is_fw_attestation_support func. (Frank) Remove DRM_WARN log info. (Alex) Fix format. (Christian)
Signed-off-by: Gui Chengming Jack.Gui@amd.com Reviewed-by: Frank.Min Frank.Min@amd.com Reviewed-by: Christian König Christian.Koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 62952a38d9bcf357d5ffc97615c48b12c9cd627c) Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_fw_attestation.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fw_attestation.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fw_attestation.c index 2d4b67175b55..328a1b963548 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fw_attestation.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fw_attestation.c @@ -122,6 +122,10 @@ static int amdgpu_is_fw_attestation_supported(struct amdgpu_device *adev) if (adev->flags & AMD_IS_APU) return 0;
+ if (amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(14, 0, 2) || + amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(14, 0, 3)) + return 0; + if (adev->asic_type >= CHIP_SIENNA_CICHLID) return 1;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kenneth Feng kenneth.feng@amd.com
commit 90505894c4ed581318836b792c57723df491cb91 upstream.
Disable gfxoff with the compute workload on gfx12. This is a workaround for the opencl test failure.
Signed-off-by: Kenneth Feng kenneth.feng@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 2affe2bbc997b3920045c2c434e480c81a5f9707) Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 3afcd1e8aa54..c4e733c2e75e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -715,8 +715,9 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev, void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev, bool idle) { enum amd_powergating_state state = idle ? AMD_PG_STATE_GATE : AMD_PG_STATE_UNGATE; - if (IP_VERSION_MAJ(amdgpu_ip_version(adev, GC_HWIP, 0)) == 11 && - ((adev->mes.kiq_version & AMDGPU_MES_VERSION_MASK) <= 64)) { + if ((IP_VERSION_MAJ(amdgpu_ip_version(adev, GC_HWIP, 0)) == 11 && + ((adev->mes.kiq_version & AMDGPU_MES_VERSION_MASK) <= 64)) || + (IP_VERSION_MAJ(amdgpu_ip_version(adev, GC_HWIP, 0)) == 12)) { pr_debug("GFXOFF is %s\n", idle ? "enabled" : "disabled"); amdgpu_gfx_off_ctrl(adev, idle); } else if ((IP_VERSION_MAJ(amdgpu_ip_version(adev, GC_HWIP, 0)) == 9) &&
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian König christian.koenig@amd.com
commit af04b320c71c4b59971f021615876808a36e5038 upstream.
That is needed to enforce isolation between contexts.
Signed-off-by: Christian König christian.koenig@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit def59436fb0d3ca0f211d14873d0273d69ebb405) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c @@ -193,8 +193,8 @@ int amdgpu_ib_schedule(struct amdgpu_rin need_ctx_switch = ring->current_ctx != fence_ctx; if (ring->funcs->emit_pipeline_sync && job && ((tmp = amdgpu_sync_get_fence(&job->explicit_sync)) || - (amdgpu_sriov_vf(adev) && need_ctx_switch) || - amdgpu_vm_need_pipeline_sync(ring, job))) { + need_ctx_switch || amdgpu_vm_need_pipeline_sync(ring, job))) { + need_pipe_sync = true;
if (tmp)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Chung chiahsuan.chung@amd.com
commit b0a3e840ad287c33a86b5515d606451b7df86ad4 upstream.
[Why] The enum DC_PSR_VERSION_SU_1 of psr_version is 1 and DC_PSR_VERSION_UNSUPPORTED is 0xFFFFFFFF.
The original code may has chance trigger the amdgpu_dm_psr_enable() while psr version is set to DC_PSR_VERSION_UNSUPPORTED.
[How] Modify the condition to psr->psr_version == DC_PSR_VERSION_SU_1
Reviewed-by: Sun peng Li sunpeng.li@amd.com Signed-off-by: Tom Chung chiahsuan.chung@amd.com Signed-off-by: Roman Li roman.li@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit f765e7ce0417f8dc38479b4b495047c397c16902) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -8922,7 +8922,7 @@ static void amdgpu_dm_enable_self_refres (current_ts - psr->psr_dirty_rects_change_timestamp_ns) > 500000000) { if (pr->replay_feature_enabled && !pr->replay_allow_active) amdgpu_dm_replay_enable(acrtc_state->stream, true); - if (psr->psr_version >= DC_PSR_VERSION_SU_1 && + if (psr->psr_version == DC_PSR_VERSION_SU_1 && !psr->psr_allow_active && !aconn->disallow_edp_enter_psr) amdgpu_dm_psr_enable(acrtc_state->stream); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Chung chiahsuan.chung@amd.com
commit 67edb81d6e9af43a0d58edf74630f82cfda4155d upstream.
[Why] Replay and PSR will cause some video corruption while VRR is enabled.
[How] 1. Disable the Replay and PSR while VRR is enabled. 2. Change the amdgpu_dm_crtc_vrr_active() parameter to const. Because the function will only read data from dm_crtc_state.
Reviewed-by: Sun peng Li sunpeng.li@amd.com Signed-off-by: Tom Chung chiahsuan.chung@amd.com Signed-off-by: Roman Li roman.li@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit d7879340e987b3056b8ae39db255b6c19c170a0d) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 6 ++++-- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c | 2 +- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.h | 2 +- 3 files changed, 6 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -8889,6 +8889,7 @@ static void amdgpu_dm_enable_self_refres struct replay_settings *pr = &acrtc_state->stream->link->replay_settings; struct amdgpu_dm_connector *aconn = (struct amdgpu_dm_connector *)acrtc_state->stream->dm_stream_context; + bool vrr_active = amdgpu_dm_crtc_vrr_active(acrtc_state);
if (acrtc_state->update_type > UPDATE_TYPE_FAST) { if (pr->config.replay_supported && !pr->replay_feature_enabled) @@ -8915,7 +8916,8 @@ static void amdgpu_dm_enable_self_refres * adequate number of fast atomic commits to notify KMD * of update events. See `vblank_control_worker()`. */ - if (acrtc_attach->dm_irq_params.allow_sr_entry && + if (!vrr_active && + acrtc_attach->dm_irq_params.allow_sr_entry && #ifdef CONFIG_DRM_AMD_SECURE_DISPLAY !amdgpu_dm_crc_window_is_activated(acrtc_state->base.crtc) && #endif @@ -9259,7 +9261,7 @@ static void amdgpu_dm_commit_planes(stru bundle->stream_update.abm_level = &acrtc_state->abm_level;
mutex_lock(&dm->dc_lock); - if (acrtc_state->update_type > UPDATE_TYPE_FAST) { + if ((acrtc_state->update_type > UPDATE_TYPE_FAST) || vrr_active) { if (acrtc_state->stream->link->replay_settings.replay_allow_active) amdgpu_dm_replay_disable(acrtc_state->stream); if (acrtc_state->stream->link->psr_settings.psr_allow_active) --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c @@ -93,7 +93,7 @@ int amdgpu_dm_crtc_set_vupdate_irq(struc return rc; }
-bool amdgpu_dm_crtc_vrr_active(struct dm_crtc_state *dm_state) +bool amdgpu_dm_crtc_vrr_active(const struct dm_crtc_state *dm_state) { return dm_state->freesync_config.state == VRR_STATE_ACTIVE_VARIABLE || dm_state->freesync_config.state == VRR_STATE_ACTIVE_FIXED; --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.h +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.h @@ -37,7 +37,7 @@ int amdgpu_dm_crtc_set_vupdate_irq(struc
bool amdgpu_dm_crtc_vrr_active_irq(struct amdgpu_crtc *acrtc);
-bool amdgpu_dm_crtc_vrr_active(struct dm_crtc_state *dm_state); +bool amdgpu_dm_crtc_vrr_active(const struct dm_crtc_state *dm_state);
int amdgpu_dm_crtc_enable_vblank(struct drm_crtc *crtc);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Leo Li sunpeng.li@amd.com
commit ff2e4d874726c549130308b6b46aa0f8a34e04cb upstream.
[Why]
Outside of a modeset/link configuration change, we should not have to wait for the panel to exit PSR. Depending on the panel and it's state, it may take multiple frames for it to exit PSR. Therefore, waiting in all scenarios may cause perceived stuttering, especially in combination with faster vblank shutdown.
[How]
PSR1 disable is hooked up to the vblank enable event, and vice versa. In case of vblank enable, do not wait for panel to exit PSR, but still wait in all other cases.
We also avoid a call to unnecessarily change power_opts on disable - this ends up sending another command to dmcub fw.
When testing against IGT, some crc tests like kms_plane_alpha_blend and amd_hotplug were failing due to CRC timeouts. This was found to be caused by the early return before HW has fully exited PSR1. Fix this by first making sure we grab a vblank reference, then waiting for panel to exit PSR1, before programming hw for CRC generation.
Fixes: 58a261bfc967 ("drm/amd/display: use a more lax vblank enable policy for older ASICs") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3743 Reviewed-by: Tom Chung chiahsuan.chung@amd.com Signed-off-by: Leo Li sunpeng.li@amd.com Signed-off-by: Tom Chung chiahsuan.chung@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit aa6713fa2046f4c09bf3013dd1420ae15603ca6f) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 - drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c | 25 ++++++---- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c | 2 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c | 2 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c | 35 ++++++++++++-- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.h | 3 - 6 files changed, 54 insertions(+), 17 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -9095,7 +9095,7 @@ static void amdgpu_dm_commit_planes(stru acrtc_state->stream->link->psr_settings.psr_dirty_rects_change_timestamp_ns = timestamp_ns; if (acrtc_state->stream->link->psr_settings.psr_allow_active) - amdgpu_dm_psr_disable(acrtc_state->stream); + amdgpu_dm_psr_disable(acrtc_state->stream, true); mutex_unlock(&dm->dc_lock); } } @@ -9265,7 +9265,7 @@ static void amdgpu_dm_commit_planes(stru if (acrtc_state->stream->link->replay_settings.replay_allow_active) amdgpu_dm_replay_disable(acrtc_state->stream); if (acrtc_state->stream->link->psr_settings.psr_allow_active) - amdgpu_dm_psr_disable(acrtc_state->stream); + amdgpu_dm_psr_disable(acrtc_state->stream, true); } mutex_unlock(&dm->dc_lock);
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c @@ -30,6 +30,7 @@ #include "amdgpu_dm.h" #include "dc.h" #include "amdgpu_securedisplay.h" +#include "amdgpu_dm_psr.h"
static const char *const pipe_crc_sources[] = { "none", @@ -224,6 +225,10 @@ int amdgpu_dm_crtc_configure_crc_source(
mutex_lock(&adev->dm.dc_lock);
+ /* For PSR1, check that the panel has exited PSR */ + if (stream_state->link->psr_settings.psr_version < DC_PSR_VERSION_SU_1) + amdgpu_dm_psr_wait_disable(stream_state); + /* Enable or disable CRTC CRC generation */ if (dm_is_crc_source_crtc(source) || source == AMDGPU_DM_PIPE_CRC_SOURCE_NONE) { if (!dc_stream_configure_crc(stream_state->ctx->dc, @@ -357,6 +362,17 @@ int amdgpu_dm_crtc_set_crc_source(struct
}
+ /* + * Reading the CRC requires the vblank interrupt handler to be + * enabled. Keep a reference until CRC capture stops. + */ + enabled = amdgpu_dm_is_valid_crc_source(cur_crc_src); + if (!enabled && enable) { + ret = drm_crtc_vblank_get(crtc); + if (ret) + goto cleanup; + } + #if defined(CONFIG_DRM_AMD_SECURE_DISPLAY) /* Reset secure_display when we change crc source from debugfs */ amdgpu_dm_set_crc_window_default(crtc, crtc_state->stream); @@ -367,16 +383,7 @@ int amdgpu_dm_crtc_set_crc_source(struct goto cleanup; }
- /* - * Reading the CRC requires the vblank interrupt handler to be - * enabled. Keep a reference until CRC capture stops. - */ - enabled = amdgpu_dm_is_valid_crc_source(cur_crc_src); if (!enabled && enable) { - ret = drm_crtc_vblank_get(crtc); - if (ret) - goto cleanup; - if (dm_is_crc_source_dprx(source)) { if (drm_dp_start_crc(aux, crtc)) { DRM_DEBUG_DRIVER("dp start crc failed\n"); --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c @@ -142,7 +142,7 @@ static void amdgpu_dm_crtc_set_panel_sr_ amdgpu_dm_replay_enable(vblank_work->stream, true); } else if (vblank_enabled) { if (link->psr_settings.psr_version < DC_PSR_VERSION_SU_1 && is_sr_active) - amdgpu_dm_psr_disable(vblank_work->stream); + amdgpu_dm_psr_disable(vblank_work->stream, false); } else if (link->psr_settings.psr_feature_enabled && allow_sr_entry && !is_sr_active && !is_crc_window_active) {
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c @@ -3638,7 +3638,7 @@ static int crc_win_update_set(void *data /* PSR may write to OTG CRC window control register, * so close it before starting secure_display. */ - amdgpu_dm_psr_disable(acrtc->dm_irq_params.stream); + amdgpu_dm_psr_disable(acrtc->dm_irq_params.stream, true);
spin_lock_irq(&adev_to_drm(adev)->event_lock);
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c @@ -201,14 +201,13 @@ void amdgpu_dm_psr_enable(struct dc_stre * * Return: true if success */ -bool amdgpu_dm_psr_disable(struct dc_stream_state *stream) +bool amdgpu_dm_psr_disable(struct dc_stream_state *stream, bool wait) { - unsigned int power_opt = 0; bool psr_enable = false;
DRM_DEBUG_DRIVER("Disabling psr...\n");
- return dc_link_set_psr_allow_active(stream->link, &psr_enable, true, false, &power_opt); + return dc_link_set_psr_allow_active(stream->link, &psr_enable, wait, false, NULL); }
/* @@ -251,3 +250,33 @@ bool amdgpu_dm_psr_is_active_allowed(str
return allow_active; } + +/** + * amdgpu_dm_psr_wait_disable() - Wait for eDP panel to exit PSR + * @stream: stream state attached to the eDP link + * + * Waits for a max of 500ms for the eDP panel to exit PSR. + * + * Return: true if panel exited PSR, false otherwise. + */ +bool amdgpu_dm_psr_wait_disable(struct dc_stream_state *stream) +{ + enum dc_psr_state psr_state = PSR_STATE0; + struct dc_link *link = stream->link; + int retry_count; + + if (link == NULL) + return false; + + for (retry_count = 0; retry_count <= 1000; retry_count++) { + dc_link_get_psr_state(link, &psr_state); + if (psr_state == PSR_STATE0) + break; + udelay(500); + } + + if (retry_count == 1000) + return false; + + return true; +} --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.h +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.h @@ -34,8 +34,9 @@ void amdgpu_dm_set_psr_caps(struct dc_link *link); void amdgpu_dm_psr_enable(struct dc_stream_state *stream); bool amdgpu_dm_link_setup_psr(struct dc_stream_state *stream); -bool amdgpu_dm_psr_disable(struct dc_stream_state *stream); +bool amdgpu_dm_psr_disable(struct dc_stream_state *stream, bool wait); bool amdgpu_dm_psr_disable_all(struct amdgpu_display_manager *dm); bool amdgpu_dm_psr_is_active_allowed(struct amdgpu_display_manager *dm); +bool amdgpu_dm_psr_wait_disable(struct dc_stream_state *stream);
#endif /* AMDGPU_DM_AMDGPU_DM_PSR_H_ */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nicholas Susanto Nicholas.Susanto@amd.com
commit 3412860cc4c0c484f53f91b371483e6e4440c3e5 upstream.
Revert commit 284f141f5ce5 ("drm/amd/display: Enable urgent latency adjustments for DCN35")
[Why & How]
Urgent latency increase caused 2.8K OLED monitor caused it to block this panel support P0.
Reverting this change does not reintroduce the netflix corruption issue which it fixed.
Fixes: 284f141f5ce5 ("drm/amd/display: Enable urgent latency adjustments for DCN35") Reviewed-by: Charlene Liu charlene.liu@amd.com Signed-off-by: Nicholas Susanto Nicholas.Susanto@amd.com Signed-off-by: Tom Chung chiahsuan.chung@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit c7ccfc0d4241a834c25a9a9e1e78b388b4445d23) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dml/dcn35/dcn35_fpu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn35/dcn35_fpu.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn35/dcn35_fpu.c @@ -195,9 +195,9 @@ struct _vcs_dpi_soc_bounding_box_st dcn3 .dcn_downspread_percent = 0.5, .gpuvm_min_page_size_bytes = 4096, .hostvm_min_page_size_bytes = 4096, - .do_urgent_latency_adjustment = 1, + .do_urgent_latency_adjustment = 0, .urgent_latency_adjustment_fabric_clock_component_us = 0, - .urgent_latency_adjustment_fabric_clock_reference_mhz = 3000, + .urgent_latency_adjustment_fabric_clock_reference_mhz = 0, };
void dcn35_build_wm_range_table_fpu(struct clk_mgr *clk_mgr)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wayne Lin Wayne.Lin@amd.com
commit b5cd418f016fb801be413fd52fe4711d2d13018c upstream.
[Why & How] Currently in dm_dp_mst_is_port_support_mode(), when valdidating mode under dsc decoding at the last DP link config, we only validate the case when there is an UFP. However, if the MSTB LCT=1, there is no UFP.
Under this case, use root_link_bw_in_kbps as the available bw to compare.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3720 Fixes: fa57924c76d9 ("drm/amd/display: Refactor function dm_dp_mst_is_port_support_mode()") Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Reviewed-by: Jerry Zuo jerry.zuo@amd.com Signed-off-by: Wayne Lin Wayne.Lin@amd.com Signed-off-by: Tom Chung chiahsuan.chung@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit a04d9534a8a75b2806c5321c387be450c364b55e) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 14 +++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c @@ -1831,11 +1831,15 @@ enum dc_status dm_dp_mst_is_port_support if (immediate_upstream_port) { virtual_channel_bw_in_kbps = kbps_from_pbn(immediate_upstream_port->full_pbn); virtual_channel_bw_in_kbps = min(root_link_bw_in_kbps, virtual_channel_bw_in_kbps); - if (bw_range.min_kbps > virtual_channel_bw_in_kbps) { - DRM_DEBUG_DRIVER("MST_DSC dsc decode at last link." - "Max dsc compression can't fit into MST available bw\n"); - return DC_FAIL_BANDWIDTH_VALIDATE; - } + } else { + /* For topology LCT 1 case - only one mstb*/ + virtual_channel_bw_in_kbps = root_link_bw_in_kbps; + } + + if (bw_range.min_kbps > virtual_channel_bw_in_kbps) { + DRM_DEBUG_DRIVER("MST_DSC dsc decode at last link." + "Max dsc compression can't fit into MST available bw\n"); + return DC_FAIL_BANDWIDTH_VALIDATE; } }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryan Lee ryan.lee@canonical.com
commit 17d0d04f3c999e7784648bad70ce1766c3b49d69 upstream.
attach->xmatch was not set when allocating a null profile, which is used in complain mode to allocate a learning profile. This was causing downstream failures in find_attach, which expected a valid xmatch but did not find one under a certain sequence of profile transitions in complain mode.
This patch ensures the xmatch is set up properly for null profiles.
Signed-off-by: Ryan Lee ryan.lee@canonical.com Signed-off-by: John Johansen john.johansen@canonical.com Cc: Paul Kramme kramme@digitalmanufaktur.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/apparmor/policy.c | 1 + 1 file changed, 1 insertion(+)
--- a/security/apparmor/policy.c +++ b/security/apparmor/policy.c @@ -626,6 +626,7 @@ struct aa_profile *aa_alloc_null(struct
/* TODO: ideally we should inherit abi from parent */ profile->label.flags |= FLAG_NULL; + profile->attach.xmatch = aa_get_pdb(nullpdb); rules = list_first_entry(&profile->rules, typeof(*rules), list); rules->file = aa_get_pdb(nullpdb); rules->policy = aa_get_pdb(nullpdb);
On 1/21/25 09:50, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.12.11 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 23 Jan 2025 17:45:02 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.11-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
Am 21.01.2025 um 18:50 schrieb Greg Kroah-Hartman:
This is the start of the stable review cycle for the 6.12.11 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Builds, boots and works on my 2-socket Ivy Bridge Xeon E5-2697 v2 server. No dmesg oddities or regressions found.
Tested-by: Peter Schneider pschneider1968@googlemail.com
Beste Grüße, Peter Schneider
Hi Greg,
On Tue, Jan 21, 2025 at 06:50:48PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.12.11 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 23 Jan 2025 17:45:02 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.11-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
Built and lightly tested, when booting I'm noticing the following in dmesg:
[ +0.007932] ------------[ cut here ]------------ [ +0.000003] WARNING: CPU: 1 PID: 0 at kernel/sched/fair.c:5250 place_entity+0x127/0x130 [ +0.000006] Modules linked in: ahci(E) libahci(E) crc32_pclmul(E) xhci_hcd(E) libata(E) psmouse(E) crc32c_intel(E) > [ +0.000021] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G E 6.12.11-rc1+ #1 [ +0.000004] Tainted: [E]=UNSIGNED_MODULE [ +0.000002] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ +0.000002] RIP: 0010:place_entity+0x127/0x130 [ +0.000003] Code: 01 6b 28 c6 43 52 00 5b 5d 41 5c 41 5d 41 5e e9 2f 83 bc 00 b9 02 00 00 00 49 c1 ee 0a 49 39 ce 4> [ +0.000002] RSP: 0018:ffffbe1f400f8d08 EFLAGS: 00010046 [ +0.000003] RAX: 0000000000000000 RBX: ffff9ed7c0c0f200 RCX: 00000000000000c2 [ +0.000002] RDX: 0000000000000000 RSI: 000000000000001d RDI: 000000000078cfd5 [ +0.000002] RBP: 0000000029d40d60 R08: 00000000a8e83f00 R09: 0000000000000002 [ +0.000002] R10: 00000000006e3ab2 R11: ffff9ed7d4056690 R12: ffff9ed83bd360c0 [ +0.000002] R13: 0000000000000000 R14: 00000000000000c2 R15: 000000000016e360 [ +0.000003] FS: 0000000000000000(0000) GS:ffff9ed83bd00000(0000) knlGS:0000000000000000 [ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000002] CR2: 00007f9f5a7245d8 CR3: 0000000100bfc000 CR4: 0000000000350ef0 [ +0.000003] Call Trace: [ +0.000003] <IRQ> [ +0.000002] ? place_entity+0x127/0x130 [ +0.000002] ? __warn.cold+0x93/0xf6 [ +0.000004] ? place_entity+0x127/0x130 [ +0.000003] ? report_bug+0xff/0x140 [ +0.000005] ? handle_bug+0x58/0x90 [ +0.000002] ? exc_invalid_op+0x17/0x70 [ +0.000003] ? asm_exc_invalid_op+0x1a/0x20 [ +0.000006] ? place_entity+0x127/0x130 [ +0.000003] ? place_entity+0x99/0x130 [ +0.000004] reweight_entity+0x1af/0x1d0 [ +0.000003] enqueue_task_fair+0x30c/0x5e0 [ +0.000005] enqueue_task+0x35/0x150 [ +0.000004] activate_task+0x3a/0x60 [ +0.000003] sched_balance_rq+0x7c6/0xee0 [ +0.000008] sched_balance_domains+0x25b/0x350 [ +0.000005] handle_softirqs+0xcf/0x280 [ +0.000006] __irq_exit_rcu+0x8d/0xb0 [ +0.000003] sysvec_apic_timer_interrupt+0x71/0x90 [ +0.000003] </IRQ> [ +0.000002] <TASK> [ +0.000002] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ +0.000002] RIP: 0010:pv_native_safe_halt+0xf/0x20 [ +0.000004] Code: 22 d7 e9 b4 01 01 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 0> [ +0.000002] RSP: 0018:ffffbe1f400bbed8 EFLAGS: 00000202 [ +0.000003] RAX: 0000000000000001 RBX: ffff9ed7c033e600 RCX: ffff9ed7c0647830 [ +0.000001] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000000017b4 [ +0.000002] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 [ +0.000002] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 [ +0.000001] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ +0.000006] default_idle+0x9/0x20 [ +0.000003] default_idle_call+0x29/0x100 [ +0.000002] do_idle+0x1fe/0x240 [ +0.000005] cpu_startup_entry+0x29/0x30 [ +0.000003] start_secondary+0x11e/0x140 [ +0.000004] common_startup_64+0x13e/0x141 [ +0.000007] </TASK> [ +0.000001] ---[ end trace 0000000000000000 ]---
Not yet bisected which change causes it.
Regards, Salvatore
On 21.01.25 22:46, Salvatore Bonaccorso wrote:
Hi Greg,
...
Built and lightly tested, when booting I'm noticing the following in dmesg:
[ +0.007932] ------------[ cut here ]------------ [ +0.000003] WARNING: CPU: 1 PID: 0 at kernel/sched/fair.c:5250 place_entity+0x127/0x130
same here [no time to bisect]
Ronald
Hi Greg,
On 21/01/2025 21:46, Salvatore Bonaccorso wrote:
Hi Greg,
On Tue, Jan 21, 2025 at 06:50:48PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.12.11 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 23 Jan 2025 17:45:02 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.11-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
Built and lightly tested, when booting I'm noticing the following in dmesg:
[ +0.007932] ------------[ cut here ]------------ [ +0.000003] WARNING: CPU: 1 PID: 0 at kernel/sched/fair.c:5250 place_entity+0x127/0x130 [ +0.000006] Modules linked in: ahci(E) libahci(E) crc32_pclmul(E) xhci_hcd(E) libata(E) psmouse(E) crc32c_intel(E) > [ +0.000021] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G E 6.12.11-rc1+ #1 [ +0.000004] Tainted: [E]=UNSIGNED_MODULE [ +0.000002] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ +0.000002] RIP: 0010:place_entity+0x127/0x130 [ +0.000003] Code: 01 6b 28 c6 43 52 00 5b 5d 41 5c 41 5d 41 5e e9 2f 83 bc 00 b9 02 00 00 00 49 c1 ee 0a 49 39 ce 4> [ +0.000002] RSP: 0018:ffffbe1f400f8d08 EFLAGS: 00010046 [ +0.000003] RAX: 0000000000000000 RBX: ffff9ed7c0c0f200 RCX: 00000000000000c2 [ +0.000002] RDX: 0000000000000000 RSI: 000000000000001d RDI: 000000000078cfd5 [ +0.000002] RBP: 0000000029d40d60 R08: 00000000a8e83f00 R09: 0000000000000002 [ +0.000002] R10: 00000000006e3ab2 R11: ffff9ed7d4056690 R12: ffff9ed83bd360c0 [ +0.000002] R13: 0000000000000000 R14: 00000000000000c2 R15: 000000000016e360 [ +0.000003] FS: 0000000000000000(0000) GS:ffff9ed83bd00000(0000) knlGS:0000000000000000 [ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000002] CR2: 00007f9f5a7245d8 CR3: 0000000100bfc000 CR4: 0000000000350ef0 [ +0.000003] Call Trace: [ +0.000003] <IRQ> [ +0.000002] ? place_entity+0x127/0x130 [ +0.000002] ? __warn.cold+0x93/0xf6 [ +0.000004] ? place_entity+0x127/0x130 [ +0.000003] ? report_bug+0xff/0x140 [ +0.000005] ? handle_bug+0x58/0x90 [ +0.000002] ? exc_invalid_op+0x17/0x70 [ +0.000003] ? asm_exc_invalid_op+0x1a/0x20 [ +0.000006] ? place_entity+0x127/0x130 [ +0.000003] ? place_entity+0x99/0x130 [ +0.000004] reweight_entity+0x1af/0x1d0 [ +0.000003] enqueue_task_fair+0x30c/0x5e0 [ +0.000005] enqueue_task+0x35/0x150 [ +0.000004] activate_task+0x3a/0x60 [ +0.000003] sched_balance_rq+0x7c6/0xee0 [ +0.000008] sched_balance_domains+0x25b/0x350 [ +0.000005] handle_softirqs+0xcf/0x280 [ +0.000006] __irq_exit_rcu+0x8d/0xb0 [ +0.000003] sysvec_apic_timer_interrupt+0x71/0x90 [ +0.000003] </IRQ> [ +0.000002] <TASK> [ +0.000002] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ +0.000002] RIP: 0010:pv_native_safe_halt+0xf/0x20 [ +0.000004] Code: 22 d7 e9 b4 01 01 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 0> [ +0.000002] RSP: 0018:ffffbe1f400bbed8 EFLAGS: 00000202 [ +0.000003] RAX: 0000000000000001 RBX: ffff9ed7c033e600 RCX: ffff9ed7c0647830 [ +0.000001] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000000017b4 [ +0.000002] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 [ +0.000002] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 [ +0.000001] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ +0.000006] default_idle+0x9/0x20 [ +0.000003] default_idle_call+0x29/0x100 [ +0.000002] do_idle+0x1fe/0x240 [ +0.000005] cpu_startup_entry+0x29/0x30 [ +0.000003] start_secondary+0x11e/0x140 [ +0.000004] common_startup_64+0x13e/0x141 [ +0.000007] </TASK> [ +0.000001] ---[ end trace 0000000000000000 ]---
Not yet bisected which change causes it.
I am seeing the same on Tegra. I have not looked to see which change this is yet either.
Jon
On 1/21/25 23:04, Jon Hunter wrote:
Hi Greg,
On 21/01/2025 21:46, Salvatore Bonaccorso wrote:
Hi Greg,
On Tue, Jan 21, 2025 at 06:50:48PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.12.11 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 23 Jan 2025 17:45:02 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.11-rc1...
or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
Built and lightly tested, when booting I'm noticing the following in dmesg:
[ +0.007932] ------------[ cut here ]------------ [ +0.000003] WARNING: CPU: 1 PID: 0 at kernel/sched/fair.c:5250 place_entity+0x127/0x130 [ +0.000006] Modules linked in: ahci(E) libahci(E) crc32_pclmul(E) xhci_hcd(E) libata(E) psmouse(E) crc32c_intel(E) > [ +0.000021] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G E 6.12.11-rc1+ #1 [ +0.000004] Tainted: [E]=UNSIGNED_MODULE [ +0.000002] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ +0.000002] RIP: 0010:place_entity+0x127/0x130 [ +0.000003] Code: 01 6b 28 c6 43 52 00 5b 5d 41 5c 41 5d 41 5e e9 2f 83 bc 00 b9 02 00 00 00 49 c1 ee 0a 49 39 ce 4> [ +0.000002] RSP: 0018:ffffbe1f400f8d08 EFLAGS: 00010046 [ +0.000003] RAX: 0000000000000000 RBX: ffff9ed7c0c0f200 RCX: 00000000000000c2 [ +0.000002] RDX: 0000000000000000 RSI: 000000000000001d RDI: 000000000078cfd5 [ +0.000002] RBP: 0000000029d40d60 R08: 00000000a8e83f00 R09: 0000000000000002 [ +0.000002] R10: 00000000006e3ab2 R11: ffff9ed7d4056690 R12: ffff9ed83bd360c0 [ +0.000002] R13: 0000000000000000 R14: 00000000000000c2 R15: 000000000016e360 [ +0.000003] FS: 0000000000000000(0000) GS:ffff9ed83bd00000(0000) knlGS:0000000000000000 [ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000002] CR2: 00007f9f5a7245d8 CR3: 0000000100bfc000 CR4: 0000000000350ef0 [ +0.000003] Call Trace: [ +0.000003] <IRQ> [ +0.000002] ? place_entity+0x127/0x130 [ +0.000002] ? __warn.cold+0x93/0xf6 [ +0.000004] ? place_entity+0x127/0x130 [ +0.000003] ? report_bug+0xff/0x140 [ +0.000005] ? handle_bug+0x58/0x90 [ +0.000002] ? exc_invalid_op+0x17/0x70 [ +0.000003] ? asm_exc_invalid_op+0x1a/0x20 [ +0.000006] ? place_entity+0x127/0x130 [ +0.000003] ? place_entity+0x99/0x130 [ +0.000004] reweight_entity+0x1af/0x1d0 [ +0.000003] enqueue_task_fair+0x30c/0x5e0 [ +0.000005] enqueue_task+0x35/0x150 [ +0.000004] activate_task+0x3a/0x60 [ +0.000003] sched_balance_rq+0x7c6/0xee0 [ +0.000008] sched_balance_domains+0x25b/0x350 [ +0.000005] handle_softirqs+0xcf/0x280 [ +0.000006] __irq_exit_rcu+0x8d/0xb0 [ +0.000003] sysvec_apic_timer_interrupt+0x71/0x90 [ +0.000003] </IRQ> [ +0.000002] <TASK> [ +0.000002] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ +0.000002] RIP: 0010:pv_native_safe_halt+0xf/0x20 [ +0.000004] Code: 22 d7 e9 b4 01 01 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 0> [ +0.000002] RSP: 0018:ffffbe1f400bbed8 EFLAGS: 00000202 [ +0.000003] RAX: 0000000000000001 RBX: ffff9ed7c033e600 RCX: ffff9ed7c0647830 [ +0.000001] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000000017b4 [ +0.000002] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 [ +0.000002] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 [ +0.000001] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ +0.000006] default_idle+0x9/0x20 [ +0.000003] default_idle_call+0x29/0x100 [ +0.000002] do_idle+0x1fe/0x240 [ +0.000005] cpu_startup_entry+0x29/0x30 [ +0.000003] start_secondary+0x11e/0x140 [ +0.000004] common_startup_64+0x13e/0x141 [ +0.000007] </TASK> [ +0.000001] ---[ end trace 0000000000000000 ]---
Not yet bisected which change causes it.
I am seeing the same on Tegra. I have not looked to see which change this is yet either.
Jon
Seeing this on RISC-V also.
Jan 21 23:51:10 riscv64 kernel: WARNING: CPU: 3 PID: 16 at kernel/sched/fair.c:5250 place_entity+0x108/0x110 Jan 21 23:51:10 riscv64 kernel: Modules linked in: Jan 21 23:51:10 riscv64 kernel: CPU: 3 UID: 0 PID: 16 Comm: rcu_preempt Not tainted 6.12.11-rc1 #2 Jan 21 23:51:10 riscv64 kernel: Hardware name: SiFive HiFive Unmatched A00 (DT) Jan 21 23:51:10 riscv64 kernel: epc : place_entity+0x108/0x110 Jan 21 23:51:10 riscv64 kernel: ra : place_entity+0x9c/0x110 Jan 21 23:51:10 riscv64 kernel: epc : ffffffff8006ed68 ra : ffffffff8006ecfc sp : ffffffc60009bac0 Jan 21 23:51:10 riscv64 kernel: gp : ffffffff81fa6318 tp : ffffffd680323b00 t0 : 0000000000000000 Jan 21 23:51:10 riscv64 kernel: t1 : 0000000000000000 t2 : 0000000000000000 s0 : ffffffc60009bb00 Jan 21 23:51:10 riscv64 kernel: s1 : ffffffd6809f5000 a0 : 000000000053b491 a1 : 0000000000000000 Jan 21 23:51:10 riscv64 kernel: a2 : 0000000000000002 a3 : 0000000000000000 a4 : 0000000000000000 Jan 21 23:51:10 riscv64 kernel: a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000 Jan 21 23:51:10 riscv64 kernel: s2 : 0000000031acb37b s3 : 0000000000000000 s4 : 000000000006920d Jan 21 23:51:10 riscv64 kernel: s5 : ffffffd9fedb2a00 s6 : ffffffd9fedb2900 s7 : ffffffd6829cbb00 Jan 21 23:51:10 riscv64 kernel: s8 : 0000000000000000 s9 : 0000000000000001 s10: 0000000000000201 Jan 21 23:51:10 riscv64 kernel: s11: 0000000000000000 t3 : 0000000000000000 t4 : 0000000000000000 Jan 21 23:51:10 riscv64 kernel: t5 : 0000000000000000 t6 : 0000000000000000 Jan 21 23:51:10 riscv64 kernel: status: 0000000200000100 badaddr: 0000000000000002 cause: 0000000000000003 Jan 21 23:51:10 riscv64 kernel: [<ffffffff8006ed68>] place_entity+0x108/0x110 Jan 21 23:51:10 riscv64 kernel: [<ffffffff8006efe2>] reweight_entity+0x182/0x1b0 Jan 21 23:51:10 riscv64 kernel: [<ffffffff8006f0c8>] update_cfs_group+0x78/0xa8 Jan 21 23:51:10 riscv64 kernel: [<ffffffff8006fc4c>] dequeue_entities+0x114/0x3d8 Jan 21 23:51:10 riscv64 kernel: [<ffffffff80070084>] pick_task_fair+0xb4/0x110 Jan 21 23:51:10 riscv64 kernel: [<ffffffff80077a3c>] pick_next_task_fair+0x1c/0x2f0 Jan 21 23:51:10 riscv64 kernel: [<ffffffff80dc5556>] __schedule+0x176/0xc20 Jan 21 23:51:10 riscv64 kernel: [<ffffffff80dc6022>] schedule+0x22/0x148 Jan 21 23:51:10 riscv64 kernel: [<ffffffff80dcc072>] schedule_timeout+0x82/0x178 Jan 21 23:51:10 riscv64 kernel: [<ffffffff800cb996>] rcu_gp_fqs_loop+0x346/0x4e8 Jan 21 23:51:10 riscv64 kernel: [<ffffffff800cea62>] rcu_gp_kthread+0x12a/0x150 Jan 21 23:51:10 riscv64 kernel: [<ffffffff8004db58>] kthread+0xc8/0xe8 Jan 21 23:51:10 riscv64 kernel: [<ffffffff80dceefe>] ret_from_fork+0xe/0x18 Jan 21 23:51:10 riscv64 kernel: ---[ end trace 0000000000000000 ]---
Reverting commit 11d21f2ae4a254aeef84fcf7b6314f5dd045151f "sched/fair: Fix EEVDF entity placement bug causing scheduling lag" removes the warning.
On Wed, Jan 22, 2025 at 01:01:42AM -0800, Ron Economos wrote:
On 1/21/25 23:04, Jon Hunter wrote:
Hi Greg,
On 21/01/2025 21:46, Salvatore Bonaccorso wrote:
Hi Greg,
On Tue, Jan 21, 2025 at 06:50:48PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.12.11 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 23 Jan 2025 17:45:02 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.11-rc1...
or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
Built and lightly tested, when booting I'm noticing the following in dmesg:
[ +0.007932] ------------[ cut here ]------------ [ +0.000003] WARNING: CPU: 1 PID: 0 at kernel/sched/fair.c:5250 place_entity+0x127/0x130 [ +0.000006] Modules linked in: ahci(E) libahci(E) crc32_pclmul(E) xhci_hcd(E) libata(E) psmouse(E) crc32c_intel(E) > [ +0.000021] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G E 6.12.11-rc1+ #1 [ +0.000004] Tainted: [E]=UNSIGNED_MODULE [ +0.000002] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ +0.000002] RIP: 0010:place_entity+0x127/0x130 [ +0.000003] Code: 01 6b 28 c6 43 52 00 5b 5d 41 5c 41 5d 41 5e e9 2f 83 bc 00 b9 02 00 00 00 49 c1 ee 0a 49 39 ce 4> [ +0.000002] RSP: 0018:ffffbe1f400f8d08 EFLAGS: 00010046 [ +0.000003] RAX: 0000000000000000 RBX: ffff9ed7c0c0f200 RCX: 00000000000000c2 [ +0.000002] RDX: 0000000000000000 RSI: 000000000000001d RDI: 000000000078cfd5 [ +0.000002] RBP: 0000000029d40d60 R08: 00000000a8e83f00 R09: 0000000000000002 [ +0.000002] R10: 00000000006e3ab2 R11: ffff9ed7d4056690 R12: ffff9ed83bd360c0 [ +0.000002] R13: 0000000000000000 R14: 00000000000000c2 R15: 000000000016e360 [ +0.000003] FS: 0000000000000000(0000) GS:ffff9ed83bd00000(0000) knlGS:0000000000000000 [ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000002] CR2: 00007f9f5a7245d8 CR3: 0000000100bfc000 CR4: 0000000000350ef0 [ +0.000003] Call Trace: [ +0.000003] <IRQ> [ +0.000002] ? place_entity+0x127/0x130 [ +0.000002] ? __warn.cold+0x93/0xf6 [ +0.000004] ? place_entity+0x127/0x130 [ +0.000003] ? report_bug+0xff/0x140 [ +0.000005] ? handle_bug+0x58/0x90 [ +0.000002] ? exc_invalid_op+0x17/0x70 [ +0.000003] ? asm_exc_invalid_op+0x1a/0x20 [ +0.000006] ? place_entity+0x127/0x130 [ +0.000003] ? place_entity+0x99/0x130 [ +0.000004] reweight_entity+0x1af/0x1d0 [ +0.000003] enqueue_task_fair+0x30c/0x5e0 [ +0.000005] enqueue_task+0x35/0x150 [ +0.000004] activate_task+0x3a/0x60 [ +0.000003] sched_balance_rq+0x7c6/0xee0 [ +0.000008] sched_balance_domains+0x25b/0x350 [ +0.000005] handle_softirqs+0xcf/0x280 [ +0.000006] __irq_exit_rcu+0x8d/0xb0 [ +0.000003] sysvec_apic_timer_interrupt+0x71/0x90 [ +0.000003] </IRQ> [ +0.000002] <TASK> [ +0.000002] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ +0.000002] RIP: 0010:pv_native_safe_halt+0xf/0x20 [ +0.000004] Code: 22 d7 e9 b4 01 01 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 0> [ +0.000002] RSP: 0018:ffffbe1f400bbed8 EFLAGS: 00000202 [ +0.000003] RAX: 0000000000000001 RBX: ffff9ed7c033e600 RCX: ffff9ed7c0647830 [ +0.000001] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000000017b4 [ +0.000002] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 [ +0.000002] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 [ +0.000001] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ +0.000006] default_idle+0x9/0x20 [ +0.000003] default_idle_call+0x29/0x100 [ +0.000002] do_idle+0x1fe/0x240 [ +0.000005] cpu_startup_entry+0x29/0x30 [ +0.000003] start_secondary+0x11e/0x140 [ +0.000004] common_startup_64+0x13e/0x141 [ +0.000007] </TASK> [ +0.000001] ---[ end trace 0000000000000000 ]---
Not yet bisected which change causes it.
I am seeing the same on Tegra. I have not looked to see which change this is yet either.
Jon
Seeing this on RISC-V also.
Jan 21 23:51:10 riscv64 kernel: WARNING: CPU: 3 PID: 16 at kernel/sched/fair.c:5250 place_entity+0x108/0x110 Jan 21 23:51:10 riscv64 kernel: Modules linked in: Jan 21 23:51:10 riscv64 kernel: CPU: 3 UID: 0 PID: 16 Comm: rcu_preempt Not tainted 6.12.11-rc1 #2 Jan 21 23:51:10 riscv64 kernel: Hardware name: SiFive HiFive Unmatched A00 (DT) Jan 21 23:51:10 riscv64 kernel: epc : place_entity+0x108/0x110 Jan 21 23:51:10 riscv64 kernel: ra : place_entity+0x9c/0x110 Jan 21 23:51:10 riscv64 kernel: epc : ffffffff8006ed68 ra : ffffffff8006ecfc sp : ffffffc60009bac0 Jan 21 23:51:10 riscv64 kernel: gp : ffffffff81fa6318 tp : ffffffd680323b00 t0 : 0000000000000000 Jan 21 23:51:10 riscv64 kernel: t1 : 0000000000000000 t2 : 0000000000000000 s0 : ffffffc60009bb00 Jan 21 23:51:10 riscv64 kernel: s1 : ffffffd6809f5000 a0 : 000000000053b491 a1 : 0000000000000000 Jan 21 23:51:10 riscv64 kernel: a2 : 0000000000000002 a3 : 0000000000000000 a4 : 0000000000000000 Jan 21 23:51:10 riscv64 kernel: a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000 Jan 21 23:51:10 riscv64 kernel: s2 : 0000000031acb37b s3 : 0000000000000000 s4 : 000000000006920d Jan 21 23:51:10 riscv64 kernel: s5 : ffffffd9fedb2a00 s6 : ffffffd9fedb2900 s7 : ffffffd6829cbb00 Jan 21 23:51:10 riscv64 kernel: s8 : 0000000000000000 s9 : 0000000000000001 s10: 0000000000000201 Jan 21 23:51:10 riscv64 kernel: s11: 0000000000000000 t3 : 0000000000000000 t4 : 0000000000000000 Jan 21 23:51:10 riscv64 kernel: t5 : 0000000000000000 t6 : 0000000000000000 Jan 21 23:51:10 riscv64 kernel: status: 0000000200000100 badaddr: 0000000000000002 cause: 0000000000000003 Jan 21 23:51:10 riscv64 kernel: [<ffffffff8006ed68>] place_entity+0x108/0x110 Jan 21 23:51:10 riscv64 kernel: [<ffffffff8006efe2>] reweight_entity+0x182/0x1b0 Jan 21 23:51:10 riscv64 kernel: [<ffffffff8006f0c8>] update_cfs_group+0x78/0xa8 Jan 21 23:51:10 riscv64 kernel: [<ffffffff8006fc4c>] dequeue_entities+0x114/0x3d8 Jan 21 23:51:10 riscv64 kernel: [<ffffffff80070084>] pick_task_fair+0xb4/0x110 Jan 21 23:51:10 riscv64 kernel: [<ffffffff80077a3c>] pick_next_task_fair+0x1c/0x2f0 Jan 21 23:51:10 riscv64 kernel: [<ffffffff80dc5556>] __schedule+0x176/0xc20 Jan 21 23:51:10 riscv64 kernel: [<ffffffff80dc6022>] schedule+0x22/0x148 Jan 21 23:51:10 riscv64 kernel: [<ffffffff80dcc072>] schedule_timeout+0x82/0x178 Jan 21 23:51:10 riscv64 kernel: [<ffffffff800cb996>] rcu_gp_fqs_loop+0x346/0x4e8 Jan 21 23:51:10 riscv64 kernel: [<ffffffff800cea62>] rcu_gp_kthread+0x12a/0x150 Jan 21 23:51:10 riscv64 kernel: [<ffffffff8004db58>] kthread+0xc8/0xe8 Jan 21 23:51:10 riscv64 kernel: [<ffffffff80dceefe>] ret_from_fork+0xe/0x18 Jan 21 23:51:10 riscv64 kernel: ---[ end trace 0000000000000000 ]---
Reverting commit 11d21f2ae4a254aeef84fcf7b6314f5dd045151f "sched/fair: Fix EEVDF entity placement bug causing scheduling lag" removes the warning.
Ah, thanks! I'll go revert that now and push out a -rc2.
greg k-h
Hi Greg
On Wed, Jan 22, 2025 at 2:59 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.12.11 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 23 Jan 2025 17:45:02 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.11-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
thanks,
greg k-h
6.12.11-rc1 tested.
Build successfully completed. Boot successfully completed. No dmesg regressions. Video output normal. Sound output normal.
Lenovo ThinkPad X1 Carbon Gen10(Intel i7-1260P(x86_64) arch linux)
[ 0.000000] Linux version 6.12.11-rc1rv (takeshi@ThinkPadX1Gen10J0764) (gcc (GCC) 14.2.1 20240910, GNU ld (GNU Binutils) 2.43.1) #1 SMP PREEMPT_DYNAMIC Wed Jan 22 07:38:39 JST 2025
Thanks
Tested-by: Takeshi Ogasawara takeshi.ogasawara@futuring-girl.com
On 1/21/25 10:50, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.12.11 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 23 Jan 2025 17:45:02 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.11-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
Hello,
On Tue, 21 Jan 2025 18:50:48 +0100 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.12.11 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
This rc kernel passes DAMON functionality test[1] on my test machine. Attaching the test results summary below. Please note that I retrieved the kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park sj@kernel.org
[1] https://github.com/damonitor/damon-tests/tree/next/corr [2] f6b51ed03daa ("Linux 6.12.11-rc1")
Thanks, SJ
[...]
---
ok 9 selftests: damon: damos_tried_regions.py ok 10 selftests: damon: damon_nr_regions.py ok 11 selftests: damon: reclaim.sh ok 12 selftests: damon: lru_sort.sh ok 13 selftests: damon: debugfs_empty_targets.sh ok 14 selftests: damon: debugfs_huge_count_read_write.sh ok 15 selftests: damon: debugfs_duplicate_context_creation.sh ok 16 selftests: damon: debugfs_rm_non_contexts.sh ok 17 selftests: damon: debugfs_target_ids_read_before_terminate_race.sh ok 18 selftests: damon: debugfs_target_ids_pid_leak.sh ok 19 selftests: damon: sysfs_update_removed_scheme_dir.sh ok 20 selftests: damon: sysfs_update_schemes_tried_regions_hang.py ok 1 selftests: damon-tests: kunit.sh ok 2 selftests: damon-tests: huge_count_read_write.sh ok 3 selftests: damon-tests: buffer_overflow.sh ok 4 selftests: damon-tests: rm_contexts.sh ok 5 selftests: damon-tests: record_null_deref.sh ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh ok 8 selftests: damon-tests: damo_tests.sh ok 9 selftests: damon-tests: masim-record.sh ok 10 selftests: damon-tests: build_i386.sh ok 11 selftests: damon-tests: build_arm64.sh # SKIP ok 12 selftests: damon-tests: build_m68k.sh # SKIP ok 13 selftests: damon-tests: build_i386_idle_flag.sh ok 14 selftests: damon-tests: build_i386_highpte.sh ok 15 selftests: damon-tests: build_nomemcg.sh [33m [92mPASS [39m
On Tue, 21 Jan 2025 at 23:28, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.12.11 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 23 Jan 2025 17:45:02 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.11-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. Regressions on arm64, arm, x86_64, and i386.
The following build warn WARNING: CPU: 1 PID: 22 at kernel/sched/fair.c:5250 place_entity (kernel/sched/fair.c:5250 (discriminator 1))
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Boot regression: exception-warning-cpu-pid-at-kernelschedfair-place_entity
Boot log warnings ---------- <4>[ 160.693155] ------------[ cut here ]------------ <4>[ 160.694179] WARNING: CPU: 1 PID: 22 at kernel/sched/fair.c:5250 place_entity (kernel/sched/fair.c:5250 (discriminator 1)) <4>[ 160.695257] Modules linked in: ip_tables x_tables ipv6 <4>[ 160.696471] CPU: 1 UID: 0 PID: 22 Comm: migration/1 Tainted: G D N 6.12.11-rc1 #1 <4>[ 160.697345] Tainted: [D]=DIE, [N]=TEST <4>[ 160.697903] Hardware name: linux,dummy-virt (DT) <4>[ 160.698415] Stopper: migration_cpu_stop+0x0/0x8c8 <- sched_exec (kernel/sched/core.c:5445) <4>[ 160.699330] pstate: 234020c9 (nzCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--) <4>[ 160.700189] pc : place_entity (kernel/sched/fair.c:5250 (discriminator 1)) <4>[ 160.700827] lr : place_entity (kernel/sched/fair.c:292 kernel/sched/fair.c:289 kernel/sched/fair.c:5177) <4>[ 160.701267] sp : ffff8000801e79e0 <4>[ 160.701601] x29: ffff8000801e79e0 x28: fff00000cad1d852 x27: fff00000cad1d8c8 <4>[ 160.702716] x26: 0000000000000000 x25: 1ffe0000195a3b00 x24: 0000000000000000 <4>[ 160.703741] x23: 0000000000000000 x22: 0000000000231608 x21: 0000000972092cd2 <4>[ 160.704817] x20: 0000000000000000 x19: fff00000cad1d800 x18: 1ffff000100f2ec5 <4>[ 160.705954] x17: 0000000000000000 x16: fff00000c03a4d3c x15: 0000000000000000 <4>[ 160.706967] x14: 1ffe00001b532adb x13: 1ffe000019c18922 x12: fffd80001982cc21 <4>[ 160.708052] x11: 1ffe00001982cc20 x10: fffd80001982cc20 x9 : dfff800000000000 <4>[ 160.709043] x8 : fff00000cc166107 x7 : fff00000cad1d810 x6 : fffd80001982cc20 <4>[ 160.710037] x5 : fff00000cc166100 x4 : 1ffe0000195a3b01 x3 : 1ffe00001b535b92 <4>[ 160.711016] x2 : 1ffe00001b535b96 x1 : 000000000000029c x0 : 0000000000000000 <4>[ 160.712071] Call trace: <4>[ 160.712597] place_entity (kernel/sched/fair.c:5250 (discriminator 1)) <4>[ 160.713221] reweight_entity (kernel/sched/fair.c:3813) <4>[ 160.713802] update_cfs_group (kernel/sched/fair.c:3975 (discriminator 1)) <4>[ 160.714277] dequeue_entities (kernel/sched/fair.c:7091) <4>[ 160.714903] dequeue_task_fair (kernel/sched/fair.c:7144 (discriminator 1)) <4>[ 160.716502] move_queued_task.isra.0 (kernel/sched/core.c:2437 (discriminator 1)) <4>[ 160.717432] migration_cpu_stop (kernel/sched/core.c:2481 kernel/sched/core.c:2540) <4>[ 160.718309] cpu_stopper_thread (kernel/stop_machine.c:511) <4>[ 160.718871] smpboot_thread_fn (kernel/smpboot.c:164) <4>[ 160.719699] kthread (kernel/kthread.c:389) <4>[ 160.720382] ret_from_fork (arch/arm64/kernel/entry.S:861) <4>[ 160.721092] ---[ end trace 0000000000000000 ]---
metadata: history: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.12.y/build/v6.12.... boot log: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.12.y/build/v6.12....
## Build * kernel: 6.12.11-rc1 * git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git * git commit: f6b51ed03daaa30e4c3b3969505cc9b1625babca * git describe: v6.12.10-123-gf6b51ed03daa * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.12.y/build/v6.12....
## Test Regressions (compared to v6.12.9-190-gd056ad259f16)
## Metric Regressions (compared to v6.12.9-190-gd056ad259f16)
## Test Fixes (compared to v6.12.9-190-gd056ad259f16)
## Metric Fixes (compared to v6.12.9-190-gd056ad259f16)
## Test result summary total: 117171, pass: 90748, fail: 8735, skip: 17614, xfail: 74
## Build Summary * arc: 6 total, 5 passed, 1 failed * arm: 143 total, 137 passed, 6 failed * arm64: 58 total, 56 passed, 2 failed * i386: 22 total, 19 passed, 3 failed * mips: 38 total, 33 passed, 5 failed * parisc: 5 total, 3 passed, 2 failed * powerpc: 44 total, 40 passed, 4 failed * riscv: 27 total, 24 passed, 3 failed * s390: 26 total, 22 passed, 4 failed * sh: 6 total, 5 passed, 1 failed * sparc: 5 total, 3 passed, 2 failed * x86_64: 50 total, 49 passed, 1 failed
## Test suites summary * boot * commands * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-efivarfs * kselftest-exec * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-filesystems-epoll * kselftest-firmware * kselftest-fpu * kselftest-ftrace * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-kcmp * kselftest-kvm * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-mincore * kselftest-mqueue * kselftest-net * kselftest-net-mptcp * kselftest-openat2 * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-rust * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-tc-testing * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user_events * kselftest-vDSO * kselftest-x86 * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-build-clang * log-parser-build-gcc * log-parser-test * ltp-capability * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-hugetlb * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-pty * ltp-sched * ltp-smoke * ltp-syscalls * ltp-tracing * perf * rcutorture
-- Linaro LKFT https://lkft.linaro.org
On Wed, Jan 22, 2025 at 03:34:16PM +0530, Naresh Kamboju wrote:
On Tue, 21 Jan 2025 at 23:28, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.12.11 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 23 Jan 2025 17:45:02 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.11-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. Regressions on arm64, arm, x86_64, and i386.
The following build warn WARNING: CPU: 1 PID: 22 at kernel/sched/fair.c:5250 place_entity (kernel/sched/fair.c:5250 (discriminator 1))
Odd, are you seeing the same on .13 ?
On Wed, Jan 22, 2025, at 11:04, Naresh Kamboju wrote:
On Tue, 21 Jan 2025 at 23:28, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
0000000000000000 <4>[ 160.712071] Call trace: <4>[ 160.712597] place_entity (kernel/sched/fair.c:5250 (discriminator 1)) <4>[ 160.713221] reweight_entity (kernel/sched/fair.c:3813) <4>[ 160.713802] update_cfs_group (kernel/sched/fair.c:3975 (discriminator 1)) <4>[ 160.714277] dequeue_entities (kernel/sched/fair.c:7091) <4>[ 160.714903] dequeue_task_fair (kernel/sched/fair.c:7144 (discriminator 1)) <4>[ 160.716502] move_queued_task.isra.0 (kernel/sched/core.c:2437 (discriminator 1))
I don't see anything that immediately sticks out as causing this, but I do see five scheduler patches backported in stable-rc on top of v6.12.8, these are the original commits:
66951e4860d3 ("sched/fair: Fix update_cfs_group() vs DELAY_DEQUEUE") 30dd3b13f9de ("sched_ext: keep running prev when prev->scx.slice != 0") a2a3374c47c4 ("sched_ext: idle: Refresh idle masks during idle-to-idle transitions") 68e449d849fd ("sched_ext: switch class when preempted by higher priority scheduler") 6268d5bc1035 ("sched_ext: Replace rq_lock() to raw_spin_rq_lock() in scx_ops_bypass()")
Arnd
On Wed, Jan 22, 2025 at 11:56:13AM +0100, Arnd Bergmann wrote:
On Wed, Jan 22, 2025, at 11:04, Naresh Kamboju wrote:
On Tue, 21 Jan 2025 at 23:28, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
0000000000000000 <4>[ 160.712071] Call trace: <4>[ 160.712597] place_entity (kernel/sched/fair.c:5250 (discriminator 1)) <4>[ 160.713221] reweight_entity (kernel/sched/fair.c:3813) <4>[ 160.713802] update_cfs_group (kernel/sched/fair.c:3975 (discriminator 1)) <4>[ 160.714277] dequeue_entities (kernel/sched/fair.c:7091) <4>[ 160.714903] dequeue_task_fair (kernel/sched/fair.c:7144 (discriminator 1)) <4>[ 160.716502] move_queued_task.isra.0 (kernel/sched/core.c:2437 (discriminator 1))
I don't see anything that immediately sticks out as causing this, but I do see five scheduler patches backported in stable-rc on top of v6.12.8, these are the original commits:
66951e4860d3 ("sched/fair: Fix update_cfs_group() vs DELAY_DEQUEUE")
This one reworks reweight_entity(), but I've been running with that on top of 13-rc6 for a week or so and not seen this.
On 1/22/25 02:59, Peter Zijlstra wrote:
On Wed, Jan 22, 2025 at 11:56:13AM +0100, Arnd Bergmann wrote:
On Wed, Jan 22, 2025, at 11:04, Naresh Kamboju wrote:
On Tue, 21 Jan 2025 at 23:28, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote: 0000000000000000 <4>[ 160.712071] Call trace: <4>[ 160.712597] place_entity (kernel/sched/fair.c:5250 (discriminator 1)) <4>[ 160.713221] reweight_entity (kernel/sched/fair.c:3813) <4>[ 160.713802] update_cfs_group (kernel/sched/fair.c:3975 (discriminator 1)) <4>[ 160.714277] dequeue_entities (kernel/sched/fair.c:7091) <4>[ 160.714903] dequeue_task_fair (kernel/sched/fair.c:7144 (discriminator 1)) <4>[ 160.716502] move_queued_task.isra.0 (kernel/sched/core.c:2437 (discriminator 1))
I don't see anything that immediately sticks out as causing this, but I do see five scheduler patches backported in stable-rc on top of v6.12.8, these are the original commits:
66951e4860d3 ("sched/fair: Fix update_cfs_group() vs DELAY_DEQUEUE")
This one reworks reweight_entity(), but I've been running with that on top of 13-rc6 for a week or so and not seen this.
The offending commit is 6d71a9c6160479899ee744d2c6d6602a191deb1f "sched/fair: Fix EEVDF entity placement bug causing scheduling lag"
It works fine on 6.13, at least on RISC-V (which is the only arch I test).
It's already been reverted and 6.12.11-rc2 has been pushed out.
Hi Greg,
On 21/01/25 23:20, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.12.11 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 23 Jan 2025 17:45:02 +0000. Anything received after that time might be too late.
No problems seen on x86_64 and aarch64 with our testing.
Tested-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com
Thanks, Harshit
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.11-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
linux-stable-mirror@lists.linaro.org