This is the start of the stable review cycle for the 6.12.6 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 19 Dec 2024 17:05:03 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.6-rc1....
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.12.6-rc1
Juergen Gross jgross@suse.com x86/xen: remove hypercall page
Juergen Gross jgross@suse.com x86/xen: use new hypercall functions instead of hypercall page
Juergen Gross jgross@suse.com x86/xen: add central hypercall functions
Juergen Gross jgross@suse.com x86/xen: don't do PV iret hypercall through hypercall page
Juergen Gross jgross@suse.com x86/static-call: provide a way to do very early static-call updates
Juergen Gross jgross@suse.com objtool/x86: allow syscall instruction
Juergen Gross jgross@suse.com x86: make get_cpu_vendor() accessible from Xen code
Juergen Gross jgross@suse.com xen/netfront: fix crash when removing device
James Morse james.morse@arm.com KVM: arm64: Disable MPAM visibility by default and ignore VMM writes
Miguel Ojeda ojeda@kernel.org rust: kbuild: set `bindgen`'s Rust target version
Nilay Shroff nilay@linux.ibm.com block: Fix potential deadlock while freezing queue and acquiring sysfs_lock
Ming Lei ming.lei@redhat.com blk-mq: move cpuhp callback registering out of q->sysfs_lock
Weizhao Ouyang o451686892@gmail.com kselftest/arm64: abi: fix SVCR detection
Nathan Chancellor nathan@kernel.org blk-iocost: Avoid using clamp() on inuse in __propagate_weights()
Lucas De Marchi lucas.demarchi@intel.com drm/xe/reg_sr: Remove register pool
Mirsad Todorovac mtodorovac69@gmail.com drm/xe: fix the ERR_PTR() returned on failure to allocate tiny pt
Robert Hodaszi robert.hodaszi@digi.com net: dsa: tag_ocelot_8021q: fix broken reception
Jesse Van Gavere jesseevg@gmail.com net: dsa: microchip: KSZ9896 register regmap alignment to 32 bit boundaries
Nikita Yushchenko nikita.yoush@cogentembedded.com net: renesas: rswitch: fix initial MPIC register setting
Thadeu Lima de Souza Cascardo cascardo@igalia.com Bluetooth: btmtk: avoid UAF in btmtk_process_coredump
Iulia Tanasescu iulia.tanasescu@nxp.com Bluetooth: iso: Fix circular lock in iso_conn_big_sync
Iulia Tanasescu iulia.tanasescu@nxp.com Bluetooth: iso: Fix circular lock in iso_listen_bis
Frédéric Danis frederic.danis@collabora.com Bluetooth: SCO: Add support for 16 bits transparent voice setting
Iulia Tanasescu iulia.tanasescu@nxp.com Bluetooth: iso: Fix recursive locking warning
Iulia Tanasescu iulia.tanasescu@nxp.com Bluetooth: iso: Always release hdev at the end of iso_listen_bis
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating
Daniil Tatianin d-tatianin@yandex-team.ru ACPICA: events/evxfregn: don't release the ContextMutex that was never acquired
Charles Keepax ckeepax@opensource.cirrus.com ASoC: Intel: sof_sdw: Add space for a terminator into DAIs array
Daniel Borkmann daniel@iogearbox.net team: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL
Daniel Borkmann daniel@iogearbox.net team: Fix initial vlan_feature set in __team_compute_features
Daniel Borkmann daniel@iogearbox.net bonding: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL
Daniel Borkmann daniel@iogearbox.net bonding: Fix initial {vlan,mpls}_feature set in bond_compute_features
Daniel Borkmann daniel@iogearbox.net net, team, bonding: Add netdev_base_features helper
Martin Ottens martin.ottens@fau.de net/sched: netem: account for backlog updates from child qdisc
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: felix: fix stuck CPU-injected packets with short taprio windows
Maxim Levitsky mlevitsk@redhat.com net: mana: Fix irq_contexts memory leak in mana_gd_setup_irqs
Maxim Levitsky mlevitsk@redhat.com net: mana: Fix memory leak in mana_gd_setup_irqs
Florian Westphal fw@strlen.de netfilter: nf_tables: do not defer rule destruction via call_rcu
Phil Sutter phil@nwl.cc netfilter: IDLETIMER: Fix for possible ABBA deadlock
Phil Sutter phil@nwl.cc selftests: netfilter: Stabilize rpath.sh
Shengjiu Wang shengjiu.wang@nxp.com ASoC: fsl_spdif: change IFACE_PCM to IFACE_MIXER
Shengjiu Wang shengjiu.wang@nxp.com ASoC: fsl_xcvr: change IFACE_PCM to IFACE_MIXER
James Clark james.clark@linaro.org libperf: evlist: Fix --cpu argument on hybrid platform
Michal Luczaj mhal@rbox.co Bluetooth: Improve setsockopt() handling of malformed user input
Shenghao Ding shenghao-ding@ti.com ASoC: tas2781: Fix calibration issue in stress test
Nikita Yushchenko nikita.yoush@cogentembedded.com net: renesas: rswitch: handle stop vs interrupt race
Nikita Yushchenko nikita.yoush@cogentembedded.com net: renesas: rswitch: avoid use-after-put for a device tree node
Nikita Yushchenko nikita.yoush@cogentembedded.com net: renesas: rswitch: fix leaked pointer on error path
Nikita Yushchenko nikita.yoush@cogentembedded.com net: renesas: rswitch: fix race window between tx start and complete
Nikita Yushchenko nikita.yoush@cogentembedded.com net: renesas: rswitch: fix possible early skb release
David Howells dhowells@redhat.com cifs: Fix rmdir failure due to ongoing I/O on deleted file
Petr Machata petrm@nvidia.com Documentation: networking: Add a caveat to nexthop_compat_mode sysctl
Michael Chan michael.chan@broadcom.com bnxt_en: Fix aggregation ID mask to prevent oops on 5760X chips
LongPing Wei weilongping@oppo.com block: get wp_offset by bdev_offset_from_zone_start
Paul Barker paul.barker.ct@bp.renesas.com Documentation: PM: Clarify pm_runtime_resume_and_get() return value
Venkata Prasad Potturu venkataprasad.potturu@amd.com ASoC: amd: yc: Fix the wrong return value
Takashi Iwai tiwai@suse.de ALSA: control: Avoid WARN() for symlink errors
Stefan Wahren wahrenst@gmx.net qca_spi: Make driver probing reliable
Stefan Wahren wahrenst@gmx.net qca_spi: Fix clock speed for multiple QCA7000
Anumula Murali Mohan Reddy anumula@chelsio.com cxgb4: use port number to set mac addr
Ilpo Järvinen ilpo.jarvinen@linux.intel.com ACPI: resource: Fix memory resource type union access
Daniel Machon daniel.machon@microchip.com net: sparx5: fix the maximum frame length register
Daniel Machon daniel.machon@microchip.com net: sparx5: fix FDMA performance issue
Christophe JAILLET christophe.jaillet@wanadoo.fr spi: aspeed: Fix an error handling path in aspeed_spi_[read|write]_user()
Philippe Simons simons.philippe@gmail.com regulator: axp20x: AXP717: set ramp_delay
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: perform error cleanup in ocelot_hwstamp_set()
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: be resilient to loss of PTP packets during transmission
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: ocelot->ts_id_lock and ocelot_port->tx_skbs.lock are IRQ-safe
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: improve handling of TX timestamp for unknown skb
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: fix memory leak on ocelot_port_add_txtstamp_skb()
Eric Dumazet edumazet@google.com net: defer final 'struct net' free in netns dismantle
Eric Dumazet edumazet@google.com net: lapb: increase LAPB_HEADER_LEN
Michael Chan michael.chan@broadcom.com bnxt_en: Fix GSO type for HW GRO packets on 5750X chips
Thomas Weißschuh linux@weissschuh.net ptp: kvm: x86: Return EOPNOTSUPP instead of ENODEV from kvm_arch_ptp_init()
Danielle Ratson danieller@nvidia.com selftests: mlxsw: sharedbuffer: Ensure no extra packets are counted
Danielle Ratson danieller@nvidia.com selftests: mlxsw: sharedbuffer: Remove duplicate test cases
Danielle Ratson danieller@nvidia.com selftests: mlxsw: sharedbuffer: Remove h1 ingress test case
Haoyu Li lihaoyu499@gmail.com wifi: cfg80211: sme: init n_channels before channels[] access
Dan Carpenter dan.carpenter@linaro.org net/mlx5: DR, prevent potential error pointer dereference
Eric Dumazet edumazet@google.com tipc: fix NULL deref in cleanup_bearer()
Remi Pommarel repk@triplefau.lt batman-adv: Do not let TT changes list grows indefinitely
Remi Pommarel repk@triplefau.lt batman-adv: Remove uninitialized data in full table TT response
Remi Pommarel repk@triplefau.lt batman-adv: Do not send uninitialized TT changes
David (Ming Qiang) Wu David.Wu3@amd.com amdgpu/uvd: get ring reference from rq scheduler
Suraj Sonawane surajsonawane0215@gmail.com acpi: nfit: vmalloc-out-of-bounds Read in acpi_nfit_ctl
Arnaldo Carvalho de Melo acme@kernel.org perf machine: Initialize machine->env to address a segfault
Benjamin Lin benjamin-jw.lin@mediatek.com wifi: mac80211: fix station NSS capability initialization order
Emmanuel Grumbach emmanuel.grumbach@intel.com wifi: mac80211: fix a queue stall in certain cases of CSA
Haoyu Li lihaoyu499@gmail.com wifi: mac80211: init cnt before accessing elem in ieee80211_copy_mbssid_beacon
Lin Ma linma@zju.edu.cn wifi: nl80211: fix NL80211_ATTR_MLO_LINK_ID off-by-one
Namhyung Kim namhyung@kernel.org perf tools: Fix build-id event recording
Kumar Kartikeya Dwivedi memxor@gmail.com bpf: Augment raw_tp arguments with PTR_MAYBE_NULL
Michal Luczaj mhal@rbox.co bpf, sockmap: Fix update element with same
Michal Luczaj mhal@rbox.co bpf, sockmap: Fix race between element replace and close()
Jiri Olsa jolsa@kernel.org bpf,perf: Fix invalid prog_array access in perf_event_detach_bpf_prog
Jann Horn jannh@google.com bpf: Fix theoretical prog_array UAF in __uprobe_perf_func()
Kumar Kartikeya Dwivedi memxor@gmail.com bpf: Check size for BTF-based ctx access of pointer members
Darrick J. Wong djwong@kernel.org xfs: unlock inodes when erroring out of xfs_trans_alloc_dir
Darrick J. Wong djwong@kernel.org xfs: only run precommits once per transaction object
Darrick J. Wong djwong@kernel.org xfs: fix scrub tracepoints when inode-rooted btrees are involved
Darrick J. Wong djwong@kernel.org xfs: return from xfs_symlink_verify early on V4 filesystems
Darrick J. Wong djwong@kernel.org xfs: fix null bno_hint handling in xfs_rtallocate_rtg
Darrick J. Wong djwong@kernel.org xfs: return a 64-bit block count from xfs_btree_count_blocks
Darrick J. Wong djwong@kernel.org xfs: don't drop errno values when we fail to ficlone the entire range
Darrick J. Wong djwong@kernel.org xfs: update btree keys correctly when _insrec splits an inode root block
Darrick J. Wong djwong@kernel.org xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink
Harish Kasiviswanathan Harish.Kasiviswanathan@amd.com drm/amdkfd: hard-code MALL cacheline size for gfx11, gfx12
Harish Kasiviswanathan Harish.Kasiviswanathan@amd.com drm/amdkfd: hard-code cacheline size for gfx11
Andrew Martin Andrew.Martin@amd.com drm/amdkfd: Dereference null return value
Christian König christian.koenig@amd.com drm/amdgpu: fix when the cleaner shader is emitted
Kenneth Feng kenneth.feng@amd.com drm/amd/pm: Set SMU v13.0.7 default workload type
Christian König christian.koenig@amd.com drm/amdgpu: fix UVD contiguous CS mapping problem
Eugene Kobyak eugene.kobyak@intel.com drm/i915: Fix NULL pointer dereference in capture_engine
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915/color: Stop using non-posted DSB writes for legacy LUT
Jiasheng Jiang jiashengjiangcool@outlook.com drm/i915: Fix memory leak by correcting cache object name in error handler
Jesse.zhang@amd.com Jesse.zhang@amd.com drm/amdkfd: pause autosuspend when creating pdd
Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com drm/xe: Call invalidation_fence_fini for PT inval fences in error state
Yi Liu yi.l.liu@intel.com iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
Lu Baolu baolu.lu@linux.intel.com iommu/vt-d: Remove cache tags before disabling ATS
Luis Claudio R. Goncalves lgoncalv@redhat.com iommu/tegra241-cmdqv: do not use smp_processor_id in preemptible context
Neal Frager neal.frager@amd.com usb: dwc3: xilinx: make sure pipe clock is deselected in usb2 only mode
Łukasz Bartosik ukaszb@chromium.org usb: typec: ucsi: Fix completion notifications
Lianqin Hu hulianqin@vivo.com usb: gadget: u_serial: Fix the issue that gs_start_io crashed due to accessing null pointer
Joe Hattori joe@pf.is.s.u-tokyo.ac.jp usb: typec: anx7411: fix OF node reference leaks in anx7411_typec_switch_probe()
Xu Yang xu.yang_2@nxp.com usb: dwc3: imx8mp: fix software node kernel dump
Joe Hattori joe@pf.is.s.u-tokyo.ac.jp usb: typec: anx7411: fix fwnode_handle reference leak
Vitalii Mordan mordan@ispras.ru usb: ehci-hcd: fix call balance of clocks handling routines
Takashi Iwai tiwai@suse.de usb: gadget: midi2: Fix interpretation of is_midi1 bits
liuderong liuderong@oppo.com scsi: ufs: core: Update compl_time_stamp_local_clock after completing a cqe
Stefan Wahren wahrenst@gmx.net usb: dwc2: Fix HCD port connection race
Stefan Wahren wahrenst@gmx.net usb: dwc2: hcd: Fix GetPortStatus & SetPortFeature
Stefan Wahren wahrenst@gmx.net usb: dwc2: Fix HCD resume
Joe Hattori joe@pf.is.s.u-tokyo.ac.jp ata: sata_highbank: fix OF node reference leak in highbank_initialize_phys()
Kumar Kartikeya Dwivedi memxor@gmail.com bpf: Revert "bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"
Xu Yang xu.yang_2@nxp.com usb: core: hcd: only check primary hcd skip_phy_initialization
Alan Borzeszkowski alan.borzeszkowski@linux.intel.com gpio: graniterapids: Check if GPIO line can be used for IRQs
Alan Borzeszkowski alan.borzeszkowski@linux.intel.com gpio: graniterapids: Determine if GPIO pad can be used by driver
Shankar Bandal shankar.bandal@intel.com gpio: graniterapids: Fix invalid RXEVCFG register bitmask
Shankar Bandal shankar.bandal@intel.com gpio: graniterapids: Fix invalid GPI_IS register offset
Alan Borzeszkowski alan.borzeszkowski@linux.intel.com gpio: graniterapids: Fix incorrect BAR assignment
Alan Borzeszkowski alan.borzeszkowski@linux.intel.com gpio: graniterapids: Fix vGPIO driver crash
Damien Le Moal dlemoal@kernel.org block: Ignore REQ_NOWAIT for zone reset and zone finish operations
Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz usb: host: max3421-hcd: Correctly abort a USB request.
Miguel Ojeda ojeda@kernel.org drm/panic: remove spurious empty line to clean warning
Chenghai Huang huangchenghai2@huawei.com crypto: hisilicon/debugfs - fix the struct pointer incorrectly offset problem
Alexandre Ghiti alexghiti@rivosinc.com riscv: Fix IPIs usage in kfence_protect_page()
Hridesh MG hridesh699@gmail.com ALSA: hda/realtek: Fix headset mic on Acer Nitro 5
Jaakko Salo jaakkos@gmail.com ALSA: usb-audio: Add implicit feedback quirk for Yamaha THR5
Haoyu Li lihaoyu499@gmail.com gpio: ljca: Initialize num before accessing item in ljca_gpio_config
Christian Loehle christian.loehle@arm.com spi: rockchip: Fix PM runtime count on no-op cs
Shakeel Butt shakeel.butt@linux.dev memcg: slub: fix SUnreclaim for post charged objects
Alan Borzeszkowski alan.borzeszkowski@linux.intel.com gpio: graniterapids: Fix GPIO Ack functionality
Damien Le Moal dlemoal@kernel.org block: Prevent potential deadlocks in zone write plug error recovery
Damien Le Moal dlemoal@kernel.org dm: Fix dm-zoned-reclaim zone write pointer alignment
Damien Le Moal dlemoal@kernel.org block: Use a zone write plug BIO work for REQ_NOWAIT BIOs
Damien Le Moal dlemoal@kernel.org block: Switch to using refcount_t for zone write plugs
Tejun Heo tj@kernel.org blk-cgroup: Fix UAF in blkcg_unpin_online()
Alexandre Ghiti alexghiti@rivosinc.com riscv: Fix wrong usage of __pa() on a fixmap address
Björn Töpel bjorn@rivosinc.com riscv: mm: Do not call pmd dtor on vmemmap page table teardown
Koichiro Den koichiro.den@canonical.com virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize
Koichiro Den koichiro.den@canonical.com virtio_ring: add a func argument 'recycle_done' to virtqueue_resize()
Koichiro Den koichiro.den@canonical.com virtio_net: correct netdev_tx_reset_queue() invocation point
Kuan-Wei Chiu visitorckw@gmail.com perf ftrace: Fix undefined behavior in cmp_profile_data()
MoYuanhao moyuanhao3676@163.com tcp: check space before adding MPTCP SYN options
Frederik Deweerdt deweerdt.lkml@gmail.com splice: do not checksum AF_UNIX sockets
Namjae Jeon linkinjeon@kernel.org ksmbd: fix racy issue from session lookup and expire
Christian Marangi ansuelsmth@gmail.com clk: en7523: Fix wrong BUS clock for EN7581
Kan Liang kan.liang@linux.intel.com perf/x86/intel/ds: Unconditionally drain PEBS DS when changing PEBS_DATA_CFG
Juri Lelli juri.lelli@redhat.com sched/deadline: Fix replenish_dl_new_period dl_server condition
Jann Horn jannh@google.com bpf: Fix UAF via mismatching bpf_prog/attachment RCU flavors
Claudiu Beznea claudiu.beznea.uj@bp.renesas.com serial: sh-sci: Check if TX data was written to device in .tx_empty()
Radhey Shyam Pandey radhey.shyam.pandey@amd.com usb: misc: onboard_usb_dev: skip suspend/resume sequence for USB5744 SMBus support
-------------
Diffstat:
 Documentation/networking/ip-sysctl.rst             |   6 +
 Documentation/power/runtime_pm.rst                  |   4 +-
 Makefile                                            |   4 +-
 arch/arm64/kvm/sys_regs.c                           |  55 ++-
 arch/riscv/include/asm/kfence.h                     |   4 +-
 arch/riscv/kernel/setup.c                           |   2 +-
 arch/riscv/mm/init.c                                |   7 +-
 arch/x86/events/intel/ds.c                          |   2 +-
 arch/x86/include/asm/processor.h                    |   2 +
 arch/x86/include/asm/static_call.h                  |  15 +
 arch/x86/include/asm/sync_core.h                    |   6 +-
 arch/x86/include/asm/xen/hypercall.h                |  36 +-
 arch/x86/kernel/callthunks.c                        |   5 -
 arch/x86/kernel/cpu/common.c                        |  38 +-
 arch/x86/kernel/static_call.c                       |   9 +
 arch/x86/xen/enlighten.c                            |  64 ++-
 arch/x86/xen/enlighten_hvm.c                        |  13 +-
 arch/x86/xen/enlighten_pv.c                         |   4 +-
 arch/x86/xen/enlighten_pvh.c                        |   7 -
 arch/x86/xen/xen-asm.S                              |  50 +-
 arch/x86/xen/xen-head.S                             | 106 ++++-
 arch/x86/xen/xen-ops.h                              |   9 +
 block/blk-cgroup.c                                  |   6 +-
 block/blk-iocost.c                                  |   9 +-
 block/blk-mq-sysfs.c                                |  16 +-
 block/blk-mq.c                                      | 127 ++++-
 block/blk-sysfs.c                                   |   4 +-
 block/blk-zoned.c                                   | 526 +++++++++------------
 drivers/acpi/acpica/evxfregn.c                      |   2 -
 drivers/acpi/nfit/core.c                            |   7 +-
 drivers/acpi/resource.c                             |   6 +-
 drivers/ata/sata_highbank.c                         |   1 +
 drivers/bluetooth/btmtk.c                           |  20 +-
 drivers/clk/clk-en7523.c                            |   5 +-
 drivers/crypto/hisilicon/debugfs.c                  |   4 +-
 drivers/gpio/gpio-graniterapids.c                   |  52 +-
 drivers/gpio/gpio-ljca.c                            |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c              |  17 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c             |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c              |  13 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c               |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c               |  24 +-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c   |  15 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c            |  23 +-
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c  |  12 +-
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c    |   1 +
 drivers/gpu/drm/drm_panic_qr.rs                     |   1 -
 drivers/gpu/drm/i915/display/intel_color.c          |  30 +-
 drivers/gpu/drm/i915/i915_gpu_error.c               |  18 +-
 drivers/gpu/drm/i915/i915_scheduler.c               |   2 +-
 drivers/gpu/drm/xe/tests/xe_migrate.c               |   4 +-
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c         |   8 +
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h         |   1 +
 drivers/gpu/drm/xe/xe_pt.c                          |   3 +-
 drivers/gpu/drm/xe/xe_reg_sr.c                      |  31 +-
 drivers/gpu/drm/xe/xe_reg_sr_types.h                |   6 -
 drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c      |   2 +-
 drivers/iommu/intel/cache.c                         |  34 +-
 drivers/iommu/intel/iommu.c                         |   4 +-
 drivers/md/dm-zoned-reclaim.c                       |   6 +-
 drivers/net/bonding/bond_main.c                     |  10 +-
 drivers/net/dsa/microchip/ksz_common.c              |  42 +-
 drivers/net/dsa/ocelot/felix_vsc9959.c              |  17 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c           |  14 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h           |   9 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h          |   2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c     |   2 +-
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c          |   5 +-
 .../mellanox/mlx5/core/steering/dr_domain.c         |   4 +-
 .../net/ethernet/microchip/sparx5/sparx5_main.c     |  11 +-
 .../net/ethernet/microchip/sparx5/sparx5_port.c     |   2 +-
 drivers/net/ethernet/microsoft/mana/gdma_main.c     |   6 +-
 drivers/net/ethernet/mscc/ocelot_ptp.c              | 207 ++++----
 drivers/net/ethernet/qualcomm/qca_spi.c             |  26 +-
 drivers/net/ethernet/qualcomm/qca_spi.h             |   1 -
 drivers/net/ethernet/renesas/rswitch.c              |  95 ++--
 drivers/net/ethernet/renesas/rswitch.h              |  14 +-
 drivers/net/team/team_core.c                        |  11 +-
 drivers/net/virtio_net.c                            |  24 +-
 drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c   |   2 +-
 drivers/net/xen-netfront.c                          |   5 +-
 drivers/ptp/ptp_kvm_x86.c                           |   6 +-
 drivers/regulator/axp20x-regulator.c                |  36 +-
 drivers/spi/spi-aspeed-smc.c                        |  10 +-
 drivers/spi/spi-rockchip.c                          |  14 +
 drivers/tty/serial/sh-sci.c                         |  29 ++
 drivers/ufs/core/ufshcd.c                           |   1 +
 drivers/usb/core/hcd.c                              |   8 +-
 drivers/usb/dwc2/hcd.c                              |  19 +-
 drivers/usb/dwc3/dwc3-imx8mp.c                      |  30 +-
 drivers/usb/dwc3/dwc3-xilinx.c                      |   5 +-
 drivers/usb/gadget/function/f_midi2.c               |   6 +-
 drivers/usb/gadget/function/u_serial.c              |   9 +-
 drivers/usb/host/ehci-sh.c                          |   9 +-
 drivers/usb/host/max3421-hcd.c                      |  16 +-
 drivers/usb/misc/onboard_usb_dev.c                  |   4 +-
 drivers/usb/typec/anx7411.c                         |  66 ++-
 drivers/usb/typec/ucsi/ucsi.c                       |   6 +-
 drivers/virtio/virtio_ring.c                        |   6 +-
 fs/smb/client/inode.c                               |   5 +-
 fs/smb/server/auth.c                                |   2 +
 fs/smb/server/mgmt/user_session.c                   |   6 +-
 fs/smb/server/server.c                              |   4 +-
 fs/smb/server/smb2pdu.c                             |  27 +-
 fs/xfs/libxfs/xfs_btree.c                           |  33 +-
 fs/xfs/libxfs/xfs_btree.h                           |   2 +-
 fs/xfs/libxfs/xfs_ialloc_btree.c                    |   4 +-
 fs/xfs/libxfs/xfs_symlink_remote.c                  |   4 +-
 fs/xfs/scrub/agheader.c                             |   6 +-
 fs/xfs/scrub/agheader_repair.c                      |   6 +-
 fs/xfs/scrub/fscounters.c                           |   2 +-
 fs/xfs/scrub/ialloc.c                               |   4 +-
 fs/xfs/scrub/refcount.c                             |   2 +-
 fs/xfs/scrub/symlink_repair.c                       |   3 +-
 fs/xfs/scrub/trace.h                                |   2 +-
 fs/xfs/xfs_bmap_util.c                              |   2 +-
 fs/xfs/xfs_file.c                                   |   8 +
 fs/xfs/xfs_rtalloc.c                                |   2 +-
 fs/xfs/xfs_trans.c                                  |  19 +-
 include/linux/blkdev.h                              |   5 +-
 include/linux/bpf.h                                 |  19 +-
 include/linux/compiler.h                            |  39 +-
 include/linux/dsa/ocelot.h                          |   1 +
 include/linux/netdev_features.h                     |   7 +
 include/linux/static_call.h                         |   1 +
 include/linux/virtio.h                              |   3 +-
 include/net/bluetooth/bluetooth.h                   |  10 +-
 include/net/lapb.h                                  |   2 +-
 include/net/mac80211.h                              |   4 +-
 include/net/net_namespace.h                         |   1 +
 include/net/netfilter/nf_tables.h                   |   4 -
 include/soc/mscc/ocelot.h                           |   2 -
 kernel/bpf/btf.c                                    | 149 +++++-
 kernel/bpf/verifier.c                               |  79 +---
 kernel/sched/deadline.c                             |   2 +-
 kernel/static_call_inline.c                         |   2 +-
 kernel/trace/bpf_trace.c                            |  11 +
 kernel/trace/trace_uprobe.c                         |   6 +-
 mm/slub.c                                           |  21 +-
 net/batman-adv/translation-table.c                  |  58 ++-
 net/bluetooth/hci_event.c                           |  33 +-
 net/bluetooth/hci_sock.c                            |  14 +-
 net/bluetooth/iso.c                                 |  71 ++-
 net/bluetooth/l2cap_sock.c                          |  20 +-
 net/bluetooth/rfcomm/sock.c                         |   9 +-
 net/bluetooth/sco.c                                 |  40 +-
 net/core/net_namespace.c                            |  20 +-
 net/core/sock_map.c                                 |   6 +-
 net/dsa/tag_ocelot_8021q.c                          |   2 +-
 net/ipv4/tcp_output.c                               |   6 +-
 net/mac80211/cfg.c                                  |   9 +-
 net/mac80211/ieee80211_i.h                          |  49 +-
 net/mac80211/iface.c                                |  12 +-
 net/mac80211/mlme.c                                 |   2 -
 net/mac80211/util.c                                 |  23 +-
 net/netfilter/nf_tables_api.c                       |  32 +-
 net/netfilter/xt_IDLETIMER.c                        |  52 +-
 net/sched/sch_netem.c                               |  22 +-
 net/tipc/udp_media.c                                |   7 +-
 net/unix/af_unix.c                                  |   1 +
 net/wireless/nl80211.c                              |   2 +-
 net/wireless/sme.c                                  |   1 +
 rust/Makefile                                       |  15 +-
 sound/core/control_led.c                            |  14 +-
 sound/pci/hda/patch_realtek.c                       |   1 +
 sound/soc/amd/yc/acp6x-mach.c                       |  13 +-
 sound/soc/codecs/tas2781-i2c.c                      |   2 +-
 sound/soc/fsl/fsl_spdif.c                           |   2 +-
 sound/soc/fsl/fsl_xcvr.c                            |   2 +-
 sound/soc/intel/boards/sof_sdw.c                    |   8 +-
 sound/usb/quirks.c                                  |   2 +
 tools/lib/perf/evlist.c                             |  18 +-
 tools/objtool/check.c                               |   9 +-
 tools/perf/builtin-ftrace.c                         |   3 +-
 tools/perf/util/build-id.c                          |   4 +-
 tools/perf/util/machine.c                           |   2 +
 .../testing/selftests/arm64/abi/syscall-abi-asm.S   |  32 +-
 .../selftests/bpf/progs/test_tp_btf_nullable.c      |   6 +-
 .../selftests/bpf/progs/verifier_btf_ctx_access.c   |   4 +-
 .../testing/selftests/bpf/progs/verifier_d_path.c   |   4 +-
 .../selftests/drivers/net/mlxsw/sharedbuffer.sh     |  55 ++-
 tools/testing/selftests/net/netfilter/rpath.sh      |  18 +-
 182 files changed, 2200 insertions(+), 1299 deletions(-)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Radhey Shyam Pandey radhey.shyam.pandey@amd.com
commit ce15d6b3d5c3c6f78290066be0f0a4fd89cdeb5b upstream.
USB5744 SMBus initialization is done once in probe() and doing it in resume is not supported, so avoid going into suspend and resetting the hub.
There is a sysfs property 'always_powered_in_suspend' to implement this feature, but since the default state should be a working configuration, override this property value in the driver.
It fixes the suspend/resume testcase on Kria KR260 Robotics Starter Kit.
Fixes: 6782311d04df ("usb: misc: onboard_usb_dev: add Microchip usb5744 SMBus programming support")
Cc: stable@vger.kernel.org
Signed-off-by: Radhey Shyam Pandey radhey.shyam.pandey@amd.com
Link: https://lore.kernel.org/r/1733165302-1694891-1-git-send-email-radhey.shyam.p...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/misc/onboard_usb_dev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/misc/onboard_usb_dev.c b/drivers/usb/misc/onboard_usb_dev.c
index 36b11127280f..75ac3c6aa92d 100644
--- a/drivers/usb/misc/onboard_usb_dev.c
+++ b/drivers/usb/misc/onboard_usb_dev.c
@@ -407,8 +407,10 @@ static int onboard_dev_probe(struct platform_device *pdev)
 	}
 
 	if (of_device_is_compatible(pdev->dev.of_node, "usb424,2744") ||
-	    of_device_is_compatible(pdev->dev.of_node, "usb424,5744"))
+	    of_device_is_compatible(pdev->dev.of_node, "usb424,5744")) {
 		err = onboard_dev_5744_i2c_init(client);
+		onboard_dev->always_powered_in_suspend = true;
+	}
 
 	put_device(&client->dev);
 	if (err < 0)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Claudiu Beznea claudiu.beznea.uj@bp.renesas.com
commit 7cc0e0a43a91052477c2921f924a37d9c3891f0c upstream.
On the Renesas RZ/G3S, when doing suspend to RAM, uart_suspend_port() is called. uart_suspend_port() calls struct uart_port::ops::tx_empty() three times before shutting down the port.
According to the documentation, the struct uart_port::ops::tx_empty() API tests whether the transmitter FIFO and shifter for the port are empty.
The Renesas RZ/G3S SCIFA IP reports the number of data units stored in the transmit FIFO through the FDR (FIFO Data Count Register). The data units in the FIFOs are written in the shift register and transmitted from there. The TEND bit in the Serial Status Register reports if the data was transmitted from the shift register.
In the previous code, in the tx_empty() API implemented by the sh-sci driver, it is considered that the TX is empty if the hardware reports the TEND bit set and the number of data units in the FIFO is zero.
According to the HW manual, the TEND bit has the following meaning:
0: Transmission is in the waiting state or in progress.
1: Transmission is completed.
It has been noticed that when opening the serial device without using it and then switching to a power saving mode, the tx_empty() call in the uart_suspend_port() function fails, leading to the "Unable to drain transmitter" message being printed on the console. This is because TEND=0 if nothing has been transmitted and the FIFOs are empty. As TEND=0 has a double meaning (waiting state, in progress), we can't determine the scenario described above.
Add a software workaround for this. It sets a variable if any data has been sent on the serial console (when using PIO) or if the DMA callback has been called (meaning something has been transmitted). In the tx_empty() API, the status of the DMA transaction is also checked, and if it is completed or in progress, the code falls back to checking the hardware registers instead of relying on the software variable.
Fixes: 73a19e4c0301 ("serial: sh-sci: Add DMA support.")
Cc: stable@vger.kernel.org
Signed-off-by: Claudiu Beznea claudiu.beznea.uj@bp.renesas.com
Link: https://lore.kernel.org/r/20241125115856.513642-1-claudiu.beznea.uj@bp.renes...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/tty/serial/sh-sci.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -157,6 +157,7 @@ struct sci_port {
 
 	bool has_rtscts;
 	bool autorts;
+	bool tx_occurred;
 };
 
 #define SCI_NPORTS CONFIG_SERIAL_SH_SCI_NR_UARTS
@@ -850,6 +851,7 @@ static void sci_transmit_chars(struct ua
 {
 	struct tty_port *tport = &port->state->port;
 	unsigned int stopped = uart_tx_stopped(port);
+	struct sci_port *s = to_sci_port(port);
 	unsigned short status;
 	unsigned short ctrl;
 	int count;
@@ -885,6 +887,7 @@ static void sci_transmit_chars(struct ua
 		}
 
 		sci_serial_out(port, SCxTDR, c);
+		s->tx_occurred = true;
 
 		port->icount.tx++;
 	} while (--count > 0);
@@ -1241,6 +1244,8 @@ static void sci_dma_tx_complete(void *ar
 	if (kfifo_len(&tport->xmit_fifo) < WAKEUP_CHARS)
 		uart_write_wakeup(port);
 
+	s->tx_occurred = true;
+
 	if (!kfifo_is_empty(&tport->xmit_fifo)) {
 		s->cookie_tx = 0;
 		schedule_work(&s->work_tx);
@@ -1731,6 +1736,19 @@ static void sci_flush_buffer(struct uart
 		s->cookie_tx = -EINVAL;
 	}
 }
+
+static void sci_dma_check_tx_occurred(struct sci_port *s)
+{
+	struct dma_tx_state state;
+	enum dma_status status;
+
+	if (!s->chan_tx)
+		return;
+
+	status = dmaengine_tx_status(s->chan_tx, s->cookie_tx, &state);
+	if (status == DMA_COMPLETE || status == DMA_IN_PROGRESS)
+		s->tx_occurred = true;
+}
 #else /* !CONFIG_SERIAL_SH_SCI_DMA */
 static inline void sci_request_dma(struct uart_port *port)
 {
@@ -1740,6 +1758,10 @@ static inline void sci_free_dma(struct u
 {
 }
 
+static void sci_dma_check_tx_occurred(struct sci_port *s)
+{
+}
+
 #define sci_flush_buffer NULL
 #endif /* !CONFIG_SERIAL_SH_SCI_DMA */
 
@@ -2076,6 +2098,12 @@ static unsigned int sci_tx_empty(struct
 {
 	unsigned short status = sci_serial_in(port, SCxSR);
 	unsigned short in_tx_fifo = sci_txfill(port);
+	struct sci_port *s = to_sci_port(port);
+
+	sci_dma_check_tx_occurred(s);
+
+	if (!s->tx_occurred)
+		return TIOCSER_TEMT;
 
 	return (status & SCxSR_TEND(port)) && !in_tx_fifo ? TIOCSER_TEMT : 0;
 }
@@ -2247,6 +2275,7 @@ static int sci_startup(struct uart_port
 
 	dev_dbg(port->dev, "%s(%d)\n", __func__, port->line);
 
+	s->tx_occurred = false;
 	sci_request_dma(port);
 
 	ret = sci_request_irq(s);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jann Horn jannh@google.com
commit ef1b808e3b7c98612feceedf985c2fbbeb28f956 upstream.
Uprobes always use bpf_prog_run_array_uprobe() under tasks-trace-RCU protection. But it is possible to attach a non-sleepable BPF program to a uprobe, and non-sleepable BPF programs are freed via normal RCU (see __bpf_prog_put_noref()). This leads to UAF of the bpf_prog because a normal RCU grace period does not imply a tasks-trace-RCU grace period.
Fix it by explicitly waiting for a tasks-trace-RCU grace period after removing the attachment of a bpf_prog to a perf_event.
Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
Suggested-by: Andrii Nakryiko andrii@kernel.org
Suggested-by: Alexei Starovoitov ast@kernel.org
Signed-off-by: Jann Horn jannh@google.com
Signed-off-by: Andrii Nakryiko andrii@kernel.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20241210-bpf-fix-actual-uprobe-uaf-v1-1-19439849...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/trace/bpf_trace.c | 7 +++++++
 1 file changed, 7 insertions(+)
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2223,6 +2223,13 @@ void perf_event_detach_bpf_prog(struct p
 		bpf_prog_array_free_sleepable(old_array);
 	}
 
+	/*
+	 * It could be that the bpf_prog is not sleepable (and will be freed
+	 * via normal RCU), but is called from a point that supports sleepable
+	 * programs and uses tasks-trace-RCU.
+	 */
+	synchronize_rcu_tasks_trace();
+
 	bpf_prog_put(event->prog);
 	event->prog = NULL;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juri Lelli juri.lelli@redhat.com
commit 22368fe1f9bbf39db2b5b52859589883273e80ce upstream.
The condition in replenish_dl_new_period() that checks if a reservation (dl_server) is deferred and is not handling a starvation case is obviously wrong.
Fix it.
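As a general C aside (independent of the actual scheduler types involved), bitwise '&' and logical '&&' only agree when both operands are exactly 0 or 1, which is why this kind of typo is worth catching. A quick user-space check of the difference:

  #include <stdio.h>

  int main(void)
  {
          int deferred = 2;      /* truthy, but not exactly 1 */
          int running = 0;

          /* Bitwise AND: 2 & !0 == 2 & 1 == 0, so the branch would be skipped. */
          printf("bitwise : %d\n", deferred & !running);
          /* Logical AND: (2 != 0) && (0 == 0) == 1, the intended result. */
          printf("logical : %d\n", deferred && !running);
          return 0;
  }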
Fixes: a110a81c52a9 ("sched/deadline: Deferrable dl server")
Signed-off-by: Juri Lelli juri.lelli@redhat.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20241127063740.8278-1-juri.lelli@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/sched/deadline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d9d5a702f1a6..206691d35b7d 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -781,7 +781,7 @@ static inline void replenish_dl_new_period(struct sched_dl_entity *dl_se,
 	 * If it is a deferred reservation, and the server
 	 * is not handling an starvation case, defer it.
 	 */
-	if (dl_se->dl_defer & !dl_se->dl_defer_running) {
+	if (dl_se->dl_defer && !dl_se->dl_defer_running) {
 		dl_se->dl_throttled = 1;
 		dl_se->dl_defer_armed = 1;
 	}
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kan Liang kan.liang@linux.intel.com
commit 9f3de72a0c37005f897d69e4bdd59c25b8898447 upstream.
The PEBS kernel warnings can still be observed in the following case, when the commands below are run in parallel for a while:
  while true; do perf record --no-buildid -a --intr-regs=AX \
      -e cpu/event=0xd0,umask=0x81/pp \
      -c 10003 -o /dev/null ./triad; done &

  while true; do perf record -e 'cpu/mem-loads,ldlat=3/uP' -W -d -- ./dtlb
  done
The commit b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG") intends to flush the entire PEBS buffer before the hardware is reprogrammed. However, it fails in the above case.
The first perf command utilizes the large PEBS, while the second perf command only utilizes a single PEBS. When the second perf event is added, only n_pebs is incremented. The intel_pmu_pebs_enable() is invoked after intel_pmu_pebs_add(). So the cpuc->n_pebs == cpuc->n_large_pebs check in the intel_pmu_drain_large_pebs() fails. The PEBS DS is not flushed. The new PEBS event should not be taken into account when flushing the existing PEBS DS.
The check is unnecessary here. Before the hardware is reprogrammed, all the stale records must be drained unconditionally.
For single PEBS or PEBS-via-PT, the DS must be empty. The drain_pebs() can handle the empty case. There is no harm in unconditionally draining the PEBS DS.
Fixes: b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG")
Signed-off-by: Kan Liang kan.liang@linux.intel.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241119135504.1463839-2-kan.liang@linux.intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/events/intel/ds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1468,7 +1468,7 @@ void intel_pmu_pebs_enable(struct perf_e
 		 * hence we need to drain when changing said
 		 * size.
 		 */
-		intel_pmu_drain_large_pebs(cpuc);
+		intel_pmu_drain_pebs_buffer();
 		adaptive_pebs_record_size_update();
 		wrmsrl(MSR_PEBS_DATA_CFG, pebs_data_cfg);
 		cpuc->active_pebs_data_cfg = pebs_data_cfg;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Marangi ansuelsmth@gmail.com
commit 2eb75f86d52565367211c51334d15fe672633085 upstream.
The Documentation for EN7581 had a typo and still referenced the EN7523 BUS base source frequency. This was in conflict with a different page in the Documentation that states that the BUS runs at 300MHz (600MHz source with divisor set to 2) and the actual watchdog that ticks at half the BUS clock (150MHz). This was verified with the watchdog by timing the seconds that the system takes to reboot (due to the watchdog) and by operating on different values of the BUS divisor.
The correct values for source of BUS clock are 600MHz and 540MHz.
This was also confirmed by Airoha.
Cc: stable@vger.kernel.org
Fixes: 66bc47326ce2 ("clk: en7523: Add EN7581 support")
Signed-off-by: Christian Marangi ansuelsmth@gmail.com
Link: https://lore.kernel.org/r/20241116105710.19748-1-ansuelsmth@gmail.com
Acked-by: Lorenzo Bianconi lorenzo@kernel.org
Signed-off-by: Stephen Boyd sboyd@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/clk/clk-en7523.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/clk/clk-en7523.c
+++ b/drivers/clk/clk-en7523.c
@@ -92,6 +92,7 @@ static const u32 slic_base[] = { 1000000
 static const u32 npu_base[] = { 333000000, 400000000, 500000000 };
 /* EN7581 */
 static const u32 emi7581_base[] = { 540000000, 480000000, 400000000, 300000000 };
+static const u32 bus7581_base[] = { 600000000, 540000000 };
 static const u32 npu7581_base[] = { 800000000, 750000000, 720000000, 600000000 };
 static const u32 crypto_base[] = { 540000000, 480000000 };
 
@@ -227,8 +228,8 @@ static const struct en_clk_desc en7581_b
 	.base_reg = REG_BUS_CLK_DIV_SEL,
 	.base_bits = 1,
 	.base_shift = 8,
-	.base_values = bus_base,
-	.n_base_values = ARRAY_SIZE(bus_base),
+	.base_values = bus7581_base,
+	.n_base_values = ARRAY_SIZE(bus7581_base),
 
 	.div_bits = 3,
 	.div_shift = 0,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namjae Jeon linkinjeon@kernel.org
commit b95629435b84b9ecc0c765995204a4d8a913ed52 upstream.
Increment the session reference count within the lock for lookup to avoid racy issue with session expire.
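To illustrate the pattern (this is not the ksmbd code; all names below are made up), a lookup that takes its reference while still holding the lookup lock leaves no window in which an expiry path could drop the last reference first. A minimal user-space pthreads sketch:

  #include <pthread.h>
  #include <stdio.h>
  #include <stdlib.h>

  struct session {
          int id;
          int refcount;            /* protected by table_lock in this sketch */
          struct session *next;
  };

  static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
  static struct session *sessions;

  /* Take the reference while still holding the lookup lock, so an expiry
   * path running under the same lock cannot free the session in between. */
  static struct session *session_lookup_get(int id)
  {
          struct session *s;

          pthread_mutex_lock(&table_lock);
          for (s = sessions; s; s = s->next) {
                  if (s->id == id) {
                          s->refcount++;
                          break;
                  }
          }
          pthread_mutex_unlock(&table_lock);
          return s;                /* NULL if not found */
  }

  int main(void)
  {
          struct session *s = calloc(1, sizeof(*s));

          s->id = 42;
          s->refcount = 1;
          sessions = s;

          struct session *found = session_lookup_get(42);
          printf("found id %d, refcount now %d\n", found->id, found->refcount);
          return 0;
  }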
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-25737
Signed-off-by: Namjae Jeon linkinjeon@kernel.org
Signed-off-by: Steve French stfrench@microsoft.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/smb/server/auth.c              |  2 ++
 fs/smb/server/mgmt/user_session.c |  6 +++++-
 fs/smb/server/server.c            |  4 ++--
 fs/smb/server/smb2pdu.c           | 27 ++++++++++++++-------------
 4 files changed, 23 insertions(+), 16 deletions(-)
--- a/fs/smb/server/auth.c
+++ b/fs/smb/server/auth.c
@@ -1016,6 +1016,8 @@ static int ksmbd_get_encryption_key(stru
 
 	ses_enc_key = enc ? sess->smb3encryptionkey :
 		sess->smb3decryptionkey;
+	if (enc)
+		ksmbd_user_session_get(sess);
 	memcpy(key, ses_enc_key, SMB3_ENC_DEC_KEY_SIZE);
 
 	return 0;
--- a/fs/smb/server/mgmt/user_session.c
+++ b/fs/smb/server/mgmt/user_session.c
@@ -263,8 +263,10 @@ struct ksmbd_session *ksmbd_session_look
 
 	down_read(&conn->session_lock);
 	sess = xa_load(&conn->sessions, id);
-	if (sess)
+	if (sess) {
 		sess->last_active = jiffies;
+		ksmbd_user_session_get(sess);
+	}
 	up_read(&conn->session_lock);
 	return sess;
 }
@@ -275,6 +277,8 @@ struct ksmbd_session *ksmbd_session_look
 
 	down_read(&sessions_table_lock);
 	sess = __session_lookup(id);
+	if (sess)
+		ksmbd_user_session_get(sess);
 	up_read(&sessions_table_lock);
 
 	return sess;
--- a/fs/smb/server/server.c
+++ b/fs/smb/server/server.c
@@ -241,14 +241,14 @@ send:
 	if (work->tcon)
 		ksmbd_tree_connect_put(work->tcon);
 	smb3_preauth_hash_rsp(work);
-	if (work->sess)
-		ksmbd_user_session_put(work->sess);
 	if (work->sess && work->sess->enc && work->encrypted &&
 	    conn->ops->encrypt_resp) {
 		rc = conn->ops->encrypt_resp(work);
 		if (rc < 0)
 			conn->ops->set_rsp_status(work, STATUS_DATA_ERROR);
 	}
+	if (work->sess)
+		ksmbd_user_session_put(work->sess);
 
 	ksmbd_conn_write(work);
 }
--- a/fs/smb/server/smb2pdu.c
+++ b/fs/smb/server/smb2pdu.c
@@ -67,8 +67,10 @@ static inline bool check_session_id(stru
 		return false;
 
 	sess = ksmbd_session_lookup_all(conn, id);
-	if (sess)
+	if (sess) {
+		ksmbd_user_session_put(sess);
 		return true;
+	}
 	pr_err("Invalid user session id: %llu\n", id);
 	return false;
 }
@@ -605,10 +607,8 @@ int smb2_check_user_session(struct ksmbd
 
 	/* Check for validity of user session */
 	work->sess = ksmbd_session_lookup_all(conn, sess_id);
-	if (work->sess) {
-		ksmbd_user_session_get(work->sess);
+	if (work->sess)
 		return 1;
-	}
 	ksmbd_debug(SMB, "Invalid user session, Uid %llu\n", sess_id);
 	return -ENOENT;
 }
@@ -1701,29 +1701,35 @@ int smb2_sess_setup(struct ksmbd_work *w
 
 		if (conn->dialect != sess->dialect) {
 			rc = -EINVAL;
+			ksmbd_user_session_put(sess);
 			goto out_err;
 		}
 
 		if (!(req->hdr.Flags & SMB2_FLAGS_SIGNED)) {
 			rc = -EINVAL;
+			ksmbd_user_session_put(sess);
 			goto out_err;
 		}
 
 		if (strncmp(conn->ClientGUID, sess->ClientGUID,
			    SMB2_CLIENT_GUID_SIZE)) {
 			rc = -ENOENT;
+			ksmbd_user_session_put(sess);
 			goto out_err;
 		}
 
 		if (sess->state == SMB2_SESSION_IN_PROGRESS) {
 			rc = -EACCES;
+			ksmbd_user_session_put(sess);
 			goto out_err;
 		}
 
 		if (sess->state == SMB2_SESSION_EXPIRED) {
 			rc = -EFAULT;
+			ksmbd_user_session_put(sess);
 			goto out_err;
 		}
+		ksmbd_user_session_put(sess);
 
 		if (ksmbd_conn_need_reconnect(conn)) {
 			rc = -EFAULT;
@@ -1731,7 +1737,8 @@ int smb2_sess_setup(struct ksmbd_work *w
 			goto out_err;
 		}
 
-		if (ksmbd_session_lookup(conn, sess_id)) {
+		sess = ksmbd_session_lookup(conn, sess_id);
+		if (!sess) {
 			rc = -EACCES;
 			goto out_err;
 		}
@@ -1742,7 +1749,6 @@ int smb2_sess_setup(struct ksmbd_work *w
 		}
 
 		conn->binding = true;
-		ksmbd_user_session_get(sess);
 	} else if ((conn->dialect < SMB30_PROT_ID ||
		    server_conf.flags & KSMBD_GLOBAL_FLAG_SMB3_MULTICHANNEL) &&
		   (req->Flags & SMB2_SESSION_REQ_FLAG_BINDING)) {
@@ -1769,7 +1775,6 @@ int smb2_sess_setup(struct ksmbd_work *w
 		}
 
 		conn->binding = false;
-		ksmbd_user_session_get(sess);
 	}
 	work->sess = sess;
 
@@ -2195,9 +2200,9 @@ err_out:
 int smb2_session_logoff(struct ksmbd_work *work)
 {
 	struct ksmbd_conn *conn = work->conn;
+	struct ksmbd_session *sess = work->sess;
 	struct smb2_logoff_req *req;
 	struct smb2_logoff_rsp *rsp;
-	struct ksmbd_session *sess;
 	u64 sess_id;
 	int err;
 
@@ -2219,11 +2224,6 @@ int smb2_session_logoff(struct ksmbd_wor
 	ksmbd_close_session_fds(work);
 	ksmbd_conn_wait_idle(conn);
 
-	/*
-	 * Re-lookup session to validate if session is deleted
-	 * while waiting request complete
-	 */
-	sess = ksmbd_session_lookup_all(conn, sess_id);
 	if (ksmbd_tree_conn_session_logoff(sess)) {
 		ksmbd_debug(SMB, "Invalid tid %d\n", req->hdr.Id.SyncId.TreeId);
 		rsp->hdr.Status = STATUS_NETWORK_NAME_DELETED;
@@ -8962,6 +8962,7 @@ int smb3_decrypt_req(struct ksmbd_work *
 			    le64_to_cpu(tr_hdr->SessionId));
 		return -ECONNABORTED;
 	}
+	ksmbd_user_session_put(sess);
 
 	iov[0].iov_base = buf;
 	iov[0].iov_len = sizeof(struct smb2_transform_hdr) + 4;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frederik Deweerdt deweerdt.lkml@gmail.com
commit 6bd8614fc2d076fc21b7488c9f279853960964e2 upstream.
When `skb_splice_from_iter` was introduced, it inadvertently added checksumming for AF_UNIX sockets. This resulted in significant slowdowns, for example when using sendfile over unix sockets.
Using the test code in [1] in my test setup (2G single core qemu), the client receives a 1000M file in:
- without the patch: 1482ms (+/- 36ms)
- with the patch: 652.5ms (+/- 22.9ms)
This commit addresses the issue by marking checksumming as unnecessary in `unix_stream_sendmsg`.
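For context, the affected workload looks roughly like the user-space sketch below (the temporary file name and sizes are made up): sendfile() from a regular file into an AF_UNIX stream socket, which is the kind of transfer the commit message describes as being slowed down by the needless checksumming.

  #include <stdio.h>
  #include <string.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/sendfile.h>
  #include <sys/socket.h>
  #include <sys/types.h>

  int main(void)
  {
          int sv[2];
          char buf[64];
          const char msg[] = "hello over AF_UNIX\n";
          off_t off = 0;

          if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
                  perror("socketpair");
                  return 1;
          }

          int fd = open("/tmp/sendfile-demo.txt", O_RDWR | O_CREAT | O_TRUNC, 0600);
          if (fd < 0 || write(fd, msg, strlen(msg)) < 0) {
                  perror("open/write");
                  return 1;
          }

          /* Splice the file into the unix socket without copying through user space... */
          if (sendfile(sv[0], fd, &off, strlen(msg)) < 0) {
                  perror("sendfile");
                  return 1;
          }

          /* ...and read it back on the other end of the socketpair. */
          ssize_t n = read(sv[1], buf, sizeof(buf));
          printf("received %zd bytes over AF_UNIX\n", n);
          return 0;
  }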
Cc: stable@vger.kernel.org
Signed-off-by: Frederik Deweerdt deweerdt.lkml@gmail.com
Fixes: 2e910b95329c ("net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES")
Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com
Reviewed-by: Eric Dumazet edumazet@google.com
Reviewed-by: Joe Damato jdamato@fastly.com
Link: https://patch.msgid.link/Z1fMaHkRf8cfubuE@xiberoa
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/unix/af_unix.c | 1 +
 1 file changed, 1 insertion(+)
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2313,6 +2313,7 @@ static int unix_stream_sendmsg(struct so
 		fds_sent = true;
 
 		if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES)) {
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
 			err = skb_splice_from_iter(skb, &msg->msg_iter, size,
						   sk->sk_allocation);
 			if (err < 0) {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: MoYuanhao moyuanhao3676@163.com
commit 06d64ab46f19ac12f59a1d2aa8cd196b2e4edb5b upstream.
Ensure there is enough space before adding MPTCP options in tcp_syn_options().
Without this check, 'remaining' could underflow, and causes issues. If there is not enough space, MPTCP should not be used.
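The underlying hazard is ordinary unsigned arithmetic: subtracting a larger value from a smaller unsigned 'remaining' wraps around instead of going negative, so later "does it still fit" comparisons are fooled. A small user-space illustration of the guarded pattern (the values are made up):

  #include <stdio.h>

  int main(void)
  {
          unsigned int remaining = 8;   /* bytes of option space left */
          unsigned int size = 12;       /* bytes the next option would need */

          /* Unguarded subtraction wraps: 8 - 12 == 4294967292 with 32-bit unsigned. */
          printf("wrapped: %u\n", remaining - size);

          /* Guarded version: only consume the space if it actually fits. */
          if (remaining >= size) {
                  remaining -= size;
                  printf("option added, %u bytes left\n", remaining);
          } else {
                  printf("not enough room, option skipped\n");
          }
          return 0;
  }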
Signed-off-by: MoYuanhao moyuanhao3676@163.com
Fixes: cec37a6e41aa ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Cc: stable@vger.kernel.org
Acked-by: Matthieu Baerts (NGI0) matttbe@kernel.org
[ Matt: Add Fixes, cc Stable, update Description ]
Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
Reviewed-by: Eric Dumazet edumazet@google.com
Link: https://patch.msgid.link/20241209-net-mptcp-check-space-syn-v1-1-2da992bb6f7...
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ipv4/tcp_output.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -883,8 +883,10 @@ static unsigned int tcp_syn_options(stru
 		unsigned int size;
 
 		if (mptcp_syn_options(sk, skb, &size, &opts->mptcp)) {
-			opts->options |= OPTION_MPTCP;
-			remaining -= size;
+			if (remaining >= size) {
+				opts->options |= OPTION_MPTCP;
+				remaining -= size;
+			}
 		}
 	}
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuan-Wei Chiu visitorckw@gmail.com
commit 246dfe3dc199246bd64635163115f2691623fc53 upstream.
The comparison function cmp_profile_data() violates the C standard's requirements for qsort() comparison functions, which mandate symmetry and transitivity:
* Symmetry: If x < y, then y > x.
* Transitivity: If x < y and y < z, then x < z.
When v1 and v2 are equal, the function incorrectly returns 1, breaking symmetry and transitivity. This causes undefined behavior, which can lead to memory corruption in certain versions of glibc [1].
Fix the issue by returning 0 when v1 and v2 are equal, ensuring compliance with the C standard and preventing undefined behavior.
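For reference, a minimal user-space analogue (not the perf code itself) of a descending comparator that meets both requirements by returning 0 for equal keys:

  #include <stdio.h>
  #include <stdlib.h>

  /* Descending order; returns 0 for equal values, so the comparison is
   * symmetric and transitive as required by the C standard for qsort(). */
  static int cmp_desc(const void *a, const void *b)
  {
          unsigned long long v1 = *(const unsigned long long *)a;
          unsigned long long v2 = *(const unsigned long long *)b;

          if (v1 > v2)
                  return -1;
          if (v1 < v2)
                  return 1;
          return 0;
  }

  int main(void)
  {
          unsigned long long vals[] = { 3, 7, 7, 1, 9 };

          qsort(vals, sizeof(vals) / sizeof(vals[0]), sizeof(vals[0]), cmp_desc);
          for (size_t i = 0; i < sizeof(vals) / sizeof(vals[0]); i++)
                  printf("%llu ", vals[i]);
          printf("\n");
          return 0;
  }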
Link: https://www.qualys.com/2024/01/30/qsort.txt [1]
Fixes: 0f223813edd0 ("perf ftrace: Add 'profile' command")
Fixes: 74ae366c37b7 ("perf ftrace profile: Add -s/--sort option")
Cc: stable@vger.kernel.org
Signed-off-by: Kuan-Wei Chiu visitorckw@gmail.com
Reviewed-by: Namhyung Kim namhyung@kernel.org
Reviewed-by: Arnaldo Carvalho de Melo acme@redhat.com
Cc: jserv@ccns.ncku.edu.tw
Cc: chuang@cs.nycu.edu.tw
Link: https://lore.kernel.org/r/20241209134226.1939163-1-visitorckw@gmail.com
Signed-off-by: Namhyung Kim namhyung@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 tools/perf/builtin-ftrace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 272d3c70810e..a56cf8b0a7d4 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -1151,8 +1151,9 @@ static int cmp_profile_data(const void *a, const void *b)
 
 	if (v1 > v2)
 		return -1;
-	else
+	if (v1 < v2)
 		return 1;
+	return 0;
 }
 
 static void print_profile_result(struct perf_ftrace *ftrace)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Koichiro Den koichiro.den@canonical.com
commit 3ddccbefebdbe0c4c72a248676e4d39ac66a8e26 upstream.
When virtnet_close is followed by virtnet_open, some TX completions can possibly remain unconsumed, until they are finally processed during the first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash [1]. Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call before RX napi enable") was not sufficient to eliminate all BQL crash cases for virtio-net.
This issue can be reproduced with the latest net-next master by running: `while :; do ip l set DEV down; ip l set DEV up; done` under heavy network TX load from inside the machine.
netdev_tx_reset_queue() can actually be dropped from virtnet_open path; the device is not stopped in any case. For BQL core part, it's just like traffic nearly ceases to exist for some period. For stall detector added to BQL, even if virtnet_close could somehow lead to some TX completions delayed for long, followed by virtnet_open, we can just take it as stall as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall detector based on BQL"). Note also that users can still reset stall_max via sysfs.
So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This eliminates the BQL crashes. As a result, netdev_tx_reset_queue() is now explicitly required in freeze/restore path. This patch adds it to immediately after free_unused_bufs(), following the rule of thumb: netdev_tx_reset_queue() should follow any SKB freeing not followed by netdev_tx_completed_queue(). This seems the most consistent and streamlined approach, and now netdev_tx_reset_queue() runs whenever free_unused_bufs() is done.
[1]: ------------[ cut here ]------------ kernel BUG at lib/dynamic_queue_limits.c:99! Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: G N 6.12.0net-next_main+ #2 Tainted: [N]=TEST Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \ BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:dql_completed+0x26b/0x290 Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 65 ff 0d 4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff <0f> 0b 01 d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff RSP: 0018:ffffc900002b0d08 EFLAGS: 00010297 RAX: 0000000000000000 RBX: ffff888102398c80 RCX: 0000000080190009 RDX: 0000000000000000 RSI: 000000000000006a RDI: 0000000000000000 RBP: ffff888102398c00 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000000000ca R11: 0000000000015681 R12: 0000000000000001 R13: ffffc900002b0d68 R14: ffff88811115e000 R15: ffff8881107aca40 FS: 00007f41ded69500(0000) GS:ffff888667dc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556ccc2dc1a0 CR3: 0000000104fd8003 CR4: 0000000000772ef0 PKRU: 55555554 Call Trace: <IRQ> ? die+0x32/0x80 ? do_trap+0xd9/0x100 ? dql_completed+0x26b/0x290 ? dql_completed+0x26b/0x290 ? do_error_trap+0x6d/0xb0 ? dql_completed+0x26b/0x290 ? exc_invalid_op+0x4c/0x60 ? dql_completed+0x26b/0x290 ? asm_exc_invalid_op+0x16/0x20 ? dql_completed+0x26b/0x290 __free_old_xmit+0xff/0x170 [virtio_net] free_old_xmit+0x54/0xc0 [virtio_net] virtnet_poll+0xf4/0xe30 [virtio_net] ? __update_load_avg_cfs_rq+0x264/0x2d0 ? update_curr+0x35/0x260 ? reweight_entity+0x1be/0x260 __napi_poll.constprop.0+0x28/0x1c0 net_rx_action+0x329/0x420 ? enqueue_hrtimer+0x35/0x90 ? trace_hardirqs_on+0x1d/0x80 ? kvm_sched_clock_read+0xd/0x20 ? sched_clock+0xc/0x30 ? kvm_sched_clock_read+0xd/0x20 ? sched_clock+0xc/0x30 ? sched_clock_cpu+0xd/0x1a0 handle_softirqs+0x138/0x3e0 do_softirq.part.0+0x89/0xc0 </IRQ> <TASK> __local_bh_enable_ip+0xa7/0xb0 virtnet_open+0xc8/0x310 [virtio_net] __dev_open+0xfa/0x1b0 __dev_change_flags+0x1de/0x250 dev_change_flags+0x22/0x60 do_setlink.isra.0+0x2df/0x10b0 ? rtnetlink_rcv_msg+0x34f/0x3f0 ? netlink_rcv_skb+0x54/0x100 ? netlink_unicast+0x23e/0x390 ? netlink_sendmsg+0x21e/0x490 ? ____sys_sendmsg+0x31b/0x350 ? avc_has_perm_noaudit+0x67/0xf0 ? cred_has_capability.isra.0+0x75/0x110 ? __nla_validate_parse+0x5f/0xee0 ? __pfx___probestub_irq_enable+0x3/0x10 ? __create_object+0x5e/0x90 ? security_capable+0x3b/0x70 rtnl_newlink+0x784/0xaf0 ? avc_has_perm_noaudit+0x67/0xf0 ? cred_has_capability.isra.0+0x75/0x110 ? stack_depot_save_flags+0x24/0x6d0 ? __pfx_rtnl_newlink+0x10/0x10 rtnetlink_rcv_msg+0x34f/0x3f0 ? do_syscall_64+0x6c/0x180 ? entry_SYSCALL_64_after_hwframe+0x76/0x7e ? __pfx_rtnetlink_rcv_msg+0x10/0x10 netlink_rcv_skb+0x54/0x100 netlink_unicast+0x23e/0x390 netlink_sendmsg+0x21e/0x490 ____sys_sendmsg+0x31b/0x350 ? copy_msghdr_from_user+0x6d/0xa0 ___sys_sendmsg+0x86/0xd0 ? __pte_offset_map+0x17/0x160 ? preempt_count_add+0x69/0xa0 ? __call_rcu_common.constprop.0+0x147/0x610 ? preempt_count_add+0x69/0xa0 ? preempt_count_add+0x69/0xa0 ? _raw_spin_trylock+0x13/0x60 ? 
trace_hardirqs_on+0x1d/0x80 __sys_sendmsg+0x66/0xc0 do_syscall_64+0x6c/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f41defe5b34 Code: 15 e1 12 0f 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00 f3 0f 1e fa 80 3d 35 95 0f 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55 RSP: 002b:00007ffe5336ecc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f41defe5b34 RDX: 0000000000000000 RSI: 00007ffe5336ed30 RDI: 0000000000000003 RBP: 00007ffe5336eda0 R08: 0000000000000010 R09: 0000000000000001 R10: 00007ffe5336f6f9 R11: 0000000000000202 R12: 0000000000000003 R13: 0000000067452259 R14: 0000556ccc28b040 R15: 0000000000000000 </TASK> [...]
Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Cc: stable@vger.kernel.org # v6.11+
Signed-off-by: Koichiro Den koichiro.den@canonical.com
Acked-by: Jason Wang jasowang@redhat.com
Reviewed-by: Xuan Zhuo xuanzhuo@linux.alibaba.com
[ pabeni: trimmed possibly troublesome separator ]
Signed-off-by: Paolo Abeni pabeni@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/virtio_net.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2898,7 +2898,6 @@ static int virtnet_enable_queue_pair(str
 	if (err < 0)
 		goto err_xdp_reg_mem_model;
 
-	netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, qp_index));
 	virtnet_napi_enable(vi->rq[qp_index].vq, &vi->rq[qp_index].napi);
 	virtnet_napi_tx_enable(vi, vi->sq[qp_index].vq, &vi->sq[qp_index].napi);
 
@@ -6728,11 +6727,20 @@ free:
 
 static void remove_vq_common(struct virtnet_info *vi)
 {
+	int i;
+
 	virtio_reset_device(vi->vdev);
 
 	/* Free unused buffers in both send and recv, if any. */
 	free_unused_bufs(vi);
 
+	/*
+	 * Rule of thumb is netdev_tx_reset_queue() should follow any
+	 * skb freeing not followed by netdev_tx_completed_queue()
+	 */
+	for (i = 0; i < vi->max_queue_pairs; i++)
+		netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i));
+
 	free_receive_bufs(vi);
 
 	free_receive_page_frags(vi);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Koichiro Den koichiro.den@canonical.com
commit 8d6712c892019b9b9dc5c7039edd3c9d770b510b upstream.
When virtqueue_resize() has actually recycled all unused buffers, additional work may be required in some cases. Relying solely on its return status is fragile, so introduce a new function argument 'recycle_done', which is invoked when the recycle really occurs.
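The shape of the interface change is an optional "recycling really happened" hook passed alongside the per-buffer recycle callback; below is a generic user-space sketch of that pattern with made-up names (it is not the virtio API itself):

  #include <stdio.h>
  #include <stddef.h>

  /* Resize-like helper: recycles any pending items, then reports via the
   * optional recycle_done hook that the recycle path actually ran. */
  static int ring_resize(int *pending, int npending,
                         void (*recycle)(int item),
                         void (*recycle_done)(void))
  {
          for (int i = 0; i < npending; i++)
                  recycle(pending[i]);

          if (recycle_done)
                  recycle_done();   /* only invoked when recycling ran */

          return 0;                 /* a real implementation would reallocate here */
  }

  static void drop_item(int item)
  {
          printf("recycled item %d\n", item);
  }

  static void reset_counters(void)
  {
          printf("counters reset after recycling\n");
  }

  int main(void)
  {
          int pending[] = { 1, 2, 3 };

          /* A caller that needs post-recycle bookkeeping passes a hook... */
          ring_resize(pending, 3, drop_item, reset_counters);
          /* ...callers that do not simply pass NULL. */
          ring_resize(pending, 3, drop_item, NULL);
          return 0;
  }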
Cc: stable@vger.kernel.org # v6.11+
Signed-off-by: Koichiro Den koichiro.den@canonical.com
Acked-by: Jason Wang jasowang@redhat.com
Reviewed-by: Xuan Zhuo xuanzhuo@linux.alibaba.com
Signed-off-by: Paolo Abeni pabeni@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/virtio_net.c     | 4 ++--
 drivers/virtio/virtio_ring.c | 6 +++++-
 include/linux/virtio.h       | 3 ++-
 3 files changed, 9 insertions(+), 4 deletions(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -3165,7 +3165,7 @@ static int virtnet_rx_resize(struct virt
virtnet_rx_pause(vi, rq);
- err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf); + err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf, NULL); if (err) netdev_err(vi->dev, "resize rx fail: rx queue index: %d err: %d\n", qindex, err);
@@ -3228,7 +3228,7 @@ static int virtnet_tx_resize(struct virt
virtnet_tx_pause(vi, sq);
- err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf); + err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf, NULL); if (err) netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
--- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -2716,6 +2716,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue * @_vq: the struct virtqueue we're talking about. * @num: new ring num * @recycle: callback to recycle unused buffers + * @recycle_done: callback to be invoked when recycle for all unused buffers done * * When it is really necessary to create a new vring, it will set the current vq * into the reset state. Then call the passed callback to recycle the buffer @@ -2736,7 +2737,8 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue * */ int virtqueue_resize(struct virtqueue *_vq, u32 num, - void (*recycle)(struct virtqueue *vq, void *buf)) + void (*recycle)(struct virtqueue *vq, void *buf), + void (*recycle_done)(struct virtqueue *vq)) { struct vring_virtqueue *vq = to_vvq(_vq); int err; @@ -2753,6 +2755,8 @@ int virtqueue_resize(struct virtqueue *_ err = virtqueue_disable_and_recycle(_vq, recycle); if (err) return err; + if (recycle_done) + recycle_done(_vq);
if (vq->packed_ring) err = virtqueue_resize_packed(_vq, num); --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -100,7 +100,8 @@ dma_addr_t virtqueue_get_avail_addr(cons dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq);
int virtqueue_resize(struct virtqueue *vq, u32 num, - void (*recycle)(struct virtqueue *vq, void *buf)); + void (*recycle)(struct virtqueue *vq, void *buf), + void (*recycle_done)(struct virtqueue *vq)); int virtqueue_reset(struct virtqueue *vq, void (*recycle)(struct virtqueue *vq, void *buf));
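For context, a minimal sketch of how a caller might use the new argument after this change; the callback names are hypothetical and only illustrate the intended calling convention:

#include <linux/virtio.h>
#include <linux/slab.h>

/* Called once per unused buffer that is recycled. */
static void my_recycle(struct virtqueue *vq, void *buf)
{
	kfree(buf);
}

/* Called once, and only if unused buffers were actually recycled. */
static void my_recycle_done(struct virtqueue *vq)
{
	/* e.g. reset per-queue accounting tied to the recycled buffers */
}

static int my_resize(struct virtqueue *vq, u32 new_num)
{
	return virtqueue_resize(vq, new_num, my_recycle, my_recycle_done);
}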
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Koichiro Den koichiro.den@canonical.com
commit 1480f0f61b675567ca5d0943d6ef2e39172dcafd upstream.
virtnet_tx_resize() flushes remaining tx skbs, requiring DQL counters to be reset when flushing has actually occurred. Add virtnet_sq_free_unused_buf_done() as a callback for virtqueue_reset() to handle this.
Fixes: c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Cc: stable@vger.kernel.org # v6.11+
Signed-off-by: Koichiro Den koichiro.den@canonical.com
Acked-by: Jason Wang jasowang@redhat.com
Reviewed-by: Xuan Zhuo xuanzhuo@linux.alibaba.com
Signed-off-by: Paolo Abeni pabeni@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/virtio_net.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -502,6 +502,7 @@ struct virtio_net_common_hdr { };
static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf); +static void virtnet_sq_free_unused_buf_done(struct virtqueue *vq); static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp, struct net_device *dev, unsigned int *xdp_xmit, @@ -3228,7 +3229,8 @@ static int virtnet_tx_resize(struct virt
virtnet_tx_pause(vi, sq);
- err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf, NULL); + err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf, + virtnet_sq_free_unused_buf_done); if (err) netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
@@ -5996,6 +5998,14 @@ static void virtnet_sq_free_unused_buf(s xdp_return_frame(ptr_to_xdp(buf)); }
+static void virtnet_sq_free_unused_buf_done(struct virtqueue *vq) +{ + struct virtnet_info *vi = vq->vdev->priv; + int i = vq2txq(vq); + + netdev_tx_reset_queue(netdev_get_tx_queue(vi->dev, i)); +} + static void free_unused_bufs(struct virtnet_info *vi) { void *buf;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Björn Töpel bjorn@rivosinc.com
commit 21f1b85c8912262adf51707e63614a114425eb10 upstream.
The vmemmap page tables, used for RV64 with SPARSEMEM_VMEMMAP, are populated using pmd (page middle directory) huge pages. However, the pmd allocation does not use the generic mechanism used by the VMA code (e.g. pmd_alloc()), nor the RISC-V specific create_pgd_mapping()/alloc_pmd_late(). Instead, the vmemmap page table code allocates a page and calls vmemmap_set_pmd(). As a result, the pmd ctor is *not* called, nor would it make sense to do so.
Now, when tearing down a vmemmap page table pmd, the cleanup code would unconditionally, and incorrectly, call the pmd dtor, which results in a crash (best case).
This issue was found when running the HMM selftests:
| tools/testing/selftests/mm# ./test_hmm.sh smoke
| ... # when unloading the test_hmm.ko module
| page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10915b
| flags: 0x1000000000000000(node=0|zone=1)
| raw: 1000000000000000 0000000000000000 dead000000000122 0000000000000000
| raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
| page dumped because: VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte)
| ------------[ cut here ]------------
| kernel BUG at include/linux/mm.h:3080!
| Kernel BUG [#1]
| Modules linked in: test_hmm(-) sch_fq_codel fuse drm drm_panel_orientation_quirks backlight dm_mod
| CPU: 1 UID: 0 PID: 514 Comm: modprobe Tainted: G W 6.12.0-00982-gf2a4f1682d07 #2
| Tainted: [W]=WARN
| Hardware name: riscv-virtio qemu/qemu, BIOS 2024.10 10/01/2024
| epc : remove_pgd_mapping+0xbec/0x1070
| ra : remove_pgd_mapping+0xbec/0x1070
| epc : ffffffff80010a68 ra : ffffffff80010a68 sp : ff20000000a73940
| gp : ffffffff827b2d88 tp : ff6000008785da40 t0 : ffffffff80fbce04
| t1 : 0720072007200720 t2 : 706d756420656761 s0 : ff20000000a73a50
| s1 : ff6000008915cff8 a0 : 0000000000000039 a1 : 0000000000000008
| a2 : ff600003fff0de20 a3 : 0000000000000000 a4 : 0000000000000000
| a5 : 0000000000000000 a6 : c0000000ffffefff a7 : ffffffff824469b8
| s2 : ff1c0000022456c0 s3 : ff1ffffffdbfffff s4 : ff6000008915c000
| s5 : ff6000008915c000 s6 : ff6000008915c000 s7 : ff1ffffffdc00000
| s8 : 0000000000000001 s9 : ff1ffffffdc00000 s10: ffffffff819a31f0
| s11: ffffffffffffffff t3 : ffffffff8000c950 t4 : ff60000080244f00
| t5 : ff60000080244000 t6 : ff20000000a73708
| status: 0000000200000120 badaddr: ffffffff80010a68 cause: 0000000000000003
| [<ffffffff80010a68>] remove_pgd_mapping+0xbec/0x1070
| [<ffffffff80fd238e>] vmemmap_free+0x14/0x1e
| [<ffffffff8032e698>] section_deactivate+0x220/0x452
| [<ffffffff8032ef7e>] sparse_remove_section+0x4a/0x58
| [<ffffffff802f8700>] __remove_pages+0x7e/0xba
| [<ffffffff803760d8>] memunmap_pages+0x2bc/0x3fe
| [<ffffffff02a3ca28>] dmirror_device_remove_chunks+0x2ea/0x518 [test_hmm]
| [<ffffffff02a3e026>] hmm_dmirror_exit+0x3e/0x1018 [test_hmm]
| [<ffffffff80102c14>] __riscv_sys_delete_module+0x15a/0x2a6
| [<ffffffff80fd020c>] do_trap_ecall_u+0x1f2/0x266
| [<ffffffff80fde0a2>] _new_vmalloc_restore_context_a0+0xc6/0xd2
| Code: bf51 7597 0184 8593 76a5 854a 4097 0029 80e7 2c00 (9002) 7597
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Fatal exception in interrupt
Add a check to avoid calling the pmd dtor if the calling context is vmemmap_free().
Fixes: c75a74f4ba19 ("riscv: mm: Add memory hotplugging support")
Signed-off-by: Björn Töpel bjorn@rivosinc.com
Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com
Link: https://lore.kernel.org/r/20241120131203.1859787-1-bjorn@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt palmer@rivosinc.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/riscv/mm/init.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index 0e8c20adcd98..fc53ce748c80 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -1566,7 +1566,7 @@ static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd) pmd_clear(pmd); }
-static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud) +static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud, bool is_vmemmap) { struct page *page = pud_page(*pud); struct ptdesc *ptdesc = page_ptdesc(page); @@ -1579,7 +1579,8 @@ static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud) return; }
- pagetable_pmd_dtor(ptdesc); + if (!is_vmemmap) + pagetable_pmd_dtor(ptdesc); if (PageReserved(page)) free_reserved_page(page); else @@ -1703,7 +1704,7 @@ static void __meminit remove_pud_mapping(pud_t *pud_base, unsigned long addr, un remove_pmd_mapping(pmd_base, addr, next, is_vmemmap, altmap);
if (pgtable_l4_enabled) - free_pmd_table(pmd_base, pudp); + free_pmd_table(pmd_base, pudp, is_vmemmap); } }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Ghiti alexghiti@rivosinc.com
commit c796e187201242992d6d292bfeff41aadfdf3f29 upstream.
riscv uses fixmap addresses to map the dtb so we can't use __pa() which is reserved for linear mapping addresses.
Fixes: b2473a359763 ("of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verify")
Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com
Link: https://lore.kernel.org/r/20241209074508.53037-1-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt palmer@rivosinc.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/riscv/kernel/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/riscv/kernel/setup.c +++ b/arch/riscv/kernel/setup.c @@ -227,7 +227,7 @@ static void __init init_resources(void) static void __init parse_dtb(void) { /* Early scan of device tree from init memory */ - if (early_init_dt_scan(dtb_early_va, __pa(dtb_early_va))) { + if (early_init_dt_scan(dtb_early_va, dtb_early_pa)) { const char *name = of_flat_dt_get_machine_name();
if (name) {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tejun Heo tj@kernel.org
commit 86e6ca55b83c575ab0f2e105cf08f98e58d3d7af upstream.
blkcg_unpin_online() walks up the blkcg hierarchy putting the online pin. To walk up, it uses blkcg_parent(blkcg) but it was calling that after blkcg_destroy_blkgs(blkcg) which could free the blkcg, leading to the following UAF:
==================================================================
BUG: KASAN: slab-use-after-free in blkcg_unpin_online+0x15a/0x270
Read of size 8 at addr ffff8881057678c0 by task kworker/9:1/117
CPU: 9 UID: 0 PID: 117 Comm: kworker/9:1 Not tainted 6.13.0-rc1-work-00182-gb8f52214c61a-dirty #48
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS unknown 02/02/2022
Workqueue: cgwb_release cgwb_release_workfn
Call Trace:
 <TASK>
 dump_stack_lvl+0x27/0x80
 print_report+0x151/0x710
 kasan_report+0xc0/0x100
 blkcg_unpin_online+0x15a/0x270
 cgwb_release_workfn+0x194/0x480
 process_scheduled_works+0x71b/0xe20
 worker_thread+0x82a/0xbd0
 kthread+0x242/0x2c0
 ret_from_fork+0x33/0x70
 ret_from_fork_asm+0x1a/0x30
 </TASK>
...
Freed by task 1944:
 kasan_save_track+0x2b/0x70
 kasan_save_free_info+0x3c/0x50
 __kasan_slab_free+0x33/0x50
 kfree+0x10c/0x330
 css_free_rwork_fn+0xe6/0xb30
 process_scheduled_works+0x71b/0xe20
 worker_thread+0x82a/0xbd0
 kthread+0x242/0x2c0
 ret_from_fork+0x33/0x70
 ret_from_fork_asm+0x1a/0x30
Note that the UAF is not easy to trigger as the free path is indirected behind a couple of RCU grace periods and a work item execution. I could only trigger it with an artificial msleep() injected in blkcg_unpin_online().
Fix it by reading the parent pointer before destroying the blkcg's blkgs.
Signed-off-by: Tejun Heo tj@kernel.org
Reported-by: Abagail ren renzezhongucas@gmail.com
Suggested-by: Linus Torvalds torvalds@linuxfoundation.org
Fixes: 4308a434e5e0 ("blkcg: don't offline parent blkcg first")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 block/blk-cgroup.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -1324,10 +1324,14 @@ void blkcg_unpin_online(struct cgroup_su struct blkcg *blkcg = css_to_blkcg(blkcg_css);
do { + struct blkcg *parent; + if (!refcount_dec_and_test(&blkcg->online_pin)) break; + + parent = blkcg_parent(blkcg); blkcg_destroy_blkgs(blkcg); - blkcg = blkcg_parent(blkcg); + blkcg = parent; } while (blkcg); }
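The ordering established by the fix is a general pattern when walking up a refcounted hierarchy; a rough sketch with hypothetical types, not taken from the patch: read the parent pointer while the current node is still guaranteed to be alive, then release the node.

#include <linux/refcount.h>

struct node {
	struct node *parent;
	refcount_t ref;
};

static void put_and_walk_up(struct node *n)
{
	while (n) {
		/* read the parent before anything that may free the node */
		struct node *parent = n->parent;

		if (!refcount_dec_and_test(&n->ref))
			break;
		free_node(n);		/* hypothetical destructor */
		n = parent;
	}
}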
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal dlemoal@kernel.org
commit 4122fef16b172f7c1838fcf74340268c86ed96db upstream.
Replace the raw atomic_t reference counting of zone write plugs with a refcount_t. No functional changes.
Reported-by: kernel test robot lkp@intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202411050650.ilIZa8S7-lkp@intel.com/
Signed-off-by: Damien Le Moal dlemoal@kernel.org
Reviewed-by: Christoph Hellwig hch@lst.de
Link: https://lore.kernel.org/r/20241107065438.236348-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 block/blk-zoned.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)
--- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -18,7 +18,7 @@ #include <linux/vmalloc.h> #include <linux/sched/mm.h> #include <linux/spinlock.h> -#include <linux/atomic.h> +#include <linux/refcount.h> #include <linux/mempool.h>
#include "blk.h" @@ -64,7 +64,7 @@ static const char *const zone_cond_name[ struct blk_zone_wplug { struct hlist_node node; struct list_head link; - atomic_t ref; + refcount_t ref; spinlock_t lock; unsigned int flags; unsigned int zone_no; @@ -417,7 +417,7 @@ static struct blk_zone_wplug *disk_get_z
hlist_for_each_entry_rcu(zwplug, &disk->zone_wplugs_hash[idx], node) { if (zwplug->zone_no == zno && - atomic_inc_not_zero(&zwplug->ref)) { + refcount_inc_not_zero(&zwplug->ref)) { rcu_read_unlock(); return zwplug; } @@ -438,7 +438,7 @@ static void disk_free_zone_wplug_rcu(str
static inline void disk_put_zone_wplug(struct blk_zone_wplug *zwplug) { - if (atomic_dec_and_test(&zwplug->ref)) { + if (refcount_dec_and_test(&zwplug->ref)) { WARN_ON_ONCE(!bio_list_empty(&zwplug->bio_list)); WARN_ON_ONCE(!list_empty(&zwplug->link)); WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_UNHASHED)); @@ -469,7 +469,7 @@ static inline bool disk_should_remove_zo * taken when the plug was allocated and another reference taken by the * caller context). */ - if (atomic_read(&zwplug->ref) > 2) + if (refcount_read(&zwplug->ref) > 2) return false;
/* We can remove zone write plugs for zones that are empty or full. */ @@ -539,7 +539,7 @@ again:
INIT_HLIST_NODE(&zwplug->node); INIT_LIST_HEAD(&zwplug->link); - atomic_set(&zwplug->ref, 2); + refcount_set(&zwplug->ref, 2); spin_lock_init(&zwplug->lock); zwplug->flags = 0; zwplug->zone_no = zno; @@ -630,7 +630,7 @@ static inline void disk_zone_wplug_set_e * finished. */ zwplug->flags |= BLK_ZONE_WPLUG_ERROR; - atomic_inc(&zwplug->ref); + refcount_inc(&zwplug->ref);
spin_lock_irqsave(&disk->zone_wplugs_lock, flags); list_add_tail(&zwplug->link, &disk->zone_wplugs_err_list); @@ -1105,7 +1105,7 @@ static void disk_zone_wplug_schedule_bio * reference we take here. */ WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED)); - atomic_inc(&zwplug->ref); + refcount_inc(&zwplug->ref); queue_work(disk->zone_wplugs_wq, &zwplug->bio_work); }
@@ -1450,7 +1450,7 @@ static void disk_destroy_zone_wplugs_has while (!hlist_empty(&disk->zone_wplugs_hash[i])) { zwplug = hlist_entry(disk->zone_wplugs_hash[i].first, struct blk_zone_wplug, node); - atomic_inc(&zwplug->ref); + refcount_inc(&zwplug->ref); disk_remove_zone_wplug(disk, zwplug); disk_put_zone_wplug(zwplug); } @@ -1876,7 +1876,7 @@ int queue_zone_wplugs_show(void *data, s spin_lock_irqsave(&zwplug->lock, flags); zwp_zone_no = zwplug->zone_no; zwp_flags = zwplug->flags; - zwp_ref = atomic_read(&zwplug->ref); + zwp_ref = refcount_read(&zwplug->ref); zwp_wp_offset = zwplug->wp_offset; zwp_bio_list_size = bio_list_size(&zwplug->bio_list); spin_unlock_irqrestore(&zwplug->lock, flags);
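A brief note on the API mapping used by this conversion: unlike atomic_t, refcount_t saturates and warns on suspicious transitions (such as incrementing from zero), which is what the change buys even though no functional change is intended. A hedged sketch of the correspondence, with a hypothetical function name:

#include <linux/refcount.h>
#include <linux/printk.h>

static void refcount_api_sketch(void)
{
	refcount_t ref;

	refcount_set(&ref, 2);			/* was atomic_set(&ref, 2)   */
	refcount_inc(&ref);			/* was atomic_inc()          */
	if (refcount_inc_not_zero(&ref))	/* was atomic_inc_not_zero() */
		refcount_dec(&ref);
	if (refcount_dec_and_test(&ref))	/* was atomic_dec_and_test() */
		pr_debug("last reference dropped\n");
}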
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal dlemoal@kernel.org
commit cae005670887cb07ceafc25bb32e221e56286488 upstream.
For zoned block devices, a write BIO issued to a zone that has no on-going writes will be prepared for execution and allowed to execute immediately by blk_zone_wplug_handle_write() (called from blk_zone_plug_bio()). However, if this BIO specifies REQ_NOWAIT, the allocation of a request for its execution in blk_mq_submit_bio() may fail after blk_zone_plug_bio() has completed, marking the target zone of the BIO as plugged. When this BIO is retried later on, it will be blocked as the zone write plug of the target zone is in a plugged state without any on-going write operation (completion of write operations triggers unplugging of the next write BIOs for a zone). This leads to a BIO that is stuck in a zone write plug and never completes, which results in various issues such as hung tasks.
Avoid this problem by always executing REQ_NOWAIT write BIOs using the BIO work of a zone write plug. This ensures that we never block the BIO issuer and can thus safely ignore the REQ_NOWAIT flag when executing the BIO from the zone write plug BIO work.
Since such BIO may be the first write BIO issued to a zone with no on-going write, modify disk_zone_wplug_add_bio() to schedule the zone write plug BIO work if the write plug is not already marked with the BLK_ZONE_WPLUG_PLUGGED flag. This scheduling is otherwise not necessary as the completion of the on-going write for the zone will schedule the execution of the next plugged BIOs.
blk_zone_wplug_handle_write() is also fixed to better handle zone write plug allocation failures for REQ_NOWAIT BIOs by failing a write BIO using bio_wouldblock_error() instead of bio_io_error().
Reported-by: Bart Van Assche bvanassche@acm.org
Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal dlemoal@kernel.org
Reviewed-by: Christoph Hellwig hch@lst.de
Reviewed-by: Martin K. Petersen martin.petersen@oracle.com
Link: https://lore.kernel.org/r/20241209122357.47838-2-dlemoal@kernel.org
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 block/blk-zoned.c | 62 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 42 insertions(+), 20 deletions(-)
--- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -759,9 +759,25 @@ static bool blk_zone_wplug_handle_reset_ return false; }
-static inline void blk_zone_wplug_add_bio(struct blk_zone_wplug *zwplug, - struct bio *bio, unsigned int nr_segs) +static void disk_zone_wplug_schedule_bio_work(struct gendisk *disk, + struct blk_zone_wplug *zwplug) +{ + /* + * Take a reference on the zone write plug and schedule the submission + * of the next plugged BIO. blk_zone_wplug_bio_work() will release the + * reference we take here. + */ + WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED)); + refcount_inc(&zwplug->ref); + queue_work(disk->zone_wplugs_wq, &zwplug->bio_work); +} + +static inline void disk_zone_wplug_add_bio(struct gendisk *disk, + struct blk_zone_wplug *zwplug, + struct bio *bio, unsigned int nr_segs) { + bool schedule_bio_work = false; + /* * Grab an extra reference on the BIO request queue usage counter. * This reference will be reused to submit a request for the BIO for @@ -778,6 +794,16 @@ static inline void blk_zone_wplug_add_bi bio_clear_polled(bio);
/* + * REQ_NOWAIT BIOs are always handled using the zone write plug BIO + * work, which can block. So clear the REQ_NOWAIT flag and schedule the + * work if this is the first BIO we are plugging. + */ + if (bio->bi_opf & REQ_NOWAIT) { + schedule_bio_work = !(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED); + bio->bi_opf &= ~REQ_NOWAIT; + } + + /* * Reuse the poll cookie field to store the number of segments when * split to the hardware limits. */ @@ -790,6 +816,11 @@ static inline void blk_zone_wplug_add_bi * at the tail of the list to preserve the sequential write order. */ bio_list_add(&zwplug->bio_list, bio); + + zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED; + + if (schedule_bio_work) + disk_zone_wplug_schedule_bio_work(disk, zwplug); }
/* @@ -983,7 +1014,10 @@ static bool blk_zone_wplug_handle_write(
zwplug = disk_get_and_lock_zone_wplug(disk, sector, gfp_mask, &flags); if (!zwplug) { - bio_io_error(bio); + if (bio->bi_opf & REQ_NOWAIT) + bio_wouldblock_error(bio); + else + bio_io_error(bio); return true; }
@@ -992,9 +1026,11 @@ static bool blk_zone_wplug_handle_write(
/* * If the zone is already plugged or has a pending error, add the BIO - * to the plug BIO list. Otherwise, plug and let the BIO execute. + * to the plug BIO list. Do the same for REQ_NOWAIT BIOs to ensure that + * we will not see a BLK_STS_AGAIN failure if we let the BIO execute. + * Otherwise, plug and let the BIO execute. */ - if (zwplug->flags & BLK_ZONE_WPLUG_BUSY) + if (zwplug->flags & BLK_ZONE_WPLUG_BUSY || (bio->bi_opf & REQ_NOWAIT)) goto plug;
/* @@ -1011,8 +1047,7 @@ static bool blk_zone_wplug_handle_write( return false;
plug: - zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED; - blk_zone_wplug_add_bio(zwplug, bio, nr_segs); + disk_zone_wplug_add_bio(disk, zwplug, bio, nr_segs);
spin_unlock_irqrestore(&zwplug->lock, flags);
@@ -1096,19 +1131,6 @@ bool blk_zone_plug_bio(struct bio *bio, } EXPORT_SYMBOL_GPL(blk_zone_plug_bio);
-static void disk_zone_wplug_schedule_bio_work(struct gendisk *disk, - struct blk_zone_wplug *zwplug) -{ - /* - * Take a reference on the zone write plug and schedule the submission - * of the next plugged BIO. blk_zone_wplug_bio_work() will release the - * reference we take here. - */ - WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED)); - refcount_inc(&zwplug->ref); - queue_work(disk->zone_wplugs_wq, &zwplug->bio_work); -} - static void disk_zone_wplug_unplug_bio(struct gendisk *disk, struct blk_zone_wplug *zwplug) {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal dlemoal@kernel.org
commit b76b840fd93374240b59825f1ab8e2f5c9907acb upstream.
The zone reclaim processing of the dm-zoned device mapper uses blkdev_issue_zeroout() to align the write pointer of a zone being used for reclaiming another zone, to write the valid data blocks from the zone being reclaimed at the same position relative to the zone start in the reclaim target zone.
The first call to blkdev_issue_zeroout() will try to use hardware offload using a REQ_OP_WRITE_ZEROES operation if the device reports a non-zero max_write_zeroes_sectors queue limit. If this operation fails because of the lack of hardware support, blkdev_issue_zeroout() falls back to using a regular write operation with the zero-page as buffer. Currently, such REQ_OP_WRITE_ZEROES failure is automatically handled by the block layer zone write plugging code which will execute a report zones operation to ensure that the write pointer of the target zone of the failed operation has not changed and to "rewind" the zone write pointer offset of the target zone as it was advanced when the write zero operation was submitted. So the REQ_OP_WRITE_ZEROES failure does not cause any issue and blkdev_issue_zeroout() works as expected.
However, since the automatic recovery of zone write pointers by the zone write plugging code can potentially cause deadlocks with queue freeze operations, a different recovery must be implemented in preparation for the removal of zone write plugging report zones based recovery.
Do this by introducing the new function blk_zone_issue_zeroout(). This function first calls blkdev_issue_zeroout() with the flag BLKDEV_ZERO_NOFALLBACK to intercept failures of the first execution, which attempts to use the device hardware offload with the REQ_OP_WRITE_ZEROES operation. If this attempt fails, a report zones operation is issued to restore the zone write pointer offset of the target zone to the correct position and blkdev_issue_zeroout() is called again without the BLKDEV_ZERO_NOFALLBACK flag. The report zones operation performing this recovery is implemented using the helper function disk_zone_sync_wp_offset(), which calls the gendisk report_zones file operation with the callback disk_report_zones_cb(). This callback updates the write pointer offset of the target zone using the new function disk_zone_wplug_sync_wp_offset().
dmz_reclaim_align_wp() is modified to change its call to blkdev_issue_zeroout() to a call to blk_zone_issue_zeroout(); no other change is needed as the two functions are functionally equivalent.
Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal dlemoal@kernel.org
Reviewed-by: Christoph Hellwig hch@lst.de
Acked-by: Mike Snitzer snitzer@kernel.org
Reviewed-by: Martin K. Petersen martin.petersen@oracle.com
Link: https://lore.kernel.org/r/20241209122357.47838-4-dlemoal@kernel.org
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 block/blk-zoned.c             | 141 +++++++++++++++++++++++++++++++++++-------
 drivers/md/dm-zoned-reclaim.c |   6 -
 include/linux/blkdev.h        |   3
 3 files changed, 124 insertions(+), 26 deletions(-)
--- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -115,6 +115,30 @@ const char *blk_zone_cond_str(enum blk_z } EXPORT_SYMBOL_GPL(blk_zone_cond_str);
+struct disk_report_zones_cb_args { + struct gendisk *disk; + report_zones_cb user_cb; + void *user_data; +}; + +static void disk_zone_wplug_sync_wp_offset(struct gendisk *disk, + struct blk_zone *zone); + +static int disk_report_zones_cb(struct blk_zone *zone, unsigned int idx, + void *data) +{ + struct disk_report_zones_cb_args *args = data; + struct gendisk *disk = args->disk; + + if (disk->zone_wplugs_hash) + disk_zone_wplug_sync_wp_offset(disk, zone); + + if (!args->user_cb) + return 0; + + return args->user_cb(zone, idx, args->user_data); +} + /** * blkdev_report_zones - Get zones information * @bdev: Target block device @@ -707,6 +731,58 @@ static void disk_zone_wplug_set_wp_offse spin_unlock_irqrestore(&zwplug->lock, flags); }
+static unsigned int blk_zone_wp_offset(struct blk_zone *zone) +{ + switch (zone->cond) { + case BLK_ZONE_COND_IMP_OPEN: + case BLK_ZONE_COND_EXP_OPEN: + case BLK_ZONE_COND_CLOSED: + return zone->wp - zone->start; + case BLK_ZONE_COND_FULL: + return zone->len; + case BLK_ZONE_COND_EMPTY: + return 0; + case BLK_ZONE_COND_NOT_WP: + case BLK_ZONE_COND_OFFLINE: + case BLK_ZONE_COND_READONLY: + default: + /* + * Conventional, offline and read-only zones do not have a valid + * write pointer. + */ + return UINT_MAX; + } +} + +static void disk_zone_wplug_sync_wp_offset(struct gendisk *disk, + struct blk_zone *zone) +{ + struct blk_zone_wplug *zwplug; + unsigned long flags; + + zwplug = disk_get_zone_wplug(disk, zone->start); + if (!zwplug) + return; + + spin_lock_irqsave(&zwplug->lock, flags); + if (zwplug->flags & BLK_ZONE_WPLUG_ERROR) + disk_zone_wplug_set_wp_offset(disk, zwplug, + blk_zone_wp_offset(zone)); + spin_unlock_irqrestore(&zwplug->lock, flags); + + disk_put_zone_wplug(zwplug); +} + +static int disk_zone_sync_wp_offset(struct gendisk *disk, sector_t sector) +{ + struct disk_report_zones_cb_args args = { + .disk = disk, + }; + + return disk->fops->report_zones(disk, sector, 1, + disk_report_zones_cb, &args); +} + static bool blk_zone_wplug_handle_reset_or_finish(struct bio *bio, unsigned int wp_offset) { @@ -1284,29 +1360,6 @@ put_zwplug: disk_put_zone_wplug(zwplug); }
-static unsigned int blk_zone_wp_offset(struct blk_zone *zone) -{ - switch (zone->cond) { - case BLK_ZONE_COND_IMP_OPEN: - case BLK_ZONE_COND_EXP_OPEN: - case BLK_ZONE_COND_CLOSED: - return zone->wp - zone->start; - case BLK_ZONE_COND_FULL: - return zone->len; - case BLK_ZONE_COND_EMPTY: - return 0; - case BLK_ZONE_COND_NOT_WP: - case BLK_ZONE_COND_OFFLINE: - case BLK_ZONE_COND_READONLY: - default: - /* - * Conventional, offline and read-only zones do not have a valid - * write pointer. - */ - return UINT_MAX; - } -} - static int blk_zone_wplug_report_zone_cb(struct blk_zone *zone, unsigned int idx, void *data) { @@ -1876,6 +1929,48 @@ int blk_revalidate_disk_zones(struct gen } EXPORT_SYMBOL_GPL(blk_revalidate_disk_zones);
+/** + * blk_zone_issue_zeroout - zero-fill a block range in a zone + * @bdev: blockdev to write + * @sector: start sector + * @nr_sects: number of sectors to write + * @gfp_mask: memory allocation flags (for bio_alloc) + * + * Description: + * Zero-fill a block range in a zone (@sector must be equal to the zone write + * pointer), handling potential errors due to the (initially unknown) lack of + * hardware offload (See blkdev_issue_zeroout()). + */ +int blk_zone_issue_zeroout(struct block_device *bdev, sector_t sector, + sector_t nr_sects, gfp_t gfp_mask) +{ + int ret; + + if (WARN_ON_ONCE(!bdev_is_zoned(bdev))) + return -EIO; + + ret = blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask, + BLKDEV_ZERO_NOFALLBACK); + if (ret != -EOPNOTSUPP) + return ret; + + /* + * The failed call to blkdev_issue_zeroout() advanced the zone write + * pointer. Undo this using a report zone to update the zone write + * pointer to the correct current value. + */ + ret = disk_zone_sync_wp_offset(bdev->bd_disk, sector); + if (ret != 1) + return ret < 0 ? ret : -EIO; + + /* + * Retry without BLKDEV_ZERO_NOFALLBACK to force the fallback to a + * regular write with zero-pages. + */ + return blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask, 0); +} +EXPORT_SYMBOL_GPL(blk_zone_issue_zeroout); + #ifdef CONFIG_BLK_DEBUG_FS
int queue_zone_wplugs_show(void *data, struct seq_file *m) --- a/drivers/md/dm-zoned-reclaim.c +++ b/drivers/md/dm-zoned-reclaim.c @@ -76,9 +76,9 @@ static int dmz_reclaim_align_wp(struct d * pointer and the requested position. */ nr_blocks = block - wp_block; - ret = blkdev_issue_zeroout(dev->bdev, - dmz_start_sect(zmd, zone) + dmz_blk2sect(wp_block), - dmz_blk2sect(nr_blocks), GFP_NOIO, 0); + ret = blk_zone_issue_zeroout(dev->bdev, + dmz_start_sect(zmd, zone) + dmz_blk2sect(wp_block), + dmz_blk2sect(nr_blocks), GFP_NOIO); if (ret) { dmz_dev_err(dev, "Align zone %u wp %llu to %llu (wp+%u) blocks failed %d", --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1386,6 +1386,9 @@ static inline bool bdev_is_zone_start(st return bdev_offset_from_zone_start(bdev, sector) == 0; }
+int blk_zone_issue_zeroout(struct block_device *bdev, sector_t sector, + sector_t nr_sects, gfp_t gfp_mask); + static inline int queue_dma_alignment(const struct request_queue *q) { return q->limits.dma_alignment;
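As a usage illustration of the new helper (the surrounding context below is hypothetical), a caller on a zoned device passes the zone's current write pointer position as the start sector, just as the dm-zoned hunk above does:

#include <linux/blkdev.h>

/* Zero-fill nr_sects starting at the zone's current write pointer. */
static int zero_from_wp(struct block_device *bdev, sector_t wp_sector,
			sector_t nr_sects)
{
	int ret;

	ret = blk_zone_issue_zeroout(bdev, wp_sector, nr_sects, GFP_NOIO);
	if (ret)
		pr_err("zeroout at %llu failed: %d\n",
		       (unsigned long long)wp_sector, ret);
	return ret;
}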
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal dlemoal@kernel.org
commit fe0418eb9bd69a19a948b297c8de815e05f3cde1 upstream.
Zone write plugging, which handles writes to zones of a zoned block device, always executes a zone report whenever a write BIO to a zone fails. The intent of this is to keep the tracking of a zone write pointer correct, so that the alignment of write BIOs to the zone write pointer can be checked on submission and zone append operations can always be correctly emulated using regular write BIOs.
However, this error recovery scheme introduces a potential deadlock if a device queue freeze is initiated while BIOs are still plugged in a zone write plug and one of these write operations fails. In such a case, the disk zone write plug error recovery work is scheduled and executes a report zones operation. This in turn can result in a request allocation in the underlying driver to issue the report zones command to the device. But with the device queue freeze already started, this allocation will block, preventing execution of the report zones operation and the continuation of the processing of the plugged BIOs. As plugged BIOs hold a queue usage reference, the queue freeze itself will never complete, resulting in a deadlock.
Avoid this problem by completely removing from the zone write plugging code the use of report zones operations after a failed write operation, instead relying on the device user to either execute a report zones, reset the zone, finish the zone, or give up writing to the device (which is a fairly common pattern for file systems which degrade to read-only after write failures). This is not an unreasonable requirement as all well-behaved applications, FSes and device mapper targets already use report zones to recover from write errors whenever possible, by comparing the current position of a zone write pointer with their assumption about its position.
The changes to remove the automatic error recovery are as follows: - Completely remove the error recovery work and its associated resources (zone write plug list head, disk error list, and disk zone_wplugs_work work struct). This also removes the functions disk_zone_wplug_set_error() and disk_zone_wplug_clear_error().
- Change the BLK_ZONE_WPLUG_ERROR zone write plug flag into BLK_ZONE_WPLUG_NEED_WP_UPDATE. This new flag is set for a zone write plug whenever a write operation targeting the zone of the zone write plug fails. This flag indicates that the zone write pointer offset is not reliable and that it must be updated when the next report zone, reset zone, finish zone or disk revalidation is executed.
- Modify blk_zone_write_plug_bio_endio() to set the BLK_ZONE_WPLUG_NEED_WP_UPDATE flag for the target zone of a failed write BIO.
- Modify the function disk_zone_wplug_set_wp_offset() to clear this new flag, thus implementing recovery of a correct write pointer offset with the reset (all) zone and finish zone operations.
- Modify blkdev_report_zones() to always use the disk_report_zones_cb() callback so that disk_zone_wplug_sync_wp_offset() can be called for any zone marked with the BLK_ZONE_WPLUG_NEED_WP_UPDATE flag. This implements recovery of a correct write pointer offset for zone write plugs marked with BLK_ZONE_WPLUG_NEED_WP_UPDATE and within the range of the report zones operation executed by the user.
- Modify blk_revalidate_seq_zone() to call disk_zone_wplug_sync_wp_offset() for all sequential write required zones when a zoned block device is revalidated, thus always resolving any inconsistency between the write pointer offset of zone write plugs and the actual write pointer position of sequential zones.
Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal dlemoal@kernel.org
Reviewed-by: Christoph Hellwig hch@lst.de
Reviewed-by: Martin K. Petersen martin.petersen@oracle.com
Link: https://lore.kernel.org/r/20241209122357.47838-5-dlemoal@kernel.org
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 block/blk-zoned.c      | 308 +++++++++----------------------------------
 include/linux/blkdev.h |   2
 2 files changed, 61 insertions(+), 249 deletions(-)
--- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -41,7 +41,6 @@ static const char *const zone_cond_name[ /* * Per-zone write plug. * @node: hlist_node structure for managing the plug using a hash table. - * @link: To list the plug in the zone write plug error list of the disk. * @ref: Zone write plug reference counter. A zone write plug reference is * always at least 1 when the plug is hashed in the disk plug hash table. * The reference is incremented whenever a new BIO needing plugging is @@ -63,7 +62,6 @@ static const char *const zone_cond_name[ */ struct blk_zone_wplug { struct hlist_node node; - struct list_head link; refcount_t ref; spinlock_t lock; unsigned int flags; @@ -80,8 +78,8 @@ struct blk_zone_wplug { * - BLK_ZONE_WPLUG_PLUGGED: Indicates that the zone write plug is plugged, * that is, that write BIOs are being throttled due to a write BIO already * being executed or the zone write plug bio list is not empty. - * - BLK_ZONE_WPLUG_ERROR: Indicates that a write error happened which will be - * recovered with a report zone to update the zone write pointer offset. + * - BLK_ZONE_WPLUG_NEED_WP_UPDATE: Indicates that we lost track of a zone + * write pointer offset and need to update it. * - BLK_ZONE_WPLUG_UNHASHED: Indicates that the zone write plug was removed * from the disk hash table and that the initial reference to the zone * write plug set when the plug was first added to the hash table has been @@ -91,11 +89,9 @@ struct blk_zone_wplug { * freed once all remaining references from BIOs or functions are dropped. */ #define BLK_ZONE_WPLUG_PLUGGED (1U << 0) -#define BLK_ZONE_WPLUG_ERROR (1U << 1) +#define BLK_ZONE_WPLUG_NEED_WP_UPDATE (1U << 1) #define BLK_ZONE_WPLUG_UNHASHED (1U << 2)
-#define BLK_ZONE_WPLUG_BUSY (BLK_ZONE_WPLUG_PLUGGED | BLK_ZONE_WPLUG_ERROR) - /** * blk_zone_cond_str - Return string XXX in BLK_ZONE_COND_XXX. * @zone_cond: BLK_ZONE_COND_XXX. @@ -163,6 +159,11 @@ int blkdev_report_zones(struct block_dev { struct gendisk *disk = bdev->bd_disk; sector_t capacity = get_capacity(disk); + struct disk_report_zones_cb_args args = { + .disk = disk, + .user_cb = cb, + .user_data = data, + };
if (!bdev_is_zoned(bdev) || WARN_ON_ONCE(!disk->fops->report_zones)) return -EOPNOTSUPP; @@ -170,7 +171,8 @@ int blkdev_report_zones(struct block_dev if (!nr_zones || sector >= capacity) return 0;
- return disk->fops->report_zones(disk, sector, nr_zones, cb, data); + return disk->fops->report_zones(disk, sector, nr_zones, + disk_report_zones_cb, &args); } EXPORT_SYMBOL_GPL(blkdev_report_zones);
@@ -464,7 +466,7 @@ static inline void disk_put_zone_wplug(s { if (refcount_dec_and_test(&zwplug->ref)) { WARN_ON_ONCE(!bio_list_empty(&zwplug->bio_list)); - WARN_ON_ONCE(!list_empty(&zwplug->link)); + WARN_ON_ONCE(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED); WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_UNHASHED));
call_rcu(&zwplug->rcu_head, disk_free_zone_wplug_rcu); @@ -478,8 +480,8 @@ static inline bool disk_should_remove_zo if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED) return false;
- /* If the zone write plug is still busy, it cannot be removed. */ - if (zwplug->flags & BLK_ZONE_WPLUG_BUSY) + /* If the zone write plug is still plugged, it cannot be removed. */ + if (zwplug->flags & BLK_ZONE_WPLUG_PLUGGED) return false;
/* @@ -562,7 +564,6 @@ again: return NULL;
INIT_HLIST_NODE(&zwplug->node); - INIT_LIST_HEAD(&zwplug->link); refcount_set(&zwplug->ref, 2); spin_lock_init(&zwplug->lock); zwplug->flags = 0; @@ -611,124 +612,29 @@ static void disk_zone_wplug_abort(struct }
/* - * Abort (fail) all plugged BIOs of a zone write plug that are not aligned - * with the assumed write pointer location of the zone when the BIO will - * be unplugged. - */ -static void disk_zone_wplug_abort_unaligned(struct gendisk *disk, - struct blk_zone_wplug *zwplug) -{ - unsigned int wp_offset = zwplug->wp_offset; - struct bio_list bl = BIO_EMPTY_LIST; - struct bio *bio; - - while ((bio = bio_list_pop(&zwplug->bio_list))) { - if (disk_zone_is_full(disk, zwplug->zone_no, wp_offset) || - (bio_op(bio) != REQ_OP_ZONE_APPEND && - bio_offset_from_zone_start(bio) != wp_offset)) { - blk_zone_wplug_bio_io_error(zwplug, bio); - continue; - } - - wp_offset += bio_sectors(bio); - bio_list_add(&bl, bio); - } - - bio_list_merge(&zwplug->bio_list, &bl); -} - -static inline void disk_zone_wplug_set_error(struct gendisk *disk, - struct blk_zone_wplug *zwplug) -{ - unsigned long flags; - - if (zwplug->flags & BLK_ZONE_WPLUG_ERROR) - return; - - /* - * At this point, we already have a reference on the zone write plug. - * However, since we are going to add the plug to the disk zone write - * plugs work list, increase its reference count. This reference will - * be dropped in disk_zone_wplugs_work() once the error state is - * handled, or in disk_zone_wplug_clear_error() if the zone is reset or - * finished. - */ - zwplug->flags |= BLK_ZONE_WPLUG_ERROR; - refcount_inc(&zwplug->ref); - - spin_lock_irqsave(&disk->zone_wplugs_lock, flags); - list_add_tail(&zwplug->link, &disk->zone_wplugs_err_list); - spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); -} - -static inline void disk_zone_wplug_clear_error(struct gendisk *disk, - struct blk_zone_wplug *zwplug) -{ - unsigned long flags; - - if (!(zwplug->flags & BLK_ZONE_WPLUG_ERROR)) - return; - - /* - * We are racing with the error handling work which drops the reference - * on the zone write plug after handling the error state. So remove the - * plug from the error list and drop its reference count only if the - * error handling has not yet started, that is, if the zone write plug - * is still listed. - */ - spin_lock_irqsave(&disk->zone_wplugs_lock, flags); - if (!list_empty(&zwplug->link)) { - list_del_init(&zwplug->link); - zwplug->flags &= ~BLK_ZONE_WPLUG_ERROR; - disk_put_zone_wplug(zwplug); - } - spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); -} - -/* - * Set a zone write plug write pointer offset to either 0 (zone reset case) - * or to the zone size (zone finish case). This aborts all plugged BIOs, which - * is fine to do as doing a zone reset or zone finish while writes are in-flight - * is a mistake from the user which will most likely cause all plugged BIOs to - * fail anyway. + * Set a zone write plug write pointer offset to the specified value. + * This aborts all plugged BIOs, which is fine as this function is called for + * a zone reset operation, a zone finish operation or if the zone needs a wp + * update from a report zone after a write error. */ static void disk_zone_wplug_set_wp_offset(struct gendisk *disk, struct blk_zone_wplug *zwplug, unsigned int wp_offset) { - unsigned long flags; - - spin_lock_irqsave(&zwplug->lock, flags); - - /* - * Make sure that a BIO completion or another zone reset or finish - * operation has not already removed the plug from the hash table. - */ - if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED) { - spin_unlock_irqrestore(&zwplug->lock, flags); - return; - } + lockdep_assert_held(&zwplug->lock);
/* Update the zone write pointer and abort all plugged BIOs. */ + zwplug->flags &= ~BLK_ZONE_WPLUG_NEED_WP_UPDATE; zwplug->wp_offset = wp_offset; disk_zone_wplug_abort(zwplug);
/* - * Updating the write pointer offset puts back the zone - * in a good state. So clear the error flag and decrement the - * error count if we were in error state. - */ - disk_zone_wplug_clear_error(disk, zwplug); - - /* * The zone write plug now has no BIO plugged: remove it from the * hash table so that it cannot be seen. The plug will be freed * when the last reference is dropped. */ if (disk_should_remove_zone_wplug(disk, zwplug)) disk_remove_zone_wplug(disk, zwplug); - - spin_unlock_irqrestore(&zwplug->lock, flags); }
static unsigned int blk_zone_wp_offset(struct blk_zone *zone) @@ -765,7 +671,7 @@ static void disk_zone_wplug_sync_wp_offs return;
spin_lock_irqsave(&zwplug->lock, flags); - if (zwplug->flags & BLK_ZONE_WPLUG_ERROR) + if (zwplug->flags & BLK_ZONE_WPLUG_NEED_WP_UPDATE) disk_zone_wplug_set_wp_offset(disk, zwplug, blk_zone_wp_offset(zone)); spin_unlock_irqrestore(&zwplug->lock, flags); @@ -789,6 +695,7 @@ static bool blk_zone_wplug_handle_reset_ struct gendisk *disk = bio->bi_bdev->bd_disk; sector_t sector = bio->bi_iter.bi_sector; struct blk_zone_wplug *zwplug; + unsigned long flags;
/* Conventional zones cannot be reset nor finished. */ if (disk_zone_is_conv(disk, sector)) { @@ -805,7 +712,9 @@ static bool blk_zone_wplug_handle_reset_ */ zwplug = disk_get_zone_wplug(disk, sector); if (zwplug) { + spin_lock_irqsave(&zwplug->lock, flags); disk_zone_wplug_set_wp_offset(disk, zwplug, wp_offset); + spin_unlock_irqrestore(&zwplug->lock, flags); disk_put_zone_wplug(zwplug); }
@@ -816,6 +725,7 @@ static bool blk_zone_wplug_handle_reset_ { struct gendisk *disk = bio->bi_bdev->bd_disk; struct blk_zone_wplug *zwplug; + unsigned long flags; sector_t sector;
/* @@ -827,7 +737,9 @@ static bool blk_zone_wplug_handle_reset_ sector += disk->queue->limits.chunk_sectors) { zwplug = disk_get_zone_wplug(disk, sector); if (zwplug) { + spin_lock_irqsave(&zwplug->lock, flags); disk_zone_wplug_set_wp_offset(disk, zwplug, 0); + spin_unlock_irqrestore(&zwplug->lock, flags); disk_put_zone_wplug(zwplug); } } @@ -1010,12 +922,22 @@ static bool blk_zone_wplug_prepare_bio(s struct gendisk *disk = bio->bi_bdev->bd_disk;
/* + * If we lost track of the zone write pointer due to a write error, + * the user must either execute a report zones, reset the zone or finish + * the to recover a reliable write pointer position. Fail BIOs if the + * user did not do that as we cannot handle emulated zone append + * otherwise. + */ + if (zwplug->flags & BLK_ZONE_WPLUG_NEED_WP_UPDATE) + return false; + + /* * Check that the user is not attempting to write to a full zone. * We know such BIO will fail, and that would potentially overflow our * write pointer offset beyond the end of the zone. */ if (disk_zone_wplug_is_full(disk, zwplug)) - goto err; + return false;
if (bio_op(bio) == REQ_OP_ZONE_APPEND) { /* @@ -1034,24 +956,18 @@ static bool blk_zone_wplug_prepare_bio(s bio_set_flag(bio, BIO_EMULATES_ZONE_APPEND); } else { /* - * Check for non-sequential writes early because we avoid a - * whole lot of error handling trouble if we don't send it off - * to the driver. + * Check for non-sequential writes early as we know that BIOs + * with a start sector not unaligned to the zone write pointer + * will fail. */ if (bio_offset_from_zone_start(bio) != zwplug->wp_offset) - goto err; + return false; }
/* Advance the zone write pointer offset. */ zwplug->wp_offset += bio_sectors(bio);
return true; - -err: - /* We detected an invalid write BIO: schedule error recovery. */ - disk_zone_wplug_set_error(disk, zwplug); - kblockd_schedule_work(&disk->zone_wplugs_work); - return false; }
static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs) @@ -1101,20 +1017,20 @@ static bool blk_zone_wplug_handle_write( bio_set_flag(bio, BIO_ZONE_WRITE_PLUGGING);
/* - * If the zone is already plugged or has a pending error, add the BIO - * to the plug BIO list. Do the same for REQ_NOWAIT BIOs to ensure that - * we will not see a BLK_STS_AGAIN failure if we let the BIO execute. + * If the zone is already plugged, add the BIO to the plug BIO list. + * Do the same for REQ_NOWAIT BIOs to ensure that we will not see a + * BLK_STS_AGAIN failure if we let the BIO execute. * Otherwise, plug and let the BIO execute. */ - if (zwplug->flags & BLK_ZONE_WPLUG_BUSY || (bio->bi_opf & REQ_NOWAIT)) + if ((zwplug->flags & BLK_ZONE_WPLUG_PLUGGED) || + (bio->bi_opf & REQ_NOWAIT)) goto plug;
- /* - * If an error is detected when preparing the BIO, add it to the BIO - * list so that error recovery can deal with it. - */ - if (!blk_zone_wplug_prepare_bio(zwplug, bio)) - goto plug; + if (!blk_zone_wplug_prepare_bio(zwplug, bio)) { + spin_unlock_irqrestore(&zwplug->lock, flags); + bio_io_error(bio); + return true; + }
zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED;
@@ -1214,16 +1130,6 @@ static void disk_zone_wplug_unplug_bio(s
spin_lock_irqsave(&zwplug->lock, flags);
- /* - * If we had an error, schedule error recovery. The recovery work - * will restart submission of plugged BIOs. - */ - if (zwplug->flags & BLK_ZONE_WPLUG_ERROR) { - spin_unlock_irqrestore(&zwplug->lock, flags); - kblockd_schedule_work(&disk->zone_wplugs_work); - return; - } - /* Schedule submission of the next plugged BIO if we have one. */ if (!bio_list_empty(&zwplug->bio_list)) { disk_zone_wplug_schedule_bio_work(disk, zwplug); @@ -1266,12 +1172,13 @@ void blk_zone_write_plug_bio_endio(struc }
/* - * If the BIO failed, mark the plug as having an error to trigger - * recovery. + * If the BIO failed, abort all plugged BIOs and mark the plug as + * needing a write pointer update. */ if (bio->bi_status != BLK_STS_OK) { spin_lock_irqsave(&zwplug->lock, flags); - disk_zone_wplug_set_error(disk, zwplug); + disk_zone_wplug_abort(zwplug); + zwplug->flags |= BLK_ZONE_WPLUG_NEED_WP_UPDATE; spin_unlock_irqrestore(&zwplug->lock, flags); }
@@ -1327,6 +1234,7 @@ static void blk_zone_wplug_bio_work(stru */ spin_lock_irqsave(&zwplug->lock, flags);
+again: bio = bio_list_pop(&zwplug->bio_list); if (!bio) { zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED; @@ -1335,10 +1243,8 @@ static void blk_zone_wplug_bio_work(stru }
if (!blk_zone_wplug_prepare_bio(zwplug, bio)) { - /* Error recovery will decide what to do with the BIO. */ - bio_list_add_head(&zwplug->bio_list, bio); - spin_unlock_irqrestore(&zwplug->lock, flags); - goto put_zwplug; + blk_zone_wplug_bio_io_error(zwplug, bio); + goto again; }
spin_unlock_irqrestore(&zwplug->lock, flags); @@ -1360,97 +1266,6 @@ put_zwplug: disk_put_zone_wplug(zwplug); }
-static int blk_zone_wplug_report_zone_cb(struct blk_zone *zone, - unsigned int idx, void *data) -{ - struct blk_zone *zonep = data; - - *zonep = *zone; - return 0; -} - -static void disk_zone_wplug_handle_error(struct gendisk *disk, - struct blk_zone_wplug *zwplug) -{ - sector_t zone_start_sector = - bdev_zone_sectors(disk->part0) * zwplug->zone_no; - unsigned int noio_flag; - struct blk_zone zone; - unsigned long flags; - int ret; - - /* Get the current zone information from the device. */ - noio_flag = memalloc_noio_save(); - ret = disk->fops->report_zones(disk, zone_start_sector, 1, - blk_zone_wplug_report_zone_cb, &zone); - memalloc_noio_restore(noio_flag); - - spin_lock_irqsave(&zwplug->lock, flags); - - /* - * A zone reset or finish may have cleared the error already. In such - * case, do nothing as the report zones may have seen the "old" write - * pointer value before the reset/finish operation completed. - */ - if (!(zwplug->flags & BLK_ZONE_WPLUG_ERROR)) - goto unlock; - - zwplug->flags &= ~BLK_ZONE_WPLUG_ERROR; - - if (ret != 1) { - /* - * We failed to get the zone information, meaning that something - * is likely really wrong with the device. Abort all remaining - * plugged BIOs as otherwise we could endup waiting forever on - * plugged BIOs to complete if there is a queue freeze on-going. - */ - disk_zone_wplug_abort(zwplug); - goto unplug; - } - - /* Update the zone write pointer offset. */ - zwplug->wp_offset = blk_zone_wp_offset(&zone); - disk_zone_wplug_abort_unaligned(disk, zwplug); - - /* Restart BIO submission if we still have any BIO left. */ - if (!bio_list_empty(&zwplug->bio_list)) { - disk_zone_wplug_schedule_bio_work(disk, zwplug); - goto unlock; - } - -unplug: - zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED; - if (disk_should_remove_zone_wplug(disk, zwplug)) - disk_remove_zone_wplug(disk, zwplug); - -unlock: - spin_unlock_irqrestore(&zwplug->lock, flags); -} - -static void disk_zone_wplugs_work(struct work_struct *work) -{ - struct gendisk *disk = - container_of(work, struct gendisk, zone_wplugs_work); - struct blk_zone_wplug *zwplug; - unsigned long flags; - - spin_lock_irqsave(&disk->zone_wplugs_lock, flags); - - while (!list_empty(&disk->zone_wplugs_err_list)) { - zwplug = list_first_entry(&disk->zone_wplugs_err_list, - struct blk_zone_wplug, link); - list_del_init(&zwplug->link); - spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); - - disk_zone_wplug_handle_error(disk, zwplug); - disk_put_zone_wplug(zwplug); - - spin_lock_irqsave(&disk->zone_wplugs_lock, flags); - } - - spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); -} - static inline unsigned int disk_zone_wplugs_hash_size(struct gendisk *disk) { return 1U << disk->zone_wplugs_hash_bits; @@ -1459,8 +1274,6 @@ static inline unsigned int disk_zone_wpl void disk_init_zone_resources(struct gendisk *disk) { spin_lock_init(&disk->zone_wplugs_lock); - INIT_LIST_HEAD(&disk->zone_wplugs_err_list); - INIT_WORK(&disk->zone_wplugs_work, disk_zone_wplugs_work); }
/* @@ -1559,8 +1372,6 @@ void disk_free_zone_resources(struct gen if (!disk->zone_wplugs_pool) return;
- cancel_work_sync(&disk->zone_wplugs_work); - if (disk->zone_wplugs_wq) { destroy_workqueue(disk->zone_wplugs_wq); disk->zone_wplugs_wq = NULL; @@ -1757,6 +1568,8 @@ static int blk_revalidate_seq_zone(struc if (!disk->zone_wplugs_hash) return 0;
+ disk_zone_wplug_sync_wp_offset(disk, zone); + wp_offset = blk_zone_wp_offset(zone); if (!wp_offset || wp_offset >= zone->capacity) return 0; @@ -1893,6 +1706,7 @@ int blk_revalidate_disk_zones(struct gen memalloc_noio_restore(noio_flag); return ret; } + ret = disk->fops->report_zones(disk, 0, UINT_MAX, blk_revalidate_zone_cb, &args); if (!ret) { --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -200,8 +200,6 @@ struct gendisk { spinlock_t zone_wplugs_lock; struct mempool_s *zone_wplugs_pool; struct hlist_head *zone_wplugs_hash; - struct list_head zone_wplugs_err_list; - struct work_struct zone_wplugs_work; struct workqueue_struct *zone_wplugs_wq; #endif /* CONFIG_BLK_DEV_ZONED */
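A rough sketch, under the model this patch introduces, of the recovery a device user is expected to perform after a failed zone write; the callback and caller below are hypothetical. Re-reading the affected zone both refreshes the caller's view of the write pointer and, through disk_report_zones_cb(), resynchronizes the zone write plug marked with BLK_ZONE_WPLUG_NEED_WP_UPDATE:

#include <linux/blkdev.h>

static int read_wp_cb(struct blk_zone *zone, unsigned int idx, void *data)
{
	*(sector_t *)data = zone->wp;
	return 0;
}

/* Report one zone; the block layer updates its write pointer offset too. */
static int resync_zone_wp(struct block_device *bdev, sector_t zone_start,
			  sector_t *wp)
{
	int ret = blkdev_report_zones(bdev, zone_start, 1, read_wp_cb, wp);

	if (ret < 0)
		return ret;
	return ret == 1 ? 0 : -EIO;
}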
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alan Borzeszkowski alan.borzeszkowski@linux.intel.com
commit 0bb18e34abdde7bf58fca8542e2dcf621924ea19 upstream.
Interrupt status (GPI_IS) register is cleared by writing 1 to it, not 0.
Cc: stable@vger.kernel.org
Signed-off-by: Alan Borzeszkowski alan.borzeszkowski@linux.intel.com
Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com
Acked-by: Andy Shevchenko andy@kernel.org
Link: https://lore.kernel.org/r/20241204070415.1034449-8-mika.westerberg@linux.int...
Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpio/gpio-graniterapids.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpio/gpio-graniterapids.c +++ b/drivers/gpio/gpio-graniterapids.c @@ -166,7 +166,7 @@ static void gnr_gpio_irq_ack(struct irq_ guard(raw_spinlock_irqsave)(&priv->lock);
reg = readl(addr); - reg &= ~BIT(bit_idx); + reg |= BIT(bit_idx); writel(reg, addr); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shakeel Butt shakeel.butt@linux.dev
commit b7ffecbe198e2dfc44abf92ceb90f46150f7527a upstream.
Large kmalloc allocates directly from the page allocator and then uses lruvec_stat_mod_folio() to increment the unreclaimable slab stats for global and memcg. However, when post memcg charging of slab objects was added in commit 9028cdeb38e1 ("memcg: add charging of already allocated slab objects"), it failed to correctly handle the unreclaimable slab stats for memcg.
One user-visible effect of that bug is that the node level unreclaimable slab stat works correctly, but the memcg level stat can underflow, as the kernel correctly handles the free path while the charge path fails to increment the memcg level unreclaimable slab stat. Fix this by handling it correctly in the post-charge code path.
Fixes: 9028cdeb38e1 ("memcg: add charging of already allocated slab objects")
Signed-off-by: Shakeel Butt shakeel.butt@linux.dev
Cc: stable@vger.kernel.org
Signed-off-by: Vlastimil Babka vbabka@suse.cz
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 mm/slub.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)
--- a/mm/slub.c +++ b/mm/slub.c @@ -2199,9 +2199,24 @@ bool memcg_slab_post_charge(void *p, gfp
folio = virt_to_folio(p); if (!folio_test_slab(folio)) { - return folio_memcg_kmem(folio) || - (__memcg_kmem_charge_page(folio_page(folio, 0), flags, - folio_order(folio)) == 0); + int size; + + if (folio_memcg_kmem(folio)) + return true; + + if (__memcg_kmem_charge_page(folio_page(folio, 0), flags, + folio_order(folio))) + return false; + + /* + * This folio has already been accounted in the global stats but + * not in the memcg stats. So, subtract from the global and use + * the interface which adds to both global and memcg stats. + */ + size = folio_size(folio); + node_stat_mod_folio(folio, NR_SLAB_UNRECLAIMABLE_B, -size); + lruvec_stat_mod_folio(folio, NR_SLAB_UNRECLAIMABLE_B, size); + return true; }
slab = folio_slab(folio);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Loehle christian.loehle@arm.com
commit 0bb394067a792e7119abc9e0b7158ef19381f456 upstream.
The early bail out that caused an out-of-bounds write was removed with commit 5c018e378f91 ("spi: spi-rockchip: Fix out of bounds array access"). Unfortunately that caused the PM runtime count to become unbalanced and to underflow on the first call. To fix that, reintroduce a no-op check by reading the register directly.
Cc: stable@vger.kernel.org
Fixes: 5c018e378f91 ("spi: spi-rockchip: Fix out of bounds array access")
Signed-off-by: Christian Loehle christian.loehle@arm.com
Link: https://patch.msgid.link/1f2b3af4-2b7a-4ac8-ab95-c80120ebf44c@arm.com
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/spi/spi-rockchip.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
--- a/drivers/spi/spi-rockchip.c +++ b/drivers/spi/spi-rockchip.c @@ -241,6 +241,20 @@ static void rockchip_spi_set_cs(struct s struct spi_controller *ctlr = spi->controller; struct rockchip_spi *rs = spi_controller_get_devdata(ctlr); bool cs_asserted = spi->mode & SPI_CS_HIGH ? enable : !enable; + bool cs_actual; + + /* + * SPI subsystem tries to avoid no-op calls that would break the PM + * refcount below. It can't however for the first time it is used. + * To detect this case we read it here and bail out early for no-ops. + */ + if (spi_get_csgpiod(spi, 0)) + cs_actual = !!(readl_relaxed(rs->regs + ROCKCHIP_SPI_SER) & 1); + else + cs_actual = !!(readl_relaxed(rs->regs + ROCKCHIP_SPI_SER) & + BIT(spi_get_chipselect(spi, 0))); + if (unlikely(cs_actual == cs_asserted)) + return;
if (cs_asserted) { /* Keep things powered as long as CS is asserted */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Haoyu Li lihaoyu499@gmail.com
commit 3396995f9fb6bcbe0004a68118a22f98bab6e2b9 upstream.
With the new __counted_by annotation in ljca_gpio_packet, the "num" struct member must be set before accessing the "item" array. Failing to do so will trigger a runtime warning when CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE are enabled.
Fixes: 1034cc423f1b ("gpio: update Intel LJCA USB GPIO driver")
Cc: stable@vger.kernel.org
Signed-off-by: Haoyu Li lihaoyu499@gmail.com
Link: https://lore.kernel.org/stable/20241203141451.342316-1-lihaoyu499%40gmail.co...
Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpio/gpio-ljca.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpio/gpio-ljca.c b/drivers/gpio/gpio-ljca.c index d67b912d884d..c6c31e6146c7 100644 --- a/drivers/gpio/gpio-ljca.c +++ b/drivers/gpio/gpio-ljca.c @@ -82,9 +82,9 @@ static int ljca_gpio_config(struct ljca_gpio_dev *ljca_gpio, u8 gpio_id, int ret;
mutex_lock(&ljca_gpio->trans_lock); + packet->num = 1; packet->item[0].index = gpio_id; packet->item[0].value = config | ljca_gpio->connect_mode[gpio_id]; - packet->num = 1;
ret = ljca_transfer(ljca_gpio->ljca, LJCA_GPIO_CONFIG, (u8 *)packet, struct_size(packet, item, packet->num), NULL, 0);
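To illustrate the ordering requirement, here is a minimal standalone C sketch of the __counted_by pattern; the struct and helper names are hypothetical and not the real ljca_gpio_packet layout:

#include <stdlib.h>

#ifndef __counted_by
#define __counted_by(member)	/* no-op on toolchains without the attribute */
#endif

/* Hypothetical packet layout (not the real ljca_gpio_packet). With the
 * attribute active, fortified/UBSAN bounds checks treat "num" as the
 * runtime bound of "item", so "num" must be written before "item". */
struct demo_packet {
	unsigned char num;
	struct {
		unsigned char index;
		unsigned char value;
	} item[] __counted_by(num);
};

static struct demo_packet *demo_packet_fill(unsigned char index, unsigned char value)
{
	struct demo_packet *p = malloc(sizeof(*p) + sizeof(p->item[0]));

	if (!p)
		return NULL;
	p->num = 1;			/* set the bound first ... */
	p->item[0].index = index;	/* ... then item[0] is considered in bounds */
	p->item[0].value = value;
	return p;
}

int main(void)
{
	free(demo_packet_fill(0, 1));
	return 0;
}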
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jaakko Salo jaakkos@gmail.com
commit 82fdcf9b518b205da040046fbe7747fb3fd18657 upstream.
Use implicit feedback from the capture endpoint to fix popping sounds during playback.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219567 Signed-off-by: Jaakko Salo jaakkos@gmail.com Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20241206164448.8136-1-jaakkos@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/quirks.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/usb/quirks.c +++ b/sound/usb/quirks.c @@ -2179,6 +2179,8 @@ static const struct usb_audio_quirk_flag QUIRK_FLAG_CTL_MSG_DELAY_1M | QUIRK_FLAG_MIC_RES_384), DEVICE_FLG(0x046d, 0x09a4, /* Logitech QuickCam E 3500 */ QUIRK_FLAG_CTL_MSG_DELAY_1M | QUIRK_FLAG_IGNORE_CTL_ERROR), + DEVICE_FLG(0x0499, 0x1506, /* Yamaha THR5 */ + QUIRK_FLAG_GENERIC_IMPLICIT_FB), DEVICE_FLG(0x0499, 0x1509, /* Steinberg UR22 */ QUIRK_FLAG_GENERIC_IMPLICIT_FB), DEVICE_FLG(0x0499, 0x3108, /* Yamaha YIT-W12TX */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hridesh MG hridesh699@gmail.com
commit 5a69e3d0a1b0f07e58c353560cfcb1ea20a6f040 upstream.
Add a PCI quirk to enable microphone input on the headphone jack on the Acer Nitro 5 AN515-58 laptop.
Signed-off-by: Hridesh MG hridesh699@gmail.com Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20241205171843.7787-1-hridesh699@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10127,6 +10127,7 @@ static const struct hda_quirk alc269_fix SND_PCI_QUIRK(0x1025, 0x1430, "Acer TravelMate B311R-31", ALC256_FIXUP_ACER_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1025, 0x1466, "Acer Aspire A515-56", ALC255_FIXUP_ACER_HEADPHONE_AND_MIC), SND_PCI_QUIRK(0x1025, 0x1534, "Acer Predator PH315-54", ALC255_FIXUP_ACER_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1025, 0x159c, "Acer Nitro 5 AN515-58", ALC2XX_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1025, 0x169a, "Acer Swift SFG16", ALC256_FIXUP_ACER_SFG16_MICMUTE_LED), SND_PCI_QUIRK(0x1028, 0x0470, "Dell M101z", ALC269_FIXUP_DELL_M101Z), SND_PCI_QUIRK(0x1028, 0x053c, "Dell Latitude E5430", ALC292_FIXUP_DELL_E7X),
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Ghiti alexghiti@rivosinc.com
commit b3431a8bb336cece8adc452437befa7d4534b2fd upstream.
flush_tlb_kernel_range() may use IPIs to flush the TLBs of all the cores, which triggers the following warning when the irqs are disabled:
[ 3.455330] WARNING: CPU: 1 PID: 0 at kernel/smp.c:815 smp_call_function_many_cond+0x452/0x520 [ 3.456647] Modules linked in: [ 3.457218] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.12.0-rc7-00010-g91d3de7240b8 #1 [ 3.457416] Hardware name: QEMU QEMU Virtual Machine, BIOS [ 3.457633] epc : smp_call_function_many_cond+0x452/0x520 [ 3.457736] ra : on_each_cpu_cond_mask+0x1e/0x30 [ 3.457786] epc : ffffffff800b669a ra : ffffffff800b67c2 sp : ff2000000000bb50 [ 3.457824] gp : ffffffff815212b8 tp : ff6000008014f080 t0 : 000000000000003f [ 3.457859] t1 : ffffffff815221e0 t2 : 000000000000000f s0 : ff2000000000bc10 [ 3.457920] s1 : 0000000000000040 a0 : ffffffff815221e0 a1 : 0000000000000001 [ 3.457953] a2 : 0000000000010000 a3 : 0000000000000003 a4 : 0000000000000000 [ 3.458006] a5 : 0000000000000000 a6 : ffffffffffffffff a7 : 0000000000000000 [ 3.458042] s2 : ffffffff815223be s3 : 00fffffffffff000 s4 : ff600001ffe38fc0 [ 3.458076] s5 : ff600001ff950d00 s6 : 0000000200000120 s7 : 0000000000000001 [ 3.458109] s8 : 0000000000000001 s9 : ff60000080841ef0 s10: 0000000000000001 [ 3.458141] s11: ffffffff81524812 t3 : 0000000000000001 t4 : ff60000080092bc0 [ 3.458172] t5 : 0000000000000000 t6 : ff200000000236d0 [ 3.458203] status: 0000000200000100 badaddr: ffffffff800b669a cause: 0000000000000003 [ 3.458373] [<ffffffff800b669a>] smp_call_function_many_cond+0x452/0x520 [ 3.458593] [<ffffffff800b67c2>] on_each_cpu_cond_mask+0x1e/0x30 [ 3.458625] [<ffffffff8000e4ca>] __flush_tlb_range+0x118/0x1ca [ 3.458656] [<ffffffff8000e6b2>] flush_tlb_kernel_range+0x1e/0x26 [ 3.458683] [<ffffffff801ea56a>] kfence_protect+0xc0/0xce [ 3.458717] [<ffffffff801e9456>] kfence_guarded_free+0xc6/0x1c0 [ 3.458742] [<ffffffff801e9d6c>] __kfence_free+0x62/0xc6 [ 3.458764] [<ffffffff801c57d8>] kfree+0x106/0x32c [ 3.458786] [<ffffffff80588cf2>] detach_buf_split+0x188/0x1a8 [ 3.458816] [<ffffffff8058708c>] virtqueue_get_buf_ctx+0xb6/0x1f6 [ 3.458839] [<ffffffff805871da>] virtqueue_get_buf+0xe/0x16 [ 3.458880] [<ffffffff80613d6a>] virtblk_done+0x5c/0xe2 [ 3.458908] [<ffffffff8058766e>] vring_interrupt+0x6a/0x74 [ 3.458930] [<ffffffff800747d8>] __handle_irq_event_percpu+0x7c/0xe2 [ 3.458956] [<ffffffff800748f0>] handle_irq_event+0x3c/0x86 [ 3.458978] [<ffffffff800786cc>] handle_simple_irq+0x9e/0xbe [ 3.459004] [<ffffffff80073934>] generic_handle_domain_irq+0x1c/0x2a [ 3.459027] [<ffffffff804bf87c>] imsic_handle_irq+0xba/0x120 [ 3.459056] [<ffffffff80073934>] generic_handle_domain_irq+0x1c/0x2a [ 3.459080] [<ffffffff804bdb76>] riscv_intc_aia_irq+0x24/0x34 [ 3.459103] [<ffffffff809d0452>] handle_riscv_irq+0x2e/0x4c [ 3.459133] [<ffffffff809d923e>] call_on_irq_stack+0x32/0x40
So only flush the local TLB and let the lazy kfence page fault handling deal with the faults which could happen when a core has an old protected pte version cached in its TLB. That leads to potential inaccuracies which can be tolerated when using kfence.
Fixes: 47513f243b45 ("riscv: Enable KFENCE for riscv64") Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241209074125.52322-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/include/asm/kfence.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/riscv/include/asm/kfence.h +++ b/arch/riscv/include/asm/kfence.h @@ -22,7 +22,9 @@ static inline bool kfence_protect_page(u else set_pte(pte, __pte(pte_val(ptep_get(pte)) | _PAGE_PRESENT));
- flush_tlb_kernel_range(addr, addr + PAGE_SIZE); + preempt_disable(); + local_flush_tlb_kernel_range(addr, addr + PAGE_SIZE); + preempt_enable();
return true; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chenghai Huang huangchenghai2@huawei.com
commit cd26cd65476711e2c69e0a049c0eeef4b743f5ac upstream.
The offset computed as (id * size) is wrong for sqc and cqc: since qm->sqc/qm->cqc are struct pointers, adding the index already advances by sizeof(struct qm_Xqc) per element, so the explicit multiplication scaled the offset twice.
Fixes: 15f112f9cef5 ("crypto: hisilicon/debugfs - mask the unnecessary info from the dump") Cc: stable@vger.kernel.org Signed-off-by: Chenghai Huang huangchenghai2@huawei.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/crypto/hisilicon/debugfs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/crypto/hisilicon/debugfs.c b/drivers/crypto/hisilicon/debugfs.c index 1b9b7bccdeff..45e130b901eb 100644 --- a/drivers/crypto/hisilicon/debugfs.c +++ b/drivers/crypto/hisilicon/debugfs.c @@ -192,7 +192,7 @@ static int qm_sqc_dump(struct hisi_qm *qm, char *s, char *name)
down_read(&qm->qps_lock); if (qm->sqc) { - memcpy(&sqc, qm->sqc + qp_id * sizeof(struct qm_sqc), sizeof(struct qm_sqc)); + memcpy(&sqc, qm->sqc + qp_id, sizeof(struct qm_sqc)); sqc.base_h = cpu_to_le32(QM_XQC_ADDR_MASK); sqc.base_l = cpu_to_le32(QM_XQC_ADDR_MASK); dump_show(qm, &sqc, sizeof(struct qm_sqc), "SOFT SQC"); @@ -229,7 +229,7 @@ static int qm_cqc_dump(struct hisi_qm *qm, char *s, char *name)
down_read(&qm->qps_lock); if (qm->cqc) { - memcpy(&cqc, qm->cqc + qp_id * sizeof(struct qm_cqc), sizeof(struct qm_cqc)); + memcpy(&cqc, qm->cqc + qp_id, sizeof(struct qm_cqc)); cqc.base_h = cpu_to_le32(QM_XQC_ADDR_MASK); cqc.base_l = cpu_to_le32(QM_XQC_ADDR_MASK); dump_show(qm, &cqc, sizeof(struct qm_cqc), "SOFT CQC");
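As a small standalone illustration of the arithmetic being fixed (with a made-up struct standing in for qm_sqc/qm_cqc), adding an index to a struct pointer already produces an element-sized stride, so the extra sizeof() multiplication scales the offset twice:

#include <assert.h>
#include <stddef.h>

struct xqc {				/* stand-in for struct qm_sqc / qm_cqc */
	unsigned int words[8];
};

int main(void)
{
	static struct xqc table[4 * sizeof(struct xqc)];	/* large enough for both variants */
	unsigned int qp_id = 2;

	/* Correct: pointer arithmetic scales by the element size itself. */
	const struct xqc *ok = table + qp_id;

	/* Buggy variant from before the fix: the offset is scaled twice. */
	const struct xqc *bad = table + qp_id * sizeof(struct xqc);

	assert((const char *)ok == (const char *)table + qp_id * sizeof(struct xqc));
	assert((const char *)bad == (const char *)table + qp_id * sizeof(struct xqc) * sizeof(struct xqc));
	return 0;
}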
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miguel Ojeda ojeda@kernel.org
commit 4011b351b1b5a953aaa7c6b3915f908b3cc1be96 upstream.
Clippy in the upcoming Rust 1.83.0 spots a spurious empty line since the `clippy::empty_line_after_doc_comments` warning is now enabled by default given it is part of the `suspicious` group [1]:
error: empty line after doc comment --> drivers/gpu/drm/drm_panic_qr.rs:931:1 | 931 | / /// They must remain valid for the duration of the function call. 932 | | | |_ 933 | #[no_mangle] 934 | / pub unsafe extern "C" fn drm_panic_qr_generate( 935 | | url: *const i8, 936 | | data: *mut u8, 937 | | data_len: usize, ... | 940 | | tmp_size: usize, 941 | | ) -> u8 { | |_______- the comment documents this function | = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#empty_line_after_d... = note: `-D clippy::empty-line-after-doc-comments` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::empty_line_after_doc_comments)]` = help: if the empty line is unintentional remove it
Thus remove the empty line.
Cc: stable@vger.kernel.org Fixes: cb5164ac43d0 ("drm/panic: Add a QR code panic screen") Link: https://github.com/rust-lang/rust-clippy/pull/13091 [1] Reviewed-by: Jocelyn Falempe jfalempe@redhat.com Link: https://lore.kernel.org/r/20241125233332.697497-1-ojeda@kernel.org Signed-off-by: Miguel Ojeda ojeda@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_panic_qr.rs | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_panic_qr.rs b/drivers/gpu/drm/drm_panic_qr.rs index 09500cddc009..ef2d490965ba 100644 --- a/drivers/gpu/drm/drm_panic_qr.rs +++ b/drivers/gpu/drm/drm_panic_qr.rs @@ -929,7 +929,6 @@ impl QrImage<'_> { /// * `tmp` must be valid for reading and writing for `tmp_size` bytes. /// /// They must remain valid for the duration of the function call. - #[no_mangle] pub unsafe extern "C" fn drm_panic_qr_generate( url: *const i8,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
commit 0d2ada05227881f3d0722ca2364e3f7a860a301f upstream.
If the current USB request was aborted, the spi thread would not respond to any further requests. This is because the "curr_urb" pointer would not become NULL, so no further requests would be taken off the queue. The solution here is to set the "urb_done" flag, as this will cause the correct handling of the URB. Also clear interrupts that should only be expected if an URB is in progress.
Fixes: 2d53139f3162 ("Add support for using a MAX3421E chip as a host driver.") Cc: stable stable@kernel.org Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz Link: https://lore.kernel.org/r/20241124221430.1106080-1-mark.tomlinson@alliedtele... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/max3421-hcd.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-)
--- a/drivers/usb/host/max3421-hcd.c +++ b/drivers/usb/host/max3421-hcd.c @@ -779,11 +779,17 @@ max3421_check_unlink(struct usb_hcd *hcd retval = 1; dev_dbg(&spi->dev, "%s: URB %p unlinked=%d", __func__, urb, urb->unlinked); - usb_hcd_unlink_urb_from_ep(hcd, urb); - spin_unlock_irqrestore(&max3421_hcd->lock, - flags); - usb_hcd_giveback_urb(hcd, urb, 0); - spin_lock_irqsave(&max3421_hcd->lock, flags); + if (urb == max3421_hcd->curr_urb) { + max3421_hcd->urb_done = 1; + max3421_hcd->hien &= ~(BIT(MAX3421_HI_HXFRDN_BIT) | + BIT(MAX3421_HI_RCVDAV_BIT)); + } else { + usb_hcd_unlink_urb_from_ep(hcd, urb); + spin_unlock_irqrestore(&max3421_hcd->lock, + flags); + usb_hcd_giveback_urb(hcd, urb, 0); + spin_lock_irqsave(&max3421_hcd->lock, flags); + } } } }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal dlemoal@kernel.org
commit 5eb3317aa5a2ffe4574ab1a12cf9bc9447ca26c0 upstream.
There is currently no issuer of REQ_OP_ZONE_RESET and REQ_OP_ZONE_FINISH operations that sets REQ_NOWAIT. However, as we cannot handle this flag correctly due to the potential request allocation failure that may happen in blk_mq_submit_bio() after blk_zone_plug_bio() has handled the zone write plug write pointer updates for the targeted zones, modify blk_zone_wplug_handle_reset_or_finish() to warn if this flag is set and ignore it.
Fixes: dd291d77cc90 ("block: Introduce zone write plugging") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal dlemoal@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Martin K. Petersen martin.petersen@oracle.com Link: https://lore.kernel.org/r/20241209122357.47838-3-dlemoal@kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- block/blk-zoned.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -704,6 +704,15 @@ static bool blk_zone_wplug_handle_reset_ }
/* + * No-wait reset or finish BIOs do not make much sense as the callers + * issue these as blocking operations in most cases. To avoid issues + * the BIO execution potentially failing with BLK_STS_AGAIN, warn about + * REQ_NOWAIT being set and ignore that flag. + */ + if (WARN_ON_ONCE(bio->bi_opf & REQ_NOWAIT)) + bio->bi_opf &= ~REQ_NOWAIT; + + /* * If we have a zone write plug, set its write pointer offset to 0 * (reset case) or to the zone size (finish case). This will abort all * BIOs plugged for the target zone. It is fine as resetting or
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alan Borzeszkowski alan.borzeszkowski@linux.intel.com
commit eb9640fd1ce666610b77f5997596e9570a36378f upstream.
Move setting irq_chip.name from the probe() function to the initialization of the "irq_chip" struct in order to fix a vGPIO driver crash during bootup.

The crash was caused by modifying the irq_chip.name field even though the irq_chip struct was initialized as const.

This behavior is a consequence of the suboptimal implementation of gpio_irq_chip_set_chip(), which should be changed to avoid casting away the const qualifier.
Crash log: BUG: unable to handle page fault for address: ffffffffc0ba81c0 /#PF: supervisor write access in kernel mode /#PF: error_code(0x0003) - permissions violation CPU: 33 UID: 0 PID: 1075 Comm: systemd-udevd Not tainted 6.12.0-rc6-00077-g2e1b3cc9d7f7 #1 Hardware name: Intel Corporation Kaseyville RP/Kaseyville RP, BIOS KVLDCRB1.PGS.0026.D73.2410081258 10/08/2024 RIP: 0010:gnr_gpio_probe+0x171/0x220 [gpio_graniterapids]
Cc: stable@vger.kernel.org Signed-off-by: Alan Borzeszkowski alan.borzeszkowski@linux.intel.com Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com Acked-by: Andy Shevchenko andy@kernel.org Link: https://lore.kernel.org/r/20241204070415.1034449-2-mika.westerberg@linux.int... Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpio-graniterapids.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpio/gpio-graniterapids.c +++ b/drivers/gpio/gpio-graniterapids.c @@ -234,6 +234,7 @@ static int gnr_gpio_irq_set_type(struct }
static const struct irq_chip gnr_gpio_irq_chip = { + .name = "gpio-graniterapids", .irq_ack = gnr_gpio_irq_ack, .irq_mask = gnr_gpio_irq_mask, .irq_unmask = gnr_gpio_irq_unmask, @@ -324,7 +325,6 @@ static int gnr_gpio_probe(struct platfor
girq = &priv->gc.irq; gpio_irq_chip_set_chip(girq, &gnr_gpio_irq_chip); - girq->chip->name = dev_name(dev); girq->parent_handler = NULL; girq->num_parents = 0; girq->parents = NULL;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alan Borzeszkowski alan.borzeszkowski@linux.intel.com
commit 7382d2f0e802077c36495e325da8d253a15fb441 upstream.
The base address of the vGPIO MMIO registers is provided directly by the BIOS instead of using offsets. Update the address assignment in the driver to reflect this change.
Cc: stable@vger.kernel.org Signed-off-by: Alan Borzeszkowski alan.borzeszkowski@linux.intel.com Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com Acked-by: Andy Shevchenko andy@kernel.org Link: https://lore.kernel.org/r/20241204070415.1034449-3-mika.westerberg@linux.int... Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpio-graniterapids.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
--- a/drivers/gpio/gpio-graniterapids.c +++ b/drivers/gpio/gpio-graniterapids.c @@ -32,7 +32,7 @@ #define GNR_PINS_PER_REG 32 #define GNR_NUM_REGS DIV_ROUND_UP(GNR_NUM_PINS, GNR_PINS_PER_REG)
-#define GNR_CFG_BAR 0x00 +#define GNR_CFG_PADBAR 0x00 #define GNR_CFG_LOCK_OFFSET 0x04 #define GNR_GPI_STATUS_OFFSET 0x20 #define GNR_GPI_ENABLE_OFFSET 0x24 @@ -50,6 +50,7 @@ * struct gnr_gpio - Intel Granite Rapids-D vGPIO driver state * @gc: GPIO controller interface * @reg_base: base address of the GPIO registers + * @pad_base: base address of the vGPIO pad configuration registers * @ro_bitmap: bitmap of read-only pins * @lock: guard the registers * @pad_backup: backup of the register state for suspend @@ -57,6 +58,7 @@ struct gnr_gpio { struct gpio_chip gc; void __iomem *reg_base; + void __iomem *pad_base; DECLARE_BITMAP(ro_bitmap, GNR_NUM_PINS); raw_spinlock_t lock; u32 pad_backup[]; @@ -65,7 +67,7 @@ struct gnr_gpio { static void __iomem *gnr_gpio_get_padcfg_addr(const struct gnr_gpio *priv, unsigned int gpio) { - return priv->reg_base + gpio * sizeof(u32); + return priv->pad_base + gpio * sizeof(u32); }
static int gnr_gpio_configure_line(struct gpio_chip *gc, unsigned int gpio, @@ -292,6 +294,7 @@ static int gnr_gpio_probe(struct platfor struct gnr_gpio *priv; void __iomem *regs; int irq, ret; + u32 offset;
priv = devm_kzalloc(dev, struct_size(priv, pad_backup, num_backup_pins), GFP_KERNEL); if (!priv) @@ -303,6 +306,10 @@ static int gnr_gpio_probe(struct platfor if (IS_ERR(regs)) return PTR_ERR(regs);
+ priv->reg_base = regs; + offset = readl(priv->reg_base + GNR_CFG_PADBAR); + priv->pad_base = priv->reg_base + offset; + irq = platform_get_irq(pdev, 0); if (irq < 0) return irq; @@ -312,8 +319,6 @@ static int gnr_gpio_probe(struct platfor if (ret) return dev_err_probe(dev, ret, "failed to request interrupt\n");
- priv->reg_base = regs + readl(regs + GNR_CFG_BAR); - gnr_gpio_init_pin_ro_bits(dev, priv->reg_base + GNR_CFG_LOCK_OFFSET, priv->ro_bitmap);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shankar Bandal shankar.bandal@intel.com
commit 0fe329b55231cca489f9bed1db0e778d077fdaf9 upstream.
Update the GPI Interrupt Status register offset to the correct value.
Cc: stable@vger.kernel.org Signed-off-by: Shankar Bandal shankar.bandal@intel.com Signed-off-by: Alan Borzeszkowski alan.borzeszkowski@linux.intel.com Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com Acked-by: Andy Shevchenko andy@kernel.org Link: https://lore.kernel.org/r/20241204070415.1034449-4-mika.westerberg@linux.int... Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpio-graniterapids.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpio/gpio-graniterapids.c b/drivers/gpio/gpio-graniterapids.c index d2b542b536b6..be907784ccdb 100644 --- a/drivers/gpio/gpio-graniterapids.c +++ b/drivers/gpio/gpio-graniterapids.c @@ -34,7 +34,7 @@
#define GNR_CFG_PADBAR 0x00 #define GNR_CFG_LOCK_OFFSET 0x04 -#define GNR_GPI_STATUS_OFFSET 0x20 +#define GNR_GPI_STATUS_OFFSET 0x14 #define GNR_GPI_ENABLE_OFFSET 0x24
#define GNR_CFG_DW_RX_MASK GENMASK(25, 22)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shankar Bandal shankar.bandal@intel.com
commit 15636b00a055474033426b94b6372728b2163a1e upstream.
Correct RX Level/Edge Configuration register (RXEVCFG) bitmask.
Cc: stable@vger.kernel.org Signed-off-by: Shankar Bandal shankar.bandal@intel.com Signed-off-by: Alan Borzeszkowski alan.borzeszkowski@linux.intel.com Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com Acked-by: Andy Shevchenko andy@kernel.org Link: https://lore.kernel.org/r/20241204070415.1034449-5-mika.westerberg@linux.int... Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpio-graniterapids.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpio/gpio-graniterapids.c b/drivers/gpio/gpio-graniterapids.c index be907784ccdb..ec2931a65723 100644 --- a/drivers/gpio/gpio-graniterapids.c +++ b/drivers/gpio/gpio-graniterapids.c @@ -37,7 +37,7 @@ #define GNR_GPI_STATUS_OFFSET 0x14 #define GNR_GPI_ENABLE_OFFSET 0x24
-#define GNR_CFG_DW_RX_MASK GENMASK(25, 22) +#define GNR_CFG_DW_RX_MASK GENMASK(23, 22) #define GNR_CFG_DW_RX_DISABLE FIELD_PREP(GNR_CFG_DW_RX_MASK, 2) #define GNR_CFG_DW_RX_EDGE FIELD_PREP(GNR_CFG_DW_RX_MASK, 1) #define GNR_CFG_DW_RX_LEVEL FIELD_PREP(GNR_CFG_DW_RX_MASK, 0)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alan Borzeszkowski alan.borzeszkowski@linux.intel.com
commit 0588504d28dedde6789aec17a6ece6fa8e477725 upstream.
Add a check of the HOSTSW_MODE bit to determine whether the GPIO pad can be used by the driver.
Cc: stable@vger.kernel.org Signed-off-by: Alan Borzeszkowski alan.borzeszkowski@linux.intel.com Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com Acked-by: Andy Shevchenko andy@kernel.org Link: https://lore.kernel.org/r/20241204070415.1034449-6-mika.westerberg@linux.int... Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpio-graniterapids.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
--- a/drivers/gpio/gpio-graniterapids.c +++ b/drivers/gpio/gpio-graniterapids.c @@ -37,6 +37,7 @@ #define GNR_GPI_STATUS_OFFSET 0x14 #define GNR_GPI_ENABLE_OFFSET 0x24
+#define GNR_CFG_DW_HOSTSW_MODE BIT(27) #define GNR_CFG_DW_RX_MASK GENMASK(23, 22) #define GNR_CFG_DW_RX_DISABLE FIELD_PREP(GNR_CFG_DW_RX_MASK, 2) #define GNR_CFG_DW_RX_EDGE FIELD_PREP(GNR_CFG_DW_RX_MASK, 1) @@ -90,6 +91,20 @@ static int gnr_gpio_configure_line(struc return 0; }
+static int gnr_gpio_request(struct gpio_chip *gc, unsigned int gpio) +{ + struct gnr_gpio *priv = gpiochip_get_data(gc); + u32 dw; + + dw = readl(gnr_gpio_get_padcfg_addr(priv, gpio)); + if (!(dw & GNR_CFG_DW_HOSTSW_MODE)) { + dev_warn(gc->parent, "GPIO %u is not owned by host", gpio); + return -EBUSY; + } + + return 0; +} + static int gnr_gpio_get(struct gpio_chip *gc, unsigned int gpio) { const struct gnr_gpio *priv = gpiochip_get_data(gc); @@ -141,6 +156,7 @@ static int gnr_gpio_direction_output(str
static const struct gpio_chip gnr_gpio_chip = { .owner = THIS_MODULE, + .request = gnr_gpio_request, .get = gnr_gpio_get, .set = gnr_gpio_set, .get_direction = gnr_gpio_get_direction,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alan Borzeszkowski alan.borzeszkowski@linux.intel.com
commit c0ec4890d6454980c53c3cc164140115c4a671f2 upstream.
A GPIO line can only be used as an interrupt if its INTSEL register has been programmed by the BIOS.
Cc: stable@vger.kernel.org Signed-off-by: Alan Borzeszkowski alan.borzeszkowski@linux.intel.com Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com Acked-by: Andy Shevchenko andy@kernel.org Link: https://lore.kernel.org/r/20241204070415.1034449-7-mika.westerberg@linux.int... Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpio-graniterapids.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/gpio/gpio-graniterapids.c b/drivers/gpio/gpio-graniterapids.c index b12abe77299c..3a972d460fe2 100644 --- a/drivers/gpio/gpio-graniterapids.c +++ b/drivers/gpio/gpio-graniterapids.c @@ -39,6 +39,7 @@
#define GNR_CFG_DW_HOSTSW_MODE BIT(27) #define GNR_CFG_DW_RX_MASK GENMASK(23, 22) +#define GNR_CFG_DW_INTSEL_MASK GENMASK(21, 14) #define GNR_CFG_DW_RX_DISABLE FIELD_PREP(GNR_CFG_DW_RX_MASK, 2) #define GNR_CFG_DW_RX_EDGE FIELD_PREP(GNR_CFG_DW_RX_MASK, 1) #define GNR_CFG_DW_RX_LEVEL FIELD_PREP(GNR_CFG_DW_RX_MASK, 0) @@ -227,10 +228,18 @@ static void gnr_gpio_irq_unmask(struct irq_data *d) static int gnr_gpio_irq_set_type(struct irq_data *d, unsigned int type) { struct gpio_chip *gc = irq_data_get_irq_chip_data(d); - irq_hw_number_t pin = irqd_to_hwirq(d); - u32 mask = GNR_CFG_DW_RX_MASK; + struct gnr_gpio *priv = gpiochip_get_data(gc); + irq_hw_number_t hwirq = irqd_to_hwirq(d); + u32 reg; u32 set;
+ /* Allow interrupts only if Interrupt Select field is non-zero */ + reg = readl(gnr_gpio_get_padcfg_addr(priv, hwirq)); + if (!(reg & GNR_CFG_DW_INTSEL_MASK)) { + dev_dbg(gc->parent, "GPIO %lu cannot be used as IRQ", hwirq); + return -EPERM; + } + /* Falling edge and level low triggers not supported by the GPIO controller */ switch (type) { case IRQ_TYPE_NONE: @@ -248,7 +257,7 @@ static int gnr_gpio_irq_set_type(struct irq_data *d, unsigned int type) return -EINVAL; }
- return gnr_gpio_configure_line(gc, pin, mask, set); + return gnr_gpio_configure_line(gc, hwirq, GNR_CFG_DW_RX_MASK, set); }
static const struct irq_chip gnr_gpio_irq_chip = {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xu Yang xu.yang_2@nxp.com
commit d2ec94fbc431cc77ed53d4480bdc856669c2b5aa upstream.
Before commit 53a2d95df836 ("usb: core: add phy notify connect and disconnect"), phy initialization was skipped even when the shared hcd did not set the skip_phy_initialization flag. The situation changed after that commit: hcd.c now initializes the phy when adding the shared hcd. This behavior is unexpected for some platforms that handle phy initialization themselves. To avoid the issue, only check the skip_phy_initialization flag of the primary hcd, since the shared hcd normally follows the primary hcd's setting.
Fixes: 53a2d95df836 ("usb: core: add phy notify connect and disconnect") Cc: stable@vger.kernel.org Signed-off-by: Xu Yang xu.yang_2@nxp.com Link: https://lore.kernel.org/r/20241105090120.2438366-1-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/core/hcd.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -2794,8 +2794,14 @@ int usb_add_hcd(struct usb_hcd *hcd, int retval; struct usb_device *rhdev; struct usb_hcd *shared_hcd; + int skip_phy_initialization;
- if (!hcd->skip_phy_initialization) { + if (usb_hcd_is_primary_hcd(hcd)) + skip_phy_initialization = hcd->skip_phy_initialization; + else + skip_phy_initialization = hcd->primary_hcd->skip_phy_initialization; + + if (!skip_phy_initialization) { if (usb_hcd_is_primary_hcd(hcd)) { hcd->phy_roothub = usb_phy_roothub_alloc(hcd->self.sysdev); if (IS_ERR(hcd->phy_roothub))
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kumar Kartikeya Dwivedi memxor@gmail.com
commit c00d738e1673ab801e1577e4e3c780ccf88b1a5b upstream.
This patch reverts commit cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"). The patch was well-intended and meant as a stop-gap to fix branch prediction when the pointer may actually be NULL at runtime. Eventually, it was supposed to be replaced by an automated script or compiler pass detecting possibly NULL arguments and marking them accordingly.
However, it caused two main issues observed for production programs and failed to preserve backwards compatibility. First, programs relied on the verifier not exploring the == NULL branch when the pointer is not NULL, so they started failing with a 'dereference of scalar' error. Next, allowing raw_tp arguments to be modified surfaced the warning in the verifier that warns against reg->off when PTR_MAYBE_NULL is set.
More information, context, and discussion on both problems is available in [0]. Overall, this approach had several shortcomings: the fixes would further complicate the verifier's logic, and the entire masking scheme would have to be removed eventually anyway.
Hence, revert the patch in preparation for a better fix that avoids these issues and will replace this commit.
[0]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com
Reported-by: Manu Bretelle chantra@meta.com Fixes: cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL") Signed-off-by: Kumar Kartikeya Dwivedi memxor@gmail.com Link: https://lore.kernel.org/r/20241213221929.3495062-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/bpf.h | 6 - kernel/bpf/btf.c | 5 kernel/bpf/verifier.c | 79 +-------------- tools/testing/selftests/bpf/progs/test_tp_btf_nullable.c | 6 - 4 files changed, 9 insertions(+), 87 deletions(-)
--- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -3471,10 +3471,4 @@ static inline bool bpf_is_subprog(const return prog->aux->func_idx != 0; }
-static inline bool bpf_prog_is_raw_tp(const struct bpf_prog *prog) -{ - return prog->type == BPF_PROG_TYPE_TRACING && - prog->expected_attach_type == BPF_TRACE_RAW_TP; -} - #endif /* _LINUX_BPF_H */ --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -6564,10 +6564,7 @@ bool btf_ctx_access(int off, int size, e if (prog_args_trusted(prog)) info->reg_type |= PTR_TRUSTED;
- /* Raw tracepoint arguments always get marked as maybe NULL */ - if (bpf_prog_is_raw_tp(prog)) - info->reg_type |= PTR_MAYBE_NULL; - else if (btf_param_match_suffix(btf, &args[arg], "__nullable")) + if (btf_param_match_suffix(btf, &args[arg], "__nullable")) info->reg_type |= PTR_MAYBE_NULL;
if (tgt_prog) { --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -418,25 +418,6 @@ static struct btf_record *reg_btf_record return rec; }
-static bool mask_raw_tp_reg_cond(const struct bpf_verifier_env *env, struct bpf_reg_state *reg) { - return reg->type == (PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL) && - bpf_prog_is_raw_tp(env->prog) && !reg->ref_obj_id; -} - -static bool mask_raw_tp_reg(const struct bpf_verifier_env *env, struct bpf_reg_state *reg) -{ - if (!mask_raw_tp_reg_cond(env, reg)) - return false; - reg->type &= ~PTR_MAYBE_NULL; - return true; -} - -static void unmask_raw_tp_reg(struct bpf_reg_state *reg, bool result) -{ - if (result) - reg->type |= PTR_MAYBE_NULL; -} - static bool subprog_is_global(const struct bpf_verifier_env *env, int subprog) { struct bpf_func_info_aux *aux = env->prog->aux->func_info_aux; @@ -6618,7 +6599,6 @@ static int check_ptr_to_btf_access(struc const char *field_name = NULL; enum bpf_type_flag flag = 0; u32 btf_id = 0; - bool mask; int ret;
if (!env->allow_ptr_leaks) { @@ -6690,21 +6670,7 @@ static int check_ptr_to_btf_access(struc
if (ret < 0) return ret; - /* For raw_tp progs, we allow dereference of PTR_MAYBE_NULL - * trusted PTR_TO_BTF_ID, these are the ones that are possibly - * arguments to the raw_tp. Since internal checks in for trusted - * reg in check_ptr_to_btf_access would consider PTR_MAYBE_NULL - * modifier as problematic, mask it out temporarily for the - * check. Don't apply this to pointers with ref_obj_id > 0, as - * those won't be raw_tp args. - * - * We may end up applying this relaxation to other trusted - * PTR_TO_BTF_ID with maybe null flag, since we cannot - * distinguish PTR_MAYBE_NULL tagged for arguments vs normal - * tagging, but that should expand allowed behavior, and not - * cause regression for existing behavior. - */ - mask = mask_raw_tp_reg(env, reg); + if (ret != PTR_TO_BTF_ID) { /* just mark; */
@@ -6765,13 +6731,8 @@ static int check_ptr_to_btf_access(struc clear_trusted_flags(&flag); }
- if (atype == BPF_READ && value_regno >= 0) { + if (atype == BPF_READ && value_regno >= 0) mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag); - /* We've assigned a new type to regno, so don't undo masking. */ - if (regno == value_regno) - mask = false; - } - unmask_raw_tp_reg(reg, mask);
return 0; } @@ -7146,7 +7107,7 @@ static int check_mem_access(struct bpf_v if (!err && t == BPF_READ && value_regno >= 0) mark_reg_unknown(env, regs, value_regno); } else if (base_type(reg->type) == PTR_TO_BTF_ID && - (mask_raw_tp_reg_cond(env, reg) || !type_may_be_null(reg->type))) { + !type_may_be_null(reg->type)) { err = check_ptr_to_btf_access(env, regs, regno, off, size, t, value_regno); } else if (reg->type == CONST_PTR_TO_MAP) { @@ -8844,7 +8805,6 @@ static int check_func_arg(struct bpf_ver enum bpf_reg_type type = reg->type; u32 *arg_btf_id = NULL; int err = 0; - bool mask;
if (arg_type == ARG_DONTCARE) return 0; @@ -8885,11 +8845,11 @@ static int check_func_arg(struct bpf_ver base_type(arg_type) == ARG_PTR_TO_SPIN_LOCK) arg_btf_id = fn->arg_btf_id[arg];
- mask = mask_raw_tp_reg(env, reg); err = check_reg_type(env, regno, arg_type, arg_btf_id, meta); + if (err) + return err;
- err = err ?: check_func_arg_reg_off(env, reg, regno, arg_type); - unmask_raw_tp_reg(reg, mask); + err = check_func_arg_reg_off(env, reg, regno, arg_type); if (err) return err;
@@ -9684,17 +9644,14 @@ static int btf_check_func_arg_match(stru return ret; } else if (base_type(arg->arg_type) == ARG_PTR_TO_BTF_ID) { struct bpf_call_arg_meta meta; - bool mask; int err;
if (register_is_null(reg) && type_may_be_null(arg->arg_type)) continue;
memset(&meta, 0, sizeof(meta)); /* leave func_id as zero */ - mask = mask_raw_tp_reg(env, reg); err = check_reg_type(env, regno, arg->arg_type, &arg->btf_id, &meta); err = err ?: check_func_arg_reg_off(env, reg, regno, arg->arg_type); - unmask_raw_tp_reg(reg, mask); if (err) return err; } else { @@ -12009,7 +11966,6 @@ static int check_kfunc_args(struct bpf_v enum bpf_arg_type arg_type = ARG_DONTCARE; u32 regno = i + 1, ref_id, type_size; bool is_ret_buf_sz = false; - bool mask = false; int kf_arg_type;
t = btf_type_skip_modifiers(btf, args[i].type, NULL); @@ -12068,15 +12024,12 @@ static int check_kfunc_args(struct bpf_v return -EINVAL; }
- mask = mask_raw_tp_reg(env, reg); if ((is_kfunc_trusted_args(meta) || is_kfunc_rcu(meta)) && (register_is_null(reg) || type_may_be_null(reg->type)) && !is_kfunc_arg_nullable(meta->btf, &args[i])) { verbose(env, "Possibly NULL pointer passed to trusted arg%d\n", i); - unmask_raw_tp_reg(reg, mask); return -EACCES; } - unmask_raw_tp_reg(reg, mask);
if (reg->ref_obj_id) { if (is_kfunc_release(meta) && meta->ref_obj_id) { @@ -12134,24 +12087,16 @@ static int check_kfunc_args(struct bpf_v if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta)) break;
- /* Allow passing maybe NULL raw_tp arguments to - * kfuncs for compatibility. Don't apply this to - * arguments with ref_obj_id > 0. - */ - mask = mask_raw_tp_reg(env, reg); if (!is_trusted_reg(reg)) { if (!is_kfunc_rcu(meta)) { verbose(env, "R%d must be referenced or trusted\n", regno); - unmask_raw_tp_reg(reg, mask); return -EINVAL; } if (!is_rcu_reg(reg)) { verbose(env, "R%d must be a rcu pointer\n", regno); - unmask_raw_tp_reg(reg, mask); return -EINVAL; } } - unmask_raw_tp_reg(reg, mask); fallthrough; case KF_ARG_PTR_TO_CTX: case KF_ARG_PTR_TO_DYNPTR: @@ -12174,9 +12119,7 @@ static int check_kfunc_args(struct bpf_v
if (is_kfunc_release(meta) && reg->ref_obj_id) arg_type |= OBJ_RELEASE; - mask = mask_raw_tp_reg(env, reg); ret = check_func_arg_reg_off(env, reg, regno, arg_type); - unmask_raw_tp_reg(reg, mask); if (ret < 0) return ret;
@@ -12353,7 +12296,6 @@ static int check_kfunc_args(struct bpf_v ref_tname = btf_name_by_offset(btf, ref_t->name_off); fallthrough; case KF_ARG_PTR_TO_BTF_ID: - mask = mask_raw_tp_reg(env, reg); /* Only base_type is checked, further checks are done here */ if ((base_type(reg->type) != PTR_TO_BTF_ID || (bpf_type_has_unsafe_modifiers(reg->type) && !is_rcu_reg(reg))) && @@ -12362,11 +12304,9 @@ static int check_kfunc_args(struct bpf_v verbose(env, "expected %s or socket\n", reg_type_str(env, base_type(reg->type) | (type_flag(reg->type) & BPF_REG_TRUSTED_MODIFIERS))); - unmask_raw_tp_reg(reg, mask); return -EINVAL; } ret = process_kf_arg_ptr_to_btf_id(env, reg, ref_t, ref_tname, ref_id, meta, i); - unmask_raw_tp_reg(reg, mask); if (ret < 0) return ret; break; @@ -13336,7 +13276,7 @@ static int sanitize_check_bounds(struct */ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env, struct bpf_insn *insn, - struct bpf_reg_state *ptr_reg, + const struct bpf_reg_state *ptr_reg, const struct bpf_reg_state *off_reg) { struct bpf_verifier_state *vstate = env->cur_state; @@ -13350,7 +13290,6 @@ static int adjust_ptr_min_max_vals(struc struct bpf_sanitize_info info = {}; u8 opcode = BPF_OP(insn->code); u32 dst = insn->dst_reg; - bool mask; int ret;
dst_reg = ®s[dst]; @@ -13377,14 +13316,11 @@ static int adjust_ptr_min_max_vals(struc return -EACCES; }
- mask = mask_raw_tp_reg(env, ptr_reg); if (ptr_reg->type & PTR_MAYBE_NULL) { verbose(env, "R%d pointer arithmetic on %s prohibited, null-check it first\n", dst, reg_type_str(env, ptr_reg->type)); - unmask_raw_tp_reg(ptr_reg, mask); return -EACCES; } - unmask_raw_tp_reg(ptr_reg, mask);
switch (base_type(ptr_reg->type)) { case PTR_TO_CTX: @@ -19934,7 +19870,6 @@ static int convert_ctx_accesses(struct b * for this case. */ case PTR_TO_BTF_ID | MEM_ALLOC | PTR_UNTRUSTED: - case PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL: if (type == BPF_READ) { if (BPF_MODE(insn->code) == BPF_MEM) insn->code = BPF_LDX | BPF_PROBE_MEM | --- a/tools/testing/selftests/bpf/progs/test_tp_btf_nullable.c +++ b/tools/testing/selftests/bpf/progs/test_tp_btf_nullable.c @@ -7,11 +7,7 @@ #include "bpf_misc.h"
SEC("tp_btf/bpf_testmod_test_nullable_bare") -/* This used to be a failure test, but raw_tp nullable arguments can now - * directly be dereferenced, whether they have nullable annotation or not, - * and don't need to be explicitly checked. - */ -__success +__failure __msg("R1 invalid mem access 'trusted_ptr_or_null_'") int BPF_PROG(handle_tp_btf_nullable_bare1, struct bpf_testmod_test_read_ctx *nullable_ctx) { return nullable_ctx->len;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Joe Hattori joe@pf.is.s.u-tokyo.ac.jp
commit 676fe1f6f74db988191dab5df3bf256908177072 upstream.
The OF node reference obtained by of_parse_phandle_with_args() is not released on early return. Add an of_node_put() call before returning.
Fixes: 8996b89d6bc9 ("ata: add platform driver for Calxeda AHCI controller") Signed-off-by: Joe Hattori joe@pf.is.s.u-tokyo.ac.jp Signed-off-by: Damien Le Moal dlemoal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/sata_highbank.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/ata/sata_highbank.c +++ b/drivers/ata/sata_highbank.c @@ -348,6 +348,7 @@ static int highbank_initialize_phys(stru phy_nodes[phy] = phy_data.np; cphy_base[phy] = of_iomap(phy_nodes[phy], 0); if (cphy_base[phy] == NULL) { + of_node_put(phy_data.np); return 0; } phy_count += 1;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Wahren wahrenst@gmx.net
commit 336f72d3cbf5cc17df2947bbbd2ba6e2509f17e8 upstream.
The Raspberry Pi can suffer from interrupt storms on HCD resume. The dwc2 driver sometimes fails to enable HCD_FLAG_HW_ACCESSIBLE before re-enabling the interrupts. This causes a situation where both handlers ignore an incoming port interrupt and force the upper layers to disable the dwc2 interrupt line. This leaves the USB interface in an unusable state:
irq 66: nobody cared (try booting with the "irqpoll" option) CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.10.0-rc3 Hardware name: BCM2835 Call trace: unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x50/0x64 dump_stack_lvl from __report_bad_irq+0x38/0xc0 __report_bad_irq from note_interrupt+0x2ac/0x2f4 note_interrupt from handle_irq_event+0x88/0x8c handle_irq_event from handle_level_irq+0xb4/0x1ac handle_level_irq from generic_handle_domain_irq+0x24/0x34 generic_handle_domain_irq from bcm2836_chained_handle_irq+0x24/0x28 bcm2836_chained_handle_irq from generic_handle_domain_irq+0x24/0x34 generic_handle_domain_irq from generic_handle_arch_irq+0x34/0x44 generic_handle_arch_irq from __irq_svc+0x88/0xb0 Exception stack(0xc1b01f20 to 0xc1b01f68) 1f20: 0005c0d4 00000001 00000000 00000000 c1b09780 c1d6b32c c1b04e54 c1a5eae8 1f40: c1b04e90 00000000 00000000 00000000 c1d6a8a0 c1b01f70 c11d2da8 c11d4160 1f60: 60000013 ffffffff __irq_svc from default_idle_call+0x1c/0xb0 default_idle_call from do_idle+0x21c/0x284 do_idle from cpu_startup_entry+0x28/0x2c cpu_startup_entry from kernel_init+0x0/0x12c handlers: [<f539e0f4>] dwc2_handle_common_intr [<75cd278b>] usb_hcd_irq Disabling IRQ #66
So enable the HCD_FLAG_HW_ACCESSIBLE flag in case there is a port connection.
Fixes: c74c26f6e398 ("usb: dwc2: Fix partial power down exiting by system resume") Closes: https://lore.kernel.org/linux-usb/3fd0c2fb-4752-45b3-94eb-42352703e1fd@gmx.n... Link: https://lore.kernel.org/all/5e8cbce0-3260-2971-484f-fc73a3b2bd28@synopsys.co... Signed-off-by: Stefan Wahren wahrenst@gmx.net Link: https://lore.kernel.org/r/20241202001631.75473-2-wahrenst@gmx.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc2/hcd.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c index cb54390e7de4..26a320c1979a 100644 --- a/drivers/usb/dwc2/hcd.c +++ b/drivers/usb/dwc2/hcd.c @@ -4431,6 +4431,7 @@ static int _dwc2_hcd_resume(struct usb_hcd *hcd) * Power Down mode. */ if (hprt0 & HPRT0_CONNSTS) { + set_bit(HCD_FLAG_HW_ACCESSIBLE, &hcd->flags); hsotg->lx_state = DWC2_L0; goto unlock; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Wahren wahrenst@gmx.net
commit a8d3e4a734599c7d0f6735f8db8a812e503395dd upstream.
On Raspberry Pis without an onboard USB hub, the power cycle during power connect init only disables the port but never enables it again:
usb usb1-port1: attempt power cycle
The port-relevant part of dwc2_hcd_hub_control() is skipped when port_connect_status = 0, under the assumption that the core is or will soon be in device mode. But this assumption is wrong, because after ClearPortFeature USB_PORT_FEAT_POWER the port_connect_status will also be 0, and SetPortFeature (incl. USB_PORT_FEAT_POWER) will be a no-op.
Fix the behavior of dwc2_hcd_hub_control() by replacing the port_connect_status check with dwc2_is_device_mode().
Link: https://github.com/raspberrypi/linux/issues/6247 Fixes: 7359d482eb4d ("staging: HCD files for the DWC2 driver") Signed-off-by: Stefan Wahren wahrenst@gmx.net Link: https://lore.kernel.org/r/20241202001631.75473-3-wahrenst@gmx.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc2/hcd.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-)
--- a/drivers/usb/dwc2/hcd.c +++ b/drivers/usb/dwc2/hcd.c @@ -3546,11 +3546,9 @@ static int dwc2_hcd_hub_control(struct d port_status |= USB_PORT_STAT_C_OVERCURRENT << 16; }
- if (!hsotg->flags.b.port_connect_status) { + if (dwc2_is_device_mode(hsotg)) { /* - * The port is disconnected, which means the core is - * either in device mode or it soon will be. Just - * return 0's for the remainder of the port status + * Just return 0's for the remainder of the port status * since the port register can't be read if the core * is in device mode. */ @@ -3620,13 +3618,11 @@ static int dwc2_hcd_hub_control(struct d if (wvalue != USB_PORT_FEAT_TEST && (!windex || windex > 1)) goto error;
- if (!hsotg->flags.b.port_connect_status) { + if (dwc2_is_device_mode(hsotg)) { /* - * The port is disconnected, which means the core is - * either in device mode or it soon will be. Just - * return without doing anything since the port - * register can't be written if the core is in device - * mode. + * Just return 0's for the remainder of the port status + * since the port register can't be read if the core + * is in device mode. */ break; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Wahren wahrenst@gmx.net
commit 1cf1bd88f129f3bd647fead4dca270a5894274bb upstream.
On Raspberry Pis without an onboard USB hub, frequent device reconnects can trigger an interrupt storm after DWC2 has entered host clock gating. This is caused by a race between _dwc2_hcd_suspend() and the port interrupt, which sets port_connect_status. The issue occurs if port_connect_status is still 1 but there is no connection anymore:
usb 1-1: USB disconnect, device number 25 dwc2 3f980000.usb: _dwc2_hcd_suspend: port_connect_status: 1 dwc2 3f980000.usb: Entering host clock gating. Disabling IRQ #66 irq 66: nobody cared (try booting with the "irqpoll" option) CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-gc1bb81b13202-dirty #322 Hardware name: BCM2835 Call trace: unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x50/0x64 dump_stack_lvl from __report_bad_irq+0x38/0xc0 __report_bad_irq from note_interrupt+0x2ac/0x2f4 note_interrupt from handle_irq_event+0x88/0x8c handle_irq_event from handle_level_irq+0xb4/0x1ac handle_level_irq from generic_handle_domain_irq+0x24/0x34 generic_handle_domain_irq from bcm2836_chained_handle_irq+0x24/0x28 bcm2836_chained_handle_irq from generic_handle_domain_irq+0x24/0x34 generic_handle_domain_irq from generic_handle_arch_irq+0x34/0x44 generic_handle_arch_irq from __irq_svc+0x88/0xb0 Exception stack(0xc1d01f20 to 0xc1d01f68) 1f20: 0004ef3c 00000001 00000000 00000000 c1d09780 c1f6bb5c c1d04e54 c1c60ca8 1f40: c1d04e94 00000000 00000000 c1d092a8 c1f6af20 c1d01f70 c1211b98 c1212f40 1f60: 60000013 ffffffff __irq_svc from default_idle_call+0x1c/0xb0 default_idle_call from do_idle+0x21c/0x284 do_idle from cpu_startup_entry+0x28/0x2c cpu_startup_entry from kernel_init+0x0/0x12c handlers: [<e3a25c00>] dwc2_handle_common_intr [<58bf98a3>] usb_hcd_irq Disabling IRQ #66
So avoid this by reading the connection status directly.
Fixes: 113f86d0c302 ("usb: dwc2: Update partial power down entering by system suspend") Signed-off-by: Stefan Wahren wahrenst@gmx.net Link: https://lore.kernel.org/r/20241202001631.75473-4-wahrenst@gmx.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc2/hcd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/usb/dwc2/hcd.c +++ b/drivers/usb/dwc2/hcd.c @@ -4345,7 +4345,7 @@ static int _dwc2_hcd_suspend(struct usb_ if (hsotg->bus_suspended) goto skip_power_saving;
- if (hsotg->flags.b.port_connect_status == 0) + if (!(dwc2_read_hprt0(hsotg) & HPRT0_CONNSTS)) goto skip_power_saving;
switch (hsotg->params.power_down) {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: liuderong liuderong@oppo.com
commit f103396ae31851d00b561ff9f8a32a441953ff8b upstream.
lrbp->compl_time_stamp_local_clock is set to zero after sending an sqe but it is not updated after completing a cqe. Thus the completion time printed in ufshcd_print_tr() will always be zero.

Update lrbp->compl_time_stamp_local_clock after completing a cqe.
Log sample:
ufshcd-qcom 1d84000.ufshc: UPIU[8] - issue time 8750227249 us ufshcd-qcom 1d84000.ufshc: UPIU[8] - complete time 0 us
Fixes: c30d8d010b5e ("scsi: ufs: core: Prepare for completion in MCQ") Reviewed-by: Bean Huo beanhuo@micron.com Reviewed-by: Peter Wang peter.wang@mediatek.com Signed-off-by: liuderong liuderong@oppo.com Link: https://lore.kernel.org/r/1733470182-220841-1-git-send-email-liuderong@oppo.... Reviewed-by: Avri Altman avri.altman@wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ufs/core/ufshcd.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -5566,6 +5566,7 @@ void ufshcd_compl_one_cqe(struct ufs_hba
lrbp = &hba->lrb[task_tag]; lrbp->compl_time_stamp = ktime_get(); + lrbp->compl_time_stamp_local_clock = local_clock(); cmd = lrbp->cmd; if (cmd) { if (unlikely(ufshcd_should_inform_monitor(hba, lrbp)))
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 82937056967da052cbc04b4435c13db84192dc52 upstream.
The UMP Function Block info m1.0 field (represented by the is_midi1 sysfs entry) is an enumeration from 0 to 2, while the midi2 gadget driver incorrectly copies it to the corresponding snd_ump_block_info.flags bits as-is. This resulted in the wrong bit flags being set when m1.0 = 2.
This patch corrects the wrong interpretation of is_midi1 bits.
Fixes: 29ee7a4dddd5 ("usb: gadget: midi2: Add configfs support") Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Link: https://lore.kernel.org/r/20241127070213.8232-1-tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/gadget/function/f_midi2.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/usb/gadget/function/f_midi2.c +++ b/drivers/usb/gadget/function/f_midi2.c @@ -1593,7 +1593,11 @@ static int f_midi2_create_card(struct f_ fb->info.midi_ci_version = b->midi_ci_version; fb->info.ui_hint = reverse_dir(b->ui_hint); fb->info.sysex8_streams = b->sysex8_streams; - fb->info.flags |= b->is_midi1; + if (b->is_midi1 < 2) + fb->info.flags |= b->is_midi1; + else + fb->info.flags |= SNDRV_UMP_BLOCK_IS_MIDI1 | + SNDRV_UMP_BLOCK_IS_LOWSPEED; strscpy(fb->info.name, ump_fb_name(b), sizeof(fb->info.name)); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vitalii Mordan mordan@ispras.ru
commit 97264eaaba0122a5b7e8ddd7bf4ff3ac57c2b170 upstream.
If the clocks priv->iclk and priv->fclk were not enabled in ehci_hcd_sh_probe, they should not be disabled in any path.
Conversely, if they were enabled in ehci_hcd_sh_probe, they must be disabled in all error paths to ensure proper cleanup.
Found by Linux Verification Center (linuxtesting.org) with Klever.
Fixes: 63c845522263 ("usb: ehci-hcd: Add support for SuperH EHCI.") Cc: stable@vger.kernel.org # ff30bd6a6618: sh: clk: Fix clk_enable() to return 0 on NULL clk Signed-off-by: Vitalii Mordan mordan@ispras.ru Reviewed-by: Alan Stern stern@rowland.harvard.edu Link: https://lore.kernel.org/r/20241121114700.2100520-1-mordan@ispras.ru Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/ehci-sh.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/drivers/usb/host/ehci-sh.c +++ b/drivers/usb/host/ehci-sh.c @@ -119,8 +119,12 @@ static int ehci_hcd_sh_probe(struct plat if (IS_ERR(priv->iclk)) priv->iclk = NULL;
- clk_enable(priv->fclk); - clk_enable(priv->iclk); + ret = clk_enable(priv->fclk); + if (ret) + goto fail_request_resource; + ret = clk_enable(priv->iclk); + if (ret) + goto fail_iclk;
ret = usb_add_hcd(hcd, irq, IRQF_SHARED); if (ret != 0) { @@ -136,6 +140,7 @@ static int ehci_hcd_sh_probe(struct plat
fail_add_hcd: clk_disable(priv->iclk); +fail_iclk: clk_disable(priv->fclk);
fail_request_resource:
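For readers unfamiliar with the goto-unwinding convention the fix relies on, here is a hedged standalone sketch (using stub enable/disable helpers rather than the real kernel clk API) of how each error label only releases what was actually acquired:

#include <errno.h>
#include <stdio.h>

/* Stand-ins for clk_enable()/clk_disable(); the real driver uses the kernel
 * clk API and checks the return value of each enable. */
static int stub_enable(const char *name)
{
	printf("enable %s\n", name);
	return 0;
}

static void stub_disable(const char *name)
{
	printf("disable %s\n", name);
}

static int probe_sketch(int fail_at_add_hcd)
{
	int ret;

	ret = stub_enable("fclk");
	if (ret)
		goto fail_request_resource;	/* nothing enabled yet, nothing to undo */

	ret = stub_enable("iclk");
	if (ret)
		goto fail_iclk;			/* undo fclk only */

	if (fail_at_add_hcd) {
		ret = -EIO;			/* e.g. usb_add_hcd() failed */
		goto fail_add_hcd;		/* undo iclk, then fclk */
	}

	return 0;

fail_add_hcd:
	stub_disable("iclk");
fail_iclk:
	stub_disable("fclk");
fail_request_resource:
	return ret;
}

int main(void)
{
	return probe_sketch(1) ? 1 : 0;
}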
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Joe Hattori joe@pf.is.s.u-tokyo.ac.jp
commit 645d56e4cc74e953284809d096532c1955918a28 upstream.
An fwnode_handle and a usb_role_switch are obtained with an incremented refcount in anx7411_typec_port_probe(); however, the refcounts are not decremented in the error path. The fwnode_handle refcount is also not decremented in the .remove() function. Therefore, call fwnode_handle_put() and usb_role_switch_put() accordingly.
Fixes: fe6d8a9c8e64 ("usb: typec: anx7411: Add Analogix PD ANX7411 support") Cc: stable@vger.kernel.org Signed-off-by: Joe Hattori joe@pf.is.s.u-tokyo.ac.jp Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20241121023429.962848-1-joe@pf.is.s.u-tokyo.ac.jp Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/anx7411.c | 47 +++++++++++++++++++++++-------------- 1 file changed, 29 insertions(+), 18 deletions(-)
diff --git a/drivers/usb/typec/anx7411.c b/drivers/usb/typec/anx7411.c index d1e7c487ddfb..95607efb9f7e 100644 --- a/drivers/usb/typec/anx7411.c +++ b/drivers/usb/typec/anx7411.c @@ -1021,6 +1021,16 @@ static void anx7411_port_unregister_altmodes(struct typec_altmode **adev) } }
+static void anx7411_port_unregister(struct typec_params *typecp) +{ + fwnode_handle_put(typecp->caps.fwnode); + anx7411_port_unregister_altmodes(typecp->port_amode); + if (typecp->port) + typec_unregister_port(typecp->port); + if (typecp->role_sw) + usb_role_switch_put(typecp->role_sw); +} + static int anx7411_usb_mux_set(struct typec_mux_dev *mux, struct typec_mux_state *state) { @@ -1154,34 +1164,34 @@ static int anx7411_typec_port_probe(struct anx7411_data *ctx, ret = fwnode_property_read_string(fwnode, "power-role", &buf); if (ret) { dev_err(dev, "power-role not found: %d\n", ret); - return ret; + goto put_fwnode; }
ret = typec_find_port_power_role(buf); if (ret < 0) - return ret; + goto put_fwnode; cap->type = ret;
ret = fwnode_property_read_string(fwnode, "data-role", &buf); if (ret) { dev_err(dev, "data-role not found: %d\n", ret); - return ret; + goto put_fwnode; }
ret = typec_find_port_data_role(buf); if (ret < 0) - return ret; + goto put_fwnode; cap->data = ret;
ret = fwnode_property_read_string(fwnode, "try-power-role", &buf); if (ret) { dev_err(dev, "try-power-role not found: %d\n", ret); - return ret; + goto put_fwnode; }
ret = typec_find_power_role(buf); if (ret < 0) - return ret; + goto put_fwnode; cap->prefer_role = ret;
/* Get source pdos */ @@ -1193,7 +1203,7 @@ static int anx7411_typec_port_probe(struct anx7411_data *ctx, typecp->src_pdo_nr); if (ret < 0) { dev_err(dev, "source cap validate failed: %d\n", ret); - return -EINVAL; + goto put_fwnode; }
typecp->caps_flags |= HAS_SOURCE_CAP; @@ -1207,7 +1217,7 @@ static int anx7411_typec_port_probe(struct anx7411_data *ctx, typecp->sink_pdo_nr); if (ret < 0) { dev_err(dev, "sink cap validate failed: %d\n", ret); - return -EINVAL; + goto put_fwnode; }
for (i = 0; i < typecp->sink_pdo_nr; i++) { @@ -1251,13 +1261,21 @@ static int anx7411_typec_port_probe(struct anx7411_data *ctx, ret = PTR_ERR(ctx->typec.port); ctx->typec.port = NULL; dev_err(dev, "Failed to register type c port %d\n", ret); - return ret; + goto put_usb_role_switch; }
typec_port_register_altmodes(ctx->typec.port, NULL, ctx, ctx->typec.port_amode, MAX_ALTMODE); return 0; + +put_usb_role_switch: + if (ctx->typec.role_sw) + usb_role_switch_put(ctx->typec.role_sw); +put_fwnode: + fwnode_handle_put(fwnode); + + return ret; }
static int anx7411_typec_check_connection(struct anx7411_data *ctx) @@ -1523,8 +1541,7 @@ free_wq: destroy_workqueue(plat->workqueue);
free_typec_port: - typec_unregister_port(plat->typec.port); - anx7411_port_unregister_altmodes(plat->typec.port_amode); + anx7411_port_unregister(&plat->typec);
free_typec_switch: anx7411_unregister_switch(plat); @@ -1548,17 +1565,11 @@ static void anx7411_i2c_remove(struct i2c_client *client)
i2c_unregister_device(plat->spi_client);
- if (plat->typec.role_sw) - usb_role_switch_put(plat->typec.role_sw); - anx7411_unregister_mux(plat);
anx7411_unregister_switch(plat);
- if (plat->typec.port) - typec_unregister_port(plat->typec.port); - - anx7411_port_unregister_altmodes(plat->typec.port_amode); + anx7411_port_unregister(&plat->typec); }
static const struct i2c_device_id anx7411_id[] = {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xu Yang xu.yang_2@nxp.com
commit a4faee01179a4d9cbad9ba6be2da8637c68c1438 upstream.
When unbinding and then binding the device again, the kernel will dump the warning below:
[ 173.972130] sysfs: cannot create duplicate filename '/devices/platform/soc/4c010010.usb/software_node'
[ 173.981564] CPU: 2 UID: 0 PID: 536 Comm: sh Not tainted 6.12.0-rc6-06344-g2aed7c4a5c56 #144
[ 173.989923] Hardware name: NXP i.MX95 15X15 board (DT)
[ 173.995062] Call trace:
[ 173.997509]  dump_backtrace+0x90/0xe8
[ 174.001196]  show_stack+0x18/0x24
[ 174.004524]  dump_stack_lvl+0x74/0x8c
[ 174.008198]  dump_stack+0x18/0x24
[ 174.011526]  sysfs_warn_dup+0x64/0x80
[ 174.015201]  sysfs_do_create_link_sd+0xf0/0xf8
[ 174.019656]  sysfs_create_link+0x20/0x40
[ 174.023590]  software_node_notify+0x90/0x100
[ 174.027872]  device_create_managed_software_node+0xec/0x108
...
The '4c010010.usb' device is a platform device created during the initcall and is never removed, which causes its associated software node to persist indefinitely.
The existing device_create_managed_software_node() does not provide a corresponding removal function.
Replace device_create_managed_software_node() with the device_add_software_node() and device_remove_software_node() pair to ensure proper addition and removal of software nodes, addressing this issue.
Fixes: a9400f1979a0 ("usb: dwc3: imx8mp: add 2 software managed quirk properties for host mode") Cc: stable@vger.kernel.org Reviewed-by: Frank Li Frank.Li@nxp.com Signed-off-by: Xu Yang xu.yang_2@nxp.com Acked-by: Thinh Nguyen Thinh.Nguyen@synopsys.com Link: https://lore.kernel.org/r/20241126032841.2458338-1-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc3/dwc3-imx8mp.c | 30 ++++++++++++++++-------------- 1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/drivers/usb/dwc3/dwc3-imx8mp.c b/drivers/usb/dwc3/dwc3-imx8mp.c index 356812cbcd88..3edc5aca76f9 100644 --- a/drivers/usb/dwc3/dwc3-imx8mp.c +++ b/drivers/usb/dwc3/dwc3-imx8mp.c @@ -129,6 +129,16 @@ static void dwc3_imx8mp_wakeup_disable(struct dwc3_imx8mp *dwc3_imx) writel(val, dwc3_imx->hsio_blk_base + USB_WAKEUP_CTRL); }
+static const struct property_entry dwc3_imx8mp_properties[] = { + PROPERTY_ENTRY_BOOL("xhci-missing-cas-quirk"), + PROPERTY_ENTRY_BOOL("xhci-skip-phy-init-quirk"), + {}, +}; + +static const struct software_node dwc3_imx8mp_swnode = { + .properties = dwc3_imx8mp_properties, +}; + static irqreturn_t dwc3_imx8mp_interrupt(int irq, void *_dwc3_imx) { struct dwc3_imx8mp *dwc3_imx = _dwc3_imx; @@ -148,17 +158,6 @@ static irqreturn_t dwc3_imx8mp_interrupt(int irq, void *_dwc3_imx) return IRQ_HANDLED; }
-static int dwc3_imx8mp_set_software_node(struct device *dev) -{ - struct property_entry props[3] = { 0 }; - int prop_idx = 0; - - props[prop_idx++] = PROPERTY_ENTRY_BOOL("xhci-missing-cas-quirk"); - props[prop_idx++] = PROPERTY_ENTRY_BOOL("xhci-skip-phy-init-quirk"); - - return device_create_managed_software_node(dev, props, NULL); -} - static int dwc3_imx8mp_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; @@ -221,17 +220,17 @@ static int dwc3_imx8mp_probe(struct platform_device *pdev) if (err < 0) goto disable_rpm;
- err = dwc3_imx8mp_set_software_node(dev); + err = device_add_software_node(dev, &dwc3_imx8mp_swnode); if (err) { err = -ENODEV; - dev_err(dev, "failed to create software node\n"); + dev_err(dev, "failed to add software node\n"); goto disable_rpm; }
err = of_platform_populate(node, NULL, NULL, dev); if (err) { dev_err(&pdev->dev, "failed to create dwc3 core\n"); - goto disable_rpm; + goto remove_swnode; }
dwc3_imx->dwc3 = of_find_device_by_node(dwc3_np); @@ -255,6 +254,8 @@ static int dwc3_imx8mp_probe(struct platform_device *pdev)
depopulate: of_platform_depopulate(dev); +remove_swnode: + device_remove_software_node(dev); disable_rpm: pm_runtime_disable(dev); pm_runtime_put_noidle(dev); @@ -268,6 +269,7 @@ static void dwc3_imx8mp_remove(struct platform_device *pdev)
pm_runtime_get_sync(dev); of_platform_depopulate(dev); + device_remove_software_node(dev);
pm_runtime_disable(dev); pm_runtime_put_noidle(dev);
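As a rough illustration of the pairing this patch switches to, here is a minimal kernel-style sketch, not the dwc3-imx8mp code: a hypothetical driver (my_probe/my_remove are made-up names) adds a static software node in probe and removes it in remove, using the real device_add_software_node()/device_remove_software_node() helpers from linux/property.h.

  #include <linux/device.h>
  #include <linux/property.h>

  /* Hypothetical quirk properties, defined once, statically. */
  static const struct property_entry my_props[] = {
          PROPERTY_ENTRY_BOOL("example-quirk"),
          { }
  };

  static const struct software_node my_swnode = {
          .properties = my_props,
  };

  static int my_probe(struct device *dev)
  {
          /* Attach the static node so the core exposes the properties. */
          return device_add_software_node(dev, &my_swnode);
  }

  static void my_remove(struct device *dev)
  {
          /* Mirror of probe: remove the node so a later re-bind can add it
           * again without the duplicate sysfs link warning. */
          device_remove_software_node(dev);
  }

In the real driver the removal also has to sit on the probe error path, which is what the new remove_swnode label in the hunk above provides.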
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Joe Hattori joe@pf.is.s.u-tokyo.ac.jp
commit ef42b906df5c57d0719b69419df9dfd25f25c161 upstream.
The refcounts of the OF nodes obtained by of_get_child_by_name() calls in anx7411_typec_switch_probe() are not decremented. Replace them with device_get_named_child_node() calls and store the return values to the newly created fwnode_handle fields in anx7411_data, and call fwnode_handle_put() on them in the error path and in the unregister functions.
Fixes: e45d7337dc0e ("usb: typec: anx7411: Use of_get_child_by_name() instead of of_find_node_by_name()") Cc: stable@vger.kernel.org Signed-off-by: Joe Hattori joe@pf.is.s.u-tokyo.ac.jp Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20241126014909.3687917-1-joe@pf.is.s.u-tokyo.ac.jp Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/anx7411.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-)
--- a/drivers/usb/typec/anx7411.c +++ b/drivers/usb/typec/anx7411.c @@ -290,6 +290,8 @@ struct anx7411_data { struct power_supply *psy; struct power_supply_desc psy_desc; struct device *dev; + struct fwnode_handle *switch_node; + struct fwnode_handle *mux_node; };
static u8 snk_identity[] = { @@ -1099,6 +1101,7 @@ static void anx7411_unregister_mux(struc if (ctx->typec.typec_mux) { typec_mux_unregister(ctx->typec.typec_mux); ctx->typec.typec_mux = NULL; + fwnode_handle_put(ctx->mux_node); } }
@@ -1107,6 +1110,7 @@ static void anx7411_unregister_switch(st if (ctx->typec.typec_switch) { typec_switch_unregister(ctx->typec.typec_switch); ctx->typec.typec_switch = NULL; + fwnode_handle_put(ctx->switch_node); } }
@@ -1114,28 +1118,29 @@ static int anx7411_typec_switch_probe(st struct device *dev) { int ret; - struct device_node *node;
- node = of_get_child_by_name(dev->of_node, "orientation_switch"); - if (!node) + ctx->switch_node = device_get_named_child_node(dev, "orientation_switch"); + if (!ctx->switch_node) return 0;
- ret = anx7411_register_switch(ctx, dev, &node->fwnode); + ret = anx7411_register_switch(ctx, dev, ctx->switch_node); if (ret) { dev_err(dev, "failed register switch"); + fwnode_handle_put(ctx->switch_node); return ret; }
- node = of_get_child_by_name(dev->of_node, "mode_switch"); - if (!node) { + ctx->mux_node = device_get_named_child_node(dev, "mode_switch"); + if (!ctx->mux_node) { dev_err(dev, "no typec mux exist"); ret = -ENODEV; goto unregister_switch; }
- ret = anx7411_register_mux(ctx, dev, &node->fwnode); + ret = anx7411_register_mux(ctx, dev, ctx->mux_node); if (ret) { dev_err(dev, "failed register mode switch"); + fwnode_handle_put(ctx->mux_node); ret = -ENODEV; goto unregister_switch; }
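A stripped-down sketch of the get/put discipline this patch adopts (hypothetical names, not the anx7411 functions): every successful device_get_named_child_node() must eventually be balanced by fwnode_handle_put(), including on error paths.

  #include <linux/device.h>
  #include <linux/property.h>

  /* Hypothetical: look up an optional child node, use it, and make sure the
   * reference taken by device_get_named_child_node() is dropped on every
   * exit path. */
  static int my_parse_child(struct device *dev)
  {
          struct fwnode_handle *child;
          u32 val;
          int ret;

          child = device_get_named_child_node(dev, "settings");
          if (!child)
                  return 0;                       /* node is optional */

          ret = fwnode_property_read_u32(child, "example-value", &val);
          if (ret) {
                  fwnode_handle_put(child);       /* drop ref on error */
                  return ret;
          }

          dev_info(dev, "example-value = %u\n", val);
          fwnode_handle_put(child);               /* drop ref when done */
          return 0;
  }

The patch itself keeps the handles in anx7411_data instead and drops them in the unregister paths, because they must stay valid for the lifetime of the registered switch and mux.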
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lianqin Hu hulianqin@vivo.com
commit 4cfbca86f6a8b801f3254e0e3c8f2b1d2d64be2b upstream.
In some extreme cases the u_serial driver can be accessed by multiple threads: thread A executes the open operation and calls gs_open(), while thread B executes the disconnect operation and calls gserial_disconnect(), which sets the port->port_usb pointer to NULL.
E.g.
Thread A                                Thread B
gs_open()                               gadget_unbind_driver()
gs_start_io()                           composite_disconnect()
gs_start_rx()                           gserial_disconnect()
...                                     ...
spin_unlock(&port->port_lock)
status = usb_ep_queue()                 spin_lock(&port->port_lock)
spin_lock(&port->port_lock)             port->port_usb = NULL
gs_free_requests(port->port_usb->in)    spin_unlock(&port->port_lock)
Crash
This causes thread A to access a null pointer (port->port_usb is null) when calling the gs_free_requests function, causing a crash.
If port_usb is NULL, the release request will be skipped as it will be done by gserial_disconnect.
So add a null pointer check to gs_start_io before attempting to access the value of the pointer port->port_usb.
Call trace:
 gs_start_io+0x164/0x25c
 gs_open+0x108/0x13c
 tty_open+0x314/0x638
 chrdev_open+0x1b8/0x258
 do_dentry_open+0x2c4/0x700
 vfs_open+0x2c/0x3c
 path_openat+0xa64/0xc60
 do_filp_open+0xb8/0x164
 do_sys_openat2+0x84/0xf0
 __arm64_sys_openat+0x70/0x9c
 invoke_syscall+0x58/0x114
 el0_svc_common+0x80/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x38/0x68
Fixes: c1dca562be8a ("usb gadget: split out serial core") Cc: stable@vger.kernel.org Suggested-by: Prashanth K quic_prashk@quicinc.com Signed-off-by: Lianqin Hu hulianqin@vivo.com Acked-by: Prashanth K quic_prashk@quicinc.com Link: https://lore.kernel.org/r/TYUPR06MB62178DC3473F9E1A537DCD02D2362@TYUPR06MB62... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/gadget/function/u_serial.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/drivers/usb/gadget/function/u_serial.c +++ b/drivers/usb/gadget/function/u_serial.c @@ -579,9 +579,12 @@ static int gs_start_io(struct gs_port *p * we didn't in gs_start_tx() */ tty_wakeup(port->port.tty); } else { - gs_free_requests(ep, head, &port->read_allocated); - gs_free_requests(port->port_usb->in, &port->write_pool, - &port->write_allocated); + /* Free reqs only if we are still connected */ + if (port->port_usb) { + gs_free_requests(ep, head, &port->read_allocated); + gs_free_requests(port->port_usb->in, &port->write_pool, + &port->write_allocated); + } status = -EIO; }
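To make the race and the fix easier to see outside of the gadget stack, here is a small user-space analogy (pthreads, made-up names, not the u_serial code): one thread may NULL out a shared pointer under a lock, so the other thread must re-check the pointer after (re)acquiring the lock, exactly as the added "if (port->port_usb)" test does.

  #include <pthread.h>
  #include <stdio.h>
  #include <stdlib.h>

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static int *shared;                     /* stands in for port->port_usb */

  static void *disconnect_thread(void *arg)
  {
          pthread_mutex_lock(&lock);
          free(shared);
          shared = NULL;                  /* like gserial_disconnect() */
          pthread_mutex_unlock(&lock);
          return NULL;
  }

  static void *open_thread(void *arg)
  {
          pthread_mutex_lock(&lock);
          /* Re-check after taking the lock: the other thread may have torn
           * the object down while we were not holding it. */
          if (shared)
                  printf("still connected, value %d\n", *shared);
          else
                  printf("disconnected, skip the cleanup that needs it\n");
          pthread_mutex_unlock(&lock);
          return NULL;
  }

  int main(void)
  {
          pthread_t a, b;

          shared = malloc(sizeof(*shared));
          *shared = 42;

          pthread_create(&b, NULL, disconnect_thread, NULL);
          pthread_create(&a, NULL, open_thread, NULL);
          pthread_join(a, NULL);
          pthread_join(b, NULL);
          return 0;
  }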
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Łukasz Bartosik ukaszb@chromium.org
commit e37b383df91ba9bde9c6a31bf3ea9072561c5126 upstream.
Message flow between the OPM, the PPM and the LPM:
 1. OPM sends a command to the PPM
 2. PPM sets the busy bit in the CCI
 3. PPM notifies the OPM
 4. PPM forwards the command to the LPM to be executed
 5. LPM reports command completion to the PPM
 6. PPM sets the command completed bit in the CCI
 7. PPM notifies the OPM
 8. OPM handles the notification from point 3 and reads the CCI
When the PPM receives command from the OPM (p.1) it sets the busy bit in the CCI (p.2), sends notification to the OPM (p.3) and forwards the command to be executed by the LPM (p.4). When the PPM receives command completion from the LPM (p.5) it sets command completion bit in the CCI (p.6) and sends notification to the OPM (p.7). If command execution by the LPM is fast enough then when the OPM starts handling the notification from p.3 in p.8 and reads the CCI value it will see command completion bit set and will call complete(). Then complete() might be called again when the OPM handles notification from p.7.
This fix replaces test_bit() with test_and_clear_bit() in ucsi_notify_common() in order to call complete() only once per request.
This fix also reinitializes completion variable in ucsi_sync_control_common() before a command is sent.
Fixes: 584e8df58942 ("usb: typec: ucsi: extract common code for command handling") Cc: stable@vger.kernel.org Signed-off-by: Łukasz Bartosik ukaszb@chromium.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Reviewed-by: Benson Leung bleung@chromium.org Link: https://lore.kernel.org/r/20241203102318.3386345-1-ukaszb@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/ucsi/ucsi.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c index c435c0835744..7a65a7672e18 100644 --- a/drivers/usb/typec/ucsi/ucsi.c +++ b/drivers/usb/typec/ucsi/ucsi.c @@ -46,11 +46,11 @@ void ucsi_notify_common(struct ucsi *ucsi, u32 cci) ucsi_connector_change(ucsi, UCSI_CCI_CONNECTOR(cci));
if (cci & UCSI_CCI_ACK_COMPLETE && - test_bit(ACK_PENDING, &ucsi->flags)) + test_and_clear_bit(ACK_PENDING, &ucsi->flags)) complete(&ucsi->complete);
if (cci & UCSI_CCI_COMMAND_COMPLETE && - test_bit(COMMAND_PENDING, &ucsi->flags)) + test_and_clear_bit(COMMAND_PENDING, &ucsi->flags)) complete(&ucsi->complete); } EXPORT_SYMBOL_GPL(ucsi_notify_common); @@ -65,6 +65,8 @@ int ucsi_sync_control_common(struct ucsi *ucsi, u64 command) else set_bit(COMMAND_PENDING, &ucsi->flags);
+ reinit_completion(&ucsi->complete); + ret = ucsi->ops->async_control(ucsi, command); if (ret) goto out_clear_bit;
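The effect of switching from test_bit() to test_and_clear_bit() can be sketched with C11 atomics in user space (hypothetical names, not the UCSI code): the first notification that observes the pending flag atomically consumes it, so a second notification for the same request cannot call complete() again.

  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdio.h>

  static atomic_bool command_pending = true;  /* set when the command is sent */
  static int completions;

  static void complete(void)
  {
          completions++;
  }

  /* Called from the notification path; may run more than once per command. */
  static void notify(bool command_complete_bit)
  {
          /* atomic_exchange(..., false) plays the role of test_and_clear_bit():
           * only the caller that actually clears the flag signals completion. */
          if (command_complete_bit &&
              atomic_exchange(&command_pending, false))
                  complete();
  }

  int main(void)
  {
          notify(true);   /* notification from step 3, CCI already shows completion */
          notify(true);   /* notification from step 7 for the same command */
          printf("complete() called %d time(s)\n", completions);  /* prints 1 */
          return 0;
  }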
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Neal Frager neal.frager@amd.com
commit a48f744bef9ee74814a9eccb030b02223e48c76c upstream.
When the USB3 PHY is not defined in the Linux device tree, there could still be a case where a USB3 PHY is active on the board and was enabled by the first-stage bootloader. If the serdes clock is being used, USB will fail to enumerate devices in 2.0-only mode.
To solve this, make sure that the PIPE clock is deselected whenever the USB3 PHY is not defined, which guarantees that USB2-only mode will work in all cases.
Fixes: 9678f3361afc ("usb: dwc3: xilinx: Skip resets and USB3 register settings for USB2.0 mode") Cc: stable@vger.kernel.org Signed-off-by: Neal Frager neal.frager@amd.com Signed-off-by: Radhey Shyam Pandey radhey.shyam.pandey@amd.com Acked-by: Peter Korsgaard peter@korsgaard.com Link: https://lore.kernel.org/r/1733163111-1414816-1-git-send-email-radhey.shyam.p... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc3/dwc3-xilinx.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/usb/dwc3/dwc3-xilinx.c +++ b/drivers/usb/dwc3/dwc3-xilinx.c @@ -121,8 +121,11 @@ static int dwc3_xlnx_init_zynqmp(struct * in use but the usb3-phy entry is missing from the device tree. * Therefore, skip these operations in this case. */ - if (!priv_data->usb3_phy) + if (!priv_data->usb3_phy) { + /* Deselect the PIPE Clock Select bit in FPD PIPE Clock register */ + writel(PIPE_CLK_DESELECT, priv_data->regs + XLNX_USB_FPD_PIPE_CLK); goto skip_usb3_phy; + }
crst = devm_reset_control_get_exclusive(dev, "usb_crst"); if (IS_ERR(crst)) {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luis Claudio R. Goncalves lgoncalv@redhat.com
commit 1f806218164d1bb93f3db21eaf61254b08acdf03 upstream.
During boot some of the calls to tegra241_cmdqv_get_cmdq() will happen in preemptible context. As this function calls smp_processor_id(), if CONFIG_DEBUG_PREEMPT is enabled, these calls will trigger a series of "BUG: using smp_processor_id() in preemptible" backtraces.
As tegra241_cmdqv_get_cmdq() only calls smp_processor_id() to use the CPU number as a factor to balance out traffic on cmdq usage, it is safe to use raw_smp_processor_id() here.
Cc: stable@vger.kernel.org Fixes: 918eb5c856f6 ("iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV") Signed-off-by: Luis Claudio R. Goncalves lgoncalv@redhat.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Nicolin Chen nicolinc@nvidia.com Tested-by: Nicolin Chen nicolinc@nvidia.com Link: https://lore.kernel.org/r/Z1L1mja3nXzsJ0Pk@uudg.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c index c8ec74f089f3..6e41ddaa24d6 100644 --- a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c +++ b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c @@ -339,7 +339,7 @@ tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu, * one CPU at a time can enter the process, while the others * will be spinning at the same lock. */ - lidx = smp_processor_id() % cmdqv->num_lvcmdqs_per_vintf; + lidx = raw_smp_processor_id() % cmdqv->num_lvcmdqs_per_vintf; vcmdq = vintf->lvcmdqs[lidx]; if (!vcmdq || !READ_ONCE(vcmdq->enabled)) return NULL;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lu Baolu baolu.lu@linux.intel.com
commit 1f2557e08a617a4b5e92a48a1a9a6f86621def18 upstream.
The current implementation removes cache tags after disabling ATS, leading to potential memory leaks and kernel crashes. Specifically, CACHE_TAG_DEVTLB type cache tags may still remain in the list even after the domain is freed, causing a use-after-free condition.
This issue really shows up when multiple VFs from different PFs are passed through to a single user-space process via vfio-pci. In such cases, the kernel may crash with kernel messages like:
BUG: kernel NULL pointer dereference, address: 0000000000000014
PGD 19036a067 P4D 1940a3067 PUD 136c9b067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 74 UID: 0 PID: 3183 Comm: testCli Not tainted 6.11.9 #2
RIP: 0010:cache_tag_flush_range+0x9b/0x250
Call Trace:
 <TASK>
 ? __die+0x1f/0x60
 ? page_fault_oops+0x163/0x590
 ? exc_page_fault+0x72/0x190
 ? asm_exc_page_fault+0x22/0x30
 ? cache_tag_flush_range+0x9b/0x250
 ? cache_tag_flush_range+0x5d/0x250
 intel_iommu_tlb_sync+0x29/0x40
 intel_iommu_unmap_pages+0xfe/0x160
 __iommu_unmap+0xd8/0x1a0
 vfio_unmap_unpin+0x182/0x340 [vfio_iommu_type1]
 vfio_remove_dma+0x2a/0xb0 [vfio_iommu_type1]
 vfio_iommu_type1_ioctl+0xafa/0x18e0 [vfio_iommu_type1]
Move cache_tag_unassign_domain() before iommu_disable_pci_caps() to fix it.
Fixes: 3b1d9e2b2d68 ("iommu/vt-d: Add cache tag assignment interface") Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Link: https://lore.kernel.org/r/20241129020506.576413-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iommu/intel/iommu.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -3372,6 +3372,9 @@ void device_block_translation(struct dev struct intel_iommu *iommu = info->iommu; unsigned long flags;
+ if (info->domain) + cache_tag_unassign_domain(info->domain, dev, IOMMU_NO_PASID); + iommu_disable_pci_caps(info); if (!dev_is_real_dma_subdevice(dev)) { if (sm_supported(iommu)) @@ -3388,7 +3391,6 @@ void device_block_translation(struct dev list_del(&info->link); spin_unlock_irqrestore(&info->domain->lock, flags);
- cache_tag_unassign_domain(info->domain, dev, IOMMU_NO_PASID); domain_detach_iommu(info->domain, iommu); info->domain = NULL; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yi Liu yi.l.liu@intel.com
commit 74536f91962d5f6af0a42414773ce61e653c10ee upstream.
The qi_batch is allocated when assigning a cache tag to a domain, but this step is missed for the nested parent domain. Hence, a NULL dereference occurs when trying to map pages to the nested parent. There is also a potential memory leak, since there is no lock around the domain->qi_batch allocation.
To solve it, add a helper for qi_batch allocation, and call it in both the __cache_tag_assign_domain() and __cache_tag_assign_parent_domain().
BUG: kernel NULL pointer dereference, address: 0000000000000200
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 8104795067 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted 6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
Call Trace:
 ? __die+0x24/0x70
 ? page_fault_oops+0x80/0x150
 ? do_user_addr_fault+0x63/0x7b0
 ? exc_page_fault+0x7c/0x220
 ? asm_exc_page_fault+0x26/0x30
 ? cache_tag_flush_range_np+0x13c/0x260
 intel_iommu_iotlb_sync_map+0x1a/0x30
 iommu_map+0x61/0xf0
 batch_to_domain+0x188/0x250
 iopt_area_fill_domains+0x125/0x320
 ? rcu_is_watching+0x11/0x50
 iopt_map_pages+0x63/0x100
 iopt_map_common.isra.0+0xa7/0x190
 iopt_map_user_pages+0x6a/0x80
 iommufd_ioas_map+0xcd/0x1d0
 iommufd_fops_ioctl+0x118/0x1c0
 __x64_sys_ioctl+0x93/0xc0
 do_syscall_64+0x71/0x140
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation") Cc: stable@vger.kernel.org Co-developed-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Link: https://lore.kernel.org/r/20241210130322.17175-1-yi.l.liu@intel.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iommu/intel/cache.c | 34 +++++++++++++++++++++++++++------- 1 file changed, 27 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c index e5b89f728ad3..09694cca8752 100644 --- a/drivers/iommu/intel/cache.c +++ b/drivers/iommu/intel/cache.c @@ -105,12 +105,35 @@ static void cache_tag_unassign(struct dmar_domain *domain, u16 did, spin_unlock_irqrestore(&domain->cache_lock, flags); }
+/* domain->qi_batch will be freed in iommu_free_domain() path. */ +static int domain_qi_batch_alloc(struct dmar_domain *domain) +{ + unsigned long flags; + int ret = 0; + + spin_lock_irqsave(&domain->cache_lock, flags); + if (domain->qi_batch) + goto out_unlock; + + domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_ATOMIC); + if (!domain->qi_batch) + ret = -ENOMEM; +out_unlock: + spin_unlock_irqrestore(&domain->cache_lock, flags); + + return ret; +} + static int __cache_tag_assign_domain(struct dmar_domain *domain, u16 did, struct device *dev, ioasid_t pasid) { struct device_domain_info *info = dev_iommu_priv_get(dev); int ret;
+ ret = domain_qi_batch_alloc(domain); + if (ret) + return ret; + ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_IOTLB); if (ret || !info->ats_enabled) return ret; @@ -139,6 +162,10 @@ static int __cache_tag_assign_parent_domain(struct dmar_domain *domain, u16 did, struct device_domain_info *info = dev_iommu_priv_get(dev); int ret;
+ ret = domain_qi_batch_alloc(domain); + if (ret) + return ret; + ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_NESTING_IOTLB); if (ret || !info->ats_enabled) return ret; @@ -190,13 +217,6 @@ int cache_tag_assign_domain(struct dmar_domain *domain, u16 did = domain_get_id_for_dev(domain, dev); int ret;
- /* domain->qi_bach will be freed in iommu_free_domain() path. */ - if (!domain->qi_batch) { - domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_KERNEL); - if (!domain->qi_batch) - return -ENOMEM; - } - ret = __cache_tag_assign_domain(domain, did, dev, pasid); if (ret || domain->domain.type != IOMMU_DOMAIN_NESTED) return ret;
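The helper introduced above is a lock-protected "allocate on first use" pattern. A rough user-space equivalent (a pthread mutex instead of the cache_lock spinlock, plain calloc instead of kzalloc(GFP_ATOMIC); names are made up) might look like this:

  #include <pthread.h>
  #include <stdlib.h>
  #include <errno.h>

  struct qi_batch { int dummy[16]; };       /* stand-in payload */

  struct domain {
          pthread_mutex_t lock;
          struct qi_batch *qi_batch;
  };

  /* Allocate domain->qi_batch exactly once, no matter how many callers race
   * into this function; later callers see it already set and return 0. */
  static int domain_qi_batch_alloc(struct domain *d)
  {
          int ret = 0;

          pthread_mutex_lock(&d->lock);
          if (!d->qi_batch) {
                  d->qi_batch = calloc(1, sizeof(*d->qi_batch));
                  if (!d->qi_batch)
                          ret = -ENOMEM;
          }
          pthread_mutex_unlock(&d->lock);

          return ret;
  }

  int main(void)
  {
          struct domain d = { .lock = PTHREAD_MUTEX_INITIALIZER };

          return domain_qi_batch_alloc(&d) || domain_qi_batch_alloc(&d);
  }

In the kernel version the allocation has to use GFP_ATOMIC because cache_lock is a spinlock held with interrupts disabled, which is exactly what the new domain_qi_batch_alloc() in the hunk does.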
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
commit cefade70f346160f47cc24776160329e2ee63653 upstream.
Invalidation_fence_init takes a PM reference, which is released in its _fini counterpart, so we need to make sure that the latter is called, even if the fence is in an error state.
Since we already have a function that calls _fini() and signals the fence in the tlb inval code, we can expose that and call it from the PT code.
Fixes: f002702290fc ("drm/xe: Hold a PM ref when GT TLB invalidations are inflight") Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: stable@vger.kernel.org # v6.11+ Cc: Matthew Brost matthew.brost@intel.com Cc: Nirmoy Das nirmoy.das@intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com Reviewed-by: Nirmoy Das nirmoy.das@intel.com Reviewed-by: Matthew Brost matthew.brost@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20241206015022.1567113-1-danie... (cherry picked from commit 65338639b79ce88aef5263cd518cde570a3c7c8e) Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 8 ++++++++ drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h | 1 + drivers/gpu/drm/xe/xe_pt.c | 3 +-- 3 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c index 3cb228c773cd..6146d1776bda 100644 --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c @@ -65,6 +65,14 @@ invalidation_fence_signal(struct xe_device *xe, struct xe_gt_tlb_invalidation_fe __invalidation_fence_signal(xe, fence); }
+void xe_gt_tlb_invalidation_fence_signal(struct xe_gt_tlb_invalidation_fence *fence) +{ + if (WARN_ON_ONCE(!fence->gt)) + return; + + __invalidation_fence_signal(gt_to_xe(fence->gt), fence); +} + static void xe_gt_tlb_fence_timeout(struct work_struct *work) { struct xe_gt *gt = container_of(work, struct xe_gt, diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h index f430d5797af7..00b1c6c01e8d 100644 --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h @@ -28,6 +28,7 @@ int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len); void xe_gt_tlb_invalidation_fence_init(struct xe_gt *gt, struct xe_gt_tlb_invalidation_fence *fence, bool stack); +void xe_gt_tlb_invalidation_fence_signal(struct xe_gt_tlb_invalidation_fence *fence);
static inline void xe_gt_tlb_invalidation_fence_wait(struct xe_gt_tlb_invalidation_fence *fence) diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c index f27f579f4d85..797576690356 100644 --- a/drivers/gpu/drm/xe/xe_pt.c +++ b/drivers/gpu/drm/xe/xe_pt.c @@ -1333,8 +1333,7 @@ static void invalidation_fence_cb(struct dma_fence *fence, queue_work(system_wq, &ifence->work); } else { ifence->base.base.error = ifence->fence->error; - dma_fence_signal(&ifence->base.base); - dma_fence_put(&ifence->base.base); + xe_gt_tlb_invalidation_fence_signal(&ifence->base); } dma_fence_put(ifence->fence); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jesse.zhang@amd.com Jesse.zhang@amd.com
commit 438b39ac74e2a9dc0a5c9d653b7d8066877e86b1 upstream.
When using MES, creating a pdd requires talking to the GPU to set up the relevant context. The code here forgot to wake up the GPU in case it was suspended, which causes KVM to EFAULT for a passthrough GPU, for example. The issue can be masked if the GPU was first woken up by something else (e.g. opening the KMS node) and has not yet gone back to sleep.
v4: do the allocation of proc_ctx_bo in a lazy fashion when the first queue is created in a process (Felix)
Signed-off-by: Jesse Zhang jesse.zhang@amd.com Reviewed-by: Yunxiang Li Yunxiang.Li@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 15 +++++++++++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 23 +----------------- 2 files changed, 17 insertions(+), 21 deletions(-)
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -205,6 +205,21 @@ static int add_queue_mes(struct device_q if (!down_read_trylock(&adev->reset_domain->sem)) return -EIO;
+ if (!pdd->proc_ctx_cpu_ptr) { + r = amdgpu_amdkfd_alloc_gtt_mem(adev, + AMDGPU_MES_PROC_CTX_SIZE, + &pdd->proc_ctx_bo, + &pdd->proc_ctx_gpu_addr, + &pdd->proc_ctx_cpu_ptr, + false); + if (r) { + dev_err(adev->dev, + "failed to allocate process context bo\n"); + return r; + } + memset(pdd->proc_ctx_cpu_ptr, 0, AMDGPU_MES_PROC_CTX_SIZE); + } + memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input)); queue_input.process_id = qpd->pqm->process->pasid; queue_input.page_table_base_addr = qpd->page_table_base; --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -1076,7 +1076,8 @@ static void kfd_process_destroy_pdds(str
kfd_free_process_doorbells(pdd->dev->kfd, pdd);
- if (pdd->dev->kfd->shared_resources.enable_mes) + if (pdd->dev->kfd->shared_resources.enable_mes && + pdd->proc_ctx_cpu_ptr) amdgpu_amdkfd_free_gtt_mem(pdd->dev->adev, &pdd->proc_ctx_bo); /* @@ -1610,7 +1611,6 @@ struct kfd_process_device *kfd_create_pr struct kfd_process *p) { struct kfd_process_device *pdd = NULL; - int retval = 0;
if (WARN_ON_ONCE(p->n_pdds >= MAX_GPU_INSTANCE)) return NULL; @@ -1634,21 +1634,6 @@ struct kfd_process_device *kfd_create_pr pdd->user_gpu_id = dev->id; atomic64_set(&pdd->evict_duration_counter, 0);
- if (dev->kfd->shared_resources.enable_mes) { - retval = amdgpu_amdkfd_alloc_gtt_mem(dev->adev, - AMDGPU_MES_PROC_CTX_SIZE, - &pdd->proc_ctx_bo, - &pdd->proc_ctx_gpu_addr, - &pdd->proc_ctx_cpu_ptr, - false); - if (retval) { - dev_err(dev->adev->dev, - "failed to allocate process context bo\n"); - goto err_free_pdd; - } - memset(pdd->proc_ctx_cpu_ptr, 0, AMDGPU_MES_PROC_CTX_SIZE); - } - p->pdds[p->n_pdds++] = pdd; if (kfd_dbg_is_per_vmid_supported(pdd->dev)) pdd->spi_dbg_override = pdd->dev->kfd2kgd->disable_debug_trap( @@ -1660,10 +1645,6 @@ struct kfd_process_device *kfd_create_pr idr_init(&pdd->alloc_idr);
return pdd; - -err_free_pdd: - kfree(pdd); - return NULL; }
/**
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiasheng Jiang jiashengjiangcool@outlook.com
commit 2828e5808bcd5aae7fdcd169cac1efa2701fa2dd upstream.
Replace "slab_priorities" with "slab_dependencies" in the error handler to avoid memory leak.
Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Jiasheng Jiang jiashengjiangcool@outlook.com Reviewed-by: Nirmoy Das nirmoy.das@intel.com Reviewed-by: Andi Shyti andi.shyti@linux.intel.com Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20241127201042.29620-1-jiashen... (cherry picked from commit 9bc5e7dc694d3112bbf0fa4c46ef0fa0f114937a) Signed-off-by: Tvrtko Ursulin tursulin@ursulin.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/i915_scheduler.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -506,6 +506,6 @@ int __init i915_scheduler_module_init(vo return 0;
err_priorities: - kmem_cache_destroy(slab_priorities); + kmem_cache_destroy(slab_dependencies); return -ENOMEM; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit cd3da567e2e46b8f75549637b960a83b024d6b6e upstream.
DSB LUT register writes vs. palette anti-collision logic appear to interact in interesting ways:
- posted DSB writes simply vanish into thin air while anti-collision is active
- non-posted DSB writes actually get blocked by the anti-collision logic, but unfortunately this ends up hogging the bus for long enough that unrelated parallel CPU MMIO accesses start to disappear instead
Even though we are updating the LUT during vblank we aren't immune to the anti-collision logic because it kicks in briefly for pipe prefill (initiated at frame start). The safe time window for performing the LUT update is thus between the undelayed vblank and frame start. Turns out that with low enough CDCLK frequency (DSB execution speed depends on CDCLK) we can exceed that.
As we are currently using non-posted writes for the legacy LUT updates, we can hit the far more severe failure mode. The problem is exacerbated by the fact that non-posted writes are much slower than posted writes (~4x it seems).
To mitigate the problem let's switch to using posted DSB writes for legacy LUT updates (which will involve using the double-write approach to avoid other problems with DSB vs. legacy LUT writes). Despite writing each register twice, this will in fact make the legacy LUT update faster than the non-posted write approach, making the problem less likely to appear. The failure mode is also less severe.
This isn't the 100% solution we need though. That will involve estimating how long the LUT update will take, and pushing frame start and/or delayed vblank forward to guarantee that the update will have finished by the time the pipe prefill starts...
Cc: stable@vger.kernel.org Fixes: 34d8311f4a1c ("drm/i915/dsb: Re-instate DSB for LUT updates") Fixes: 25ea3411bd23 ("drm/i915/dsb: Use non-posted register writes for legacy LUT") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12494 Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20241120164123.12706-3-ville.s... Reviewed-by: Uma Shankar uma.shankar@intel.com (cherry picked from commit 2504a316b35d49522f39cf0dc01830d7c36a9be4) Signed-off-by: Tvrtko Ursulin tursulin@ursulin.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_color.c | 30 +++++++++++++++++++---------- 1 file changed, 20 insertions(+), 10 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_color.c +++ b/drivers/gpu/drm/i915/display/intel_color.c @@ -1333,19 +1333,29 @@ static void ilk_load_lut_8(const struct lut = blob->data;
/* - * DSB fails to correctly load the legacy LUT - * unless we either write each entry twice, - * or use non-posted writes + * DSB fails to correctly load the legacy LUT unless + * we either write each entry twice when using posted + * writes, or we use non-posted writes. + * + * If palette anti-collision is active during LUT + * register writes: + * - posted writes simply get dropped and thus the LUT + * contents may not be correctly updated + * - non-posted writes are blocked and thus the LUT + * contents are always correct, but simultaneous CPU + * MMIO access will start to fail + * + * Choose the lesser of two evils and use posted writes. + * Using posted writes is also faster, even when having + * to write each register twice. */ - if (crtc_state->dsb_color_vblank) - intel_dsb_nonpost_start(crtc_state->dsb_color_vblank); - - for (i = 0; i < 256; i++) + for (i = 0; i < 256; i++) { ilk_lut_write(crtc_state, LGC_PALETTE(pipe, i), i9xx_lut_8(&lut[i])); - - if (crtc_state->dsb_color_vblank) - intel_dsb_nonpost_end(crtc_state->dsb_color_vblank); + if (crtc_state->dsb_color_vblank) + ilk_lut_write(crtc_state, LGC_PALETTE(pipe, i), + i9xx_lut_8(&lut[i])); + } }
static void ilk_load_lut_10(const struct intel_crtc_state *crtc_state,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eugene Kobyak eugene.kobyak@intel.com
commit da0b986256ae9a78b0215214ff44f271bfe237c1 upstream.
When the intel_context pointer is NULL, dereferencing it in drm_info() raises a NULL pointer dereference error.
Fixes: e8a3319c31a1 ("drm/i915: Allow error capture without a request") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12309 Reviewed-by: Andi Shyti andi.shyti@linux.intel.com Cc: John Harrison John.C.Harrison@Intel.com Cc: stable@vger.kernel.org # v6.3+ Signed-off-by: Eugene Kobyak eugene.kobyak@intel.com Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/xmsgfynkhycw3cf56akp4he2ffg44v... (cherry picked from commit 754302a5bc1bd8fd3b7d85c168b0a1af6d4bba4d) Signed-off-by: Tvrtko Ursulin tursulin@ursulin.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/i915_gpu_error.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1652,9 +1652,21 @@ capture_engine(struct intel_engine_cs *e return NULL;
intel_engine_get_hung_entity(engine, &ce, &rq); - if (rq && !i915_request_started(rq)) - drm_info(&engine->gt->i915->drm, "Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n", - engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id); + if (rq && !i915_request_started(rq)) { + /* + * We want to know also what is the guc_id of the context, + * but if we don't have the context reference, then skip + * printing it. + */ + if (ce) + drm_info(&engine->gt->i915->drm, + "Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n", + engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id); + else + drm_info(&engine->gt->i915->drm, + "Got hung context on %s with active request %lld:%lld not yet started\n", + engine->name, rq->fence.context, rq->fence.seqno); + }
if (rq) { capture = intel_engine_coredump_add_request(ee, rq, ATOMIC_MAYFAIL);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian König christian.koenig@amd.com
commit 12f325bcd2411e571dbb500bf6862c812c479735 upstream.
When starting the mpv player, Radeon R9 users are observing the below error in dmesg.
[drm:amdgpu_uvd_cs_pass2 [amdgpu]] *ERROR* msg/fb buffer ff00f7c000-ff00f7e000 out of 256MB segment!
The patch tries to set TTM_PL_FLAG_CONTIGUOUS in both cases: when the user flag (AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) is set and when it is not.
v2: Make the TTM_PL_FLAG_CONTIGUOUS mandatory for user BO's. v3: revert back to v1, but fix the check instead (chk).
Closes:https://gitlab.freedesktop.org/drm/amd/-/issues/3599 Closes:https://gitlab.freedesktop.org/drm/amd/-/issues/3501 Signed-off-by: Arunpravin Paneer Selvam Arunpravin.PaneerSelvam@amd.com Signed-off-by: Christian König christian.koenig@amd.com Reviewed-by: Arunpravin Paneer Selvam Arunpravin.PaneerSelvam@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org # 6.10+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 17 +++++++++++------ drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 2 ++ 2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index d891ab779ca7..5df21529b3b1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1801,13 +1801,18 @@ int amdgpu_cs_find_mapping(struct amdgpu_cs_parser *parser, if (dma_resv_locking_ctx((*bo)->tbo.base.resv) != &parser->exec.ticket) return -EINVAL;
+ /* Make sure VRAM is allocated contigiously */ (*bo)->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS; - amdgpu_bo_placement_from_domain(*bo, (*bo)->allowed_domains); - for (i = 0; i < (*bo)->placement.num_placement; i++) - (*bo)->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS; - r = ttm_bo_validate(&(*bo)->tbo, &(*bo)->placement, &ctx); - if (r) - return r; + if ((*bo)->tbo.resource->mem_type == TTM_PL_VRAM && + !((*bo)->tbo.resource->placement & TTM_PL_FLAG_CONTIGUOUS)) { + + amdgpu_bo_placement_from_domain(*bo, (*bo)->allowed_domains); + for (i = 0; i < (*bo)->placement.num_placement; i++) + (*bo)->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS; + r = ttm_bo_validate(&(*bo)->tbo, &(*bo)->placement, &ctx); + if (r) + return r; + }
return amdgpu_ttm_alloc_gart(&(*bo)->tbo); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c index 31fd30dcd593..65bb26215e86 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c @@ -551,6 +551,8 @@ static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo) for (i = 0; i < abo->placement.num_placement; ++i) { abo->placements[i].fpfn = 0 >> PAGE_SHIFT; abo->placements[i].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT; + if (abo->placements[i].mem_type == TTM_PL_VRAM) + abo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS; } }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kenneth Feng kenneth.feng@amd.com
commit 3912a78cf72eb45f8153a395162b08fef9c5ec3d upstream.
Set the default workload type to the bootup type on smu v13.0.7. This is because of a constraint on smu v13.0.7: gfx activity has an even higher set point in 3D fullscreen mode than in bootup mode. This causes 3D fullscreen mode's performance to be worse than bootup mode's performance for lightweight/medium workloads. For high workloads, the performance is the same between 3D fullscreen mode and bootup mode.
v2: set the default workload in ASIC specific file
Signed-off-by: Kenneth Feng kenneth.feng@amd.com Reviewed-by: Yang Wang kevinyang.wang@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org # 6.11.x Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c @@ -2717,4 +2717,5 @@ void smu_v13_0_7_set_ppt_funcs(struct sm smu->workload_map = smu_v13_0_7_workload_map; smu->smc_driver_if_version = SMU13_0_7_DRIVER_IF_VERSION; smu_v13_0_set_smu_mailbox_registers(smu); + smu->power_profile_mode = PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian König christian.koenig@amd.com
commit f4df208177d02f1c90f3644da3a2453080b8c24f upstream.
Emitting the cleaner shader must come after the check if a VM switch is necessary or not.
Otherwise we will emit the cleaner shader every time and not just when it is necessary because we switched between applications.
This can otherwise crash on gang submit and probably decreases performance quite a bit.
v2: squash in fix from Srini (Alex)
Signed-off-by: Christian König christian.koenig@amd.com Fixes: ee7a846ea27b ("drm/amdgpu: Emit cleaner shader at end of IB submission") Acked-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 8d9bf7a0857f..ddd7f05e4db9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -674,12 +674,8 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job, pasid_mapping_needed &= adev->gmc.gmc_funcs->emit_pasid_mapping && ring->funcs->emit_wreg;
- if (adev->gfx.enable_cleaner_shader && - ring->funcs->emit_cleaner_shader && - job->enforce_isolation) - ring->funcs->emit_cleaner_shader(ring); - - if (!vm_flush_needed && !gds_switch_needed && !need_pipe_sync) + if (!vm_flush_needed && !gds_switch_needed && !need_pipe_sync && + !(job->enforce_isolation && !job->vmid)) return 0;
amdgpu_ring_ib_begin(ring); @@ -690,6 +686,11 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job, if (need_pipe_sync) amdgpu_ring_emit_pipeline_sync(ring);
+ if (adev->gfx.enable_cleaner_shader && + ring->funcs->emit_cleaner_shader && + job->enforce_isolation) + ring->funcs->emit_cleaner_shader(ring); + if (vm_flush_needed) { trace_amdgpu_vm_flush(ring, job->vmid, job->vm_pd_addr); amdgpu_ring_emit_vm_flush(ring, job->vmid, job->vm_pd_addr);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrew Martin Andrew.Martin@amd.com
commit a592bb19abdc2072875c87da606461bfd7821b08 upstream.
In the function pqm_uninit there is an assignment "pdd = kfd_get_process_device_data(...)" whose result can be NULL, and this value was later dereferenced without a check.
Fixes: fb91065851cd ("drm/amdkfd: Refactor queue wptr_bo GART mapping") Signed-off-by: Andrew Martin Andrew.Martin@amd.com Reviewed-by: Felix Kuehling felix.kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c index c76db22a1000..59b92d66e958 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c @@ -212,13 +212,17 @@ static void pqm_clean_queue_resource(struct process_queue_manager *pqm, void pqm_uninit(struct process_queue_manager *pqm) { struct process_queue_node *pqn, *next; - struct kfd_process_device *pdd;
list_for_each_entry_safe(pqn, next, &pqm->queues, process_queue_list) { if (pqn->q) { - pdd = kfd_get_process_device_data(pqn->q->device, pqm->process); - kfd_queue_unref_bo_vas(pdd, &pqn->q->properties); - kfd_queue_release_buffers(pdd, &pqn->q->properties); + struct kfd_process_device *pdd = kfd_get_process_device_data(pqn->q->device, + pqm->process); + if (pdd) { + kfd_queue_unref_bo_vas(pdd, &pqn->q->properties); + kfd_queue_release_buffers(pdd, &pqn->q->properties); + } else { + WARN_ON(!pdd); + } pqm_clean_queue_resource(pqm, pqn); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Harish Kasiviswanathan Harish.Kasiviswanathan@amd.com
commit 321048c4a3e375416b51b4093978f9ce2aa4d391 upstream.
This information is not available in the IP discovery table.
Signed-off-by: Harish Kasiviswanathan Harish.Kasiviswanathan@amd.com Reviewed-by: David Belanger david.belanger@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c @@ -1422,6 +1422,7 @@ err:
static int kfd_fill_gpu_cache_info_from_gfx_config(struct kfd_dev *kdev, + bool cache_line_size_missing, struct kfd_gpu_cache_info *pcache_info) { struct amdgpu_device *adev = kdev->adev; @@ -1436,6 +1437,8 @@ static int kfd_fill_gpu_cache_info_from_ CRAT_CACHE_FLAGS_SIMD_CACHE); pcache_info[i].num_cu_shared = adev->gfx.config.gc_num_tcp_per_wpg / 2; pcache_info[i].cache_line_size = adev->gfx.config.gc_tcp_cache_line_size; + if (cache_line_size_missing && !pcache_info[i].cache_line_size) + pcache_info[i].cache_line_size = 128; i++; } /* Scalar L1 Instruction Cache per SQC */ @@ -1448,6 +1451,8 @@ static int kfd_fill_gpu_cache_info_from_ CRAT_CACHE_FLAGS_SIMD_CACHE); pcache_info[i].num_cu_shared = adev->gfx.config.gc_num_sqc_per_wgp * 2; pcache_info[i].cache_line_size = adev->gfx.config.gc_instruction_cache_line_size; + if (cache_line_size_missing && !pcache_info[i].cache_line_size) + pcache_info[i].cache_line_size = 128; i++; } /* Scalar L1 Data Cache per SQC */ @@ -1459,6 +1464,8 @@ static int kfd_fill_gpu_cache_info_from_ CRAT_CACHE_FLAGS_SIMD_CACHE); pcache_info[i].num_cu_shared = adev->gfx.config.gc_num_sqc_per_wgp * 2; pcache_info[i].cache_line_size = adev->gfx.config.gc_scalar_data_cache_line_size; + if (cache_line_size_missing && !pcache_info[i].cache_line_size) + pcache_info[i].cache_line_size = 64; i++; } /* GL1 Data Cache per SA */ @@ -1471,7 +1478,8 @@ static int kfd_fill_gpu_cache_info_from_ CRAT_CACHE_FLAGS_DATA_CACHE | CRAT_CACHE_FLAGS_SIMD_CACHE); pcache_info[i].num_cu_shared = adev->gfx.config.max_cu_per_sh; - pcache_info[i].cache_line_size = 0; + if (cache_line_size_missing) + pcache_info[i].cache_line_size = 128; i++; } /* L2 Data Cache per GPU (Total Tex Cache) */ @@ -1483,6 +1491,8 @@ static int kfd_fill_gpu_cache_info_from_ CRAT_CACHE_FLAGS_SIMD_CACHE); pcache_info[i].num_cu_shared = adev->gfx.config.max_cu_per_sh; pcache_info[i].cache_line_size = adev->gfx.config.gc_tcc_cache_line_size; + if (cache_line_size_missing && !pcache_info[i].cache_line_size) + pcache_info[i].cache_line_size = 128; i++; } /* L3 Data Cache per GPU */ @@ -1568,6 +1578,7 @@ static int kfd_fill_gpu_cache_info_from_ int kfd_get_gpu_cache_info(struct kfd_node *kdev, struct kfd_gpu_cache_info **pcache_info) { int num_of_cache_types = 0; + bool cache_line_size_missing = false;
switch (kdev->adev->asic_type) { case CHIP_KAVERI: @@ -1691,10 +1702,17 @@ int kfd_get_gpu_cache_info(struct kfd_no case IP_VERSION(11, 5, 0): case IP_VERSION(11, 5, 1): case IP_VERSION(11, 5, 2): + /* Cacheline size not available in IP discovery for gc11. + * kfd_fill_gpu_cache_info_from_gfx_config to hard code it + */ + cache_line_size_missing = true; + fallthrough; case IP_VERSION(12, 0, 0): case IP_VERSION(12, 0, 1): num_of_cache_types = - kfd_fill_gpu_cache_info_from_gfx_config(kdev->kfd, *pcache_info); + kfd_fill_gpu_cache_info_from_gfx_config(kdev->kfd, + cache_line_size_missing, + *pcache_info); break; default: *pcache_info = dummy_cache_info;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Harish Kasiviswanathan Harish.Kasiviswanathan@amd.com
commit d50bf3f0fab636574c163ba8b5863e12b1ed19bd upstream.
This information is not available in the IP discovery table.
Signed-off-by: Harish Kasiviswanathan Harish.Kasiviswanathan@amd.com Reviewed-by: David Belanger david.belanger@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c @@ -1503,7 +1503,7 @@ static int kfd_fill_gpu_cache_info_from_ CRAT_CACHE_FLAGS_DATA_CACHE | CRAT_CACHE_FLAGS_SIMD_CACHE); pcache_info[i].num_cu_shared = adev->gfx.config.max_cu_per_sh; - pcache_info[i].cache_line_size = 0; + pcache_info[i].cache_line_size = 64; i++; } return i;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit 6f4669708a69fd21f0299c2d5c4780a6ce358ab5 upstream.
If we need to reset a symlink target to the "durr it's busted" string, then we clear the zapped flag as well. However, this should be using the provided helper so that we don't set the zapped state on an otherwise ok symlink.
Cc: stable@vger.kernel.org # v6.10 Fixes: 2651923d8d8db0 ("xfs: online repair of symbolic links") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/scrub/symlink_repair.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/scrub/symlink_repair.c b/fs/xfs/scrub/symlink_repair.c index d015a86ef460..953ce7be78dc 100644 --- a/fs/xfs/scrub/symlink_repair.c +++ b/fs/xfs/scrub/symlink_repair.c @@ -36,6 +36,7 @@ #include "scrub/tempfile.h" #include "scrub/tempexch.h" #include "scrub/reap.h" +#include "scrub/health.h"
/* * Symbolic Link Repair @@ -233,7 +234,7 @@ xrep_symlink_salvage( * target zapped flag. */ if (buflen == 0) { - sc->sick_mask |= XFS_SICK_INO_SYMLINK_ZAPPED; + xchk_mark_healthy_if_clean(sc, XFS_SICK_INO_SYMLINK_ZAPPED); sprintf(target_buf, DUMMY_TARGET); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit 6d7b4bc1c3e00b1a25b7a05141a64337b4629337 upstream.
In commit 2c813ad66a72, I partially fixed a bug wherein xfs_btree_insrec would erroneously try to update the parent's key for a block that had been split if we decided to insert the new record into the new block. The solution was to detect this situation and update the in-core key value that we pass up to the caller so that the caller will (eventually) add the new block to the parent level of the tree with the correct key.
However, I missed a subtlety about the way inode-rooted btrees work. If the full block was a maximally sized inode root block, we'll solve that fullness by moving the root block's records to a new block, resizing the root block, and updating the root to point to the new block. We don't pass a pointer to the new block to the caller because that work has already been done. The new record will /always/ land in the new block, so in this case we need to use xfs_btree_update_keys to update the keys.
This bug can theoretically manifest itself in the very rare case that we split a bmbt root block and the new record lands in the very first slot of the new block, though I've never managed to trigger it in practice. However, it is very easy to reproduce by running generic/522 with the realtime rmapbt patchset if rtinherit=1.
Cc: stable@vger.kernel.org # v4.8 Fixes: 2c813ad66a7218 ("xfs: support btrees with overlapping intervals for keys") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_btree.c | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-)
--- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -3569,14 +3569,31 @@ xfs_btree_insrec( xfs_btree_log_block(cur, bp, XFS_BB_NUMRECS);
/* - * If we just inserted into a new tree block, we have to - * recalculate nkey here because nkey is out of date. + * Update btree keys to reflect the newly added record or keyptr. + * There are three cases here to be aware of. Normally, all we have to + * do is walk towards the root, updating keys as necessary. * - * Otherwise we're just updating an existing block (having shoved - * some records into the new tree block), so use the regular key - * update mechanism. + * If the caller had us target a full block for the insertion, we dealt + * with that by calling the _make_block_unfull function. If the + * "make unfull" function splits the block, it'll hand us back the key + * and pointer of the new block. We haven't yet added the new block to + * the next level up, so if we decide to add the new record to the new + * block (bp->b_bn != old_bn), we have to update the caller's pointer + * so that the caller adds the new block with the correct key. + * + * However, there is a third possibility-- if the selected block is the + * root block of an inode-rooted btree and cannot be expanded further, + * the "make unfull" function moves the root block contents to a new + * block and updates the root block to point to the new block. In this + * case, no block pointer is passed back because the block has already + * been added to the btree. In this case, we need to use the regular + * key update function, just like the first case. This is critical for + * overlapping btrees, because the high key must be updated to reflect + * the entire tree, not just the subtree accessible through the first + * child of the root (which is now two levels down from the root). */ - if (bp && xfs_buf_daddr(bp) != old_bn) { + if (!xfs_btree_ptr_is_null(cur, &nptr) && + bp && xfs_buf_daddr(bp) != old_bn) { xfs_btree_get_keys(cur, block, lkey); } else if (xfs_btree_needs_key_update(cur, optr)) { error = xfs_btree_update_keys(cur, level);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit 7ce31f20a0771d71779c3b0ec9cdf474cc3c8e9a upstream.
Way back when we first implemented FICLONE for XFS, life was simple -- either the entire remapping completed, or something happened and we had to return an errno explaining what happened. Neither of those ioctls supports returning partial results, so it's all or nothing.
Then things got complicated when copy_file_range came along, because it actually can return the number of bytes copied, so commit 3f68c1f562f1e4 tried to make it so that we could return a partial result if the REMAP_FILE_CAN_SHORTEN flag is set. This is also how FIDEDUPERANGE can indicate that the kernel performed a partial deduplication.
Unfortunately, the logic is wrong if an error stops the remapping and CAN_SHORTEN is not set. Because those callers cannot return partial results, it is an error for ->remap_file_range to return a positive quantity that is less than the @len passed in. Implementations really should be returning a negative errno in this case, because that's what btrfs (which introduced FICLONE{,RANGE}) did.
Therefore, ->remap_range implementations cannot silently drop an errno that they might have when the number of bytes remapped is less than the number of bytes requested and CAN_SHORTEN is not set.
Found by running generic/562 on a 64k fsblock filesystem and wondering why it reported corrupt files.
Cc: stable@vger.kernel.org # v4.20 Fixes: 3fc9f5e409319e ("xfs: remove xfs_reflink_remap_range") Really-Fixes: 3f68c1f562f1e4 ("xfs: support returning partial reflink results") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_file.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1228,6 +1228,14 @@ out_unlock: xfs_iunlock2_remapping(src, dest); if (ret) trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_); + /* + * If the caller did not set CAN_SHORTEN, then it is not prepared to + * handle partial results -- either the whole remap succeeds, or we + * must say why it did not. In this case, any error should be returned + * to the caller. + */ + if (ret && remapped < len && !(remap_flags & REMAP_FILE_CAN_SHORTEN)) + return ret; return remapped > 0 ? remapped : ret; }
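For contrast, here is how a user-space caller that can tolerate short results typically drives copy_file_range(2): it loops on the return value. Callers such as FICLONE have no such loop, which is why, as the patch argues, the kernel side must surface the error instead of silently returning a short count. (Minimal sketch; the file names are made up.)

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
          int in = open("src.bin", O_RDONLY);
          int out = open("dst.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
          off_t len, done = 0;
          ssize_t ret;

          if (in < 0 || out < 0) {
                  perror("open");
                  return 1;
          }

          len = lseek(in, 0, SEEK_END);
          lseek(in, 0, SEEK_SET);

          /* copy_file_range() may copy fewer bytes than asked for, so a
           * caller that wants the whole file keeps calling until done. */
          while (done < len) {
                  ret = copy_file_range(in, NULL, out, NULL, len - done, 0);
                  if (ret < 0) {
                          perror("copy_file_range"); /* the error must not be lost */
                          return 1;
                  }
                  if (ret == 0)
                          break;                     /* unexpected end of input */
                  done += ret;
          }

          printf("copied %lld of %lld bytes\n", (long long)done, (long long)len);
          return 0;
  }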
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit bd27c7bcdca25ce8067ebb94ded6ac1bd7b47317 upstream.
With the nrext64 feature enabled, it's possible for a data fork to have 2^48 extent mappings. Even with a 64k fsblock size, that maps out to a bmbt containing more than 2^32 blocks. Therefore, this predicate must return a u64 count to avoid an integer wraparound that will cause scrub to do the wrong thing.
It's unlikely that any such filesystem currently exists, because the incore bmbt would consume more than 64GB of kernel memory on its own, and so far nobody except me has driven a filesystem that far, judging from the lack of complaints.
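As a rough sanity check on the arithmetic (the 16-byte on-disk bmbt record size is used illustratively here): a 64k fsblock holds at most about 64k / 16 = 2^12 records, so 2^48 mappings need on the order of 2^48 / 2^12 = 2^36 leaf blocks, comfortably past the 2^32 limit of the old 32-bit xfs_extlen_t counter. Hence the switch to the 64-bit xfs_filblks_t below.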
Cc: stable@vger.kernel.org # v5.19 Fixes: df9ad5cc7a5240 ("xfs: Introduce macros to represent new maximum extent counts for data/attr forks") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_btree.c | 4 ++-- fs/xfs/libxfs/xfs_btree.h | 2 +- fs/xfs/libxfs/xfs_ialloc_btree.c | 4 +++- fs/xfs/scrub/agheader.c | 6 +++--- fs/xfs/scrub/agheader_repair.c | 6 +++--- fs/xfs/scrub/fscounters.c | 2 +- fs/xfs/scrub/ialloc.c | 4 ++-- fs/xfs/scrub/refcount.c | 2 +- fs/xfs/xfs_bmap_util.c | 2 +- 9 files changed, 17 insertions(+), 15 deletions(-)
--- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -5173,7 +5173,7 @@ xfs_btree_count_blocks_helper( int level, void *data) { - xfs_extlen_t *blocks = data; + xfs_filblks_t *blocks = data; (*blocks)++;
return 0; @@ -5183,7 +5183,7 @@ xfs_btree_count_blocks_helper( int xfs_btree_count_blocks( struct xfs_btree_cur *cur, - xfs_extlen_t *blocks) + xfs_filblks_t *blocks) { *blocks = 0; return xfs_btree_visit_blocks(cur, xfs_btree_count_blocks_helper, --- a/fs/xfs/libxfs/xfs_btree.h +++ b/fs/xfs/libxfs/xfs_btree.h @@ -485,7 +485,7 @@ typedef int (*xfs_btree_visit_blocks_fn) int xfs_btree_visit_blocks(struct xfs_btree_cur *cur, xfs_btree_visit_blocks_fn fn, unsigned int flags, void *data);
-int xfs_btree_count_blocks(struct xfs_btree_cur *cur, xfs_extlen_t *blocks); +int xfs_btree_count_blocks(struct xfs_btree_cur *cur, xfs_filblks_t *blocks);
union xfs_btree_rec *xfs_btree_rec_addr(struct xfs_btree_cur *cur, int n, struct xfs_btree_block *block); --- a/fs/xfs/libxfs/xfs_ialloc_btree.c +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c @@ -743,6 +743,7 @@ xfs_finobt_count_blocks( { struct xfs_buf *agbp = NULL; struct xfs_btree_cur *cur; + xfs_filblks_t blocks; int error;
error = xfs_ialloc_read_agi(pag, tp, 0, &agbp); @@ -750,9 +751,10 @@ xfs_finobt_count_blocks( return error;
cur = xfs_finobt_init_cursor(pag, tp, agbp); - error = xfs_btree_count_blocks(cur, tree_blocks); + error = xfs_btree_count_blocks(cur, &blocks); xfs_btree_del_cursor(cur, error); xfs_trans_brelse(tp, agbp); + *tree_blocks = blocks;
return error; } --- a/fs/xfs/scrub/agheader.c +++ b/fs/xfs/scrub/agheader.c @@ -434,7 +434,7 @@ xchk_agf_xref_btreeblks( { struct xfs_agf *agf = sc->sa.agf_bp->b_addr; struct xfs_mount *mp = sc->mp; - xfs_agblock_t blocks; + xfs_filblks_t blocks; xfs_agblock_t btreeblks; int error;
@@ -483,7 +483,7 @@ xchk_agf_xref_refcblks( struct xfs_scrub *sc) { struct xfs_agf *agf = sc->sa.agf_bp->b_addr; - xfs_agblock_t blocks; + xfs_filblks_t blocks; int error;
if (!sc->sa.refc_cur) @@ -816,7 +816,7 @@ xchk_agi_xref_fiblocks( struct xfs_scrub *sc) { struct xfs_agi *agi = sc->sa.agi_bp->b_addr; - xfs_agblock_t blocks; + xfs_filblks_t blocks; int error = 0;
if (!xfs_has_inobtcounts(sc->mp)) --- a/fs/xfs/scrub/agheader_repair.c +++ b/fs/xfs/scrub/agheader_repair.c @@ -256,7 +256,7 @@ xrep_agf_calc_from_btrees( struct xfs_agf *agf = agf_bp->b_addr; struct xfs_mount *mp = sc->mp; xfs_agblock_t btreeblks; - xfs_agblock_t blocks; + xfs_filblks_t blocks; int error;
/* Update the AGF counters from the bnobt. */ @@ -946,7 +946,7 @@ xrep_agi_calc_from_btrees( if (error) goto err; if (xfs_has_inobtcounts(mp)) { - xfs_agblock_t blocks; + xfs_filblks_t blocks;
error = xfs_btree_count_blocks(cur, &blocks); if (error) @@ -959,7 +959,7 @@ xrep_agi_calc_from_btrees( agi->agi_freecount = cpu_to_be32(freecount);
if (xfs_has_finobt(mp) && xfs_has_inobtcounts(mp)) { - xfs_agblock_t blocks; + xfs_filblks_t blocks;
cur = xfs_finobt_init_cursor(sc->sa.pag, sc->tp, agi_bp); error = xfs_btree_count_blocks(cur, &blocks); --- a/fs/xfs/scrub/fscounters.c +++ b/fs/xfs/scrub/fscounters.c @@ -261,7 +261,7 @@ xchk_fscount_btreeblks( struct xchk_fscounters *fsc, xfs_agnumber_t agno) { - xfs_extlen_t blocks; + xfs_filblks_t blocks; int error;
error = xchk_ag_init_existing(sc, agno, &sc->sa); --- a/fs/xfs/scrub/ialloc.c +++ b/fs/xfs/scrub/ialloc.c @@ -652,8 +652,8 @@ xchk_iallocbt_xref_rmap_btreeblks( struct xfs_scrub *sc) { xfs_filblks_t blocks; - xfs_extlen_t inobt_blocks = 0; - xfs_extlen_t finobt_blocks = 0; + xfs_filblks_t inobt_blocks = 0; + xfs_filblks_t finobt_blocks = 0; int error;
if (!sc->sa.ino_cur || !sc->sa.rmap_cur || --- a/fs/xfs/scrub/refcount.c +++ b/fs/xfs/scrub/refcount.c @@ -490,7 +490,7 @@ xchk_refcount_xref_rmap( struct xfs_scrub *sc, xfs_filblks_t cow_blocks) { - xfs_extlen_t refcbt_blocks = 0; + xfs_filblks_t refcbt_blocks = 0; xfs_filblks_t blocks; int error;
--- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -111,7 +111,7 @@ xfs_bmap_count_blocks( struct xfs_mount *mp = ip->i_mount; struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_btree_cur *cur; - xfs_extlen_t btblocks = 0; + xfs_filblks_t btblocks = 0; int error;
*nextents = 0;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit af9f02457f461b23307fe826a37be61ba6e32c92 upstream.
xfs_bmap_rtalloc initializes the bno_hint variable to NULLRTBLOCK (aka NULLFSBLOCK). If the allocation request is for a file range that's adjacent to an existing mapping, it will then change bno_hint to the blkno hint in the bmalloca structure.
In other words, bno_hint is either a rt block number, or it's all 1s. Unfortunately, commit ec12f97f1b8a8f didn't take the NULLRTBLOCK state into account, which means that it tries to translate that into a realtime extent number. We then end up with an obnoxiously high rtx number and pointlessly feed that to the near allocator. This often fails and falls back to the by-size allocator. Seeing as we had no locality hint anyway, this is a waste of time.
Fix the code to detect a lack of bno_hint correctly. This was detected by running xfs/009 with metadir enabled and a 28k rt extent size.
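The subtle point is that NULLRTBLOCK/NULLFSBLOCK is an all-ones value, so a plain truth test cannot detect "no hint". A minimal sketch of the difference, using the names from the hunk below:

    xfs_rtblock_t  bno_hint = NULLFSBLOCK;   /* ~0, i.e. non-zero */

    if (bno_hint)                            /* old test: always true */
        start = xfs_rtb_to_rtx(args.mp, bno_hint);   /* bogus rtx hint */

    if (bno_hint != NULLFSBLOCK)             /* fixed test */
        start = xfs_rtb_to_rtx(args.mp, bno_hint);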
Cc: stable@vger.kernel.org # v6.12 Fixes: ec12f97f1b8a8f ("xfs: make the rtalloc start hint a xfs_rtblock_t") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_rtalloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1295,7 +1295,7 @@ xfs_rtallocate( * For an allocation to an empty file at offset 0, pick an extent that * will space things out in the rt area. */ - if (bno_hint) + if (bno_hint != NULLFSBLOCK) start = xfs_rtb_to_rtx(args.mp, bno_hint); else if (initial_user_data) start = xfs_rtpick_extent(args.mp, tp, maxlen);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit 7f8b718c58783f3ff0810b39e2f62f50ba2549f6 upstream.
V4 symlink blocks didn't have headers, so return early if this is a V4 filesystem.
Cc: stable@vger.kernel.org # v5.1 Fixes: 39708c20ab5133 ("xfs: miscellaneous verifier magic value fixups") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_symlink_remote.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/fs/xfs/libxfs/xfs_symlink_remote.c +++ b/fs/xfs/libxfs/xfs_symlink_remote.c @@ -92,8 +92,10 @@ xfs_symlink_verify( struct xfs_mount *mp = bp->b_mount; struct xfs_dsymlink_hdr *dsl = bp->b_addr;
+ /* no verification of non-crc buffers */ if (!xfs_has_crc(mp)) - return __this_address; + return NULL; + if (!xfs_verify_magic(bp, dsl->sl_magic)) return __this_address; if (!uuid_equal(&dsl->sl_uuid, &mp->m_sb.sb_meta_uuid))
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit ffc3ea4f3c1cc83a86b7497b0c4b0aee7de5480d upstream.
Fix a minor mistake in the scrub tracepoints that can manifest when inode-rooted btrees are enabled. The existing code worked fine for bmap btrees, but we should tighten the code up to be less sloppy.
Cc: stable@vger.kernel.org # v5.7 Fixes: 92219c292af8dd ("xfs: convert btree cursor inode-private member names") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/scrub/trace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -601,7 +601,7 @@ TRACE_EVENT(xchk_ifork_btree_op_error, TP_fast_assign( xfs_fsblock_t fsbno = xchk_btree_cur_fsbno(cur, level); __entry->dev = sc->mp->m_super->s_dev; - __entry->ino = sc->ip->i_ino; + __entry->ino = cur->bc_ino.ip->i_ino; __entry->whichfork = cur->bc_ino.whichfork; __entry->type = sc->sm->sm_type; __assign_str(name);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit 44d9b07e52db25035680713c3428016cadcd2ea1 upstream.
Committing a transaction tx0 with a defer ops chain of (A, B, C) creates a chain of transactions that looks like this:
tx0 -> txA -> txB -> txC
Prior to commit cb042117488dbf, __xfs_trans_commit would run precommits on tx0, then call xfs_defer_finish_noroll to convert A-C to tx[A-C]. Unfortunately, after the finish_noroll loop we forgot to run precommits on txC. That was fixed by adding the second precommit call.
Unfortunately, none of us remembered that xfs_defer_finish_noroll calls __xfs_trans_commit a second time to commit tx0 before finishing work A in txA and committing that. In other words, we run precommits twice on tx0:
xfs_trans_commit(tx0) __xfs_trans_commit(tx0, false) xfs_trans_run_precommits(tx0) xfs_defer_finish_noroll(tx0) xfs_trans_roll(tx0) txA = xfs_trans_dup(tx0) __xfs_trans_commit(tx0, true) xfs_trans_run_precommits(tx0)
This currently isn't an issue because the inode item precommit is idempotent; the iunlink item precommit deletes itself so it can't be called again; and the buffer/dquot item precommits only check the incore objects for corruption. However, it doesn't make sense to run precommits twice.
Fix this situation by only running precommits after finish_noroll.
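Sketched out, the resulting order in __xfs_trans_commit() looks roughly like this (simplified from the hunk below; the deferred-ops emptiness check is omitted):

    if (tp->t_flags & XFS_TRANS_PERM_LOG_RES) {
        error = xfs_defer_finish_noroll(&tp);   /* may roll and commit internally */
        if (error)
            goto out_unreserve;
    }

    /* run precommits exactly once, on whatever transaction we ended up with */
    error = xfs_trans_run_precommits(tp);
    if (error)
        goto out_unreserve;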
Cc: stable@vger.kernel.org # v6.4 Fixes: cb042117488dbf ("xfs: defered work could create precommits") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_trans.c | 16 ++++------------ 1 file changed, 4 insertions(+), 12 deletions(-)
--- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -834,13 +834,6 @@ __xfs_trans_commit(
trace_xfs_trans_commit(tp, _RET_IP_);
- error = xfs_trans_run_precommits(tp); - if (error) { - if (tp->t_flags & XFS_TRANS_PERM_LOG_RES) - xfs_defer_cancel(tp); - goto out_unreserve; - } - /* * Finish deferred items on final commit. Only permanent transactions * should ever have deferred ops. @@ -851,13 +844,12 @@ __xfs_trans_commit( error = xfs_defer_finish_noroll(&tp); if (error) goto out_unreserve; - - /* Run precommits from final tx in defer chain. */ - error = xfs_trans_run_precommits(tp); - if (error) - goto out_unreserve; }
+ error = xfs_trans_run_precommits(tp); + if (error) + goto out_unreserve; + /* * If there is nothing to be logged by the transaction, * then unlock all of the items associated with the
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit 53b001a21c9dff73b64e8c909c41991f01d5d00f upstream.
Debugging a filesystem patch with generic/475 caused the system to hang after observing the following sequences in dmesg:
XFS (dm-0): metadata I/O error in "xfs_imap_to_bp+0x61/0xe0 [xfs]" at daddr 0x491520 len 32 error 5 XFS (dm-0): metadata I/O error in "xfs_btree_read_buf_block+0xba/0x160 [xfs]" at daddr 0x3445608 len 8 error 5 XFS (dm-0): metadata I/O error in "xfs_imap_to_bp+0x61/0xe0 [xfs]" at daddr 0x138e1c0 len 32 error 5 XFS (dm-0): log I/O error -5 XFS (dm-0): Metadata I/O Error (0x1) detected at xfs_trans_read_buf_map+0x1ea/0x4b0 [xfs] (fs/xfs/xfs_trans_buf.c:311). Shutting down filesystem. XFS (dm-0): Please unmount the filesystem and rectify the problem(s) XFS (dm-0): Internal error dqp->q_ino.reserved < dqp->q_ino.count at line 869 of file fs/xfs/xfs_trans_dquot.c. Caller xfs_trans_dqresv+0x236/0x440 [xfs] XFS (dm-0): Corruption detected. Unmount and run xfs_repair XFS (dm-0): Unmounting Filesystem be6bcbcc-9921-4deb-8d16-7cc94e335fa7
The system is stuck in unmount trying to lock a couple of inodes so that they can be purged. The dquot corruption notice above is a clue to what happened -- a link() call tried to set up a transaction to link a child into a directory. Quota reservation for the transaction failed after IO errors shut down the filesystem, but then we forgot to unlock the inodes on our way out. Fix that.
Cc: stable@vger.kernel.org # v6.10 Fixes: bd5562111d5839 ("xfs: Hold inode locks in xfs_trans_alloc_dir") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_trans.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -1374,5 +1374,8 @@ done:
out_cancel: xfs_trans_cancel(tp); + xfs_iunlock(dp, XFS_ILOCK_EXCL); + if (dp != ip) + xfs_iunlock(ip, XFS_ILOCK_EXCL); return error; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kumar Kartikeya Dwivedi memxor@gmail.com
commit 659b9ba7cb2d7adb64618b87ddfaa528a143766e upstream.
Robert Morris reported the following program type which passes the verifier in [0]:
SEC("struct_ops/bpf_cubic_init") void BPF_PROG(bpf_cubic_init, struct sock *sk) { asm volatile("r2 = *(u16*)(r1 + 0)"); // verifier should demand u64 asm volatile("*(u32 *)(r2 +1504) = 0"); // 1280 in some configs }
The second line may or may not work, but the first instruction shouldn't pass, as it's a narrow load into the context structure of the struct ops callback. The code falls back to btf_ctx_access to ensure correctness and obtain the types of pointers. Ensure that the size of the access is correctly checked to be 8 bytes; otherwise the verifier thinks the narrow load obtained a trusted BTF pointer and will permit loads/stores as it sees fit.
Perform the check on size after we've verified that the load is for a pointer field, as for scalar values narrow loads are fine. Access to structs passed as arguments to a BPF program are also treated as scalars, therefore no adjustment is needed in their case.
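For reference, a sketch of the load the verifier accepts after this change, mirroring the selftest updates below (vmlinux.h/bpf_tracing.h includes omitted):

    SEC("struct_ops/bpf_cubic_init")
    void BPF_PROG(bpf_cubic_init, struct sock *sk)
    {
        asm volatile("r2 = *(u64 *)(r1 + 0)");   /* full-width load of the pointer slot */
    }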
Existing verifier selftests are broken by this change, but only because they were incorrect. The verifier tests for d_path were performing a narrow load into the context to obtain the path pointer; had those programs actually run, they would have crashed. The same holds for the verifier_btf_ctx_access tests.
[0]: https://lore.kernel.org/bpf/51338.1732985814@localhost
Fixes: 9e15db66136a ("bpf: Implement accurate raw_tp context access via BTF") Reported-by: Robert Morris rtm@mit.edu Signed-off-by: Kumar Kartikeya Dwivedi memxor@gmail.com Link: https://lore.kernel.org/r/20241212092050.3204165-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/btf.c | 6 ++++++ tools/testing/selftests/bpf/progs/verifier_btf_ctx_access.c | 4 ++-- tools/testing/selftests/bpf/progs/verifier_d_path.c | 4 ++-- 3 files changed, 10 insertions(+), 4 deletions(-)
--- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -6519,6 +6519,12 @@ bool btf_ctx_access(int off, int size, e return false; }
+ if (size != sizeof(u64)) { + bpf_log(log, "func '%s' size %d must be 8\n", + tname, size); + return false; + } + /* check for PTR_TO_RDONLY_BUF_OR_NULL or PTR_TO_RDWR_BUF_OR_NULL */ for (i = 0; i < prog->aux->ctx_arg_info_size; i++) { const struct bpf_ctx_arg_aux *ctx_arg_info = &prog->aux->ctx_arg_info[i]; --- a/tools/testing/selftests/bpf/progs/verifier_btf_ctx_access.c +++ b/tools/testing/selftests/bpf/progs/verifier_btf_ctx_access.c @@ -11,7 +11,7 @@ __success __retval(0) __naked void btf_ctx_access_accept(void) { asm volatile (" \ - r2 = *(u32*)(r1 + 8); /* load 2nd argument value (int pointer) */\ + r2 = *(u64 *)(r1 + 8); /* load 2nd argument value (int pointer) */\ r0 = 0; \ exit; \ " ::: __clobber_all); @@ -23,7 +23,7 @@ __success __retval(0) __naked void ctx_access_u32_pointer_accept(void) { asm volatile (" \ - r2 = *(u32*)(r1 + 0); /* load 1nd argument value (u32 pointer) */\ + r2 = *(u64 *)(r1 + 0); /* load 1nd argument value (u32 pointer) */\ r0 = 0; \ exit; \ " ::: __clobber_all); --- a/tools/testing/selftests/bpf/progs/verifier_d_path.c +++ b/tools/testing/selftests/bpf/progs/verifier_d_path.c @@ -11,7 +11,7 @@ __success __retval(0) __naked void d_path_accept(void) { asm volatile (" \ - r1 = *(u32*)(r1 + 0); \ + r1 = *(u64 *)(r1 + 0); \ r2 = r10; \ r2 += -8; \ r6 = 0; \ @@ -31,7 +31,7 @@ __failure __msg("helper call is not allo __naked void d_path_reject(void) { asm volatile (" \ - r1 = *(u32*)(r1 + 0); \ + r1 = *(u64 *)(r1 + 0); \ r2 = r10; \ r2 += -8; \ r6 = 0; \
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jann Horn jannh@google.com
commit 7d0d673627e20cfa3b21a829a896ce03b58a4f1c upstream.
Currently, the pointer stored in call->prog_array is loaded in __uprobe_perf_func(), with no RCU annotation and no immediately visible RCU protection, so it looks as if the loaded pointer can immediately become dangling. Later, bpf_prog_run_array_uprobe() starts an RCU-trace read-side critical section, but this is too late. It then uses rcu_dereference_check(), but this use of rcu_dereference_check() does not actually dereference anything.
Fix it by aligning the semantics to bpf_prog_run_array(): Let the caller provide rcu_read_lock_trace() protection and then load call->prog_array with rcu_dereference_check().
This issue seems to be theoretical: I don't know of any way to reach this code without having handle_swbp() further up the stack, which is already holding a rcu_read_lock_trace() lock, so where we take rcu_read_lock_trace() in __uprobe_perf_func()/bpf_prog_run_array_uprobe() doesn't actually have any effect.
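In other words, the caller now follows the usual pattern (this is essentially what the trace_uprobe.c hunk below does):

    rcu_read_lock_trace();
    array = rcu_dereference_check(call->prog_array, rcu_read_lock_trace_held());
    ret = bpf_prog_run_array_uprobe(array, regs, bpf_prog_run);
    rcu_read_unlock_trace();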
Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps") Suggested-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Jann Horn jannh@google.com Signed-off-by: Andrii Nakryiko andrii@kernel.org Link: https://lore.kernel.org/bpf/20241210-bpf-fix-uprobe-uaf-v4-1-5fc8959b2b74@go... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/bpf.h | 13 +++++-------- kernel/trace/trace_uprobe.c | 6 +++++- 2 files changed, 10 insertions(+), 9 deletions(-)
--- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2157,26 +2157,25 @@ bpf_prog_run_array(const struct bpf_prog * rcu-protected dynamically sized maps. */ static __always_inline u32 -bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu, +bpf_prog_run_array_uprobe(const struct bpf_prog_array *array, const void *ctx, bpf_prog_run_fn run_prog) { const struct bpf_prog_array_item *item; const struct bpf_prog *prog; - const struct bpf_prog_array *array; struct bpf_run_ctx *old_run_ctx; struct bpf_trace_run_ctx run_ctx; u32 ret = 1;
might_fault(); + RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held"); + + if (unlikely(!array)) + return ret;
- rcu_read_lock_trace(); migrate_disable();
run_ctx.is_uprobe = true;
- array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held()); - if (unlikely(!array)) - goto out; old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); item = &array->items[0]; while ((prog = READ_ONCE(item->prog))) { @@ -2191,9 +2190,7 @@ bpf_prog_run_array_uprobe(const struct b rcu_read_unlock(); } bpf_reset_run_ctx(old_run_ctx); -out: migrate_enable(); - rcu_read_unlock_trace(); return ret; }
--- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -1400,9 +1400,13 @@ static void __uprobe_perf_func(struct tr
#ifdef CONFIG_BPF_EVENTS if (bpf_prog_array_valid(call)) { + const struct bpf_prog_array *array; u32 ret;
- ret = bpf_prog_run_array_uprobe(call->prog_array, regs, bpf_prog_run); + rcu_read_lock_trace(); + array = rcu_dereference_check(call->prog_array, rcu_read_lock_trace_held()); + ret = bpf_prog_run_array_uprobe(array, regs, bpf_prog_run); + rcu_read_unlock_trace(); if (!ret) return; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Olsa jolsa@kernel.org
commit 978c4486cca5c7b9253d3ab98a88c8e769cb9bbd upstream.
Syzbot reported [1] crash that happens for following tracing scenario:
- create a tracepoint perf event with attr.inherit=1, attach it to the process and set a bpf program on it - the attached process forks -> the child creates an inherited event
the new child event shares the parent's bpf program and tp_event (hence prog_array) which is global for tracepoint
- exit both the process and its child -> release both events - the first perf_event_detach_bpf_prog call releases tp_event->prog_array and the second perf_event_detach_bpf_prog crashes, because tp_event->prog_array is NULL
The fix makes sure that perf_event_detach_bpf_prog checks that prog_array is valid before it tries to remove the bpf program from it.
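Spelled out, the guard added below is simply:

    old_array = bpf_event_rcu_dereference(event->tp_event->prog_array);
    if (!old_array)
        goto put;    /* already released by the sibling event's detach */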
[1] https://lore.kernel.org/bpf/Z1MR6dCIKajNS6nU@krava/T/#m91dbf0688221ec7a7fc95...
Fixes: 0ee288e69d03 ("bpf,perf: Fix perf_event_detach_bpf_prog error handling") Reported-by: syzbot+2e0d2840414ce817aaac@syzkaller.appspotmail.com Signed-off-by: Jiri Olsa jolsa@kernel.org Signed-off-by: Andrii Nakryiko andrii@kernel.org Link: https://lore.kernel.org/bpf/20241208142507.1207698-1-jolsa@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/bpf_trace.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -2215,6 +2215,9 @@ void perf_event_detach_bpf_prog(struct p goto unlock;
old_array = bpf_event_rcu_dereference(event->tp_event->prog_array); + if (!old_array) + goto put; + ret = bpf_prog_array_copy(old_array, event->prog, NULL, 0, &new_array); if (ret < 0) { bpf_prog_array_delete_safe(old_array, event->prog); @@ -2223,6 +2226,7 @@ void perf_event_detach_bpf_prog(struct p bpf_prog_array_free_sleepable(old_array); }
+put: /* * It could be that the bpf_prog is not sleepable (and will be freed * via normal RCU), but is called from a point that supports sleepable
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Luczaj mhal@rbox.co
commit ed1fc5d76b81a4d681211333c026202cad4d5649 upstream.
Element replace (with a socket different from the one stored) may race with socket's close() link popping & unlinking. __sock_map_delete() unconditionally unrefs the (wrong) element:
// set map[0] = s0 map_update_elem(map, 0, s0)
// drop fd of s0 close(s0) sock_map_close() lock_sock(sk) (s0!) sock_map_remove_links(sk) link = sk_psock_link_pop() sock_map_unlink(sk, link) sock_map_delete_from_link // replace map[0] with s1 map_update_elem(map, 0, s1) sock_map_update_elem (s1!) lock_sock(sk) sock_map_update_common psock = sk_psock(sk) spin_lock(&stab->lock) osk = stab->sks[idx] sock_map_add_link(..., &stab->sks[idx]) sock_map_unref(osk, &stab->sks[idx]) psock = sk_psock(osk) sk_psock_put(sk, psock) if (refcount_dec_and_test(&psock)) sk_psock_drop(sk, psock) spin_unlock(&stab->lock) unlock_sock(sk) __sock_map_delete spin_lock(&stab->lock) sk = *psk // s1 replaced s0; sk == s1 if (!sk_test || sk_test == sk) // sk_test (s0) != sk (s1); no branch sk = xchg(psk, NULL) if (sk) sock_map_unref(sk, psk) // unref s1; sks[idx] will dangle psock = sk_psock(sk) sk_psock_put(sk, psock) if (refcount_dec_and_test()) sk_psock_drop(sk, psock) spin_unlock(&stab->lock) release_sock(sk)
Then close(map) enqueues bpf_map_free_deferred, which finally calls sock_map_free(). This results in some refcount_t warnings along with a KASAN splat [1].
Fix __sock_map_delete(): do not allow sock_map_unref() on elements that may have been replaced.
[1]: BUG: KASAN: slab-use-after-free in sock_map_free+0x10e/0x330 Write of size 4 at addr ffff88811f5b9100 by task kworker/u64:12/1063
CPU: 14 UID: 0 PID: 1063 Comm: kworker/u64:12 Not tainted 6.12.0+ #125 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 Workqueue: events_unbound bpf_map_free_deferred Call Trace: <TASK> dump_stack_lvl+0x68/0x90 print_report+0x174/0x4f6 kasan_report+0xb9/0x190 kasan_check_range+0x10f/0x1e0 sock_map_free+0x10e/0x330 bpf_map_free_deferred+0x173/0x320 process_one_work+0x846/0x1420 worker_thread+0x5b3/0xf80 kthread+0x29e/0x360 ret_from_fork+0x2d/0x70 ret_from_fork_asm+0x1a/0x30 </TASK>
Allocated by task 1202: kasan_save_stack+0x1e/0x40 kasan_save_track+0x10/0x30 __kasan_slab_alloc+0x85/0x90 kmem_cache_alloc_noprof+0x131/0x450 sk_prot_alloc+0x5b/0x220 sk_alloc+0x2c/0x870 unix_create1+0x88/0x8a0 unix_create+0xc5/0x180 __sock_create+0x241/0x650 __sys_socketpair+0x1ce/0x420 __x64_sys_socketpair+0x92/0x100 do_syscall_64+0x93/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 46: kasan_save_stack+0x1e/0x40 kasan_save_track+0x10/0x30 kasan_save_free_info+0x37/0x60 __kasan_slab_free+0x4b/0x70 kmem_cache_free+0x1a1/0x590 __sk_destruct+0x388/0x5a0 sk_psock_destroy+0x73e/0xa50 process_one_work+0x846/0x1420 worker_thread+0x5b3/0xf80 kthread+0x29e/0x360 ret_from_fork+0x2d/0x70 ret_from_fork_asm+0x1a/0x30
The buggy address belongs to the object at ffff88811f5b9080 which belongs to the cache UNIX-STREAM of size 1984 The buggy address is located 128 bytes inside of freed 1984-byte region [ffff88811f5b9080, ffff88811f5b9840)
The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11f5b8 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 memcg:ffff888127d49401 flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff) page_type: f5(slab) raw: 0017ffffc0000040 ffff8881042e4500 dead000000000122 0000000000000000 raw: 0000000000000000 00000000800f000f 00000001f5000000 ffff888127d49401 head: 0017ffffc0000040 ffff8881042e4500 dead000000000122 0000000000000000 head: 0000000000000000 00000000800f000f 00000001f5000000 ffff888127d49401 head: 0017ffffc0000003 ffffea00047d6e01 ffffffffffffffff 0000000000000000 head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff88811f5b9000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88811f5b9080: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88811f5b9180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88811f5b9200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Disabling lock debugging due to kernel taint
refcount_t: addition on 0; use-after-free. WARNING: CPU: 14 PID: 1063 at lib/refcount.c:25 refcount_warn_saturate+0xce/0x150 CPU: 14 UID: 0 PID: 1063 Comm: kworker/u64:12 Tainted: G B 6.12.0+ #125 Tainted: [B]=BAD_PAGE Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 Workqueue: events_unbound bpf_map_free_deferred RIP: 0010:refcount_warn_saturate+0xce/0x150 Code: 34 73 eb 03 01 e8 82 53 ad fe 0f 0b eb b1 80 3d 27 73 eb 03 00 75 a8 48 c7 c7 80 bd 95 84 c6 05 17 73 eb 03 01 e8 62 53 ad fe <0f> 0b eb 91 80 3d 06 73 eb 03 00 75 88 48 c7 c7 e0 bd 95 84 c6 05 RSP: 0018:ffff88815c49fc70 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88811f5b9100 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001 RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed10bcde6349 R10: ffff8885e6f31a4b R11: 0000000000000000 R12: ffff88813be0b000 R13: ffff88811f5b9100 R14: ffff88811f5b9080 R15: ffff88813be0b024 FS: 0000000000000000(0000) GS:ffff8885e6f00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055dda99b0250 CR3: 000000015dbac000 CR4: 0000000000752ef0 PKRU: 55555554 Call Trace: <TASK> ? __warn.cold+0x5f/0x1ff ? refcount_warn_saturate+0xce/0x150 ? report_bug+0x1ec/0x390 ? handle_bug+0x58/0x90 ? exc_invalid_op+0x13/0x40 ? asm_exc_invalid_op+0x16/0x20 ? refcount_warn_saturate+0xce/0x150 sock_map_free+0x2e5/0x330 bpf_map_free_deferred+0x173/0x320 process_one_work+0x846/0x1420 worker_thread+0x5b3/0xf80 kthread+0x29e/0x360 ret_from_fork+0x2d/0x70 ret_from_fork_asm+0x1a/0x30 </TASK> irq event stamp: 10741 hardirqs last enabled at (10741): [<ffffffff84400ec6>] asm_sysvec_apic_timer_interrupt+0x16/0x20 hardirqs last disabled at (10740): [<ffffffff811e532d>] handle_softirqs+0x60d/0x770 softirqs last enabled at (10506): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210 softirqs last disabled at (10301): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210
refcount_t: underflow; use-after-free. WARNING: CPU: 14 PID: 1063 at lib/refcount.c:28 refcount_warn_saturate+0xee/0x150 CPU: 14 UID: 0 PID: 1063 Comm: kworker/u64:12 Tainted: G B W 6.12.0+ #125 Tainted: [B]=BAD_PAGE, [W]=WARN Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 Workqueue: events_unbound bpf_map_free_deferred RIP: 0010:refcount_warn_saturate+0xee/0x150 Code: 17 73 eb 03 01 e8 62 53 ad fe 0f 0b eb 91 80 3d 06 73 eb 03 00 75 88 48 c7 c7 e0 bd 95 84 c6 05 f6 72 eb 03 01 e8 42 53 ad fe <0f> 0b e9 6e ff ff ff 80 3d e6 72 eb 03 00 0f 85 61 ff ff ff 48 c7 RSP: 0018:ffff88815c49fc70 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88811f5b9100 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001 RBP: 0000000000000003 R08: 0000000000000001 R09: ffffed10bcde6349 R10: ffff8885e6f31a4b R11: 0000000000000000 R12: ffff88813be0b000 R13: ffff88811f5b9100 R14: ffff88811f5b9080 R15: ffff88813be0b024 FS: 0000000000000000(0000) GS:ffff8885e6f00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055dda99b0250 CR3: 000000015dbac000 CR4: 0000000000752ef0 PKRU: 55555554 Call Trace: <TASK> ? __warn.cold+0x5f/0x1ff ? refcount_warn_saturate+0xee/0x150 ? report_bug+0x1ec/0x390 ? handle_bug+0x58/0x90 ? exc_invalid_op+0x13/0x40 ? asm_exc_invalid_op+0x16/0x20 ? refcount_warn_saturate+0xee/0x150 sock_map_free+0x2d3/0x330 bpf_map_free_deferred+0x173/0x320 process_one_work+0x846/0x1420 worker_thread+0x5b3/0xf80 kthread+0x29e/0x360 ret_from_fork+0x2d/0x70 ret_from_fork_asm+0x1a/0x30 </TASK> irq event stamp: 10741 hardirqs last enabled at (10741): [<ffffffff84400ec6>] asm_sysvec_apic_timer_interrupt+0x16/0x20 hardirqs last disabled at (10740): [<ffffffff811e532d>] handle_softirqs+0x60d/0x770 softirqs last enabled at (10506): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210 softirqs last disabled at (10301): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Michal Luczaj mhal@rbox.co Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: John Fastabend john.fastabend@gmail.com Link: https://lore.kernel.org/bpf/20241202-sockmap-replace-v1-3-1e88579e7bd5@rbox.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/sock_map.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -411,12 +411,11 @@ static void *sock_map_lookup_sys(struct static int __sock_map_delete(struct bpf_stab *stab, struct sock *sk_test, struct sock **psk) { - struct sock *sk; + struct sock *sk = NULL; int err = 0;
spin_lock_bh(&stab->lock); - sk = *psk; - if (!sk_test || sk_test == sk) + if (!sk_test || sk_test == *psk) sk = xchg(psk, NULL);
if (likely(sk))
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Luczaj mhal@rbox.co
commit 75e072a390da9a22e7ae4a4e8434dfca5da499fb upstream.
Consider a sockmap entry being updated with the same socket:
osk = stab->sks[idx]; sock_map_add_link(psock, link, map, &stab->sks[idx]); stab->sks[idx] = sk; if (osk) sock_map_unref(osk, &stab->sks[idx]);
Due to sock_map_unref(), which invokes sock_map_del_link(), all the psock's links for stab->sks[idx] are torn:
list_for_each_entry_safe(link, tmp, &psock->link, list) { if (link->link_raw == link_raw) { ... list_del(&link->list); sk_psock_free_link(link); } }
And that includes the new link sock_map_add_link() added just before the unref.
This results in a sockmap holding a socket, but without the respective link. This in turn means that close(sock) won't trigger the cleanup, i.e. a closed socket will not be automatically removed from the sockmap.
Stop tearing the links when a matching link_raw is found.
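A simplified sketch of the loop with the fix applied (the verdict handling shown above is elided):

    list_for_each_entry_safe(link, tmp, &psock->link, list) {
        if (link->link_raw == link_raw) {
            list_del(&link->list);
            sk_psock_free_link(link);
            break;    /* stop here so the link just added by the update survives */
        }
    }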
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Michal Luczaj mhal@rbox.co Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: John Fastabend john.fastabend@gmail.com Link: https://lore.kernel.org/bpf/20241202-sockmap-replace-v1-1-1e88579e7bd5@rbox.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/sock_map.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -159,6 +159,7 @@ static void sock_map_del_link(struct soc verdict_stop = true; list_del(&link->list); sk_psock_free_link(link); + break; } } spin_unlock_bh(&psock->link_lock);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kumar Kartikeya Dwivedi memxor@gmail.com
commit 838a10bd2ebfe11a60dd67687533a7cfc220cc86 upstream.
Arguments to a raw tracepoint are tagged as trusted, which carries the semantics that the pointer will be non-NULL. However, in certain cases, a raw tracepoint argument may end up being NULL. More context about this issue is available in [0].
Thus, there is a discrepancy between the reality, that raw_tp arguments can actually be NULL, and the verifier's knowledge, that they are never NULL, causing explicit NULL check branch to be dead code eliminated.
A previous attempt [1], i.e. the second fixed commit, was made to simulate symbolic execution as if in most accesses, the argument is a non-NULL raw_tp, except for conditional jumps. This tried to suppress branch prediction while preserving compatibility, but surfaced issues with production programs that were difficult to solve without increasing verifier complexity. A more complete discussion of issues and fixes is available at [2].
Fix this by maintaining an explicit list of tracepoints where the arguments are known to be NULL, and mark the positional arguments as PTR_MAYBE_NULL. Additionally, capture the tracepoints where arguments are known to be ERR_PTR, and mark these arguments as scalar values to prevent potential dereference.
Each hex digit is used to encode NULL-ness (0x1) or ERR_PTR-ness (0x2), shifted by the zero-indexed argument number x 4. This can be represented as follows: 1st arg: 0x1 2nd arg: 0x10 3rd arg: 0x100 ... and so on (likewise for ERR_PTR case).
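For example, an entry of 0x1 | 0x200 (as used for "cachefiles_lookup" in the table below) marks argument 0 as possibly NULL and argument 2 as possibly an ERR_PTR. The verifier decodes the mask per argument as in the hunk below:

    if (raw_tp_null_args[i].mask & (0x1 << (arg * 4)))   /* may be NULL */
        info->reg_type |= PTR_MAYBE_NULL;
    if (raw_tp_null_args[i].mask & (0x2 << (arg * 4)))   /* may be ERR_PTR */
        ptr_err_raw_tp = true;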
In the future, an automated pass will be used to produce such a list, or insert __nullable annotations automatically for tracepoints. Each compilation unit will be analyzed and results will be collated to find whether a tracepoint pointer is definitely not null, maybe null, or an unknown state where verifier conservatively marks it PTR_MAYBE_NULL. A proof of concept of this tool from Eduard is available at [3].
Note that in case we don't find a specification in the raw_tp_null_args array and the tracepoint belongs to a kernel module, we will conservatively mark the arguments as PTR_MAYBE_NULL. This is because unlike for in-tree modules, out-of-tree module tracepoints may pass NULL freely to the tracepoint. We don't protect against such tracepoints passing ERR_PTR (which is uncommon anyway), lest we mark all such arguments as SCALAR_VALUE.
While we are at it, let's adjust the test raw_tp_null to not dereference skb->mark, as that won't be allowed anymore, and make it more robust by using inline assembly to test the dead code elimination behavior, which should still stay the same.
[0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.c... [1]: https://lore.kernel.org/all/20241104171959.2938862-1-memxor@gmail.com [2]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com [3]: https://github.com/eddyz87/llvm-project/tree/nullness-for-tracepoint-params
Reported-by: Juri Lelli juri.lelli@redhat.com # original bug Reported-by: Manu Bretelle chantra@meta.com # bugs in masking fix Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs") Fixes: cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL") Reviewed-by: Eduard Zingerman eddyz87@gmail.com Co-developed-by: Jiri Olsa jolsa@kernel.org Signed-off-by: Jiri Olsa jolsa@kernel.org Signed-off-by: Kumar Kartikeya Dwivedi memxor@gmail.com Link: https://lore.kernel.org/r/20241213221929.3495062-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/btf.c | 138 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 138 insertions(+)
--- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -6415,6 +6415,101 @@ int btf_ctx_arg_offset(const struct btf return off; }
+struct bpf_raw_tp_null_args { + const char *func; + u64 mask; +}; + +static const struct bpf_raw_tp_null_args raw_tp_null_args[] = { + /* sched */ + { "sched_pi_setprio", 0x10 }, + /* ... from sched_numa_pair_template event class */ + { "sched_stick_numa", 0x100 }, + { "sched_swap_numa", 0x100 }, + /* afs */ + { "afs_make_fs_call", 0x10 }, + { "afs_make_fs_calli", 0x10 }, + { "afs_make_fs_call1", 0x10 }, + { "afs_make_fs_call2", 0x10 }, + { "afs_protocol_error", 0x1 }, + { "afs_flock_ev", 0x10 }, + /* cachefiles */ + { "cachefiles_lookup", 0x1 | 0x200 }, + { "cachefiles_unlink", 0x1 }, + { "cachefiles_rename", 0x1 }, + { "cachefiles_prep_read", 0x1 }, + { "cachefiles_mark_active", 0x1 }, + { "cachefiles_mark_failed", 0x1 }, + { "cachefiles_mark_inactive", 0x1 }, + { "cachefiles_vfs_error", 0x1 }, + { "cachefiles_io_error", 0x1 }, + { "cachefiles_ondemand_open", 0x1 }, + { "cachefiles_ondemand_copen", 0x1 }, + { "cachefiles_ondemand_close", 0x1 }, + { "cachefiles_ondemand_read", 0x1 }, + { "cachefiles_ondemand_cread", 0x1 }, + { "cachefiles_ondemand_fd_write", 0x1 }, + { "cachefiles_ondemand_fd_release", 0x1 }, + /* ext4, from ext4__mballoc event class */ + { "ext4_mballoc_discard", 0x10 }, + { "ext4_mballoc_free", 0x10 }, + /* fib */ + { "fib_table_lookup", 0x100 }, + /* filelock */ + /* ... from filelock_lock event class */ + { "posix_lock_inode", 0x10 }, + { "fcntl_setlk", 0x10 }, + { "locks_remove_posix", 0x10 }, + { "flock_lock_inode", 0x10 }, + /* ... from filelock_lease event class */ + { "break_lease_noblock", 0x10 }, + { "break_lease_block", 0x10 }, + { "break_lease_unblock", 0x10 }, + { "generic_delete_lease", 0x10 }, + { "time_out_leases", 0x10 }, + /* host1x */ + { "host1x_cdma_push_gather", 0x10000 }, + /* huge_memory */ + { "mm_khugepaged_scan_pmd", 0x10 }, + { "mm_collapse_huge_page_isolate", 0x1 }, + { "mm_khugepaged_scan_file", 0x10 }, + { "mm_khugepaged_collapse_file", 0x10 }, + /* kmem */ + { "mm_page_alloc", 0x1 }, + { "mm_page_pcpu_drain", 0x1 }, + /* .. from mm_page event class */ + { "mm_page_alloc_zone_locked", 0x1 }, + /* netfs */ + { "netfs_failure", 0x10 }, + /* power */ + { "device_pm_callback_start", 0x10 }, + /* qdisc */ + { "qdisc_dequeue", 0x1000 }, + /* rxrpc */ + { "rxrpc_recvdata", 0x1 }, + { "rxrpc_resend", 0x10 }, + /* sunrpc */ + { "xs_stream_read_data", 0x1 }, + /* ... from xprt_cong_event event class */ + { "xprt_reserve_cong", 0x10 }, + { "xprt_release_cong", 0x10 }, + { "xprt_get_cong", 0x10 }, + { "xprt_put_cong", 0x10 }, + /* tcp */ + { "tcp_send_reset", 0x11 }, + /* tegra_apb_dma */ + { "tegra_dma_tx_status", 0x100 }, + /* timer_migration */ + { "tmigr_update_events", 0x1 }, + /* writeback, from writeback_folio_template event class */ + { "writeback_dirty_folio", 0x10 }, + { "folio_wait_writeback", 0x10 }, + /* rdma */ + { "mr_integ_alloc", 0x2000 }, + /* bpf_testmod */ + { "bpf_testmod_test_read", 0x0 }, +}; + bool btf_ctx_access(int off, int size, enum bpf_access_type type, const struct bpf_prog *prog, struct bpf_insn_access_aux *info) @@ -6425,6 +6520,7 @@ bool btf_ctx_access(int off, int size, e const char *tname = prog->aux->attach_func_name; struct bpf_verifier_log *log = info->log; const struct btf_param *args; + bool ptr_err_raw_tp = false; const char *tag_value; u32 nr_args, arg; int i, ret; @@ -6573,6 +6669,39 @@ bool btf_ctx_access(int off, int size, e if (btf_param_match_suffix(btf, &args[arg], "__nullable")) info->reg_type |= PTR_MAYBE_NULL;
+ if (prog->expected_attach_type == BPF_TRACE_RAW_TP) { + struct btf *btf = prog->aux->attach_btf; + const struct btf_type *t; + const char *tname; + + /* BTF lookups cannot fail, return false on error */ + t = btf_type_by_id(btf, prog->aux->attach_btf_id); + if (!t) + return false; + tname = btf_name_by_offset(btf, t->name_off); + if (!tname) + return false; + /* Checked by bpf_check_attach_target */ + tname += sizeof("btf_trace_") - 1; + for (i = 0; i < ARRAY_SIZE(raw_tp_null_args); i++) { + /* Is this a func with potential NULL args? */ + if (strcmp(tname, raw_tp_null_args[i].func)) + continue; + if (raw_tp_null_args[i].mask & (0x1 << (arg * 4))) + info->reg_type |= PTR_MAYBE_NULL; + /* Is the current arg IS_ERR? */ + if (raw_tp_null_args[i].mask & (0x2 << (arg * 4))) + ptr_err_raw_tp = true; + break; + } + /* If we don't know NULL-ness specification and the tracepoint + * is coming from a loadable module, be conservative and mark + * argument as PTR_MAYBE_NULL. + */ + if (i == ARRAY_SIZE(raw_tp_null_args) && btf_is_module(btf)) + info->reg_type |= PTR_MAYBE_NULL; + } + if (tgt_prog) { enum bpf_prog_type tgt_type;
@@ -6617,6 +6746,15 @@ bool btf_ctx_access(int off, int size, e bpf_log(log, "func '%s' arg%d has btf_id %d type %s '%s'\n", tname, arg, info->btf_id, btf_type_str(t), __btf_name_by_offset(btf, t->name_off)); + + /* Perform all checks on the validity of type for this argument, but if + * we know it can be IS_ERR at runtime, scrub pointer type and mark as + * scalar. + */ + if (ptr_err_raw_tp) { + bpf_log(log, "marking pointer arg%d as scalar as it may encode error", arg); + info->reg_type = SCALAR_VALUE; + } return true; } EXPORT_SYMBOL_GPL(btf_ctx_access);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namhyung Kim namhyung@kernel.org
[ Upstream commit 23c44f6c83257923b179461694edcf62749bedd5 ]
The build-id events written at the end of the record session are broken due to unexpected data. The write_buildid() writes the fixed length event first and then variable length filename.
But a recent change made it write more data in the padding area accidentally. So readers of the event see zero-filled data for the next entry and treat it incorrectly. This resulted in wrong kernel symbols because the kernel DSO loaded a random vmlinux image in the path as it didn't have a valid build-id.
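Concretely, with the fix the variable part and the header size are computed as (names as in the hunk below):

    len = PERF_ALIGN(name_len + 1, sizeof(u64));   /* filename + NUL, u64-aligned */
    b.header.size = sizeof(b) + len;

E.g. for a 10-byte filename, len rounds 11 up to 16, so the record is sizeof(b) + 16 bytes and the alignment padding no longer bleeds into the next entry.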
Fixes: ae39ba16554e ("perf inject: Fix build ID injection") Reported-by: Linus Torvalds torvalds@linux-foundation.org Tested-by: Arnaldo Carvalho de Melo acme@redhat.com Reviewed-by: Ian Rogers irogers@google.com Link: https://lore.kernel.org/r/Z0aRFFW9xMh3mqKB@google.com Signed-off-by: Namhyung Kim namhyung@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/build-id.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c index 8982f68e7230..e763e8d99a43 100644 --- a/tools/perf/util/build-id.c +++ b/tools/perf/util/build-id.c @@ -277,7 +277,7 @@ static int write_buildid(const char *name, size_t name_len, struct build_id *bid struct perf_record_header_build_id b; size_t len;
- len = sizeof(b) + name_len + 1; + len = name_len + 1; len = PERF_ALIGN(len, sizeof(u64));
memset(&b, 0, sizeof(b)); @@ -286,7 +286,7 @@ static int write_buildid(const char *name, size_t name_len, struct build_id *bid misc |= PERF_RECORD_MISC_BUILD_ID_SIZE; b.pid = pid; b.header.misc = misc; - b.header.size = len; + b.header.size = sizeof(b) + len;
err = do_write(fd, &b, sizeof(b)); if (err < 0)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lin Ma linma@zju.edu.cn
[ Upstream commit 2e3dbf938656986cce73ac4083500d0bcfbffe24 ]
Since the netlink attribute range validation provides inclusive checking, the *max* of attribute NL80211_ATTR_MLO_LINK_ID should be IEEE80211_MLD_MAX_NUM_LINKS - 1; otherwise it causes an off-by-one.
One crash stack for demonstration: ================================================================== BUG: KASAN: wild-memory-access in ieee80211_tx_control_port+0x3b6/0xca0 net/mac80211/tx.c:5939 Read of size 6 at addr 001102080000000c by task fuzzer.386/9508
CPU: 1 PID: 9508 Comm: syz.1.386 Not tainted 6.1.70 #2 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x177/0x231 lib/dump_stack.c:106 print_report+0xe0/0x750 mm/kasan/report.c:398 kasan_report+0x139/0x170 mm/kasan/report.c:495 kasan_check_range+0x287/0x290 mm/kasan/generic.c:189 memcpy+0x25/0x60 mm/kasan/shadow.c:65 ieee80211_tx_control_port+0x3b6/0xca0 net/mac80211/tx.c:5939 rdev_tx_control_port net/wireless/rdev-ops.h:761 [inline] nl80211_tx_control_port+0x7b3/0xc40 net/wireless/nl80211.c:15453 genl_family_rcv_msg_doit+0x22e/0x320 net/netlink/genetlink.c:756 genl_family_rcv_msg net/netlink/genetlink.c:833 [inline] genl_rcv_msg+0x539/0x740 net/netlink/genetlink.c:850 netlink_rcv_skb+0x1de/0x420 net/netlink/af_netlink.c:2508 genl_rcv+0x24/0x40 net/netlink/genetlink.c:861 netlink_unicast_kernel net/netlink/af_netlink.c:1326 [inline] netlink_unicast+0x74b/0x8c0 net/netlink/af_netlink.c:1352 netlink_sendmsg+0x882/0xb90 net/netlink/af_netlink.c:1874 sock_sendmsg_nosec net/socket.c:716 [inline] __sock_sendmsg net/socket.c:728 [inline] ____sys_sendmsg+0x5cc/0x8f0 net/socket.c:2499 ___sys_sendmsg+0x21c/0x290 net/socket.c:2553 __sys_sendmsg net/socket.c:2582 [inline] __do_sys_sendmsg net/socket.c:2591 [inline] __se_sys_sendmsg+0x19e/0x270 net/socket.c:2589 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x45/0x90 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Update the policy to ensure correct validation.
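For illustration, taking IEEE80211_MLD_MAX_NUM_LINKS as 15 (its value in current kernels): valid link IDs are 0..14, but

    NLA_POLICY_RANGE(NLA_U8, 0, IEEE80211_MLD_MAX_NUM_LINKS)

is an inclusive range and therefore also accepts 15, one past the last valid link, which is the off-by-one the crash above exploits. The corrected policy below uses IEEE80211_MLD_MAX_NUM_LINKS - 1 as the upper bound.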
Fixes: 7b0a0e3c3a88 ("wifi: cfg80211: do some rework towards MLO link APIs") Signed-off-by: Lin Ma linma@zju.edu.cn Suggested-by: Cengiz Can cengiz.can@canonical.com Link: https://patch.msgid.link/20241130170526.96698-1-linma@zju.edu.cn Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/nl80211.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 9b1b9dc5a7eb..1e78f575fb56 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -814,7 +814,7 @@ static const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = { [NL80211_ATTR_MLO_LINKS] = NLA_POLICY_NESTED_ARRAY(nl80211_policy), [NL80211_ATTR_MLO_LINK_ID] = - NLA_POLICY_RANGE(NLA_U8, 0, IEEE80211_MLD_MAX_NUM_LINKS), + NLA_POLICY_RANGE(NLA_U8, 0, IEEE80211_MLD_MAX_NUM_LINKS - 1), [NL80211_ATTR_MLD_ADDR] = NLA_POLICY_EXACT_LEN(ETH_ALEN), [NL80211_ATTR_MLO_SUPPORT] = { .type = NLA_FLAG }, [NL80211_ATTR_MAX_NUM_AKM_SUITES] = { .type = NLA_REJECT },
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Haoyu Li lihaoyu499@gmail.com
[ Upstream commit 496db69fd860570145f7c266b31f3af85fca5b00 ]
With the new __counted_by annotation in cfg80211_mbssid_elems, the "cnt" struct member must be set before accessing the "elem" array. Failing to do so will trigger a runtime warning when enabling CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE.
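A minimal sketch of the ordering rule (illustrative struct, not the exact cfg80211 definition): with __counted_by, the bounds checkers treat the flexible array as having exactly "cnt" valid elements, so the counter must be assigned before the first element store:

    struct elems {
        int cnt;
        struct elem ent[] __counted_by(cnt);
    };

    dst->cnt = src->cnt;              /* set the bound first */
    for (i = 0; i < src->cnt; i++)
        dst->ent[i] = src->ent[i];    /* stores now fall inside the declared bound */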
Fixes: c14679d7005a ("wifi: cfg80211: Annotate struct cfg80211_mbssid_elems with __counted_by") Signed-off-by: Haoyu Li lihaoyu499@gmail.com Link: https://patch.msgid.link/20241123172500.311853-1-lihaoyu499@gmail.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/cfg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index 6dfc61a9acd4..242b718b1cd9 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -1061,13 +1061,13 @@ ieee80211_copy_mbssid_beacon(u8 *pos, struct cfg80211_mbssid_elems *dst, { int i, offset = 0;
+ dst->cnt = src->cnt; for (i = 0; i < src->cnt; i++) { memcpy(pos + offset, src->elem[i].data, src->elem[i].len); dst->elem[i].len = src->elem[i].len; dst->elem[i].data = pos + offset; offset += dst->elem[i].len; } - dst->cnt = src->cnt;
return offset; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Emmanuel Grumbach emmanuel.grumbach@intel.com
[ Upstream commit 11ac0d7c3b5ba58232fb7dacb54371cbe75ec183 ]
If we got an unprotected action frame with CSA and then we heard the beacon with the CSA IE, we'll block the queues with the CSA reason twice. Since this reason is refcounted, we won't wake up the queues since we wake them up only once and the ref count will never reach 0. This led to blocked queues that prevented any activity (even disconnection wouldn't reset the queue state, and the only way to recover would be to reload the kernel module).
Fix this by not refcounting the CSA reason. It becomes now pointless to maintain the csa_blocked_queues state. Remove it.
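Illustratively, with a refcounted stop reason the two block events from the scenario above would need two matching wakes, which never happen:

    /* unprotected CSA action frame */
    ieee80211_stop_vif_queues(local, sdata, IEEE80211_QUEUE_STOP_REASON_CSA);  /* count 1 */
    /* beacon carrying the CSA IE */
    ieee80211_stop_vif_queues(local, sdata, IEEE80211_QUEUE_STOP_REASON_CSA);  /* count 2 */
    /* ... channel switch completes or fails ... */
    ieee80211_wake_vif_queues(local, sdata, IEEE80211_QUEUE_STOP_REASON_CSA);  /* count 1, still blocked */

The patch therefore switches the CSA block/unblock paths to the _norefcount variants added below, so a single wake clears the reason.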
Signed-off-by: Emmanuel Grumbach emmanuel.grumbach@intel.com Fixes: 414e090bc41d ("wifi: mac80211: restrict public action ECSA frame handling") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219447 Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://patch.msgid.link/20241119173108.5ea90828c2cc.I4f89e58572fb71ae48e47a... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/wireless/intel/iwlwifi/mvm/mac-ctxt.c | 2 +- include/net/mac80211.h | 4 +- net/mac80211/cfg.c | 3 +- net/mac80211/ieee80211_i.h | 49 +++++++++++++++---- net/mac80211/iface.c | 12 ++--- net/mac80211/mlme.c | 2 - net/mac80211/util.c | 23 ++------- 7 files changed, 50 insertions(+), 45 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c index a7a10e716e65..e96ddaeeeeff 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c @@ -1967,7 +1967,7 @@ void iwl_mvm_channel_switch_error_notif(struct iwl_mvm *mvm, if (csa_err_mask & (CS_ERR_COUNT_ERROR | CS_ERR_LONG_DELAY_AFTER_CS | CS_ERR_TX_BLOCK_TIMER_EXPIRED)) - ieee80211_channel_switch_disconnect(vif, true); + ieee80211_channel_switch_disconnect(vif); rcu_read_unlock(); }
diff --git a/include/net/mac80211.h b/include/net/mac80211.h index 333e0fae6796..5b712582f9a9 100644 --- a/include/net/mac80211.h +++ b/include/net/mac80211.h @@ -6770,14 +6770,12 @@ void ieee80211_chswitch_done(struct ieee80211_vif *vif, bool success, /** * ieee80211_channel_switch_disconnect - disconnect due to channel switch error * @vif: &struct ieee80211_vif pointer from the add_interface callback. - * @block_tx: if %true, do not send deauth frame. * * Instruct mac80211 to disconnect due to a channel switch error. The channel * switch can request to block the tx and so, we need to make sure we do not send * a deauth frame in this case. */ -void ieee80211_channel_switch_disconnect(struct ieee80211_vif *vif, - bool block_tx); +void ieee80211_channel_switch_disconnect(struct ieee80211_vif *vif);
/** * ieee80211_request_smps - request SM PS transition diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index 242b718b1cd9..16d47123a73c 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -3674,13 +3674,12 @@ void ieee80211_csa_finish(struct ieee80211_vif *vif, unsigned int link_id) } EXPORT_SYMBOL(ieee80211_csa_finish);
-void ieee80211_channel_switch_disconnect(struct ieee80211_vif *vif, bool block_tx) +void ieee80211_channel_switch_disconnect(struct ieee80211_vif *vif) { struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif); struct ieee80211_if_managed *ifmgd = &sdata->u.mgd; struct ieee80211_local *local = sdata->local;
- sdata->csa_blocked_queues = block_tx; sdata_info(sdata, "channel switch failed, disconnecting\n"); wiphy_work_queue(local->hw.wiphy, &ifmgd->csa_connection_drop_work); } diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index 3d3c9139ff5e..7a0242e937d3 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -1106,8 +1106,6 @@ struct ieee80211_sub_if_data {
unsigned long state;
- bool csa_blocked_queues; - char name[IFNAMSIZ];
struct ieee80211_fragment_cache frags; @@ -2411,17 +2409,13 @@ void ieee80211_send_4addr_nullfunc(struct ieee80211_local *local, struct ieee80211_sub_if_data *sdata); void ieee80211_sta_tx_notify(struct ieee80211_sub_if_data *sdata, struct ieee80211_hdr *hdr, bool ack, u16 tx_time); - +unsigned int +ieee80211_get_vif_queues(struct ieee80211_local *local, + struct ieee80211_sub_if_data *sdata); void ieee80211_wake_queues_by_reason(struct ieee80211_hw *hw, unsigned long queues, enum queue_stop_reason reason, bool refcounted); -void ieee80211_stop_vif_queues(struct ieee80211_local *local, - struct ieee80211_sub_if_data *sdata, - enum queue_stop_reason reason); -void ieee80211_wake_vif_queues(struct ieee80211_local *local, - struct ieee80211_sub_if_data *sdata, - enum queue_stop_reason reason); void ieee80211_stop_queues_by_reason(struct ieee80211_hw *hw, unsigned long queues, enum queue_stop_reason reason, @@ -2432,6 +2426,43 @@ void ieee80211_wake_queue_by_reason(struct ieee80211_hw *hw, int queue, void ieee80211_stop_queue_by_reason(struct ieee80211_hw *hw, int queue, enum queue_stop_reason reason, bool refcounted); +static inline void +ieee80211_stop_vif_queues(struct ieee80211_local *local, + struct ieee80211_sub_if_data *sdata, + enum queue_stop_reason reason) +{ + ieee80211_stop_queues_by_reason(&local->hw, + ieee80211_get_vif_queues(local, sdata), + reason, true); +} + +static inline void +ieee80211_wake_vif_queues(struct ieee80211_local *local, + struct ieee80211_sub_if_data *sdata, + enum queue_stop_reason reason) +{ + ieee80211_wake_queues_by_reason(&local->hw, + ieee80211_get_vif_queues(local, sdata), + reason, true); +} +static inline void +ieee80211_stop_vif_queues_norefcount(struct ieee80211_local *local, + struct ieee80211_sub_if_data *sdata, + enum queue_stop_reason reason) +{ + ieee80211_stop_queues_by_reason(&local->hw, + ieee80211_get_vif_queues(local, sdata), + reason, false); +} +static inline void +ieee80211_wake_vif_queues_norefcount(struct ieee80211_local *local, + struct ieee80211_sub_if_data *sdata, + enum queue_stop_reason reason) +{ + ieee80211_wake_queues_by_reason(&local->hw, + ieee80211_get_vif_queues(local, sdata), + reason, false); +} void ieee80211_add_pending_skb(struct ieee80211_local *local, struct sk_buff *skb); void ieee80211_add_pending_skbs(struct ieee80211_local *local, diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c index 6ef0990d3d29..af9055252e6d 100644 --- a/net/mac80211/iface.c +++ b/net/mac80211/iface.c @@ -2364,18 +2364,14 @@ void ieee80211_vif_block_queues_csa(struct ieee80211_sub_if_data *sdata) if (ieee80211_hw_check(&local->hw, HANDLES_QUIET_CSA)) return;
- ieee80211_stop_vif_queues(local, sdata, - IEEE80211_QUEUE_STOP_REASON_CSA); - sdata->csa_blocked_queues = true; + ieee80211_stop_vif_queues_norefcount(local, sdata, + IEEE80211_QUEUE_STOP_REASON_CSA); }
void ieee80211_vif_unblock_queues_csa(struct ieee80211_sub_if_data *sdata) { struct ieee80211_local *local = sdata->local;
- if (sdata->csa_blocked_queues) { - ieee80211_wake_vif_queues(local, sdata, - IEEE80211_QUEUE_STOP_REASON_CSA); - sdata->csa_blocked_queues = false; - } + ieee80211_wake_vif_queues_norefcount(local, sdata, + IEEE80211_QUEUE_STOP_REASON_CSA); } diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c index 0303972c23e4..111066928b96 100644 --- a/net/mac80211/mlme.c +++ b/net/mac80211/mlme.c @@ -2636,8 +2636,6 @@ ieee80211_sta_process_chanswitch(struct ieee80211_link_data *link, */ link->conf->csa_active = true; link->u.mgd.csa.blocked_tx = csa_ie.mode; - sdata->csa_blocked_queues = - csa_ie.mode && !ieee80211_hw_check(&local->hw, HANDLES_QUIET_CSA);
wiphy_work_queue(sdata->local->hw.wiphy, &ifmgd->csa_connection_drop_work); diff --git a/net/mac80211/util.c b/net/mac80211/util.c index f94faa86ba8a..b4814e97cf74 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -657,7 +657,7 @@ void ieee80211_wake_queues(struct ieee80211_hw *hw) } EXPORT_SYMBOL(ieee80211_wake_queues);
-static unsigned int +unsigned int ieee80211_get_vif_queues(struct ieee80211_local *local, struct ieee80211_sub_if_data *sdata) { @@ -669,7 +669,8 @@ ieee80211_get_vif_queues(struct ieee80211_local *local, queues = 0;
for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) - queues |= BIT(sdata->vif.hw_queue[ac]); + if (sdata->vif.hw_queue[ac] != IEEE80211_INVAL_HW_QUEUE) + queues |= BIT(sdata->vif.hw_queue[ac]); if (sdata->vif.cab_queue != IEEE80211_INVAL_HW_QUEUE) queues |= BIT(sdata->vif.cab_queue); } else { @@ -724,24 +725,6 @@ void ieee80211_flush_queues(struct ieee80211_local *local, __ieee80211_flush_queues(local, sdata, 0, drop); }
-void ieee80211_stop_vif_queues(struct ieee80211_local *local, - struct ieee80211_sub_if_data *sdata, - enum queue_stop_reason reason) -{ - ieee80211_stop_queues_by_reason(&local->hw, - ieee80211_get_vif_queues(local, sdata), - reason, true); -} - -void ieee80211_wake_vif_queues(struct ieee80211_local *local, - struct ieee80211_sub_if_data *sdata, - enum queue_stop_reason reason) -{ - ieee80211_wake_queues_by_reason(&local->hw, - ieee80211_get_vif_queues(local, sdata), - reason, true); -} - static void __iterate_interfaces(struct ieee80211_local *local, u32 iter_flags, void (*iterator)(void *data, u8 *mac,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Lin benjamin-jw.lin@mediatek.com
[ Upstream commit 819e0f1e58e0ba3800cd9eb96b2a39e44e49df97 ]
Station's spatial streaming capability should be initialized before handling VHT OMN, because the handling requires the capability information.
Fixes: a8bca3e9371d ("wifi: mac80211: track capability/opmode NSS separately") Signed-off-by: Benjamin Lin benjamin-jw.lin@mediatek.com Link: https://patch.msgid.link/20241118080722.9603-1-benjamin-jw.lin@mediatek.com [rewrite subject] Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/cfg.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index 16d47123a73c..1b1bf044378d 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -1911,6 +1911,8 @@ static int sta_link_apply_parameters(struct ieee80211_local *local, params->eht_capa_len, link_sta);
+ ieee80211_sta_init_nss(link_sta); + if (params->opmode_notif_used) { /* returned value is only needed for rc update, but the * rc isn't initialized here yet, so ignore it @@ -1920,8 +1922,6 @@ static int sta_link_apply_parameters(struct ieee80211_local *local, sband->band); }
- ieee80211_sta_init_nss(link_sta); - return 0; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnaldo Carvalho de Melo acme@kernel.org
[ Upstream commit 88a6e2f67cc94f751a74409ab4c21e5fc8ea6757 ]
It's used from trace__run(), for the 'perf trace' live mode, i.e. its strace-like, non-perf.data file processing mode, the most common one.
The trace__run() function will set trace->host using machine__new_host(), which is supposed to give a machine instance representing the running machine. Since we'll use perf_env__arch_strerrno() to get the right errno -> string table, we need to use machine->env, so initialize it in machine__new_host().
Before the patch:
(gdb) run trace --errno-summary -a sleep 1 <SNIP> Summary of events:
gvfs-afc-volume (3187), 2 events, 0.0%
syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ pselect6 1 0 0.000 0.000 0.000 0.000 0.00%
GUsbEventThread (3519), 2 events, 0.0%
syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ poll 1 0 0.000 0.000 0.000 0.000 0.00% <SNIP> Program received signal SIGSEGV, Segmentation fault. 0x00000000005caba0 in perf_env__arch_strerrno (env=0x0, err=110) at util/env.c:478 478 if (env->arch_strerrno == NULL) (gdb) bt #0 0x00000000005caba0 in perf_env__arch_strerrno (env=0x0, err=110) at util/env.c:478 #1 0x00000000004b75d2 in thread__dump_stats (ttrace=0x14f58f0, trace=0x7fffffffa5b0, fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>) at builtin-trace.c:4673 #2 0x00000000004b78bf in trace__fprintf_thread (fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>, thread=0x10fa0b0, trace=0x7fffffffa5b0) at builtin-trace.c:4708 #3 0x00000000004b7ad9 in trace__fprintf_thread_summary (trace=0x7fffffffa5b0, fp=0x7ffff6ff74e0 <_IO_2_1_stderr_>) at builtin-trace.c:4747 #4 0x00000000004b656e in trace__run (trace=0x7fffffffa5b0, argc=2, argv=0x7fffffffde60) at builtin-trace.c:4456 #5 0x00000000004ba43e in cmd_trace (argc=2, argv=0x7fffffffde60) at builtin-trace.c:5487 #6 0x00000000004c0414 in run_builtin (p=0xec3068 <commands+648>, argc=5, argv=0x7fffffffde60) at perf.c:351 #7 0x00000000004c06bb in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:404 #8 0x00000000004c0814 in run_argv (argcp=0x7fffffffdc4c, argv=0x7fffffffdc40) at perf.c:448 #9 0x00000000004c0b5d in main (argc=5, argv=0x7fffffffde60) at perf.c:560 (gdb)
After:
root@number:~# perf trace -a --errno-summary sleep 1 <SNIP> pw-data-loop (2685), 1410 events, 16.0%
syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ epoll_wait 188 0 983.428 0.000 5.231 15.595 8.68% ioctl 94 0 0.811 0.004 0.009 0.016 2.82% read 188 0 0.322 0.001 0.002 0.006 5.15% write 141 0 0.280 0.001 0.002 0.018 8.39% timerfd_settime 94 0 0.138 0.001 0.001 0.007 6.47%
gnome-control-c (179406), 1848 events, 20.9%
syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ poll 222 0 959.577 0.000 4.322 21.414 11.40% recvmsg 150 0 0.539 0.001 0.004 0.013 5.12% write 300 0 0.442 0.001 0.001 0.007 3.29% read 150 0 0.183 0.001 0.001 0.009 5.53% getpid 102 0 0.101 0.000 0.001 0.008 7.82%
root@number:~#
Fixes: 54373b5d53c1f6aa ("perf env: Introduce perf_env__arch_strerrno()") Reported-by: Veronika Molnarova vmolnaro@redhat.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Acked-by: Veronika Molnarova vmolnaro@redhat.com Acked-by: Michael Petlan mpetlan@redhat.com Tested-by: Michael Petlan mpetlan@redhat.com Link: https://lore.kernel.org/r/Z0XffUgNSv_9OjOi@x1 Signed-off-by: Namhyung Kim namhyung@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/machine.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 4f0ac998b0cc..27d5345d2b30 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -134,6 +134,8 @@ struct machine *machine__new_host(void)
if (machine__create_kernel_maps(machine) < 0) goto out_delete; + + machine->env = &perf_env; }
return machine;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Suraj Sonawane surajsonawane0215@gmail.com
[ Upstream commit 265e98f72bac6c41a4492d3e30a8e5fd22fe0779 ]
Fix an issue detected by syzbot with KASAN:
BUG: KASAN: vmalloc-out-of-bounds in cmd_to_func drivers/acpi/nfit/core.c:416 [inline] BUG: KASAN: vmalloc-out-of-bounds in acpi_nfit_ctl+0x20e8/0x24a0 drivers/acpi/nfit/core.c:459
The issue occurs in cmd_to_func when the call_pkg->nd_reserved2 array is accessed without verifying that call_pkg points to a buffer that is appropriately sized as a struct nd_cmd_pkg. This can lead to out-of-bounds access and undefined behavior if the buffer does not have sufficient space.
To address this, a check was added in acpi_nfit_ctl() to ensure that buf is not NULL and that buf_len is at least sizeof(*call_pkg) before accessing it. This ensures safe access to the members of call_pkg, including the nd_reserved2 array.
Reported-by: syzbot+7534f060ebda6b8b51b3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7534f060ebda6b8b51b3 Tested-by: syzbot+7534f060ebda6b8b51b3@syzkaller.appspotmail.com Fixes: ebe9f6f19d80 ("acpi/nfit: Fix bus command validation") Signed-off-by: Suraj Sonawane surajsonawane0215@gmail.com Reviewed-by: Alison Schofield alison.schofield@intel.com Reviewed-by: Dave Jiang dave.jiang@intel.com Link: https://patch.msgid.link/20241118162609.29063-1-surajsonawane0215@gmail.com Signed-off-by: Ira Weiny ira.weiny@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/nfit/core.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index 5429ec9ef06f..a5d47819b3a4 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -454,8 +454,13 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm, if (cmd_rc) *cmd_rc = -EINVAL;
- if (cmd == ND_CMD_CALL) + if (cmd == ND_CMD_CALL) { + if (!buf || buf_len < sizeof(*call_pkg)) + return -EINVAL; + call_pkg = buf; + } + func = cmd_to_func(nfit_mem, cmd, call_pkg, &family); if (func < 0) return func;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: David (Ming Qiang) Wu David.Wu3@amd.com
[ Upstream commit 47f402a3e08113e0f5d8e1e6fcc197667a16022f ]
base.sched may not be set for each instance and should not be used for cases such as non-IB tests.
Fixes: 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()") Signed-off-by: David (Ming Qiang) Wu David.Wu3@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c index 6068b784dc69..9a30b8c10838 100644 --- a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c @@ -1289,7 +1289,7 @@ static int uvd_v7_0_ring_patch_cs_in_place(struct amdgpu_cs_parser *p, struct amdgpu_job *job, struct amdgpu_ib *ib) { - struct amdgpu_ring *ring = to_amdgpu_ring(job->base.sched); + struct amdgpu_ring *ring = amdgpu_job_ring(job); unsigned i;
/* No patching necessary for the first instance */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Remi Pommarel repk@triplefau.lt
[ Upstream commit f2f7358c3890e7366cbcb7512b4bc8b4394b2d61 ]
The number of TT changes can be less than initially expected in batadv_tt_tvlv_container_update() (changes can be removed by batadv_tt_local_event() in an ADD+DEL sequence between reading tt_diff_entries_num and actually iterating the change list under lock).
Thus tt_diff_len could be bigger than the actual size of the changes that need to be sent. Because batadv_send_my_tt_response() sends the whole packet, uninitialized data can be interpreted as TT changes on other nodes, leading to weird TT global entries on those nodes such as:
* 00:00:00:00:00:00 -1 [....] ( 0) 88:12:4e:ad:7e:ba (179) (0x45845380) * 00:00:00:00:78:79 4092 [.W..] ( 0) 88:12:4e:ad:7e:3c (145) (0x8ebadb8b)
All of the above also applies to OGM tvlv container buffer's tvlv_len.
Remove the extra allocated space to avoid sending uninitialized TT changes in batadv_send_my_tt_response() and batadv_v_ogm_send_softif().
Fixes: e1bf0c14096f ("batman-adv: tvlv - convert tt data sent within OGMs") Signed-off-by: Remi Pommarel repk@triplefau.lt Signed-off-by: Sven Eckelmann sven@narfation.org Signed-off-by: Simon Wunderlich sw@simonwunderlich.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/batman-adv/translation-table.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index 2243cec18ecc..f0590f9bc2b1 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -990,6 +990,7 @@ static void batadv_tt_tvlv_container_update(struct batadv_priv *bat_priv) int tt_diff_len, tt_change_len = 0; int tt_diff_entries_num = 0; int tt_diff_entries_count = 0; + size_t tt_extra_len = 0; u16 tvlv_len;
tt_diff_entries_num = atomic_read(&bat_priv->tt.local_changes); @@ -1027,6 +1028,9 @@ static void batadv_tt_tvlv_container_update(struct batadv_priv *bat_priv) } spin_unlock_bh(&bat_priv->tt.changes_list_lock);
+ tt_extra_len = batadv_tt_len(tt_diff_entries_num - + tt_diff_entries_count); + /* Keep the buffer for possible tt_request */ spin_lock_bh(&bat_priv->tt.last_changeset_lock); kfree(bat_priv->tt.last_changeset); @@ -1035,6 +1039,7 @@ static void batadv_tt_tvlv_container_update(struct batadv_priv *bat_priv) tt_change_len = batadv_tt_len(tt_diff_entries_count); /* check whether this new OGM has no changes due to size problems */ if (tt_diff_entries_count > 0) { + tt_diff_len -= tt_extra_len; /* if kmalloc() fails we will reply with the full table * instead of providing the diff */ @@ -1047,6 +1052,8 @@ static void batadv_tt_tvlv_container_update(struct batadv_priv *bat_priv) } spin_unlock_bh(&bat_priv->tt.last_changeset_lock);
+ /* Remove extra packet space for OGM */ + tvlv_len -= tt_extra_len; container_register: batadv_tvlv_container_register(bat_priv, BATADV_TVLV_TT, 1, tt_data, tvlv_len);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Remi Pommarel repk@triplefau.lt
[ Upstream commit 8038806db64da15721775d6b834990cacbfcf0b2 ]
The number of entries filled by batadv_tt_tvlv_generate() can be less than initially expected in batadv_tt_prepare_tvlv_{global,local}_data() (changes can be removed by batadv_tt_local_event() in an ADD+DEL sequence in the meantime, as the lock is not held during the whole tvlv global/local data generation).
Thus tvlv_len could be bigger than the actual size of the TT entries that need to be sent, so a full-table TT_RESPONSE could hold invalid TT entries such as the ones below.
* 00:00:00:00:00:00 -1 [....] ( 0) 88:12:4e:ad:7e:ba (179) (0x45845380) * 00:00:00:00:78:79 4092 [.W..] ( 0) 88:12:4e:ad:7e:3c (145) (0x8ebadb8b)
Remove the extra allocated space to avoid sending uninitialized entries for full table TT_RESPONSE in both batadv_send_other_tt_response() and batadv_send_my_tt_response().
Fixes: 7ea7b4a14275 ("batman-adv: make the TT CRC logic VLAN specific") Signed-off-by: Remi Pommarel repk@triplefau.lt Signed-off-by: Sven Eckelmann sven@narfation.org Signed-off-by: Simon Wunderlich sw@simonwunderlich.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/batman-adv/translation-table.c | 37 ++++++++++++++++++------------ 1 file changed, 22 insertions(+), 15 deletions(-)
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index f0590f9bc2b1..bbab7491c83f 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -2754,14 +2754,16 @@ static bool batadv_tt_global_valid(const void *entry_ptr, * * Fills the tvlv buff with the tt entries from the specified hash. If valid_cb * is not provided then this becomes a no-op. + * + * Return: Remaining unused length in tvlv_buff. */ -static void batadv_tt_tvlv_generate(struct batadv_priv *bat_priv, - struct batadv_hashtable *hash, - void *tvlv_buff, u16 tt_len, - bool (*valid_cb)(const void *, - const void *, - u8 *flags), - void *cb_data) +static u16 batadv_tt_tvlv_generate(struct batadv_priv *bat_priv, + struct batadv_hashtable *hash, + void *tvlv_buff, u16 tt_len, + bool (*valid_cb)(const void *, + const void *, + u8 *flags), + void *cb_data) { struct batadv_tt_common_entry *tt_common_entry; struct batadv_tvlv_tt_change *tt_change; @@ -2775,7 +2777,7 @@ static void batadv_tt_tvlv_generate(struct batadv_priv *bat_priv, tt_change = tvlv_buff;
if (!valid_cb) - return; + return tt_len;
rcu_read_lock(); for (i = 0; i < hash->size; i++) { @@ -2801,6 +2803,8 @@ static void batadv_tt_tvlv_generate(struct batadv_priv *bat_priv, } } rcu_read_unlock(); + + return batadv_tt_len(tt_tot - tt_num_entries); }
/** @@ -3076,10 +3080,11 @@ static bool batadv_send_other_tt_response(struct batadv_priv *bat_priv, goto out;
/* fill the rest of the tvlv with the real TT entries */ - batadv_tt_tvlv_generate(bat_priv, bat_priv->tt.global_hash, - tt_change, tt_len, - batadv_tt_global_valid, - req_dst_orig_node); + tvlv_len -= batadv_tt_tvlv_generate(bat_priv, + bat_priv->tt.global_hash, + tt_change, tt_len, + batadv_tt_global_valid, + req_dst_orig_node); }
/* Don't send the response, if larger than fragmented packet. */ @@ -3203,9 +3208,11 @@ static bool batadv_send_my_tt_response(struct batadv_priv *bat_priv, goto out;
/* fill the rest of the tvlv with the real TT entries */ - batadv_tt_tvlv_generate(bat_priv, bat_priv->tt.local_hash, - tt_change, tt_len, - batadv_tt_local_valid, NULL); + tvlv_len -= batadv_tt_tvlv_generate(bat_priv, + bat_priv->tt.local_hash, + tt_change, tt_len, + batadv_tt_local_valid, + NULL); }
tvlv_tt_data->flags = BATADV_TT_RESPONSE;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Remi Pommarel repk@triplefau.lt
[ Upstream commit fff8f17c1a6fc802ca23bbd3a276abfde8cc58e6 ]
When the TT changes list is too big to fit in a packet due to the MTU size, an empty OGM is sent, expecting the other nodes to send a TT request to get the changes. The issue is that tt.last_changeset was not built, thus the originator was responding to those TT requests with previous changes (see batadv_send_my_tt_response). Also, the changes list was never cleaned up, effectively growing without end from this point onwards, repeatedly sending the same TT response changes over and over, and creating a new empty OGM every OGM interval while waiting for the local changes to be purged.
When there are more TT changes than can fit in a packet, drop all changes, send an empty OGM, and wait for the TT request so we can respond with a full table instead.
Fixes: e1bf0c14096f ("batman-adv: tvlv - convert tt data sent within OGMs") Signed-off-by: Remi Pommarel repk@triplefau.lt Acked-by: Antonio Quartulli Antonio@mandelbit.com Signed-off-by: Sven Eckelmann sven@narfation.org Signed-off-by: Simon Wunderlich sw@simonwunderlich.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/batman-adv/translation-table.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index bbab7491c83f..53dea8ae96e4 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -990,6 +990,7 @@ static void batadv_tt_tvlv_container_update(struct batadv_priv *bat_priv) int tt_diff_len, tt_change_len = 0; int tt_diff_entries_num = 0; int tt_diff_entries_count = 0; + bool drop_changes = false; size_t tt_extra_len = 0; u16 tvlv_len;
@@ -997,10 +998,17 @@ static void batadv_tt_tvlv_container_update(struct batadv_priv *bat_priv) tt_diff_len = batadv_tt_len(tt_diff_entries_num);
/* if we have too many changes for one packet don't send any - * and wait for the tt table request which will be fragmented + * and wait for the tt table request so we can reply with the full + * (fragmented) table. + * + * The local change history should still be cleaned up so the next + * TT round can start again with a clean state. */ - if (tt_diff_len > bat_priv->soft_iface->mtu) + if (tt_diff_len > bat_priv->soft_iface->mtu) { tt_diff_len = 0; + tt_diff_entries_num = 0; + drop_changes = true; + }
tvlv_len = batadv_tt_prepare_tvlv_local_data(bat_priv, &tt_data, &tt_change, &tt_diff_len); @@ -1009,7 +1017,7 @@ static void batadv_tt_tvlv_container_update(struct batadv_priv *bat_priv)
tt_data->flags = BATADV_TT_OGM_DIFF;
- if (tt_diff_len == 0) + if (!drop_changes && tt_diff_len == 0) goto container_register;
spin_lock_bh(&bat_priv->tt.changes_list_lock);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit b04d86fff66b15c07505d226431f808c15b1703c ]
syzbot found [1] that after the blamed commit, ub->ubsock->sk was NULL when attempting the atomic_dec():
atomic_dec(&tipc_net(sock_net(ub->ubsock->sk))->wq_count);
Fix this by caching the tipc_net pointer.
[1]
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] CPU: 0 UID: 0 PID: 5896 Comm: kworker/0:3 Not tainted 6.13.0-rc1-next-20241203-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Workqueue: events cleanup_bearer RIP: 0010:read_pnet include/net/net_namespace.h:387 [inline] RIP: 0010:sock_net include/net/sock.h:655 [inline] RIP: 0010:cleanup_bearer+0x1f7/0x280 net/tipc/udp_media.c:820 Code: 18 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 3c f7 99 f6 48 8b 1b 48 83 c3 30 e8 f0 e4 60 00 48 89 d8 48 c1 e8 03 <42> 80 3c 28 00 74 08 48 89 df e8 1a f7 99 f6 49 83 c7 e8 48 8b 1b RSP: 0018:ffffc9000410fb70 EFLAGS: 00010206 RAX: 0000000000000006 RBX: 0000000000000030 RCX: ffff88802fe45a00 RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffc9000410f900 RBP: ffff88807e1f0908 R08: ffffc9000410f907 R09: 1ffff92000821f20 R10: dffffc0000000000 R11: fffff52000821f21 R12: ffff888031d19980 R13: dffffc0000000000 R14: dffffc0000000000 R15: ffff88807e1f0918 FS: 0000000000000000(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556ca050b000 CR3: 0000000031c0c000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Fixes: 6a2fa13312e5 ("tipc: Fix use-after-free of kernel socket in cleanup_bearer().") Reported-by: syzbot+46aa5474f179dacd1a3b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/67508b5f.050a0220.17bd51.0070.GAE@google.com/... Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://patch.msgid.link/20241204170548.4152658-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/tipc/udp_media.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c index b7e25e7e9933..108a4cc2e001 100644 --- a/net/tipc/udp_media.c +++ b/net/tipc/udp_media.c @@ -807,6 +807,7 @@ static void cleanup_bearer(struct work_struct *work) { struct udp_bearer *ub = container_of(work, struct udp_bearer, work); struct udp_replicast *rcast, *tmp; + struct tipc_net *tn;
list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) { dst_cache_destroy(&rcast->dst_cache); @@ -814,10 +815,14 @@ static void cleanup_bearer(struct work_struct *work) kfree_rcu(rcast, rcu); }
+ tn = tipc_net(sock_net(ub->ubsock->sk)); + dst_cache_destroy(&ub->rcast.dst_cache); udp_tunnel_sock_release(ub->ubsock); + + /* Note: could use a call_rcu() to avoid another synchronize_net() */ synchronize_net(); - atomic_dec(&tipc_net(sock_net(ub->ubsock->sk))->wq_count); + atomic_dec(&tn->wq_count); kfree(ub); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit 11776cff0b563c8b8a4fa76cab620bfb633a8cb8 ]
The dr_domain_add_vport_cap() function generally returns NULL on error, but sometimes we want it to return ERR_PTR(-EBUSY) so the caller can retry. The problem here is that "ret" can be either -EBUSY or -ENOMEM, and if it's -ENOMEM then the error pointer is propagated back and eventually dereferenced in dr_ste_v0_build_src_gvmi_qpn_tag().
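As a purely illustrative sketch of the convention the fix restores (the caller shown here is hypothetical, not the actual mlx5 code path): an error pointer should only ever mean "retry", while every hard failure collapses to NULL so it can never be dereferenced later.

	/* Hypothetical caller sketch, assuming the fixed return convention:
	 * ERR_PTR(-EBUSY) means another thread inserted the entry first and
	 * retrying the lookup makes sense; NULL means a hard failure such
	 * as -ENOMEM.
	 */
	vport_caps = dr_domain_add_vport_cap(dmn, vport);
	if (vport_caps == ERR_PTR(-EBUSY))
		goto retry_lookup;	/* transient: look the entry up again */
	if (!vport_caps)
		return NULL;		/* hard failure, never dereferenced */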
Fixes: 11a45def2e19 ("net/mlx5: DR, Add support for SF vports") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: Tariq Toukan tariqt@nvidia.com Link: https://patch.msgid.link/07477254-e179-43e2-b1b3-3b9db4674195@stanley.mounta... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c index 3d74109f8230..49f22cad92bf 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c @@ -297,7 +297,9 @@ dr_domain_add_vport_cap(struct mlx5dr_domain *dmn, u16 vport) if (ret) { mlx5dr_dbg(dmn, "Couldn't insert new vport into xarray (%d)\n", ret); kvfree(vport_caps); - return ERR_PTR(ret); + if (ret == -EBUSY) + return ERR_PTR(-EBUSY); + return NULL; }
return vport_caps;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Haoyu Li lihaoyu499@gmail.com
[ Upstream commit f1d3334d604cc32db63f6e2b3283011e02294e54 ]
With the __counted_by annotation in the cfg80211_scan_request struct, the "n_channels" struct member must be set before accessing the "channels" array. Failing to do so will trigger a runtime warning when CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE are enabled.
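For context, a minimal sketch of the __counted_by() contract, using a simplified stand-in struct rather than the real cfg80211_scan_request definition: the fortified bounds check on the flexible array takes its limit from the named counter member, so the counter must be written before any element is touched.

	struct scan_req {
		u32 n_channels;
		struct ieee80211_channel *channels[] __counted_by(n_channels);
	};

	req = kzalloc(struct_size(req, channels, n_channels), GFP_KERNEL);
	if (!req)
		return -ENOMEM;

	req->n_channels = n_channels;	/* set the counter first ... */
	req->channels[0] = chan;	/* ... so this access is seen as in bounds */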
Fixes: e3eac9f32ec0 ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by") Signed-off-by: Haoyu Li lihaoyu499@gmail.com Link: https://patch.msgid.link/20241203152049.348806-1-lihaoyu499@gmail.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/sme.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/wireless/sme.c b/net/wireless/sme.c index 431da30817a6..268171600087 100644 --- a/net/wireless/sme.c +++ b/net/wireless/sme.c @@ -83,6 +83,7 @@ static int cfg80211_conn_scan(struct wireless_dev *wdev) if (!request) return -ENOMEM;
+ request->n_channels = n_channels; if (wdev->conn->params.channel) { enum nl80211_band band = wdev->conn->params.channel->band; struct ieee80211_supported_band *sband =
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Danielle Ratson danieller@nvidia.com
[ Upstream commit cf3515c556907b4da290967a2a6cbbd9ee0ee723 ]
The test sends only one packet, generated with mausezahn, from $h1 to $h2. However, for some reason, it tests for non-zero maximum occupancy in the ingress pools of both $h1 and $h2. The former only passes when $h2 happens to send a packet.
Avoid intermittent failures by removing the unintentional test case regarding the ingress pool of $h1.
Fixes: a865ad999603 ("selftests: mlxsw: Add shared buffer traffic test") Signed-off-by: Danielle Ratson danieller@nvidia.com Reviewed-by: Ido Schimmel idosch@nvidia.com Signed-off-by: Ido Schimmel idosch@nvidia.com Signed-off-by: Petr Machata petrm@nvidia.com Link: https://patch.msgid.link/5b7344608d5e06f38209e48d8af8c92fa11b6742.1733414773... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh b/tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh index 0c47faff9274..a7b3d6cf3185 100755 --- a/tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh +++ b/tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh @@ -108,11 +108,6 @@ port_pool_test()
devlink sb occupancy snapshot $DEVLINK_DEV
- RET=0 - max_occ=$(sb_occ_pool_check $dl_port1 $SB_POOL_ING $exp_max_occ) - check_err $? "Expected iPool($SB_POOL_ING) max occupancy to be $exp_max_occ, but got $max_occ" - log_test "physical port's($h1) ingress pool" - RET=0 max_occ=$(sb_occ_pool_check $dl_port2 $SB_POOL_ING $exp_max_occ) check_err $? "Expected iPool($SB_POOL_ING) max occupancy to be $exp_max_occ, but got $max_occ"
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Danielle Ratson danieller@nvidia.com
[ Upstream commit 6c46ad4d1bb2e8ec2265296e53765190f6e32f33 ]
In both port_tc_ip_test() and port_tc_arp_test(), the max occupancy is checked on $h2 twice; only the error message differs, and it does not match the check itself.
Remove the two duplicated test cases from the test.
Fixes: a865ad999603 ("selftests: mlxsw: Add shared buffer traffic test") Signed-off-by: Danielle Ratson danieller@nvidia.com Reviewed-by: Ido Schimmel idosch@nvidia.com Signed-off-by: Ido Schimmel idosch@nvidia.com Signed-off-by: Petr Machata petrm@nvidia.com Link: https://patch.msgid.link/d9eb26f6fc16a06a30b5c2c16ad80caf502bc561.1733414773... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../selftests/drivers/net/mlxsw/sharedbuffer.sh | 10 ---------- 1 file changed, 10 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh b/tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh index a7b3d6cf3185..21bebc5726f6 100755 --- a/tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh +++ b/tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh @@ -131,11 +131,6 @@ port_tc_ip_test()
devlink sb occupancy snapshot $DEVLINK_DEV
- RET=0 - max_occ=$(sb_occ_itc_check $dl_port2 $SB_ITC $exp_max_occ) - check_err $? "Expected ingress TC($SB_ITC) max occupancy to be $exp_max_occ, but got $max_occ" - log_test "physical port's($h1) ingress TC - IP packet" - RET=0 max_occ=$(sb_occ_itc_check $dl_port2 $SB_ITC $exp_max_occ) check_err $? "Expected ingress TC($SB_ITC) max occupancy to be $exp_max_occ, but got $max_occ" @@ -158,11 +153,6 @@ port_tc_arp_test()
devlink sb occupancy snapshot $DEVLINK_DEV
- RET=0 - max_occ=$(sb_occ_itc_check $dl_port2 $SB_ITC $exp_max_occ) - check_err $? "Expected ingress TC($SB_ITC) max occupancy to be $exp_max_occ, but got $max_occ" - log_test "physical port's($h1) ingress TC - ARP packet" - RET=0 max_occ=$(sb_occ_itc_check $dl_port2 $SB_ITC $exp_max_occ) check_err $? "Expected ingress TC($SB_ITC) max occupancy to be $exp_max_occ, but got $max_occ"
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Danielle Ratson danieller@nvidia.com
[ Upstream commit 5f2c7ab15fd806043db1a7d54b5ec36be0bd93b1 ]
The test assumes that the packet it is sending is the only packet being passed to the device.
However, that is not the case, so other packets fill the buffers as well. Therefore, the test sometimes fails because it reads a maximum occupancy that is larger than expected.
Add egress filters on $h1 and $h2 that will guarantee the above.
Fixes: a865ad999603 ("selftests: mlxsw: Add shared buffer traffic test") Signed-off-by: Danielle Ratson danieller@nvidia.com Reviewed-by: Ido Schimmel idosch@nvidia.com Signed-off-by: Ido Schimmel idosch@nvidia.com Signed-off-by: Petr Machata petrm@nvidia.com Link: https://patch.msgid.link/64c28bc9b1cc1d78c4a73feda7cedbe9526ccf8b.1733414773... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../drivers/net/mlxsw/sharedbuffer.sh | 40 +++++++++++++++++++ 1 file changed, 40 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh b/tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh index 21bebc5726f6..c068e6c2a580 100755 --- a/tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh +++ b/tools/testing/selftests/drivers/net/mlxsw/sharedbuffer.sh @@ -22,20 +22,34 @@ SB_ITC=0 h1_create() { simple_if_init $h1 192.0.1.1/24 + tc qdisc add dev $h1 clsact + + # Add egress filter on $h1 that will guarantee that the packet sent, + # will be the only packet being passed to the device. + tc filter add dev $h1 egress pref 2 handle 102 matchall action drop }
h1_destroy() { + tc filter del dev $h1 egress pref 2 handle 102 matchall action drop + tc qdisc del dev $h1 clsact simple_if_fini $h1 192.0.1.1/24 }
h2_create() { simple_if_init $h2 192.0.1.2/24 + tc qdisc add dev $h2 clsact + + # Add egress filter on $h2 that will guarantee that the packet sent, + # will be the only packet being passed to the device. + tc filter add dev $h2 egress pref 1 handle 101 matchall action drop }
h2_destroy() { + tc filter del dev $h2 egress pref 1 handle 101 matchall action drop + tc qdisc del dev $h2 clsact simple_if_fini $h2 192.0.1.2/24 }
@@ -101,6 +115,11 @@ port_pool_test() local exp_max_occ=$(devlink_cell_size_get) local max_occ
+ tc filter add dev $h1 egress protocol ip pref 1 handle 101 flower \ + src_mac $h1mac dst_mac $h2mac \ + src_ip 192.0.1.1 dst_ip 192.0.1.2 \ + action pass + devlink sb occupancy clearmax $DEVLINK_DEV
$MZ $h1 -c 1 -p 10 -a $h1mac -b $h2mac -A 192.0.1.1 -B 192.0.1.2 \ @@ -117,6 +136,11 @@ port_pool_test() max_occ=$(sb_occ_pool_check $cpu_dl_port $SB_POOL_EGR_CPU $exp_max_occ) check_err $? "Expected ePool($SB_POOL_EGR_CPU) max occupancy to be $exp_max_occ, but got $max_occ" log_test "CPU port's egress pool" + + tc filter del dev $h1 egress protocol ip pref 1 handle 101 flower \ + src_mac $h1mac dst_mac $h2mac \ + src_ip 192.0.1.1 dst_ip 192.0.1.2 \ + action pass }
port_tc_ip_test() @@ -124,6 +148,11 @@ port_tc_ip_test() local exp_max_occ=$(devlink_cell_size_get) local max_occ
+ tc filter add dev $h1 egress protocol ip pref 1 handle 101 flower \ + src_mac $h1mac dst_mac $h2mac \ + src_ip 192.0.1.1 dst_ip 192.0.1.2 \ + action pass + devlink sb occupancy clearmax $DEVLINK_DEV
$MZ $h1 -c 1 -p 10 -a $h1mac -b $h2mac -A 192.0.1.1 -B 192.0.1.2 \ @@ -140,6 +169,11 @@ port_tc_ip_test() max_occ=$(sb_occ_etc_check $cpu_dl_port $SB_ITC_CPU_IP $exp_max_occ) check_err $? "Expected egress TC($SB_ITC_CPU_IP) max occupancy to be $exp_max_occ, but got $max_occ" log_test "CPU port's egress TC - IP packet" + + tc filter del dev $h1 egress protocol ip pref 1 handle 101 flower \ + src_mac $h1mac dst_mac $h2mac \ + src_ip 192.0.1.1 dst_ip 192.0.1.2 \ + action pass }
port_tc_arp_test() @@ -147,6 +181,9 @@ port_tc_arp_test() local exp_max_occ=$(devlink_cell_size_get) local max_occ
+ tc filter add dev $h1 egress protocol arp pref 1 handle 101 flower \ + src_mac $h1mac action pass + devlink sb occupancy clearmax $DEVLINK_DEV
$MZ $h1 -c 1 -p 10 -a $h1mac -A 192.0.1.1 -t arp -q @@ -162,6 +199,9 @@ port_tc_arp_test() max_occ=$(sb_occ_etc_check $cpu_dl_port $SB_ITC_CPU_ARP $exp_max_occ) check_err $? "Expected egress TC($SB_ITC_IP2ME) max occupancy to be $exp_max_occ, but got $max_occ" log_test "CPU port's egress TC - ARP packet" + + tc filter del dev $h1 egress protocol arp pref 1 handle 101 flower \ + src_mac $h1mac action pass }
setup_prepare()
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Weißschuh linux@weissschuh.net
[ Upstream commit 5e7aa97c7acf171275ac02a8bb018c31b8918d13 ]
The caller, ptp_kvm_init(), emits a warning if kvm_arch_ptp_init() exits with any error which is not EOPNOTSUPP:
"fail to initialize ptp_kvm"
Replace ENODEV with EOPNOTSUPP to avoid this spurious warning, aligning with the ARM implementation.
Fixes: a86ed2cfa13c ("ptp: Don't print an error if ptp_kvm is not supported") Signed-off-by: Thomas Weißschuh linux@weissschuh.net Link: https://patch.msgid.link/20241203-kvm_ptp-eopnotsuppp-v2-1-d1d060f27aa6@weis... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ptp/ptp_kvm_x86.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/ptp/ptp_kvm_x86.c b/drivers/ptp/ptp_kvm_x86.c index 617c8d6706d3..6cea4fe39bcf 100644 --- a/drivers/ptp/ptp_kvm_x86.c +++ b/drivers/ptp/ptp_kvm_x86.c @@ -26,7 +26,7 @@ int kvm_arch_ptp_init(void) long ret;
if (!kvm_para_available()) - return -ENODEV; + return -EOPNOTSUPP;
if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) { p = alloc_page(GFP_KERNEL | __GFP_ZERO); @@ -46,14 +46,14 @@ int kvm_arch_ptp_init(void)
clock_pair_gpa = slow_virt_to_phys(clock_pair); if (!pvclock_get_pvti_cpu0_va()) { - ret = -ENODEV; + ret = -EOPNOTSUPP; goto err; }
ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING, clock_pair_gpa, KVM_CLOCK_PAIRING_WALLCLOCK); if (ret == -KVM_ENOSYS) { - ret = -ENODEV; + ret = -EOPNOTSUPP; goto err; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit de37faf41ac55619dd329229a9bd9698faeabc52 ]
The existing code uses the RSS profile to determine the IPv4/IPv6 GSO type on all chips older than the 5760X. This won't work on 5750X chips that may be using modified RSS profiles. This commit from 2018 updated the driver to not use the RSS profile for HW GRO packets on newer chips:
50f011b63d8c ("bnxt_en: Update RSS setup and GRO-HW logic according to the latest spec.")
However, a recent commit to add support for the newest 5760X chip broke the logic. If the GRO packet needs to be re-segmented by the stack, the wrong GSO type will cause the packet to be dropped.
Fix it to only use the RSS profile to determine the GSO type on the oldest 5730X/5740X chips, which cannot use the new method and for which it is safe to use the RSS profiles.
Also fix the L3/L4 hash type for RX packets by not using the RSS profile for the same reason. Use the ITYPE field in the RX completion to determine L3/L4 hash types correctly.
Fixes: a7445d69809f ("bnxt_en: Add support for new RX and TPA_START completion types for P7") Reviewed-by: Colin Winegarden colin.winegarden@broadcom.com Reviewed-by: Somnath Kotur somnath.kotur@broadcom.com Reviewed-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Link: https://patch.msgid.link/20241204215918.1692597-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 14 ++++++-------- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 3 +++ 2 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 3d9ee91e1f8b..dafc5a4039cd 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -1518,7 +1518,7 @@ static void bnxt_tpa_start(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, if (TPA_START_IS_IPV6(tpa_start1)) tpa_info->gso_type = SKB_GSO_TCPV6; /* RSS profiles 1 and 3 with extract code 0 for inner 4-tuple */ - else if (cmp_type == CMP_TYPE_RX_L2_TPA_START_CMP && + else if (!BNXT_CHIP_P4_PLUS(bp) && TPA_START_HASH_TYPE(tpa_start) == 3) tpa_info->gso_type = SKB_GSO_TCPV6; tpa_info->rss_hash = @@ -2212,15 +2212,13 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr, if (cmp_type == CMP_TYPE_RX_L2_V3_CMP) { type = bnxt_rss_ext_op(bp, rxcmp); } else { - u32 hash_type = RX_CMP_HASH_TYPE(rxcmp); + u32 itypes = RX_CMP_ITYPES(rxcmp);
- /* RSS profiles 1 and 3 with extract code 0 for inner - * 4-tuple - */ - if (hash_type != 1 && hash_type != 3) - type = PKT_HASH_TYPE_L3; - else + if (itypes == RX_CMP_FLAGS_ITYPE_TCP || + itypes == RX_CMP_FLAGS_ITYPE_UDP) type = PKT_HASH_TYPE_L4; + else + type = PKT_HASH_TYPE_L3; } skb_set_hash(skb, le32_to_cpu(rxcmp->rx_cmp_rss_hash), type); } diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 69231e85140b..1d97219369c5 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -267,6 +267,9 @@ struct rx_cmp { (((le32_to_cpu((rxcmp)->rx_cmp_misc_v1) & RX_CMP_RSS_HASH_TYPE) >>\ RX_CMP_RSS_HASH_TYPE_SHIFT) & RSS_PROFILE_ID_MASK)
+#define RX_CMP_ITYPES(rxcmp) \ + (le32_to_cpu((rxcmp)->rx_cmp_len_flags_type) & RX_CMP_FLAGS_ITYPES_MASK) + #define RX_CMP_V3_HASH_TYPE_LEGACY(rxcmp) \ ((le32_to_cpu((rxcmp)->rx_cmp_misc_v1) & RX_CMP_V3_RSS_EXT_OP_LEGACY) >>\ RX_CMP_V3_RSS_EXT_OP_LEGACY_SHIFT)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit a6d75ecee2bf828ac6a1b52724aba0a977e4eaf4 ]
It is unclear whether the net/lapb code is supposed to be ready for 8021q.
We can at least avoid crashes like the following:
skbuff: skb_under_panic: text:ffffffff8aabe1f6 len:24 put:20 head:ffff88802824a400 data:ffff88802824a3fe tail:0x16 end:0x140 dev:nr0.2 ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:206 ! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 1 UID: 0 PID: 5508 Comm: dhcpcd Not tainted 6.12.0-rc7-syzkaller-00144-g66418447d27b #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024 RIP: 0010:skb_panic net/core/skbuff.c:206 [inline] RIP: 0010:skb_under_panic+0x14b/0x150 net/core/skbuff.c:216 Code: 0d 8d 48 c7 c6 2e 9e 29 8e 48 8b 54 24 08 8b 0c 24 44 8b 44 24 04 4d 89 e9 50 41 54 41 57 41 56 e8 1a 6f 37 02 48 83 c4 20 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 RSP: 0018:ffffc90002ddf638 EFLAGS: 00010282 RAX: 0000000000000086 RBX: dffffc0000000000 RCX: 7a24750e538ff600 RDX: 0000000000000000 RSI: 0000000000000201 RDI: 0000000000000000 RBP: ffff888034a86650 R08: ffffffff8174b13c R09: 1ffff920005bbe60 R10: dffffc0000000000 R11: fffff520005bbe61 R12: 0000000000000140 R13: ffff88802824a400 R14: ffff88802824a3fe R15: 0000000000000016 FS: 00007f2a5990d740(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000110c2631fd CR3: 0000000029504000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> skb_push+0xe5/0x100 net/core/skbuff.c:2636 nr_header+0x36/0x320 net/netrom/nr_dev.c:69 dev_hard_header include/linux/netdevice.h:3148 [inline] vlan_dev_hard_header+0x359/0x480 net/8021q/vlan_dev.c:83 dev_hard_header include/linux/netdevice.h:3148 [inline] lapbeth_data_transmit+0x1f6/0x2a0 drivers/net/wan/lapbether.c:257 lapb_data_transmit+0x91/0xb0 net/lapb/lapb_iface.c:447 lapb_transmit_buffer+0x168/0x1f0 net/lapb/lapb_out.c:149 lapb_establish_data_link+0x84/0xd0 lapb_device_event+0x4e0/0x670 notifier_call_chain+0x19f/0x3e0 kernel/notifier.c:93 __dev_notify_flags+0x207/0x400 dev_change_flags+0xf0/0x1a0 net/core/dev.c:8922 devinet_ioctl+0xa4e/0x1aa0 net/ipv4/devinet.c:1188 inet_ioctl+0x3d7/0x4f0 net/ipv4/af_inet.c:1003 sock_do_ioctl+0x158/0x460 net/socket.c:1227 sock_ioctl+0x626/0x8e0 net/socket.c:1346 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+fb99d1b0c0f81d94a5e2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/67506220.050a0220.17bd51.006c.GAE@google.com/... Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20241204141031.4030267-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/lapb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/lapb.h b/include/net/lapb.h index 124ee122f2c8..6c07420644e4 100644 --- a/include/net/lapb.h +++ b/include/net/lapb.h @@ -4,7 +4,7 @@ #include <linux/lapb.h> #include <linux/refcount.h>
-#define LAPB_HEADER_LEN 20 /* LAPB over Ethernet + a bit more */ +#define LAPB_HEADER_LEN MAX_HEADER /* LAPB over Ethernet + a bit more */
#define LAPB_ACK_PENDING_CONDITION 0x01 #define LAPB_REJECT_CONDITION 0x02
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 0f6ede9fbc747e2553612271bce108f7517e7a45 ]
Ilya reported a slab-use-after-free in dst_destroy [1]
Issue is in xfrm6_net_init() and xfrm4_net_init() :
They copy xfrm[46]_dst_ops_template into net->xfrm.xfrm[46]_dst_ops.
But the net structure might be freed before all the dst callbacks are called. So when dst_destroy() later executes:
if (dst->ops->destroy) dst->ops->destroy(dst);
dst->ops points to the old net->xfrm.xfrm[46]_dst_ops, which has been freed.
See a relevant issue fixed in :
ac888d58869b ("net: do not delay dst_entries_add() in dst_release()")
A fix is to queue the 'struct net' to be freed after one more cleanup_net() round (and the existing rcu_barrier()).
[1]
BUG: KASAN: slab-use-after-free in dst_destroy (net/core/dst.c:112) Read of size 8 at addr ffff8882137ccab0 by task swapper/37/0 Dec 03 05:46:18 kernel: CPU: 37 UID: 0 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 6.12.0 #67 Hardware name: Red Hat KVM/RHEL, BIOS 1.16.1-1.el9 04/01/2014 Call Trace: <IRQ> dump_stack_lvl (lib/dump_stack.c:124) print_address_description.constprop.0 (mm/kasan/report.c:378) ? dst_destroy (net/core/dst.c:112) print_report (mm/kasan/report.c:489) ? dst_destroy (net/core/dst.c:112) ? kasan_addr_to_slab (mm/kasan/common.c:37) kasan_report (mm/kasan/report.c:603) ? dst_destroy (net/core/dst.c:112) ? rcu_do_batch (kernel/rcu/tree.c:2567) dst_destroy (net/core/dst.c:112) rcu_do_batch (kernel/rcu/tree.c:2567) ? __pfx_rcu_do_batch (kernel/rcu/tree.c:2491) ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4339 kernel/locking/lockdep.c:4406) rcu_core (kernel/rcu/tree.c:2825) handle_softirqs (kernel/softirq.c:554) __irq_exit_rcu (kernel/softirq.c:589 kernel/softirq.c:428 kernel/softirq.c:637) irq_exit_rcu (kernel/softirq.c:651) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049 arch/x86/kernel/apic/apic.c:1049) </IRQ> <TASK> asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:702) RIP: 0010:default_idle (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:92 arch/x86/kernel/process.c:743) Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 0f 00 2d c7 c9 27 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90 RSP: 0018:ffff888100d2fe00 EFLAGS: 00000246 RAX: 00000000001870ed RBX: 1ffff110201a5fc2 RCX: ffffffffb61a3e46 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffb3d4d123 RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed11c7e1835d R10: ffff888e3f0c1aeb R11: 0000000000000000 R12: 0000000000000000 R13: ffff888100d20000 R14: dffffc0000000000 R15: 0000000000000000 ? ct_kernel_exit.constprop.0 (kernel/context_tracking.c:148) ? cpuidle_idle_call (kernel/sched/idle.c:186) default_idle_call (./include/linux/cpuidle.h:143 kernel/sched/idle.c:118) cpuidle_idle_call (kernel/sched/idle.c:186) ? __pfx_cpuidle_idle_call (kernel/sched/idle.c:168) ? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5848) ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4347 kernel/locking/lockdep.c:4406) ? tsc_verify_tsc_adjust (arch/x86/kernel/tsc_sync.c:59) do_idle (kernel/sched/idle.c:326) cpu_startup_entry (kernel/sched/idle.c:423 (discriminator 1)) start_secondary (arch/x86/kernel/smpboot.c:202 arch/x86/kernel/smpboot.c:282) ? __pfx_start_secondary (arch/x86/kernel/smpboot.c:232) ? 
soft_restart_cpu (arch/x86/kernel/head_64.S:452) common_startup_64 (arch/x86/kernel/head_64.S:414) </TASK> Dec 03 05:46:18 kernel: Allocated by task 12184: kasan_save_stack (mm/kasan/common.c:48) kasan_save_track (./arch/x86/include/asm/current.h:49 mm/kasan/common.c:60 mm/kasan/common.c:69) __kasan_slab_alloc (mm/kasan/common.c:319 mm/kasan/common.c:345) kmem_cache_alloc_noprof (mm/slub.c:4085 mm/slub.c:4134 mm/slub.c:4141) copy_net_ns (net/core/net_namespace.c:421 net/core/net_namespace.c:480) create_new_namespaces (kernel/nsproxy.c:110) unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4)) ksys_unshare (kernel/fork.c:3313) __x64_sys_unshare (kernel/fork.c:3382) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Dec 03 05:46:18 kernel: Freed by task 11: kasan_save_stack (mm/kasan/common.c:48) kasan_save_track (./arch/x86/include/asm/current.h:49 mm/kasan/common.c:60 mm/kasan/common.c:69) kasan_save_free_info (mm/kasan/generic.c:582) __kasan_slab_free (mm/kasan/common.c:271) kmem_cache_free (mm/slub.c:4579 mm/slub.c:4681) cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:446 net/core/net_namespace.c:647) process_one_work (kernel/workqueue.c:3229) worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391) kthread (kernel/kthread.c:389) ret_from_fork (arch/x86/kernel/process.c:147) ret_from_fork_asm (arch/x86/entry/entry_64.S:257) Dec 03 05:46:18 kernel: Last potentially related work creation: kasan_save_stack (mm/kasan/common.c:48) __kasan_record_aux_stack (mm/kasan/generic.c:541) insert_work (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/workqueue.c:788 kernel/workqueue.c:795 kernel/workqueue.c:2186) __queue_work (kernel/workqueue.c:2340) queue_work_on (kernel/workqueue.c:2391) xfrm_policy_insert (net/xfrm/xfrm_policy.c:1610) xfrm_add_policy (net/xfrm/xfrm_user.c:2116) xfrm_user_rcv_msg (net/xfrm/xfrm_user.c:3321) netlink_rcv_skb (net/netlink/af_netlink.c:2536) xfrm_netlink_rcv (net/xfrm/xfrm_user.c:3344) netlink_unicast (net/netlink/af_netlink.c:1316 net/netlink/af_netlink.c:1342) netlink_sendmsg (net/netlink/af_netlink.c:1886) sock_write_iter (net/socket.c:729 net/socket.c:744 net/socket.c:1165) vfs_write (fs/read_write.c:590 fs/read_write.c:683) ksys_write (fs/read_write.c:736) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Dec 03 05:46:18 kernel: Second to last potentially related work creation: kasan_save_stack (mm/kasan/common.c:48) __kasan_record_aux_stack (mm/kasan/generic.c:541) insert_work (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/workqueue.c:788 kernel/workqueue.c:795 kernel/workqueue.c:2186) __queue_work (kernel/workqueue.c:2340) queue_work_on (kernel/workqueue.c:2391) __xfrm_state_insert (./include/linux/workqueue.h:723 net/xfrm/xfrm_state.c:1150 net/xfrm/xfrm_state.c:1145 net/xfrm/xfrm_state.c:1513) xfrm_state_update (./include/linux/spinlock.h:396 net/xfrm/xfrm_state.c:1940) xfrm_add_sa (net/xfrm/xfrm_user.c:912) xfrm_user_rcv_msg (net/xfrm/xfrm_user.c:3321) netlink_rcv_skb (net/netlink/af_netlink.c:2536) xfrm_netlink_rcv (net/xfrm/xfrm_user.c:3344) netlink_unicast (net/netlink/af_netlink.c:1316 net/netlink/af_netlink.c:1342) netlink_sendmsg (net/netlink/af_netlink.c:1886) sock_write_iter (net/socket.c:729 net/socket.c:744 net/socket.c:1165) vfs_write 
(fs/read_write.c:590 fs/read_write.c:683) ksys_write (fs/read_write.c:736) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Fixes: a8a572a6b5f2 ("xfrm: dst_entries_init() per-net dst_ops") Reported-by: Ilya Maximets i.maximets@ovn.org Closes: https://lore.kernel.org/netdev/CANn89iKKYDVpB=MtmfH7nyv2p=rJWSLedO5k7wSZgtY_... Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://patch.msgid.link/20241204125455.3871859-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/net_namespace.h | 1 + net/core/net_namespace.c | 20 +++++++++++++++++++- 2 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index e67b483cc8bb..9398c8f49953 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -80,6 +80,7 @@ struct net { * or to unregister pernet ops * (pernet_ops_rwsem write locked). */ + struct llist_node defer_free_list; struct llist_node cleanup_list; /* namespaces on death row */
#ifdef CONFIG_KEYS diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index e39479f1c9a4..70fea7c1a4b0 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -443,6 +443,21 @@ static struct net *net_alloc(void) goto out; }
+static LLIST_HEAD(defer_free_list); + +static void net_complete_free(void) +{ + struct llist_node *kill_list; + struct net *net, *next; + + /* Get the list of namespaces to free from last round. */ + kill_list = llist_del_all(&defer_free_list); + + llist_for_each_entry_safe(net, next, kill_list, defer_free_list) + kmem_cache_free(net_cachep, net); + +} + static void net_free(struct net *net) { if (refcount_dec_and_test(&net->passive)) { @@ -451,7 +466,8 @@ static void net_free(struct net *net) /* There should not be any trackers left there. */ ref_tracker_dir_exit(&net->notrefcnt_tracker);
- kmem_cache_free(net_cachep, net); + /* Wait for an extra rcu_barrier() before final free. */ + llist_add(&net->defer_free_list, &defer_free_list); } }
@@ -636,6 +652,8 @@ static void cleanup_net(struct work_struct *work) */ rcu_barrier();
+ net_complete_free(); + /* Finally it is safe to free my network namespace structure */ list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) { list_del_init(&net->exit_list);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 4b01bec25bef62544228bce06db6a3afa5d3d6bb ]
If ocelot_port_add_txtstamp_skb() fails, for example due to a full PTP timestamp FIFO, we must undo the skb_clone_sk() call with kfree_skb(). Otherwise, the reference to the skb clone is lost.
Fixes: 52849bcf0029 ("net: mscc: ocelot: avoid overflowing the PTP timestamp FIFO") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Link: https://patch.msgid.link/20241205145519.1236778-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mscc/ocelot_ptp.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mscc/ocelot_ptp.c b/drivers/net/ethernet/mscc/ocelot_ptp.c index e172638b0601..db00a51a7430 100644 --- a/drivers/net/ethernet/mscc/ocelot_ptp.c +++ b/drivers/net/ethernet/mscc/ocelot_ptp.c @@ -688,8 +688,10 @@ int ocelot_port_txtstamp_request(struct ocelot *ocelot, int port, return -ENOMEM;
err = ocelot_port_add_txtstamp_skb(ocelot, port, *clone); - if (err) + if (err) { + kfree_skb(*clone); return err; + }
OCELOT_SKB_CB(skb)->ptp_cmd = ptp_cmd; OCELOT_SKB_CB(*clone)->ptp_class = ptp_class;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit b6fba4b3f0becb794e274430f3a0839d8ba31262 ]
This condition, theoretically impossible to trigger, is not really handled well. By "continuing", we are skipping the write to SYS_PTP_NXT, which advances the timestamp FIFO to the next entry. So we are reading the same FIFO entry all over again, printing stack traces and eventually killing the kernel.
No real problem has been observed here. This is part of a larger rework of the timestamp IRQ procedure, with this logical change split out into a patch of its own. We will need to "goto next_ts" for other conditions as well.
Fixes: 9fde506e0c53 ("net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Link: https://patch.msgid.link/20241205145519.1236778-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mscc/ocelot_ptp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mscc/ocelot_ptp.c b/drivers/net/ethernet/mscc/ocelot_ptp.c index db00a51a7430..95a5267bc9ce 100644 --- a/drivers/net/ethernet/mscc/ocelot_ptp.c +++ b/drivers/net/ethernet/mscc/ocelot_ptp.c @@ -786,7 +786,7 @@ void ocelot_get_txtstamp(struct ocelot *ocelot) spin_unlock_irqrestore(&port->tx_skbs.lock, flags);
if (WARN_ON(!skb_match)) - continue; + goto next_ts;
if (!ocelot_validate_ptp_skb(skb_match, seqid)) { dev_err_ratelimited(ocelot->dev, @@ -804,7 +804,7 @@ void ocelot_get_txtstamp(struct ocelot *ocelot) shhwtstamps.hwtstamp = ktime_set(ts.tv_sec, ts.tv_nsec); skb_complete_tx_timestamp(skb_match, &shhwtstamps);
- /* Next ts */ +next_ts: ocelot_write(ocelot, SYS_PTP_NXT_PTP_NXT, SYS_PTP_NXT); } }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 0c53cdb95eb4a604062e326636971d96dd9b1b26 ]
ocelot_get_txtstamp() is a threaded IRQ handler, requested explicitly as such by both ocelot_ptp_rdy_irq_handler() and vsc9959_irq_handler().
As such, it runs with IRQs enabled, and not in hardirq context. Thus, ocelot_port_add_txtstamp_skb() has no reason to turn off IRQs, since it cannot be preempted by ocelot_get_txtstamp(). For the same reason, dev_kfree_skb_any_reason() will always evaluate as kfree_skb_reason() in this calling context, so just simplify the dev_kfree_skb_any() call to kfree_skb().
Also, ocelot_port_txtstamp_request() runs from NET_TX softirq context, not from hardirq context. Thus, ocelot_get_txtstamp(), which shares the ocelot_port->tx_skbs.lock lock with it, has no reason to disable hardirqs.
This is part of a larger rework of the TX timestamping procedure. A logical subportion of the rework has been split into a separate change.
Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Link: https://patch.msgid.link/20241205145519.1236778-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: b454abfab525 ("net: mscc: ocelot: be resilient to loss of PTP packets during transmission") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mscc/ocelot_ptp.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/mscc/ocelot_ptp.c b/drivers/net/ethernet/mscc/ocelot_ptp.c index 95a5267bc9ce..d732f99e6391 100644 --- a/drivers/net/ethernet/mscc/ocelot_ptp.c +++ b/drivers/net/ethernet/mscc/ocelot_ptp.c @@ -607,13 +607,12 @@ static int ocelot_port_add_txtstamp_skb(struct ocelot *ocelot, int port, struct sk_buff *clone) { struct ocelot_port *ocelot_port = ocelot->ports[port]; - unsigned long flags;
- spin_lock_irqsave(&ocelot->ts_id_lock, flags); + spin_lock(&ocelot->ts_id_lock);
if (ocelot_port->ptp_skbs_in_flight == OCELOT_MAX_PTP_ID || ocelot->ptp_skbs_in_flight == OCELOT_PTP_FIFO_SIZE) { - spin_unlock_irqrestore(&ocelot->ts_id_lock, flags); + spin_unlock(&ocelot->ts_id_lock); return -EBUSY; }
@@ -630,7 +629,7 @@ static int ocelot_port_add_txtstamp_skb(struct ocelot *ocelot, int port,
skb_queue_tail(&ocelot_port->tx_skbs, clone);
- spin_unlock_irqrestore(&ocelot->ts_id_lock, flags); + spin_unlock(&ocelot->ts_id_lock);
return 0; } @@ -749,7 +748,6 @@ void ocelot_get_txtstamp(struct ocelot *ocelot) u32 val, id, seqid, txport; struct ocelot_port *port; struct timespec64 ts; - unsigned long flags;
val = ocelot_read(ocelot, SYS_PTP_STATUS);
@@ -773,7 +771,7 @@ void ocelot_get_txtstamp(struct ocelot *ocelot)
/* Retrieve its associated skb */ try_again: - spin_lock_irqsave(&port->tx_skbs.lock, flags); + spin_lock(&port->tx_skbs.lock);
skb_queue_walk_safe(&port->tx_skbs, skb, skb_tmp) { if (OCELOT_SKB_CB(skb)->ts_id != id) @@ -783,7 +781,7 @@ void ocelot_get_txtstamp(struct ocelot *ocelot) break; }
- spin_unlock_irqrestore(&port->tx_skbs.lock, flags); + spin_unlock(&port->tx_skbs.lock);
if (WARN_ON(!skb_match)) goto next_ts; @@ -792,7 +790,7 @@ void ocelot_get_txtstamp(struct ocelot *ocelot) dev_err_ratelimited(ocelot->dev, "port %d received stale TX timestamp for seqid %d, discarding\n", txport, seqid); - dev_kfree_skb_any(skb); + kfree_skb(skb); goto try_again; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit b454abfab52543c44b581afc807b9f97fc1e7a3a ]
The Felix DSA driver presents unique challenges that make the simplistic ocelot PTP TX timestamping procedure unreliable: any transmitted packet may be lost in hardware before it ever leaves our local system.
This may happen because there is congestion on the DSA conduit, the switch CPU port or even user port (Qdiscs like taprio may delay packets indefinitely by design).
The technical problem is that the kernel, i.e. ocelot_port_add_txtstamp_skb(), runs out of timestamp IDs eventually, because it never detects that packets are lost, and keeps the IDs of the lost packets on hold indefinitely. The manifestation of the issue once the entire timestamp ID range becomes busy looks like this in dmesg:
mscc_felix 0000:00:00.5: port 0 delivering skb without TX timestamp
mscc_felix 0000:00:00.5: port 1 delivering skb without TX timestamp
At the surface level, we need a timeout timer so that the kernel knows a timestamp ID is available again. But there is a deeper problem with the implementation, which is the monotonically increasing ocelot_port->ts_id. In the presence of packet loss, it will be impossible to detect that and reuse one of the holes created in the range of free timestamp IDs.
What we actually need is a bitmap of 63 timestamp IDs tracking which one is available. That is able to use up holes caused by packet loss, but also gives us a unique opportunity to not implement an actual timer_list for the timeout timer (very complicated in terms of locking).
We could declare a timestamp ID stale only on demand (lazily), i.e. when there is no other timestamp ID available. There are pros and cons to this approach: the implementation is much simpler than per-packet timers would be, but most of the stale packets would be quasi-leaked - not really leaked, but blocked in driver memory, since this algorithm sees no reason to free them.
An improved technique would be to check for stale timestamp IDs every time we allocate a new one. Assuming a constant flux of PTP packets, this avoids stale packets being blocked in memory, but of course, packets lost at the end of the flux are still blocked until the flux resumes (nobody left to kick them out).
Since implementing per-packet timers is way too complicated, this should be good enough.
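As a rough illustration of this allocation scheme (not the driver code; the names, constants and userspace setting are invented for the example), a minimal sketch in C could look like this:

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define MAX_IDS		63	/* hypothetical ID space, like the 63 timestamp IDs */
#define STALE_SECS	5	/* reclaim entries older than this */

struct id_slot {
	bool in_flight;
	time_t issued;
};

static struct id_slot slots[MAX_IDS];

/* Allocate a free ID, lazily reclaiming entries that look lost. */
static int alloc_ts_id(void)
{
	time_t now = time(NULL);
	int i;

	for (i = 0; i < MAX_IDS; i++) {
		if (slots[i].in_flight && now - slots[i].issued > STALE_SECS) {
			fprintf(stderr, "invalidating stale ID %d\n", i);
			slots[i].in_flight = false; /* the driver would also free the skb */
		}
	}

	for (i = 0; i < MAX_IDS; i++) {
		if (!slots[i].in_flight) {
			slots[i].in_flight = true;
			slots[i].issued = now;
			return i;
		}
	}

	return -1; /* everything in flight and nothing stale: caller must give up */
}

static void complete_ts_id(int id)
{
	slots[id].in_flight = false;
}

int main(void)
{
	int id = alloc_ts_id();

	printf("allocated ID %d\n", id); /* a non-congested port keeps reusing ID 0 */
	complete_ts_id(id);
	return 0;
}

The key property is that a hole left by a lost packet is naturally reused, and a stuck entry is only ever reclaimed when someone asks for a new ID.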
Testing procedure:
Persistently block traffic class 5 and try to run PTP on it: $ tc qdisc replace dev swp3 parent root taprio num_tc 8 \ map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time 0 sched-entry S 0xdf 100000 flags 0x2 [ 126.948141] mscc_felix 0000:00:00.5: port 3 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS $ ptp4l -i swp3 -2 -P -m --socket_priority 5 --fault_reset_interval ASAP --logSyncInterval -3 ptp4l[70.351]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[70.354]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[70.358]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE [ 70.394583] mscc_felix 0000:00:00.5: port 3 timestamp id 0 ptp4l[70.406]: timed out while polling for tx timestamp ptp4l[70.406]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[70.406]: port 1 (swp3): send peer delay response failed ptp4l[70.407]: port 1 (swp3): clearing fault immediately ptp4l[70.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1 [ 71.394858] mscc_felix 0000:00:00.5: port 3 timestamp id 1 ptp4l[71.400]: timed out while polling for tx timestamp ptp4l[71.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[71.401]: port 1 (swp3): send peer delay response failed ptp4l[71.401]: port 1 (swp3): clearing fault immediately [ 72.393616] mscc_felix 0000:00:00.5: port 3 timestamp id 2 ptp4l[72.401]: timed out while polling for tx timestamp ptp4l[72.402]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[72.402]: port 1 (swp3): send peer delay response failed ptp4l[72.402]: port 1 (swp3): clearing fault immediately ptp4l[72.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1 [ 73.395291] mscc_felix 0000:00:00.5: port 3 timestamp id 3 ptp4l[73.400]: timed out while polling for tx timestamp ptp4l[73.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[73.400]: port 1 (swp3): send peer delay response failed ptp4l[73.400]: port 1 (swp3): clearing fault immediately [ 74.394282] mscc_felix 0000:00:00.5: port 3 timestamp id 4 ptp4l[74.400]: timed out while polling for tx timestamp ptp4l[74.401]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[74.401]: port 1 (swp3): send peer delay response failed ptp4l[74.401]: port 1 (swp3): clearing fault immediately ptp4l[74.953]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1 [ 75.396830] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost [ 75.405760] mscc_felix 0000:00:00.5: port 3 timestamp id 0 ptp4l[75.410]: timed out while polling for tx timestamp ptp4l[75.411]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it ptp4l[75.411]: port 1 (swp3): send peer delay response failed ptp4l[75.411]: port 1 (swp3): clearing fault immediately (...)
Remove the blocking condition and see that the port recovers: $ same tc command as above, but use "sched-entry S 0xff" instead $ same ptp4l command as above ptp4l[99.489]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[99.490]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[99.492]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE [ 100.403768] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost [ 100.412545] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 1 which seems lost [ 100.421283] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 2 which seems lost [ 100.430015] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 3 which seems lost [ 100.438744] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 4 which seems lost [ 100.447470] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 100.505919] mscc_felix 0000:00:00.5: port 3 timestamp id 0 ptp4l[100.963]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1 [ 101.405077] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 101.507953] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 102.405405] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 102.509391] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 103.406003] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 103.510011] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 104.405601] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 104.510624] mscc_felix 0000:00:00.5: port 3 timestamp id 0 ptp4l[104.965]: selected best master clock d858d7.fffe.00ca6d ptp4l[104.966]: port 1 (swp3): assuming the grand master role ptp4l[104.967]: port 1 (swp3): LISTENING to GRAND_MASTER on RS_GRAND_MASTER [ 105.106201] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.232420] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.359001] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.405500] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.485356] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.511220] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.610938] mscc_felix 0000:00:00.5: port 3 timestamp id 0 [ 105.737237] mscc_felix 0000:00:00.5: port 3 timestamp id 0 (...)
Notice that in this new usage pattern, a non-congested port should basically use timestamp ID 0 all the time, progressing to higher numbers only if there are unacknowledged timestamps in flight. Compare this to the old usage, where the timestamp ID used to monotonically increase modulo OCELOT_MAX_PTP_ID.
In terms of implementation, this simplifies the bookkeeping of the ocelot_port :: ts_id and ptp_skbs_in_flight. Since we need to traverse the list of two-step timestampable skbs for each new packet anyway, the information can already be computed and does not need to be stored. Also, ocelot_port->tx_skbs is always accessed under the switch-wide ocelot->ts_id_lock IRQ-unsafe spinlock, so we don't need the skb queue's lock and can use the unlocked primitives safely.
This problem was actually detected using the tc-taprio offload, and is causing trouble in TSN scenarios, which Felix (NXP LS1028A / VSC9959) supports but Ocelot (VSC7514) does not. Thus, I've selected the commit to blame as the one adding initial timestamping support for the Felix switch.
Fixes: c0bcf537667c ("net: dsa: ocelot: add hardware timestamping support for Felix") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Link: https://patch.msgid.link/20241205145519.1236778-5-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mscc/ocelot_ptp.c | 134 +++++++++++++++---------- include/linux/dsa/ocelot.h | 1 + include/soc/mscc/ocelot.h | 2 - 3 files changed, 80 insertions(+), 57 deletions(-)
diff --git a/drivers/net/ethernet/mscc/ocelot_ptp.c b/drivers/net/ethernet/mscc/ocelot_ptp.c index d732f99e6391..7eb01d1e1ecd 100644 --- a/drivers/net/ethernet/mscc/ocelot_ptp.c +++ b/drivers/net/ethernet/mscc/ocelot_ptp.c @@ -14,6 +14,8 @@ #include <soc/mscc/ocelot.h> #include "ocelot.h"
+#define OCELOT_PTP_TX_TSTAMP_TIMEOUT (5 * HZ) + int ocelot_ptp_gettime64(struct ptp_clock_info *ptp, struct timespec64 *ts) { struct ocelot *ocelot = container_of(ptp, struct ocelot, ptp_info); @@ -603,34 +605,88 @@ int ocelot_get_ts_info(struct ocelot *ocelot, int port, } EXPORT_SYMBOL(ocelot_get_ts_info);
-static int ocelot_port_add_txtstamp_skb(struct ocelot *ocelot, int port, +static struct sk_buff *ocelot_port_dequeue_ptp_tx_skb(struct ocelot *ocelot, + int port, u8 ts_id, + u32 seqid) +{ + struct ocelot_port *ocelot_port = ocelot->ports[port]; + struct sk_buff *skb, *skb_tmp, *skb_match = NULL; + struct ptp_header *hdr; + + spin_lock(&ocelot->ts_id_lock); + + skb_queue_walk_safe(&ocelot_port->tx_skbs, skb, skb_tmp) { + if (OCELOT_SKB_CB(skb)->ts_id != ts_id) + continue; + + /* Check that the timestamp ID is for the expected PTP + * sequenceId. We don't have to test ptp_parse_header() against + * NULL, because we've pre-validated the packet's ptp_class. + */ + hdr = ptp_parse_header(skb, OCELOT_SKB_CB(skb)->ptp_class); + if (seqid != ntohs(hdr->sequence_id)) + continue; + + __skb_unlink(skb, &ocelot_port->tx_skbs); + ocelot->ptp_skbs_in_flight--; + skb_match = skb; + break; + } + + spin_unlock(&ocelot->ts_id_lock); + + return skb_match; +} + +static int ocelot_port_queue_ptp_tx_skb(struct ocelot *ocelot, int port, struct sk_buff *clone) { struct ocelot_port *ocelot_port = ocelot->ports[port]; + DECLARE_BITMAP(ts_id_in_flight, OCELOT_MAX_PTP_ID); + struct sk_buff *skb, *skb_tmp; + unsigned long n;
spin_lock(&ocelot->ts_id_lock);
- if (ocelot_port->ptp_skbs_in_flight == OCELOT_MAX_PTP_ID || - ocelot->ptp_skbs_in_flight == OCELOT_PTP_FIFO_SIZE) { + /* To get a better chance of acquiring a timestamp ID, first flush the + * stale packets still waiting in the TX timestamping queue. They are + * probably lost. + */ + skb_queue_walk_safe(&ocelot_port->tx_skbs, skb, skb_tmp) { + if (time_before(OCELOT_SKB_CB(skb)->ptp_tx_time + + OCELOT_PTP_TX_TSTAMP_TIMEOUT, jiffies)) { + dev_warn_ratelimited(ocelot->dev, + "port %d invalidating stale timestamp ID %u which seems lost\n", + port, OCELOT_SKB_CB(skb)->ts_id); + __skb_unlink(skb, &ocelot_port->tx_skbs); + kfree_skb(skb); + ocelot->ptp_skbs_in_flight--; + } else { + __set_bit(OCELOT_SKB_CB(skb)->ts_id, ts_id_in_flight); + } + } + + if (ocelot->ptp_skbs_in_flight == OCELOT_PTP_FIFO_SIZE) { spin_unlock(&ocelot->ts_id_lock); return -EBUSY; }
- skb_shinfo(clone)->tx_flags |= SKBTX_IN_PROGRESS; - /* Store timestamp ID in OCELOT_SKB_CB(clone)->ts_id */ - OCELOT_SKB_CB(clone)->ts_id = ocelot_port->ts_id; - - ocelot_port->ts_id++; - if (ocelot_port->ts_id == OCELOT_MAX_PTP_ID) - ocelot_port->ts_id = 0; + n = find_first_zero_bit(ts_id_in_flight, OCELOT_MAX_PTP_ID); + if (n == OCELOT_MAX_PTP_ID) { + spin_unlock(&ocelot->ts_id_lock); + return -EBUSY; + }
- ocelot_port->ptp_skbs_in_flight++; + /* Found an available timestamp ID, use it */ + OCELOT_SKB_CB(clone)->ts_id = n; + OCELOT_SKB_CB(clone)->ptp_tx_time = jiffies; ocelot->ptp_skbs_in_flight++; - - skb_queue_tail(&ocelot_port->tx_skbs, clone); + __skb_queue_tail(&ocelot_port->tx_skbs, clone);
spin_unlock(&ocelot->ts_id_lock);
+ dev_dbg_ratelimited(ocelot->dev, "port %d timestamp id %lu\n", port, n); + return 0; }
@@ -686,12 +742,14 @@ int ocelot_port_txtstamp_request(struct ocelot *ocelot, int port, if (!(*clone)) return -ENOMEM;
- err = ocelot_port_add_txtstamp_skb(ocelot, port, *clone); + /* Store timestamp ID in OCELOT_SKB_CB(clone)->ts_id */ + err = ocelot_port_queue_ptp_tx_skb(ocelot, port, *clone); if (err) { kfree_skb(*clone); return err; }
+ skb_shinfo(*clone)->tx_flags |= SKBTX_IN_PROGRESS; OCELOT_SKB_CB(skb)->ptp_cmd = ptp_cmd; OCELOT_SKB_CB(*clone)->ptp_class = ptp_class; } @@ -727,26 +785,14 @@ static void ocelot_get_hwtimestamp(struct ocelot *ocelot, spin_unlock_irqrestore(&ocelot->ptp_clock_lock, flags); }
-static bool ocelot_validate_ptp_skb(struct sk_buff *clone, u16 seqid) -{ - struct ptp_header *hdr; - - hdr = ptp_parse_header(clone, OCELOT_SKB_CB(clone)->ptp_class); - if (WARN_ON(!hdr)) - return false; - - return seqid == ntohs(hdr->sequence_id); -} - void ocelot_get_txtstamp(struct ocelot *ocelot) { int budget = OCELOT_PTP_QUEUE_SZ;
while (budget--) { - struct sk_buff *skb, *skb_tmp, *skb_match = NULL; struct skb_shared_hwtstamps shhwtstamps; u32 val, id, seqid, txport; - struct ocelot_port *port; + struct sk_buff *skb_match; struct timespec64 ts;
val = ocelot_read(ocelot, SYS_PTP_STATUS); @@ -762,36 +808,14 @@ void ocelot_get_txtstamp(struct ocelot *ocelot) txport = SYS_PTP_STATUS_PTP_MESS_TXPORT_X(val); seqid = SYS_PTP_STATUS_PTP_MESS_SEQ_ID(val);
- port = ocelot->ports[txport]; - - spin_lock(&ocelot->ts_id_lock); - port->ptp_skbs_in_flight--; - ocelot->ptp_skbs_in_flight--; - spin_unlock(&ocelot->ts_id_lock); - /* Retrieve its associated skb */ -try_again: - spin_lock(&port->tx_skbs.lock); - - skb_queue_walk_safe(&port->tx_skbs, skb, skb_tmp) { - if (OCELOT_SKB_CB(skb)->ts_id != id) - continue; - __skb_unlink(skb, &port->tx_skbs); - skb_match = skb; - break; - } - - spin_unlock(&port->tx_skbs.lock); - - if (WARN_ON(!skb_match)) + skb_match = ocelot_port_dequeue_ptp_tx_skb(ocelot, txport, id, + seqid); + if (!skb_match) { + dev_warn_ratelimited(ocelot->dev, + "port %d received TX timestamp (seqid %d, ts id %u) for packet previously declared stale\n", + txport, seqid, id); goto next_ts; - - if (!ocelot_validate_ptp_skb(skb_match, seqid)) { - dev_err_ratelimited(ocelot->dev, - "port %d received stale TX timestamp for seqid %d, discarding\n", - txport, seqid); - kfree_skb(skb); - goto try_again; }
/* Get the h/w timestamp */ diff --git a/include/linux/dsa/ocelot.h b/include/linux/dsa/ocelot.h index 6fbfbde68a37..620a3260fc08 100644 --- a/include/linux/dsa/ocelot.h +++ b/include/linux/dsa/ocelot.h @@ -15,6 +15,7 @@ struct ocelot_skb_cb { struct sk_buff *clone; unsigned int ptp_class; /* valid only for clones */ + unsigned long ptp_tx_time; /* valid only for clones */ u32 tstamp_lo; u8 ptp_cmd; u8 ts_id; diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h index 462c653e1017..2db9ae0575b6 100644 --- a/include/soc/mscc/ocelot.h +++ b/include/soc/mscc/ocelot.h @@ -778,7 +778,6 @@ struct ocelot_port {
phy_interface_t phy_mode;
- unsigned int ptp_skbs_in_flight; struct sk_buff_head tx_skbs;
unsigned int trap_proto; @@ -786,7 +785,6 @@ struct ocelot_port { u16 mrp_ring_id;
u8 ptp_cmd; - u8 ts_id;
u8 index;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 43a4166349a254446e7a3db65f721c6a30daccf3 ]
An unsupported RX filter will leave the port with TX timestamping still applied as per the new request, rather than the old setting. When parsing the tx_type, don't apply it just yet, but delay that until after we've parsed the rx_filter as well (and potentially returned -ERANGE for that).
Similarly, copy_to_user() may fail, which is a rare occurrence, but should still be treated by unwinding what was done.
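The shape of the fix is the usual parse-then-commit-then-unwind pattern. The following self-contained sketch mirrors that shape only; the structures and helpers are invented and are not the ocelot API:

#include <errno.h>
#include <stdio.h>

struct cfg { int tx_type; int rx_filter; };
struct port_state { int ptp_cmd; int traps; };

/* Hypothetical stand-in for the real tx_type validation step. */
static int parse_tx_type(int tx_type, int *cmd)
{
	if (tx_type < 0 || tx_type > 2)
		return -ERANGE;
	*cmd = tx_type;
	return 0;
}

static int set_hwtstamp(struct port_state *p, const struct cfg *c, int copy_fails)
{
	int old_cmd = p->ptp_cmd, old_traps = p->traps;
	int cmd, err;

	/* 1. Validate everything into locals, touching no state yet. */
	err = parse_tx_type(c->tx_type, &cmd);
	if (err)
		return err;
	if (c->rx_filter < 0)
		return -ERANGE;

	/* 2. Commit. */
	p->traps = c->rx_filter;
	p->ptp_cmd = cmd;

	/* 3. A late failure (think copy_to_user()) must undo the commit. */
	if (copy_fails) {
		p->traps = old_traps;
		p->ptp_cmd = old_cmd;
		return -EFAULT;
	}

	return 0;
}

int main(void)
{
	struct port_state p = { 0, 0 };
	struct cfg c = { 1, 3 };
	int ret;

	ret = set_hwtstamp(&p, &c, 0);
	printf("ok=%d cmd=%d\n", ret, p.ptp_cmd);	/* committed */
	ret = set_hwtstamp(&p, &c, 1);
	printf("fault=%d cmd=%d\n", ret, p.ptp_cmd);	/* rolled back */
	return 0;
}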
Fixes: 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Link: https://patch.msgid.link/20241205145519.1236778-6-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mscc/ocelot_ptp.c | 59 ++++++++++++++++++-------- 1 file changed, 42 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/mscc/ocelot_ptp.c b/drivers/net/ethernet/mscc/ocelot_ptp.c index 7eb01d1e1ecd..808ce8e68d39 100644 --- a/drivers/net/ethernet/mscc/ocelot_ptp.c +++ b/drivers/net/ethernet/mscc/ocelot_ptp.c @@ -497,6 +497,28 @@ static int ocelot_traps_to_ptp_rx_filter(unsigned int proto) return HWTSTAMP_FILTER_NONE; }
+static int ocelot_ptp_tx_type_to_cmd(int tx_type, int *ptp_cmd) +{ + switch (tx_type) { + case HWTSTAMP_TX_ON: + *ptp_cmd = IFH_REW_OP_TWO_STEP_PTP; + break; + case HWTSTAMP_TX_ONESTEP_SYNC: + /* IFH_REW_OP_ONE_STEP_PTP updates the correctionField, + * what we need to update is the originTimestamp. + */ + *ptp_cmd = IFH_REW_OP_ORIGIN_PTP; + break; + case HWTSTAMP_TX_OFF: + *ptp_cmd = 0; + break; + default: + return -ERANGE; + } + + return 0; +} + int ocelot_hwstamp_get(struct ocelot *ocelot, int port, struct ifreq *ifr) { struct ocelot_port *ocelot_port = ocelot->ports[port]; @@ -523,30 +545,19 @@ EXPORT_SYMBOL(ocelot_hwstamp_get); int ocelot_hwstamp_set(struct ocelot *ocelot, int port, struct ifreq *ifr) { struct ocelot_port *ocelot_port = ocelot->ports[port]; + int ptp_cmd, old_ptp_cmd = ocelot_port->ptp_cmd; bool l2 = false, l4 = false; struct hwtstamp_config cfg; + bool old_l2, old_l4; int err;
if (copy_from_user(&cfg, ifr->ifr_data, sizeof(cfg))) return -EFAULT;
/* Tx type sanity check */ - switch (cfg.tx_type) { - case HWTSTAMP_TX_ON: - ocelot_port->ptp_cmd = IFH_REW_OP_TWO_STEP_PTP; - break; - case HWTSTAMP_TX_ONESTEP_SYNC: - /* IFH_REW_OP_ONE_STEP_PTP updates the correctional field, we - * need to update the origin time. - */ - ocelot_port->ptp_cmd = IFH_REW_OP_ORIGIN_PTP; - break; - case HWTSTAMP_TX_OFF: - ocelot_port->ptp_cmd = 0; - break; - default: - return -ERANGE; - } + err = ocelot_ptp_tx_type_to_cmd(cfg.tx_type, &ptp_cmd); + if (err) + return err;
switch (cfg.rx_filter) { case HWTSTAMP_FILTER_NONE: @@ -571,13 +582,27 @@ int ocelot_hwstamp_set(struct ocelot *ocelot, int port, struct ifreq *ifr) return -ERANGE; }
+ old_l2 = ocelot_port->trap_proto & OCELOT_PROTO_PTP_L2; + old_l4 = ocelot_port->trap_proto & OCELOT_PROTO_PTP_L4; + err = ocelot_setup_ptp_traps(ocelot, port, l2, l4); if (err) return err;
+ ocelot_port->ptp_cmd = ptp_cmd; + cfg.rx_filter = ocelot_traps_to_ptp_rx_filter(ocelot_port->trap_proto);
- return copy_to_user(ifr->ifr_data, &cfg, sizeof(cfg)) ? -EFAULT : 0; + if (copy_to_user(ifr->ifr_data, &cfg, sizeof(cfg))) { + err = -EFAULT; + goto out_restore_ptp_traps; + } + + return 0; +out_restore_ptp_traps: + ocelot_setup_ptp_traps(ocelot, port, old_l2, old_l4); + ocelot_port->ptp_cmd = old_ptp_cmd; + return err; } EXPORT_SYMBOL(ocelot_hwstamp_set);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Philippe Simons simons.philippe@gmail.com
[ Upstream commit f07ae52f5cf6a5584fdf7c8c652f027d90bc8b74 ]
The AXP717 datasheet says that the regulator ramp delay is 15.625 us/step, where a step is 10 mV in our case.
Add an AXP_DESC_RANGES_DELAY macro and update the AXP_DESC_RANGES macro to expand to AXP_DESC_RANGES_DELAY with ramp_delay = 0.
For DCDC4, the step is 100 mV.
Add an AXP_DESC_DELAY macro and update the AXP_DESC macro to expand to AXP_DESC_DELAY with ramp_delay = 0.
This patch fixes crashes when using CPU DVFS.
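For reference, the ramp_delay values used in the diff appear to follow from the figures above, assuming ramp_delay is expressed in uV/us as usual in the regulator framework; a quick back-of-the-envelope check:

#include <stdio.h>

int main(void)
{
	const double step_time_us = 15.625;	/* datasheet: 15.625 us per step */
	const double dcdc1_step_uv = 10000.0;	/* 10 mV step (DCDC1-3) */
	const double dcdc4_step_uv = 100000.0;	/* 100 mV step (DCDC4) */

	/* ramp_delay in uV/us is the step size divided by the step time. */
	printf("DCDC1-3: %.0f uV/us\n", dcdc1_step_uv / step_time_us);	/* 640 */
	printf("DCDC4:   %.0f uV/us\n", dcdc4_step_uv / step_time_us);	/* 6400 */
	return 0;
}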
Signed-off-by: Philippe Simons simons.philippe@gmail.com Tested-by: Hironori KIKUCHI kikuchan98@gmail.com Tested-by: Chris Morgan macromorgan@hotmail.com Reviewed-by: Chen-Yu Tsai wens@csie.org Fixes: d2ac3df75c3a ("regulator: axp20x: add support for the AXP717") Link: https://patch.msgid.link/20241208124308.5630-1-simons.philippe@gmail.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/axp20x-regulator.c | 36 ++++++++++++++++++---------- 1 file changed, 24 insertions(+), 12 deletions(-)
diff --git a/drivers/regulator/axp20x-regulator.c b/drivers/regulator/axp20x-regulator.c index a8e91d9d028b..945d2917b91b 100644 --- a/drivers/regulator/axp20x-regulator.c +++ b/drivers/regulator/axp20x-regulator.c @@ -371,8 +371,8 @@ .ops = &axp20x_ops, \ }
-#define AXP_DESC(_family, _id, _match, _supply, _min, _max, _step, _vreg, \ - _vmask, _ereg, _emask) \ +#define AXP_DESC_DELAY(_family, _id, _match, _supply, _min, _max, _step, _vreg, \ + _vmask, _ereg, _emask, _ramp_delay) \ [_family##_##_id] = { \ .name = (_match), \ .supply_name = (_supply), \ @@ -388,9 +388,15 @@ .vsel_mask = (_vmask), \ .enable_reg = (_ereg), \ .enable_mask = (_emask), \ + .ramp_delay = (_ramp_delay), \ .ops = &axp20x_ops, \ }
+#define AXP_DESC(_family, _id, _match, _supply, _min, _max, _step, _vreg, \ + _vmask, _ereg, _emask) \ + AXP_DESC_DELAY(_family, _id, _match, _supply, _min, _max, _step, _vreg, \ + _vmask, _ereg, _emask, 0) + #define AXP_DESC_SW(_family, _id, _match, _supply, _ereg, _emask) \ [_family##_##_id] = { \ .name = (_match), \ @@ -419,8 +425,8 @@ .ops = &axp20x_ops_fixed \ }
-#define AXP_DESC_RANGES(_family, _id, _match, _supply, _ranges, _n_voltages, \ - _vreg, _vmask, _ereg, _emask) \ +#define AXP_DESC_RANGES_DELAY(_family, _id, _match, _supply, _ranges, _n_voltages, \ + _vreg, _vmask, _ereg, _emask, _ramp_delay) \ [_family##_##_id] = { \ .name = (_match), \ .supply_name = (_supply), \ @@ -436,9 +442,15 @@ .enable_mask = (_emask), \ .linear_ranges = (_ranges), \ .n_linear_ranges = ARRAY_SIZE(_ranges), \ + .ramp_delay = (_ramp_delay), \ .ops = &axp20x_ops_range, \ }
+#define AXP_DESC_RANGES(_family, _id, _match, _supply, _ranges, _n_voltages, \ + _vreg, _vmask, _ereg, _emask) \ + AXP_DESC_RANGES_DELAY(_family, _id, _match, _supply, _ranges, \ + _n_voltages, _vreg, _vmask, _ereg, _emask, 0) + static const int axp209_dcdc2_ldo3_slew_rates[] = { 1600, 800, @@ -781,21 +793,21 @@ static const struct linear_range axp717_dcdc3_ranges[] = { };
static const struct regulator_desc axp717_regulators[] = { - AXP_DESC_RANGES(AXP717, DCDC1, "dcdc1", "vin1", + AXP_DESC_RANGES_DELAY(AXP717, DCDC1, "dcdc1", "vin1", axp717_dcdc1_ranges, AXP717_DCDC1_NUM_VOLTAGES, AXP717_DCDC1_CONTROL, AXP717_DCDC_V_OUT_MASK, - AXP717_DCDC_OUTPUT_CONTROL, BIT(0)), - AXP_DESC_RANGES(AXP717, DCDC2, "dcdc2", "vin2", + AXP717_DCDC_OUTPUT_CONTROL, BIT(0), 640), + AXP_DESC_RANGES_DELAY(AXP717, DCDC2, "dcdc2", "vin2", axp717_dcdc2_ranges, AXP717_DCDC2_NUM_VOLTAGES, AXP717_DCDC2_CONTROL, AXP717_DCDC_V_OUT_MASK, - AXP717_DCDC_OUTPUT_CONTROL, BIT(1)), - AXP_DESC_RANGES(AXP717, DCDC3, "dcdc3", "vin3", + AXP717_DCDC_OUTPUT_CONTROL, BIT(1), 640), + AXP_DESC_RANGES_DELAY(AXP717, DCDC3, "dcdc3", "vin3", axp717_dcdc3_ranges, AXP717_DCDC3_NUM_VOLTAGES, AXP717_DCDC3_CONTROL, AXP717_DCDC_V_OUT_MASK, - AXP717_DCDC_OUTPUT_CONTROL, BIT(2)), - AXP_DESC(AXP717, DCDC4, "dcdc4", "vin4", 1000, 3700, 100, + AXP717_DCDC_OUTPUT_CONTROL, BIT(2), 640), + AXP_DESC_DELAY(AXP717, DCDC4, "dcdc4", "vin4", 1000, 3700, 100, AXP717_DCDC4_CONTROL, AXP717_DCDC_V_OUT_MASK, - AXP717_DCDC_OUTPUT_CONTROL, BIT(3)), + AXP717_DCDC_OUTPUT_CONTROL, BIT(3), 6400), AXP_DESC(AXP717, ALDO1, "aldo1", "aldoin", 500, 3500, 100, AXP717_ALDO1_CONTROL, AXP717_LDO_V_OUT_MASK, AXP717_LDO0_OUTPUT_CONTROL, BIT(0)),
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit c84dda3751e945a67d71cbe3af4474aad24a5794 ]
An aspeed_spi_start_user() call is not balanced by a corresponding aspeed_spi_stop_user() call in the error handling paths. Add the missing calls.
Fixes: e3228ed92893 ("spi: spi-mem: Convert Aspeed SMC driver to spi-mem") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Link: https://patch.msgid.link/4052aa2f9a9ea342fa6af83fa991b55ce5d5819e.1732051814... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-aspeed-smc.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/spi/spi-aspeed-smc.c b/drivers/spi/spi-aspeed-smc.c index bbd417c55e7f..b0e3f307b283 100644 --- a/drivers/spi/spi-aspeed-smc.c +++ b/drivers/spi/spi-aspeed-smc.c @@ -239,7 +239,7 @@ static ssize_t aspeed_spi_read_user(struct aspeed_spi_chip *chip,
ret = aspeed_spi_send_cmd_addr(chip, op->addr.nbytes, offset, op->cmd.opcode); if (ret < 0) - return ret; + goto stop_user;
if (op->dummy.buswidth && op->dummy.nbytes) { for (i = 0; i < op->dummy.nbytes / op->dummy.buswidth; i++) @@ -249,8 +249,9 @@ static ssize_t aspeed_spi_read_user(struct aspeed_spi_chip *chip, aspeed_spi_set_io_mode(chip, io_mode);
aspeed_spi_read_from_ahb(buf, chip->ahb_base, len); +stop_user: aspeed_spi_stop_user(chip); - return 0; + return ret; }
static ssize_t aspeed_spi_write_user(struct aspeed_spi_chip *chip, @@ -261,10 +262,11 @@ static ssize_t aspeed_spi_write_user(struct aspeed_spi_chip *chip, aspeed_spi_start_user(chip); ret = aspeed_spi_send_cmd_addr(chip, op->addr.nbytes, op->addr.val, op->cmd.opcode); if (ret < 0) - return ret; + goto stop_user; aspeed_spi_write_to_ahb(chip->ahb_base, op->data.buf.out, op->data.nbytes); +stop_user: aspeed_spi_stop_user(chip); - return 0; + return ret; }
/* support for 1-1-1, 1-1-2 or 1-1-4 */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Machon daniel.machon@microchip.com
[ Upstream commit f004f2e535e2b66ccbf5ac35f8eaadeac70ad7b7 ]
The FDMA handler is responsible for scheduling a NAPI poll, which will eventually fetch RX packets from the FDMA queue. Currently, the FDMA handler is run in a threaded context. For some reason, this kills performance. Admittedly, I did not do a thorough investigation to see exactly what causes the issue; however, I noticed that in the other driver utilizing the same FDMA engine, we run the FDMA handler in hard IRQ context.
Fix this performance issue by running the FDMA handler in hard IRQ context, not deferring any work to a thread.
Prior to this change, the RX UDP performance was:
Interval        Transfer     Bitrate         Jitter
0.00-10.20 sec  44.6 MBytes  36.7 Mbits/sec  0.027 ms
After this change, the RX UDP performance is:
Interval        Transfer     Bitrate         Jitter
0.00-9.12 sec   1.01 GBytes  953 Mbits/sec   0.020 ms
Fixes: 10615907e9b5 ("net: sparx5: switchdev: adding frame DMA functionality") Signed-off-by: Daniel Machon daniel.machon@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microchip/sparx5/sparx5_main.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_main.c b/drivers/net/ethernet/microchip/sparx5/sparx5_main.c index b64c814eac11..0c4c75b3682f 100644 --- a/drivers/net/ethernet/microchip/sparx5/sparx5_main.c +++ b/drivers/net/ethernet/microchip/sparx5/sparx5_main.c @@ -693,12 +693,11 @@ static int sparx5_start(struct sparx5 *sparx5) err = -ENXIO; if (sparx5->fdma_irq >= 0) { if (GCB_CHIP_ID_REV_ID_GET(sparx5->chip_id) > 0) - err = devm_request_threaded_irq(sparx5->dev, - sparx5->fdma_irq, - NULL, - sparx5_fdma_handler, - IRQF_ONESHOT, - "sparx5-fdma", sparx5); + err = devm_request_irq(sparx5->dev, + sparx5->fdma_irq, + sparx5_fdma_handler, + 0, + "sparx5-fdma", sparx5); if (!err) err = sparx5_fdma_start(sparx5); if (err)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Machon daniel.machon@microchip.com
[ Upstream commit ddd7ba006078a2bef5971b2dc5f8383d47f96207 ]
On port initialization, we configure the maximum frame length accepted by the receive module associated with the port. This value is currently written to the MAX_LEN field of the DEV10G_MAC_ENA_CFG register, when in fact, it should be written to the DEV10G_MAC_MAXLEN_CFG register. Fix this.
Fixes: 946e7fd5053a ("net: sparx5: add port module support") Signed-off-by: Daniel Machon daniel.machon@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microchip/sparx5/sparx5_port.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_port.c b/drivers/net/ethernet/microchip/sparx5/sparx5_port.c index 062e486c002c..672508efce5c 100644 --- a/drivers/net/ethernet/microchip/sparx5/sparx5_port.c +++ b/drivers/net/ethernet/microchip/sparx5/sparx5_port.c @@ -1119,7 +1119,7 @@ int sparx5_port_init(struct sparx5 *sparx5, spx5_inst_rmw(DEV10G_MAC_MAXLEN_CFG_MAX_LEN_SET(ETH_MAXLEN), DEV10G_MAC_MAXLEN_CFG_MAX_LEN, devinst, - DEV10G_MAC_ENA_CFG(0)); + DEV10G_MAC_MAXLEN_CFG(0));
/* Handle Signal Detect in 10G PCS */ spx5_inst_wr(PCS10G_BR_PCS_SD_CFG_SD_POL_SET(sd_pol) |
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
[ Upstream commit 7899ca9f3bd2b008e9a7c41f2a9f1986052d7e96 ]
In acpi_decode_space(), addr->info.mem.caching is checked at the top level for any resource type, but addr->info.mem is part of a union and is thus valid only if the resource type is a memory range.
Move the check inside the preceding switch/case so that it is only executed when the union is of the correct type.
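The hazard is the usual one with tagged unions: a member is only meaningful when the tag says so. A minimal, self-contained sketch of the same pattern (types and names invented, not the ACPI structures):

#include <stdio.h>

enum res_type { RES_MEMORY, RES_IO };

struct resource_desc {
	enum res_type type;
	union {
		struct { int caching; } mem;	/* valid only for RES_MEMORY */
		struct { int decode; } io;	/* valid only for RES_IO */
	} info;
};

static unsigned int decode_flags(const struct resource_desc *r)
{
	unsigned int flags = 0;

	switch (r->type) {
	case RES_MEMORY:
		/* Safe: the union's mem view is the active one here. */
		if (r->info.mem.caching)
			flags |= 0x1;	/* e.g. "prefetchable" */
		break;
	case RES_IO:
		/* Reading r->info.mem.caching here would interpret io bits. */
		if (r->info.io.decode)
			flags |= 0x2;
		break;
	}

	return flags;
}

int main(void)
{
	struct resource_desc io_res = { .type = RES_IO, .info.io.decode = 1 };

	printf("flags=%#x\n", decode_flags(&io_res));
	return 0;
}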
Fixes: fcb29bbcd540 ("ACPI: Add prefetch decoding to the address space parser") Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Link: https://patch.msgid.link/20241202100614.20731-1-ilpo.jarvinen@linux.intel.co... Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/resource.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 7fe842dae1ec..821867de43be 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -250,6 +250,9 @@ static bool acpi_decode_space(struct resource_win *win, switch (addr->resource_type) { case ACPI_MEMORY_RANGE: acpi_dev_memresource_flags(res, len, wp); + + if (addr->info.mem.caching == ACPI_PREFETCHABLE_MEMORY) + res->flags |= IORESOURCE_PREFETCH; break; case ACPI_IO_RANGE: acpi_dev_ioresource_flags(res, len, iodec, @@ -265,9 +268,6 @@ static bool acpi_decode_space(struct resource_win *win, if (addr->producer_consumer == ACPI_PRODUCER) res->flags |= IORESOURCE_WINDOW;
- if (addr->info.mem.caching == ACPI_PREFETCHABLE_MEMORY) - res->flags |= IORESOURCE_PREFETCH; - return !(res->flags & IORESOURCE_DISABLED); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anumula Murali Mohan Reddy anumula@chelsio.com
[ Upstream commit 356983f569c1f5991661fc0050aa263792f50616 ]
t4_set_vf_mac_acl() uses the PF index to set the MAC address, but t4vf_get_vf_mac_acl() uses the port number to get it. This leads to an error when attempting to set a MAC address on the VFs of PF2 and PF3. This patch fixes the issue by using the port number to set the MAC address as well.
Fixes: e0cdac65ba26 ("cxgb4vf: configure ports accessible by the VF") Signed-off-by: Anumula Murali Mohan Reddy anumula@chelsio.com Signed-off-by: Potnuri Bharat Teja bharat@chelsio.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20241206062014.49414-1-anumula@chelsio.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 2 +- drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 2 +- drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 5 +++-- 3 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h index bbf7641a0fc7..7e13cd69f68a 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h @@ -2077,7 +2077,7 @@ void t4_idma_monitor(struct adapter *adapter, struct sge_idma_monitor_state *idma, int hz, int ticks); int t4_set_vf_mac_acl(struct adapter *adapter, unsigned int vf, - unsigned int naddr, u8 *addr); + u8 start, unsigned int naddr, u8 *addr); void t4_tp_pio_read(struct adapter *adap, u32 *buff, u32 nregs, u32 start_index, bool sleep_ok); void t4_tp_tm_pio_read(struct adapter *adap, u32 *buff, u32 nregs, diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index 2418645c8823..fb3933fbb842 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -3246,7 +3246,7 @@ static int cxgb4_mgmt_set_vf_mac(struct net_device *dev, int vf, u8 *mac)
dev_info(pi->adapter->pdev_dev, "Setting MAC %pM on VF %d\n", mac, vf); - ret = t4_set_vf_mac_acl(adap, vf + 1, 1, mac); + ret = t4_set_vf_mac_acl(adap, vf + 1, pi->lport, 1, mac); if (!ret) ether_addr_copy(adap->vfinfo[vf].vf_mac_addr, mac); return ret; diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c index 76de55306c4d..175bf9b13058 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c @@ -10215,11 +10215,12 @@ int t4_load_cfg(struct adapter *adap, const u8 *cfg_data, unsigned int size) * t4_set_vf_mac_acl - Set MAC address for the specified VF * @adapter: The adapter * @vf: one of the VFs instantiated by the specified PF + * @start: The start port id associated with specified VF * @naddr: the number of MAC addresses * @addr: the MAC address(es) to be set to the specified VF */ int t4_set_vf_mac_acl(struct adapter *adapter, unsigned int vf, - unsigned int naddr, u8 *addr) + u8 start, unsigned int naddr, u8 *addr) { struct fw_acl_mac_cmd cmd;
@@ -10234,7 +10235,7 @@ int t4_set_vf_mac_acl(struct adapter *adapter, unsigned int vf, cmd.en_to_len16 = cpu_to_be32((unsigned int)FW_LEN16(cmd)); cmd.nmac = naddr;
- switch (adapter->pf) { + switch (start) { case 3: memcpy(cmd.macaddr3, addr, sizeof(cmd.macaddr3)); break;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Wahren wahrenst@gmx.net
[ Upstream commit 4dba406fac06b009873fe7a28231b9b7e4288b09 ]
Storing the maximum clock speed in module parameter qcaspi_clkspeed has the unintended side effect that the first probed instance defines the value for all other instances. Fix this issue by storing it in max_speed_hz of the relevant SPI device.
This fix keeps the priority of the speed parameter (module parameter, device tree property, driver default). It also takes the opportunity to get rid of the unused member clkspeed.
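A small sketch of the resulting precedence (module parameter, then the SPI device's max_speed_hz from the device tree, then the driver default); the constants are illustrative only and not the driver's actual limits:

#include <stdio.h>

#define SPEED_DEFAULT	 8000000u	/* illustrative stand-in for the driver default */
#define SPEED_MIN	 1000000u	/* illustrative lower bound */
#define SPEED_MAX	16000000u	/* illustrative upper bound */

/* Returns the effective per-device speed, or 0 if it is out of range. */
static unsigned int effective_speed(unsigned int module_param,
				    unsigned int dt_max_speed_hz)
{
	unsigned int hz = dt_max_speed_hz;

	if (module_param)		/* module parameter wins if set */
		hz = module_param;
	else if (!hz)			/* neither param nor DT value: driver default */
		hz = SPEED_DEFAULT;

	if (hz < SPEED_MIN || hz > SPEED_MAX)
		return 0;

	return hz;			/* stored per SPI device, not globally */
}

int main(void)
{
	printf("%u\n", effective_speed(0, 12000000));		/* DT value used */
	printf("%u\n", effective_speed(0, 0));			/* default used */
	printf("%u\n", effective_speed(2000000, 12000000));	/* param overrides */
	return 0;
}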
Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: Stefan Wahren wahrenst@gmx.net Link: https://patch.msgid.link/20241206184643.123399-2-wahrenst@gmx.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qualcomm/qca_spi.c | 24 ++++++++++-------------- drivers/net/ethernet/qualcomm/qca_spi.h | 1 - 2 files changed, 10 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index 8f7ce6b51a1c..a73426a8c429 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -812,7 +812,6 @@ qcaspi_netdev_init(struct net_device *dev)
dev->mtu = QCAFRM_MAX_MTU; dev->type = ARPHRD_ETHER; - qca->clkspeed = qcaspi_clkspeed; qca->burst_len = qcaspi_burst_len; qca->spi_thread = NULL; qca->buffer_size = (QCAFRM_MAX_MTU + VLAN_ETH_HLEN + QCAFRM_HEADER_LEN + @@ -903,17 +902,15 @@ qca_spi_probe(struct spi_device *spi) legacy_mode = of_property_read_bool(spi->dev.of_node, "qca,legacy-mode");
- if (qcaspi_clkspeed == 0) { - if (spi->max_speed_hz) - qcaspi_clkspeed = spi->max_speed_hz; - else - qcaspi_clkspeed = QCASPI_CLK_SPEED; - } + if (qcaspi_clkspeed) + spi->max_speed_hz = qcaspi_clkspeed; + else if (!spi->max_speed_hz) + spi->max_speed_hz = QCASPI_CLK_SPEED;
- if ((qcaspi_clkspeed < QCASPI_CLK_SPEED_MIN) || - (qcaspi_clkspeed > QCASPI_CLK_SPEED_MAX)) { - dev_err(&spi->dev, "Invalid clkspeed: %d\n", - qcaspi_clkspeed); + if (spi->max_speed_hz < QCASPI_CLK_SPEED_MIN || + spi->max_speed_hz > QCASPI_CLK_SPEED_MAX) { + dev_err(&spi->dev, "Invalid clkspeed: %u\n", + spi->max_speed_hz); return -EINVAL; }
@@ -938,14 +935,13 @@ qca_spi_probe(struct spi_device *spi) return -EINVAL; }
- dev_info(&spi->dev, "ver=%s, clkspeed=%d, burst_len=%d, pluggable=%d\n", + dev_info(&spi->dev, "ver=%s, clkspeed=%u, burst_len=%d, pluggable=%d\n", QCASPI_DRV_VERSION, - qcaspi_clkspeed, + spi->max_speed_hz, qcaspi_burst_len, qcaspi_pluggable);
spi->mode = SPI_MODE_3; - spi->max_speed_hz = qcaspi_clkspeed; if (spi_setup(spi) < 0) { dev_err(&spi->dev, "Unable to setup SPI device\n"); return -EFAULT; diff --git a/drivers/net/ethernet/qualcomm/qca_spi.h b/drivers/net/ethernet/qualcomm/qca_spi.h index 8f4808695e82..0831cefc58b8 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.h +++ b/drivers/net/ethernet/qualcomm/qca_spi.h @@ -89,7 +89,6 @@ struct qcaspi { #endif
/* user configurable options */ - u32 clkspeed; u8 legacy_mode; u16 burst_len; };
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Wahren wahrenst@gmx.net
[ Upstream commit becc6399ce3b724cffe9ccb7ef0bff440bb1b62b ]
The module parameter qcaspi_pluggable controls whether the QCA7000 signature should be checked at driver probe (current default) or not. Unfortunately this check could fail in case the chip is temporarily in reset, which isn't under full control of the Linux host. So disable this check per default in order to avoid unexpected probe failures.
Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: Stefan Wahren wahrenst@gmx.net Link: https://patch.msgid.link/20241206184643.123399-3-wahrenst@gmx.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qualcomm/qca_spi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index a73426a8c429..6b4b40c6e1fe 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -53,7 +53,7 @@ MODULE_PARM_DESC(qcaspi_burst_len, "Number of data bytes per burst. Use 1-5000."
#define QCASPI_PLUGGABLE_MIN 0 #define QCASPI_PLUGGABLE_MAX 1 -static int qcaspi_pluggable = QCASPI_PLUGGABLE_MIN; +static int qcaspi_pluggable = QCASPI_PLUGGABLE_MAX; module_param(qcaspi_pluggable, int, 0); MODULE_PARM_DESC(qcaspi_pluggable, "Pluggable SPI connection (yes/no).");
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
[ Upstream commit b2e538a9827dd04ab5273bf4be8eb2edb84357b0 ]
Using WARN() to report the error of symlink creation doesn't give more information than telling that something went wrong, since the usual code path is an lregister callback from each control element creation. Worse, the use of WARN() rather confuses fuzzers into treating this as a serious issue.
This patch downgrades the warning messages to use the normal dev_err() instead of WARN(). To make it clearer, the function name is added to the prefix, too.
Fixes: a135dfb5de15 ("ALSA: led control - add sysfs kcontrol LED marking layer") Reported-by: syzbot+4e7919b09c67ffd198ae@syzkaller.appspotmail.com Closes: https://lore.kernel.org/675664c7.050a0220.a30f1.018c.GAE@google.com Link: https://patch.msgid.link/20241209095614.4273-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/core/control_led.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/sound/core/control_led.c b/sound/core/control_led.c index 65a1ebe87776..e33dfcf863cf 100644 --- a/sound/core/control_led.c +++ b/sound/core/control_led.c @@ -668,10 +668,16 @@ static void snd_ctl_led_sysfs_add(struct snd_card *card) goto cerr; led->cards[card->number] = led_card; snprintf(link_name, sizeof(link_name), "led-%s", led->name); - WARN(sysfs_create_link(&card->ctl_dev->kobj, &led_card->dev.kobj, link_name), - "can't create symlink to controlC%i device\n", card->number); - WARN(sysfs_create_link(&led_card->dev.kobj, &card->card_dev.kobj, "card"), - "can't create symlink to card%i\n", card->number); + if (sysfs_create_link(&card->ctl_dev->kobj, &led_card->dev.kobj, + link_name)) + dev_err(card->dev, + "%s: can't create symlink to controlC%i device\n", + __func__, card->number); + if (sysfs_create_link(&led_card->dev.kobj, &card->card_dev.kobj, + "card")) + dev_err(card->dev, + "%s: can't create symlink to card%i\n", + __func__, card->number);
continue; cerr:
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Venkata Prasad Potturu venkataprasad.potturu@amd.com
[ Upstream commit 984795e76def5c903724b8d6a8228e356bbdf2af ]
With the current implementation, when the ACP driver fails to read the ACPI _WOV entry, the DMI override code is not invoked, which may cause regressions for some BIOS versions.
Add a condition check so that we jump to checking the DMI entries in case the ACP driver fails to read the ACPI _WOV method.
Fixes: 4095cf872084 (ASoC: amd: yc: Fix for enabling DMIC on acp6x via _DSD entry)
Signed-off-by: Venkata Prasad Potturu venkataprasad.potturu@amd.com Link: https://patch.msgid.link/20241210091026.996860-1-venkataprasad.potturu@amd.c... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/amd/yc/acp6x-mach.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/sound/soc/amd/yc/acp6x-mach.c b/sound/soc/amd/yc/acp6x-mach.c index e38c5885dadf..ecf57a6cb7c3 100644 --- a/sound/soc/amd/yc/acp6x-mach.c +++ b/sound/soc/amd/yc/acp6x-mach.c @@ -578,14 +578,19 @@ static int acp6x_probe(struct platform_device *pdev)
handle = ACPI_HANDLE(pdev->dev.parent); ret = acpi_evaluate_integer(handle, "_WOV", NULL, &dmic_status); - if (!ACPI_FAILURE(ret)) + if (!ACPI_FAILURE(ret)) { wov_en = dmic_status; + if (!wov_en) + return -ENODEV; + } else { + /* Incase of ACPI method read failure then jump to check_dmi_entry */ + goto check_dmi_entry; + }
- if (is_dmic_enable && wov_en) + if (is_dmic_enable) platform_set_drvdata(pdev, &acp6x_card); - else - return 0;
+check_dmi_entry: /* check for any DMI overrides */ dmi_id = dmi_first_match(yc_acp_quirk_table); if (dmi_id)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Barker paul.barker.ct@bp.renesas.com
[ Upstream commit ccb84dc8f4a02e7d30ffd388522996546b4d00e1 ]
Update the documentation to match the behaviour of the code.
pm_runtime_resume_and_get() always returns 0 on success, even if __pm_runtime_resume() returns 1.
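A minimal usage sketch of the documented convention follows; it is a kernel-context fragment built around a hypothetical driver function, with only the pm_runtime calls being the real API:

/* Hypothetical driver snippet; only the pm_runtime calls are real API. */
#include <linux/pm_runtime.h>

static int example_start(struct device *dev)
{
	int ret;

	/* Returns 0 on success even if the device was already active. */
	ret = pm_runtime_resume_and_get(dev);
	if (ret < 0)
		return ret;	/* on failure the usage counter was not incremented */

	/* ... access the hardware ... */

	pm_runtime_put(dev);	/* balance the get taken above */
	return 0;
}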
Fixes: 2c412337cfe6 ("PM: runtime: Add documentation for pm_runtime_resume_and_get()") Signed-off-by: Paul Barker paul.barker.ct@bp.renesas.com Link: https://patch.msgid.link/20241203143729.478-1-paul.barker.ct@bp.renesas.com [ rjw: Subject and changelog edits, adjusted new comment formatting ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/power/runtime_pm.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/Documentation/power/runtime_pm.rst b/Documentation/power/runtime_pm.rst index 53d1996460ab..12f429359a82 100644 --- a/Documentation/power/runtime_pm.rst +++ b/Documentation/power/runtime_pm.rst @@ -347,7 +347,9 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
`int pm_runtime_resume_and_get(struct device *dev);` - run pm_runtime_resume(dev) and if successful, increment the device's - usage counter; return the result of pm_runtime_resume + usage counter; returns 0 on success (whether or not the device's + runtime PM status was already 'active') or the error code from + pm_runtime_resume() on failure.
`int pm_request_idle(struct device *dev);` - submit a request to execute the subsystem-level idle callback for the
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: LongPing Wei weilongping@oppo.com
[ Upstream commit 790eb09e59709a1ffc1c64fe4aae2789120851b0 ]
Call bdev_offset_from_zone_start() instead of open-coding it.
Fixes: dd291d77cc90 ("block: Introduce zone write plugging") Signed-off-by: LongPing Wei weilongping@oppo.com Reviewed-by: Damien Le Moal dlemoal@kernel.org Reviewed-by: Bart Van Assche bvanassche@acm.org Link: https://lore.kernel.org/r/20241107020439.1644577-1-weilongping@oppo.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-zoned.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-zoned.c b/block/blk-zoned.c index 6d21693f39b7..767bcbce74fa 100644 --- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -568,7 +568,7 @@ static struct blk_zone_wplug *disk_get_and_lock_zone_wplug(struct gendisk *disk, spin_lock_init(&zwplug->lock); zwplug->flags = 0; zwplug->zone_no = zno; - zwplug->wp_offset = sector & (disk->queue->limits.chunk_sectors - 1); + zwplug->wp_offset = bdev_offset_from_zone_start(disk->part0, sector); bio_list_init(&zwplug->bio_list); INIT_WORK(&zwplug->bio_work, blk_zone_wplug_bio_work); zwplug->disk = disk;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit 24c6843b7393ebc80962b59d7ae71af91bf0dcc1 ]
The 5760X (P7) chip's HW GRO/LRO interface is very similar to that of the previous generation (5750X or P5). However, the aggregation ID fields in the completion structures on P7 have been redefined from 16 bits to 12 bits. The freed up 4 bits are redefined for part of the metadata such as the VLAN ID. The aggregation ID mask was not modified when adding support for P7 chips. Including the extra 4 bits for the aggregation ID can potentially cause the driver to store or fetch the packet header of GRO/LRO packets in the wrong TPA buffer. It may hit the BUG() condition in __skb_pull() because the SKB contains no valid packet header:
kernel BUG at include/linux/skbuff.h:2766! Oops: invalid opcode: 0000 1 PREEMPT SMP NOPTI CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Kdump: loaded Tainted: G OE 6.12.0-rc2+ #7 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: Dell Inc. PowerEdge R760/0VRV9X, BIOS 1.0.1 12/27/2022 RIP: 0010:eth_type_trans+0xda/0x140 Code: 80 00 00 00 eb c1 8b 47 70 2b 47 74 48 8b 97 d0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb a5 <0f> 0b b8 00 01 00 00 eb 9c 48 85 ff 74 eb 31 f6 b9 02 00 00 00 48 RSP: 0018:ff615003803fcc28 EFLAGS: 00010283 RAX: 00000000000022d2 RBX: 0000000000000003 RCX: ff2e8c25da334040 RDX: 0000000000000040 RSI: ff2e8c25c1ce8000 RDI: ff2e8c25869f9000 RBP: ff2e8c258c31c000 R08: ff2e8c25da334000 R09: 0000000000000001 R10: ff2e8c25da3342c0 R11: ff2e8c25c1ce89c0 R12: ff2e8c258e0990b0 R13: ff2e8c25bb120000 R14: ff2e8c25c1ce89c0 R15: ff2e8c25869f9000 FS: 0000000000000000(0000) GS:ff2e8c34be300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f05317e4c8 CR3: 000000108bac6006 CR4: 0000000000773ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> ? die+0x33/0x90 ? do_trap+0xd9/0x100 ? eth_type_trans+0xda/0x140 ? do_error_trap+0x65/0x80 ? eth_type_trans+0xda/0x140 ? exc_invalid_op+0x4e/0x70 ? eth_type_trans+0xda/0x140 ? asm_exc_invalid_op+0x16/0x20 ? eth_type_trans+0xda/0x140 bnxt_tpa_end+0x10b/0x6b0 [bnxt_en] ? bnxt_tpa_start+0x195/0x320 [bnxt_en] bnxt_rx_pkt+0x902/0xd90 [bnxt_en] ? __bnxt_tx_int.constprop.0+0x89/0x300 [bnxt_en] ? kmem_cache_free+0x343/0x440 ? __bnxt_tx_int.constprop.0+0x24f/0x300 [bnxt_en] __bnxt_poll_work+0x193/0x370 [bnxt_en] bnxt_poll_p5+0x9a/0x300 [bnxt_en] ? try_to_wake_up+0x209/0x670 __napi_poll+0x29/0x1b0
Fix it by redefining the aggregation ID mask for P5_PLUS chips to be 12 bits. This will work because the maximum aggregation ID is less than 4096 on all P5_PLUS chips.
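To make the 12-vs-16-bit point concrete, here is a small userspace sketch of extracting the aggregation ID from a completion word; the example layout and values are invented, only the width of the mask carries over:

#include <stdint.h>
#include <stdio.h>

#define AGG_ID_SHIFT	16
#define AGG_ID_MASK_OLD	(0xffffu << AGG_ID_SHIFT)	/* 16 bits: too wide on P7 */
#define AGG_ID_MASK_NEW	(0x0fffu << AGG_ID_SHIFT)	/* 12 bits: the actual field */

int main(void)
{
	/* Example completion word: 12-bit agg ID 0x123, with the 4 bits
	 * above it carrying unrelated metadata (here 0x5).
	 */
	uint32_t cmp = (0x5u << 28) | (0x123u << AGG_ID_SHIFT);

	printf("old mask -> agg id %#x (bogus, includes metadata)\n",
	       (unsigned int)((cmp & AGG_ID_MASK_OLD) >> AGG_ID_SHIFT));
	printf("new mask -> agg id %#x\n",
	       (unsigned int)((cmp & AGG_ID_MASK_NEW) >> AGG_ID_SHIFT));
	return 0;
}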
Fixes: 13d2d3d381ee ("bnxt_en: Add new P7 hardware interface definitions") Reviewed-by: Damodharam Ammepalli damodharam.ammepalli@broadcom.com Reviewed-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Reviewed-by: Andy Gospodarek andrew.gospodarek@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20241209015448.1937766-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 1d97219369c5..9e05704d9445 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -381,7 +381,7 @@ struct rx_agg_cmp { u32 rx_agg_cmp_opaque; __le32 rx_agg_cmp_v; #define RX_AGG_CMP_V (1 << 0) - #define RX_AGG_CMP_AGG_ID (0xffff << 16) + #define RX_AGG_CMP_AGG_ID (0x0fff << 16) #define RX_AGG_CMP_AGG_ID_SHIFT 16 __le32 rx_agg_cmp_unused; }; @@ -419,7 +419,7 @@ struct rx_tpa_start_cmp { #define RX_TPA_START_CMP_V3_RSS_HASH_TYPE_SHIFT 7 #define RX_TPA_START_CMP_AGG_ID (0x7f << 25) #define RX_TPA_START_CMP_AGG_ID_SHIFT 25 - #define RX_TPA_START_CMP_AGG_ID_P5 (0xffff << 16) + #define RX_TPA_START_CMP_AGG_ID_P5 (0x0fff << 16) #define RX_TPA_START_CMP_AGG_ID_SHIFT_P5 16 #define RX_TPA_START_CMP_METADATA1 (0xf << 28) #define RX_TPA_START_CMP_METADATA1_SHIFT 28 @@ -543,7 +543,7 @@ struct rx_tpa_end_cmp { #define RX_TPA_END_CMP_PAYLOAD_OFFSET_SHIFT 16 #define RX_TPA_END_CMP_AGG_ID (0x7f << 25) #define RX_TPA_END_CMP_AGG_ID_SHIFT 25 - #define RX_TPA_END_CMP_AGG_ID_P5 (0xffff << 16) + #define RX_TPA_END_CMP_AGG_ID_P5 (0x0fff << 16) #define RX_TPA_END_CMP_AGG_ID_SHIFT_P5 16
__le32 rx_tpa_end_cmp_tsdelta;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Petr Machata petrm@nvidia.com
[ Upstream commit bbe4b41259a3e255a16d795486d331c1670b4e75 ]
net.ipv4.nexthop_compat_mode was added when nexthop objects were added to provide the view of nexthop objects through the usual lens of the route UAPI. As nexthop objects evolved, the information provided through this lens became incomplete. For example, details of resilient nexthop groups are obviously omitted.
Now that 16-bit nexthop group weights are a thing, the 8-bit UAPI cannot convey the >8-bit weight accurately. Instead of inventing workarounds for an obsolete interface, just document the expectations of inaccuracy.
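A tiny illustration of the documented inaccuracy: the legacy 8-bit field simply cannot represent weights above 255. The values below are arbitrary and the cast is only a stand-in for the narrower UAPI field, not the kernel's exact encoding:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint16_t weight = 300;			/* representable with 16-bit weights */
	uint8_t legacy = (uint8_t)weight;	/* what an 8-bit field can carry */

	printf("real weight %u, 8-bit view %u\n", weight, legacy);	/* 300 vs 44 */
	return 0;
}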
Fixes: b72a6a7ab957 ("net: nexthop: Increase weight to u16") Signed-off-by: Petr Machata petrm@nvidia.com Reviewed-by: Ido Schimmel idosch@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Link: https://patch.msgid.link/b575e32399ccacd09079b2a218255164535123bd.1733740749... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/networking/ip-sysctl.rst | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index eacf8983e230..dcbb6f6caf6d 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -2170,6 +2170,12 @@ nexthop_compat_mode - BOOLEAN understands the new API, this sysctl can be disabled to achieve full performance benefits of the new API by disabling the nexthop expansion and extraneous notifications. + + Note that as a backward-compatible mode, dumping of modern features + might be incomplete or wrong. For example, resilient groups will not be + shown as such, but rather as just a list of next hops. Also weights that + do not fit into 8 bits will show incorrectly. + Default: true (backward compat mode)
fib_notify_on_flag_change - INTEGER
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
[ Upstream commit bb57c81e97e0082abfb0406ed6f67c615c3d206c ]
The cifs_io_request struct (a wrapper around netfs_io_request) holds open the file on the server, even beyond the local Linux file being closed. This can cause problems with Windows-based filesystems as the file's name still exists after deletion until the file is closed, preventing the parent directory from being removed and causing spurious test failures in xfstests due to inability to remove a directory. The symptom looks something like this in the test output:
rm: cannot remove '/mnt/scratch/test/p0/d3': Directory not empty rm: cannot remove '/mnt/scratch/test/p1/dc/dae': Directory not empty
Fix this by waiting in unlink and rename for any outstanding I/O requests to be completed on the target file before removing that file.
Note that this doesn't prevent Linux from trying to start new requests after deletion if it still has the file open locally - something that's perfectly acceptable on a UNIX system.
Note also that whilst I've marked this as fixing the commit to make cifs use netfslib, I don't know that it won't occur before that.
Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Signed-off-by: David Howells dhowells@redhat.com Acked-by: Paulo Alcantara (Red Hat) pc@manguebit.com cc: Jeff Layton jlayton@kernel.org cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/inode.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c index b35fe1075503..fafc07e38663 100644 --- a/fs/smb/client/inode.c +++ b/fs/smb/client/inode.c @@ -1925,6 +1925,7 @@ int cifs_unlink(struct inode *dir, struct dentry *dentry) goto unlink_out; }
+ netfs_wait_for_outstanding_io(inode); cifs_close_deferred_file_under_dentry(tcon, full_path); #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY if (cap_unix(tcon->ses) && (CIFS_UNIX_POSIX_PATH_OPS_CAP & @@ -2442,8 +2443,10 @@ cifs_rename2(struct mnt_idmap *idmap, struct inode *source_dir, }
cifs_close_deferred_file_under_dentry(tcon, from_name); - if (d_inode(target_dentry) != NULL) + if (d_inode(target_dentry) != NULL) { + netfs_wait_for_outstanding_io(d_inode(target_dentry)); cifs_close_deferred_file_under_dentry(tcon, to_name); + }
rc = cifs_do_rename(xid, source_dentry, from_name, target_dentry, to_name);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikita Yushchenko nikita.yoush@cogentembedded.com
[ Upstream commit 5cb099902b6b6292b3a85ffa1bb844e0ba195945 ]
When sending a frame split into multiple descriptors, the hardware processes the descriptors one by one, including writing back DT values. The first descriptor could already be marked as completed while processing of the next descriptors for the same frame is still in progress.
Although only the last descriptor is configured to generate an interrupt, completion of the first descriptor could be noticed by the driver when handling the interrupt for the previous frame.
Currently, the driver stores the skb in the entry that corresponds to the first descriptor. As a result, the skb can be unmapped and freed while the hardware has not yet completed the send. This opens a window for corrupting the data being sent.
Fix this by saving the skb in the entry that corresponds to the last descriptor used to send the frame.
Fixes: d2c96b9d5f83 ("net: rswitch: Add jumbo frames handling for TX") Signed-off-by: Nikita Yushchenko nikita.yoush@cogentembedded.com Reviewed-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Link: https://patch.msgid.link/20241208095004.69468-2-nikita.yoush@cogentembedded.... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/renesas/rswitch.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/renesas/rswitch.c b/drivers/net/ethernet/renesas/rswitch.c index b80aa27a7214..32b32aa7e01f 100644 --- a/drivers/net/ethernet/renesas/rswitch.c +++ b/drivers/net/ethernet/renesas/rswitch.c @@ -1681,8 +1681,9 @@ static netdev_tx_t rswitch_start_xmit(struct sk_buff *skb, struct net_device *nd if (dma_mapping_error(ndev->dev.parent, dma_addr_orig)) goto err_kfree;
- gq->skbs[gq->cur] = skb; - gq->unmap_addrs[gq->cur] = dma_addr_orig; + /* Stored the skb at the last descriptor to avoid skb free before hardware completes send */ + gq->skbs[(gq->cur + nr_desc - 1) % gq->ring_size] = skb; + gq->unmap_addrs[(gq->cur + nr_desc - 1) % gq->ring_size] = dma_addr_orig;
/* DT_FSTART should be set at last. So, this is reverse order. */ for (i = nr_desc; i-- > 0; ) {
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikita Yushchenko nikita.yoush@cogentembedded.com
[ Upstream commit 0c9547e6ccf40455b0574cf589be3b152a3edf5b ]
If the hardware is already transmitting, it can start handling the descriptor being written to immediately after it observes the updated DT field, before the queue is kicked by a write to GWTRC.
If the start_xmit() execution is preempted at an unfortunate moment, this transmission can complete, and its interrupt can be handled, before gq->cur gets updated. With the current completion implementation, this will leave the last entry never completed.
Fix that by changing the completion loop to check DT values directly, instead of depending on gq->cur.
Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"") Signed-off-by: Nikita Yushchenko nikita.yoush@cogentembedded.com Reviewed-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Link: https://patch.msgid.link/20241208095004.69468-3-nikita.yoush@cogentembedded.... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/renesas/rswitch.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/renesas/rswitch.c b/drivers/net/ethernet/renesas/rswitch.c index 32b32aa7e01f..c251becef6f8 100644 --- a/drivers/net/ethernet/renesas/rswitch.c +++ b/drivers/net/ethernet/renesas/rswitch.c @@ -862,13 +862,10 @@ static void rswitch_tx_free(struct net_device *ndev) struct rswitch_ext_desc *desc; struct sk_buff *skb;
- for (; rswitch_get_num_cur_queues(gq) > 0; - gq->dirty = rswitch_next_queue_index(gq, false, 1)) { - desc = &gq->tx_ring[gq->dirty]; - if ((desc->desc.die_dt & DT_MASK) != DT_FEMPTY) - break; - + desc = &gq->tx_ring[gq->dirty]; + while ((desc->desc.die_dt & DT_MASK) == DT_FEMPTY) { dma_rmb(); + skb = gq->skbs[gq->dirty]; if (skb) { rdev->ndev->stats.tx_packets++; @@ -879,7 +876,10 @@ static void rswitch_tx_free(struct net_device *ndev) dev_kfree_skb_any(gq->skbs[gq->dirty]); gq->skbs[gq->dirty] = NULL; } + desc->desc.die_dt = DT_EEMPTY; + gq->dirty = rswitch_next_queue_index(gq, false, 1); + desc = &gq->tx_ring[gq->dirty]; } }
@@ -1685,6 +1685,8 @@ static netdev_tx_t rswitch_start_xmit(struct sk_buff *skb, struct net_device *nd gq->skbs[(gq->cur + nr_desc - 1) % gq->ring_size] = skb; gq->unmap_addrs[(gq->cur + nr_desc - 1) % gq->ring_size] = dma_addr_orig;
+ dma_wmb(); + /* DT_FSTART should be set at last. So, this is reverse order. */ for (i = nr_desc; i-- > 0; ) { desc = &gq->tx_ring[rswitch_next_queue_index(gq, true, i)]; @@ -1695,8 +1697,6 @@ static netdev_tx_t rswitch_start_xmit(struct sk_buff *skb, struct net_device *nd goto err_unmap; }
- wmb(); /* gq->cur must be incremented after die_dt was set */ - gq->cur = rswitch_next_queue_index(gq, true, nr_desc); rswitch_modify(rdev->addr, GWTRC(gq->index), 0, BIT(gq->index % 32));
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikita Yushchenko nikita.yoush@cogentembedded.com
[ Upstream commit bb617328bafa1023d8e9c25a25345a564c66c14f ]
If the error path is taken while filling the descriptors for a frame, the skb pointer is left in the entry. Later, when the ring entry is reused, the same entry could be used as part of a multi-descriptor frame, and the skb for that new frame could be stored in a different entry.
Then, the stale pointer will reach the completion routine and be passed to the release operation.
Fix that by clearing the saved skb pointer in the error path.
Fixes: d2c96b9d5f83 ("net: rswitch: Add jumbo frames handling for TX") Signed-off-by: Nikita Yushchenko nikita.yoush@cogentembedded.com Reviewed-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Link: https://patch.msgid.link/20241208095004.69468-4-nikita.yoush@cogentembedded.... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/renesas/rswitch.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/renesas/rswitch.c b/drivers/net/ethernet/renesas/rswitch.c index c251becef6f8..af0bc95ad6ae 100644 --- a/drivers/net/ethernet/renesas/rswitch.c +++ b/drivers/net/ethernet/renesas/rswitch.c @@ -1703,6 +1703,7 @@ static netdev_tx_t rswitch_start_xmit(struct sk_buff *skb, struct net_device *nd return ret;
err_unmap: + gq->skbs[(gq->cur + nr_desc - 1) % gq->ring_size] = NULL; dma_unmap_single(ndev->dev.parent, dma_addr_orig, skb->len, DMA_TO_DEVICE);
err_kfree:
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikita Yushchenko nikita.yoush@cogentembedded.com
[ Upstream commit 66b7e9f85b8459c823b11e9af69dbf4be5eb6be8 ]
The device tree node saved in the rswitch_device structure is used at several driver locations. So passing this node to of_node_put() after the first use is wrong.
Move of_node_put() for this node to exit paths.
Fixes: b46f1e579329 ("net: renesas: rswitch: Simplify struct phy * handling") Signed-off-by: Nikita Yushchenko nikita.yoush@cogentembedded.com Reviewed-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Link: https://patch.msgid.link/20241208095004.69468-5-nikita.yoush@cogentembedded.... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/renesas/rswitch.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/renesas/rswitch.c b/drivers/net/ethernet/renesas/rswitch.c index af0bc95ad6ae..3b57abada200 100644 --- a/drivers/net/ethernet/renesas/rswitch.c +++ b/drivers/net/ethernet/renesas/rswitch.c @@ -1891,7 +1891,6 @@ static int rswitch_device_alloc(struct rswitch_private *priv, unsigned int index rdev->np_port = rswitch_get_port_node(rdev); rdev->disabled = !rdev->np_port; err = of_get_ethdev_address(rdev->np_port, ndev); - of_node_put(rdev->np_port); if (err) { if (is_valid_ether_addr(rdev->etha->mac_addr)) eth_hw_addr_set(ndev, rdev->etha->mac_addr); @@ -1921,6 +1920,7 @@ static int rswitch_device_alloc(struct rswitch_private *priv, unsigned int index
out_rxdmac: out_get_params: + of_node_put(rdev->np_port); netif_napi_del(&rdev->napi); free_netdev(ndev);
@@ -1934,6 +1934,7 @@ static void rswitch_device_free(struct rswitch_private *priv, unsigned int index
rswitch_txdmac_free(ndev); rswitch_rxdmac_free(ndev); + of_node_put(rdev->np_port); netif_napi_del(&rdev->napi); free_netdev(ndev); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikita Yushchenko nikita.yoush@cogentembedded.com
[ Upstream commit 3dd002f20098b9569f8fd7f8703f364571e2e975 ]
Currently the stop routine of the rswitch driver does not immediately prevent the hardware from continuing to update descriptors and request interrupts.
It can happen that when rswitch_stop() masks the interrupts of the queues of the port being closed, a napi poll for that port is already scheduled or running on a different CPU. When that napi poll completes, it will unmask the interrupts. The unmasked interrupt can then fire after rswitch_stop() returns from the napi_disable() call. The handler won't mask it, because napi_schedule_prep() will return false, and an interrupt storm will happen.
This can't be fixed by making rswitch_stop() call napi_disable() before masking interrupts. In this case, the interrupt storm will happen if an interrupt fires between napi_disable() and the masking.
Fix this by checking the priv->opened_ports bit when unmasking interrupts after a napi poll. For that to be consistent, move the priv->opened_ports changes into spinlock-protected areas, and reorder other operations in rswitch_open() and rswitch_stop() accordingly.
Signed-off-by: Nikita Yushchenko nikita.yoush@cogentembedded.com Reviewed-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"") Link: https://patch.msgid.link/20241209113204.175015-1-nikita.yoush@cogentembedded... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/renesas/rswitch.c | 33 ++++++++++++++------------ 1 file changed, 18 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/renesas/rswitch.c b/drivers/net/ethernet/renesas/rswitch.c index 3b57abada200..9dffb7cf1254 100644 --- a/drivers/net/ethernet/renesas/rswitch.c +++ b/drivers/net/ethernet/renesas/rswitch.c @@ -908,8 +908,10 @@ static int rswitch_poll(struct napi_struct *napi, int budget)
if (napi_complete_done(napi, budget - quota)) { spin_lock_irqsave(&priv->lock, flags); - rswitch_enadis_data_irq(priv, rdev->tx_queue->index, true); - rswitch_enadis_data_irq(priv, rdev->rx_queue->index, true); + if (test_bit(rdev->port, priv->opened_ports)) { + rswitch_enadis_data_irq(priv, rdev->tx_queue->index, true); + rswitch_enadis_data_irq(priv, rdev->rx_queue->index, true); + } spin_unlock_irqrestore(&priv->lock, flags); }
@@ -1538,20 +1540,20 @@ static int rswitch_open(struct net_device *ndev) struct rswitch_device *rdev = netdev_priv(ndev); unsigned long flags;
- phy_start(ndev->phydev); + if (bitmap_empty(rdev->priv->opened_ports, RSWITCH_NUM_PORTS)) + iowrite32(GWCA_TS_IRQ_BIT, rdev->priv->addr + GWTSDIE);
napi_enable(&rdev->napi); - netif_start_queue(ndev);
spin_lock_irqsave(&rdev->priv->lock, flags); + bitmap_set(rdev->priv->opened_ports, rdev->port, 1); rswitch_enadis_data_irq(rdev->priv, rdev->tx_queue->index, true); rswitch_enadis_data_irq(rdev->priv, rdev->rx_queue->index, true); spin_unlock_irqrestore(&rdev->priv->lock, flags);
- if (bitmap_empty(rdev->priv->opened_ports, RSWITCH_NUM_PORTS)) - iowrite32(GWCA_TS_IRQ_BIT, rdev->priv->addr + GWTSDIE); + phy_start(ndev->phydev);
- bitmap_set(rdev->priv->opened_ports, rdev->port, 1); + netif_start_queue(ndev);
return 0; }; @@ -1563,7 +1565,16 @@ static int rswitch_stop(struct net_device *ndev) unsigned long flags;
netif_tx_stop_all_queues(ndev); + + phy_stop(ndev->phydev); + + spin_lock_irqsave(&rdev->priv->lock, flags); + rswitch_enadis_data_irq(rdev->priv, rdev->tx_queue->index, false); + rswitch_enadis_data_irq(rdev->priv, rdev->rx_queue->index, false); bitmap_clear(rdev->priv->opened_ports, rdev->port, 1); + spin_unlock_irqrestore(&rdev->priv->lock, flags); + + napi_disable(&rdev->napi);
if (bitmap_empty(rdev->priv->opened_ports, RSWITCH_NUM_PORTS)) iowrite32(GWCA_TS_IRQ_BIT, rdev->priv->addr + GWTSDID); @@ -1576,14 +1587,6 @@ static int rswitch_stop(struct net_device *ndev) kfree(ts_info); }
- spin_lock_irqsave(&rdev->priv->lock, flags); - rswitch_enadis_data_irq(rdev->priv, rdev->tx_queue->index, false); - rswitch_enadis_data_irq(rdev->priv, rdev->rx_queue->index, false); - spin_unlock_irqrestore(&rdev->priv->lock, flags); - - phy_stop(ndev->phydev); - napi_disable(&rdev->napi); - return 0; };
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shenghao Ding shenghao-ding@ti.com
[ Upstream commit 2aa13da97e2b92d20a8ad4ead10da89f880b64e7 ]
One specific test condition: the default registers of p[j].reg ~ p[j+3].reg are 0, TASDEVICE_REG(0x00, 0x14, 0x38) (PLT_FLAG_REG), TASDEVICE_REG(0x00, 0x14, 0x40) (SINEGAIN_REG), and TASDEVICE_REG(0x00, 0x14, 0x44) (SINEGAIN2_REG). After the first calibration, they are refreshed to TASDEVICE_REG(0x00, 0x1a, 0x20), TASDEVICE_REG(0x00, 0x16, 0x58) (PLT_FLAG_REG), TASDEVICE_REG(0x00, 0x14, 0x44) (SINEGAIN_REG), and TASDEVICE_REG(0x00, 0x16, 0x64) (SINEGAIN2_REG) via the "Calibration Start" kcontrol. In the second calibration, p[j].reg ~ p[j+3].reg have already become tas2781_cali_start_reg. However, p[j+2].reg, TASDEVICE_REG(0x00, 0x14, 0x44) (SINEGAIN_REG), will be refreshed to TASDEVICE_REG(0x00, 0x16, 0x64), which is the third register in the input params of the kcontrol. This is why only the first calibration works; the second and any subsequent calibrations always fail unless the system is rebooted. Of course, if no p[j].reg is in the list of tas2781_cali_start_reg, this stress test works fine.
Fixes: 49e2e353fb0d ("ASoC: tas2781: Add Calibration Kcontrols for Chromebook") Signed-off-by: Shenghao Ding shenghao-ding@ti.com Link: https://patch.msgid.link/20241211043859.1328-1-shenghao-ding@ti.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/tas2781-i2c.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/codecs/tas2781-i2c.c b/sound/soc/codecs/tas2781-i2c.c index 12d093437ba9..1b2f55030c39 100644 --- a/sound/soc/codecs/tas2781-i2c.c +++ b/sound/soc/codecs/tas2781-i2c.c @@ -370,7 +370,7 @@ static void sngl_calib_start(struct tasdevice_priv *tas_priv, int i, tasdevice_dev_read(tas_priv, i, p[j].reg, (int *)&p[j].val[0]); } else { - switch (p[j].reg) { + switch (tas2781_cali_start_reg[j].reg) { case 0: { if (!reg[0]) continue;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Luczaj mhal@rbox.co
[ Upstream commit 3e643e4efa1e87432204b62f9cfdea3b2508c830 ]
The bt_copy_from_sockptr() return value is being misinterpreted by most users: a non-zero result is mistakenly assumed to represent an error code, but actually indicates the number of bytes that could not be copied.
Remove bt_copy_from_sockptr() and adapt callers to use copy_safe_from_sockptr().
For sco_sock_setsockopt() (case BT_CODEC) use copy_struct_from_sockptr() to scrub parts of uninitialized buffer.
Opportunistically, rename `len` to `optlen` in hci_sock_setsockopt_old() and hci_sock_setsockopt().
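A minimal sketch of the converted pattern (the helper below is hypothetical, not code from this series), relying on copy_safe_from_sockptr() returning 0 on success and a negative errno otherwise, which is what the updated callers below depend on:

    #include <linux/sockptr.h>
    #include <linux/types.h>

    /* hypothetical setsockopt helper for a u32 option value */
    static int example_set_u32_opt(sockptr_t optval, unsigned int optlen, u32 *out)
    {
        u32 opt;
        int err;

        err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen);
        if (err)
            return err;    /* already a proper negative errno */

        *out = opt;
        return 0;
    }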
Fixes: 51eda36d33e4 ("Bluetooth: SCO: Fix not validating setsockopt user input") Fixes: a97de7bff13b ("Bluetooth: RFCOMM: Fix not validating setsockopt user input") Fixes: 4f3951242ace ("Bluetooth: L2CAP: Fix not validating setsockopt user input") Fixes: 9e8742cdfc4b ("Bluetooth: ISO: Fix not validating setsockopt user input") Fixes: b2186061d604 ("Bluetooth: hci_sock: Fix not validating setsockopt user input") Reviewed-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Reviewed-by: David Wei dw@davidwei.uk Signed-off-by: Michal Luczaj mhal@rbox.co Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/bluetooth/bluetooth.h | 9 --------- net/bluetooth/hci_sock.c | 14 +++++++------- net/bluetooth/iso.c | 10 +++++----- net/bluetooth/l2cap_sock.c | 20 +++++++++++--------- net/bluetooth/rfcomm/sock.c | 9 ++++----- net/bluetooth/sco.c | 11 ++++++----- 6 files changed, 33 insertions(+), 40 deletions(-)
diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h index f66bc85c6411..e6760c11f007 100644 --- a/include/net/bluetooth/bluetooth.h +++ b/include/net/bluetooth/bluetooth.h @@ -590,15 +590,6 @@ static inline struct sk_buff *bt_skb_sendmmsg(struct sock *sk, return skb; }
-static inline int bt_copy_from_sockptr(void *dst, size_t dst_size, - sockptr_t src, size_t src_size) -{ - if (dst_size > src_size) - return -EINVAL; - - return copy_from_sockptr(dst, src, dst_size); -} - int bt_to_errno(u16 code); __u8 bt_status(int err);
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index 2272e1849ebd..022b86797acd 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -1926,7 +1926,7 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg, }
static int hci_sock_setsockopt_old(struct socket *sock, int level, int optname, - sockptr_t optval, unsigned int len) + sockptr_t optval, unsigned int optlen) { struct hci_ufilter uf = { .opcode = 0 }; struct sock *sk = sock->sk; @@ -1943,7 +1943,7 @@ static int hci_sock_setsockopt_old(struct socket *sock, int level, int optname,
switch (optname) { case HCI_DATA_DIR: - err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, len); + err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen); if (err) break;
@@ -1954,7 +1954,7 @@ static int hci_sock_setsockopt_old(struct socket *sock, int level, int optname, break;
case HCI_TIME_STAMP: - err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, len); + err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen); if (err) break;
@@ -1974,7 +1974,7 @@ static int hci_sock_setsockopt_old(struct socket *sock, int level, int optname, uf.event_mask[1] = *((u32 *) f->event_mask + 1); }
- err = bt_copy_from_sockptr(&uf, sizeof(uf), optval, len); + err = copy_safe_from_sockptr(&uf, sizeof(uf), optval, optlen); if (err) break;
@@ -2005,7 +2005,7 @@ static int hci_sock_setsockopt_old(struct socket *sock, int level, int optname, }
static int hci_sock_setsockopt(struct socket *sock, int level, int optname, - sockptr_t optval, unsigned int len) + sockptr_t optval, unsigned int optlen) { struct sock *sk = sock->sk; int err = 0; @@ -2015,7 +2015,7 @@ static int hci_sock_setsockopt(struct socket *sock, int level, int optname,
if (level == SOL_HCI) return hci_sock_setsockopt_old(sock, level, optname, optval, - len); + optlen);
if (level != SOL_BLUETOOTH) return -ENOPROTOOPT; @@ -2035,7 +2035,7 @@ static int hci_sock_setsockopt(struct socket *sock, int level, int optname, goto done; }
- err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, len); + err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen); if (err) break;
diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c index 5e2d9758bd3c..7212fd6047b9 100644 --- a/net/bluetooth/iso.c +++ b/net/bluetooth/iso.c @@ -1566,7 +1566,7 @@ static int iso_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen); if (err) break;
@@ -1577,7 +1577,7 @@ static int iso_sock_setsockopt(struct socket *sock, int level, int optname, break;
case BT_PKT_STATUS: - err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen); if (err) break;
@@ -1596,7 +1596,7 @@ static int iso_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- err = bt_copy_from_sockptr(&qos, sizeof(qos), optval, optlen); + err = copy_safe_from_sockptr(&qos, sizeof(qos), optval, optlen); if (err) break;
@@ -1617,8 +1617,8 @@ static int iso_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- err = bt_copy_from_sockptr(iso_pi(sk)->base, optlen, optval, - optlen); + err = copy_safe_from_sockptr(iso_pi(sk)->base, optlen, optval, + optlen); if (err) break;
diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c index 18e89e764f3b..3d2553dcdb1b 100644 --- a/net/bluetooth/l2cap_sock.c +++ b/net/bluetooth/l2cap_sock.c @@ -755,7 +755,8 @@ static int l2cap_sock_setsockopt_old(struct socket *sock, int optname, opts.max_tx = chan->max_tx; opts.txwin_size = chan->tx_win;
- err = bt_copy_from_sockptr(&opts, sizeof(opts), optval, optlen); + err = copy_safe_from_sockptr(&opts, sizeof(opts), optval, + optlen); if (err) break;
@@ -800,7 +801,7 @@ static int l2cap_sock_setsockopt_old(struct socket *sock, int optname, break;
case L2CAP_LM: - err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen); if (err) break;
@@ -909,7 +910,7 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname,
sec.level = BT_SECURITY_LOW;
- err = bt_copy_from_sockptr(&sec, sizeof(sec), optval, optlen); + err = copy_safe_from_sockptr(&sec, sizeof(sec), optval, optlen); if (err) break;
@@ -956,7 +957,7 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen); if (err) break;
@@ -970,7 +971,7 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname, break;
case BT_FLUSHABLE: - err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen); if (err) break;
@@ -1004,7 +1005,7 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname,
pwr.force_active = BT_POWER_FORCE_ACTIVE_ON;
- err = bt_copy_from_sockptr(&pwr, sizeof(pwr), optval, optlen); + err = copy_safe_from_sockptr(&pwr, sizeof(pwr), optval, optlen); if (err) break;
@@ -1015,7 +1016,7 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname, break;
case BT_CHANNEL_POLICY: - err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen); if (err) break;
@@ -1046,7 +1047,7 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- err = bt_copy_from_sockptr(&mtu, sizeof(mtu), optval, optlen); + err = copy_safe_from_sockptr(&mtu, sizeof(mtu), optval, optlen); if (err) break;
@@ -1076,7 +1077,8 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- err = bt_copy_from_sockptr(&mode, sizeof(mode), optval, optlen); + err = copy_safe_from_sockptr(&mode, sizeof(mode), optval, + optlen); if (err) break;
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c index 40766f8119ed..913402806fa0 100644 --- a/net/bluetooth/rfcomm/sock.c +++ b/net/bluetooth/rfcomm/sock.c @@ -629,10 +629,9 @@ static int rfcomm_sock_setsockopt_old(struct socket *sock, int optname,
switch (optname) { case RFCOMM_LM: - if (bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen)) { - err = -EFAULT; + err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen); + if (err) break; - }
if (opt & RFCOMM_LM_FIPS) { err = -EINVAL; @@ -685,7 +684,7 @@ static int rfcomm_sock_setsockopt(struct socket *sock, int level, int optname,
sec.level = BT_SECURITY_LOW;
- err = bt_copy_from_sockptr(&sec, sizeof(sec), optval, optlen); + err = copy_safe_from_sockptr(&sec, sizeof(sec), optval, optlen); if (err) break;
@@ -703,7 +702,7 @@ static int rfcomm_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen); if (err) break;
diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c index 1c7252a36866..700abb639a55 100644 --- a/net/bluetooth/sco.c +++ b/net/bluetooth/sco.c @@ -853,7 +853,7 @@ static int sco_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen); if (err) break;
@@ -872,8 +872,8 @@ static int sco_sock_setsockopt(struct socket *sock, int level, int optname,
voice.setting = sco_pi(sk)->setting;
- err = bt_copy_from_sockptr(&voice, sizeof(voice), optval, - optlen); + err = copy_safe_from_sockptr(&voice, sizeof(voice), optval, + optlen); if (err) break;
@@ -898,7 +898,7 @@ static int sco_sock_setsockopt(struct socket *sock, int level, int optname, break;
case BT_PKT_STATUS: - err = bt_copy_from_sockptr(&opt, sizeof(opt), optval, optlen); + err = copy_safe_from_sockptr(&opt, sizeof(opt), optval, optlen); if (err) break;
@@ -941,7 +941,8 @@ static int sco_sock_setsockopt(struct socket *sock, int level, int optname, break; }
- err = bt_copy_from_sockptr(buffer, optlen, optval, optlen); + err = copy_struct_from_sockptr(buffer, sizeof(buffer), optval, + optlen); if (err) { hci_dev_put(hdev); break;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Clark james.clark@linaro.org
[ Upstream commit f7e36d02d771ee14acae1482091718460cffb321 ]
Since the commit referenced in the Fixes: tag, specifying a CPU on hybrid platforms results in an error, because Perf tries to open an extended type event on "any" CPU, which isn't valid. Extended type events can only be opened on CPUs that match the type.
Before (working):
$ perf record --cpu 1 -- true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.385 MB perf.data (7 samples) ]
After (not working):
$ perf record -C 1 -- true WARNING: A requested CPU in '1' is not supported by PMU 'cpu_atom' (CPUs 16-27) for event 'cycles:P' Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cpu_atom/cycles:P/). /bin/dmesg | grep -i perf may provide additional information.
(Ignore the warning message, that's expected and not particularly relevant to this issue).
This is because perf_cpu_map__intersect() of the user specified CPU (1) and one of the PMU's CPUs (16-27) correctly results in an empty (NULL) CPU map. However for the purposes of opening an event, libperf converts empty CPU maps into an any CPU (-1) which the kernel rejects.
Fix it by deleting evsels with empty CPU maps in the specific case where user requested CPU maps are evaluated.
Fixes: 251aa040244a ("perf parse-events: Wildcard most "numeric" events") Reviewed-by: Ian Rogers irogers@google.com Tested-by: Thomas Falcon thomas.falcon@intel.com Signed-off-by: James Clark james.clark@linaro.org Tested-by: Arnaldo Carvalho de Melo acme@redhat.com Link: https://lore.kernel.org/r/20241114160450.295844-2-james.clark@linaro.org Signed-off-by: Namhyung Kim namhyung@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/lib/perf/evlist.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c index c6d67fc9e57e..83c43dc13313 100644 --- a/tools/lib/perf/evlist.c +++ b/tools/lib/perf/evlist.c @@ -47,6 +47,20 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist, */ perf_cpu_map__put(evsel->cpus); evsel->cpus = perf_cpu_map__intersect(evlist->user_requested_cpus, evsel->own_cpus); + + /* + * Empty cpu lists would eventually get opened as "any" so remove + * genuinely empty ones before they're opened in the wrong place. + */ + if (perf_cpu_map__is_empty(evsel->cpus)) { + struct perf_evsel *next = perf_evlist__next(evlist, evsel); + + perf_evlist__remove(evlist, evsel); + /* Keep idx contiguous */ + if (next) + list_for_each_entry_from(next, &evlist->entries, node) + next->idx--; + } } else if (!evsel->own_cpus || evlist->has_user_cpus || (!evsel->requires_cpu && perf_cpu_map__has_any_cpu(evlist->user_requested_cpus))) { /* @@ -80,11 +94,11 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
static void perf_evlist__propagate_maps(struct perf_evlist *evlist) { - struct perf_evsel *evsel; + struct perf_evsel *evsel, *n;
evlist->needs_map_propagation = true;
- perf_evlist__for_each_evsel(evlist, evsel) + list_for_each_entry_safe(evsel, n, &evlist->entries, node) __perf_evlist__propagate_maps(evlist, evsel); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shengjiu Wang shengjiu.wang@nxp.com
[ Upstream commit 7c17f7780a48b5ed36b6d13a06004fac993e75af ]
snd_soc_card_get_kcontrol() was updated to use snd_ctl_find_id_mixer() in commit 897cc72b0837 ("ASoC: soc-card: Use snd_ctl_find_id_mixer() instead of open-coding"), so the iface needs to be fixed up to IFACE_MIXER.
Fixes: 897cc72b0837 ("ASoC: soc-card: Use snd_ctl_find_id_mixer() instead of open-coding") Signed-off-by: Shengjiu Wang shengjiu.wang@nxp.com Link: https://patch.msgid.link/20241126053254.3657344-2-shengjiu.wang@nxp.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/fsl/fsl_xcvr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/fsl/fsl_xcvr.c b/sound/soc/fsl/fsl_xcvr.c index beede7344efd..4341269eb977 100644 --- a/sound/soc/fsl/fsl_xcvr.c +++ b/sound/soc/fsl/fsl_xcvr.c @@ -169,7 +169,7 @@ static int fsl_xcvr_capds_put(struct snd_kcontrol *kcontrol, }
static struct snd_kcontrol_new fsl_xcvr_earc_capds_kctl = { - .iface = SNDRV_CTL_ELEM_IFACE_PCM, + .iface = SNDRV_CTL_ELEM_IFACE_MIXER, .name = "Capabilities Data Structure", .access = SNDRV_CTL_ELEM_ACCESS_READWRITE, .info = fsl_xcvr_type_capds_bytes_info,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shengjiu Wang shengjiu.wang@nxp.com
[ Upstream commit bb76e82bfe57fdd1fe595cb0ccd33159df49ed09 ]
snd_soc_card_get_kcontrol() was updated to use snd_ctl_find_id_mixer() in commit 897cc72b0837 ("ASoC: soc-card: Use snd_ctl_find_id_mixer() instead of open-coding"), so the iface needs to be fixed up to IFACE_MIXER.
Fixes: 897cc72b0837 ("ASoC: soc-card: Use snd_ctl_find_id_mixer() instead of open-coding") Signed-off-by: Shengjiu Wang shengjiu.wang@nxp.com Link: https://patch.msgid.link/20241126053254.3657344-3-shengjiu.wang@nxp.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/fsl/fsl_spdif.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/fsl/fsl_spdif.c b/sound/soc/fsl/fsl_spdif.c index b6ff04f7138a..ee946e0d3f49 100644 --- a/sound/soc/fsl/fsl_spdif.c +++ b/sound/soc/fsl/fsl_spdif.c @@ -1204,7 +1204,7 @@ static struct snd_kcontrol_new fsl_spdif_ctrls[] = { }, /* DPLL lock info get controller */ { - .iface = SNDRV_CTL_ELEM_IFACE_PCM, + .iface = SNDRV_CTL_ELEM_IFACE_MIXER, .name = RX_SAMPLE_RATE_KCONTROL, .access = SNDRV_CTL_ELEM_ACCESS_READ | SNDRV_CTL_ELEM_ACCESS_VOLATILE,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit d92906fd1b940681b4509f7bb8ae737789fb4695 ]
On some systems, neighbor discoveries from ns1 for fec0:42::1 (i.e., the martian trap address) would happen at the wrong time and cause a false-negative test result.
Problem analysis also discovered that the IPv6 martian ping test was broken: the neighbor discoveries it sent, not the echo requests, were inadvertently trapped.
Avoid the race condition by introducing the neighbors to each other upfront. Also pin down the firewall rules to match on echo requests only.
Fixes: efb056e5f1f0 ("netfilter: ip6t_rpfilter: Fix regression with VRF interfaces") Signed-off-by: Phil Sutter phil@nwl.cc Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/netfilter/rpath.sh | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/netfilter/rpath.sh b/tools/testing/selftests/net/netfilter/rpath.sh index 4485fd7675ed..86ec4e68594d 100755 --- a/tools/testing/selftests/net/netfilter/rpath.sh +++ b/tools/testing/selftests/net/netfilter/rpath.sh @@ -61,9 +61,20 @@ ip -net "$ns2" a a 192.168.42.1/24 dev d0 ip -net "$ns1" a a fec0:42::2/64 dev v0 nodad ip -net "$ns2" a a fec0:42::1/64 dev d0 nodad
+# avoid neighbor lookups and enable martian IPv6 pings +ns2_hwaddr=$(ip -net "$ns2" link show dev v0 | \ + sed -n 's, *link/ether ([^ ]*) .*,\1,p') +ns1_hwaddr=$(ip -net "$ns1" link show dev v0 | \ + sed -n 's, *link/ether ([^ ]*) .*,\1,p') +ip -net "$ns1" neigh add fec0:42::1 lladdr "$ns2_hwaddr" nud permanent dev v0 +ip -net "$ns1" neigh add fec0:23::1 lladdr "$ns2_hwaddr" nud permanent dev v0 +ip -net "$ns2" neigh add fec0:42::2 lladdr "$ns1_hwaddr" nud permanent dev d0 +ip -net "$ns2" neigh add fec0:23::2 lladdr "$ns1_hwaddr" nud permanent dev v0 + # firewall matches to test [ -n "$iptables" ] && { common='-t raw -A PREROUTING -s 192.168.0.0/16' + common+=' -p icmp --icmp-type echo-request' if ! ip netns exec "$ns2" "$iptables" $common -m rpfilter;then echo "Cannot add rpfilter rule" exit $ksft_skip @@ -72,6 +83,7 @@ ip -net "$ns2" a a fec0:42::1/64 dev d0 nodad } [ -n "$ip6tables" ] && { common='-t raw -A PREROUTING -s fec0::/16' + common+=' -p icmpv6 --icmpv6-type echo-request' if ! ip netns exec "$ns2" "$ip6tables" $common -m rpfilter;then echo "Cannot add rpfilter rule" exit $ksft_skip @@ -82,8 +94,10 @@ ip -net "$ns2" a a fec0:42::1/64 dev d0 nodad table inet t { chain c { type filter hook prerouting priority raw; - ip saddr 192.168.0.0/16 fib saddr . iif oif exists counter - ip6 saddr fec0::/16 fib saddr . iif oif exists counter + ip saddr 192.168.0.0/16 icmp type echo-request \ + fib saddr . iif oif exists counter + ip6 saddr fec0::/16 icmpv6 type echo-request \ + fib saddr . iif oif exists counter } } EOF
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit f36b01994d68ffc253c8296e2228dfe6e6431c03 ]
Deletion of the last rule referencing a given idletimer may happen at the same time as a read of its file in sysfs:
| ====================================================== | WARNING: possible circular locking dependency detected | 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted | ------------------------------------------------------ | iptables/3303 is trying to acquire lock: | ffff8881057e04b8 (kn->active#48){++++}-{0:0}, at: __kernfs_remove+0x20 | | but task is already holding lock: | ffffffffa0249068 (list_mutex){+.+.}-{3:3}, at: idletimer_tg_destroy_v] | | which lock already depends on the new lock.
A simple reproducer is:
| #!/bin/bash | | while true; do | iptables -A INPUT -i foo -j IDLETIMER --timeout 10 --label "testme" | iptables -D INPUT -i foo -j IDLETIMER --timeout 10 --label "testme" | done & | while true; do | cat /sys/class/xt_idletimer/timers/testme >/dev/null | done
Avoid this by releasing list_mutex right after deleting the element from the list, then continuing with the teardown.
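For illustration, a rough sketch of the reordered teardown (see the diff below for the real change): the point is that sysfs_remove_file() must not be called with list_mutex held, because kernfs removal waits for readers that themselves take list_mutex.

    /* sketch only -- mirrors the structure of the fixed destroy path */
    mutex_lock(&list_mutex);
    if (--timer->refcnt > 0) {
        mutex_unlock(&list_mutex);
        return;
    }
    list_del(&timer->entry);
    mutex_unlock(&list_mutex);    /* dropped before the sysfs teardown */

    timer_shutdown_sync(&timer->timer);
    cancel_work_sync(&timer->work);
    sysfs_remove_file(idletimer_tg_kobj, &timer->attr.attr);
    kfree(timer->attr.attr.name);
    kfree(timer);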
Fixes: 0902b469bd25 ("netfilter: xtables: idletimer target implementation") Signed-off-by: Phil Sutter phil@nwl.cc Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/xt_IDLETIMER.c | 52 +++++++++++++++++++----------------- 1 file changed, 28 insertions(+), 24 deletions(-)
diff --git a/net/netfilter/xt_IDLETIMER.c b/net/netfilter/xt_IDLETIMER.c index f8b25b6f5da7..9869ef3c2ab3 100644 --- a/net/netfilter/xt_IDLETIMER.c +++ b/net/netfilter/xt_IDLETIMER.c @@ -409,21 +409,23 @@ static void idletimer_tg_destroy(const struct xt_tgdtor_param *par)
mutex_lock(&list_mutex);
- if (--info->timer->refcnt == 0) { - pr_debug("deleting timer %s\n", info->label); - - list_del(&info->timer->entry); - timer_shutdown_sync(&info->timer->timer); - cancel_work_sync(&info->timer->work); - sysfs_remove_file(idletimer_tg_kobj, &info->timer->attr.attr); - kfree(info->timer->attr.attr.name); - kfree(info->timer); - } else { + if (--info->timer->refcnt > 0) { pr_debug("decreased refcnt of timer %s to %u\n", info->label, info->timer->refcnt); + mutex_unlock(&list_mutex); + return; }
+ pr_debug("deleting timer %s\n", info->label); + + list_del(&info->timer->entry); mutex_unlock(&list_mutex); + + timer_shutdown_sync(&info->timer->timer); + cancel_work_sync(&info->timer->work); + sysfs_remove_file(idletimer_tg_kobj, &info->timer->attr.attr); + kfree(info->timer->attr.attr.name); + kfree(info->timer); }
static void idletimer_tg_destroy_v1(const struct xt_tgdtor_param *par) @@ -434,25 +436,27 @@ static void idletimer_tg_destroy_v1(const struct xt_tgdtor_param *par)
mutex_lock(&list_mutex);
- if (--info->timer->refcnt == 0) { - pr_debug("deleting timer %s\n", info->label); - - list_del(&info->timer->entry); - if (info->timer->timer_type & XT_IDLETIMER_ALARM) { - alarm_cancel(&info->timer->alarm); - } else { - timer_shutdown_sync(&info->timer->timer); - } - cancel_work_sync(&info->timer->work); - sysfs_remove_file(idletimer_tg_kobj, &info->timer->attr.attr); - kfree(info->timer->attr.attr.name); - kfree(info->timer); - } else { + if (--info->timer->refcnt > 0) { pr_debug("decreased refcnt of timer %s to %u\n", info->label, info->timer->refcnt); + mutex_unlock(&list_mutex); + return; }
+ pr_debug("deleting timer %s\n", info->label); + + list_del(&info->timer->entry); mutex_unlock(&list_mutex); + + if (info->timer->timer_type & XT_IDLETIMER_ALARM) { + alarm_cancel(&info->timer->alarm); + } else { + timer_shutdown_sync(&info->timer->timer); + } + cancel_work_sync(&info->timer->work); + sysfs_remove_file(idletimer_tg_kobj, &info->timer->attr.attr); + kfree(info->timer->attr.attr.name); + kfree(info->timer); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit b04df3da1b5c6f6dc7cdccc37941740c078c4043 ]
nf_tables_chain_destroy() can sleep, so it can't be used from call_rcu callbacks.
Moreover, nf_tables_rule_release() is only safe for error unwinding, while the transaction mutex is held and the to-be-destroyed rule was not exposed to either the dataplane or dumps, as it deactivates+frees without the required synchronize_rcu() in between.
nft_rule_expr_deactivate() callbacks will change the ->use counters of other chains/sets (see e.g. the nft_lookup .deactivate callback); these must be serialized via the transaction mutex.
Also add a few lockdep asserts to make this more explicit.
Calling synchronize_rcu() isn't ideal, but fixing this without is hard and way more intrusive. As-is, we can get:
WARNING: .. net/netfilter/nf_tables_api.c:5515 nft_set_destroy+0x.. Workqueue: events nf_tables_trans_destroy_work RIP: 0010:nft_set_destroy+0x3fe/0x5c0 Call Trace: <TASK> nf_tables_trans_destroy_work+0x6b7/0xad0 process_one_work+0x64a/0xce0 worker_thread+0x613/0x10d0
In case the synchronize_rcu becomes an issue, we can explore alternatives.
One way would be to allocate nft_trans_rule objects + one nft_trans_chain object, deactivate the rules + the chain and then defer the freeing to the nft destroy workqueue. We'd still need to keep the synchronize_rcu path as a fallback to handle -ENOMEM corner cases though.
Reported-by: syzbot+b26935466701e56cfdc2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67478d92.050a0220.253251.0062.GAE@google.com/T/ Fixes: c03d278fdf35 ("netfilter: nf_tables: wait for rcu grace period on net_device removal") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netfilter/nf_tables.h | 4 ---- net/netfilter/nf_tables_api.c | 32 +++++++++++++++---------------- 2 files changed, 15 insertions(+), 21 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 066a3ea33b12..91ae20cb7648 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -1103,7 +1103,6 @@ struct nft_rule_blob { * @name: name of the chain * @udlen: user data length * @udata: user data in the chain - * @rcu_head: rcu head for deferred release * @blob_next: rule blob pointer to the next in the chain */ struct nft_chain { @@ -1121,7 +1120,6 @@ struct nft_chain { char *name; u16 udlen; u8 *udata; - struct rcu_head rcu_head;
/* Only used during control plane commit phase: */ struct nft_rule_blob *blob_next; @@ -1265,7 +1263,6 @@ static inline void nft_use_inc_restore(u32 *use) * @sets: sets in the table * @objects: stateful objects in the table * @flowtables: flow tables in the table - * @net: netnamespace this table belongs to * @hgenerator: handle generator state * @handle: table handle * @use: number of chain references to this table @@ -1285,7 +1282,6 @@ struct nft_table { struct list_head sets; struct list_head objects; struct list_head flowtables; - possible_net_t net; u64 hgenerator; u64 handle; u32 use; diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 4a137afaf0b8..0c5ff4afc370 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -1495,7 +1495,6 @@ static int nf_tables_newtable(struct sk_buff *skb, const struct nfnl_info *info, INIT_LIST_HEAD(&table->sets); INIT_LIST_HEAD(&table->objects); INIT_LIST_HEAD(&table->flowtables); - write_pnet(&table->net, net); table->family = family; table->flags = flags; table->handle = ++nft_net->table_handle; @@ -3884,8 +3883,11 @@ void nf_tables_rule_destroy(const struct nft_ctx *ctx, struct nft_rule *rule) kfree(rule); }
+/* can only be used if rule is no longer visible to dumps */ static void nf_tables_rule_release(const struct nft_ctx *ctx, struct nft_rule *rule) { + lockdep_commit_lock_is_held(ctx->net); + nft_rule_expr_deactivate(ctx, rule, NFT_TRANS_RELEASE); nf_tables_rule_destroy(ctx, rule); } @@ -5650,6 +5652,8 @@ void nf_tables_deactivate_set(const struct nft_ctx *ctx, struct nft_set *set, struct nft_set_binding *binding, enum nft_trans_phase phase) { + lockdep_commit_lock_is_held(ctx->net); + switch (phase) { case NFT_TRANS_PREPARE_ERROR: nft_set_trans_unbind(ctx, set); @@ -11456,19 +11460,6 @@ static void __nft_release_basechain_now(struct nft_ctx *ctx) nf_tables_chain_destroy(ctx->chain); }
-static void nft_release_basechain_rcu(struct rcu_head *head) -{ - struct nft_chain *chain = container_of(head, struct nft_chain, rcu_head); - struct nft_ctx ctx = { - .family = chain->table->family, - .chain = chain, - .net = read_pnet(&chain->table->net), - }; - - __nft_release_basechain_now(&ctx); - put_net(ctx.net); -} - int __nft_release_basechain(struct nft_ctx *ctx) { struct nft_rule *rule; @@ -11483,11 +11474,18 @@ int __nft_release_basechain(struct nft_ctx *ctx) nft_chain_del(ctx->chain); nft_use_dec(&ctx->table->use);
- if (maybe_get_net(ctx->net)) - call_rcu(&ctx->chain->rcu_head, nft_release_basechain_rcu); - else + if (!maybe_get_net(ctx->net)) { __nft_release_basechain_now(ctx); + return 0; + } + + /* wait for ruleset dumps to complete. Owning chain is no longer in + * lists, so new dumps can't find any of these rules anymore. + */ + synchronize_rcu();
+ __nft_release_basechain_now(ctx); + put_net(ctx->net); return 0; } EXPORT_SYMBOL_GPL(__nft_release_basechain);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maxim Levitsky mlevitsk@redhat.com
[ Upstream commit bb1e3eb57d2cc38951f9a9f1b8c298ced175798f ]
Commit 8afefc361209 ("net: mana: Assigning IRQ affinity on HT cores") added the allocation of a temporary 'irqs' array in mana_gd_setup_irqs(), but the code doesn't free this array on the success path.
This was caught by kmemleak.
Fixes: 8afefc361209 ("net: mana: Assigning IRQ affinity on HT cores") Signed-off-by: Maxim Levitsky mlevitsk@redhat.com Reviewed-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Reviewed-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Reviewed-by: Saurabh Sengar ssengar@linux.microsoft.com Reviewed-by: Yury Norov yury.norov@gmail.com Link: https://patch.msgid.link/20241209175751.287738-2-mlevitsk@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microsoft/mana/gdma_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c index ca4ed58f1206..42076c90ce87 100644 --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c @@ -1372,6 +1372,7 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev) gc->max_num_msix = nvec; gc->num_msix_usable = nvec; cpus_read_unlock(); + kfree(irqs); return 0;
free_irq:
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maxim Levitsky mlevitsk@redhat.com
[ Upstream commit 9a5beb6ca6305de5c5210efab0702ea79b62eb39 ]
gc->irq_contexts is not freed if one of the later operations fails.
Suggested-by: Michael Kelley mhklinux@outlook.com Fixes: 8afefc361209 ("net: mana: Assigning IRQ affinity on HT cores") Signed-off-by: Maxim Levitsky mlevitsk@redhat.com Reviewed-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Reviewed-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Reviewed-by: Saurabh Sengar ssengar@linux.microsoft.com Reviewed-by: Yury Norov yury.norov@gmail.com Link: https://patch.msgid.link/20241209175751.287738-3-mlevitsk@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microsoft/mana/gdma_main.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c index 42076c90ce87..0c2ba2fa88c4 100644 --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c @@ -1315,7 +1315,7 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev) GFP_KERNEL); if (!gc->irq_contexts) { err = -ENOMEM; - goto free_irq_vector; + goto free_irq_array; }
for (i = 0; i < nvec; i++) { @@ -1385,8 +1385,9 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev) }
kfree(gc->irq_contexts); - kfree(irqs); gc->irq_contexts = NULL; +free_irq_array: + kfree(irqs); free_irq_vector: cpus_read_unlock(); pci_free_irq_vectors(pdev);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit acfcdb78d5d4cdb78e975210c8825b9a112463f6 ]
With this port schedule:
tc qdisc replace dev $send_if parent root handle 100 taprio \ num_tc 8 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ map 0 1 2 3 4 5 6 7 \ base-time 0 cycle-time 10000 \ sched-entry S 01 1250 \ sched-entry S 02 1250 \ sched-entry S 04 1250 \ sched-entry S 08 1250 \ sched-entry S 10 1250 \ sched-entry S 20 1250 \ sched-entry S 40 1250 \ sched-entry S 80 1250 \ flags 2
ptp4l would fail to take TX timestamps of Pdelay_Resp messages like this:
increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug ptp4l[4134.168]: port 2: send peer delay response failed
It turns out that the driver can't take their TX timestamps because it can't transmit them in the first place. And there's nothing special about the Pdelay_Resp packets - they're just regular 68 byte packets. But with this taprio configuration, the switch would refuse to send even the ETH_ZLEN minimum packet size.
This should have definitely not been the case. When applying the taprio config, the driver prints:
mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
and thus, everything under 132 bytes - ETH_FCS_LEN should have been sent without problems. Yet it's not.
For the forwarding path, the configuration is fine, yet packets injected from Linux get stuck with this schedule no matter what.
The first hint that the static guard bands are the cause of the problem is that reverting Michael Walle's commit 297c4de6f780 ("net: dsa: felix: re-enable TAS guard band mode") made things work. It must be that the guard bands are calculated incorrectly.
I remembered that there is a magic constant in the driver, set to 33 ns for no logical reason other than experimentation, which says "never let the static guard bands get so large as to leave less than this amount of remaining space in the time slot, because the queue system will refuse to schedule packets otherwise, and they will get stuck". I had a hunch that my previous experimentally-determined value was only good for packets coming from the forwarding path, and that the CPU injection path needed more.
I came to the new value of 35 ns through binary search, after seeing that with 544 ns (the bit time required to send the Pdelay_Resp packet at gigabit) it works. Again, this is purely experimental, there's no logic and the manual doesn't say anything.
The new driver prints for this schedule look like this:
mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
So yes, the maximum MTU is now even smaller by 1 byte than before. This is maybe counter-intuitive, but makes more sense with a diagram of one time slot.
Before:
Gate open                                      Gate close
|                                                       |
v         1250 ns total time slot duration              v
<------------------------------------------------------->
<----><-------------------------------------------------->
 33 ns                1217 ns
 useful               static guard band
After:
Gate open                                      Gate close
|                                                       |
v         1250 ns total time slot duration              v
<------------------------------------------------------->
<-----><------------------------------------------------->
 35 ns                1215 ns
 useful               static guard band
The static guard band implemented by this switch hardware directly determines the maximum allowable MTU for that traffic class. The larger it is, the earlier the switch will stop scheduling frames for transmission, because otherwise they might overrun the gate close time (and avoiding that is the entire purpose of Michael's patch). So, we now have guard bands smaller by 2 ns, thus, in this particular case, we lose a byte of the maximum MTU.
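As a back-of-the-envelope check (assumed numbers, not the driver's exact computation: 8 ns per byte at 1 Gbps and 20 octets of L1 overhead per frame for preamble, SFD and interframe gap), the 2 ns change in the guard band maps onto the one-octet change seen in the driver prints above:

    #include <stdio.h>

    int main(void)
    {
        const int slot_ns = 1250;     /* gate time slot from the schedule */
        const int ns_per_byte = 8;    /* 1 Gbps */
        const int l1_overhead = 20;   /* preamble + SFD + IFG, in octets */

        for (int useful_ns = 33; useful_ns <= 35; useful_ns += 2) {
            int guard_ns = slot_ns - useful_ns;
            int max_octets = guard_ns / ns_per_byte - l1_overhead;

            printf("useful %d ns -> guard band %d ns -> max %d octets incl. FCS\n",
                   useful_ns, guard_ns, max_octets);
        }
        return 0;
    }

With these assumptions the output reproduces the 132 vs 131 octet limits quoted from the driver logs.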
Fixes: 11afdc6526de ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Reviewed-by: Michael Walle mwalle@kernel.org Link: https://patch.msgid.link/20241210132640.3426788-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/ocelot/felix_vsc9959.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c index 0102a82e88cc..940f1b71226d 100644 --- a/drivers/net/dsa/ocelot/felix_vsc9959.c +++ b/drivers/net/dsa/ocelot/felix_vsc9959.c @@ -24,7 +24,7 @@ #define VSC9959_NUM_PORTS 6
#define VSC9959_TAS_GCL_ENTRY_MAX 63 -#define VSC9959_TAS_MIN_GATE_LEN_NS 33 +#define VSC9959_TAS_MIN_GATE_LEN_NS 35 #define VSC9959_VCAP_POLICER_BASE 63 #define VSC9959_VCAP_POLICER_MAX 383 #define VSC9959_SWITCH_PCI_BAR 4 @@ -1056,11 +1056,15 @@ static void vsc9959_mdio_bus_free(struct ocelot *ocelot) mdiobus_free(felix->imdio); }
-/* The switch considers any frame (regardless of size) as eligible for - * transmission if the traffic class gate is open for at least 33 ns. +/* The switch considers any frame (regardless of size) as eligible + * for transmission if the traffic class gate is open for at least + * VSC9959_TAS_MIN_GATE_LEN_NS. + * * Overruns are prevented by cropping an interval at the end of the gate time - * slot for which egress scheduling is blocked, but we need to still keep 33 ns - * available for one packet to be transmitted, otherwise the port tc will hang. + * slot for which egress scheduling is blocked, but we need to still keep + * VSC9959_TAS_MIN_GATE_LEN_NS available for one packet to be transmitted, + * otherwise the port tc will hang. + * * This function returns the size of a gate interval that remains available for * setting the guard band, after reserving the space for one egress frame. */ @@ -1303,7 +1307,8 @@ static void vsc9959_tas_guard_bands_update(struct ocelot *ocelot, int port) * per-tc static guard band lengths, so it reduces the * useful gate interval length. Therefore, be careful * to calculate a guard band (and therefore max_sdu) - * that still leaves 33 ns available in the time slot. + * that still leaves VSC9959_TAS_MIN_GATE_LEN_NS + * available in the time slot. */ max_sdu = div_u64(remaining_gate_len_ps, picos_per_byte); /* A TC gate may be completely closed, which is a
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Ottens martin.ottens@fau.de
[ Upstream commit f8d4bc455047cf3903cd6f85f49978987dbb3027 ]
In general, 'qlen' of any classful qdisc should keep track of the number of packets that the qdisc itself and all of its children holds. In case of netem, 'qlen' only accounts for the packets in its internal tfifo. When netem is used with a child qdisc, the child qdisc can use 'qdisc_tree_reduce_backlog' to inform its parent, netem, about created or dropped SKBs. This function updates 'qlen' and the backlog statistics of netem, but netem does not account for changes made by a child qdisc. 'qlen' then indicates the wrong number of packets in the tfifo.

If a child qdisc creates new SKBs during enqueue and informs its parent about this, netem's 'qlen' value is increased. When netem dequeues the newly created SKBs from the child, the 'qlen' in netem is not updated. If 'qlen' reaches the configured sch->limit, the enqueue function stops working, even though the tfifo is not full.
Reproduce the bug: Ensure that the sender machine has GSO enabled. Configure netem as root qdisc and tbf as its child on the outgoing interface of the machine as follows:

$ tc qdisc add dev <oif> root handle 1: netem delay 100ms limit 100
$ tc qdisc add dev <oif> parent 1:0 tbf rate 50Mbit burst 1542 latency 50ms
Send bulk TCP traffic out via this interface, e.g., by running an iPerf3 client on the machine. Check the qdisc statistics:

$ tc -s qdisc show dev <oif>
Statistics after 10s of iPerf3 TCP test before the fix (note that netem's backlog > limit, netem stopped accepting packets):

qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
 Sent 2767766 bytes 1848 pkt (dropped 652, overlimits 0 requeues 0)
 backlog 4294528236b 1155p requeues 0
qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
 Sent 2767766 bytes 1848 pkt (dropped 327, overlimits 7601 requeues 0)
 backlog 0b 0p requeues 0

Statistics after the fix:

qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
 Sent 37766372 bytes 24974 pkt (dropped 9, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
 Sent 37766372 bytes 24974 pkt (dropped 327, overlimits 96017 requeues 0)
 backlog 0b 0p requeues 0
tbf segments the GSO SKBs (tbf_segment) and updates the netem's 'qlen'. The interface fully stops transferring packets and "locks". In this case, the child qdisc and tfifo are empty, but 'qlen' indicates the tfifo is at its limit and no more packets are accepted.
This patch adds a counter for the entries in the tfifo. Netem's 'qlen' is only decreased when a packet is returned by its dequeue function, and not during enqueuing into the child qdisc. External updates to 'qlen' are thus accounted for and only the behavior of the backlog statistics changes. As in other qdiscs, 'qlen' then keeps track of how many packets are held in netem and all of its children. As before, sch->limit remains as the maximum number of packets in the tfifo. The same applies to netem's backlog statistics.
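As a loose illustration of that accounting rule, here is a toy userspace model (hypothetical names and numbers, not the qdisc code): t_len alone is checked against the limit, while qlen only drops once a packet actually leaves via dequeue.

    #include <assert.h>
    #include <stdio.h>

    struct netem_model {
        int t_len;  /* packets in netem's internal tfifo */
        int qlen;   /* packets held by netem and its child */
        int child;  /* packets held by the child qdisc */
        int limit;
    };

    static int enqueue(struct netem_model *q)
    {
        if (q->t_len >= q->limit)   /* the old code compared qlen here */
            return -1;
        q->t_len++;
        q->qlen++;
        return 0;
    }

    /* Time to send: one packet moves from the tfifo into the child, which may
     * segment it and report the extra SKBs back to its parent.
     */
    static void send_to_child(struct netem_model *q, int extra_segs)
    {
        q->t_len--;
        q->child += 1 + extra_segs;
        q->qlen += extra_segs;
    }

    static void dequeue_from_child(struct netem_model *q)
    {
        q->child--;
        q->qlen--;                  /* only now does netem's qlen drop */
    }

    int main(void)
    {
        struct netem_model q = { .limit = 100 };

        assert(enqueue(&q) == 0);
        send_to_child(&q, 3);       /* e.g. tbf segmenting one GSO skb */
        for (int i = 0; i < 4; i++)
            dequeue_from_child(&q);

        printf("t_len=%d qlen=%d child=%d\n", q.t_len, q.qlen, q.child);
        return 0;
    }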
Fixes: 50612537e9ab ("netem: fix classful handling") Signed-off-by: Martin Ottens martin.ottens@fau.de Acked-by: Jamal Hadi Salim jhs@mojatatu.com Link: https://patch.msgid.link/20241210131412.1837202-1-martin.ottens@fau.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_netem.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c index 39382ee1e331..3b519adc0125 100644 --- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -78,6 +78,8 @@ struct netem_sched_data { struct sk_buff *t_head; struct sk_buff *t_tail;
+ u32 t_len; + /* optional qdisc for classful handling (NULL at netem init) */ struct Qdisc *qdisc;
@@ -382,6 +384,7 @@ static void tfifo_reset(struct Qdisc *sch) rtnl_kfree_skbs(q->t_head, q->t_tail); q->t_head = NULL; q->t_tail = NULL; + q->t_len = 0; }
static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch) @@ -411,6 +414,7 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch) rb_link_node(&nskb->rbnode, parent, p); rb_insert_color(&nskb->rbnode, &q->t_root); } + q->t_len++; sch->q.qlen++; }
@@ -517,7 +521,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, 1<<get_random_u32_below(8); }
- if (unlikely(sch->q.qlen >= sch->limit)) { + if (unlikely(q->t_len >= sch->limit)) { /* re-link segs, so that qdisc_drop_all() frees them all */ skb->next = segs; qdisc_drop_all(skb, sch, to_free); @@ -701,8 +705,8 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch) tfifo_dequeue: skb = __qdisc_dequeue_head(&sch->q); if (skb) { - qdisc_qstats_backlog_dec(sch, skb); deliver: + qdisc_qstats_backlog_dec(sch, skb); qdisc_bstats_update(sch, skb); return skb; } @@ -718,8 +722,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
if (time_to_send <= now && q->slot.slot_next <= now) { netem_erase_head(q, skb); - sch->q.qlen--; - qdisc_qstats_backlog_dec(sch, skb); + q->t_len--; skb->next = NULL; skb->prev = NULL; /* skb->dev shares skb->rbnode area, @@ -746,16 +749,21 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch) if (net_xmit_drop_count(err)) qdisc_qstats_drop(sch); qdisc_tree_reduce_backlog(sch, 1, pkt_len); + sch->qstats.backlog -= pkt_len; + sch->q.qlen--; } goto tfifo_dequeue; } + sch->q.qlen--; goto deliver; }
if (q->qdisc) { skb = q->qdisc->ops->dequeue(q->qdisc); - if (skb) + if (skb) { + sch->q.qlen--; goto deliver; + } }
qdisc_watchdog_schedule_ns(&q->watchdog, @@ -765,8 +773,10 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
if (q->qdisc) { skb = q->qdisc->ops->dequeue(q->qdisc); - if (skb) + if (skb) { + sch->q.qlen--; goto deliver; + } } return NULL; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
[ Upstream commit d2516c3a53705f783bb6868df0f4a2b977898a71 ]
Both bonding and team driver have logic to derive the base feature flags before iterating over their slave devices to refine the set via netdev_increment_features().
Add a small helper netdev_base_features() so this can be reused instead of having it open-coded multiple times.
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Cc: Nikolay Aleksandrov razor@blackwall.org Cc: Ido Schimmel idosch@idosch.org Cc: Jiri Pirko jiri@nvidia.com Reviewed-by: Hangbin Liu liuhangbin@gmail.com Reviewed-by: Nikolay Aleksandrov razor@blackwall.org Link: https://patch.msgid.link/20241210141245.327886-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni pabeni@redhat.com Stable-dep-of: d064ea7fe2a2 ("bonding: Fix initial {vlan,mpls}_feature set in bond_compute_features") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 4 +--- drivers/net/team/team_core.c | 3 +-- include/linux/netdev_features.h | 7 +++++++ 3 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 15e0f14d0d49..166910693fd7 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1520,9 +1520,7 @@ static netdev_features_t bond_fix_features(struct net_device *dev, struct slave *slave;
mask = features; - - features &= ~NETIF_F_ONE_FOR_ALL; - features |= NETIF_F_ALL_FOR_ALL; + features = netdev_base_features(features);
bond_for_each_slave(bond, slave, iter) { features = netdev_increment_features(features, diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c index 18191d5a8bd4..481c8df8842f 100644 --- a/drivers/net/team/team_core.c +++ b/drivers/net/team/team_core.c @@ -2012,8 +2012,7 @@ static netdev_features_t team_fix_features(struct net_device *dev, netdev_features_t mask;
mask = features; - features &= ~NETIF_F_ONE_FOR_ALL; - features |= NETIF_F_ALL_FOR_ALL; + features = netdev_base_features(features);
rcu_read_lock(); list_for_each_entry_rcu(port, &team->port_list, list) { diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index 66e7d26b70a4..11be70a7929f 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -253,4 +253,11 @@ static inline int find_next_netdev_feature(u64 feature, unsigned long start) NETIF_F_GSO_UDP_TUNNEL | \ NETIF_F_GSO_UDP_TUNNEL_CSUM)
+static inline netdev_features_t netdev_base_features(netdev_features_t features) +{ + features &= ~NETIF_F_ONE_FOR_ALL; + features |= NETIF_F_ALL_FOR_ALL; + return features; +} + #endif /* _LINUX_NETDEV_FEATURES_H */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
[ Upstream commit d064ea7fe2a24938997b5e88e6b61cbb0a4bb906 ]
If a bonding device has slave devices, then the current logic to derive the feature set for the master bond device is limited in that flags which are fully supported by the underlying slave devices cannot be propagated up to vlan devices which sit on top of bond devices. Instead, these get blindly masked out via current NETIF_F_ALL_FOR_ALL logic.
vlan_features and mpls_features should reuse netdev_base_features() in order to derive the set in the same way as ndo_fix_features does, before iterating through the slave devices to refine the feature set.
Fixes: a9b3ace44c7d ("bonding: fix vlan_features computing") Fixes: 2e770b507ccd ("net: bonding: Inherit MPLS features from slave devices") Signed-off-by: Daniel Borkmann daniel@iogearbox.net Cc: Nikolay Aleksandrov razor@blackwall.org Cc: Ido Schimmel idosch@idosch.org Cc: Jiri Pirko jiri@nvidia.com Reviewed-by: Nikolay Aleksandrov razor@blackwall.org Reviewed-by: Hangbin Liu liuhangbin@gmail.com Link: https://patch.msgid.link/20241210141245.327886-2-daniel@iogearbox.net Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 166910693fd7..dfad7b6f9f35 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1562,8 +1562,9 @@ static void bond_compute_features(struct bonding *bond)
if (!bond_has_slaves(bond)) goto done; - vlan_features &= NETIF_F_ALL_FOR_ALL; - mpls_features &= NETIF_F_ALL_FOR_ALL; + + vlan_features = netdev_base_features(vlan_features); + mpls_features = netdev_base_features(mpls_features);
bond_for_each_slave(bond, slave, iter) { vlan_features = netdev_increment_features(vlan_features,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
[ Upstream commit 77b11c8bf3a228d1c63464534c2dcc8d9c8bf7ff ]
Drivers like mlx5 expose NIC's vlan_features such as NETIF_F_GSO_UDP_TUNNEL & NETIF_F_GSO_UDP_TUNNEL_CSUM which are later not propagated when the underlying devices are bonded and a vlan device created on top of the bond.
Right now, the more cumbersome workaround for this is to create the vlan on top of the mlx5 and then enslave the vlan devices to a bond.
To fix this, add NETIF_F_GSO_ENCAP_ALL to BOND_VLAN_FEATURES such that bond_compute_features() can probe and propagate the vlan_features from the slave devices up to the vlan device.
Given the following bond:
# ethtool -i enp2s0f{0,1}np{0,1}
driver: mlx5_core
[...]

# ethtool -k enp2s0f0np0 | grep udp
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: on
rx-udp-gro-forwarding: off

# ethtool -k enp2s0f1np1 | grep udp
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: on
rx-udp-gro-forwarding: off

# ethtool -k bond0 | grep udp
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: off [fixed]
rx-udp-gro-forwarding: off
Before:
# ethtool -k bond0.100 | grep udp
tx-udp_tnl-segmentation: off [requested on]
tx-udp_tnl-csum-segmentation: off [requested on]
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: off [fixed]
rx-udp-gro-forwarding: off
After:
# ethtool -k bond0.100 | grep udp
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: off [fixed]
rx-udp-gro-forwarding: off
Various users have run into this reporting performance issues when configuring Cilium in vxlan tunneling mode and having the combination of bond & vlan for the core devices connecting the Kubernetes cluster to the outside world.
Fixes: a9b3ace44c7d ("bonding: fix vlan_features computing") Signed-off-by: Daniel Borkmann daniel@iogearbox.net Cc: Nikolay Aleksandrov razor@blackwall.org Cc: Ido Schimmel idosch@idosch.org Cc: Jiri Pirko jiri@nvidia.com Reviewed-by: Nikolay Aleksandrov razor@blackwall.org Reviewed-by: Hangbin Liu liuhangbin@gmail.com Link: https://patch.msgid.link/20241210141245.327886-3-daniel@iogearbox.net Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index dfad7b6f9f35..4d73abae503d 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1534,6 +1534,7 @@ static netdev_features_t bond_fix_features(struct net_device *dev,
#define BOND_VLAN_FEATURES (NETIF_F_HW_CSUM | NETIF_F_SG | \ NETIF_F_FRAGLIST | NETIF_F_GSO_SOFTWARE | \ + NETIF_F_GSO_ENCAP_ALL | \ NETIF_F_HIGHDMA | NETIF_F_LRO)
#define BOND_ENC_FEATURES (NETIF_F_HW_CSUM | NETIF_F_SG | \
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
[ Upstream commit 396699ac2cb1bc4e3485abb48a1e3e41956de0cd ]
As with bonding, fix the calculation of vlan_features to reuse netdev_base_features() in order to derive the set in the same way as ndo_fix_features does, before iterating through the slave devices to refine the feature set.
Fixes: 3625920b62c3 ("teaming: fix vlan_features computing") Signed-off-by: Daniel Borkmann daniel@iogearbox.net Cc: Nikolay Aleksandrov razor@blackwall.org Cc: Ido Schimmel idosch@idosch.org Cc: Jiri Pirko jiri@nvidia.com Reviewed-by: Nikolay Aleksandrov razor@blackwall.org Reviewed-by: Hangbin Liu liuhangbin@gmail.com Link: https://patch.msgid.link/20241210141245.327886-4-daniel@iogearbox.net Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/team/team_core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c index 481c8df8842f..ddd9ae7085c7 100644 --- a/drivers/net/team/team_core.c +++ b/drivers/net/team/team_core.c @@ -991,13 +991,14 @@ static void team_port_disable(struct team *team, static void __team_compute_features(struct team *team) { struct team_port *port; - netdev_features_t vlan_features = TEAM_VLAN_FEATURES & - NETIF_F_ALL_FOR_ALL; + netdev_features_t vlan_features = TEAM_VLAN_FEATURES; netdev_features_t enc_features = TEAM_ENC_FEATURES; unsigned short max_hard_header_len = ETH_HLEN; unsigned int dst_release_flag = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
+ vlan_features = netdev_base_features(vlan_features); + rcu_read_lock(); list_for_each_entry_rcu(port, &team->port_list, list) { vlan_features = netdev_increment_features(vlan_features,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
[ Upstream commit 98712844589e06d9aa305b5077169942139fd75c ]
Similar to bonding driver, add NETIF_F_GSO_ENCAP_ALL to TEAM_VLAN_FEATURES in order to support slave devices which propagate NETIF_F_GSO_UDP_TUNNEL & NETIF_F_GSO_UDP_TUNNEL_CSUM as vlan_features.
Fixes: 3625920b62c3 ("teaming: fix vlan_features computing") Signed-off-by: Daniel Borkmann daniel@iogearbox.net Cc: Nikolay Aleksandrov razor@blackwall.org Cc: Ido Schimmel idosch@idosch.org Cc: Jiri Pirko jiri@nvidia.com Reviewed-by: Nikolay Aleksandrov razor@blackwall.org Reviewed-by: Hangbin Liu liuhangbin@gmail.com Link: https://patch.msgid.link/20241210141245.327886-5-daniel@iogearbox.net Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/team/team_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c index ddd9ae7085c7..6ace5a74cddb 100644 --- a/drivers/net/team/team_core.c +++ b/drivers/net/team/team_core.c @@ -983,7 +983,8 @@ static void team_port_disable(struct team *team,
#define TEAM_VLAN_FEATURES (NETIF_F_HW_CSUM | NETIF_F_SG | \ NETIF_F_FRAGLIST | NETIF_F_GSO_SOFTWARE | \ - NETIF_F_HIGHDMA | NETIF_F_LRO) + NETIF_F_HIGHDMA | NETIF_F_LRO | \ + NETIF_F_GSO_ENCAP_ALL)
#define TEAM_ENC_FEATURES (NETIF_F_HW_CSUM | NETIF_F_SG | \ NETIF_F_RXCSUM | NETIF_F_GSO_SOFTWARE)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Charles Keepax ckeepax@opensource.cirrus.com
[ Upstream commit 255cc582e6e16191a20d54bcdbca6c91d3e90c5e ]
The code uses the initialised member of the asoc_sdw_dailink struct to determine if a member of the array is in use. However, in the case where the array is completely full, this leads to an access one past the end of the array. Expand the array by one entry to include space for a terminator.
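The pattern is easier to see in a stand-alone sketch (hypothetical names, not the sof_sdw code): allocating one extra zeroed slot lets the consumer iterate until it hits an uninitialised terminator without ever reading past the end.

    #include <stdio.h>
    #include <stdlib.h>

    struct dai {
        int initialised;
        const char *name;
    };

    int main(void)
    {
        int num_ends = 4;
        /* worst case: one entry per endpoint, plus one zeroed terminator */
        struct dai *dais = calloc(num_ends + 1, sizeof(*dais));
        if (!dais)
            return 1;

        for (int i = 0; i < num_ends; i++) {    /* fill every real slot */
            dais[i].initialised = 1;
            dais[i].name = "dai";
        }

        for (struct dai *d = dais; d->initialised; d++)  /* stops at the terminator */
            printf("%s\n", d->name);

        free(dais);
        return 0;
    }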
Fixes: 27fd36aefa00 ("ASoC: Intel: sof-sdw: Add new code for parsing the snd_soc_acpi structs") Reviewed-by: Bard Liao yung-chuan.liao@linux.intel.com Reviewed-by: Péter Ujfalusi peter.ujfalusi@linux.intel.com Signed-off-by: Charles Keepax ckeepax@opensource.cirrus.com Link: https://patch.msgid.link/20241212105742.1508574-1-ckeepax@opensource.cirrus.... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/intel/boards/sof_sdw.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/sound/soc/intel/boards/sof_sdw.c b/sound/soc/intel/boards/sof_sdw.c index a58842a8c8a6..db57292c00ca 100644 --- a/sound/soc/intel/boards/sof_sdw.c +++ b/sound/soc/intel/boards/sof_sdw.c @@ -1003,8 +1003,12 @@ static int sof_card_dai_links_create(struct snd_soc_card *card) return ret; }
- /* One per DAI link, worst case is a DAI link for every endpoint */ - sof_dais = kcalloc(num_ends, sizeof(*sof_dais), GFP_KERNEL); + /* + * One per DAI link, worst case is a DAI link for every endpoint, also + * add one additional to act as a terminator such that code can iterate + * until it hits an uninitialised DAI. + */ + sof_dais = kcalloc(num_ends + 1, sizeof(*sof_dais), GFP_KERNEL); if (!sof_dais) return -ENOMEM;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniil Tatianin d-tatianin@yandex-team.ru
[ Upstream commit c53d96a4481f42a1635b96d2c1acbb0a126bfd54 ]
This bug was first introduced in c27f3d011b08, where the author of the patch probably meant to do DeleteMutex instead of ReleaseMutex. The mutex leak was noticed later on and fixed in e4dfe108371, but the bogus MutexRelease line was never removed, so do it now.
Link: https://github.com/acpica/acpica/pull/982 Fixes: c27f3d011b08 ("ACPICA: Fix race in generic_serial_bus (I2C) and GPIO op_region parameter handling") Signed-off-by: Daniil Tatianin d-tatianin@yandex-team.ru Link: https://patch.msgid.link/20241122082954.658356-1-d-tatianin@yandex-team.ru Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/acpica/evxfregn.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/acpi/acpica/evxfregn.c b/drivers/acpi/acpica/evxfregn.c index 95f78383bbdb..bff2d099f469 100644 --- a/drivers/acpi/acpica/evxfregn.c +++ b/drivers/acpi/acpica/evxfregn.c @@ -232,8 +232,6 @@ acpi_remove_address_space_handler(acpi_handle device,
/* Now we can delete the handler object */
- acpi_os_release_mutex(handler_obj->address_space. - context_mutex); acpi_ut_remove_reference(handler_obj); goto unlock_and_exit; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 581dd2dc168fe0ed2a7a5534a724f0d3751c93ae ]
The usage of rcu_read_(un)lock while inside list_for_each_entry_rcu is not safe since for the most part entries fetched this way shall be treated as rcu_dereference:
Note that the value returned by rcu_dereference() is valid only within the enclosing RCU read-side critical section [1]_. For example, the following is **not** legal::
        rcu_read_lock();
        p = rcu_dereference(head.next);
        rcu_read_unlock();
        x = p->address;   /* BUG!!! */
        rcu_read_lock();
        y = p->data;      /* BUG!!! */
        rcu_read_unlock();
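For contrast, a minimal sketch of the legal pattern keeps every use of the dereferenced pointer inside a single read-side critical section:

        rcu_read_lock();
        p = rcu_dereference(head.next);
        x = p->address;   /* OK, still inside the critical section */
        y = p->data;      /* OK */
        rcu_read_unlock();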
Fixes: a0bfde167b50 ("Bluetooth: ISO: Add support for connecting multiple BISes") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_event.c | 33 +++++++++++---------------------- 1 file changed, 11 insertions(+), 22 deletions(-)
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index 2b5ba8acd1d8..388d46c6a043 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -6872,38 +6872,27 @@ static void hci_le_create_big_complete_evt(struct hci_dev *hdev, void *data, return;
hci_dev_lock(hdev); - rcu_read_lock();
/* Connect all BISes that are bound to the BIG */ - list_for_each_entry_rcu(conn, &hdev->conn_hash.list, list) { - if (bacmp(&conn->dst, BDADDR_ANY) || - conn->type != ISO_LINK || - conn->iso_qos.bcast.big != ev->handle) + while ((conn = hci_conn_hash_lookup_big_state(hdev, ev->handle, + BT_BOUND))) { + if (ev->status) { + hci_connect_cfm(conn, ev->status); + hci_conn_del(conn); continue; + }
if (hci_conn_set_handle(conn, __le16_to_cpu(ev->bis_handle[i++]))) continue;
- if (!ev->status) { - conn->state = BT_CONNECTED; - set_bit(HCI_CONN_BIG_CREATED, &conn->flags); - rcu_read_unlock(); - hci_debugfs_create_conn(conn); - hci_conn_add_sysfs(conn); - hci_iso_setup_path(conn); - rcu_read_lock(); - continue; - } - - hci_connect_cfm(conn, ev->status); - rcu_read_unlock(); - hci_conn_del(conn); - rcu_read_lock(); + conn->state = BT_CONNECTED; + set_bit(HCI_CONN_BIG_CREATED, &conn->flags); + hci_debugfs_create_conn(conn); + hci_conn_add_sysfs(conn); + hci_iso_setup_path(conn); }
- rcu_read_unlock(); - if (!ev->status && !i) /* If no BISes have been connected for the BIG, * terminate. This is in case all bound connections
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Iulia Tanasescu iulia.tanasescu@nxp.com
[ Upstream commit 9c76fff747a73ba01d1d87ed53dd9c00cb40ba05 ]
Since hci_get_route holds the device before returning, the hdev should be released with hci_dev_put at the end of iso_listen_bis even if the function returns with an error.
Fixes: 02171da6e86a ("Bluetooth: ISO: Add hcon for listening bis sk") Signed-off-by: Iulia Tanasescu iulia.tanasescu@nxp.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/iso.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c index 7212fd6047b9..34eade4b0587 100644 --- a/net/bluetooth/iso.c +++ b/net/bluetooth/iso.c @@ -1158,10 +1158,9 @@ static int iso_listen_bis(struct sock *sk) goto unlock; }
- hci_dev_put(hdev); - unlock: hci_dev_unlock(hdev); + hci_dev_put(hdev); return err; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Iulia Tanasescu iulia.tanasescu@nxp.com
[ Upstream commit 9bde7c3b3ad0e1f39d6df93dd1c9caf63e19e50f ]
This updates iso_sock_accept to use nested locking for the parent socket, to avoid lockdep warnings caused by the parent and child sockets being locked by the same thread:
[ 41.585683] ============================================
[ 41.585688] WARNING: possible recursive locking detected
[ 41.585694] 6.12.0-rc6+ #22 Not tainted
[ 41.585701] --------------------------------------------
[ 41.585705] iso-tester/3139 is trying to acquire lock:
[ 41.585711] ffff988b29530a58 (sk_lock-AF_BLUETOOTH) at: bt_accept_dequeue+0xe3/0x280 [bluetooth]
[ 41.585905] but task is already holding lock:
[ 41.585909] ffff988b29533a58 (sk_lock-AF_BLUETOOTH) at: iso_sock_accept+0x61/0x2d0 [bluetooth]
[ 41.586064] other info that might help us debug this:
[ 41.586069]  Possible unsafe locking scenario:

[ 41.586072]        CPU0
[ 41.586076]        ----
[ 41.586079]   lock(sk_lock-AF_BLUETOOTH);
[ 41.586086]   lock(sk_lock-AF_BLUETOOTH);
[ 41.586093]  *** DEADLOCK ***
[ 41.586097] May be due to missing lock nesting notation
[ 41.586101] 1 lock held by iso-tester/3139:
[ 41.586107] #0: ffff988b29533a58 (sk_lock-AF_BLUETOOTH) at: iso_sock_accept+0x61/0x2d0 [bluetooth]
Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type") Signed-off-by: Iulia Tanasescu iulia.tanasescu@nxp.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/iso.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c index 34eade4b0587..269ce0bb73a1 100644 --- a/net/bluetooth/iso.c +++ b/net/bluetooth/iso.c @@ -1225,7 +1225,11 @@ static int iso_sock_accept(struct socket *sock, struct socket *newsock, long timeo; int err = 0;
- lock_sock(sk); + /* Use explicit nested locking to avoid lockdep warnings generated + * because the parent socket and the child socket are locked on the + * same thread. + */ + lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
timeo = sock_rcvtimeo(sk, arg->flags & O_NONBLOCK);
@@ -1256,7 +1260,7 @@ static int iso_sock_accept(struct socket *sock, struct socket *newsock, release_sock(sk);
timeo = wait_woken(&wait, TASK_INTERRUPTIBLE, timeo); - lock_sock(sk); + lock_sock_nested(sk, SINGLE_DEPTH_NESTING); } remove_wait_queue(sk_sleep(sk), &wait);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frédéric Danis frederic.danis@collabora.com
[ Upstream commit 29a651451e6c264f58cd9d9a26088e579d17b242 ]
The voice setting is used by sco_connect() or sco_conn_defer_accept() after being set by sco_sock_setsockopt().
The PCM part of the voice setting is used for offload mode through the PCM chipset port. This commit adds support for mSBC 16-bit offloading, i.e. audio data not transported over HCI.
The BCM4349B1 supports 16 bits transparent data on its I2S port. If BT_VOICE_TRANSPARENT is used when accepting a SCO connection, this gives only garbage audio while using BT_VOICE_TRANSPARENT_16BIT gives correct audio. This has been tested with connection to iPhone 14 and Samsung S24.
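As a usage sketch only (assuming the BlueZ <bluetooth/*.h> userspace headers; the helper name is made up and error handling is trimmed), an application would request the new air mode on the SCO socket before connecting or accepting:

    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <bluetooth/bluetooth.h>
    #include <bluetooth/sco.h>

    #ifndef BT_VOICE_TRANSPARENT_16BIT
    #define BT_VOICE_TRANSPARENT_16BIT 0x0063   /* value added by this patch */
    #endif

    int sco_connect_transparent_16bit(const bdaddr_t *dst)
    {
        struct bt_voice voice = { .setting = BT_VOICE_TRANSPARENT_16BIT };
        struct sockaddr_sco addr = { .sco_family = AF_BLUETOOTH };
        int sk;

        sk = socket(AF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_SCO);
        if (sk < 0)
            return -1;

        /* Must be set before connect()/accept() so sco_connect() or
         * sco_conn_defer_accept() picks up the transparent air mode.
         */
        if (setsockopt(sk, SOL_BLUETOOTH, BT_VOICE, &voice, sizeof(voice)) < 0)
            goto err;

        bacpy(&addr.sco_bdaddr, dst);
        if (connect(sk, (struct sockaddr *)&addr, sizeof(addr)) < 0)
            goto err;

        return sk;
    err:
        close(sk);
        return -1;
    }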
Fixes: ad10b1a48754 ("Bluetooth: Add Bluetooth socket voice option") Signed-off-by: Frédéric Danis frederic.danis@collabora.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/bluetooth/bluetooth.h | 1 + net/bluetooth/sco.c | 29 +++++++++++++++-------------- 2 files changed, 16 insertions(+), 14 deletions(-)
diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h index e6760c11f007..435250c72d56 100644 --- a/include/net/bluetooth/bluetooth.h +++ b/include/net/bluetooth/bluetooth.h @@ -123,6 +123,7 @@ struct bt_voice {
#define BT_VOICE_TRANSPARENT 0x0003 #define BT_VOICE_CVSD_16BIT 0x0060 +#define BT_VOICE_TRANSPARENT_16BIT 0x0063
#define BT_SNDMTU 12 #define BT_RCVMTU 13 diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c index 700abb639a55..b872a2ca3ff3 100644 --- a/net/bluetooth/sco.c +++ b/net/bluetooth/sco.c @@ -267,10 +267,13 @@ static int sco_connect(struct sock *sk) else type = SCO_LINK;
- if (sco_pi(sk)->setting == BT_VOICE_TRANSPARENT && - (!lmp_transp_capable(hdev) || !lmp_esco_capable(hdev))) { - err = -EOPNOTSUPP; - goto unlock; + switch (sco_pi(sk)->setting & SCO_AIRMODE_MASK) { + case SCO_AIRMODE_TRANSP: + if (!lmp_transp_capable(hdev) || !lmp_esco_capable(hdev)) { + err = -EOPNOTSUPP; + goto unlock; + } + break; }
hcon = hci_connect_sco(hdev, type, &sco_pi(sk)->dst, @@ -877,13 +880,6 @@ static int sco_sock_setsockopt(struct socket *sock, int level, int optname, if (err) break;
- /* Explicitly check for these values */ - if (voice.setting != BT_VOICE_TRANSPARENT && - voice.setting != BT_VOICE_CVSD_16BIT) { - err = -EINVAL; - break; - } - sco_pi(sk)->setting = voice.setting; hdev = hci_get_route(&sco_pi(sk)->dst, &sco_pi(sk)->src, BDADDR_BREDR); @@ -891,9 +887,14 @@ static int sco_sock_setsockopt(struct socket *sock, int level, int optname, err = -EBADFD; break; } - if (enhanced_sync_conn_capable(hdev) && - voice.setting == BT_VOICE_TRANSPARENT) - sco_pi(sk)->codec.id = BT_CODEC_TRANSPARENT; + + switch (sco_pi(sk)->setting & SCO_AIRMODE_MASK) { + case SCO_AIRMODE_TRANSP: + if (enhanced_sync_conn_capable(hdev)) + sco_pi(sk)->codec.id = BT_CODEC_TRANSPARENT; + break; + } + hci_dev_put(hdev); break;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Iulia Tanasescu iulia.tanasescu@nxp.com
[ Upstream commit 168e28305b871d8ec604a8f51f35467b8d7ba05b ]
This fixes the circular locking dependency warning below, by releasing the socket lock before entering iso_listen_bis, to avoid any potential deadlock with the hdev lock.
[ 75.307983] ======================================================
[ 75.307984] WARNING: possible circular locking dependency detected
[ 75.307985] 6.12.0-rc6+ #22 Not tainted
[ 75.307987] ------------------------------------------------------
[ 75.307987] kworker/u81:2/2623 is trying to acquire lock:
[ 75.307988] ffff8fde1769da58 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO) at: iso_connect_cfm+0x253/0x840 [bluetooth]
[ 75.308021] but task is already holding lock:
[ 75.308022] ffff8fdd61a10078 (&hdev->lock) at: hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth]
[ 75.308053] which lock already depends on the new lock.

[ 75.308054] the existing dependency chain (in reverse order) is:
[ 75.308055] -> #1 (&hdev->lock){+.+.}-{3:3}:
[ 75.308057]        __mutex_lock+0xad/0xc50
[ 75.308061]        mutex_lock_nested+0x1b/0x30
[ 75.308063]        iso_sock_listen+0x143/0x5c0 [bluetooth]
[ 75.308085]        __sys_listen_socket+0x49/0x60
[ 75.308088]        __x64_sys_listen+0x4c/0x90
[ 75.308090]        x64_sys_call+0x2517/0x25f0
[ 75.308092]        do_syscall_64+0x87/0x150
[ 75.308095]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 75.308098] -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
[ 75.308100]        __lock_acquire+0x155e/0x25f0
[ 75.308103]        lock_acquire+0xc9/0x300
[ 75.308105]        lock_sock_nested+0x32/0x90
[ 75.308107]        iso_connect_cfm+0x253/0x840 [bluetooth]
[ 75.308128]        hci_connect_cfm+0x6c/0x190 [bluetooth]
[ 75.308155]        hci_le_per_adv_report_evt+0x27b/0x2f0 [bluetooth]
[ 75.308180]        hci_le_meta_evt+0xe7/0x200 [bluetooth]
[ 75.308206]        hci_event_packet+0x21f/0x5c0 [bluetooth]
[ 75.308230]        hci_rx_work+0x3ae/0xb10 [bluetooth]
[ 75.308254]        process_one_work+0x212/0x740
[ 75.308256]        worker_thread+0x1bd/0x3a0
[ 75.308258]        kthread+0xe4/0x120
[ 75.308259]        ret_from_fork+0x44/0x70
[ 75.308261]        ret_from_fork_asm+0x1a/0x30
[ 75.308263] other info that might help us debug this:
[ 75.308264] Possible unsafe locking scenario:
[ 75.308264]        CPU0                    CPU1
[ 75.308265]        ----                    ----
[ 75.308265]   lock(&hdev->lock);
[ 75.308267]                                lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
[ 75.308268]                                lock(&hdev->lock);
[ 75.308269]   lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
[ 75.308270]  *** DEADLOCK ***

[ 75.308271] 4 locks held by kworker/u81:2/2623:
[ 75.308272] #0: ffff8fdd66e52148 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_one_work+0x443/0x740
[ 75.308276] #1: ffffafb488b7fe48 ((work_completion)(&hdev->rx_work)), at: process_one_work+0x1ce/0x740
[ 75.308280] #2: ffff8fdd61a10078 (&hdev->lock){+.+.}-{3:3} at: hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth]
[ 75.308304] #3: ffffffffb6ba4900 (rcu_read_lock){....}-{1:2}, at: hci_connect_cfm+0x29/0x190 [bluetooth]
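Reduced to a toy pthread model (illustrative only, not the Bluetooth code), the shape of the fix is to make the listen path take the two locks in the same hdev-then-socket order that the event path already uses, removing the AB-BA inversion lockdep reports above:

    #include <pthread.h>

    static pthread_mutex_t hdev_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t sock_lock = PTHREAD_MUTEX_INITIALIZER;

    static void event_path(void)   /* hdev -> sock, unchanged */
    {
        pthread_mutex_lock(&hdev_lock);
        pthread_mutex_lock(&sock_lock);
        /* deliver the event to the socket */
        pthread_mutex_unlock(&sock_lock);
        pthread_mutex_unlock(&hdev_lock);
    }

    static void listen_path(void)  /* was sock -> hdev, now hdev -> sock */
    {
        pthread_mutex_lock(&hdev_lock);
        pthread_mutex_lock(&sock_lock);
        /* set up the listening BIS */
        pthread_mutex_unlock(&sock_lock);
        pthread_mutex_unlock(&hdev_lock);
    }

    int main(void)
    {
        event_path();
        listen_path();
        return 0;
    }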
Fixes: 02171da6e86a ("Bluetooth: ISO: Add hcon for listening bis sk") Signed-off-by: Iulia Tanasescu iulia.tanasescu@nxp.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/iso.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c index 269ce0bb73a1..809e88fd3fcb 100644 --- a/net/bluetooth/iso.c +++ b/net/bluetooth/iso.c @@ -1129,6 +1129,7 @@ static int iso_listen_bis(struct sock *sk) return -EHOSTUNREACH;
hci_dev_lock(hdev); + lock_sock(sk);
/* Fail if user set invalid QoS */ if (iso_pi(sk)->qos_user_set && !check_bcast_qos(&iso_pi(sk)->qos)) { @@ -1159,6 +1160,7 @@ static int iso_listen_bis(struct sock *sk) }
unlock: + release_sock(sk); hci_dev_unlock(hdev); hci_dev_put(hdev); return err; @@ -1187,6 +1189,7 @@ static int iso_sock_listen(struct socket *sock, int backlog)
BT_DBG("sk %p backlog %d", sk, backlog);
+ sock_hold(sk); lock_sock(sk);
if (sk->sk_state != BT_BOUND) { @@ -1199,10 +1202,16 @@ static int iso_sock_listen(struct socket *sock, int backlog) goto done; }
- if (!bacmp(&iso_pi(sk)->dst, BDADDR_ANY)) + if (!bacmp(&iso_pi(sk)->dst, BDADDR_ANY)) { err = iso_listen_cis(sk); - else + } else { + /* Drop sock lock to avoid potential + * deadlock with the hdev lock. + */ + release_sock(sk); err = iso_listen_bis(sk); + lock_sock(sk); + }
if (err) goto done; @@ -1214,6 +1223,7 @@ static int iso_sock_listen(struct socket *sock, int backlog)
done: release_sock(sk); + sock_put(sk); return err; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Iulia Tanasescu iulia.tanasescu@nxp.com
[ Upstream commit 7a17308c17880d259105f6e591eb1bc77b9612f0 ]
This fixes the circular locking dependency warning below, by reworking iso_sock_recvmsg, to ensure that the socket lock is always released before calling a function that locks hdev.
[ 561.670344] ======================================================
[ 561.670346] WARNING: possible circular locking dependency detected
[ 561.670349] 6.12.0-rc6+ #26 Not tainted
[ 561.670351] ------------------------------------------------------
[ 561.670353] iso-tester/3289 is trying to acquire lock:
[ 561.670355] ffff88811f600078 (&hdev->lock){+.+.}-{3:3}, at: iso_conn_big_sync+0x73/0x260 [bluetooth]
[ 561.670405] but task is already holding lock:
[ 561.670407] ffff88815af58258 (sk_lock-AF_BLUETOOTH){+.+.}-{0:0}, at: iso_sock_recvmsg+0xbf/0x500 [bluetooth]
[ 561.670450] which lock already depends on the new lock.

[ 561.670452] the existing dependency chain (in reverse order) is:
[ 561.670453] -> #2 (sk_lock-AF_BLUETOOTH){+.+.}-{0:0}:
[ 561.670458]        lock_acquire+0x7c/0xc0
[ 561.670463]        lock_sock_nested+0x3b/0xf0
[ 561.670467]        bt_accept_dequeue+0x1a5/0x4d0 [bluetooth]
[ 561.670510]        iso_sock_accept+0x271/0x830 [bluetooth]
[ 561.670547]        do_accept+0x3dd/0x610
[ 561.670550]        __sys_accept4+0xd8/0x170
[ 561.670553]        __x64_sys_accept+0x74/0xc0
[ 561.670556]        x64_sys_call+0x17d6/0x25f0
[ 561.670559]        do_syscall_64+0x87/0x150
[ 561.670563]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 561.670567] -> #1 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
[ 561.670571]        lock_acquire+0x7c/0xc0
[ 561.670574]        lock_sock_nested+0x3b/0xf0
[ 561.670577]        iso_sock_listen+0x2de/0xf30 [bluetooth]
[ 561.670617]        __sys_listen_socket+0xef/0x130
[ 561.670620]        __x64_sys_listen+0xe1/0x190
[ 561.670623]        x64_sys_call+0x2517/0x25f0
[ 561.670626]        do_syscall_64+0x87/0x150
[ 561.670629]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 561.670632] -> #0 (&hdev->lock){+.+.}-{3:3}:
[ 561.670636]        __lock_acquire+0x32ad/0x6ab0
[ 561.670639]        lock_acquire.part.0+0x118/0x360
[ 561.670642]        lock_acquire+0x7c/0xc0
[ 561.670644]        __mutex_lock+0x18d/0x12f0
[ 561.670647]        mutex_lock_nested+0x1b/0x30
[ 561.670651]        iso_conn_big_sync+0x73/0x260 [bluetooth]
[ 561.670687]        iso_sock_recvmsg+0x3e9/0x500 [bluetooth]
[ 561.670722]        sock_recvmsg+0x1d5/0x240
[ 561.670725]        sock_read_iter+0x27d/0x470
[ 561.670727]        vfs_read+0x9a0/0xd30
[ 561.670731]        ksys_read+0x1a8/0x250
[ 561.670733]        __x64_sys_read+0x72/0xc0
[ 561.670736]        x64_sys_call+0x1b12/0x25f0
[ 561.670738]        do_syscall_64+0x87/0x150
[ 561.670741]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 561.670744] other info that might help us debug this:
[ 561.670745] Chain exists of: &hdev->lock --> sk_lock-AF_BLUETOOTH-BTPROTO_ISO --> sk_lock-AF_BLUETOOTH
[ 561.670751] Possible unsafe locking scenario:
[ 561.670753]        CPU0                    CPU1
[ 561.670754]        ----                    ----
[ 561.670756]   lock(sk_lock-AF_BLUETOOTH);
[ 561.670758]                                lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
[ 561.670761]                                lock(sk_lock-AF_BLUETOOTH);
[ 561.670764]   lock(&hdev->lock);
[ 561.670767]  *** DEADLOCK ***
Fixes: 07a9342b94a9 ("Bluetooth: ISO: Send BIG Create Sync via hci_sync") Signed-off-by: Iulia Tanasescu iulia.tanasescu@nxp.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/iso.c | 34 +++++++++++++++++++++++++++------- 1 file changed, 27 insertions(+), 7 deletions(-)
diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c index 809e88fd3fcb..644b606743e2 100644 --- a/net/bluetooth/iso.c +++ b/net/bluetooth/iso.c @@ -1411,6 +1411,7 @@ static void iso_conn_big_sync(struct sock *sk) * change. */ hci_dev_lock(hdev); + lock_sock(sk);
if (!test_and_set_bit(BT_SK_BIG_SYNC, &iso_pi(sk)->flags)) { err = hci_le_big_create_sync(hdev, iso_pi(sk)->conn->hcon, @@ -1423,6 +1424,7 @@ static void iso_conn_big_sync(struct sock *sk) err); }
+ release_sock(sk); hci_dev_unlock(hdev); }
@@ -1431,39 +1433,57 @@ static int iso_sock_recvmsg(struct socket *sock, struct msghdr *msg, { struct sock *sk = sock->sk; struct iso_pinfo *pi = iso_pi(sk); + bool early_ret = false; + int err = 0;
BT_DBG("sk %p", sk);
if (test_and_clear_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags)) { + sock_hold(sk); lock_sock(sk); + switch (sk->sk_state) { case BT_CONNECT2: if (test_bit(BT_SK_PA_SYNC, &pi->flags)) { + release_sock(sk); iso_conn_big_sync(sk); + lock_sock(sk); + sk->sk_state = BT_LISTEN; } else { iso_conn_defer_accept(pi->conn->hcon); sk->sk_state = BT_CONFIG; } - release_sock(sk); - return 0; + + early_ret = true; + break; case BT_CONNECTED: if (test_bit(BT_SK_PA_SYNC, &iso_pi(sk)->flags)) { + release_sock(sk); iso_conn_big_sync(sk); + lock_sock(sk); + sk->sk_state = BT_LISTEN; - release_sock(sk); - return 0; + early_ret = true; }
- release_sock(sk); break; case BT_CONNECT: release_sock(sk); - return iso_connect_cis(sk); + err = iso_connect_cis(sk); + lock_sock(sk); + + early_ret = true; + break; default: - release_sock(sk); break; } + + release_sock(sk); + sock_put(sk); + + if (early_ret) + return err; }
return bt_sock_recvmsg(sock, msg, len, flags);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thadeu Lima de Souza Cascardo cascardo@igalia.com
[ Upstream commit b548f5e9456c568155499d9ebac675c0d7a296e8 ]
hci_devcd_append may lead to the release of the skb, so it cannot be accessed once it is called.
==================================================================
BUG: KASAN: slab-use-after-free in btmtk_process_coredump+0x2a7/0x2d0 [btmtk]
Read of size 4 at addr ffff888033cfabb0 by task kworker/0:3/82

CPU: 0 PID: 82 Comm: kworker/0:3 Tainted: G     U  6.6.40-lockdep-03464-g1d8b4eb3060e #1 b0b3c1cc0c842735643fb411799d97921d1f688c
Hardware name: Google Yaviks_Ufs/Yaviks_Ufs, BIOS Google_Yaviks_Ufs.15217.552.0 05/07/2024
Workqueue: events btusb_rx_work [btusb]
Call Trace:
 <TASK>
 dump_stack_lvl+0xfd/0x150
 print_report+0x131/0x780
 kasan_report+0x177/0x1c0
 btmtk_process_coredump+0x2a7/0x2d0 [btmtk 03edd567dd71a65958807c95a65db31d433e1d01]
 btusb_recv_acl_mtk+0x11c/0x1a0 [btusb 675430d1e87c4f24d0c1f80efe600757a0f32bec]
 btusb_rx_work+0x9e/0xe0 [btusb 675430d1e87c4f24d0c1f80efe600757a0f32bec]
 worker_thread+0xe44/0x2cc0
 kthread+0x2ff/0x3a0
 ret_from_fork+0x51/0x80
 ret_from_fork_asm+0x1b/0x30
 </TASK>

Allocated by task 82:
 stack_trace_save+0xdc/0x190
 kasan_set_track+0x4e/0x80
 __kasan_slab_alloc+0x4e/0x60
 kmem_cache_alloc+0x19f/0x360
 skb_clone+0x132/0xf70
 btusb_recv_acl_mtk+0x104/0x1a0 [btusb]
 btusb_rx_work+0x9e/0xe0 [btusb]
 worker_thread+0xe44/0x2cc0
 kthread+0x2ff/0x3a0
 ret_from_fork+0x51/0x80
 ret_from_fork_asm+0x1b/0x30

Freed by task 1733:
 stack_trace_save+0xdc/0x190
 kasan_set_track+0x4e/0x80
 kasan_save_free_info+0x28/0xb0
 ____kasan_slab_free+0xfd/0x170
 kmem_cache_free+0x183/0x3f0
 hci_devcd_rx+0x91a/0x2060 [bluetooth]
 worker_thread+0xe44/0x2cc0
 kthread+0x2ff/0x3a0
 ret_from_fork+0x51/0x80
 ret_from_fork_asm+0x1b/0x30

The buggy address belongs to the object at ffff888033cfab40
 which belongs to the cache skbuff_head_cache of size 232
The buggy address is located 112 bytes inside of
 freed 232-byte region [ffff888033cfab40, ffff888033cfac28)

The buggy address belongs to the physical page:
page:00000000a174ba93 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x33cfa
head:00000000a174ba93 order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
anon flags: 0x4000000000000840(slab|head|zone=1)
page_type: 0xffffffff()
raw: 4000000000000840 ffff888100848a00 0000000000000000 0000000000000001
raw: 0000000000000000 0000000080190019 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888033cfaa80: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
 ffff888033cfab00: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
 ffff888033cfab80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                     ^
 ffff888033cfac00: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
 ffff888033cfac80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Check if we need to call hci_devcd_complete before calling hci_devcd_append. That requires that we check data->cd_info.cnt >= MTK_COREDUMP_NUM instead of data->cd_info.cnt > MTK_COREDUMP_NUM, as we increment data->cd_info.cnt only once the call to hci_devcd_append succeeds.
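A stand-alone sketch of that ownership rule (hypothetical names; not the btmtk driver code): any decision based on the buffer contents must be taken before the buffer is handed to a function that may free it, and the result kept in a local.

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct buf { size_t len; unsigned char data[64]; };

    /* Stand-in for hci_devcd_append(): takes ownership and may free the buffer. */
    static int append(struct buf *b)
    {
        free(b);
        return 0;
    }

    static void process(struct buf *b)
    {
        static const char end[] = "coredump end";
        bool complete = false;

        /* Inspect the payload while we still own it... */
        if (b->len >= sizeof(end) - 1 &&
            !memcmp(&b->data[b->len - (sizeof(end) - 1)], end, sizeof(end) - 1))
            complete = true;

        if (append(b) < 0)   /* ...because 'b' must not be touched after this */
            return;

        if (complete)
            printf("got the end marker\n");
    }

    int main(void)
    {
        struct buf *b = calloc(1, sizeof(*b));
        if (!b)
            return 1;
        b->len = strlen("coredump end");
        memcpy(b->data, "coredump end", b->len);
        process(b);
        return 0;
    }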
Fixes: 0b7015132878 ("Bluetooth: btusb: mediatek: add MediaTek devcoredump support") Signed-off-by: Thadeu Lima de Souza Cascardo cascardo@igalia.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btmtk.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/drivers/bluetooth/btmtk.c b/drivers/bluetooth/btmtk.c index 480e4adba9fa..85e99641eaae 100644 --- a/drivers/bluetooth/btmtk.c +++ b/drivers/bluetooth/btmtk.c @@ -395,6 +395,7 @@ int btmtk_process_coredump(struct hci_dev *hdev, struct sk_buff *skb) { struct btmtk_data *data = hci_get_priv(hdev); int err; + bool complete = false;
if (!IS_ENABLED(CONFIG_DEV_COREDUMP)) { kfree_skb(skb); @@ -416,19 +417,22 @@ int btmtk_process_coredump(struct hci_dev *hdev, struct sk_buff *skb) fallthrough; case HCI_DEVCOREDUMP_ACTIVE: default: + /* Mediatek coredump data would be more than MTK_COREDUMP_NUM */ + if (data->cd_info.cnt >= MTK_COREDUMP_NUM && + skb->len > MTK_COREDUMP_END_LEN) + if (!memcmp((char *)&skb->data[skb->len - MTK_COREDUMP_END_LEN], + MTK_COREDUMP_END, MTK_COREDUMP_END_LEN - 1)) + complete = true; + err = hci_devcd_append(hdev, skb); if (err < 0) break; data->cd_info.cnt++;
- /* Mediatek coredump data would be more than MTK_COREDUMP_NUM */ - if (data->cd_info.cnt > MTK_COREDUMP_NUM && - skb->len > MTK_COREDUMP_END_LEN) - if (!memcmp((char *)&skb->data[skb->len - MTK_COREDUMP_END_LEN], - MTK_COREDUMP_END, MTK_COREDUMP_END_LEN - 1)) { - bt_dev_info(hdev, "Mediatek coredump end"); - hci_devcd_complete(hdev); - } + if (complete) { + bt_dev_info(hdev, "Mediatek coredump end"); + hci_devcd_complete(hdev); + }
break; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikita Yushchenko nikita.yoush@cogentembedded.com
[ Upstream commit fb9e6039c325cc205a368046dc03c56c87df2310 ]
MPIC.PIS must be set per phy interface type. MPIC.LSC must be set per speed.
Do that strictly per datasheet, instead of hardcoding MPIC.PIS to GMII.
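To illustrate why the new masked read-modify-write leaves unrelated MPIC bits alone, here is a small userspace re-creation of the GENMASK()/FIELD_PREP()/FIELD_GET() idiom (simplified stand-ins for the kernel macros; the field definitions mirror rswitch.h as changed by this patch):

    #include <stdint.h>
    #include <stdio.h>

    #define GENMASK(h, l)          (((~0u) << (l)) & (~0u >> (31 - (h))))
    #define FIELD_PREP(mask, val)  (((val) << __builtin_ctz(mask)) & (mask))
    #define FIELD_GET(mask, reg)   (((reg) & (mask)) >> __builtin_ctz(mask))

    #define MPIC_PIS        GENMASK(2, 0)
    #define MPIC_PIS_GMII   2
    #define MPIC_LSC        GENMASK(5, 3)
    #define MPIC_LSC_1G     2

    /* Read-modify-write, like rswitch_modify(): only the masked fields change. */
    static uint32_t modify(uint32_t reg, uint32_t clear, uint32_t set)
    {
        return (reg & ~clear) | set;
    }

    int main(void)
    {
        uint32_t mpic = 0xdead0000 | FIELD_PREP(MPIC_PIS, 4);  /* other bits + XGMII */

        mpic = modify(mpic, MPIC_PIS | MPIC_LSC,
                      FIELD_PREP(MPIC_PIS, MPIC_PIS_GMII) |
                      FIELD_PREP(MPIC_LSC, MPIC_LSC_1G));

        printf("PIS=%u LSC=%u other bits preserved: %s\n",
               FIELD_GET(MPIC_PIS, mpic), FIELD_GET(MPIC_LSC, mpic),
               (mpic & 0xffff0000) == 0xdead0000 ? "yes" : "no");
        return 0;
    }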
Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"") Signed-off-by: Nikita Yushchenko nikita.yoush@cogentembedded.com Reviewed-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Link: https://patch.msgid.link/20241211053012.368914-1-nikita.yoush@cogentembedded... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/renesas/rswitch.c | 27 ++++++++++++++++++++------ drivers/net/ethernet/renesas/rswitch.h | 14 ++++++------- 2 files changed, 28 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/renesas/rswitch.c b/drivers/net/ethernet/renesas/rswitch.c index 9dffb7cf1254..09117110e3dd 100644 --- a/drivers/net/ethernet/renesas/rswitch.c +++ b/drivers/net/ethernet/renesas/rswitch.c @@ -1116,25 +1116,40 @@ static int rswitch_etha_wait_link_verification(struct rswitch_etha *etha)
static void rswitch_rmac_setting(struct rswitch_etha *etha, const u8 *mac) { - u32 val; + u32 pis, lsc;
rswitch_etha_write_mac_address(etha, mac);
+ switch (etha->phy_interface) { + case PHY_INTERFACE_MODE_SGMII: + pis = MPIC_PIS_GMII; + break; + case PHY_INTERFACE_MODE_USXGMII: + case PHY_INTERFACE_MODE_5GBASER: + pis = MPIC_PIS_XGMII; + break; + default: + pis = FIELD_GET(MPIC_PIS, ioread32(etha->addr + MPIC)); + break; + } + switch (etha->speed) { case 100: - val = MPIC_LSC_100M; + lsc = MPIC_LSC_100M; break; case 1000: - val = MPIC_LSC_1G; + lsc = MPIC_LSC_1G; break; case 2500: - val = MPIC_LSC_2_5G; + lsc = MPIC_LSC_2_5G; break; default: - return; + lsc = FIELD_GET(MPIC_LSC, ioread32(etha->addr + MPIC)); + break; }
- iowrite32(MPIC_PIS_GMII | val, etha->addr + MPIC); + rswitch_modify(etha->addr, MPIC, MPIC_PIS | MPIC_LSC, + FIELD_PREP(MPIC_PIS, pis) | FIELD_PREP(MPIC_LSC, lsc)); }
static void rswitch_etha_enable_mii(struct rswitch_etha *etha) diff --git a/drivers/net/ethernet/renesas/rswitch.h b/drivers/net/ethernet/renesas/rswitch.h index 72e3ff596d31..e020800dcc57 100644 --- a/drivers/net/ethernet/renesas/rswitch.h +++ b/drivers/net/ethernet/renesas/rswitch.h @@ -724,13 +724,13 @@ enum rswitch_etha_mode {
#define EAVCC_VEM_SC_TAG (0x3 << 16)
-#define MPIC_PIS_MII 0x00 -#define MPIC_PIS_GMII 0x02 -#define MPIC_PIS_XGMII 0x04 -#define MPIC_LSC_SHIFT 3 -#define MPIC_LSC_100M (1 << MPIC_LSC_SHIFT) -#define MPIC_LSC_1G (2 << MPIC_LSC_SHIFT) -#define MPIC_LSC_2_5G (3 << MPIC_LSC_SHIFT) +#define MPIC_PIS GENMASK(2, 0) +#define MPIC_PIS_GMII 2 +#define MPIC_PIS_XGMII 4 +#define MPIC_LSC GENMASK(5, 3) +#define MPIC_LSC_100M 1 +#define MPIC_LSC_1G 2 +#define MPIC_LSC_2_5G 3
#define MDIO_READ_C45 0x03 #define MDIO_WRITE_C45 0x01
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jesse Van Gavere jesseevg@gmail.com
[ Upstream commit 5af53577c64fa84da032d490b701127fe8d1a6aa ]
Commit 8d7ae22ae9f8 ("net: dsa: microchip: KSZ9477 register regmap alignment to 32 bit boundaries") fixed an issue whereby regmap_reg_range did not allow writes as 32 bit words to KSZ9477 PHY registers. This fix for the KSZ9896 is adapted from there, as the same errata is present in the KSZ9896C as "Module 5: Certain PHY registers must be written as pairs instead of singly"; the explanation below is likewise taken from that commit.

That commit provided code to apply the "Module 6: Certain PHY registers must be written as pairs instead of singly" errata for the KSZ9477, as certain PHY registers of this chip (0xN120 to 0xN13F, N=1,2,3,4,5) must be accessed as 32 bit words instead of with 16 or 8 bit accesses. Otherwise, adjacent registers (no matter if reserved or not) are overwritten with 0x0.
Without this patch some registers (e.g. 0x113c or 0x1134) required for 32 bit access are out of valid regmap ranges.
As a result, following error is observed and KSZ9896 is not properly configured:
ksz-switch spi1.0: can't rmw 32bit reg 0x113c: -EIO
ksz-switch spi1.0: can't rmw 32bit reg 0x1134: -EIO
ksz-switch spi1.0 lan1 (uninitialized): failed to connect to PHY: -EIO
ksz-switch spi1.0 lan1 (uninitialized): error -5 setting up PHY for tree 0, switch 0, port 0
The solution is to modify regmap_reg_range to allow accesses with 4 bytes boundaries.
Fixes: 5c844d57aa78 ("net: dsa: microchip: fix writes to phy registers >= 0x10") Signed-off-by: Jesse Van Gavere jesse.vangavere@scioteq.com Link: https://patch.msgid.link/20241211092932.26881-1-jesse.vangavere@scioteq.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/microchip/ksz_common.c | 42 +++++++++++--------------- 1 file changed, 18 insertions(+), 24 deletions(-)
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c index 5290f5ad98f3..bf26cd0abf6d 100644 --- a/drivers/net/dsa/microchip/ksz_common.c +++ b/drivers/net/dsa/microchip/ksz_common.c @@ -1098,10 +1098,9 @@ static const struct regmap_range ksz9896_valid_regs[] = { regmap_reg_range(0x1030, 0x1030), regmap_reg_range(0x1100, 0x1115), regmap_reg_range(0x111a, 0x111f), - regmap_reg_range(0x1122, 0x1127), - regmap_reg_range(0x112a, 0x112b), - regmap_reg_range(0x1136, 0x1139), - regmap_reg_range(0x113e, 0x113f), + regmap_reg_range(0x1120, 0x112b), + regmap_reg_range(0x1134, 0x113b), + regmap_reg_range(0x113c, 0x113f), regmap_reg_range(0x1400, 0x1401), regmap_reg_range(0x1403, 0x1403), regmap_reg_range(0x1410, 0x1417), @@ -1128,10 +1127,9 @@ static const struct regmap_range ksz9896_valid_regs[] = { regmap_reg_range(0x2030, 0x2030), regmap_reg_range(0x2100, 0x2115), regmap_reg_range(0x211a, 0x211f), - regmap_reg_range(0x2122, 0x2127), - regmap_reg_range(0x212a, 0x212b), - regmap_reg_range(0x2136, 0x2139), - regmap_reg_range(0x213e, 0x213f), + regmap_reg_range(0x2120, 0x212b), + regmap_reg_range(0x2134, 0x213b), + regmap_reg_range(0x213c, 0x213f), regmap_reg_range(0x2400, 0x2401), regmap_reg_range(0x2403, 0x2403), regmap_reg_range(0x2410, 0x2417), @@ -1158,10 +1156,9 @@ static const struct regmap_range ksz9896_valid_regs[] = { regmap_reg_range(0x3030, 0x3030), regmap_reg_range(0x3100, 0x3115), regmap_reg_range(0x311a, 0x311f), - regmap_reg_range(0x3122, 0x3127), - regmap_reg_range(0x312a, 0x312b), - regmap_reg_range(0x3136, 0x3139), - regmap_reg_range(0x313e, 0x313f), + regmap_reg_range(0x3120, 0x312b), + regmap_reg_range(0x3134, 0x313b), + regmap_reg_range(0x313c, 0x313f), regmap_reg_range(0x3400, 0x3401), regmap_reg_range(0x3403, 0x3403), regmap_reg_range(0x3410, 0x3417), @@ -1188,10 +1185,9 @@ static const struct regmap_range ksz9896_valid_regs[] = { regmap_reg_range(0x4030, 0x4030), regmap_reg_range(0x4100, 0x4115), regmap_reg_range(0x411a, 0x411f), - regmap_reg_range(0x4122, 0x4127), - regmap_reg_range(0x412a, 0x412b), - regmap_reg_range(0x4136, 0x4139), - regmap_reg_range(0x413e, 0x413f), + regmap_reg_range(0x4120, 0x412b), + regmap_reg_range(0x4134, 0x413b), + regmap_reg_range(0x413c, 0x413f), regmap_reg_range(0x4400, 0x4401), regmap_reg_range(0x4403, 0x4403), regmap_reg_range(0x4410, 0x4417), @@ -1218,10 +1214,9 @@ static const struct regmap_range ksz9896_valid_regs[] = { regmap_reg_range(0x5030, 0x5030), regmap_reg_range(0x5100, 0x5115), regmap_reg_range(0x511a, 0x511f), - regmap_reg_range(0x5122, 0x5127), - regmap_reg_range(0x512a, 0x512b), - regmap_reg_range(0x5136, 0x5139), - regmap_reg_range(0x513e, 0x513f), + regmap_reg_range(0x5120, 0x512b), + regmap_reg_range(0x5134, 0x513b), + regmap_reg_range(0x513c, 0x513f), regmap_reg_range(0x5400, 0x5401), regmap_reg_range(0x5403, 0x5403), regmap_reg_range(0x5410, 0x5417), @@ -1248,10 +1243,9 @@ static const struct regmap_range ksz9896_valid_regs[] = { regmap_reg_range(0x6030, 0x6030), regmap_reg_range(0x6100, 0x6115), regmap_reg_range(0x611a, 0x611f), - regmap_reg_range(0x6122, 0x6127), - regmap_reg_range(0x612a, 0x612b), - regmap_reg_range(0x6136, 0x6139), - regmap_reg_range(0x613e, 0x613f), + regmap_reg_range(0x6120, 0x612b), + regmap_reg_range(0x6134, 0x613b), + regmap_reg_range(0x613c, 0x613f), regmap_reg_range(0x6300, 0x6301), regmap_reg_range(0x6400, 0x6401), regmap_reg_range(0x6403, 0x6403),
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Robert Hodaszi robert.hodaszi@digi.com
[ Upstream commit 36ff681d2283410742489ce77e7b01419eccf58c ]
The blamed commit changed the dsa_8021q_rcv() calling convention to accept pre-populated source_port and switch_id arguments. If those are not available, as in the case of tag_ocelot_8021q, the arguments must be pre-initialized with -1.
Due to the bug of passing uninitialized arguments in tag_ocelot_8021q, dsa_8021q_rcv() does not detect that it needs to populate the source_port and switch_id, and this makes dsa_conduit_find_user() fail, which leads to packet loss on reception.
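A minimal illustration of that calling convention (hypothetical names, not the DSA code): the callee only fills in an out-parameter the caller marked as unknown with -1, so passing uninitialized locals makes it skip the lookup and leaves garbage behind.

    #include <stdio.h>

    static void rcv(int tag_port, int tag_switch, int *src_port, int *switch_id)
    {
        /* like dsa_8021q_rcv(): only fill in what the caller didn't already know */
        if (*src_port == -1)
            *src_port = tag_port;
        if (*switch_id == -1)
            *switch_id = tag_switch;
    }

    int main(void)
    {
        int src_port = -1, switch_id = -1;   /* the fix: seed with -1 */

        rcv(3, 0, &src_port, &switch_id);
        printf("src_port=%d switch_id=%d\n", src_port, switch_id);
        return 0;
    }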
Fixes: dcfe7673787b ("net: dsa: tag_sja1105: absorb logic for not overwriting precise info into dsa_8021q_rcv()") Signed-off-by: Robert Hodaszi robert.hodaszi@digi.com Reviewed-by: Vladimir Oltean vladimir.oltean@nxp.com Link: https://patch.msgid.link/20241211144741.1415758-1-robert.hodaszi@digi.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/dsa/tag_ocelot_8021q.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/dsa/tag_ocelot_8021q.c b/net/dsa/tag_ocelot_8021q.c index 8e8b1bef6af6..11ea8cfd6266 100644 --- a/net/dsa/tag_ocelot_8021q.c +++ b/net/dsa/tag_ocelot_8021q.c @@ -79,7 +79,7 @@ static struct sk_buff *ocelot_xmit(struct sk_buff *skb, static struct sk_buff *ocelot_rcv(struct sk_buff *skb, struct net_device *netdev) { - int src_port, switch_id; + int src_port = -1, switch_id = -1;
dsa_8021q_rcv(skb, &src_port, &switch_id, NULL, NULL);
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mirsad Todorovac mtodorovac69@gmail.com
[ Upstream commit ed69b28b3a5e39871ba5599992f80562d6ee59db ]
Running coccinelle spatch gave the following warning:
./drivers/gpu/drm/xe/tests/xe_migrate.c:226:5-11: inconsistent IS_ERR and PTR_ERR on line 228.
The code reports PTR_ERR(pt) when IS_ERR(tiny) is checked:
  → 211  pt = xe_bo_create_pin_map(xe, tile, m->q->vm, XE_PAGE_SIZE,
    212                            ttm_bo_type_kernel,
    213                            XE_BO_FLAG_VRAM_IF_DGFX(tile) |
    214                            XE_BO_FLAG_PINNED);
    215  if (IS_ERR(pt)) {
    216          KUNIT_FAIL(test, "Failed to allocate fake pt: %li\n",
    217                     PTR_ERR(pt));
    218          goto free_big;
    219  }
    220
    221  tiny = xe_bo_create_pin_map(xe, tile, m->q->vm,
  → 222                              2 * SZ_4K,
    223                              ttm_bo_type_kernel,
    224                              XE_BO_FLAG_VRAM_IF_DGFX(tile) |
    225                              XE_BO_FLAG_PINNED);
  → 226  if (IS_ERR(tiny)) {
  → 227          KUNIT_FAIL(test, "Failed to allocate fake pt: %li\n",
  → 228                     PTR_ERR(pt));
    229          goto free_pt;
    230  }
Now, the IS_ERR(tiny) and the corresponding PTR_ERR(pt) do not match.
Returning PTR_ERR(tiny), as the last failed function call, seems logical.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Mirsad Todorovac mtodorovac69@gmail.com Link: https://patchwork.freedesktop.org/patch/msgid/20241121212057.1526634-2-mtodo... Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com (cherry picked from commit cb57c75098c1c449a007ba301f9073f96febaaa9) Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/tests/xe_migrate.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c index 1a192a2a941b..3bbdb362d6f0 100644 --- a/drivers/gpu/drm/xe/tests/xe_migrate.c +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c @@ -224,8 +224,8 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test) XE_BO_FLAG_VRAM_IF_DGFX(tile) | XE_BO_FLAG_PINNED); if (IS_ERR(tiny)) { - KUNIT_FAIL(test, "Failed to allocate fake pt: %li\n", - PTR_ERR(pt)); + KUNIT_FAIL(test, "Failed to allocate tiny fake pt: %li\n", + PTR_ERR(tiny)); goto free_pt; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lucas De Marchi lucas.demarchi@intel.com
[ Upstream commit d7b028656c29b22fcde1c6ee1df5b28fbba987b5 ]
That pool implementation doesn't really work: if the krealloc happens to move the memory and return another address, the entries in the xarray become invalid, leading to use-after-free later:
BUG: KASAN: slab-use-after-free in xe_reg_sr_apply_mmio+0x570/0x760 [xe]
Read of size 4 at addr ffff8881244b2590 by task modprobe/2753

Allocated by task 2753:
 kasan_save_stack+0x39/0x70
 kasan_save_track+0x14/0x40
 kasan_save_alloc_info+0x37/0x60
 __kasan_kmalloc+0xc3/0xd0
 __kmalloc_node_track_caller_noprof+0x200/0x6d0
 krealloc_noprof+0x229/0x380
Simplify the code to fix the bug. A better pooling strategy may be added back later if needed.
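A minimal userspace illustration of the failure mode (not the xe code): growing an array with realloc()/krealloc() may move it, so any pointer to an element stashed elsewhere, like the entry pointers kept in the xarray, silently becomes dangling.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int *arr = malloc(4 * sizeof(*arr));
        if (!arr)
            return 1;

        uintptr_t stashed = (uintptr_t)&arr[0];   /* address recorded elsewhere */

        int *bigger = realloc(arr, (1 << 20) * sizeof(*bigger));
        if (!bigger)
            return 1;

        if ((uintptr_t)&bigger[0] != stashed)
            printf("array moved; the stashed pointer now dangles\n");
        else
            printf("array happened not to move this time\n");

        /* Dereferencing the old address after a move would be a use-after-free,
         * which is exactly what KASAN flagged in xe_reg_sr_apply_mmio().
         */
        free(bigger);
        return 0;
    }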
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reviewed-by: Matt Roper matthew.d.roper@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20241209232739.147417-2-lucas.... Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com (cherry picked from commit e5283bd4dfecbd3335f43b62a68e24dae23f59e4) Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_reg_sr.c | 31 ++++++---------------------- drivers/gpu/drm/xe/xe_reg_sr_types.h | 6 ------ 2 files changed, 6 insertions(+), 31 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_reg_sr.c b/drivers/gpu/drm/xe/xe_reg_sr.c index 440ac572f6e5..52969c090965 100644 --- a/drivers/gpu/drm/xe/xe_reg_sr.c +++ b/drivers/gpu/drm/xe/xe_reg_sr.c @@ -26,46 +26,27 @@ #include "xe_reg_whitelist.h" #include "xe_rtp_types.h"
-#define XE_REG_SR_GROW_STEP_DEFAULT 16 - static void reg_sr_fini(struct drm_device *drm, void *arg) { struct xe_reg_sr *sr = arg; + struct xe_reg_sr_entry *entry; + unsigned long reg; + + xa_for_each(&sr->xa, reg, entry) + kfree(entry);
xa_destroy(&sr->xa); - kfree(sr->pool.arr); - memset(&sr->pool, 0, sizeof(sr->pool)); }
int xe_reg_sr_init(struct xe_reg_sr *sr, const char *name, struct xe_device *xe) { xa_init(&sr->xa); - memset(&sr->pool, 0, sizeof(sr->pool)); - sr->pool.grow_step = XE_REG_SR_GROW_STEP_DEFAULT; sr->name = name;
return drmm_add_action_or_reset(&xe->drm, reg_sr_fini, sr); } EXPORT_SYMBOL_IF_KUNIT(xe_reg_sr_init);
-static struct xe_reg_sr_entry *alloc_entry(struct xe_reg_sr *sr) -{ - if (sr->pool.used == sr->pool.allocated) { - struct xe_reg_sr_entry *arr; - - arr = krealloc_array(sr->pool.arr, - ALIGN(sr->pool.allocated + 1, sr->pool.grow_step), - sizeof(*arr), GFP_KERNEL); - if (!arr) - return NULL; - - sr->pool.arr = arr; - sr->pool.allocated += sr->pool.grow_step; - } - - return &sr->pool.arr[sr->pool.used++]; -} - static bool compatible_entries(const struct xe_reg_sr_entry *e1, const struct xe_reg_sr_entry *e2) { @@ -111,7 +92,7 @@ int xe_reg_sr_add(struct xe_reg_sr *sr, return 0; }
- pentry = alloc_entry(sr); + pentry = kmalloc(sizeof(*pentry), GFP_KERNEL); if (!pentry) { ret = -ENOMEM; goto fail; diff --git a/drivers/gpu/drm/xe/xe_reg_sr_types.h b/drivers/gpu/drm/xe/xe_reg_sr_types.h index ad48a52b824a..ebe11f237fa2 100644 --- a/drivers/gpu/drm/xe/xe_reg_sr_types.h +++ b/drivers/gpu/drm/xe/xe_reg_sr_types.h @@ -20,12 +20,6 @@ struct xe_reg_sr_entry { };
struct xe_reg_sr { - struct { - struct xe_reg_sr_entry *arr; - unsigned int used; - unsigned int allocated; - unsigned int grow_step; - } pool; struct xarray xa; const char *name;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Chancellor nathan@kernel.org
[ Upstream commit 57e420c84f9ab55ba4c5e2ae9c5f6c8e1ea834d2 ]
A recent change to clamp() and its variants [1] increased the coverage of the compile-time check that the high limit is greater than the low limit, since the check can now be performed through inlining. As a result, certain build configurations (such as s390 defconfig) fail to build with clang with:
block/blk-iocost.c:1101:11: error: call to '__compiletime_assert_557' declared with 'error' attribute: clamp() low limit 1 greater than high limit active
 1101 |                 inuse = clamp_t(u32, inuse, 1, active);
      |                         ^
include/linux/minmax.h:218:36: note: expanded from macro 'clamp_t'
  218 | #define clamp_t(type, val, lo, hi) __careful_clamp(type, val, lo, hi)
      |                                    ^
include/linux/minmax.h:195:2: note: expanded from macro '__careful_clamp'
  195 |         __clamp_once(type, val, lo, hi, __UNIQUE_ID(v_), __UNIQUE_ID(l_), __UNIQUE_ID(h_))
      |         ^
include/linux/minmax.h:188:2: note: expanded from macro '__clamp_once'
  188 |         BUILD_BUG_ON_MSG(statically_true(ulo > uhi),    \
      |         ^
__propagate_weights() is called with an active value of zero in ioc_check_iocgs(), which results in the high value being less than the low value, which is undefined because the value returned depends on the order of the comparisons.
The purpose of this expression is to ensure that inuse is no more than active and at least 1. This can be written more simply as a ternary expression with min(inuse, active) as the condition, so that the value of the condition is used when it is non-zero and 1 is used when it is zero. Do this conversion to resolve the error and add a comment to deter people from turning this back into clamp().
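A small user-space sketch (illustrative only) showing that the replacement expression behaves as intended for the three interesting cases, including active == 0, which is the case clamp() rejects:

#include <stdio.h>

static unsigned int propagate(unsigned int inuse, unsigned int active)
{
	unsigned int m = inuse < active ? inuse : active;	/* min(inuse, active) */

	return m ?: 1;	/* GNU ?: extension, as used by the patch */
}

int main(void)
{
	printf("%u\n", propagate(5, 3));	/* 3: capped at active */
	printf("%u\n", propagate(0, 3));	/* 1: floor of 1 */
	printf("%u\n", propagate(5, 0));	/* 1: active == 0 no longer trips an assert */
	return 0;
}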
Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost") Link: https://lore.kernel.org/r/34d53778977747f19cce2abb287bb3e6@AcuMS.aculab.com/ [1] Suggested-by: David Laight david.laight@aculab.com Reported-by: Linux Kernel Functional Testing lkft@linaro.org Closes: https://lore.kernel.org/llvm/CA+G9fYsD7mw13wredcZn0L-KBA3yeoVSTuxnss-AEWMN3h... Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202412120322.3GfVe3vF-lkp@intel.com/ Signed-off-by: Nathan Chancellor nathan@kernel.org Acked-by: Tejun Heo tj@kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-iocost.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/block/blk-iocost.c b/block/blk-iocost.c index 384aa15e8260..a5894ec9696e 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -1098,7 +1098,14 @@ static void __propagate_weights(struct ioc_gq *iocg, u32 active, u32 inuse, inuse = DIV64_U64_ROUND_UP(active * iocg->child_inuse_sum, iocg->child_active_sum); } else { - inuse = clamp_t(u32, inuse, 1, active); + /* + * It may be tempting to turn this into a clamp expression with + * a lower limit of 1 but active may be 0, which cannot be used + * as an upper limit in that situation. This expression allows + * active to clamp inuse unless it is 0, in which case inuse + * becomes 1. + */ + inuse = min(inuse, active) ?: 1; }
iocg->last_inuse = iocg->inuse;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Weizhao Ouyang o451686892@gmail.com
[ Upstream commit ce03573a1917532da06057da9f8e74a2ee9e2ac9 ]
When using svcr_in to check ZA and Streaming Mode, we should make sure that the value in x2 is correct; otherwise it may trigger an Illegal Instruction exception on systems that implement FEAT_SVE but not FEAT_SME.
Fixes: 43e3f85523e4 ("kselftest/arm64: Add SME support to syscall ABI test") Signed-off-by: Weizhao Ouyang o451686892@gmail.com Reviewed-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20241211111639.12344-1-o451686892@gmail.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../selftests/arm64/abi/syscall-abi-asm.S | 32 +++++++++---------- 1 file changed, 15 insertions(+), 17 deletions(-)
diff --git a/tools/testing/selftests/arm64/abi/syscall-abi-asm.S b/tools/testing/selftests/arm64/abi/syscall-abi-asm.S index df3230fdac39..66ab2e0bae5f 100644 --- a/tools/testing/selftests/arm64/abi/syscall-abi-asm.S +++ b/tools/testing/selftests/arm64/abi/syscall-abi-asm.S @@ -81,32 +81,31 @@ do_syscall: stp x27, x28, [sp, #96]
// Set SVCR if we're doing SME - cbz x1, 1f + cbz x1, load_gpr adrp x2, svcr_in ldr x2, [x2, :lo12:svcr_in] msr S3_3_C4_C2_2, x2 -1:
// Load ZA and ZT0 if enabled - uses x12 as scratch due to SME LDR - tbz x2, #SVCR_ZA_SHIFT, 1f + tbz x2, #SVCR_ZA_SHIFT, load_gpr mov w12, #0 ldr x2, =za_in -2: _ldr_za 12, 2 +1: _ldr_za 12, 2 add x2, x2, x1 add x12, x12, #1 cmp x1, x12 - bne 2b + bne 1b
// ZT0 mrs x2, S3_0_C0_C4_5 // ID_AA64SMFR0_EL1 ubfx x2, x2, #ID_AA64SMFR0_EL1_SMEver_SHIFT, \ #ID_AA64SMFR0_EL1_SMEver_WIDTH - cbz x2, 1f + cbz x2, load_gpr adrp x2, zt_in add x2, x2, :lo12:zt_in _ldr_zt 2 -1:
+load_gpr: // Load GPRs x8-x28, and save our SP/FP for later comparison ldr x2, =gpr_in add x2, x2, #64 @@ -125,9 +124,9 @@ do_syscall: str x30, [x2], #8 // LR
// Load FPRs if we're not doing neither SVE nor streaming SVE - cbnz x0, 1f + cbnz x0, check_sve_in ldr x2, =svcr_in - tbnz x2, #SVCR_SM_SHIFT, 1f + tbnz x2, #SVCR_SM_SHIFT, check_sve_in
ldr x2, =fpr_in ldp q0, q1, [x2] @@ -148,8 +147,8 @@ do_syscall: ldp q30, q31, [x2, #16 * 30]
b 2f -1:
+check_sve_in: // Load the SVE registers if we're doing SVE/SME
ldr x2, =z_in @@ -256,32 +255,31 @@ do_syscall: stp q30, q31, [x2, #16 * 30]
// Save SVCR if we're doing SME - cbz x1, 1f + cbz x1, check_sve_out mrs x2, S3_3_C4_C2_2 adrp x3, svcr_out str x2, [x3, :lo12:svcr_out] -1:
// Save ZA if it's enabled - uses x12 as scratch due to SME STR - tbz x2, #SVCR_ZA_SHIFT, 1f + tbz x2, #SVCR_ZA_SHIFT, check_sve_out mov w12, #0 ldr x2, =za_out -2: _str_za 12, 2 +1: _str_za 12, 2 add x2, x2, x1 add x12, x12, #1 cmp x1, x12 - bne 2b + bne 1b
// ZT0 mrs x2, S3_0_C0_C4_5 // ID_AA64SMFR0_EL1 ubfx x2, x2, #ID_AA64SMFR0_EL1_SMEver_SHIFT, \ #ID_AA64SMFR0_EL1_SMEver_WIDTH - cbz x2, 1f + cbz x2, check_sve_out adrp x2, zt_out add x2, x2, :lo12:zt_out _str_zt 2 -1:
+check_sve_out: // Save the SVE state if we have some cbz x0, 1f
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ming Lei ming.lei@redhat.com
[ Upstream commit 22465bbac53c821319089016f268a2437de9b00a ]
Registering and unregistering a cpuhp callback requires the global CPU hotplug lock, which is used everywhere. Meanwhile, q->sysfs_lock is used almost everywhere in the block layer.
It is easy to trigger lockdep warning[1] by connecting the two locks.
Fix the warning by moving blk-mq's cpuhp callback registration out of q->sysfs_lock. Add one dedicated global lock to cover registering and unregistering an hctx's cpuhp callbacks; this is safe because an hctx is guaranteed to be live as long as its request_queue is live.
[1] https://lore.kernel.org/lkml/Z04pz3AlvI4o0Mr8@agluck-desk3/
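A simplified sketch of the locking pattern introduced here, with made-up names and only the CPUHP_BLK_MQ_DEAD instance shown: one global mutex serialises add/remove, and hlist_unhashed() on the instance node serves as the "already registered" marker.

#include <linux/cpuhotplug.h>
#include <linux/list.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(example_cpuhp_lock);

struct example_hctx {
	struct hlist_node cpuhp_dead;
};

static void example_add_cpuhp(struct example_hctx *hctx)
{
	int ret;

	mutex_lock(&example_cpuhp_lock);
	if (hlist_unhashed(&hctx->cpuhp_dead))
		ret = cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD,
						       &hctx->cpuhp_dead);
	mutex_unlock(&example_cpuhp_lock);
	/* error handling elided for brevity */
}

static void example_remove_cpuhp(struct example_hctx *hctx)
{
	mutex_lock(&example_cpuhp_lock);
	if (!hlist_unhashed(&hctx->cpuhp_dead)) {
		cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_DEAD,
						    &hctx->cpuhp_dead);
		/* mark as unregistered so a later add can re-register */
		INIT_HLIST_NODE(&hctx->cpuhp_dead);
	}
	mutex_unlock(&example_cpuhp_lock);
}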
Cc: Reinette Chatre reinette.chatre@intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Peter Newman peternewman@google.com Cc: Babu Moger babu.moger@amd.com Reported-by: Luck Tony tony.luck@intel.com Signed-off-by: Ming Lei ming.lei@redhat.com Tested-by: Tony Luck tony.luck@intel.com Link: https://lore.kernel.org/r/20241206111611.978870-3-ming.lei@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Stable-dep-of: be26ba96421a ("block: Fix potential deadlock while freezing queue and acquiring sysfs_lock") Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-mq.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 92 insertions(+), 6 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c index b4fba7b398e5..1030875a3e95 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -43,6 +43,7 @@
static DEFINE_PER_CPU(struct llist_head, blk_cpu_done); static DEFINE_PER_CPU(call_single_data_t, blk_cpu_csd); +static DEFINE_MUTEX(blk_mq_cpuhp_lock);
static void blk_mq_insert_request(struct request *rq, blk_insert_t flags); static void blk_mq_request_bypass_insert(struct request *rq, @@ -3740,13 +3741,91 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node) return 0; }
-static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx) +static void __blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx) { - if (!(hctx->flags & BLK_MQ_F_STACKING)) + lockdep_assert_held(&blk_mq_cpuhp_lock); + + if (!(hctx->flags & BLK_MQ_F_STACKING) && + !hlist_unhashed(&hctx->cpuhp_online)) { cpuhp_state_remove_instance_nocalls(CPUHP_AP_BLK_MQ_ONLINE, &hctx->cpuhp_online); - cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_DEAD, - &hctx->cpuhp_dead); + INIT_HLIST_NODE(&hctx->cpuhp_online); + } + + if (!hlist_unhashed(&hctx->cpuhp_dead)) { + cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_DEAD, + &hctx->cpuhp_dead); + INIT_HLIST_NODE(&hctx->cpuhp_dead); + } +} + +static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx) +{ + mutex_lock(&blk_mq_cpuhp_lock); + __blk_mq_remove_cpuhp(hctx); + mutex_unlock(&blk_mq_cpuhp_lock); +} + +static void __blk_mq_add_cpuhp(struct blk_mq_hw_ctx *hctx) +{ + lockdep_assert_held(&blk_mq_cpuhp_lock); + + if (!(hctx->flags & BLK_MQ_F_STACKING) && + hlist_unhashed(&hctx->cpuhp_online)) + cpuhp_state_add_instance_nocalls(CPUHP_AP_BLK_MQ_ONLINE, + &hctx->cpuhp_online); + + if (hlist_unhashed(&hctx->cpuhp_dead)) + cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, + &hctx->cpuhp_dead); +} + +static void __blk_mq_remove_cpuhp_list(struct list_head *head) +{ + struct blk_mq_hw_ctx *hctx; + + lockdep_assert_held(&blk_mq_cpuhp_lock); + + list_for_each_entry(hctx, head, hctx_list) + __blk_mq_remove_cpuhp(hctx); +} + +/* + * Unregister cpuhp callbacks from exited hw queues + * + * Safe to call if this `request_queue` is live + */ +static void blk_mq_remove_hw_queues_cpuhp(struct request_queue *q) +{ + LIST_HEAD(hctx_list); + + spin_lock(&q->unused_hctx_lock); + list_splice_init(&q->unused_hctx_list, &hctx_list); + spin_unlock(&q->unused_hctx_lock); + + mutex_lock(&blk_mq_cpuhp_lock); + __blk_mq_remove_cpuhp_list(&hctx_list); + mutex_unlock(&blk_mq_cpuhp_lock); + + spin_lock(&q->unused_hctx_lock); + list_splice(&hctx_list, &q->unused_hctx_list); + spin_unlock(&q->unused_hctx_lock); +} + +/* + * Register cpuhp callbacks from all hw queues + * + * Safe to call if this `request_queue` is live + */ +static void blk_mq_add_hw_queues_cpuhp(struct request_queue *q) +{ + struct blk_mq_hw_ctx *hctx; + unsigned long i; + + mutex_lock(&blk_mq_cpuhp_lock); + queue_for_each_hw_ctx(q, hctx, i) + __blk_mq_add_cpuhp(hctx); + mutex_unlock(&blk_mq_cpuhp_lock); }
/* @@ -3797,8 +3876,6 @@ static void blk_mq_exit_hctx(struct request_queue *q, if (set->ops->exit_hctx) set->ops->exit_hctx(hctx, hctx_idx);
- blk_mq_remove_cpuhp(hctx); - xa_erase(&q->hctx_table, hctx_idx);
spin_lock(&q->unused_hctx_lock); @@ -3815,6 +3892,7 @@ static void blk_mq_exit_hw_queues(struct request_queue *q, queue_for_each_hw_ctx(q, hctx, i) { if (i == nr_queue) break; + blk_mq_remove_cpuhp(hctx); blk_mq_exit_hctx(q, set, hctx, i); } } @@ -3878,6 +3956,8 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set, INIT_DELAYED_WORK(&hctx->run_work, blk_mq_run_work_fn); spin_lock_init(&hctx->lock); INIT_LIST_HEAD(&hctx->dispatch); + INIT_HLIST_NODE(&hctx->cpuhp_dead); + INIT_HLIST_NODE(&hctx->cpuhp_online); hctx->queue = q; hctx->flags = set->flags & ~BLK_MQ_F_TAG_QUEUE_SHARED;
@@ -4416,6 +4496,12 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, xa_for_each_start(&q->hctx_table, j, hctx, j) blk_mq_exit_hctx(q, set, hctx, j); mutex_unlock(&q->sysfs_lock); + + /* unregister cpuhp callbacks for exited hctxs */ + blk_mq_remove_hw_queues_cpuhp(q); + + /* register cpuhp for new initialized hctxs */ + blk_mq_add_hw_queues_cpuhp(q); }
int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nilay Shroff nilay@linux.ibm.com
[ Upstream commit be26ba96421ab0a8fa2055ccf7db7832a13c44d2 ]
For storing a value to a queue attribute, the queue_attr_store function first freezes the queue (->q_usage_counter(io)) and then acquires ->sysfs_lock. This is not correct, as the usual ordering should be to acquire ->sysfs_lock before freezing the queue. This incorrect ordering causes the following lockdep splat, which we can reproduce reliably simply by listing /sys/kernel/debug with the ls command:
[ 57.597146] WARNING: possible circular locking dependency detected [ 57.597154] 6.12.0-10553-gb86545e02e8c #20 Tainted: G W [ 57.597162] ------------------------------------------------------ [ 57.597168] ls/4605 is trying to acquire lock: [ 57.597176] c00000003eb56710 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x58/0xc0 [ 57.597200] but task is already holding lock: [ 57.597207] c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4 [ 57.597226] which lock already depends on the new lock.
[ 57.597233] the existing dependency chain (in reverse order) is: [ 57.597241] -> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}: [ 57.597255] down_write+0x6c/0x18c [ 57.597264] start_creating+0xb4/0x24c [ 57.597274] debugfs_create_dir+0x2c/0x1e8 [ 57.597283] blk_register_queue+0xec/0x294 [ 57.597292] add_disk_fwnode+0x2e4/0x548 [ 57.597302] brd_alloc+0x2c8/0x338 [ 57.597309] brd_init+0x100/0x178 [ 57.597317] do_one_initcall+0x88/0x3e4 [ 57.597326] kernel_init_freeable+0x3cc/0x6e0 [ 57.597334] kernel_init+0x34/0x1cc [ 57.597342] ret_from_kernel_user_thread+0x14/0x1c [ 57.597350] -> #4 (&q->debugfs_mutex){+.+.}-{4:4}: [ 57.597362] __mutex_lock+0xfc/0x12a0 [ 57.597370] blk_register_queue+0xd4/0x294 [ 57.597379] add_disk_fwnode+0x2e4/0x548 [ 57.597388] brd_alloc+0x2c8/0x338 [ 57.597395] brd_init+0x100/0x178 [ 57.597402] do_one_initcall+0x88/0x3e4 [ 57.597410] kernel_init_freeable+0x3cc/0x6e0 [ 57.597418] kernel_init+0x34/0x1cc [ 57.597426] ret_from_kernel_user_thread+0x14/0x1c [ 57.597434] -> #3 (&q->sysfs_lock){+.+.}-{4:4}: [ 57.597446] __mutex_lock+0xfc/0x12a0 [ 57.597454] queue_attr_store+0x9c/0x110 [ 57.597462] sysfs_kf_write+0x70/0xb0 [ 57.597471] kernfs_fop_write_iter+0x1b0/0x2ac [ 57.597480] vfs_write+0x3dc/0x6e8 [ 57.597488] ksys_write+0x84/0x140 [ 57.597495] system_call_exception+0x130/0x360 [ 57.597504] system_call_common+0x160/0x2c4 [ 57.597516] -> #2 (&q->q_usage_counter(io)#21){++++}-{0:0}: [ 57.597530] __submit_bio+0x5ec/0x828 [ 57.597538] submit_bio_noacct_nocheck+0x1e4/0x4f0 [ 57.597547] iomap_readahead+0x2a0/0x448 [ 57.597556] xfs_vm_readahead+0x28/0x3c [ 57.597564] read_pages+0x88/0x41c [ 57.597571] page_cache_ra_unbounded+0x1ac/0x2d8 [ 57.597580] filemap_get_pages+0x188/0x984 [ 57.597588] filemap_read+0x13c/0x4bc [ 57.597596] xfs_file_buffered_read+0x88/0x17c [ 57.597605] xfs_file_read_iter+0xac/0x158 [ 57.597614] vfs_read+0x2d4/0x3b4 [ 57.597622] ksys_read+0x84/0x144 [ 57.597629] system_call_exception+0x130/0x360 [ 57.597637] system_call_common+0x160/0x2c4 [ 57.597647] -> #1 (mapping.invalidate_lock#2){++++}-{4:4}: [ 57.597661] down_read+0x6c/0x220 [ 57.597669] filemap_fault+0x870/0x100c [ 57.597677] xfs_filemap_fault+0xc4/0x18c [ 57.597684] __do_fault+0x64/0x164 [ 57.597693] __handle_mm_fault+0x1274/0x1dac [ 57.597702] handle_mm_fault+0x248/0x484 [ 57.597711] ___do_page_fault+0x428/0xc0c [ 57.597719] hash__do_page_fault+0x30/0x68 [ 57.597727] do_hash_fault+0x90/0x35c [ 57.597736] data_access_common_virt+0x210/0x220 [ 57.597745] _copy_from_user+0xf8/0x19c [ 57.597754] sel_write_load+0x178/0xd54 [ 57.597762] vfs_write+0x108/0x6e8 [ 57.597769] ksys_write+0x84/0x140 [ 57.597777] system_call_exception+0x130/0x360 [ 57.597785] system_call_common+0x160/0x2c4 [ 57.597794] -> #0 (&mm->mmap_lock){++++}-{4:4}: [ 57.597806] __lock_acquire+0x17cc/0x2330 [ 57.597814] lock_acquire+0x138/0x400 [ 57.597822] __might_fault+0x7c/0xc0 [ 57.597830] filldir64+0xe8/0x390 [ 57.597839] dcache_readdir+0x80/0x2d4 [ 57.597846] iterate_dir+0xd8/0x1d4 [ 57.597855] sys_getdents64+0x88/0x2d4 [ 57.597864] system_call_exception+0x130/0x360 [ 57.597872] system_call_common+0x160/0x2c4 [ 57.597881] other info that might help us debug this:
[ 57.597888] Chain exists of: &mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3
[ 57.597905] Possible unsafe locking scenario:
[ 57.597911] CPU0 CPU1 [ 57.597917] ---- ---- [ 57.597922] rlock(&sb->s_type->i_mutex_key#3); [ 57.597932] lock(&q->debugfs_mutex); [ 57.597940] lock(&sb->s_type->i_mutex_key#3); [ 57.597950] rlock(&mm->mmap_lock); [ 57.597958] *** DEADLOCK ***
[ 57.597965] 2 locks held by ls/4605: [ 57.597971] #0: c0000000137c12f8 (&f->f_pos_lock){+.+.}-{4:4}, at: fdget_pos+0xcc/0x154 [ 57.597989] #1: c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4
Prevent the above lockdep warning by acquiring ->sysfs_lock before freezing the queue while storing a queue attribute in the queue_attr_store function. Later, we also found[1] another function, __blk_mq_update_nr_hw_queues, where we first freeze the queue and then acquire ->sysfs_lock. So we've also updated the lock ordering in __blk_mq_update_nr_hw_queues and ensured that all code paths follow the correct lock ordering, i.e. acquire ->sysfs_lock before freezing the queue.
[1] https://lore.kernel.org/all/CAFj5m9Ke8+EHKQBs_Nk6hqd=LGXtk4mUxZUN5==ZcCjnZSB...
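A minimal sketch of the corrected ordering (simplified; example_attr_store() is a made-up stand-in for queue_attr_store()): take ->sysfs_lock first, freeze second, and release in the reverse order.

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

static ssize_t example_attr_store(struct request_queue *q,
				  const char *page, size_t len,
				  ssize_t (*store)(struct request_queue *,
						   const char *, size_t))
{
	ssize_t res;

	mutex_lock(&q->sysfs_lock);	/* 1: lock */
	blk_mq_freeze_queue(q);		/* 2: freeze */
	res = store(q, page, len);
	blk_mq_unfreeze_queue(q);	/* 3: unfreeze */
	mutex_unlock(&q->sysfs_lock);	/* 4: unlock */

	return res;
}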
Reported-by: kjain@linux.ibm.com Fixes: af2814149883 ("block: freeze the queue in queue_attr_store") Tested-by: kjain@linux.ibm.com Cc: hch@lst.de Cc: axboe@kernel.dk Cc: ritesh.list@gmail.com Cc: ming.lei@redhat.com Cc: gjoyce@linux.ibm.com Signed-off-by: Nilay Shroff nilay@linux.ibm.com Reviewed-by: Ming Lei ming.lei@redhat.com Link: https://lore.kernel.org/r/20241210144222.1066229-1-nilay@linux.ibm.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-mq-sysfs.c | 16 ++++++---------- block/blk-mq.c | 29 ++++++++++++++++++----------- block/blk-sysfs.c | 4 ++-- 3 files changed, 26 insertions(+), 23 deletions(-)
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c index 156e9bb07abf..cd5ea6eaa76b 100644 --- a/block/blk-mq-sysfs.c +++ b/block/blk-mq-sysfs.c @@ -275,15 +275,13 @@ void blk_mq_sysfs_unregister_hctxs(struct request_queue *q) struct blk_mq_hw_ctx *hctx; unsigned long i;
- mutex_lock(&q->sysfs_dir_lock); + lockdep_assert_held(&q->sysfs_dir_lock); + if (!q->mq_sysfs_init_done) - goto unlock; + return;
queue_for_each_hw_ctx(q, hctx, i) blk_mq_unregister_hctx(hctx); - -unlock: - mutex_unlock(&q->sysfs_dir_lock); }
int blk_mq_sysfs_register_hctxs(struct request_queue *q) @@ -292,9 +290,10 @@ int blk_mq_sysfs_register_hctxs(struct request_queue *q) unsigned long i; int ret = 0;
- mutex_lock(&q->sysfs_dir_lock); + lockdep_assert_held(&q->sysfs_dir_lock); + if (!q->mq_sysfs_init_done) - goto unlock; + return ret;
queue_for_each_hw_ctx(q, hctx, i) { ret = blk_mq_register_hctx(hctx); @@ -302,8 +301,5 @@ int blk_mq_sysfs_register_hctxs(struct request_queue *q) break; }
-unlock: - mutex_unlock(&q->sysfs_dir_lock); - return ret; } diff --git a/block/blk-mq.c b/block/blk-mq.c index 1030875a3e95..cc1b32023838 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -4462,7 +4462,8 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, unsigned long i, j;
/* protect against switching io scheduler */ - mutex_lock(&q->sysfs_lock); + lockdep_assert_held(&q->sysfs_lock); + for (i = 0; i < set->nr_hw_queues; i++) { int old_node; int node = blk_mq_get_hctx_node(set, i); @@ -4495,7 +4496,6 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
xa_for_each_start(&q->hctx_table, j, hctx, j) blk_mq_exit_hctx(q, set, hctx, j); - mutex_unlock(&q->sysfs_lock);
/* unregister cpuhp callbacks for exited hctxs */ blk_mq_remove_hw_queues_cpuhp(q); @@ -4527,10 +4527,14 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
xa_init(&q->hctx_table);
+ mutex_lock(&q->sysfs_lock); + blk_mq_realloc_hw_ctxs(set, q); if (!q->nr_hw_queues) goto err_hctxs;
+ mutex_unlock(&q->sysfs_lock); + INIT_WORK(&q->timeout_work, blk_mq_timeout_work); blk_queue_rq_timeout(q, set->timeout ? set->timeout : 30 * HZ);
@@ -4549,6 +4553,7 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set, return 0;
err_hctxs: + mutex_unlock(&q->sysfs_lock); blk_mq_release(q); err_exit: q->mq_ops = NULL; @@ -4929,12 +4934,12 @@ static bool blk_mq_elv_switch_none(struct list_head *head, return false;
/* q->elevator needs protection from ->sysfs_lock */ - mutex_lock(&q->sysfs_lock); + lockdep_assert_held(&q->sysfs_lock);
/* the check has to be done with holding sysfs_lock */ if (!q->elevator) { kfree(qe); - goto unlock; + goto out; }
INIT_LIST_HEAD(&qe->node); @@ -4944,9 +4949,7 @@ static bool blk_mq_elv_switch_none(struct list_head *head, __elevator_get(qe->type); list_add(&qe->node, head); elevator_disable(q); -unlock: - mutex_unlock(&q->sysfs_lock); - +out: return true; }
@@ -4975,11 +4978,9 @@ static void blk_mq_elv_switch_back(struct list_head *head, list_del(&qe->node); kfree(qe);
- mutex_lock(&q->sysfs_lock); elevator_switch(q, t); /* drop the reference acquired in blk_mq_elv_switch_none */ elevator_put(t); - mutex_unlock(&q->sysfs_lock); }
static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, @@ -4999,8 +5000,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, if (set->nr_maps == 1 && nr_hw_queues == set->nr_hw_queues) return;
- list_for_each_entry(q, &set->tag_list, tag_set_list) + list_for_each_entry(q, &set->tag_list, tag_set_list) { + mutex_lock(&q->sysfs_dir_lock); + mutex_lock(&q->sysfs_lock); blk_mq_freeze_queue(q); + } /* * Switch IO scheduler to 'none', cleaning up the data associated * with the previous scheduler. We will switch back once we are done @@ -5056,8 +5060,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, list_for_each_entry(q, &set->tag_list, tag_set_list) blk_mq_elv_switch_back(&head, q);
- list_for_each_entry(q, &set->tag_list, tag_set_list) + list_for_each_entry(q, &set->tag_list, tag_set_list) { blk_mq_unfreeze_queue(q); + mutex_unlock(&q->sysfs_lock); + mutex_unlock(&q->sysfs_dir_lock); + }
/* Free the excess tags when nr_hw_queues shrink. */ for (i = set->nr_hw_queues; i < prev_nr_hw_queues; i++) diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 207577145c54..42c2cb97d778 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -690,11 +690,11 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr, return res; }
- blk_mq_freeze_queue(q); mutex_lock(&q->sysfs_lock); + blk_mq_freeze_queue(q); res = entry->store(disk, page, length); - mutex_unlock(&q->sysfs_lock); blk_mq_unfreeze_queue(q); + mutex_unlock(&q->sysfs_lock); return res; }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miguel Ojeda ojeda@kernel.org
commit 7a5f93ea5862da91488975acaa0c7abd508f192b upstream.
Each `bindgen` release may upgrade the list of Rust targets. For instance, currently, in their master branch [1], the latest ones are:
Nightly => { vectorcall_abi: #124485, ptr_metadata: #81513, layout_for_ptr: #69835, },
Stable_1_77(77) => { offset_of: #106655 },
Stable_1_73(73) => { thiscall_abi: #42202 },
Stable_1_71(71) => { c_unwind_abi: #106075 },
Stable_1_68(68) => { abi_efiapi: #105795 },
By default, the highest stable release in their list is used, and users are expected to set one if they need to support older Rust versions (e.g. see [2]).
Thus, over time, new Rust features are used by default, and at some point, it is likely that `bindgen` will emit Rust code that requires a Rust version higher than our minimum (or perhaps enabling an unstable feature). Currently, there is no problem because the maximum they have, as seen above, is Rust 1.77.0, and our current minimum is Rust 1.78.0.
Therefore, set a Rust target explicitly now to prevent going forward in time too much and thus getting potential build failures at some point.
Since we also support a minimum `bindgen` version, and since `bindgen` does not support passing unknown Rust target versions, we need to use the list of our minimum `bindgen` version, rather than the latest. So, since `bindgen` 0.65.1 had this list [3], we need to use Rust 1.68.0:
/// Rust stable 1.64
///  * `core_ffi_c` ([Tracking issue](https://github.com/rust-lang/rust/issues/94501))
=> Stable_1_64 => 1.64;
/// Rust stable 1.68
///  * `abi_efiapi` calling convention ([Tracking issue](https://github.com/rust-lang/rust/issues/65815))
=> Stable_1_68 => 1.68;
/// Nightly rust
///  * `thiscall` calling convention ([Tracking issue](https://github.com/rust-lang/rust/issues/42202))
///  * `vectorcall` calling convention (no tracking issue)
///  * `c_unwind` calling convention ([Tracking issue](https://github.com/rust-lang/rust/issues/74990))
=> Nightly => nightly;
...
/// Latest stable release of Rust
pub const LATEST_STABLE_RUST: RustTarget = RustTarget::Stable_1_68;
Thus add the `--rust-target 1.68` parameter. Add a comment as well explaining this.
An alternative would be to use the currently running (i.e. actual) `rustc` and `bindgen` versions to pick a "better" Rust target version. However, that would introduce more moving parts depending on the user setup and is also more complex to implement.
Starting with `bindgen` 0.71.0 [4], we will be able to set any future Rust version instead, i.e. we will be able to set here our minimum supported Rust version. Christian implemented it [5] after seeing this patch. Thanks!
Cc: Christian Poveda git@pvdrz.com Cc: Emilio Cobos Álvarez emilio@crisal.io Cc: stable@vger.kernel.org # needed for 6.12.y; unneeded for 6.6.y; do not apply to 6.1.y Fixes: c844fa64a2d4 ("rust: start supporting several `bindgen` versions") Link: https://github.com/rust-lang/rust-bindgen/blob/21c60f473f4e824d4aa9b2b508056... [1] Link: https://github.com/rust-lang/rust-bindgen/issues/2960 [2] Link: https://github.com/rust-lang/rust-bindgen/blob/7d243056d335fdc4537f7bca73c06... [3] Link: https://github.com/rust-lang/rust-bindgen/blob/main/CHANGELOG.md#0710-2024-1... [4] Link: https://github.com/rust-lang/rust-bindgen/pull/2993 [5] Reviewed-by: Alice Ryhl aliceryhl@google.com Link: https://lore.kernel.org/r/20241123180323.255997-1-ojeda@kernel.org Signed-off-by: Miguel Ojeda ojeda@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- rust/Makefile | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
--- a/rust/Makefile +++ b/rust/Makefile @@ -267,9 +267,22 @@ endif
bindgen_c_flags_final = $(bindgen_c_flags_lto) -D__BINDGEN__
+# Each `bindgen` release may upgrade the list of Rust target versions. By +# default, the highest stable release in their list is used. Thus we need to set +# a `--rust-target` to avoid future `bindgen` releases emitting code that +# `rustc` may not understand. On top of that, `bindgen` does not support passing +# an unknown Rust target version. +# +# Therefore, the Rust target for `bindgen` can be only as high as the minimum +# Rust version the kernel supports and only as high as the greatest stable Rust +# target supported by the minimum `bindgen` version the kernel supports (that +# is, if we do not test the actual `rustc`/`bindgen` versions running). +# +# Starting with `bindgen` 0.71.0, we will be able to set any future Rust version +# instead, i.e. we will be able to set here our minimum supported Rust version. quiet_cmd_bindgen = BINDGEN $@ cmd_bindgen = \ - $(BINDGEN) $< $(bindgen_target_flags) \ + $(BINDGEN) $< $(bindgen_target_flags) --rust-target 1.68 \ --use-core --with-derive-default --ctypes-prefix core::ffi --no-layout-tests \ --no-debug '.*' --enable-function-attribute-detection \ -o $@ -- $(bindgen_c_flags_final) -DMODULE \
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Morse james.morse@arm.com
commit 6685f5d572c22e1003e7c0d089afe1c64340ab1f upstream.
commit 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests, but didn't add trap handling. A previous patch supplied the missing trap handling.
Existing VMs that have the MPAM field of ID_AA64PFR0_EL1 set need to be migratable, but there is little point enabling the MPAM CPU interface on new VMs until there is something a guest can do with it.
Clear the MPAM field from the guest's ID_AA64PFR0_EL1 and, on hardware that supports MPAM, politely ignore the VMM's attempts to set this bit.
Guests exposed to this bug have the sanitised value of the MPAM field, so only the correct value needs to be ignored. This means the field can continue to be used to block migration to incompatible hardware (between MPAM=1 and MPAM=5), and the VMM can't rely on the field being ignored.
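A minimal sketch of the "accept but ignore" rule, with a made-up helper name; the real handlers in the patch below apply this to the MPAM and MPAM_frac fields.

#include <linux/types.h>

static u64 ignore_matching_field(u64 user_val, u64 hw_val, u64 field_mask)
{
	/*
	 * A write is accepted only if it matches the host's sanitised value;
	 * the field is then cleared so the guest still sees it as absent.
	 */
	if ((user_val & field_mask) == (hw_val & field_mask))
		user_val &= ~field_mask;

	return user_val;	/* anything else falls through to the normal checks */
}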
Signed-off-by: James Morse james.morse@arm.com Co-developed-by: Joey Gouly joey.gouly@arm.com Signed-off-by: Joey Gouly joey.gouly@arm.com Reviewed-by: Gavin Shan gshan@redhat.com Tested-by: Shameer Kolothum shameerali.kolothum.thodi@huawei.com Reviewed-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20241030160317.2528209-7-joey.gouly@arm.com Signed-off-by: Oliver Upton oliver.upton@linux.dev [maz: adapted to lack of ID_FILTERED()] Signed-off-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/sys_regs.c | 55 +++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 52 insertions(+), 3 deletions(-)
--- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1535,6 +1535,7 @@ static u64 __kvm_read_sanitised_id_reg(c val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTEX); val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_DF2); val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_PFAR); + val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MPAM_frac); break; case SYS_ID_AA64PFR2_EL1: /* We only expose FPMR */ @@ -1724,6 +1725,13 @@ static u64 read_sanitised_id_aa64pfr0_el
val &= ~ID_AA64PFR0_EL1_AMU_MASK;
+ /* + * MPAM is disabled by default as KVM also needs a set of PARTID to + * program the MPAMVPMx_EL2 PARTID remapping registers with. But some + * older kernels let the guest see the ID bit. + */ + val &= ~ID_AA64PFR0_EL1_MPAM_MASK; + return val; }
@@ -1834,6 +1842,42 @@ static int set_id_dfr0_el1(struct kvm_vc return set_id_reg(vcpu, rd, val); }
+static int set_id_aa64pfr0_el1(struct kvm_vcpu *vcpu, + const struct sys_reg_desc *rd, u64 user_val) +{ + u64 hw_val = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1); + u64 mpam_mask = ID_AA64PFR0_EL1_MPAM_MASK; + + /* + * Commit 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits + * in ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to + * guests, but didn't add trap handling. KVM doesn't support MPAM and + * always returns an UNDEF for these registers. The guest must see 0 + * for this field. + * + * But KVM must also accept values from user-space that were provided + * by KVM. On CPUs that support MPAM, permit user-space to write + * the sanitizied value to ID_AA64PFR0_EL1.MPAM, but ignore this field. + */ + if ((hw_val & mpam_mask) == (user_val & mpam_mask)) + user_val &= ~ID_AA64PFR0_EL1_MPAM_MASK; + + return set_id_reg(vcpu, rd, user_val); +} + +static int set_id_aa64pfr1_el1(struct kvm_vcpu *vcpu, + const struct sys_reg_desc *rd, u64 user_val) +{ + u64 hw_val = read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1); + u64 mpam_mask = ID_AA64PFR1_EL1_MPAM_frac_MASK; + + /* See set_id_aa64pfr0_el1 for comment about MPAM */ + if ((hw_val & mpam_mask) == (user_val & mpam_mask)) + user_val &= ~ID_AA64PFR1_EL1_MPAM_frac_MASK; + + return set_id_reg(vcpu, rd, user_val); +} + /* * cpufeature ID register user accessors * @@ -2377,7 +2421,7 @@ static const struct sys_reg_desc sys_reg { SYS_DESC(SYS_ID_AA64PFR0_EL1), .access = access_id_reg, .get_user = get_id_reg, - .set_user = set_id_reg, + .set_user = set_id_aa64pfr0_el1, .reset = read_sanitised_id_aa64pfr0_el1, .val = ~(ID_AA64PFR0_EL1_AMU | ID_AA64PFR0_EL1_MPAM | @@ -2385,7 +2429,12 @@ static const struct sys_reg_desc sys_reg ID_AA64PFR0_EL1_RAS | ID_AA64PFR0_EL1_AdvSIMD | ID_AA64PFR0_EL1_FP), }, - ID_WRITABLE(ID_AA64PFR1_EL1, ~(ID_AA64PFR1_EL1_PFAR | + { SYS_DESC(SYS_ID_AA64PFR1_EL1), + .access = access_id_reg, + .get_user = get_id_reg, + .set_user = set_id_aa64pfr1_el1, + .reset = kvm_read_sanitised_id_reg, + .val = ~(ID_AA64PFR1_EL1_PFAR | ID_AA64PFR1_EL1_DF2 | ID_AA64PFR1_EL1_MTEX | ID_AA64PFR1_EL1_THE | @@ -2397,7 +2446,7 @@ static const struct sys_reg_desc sys_reg ID_AA64PFR1_EL1_RES0 | ID_AA64PFR1_EL1_MPAM_frac | ID_AA64PFR1_EL1_RAS_frac | - ID_AA64PFR1_EL1_MTE)), + ID_AA64PFR1_EL1_MTE), }, ID_WRITABLE(ID_AA64PFR2_EL1, ID_AA64PFR2_EL1_FPMR), ID_UNALLOCATED(4,3), ID_WRITABLE(ID_AA64ZFR0_EL1, ~ID_AA64ZFR0_EL1_RES0),
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juergen Gross jgross@suse.com
commit f9244fb55f37356f75c739c57323d9422d7aa0f8 upstream.
When removing a netfront device directly after a suspend/resume cycle, it might happen that the queues have not been set up again, causing a crash during the attempt to stop the queues another time.
Fix that by checking that the queues exist before trying to stop them.
This is XSA-465 / CVE-2024-53240.
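A minimal sketch of the defensive pattern, with made-up structure names; the real fix below applies the same check to info->queues.

#include <linux/types.h>

struct example_queue {
	bool active;
};

struct example_info {
	struct example_queue *queues;
	unsigned int nr_queues;
};

static void example_destroy_queues(struct example_info *info)
{
	unsigned int i;

	/* Queues may never have been re-created after suspend/resume. */
	if (!info->queues)
		return;

	for (i = 0; i < info->nr_queues; i++)
		info->queues[i].active = false;	/* stand-in for per-queue teardown */
}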
Reported-by: Marek Marczykowski-Górecki marmarek@invisiblethingslab.com Fixes: d50b7914fae0 ("xen-netfront: Fix NULL sring after live migration") Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/xen-netfront.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -867,7 +867,7 @@ static netdev_tx_t xennet_start_xmit(str static int xennet_close(struct net_device *dev) { struct netfront_info *np = netdev_priv(dev); - unsigned int num_queues = dev->real_num_tx_queues; + unsigned int num_queues = np->queues ? dev->real_num_tx_queues : 0; unsigned int i; struct netfront_queue *queue; netif_tx_stop_all_queues(np->netdev); @@ -882,6 +882,9 @@ static void xennet_destroy_queues(struct { unsigned int i;
+ if (!info->queues) + return; + for (i = 0; i < info->netdev->real_num_tx_queues; i++) { struct netfront_queue *queue = &info->queues[i];
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juergen Gross jgross@suse.com
commit efbcd61d9bebb771c836a3b8bfced8165633db7c upstream.
In order to be able to differentiate between AMD and Intel based systems for very early hypercalls without having to rely on the Xen hypercall page, make get_cpu_vendor() non-static.
Refactor early_cpu_init() for the same reason by splitting out the loop initializing cpu_devs() into an externally callable function.
This is part of XSA-466 / CVE-2024-53241.
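A sketch of the intended very-early usage, mirroring the xen_get_vendor() helper added later in this series (the function name here is illustrative): populate cpu_devs[] and fill boot_cpu_data before early_cpu_init() has run.

#include <linux/init.h>
#include <linux/printk.h>
#include <asm/processor.h>

static void __init example_early_vendor_detect(void)
{
	init_cpu_devs();
	cpu_detect(&boot_cpu_data);
	get_cpu_vendor(&boot_cpu_data);

	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
		pr_info("AMD/Hygon specific path selected\n");
}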
Reported-by: Andrew Cooper andrew.cooper3@citrix.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/processor.h | 2 ++ arch/x86/kernel/cpu/common.c | 36 +++++++++++++++++++++--------------- 2 files changed, 23 insertions(+), 15 deletions(-)
--- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -212,6 +212,8 @@ static inline unsigned long long l1tf_pf return BIT_ULL(boot_cpu_data.x86_cache_bits - 1 - PAGE_SHIFT); }
+void init_cpu_devs(void); +void get_cpu_vendor(struct cpuinfo_x86 *c); extern void early_cpu_init(void); extern void identify_secondary_cpu(struct cpuinfo_x86 *); extern void print_cpu_info(struct cpuinfo_x86 *); --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -868,7 +868,7 @@ static void cpu_detect_tlb(struct cpuinf tlb_lld_4m[ENTRIES], tlb_lld_1g[ENTRIES]); }
-static void get_cpu_vendor(struct cpuinfo_x86 *c) +void get_cpu_vendor(struct cpuinfo_x86 *c) { char *v = c->x86_vendor_id; int i; @@ -1652,15 +1652,11 @@ static void __init early_identify_cpu(st detect_nopl(); }
-void __init early_cpu_init(void) +void __init init_cpu_devs(void) { const struct cpu_dev *const *cdev; int count = 0;
-#ifdef CONFIG_PROCESSOR_SELECT - pr_info("KERNEL supported cpus:\n"); -#endif - for (cdev = __x86_cpu_dev_start; cdev < __x86_cpu_dev_end; cdev++) { const struct cpu_dev *cpudev = *cdev;
@@ -1668,20 +1664,30 @@ void __init early_cpu_init(void) break; cpu_devs[count] = cpudev; count++; + } +}
+void __init early_cpu_init(void) +{ #ifdef CONFIG_PROCESSOR_SELECT - { - unsigned int j; + unsigned int i, j;
- for (j = 0; j < 2; j++) { - if (!cpudev->c_ident[j]) - continue; - pr_info(" %s %s\n", cpudev->c_vendor, - cpudev->c_ident[j]); - } - } + pr_info("KERNEL supported cpus:\n"); #endif + + init_cpu_devs(); + +#ifdef CONFIG_PROCESSOR_SELECT + for (i = 0; i < X86_VENDOR_NUM && cpu_devs[i]; i++) { + for (j = 0; j < 2; j++) { + if (!cpu_devs[i]->c_ident[j]) + continue; + pr_info(" %s %s\n", cpu_devs[i]->c_vendor, + cpu_devs[i]->c_ident[j]); + } } +#endif + early_identify_cpu(&boot_cpu_data); }
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juergen Gross jgross@suse.com
commit dda014ba59331dee4f3b773a020e109932f4bd24 upstream.
The syscall instruction is used in Xen PV mode for doing hypercalls. Allow syscall to be used in the kernel when it is tagged with an unwind hint for objtool.
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper andrew.cooper3@citrix.com Signed-off-by: Juergen Gross jgross@suse.com Co-developed-by: Peter Zijlstra peterz@infradead.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/objtool/check.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -3820,9 +3820,12 @@ static int validate_branch(struct objtoo break;
case INSN_CONTEXT_SWITCH: - if (func && (!next_insn || !next_insn->hint)) { - WARN_INSN(insn, "unsupported instruction in callable function"); - return 1; + if (func) { + if (!next_insn || !next_insn->hint) { + WARN_INSN(insn, "unsupported instruction in callable function"); + return 1; + } + break; } return 0;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juergen Gross jgross@suse.com
commit 0ef8047b737d7480a5d4c46d956e97c190f13050 upstream.
Add static_call_update_early() for updating static-call targets in very early boot.
This will be needed for support of Xen guest type specific hypercall functions.
This is part of XSA-466 / CVE-2024-53241.
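A hypothetical usage sketch, assuming the xen_hypercall static call and the xen_hypercall_amd() stub added by the later Xen patches in this series; the call site itself is illustrative only.

#include <linux/init.h>
#include <linux/static_call.h>

void xen_hypercall_func(void);
DECLARE_STATIC_CALL(xen_hypercall, xen_hypercall_func);	/* defined elsewhere */
void xen_hypercall_amd(void);

static void __init example_pick_target(void)
{
	/*
	 * Before static_call_init() has run, this patches the trampoline
	 * directly instead of going through the normal update machinery.
	 */
	static_call_update_early(xen_hypercall, xen_hypercall_amd);
}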
Reported-by: Andrew Cooper andrew.cooper3@citrix.com Signed-off-by: Juergen Gross jgross@suse.com Co-developed-by: Peter Zijlstra peterz@infradead.org Co-developed-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/static_call.h | 15 +++++++++++++++ arch/x86/include/asm/sync_core.h | 6 +++--- arch/x86/kernel/static_call.c | 9 +++++++++ include/linux/compiler.h | 37 ++++++++++++++++++++++++++----------- include/linux/static_call.h | 1 + kernel/static_call_inline.c | 2 +- 6 files changed, 55 insertions(+), 15 deletions(-)
--- a/arch/x86/include/asm/static_call.h +++ b/arch/x86/include/asm/static_call.h @@ -65,4 +65,19 @@
extern bool __static_call_fixup(void *tramp, u8 op, void *dest);
+extern void __static_call_update_early(void *tramp, void *func); + +#define static_call_update_early(name, _func) \ +({ \ + typeof(&STATIC_CALL_TRAMP(name)) __F = (_func); \ + if (static_call_initialized) { \ + __static_call_update(&STATIC_CALL_KEY(name), \ + STATIC_CALL_TRAMP_ADDR(name), __F);\ + } else { \ + WRITE_ONCE(STATIC_CALL_KEY(name).func, _func); \ + __static_call_update_early(STATIC_CALL_TRAMP_ADDR(name),\ + __F); \ + } \ +}) + #endif /* _ASM_STATIC_CALL_H */ --- a/arch/x86/include/asm/sync_core.h +++ b/arch/x86/include/asm/sync_core.h @@ -8,7 +8,7 @@ #include <asm/special_insns.h>
#ifdef CONFIG_X86_32 -static inline void iret_to_self(void) +static __always_inline void iret_to_self(void) { asm volatile ( "pushfl\n\t" @@ -19,7 +19,7 @@ static inline void iret_to_self(void) : ASM_CALL_CONSTRAINT : : "memory"); } #else -static inline void iret_to_self(void) +static __always_inline void iret_to_self(void) { unsigned int tmp;
@@ -55,7 +55,7 @@ static inline void iret_to_self(void) * Like all of Linux's memory ordering operations, this is a * compiler barrier as well. */ -static inline void sync_core(void) +static __always_inline void sync_core(void) { /* * The SERIALIZE instruction is the most straightforward way to --- a/arch/x86/kernel/static_call.c +++ b/arch/x86/kernel/static_call.c @@ -172,6 +172,15 @@ void arch_static_call_transform(void *si } EXPORT_SYMBOL_GPL(arch_static_call_transform);
+noinstr void __static_call_update_early(void *tramp, void *func) +{ + BUG_ON(system_state != SYSTEM_BOOTING); + BUG_ON(!early_boot_irqs_disabled); + BUG_ON(static_call_initialized); + __text_gen_insn(tramp, JMP32_INSN_OPCODE, tramp, func, JMP32_INSN_SIZE); + sync_core(); +} + #ifdef CONFIG_MITIGATION_RETHUNK /* * This is called by apply_returns() to fix up static call trampolines, --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -216,28 +216,43 @@ void ftrace_likely_update(struct ftrace_
#endif /* __KERNEL__ */
+/** + * offset_to_ptr - convert a relative memory offset to an absolute pointer + * @off: the address of the 32-bit offset value + */ +static inline void *offset_to_ptr(const int *off) +{ + return (void *)((unsigned long)off + *off); +} + +#endif /* __ASSEMBLY__ */ + +#ifdef CONFIG_64BIT +#define ARCH_SEL(a,b) a +#else +#define ARCH_SEL(a,b) b +#endif + /* * Force the compiler to emit 'sym' as a symbol, so that we can reference * it from inline assembler. Necessary in case 'sym' could be inlined * otherwise, or eliminated entirely due to lack of references that are * visible to the compiler. */ -#define ___ADDRESSABLE(sym, __attrs) \ - static void * __used __attrs \ +#define ___ADDRESSABLE(sym, __attrs) \ + static void * __used __attrs \ __UNIQUE_ID(__PASTE(__addressable_,sym)) = (void *)(uintptr_t)&sym; + #define __ADDRESSABLE(sym) \ ___ADDRESSABLE(sym, __section(".discard.addressable"))
-/** - * offset_to_ptr - convert a relative memory offset to an absolute pointer - * @off: the address of the 32-bit offset value - */ -static inline void *offset_to_ptr(const int *off) -{ - return (void *)((unsigned long)off + *off); -} +#define __ADDRESSABLE_ASM(sym) \ + .pushsection .discard.addressable,"aw"; \ + .align ARCH_SEL(8,4); \ + ARCH_SEL(.quad, .long) __stringify(sym); \ + .popsection;
-#endif /* __ASSEMBLY__ */ +#define __ADDRESSABLE_ASM_STR(sym) __stringify(__ADDRESSABLE_ASM(sym))
/* &a[0] degrades to a pointer: a different type from an array */ #define __must_be_array(a) BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0])) --- a/include/linux/static_call.h +++ b/include/linux/static_call.h @@ -138,6 +138,7 @@ #ifdef CONFIG_HAVE_STATIC_CALL #include <asm/static_call.h>
+extern int static_call_initialized; /* * Either @site or @tramp can be NULL. */ --- a/kernel/static_call_inline.c +++ b/kernel/static_call_inline.c @@ -15,7 +15,7 @@ extern struct static_call_site __start_s extern struct static_call_tramp_key __start_static_call_tramp_key[], __stop_static_call_tramp_key[];
-static int static_call_initialized; +int static_call_initialized;
/* * Must be called before early_initcall() to be effective.
On 17. 12. 24, 18:08, Greg Kroah-Hartman wrote:
6.12-stable review patch. If anyone has any objections, please let me know.
From: Juergen Gross jgross@suse.com
commit 0ef8047b737d7480a5d4c46d956e97c190f13050 upstream.
Add static_call_update_early() for updating static-call targets in very early boot.
This will be needed for support of Xen guest type specific hypercall functions.
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper andrew.cooper3@citrix.com Signed-off-by: Juergen Gross jgross@suse.com Co-developed-by: Peter Zijlstra peterz@infradead.org Co-developed-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
arch/x86/include/asm/static_call.h | 15 +++++++++++++++ arch/x86/include/asm/sync_core.h | 6 +++--- arch/x86/kernel/static_call.c | 9 +++++++++ include/linux/compiler.h | 37 ++++++++++++++++++++++++++----------- include/linux/static_call.h | 1 + kernel/static_call_inline.c | 2 +- 6 files changed, 55 insertions(+), 15 deletions(-)
--- a/arch/x86/include/asm/static_call.h +++ b/arch/x86/include/asm/static_call.h @@ -65,4 +65,19 @@ extern bool __static_call_fixup(void *tramp, u8 op, void *dest); +extern void __static_call_update_early(void *tramp, void *func);
+#define static_call_update_early(name, _func)			\
+({									\
+	typeof(&STATIC_CALL_TRAMP(name)) __F = (_func);			\
+	if (static_call_initialized) {					\
+		__static_call_update(&STATIC_CALL_KEY(name),		\
+				     STATIC_CALL_TRAMP_ADDR(name), __F);\
+	} else {							\
+		WRITE_ONCE(STATIC_CALL_KEY(name).func, _func);		\
+		__static_call_update_early(STATIC_CALL_TRAMP_ADDR(name),\
+					   __F);			\
+	}								\
+})
...
--- a/kernel/static_call_inline.c +++ b/kernel/static_call_inline.c @@ -15,7 +15,7 @@ extern struct static_call_site __start_s extern struct static_call_tramp_key __start_static_call_tramp_key[], __stop_static_call_tramp_key[]; -static int static_call_initialized; +int static_call_initialized;
This breaks the build on i386:
ld: arch/x86/xen/enlighten.o: in function `__xen_hypercall_setfunc':
enlighten.c:(.noinstr.text+0x2a): undefined reference to `static_call_initialized'
ld: enlighten.c:(.noinstr.text+0x62): undefined reference to `static_call_initialized'
ld: arch/x86/kernel/static_call.o: in function `__static_call_update_early':
static_call.c:(.noinstr.text+0x15): undefined reference to `static_call_initialized'
kernel/static_call_inline.c containing this `static_call_initialized` is not built there as: HAVE_STATIC_CALL_INLINE=n -> HAVE_OBJTOOL=n -> X86_64=n
This is broken in upstream too.
thanks,
On 18.12.24 09:37, Jiri Slaby wrote:
On 17. 12. 24, 18:08, Greg Kroah-Hartman wrote:
6.12-stable review patch. If anyone has any objections, please let me know.
From: Juergen Gross jgross@suse.com
commit 0ef8047b737d7480a5d4c46d956e97c190f13050 upstream.
Add static_call_update_early() for updating static-call targets in very early boot.
This will be needed for support of Xen guest type specific hypercall functions.
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper andrew.cooper3@citrix.com Signed-off-by: Juergen Gross jgross@suse.com Co-developed-by: Peter Zijlstra peterz@infradead.org Co-developed-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
arch/x86/include/asm/static_call.h | 15 +++++++++++++++ arch/x86/include/asm/sync_core.h | 6 +++--- arch/x86/kernel/static_call.c | 9 +++++++++ include/linux/compiler.h | 37 ++++++++++++++++++++++++++----------- include/linux/static_call.h | 1 + kernel/static_call_inline.c | 2 +- 6 files changed, 55 insertions(+), 15 deletions(-)
--- a/arch/x86/include/asm/static_call.h +++ b/arch/x86/include/asm/static_call.h @@ -65,4 +65,19 @@ extern bool __static_call_fixup(void *tramp, u8 op, void *dest); +extern void __static_call_update_early(void *tramp, void *func);
+#define static_call_update_early(name, _func) \ +({ \ + typeof(&STATIC_CALL_TRAMP(name)) __F = (_func); \ + if (static_call_initialized) { \ + __static_call_update(&STATIC_CALL_KEY(name), \ + STATIC_CALL_TRAMP_ADDR(name), __F);\ + } else { \ + WRITE_ONCE(STATIC_CALL_KEY(name).func, _func); \ + __static_call_update_early(STATIC_CALL_TRAMP_ADDR(name),\ + __F); \ + } \ +})
...
--- a/kernel/static_call_inline.c +++ b/kernel/static_call_inline.c @@ -15,7 +15,7 @@ extern struct static_call_site __start_s extern struct static_call_tramp_key __start_static_call_tramp_key[], __stop_static_call_tramp_key[]; -static int static_call_initialized; +int static_call_initialized;
This breaks the build on i386:
ld: arch/x86/xen/enlighten.o: in function `__xen_hypercall_setfunc': enlighten.c:(.noinstr.text+0x2a): undefined reference to `static_call_initialized' ld: enlighten.c:(.noinstr.text+0x62): undefined reference to `static_call_initialized' ld: arch/x86/kernel/static_call.o: in function `__static_call_update_early': static_call.c:(.noinstr.text+0x15): undefined reference to `static_call_initialized'
kernel/static_call_inline.c containing this `static_call_initialized` is not built there as: HAVE_STATIC_CALL_INLINE=n -> HAVE_OBJTOOL=n -> X86_64=n
This is broken in upstream too.
I've sent a fix already:
https://lore.kernel.org/lkml/20241218080228.9742-1-jgross@suse.com/T/#u
Juergen
On Wed, Dec 18, 2024 at 09:53:24AM +0100, Jürgen Groß wrote:
On 18.12.24 09:37, Jiri Slaby wrote:
On 17. 12. 24, 18:08, Greg Kroah-Hartman wrote:
6.12-stable review patch. If anyone has any objections, please let me know.
From: Juergen Gross jgross@suse.com
commit 0ef8047b737d7480a5d4c46d956e97c190f13050 upstream.
Add static_call_update_early() for updating static-call targets in very early boot.
This will be needed for support of Xen guest type specific hypercall functions.
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper andrew.cooper3@citrix.com Signed-off-by: Juergen Gross jgross@suse.com Co-developed-by: Peter Zijlstra peterz@infradead.org Co-developed-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
arch/x86/include/asm/static_call.h | 15 +++++++++++++++ arch/x86/include/asm/sync_core.h | 6 +++--- arch/x86/kernel/static_call.c | 9 +++++++++ include/linux/compiler.h | 37 ++++++++++++++++++++++++++----------- include/linux/static_call.h | 1 + kernel/static_call_inline.c | 2 +- 6 files changed, 55 insertions(+), 15 deletions(-)
--- a/arch/x86/include/asm/static_call.h +++ b/arch/x86/include/asm/static_call.h @@ -65,4 +65,19 @@ extern bool __static_call_fixup(void *tramp, u8 op, void *dest); +extern void __static_call_update_early(void *tramp, void *func);
+#define static_call_update_early(name, _func) \ +({ \ + typeof(&STATIC_CALL_TRAMP(name)) __F = (_func); \ + if (static_call_initialized) { \ + __static_call_update(&STATIC_CALL_KEY(name), \ + STATIC_CALL_TRAMP_ADDR(name), __F);\ + } else { \ + WRITE_ONCE(STATIC_CALL_KEY(name).func, _func); \ + __static_call_update_early(STATIC_CALL_TRAMP_ADDR(name),\ + __F); \ + } \ +})
...
--- a/kernel/static_call_inline.c +++ b/kernel/static_call_inline.c @@ -15,7 +15,7 @@ extern struct static_call_site __start_s extern struct static_call_tramp_key __start_static_call_tramp_key[], __stop_static_call_tramp_key[]; -static int static_call_initialized; +int static_call_initialized;
This breaks the build on i386:
ld: arch/x86/xen/enlighten.o: in function `__xen_hypercall_setfunc': enlighten.c:(.noinstr.text+0x2a): undefined reference to `static_call_initialized' ld: enlighten.c:(.noinstr.text+0x62): undefined reference to `static_call_initialized' ld: arch/x86/kernel/static_call.o: in function `__static_call_update_early': static_call.c:(.noinstr.text+0x15): undefined reference to `static_call_initialized'
kernel/static_call_inline.c containing this `static_call_initialized` is not built there as: HAVE_STATIC_CALL_INLINE=n -> HAVE_OBJTOOL=n -> X86_64=n
This is broken in upstream too.
I've sent a fix already:
https://lore.kernel.org/lkml/20241218080228.9742-1-jgross@suse.com/T/#u
Thanks, I'll go queue that up (after fixing it up for the different branches...)
greg k-h
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juergen Gross jgross@suse.com
commit a2796dff62d6c6bfc5fbebdf2bee0d5ac0438906 upstream.
Instead of jumping to the Xen hypercall page for doing the iret hypercall, directly code the required sequence in xen-asm.S.
This is done in preparation for no longer using the hypercall page at all, as it has been shown to cause problems with speculation mitigations.
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper andrew.cooper3@citrix.com Signed-off-by: Juergen Gross jgross@suse.com Reviewed-by: Jan Beulich jbeulich@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/xen/xen-asm.S | 27 ++++++++++++++++++--------- 1 file changed, 18 insertions(+), 9 deletions(-)
--- a/arch/x86/xen/xen-asm.S +++ b/arch/x86/xen/xen-asm.S @@ -176,7 +176,6 @@ SYM_CODE_START(xen_early_idt_handler_arr SYM_CODE_END(xen_early_idt_handler_array) __FINIT
-hypercall_iret = hypercall_page + __HYPERVISOR_iret * 32 /* * Xen64 iret frame: * @@ -186,17 +185,28 @@ hypercall_iret = hypercall_page + __HYPE * cs * rip <-- standard iret frame * - * flags + * flags <-- xen_iret must push from here on * - * rcx } - * r11 }<-- pushed by hypercall page - * rsp->rax } + * rcx + * r11 + * rsp->rax */ +.macro xen_hypercall_iret + pushq $0 /* Flags */ + push %rcx + push %r11 + push %rax + mov $__HYPERVISOR_iret, %eax + syscall /* Do the IRET. */ +#ifdef CONFIG_MITIGATION_SLS + int3 +#endif +.endm + SYM_CODE_START(xen_iret) UNWIND_HINT_UNDEFINED ANNOTATE_NOENDBR - pushq $0 - jmp hypercall_iret + xen_hypercall_iret SYM_CODE_END(xen_iret)
/* @@ -301,8 +311,7 @@ SYM_CODE_START(xen_entry_SYSENTER_compat ENDBR lea 16(%rsp), %rsp /* strip %rcx, %r11 */ mov $-ENOSYS, %rax - pushq $0 - jmp hypercall_iret + xen_hypercall_iret SYM_CODE_END(xen_entry_SYSENTER_compat) SYM_CODE_END(xen_entry_SYSCALL_compat)
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juergen Gross jgross@suse.com
commit b4845bb6383821a9516ce30af3a27dc873e37fd4 upstream.
Add generic hypercall functions usable for all normal (i.e. not iret) hypercalls. Depending on the guest type and the processor vendor, different functions need to be used because the instruction for entering the hypervisor differs:
- PV guests need to use syscall - HVM/PVH guests on Intel need to use vmcall - HVM/PVH guests on AMD and Hygon need to use vmmcall
As PVH guests need to issue hypercalls very early during boot, there is a 4th hypercall function needed for HVM/PVH which can be used on Intel and AMD processors. It will check the vendor type and then set the Intel or AMD specific function to use via static_call().
This is part of XSA-466 / CVE-2024-53241.
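A simplified sketch of the selection rule described above; the xen_hypercall_* stubs are the ones added by this patch, while pick_hypercall_func() itself is illustrative only.

#include <linux/types.h>
#include <asm/processor.h>

void xen_hypercall_pv(void);
void xen_hypercall_amd(void);
void xen_hypercall_intel(void);

static void (*pick_hypercall_func(bool pv_guest, u8 vendor))(void)
{
	if (pv_guest)
		return xen_hypercall_pv;	/* entered via syscall */

	if (vendor == X86_VENDOR_AMD || vendor == X86_VENDOR_HYGON)
		return xen_hypercall_amd;	/* entered via vmmcall */

	return xen_hypercall_intel;		/* entered via vmcall */
}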
Reported-by: Andrew Cooper andrew.cooper3@citrix.com Signed-off-by: Juergen Gross jgross@suse.com Co-developed-by: Peter Zijlstra peterz@infradead.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/xen/hypercall.h | 3 + arch/x86/xen/enlighten.c | 64 ++++++++++++++++++++++++++ arch/x86/xen/enlighten_hvm.c | 4 + arch/x86/xen/enlighten_pv.c | 4 + arch/x86/xen/xen-asm.S | 23 +++++++++ arch/x86/xen/xen-head.S | 83 +++++++++++++++++++++++++++++++++++ arch/x86/xen/xen-ops.h | 9 +++ 7 files changed, 189 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -88,6 +88,9 @@ struct xen_dm_op_buf;
extern struct { char _entry[32]; } hypercall_page[];
+void xen_hypercall_func(void); +DECLARE_STATIC_CALL(xen_hypercall, xen_hypercall_func); + #define __HYPERCALL "call hypercall_page+%c[offset]" #define __HYPERCALL_ENTRY(x) \ [offset] "i" (__HYPERVISOR_##x * sizeof(hypercall_page[0])) --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -2,6 +2,7 @@
#include <linux/console.h> #include <linux/cpu.h> +#include <linux/instrumentation.h> #include <linux/kexec.h> #include <linux/memblock.h> #include <linux/slab.h> @@ -23,6 +24,9 @@
EXPORT_SYMBOL_GPL(hypercall_page);
+DEFINE_STATIC_CALL(xen_hypercall, xen_hypercall_hvm); +EXPORT_STATIC_CALL_TRAMP(xen_hypercall); + /* * Pointer to the xen_vcpu_info structure or * &HYPERVISOR_shared_info->vcpu_info[cpu]. See xen_hvm_init_shared_info @@ -68,6 +72,66 @@ EXPORT_SYMBOL(xen_start_flags); */ struct shared_info *HYPERVISOR_shared_info = &xen_dummy_shared_info;
+static __ref void xen_get_vendor(void) +{ + init_cpu_devs(); + cpu_detect(&boot_cpu_data); + get_cpu_vendor(&boot_cpu_data); +} + +void xen_hypercall_setfunc(void) +{ + if (static_call_query(xen_hypercall) != xen_hypercall_hvm) + return; + + if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD || + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)) + static_call_update(xen_hypercall, xen_hypercall_amd); + else + static_call_update(xen_hypercall, xen_hypercall_intel); +} + +/* + * Evaluate processor vendor in order to select the correct hypercall + * function for HVM/PVH guests. + * Might be called very early in boot before vendor has been set by + * early_cpu_init(). + */ +noinstr void *__xen_hypercall_setfunc(void) +{ + void (*func)(void); + + /* + * Xen is supported only on CPUs with CPUID, so testing for + * X86_FEATURE_CPUID is a test for early_cpu_init() having been + * run. + * + * Note that __xen_hypercall_setfunc() is noinstr only due to a nasty + * dependency chain: it is being called via the xen_hypercall static + * call when running as a PVH or HVM guest. Hypercalls need to be + * noinstr due to PV guests using hypercalls in noinstr code. So we + * the PV guest requirement is not of interest here (xen_get_vendor() + * calls noinstr functions, and static_call_update_early() might do + * so, too). + */ + instrumentation_begin(); + + if (!boot_cpu_has(X86_FEATURE_CPUID)) + xen_get_vendor(); + + if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD || + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)) + func = xen_hypercall_amd; + else + func = xen_hypercall_intel; + + static_call_update_early(xen_hypercall, func); + + instrumentation_end(); + + return func; +} + static int xen_cpu_up_online(unsigned int cpu) { xen_init_lock_cpu(cpu); --- a/arch/x86/xen/enlighten_hvm.c +++ b/arch/x86/xen/enlighten_hvm.c @@ -300,6 +300,10 @@ static uint32_t __init xen_platform_hvm( if (xen_pv_domain()) return 0;
+ /* Set correct hypercall function. */ + if (xen_domain) + xen_hypercall_setfunc(); + if (xen_pvh_domain() && nopv) { /* Guest booting via the Xen-PVH boot entry goes here */ pr_info(""nopv" parameter is ignored in PVH guest\n"); --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1341,6 +1341,9 @@ asmlinkage __visible void __init xen_sta
xen_domain_type = XEN_PV_DOMAIN; xen_start_flags = xen_start_info->flags; + /* Interrupts are guaranteed to be off initially. */ + early_boot_irqs_disabled = true; + static_call_update_early(xen_hypercall, xen_hypercall_pv);
xen_setup_features();
@@ -1431,7 +1434,6 @@ asmlinkage __visible void __init xen_sta WARN_ON(xen_cpuhp_setup(xen_cpu_up_prepare_pv, xen_cpu_dead_pv));
local_irq_disable(); - early_boot_irqs_disabled = true;
xen_raw_console_write("mapping kernel into physical memory\n"); xen_setup_kernel_pagetable((pgd_t *)xen_start_info->pt_base, --- a/arch/x86/xen/xen-asm.S +++ b/arch/x86/xen/xen-asm.S @@ -20,10 +20,33 @@
#include <linux/init.h> #include <linux/linkage.h> +#include <linux/objtool.h> #include <../entry/calling.h>
.pushsection .noinstr.text, "ax" /* + * PV hypercall interface to the hypervisor. + * + * Called via inline asm(), so better preserve %rcx and %r11. + * + * Input: + * %eax: hypercall number + * %rdi, %rsi, %rdx, %r10, %r8: args 1..5 for the hypercall + * Output: %rax + */ +SYM_FUNC_START(xen_hypercall_pv) + ANNOTATE_NOENDBR + push %rcx + push %r11 + UNWIND_HINT_SAVE + syscall + UNWIND_HINT_RESTORE + pop %r11 + pop %rcx + RET +SYM_FUNC_END(xen_hypercall_pv) + +/* * Disabling events is simply a matter of making the event mask * non-zero. */ --- a/arch/x86/xen/xen-head.S +++ b/arch/x86/xen/xen-head.S @@ -6,9 +6,11 @@
#include <linux/elfnote.h> #include <linux/init.h> +#include <linux/instrumentation.h>
#include <asm/boot.h> #include <asm/asm.h> +#include <asm/frame.h> #include <asm/msr.h> #include <asm/page_types.h> #include <asm/percpu.h> @@ -87,6 +89,87 @@ SYM_CODE_END(xen_cpu_bringup_again) #endif #endif
+ .pushsection .noinstr.text, "ax" +/* + * Xen hypercall interface to the hypervisor. + * + * Input: + * %eax: hypercall number + * 32-bit: + * %ebx, %ecx, %edx, %esi, %edi: args 1..5 for the hypercall + * 64-bit: + * %rdi, %rsi, %rdx, %r10, %r8: args 1..5 for the hypercall + * Output: %[er]ax + */ +SYM_FUNC_START(xen_hypercall_hvm) + ENDBR + FRAME_BEGIN + /* Save all relevant registers (caller save and arguments). */ +#ifdef CONFIG_X86_32 + push %eax + push %ebx + push %ecx + push %edx + push %esi + push %edi +#else + push %rax + push %rcx + push %rdx + push %rdi + push %rsi + push %r11 + push %r10 + push %r9 + push %r8 +#ifdef CONFIG_FRAME_POINTER + pushq $0 /* Dummy push for stack alignment. */ +#endif +#endif + /* Set the vendor specific function. */ + call __xen_hypercall_setfunc + /* Set ZF = 1 if AMD, Restore saved registers. */ +#ifdef CONFIG_X86_32 + lea xen_hypercall_amd, %ebx + cmp %eax, %ebx + pop %edi + pop %esi + pop %edx + pop %ecx + pop %ebx + pop %eax +#else + lea xen_hypercall_amd(%rip), %rbx + cmp %rax, %rbx +#ifdef CONFIG_FRAME_POINTER + pop %rax /* Dummy pop. */ +#endif + pop %r8 + pop %r9 + pop %r10 + pop %r11 + pop %rsi + pop %rdi + pop %rdx + pop %rcx + pop %rax +#endif + /* Use correct hypercall function. */ + jz xen_hypercall_amd + jmp xen_hypercall_intel +SYM_FUNC_END(xen_hypercall_hvm) + +SYM_FUNC_START(xen_hypercall_amd) + vmmcall + RET +SYM_FUNC_END(xen_hypercall_amd) + +SYM_FUNC_START(xen_hypercall_intel) + vmcall + RET +SYM_FUNC_END(xen_hypercall_intel) + .popsection + ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz "linux") ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz "2.6") ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION, .asciz "xen-3.0") --- a/arch/x86/xen/xen-ops.h +++ b/arch/x86/xen/xen-ops.h @@ -326,4 +326,13 @@ static inline void xen_smp_intr_free_pv( static inline void xen_smp_count_cpus(void) { } #endif /* CONFIG_SMP */
+#ifdef CONFIG_XEN_PV +void xen_hypercall_pv(void); +#endif +void xen_hypercall_hvm(void); +void xen_hypercall_amd(void); +void xen_hypercall_intel(void); +void xen_hypercall_setfunc(void); +void *__xen_hypercall_setfunc(void); + #endif /* XEN_OPS_H */
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juergen Gross jgross@suse.com
commit b1c2cb86f4a7861480ad54bb9a58df3cbebf8e92 upstream.
Call the Xen hypervisor via the new xen_hypercall_func static-call instead of the hypercall page.
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper andrew.cooper3@citrix.com
Signed-off-by: Juergen Gross jgross@suse.com
Co-developed-by: Peter Zijlstra peterz@infradead.org
Co-developed-by: Josh Poimboeuf jpoimboe@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/xen/hypercall.h | 33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)
--- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -39,9 +39,11 @@ #include <linux/string.h> #include <linux/types.h> #include <linux/pgtable.h> +#include <linux/instrumentation.h>
#include <trace/events/xen.h>
+#include <asm/alternative.h> #include <asm/page.h> #include <asm/smap.h> #include <asm/nospec-branch.h> @@ -91,9 +93,17 @@ extern struct { char _entry[32]; } hyper void xen_hypercall_func(void); DECLARE_STATIC_CALL(xen_hypercall, xen_hypercall_func);
-#define __HYPERCALL "call hypercall_page+%c[offset]" -#define __HYPERCALL_ENTRY(x) \ - [offset] "i" (__HYPERVISOR_##x * sizeof(hypercall_page[0])) +#ifdef MODULE +#define __ADDRESSABLE_xen_hypercall +#else +#define __ADDRESSABLE_xen_hypercall __ADDRESSABLE_ASM_STR(__SCK__xen_hypercall) +#endif + +#define __HYPERCALL \ + __ADDRESSABLE_xen_hypercall \ + "call __SCT__xen_hypercall" + +#define __HYPERCALL_ENTRY(x) "a" (x)
#ifdef CONFIG_X86_32 #define __HYPERCALL_RETREG "eax" @@ -151,7 +161,7 @@ DECLARE_STATIC_CALL(xen_hypercall, xen_h __HYPERCALL_0ARG(); \ asm volatile (__HYPERCALL \ : __HYPERCALL_0PARAM \ - : __HYPERCALL_ENTRY(name) \ + : __HYPERCALL_ENTRY(__HYPERVISOR_ ## name) \ : __HYPERCALL_CLOBBER0); \ (type)__res; \ }) @@ -162,7 +172,7 @@ DECLARE_STATIC_CALL(xen_hypercall, xen_h __HYPERCALL_1ARG(a1); \ asm volatile (__HYPERCALL \ : __HYPERCALL_1PARAM \ - : __HYPERCALL_ENTRY(name) \ + : __HYPERCALL_ENTRY(__HYPERVISOR_ ## name) \ : __HYPERCALL_CLOBBER1); \ (type)__res; \ }) @@ -173,7 +183,7 @@ DECLARE_STATIC_CALL(xen_hypercall, xen_h __HYPERCALL_2ARG(a1, a2); \ asm volatile (__HYPERCALL \ : __HYPERCALL_2PARAM \ - : __HYPERCALL_ENTRY(name) \ + : __HYPERCALL_ENTRY(__HYPERVISOR_ ## name) \ : __HYPERCALL_CLOBBER2); \ (type)__res; \ }) @@ -184,7 +194,7 @@ DECLARE_STATIC_CALL(xen_hypercall, xen_h __HYPERCALL_3ARG(a1, a2, a3); \ asm volatile (__HYPERCALL \ : __HYPERCALL_3PARAM \ - : __HYPERCALL_ENTRY(name) \ + : __HYPERCALL_ENTRY(__HYPERVISOR_ ## name) \ : __HYPERCALL_CLOBBER3); \ (type)__res; \ }) @@ -195,7 +205,7 @@ DECLARE_STATIC_CALL(xen_hypercall, xen_h __HYPERCALL_4ARG(a1, a2, a3, a4); \ asm volatile (__HYPERCALL \ : __HYPERCALL_4PARAM \ - : __HYPERCALL_ENTRY(name) \ + : __HYPERCALL_ENTRY(__HYPERVISOR_ ## name) \ : __HYPERCALL_CLOBBER4); \ (type)__res; \ }) @@ -209,12 +219,9 @@ xen_single_call(unsigned int call, __HYPERCALL_DECLS; __HYPERCALL_5ARG(a1, a2, a3, a4, a5);
- if (call >= PAGE_SIZE / sizeof(hypercall_page[0])) - return -EINVAL; - - asm volatile(CALL_NOSPEC + asm volatile(__HYPERCALL : __HYPERCALL_5PARAM - : [thunk_target] "a" (&hypercall_page[call]) + : __HYPERCALL_ENTRY(call) : __HYPERCALL_CLOBBER5);
return (long)__res;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juergen Gross jgross@suse.com
commit 7fa0da5373685e7ed249af3fa317ab1e1ba8b0a6 upstream.
The hypercall page is no longer needed. It can be removed, as from the Xen perspective it is optional.
But, from Linux's perspective, it removes naked RET instructions that escape the speculative protections that Call Depth Tracking and/or Untrain Ret are trying to achieve.
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper andrew.cooper3@citrix.com
Signed-off-by: Juergen Gross jgross@suse.com
Reviewed-by: Andrew Cooper andrew.cooper3@citrix.com
Reviewed-by: Jan Beulich jbeulich@suse.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/xen/hypercall.h |  2 --
 arch/x86/kernel/callthunks.c         |  5 -----
 arch/x86/xen/enlighten.c             |  2 --
 arch/x86/xen/enlighten_hvm.c         |  9 +--------
 arch/x86/xen/enlighten_pvh.c         |  7 -------
 arch/x86/xen/xen-head.S              | 23 -----------------------
 6 files changed, 1 insertion(+), 47 deletions(-)
--- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -88,8 +88,6 @@ struct xen_dm_op_buf; * there aren't more than 5 arguments...) */
-extern struct { char _entry[32]; } hypercall_page[]; - void xen_hypercall_func(void); DECLARE_STATIC_CALL(xen_hypercall, xen_hypercall_func);
--- a/arch/x86/kernel/callthunks.c +++ b/arch/x86/kernel/callthunks.c @@ -143,11 +143,6 @@ static bool skip_addr(void *dest) dest < (void*)relocate_kernel + KEXEC_CONTROL_CODE_MAX_SIZE) return true; #endif -#ifdef CONFIG_XEN - if (dest >= (void *)hypercall_page && - dest < (void*)hypercall_page + PAGE_SIZE) - return true; -#endif return false; }
--- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -22,8 +22,6 @@
#include "xen-ops.h"
-EXPORT_SYMBOL_GPL(hypercall_page); - DEFINE_STATIC_CALL(xen_hypercall, xen_hypercall_hvm); EXPORT_STATIC_CALL_TRAMP(xen_hypercall);
--- a/arch/x86/xen/enlighten_hvm.c +++ b/arch/x86/xen/enlighten_hvm.c @@ -106,15 +106,8 @@ static void __init init_hvm_pv_info(void /* PVH set up hypercall page in xen_prepare_pvh(). */ if (xen_pvh_domain()) pv_info.name = "Xen PVH"; - else { - u64 pfn; - uint32_t msr; - + else pv_info.name = "Xen HVM"; - msr = cpuid_ebx(base + 2); - pfn = __pa(hypercall_page); - wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); - }
xen_setup_features();
--- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -129,17 +129,10 @@ static void __init pvh_arch_setup(void)
void __init xen_pvh_init(struct boot_params *boot_params) { - u32 msr; - u64 pfn; - xen_pvh = 1; xen_domain_type = XEN_HVM_DOMAIN; xen_start_flags = pvh_start_info.flags;
- msr = cpuid_ebx(xen_cpuid_base() + 2); - pfn = __pa(hypercall_page); - wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); - x86_init.oem.arch_setup = pvh_arch_setup; x86_init.oem.banner = xen_banner;
--- a/arch/x86/xen/xen-head.S +++ b/arch/x86/xen/xen-head.S @@ -22,28 +22,6 @@ #include <xen/interface/xen-mca.h> #include <asm/xen/interface.h>
-.pushsection .noinstr.text, "ax" - .balign PAGE_SIZE -SYM_CODE_START(hypercall_page) - .rept (PAGE_SIZE / 32) - UNWIND_HINT_FUNC - ANNOTATE_NOENDBR - ANNOTATE_UNRET_SAFE - ret - /* - * Xen will write the hypercall page, and sort out ENDBR. - */ - .skip 31, 0xcc - .endr - -#define HYPERCALL(n) \ - .equ xen_hypercall_##n, hypercall_page + __HYPERVISOR_##n * 32; \ - .type xen_hypercall_##n, @function; .size xen_hypercall_##n, 32 -#include <asm/xen-hypercalls.h> -#undef HYPERCALL -SYM_CODE_END(hypercall_page) -.popsection - #ifdef CONFIG_XEN_PV __INIT SYM_CODE_START(startup_xen) @@ -198,7 +176,6 @@ SYM_FUNC_END(xen_hypercall_intel) #else # define FEATURES_DOM0 0 #endif - ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, _ASM_PTR hypercall_page) ELFNOTE(Xen, XEN_ELFNOTE_SUPPORTED_FEATURES, .long FEATURES_PV | FEATURES_PVH | FEATURES_DOM0) ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz "generic")
On 12/17/24 09:05, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.12.6 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 19 Dec 2024 17:05:03 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.6-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
On 12/17/24 10:05, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.12.6 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 19 Dec 2024 17:05:03 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.6-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On 24/12/17 06:05PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.12.6 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 19 Dec 2024 17:05:03 +0000. Anything received after that time might be too late.
Hey everyone,
when testing the 6.12.6-rc1 release candidate on my Steam Deck I found that the following issue came up (once). On my laptop everything works fine, and even though the issue below came up the device was usable like always.
I added the relevant Maintainers into CC so they can judge whether it's something serious or not. I have also added a full dmesg as an attachment. If anybody has an idea about possible reproducers I would be interested in that as well, as I have only seen the issue once so far.
Cheers, Chris
kernel: CPU: 4 UID: 0 PID: 436 Comm: kworker/u32:4 Not tainted 6.12.6-rc1-1home #1 c49ee701ad1a1a28c82c80281ff6159df069d19b
kernel: Hardware name: Valve Jupiter/Jupiter, BIOS F7A0131 01/30/2024
kernel: Workqueue: sdma0 drm_sched_run_job_work [gpu_sched]
kernel: RIP: 0010:check_flush_dependency+0xfc/0x120
kernel: Code: 8b 45 18 48 8d b2 c0 00 00 00 49 89 e8 48 8d 8b c0 00 00 00 48 c7 c7 e0 a1 ae a8 c6 05 29 03 16 02 01 48 89 c2 e8 04 8e fd ff <0f> 0b e9 1f ff ff ff 80 3d 14 03 16 02 00 75 93 e9 4a ff ff ff 66
kernel: RSP: 0018:ffffa65802707c60 EFLAGS: 00010082
kernel: RAX: 0000000000000000 RBX: ffff958c80050800 RCX: 0000000000000027
kernel: RDX: ffff958fb00218c8 RSI: 0000000000000001 RDI: ffff958fb00218c0
kernel: RBP: ffffffffc0a2eb00 R08: 0000000000000000 R09: ffffa65802707ae0
kernel: R10: ffffffffa92b54e8 R11: 0000000000000003 R12: ffff958c899b3580
kernel: R13: ffff958c8cc71c00 R14: ffffa65802707c90 R15: 0000000000000001
kernel: FS: 0000000000000000(0000) GS:ffff958fb0000000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f764c0d5000 CR3: 00000001b6222000 CR4: 0000000000350ef0
kernel: Call Trace:
kernel: <TASK>
kernel: ? check_flush_dependency+0xfc/0x120
kernel: ? __warn.cold+0x93/0xf6
kernel: ? check_flush_dependency+0xfc/0x120
kernel: ? report_bug+0xff/0x140
kernel: ? handle_bug+0x58/0x90
kernel: ? exc_invalid_op+0x17/0x70
kernel: ? asm_exc_invalid_op+0x1a/0x20
kernel: ? __pfx_amdgpu_device_delay_enable_gfx_off+0x10/0x10 [amdgpu 857aca8165f9b9ab3793f37419bdc9a45d24aff0]
kernel: ? check_flush_dependency+0xfc/0x120
kernel: __flush_work+0x110/0x2c0
kernel: cancel_delayed_work_sync+0x5e/0x80
kernel: amdgpu_gfx_off_ctrl+0xad/0x140 [amdgpu 857aca8165f9b9ab3793f37419bdc9a45d24aff0]
kernel: amdgpu_ring_alloc+0x43/0x60 [amdgpu 857aca8165f9b9ab3793f37419bdc9a45d24aff0]
kernel: amdgpu_ib_schedule+0xf0/0x730 [amdgpu 857aca8165f9b9ab3793f37419bdc9a45d24aff0]
kernel: amdgpu_job_run+0x8c/0x170 [amdgpu 857aca8165f9b9ab3793f37419bdc9a45d24aff0]
kernel: ? mod_delayed_work_on+0xa4/0xb0
kernel: drm_sched_run_job_work+0x25c/0x3f0 [gpu_sched da7f9c92395781c75e4ac0d605a4cf839a336d2f]
kernel: ? srso_return_thunk+0x5/0x5f
kernel: process_one_work+0x17e/0x330
kernel: worker_thread+0x2ce/0x3f0
kernel: ? __pfx_worker_thread+0x10/0x10
kernel: kthread+0xd2/0x100
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork+0x34/0x50
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork_asm+0x1a/0x30
kernel: </TASK>
kernel: ---[ end trace 0000000000000000 ]---
On 12/17/24 15:43, Christian Heusel wrote:
On 24/12/17 06:05PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.12.6 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 19 Dec 2024 17:05:03 +0000. Anything received after that time might be too late.
Hey everyone,
when testing the 6.12.6-rc1 release candidate on my Steam Deck I found that the following issue came up (once). On my laptop everything works fine, and even though the issue below came up the device was usable like always.
I added the relevant Maintainers into CC so they can judge whether it's something serious or not. I have also added a full dmesg as an attachment. If anybody has an idea about possible reproducers I would be interested in that as well, as I have only seen the issue once so far.
You might hit
https://gitlab.freedesktop.org/drm/amd/-/issues/3799
Guenter
Cheers, Chris
kernel: CPU: 4 UID: 0 PID: 436 Comm: kworker/u32:4 Not tainted 6.12.6-rc1-1home #1 c49ee701ad1a1a28c82c80281ff6159df069d19b kernel: Hardware name: Valve Jupiter/Jupiter, BIOS F7A0131 01/30/2024 kernel: Workqueue: sdma0 drm_sched_run_job_work [gpu_sched] kernel: RIP: 0010:check_flush_dependency+0xfc/0x120 kernel: Code: 8b 45 18 48 8d b2 c0 00 00 00 49 89 e8 48 8d 8b c0 00 00 00 48 c7 c7 e0 a1 ae a8 c6 05 29 03 16 02 01 48 89 c2 e8 04 8e fd ff <0f> 0b e9 1f ff ff ff 80 3d 14 03 16 02 00 75 93 e9 4a ff ff ff 66 kernel: RSP: 0018:ffffa65802707c60 EFLAGS: 00010082 kernel: RAX: 0000000000000000 RBX: ffff958c80050800 RCX: 0000000000000027 kernel: RDX: ffff958fb00218c8 RSI: 0000000000000001 RDI: ffff958fb00218c0 kernel: RBP: ffffffffc0a2eb00 R08: 0000000000000000 R09: ffffa65802707ae0 kernel: R10: ffffffffa92b54e8 R11: 0000000000000003 R12: ffff958c899b3580 kernel: R13: ffff958c8cc71c00 R14: ffffa65802707c90 R15: 0000000000000001 kernel: FS: 0000000000000000(0000) GS:ffff958fb0000000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 00007f764c0d5000 CR3: 00000001b6222000 CR4: 0000000000350ef0 kernel: Call Trace: kernel: <TASK> kernel: ? check_flush_dependency+0xfc/0x120 kernel: ? __warn.cold+0x93/0xf6 kernel: ? check_flush_dependency+0xfc/0x120 kernel: ? report_bug+0xff/0x140 kernel: ? handle_bug+0x58/0x90 kernel: ? exc_invalid_op+0x17/0x70 kernel: ? asm_exc_invalid_op+0x1a/0x20 kernel: ? __pfx_amdgpu_device_delay_enable_gfx_off+0x10/0x10 [amdgpu 857aca8165f9b9ab3793f37419bdc9a45d24aff0] kernel: ? check_flush_dependency+0xfc/0x120 kernel: __flush_work+0x110/0x2c0 kernel: cancel_delayed_work_sync+0x5e/0x80 kernel: amdgpu_gfx_off_ctrl+0xad/0x140 [amdgpu 857aca8165f9b9ab3793f37419bdc9a45d24aff0] kernel: amdgpu_ring_alloc+0x43/0x60 [amdgpu 857aca8165f9b9ab3793f37419bdc9a45d24aff0] kernel: amdgpu_ib_schedule+0xf0/0x730 [amdgpu 857aca8165f9b9ab3793f37419bdc9a45d24aff0] kernel: amdgpu_job_run+0x8c/0x170 [amdgpu 857aca8165f9b9ab3793f37419bdc9a45d24aff0] kernel: ? mod_delayed_work_on+0xa4/0xb0 kernel: drm_sched_run_job_work+0x25c/0x3f0 [gpu_sched da7f9c92395781c75e4ac0d605a4cf839a336d2f] kernel: ? srso_return_thunk+0x5/0x5f kernel: process_one_work+0x17e/0x330 kernel: worker_thread+0x2ce/0x3f0 kernel: ? __pfx_worker_thread+0x10/0x10 kernel: kthread+0xd2/0x100 kernel: ? __pfx_kthread+0x10/0x10 kernel: ret_from_fork+0x34/0x50 kernel: ? __pfx_kthread+0x10/0x10 kernel: ret_from_fork_asm+0x1a/0x30 kernel: </TASK> kernel: ---[ end trace 0000000000000000 ]---
On 12/17/24 09:05, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.12.6 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 19 Dec 2024 17:05:03 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.6-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
Hi Greg
On Wed, Dec 18, 2024 at 2:28 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.12.6 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 19 Dec 2024 17:05:03 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.6-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
thanks,
greg k-h
6.12.6-rc1 tested.
Build successfully completed. Boot successfully completed. No dmesg regressions. Video output normal. Sound output normal.
Lenovo ThinkPad X1 Carbon Gen10 (Intel i7-1260P, x86_64, Arch Linux)
[ 0.000000] Linux version 6.12.6-rc1rv (takeshi@ThinkPadX1Gen10J0764) (gcc (GCC) 14.2.1 20240910, GNU ld (GNU Binutils) 2.43.0) #1 SMP PREEMPT_DYNAMIC Wed Dec 18 20:05:26 JST 2024
Thanks
Tested-by: Takeshi Ogasawara takeshi.ogasawara@futuring-girl.com
On Tue, Dec 17, 2024 at 06:05:56PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.12.6 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Tested-by: Mark Brown broonie@kernel.org
Am 17.12.2024 um 18:05 schrieb Greg Kroah-Hartman:
This is the start of the stable review cycle for the 6.12.6 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Builds, boots and works on my 2-socket Ivy Bridge Xeon E5-2697 v2 server. No dmesg oddities or regressions found.
Tested-by: Peter Schneider pschneider1968@googlemail.com
Best regards, Peter Schneider
On Tue, 17 Dec 2024 at 22:55, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.12.6 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 19 Dec 2024 17:05:03 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.6-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
thanks,
greg k-h
All i386 builds failed with the gcc-13 and clang-19 toolchains on the following branches:
- linux-6.12.y
- linux-6.6.y
- linux-6.1.y
- linux-5.15.y
- linux-5.10.y
* i386, build
  - clang-19-defconfig
  - gcc-13-defconfig
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Build log:
-------------
i686-linux-gnu-ld: arch/x86/kernel/static_call.o: in function `__static_call_update_early':
static_call.c:(.noinstr.text+0x15): undefined reference to `static_call_initialized'
The most recent commit on this file is "x86/static-call: provide a way to do very early static-call updates", commit 0ef8047b737d7480a5d4c46d956e97c190f13050 upstream.
links:
- https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.12.y/build/v6.12....
- https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.12.y/build/v6.12....
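For context, the failure mode above can be illustrated with a small hypothetical example: a symbol whose definition is only compiled for some configurations, while a reference to it is built unconditionally. The names CONFIG_FEATURE_X and feature_x_initialized below are made up for illustration and are not the kernel's; building the sketch with -DCONFIG_FEATURE_X links fine, while building without it fails at link time with the same kind of "undefined reference" error.

```c
#include <stdbool.h>
#include <stdio.h>

#ifdef CONFIG_FEATURE_X
bool feature_x_initialized = true;	/* definition exists only for some configs */
#else
extern bool feature_x_initialized;	/* other configs see only a declaration */
#endif

int main(void)
{
	/* Without -DCONFIG_FEATURE_X this reference has no definition anywhere,
	 * so the linker reports "undefined reference to `feature_x_initialized'". */
	printf("initialized: %d\n", feature_x_initialized);
	return 0;
}
```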
## Build
* kernel: 6.12.6-rc1
* git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
* git commit: 83a2a70d2d652a062cd4373f4e09baa111160de0
* git describe: v6.12.5-173-g83a2a70d2d65
* test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.12.y/build/v6.12....
## Test Regressions (compared to v6.12.4-468-g602e3159e817)
* i386, build
  - clang-19-allnoconfig
  - clang-19-defconfig
  - clang-19-lkftconfig-no-kselftest-frag
  - clang-19-tinyconfig
  - clang-nightly-allnoconfig
  - clang-nightly-defconfig
  - clang-nightly-lkftconfig-kselftest
  - clang-nightly-tinyconfig
  - gcc-13-allmodconfig
  - gcc-13-allnoconfig
  - gcc-13-defconfig
  - gcc-13-lkftconfig-kselftest
  - gcc-13-lkftconfig-no-kselftest-frag
  - gcc-13-lkftconfig-perf
  - gcc-13-tinyconfig
  - gcc-8-allnoconfig
  - gcc-8-i386_defconfig
  - gcc-8-tinyconfig
## Metric Regressions (compared to v6.12.4-468-g602e3159e817)
## Test Fixes (compared to v6.12.4-468-g602e3159e817)
## Metric Fixes (compared to v6.12.4-468-g602e3159e817)
## Test result summary
total: 106171, pass: 85929, fail: 3687, skip: 16555, xfail: 0
## Build Summary
* arc: 5 total, 5 passed, 0 failed
* arm: 139 total, 137 passed, 2 failed
* arm64: 54 total, 54 passed, 0 failed
* i386: 18 total, 0 passed, 18 failed
* mips: 34 total, 33 passed, 1 failed
* parisc: 4 total, 3 passed, 1 failed
* powerpc: 40 total, 39 passed, 1 failed
* riscv: 24 total, 23 passed, 1 failed
* s390: 22 total, 21 passed, 1 failed
* sh: 5 total, 5 passed, 0 failed
* sparc: 4 total, 3 passed, 1 failed
* x86_64: 46 total, 46 passed, 0 failed
## Test suites summary
* boot
* commands
* kselftest-arm64
* kselftest-breakpoints
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-efivarfs
* kselftest-exec
* kselftest-filesystems
* kselftest-filesystems-binderfs
* kselftest-filesystems-epoll
* kselftest-firmware
* kselftest-fpu
* kselftest-ftrace
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-kcmp
* kselftest-kvm
* kselftest-livepatch
* kselftest-membarrier
* kselftest-memfd
* kselftest-mincore
* kselftest-mqueue
* kselftest-net
* kselftest-net-mptcp
* kselftest-openat2
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-rust
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-tc-testing
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user_events
* kselftest-vDSO
* kselftest-x86
* kunit
* kvm-unit-tests
* libgpiod
* libhugetlbfs
* log-parser-boot
* log-parser-build-clang
* log-parser-build-gcc
* log-parser-test
* ltp-commands
* ltp-containers
* ltp-controllers
* ltp-crypto
* ltp-cve
* ltp-dio
* ltp-fcntl-locktests
* ltp-filecaps
* ltp-fs
* ltp-fs_bind
* ltp-fs_perms_simple
* ltp-hugetlb
* ltp-ipc
* ltp-math
* ltp-mm
* ltp-nptl
* ltp-pty
* ltp-sched
* ltp-smoke
* ltp-syscalls
* ltp-tracing
* perf
* rcutorture
-- Linaro LKFT https://lkft.linaro.org
On 18. 12. 24, 14:19, Naresh Kamboju wrote:
On Tue, 17 Dec 2024 at 22:55, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.12.6 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 19 Dec 2024 17:05:03 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.6-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
thanks,
greg k-h
The all i386 builds failed with the gcc-13 and clang-19 toolchain builds on following branches,
- linux-6.12.y
- linux-6.6.y
- linux-6.1.y
- linux-5.15.y
- linux-5.10.y
- i386, build
- clang-19-defconfig
- gcc-13-defconfig
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Build log:
i686-linux-gnu-ld: arch/x86/kernel/static_call.o: in function `__static_call_update_early': static_call.c:(.noinstr.text+0x15): undefined reference to `static_call_initialized'
The recent commit on this file is, x86/static-call: provide a way to do very early static-call updates commit 0ef8047b737d7480a5d4c46d956e97c190f13050 upstream.
Yes, the fix is at (via one hop): https://lore.kernel.org/all/aec47f97-c59b-403a-bf2a-d8551e2ec6f9@suse.com/
On 12/18/24 06:56, Jiri Slaby wrote:
On 18. 12. 24, 14:19, Naresh Kamboju wrote:
On Tue, 17 Dec 2024 at 22:55, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.12.6 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 19 Dec 2024 17:05:03 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.6-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
thanks,
greg k-h
The all i386 builds failed with the gcc-13 and clang-19 toolchain builds on following branches, - linux-6.12.y - linux-6.6.y - linux-6.1.y - linux-5.15.y - linux-5.10.y
- i386, build
- clang-19-defconfig - gcc-13-defconfig
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Build log:
i686-linux-gnu-ld: arch/x86/kernel/static_call.o: in function `__static_call_update_early': static_call.c:(.noinstr.text+0x15): undefined reference to `static_call_initialized'
The recent commit on this file is, x86/static-call: provide a way to do very early static-call updates commit 0ef8047b737d7480a5d4c46d956e97c190f13050 upstream.
Yes, the fix is at (via one hop): https://lore.kernel.org/all/aec47f97-c59b-403a-bf2a-d8551e2ec6f9@suse.com/
The fix is not yet in mainline, meaning the offending patch now results in the same build failure there.
Guenter
On 18. 12. 24, 17:54, Peter Zijlstra wrote:
On Wed, Dec 18, 2024 at 08:53:39AM -0800, Guenter Roeck wrote:
The fix is not yet in mainline, meaning the offending patch now results in the same build failure there.
Yes, as I wrote in the aforementioned message :).
It's a test, to see if anybody except the build robots actually gives a damn about i386 :-)
This is not much of a robot -- openSUSE still provides 386, as a port now. I failed to destroy that port in several past attempts :P -- there were still users.
On 12/18/24 09:48, Jiri Slaby wrote:
On 18. 12. 24, 17:54, Peter Zijlstra wrote:
On Wed, Dec 18, 2024 at 08:53:39AM -0800, Guenter Roeck wrote:
The fix is not yet in mainline, meaning the offending patch now results in the same build failure there.
Yes, as I wrote in the aforementioned message :).
It's a test, to see if anybody except the build robots actually gives a damn about i386 :-)
This is not much of a robot -- openSUSE still provides 386. As a port now. I failed to destroy that port in past several attempts :P -- there were still users.
For my part, if I get a few more of the "nobody except the build robots actually gives a damn", "why are you testing this", or "you should not be testing this" responses, I _will_ stop testing this and any other configuration where people make such statements. I am getting tired of it. If anyone doesn't want something tested, it should be dropped from the kernel or at least be marked broken or disabled for build tests.
Guenter
On Tue, 17 Dec 2024 18:05:56 +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.12.6 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 19 Dec 2024 17:05:03 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.6-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v6.12:
  10 builds:	10 pass, 0 fail
  26 boots:	26 pass, 0 fail
  116 tests:	116 pass, 0 fail
Linux version:	6.12.6-rc1-g83a2a70d2d65
Boards tested:	tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000,
		tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180,
		tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On Tue, Dec 17, 2024 at 06:05:56PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.12.6 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 19 Dec 2024 17:05:03 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.6-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
thanks,
greg k-h
Tested rc1 against the Fedora build system (aarch64, ppc64le, s390x, x86_64), and boot tested x86_64. No regressions noted.
Tested-by: Justin M. Forbes jforbes@fedoraproject.org
Hi Greg,
On 17/12/24 22:35, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.12.6 release. There are 172 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 19 Dec 2024 17:05:03 +0000. Anything received after that time might be too late.
We have now made our tests ready for 6.12.y stable rc testing.
No problems seen on x86_64 and aarch64 with our testing. (Boot/ Install/ LTP so far)
We do see some problems with failing kselftests; we will investigate them and report them here. They are not specific to 6.12.6, as most of them were also reproducible on 6.12.5, but we will have to look at all of the failing kselftests before reporting.
Tested-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com
Thanks, Harshit
On Tue, 17 Dec 2024 18:05:56 +0100 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
Responses should be made by Thu, 19 Dec 2024 17:05:03 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.6-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y and the diffstat can be found below.
(Late reply, but for the record)
Boot-tested under QEMU for Rust x86_64, arm64 and riscv64; build-tested for loongarch64:
Tested-by: Miguel Ojeda ojeda@kernel.org
Thanks!
Cheers, Miguel
linux-stable-mirror@lists.linaro.org